Analyzing Office 365 Photo EXIF data using Power BI

In an earlier post, I wrote about how to mine the OCR data from Office 365 document folders using Power BI. During the research for that post I discovered that the photo EXIF data is also available for analysis from Office 365. I’ll show you how to get to it in this post.

What is EXIF data anyway?

Think of EXIF as special metadata for photos and with today’s cameras, records the settings of the camera as well as the location it was taken, if the camera has GPS. You can read more about the format here: https://photographylife.com/what-is-exif-data  Many smart phones today automatically include the EXIF data when you take photos with your phone. This data is automatically extracted by Office 365 from photos when uploaded to a document library.

Why do I care about this?

Imagine you work for a company that does inspections. Your inspectors use their phone to take photos of issues already. Wouldn’t it be great to show the photos by location and related location data together on a report page? This technique allows you to easily mine that EXIF data.

How do I get to it?

First, upload your images to a SharePoint document folder in Office 365 and the OCR process will initiate  automatically. I’ve had it take up to 15 minutes to process so you may need to be patient. You can do this via SharePoint Mobile app if you are uploading mobile photos.

Second, from Power BI desktop, connect to the document folder using the SharePoint Online List connector. By doing so, you’ll get access to the correct column that contains the EXIF data. Once in the dataset, you can use Power Query M to parse the data and start analyzing.

Demo

In this video, I’ll show you how to access the EXIF data and what you can do with the data.

Here’s the Power Query M Code

let
Source = SharePoint.Tables(“https://YourInstance/YourSite”, [ApiVersion = 15]),
#”8d19b1eb-42b8-4843-b721-fc1e8ef47688″ = Source{[Id=”88888888-8888-8888-8888-888888888888″]}[Items],
#”Renamed Columns” = Table.RenameColumns(#”8d19b1eb-42b8-4843-b721-fc1e8ef47688″,{{“ID”, “ID.1”}}),
#”Filtered Rows” = Table.SelectRows(#”Renamed Columns”, each [MediaServiceAutoTags] <> null and [Created] >= #datetime(2019, 4, 19, 0, 0, 0)),
#”Expanded FirstUniqueAncestorSecurableObject” = Table.ExpandRecordColumn(#”Filtered Rows”, “FirstUniqueAncestorSecurableObject”, {“Url”}, {“FirstUniqueAncestorSecurableObject.Url”}),
#”Removed Other Columns” = Table.SelectColumns(#”Expanded FirstUniqueAncestorSecurableObject”,{“FirstUniqueAncestorSecurableObject.Url”, “File”, “Properties”}),
#”Expanded File” = Table.ExpandRecordColumn(#”Removed Other Columns”, “File”, {“Name”, “ServerRelativeUrl”, “TimeCreated”, “TimeLastModified”}, {“File.Name”, “File.ServerRelativeUrl”, “File.TimeCreated”, “File.TimeLastModified”}),
#”Merged Columns” = Table.CombineColumns(#”Expanded File”,{“FirstUniqueAncestorSecurableObject.Url”, “File.ServerRelativeUrl”},Combiner.CombineTextByDelimiter(“”, QuoteStyle.None),”Picture URL”),
#”Expanded Properties” = Table.ExpandRecordColumn(#”Merged Columns”, “Properties”, {“vti_mediaservicemetadata”}, {“Properties.vti_mediaservicemetadata”}),
#”Parsed JSON” = Table.TransformColumns(#”Expanded Properties”,{{“Properties.vti_mediaservicemetadata”, Json.Document}}),
#”Expanded Properties.vti_mediaservicemetadata” = Table.ExpandRecordColumn(#”Parsed JSON”, “Properties.vti_mediaservicemetadata”, {“address”, “location”}, {“Properties.vti_mediaservicemetadata.address”, “Properties.vti_mediaservicemetadata.location”}),
#”Expanded Properties.vti_mediaservicemetadata.location” = Table.ExpandRecordColumn(#”Expanded Properties.vti_mediaservicemetadata”, “Properties.vti_mediaservicemetadata.location”, {“altitude”, “latitude”, “longitude”}, {“Properties.vti_mediaservicemetadata.location.altitude”, “Properties.vti_mediaservicemetadata.location.latitude”, “Properties.vti_mediaservicemetadata.location.longitude”}),
#”Expanded Properties.vti_mediaservicemetadata.address” = Table.ExpandRecordColumn(#”Expanded Properties.vti_mediaservicemetadata.location”, “Properties.vti_mediaservicemetadata.address”, {“City”, “State”, “Country”, “EnglishAddress”}, {“Properties.vti_mediaservicemetadata.address.City”, “Properties.vti_mediaservicemetadata.address.State”, “Properties.vti_mediaservicemetadata.address.Country”, “Properties.vti_mediaservicemetadata.address.EnglishAddress”}),
#”Changed Type” = Table.TransformColumnTypes(#”Expanded Properties.vti_mediaservicemetadata.address”,{{“File.Name”, type text}, {“Picture URL”, type text}, {“File.TimeCreated”, type datetime}, {“File.TimeLastModified”, type datetime}, {“Properties.vti_mediaservicemetadata.address.City”, type text}, {“Properties.vti_mediaservicemetadata.address.State”, type text}, {“Properties.vti_mediaservicemetadata.address.Country”, type text}, {“Properties.vti_mediaservicemetadata.address.EnglishAddress”, type text}, {“Properties.vti_mediaservicemetadata.location.altitude”, type number}, {“Properties.vti_mediaservicemetadata.location.latitude”, type number}, {“Properties.vti_mediaservicemetadata.location.longitude”, type number}})
in
#”Changed Type”

Analyzing Office 365 OCR Data using Power BI

I’m so excited to see Optical Character Recognition (OCR) working natively now in Office 365! I got my start in government where we scanned a lot of documents. There was a lot of data locked away in those images but no easy way to mine it. OCR was explored as a possible solution, but it was still in its infancy and not very accurate or scalable.

Fast forward to today and now OCR is simply part of our cloud infrastructure and with the assistance of Machine Learning, very accurate. The dream of finally unlocking that data is here! Or is it? Read on to find out more.

By the way, we’ll be covering this and more in our session at The SharePoint Conference, May 21-23 in Las Vegas! Use discount code GATTE to get a discount. https://sharepointna.com

Intelligent Search and Unlocking Your Image Data

SharePoint’s ability to do native OCR processes was first shown at Ignite 2017. There, Naomi Moneypenny did a presentation on Personalized intelligent search across Microsoft 365, where you saw Office 365 OCR in action. https://techcommunity.microsoft.com/t5/Microsoft-Search-Blog/Find-what-you-want-discover-what-you-need-with-personalized/ba-p/109590

She uploaded an image of a receipt and was able to search for it, based on the contents of the receipt image. It was a seamless demo of how Office 365 can intelligently mine the data in image files.

Now, let’s see if we can access the OCR data across multiple files and do something with it.

In the following procedure, I’ll show you how to connect to Office 365 and get to the data that the OCR process returns. In the following post, I’ll show you how to process receipt data to get the total amount spent, solely from the OCR data.

Process Overview

There are three steps to the process to start mining your OCR data. First, you have to add image content that contains text to a SharePoint folder.

The process of getting OCR data

Finding OCR Data

The OCR process that runs against the files in a SharePoint document folder are called Media Services. All derived data is stored in columns that contain Media Services in them.

Unfortunately, I’ve discovered that this feature has not been implemented consistently across the Shared Documents folder, custom folders and OneDrive. There is good news in that there’s a less obvious way to get to the data consistently across all three, using Properties. As shown below, you see the normal column names and where they appear. Only the ones in Properties appear consistently across all. We are only going to cover the basic information but the Properties collection has a lot more data in which to consume.

Audit of which media service fields are available where in Office 365

Adding Image Content to a SharePoint Document Folder

When you upload an image to a SharePoint document folder in Office 365, the OCR process kicks off automatically. I’ve had it take up to 15 minutes but the OCR process will analyze the image for text and return the text in a field called MediaServiceOCR if present and always in Properties.vti_mediaserviceocr.

These columns contain any text that was recognized in the graphics file. The current structure of the returned data is a bit different that what is in the source image. Each instance of the discovered text is returned on a separate line, using a Line Feed character as a delimiter. For example, if you had a two-column table of Term and Meaning, it would return the data like this:

Term

Meaning

Term

Meaning

Original uploaded image
Data returned by media services

While it’s great you can get to the data, the current returned format makes it exceptionally complex to reconstitute the context of the data. Also, the more complex your layout, the more “interesting” your transformations may need to be. I’d strongly recommend this post (https://eriksvensen.wordpress.com/2018/03/06/extraction-of-number-or-text-from-a-column-with-both-text-and-number-powerquery-powerbi/) and this post (https://blog.crossjoin.co.uk/2017/12/14/removing-punctuation-from-text-with-the-text-select-m-function-in-power-bi-power-query-excel-gettransform/ ) to give you the basics of text parsing in Power Query M.

Accessing the OCR Data in Power BI

The OCR columns are item level columns. The normal tendency would be to connect to your SharePoint site using the Power BI SharePoint Folder connector. You’ll be disappointed to find that the Media Services columns aren’t there.

Instead, connect to the document folder using the SharePoint Online List connector. By doing so, you’ll get access to the Media Services columns. Once in the dataset, you can use Power Query M to parse the data and start analyzing.

Demo

Let’s walk through how to access the data and manipulate it using Power BI. In this scenario, I have two receipts that have been uploaded in a document folder and I’m going to get the total spent on these receipts by analyzing the OCR data.

What about OneDrive for Business?

Yes, it works there too! The Media Service property fields are here as well. In fact, you get more information in an additional column called MediaServicesLocation. Based on my usage, it seems to be specifically populated for image files. If the image contains EXIF data, the MediaServicesLocation will contain the Country, State/Province, and City information of where it was created. Within the Properties collection, you can actually get more detailed information about the photo, like the type of camera that took it and more.

To connect to OneDrive where this will work, you need your OneDrive URL. I normally right-click on the OneDrive base folder in File Explorer and select View Online, as shown below.

Select View Online to get to the OneDrive url needed for Power BI connection

Potential for GDPR Issues

One aspect to consider if you look to do this is a production manner in Europe is that you will likely encounter information that falls under GDPR regulation. Consider this your prompt to think about how this capability would fit into your overall GDPR strategy.

Want a copy of the Power BI model?

Fill out the form below and it will emailed to you automatically, by virtue of the magic of Office Forms, Flow, and Exchange!

I hope you liked this post. If you have comments or questions, post them below in the comments.

New SharePoint Modern Power BI Theme

If you are creating Power BI content that you are embedding into a SharePoint Modern page, you know that the default Power BI theme colors don’t match SharePoint’s colors. Not to worry, we’ve got you covered. 

Attached to this post and also cross-posted to the https://Community.PowerBI.com Theme Gallery is the Tumble Road SharePoint Modern theme. This theme uses the core Modern experience colors in SharePoint, assuring your Power BI content will look great when embedded within SharePoint.

Download the zip file here.

Power BI Dashboards in One Day for SharePoint / Office 365

Want to learn how to build this kind of dashboard over SharePoint data?”

Yes you can! The big difference between our class and ones you’ll find elsewhere is that we go beyond the tool knowledge to show you:

  • How should I approach my BI need?
  • What things should be considered when creating dashboards?
  • What are the best practices?
  • How much is this going to cost me to license?
  • What security aspects should concern me?

Why you should take this course

To save time and money, immediately and in the long run. Since we’ve been in your shoes before, we also discuss topics related to maintenance, implementation, security and design best practices.

According to Forrester, half of a department’s annual reporting needs are ad hoc, which works out to be about 50 reports a year. They also added that the fully loaded cost of an individual report is $3200-$6100 per report. Each report can eat up up to 32 hours of time to develop and test. That’s hundreds of thousands of dollars spent in throwaway report investment.

Learn a faster, cheaper way to explore your data. Power BI is well suited to enable quick ad hoc reporting where you spend the majority of the time developing the data set, rather than the individual report. This helps you save money and reduces the cost of developing ad hoc reports.

Who should take this course

If your company uses Office 365 or SharePoint 2013/2016 and you are a:

  • Manager
  • Business Analyst
  • Power User
  • SharePoint Administrator
  • Consultant

You need this class if you are expected to use data from Office 365 and SharePoint to do your job effectively. The class assumes no prior knowledge of Power BI and Data Concepts

What is required for this class

  • Install Microsoft Power BI Desktop
  • We’ll provide a Office 365 site for you to use for exercises

What you will learn

In this training course you will learn how to create Production Dashboards using Power BI, from SharePoint list and document library data.

You will learn how to use Power BI in a multitude of situations, including ad hoc analysis and the creation of formal dashboard. You will learn about the Power BI components: Power BI Desktop, PowerBI.com, Power BI mobile applications and how they can be used with Office 365 and SharePoint

You’ll also receive an introduction into the core functions of Power BI; Data extraction, loading and transformation using Power Query Formula Language (“M”) and DAX. You’ll receive some guidelines on how to extract Project data in fast manner.

You will discover some data modeling practices that will ensure you have maximum flexibility in analysis. You’ll also learn some visualization best practices to ensure you can tell your digital story effectively.

You’ll learn best practices for maintaining content with your organization. This course provides an end to end view of Power BI for Project reporting, so that you are able to use Power BI immediately for your needs.

What you will get

You’ll get the tools to immediately get started on your design. You’ll receive:

  • Design spec that prompts the asking of the right questions
  • PowerPoint-based layout templates for dashboard paper prototyping
  • Tried and true BI design approach

5 critical value-adds you will take back to your company

  • You’ll have a jump start on a standard BI development approach
  • You’ll deliver new insights into your data.
  • You’ll have a great understanding of how Power BI can be used and implemented
  • You’ll learn techniques to make your dashboards perform well with SharePoint data
  • You’ll gain insight into how others are leveraging Power BI within their companies.

How will the class be conducted?

The class will be conducted live in an 8 hour session over Skype for Business, from 7 AM – 11 AM and 12 PM to 4 PM Pacific Time, with a lunch break from 11 AM -12 PM and other breaks during the day.

The sessions will occur on Wendesday, September 28, 2016.

This class will be recorded and made available, in case you miss part of the class or are unable to attend live.

What’s the cost?

Only $199 per person to learn how to unlock the power of your data.

Questions?

Contact us directly at info@tumbleroad.com.

Curriculum

  • Introduction to Power BI
  • Getting Data from SharePoint
  • Using Power Query M for Data Retrieval and Transformation
  • Data Modeling with DAX
  • Data Visualization Techniques
  • Content Administration and Deployment
  • Licensing and Planning Considerations

The True Cost of the Cubicle Hero

Heroes. Society loves them, honors them and exults them. Corporate offices are filled with a new breed of hero, the Cubicle Hero. These are the people who go beyond the norm and figure it out. They burn the midnight oil and they get it done. They overcome the chaos and reach the goal. All hail the hero!

However, heroes tend to overstay their welcome. In the movie, “The Dark Knight Rises”, character Harvey Dent intones, “You either die the hero or you live long enough to see yourself become the villain.” The Cubicle Hero’s individual victory is celebrated initially, but situations change and the need for the hero diminishes over time. Or so we hope.

Cubicle heroes can become process bottlenecks and productivity killers. Why? The organization’s reward structure doesn’t lead them to being mentors. The cubicle hero has great value to the organization but their way of working can’t scale and the lack of information sharing prevents the organization from truly benefiting from their victory. The hero then gets involved in every project that touches their area and becomes the bottleneck as the demand for their time is greater than what is available. Thus, the hero slowly becomes the villain, delaying projects.

Many years ago, I worked at a company where a core process of the company was dependent on a very skilled hero. He was a great employee and did his job earnestly. However, he also guarded his knowledge so that he was the only one who understood it completely. This became a serious company concern when he was involved in an accident, leaving him unable to work for several months. Several key projects were impacted.

Changing the perspective, expectations and language of what happens as part of these efforts can lead to a different outcome. We need to make it clear that we want and need Corporate Explorers rather than Cubicle Heroes. Leif Erickson, the Viking, may have been the first to reach North America on a heroic journey, but it was the explorer, Columbus, that opened up North America to the world.

Explorers and Heroes share many common traits. They can see the big picture. They can dig down into the details when needed. They put in the extra effort to get the job done. The real difference is in the aftermath. Explorers open new trails so that others may come behind them. Explorers become guides to help others make the same journey. Heroes, on the other hand, continue to hold onto their conquest.

Changing your company culture to encourage Explorers over Heroes creates a scalable culture of knowledge sharing. This organizational approach leads to greater productivity, higher quality collaboration and timelier project progress.

To summarize, I recommend reviewing the following in your organization.

  • Provide a clear path to success for as many as possible to the rewards for exceptional effort, in a way that others and ultimately the organization can leverage
  • Provide public recognition for knowledge sharing
  • Structure rewards, within the process, so we can move from the mentality of one time hero-creation to our true goal of constant productivity improvement
  • Provide the Explorer with opportunities to help facilitate and implement their achievement within the organization. This keeps the Explorer engaged and looking for additional ways to improve
  • Provide collaborative tools like Office 365 and Yammer to help facilitate and support the Explorer’s journey

If you are ready to address more productivity issues in your organization, talk to us or join our Community.