Are you accidentally exposing confidential data with Microsoft Power BI?

Are you accidentally exposing confidential data with Microsoft Power BI? You may be if you are misusing the Publish to Web functionality.

What is Publish to Web?

Publish to Web is a function that enables you to share your Microsoft Power BI content with the world. For example, a government agency could use it to publish the latest Zika virus statistics, or a company could use it to showcase its charitable efforts. You can see the details here.

This Microsoft Power BI function creates an embed code that you can put into most websites to render the reports as needed. One thing that happens under the covers is that all security validation for accessing the content and its detailed data is stripped away. It has to be, in order to enable anonymous access for internet users.

So what’s the problem?

Some folks have published their content using this function, either to avoid buying Microsoft Power BI licenses or to embed Microsoft Power BI content in an on-premises site like Microsoft SharePoint 2013.

The risk is that if the content is on a page that gets indexed by a major search engine, like Google, the embed code will likely live in Google’s index forever. Then anyone can search for your data.

Really, I need licenses for all data consumers?

Yes, you do. Microsoft Power BI requires both the content creator and the consumer to be licensed in some fashion. This is true even if you embed the report in Microsoft SharePoint or Microsoft Teams. If you attempt to avoid it using some automated means, you could be creating a multiplexing licensing violation with Microsoft, which can be very expensive to resolve.

There are three ways to license Microsoft Power BI. You can license individuals using Microsoft Power BI Pro, which is also included in the Microsoft Office 365 E5 license. You can license capacity using Microsoft Power BI Premium, which covers all consumers without having to license them individually. Or you can license by usage, using Microsoft Power BI Embedded.

To justify the Pro licensing expense, you have to deliver more than $120 of value per person per year. Premium or Embedded may lower this bar further. These are very low bars.

Using Microsoft Power BI to automate one manual-effort-heavy copy/paste operation usually crosses this value bar. We did a one-week effort that saved 120 hours per week, eliminating the need to hire more people. At an internal rate of $65/hour, that is 120 hours x 52 weeks x $65 = $405,600 of time freed up each year for higher-value activities. Licensing was $8,400 for the year, so this project returned a 14x benefit for the cost.

The alternative is to risk a data leak, as some folks are doing. The legal cost of a leak will greatly overshadow any license cost for Power BI.

I still need to embed so now what?

Please use the secure embed option if you are on-premises or using a non-Office 365 platform. You can read more about it here. If you need to embed Microsoft Power BI content in Microsoft Office 365, use the new Power BI web part for Modern pages. You can read about it here.

Power BI Administrator Recommendations

We strongly recommend that you either disallow Publish to Web or restrict its use to specific report creators. Leaving it open to all report creators could lead to accidental use, resulting in a costly data leak.

The easiest method is to manage this with an Active Directory group that has the ability to Publish to Web. You can then add and remove members as needed.

You should also consider doing the following:

  • Review the Publish to Web tenant setting documentation here.
  • Review all embed codes in use within your tenant by referring to this doc.
  • Monitor the “PublishToWebReport” audit log event and set up an alert so you can review newly published embed codes for confidentiality.

Let us know what you think in the comments below.

CoP TIP: Prevent column changes from breaking your Power BI model

This is a post from our Managed Community of Practice service that supports clients in building a data culture. If you need help for your organization, contact us at hello@marqueeinsights.com to discuss.

Who Broke My Model?

Don’t you hate it? You have a Power BI model pulling data from many sources, like SharePoint and a few Excel spreadsheets. You come in one morning and see a Teams message that your Power BI model isn’t updating! Oh no, what now?

You look in the Power BI service and see that there’s an error on the refresh. You sigh as, once again, someone has renamed or deleted a column in the data source.

Business is constantly changing, so this shouldn’t be such an issue, but it is. Power BI refers to all columns by name, so the default behavior leads to broken refreshes. And with Power BI being rolled out everywhere in the org, the problem is growing quickly.

Models Should Adapt to Changing Data

In a perfect world, Power BI handles schema changes without blowing up. The column that is renamed or removed is simply returned as null values, giving you time to figure out where it is used and make changes accordingly.

Thankfully, that world is here today. Let’s review how to make that happen.

Here’s How to Make Power Query M Adapt Automatically

The key is in how you specify which columns to include in your Power Query. As discussed in your Marquee Insights Power BI class, this selection should be done immediately after the Navigation step because it generally folds back to the data source. With this technique, the order of that step matters even more.

It’s important to select your columns first, after navigation, for this to work.

Normally, when you select your columns, you specify the following Power Query M statement, usually through the UI action:

= Table.SelectColumns(Table_Name, {"Field1", "Field2", "Field3"})

If Field3 is no longer returned by the data source, the query errors out. Why is that? Well, the Table.SelectColumns function takes three parameters, and the third parameter controls what action to take when a field is missing.

= Table.SelectColumns(Table_Name, {"Field1", "Field2", "Field3"}, ActionToTakeWhenFieldIsMissing)

There are four potential values for this third parameter.

  • MissingField.Error
  • MissingField.Ignore
  • MissingField.UseNull
  • MissingField.Type

MissingField.Error is the default, so when a column is missing, an error is thrown and the column is not returned as part of the dataset. The MissingField.Ignore option doesn’t throw the error, but it also doesn’t return the missing column. Unfortunately, both of these options will break any reports that use the missing column.

Thankfully, there’s MissingField.UseNull. This option is the key to making your queries resilient to changes.  If a specified column is not found in the underlying data, it returns the column as a column of null values. The benefit is that your reports will continue to work, and it allows you to track down where the column is being used.
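To make this concrete, here is a minimal sketch of a query that selects its columns immediately after the Navigation step and passes MissingField.UseNull. The feed URL, step names, and column names are placeholders, not from a real model:

let
    // Hypothetical OData source; replace with your own feed
    Source = OData.Feed("https://YourInstance/sites/pwa/_api/ProjectData"),
    // Navigation step
    Projects = Source{[Name = "Projects", Signature = "table"]}[Data],
    // Select columns right after navigation; UseNull returns nulls instead of erroring
    // when a column has been renamed or removed at the source
    #"Selected Columns" = Table.SelectColumns(Projects, {"ProjectName", "ProjectStartDate", "ProjectFinishDate"}, MissingField.UseNull)
in
    #"Selected Columns"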

I’ve yet to find any documentation regarding MissingField.Type so that will be covered in a later blog post.

An Example

I’ll use the Projects OData feed from Project Online to illustrate the behavior. Project Online is an example of a product that allows administrators to define and change custom metadata easily. Consequently, model refresh failures are a common issue with Power BI and Project Online.

I’ve created a Project Custom Field called “A Test of Field Removal” in Project Online.

Set up of the Project level Custom Field in Project Online.

I’ve created a new query against the Projects OData feed and selected five columns, leaving the default MissingField.Error behavior in place. The values load normally, as shown below.

Select columns using the default MissingField.Error value.

Now, let’s delete the column from Project Online and refresh Power BI. The result is this error.

Deleting the source column yields an error.

If we change the MissingField parameter to MissingField.UseNull, we get the following result instead. Note that the missing column is still returned, but now as a column of nulls.

Using MissingField.UseNull, the column is returned with nulls, allowing use to continue.
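For reference, the only change to the generated step is the third argument. With hypothetical step and column names standing in for the five I selected, the edited step looks something like this:

= Table.SelectColumns(Projects, {"ProjectId", "ProjectName", "ProjectStartDate", "ProjectFinishDate", "A Test of Field Removal"}, MissingField.UseNull)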

That’s all there is to it. Simply ensure you are using the correct parameter and you’ll see fewer data model issues due to renamed or removed columns.

Business Intelligence is not just reporting

Whenever I start a conference presentation, I lead with “Business Intelligence is not just reporting!” This is a call to get new Power BI users to look more broadly. If all you are doing with Power BI is replicating the Excel spreadsheets you already have, you are missing out on significant value. If your users want their new dashboards to look like a spreadsheet, a conversation is warranted.

Let’s discuss three ways that Business Intelligence is so much more than reporting.

BI is your first step towards AI

A lot of companies today are intrigued by the potential of artificial intelligence. Some of them have started projects to implement AI. The issue that many companies will run into as they start the path toward AI is data quality. For AI to be successful, you must have good quality data to train the models. If you haven’t started down the BI path yet, chances are, your data quality is poor at best. Jumping into AI with poor data quality will lead to incorrect outcomes.

BI forces you to clean up your data and processes

Another positive outcome of BI projects is a rationalization of entity and business rule definitions. For example, in a recent project, the client wanted to monitor product profitability across the board. The issue was that the definition of a product was different based on the division producing it. Consequently, we had to find common data ground for such an analysis to be made.

In other projects, we encounter timing discrepancies, dependencies on manual effort for key data, and incomplete data. It’s very difficult to have useful near real-time analysis over data that is received once a month from Finance. These timing issues must be worked out across processes.


BI can eliminate a lot of hidden manual effort

Manual data gardening of spreadsheets is taking up a lot of time in organizations and creating potential problems. Do you really want to bet your career on the numbers within a manually maintained spreadsheet? The sad truth is that many managers do just that.

Also, the manual effort is robbing you of your most precious resource: time. At one organization, every one of their twenty project managers was spending six hours a week manually creating status reports. Products like Power BI and Flow can be used together to automate the collection and dissemination of data within the organization, freeing up time for more valuable work. A small project like this can easily justify the license expense of Power BI.

BI can set the stage for further successes

In the end, a successful BI journey creates fertile ground for a successful AI implementation. Ultimately, we will achieve Collaborative Intelligence (CI), where AI tools augment humans and make them more effective. Many of the most amazing uses of AI are focused on shortening or removing the learning curve, rather than replacing the human.

BI can’t do it alone

Lastly, one of the other challenges we see is the idea that simply bringing in a new BI tool will magically produce results. Time and again, organizations make software investments without the requisite investment in people. To get the highest benefit from a BI tool, your organizational culture must make data part of its day-to-day activity. Creating a data culture requires active investment and time.


The rise of Data Culture

One sign of a company that has achieved a data culture is that data has a “voice.” You’ll hear people ask in meetings, “what does the data say?” While upgrading software is easy, upgrading people’s habits is much harder. If you are starting the journey, you should consider how you will invest in training and in a community of practice to sustain the change long term. We’ll discuss data culture in more detail in a subsequent post.

We can help you

Without the accompanying culture change, your BI tool implementation may simply be the latest “change tsunami” to wash upon your corporate shore. These waves create massive chaos as they hit, disrupting people’s work habits. Then the project washes out after a few months, never to be heard from again, while the damage to your co-workers’ trust remains for a long time.

I hope you can avoid these issues. If you need help navigating the change, please reach out to us as we can help you.

Controlling Power BI Filter pane visibility with Bookmarks

I was working with some bookmarks in a Power BI model earlier when I noticed that in some cases the Filter pane was expanded and in other cases, it was collapsed. I was intrigued so I started investigating.

It seems that when you save a bookmark, it now retains the open/closed state of the Filter pane. I believe the Display option within the bookmark has to be set in order to control this. So, if you wish to show the Filter pane by default on a report, expand it in Power BI Desktop and update your bookmark. Collapse it and save the bookmark if you want it collapsed by default.

If you have a question or comment, post it below.

Analyzing Office 365 Photo EXIF data using Power BI

In an earlier post, I wrote about how to mine the OCR data from Office 365 document folders using Power BI. During the research for that post I discovered that the photo EXIF data is also available for analysis from Office 365. I’ll show you how to get to it in this post.

What is EXIF data anyway?

Think of EXIF as special metadata for photos; with today’s cameras, it records the camera settings as well as the location where the photo was taken, if the camera has GPS. You can read more about the format here: https://photographylife.com/what-is-exif-data  Many smartphones today automatically include EXIF data when you take photos. Office 365 automatically extracts this data from photos when they are uploaded to a document library.

Why do I care about this?

Imagine you work for a company that does inspections. Your inspectors already use their phones to take photos of issues. Wouldn’t it be great to show the photos by location, alongside related location data, on a report page? This technique allows you to easily mine that EXIF data.

How do I get to it?

First, upload your images to a SharePoint document folder in Office 365; the extraction process initiates automatically. I’ve had it take up to 15 minutes to process, so you may need to be patient. You can do this via the SharePoint mobile app if you are uploading photos from your phone.

Second, from Power BI Desktop, connect to the document folder using the SharePoint Online List connector. By doing so, you’ll get access to the column that contains the EXIF data. Once in the dataset, you can use Power Query M to parse the data and start analyzing.

Demo

In this video, I’ll show you how to access the EXIF data and what you can do with the data.

Here’s the Power Query M Code

let
    // Connect to the SharePoint site; replace the URL with your own
    Source = SharePoint.Tables("https://YourInstance/YourSite", [ApiVersion = 15]),
    // Navigate to the document library by its list Id
    #"8d19b1eb-42b8-4843-b721-fc1e8ef47688" = Source{[Id="88888888-8888-8888-8888-888888888888"]}[Items],
    #"Renamed Columns" = Table.RenameColumns(#"8d19b1eb-42b8-4843-b721-fc1e8ef47688",{{"ID", "ID.1"}}),
    // Keep only tagged items created after the cutoff date
    #"Filtered Rows" = Table.SelectRows(#"Renamed Columns", each [MediaServiceAutoTags] <> null and [Created] >= #datetime(2019, 4, 19, 0, 0, 0)),
    #"Expanded FirstUniqueAncestorSecurableObject" = Table.ExpandRecordColumn(#"Filtered Rows", "FirstUniqueAncestorSecurableObject", {"Url"}, {"FirstUniqueAncestorSecurableObject.Url"}),
    #"Removed Other Columns" = Table.SelectColumns(#"Expanded FirstUniqueAncestorSecurableObject",{"FirstUniqueAncestorSecurableObject.Url", "File", "Properties"}),
    #"Expanded File" = Table.ExpandRecordColumn(#"Removed Other Columns", "File", {"Name", "ServerRelativeUrl", "TimeCreated", "TimeLastModified"}, {"File.Name", "File.ServerRelativeUrl", "File.TimeCreated", "File.TimeLastModified"}),
    // Combine the site URL and the server-relative path into a full picture URL
    #"Merged Columns" = Table.CombineColumns(#"Expanded File",{"FirstUniqueAncestorSecurableObject.Url", "File.ServerRelativeUrl"},Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Picture URL"),
    // The EXIF data lives in the Properties record as JSON text
    #"Expanded Properties" = Table.ExpandRecordColumn(#"Merged Columns", "Properties", {"vti_mediaservicemetadata"}, {"Properties.vti_mediaservicemetadata"}),
    #"Parsed JSON" = Table.TransformColumns(#"Expanded Properties",{{"Properties.vti_mediaservicemetadata", Json.Document}}),
    // Expand the address and location records into individual columns
    #"Expanded Properties.vti_mediaservicemetadata" = Table.ExpandRecordColumn(#"Parsed JSON", "Properties.vti_mediaservicemetadata", {"address", "location"}, {"Properties.vti_mediaservicemetadata.address", "Properties.vti_mediaservicemetadata.location"}),
    #"Expanded Properties.vti_mediaservicemetadata.location" = Table.ExpandRecordColumn(#"Expanded Properties.vti_mediaservicemetadata", "Properties.vti_mediaservicemetadata.location", {"altitude", "latitude", "longitude"}, {"Properties.vti_mediaservicemetadata.location.altitude", "Properties.vti_mediaservicemetadata.location.latitude", "Properties.vti_mediaservicemetadata.location.longitude"}),
    #"Expanded Properties.vti_mediaservicemetadata.address" = Table.ExpandRecordColumn(#"Expanded Properties.vti_mediaservicemetadata.location", "Properties.vti_mediaservicemetadata.address", {"City", "State", "Country", "EnglishAddress"}, {"Properties.vti_mediaservicemetadata.address.City", "Properties.vti_mediaservicemetadata.address.State", "Properties.vti_mediaservicemetadata.address.Country", "Properties.vti_mediaservicemetadata.address.EnglishAddress"}),
    // Set data types for the columns used in the report
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Properties.vti_mediaservicemetadata.address",{{"File.Name", type text}, {"Picture URL", type text}, {"File.TimeCreated", type datetime}, {"File.TimeLastModified", type datetime}, {"Properties.vti_mediaservicemetadata.address.City", type text}, {"Properties.vti_mediaservicemetadata.address.State", type text}, {"Properties.vti_mediaservicemetadata.address.Country", type text}, {"Properties.vti_mediaservicemetadata.address.EnglishAddress", type text}, {"Properties.vti_mediaservicemetadata.location.altitude", type number}, {"Properties.vti_mediaservicemetadata.location.latitude", type number}, {"Properties.vti_mediaservicemetadata.location.longitude", type number}})
in
    #"Changed Type"

Useless Solutions: Windows 3.1 Hot Dog Stand Theme – Power BI style

On occasion, we’ll post useless solutions like this one; they may not be directly useful in production, but they have educational value and can lead to useful work.

Oh goodness, what have you done?

This post originally started as a joke at dinner with Adam Saxton of Guy in a Cube and Jason Himmelstein of the BIFocal podcast. It was worth it to see the looks on their faces when I mentioned I was bringing the Hot Dog Stand theme to Power BI. And now I have.

What is “Hot Dog Stand?”

For those who have no idea what this is about: Windows 3.1 shipped with a theme called Hot Dog Stand. Coding Horror has a nice write-up with screenshots here. It was hideous, and no one knew why it was there. It became a joke to change someone’s theme to Hot Dog Stand if they left their workstation unlocked. Hot Dog Stand never made it into Windows 95 and faded, for the most part, into history.

An opportunity arose

As luck and work would have it, clients were asking for custom themes, and a deep dive into the Power BI theme format was necessary. Hence the idea to combine wacky fun with learning the ins and outs of theme JSON descriptors by recreating Hot Dog Stand.

Getting Started

I started where most people likely start: the Microsoft docs on Power BI themes here. It’s a helpful document, but wow, there’s a lot of potential coding to do. I needed to get rolling more quickly.

Cool, a theme generator

Next stop was the PowerBI.Tips theme generator. This tool is fairly easy to use for generating a quick theme file. It creates the JSON but has limited options for what can and can’t be changed. The results were okay, but I wasn’t feeling the serious ugly of Hot Dog Stand yet.

Even cooler, a GitHub repository!

After some web searches, I came across this GitHub repository of Power BI theme snippets. David Eldersveld put this repository together, probably for the same reasons that brought me here. I needed to do more customization, but I didn’t want to (and wasn’t able to) hand-code all of the particulars.

The free VS Code made this pretty easy to assemble. You will likely want to use a JSON formatter as you are doing this. VS Code may have one, but in the interest of moving fast, I found this site that did the job well.


One tip: if you are going to merge many of the snippets into a theme file, ignore the first three and last two lines of every snippet. Otherwise, you’ll get errors when importing it.

The Result

To reiterate, Hot Dog Stand started as a theme generator generated file that I edited in VS Code and augmented with snippets from GitHub. The result is this.

Isn’t it hideous?

If you would like a copy of the theme file, to do your own terrible things with, download it here. If you have questions or comments, please post them below. Check out our Twitter feed at @tumbleroad.

The easiest way to do small-multiples in Power BI!

Recently, there was a post on the Power BI blog, “How to do multivariate reporting with Power BI,” about what you may know as “small multiples.” Small multiples allow you to tell a big story via a chorus of little stories that would otherwise be obscured by the large number of plotted points. There are many uses of this technique in the data visualization literature. If you are unfamiliar with small multiples, please read this article as a primer.

In October 2016, Microsoft Research released the Infographics Designer custom visual for Power BI. Much of the attention was focused on how you can use graphics to create custom charts. However, buried in all of this was the ability to create small-multiples easily, using the Infographics Designer.

The Infographics Designer allows for four types of small-multiple visualizations.

  • Cards
  • Columns
  • Bars
  • Lines

In the video below, I’ll demonstrate to you how to use the Infographics Designer to create multiple types of small multiples.

If you have comments or questions, please post them below!

Analyzing Office 365 OCR Data using Power BI

I’m so excited to see Optical Character Recognition (OCR) working natively now in Office 365! I got my start in government where we scanned a lot of documents. There was a lot of data locked away in those images but no easy way to mine it. OCR was explored as a possible solution, but it was still in its infancy and not very accurate or scalable.

Fast forward to today and now OCR is simply part of our cloud infrastructure and with the assistance of Machine Learning, very accurate. The dream of finally unlocking that data is here! Or is it? Read on to find out more.

By the way, we’ll be covering this and more in our session at the SharePoint Conference, May 21-23 in Las Vegas! Use code GATTE to get a discount. https://sharepointna.com

Intelligent Search and Unlocking Your Image Data

SharePoint’s ability to do native OCR processing was first shown at Ignite 2017. There, Naomi Moneypenny did a presentation on personalized intelligent search across Microsoft 365, where you could see Office 365 OCR in action. https://techcommunity.microsoft.com/t5/Microsoft-Search-Blog/Find-what-you-want-discover-what-you-need-with-personalized/ba-p/109590

She uploaded an image of a receipt and was able to search for it, based on the contents of the receipt image. It was a seamless demo of how Office 365 can intelligently mine the data in image files.

Now, let’s see if we can access the OCR data across multiple files and do something with it.

In the following procedure, I’ll show you how to connect to Office 365 and get to the data that the OCR process returns. In a follow-up post, I’ll show you how to process receipt data to get the total amount spent, solely from the OCR data.

Process Overview

There are three steps to start mining your OCR data: add image content that contains text to a SharePoint folder, wait for the Media Services processing to complete, and then connect to the resulting columns from Power BI.

The process of getting OCR data

Finding OCR Data

The OCR process that runs against the files in a SharePoint document folder is called Media Services. All derived data is stored in columns whose names contain “MediaService.”

Unfortunately, I’ve discovered that this feature has not been implemented consistently across the Shared Documents folder, custom folders, and OneDrive. The good news is that there’s a less obvious way to get to the data consistently across all three, using Properties. As shown below, you can see the normal column names and where they appear. Only the ones in Properties appear consistently across all three locations. We are only going to cover the basic information here, but the Properties collection has a lot more data to consume.

Audit of which media service fields are available where in Office 365

Adding Image Content to a SharePoint Document Folder

When you upload an image to a SharePoint document folder in Office 365, the OCR process kicks off automatically. I’ve had it take up to 15 minutes. The process analyzes the image for text and returns that text in a field called MediaServiceOCR, when present, and always in Properties.vti_mediaserviceocr.

These columns contain any text that was recognized in the graphics file. The current structure of the returned data is a bit different from what is in the source image. Each instance of the discovered text is returned on a separate line, using a line feed character as a delimiter. For example, if you had a two-column table of Term and Meaning, it would return the data like this:

Term

Meaning

Term

Meaning

Original uploaded image
Data returned by media services

While it’s great you can get to the data, the current returned format makes it exceptionally complex to reconstitute the context of the data. Also, the more complex your layout, the more “interesting” your transformations may need to be. I’d strongly recommend this post (https://eriksvensen.wordpress.com/2018/03/06/extraction-of-number-or-text-from-a-column-with-both-text-and-number-powerquery-powerbi/) and this post (https://blog.crossjoin.co.uk/2017/12/14/removing-punctuation-from-text-with-the-text-select-m-function-in-power-bi-power-query-excel-gettransform/ ) to give you the basics of text parsing in Power Query M.
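As a starting point, here’s a minimal Power Query M sketch that splits the line-feed-delimited OCR text into one row per discovered line. The step and column names are hypothetical and assume the Properties record has already been expanded, as in the EXIF query elsewhere on this blog:

// Split the OCR text on line feeds, then expand each discovered line into its own row
#"Split OCR Text" = Table.TransformColumns(#"Expanded Properties", {{"Properties.vti_mediaserviceocr", each Text.Split(_, "#(lf)"), type list}}),
#"One Row Per Line" = Table.ExpandListColumn(#"Split OCR Text", "Properties.vti_mediaserviceocr")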

Accessing the OCR Data in Power BI

The OCR columns are item-level columns. The natural tendency would be to connect to your SharePoint site using the Power BI SharePoint Folder connector, but you’ll be disappointed to find that the Media Services columns aren’t there.

Instead, connect to the document folder using the SharePoint Online List connector. By doing so, you’ll get access to the Media Services columns. Once in the dataset, you can use Power Query M to parse the data and start analyzing.
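A minimal connection sketch might look like the following. The site URL and library title are placeholders, and the filter assumes the Properties pattern described above:

let
    // The SharePoint Online List connector exposes the Media Services columns
    Source = SharePoint.Tables("https://YourInstance/YourSite", [ApiVersion = 15]),
    Documents = Source{[Title = "Documents"]}[Items],
    // Keep only items where the OCR property was populated
    #"Has OCR Text" = Table.SelectRows(Documents, each [Properties] <> null and Record.FieldOrDefault([Properties], "vti_mediaserviceocr", null) <> null)
in
    #"Has OCR Text"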

Demo

Let’s walk through how to access the data and manipulate it using Power BI. In this scenario, I have two receipts that have been uploaded in a document folder and I’m going to get the total spent on these receipts by analyzing the OCR data.

What about OneDrive for Business?

Yes, it works there too! The Media Services property fields are there as well. In fact, you get more information in an additional column called MediaServicesLocation. Based on my usage, it seems to be populated specifically for image files. If the image contains EXIF data, MediaServicesLocation will contain the country, state/province, and city where it was taken. Within the Properties collection, you can get even more detailed information about the photo, like the type of camera that took it.

To connect to OneDrive for Business, you need your OneDrive URL. I normally right-click the OneDrive base folder in File Explorer and select View Online, as shown below.

Select View Online to get to the OneDrive url needed for Power BI connection
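Once you have the URL, the Power Query M is the same as for a SharePoint site; only the URL in the Source step changes. For example (placeholder URL):

Source = SharePoint.Tables("https://YourTenant-my.sharepoint.com/personal/your_name_yourtenant_com", [ApiVersion = 15])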

Potential for GDPR Issues

One aspect to consider, if you plan to do this in a production manner in Europe, is that you will likely encounter information that falls under GDPR regulation. Consider this your prompt to think about how this capability would fit into your overall GDPR strategy.

Want a copy of the Power BI model?

Fill out the form below and it will be emailed to you automatically, by virtue of the magic of Office Forms, Flow, and Exchange!

I hope you liked this post. If you have comments or questions, post them below in the comments.

In Data We Trust? Part Two

Part 2 of this series explores the difficulty of determining whether your business intelligence content is using authentic data. To illustrate the point, let’s examine a recent Seattle Times article about the measles outbreak in Washington.

An Example

The article in question, “Are measles a risk at your kid’s school? Explore vaccination-exemption data with our new tool,” presents a story filled with data charts and tables and draws some conclusions about the situation. Many internal reports and dashboards do the same, presenting data and conclusions. Unlike internal reports, newspapers list the source and assumptions in small print at the end of the story. Knowing the data comes from an official source adds authenticity.

The following note is supposed to increase authenticity.

“Note: Schools with fewer than 10 students were excluded. Schools that hadn’t reported their vaccination data to the Department of Health were also excluded.

Source: Washington Department of Health (2017-18)”

But does it really? Documenting exclusions and noting sources is a good practice. However, the note is not very prominent, and if you search for this data yourself, you’ll likely find several candidate links. There’s no direct link or contact information.

Data authenticity is crucial to making successful decisions. To establish it, key data questions should be answered.

What data was used?

Many content creators don’t bother to document the source of their information. Viewers would not have the same level of confidence in a new financial dashboard if they knew the data came from a manually manipulated spreadsheet instead of directly from the finance system. How would the reader know anyway? In many cases, they wouldn’t. The Seattle Times provided a hint, but more is needed.

When you buy an item like wine, you know what you are buying because the label spells it out. A wine bottle is required to have a label with standard data elements to ensure we know what we are buying. For example, a US wine label must list the type of grape used to make the wine. Even red blends must list the varietals and percentages so that the customer is clear on what is in the bottle. Having the equivalent type of labeling would improve transparency about data authenticity.

Who owns the data we are consuming?

This is very important, especially if we spot incorrect or missing data: who do we contact? The Seattle Times lists the Washington Department of Health as the data owner. This is a good starting point but doesn’t completely fill the need. For internal reports, every data source should include an owning team name and a contact email. The data vintage example below also includes the site URLs and a contact email.

Data Vintage Example

How old is the data?

It’s one thing to know the last time the data was pulled from the source, but that’s not the whole need. Data age strongly influences whether it can be used to make a decision. In our Marquee™ products, we include a data freshness indicator that shows proportionally how much of the data has been updated recently. “Recently” becomes a business rule defining what constitutes fresh data; for some companies, an entity must have been updated within the last seven days to be considered fresh.

Data Freshness indicator for time dependent data.
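As a rough illustration of the idea (not our product implementation), a Power Query M step like the following could compute the share of rows whose modified date falls inside a seven-day freshness window. The table and column names are hypothetical:

// Hypothetical: fraction of rows modified within the last seven days
FreshShare = List.Count(List.Select(SourceTable[Modified], each _ >= Date.AddDays(DateTime.LocalNow(), -7))) / Table.RowCount(SourceTable)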

How to address this?

We took the liberty of creating a Power BI model that analyzes the same immunization data used in the Seattle Times story. We’ll use this model to illustrate the technique. The following steps were performed to enable a simple “data vintage” page.

Procedure

  1. Create a Data Vintage page (you may need more than one, depending on how many datasets and sources you have).
  2. Add a back button to the page. We put ours in the upper left corner.
  3. Add the following information to the page using a consistent format that you’ve decided upon:
    • Name of the dataset
    • From where the data is sourced and how often
    • Which team owns the data
    • How to contact the data owner, if available
  4. Create a Data Vintage bookmark for the data vintage page so that it can be navigated to via a button.
  5. Go back to the report page that you created from this data.
  6. Add an Information button to the upper right corner of the page.
  7. Select the button and navigate to the Visualization blade.
  8. Turn on Action.
  9. Set Type to Bookmark.
  10. Set the Bookmark to the one you created in Step 4.
  11. Ctrl + Click the Information button to test.
  12. Ctrl + Click the Back button to test.
That’s it. Anytime a user or fellow Power BI author has a question about the underlying model data, it can be accessed very easily. You’ll also improve impressions of data authenticity by implementing this label consistently across all your content.

A Working Example

We’ve created a different analysis of the Washington State Immunization exemption data, where we also added a data vintage page. You can try it out below. Click the i Information button in the upper right of the screen to display the data vintage.

In Part 3, we’ll examine the problem of data integrity and how you can be sure your data has implemented the proper business rules for your organization.

Have a question or comment? Feel free to post a comment below.

New SharePoint Modern Power BI Theme

If you are creating Power BI content that you are embedding into a SharePoint Modern page, you know that the default Power BI theme colors don’t match SharePoint’s colors. Not to worry, we’ve got you covered. 

Attached to this post, and also cross-posted to the https://Community.PowerBI.com Theme Gallery, is the Tumble Road SharePoint Modern theme. This theme uses the core Modern experience colors in SharePoint, ensuring your Power BI content will look great when embedded within SharePoint.

Download the zip file here.