In Data We Trust? Part Two
Part 2 of this series explores the difficulty of determining whether your business intelligence content is using authentic data. To illustrate the point, let’s examine a recent Seattle Times article about the measles outbreak in Washington.
The article in question, “Are measles a risk at your kid’s school? Explore vaccination-exemption data with our new tool,” presents a story filled with data charts and tables and draws some conclusions about the situation. Many internal reports and dashboards do the same, presenting data and conclusions. Unlike internal reports, newspapers list the source and assumptions in small print at the end of the story. Knowing the data comes from an official source adds authenticity.
The following note is supposed to increase authenticity.
“Note: Schools with fewer than 10 students were excluded. Schools that hadn’t reported their vaccination data to the Department of Health were also excluded.
Source: Washington Department of Health (2017-18)”
But does it really? Documenting exclusions and noting sources is good practice. However, the note is not very prominent, and it includes no link or contact information. If you search for this data yourself, you’ll likely find several links, and nothing in the note tells you which one was used.
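One way to make an exclusion note like the one quoted above easier to trust is to generate it from the same rules that filter the data, so the counts travel with the report. The following is a minimal Python sketch of that idea. The `School` records, the field names, and the `apply_exclusions` helper are hypothetical and are not taken from the Seattle Times tool or the Department of Health dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class School:
    name: str
    enrollment: int
    exempt_count: Optional[int]  # None means the school did not report


def apply_exclusions(schools, min_enrollment=10):
    """Split schools into kept rows and a count of rows dropped per rule."""
    kept = []
    dropped = {"fewer_than_min_students": 0, "not_reported": 0}
    for s in schools:
        if s.enrollment < min_enrollment:
            dropped["fewer_than_min_students"] += 1
        elif s.exempt_count is None:
            dropped["not_reported"] += 1
        else:
            kept.append(s)
    return kept, dropped


# Made-up sample rows, for illustration only.
sample = [
    School("School A", 250, 12),
    School("School B", 8, 1),
    School("School C", 120, None),
    School("School D", 40, 0),
]

kept, dropped = apply_exclusions(sample)
print(f"{len(kept)} schools kept; excluded: {dropped}")
```

Reporting how many rows each rule removed lets a reader judge how much the exclusions could affect the conclusions.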
Data authenticity is crucial to making successful decisions. To establish it, a few key questions about the data should be answered.
What data was used?
Many content creators don’t bother to document the source of their information. Viewers might not have the same level of confidence in a new financial dashboard if they knew the data came from a manually manipulated spreadsheet instead of directly from the finance system. How would the reader know anyway? In many cases, they wouldn’t. The Seattle Times provided a hint, but more is needed.
When you buy items like wine, you know what you are buying because the label spells it out. A wine bottle is required to carry a label with standard data elements. For example, a US wine label that names a single grape variety generally means at least 75 percent of the wine comes from that grape, and a label that names several varieties must show the percentage of each. Having an equivalent type of labeling for data would improve transparency about data authenticity.
Who owns the data we are consuming?
This is very important, especially if we spot incorrect or missing data. Who do we contact? The Seattle Times lists the Washington Department of Health as the data owner. This is a good starting point but doesn’t completely fill the need. For internal reports, all data sources should include an owning team name and a contact email. The data vintage example below also includes the site URLs and a contact email.
How old is the data?
Knowing when the data was last pulled from the source is one thing, but it isn’t what’s needed. What matters is how recently the underlying records were updated, because data age can strongly influence whether the data can be used to make a decision. In our Marquee™ products, we include a data freshness indicator that shows proportionally how much of the data has been updated recently. What counts as “recently” is a business rule that defines fresh data. At some companies, an entity must have been updated within the last seven days to be considered fresh.
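To make the idea of a proportional freshness indicator concrete, here is a minimal Python sketch. It is not how the Marquee™ indicator is implemented. The `freshness_ratio` function, its parameters, and the sample timestamps are assumptions chosen for illustration, with the seven-day rule from the text as the default.

```python
from datetime import datetime, timedelta


def freshness_ratio(last_updated, as_of, max_age_days=7):
    """Return the fraction of records updated within max_age_days of as_of."""
    if not last_updated:
        return 0.0
    cutoff = as_of - timedelta(days=max_age_days)
    fresh = sum(1 for ts in last_updated if ts >= cutoff)
    return fresh / len(last_updated)


# Made-up "last updated" timestamps, one per record.
as_of = datetime(2019, 3, 1)
timestamps = [
    datetime(2019, 2, 28),
    datetime(2019, 2, 25),
    datetime(2019, 2, 10),
    datetime(2018, 12, 1),
]
print(f"{freshness_ratio(timestamps, as_of):.0%} of records are fresh")
```

Keeping `max_age_days` as a parameter reflects that the freshness threshold is a business rule and will differ from one organization to the next.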
How do we address this?
We took the liberty of creating a Power BI model that analyzes the same immunization data used in the Seattle Times story. We’ll use this model to illustrate a simple technique. We performed the following steps to build a simple “data vintage” page.
1. Create a Data Vintage page (you may need more than one, depending on how many datasets and sources you have).
2. Add a back button to the page. We put ours in the upper left corner.
3. Add the following information to the page, using a consistent format that you’ve decided upon:
   - Name of the dataset
   - Where the data is sourced from and how often
   - Which team owns the data
   - How to contact the data owner, if available
4. Create a Data Vintage bookmark for the data vintage page so that it can be navigated to via a button.
5. Go back to the report page that you created from this data.
6. Add an Information button to the upper right corner of the page.
7. Select the button and navigate to the Visualizations pane:
   - Turn on Action
   - Set Type to Bookmark
   - Set the Bookmark to the one you created in Step 4
8. Ctrl + Click the Information button to test.
9. Ctrl + Click the Back button to test.
That’s it. Any time a user or fellow Power BI author has a question about the underlying model data, the answer is easy to find. You’ll also improve impressions of data authenticity by implementing this label in a consistent manner across all content.
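If you maintain many reports, it may help to keep the data vintage fields in one structured record so every page shows them in the same format. The following Python sketch is one possible shape for such a record, built from the fields listed in the steps above. The `DataVintage` class, its field names, and the sample values are hypothetical placeholders, not part of Power BI or of our model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataVintage:
    dataset_name: str
    source: str
    refresh_frequency: str
    owning_team: str
    contact: str = ""  # optional, since a contact may not be available

    def as_label(self) -> str:
        """Render the fields in one fixed order for display on a vintage page."""
        lines = [
            f"Dataset: {self.dataset_name}",
            f"Source: {self.source} ({self.refresh_frequency})",
            f"Owner: {self.owning_team}",
            f"Contact: {self.contact or 'not available'}",
        ]
        return "\n".join(lines)


# Placeholder values for illustration only.
label = DataVintage(
    dataset_name="School immunization exemptions",
    source="Washington Department of Health (2017-18)",
    refresh_frequency="refresh schedule not stated",
    owning_team="Example analytics team",
)
print(label.as_label())
```

Rendering every label through one function keeps the order and wording of the fields identical across reports, which supports the consistency goal described above.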
A Working Example
We’ve created a different analysis of the Washington State immunization exemption data, where we also added a data vintage page. You can try it out below. Click the Information (i) button in the upper right of the screen to display the data vintage.
In Part 3, we’ll examine the problem of data integrity and how you can be sure the proper business rules for your organization have been applied to your data.
Have a question or comment? Feel free to post a comment below.