In Data We Trust? A multi-part series.
I finished the demo of Microsoft Power BI dashboards and reports that we had built for a client. I looked at the room and asked, “What do you think?” The proof of concept was meant to create excitement by showing what was possible with the client’s data. As we went around the table, people were generally thrilled. The last person’s feedback, however, caught me off guard. “It looks great, but how do I know I can trust the data?”
That question rattled around my brain for days. In the rush toward a data-centric future, clients weren’t asking if their data was trustworthy. I researched the problem further and found a few whitepapers on the topic, but no clear recommendations on how to address it.
Three Areas of Data Trust Issues
Is the data you are using authentic, in that it comes from a trusted, official data source? How do you know? Imagine your executives making decisions about your project portfolio based on a manually maintained spreadsheet instead of data retrieved directly from your project management software.
A trusted data source is one aspect to consider. You also need to know whether that data source is actively managed. It’s one thing to have data from an official source, but if it’s a one-time extract rather than an ongoing process, the value declines quickly.
Lastly, is this data coming from the official system of record? Imagine getting project cost values from a non-accounting system. Is that system the official system of record for cost data? Many reporting solutions obscure the source of their data, making it impossible for end users to determine whether the source is the correct, official, authoritative one.
Data integrity is knowing that the agreed-upon business rules within your organization are consistently applied to your data. Integrity also looks at how closely your data tracks the official data source. Is the data being taken as is from the official data source, or is it being derived? If it is derived, does the derivation use officially defined internal business rules to achieve the outcome?
For example, suppose you are using the Project Health value from your project management system. Does the value follow the standard Project Management Office definition of Project Health, or does it use specialized logic that may have been written for a one-time analysis? How do you know? Deriving data is not bad if the process adheres to the established internal rules, but you need a way to verify that it does.
When you go to the grocery store to buy fresh fruit or meat, the ability to assess food freshness is vital to your buying decision. Old fruit isn’t very appealing and can have detrimental health effects. The same could be said about old data.
With data, we should go through a similar assessment of freshness. Is this data representative of recent activity? The question is not when the data was last refreshed into the report, but rather when the data was last modified. Data that was modified this morning is likely to be more representative than data that was modified three weeks ago. How do you gauge the freshness of your data?
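To make the distinction between refresh time and modification time concrete, here is a minimal sketch in Python. It is only an illustration, not a technique from this series: the function name `freshness_report`, the seven-day threshold, and the sample timestamps are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def freshness_report(last_modified, refreshed_at, now, max_age=timedelta(days=7)):
    """Gauge freshness from when records were last modified,
    not from when the report was last refreshed.

    The seven-day default for max_age is an arbitrary, hypothetical threshold.
    """
    newest = max(last_modified)  # most recent change to any record
    age = now - newest
    return {
        "newest_modification": newest,
        "age_of_newest": age,
        # How stale the data already was at the moment it was refreshed.
        "refresh_lag": refreshed_at - newest,
        "is_fresh": age <= max_age,
    }


# Hypothetical sample: the report was refreshed an hour ago, but the
# underlying records were last touched about three weeks earlier.
now = datetime(2024, 1, 22, 9, 0, tzinfo=timezone.utc)
refreshed_at = datetime(2024, 1, 22, 8, 0, tzinfo=timezone.utc)
last_modified = [
    datetime(2024, 1, 1, 15, 30, tzinfo=timezone.utc),
    datetime(2023, 12, 28, 11, 0, tzinfo=timezone.utc),
]

report = freshness_report(last_modified, refreshed_at, now)
print(report["is_fresh"])       # False
print(report["age_of_newest"])  # 20 days, 17:30:00
```

A report built on this sample would look current if you judged it by its refresh time alone, yet the newest record behind it is nearly three weeks old.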
In this series, we’ll explore each of these areas, first examining the challenges end users face in determining whether the data they are using is trustworthy. We’ll then explore a future where these issues are addressed. Lastly, we’ll present techniques to overcome each of these challenges, enabling you to address these trust issues in your own organization.