Querying Multi-Value Custom Fields



You have a report where the need is to show multiple values for a given custom field. For example, you have a multi-value Project custom field for Impacted Business Organizations.

You want to see your values as a comma delimited list so that this can be used in an Excel pivot table or SSRS tablix report. You might need something like:

Project Impacted Business Orgs
Project XYZ IT, HR, Operations


The Background

When a Project text custom field with associated lookup table is made multi-value, a number of changes are made in the Reporting Database. First, the field is removed from the MSP_EpmProject_UserView as that view only supports single select Project text custom fields with associated lookup table. Second, a new Association View view is created which has the following naming convention: MSPCFPRJ_YourMultiValueCustomFieldName_AssociationView

MSP prefixes all Microsoft views, CF for Custom Field and PRJ for the Project entity. This association view contains a record for each of the multiple custom field values selected, linking the Project record to the lookup table values in the MSPLT_YourMultiValueCustomFieldLookUpTableName_UserView view. LT in this case, stands for lookup table, so there is a MSPLT view for each lookup table in the Reporting Database.

This mechanism was first documented in the Project Server 2007 Report Pack that I wrote and can be found here: http://msdn.microsoft.com/en-us/library/office/bb428828(v=office.12).aspx  The Portfolio report also provides another way to utilize the multi-value field.

The Query

This query uses the XML functionality to build the concatenated string, based on a technique documented on StackOverflow here.

Once I modified the STUFF statement for specific use for Project Server, I wrapped it with an outer SELECT to combine it with all of the data from MSP_EpmProject_UserView. Note, if you have multiple multi-value fields, you will have to duplicate this inner piece in the parentheses for each field. The places to replace with your own field names are highlighted.

SELECT MSP_EpmProject_UserView.*
             , MVList.[YourMultiValueCustomFieldNameValues]
FROM MSP_EpmProject_UserView
   (SELECT   MSP_EpmProject_UserView.ProjectUID 
            ,ISNULL(STUFF((SELECT ', '+ MemberValue 
    FROM [MSPLT_VP Lookup_UserView] 
    INNER JOIN [MSPCFPRJ_YourMultiValueCustomFieldName_AssociationView] 
    ON [MSPLT_YourMultiValueCustomFieldLookUpTableName_UserView].LookupMemberUID = 
    WHERE [MSPCFPRJ_YourMultiValueCustomFieldName_AssociationView].EntityUID = 
    ,1,2, ''),'')AS YourMultiValueCustomFieldNameValues
FROM    MSP_EpmProject_UserView 
GROUP BY ProjectUID ) MVList
ON MSP_EpmProject_UserView.ProjectUID = MVList.ProjectUID

The Output

The output will yield a comma delimited list of values in the last column of the dataset. If you need that comma delimited list sorted, add an ORDER BY MemberValue statement right before the GROUP BY ProjectUID) MVList statement.

Database Diagrams–Project Server Reporting Database


These high level entity relationship diagrams were first published in the deck for my Project Conference Hands On Lab deck. I’ve had a number of requests for this information so here it is.

These diagrams are based on the 2010 RDB but the 2013 RDB should not be materially different. For 2007 users, many of these same entities also exist in the 2007 RDB as well. The UID fields are the keys used to join these entities together.

The name of the entity also contains the recommended table or view name to be used in the Reporting database. Any entity ending with _UserView will automatically include custom fields for that entity. Multi-value custom fields are not included and require special querying to retrieve (more on that later).

If you are tying different data entities together, you should also consider using the lowest level of detail for a given data element. For example, if you are querying Project – Task – Assignments – Resources, I would use AssignmentWork rather than TaskWork or ProjectWork as AssignmentWork will aggregate correctly in PivotTables. Otherwise, you will get a multiple of ProjectWork or TaskWork, depending on the number of records retrieved.

Click on the graphic below to make it bigger.

Project Server Reporting Database Entity Relationship Diagram


Project Server Reporting Database Timesheet Entity Relationship Diagram


Happy querying!

Retrieve Fiscal Year Dates Dynamically

Finance guy

At some point, you will be asked for information by Fiscal Year. The Fiscal calendar was set up in Project Server so it should be accessible for reporting, right? A fair number of companies have fiscal calendars which don’t exactly line up with the standard calendar which can cause some challenges. You could hard code the beginning and ending dates of the fiscal year, but then that would create an annual report maintenance task. What’s a report writer to do?

If you’ve set up the Fiscal calendar within Project Server, it is easy to retrieve this data for the current Fiscal Year based on the current date. The Fiscal calendar information is stored in the MSP_TimeByDay table and can be queried to do all sorts of date based operations.

The example provided below can be used with Excel based reports as well as with SSRS reports.  The key is to do the lookup in two steps. The first part of the query retrieves the Fiscal Year beginning and ending date so that you can use this information in the WHERE clause of the following query to filter the dataset.


Let’s say I need to find out in real time, how many hours of work were scheduled by month for the Fiscal Year.  The real-time requirement pushes me to use SQL rather than OLAP for this data. Since monthly data on assigned work and cost requires aggregation of the time phased assignment data,  I need to query the Assignment By Day view (MSP_EpmAssignmentByDay_UserView) in the Reporting database. Production reports would require additional joins to Resource and Project views to complete the dataset but for illustration purposes, I’ll limit the example to only the Assignment By Day view for illustration simplicity.


The following query works in Excel where I find this data is most commonly requested.

The DECLARE statement declares the two variables where we will store the date results for the beginning and end of the current Fiscal Year.

The next SELECT statement uses a subquery to get the beginning and ending date of the current Fiscal Year. The subquery uses the current date to find the current Fiscal Year for use in the WHERE clause of the primary query.

You may be wondering what this clause is doing as you look at the code below. CAST(FLOOR(CAST(GETDATE() AS FLOAT)) AS DATETIME))  This clause is needed to address the time mismatch between the TimeByDay data and the current time.

All of the dates in the TimeByDay field in the MSP_TimeByDay table are stored as dates with midnight time. If you retrieve the current date and time, the numbers won’t match if you attempt to use it as a filter as the time portion will be different. So, you have to reset the time part of the current time back to midnight for this filtering to work.

  • GETDATE gets the current date and time.
  • Casting the result as a FLOAT represents the date and time as a decimal number, where the portion to the left of the decimal is the date and the time is stored to the right of the decimal.
  • The FLOOR statement sets the decimal portion of the number to its lowest value, which would represent midnight.
  • The outer CAST makes it a Datetime field data type again so that we can use it to filter on a Datetime field. In the last query, you use the declared variables to find all records between the beginning and finishing date of the Fiscal Year. Note, this technique can also be used to get the current Fiscal Quarter, current Timesheet period, calendar month, etc. There is a fair number of data elements in the MSP_TimeByDay table that you can use for date related needs.



If you are using one of the default Excel report templates in the Project Server Business Intelligence Center, you would open the file in Excel client.

  • Go to the Data tab
  • Click Connections
  • Select the connection and click the Properties button
  • Go to the second tab and click into the SQL box
  • Select all (Ctrl+A) and delete the existing SQL
  • Copy the above query and paste into this box
  • Click OK to update and OK on any warning
  • Once updated, click Close to close the dialog

Once it returns the data, you would see the returned fields in the Excel field well next to the Pivot Table.

Looking for more Project Server information?

Visit the AboutMSProject.com Recommended Books page.

The Social Nature of Project Management


A successful configuration of Project Server is one that supports the conversations within the organization. Users have to go beyond use of the system and have concerns over the validity of the data entered. The Project instance captures the requisite information for the conversations and enables participant to synthesize additional outputs and insights. Therefore, an understanding of the work social environment is key to understanding the requirements for configuration and getting the necessary level of user engagement.

Projects: One of the Original Social Networks

Projects have been around for thousands of years as humans started working together to achieve common goals. We didn’t call them projects back then as most interaction was face to face and immediate. By PMI’s definition, these interactions were projects in that they were temporary endeavors undertaken to create a unique product, service or result. Feeding the tribe or building shelter for the family can be considered projects.

As projects got longer in duration and more people were involved, the need to capture the conversation between all parties became critical. The social networks involved in doing the work grew larger and longer. Project plans were born as a technique to keep track of the overall conversation. Thus, project plans represent the latest state of the conversation between everyone involved with the project.

Projects are Dynamic Social Networks

Projects today represent a temporary reorganization of your work social network. Within the project, the roles and importance of members change over the life of the project as information needs change to achieve work.

Tools such as RACI attempt to model the project’s social interactions. Communication plans also attempt to do this from a different perspective. However, RACI, communication plans and other tools of that sort represent a one-time, almost theoretical look, at how the project members and stakeholders interact. These models fall down as soon as the project starts and reality takes over.

Many social networks have different levels of access based on either whether you friend someone (Facebook) or through which contact you know them (LinkedIn’s Levels). Unlike Facebook and LinkedIn, your work social network can be defined and refined by the interaction level with others. The interaction level with a specific person or team represents mutual dependency as the interaction yields information needed to achieve work. A time aspect adds priority to the interaction as tasks that are due earlier require information sooner. Emails, meetings, IMs, tasks and break room discussions all represent types of interactions. As a result, today’s joint deadlines will prioritize a co-worker’s activities to a higher visibility than a co-worker with which you have a deadline 3 months from now.

Modeling the Work Social Network

Your work social network can be modeled as a hierarchy of social zones based on interaction and common experience. Each zone contains social group which morphs over time, depending on the level of immediate interaction. My basic model contains four zones. These are:

  • Inner
  • Shared
  • Experience
  • Public

Members of the Inner zone are those individuals with which I have immediate interactions. These would include:

  • People with which I have joint dependencies
  • My Project Managers
  • My Direct Manager

By immediate, I mean there is a deadline within the three week window of activity that most focus. We look back to last week for status and forward to the next two weeks for planning. The immediate nature of the interaction means my attention to this group’s information and needs will be prioritized above other interactions.

The Shared zone members are people with which I have a current shared experience but do not have joint dependencies. This group’s information will be of interest but not priority. This group includes:

  • Other project team members with which I’m not directly collaborating
  • Other members of my work team with which I do not have project work
  • Past direct collaboration partners on current projects
  • Other interests or information to which I have subscribed

The Experience zone comprises the edge of your social work network. While the Inner and Shared zones tend to focus on interactions within your current work environment, the Experience zone extends beyond. The Experience zone is comprised of those individuals with which you’ve had past interactions. This can be:

  • Former Work team members
  • Former Project team members
  • Past transitory interactions within and external to the work environment

Many may consider this to be the “LinkedIn” zone. Members of this zone can be a source of information as well as an external communication channel of activities and opportunities.

Last, the Public zone is attached to your public persona. This can include:

  • All others in your company with which you have had no interaction but share a common identity
  • Contacts based on public facing interactions

Example Scenario

I have a joint deadline with my DBA this week. The DBA has a family emergency and has to fly to the other coast immediately. Currently, they are within my inner zone and I would be very interested in this absence so that I can respond accordingly.

If my DBA was someone who is on my project but I don’t have shared tasks or deadlines, they would be in my Shared zone. I might be interested in this event but there may be no immediate action needed.

If we had worked together in the past but not currently working together, the DBA would be in my Experience zone. I’d probably regard it as informative.

If the DBA is in my fantasy football league and I email them frequently on the upcoming draft, I may have other social ties to this person, which could raise their level to Shared or Inner.

As you can see from this example, the challenge of implementing a social-centric project management system in the organization is incomplete information. Likely, you will not support interactions in the Experience or Public zones. Failure to support the Inner Zone interactions will slow or prevent adoption of the tool by users.

Inner Zone Support is Key

If projects are the latest state of the conversation, then Project Managers manage conversations rather than schedules. Schedules should be developed to support the conversations necessary to get the work complete. I’ve met many project managers who feel that their job is only about the project schedule. As they don’t see themselves as conversation managers, they tend to develop project plans that don’t support their conversations. Plans will either gather dust or worse, lead the team to improper conclusions.

Creating infrastructure to support the Inner Zone conversation is key to successful adoption of Project Server. Too many times, the system is configured from the perspective of the PMO. The PMO is happy but no one else will see anything of value. No value leads to a lack of engagement and the death spiral of apathy and bad data ensues.

To support the Inner Zone conversations, every configuration element has to be viewed from the perspective of the self and the Inner zone. A great place to start is to “make the system about me”. By this, little things like a My Projects view in Project Center, a Team Site home page that has my Active Issues and Risks and reports that show My Tasks make the system more about me. This raises engagement. Once this “me” need is satisfied, the ability to address the Inner Zone can be an extension of the self-views.

The goal is to turn Project Server into a source of information that drives people to want more, similar to honey in a beehive. By comparison, many customers of Project Server host a weekly cattle drive where we try to herd our user base to enter their time, update their projects and run their reports. Cattle drives aren’t satisfying and aren’t sustainable long term. We can and should look at the existing successful social platforms for ideas to improve the collection and use of data in Project Server. Honey may be easier to achieve than you think.

Project Server Security–Part 1

Security configuration is a confusing topic for many new and old to Project Server. This series provides a in-depth look at the security model and provides decision points and suggested best practices where applicable. We’ll also work through some common scenarios so that you can see how to best utilize the Project Server security model.

This first post provides an overview of the concepts upon which the rest of the series will build. The subsequent posts will dive into the various components that comprise the security system and how they fit together.

A key piece of advice is to keep things simple. Many times, I’ve had to reset a customer’s security configuration because things had become unmanageable. If you find that you are creating single group-category pairs, you may be working too hard to achieve your security goals.

Let’s Talk Terms

It’s important that you understand the terminology used with regards to security.

Project Server Security has three primary parts and supports three levels of security permissions. It also has the ability to control the data that you can see, based on your organizational position.

The three primary parts of security are:

  • Users
  • Groups
  • Categories

Users (or User Accounts) typically represent a single person who can log into Project Server. This is important as Project Server security only applies to Users, to control which functions they can perform and which project and resource data they can see.

Users are sometimes used synonymously with Resources, though this is incorrect. Resources can be assigned work in Projects but may not necessarily be able to log into the system. An example would be a generic role based resource. They are added to the plan to show the resource demand. However,  they will never log into the system. Users, like the system administrator, are typically not resources.

Groups represent roles within Project Server and the specific functional permissions needed to support those roles. These permissions are called Global Permissions, which is the second level of permission.  these permissions define what you can generally do in Project Server. We will discuss the standard roles in greater detail in a later post.

Categories represent pools of Projects and Resource data. The data returned by categories is used to populate Views, like the Project Center. They also contain the security permissions related to allowable actions on the returned data. Hence, these permissions are called Category Permissions, which is the third level of permission. Category Permissions define what you can do functionally to the data you see returned by the Category. In my opinion, these are the most important permissions as they control the ability to create and update data.

You are probably wondering what happened to the first level of permissions? These permissions are called Organizational Permissions, as they impact the entire organization. Organizational Permissions enable the implementer to turn off Project Server functionality for all users, including the administrator.

Lastly, there is a special custom field in Project Server called the Resource Breakdown Structure or RBS. This optional use field is used to capture a data visibility hierarchy. The RBS, when used, filters the visible project or resource data based on a user’s position within the hierarchy. For example, the RBS is used to denote a Resource Manager’s Direct Reports.

Key Entities and Relationships

At the most basic level, the diagram below shows the relationship between key entities. Users derive their allowed functions from the Group. The user also accesses data via Views. The View data derives from the Categories associated with the User’s member Groups.

Therefore, views should only be associated to categories which supply the desired data for the given user role.


The key to understanding the security model is in the relationships between groups and categories. The diagram below outlines the relationships of the out of box security model, where the arrows show the categories associated to each group. Category permissions, set in the relationship, determine what project and resource data appears and how it can be acted upon by a given group of users. Therefore, the category permissions (the intersections) are the most important part of the overall model. A deep dive into this area will be the topic of a future post.


What Security Does and Does Not Control

The Project Server security model controls functionality and data visibility within Project Server and Project Professional. It also automatically manages access to the Project Team Sites created in SharePoint by Project Server when a project plan is published.

Project Server security only applies to Project functionality. Therefore, there is no impact on Business Intelligence Center functionality or in direct SQL database access needed for report development. Special consideration should be taken into account for the handling of confidential projects. For example, if you have configured Project Server to prevent the 2013 Headcount Reduction plan from appearing in Project Center, rest assured, it is likely still visible via the Reporting database and Business Intelligence tools.

The Project Team Site aspect controls the level of access that a Project Team Member has to that Team Site. By default, the Project Owner who is also a member of the Project Manager Group, is the Team Site administrator. If a team member is on the Project team but is not assigned work, they are placed in the Viewer (Read only) role on the Team Site by Project Server. If the team member is assigned to a task, they are promoted to Contributor role on the site, where they can make changes to documents and lists. This permissions synchronization between Project Server and the Project Team Site occurs every time the project is published.

Subsequent posts will cover security configuration in greater detail. Future topics will include understanding the use of the RBS and how to configure Project Server to handle confidential projects.

Converting RTF to Text in SQL Revisited

If you’ve ever attempted to include content from a Project multi-line text custom field or Issues multi-line fields in an Excel report, you will get a result that looks like this:

<div><p>Vendor is having issues with deliveries to the East Coast.</p></div>

At the 2012 Project Conference, I presented a SQL based solution that converts the RTF content to XML and then returns the text. This technique was also published here in the Project Server Forum.

In practice, the technique worked but had failures in some cases. I found early bugs in SharePoint where the lead or trailing <div> tag was missing. However, this did not explain all failures.

I got an email from Doug Welsby of Microsoft Canada, that the failures were due to text values like &nbsp; which are not standard XML. Basically, an inline (Document Type Definition) DTD is needed to convert a non-standard XML value like &nbsp; to XML compliant &#160;  The full list of possible values to convert can be found here: HTML Codes

Technique Assumptions

This technique works on any multi-line Project custom fields or multi-line list columns in SharePoint. Therefore, the Issue Discussion field from SharePoint can also be cleaned with this technique.

The following example illustrates the SQL technique, though you may have to extend the DTD definitions to fit your own data. Use the table to above to add additional ENTITY declarations. The technique also assumes that the multi-line field contains valid HTML. I’ve found no issues with this technique on systems patched to June 2012 CU or later on Project Server 2010.

This technique will work with Project 2007, 2010 and 2013 databases as there are no real differences in how this type of data is stored across releases. This technique does work inside an Office Data Connection.

NOTE: One of the challenges I faced is that I could find no good examples of how to do an inline DTD in SQL. While the technique below works, I’m open to a more elegant way to construct the DTD. Please post any ideas or issues in the comments.

The Technique

The technique does the following:

  • Converts the RTF value to an XML document
  • Extracts the text value from the XML
  • Trims the blank space from the resulting value
  • Replaces any null values with a blank
    First, two variables are declared to hold the XML Header and Footer information. The Header contains the inline DTD declaration for &nbsp; and could be augmented with more declarations. The Footer contains the tags needed to complete the document.

The SQL CONVERT function changes the concatenated string of Header, your multi-line Project custom field and Footer into an XML document. The 3 option in the CONVERT command enables the inline DTD as well as it leaves all internal spaces as is. More details on the CONVERT command can be found here.

The .value method performs and XQuery and returns the string value to SQL. More details on this method can be found here. The method returns the value to an nvarchar(4000) field but you can use nvarchar(max) if you think you may have a truncation issue.

LTRIM removes the leading spaces of the resulting string. You could also do a RTRIM to remove following spaces but I didn’t find this to be an issue. The ISNULL is used to return a blank when a null value is returned. If you don’t do this, you get (blank) in Excel rather than an empty cell. You can fix this in Excel but I found it more efficient to simply fix it in the data source for every report that uses it.

Code Sample

Replace the YourMulti-lineCustomField text below with the name of your custom field. Since the original field is also in the query result set, I tend to name it YourMulti-lineCustomFieldClean so that I can distinguish the two versions.

The ISNULL clause should be repeated for each custom field you are cleaning.

    declare @Headxml nvarchar(350)
    declare @Footxml nvarchar(50)
    set @Headxml = N'<?xml version="1.0"?>
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" 
    [<!ENTITY  nbsp "&#xA0;"><!ENTITY quot "&#x22;">]><html><body>'
    set @Footxml = N'</body></html>'
    select   *      
            .value(N'(/)','nvarchar(4000)'))),'') AS [YourMulti-lineCustomFieldNewName]
            .value(N'(/)','nvarchar(4000)'))),'') AS [YourMulti-lineCustomFieldNewName2]
    FROM  dbo.MSP_EpmProject_UserView

What’s the RDB? Part 1

If you are planning to write reports over your Project Server data, you will need to know about the Reporting Database or RDB for short.  

A Little History

In Project Server 2003, all of the data was typically in one database.  Since this database was optimized to support Project Server transactions, reporting on that data could be a bit challenging.  Also, there were potential performance concerns as a long running report could impact the performance of core operations.

In Project Server 2007 and continuing in Project Server 2010, a separate Reporting Database is maintained.  This separation provides a database schema that is friendlier to report writers and it allows administrators to move the RDB to a separate database server if performance becomes a concern.

Project Server 2013 recombined the four databases into one physical database using four different schemas. This change makes Project Server 2013 easier to manage in a cloud infrastructure. If you upgrade from 2010 to 2013 and use the name of the reporting database as your database name, all of your reports will still work with no change.

In part 1, an overview will be presented and an overview of one key relationship within the RDB will be presented.

Ok, So What’s In It?

The Reporting Database is used as a staging area for OLAP Database creation and to provide project server data in an easier format to query.  It contains data on the following major entities:

  • Project Decision
  • Project
  • Task
  • Task by Day
  • Assignment
  • Assignment by Day
  • Resource
  • Resource by Day
  • Timesheet and Administrative Time
  • Issues
  • Risks
  • Deliverables
  • Workflow
  • Other Supporting entities

It’s a bonanza of data for you to use.  Here’s a simplified relationship diagram of the Reporting database core tables.  I’ll use this as the basis for posts going forward and show you how to get some nice reports out of this data.

simplified ERD.

For your convenience, I’m hosting an online interactive copy of the RDB Field Picker.that comes in the Project Server 2010 SDK.

The P-T-A-R Principle

One of the key reporting relationships in the Reporting Database is that between Project – Task – Assignment – Resource (PTAR).  This relationship constitutes the majority of Project Management centric reporting. 


The Project entity contains the core information related to the Project plan.  There is one record for each project on Project Server.  This record includes:

  • Any Project level custom fields
  • Project rollup of work and costs
  • Project start and end dates
  • Project baselines

A complete Excel based field reference can be found in the Project Software Development Kit, found here: Download the Project 2010 SDK

The primary view for Project information in the RDB is MSP_EPMProject_UserView where ProjectUID is the primary identifier.  In the diagram above, all of the other tables that are linked to Project are related to that table by the ProjectUID value.  If you are writing queries which require Project information, your join should use the ProjectUID.


The Task entity contains task level information for each task in every project.  The data is related to the task itself and not to a specific resource.  This record includes:

  • Start and End Dates
  • Baselines
  • Task Custom Fields
  • Work
  • Costs
  • Characteristics of the Task

Task characteristics are especially of interest when reporting over the data.  These tend to follow the naming convention of TaskIs… For example, if I wanted a Milestone only report for a given project, I would select all task records where projectuid is equal to the Project’s UID and TaskIsMilestone is equal to 1.  There are a number of these fields and I would suggest looking at the RDB Field Reference for more details.

The primary task view is MSP_EpmTask_UserView where TaskUID is the primary key.  Though TaskUID is a Globally Unique Identifier (GUID), when working with other related tables/views, I still do my joins using both ProjectUID and TaskUID.  The chances of duplicate GUID values are low but I don’t want to be that type of lucky with my project reporting.


The Assignment entity contains information specific for a given resource on a given task.  Therefore, there will be a record for each resource for each task.  The primary view is MSP_EpmAssignment_UserView where AssignmentUID is the primary identifier. 

The Assignment record contains primarily:

  • Dates
  • Costs
  • Work
  • Baselines
  • Custom Field values where rolldown was enabled.

As a consequence, the Assignments table is one of the largest in the RDB and in Project Server as a whole.  As you need this data to derive the relationship between resources and the tasks to which they are assigned, play specific attention to query execution plans when creating new reports.


The Resource entity contains information related to a specific resource, be it a work, material, cost or budget resource.   There is one record for each resource created in Project Server.  The resource record contains:

  • Resource Custom Fields
  • Resource Characteristics
  • Login Account information
  • Organization information like Team, RBS
  • Cost rates
  • Timesheet Manager

Like the Tasks, the Resource view follows a similar naming convention of ResourceIs… to denote Resource characteristics.  For example, if I only want to see active Resources, I would filter my records based on ResourceIsActive = 1.  Especially for resources, you should add filtering on these fields to ensure you do not include team, generic and inactive resources in your reporting.

That covers the core aspects of these entities.  In future posts, I’ll go into greater details as to how best to query against them as well as cover other relationships within the RDB.

I need information, now what do I do?

If you’ve ever been asked for information to support work you are doing or perhaps, to keep people informed of your efforts, you may have struggled with the process to define and deliver the information needed.  If you identify with this issue, this post will cover three key questions you should answer to help set your requirements so that you can get to your information goal quickly.

What’s the Problem and Why Do I Care?

Our starting point is based on a quote from one of my favorite professors.  “What is the problem and why do I care?”  His point was that you have to be clear on the problem and what you are trying to address if you are going to be successful in formulating a solution.  Clearly understanding the problem will enable you to be effective in gathering the right information for your answer.  You should be able to state it clearly and easily. 

Some examples are:

  • What is the overall cost and schedule status of my project?
  • Are my people overbooked in the fourth quarter?
  • What factors are impacting my project’s delivery?

Otherwise, you are going to waste time pulling together data while looking for a question to answer. 

What are you doing with the information?

The second question relates to how the information will be used.  Essentially, the purpose will help drive the how to best structure the outcome.  Many people fail to identify this aspect correctly, resulting in information that is not structured properly to meet the need.  I’ll cover each of these outcome types in detail in subsequent posts.

Most likely, you are doing one or more of the following to either draw a conclusion or to illustrate a point.  While these are broken out as distinct entities for illustration purposes, in many cases you will be using a combination of techniques.

In this example, we are attempting to understand why sales are so dismal in the North region for November.



When we look at sales data by region or we break down the number of hours entered against a project by each person, we are doing Aggregation.  We may also create synthetic groups to aggregate data based on some attribute of the underlying data.  In many cases, simple number charts are used to convey the data. 

In the example above, we are looking at the November Sales numbers for the Blue, Green and Red Sales teams by region.  The Red team seemed to do well in November, at least according to this view.



Typically, if you have aggregated data, it is likely that you need to compare and rank the groups of data.  The Stack Rank is a very common scenario where you are ranking the data by Best to worst, based on some criteria.  Number charts, bar charts and to a lesser extent, pie charts are commonly used for comparisons. 

In this example, the West region actually had the most November sales of any region.  The Red Team, leader on the previous view, actually sold the least amount in the West Region. 

Here’s why the problem statement is important.  Without having a clear definition of the problem, it isn’t apparent which answer is correct, as different conclusions can be drawn from the same data.



Another fairly common usage of information is to illustrate the composition of the data.  In this scenario, we may be attempting to determine which region has the most salespeople.  When used properly, a composition can be used to quickly convey relevant data.  We see that the East and North Regions are staffed with a smaller number of salespeople than the other regions.  This may give us a clue as to why sales are lagging in the North.



Another interesting information analysis you may choose to do is to understand how information changes over time.  This type of visualization allows you to understand the direction of progress, beyond the current state, enabling you to determine which items may be more worthy of your attention.  For example, projects that are late but are trending back to being on plan may be better off than a late project which is trending later. 

In this example, we see that sales in the North region are flat and actually beginning to decline.  In a real investigation, we would likely dig into this trend further since all other regions are growing.

Variation / Distribution


Another way to visualize the data is to visualize how the data varies for a given period.  In many cases, the temptation is to only look at aggregated values or averages, but sometimes it’s the distribution of the data which tells a more compelling story.  Readers who have a statistical background will be very comfortable with this type of information as distributions, variances and other such items are core to statistical investigation.

In this case, we see the majority of deals for November are small deals, with a second peak.  This view would also provide a wider view of what’s happening.  Do we have a training issue?  Have vendors decided to cut back on orders due to the economy?  Are there other factors at play?  Without this view, these questions may not have been asked.



The last information type is to map out relationships.  If you are deriving information from people relationships, you might here the term “social graph”, which is one way to construct, visualize and consume relationship data..  Relationship maps may uncover potential dependencies between items like people, which are not normally covered by work management and financial management tools. 

In our example, one item jumps out of the data, in that the North region is covered out of one office.  As the other growing regions are covered out of multiple regions, there may be collaborations on how to approach a customer that aren’t happening in the Chicago office.  These collaborations may be resulting in more, small sales in the other offices.  Further research is warranted but you should consider relationship mapping as part of your information arsenal.

What behavior do you want to occur, as a result?

One last aspect to consider is what story should the Information you gather tell?  One way to determine the form of the story is to decide what behavior you want to occur as a result of the information you’ve gathered.  Targeting specific behavior helps you decide in which fashion the data will be presented.

For example, if the focal point of the your information is to ensure certain upcoming tasks are completed on time, you may determine that a stack rank of incomplete tasks, ranked by days until Due Date, is the best way to present the data. 

In the example used in this post, your intent may be to get another salesperson hired in the Atlanta office for the Blue team to support the North Region.  You are able to illustrate a sales decline, some potential reasons for it and you would have to present where investment could improve the situation.

Three aspects that can be used to determine your information requirements were covered in this post.  We’ll dive deeper into these and other relevant information in future posts.