3 Ways to Rev Your Microsoft Project Online and Power BI Performance (Number 2 Is Our Favorite)
Are you having tired of waiting for long refresh times with Power BI? Perhaps, you are getting timeouts because you are retrieving too much data. There’s three easy ways to avoid these issues.
WHY IS MY POWER BI DATA REFRESH SO SLOW?
Many Power BI models refresh slowly as the models are not structured to take advantage of query folding.
Query folding is a process by which specific Power BI transformations are executed by the data source instead of Power BI. Allowing those actions to occur at the source means less data has to be sent to Power BI. Less data means faster data refresh times and smaller report models. Note, not all data sources support query folding, but oData for Project Online and SQL Server for Project Server do.
A sample of these foldable actions include
- Column or row filtering
- Group by and aggregation
- Joins
- Numeric calculations
For example, you need to find out the amount of work scheduled by week for the quarter. You are querying the time phased data from Project Online. If you aren’t careful, you may be retrieving hundreds of thousands of records. Query folding will make a huge difference in speed and the number of records retrieved. If you have the records filtered by Project Online before retrieving them, you may only receive a few thousand records instead.
ISSUE #1: NOT REFERENCING THE DATA SOURCE PROPERLY
This issue occurs primarily using oData sources, such as Project Online, Office 36 and other web based sources. Query folding breaks if you don’t properly reference the oData feed.
In order for query folding to work properly, the transformations that have folding support need to be the first things executed. If a non-folding transformation is added, no subsequent transformation will be folded. If you don’t reference the oData feed properly, the step to navigate to the oData feed isn’t foldable, therefore blocking all subsequent transformations from being folded.
ISSUE #2: DOING DYNAMIC DATE SELECTION IMPROPERLY
This issue causes more headaches with Project’s time phased data than any other action. In many cases, you are looking to create a range of dates for Power BI to query. If you use the Between filter in the Power BI Query Editor, there’s not option for Today’s date +/- a number of days. If you use many of the other Date/Time filters, like Is Latest, you still seem to pull back a lot of data.
Solving this particular issue requires getting familiar with the M query language so that you can understand how Power BI Desktop performs specific actions.
For example, let’s look at the Is Latest date selection. By default, your dataset is unsorted so Power BI creates a list of the values in the selected column and does a Max value. While this is correct, this could result in a lot of records being retrieved to perform a Max and get one record. The M code over a time phased table looks like this:
= Table.SelectRows(#”Filtered Rows”, let latest = List.Max(#”Filtered Rows”[TimeByDay]) in each [TimeByDay] = latest)
To get much better performance, make the following two changes. First, sort the dataset in a descending manner on the column which you are using to decide latest. In this example, that’s TimeByDay. Sorting is a foldable action so the data source will do the work.
Next, change List.Max to List.First. Since the latest date is the first record in the sorted dataset, a lot less data is required to get the answer. So, my statement is now = Table.SelectRows(#”Filtered Rows”, let latest = List.First(#”Filtered Rows”[TimeByDay]) in each [TimeByDay] = latest)
In testing, the original way required over a 1 Mb of data to be retrieved to answer the question. The new way only retrieves 127 Kb.
ISSUE #3: URLS ARE TOO LONG
This issue comes into play when working with Project Online and SharePoint based data sources. Project, in particular, has a lot of long field names. SharePoint has a limit of 2048 characters for a URL. If you are manually selecting a lot of field names, you can accidentally go past the 2048 character limit.
What this means is that Power BI receives a URL that can’t be acted upon. The default Power BI behavior is to then simply retrieve the feed without modifiers. If this happens on a source with many rows, this could significantly impact performance.
In this case, you would break up your dataset into two or more to fit within the limitation and then merge them back together in Power BI.
WANT TO KNOW MORE?
Join us for a free Power BI performance clinic on March 22! This session will do a deep dive into these issues as well as show you how to determine when folding is happening (and not.) Go here to register!
[imageframe lightbox=”no” lightbox_image=”” style_type=”none” hover_type=”none” bordercolor=”” bordersize=”0px” borderradius=”0″ stylecolor=”” align=”none” link=”http://academy.tumbleroad.com/courses/power-bi-performance-clinic” linktarget=”_blank” animation_type=”0″ animation_direction=”down” animation_speed=”0.1″ animation_offset=”” class=”” id=””][/imageframe]
March 16, 2016 at 11:10 pm, jerome charlon said:
Superb article Treb, thanks