Tags:

Power BI basics: Two ways to remove old data and speed up your report

Hugh Johnson explains how you can remove old transactions from your Power BI files with two different methods.

13th Dec 2019
Director Accounting Insights Ltd.
Share this content
Bin
istock_fabrikacr_dc

Out with the old and in with the new 

If you load transactional tables from your accounting software into Power BI then the chances are that they contain a lot of old data that, quite frankly, you are not interested in. Hands up how many of you have transactions in your accounting software that are more than two years old? Five years or more? I know that there are many hands still left up.

That’s ok, it’s just that you probably don’t need all of this old data in your Power BI report and more data makes for a slower report. What’s more, is that unless your old accounting data are archived, every time you refresh your dataset it gets a little bit bigger. 

In many situations, it is enough just to include 25 months of data. In dates terms, this means a maximum of 762 days of data (365 + 366 + 31), or the last 24 months plus the current month. This gives you enough data points to look at year-on-year trends and also do regressions on these trends that factor in some seasonality. 

In Power BI you use Query Editor to strip out unwanted data. You may still have to import all of the older transactions, but then as the data is saved into your model, Power BI can remove the old transactions, making your saved Power BI file smaller and faster. 

In this article, I will give you two methods to do this using Query Editor. The first is to apply a simple date filter. The second is more complex, that removes all the old completed transactions but still retains any old transactions that are still outstanding. You may, for example, want to keep very old invoices in your dataset that have neither been paid nor written off. 

As well as the text description in this article, I have also published a video on YouTube that shows you step by step how you can do this. 

A quick reminder about Query Editor 

When you connect to data in Power BI, either for the first time or to refresh your dataset, Power BI executes “Queries” that retrieve the data from their source(s) and load them into your Power BI dataset that you see in the fields pane in Power BI. Query Editor is the part of Power BI that you use to edit these queries. You can do some very powerful things in Query Editor, like create columns, merge or append queries and, as explained in this article, filter data. 

Query editor

 

The programming language that you use in Query Editor is called “M”. Don’t worry, though, if you don’t know M. A lot of it can be generated automatically through code-free wizards for simple commands and you can achieve a lot more with just a few lines of code. 

A simple date filter to remove old transactions 

In this first example, I don’t need to write any M code. Power BI will do that for me. My dataset contains a transactions (or “Fact”) table called BITransSplits. 

Exploring the table in the Data View in Power BI, I can see that: 

  • The oldest transaction is dated 30th July 2017 

  • There are 57,491 transactions in the table. 

I will use Query Editor to filter it further and just give me the last 12 months plus the current month. 

All I have to do is open Query Editor through the Edit Queries button in the image above, then select my BITransSplits query and apply a filter to the [Dates] column as shown below: 

Date filter

 

 If I select the “Date/Time Filters” and “In the Previous”, then I get a very simple form that I can fill in like this: 

Filter rows

 

The nice thing about this is that I have not hard-coded any date filter. It is always relative to today (or to be more precise, relative to the day that I refresh the dataset). 

It is as simple as that and, in the background, Query Editor has automatically generated the M code for me: 
= Table.SelectRows ( #"Changed Type", each Date.IsInPreviousNMonths([Date], 12) or Date.IsInCurrentMonth([Date])) 

#”Changed Type” referred to here is the previous step of the BITransSplits query. 

The filter applied is whole months, so if today is 11th December 2019 then the function Date.IsInPreviousNMonths( [Date], 12 ) will return TRUE for all dates that are in the months December 2018 through to and including November 2019. I therefore also need to specify the current month. 

Now, this is all very straightforward and I can do it without coding. Looking at the code, however, it is easy to see how you might extend this with other conditions with extra “or” conditions such as Date.IsInNextMonth( [Date] ). 

Now, if I click “Close and Apply”, Power BI will close Query Editor and apply the query. 

As a final check, I can explore my BITransSplits table in the Data View in Power BI and verify that: 

  • The oldest transaction that is outstanding is 1st December 2018 

  • The total number of transactions is now 27,748.

Results of filter

 

A more complex filter based on more than one column 

Let’s say that instead indiscriminately removing transactions over a certain age, I still wanted to retain all transactions that have an outstanding value (such as unpaid invoices) regardless of their age. This requires something a little more complex, but still not difficult. 

I can do it in three steps and I do need to write some M code. The steps are: 

  1. Create a new column that tests for all conditions for my filter, returning TRUE if I want to keep a record and FALSE if not. 

  1. Filter my BITranSplits table, based on this new column, keeping just the TRUE values 

  1. Delete the new column (as I no longer need it). 

This last step might seem odd, but Power BI runs its queries at the time of refresh. A query comprises a sequence of steps that are executed in order. Once they are done they are done. So the last three steps for my query to create my BITransSplits table in my dataset will be: 

  • Create a custom column 

  • Apply filter 

  • Remove columns 

To create a custom column, in Query Editor first select the query that you want to edit (in this case BITransSplits), then go to the “Add Columns” tab and select “Custom Column”. 

Custom column

 

You will then see a form into which you can write your M code: 

Custom column

 

In this form, I can write the filter command. Notice that the “available columns” on the right are all of the columns that are currently in my BITransplits table as a result of the last step of the query. This means that I can create a formula that is based on many fields, not just the [Date] field. 

I want to keep all transactions where the [Outstanding] amount is not zero and remove all of the others that are more than 12 months old. In this first step want the result to be a TRUE/FALSE column that I can simply filter in the next step. I don’t need to worry about giving the column any particular name since I am going to delete it in the third step. 

Custom column

 

I use two M functions: Date.IsInPreviousNMonths, Date.IsInCurrentMonth and two operators “or” and “<>”. 

After clicking “OK”, I see my new Custom column in Query Editor. 

Custom column

 

 

Now, I can filter this column [Custom] to retain just those rows that have a value = “TRUE”. 

Filter rows

 

As a final third step, I can now remove this temporary column [Custom]. 

Remove column

 

If I look in the Query Settings pane on the right, I can see the last three steps that I have just added to my query “BITransSplits”: 

Query settings

 

Now, if I click “Close and Apply”, Power BI will close Query Editor and apply the query. 

Once this has completed, as a final check, I can explore my BITransSplits table in the Data View in Power BI and verify that: 

  • The oldest transaction that is outstanding is 1st Jan 2018 (no filter) 

  • The oldest transaction that is not outstanding is now 1st December 2018 

  • The total number of transactions is now 27,762 

Final result

 

I can also verify that all of the 14 transactions that are older than 1st December 2018 have an outstanding balance: 

Final result

 

Summary 

It is very easy to reduce the size of your dataset without losing any critical information, using Query Editor to apply filters on data load. Doing this will reduce the size of your model and improve its performance. It will not, however, reduce the time that it takes to refresh your data in your model.

It will slightly increase this time as Power BI still needs to load all of the data before it applies these new filter conditions. On balance, though, this is usually a tiny price to pay for what can be a great benefit. 

Useful links 

Tags:

Replies (0)

Please login or register to join the discussion.

There are currently no replies, be the first to post a reply.