In his latest article on Power BI, Hugh Johnson weighs up Query Editor and DAX and asks: which is the better option for preparing your data set?
When you look at your dataset in Power BI, the tables and columns that you see are definitely the result of Power BI “Queries” and possibly also the result of Calculated Tables and Columns in DAX.
“What?” I hear you say. “But I never wrote any Power BI queries. I just imported the data using ‘Get Data’”.
Well here is the thing. When you stepped through the wizard after clicking on “Get Data” and selected your data source, in the background Power BI created a set of queries (one for each table) that it executed to get your data and create your dataset.
Perhaps the only clue that there was something called “queries” being created would have been a brief alert like this, that soon disappeared.
Whenever you refresh your data, Power BI executes these queries.
Common problems with your “starting” dataset
It is highly likely that when you first load your dataset it is not exactly what you need, or at least sub-optimal. Data sources are typically operational systems that are not perfect for reporting. Common problems are:
Unnecessarily old data. Your dataset may include transactions that are several years old, that you have no interest in. A high volume of old, uninteresting transactions will just clog up your model and make it slow.
Unnecessary columns. The tables from many operational systems may contain many columns that you do not need for reporting. These columns may slow down your model and make it harder to use your dataset.
Duplicate data across related tables. Many transactional tables contain data that are also in other reference tables, such as customer names and addresses. This can cause problems with filters in your relational dataset if someone inadvertently tries to use the reference data from a transactional table in a filter or slicer. It also slows down your report because it is unnecessarily carrying too many data.
Your data are not in the right structure. A period column might be a text value like “January 2019”, but what you really want is the month-end date 31/01/2019 in date/time format
Duplicate column rows. This can happen when the table is really the combination of two or more separate things that your source system has combined into a single table.
Let me illustrate this last point with an example from the Xero table structure available through its API. One of my 10 tables that I loaded into Power BI is ‘Invoices’. I expected that this table contained the invoice header information. It does, but not in the form that I had anticipated.
When I browse the data, I see duplicate values for my InvoiceID.
Further investigation reveals that ‘Invoices’ has a misleading name. It is in fact a table of invoices and payments. If multiple payments have been made against one invoice, then the invoice header information including Invoices[InvoiceID], Invoices[Amount Due], Invoices[Date] and Invoices[Due Date] are duplicated. This means that the following DAX formula will give me an inflated figure for the total amount outstanding:
Outstanding := SUM ( Invoices[Amount Due] )
Most accounting data sources that you connect to will have some subtleties like this, which means that you will need to do some work on improving your dataset for your anticipated reporting needs.
Preparing your dataset – DAX or Query Editor?
Power BI provides you with two tools that you can use to get your dataset into the form that you want: DAX and Query Editor. As a general rule of thumb, data analysts consider it better practice to push any fundamental data modifications (as opposed to aggregations) as far upstream as possible.
Therefore, if you cannot modify the data source, the next best thing is to modify the queries that load your data. This is generally true and you do this in Query Editor, where you can do an awful lot of data manipulation without writing a single line of code.
Many things that would require quite complex DAX code, you can do in Query Editor with just a few clicks. When you do need to write code in Query Editor, you cannot use DAX though, but need to use a language called “M”. This is not particularly difficult to learn, it is just different.
A query comprises a number of steps that Power BI runs in sequence as it loads your data. As you modify a query, Power BI Query Editor adds steps to the data load process. If you make a mistake building your query, you can edit or remove individual steps. You can also change the order in which Power BI will apply the steps when it runs the query.
So, if the general rule is to move as much of your dataset manipulation as you can into Query Editor, about what kinds of things are we talking?
What you can do in Query Editor but not (easily) in DAX
Query Editor (and it’s “M” programming language if you need it) is aimed specifically at the lower level manipulation of individual tables and their columns as you load them into Power BI. The primary purpose of your queries is to load a nice, clean dataset ready for your higher-level work in DAX.
If you look at your dataset and find yourself thinking, “I don’t really want to start from here”, then maybe you should consider a bit more prep work in Query Editor. In Query Editor, you can do the following things that would be tricky (if not impossible) to do in DAX, at least not without creating additional tables or columns in your model:
Change the specification for your data source, such as the location of an Excel file.
Remove data (rows, columns, tables) from your dataset. In a recent client project, my starting Power BI file was 380MB once I first loaded my data. Many data were not required, and there were many duplicate data across the tables. After removing the old and duplicate data using Query Editor, my file size was 74MB; an 80% reduction!
Merge or append data into the same place – e.g. merge two columns without creating a new one.
Change values in a column.
Transform data in a column – specifically dates, text and numbers. Here is a video showing how to transform text data about dates into dates.
Split columns by delimiter.
Pivot and Unpivot tables (change rows to columns and columns to rows) – here is an example showing how to convert Month columns in a budget spreadsheet into budget rows based around a single data column (far more suitable for analysis in Power BI).
What you can do in DAX but not (easily) in Query Editor
DAX facilitates analysis of your dataset, rather than preparation of it. You would use DAX to:
Create any Measures
Create columns that are based on table relationships in your model
Manipulating the evaluation context to create, for example, period-to-date, period-on-period aggregations (you would most probably want to do this with Measures, which you simply cannot create in Query Editor)
The big overlap
If you want to create calculated columns in a table that just depend on values of other columns in the same table, then it is very simple to use DAX and M (in Query Editor). An example might be to add a margin column to your ‘Sales’ table as follows that is Sales[Amount] – Sales[Cost].
In DAX, you would write this:
Margin = Sales[Amount] – Sales[Cost]
In Query Editor (M), you add this to your Sales query:
Margin = [#”Amount”] – [#”Cost”]
For those of you not wanting to learn M in addition to DAX; do not be afraid. Generally, in such a situation, I would follow the guideline and use Query Editor rather than DAX and for simple calculated columns like this Power BI Query Editor will build the query steps for you without you needing to know M. I have three comments here though.
The first is to ask if you really need the new column and if you should implement this in a measure. The second consideration is one of transparency. If you are still in testing / prototyping phase of building your model it is much easier to spot your calculated columns if you implement them in DAX.
The third is performance. Moving calculated columns and tables from DAX into your queries will make your Power BI file open faster and perform better once open. It will however add processing (and therefore time) to your data refresh.
If your dataset is not quite what you want for your model, add Query Editor to your repertoire of techniques. In many cases, Query Editor will not only be a simpler alternative to DAX, but it may also dramatically improve the performance of your model.
About Hugh Johnson
I am a founder director of Accounting Insights Ltd, a specialist provider of Power BI reporting solutions to accountants in practice and in industry. I help accountants to use Power BI to create intuitive, engaging reports from their accounting data. I deliver management packs, sales reports & forecasts, liquidity & cash flow reports, stock reports, and accounts receivable analysis from Sage, Xero, Exchequer, Microsoft Business Central, Iris, and CCH.
I hold a Master of Engineering from City University London (awarded with commendation and the highest marks in my year) and a Post Graduate Diploma in International Selling from Dublin Institute of Technology (awarded with distinction and the highest marks in my year).
My personal passions are high-performance Fireball dinghy racing and food.