ABC of Power BI: ‘I’ is for Iterateby
Continuing his ABC of Power BI series, Hugh Johnson discusses iterations in calculations and what to look out for.
Let’s say that you have a sales table, with three columns for Customer, USD_Amount, Exchange_Rate (USD/GBP) and you want to compute the total GBP sales amount for all customers. To get the right result, you must evaluate each row in turn and then aggregate the result, not the other way around. This is basic GCSE maths. It is something that you would almost do without thinking in Excel but requires a little more thought in Power BI.
A subtlety in Power BI, that in my opinion is not immediately clear to someone starting out with it, is that there are two types of DAX calculation functions – Aggregators and Iterators – and that there are two ways to calculate a value using these functions: as a row in a Calculated Column or using a Measure. You can also create columns in Power Query during the data transformation and load process.
Choosing the correct combination of DAX function and Measure/Calculated Column is important when dealing with things that must be considered row by row, such as exchange rate calculations or the classic Total Price = Unit Price x Quantity.
You can see a very simple example here, where two customers have made purchases in USD at different times, with different applicable USD/GBP exchange rates.
Whether you multiply first and then add, or add first and then multiply will give you either the correct £3,062 or incorrect £6,095.50 answer.
In Excel, to do this correctly is almost instinctive. You just add another column “GBP_Amount”, evaluate each row, and sum the results.
In Power BI, there are (at least) three ways to do the calculation and get the answer that you want. There are also several ways to get different answers. If you come from an Excel background, the most obvious way to the correct answer may be to create a new column for the GBP_Amount and sum all the values in this column. You could create this new column using a DAX Calculated Column or add this new column as part of your data transformation and load in Power Query.
The example below is using a DAX Calculated Column to create CustomerSales[GBP_Amount].
This is very simple and intuitive for anyone from an Excel background. Calculated Columns are not always the best way to go, though. Generally, if you just want to do mathematical calculations that summarise data (what is the total customer sales value in GBP) and display the results, then it is preferable to use a DAX Measure vs a Calculated Column.
To get the same result using a Measure, I would use the DAX SUMX function.
Here is the new Measure dropped into a table visualisation, alongside the Calculated Column.
You can see that the Calculated Column and the Measure return the same result.
SUM vs SUMX
When you start to investigate DAX functions, you will see two functions SUM and SUMX whose name implies that they do very similar things. They do. The two syntaxes are below.
They both sum up a series of values. SUM simply adds up the values in a column whereas SUMX first calculates the series of values and then adds up these values. To calculate the series of values, SUMX iterates row-by-row over the specified <table> and evaluates the <expression>.
Thus with SUMX, in this case there is no need to create the additional CustomerSales[GBP_Amount] column.
SUMX used in a Calculated Column
If we take the DAX code for our [MeasureGBPAmount] Measure and use it instead to create a Calculated Column we get a different, perhaps surprising, result:
You can see that it has computed the value £3,062 for every row of the table. The reason for this is that when we are using SUMX in a Calculated Column we have two iterations going on – one is nested within the other. The first iteration, driven by the fact that this is a Calculated Column, iterates row by row through the table to compute the value for each row of the new column. The second iteration is driven by SUMX and iterates again through all the rows in the table. Thus, all the rows in this new column end up with the same value.
I can’t think of one, but there may be a reason why you would want to do this, so I include this as something to be aware of and to avoid unless it is a deliberate choice.
Other X-functions and Iterators
A quick browse of the list of DAX functions will reveal more Aggregation/Iteration pairs like SUM/SUMX. These include:
- AVERAGE / AVERAGEX
- MIN / MINX
- MAX / MAXX
- GEOMEAN / GEOMEANX
- MEDIAN / MEDIANX
Generally, the X at the end of the function name (excluding MAX) indicates that the function is an iterator.
There are other iterator functions, though. A better way to look at it is in the syntax for the function. If the syntax is something like FUNCTION (<table>, <expression>), then the <table> argument is a giveaway that the function is going to iterate over the table. Such an example is FILTER that has the syntax FILTER (<table>, <filter>).
Since FILTER returns a table as its result, you will need to use it within another function that is looking for a table as one of its arguments. Here, if we create an identical Measure and Calculated Column using the following DAX code, we again get different results:
The Measure just returns the amount £1,107 for the Customer ABC row, whereas the Calculated Column returns the amount £1,107 in both rows. In the first case, the filter context given by the rows of the ‘CustomerSales’ table combined with the FILTER function means that no data are visible in the XYZ row and the Measure returns a blank. In the second case, we still have the two iterations across the whole table and so in the XYZ row the value for the GBP Amount is that for the ABC row.
What’s a correct answer?
It is tempting to look at many of the examples given here and say that Power BI is giving the wrong answer. It isn’t. It is giving the correct answers to (in many of the examples) the wrong questions.
General comments and takeaways
- Some DAX functions iterate row by row over a table. It is important to recognise these and that they will return different results if applied in a Calculated Column vs a Measure. An Iterator function will always have <table> as its first argument.
- Having made the point above, I cannot think of an example when I have needed or wanted to use an Iterator function in a Calculated Column, but that day may yet come.
- I used FILTER as an example of an iterator. The truth is that I very rarely use FILTER, rather CALCULATE in most circumstances where I want to filter a table.
I have not intended this article to be a fully comprehensive and detailed guide, but rather to raise awareness of iterations in calculations and things to consider/look out for.
I would highly recommend further reading on the subject, starting with this excellent post from the guys at SQLBI.
You might also be interested in
I am a founder director of Accounting Insights Ltd, a specialist provider of Power BI reporting solutions to accountants in practice and in industry. I help accountants to use Power BI to create intuitive, engaging reports from their accounting data. I deliver management packs,...