Power Query techniques: Grouping
In his latest Power Query editor tutorial, Simon Hurst tackles the grouping option that allows you to generate a summary table from a column of data, based on the values in one or more columns.
Although this is easy enough to achieve in Excel without the use of Power Query by using a PivotTable, if the summary is intended as an intermediate step in creating a report, rather than as an end result, the PivotTable format may prove less easy to work with than a normal Excel Table.
Power Query grouping allows you to create a summary table directly from your raw data. Just like a PivotTable, this summary table will need to be refreshed to reflect changes in the underlying data.
Content seriesView full content series
Note: the Power Query tools started off as an optional add-in for Excel version 2010 and 2013 before they became an integral part of Excel 2016 as the Get & Transform group of the Data Ribbon tab. The same tools can be found in the External data group of the Home Ribbon tab of Power BI Desktop. Throughout this series we have referred to all of these tools as 'Power Query' and base our instructions on the use of Excel 2016.
The last episode of the EZ Guide series included a look at Excel's conditional summary functions. These allow the calculation of totals based on the values in one or more columns. This works well enough if you need the total for a defined list of values but, if you need to extract the total for each of the values in a column, it is less easy to automate the process.Let’s consider an example where we need to summarise our values in order to create a summary report.
You could, for example, copy the entire column that you want to use to summarise your values by to another location. You could then use Insert Ribbon tab, Tables group, Table (or the Control+t keyboard shortcut) to turn it into an Excel Table.
The Table Tools, Design Ribbon tab, Tools group includes a Remove Duplicates command that would ensure each item is included only once. You could then use SUMIF() or SUMIFS() to create your summary formula in the adjacent column:
This will work if there are no new values entered into the column that you are using to summarise by, but, if there are, the process would need to be repeated manually.
As a more automatic alternative, we will create a query based on our Table of data by going to the Data Ribbon tab (in earlier versions of Excel Power Query has its own tab).
Having selected any cell in our Table, from the Get & Transform Ribbon tab (or for Office 365 subscribers the Get & Transform Data Ribbon tab) we use From Table. This loads our Table into the query editor. The Group By command is in the Table group of the Transform Ribbon tab. We can use the Basic option if we just need to group our data using the values in a single column.
Here, we have chosen our Country column. We can choose a name for our new aggregate column and select the column for which we want to create summary values and the type of summary option to use. This list includes equivalents for the Excel conditional summary functions COUNTIFS(), SUMIFS() and AVERAGEIFS() and the more recently introduced MAXIFS() and MINIFS().
Industry insightsView more
In addition, we can use Median, Count Distinct Rows and All Rows. Count Distinct Rows counts the different values in our column, ignoring duplicates. All Rows is a little different, it creates a column in which each ‘cell’ contains all of the individual rows for the item that you are grouping by.
You can then expand these rows, or create an aggregate from them, at a later stage in your query:
The Advanced option for grouping extends the grouping options in two ways. You can group by more than one column. Here, we are summarising by Country and by Product:
You can also add more than one aggregation column. This allows you to group by the value in a column and show a summary, a count, a maximum and a minimum for example.
Here, we have combined both of these to show a range of aggregations by Country by Product:
Once we have grouped and aggregated our data as required, we can use Close and Load to create the summary Table in Excel.
In effect, it is the equivalent of what we can achieve by copying our column, removing duplicates and adding summaries with conditional summary functions but, because we have used Power Query, should the underlying data change, then all we need to do is to right-click in our Table and use Refresh to update our summary table to include all the data in the original table.
Also, because our Power Query output is an Excel Table any formulae that we create that are based on entire columns of that Table will automatically adjust to reflect any changes in our refreshed data:
Although this might seem quite a simple use of Power Query, it is perhaps fitting that as the last article in this series, it highlights some of the most important reasons why Power Query is so much more than a method of processing and working with external data:
- In our example, all the data that we have used is within our single Excel workbook;
- Power Query allows us to cure one of Excel’s main shortcomings – the difficulty of dealing with changing volumes of data;
- Power Query requires us to think a bit differently. In this case, we need to adjust from the assumption that all values in our workbook will automatically be updated as precedent cells are changed to understand the need for a refresh operation, whether manual or timed. This is no different from working with PivotTables.
The majority of Excel users could improve the automation and reliability of their spreadsheets by achieving an understanding of the uses of Power Query, even when just working within Excel and solving relatively simple Excel problems.
Simon Hurst is the founder of technology training consultancy The Knowledge Base and is a past chairman of the ICAEW's IT Faculty.