Power Query techniques: Merging tables of data
Over the past few months we have looked at the range of BI tools included as part of Excel or in the separate Power BI Desktop application. As part of this, we have looked at a range of different techniques.
In this series we are going to examine some of those techniques in more depth in order to understand what they are capable of, and how best to use them, not only when working with external data, but also for changing the way that we work within Excel workbooks.
Content seriesView full content series
To coincide with the publication of the EZ Guide on the use of the VLOOKUP() function, we are going to start by looking at merging tables of data within Power Query. This time we are just going to look in detail at the process of merging two tables before looking at troubleshooting and more advanced issues next time.
Note: the Power Query tools started off as an optional add-in for Excel version 2010 and 2013 before they became an integral part of Excel 2016 as the Get & Transform group of the Data Ribbon tab. The same tools can be found in the External data group of the Home Ribbon tab of Power BI Desktop. Throughout this series we will refer to all of these tools as 'Power Query' and base our instructions on the use of Excel 2016.
At its simplest, joining two tables in Power Query requires each table to include a field that contains the value that will be used to join one table to the other. We'll start off with an example based on small tables of data within an Excel workbook.
We have created an Expenses table and a Clients table. Each table has a Client ID field which will form the link between the two tables. We can click any cell in each table in turn and use the From Table/Range option in the Get & Transform Data group of the Excel 2016 Data Ribbon tab to create queries for each table.
Within the Query Editor, we can join the queries using Home Ribbon tab, Combine group, Merge Queries dropdown, Merge Queries as New:
As the descriptions suggest, Merge Queries as New will create a new, third, query based on the merge operation we choose, whereas Merge Queries will add the merge step to our current query.
Industry insightsView more
The Merge screen lets us choose any existing queries from the dropdown. We can then click in the column in each query that we want to use to join our queries. The Merge screen shows how many of the first set of rows are matched, in our case 23 out the first 24. The number of matches depends on the type of join selected from the Join Kind dropdown.
For now, we'll leave this at the default of 'Left Outer' which will keep all the rows from the first (top) table and matching rows from the second table. When we click on OK to create our merged query we can see the effect of this:
If we click in the white space to the right of the word 'Table' in the Clients column we will see a preview of the table of data that that 'cell' contains. It is important not to click on the word 'Table' itself, as this will create a specific 'Navigation' step, selecting that row rather than just showing a preview. As we can see, Client ID 6 has no match in our Clients table, so an empty table is returned.
In order to include information from our Clients table in our query, we click on the Expand icon at the right of the new column heading. Because the column contains a Table in each row, and that Table could contain multiple rows, the initial choice is between creating additional rows where multiple rows exist, or using an aggregate function to total values from the table within the existing row.
In our case we will choose Expand and we can then select the columns that we want to add to our query rows and whether we want to prefix their names with the original column name. We will expand and use our Client name and Client Discount columns, including the prefix:
Here we can see the effect of expanding our Clients column. We can see that row 24 shows nulls as there is no matching value for Client ID 6 in our Clients table:
The Join Kind that we choose in the Merge screen is vital to how our queries are merged. We have deliberately set up our tables so that each includes a value not found in the other, and we can see how the join kind effects matching and non-matching rows. We can return to our Merge screen by clicking the 'gear' symbol for the Source step in APPLIED STEPS. The Join Kind dropdown shows 6 options:
The join type names themselves are not very helpful if your expertise lies more in the field of spreadsheets than databases but, fortunately, each also includes a more user-friendly explanation.
We have already seen that the Left Outer join preserves all the rows in the table we select at the top of the Merge screen, whether or not a match is found for the Client ID in our second, bottom, table. Perhaps not surprisingly, the Right Outer join works in the opposite way with any rows from the top table that don't find a match being excluded, but non-matching rows from Clients being included:
Full Outer includes all rows from both tables regardless of whether there is a match:
Where you need to 'lookup' a value in one table from another, then you will usually use a Left Outer join. The Left Outer would allow us to find the Client name and discount for each of our expense rows where there was a match, leaving us to decide what to do with null values in those columns where there was no match.
If we needed to create a list of all of our clients, including those with no expense entries, showing the aggregated total of the Value column, we would need to ensure our Clients table was selected at the top and Expenses below and again use a Left Outer join, and then the Aggregate option for our column:
The Inner join can be used where you are happy to exclude rows for which there is no join from your query altogether.
These joins are perhaps most useful for troubleshooting, making it easy to identify items for which there should be a match but where the corresponding value is missing from the other table. These join types will create tables that list all those rows without a match.
In part two we will look at more advanced merges including joins that involve more than a single field; approximate lookups; practical uses of merged tables beyond the simple 'lookup' covered here and we will also address some of the issues and problems that you might come across when merging tables.
Simon Hurst is the founder of technology training consultancy The Knowledge Base and is a past chairman of the ICAEW's IT Faculty.