Supercharge Excel: VLOOKUP() v merging tables – approximate lookupby
Simon Hurst continues his series of articles that looks at how Excel's Power Query/Get & Transform tools can be used for a great deal more than just acquiring data from external sources.
- The first part of the series included an overall introduction to how these tools work, what they are capable of and how they might require a different way of thinking compared to 'standard' Excel techniques
- Part two looked at using the Power Query tools to automate the process of appending several Tables of data into one single Table
- Part three continued the consideration of appending Tables of data, but this time automatically including tables from particular files or folders.
- Part four moved on to merging Tables of data and showed how to add a product category column to our sales invoice table by 'looking up' each product in a separate categories table.
The part four merge was based on there being an exact match for each of our sales invoice values. This time, we are again going to merge two tables so that our invoice table includes an additional column, but we are going to assume that our reporting periods are not exact calendar months and we, therefore, need to match each invoice date with a Periods table. This table will include each period end and start date.
Consequently, we can't use an exact match because most of our dates won't exist in the Periods table. Instead, we need to create the equivalent of an Excel approximate VLOOKUP() type of match, where a value matches the largest item in a sorted table that is less than or equal to that value.
As stated in the Part one introduction, for convenience, we will refer to Power Query throughout but Excel 2016 Get & Transform and Power BI Get Data operate in a similar way. Now that we are starting to cover some more advanced Power Query features, it's also worth mentioning that Power Query is updated monthly. If what you see differs from what is shown and described, it might be due to the version of the Power Query tools that you are using.
Before we start, this approach is more complicated than the techniques we have covered to date and, given that the approximate form of VLOOKUP() doesn't suffer from the same performance issues as the exact form, taken in isolation it would almost certainly be easier just to use VLOOKUP() rather than Power Query for our example. However, it does demonstrate some useful additional techniques, and it could well be valuable as part of an overall Power Query solution. We will see an example of this when we combine several of the techniques from the series to fully automate the process of producing a set of consolidated management accounts in the next part of the series.
The fact that VLOOKUP() can work in two different ways is the cause of many errors in the use of the function. There is an optional fourth argument which is normally set to FALSE to ensure that VLOOKUP() only finds an Exact match and returns #N/A if there is no such match. If the fourth argument is set to TRUE or, crucially, omitted or forgotten, VLOOKUP() defaults to using an Approximate match. The VLOOKUP() approximate match is very specific. It certainly doesn't find the 'most similar' item however you might define that. In fact, for it to work, the first column of the lookup table has to be sorted in ascending order. The lookup value will then be matched against the largest item in the first column of the table that is less than or equal to the lookup value.
Our example should make this clearer. We have a table with the period start and end dates that we want to use to define our 'months' based on full weeks rather than actual month end dates. To find which period an invoice falls into, we compare the date with our sorted column of period start dates. For an invoice to fall into a period, it needs to match the latest date that is before or equal to it. So, 26 August 2010 will match the 6 August 2010 since 06/08/10 is the largest item in the table that is less than or equal to 26/08/10; the next value in the Period start column being 03/09/10:
Approximate lookup in Power Query
There are various ways to replicate the way that an approximate lookup works in Power Query. There are online examples of using Power Query's M code programming language to do so, but we will look at a method that just uses merged tables and standard Power Query interface options. As far as I can see, this method does produce the same result as an approximate VLOOKUP() but if you can see a flaw in the approach, please add a comment to this article.
As we did in Part 4, we are going to start by merging our two tables. First, we need to make our Periods table available as a query by using From Table and loading the result as a connection only. We have already made our invoices Table available in the same way in Part 4. We can then use New Query, Combine Queries, Merge:
We have selected our Table of Invoices (tblUK) and our Periods Table (tblPeriods) from the dropdowns and chosen OrderDate and Period start as our matching fields. The default Join Kind is Left Outer and, as you can see, this only finds 10 matches in tblUK for our Period start values in tblPeriods. We will change our Join Kind to 'Full Outer', and this will include all the values from both tables, whether or not there is a match. This is important as it ensures that, even if a value in tblPeriods doesn't have a match in tblUK, it will be available for our approximate match procedure.
Expand merged tables
As explained in the previous part, a merged table in a query initially shows as a New Column containing 'Table', The Expand/Aggregate icon in the column heading displays a dialog that allows us to expand the table to show any or all of the columns that it contains, or to display an 'aggregate' by applying an aggregate function to any or all of the columns. This is because a table can contain multiple rows as well as multiple columns.
In our case, our Periods table only contains a single row for each period so we choose to expand the Period start and Period columns. As you can see below, where there is no match between the tables, the columns contain null:
We need to add a custom column that will show the OrderDate where there is one and the Period start date where OrderDate is null. The recently-introduced Conditional Column command in the General group of the Add Column Ribbon tab makes this easier than it used to. Before its introduction, you would have had to add a Custom Column and worked out the formula for yourself. We have called our new column FullDate:
We then sort our rows in ascending order using our new column:
The reason for choosing the Full Outer join kind to include all rows from both tables should now be apparent. If we had only included matching rows, the Period start date of 03/09/10 would not be included in our FullDate column. This would mean that, when we use the Transform Ribbon tab, Any Column group, Fill Down command to ensure all of our invoices have a Period value, dates between 06/08/10 and 30/09/10 would all be allocated to the 31/08/10 period. Here is our date after using Fill Down in the Period column:
Having used Fill Down, we can now remove any rows for which the tblUK values are null, remove any columns that we no longer need and format our Period as just a date. We can Close & Load our new query to a worksheet and then use it as the basis of a PivotTable that displays the totals by period:
Coming up next…
Using Power Query to fully automate the production of management accounts.