Simon Hurst explores a range of Excel problems that can be solved using Excel's lookup functions and reveals how to reduce the number of VLOOKUP() errors.
In part one we started with an introduction to lookup functions in general and went on to examine one of the two different ways in which the VLOOKUP() function can be used to return a value from a multi-column range, based on matching a lookup value against a value in the first column of the range.
This time we will look at the other way of using VLOOKUP() – the use of an approximate, rather than exact, match before going on to look at alternatives to VLOOKUP().
An approximate VLOOKUP()
We finished part one with an example of the use of an approximate VLOOKUP() function to find which band a value falls in:
The use of TRUE as the fourth argument of our function means it will work as an approximate lookup. Approximate doesn't mean nearest as our example clearly shows. Our second value is 499,999 only 1 away from 500,000 but 299,999 away from 200,000, yet it is the value of 200,000 that it 'matches' to give us a discount rate of 5%. This is because an approximate lookup matches the largest item in the lookup table that is less than or equal to the lookup value.
Why approximate can be thousands of times faster than exact
The approximate lookup works by comparing the lookup value to the middle item in the first column of the lookup table than then going to the middle of the range of cells above or below that cell and so on.
This has two important consequences: the first column of the lookup table must be sorted in ascending order for this process to work correctly and an approximate lookup is far faster than an exact lookup. The time an exact lookup takes depends directly on the position of the matched item in the list. Because an exact match starts at the beginning of the list and checks each item in turn to see if there is a match, it can have to check as many matches as there are items in the table if the match is the last item in the table.
For an approximate lookup, the worst case would see about a dozen checks necessary to match a table of 1,000 items, making it potentially almost 100 times faster. The higher the number of items the more pronounced the difference in speed. Doubling the 1,000 would add a single check to an approximate lookup up but could add up to 1,000 checks to an exact lookup.
If you are using an exact lookup that is slowing down your spreadsheet, one option is to ensure that the first, lookup, column remains sorted in ascending order and change your exact lookup to an approximate one. However, you also need to consider what happens when no match is found. An exact lookup will return an #N/A error whereas an approximate lookup is likely to return a match.
If you are trying to find a unique item such as a client ID, this would not be a good idea. Accordingly, you would also need to build in a check to identify a failure to find an exact match. Here, we are checking whether an approximate match in column 1, the lookup column itself, is equal to the lookup value. If it isn't, no exact match has been found and we return NA() otherwise we go on to perform the approximate lookup to return the value from the column we actually want to use:
Although we are performing two approximate lookups to replace a single, exact, lookup, when we are working with large lookup tables this can still be much faster.
Is using MATCH() and INDEX() always better than using VLOOKUP()?
If you search the Internet for match v vlookup you will find lots or articles suggesting that all VLOOKUP() formulae should be replaced with the equivalent MATCH(), INDEX() formula. Before debating the advantages and disadvantages of both methods, we'll look at how to use MATCH() and INDEX() to replace VLOOKUP() in our example.
How MATCH() and INDEX() work
As mentioned in part 1, MATCH() is very similar to VLOOKUP() and HLOOKUP(), the most obvious difference is that MATCH() returns the position of a value rather than the value itself. In order to get at the value, you need to combine INDEX() with MATCH(). MATCH() finds a 'position'; INDEX() uses a position to return a value:
We only need to use two arguments for the INDEX() function. The first argument specifies the array that contains the range of cells we want to return our value from which in our case are our discount rate column values.
The second argument is the position in this array where the value we need lies. This second argument uses the MATCH() function to find this position by matching our sales value against our column of minimum sales values using an approximate 'less than' match.
MATCH() takes three arguments: the value to be matched, the range of cells to match against and the equivalent of the fourth VLOOKUP() argument to specify an approximate or exact match.
Why MATCH(), INDEX() is more flexible
This simple example demonstrates some of the arguments on both sides. VLOOKUP() is simpler – a single function rather than a combination of two functions.
The use of MATCH() and INDEX() is more flexible: you can return a value from columns to the left, as well as to the right, of the lookup column and the third argument of MATCH() allows you to use two different types of approximate lookup as well as an exact lookup.
Using 0 performs and exact match and using 1 performs an approximate match equivalent to the VLOOKUP() approximate match, based on finding the largest value less than or equal to the lookup value.
MATCH() allows a third option, using -1 will find the smallest item less than or equal to the lookup value – for this to work the lookup range must be sorted in descending order. This can be used to find the next item in a band rather than the previous item. For example, you could find the next value after today in a list of descending dates.
The MATCH() and INDEX() combination can also be used to perform a two dimensional lookup. Because INDEX() can be used with a range containing rows and columns, MATCH() functions could be used to specify both row and column positions.
In this example, this is used to find an applicable VAT rate for a transaction based on an exact column match to find the type of VAT rate and an approximate, less than, match to find the percentage applicable for a transaction at a particular date:
The first match has a third argument of 0 for an exact match and the second uses 1 for an approximate match which will find the largest (newest) date that is less than or equal to the transaction date. This gives us the VAT percentage for the VAT rate in effect at that date.
Opinions differ regarding the relative speeds of using VLOOKUP() compared with INDEX() and MATCH(). It's probably safe enough to say that for simple lookups with a small lookup table and up to a few hundred transactions any difference is likely to be negligible.
There are more complicated uses where the way that MATCH() and INDEX() work and the additional flexibility they allow can make substantial performance improvements.
So what's the answer?
Given that, if you understand VLOOKUP() you can easily understand MATCH() and INDEX, and vice versa, I'm not convinced by the argument that, because MATCH() and INDEX() can do everything VLOOKUP() can do and more, you might as well abandon VLOOKUP(). For a lot of uses VLOOKUP() is simpler and works just as well. I think it's important to understand how both methods work and to choose the simplest one that does the job.
Of course, there are more solutions to lookup problems than the use of Excel functions. The first in our series of Power Query technique articles looks at using a Power Query merge as an alternative.