Simon Hurst explores a range of Excel problems that can be solved using Excel's lookup functions and reveals how to reduce the number of VLOOKUP() errors.
In part one we started with an introduction to lookup functions in general and went on to examine one of the two different ways in which the VLOOKUP() function can be used to return a value from a multi-column range, based on matching a lookup value against a value in the first column of the range.
This time we will look at the other way of using VLOOKUP() – the use of an approximate, rather than exact, match before going on to look at alternatives to VLOOKUP().
An approximate VLOOKUP()
We finished part one with an example of the use of an approximate VLOOKUP() function to find which band a value falls in:
The use of TRUE as the fourth argument of our function means it will work as an approximate lookup. Approximate doesn't mean nearest as our example clearly shows. Our second value is 499,999 only 1 away from 500,000 but 299,999 away from 200,000, yet it is the value of 200,000 that it 'matches' to give us a discount rate of 5%. This is because an approximate lookup matches the largest item in the lookup table that is less than or equal to the lookup value.
Why approximate can be thousands of times faster than exact
The approximate lookup works by comparing the lookup value to the middle item in the first column of the lookup table than then going to the middle of the range of cells above or below that cell and so on.
This has two important consequences: the first column of the lookup table must be sorted in ascending order for this process to work correctly and an approximate lookup is far faster than an exact lookup. The time an exact lookup takes depends directly on the position of the matched item in the list. Because an exact match starts at the beginning of the list and checks each item in turn to see if there is a match, it can have to check as many matches as there are items in the table if the match is the last item in the table.
For an approximate lookup, the worst case would see about a dozen checks necessary to match a table of 1,000 items, making it potentially almost 100 times faster. The higher the number of items the more pronounced the difference in speed. Doubling the 1,000 would add a single check to an approximate lookup up but could add up to 1,000 checks to an exact lookup.
If you are using an exact lookup that is slowing down your spreadsheet, one option is to ensure that the first, lookup, column remains sorted in ascending order and change your exact lookup to an approximate one. However, you also need to consider what happens when no match is found. An exact lookup will return an #N/A error whereas an approximate lookup is likely to return a match.
If you are trying to find a unique item such as a client ID, this would not be a good idea. Accordingly, you would also need to build in a check to identify a failure to find an exact match. Here, we are checking whether an approximate match in column 1, the lookup column itself, is equal to the lookup value. If it isn't, no exact match has been found and we return NA() otherwise we go on to perform the approximate lookup to return the value from the column we actually want to use:
=IF(VLOOKUP(H17,$K$4:$L$13,1,TRUE)=H17,VLOOKUP(H17,$K$4:$L$13,2,TRUE),NA())
Although we are performing two approximate lookups to replace a single, exact, lookup, when we are working with large lookup tables this can still be much faster.
Is using MATCH() and INDEX() always better than using VLOOKUP()?
Introduction
If you search the Internet for match v vlookup you will find lots or articles suggesting that all VLOOKUP() formulae should be replaced with the equivalent MATCH(), INDEX() formula. Before debating the advantages and disadvantages of both methods, we'll look at how to use MATCH() and INDEX() to replace VLOOKUP() in our example.
How MATCH() and INDEX() work
As mentioned in part 1, MATCH() is very similar to VLOOKUP() and HLOOKUP(), the most obvious difference is that MATCH() returns the position of a value rather than the value itself. In order to get at the value, you need to combine INDEX() with MATCH(). MATCH() finds a 'position'; INDEX() uses a position to return a value:
We only need to use two arguments for the INDEX() function. The first argument specifies the array that contains the range of cells we want to return our value from which in our case are our discount rate column values.
The second argument is the position in this array where the value we need lies. This second argument uses the MATCH() function to find this position by matching our sales value against our column of minimum sales values using an approximate 'less than' match.
MATCH() takes three arguments: the value to be matched, the range of cells to match against and the equivalent of the fourth VLOOKUP() argument to specify an approximate or exact match.
Why MATCH(), INDEX() is more flexible
This simple example demonstrates some of the arguments on both sides. VLOOKUP() is simpler – a single function rather than a combination of two functions.
The use of MATCH() and INDEX() is more flexible: you can return a value from columns to the left, as well as to the right, of the lookup column and the third argument of MATCH() allows you to use two different types of approximate lookup as well as an exact lookup.
Using 0 performs and exact match and using 1 performs an approximate match equivalent to the VLOOKUP() approximate match, based on finding the largest value less than or equal to the lookup value.
MATCH() allows a third option, using -1 will find the smallest item less than or equal to the lookup value – for this to work the lookup range must be sorted in descending order. This can be used to find the next item in a band rather than the previous item. For example, you could find the next value after today in a list of descending dates.
The MATCH() and INDEX() combination can also be used to perform a two dimensional lookup. Because INDEX() can be used with a range containing rows and columns, MATCH() functions could be used to specify both row and column positions.
In this example, this is used to find an applicable VAT rate for a transaction based on an exact column match to find the type of VAT rate and an approximate, less than, match to find the percentage applicable for a transaction at a particular date:
The first match has a third argument of 0 for an exact match and the second uses 1 for an approximate match which will find the largest (newest) date that is less than or equal to the transaction date. This gives us the VAT percentage for the VAT rate in effect at that date.
Speed
Opinions differ regarding the relative speeds of using VLOOKUP() compared with INDEX() and MATCH(). It's probably safe enough to say that for simple lookups with a small lookup table and up to a few hundred transactions any difference is likely to be negligible.
There are more complicated uses where the way that MATCH() and INDEX() work and the additional flexibility they allow can make substantial performance improvements.
So what's the answer?
Given that, if you understand VLOOKUP() you can easily understand MATCH() and INDEX, and vice versa, I'm not convinced by the argument that, because MATCH() and INDEX() can do everything VLOOKUP() can do and more, you might as well abandon VLOOKUP(). For a lot of uses VLOOKUP() is simpler and works just as well. I think it's important to understand how both methods work and to choose the simplest one that does the job.
Coming up…
Of course, there are more solutions to lookup problems than the use of Excel functions. The first in our series of Power Query technique articles looks at using a Power Query merge as an alternative.
You might also be interested in
Simon Hurst is the founder of technology training consultancy The Knowledge Base and is a past chairman of the ICAEW's IT Faculty.
Replies (4)
Please login or register to join the discussion.
Why does almost every Excel blogger blindly compare Vlookup in a native/naked state (as a single function) with an Index/Match combo (at least one secondary function returning arguments to the primary function)??
This is a grossly unfair comparison, because the perceived (but not real) limitations of Vlookup can be easily overcome by nesting it with other functions!
For example:
1. Two Column functions, one Columns function, or even the Match function can dynamically return the col_index_num argument.
2. The Choose function can be used to make Vlookup to act very much like an Index/Match combo, by:
(a) looking to the left, and
(b) referencing two non-contiguous columns for the table _array consisting of a lookup value range and a return value range (vs a large contiguous range with superfluous columns between the two target columns).
This can be achieved by using the following formula syntax: =VLOOKUP( Lookup_value, CHOOSE({1,2}, Lookup_range, Return_value_range), 2, FALSE ).
Note that using {1,2} does NOT make this an array/CSE formula as CHOOSE accepts the array as an argument.
Check out the full explanation/tutorial on Excel MVP Mynda Treacy's MyOnlineTrainingHub website.
I'm happy to agree to disagree on whether using INDEX() and MATCH() is easier than using VLOOKUP() and CHOOSE() for certain types of lookup. But the conclusions I came to were not arrived at 'blindly'. I am aware of the use of CHOOSE() to pass arrays to other functions such as VLOOKUP() and, in particular to allow VLOOKUP() to lookup from right to left. The article was mainly aimed at 'defending' the use of VLOOKUP() for straightforward lookups and explaining situations when INDEX() and MATCH() were more flexible and could do things that VLOOKUP() can't.
The use of CHOOSE() in the way you describe is certainly worth mentioning so thanks for that, if not for the 'blindly'. Although it does help emphasise the point I made at the end of part 1...
I was just pointing out the unfairness of the single function vs combo comparison and that several of the arguments used to discredit Vlookup are flawed. (I've seen many Excel bloggers do this, and can't understand why!)
And I wasn't making a judgement as to which of VLOOKUP or INDEX/MATCH was easier/better in various circumstances, though I do think there's little difference between using INDEX/MATCH and VLOOKUP/CHOOSE for straightforward (columnar) lookups, and agree that the former does offer more options, particularly where range intersections are required.
Thanks Fastlane - I think we are in agreement. In part, the article was inspired by a desire to redress the INDEX() MATCH() v VLOOKUP() argument. I've always felt that for a simple lookup VLOOKUP() is the most straightforward method, and agree with you that there is no reason to claim that INDEX() MATCH() is always preferable to VLOOKUP().