Director SumProduct
Tags:

# New Excel functions: XLOOKUP and XMATCH

Liam Bastick looks at two new functions that will shortly be available in Office 365: XLOOKUP and XMATCH.

6th May 2020
Director SumProduct istock_andreypopov_aw

I have explained previously why INDEX MATCH was a more powerful LOOKUP combination than VLOOKUP, which to this day remains one of the accounting world’s most popular Excel functions. However, time has moved on. There are new kids on the block. At the time of writing, these functions are still rolling out in Office 365, but they are coming.

XLOOKUP(lookup_value, lookup_vector, results_array, [if_not_found], [match_mode], [search_mode])

This function seeks out a lookup_value in the lookup_vector and returns the corresponding value in the results_array. Most of the time you will only require the first three arguments:

• lookup_value: this is required and defines what value you want to look up
• lookup_vector: this reference is required and is the row or column of data you are referencing to look up lookup_value
• results_array: this is where the corresponding item is you wish to return and is also required (even if it is the same as lookup_vector). This does not have to be a vector (ie one row or one column of cells): it may be an array (with at least two rows and at least two columns of cells). The only stipulation is that the number of rows/columns must equal the number of rows/columns in the column/row vector – but more on that later
• if_not_found: this optional argument allows you to replace the usual return of #N/A with something more informative like an alternative formula, text or a value
• match_mode: this argument is optional. There are four choices:
• 0: exact match (default)
• -1: exact match or else the largest value less than or equal to lookup_value
• 1: exact match or else smallest value greater than or equal to lookup_value
• 2: wildcard match. You should use the special character ? to match any character and * to match any run of characters.

What’s impressive, though, is that for certain selections of the final argument (search_mode), you don’t need to put your data in alphanumerical order! As far as I am aware, this is a first for Excel

• search_mode: this argument is also optional. There are again four choices:
• 1: search first to last (default)
• -1: search last to first
• 2: what is known as a binary search, first to last (requires lookup_vector to be sorted). Just so you know, a binary search is a search algorithm that finds the position of a target value within a sorted array. A binary search compares the target value to the middle element of the array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found
• -2: another binary search, this time last to first (and again, this requires lookup_vector to be sorted).

Let’s have a look at XLOOKUP versus VLOOKUP: You can see the XLOOKUP function is shorter:

=XLOOKUP(H52,F41:F47,G41:G47)

Only the first three arguments are needed, whereas VLOOKUP requires both a fourth argument and, for full flexibility, the COLUMNS function as well. XLOOKUP will automatically update if rows/columns are inserted or deleted. It’s just simpler.

HLOOKUP has similar issues: Here, this highlights what happens if I try to deduce the student name from the Student ID. HLOOKUP cannot refer to earlier rows, just as VLOOKUP cannot consider columns to the left. Given any unused elements of the table are ignored also, it’s just good news all round. Goodbye limitations, hello XLOOKUP.

Indeed, things get even more interesting when you start considering XLOOKUP’s final two arguments, namely match_mode and search_mode, viz. Notice that I am searching the ‘Value’ column, which is neither sorted nor contains unique items. Do you see how the results have changed once more, depending upon match_mode and search_mode? The match_mode zero (0) returns “Not Found” now instead of #N/A because there is no exact match and the formula has now stipulated what to do in such an instance.

When match_mode is -1, XLOOKUP seeks an exact match or else the largest value less than or equal to lookup_value (6.5). That would be 4 – but this occurs more than once (B and D both have a value of 4). XLOOKUP chooses depending upon whether it is searching top down (search_mode 1, where B will be identified first) or bottom up (search_mode -1, where D will be identified first). Note that with binary searches (with a search_mode of 2 or -2), the data needs to be sorted. It isn’t – hence we have garbage answers that cannot be relied upon.

With match_mode 1, the result is clearer cut. Only one value is the smallest value greater than or equal to 6.5. That is 7, and is related to A. Again, binary search results should be ignored, although it is worth noting “Not Found” occurs when Excel identifies the lookup value has not been found.

The match_mode 2 results are spurious. This is seeking wildcard matches, but there are no matches, hence “Not Found” instead of N/A for the only search_modes that may be seen as creditable (1 and -1). It’s interesting to note a binary search causes errors which are not trapped by the new argument.

Clearly binary searches are higher maintenance. In the past, it was worth investing in them as they did return results more quickly. However, according to Microsoft, this is no longer the case: apparently, there is “…no significant benefit to using (sic) the binary search options…”. If this is indeed the case, then I would strongly recommend not using them going forward with XLOOKUP.

Whilst XLOOKUP wins hands down against HLOOKUP and VLOOKUP, the same cannot necessarily be said for LOOKUP. You may recall LOOKUP has two forms: an array form and a vector form. As a reminder:

• an array is a collection of cells consisting of at least two rows and at least two columns
• a vector is a collection of cells across just one row (row vector) or down just one column (column vector).

The diagram should be self-explanatory: The array form of LOOKUP looks in the first row or column of an array for the specified value and returns a value from the same position in the last row or column of the same array:

LOOKUP(lookup_value, array)

where:

• lookup_value is the value that LOOKUP searches for in an array. The lookup_value argument can be a number, text, a logical value, or a name or reference that refers to a value
• array is the range of cells that contains text, numbers, or logical values that you want to compare with lookup_value.

The array form of LOOKUP is very similar to the HLOOKUP and VLOOKUP functions. The difference is that HLOOKUP searches for the value of lookup_value in the first row, VLOOKUP searches in the first column, and LOOKUP searches according to the dimensions of array.

If array covers an area that is wider than it is tall (i.e. it has more columns than rows), LOOKUP searches for the value of lookup_value in the first row and returns the result from the last row. Otherwise, LOOKUP searches for the value of lookup_value in the first column and returns the result from the last column instead.

The alternative form is the vector form:

LOOKUP(lookup_value, lookup_vector, [result_vector])

The LOOKUP function vector form syntax has the following arguments:

• lookup_value is the value that LOOKUP searches for in the first vector
• lookup_vector is the range that contains only one row or one column
• [result_vector] is optional – if ignored, lookup_vector is used – this is the where the result will come from and must contain the same number of cells as the lookup_vector.

Like the default versions of HLOOKUP and VLOOKUP, lookup_value must be located in a range of ascending values.

Let me demonstrate with an example: LOOKUP is a great function to use with time series analysis / forecasting. Dates are in ascending order and the LOOKUP syntax is remarkably simple. As a modeller, I use it regularly when I am modelling many more forecast periods than I want assumption periods.

Here, you can see I carry assumptions only for 2020 until 2024 (the final value is 2024, just with a “+” in number formatting). The formula

=LOOKUP(G\$74,\$G\$67:\$K\$68)

returns the corresponding value for the period that is either an exact match or else the largest value less than or equal to the lookup_value. LOOKUP uses the top row of the table for looking up its data and the final row for returning the corresponding value. Simple. As for XLOOKUP:

=XLOOKUP(G\$82,\$G\$67:\$K\$67,\$G\$68:\$K\$68,-1)

This formula is longer and requires two additional arguments (match_mode -1 is required to mirror the behaviour of LOOKUP). Indeed, given that an IF statement is required to ensure no errors for earlier periods, e.g.

=IF(G\$90<\$G\$67,\$G\$68,LOOKUP(G\$90,\$G\$67:\$K\$68))

it may be argued that LOOKUP is a simpler function to use here than its counterpart.

This isn’t the only time LOOKUP outperforms XLOOKUP: Here, we do see a limitation of XLOOKUP. Whilst the third argument of XLOOKUP, results_array, does not need to be a vector, it cannot be the transposition of the lookup_vector. You would have to transpose it using the TRANSPOSE function, for example. This makes LOOKUP much easier to use – compare:

=LOOKUP(H112,F105:F109,G102:K102)

with

=XLOOKUP(H112,F105:F109,TRANSPOSE(G102:K102))

In this instance, LOOKUP wins.

Useful Features of XLOOKUP

XLOOKUP can be used to perform a two-way match, similar to INDEX MATCH: Many advanced users might use the formula

=INDEX(H40:N46,MATCH(G53,G40:G46,0),MATCH(G51,H39:N39,0))

where:

• INDEX(array, row_number, [column_number]) returns a value or the reference to a value from within a table or range (list) citing the row_number and the column_number

• MATCH(lookup_value, lookup_vector, [match_type]) returns the relative position of an item in an array that (approximately) matches a specified value. It’s most commonly used with match_type zero (0), which requires an exact match.

Therefore, this formula finds the position in the row for the student and the position in the column of the subject. The intersection of these two provides the required result.

XLOOKUP does it differently:

=XLOOKUP(G53,G40:G46,XLOOKUP(G51,H39:N39,H40:N46))

Welcome to the wonderful world of the nested XLOOKUP function! Here, the internal formula

=XLOOKUP(G51,H39:N39,H40:N46)

demonstrates a key difference between this and your typical lookup function – the first argument is a cell, the second argument is a column vector and the third is an array – with, most importantly, the same number of rows as the lookup_vector. This means it returns a column vector of data, not a single value. This is great news in the brave new world of dynamic arrays.

In essence, this means the formula resolves to

=XLOOKUP(G53,G40:G46,J40:J46)

as J40:J46 is the resultant vector of =XLOOKUP(G51,H39:N39,H40:N46). This is a really powerful – and virtually new – concept to get your head around, that admittedly SUMPRODUCT exploits too. Once you understand this, it’s clear how this formula works and opens your eyes to the power of nested XLOOKUP functions.

I can’t believe I am talking about the virtues of nested functions here! Let me change the subject quickly…

To show you how dynamic arrays can make the most of being able to create resultant vectors, consider the following example: The formula

=XLOOKUP(G77,I65:L65,I66:L72)

again resolves to a vector – but this time is allowed to spill as a dynamic array. Obviously, this will only work in Office 365, but it’s a very useful tool that might just make you think it’s time to drop that perpetual licence.

Once you start playing with the dynamic range side, you can start to get imaginative. For example: In this illustration, I want to calculate the sales between two periods: This might seem like a simple drop-down list using data validation (ALT + D + L), but XLOOKUP has been used in determining the list to be used for the end months.

Let me explain. I have hidden the range of relevant dates in cell H101 spilled across XLOOKUP can return a reference, so the formula

=XLOOKUP(G100,H94:S94,H94:s94):S94

evaluates to the row vector N94:S94 (since the start month is July). This spilled dynamic array formula is then referenced in the data validation: (You may recall \$H\$101# means the spilled range starting in cell H101.) It should be noted that the formula =XLOOKUP(G100,H94:S94,H94:s94):S94 may not be used directly in the ‘Data Validation’ dialog, but this is a neat trick to ensure you cannot select an end month before the start month (assuming you are a rational human being that selects the start before the end!).

The formula to sum the sales then is

=SUM(XLOOKUP(G100,H94:S94,H95:S95):XLOOKUP(G101,H94:S94,H95:S95))

Again, this uses the fact XLOOKUP can return a reference, so this formula equates to

=SUM(N95:Q95)

Easy! Now I am combining two XLOOKUP formulae with a colon (:) to form a range. This joins other illustrious functions used this way such as CHOOSE, IF, IFS, INDEX, INDIRECT, OFFSET, SINGLE (@), SWITCH and TEXT. First nesting, now joining – what’s next?

Seeking partial matches (sounds like an unfussy dating agency!) suddenly became a lot easier too. You can use wildcards if you want to – just set the match_mode to 2: Here, I am searching for J?n*n* - which is fine as long as you know what the wildcard characters mean:

• ? means “any character”, but just one character. If you wanted to make space for two and only two characters you would use ??
• * means “any number of characters’ – including zero.

For example, M?n*m* would identify “Manmade”, “minimum” and “Manikum” but would not accept “millennium”. Here, our formulae

=XLOOKUP(G184,H174:H179,I174:I179,,2)

=XLOOKUP(G184,H174:H179,I174:I179,,2,-1)

would locate the first and last items that satisfied the condition J?n*n* (i.e. “Jonathan” and “Jonny” respectively).

But what if you wanted an exact match with case sensitivity? You just have to think a little but outside of the proverbial box: Here, we use another feature of XLOOKUP – its ability to search a virtual vector, i.e. one that has been constructed in memory, rather than physically within the spreadsheet cells. Consider the formula

=XLOOKUP(TRUE,EXACT(H145:H154,G159),I145:I154)

Here, the interim calculation =EXACT(H145:H154,G159), looks at the range H145:H154 and deduces whether the cells are an exact match for the selection ‘Sum Product’ in cell G159. The EXACT function would evaluate as

{FALSE; TRUE; FALSE; FALSE; FALSE; FALSE; FALSE; FALSE; TRUE; FALSE}

Therefore, the formula coerces to

=XLOOKUP(TRUE,{FALSE; TRUE; FALSE; FALSE; FALSE; FALSE; FALSE; FALSE; TRUE; FALSE},I145:I154)

and then the formula becomes simple to understand.

As I said at the beginning, XLOOKUP did not land in isolation. In addition to XLOOKUP, XMATCH has arrived with a similar signature to XLOOKUP, but instead it returns the index (position) of the matching item. XMATCH is both easier to use and more capable than its predecessor MATCH. **New 077

XMATCH has the following syntax:

XMATCH(lookup_value, lookup_vector, [match_mode], [search_mode])

where:

• lookup_value: this is required and defines what value you want to look up
• lookup_vector: this reference is required and is the row or column of data you are referencing to look up lookup_value
• match_mode: this argument is optional. There are four choices:
• 0: exact match (default)
• -1: exact match or else the largest value less than or equal to lookup_value
• 1: exact match or else smallest value greater than or equal to lookup_value
• 2: wildcard match. You should use the special character ? to match any character and * to match any run of characters.

Again, for certain selections of the final argument (search_mode), you don’t need to put your data in alphanumerical order

• search_mode: this argument is also optional. There are again four choices:
• 1: search first to last (default)
• -1: search last to first
• 2: this is a binary search, first to last (requires lookup_vector to be sorted)
• -2: another binary search, this time last to first (and again, this requires lookup_vector to be sorted).

As you can see, it’s a fairly straightforward addition to the MATCH family. It acts similarly to MATCH – just with heaps more functionality.

You can check out both functions in the attached Excel workbook.

Tags:

### Replies (1) 