Director SumProduct
Share this content
Tags:

Dynamic Arrays vs legacy array formulae

Prior to the new Dynamic Array functionality, you had to build array formulae if you wanted to work with ranges in Excel. Liam Bastick explains how they both compare.

5th Nov 2019
Director SumProduct
Share this content
Excel
istock_jxfzsy_ex

Prior to this new functionality, if you wanted to work with ranges, you used to have to build array formulae, where references would refer to ranges and be entered as CTRL + SHIFT + ENTER formulae. The main differences are as follows:

  • Dynamic Array formulae may spill outside the cell bounds where the formula is entered. The Dynamic Array formula technically only exists in the cell in the top left-hand corner of the spilled range (as shown earlier), whereas with a legacy CTRL + SHIFT + ENTER formula, the formula would need to be entered in the entire range
     
  • Dynamic arrays will automatically resize as data is added or removed from the source range. CTRL + SHIFT + ENTER array formulae will truncate the return area if it's too small, or return #N/A errors if too large
     
  • Dynamic array formulae will calculate in a 1 x 1 context
     
  • Any new formulae that return more than one result will automatically spill. There's simply no need to press CTRL + SHIFT + ENTER
     
  • According to Microsoft, CTRL + SHIFT + ENTER array formulae are only retained for backwards compatibility reasons. Going forward, you should use Dynamic Array formulae instead
     
  • Dynamic array formulae may be easily modified by changing the source cell, whereas CTRL + SHIFT + ENTER array formulae require that the entire range be edited simultaneously
     
  • Column and row insertion / deletion is prohibited in an active CRL + SHIFT + ENTER array formula range. You first need to delete any existing array formulas that are in the way.

Everybody clear? I think we are finally good to start introducing the other new functions…

SORT function

I am not going to do these alphabetically – let me show then in an order that makes sense (well, to me, anyway).

The SORT function sorts the contents of a range or array:

=SORT(array, [sort_index], [sort_order], [by_column]).

It has four arguments:

  • array: this is required and represents the range that is required to be sorted
  • sort_index: this is optional and refers to the position of the row or the column in the selected array (e.g. second row, third column). 99 times out of 98 you will be defining the column, but to select a row you will need to use this argument in conjunction with the fourth argument, by_column. And be careful, it’s a little counter-intuitive! The default value is 1
  • sort_order: this is also optional. The choices for sort_order are 1 for ascending (default) or -1 for descending. It should be noted that you might not want to hold your breath waiting for ‘Sort by Color’ (sic), ‘Sort by Formula’ or ‘Sort by Custom List’ using this function
  • by_column: this final argument is also optional. Most people want to sort rows of data, so they will want the value to be FALSE (which is the default value if not specified). Should you be booking your mental health check, you may wish to use TRUE to sort by column in certain instances.

This is a function people have been crying out for, for years. Enterprising spreadsheets gurus have developed array formulae and user-defined functions that have replicated this functionality, but you don’t need it anymore! SORT is coming to a theatre near you very soon.

To show you how devilishly simple it is, consider the following data:

SORT function

Sorting the ‘Points’ column in order is easy as this:

 

Sorted points

All you have to do is type =SORT(H13:H27) into cell F32. That’s it! Note that the duplicates are repeated; there is no cull. If you want it in descending order, simply specify the requirement in the formula:

Sorted points descending

This formula is only slightly more sophisticated, in that the sort_order (third argument) needs to be specified as -1 to switch the sort to descending:

=SORT(H13:H27,,-1).

You probably won’t want the points displayed on their own:

Sorted points

Now all of these arguments start to make more sense. SORT(F13:H27,3,-1) produces the whole array (array is F13:H27), sorts it on the third (sort_index 3) column in descending (sort_order -1) order. Blake and Ivana tie on 508 points, but Blake appears first as he was first in the original (source) table.

So far, I have only performed the one SORT. You can have more than one though:

Two-stage sort

Here, I have created a second (two-level) SORT. Here, you need to create what is known as an array constant for the second and third arguments (you just type the braces in – don’t use CTRL + SHIFT + ENTER):

=SORT(F13:G27,{1;2},{1;-1}).

This will sort on column 1 (‘First Name’) first, then sort on column 2 (‘Last Name’) next. This will be in ascending order (1) for the first column and descending order (-1) for the latter. It’s not as straightforward a formula entry as most Excel modellers are used to, but it’s relatively straightforward once you have committed it to erm, um, what do you call it, memory.

My final example of SORT is not something that is limited to this new function, but it does show how things fit together. From all that has been written above, it appears you can only get one value (using SINGLE) or all of them (using Dynamic Arrays). That’s not true as this illustration clearly demonstrates:

Top three only

Only the top three have spilled in this example. How? Well, I cheated. I highlighted cells F108:H110 first, then typed in the formula

=SORT(F13:H27,3,-1)

and then pressed CTRL + SHIFT + ENTER (thus generating the { and } braces). This restricted the spill to the range stipulated. Cool. Other than making sure no one can delete or insert any rows by creating an array formula such as {=1} across the restricted area, these appear to be the only two used of CTRL + SHIFT + ENTER now.

SORT is really useful then, but what if you want to sort on a field you don’t want displayed in the results..?

SORTBY function

The SORTBY function sorts the contents of a range or array based on the values in a corresponding range or array, which does not need to be displayed. The syntax is as follows:

=SORTBY(array, by_array1, [sort_order1], [by_array2], [sort_order2], …).

It has several arguments:

  • array: this is required and represents the range that is required to be sorted
  • by_array1: this is the first range that array will be sorted on and is required
  • sort_order1, sort_order2, …: these are optional. The choices for each sort_order are 1 for ascending (default) or -1 for descending
  • by_array2, …: these arguments are also optional. These represent the second and subsequent ranges that array will be sorted on.

There are some important considerations to note:

  • the by_array arguments must either be one row high or one column wide
  • all of the by_array arguments must be the same size and contain the same number of rows as array if sorting on rows, or the same number of columns as array if sorting on columns
  • if the sort order argument is not 1 or -1, the formula will result in a #VALUE! error.

It’s pretty simple to use. Consider the following source data once more:

Original data

I can use SORTBY as follows:

Results

Here, using the formula

=SORTBY(F13:F27,H13:H27,-1,G13:G27,1)

I have sorted the ‘First Name’ field (F13:F27) on the ‘Points’ column (H13:H27) in descending (-1) order and then used the second sort on ‘Last Name’ (G13:G27) in ascending (1) order. No need for those pesky array references in multiple sorts with the SORT function (as detailed above).

FILTER function

The FILTER function will accept an array, allow you to filter a range of data based upon criteria you define and return the results to a spill range.

The syntax of FILTER is as follows:

=FILTER(array, include, [if_empty]).

It has three arguments:

  • array: this is required and represents the range that is to be filtered
  • include: this is also required. This specifies the condition(s) that must be met
  • if_empty: this argument is optional. This is what will be returned if no data meets the criterion / criteria specified in the include argument. It’s generally a good idea to at least use “” here.

For example, consider the following source data:

Original data

To begin with, I will perform a simple FILTER:

Filter results

Here, in cell F36, I have created the formula

=FILTER(F12:I27,G12:G27=G33,”Not Located.”)

F12:I27 is my source array and I wish only to include shapes (column G12:G27) that are ‘Triangles’ (specified by cell G33). If there are no such shapes, then “Not Located.” is returned instead. To show this, I will change the shape as follows:

Filter results

That is about as basic as it gets. I can get cleverer. Consider the following example:

Original data

I have repeated the source array (cells F48:I63) for clarity. The formula

=FILTER(F48:I63,(G48:G63=G69)*(H48:H63=G70),{"-","None","N/A","N/A"})

looks horrible to begin with, but it’s not quite as bad as it appears upon further scrutiny. The include argument,

(G48:G63=G69)*(H48:H63=G70)

contains two conditions. Firstly, G48:G63=G69 means that the ‘Shape’ (column G48:G63) has to be a ‘Triangle’ (cell G69) and that the ‘Colour’ (column H48:H63) has to be ‘Red’ (cell G70). The multiplication operator (*) is used to denote AND. The Excel function AND cannot be used with arrays – this is nothing special to Dynamic Arrays; AND does not work with CTRL + SHIFT + ENTER formulae either. This syntax is similar to how you would create AND criteria with the SUMPRODUCT function, for example.

The final argument is similar to the syntax in SORT: {"-","None","N/A","N/A"}. Braces (typed in!) are used to create an array argument that specifies what should be written in each column should there be no record that meets both criteria, eg

Filter results

See? Not as bad as you might first think.

My final example is very similar:

Example

Once you realise I have simply repeated referencing for clarity, the formula

=FILTER(F84:I99,(G84:G99=G105)+(H84:H99=G106),{"-","None","N/A","N/A"})

is nothing more than the OR equivalent of the previous example, with ‘+’ replacing ‘*’ to switch from ensuring both conditions are met to only one condition being met. As at the time of writing, XOR is not catered for, but I am sure some clever person will create an equivalent in due course (if Microsoft doesn’t beat them to it), necessity being the mother of invention and all that jazz.

Interlude: the #CALC! error

I mentioned there were two new error messages. I have only referred to #SPILL! so far. There is another, lurking in the background (I say “in the background” as at the time of writing, Microsoft hasn’t written any documentation on it!).

Sometimes, as you explore how you can combine Excel functions with each other you get error messages (e.g. more often than not trying FILTER(FILTER(… will generate an #VALUE! error). When you start playing with these new array functions, you might stumble upon #CALC! This is a new one.

To add to the myriad of error messages such #REF!, #DIV/0!, #VALUE!, #BROWN and #PIPE, let’s introduce #CALC! – which probably means something like, "Excel cannot currently figure out the answer presently, but might be able to in a future release, no promises though". I look forward to the documentation in due course though to fathom its real meaning (probably something like, “Help! Abandon ship!”).

Let’s move on.

UNIQUE function

The hilarious thing about UNIQUE is that it does two things (!). It details distinct items (i.e. provides each value that occurs with no repetition) and also it can return values which occur once and only once in a referred range. I understand that Excel users may welcome the former use with open arms and that database developers may be very interested in the latter. I still think there should have been two functions though. Otherwise, let’s just extend the AGGREGATE function to do just everything (it almost does now) and be done with it!

The UNIQUE function has the following syntax:

=UNIQUE(array, [by_column], [occurs_once]).

It has three arguments:

  • array: this is required and represents the range or array from which to return unique values
  • by_column: this argument is optional. This is a logical value (TRUE / FALSE) indicating how to compare. If you wish to compare by row, the argument should be FALSE or omitted (since this is the default). To compare by column, you will need to select TRUE
  • occurs_once: this argument is also optional. This requires a logical value too:
    • TRUE: only return unique values that occur once
    • FALSE: include all distinct values (default if omitted).

It’s probably clearer with some examples. Let’s give it a go. As always, I need source data:

Source data

Time for the most basic illustration:

Results

In cell L13, I have simply typed

=UNIQUE(F13:F41).

No optional arguments; everything in default. If I have made an error, it’s going to be my default. This has simply listed each store that appears; if “North” and “North ” (extra space) were there, then both would appear. UNIQUE is not case sensitive though and each entry would appear as it first occurs reading down the range F13:F41. The other columns contain similar formulae and UNIQUE looks like it takes seconds to learn. Presently, there’s an in-joke going around the Excel Most Valuable Professionals (MVPs) that array expert Mike Girvin is going to be choked as he dedicated an entire chapter in one of his books to creating that list with an array formula! Sorry Mike. Excel is fun!

It’s just as simple if you want to see unique records for two (or more) columns, viz.

Example

You can see UNIQUE is sort of crying out for SORT, but we’ll get to that shortly.

As mentioned earlier, it’s not the only way of using UNIQUE (no, having a unique use would be just what “they” were expecting, whoever “they” are…). You can use it to determine values that only occur once:

Results

Here, the formula in cell L56,

=UNIQUE(G56:G84,0,1)

uses the non-default value of 1 for the optional occurs_once (third) argument. This means it identifies the salespeople who only occur once in cells G56:G84. Brilliant; I can die content knowing now.

The real power starts coming when you start playing with Excel’s existing functions and features, together with these new functions. Take this comprehensive example:

Example

Let me step you through some of this. The formulae in cells L94 and M94 use UNIQUE in a similar manner to my first example, to generate the list of distinct values in the ‘Section’ and ‘Manager’ fields. However, did you notice they have been sorted? That’s because I used the formula

=SORT(UNIQUE(H94:H122))

in cell L94, for example. Honestly, I think UNIQUE should have another argument for sorting (ascending / descending / none [default]). Watch Microsoft ignore that suggestion.

But then I did something really cool. Cells M105 and M106 use data validation (ALT + D + L) to generate a list from the ‘Lookup Data’ section. That requires taking a closer look:

Data validation

Do you see the source for the data validation in cell M105? =$L$84# - so elegant! This takes the ‘Section’ list and automatically makes the drop-down list the required length! People create all sorts of tricks using OFFSET, dynamic range names and the like to achieve a similar effect. No more. =$L$84# (with the ‘#’, the Spilled Range Operator) is all that is needed. That’s my favourite thing in all of these new functions and features. I’m impressed – and I’m easily impressed.

The ‘AND / OR’ dropdown is a bit of an anti-climax after that, but the final formula that generates the final table, namely

=SORT(UNIQUE(FILTER(F93:I122,IF(M108="OR",(H93:H122=M105)+(I93:I122=M106),
(H93:H122=M105)*(I93:I122=M106)),{"N/A","-","-","-"})),{1;2;3;4},{1;1;1;1})

is rather fun. I am not going to go through it though – as every aspect of this formula is simply a re-hash of an earlier point (assuming you know the IF function!). See if you can work your way through it for yourself.

SEQUENCE function

The penultimate function is SEQUENCE. This function allows you to generate a list of sequential numbers in an array, such as 1, 2, 3, 4. It doesn’t sound particularly exciting, but again, it really ramps up when combined with other functions and features. The syntax is given by:

=SEQUENCE(rows, [columns], [start], [step]).

It has four arguments:

  • rows: this argument is required and specifies how many rows the results should spill over
  • columns: this argument is optional and specifies how many columns (surprise, surprise) the results should spill over. If omitted, the default value is 1
  • start: this argument is also optional. This specifies what number the SEQUENCE should start from. If omitted, the default value is 1
  • step: this final argument is also optional. This specifies the amount each number in the SEQUENCE should increase (the “step”). It may be positive, negative or zero. If omitted, the default value is 937,444. Wait, I’m kidding; it’s 1. They’re very unimaginative down in Redmond.

Therefore, SEQUENCE can be as simple as SEQUENCE(x), which will generate a list of numbers in a column 1, 2, 3, …, x. Therefore, be mindful not to create a formula where x may be volatile and generate alternative values each time it is calculated, e.g. =SEQUENCE(RANDBETWEEN(10,99)) as this will generate the #SPILL! range is volatile in size error.

A vanilla example is rather bland:

Inputs and outputs

Do you see how SEQUENCE propagates across the row first and then down to the next row, just like reading a book? I wonder how that might work in alternative languages of Excel where users read right to left (it has to be the same or there would be chaos when workbooks were shared!).

Some of my peers had fun combining it with the ROMAN function:

Roman numerals

To my mind though, my favourite simple illustration is creating a monthly calendar. A little magic with the DATE and WEEKDAY functions combined with some conditional formatting and suddenly you have:

Dates

As I mentioned above, SEQUENCE is arguably more powerful when included in a more complex formula. For example:

Interest payments

In this instance, I have created a grid using the Excel IPMT function to determine the amount of interest to be paid in each monthly instalment. Cells G62:R71 calculate each monthly amount and column T sums these amounts to calculate the annual interest payment, a figure which is non-trivial to compute. The whole table may be replaced by the formula in cell V62:

=IF($F62="","",-SUM(IPMT(Annual_Interest_Rate/Months_in_Year,
SEQUENCE(1,Months_in_Year,($F62-1)*Months_in_Year+1,1),
Borrowing_Term*Months_in_Year,Amount_Borrowed)))
.

I am not going to explain this and let me tell you why. Our company, SumProduct builds and reviews financial models for a living. We see terrible modelling practices established day-in, day-out. We proactively try to discourage these traits by emphasising that complex formulae should be stepped out and made transparent. Here, that can be done using the original table. I don’t want people using SEQUENCE, Dynamic Arrays or other spilled formulae to wrap up complicated calculations into an opaque Pandora’s Box. Yes, calculation times may be slower. Live with it. Sometimes you need to see the scenery to appreciate the beauty. I’m just a little fearful that people will embrace these functions a little too readily and the Road to Excel Hell beckons shortly. Sorry to be a miserable git.

On an upbeat note, I put a formula in cell G61 which is simple:

=TRANSPOSE(SEQUENCE(Months_in_Year)).

Yes, I am using TRANSPOSE without CTRL + SHIFT + ENTER. We are in new territory here.

RANDARRAY function

And so, to the final function for now: RANDARRAY. Originally, the RANDARRAY function returned an array of random numbers between 0 and 1. It’s not clear from Microsoft, analogous to the pre-existing RAND function, which generates a number greater than or equal to zero and strictly less than one. However, there was a general sense of underwhelm with this function and the new and improved version has recently been released. It’s the first time ever a function has been changed after it has been released publicly. It now allows you to set you own maximum and minimum and decide whether you want the values returned to be decimals (e.g. 17.4381672…) or integers (whole numbers).

The new syntax for the function is now as follows:

=RANDARRAY([rows], [columns],[min],[max],[integer]).

The function has five arguments, all supposedly optional (but upon testing, we weren’t quite as convinced):

  • rows: this specifies how many rows the results should spill over. If omitted, the default value is 1
  • columns: this specifies how many columns the results should spill over. If omitted, the default value is also 1
  • min: this is the minimum value that may be selected randomly. If this is not specified, it is assumed to be zero (0)
  • max: this is the maximum value that may be selected randomly. If this is not specified, it is assumed to be 1
  • integer: if this is set to TRUE, only integer outputs are allowed; the default value (FALSE) provides non-integer (decimal) results.

Other points to note:

  • if rows or columns refers to a blank cell reference, this will generate the new #CALC! error
  • if rows or columns are entered as decimals, the values used will be truncated to the number before the decimal point (e.g. 3.9999999 will be treated as 3)
  • if rows or columns is a value less than 1, #CALC! will be returned
  • if integer is set to TRUE and either min or max is not an integer, this will generate an #VALUE! error
  • max must be greater than or equal to min, else the error #VALUE! is returned.

When we originally discussed the RANDARRAY function, we used this rather comprehensive example to create a list of random integers between two values:

Inputs and outputs

Originally, the formula in cell F44 was

=ROUNDDOWN(RANDARRAY(H36,H37)*(H39-H38+1),0)+INT(H38)

and the article explained how this worked. However, it’s much easier now:

Inputs and outputs

The “new improved” formula in cell F45 (it’s moved down a row due to the additional argument required in cell H40) is simply

=RANDARRAY(H36,H37,H38,H39,H40).

Cool, eh?

For a final example, imagine you are a schoolteacher and you have 10 five-year-old children:

Example

For each of the next 10 weeks, you have topics you want one of them to present on:

Example

I can use RANDARRAY in tandem with SORTBY to determine a presentation order for the term:

Example

Oh dear. I do hope Diana has prepared well or it could all end in tears. She could try swapping with Horace, I suppose. On a serious note, the formula

=SORTBY(F63:F72,RANDARRAY(COUNTA(F63:F72)))

sorts the ‘Child’ order randomly (and a similar formula is used for ‘Topic’ too). In a past life, as an independent expert, I once had to attest that drug testing was being performed entirely randomly, i.e. free from any material bias. SORTBY(RANDARRAY) dries up that well for me once and for all.

Liam Bastick is author of an Introduction to Financial Modelling, which provides a simple walkthrough of the common perils and pitfalls of financial modelling. Presently, this book is only available on Amazon Australia but if anyone would like to buy a copy directly contact Liam via this link. AccountingWEB readers receive a 10% discount plus p&p (including tracking and insurance at cost).

Tags:

Replies (1)

Please login or register to join the discussion.

avatar
By willflash
08th Nov 2019 14:24

Excellent article, thought provoking and informative.
I have just complete a course by Leila Gharani, also very good, but you have taken it much further! Thanks.
One request, would you consider sharing your workbooks?
Cheers anyway.
William Petit

Thanks (0)