# Dynamic Arrays vs legacy array formulae

Prior to the new Dynamic Array functionality, you had to build array formulae if you wanted to work with ranges in Excel. Liam Bastick explains how they both compare.

Prior to this new functionality, if you wanted to work with ranges, you used to have to build array formulae, where references would refer to ranges and be entered as **CTRL + SHIFT + ENTER **formulae. The main differences are as follows:

- Dynamic Array formulae may spill outside the cell bounds where the formula is entered. The Dynamic Array formula technically only exists in the cell in the top left-hand corner of the spilled range (as shown earlier), whereas with a legacy
**CTRL + SHIFT + ENTER**formula, the formula would need to be entered in the entire range

- Dynamic arrays will automatically resize as data is added or removed from the source range.
**CTRL + SHIFT + ENTER**array formulae will truncate the return area if it's too small, or return #N/A errors if too large

- Dynamic array formulae will calculate in a 1 x 1 context

- Any new formulae that return more than one result will automatically spill. There's simply no need to press
**CTRL + SHIFT + ENTER**

- According to Microsoft,
**CTRL + SHIFT + ENTER**array formulae are only retained for backwards compatibility reasons. Going forward, you should use Dynamic Array formulae instead

- Dynamic array formulae may be easily modified by changing the source cell, whereas
**CTRL + SHIFT + ENTER**array formulae require that the entire range be edited simultaneously

- Column and row insertion / deletion is prohibited in an active
**CRL + SHIFT + ENTER**array formula range. You first need to delete any existing array formulas that are in the way.

Everybody clear? I think we are finally good to start introducing the other new functions…

**SORT function**

I am not going to do these alphabetically – let me show then in an order that makes sense (well, to me, anyway).

The **SORT **function sorts the contents of a range or array:

**=SORT(array, [sort_index], [sort_order], [by_column])**.

It has four arguments:

**array:**this is required and represents the range that is required to be sorted**sort_index:**this is optional and refers to the position of the row or the column in the selected**array**(e.g. second row, third column). 99 times out of 98 you will be defining the column, but to select a row you will need to use this argument in conjunction with the fourth argument,**by_column**. And be careful, it’s a little counter-intuitive! The default value is 1**sort_order:**this is also optional. The choices for**sort_order**are 1 for ascending (default) or -1 for descending. It should be noted that you might not want to hold your breath waiting for ‘Sort by Color’ (sic), ‘Sort by Formula’ or ‘Sort by Custom List’ using this function**by_column:**this final argument is also optional. Most people want to sort rows of data, so they will want the value to be FALSE (which is the default value if not specified). Should you be booking your mental health check, you may wish to use TRUE to sort by column in certain instances.

This is a function people have been crying out for, for years. Enterprising spreadsheets gurus have developed array formulae and user-defined functions that have replicated this functionality, but you don’t need it anymore! **SORT** is coming to a theatre near you very soon.

To show you how devilishly simple it is, consider the following data:

Sorting the ‘Points’ column in order is easy as this:

All you have to do is type **=SORT(H13:H27) **into cell **F32**. That’s it! Note that the duplicates are repeated; there is no cull. If you want it in descending order, simply specify the requirement in the formula:

This formula is only slightly more sophisticated, in that the **sort_order** (third argument) needs to be specified as -1 to switch the sort to descending:

**=SORT(H13:H27,,-1)**.

You probably won’t want the points displayed on their own:

Now all of these arguments start to make more sense. **SORT(F13:H27,3,-1)** produces the whole array (**array** is **F13:H27**), sorts it on the third (**sort_index **3) column in descending (**sort_order **-1) order. Blake and Ivana tie on 508 points, but Blake appears first as he was first in the original (source) table.

So far, I have only performed the one **SORT**. You can have more than one though:

Here, I have created a second (two-level) **SORT**. Here, you need to create what is known as an array constant for the second and third arguments (you just type the braces in – don’t use **CTRL + SHIFT + ENTER**):

**=SORT(F13:G27,{1;2},{1;-1})**.

This will sort on column 1 (‘First Name’) first, then sort on column 2 (‘Last Name’) next. This will be in ascending order (1) for the first column and descending order (-1) for the latter. It’s not as straightforward a formula entry as most Excel modellers are used to, but it’s relatively straightforward once you have committed it to erm, um, what do you call it, memory.

My final example of **SORT** is not something that is limited to this new function, but it does show how things fit together. From all that has been written above, it appears you can only get one value (using **SINGLE**) or all of them (using Dynamic Arrays). That’s not true as this illustration clearly demonstrates:

Only the top three have spilled in this example. How? Well, I cheated. I highlighted cells **F108:H110** first, then typed in the formula

**=SORT(F13:H27,3,-1)**

and then pressed **CTRL + SHIFT + ENTER **(thus generating the **{** and **}** braces). This restricted the spill to the range stipulated. Cool. Other than making sure no one can delete or insert any rows by creating an array formula such as **{=1}** across the restricted area, these appear to be the only two used of **CTRL + SHIFT + ENTER** now.

**SORT **is really useful then, but what if you want to sort on a field you don’t want displayed in the results..?

**SORTBY function**

The **SORTBY** function sorts the contents of a range or array based on the values in a corresponding range or array, which does not need to be displayed. The syntax is as follows:

**=SORTBY(array, by_array1, [sort_order1], [by_array2], [sort_order2], …)**.

It has several arguments:

**array:**this is required and represents the range that is required to be sorted**by_array1**: this is the first range that**array**will be sorted on and is required**sort_order1**,**sort_order2**, …: these are optional. The choices for each**sort_order**are 1 for ascending (default) or -1 for descending**by_array2**, …**:**these arguments are also optional. These represent the second and subsequent ranges that**array**will be sorted on.

There are some important considerations to note:

- the
**by_array**arguments must either be one row high or one column wide - all of the
**by_array**arguments must be the same size and contain the same number of rows as**array**if sorting on rows, or the same number of columns as**array**if sorting on columns - if the sort order argument is not 1 or -1, the formula will result in a #VALUE! error.

It’s pretty simple to use. Consider the following source data once more:

I can use **SORTBY** as follows:

Here, using the formula

**=SORTBY(F13:F27,H13:H27,-1,G13:G27,1)**

I have sorted the ‘First Name’ field (**F13:F27**) on the ‘Points’ column (**H13:H27**) in descending (-1) order and then used the second sort on ‘Last Name’ (**G13:G27**) in ascending (1) order. No need for those pesky array references in multiple sorts with the **SORT** function (as detailed above).

**FILTER function**

The **FILTER** function will accept an array, allow you to filter a range of data based upon criteria you define and return the results to a spill range.

The syntax of **FILTER** is as follows:

**=FILTER(array, include, [if_empty])**.

It has three arguments:

**array:**this is required and represents the range that is to be filtered**include:**this is also required. This specifies the condition(s) that must be met**if_empty:**this argument is optional. This is what will be returned if no data meets the criterion / criteria specified in the**include**argument. It’s generally a good idea to at least use**“”**here.

For example, consider the following source data:

To begin with, I will perform a simple **FILTER**:

Here, in cell **F36**, I have created the formula

**=FILTER(F12:I27,G12:G27=G33,”Not Located.”)**

**F12:I27** is my source **array** and I wish only to **include** shapes (column **G12:G27**) that are ‘Triangles’ (specified by cell **G33**). If there are no such shapes, then **“Not Located.”** is returned instead. To show this, I will change the shape as follows:

That is about as basic as it gets. I can get cleverer. Consider the following example:

I have repeated the source **array** (cells **F48:I63**) for clarity. The formula

**=FILTER(F48:I63,(G48:G63=G69)*(H48:H63=G70),{"-","None","N/A","N/A"})**

looks horrible to begin with, but it’s not quite as bad as it appears upon further scrutiny. The **include** argument,

**(G48:G63=G69)*(H48:H63=G70)**

contains two conditions. Firstly, **G48:G63=G69 **means that the ‘Shape’ (column **G48:G63**) has to be a ‘Triangle’ (cell **G69**) and that the ‘Colour’ (column **H48:H63**) has to be ‘Red’ (cell **G70**). The multiplication operator (*****) is used to denote **AND**. The Excel function **AND** cannot be used with arrays – this is nothing special to Dynamic Arrays; **AND **does not work with **CTRL + SHIFT + ENTER** formulae either. This syntax is similar to how you would create **AND** criteria with the **SUMPRODUCT** function, for example.

The final argument is similar to the syntax in **SORT**: **{"-","None","N/A","N/A"}**. Braces (typed in!) are used to create an array argument that specifies what should be written in each column should there be no record that meets both criteria, eg

See? Not as bad as you might first think.

My final example is very similar:

Once you realise I have simply repeated referencing for clarity, the formula

**=FILTER(F84:I99,(G84:G99=G105)+(H84:H99=G106),{"-","None","N/A","N/A"})**

is nothing more than the **OR** equivalent of the previous example, with ‘**+**’ replacing ‘*****’ to switch from ensuring both conditions are met to only one condition being met. As at the time of writing, **XOR** is not catered for, but I am sure some clever person will create an equivalent in due course (if Microsoft doesn’t beat them to it), necessity being the mother of invention and all that jazz.

**Interlude: the #CALC! error**

I mentioned there were two new error messages. I have only referred to #SPILL! so far. There is another, lurking in the background (I say “in the background” as at the time of writing, Microsoft hasn’t written any documentation on it!).

Sometimes, as you explore how you can combine Excel functions with each other you get error messages (e.g. more often than not trying **FILTER(FILTER(…** will generate an #VALUE! error). When you start playing with these new array functions, you might stumble upon #CALC! This is a new one.

To add to the myriad of error messages such #REF!, #DIV/0!, #VALUE!, #BROWN and #PIPE, let’s introduce #CALC! – which probably means something like, "Excel cannot currently figure out the answer presently, but might be able to in a future release, no promises though". I look forward to the documentation in due course though to fathom its real meaning (probably something like, “Help! Abandon ship!”).

Let’s move on.

**UNIQUE function**

The hilarious thing about **UNIQUE **is that it does two things (!). It details distinct items (i.e. provides each value that occurs with no repetition) and also it can return values which occur once and only once in a referred range. I understand that Excel users may welcome the former use with open arms and that database developers may be very interested in the latter. I still think there should have been two functions though. Otherwise, let’s just extend the **AGGREGATE** function to do just everything (it almost does now) and be done with it!

The **UNIQUE** function has the following syntax:

**=UNIQUE(array, [by_column], [occurs_once])**.

It has three arguments:

**array:**this is required and represents the range or array from which to return unique values**by_column:**this argument is optional. This is a logical value (TRUE / FALSE) indicating how to compare. If you wish to compare by row, the argument should be FALSE or omitted (since this is the default). To compare by column, you will need to select TRUE**occurs_once:**this argument is also optional. This requires a logical value too:**TRUE:**only return unique values that occur once**FALSE:**include all distinct values (default if omitted).

It’s probably clearer with some examples. Let’s give it a go. As always, I need source data:

Time for the most basic illustration:

In cell **L13**, I have simply typed

**=UNIQUE(F13:F41)**.

No optional arguments; everything in default. If I have made an error, it’s going to be my default. This has simply listed each store that appears; if “North” and “North ” (extra space) were there, then both would appear. **UNIQUE** is not case sensitive though and each entry would appear as it first occurs reading down the range **F13:F41**. The other columns contain similar formulae and **UNIQUE **looks like it takes seconds to learn. Presently, there’s an in-joke going around the Excel Most Valuable Professionals (MVPs) that array expert Mike Girvin is going to be choked as he dedicated an entire chapter in one of his books to creating that list with an array formula! Sorry Mike. Excel __is__ fun!

It’s just as simple if you want to see unique records for two (or more) columns, viz.

You can see **UNIQUE **is sort of crying out for **SORT**, but we’ll get to that shortly.

As mentioned earlier, it’s not the only way of using **UNIQUE** (no, having a unique use would be just what “they” were expecting, whoever “they” are…). You can use it to determine values that only occur once:

Here, the formula in cell **L56**,

**=UNIQUE(G56:G84,0,1)**

uses the non-default value of 1 for the optional **occurs_once **(third) argument. This means it identifies the salespeople who only occur once in cells **G56:G84**. Brilliant; I can die content knowing now.

The real power starts coming when you start playing with Excel’s existing functions and features, together with these new functions. Take this comprehensive example:

Let me step you through some of this. The formulae in cells **L94** and **M94 **use **UNIQUE** in a similar manner to my first example, to generate the list of distinct values in the ‘Section’ and ‘Manager’ fields. However, did you notice they have been sorted? That’s because I used the formula

**=SORT(UNIQUE(H94:H122))**

in cell **L94**, for example. Honestly, I think **UNIQUE **should have another argument for sorting (ascending / descending / none [default]). Watch Microsoft ignore that suggestion.

But then I did something really cool. Cells **M105** and **M106** use data validation (**ALT + D + L**) to generate a list from the ‘Lookup Data’ section. That requires taking a closer look:

Do you see the source for the data validation in cell **M105**? **=$L$84# **- so elegant! This takes the ‘Section’ list and automatically makes the drop-down list the required length! People create all sorts of tricks using **OFFSET**, dynamic range names and the like to achieve a similar effect. No more. **=$L$84# **(with the ‘**#**’, the Spilled Range Operator) is all that is needed. That’s my favourite thing in all of these new functions and features. I’m impressed – and I’m easily impressed.

The ‘AND / OR’ dropdown is a bit of an anti-climax after that, but the final formula that generates the final table, namely

**=SORT(UNIQUE(FILTER(F93:I122,IF(M108="OR",(H93:H122=M105)+(I93:I122=M106),(H93:H122=M105)*(I93:I122=M106)),{"N/A","-","-","-"})),{1;2;3;4},{1;1;1;1})**

is rather fun. I am not going to go through it though – as every aspect of this formula is simply a re-hash of an earlier point (assuming you know the **IF** function!). See if you can work your way through it for yourself.

**SEQUENCE function**

The penultimate function is **SEQUENCE**. This function allows you to generate a list of sequential numbers in an array, such as 1, 2, 3, 4. It doesn’t sound particularly exciting, but again, it really ramps up when combined with other functions and features. The syntax is given by:

**=SEQUENCE(rows, [columns], [start], [step])**.

It has four arguments:

**rows:**this argument is required and specifies how many**rows**the results should spill over**columns:**this argument is optional and specifies how many**columns**(surprise, surprise) the results should spill over. If omitted, the default value is 1**start:**this argument is also optional. This specifies what number the**SEQUENCE**should**start**from. If omitted, the default value is 1**step:**this final argument is also optional. This specifies the amount each number in the**SEQUENCE**should increase (the “**step**”). It may be positive, negative or zero. If omitted, the default value is 937,444. Wait, I’m kidding; it’s 1. They’re very unimaginative down in Redmond.

Therefore, **SEQUENCE** can be as simple as **SEQUENCE(x)**, which will generate a list of numbers in a column 1, 2, 3, …, **x**. Therefore, be mindful not to create a formula where **x **may be volatile and generate alternative values each time it is calculated, e.g. **=SEQUENCE(RANDBETWEEN(10,99))** as this will generate the #SPILL! range is volatile in size error.

A vanilla example is rather bland:

Do you see how **SEQUENCE **propagates across the row first and then down to the next row, just like reading a book? I wonder how that might work in alternative languages of Excel where users read right to left (it has to be the same or there would be chaos when workbooks were shared!).

Some of my peers had fun combining it with the **ROMAN **function:

To my mind though, my favourite simple illustration is creating a monthly calendar. A little magic with the **DATE** and **WEEKDAY** functions combined with some conditional formatting and suddenly you have:

As I mentioned above, **SEQUENCE** is arguably more powerful when included in a more complex formula. For example:

In this instance, I have created a grid using the Excel **IPMT** function to determine the amount of interest to be paid in each monthly instalment. Cells **G62:R71** calculate each monthly amount and column **T** sums these amounts to calculate the annual interest payment, a figure which is non-trivial to compute. The whole table may be replaced by the formula in cell **V62**:

**=IF($F62="","",-SUM(IPMT(Annual_Interest_Rate/Months_in_Year,SEQUENCE(1,Months_in_Year,($F62-1)*Months_in_Year+1,1),Borrowing_Term*Months_in_Year,Amount_Borrowed)))**.

I am not going to explain this and let me tell you why. Our company, SumProduct builds and reviews financial models for a living. We see terrible modelling practices established day-in, day-out. We proactively try to discourage these traits by emphasising that complex formulae should be stepped out and made transparent. Here, that can be done using the original table. I don’t want people using **SEQUENCE**, Dynamic Arrays or other spilled formulae to wrap up complicated calculations into an opaque Pandora’s Box. Yes, calculation times may be slower. Live with it. Sometimes you need to see the scenery to appreciate the beauty. I’m just a little fearful that people will embrace these functions a little too readily and the Road to Excel Hell beckons shortly. Sorry to be a miserable git.

On an upbeat note, I put a formula in cell **G61 **which is simple:

**=TRANSPOSE(SEQUENCE(Months_in_Year))**.

Yes, I am using **TRANSPOSE** without **CTRL + SHIFT + ENTER**. We are in new territory here.

**RANDARRAY function**

And so, to the final function for now: **RANDARRAY**. Originally, the **RANDARRAY** function returned an array of random numbers between 0 and 1. It’s not clear from Microsoft, analogous to the pre-existing **RAND** function, which generates a number greater than or equal to zero and strictly less than one. However, there was a general sense of underwhelm with this function and the new and improved version has recently been released. It’s the first time ever a function has been changed after it has been released publicly. It now allows you to set you own maximum and minimum __and__ decide whether you want the values returned to be decimals (e.g. 17.4381672…) or integers (whole numbers).

The new syntax for the function is now as follows:

**=RANDARRAY([rows], [columns],[min],[max],[integer])**.

The function has five arguments, all supposedly optional (but upon testing, we weren’t quite as convinced):

**rows:**this specifies how many**rows**the results should spill over. If omitted, the default value is 1**columns:**this specifies how many**columns**the results should spill over. If omitted, the default value is also 1**min:**this is the minimum value that may be selected randomly. If this is not specified, it is assumed to be zero (0)**max:**this is the maximum value that may be selected randomly. If this is not specified, it is assumed to be 1**integer:**if this is set to TRUE, only integer outputs are allowed; the default value (FALSE) provides non-integer (decimal) results.

Other points to note:

- if
**rows**or**columns**refers to a blank cell reference, this will generate the new #CALC! error - if
**rows**or**columns**are entered as decimals, the values used will be truncated to the number before the decimal point (e.g. 3.9999999 will be treated as 3) - if
**rows**or**columns**is a value less than 1, #CALC! will be returned - if
**integer**is set to TRUE and either**min**or**max**is not an integer, this will generate an #VALUE! error **max**must be greater than or equal to**min**, else the error #VALUE! is returned.

When we originally discussed the **RANDARRAY **function, we used this rather comprehensive example to create a list of random integers between two values:

Originally, the formula in cell **F44** was

**=ROUNDDOWN(RANDARRAY(H36,H37)*(H39-H38+1),0)+INT(H38)**

and the article explained how this worked. However, it’s much easier now:

The “new improved” formula in cell **F45 **(it’s moved down a row due to the additional argument required in cell **H40**) is simply

**=RANDARRAY(H36,H37,H38,H39,H40)**.

Cool, eh?

For a final example, imagine you are a schoolteacher and you have 10 five-year-old children:

For each of the next 10 weeks, you have topics you want one of them to present on:

I can use **RANDARRAY** in tandem with **SORTBY** to determine a presentation order for the term:

Oh dear. I do hope Diana has prepared well or it could all end in tears. She could try swapping with Horace, I suppose. On a serious note, the formula

**=SORTBY(F63:F72,RANDARRAY(COUNTA(F63:F72)))**

sorts the ‘Child’ order randomly (and a similar formula is used for ‘Topic’ too). In a past life, as an independent expert, I once had to attest that drug testing was being performed entirely randomly, i.e. free from any material bias. **SORTBY(RANDARRAY)** dries up that well for me once and for all.

*Liam Bastick is author of an Introduction to Financial Modelling, which provides a simple walkthrough of the common perils and pitfalls of financial modelling. Presently, this book is only available on Amazon Australia but if anyone would like to buy a copy directly contact Liam via this link. AccountingWEB readers receive a 10% discount plus p&p (including tracking and insurance at cost).*

## You might also be interested in

Recognised by Microsoft as one of 104 Most Valuable Professionals (MVPs) in Excel worldwide by Microsoft, Liam has over 30 years’ experience in financial model development/auditing, valuations, M&A, strategy, training and consultancy. He has headed Ernst & Young’s modelling team in Melbourne and was an Assistant Director in their...

## Replies (1)

## Please login or register to join the discussion.

Excellent article, thought provoking and informative.

I have just complete a course by Leila Gharani, also very good, but you have taken it much further! Thanks.

One request, would you consider sharing your workbooks?

Cheers anyway.

William Petit