Save content
Have you found this content useful? Use the button above to save it to your profile.
AIA

Excel summary functions: A summery summary

by
22nd Aug 2014
Save content
Have you found this content useful? Use the button above to save it to your profile.

Simon Hurst gets his teeth into the different characteristics of  Excel's aggregate and summary functions.

Adding up is one of the most common things to do in a spreadsheet. At its simplest, this might involve using the + operator to add up the contents of a couple of cells but, for anything more complicated, one of Excel's aggregate or conditional sum functions is likely to be necessary. When working with significant volumes of data, the best approach might be to use a PivotTable to summarise the data. Specific Excel functions are available to work with individual totals from a PivotTable or even directly from the underlying data. Adding is just one of a range of calculations that can be performed on a set of values and some of the functions that generate totals can also be used to perform these other calculations.

In this short series we’ll look at the various different Excel functions and features that can be used for adding up and otherwise evaluating values. In this first part we will start with simple sums and progress to the more flexible aggregate functions.

Aggregate functions

SUM()

Although almost everyone is familiar with the basic use of the SUM() function, even this function has a few points of interest. SUM() is usually used with a single argument to add up a contiguous range of cells but the function can deal with up to 255 separate arguments. At first glance, you might think that =SUM(A1,A3) and =A1+A3 would be interchangeable, but they can give very different results because of the way that they each process text. SUM() will ignore a text entry whereas + will attempt to evaluate it:

SUM() will ignore a text entry whereas + will attempt to evaluate it

As you can see in the example above, trying to use + where one of the cells contains text will result in an error, but using each cell reference as a separate argument of the SUM() function will not. The difference goes further, if the text entry was made up of numbers, for example a year heading for a column entered as text, + would convert the text value to a number and include it in the total – a process known as coercion. Excel SUM() function ignores figures that appear as text

In practice, this would mean that you could use SUM() to create a consistent formula in a column of data where you need to include a reference to a heading cell.

SUBTOTAL()

SUM() is one of a family of aggregate functions each of which calculates a single value from a range, for example: AVERAGE(), COUNT(), MAX() and MIN(). There are two additional aggregate functions which can do the same thing, but with more flexibility. SUBTOTAL() is available in all recent versions of Excel and is the function that will be used if you use the Data, Subtotal command to subtotal values in a block of cells. Although SUBTOTAL() can produce a simple total in the same way as SUM(), it has an additional argument that can be used to define the type of aggregate and also how values in hidden rows (but note, not hidden columns) should be treated. The first SUBTOTAL() argument is a number between 1 and 11 or between 101 and 111. The number from 1 to 11 defines the type of aggregate, with the 100+ range including the same aggregates but ensuring that hidden rows are not included in the total:

 SUBTOTAL() will include the same aggregates but omit hidden rows from the total

In this example we have hidden row 4 using the Data, Group feature and we can see how SUBTOTAL() with its first argument set to 109 or 9 behaves differently and how it compares to SUM():

How SUBTOTAL() behaves differently from SUM()

Note that, as this example shows, the argument of 109 will exclude values in rows that are hidden either using the Hide command or by 'collapsing' a group of rows. Values that are in rows that are 'hidden' by applying a filter will always be included in SUM() calculations and excluded from SUBTOTAL() whichever of the SUBTOTAL() argument numbers is used:

Filtered rows will be included in SUM() calculations but excluded from SUBTOTAL()

There is another key difference between using SUM() and SUBTOTAL(). If a SUM() range includes cells that contain other SUM() or SUBTOTAL() calculations, then they will be included in the total, potentially resulting in values being double-counted. SUBTOTAL() will ignore other SUBTOTAL() cells but will include SUM() cells:

SUBTOTAL() will ignore other SUBTOTAL() cells but will include SUM() cells.

There is another, less obvious, difference between SUM() and SUBTOTAL(). SUM() can include '3D' references that refer to the same range of cells on multiple contiguous sheets. Using a 3D reference in a SUBTOTAL() formula will generate a #VALUE! error:

Using a 3D reference in a SUBTOTAL() formula will generate a #VALUE! error.

3D references can be created by clicking on the first sheet tab, then clicking on the second sheet tab while holding down the Shift key. You can then let go of the shift key and click on the required cell. The resulting formula will then include the values in the designated cell of each of the specified sheets and all the sheets positioned between them.

AGGREGATE()

Excel 2010 introduced an even more flexible aggregate function named, appropriately enough, AGGREGATE(). AGGREGATE() provides many additional calculation options and more control over the other types of cell values to avoid. The following calculation functions are added:

  • MEDIAN
  • MODE.SNGL
  • LARGE
  • SMALL
  • PERCENTILE.INC
  • QUARTILE.INC
  • PERCENTILE.EXC
  • QUARTILE.EXC

Rather than the value of the first argument combining the type of calculation and the treatment of hidden rows as for SUBTOTAL(), AGGREGATE() has separate arguments for each, allowing 8 different variations to be provided for:

AGGREGATE() can accommodate up to 8 separate arguments.

The percentile and quartile options are worthy of further explanation. I’m not a statistician so if you have a clearer and simpler explanation, or if I’ve just got it completely wrong, please feel free to add an explanatory comment to this article.

Percentiles and quartiles relate to a set of values and how individual values are ranked within the set. For quartiles, the range of values is, in effect, divided into 4 and the quartile function returns the minimum value that would need to be attained to rank in each section. For the third quartile it would return the minimum value needed to rank in the top 25% of values, for the second it would be the minimum value for inclusion in the top 50% (this would be the same as the median value) and for the first quartile it would be the minimum value for inclusion in the top 75% of values. Where necessary, the functions will interpolate to determine the value as you will see in the following example. QUARTILE.INC() also allows the use of 0 and 4 for the quartile argument, these will return the lowest and highest values respectively, and will therefore return the same values as MIN() and MAX().

Percentiles work in exactly the same way as quartiles except that, rather than just dividing the set of data into 4 sections, they divide it into 100 sections. This means that the 25th percentile is equal to the first quartile, the 50th percentile to the second quartile and the 75th percentile to the third quartile. If a value lies in the 99th percentile, it is higher than 99% of the other values.

Hopefully, this example will help explain the concept for fellow non-statisticians:

A non-statistical view of AGGREGATE function results

Tags:

Replies (4)

Please login or register to join the discussion.

avatar
By brian.barrett
22nd Aug 2014 22:15

Subtotal and hidden columns

Very good points I was not aware of with the sum(a1,a2) operating differently from a1+a2; and with the 3D version of sum().

As an aside, Simon, can the 3D sum be used with range names local to each sheet - i.e. the same range name but each range only locally defined so sum(sheet1:sheet2!RngName) ?  I have tried and failed but perhaps it's just me!

You also mentioned the subtotal(1xx,a1:g1) not working with hidden columns - strangely this week I had a severe need to do a sum across columns but ignoring hidden cells/columns.  I spent time online but there was not easy non-VB way of doing it.  Eventually I sorted it by using the cell("width", ...) function - this gives the [character] width of a cell and appears to return 0 for hidden cells.

As such therefore I was able to do a column based subtotal(109,a1:g1) by using row 2, which could later be hidden, to contain [a2] =if(cell("width",a1)<1,0,1); and row 3, which again could be hidden, to contain [a3] = a1 * a3.  The required result would then be =sum(a3:g3).  An array formula would do away with the need for row 3 - and, I have not tried, possibly row 2 as well if the cell() function will go into an array formula.

Whilst this does work, are there any plans to allow columns to be hidden?

 

Thanks (0)
Simon Hurst
By Simon Hurst
26th Aug 2014 08:55

Worksheet-scope range names and Cell() in array

Thanks Brian. I'm glad you found some useful points in there. I'm afraid I haven't got much positive news on the questions you have asked. As far as I am aware it is not possible to use worksheet-scope range names in a 3D formula - I certainly haven't been able to make it work but would be delighted if anyone else has found out how to do it.

Your hidden column solution looks good but I haven't had much luck trying to reduce it to a single cell array formula - let me know if you succeed! Given the recent introduction of AGGREGATE() with its additional capabilities I suppose there is always hope for an enhancement to deal with hidden columns, in the meantime, the other solution would be to create a user-defined function that does what you want. One of the third-party utility add-ins for Excel might include such a function.

Thanks (0)
avatar
By Cantona1
03rd Sep 2014 10:21

AGGREGATE() with option 2

AGGREGATE() with option 2 (error trapping) Is a handy alternative to using Arrays.

Thanks (0)
avatar
By brian.barrett
04th Sep 2014 23:46

simon ...

.. thanks for the reply - I also was unable to find a single cell formula that would give the array product of my width fuction and associated value - so the best I am able to do to get, say, the sum of an array that goes across columns, is to have a hidden row that is set to 0 if the cell/column width is zero, and 1 if not.  An array formula will then give the sum of this hidden row and the array you want to get the sum of visible cells.

Unfortunately I feel unable to use VBA or user-defined functions due to company rules!

Hopefully upcoming versions will allow for hidden columns, but the way columns are treated does indicate to me that rows and columns for some reason are treated fundamentally different.

Thanks (0)