Excel summary functions: Conditional sums
In the second instalment of his overview of Excel’s summary functions, Simon Hurst delves further into the different capabilities of SUMIF, SUMIFS and array formulae for creating conditional sum arguments.
Adding up is one of the most common things to do in a spreadsheet. At its simplest, this might involve using the + operator to add up the contents of a couple of cells but, for anything more complicated, one of Excel's aggregate or conditional sum functions is likely to be necessary. When working with significant volumes of data, the best approach might be to use a PivotTable to summarise the data. Specific Excel functions are available to work with individual totals from a PivotTable or even directly from the underlying data. Adding is just one of a range of calculations that can be performed on a set of values and some of the functions that generate totals can also be used to perform these other calculations.
Excel summary series - summary
This series looks at different Excel functions and features that can be used for adding up and evaluating values. The previous article looked at simple sums and examined the more flexible aggregate functions introduced in Excel 2010, SUBTOTAL() and the AGGREGATE(). This article looks at various ways to extract summaries conditionally, for example to calculate the total sales made of a particular product from a detailed list of sales invoices. The methods covered include:
Back in the days before the arrival of Excel 2007 Excel’s only conditional sum function was SUMIF() which did exactly what its name suggests – it summed a range of values if they matched a criterion.
At its simplest, SUMIF() only requires two arguments: the range of cells containing the candidate values and the criterion against which to compare them. For example we could add up all the values in a list that are equal to 50:
We are not limited to an ‘equality’ criterion, we can also use the basic comparison operators =<>, but if we do so, the criterion needs to be surrounded by double quotes:
Anyone familiar with the ICAEW 20 Principles will probably be starting to shout at their computer screen now, invoking principle 14: “Never embed in a formula anything that might change or need to be changed”. Both of our SUMIF() formulae contravene this principle in at least one, and probably in both, arguments. We might most obviously want to change our criterion to a different value, but we might also need to cope with a criteria range of values that expands as we add data.
First of all, we’ll consider our criterion value. We can put this value in a cell and then refer to that cell from our formula:
This is not quite so easy where we use a comparison operator. If we just put our value in a separate cell, we need to construct our criterion by appending, or concatenating, the cell reference to the text version of the comparison operator using the & operator:
For maximum flexibility, you could go even further and put the value in one cell and the comparison operator in another:
With regard to the range of cells in the first argument, probably the easiest way to ensure that this changes automatically is to turn the range into an Excel Table (or for 2003 an Excel List). Here we have clicked a cell in our block of data and used the Control+T keyboard shortcut to convert the range to a Table. We have then used the Table Tools, Design tab to give the Table a descriptive name, in this case: SalesInvoices. Note that spaces cannot be included in a Table name, hence the use of the initial capitals.
If we re-enter our formula using our Table, the first argument range will be a reference to the entire Table column, not only ensuring it adjusts automatically for added rows, but also making the formula itself easier to understand:
If you only want to count items, rather than sum them, then a similar COUNTIF() function is available and for calculating averages, AVERAGEIF()
So far, we have used SUMIF() with just the first two compulsory arguments. The optional third argument refers to a ‘Sum range’ allowing values in one column to be summed based on whether values in another column match our criterion:
Although SUMIF() was, and still is, a very useful function, it does suffer from one main drawback – it can only cope with a single condition. For some calculations, you can get over this by using multiple SUMIF() functions in a formula. For example, if we wanted to total all sales between two dates we could use one SUMIF() to find the total of all sales before or equal to the last date and then subtract another SUMIF() that totals all the sales before the first date:
The Conditional Summary Wizard is dead, long live SUMIFS()
However, if you wanted to find the total of all sales of Pavlova made by Janet Leverling, SUMIF() wouldn’t be able to cope as there is no way to include two different criteria ranges. Prior to Excel 2003, the obvious solution (leaving PivotTables until the next instalment) would be some sort of array formula. Excel 2003 and 2007 included a Conditional Sum Wizard that guided you through the creation of an array formula step-by-step. From Excel 2007 onwards, a simpler solution would be to use the new SUMIFS() function that allows for up to 127 pairs of criteria range and criteria (there is also a COUNTIFS() and an AVERAGEIFS() function). The SUMIFS() syntax is slightly different to SUMIF() in that the sum range is the compulsory first argument followed by the criteria range, criteria, pairs:
An array formula equivalent could be based on using Boolean logic to multiple the sum range by a True/False test on each cell in the criteria range. The True/False test would compare each value in the criteria range with the criteria value. If the ‘statement’ evaluates as True this can be treated as a value of 1, if False it will be 0.
Turning a formula into an array formula causes it to evaluate ranges of cells as a series of individual calculations. So in our example it multiplies the first cell in the ExtendedPrice column by whether the first cell in the ProductName column is equal to K5, and then by whether the first cell in the Salesperson column is equal to L4. Because they are part of a mathematical operation (multiply) True will be converted to 1 and False to 0, so that a value from the ExtendedPrice column will only be included in the sum if neither of the two criteria statements evaluate as False. If one or more is False, then the total is multiplied by 0 and is thus 0 itself. This is repeated for each of the values in the column. The initial SUM() then adds up the total of all the individual calculations.
Array formulae have to be entered by accepting the entry of the formula using Control+Shift+Enter, rather than just Enter. This turns the formula into an array formula and adds the curly brackets at the beginning and end. Typing in the curly brackets at each end manually will not turn a formula into an array formula. Instead, it will just be treated as a text entry. This makes array formulae quite ‘fragile’ if someone edits the formula, even if they make no changes, unless they save it again using Control+Shift+Enter, it will cease to be an array formula and will almost certainly return an incorrect answer. Also, because array formulae generally perform a large number of individual calculations, using lots of array formulae, particularly if they refer to large ranges, can be slow.
The fragility drawback of an array formula can be overcome by using the SUMPRODUCT() function as a ‘wrapper’ for the arrays. SUMPRODUCT() appears to be a fairly unremarkable function, it just multiplies its arguments by each other. So, SUMPRODUCT(2,3,4) returns 2*3*4=24. However, SUMPRODUCT() treats each of its arguments as an array, so a ‘normal’ formula can be placed within SUMPRODUCT() to make it work like an array formula:
Given that SUMIFS() seems so much simpler than the array alternatives, if you have Excel 2007 or later and therefore have access to SUMIFS(), you might be wondering why you would ever bother with an array or SUMPRODUCT(). In fact, there are some circumstances where you won’t be able to use SUMIFS() but you might be able to use an array.
One such circumstance involves references to other Excel workbooks. SUMIF() and SUMIFS() return an error if they refer to cells on an external workbook that is not open. An array formula, or SUMPRODUCT() will work with references to closed workbooks.
Also, SUMIFS() can only cope with simple comparison operators. If you needed to perform an operation on one or more of the data ranges using a function, you couldn’t do this within SUMIFS(), instead you’d have to add one or more columns to the data to perform your calculations and then refer to those columns. For example, you might want to add up the sales of all the products that include the word ‘hot’. You cannot apply a function to the criteria range within SUMIF() or SUMIFS() but you can in an array formula or when using SUMPRODUCT():
Next time we'll have a look at some different approaches to extracting summaries including the use of the DSUM() function. However, if you can cope with the need to remember to refresh PivotTables, then extracting individual values from a PivotTable based on your data might be the simplest solution of all. We’ll look at how to do this using the GETPIVOTDATA() function or a cube function.