EZ guide to PivotTables 2 – From why to how
The first part of this series concentrated on why rather than how and sought to show that PivotTables can often be the quickest and simplest way to solve common Excel problems. Having hopefully shown why PivotTables are worth investigating, Simon Hurst looks in a bit more detail at how to use them.
Last time we used a PivotTable to create a top ten league table of average house price sales values in England, Scotland and Wales with just 17 mouse clicks using publicly available GOV.uk data.
Content seriesView full content series
We used the first CSV (comma separated value) file in the list - Average price.csv - and downloaded it straight into an Excel workbook before turning it into a PivotTable showing area and average price, filtered to show the top 10 highest values:
Before going any further, it's worth discussing one of the most significant causes of post PivotTable tristesse. If the data that your PivotTable is based on is inconsistent or badly-structured, it will make creating a useful and effective PivotTable much more difficult.
For example, in our house price data we have a single column for date, another for Region_Name and another for our average price. If our data had already been organised to show a separate column for each year it would have made it much more difficult to create our simple Top 10 PivotTable. Otherwise, we would not have a single value field that we could select, but instead, dozens of separate fields, each showing the results for a single year. With regard to data inconsistency, if a column is meant to contain dates or numbers, then every value in the column must be a date or a number. If it isn't, as well as potentially getting an incorrect answer, you might find that Excel defaults to counting the values in a field rather than summing them or that date fields say that they cannot be grouped.
In general, when using PivotTables based on worksheet data ranges, it's also a good idea to turn the range into a Table before creating the PivotTable.
The PivotTable should then automatically include adjacent new rows and columns when refreshed. If the PivotTable is just based on cell references rather than a Table name, then the PivotTable data source would need to be updated manually as new data is added as well as being refreshed.
Using a Table also helps make the formulae in the following explanation a bit clearer, so we've clicked on a cell in our block of downloaded raw data and used Insert, Table to turn in into an Excel Table that we have then named as tblHousePrices.
Our PivotTable uses two of our PivotTable areas – the Values area that aggregates its contents in some way, usually as a sum but in our case as an average, and the Rows area that contains the categories by which our values are grouped.
If you are used to using Excel's SUMIF() and SUMIFS() range of conditional functions it might help to think of the Values areas as being calculated in the same way as a SUMIFS() function, with the sum range being the entire contents of whichever field/column has been placed in the values area and the criteria range being the entire contents of whichever field/column has been placed in the rows area. The criteria value itself is the equivalent of each separate value in the rows area.
Industry insightsView more
So, to calculate the figure for Kensington and Chelsea using SUMIFS() we would have:
Our example used Average rather than Sum to aggregate our house prices so the equivalent of this would be:
The reason that we have used the IFS rather than IF version of each of these functions is that, as we go on to add fields to other areas of our PivotTable, or to add multiple fields to a single area, we can think of it as adding additional pairs of criteria ranges and criteria values to our notional conditional sum function.
Multiple fields in the row area
We'll start by just adding an additional field to the Row area. As we mentioned briefly last time, what happens when you add a date to the Row or Column area varies according to your version of Excel.
Before Excel 2016 you will see a list of each individual day in the field. In Excel 2016 your dates will be automatically grouped into separate Years, Quarters and Date fields with the Date field being grouped by month. To cater for the different versions, first click on the checkbox next to Date in the Field List. The Date field should automatically be added to the Rows area beneath Region_Name. Now, whether or not the Date field has already been grouped, right-click on any of the dates you can see (whether they are years or days) and choose Group and set the grouping to Years and Months:
Now, regardless of version, we should see each of our regions displaying a list of years, subdivided by month with subtotals displayed at each level. We can turn these subtotals off and on by right-clicking on any value in the relevant field and choosing subtotal ‘Field name’. We can also click the expand/collapse buttons next to each item to drill up and down through the levels of detail. If you want to expand/collapse all the values in a field, with that field selected, choose PivotTable Tools, Analyze Ribbon tab, Active Field group and then expand field or collapse field. Here we have turned off the subtotals for Region_Name and then selected any year and used collapse field to hide the display of individual months:
At the moment, our dates are subtotalled within Region_Name. We can change the order by dragging Region_Name to underneath Years and Date in the Rows box area of the Field List:
So far, we have only used two of our four available areas so our summarised values are presented as a long list in a single column. We can add items to the columns area to show our results in matrix form. As well as selecting the check box next to a field to have Excel place it in the default area, we can drag a field to a specific area, either from the field list or from any other area. This ability to visualise your data in different ways, just by dragging fields into different areas, is one of the reasons PivotTables can be such an interactive and effective way of exploring data.
Here we have dragged Date from the Rows area to the Columns area:
Note that both the Row Labels and the Column Labels headings include dropdowns allowing the fields in the respective areas to be sorted and filtered.
Talking of filtering…
We can drag Years from the Rows area to the Filters area where it will become a dropdown allowing us to select individual years or choose to Select Multiple Items:
Although we can see which year we have chosen when we select an individual year, if we do turn on Select Multiple Items and then select more than one year, our filter field shows the unhelpful 'Multiple Items':
In the next part of the series, we will be looking at overcoming the issue of selecting multiple items, together with making our PivotTables even more interactive as a first step towards using PivotTables to transform raw data into an interactive dashboard.
Simon Hurst is the founder of technology training consultancy The Knowledge Base and is a past chairman of the ICAEW's IT Faculty.