Cutting Big Data down to size – part 4: Geographical databy
In part four of his Big Data series, Simon Hurst moves away from Power BI and back to Excel, where he looks at other ways to represent data with a geographical element. This will be the first part of a two-part article.
In the first part of this series we discussed possible definitions of Big Data and considered whether some of the technology and techniques involved could be of use to businesses of all sizes.
In part two we used Microsoft Power BI Desktop to combine some internal accounting data with over 1.6m rows of data downloaded from the Ofcom website and plotted the resulting values on a Power BI Map visualisation. Part three concentrated on Power BI data visualisations and how they interact.
This time we will concentrate on acquiring a simple table of data from the Internet and then shaping it to the data structure that we need to work with. We will then use a 3D Map to present our data graphically before considering the simpler Filled Map and Bing Maps alternatives.
Next time we will modify our data query to compile data from multiple web pages and add a time dimension to our map graphic.
Note that the Filled Map is only available in Excel 2016 and, because it was included in a recent monthly update, it may currently only be available to Office 365 subscribers. There is a Bing Maps add-in that works in a similar way for those with Excel 2013 or earlier updates of Excel 2016. 3D Map is included in all editions of Excel 2016 but was originally introduced as the optional Power Map add-in in Excel 2013.
World Athletics Championships medal table
We'll use a topical example. The IAAF website includes medal tables for all 16 world championships. We can select a particular championship from the dropdown and the URL changes to show the appropriate table. We will start with the 2015 championship in Beijing.
We will use the Power Query tools. These are available as an add-in for Excel 2010 and 2013, and are built in to Excel 2016 as the Get & Transform group of the Data ribbon. The From Web option lets us specify the URL to use:
On clicking the OK button we are taken to the Navigator where we can choose which element of the web page we want to work with. We are lucky in this case: Table 0, containing our medal table, is immediately visible and we can select it and confirm in the preview that it is the table we need. Often, the information in webpages is hidden quite a few levels down and requires considerable trial and error to unearth it:
We need to change the way our data is structured in order to use it with our map graphics, so we use the Edit button to open our table in the Query Editor:
Our aim is to create a table with three columns: Country, Medal type and Total. We can right-click in the column heading of each of the Rank and Total columns which we don't need and Remove them. Currently, we have separate columns for Gold, Silver and Bronze totals. We need to reorganise these into a single column.
First, we right-click in each of the three column headings and choose to Rename them as Gold, Silver and Bronze respectively. We then click on the Gold column heading and shift+click in the Bronze heading to select all three columns. Another right-click on any of the three selected column headings will allow us to Unpivot Columns:
As the descriptions suggest, this is the opposite of Pivoting a field which creates separate columns from all the different items in a field. Here we create a single column that contains each of our column descriptions, and a separate column containing the value at the intersection of each Country and medal type.
As shown below, we now have three times as many rows as before, with a row for each country for each medal type:
We will rename our 'Attribute' column as 'Medal'. Our data is now structured as we need it, so we can choose Close & Load to load it to a worksheet Table. We can also right-click on our query and rename it as Beijing:
Power Map/3D Map
3D Map and its predecessor, the Power Map add-in, create animated visualisations of your data that you can view as a tour and convert to an HD video file. We are using Excel 2016 so the Insert Ribbon tab includes a Tours group with the 3D Map command.
When we click on 3D Map we see a globe and a Layer pane with areas for us to add fields from our available queries. For our Beijing medals table we need to drag Country to the Location area, Value to the Height area and Medal to the Category area:
The Location area needs more work. We have to specify which type of geographical location our field represents. 3D Map will then indicate any issues with allocating our field entries against the locations it will use. A confidence percentage will be displayed that we can click on to see the reasons for it being below 100%:
Here we can see that Great Britain & N.I is not recognised at all and Israel is potentially ambiguous. In fact, if we check our map, we can see that the Israel medal is plotted correctly. Without wishing to exacerbate an already fraught international situation, we need to replace Great Britain & N.I with United Kingdom or with UK.
To do this we return to our worksheet and right-click on the Beijing query in the Queries & Connections pane, and choose to Edit the query. We can then use the Replace Values command in our Country column to make our change. If we click on one of the cells containing the text we want to replace and then right-click, our Value to Find box will contain the required value:
We can then return to our 3D Map screen and use the Refresh Data command in the Home Ribbon tab and check our confidence percentage again.
We can now start playing with all the 3D Map options to plot our data. Here we have used an Aerial Colour Theme and a Flat Map with Map Labels. We have also chosen the Bubble chart type and changed the Layer Options to make our bubbles bigger and changed each category colour to represent the type of medal. We can hover over any of our 'bubbles' to see the details:
Other map graphics
If you don't have access to Power Map or 3D Map, all is not lost. As we saw in the previous part of the series, Power BI provides a free alternative that has a basic map option. In addition, towards the end of 2016 a Map Chart was introduced for Office 365 subscribers to Excel 2016. Prior to this, from Excel 2013, mapping add-ins, including Bing Maps, have been available.
Bing Maps needs quite a simple set of data to work with. Here we have right-clicked on our Beijing query and chosen to Duplicate it. We have then chosen to Edit our new query and remove the steps that Unpivot our medal type columns and rename our resulting Attribute column (which no longer exists).
Having installed Bing Maps from the Add-in Store, we have selected our entire query Table, inserted a Bing Map, and changed the Settings to generate a similar chart to our Power Map:
By the time the next part of the series is produced, London 2017 will be over, so we will adapt our existing query to bring in a list of championships, rather than just Beijing, together with the years in which they took place, and then use our new date and time field to animate our 3D Map.