welding
istock_gilaxia

Supercharge Excel: Consolidate data in files and folders

by
30th Dec 2016
Partner The Knowledge Base
Share this content

In the third part of a series of articles, Simon Hurst looks at how Excel's Power Query/Get & Transform tools can be used for a great deal more than just acquiring data from external sources.

Introduction

The first part of the series included an overall introduction to how these tools work, what they are capable of and how they might require a different way of thinking compared to 'standard' Excel techniques.

Part two looked at using the Power Query tools to automate the process of appending several Tables of data into one single Table. As stated in the first Part’s introduction, for convenience, we will refer to Power Query throughout, but Excel 2016 Get & Transform and Power BI Get Data operate in a similar way.

The story so far…

Last time we created a single table of data by using the From Table command to create three separate query connections to three Tables of data in a single Excel workbook. We then changed two of the column headings in one of the queries to ensure all three were consistent, before using New Query, Combine Queries, Append to create our single, consolidated Table.

This automates the consolidation process when we need to cope with changes in the number of rows in each Table, but if we needed to add one or more new Tables, we would have to turn each into a query and change our Append query to include those additional queries.

There are other methods we can use in Power Query to automate the consolidation of a variable number of Tables. We will look at creating a connection to a single workbook to automatically append all the relevant Tables in that workbook, and then look at a connection to a folder which will allow Tables to be consolidated even if they are in different workbooks.

Here are the three Tables we worked with last time:

Source: Simon Hurst

New Query, From File, From Workbook

We'll start by creating a connection to a single file, in our case the Excel workbook containing our three Tables of data. From within our workbook, we use New Query, From File and then choose From Workbook and navigate to the same workbook.

Once selected, the Navigator window will display the contents, showing the whole Workbook and any Tables, Sheets and Range Names. We could turn on 'Select Multiple items' and choose just the Tables that we want to consolidate, but this would cause the same issue as our original approach: we would need to change our query to accommodate any new data tables that were added. Instead, we will select the entire file using the top level item.

Source: Simon Hurst

We then click Edit to start setting up our consolidation query:

Source: Simon Hurst

We want to set up our query to automatically include existing Tables, together with any new ones we add. We can filter the Kind column to select all Tables but this creates a problem. We want to make use of our consolidated data once we've completed our query. We will probably do so by loading our query into a Table in our workbook.

Once we have saved our workbook (a query that uses a file as its source works on the last saved version of the file), our consolidated Table will be included as one of the file objects. As a result, our filter on the Kind column will include our consolidated Table as well as the source Tables. For this reason, it is usually better to leave our data Tables in one workbook and create our consolidation in a separate workbook.

In our case, we could handle the issue by ensuring that the Tables that we want to use are differentiated from any other possible Tables in some way – for example by using a prefix as we have done.

As well as filtering on the Kind column for Table, we could filter the Item column for text that starts with tbl:

Source: Simon Hurst

Note that we have renamed our Applied Steps to make it easier to understand what each does.

We can then right click in our Data column heading and choose to Remove Other Columns, just leaving us with the Data column. The header of this column includes the Expand icon which allows us to show all the rows in all the Tables.

We can see a preview of the available rows by clicking in the white space to the right of the word Table in each 'cell'. The Expand operation gives the choice of Expanding the rows or creating one or more Aggregate values. We can choose which columns to include and whether or not the column names should be prefixed by the original column name:

Source: Simon Hurst

We now have to deal with the fact that our column names are not consistent. As we can see below, the Customer name and Product name column headings differ across our Tables and, as a result, we have multiple columns for each. We can select each pair of columns in turn using Click and Shift+Click or Control+Click and then use the Transform Ribbon tab, Merge columns command.

Because one of our two columns must always contain 'null' while the other has the value we want, this will result in a single column that contains all our Customer names:

Source: Simon Hurst

Finally, we can tidy up our column names and change our data types for the ExtendedPrice and OrderDate columns using the icon at the left of each column heading:

Source: Simon Hurst

We can then Close & Load our query to create our Table of consolidated data ready for analysis.

To demonstrate the ability to cope automatically with the creation of a new Table, we have added a sheet and a Table for Spain. We need to save our workbook and then refresh our query. The new rows for Spain should be included in our consolidated Table:

Source: Simon Hurst

New Query, From File, From Folder

An alternative approach is to create a query that connects to a folder and potentially includes all the used rows in all the workbooks in that folder. To demonstrate this, we have set up a new folder called Country data and added four separate workbooks, each containing one of our Tables. We then use New Query, From File, From Folder and browse to our folder:

Source: Simon Hurst

We can then choose to Edit our query as before.

Using a folder in this way does create some additional issues. We might need to filter our files to only include workbooks and also to only include workbooks that start with a particular prefix.

Alternatively, we could just be careful only to add the files we want to use to the folder. We also don't have an existing column that we can just expand to include all our rows. Instead, we use the Add Column Ribbon tab, Custom Column command to create such a column. We need to use the ‘Excel.Workbook’ function as shown here:

Source: Simon Hurst

 

We need to know that the Excel.Workbook() function exists and ensure that we get the letter case right as the Power Query formula language (also known as 'M') is case sensitive.

The 'Learn about Power Query formulas' link allows you to explore the various categories of function available. We can select the 'Content' column from the list of Available columns: to add it to our formula.

Our new column contains a list of the objects in each workbook which we can expand:

Source: Simon Hurst

We can use the Kind column to select Tables or Sheets, and we can then expand the Data column in turn to show us our individual rows. Note that this method consolidates by column position rather than column name, so we don't have the same problem with multiple columns for Customer and Product names:

Source: Simon Hurst

In our example, the Country column in the data designates the data source, but it's worth pointing out that, if the data itself doesn't show the data source, you can use the various text manipulation features covered in the first part of the series to extract the source details from the workbook name or the sheet or Table name.

For example, we could split the workbook Name column using the full stop as the delimiter to create a column that holds just the workbook name as the source.

As in the previous From File method, we can then remove any columns that we don't need and set the data types for our ExtendedPrice and OrderDate columns before loading our data to a worksheet. If we placed another workbook file in the Country data folder and refreshed our query, the additional data would automatically be included in our consolidated data table.

Tags:

Replies (0)

Please login or register to join the discussion.

There are currently no replies, be the first to post a reply.