By some strange coincidence, at about the same time I was writing an overview of the Excel 2016 changes, a question was being posted to Excel Zone Any Answers concerning the best way to compare two lists in order to identify items that were in one list but not the other.
Without wishing to claim any supernatural powers of prophecy, the example in the 2016 article of the potential of the new Get & Transform tools was particularly relevant to the question asked:
“The inclusion of the Power Query tools as an integral part of Excel 2016 within the Get & Transform group of the Data ribbon tab not only extends how much of the data processing burden Excel can cope with, but also has the potential to change the way we work within Excel itself. To take a simple example, complex lookups involving large numbers of separate formulae could be replaced by joining two Excel tables within a Workbook Query.”
This was the Any Answers question:
List A:
12232
15353
54684
51564
List B:
12232
54684
99853
55887
All I want to see is if there are numbers in list B that aren't on list A. I'm not worried about the reverse - it can appear on list A and not B and that doesn't matter. How can I easily compare the two lists? So in the above example, I want 99853 & 55887 somehow highlighted or conditionally formatted so that I can add them to list A.
Lots of answers were suggested, most of which would indeed have solved the problem, but it’s worth considering the advantages and drawbacks of a Get & Transform approach. Of course, for lists of four items, Get & Transform wouldn’t really be necessary, but as the number of items increases, so do the advantages of a database-style approach rather than using Excel formulae.
The Get & Transform solution
Here we have our two lists. They would probably be on separate, named, sheets but, to make it easier to see what is going on, we have just put the lists side-by-side on the same sheet. We click on each list in turn and use the Data ribbon tab, Get & Transform group, From Table option. This automatically converts each of our Excel lists into an Excel Table. We can give our queries meaningful names such as ListA and ListB. In this example we have also added the optional step of using the Add Custom Column option from the Query Editor, Add Column ribbon tab to identify the source of each list just by giving the new column the name Source and setting the formulae to =”ListA” and =”ListB” respectively.
(Click to enlarge)
We just want to use these queries as background connections for our next operation so, from the Home ribbon tab, we click on the Close & Load dropdown and chose Close & Load to… Only Create Connection:
(Click to enlarge)
Having completed this operation for both lists, we have two queries named ListA and ListB.
The next step is to combine the two queries. When we do so, we will have a range of options as to which rows to include in our output. The original request was to show the items in ListB that were not also in List A. From the Get & Transform group we choose New Query and the Combine Queries, Merge option. We are then prompted to choose the two tables to merge, the columns that create the join, and the kind of join to use:
(click to enlarge)
We have selected ListB as the first table because we want our result to include Codes from this list for which there is no match in ListA, the second table we choose. We just click in each of the Code columns to indicate that this is the column to use to create the join. Finally, we choose the type of join. In our case we want “rows only in first” – i.e. rows that are only in the first of our two tables. At the bottom of the screen the number of rows that the merged query will return is shown. Once we have clicked on OK to open the Query Editor screen, we just need to click on Close and Load to load our result into a separate sheet. Here we can see the result along with our Workbook Queries:
The Refresh Issue
One of the real benefits of this approach is automation. The Query Editor creates a series of steps that are applied to the underlying data whenever the query is refreshed. This means that further rows can be added to each table and all that needs to be done to update the results table is to right-click in it and choose Refresh. However, the need to refresh can seem something of an alien concept to Excel users who are accustomed to formulae automatically being recalculated whenever a change is made. It is certainly possible to go to Data Connections, Properties and set the Refresh control for a query to refresh every so many minutes. Even though this removes all manual intervention in a similar way to automatic recalculation, it is based on a time interval, not on when something is changed. This means that, unlike an Excel spreadsheet with automatic calculation turned on, there can be a period of time (albeit only a few seconds) when the result does not reflect the latest changes.
Going further
Returning to the original question, the reason to identify the items in List B that were not also in List A was to then be able to add them to List A. Get & Transform could do this automatically. It would be possible to achieve this result in fewer steps but we will adopt the simple approach of just appending our ‘Merge’ result with our original List A.
Once again we use Get & Transform, New Query, Combine Queries, but this time we choose the Append option and just Append our ListA query with our new Merge2 query:
We could sort our columns as we wish just by using the dropdowns in the Query Editor columns before we use Close and Load to return our results as a new Table. Here we have sorted by Code within Source so we can see all our List B items at the bottom:
We can add items to the end of each of our lists and refreshing our Append query manually, or at a set time interval, should automatically add any items in List B that are not in List A to the complete List A.
Conclusions
If you are already a black belt in VLOOKUP() and COUNTIF() then you might naturally prefer an Excel cell-based solution. However, if you can imagine starting from a position of ignorance of Excel functions as well as the Get & Transform tools, then you might find the use of queries to be simpler and more automatic. There is also an argument that a cell-based approach is more error-prone as it involves potentially thousands of cells, all containing what should be the same formula, but each of which could possibly be overwritten or corrupted.
This is just one example of what Get & Transform can do. For a few more, have a look at some of the articles in Excel Zone on Power Query:
Power Query trick for reconciling cheques
You might also be interested in
Simon Hurst is the founder of technology training consultancy The Knowledge Base and is a past chairman of the ICAEW's IT Faculty.
Replies (3)
Please login or register to join the discussion.
excel
Hello everyone,
I am using excell 2014 and in my opinion it is perfect for me. Hope that the new versions wont be more complicated than the previous.
Best wishes,
Clara
Excel versions
As Excel 2014 does not exist (on WIndows or Mac), I'd say you are pretty confused to begin with!
Series of articles
Hi Clara - hopefully the series of articles on Excel 2016 will help you confirm whether or not Excel 2016 includes anything that makes it worth you updating. I don't think I have noticed any extra complications. In fact the 'Tell me what you want to do..." box in the riboon should help make it easier to use.