Clean source data with Excel Fuzzy add-in
Anyone working with accounts data will know the soul-destroying tedium of standardising and cleaning data imports so that you can use them for pivot tables, analysis and Vlookups.
An cure for some of this pain is now available, thanks to the boffins at Microsoft’s BI Laps, who have come up with the Fuzzy Lookup Add-In for Excel.
This nifty tool helps to identify and match text strings within incoming data to the phrases contained in your existing Excel workbook. And best of all, it’s available as a free download from the Microsoft BI Labs page.
As it says there, the add-in can scores each text string for similarity and be used to identify fuzzy duplicate rows within a single table or to fuzzy join similar rows between two different tables. The matching will pick up a wide variety of errors including spelling mistakes, abbreviations, synonyms and added/missing data. So rows reading “Mr. Andrew Hill”, “Hill, Andrew R.” and “Andy Hill” would have a high similarity score the target entity for a proper match.
It also has a customisation option to help you refine the add-in for your own data sources and languages.
Videos are available on the Microsoft BI Labs website, and Debra Dalgleish provides some step by step instructions and illustrations on her Contextures blog.
The BI Labs site is relatively. Having given Microsoft a hard time in the past for botching its BI strategy, it’s good to see some thought and resources going into what should be a crucial area of differentiation for the Windows software giant. PowerPivot is getting a lot of good press from experts, and if you’ve got a taste for really sophisticated analytics, the labs are also showcasing two new tools:
- PivotViewer Extension for Microsoft SQL Server Reporting Services, which lets you pull and display data from Microsoft SQL Server Reporting Services into a Silverlight-driven pivoting display environment that promises to give users the ability to interact and view relationships between individual pieces of information in an exciting, new way.
- Microsoft SQL Server Data Mining for the Cloud - A collection of cloud-based data mining services mirroring the capabilities currently available in SQL Server Analysis Services.