Working with text in Excel
As well as numerical data, text often finds its way into spreadsheets. Simon Hurst explains different ways you can deal with it.
Sometimes, in order to gain access to the numbers, you need to manipulate text information. An obvious example would be a nominal code where different sections of the code represent different aspects of the entity or item.
There are a few ways in which existing text can be manipulated using a manual operation. If the text is part of an external file you can either open the file using File, Open with the Files type set to text files, or use Get External Data, From Text.
It’s worth noting that, although these methods look similar in many ways, there are some important differences. If you open the file directly then changes can be saved back over the original file. In contrast, Get External Data imports the file content into an existing Excel workbook. Also, if you open an existing file with a .CSV (comma separated values) extension, Excel will assume it knows how to deal with the file and will just split the file into separate columns wherever it finds a comma.
On the other hand, if you open a file with a .TXT extension Excel will automatically use the Text Import Wizard to allow you greater control over how your file is imported. Using Get External Data will always invoke the Text Import Wizard for a text file, regardless of the file extension.
If your text data is already in Excel then you can run the Text to Columns Wizard on text in a single column. The Text to Columns command can be found in the Data Tools group of the Data ribbon tab and offers the same options as the Text Import Wizard.
Excel 2013 introduced Flash Fill. This feature attempts to understand how you want to organise your text column by analysing a few examples and then applying the same approach to the rest of the column. In this example our text column to be split is column A. When we enter the first three characters of A2 in cell B2, then start to enter the first three characters of A3 in B3, Excel should show a ‘shadow’ set of entries in all the other cells adjacent to our data which we can accept by pressing Enter. In the screen shot below we have entered -98000 in cell C2 and - in cell C3. You can see the proposed entries which again can be accepted by pressing Enter. Note, however, that in this case Excel doesn’t understand the significance of the minus sign and replicates it all the way down the column, including for positive values:
These manual methods are useful in some circumstances, but often you will want to implement a more automatic approach.
Manipulating text could be part of an automatic process for taking an external file containing accounting data and turning it into a formatted report. The report could contain summaries of balances calculated by referring to parts of a nominal code. Get External Data could be used to create a link to the text file containing the list of balances and associated nominal codes. Using any of the methods already covered to split a long nominal into constituent parts would insert a manual step into the process. Instead, one or more additional columns could be added adjacent to the External Data Range containing formulae to extract each of our required nominal code elements.
Since Excel 2007, the data range can be created as an Excel table. The way Excel Tables work, a formula in a column will automatically be copied down to any new rows – whether added manually or created as part of a data refresh. If your external data range is not created as a Table, one of the properties allows you to “Fill down formulas in columns adjacent to data”, which has the same effect.
Fixed width text
There are three text functions that may be useful when splitting text by position: LEFT(), RIGHT() and MID(). LEFT() and RIGHT() are straightforward: each takes two arguments a reference to the cell containing the text and the number of characters to be returned. Unsurprisingly, LEFT() returns the designated number of characters starting with the leftmost character whereas RIGHT() returns the characters from the right-hand end of the text. MID() is slightly more complicated in that, as well as the reference to the text and the number of characters, it requires a starting position. In this example we have used each of the functions to split our code into three parts:
Things are not quite so straightforward when we need to work with text containing delimiters. In order to split our text correctly we need to establish the position of each delimiter – we can then use LEFT(), RIGHT() and MID() to split the text as required.
There are two functions that you can use to find the position of a particular character. FIND() and SEARCH() both take three arguments, a character to locate, a reference to a cell containing the text and an optional starting position from which to start the search. The starting position is necessary to enable more than one instance of the search character to be located in the same piece of text. The difference between FIND() and SEARCH() is how they deal with case: FIND() is case-sensitive and SEARCH() is not. The LEN() function can also be useful when working with text of differing lengths. It just takes a reference to the text and returns the number of characters it contains.
It’s also worth bearing in mind that text functions return text. If you are extracting a section of text that you want to use as a number, then you will need to convert the text to a value, for example by using the VALUE() function. Here is the example used previously with the Text Import Wizard but using formulae to make the operation automatic:
Of course, in this case, if the original data was in an external file, it would be much easier to create a link using one of the Get External Data options and set the comma as the delimiter.
An automated summary
Here our external text file is delimited, but we also need to split the nominal code by position. We have ensured our external data range is a Table and used our LEFT(), MID() and RIGHT() functions to split our nominal code in columns C, D and E:
Columns A and B are linked to our external data source so could be refreshed to bring in new and updated information. Note that, whether your data range is created as a Table or not depends, not only on the version of Excel that you are using, but also upon which of the Get External Data options you choose. Using 'From Text' will not necessarily create a Table rather than a normal external range (although in Excel 2013 if you choose the 'Add this data to the Data Model' option this will also cause the external data range to be created as a Table). Using 'From Other Sources', 'From Microsoft Query' and then using one of the Text drivers will create a Table.
As an example of using this technique to create an automatic calculation of a summary total we have used a SUMIFS() function - because there is a single criterion we could also have used SUMIF() – to calculate our Department totals. The calculation refers to entire Table columns so, as we add more rows, our text functions are copied down to those new rows and our formulae automatically includes any values where the Department value meets our criterion.
If we just wanted to analyse our date by the various nominal code components we could use a PivotTable but this example assumes we are extracting summary totals for use in a formatted report. An alternative method would be to the use of GETPIVOTDATA() or Cube functions with a PivotTable as covered in an earlier article in this series.