Working with text in Excel
In this article, Liam Bastick looks at different ways of working with text in an Excel spreadsheet.
There are certain functions in Excel that will allow you to select only part of a text string. For example, you may have a collection of (fictitious) telephone numbers and wish to deal with them as follows:
Typically, you need to use the text functions (as they are known collectively) in combination to extract data from text strings.
For example, the LEFT / RIGHT / MID functions allow you to extract only a specified subset of a given text string. One example might be that zip codes have been entered with 9 digits, such as 12345-6789, but you only need the first five digits. To make this conversion, insert a column next to the zip code column and use the LEFT formula to extract the first five digits.
The syntax of LEFT is as follows:
The text argument refers to the cell you want to convert and [number_of_characters] is the number of characters you want to extract from the cell, counting from the left. The reason the argument is optional (that is, there are square brackets around the second argument in the syntax) is because if it is not specified it is assumed to be one (1). The formula above starts from the left of the selected cell (B3) and pulls in the number of characters you want from cell C3 (5).
Once you have entered the formula for all cells that you need to manipulate, it's important you turn your formulas into non-dependent values. This is so the original data may be removed/moved/deleted without any consequences affecting the results. To ensure this, highlight the cells that contain the text formulas, copy the cells (Ctrl + C), and go to Edit > Paste Special. Then, select 'Values' and press 'OK'.
This will paste the cells as plain text so that the formula no longer appears in the cells. This allows you to move cells around without having to worry about the formulas referencing incorrectly or miscalculating.
The RIGHT function works similarly to LEFT, except it starts from the right and counts backwards. Everything else is the same. The MID formula is slightly different. MID is short for Middle, which means you can capture parts of a word, starting at any point in a cell.
Its syntax is as follows:
Note that there are no optional arguments in this function:
- text is as before and refers to the cell/string you wish to convert
- start_number is the starting position (that is, which will be the first character in the text string subset to be extracted). For example, 2 would mean the second character, 3 the third character and so on
- number_of_characters is the length of the extracted subset text string.
=MID("One morning I shot an elephant in my pyjamas.",23,8)
returns “elephant”. That is, the “e” of “elephant” is the 23rd character in the text string and “elephant” is eight (8) characters long.
LEN(text) gives the length of text in terms of the number of text characters. For example, blank cells have zero length, “” has length 2, but =“” has length zero. The LEN function does not lie: this can be a good way to check for excess spaces and when to call in the TRIM or CLEAN functions mentioned earlier.
Other functions include:
- FIND(find_text,within_text,[start_number]) is a search function which is case sensitive but does not allow wildcard characters. It seeks out the first instance of a character or characters (typed in inverted commas) in the within_text text string. The start_number argument is optional (hence the square brackets in the syntax) so that the first few characters in a text string may be ignored. If the find_text cannot be located within within_text the error #VALUE! is returned.
- SEARCH(find_text,within_text,[start_number]) is a search function which is not case sensitive but does allow for wildcard characters. It seeks out the first instance of a character or characters (typed in inverted commas) in the within_text text string. The start_number argument is optional (hence the square brackets in the syntax) so that the first few characters in a text string may be ignored. If the find_text cannot be located within within_text the error #VALUE! Is returned.
- REPLACE(old_text,start_number,number_of_characters,new_text) is a function that allows you to swap one or several characters in a text string with another character or a set of characters. In the old_text, it seeks out the characters to be swapped by starting at the start_number of the text string and replacing the number_of_characters with the new text. For example,
=REPLACE(“Get the answer next time”,5,10,”it right”)
becomes “Get it right next time”.
- SUBSTITUTE(text,old_text,new_text,[instance_number]) is similar to REPLACE, as it replaces one or more instances of a given character or text string (old_text) in a text string with a specified character or string (new_text). The optional instance_number cites the occurrence of old_text you wish to replace. If this is omitted, all occurrences will be replaced, as this example demonstrates:
Here are some more examples to make these functions clearer:
You might also be interested in
Recognised by Microsoft as one of 104 Most Valuable Professionals (MVPs) in Excel worldwide by Microsoft, Liam has over 30 years’ experience in financial model development/auditing, valuations, M&A, strategy, training and consultancy. He has headed Ernst & Young’s modelling team in Melbourne and was an Assistant Director in their...