Common spreadsheet data issues and how to solve them
Excel MVP Liam Bastick presents some of the most common issues you might find in raw data and how to fix them.
Convert numbers stored as text to numbers
Sometimes when you import data from text files or external databases, numbers (including dates) get stored as text. Also, some users are in the habit of typing an apostrophe (‘) before a number to make it text. This could create serious issues if you are using these cells in calculations. Here’s two ways to convert numbers (that are not dates) stored as text back into numbers:
- Paste Special - In any blank cell, type the value ‘1’ (no apostrophes) and then press Enter. Select the cell containing the numerical value 1, and press Ctrl + C to copy it. Select the cell / range which you want to convert to numbers and then Paste –> Paste Special (keyboard Shortcut Alt + E + S). In the ‘Paste Special’ dialog box, select ‘Multiply’ (in the ‘Operation’ section):
Click ‘OK’. This converts all the numbers in text format back to numbers.
- Text to Columns - Select one column if data at a time. If you don't want to convert the whole column, simply select one or more cells instead, but ensure that the cells selected are in the same column, otherwise this method will not work.
The ‘Text to Columns’ button is typically used for splitting a column, but it can also be used to convert a single column of text to numbers.
On the ‘Data’ tab of the ribbon, click ‘Text to Columns’ in the ‘Data Tools’ grouping. The resulting wizard is used to split the text into different columns, but this is not required here. The text has already been converted, so you only need to click the ‘Finish’ button immediately! Numbers stored as text are now numbers.
For non-date values stored as text, you can also use the VALUE function and create a duplicate numerical range, referencing the original dataset. (The results may then be copied [Ctrl + C] and then stored as values [Alt + E + S + V] with the original dataset then removed.) A similar process may be adopted with the DATEVALUE function and date values stored as text too.
Dealing with duplicates
There are two things you can do with duplicate data; highlight them or delete them:
- Highlight duplicate data: Select the data and on the ribbon, go to Home –> Conditional Formatting –> Highlight Cells Rules –> Duplicate Values:
Specify the formatting and all the duplicate values get highlighted.
- Delete duplicate data: When you use the following feature, the duplicate data will be permanently deleted. Therefore, before you delete the duplicates, it’s a good idea to copy the original data to another worksheet or workbook so that you don’t accidentally lose any information.
Once you are satisfied you have a backup, select the range of cells that has duplicate values you want to remove. Then, click Data -> Remove Duplicates,
and then under Columns, check or uncheck the columns where you want to remove the duplicates, making sure you take note of whether your data has field headers:
Once you have completed your selection, click ‘OK’.
Removing additional spaces and non-printable characters
Extra spaces are painfully difficult to spot and may cause problems in matching data / identifying duplicates. While you may somehow spot the extra spaces between words or numbers, trailing spaces are not even visible. Fortunately, there are functions that may help us. One such function is
This function takes the cell reference or text in quotation marks (text) as the input. It removes leading and trailing spaces as well as the additional spaces between words (except single spaces).
Another useful function is
This function removes all non-printable characters from text. And I’m not talking about expletives. You can use CLEAN on text imported from other applications that contains characters that may not print with your operating system. For example, you can use CLEAN to remove some low-level computer code that is frequently positioned at the beginning and end of data files and cannot be printed.
When you get data from a database or import it from some other source, you may consider that the data should not all be stored in one cell. This can be separated into adjacent (multiple) cells. This process is known as parsing data, and may be achieved using the Text to Column functionality in Excel:
- Select the data/text you want to parse in Excel.
- Go To Data –> Text to Column on the ribbon (this opens the Text to Columns Wizard, Alt + A + E).
- Select the data type (select ‘Delimited’ if your data in not equally spaced, and is separated by characters such as comma, hyphen, semi-colon, and so on). Click ‘Next’.
- Select the delimiter (the character that separates your data). You can select a pre-defined delimiter or anything else using the ‘Other’ option.
Do you see how in the example above there is a check box for ‘Treat consecutive delimiters as one’? This allows the words to be split up consistently even if the original text includes additional, unnecessary spaces (in this illustration).
- Select the data format. Also select the destination cell. If destination cell is not selected, the current cell is overwritten.
- Enjoy the fruits of your labour:
Considering blank cells
Blank cells can cause chaos if not considered whilst cleaning your data. Blank cells may cause data to appear in incorrect columns, cause formulas to generate #VALUE! errors, and so forth. It may be preferable to fill all blank cells with zeros (0) or ‘Not Available’ or simply leave them as is. The decision must be a conscious one. If your dataset is large, a manual modification could take hours. Fortunately, it is possible to select all the blank cells on a worksheet at once:
- Select the entire data set.
- Press the F5 function key (this opens the ‘Go To’ dialog box, Ctrl + G).
- Click on the ‘Special…’ button (bottom left). This opens the ‘Go To Special’ dialog box.
- Select ‘Blanks’ and Click ‘OK’.
- This selects all the blank cells in your data set. If you want to enter 0 or Not Available in all these cells, just type it and press Ctrl + Enter (remember if you press only Enter, the value is inserted only in the active cell).
Changing text case
When you import data, often the names or titles are inconsistent. Sometimes, all the text may be in lower / upper case, or it could be a mix of both. You can easily make it all consistent by using these three functions:
- LOWER(text): Converts all text into lower case (for example, “these are my apples”).
- UPPER(text): Converts all text into upper case (for example, “THESE ARE MY APPLES”).
- PROPER(text): Converts all text into proper case (for example, “These Are My Apples”).
Often, data is extracted from multiple sources. Each source will typically have its own formatting, entirely inconsistent with all of the other data. For ease of use and data manipulation, sometimes, you simply wish to remove all data so that you are not distracted from the underlying content.
This is very easy to do in Excel. Simply select the data set (Ctrl + A will highlight all contiguous data in a range) and then go to Home –> Clear –> Clear Formats (Alt + H + F) on the ribbon. If you prefer, you can also clear everything (not recommended!), comments, hyperlinks or content:
You might also be interested in
Recognised by Microsoft as one of 104 Most Valuable Professionals (MVPs) in Excel worldwide by Microsoft, Liam has over 30 years’ experience in financial model development/auditing, valuations, M&A, strategy, training and consultancy. He has headed Ernst & Young’s modelling team in Melbourne and was an Assistant Director in their...