Save content
Have you found this content useful? Use the button above to save it to your profile.
iStock_xfgiro_exceltips

Excel tip: Use COUNTIF to manage duplicate list values

by
21st Feb 2013
Save content
Have you found this content useful? Use the button above to save it to your profile.

David H Ringstrom continues his exploration of Excel functions with an explanation of how COUNTIF can work with LARGE and SMALL functions to manage numerical lists.

My last tutorial demonstrated how the LARGE and SMALL functions can help you find the largest or smallest values in an Excel list. This follow-on piece shows how COUNTIF can help you deal with items that share the same value.

Last time around, LARGE and SMALL helped us identify the extreme values, and MATCH and INDEX functions were applied to return the item names that corresponded with those values. But if two items on a list share the same value, MATCH/INDEX will return the same name for both:

Equal values in a list can result in duplicates when MATCH/INDEX find items with the same value.

So, how can you prevent this happening and retain the integrity of your data? Read on to find out how COUNTIF can operate as a “tiebreaker” in such situations.

In the figure above, the rankings in column G were achieved by applying the following LARGE function argument, starting from cell G2, and filled down to G11: 

=LARGE(B$2:B$11,E2)

In cell F2, this INDEX formula returned the corresponding names: 

=INDEX(A$2:A$11,MATCH(G2,B$2:B$11,0))

This rankings illustrated above display bananas twice in column F. This is because kiwis had an identical number of sales, 637. The MATCH function in cell F4 searches for the number 637, based on the formula in cell G4.

The formul encountered a match in cell B5, so stopped looking and returned 5 as the row amount for INDEX to retrieve the item name. The LARGE function also returns 637 in cell B6, but the MATCH function within cell F5 also returns 5, because MATCH stops looking at a list once it finds a match.

The way around this limitation is to find a way to make the figures in cells B2 through B11 unique, without materially affecting the amounts.

The COUNTIF function can help us do this, by applying two arguments:

  • range - This is a range of cells in which we wish to look for a specified value.
  • criteria - This represents the value that we're seeking.

COUNTIF can be used to add a multiple of .001 each time it identifies the same value in a list, using this formula:

=COUNTIF($B$2:B2,B2)

The formula has been pasted into cell C2 in the example below and then filled down to cell C11. As we can see, the formula returns the number of times that each value appears in the list. Note how absolute references have been applied by putting $ signs in front of the cell address elements. In order to create an expanding range, the starting point can be anchored in cell B2 using “$B$2”. Don't use any dollars signs around the second B2, because we want this to become B3, B4, B5, and so on as we copy the formula down the column.

COUNTIF determins the number of times a number appears within a list.

Now the dupes have been identified, the next step is to add a tiny increment to the duplicate amounts to make each be unique. Do this by modifying the previous formula as follows:

 =(COUNTIF($B$2:B2,B2)-1)*0.0001+B2

In this case, COUNTIF determines how many times the value appears in the list and subtracts 1 from it. If the value appears on the list only once, there's no reason to change the original value. If the amount appears more than once, the formula will add .0001 to it based on the number of times that it has appeared previously in the list. As shown here, kiwi sales in cell C6 become 637.0001. If strawberry sales were also 637, its amount would become 637.0002: 

This revised COUNTIF formula adds a tiny increment to duplicate values

Now that we have cooked up the incrementing COUNTIF formula, it’s time to go back and clean up the original fruit sales ranking by de-duping the extra banana reference.

Having filled the COUNTIF formula down column C, copy cells C2 through C11 to the clipboard. Then then right-click cell B2 and chose Paste Special, and double-click Values, as shown below. (If you're using Excel 2010 or later, you can click the Paste Values icon. Within the Paste Special dialog box, double-clicking on Values eliminates the need to click the OK button. You can use this double-click trick in most dialog boxes when you're making a single selection).

Use Paste Special Values to replace the original values.

Once you've pasted the data, press Escape to clear with Windows clipboard.

And here is the resulting chart. Instead of banans occupying third and fourth place, COUNTIF has return both types of fruit with equal sales. I hope the trick will help you deal with similar “ties” you encounter in your Excel spreadsheets.Duplicate wording in the list has been removed.

"Either you work Excel, or it works you!" says David Ringstrom CPA, the head of Atlanta-based software and database consultancy Accounting Advisors. He presents Excel training webcasts for CPE Link and contributes articles on Excel to AccountingWEB and Microsoft Professional Accountant's Network newsletter. He can be reached by email at david[AT]acctadv.com. You can find further ExcelZone tutorials from David H Ringstrom here.

Replies (7)

Please login or register to join the discussion.

By Dick Lloyd
22nd Feb 2013 13:29

Ranking

Would it not be easier to use the ranking function, then if two values in the list are, in fact, the same, they will have the same ranking, which is arguably better than ranking Kiwi 3rd and Bananas 4th merely bacause Kiwis appear in your list after Bananas.

The formula that I would enter in cell E2 on your example would be:

=Rank (B2,B$2:B$11,0), then copy the formula down column E.

I would then simply sort the list by reference to the rank value in column E.

 

However, I can see that COUNTIF would be useful to use if you wanted to identify genuine duplicated data so it can be removed.

 

Thanks (0)
Replying to mrme89:
avatar
By duncanphilpstate
22nd Feb 2013 15:32

Creates same problem...

I think you'll find that RANK will also place Kiwis and Bananas with equal rank and so INDEX will only retrieve the first it finds? Which is very similar to the LARGE/SMALL issue under discussion. Whatever the start point, you need to prevent duplicates in what a programmer would call the key value.

Whether you rank Bananas ahead of Kiwis or vice versa is then personal preference and arbitray but they do have to be different.

Thanks (0)
Replying to mrme89:
avatar
By dringstrom
09th Mar 2013 12:25

Good thought on RANKING

Dick, 

RANKING is a great alternative, and I appreciate the enrichment your comment adds to my article. Thanks for reading!

David

 

Thanks (0)
avatar
By duncanphilpstate
22nd Feb 2013 15:37

Or use ROW()

Rather than COUNTIF as illustrated I prefer to use ROW to achive this.

Each cell in your column C would be (B2 + ROW() / 1000000) which has the effect of adding its row number to each value but well to the right of the decimal point and immaterial.

If you've got a long list, I've found that the ROW method executes faster than a whole chain of COUNTIFs, as ithe calculation only needs to refer to current cell information and not to the entire list above it.

Not to say that using COUNTIF in this way is not a useful technique of course.

Thanks (0)
Replying to Jackie0802:
avatar
By dringstrom
09th Mar 2013 12:28

Clever!

Duncan,

Great point on ROW(). I see users often learn a few functions in Excel and try to apply them to pretty much every task. I sometimes fall into that trap myself. I often use ROW for other purposes, but in this instance it is more effective than COUNTIF when working with large data sets. Thanks for reading, and for enriching my article!

David

Thanks (0)
avatar
By AMHCPA
22nd Feb 2013 19:11

Error in following

In repeating your example it was found that COUNTIF() displayed 2 beside BOTH Bananas and Kiwi. Is there a option I need to set to use this as you have shown? Thank you duncanphilpstate for the idea of using +ROW()/1000000 to increment the individual data.

Thanks (0)
Replying to lionofludesch:
avatar
By dringstrom
09th Mar 2013 12:36

AMHCPA,

The bit of nuance is that you must anchor the first range in COUNTIF by way of putting a $ in front the row number, i.e. =COUNTIF($B$2:B2,B2). The $ in front of B is extraneous in this context because we're not copying the formula sideways, so =COUNTIF(B$2:B2,B2) would work just as well. By anchoring the formula, COUNTIF looks at an ever expanding range, and that's what makes it work as a numbering scheme.

Thank you for reading my article, I hope you found it helpful!

David

Thanks (0)