First, let’s look at the data set we want to deduplicate. It is comprised of sales for agents in specific regions.
Before we start removing duplicates, we must ask ourselves, “What constitutes a duplicate?”
- Should a row of the table be removed if the [Region] on one row matches the [Region] of another row?
- Should a row of the table be removed if the [Sales Agent] on one row matches the [Sales Agent] of another row?
- Should the entire row match in all aspects to another row to be a candidate for removal?
- Should any combination of the above be required for removal?
We will go with the requirement that the entire row of the table must be the same as another row in the same table to be removed.
NOTE: You need to decide whether to retain a copy of the original data set or replace it with the deduplicated result.
Method #1: Excel’s Remove Duplicates Feature
This method of duplicate data removal works in all versions of Excel.
Before invoking the Remove Duplicates feature, let’s make a copy of the data set by:
- clicking in the data and pressing <CTRL-A> two times,
- pressing <CTRL-C> to copy the selected data into memory,
- clicking in a new location on the sheet and pressing <CTRL-V> to paste the copy of the data into the new location.
To remove the duplicate rows from the copy of the data set:
- Click in the newly copied data set.
- Select Data (tab) -> Data Tools (group) -> Remove Duplicates.
- In the Remove Duplicates dialog box, ensure all the listed column headings and the option labeled “My data has headers” are checked and press OK.
The system reports back how many duplicate values were removed as well as how many unique values remain.
Removing duplicate values based on a single column
If you wanted to remove all duplicated rows of the table based solely on the [Sales Agent] column, you would follow the above process but only check the box for [Sales Agent] in the Remove Duplicates dialog box.
This would result in the following “before and after” snapshot of the data.
NOTE: Only the row for the first encountered [Sales Agent] name is retained. All rows that follow with the same [Sales Agent] name are removed.
PROS:
This method is simple to implement and available to any Excel user, regardless of version. It’s the quickest way to remove duplicate values ‘in situ’ when you don’t need to keep the original data set. If you need to keep the original as is, the other two methods might be more efficient (they skip the extra step of copying the data).
CONS:
If new records are created in the original table that duplicate existing values, the deduplicated copy will no longer accurately reflect the source. The “old” list of unique values would need to be deleted and the Remove Duplicates process would need to be re-executed.
Finding Duplicates (Before Removal)
If your data keeps changing and there’s a risk that values will be repeated, you may want to include visual indication that duplicates are present. There are several methods of finding duplicates in Excel. The simplest one is right there on the Home tab – conditional formatting.
Select the range (avoid selecting entire columns, e.g., A:A). Go to Home (tab) -> Styles (group) -> Conditional Formatting -> Highlight Cells Rules -> Duplicate Values.
Select the style you want and click OK to apply.
Note that the duplicate check occurs on column-by-column basis rather than row by row. If you want to compare row to row, you will need to modify your approach. For example, concatenate the values to check the whole row in a single column. (You can hide the concatenated values with Custom Number Format: ;;;).
By conditionally highlighting duplicate values, you get a signal that you need to run Remove Duplicates again – or use one of the dynamic methods described below.
Featured Course
Excel Essentials for the Real World
Method #2 – How to Remove Duplicates in Excel Using the UNIQUE Function
This method of duplicate data removal works for subscribers of Office 365 and users of Excel for the Web and Excel 2021.
The UNIQUE function has been introduced in 2021 along with several other powerful functions that can manipulate arrays. Its purpose is pretty self-explanatory – it returns unique values.
There is one (optional) step of preparation before we put it to use: make a copy of our header row from the original table of data. You can skip this if you just need the unique values and don’t intend to present the deduplicated data set.
Select the cell directly below the [Sales Agent] heading (in this case, cell E2) and enter the following formula:
=UNIQUE(A2:C20)
If you have converted your data set to an official Excel Table, you can use the table’s name in place of the cell range reference. This is easier to understand and easier to write.
=UNIQUE(TSales)
The UNIQUE function can be extremely useful when creating dependent dropdown lists. Combine it with the even newer Excel functions like TOCOL/TOROW to deduplicate values across rows as well as columns at the same time.
CONS:
The downside of this method is that the number formats (see the [Sales] column) do not carry forward to the result. You need to format the cells to match if you need to retain that comma style. Also, UNIQUE, like other dynamic array formulas, cannot be used in official Excel tables. As you saw, it happily accepts table references as arguments (input), but you can’t output its results in a table.
PROS:
The UNIQUE function, as opposed to the Remove Duplicates feature, is dynamic. If new records are created that duplicate existing records, the duplicates will be automatically removed – as long as you are using an official Excel table as your source. If you’re using a range of cells, you may have to manually adjust the range in the formula.
Featured Course
Master NEW Excel Functions in Office 365 & Office 2021
Method #3 – Remove Duplicate Values with Power Query
This method of duplicate data removal works in all versions of Excel from version 2013 onward.
Power Query is a data transformation tool available in Excel that can perform simple to highly complex data cleansing procedures on “dirty” data.
To use Power Query to remove the duplicates in the table, perform the following steps:
- Click in the data to be deduplicated.
- Select Data (tab) -> Get & Transform (group) -> From Table/Range.
This will load the table into Power Query.
- Click any of the column heading and press <CTRL-A> to select all columns.
- Select Home (tab) -> Reduce Rows (group) -> Remove Rows -> Remove Duplicates.
You are left with only the unique rows.
- Load the results of the query back into the Excel spreadsheet by clicking the lower part of the Close & Load button (far left of Home ribbon) and selecting “Close & Load To…”.
- In the Import Data dialog box, select the option for Table and Existing Worksheet. Select a cell beside the original data (e.g., cell E1 of the Data sheet) and click OK.
CONS:
This method is semi-dynamic, meaning you must refresh the query’s output to see the updated results.
This can be done by right-clicking on the query output and selecting Refresh or by clicking the Refresh All button on the Data tab.
PROS:
Data can be set to refresh automatically when one of 3 events occur:
- The user opens the file (see Query Properties image below; you access Query Properties by expanding the Refresh All options and selecting Connection Properties (see above) or right-clicking on the query name in the Queries & Connections pane and selecting Properties).
- A timer is established to refresh the results every N-minutes (see Query Properties image below).
- Data is changed in the source table (initiated via an event-driven macro using VBA).
Featured Course
Master Excel Power Query – Beginner to Pro
Did you catch the “mistake”?
Look closely at the Power Query output.
Notice that “Stevie Bridge” in “Asia” with “4378” in [Sales] is listed twice.
The reason this was “missed” is because Power Query is case-sensitive when it comes to analyzing data.
Both the Remove Duplicates feature and the UNIQUE function are case-insensitive; they don’t discriminate between differing cases. Power Query will treat these two records as separate and distinct.
Fixing the Problem
To force Power Query to remove the duplicate value it “missed”, we must add a step to the query that will standardize the casing of the agent names.
- In the Queries & Connections pane, either double-click the “TSales” query or right-click the “TSales” query and select Edit. This will reopen the Power Query Editor.
- In the Query Settings pane (on the right) click the step labeled “Changed Type”. If you have more steps in your query, just remember that it should be a step before the “Removed Duplicates” step.
- Select the heading of the [Sales Agent] column.
- Capitalize the first letter of each word by selecting Transform (tab) -> Text Column (group) -> Format -> Capitalize Each Word.
NOTE: You can also do this by right-clicking the column heading and selecting Transform -> Capitalize Each Word.
- In the Insert Step notification, click Insert to add the new step to the query.
- Save the updated query back to the original output table by clicking Home (tab) -> Close & Load.
We now have only one “Stevie Bridge” in “Asia” with “4378” in [Sales]. Data successfully deduplicated.
Final Thoughts
Next time you’re faced with the dilemma how to remove duplicates in Excel, you’ll be more than up to the task. Consider your specific scenario and choose the method that will work best. Avoid any traps by understanding the limitations of each approach and say goodbye to messy data.
Practice Workbook
Feel free to Download the Workbook HERE.
Featured Course
Black Belt Excel Package
Leila Gharani
I'm a 6x Microsoft MVP with over 15 years of experience implementing and professionals on Management Information Systems of different sizes and nature.
My background is Masters in Economics, Economist, Consultant, Oracle HFM Accounting Systems Expert, SAP BW Project Manager. My passion is teaching, experimenting and sharing. I am also addicted to learning and enjoy taking online courses on a variety of topics.