TL;DR - Find duplicates with GROUP BY + HAVING:
SELECT email, COUNT(*) FROM users GROUP BY email HAVING COUNT(*) > 1;— this is the fastest way to identify which values are duplicated and how many copies exist. - See the actual duplicate rows by joining the grouped result back to the original table with a self-join or subquery. - Check duplicates across multiple columns by adding all relevant columns to both the SELECT and GROUP BY clauses. - Delete duplicates while keeping one row using a DELETE with a subquery (all MySQL versions) orROW_NUMBER()with a CTE (MySQL 8.0+). - phpMyAdmin and Adminer can run all of these queries visually — paste them into the SQL tab and execute.
Table of Contents
- Introduction
- Why Duplicates Happen in MySQL
- Find Duplicates with GROUP BY and HAVING
- Find Duplicates Across Multiple Columns
- View the Actual Duplicate Rows (Self-Join)
- Delete Duplicates While Keeping One Row
- Using ROW_NUMBER to Remove Duplicates (MySQL 8.0+)
- Find Duplicates Visually with phpMyAdmin
- FAQ
- Conclusion
How to Find Duplicate Rows in MySQL
Duplicate rows are one of those problems that start small and get expensive fast. A customer gets charged twice, a report inflates revenue by 12%, an email campaign sends the same message three times to the same person. According to a 2022 Gartner study, poor data quality costs organizations an average of $12.9 million per year — and duplicate records are among the most common data quality issues. Knowing how to find duplicate rows in MySQL quickly is a fundamental skill for any developer or database administrator working with production data.
This guide covers every practical method: the GROUP BY + HAVING pattern for detection, self-joins for viewing full duplicate rows, subquery-based deletion for all MySQL versions, and the ROW_NUMBER() window function approach for MySQL 8.0+. Each section includes copy-paste SQL with a sample customers table so you can follow along.
We will use this table throughout the examples:
CREATE TABLE customers (
id INT AUTO_INCREMENT PRIMARY KEY,
email VARCHAR(255),
first_name VARCHAR(100),
last_name VARCHAR(100),
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO customers (email, first_name, last_name) VALUES
('alice@example.com', 'Alice', 'Johnson'),
('bob@example.com', 'Bob', 'Smith'),
('alice@example.com', 'Alice', 'Johnson'),
('carol@example.com', 'Carol', 'Williams'),
('bob@example.com', 'Robert', 'Smith'),
('alice@example.com', 'Alice', 'J.');
This gives us 6 rows with 3 distinct email addresses, where alice@example.com appears 3 times and bob@example.com appears twice.
Why Duplicates Happen in MySQL
Duplicates rarely appear because someone intentionally inserted the same row twice. The usual causes are:
- Missing UNIQUE constraints. If your
emailcolumn does not have a unique index, MySQL happily accepts duplicates. A Stack Overflow Developer Survey from 2023 found MySQL is the most popular database among professional developers — and many MySQL tables in production still lack proper unique constraints on columns that should be unique. - Race conditions in application code. Two concurrent requests insert the same record before either one commits. This is especially common in user registration flows without database-level uniqueness.
- Bulk imports and ETL pipelines. A CSV import runs twice, or a data sync job retries after a partial failure and re-inserts records that already succeeded.
- Application bugs. A form double-submits, a retry loop in an API client fires too aggressively, or an INSERT query sits inside an unguarded loop.
According to IBM's data quality research, roughly 1 in 5 business records contains some form of duplication. The first step is always finding which rows are duplicated before deciding how to handle them.
Find Duplicates with GROUP BY and HAVING
The most direct way to find duplicate rows in MySQL is to group by the column you suspect contains duplicates and filter for groups with more than one row:
SELECT email, COUNT(*) AS duplicate_count
FROM customers
GROUP BY email
HAVING COUNT(*) > 1;
Result:
| duplicate_count | |
|---|---|
| alice@example.com | 3 |
| bob@example.com | 2 |
This tells you which values are duplicated and how many copies exist, but it does not show you the individual rows. The HAVING clause is key — it filters after aggregation, unlike WHERE which filters before. This distinction is the #1 mistake people make when writing this query for the first time.
Performance note: on a table with 10 million rows, this query typically runs in 3-8 seconds without an index on the grouped column, and under 1 second with one. If you are running this on a large production table, add an index first:
CREATE INDEX idx_customers_email ON customers(email);
You can also sort by the count to see the worst offenders first:
SELECT email, COUNT(*) AS duplicate_count
FROM customers
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY duplicate_count DESC;
Find Duplicates Across Multiple Columns
Sometimes a single column is not enough to define a duplicate. Two customers might legitimately share a last name but not a combination of first name, last name, and email. To find duplicates across multiple columns in MySQL, add all relevant columns to both SELECT and GROUP BY:
SELECT first_name, last_name, email, COUNT(*) AS duplicate_count
FROM customers
GROUP BY first_name, last_name, email
HAVING COUNT(*) > 1;
Result:
| first_name | last_name | duplicate_count | |
|---|---|---|---|
| Alice | Johnson | alice@example.com | 2 |
Notice this returns only 2 instead of 3 for Alice — because the third alice@example.com row has a different last name (J. instead of Johnson), so it is not a duplicate on the full combination.
This multi-column approach is critical for tables without a natural unique key. A contacts table imported from a CRM might not have unique emails, but the combination of (first_name, last_name, company) effectively identifies duplicates.
For composite duplicate detection, a composite index improves performance significantly:
CREATE INDEX idx_customers_composite ON customers(first_name, last_name, email);
View the Actual Duplicate Rows (Self-Join)
The GROUP BY queries above tell you that duplicates exist, but not which specific rows are duplicated. To see every column of the actual duplicate rows — including their IDs, so you can decide which to keep — join the table back to itself using a subquery:
SELECT c.*
FROM customers c
INNER JOIN (
SELECT email
FROM customers
GROUP BY email
HAVING COUNT(*) > 1
) dupes ON c.email = dupes.email
ORDER BY c.email, c.id;
Result:
| id | first_name | last_name | created_at | |
|---|---|---|---|---|
| 1 | alice@example.com | Alice | Johnson | 2026-04-10 14:30:00 |
| 3 | alice@example.com | Alice | Johnson | 2026-04-10 14:30:01 |
| 6 | alice@example.com | Alice | J. | 2026-04-10 14:30:02 |
| 2 | bob@example.com | Bob | Smith | 2026-04-10 14:30:00 |
| 5 | bob@example.com | Robert | Smith | 2026-04-10 14:30:02 |
Now you can see that alice@example.com has three rows with slightly different names, and bob@example.com has two rows where one says "Bob" and the other says "Robert." This is the query that gives you the information you need to decide which row to keep before deleting anything.
An alternative approach using EXISTS achieves the same result and can sometimes perform better on very large tables:
SELECT c1.*
FROM customers c1
WHERE EXISTS (
SELECT 1
FROM customers c2
WHERE c1.email = c2.email
AND c1.id <> c2.id
)
ORDER BY c1.email, c1.id;
Delete Duplicates While Keeping One Row
Once you have identified the duplicates and decided which row to keep (usually the one with the lowest id, meaning the earliest insert), you can delete the extras. This method works on all MySQL versions, including 5.7 and older:
DELETE c1
FROM customers c1
INNER JOIN customers c2
WHERE c1.email = c2.email
AND c1.id > c2.id;
This keeps the row with the smallest id for each email and deletes all others. The logic: for any pair of rows with the same email, delete the one with the higher id.
Always run a SELECT first to preview what will be deleted:
SELECT c1.*
FROM customers c1
INNER JOIN customers c2
ON c1.email = c2.email
AND c1.id > c2.id;
And always back up your table before running a DELETE that affects many rows:
CREATE TABLE customers_backup AS SELECT * FROM customers;
If you need to delete duplicates across multiple columns, adjust the join condition:
DELETE c1
FROM customers c1
INNER JOIN customers c2
WHERE c1.first_name = c2.first_name
AND c1.last_name = c2.last_name
AND c1.email = c2.email
AND c1.id > c2.id;
How many rows will this affect? In our example table, the single-column delete removes 3 rows: two extra alice@example.com and one extra bob@example.com. On production tables, the number can be surprisingly large — a SaaS application with 500,000 user records and a 2% duplicate rate has 10,000 duplicate rows to clean up. Run the SELECT preview first.
Using ROW_NUMBER to Remove Duplicates (MySQL 8.0+)
MySQL 8.0 introduced window functions, including ROW_NUMBER(). This provides a cleaner and more flexible approach to duplicate removal — you can partition by any set of columns and order by any criteria to decide which row to keep.
First, identify which rows would be deleted by assigning a row number within each group of duplicates:
WITH ranked AS (
SELECT id, email, first_name, last_name,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY id ASC) AS row_num
FROM customers
)
SELECT * FROM ranked WHERE row_num > 1;
Result:
| id | first_name | last_name | row_num | |
|---|---|---|---|---|
| 3 | alice@example.com | Alice | Johnson | 2 |
| 6 | alice@example.com | Alice | J. | 3 |
| 5 | bob@example.com | Robert | Smith | 2 |
Row number 1 in each partition is the keeper. Everything with row_num > 1 is a duplicate. To delete them:
DELETE FROM customers
WHERE id IN (
SELECT id FROM (
SELECT id,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY id ASC) AS row_num
FROM customers
) ranked
WHERE row_num > 1
);
The extra subquery wrapping is necessary because MySQL does not allow you to directly delete from a table that is referenced in a subquery — the intermediate derived table solves this.
Why ROW_NUMBER is better for complex cases: you can change ORDER BY id ASC to ORDER BY created_at DESC to keep the newest row instead of the oldest, or ORDER BY LENGTH(first_name) DESC to keep the most complete record. The self-join approach from the previous section does not offer this flexibility without significantly more complex SQL.
According to DB-Engines, MySQL has held the #1 or #2 position among relational databases since the ranking began in 2012. MySQL 8.0 adoption has grown steadily — as of early 2026, MySQL 8.0+ represents approximately 78% of active MySQL installations according to Percona's annual survey, so the ROW_NUMBER() approach is available to the vast majority of MySQL users.
Find Duplicates Visually with phpMyAdmin
Not everyone wants to write SQL by hand. If you use phpMyAdmin or Adminer to manage your databases, you can find and remove duplicates through the GUI.
Running Duplicate Queries in phpMyAdmin
- Open phpMyAdmin and select your database from the left sidebar.
- Click the SQL tab at the top of the page.
- Paste any of the queries from this guide — for example, the
GROUP BY+HAVINGquery to find duplicates. - Click Go to execute. Results appear in a table below the query editor.
- To see the full duplicate rows, paste the self-join query. phpMyAdmin renders the results with sortable column headers, which makes it easy to compare duplicate rows side by side.
- Once you are confident about which rows to delete, paste the DELETE query. phpMyAdmin will show you how many rows were affected.
Using phpMyAdmin's Search Tab
phpMyAdmin also has a built-in Search feature that can help spot duplicates:
- Select your table and click the Search tab.
- Enter a value you suspect is duplicated in the relevant column field.
- Click Go — all matching rows appear, making it easy to compare them and decide which to keep.
This is more manual than the SQL approach, but it is useful for investigating individual suspected duplicates rather than scanning the entire table.
No phpMyAdmin Installed?
If you do not have phpMyAdmin or Adminer installed on your server, DBEverywhere gives you the same interface from any browser. Connect to your database, open the SQL tab, and run the queries from this guide. The free tier includes 5 sessions per month — more than enough to investigate and clean up a duplicate issue.
Connect to your database from any browser -->
FAQ
How do I prevent duplicate rows from being inserted in the first place?
Add a UNIQUE constraint or unique index to the column (or combination of columns) that should not have duplicates. For example: ALTER TABLE customers ADD UNIQUE INDEX idx_unique_email (email); — after this, any INSERT that would create a duplicate email will fail with error 1062. For bulk imports, use INSERT IGNORE (silently skips duplicates) or INSERT ... ON DUPLICATE KEY UPDATE (updates the existing row instead of inserting a new one). Database-level constraints are always more reliable than application-level checks because they protect against race conditions.
Does finding duplicates with GROUP BY work on large tables?
Yes, but performance depends on indexing. On an unindexed column with 10 million rows, the GROUP BY + HAVING query typically takes 5-10 seconds because MySQL must do a full table scan. Adding an index on the grouped column brings this down to under 1 second. For tables over 50 million rows, consider running the query during off-peak hours or on a read replica to avoid impacting production traffic. The EXPLAIN statement will show you whether MySQL is using an index or doing a full scan.
What is the difference between DELETE with a self-join and ROW_NUMBER?
Both achieve the same result — removing duplicate rows while keeping one copy. The self-join method (DELETE c1 FROM table c1 INNER JOIN table c2 WHERE c1.col = c2.col AND c1.id > c2.id) works on all MySQL versions, including 5.6 and 5.7. The ROW_NUMBER() method requires MySQL 8.0+ but offers more flexibility: you can control which row to keep by changing the ORDER BY clause inside the window function. If you need to keep the newest row, or the row with the most complete data, ROW_NUMBER() is significantly cleaner to write.
Can I find duplicates across two different tables?
Yes. Use a JOIN between the two tables on the columns you want to compare: SELECT a.* FROM table_a a INNER JOIN table_b b ON a.email = b.email; — this returns all rows from table_a that have a matching email in table_b. This is common when merging data from two sources, such as combining a legacy database with a new CRM import. Add additional join conditions for multi-column matching.
How do I count the total number of duplicate rows in a table?
Wrap the GROUP BY + HAVING query in a subquery and sum the counts: SELECT SUM(duplicate_count) - COUNT(*) AS total_extra_rows FROM (SELECT email, COUNT(*) AS duplicate_count FROM customers GROUP BY email HAVING COUNT(*) > 1) dupes; — this gives you the total number of rows that would need to be deleted to eliminate all duplicates. In our example, the result is 3 (five total duplicate rows minus two unique values that should be kept).
Conclusion
Finding and removing duplicate rows in MySQL follows a predictable pattern: detect with GROUP BY + HAVING, inspect with a self-join or subquery, and delete with either a self-join DELETE (all MySQL versions) or ROW_NUMBER() (MySQL 8.0+). The most important step is the one before the DELETE — always preview the rows you are about to remove and back up your table first.
Once you have cleaned up existing duplicates, prevent future ones with a UNIQUE constraint on the appropriate columns. A few minutes of schema design saves hours of data cleanup later.
If you need to run these queries but do not have a database tool installed, DBEverywhere lets you open phpMyAdmin or Adminer in your browser and connect to any MySQL database in seconds. No installation, no Docker, no SSH tunnel to set up.
Try DBEverywhere free — manage your database from any browser
Try DBEverywhere Free
Access your database from any browser. No installation, no Docker, no SSH tunnels.
Get Started