How to Delete Duplicate Rows in SQL: Best Queries and Techniques

Apr 15, 2025 By Alison Perry

Duplicate rows in a database can create serious issues, including inconsistent reporting, skewed analytics, and redundant transactions. In an ideal world, database tables would always have proper keys and constraints to avoid this. However, due to import errors, missing constraints, or manual entry, duplicates can and do occur.

Thankfully, SQL provides a number of effective methods for detecting and deleting duplicate rows from your tables. Whether you're working with a table that has unique constraints or one that doesn’t, this post will walk you through several reliable approaches to remove duplicate records using SQL.

Why Duplicate Records Are a Problem

Before diving into the solutions, it’s important to understand the consequences of duplicates:

Incorrect reporting: Data summaries may double-count values.
Redundant processing: Orders or transactions might be handled more than once.
Wasted storage: Duplicate data consumes unnecessary space.
Broken integrity: Inconsistent records break relationships across tables.

It’s always recommended to enforce primary keys or unique indexes to prevent duplicates. But when prevention isn’t in place, these SQL strategies will help clean your data.

How to Identify Duplicate Rows Before Deleting Them

Before jumping into deletion, it's crucial to accurately identify duplicate rows in your table. Deleting the wrong records can result in data loss, so verification is a critical step.

Common Technique to Identify Duplicates:

SELECT column1, column2, COUNT(*) AS duplicate_count
FROM your_table
GROUP BY column1, column2
HAVING COUNT(*) > 1;

This query returns one row for each combination of column1 and column2 that appears more than once, together with the number of copies, so you can see which records are duplicated and how often they occur.
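The grouped query shows one line per duplicated combination. To inspect every individual copy before deleting anything, a window COUNT(*) can be used instead. The following is a minimal sketch, assuming Python's built-in sqlite3 module with SQLite 3.25 or later (for window function support); the id column and the sample rows are hypothetical and only there to make the output readable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE your_table (id INTEGER, column1 TEXT, column2 TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [(1, "a", "x"), (2, "a", "x"), (3, "b", "y"), (4, "a", "x")],  # hypothetical rows
)

# The window COUNT(*) keeps every row and attaches the size of its group,
# so each copy of a duplicated row is listed, not just one line per group.
duplicate_rows = conn.execute(
    """
    SELECT id, column1, column2, copies
    FROM (
        SELECT id, column1, column2,
               COUNT(*) OVER (PARTITION BY column1, column2) AS copies
        FROM your_table
    )
    WHERE copies > 1
    ORDER BY id
    """
).fetchall()

for row in duplicate_rows:
    print(row)
```

Rows 1, 2, and 4 are listed with a copy count of 3; row 3 is unique and is left out.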

Approach 1: Delete Duplicates Using GROUP BY and HAVING

This method works when you want to identify duplicates based on a subset of columns (e.g., Name and Marks), and then remove extra occurrences.

Example Table:

CREATE TABLE Students (
    RegNo INT,
    Name VARCHAR(50),
    Marks INT
);

INSERT INTO Students VALUES
    (1, 'Tom', 77),
    (2, 'Lucy', 78),
    (3, 'Frank', 89),
    (4, 'Jane', 98),
    (5, 'Robert', 78),
    (3, 'Frank', 89),
    (5, 'Robert', 78),
    (4, 'Jane', 98);

Identifying duplicates:

SELECT Name, Marks, COUNT(*) AS count
FROM Students
GROUP BY Name, Marks
HAVING COUNT(*) > 1;

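For the sample data, this returns three groups (Frank/89, Jane/98, and Robert/78), each with a count of 2. The query only identifies the duplicates; a DELETE statement cannot take GROUP BY or HAVING directly, so removing the extra rows requires a temporary table, a subquery, or a CTE (see next approach).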
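One way to turn the grouped query into a delete is to keep a single row per group and remove everything else through a subquery. The sketch below assumes SQLite (run through Python's built-in sqlite3 module) and relies on SQLite's implicit rowid column to tell otherwise identical rows apart; other engines need their own row identifier, such as ctid in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (RegNo INT, Name VARCHAR(50), Marks INT)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)", [
    (1, "Tom", 77), (2, "Lucy", 78), (3, "Frank", 89), (4, "Jane", 98),
    (5, "Robert", 78), (3, "Frank", 89), (5, "Robert", 78), (4, "Jane", 98),
])

# Keep the lowest rowid in each (Name, Marks) group and delete every other row.
# rowid is SQLite-specific and stands in for a missing primary key.
conn.execute("""
    DELETE FROM Students
    WHERE rowid NOT IN (
        SELECT MIN(rowid)
        FROM Students
        GROUP BY Name, Marks
    )
""")

remaining = conn.execute(
    "SELECT RegNo, Name, Marks FROM Students ORDER BY RegNo"
).fetchall()
print(remaining)
```

Grouping on Name and Marks alone treats two different students with the same name and marks as duplicates, so add RegNo to the GROUP BY if that distinction matters.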

Approach 2: Delete Duplicates Using ROW_NUMBER() with CTE

This is the most flexible method, especially for tables without a primary key or unique index.

Syntax:

WITH RankedRows AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY RegNo, Name, Marks ORDER BY RegNo) AS row_num
    FROM Students
)
DELETE FROM RankedRows
WHERE row_num > 1;

Explanation:

ROW_NUMBER() assigns a unique sequence to each row in a group of duplicates.
We keep the row where row_num = 1 and delete the rest.
PARTITION BY tells SQL how to group potential duplicates.
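Deleting through the CTE name, as written here, is SQL Server syntax. PostgreSQL and MySQL reject a DELETE against a CTE, so there the CTE is used to select row identifiers and the DELETE targets the base table.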
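On engines that do not accept a DELETE against the CTE name, the same ROW_NUMBER() logic can select row identifiers and the DELETE can target the base table. The sketch below is one such variant, assuming SQLite 3.25 or later through Python's built-in sqlite3 module; rowid is SQLite's implicit row identifier and rid is a hypothetical alias for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (RegNo INT, Name VARCHAR(50), Marks INT)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)", [
    (1, "Tom", 77), (2, "Lucy", 78), (3, "Frank", 89), (4, "Jane", 98),
    (5, "Robert", 78), (3, "Frank", 89), (5, "Robert", 78), (4, "Jane", 98),
])
rows_before = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]

# Number the rows inside each group of identical values, then delete every
# row numbered 2 or higher by its rowid (SQLite-specific identifier).
conn.execute("""
    WITH RankedRows AS (
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (
                   PARTITION BY RegNo, Name, Marks
                   ORDER BY rowid
               ) AS row_num
        FROM Students
    )
    DELETE FROM Students
    WHERE rowid IN (SELECT rid FROM RankedRows WHERE row_num > 1)
""")

rows_after = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
remaining = conn.execute(
    "SELECT RegNo, Name, Marks FROM Students ORDER BY RegNo"
).fetchall()
print(rows_before, rows_after)
```

The table goes from eight rows to five, with one row left per distinct (RegNo, Name, Marks) combination.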

Approach 3: Use RANK() to Identify and Remove Duplicates

The RANK() function can also flag duplicates. It works like ROW_NUMBER(), except that rows with equal ORDER BY values receive the same rank.

Example Table:

CREATE TABLE Animals (
    sno INT,
    Animal_id INT,
    Animal_name VARCHAR(50)
);

Assigning Ranks:

-- Deleting through the CTE name is SQL Server syntax.
WITH RankedAnimals AS (
    SELECT *,
           RANK() OVER (PARTITION BY Animal_id, Animal_name ORDER BY sno DESC) AS rk
    FROM Animals
)
DELETE FROM RankedAnimals WHERE rk > 1;

When to use this:

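When you want to retain the latest record based on a column like sno or created_at (ORDER BY ... DESC gives the latest row rank 1).
When rows that tie on the ordering column should all survive: tied rows share the same rank, so exact copies with the same sno all get rank 1 and rk > 1 deletes none of them. To remove exact copies, use ROW_NUMBER() instead.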
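The difference between RANK() and ROW_NUMBER() on tied rows is easiest to see side by side. The following is a small illustration, assuming SQLite 3.25 or later through Python's built-in sqlite3 module; the article gives no rows for the Animals table, so the sample rows here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Animals (sno INT, Animal_id INT, Animal_name VARCHAR(50))")
conn.executemany("INSERT INTO Animals VALUES (?, ?, ?)", [
    (1, 101, "Lion"),
    (2, 101, "Lion"),   # later copy with its own sno: RANK() can order it
    (3, 102, "Tiger"),
    (3, 102, "Tiger"),  # exact copy with the same sno: RANK() ties
])

# Compute both functions over the same window to compare them per row.
ranked = conn.execute("""
    SELECT sno, Animal_name,
           RANK()       OVER (PARTITION BY Animal_id, Animal_name ORDER BY sno DESC) AS rk,
           ROW_NUMBER() OVER (PARTITION BY Animal_id, Animal_name ORDER BY sno DESC) AS rn
    FROM Animals
    ORDER BY Animal_id, rn
""").fetchall()

for row in ranked:
    print(row)
```

Both Tiger rows receive rk = 1, so a filter of rk > 1 would keep both copies, while rn > 1 would remove one of them. For the Lion rows the two functions agree, and the row with sno = 2 is the one kept.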

Approach 4: Deleting Duplicates Using Self-Joins

Self-joins are useful when window functions are unavailable, for example in MySQL versions before 8.0.

Delete Query:

-- DELETE <alias> FROM ... JOIN is SQL Server / MySQL syntax.
DELETE A
FROM Students A
INNER JOIN Students B
    ON A.Name = B.Name AND A.Marks = B.Marks
WHERE A.RegNo > B.RegNo;

Logic:

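It compares the table to itself, matching rows on Name and Marks.
Deletes the record with the higher RegNo when duplicates are found, keeping the lowest RegNo in each group.
It only works when the copies differ in RegNo. In the sample Students data above, the duplicates share the same RegNo, so A.RegNo > B.RegNo is never true and nothing is deleted; use ROW_NUMBER() for exact copies.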
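The same comparison can be written as a correlated EXISTS subquery on engines that lack the multi-table DELETE form. The sketch below assumes SQLite through Python's built-in sqlite3 module and uses hypothetical rows chosen to show both outcomes: a duplicate entered under a different RegNo, and an exact copy that shares its RegNo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (RegNo INT, Name VARCHAR(50), Marks INT)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)", [
    (1, "Tom", 77),
    (2, "Lucy", 78),
    (6, "Lucy", 78),    # same student under a higher RegNo: gets deleted
    (3, "Frank", 89),
    (3, "Frank", 89),   # exact copy with the same RegNo: cannot be separated
])

# Delete a row when another row with the same Name and Marks
# has a strictly lower RegNo (self-join expressed as EXISTS).
conn.execute("""
    DELETE FROM Students
    WHERE EXISTS (
        SELECT 1
        FROM Students AS B
        WHERE B.Name  = Students.Name
          AND B.Marks = Students.Marks
          AND B.RegNo < Students.RegNo
    )
""")

remaining = conn.execute(
    "SELECT RegNo, Name, Marks FROM Students ORDER BY RegNo"
).fetchall()
print(remaining)
```

The row with RegNo 6 is removed, but both Frank rows remain because neither has a lower RegNo than the other.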

Approach 5: Remove Duplicates Using Common Table Expressions (CTE)

A CTE defines a named, temporary result set that exists only for the duration of one statement. Combined with ROW_NUMBER(), it keeps the delete readable and modular.

Create Table:

CREATE TABLE Employ_DB (
    emp_no INT,
    emp_name VARCHAR(50),
    emp_address VARCHAR(100),
    emp_eoj DATE
);

INSERT INTO Employ_DB VALUES
    (11, 'Mohith', 'Tokyo', '2000-05-12'),
    (12, 'Sana', 'Delhi', '2001-08-22'),
    (11, 'Mohith', 'Tokyo', '2000-05-12');

Using ROW_NUMBER in CTE:

WITH DuplicateEmployees AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY emp_no, emp_name ORDER BY emp_eoj) AS rn
    FROM Employ_DB
)
DELETE FROM DuplicateEmployees
WHERE rn > 1;

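This is the same pattern as Approach 2, including the SQL Server-style delete through the CTE name, applied to an employee table: rows count as duplicates when emp_no and emp_name match, and the copy with the earliest emp_eoj is kept.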
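Because a delete on business data is hard to undo, it can help to run it inside a transaction and commit only if the resulting row count is the one expected. The following is a minimal sketch of that idea, assuming SQLite 3.25 or later through Python's built-in sqlite3 module; the rowid-based delete and the expected_rows check are illustrative choices, not part of the original query.

```python
import sqlite3

# isolation_level=None lets the script issue BEGIN/COMMIT/ROLLBACK itself.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("""
    CREATE TABLE Employ_DB (
        emp_no INT, emp_name VARCHAR(50), emp_address VARCHAR(100), emp_eoj DATE
    )
""")
conn.executemany("INSERT INTO Employ_DB VALUES (?, ?, ?, ?)", [
    (11, "Mohith", "Tokyo", "2000-05-12"),
    (12, "Sana", "Delhi", "2001-08-22"),
    (11, "Mohith", "Tokyo", "2000-05-12"),
])

# Number of rows that should survive: one per distinct (emp_no, emp_name).
expected_rows = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT emp_no, emp_name FROM Employ_DB)"
).fetchone()[0]

conn.execute("BEGIN")
conn.execute("""
    WITH DuplicateEmployees AS (
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (PARTITION BY emp_no, emp_name ORDER BY emp_eoj) AS rn
        FROM Employ_DB
    )
    DELETE FROM Employ_DB
    WHERE rowid IN (SELECT rid FROM DuplicateEmployees WHERE rn > 1)
""")
rows_after = conn.execute("SELECT COUNT(*) FROM Employ_DB").fetchone()[0]

# Commit only when the delete left exactly the expected number of rows.
committed = rows_after == expected_rows
conn.execute("COMMIT" if committed else "ROLLBACK")

final_rows = conn.execute(
    "SELECT emp_no, emp_name FROM Employ_DB ORDER BY emp_no"
).fetchall()
print(committed, final_rows)
```

If the count check fails, the ROLLBACK restores the table to its state before the delete.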

Approach 6: SSIS (SQL Server Integration Services) Method

If you're using SQL Server in an enterprise setting, SSIS provides visual tools to remove duplicates.

Steps:

Use a Sort transformation to sort by key columns.
Enable “Remove rows with duplicate sort values”.
Load cleaned data into a new destination table.

SSIS is powerful for ETL (Extract, Transform, Load) workflows and suits batch deduplication tasks.
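Outside SSIS, the same sort, de-duplicate, and load sequence can be approximated in plain SQL by copying distinct rows into a new table. This is only an analogue of the steps above, not what SSIS runs internally; it assumes SQLite through Python's built-in sqlite3 module, and Students_clean is a hypothetical destination table name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (RegNo INT, Name VARCHAR(50), Marks INT)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)", [
    (1, "Tom", 77), (2, "Lucy", 78), (3, "Frank", 89), (4, "Jane", 98),
    (5, "Robert", 78), (3, "Frank", 89), (5, "Robert", 78), (4, "Jane", 98),
])

# Destination table for the cleaned data (hypothetical name).
conn.execute("CREATE TABLE Students_clean (RegNo INT, Name VARCHAR(50), Marks INT)")

# Sort by the key columns, drop rows with duplicate key values, load the result.
conn.execute("""
    INSERT INTO Students_clean (RegNo, Name, Marks)
    SELECT DISTINCT RegNo, Name, Marks
    FROM Students
    ORDER BY RegNo, Name, Marks
""")

source_count = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
clean_rows = conn.execute(
    "SELECT RegNo, Name, Marks FROM Students_clean ORDER BY RegNo"
).fetchall()
print(source_count, clean_rows)
```

The source table is left untouched, which makes this approach easy to verify before swapping the cleaned table into place.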

Preventing Future Duplicates

While it's important to know how to delete duplicates, it's even more critical to prevent them from occurring in the first place.

Best Practices:

Define PRIMARY KEY and UNIQUE constraints to enforce data uniqueness.
Normalize your database schema to reduce redundancy.
Use NOT EXISTS or MERGE logic during inserts to avoid inserting existing records.
Validate and sanitize input data before insertion.
Monitor and log batch processes to detect anomalies early.

Prevention requires thoughtful design and validation rules, but it saves time and effort in the long run.
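Two of these practices can be sketched together: a PRIMARY KEY that rejects a repeated key outright, and a NOT EXISTS guard that skips the insert quietly instead of raising an error. The example assumes SQLite through Python's built-in sqlite3 module; insert_if_absent is a hypothetical helper and the row for RegNo 6 is made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Students (
        RegNo INT PRIMARY KEY,   -- enforces one row per RegNo
        Name  VARCHAR(50),
        Marks INT
    )
""")
conn.execute("INSERT INTO Students VALUES (3, 'Frank', 89)")

# 1) The constraint rejects a second row with the same key.
try:
    conn.execute("INSERT INTO Students VALUES (3, 'Frank', 89)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True


# 2) A NOT EXISTS guard inserts only when the key is not present yet.
def insert_if_absent(reg_no, name, marks):
    conn.execute(
        """
        INSERT INTO Students (RegNo, Name, Marks)
        SELECT ?, ?, ?
        WHERE NOT EXISTS (SELECT 1 FROM Students WHERE RegNo = ?)
        """,
        (reg_no, name, marks, reg_no),
    )


insert_if_absent(3, "Frank", 89)  # already present: silently skipped
insert_if_absent(6, "Maya", 91)   # new key: inserted

rows = conn.execute(
    "SELECT RegNo, Name, Marks FROM Students ORDER BY RegNo"
).fetchall()
print(rejected, rows)
```

The guard alone does not protect against two sessions inserting at the same time, so it works best alongside the constraint rather than instead of it.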

Conclusion

Duplicate data is a silent threat to your database’s integrity. It causes reporting errors, inefficiencies, and inconsistent behavior across applications. Thankfully, SQL provides powerful tools like GROUP BY, ROW_NUMBER(), RANK(), and even self-joins to handle duplicates efficiently.

Whether you're working on a small side project or managing enterprise-level databases, understanding how to remove duplicate rows in SQL is a must-have skill. The method you choose depends on your database structure, available functions, and whether or not your table has a primary key.
