Discover how to find and delete duplicate rows in SQL using CTE, ROW_NUMBER, GROUP BY, and other efficient techniques.

Apr 15, 2025 By Alison Perry

Duplicate rows in a database can create serious issues—whether it's inconsistent reporting, skewed analytics, or redundant transactions. In an ideal world, database tables would always have proper keys and constraints to avoid this. However, due to import errors, missing constraints, or manual entry, duplicates can and do occur.

Thankfully, SQL provides a number of effective methods for detecting and deleting duplicate rows from your tables. Whether your table has a primary key or lacks one, this post walks through several reliable approaches to removing duplicate records using SQL.

Why Duplicate Records Are a Problem

Before diving into the solutions, it’s important to understand the consequences of duplicates:

  • Incorrect reporting: Data summaries may double-count values.
  • Redundant processing: Orders or transactions might be handled more than once.
  • Wasted storage: Duplicate data consumes unnecessary space.
  • Broken integrity: Inconsistent records break relationships across tables.

It’s always recommended to enforce primary keys or unique indexes to prevent duplicates. But when prevention isn’t in place, these SQL strategies will help clean your data.

How to Identify Duplicate Rows Before Deleting Them

Before jumping into deletion, it's crucial to accurately identify duplicate rows in your table. Deleting the wrong records can result in data loss, so verification is a critical step.

Common Technique to Identify Duplicates:

SELECT column1, column2, COUNT(*) AS duplicate_count
FROM your_table
GROUP BY column1, column2
HAVING COUNT(*) > 1;

This query returns one row for each combination of column1 and column2 that appears more than once, along with the number of times it occurs.
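The grouped query shows only the duplicated key values. To inspect the complete rows behind them, one option is a window count, sketched below under the assumption that your database supports window functions (PostgreSQL, SQL Server, MySQL 8.0+, SQLite 3.25+). The names your_table, column1, and column2 are the same placeholders used above.

```sql
-- List every full row whose (column1, column2) pair occurs more than once.
SELECT *
FROM (
    SELECT t.*,
           COUNT(*) OVER (PARTITION BY column1, column2) AS duplicate_count
    FROM your_table AS t
) AS counted
WHERE duplicate_count > 1
ORDER BY column1, column2;
```

Reviewing this output first makes it easier to confirm which columns really define a duplicate before any row is deleted.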

Approach 1: Delete Duplicates Using GROUP BY and HAVING

This method works when you want to identify duplicates based on a subset of columns (e.g., name and score), and then remove extra occurrences.

Example Table:

CREATE TABLE Students (
    RegNo INT,
    Name VARCHAR(50),
    Marks INT
);

INSERT INTO Students VALUES
    (1, 'Tom', 77),
    (2, 'Lucy', 78),
    (3, 'Frank', 89),
    (4, 'Jane', 98),
    (5, 'Robert', 78),
    (3, 'Frank', 89),
    (5, 'Robert', 78),
    (4, 'Jane', 98);

Identifying duplicates:

SELECT Name, Marks, COUNT(*) AS count
FROM Students
GROUP BY Name, Marks
HAVING COUNT(*) > 1;

On the sample data, this returns three groups (Frank, Jane, and Robert), each with a count of 2. GROUP BY only reports the duplicated groups; it cannot be combined with DELETE directly, so removing the extra rows requires a temporary table or a CTE (see the next approach).
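One way to turn the grouped result into an actual cleanup is to rebuild the table from its distinct rows. The sketch below assumes PostgreSQL, MySQL, or SQLite, where CREATE TABLE ... AS SELECT is available (SQL Server uses SELECT ... INTO instead). Students_dedup is a hypothetical staging table name.

```sql
-- 1. Copy one row per (RegNo, Name, Marks) group into a staging table.
CREATE TABLE Students_dedup AS
SELECT RegNo, Name, Marks
FROM Students
GROUP BY RegNo, Name, Marks;

-- 2. Empty the original table and reload it from the staging copy.
DELETE FROM Students;

INSERT INTO Students (RegNo, Name, Marks)
SELECT RegNo, Name, Marks
FROM Students_dedup;

-- 3. Drop the staging table once the reload has been verified.
DROP TABLE Students_dedup;
```

With the sample data, Students ends up with five rows, one per student. Because step 2 empties the table, it is worth checking the row count of Students_dedup before running it.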

Approach 2: Delete Duplicates Using ROW_NUMBER() with CTE

This is the most flexible method, especially for tables without a primary key or unique index.

Syntax:

WITH RankedRows AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY RegNo, Name, Marks ORDER BY RegNo) AS row_num
    FROM Students
)
DELETE FROM RankedRows
WHERE row_num > 1;

Explanation:

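  • ROW_NUMBER() assigns a sequential number (1, 2, 3, and so on) to the rows within each group of duplicates.
  • We keep the row where row_num = 1 and delete the rest.
  • PARTITION BY lists the columns that must match for rows to count as duplicates of each other.
  • ROW_NUMBER() itself is available in SQL Server, PostgreSQL, MySQL 8.0+, and other modern RDBMSs, but deleting directly from a CTE as shown here is SQL Server syntax. Other systems need the ranked rows matched back to the table through a key or row identifier.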
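In PostgreSQL, a CTE cannot be the target of a DELETE, so the same ranking idea needs a way to point back at the physical rows. A minimal sketch, assuming PostgreSQL and its built-in ctid system column (a row's physical location, which is stable within a single statement):

```sql
-- PostgreSQL variant: rank the rows, then delete by physical row id (ctid).
DELETE FROM Students
WHERE ctid IN (
    SELECT ctid
    FROM (
        SELECT ctid,
               ROW_NUMBER() OVER (
                   PARTITION BY RegNo, Name, Marks
                   ORDER BY ctid
               ) AS row_num
        FROM Students
    ) AS ranked
    WHERE row_num > 1   -- every copy after the first in its group
);
```

On the sample data this removes the three extra copies of Frank, Jane, and Robert and leaves five rows. The ctid column is specific to PostgreSQL; other systems expose different row identifiers, such as rowid in SQLite and Oracle.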

Approach 3: Use RANK() to Identify and Remove Duplicates

The RANK() function can also flag duplicates. It works like ROW_NUMBER(), except that rows with equal ORDER BY values receive the same rank.

Example Table:

CREATE TABLE Animals (
    sno INT,
    Animal_id INT,
    Animal_name VARCHAR(50)
);

Assigning Ranks:

WITH RankedAnimals AS (
    SELECT *,
           RANK() OVER (PARTITION BY Animal_id, Animal_name ORDER BY sno DESC) AS rk
    FROM Animals
)
DELETE FROM RankedAnimals WHERE rk > 1;

When to use this:

  • When you want to retain the latest record based on a column like sno or created_at.
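  • When the ORDER BY column is unique within each group. Rows that tie on it all receive rank 1, so none of them is deleted; use ROW_NUMBER() if exactly one row per group must remain.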
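To see how ties affect the result, the two functions can be compared side by side before deleting anything. The rows inserted below are hypothetical sample data, added only for illustration:

```sql
-- Hypothetical sample rows: 'Cat' appears three times, twice with sno = 2.
INSERT INTO Animals (sno, Animal_id, Animal_name) VALUES
    (1, 101, 'Cat'),
    (2, 101, 'Cat'),
    (2, 101, 'Cat'),
    (3, 102, 'Dog');

SELECT sno, Animal_id, Animal_name,
       RANK()       OVER (PARTITION BY Animal_id, Animal_name ORDER BY sno DESC) AS rk,
       ROW_NUMBER() OVER (PARTITION BY Animal_id, Animal_name ORDER BY sno DESC) AS rn
FROM Animals
ORDER BY Animal_id, rn;

-- sno | Animal_id | Animal_name | rk | rn
--   2 |       101 | Cat         |  1 |  1
--   2 |       101 | Cat         |  1 |  2
--   1 |       101 | Cat         |  3 |  3
--   3 |       102 | Dog         |  1 |  1
```

Filtering on rk > 1 would delete only the sno = 1 row and leave both sno = 2 copies in place, whereas filtering on rn > 1 leaves exactly one Cat row.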

Approach 4: Deleting Duplicates Using Self-Joins

Self-joins are useful when there is no window function support (e.g., in older systems such as MySQL before version 8.0).

Delete Query:

DELETE A
FROM Students A
INNER JOIN Students B
    ON A.Name = B.Name AND A.Marks = B.Marks
WHERE A.RegNo > B.RegNo;

Logic:

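  • It compares the table to itself, pairing rows that share the same Name and Marks.
  • For each pair, it deletes the row with the higher RegNo, so the lowest RegNo in each group survives.
  • It needs a column that differs between the duplicates. Rows that are identical in every column, including RegNo (as in the sample Students data above), are not removed by this query.
  • The DELETE alias FROM ... JOIN form is accepted by SQL Server and MySQL; other systems need a different phrasing.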
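For systems that reject the DELETE alias FROM ... JOIN form, the same comparison can be written as a correlated subquery. This is a sketch, assuming PostgreSQL, SQL Server, or SQLite (MySQL does not allow the subquery to read the table being deleted from) and assuming RegNo differs between the duplicate rows:

```sql
-- Delete a row when another row has the same Name and Marks but a lower RegNo,
-- so the lowest RegNo in each group is the one that survives.
DELETE FROM Students
WHERE EXISTS (
    SELECT 1
    FROM Students AS B
    WHERE B.Name  = Students.Name
      AND B.Marks = Students.Marks
      AND B.RegNo < Students.RegNo
);
```

As with the join version, rows that share the same RegNo are left untouched, because neither copy has a lower RegNo than the other.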

Approach 5: Remove Duplicates Using Common Table Expressions (CTE)

A CTE defines a named result set that exists only for the statement it belongs to. Keeping the ranking logic in the CTE, separate from the DELETE, makes the query more readable and modular.

Create Table:

CREATE TABLE Employ_DB (
    emp_no INT,
    emp_name VARCHAR(50),
    emp_address VARCHAR(100),
    emp_eoj DATE
);

INSERT INTO Employ_DB VALUES
    (11, 'Mohith', 'Tokyo', '2000-05-12'),
    (12, 'Sana', 'Delhi', '2001-08-22'),
    (11, 'Mohith', 'Tokyo', '2000-05-12');

Using ROW_NUMBER in CTE:

WITH DuplicateEmployees AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY emp_no, emp_name ORDER BY emp_eoj) AS rn
    FROM Employ_DB
)
DELETE FROM DuplicateEmployees
WHERE rn > 1;

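This is the same ROW_NUMBER() technique as Approach 2, applied to an employee table: rows are grouped by emp_no and emp_name, and every row after the first in each group is deleted.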
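Before running a delete like this, the same CTE can be used for a dry run by swapping the DELETE for a SELECT. A minimal sketch, assuming a database with CTE and window function support:

```sql
-- Dry run: list the rows that the DELETE would remove, without changing data.
WITH DuplicateEmployees AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY emp_no, emp_name ORDER BY emp_eoj) AS rn
    FROM Employ_DB
)
SELECT emp_no, emp_name, emp_address, emp_eoj, rn
FROM DuplicateEmployees
WHERE rn > 1;
```

On the sample data this returns a single row: the second copy of employee 11 (Mohith), with rn = 2.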

Approach 6: SSIS (SQL Server Integration Services) Method

If you're using SQL Server in an enterprise setting, SSIS provides visual tools to remove duplicates.

Steps:

  • Use a Sort transformation to sort by key columns.
  • Enable “Remove rows with duplicate sort values”.
  • Load cleaned data into a new destination table.

SSIS is powerful for ETL (Extract, Transform, Load) workflows and suits batch deduplication tasks.

Preventing Future Duplicates

While it's important to know how to delete duplicates, it's even more critical to prevent them from occurring in the first place.

Best Practices:

  • Define PRIMARY KEY and UNIQUE constraints to enforce data uniqueness.
  • Normalize your database schema to reduce redundancy.
  • Use NOT EXISTS or MERGE logic during inserts to avoid inserting existing records.
  • Validate and sanitize input data before insertion.
  • Monitor and log batch processes to detect anomalies early.
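The first and third practices can be sketched against the Students table from earlier. This assumes PostgreSQL or SQL Server; the constraint name uq_students_regno and the inserted values are hypothetical.

```sql
-- Reject any future row that reuses an existing RegNo.
-- (Existing duplicates must be removed first, or this statement fails.)
ALTER TABLE Students
    ADD CONSTRAINT uq_students_regno UNIQUE (RegNo);

-- Insert a new student only if that RegNo is not already present.
-- The values 6, 'Maya', 91 are made-up sample data.
INSERT INTO Students (RegNo, Name, Marks)
SELECT 6, 'Maya', 91
WHERE NOT EXISTS (
    SELECT 1 FROM Students WHERE RegNo = 6
);
```

The constraint is the real safeguard, since the database enforces it for every writer. The NOT EXISTS check lets a load script skip rows that already exist instead of failing on them.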

Prevention requires thoughtful design and validation rules, but it saves time and effort in the long run.

Conclusion

Duplicate data is a silent threat to your database’s integrity. It causes reporting errors, inefficiencies, and inconsistent behavior across applications. Thankfully, SQL provides powerful tools like GROUP BY, ROW_NUMBER(), RANK(), and even self-joins to handle duplicates efficiently.

Whether you're working on a small side project or managing enterprise-level databases, understanding how to remove duplicate rows in SQL is a must-have skill. The method you choose depends on your database structure, available functions, and whether or not your table has a primary key.
