How to Delete Duplicate Rows in SQL: Complete Explanation With Examples


Duplicate rows can undermine data consistency and slow down queries, so finding and removing them is a routine part of keeping a database accurate. This guide walks through several techniques for identifying and removing duplicate rows in SQL. Each technique comes with a short explanation and an example against an "employees" table that has "id", "name", and "department" columns.

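Before deleting anything, you need to find out which rows are duplicated.

Identifying Duplicate Rows:

Group the rows by the column (or columns) that define a duplicate using the GROUP BY clause, then use the HAVING clause to keep only the groups that contain more than one row. The query below lists every name that appears more than once in the "employees" table, along with how many times it appears.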

-- List each name that occurs more than once, with its number of occurrences
SELECT name, COUNT(*) AS duplicate_count
FROM employees
GROUP BY name
HAVING COUNT(*) > 1;
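The same GROUP BY and HAVING pattern extends to duplicates defined by more than one column. The sketch below is only an illustration: it runs the query through Python's built-in sqlite3 module so that it is self-contained, and the table schema, the sample rows, and the helper name find_duplicates are assumptions, not part of any real system.

```python
import sqlite3

def find_duplicates(conn):
    """Return (name, department, duplicate_count) for each pair stored more than once."""
    query = """
        SELECT name, department, COUNT(*) AS duplicate_count
        FROM employees
        GROUP BY name, department
        HAVING COUNT(*) > 1
        ORDER BY name, department
    """
    return conn.execute(query).fetchall()

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO employees (name, department) VALUES (?, ?)",
    [("Ana", "Sales"), ("Ana", "Sales"), ("Ana", "HR"),
     ("Ben", "IT"), ("Ben", "IT"), ("Ben", "IT")],
)

duplicates = find_duplicates(conn)
print(duplicates)  # [('Ana', 'Sales', 2), ('Ben', 'IT', 3)]
```

Ana in HR is not reported, because that name and department pair occurs only once.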

How to Filter Out Duplicate Rows Using the DISTINCT Keyword:

The DISTINCT keyword removes duplicate rows from a query's output, so that each combination of the selected columns appears only once. It does not delete anything from the table itself; the stored rows are unchanged. In this example, the query returns each distinct combination of the "name" and "department" columns from the "employees" table.

SELECT DISTINCT name, department
FROM employees;
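To make the difference between filtering and deleting concrete, the following sketch runs the DISTINCT query and then counts the rows still stored in the table. It uses Python's built-in sqlite3 module purely as a convenient way to run the SQL; the schema and sample rows are assumptions made for the example.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO employees (name, department) VALUES (?, ?)",
    [("Ana", "Sales"), ("Ana", "Sales"), ("Ben", "IT")],
)

# DISTINCT collapses repeated (name, department) pairs in the result set only.
unique_pairs = conn.execute(
    "SELECT DISTINCT name, department FROM employees ORDER BY name, department"
).fetchall()

# The table still holds every original row, duplicates included.
rows_in_table = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]

print(unique_pairs)   # [('Ana', 'Sales'), ('Ben', 'IT')]
print(rows_in_table)  # 3
```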

How to Delete Duplicate Rows Using the ROW_NUMBER() Function:

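This method uses a Common Table Expression (CTE) to number the rows within each group of identical "name" values, ordered by the "id" column, so the row with the lowest "id" in each group gets row number 1. The DELETE statement that follows the CTE then removes every row whose row number is greater than 1, keeping one row per name. Deleting through a CTE in this way is SQL Server syntax; databases such as PostgreSQL, MySQL, and SQLite do not accept a CTE as the target of a DELETE, so there the same ROW_NUMBER() logic is usually placed in a subquery.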

WITH cte AS (
    SELECT *,
           ROW_NUMBER() OVER(PARTITION BY name ORDER BY id) AS row_num
    FROM employees
)
DELETE FROM cte WHERE row_num > 1;
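For engines that do not allow deleting through a CTE, one possible variation keeps the ROW_NUMBER() logic but moves it into a subquery that returns the ids to delete. The sketch below assumes SQLite 3.25 or later (for window function support), accessed through Python's built-in sqlite3 module, and assumes "id" is a unique key. The schema, sample rows, and the function name delete_duplicates_by_name are hypothetical.

```python
import sqlite3

def delete_duplicates_by_name(conn):
    """Delete every row except the lowest-id row for each name; return rows deleted."""
    cursor = conn.execute(
        """
        DELETE FROM employees
        WHERE id IN (
            SELECT id
            FROM (
                SELECT id,
                       ROW_NUMBER() OVER (PARTITION BY name ORDER BY id) AS row_num
                FROM employees
            ) AS ranked
            WHERE row_num > 1
        )
        """
    )
    return cursor.rowcount

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO employees (name) VALUES (?)",
    [("Ana",), ("Ben",), ("Ana",), ("Ana",), ("Ben",)],
)

deleted = delete_duplicates_by_name(conn)
remaining = conn.execute("SELECT id, name FROM employees ORDER BY id").fetchall()
print(deleted)    # 3
print(remaining)  # [(1, 'Ana'), (2, 'Ben')]
```

Other databases with window functions may accept a similar statement, but the exact rules for referencing the target table inside a subquery differ between engines, so check the behavior on your own system first.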

How to Delete Duplicate Rows Across Multiple Columns Using a CTE:

This variation uses the same CTE approach but partitions by more than one column ("name" and "department"), so two rows count as duplicates only when both values match. Rows whose row number exceeds 1 within each name and department group are then deleted.

WITH cte AS (
    SELECT *,
           ROW_NUMBER() OVER(PARTITION BY name, department ORDER BY id) AS row_num
    FROM employees
)
DELETE FROM cte WHERE row_num > 1;
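Because the partition columns decide what counts as a duplicate, it can help to list the rows that would be removed before deleting anything. The sketch below reuses the CTE from above but ends in a SELECT instead of a DELETE. It assumes SQLite 3.25 or later via Python's built-in sqlite3 module; the schema, sample rows, and the helper name preview_duplicates are hypothetical.

```python
import sqlite3

def preview_duplicates(conn):
    """List the rows that have row_num > 1 within their (name, department) group."""
    query = """
        WITH cte AS (
            SELECT id, name, department,
                   ROW_NUMBER() OVER (PARTITION BY name, department ORDER BY id) AS row_num
            FROM employees
        )
        SELECT id, name, department
        FROM cte
        WHERE row_num > 1
        ORDER BY id
    """
    return conn.execute(query).fetchall()

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO employees (name, department) VALUES (?, ?)",
    [("Ana", "Sales"), ("Ana", "HR"), ("Ana", "Sales"), ("Ben", "IT"), ("Ben", "IT")],
)

to_delete = preview_duplicates(conn)
print(to_delete)  # [(3, 'Ana', 'Sales'), (5, 'Ben', 'IT')]
```

Ana's HR row (id 2) is not listed even though her name repeats, because its department differs from the other two Ana rows.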

How to Delete Duplicate Rows Using DELETE with INNER JOIN:

This example uses a DELETE statement with an INNER JOIN of the table to itself. The join pairs each row with every other row that has the same "name" and a lower "id", and every row that finds such a match is deleted. The row with the lowest "id" for each name has no match and is kept. The DELETE ... FROM ... JOIN form shown here is accepted by MySQL and SQL Server; other databases use different syntax for the same idea.

DELETE e1
FROM employees e1
INNER JOIN employees e2 ON e1.name = e2.name AND e1.id > e2.id;
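SQLite is one of the engines that does not accept a JOIN directly in a DELETE statement. A possible workaround, sketched below, keeps the same self-join but uses it in a subquery that collects the ids to remove. This is a substitute for the syntax above, not the same statement. It runs through Python's built-in sqlite3 module; the schema, sample rows, and the JOIN_SUBQUERY constant are assumptions for the example.

```python
import sqlite3

# The same self-join as above, returning the id of every row that has
# an earlier row (lower id) with the same name.
JOIN_SUBQUERY = """
    SELECT DISTINCT e1.id
    FROM employees e1
    INNER JOIN employees e2 ON e1.name = e2.name AND e1.id > e2.id
"""

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO employees (name) VALUES (?)",
    [("Ana",), ("Ben",), ("Ana",), ("Ana",), ("Cy",)],
)

# Preview which ids the join selects, then delete exactly those rows.
doomed_ids = [row[0] for row in conn.execute(JOIN_SUBQUERY + " ORDER BY e1.id")]
conn.execute(f"DELETE FROM employees WHERE id IN ({JOIN_SUBQUERY})")
remaining = conn.execute("SELECT id, name FROM employees ORDER BY id").fetchall()

print(doomed_ids)  # [3, 4]
print(remaining)   # [(1, 'Ana'), (2, 'Ben'), (5, 'Cy')]
```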

How to Avoid Deletion by Transferring Unique Records to a New Table:

In this example, a new table named "employees_new" is created with the CREATE TABLE AS statement. The SELECT DISTINCT query picks only the unique rows from the "employees" table, and those rows populate the new table, while the original table is left untouched. Note that DISTINCT * treats two rows as duplicates only when every column matches, so rows that differ only in a unique "id" are all copied. CREATE TABLE AS is available in databases such as PostgreSQL, MySQL, Oracle, and SQLite; SQL Server uses SELECT ... INTO instead.

CREATE TABLE employees_new AS
SELECT DISTINCT *
FROM employees;
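When the table has a unique "id" column, SELECT DISTINCT * finds no duplicates, because no two rows match in every column. One way around this is to select only the columns that define a duplicate. The sketch below illustrates both cases with Python's built-in sqlite3 module; the schema and sample rows are assumptions for the example.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO employees (name, department) VALUES (?, ?)",
    [("Ana", "Sales"), ("Ana", "Sales"), ("Ben", "IT")],
)

# With a unique id in every row, DISTINCT * removes nothing.
distinct_star_rows = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT * FROM employees)"
).fetchone()[0]

# Selecting only the columns that define a duplicate gives one row per pair.
conn.execute(
    """
    CREATE TABLE employees_new AS
    SELECT DISTINCT name, department
    FROM employees
    """
)
new_rows = conn.execute(
    "SELECT name, department FROM employees_new ORDER BY name"
).fetchall()
original_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]

print(distinct_star_rows)  # 3
print(new_rows)            # [('Ana', 'Sales'), ('Ben', 'IT')]
print(original_count)      # 3
```

The new table in this sketch has no "id" column and no primary key, so those would need to be added separately if they are required.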

Conclusion:

Removing duplicate rows is an important part of keeping data accurate and SQL operations running smoothly. GROUP BY with HAVING finds the duplicates, DISTINCT filters them out of query results, ROW_NUMBER() or DELETE with INNER JOIN deletes them, and CREATE TABLE AS with DISTINCT copies the unique rows to a new table. Which technique fits best depends on your database and on which columns define a duplicate in your data.