There are various times when we need to find duplicate records in SQL Server. It is possible to find duplicates using DISTINCT, ROW NUMBER as well as the GROUP BY approach.
Duplicate records can create problems sometimes when displaying reports or performing a Multiple Insert update. Finding duplicate records in a database needs further investigation. In some cases, duplicate records are positive, but it all depends on the data and the database design as well.
For example, if a customer has ordered the same product twice on the same date with the the same shipping and billing address, then this may result in a duplicate record.
Let us create a table Customer with First Name, Last Name, and Mobile Number fields.
CREATE TABLE CUSTOMER
(
FirstName VARCHAR(50),
LastName VARCHAR(50),
MobileNo VARCHAR(15)
);
INSERT INTO CUSTOMER VALUES ('Niraj','Yadav',989898);
INSERT INTO CUSTOMER VALUES ('Chetan','Gadodia',959595);
INSERT INTO CUSTOMER VALUES ('Chetan','Gadodia',959595);
INSERT INTO CUSTOMER VALUES ('Atul','Kokam',42424242);
INSERT INTO CUSTOMER VALUES ('Atul','Kokam',42424242);
INSERT INTO CUSTOMER VALUES ('Vishal','Parte',9394453);
INSERT INTO CUSTOMER VALUES ('Vishal','Parte',9394453);
INSERT INTO CUSTOMER VALUES ('Vishal','Parte',9394453);
INSERT INTO CUSTOMER VALUES ('Jinendra','Jain',121212);
INSERT INTO CUSTOMER VALUES ('Jinendra','Jain',121212);
SELECT * FROM CUSTOMER;
Using the DISTINCT approach, we can quickly get unique rows in a table.
SELECT DISTINCT FirstName, LastName, MobileNo FROM CUSTOMER;
However, this does not show how many times a row has been duplicated. Using the GROUP BY approach, we can find this.
Finding Duplicates Using GROUP BY
Adding grouping and a counting field to our display of FirstName, LastName and MobileNo combination shows how many times each customer’s name appears.
SELECT FirstName, LastName, MobileNo, COUNT(1) as CNT
FROM CUSTOMER
GROUP BY FirstName, LastName, MobileNo;
GROUP BY will show just one record for each combination of FirstName, LastName and MobileNo.
The count CNT shows how many times the row has been duplicated.
CNT = 1 indicates that row appears only once.
Let us filter out using the Having clause to exclude rows that appear only once.
SELECT FirstName, LastName, MobileNo, COUNT(1) as CNT
FROM CUSTOMER
GROUP BY FirstName, LastName, MobileNo
HAVING COUNT(1) > 1;