find duplicate records to delete or update in sql server

This article is to describe different methods to find duplicate records and also remove those rows using different methods in SQL Server. Many times some records are considered duplicate by your business logic but SQL Server does not treat them as duplicate records which makes it difficult to delete. In my experience most of the time DBA wants to remove records for data cleaning purpose. Here are some guidelines and ways to remove duplicate entries from your database.

Before you remove duplicates

Before you decide to delete data make a backup copy of table.
Should test your script by creating a copy of table and verify the results.
After removing duplicate data try creating constraints on table to prevent future issues with duplicate entries.

Sample Table with duplicate values

CREATE TABLE users
(
 username varchar(15),
 pwd varchar(15)
)
-- First Insert
INSERT INTO users VALUES('user1','pwd1')
INSERT INTO users VALUES('user2','pwd2')
INSERT INTO users VALUES('user3','pwd3')
-- Second Insert creating duplicates
INSERT INTO users VALUES('user1','pwd4')
INSERT INTO users VALUES('user2','pwd5')
INSERT INTO users VALUES('user3','pwd6')
-- Third Insert with duplicate values
INSERT INTO users VALUES('user1','pwd7')
INSERT INTO users VALUES('user2','pwd8')
INSERT INTO users VALUES('user3','pwd9')

The above table have duplicate usernames, SQL Server does not consider it as duplicate values but business logic defines username is duplicated in users table.

SQL Server row duplication means all fields must match.

SELECT DISTINCT * FROM users

Finding duplicate values using Group by and Having clause.

SELECT username, count(*) AS [Count]
FROM users
GROUP BY username HAVING count(*) > 1

First Method to delete duplicate rows.

SET ROWCOUNT 1
SELECT @@rowcount
WHILE @@rowcount > 0
DELETE usr
FROM users as usr
INNER JOIN
(SELECT username
FROM users
GROUP BY username HAVING count(*) > 1)
AS u ON u.username = usr.username
SET ROWCOUNT 0

The second method could be by creating a copy of table with unique index with Ignore Duplicate Key clause, this method require extra space.

--Create blank table with same structure as original table.
SELECT * INTO UsersCopy FROM users WHERE 1 = 0

--Create unique index with ignore duplicate keys
CREATE UNIQUE INDEX u_username ON userscopy (username) WITH IGNORE_DUP_KEY

-- Insert data into copy table from original table
INSERT INTO userscopy SELECT * FROM users

-- Truncate / empty original table
TRUNCATE TABLE users

-- Insert the clean data back
INSERT INTO users SELECT * FROM userscopy

-- Drop the index and copy of table
DROP INDEX userscopy.u_username
DROP TABLE userscopy

Third method is using cursors to perform delete, this will work on SQL Server 2005 only. We can pass the number of rows to remove in TOP clause.

DECLARE @username varchar(15)
DECLARE @cnt int
 
DECLARE cur CURSOR READ_ONLY
FOR
SELECT username, count(*) -1 as cnt
FROM users
GROUP BY username HAVING Count(*) > 1

OPEN cur

FETCH cur INTO @username, @cnt

WHILE @@FETCH_STATUS = 0
BEGIN
 DELETE TOP(@cnt) FROM users WHERE username = @username
 FETCH cur  INTO @username,@cnt
END

CLOSE cur
DEALLOCATE cur

The fourth method could be typing each delete statement, this will work on SQL Server 2005 only because SQL Server 2000 does not allow passing parameters in TOP clause.

DELETE TOP
(
SELECT COUNT(*) - 1 
FROM users
WHERE username = 'User1'
)
FROM users
WHERE username = 'User1'

 


SQL Server DBA

I am a SQL Server DBA with almost 9 years of experience in database technologies.

Right now I am in process of redesigning this website in ASP.NET 2.0 and AJAX. The main purpose of new SQL DBA website is to offer more help to database professionals and make it easy for me to update contents on regular basis.