Two-column SQL Deduplication

Question

I have struggled with this for quite some time, but I just can't figure it out.

I have a table with three columns: two columns containing names, and a third containing the Damerau-Levenshtein distance ( http://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance ) between these names.

Each column contains every single name, which means that all the names present in the autor1 column are also present in the autor2 column. As a result, every pair of names appears twice, only with the autor1 and autor2 columns swapped.

For example, row 3 is equal to row 1 with the autor columns exchanged, and the same is true for rows 2 and 4. How would I formulate a query that omits these "duplicates"?

id | autor1        | autor2        | DLD
 1 | Abel, Gustav  | Abel, Gustave | 1
 2 | Abel, Gustav  | Abele, Gustav | 1
 3 | Abel, Gustave | Abel, Gustav  | 1
 4 | Abele, Gustav | Abel, Gustav  | 1

to

autor1        | autor2        | DLD
Abel, Gustav  | Abel, Gustave | 1
Abel, Gustav  | Abele, Gustav | 1

sql duplicates

lightxx May 07 '12 at 12:29

1 answer

Lieven Keersmaekers · Accepted Answer · 2012-05-07T12:31:16+0000

You can filter these rows out with a NOT EXISTS clause. Of each mirrored pair, only the row with the highest id is kept.

SELECT *
FROM   YourTable yto
WHERE  NOT EXISTS (
         -- a later row holding the same two names with the columns swapped
         SELECT  *
         FROM    YourTable yti
         WHERE   yti.autor1 = yto.autor2
                 AND yti.autor2 = yto.autor1
                 AND yti.id > yto.id
       )
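
To try the query, the four sample rows from the question can be loaded as follows. This is only a minimal sketch: the table name YourTable comes from the query above, while the column types and lengths are assumptions, since the question does not state them.

```sql
-- Column types and lengths are assumed; adjust them to the real schema.
CREATE TABLE YourTable (
    id     INT PRIMARY KEY,
    autor1 VARCHAR(100) NOT NULL,
    autor2 VARCHAR(100) NOT NULL,
    DLD    INT NOT NULL
);

-- The four sample rows from the question.
INSERT INTO YourTable (id, autor1, autor2, DLD) VALUES
    (1, 'Abel, Gustav',  'Abel, Gustave', 1),
    (2, 'Abel, Gustav',  'Abele, Gustav', 1),
    (3, 'Abel, Gustave', 'Abel, Gustav',  1),
    (4, 'Abele, Gustav', 'Abel, Gustav',  1);
```

On this data the query returns the rows with id 3 and 4. Changing yti.id > yto.id to yti.id < yto.id would keep rows 1 and 2 instead, which is the output shown in the question.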

Edit

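In short, row by row:

(ID = 1)
Is there a row with ID > 1 that has this row's autor1 as its autor2? Yes (ID 3) → the row is excluded.
(ID = 2)
Is there a row with ID > 2 that has this row's autor1 as its autor2? Yes (ID 4) → the row is excluded.
(ID = 3)
Is there a row with ID > 3 that has this row's autor1 as its autor2? No → the row is returned.
(ID = 4)
Is there a row with ID > 4 that has this row's autor1 as its autor2? No → the row is returned.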
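
The walkthrough above can be checked with a small diagnostic query. It is a sketch, not part of the original answer: for each row it lists the id of a later mirrored row, if one exists. The alias mirrored_by is a made-up name, and the query assumes the four sample rows from the question are stored in YourTable. Note that it matches on both name columns, which is stricter than comparing autor2 with autor1 alone.

```sql
-- For every row, look up a later row that holds the same two names
-- with the columns swapped. NULL means no such row exists, so the
-- NOT EXISTS filter would keep that row.
SELECT yto.id,
       (SELECT MIN(yti.id)
        FROM   YourTable yti
        WHERE  yti.autor1 = yto.autor2
               AND yti.autor2 = yto.autor1
               AND yti.id > yto.id) AS mirrored_by
FROM   YourTable yto
ORDER  BY yto.id;

-- With the sample rows from the question:
--   id 1 -> mirrored_by 3
--   id 2 -> mirrored_by 4
--   id 3 -> mirrored_by NULL
--   id 4 -> mirrored_by NULL
```

Rows whose mirrored_by is NULL are the ones that survive the deduplication.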
