Python sqlite3: Diff tables in two databases

Question

I have two databases with the same schema, and I want to efficiently diff one of the tables between them. That is, return only the unique records, ignoring the primary key.

# Column names come from the second field of each PRAGMA table_info row.
columns = [row[1] for row in db1.execute("PRAGMA table_info(foo)")]
db1.execute("ATTACH DATABASE '/path/to/db1.db' AS db1")
db1.execute("ATTACH DATABASE '/path/to/db2.db' AS db2")
db2.execute("ATTACH DATABASE '/path/to/db1.db' AS db1")
db2.execute("ATTACH DATABASE '/path/to/db2.db' AS db2")
data = db2.execute("""
    SELECT 
        one.* 
    FROM 
        db1.foo AS one 
        JOIN db2.foo 
        AS two 
    WHERE {}
    """.format(' AND '.join( ['one.{0}!=two.{0}'.format(c) for c in columns[1:]]))
).fetchall()

That is, ignoring the primary key (in this case meow), do not return records that are identical in both databases.

The table foo in db1 is as follows:

meow    mix    please   deliver
1       123    abc
2       234    bcd      two
3       345    cde

And the table foo in db2 looks like this:

meow    mix    please   deliver
1       345    cde
2       123    abc      one
3       234    bcd      two     
4       456    def      four

Thus, unique entries from db2:

[(2, 123, 'abc', 'one'), (4, 456, 'def', 'four')]

which is what I get. This works great if I have more than two columns. But if there are only two, that is, the primary key and a value, as in a lookup table:

bar  baz         bar   baz
1    123         1     234
2    234         2     345
3    345         3     123
                 4     456

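I get every non-unique value repeated N-1 times and every unique value repeated N times, where N is the number of records in db1.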

[(1, '234'),
 (1, '234'),
 (2, '345'),
 (2, '345'),
 (3, '123'),
 (3, '123'),
 (4, '456'),
 (4, '456'),
 (4, '456')]

As a workaround, I can group the fetched rows and keep only those that are repeated N times:

import itertools

N = db1.execute("SELECT COUNT(*) FROM foo").fetchone()[0]
# Keep only the rows whose repeat count is a multiple of N.
data = [
    list(row)
    for row, group in itertools.groupby(sorted(data))
    if len(list(group)) % N == 0
]

This produces the desired record:

[[4, '456']]

But this seems messy. I would like to do it all in the SQL query, if possible.

Also, this takes a long time on large tables (my real db has ~10k records). Any ideas? Thanks!

python sqlite3

Joe Flip 21 . '14 17:06

Chris Johnson · Accepted Answer · 2014-02-25T02:54:46+0000

Replace the join with a union-all / group-by query that selects the non-duplicated values.

Assuming the tables, simplified as in your question, look like this:

sqlite> select * from t1;
meow        mix         please      delivery  
----------  ----------  ----------  ----------
1           123         abc                   
2           234         bcd         two       
3           345         cde

sqlite> select * from t2;
meow        mix         please      delivery  
----------  ----------  ----------  ----------
1           345         cde                   
2           123         abc         one       
3           234         bcd         two       
4           456         def         four

To find the rows that are in t2/not in t1 (ignoring the PK), do this:

select sum(q1.db), mix, please, delivery from (select 1 as db, mix, please,
delivery from t1 union all select 2 as db, mix, please, delivery from t2) q1
group by mix, please, delivery having sum(db)=2; 

sum(q1.db)  mix         please      delivery  
----------  ----------  ----------  ----------
2           123         abc         one       
2           456         def         four
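From Python, the same query can be run across two attached databases. The following is only a sketch: it assumes the question's table foo and its column names, treats the blank deliver cells as NULL, and uses two in-memory databases in place of the real files. The helper name rows_only_in_db2 is hypothetical.

```python
import sqlite3

# Illustrative setup: two in-memory databases stand in for the real files.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS db1")
conn.execute("ATTACH DATABASE ':memory:' AS db2")
for name in ("db1", "db2"):
    conn.execute(
        f"CREATE TABLE {name}.foo "
        "(meow INTEGER PRIMARY KEY, mix INTEGER, please TEXT, deliver TEXT)"
    )
conn.executemany(
    "INSERT INTO db1.foo VALUES (?, ?, ?, ?)",
    [(1, 123, "abc", None), (2, 234, "bcd", "two"), (3, 345, "cde", None)],
)
conn.executemany(
    "INSERT INTO db2.foo VALUES (?, ?, ?, ?)",
    [(1, 345, "cde", None), (2, 123, "abc", "one"),
     (3, 234, "bcd", "two"), (4, 456, "def", "four")],
)


def rows_only_in_db2(conn, table, columns):
    """Return rows of db2.<table> with no match in db1.<table>.

    Only the given (non-PK) columns are compared. Table and column names
    are interpolated into the SQL, so they must come from trusted code.
    """
    cols = ", ".join(columns)
    sql = f"""
        SELECT {cols} FROM (
            SELECT 1 AS db, {cols} FROM db1.{table}
            UNION ALL
            SELECT 2 AS db, {cols} FROM db2.{table}
        )
        GROUP BY {cols}
        HAVING SUM(db) = 2
        ORDER BY {cols}
    """
    return conn.execute(sql).fetchall()


result = rows_only_in_db2(conn, "foo", ["mix", "please", "deliver"])
print(result)
```

With real files, the two ATTACH statements would point at the database paths instead of ':memory:'.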

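You can get different set operations by varying the HAVING clause. SUM(DB)=1 returns the rows in 1/not in 2; SUM(DB)=2 returns the rows in 2/not in 1; SUM(DB)=1 OR SUM(DB)=2 returns the rows that are in either table but not in both; and SUM(DB)=3 returns the rows that are in both.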
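The HAVING variants can be checked against the two-column lookup table from the question. This is a minimal sketch: the table names t1 and t2 follow the answer, the columns bar and baz follow the question, and the helper select_by_having is a hypothetical name. It also assumes that no non-PK value is repeated within a single table, since a repeat would inflate the sum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (bar INTEGER PRIMARY KEY, baz TEXT);
    CREATE TABLE t2 (bar INTEGER PRIMARY KEY, baz TEXT);
    INSERT INTO t1 VALUES (1, '123'), (2, '234'), (3, '345');
    INSERT INTO t2 VALUES (1, '234'), (2, '345'), (3, '123'), (4, '456');
""")


def select_by_having(conn, having):
    """Run the union-all / group-by query with the given HAVING condition.

    `having` is pasted into the SQL text, so it must come from trusted code.
    """
    sql = f"""
        SELECT baz FROM (
            SELECT 1 AS db, baz FROM t1
            UNION ALL
            SELECT 2 AS db, baz FROM t2
        )
        GROUP BY baz
        HAVING {having}
        ORDER BY baz
    """
    return [row[0] for row in conn.execute(sql)]


only_t1 = select_by_having(conn, "SUM(db) = 1")                 # in t1, not in t2
only_t2 = select_by_having(conn, "SUM(db) = 2")                 # in t2, not in t1
either = select_by_having(conn, "SUM(db) = 1 OR SUM(db) = 2")   # in exactly one
both = select_by_having(conn, "SUM(db) = 3")                    # in both tables

print(only_t1, only_t2, either, both)
```

Here only_t2 contains just '456', the record the question was trying to isolate, with no duplicates to filter out afterwards.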

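Note that this does not return the PK. It cannot, given the GROUP BY and SUM/aggregate nature of the query, because the PK is not part of the aggregated data. If you need the PKs, join the result back to the original table on the non-PK columns.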
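One way to recover the PK is to join the grouped result back to t2 on the non-PK columns. The sketch below is an assumption about how that could look, not part of the original answer. It treats the blank delivery cells as NULL and therefore compares with SQLite's IS operator, which, unlike =, considers two NULLs equal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (meow INTEGER PRIMARY KEY, mix INTEGER, please TEXT, delivery TEXT);
    CREATE TABLE t2 (meow INTEGER PRIMARY KEY, mix INTEGER, please TEXT, delivery TEXT);
    INSERT INTO t1 VALUES (1, 123, 'abc', NULL), (2, 234, 'bcd', 'two'),
                          (3, 345, 'cde', NULL);
    INSERT INTO t2 VALUES (1, 345, 'cde', NULL), (2, 123, 'abc', 'one'),
                          (3, 234, 'bcd', 'two'), (4, 456, 'def', 'four');
""")

# The inner query is the diff from the answer; the outer join looks up the
# matching full rows (including the PK) in t2.
sql = """
    SELECT t2.meow, t2.mix, t2.please, t2.delivery
    FROM t2
    JOIN (
        SELECT mix, please, delivery FROM (
            SELECT 1 AS db, mix, please, delivery FROM t1
            UNION ALL
            SELECT 2 AS db, mix, please, delivery FROM t2
        )
        GROUP BY mix, please, delivery
        HAVING SUM(db) = 2
    ) AS d
      ON t2.mix IS d.mix
     AND t2.please IS d.please
     AND t2.delivery IS d.delivery
    ORDER BY t2.meow
"""
rows_with_pk = conn.execute(sql).fetchall()
print(rows_with_pk)
```

The result matches the list of unique db2 entries given in the question, primary key included.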

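This approach also extends to multiple tables. If you make the literal values for db powers of 2, you can work with any number of tables. For example, with 1 as db for t1, 2 as db for t2, 4 as db for t3, and 8 as db for t4, you can get any intersection/difference of the tables you want by setting the HAVING condition. HAVING SUM(DB)=5 returns the rows that are in t1 and t3 but not in t2 or t4.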
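The powers-of-2 idea can be sketched as follows. The four tables, their single val column, and the sample values are made up for illustration, and rows_with_mask is a hypothetical helper. As before, this assumes no value is repeated within one table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Made-up sample data: which values each table contains.
tables = ["t1", "t2", "t3", "t4"]
data = {
    "t1": ["a", "b", "c"],
    "t2": ["b", "d"],
    "t3": ["a", "b", "e"],
    "t4": ["c", "e"],
}
for name in tables:
    conn.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, val TEXT)")
    conn.executemany(
        f"INSERT INTO {name} (val) VALUES (?)", [(v,) for v in data[name]]
    )

# Tag each table with its own bit: t1 -> 1, t2 -> 2, t3 -> 4, t4 -> 8.
union = " UNION ALL ".join(
    f"SELECT {1 << i} AS db, val FROM {name}" for i, name in enumerate(tables)
)


def rows_with_mask(mask):
    """Return the values that appear in exactly the tables whose bits are set."""
    sql = (
        f"SELECT val FROM ({union}) "
        "GROUP BY val HAVING SUM(db) = ? ORDER BY val"
    )
    return [row[0] for row in conn.execute(sql, (mask,))]


in_t1_and_t3_only = rows_with_mask(1 + 4)   # HAVING SUM(db) = 5
in_t2_only = rows_with_mask(2)
in_t1_t2_t3 = rows_with_mask(1 + 2 + 4)

print(in_t1_and_t3_only, in_t2_only, in_t1_t2_t3)
```

Because every table contributes a distinct bit, each sum identifies exactly one combination of tables.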
