How to tell if a record has changed in Postgres

I have a bit of an "upsert" type of question, but I want to put it out there because it is a little different from any that I have read on Stack Overflow.

The basic problem.

I am working on migrating from MySQL to PostgreSQL 9.1.5 (hosted on Heroku). As part of this, I need to import several CSV files every day. Some of the data is sales information, which is almost guaranteed to be new and needs to be inserted. Other parts of the data are almost guaranteed to be the same. For example, the CSV files (note the plural) will have POS (point of sale) information in them. This rarely changes (and most likely only through additions). Then there is product information. There are about 10,000 products; the vast majority will remain unchanged, but both additions and updates are possible.

The final point (but this is important) is that I have a requirement to provide an audit trail / information for any given element. For example, if I add a new POS entry, I will need to track it back to the file in which it was found. If I change the UPC code or product description, then I will need to track it to the import (and file) in which the change occurred.

The solution I am contemplating.

Since the data is provided to me as CSV, I am working on the assumption that COPY will be the best and fastest way to load it. The structure of the data in the files is not quite what I have in the database (i.e. the final destination). So I COPY the files into tables in an intermediate schema that matches the CSV layout (note: one schema per data source). The tables in the intermediate schemas will have BEFORE INSERT row triggers. These triggers can decide what to do with the data (insert, update, or ignore).
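To make the trigger idea concrete, here is a minimal sketch of what such a routing trigger might look like. All names (the `staging` schema, `products_csv`, `public.products`, the `product_id` key and the two data columns) are assumptions for illustration, not taken from the actual database. It uses only features available in PostgreSQL 9.1 and does not attempt to be safe under concurrent imports.

```sql
-- Hypothetical names throughout: staging.products_csv mirrors one CSV layout,
-- public.products is the final table, product_id is assumed to be its key.
CREATE SCHEMA staging;

CREATE TABLE public.products (
   product_id  integer PRIMARY KEY,
   upc         text,
   description text
);

CREATE TABLE staging.products_csv (
   product_id  integer,
   upc         text,
   description text
);

CREATE FUNCTION staging.route_product() RETURNS trigger
LANGUAGE plpgsql AS
$func$
BEGIN
   -- Update only when the incoming values differ from the stored ones.
   UPDATE public.products p
   SET    upc = NEW.upc, description = NEW.description
   WHERE  p.product_id = NEW.product_id
   AND   (p.upc, p.description) IS DISTINCT FROM (NEW.upc, NEW.description);

   -- Insert only when the key is not present yet; unchanged rows fall through.
   INSERT INTO public.products (product_id, upc, description)
   SELECT NEW.product_id, NEW.upc, NEW.description
   WHERE  NOT EXISTS (
      SELECT 1 FROM public.products p WHERE p.product_id = NEW.product_id
   );

   RETURN NULL;  -- returning NULL from a BEFORE ROW trigger skips the staging insert
END
$func$;

CREATE TRIGGER route_product
BEFORE INSERT ON staging.products_csv
FOR EACH ROW EXECUTE PROCEDURE staging.route_product();
```

COPY fires BEFORE INSERT row triggers, so loading a file into `staging.products_csv` would route every row through this function. The trade-off is that the work happens row by row, which is usually slower than set-based statements run against a fully loaded staging table.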

, , . , NULL ( ). , , , . , , - . ( , , x y). , , . , - "" , .

, , - . , , wiki.postgresql.org. , hstore ( , - , "last_modified" )

90% , ... .. .

?

, : 3 10K, . , , python script ( - ), , , .

:

  • . , , , .
  • , , , , ( )
  • , , SO (, " python" ), , SO , , , , .

I have many similar operations. What I do is COPY into temporary staging tables:

CREATE TEMP TABLE target_tmp AS
SELECT * FROM target_tbl LIMIT 0;  -- only copy structure, no data

-- the source files are CSV; without FORMAT csv, COPY expects its default text format
COPY target_tmp FROM '/path/to/target.csv' (FORMAT csv);

For performance, run ANALYZE on the temp table. Temporary tables are not analyzed by autovacuum!

ANALYZE target_tmp; 

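Also for performance, you might create an index or two on the temp table, or add a primary key if the data allows it.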

ALTER TABLE target_tmp ADD CONSTRAINT target_tmp_pkey PRIMARY KEY (target_id);

For small imports you can skip these performance steps.

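Then use the full range of SQL commands to digest the new data.
For instance, if the primary key of the target table is target_id: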

DELETE what is no longer there?

DELETE FROM target_tbl t
WHERE NOT EXISTS (
   SELECT 1 FROM target_tmp t1
   WHERE  t1.target_id = t.target_id
);

Then UPDATE what is already there:

UPDATE target_tbl t
SET    col1 = t1.col1
FROM   target_tmp t1
WHERE  t.target_id = t1.target_id

To avoid empty UPDATEs, simply add:

...
AND    col1 IS DISTINCT FROM t1.col1; -- repeat for relevant columns

Or, if the whole row is relevant:

...
AND    t IS DISTINCT FROM t1;         -- check the whole row
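The reason to prefer IS DISTINCT FROM over `<>` in these conditions is NULL handling: a plain inequality yields NULL when either side is NULL, so a change to or from NULL would be silently skipped by the WHERE clause. A small illustration that can be run in psql (the column aliases are only for readability):

```sql
SELECT 1 <> 2                                AS ne_plain,      -- true
       NULL::int <> 1                        AS ne_null,       -- NULL: WHERE would skip the row
       NULL::int IS DISTINCT FROM 1          AS distinct_null, -- true: the change is detected
       NULL::int IS DISTINCT FROM NULL::int  AS distinct_both; -- false: two NULLs count as unchanged
```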

Then INSERT what is new:

INSERT INTO target_tbl(target_id, col1)
SELECT t1.target_id, t1.col1
FROM   target_tmp t1
LEFT   JOIN target_tbl t USING (target_id)
WHERE  t.target_id IS NULL;

Clean up if your session goes on (temp tables are dropped automatically at the end of the session):

DROP TABLE target_tmp;

Or use ON COMMIT DROP with CREATE TEMP TABLE.
The code is untested, but it should work in any modern version of PostgreSQL, barring typos.
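As a rough sketch of how the ON COMMIT DROP variant might be assembled, the steps can be wrapped in a single transaction. The table and column names are the same placeholders used above (`target_tbl`, `target_tmp`, `target_id`, `col1`); the DELETE step is left out here on the assumption that rows missing from a given file should be kept.

```sql
BEGIN;

-- ON COMMIT DROP goes before AS; the table vanishes when the transaction ends.
CREATE TEMP TABLE target_tmp ON COMMIT DROP AS
SELECT * FROM target_tbl LIMIT 0;

COPY target_tmp FROM '/path/to/target.csv' (FORMAT csv);

-- Update existing rows, skipping those whose value has not changed.
UPDATE target_tbl t
SET    col1 = t1.col1
FROM   target_tmp t1
WHERE  t.target_id = t1.target_id
AND    t.col1 IS DISTINCT FROM t1.col1;

-- Insert rows whose key is not in the target yet.
INSERT INTO target_tbl (target_id, col1)
SELECT t1.target_id, t1.col1
FROM   target_tmp t1
WHERE  NOT EXISTS (SELECT 1 FROM target_tbl t WHERE t.target_id = t1.target_id);

COMMIT;  -- target_tmp is dropped here; a ROLLBACK would undo the whole import
```

Running the import as one transaction also makes it all-or-nothing: if any statement fails, the target table is left as it was before the file was loaded. Note that a server-side COPY reads the file on the database host; where that is not possible, psql's client-side \copy is the usual substitute.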
