I have an "upsert"-style question, but I want to put it out there because it is a bit different from any I have read on Stack Overflow.
Main problem.
I am working on migrating from MySQL to PostgreSQL 9.1.5 (hosted on Heroku). As part of this, I need to import several CSV files every day. Some of the data is sales information, which is almost guaranteed to be new and needs to be inserted. Other parts of the data, however, are almost guaranteed to be unchanged. For example, the CSV files (note the plural) contain POS (point of sale) information. This rarely changes (and most likely only through additions). Then there is product information. There are about 10,000 products; the vast majority will remain unchanged, but both additions and updates are possible.
The final point (and this is important) is that I am required to provide an audit trail for any given item. For example, if I add a new POS entry, I need to be able to trace it back to the file in which it was found. If I change a UPC code or a product description, I need to be able to trace it back to the import (and file) in which the change occurred.
The solution I am contemplating.
Since the data is provided to me as CSV, I am working on the assumption that COPY will be the best / fastest way to load it. The structure of the data in the files is not quite what I have in the database (i.e. the final destination). So I COPY the files into tables in an intermediate (staging) schema whose layout matches the CSV (note: one schema per data source). The tables in the staging schemas will have BEFORE INSERT row triggers. These triggers decide what to do with each row (insert, update, or ignore).
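To make the staging-table idea concrete, here is a minimal, runnable sketch. It uses SQLite through Python's standard `sqlite3` module purely as a stand-in so the example is self-contained. In PostgreSQL the same routing would normally be a PL/pgSQL trigger function that returns NULL to discard the staging row, with the rows loaded by COPY. All table and column names (`product`, `product_audit`, `staging_product`, `source_file`) and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Final destination (hypothetical layout).
CREATE TABLE product (
    upc         TEXT PRIMARY KEY,
    description TEXT NOT NULL,
    source_file TEXT NOT NULL       -- file that last inserted/changed the row
);
-- Audit trail: one row per insert or update, tied to the source file.
CREATE TABLE product_audit (
    upc TEXT, action TEXT, old_description TEXT,
    new_description TEXT, source_file TEXT
);
-- Staging table shaped like the CSV; it never actually stores rows.
CREATE TABLE staging_product (upc TEXT, description TEXT, source_file TEXT);

CREATE TRIGGER staging_product_route
BEFORE INSERT ON staging_product
BEGIN
    -- Audit + update: known UPC whose description changed.
    INSERT INTO product_audit
    SELECT upc, 'update', description, NEW.description, NEW.source_file
      FROM product
     WHERE upc = NEW.upc AND description <> NEW.description;
    UPDATE product
       SET description = NEW.description, source_file = NEW.source_file
     WHERE upc = NEW.upc AND description <> NEW.description;

    -- Audit + insert: UPC not seen before.
    INSERT INTO product_audit
    SELECT NEW.upc, 'insert', NULL, NEW.description, NEW.source_file
     WHERE NOT EXISTS (SELECT 1 FROM product WHERE upc = NEW.upc);
    INSERT INTO product (upc, description, source_file)
    SELECT NEW.upc, NEW.description, NEW.source_file
     WHERE NOT EXISTS (SELECT 1 FROM product WHERE upc = NEW.upc);

    -- Ignore: drop the staging row itself; unchanged rows end up here too.
    SELECT RAISE(IGNORE);
END;
""")


def load_rows(rows, source_file):
    """Push (upc, description) pairs through the staging trigger."""
    conn.executemany(
        "INSERT INTO staging_product VALUES (?, ?, ?)",
        [(upc, desc, source_file) for upc, desc in rows],
    )
    conn.commit()


# Two hypothetical daily imports: one unchanged row, one update, one addition.
load_rows([("0001", "Widget"), ("0002", "Gadget")], "day1.csv")
load_rows([("0001", "Widget"), ("0002", "Gadget v2"), ("0003", "Gizmo")], "day2.csv")
```

After both loads, `product` holds the current state of each item and `product_audit` records which file caused each insert or update. Unchanged rows leave no trace, which keeps the audit table small when most of the 10,000 products do not change between imports.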
, , . , NULL ( ). , , , . , , - . ( , , x y). , , . , - "" , .
, , - . , , wiki.postgresql.org. , hstore ( , - , "last_modified" )
90% , ... .. .
?
, : 3 10K, . , , python script ( - ), , , .
:
- . , , , .
- , , , , ( )
- , , SO (, " python" ), , SO , , , , .