I need to bulk-load a large amount of data (about 7,000,000 records) into a PostgreSQL database using libpqxx. I have read the PostgreSQL documentation on populating a database, but I'm not sure how to apply it in my case. First, I can't use files, so a COPY from a file on the database server is out of the question. Second, the database, including the table I am loading into, must remain usable while the import runs.
The scenario is as follows: at regular intervals (about once a month) we receive a file from another application that contains all of the data, including records we already have. Because of the number of records, checking each one for existence is simply not feasible, so after some preprocessing we just bulk-insert the data.
Currently I do this by creating a new table, inserting the data with libpqxx's tablewriter (outside a transaction), and then, in a single transaction, renaming the old table out of the way and the new table into its place.
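The swap step can be sketched as a short list of statements executed inside one transaction. This is a minimal sketch, assuming plain, unquoted table names. The helper `swap_statements` and the `_old` suffix are hypothetical, and real code should quote identifiers (for example with libpqxx's `quote_name`).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: statements that swap a freshly loaded staging table
// into the place of the live table. Meant to run inside ONE transaction so
// readers see either the old or the new table, never neither.
// Assumes both names are already safe, unquoted identifiers.
std::vector<std::string> swap_statements(const std::string& live,
                                         const std::string& staging) {
    assert(!live.empty() && !staging.empty() && live != staging);
    const std::string old_name = live + "_old";  // assumed naming scheme
    return {
        "ALTER TABLE " + live + " RENAME TO " + old_name,
        "ALTER TABLE " + staging + " RENAME TO " + live,
        "DROP TABLE " + old_name,
    };
}
```

Each string could be passed to `exec` on a `pqxx::work`, followed by a single `commit()`. One caveat: foreign keys in other tables that reference the live table are bound to the table itself, not its name, so after the rename they point at the old table and the final `DROP TABLE` would fail until they are re-created.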
We also need to do this not just for one table but for several tables with different layouts. So I tried to separate writing the table from parsing the data. Now I just need to separate out the table creation as well. For this I use:
create temporary table foo_temp (like foo including indexes including defaults including constraints);
This gives me a table with the same structure as foo, and I don't need to know its layout at the point where I write the data. However, it also creates all the indexes, and the documentation mentioned above says that indexes slow down bulk inserts. On the other hand, if I drop the indexes and constraints (or don't copy them in the first place), I need a way to recreate them exactly as they are defined on the original table.
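One way to keep the bulk insert index-free is to create the staging table without `INCLUDING INDEXES` and replay the original index definitions after the load. The sketch below assumes the definitions are read beforehand from the `pg_indexes` view (its `indexname` and `indexdef` columns) and that names are unquoted identifiers; `staging_ddl`, `retarget_index`, and the index-renaming scheme are hypothetical. It creates a regular table, since a temporary table only lives for the session and could not be kept by renaming it into place.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: DDL for an index-free staging copy of `live`.
// DEFAULTS and CHECK constraints are copied (NOT NULL is always copied),
// but no indexes, so the bulk insert does not maintain them row by row.
std::string staging_ddl(const std::string& live, const std::string& staging) {
    assert(live != staging);
    return "CREATE TABLE " + staging + " (LIKE " + live +
           " INCLUDING DEFAULTS INCLUDING CONSTRAINTS)";
}

// Replaces the first occurrence of `from` in `s`; returns false if absent.
static bool replace_first(std::string& s, const std::string& from,
                          const std::string& to) {
    const auto pos = s.find(from);
    if (pos == std::string::npos) return false;
    s.replace(pos, from.size(), to);
    return true;
}

// Retargets one definition as returned by pg_indexes.indexdef, e.g.
//   CREATE UNIQUE INDEX foo_pkey ON public.foo USING btree (id)
// to the staging table under a new name (index names must be unique within
// a schema, and the live table still owns the original names).
// Returns an empty string if the definition does not have the expected shape.
std::string retarget_index(std::string def,
                           const std::string& index, const std::string& new_index,
                           const std::string& table, const std::string& new_table) {
    if (!replace_first(def, "INDEX " + index + " ON ",
                       "INDEX " + new_index + " ON ")) return {};
    if (!replace_first(def, " ON " + table + " ",
                       " ON " + new_table + " ")) return {};
    return def;
}
```

For example, retargeting `CREATE UNIQUE INDEX foo_pkey ON public.foo USING btree (id)` from `public.foo` to `public.foo_new` as `foo_new_pkey` yields `CREATE UNIQUE INDEX foo_new_pkey ON public.foo_new USING btree (id)`, to be executed once the tablewriter has finished. An index that backs a `PRIMARY KEY` comes back as a plain unique index, so the constraint itself would still have to be attached, e.g. with `ALTER TABLE ... ADD CONSTRAINT ... PRIMARY KEY USING INDEX ...`.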
Any good tips on how to handle this efficiently?
EDIT:
Addendum: while experimenting with the database, I just noticed that the CREATE TABLE above does not copy any foreign key constraints, so I would also have to specify them manually. Or is there a way to handle them together with all the other constraints?
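Foreign keys could be handled the same way as indexes: read their definitions from the catalog and add them to the staging table after the load. A minimal sketch, assuming the rows come from a query such as `SELECT conname, pg_get_constraintdef(oid) FROM pg_constraint WHERE conrelid = 'public.foo'::regclass AND contype = 'f'`; `ForeignKey` and `foreign_key_statements` are hypothetical names, and identifiers are not quoted here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One row of the assumed catalog query above.
struct ForeignKey {
    std::string name;        // conname
    std::string definition;  // e.g. "FOREIGN KEY (bar_id) REFERENCES bar(id)"
};

// Hypothetical helper: ALTER TABLE statements that re-create the live
// table's foreign keys on the staging table. FK constraints do not create
// indexes, so reusing the original constraint names on another table is fine.
std::vector<std::string> foreign_key_statements(const std::string& staging,
                                                const std::vector<ForeignKey>& fks) {
    std::vector<std::string> out;
    out.reserve(fks.size());
    for (const auto& fk : fks) {
        assert(!fk.name.empty() && !fk.definition.empty());
        out.push_back("ALTER TABLE " + staging + " ADD CONSTRAINT " + fk.name +
                      " " + fk.definition);
    }
    return out;
}
```

Adding the keys only after the data is in place lets PostgreSQL check all rows in one pass rather than row by row, which is the approach the "Populating a Database" documentation suggests for foreign keys.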