I have several large XML files (about 5 GB each) that I am importing into a MongoDB database. I use Expat to parse the documents, do some data manipulation (deleting some fields, converting units, etc.), and then insert the results into the database. My script is based on this one: https://github.com/bgianfo/stackoverflow-mongodb/blob/master/so-import
My question is: is there a way to speed this up with batch inserts? Would it be a good idea to collect the documents in an array before inserting them? If so, how many documents should I accumulate per insert? Or would it be faster to write the JSON documents to a file and then load them with mongoimport?
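To make the batching idea concrete, here is a minimal sketch of what collecting documents in an array before insertion could look like. It is not the linked script. The function name `import_in_batches`, the `batch_size` default of 1000, and the assumption that each record is a `<row .../>` element whose attributes are the fields are all hypothetical choices for illustration. The database call is passed in as a callable so the sketch runs without a MongoDB server.

```python
import io
import xml.parsers.expat


def import_in_batches(xml_file, insert_many, batch_size=1000):
    """Parse xml_file with Expat and pass documents to insert_many in batches.

    insert_many is any callable that takes a list of dicts, for example
    pymongo's Collection.insert_many. Returns the number of documents sent.
    """
    batch = []
    total = 0

    def flush():
        nonlocal total
        if batch:
            insert_many(list(batch))  # pass a copy so clearing the buffer is safe
            total += len(batch)
            batch.clear()

    def start_element(name, attrs):
        # Assumption: one record per <row .../> element, fields stored as attributes.
        if name == "row":
            batch.append(dict(attrs))  # field deletion / unit conversion would go here
            if len(batch) >= batch_size:
                flush()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.ParseFile(xml_file)  # streams the input; open real files in binary mode
    flush()  # send the final, possibly partial, batch
    return total


# Small in-memory demo: a list stands in for the collection.
received = []
sample = io.BytesIO(b'<posts><row Id="1"/><row Id="2"/><row Id="3"/></posts>')
count = import_in_batches(sample, received.append, batch_size=2)
```

With pymongo, the callable could be something like `lambda docs: collection.insert_many(docs, ordered=False)`, where `collection` is an existing collection object. The best `batch_size` is not something this sketch settles; it would have to be measured against the real data.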
I would appreciate any suggestions.