I am currently tuning performance and memory usage in our Hibernate-based application for bulk imports (batch imports). We mainly import a CSV file with product data, where some products are new (insert) and some already exist (update).
Right now I am focusing on a strategy for deciding which entities need an UPDATE and which need an INSERT, without running a check (SELECT if exists) for every line in the CSV file.
My current approach is this:
- Build a hash map of all objects in the database.
- Iterate over the CSV and use the hash map to decide whether to update or insert.
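A minimal Java sketch of this full-map approach, assuming the natural key is a `String` SKU and that `allFromDb` is whatever the "load everything" query returns. `FullMapPartition` and `Product` are hypothetical names, not classes from the application:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of the "load everything first" strategy; Product is a hypothetical stand-in entity. */
final class FullMapPartition {

    // Hypothetical entity: only the natural key (sku) matters for the decision.
    record Product(String sku, String name) {}

    // The two outcomes of the partition step.
    record Result(List<Product> updates, List<String> inserts) {}

    static Result partition(Collection<Product> allFromDb, List<String> csvSkus) {
        // One pass over the database rows: sku -> persistent entity.
        Map<String, Product> bySku = new HashMap<>();
        for (Product p : allFromDb) {
            bySku.put(p.sku(), p);
        }
        List<Product> updates = new ArrayList<>();
        List<String> inserts = new ArrayList<>();
        // One pass over the CSV: an in-memory lookup per row instead of one SELECT per row.
        for (String sku : csvSkus) {
            Product existing = bySku.get(sku);
            if (existing != null) {
                updates.add(existing);
            } else {
                inserts.add(sku);
            }
        }
        return new Result(updates, inserts);
    }
}
```

The map holds every persistent entity at once, which is exactly the memory concern raised next. Mapping each SKU to its primary key instead of to the full entity would be a lighter variant, though that is not what the approach above does.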
This approach works well, and tests showed that it is faster than running one IF EXISTS check per row.
However, I am worried about memory consumption if there are many objects in the database.
So I am thinking about a smaller-scale variant of the approach above, and I would like to hear your views. Basically, I want to run the IF EXISTS checks in batches of several rows (e.g., SELECT FROM table WHERE sku IN (sku1, sku2, sku3)).
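The batched idea needs two small pieces: splitting the CSV SKUs into fixed-size chunks, and issuing one parameterized IN query per chunk. A Java sketch, where the table `product`, the column `sku`, and the class `SkuBatches` are hypothetical:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch: split the CSV SKUs into chunks and build one parameterized IN query per chunk. */
final class SkuBatches {

    // Splits keys into consecutive sublists of at most batchSize elements.
    static List<List<String>> chunk(List<String> keys, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<List<String>> batches = new ArrayList<>();
        for (int from = 0; from < keys.size(); from += batchSize) {
            int to = Math.min(from + batchSize, keys.size());
            // Copy so each batch is independent of the source list.
            batches.add(new ArrayList<>(keys.subList(from, to)));
        }
        return batches;
    }

    // Builds "SELECT sku FROM product WHERE sku IN (?, ?, ...)" with one placeholder per key.
    // Table and column names are hypothetical.
    static String inQuery(int placeholders) {
        return "SELECT sku FROM product WHERE sku IN ("
                + String.join(", ", Collections.nCopies(placeholders, "?"))
                + ")";
    }
}
```

Binding the values as parameters, rather than concatenating SKUs into the SQL, avoids quoting and injection problems. Keep the batch size moderate: some databases cap IN-list length (Oracle, for example, rejects more than 1000 expressions in a list). In Hibernate the whole chunk can usually be bound to one named parameter with `Query.setParameterList`.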
Here is some pseudocode:
1. Database contains: db {sku1, sku2, sku3, sku5}
2. File contains: file {sku1, sku2, sku3, sku6}
3. Expected result:
updates: {sku1, sku2, sku3}
inserts: {sku6}
4. Algorithm
Keep a map of the database entities that need updates:
updatemap {}
Now iterate over the file in batches of, e.g., 2 rows (for demo purposes):
1st iteration: foreach (select where sku IN (sku1, sku2) limit 2) as elem
-> updatemap.add(elem) -> elem is assumed to be a persistent entity here
-> myDAO.update(elem) -> executes Spring getHibernateTemplate().update() under the hood
-> updatemap contents after 1st loop {sku1, sku2}
2nd iteration: foreach (select where sku IN (sku3, sku6) limit 2) as elem
-> updatemap.add(elem)
-> myDAO.update(elem)
-> updatemap contents after 2nd loop {sku1, sku2, sku3}
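The detection part of this loop might look like the following Java sketch. `existingIn` is a hypothetical stand-in for the DAO query `select where sku IN (...)` and is assumed to return the subset of the batch found in the database. The per-entity `myDAO.update(elem)` call is left out so the sketch runs without Hibernate, and `BatchedExistenceCheck` is a hypothetical name:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Sketch of the batched IF EXISTS idea from the pseudocode. */
final class BatchedExistenceCheck {

    /**
     * Returns the SKUs that already exist, calling existingIn once per batch.
     * existingIn is a hypothetical stand-in for "SELECT ... WHERE sku IN (:batch)".
     */
    static Set<String> findUpdates(List<String> fileSkus, int batchSize,
                                   Function<List<String>, Set<String>> existingIn) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        Set<String> updateSkus = new LinkedHashSet<>();
        for (int from = 0; from < fileSkus.size(); from += batchSize) {
            List<String> batch = fileSkus.subList(from, Math.min(from + batchSize, fileSkus.size()));
            // One round trip per batch instead of one per CSV row.
            updateSkus.addAll(existingIn.apply(batch));
        }
        return updateSkus;
    }
}
```

With the example data (db {sku1, sku2, sku3, sku5}, file {sku1, sku2, sku3, sku6}, batch size 2) this issues two queries and yields {sku1, sku2, sku3}, matching the expected updates. Unlike the updatemap in the pseudocode, the sketch keeps only SKU strings across batches, so memory grows with the number of updates rather than with the size of the table.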
By the way: I already do things like if (i % 30 == 0) { session.flush(); session.clear(); }
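The periodic flush/clear pattern can be sketched as a small Java helper. It assumes `flushAndClear` wraps `session.flush(); session.clear();`, so the counting logic can be exercised without a Hibernate `Session`. `PeriodicFlush` is a hypothetical name:

```java
import java.util.List;
import java.util.function.Consumer;

/**
 * Sketch of the periodic flush/clear pattern. flushAndClear stands in for
 * session.flush(); session.clear(); so the batching logic runs without Hibernate.
 */
final class PeriodicFlush {

    // Applies work to every item, calls flushAndClear after each full batch and once
    // for a trailing partial batch. Returns how many times flushAndClear was invoked.
    static <T> int process(List<T> items, int batchSize, Consumer<T> work, Runnable flushAndClear) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        int flushes = 0;
        for (int i = 0; i < items.size(); i++) {
            work.accept(items.get(i));
            // Using (i + 1): with a 0-based i, "i % 30 == 0" would also fire after the first item.
            if ((i + 1) % batchSize == 0) {
                flushAndClear.run();
                flushes++;
            }
        }
        // Flush the last partial batch, if any.
        if (items.size() % batchSize != 0) {
            flushAndClear.run();
            flushes++;
        }
        return flushes;
    }
}
```

Clearing after each flush keeps memory flat, since otherwise every processed entity stays in the session's first-level cache. Hibernate's batch-processing documentation also suggests using the same value as `hibernate.jdbc.batch_size` (30 here) so each flush can go out as JDBC batches.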
Now we know all the updated items. All SKUs not in updatemap are inserts, and we can determine them with simple set arithmetic:
file {sku1, sku2, sku3, sku6} - updatemap {sku1, sku2, sku3} = newinserts {sku6}
Now we can go ahead and insert the remaining CSV lines.
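This set arithmetic maps directly onto `Set.removeAll`. A minimal Java sketch, with `InsertSet` as a hypothetical name:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Sketch of the final step: file SKUs minus update SKUs = SKUs to insert. */
final class InsertSet {

    static Set<String> newInserts(List<String> fileSkus, Set<String> updateSkus) {
        // LinkedHashSet keeps CSV order and collapses duplicate SKUs in the file.
        Set<String> inserts = new LinkedHashSet<>(fileSkus);
        inserts.removeAll(updateSkus);
        return inserts;
    }
}
```

Using a `LinkedHashSet` preserves the CSV order and also collapses a SKU that appears twice in the file, which would otherwise be inserted twice.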