The company we work with wants to send us a 1.2 GB CSV file every day, containing about 900,000 product listings. Only a small part of the file changes each day, maybe less than 0.5%, and these are really just products being added or removed, not modified. We need to display the product listings to our partners.
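Since so little changes from day to day, one idea is to apply only the delta instead of reloading everything. This is just a rough sketch of what I mean: it assumes every row has a unique ID column, which I'm calling `product_id` here (a made-up name, I don't know the real column names yet), and that yesterday's IDs can be pulled from the existing table.

```python
import csv
import io


def diff_feed(previous_ids, today_rows, key="product_id"):
    """Stream today's rows once; return (rows to add, IDs to remove).

    previous_ids: set of IDs already stored from yesterday's file.
    today_rows:   any iterable of dict rows, e.g. a csv.DictReader.
    key:          hypothetical name of the unique ID column.
    """
    seen = set()
    added = []
    for row in today_rows:
        pid = row[key]
        seen.add(pid)
        if pid not in previous_ids:
            added.append(row)
    # Anything we had yesterday that did not show up today was dropped.
    removed = previous_ids - seen
    return added, removed


# Tiny stand-in for the real file; the real one would be opened from disk.
sample = io.StringIO(
    "product_id,zip,radius_miles\n"
    "2,19104,100\n"
    "3,90001,500\n"
)
added, removed = diff_feed({"1", "2"}, csv.DictReader(sample))
```

Only the set of IDs is held in memory (about 900,000 strings), while the CSV itself is read row by row, so the full 1.2 GB never has to fit in RAM.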
What makes this more difficult is that our partners should only be able to see the product listings available within a 30-500 mile radius of their zip code. Each product listing row has a field giving the actual radius for that product (some are only 30, some are 500, some are 100, etc.; 500 is the maximum). A partner in a given zip code is likely to have only 20 results or so, which means there will be a ton of unused data. We don't know all of the partner zip codes ahead of time.
We need to consider performance, so I'm not sure what the best way to go about this is.
Should I have two databases, one with zip codes and latitude/longitude (using the Haversine formula to calculate distance) and the other the actual product database? And then what do I do? Return all zip codes within a given radius and look for matches in the product database? For a 500 mile radius, that is going to be a ton of zip codes. Or should I write a MySQL function?
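To make the distance idea concrete, here is a minimal sketch of the matching in plain Python rather than SQL. It assumes a zip-to-coordinates lookup and product rows with `zip` and `radius_miles` fields; those names and the sample coordinates (approximate) are made up for illustration. Instead of collecting every zip code within 500 miles first, it compares the partner's distance to each product against that product's own radius.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def products_visible_from(partner_zip, zip_coords, products):
    """Return products whose own radius reaches the partner's zip code."""
    plat, plon = zip_coords[partner_zip]
    visible = []
    for product in products:
        lat, lon = zip_coords[product["zip"]]
        if haversine_miles(plat, plon, lat, lon) <= product["radius_miles"]:
            visible.append(product)
    return visible


# Hypothetical sample data: approximate coordinates for three zip codes.
zip_coords = {
    "10001": (40.7506, -73.9972),   # New York
    "19104": (39.9597, -75.1968),   # Philadelphia
    "90001": (33.9731, -118.2479),  # Los Angeles
}
products = [
    {"id": "A", "zip": "19104", "radius_miles": 100},
    {"id": "B", "zip": "19104", "radius_miles": 30},
    {"id": "C", "zip": "90001", "radius_miles": 500},
]
visible = products_visible_from("10001", zip_coords, products)
```

This loops over every product, which is obviously too slow for 900,000 rows per request; in a real database I would expect to narrow the candidates first with an indexed latitude/longitude range and only run the Haversine check on what is left.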
We could use Amazon SimpleDB to store the database, but then I still have this problem with the zip codes. Could I make two "domains", as Amazon calls them, one for products and one for zip codes? I don't think you can run a query across multiple SimpleDB domains, though. At least, I don't see that in the documentation.
Our setup: PHP/MySQL, possibly SimpleDB. The server is a P4 with 2 GB. A VPS is another option for handling the 1.2 GB CSV.