I am trying to process a fairly large dataset (several GB), but my personal computer cannot do it in a reasonable amount of time, so I am wondering what my options are. I started with Python's csv.reader, but it was very slow even to extract 200,000 lines. I then moved the data into an SQLite database, which returned results faster and did not use much memory, but it was still far too slow.
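For illustration, the CSV-to-SQLite step described above could be sketched roughly as follows. This is a simplified sketch under assumptions, not the exact code used: the function name `load_csv_to_sqlite`, the table name `records`, and the batch size are hypothetical, and the CSV is assumed to have a header row and rows of equal length.

```python
import csv
import sqlite3
from itertools import islice


def load_csv_to_sqlite(csv_path, db_path, table="records", batch_size=50_000):
    """Stream a CSV file (with a header row) into a SQLite table in batches.

    Hypothetical helper: names and defaults are illustrative only.
    Returns the number of data rows inserted.
    """
    conn = sqlite3.connect(db_path)
    try:
        with open(csv_path, newline="") as f:
            reader = csv.reader(f)
            header = next(reader)  # first row holds the column names
            cols = ", ".join(f'"{name}"' for name in header)
            marks = ", ".join("?" for _ in header)
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
            total = 0
            while True:
                # Read only batch_size rows at a time to keep memory use flat.
                batch = list(islice(reader, batch_size))
                if not batch:
                    break
                with conn:  # one transaction per batch instead of per row
                    conn.executemany(
                        f'INSERT INTO "{table}" VALUES ({marks})', batch
                    )
                total += len(batch)
        return total
    finally:
        conn.close()
```

Inserting in batches inside a single transaction is usually much faster than committing row by row. Adding an index on whichever column the later queries filter on (with `CREATE INDEX`) may also help, depending on the queries.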
So, again: what options do I have for processing this data? I am interested in Amazon spot instances, which seem well suited to this kind of task, but there may be other solutions worth exploring.
Supposing spot instances are a good option, and given that I have never used them before, what can I expect from them? Does anyone have experience using them for this kind of thing? If so, what is your workflow? I expected to find several blog posts describing workflows for scientific computing, image processing, or similar tasks, but I did not find anything, so if you can explain this a bit or provide some links, I would appreciate it.
Thanks in advance.