How to speed up Amazon EMR loading?

I use Amazon EMR for some intensive calculations, but it takes about 7 minutes before the calculation actually starts. Is there a smart way to start the calculation right away? The calculation is a Python streaming job triggered from a user-facing website, so I cannot afford a long startup time.

Maybe I just missed an option in the ocean that is AWS. All I want is simplicity in running tasks (which is why I used EMR), scalability, and to pay only for what I use (and startup time is of no use to me).

+3
3 answers

I know this is an old question, but I have some insight to add for the next person who finds this thread through a search engine, hoping to speed up startup time on Amazon EMR.

For some time I wondered why my clusters took so long to come up, usually about 15 minutes. That is a fairly large chunk of time for a job that usually completes in less than 1 hour. Sometimes it pushes the job past the 1-hour mark, though I think that, fortunately, AWS does not charge for the startup period.

, . , . , , . , 14 , OnDemand, , . OnDemand 5 . , , , , , 15 .

, Spot Core Master , . OnDemand , , Spot.

+4

, . 100 + node , , 15 , . , , 15 , , . , .

+2

?

S3 (), , ( ), .

If this is the only reason, then of your 7 minutes of startup time, roughly 5 minutes are spent reading from S3, which would correspond to about 1 GB of input files on S3.
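To make that estimate concrete, here is a rough back-of-the-envelope sketch in Python. It assumes 1 GB = 1000 MB and a constant average read rate, and it ignores per-request overhead; the 200 MB input in the second call is a made-up figure for illustration only. The function names are hypothetical helpers, not part of any AWS library.

```python
def implied_throughput_mb_per_s(input_mb: float, read_seconds: float) -> float:
    """Average read rate implied by moving input_mb megabytes in read_seconds."""
    return input_mb / read_seconds


def estimated_read_seconds(input_mb: float, throughput_mb_per_s: float) -> float:
    """Time needed to read input_mb megabytes at a constant average rate."""
    return input_mb / throughput_mb_per_s


# ~1 GB read in ~5 minutes, as in the estimate above (assumes 1 GB = 1000 MB).
rate = implied_throughput_mb_per_s(1000, 5 * 60)
print(f"implied rate: {rate:.2f} MB/s")

# Hypothetical smaller input (200 MB) read at the same average rate.
print(f"200 MB would take about {estimated_read_seconds(200, rate):.0f} s")
```

If the real input is much smaller than about 1 GB, reading from S3 alone probably does not explain the delay, and the cluster provisioning itself is the more likely cause.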

+1