My understanding is that to calculate percentiles, the data needs to be sorted. Is this possible with a huge amount of data distributed across multiple servers, without moving it?
While MapReduce as a paradigm does not look suitable for this problem, Hadoop's implementation of it is. Hadoop's MapReduce is built on a distributed sort, and that is exactly what you need. Hadoop does the sort by moving data between servers only once, which is not so bad. I would suggest looking at the Hadoop TeraSort example, which illustrates a good (and probably the best) way to sort massive data with Hadoop. http://hadoop.apache.org/docs/current/api/org/apache/hadoop/examples/terasort/package-summary.html
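To make the idea concrete, here is a minimal single-process sketch of the TeraSort-style approach: sample the data to pick split points, send each value to its range partition once, sort each partition locally, then walk the partition sizes to find the percentile. This is not Hadoop code. The function names (`choose_split_points`, `shuffle_and_sort`, `percentile`) are made up for illustration, the "servers" are plain Python lists, and the nearest-rank definition of a percentile is an assumption.

```python
import bisect
import math
import random


def choose_split_points(shards, num_partitions, sample_size=100, seed=0):
    """Sample every shard and pick num_partitions - 1 split points."""
    rng = random.Random(seed)
    sample = []
    for shard in shards:
        sample.extend(rng.sample(shard, min(sample_size, len(shard))))
    sample.sort()
    if not sample:
        return []
    # Evenly spaced cut points in the sorted sample approximate
    # evenly sized range partitions of the full data set.
    return [sample[(i * len(sample)) // num_partitions]
            for i in range(1, num_partitions)]


def shuffle_and_sort(shards, split_points):
    """Move each value once to its range partition, then sort locally."""
    partitions = [[] for _ in range(len(split_points) + 1)]
    for shard in shards:
        for value in shard:
            # Every value in partition i is <= every value in partition i + 1.
            partitions[bisect.bisect_right(split_points, value)].append(value)
    for partition in partitions:
        partition.sort()
    return partitions


def percentile(partitions, q):
    """Nearest-rank percentile for 0 < q <= 100 over ordered partitions."""
    total = sum(len(p) for p in partitions)
    if total == 0:
        raise ValueError("no data")
    rank = max(1, math.ceil(q * total / 100))  # 1-based global rank
    for partition in partitions:
        # Only the partition sizes are needed to locate the right partition.
        if rank <= len(partition):
            return partition[rank - 1]
        rank -= len(partition)
    raise ValueError("q must be in (0, 100]")
```

Because the partitions are ordered by range, only the per-partition counts are needed to decide which partition holds the requested rank, so after the single shuffle the lookup touches one partition.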
, , . , , . , O (1) / O (log n) O (M) , M - O (N), N - .
, . Map-Reduce . Map-Reduce (, Hadoop) . , . ( , XML Hadoop... .)
Map-Reduce "Clydesdale". ( , , / .)