Is it possible to calculate the percentiles of a data set with MapReduce?

My understanding is that to calculate percentiles, the data needs to be sorted. Is this possible with a huge amount of data distributed across multiple servers, without moving it?

+5
3 answers

While MapReduce as a paradigm does not look well suited to this problem, Hadoop's implementation of MapReduce is.
Hadoop's implementation of MapReduce is built around distributed sorting, and that is exactly what you need. Hadoop sorts by moving the data between servers only once, which is not so bad.
I would suggest looking at the Hadoop TeraSort implementation, which illustrates a good (and probably the best) way to sort massive data with Hadoop. http://hadoop.apache.org/docs/current/api/org/apache/hadoop/examples/terasort/package-summary.html
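To make the sort-based idea concrete, here is a minimal single-process sketch in Python. It is not Hadoop code and not taken from TeraSort itself. It only imitates the general shape: sample the keys to pick split points, route every record once to the partition that owns its key range, then count partition sizes to find the one partition holding the requested rank. The function names (`choose_split_points`, `range_partition`, `percentile`) are made up for illustration, and the nearest-rank definition of a percentile with an integer `p` is an assumption.

```python
import bisect
import random


def choose_split_points(sample, num_partitions):
    """Pick num_partitions - 1 cut points from a sample of the keys."""
    s = sorted(sample)
    return [s[len(s) * i // num_partitions] for i in range(1, num_partitions)]


def range_partition(records, split_points):
    """Stand-in for the shuffle: each record moves once, to the
    partition that owns its key range."""
    partitions = [[] for _ in range(len(split_points) + 1)]
    for x in records:
        partitions[bisect.bisect_right(split_points, x)].append(x)
    return partitions


def percentile(partitions, p):
    """Nearest-rank percentile for an integer p in 1..100.

    Partitions are ordered by key range, so only the partition that
    contains the target rank has to be sorted and read.
    """
    total = sum(len(part) for part in partitions)
    if total == 0:
        raise ValueError("empty data set")
    rank = max(1, (p * total + 99) // 100)  # ceil(p * total / 100), 1-based
    seen = 0
    for part in partitions:
        if seen + len(part) >= rank:
            return sorted(part)[rank - seen - 1]
        seen += len(part)
    raise ValueError("p must be between 1 and 100")


if __name__ == "__main__":
    data = list(range(1, 1001))
    random.shuffle(data)
    splits = choose_split_points(random.sample(data, 100), 4)
    parts = range_partition(data, splits)
    print(percentile(parts, 90))  # 900
```

In a real job the partitions would live on different reducers, and only the per-partition counts would need to be gathered in one place to decide which partition contains the answer.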

+2

, , . , , . , O (1) / O (log n) O (M) , M - O (N), N - .

, , .

+2

, . Map-Reduce . Map-Reduce (, Hadoop) . , . ( , XML Hadoop... .)

Map-Reduce "Clydesdale". ( , , / .)

, .

0
