Is it possible to calculate the percentiles of a data set with MapReduce?

Question

Is it possible to calculate the percentiles of a data set with MapReduce?

My understanding is that to calculate percentiles, the data needs to be sorted. Is this possible with a huge amount of data distributed across multiple servers, without moving it?


java statistics mapreduce percentile

marathon Sep 16 '12 at 2:53


3 answers

David Gruzman · Answer 1 · 2012-09-16T06:20:07+0000

While MapReduce as a paradigm does not look like a good fit for this problem, Hadoop's implementation of MapReduce is.
Hadoop's implementation of MapReduce is built around a distributed sort, which is what you need. Hadoop does the sort by moving data between servers only once, which is not so bad.
I would suggest looking at the Hadoop TeraSort example, which illustrates a good (and probably the best) way to sort massive data with Hadoop. http://hadoop.apache.org/docs/current/api/org/apache/hadoop/examples/terasort/package-summary.html
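Once the data is globally sorted, finding a percentile comes down to locating a rank. The following is only a minimal sketch of that last step, not code from TeraSort or from the answer: it assumes the sort has produced partitions that are each sorted and ordered relative to one another (every value in partition i is less than or equal to every value in partition i+1), and it uses the nearest-rank definition of a percentile. The class name `PercentileFromSortedPartitions` and the method `percentile` are hypothetical, and in-memory arrays stand in for the sorted output files.

```java
import java.util.Arrays;
import java.util.List;

public class PercentileFromSortedPartitions {

    // Nearest-rank percentile over globally ordered, individually sorted partitions.
    // Assumption: partitions are listed in ascending key order, as a total-order
    // sort would produce them. Only the partition sizes are needed to find the
    // partition that holds the requested rank.
    static long percentile(List<long[]> sortedPartitions, double p) {
        long total = 0;
        for (long[] part : sortedPartitions) {
            total += part.length;
        }
        if (total == 0) {
            throw new IllegalArgumentException("no data");
        }
        // 1-based rank of the requested percentile, clamped so that p = 0 maps to rank 1.
        long rank = Math.max(1, (long) Math.ceil(p * total / 100.0));
        long seen = 0; // number of values in the partitions already skipped
        for (long[] part : sortedPartitions) {
            if (rank <= seen + part.length) {
                return part[(int) (rank - seen - 1)];
            }
            seen += part.length;
        }
        throw new IllegalStateException("rank outside data range");
    }

    public static void main(String[] args) {
        // Three hypothetical sorted output partitions.
        List<long[]> parts = Arrays.asList(
            new long[] {1, 2, 3},
            new long[] {4, 5, 6, 7},
            new long[] {8, 9, 10});
        System.out.println("median: " + percentile(parts, 50));
        System.out.println("90th percentile: " + percentile(parts, 90));
    }
}
```

On a real cluster the partition sizes would be collected first (for example from job counters), so that only the single partition containing the target rank has to be read.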

Peter Lawrey · Answer 2 · 2012-09-16T09:36:34+0000

, , . , , . , O (1) / O (log n) O (M) , M - O (N), N - .

, , .

asteri · Answer 3 · 2012-09-16T03:40:32+0000

, . Map-Reduce . Map-Reduce (, Hadoop) . , . ( , XML Hadoop... .)

Map-Reduce "Clydesdale". ( , , / .)

, .
