Hadoop MapReduce: default number of mappers

If I do not specify the number of mappers, how is the number determined? Is there a default setting read from a configuration file (e.g. mapred-site.xml)?

+3
2 answers

Adding to what Chris said in his answer:

  • The number of maps is usually driven by the number of DFS blocks in the input files. As a result, people often adjust the DFS block size in order to adjust the number of maps.

  • The right level of parallelism for maps seems to be around 10-100 maps per node, although this can go up to 300 or so for very CPU-light map tasks. Task setup takes a while, so it is best if each map runs for at least a minute.

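  • You can increase the number of map tasks with JobConf's conf.setNumMapTasks(int num). Note that this can raise the number of map tasks, but it will not set the number below what Hadoop determines by splitting the input data.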

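Finally, controlling the number of maps is subtle. The mapred.map.tasks parameter is only a hint to the InputFormat for the number of maps. The default InputFormat behavior is to split the total number of bytes into the right number of fragments. However, in the default case the DFS block size of the input files is treated as an upper bound for input splits. A lower bound on the split size can be set via mapred.min.split.size.

So, if you expect 10TB of input data and have 128MB DFS blocks, you will end up with 82k maps, unless your mapred.map.tasks is even larger. Ultimately, the InputFormat determines the number of maps.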
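The arithmetic behind the 82k figure can be sketched in a few lines of plain Java. This is only a rough illustration, not Hadoop's actual code: the class and method names (SplitSizeSketch, computeSplitSize, estimateMaps) are made up for this example, the split-size rule is the one described above (block size as the upper bound, minimum split size as the lower bound, mapred.map.tasks as a hint), and it treats the whole input as one large file, ignoring per-file boundaries and other details of a real InputFormat.

```java
// A simplified, standalone estimate of the number of map tasks.
// Hypothetical helper for illustration; it does not call any Hadoop API.
final class SplitSizeSketch {

    // Split size rule described above: the block size caps the split,
    // and the minimum split size is a floor.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Assumes all input is one big splittable file (a simplification).
    static long estimateMaps(long totalBytes, int hintedMaps,
                             long minSplitSize, long blockSize) {
        // The hint (mapred.map.tasks) only sets a target split size.
        long goalSize = totalBytes / Math.max(hintedMaps, 1);
        long splitSize = computeSplitSize(goalSize, Math.max(minSplitSize, 1L), blockSize);
        // Round up so a trailing partial split still gets a map.
        return (totalBytes + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long tb = mb * 1024L * 1024L;

        // 10TB of input, 128MB blocks, small hint: block size wins.
        System.out.println(estimateMaps(10 * tb, 1, 1, 128 * mb));

        // A hint larger than the block-based count shrinks the splits.
        System.out.println(estimateMaps(10 * tb, 200_000, 1, 128 * mb));

        // Raising the minimum split size to 256MB halves the count.
        System.out.println(estimateMaps(10 * tb, 1, 256 * mb, 128 * mb));
    }
}
```

With 10TB and 128MB blocks the first call gives 10 * 1024 * 1024 / 128 = 81,920 maps, which is the "82k" quoted above. A small hint cannot push the count below that; only a larger minimum split size can.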

Read more: http://wiki.apache.org/hadoop/HowManyMapsAndReduces

+6

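It depends on a number of factors:

  • For file-based input formats (TextInputFormat, SequenceFileInputFormat, etc.):
    • The number of input files / paths
    • Whether the files are splittable (compressed files typically are not; SequenceFiles are an exception)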

There are probably more factors, but hopefully you get the idea.

+5
