Too many fetch failures: Hadoop on the cluster (x2)

I use Hadoop last week or so (trying to deal with this), and although I managed to set up a multi-node cluster (2 machines: 1 laptop and a small desktop) and get the results, I always encounter “Too many extraction failures,” when I start work with a buy-in.

Example output (using a trivial example):

hadoop@ap200:/usr/local/hadoop$ bin/hadoop jar hadoop-examples-0.20.203.0.jar wordcount sita sita-output3X
11/05/20 15:02:05 INFO input.FileInputFormat: Total input paths to process : 7
11/05/20 15:02:05 INFO mapred.JobClient: Running job: job_201105201500_0001
11/05/20 15:02:06 INFO mapred.JobClient:  map 0% reduce 0%
11/05/20 15:02:23 INFO mapred.JobClient:  map 28% reduce 0%
11/05/20 15:02:26 INFO mapred.JobClient:  map 42% reduce 0%
11/05/20 15:02:29 INFO mapred.JobClient:  map 57% reduce 0%
11/05/20 15:02:32 INFO mapred.JobClient:  map 100% reduce 0%
11/05/20 15:02:41 INFO mapred.JobClient:  map 100% reduce 9%
11/05/20 15:02:49 INFO mapred.JobClient: Task Id :      attempt_201105201500_0001_m_000003_0, Status : FAILED
Too many fetch-failures
11/05/20 15:02:53 INFO mapred.JobClient:  map 85% reduce 9%
11/05/20 15:02:57 INFO mapred.JobClient:  map 100% reduce 9%
11/05/20 15:03:10 INFO mapred.JobClient: Task Id : attempt_201105201500_0001_m_000002_0, Status : FAILED
Too many fetch-failures
11/05/20 15:03:14 INFO mapred.JobClient:  map 85% reduce 9%
11/05/20 15:03:17 INFO mapred.JobClient:  map 100% reduce 9%
11/05/20 15:03:25 INFO mapred.JobClient: Task Id : attempt_201105201500_0001_m_000006_0, Status : FAILED
Too many fetch-failures
11/05/20 15:03:29 INFO mapred.JobClient:  map 85% reduce 9%
11/05/20 15:03:32 INFO mapred.JobClient:  map 100% reduce 9%
11/05/20 15:03:35 INFO mapred.JobClient:  map 100% reduce 28%
11/05/20 15:03:41 INFO mapred.JobClient:  map 100% reduce 100%
11/05/20 15:03:46 INFO mapred.JobClient: Job complete: job_201105201500_0001
11/05/20 15:03:46 INFO mapred.JobClient: Counters: 25
11/05/20 15:03:46 INFO mapred.JobClient:   Job Counters 
11/05/20 15:03:46 INFO mapred.JobClient:     Launched reduce tasks=1
11/05/20 15:03:46 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=72909
11/05/20 15:03:46 INFO mapred.JobClient:     Total time spent by all reduces waiting  after reserving slots (ms)=0
11/05/20 15:03:46 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
11/05/20 15:03:46 INFO mapred.JobClient:     Launched map tasks=10
11/05/20 15:03:46 INFO mapred.JobClient:     Data-local map tasks=10
11/05/20 15:03:46 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=76116
11/05/20 15:03:46 INFO mapred.JobClient:   File Output Format Counters 
11/05/20 15:03:46 INFO mapred.JobClient:     Bytes Written=1412473
11/05/20 15:03:46 INFO mapred.JobClient:   FileSystemCounters
11/05/20 15:03:46 INFO mapred.JobClient:     FILE_BYTES_READ=4462381
11/05/20 15:03:46 INFO mapred.JobClient:     HDFS_BYTES_READ=6950740
11/05/20 15:03:46 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=7546513
11/05/20 15:03:46 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1412473
11/05/20 15:03:46 INFO mapred.JobClient:   File Input Format Counters 
11/05/20 15:03:46 INFO mapred.JobClient:     Bytes Read=6949956
11/05/20 15:03:46 INFO mapred.JobClient:   Map-Reduce Framework
11/05/20 15:03:46 INFO mapred.JobClient:     Reduce input groups=128510
11/05/20 15:03:46 INFO mapred.JobClient:     Map output materialized bytes=2914947
11/05/20 15:03:46 INFO mapred.JobClient:     Combine output records=201001
11/05/20 15:03:46 INFO mapred.JobClient:     Map input records=137146
11/05/20 15:03:46 INFO mapred.JobClient:     Reduce shuffle bytes=2914947
11/05/20 15:03:46 INFO mapred.JobClient:     Reduce output records=128510
11/05/20 15:03:46 INFO mapred.JobClient:     Spilled Records=507835
11/05/20 15:03:46 INFO mapred.JobClient:     Map output bytes=11435785
11/05/20 15:03:46 INFO mapred.JobClient:     Combine input records=1174986
11/05/20 15:03:46 INFO mapred.JobClient:     Map output records=1174986
11/05/20 15:03:46 INFO mapred.JobClient:     SPLIT_RAW_BYTES=784
11/05/20 15:03:46 INFO mapred.JobClient:     Reduce input records=201001

I made a problem with Google, and people from apache seemed to suggest that it could be anything from a network problem (or something to do with / etc / hosts files) or it could be a damaged disk on the sub nodes.

Just add: I see 2 "live nodes" in the namenode admin panel (localhost: 50070 / dfshealth), and in Map / reduce Admin I also see 2 nodes.

, ? .

: 1:

: http://pastebin.com/XMkNBJTh datanode : http://pastebin.com/ttjR7AYZ

.

+3
3

datanode node/etc/hosts.

. - IP-, - , - :

  • :

    cat/proc/sys/kernel/hostname

    HOSTNAME. IP OK .

  • :

    hostname ***. ***. ***. ***

    Asterisk IP-.

  • :

    127.0.0.1 localhost.localdomain localhost :: 1 localhost6.localdomain6 localhost6 10.200.187.77 10.200.187.77 hadoop-datanode

IP- , , hosts.

+2

, ,

1. Ip 127.0.0.1 127.0.1.1

2. node /, adoop

  -->in Host file 172.21.3.67 master-ubuntu

  -->in master/slave file master-ubuntu

3. . NameSpaceId namenode = NameSpaceId Datanode

+1

I had the same problem: “Too many sampling failures” and very slow Hadoop performance (a simple wordcount example took more than 20 minutes to work on a cluster of powerful 2-node servers). I also received a WARN message mapred.JobClient: Error reading the outputConnection task failed.

The problem was fixed when I followed the instructions of Thomas Junglut: I deleted my node wizard from the slaves configuration file. After that, the errors disappeared and the wordcount example took only 1 minute.

0
source

All Articles