I get a NullPointerException when my MapReduce job starts. It is thrown by the SerializationFactory method getSerializer(). I am using custom InputSplit, InputFormat, RecordReader, and MapReduce value classes.
I know that the error occurs some time after my InputFormat class creates the splits, but before the RecordReader is created. As far as I can tell, it happens immediately after the "Cleaning up the staging area" message.
Checking the Hadoop source at the locations indicated by the stack trace, it looks like the error occurs when getSerialization() receives a null Class<T> pointer. JobClient's writeNewSplits() makes the call as follows:
Serializer<T> serializer = factory.getSerializer((Class<T>) split.getClass());
So I assume that when getClass() is called on my custom InputSplit objects, it returns null, but that is just baffling. Any ideas?
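One alternative I cannot rule out: getClass() should never return null on a live object, so the null may instead be the result of the lookup itself. If the lookup returns null whenever no registered serialization accepts the class, and the caller dereferences that result without a check, the exception would look exactly the same. Below is a stand-alone model of that possibility. It is not Hadoop's code; every name in it (SketchSerialization, SketchWritable, SketchFactory, PlainSplit, WritableSplit) is hypothetical and only mimics the shape of the calls in the stack trace.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a serialization plug-in: it decides which classes it can handle.
interface SketchSerialization {
    boolean accept(Class<?> c);
    String serializerNameFor(Class<?> c);
}

// Hypothetical marker interface playing the role of "a type the framework knows how to serialize".
interface SketchWritable {}

// The only registered plug-in in this model accepts SketchWritable types and nothing else.
class WritableOnlySerialization implements SketchSerialization {
    public boolean accept(Class<?> c) {
        return SketchWritable.class.isAssignableFrom(c);
    }
    public String serializerNameFor(Class<?> c) {
        return "serializer for " + c.getSimpleName();
    }
}

class SketchFactory {
    private final List<SketchSerialization> serializations = new ArrayList<>();

    SketchFactory() {
        serializations.add(new WritableOnlySerialization());
    }

    // Returns null when no registered plug-in accepts the class.
    SketchSerialization getSerialization(Class<?> c) {
        for (SketchSerialization s : serializations) {
            if (s.accept(c)) {
                return s;
            }
        }
        return null;
    }

    // Dereferences the lookup result without a null check (the assumed failure point).
    String getSerializer(Class<?> c) {
        return getSerialization(c).serializerNameFor(c);
    }
}

class PlainSplit {}                              // accepted by no plug-in
class WritableSplit implements SketchWritable {} // accepted by the plug-in

class SerializerLookupSketch {
    // True if asking the factory for a serializer for this object's class throws an NPE.
    static boolean throwsNpe(Object split) {
        try {
            new SketchFactory().getSerializer(split.getClass());
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsNpe(new PlainSplit()));    // prints "true"
        System.out.println(throwsNpe(new WritableSplit())); // prints "false"
    }
}
```

In this model getClass() is non-null in both cases; the only difference is whether any plug-in accepts the class. Whether the real SerializationFactory behaves this way is something I still need to confirm against the source.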
The full stack trace from the error:
06/12/24 14:26:49 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:54310/tmp/hadoop-s3cur3/mapred/staging/s3cur3/.staging/job_201206240915_0035
Exception in thread "main" java.lang.NullPointerException
at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer (SerializationFactory.java:73)
at org.apache.hadoop.mapreduce.split.JobSplitWriter.writeNewSplits (JobSplitWriter.java:123)
at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles (JobSplitWriter.java:74)
at org.apache.hadoop.mapred.JobClient.writeNewSplits (JobClient.java:968)
at org.apache.hadoop.mapred.JobClient.writeSplits (JobClient.java:979)
at org.apache.hadoop.mapred.JobClient.access$600 (JobClient.java:174)
at org.apache.hadoop.mapred.JobClient$2.run (JobClient.java:897)
at org.apache.hadoop.mapred.JobClient$2.run (JobClient.java:850)
at java.security.AccessController.doPrivileged (Native Method)
at javax.security.auth.Subject.doAs (Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs (UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.JobClient.submitJobInternal (JobClient.java:850)
at org.apache.hadoop.mapreduce.Job.submit (Job.java:500)
at org.apache.hadoop.mapreduce.Job.waitForCompletion (Job.java:530)
at edu.cs.illinois.cogcomp.hadoopinterface.infrastructure.CuratorJob.start (CuratorJob.java:94)
at edu.cs.illinois.cogcomp.hadoopinterface.HadoopInterface.main (HadoopInterface.java:58)
at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke (Method.java:597)
at org.apache.hadoop.util.RunJar.main (RunJar.java:156)
Thanks!
EDIT: The code for my custom InputSplit follows:
import . . .

public class DirectorySplit extends InputSplit {
    public DirectorySplit( Path docDirectoryInHDFS, FileSystem fs )
            throws IOException {
        this.inputPath = docDirectoryInHDFS;
        hash = FileSystemHandler.getFileNameFromPath(inputPath);
        this.fs = fs;
    }

    @Override
    public long getLength() throws IOException, InterruptedException {
        Path origTxt = new Path( inputPath, "original.txt" );
        // NOTE: msg is not defined anywhere in this excerpt
        HadoopInterface.logger.log( msg );
        return FileSystemHandler.getFileSizeInBytes( origTxt, fs);
    }

    @Override
    public String[] getLocations() throws IOException, InterruptedException {
        FileStatus status = fs.getFileStatus(inputPath);
        BlockLocation[] blockLocs = fs.getFileBlockLocations( status, 0,
                status.getLen() );
        // Collect the distinct hosts that hold any block of this input
        HashSet<String> allBlockHosts = new HashSet<String>();
        for( BlockLocation blockLoc : blockLocs ) {
            allBlockHosts.addAll( Arrays.asList( blockLoc.getHosts() ) );
        }
        // toArray() with no argument returns Object[], and casting that to
        // String[] throws ClassCastException, so pass a typed array instead
        return allBlockHosts.toArray( new String[0] );
    }

    @Override
    public String toString() {
        return hash;
    }

    private Path inputPath;
    private String hash;
    private FileSystem fs;
}
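Looking at the class again, I notice it has no no-argument constructor and nothing that writes its fields to a stream or reads them back. If the framework has to serialize each split before handing it to a task (an assumption on my part), that may be the missing piece. Hadoop's Writable interface declares write(DataOutput) and readFields(DataInput), so here is a stand-alone sketch of that shape. RoundTripWritable, PathOnlySplit, and the example path are hypothetical; the path is stored as a plain String and the FileSystem handle is deliberately left out, since it could not be written to a stream anyway.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical local interface shaped like Hadoop's Writable (same two method signatures).
interface RoundTripWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical split that carries only the state that can be written to a stream.
class PathOnlySplit implements RoundTripWritable {
    private String inputPath = "";
    private String hash = "";

    // No-argument constructor so an empty instance can be created and then filled by readFields().
    PathOnlySplit() {}

    PathOnlySplit(String inputPath, String hash) {
        this.inputPath = inputPath;
        this.hash = hash;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(inputPath);
        out.writeUTF(hash);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Fields must be read in the same order they were written.
        inputPath = in.readUTF();
        hash = in.readUTF();
    }

    String getInputPath() {
        return inputPath;
    }

    @Override
    public String toString() {
        return hash;
    }
}

class SplitRoundTripSketch {
    // Serializes a split to bytes and rebuilds a fresh copy from those bytes.
    static PathOnlySplit roundTrip(PathOnlySplit original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));
            PathOnlySplit copy = new PathOnlySplit();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "/docs/abc123" is a made-up example path.
        PathOnlySplit copy = roundTrip(new PathOnlySplit("/docs/abc123", "abc123"));
        System.out.println(copy.getInputPath() + " " + copy);
    }
}
```

If this is the right direction, DirectorySplit would need the same three things: a no-argument constructor, a write method, and a readFields method, with the FileSystem looked up again on the receiving side rather than stored in the split.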