NullPointerException from Hadoop JobSplitWriter / SerializationFactory when calling InputSplit getClass()

I get a NullPointerException when launching a MapReduce job. It is thrown by the SerializationFactory getSerializer() method. I am using custom InputSplit, InputFormat, RecordReader, and MapReduce value classes.

I know that the error occurs some time after my InputFormat class creates the splits, but before the RecordReader is created. As far as I can tell, it happens immediately after the “Cleaning up the staging area” message.

Looking at the Hadoop source at the locations given in the stack trace, the error appears to occur when getSerialization() receives a null Class<T>. JobClient's writeNewSplits() calls that method like this:

Serializer<T> serializer = factory.getSerializer((Class<T>) split.getClass());

So I assume that when getClass() is called on my custom InputSplit objects, it returns null, but that is just baffling. Any ideas?

Full stack trace from error:

06/12/24 14:26:49 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:54310/tmp/hadoop-s3cur3/mapred/staging/s3cur3/.staging/job_201206240915_0035
Exception in thread "main" java.lang.NullPointerException
    at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
    at org.apache.hadoop.mapreduce.split.JobSplitWriter.writeNewSplits(JobSplitWriter.java:123)
    at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:74)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:968)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
    at edu.cs.illinois.cogcomp.hadoopinterface.infrastructure.CuratorJob.start(CuratorJob.java:94)
    at edu.cs.illinois.cogcomp.hadoopinterface.HadoopInterface.main(HadoopInterface.java:58)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Thanks!

EDIT: My code for custom InputSplit follows:

import . . .

/**
 * A document directory within the input directory. 
 * Returned by DirectoryInputFormat.getSplits()
 * and passed to DirectoryInputFormat.createRecordReader().
 *
 * Represents the data to be processed by an individual Map process.
 */
public class DirectorySplit extends InputSplit {
    /**
     * Constructs a DirectorySplit object
     * @param docDirectoryInHDFS The location (in HDFS) of this
     *            document directory, complete with all annotations.
     * @param fs The filesystem associated with this job
     */
    public  DirectorySplit( Path docDirectoryInHDFS, FileSystem fs )
            throws IOException {
        this.inputPath = docDirectoryInHDFS;
        hash = FileSystemHandler.getFileNameFromPath(inputPath);
        this.fs = fs;
    }

    /**
     * Get the size of the split so that the input splits can be sorted by size.
     * Here, we calculate the size to be the number of bytes in the original
     * document (i.e., ignoring all annotations).
     *
     * @return The number of characters in the original document
     */
    @Override
    public long getLength() throws IOException, InterruptedException {
        Path origTxt = new Path( inputPath, "original.txt" );
        HadoopInterface.logger.log( msg );
        return FileSystemHandler.getFileSizeInBytes( origTxt, fs);
    }

    /**
     * Get the list of nodes where the data for this split would be local.
     * This list includes all nodes that contain any of the required data---it's
     * up to Hadoop to decide which one to use.
     *
     * @return An array of the nodes for whom the split is local
     * @throws IOException
     * @throws InterruptedException
     */
    @Override
    public String[] getLocations() throws IOException, InterruptedException {
        FileStatus status = fs.getFileStatus(inputPath);

        BlockLocation[] blockLocs = fs.getFileBlockLocations( status, 0,
                                                              status.getLen() );

        HashSet<String> allBlockHosts = new HashSet<String>();
        for( BlockLocation blockLoc : blockLocs ) {
            allBlockHosts.addAll( Arrays.asList( blockLoc.getHosts() ) );
        }

        return allBlockHosts.toArray( new String[allBlockHosts.size()] );
    }

    /**
     * @return The hash of the document that this split handles
     */
    public String toString() {
        return hash;
    }

    private Path inputPath;
    private String hash;
    private FileSystem fs;
}
1 answer

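InputSplit does not implement Writable, so you need to explicitly declare that your input split implements Writable. Without it, SerializationFactory finds no serialization that accepts your split class, getSerialization() returns null, and the getSerializer() call on that null result throws the NullPointerException. So getClass() is not what returns null.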
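
As a rough illustration of what implementing Writable involves, here is a minimal sketch. It is not the asker's actual class: WritableDirectorySplit is a hypothetical, reduced stand-in for DirectorySplit, the path is held as a String rather than a Path, and the local Writable interface only mirrors the two methods of org.apache.hadoop.io.Writable so that the example compiles without Hadoop on the classpath. In a real job the declaration would presumably read "public class DirectorySplit extends InputSplit implements Writable", using Hadoop's own interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Stand-in with the same two methods as org.apache.hadoop.io.Writable.
// Assumption for this sketch only: in a real job, implement Hadoop's interface.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical, reduced version of the question's DirectorySplit.
class WritableDirectorySplit implements Writable {
    private String inputPath; // the original class stores a Path
    private String hash;

    // No-argument constructor: the split is created empty on the receiving
    // side and then populated through readFields().
    public WritableDirectorySplit() {}

    public WritableDirectorySplit(String inputPath, String hash) {
        this.inputPath = inputPath;
        this.hash = hash;
    }

    // Write every field the receiving side needs, in a fixed order.
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(inputPath);
        out.writeUTF(hash);
    }

    // Read the fields back in exactly the order write() emitted them.
    @Override
    public void readFields(DataInput in) throws IOException {
        inputPath = in.readUTF();
        hash = in.readUTF();
    }

    public String getInputPath() {
        return inputPath;
    }

    @Override
    public String toString() {
        return hash;
    }

    // Serializes a split to bytes and rebuilds it, imitating the hand-off
    // between job submission and task start.
    static WritableDirectorySplit roundTrip(WritableDirectorySplit split) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            split.write(new DataOutputStream(bytes));
            WritableDirectorySplit copy = new WritableDirectorySplit();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that the FileSystem field from the question is deliberately left out of write() and readFields(), since it is not something that can be written as plain data. It would likely have to be obtained again on the task side, for example from the job configuration, which is an assumption about the surrounding code rather than something shown in the question.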
