Failed to load OpenNLP sentence model in Hadoop map-reduce job

I am trying to integrate OpenNLP into a map-reduce job on Hadoop, starting with some basic sentence splitting. Inside the map function, the following code is executed:

public AnalysisFile analyze(String content) {
    InputStream modelIn = null;
    String[] sentences = null;

    // references an absolute path to en-sent.bin
    logger.info("sentenceModelPath: " + sentenceModelPath);

    try {
        modelIn = getClass().getResourceAsStream(sentenceModelPath);
        SentenceModel model = new SentenceModel(modelIn);
        SentenceDetectorME sentenceBreaker = new SentenceDetectorME(model);
        sentences = sentenceBreaker.sentDetect(content);
    } catch (FileNotFoundException e) {
        logger.error("Unable to locate sentence model.");
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (modelIn != null) {
            try {
                modelIn.close();
            } catch (IOException e) {
            }
        }
    }

    logger.info("number of sentences: " + sentences.length);

    <snip>
}

When I run my job, I get an error in the log saying "in must not be null!" (source of class throwing error), which means that somehow I cannot open an InputStream to the model. Other tidbits:

  • I have verified that the model file exists at the location sentenceModelPath refers to.
  • I have added Maven dependencies for opennlp-maxent:3.0.2-incubating, opennlp-tools:1.5.2-incubating and opennlp-uima:1.5.2-incubating.
  • Hadoop is running only on my local machine.

I am not very familiar with OpenNLP. Has anyone integrated OpenNLP with Hadoop and can tell me how to load an OpenNLP model there?


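Your problem is the getClass().getResourceAsStream(sentenceModelPath) line. It tries to load the file from the classpath, but neither a file in HDFS nor a file on the local file system is part of the mapper/reducer classpath at run time, which is why you see the null error (getResourceAsStream() returns null if the resource cannot be found).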
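To make that failure mode visible early, here is a minimal sketch using only the JDK. The class name ClasspathResources and the helper open are hypothetical names chosen for illustration, not part of OpenNLP or Hadoop. The helper turns the silent null into an exception that names the missing resource:

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration only; not part of OpenNLP or Hadoop.
final class ClasspathResources {

    private ClasspathResources() {
    }

    // Opens a classpath resource relative to the given class.
    // getResourceAsStream() signals "not found" by returning null,
    // so the null is converted into an exception with a clear message.
    static InputStream open(Class<?> owner, String resource) throws IOException {
        InputStream in = owner.getResourceAsStream(resource);
        if (in == null) {
            throw new FileNotFoundException(
                    "Resource not found on classpath: " + resource);
        }
        return in;
    }
}
```

With something like this in place, passing a file system path as the resource name should fail with a message that names the path, instead of a null reaching the SentenceModel constructor.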

To get around this, you have several options:

  • Amend your code to load the file from HDFS:

    modelIn = FileSystem.get(context.getConfiguration()).open(
                     new Path("/sandbox/corpus-analysis/nlp/en-sent.bin"));
    
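  • Amend your code to load the file from the local working directory, and pass it with the -files GenericOptionsParser option (which copies the file to HDFS and back down to the local directory of the running mapper/reducer):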

    modelIn = new FileInputStream("en-sent.bin");
    
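  • Package the file into the job jar (in the root directory of the jar), and amend your code to include a leading slash:

    modelIn = getClass().getResourceAsStream("/en-sent.bin");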
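For the -files variant, the model is read as an ordinary local file, so a missing file can be reported up front. This is a rough sketch using only the JDK. LocalModelFile and open are hypothetical names, and it assumes the file has been placed in the directory you pass in (the task's working directory in the -files case):

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not part of OpenNLP or Hadoop.
final class LocalModelFile {

    private LocalModelFile() {
    }

    // Opens fileName inside dir, failing with the absolute path in the
    // message if the file is not there. The caller closes the stream.
    static InputStream open(Path dir, String fileName) throws IOException {
        Path file = dir.resolve(fileName);
        if (!Files.isRegularFile(file)) {
            throw new FileNotFoundException(
                    "Model file not found: " + file.toAbsolutePath());
        }
        return Files.newInputStream(file);
    }
}
```

In the mapper you would call it as LocalModelFile.open(Paths.get("."), "en-sent.bin") inside a try-with-resources statement and hand the stream to new SentenceModel(...), which also replaces the manual close() in the finally block.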
    
