Hadoop streamed in python using mongo-hadoop

I am trying to get map size reduction function using python using mongo-hadoop. Hadoop works, the streaming streaming application works with python and the mongo-hadoop adapter works. However, mongo-hadoop streaming playback examples with python do not work. When I try to run the example in streaming / examples / treasury, I get the following error:

$ user @host: ~ / git / mongo-hadoop / streaming $ hasoop jar target / mongo-hadoop-streaming-assembly-1.0.1.jar -mapper examples / treasury / mapper.py -reducer examples /treasury/reducer.py -inputformat com.mongodb.hadoop.mapred.MongoInputFormat -outputformat com.mongodb.hadoop.mapred.MongoOutputFormat -inputURI mongodb: //127.0.0.1/mongo_hadoop.yield_historical.in -outputURI mongodb: 0,1 / mongo_hadoop.yield_historical.streaming.out

13/04/09 11:54:34 INFO streaming.MongoStreamJob: Running

13/04/09 11:54:34 INFO streaming.MongoStreamJob: Init

13/04/09 11:54:34 INFO streaming.MongoStreamJob: Process Args

13/04/09 11:54:34 INFO streaming.StreamJobPatch: Setup Options'

13/04/09 11:54:34 INFO streaming.StreamJobPatch: PreProcess Args

13/04/09 11:54:34 INFO streaming.StreamJobPatch: Parse Options

13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: '-mapper'

13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: 'examples/treasury/mapper.py'

13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: '-reducer'

13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: 'examples/treasury/reducer.py'

13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: '-inputformat'

13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: 'com.mongodb.hadoop.mapred.MongoInputFormat'

13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: '-outputformat'

13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: 'com.mongodb.hadoop.mapred.MongoOutputFormat'

13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: '-inputURI'

13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: 'mongodb://127.0.0.1/mongo_hadoop.yield_historical.in'

13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: '-outputURI'

13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: 'mongodb://127.0.0.1/mongo_hadoop.yield_historical.streaming.out'

13/04/09 11:54:34 INFO streaming.StreamJobPatch: Add InputSpecs

13/04/09 11:54:34 INFO streaming.StreamJobPatch: Setup output_

13/04/09 11:54:34 INFO streaming.StreamJobPatch: Post Process Args

13/04/09 11:54:34 INFO streaming.MongoStreamJob: Args processed.

13/04/09 11:54:36 INFO io.MongoIdentifierResolver: Resolving: bson

13/04/09 11:54:36 INFO io.MongoIdentifierResolver: Resolving: bson

13/04/09 11:54:36 INFO io.MongoIdentifierResolver: Resolving: bson

13/04/09 11:54:36 INFO io.MongoIdentifierResolver: Resolving: bson

**Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/filecache/DistributedCache**
    at org.apache.hadoop.streaming.StreamJob.setJobConf(StreamJob.java:959)
    at com.mongodb.hadoop.streaming.MongoStreamJob.run(MongoStreamJob.java:36)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at com.mongodb.hadoop.streaming.MongoStreamJob.main(MongoStreamJob.java:63)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.filecache.DistributedCache
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    ... 10 more

If someone can shed light, this will be a big help.

Full information:

As far as I can tell, I had to do the following four things:

  • Hadoop installation and testing
  • Install and test data streams using python
  • Installing and testing mongo-hadoop
  • Installing and testing mongo-hadoop streaming using python

, , . (https://github.com/danielpoe/cloudera) cloudera 4

  • - cloudera 4
  • michael nolls, python
  • mongodb.org , ufo- ( cdh4 build.sbt)
  • twitter 1,5 , readme twitter /, .
+5
1

pymongo_hadoop? ?

0

All Articles