Distributing custom configuration values in Hadoop

Is there a way to set, and later get, a custom configuration object in Hadoop during a Map/Reduce job?

For example, consider an application that preprocesses a large file and dynamically determines some characteristics of that file. Suppose further that these characteristics are stored in a custom Java object (for example a Properties object, but not necessarily, since some values may not be strings) and are subsequently needed by each of the map and reduce tasks.

How can the application "propagate" this configuration so that each map and reduce function can access it when necessary?

One approach could be to use the set(String, String) method of the JobConf class and, for example, pass the configuration object serialized as a JSON string through the second parameter. However, this may be too much of a hack, and the corresponding JobConf instance would still have to be accessible to every Mapper and Reducer (for example, by following an approach like the one suggested in an earlier question).

1 answer

Unless I am missing something, if you have a Properties object containing all the properties you need in your M/R job, you just need to write the contents of the Properties object to the Hadoop Configuration object. For example, something like this:

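import java.util.Map.Entry;
import java.util.Properties;
import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
Properties params = getParameters(); // do whatever you need here to create your object
// Copy each property into the job configuration; the casts assume String keys and values.
for (Entry<Object, Object> entry : params.entrySet()) {
    String propName = (String) entry.getKey();
    String propValue = (String) entry.getValue();
    conf.set(propName, propValue);
}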

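Then, inside your M/R job, you can use the Context object to get the Configuration back, both in the mapper (the map function) and in the reducer (the reduce function), like this: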

public void map(MD5Hash key, OverlapDataWritable value, Context context) {
    // The job Configuration is reachable from the Context passed to map().
    Configuration conf = context.getConfiguration();
    String someProperty = conf.get("something");
    // ...
}

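Note that the Configuration can also be obtained from the Context in the setup and cleanup methods, which is useful if you need to do some initialization.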

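It is also worth mentioning that you could probably call the addResource method of the Configuration object to add your properties directly as an InputStream or a file, but I believe the content has to be XML, like the regular Hadoop XML configs, so that may be overkill.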
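
If you do want to try the XML route, the following is a minimal sketch of turning a Properties object into the configuration/property layout used by the regular Hadoop XML configs. It uses only the JDK; HadoopXmlWriter and its methods are hypothetical names made up for this illustration, not part of Hadoop, and I have not checked the output against every Hadoop version.

```java
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical helper, not part of Hadoop: renders a Properties object in the
// <configuration>/<property> layout used by Hadoop's XML config files.
final class HadoopXmlWriter {
    private HadoopXmlWriter() {}

    // Escape the characters that are special inside XML text content.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String toXml(Properties props) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\"?>\n<configuration>\n");
        // stringPropertyNames() skips entries whose key or value is not a String;
        // the TreeSet only makes the output order deterministic.
        for (String name : new TreeSet<>(props.stringPropertyNames())) {
            xml.append("  <property>\n")
               .append("    <name>").append(escape(name)).append("</name>\n")
               .append("    <value>").append(escape(props.getProperty(name))).append("</value>\n")
               .append("  </property>\n");
        }
        xml.append("</configuration>\n");
        return xml.toString();
    }
}
```

The resulting string could then be wrapped in a ByteArrayInputStream and handed to conf.addResource(InputStream). Compared with calling conf.set in a loop this is more moving parts for the same result, which is why it is probably overkill.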

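EDIT. For objects that are not Strings, I would suggest serialization: you can serialize your objects and convert them to strings (probably encoding them, for example with Base64, since I am not sure what would happen with unusual characters), and then, on the mapper/reducer side, deserialize the objects from the strings you read from the properties in the Configuration.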
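
As a rough sketch of that Base64 idea, assuming the object implements java.io.Serializable and Java 8 or later is available (for java.util.Base64), something like the following would do the round trip. ConfigCodec is a hypothetical helper name, not a Hadoop class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

// Hypothetical helper, not part of Hadoop: turns a Serializable object into a
// Base64 string and back, so it can travel as an ordinary String value.
final class ConfigCodec {
    private ConfigCodec() {}

    // Java-serialize the object, then Base64-encode the resulting bytes.
    static String encode(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Base64-decode the string, then deserialize the original object.
    static Object decode(String encoded) throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        }
    }
}
```

On the driver side this would look like conf.set("my.stats", ConfigCodec.encode(stats)), and in the mapper or reducer like ConfigCodec.decode(conf.get("my.stats")) followed by a cast; the key "my.stats" is made up for the example. The class of the serialized object has to be on the task classpath for decoding to work.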

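Another approach would be to use the same serialization technique but write the result to HDFS instead, and then add those files to the DistributedCache. It sounds like a bit of overkill, but it would probably work.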
