I need to write a program that fetches HDFS blocks of a given file from specific DataNodes. This is not done inside a MapReduce job, but programmatically, to collect some statistics about my cluster.
So far, given the file, I have managed to get the block locations associated with it. Each block has several replicas, and I need to read the block from one particular replica. I understand that most of these will be remote reads.
I have done the following:
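import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path inFile = new Path(args[0]);
FileStatus status = fs.getFileStatus(inFile);
BlockLocation[] locs = fs.getFileBlockLocations(status, 0, status.getLen()); // one entry per block of the file
String[] hosts = locs[0].getHosts(); // hosts holding a replica of the first block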
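To make the goal concrete, here is a minimal sketch of the selection step I have in mind, written without any Hadoop dependency so it runs on its own. BlockInfo and ReplicaBlockSelector are hypothetical names, not Hadoop classes; BlockInfo only mirrors the values I can already read from BlockLocation via getOffset(), getLength() and getHosts(), and the host names are made up. It only picks which byte ranges have a replica on a given host; it does not perform the read itself, which is the part I am missing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the values read from BlockLocation
// (offset, length, hosts). This is not a Hadoop class.
final class BlockInfo {
    final long offset;
    final long length;
    final String[] hosts;

    BlockInfo(long offset, long length, String... hosts) {
        this.offset = offset;
        this.length = length;
        this.hosts = hosts;
    }
}

final class ReplicaBlockSelector {
    // Returns, in file order, the blocks that have a replica on targetHost.
    static List<BlockInfo> blocksOnHost(List<BlockInfo> blocks, String targetHost) {
        List<BlockInfo> selected = new ArrayList<>();
        for (BlockInfo block : blocks) {
            if (Arrays.asList(block.hosts).contains(targetHost)) {
                selected.add(block);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Made-up layout: two full 128 MB blocks and a shorter last block.
        List<BlockInfo> blocks = Arrays.asList(
                new BlockInfo(0L, 128L * mb, "node1", "node2", "node3"),
                new BlockInfo(128L * mb, 128L * mb, "node2", "node4", "node5"),
                new BlockInfo(256L * mb, 40L * mb, "node1", "node3", "node5"));

        // Print the byte ranges that could be served by "node2".
        for (BlockInfo block : blocksOnHost(blocks, "node2")) {
            System.out.println("offset=" + block.offset + " length=" + block.length);
        }
    }
}
```

In the real program the list would be built from the locs array above, one entry per BlockLocation.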
Can someone tell me how I can read the block described by locs[0] from hosts[0] specifically?