HDFS client program using private datanode IP address running on Amazon EC2

Question

All,

I am following the guides below to set up a Hadoop single-node cluster on Amazon EC2. Both the namenode and the datanode run on the same Amazon EC2 instance.

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
http://www.slideshare.net/benjaminwootton/configuring-your-first-hadoop-cluster-on-ec2

I have an HDFS client program in C++, and it does not run on the EC2 instance.

When my client program tries to write data to the cluster, I get the following exception:

2014-02-13 14:37:01,027 INFO hdfs.DFSClient(DFSOutputStream.java:createBlockOutputStream(1175)) - Exception in createBlockOutputStream
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.168.15.63:50010]
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:532)
at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:13

The file is created on HDFS but is empty. The exception seems to indicate that the client cannot create a data block on the datanode because it is connecting to the datanode's private IP address (10.168.15.63:50010) instead of its public one (ec2-54-xxx-xxx-233.us-xxx-1.compute.amazonaws.com or 54.xxx.xxx.233). I don't have a fixed IP address (i.e. an Elastic IP address).
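To check this reading of the exception from the client machine, something like the following rough sketch might help. It pulls the `remote=/host:port` part out of the log line, reports whether that address is in a private range, and can try a plain TCP connection to it. The helper names (`parse_remote`, `is_private`, `can_connect`) are made up for illustration and are not part of Hadoop; only the Python standard library is used.

```python
import ipaddress
import re
import socket

# Matches the "remote=/10.168.15.63:50010" fragment of the exception message.
REMOTE_RE = re.compile(r"remote=/?([0-9.]+):(\d+)")


def parse_remote(log_line):
    """Return (ip, port) from a ConnectTimeoutException line, or None if absent."""
    match = REMOTE_RE.search(log_line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def is_private(ip):
    """True if the address is in a private range (e.g. 10.0.0.0/8)."""
    return ipaddress.ip_address(ip).is_private


def can_connect(host, port, timeout=5.0):
    """Try a plain TCP connect; False on timeout, refusal, or any other socket error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Log fragment copied from the exception above.
    line = "ch : java.nio.channels.SocketChannel[connection-pending remote=/10.168.15.63:50010]"
    ip, port = parse_remote(line)
    print(ip, port, "private" if is_private(ip) else "public")
```

Calling `can_connect(ip, port)` from the machine that runs the client would then show whether the datanode port is reachable at that address at all. A `False` result only says the TCP connection failed; it does not distinguish a non-routable private address from a closed security group port.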

This is how I set the datanode address in hdfs-site.xml:

<property>
    <name>dfs.datanode.address</name>
    <value>ec2-54-xxx-xxx-233.us-xxx-1.compute.amazonaws.com:50010</value>
    <description>The host and port that the MapReduce job tracker runs at.  If "local", then jobs are run in-process as a single map and reduce task.
    </description>
</property>