All,
I am setting up a Hadoop single-node cluster on Amazon EC2, following the two guides below. Both the namenode and the datanode run on the same EC2 instance.
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
http://www.slideshare.net/benjaminwootton/configuring-your-first-hadoop-cluster-on-ec2
I have an HDFS client program written in C++, and it does not run on the EC2 instance; it connects to the cluster remotely.
When my client program tries to write data to the cluster, I get the following exception:
2014-02-13 14:37:01,027 INFO hdfs.DFSClient(DFSOutputStream.java:createBlockOutputStream(1175)) - Exception in createBlockOutputStream
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.168.15.63:50010]
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:532)
at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:13
The file is created on HDFS but is empty. The exception seems to indicate that the client cannot write a data block to the datanode because it is using the datanode's private IP address (10.168.15.63:50010) instead of the public one (ec2-54-xxx-xxx-233.us-xxx-1.compute.amazonaws.com or 54.xxx.xxx.233). I don't have a fixed IP address (i.e. an Elastic IP address).
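One way to check this reading of the exception is to probe the datanode port from the client machine. The following is only a minimal sketch, assuming a POSIX system; `can_connect` is a hypothetical helper written for illustration and is not part of libhdfs or Hadoop. It attempts a plain TCP connection with a timeout, which is roughly what the HDFS client does before it reports the ConnectTimeoutException.

```cpp
#include <cerrno>
#include <string>
#include <fcntl.h>
#include <netdb.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper (not part of libhdfs): returns true if a TCP connection
// to host:port completes within timeout_ms milliseconds.
bool can_connect(const std::string& host, int port, int timeout_ms) {
    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;      // allow IPv4 or IPv6
    hints.ai_socktype = SOCK_STREAM;  // TCP

    addrinfo* results = nullptr;
    if (getaddrinfo(host.c_str(), std::to_string(port).c_str(),
                    &hints, &results) != 0) {
        return false;  // name resolution failed
    }

    bool connected = false;
    for (addrinfo* ai = results; ai != nullptr && !connected; ai = ai->ai_next) {
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0) continue;

        // Non-blocking mode: connect() returns at once and poll() enforces
        // the timeout.
        int flags = fcntl(fd, F_GETFL, 0);
        fcntl(fd, F_SETFL, flags | O_NONBLOCK);

        int rc = connect(fd, ai->ai_addr, ai->ai_addrlen);
        if (rc == 0) {
            connected = true;  // connected immediately
        } else if (errno == EINPROGRESS) {
            pollfd pfd{};
            pfd.fd = fd;
            pfd.events = POLLOUT;
            if (poll(&pfd, 1, timeout_ms) > 0) {
                // The socket became writable; SO_ERROR tells us whether the
                // connection succeeded or failed.
                int err = 0;
                socklen_t len = sizeof(err);
                if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0 &&
                    err == 0) {
                    connected = true;
                }
            }
        }
        close(fd);
    }
    freeaddrinfo(results);
    return connected;
}
```

Calling `can_connect("10.168.15.63", 50010, 5000)` and then the same check against the public hostname from the client machine would show which of the two addresses is reachable from outside EC2. If only the private address fails, that would be consistent with the timeout above.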
This is how I set the datanode address in hdfs-site.xml:
<property>
<name>dfs.datanode.address</name>
<value>ec2-54-xxx-xxx-233.us-xxx-1.compute.amazonaws.com:50010</value>
<description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.
</description>
</property>
What else needs to be configured so that the remote client can access the Hadoop cluster on EC2?
Thank you in advance!