In the background, the source file is split into HDFS blocks, the size of which is customizable (usually 128 MB, default 64 MB). For fault tolerance, each block is automatically replicated by HDFS. By default, three copies of each block are written to three different DataNodes. The replication rate is user-configurable (three by default). DataNodes are servers that are physical machines or virtual machines / cloud instances. DataNodes form a Hadoop cluster in which you write your data and on which you run MapReduce / Hive / Pig / Impala / Mahout / etc. programs.
DataNodes are Hadoop cluster workers, NameNodes are wizards.
When the file is to be written to HDFS, the client writing the file receives from the NameNode a list of DataNodes that can host replicas of the first block of the file.
, DataNodes. DataNode DataNode ( ) DataNode. , , DataNodes , . DataNode , DataNode . , .
DataNodes , , DataNodes NameNode . , HDFS. , .
: Hadoop: .