The documentation for DistributedCache states:
"Its efficiency stems from the fact that the files are only copied once per job and the ability to cache archives which are un-archived on the slaves."
What does it mean when it says it can "cache archives which are un-archived on the slaves"? Are cached files deleted after every job? I would like to run the same job hundreds of times on different datasets without the overhead of redistributing the DistributedCache files for each job. Is that possible?
Hadoop , DistributedCache. 0, . , DistributedCache , node .
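On the "un-archived" wording: an archive (zip, tar, jar) registered as a cache archive, for example via `DistributedCache.addCacheArchive` rather than `DistributedCache.addCacheFile` in the old API, is unpacked into a local directory on each worker node before tasks start, so tasks read ordinary files instead of the archive itself. The sketch below only mimics that unpack step with the Java standard library. It does not call Hadoop, and the class name, method names, and file names are hypothetical, chosen for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Illustration only: mimics what "un-archived on the slaves" means,
// without using any Hadoop classes.
class UnarchiveSketch {

    // Builds a small zip with one text entry, standing in for an archive
    // that a job would register with the cache.
    static Path buildArchive(Path dir) throws IOException {
        Path zip = dir.resolve("lookup.zip");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
            out.putNextEntry(new ZipEntry("data/terms.txt"));
            out.write("alpha\nbeta\n".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return zip;
    }

    // Unpacks the zip into targetDir, roughly as a worker node would
    // before the tasks of a job start.
    static void unarchive(Path zip, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(zip))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                Path dest = root.resolve(entry.getName()).normalize();
                // Reject entries that would land outside the target directory.
                if (!dest.startsWith(root)) {
                    throw new IOException("Entry escapes target dir: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(dest);
                } else {
                    Files.createDirectories(dest.getParent());
                    Files.copy(in, dest); // copies the current entry only
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path work = Files.createTempDirectory("cache-sketch");
        Path zip = buildArchive(work);
        Path local = Files.createDirectory(work.resolve("local"));
        unarchive(zip, local);
        // A task would now open the unpacked file directly.
        for (String line : Files.readAllLines(local.resolve("data/terms.txt"))) {
            System.out.println(line);
        }
    }
}
```

With a plain cache file the copy arrives as-is. With a cache archive the task sees the unpacked directory tree, which is the difference the documentation sentence is pointing at.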