I am studying the Hadoop framework and MapReduce. So far, I have played around with text files and processed them using MapReduce.
When I started learning MapReduce, the first popular example I found was WordCount, a program that processes text files. I then wrote my own logic to process some text files and display the results, and that worked.
But now I need to move on to other input formats, because in the real world we do not process only text files. I need to investigate processing other formats, such as images, audio, and video, with MapReduce, but I am struggling to find suitable examples. I am looking for MapReduce examples and tutorials that cover a variety of input formats, ranging from text to video.
Edit:
I mean image, video, and audio processing, not just text files.
Edit 2:
Example: Say I have 10-year-old .bmp images (so no compression or decompression is involved) totaling 450 GB. I need to analyze each image in a folder and display the images that are similar to each other (by comparing pixel patterns). I also need to report the images that were created or modified between a "From" date and a "To" date, for example the images in this set that were created or modified in a range within 2013, starting from January. How can I do this?
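To make the example more concrete, here is a rough sketch of how I imagine the work could split into a map step and a reduce step. It is plain Python that only simulates the two phases on one machine, not Hadoop code. Everything in it is an assumption for illustration: the record layout (file name, modification time, a tiny grayscale thumbnail), the function names, and the sample data are all hypothetical, and decoding the .bmp files into pixel values is left out. The similarity measure is a simple average hash that I picked as a stand-in; it only groups images whose fingerprints match exactly, so a real comparison would probably need a distance between fingerprints instead.

```python
from collections import defaultdict
from datetime import datetime


def fingerprint(pixels):
    """Average hash: one bit per pixel, '1' when the pixel is at or above the mean."""
    mean = sum(pixels) / len(pixels)
    return "".join("1" if p >= mean else "0" for p in pixels)


def map_image(record, start, end):
    """Map step: emit (fingerprint, name) for images modified inside [start, end]."""
    name, modified, pixels = record
    if start <= modified <= end:
        yield fingerprint(pixels), name


def reduce_group(key, names):
    """Reduce step: images that share a fingerprint are reported together."""
    return sorted(names)


def run(records, start, end):
    groups = defaultdict(list)  # stands in for the shuffle between map and reduce
    for record in records:
        for key, name in map_image(record, start, end):
            groups[key].append(name)
    return [reduce_group(key, names) for key, names in sorted(groups.items())]


# Hypothetical sample records: (file name, modification time, 2x2 grayscale thumbnail)
records = [
    ("a.bmp", datetime(2013, 1, 15), [10, 200, 30, 220]),
    ("b.bmp", datetime(2013, 3, 2), [12, 190, 25, 230]),
    ("c.bmp", datetime(2012, 6, 1), [10, 200, 30, 220]),  # outside the date range
    ("d.bmp", datetime(2013, 5, 9), [250, 5, 240, 8]),
]

result = run(records, datetime(2013, 1, 1), datetime(2013, 12, 31))
print(result)
```

What I do not know is how to get from this to a real MapReduce job: how to feed whole image files (rather than lines of text) to the mapper, and where the file modification time would come from.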
I would be glad if someone could point me in the right direction!