Porting a Java Application to Hadoop: Architecture / Design Roadblocks?

Question

Alright, so here is the situation: I am responsible for architecting the migration of a Java-based ETL product (or rather, EAI). I have to port it to Hadoop (the Apache version). Technically this is more of a reboot than a migration, since I have no database to migrate. The point is to use Hadoop so that the transformation phase (the "T" in ETL) runs in parallel. That would make my ETL software:

Faster - with the transformation parallelized.
Scalable - handling more data / big data is a matter of adding more nodes.
Reliable - Hadoop's redundancy and reliability add to my product's features.
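
To make the "parallelized transformation" idea concrete, here is a minimal plain-Java sketch of the map/reduce shape. It does not use the Hadoop API; the class name, the "key,amount" record format, and the uppercase normalization are all assumptions made for illustration, not the asker's actual transformation.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java illustration of a map/reduce-shaped transformation (no Hadoop API).
final class MapReduceShape {
    // "Map" step: turn one raw line "key,amount" (assumed format) into a
    // normalized (key, amount) pair. Each line is handled independently,
    // which is what allows the work to be spread across workers.
    static Map.Entry<String, Long> map(String line) {
        String[] parts = line.split(",");
        return Map.entry(parts[0].trim().toUpperCase(), Long.parseLong(parts[1].trim()));
    }

    // "Reduce" step: combine all amounts that share the same key.
    static Map<String, Long> run(List<String> lines) {
        return lines.parallelStream()
                .map(MapReduceShape::map)
                .collect(Collectors.toMap(
                        Map.Entry::getKey, Map.Entry::getValue, Long::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        // "a" and "A" normalize to the same key, so their amounts are summed.
        System.out.println(run(List.of("a, 1", "b, 2", "A, 3"))); // prints {A=4, B=2}
    }
}
```

On a real cluster the map and reduce steps would run as separate tasks on different nodes; the parallel stream here only stands in for that.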

I tested this configuration: I changed my transformation algorithms into a MapReduce model, ran them on a high-performance Hadoop cluster, and measured the performance. Now I'm trying to understand and document everything that could stand in the way of redesigning / re-architecting / migrating this application. Here are a few I could think of:

The remaining two phases, extraction and loading: my ETL tool can handle a variety of data sources. So do I redesign my data adapters to read from these sources, load the data into HDFS, transform it there, and then load it into the target data source? Could this step become a huge bottleneck for the whole architecture?
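Feedback: say my transformation fails on a record - how do I let the end user know that the ETL hit an error on that particular record? In short, how do I keep track of what is going on at the application level, with all the maps, reduces, and merges/sorts happening? The default Hadoop web interface is for admins, not for end users. So should I build a new web app that scrapes this information from Hadoop? (I know this is not recommended, though.)
Security: how do I handle authorization at the Hadoop level? Who can run jobs, who cannot - how do I support ACLs?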
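
For the extraction step in the first point, a rough sketch of what "read from a source and stage it for the transformation" could look like. This is only an illustration under stated assumptions: a local temp directory stands in for an HDFS staging directory, and the `SourceAdapter` interface, the `stage` method, and the `part-NNNNN` file naming are hypothetical, not part of any Hadoop API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Sketch only: a local directory stands in for an HDFS staging directory.
final class StagingSketch {
    // Hypothetical adapter contract: each data source yields plain text records.
    interface SourceAdapter {
        List<String> readAll();
    }

    // Extract: write the source's records into part files of at most
    // recordsPerPart records each, so that each part can later be
    // transformed independently of the others.
    static List<Path> stage(SourceAdapter source, Path stagingDir, int recordsPerPart)
            throws IOException {
        List<String> records = source.readAll();
        List<Path> parts = new ArrayList<>();
        for (int start = 0; start < records.size(); start += recordsPerPart) {
            int end = Math.min(start + recordsPerPart, records.size());
            Path part = stagingDir.resolve(String.format("part-%05d", parts.size()));
            Files.write(part, records.subList(start, end));
            parts.add(part);
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("staging");
        List<Path> parts = stage(() -> List.of("r1", "r2", "r3"), dir, 2);
        System.out.println(parts.size() + " part files written"); // prints "2 part files written"
    }
}
```

Keeping the adapter contract this narrow means the existing source-specific adapters would only need a new write target; whether the staging copy is a bottleneck depends on how many writers feed it in parallel.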

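I look forward to possible answers to the questions above, and to any further questions/facts I should consider, based on your experience with Hadoop/problem analysis. As always, thanks in advance.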

+3

java architecture hadoop

Jay 06 . '11 20:22

David Gruzman · Accepted Answer · 2011-06-08T10:26:25+0000

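I do not expect loading into HDFS to be a bottleneck, since the load is distributed among the datanodes - the network interface will be the only bottleneck. Loading data back into the database might be a bottleneck, but I think it is no worse than it is now. I would design the jobs so that their input and output sit in HDFS, and then run some kind of bulk load of the results into the database.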
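Feedback is a problematic point, since MR really has only one result - the transformed data. All other tricks, such as writing failed records into HDFS files, lack the "functional" reliability of MR, because they are side effects. One way to mitigate this is to design your software so that it is ready for duplicated failed records. There is also Sqoop, a tool specifically for moving data between SQL databases and Hadoop: http://www.cloudera.com/downloads/sqoop/ At the same time I would consider using Hive - if your SQL transformations are not that complicated, it might be practical to create CSV files and do the initial pre-aggregation with Hive, thereby reducing data volumes before going to the (perhaps single-node) database.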
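
On the feedback point: assuming failed records are written to side files and a retried task may write the same record again, whatever reads those files for the end user has to tolerate duplicates. A minimal sketch of that idea follows; the `recordId<TAB>message` line format and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: build an end-user error report from failed-record lines that may
// contain duplicates (for example, after a task was retried).
final class FailedRecordReport {
    // Each line is assumed to look like "recordId<TAB>error message".
    // The first message seen for a record ID wins; later duplicates are ignored.
    static Map<String, String> dedupe(List<String> sideFileLines) {
        Map<String, String> byId = new LinkedHashMap<>();
        for (String line : sideFileLines) {
            String[] parts = line.split("\t", 2);
            if (parts.length < 2) {
                continue; // skip malformed lines rather than failing the report
            }
            byId.putIfAbsent(parts[0], parts[1]);
        }
        return byId;
    }

    public static void main(String[] args) {
        Map<String, String> report = dedupe(List.of(
                "42\tbad date", "42\tbad date", "7\tmissing field"));
        System.out.println(report.size() + " distinct failed records"); // prints "2 distinct failed records"
    }
}
```

This only works if every record carries a stable ID through the transformation, which is itself a design decision for the port.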
