Porting a Java Application to Hadoop: Architecture / Design Roadblocks?

Alright, so here is the situation: I am responsible for architecting the migration of a Java-based ETL application (or rather, EAI). I have to port it to Hadoop (the Apache distribution). Technically, this is more of a reboot than a migration, since I have no database to migrate. The point is to use Hadoop so that the transformation phase (the "T" in ETL) runs in parallel. That would make my ETL software:

  • Faster: the transformation phase is parallelized.
  • Scalable: handling more data / big data is a matter of adding more nodes.
  • Reliable: Hadoop's redundancy and reliability add to my product's features.

I have tested this configuration: I changed my transformation algorithms into a MapReduce model, ran them on a high-performance Hadoop cluster, and demonstrated the performance. Now I am trying to understand and document everything that could stand in the way of redesigning / re-architecting / migrating this application. Here are a few issues I could think of:

  • The remaining two phases: extraction and loading. My ETL tool can handle a variety of data sources. So, do I redesign my data adapters to read data from these sources, load it into HDFS, transform it there, and then load the result into the target data source? Could this staging step become a huge bottleneck for the whole architecture?
  • : , - , ETL ? , , , // ? - Hadoop - . -, - Hadoop? ( , ).
  • . Hadoop? , - ACL?
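
To make the flow in the first bullet concrete, here is a minimal sketch in plain Java of the extract, stage, transform, load sequence, with a timer around each phase so the cost of the staging step can be compared with the transformation itself. Everything in it is an assumption for illustration: `SourceAdapter`, `TargetAdapter`, and `StagedEtlSketch` are hypothetical names, a local temporary directory stands in for HDFS, and a parallel stream stands in for the map tasks. On a real cluster the staging write would go through Hadoop's `FileSystem` API and the transformation would run as a MapReduce job.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical three-phase pipeline; none of these names come from Hadoop. */
final class StagedEtlSketch {

    /** Hypothetical adapter for one kind of source; yields records as text lines. */
    interface SourceAdapter {
        List<String> extract();
    }

    /** Hypothetical adapter for the target data source. */
    interface TargetAdapter {
        void load(List<String> records);
    }

    /** Extract: copy source records into a staging file (stand-in for an HDFS path). */
    static Path stage(SourceAdapter source, Path stagingDir) throws IOException {
        Path staged = stagingDir.resolve("part-00000");
        Files.write(staged, source.extract(), StandardCharsets.UTF_8);
        return staged;
    }

    /** Transform: apply a per-record function in parallel (stand-in for map tasks). */
    static List<String> transform(Path staged, Function<String, String> fn) throws IOException {
        List<String> lines = Files.readAllLines(staged, StandardCharsets.UTF_8);
        // An ordered source plus toList() keeps the output in input order.
        return lines.parallelStream().map(fn).collect(Collectors.toList());
    }

    /** Runs all three phases; returns elapsed nanoseconds for extract, transform, load. */
    static long[] run(SourceAdapter source, Function<String, String> fn,
                      TargetAdapter target, Path stagingDir) throws IOException {
        long t0 = System.nanoTime();
        Path staged = stage(source, stagingDir);
        long t1 = System.nanoTime();
        List<String> transformed = transform(staged, fn);
        long t2 = System.nanoTime();
        target.load(transformed);
        long t3 = System.nanoTime();
        return new long[] {t1 - t0, t2 - t1, t3 - t2};
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("etl-staging");
        List<String> sink = new ArrayList<>();
        long[] nanos = run(
                () -> Arrays.asList("alice,30", "bob,41"), // toy in-memory source
                line -> line.toUpperCase(),                // toy transformation
                sink::addAll,                              // toy in-memory target
                dir);
        System.out.println(sink);
        System.out.printf("extract=%dns transform=%dns load=%dns%n",
                nanos[0], nanos[1], nanos[2]);
    }
}
```

Timing the phases separately like this, on realistic data volumes, is one way to check whether the copy into the staging area dominates the run before committing to a redesign of every adapter.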

/, , Hadoop/. , .
