Alright, so here is the situation: I am responsible for architecting the migration of a Java-based ETL software product (more precisely, EAI). I will have to port it to Hadoop (the Apache version). Technically, this is more like a reboot than a migration, because I have no database to migrate. The point is to use Hadoop so that the transformation phase (the "T" in "ETL") runs in parallel. That would make my ETL software:
- Faster - with the transformation parallelized.
- Scalable - handling more data / big data is a matter of adding more nodes.
- Reliable - Hadoop's redundancy and reliability will add to my product's features.
I have tested this configuration: I converted my transformation algorithms to a MapReduce model, ran them on a high-end Hadoop cluster, and benchmarked the performance. Now I'm trying to understand and document everything that could stand in the way of this application redesign / re-architecture / migration. Here are a few I could think of:
- The other two phases: extraction and load. My ETL tool can handle a variety of data sources. So do I redesign my data adapters to read data from these sources, load it into HDFS, transform it there, and then load it into the target data source? Would this step not become a huge bottleneck for the whole architecture?
- : , - , ETL ? , , , // ? - Hadoop - . -, - Hadoop? ( , ).
- . Hadoop? , - ACL?
/, , Hadoop/.
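As a rough illustration of the map-style model described above, here is a minimal sketch in plain Java. It is not the actual transformation code: the `id,name` record format and the names `MapStyleTransform`, `transformRecord`, and `transformAll` are all made up for the example, and a Java parallel stream merely stands in for the Hadoop map phase. The point it shows is that each record is transformed using only its own input, which is the property that lets the work be split across map tasks.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class MapStyleTransform {

    // Hypothetical per-record transformation: "id,name" -> "id<TAB>NAME".
    // It depends only on its own input record, so it could run as a map task.
    static String transformRecord(String line) {
        String[] fields = line.split(",", 2);
        if (fields.length < 2) {
            throw new IllegalArgumentException("malformed record: " + line);
        }
        return fields[0].trim() + "\t" + fields[1].trim().toUpperCase(Locale.ROOT);
    }

    // Stand-in for the map phase: records are transformed independently,
    // in parallel, and collected in their original order.
    static List<String> transformAll(List<String> records) {
        return records.parallelStream()
                .map(MapStyleTransform::transformRecord)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> input = List.of("1, alice", "2, bob", "3, carol");
        transformAll(input).forEach(System.out::println);
    }
}
```

In a real Hadoop job the body of `transformRecord` would sit inside a mapper's `map` method, and the extraction and load steps around it are exactly the parts this sketch leaves out.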