I would like to periodically (hourly) load application logs into Cassandra for analysis with Pig.
How is this usually done? Are there any projects that focus on this?
I see Mumakil is commonly used for bulk data loading. I could write a cron job built around it, but I was hoping for something more robust than the job I would whip up.
I am also willing to change the applications to store data in a different format (for example, syslog, or directly in Cassandra) if that is preferable. In that case, though, I would be concerned about data loss if Cassandra is unavailable.