Stack for loading log files into Cassandra

I would like to periodically (hourly) load application logs into Cassandra for analysis with Pig.

How is this usually done? Are there any projects that focus on this?

I see Mumakil commonly used for bulk data loading. I could write a cron job built around it, but I was hoping for something more robust than the job I would whip up.

I am also willing to change the applications to store data in a different format (for example, syslog, or directly in Cassandra) if that is preferable. In that case, though, I would be concerned about data loss if Cassandra is unavailable.

+3
2 answers

Load the logs into HDFS with Flume, then analyze them with Pig.

0

If you go with Flume, there is a Flume-based project for this: https://github.com/geminitech/logprocessing.

Pig, , HDFS ( S3). Hadoop , , . -, -. Pig Cassandra, Cassandra, .

If you do want to load the logs into Cassandra, look at Flume, or at Kafka with Storm.

To load into Cassandra with Storm:

  • Send the logs to Kafka (for example, from log4j)
  • Read them in Storm with storm-kafka
  • Write them to Cassandra
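
The last step in the list above (taking log messages and writing them to Cassandra) might look roughly like the sketch below. It is plain Python rather than a Storm bolt, and the log format, `parse_line`, `BufferedLoader`, and the `write_batch` callable are all made up for illustration; in a real setup `write_batch` would wrap whatever Cassandra client you use. Rows are bucketed by hour and kept in memory while the write fails, which only partly addresses the data-loss concern: a durable queue such as Kafka is what would actually survive a restart.

```python
import re
from collections import deque
from datetime import datetime

# Hypothetical log format: "2013-01-15 10:42:07 ERROR message text"
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*)$")


def parse_line(line):
    """Turn one log line into a row dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    stamp, level, message = match.groups()
    ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    # Bucket rows by hour so that one partition could hold one hour of logs.
    return {
        "hour": ts.strftime("%Y-%m-%dT%H"),
        "ts": ts,
        "level": level,
        "message": message,
    }


class BufferedLoader:
    """Batch parsed rows and keep them queued while the sink is failing."""

    def __init__(self, write_batch, batch_size=100):
        # write_batch is a hypothetical callable: list of rows -> None.
        # It is assumed to raise ConnectionError when Cassandra is down.
        self.write_batch = write_batch
        self.batch_size = batch_size
        self.pending = deque()

    def add(self, line):
        """Parse one line, queue it, and flush once a full batch is queued."""
        row = parse_line(line)
        if row is not None:
            self.pending.append(row)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Write queued rows in batches; on failure leave them queued."""
        while self.pending:
            size = min(self.batch_size, len(self.pending))
            batch = [self.pending[i] for i in range(size)]
            try:
                self.write_batch(batch)
            except ConnectionError:
                return False  # sink unavailable: rows stay queued for a retry
            for _ in range(size):
                self.pending.popleft()
        return True
```

Rows are removed from the queue only after `write_batch` returns, so a failed write is retried on the next `flush()`. This gives at-least-once delivery, which means the writes should be idempotent (for example, keyed on hour plus timestamp).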
+1
