Real-time analytical processing system development

I am developing a system that analyzes a large number of user transactions and produces aggregate measures (for example, trends). The system must be fast, reliable, and scalable. It is written in Java and runs on Linux.

The data comes from a system that generates CSV-based log files of user transactions. It produces one file every minute. Each file contains transactions from different users, sorted by time, and a single file can cover thousands of users.

Example data structure for a CSV file:

10:30:01, user 1, ...
10:30:01, user 1, ...
10:30:02, user 78, ...
10:30:02, user 2, ...
10:30:03, user 1, ...
10:30:04, user 2, ...
...

The system I am planning will process the files and do some analysis in real time. It must collect the input data, send it to several algorithms and other systems, and store the results in a database. The database does not store the raw input records, only high-level aggregates of the transactions (for example, trends).
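To make the input shape concrete, here is a minimal sketch of the first step of such a pipeline: reading the CSV lines and reducing them to a per-user aggregate without keeping the raw records. The class name `TransactionCounter`, the method `countPerUser`, and the choice of "transactions per user" as the aggregate are assumptions made for illustration only. The real measures are not specified in the question.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: names and the chosen aggregate are illustrative assumptions.
final class TransactionCounter {

    /** Counts transactions per user from lines shaped like "10:30:01, user 1, ...". */
    static Map<String, Integer> countPerUser(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            if (fields.length < 2) {
                continue; // skip blank or malformed lines
            }
            String user = fields[1].trim();
            if (user.isEmpty()) {
                continue; // skip lines with no user field
            }
            counts.merge(user, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "10:30:01, user 1, ...",
                "10:30:01, user 1, ...",
                "10:30:02, user 78, ...",
                "10:30:02, user 2, ...",
                "10:30:03, user 1, ...",
                "10:30:04, user 2, ...");
        // prints {user 1=3, user 78=1, user 2=2}
        System.out.println(countPerUser(lines));
    }
}
```

Only the aggregate map would be written to the database in this sketch, which matches the requirement that raw input records are not stored.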

, , 10 , 10 5 , .

Storm , .

:

  • , .

  • , , .

  • 10 ( 5 ), 10 5 , . , 10 , Storm Field Grouping ( , ) 10 , , .

  • , , , ( ).

# 3.

? , 10 . : ? , Redis ( ).


redis . , redis

# 3 3

  • 10

  • 5

1. : Redis - . , , STRING. , , , . In redis . .

2. 10 : , , redis. . : LPUSH LTRIM LLEN, , . , , , .

3. 5 : . redis , , expiry. . . , , . , . , , . ​​ [ ] redis . , . timestamp score, set member [ , 1, ], . zrangebyscore
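The Redis commands named in the list above (LPUSH, LTRIM and LLEN for the last 10 transactions, and a sorted set scored by timestamp and read with ZRANGEBYSCORE for the 5-minute window) can be pictured with a small in-memory analog. This is only a sketch of the two window types for a single user, not Redis client code. The class `UserWindows` and its methods are hypothetical, and it assumes timestamps arrive in non-decreasing order, as the time-sorted input files suggest.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical in-memory analog of the two Redis structures, for one user.
final class UserWindows {
    private final int maxCount;
    private final long windowMillis;
    // Newest entry first, as a list filled with LPUSH would be.
    private final Deque<Long> lastN = new ArrayDeque<>();
    // Oldest entry first, ordered by timestamp like a sorted set's scores.
    private final Deque<Long> timeWindow = new ArrayDeque<>();

    UserWindows(int maxCount, long windowMillis) {
        this.maxCount = maxCount;
        this.windowMillis = windowMillis;
    }

    /** Records one transaction, identified here only by its timestamp. */
    void add(long timestampMillis) {
        lastN.addFirst(timestampMillis);      // comparable to LPUSH
        while (lastN.size() > maxCount) {
            lastN.removeLast();               // comparable to LTRIM key 0 maxCount-1
        }
        timeWindow.addLast(timestampMillis);  // comparable to ZADD with score = timestamp
        evictOlderThan(timestampMillis - windowMillis);
    }

    /** Number of entries in the count-based window, comparable to LLEN. */
    int lastNSize() {
        return lastN.size();
    }

    /** Number of entries newer than (now - window), comparable to a score-range query. */
    int countInWindow(long nowMillis) {
        evictOlderThan(nowMillis - windowMillis);
        return timeWindow.size();
    }

    // Drops entries whose timestamp is at or before the cutoff.
    private void evictOlderThan(long cutoffMillis) {
        while (!timeWindow.isEmpty() && timeWindow.peekFirst() <= cutoffMillis) {
            timeWindow.removeFirst();
        }
    }

    public static void main(String[] args) {
        UserWindows w = new UserWindows(10, 5 * 60 * 1000L);
        for (int i = 0; i < 12; i++) {
            w.add(i * 1000L); // one transaction per second
        }
        System.out.println(w.lastNSize());           // prints 10
        System.out.println(w.countInWindow(11_000)); // prints 12
    }
}
```

In a distributed setup this state would live in a shared store such as Redis rather than in one process, so that any worker handling a given user sees the same windows.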

:

Redis List .

LLEN, , 10.

, , [Sorted Set] Score Current Timestamp + 5 min Value .

LLEN 10, , [ ] db [ - > ]. .

, , , , db .

. , redis


1 2: [Apache Flume Kafka]

№ 3: [ . Redis Esper.]
