How to deal with a big data mart / fact table? (240 million rows)

Question


We have a BI client whose sales transactions generate about 40 million rows every month in their sales database tables. They want to build a Sales Data Mart with 5 years of historical data, which means that this fact table will have about 240 million rows (40 x 12 months x 5 years).
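As a quick check of the sizing above, here is a minimal sketch that multiplies out the stated figures. The variable names are made up for illustration. The stated monthly rate gives 2,400 million rows over 5 years, while a 240 million total would correspond to about 4 million rows per month; the post does not say which of the two was intended.

```python
# Figures taken from the question; the variable names are illustrative only.
ROWS_PER_MONTH = 40_000_000
MONTHS_PER_YEAR = 12
YEARS = 5

# Total rows if the monthly rate really is 40 million.
total_rows = ROWS_PER_MONTH * MONTHS_PER_YEAR * YEARS

# Monthly rate that a 240-million-row total would imply instead.
implied_monthly_rate = 240_000_000 // (MONTHS_PER_YEAR * YEARS)

print(f"rows after {YEARS} years at the stated rate: {total_rows:,}")
print(f"monthly rate implied by 240 million rows: {implied_monthly_rate:,}")
```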

This is well structured data.

This is the first time I've come across this amount of data, and it led me to look into vertical (column-oriented) database tools such as Infobright and others. But even with this kind of software, a simple query would take a very long time to run.

This made me look at Hadoop, but after reading some articles I concluded that Hadoop is not the best option (even with Hive) for building a fact table since, as I understand it, it is meant for working with unstructured data.

So my question is: what would be the best way to approach this problem? Am I not looking at the right technology? What is the best query response time I could get on such a large fact table? Or have I hit a real wall here, where the only option is to build aggregated tables?
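For readers unfamiliar with the term, the "aggregated tables" option mentioned above means pre-computing summaries of the fact table so that reports read a much smaller table. The following is only a sketch of the idea: it uses SQLite from the Python standard library as a stand-in for the real warehouse, and the table and column names (`fact_sales`, `agg_sales_monthly`, `sale_date`, `product_id`, `amount`) are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the real warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_sales (sale_date TEXT, product_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [
        ("2012-05-03", 1, 10.0),
        ("2012-05-17", 1, 5.0),
        ("2012-05-20", 2, 7.5),
        ("2012-06-01", 1, 2.5),
    ],
)

# Pre-aggregate once per month and product; reports then query this table
# instead of scanning every transaction row.
conn.execute(
    """
    CREATE TABLE agg_sales_monthly AS
    SELECT substr(sale_date, 1, 7) AS sale_month,
           product_id,
           COUNT(*)    AS sale_count,
           SUM(amount) AS total_amount
    FROM fact_sales
    GROUP BY substr(sale_date, 1, 7), product_id
    """
)

monthly = conn.execute(
    "SELECT sale_month, product_id, sale_count, total_amount "
    "FROM agg_sales_monthly ORDER BY sale_month, product_id"
).fetchall()
print(monthly)
```

The trade-off is that an aggregate only answers questions at its own grain (here, month and product); anything finer still needs the detailed fact table.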


database hadoop hive data-warehouse infobright

Eduardo williams Jun 07 '12 at 17:56


6 answers

Sathish Senathi · Answer 1 · 2012-06-08T01:24:08+0000

Have you checked out Google BigQuery (a paid premium service)? It may suit your needs. It is as simple as:

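Load the data as CSV (with a configurable delimiter char). The file can be gzip-compressed.
Query it with SQL (a limited SQL dialect).
Export the results to CSV.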
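The CSV and gzip step above can be sketched with the Python standard library alone. This is a minimal sketch, assuming the sales rows are already available as Python tuples; the helper names, file name, and column names are hypothetical, and how the resulting file is uploaded to BigQuery is not shown.

```python
import csv
import gzip
import os
import tempfile


def write_gzip_csv(path, header, rows, delimiter=","):
    """Write a header line and data rows to a gzip-compressed CSV file."""
    with gzip.open(path, "wt", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter=delimiter)
        writer.writerow(header)
        writer.writerows(rows)


def read_gzip_csv(path, delimiter=","):
    """Read a gzip-compressed CSV file back as a list of string rows."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle, delimiter=delimiter))


# Hypothetical sample data standing in for the real sales rows.
header = ["sale_date", "product_id", "amount"]
sample_rows = [("2012-05-03", 1, 10.0), ("2012-06-01", 2, 7.5)]

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "fact_sales.csv.gz")
    write_gzip_csv(csv_path, header, sample_rows)
    read_back = read_gzip_csv(csv_path)  # round trip to confirm the file is valid

print(read_back)
```

The `delimiter` parameter corresponds to the configurable field separator; reading the file back is only there to confirm that the compressed output is well-formed.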

See https://developers.google.com/bigquery/

100 . , , Google Spreadsheet, , .. . Google Microsoft Excel/PDF.

Google ( ).

bugg_tb · Answer 2 · 2012-06-07T19:05:46+0000

240 2400 .

ssd.analytical-labs.com

FCC 150 , Infobright, , VW .


, Marts , , , .. 1 ( ), .

cheep, , , .

, OLAP, , , , , .


. 0 , , , 1 90% , (date dim ) .

2 . , - .

Tom

Edit:

, JVD:

ssd : 175.67 /
sata : 113.52 /
ec2: 75.65 /
ec2 ebs raid: 89.36 /


crorella · Answer 3 · 2012-06-07T22:34:09+0000


1). mondrian, agg - , , , , , .

2) - , , , . , ( Oracle) , MS SqlServer.

, . , ETL- ( 1 ), RDMBS .

rs_atl · Answer 4 · 2012-06-07T18:50:49+0000

NoSQL/Analysis, DataStax Enterprise, Apache Cassandra Hadoop . , Hadoop "" HDFS , NoSQL (, Cassandra HBase) MapReduce.

Olaf · Answer 5 · 2012-06-07T19:31:34+0000

, , Hadoop + Hive. Map/Reduce jobs Hive . .

, () SQL- . - Hive . , , , .

Tariq · Answer 6 · 2012-06-07T19:05:25+0000

hasoop . hbase, , . ... , , . , sets..you apache "sqoop", .
