We have a BI client whose sales transactions generate about 40 million rows per month in their sales database tables. They want to build a Sales Data Mart covering 5 years of historical data, which means the fact table will hold about 2.4 billion rows (40 million x 12 months x 5 years).
The data is well structured.
This is the first time I've dealt with this volume of data, so I evaluated column-oriented database tools such as Infobright, among others. Even with such software, though, a simple query will take a very long time.
That led me to look at Hadoop, but after reading some articles I concluded that Hadoop (even with Hive) is not the best option for building a fact table, since as I understand it, it is meant for working with unstructured data.
So my question is: what would be the best way to approach this problem? Am I not looking at the right technology? What is the best query response time I could expect on a fact table this large? Or have I hit a real wall here, and the only option is to build aggregated tables?
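To make the last option concrete, here is a minimal sketch of what I mean by an aggregated table. It uses Python's built-in sqlite3 module only so that it is self-contained, not because SQLite would be the engine for this volume. The table and column names (fact_sales, agg_sales_monthly, sale_date, product_id, amount) and the sample rows are made up for illustration and are not the client's actual schema.

```python
import sqlite3

# In-memory database, used only to illustrate the idea.
conn = sqlite3.connect(":memory:")

# Hypothetical transaction-level fact table (assumed schema).
conn.execute(
    "CREATE TABLE fact_sales (sale_date TEXT, product_id INTEGER, amount REAL)"
)
rows = [
    ("2013-01-05", 1, 10.0),
    ("2013-01-20", 1, 15.0),
    ("2013-01-07", 2, 7.5),
    ("2013-02-11", 1, 20.0),
]
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)

# Pre-aggregate to one row per month and product, so that reports
# read this much smaller table instead of scanning every transaction.
conn.execute(
    """
    CREATE TABLE agg_sales_monthly AS
    SELECT substr(sale_date, 1, 7) AS sale_month,
           product_id,
           COUNT(*)    AS txn_count,
           SUM(amount) AS total_amount
    FROM fact_sales
    GROUP BY substr(sale_date, 1, 7), product_id
    """
)

monthly = conn.execute(
    "SELECT sale_month, product_id, txn_count, total_amount "
    "FROM agg_sales_monthly ORDER BY sale_month, product_id"
).fetchall()

for row in monthly:
    print(row)
```

The trade-off is that any question below the chosen grain (here, month and product) would still have to go back to the detailed fact table.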