We are experimenting with BigQuery to analyze user data generated by our software application.
Our table consists of hundreds of millions of rows, each of which represents a unique user session. Each row contains a timestamp, a UUID, and other fields that describe the user's interaction with our product during that session. We currently generate about 2 GB of data (~10 M rows) per day.
Every so often, we may run queries across the entire data set (about 2 months' worth right now, and growing). However, typical queries will cover only a single day, week, or month. We have found that as our table grows, our single-day queries become more and more expensive (as we would expect, given the BigQuery architecture).
What is the best way to query subsets of our data more efficiently? One approach I can think of is to "shard" the data into separate tables by day (or week, month, etc.) and then query them together in a union:
SELECT foo from
mytable_2012-09-01,
mytable_2012-09-02,
mytable_2012-09-03;
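For longer ranges the table list would get tedious to type by hand, so it could be generated instead. This is only a minimal sketch: the helper name `sharded_query` is made up for illustration, and it assumes one table per day named with the same `mytable_YYYY-MM-DD` pattern as in the query above.

```python
from datetime import date, timedelta


def sharded_query(column, prefix, start, end):
    """Build a query over one table per day from start to end, inclusive.

    Hypothetical helper: assumes daily tables named <prefix>_YYYY-MM-DD.
    """
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days + 1
    tables = [
        f"{prefix}_{(start + timedelta(days=i)).isoformat()}"
        for i in range(days)
    ]
    # Comma-separated table list, in the same shape as the query above.
    return f"SELECT {column} FROM\n" + ",\n".join(tables) + ";"


# Rebuilds the three-day query shown above.
print(sharded_query("foo", "mytable", date(2012, 9, 1), date(2012, 9, 3)))
```

A week or month query would then just be a wider date range, but the number of tables listed grows with it.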
Is there a better way than this?