Partitioning by date?

We are experimenting with BigQuery to analyze user data generated by our software application.

Our table consists of hundreds of millions of rows, each of which represents a unique user session. Each row contains a timestamp, a UUID, and other fields that describe the user's interaction with our product during that session. We currently generate about 2 GB of data (~10 M rows) per day.

Every so often, we may run queries across the entire dataset (about 2 months' worth right now, and growing). However, typical queries will cover only a single day, week, or month. We have found that as our table grows, our single-day query becomes more and more expensive (as we would expect, given BigQuery's architecture).

What is the best way to query subsets of our data more efficiently? One approach I can think of is to "split" the data into separate tables by day (or week, month, etc.), then query them together in a union:

SELECT foo from mytable_2012-09-01, mytable_2012-09-02, mytable_2012-09-03;

Is there a better way than this?

2 answers

Hi David: The best way to handle this is to shard your data across many tables and run queries as you suggest in your example.

To be more clear, BigQuery does not have the concept of indexes (by design), so sharding data into separate tables is a useful strategy for keeping query costs as low as possible.
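As a rough illustration of the sharding approach, here is a minimal Python sketch that builds the comma-separated table list for a date range, so that a query only touches the daily tables it needs. The helper names (`daily_tables`, `build_query`) and the `YYYYMMDD` suffix are assumptions made for this example, not part of BigQuery; adjust them to your own naming scheme. The comma-separated FROM list is treated as a union in BigQuery's legacy SQL dialect, as in the question.

```python
from datetime import date, timedelta


def daily_tables(prefix, start, end):
    """Return one table name per day in [start, end], inclusive.

    The '<prefix>_YYYYMMDD' naming scheme is an assumption for this sketch.
    """
    span = (end - start).days
    if span < 0:
        raise ValueError("end must not be before start")
    names = []
    for offset in range(span + 1):
        day = start + timedelta(days=offset)
        names.append(f"{prefix}_{day:%Y%m%d}")
    return names


def build_query(columns, prefix, start, end):
    """Build a query over the daily tables (comma = union in legacy SQL)."""
    tables = ", ".join(daily_tables(prefix, start, end))
    return f"SELECT {', '.join(columns)} FROM {tables};"


# Query three days of data instead of scanning one ever-growing table.
query = build_query(["foo"], "mytable", date(2012, 9, 1), date(2012, 9, 3))
print(query)
```

With tables sharded this way, a one-day query reads only that day's table, so its cost should stay roughly constant as the overall dataset grows.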

[Second answer garbled in the source; the only surviving term is expirationTime.]
