I have a set of tuples (x, y) of 64-bit integers that make up my dataset. I have, say, trillions of these tuples; it is not possible to hold the dataset in memory on any machine on earth. However, it is quite reasonable to store them on disk.
I have an on-disk store (a B+-tree) that lets me query the data quickly and in parallel along one dimension. However, some of my queries depend on both dimensions.
Example queries:
- Find a tuple whose x is greater than or equal to some given value
- Find the tuple with the smallest possible x such that its y is greater than or equal to some given value
- Find the tuple with the smallest possible x such that its y is less than or equal to some given value
- Perform maintenance operations (insert some tuples, delete some tuples)
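To pin down what these queries should return, here is a minimal in-memory sketch of their semantics. It only specifies the expected results and is not a candidate solution, since it scans every tuple. The function names are hypothetical, and returning the tuple with the smallest x for the first query (and breaking ties on the smaller y) is an assumption; the queries themselves leave this open.

```python
from typing import Iterable, Optional, Tuple

Point = Tuple[int, int]  # (x, y)


def first_with_x_at_least(points: Iterable[Point], x_min: int) -> Optional[Point]:
    """Query 1: a tuple with x >= x_min (assumption: the one with smallest x)."""
    return min((p for p in points if p[0] >= x_min), default=None)


def min_x_with_y_at_least(points: Iterable[Point], y_min: int) -> Optional[Point]:
    """Query 2: the tuple with the smallest x among those with y >= y_min."""
    return min((p for p in points if p[1] >= y_min), default=None)


def min_x_with_y_at_most(points: Iterable[Point], y_max: int) -> Optional[Point]:
    """Query 3: the tuple with the smallest x among those with y <= y_max."""
    return min((p for p in points if p[1] <= y_max), default=None)


# Tiny illustrative dataset (made up for this example).
sample = [(5, 1), (2, 9), (7, 4), (3, 3)]
q1 = first_with_x_at_least(sample, 4)
q2 = min_x_with_y_at_least(sample, 4)
q3 = min_x_with_y_at_most(sample, 3)
```

Tuples compare lexicographically in Python, so `min` picks the smallest x first and the smallest y among equal x values.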
The best approach I have found so far is Z-order curves, but I can't figure out how to execute these queries against my two-dimensional dataset with them.
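For reference, this is the kind of Z-order (Morton) key I have in mind: a minimal sketch that interleaves the bits of x and y into a single 128-bit integer. It assumes the values are unsigned 64-bit integers (signed values would first need an offset), and the function names are hypothetical.

```python
def interleave(x: int, y: int, bits: int = 64) -> int:
    """Build a Z-order key: bit i of x goes to bit 2*i, bit i of y to bit 2*i + 1."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def deinterleave(key: int, bits: int = 64) -> tuple:
    """Recover (x, y) from a Z-order key produced by interleave()."""
    x = y = 0
    for i in range(bits):
        x |= ((key >> (2 * i)) & 1) << i
        y |= ((key >> (2 * i + 1)) & 1) << i
    return x, y


key = interleave(3, 0)  # x = 0b11 lands on key bits 0 and 2
roundtrip = deinterleave(interleave(2**64 - 1, 12345))
```

Such a key could be stored as the one-dimensional key of the B+-tree; what I am missing is how to turn the queries above into lookups over these keys.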
Solutions that rely on a sequential scan of the data are not acceptable; that would be too slow.