I am building a Python desktop application that lets users choose among different distribution forms to model crop yield data. I have time series agricultural data, about a million rows, stored in a SQLite database (this is not set in stone if someone knows of a better choice). Once the user selects some data, say corn yields in Illinois from 1990 to 2010, I want them to pick a distribution form from a drop-down list. My function then fits that distribution to the data and returns 10,000 points drawn from the fitted distribution as a NumPy array. I would like this data to be temporary, existing only while the program is running.
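To make the fit-and-draw step concrete, here is a minimal sketch of what such a function could look like. It assumes SciPy is installed and that the drop-down value is the name of a `scipy.stats` continuous distribution. The name `fit_and_sample`, the seed, and the synthetic yields are hypothetical placeholders, not my real code or data.

```python
import numpy as np
from scipy import stats


def fit_and_sample(data, dist_name, n_samples=10_000, seed=None):
    """Fit the named scipy.stats distribution to data and draw from the fit."""
    dist = getattr(stats, dist_name)   # e.g. "norm", "gamma", "lognorm"
    params = dist.fit(data)            # maximum-likelihood fit: shapes, loc, scale
    # rvs accepts the fitted parameters positionally, in the order fit returns them
    return dist.rvs(*params, size=n_samples, random_state=seed)


# Synthetic stand-in for one selection (made-up numbers, not real yields)
yields = np.random.default_rng(0).normal(loc=150.0, scale=20.0, size=500)
samples = fit_and_sample(yields, "norm", seed=1)
```

The result is a one-dimensional NumPy array of 10,000 values, which is the object I need to keep around for the rest of the session.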
To be efficient, I would like to do the fitting and the subsequent drawing of numbers only once for a given region and distribution. I have looked at temporary files in Python, but I'm not sure that is the best approach for storing many different NumPy arrays. PyTables also looks interesting and appears to work well with NumPy, but I'm not sure whether it is suited to temporary data. NoSQL solutions such as MongoDB are very popular these days, which also interests me from a resume-building standpoint.
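For reference, the temporary-file idea could be sketched as below, using only the standard library and NumPy. The class name `SampleCache` and the `compute` callback are hypothetical. Each (region, distribution) pair is computed once, saved as a `.npy` file in a temporary directory, and the whole directory is deleted when the cache is closed.

```python
import os
import tempfile

import numpy as np


class SampleCache:
    """Stores one array per (region, distribution) key in a temp directory."""

    def __init__(self):
        self._tmpdir = tempfile.TemporaryDirectory()
        self._paths = {}  # (region, dist_name) -> path of the saved .npy file

    def get(self, region, dist_name, compute):
        """Return the cached array, calling compute() only on the first request."""
        key = (region, dist_name)
        if key not in self._paths:
            path = os.path.join(self._tmpdir.name, f"{len(self._paths)}.npy")
            np.save(path, compute())
            self._paths[key] = path
        return np.load(self._paths[key])

    def close(self):
        """Delete the temporary directory and every array stored in it."""
        self._tmpdir.cleanup()


cache = SampleCache()
calls = []


def expensive_fit():
    calls.append(1)                 # record how often the fit actually runs
    return np.linspace(0.0, 1.0, 10_000)


first = cache.get("IL-corn-1990-2010", "norm", expensive_fit)
second = cache.get("IL-corn-1990-2010", "norm", expensive_fit)
```

Since 10,000 float64 values take 80,000 bytes, a plain in-memory dict keyed the same way might be enough if the number of region and distribution combinations stays small.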
Edit: After reading the comment below and looking into it, I'm going to go with PyTables, but I'm still trying to find the best way to set it up. Is it possible to create a table like the one shown below where, instead of Float32Col, I use createTimeSeriesTable() from scikits.timeseries? Or do I need to create a datetime column for the dates and a Boolean column for the mask, in addition to the Float32Col that stores the data? Or is there a better way to solve this problem?
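from tables import IsDescription, UInt16Col, Float32Col

class Yield(IsDescription):
    geography_id = UInt16Col()          # region identifier
    data = Float32Col(shape=(50, 1))    # 50 x 1 array of yield values per row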
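For comparison, the second option I describe (one row per observation, with explicit date and mask columns) might look roughly like the sketch below. It assumes a recent PyTables with the snake_case API (`open_file`, `create_table`, `read_where`). The names `YieldObs`, `write_yields`, and `read_region` are hypothetical, a `year` column stands in for a full datetime column, and the records are made up.

```python
import os
import tempfile

import tables


class YieldObs(tables.IsDescription):
    geography_id = tables.UInt16Col()   # region identifier
    year = tables.UInt16Col()           # stand-in for a datetime column
    masked = tables.BoolCol()           # True where the observation is missing
    value = tables.Float32Col()         # yield observation


def write_yields(path, records):
    """Write (geography_id, year, masked, value) tuples to a new HDF5 file."""
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "yields", YieldObs)
        row = table.row
        for geo, year, masked, value in records:
            row["geography_id"] = geo
            row["year"] = year
            row["masked"] = masked
            row["value"] = value
            row.append()
        table.flush()


def read_region(path, geography_id):
    """Return the unmasked rows for one region as a structured NumPy array."""
    with tables.open_file(path, mode="r") as h5:
        rows = h5.root.yields.read_where(
            "geography_id == g", condvars={"g": geography_id}
        )
    return rows[~rows["masked"]]        # drop masked observations in NumPy


tmp = tempfile.TemporaryDirectory()     # file disappears on tmp.cleanup()
path = os.path.join(tmp.name, "yields.h5")
write_yields(path, [(17, 1990, False, 120.5),
                    (17, 1991, True, 0.0),
                    (18, 1990, False, 99.0)])
region_rows = read_region(path, 17)
```

Keeping the HDF5 file inside a `tempfile.TemporaryDirectory` is one way to make the PyTables data temporary, but I don't know whether this layout is preferable to the time series table.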
Any help on this would be greatly appreciated.