I am new to PyTables and have a question about saving a compressed pandas DataFrame. My current code is:
import pandas
H5name="C:\\MyDir\\MyHDF.h5"
store=pandas.io.pytables.HDFStore(H5name)
myDF.to_hdf(H5name,"myDFname",append=True)
myDF1=pandas.io.pytables.read_hdf(H5name,"myDFname")
store.close()
When I checked the size of the generated HDF5 file, it (212 KB) was much larger than the CSV source file (58 KB) that I used to create the pandas DataFrame.
So I deleted the HDF5 file and tried recreating it with compression:
store=pandas.io.pytables.HDFStore(H5name,complevel=1)
but the size of the created file did not change. I tried every complevel from 1 to 9, and the size remained the same.
I also tried adding complib:
store=pandas.io.pytables.HDFStore(H5name,complevel=1,complib="zlib")
but that did not change the compression either.
What could be the problem?
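One thing that may be worth checking: in the code above, the `store` object that receives `complevel` is never used for writing, because `to_hdf` is called with the file name and opens the file on its own. The sketch below passes `complevel` and `complib` directly to `to_hdf` and compares file sizes. It assumes a recent pandas with PyTables installed; the helper `hdf_size`, the file name `demo.h5`, and the sample data are made up for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def hdf_size(df, complevel, complib="zlib"):
    """Write df to a fresh HDF5 file and return the file size in bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.h5")  # hypothetical file name
        # The compression options go to the call that actually writes the data.
        df.to_hdf(path, key="myDFname", mode="w", format="table",
                  complevel=complevel, complib=complib)
        return os.path.getsize(path)


if __name__ == "__main__":
    # Made-up, highly repetitive data so that any effect is easy to see.
    demo = pd.DataFrame({"a": np.zeros(200000),
                         "b": np.arange(200000) % 10})
    print("complevel=0:", hdf_size(demo, complevel=0))
    print("complevel=9:", hdf_size(demo, complevel=9))
```

For very small frames the fixed HDF5 overhead can dominate, so the difference may only become visible with more rows.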
Also, ideally, I would like to use compression similar to what R's save function does (for example, in my case the 58 KB file was saved as a 27 KB RData file). Do I need to do any additional serialization in Python to reduce the size?
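For comparison, R's `save` compresses its output with gzip by default, so a rough Python counterpart is to compress the serialized bytes. The sketch below uses only the standard library to compare raw, gzip, and bzip2 sizes for CSV text; the helpers `make_csv_bytes` and `compressed_sizes` and the sample rows are made up for illustration and are not the data from the question.

```python
import bz2
import csv
import gzip
import io


def make_csv_bytes(n_rows):
    """Build a small, made-up CSV in memory and return it as UTF-8 bytes."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "group", "value"])
    for i in range(n_rows):
        writer.writerow([i, "g%d" % (i % 5), (i * 7) % 100])
    return buf.getvalue().encode("utf-8")


def compressed_sizes(raw):
    """Return the byte counts of raw, gzip-compressed and bzip2-compressed data."""
    return {
        "raw": len(raw),
        "gzip": len(gzip.compress(raw, compresslevel=9)),
        "bz2": len(bz2.compress(raw, compresslevel=9)),
    }


if __name__ == "__main__":
    for name, size in compressed_sizes(make_csv_bytes(5000)).items():
        print(name, size)
```

pandas can apply the same idea when writing, for example `myDF.to_csv(path, compression="gzip")` or `myDF.to_pickle(path, compression="gzip")` in recent versions; whether that matches the RData size will depend on the data.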
EDIT:
Python 3.3.3, pandas 0.13.1
EDIT:
CSV size: 487; RData size (from R): 169; bzip2 (complevel = 9): 202; Blosc (complevel = 9): 276.
Of these, R's save gives the smallest file.