Data Compression with HDFStore

Question

I am new to PyTables and have a question about saving a compressed pandas DataFrame. My current code is:

import pandas
# HDF5 file name
H5name="C:\\MyDir\\MyHDF.h5"

# create HDF5 file
store=pandas.io.pytables.HDFStore(H5name)

# write a pandas DataFrame to the HDF5 file created
myDF.to_hdf(H5name,"myDFname",append=True)

# read the pandas DataFrame back from the HDF5 file created
myDF1=pandas.io.pytables.read_hdf(H5name,"myDFname")

# close the file
store.close()

When I checked the size of the generated HDF5 file, it (212 KB) was much larger than the csv source file (58 KB) that I used to create the pandas DataFrame.

So I tried compression: I deleted the HDF5 file and recreated it with

# create HDF5 file
store=pandas.io.pytables.HDFStore(H5name,complevel=1)

and the size of the created file did not change. I tried every complevel from 1 to 9, and the size remained the same.

I also tried specifying complib:

# create HDF5 file
store=pandas.io.pytables.HDFStore(H5name,complevel=1,complib="zlib")

but that did not change the compression either.

What could be the problem?

Also, ideally, I would like compression similar to what R achieves with its save function (in my case, the 58 KB file was saved as 27 KB in RData). Do I need to do any additional serialization in Python to reduce the size?

EDIT:

Python 3.3.3, pandas 0.13.1

EDIT: csv size 487; RData size (saved from R) 169; Bzip2 (complevel = 9) 202; Blosc (complevel = 9) 276.

, R - save, , , .

python pandas r hdf5 pytables

uday 17 . '14 19:57

Jeff · Answer 1 · 2014-02-17T22:53:36+0000

. HDF5 ; 64 - . - , , .
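
Separately from file-size overhead, one detail in the question's code may be worth checking: complevel and complib are given to an HDFStore handle that is never used for writing, while to_hdf is called with the file path and no compression arguments. The following is a minimal sketch, assuming pandas with the PyTables package installed, that passes the compression arguments to to_hdf directly and compares the resulting file sizes. The sample DataFrame, the helper hdf_size, and the file names are hypothetical stand-ins, not part of the original post.

```python
import os
import tempfile

import numpy as np
import pandas as pd  # writing HDF5 also requires the PyTables package ("tables")

# Hypothetical, highly compressible sample data standing in for the question's myDF.
df = pd.DataFrame({
    "id": np.arange(50000),
    "group": np.repeat(np.arange(50), 1000),
    "value": np.zeros(50000),
})


def hdf_size(frame, path, **to_hdf_kwargs):
    """Write frame to a fresh HDF5 file and return the file size in bytes."""
    if os.path.exists(path):
        os.remove(path)
    # format="table" matches what append=True produces in the question's code.
    frame.to_hdf(path, key="myDFname", format="table", **to_hdf_kwargs)
    return os.path.getsize(path)


tmpdir = tempfile.mkdtemp()

# No compression arguments: data is written uncompressed.
plain_size = hdf_size(df, os.path.join(tmpdir, "plain.h5"))

# Compression arguments passed to the call that actually writes the data.
zlib_size = hdf_size(df, os.path.join(tmpdir, "zlib.h5"),
                     complevel=9, complib="zlib")

print("uncompressed:", plain_size, "bytes")
print("zlib level 9:", zlib_size, "bytes")
```

On data this repetitive the compressed file should come out smaller; on a very small DataFrame the fixed overhead of the HDF5 container can still dominate, so the difference may be hard to see.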

msgpack soln . HDF5 .
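
The msgpack route depended on pandas' to_msgpack, which was deprecated in pandas 0.25 and removed in 1.0, so it is not available in current versions. As a swapped-in alternative (not the method named in the answer), here is a hedged sketch of a gzip-compressed pickle, which is broadly similar in spirit to a compressed binary save of a single object. The sample DataFrame and file names are hypothetical.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the question's myDF.
df = pd.DataFrame({
    "id": np.arange(20000),
    "group": np.repeat(np.arange(20), 1000),
    "value": np.zeros(20000),
})

tmpdir = tempfile.mkdtemp()
csv_path = os.path.join(tmpdir, "data.csv")
pkl_path = os.path.join(tmpdir, "data.pkl.gz")

# Plain-text baseline for comparison.
df.to_csv(csv_path, index=False)

# Binary serialization of the whole DataFrame, gzip-compressed on write.
df.to_pickle(pkl_path, compression="gzip")

# Read it back to confirm the round trip preserves the data.
restored = pd.read_pickle(pkl_path, compression="gzip")

print("csv:", os.path.getsize(csv_path), "bytes")
print("gzip pickle:", os.path.getsize(pkl_path), "bytes")
print("round trip ok:", restored.equals(df))
```

A compressed pickle is compact for small frames but can only be read back as a whole and only from Python, whereas HDF5 supports appending and partial reads, which tends to matter more as the data grows.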
