pandas has a handy function, read_table(), but reading a huge file with it raises a MemoryError.
Since I only need the lines that satisfy a certain condition, I am looking for a way to load only those lines.
This can be done with a temporary file:
    import pandas

    with open(hugeTdaFile) as huge:
        with open(hugeTdaFile + ".partial.tmp", "w") as tmp:
            tmp.write(huge.readline())  # copy the first (header) line unconditionally
            for line in huge:
                if SomeCondition(line):
                    tmp.write(line)

    t = pandas.read_table(tmp.name)
Is there a way to avoid such use of a temporary file?
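One possible direction, sketched under assumptions: read_table() accepts a chunksize argument and then returns an iterator of DataFrames, so each chunk can be filtered before the pieces are concatenated. The helper name read_table_filtered, the keep callback, and the sample data below are hypothetical and only for illustration. Note that the condition here is applied to parsed rows of a DataFrame rather than to raw text lines as SomeCondition(line) is, and the sketch assumes the file has at least one data row.

```python
import io

import pandas as pd


def read_table_filtered(source, keep, chunksize=100_000, **kwargs):
    """Read a tab-delimited source chunk by chunk, keeping only matching rows.

    `keep` is a hypothetical callback that takes a DataFrame chunk and
    returns a boolean mask (one value per row).
    """
    pieces = []
    # With chunksize set, read_table yields DataFrames of at most that many rows,
    # so only one chunk plus the rows kept so far are in memory at once.
    for chunk in pd.read_table(source, chunksize=chunksize, **kwargs):
        pieces.append(chunk[keep(chunk)])
    return pd.concat(pieces, ignore_index=True)


# Small in-memory stand-in for the huge file (illustrative data only).
sample = io.StringIO("name\tvalue\na\t1\nb\t20\nc\t3\nd\t40\n")
result = read_table_filtered(sample, lambda df: df["value"] > 10, chunksize=2)
print(result)
```

If the condition really has to be tested on the raw line, a similar effect might be had by collecting the header and the matching lines into an io.StringIO and passing that to read_table(), at the cost of holding the filtered text in memory.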