Lazily filtering a file before reading

Suppose I have a BIG file with some lines that I want to ignore, and a function (file_function) that takes a file object. Can I return a new file object whose lines satisfy some condition without first reading the entire file? This laziness is the important part.

Note: I could just write a temporary file without these lines, but this is not ideal.

For example, suppose I had a CSV file (with a bad line):

1,2
ooops
3,4

My first attempt was to wrap the file in a new file-like object and override readline:

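class FileWithoutCondition(object):
    """Wrap a file object and skip lines that fail `condition`."""
    def __init__(self, f, condition):
        self.f = f
        self.condition = condition
    def readline(self):
        while True:
            x = self.f.readline()
            # An empty string means end of file: return it instead of looping forever.
            if not x or self.condition(x):
                return x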

This works if file_function only uses readline, but not if it requires other features of the file object.

with open('file_name', 'r') as f:
    f1 = FileWithoutCondition(f, lambda x: x != 'ooops\n')
    result = file_function(f1)
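For what it's worth, when the consumer only iterates over lines, a plain generator may already be enough. The sketch below assumes the consumer is csv.reader, which accepts any iterable of strings; filtered_lines is a hypothetical helper name, and io.StringIO merely stands in for the real file so the example is self-contained.

```python
import csv
import io


def filtered_lines(f, condition):
    """Yield only the lines of f that satisfy condition, one at a time."""
    for line in f:
        if condition(line):
            yield line


# io.StringIO stands in for a real (large) file in this demonstration.
source = io.StringIO("1,2\nooops\n3,4\n")

# csv.reader pulls lines from the generator on demand, so nothing is read ahead.
rows = list(csv.reader(filtered_lines(source, lambda x: x != "ooops\n")))
print(rows)  # [['1', '2'], ['3', '4']]
```

This does not help when the consumer insists on a real file object with read() and friends, which is the case the question is about.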

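Putting the filtered lines into a StringIO would also work, but that is not lazy, since it means reading the whole file first.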

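Ideally, file_function should be treated as a black box, so it cannot be changed to accept something other than a file object.
Is there a standard way to do this kind of lazy (skim-)reading?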

Note: the motivating case is pandas, where providing only readline is not enough for pd.read_csv.


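Use a map-reduce approach with the facilities already in Python. In this example a regular expression matches lines containing GET /index, but any condition can be used instead: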

import re
from collections import defaultdict

# Capture whatever sits between "GET /index" and ".html".
pattern = re.compile(r'GET /index(.*)\.html')

# define FILE appropriately.
# map
# the condition here serves to filter lines that can not match.
matches = (pattern.search(line) for line in open(FILE, "r") if 'GET' in line)
mapp    = (match.group(1) for match in matches if match)

# now reduce, lazy:
count = defaultdict(int)
for request in mapp:
    count[request] += 1

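Since everything here is a generator, the file is read lazily, one line at a time, even when it is very large. mmap may also be worth a look.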
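If file_function really needs a file object rather than an iterable, one possible approach is to put the generator behind the standard io layers. The following is only a sketch under assumptions: FilteredRawStream and filtered_file are hypothetical names, the text is assumed to be UTF-8, and io.StringIO stands in for the real file. The stream is read-only and not seekable, so whether a given consumer (pd.read_csv, for instance) accepts it depends on which methods that consumer calls.

```python
import io


class FilteredRawStream(io.RawIOBase):
    """Read-only raw stream that serves the lines of an iterator as bytes."""

    def __init__(self, lines, encoding="utf-8"):
        super().__init__()
        self._lines = iter(lines)
        self._encoding = encoding
        self._pending = b""  # bytes pulled from the iterator but not yet served

    def readable(self):
        return True

    def readinto(self, buffer):
        # Pull the next line only when nothing is left over from the last one.
        while not self._pending:
            try:
                self._pending = next(self._lines).encode(self._encoding)
            except StopIteration:
                return 0  # 0 bytes signals end of stream
        n = min(len(buffer), len(self._pending))
        buffer[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n


def filtered_file(f, condition, encoding="utf-8"):
    """Return a text file object exposing only the lines of f that pass condition."""
    lines = (line for line in f if condition(line))
    raw = FilteredRawStream(lines, encoding)
    return io.TextIOWrapper(io.BufferedReader(raw), encoding=encoding)


# io.StringIO stands in for a real (large) file in this demonstration.
source = io.StringIO("1,2\nooops\n3,4\n")
f1 = filtered_file(source, lambda x: x != "ooops\n")
first = f1.readline()  # '1,2\n'
rest = f1.read()       # '3,4\n'
```

The io layers buffer in chunks, so the source is consumed a little ahead of what the caller has asked for, but never in its entirety unless the caller reads to the end.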
