Use Python to grep lines from one file using patterns from another file

This is similar to the question about a "grep" alternative in Python, but the difficulty here is that the pattern to grep for is a variable: each line of another file. I cannot figure out how to do this using functions like re.findall().

file1:

1  20  200
1  30  300

file2:

1  20  200  0.1  0.5
1  20  200  0.3  0.1
1  30  300  0.2  0.6
1  40  400  0.9  0.6
2  50  300  0.5  0.7

Each line of file1 is a pattern, and I need to find the lines in file2 that match it. The result should then be:

    1  20  200  0.1  0.5
    1  20  200  0.3  0.1
    1  30  300  0.2  0.6

I am trying to solve this with either bash or Python, but cannot figure it out. Thanks.

+3
4 answers

Here's a non-regex based solution:

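with open('/tmp/file1') as f:
  # Strip newlines once; skip blank lines, since an empty prefix matches everything
  lines1 = [x.strip() for x in f if x.strip()]

with open('/tmp/file2') as f:
  for line in f:
    if any(line.startswith(x) for x in lines1):
      print(line, end='')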
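As a variation on the answer above, str.startswith also accepts a tuple of prefixes and returns True if any of them matches, so the any(...) generator can be replaced by a single call. This is a minimal sketch: the question's sample data is inlined as strings instead of being read from /tmp/file1 and /tmp/file2, and prefix_grep is a hypothetical helper name.

```python
# Sample data from the question, inlined so the sketch runs without files
file1_text = "1  20  200\n1  30  300\n"
file2_text = (
    "1  20  200  0.1  0.5\n"
    "1  20  200  0.3  0.1\n"
    "1  30  300  0.2  0.6\n"
    "1  40  400  0.9  0.6\n"
    "2  50  300  0.5  0.7\n"
)


def prefix_grep(pattern_lines, data_lines):
    """Return the data lines that start with any non-blank pattern line."""
    # str.startswith accepts a tuple and is True if any element is a prefix
    prefixes = tuple(p.strip() for p in pattern_lines if p.strip())
    return [line for line in data_lines if line.startswith(prefixes)]


for line in prefix_grep(file1_text.splitlines(), file2_text.splitlines()):
    print(line)
```

With an empty tuple, startswith returns False, so a file1 containing only blank lines selects nothing.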
+4

Build one regular expression by escaping each line of file1 and joining them with |:

import re

with open('file1') as file1:
    # Escape each line so it is matched literally; skip blank lines, which would match everything
    patterns = "|".join(re.escape(line.rstrip()) for line in file1 if line.strip())

regexp = re.compile(patterns)
with open('file2') as file2:
    for line in file2:
        if regexp.search(line):
            print(line.rstrip())

Output:

1  20  200  0.1  0.5
1  20  200  0.3  0.1
1  30  300  0.2  0.6
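Because regexp.search looks for a pattern anywhere in the line, a line such as "11  20  200 ..." or "1  20  2000 ..." would also be selected by the pattern "1  20  200". A minimal sketch of a stricter variant, assuming patterns should match only at the start of a line and end at a field boundary; build_regex and regex_grep are hypothetical helper names, and the sample lines are made up to show the difference.

```python
import re


def build_regex(pattern_lines):
    """Compile one regex for a line that starts with any complete pattern line."""
    alternatives = "|".join(re.escape(p.strip()) for p in pattern_lines if p.strip())
    if not alternatives:
        return re.compile(r"(?!)")  # no patterns: never match anything
    # Callers use .match(), which anchors at the line start;
    # (?=\s|$) requires the pattern to end at whitespace or end of line
    return re.compile(r"(?:%s)(?=\s|$)" % alternatives)


def regex_grep(pattern_lines, data_lines):
    regexp = build_regex(pattern_lines)
    return [line for line in data_lines if regexp.match(line)]


sample = ["1  20  200  0.1  0.5", "11  20  200  0.3  0.1",
          "1  20  2000  0.4  0.4", "1  30  300  0.2  0.6"]
for line in regex_grep(["1  20  200", "1  30  300"], sample):
    print(line)
```

Only the first and last sample lines are printed; the search-based version above would also print the second and third.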

Or, in bash:

grep -f file1 file2 
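Since the lines of file1 contain no regex metacharacters, grep -f file1 file2 behaves the same here as grep -F -f file1 file2, where -F makes grep treat each pattern as a fixed string. For comparison, a rough pure-Python analogue of that fixed-string mode is a plain substring test; this is a sketch under that assumption, and fixed_string_grep is a hypothetical name.

```python
def fixed_string_grep(pattern_lines, data_lines):
    """Rough analogue of `grep -F -f patterns data`: keep lines containing any pattern."""
    patterns = [p.rstrip("\n") for p in pattern_lines]
    # Plain substring test; as with grep, an empty pattern matches every line
    return [line for line in data_lines if any(p in line for p in patterns)]


sample_patterns = ["1  20  200\n", "1  30  300\n"]
sample_data = ["1  20  200  0.1  0.5", "11  20  200  0.3  0.1", "1  40  400  0.9  0.6"]
for line in fixed_string_grep(sample_patterns, sample_data):
    print(line)
```

The made-up line "11  20  200 ..." is kept because the pattern occurs inside it. grep -f does the same; grep's -w option, which requires matches to form whole words, is one way to rule that out.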
+1

I think you need to write your own loop:

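import re

with open('file1') as f1, open('file2') as f2:
  # One pattern per non-blank line of file1, escaped so it is matched literally
  file1patterns = [re.compile(re.escape(l.rstrip())) for l in f1 if l.strip()]
  lineToMatch = 0
  matchedLines = []
  for line in f2:
    if file1patterns[lineToMatch].match(line):
      matchedLines.append(line)
      lineToMatch += 1
    else:
      # Start over, letting this line begin a new run if it matches the first pattern
      matchedLines = [line] if file1patterns[0].match(line) else []
      lineToMatch = len(matchedLines)
    # Every pattern matched, in order, on consecutive lines of file2
    if len(matchedLines) == len(file1patterns):
      print(''.join(matchedLines), end='')
      lineToMatch = 0
      matchedLines = []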

(Not tested Python, but hopefully enough to get you going.)

0

Step 1: Read all the lines from file1, split each one into fields, and add them as tuples to a set. A set makes the membership test in the next step fast.

with open('file1', 'r') as f:
    file1_lines = set([tuple(line.strip().split()) for line in f])
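To make the data structure concrete, here is a minimal sketch of what Step 1 produces for the question's file1, with io.StringIO standing in for the real file (an assumption so the snippet runs on its own). Because lines are split into fields, the amount of whitespace between columns no longer matters.

```python
import io

# io.StringIO stands in for open('file1'), holding the question's file1 contents
f = io.StringIO("1  20  200\n1  30  300\n")
file1_lines = {tuple(line.split()) for line in f if line.strip()}

print(sorted(file1_lines))  # [('1', '20', '200'), ('1', '30', '300')]
# Membership is tested on fields, so the column spacing no longer matters
print(tuple("1 30 300 0.2 0.6".split()[:3]) in file1_lines)  # True
```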

Step 2: Keep only the lines of file2 whose first three fields match one of the lines in file1:

with open('file2', 'r') as f2:
    # Keep lines whose first three fields form one of the tuples from file1
    for line in filter(lambda x: tuple(x.split()[:3]) in file1_lines, f2):
        print(line, end='')
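The [:3] slice above assumes every line of file1 has exactly three fields. A minimal self-contained sketch that instead uses whatever key lengths appear in file1; field_grep is a hypothetical name, and the sample file2 is the question's data tweaked (one line single-spaced, one starting with 11) to show that spacing is ignored while partial numbers are not matched.

```python
def field_grep(pattern_lines, data_lines):
    """Keep data lines whose leading fields equal all fields of some pattern line."""
    keys = {tuple(p.split()) for p in pattern_lines if p.strip()}
    # Compare prefixes of every length present in file1 instead of hard-coding [:3]
    lengths = {len(k) for k in keys}
    return [line for line in data_lines
            if any(tuple(line.split()[:n]) in keys for n in lengths)]


file1_text = "1  20  200\n1  30  300\n"
file2_text = """1  20  200  0.1  0.5
1 20 200 0.3 0.1
1  30  300  0.2  0.6
11  20  200  0.4  0.4
2  50  300  0.5  0.7
"""
matches = field_grep(file1_text.splitlines(), file2_text.splitlines())
for line in matches:
    print(line)
```

Comparing whole fields avoids the substring false positives of the regex and grep approaches, at the cost of assuming whitespace-separated columns.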
0
