Finding / replacing substrings with annotations in an ASCII file in Python

Question

I have a coding problem in a bioinformatics project I'm working on. My task is to extract motif sequences from a database and use them to annotate a sequence alignment file. The alignment file is plain text, so the annotation does not need to be anything complicated: at its simplest, just replacing the extracted sequences with asterisks in the alignment file itself.

I have a script that scans a database file, extracts all the sequences I need, and writes them to an output file. Given a query, I need to read these sequences back and match them against the corresponding substrings in the ASCII alignment files. Finally, for each occurrence of a motif sequence (a substring of a very large string of characters), I want to replace the motif sequence XXXXXXX with a run of asterisks (*).

The code I use is as follows (11SGLOBULIN is the name of the protein record in the database):

motif_file = open('/users/myfolder/final motifs_11SGLOBULIN','r')
align_file = open('/Users/myfolder/alignmentfiles/11sglobulin.seqs', 'w+') 
finalmotifs = motif_file.readlines()
seqalign = align_file.readlines() 


for line in seqalign:
    if motif[i] in seqalign:  # I have stored all motifs in a list called "motif"
        replace(motif, '*****')

Instead of each motif being replaced with a series of asterisks, the entire file is deleted. Can anyone see why this is happening?

+3

python text-processing bioinformatics biopython

Spyros May 03 '11 at 13:14

4

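The problem is the w+ mode. Opening a file with w+ truncates it (see the documentation for open()): http://docs.python.org/library/functions.html#open. Your seq file is emptied by this line: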

align_file = open('/Users/myfolder/alignmentfiles/11sglobulin.seqs', 'w+')

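The replace call also does nothing as written. replace is a string method that returns a new string, so it has to be called on the line and its result assigned.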

Open align_file for reading instead, and write the modified lines to a new file.
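The truncation described in the question can be reproduced on a throwaway file. The following is a minimal sketch, not the asker's actual setup: the scratch path and the sample sequence are made up for illustration. It also shows that str.replace returns a new string rather than changing the original.

```python
import os
import tempfile

# Hypothetical scratch file standing in for the alignment file.
path = os.path.join(tempfile.mkdtemp(), "demo.seqs")
with open(path, "w") as fh:
    fh.write("MKVLAAGIVG\n")

# Mode 'r' leaves the contents alone.
with open(path, "r") as fh:
    lines_read_mode = fh.readlines()

# Mode 'w+' truncates the file as soon as it is opened,
# so there is nothing left to read.
with open(path, "w+") as fh:
    lines_wplus_mode = fh.readlines()

# str.replace returns a new string; the original is unchanged
# unless the result is assigned back.
seq = "MKVLAAGIVG"
seq.replace("AAG", "***")        # result discarded
unchanged = seq
seq = seq.replace("AAG", "***")  # result kept
```

After the 'w+' open, the file on disk is empty, which matches the symptom in the question.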

+2

Alex Stoddard May 03 '11 at 14:49

You can simplify this a bit by changing the innermost while loop:

while True:
    x = seq.find(motif)
    if x >= 0:
      seq = seq[:x] + redact + seq[x+len(motif):]
    else:
      break

to

if motif in seq:
  seq = seq.replace(motif, redact)
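As a quick check of that simplification, here is a small sketch comparing the two forms on a made-up sequence. The function names and the sample data are hypothetical. The two forms agree as long as the motif is non-empty and the replacement text cannot create a new match, which holds for asterisks unless a motif itself contains an asterisk.

```python
def redact_with_find(seq, motif, redact):
    # The find/slice loop. Assumes motif is non-empty and does not
    # occur inside redact; otherwise this loop would never terminate.
    while True:
        x = seq.find(motif)
        if x >= 0:
            seq = seq[:x] + redact + seq[x + len(motif):]
        else:
            break
    return seq


def redact_with_replace(seq, motif, redact):
    # str.replace handles every non-overlapping occurrence in one call.
    return seq.replace(motif, redact)


sample = "GGAAGTTAAGCC"  # made-up sequence with two AAG occurrences
a = redact_with_find(sample, "AAG", "*****")
b = redact_with_replace(sample, "AAG", "*****")
```

The `if motif in seq` guard is optional, since replace simply returns the string unchanged when there is no match.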

+1

John Gaines Jr. May 03 '11 at 14:34

Thanks to everyone, I really appreciate the feedback, and I apologize for the late reply. As many of you noted, what I had to do was open the file to be annotated in read mode and write the annotations to a new file. This bit of code did the trick:

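# motif_file is the motif file handle opened as in the question.
align_file_rmode = open('/Users/spyros/folder1/python/printsmotifs/alignfiles/query', 'r')
# Note: this is the same path as the read handle, so annotated lines are
# appended after the original content; use a different path for a separate file.
align_file_amode = open('/Users/spyros/folder1/python/printsmotifs/alignfiles/query', 'a+')

finalmotifs = motif_file.readlines()
seqalign = align_file_rmode.readlines()

for line in seqalign:
    for item in finalmotifs:
        item = item.strip().upper()
        if item in line:
            # Mask with one '$' per character so the line length is unchanged.
            line = line.replace(item, '$' * len(item))
    # Write each line exactly once, after all motifs have been applied.
    align_file_amode.write(line)

motif_file.close()
align_file_rmode.close()
align_file_amode.close()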
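The masking step can also be pulled out into a small function and tried on in-memory data before touching any files. This is only a sketch: mask_motifs, the motif list, and the two alignment lines are hypothetical examples, not taken from the real data.

```python
def mask_motifs(line, motifs, mask_char="$"):
    """Return line with every motif occurrence masked.

    One mask character is used per motif character, so the
    column positions in the alignment are unchanged.
    """
    for motif in motifs:
        motif = motif.strip().upper()
        if motif:  # skip blank entries
            line = line.replace(motif, mask_char * len(motif))
    return line


# Hypothetical stand-ins for the motif file lines and the alignment lines.
motifs = ["aag\n", "tvc\n", "\n"]
lines = ["seq1  MKAAGLTVCQ\n", "seq2  MKPPGLSSCQ\n"]
masked = [mask_motifs(line, motifs) for line in lines]
```

Here the first line becomes `seq1  MK$$$L$$$Q` and the second is left as it was, with both lines keeping their original length.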

0

Spyros May 07, '11 at 15:48

MattH · Accepted Answer · 2011-05-03T14:16:08+0000

Something like this should work. The with statement that opens two files at once needs Python 2.7 (or 3.1+).

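# One motif per line; blank lines are skipped because an empty motif
# would make the while loop below run forever.
motifs = [ x.strip() for x in open('final motifs_11SGLOBULIN','r') if x.strip() ]
redact = '*****'

with open('11sglobulin.seqs','r') as data_in, open('11sglobulin.seqs.new','w') as data_out:
  for seq in data_in:
    for motif in motifs:
      # Replace every occurrence of this motif in the current line.
      while True:
        x = seq.find(motif)
        if x >= 0:
          seq = seq[:x] + redact + seq[x+len(motif):]
        else:
          break
    # Write each processed line once, after all motifs have been applied.
    data_out.write(seq)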
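A different technique, not the one used in this answer, is to compile all motifs into a single regular expression and mask each match in one pass. The sketch below assumes motifs are literal strings (hence re.escape) and uses a length-preserving mask instead of the fixed '*****'. The helper names and sample data are hypothetical.

```python
import re


def build_motif_pattern(motifs):
    # Drop blanks and duplicates; put longer motifs first so a longer
    # motif wins over a shorter one that is a prefix of it.
    cleaned = sorted({m.strip() for m in motifs if m.strip()},
                     key=len, reverse=True)
    return re.compile("|".join(re.escape(m) for m in cleaned))


def redact_line(line, pattern, mask_char="*"):
    # Replace each match with as many mask characters as it is long.
    return pattern.sub(lambda match: mask_char * len(match.group(0)), line)


# Made-up motifs and sequence line for illustration.
pattern = build_motif_pattern(["AAG\n", "AAGLT\n", "SSC\n"])
out = redact_line("MKAAGLTVCQSSC\n", pattern)
```

Unlike the nested loops, this scans each line once, and matches never overlap, so it may behave differently from repeated replace calls when motifs overlap each other.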
