Finding / replacing substrings with annotations in an ASCII file in Python

I have a coding problem in a bioinformatics project I'm working on. Basically, my task is to extract motif sequences from a database and use this information to annotate a sequence alignment file. The alignment file is plain text, so the annotation does not need to be anything complicated: at most, replacing the extracted sequences with asterisks in the alignment file itself.

I have a script that scans a database file, extracts all the sequences I need and writes them to an output file. Given a query, I need to read these sequences and match them against the corresponding substrings in the ASCII alignment files. Finally, for each occurrence of a motif sequence (a substring of a very large string of characters), I would replace the motif sequence XXXXXXX with a sequence of asterisks *.

The code I use is as follows (11SGLOBULIN is the name of the protein record in the database):

motif_file = open('/users/myfolder/final motifs_11SGLOBULIN','r')
align_file = open('/Users/myfolder/alignmentfiles/11sglobulin.seqs', 'w+') 
finalmotifs = motif_file.readlines()
seqalign = align_file.readlines() 


for line in seqalign:
    if motif[i] in seqalign:  # I have stored all motifs in a list called "motif"
        replace(motif, '*****') 

Instead of each motif being replaced by a series of asterisks, the entire file ends up empty. Can anyone see why this is happening?


An approach along these lines, for Python 2.7:

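# Read one motif per line, skipping blank lines (an empty motif would make the loop below run forever)
motifs = [ x.strip() for x in open('final motifs_11SGLOBULIN','r') if x.strip() ]
redact = '*****'

with open('11sglobulin.seqs','r') as data_in, open('11sglobulin.seqs.new','w') as data_out:
  for seq in data_in:
    for motif in motifs:
      # Replace occurrences one at a time until none are left
      while True:
        x = seq.find(motif)
        if x >= 0:
          seq = seq[:x] + redact + seq[x+len(motif):]
        else:
          break
    # Write every line, changed or not, to the new file
    data_out.write(seq)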

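The problem is the w+ mode. Opening a file with w+ truncates it to zero length (see the open() documentation: http://docs.python.org/library/functions.html#open). So seqalign is empty, and the file is wiped by this line: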

align_file = open('/Users/myfolder/alignmentfiles/11sglobulin.seqs', 'w+')

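Also, replace is a method of string objects and returns a new string, so it has to be called on the line and its result assigned back.

Finally, nothing is ever written back. Open align_file for reading, and write the modified lines to a separate output file.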
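The effect of the w+ mode can be checked on a throwaway file. The following is only a minimal sketch: the temporary file and its two sample lines are made up for illustration and are not the asker's data.

```python
import os
import tempfile

# Create a temporary file with two lines of hypothetical alignment data.
fd, path = tempfile.mkstemp(suffix=".seqs")
with os.fdopen(fd, "w") as handle:
    handle.write("MKTAYIAKQR\nGGSSLLVVAA\n")

# Mode 'r' leaves the content untouched.
with open(path, "r") as handle:
    lines_r = handle.readlines()

# Mode 'w+' truncates the file as soon as it is opened,
# so there is nothing left to read.
with open(path, "w+") as handle:
    lines_w = handle.readlines()

size_after = os.path.getsize(path)
os.remove(path)

print(len(lines_r), len(lines_w), size_after)
```

After the second open the file is empty on disk, which matches the behaviour described in the question.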


You can simplify this a bit by changing the innermost while loop:

while True:
    x = seq.find(motif)
    if x >= 0:
      seq = seq[:x] + redact + seq[x+len(motif):]
    else:
      break

to

if motif in seq:
  seq = seq.replace(motif, redact)
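
As a quick check that the two forms agree, here is a small sketch. The function names and the sample sequence are hypothetical, and it assumes the motif is non-empty and does not occur inside the redaction string.

```python
def redact_with_loop(seq, motif, redact):
    # The find/slice loop from the earlier answer, wrapped in a function.
    while True:
        x = seq.find(motif)
        if x >= 0:
            seq = seq[:x] + redact + seq[x + len(motif):]
        else:
            break
    return seq


def redact_with_replace(seq, motif, redact):
    # str.replace substitutes every non-overlapping occurrence in one call
    # and returns the string unchanged when the motif is absent.
    return seq.replace(motif, redact)


sample = "AAGGTTAAGGCC"  # made-up sequence for illustration
looped = redact_with_loop(sample, "AAGG", "*****")
replaced = redact_with_replace(sample, "AAGG", "*****")
no_match = redact_with_replace(sample, "TTTT", "*****")
print(looped, replaced, no_match)
```

Because replace already returns the input unchanged when nothing matches, the `if motif in seq` guard is optional.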

Thanks to everyone, I really appreciate the feedback, and I apologize for the answer. As many noted, what I had to do was open the file for reading and write the annotations separately, in append mode. This bit of code did the trick:

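# motif_file is the motif list, opened for reading as in the question
align_file_rmode = open('/Users/spyros/folder1/python/printsmotifs/alignfiles/query', 'r')
align_file_amode = open('/Users/spyros/folder1/python/printsmotifs/alignfiles/query', 'a+')

finalmotifs = motif_file.readlines()
seqalign = align_file_rmode.readlines()

for line in seqalign:
   annotated = line
   for item in finalmotifs:
      item = item.strip().upper()
      if item in annotated:
         # mask the motif with '$' characters of the same length
         annotated = annotated.replace(item, '$' * len(item))
   if annotated != line:
      # append each annotated line once, after all motifs have been applied
      align_file_amode.write(annotated)

motif_file.close()
align_file_rmode.close()
align_file_amode.close()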
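
The same idea can be sketched as a function that works on lists of lines, which keeps the masking logic separate from the file handling. This is an illustration under assumptions, not the poster's code: the name `annotate_lines`, the `mask_char` parameter and the sample data are all hypothetical, and sorting the motifs longest-first is an added design choice so that a short motif does not break up a longer one that contains it.

```python
def annotate_lines(lines, motifs, mask_char="*"):
    """Return copies of `lines` with every motif masked character for character."""
    # Normalise the motifs the same way as above and drop blank entries.
    cleaned = [m.strip().upper() for m in motifs]
    cleaned = [m for m in cleaned if m]
    # Longest first, so overlapping short motifs cannot split a long one.
    cleaned.sort(key=len, reverse=True)

    annotated = []
    for line in lines:
        for motif in cleaned:
            # A mask of the same length keeps the alignment columns intact.
            line = line.replace(motif, mask_char * len(motif))
        annotated.append(line)
    return annotated


# Made-up alignment lines and motifs, for illustration only.
sample_lines = ["seq1  MKTAYIAKQR\n", "seq2  GGSSLLVVAA\n"]
sample_motifs = ["tayia\n", "\n", "LLVV"]
result = annotate_lines(sample_lines, sample_motifs)
print("".join(result))
```

With real files, the lines would come from `readlines()` on a handle opened in 'r' mode, and the result would go to a different path opened in 'w' mode, so the input is never truncated.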