I have a coding problem in a bioinformatics project I'm working on. My task is to extract motif sequences from a database and use them to annotate a sequence alignment file. The alignment file is plain text, so the annotation is nothing complicated: at most, replacing the extracted sequences with asterisks in the alignment file itself.
I have a script that scans a database file, extracts all the sequences I need, and writes them to an output file. For a given query, I need to read these sequences and match them against the corresponding substrings in the ASCII alignment files. Finally, for each occurrence of a motif sequence (a substring of a very large string of characters), I want to replace the motif XXXXXXX with a run of asterisks (*).
The code I use is as follows (11SGLOBULIN is the name of the protein record in the database):
motif_file = open('/users/myfolder/final motifs_11SGLOBULIN','r')
align_file = open('/Users/myfolder/alignmentfiles/11sglobulin.seqs', 'w+')
finalmotifs = motif_file.readlines()
seqalign = align_file.readlines()
for line in seqalign:
    if motif[i] in seqalign:
        replace(motif, '*****')
Instead of each motif being replaced with a series of asterisks, the entire contents of the file are deleted. Can anyone explain why this is happening?
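For reference, one likely cause is the 'w+' mode: in Python, opening a file with 'w' or 'w+' truncates it immediately, so readlines() then returns an empty list and the original contents are gone. Below is a minimal sketch of the intended behaviour, not a definitive solution. The function names mask_motifs and annotate_file are hypothetical, and it assumes one motif per line in the motif file, that each motif sits on a single line of the alignment, and that asterisk runs should match the motif length so the alignment columns stay lined up.

```python
def mask_motifs(alignment_text, motifs):
    """Replace every occurrence of each motif with asterisks of equal length."""
    # Longest motifs first, so a short motif does not break up a longer one.
    for motif in sorted(motifs, key=len, reverse=True):
        if motif:  # skip empty strings
            alignment_text = alignment_text.replace(motif, '*' * len(motif))
    return alignment_text


def annotate_file(motif_path, align_path):
    """Mask all motifs listed in motif_path inside the file at align_path."""
    # Assumption: one motif per line; newlines and blank lines are dropped.
    with open(motif_path, 'r') as motif_file:
        motifs = [line.strip() for line in motif_file if line.strip()]
    # Open for reading first: 'r' leaves the file contents intact.
    with open(align_path, 'r') as align_file:
        text = align_file.read()
    # Reopen for writing only now; 'w' truncates, but the text is already in memory.
    with open(align_path, 'w') as align_file:
        align_file.write(mask_motifs(text, motifs))
```

Calling annotate_file with the motif file and the alignment file from the question would rewrite the alignment in place, so it may be safer to write to a new file while testing.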