I am trying to create genetic signatures. I have a text file full of DNA sequences, and I want to read each line from it and add every 4-mer (each run of 4 consecutive bases) to a dictionary. For example, given the sequence
ATGATATATCTATCAT
I want to add ATGA, TGAT, GATA, etc. to a dictionary, each with an identifier that increases by 1 only when a new 4-mer is added.
So the dictionary will contain ...
Genetic signature, ID
ATGA, 1
TGAT, 2
GATA, 3
Here is what I have so far:
    import sys

    def main():
        readingFile = open("signatures.txt", "r")
        my_DNA = ""
        DNAseq = {}
        for char in readingFile:
            my_DNA = my_DNA + char
        for char in my_DNA:
            index = 0
            DnaID = 1
            seq = my_DNA[index:index+4]
            if (DNAseq.has_key(seq)):
                index = index + 1
            else:
                DNAseq[seq] = DnaID
                index = index + 1
                DnaID = DnaID + 1
        readingFile.close()

    if __name__ == '__main__':
        main()
Here is my output:
ACTC
ACTC
ACTC
ACTC
ACTC
ACTC
This output suggests that the code is not advancing through the characters of the string: it produces the same 4-mer every time. Please help!
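A possible explanation, offered as a guess rather than a confirmed diagnosis: if `index = 0` and `DnaID=1` sit inside the second `for` loop, they are reset on every pass, so the slice `my_DNA[index:index+4]` would always return the first four characters. Below is a minimal sketch of the sliding-window behaviour described above, under these assumptions: Python 3 (so `in` replaces `has_key`), each line is treated as its own sequence rather than being concatenated, and trailing newlines are stripped. The names `build_signatures`, `next_id`, and `k` are hypothetical and not part of the original code.

```python
def build_signatures(lines, k=4):
    """Map each distinct k-mer to an ID that grows by 1 per new k-mer."""
    signatures = {}
    next_id = 1  # set once, before any loop, so it is never reset
    for line in lines:
        seq = line.strip()  # drop the trailing newline and surrounding spaces
        # Slide a window of width k across the sequence, one base at a time.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer not in signatures:  # only new k-mers get an ID
                signatures[kmer] = next_id
                next_id += 1
    return signatures


# Example with the sequence from the question (a list stands in for the file).
sigs = build_signatures(["ATGATATATCTATCAT"])
for kmer, kmer_id in sigs.items():
    print(kmer, kmer_id)
```

To run it on the real data, an open file object can be passed in place of the list, for example `with open("signatures.txt") as f: sigs = build_signatures(f)`, since iterating over a file yields one line at a time.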