I am new to perl and would like to do what, in my opinion, is the basic DNA sequence string manipulation stored in the rtf file.
Essentially, my file is being read (the file is in FASTA format):
>LM1
AAGTCTGACGGAGCAACGCCGCGTGTATGAAGAAGGTTTTCGGATCGTAA
AGTACTGTCCGTTAGAGAAGAACAAGGATAAGAGTAACTGCTTGTCCCTT
GACGGTATCTAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGG
TAATACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGCGC
GCAGGCGGTCTTTTAAGTCTGATGTGAAAGCCCCCGGCTTAACCGGGGAG
GGTCATTGGAAACTGGAAGACTGGAGTGCAGAAGAGGAGAGTGGAATTCC
ACGTGTAGCGGTGAAATGCGTAGATATGTGGAGGAACACCAGTGGCGAAG
GCGACTCTCTGGTCTGTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGCA
AACAGGATTAGATACCCTGGTAGTCCACGCCGT
What I would like to do is read in my file and print the header (header is> LM1), then match the next DNA sequence GTGCCAGCAGCCGCand then print the previous DNA sequence.
Therefore, my output will look like this:
>LM1
AAGTCTGACGGAGCAACGCCGCGTGTATGAAGAAGGTTTTCGGATCGTAA
AGTACTGTCCGTTAGAGAAGAACAAGGATAAGAGTAACTGCTTGTCCCTT
GACGGTATCTAACCAGAAAGCCACGGCTAACTAC
I wrote the following program:
use strict; use warnings;
open(FASTA, "<seq_V3_V6_130227.rtf") or die "The file could not be found.\n";
while(<FASTA>) {
chomp($_);
if ($_ =~ m/^>/ ) {
my $header = $_;
print "$header\n";
}
my $dna = <FASTA>;
if ($dna =~ /(.*?)GTGCCAGCAGCCGC/) {
print "$dna";
}
}
close(FASTA);
The problem is that my program reads the file line by line, and the output I get is the following:
>LM1
GACGGTATCTAACCAGAAAGCCACGGCTAACTAC
, , $dna , , , . :
$dna (m//) stacked.pl 14, 1113.
- , .