DNA to RNA and protein production with Perl

I am working on a project (I have to implement it in Perl, but I'm not very good at it) that reads a DNA sequence, derives the corresponding RNA, and then splits that RNA into triplets to get the matching amino acid sequence. The steps are:

1) Transcribe the following DNA into RNA, then use the genetic code to translate it into an amino acid sequence.

Example:

TCATAATACGTTTTGTATTCGCCAGCGCTTCGGTGT

2) To transcribe the DNA, first replace each base with its complement (i.e. G for C, C for G, T for A and A for T):

TCATAATACGTTTTGTATTCGCCAGCGCTTCGGTGT
AGTATTATGCAAAACATAAGCGGTCGCGAAGCCACA

Then remember that in RNA the base thymine (T) is replaced by uracil (U). Therefore, our sequence becomes:

AGUAUUAUGCAAAACAUAAGCGGUCGCGAAGCCACA
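
The two substitutions above can be combined into a single `tr///` call. This is only a minimal sketch, assuming the input is uppercase and contains nothing but A, C, G and T; the sub name `transcribe` is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: complement each base and write U instead of T,
# following the convention used in the question (the strand is not reversed).
sub transcribe {
    my ($dna) = @_;
    (my $rna = $dna) =~ tr/GCTA/CGAU/;   # G->C, C->G, T->A, A->U
    return $rna;
}

print transcribe('TCATAATACGTTTTGTATTCGCCAGCGCTTCGGTGT'), "\n";
# prints AGUAUUAUGCAAAACAUAAGCGGUCGCGAAGCCACA
```

Because `tr///` maps every character in one pass, the A produced from a T is not turned into U again.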

To apply the genetic code, split the RNA into triplets (codons):

AGU AUU AUG CAA AAC AUA AGC GGU CGC GAA GCC ACA
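
Splitting the RNA into codons can be sketched with a global regex match in list context. The variable names here are illustrative, and any trailing one or two bases that do not form a full triplet are silently dropped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $rna = 'AGUAUUAUGCAAAACAUAAGCGGUCGCGAAGCCACA';

# In list context, /(...)/g returns every non-overlapping 3-character match.
my @codons = $rna =~ /(...)/g;

print "@codons\n";
# prints AGU AUU AUG CAA AAC AUA AGC GGU CGC GAA GCC ACA
```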

Each codon corresponds to one amino acid. For example, AGU is serine (Ser, S) and AUU is isoleucine (Ile, I). Translating every codon this way gives:

SIMQNISGREAT


How can I do this in Perl?


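The script below reads DNA sequences from STDIN (or from files given as arguments), one sequence per line, and prints the translated amino acid sequence for each. If it meets a codon that is not in the table, for example a "STOP" codon, it prints a message and moves on to the next line.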

#!/usr/bin/perl
use strict;
use warnings;

my %proteins = qw/
    UUU F UUC F UUA L UUG L UCU S UCC S UCA S UCG S UAU Y UAC Y UGU C UGC C UGG W
    CUU L CUC L CUA L CUG L CCU P CCC P CCA P CCG P CAU H CAC H CAA Q CAG Q CGU R CGC R CGA R CGG R
    AUU I AUC I AUA I AUG M ACU T ACC T ACA T ACG T AAU N AAC N AAA K AAG K AGU S AGC S AGA R AGG R
    GUU V GUC V GUA V GUG V GCU A GCC A GCA A GCG A GAU D GAC D GAA E GAG E GGU G GGC G GGA G GGG G
    /;

LINE: while (<>) {
    chomp;

    y/GCTA/CGAU/; # complement each base and write U for T (steps 1 and 2 combined)

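    # Walk the RNA one codon (three bases) at a time.
    foreach my $codon (/(...)/g) {
        if (exists $proteins{$codon}) {
            print $proteins{$codon};
        }
        else {
            # Codon not in the table, e.g. a stop codon (UAA, UAG, UGA).
            print "Whoops, stop state?\n";
            next LINE;
        }
    }
    print "\n";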
}
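
The table in the script leaves out the three stop codons of the standard genetic code (UAA, UAG and UGA), which is why they end up in the "Whoops" branch. One possible variation, sketched here under the assumption that translation should simply end at the first stop codon, maps them to `*`. The sub name `translate` and the hash name `%amino_acid` are hypothetical and not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same codon table as the script, plus the three stop codons mapped to '*'.
my %amino_acid = qw/
    UUU F UUC F UUA L UUG L UCU S UCC S UCA S UCG S UAU Y UAC Y UGU C UGC C UGG W
    CUU L CUC L CUA L CUG L CCU P CCC P CCA P CCG P CAU H CAC H CAA Q CAG Q CGU R CGC R CGA R CGG R
    AUU I AUC I AUA I AUG M ACU T ACC T ACA T ACG T AAU N AAC N AAA K AAG K AGU S AGC S AGA R AGG R
    GUU V GUC V GUA V GUG V GCU A GCC A GCA A GCG A GAU D GAC D GAA E GAG E GGU G GGC G GGA G GGG G
    UAA * UAG * UGA *
    /;

# Hypothetical helper: translate RNA codon by codon and stop at the
# first stop codon. Dies on a triplet that is not in the table.
sub translate {
    my ($rna) = @_;
    my $peptide = '';
    for my $codon ($rna =~ /(...)/g) {
        my $aa = $amino_acid{$codon};
        die "Unknown codon '$codon'\n" unless defined $aa;
        last if $aa eq '*';          # stop codon: end of translation
        $peptide .= $aa;
    }
    return $peptide;
}

print translate('AGUAUUAUGCAAAACAUAAGCGGUCGCGAAGCCACA'), "\n";  # prints SIMQNISGREAT
print translate('AUGGCCUAAGGG'), "\n";                          # prints MA
```

Returning a string instead of printing inside the loop makes the result easier to reuse, and dying on an unknown triplet separates real input errors (such as an N or a lowercase base) from ordinary stop codons.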