BioHaskell: read the FASTA file

Using BioHaskell, how can I read a FASTA file containing amino acid sequences?

I want to be able to:

  • Get a list of String sequences
  • Get a Map String String (from Data.Map) from the FASTA comment (assumed to be unique) to the String sequence
  • Use sequences in algorithms implemented in BioHaskell.

Note: This question intentionally does not show research effort, since it was answered immediately in Q&A style.

1 answer

Extracting plain sequence strings

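From here on, we assume that the file aa.fa contains amino acid sequences in FASTA format. The following example reads it and extracts the sequences as a list of Strings.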

import Bio.Sequence.Fasta (readFasta)
import Bio.Sequence.SeqData (seqdata)
import qualified Data.ByteString.Lazy.Char8 as LB

main = do
    sequences <- readFasta "aa.fa"
    let listOfSequences = map (LB.unpack . seqdata) sequences :: [String]
    -- Just for show, we will print one sequence per line here
    -- This will basically execute putStrLn for each sequence
    mapM_ putStrLn listOfSequences

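readFasta returns IO [Sequence Unknown]. In other words, the type carries no information about whether the sequences contain amino acids or nucleotides.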

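Note that LB.unpack is used here instead of show, because show adds double quotes (") at the beginning and end of the resulting String. LB.unpack works because, as of BioHaskell 0.5.3, SeqData is just a lazy ByteString.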
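
The difference between show and LB.unpack can be checked without BioHaskell at all. The following is a minimal sketch that only needs the bytestring package; exampleData is a hypothetical stand-in for what seqdata would return for one record.

```haskell
import qualified Data.ByteString.Lazy.Char8 as LB

-- Hypothetical stand-in for the sequence data of one FASTA record.
-- With BioHaskell 0.5.3 this value would come from seqdata.
exampleData :: LB.ByteString
exampleData = LB.pack "MGLIFARATNA"

-- show renders the ByteString as a Haskell string literal, quotes included.
viaShow :: String
viaShow = show exampleData

-- LB.unpack yields only the characters of the sequence.
viaUnpack :: String
viaUnpack = LB.unpack exampleData

main :: IO ()
main = do
    putStrLn viaShow    -- prints "MGLIFARATNA" (with the double quotes)
    putStrLn viaUnpack  -- prints MGLIFARATNA
```

For sequence data that is later written to a file or compared against plain Strings, the unquoted form is usually the one you want.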

Converting to AA/Nucleotide sequences

The Unknown type parameter can be changed with castToAmino or castToNuc:

let aaSequences = map castToAmino sequences :: [Sequence Amino]

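Note that these functions currently (BioHaskell 0.5.3) do not perform any validity checks. The resulting [Sequence Amino] or [Sequence Nuc] can be used in the algorithms implemented in BioHaskell.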
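
Because the cast itself does no checking, it may be worth validating the sequence Strings before casting. The following is only a sketch of one possible check: isPlausibleAminoSeq and standardAminoAcids are hypothetical helpers, not part of BioHaskell, and the alphabet is assumed to be the 20 standard one-letter codes.

```haskell
import Data.Char (toUpper)

-- The 20 standard amino acid one-letter codes. Ambiguity codes (B, Z, X),
-- stop (*) and gap (-) symbols are deliberately left out (an assumption of
-- this sketch); extend the alphabet if your data uses them.
standardAminoAcids :: String
standardAminoAcids = "ACDEFGHIKLMNPQRSTVWY"

-- | True if every residue is one of the 20 standard amino acids
-- (case-insensitive). Hypothetical helper, not part of BioHaskell.
-- Note that an empty sequence is accepted.
isPlausibleAminoSeq :: String -> Bool
isPlausibleAminoSeq = all ((`elem` standardAminoAcids) . toUpper)

main :: IO ()
main = do
    print (isPlausibleAminoSeq "MGLIFARATNA")  -- True
    print (isPlausibleAminoSeq "ACGTU")        -- False: U is not in the alphabet
```

A check like this cannot tell DNA from protein reliably, since A, C, G and T are also valid amino acid codes. It only catches characters that are outside the chosen alphabet.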

Looking up a sequence by FASTA header

We now assume that aa.fa contains the sequence

>abc123
MGLIFARATNA...

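Next, we build a Map String String (using Data.Map.Strict in this example) from the FASTA file. This map can then be used to look up a sequence by its header.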

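The lookup yields a Maybe String. The intended behaviour in this example is to print the sequence if it was found, and to print nothing if it was not found in the Map.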

Since Maybe (from Data.Maybe) is an instance of Foldable, Data.Foldable.mapM_ can be used for this task.

import Bio.Sequence.Fasta (readFasta)
import Bio.Sequence.SeqData (Sequence, seqdata, seqheader)
import qualified Data.ByteString.Lazy.Char8 as LB
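-- With GHC older than 7.10, Prelude's own mapM_ must also be hidden:
--   import Prelude hiding (mapM_)
import Data.Foldable (mapM_)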
import qualified Data.Map.Strict as Map

-- | Convert a Sequence to a String tuple (sequence label, sequence)
sequenceToMapTuple :: Sequence a -> (String, String)
sequenceToMapTuple s = (LB.unpack $ seqheader s, LB.unpack $ seqdata s)

main = do
    sequences <- readFasta "aa.fa"
    -- Build the sequence map (by header)
    let sequenceMap = Map.fromList $ map sequenceToMapTuple sequences
    -- Lookup the sequence for the "abc123" header
    mapM_ print $ Map.lookup "abc123" sequenceMap

Note: Thanks to @GabrielGonzalez for suggesting Data.Foldable.mapM_ instead of Data.Maybe.fromJust.
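
The question describes the FASTA headers as "supposedly unique", so it may help to know what happens when they are not. The following self-contained sketch uses hypothetical in-memory (header, sequence) pairs in place of a real file, and duplicateHeaders is an illustrative helper, not a BioHaskell function. It assumes GHC 7.10 or later, where the Prelude's mapM_ works on any Foldable.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical (header, sequence) pairs, standing in for the tuples built
-- from a FASTA file. The header "abc123" appears twice on purpose.
records :: [(String, String)]
records =
    [ ("abc123", "MGLIF")
    , ("def456", "ARATNA")
    , ("abc123", "MKV")
    ]

-- Map.fromList silently keeps the last value for a repeated key.
sequenceMap :: Map.Map String String
sequenceMap = Map.fromList records

-- | Headers that occur more than once in the input, in ascending order.
duplicateHeaders :: [(String, String)] -> [String]
duplicateHeaders rs =
    Map.keys (Map.filter (> 1) counts)
  where
    counts = Map.fromListWith (+) [(h, 1 :: Int) | (h, _) <- rs]

main :: IO ()
main = do
    -- Just value: the action runs once and prints the sequence.
    mapM_ putStrLn (Map.lookup "abc123" sequenceMap)
    -- Nothing: the action does not run, so nothing is printed.
    mapM_ putStrLn (Map.lookup "missing" sequenceMap)
    -- Report headers that would have been overwritten in the map.
    print (duplicateHeaders records)
```

Checking for duplicates before building the map avoids silently losing sequences. Unlike fromJust, the mapM_ form cannot crash on a missing header.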
