[Biopython] FASTA file parsing

Mon Jun 30 06:22:36 UTC 2014

Hello Ismail,

An easy fix for this is to use the built-in function str() on the Seq
object. Parsing the FASTA file with SeqIO.parse() generates SeqRecord
objects, each of which stores a Seq object as its seq attribute. Calling
str() on that attribute gives you the plain sequence string you want.

>>> from Bio import SeqIO
>>> rec_generator = SeqIO.parse("mylibrary.fasta", "fasta")
>>> mydict = {rec.id: str(rec.seq) for rec in rec_generator}
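If you want to try this without a file on disk, here is a minimal, self-contained sketch. It assumes Biopython is installed and uses a small in-memory FASTA text (the IDs and sequences are made up for illustration) in place of mylibrary.fasta. SeqIO.parse() accepts any text handle, so io.StringIO behaves like an open file.

```python
from io import StringIO

from Bio import SeqIO

# Made-up example records standing in for mylibrary.fasta.
fasta_text = """>seq1 first example record
ACTACGGCGAT
>seq2 second example record
GGCTA
TTAGC
"""

# SeqIO.parse yields one SeqRecord per FASTA entry.
records = SeqIO.parse(StringIO(fasta_text), "fasta")

# rec.id is the first word of the header line; str(rec.seq) is the plain
# sequence string, with multi-line sequences joined together.
id_to_seq = {rec.id: str(rec.seq) for rec in records}

print(id_to_seq)  # {'seq1': 'ACTACGGCGAT', 'seq2': 'GGCTATTAGC'}
```

Note that the dictionary key is only the first word of the header; the rest of the header line is available as rec.description if you need it.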

This snippet should produce what you are looking for. I would consider
working with the native Biopython SeqRecord objects, though: it is easier
to call str() on a Seq object later than it is to reattach the metadata,
but obviously you should do what suits your needs best. You might also
look into SeqIO.index(), which gives you a read-only, dictionary-like
'id : SeqRecord' mapping directly. Because it keeps only each record's
position in the file in memory and parses records on demand, it is much
more memory efficient for large files.
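A rough sketch of the SeqIO.index() approach, assuming Biopython is installed. SeqIO.index() works from a filename rather than an in-memory handle, so this example writes made-up records to a temporary file (the file name example.fasta is hypothetical). Each lookup returns a full SeqRecord, and str() is applied only at the point where a plain string is needed.

```python
import os
import tempfile

from Bio import SeqIO

# Made-up example records; SeqIO.index needs a real file, so write one.
fasta_text = (
    ">seq1 first example record\nACTACGGCGAT\n"
    ">seq2 second example record\nGGCTATTAGC\n"
)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.fasta")  # hypothetical file name
    with open(path, "w") as handle:
        handle.write(fasta_text)

    # Read-only, dict-like mapping of id -> SeqRecord. Records are parsed
    # from the file only when they are looked up.
    index = SeqIO.index(path, "fasta")
    try:
        n_records = len(index)
        record = index["seq2"]             # full SeqRecord, metadata intact
        description = record.description   # header text without the ">"
        seq2 = str(record.seq)             # plain string only when needed
        print(n_records, description, seq2)
    finally:
        # Release the open file handle before the temporary directory goes.
        index.close()
```

If the file is small and you prefer a regular in-memory dict of SeqRecord objects, SeqIO.to_dict(SeqIO.parse(path, "fasta")) is the non-lazy counterpart.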

- Evan

On Jun 29, 2014 7:36 PM, "Ismail Uddin" <ismail.sameeuddin at gmail.com> wrote:
>
> Dear Sir or Madam,
>
> I would like to post a question regarding FASTA file parsing using the
> Biopython module. The current tutorial online shows how to parse a
> FASTA file, but the output is in the format Seq('<<sequence here>>',
> SingleLetterAlphabet())
>
> I would like to know how one may simply print out the entire sequence
> without any adjoining text, i.e., 'ACTACGGCGAT'
>
> I ask this question because I am trying to write a script that will read
> each entry in the FASTA file and produce a dictionary with the ID as the
> key and the raw sequence as the value.
>
> Thank you in advance for your help and cooperation,
> Ismail Uddin
>
> _______________________________________________
> Biopython mailing list  -  Biopython at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/biopython
>