[Biopython] FASTA file parsing

Peter Cock p.j.a.cock at googlemail.com
Mon Jun 30 13:00:25 UTC 2014


On Sun, Jun 29, 2014 at 11:43 AM, Ismail Uddin
<ismail.sameeuddin at gmail.com> wrote:
> Dear Sir or Madam,
>
> I would like to post a question regarding FASTA file parsing using the
> BioPython module. The current tutorial online indicates how to parse a FASTA
> file, but the output is in the format Seq('<<sequence here>>',
> SingleLetterAlphabet())
>
> I would like to know how one may simply print out the entire sequence
> without any adjoining text i.e. 'ACTACGGCGAT'
>
> I ask this question, as I am trying to write a script that will read each
> entry in the FASTA file and produce a dictionary of key being the ID and the
> value being the raw sequence.
>
> Thank you in advance for your help and cooperation,
> Ismail Uddin

I think this is a confusion about Python's default str(...) and
repr(...) behaviour. At the Python prompt:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_seq = Seq("ACTACGGCGAT", generic_dna)
>>> my_seq
Seq('ACTACGGCGAT', DNAAlphabet())
>>> print(my_seq)
ACTACGGCGAT
>>> str(my_seq)
'ACTACGGCGAT'

The one you didn't like is the representation (the repr), meant
to look like what you would type to create the object. If you
actually print the object, Python automatically calls str(my_seq).
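
To show the mechanism in isolation, here is a minimal sketch using a
made-up class. ToySeq is hypothetical and is not how Biopython's Seq
is implemented; it only illustrates that the prompt echo uses
__repr__ while print() and str() use __str__.

```python
class ToySeq:
    """Hypothetical stand-in for a sequence object (not Biopython's Seq)."""

    def __init__(self, data):
        self.data = data

    def __repr__(self):
        # Used when the object is echoed at the interactive prompt
        return "ToySeq(%r)" % self.data

    def __str__(self):
        # Used by print(...) and str(...)
        return self.data


my_toy = ToySeq("ACTACGGCGAT")
as_repr = repr(my_toy)  # the text with the adjoining class name and quotes
as_str = str(my_toy)    # just the letters
print(as_repr)
print(as_str)
```

The first print shows the class name and quotes, the second only the
sequence letters.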

Compare basic Python strings:

>>> my_string = "ABC"
>>> my_string
'ABC'
>>> print(my_string)
ABC

Notice that the first gives the quote marks and the second does not.
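
For the dictionary of ID to raw sequence described in the question,
something along these lines may be enough. This is only a sketch: the
FASTA text and the names fasta_text and sequences are invented for
illustration, and an in-memory handle stands in for a real file.

```python
from io import StringIO

from Bio import SeqIO

# Made-up FASTA data; with a real file, pass its filename to
# SeqIO.parse(...) instead of this in-memory handle.
fasta_text = ">seq1 example\nACTACGGCGAT\n>seq2\nGGCC\nTTAA\n"
handle = StringIO(fasta_text)

sequences = {}
for record in SeqIO.parse(handle, "fasta"):
    # str(...) turns the Seq object into a plain Python string
    sequences[record.id] = str(record.seq)

print(sequences["seq1"])
```

Each value in the dictionary is then an ordinary string, so printing
it gives only the sequence letters.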

Can you suggest where we can make the Biopython
documentation clearer on this issue?

Thanks,

Peter

