[Biopython] FASTA file parsing

Mon Jun 30 13:00:25 UTC 2014

On Sun, Jun 29, 2014 at 11:43 AM, Ismail Uddin
<ismail.sameeuddin at gmail.com> wrote:
> Dear Sir or Madam,
>
> I would like to post a question regarding FASTA file parsing using the
> BioPython module. The current tutorial online indicates how to parse a FASTA
> file, but the output is in the format Seq('<<sequence here>>',
> SingleLetterAlphabet())
>
> I would like to know how one may simply print out the entire sequence
> without any adjoining text i.e. 'ACTACGGCGAT'
>
> I ask this question, as I am trying to write a script that will read each
> entry in the FASTA file and produce a dictionary of key being the ID and the
> value being the raw sequence.
>
> Thank you in advance for your help and cooperation,
> Ismail Uddin

I think this is a confusion about Python's default str(...) and
repr(...) behaviour. At the Python prompt:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_seq = Seq("ACTACGGCGAT", generic_dna)
>>> my_seq
Seq('ACTACGGCGAT', DNAAlphabet())
>>> print(my_seq)
ACTACGGCGAT
>>> str(my_seq)
'ACTACGGCGAT'

The one you didn't like is the representation (repr), meant to
look like what you would type to create the object. If you
actually print the object, Python automatically calls
str(my_seq), which gives just the sequence. For your dictionary,
use str(record.seq) as the value.
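
To make the mechanism concrete, here is a minimal sketch of a
class that defines both methods. TinySeq is a hypothetical
stand-in written only for illustration; it is not how Biopython's
Seq is implemented.

```python
class TinySeq:
    """Hypothetical stand-in for a sequence object (not Biopython's Seq)."""

    def __init__(self, data):
        self._data = data

    def __repr__(self):
        # Used when the object is echoed at the interactive prompt.
        return "TinySeq({!r})".format(self._data)

    def __str__(self):
        # Used by print(...) and str(...).
        return self._data


tiny = TinySeq("ACTACGGCGAT")
shown_at_prompt = repr(tiny)  # what typing `tiny` at the prompt displays
shown_by_print = str(tiny)    # what print(tiny) displays
print(shown_at_prompt)
print(shown_by_print)
```

The first line printed looks like code that would recreate the
object; the second is the bare sequence.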

Compare basic Python strings:

>>> my_string = "ABC"
>>> my_string
'ABC'
>>> print(my_string)
ABC

Notice that one shows the quote marks and the other does not.
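
For the dictionary you describe, something along these lines
should work. This is only a sketch: the two records and their
IDs (seq1, seq2) are made up for the example, and the FASTA text
is read from an in-memory handle so the snippet is self-contained.
With a real file you would open it and pass that handle instead.
It does not use Bio.Alphabet, so it should not depend on which
Biopython release you have.

```python
from io import StringIO

from Bio import SeqIO

# Made-up FASTA content for illustration; replace with your own file.
fasta_text = ">seq1 example record\nACTACGGCGAT\n>seq2\nGGCCTTAA\n"
handle = StringIO(fasta_text)

# Key is the record ID, value is the plain sequence string.
sequences = {}
for record in SeqIO.parse(handle, "fasta"):
    sequences[record.id] = str(record.seq)

for seq_id, raw in sequences.items():
    print(seq_id, raw)
```

Here record.id is the first word of the ">" header line, and
str(record.seq) is the plain string without the Seq(...) wrapper.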

Can you suggest where we can make the Biopython
documentation clearer on this issue?

Thanks,

Peter