[Biopython] get back raw records with SeqIO?

Peter biopython at maubp.freeserve.co.uk
Thu Oct 1 04:06:22 EDT 2009


On Thu, Oct 1, 2009 at 12:14 AM, Cedar McKay <cmckay at u.washington.edu> wrote:
>
>> Why do you want to do this? I'd like to understand the desired
>> usage.
>
> I didn't have a specific technical reason.

OK - if you come up with a good use case example, please let us know.

> It just seemed like everything was going towards using SeqIO and things
> like Bio.Fasta were being deprecated, so I wanted to get ahead of the
> curve there. But if Bio.Genbank is going to be around for a long time,
> I don't have any problem with doing it that way.

For more complicated file formats (e.g. GenBank, SwissProt, ACE,
PHRED, ...) mapping the data into SeqRecord objects isn't 100%
perfect. Here Bio.SeqIO really is just a unifying API sitting on
top of format-specific parsers (which live in other modules),
which is good enough for most tasks. Unless/until the SeqRecord
objects are a full mapping, the more format-specific data
structures still have their uses, and thus I see no immediate
pressure to remove Bio.GenBank etc.
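
To make the "unifying API on top of format-specific parsers" idea
concrete, here is a toy sketch in plain Python. It is not Biopython
code: CommonRecord, tab_records, keyval_records and the "keyval"
format are all made up for illustration. The point is only that a
thin parse() function can dispatch on a format name, and that
format-specific fields with no dedicated slot end up in a generic
annotations dictionary.

```python
from dataclasses import dataclass, field
from io import StringIO


@dataclass
class CommonRecord:
    """Hypothetical common record type (a stand-in, not Bio.SeqRecord)."""
    id: str
    seq: str
    annotations: dict = field(default_factory=dict)


def tab_records(handle):
    """Format-specific parser for simple 'id<TAB>sequence' lines."""
    for line in handle:
        line = line.strip()
        if line:
            ident, seq = line.split("\t", 1)
            yield CommonRecord(id=ident, seq=seq)


def _keyval_to_common(fields):
    """Map one made-up KEY=value record onto the common record type."""
    fields = dict(fields)  # copy, so the caller's dict is left alone
    ident = fields.pop("ID")
    seq = fields.pop("SEQ")
    # Anything without a dedicated attribute is kept as a loose annotation.
    return CommonRecord(id=ident, seq=seq, annotations=fields)


def keyval_records(handle):
    """Format-specific parser for a made-up format; blank line ends a record."""
    fields = {}
    for line in handle:
        line = line.strip()
        if not line:
            if fields:
                yield _keyval_to_common(fields)
                fields = {}
            continue
        key, value = line.split("=", 1)
        fields[key] = value
    if fields:
        yield _keyval_to_common(fields)


# The "unifying API": one entry point, dispatching on the format name.
_PARSERS = {"tab": tab_records, "keyval": keyval_records}


def parse(handle, fmt):
    if fmt not in _PARSERS:
        raise ValueError(f"Unknown format: {fmt!r}")
    return _PARSERS[fmt](handle)


tab_recs = list(parse(StringIO("seq1\tACGT\nseq2\tGGCC\n"), "tab"))
kv_recs = list(parse(StringIO("ID=seq3\nSEQ=TTAA\nTOPOLOGY=circular\n"), "keyval"))
```

In this sketch the TOPOLOGY field survives only as an untyped entry in
annotations, which is the kind of imperfect mapping that keeps a
format-specific data structure useful.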

Unlike some of the Bio.SeqIO parsers, for "fasta" we don't use
an underlying module (such as Bio.Fasta), and the SeqRecord
can capture all of the annotation in the raw file. One reason
for this is that at the time, Bio.Fasta still used Martel and
was noticeably slower than the pure Python code I adopted for
FASTA files in SeqIO. Since then Bio.Fasta has lost all its
Martel dependencies (which meant the loss of the old indexing
code, indirectly leading to the Bio.SeqIO.index() function as
per our previous discussions). This means that the remaining
code in Bio.Fasta is now redundant. Maybe we could have
just left Bio.Fasta alone, sitting quietly but tagged obsolete,
but it is clearer to remove the redundancy.
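
As a rough illustration of why FASTA needs no separate underlying
module, a pure Python FASTA reader fits in a few lines. This is a
minimal sketch, not the actual Bio.SeqIO implementation; iter_fasta
and _make_entry are hypothetical names. It assumes the usual
convention that the first word of the title line is the identifier
and the whole title line is the description, so nothing in the raw
record is lost.

```python
from io import StringIO


def _make_entry(title, chunks):
    """Build an (id, description, sequence) tuple from one raw record."""
    words = title.split(None, 1)
    ident = words[0] if words else ""
    # Keep the full title line as the description, so no annotation is lost.
    return ident, title, "".join(chunks)


def iter_fasta(handle):
    """Yield (id, description, sequence) tuples from a FASTA file handle."""
    title = None
    chunks = []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new title line closes the previous record, if there was one.
            if title is not None:
                yield _make_entry(title, chunks)
            title = line[1:]
            chunks = []
        elif title is not None:
            chunks.append(line.strip())
    # Flush the final record at end of file.
    if title is not None:
        yield _make_entry(title, chunks)


example = StringIO(">alpha first example\nACGT\nTTGA\n>beta\nGGCC\n")
entries = list(iter_fasta(example))
```

Because it is a generator, it only holds one record in memory at a
time, whatever the size of the file.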

Peter

P.S. For the record, Bio.Fasta was declared obsolete in
Biopython 1.48 (Sept 2008), and deprecated in Biopython
1.51 (Aug 2009).

