[BioPython] Bio.SeqIO and files with one record

Peter biopython at maubp.freeserve.co.uk
Sat Dec 8 13:20:35 UTC 2007


In July 2007, Peter wrote:
> Dear Biopython people,
>
> I'd like a little feedback on the Bio.SeqIO module - in particular, one
> situation I think could be improved is when dealing with sequence files
> which contain a single record - for example a very simple Fasta file, or
> a chromosome in a GenBank file.
>
> http://www.biopython.org/wiki/SeqIO
>
> The shortest way to get this one record as a SeqRecord object is probably:
>
> from Bio import SeqIO
> record = SeqIO.parse(open("example.gbk"), "genbank").next()
>
> This works, assuming there is at least one record, but will not trigger
> any error if there was more than one record - something you may want to
> check.
>
> Do any of you think this situation is common enough to warrant adding
> another function to Bio.SeqIO to do this for you (raising errors for no
> records or more than one record)?  My suggestions for possible names
> include parse_single, parse_one, parse_sole, parse_individual and mono_parse

We had a few other name suggestions, including "parse_the_only_one"
from Martin, which is nice and clear but very long.

Over on the dev-mailing list, Michiel suggested we call this the
"read" function, which seems sensible.  I've filed an enhancement bug
for this whole issue:

Bugzilla Bug 2417 - Bio.SeqIO single SeqRecord read/parse function
http://bugzilla.open-bio.org/show_bug.cgi?id=2417

I think the general consensus was this functionality could be useful,
but perhaps not to everyone.  In fact it turns out to be very helpful
when parsing records downloaded from the internet - which I hadn't
pointed out earlier.

I plan to add this new functionality as a "read" function - unless
anyone here wants to add anything...
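
As a rough sketch of the behaviour being proposed (exactly one record
returned, an error for zero records or for more than one), something like
the following would do.  The name read_single and the error messages are
hypothetical, chosen only for illustration; this is not the eventual
Bio.SeqIO code, and it works on any iterator rather than on a file handle.

```python
# Hypothetical helper illustrating the "exactly one record" check.
# It accepts any iterable of records, e.g. the iterator from SeqIO.parse.

_MISSING = object()  # sentinel, so that None or empty records are not confused with "no record"


def read_single(records):
    """Return the only item in records; raise ValueError otherwise."""
    iterator = iter(records)
    first = next(iterator, _MISSING)
    if first is _MISSING:
        raise ValueError("No records found")
    # Try to pull a second item; anything other than the sentinel is an error.
    if next(iterator, _MISSING) is not _MISSING:
        raise ValueError("More than one record found")
    return first
```

Assuming Biopython is installed, it could be combined with the existing
parser as read_single(SeqIO.parse(open("example.gbk"), "genbank")).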

Thanks,

Peter


