[BioPython] Bio.SeqIO and files with one record

Peter biopython at maubp.freeserve.co.uk
Tue Jul 10 20:03:10 UTC 2007


Dear Biopython people,

I'd like a little feedback on the Bio.SeqIO module - in particular, one 
situation I think could be improved is when dealing with sequence files
which contain a single record - for example a very simple Fasta file, or 
a chromosome in a GenBank file.

http://www.biopython.org/wiki/SeqIO

The shortest way to get this one record as a SeqRecord object is probably:

from Bio import SeqIO
record = SeqIO.parse(open("example.gbk"), "genbank").next()

This works, assuming there is at least one record, but will not trigger
any error if there is more than one record - something you may want to
check.

Do any of you think this situation is common enough to warrant adding
another function to Bio.SeqIO to do this for you (raising errors for no
records or more than one record)?  My suggestions for possible names
include parse_single, parse_one, parse_sole, parse_individual and mono_parse.

One way to do this inline would be:

from Bio import SeqIO
temp_list = list(SeqIO.parse(open("example.gbk"), "genbank"))
assert len(temp_list) == 1
record = temp_list[0]
del temp_list

Or perhaps:

from Bio import SeqIO
temp_iter = SeqIO.parse(open("example.gbk"), "genbank")
record = temp_iter.next()
try :
     assert temp_iter.next() is None
except StopIteration :
     pass
del temp_iter

The above code copes with the fact that in general some iterators may
signal the end by raising a StopIteration exception, or by returning None.
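
As a rough sketch of what such a helper might look like, the checks could be
wrapped up for any iterator of records. The name parse_single is just one of
the candidates listed earlier, raising ValueError is an assumption rather than
a settled design, and the code uses the next() builtin from later Python
versions instead of the .next() method:

```python
def parse_single(records):
    """Return the one and only record from an iterable of records.

    Hypothetical helper: raises ValueError if there are no records
    or more than one record.
    """
    iterator = iter(records)

    # Fetch the first record; an immediate end means there were none.
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("No records found") from None
    if first is None:
        # Some iterators signal the end by returning None instead.
        raise ValueError("No records found")

    # Try for a second record; the end must be signalled here.
    try:
        second = next(iterator)
    except StopIteration:
        return first
    if second is None:
        return first
    raise ValueError("More than one record found")


# Intended usage with Bio.SeqIO (not run here):
# from Bio import SeqIO
# record = parse_single(SeqIO.parse(open("example.gbk"), "genbank"))
```

Taking an iterator rather than a file handle keeps the check independent of
the file format, at the cost of one extra call on the user's side.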

Peter

P.S. Any comments on the Bio.AlignIO ideas I raised back in May 2007?
http://lists.open-bio.org/pipermail/biopython/2007-May/003472.html



