[BioPython] Bio.SeqIO and files with one record

Martin MOKREJŠ mmokrejs at ribosome.natur.cuni.cz
Wed Jul 11 10:32:36 UTC 2007


Hi,

Peter wrote:
> Jan Kosinski wrote:
>> Hi,
>>
>> Do I understand correctly that the function is to return a record 
>> instead of a parser? If yes I think it could be useful. parse_single 
>> sounds good.
> 
> Yes, sorry if I wasn't clear.
> 
> Bio.SeqIO.parse(handle, format) would still return an iterator giving 
> SeqRecord objects.
> 
> The suggested function (possibly called) Bio.SeqIO.parse_single(handle, 
> format) would return a single SeqRecord object if the file contains one 
> and only one record. It would raise exceptions for no records, or more 
> than one record.
> 
> e.g.
> 
> from Bio import SeqIO
> handle = open('example.gbk')
> record = SeqIO.parse_single(handle, 'genbank')
> 
> or,
> 
> from Bio import SeqIO
> record = SeqIO.parse_single(open('example.faa'), 'fasta')

I think it does make sense, but call it parse_the_only_one() to make it clear that
it does not just pick the very first record out of many.
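
A minimal sketch of the behaviour I mean, not an actual Bio.SeqIO function. The
name parse_the_only_one() and its signature are hypothetical: here it takes any
iterator of records, so it runs without Biopython, whereas the proposal above
takes (handle, format). The choice of ValueError is also my assumption.

```python
def parse_the_only_one(records):
    """Return the single item produced by an iterator of records.

    Hypothetical helper: raises ValueError if the iterator yields
    no records, or more than one record.
    """
    iterator = iter(records)

    # Fetch the first record; an empty iterator is an error.
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("No records found") from None

    # Try to fetch a second record; finding one is also an error,
    # so the caller never silently gets "just the first of many".
    try:
        next(iterator)
    except StopIteration:
        return first
    raise ValueError("More than one record found")
```

With Biopython installed, one would presumably call it as
parse_the_only_one(SeqIO.parse(handle, 'fasta')).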

> 
> As I said, I sometimes find myself wanting to do this - for example 
> single query BLAST files in fasta format, or bacterial genomes in 
> GenBank format.
> 
> The question is, is this worth adding to the interface or is this a 
> relatively rare need?

Once people learn to wrap the iterator in a loop, it is not necessary, but I think
if you have the time to do this ... ;-)
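
For comparison, this is roughly the loop people end up writing by hand today.
It is only a sketch: the function name is made up for illustration, and
"records" stands in for whatever Bio.SeqIO.parse(handle, format) returns.

```python
def single_record_from_loop(records):
    """Walk the iterator with a plain for loop and insist on exactly one record.

    Hypothetical illustration of the hand-written idiom; not part of Biopython.
    """
    count = 0
    only = None
    for record in records:
        count += 1
        if count > 1:
            # Stop at the second record instead of reading the whole file.
            raise ValueError("More than one record found")
        only = record
    if count == 0:
        raise ValueError("No records found")
    return only
```

The explicit count is what tells "no records" apart from "one record"; a bare
loop with a break would quietly return the first of many.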
Martin


