[BioPython] Bio.SeqIO and files with one record
Martin MOKREJŠ
mmokrejs at ribosome.natur.cuni.cz
Wed Jul 11 10:32:36 UTC 2007
Hi,
Peter wrote:
> Jan Kosinski wrote:
>> Hi,
>>
>> Do I understand correctly that the function is to return a record
>> instead of a parser? If yes I think it could be useful. parse_single
>> sounds good.
>
> Yes, sorry if I wasn't clear.
>
> Bio.SeqIO.parse(handle, format) would still return an iterator giving
> SeqRecord objects.
>
> The suggested function (possibly called) Bio.SeqIO.parse_single(handle,
> format) would return a single SeqRecord object if the file contains one
> and only one record. It would raise exceptions for no records, or more
> than one record.
>
> e.g.
>
> from Bio import SeqIO
> handle = open('example.gbk')
> record = SeqIO.parse_single(handle, 'genbank')
>
> or,
>
> from Bio import SeqIO
> record = SeqIO.parse_single(open('example.faa'), 'fasta')
I think it makes sense, but call it parse_the_only_one() to make it clear
that it does not just pick the first record out of many.
>
> As I said, I sometimes find myself wanting to do this - for example
> single query BLAST files in fasta format, or bacterial genomes in
> GenBank format.
>
> The question is, is this worth adding to the interface or is this a
> relatively rare need?
Once people learn to wrap the iterator in a loop, this is not strictly necessary,
but if you have the time to do it ... ;-)
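
For illustration, the "one and only one record" behaviour can be sketched on top of any record iterator. This is only a rough sketch and not the proposed Bio.SeqIO code: the helper name only_record() is made up here, and the exact exception type (ValueError) is an assumption, since the proposal only says that exceptions would be raised.

```python
def only_record(records):
    """Return the single item from an iterator of records.

    Hypothetical helper (not part of Bio.SeqIO): raises ValueError
    if the iterator yields no records or more than one record.
    """
    iterator = iter(records)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("No records found") from None
    try:
        next(iterator)
    except StopIteration:
        # Exactly one record was present.
        return first
    raise ValueError("More than one record found")


# Stand-in for an iterator like the one Bio.SeqIO.parse(handle, format) returns.
record = only_record(iter(["the_only_record"]))
print(record)
```

With Biopython installed, the same helper could presumably be called as only_record(SeqIO.parse(open('example.gbk'), 'genbank')). Unlike taking the first item from the iterator, it refuses to silently ignore any further records.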
Martin