[Biopython-dev] Creating a NCBIFastaIterator

Fri Oct 7 16:06:53 UTC 2011

On 10/07/2011 12:00 PM, Peter Cock wrote:
> Hi Andrew,
>
> Interesting idea. Although it doesn't fit that well with the current
> (deliberately) simple high level Bio.SeqIO.parse/read API,
> that doesn't mean we can't do it (see Bio.Phylo.parse).
>
> In this case I fail to see what benefit this gives over the current
> situation, where the user can do this themselves with the
> current FASTA parser,
>
> e.g. With a function and a generator expression,
>
> records = (do_ncbi_my_way(record) for record in SeqIO.parse(filename, "fasta"))
>
> or more simply within a loop:
>
> for record in SeqIO.parse(filename, "fasta"):
>      do_ncbi_my_way(record)
>      #Do stuff with record
>
> etc.
>
> Maybe it is down to personal preference of coding style?

I agree, there isn't much difference between specifying the callback
function in parse() and calling it within the loop. To me, this suggests
that re-implementing a FASTA parser just to handle one format of
description line is unnecessary.

If a user is interested in extracting a particular piece of information
from a FASTA description and knows the input format of the file, how 
difficult is it for them to split() it on their own? What exactly are 
the advantages of a separate parser?
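
For example, here is a rough sketch of what I mean. It assumes the
pipe-delimited NCBI identifier style where fields alternate between a
database tag and a value (e.g. gi|123456|ref|NM_000000.1|). The helper
name split_ncbi_id and the example identifier are made up for
illustration, and nothing here is part of Biopython. Identifier types
that carry more than one value per tag would need extra handling.

```python
def split_ncbi_id(record_id):
    """Split a pipe-delimited NCBI-style identifier into a dict.

    Hypothetical helper: assumes the fields strictly alternate
    tag, value, tag, value (e.g. "gi|123456|ref|NM_000000.1|").
    """
    # Drop any trailing "|" so it does not produce an empty last field.
    fields = record_id.rstrip("|").split("|")
    # Pair each database tag with the value that follows it.
    return dict(zip(fields[0::2], fields[1::2]))


ids = split_ncbi_id("gi|123456|ref|NM_000000.1|")
print(ids["gi"], ids["ref"])
```

Inside a SeqIO.parse() loop, this would be called as
split_ncbi_id(record.id) on each record.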

> I would much prefer a new "fasta-ncbi" parser in SeqIO
> that handled all the documented NCBI FASTA identifiers.
>
> I'm being negative here - but please don't let that deter you
> from posting ideas. This is a public list and we/I welcome
> constructive criticism and alternative ideas.
>
> Regards,
>
> Peter