[BioPython] blast parsing errors

Michiel Jan Laurens de Hoon mdehoon at c2b2.columbia.edu
Mon Mar 5 16:49:53 UTC 2007


Julius Lucks wrote:
> 1.) Is the documentation for the new NCBIXML and NCBIWWW up to date?

No, it is not. To ensure that the documentation on the website agrees 
with the current Biopython release, the idea was to update the 
documentation when the next Biopython release comes out. Originally we 
were planning to make a new Biopython release as soon as the new 
Bio.SeqIO code is done. However, I'd be happy to make a release in the 
immediate future without the new Bio.SeqIO, and make another one once 
Bio.SeqIO is ready.

> 2.) Why is NCBIXML.parse returning an iterator in this case since there 
> is only one result?  Or in other words, what are the use cases where an 
> iterator is necessary?

An iterator is needed when you are parsing multiple Blast search results 
at the same time, in other words, when the fasta file for the blast 
search contains more than one query sequence:

    >gene1
    ATAGCTACG...
    >gene2
    ATCGATCGATGGCA...
    >gene3
    ....

Such a file can be very large, which is why we are using an iterator 
instead of a list.
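
To illustrate why an iterator helps here, below is a minimal sketch of the 
same pattern using only the standard library. It is not Biopython's 
implementation: iter_fasta is a hypothetical helper written for this 
example, and the sequences are made up. With Biopython the corresponding 
loop would run over the return value of NCBIXML.parse(handle).

```python
import io


def iter_fasta(handle):
    """Yield (title, sequence) pairs one at a time from a FASTA handle.

    Hypothetical helper for illustration only; only the record currently
    being read is held in memory.
    """
    title, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:].strip(), []
        elif title is not None:
            chunks.append(line)
    # Emit the last record once the input is exhausted.
    if title is not None:
        yield title, "".join(chunks)


# Made-up example input with two query sequences.
example = io.StringIO(">gene1\nATAGCTACG\n>gene2\nATCGATCG\nATGGCA\n")
records = iter_fasta(example)
first = next(records)      # only the first record has been read so far
remaining = list(records)  # reading the rest is the caller's choice
```

The caller decides whether to process records one by one or to collect 
them into a list, so a very large input never has to be loaded at once.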

Now, one may argue that NCBIXML.parse should return a single record 
instead of an iterator if there's only one result. Others may argue that 
for consistency, it should always return an iterator. Either way is fine 
with me. Anybody have a strong opinion about this?
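
Whichever way that goes, a caller who expects exactly one result can take 
it out of an iterator directly. A minimal sketch, assuming only that the 
parser returns an ordinary Python iterator; single is a hypothetical 
helper, not part of Biopython:

```python
def single(iterator):
    """Return the only item of an iterator.

    Hypothetical helper for illustration; raises ValueError if the
    iterator yields no items or more than one.
    """
    iterator = iter(iterator)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("no records found") from None
    try:
        next(iterator)
    except StopIteration:
        # Exactly one item was available.
        return first
    raise ValueError("more than one record found")


# Stand-in for an iterator holding one Blast record.
only_record = single(iter(["record1"]))
```

This keeps the parser's return type consistent while still giving the 
single-result case a one-line call.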

> 3.) How are the fink packages of Biopython maintained?

I don't know. However, it is not too difficult to install Biopython from 
the source distribution or from CVS. So if you want to be sure you have 
the latest version, you might want to try installing from CVS.

--Michiel.

-- 
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1130 St Nicholas Avenue
New York, NY 10032


