[BioPython] Changes in NCBI BLAST output format !!??

aurelie.bornot at free.fr
Tue Jul 19 09:08:20 EDT 2005


Hi!

I have the same problem as Jessica Leigh (reported on the discussion list):
when I try to parse a BLAST file with a script that worked until the beginning
of July, I get this error:

Line does not contain 'Database':
(Blank line)

It seems that NCBI has changed the output format:

- "Old" BLAST file:
<p>
<b>Query=</b> sequence
         (569 letters)

<p>
<b>Database:</b> All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
           3,047,402 sequences; 13,743,552,639 total letters

<p> <p>If you have any problems or questions with the results...

- New BLAST file:

<b>Query=</b> sequence
         (540 letters)


<b>Database:</b> All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
           3,312,348 sequences; 14,588,094,788 total letters

<p> <p>If you have any problems or questions with the...


The <p> tags before Query and Database are missing!
And in Python24\Lib\site-packages\Bio\Blast\NCBIWWW.py, the code that looks
for "Database" seems to rely on that <p>:

def _scan_database_info(self, uhandle, consumer):
        attempt_read_and_call(uhandle, consumer.noevent, start='<p>')
        read_and_call(uhandle, consumer.database_info, contains='Database')
        ....
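
To illustrate what I think is going on, here is a minimal sketch. The helpers below are simplified stand-ins I wrote for this message, not the Biopython code itself. They assume that the first call is optional (it only consumes a line starting with '<p>') and that the second call is mandatory (it raises if the next line does not contain 'Database'). I also assume the scanner is positioned on the line where the <p> used to be.

```python
class LineReader(object):
    """Hypothetical stand-in for a handle that can push one line back."""

    def __init__(self, lines):
        self._lines = list(lines)

    def readline(self):
        # Return the next line, or "" at end of input.
        return self._lines.pop(0) if self._lines else ""

    def saveline(self, line):
        # Push a line back so the next readline() returns it again.
        self._lines.insert(0, line)


def attempt_read(reader, start):
    """Optional read: consume the next line only if it starts with `start`."""
    line = reader.readline()
    if line.startswith(start):
        return True
    reader.saveline(line)
    return False


def require_read(reader, contains):
    """Mandatory read: the next line must contain `contains`."""
    line = reader.readline()
    if contains not in line:
        raise ValueError("Line does not contain %r:\n%s" % (contains, line))
    return line


def scan_database_info(lines):
    """Mimic the two calls quoted above on a list of lines (assumed behaviour)."""
    reader = LineReader(lines)
    attempt_read(reader, start="<p>")
    return require_read(reader, contains="Database")


old_style = ["<p>\n", "<b>Database:</b> All GenBank+EMBL+DDBJ+PDB sequences\n"]
new_style = ["\n", "<b>Database:</b> All GenBank+EMBL+DDBJ+PDB sequences\n"]

print(scan_database_info(old_style))
try:
    scan_database_info(new_style)
except ValueError as exc:
    print(exc)
```

Under these assumptions the old layout is accepted, while in the new layout the blank line that replaced <p> is handed to the mandatory 'Database' check, which would explain the error message above.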


I'm not sure I fully understand what is happening, but could someone help?
I don't know what to do. Is there an easy way to fix the problem?
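
One workaround I can imagine, without touching NCBIWWW.py, is to put the <p> lines back before parsing. This is only a sketch based on the two excerpts above: the function name restore_paragraph_tags is made up, and I have not checked that the rest of the parser accepts the result.

```python
def restore_paragraph_tags(lines):
    """Return a copy of `lines` with '<p>' restored before Query/Database.

    Assumption (from the two excerpts only): in the new output, the line
    directly before '<b>Query=</b>' and '<b>Database:</b>' is blank where
    the old output had '<p>'.
    """
    markers = ("<b>Query=</b>", "<b>Database:</b>")
    fixed = []
    for line in lines:
        if line.startswith(markers):
            if fixed and fixed[-1].strip() == "<p>":
                pass  # old-style output, nothing to repair
            elif fixed and fixed[-1].strip() == "":
                fixed[-1] = "<p>\n"  # turn the blank line back into <p>
            else:
                fixed.append("<p>\n")  # no blank line to reuse, insert one
        fixed.append(line)
    return fixed


new_output = [
    "\n",
    "<b>Query=</b> sequence\n",
    "         (540 letters)\n",
    "\n",
    "\n",
    "<b>Database:</b> All GenBank+EMBL+DDBJ+PDB sequences\n",
]
for line in restore_paragraph_tags(new_output):
    print(line.rstrip("\n"))
```

The repaired lines could then be written to a new file and given to the same script as before. Files that still have the old layout pass through unchanged.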

Thanks a lot !!
Aurelie

--------------
Aurelie BORNOT
MNHN
Paris

