[Biopython-dev] blastpgp parsing buglet

Coleman, Michael MKC at Stowers-Institute.org
Thu May 8 14:45:27 EDT 2003

NCBIStandalone.py fails to parse blastpgp output from BLASTP 2.2.5.  This is the part of the output that triggers the problem:

gi|23099742|ref|NP_693208.1| ornithine aminotransferase [Oceanob...   430   e-119
gi|16081241|ref|NP_393547.1| L-2, 4-diaminobutyrate:2-ketoglutar...   430   e-119

Sequences not found previously or not previously below threshold:

>gi|23466947|gb|ZP_00122533.1| hypothetical protein [Haemophilus somnus 129PT]
          Length = 432

 Score =  591 bits (1524), Expect = e-167
 Identities = 191/420 (45%), Positives = 291/420 (69%), Gaps = 7/420 (1%)

The code expects to see a 'CONVERGED' line at this point, but none is given here; the alignments start right away.  One possible fix would be to also look for a line beginning with '>', like so:

            # Read the descriptions and the following blank lines.
            read_and_call_while(uhandle, consumer.noevent, blank=1)
            l = safe_peekline(uhandle)
            if l[:9] != 'CONVERGED' and l[:1] != '>':
                read_and_call_until(uhandle, consumer.description, blank=1)
                read_and_call_while(uhandle, consumer.noevent, blank=1)
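
For illustration only, the lookahead test in the proposed fix can be tried on its own, outside the parser.  The sketch below is not part of NCBIStandalone.py; the helper name more_descriptions_follow is hypothetical, and it only mirrors the condition on the peeked line, using lines from the output quoted above.

```python
def more_descriptions_follow(line):
    """Return True if the peeked line should be read as a description.

    Hypothetical helper, not part of Biopython.  It mirrors the proposed
    check: a line starting with 'CONVERGED' or with '>' (the start of an
    alignment) means there are no further description lines to read.
    """
    return line[:9] != 'CONVERGED' and line[:1] != '>'


# Lines taken from the BLASTP 2.2.5 output in this report.
alignment_header = ('>gi|23466947|gb|ZP_00122533.1| hypothetical protein '
                    '[Haemophilus somnus 129PT]')
description = ('gi|23099742|ref|NP_693208.1| ornithine aminotransferase '
               '[Oceanob...   430   e-119')

print(more_descriptions_follow(alignment_header))  # False: alignments begin
print(more_descriptions_follow(description))       # True: a description line
print(more_descriptions_follow('CONVERGED!'))      # False: the old stop case
```

With the original test (only the 'CONVERGED' comparison), the '>' line would be treated as a description, which appears to be where the parse goes wrong.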


Mike Coleman, Scientific Programmer, +1 816 926 4419
Stowers Institute for Biomedical Research
1000 E. 50th St., Kansas City, MO  64110
