[BioPython] Biopython BLAST parser error

m at pavis.biodec.com m at pavis.biodec.com
Mon May 26 17:06:08 EDT 2003


Hello
    I've got a problem parsing the following output:

http://pavis.biodec.com/~m/1be3_C.blast.bz2 (warning *big file*)

It was written by 

blastpgp -d nr -e 1e-9 -b 10000 -v 10000 -j3

on this protein: 1be3.pdb C.chain

MTNIRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGICLILQILTGLFLAMHYTSDTTTA  
FSSVTHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVM
ATAFMGYVLPWGQMSFWGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILP
FIIMAIAMVHLLFLHETGSNNPTGISSDVDKIPFHPYYTIKDILGALLLILALMLLVLFAPD
LLGDPDNYTPANPLNTPPHIKPEWYFLFAYAILRSIPNKLGGVLALAFSILILALIPLLHTS
KQRSMMFRPLSQCLFWALVADLLTLTWIGGQPVEHPYITIGQLASVLYFLLILVLMPTAGTI
ENKLLKW

The Blast version that I am running is 

BLASTP 2.2.4 [Aug-26-2002]

but I've got the same behaviour with BLASTP 2.1.3 [Apr-1-2001]

I do not know the nr version number, but it is the database 
with 705,002 sequences and 222,117,092 total letters (sorry for not 
having better information).

As I see it, the problem lies in parsing lines 298999 and 588182 
of the output, where this phrase appears:

``Sequences not found previously or not previously below threshold:''
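
To check where the phrase occurs without unpacking the whole file, a small
script along these lines could be used. It is only a sketch in current
Python 3 (not the Python 2.2 mentioned in this message); the helper names
find_phrase_lines and find_in_bz2 are made up for illustration and are not
part of Biopython.

```python
import bz2

# The phrase quoted above, as it appears in the blastpgp report.
PHRASE = "Sequences not found previously or not previously below threshold:"


def find_phrase_lines(lines, phrase=PHRASE):
    """Return the 1-based numbers of the lines that contain `phrase`."""
    return [number for number, line in enumerate(lines, start=1)
            if phrase in line]


def find_in_bz2(path, phrase=PHRASE):
    """Scan a bz2-compressed text file line by line (hypothetical helper)."""
    # Text mode decompresses on the fly, so the big file is never
    # held in memory as a whole.
    with bz2.open(path, "rt", errors="replace") as handle:
        return find_phrase_lines(handle, phrase)
```

Calling find_in_bz2("1be3_C.blast.bz2") on a local copy of the file should
list the line numbers where the phrase occurs, which can then be compared
with the two lines reported above.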

I am testing BioPython 1.10a, under Python 2.2.2, GNU / Linux / 
Debian Unstable

P.S.: the trace of the error, apart from irrelevant data, is

    blast_parse=parser.parse(UndoHandle(blastfile))
  File "/usr/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py", line 611, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py", line 84, in feed
    self._scan_rounds(uhandle, consumer)
  File "/usr/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py", line 139, in _scan_rounds
    self._scan_descriptions(uhandle, consumer)
  File "/usr/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py", line 245, in _scan_descriptions
    read_and_call_until(uhandle, consumer.description, blank=1)
  File "/usr/lib/python2.2/site-packages/Bio/ParserSupport.py", line 371, in read_and_call_until
    method(line)
  File "/usr/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py", line 677, in description
    dh = self._parse(line)
  File "/usr/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py", line 734, in _parse
    dh.score = _safe_int(dh.score)
  File "/usr/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py", line 1633, in _safe_int
    return long(float(str))
ValueError: invalid literal for float(): b

-- 
 .*.                            finelli
 /V\
(/ \) --------------------------------------------------------------
(   )       Linux: Friends dont let friends use Piccolosoffice
^^-^^ --------------------------------------------------------------

And the crowd was stilled.  One elderly man, wondering at the sudden silence,
turned to the Child and asked him to repeat what he had said.  Wide-eyed,
the Child raised his voice and said once again, "Why, the Emperor has no
clothes!  He is naked!"
- "The Emperor's New Clothes"

