[Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15

Fri May 23 10:45:58 UTC 2008

http://bugzilla.open-bio.org/show_bug.cgi?id=2502

------- Comment #5 from ibdeno at gmail.com  2008-05-23 06:45 EST -------
Hi Peter,

Thank you. The problem must be then with the blastpgp call from Biopython,
since my code was trying to obtain plain text via the align_view='0' option:

blast_out, error_info = NCBIStandalone.blastpgp(
        blastcmd='/home/mortiz/Progs/blast-2.2.15/bin/blastpgp',
        database='/drives/databases/BlastDB/nrdb100ncbi',
        infile=file,
        npasses=passes,
        align_view='0',
        matrix_outfile=file + '.nrdb100ncbi.pssm')

However, when I print the result of this call with the handler you proposed:

handle = open("blastpgp_2.2.18.txt","w")
handle.write(blast_out.read())
handle.close()

I actually get plain text!

The same blastpgp call (same binary, same database, same input file sequence,
same number of PSI-Blast iterations) still gives the error reported in the bug
with version 2.2.18, but works all right with 2.2.15.
Because the error appears within seconds, I'm wondering if the parser is not
trying to read the results before blastpgp has actually finished the iterations
(about 3 minutes in my test)

I'm without a clue...

Miguel

(In reply to comment #4)
> I've worked out that the original problem was use to trying to parse XML output
> with the Bio.Blast.NCBIStandalone.PSIBlastParser (which expects the plain text
> output only).  Perhaps the error message could be more helpful in this
> situation?
> 
> I'm using Biopython from CVS, but it seems to parse the plain text output from
> both 2.2.15 and 2.2.18 fine.  Here is a modified version of your code which
> reads from the example plain text files provided:
> 
> #!/usr/bin/env python
> #
> import os, re, string, operator
> from Bio.Blast import NCBIStandalone
> from sys import *
> 
> E_VALUE_THRESH = 0.005
> 
> nolf = re.compile('\n')
> nogaps = re.compile('-')
> 
> blast_out = open("blastpgp.2.2.18.txt")
> b_parser = NCBIStandalone.PSIBlastParser()
> b_record = b_parser.parse(blast_out)
> 
> if b_record.converged == 1:
>     print '*** Converged!!! ***'
> 
> fastaout = open('test_psiblast.fasta','w')
> summout = open('test_psiblast.txt','w')
> 
> for alignment in b_record.rounds[-1].alignments:
>     for hsp in alignment.hsps:
>         if hsp.expect < E_VALUE_THRESH:
>             ident = 100.0*hsp.identities[0]/hsp.identities[1]
>             simil = 100.0*hsp.positives[0]/hsp.positives[1]
>             mytitle = nolf.sub(' ',alignment.title)
>             mysbjct = nogaps.sub('',hsp.sbjct)
>             summout.write('****Alignment****\n')
>             summout.write('sequence: %s\n' % mytitle[0:70])
>             summout.write('e value: %e\n' % hsp.expect)
>             summout.write('alignment length: %i\n' % hsp.positives[1])
>             summout.write('identity:   %(ident)5.2f\n' % {'ident': ident} )
>             summout.write('similarity: %(simil)5.2f\n' % {'simil': simil} )
>             summout.write('query:   from %i to %i\n' % (hsp.query_start,
> hsp.query_end))
>             summout.write('subject: from %i to %i\n' % (hsp.sbjct_start,
> hsp.sbjct_end))
>             summout.write('%s ...\n' % hsp.query[0:75])
>             summout.write('%s ...\n' % hsp.match[0:75])
>             summout.write('%s ...\n' % hsp.sbjct[0:75])
>             fastaout.write('%s\n%s\n' % (mytitle,mysbjct))
> 
> summout.close()
> fastaout.close()
> print "Done"
> 
> ----------------------------------------------------------------------
> 
> So, as far as I can tell, the plain text PSI Blast parser is fine .
> 
> As I do not have the relevant databases installed, I have not tried using
> Biopython to call blastpgp to run PSI-Blast.  It could be there is a problem
> here with specifying the output format...
> 
> As to the XML output, you can sort of parse this with Bio.Blast.NCBIXML and I
> think you get back an iterator yielding a record for each iteration.  However,
> as the example you provided had only one query and one iteration, this should
> be tested further.  The record is not showing all the information extracted by
> the PSI-Blast text parse, which should be in the XML file.  Perhaps you would
> like to investigate this?
> 
> Example code:
> 
> from Bio.Blast import NCBIStandalone, NCBIXML
> 
> for filename in ["blastpgp.2.2.15.txt", "blastpgp.2.2.18.txt"] :
>     print
>     print filename
>     print "="*len(filename)
>     handle = open(filename)
>     record = NCBIStandalone.PSIBlastParser().parse(handle)
>     print record.query
>     if record.converged : print '*** Converged!!! ***'
>     for iter_round in record.rounds :
>         print "Iteration with %i alignments" \
>               % (len(iter_round.alignments))
>         print "%i new sequences, %i reused" \
>               %(len(iter_round.new_seqs), len(iter_round.reused_seqs))
>     print "End of plain text output"
> 
> for filename in ["blastpgp.2.2.15.xml", "blastpgp.2.2.18.xml"] :
>     print
>     print filename
>     print "="*len(filename)
>     handle = open(filename)
>     for iter_round in NCBIXML.parse(handle) :
>         print iter_round.query
>         print "Iteration with %i alignments" \
>               % (len(iter_round.alignments))
>     print "End of XML output"
> 
> The output:
> 
> blastpgp.2.2.15.txt
> ===================
> gi|50592994|ref|NP_003320.2| thioredoxin [Homo sapiens]
> Iteration with 250 alignments
> 500 new sequences, 0 reused
> End of plain text output
> 
> blastpgp.2.2.18.txt
> ===================
> gi|50592994|ref|NP_003320.2| thioredoxin [Homo sapiens]
> Iteration with 250 alignments
> 500 new sequences, 0 reused
> End of plain text output
> 
> blastpgp.2.2.15.xml
> ===================
> gi|50592994|ref|NP_003320.2| thioredoxin [Homo sapiens]
> Iteration with 500 alignments
> End of XML output
> 
> blastpgp.2.2.18.xml
> ===================
> gi|50592994|ref|NP_003320.2| thioredoxin [Homo sapiens]
> Iteration with 250 alignments
> End of XML output
> 
> Notice that NCBI must have changed the XML format in some way (500 versus 250
> alignments between versions 2.2.15 and 2.2.18).  I have not explored this in
> any detail.
> 

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.