[BioPython] Re: Blast parsing

Jeffrey Chang jchang@smi.stanford.edu
Tue, 26 Nov 2002 16:09:21 -0800


Are you using the latest NCBIStandalone.py from cvs.biopython.org?  If
not, please update it and see if your problem's already been fixed.
Otherwise, please send the offending BLAST output so that I can see
what's going on.

Jeff




On Tue, Nov 26, 2002 at 07:02:08PM -0500, R.Austin wrote:
> Hi,
> I think I've found a bug in NCBIStandalone in Python 2.2.2.
> 
> I have some code that was written on my Mandrake box in Python 2.0 and
> runs perfectly, but when I copy it to a Red Hat 8 box running Python 2.2.2
> and the same version of Biopython, I get an error.
> 
> The code is right out of the Biopython tutorial (almost) and just grabs
> the first E-value and FASTA tag for every BLAST output file in a
> directory.
> 
> ____________________________________________
> 
> 
> from Bio.Blast import NCBIStandalone
> import glob
> 
> blast_glob = '/home/user_name/blastout/*'
> b_parser = NCBIStandalone.BlastParser()
> 
> for next_file in glob.glob(blast_glob):
> 	blast_file = open(next_file, 'r')
> 	b_iterator = NCBIStandalone.Iterator(blast_file, b_parser)
> 	b_record = b_iterator.next()
> 	print b_record.query,
> 	print '\t E-value: ', b_record.alignments[0].hsps[0].expect
> 
> _________________________________________________________
> 
> And it gives the error:
> 
> Traceback (most recent call last):
>   File "./bparse", line 25, in ?
>     b_record = b_parser.parse(blast_file)
>   File "/usr/local/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py", line 515, in parse
>     self._scanner.feed(handle, self._consumer)
>   File "/usr/local/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py", line 84, in feed
>     self._scan_rounds(uhandle, consumer)
>   File "/usr/local/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py", line 140, in _scan_rounds
>     self._scan_alignments(uhandle, consumer)
>   File "/usr/local/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py", line 261, in _scan_alignments
>     self._scan_masterslave_alignment(uhandle, consumer)
>   File "/usr/local/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py", line 364, in _scan_masterslave_alignment
>     consumer.multalign(line)
>   File "/usr/local/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py", line 769, in multalign
>     name = string.rstrip(line[:self._name_length])
> TypeError: sequence index must be integer
> 
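
The last frame of the traceback slices `line` with `self._name_length`, so the interpreter evidently did not accept that value as a slice index; what the value actually was cannot be read off the traceback. As a rough illustration only (this is not the NCBIStandalone code, `extract_name` and `slice_fails` are hypothetical names, and it targets a current Python 3 interpreter, whose error wording differs from Python 2.2), a guard around the same kind of slice could look like this:

```python
def extract_name(line, name_length):
    """Return the name column of an alignment line without trailing spaces.

    Hypothetical helper: it mirrors only the slicing step shown in the
    traceback, with an explicit check so a bad index is reported clearly.
    """
    # bool is a subclass of int, so reject it explicitly as well.
    if not isinstance(name_length, int) or isinstance(name_length, bool):
        raise TypeError(
            "name_length must be an integer, got %r" % (name_length,)
        )
    return line[:name_length].rstrip()


def slice_fails(line, index):
    """Report whether evaluating line[:index] raises TypeError."""
    try:
        line[:index]
    except TypeError:
        return True
    return False


if __name__ == "__main__":
    row = "QUERY       1 MKVLAAGIVG 10"   # made-up alignment row
    print(extract_name(row, 11))          # name column, padding removed
    print(slice_fails(row, 11.0))         # a float is not a valid slice index
```

Failing early with the offending value in the message makes it easier to see what the parser had stored before the slice was attempted.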
> 
> _______________________________________________________
> 
> Any help would be appreciated, as I really need this to run on the
> Red Hat 8 box under Python 2.2.2.
> 
> Thanks in advance
> R.Austin
> 
> On Tue, 2002-11-26 at 12:00, biopython-request@biopython.org wrote:
> > Today's Topics:
> > 
> >    1. blast parser (Ken Sugino)
> >    2. Re: blast parser (Brad Chapman)
> > 
> > --__--__--
> > 
> > Message: 1
> > Date: Mon, 25 Nov 2002 12:12:31 -0500
> > From: Ken Sugino <sugino@brandeis.edu>
> > To: biopython@biopython.org
> > Reply-To: sugino@brandeis.edu
> > Subject: [BioPython] blast parser
> > 
> > Hi all,
> > 
> > I encountered an error during a Blast parse:
> > 
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in ?
> >   File "/home/sugino/soft.py.modules/biopython-cvs/biopython/Bio/Blast/NCBIWWW.py", line 47, in parse
> >     self._scanner.feed(handle, self._consumer)
> >   File "/home/sugino/soft.py.modules/biopython-cvs/biopython/Bio/Blast/NCBIWWW.py", line 98, in feed
> >     self._scan_header(uhandle, consumer)
> >   File "/home/sugino/soft.py.modules/biopython-cvs/biopython/Bio/Blast/NCBIWWW.py", line 161, in _scan_header
> >     self._scan_database_info(uhandle, consumer)
> >   File "/home/sugino/soft.py.modules/biopython-cvs/biopython/Bio/Blast/NCBIWWW.py", line 174, in _scan_database_info
> >     read_and_call(uhandle, consumer.noevent, blank=1)
> >   File "/home/sugino/soft.py.modules/biopython-cvs/biopython/Bio/ParserSupport.py", line 331, in read_and_call
> >     raise SyntaxError, errmsg
> > SyntaxError: Expected blank line, but got:
> >            1,455,628 sequences; 7,234,536,489 total letters
> > 
> > 
> > The following change seems to fix this error.
> > 
> > Bio.Blast.NCBIWWW.py line 174
> > -        read_and_call(uhandle, consumer.noevent, blank=1)
> > -        read_and_call(uhandle, consumer.noevent,
> > -                      contains='problems or questions')
> > +        read_and_call_until(uhandle, consumer.noevent,
> > +                      contains='problems or questions')
> > +        read_and_call(uhandle, consumer.noevent)
> > 
> > --__--__--
> > 
> > Message: 2
> > Date: Mon, 25 Nov 2002 12:26:03 -0500
> > From: Brad Chapman <chapmanb@arches.uga.edu>
> > To: biopython@biopython.org
> > Subject: Re: [BioPython] blast parser
> > 
> > 
> > Hey Ken;
> > 
> > > I encountered an error during a Blast parse:
> > [...]
> > > SyntaxError: Expected blank line, but got:
> > >            1,455,628 sequences; 7,234,536,489 total letters
> > 
> > I actually got this error myself yesterday when I was playing around
> > with the examples and put a fix into CVS. See, I promised to only use
> > this time machine for good :-).
> > 
> > > The following change seems to fix this error.
> > > 
> > > Bio.Blast.NCBIWWW.py line 174
> > > -        read_and_call(uhandle, consumer.noevent, blank=1)
> > > -        read_and_call(uhandle, consumer.noevent,
> > > -                      contains='problems or questions')
> > > +        read_and_call_until(uhandle, consumer.noevent,
> > > +                      contains='problems or questions')
> > > +        read_and_call(uhandle, consumer.noevent)
> > 
> > The only problem with this is that it throws away the database
> > information, which we do store. The fix I used, in CVS, is attached as a
> > diff. This should also be in the new release, due out real soon now.
> > 
> > Thanks for reporting this!
> > Brad
> > 
> > [Attachment: NCBIWWW.diff]
> > 
> > Index: NCBIWWW.py
> > ===================================================================
> > RCS file: /home/repository/biopython/biopython/Bio/Blast/NCBIWWW.py,v
> > retrieving revision 1.24
> > retrieving revision 1.25
> > diff -c -r1.24 -r1.25
> > *** NCBIWWW.py	2002/09/22 05:25:29	1.24
> > --- NCBIWWW.py	2002/11/24 18:52:11	1.25
> > ***************
> > *** 168,176 ****
> >           attempt_read_and_call(uhandle, consumer.noevent, start='<p>')
> >           read_and_call(uhandle, consumer.database_info, contains='Database')
> >           # Sagar Damle reported that databases can consist of multiple lines.
> >           read_and_call_until(uhandle, consumer.database_info,
> > !                             contains='sequences')
> > !         read_and_call(uhandle, consumer.database_info, contains='sequences')
> >           read_and_call(uhandle, consumer.noevent, blank=1)
> >           read_and_call(uhandle, consumer.noevent,
> >                         contains='problems or questions')
> > --- 168,178 ----
> >           attempt_read_and_call(uhandle, consumer.noevent, start='<p>')
> >           read_and_call(uhandle, consumer.database_info, contains='Database')
> >           # Sagar Damle reported that databases can consist of multiple lines.
> > +         # But, trickily enough, sometimes the second line can also have the
> > +         # word sequences in it. Try to use 'sequences;' (with a semicolon)
> >           read_and_call_until(uhandle, consumer.database_info,
> > !                             contains='sequences;')
> > !         read_and_call(uhandle, consumer.database_info, contains='sequences;')
> >           read_and_call(uhandle, consumer.noevent, blank=1)
> >           read_and_call(uhandle, consumer.noevent,
> >                         contains='problems or questions')
> > 
> > 
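
To see why the marker in the diff above changes from 'sequences' to 'sequences;', here is a small sketch. It assumes a simplified stand-in for `read_and_call_until` (the `read_until` helper below is hypothetical, not the ParserSupport implementation), and the two database description lines are made up; only the counts line is taken from Ken's report. It is written for Python 3.

```python
def read_until(lines, start, contains):
    """Collect lines from index `start` up to, but not including, the first
    line containing `contains`. Returns (collected, index_of_marker_line).

    Hypothetical, simplified stand-in for the read_and_call_until idea.
    """
    collected = []
    i = start
    while i < len(lines) and contains not in lines[i]:
        collected.append(lines[i])
        i += 1
    if i == len(lines):
        raise ValueError("marker %r not found" % (contains,))
    return collected, i


# Made-up header whose description wraps onto a second line that happens
# to contain the word "sequences". The counts line is from the report.
HEADER = [
    "Database: example database of protein",
    "sequences from several sources",
    "           1,455,628 sequences; 7,234,536,489 total letters",
    "",
]


def line_expected_blank(lines, marker):
    """Return the line a parser would next require to be blank.

    Index 0 (the 'Database' line) is treated as already consumed; the line
    holding the marker is consumed as the counts line, so the line after it
    is the one that must be blank.
    """
    _, marker_index = read_until(lines, 1, marker)
    return lines[marker_index + 1]


if __name__ == "__main__":
    # With 'sequences' the wrapped description line is mistaken for the
    # counts line, so the real counts line lands where a blank is expected.
    print(repr(line_expected_blank(HEADER, "sequences")))
    # With 'sequences;' the scan stops at the real counts line.
    print(repr(line_expected_blank(HEADER, "sequences;")))
```

With the looser marker, the line that should be blank is the counts line itself, which matches the "Expected blank line, but got" message in the report; the semicolon makes the marker specific to the counts line.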
> > 
> > --__--__--
> > 
> > _______________________________________________
> > BioPython mailing list  -  BioPython@biopython.org
> > http://biopython.org/mailman/listinfo/biopython
> > 
> > 
> > End of BioPython Digest
> 