[BioPython] Need help parsing Blastoutput

Michiel De Hoon mdehoon at c2b2.columbia.edu
Mon Apr 24 18:14:17 UTC 2006


Ha, I see. My stupid email program was stripping the XML file from your email
messages, supposedly for security reasons.
Anyway, I got the XML files from the mailing list archives.

The XML file from Thursday April 20 is different from the one sent on Monday
April 24. In fact, the latter seems to be damaged; in line 194, it has:

<?xml version="1.1?>

while the former has

<?xml version="1.0"?>

So in the latter the closing " is missing for some reason.
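
If you want to check a file for this kind of damage yourself, a quick scan for
malformed XML declarations may help. This is only a sketch in present-day
Python, not part of Biopython: the name find_bad_declarations and the regular
expression are my own assumptions, and the pattern only covers simple
declarations like the two shown above.

```python
import re

# Assumed pattern for a well-formed declaration such as <?xml version="1.0"?>,
# optionally followed by further pseudo-attributes (encoding, standalone).
XML_DECLARATION = re.compile(r"""^<\?xml\s+version=(["'])[0-9.]+\1(\s+[^?]*)?\?>$""")


def find_bad_declarations(lines):
    """Return (line_number, text) for every XML declaration that looks damaged.

    `lines` can be any iterable of strings, for example an open file handle.
    Line numbers start at 1.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        # Only lines that claim to be a declaration are inspected.
        if text.startswith("<?xml") and not XML_DECLARATION.match(text):
            bad.append((number, text))
    return bad
```

Passing an open file handle works because the function simply iterates over
lines; an empty result means no suspicious declaration was found.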

Anyway, the XML parser can read the XML file from Thursday April 20 if you
fix a few things in your script:

*) Instead of
    b_record = b_parser.parse(b_out)
you need
    b_iterator = NCBIStandalone.Iterator(b_out, b_parser)
(which means you also have to import NCBIStandalone).

*) Check whether b_record is None immediately after
    b_record = b_iterator.next()

*) There is no hsp.sbject_end attribute.
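
Since the original goal was the subject end and the % coverage of the subject,
one possible workaround is to derive both from the subject start and the
aligned subject string. The sketch below uses plain values rather than
Biopython objects; the function names are hypothetical, and it assumes a
plus-strand, untranslated alignment in which every non-gap character of the
aligned subject string corresponds to one subject position. It looks at a
single HSP only and does not merge overlapping HSPs.

```python
def subject_end(sbjct_start, aligned_sbjct):
    """Estimate the last subject position covered by one HSP.

    sbjct_start   -- 1-based start position of the HSP on the subject
    aligned_sbjct -- aligned subject string, with '-' marking gaps
    Assumes plus strand and one subject position per non-gap character.
    """
    residues = len(aligned_sbjct) - aligned_sbjct.count("-")
    return sbjct_start + residues - 1


def subject_coverage(aligned_sbjct, subject_length):
    """Percentage of the subject sequence covered by one HSP."""
    residues = len(aligned_sbjct) - aligned_sbjct.count("-")
    return 100.0 * residues / subject_length
```

For example, an HSP that starts at subject position 10 with the aligned string
"ACG-TA" covers five subject positions, so it ends at position 14.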


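The None check from the second point can be written as a simple loop. The
sketch below is in present-day Python and uses a stand-in class, StubIterator,
which is purely hypothetical and only mimics an iterator whose next() method
returns None once the input is exhausted; it is not the Biopython class.

```python
class StubIterator:
    """Hypothetical stand-in: next() returns None when nothing is left."""

    def __init__(self, records):
        self._records = list(records)

    def next(self):
        if self._records:
            return self._records.pop(0)
        return None


def collect_records(iterator):
    """Call iterator.next() until it returns None and gather the records."""
    records = []
    while True:
        record = iterator.next()
        if record is None:
            # Stop before touching any attribute of the record.
            break
        records.append(record)
    return records
```

The point of the pattern is that the check comes directly after the call to
next(), before any attribute of the record is used.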

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032



-----Original Message-----
From: Halima Rabiu [mailto:halima at cbio.uct.ac.za]
Sent: Mon 4/24/2006 4:45 AM
To: Michiel De Hoon
Cc: biopython at lists.open-bio.org
Subject: RE: [BioPython] Need help parsing Blastoutput
 
Hi,
Attached here is the XML output; I also attached it to my previous post.
Thanks,
Halimah

On Thu, 20 Apr 2006, Michiel De Hoon wrote:

> Could you send us the Blast XML output also?
> 
> --Michiel.
> 
> Michiel de Hoon
> Center for Computational Biology and Bioinformatics
> Columbia University
> 1150 St Nicholas Avenue
> New York, NY 10032
> 
> 
> 
> -----Original Message-----
> From: Halima Rabiu [mailto:halima at cbio.uct.ac.za]
> Sent: Thu 4/20/2006 7:57 AM
> To: Michiel De Hoon
> Cc: biopython at lists.open-bio.org
> Subject: RE: [BioPython] Need help parsing Blastoutput
>  
> Thanks. I tried using the XML parser and I am still getting errors which I
> don't understand. Please see the attached copy of my script and Blast XML
> output.
> Here is the error:
> Traceback (most recent call last):
>   File "Bioperser.py", line 11, in ?
>     b_record = b_parser.parse(b_out)
>   File "/usr/local/lib/python2.4/site-packages/Bio/Blast/NCBIXML.py", line 112, in parse
>     self._parser.parse(handler)
>   File "/usr/local//lib/python2.4/xml/sax/expatreader.py", line 107, in parse
>     xmlreader.IncrementalParser.parse(self, source)
>   File "/usr/local//lib/python2.4/xml/sax/xmlreader.py", line 123, in parse
>     self.feed(buffer)
>   File "/usr/local//lib/python2.4/xml/sax/expatreader.py", line 211, in feed
>     self._err_handler.fatalError(exc)
>   File "/usr/local//lib/python2.4/xml/sax/handler.py", line 38, in fatalError
>     raise exception
> thanks
> Halimah
> 
> On Wed, 19 Apr 2006, Michiel De Hoon wrote:
> 
> > The Blast parser fails to read your file because the format of Blast output
> > has changed. If I edit the data file so that it corresponds to the old format
> > (add a space here, remove a blank line there, etc.), the Blast parser reads
> > the file without problems. The easiest solution is to repeat the Blast run,
> > using XML for the output format, and use the Blast XML parser in Biopython to
> > parse the results.
> > 
> > A general question is if anybody still needs the parser for Blast text
> > output. Currently, we are confusing our users by having a Blast text parser
> > that tends to break. A broken parser may be worse than no parser.
> > 
> > --Michiel.
> > 
> > Michiel de Hoon
> > Center for Computational Biology and Bioinformatics
> > Columbia University
> > 1150 St Nicholas Avenue
> > New York, NY 10032
> > 
> > 
> > 
> > -----Original Message-----
> > From: Halima Rabiu [mailto:halima at cbio.uct.ac.za]
> > Sent: Wed 4/19/2006 6:15 AM
> > To: Michiel De Hoon
> > Cc: biopython at lists.open-bio.org
> > Subject: RE: [BioPython] Need help parsing Blastoutput
> >  
> > Hi,
> > Please see the attachment; it is part of my Blast output.
> > Yes, I am trying to parse text output from Blast. I used another script to
> > run the local Blast whose output I am trying to parse. The
> > NCBIStandalone.BlastParser was working fine without hsp.sbject_end, which is
> > one of the values I need to print out.
> > On checking the class diagrams in the cookbook, I found that sbject_end is
> > not included. I just need another way of printing the int(subject end).
> > Thanks for your help
> > Halimah
> > 
> > On Tue, 18 Apr 2006, Michiel De Hoon wrote:
> > 
> > > Could you also send us the file Enterococcus_out so we can run the script?
> > > 
> > > From the script, it looks like you're trying to parse text output from Blast.
> > > While this is possible (in theory), the format of Blast text output tends to
> > > change a lot, thereby breaking the parser in Biopython. It is more reliable
> > > to have Blast generate output in XML format, and use the XML parser:
> > > 
> > > blast_out = open('my_blast.xml', 'r')
> > > 
> > > from Bio.Blast import NCBIXML
> > > 
> > > b_parser = NCBIXML.BlastParser()
> > > b_record = b_parser.parse(blast_out)
> > > 
> > > See section 3.1.2 in the Biopython cookbook, and section 3.1.4 on how to
> > > generate Blast output in XML.
> > > 
> > > --Michiel.
> > > 
> > > 
> > > 
> > > Michiel de Hoon
> > > Center for Computational Biology and Bioinformatics
> > > Columbia University
> > > 1150 St Nicholas Avenue
> > > New York, NY 10032
> > > 
> > > 
> > > 
> > > -----Original Message-----
> > > From: Halima Rabiu [mailto:halima at cbio.uct.ac.za]
> > > Sent: Tue 4/18/2006 11:06 AM
> > > To: Michiel De Hoon
> > > Cc: biopython at lists.open-bio.org
> > > Subject: RE: [BioPython] Need help parsing Blastoutput
> > >  
> > > Thanks.
> > > Please see the attached copy of my script and copy of my Blast output.
> > > Thanks
> > > 
> > > 
> > > On Thu, 13 Apr 2006, Michiel De Hoon wrote:
> > > 
> > > > Could you send us the script you were using?
> > > > 
> > > > --Michiel.
> > > > 
> > > > Michiel de Hoon
> > > > Center for Computational Biology and Bioinformatics
> > > > Columbia University
> > > > 1150 St Nicholas Avenue
> > > > New York, NY 10032
> > > > 
> > > > 
> > > > 
> > > > -----Original Message-----
> > > > From: biopython-bounces at lists.open-bio.org on behalf of Halima Rabiu
> > > > Sent: Thu 4/13/2006 11:07 AM
> > > > To: biopython at lists.open-bio.org
> > > > Subject: [BioPython] Need help parsing Blastoutput
> > > >  
> > > > Hi All,
> > > > I have BLAST output from a local BLAST run.
> > > > I need to calculate the % alignment coverage with respect to my subject.
> > > > I tried parsing the BLAST output and wanted to print the
> > > > Sbjct start and Sbjct end, but I could not. Is there any way I could do this?
> > > > I am trying to get the match coverage between my query and subject. I don't
> > > > need Identities, but the total % alignment for the query or subject.
> > > > Thanks
> > > > Halimah
> > > > 
> > > > _______________________________________________
> > > > BioPython mailing list  -  BioPython at lists.open-bio.org
> > > > http://lists.open-bio.org/mailman/listinfo/biopython
> > > > 
> > > > 
> > > 
> > > 
> > 
> > 
> 
> 




