[BioPython] NCBIXML for multiple queries

David Weisman weisman at lydon.com
Mon Jan 16 10:50:26 EST 2006


Hello,

I tried using NCBIXML to parse the output of a local BLAST run in which the input
contained multiple query sequences. Blastall writes multiple XML documents to the
output file, and the SAX parser threw a SAXParseException on the second <?xml ...?>
declaration, complaining of "junk after document element".
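The failure can be reproduced without BLAST at all. Below is a minimal sketch that uses Python's standard-library `xml.sax` instead of Biopython's parser. The toy `<BlastOutput>` content and the helper name `parses_as_one_document` are illustrative assumptions, not real BLAST output or a Biopython API. Two complete documents written back to back are not one well-formed XML document, so the parser rejects the second declaration.

```python
import xml.sax

# Two complete XML documents written back to back (toy content, an
# assumption standing in for concatenated blastall XML output).
concatenated = (
    b'<?xml version="1.0"?>\n<BlastOutput>first</BlastOutput>\n'
    b'<?xml version="1.0"?>\n<BlastOutput>second</BlastOutput>\n'
)


def parses_as_one_document(data: bytes) -> bool:
    """Return True if the SAX parser accepts data as a single XML document."""
    try:
        xml.sax.parseString(data, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as err:
        # Report the parser's own error message rather than guessing it.
        print("SAXParseException:", err)
        return False
    return True


print(parses_as_one_document(concatenated))
```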

I couldn't find an obvious workaround, so I wrote a Python generator function that
yields a new file handle (based on cStringIO) for each XML document in the stream.
The usage model is:

     from Bio.Blast import NCBIStandalone, NCBIXML
     import xmlStreamSeparator   # new

     blastInFile = open(blastInPath, "r")   # composite blast output

     x_gen = xmlStreamSeparator.getXmlDoc(blastInFile)
     x_doc = x_gen.next()
     while not xmlStreamSeparator.xmlStreamEOF(x_doc):
         blast_iter = NCBIStandalone.Iterator(x_doc, NCBIXML.BlastParser())

         for b_rec in blast_iter:
             pass   # process each blast record here

         x_doc = x_gen.next()   # get next xml doc from stream
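For comparison, here is a minimal sketch of what a splitter like this might look like in current Python. It is not the `getXmlDoc` code from the post, which was not included. The name `split_xml_documents` is hypothetical, `io.StringIO` stands in for cStringIO, and it assumes every document begins with an `<?xml` declaration at the start of a line.

```python
import io
from typing import Iterable, Iterator, TextIO


def split_xml_documents(lines: Iterable[str]) -> Iterator[TextIO]:
    """Yield one in-memory file handle per XML document in a concatenated stream.

    Hypothetical helper. Assumes each document starts with an XML declaration
    ("<?xml ...?>") at the beginning of a line.
    """
    buffer = []
    for line in lines:
        # A new declaration marks the start of the next document, so flush
        # whatever has been collected for the previous one (if anything).
        if line.lstrip().startswith("<?xml") and "".join(buffer).strip():
            yield io.StringIO("".join(buffer))
            buffer = []
        buffer.append(line)
    # Flush the final document, skipping whitespace-only leftovers.
    if "".join(buffer).strip():
        yield io.StringIO("".join(buffer))
```

Because this is an ordinary generator that simply stops at the end of the input, a plain `for x_doc in split_xml_documents(blastInFile):` loop could replace the explicit `next()` calls and the EOF-sentinel check. Each `x_doc` would then be handed to the parser as before.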

Any pointers to a better model?  Many thanks for any tips.

Regards,
David
