[Biojava-dev] biojava3 BLAST parser

Scooter Willis HWillis at scripps.edu
Tue Aug 31 14:11:30 UTC 2010


Deniz

It would be great to formalize the BLAST XML results as Java classes. Do you have any interest in taking on the project?

Capturing the BLAST alignment using the new alignment classes would be a very nice feature. I like using XPath as the query language to select hits of interest, which should allow for a SAX-based approach to minimize the memory impact of very large XML files. Combining XPath and SAX does appear to have some constraints, though (http://stackoverflow.com/questions/1863250/is-it-there-any-xpath-processor-for-sax-model).
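
A minimal sketch of the SAX side of this idea, assuming the standard NCBI BLAST XML element names (Hit, Hit_accession, Hsp_evalue). The class names BlastSaxSketch, HitSummary and HitHandler are hypothetical and are not part of biojava3. It streams through the document and keeps only hits whose best HSP e-value is at or below a cutoff, so the whole file never has to sit in memory as a DOM tree.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class BlastSaxSketch {

    /** Hypothetical minimal holder for one hit; not a BioJava class. */
    static final class HitSummary {
        final String accession;
        final double bestEvalue;

        HitSummary(String accession, double bestEvalue) {
            this.accession = accession;
            this.bestEvalue = bestEvalue;
        }
    }

    /** Collects hits whose lowest Hsp_evalue is <= maxEvalue. */
    static final class HitHandler extends DefaultHandler {
        private final double maxEvalue;
        private final List<HitSummary> kept = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private String accession;
        private double bestEvalue;

        HitHandler(double maxEvalue) {
            this.maxEvalue = maxEvalue;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // start collecting text for the new element
            if ("Hit".equals(qName)) {
                accession = null;
                bestEvalue = Double.POSITIVE_INFINITY;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may be called several times per element
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("Hit_accession".equals(qName)) {
                accession = text.toString().trim();
            } else if ("Hsp_evalue".equals(qName)) {
                bestEvalue = Math.min(bestEvalue, Double.parseDouble(text.toString().trim()));
            } else if ("Hit".equals(qName) && bestEvalue <= maxEvalue) {
                kept.add(new HitSummary(accession, bestEvalue));
            }
        }

        List<HitSummary> kept() {
            return kept;
        }
    }

    /** Parses BLAST XML held in a string; a real caller would pass a file or stream instead. */
    static List<HitSummary> filterHits(String blastXml, double maxEvalue) {
        try {
            HitHandler handler = new HitHandler(maxEvalue);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(blastXml)), handler);
            return handler.kept();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse BLAST XML", e);
        }
    }
}
```

The selection rule is hard-coded in the handler here rather than written as an XPath expression, which is the trade-off the link above discusses: a plain SAX handler only sees one event at a time.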

It probably makes sense to have a BLAST module that would depend on core and alignment.

Thanks

Scooter



On Aug 31, 2010, at 8:49 AM, Deniz Koellhofer wrote:

Hi Scooter,

Thanks for the reply. I guess the BlastXMLQuery is a good example to show how to quickly extract information from a BLAST result.

But in my opinion biojava3 should also have a BLAST parser that generates Java beans containing the complete BLAST result set, similar to what biojava1.7.1 was doing. So yeah, I'm after translating the XML elements to Java classes.
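
As a rough illustration of what "translating the XML elements to Java classes" could look like, here is a minimal sketch of a bean hierarchy that mirrors the nesting of the NCBI BLAST XML (BlastOutput, Iteration, Hit, Hsp). All class and field names are hypothetical, only a few elements are covered, and this is not the biojava1.7.1 or biojava3 object model.

```java
import java.util.ArrayList;
import java.util.List;

final class BlastBeansSketch {

    /** One high-scoring pair; fields correspond to Hsp_evalue and Hsp_midline. */
    static final class Hsp {
        private double evalue;
        private String midline;

        double getEvalue() { return evalue; }
        void setEvalue(double evalue) { this.evalue = evalue; }
        String getMidline() { return midline; }
        void setMidline(String midline) { this.midline = midline; }
    }

    /** One database hit; fields correspond to Hit_accession and Hit_def. */
    static final class Hit {
        private String accession;
        private String definition;
        private final List<Hsp> hsps = new ArrayList<>();

        String getAccession() { return accession; }
        void setAccession(String accession) { this.accession = accession; }
        String getDefinition() { return definition; }
        void setDefinition(String definition) { this.definition = definition; }
        List<Hsp> getHsps() { return hsps; }
    }

    /** One query iteration with its list of hits. */
    static final class Iteration {
        private String queryDefinition;
        private final List<Hit> hits = new ArrayList<>();

        String getQueryDefinition() { return queryDefinition; }
        void setQueryDefinition(String queryDefinition) { this.queryDefinition = queryDefinition; }
        List<Hit> getHits() { return hits; }
    }

    /** Root of the result set, corresponding to the BlastOutput element. */
    static final class BlastResult {
        private String program;
        private final List<Iteration> iterations = new ArrayList<>();

        String getProgram() { return program; }
        void setProgram(String program) { this.program = program; }
        List<Iteration> getIterations() { return iterations; }
    }
}
```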

Would something like that fit into one of the biojava3 modules? homology, I/O?

Thanks,
Deniz


On Tue, Aug 31, 2010 at 8:43 PM, Scooter Willis <HWillis at scripps.edu> wrote:
Deniz

Can you provide some requirements regarding parsing the BLAST XML? I tend to use XPath and the DOM object to get to the data elements of interest, so you already have the ability to load the BLAST XML and work with the data. The difficulty of "parsing" is not an issue with XML. The BlastXMLQuery is an example of searching the BLAST XML to get results. Do you want the XML elements translated to Java classes?
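
A minimal sketch of the XPath-over-DOM approach described above, using only the JDK's javax.xml APIs and assuming the standard NCBI element names (Hit, Hit_accession, Hit_hsps, Hsp, Hsp_evalue). The class BlastXPathSketch and its method are hypothetical and do not reproduce what BlastXMLQuery actually does. The e-value comparison is done in Java rather than inside the XPath expression because the XPath 1.0 number syntax has no exponent form, so values such as 1e-50 would not compare as numbers there.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class BlastXPathSketch {

    /** Returns the accession of every hit whose first HSP has an e-value <= maxEvalue. */
    static List<String> accessionsBelow(String blastXml, double maxEvalue) {
        try {
            // Load the whole document into a DOM tree.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(blastXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Select all Hit elements, then query each one with relative paths.
            NodeList hits = (NodeList) xpath.evaluate("//Hit", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                Node hit = hits.item(i);
                String evalue = xpath.evaluate("Hit_hsps/Hsp[1]/Hsp_evalue", hit).trim();
                if (!evalue.isEmpty() && Double.parseDouble(evalue) <= maxEvalue) {
                    result.add(xpath.evaluate("Hit_accession", hit).trim());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not query BLAST XML", e);
        }
    }
}
```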

Thanks

Scooter

On Aug 31, 2010, at 2:46 AM, Deniz Koellhofer wrote:

> Hi,
>
> I wanted to find out the current state of blast parsing efforts in biojava3
> - especially for ncbi blastxml output?
>
> I had a quick look and found some DOM based code fragments
> in org.biojava3.genome.query.BlastXMLQuery. Is there already anybody working
> on a more comprehensive SAX parser?
>
> The biojava1.7.1 blastxml parser seems to work fine, however some of the
> tags in NCBI-BLASTN 2.2.23+ output like Hsp_midline, BlastOutput_param don't
> seem to get parsed properly
> in org.biojava.bio.program.sax.blastxml.BlastXMLParserFacade.
>
> Cheers,
> Deniz
>
> --
> Deniz Koellhofer
> Cambia
> Initiative for Open Innovation (IOI)
> Cambia at QUT, G301, 2 George Street, Brisbane Qld 4000, Australia
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev








