[Biojava-dev] biojava3-ws alternate NCBIQBlastService implementation

Gediminas Rimša gediminas.rimsa at gmail.com
Sat Feb 11 22:25:10 UTC 2012


Hi,
the new implementation of NCBIQBlastService is now on biojava-live. 
For a simple usage example, see the demo.NCBIQBlastServiceDemo class.


Also, I couldn't find much about parsing Blast XML results in Java when 
I needed it, so here's a short guide for that. It might not be pretty, 
but it worked for me :)

Step 1. Acquire Blast output in XML format (for example from 
NCBIQBlastService). It will start like this (note the root element 
"BlastOutput"):

<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" 
"NCBI_BlastOutput.dtd">
<BlastOutput>
...

Step 2. Acquire referenced schema files - you will need 
NCBI_BlastOutput.dtd, NCBI_BlastOutput.mod.dtd and NCBI_Entity.mod.dtd 
(they can be found on NCBI site or attached to this message).

Step 3. Use XJC to generate Java classes from XML schema.  I used 
Maven's JAXB plugin:

<plugin>
<groupId>org.jvnet.jaxb2.maven2</groupId>
<artifactId>maven-jaxb2-plugin</artifactId>
<version>0.8.0</version>
<executions>
<execution>
<goals>
<goal>generate</goal>
</goals>
<configuration>
<!-- package name for generated classes -->
<generatePackage>ncbi.blast.result.generated</generatePackage>
<generateDirectory>${basedir}/src/main/java</generateDirectory>
<schemaLanguage>dtd</schemaLanguage>
<schemaIncludes>
<!-- main schema file location (here: /src/main/resources/outputSchema/NCBI_BlastOutput.dtd) -->
<value>outputSchema/NCBI_BlastOutput.dtd</value>
</schemaIncludes>
</configuration>
</execution>
</executions>
<dependencies>
<dependency>
<groupId>org.jvnet.jaxb2-commons</groupId>
<artifactId>property-listener-injector</artifactId>
<version>1.0</version>
</dependency>
</dependencies>
</plugin>

Alternatively, you can run XJC from the command line; for an example, see: 
http://plindenbaum.blogspot.com/2010/11/blastxmlannotations.html

Step 4. Put all 3 schema files next to the generated classes (this, 
together with a custom EntityResolver in the next step, is done so that 
you don't have to copy the schema files to every directory in which you 
want to process blast output XML files).
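
A quick way to confirm that Step 4 worked is to ask the class loader for the three files. The sketch below is only an illustration: the class name SchemaResourceCheck and the method missingResources are made up for this example, and in a real project you would pass the generated BlastOutput class as the anchor. Keep in mind that with a default Maven layout only files under src/main/resources are copied to the classpath, so the DTDs would go into the matching package directory there rather than into src/main/java.

```java
import java.util.ArrayList;
import java.util.List;

public class SchemaResourceCheck {

    /** Returns the names that cannot be found next to the anchor class on the classpath. */
    static List<String> missingResources(Class<?> anchor, String... names) {
        List<String> missing = new ArrayList<String>();
        for (String name : names) {
            // A relative name is resolved against the package of the anchor class.
            if (anchor.getResource(name) == null) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumption: in the real project the anchor would be the generated
        // BlastOutput class; this class is used only so the sketch compiles alone.
        List<String> missing = missingResources(SchemaResourceCheck.class,
                "NCBI_BlastOutput.dtd", "NCBI_BlastOutput.mod.dtd", "NCBI_Entity.mod.dtd");
        System.out.println(missing.isEmpty()
                ? "All schema files found"
                : "Missing: " + missing);
    }
}
```

If any name is reported as missing, the EntityResolver in Step 5 would have nothing to hand to the parser for that file.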

Step 5. Create BlastOutput object representing root XML element:

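    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.Unmarshaller;
    import javax.xml.transform.Source;
    import javax.xml.transform.sax.SAXSource;
    import org.xml.sax.EntityResolver;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;
    import org.xml.sax.XMLReader;
    import org.xml.sax.helpers.XMLReaderFactory;

    // The statements below form the body of a method that returns BlastOutput.
    JAXBContext jc = JAXBContext.newInstance(BlastOutput.class);
    Unmarshaller u = jc.createUnmarshaller();

    XMLReader xmlreader = XMLReaderFactory.createXMLReader();
    xmlreader.setFeature("http://xml.org/sax/features/namespaces", true);
    xmlreader.setFeature("http://xml.org/sax/features/namespace-prefixes", true);

    // Serve the three schema files from the classpath, next to BlastOutput.class,
    // instead of from the directory that holds the XML file.
    xmlreader.setEntityResolver(new EntityResolver() {
        public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, IOException {
            String file = null;
            if (systemId.contains("NCBI_BlastOutput.dtd")) {
                file = "NCBI_BlastOutput.dtd";
            } else if (systemId.contains("NCBI_Entity.mod.dtd")) {
                file = "NCBI_Entity.mod.dtd";
            } else if (systemId.contains("NCBI_BlastOutput.mod.dtd")) {
                file = "NCBI_BlastOutput.mod.dtd";
            }
            if (file == null) {
                // Unknown entity: fall back to the parser's default resolution
                // rather than passing null to getResourceAsStream.
                return null;
            }
            return new InputSource(BlastOutput.class.getResourceAsStream(file));
        }
    });

    // A byte stream lets the parser honor the encoding declared in the XML.
    InputSource input = new InputSource(
            new FileInputStream(new File("blast-results-file.xml")));
    Source source = new SAXSource(xmlreader, input);

    return (BlastOutput) u.unmarshal(source);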
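
To see the resolver redirect on its own, without JAXB or the real schema files, here is a minimal sketch. Everything in it apart from the DOCTYPE line is an assumption made for illustration: the class name ResolverDemo, the one-line stand-in DTD and the counter are not part of the setup above. It parses a tiny document whose DOCTYPE points at NCBI_BlastOutput.dtd and answers the parser's request from memory, so no DTD file has to exist on disk.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ResolverDemo {

    // Tiny stand-in DTD; the real NCBI_BlastOutput.dtd is much larger.
    static final String TOY_DTD = "<!ELEMENT BlastOutput (#PCDATA)>";

    static final String XML =
            "<?xml version=\"1.0\"?>\n"
          + "<!DOCTYPE BlastOutput PUBLIC \"-//NCBI//NCBI BlastOutput/EN\" "
          + "\"NCBI_BlastOutput.dtd\">\n"
          + "<BlastOutput>ok</BlastOutput>";

    /** Parses XML and returns how often the resolver supplied the DTD. */
    static int parseAndCountLookups() {
        final int[] lookups = {0};
        try {
            XMLReader reader =
                    SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setEntityResolver(new EntityResolver() {
                public InputSource resolveEntity(String publicId, String systemId) {
                    // The parser usually expands the relative system id to a full
                    // URI, so match on the end of the string.
                    if (systemId != null && systemId.endsWith("NCBI_BlastOutput.dtd")) {
                        lookups[0]++;
                        return new InputSource(new StringReader(TOY_DTD));
                    }
                    return null; // anything else: default resolution
                }
            });
            reader.setContentHandler(new DefaultHandler());
            reader.parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return lookups[0];
    }

    public static void main(String[] args) {
        System.out.println("DTD lookups answered by the resolver: "
                + parseAndCountLookups());
    }
}
```

The resolver in Step 5 does the same thing, except that it returns streams for the real schema files stored next to the generated classes.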

Step 6. Use the created blastOutput like any other Java object. For 
example, if you want to get the number of Blast iterations, you can do 
it like this:
blastOutput.getBlastOutputIterations().getIteration().size()
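
If you want to sanity-check that number without the generated classes, the sketch below counts <Iteration> elements with plain DOM. Note that this is a different technique from the JAXB approach in this guide: the class IterationCount and its method are hypothetical names, and the resolver simply hands the parser an empty DTD so that no schema files are needed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class IterationCount {

    /** Counts elements named exactly "Iteration" in a Blast XML string. */
    static int countIterations(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Answer every external entity request with an empty document,
            // so the parser does not go looking for the NCBI DTD files.
            builder.setEntityResolver(new EntityResolver() {
                public InputSource resolveEntity(String publicId, String systemId) {
                    return new InputSource(new StringReader(""));
                }
            });
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("Iteration").getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Hand-written sample, not real Blast output.
        String sample =
                "<?xml version=\"1.0\"?>\n"
              + "<!DOCTYPE BlastOutput PUBLIC \"-//NCBI//NCBI BlastOutput/EN\" "
              + "\"NCBI_BlastOutput.dtd\">\n"
              + "<BlastOutput><BlastOutput_iterations>"
              + "<Iteration/><Iteration/>"
              + "</BlastOutput_iterations></BlastOutput>";
        System.out.println(countIterations(sample));
    }
}
```

For a real results file, the count should agree with the size() call above.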

And that's about it. Hope this helps someone.

Gediminas
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/biojava-dev/attachments/20120212/80b196e2/attachment-0006.html>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/biojava-dev/attachments/20120212/80b196e2/attachment-0007.html>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/biojava-dev/attachments/20120212/80b196e2/attachment-0008.html>

