[Biojava-dev] bioperl like blastparser

Michael Gang michaelgang at gmail.com
Mon Dec 24 08:29:45 UTC 2007


OK,
I made four changes.

In the package org.biojava.bio.program.sax, in class BlastSaxParser:
1) At line 86 I added the field
private String oQueryLength;
2) In the method private void interpret(String poLine) throws SAXException,
inside the block "if (iState == IN_HEADER) {", at line 209 I added

if (poLine.startsWith("(", 9) && poLine.endsWith("letters)")) {
    StringTokenizer st = new StringTokenizer(poLine);
    oQueryLength = st.nextToken().substring(1);
}
3) In the method private void emitHeaderIds() throws SAXException, at line 564
I added
oAttQName.setQName("queryLength");
oAtts.addAttribute(oAttQName.getURI(),
                   oAttQName.getLocalName(),
                   oAttQName.getQName(),
                   "CDATA", oQueryLength);

In the package org.biojava.bio.program.ssbind, in HeaderStAXHandler.java:
4) In the private class QueryIDStAXHandler, at line 95, I changed the
method startElement to

        public void startElement(String            uri,
                                 String            localName,
                                 String            qName,
                                 Attributes        attr,
                                 DelegationManager dm)
        throws SAXException
        {
            ssContext.getSearchContentHandler().setQueryID(attr.getValue("id"));
            if (attr.getValue("queryLength") != null)
            {
                ssContext.getSearchContentHandler().addSearchProperty(
                    "queryLength", attr.getValue("queryLength"));
            }
        }
    }
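
To make change 2 easier to try outside the parser, here is a minimal
stand-alone sketch of the same header-line test. The class name
QueryLengthSketch and the method parseQueryLength are hypothetical and not
part of BioJava; the sample line in main is an assumed example of a header
line with "(" at index 9.

```java
import java.util.StringTokenizer;

// Hypothetical stand-alone class for illustration; not part of BioJava.
final class QueryLengthSketch {

    /**
     * Returns the text between "(" and the following whitespace, or null
     * if the line does not look like a query-length header line.
     */
    static String parseQueryLength(String line) {
        // Same condition as in the patch: "(" at index 9, line ends with "letters)".
        if (line.startsWith("(", 9) && line.endsWith("letters)")) {
            StringTokenizer st = new StringTokenizer(line);
            // The first whitespace-delimited token is e.g. "(1234"; drop the "(".
            return st.nextToken().substring(1);
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed sample lines, not taken from a real report.
        System.out.println(parseQueryLength("         (1234 letters)"));
        System.out.println(parseQueryLength("Query= some sequence"));
    }
}
```

As in the patch, the value stays a String exactly as it appears in the
report, so any number formatting in the report would carry through unchanged.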

The query length is now a property of the annotation of a BLAST result.
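
The hand-off between changes 3 and 4 can be sketched with plain JDK SAX
classes. This is only an illustration under assumptions: the class
QueryLengthAttributeSketch and its two methods are hypothetical, a Map stands
in for the search content handler, and the null guard on the producer side is
my own addition for the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical stand-alone class; the BioJava QName helper, DelegationManager
// and search content handler are replaced by plain JDK types here.
final class QueryLengthAttributeSketch {

    /** Producer side: attach "id" and, if known, "queryLength" as attributes. */
    static Attributes buildHeaderAttributes(String queryId, String queryLength) {
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "id", "id", "CDATA", queryId);
        if (queryLength != null) {
            atts.addAttribute("", "queryLength", "queryLength", "CDATA", queryLength);
        }
        return atts;
    }

    /** Consumer side: copy the optional attribute into a property map. */
    static Map<String, String> readSearchProperties(Attributes attr) {
        Map<String, String> props = new LinkedHashMap<>();
        // getValue(qName) returns null when the attribute is absent.
        String length = attr.getValue("queryLength");
        if (length != null) {
            props.put("queryLength", length);
        }
        return props;
    }

    public static void main(String[] args) {
        Attributes atts = buildHeaderAttributes("query1", "1234");
        System.out.println(readSearchProperties(atts));
    }
}
```

The null check on the consumer side mirrors the one in startElement, so
reports without a query-length line simply produce no such property.
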
It is really fun to participate in the biojava project.

Best regards,
Michael

On Dec 24, 2007 2:32 AM, Mark Schreiber <markjschreiber at gmail.com> wrote:
> Hi -
>
> We are currently merging the code base into Subversion (from CVS);
> after this it will be possible to check in code again.  For small
> additions it is usually easier to post the code to the dev list (in
> the body of the email, as the list doesn't like attachments) or send it
> to one of the regular committers and get them to add it.
>
> The JUnit tests are the standard test package. If you have added new
> functionality it would be a good idea to add another test method in
> the appropriate JUnit test to make sure it works (and continues to
> work in the future).
>
> - Mark
>
>
> On Dec 23, 2007 11:22 PM, Michael Gang <michaelgang at gmail.com> wrote:
> > Hi all,
> >
> > I've now added the extraction of the query length.
> > Can someone explain the procedure for checking in code to BioJava?
> > I ran the unit tests in the BioJava distribution. Are there additional
> > tests available?
> >
> > Best regards,
> > Michael
> >
> >
> > On Dec 21, 2007 9:59 AM, Mark Schreiber <markjschreiber at gmail.com> wrote:
> > > Hi -
> > >
> > > It is not required that you turn all Blast results into objects.
> > > Because it is an event-based parser, you can do what you want with the
> > > events, including turning them into objects or echoing them to STDOUT.
> > > Take a look at the examples in the cookbook.
> > >
> > > It may be that the query length is actually parsed but is not passed
> > > onto the object model by the event listeners.
> > >
> > > - Mark
> > >
> > >
> > > On Dec 21, 2007 12:15 AM, Andreas Prlic <ap3 at sanger.ac.uk> wrote:
> > > > Hi Michael,
> > > >
> > > > The blast parser (BlastLikeSaxParser) in BioJava has been around for
> > > > a while and is frequently used to parse a variety
> > > > of different blast outputs. Still, it is not complete and cannot
> > > > parse PSI-BLAST output. We have had a number of requests about it lately,
> > > > so I suppose it needs a little maintenance now.
> > > >
> > > > Writing a new blast parser from scratch would take a significant
> > > > amount of time: fixing all the bugs, adding support
> > > > for the different blast versions and writing documentation. Much of
> > > > this is already available in BioJava, so I would prefer if you could
> > > > submit patches for the current blast parser.  Would you also be
> > > > interested in collaborating in this direction?
> > > > Another feature that would be nice to support is the
> > > > possibility of sending blast searches to web services...
> > > >
> > > > Cheers,
> > > > Andreas
> > > >
> > > >
> > > >
> > > > On 20 Dec 2007, at 12:54, Michael Gang wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I used the interface of the Java blast parser.
> > > > > I had two main problems with it:
> > > > > 1) The blast parser does not parse all the information (for example,
> > > > > the query length).
> > > > > 2) The blast parser parses the whole blast report into a list, which
> > > > > eats a lot of memory.
> > > > >
> > > > > I would be interested in writing and contributing a blast parser which
> > > > > parses all the information in the blast report and parses it
> > > > > iteratively,
> > > > > something like the following code in BioPerl (just in Java).
> > > > >     use Bio::SearchIO;
> > > > >     # format can be 'fasta', 'blast'
> > > > >     my $searchio = Bio::SearchIO->new( -format => 'blastxml',
> > > > >                                        -file   => 'blastout.xml' );
> > > > >     while ( my $result = $searchio->next_result() ) {
> > > > >         while ( my $hit = $result->next_hit ) {
> > > > >             # process the Bio::Search::Hit::HitI object
> > > > >             while ( my $hsp = $hit->next_hsp ) {
> > > > >                 # process the Bio::Search::HSP::HSPI object
> > > > >             }
> > > > >         }
> > > > >     }
> > > > >
> > > > > Would you be interested in such a contribution ?
> > > > >
> > > > > Best regards,
> > > > > Michael
> > > > > _______________________________________________
> > > > > biojava-dev mailing list
> > > > > biojava-dev at lists.open-bio.org
> > > > > http://lists.open-bio.org/mailman/listinfo/biojava-dev
> > > >
> > > > -----------------------------------------------------------------------
> > > >
> > > > Andreas Prlic      Wellcome Trust Sanger Institute
> > > >                               Hinxton, Cambridge CB10 1SA, UK
> > > >                               +44 (0) 1223 49 6891
> > > >
> > > > -----------------------------------------------------------------------
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > >  The Wellcome Trust Sanger Institute is operated by Genome Research
> > > >  Limited, a charity registered in England with number 1021457 and a
> > > >  company registered in England with number 2742969, whose registered
> > > >  office is 215 Euston Road, London, NW1 2BE.
> > > >
> > > >
> > >
> >
>


