[Biojava-dev] bioperl like blastparser

Mark Schreiber markjschreiber at gmail.com
Tue Dec 25 21:44:32 UTC 2007


Hi -

When will the Subversion system be ready for check-ins?

- Mark

On Dec 24, 2007 4:29 PM, Michael Gang <michaelgang at gmail.com> wrote:
> OK,
> I made four changes.
> In the package org.biojava.bio.program.sax, in class BlastSAXParser:
> 1) At line 86 I added the variable
> private String oQueryLength;
> 2) In the method private void interpret(String poLine) throws SAXException,
> inside the block "if (iState == IN_HEADER) {",
> at line 209 I added
>
> if (poLine.startsWith("(", 9) && poLine.endsWith("letters)")) {
>     StringTokenizer st = new StringTokenizer(poLine);
>     oQueryLength = st.nextToken().substring(1);
> }
> 3) In the method private void emitHeaderIds() throws SAXException,
> at line 564 I added
>
> oAttQName.setQName("queryLength");
> oAtts.addAttribute(oAttQName.getURI(),
>                    oAttQName.getLocalName(),
>                    oAttQName.getQName(),
>                    "CDATA", oQueryLength);
>
> In the package org.biojava.bio.program.ssbind, in HeaderStAXHandler.java:
> 4) In the private class QueryIDStAXHandler, at line 95, I changed the
> method startElement:
>
>        public void startElement(String            uri,
>                                 String            localName,
>                                 String            qName,
>                                 Attributes        attr,
>                                 DelegationManager dm)
>        throws SAXException
>        {
>            ssContext.getSearchContentHandler().setQueryID(attr.getValue("id"));
>            if (attr.getValue("queryLength") != null)
>            {
>                ssContext.getSearchContentHandler().addSearchProperty(
>                    "queryLength", attr.getValue("queryLength"));
>            }
>        }
>    }
>
> Now the query length is a property of the annotation of a BLAST result.
> It is really fun to participate in the biojava project.
>
> Best regards,
> Michael
>
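For anyone reading along, the header check in change 2 can be tried outside the parser. The following is only a sketch under stated assumptions: the class and method names are hypothetical and not part of BioJava, it uses a regular expression rather than the startsWith/StringTokenizer approach in the patch, and it tolerates thousands separators in case a report prints them.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of BioJava. */
final class QueryLengthSketch {
    // Matches header lines such as "         (1234 letters)".
    // Commas are accepted on the assumption that some reports use them.
    private static final Pattern LETTERS =
        Pattern.compile("^\\s*\\(([0-9][0-9,]*) letters\\)\\s*$");

    /** Returns the query length if the line is a "(N letters)" header line. */
    static OptionalInt parseQueryLength(String line) {
        Matcher m = LETTERS.matcher(line);
        if (!m.matches()) {
            return OptionalInt.empty();
        }
        // Strip separators before converting to a number.
        return OptionalInt.of(Integer.parseInt(m.group(1).replace(",", "")));
    }

    public static void main(String[] args) {
        System.out.println(parseQueryLength("         (1234 letters)"));
        System.out.println(parseQueryLength("Query= my_sequence"));
    }
}
```

Returning an empty value for non-matching lines keeps the caller from storing a stale or null length, which is roughly what the null check in change 4 guards against.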
>
> On Dec 24, 2007 2:32 AM, Mark Schreiber <markjschreiber at gmail.com> wrote:
> > Hi -
> >
> > We are currently merging the code base into Subversion (from CVS);
> > after this it will be possible to check in code again.  For small
> > additions it is usually easier to post the code to the dev list (in
> > the body of the email as the list doesn't like attachments) or send it
> > to one of the regular committers and get them to add it.
> >
> > The JUnit tests are the standard test package. If you have added new
> > functionality it would be a good idea to add another test method in
> > the appropriate JUnit test to make sure it works (and continues to
> > work in the future).
> >
> > - Mark
> >
> >
> > On Dec 23, 2007 11:22 PM, Michael Gang <michaelgang at gmail.com> wrote:
> > > Hi all,
> > >
> > > I've now added the extraction of the query length.
> > > Can someone explain to me the procedure for checking code in to BioJava?
> > > I ran the unit tests in the BioJava distribution. Are there additional
> > > tests available?
> > >
> > > Best regards,
> > > Michael
> > >
> > >
> > > On Dec 21, 2007 9:59 AM, Mark Schreiber <markjschreiber at gmail.com> wrote:
> > > > Hi -
> > > >
> > > > It is not required that you turn all Blast results into objects.
> > > > Because it is an event-based parser, you can do what you want with the
> > > > events, including turning them into objects or echoing them to STDOUT.
> > > > Take a look at the examples in the cookbook.
> > > >
> > > > It may be that the query length is actually parsed but is not passed
> > > > on to the object model by the event listeners.
> > > >
> > > > - Mark
> > > >
> > > >
> > > > On Dec 21, 2007 12:15 AM, Andreas Prlic <ap3 at sanger.ac.uk> wrote:
> > > > > Hi Michael,
> > > > >
> > > > > > The blast parser (BlastLikeSAXParser) in BioJava has been around for
> > > > > > a while and is frequently used to parse a variety
> > > > > > of different blast outputs. Still, it is not complete and cannot
> > > > > > parse PSI-BLAST. We have had a number of requests about it lately,
> > > > > > so I suppose it needs a little maintenance now.
> > > > >
> > > > > > Writing a new blast parser from scratch would take a significant
> > > > > > amount of time. It will take time to fix all the bugs, add support
> > > > > > for the different blast versions and write documentation. Much of
> > > > > > this is already available in BioJava, so I would prefer it if you
> > > > > > could submit patches for
> > > > > > the current blast parser.  Would you also be interested in
> > > > > > collaborating in this direction?
> > > > > > Another feature that would be nice to support is the
> > > > > > possibility of sending blast searches off to web services...
> > > > >
> > > > > Cheers,
> > > > > Andreas
> > > > >
> > > > >
> > > > >
> > > > > On 20 Dec 2007, at 12:54, Michael Gang wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > > I used the interface of the Java blast parser.
> > > > > > > I had two main problems with it:
> > > > > > > 1) The blast parser does not parse all the information (for example,
> > > > > > > the query length).
> > > > > > > 2) The blast parser parses the whole blast report into a list, which
> > > > > > > eats a lot of memory.
> > > > > > >
> > > > > > > I would be interested in writing and contributing a blast parser that
> > > > > > > parses all the information in the blast report and parses it
> > > > > > > iteratively.
> > > > > > > Something like the following code in BioPerl (just in Java):
> > > > > > >     use Bio::SearchIO;
> > > > > > >     # format can also be 'fasta' or 'blast'
> > > > > > >     my $searchio = Bio::SearchIO->new( -format => 'blastxml',
> > > > > > >                                        -file   => 'blastout.xml' );
> > > > > > >     while ( my $result = $searchio->next_result() ) {
> > > > > > >         while ( my $hit = $result->next_hit ) {
> > > > > > >             # process the Bio::Search::Hit::HitI object
> > > > > > >             while ( my $hsp = $hit->next_hsp ) {
> > > > > > >                 # process the Bio::Search::HSP::HSPI object
> > > > > > >             }
> > > > > > >         }
> > > > > > >     }
> > > > > >
> > > > > > > Would you be interested in such a contribution?
> > > > > >
> > > > > > Best regards,
> > > > > > Michael
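As a rough Java counterpart to the BioPerl loop above, a pull parser can walk a BLAST XML report one event at a time without holding the whole report in memory. This is only a sketch using the standard StAX API, not an existing BioJava class; the class and method names are hypothetical, and it assumes the report uses `Hit` and `Hsp` elements as in NCBI BLAST XML output. The sample input in main is a trimmed, made-up fragment without the DOCTYPE a real report carries.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical example class; streams over a report instead of loading it. */
final class HitStreamSketch {
    /** Returns {hitCount, hspCount}; only the current event is held in memory. */
    static int[] countHitsAndHsps(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
        int hits = 0;
        int hsps = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("Hit")) {
                        hits++;   // a real client would process the hit here
                    } else if (name.equals("Hsp")) {
                        hsps++;   // ... and the HSP here
                    }
                }
            }
        } finally {
            reader.close();
        }
        return new int[] {hits, hsps};
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Iteration><Iteration_hits>"
            + "<Hit><Hit_hsps><Hsp/><Hsp/></Hit_hsps></Hit>"
            + "<Hit><Hit_hsps><Hsp/></Hit_hsps></Hit>"
            + "</Iteration_hits></Iteration>";
        int[] counts = countHitsAndHsps(xml);
        System.out.println(counts[0] + " hits, " + counts[1] + " HSPs");
    }
}
```

The caller drives the loop here, as with next_result/next_hit/next_hsp, whereas a SAX-style parser pushes events to a handler; either way memory use stays flat as the report grows.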
> > > > > > _______________________________________________
> > > > > > biojava-dev mailing list
> > > > > > biojava-dev at lists.open-bio.org
> > > > > > http://lists.open-bio.org/mailman/listinfo/biojava-dev
> > > > >
> > > > > -----------------------------------------------------------------------
> > > > >
> > > > > Andreas Prlic      Wellcome Trust Sanger Institute
> > > > >                               Hinxton, Cambridge CB10 1SA, UK
> > > > >                               +44 (0) 1223 49 6891
> > > > >
> > > > > -----------------------------------------------------------------------
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >  The Wellcome Trust Sanger Institute is operated by Genome Research
> > > > >  Limited, a charity registered in England with number 1021457 and a
> > > > >  company registered in England with number 2742969, whose registered
> > > > >  office is 215 Euston Road, London, NW1 2BE.
> > > > >


