[Biojava-dev] bioperl like blastparser
Mark Schreiber
markjschreiber at gmail.com
Tue Dec 25 21:44:32 UTC 2007
Hi -
When will the subversion system be ready for checkin?
- Mark
On Dec 24, 2007 4:29 PM, Michael Gang <michaelgang at gmail.com> wrote:
> OK,
> I made four changes.
> In the package org.biojava.bio.program.sax, in class BlastSAXParser:
> 1) At line 86 I added the variable
> private String oQueryLength;
> 2) In the method private void interpret(String poLine) throws SAXException,
> inside the block "if (iState == IN_HEADER) {",
> at line 209 I added
>
> if (poLine.startsWith("(", 9) && poLine.endsWith("letters)")) {
>     StringTokenizer st = new StringTokenizer(poLine);
>     oQueryLength = st.nextToken().substring(1);
> }
> 3) In the method private void emitHeaderIds() throws SAXException {
> at line 564 I added
> oAttQName.setQName("queryLength");
> oAtts.addAttribute(oAttQName.getURI(),
>                    oAttQName.getLocalName(),
>                    oAttQName.getQName(),
>                    "CDATA", oQueryLength);
>
> In the package org.biojava.bio.program.ssbind, in HeaderStAXHandler.java:
> 4) In the private class QueryIDStAXHandler, at line 95, I changed the
> method startElement:
>
> public void startElement(String uri,
>                          String localName,
>                          String qName,
>                          Attributes attr,
>                          DelegationManager dm)
>     throws SAXException
> {
>     ssContext.getSearchContentHandler().setQueryID(attr.getValue("id"));
>     if (attr.getValue("queryLength") != null)
>     {
>         ssContext.getSearchContentHandler().addSearchProperty("queryLength",
>             attr.getValue("queryLength"));
>     }
> }
> }
>
> Now query length is a property of the annotation of a blast result.
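
The header check in change 2 can be tried in isolation. The following is only a minimal sketch, not the BioJava code itself: the class name QueryLengthSketch, the method name extractQueryLength, and the sample header line are assumptions made for illustration.

```java
import java.util.StringTokenizer;

public class QueryLengthSketch {

    /**
     * Returns the query length token from a header line such as
     * "         (438 letters)", or null if the line does not match.
     * Hypothetical helper; it mirrors the condition used in the patch.
     */
    public static String extractQueryLength(String line) {
        // Same test as in the patch: "(" at offset 9 and a trailing "letters)".
        if (line.startsWith("(", 9) && line.endsWith("letters)")) {
            StringTokenizer st = new StringTokenizer(line);
            // The first token is "(438"; drop the leading parenthesis.
            return st.nextToken().substring(1);
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed sample lines, not taken from a real report.
        System.out.println(extractQueryLength("         (438 letters)"));
        System.out.println(extractQueryLength("Query= sample sequence"));
    }
}
```

The value comes back as the raw token, so any formatting the report puts inside the parentheses (a thousands separator, for instance) would carry through to the queryLength property unchanged.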
> It is really fun to participate in the biojava project.
>
> Best regards,
> Michael
>
>
> On Dec 24, 2007 2:32 AM, Mark Schreiber <markjschreiber at gmail.com> wrote:
> > Hi -
> >
> > We are currently merging the code base into subversion (from CVS);
> > after this it will be possible to check in code again. For small
> > additions it is usually easier to post the code to the dev list (in
> > the body of the email as the list doesn't like attachments) or send it
> > to one of the regular committers and get them to add it.
> >
> > The JUnit tests are the standard test package. If you have added new
> > functionality it would be a good idea to add another test method in
> > the appropriate JUnit test to make sure it works (and continues to
> > work in the future).
> >
> > - Mark
> >
> >
> > On Dec 23, 2007 11:22 PM, Michael Gang <michaelgang at gmail.com> wrote:
> > > Hi all,
> > >
> > > I've now added the extraction of the query length.
> > > Can someone explain to me the procedure for checking in code to biojava?
> > > I ran the unit tests in the biojava distribution. Are there additional
> > > tests available?
> > >
> > > Best regards,
> > > Michael
> > >
> > >
> > > On Dec 21, 2007 9:59 AM, Mark Schreiber <markjschreiber at gmail.com> wrote:
> > > > Hi -
> > > >
> > > > It is not required that you turn all Blast results into objects.
> > > > Because it is an event-based parser, you can do what you want with the
> > > > events, including turning them into objects or echoing them to STDOUT.
> > > > Take a look at the examples in the cookbook.
> > > >
> > > > It may be that the query length is actually parsed but is not passed
> > > > onto the object model by the event listeners.
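
To illustrate the event-based idea in general terms, here is a rough sketch that uses only the SAX classes shipped with the JDK rather than the BioJava parser. The class name EventEchoSketch, the element name "Hit", the "id" attribute, and the sample XML are all assumptions chosen for the example. The handler echoes each event to STDOUT and keeps a counter, but builds no result objects.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EventEchoSketch {

    /** Reacts to each "Hit" start event as it arrives; nothing else is retained. */
    static class HitCounter extends DefaultHandler {
        int hits = 0;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            if ("Hit".equals(qName)) {   // hypothetical element name
                hits++;
                System.out.println("hit: " + atts.getValue("id"));
            }
        }
    }

    /** Parses the XML text and returns the number of Hit events seen. */
    public static int countHits(String xml) {
        HitCounter handler = new HitCounter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
        return handler.hits;
    }

    public static void main(String[] args) {
        // Assumed toy input, not a real BLAST report.
        String xml = "<Report><Hit id=\"a\"/><Hit id=\"b\"/></Report>";
        System.out.println(countHits(xml));
    }
}
```

Because the handler only reacts to events, memory use stays flat no matter how many hits the input contains. A listener that builds objects is just another handler plugged into the same place.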
> > > >
> > > > - Mark
> > > >
> > > >
> > > > On Dec 21, 2007 12:15 AM, Andreas Prlic <ap3 at sanger.ac.uk> wrote:
> > > > > Hi Michael,
> > > > >
> > > > > The blast parser (BlastLikeSAXParser) in BioJava has been around for
> > > > > a while and is frequently used to parse a variety
> > > > > of different blast outputs. Still, it is not complete and cannot
> > > > > parse PSI blast. We have had a number of requests about it lately,
> > > > > so I suppose it needs a little maintenance now.
> > > > >
> > > > > Writing a new blast parser from scratch would involve a significant
> > > > > amount of time. It will take time to fix all the bugs, add support
> > > > > for the different blast versions and write documentation. Much of
> > > > > this is already available in BioJava, so I would prefer it if you could
> > > > > submit patches for
> > > > > the current blast parser. Would you also be interested in
> > > > > collaborating in this direction?
> > > > > Another feature that would be nice to add support for is the
> > > > > possibility to send off blast searches to webservices...
> > > > >
> > > > > Cheers,
> > > > > Andreas
> > > > >
> > > > >
> > > > >
> > > > > On 20 Dec 2007, at 12:54, Michael Gang wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I used the interface of the java blast parser.
> > > > > > I had mainly two problems with it:
> > > > > > 1) The blast parser does not parse all the information (for example
> > > > > > query length)
> > > > > > 2) The blast parser parses the whole blast report into a list which
> > > > > > eats a lot of memory.
> > > > > >
> > > > > > I would be interested to write and contribute a blast parser which
> > > > > > parses all the information of the blast and parses the blast
> > > > > > iteratively.
> > > > > > Something like the following code in bioperl (just in Java).
> > > > > > > use Bio::SearchIO;
> > > > > > > # format can be 'fasta', 'blast', or 'blastxml'
> > > > > > > my $searchio = Bio::SearchIO->new( -format => 'blastxml',
> > > > > > >                                    -file   => 'blastout.xml' );
> > > > > > > while ( my $result = $searchio->next_result() ) {
> > > > > > >     while ( my $hit = $result->next_hit ) {
> > > > > > >         # process the Bio::Search::Hit::HitI object
> > > > > > >         while ( my $hsp = $hit->next_hsp ) {
> > > > > > >             # process the Bio::Search::HSP::HSPI object
> > > > > > >         }
> > > > > > >     }
> > > > > > > }
> > > > > >
> > > > > > Would you be interested in such a contribution ?
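
As a rough idea of what such iterative access could look like in Java, the sketch below pulls hit identifiers one at a time from BLAST XML text using the StAX reader that ships with the JDK (javax.xml.stream). It is not a BioJava API and not a port of Bio::SearchIO. The class name PullHitsSketch is hypothetical, and the Hit_id element name is assumed to match the NCBI BLAST XML output.

```java
import java.io.StringReader;
import java.util.Iterator;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Lazily yields the text of each Hit_id element; only one hit is held at a time. */
public class PullHitsSketch implements Iterator<String> {

    private final XMLStreamReader reader;
    private String nextId;

    public PullHitsSketch(String xml) {
        this.reader = open(xml);
        advance();
    }

    private static XMLStreamReader open(String xml) {
        try {
            return XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not open input", e);
        }
    }

    /** Reads forward until the next Hit_id element or the end of the document. */
    private void advance() {
        nextId = null;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "Hit_id".equals(reader.getLocalName())) {
                    nextId = reader.getElementText();
                    return;
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not read input", e);
        }
    }

    @Override
    public boolean hasNext() {
        return nextId != null;
    }

    @Override
    public String next() {
        if (nextId == null) {
            throw new NoSuchElementException();
        }
        String id = nextId;
        advance();
        return id;
    }

    public static void main(String[] args) {
        // Assumed toy input shaped like BLAST XML; not a complete report.
        String xml = "<BlastOutput><Hit><Hit_id>a</Hit_id></Hit>"
                   + "<Hit><Hit_id>b</Hit_id></Hit></BlastOutput>";
        Iterator<String> hits = new PullHitsSketch(xml);
        while (hits.hasNext()) {
            System.out.println(hits.next());
        }
    }
}
```

The caller drives the loop, much as with next_result and next_hit in the BioPerl snippet, and the parser reads only as far as the next hit before returning.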
> > > > > >
> > > > > > Best regards,
> > > > > > Michael
> > > > > > _______________________________________________
> > > > > > biojava-dev mailing list
> > > > > > biojava-dev at lists.open-bio.org
> > > > > > http://lists.open-bio.org/mailman/listinfo/biojava-dev
> > > > >
> > > > > -----------------------------------------------------------------------
> > > > >
> > > > > Andreas Prlic Wellcome Trust Sanger Institute
> > > > > Hinxton, Cambridge CB10 1SA, UK
> > > > > +44 (0) 1223 49 6891
> > > > >
> > > > > -----------------------------------------------------------------------
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > The Wellcome Trust Sanger Institute is operated by Genome Research
> > > > > Limited, a charity registered in England with number 1021457 and a
> > > > > company registered in England with number 2742969, whose registered
> > > > > office is 215 Euston Road, London, NW1 2BE.
> > > > >
> > > > >
> > > >
> > >
> >
>