[Biojava-dev] bioperl like blastparser

Mark Schreiber markjschreiber at gmail.com
Mon Dec 24 00:32:32 UTC 2007


Hi -

We are currently migrating the code base from CVS to Subversion.
After this it will be possible to check in code again.  For small
additions it is usually easier to post the code to the dev list (in
the body of the email as the list doesn't like attachments) or send it
to one of the regular committers and get them to add it.

The JUnit tests are the standard test package. If you have added new
functionality it would be a good idea to add another test method in
the appropriate JUnit test to make sure it works (and continues to
work in the future).
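
To make that concrete, here is a rough sketch of the kind of check
such a test method could make. It is an illustration only and not
BioJava code: it uses the JDK's own SAX API rather than
BlastLikeSaxParser, the class name QueryLengthSketch is hypothetical,
and the XML fragment and the BlastOutput_query-len element name are
assumptions about the input. In a real patch the check would sit in a
method of the appropriate JUnit test class.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical example class; not part of BioJava. */
final class QueryLengthSketch {

    /** Element assumed to carry the query length in the sample input. */
    static final String QUERY_LEN = "BlastOutput_query-len";

    /** Event handler that keeps only the query length and ignores everything else. */
    static final class QueryLengthHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private boolean inQueryLen = false;
        int queryLength = -1; // -1 means the element was never seen

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            if (QUERY_LEN.equals(qName)) {
                inQueryLen = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks, so accumulate it.
            if (inQueryLen) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (QUERY_LEN.equals(qName)) {
                queryLength = Integer.parseInt(text.toString().trim());
                inQueryLen = false;
            }
        }
    }

    /** Returns the query length found in the XML, or -1 if there is none. */
    static int queryLength(String xml) {
        QueryLengthHandler handler = new QueryLengthHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample input", e);
        }
        return handler.queryLength;
    }

    public static void main(String[] args) {
        // Made-up sample input, just large enough to exercise the handler.
        String xml = "<BlastOutput>"
                + "<BlastOutput_query-len>34</BlastOutput_query-len>"
                + "</BlastOutput>";
        System.out.println("query length = " + queryLength(xml));
    }
}
```

The handler stores a single int and builds no list of hits, so the
same pattern also shows how an event based parser can avoid holding
the whole report in memory.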

- Mark

On Dec 23, 2007 11:22 PM, Michael Gang <michaelgang at gmail.com> wrote:
> Hi all,
>
> I've now added the extraction of the query length.
> Can someone explain to me the procedure for checking in code to biojava?
> I ran the unit tests in the biojava distribution. Are there additional
> tests available?
>
> Best regards,
> Michael
>
>
> On Dec 21, 2007 9:59 AM, Mark Schreiber <markjschreiber at gmail.com> wrote:
> > Hi -
> >
> > It is not required that you turn all Blast results into objects.
> > Because it is an event-based parser, you can do what you want with the
> > events, including turning them into objects or echoing them to STDOUT.
> > Take a look at the examples in the cookbook.
> >
> > It may be that the query length is actually parsed but is not passed
> > onto the object model by the event listeners.
> >
> > - Mark
> >
> >
> > On Dec 21, 2007 12:15 AM, Andreas Prlic <ap3 at sanger.ac.uk> wrote:
> > > Hi Michael,
> > >
> > > The blast parser (BlastLikeSaxParser) in BioJava has been around for
> > > a while and is frequently being used to parse a variety
> > > > of different blast outputs. Still, it is not complete and cannot
> > > > parse PSI-BLAST. We have had a number of requests about it lately,
> > > so I suppose it needs a little maintenance now.
> > >
> > > To write a new blast parser from scratch will involve a significant
> > > amount of time. It will take time to fix all the bugs, add support
> > > for the different blast versions and write documentation. Much of
> > > > this is already available in BioJava, so I would prefer if you could
> > > > submit patches for the current blast parser.  Would you also be
> > > > interested in collaborating in this direction?
> > > Another feature that would be nice to add support for is the
> > > possibility to send off blast searches to webservices...
> > >
> > > Cheers,
> > > Andreas
> > >
> > >
> > >
> > > On 20 Dec 2007, at 12:54, Michael Gang wrote:
> > >
> > > > Hi All,
> > > >
> > > > I used the interface of the java blast parser.
> > > > I had mainly two problems with it:
> > > > 1) The blast parser does not parse all the information (for example
> > > > query length)
> > > > 2) The blast parser parses the whole blast report into a list which
> > > > eats a lot of memory.
> > > >
> > > > > I would be interested in writing and contributing a blast parser
> > > > > which parses all the information in the blast report and parses
> > > > > the report iteratively.
> > > > Something like the following code in bioperl (just in Java).
> > > > >     use strict;
> > > > >     use warnings;
> > > > >     use Bio::SearchIO;
> > > > >
> > > > >     # format can also be 'fasta' or 'blast'
> > > > >     my $searchio = Bio::SearchIO->new( -format => 'blastxml',
> > > > >                                        -file   => 'blastout.xml' );
> > > > >     # iterate over results, then hits, then HSPs
> > > > >     while ( my $result = $searchio->next_result ) {
> > > > >         while ( my $hit = $result->next_hit ) {
> > > > >             # process the Bio::Search::Hit::HitI object
> > > > >             while ( my $hsp = $hit->next_hsp ) {
> > > > >                 # process the Bio::Search::HSP::HSPI object
> > > > >             }
> > > > >         }
> > > > >     }
> > > >
> > > > Would you be interested in such a contribution ?
> > > >
> > > > Best regards,
> > > > Michael
> > > > _______________________________________________
> > > > biojava-dev mailing list
> > > > biojava-dev at lists.open-bio.org
> > > > http://lists.open-bio.org/mailman/listinfo/biojava-dev
> > >
> > > -----------------------------------------------------------------------
> > >
> > > Andreas Prlic      Wellcome Trust Sanger Institute
> > >                               Hinxton, Cambridge CB10 1SA, UK
> > >                               +44 (0) 1223 49 6891
> > >
> > > -----------------------------------------------------------------------
> > >
> > >
> > >
> > >
> > > --
> > >  The Wellcome Trust Sanger Institute is operated by Genome Research
> > >  Limited, a charity registered in England with number 1021457 and a
> > >  company registered in England with number 2742969, whose registered
> > >  office is 215 Euston Road, London, NW1 2BE.
> > >
> >


