[Biojava-dev] bioperl like blastparser

Andreas Prlic ap3 at sanger.ac.uk
Thu Dec 20 16:15:31 UTC 2007


Hi Michael,

The BLAST parser (BlastLikeSAXParser) in BioJava has been around for
a while and is frequently used to parse a variety of different BLAST
outputs. Still, it is not complete and cannot parse PSI-BLAST output.
We have had a number of requests about it lately, so I suppose it
needs a little maintenance now.

Writing a new BLAST parser from scratch would take a significant
amount of time: fixing all the bugs, adding support for the different
BLAST versions, and writing documentation. Much of this is already
available in BioJava, so I would prefer it if you could submit patches
for the current BLAST parser. Would you be interested in collaborating
in this direction?
Another feature that would be nice to add is the ability to send
BLAST searches off to web services...

Cheers,
Andreas


On 20 Dec 2007, at 12:54, Michael Gang wrote:

> Hi All,
>
> I have used the interface of the Java BLAST parser.
> I had two main problems with it:
> 1) The blast parser does not parse all the information (for example
> query length)
> 2) The blast parser parses the whole blast report into a list which
> eats a lot of memory.
>
> I would be interested in writing and contributing a BLAST parser that
> parses all the information in the BLAST report and parses it
> iteratively.
> Something like the following BioPerl code (just in Java):
>     use Bio::SearchIO;
>     # format can also be 'fasta' or 'blast'
>     my $searchio = Bio::SearchIO->new( -format => 'blastxml',
>                                        -file   => 'blastout.xml' );
>     while ( my $result = $searchio->next_result ) {
>         while ( my $hit = $result->next_hit ) {
>             # process the Bio::Search::Hit::HitI object
>             while ( my $hsp = $hit->next_hsp ) {
>                 # process the Bio::Search::HSP::HSPI object
>             }
>         }
>     }
>
> Would you be interested in such a contribution?
>
> Best regards,
> Michael
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
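
For illustration, the iterative style Michael describes in the quoted
message could be sketched in Java roughly as follows. This is not part
of BioJava: the class BlastHitIterator and its nested Hit type are
hypothetical names, and the element names (Iteration_query-def,
Iteration_query-len, Hit, Hit_id, Hsp) are assumed to follow the NCBI
BLAST XML layout. It uses only the StAX API shipped with the JDK and
keeps one hit in memory at a time, including the query length.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Iterator;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical pull-style reader: yields one hit at a time instead of building a list. */
final class BlastHitIterator implements Iterator<BlastHitIterator.Hit> {

    /** Minimal value object; a real parser would also carry HSP details and statistics. */
    static final class Hit {
        final String queryDef;
        final int queryLen;
        final String hitId;
        final int hspCount;

        Hit(String queryDef, int queryLen, String hitId, int hspCount) {
            this.queryDef = queryDef;
            this.queryLen = queryLen;
            this.hitId = hitId;
            this.hspCount = hspCount;
        }
    }

    private XMLStreamReader xml;
    private String queryDef = "";
    private int queryLen = -1;   // -1 until an Iteration_query-len element is seen
    private Hit next;            // look-ahead hit, null once the input is exhausted

    BlastHitIterator(Reader in) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Assumption: we do not want to resolve the DTD that a report may reference.
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
            xml = factory.createXMLStreamReader(in);
            next = advance();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read BLAST XML", e);
        }
    }

    /** Reads events up to the end of the next Hit element; returns null at end of input. */
    private Hit advance() throws XMLStreamException {
        String hitId = null;
        int hsps = 0;
        while (xml.hasNext()) {
            int event = xml.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                switch (xml.getLocalName()) {
                    case "Iteration_query-def": queryDef = xml.getElementText(); break;
                    case "Iteration_query-len":
                        queryLen = Integer.parseInt(xml.getElementText().trim());
                        break;
                    case "Hit_id": hitId = xml.getElementText(); break;
                    case "Hsp": hsps++; break;
                    default: break;
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && xml.getLocalName().equals("Hit")) {
                return new Hit(queryDef, queryLen, hitId, hsps);
            }
        }
        return null;
    }

    @Override public boolean hasNext() { return next != null; }

    @Override public Hit next() {
        if (next == null) throw new NoSuchElementException();
        Hit current = next;
        try {
            next = advance();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed BLAST XML", e);
        }
        return current;
    }

    public static void main(String[] args) {
        // Tiny made-up report, only to exercise the iterator.
        String sample = "<BlastOutput><BlastOutput_iterations><Iteration>"
                + "<Iteration_query-def>q1</Iteration_query-def>"
                + "<Iteration_query-len>120</Iteration_query-len><Iteration_hits>"
                + "<Hit><Hit_id>hitA</Hit_id><Hit_hsps><Hsp/><Hsp/></Hit_hsps></Hit>"
                + "<Hit><Hit_id>hitB</Hit_id><Hit_hsps><Hsp/></Hit_hsps></Hit>"
                + "</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>";
        Iterator<Hit> hits = new BlastHitIterator(new StringReader(sample));
        while (hits.hasNext()) {
            Hit h = hits.next();
            System.out.println(h.queryDef + " (len " + h.queryLen + ") -> "
                    + h.hitId + ", " + h.hspCount + " HSP(s)");
        }
    }
}
```

A real implementation would presumably nest result, hit and HSP
iterators the way Bio::SearchIO does; the flat hit iterator above is
only meant to show that the report never has to be held in memory as
a whole.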

-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                               Hinxton, Cambridge CB10 1SA, UK
                               +44 (0) 1223 49 6891

-----------------------------------------------------------------------






