[Biojava-dev] Fwd: [blast-announce] New Version of BLAST XML output

Jose Manuel Duarte jose.duarte at psi.ch
Mon May 11 09:53:18 UTC 2015


To be honest I don't really know gmap, because I'm not into 
sequencing data at all. My use case is protein database search, which is 
why something like lambda, which can do both DNA and protein, is so 
important for me. If I understand correctly, gmap wouldn't be able to do 
protein searches, would it?

In the lambda publication 
(http://bioinformatics.oxfordjournals.org/content/30/17/i349.full) the 
authors compare it to a few other methods (blast, pauda, rapsearch2, 
ublast), but not to gmap.

SANSparallel is another new method which is apparently also very fast. 
In their publication 
(http://nar.oxfordjournals.org/content/early/2015/04/08/nar.gkv317.long) 
they compare it to some others, but again not to gmap.

Jose



On 11.05.2015 11:30, Erik McKee wrote:
>
> How does gmap compare to these?
>
> On May 11, 2015 5:26 AM, "Jose Manuel Duarte" <jose.duarte at psi.ch> wrote:
>
>     Just one more comment regarding alternatives to blast. Recently
>     I've come across one that is not as sensitive as blast but a lot
>     faster. It's called lambda:
>
>     http://www.seqan.de/projects/lambda/
>
>     I've tried it out and I'm very impressed with the results: it can
>     do full UniRef100 searches in a fraction of a second. There are still
>     some issues to iron out, especially in the indexing, which is very
>     memory and disk hungry. But all in all it does seem to be a real
>     alternative to blast.
>
>     Their output is blast compatible: they can do either classic
>     pairwise output (-m 0) or tabular output (-m 8). No XML output yet
>     though.
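For anyone wanting to try the tabular output, here is a minimal sketch of reading one line of it in Java. It assumes the usual 12 tab-separated columns of BLAST tabular output (query id, subject id, % identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score). The class name TabularHit and its fields are made up for illustration and are not part of BioJava or lambda.

```java
/** One row of BLAST-style tabular output. Hypothetical class, not a BioJava API. */
final class TabularHit {
    final String queryId;
    final String subjectId;
    final double percentIdentity;
    final int alignmentLength;
    final double evalue;
    final double bitScore;

    TabularHit(String queryId, String subjectId, double percentIdentity,
               int alignmentLength, double evalue, double bitScore) {
        this.queryId = queryId;
        this.subjectId = subjectId;
        this.percentIdentity = percentIdentity;
        this.alignmentLength = alignmentLength;
        this.evalue = evalue;
        this.bitScore = bitScore;
    }

    /** Parses one tab-separated line; returns null for blank and comment lines. */
    static TabularHit parseLine(String line) {
        if (line == null) {
            return null;
        }
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return null;
        }
        String[] f = trimmed.split("\t");
        if (f.length < 12) {
            throw new IllegalArgumentException(
                "Expected 12 tab-separated columns, got " + f.length);
        }
        // Columns 4..9 (mismatches, gaps, coordinates) are skipped in this sketch.
        return new TabularHit(
            f[0].trim(),
            f[1].trim(),
            Double.parseDouble(f[2].trim()),
            Integer.parseInt(f[3].trim()),
            Double.parseDouble(f[10].trim()),
            Double.parseDouble(f[11].trim()));
    }

    public static void main(String[] args) {
        // Invented example row; the IDs and numbers are not real search results.
        String row = "query1\tsubjectA\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-100\t380";
        TabularHit hit = parseLine(row);
        System.out.println(hit.queryId + " -> " + hit.subjectId
            + " (e-value " + hit.evalue + ")");
    }
}
```

The same parseLine call should work on output from any tool that writes this column layout, which is the appeal of a blast-compatible format.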
>
>     So this would support the case for having some kind of framework that
>     can deal with the results of a sequence homology search. The
>     actual parsers would then be implemented on a per-case basis.
>
>     Jose
>
>
>
>     On 10.05.2015 14:04, Paolo Pavan wrote:
>>     Hello!
>>     I obviously share the opinion of Peter and Jose. Moreover, as
>>     already written, I have used this new feature in a second project
>>     that I could also describe and submit to biojava, if it is of any interest.
>>
>>     About Andreas' questions:
>>     "Does your module support psiblast, rpsblast, tblastx and blast+
>>     and what versions?": At present, it supports blastn, blastp,
>>     blastx, tblastn and tblastx, version 2.2.29. I'm not sure
>>     about psiblast and rpsblast; I should test them.
>>     But it has been designed so that updating a single parser (or
>>     adding a new search program while staying within the
>>     designed framework) requires writing just a single
>>     class. This will keep the code simple and neat, which in my
>>     opinion is very important for future developers.
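To make the "one class per parser" idea concrete, here is a rough sketch of how such a plug-in point could look. This is not Paolo's actual code: the interface SearchOutputParser, the ParserRegistry, and both parser classes are hypothetical names, and the one-ID-per-line format is a toy invented to stand in for a new search program.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Demo entry point; every type in this sketch is hypothetical. */
final class ParserRegistryDemo {
    public static void main(String[] args) {
        ParserRegistry registry = new ParserRegistry();
        registry.register(new TabularIdParser());
        registry.register(new OneIdPerLineParser());

        List<String> tabular = Arrays.asList("q1\ts1\t98.5", "q1\ts2\t75.0");
        // Calling code only names the format; it never touches a concrete parser.
        System.out.println(registry.forFormat("tabular").parseHitIds(tabular));
    }
}

/** The single extension point: one implementing class per output format. */
interface SearchOutputParser {
    String formatName();
    List<String> parseHitIds(List<String> lines);
}

/** Reads the subject ID from the second tab-separated column. */
final class TabularIdParser implements SearchOutputParser {
    public String formatName() { return "tabular"; }

    public List<String> parseHitIds(List<String> lines) {
        List<String> ids = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty() || line.startsWith("#")) {
                continue; // skip blank and comment lines
            }
            String[] fields = line.split("\t");
            if (fields.length < 2) {
                throw new IllegalArgumentException("Too few columns: " + line);
            }
            ids.add(fields[1]);
        }
        return ids;
    }
}

/** Toy format with one hit ID per line; supporting it took only this class. */
final class OneIdPerLineParser implements SearchOutputParser {
    public String formatName() { return "ids"; }

    public List<String> parseHitIds(List<String> lines) {
        List<String> ids = new ArrayList<>();
        for (String line : lines) {
            String id = line.trim();
            if (!id.isEmpty()) {
                ids.add(id);
            }
        }
        return ids;
    }
}

/** Maps a format name to the parser registered for it. */
final class ParserRegistry {
    private final Map<String, SearchOutputParser> parsers = new LinkedHashMap<>();

    void register(SearchOutputParser parser) {
        parsers.put(parser.formatName(), parser);
    }

    SearchOutputParser forFormat(String name) {
        SearchOutputParser parser = parsers.get(name);
        if (parser == null) {
            throw new IllegalArgumentException("No parser registered for format: " + name);
        }
        return parser;
    }
}
```

With this shape, a format change in a new blast+ release would only touch the one class that reads that format.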
>>
>>     "the disadvantage is that you constantly need to update them to
>>     the variant of blast plus version of the output file format":
>>     this unfortunately is a problem that every one of us has to face
>>     if we want to use new NCBI programs. It happened with legacy blast,
>>     it happened many times with the GenBank format, and it is happening
>>     with blast+. I just hope they will be kind enough to state
>>     the format version explicitly inside the XML, or else to name the
>>     program itself differently, for example blast3 or blast++, to
>>     avoid confusion. We can't do anything about that; we can just try
>>     to make things simple and easy to reuse.
>>
>>     Just to express my opinion, I think that every bio project should
>>     first of all address these "base level" problems, before others,
>>     to allow developers to focus on higher-level abstractions.
>>     I'm sure that this will be appreciated by the community,
>>     increasing the user base of biojava.
>>
>>     Paolo
>>
>>     2015-05-06 12:15 GMT+02:00 Jose Manuel Duarte <jose.duarte at psi.ch>:
>>
>>         I'd say that having some common data structure to model the
>>         output of a sequence homology search would be beneficial.
>>         For instance, a blast alternative might appear one day (I'm
>>         eagerly awaiting it!). The common data structure should
>>         be able to model the output of any of the different programs.
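As a rough idea of what such a common data structure might look like, here is a minimal sketch in Java: a result holds hits, and each hit holds one or more HSPs. The class names SearchResult, Hit and Hsp and all their fields are assumptions for illustration, not an existing BioJava model, and a real version would need more fields (aligned sequences, identities and so on).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Result of one query against a database, independent of the search program. */
final class SearchResult {
    final String program;   // free-text label such as "blastp" or "lambda"
    final String queryId;
    final List<Hit> hits;

    SearchResult(String program, String queryId, List<Hit> hits) {
        this.program = program;
        this.queryId = queryId;
        this.hits = Collections.unmodifiableList(new ArrayList<>(hits));
    }

    /** Returns the hits whose best HSP has an e-value at or below the cutoff. */
    List<Hit> hitsBelow(double maxEvalue) {
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : hits) {
            if (hit.bestEvalue() <= maxEvalue) {
                kept.add(hit);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Invented numbers, only to show how the pieces fit together.
        Hit strong = new Hit("subjectA", Arrays.asList(new Hsp(1, 100, 5, 104, 1e-30, 200.0)));
        Hit weak = new Hit("subjectB", Arrays.asList(new Hsp(10, 60, 20, 70, 0.5, 25.0)));
        SearchResult result = new SearchResult("lambda", "query1", Arrays.asList(strong, weak));
        for (Hit hit : result.hitsBelow(1e-3)) {
            System.out.println(hit.id + " " + hit.bestEvalue());
        }
    }
}

/** One database sequence matched by the query, with its local alignments. */
final class Hit {
    final String id;
    final List<Hsp> hsps;

    Hit(String id, List<Hsp> hsps) {
        this.id = id;
        this.hsps = Collections.unmodifiableList(new ArrayList<>(hsps));
    }

    /** Smallest e-value among the HSPs, or positive infinity if there are none. */
    double bestEvalue() {
        double best = Double.POSITIVE_INFINITY;
        for (Hsp hsp : hsps) {
            best = Math.min(best, hsp.evalue);
        }
        return best;
    }
}

/** One high-scoring segment pair: a single local alignment with its scores. */
final class Hsp {
    final int queryStart;
    final int queryEnd;
    final int hitStart;
    final int hitEnd;
    final double evalue;
    final double bitScore;

    Hsp(int queryStart, int queryEnd, int hitStart, int hitEnd,
        double evalue, double bitScore) {
        this.queryStart = queryStart;
        this.queryEnd = queryEnd;
        this.hitStart = hitStart;
        this.hitEnd = hitEnd;
        this.evalue = evalue;
        this.bitScore = bitScore;
    }
}
```

Nothing in these classes mentions a file format, so code that filters or ranks hits would not change when the search program does.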
>>
>>         There are already some alternatives to blast:
>>
>>         SANS and SANSparallel by Liisa Holm
>>         (http://www.ncbi.nlm.nih.gov/pubmed/22962464,
>>         http://nar.oxfordjournals.org/content/early/2015/04/08/nar.gkv317.full)
>>         USEARCH (commercial) (http://drive5.com/usearch/)
>>         BLAT (https://genome.ucsc.edu/FAQ/FAQblat.html#blat3)
>>
>>         In fact SANSparallel looks very promising, it's incredibly
>>         fast though less sensitive than blast.
>>
>>         Cheers
>>
>>         Jose
>>
>>
>>
>>
>>         On 06.05.2015 10:47, Peter Cock wrote:
>>
>>             On Wed, May 6, 2015 at 6:02 AM, Andreas Prlic
>>             <andreas at sdsc.edu> wrote:
>>
>>                 On Tue, May 5, 2015 at 1:18 PM, Paolo Pavan
>>                 <paolo.pavan at gmail.com> wrote:
>>
>>                     As seen in other Bio projects, alongside the
>>                     Sequence IO and Alignment IO
>>                     procedures there could also be a Search result IO.
>>
>>                 I never understood why other Bio* projects have
>>                 special Blast modules.
>>                 Perhaps XML parsing is not as easy as it is in Java?
>>                 Please see the code at
>>                 the bottom of this message.
>>
>>             Python at least has a range of XML parsing libraries
>>             which are up to the
>>             task. However, as Paolo wrote:
>>
>>                     The advantage is to define common data structures
>>                     that model Hsps, Hits and
>>                     Results without regard to (i.e. abstracting
>>                     away) the underlying
>>                     search program.
>>
>>             This is the big advantage of BioPerl and Biopython's
>>             SearchIO module.
>>             You can at least in theory switch between parsing BLAST
>>             XML, BLAST
>>             tabular, BLAST plain text (shudder), or another related
>>             format without
>>             major changes to your code.
>>
>>                 and the disadvantage is that you constantly need to
>>                 update them to the
>>                 variant of blast plus version of the output file format.
>>
>>             I think it is much better to have this housekeeping done
>>             once centrally in
>>             a Bio* library than re-invented by everyone parsing
>>             BLAST output.
>>             However, the NCBI BLAST XML output has been fairly
>>             stable, and the
>>             new output has a formal schema so should be even more
>>             dependable.
>>
>>             Peter
>>             _______________________________________________
>>             biojava-dev mailing list
>>             biojava-dev at mailman.open-bio.org
>>             <mailto:biojava-dev at mailman.open-bio.org>
>>             http://mailman.open-bio.org/mailman/listinfo/biojava-dev
>>
>>
>>
>>
>
>
>
