[Biojava-dev] Fwd: [blast-announce] New Version of BLAST XML output
Jose Manuel Duarte
jose.duarte at psi.ch
Mon May 11 09:53:18 UTC 2015
To be honest, I don't really know gmap, because I'm not into
sequencing data at all. My use case is protein database search, which
is why something like lambda, which can handle both DNA and protein,
is so important for me. If I understand correctly, gmap wouldn't be
able to do protein searches, would it?
In the lambda publication
(http://bioinformatics.oxfordjournals.org/content/30/17/i349.full) the
authors compare it to a few other methods (blast, pauda, rapsearch2,
ublast), but not to gmap.
SANSparallel is another new method that is apparently also very fast.
In their publication
(http://nar.oxfordjournals.org/content/early/2015/04/08/nar.gkv317.long)
the authors compare it to some other methods, but again not to gmap.
Jose
On 11.05.2015 11:30, Erik McKee wrote:
>
> How does gmap compare to these?
>
> On May 11, 2015 5:26 AM, "Jose Manuel Duarte" <jose.duarte at psi.ch> wrote:
>
> Just one more comment regarding alternatives to blast. Recently
> I came across one that is not as sensitive as blast but a lot
> faster. It's called lambda:
>
> http://www.seqan.de/projects/lambda/
>
> I've tried it out and I'm very impressed with the results: it can
> do full UniRef100 searches in a fraction of a second. There are
> still some issues to iron out, especially in the indexing, which is
> very memory and disk hungry. But all in all it does seem to be a
> real alternative to blast.
>
> Their output is blast compatible: they can do either classic
> pairwise output (-m 0) or tabular output (-m 8). No XML output yet
> though.
>
> So this supports the case for having some kind of framework that
> can deal with the results of a sequence homology search. The
> actual parsers would then be implemented on a per-case basis.
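
To make the idea concrete, here is a minimal sketch in Java of what such a framework could look like: one program-independent Hsp class, one parser interface, and one implementation for the 12-column tabular layout (-m 8). All class and method names are hypothetical and are not BioJava API. I am also assuming the standard column order of the tabular output (query id, subject id, % identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score) and that lambda's tabular output follows it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout: none of these types exist in BioJava.

/** One high-scoring segment pair, independent of the program that produced it. */
final class Hsp {
    final String queryId;
    final String hitId;
    final double percentIdentity;
    final int alignmentLength;
    final double evalue;
    final double bitScore;

    Hsp(String queryId, String hitId, double percentIdentity,
        int alignmentLength, double evalue, double bitScore) {
        this.queryId = queryId;
        this.hitId = hitId;
        this.percentIdentity = percentIdentity;
        this.alignmentLength = alignmentLength;
        this.evalue = evalue;
        this.bitScore = bitScore;
    }
}

/** One implementation per output format; callers only ever see Hsp objects. */
interface SearchResultParser {
    List<Hsp> parse(List<String> lines);
}

/** Parser for the 12-column tabular layout (assumed column order, see above). */
final class TabularParser implements SearchResultParser {
    @Override
    public List<Hsp> parse(List<String> lines) {
        List<Hsp> hsps = new ArrayList<>();
        for (String line : lines) {
            // Skip blank lines and the '#' comment lines some tabular variants emit.
            if (line.trim().isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] f = line.split("\t");
            if (f.length < 12) {
                throw new IllegalArgumentException(
                        "Expected 12 tab-separated fields: " + line);
            }
            hsps.add(new Hsp(
                    f[0].trim(),                       // query id
                    f[1].trim(),                       // subject (hit) id
                    Double.parseDouble(f[2].trim()),   // % identity
                    Integer.parseInt(f[3].trim()),     // alignment length
                    Double.parseDouble(f[10].trim()),  // e-value
                    Double.parseDouble(f[11].trim())));// bit score
        }
        return hsps;
    }
}
```

Supporting another program or format (XML, pairwise text, SANSparallel, ...) would then mean adding one more class that implements SearchResultParser, while code that consumes the Hsp list stays unchanged.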
>
> Jose
>
>
>
> On 10.05.2015 14:04, Paolo Pavan wrote:
>> Hello!
>> I obviously share the opinion of Peter and Jose. Moreover, as
>> already written, I have used this new feature in a second work
>> that I could also describe and submit to biojava, if of any interest.
>>
>> About Andreas' questions:
>> "Does your module support psiblast, rpsblast, tblastx and blast+
>> and what versions?": At present it supports blastn, blastp,
>> blastx, tblastn and tblastx, version 2.2.29. I'm not sure
>> about psiblast and rpsblast; I would have to test them.
>> But it has been designed so that updating a single parser (or
>> adding a new search program while staying within the designed
>> framework) requires writing just a single class. This will keep
>> the code simple and neat, which in my opinion is very important
>> for future developers.
>>
>> "the disadvantage is that you constantly need to update them to
>> the variant of blast plus version of the output file format":
>> unfortunately this is a problem that every one of us has to face
>> if we want to use new NCBI programs. It happened for legacy BLAST,
>> it happened many times for the GenBank format, and it is happening
>> for BLAST+. I just hope they will be kind enough to state the
>> format version explicitly inside the XML, or else to name the
>> program itself differently, for example blast3 or blast++, to
>> avoid confusion. We can't do anything about that; we can only try
>> to make things simple and easy to reuse.
>>
>> Just to express my opinion, I think that every bio project should
>> first of all address these "base level" problems, before others,
>> so that developers can focus on higher-level abstractions.
>> I'm sure that this will be appreciated by the community,
>> increasing the user base of biojava.
>>
>> Paolo
>>
>> 2015-05-06 12:15 GMT+02:00 Jose Manuel Duarte <jose.duarte at psi.ch>:
>>
>> I'd say that having some common data structure to model the
>> output of a sequence homology search would be beneficial.
>> For instance, a blast alternative might appear one day (I'm
>> eagerly awaiting it!). The common data structure should
>> be able to model the output of any of the different programs.
>>
>> There are already some alternatives to blast:
>>
>> SANS and SANSparallel by Liisa Holm
>> (http://www.ncbi.nlm.nih.gov/pubmed/22962464,
>> http://nar.oxfordjournals.org/content/early/2015/04/08/nar.gkv317.full)
>> USEARCH (commercial) (http://drive5.com/usearch/)
>> BLAT (https://genome.ucsc.edu/FAQ/FAQblat.html#blat3)
>>
>> In fact SANSparallel looks very promising: it's incredibly
>> fast, though less sensitive than blast.
>>
>> Cheers
>>
>> Jose
>>
>>
>>
>>
>> On 06.05.2015 10:47, Peter Cock wrote:
>>
>> On Wed, May 6, 2015 at 6:02 AM, Andreas Prlic
>> <andreas at sdsc.edu> wrote:
>>
>> On Tue, May 5, 2015 at 1:18 PM, Paolo Pavan
>> <paolo.pavan at gmail.com> wrote:
>>
>> As seen in other Bio* projects, alongside the Sequence IO and
>> Alignment IO procedures there could also be a Search result IO.
>>
>> I never understood why other Bio* projects have
>> special Blast modules.
>> Perhaps XML parsing is not as easy as it is in Java?
>> Please see the code at
>> the bottom of this message.
>>
>> Python at least has a range of XML parsing libraries
>> which are up to the
>> task. However, as Paolo wrote:
>>
>> The advantage is to define common data structures that model
>> Hsps, Hits and Results without caring about (i.e. abstracting
>> away) the underlying search program.
>>
>> This is the big advantage of BioPerl and Biopython's
>> SearchIO module.
>> You can at least in theory switch between parsing BLAST
>> XML, BLAST
>> tabular, BLAST plain text (shudder), or another related
>> format without
>> major changes to your code.
>>
>> and the disadvantage is that you constantly need to
>> update them to the
>> variant of blast plus version of the output file format.
>>
>> I think it is much better to have this housekeeping done once,
>> centrally, in a Bio* library than re-invented by everyone parsing
>> the BLAST output.
>> However, the NCBI BLAST XML output has been fairly
>> stable, and the
>> new output has a formal schema so should be even more
>> dependable.
>>
>> Peter
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at mailman.open-bio.org
>> <mailto:biojava-dev at mailman.open-bio.org>
>> http://mailman.open-bio.org/mailman/listinfo/biojava-dev