[Bioperl-l] frac_* methods

aaron.j.mackey at gsk.com aaron.j.mackey at gsk.com
Wed Oct 3 15:53:12 UTC 2007


I think Wenwu makes a nice distinction here between alignment and 
placement.  BLAST is great at finding things and (thus) placing things.

String matching has a long and rich history in computer science, and we 
tend to conflate the terms "alignment" and "matching".  The "align a 
BAC/PAC to a genome" problem is one of string matching (with allowance for 
errors due to sequencing artifacts and possible SNPs); if there were no 
errors, we wouldn't use BLAST at all (and, in fact, I personally think 
programs such as MUMMER, or the various genome assembly tiling algorithms, 
are better for this particular problem).  The problem of pairwise 
alignment can also be called matching, but the distinction (at least to 
me) is that the "errors" are true evolutionary mutations, and are expected 
to occur naturally (i.e. are not an artifact of the experiment that in an 
optimal world would not occur).  BLAST is good at finding matches whose 
"errors" fit scoring-matrix-based evolutionary models, but it isn't very 
good at teasing out the actual evolutionary events that led to those 
"errors" (this is not really a criticism of BLAST: its job is not to 
generate evolutionarily accurate and complete alignments, but to 
identify evolutionarily conserved regions having statistical significance).
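
A toy illustration of the distinction, as a minimal Python sketch.  The 
function names, the mismatch threshold and the scoring values are 
assumptions chosen for illustration only; they are not the defaults of 
BLAST, ssearch, MUMMER or any other tool.  The first function treats 
placement as ungapped string matching with a small error allowance; the 
second computes a Smith-Waterman local alignment score with a linear gap 
penalty, filling the full (n+1) x (m+1) matrix.

```python
def place(query, target, max_mismatches=1):
    """Placement as string matching with errors (hypothetical helper).

    Returns (offset, mismatches) for every ungapped position in target
    where query matches with at most max_mismatches mismatches.
    """
    hits = []
    for offset in range(len(target) - len(query) + 1):
        mismatches = 0
        for q, t in zip(query, target[offset:offset + len(query)]):
            if q != t:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many errors, give up on this offset
        else:
            hits.append((offset, mismatches))
    return hits


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local score with a linear gap penalty.

    The scoring values are illustrative assumptions.  The whole
    (len(a)+1) x (len(b)+1) matrix is kept in memory, which is what
    makes this approach costly for very long sequences.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

The placement scan only asks "where does this fit, give or take a few 
errors?", while the dynamic programming step scores substitutions and 
gaps under an explicit model, which is the part that matters when the 
differences are real mutations.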

Please don't get me wrong, I think BLAST is an invaluable tool that fully 
deserves its top-most place in the bioinformatics hall of fame.  But I 
also don't believe that bioinformatics begins and ends with running a 
BLAST search and poring over the report details.

-Aaron

"Cui, Wenwu (NIH/NLM/NCBI) [C]" <cuiw at ncbi.nlm.nih.gov> wrote on 
10/03/2007 10:50:47 AM:

> I agree that BLAST is not a very good alignment algorithm but believe
> there are plenty of reasons to run BLAST, especially when placing a
> contig/BAC/PAC on a genome. In those cases, a full implementation of SW
> requires an impractical n x m matrix. 
> 
> Currently we are developing an algorithm which will run global alignment
> after BLAST. Hopefully a Perl wrapper will become available next year.
> 
> 
> Wenwu Cui, PhD
> 
> -----Original Message-----
> From: aaron.j.mackey at gsk.com [mailto:aaron.j.mackey at gsk.com] 
> Sent: Tuesday, October 02, 2007 9:40 PM
> To: Jason Stajich
> Cc: bioperl-l list; Thiago Venancio
> Subject: Re: [Bioperl-l] frac_* methods
> 
> Let me second Jason's comment that while BLAST is a great search
> program, it is not a very good alignment algorithm.  In this day and age
> with so many good pairwise alignment algorithms out there (customized
> for the context in which the alignment is performed), BLAST-based
> alignments should frankly be ignored.  See: exonerate, pairagon, etc.
> 
> Oh, and since ssearch35 (the Smith-Waterman algorithm that comes with
> the FASTA package) is now vector-parallelized on most i386
> architectures, it is only about 10 times slower than BLAST for complete
> database searches (with superior sensitivity/specificity); add PVM or
> MPI-based CPU parallelization on top of that, and there's almost no
> reason to even run BLAST anymore ...
> 
> -Aaron
> 
> bioperl-l-bounces at lists.open-bio.org wrote on 10/02/2007 06:22:59 PM:
> 
> > I think my answer before was something to the tune of:
> > 
> > Use an alignment algorithm that finds a single best alignment like 
> > FASTA or Smith-Waterman (SW) if what you want is a single number that 
> > represents the alignment.  BLAST is great for fast searching but 
> > FASTA or SW/SSEARCH are going to be better at creating an alignment. 
> > Consider the -postsw option in WUBLAST as well, since it will realign the 
> > HSPs with SW.
> > 
> > I personally never use the frac alignment summary stats for the Hit 
> > object for this reason unless I know I am going to have a single HSP.
> > 
> > -jason
> > 
> > On Oct 2, 2007, at 2:41 PM, Thiago Venancio wrote:
> > 
> > > Hi all,
> > >
> > > This topic was discussed before, but I would like to put it on the
> > > list again; maybe someone has an update.
> > >
> > > The methods frac_identical, frac_conserved, frac_aligned_query and
> > > frac_aligned_hit can also be used in the hit context, after HSP
> > > tiling. In my view, it is better to use them only on individual
> > > HSPs, because there are some rare/strange kinds of alignments.
> > > However, we frequently need a single measure of the whole alignment.
> > >
> > > Does any of the BioPerl masters have an update on this topic? What
> > > is the current best practice?
> > >
> > > Best.
> > >
> > > Thiago
> > >
> > > -- 
> > > "Innovation distinguishes between a leader and a follower."
> > >     Steve Jobs
> > >
> > > ========================
> > > Thiago Motta Venancio, MSc
> > > PhD student in Bioinformatics
> > > University of Sao Paulo
> > > ========================
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > 
> > --
> > Jason Stajich
> > jason at bioperl.org




