[Bioperl-l] frac_* methods

Wed Oct 3 17:37:51 UTC 2007

I agree with what you said.

One of the reasons that we introduce 'BLAST-guided global alignment
(NW)' is that a significant number of clones are either of low quality,
partially sequenced, erroneously assembled, or come from a non-reference
strain.
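
A minimal sketch of the general idea, not the NCBI implementation (which
is not described in this thread): assuming a search hit has already
placed the clone on the genome, Needleman-Wunsch can be restricted to a
narrow band around the main diagonal of clone versus genome window, so
the full n x m matrix is never filled. The function name, the scores and
the band width below are all hypothetical.

```python
def banded_nw_score(a, b, band=3, match=1, mismatch=-1, gap=-2):
    """Global alignment score of a vs b, filling only cells with |i - j| <= band.

    Assumes a prior placement step has already cut `b` down to the window
    of the genome where `a` belongs, so the optimal path stays near the
    main diagonal. Linear gap penalty; scores are illustrative only.
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        raise ValueError("band too narrow to reach the final cell")
    neg_inf = float("-inf")

    # Row 0: only leading gaps in `a`, limited to the band.
    prev = {j: j * gap for j in range(0, min(m, band) + 1)}

    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - band), min(m, i + band) + 1):
            if j == 0:
                cur[j] = i * gap  # only leading gaps in `b`
                continue
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(
                prev.get(j - 1, neg_inf) + sub,  # match / mismatch
                prev.get(j, neg_inf) + gap,      # gap in b
                cur.get(j - 1, neg_inf) + gap,   # gap in a
            )
        prev = cur  # keep one row: at most 2 * band + 1 cells
    return prev[m]


clone_read = "GATTACAGGCT"     # toy clone with one extra base
genome_window = "GATTACAGCT"   # toy window chosen by the placement step
print(banded_nw_score(clone_read, genome_window, band=2))
```

Only one row of at most 2 * band + 1 cells is kept at a time, so the work
grows with n * band rather than n * m. The trade-off is that any
alignment needing to stray further than `band` from the diagonal is
missed.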

Wenwu Cui, PhD

-----Original Message-----
From: aaron.j.mackey at gsk.com [mailto:aaron.j.mackey at gsk.com] 
Sent: Wednesday, October 03, 2007 11:53 AM
To: Cui, Wenwu (NIH/NLM/NCBI) [C]
Cc: bioperl-l list; Jason Stajich; Thiago Venancio
Subject: RE: [Bioperl-l] frac_* methods

I think Wenwu makes a nice distinction here between alignment and 
placement.  BLAST is great at finding things and (thus) placing things.

String matching has a long and rich history in computer science, and we
tend to confuse the terms "alignment" and "matching".  The "align a
BAC/PAC to a genome" problem is one of string matching (with allowance
for errors due to sequencing artifacts and possible SNPs); if there were
no errors, we wouldn't use BLAST at all (and, in fact, I personally think
programs such as MUMMER, or the various genome assembly tiling
algorithms, are better for this particular problem).  The problem of
pairwise alignment can also be called matching, but the distinction (at
least to me) is that the "errors" are true evolutionary mutations, and
are expected to occur naturally (i.e. they are not an artifact of the
experiment that would not occur in an optimal world).  BLAST is good at
finding matches whose "errors" fit scoring-matrix-based evolutionary
models, but it isn't very good at teasing out the actual evolutionary
events that led to those "errors" (this is not really a criticism of
BLAST: its job is not to generate evolutionarily accurate and complete
alignments, but to identify evolutionarily conserved regions having
statistical significance).
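
To make the "placement as string matching" point concrete, here is a
small sketch under stated assumptions: exact k-mers shared between a
clone and a genome each vote for a diagonal, and the most-voted diagonal
is taken as the placement. This is a deliberately simple stand-in and
not how MUMMER or BLAST work internally; the function name, the toy
sequences and k=8 are hypothetical.

```python
from collections import Counter, defaultdict


def place_by_kmers(clone, genome, k=8):
    """Guess where `clone` sits on `genome` from exact k-mer matches.

    Returns (offset, votes): the genome coordinate where clone position 0
    would land, and how many shared k-mers support that offset.
    Returns (None, 0) if no k-mer is shared.
    """
    # Index every k-mer start position in the genome.
    index = defaultdict(list)
    for g in range(len(genome) - k + 1):
        index[genome[g:g + k]].append(g)

    # Each shared k-mer votes for one diagonal (genome pos - clone pos).
    votes = Counter()
    for c in range(len(clone) - k + 1):
        for g in index.get(clone[c:c + k], ()):
            votes[g - c] += 1

    if not votes:
        return None, 0
    return votes.most_common(1)[0]


genome = "TTGACCATGGCTAGCTAGGATCCGATCGATTACGCTAGCAATCGGCTA"
# Toy "clone": genome[10:40] with one substitution at clone position 12,
# standing in for a sequencing error or SNP.
clone = genome[10:40]
clone = clone[:12] + ("A" if clone[12] != "A" else "C") + clone[13:]

offset, support = place_by_kmers(clone, genome)
print(offset, support)
```

The substitution only removes the votes of the k-mers that overlap it, so
the placement survives. Nothing here says which bases are homologous,
which is the job of an actual alignment step.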

Please don't get me wrong, I think BLAST is an invaluable tool that fully
deserves its top-most place in the bioinformatics hall of fame.  But I
also don't believe that bioinformatics begins and ends with running a
BLAST search and poring over the report details.

-Aaron

"Cui, Wenwu (NIH/NLM/NCBI) [C]" <cuiw at ncbi.nlm.nih.gov> wrote on 
10/03/2007 10:50:47 AM:

> I agree that BLAST is not a very good alignment algorithm but believe
> there are plenty of reasons to run BLAST, especially when placing a
> contig/BAC/PAC on a genome. In those cases, a full implementation of SW
> requires an impractical n x m matrix.
> 
> Currently we are developing an algorithm which will run a global
> alignment after BLAST. Hopefully a Perl wrapper will become available
> next year.
> 
> 
> Wenwu Cui, PhD
> 
> -----Original Message-----
> From: aaron.j.mackey at gsk.com [mailto:aaron.j.mackey at gsk.com] 
> Sent: Tuesday, October 02, 2007 9:40 PM
> To: Jason Stajich
> Cc: bioperl-l list; Thiago Venancio
> Subject: Re: [Bioperl-l] frac_* methods
> 
> Let me second Jason's comment that while BLAST is a great search
> program, it is not a very good alignment algorithm.  In this day and
> age, with so many good pairwise alignment algorithms out there
> (customized for the context in which the alignment is performed),
> BLAST-based alignments should frankly be ignored.  See: exonerate,
> pairagon, etc.
> 
> Oh, and since ssearch35 (the Smith-Waterman implementation that comes
> with the FASTA package) is now vector-parallelized on most i386
> architectures, it is only about 10 times slower than BLAST for complete
> database searches (with superior sensitivity/specificity); add PVM or
> MPI-based CPU parallelization on top of that, and there's almost no
> reason to even run BLAST anymore ...
> 
> -Aaron
> 
> bioperl-l-bounces at lists.open-bio.org wrote on 10/02/2007 06:22:59 PM:
> 
> > I think my answer before was something to the tune of:
> > 
> > Use an alignment algorithm that finds a single best alignment like
> > FASTA or Smith-Waterman (SW) if what you want is a single number that
> > represents the alignment.  BLAST is great for fast searching but
> > FASTA or SW/SSEARCH are going to be better at creating an alignment.
> > Consider the -postsw option in WUBLAST as well, as it will realign the
> > HSPs with SW.
> > 
> > I personally never use the frac alignment summary stats for the Hit
> > object for this reason unless I know I am going to have a single HSP.
> > 
> > -jason
> > 
> > On Oct 2, 2007, at 2:41 PM, Thiago Venancio wrote:
> > 
> > > Hi all,
> > >
> > > This topic was discussed before, but I would like to put it on the
> > > list again; maybe someone has an update.
> > >
> > > The methods frac_identical, frac_conserved, frac_aligned_query and
> > > frac_aligned_hit can also be used in the hit context, after HSP
> > > tiling. In my view, it is better to use them only on individual
> > > HSPs, because there are some rare/strange kinds of alignments.
> > > However, we frequently need to get one measure of the whole
> > > alignment.
> > >
> > > Do any of the BioPerl masters have an update on this topic? What is
> > > the best current usage?
> > >
> > > Best.
> > >
> > > Thiago
> > >
> > > -- 
> > > "Innovation distinguishes between a leader and a follower."
> > >     Steve Jobs
> > >
> > > ========================
> > > Thiago Motta Venancio, MSc
> > > PhD student in Bioinformatics
> > > University of Sao Paulo
> > > ========================
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > 
> > --
> > Jason Stajich
> > jason at bioperl.org