[Bioperl-l] extracting coding sequence from BLAST

Thiago Venancio thiago.venancio at gmail.com
Thu May 3 21:12:35 UTC 2007


Hi all,

Just for the record: I am getting good results extracting CDS from
protein-vs-DNA alignments using the following procedure:

- BLASTX to identify the hits for each DNA sequence (if you want to process
the sequences for a later multiple sequence alignment, it is important to
record the frames);

- FASTX/Y to refine the alignment between the protein and the DNA. FASTX/Y
is quite good because it handles frameshifts well and allows better
identification of premature stop codons. In addition, the alignment
(and the CDS prediction) is better.
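
As a rough illustration of the first step, here is a minimal Python sketch of
pulling the hit region out of the DNA once the coordinates and frame have been
recorded. The function names, the 1-based inclusive coordinate convention, and
the slop parameter are assumptions made for this example; check them against
what your BLAST parser actually reports.

```python
# Complement table used to reverse-complement minus-strand hits.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_hit_region(dna, start, end, frame, slop=0):
    """Return the hit region, oriented on the coding strand.

    start, end: 1-based inclusive coordinates on the forward strand
                (assumed convention; either order is accepted).
    frame:      recorded BLASTX frame (+1..+3 or -1..-3); only the sign
                is used here, to decide the strand.
    slop:       extra nucleotides kept on each side, clamped to the ends
                of the sequence.
    """
    if start > end:
        start, end = end, start
    lo = max(0, start - 1 - slop)   # convert to a 0-based slice start
    hi = min(len(dna), end + slop)  # slice end is exclusive
    region = dna[lo:hi]
    return reverse_complement(region) if frame < 0 else region


if __name__ == "__main__":
    dna = "CCATGGCTTAACC"
    print(extract_hit_region(dna, 3, 11, +3))
    print(extract_hit_region(dna, 3, 11, +3, slop=1))
```

The slop is only useful when the region is handed to FASTX/Y for refinement;
with slop > 0 the region is no longer guaranteed to start in the recorded
frame.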

This is worth noting because it avoids the analysis of "phantom" mRNAs, i.e.
sequences that contain stop codons; merely looking at the BLAST output can
sometimes give misleading results.
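
The stop-codon check behind the "phantom" mRNA remark can be sketched as
follows. This is a minimal Python illustration, assuming the standard genetic
code and a CDS that is already in frame on the coding strand; translate and
internal_stops are hypothetical helper names, not BioPerl or FASTA functions.

```python
from itertools import product

# Standard genetic code, built in the usual TCAG codon order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(cds):
    """Translate an in-frame nucleotide string.

    Codons with ambiguous bases become 'X'; a trailing partial codon
    is ignored.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3)
    )


def internal_stops(cds):
    """Return 0-based codon indices of stops before the final codon."""
    protein = translate(cds)
    return [i for i, aa in enumerate(protein[:-1]) if aa == "*"]


if __name__ == "__main__":
    for cds in ("ATGGCTTAA", "ATGTAAGCTTAA"):
        print(cds, translate(cds), internal_stops(cds))
```

A non-empty list from internal_stops flags a candidate worth inspecting in
the FASTX/Y alignment before it goes into a multiple alignment.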

Best.

Thiago



On 4/13/07, Jason Stajich <jason at bioperl.org> wrote:
>
> Hi -
> There are some tools that do this for you -- I've listed a few from a
> Google search or from what I remember reading.  It would be great if you
> (and others!) were willing to contribute a little info about what you find
> works for you to the wiki.  A little HOWTO would be cool - here or on
> openwetware.org.
>
> Prot4EST http://zeldia.cap.ed.ac.uk/bioinformatics/prot4EST/index.shtml
> EST-PAC:  doi: http://dx.doi.org/10.1186/1751-0473-1-2
>
> Ewan Birney's estwise (part of the Wise package) can also help if you have
> a likely protein from BLAST that you want to align to the EST - estwise can
> handle frameshifts, but can be too slow for some people.  Exonerate's
> protein2dna model may also work here, but I haven't tried it.
>
> -jason
> On Apr 13, 2007, at 1:20 PM, Thiago Venancio wrote:
>
> Thanks Jason.
>
> I have a large dataset (assembled ESTs) and several BLASTX or TBLASTX
> comparisons, and I want to extract some translated coding regions for
> further multiple alignment and phylogenetic analysis.
>
> Best.
>
> Thiago
>
> On 4/13/07, Jason Stajich <jason at bioperl.org> wrote:
>
>
> Depends on how far away the query protein is, but I don't trust BLAST for
> the actual alignment.  Find the boundaries, add a little slop, and refine
> the alignment of protein to genome with a good alignment program designed
> for this, like genewise or exonerate or even FASTX/Y.
> -jason
> On Apr 13, 2007, at 12:05 PM, Thiago Venancio wrote:
>
> Hi all.
>
> What is the best way to extract the coding region from a nucleotide
> sequence based on BLASTX or TBLASTX comparisons?
>
> Thanks in advance.
>
> Thiago
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/


-- 
"The way to get started is to quit talking and begin doing."
      Walt Disney

========================
Thiago Motta Venancio, MSc
PhD student in Bioinformatics
University of Sao Paulo
========================


