[Bioperl-l] Aligning BLAST results

Andrew McArdle ajm226 at cam.ac.uk
Mon Oct 18 07:37:04 EDT 2004


Hello,

Quick note: this is a long message. If you don't want to read it all, 
then here's a summary: what is the best program/module to use for aligning 
DNA sequences of different lengths, all derived from the same cDNA, in order 
to get the longest possible contig?

Longer version: I am currently using BioPerl to automate a laborious task. 
I have a set of 125 contigs (originally assembled from available 
Schistosome EST data some years ago). These contigs are those that were 
unidentified after BlastX homology searching. In order to improve our 
chances of getting relevant homology results, we want to extend the known 
sequence of these contigs using newer EST data.

I have managed to get a basic Perl program working, which takes each 
contig, locally blasts it against an EST file I have, then takes the best 
results and assembles them together with the contig, in the hope of getting 
a longer sequence. This longer sequence is then remotely blasted at the 
NCBI on BlastX, and the results stored.

My question is: what is the best program to use for aligning the BLAST 
results?

When I did this manually (I gave up very quickly in favour of automation!), 
I used EditSeq from DNAStar to align the sequences. Not knowing much about 
alignments, in my program I opted to use ClustalW. This has given me 
alignments, but the percentage identity values are usually very poor (circa 
50% - which I would imagine is little better than what you could get by 
chance by arranging random sequences). I then discovered that ClustalW is a 
global alignment program, and I really need a local alignment program. I 
then tried using the dpAlign module, and this gave worse results. I have 
since found something which suggests that dpAlign is only for protein, and 
I should be using pSW. Anyhow, dpAlign definitely appeared to be doing a 
global alignment, since the ends were flush.
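
For illustration, the global/local distinction described above can be seen in a minimal Smith-Waterman sketch. This is not what ClustalW, dpAlign or pSW do internally; the function name and the scoring values (match=2, mismatch=-1, gap=-2) are assumptions chosen for the example. A local alignment reports only the best-scoring shared region, so the ends of the two sequences are not forced to be flush.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for the best local alignment.

    Scoring values are illustrative assumptions, not tuned for EST data.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The floor at 0 is what makes the alignment local:
            # a poor prefix is dropped rather than carried along.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until the score reaches 0.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Two sequences that share only an internal region:
print(smith_waterman("TTTTACGTACGTCCCC", "GGACGTACGTAA"))
```

Only the shared ACGTACGT region is returned; the non-matching flanks are left out instead of being forced into the alignment, which is what drags the percentage identity down in a global alignment of partially overlapping sequences.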

I then tried using TCoffee, but have not yet been able to get good 
results.

I have considered creating the alignments from the BLAST data (since this 
gives the start/end base numbers of the alignments, and the sense), but I 
wouldn't be able to account for gaps etc.
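
As a rough sketch of the coordinate-based idea above: given the start/end positions BLAST reports for a hit, one can at least estimate how many bases an EST would add beyond each end of the contig. The function name and argument names are hypothetical, coordinates are assumed to be 1-based and inclusive, a minus-strand hit is assumed to be reported with subject start greater than subject end, and gaps inside the aligned region are ignored (which is exactly the limitation noted above).

```python
def overhang(contig_len, q_start, q_end, s_start, s_end, s_len):
    """Estimate bases an EST adds to the (left, right) of the contig.

    q_* are positions on the contig (query), s_* on the EST (subject).
    Gaps inside the hit are ignored, so this is only an approximation.
    """
    if s_start > s_end:
        # Minus-strand hit: express the coordinates on the
        # reverse-complemented EST so both sequences read the same way.
        s_start, s_end = s_len - s_start + 1, s_len - s_end + 1

    est_before = s_start - 1          # EST bases upstream of the hit
    contig_before = q_start - 1       # contig bases upstream of the hit
    est_after = s_len - s_end         # EST bases downstream of the hit
    contig_after = contig_len - q_end # contig bases downstream of the hit

    left = max(0, est_before - contig_before)
    right = max(0, est_after - contig_after)
    return left, right


# A 300-base EST whose first 100 bases match the last 100 of a 500-base contig:
print(overhang(500, 401, 500, 1, 100, 300))
```

This could be used to rank hits by how much new sequence they are likely to contribute before running a proper gapped alignment or assembly on the best candidates.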

Apologies for the long message, but can anyone suggest which program/module 
I should be using to align these BLAST results?

Thanks very much,

Andrew McArdle

