[Bioperl-l] blast and length adjustment

Jason Stajich jason.stajich at gmail.com
Fri Aug 2 07:57:06 UTC 2013


On Aug 1, 2013, at 9:01 PM, dimitark at bii.a-star.edu.sg wrote:

> Hi guys,
> I have a question about BLAST.
> 
> I was working on a project where I BLAST against human RNA using BioPerl. I found 2 sequences that hit totally different RNAs, but when I ran cd-hit-est they clustered together. I even aligned them and they were almost identical. From the NCBI aligner:
> 
> Score = 2658 bits (1439), Expect = 0.0, Identities = 1441/1442 (99%), Gaps = 0/1442 (0%), Strand = Plus/Plus
> 
> Then I decided to BLAST them on NCBI, and again they hit different sequences.
> Then I checked the parameters of each search and found that both queries were length-adjusted, i.e. some length was removed, around 30 nucleotides.
> 
> I was curious to see what BioPerl does about that, so I looked and found the following in BlastUtils.pm:
> 
>   # Adjust length based on BLAST flavor.
>    my $prog = $sbjct->algorithm;
>    if($prog eq 'TBLASTN') {
> 	$sbjct->{'_length_aln_sbjct'} /= 3;
>    } elsif($prog eq 'BLASTX' ) {
> 	$sbjct->{'_length_aln_query'} /= 3;
>    } elsif($prog eq 'TBLASTX') {
> 	$sbjct->{'_length_aln_query'} /= 3;
> 	$sbjct->{'_length_aln_sbjct'} /= 3;
>    }

You are confusing the length adjustment that happens at NCBI with this length adjustment. The code above deals with translated searches: notice that every branch divides by 3. For a translated search, the coordinates in the BLAST results are in the original DNA/RNA sequence, but when you want to know the length in alignment space, it is really on the protein scale (three nucleotides per residue).
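A minimal standalone Perl sketch of the arithmetic described above, mirroring the quoted BlastUtils.pm branch. The subroutine name `protein_scale_lengths` is hypothetical and is not part of the BioPerl API; it assumes the lengths passed in were measured from the report's coordinates (end - start + 1), so translated sides are still in nucleotides.

```perl
use strict;
use warnings;

# Hypothetical helper, not BioPerl API. Mirrors the quoted BlastUtils.pm branch:
# for translated searches, any side whose coordinates are in nucleotides
# is divided by 3 to put it on the protein (residue) scale.
sub protein_scale_lengths {
    my ($prog, $len_query, $len_sbjct) = @_;
    $prog = uc $prog;
    if ($prog eq 'TBLASTN') {          # protein query vs translated DNA subject
        $len_sbjct /= 3;
    } elsif ($prog eq 'BLASTX') {      # translated DNA query vs protein subject
        $len_query /= 3;
    } elsif ($prog eq 'TBLASTX') {     # both sides translated
        $len_query /= 3;
        $len_sbjct /= 3;
    }
    # BLASTN and BLASTP: both sides are already on the same scale.
    return ($len_query, $len_sbjct);
}

# A BLASTX HSP spanning query nucleotides 1..300 and subject residues 1..100.
my ($q, $s) = protein_scale_lengths('BLASTX', 300 - 1 + 1, 100);
print "query: $q residues, subject: $s residues\n";
```

For BLASTN, as in the searches described in this thread, both lengths pass through unchanged.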

So this is not the adjustment you seem to be looking for.
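For contrast, the "length adjustment" that NCBI BLAST reports is a statistical correction, not a coordinate conversion: it shortens the query and database lengths used to compute the effective search space for E-values, to account for edge effects (an alignment cannot extend past the end of a sequence). The Perl sketch below solves a simplified form of the defining equation, ell = ln(K (m - ell)(n - N ell)) / H, by fixed-point iteration. It is an illustration under stated assumptions, not NCBI's routine: it omits the gapped alpha/beta corrections BLAST+ applies, and `length_adjustment`, along with the K, H and database figures in the example, are hypothetical placeholders rather than NCBI's values.

```perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl or NCBI API): solve the simplified
# edge-effect equation  ell = ln(K * (m - ell) * (n - N*ell)) / H
# by fixed-point iteration. NCBI's real routine also uses gapped
# alpha/beta corrections, which are omitted here.
#   K, H : Karlin-Altschul K and relative entropy H (caller-supplied)
#   m    : query length; n : total database length; N : number of db sequences
sub length_adjustment {
    my (%p) = @_;
    my ($K, $H, $m, $n, $N) = @p{qw(K H m n N)};
    my $ell = 0;
    for (1 .. 50) {
        my $m_eff = $m - $ell;
        my $n_eff = $n - $N * $ell;
        die "adjustment exceeds sequence or database length\n"
            if $m_eff <= 0 || $n_eff <= 0;
        my $space = $K * $m_eff * $n_eff;
        # A search space of 1 or less gives no positive adjustment.
        my $next  = $space > 1 ? log($space) / $H : 0;
        my $delta = abs($next - $ell);
        $ell = $next;
        last if $delta < 1e-6;
    }
    die "adjustment exceeds sequence or database length\n"
        if $ell >= $m || $N * $ell >= $n;
    return int($ell);    # reported as a whole number of residues/bases
}

# Placeholder numbers for illustration only (not NCBI's parameters).
my %params = (K => 0.41, H => 0.85, m => 1442, n => 1_000_000, N => 500);
my $ell = length_adjustment(%params);
printf "length adjustment: %d, effective query length: %d, effective db length: %d\n",
    $ell, $params{m} - $ell, $params{n} - $params{N} * $ell;
```

In this simplified model every hit for a given query and database shares the same adjustment, so it rescales E-values (E = K m' n' e^(-lambda S), with m' and n' the effective lengths) without changing which subject has the best score.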
> 
> But there seems to be no length adjustment for BLASTN like the one that exists on NCBI.
> 
> It's kind of frustrating, as I am trying to do some differential expression analysis with my own scripts. If these 2 sequences are so nearly identical they should have the same annotation, but they do not because of those strange BLAST results.

I have no idea what you mean by the rest of this, either regarding your candidate RNA sequences or regarding what you are hoping the BLAST searches will tell you on that front.
> 
> I am really sorry if my post is a bit messy. If you have any questions about what I meant, please ask.
> 
> Any comments would be greatly appreciated!
> 
> Cheers
> D.
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org




