[BioPython] Re: back-translation method for Seq object?

Peter biopython at maubp.freeserve.co.uk
Tue Oct 21 14:45:57 UTC 2008


Bruce wrote:
>>> Another case where it would be useful is that tools like TBLASTN give
>>> protein alignments, so you must open the DNA sequence and find the DNA
>>> region based on the protein alignment.

Leighton:
>> You could use TBLASTN output - which provides start and stop coordinates
>> for the match on the subject sequence - to extract this directly, without the
>> need for backtranslation.  Example output where subject coordinates give
>> the match location below:
>>
>> >ref|NC_004547.2| Erwinia carotovora subsp. atroseptica SCRI1043, complete genome
>>          Length = 5064019
>>
>>  Score =  731 bits (1887), Expect = 0.0
>>  Identities = 363/376 (96%), Positives = 363/376 (96%)
>>  Frame = +3
>>
>> Query: 1      MFHXXXXXXXXXXXXXTISVGMMAPFTFAEAKTPGTLVEKAPLDSKNGLMEAGEQYRIQY 60
>>               MFH             TISVGMMAPFTFAEAKTPGTLVEKAPLDSKNGLMEAGEQYRIQY
>> Sbjct: 477432 MFHLPKLKQKPLALLLTISVGMMAPFTFAEAKTPGTLVEKAPLDSKNGLMEAGEQYRIQY 477611
>>
>> [...]

Bruce's reply:
> Exactly my point, where is the DNA sequence? Only if you have direct access
> to the DNA sequence can you get it. Furthermore, the DNA sequence must be
> exactly the same because any change in the coordinates screws it up.

You should have the nucleotide sequence that was searched (the subject
in a TBLASTN search), so using the coordinates given in the BLAST hit
you can recover the nucleotide region which gives this match.
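
As a rough sketch of that coordinate lookup in plain Python (no Biopython
needed): the function name extract_hit_region and the toy sequences are
made up for illustration, and the code assumes BLAST's 1-based, inclusive
coordinates, with start greater than end for hits on the reverse strand.

```python
# Complement table for reverse-strand hits (plain A/C/G/T only).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_hit_region(dna, start, end):
    """Return the nucleotides covered by a BLAST hit.

    Assumes 1-based, inclusive coordinates as printed in the BLAST
    report. If start > end the hit is taken to be on the reverse
    strand and the reverse complement is returned.
    """
    if start <= end:
        # Forward strand: shift the start to Python's 0-based index.
        return dna[start - 1:end]
    # Reverse strand: slice the forward coordinates, then reverse-complement.
    region = dna[end - 1:start]
    return region[::-1].translate(COMPLEMENT)


# Toy example: the codons ATG TTT CAT (M, F, H) sit at positions 3..11.
toy = "AAATGTTTCATGG"
print(extract_hit_region(toy, 3, 11))  # ATGTTTCAT
```

For the hit quoted above the call would use 477432 and 477611, which spans
180 bases, i.e. the 60 residues shown in the alignment. A Biopython Seq
object can be sliced in the same way.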

There is no reason to do a back-translation to try to find the matching
nucleotide sequence, which would be especially difficult in this example
due to the XXXXXX region (representing a region of low complexity which
was masked by BLAST).  Even if you tried, you could find more than one
match, and without checking the coordinates BLAST gives it would
not be clear which one gave this BLAST match.
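
To give a feel for how ambiguous a back-translation is, here is a small
sketch that counts how many DNA sequences encode a given peptide under the
standard genetic code. The function name count_back_translations is made
up, and treating X as "any of the 61 sense codons" is my assumption.

```python
from collections import Counter
from math import prod

# Standard genetic code, one letter per codon in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...); '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Number of codons that encode each amino acid.
CODON_COUNTS = Counter(AMINO_ACIDS)

# Assumption: a masked/unknown residue (X) could be any sense codon.
SENSE_CODONS = 64 - CODON_COUNTS["*"]


def count_back_translations(protein):
    """Count the DNA sequences that translate to the given peptide."""
    return prod(
        SENSE_CODONS if aa == "X" else CODON_COUNTS[aa]
        for aa in protein.upper()
    )


print(count_back_translations("MFH"))   # 4
print(count_back_translations("MFHX"))  # 244
```

Even the first three residues of the hit have four possible encodings, and
every masked X multiplies the count by 61, so a search with a
back-translated sequence cannot single out one region on its own.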

Peter


