[BioPython] retrieving mRNA using protein sequence

Brad Chapman chapmanb@arches.uga.edu
Sat, 13 Oct 2001 14:28:03 -0400


Hi Scott;

> I was wondering if anyone out there might have some advice on how I might
> best go about retrieving DNA sequence using the translated protein. 

I think a relatively easy way to do this is to use tblastn (protein
query against a nucleotide database translated in all six reading
frames). For instance, querying your example sequence against the nr
database gives me my guess for the identity of the original mRNA:

>gi|516327|emb|X80070.1|DMUSVAR39 D.melanogaster mRNA for Su(var)3-9 protein

Provided your sequences are in public databases, you can use
biopython code to automate BLASTing against NCBI, parsing the BLAST
report, and fetching the original sequence. This is probably only
feasible if you have a small number of sequences to identify:
tblastn is both slow (it took 5 minutes for a Saturday afternoon
blast of your query) and hard on NCBI resources.
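
The NCBI route described above could be sketched roughly as follows.
This assumes a current Biopython install and uses its Bio.Blast.NCBIWWW,
Bio.Blast.NCBIXML and Bio.Entrez modules, which are not necessarily the
interfaces that existed when this message was written (newer releases
may mark some of them as legacy). The function names, the e-value
cutoff and the "nt" database name are illustrative assumptions, not
anything taken from your script.

```python
def best_hit_accession(blast_record, max_evalue=1e-10):
    """Return the accession of the top-ranked hit, or None if it is too weak.

    Expects an object shaped like a Bio.Blast.NCBIXML record: an
    `alignments` list whose items have `accession` and `hsps`, and whose
    HSPs carry an `expect` (e-value) attribute.
    """
    if not blast_record.alignments:
        return None
    top = blast_record.alignments[0]  # BLAST reports hits best-first
    best_evalue = min(hsp.expect for hsp in top.hsps)
    return top.accession if best_evalue <= max_evalue else None


def fetch_mrna_for_protein(protein_seq, email, database="nt"):
    """tblastn a protein at NCBI and fetch the top nucleotide hit.

    Hypothetical helper: needs network access and Biopython. The top hit
    is only a guess at the source mRNA; it may be genomic DNA or an EST.
    """
    # Imported here so best_hit_accession() works without Biopython.
    from Bio import Entrez, SeqIO
    from Bio.Blast import NCBIWWW, NCBIXML

    Entrez.email = email  # NCBI asks for a contact address

    # Submit the search and parse the XML report (this can take minutes).
    result_handle = NCBIWWW.qblast("tblastn", database, protein_seq)
    try:
        blast_record = NCBIXML.read(result_handle)
    finally:
        result_handle.close()

    accession = best_hit_accession(blast_record)
    if accession is None:
        return None

    # Retrieve the matching nucleotide record as FASTA.
    fetch_handle = Entrez.efetch(
        db="nucleotide", id=accession, rettype="fasta", retmode="text"
    )
    try:
        return SeqIO.read(fetch_handle, "fasta")
    finally:
        fetch_handle.close()
```

Calling fetch_mrna_for_protein(your_protein_string, "you@example.org")
would return a SeqRecord for the best hit, or None if nothing passes the
cutoff. Please keep the number of such calls small, for the reasons
given above.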

If you have a ton of sequences to identify, the best thing to do
would be to set up a local database of potential mRNA sequences (if
you can narrow this down to something less than all of nr, this
would also save a lot of time!), and run local blast against this
database. You could use biopython to automate everything here as
well.
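
For the local route, here is a minimal sketch. It assumes the NCBI
BLAST+ command line tools (makeblastdb and tblastn) are installed and
on your PATH; these are the present-day tools, not necessarily what a
local blast install provided when this was written. The file names and
helper function names are hypothetical.

```python
import subprocess


def makeblastdb_command(fasta_path, db_name):
    """Command that builds a nucleotide BLAST database from a FASTA file."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", "nucl", "-out", db_name]


def tblastn_command(query_path, db_name, out_path, evalue="1e-10"):
    """Command that runs a protein query against the nucleotide database.

    -outfmt 6 asks for tab-separated output with the default 12 columns.
    """
    return [
        "tblastn", "-query", query_path, "-db", db_name,
        "-evalue", evalue, "-outfmt", "6", "-out", out_path,
    ]


def best_hits(tabular_text):
    """Map each query ID to its highest-scoring subject ID.

    Default outfmt 6 columns: qseqid, sseqid, pident, length, mismatch,
    gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def run_local_search(mrna_fasta, protein_fasta, db_name="mrna_db",
                     out_path="hits.tsv"):
    """Build the database, run tblastn, and return the best hit per query."""
    subprocess.run(makeblastdb_command(mrna_fasta, db_name), check=True)
    subprocess.run(tblastn_command(protein_fasta, db_name, out_path), check=True)
    with open(out_path) as handle:
        return best_hits(handle.read())
```

With all your proteins in one FASTA file, a single call to
run_local_search("candidate_mrnas.fasta", "proteins.fasta") would give
a dictionary from protein ID to the best-matching mRNA ID, which you
can then look up in the candidate file.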

Hope this helps, and thanks for the kind words about the documentation :-)

Brad
-- 
PGP public key available from http://pgp.mit.edu/