[Bioperl-l] Fetching a DNA sequence corresponding to a protein

Sat Apr 16 09:13:51 EDT 2005

I'm Sang Chul Choi, ...

I'm trying to fetch the DNA sequence that codes for a protein from a
public database. I start with a PDB file, and I want the DNA sequence
that codes for the protein sequence in that file. Let me explain my
strategy; I'm wondering whether there is a better way to do this.

Let's say that I have the PDB file for "1g84" and I want the DNA sequence
coding for the protein. I found that the PDB file contains lines starting
with "DBREF", which say where I can track down the corresponding
sequence. The following lines are the PDB id plus the chain I want
(1g84 and chain "A", so "1g84a"), and the DBREF line from the PDB file
named pdb1g84.ent.

1g84a
DBREF  1G84 A    1   105  SWS    P01854   EPC_HUMAN      106    210

In detail, this DBREF says that the protein sequence comes from the
Swiss-Prot database ("SWS") under accession "P01854", and that residues
1 to 105 of chain A of structure 1G84 correspond to positions 106 to 210
of that protein sequence. So I used the BioPerl module
"Bio::DB::SwissProt" to get the flat file.
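
To make the DBREF step concrete, here is a minimal sketch in plain Python
(not BioPerl) that pulls the fields out of the line above. The names
DbRef and parse_dbref are my own, for illustration only. It splits on
whitespace, which I am assuming is safe only when the insertion-code
columns are blank and no fields run together, as in this record.

```python
from typing import NamedTuple


class DbRef(NamedTuple):
    """Fields of one PDB DBREF record (names are illustrative)."""
    pdb_id: str
    chain: str
    seq_begin: int   # first residue of the chain segment in the PDB entry
    seq_end: int     # last residue of the chain segment in the PDB entry
    database: str    # e.g. "SWS" for Swiss-Prot
    accession: str
    entry_name: str
    db_begin: int    # first matching position in the database sequence
    db_end: int      # last matching position in the database sequence


def parse_dbref(line: str) -> DbRef:
    # Whitespace splitting is an assumption: it only works when every
    # field is present and the insertion-code columns are empty.
    fields = line.split()
    if len(fields) != 10 or fields[0] != "DBREF":
        raise ValueError("not a simple DBREF line: %r" % line)
    _, pdb_id, chain, s_beg, s_end, db, acc, name, d_beg, d_end = fields
    return DbRef(pdb_id, chain, int(s_beg), int(s_end),
                 db, acc, name, int(d_beg), int(d_end))


ref = parse_dbref(
    "DBREF  1G84 A    1   105  SWS    P01854   EPC_HUMAN      106    210"
)
print(ref.accession, ref.db_begin, ref.db_end)
```

The accession field is what I then pass to the database lookup, and the
two database positions mark the part of that protein sequence covered by
the chain.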

In the Swiss-Prot flat file, I could easily parse the CDS feature using
the methods of the BioPerl objects. For example,

 CDS             1..574
                     /gene="IGHE"
                     /coded_by="join(L00021.1:57..495,L00022.1:98..406,
                     L00022.1:614..934,L00022.1:1021..1344,
                     L00022.1:1428..1759)"

This CDS feature tells me where I can get the DNA sequence.
So now I'm using "Bio::DB::GenBank" to fetch the DNA sequence.
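
As a rough illustration of how the /coded_by value can be used, here is a
minimal sketch in plain Python (not BioPerl) that breaks the join(...)
expression into (accession, start, end) segments and concatenates the
matching pieces of nucleotide sequence. The names parse_coded_by and
splice are hypothetical, the sketch assumes forward-strand segments only
(no complement(...)), and the sequences in the splice example are made-up
toy data, not the real L00021/L00022 records.

```python
import re

# One "ACCESSION.VERSION:start..end" segment inside a join(...) location.
SEGMENT = re.compile(r"([A-Za-z0-9_]+(?:\.\d+)?):(\d+)\.\.(\d+)")


def parse_coded_by(value):
    """Return [(accession, start, end), ...] with 1-based inclusive ends."""
    compact = "".join(value.split())  # drop line breaks and padding
    if "complement(" in compact:
        # Assumption of this sketch: reverse-strand segments are not handled.
        raise ValueError("complement() locations are not supported here")
    return [(acc, int(start), int(end))
            for acc, start, end in SEGMENT.findall(compact)]


def splice(segments, sequences):
    """Concatenate the segments, given a dict of accession -> sequence."""
    return "".join(sequences[acc][start - 1:end]
                   for acc, start, end in segments)


coded_by = """join(L00021.1:57..495,L00022.1:98..406,
                     L00022.1:614..934,L00022.1:1021..1344,
                     L00022.1:1428..1759)"""
segments = parse_coded_by(coded_by)
coding_length = sum(end - start + 1 for _, start, end in segments)

# Toy data only, to show how splice() joins the pieces.
toy = splice([("X1", 4, 6), ("X2", 1, 3)],
             {"X1": "AAACCCGGG", "X2": "TTTTGGGG"})
print(segments[0], coding_length, toy)
```

The segment lengths add up to 1725 nucleotides, which equals 3 x (574 + 1),
so the join is consistent with a 574-residue protein plus a stop codon.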

Thank you very much for reading this far.

Sincerely Yours,

Sang Chul Choi, from Raleigh, North Carolina
