[Bioperl-l] From PDB to get nucleotide sequences for all related genomes?

Jason Stajich jason.stajich at duke.edu
Tue Oct 12 13:51:12 EDT 2004


Well, given the protein's accession you can get the protein record with
Bio::DB::GenPept, then parse the record looking for a CDS feature and grab
its 'coded_by' tag.  This gives you the accession number (and location) of
the CDS sequence; use Bio::DB::GenBank to get that record.
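
A minimal sketch of those steps, in case it helps.  It is untested against
live NCBI data and makes some assumptions: that the 'coded_by' value looks
like "ACCESSION.VERSION:location", and that a CDS living on a single
nucleotide record is all you need.  The helper names parse_coded_by and
fetch_cds_for_protein are made up for this example, not part of Bioperl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a 'coded_by' value into the nucleotide accession(s) it mentions
# and a plain feature-table location string with the accessions removed.
# Assumes each accession appears directly before a ':'.
sub parse_coded_by {
    my ($coded_by) = @_;
    my %seen;
    my @accs = grep { !$seen{$_}++ } $coded_by =~ /([A-Za-z0-9_.]+):/g;
    (my $location = $coded_by) =~ s/[A-Za-z0-9_.]+://g;
    return (\@accs, $location);
}

# Fetch the protein from GenPept, follow each CDS 'coded_by' tag to the
# nucleotide record in GenBank, and return [accession, CDS string] pairs.
sub fetch_cds_for_protein {
    my ($protein_acc) = @_;
    # Loaded lazily so the parsing helper works without Bioperl installed.
    require Bio::DB::GenPept;
    require Bio::DB::GenBank;
    require Bio::Factory::FTLocationFactory;
    require Bio::SeqFeature::Generic;

    my $protein = Bio::DB::GenPept->new->get_Seq_by_acc($protein_acc)
        or die "No GenPept record for $protein_acc\n";
    my $genbank = Bio::DB::GenBank->new;
    my $locfac  = Bio::Factory::FTLocationFactory->new;

    my @results;
    for my $feat (grep { $_->primary_tag eq 'CDS' } $protein->get_SeqFeatures) {
        next unless $feat->has_tag('coded_by');
        my ($coded_by) = $feat->get_tag_values('coded_by');
        my ($accs, $loc_str) = parse_coded_by($coded_by);
        next unless @$accs == 1;    # sketch only: skip multi-record CDSs
        my $acc = $accs->[0];
        my $nt  = $acc =~ /\.\d+$/
                ? $genbank->get_Seq_by_version($acc)
                : $genbank->get_Seq_by_acc($acc);
        next unless $nt;
        # Wrap the location in a feature so spliced_seq handles join()
        # and complement() for us.
        my $cds_feat = Bio::SeqFeature::Generic->new(
            -location => $locfac->from_string($loc_str));
        $cds_feat->attach_seq($nt);
        push @results, [ $acc, $cds_feat->spliced_seq->seq ];
    }
    return @results;
}

# Run as: perl get_cds.pl NP_491893
if (!caller && @ARGV) {
    for my $hit (fetch_cds_for_protein($ARGV[0])) {
        print ">$ARGV[0] coded_by $hit->[0]\n$hit->[1]\n";
    }
}
```

For the DBREF lines quoted below you would feed in the protein accession
column (e.g. NP_491893).  Please be gentle with NCBI if you loop this over
all of PDB: batch or pause between requests.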

Partial examples of stuff like this are in my tutorials
http://jason.open-bio.org/Bioperl_Tutorials/

For example, this one does some work to get the CDS for a protein based
on a swissprot ID:
http://jason.open-bio.org/Bioperl_Tutorials/Duke_2004/BioperlProjects.pdf

My recollection is that we've answered some or all of these questions on
the mailing list several times.  Something like this gets you started,
though:
http://www.google.com/search?q=site:bioperl.org+%2Bpipermail+%2Bbioperl-l+%2BCDS&ie=UTF-8&oe=UTF-8

We really need people to volunteer to help bioperl by writing up these
questions and their solutions in the FAQ or in standalone HOWTOs.
(This is hopefully the part where those who don't feel qualified as
"gurus" but want to help raise their hands...)

-jason
On Oct 12, 2004, at 10:03 AM, 최상철 wrote:

> Dear Bioperl Guru:
>
> I'm Sang Chul Choi, a graduate student in the bioinformatics program at
> NCSU. I'm interested in protein evolution modeling, and I now need to
> apply a model to all PDB entries. The problem is that I am stuck on
> getting the nucleotide sequences of all related genomes for each PDB
> entry.
>
> There is a "DBREF" section in PDB files like this:
> ./pdb1t7s.ent
> DBREF  1T7S A   74   210  GB     17507755 NP_491893       74    210
> DBREF  1T7S B   74   210  GB     17507755 NP_491893       74    210
> =====================================================
> ./pdb1t9f.ent
> DBREF  1T9F A   22   206  GB     17508635 NP_491320       22    206
> =====================================================
> ./pdb1tc3.ent
> DBREF  1TC3 A    1    21  PDB    1TC3     1TC3             1     21
> DBREF  1TC3 B  101   120  PDB    1TC3     1TC3           101    120
> DBREF  1TC3 C  202   252  GB     1086778  P34257           2     52
>
> And I know that there is a source organism section.
>
> Using these two kinds of information, I have tried to get nucleotide
> sequences from databases: NCBI, SWISSPROT, ...
>
> Are there any good suggestions for this? Any comments will be helpful.
>
> Thanks,
>
> Sang Chul
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/