[Biopython] going from protein to gene to oligos for cloning

Peter Cock p.j.a.cock at googlemail.com
Fri Dec 6 10:24:38 UTC 2013


On Fri, Dec 6, 2013 at 7:27 AM, David Shin <davidsshin at lbl.gov> wrote:
> Hi again,
>
> I'm trying to use biopython to help me grab a lot of protein sequences that
> will eventually be used as the basis for cloning. I'm almost done screening
> my protein sequences, and pretty much ok on that part...
>
> I was just curious if anyone has already developed, or has any decent
> advice on going from protein codes to getting the actual coding sequences
> of the genes.
>
> At this point, my plan is to take protein codes (i.e. the numbers in
> gi|145323746|) and use these to search the Entrez nucleotide databases
> directly to get hits (I have tested it once and it seems to work to get
> GenBank records), then try to use the information inside to get the
> nucleotide sequences. I guess the other way is to use the top hit from
> tblastn somehow?
>
> Thanks,
>
> Dave

Hi Dave,

The catch here is that protein IDs are not directly usable in the
nucleotide database. That is where ELink (Entrez Link) comes in;
it is available as the Entrez.elink(...) function in Biopython.
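
As a minimal sketch of the ELink step, assuming the link you want is
the one NCBI names "protein_nuccore", something along these lines
should map a protein GI to the IDs of linked nucleotide records. The
function names linked_ids and protein_gi_to_nucleotide_ids are made up
for illustration, and the email address is a placeholder:

```python
def linked_ids(linksets, link_name="protein_nuccore"):
    """Collect target IDs from a parsed ELink result (a list of link sets)."""
    ids = []
    for linkset in linksets:
        # Each link set may hold several named groups of links.
        for linkdb in linkset.get("LinkSetDb", []):
            if linkdb.get("LinkName") == link_name:
                ids.extend(link["Id"] for link in linkdb["Link"])
    return ids


def protein_gi_to_nucleotide_ids(protein_gi, email):
    """Ask NCBI which nucleotide records link to a protein GI (needs network)."""
    # Imported here so linked_ids() can be tried without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.elink(dbfrom="protein", db="nuccore", id=protein_gi)
    try:
        linksets = Entrez.read(handle)
    finally:
        handle.close()
    return linked_ids(linksets)
```

You would call it as protein_gi_to_nucleotide_ids("145323746",
"you@example.org"), then pass the returned IDs to Entrez.efetch(...)
with db="nuccore" to pull the GenBank records and look for the matching
CDS feature.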

I've not tried it myself, but a colleague posted a long example
on his blog which sounds close to what you are aiming for:

http://armchairbiology.blogspot.co.uk/2013/02/surely-this-has-been-done-already.html
https://github.com/widdowquinn/scripts/blob/master/bioinformatics/get_NCBI_cds_from_protein.py

Peter


