[BioPython] Retrieving nucleotide sequence for given accession Entrez ID

Wed Oct 22 16:15:37 EDT 2008

On Wed, Oct 22, 2008 at 8:49 PM, Clary, Richard <rsclary at uncc.edu> wrote:
>
> Can anyone provide succinct Python function to retrieve the
> nucleotide sequence (as a string) for a given nucleotide
> accession ID?  Attempting to do this through E-Utils but
> having a difficult time figuring out the best way to do this
> without having to download a FASTA file...

Hi Richard,

Are you trying this using Biopython's Bio.Entrez, or accessing E-Utils directly?

Anyway, you'll want to use efetch (e.g. via the Bio.Entrez.efetch
function in Biopython)
http://www.ncbi.nlm.nih.gov/entrez/query/static/efetch_help.html

This documentation covers the possible return formats:
http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html

I think FASTA would be simplest (I don't see a plain or raw text
option), and has only a tiny overhead in the download size over the
raw sequence.  Getting the sequence out of a FASTA file as a string is
trivial - for example, using Biopython:

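from Bio import Entrez, SeqIO

Entrez.email = "Richard at example.com"  # Tell the NCBI who you are
# Fetch the record as FASTA text, then parse the single record it contains
handle = Entrez.efetch(db="nucleotide", id="186972394", rettype="fasta", retmode="text")
seq_str = str(SeqIO.read(handle, "fasta").seq)
handle.close()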
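
If you would rather access E-Utils directly, without Biopython, something
along these lines might do as a starting point. It is only a rough sketch:
the function names are made up for illustration, the efetch URL is the
one I believe the NCBI currently uses (an assumption, it is not one of
the links above), and it assumes the reply holds exactly one FASTA
record.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed E-Utils efetch endpoint; check the NCBI documentation before relying on it.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fasta_to_string(fasta_text):
    """Return the sequence from a single-record FASTA text as one string."""
    # Drop surrounding whitespace and blank lines.
    lines = [line.strip() for line in fasta_text.splitlines()]
    lines = [line for line in lines if line]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    body = lines[1:]
    if any(line.startswith(">") for line in body):
        raise ValueError("expected exactly one FASTA record")
    # The sequence is wrapped over several lines; join them back together.
    return "".join(body)


def fetch_nucleotide_sequence(accession, email):
    """Download one nucleotide record as FASTA and return its sequence string."""
    params = urlencode({
        "db": "nucleotide",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
        "email": email,  # Tell the NCBI who you are
    })
    with urlopen(EFETCH_URL + "?" + params) as response:
        return fasta_to_string(response.read().decode("ascii"))
```

You would then call fetch_nucleotide_sequence("186972394", email) with
your own email address. The FASTA parsing is kept in its own function so
it can be checked without touching the network.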

Peter