[BioPython] Retrieving nucleotide sequence for given accession Entrez ID
Peter
biopython at maubp.freeserve.co.uk
Wed Oct 22 20:15:37 UTC 2008
On Wed, Oct 22, 2008 at 8:49 PM, Clary, Richard <rsclary at uncc.edu> wrote:
>
> Can anyone provide succinct Python function to retrieve the
> nucleotide sequence (as a string) for a given nucleotide
> accession ID? Attempting to do this through E-Utils but
> having a difficult time figuring out the best way to do this
> without having to download a FASTA file...
Hi Richard,
Are you trying this using Biopython's Bio.Entrez, or accessing E-Utils directly?
Either way, you'll want to use efetch (e.g. via the Bio.Entrez.efetch
function in Biopython):
http://www.ncbi.nlm.nih.gov/entrez/query/static/efetch_help.html
This documentation covers the possible return formats:
http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html
I think FASTA would be simplest (I don't see a plain or raw text
option), and has only a tiny overhead in the download size over the
raw sequence. Getting the sequence out of a FASTA file as a string is
trivial - for example, using Biopython:
from Bio import Entrez, SeqIO

Entrez.email = "Richard@example.com"  # Tell the NCBI who you are
# Ask efetch for the record as plain-text FASTA
handle = Entrez.efetch(db="nucleotide", id="186972394", rettype="fasta", retmode="text")
seq_str = str(SeqIO.read(handle, "fasta").seq)  # expects exactly one record
handle.close()
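
If you would rather call E-Utils directly without Biopython, the same idea can be sketched with only the standard library. This is a rough sketch, not a tested client: the function names (build_efetch_url, fasta_to_string, fetch_nucleotide_string) are made up for illustration, and it assumes the efetch endpoint at eutils.ncbi.nlm.nih.gov returns a single FASTA record as plain text when given rettype=fasta and retmode=text.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed E-Utils efetch endpoint; check the NCBI documentation before relying on it.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, email):
    """Build an efetch URL asking for one nucleotide record as FASTA text."""
    params = {
        "db": "nucleotide",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
        "email": email,  # tell the NCBI who you are
    }
    return EFETCH_URL + "?" + urlencode(params)


def fasta_to_string(fasta_text):
    """Return the sequence of a single-record FASTA text as one string."""
    lines = fasta_text.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA record starting with '>'")
    body = lines[1:]
    if any(line.startswith(">") for line in body):
        raise ValueError("expected exactly one FASTA record")
    # Drop the header line and join the wrapped sequence lines.
    return "".join(line.strip() for line in body)


def fetch_nucleotide_string(accession, email):
    """Download one record from the NCBI and return its sequence (needs network access)."""
    with urlopen(build_efetch_url(accession, email)) as response:
        return fasta_to_string(response.read().decode("ascii"))
```

Something like fetch_nucleotide_string("186972394", "Richard@example.com") should then give you the sequence as a string. The parsing is kept in its own function so it can be checked on a literal FASTA string without contacting the NCBI.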
Peter