[BioPython] Retrieving nucleotide sequence for given accession Entrez ID

Peter biopython at maubp.freeserve.co.uk
Fri Oct 24 08:52:25 UTC 2008


Hi Richard,

I've taken the liberty of CC'ing this back to the mailing list.

Richard Clary wrote:
> Much appreciation Peter--it worked perfectly.

Good :)

> If you are wanting to
> retrieve multiple sequences, is a simple "+" string concatenation
> sufficient as the case when using eUtils or approach it by creating
> a tuple or dictionary and passing arguments?
>
> Richard

Moving on to your multi-sequence question, joining the IDs with "+"
doesn't seem to work. You should separate the IDs with commas
when calling eFetch. What made you think of "+" here?
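
To make the comma-separated form concrete, here is a minimal sketch that builds the eFetch request URL by hand using only the standard library. The helper name build_efetch_url is made up for illustration and is not part of Biopython; Bio.Entrez.efetch assembles the request for you, so this only shows what the id argument looks like.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities endpoint for eFetch
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(id_list, db="nucleotide", rettype="fasta"):
    """Hypothetical helper: build an eFetch URL for several IDs at once."""
    params = {
        "db": db,
        # The IDs go into a single comma-separated value, not "+"-joined
        "id": ",".join(id_list),
        "rettype": rettype,
    }
    # safe="," keeps the commas readable instead of encoding them as %2C
    return EFETCH_URL + "?" + urlencode(params, safe=",")


url = build_efetch_url(["186972394", "12345678"])
print(url)
```

A real request should also identify you to the NCBI (for example with an email parameter), which this sketch leaves out.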

One other tweak is that Bio.SeqIO.read(...) is for when the handle
contains one and only one record.  In general you'll need to use
Bio.SeqIO.parse(...) instead and iterate over the records.
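
The difference between the two functions can be tried offline. This is a small sketch that assumes Biopython is installed and uses an in-memory handle holding two made-up FASTA records (seq1 and seq2 are invented for illustration, not real accessions):

```python
from io import StringIO

from Bio import SeqIO

# Two invented FASTA records, standing in for a multi-record eFetch result
fasta_text = (
    ">seq1 first made-up record\n"
    "ACGT\n"
    ">seq2 second made-up record\n"
    "GGCC\n"
)

# parse() returns an iterator and copes with any number of records
records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))
ids = [record.id for record in records]
seqs = [str(record.seq) for record in records]

# read() insists on exactly one record and raises ValueError otherwise
try:
    SeqIO.read(StringIO(fasta_text), "fasta")
    read_failed = False
except ValueError:
    read_failed = True

print(ids, seqs, read_failed)
```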

Depending on what you want to achieve, maybe:

from Bio import Entrez, SeqIO

id_list = ["186972394", "12345678"]
Entrez.email = "Richard@example.com"  # Tell the NCBI who you are
# Join the IDs with commas so eFetch returns all the records in one response
handle = Entrez.efetch(db="nucleotide", id=",".join(id_list), rettype="fasta")
# This assumes the records come back in the same order as id_list
for seq_id, record in zip(id_list, SeqIO.parse(handle, "fasta")):
    assert seq_id in record.id, "Didn't get ID %s returned!" % seq_id
    print("%s = %s" % (record.id, record.seq))
    # seq_str = str(record.seq)
handle.close()

If you still want just plain strings for the sequence, maybe:

from Bio import Entrez, SeqIO

id_list = ["186972394", "12345678"]
Entrez.email = "Richard@example.com"  # Tell the NCBI who you are
handle = Entrez.efetch(db="nucleotide", id=",".join(id_list), rettype="fasta")
# Convert each record's sequence to a plain string
seq_str_list = [str(record.seq) for record in SeqIO.parse(handle, "fasta")]
handle.close()

If you haven't already done so, please read the NCBI guidelines for
using Entrez,
http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html#UserSystemRequirements

Also, have a look at the Entrez chapter in the tutorial, especially
the "history" support which may be relevant.
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

Peter


