[Biopython] fetching chromosome IDs given the organism ID

Fields, Christopher J cjfields at illinois.edu
Fri Nov 18 02:19:39 UTC 2011


On Nov 17, 2011, at 5:09 PM, Peter Cock wrote:

> On Thu, Nov 17, 2011 at 10:51 PM, Vladislav Petyuk <petyuk at gmail.com> wrote:
>> I am trying to fetch the chromosome IDs for a given genome.
>> For example Cyanothece sp 51142 has 2 chromosomes and 4 plasmids
>> http://www.ncbi.nlm.nih.gov/genome?term=1608%5Buid%5D#tabs-1608-2
>> The piece of Biopython code that used to work for me is:
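>> #---------------------
>> from Bio import Entrez
>>
>> # esearch returns a handle to the XML results; Entrez.read parses it
>> handle = Entrez.esearch(db="genome", term="txid43989")
>> record = Entrez.read(handle)
>> chromosomeIDs = record["IdList"]
>> #---------------------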
>> Not anymore. Now it returns the organism id, which is 1608.
> 
> That's annoying of the NCBI to change things.

...but not unusual (ref: BLAST output over the ages). 

>> Please point me in the right direction on how to get the chromosome IDs
>> given the organism ID.
> 
> Try searching the nucleotide database directly, with
> term txid43989[orgn] to restrict the species, and I think
> there is another field to restrict to complete genomes.
> Have a look at the field list with EInfo (see the Biopython
> tutorial for EInfo which explains how to do this).
> 
> I would try it myself right now, but the Entrez website
> seems very slow from here tonight.
> 
> Peter

I think they are doing a lot of work behind the scenes, particularly with efetch (something to be aware of for all of us who maintain modules that pull data from GenBank).
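
For what it's worth, here is a rough sketch of what Peter describes: list the searchable fields with EInfo, then run ESearch against the nucleotide database restricted by txid43989[orgn]. The helper names (build_term, list_search_fields, fetch_sequence_ids), the placeholder email address, and the retmax default are mine, purely for illustration. I have not pinned down which field or filter limits the hits to complete genomes, so the extra_filter argument is left for you to fill in after checking the EInfo output. Treat this as a sketch, not something verified against the current NCBI behaviour.

```python
def build_term(taxon_id, extra_filter=None):
    """Build an Entrez query restricted to one NCBI taxonomy ID.

    extra_filter is a hypothetical hook for whatever field or filter
    turns out to select complete genomes (check the EInfo field list).
    """
    term = "txid{}[orgn]".format(taxon_id)
    if extra_filter:
        term = "{} AND {}".format(term, extra_filter)
    return term


def list_search_fields(db="nucleotide", email="you@example.org"):
    """Return (Name, FullName) pairs for the searchable fields of db."""
    # Imported here so build_term() can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # placeholder address; NCBI asks for a real one
    handle = Entrez.einfo(db=db)
    record = Entrez.read(handle)
    handle.close()
    return [(f["Name"], f["FullName"]) for f in record["DbInfo"]["FieldList"]]


def fetch_sequence_ids(taxon_id, extra_filter=None,
                       email="you@example.org", retmax=100):
    """Search the nucleotide database and return the matching ID list."""
    from Bio import Entrez

    Entrez.email = email  # placeholder address; NCBI asks for a real one
    handle = Entrez.esearch(db="nucleotide",
                            term=build_term(taxon_id, extra_filter),
                            retmax=retmax)
    record = Entrez.read(handle)
    handle.close()
    return record["IdList"]
```

Calling list_search_fields() first should show which fields are available, and fetch_sequence_ids(43989) would then run the search. Without an extra filter it will probably return many more records than the six replicons.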

chris


