[BioPython] Writing a biopython script to download all Genbank records from Nucleotide database

Tue Nov 13 04:15:57 EST 2007

Christof Winter wrote:
> I used the code below to retrieve some entries from the Nucleotide database. 
> Since two entries already take a few seconds, it is probably a bad idea to 
> download _all_ entries in that way.
> 
> You might be better off downloading the data first:
> ftp://ftp.ncbi.nih.gov/genbank/

I would agree 100%.  Another benefit is you can script an FTP download 
(e.g. using wget which can cope with an interrupted internet connection 
nicely).
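
If you would rather stay in Python than call wget, a resumable download can be roughed out with the standard library's ftplib. This is only a sketch under assumptions: the helper names are made up for illustration, the file name is a placeholder you must supply yourself, and it assumes the server accepts anonymous logins and honours the FTP REST command.

```python
import os
from ftplib import FTP


def resume_offset(local_path):
    """Return the number of bytes already on disk (0 if the file is absent)."""
    return os.path.getsize(local_path) if os.path.exists(local_path) else 0


def download_with_resume(host, remote_dir, filename, local_path):
    """Fetch one file, continuing from a partial download if one exists.

    Hypothetical helper: assumes anonymous login works and that the
    server supports restarting a transfer at a byte offset (REST).
    """
    offset = resume_offset(local_path)
    with FTP(host) as ftp:
        ftp.login()  # anonymous login
        ftp.cwd(remote_dir)
        # Append to whatever is already there and ask the server to skip
        # the bytes we have; rest=None means start from the beginning.
        with open(local_path, "ab") as out:
            ftp.retrbinary("RETR " + filename, out.write, rest=offset or None)
    return offset


# Example call (the file name is a placeholder, not a real listing):
# download_with_resume("ftp.ncbi.nih.gov", "/genbank", "SOME_FILE.seq.gz",
#                      "SOME_FILE.seq.gz")
```

Running the same call again after a dropped connection picks up where the partial file ends, which is roughly what wget's continue option gives you.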

> from Bio import GenBank
> 
> featureParser = GenBank.FeatureParser()
> ncbiDict = GenBank.NCBIDictionary("nucleotide", "genbank", parser=featureParser)
> ...

Note that Bio.GenBank.NCBIDictionary won't work in Biopython 1.44, but
it's been fixed again in CVS - see bug 2393.

http://bugzilla.open-bio.org/show_bug.cgi?id=2393

> accessionNumbers = ["BC063166", "NM_028459"]
> 
> for accessionNo in accessionNumbers:
>      giList = GenBank.search_for(accessionNo)
>      for gi in giList:
>          record = ncbiDict[gi]   # parsing happens here
>          ...

I expect you can ask the NCBI for records by accession directly, rather 
than doing a search to get the GI number.
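
As a rough illustration of asking by accession, the sketch below builds a request for the NCBI E-utilities efetch service using only the standard library. This is an assumption on my part rather than what Bio.GenBank does internally: the function names are hypothetical, and the parameter choices (rettype "gb", retmode "text") are what I would expect to return GenBank flat-file text.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for the NCBI E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accessions, db="nucleotide"):
    """Build one efetch URL covering several accession numbers."""
    params = {
        "db": db,
        "id": ",".join(accessions),  # accessions passed directly, no GI lookup
        "rettype": "gb",             # GenBank flat-file format
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


def fetch_genbank_text(accessions):
    """Download the records as plain text (needs network access)."""
    with urlopen(build_efetch_url(accessions)) as handle:
        return handle.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URL; call fetch_genbank_text() to actually download.
    print(build_efetch_url(["BC063166", "NM_028459"]))
```

Both accessions go out in a single request, so the per-accession search step is skipped. This is still only sensible for a modest number of records; for everything, the FTP route is the better choice.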

Peter