[BioPython] Writing a biopython script to download all Genbank records from Nucleotide database
Peter
biopython at maubp.freeserve.co.uk
Tue Nov 13 04:15:57 EST 2007
Christof Winter wrote:
> I used the code below to retrieve some entries from the Nucleotide database.
> Since two entries already take a few seconds, it is probably a bad idea to
> download _all_ entries in that way.
>
> You might be better off downloading the data first:
> ftp://ftp.ncbi.nih.gov/genbank/
I agree 100%. Another benefit is that you can script an FTP download
(e.g. using wget, which copes nicely with an interrupted internet
connection).
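If you would rather stay in Python than call wget, here is a minimal
sketch of a resumable download using the standard library's ftplib. It
assumes the server allows anonymous login and honours the FTP REST
command. The helper names (local_size, download_with_resume) are made up
for illustration, and the file name in the usage comment is a
placeholder, not a real GenBank file.

```python
import os
from ftplib import FTP


def local_size(path):
    """Return the number of bytes already on disk, or 0 if there is no file yet."""
    return os.path.getsize(path) if os.path.exists(path) else 0


def download_with_resume(host, remote_dir, filename, dest_dir="."):
    """Fetch one file over anonymous FTP, continuing a partial download if present.

    Assumption: the server supports REST, so the transfer can restart at an offset.
    """
    local_path = os.path.join(dest_dir, filename)
    offset = local_size(local_path)
    with FTP(host) as ftp:
        ftp.login()  # anonymous login
        ftp.cwd(remote_dir)
        # Append to whatever is already there and ask the server to skip it.
        with open(local_path, "ab") as out:
            ftp.retrbinary("RETR " + filename, out.write, rest=offset or None)
    return local_path


# Usage (placeholder file name; performs a real network transfer):
# download_with_resume("ftp.ncbi.nih.gov", "/genbank", "SOME_FILE.seq.gz")
```

Rerunning the same call after a dropped connection picks up where the
partial file ends. It does not check whether the local file is already
complete.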
> from Bio import GenBank
>
> featureParser = GenBank.FeatureParser()
> ncbiDict = GenBank.NCBIDictionary("nucleotide", "genbank", parser=featureParser)
> ...
Note that Bio.GenBank.NCBIDictionary won't work in Biopython 1.44, but
it's been fixed again in CVS - see bug 2393.
http://bugzilla.open-bio.org/show_bug.cgi?id=2393
> accessionNumbers = ["BC063166", "NM_028459"]
>
> for accessionNo in accessionNumbers:
>     giList = GenBank.search_for(accessionNo)
>     for gi in giList:
>         record = ncbiDict[gi]  # parsing happens here
>         ...
I expect you can ask the NCBI for records by accession directly, rather
than doing a search to get the GI number.
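A minimal sketch of that idea, going straight to NCBI's E-utilities
EFetch service with the standard library instead of through Biopython.
It assumes EFetch accepts accession numbers in its id parameter and
returns GenBank flat-file text for rettype=gb. The helper names
(build_efetch_url, fetch_genbank_text) are made up for illustration and
are not part of Biopython.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities EFetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accessions, db="nucleotide"):
    """Build an EFetch URL requesting GenBank-format text for the given accessions."""
    params = {
        "db": db,
        "id": ",".join(accessions),  # accessions passed directly, no GI lookup
        "rettype": "gb",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


def fetch_genbank_text(accessions):
    """Download the records as one string of GenBank flat-file text (network call)."""
    with urlopen(build_efetch_url(accessions)) as response:
        return response.read().decode("utf-8")


# Usage (performs a real network request):
# text = fetch_genbank_text(["BC063166", "NM_028459"])
```

Both records come back in a single request, which also avoids one
search round trip per accession.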
Peter