[BioPython] Can't download a FASTA file from NCBI to BLAST

Tue Jun 19 16:46:20 UTC 2007

Roger Barrette wrote:
> Hi again Peter,
> 
> You are correct in your assumptions as to what I'm trying to accomplish.  I
> have a habit of pulling random code from different places when I'm at a loss
> for how to do something, when I can't find documentation or examples.

If we start with some of your last attempt, you can see that this NCBI 
dictionary just returns raw fasta records as strings:

 >>> from Bio import GenBank
 >>> ncbi_dict = GenBank.NCBIDictionary("nucleotide","fasta")
 >>> ncbi_dict["A0B5H8"]
'>gi|121693723|sp|A0B5H8|A0B5H8_9EURY TATA-box binding\nMESTINI...'
 >>> print ncbi_dict["A0B5H8"]
 >gi|121693723|sp|A0B5H8|A0B5H8_9EURY TATA-box binding
MESTINIENVVASTKLADEFDLVKIESELEGAEYNKEKFPGLVYRVKSPKAAFLIFTSGKVVCTGAKNVE
DVRTVITNMARTLKSIGFDNINLEPEIHVQNIVASADLKTDLNLNAIALGLGLENIEYEPEQFPGLVYRI
KQPKVVVLIFSSGKLVVTGGKSPEECEEGVRIVRQQLENLGLL

You can just write these directly to your file:

from Bio import GenBank
from Bio import SeqIO
acc_list = ["A0B5H8", "A0C5G2", "A0CM02", "A0CRU8"]
#Don't use any record parser, we just want the raw text
ncbi_dict = GenBank.NCBIDictionary("nucleotide","fasta")
fasta_file = open("c:\\Current_Query.fasta","w")
for acc in acc_list :
     fasta_file.write(ncbi_dict[acc])
fasta_file.close()

This is very simple as there is no conversion between file formats - you 
are asking the NCBI for fasta format records, and you save them to a 
file as is.
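
If you want to try the file-writing part without hitting the NCBI, here is a 
rough sketch that swaps the NCBIDictionary for a plain Python dictionary. The 
accessions, sequences, file name and the save_raw_fasta helper are all made up 
for illustration. It also assumes a returned record might not end with a 
newline, and adds one so the next '>' line always starts on its own line:

```python
import os
import tempfile

# Stand-in for the NCBI lookup so this runs offline. The accessions and
# sequences are invented for illustration, not real NCBI records.
fake_ncbi_dict = {
    "ACC1": ">ACC1 hypothetical record one\nMESTINIENVV\n",
    "ACC2": ">ACC2 hypothetical record two\nMKQPKVVVLIF",  # no trailing newline
}

def save_raw_fasta(lookup, acc_list, filename):
    """Write the raw FASTA text for each accession to filename, in order."""
    fasta_file = open(filename, "w")
    try:
        for acc in acc_list:
            text = lookup[acc]
            # Assumption: a record may lack a final newline, so add one
            # to keep consecutive records from running together.
            if not text.endswith("\n"):
                text += "\n"
            fasta_file.write(text)
    finally:
        fasta_file.close()

# Write to a scratch directory rather than a fixed path.
example_path = os.path.join(tempfile.mkdtemp(), "Current_Query.fasta")
save_raw_fasta(fake_ncbi_dict, ["ACC2", "ACC1"], example_path)
print(open(example_path).read())
```

Anything that supports lookup[acc] and returns a string should work in place 
of the plain dictionary here.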

Another option (which I was suggesting in the previous email) is to have 
the NCBIDictionary parse the data into SeqRecord objects (rather than 
raw text) and then write those to your file, possibly using Bio.SeqIO.
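
A minimal sketch of the SeqRecord route, assuming a recent Biopython on 
Python 3 and using Bio.SeqIO on its own to do the parsing rather than the 
NCBIDictionary. The two records are invented and held in a string so that 
nothing is downloaded, and the output goes to an in-memory handle where you 
would normally use an open file:

```python
from io import StringIO

from Bio import SeqIO

# Made-up FASTA text standing in for what the NCBI would return.
raw_text = (
    ">ACC1 hypothetical record one\nMESTINIENVV\n"
    ">ACC2 hypothetical record two\nMKQPKVVVLIF\n"
)

# Parse the raw text into SeqRecord objects.
records = list(SeqIO.parse(StringIO(raw_text), "fasta"))
for record in records:
    print(record.id, len(record.seq))

# Write the SeqRecord objects back out in FASTA format.
out_handle = StringIO()
count = SeqIO.write(records, out_handle, "fasta")
print("Wrote %i records" % count)
```

The advantage over writing raw text is that you get the chance to inspect or 
filter the records (by id or length, for example) before saving them.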

Peter