[Biopython] NCBIWWW genbank files

Ara Kooser akooser at unm.edu
Tue Jul 19 15:07:23 UTC 2011


Peter,

  Thanks for the clarification there. I was a little confused. I'll give this a try.

Regards,
Ara

On Jul 18, 2011, at 11:54 PM, Peter Cock wrote:

> On Monday, July 18, 2011, Ara Kooser <akooser at unm.edu> wrote:
>> Good morning all,
>> 
>> 
>>   I am in the process of writing some code for pulling down files from NCBI. I wrote this based on the Biopython manual:
>> 
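>> from Bio.Blast import NCBIWWW
>> 
>> def query():
>>    # Read the query sequence(s) from a FASTA file named by the user.
>>    file_query = input("Please enter the name of your sequence file: ")
>>    with open(file_query) as fasta_file:
>>       fasta_seq = fasta_file.read()
>>    # Run BLASTN remotely at the NCBI; qblast returns XML by default.
>>    result_handle = NCBIWWW.qblast("blastn","nr", fasta_seq, expect=1e-30, hitlist_size=20000)
>>    # Save the raw XML so it can be parsed downstream.
>>    with open("blast_results.xml","w") as save_file:
>>       save_file.write(result_handle.read())
>>    result_handle.close()
>> 
>> 
>> query()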
>> 
>> Everything works fine, but I was wondering whether there is a way to pull down the GenBank files using this method. I used help(NCBIWWW.qblast) to look at all the options but didn't see the GenBank file format. Downstream in the program I use information extracted from both the XML and GenBank files, since they contain different information that we need. I was hoping to combine everything into one program. Currently we use the web interface to pull down the XML and GenBank files.
>> 
>> Thanks!
>> Ara
>> 
> 
> Hi Ara,
> 
> BLAST does not offer GenBank as an output format.
> 
> Assuming I have understood your aim, this can be done as a multi-step
> process: run BLAST, extract a list of matching record accessions, then
> download these records in GenBank format from the NCBI.
> 
> You may find it useful to request tabular output from BLAST and
> extract the match names (column two). This should be faster as the XML
> version of the data is much larger.
> 
> Also to avoid trying to download the same GenBank record more than
> once, I would use a Python set rather than a Python list object when
> recording this information from the BLAST file.
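
The two suggestions above (take column two of the tabular output, and keep the names in a set) could be sketched roughly as follows. This is only an illustration: it assumes a tab-separated BLAST results file already exists, however it was produced, that comment lines start with "#", and that column two holds plain accession.version strings. The function name collect_subject_ids and the sample rows are made up for the example.

```python
import io


def collect_subject_ids(handle):
    """Return the set of subject IDs (column two) from BLAST tabular output."""
    subject_ids = set()
    for line in handle:
        line = line.strip()
        # Skip blank lines and the "#" comment lines some tabular variants add.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not a data row
        # A set silently ignores an ID that has already been seen.
        subject_ids.add(fields[1])
    return subject_ids


# Made-up rows for illustration only; these are not real BLAST results.
example = io.StringIO(
    "# hypothetical BLASTN tabular output\n"
    "query1\tAB000001.1\t99.00\t100\t1\t0\t1\t100\t1\t100\t1e-50\t180\n"
    "query1\tAB000002.1\t98.00\t100\t2\t0\t1\t100\t1\t100\t1e-48\t174\n"
    "query2\tAB000001.1\t97.00\t100\t3\t0\t1\t100\t1\t100\t1e-46\t168\n"
)
print(sorted(collect_subject_ids(example)))
```

With a real file, the same function would be called as collect_subject_ids(open("blast_results.tsv")), where the file name is again only a placeholder. If column two holds pipe-delimited identifiers rather than bare accessions, they would need to be split further before being passed to Entrez.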
> 
> You can use the NCBI Entrez utilities API to download GenBank files;
> see the Bio.Entrez chapter of the Biopython tutorial, in particular
> the efetch function.
> 
> Peter
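
The final step Peter describes, downloading the matching records with Bio.Entrez.efetch, might look something like the sketch below. It is a minimal example under stated assumptions, not a tested pipeline: the helper names batches and download_genbank, the batch size of 100, the output file name, and the email address are all placeholders, and the IDs are assumed to be nucleotide accessions.

```python
def batches(ids, size):
    """Yield sorted IDs in lists of at most `size` items."""
    ordered = sorted(ids)
    for start in range(0, len(ordered), size):
        yield ordered[start:start + size]


def download_genbank(ids, out_path, email, batch_size=100):
    """Fetch records in GenBank format and append them to one text file."""
    # Imported here so the batching helper works without Biopython installed.
    from Bio import Entrez

    # The NCBI asks for a contact address with Entrez requests.
    Entrez.email = email
    with open(out_path, "w") as out:
        for batch in batches(ids, batch_size):
            # batch_size is an arbitrary choice for this sketch.
            handle = Entrez.efetch(
                db="nucleotide",
                id=",".join(batch),
                rettype="gb",
                retmode="text",
            )
            try:
                out.write(handle.read())
            finally:
                handle.close()


if __name__ == "__main__":
    # Placeholder accessions and address; replace with real values before use.
    wanted = {"AB000001.1", "AB000002.1"}
    download_genbank(wanted, "matches.gb", "you@example.org")
```

Requesting the records in batches rather than one at a time keeps the number of calls to the NCBI down, and the resulting file can then be read with Bio.SeqIO.parse using the "genbank" format.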




