[Biopython] NCBIWWW genbank files
Ara Kooser
akooser at unm.edu
Tue Jul 19 15:07:23 UTC 2011
Peter,
Thanks for the clarification there. I was a little confused. I'll give this a try.
Regards,
Ara
On Jul 18, 2011, at 11:54 PM, Peter Cock wrote:
> On Monday, July 18, 2011, Ara Kooser <akooser at unm.edu> wrote:
>> Good morning all,
>>
>>
>> I am in the process of writing some code for pulling down files from NCBI. I wrote this based on the Biopython manual:
>>
>> from Bio.Blast import NCBIWWW
>>
>> def query():
>>     file_query = raw_input("Please enter the name of your sequence file: ")
>>     fasta_seq = open(file_query).read()
>>     result_handle = NCBIWWW.qblast("blastn", "nr", fasta_seq, expect=1e-30, hitlist_size=20000)
>>     save_file = open("blast_results.xml", "w")
>>     save_file.write(result_handle.read())
>>     save_file.close()
>>     result_handle.close()
>>
>>
>> query()
>>
>> Everything works fine, but I was wondering whether there is a way to pull down the GenBank files using this method. I used help(NCBIWWW.qblast) to look at all the options but didn't see the GenBank file format. Downstream in the program I use information extracted from both the XML and GenBank files, since they contain different information that we need. I was hoping to combine everything into one program. Currently we use the web interface to pull down the XML and GenBank files.
>>
>> Thanks!
>> Ara
>>
>
> Hi Ara,
>
> BLAST does not offer GenBank as an output format.
>
> Assuming I have understood your aim, this can be done as a multi-step
> process: Run BLAST, extract a list of matching record accessions,
> download these records in GenBank format from the NCBI.
>
> You may find it useful to request tabular output from BLAST and
> extract the match names (column two). This should be faster as the XML
> version of the data is much larger.
>
> Also to avoid trying to download the same GenBank record more than
> once, I would use a Python set rather than a Python list object when
> recording this information from the BLAST file.
>
> You can use the NCBI Entrez utilities API to download GenBank files,
> see Bio.Entrez in the Biopython tutorial, function efetch.
>
> Peter
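
For reference, the steps Peter outlines could be sketched roughly as below. This is only an illustration under several assumptions, not a tested pipeline: the helper names subject_ids and fetch_genbank, the file names, and the default output path are all made up for this example. It assumes the BLAST results have already been saved as tab-separated output (how that file is obtained is left out here), that column two holds either a plain accession or an older-style "gi|...|gb|ACCESSION|" identifier, and that the hits are nucleotide records. A set is used so each record is requested only once.

```python
def subject_ids(tabular_lines):
    """Collect the unique subject identifiers (column two) from BLAST tabular output."""
    ids = set()  # a set, so repeated hits are stored only once
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        columns = line.split("\t")
        if len(columns) < 2:
            continue  # not a data row
        subject = columns[1]
        # Assumption: older-style identifiers look like "gi|12345|gb|AB000001.1|",
        # with the accession in the fourth pipe-separated field.
        parts = subject.split("|")
        if len(parts) >= 4 and parts[3]:
            subject = parts[3]
        ids.add(subject)
    return ids


def fetch_genbank(ids, email, out_path="blast_hits.gbk"):
    """Download the given records in GenBank format with Bio.Entrez.efetch."""
    # Imported here so subject_ids() can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.efetch(db="nucleotide", id=",".join(sorted(ids)),
                           rettype="gb", retmode="text")
    with open(out_path, "w") as out:
        out.write(handle.read())
    handle.close()
```

Usage might look like this, with the file name and address as placeholders:

    with open("blast_results.tsv") as handle:
        ids = subject_ids(handle)
    fetch_genbank(ids, "you@example.org")

With a hit list as large as 20000, the identifiers would probably need to be sent to efetch in smaller batches rather than in one request.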