[Biopython] NCBIWWW genbank files

Peter Cock p.j.a.cock at googlemail.com
Tue Jul 19 05:54:12 UTC 2011


On Monday, July 18, 2011, Ara Kooser <akooser at unm.edu> wrote:
> Good morning all,
>
>
>    I am in the process of writing some code for pulling down files from NCBI. I wrote this based on the Biopython manual:
>
> from Bio.Blast import NCBIWWW
>
> def query():
>     file_query = raw_input("Please enter the name of your sequence file: ")
>     fasta_seq = open(file_query).read()
>     result_handle = NCBIWWW.qblast("blastn","nr", fasta_seq, expect=1e-30, hitlist_size=20000)
>     save_file = open("blast_results.xml","w")
>     save_file.write(result_handle.read())
>     save_file.close()
>     result_handle.close()
>
>
> query()
>
> Everything works fine. But I was wondering whether there is a way to pull down the GenBank files using this method. I used help(NCBIWWW.qblast) to look at all the options but didn't see the GenBank file format. Downstream in the program I use information extracted from both the XML and GenBank files, since they contain different information that we need. I was hoping to combine everything into one program. Currently we use the web interface to pull down the XML and GenBank files.
>
> Thanks!
> Ara
>

Hi Ara,

BLAST does not offer GenBank as an output format.

Assuming I have understood your aim, this can be done as a multi-step
process: run BLAST, extract a list of matching record accessions, then
download these records in GenBank format from the NCBI.

You may find it useful to request tabular output from BLAST and
extract the match identifiers (column two, the subject ID). This should
be faster, as the XML version of the data is much larger.

Also, to avoid trying to download the same GenBank record more than
once, I would use a Python set rather than a Python list when
recording the match identifiers from the BLAST file.
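
As a rough sketch of the extraction step, assuming the BLAST results
are tab-separated with the subject ID in column two, and that any
comment lines start with "#". The function name collect_subject_ids is
made up for illustration and is not part of Biopython:

```python
def collect_subject_ids(lines):
    """Return the unique subject IDs (column two) from tabular BLAST output.

    `lines` is any iterable of text lines, for example an open file handle.
    """
    subject_ids = set()  # a set silently drops repeated hits
    for line in lines:
        line = line.strip()
        # Skip blank lines and "#" comment lines (assumed comment marker).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # too few columns to be a hit line
        subject_ids.add(fields[1])
    return subject_ids
```

You would call this with an open file handle, e.g.
collect_subject_ids(open("blast_results.tsv")), where the filename is
only a placeholder. Depending on the database, column two may hold a
compound identifier rather than a bare accession, so some further
splitting may be needed before downloading.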

You can use the NCBI Entrez utilities API to download GenBank files;
see the Bio.Entrez chapter of the Biopython tutorial, in particular
the efetch function.
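
A minimal sketch of the download step, assuming you already have a
set of accessions. The helper names (batched, fetch_genbank), the
output filename, and the batch size of 100 are illustrative choices,
not anything prescribed by Biopython or the NCBI; only Entrez.email
and Entrez.efetch are the real Bio.Entrez interface:

```python
def batched(ids, size):
    """Yield the IDs, sorted, in lists of at most `size` items."""
    ids = sorted(ids)
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_genbank(accessions, email, out_path="hits.gbk", batch_size=100):
    """Download GenBank records for `accessions` into a single file."""
    # Imported here so the batching helper above works without Biopython.
    from Bio import Entrez

    Entrez.email = email  # the NCBI asks for a contact address
    with open(out_path, "w") as out:
        for batch in batched(accessions, batch_size):
            # One request per batch keeps each query to a modest size.
            handle = Entrez.efetch(db="nucleotide", id=",".join(batch),
                                   rettype="gb", retmode="text")
            try:
                out.write(handle.read())
            finally:
                handle.close()
```

The resulting file holds the records back to back, so it can be read
again with Bio.SeqIO.parse(out_path, "genbank").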

Peter



