[Biopython] Biopython local blastn query

Ara Kooser ghashsnaga at gmail.com
Tue Jul 30 14:14:08 UTC 2013


Peter,

  Thank you for your quick response! I added in the -perc_identity and
closed the file. I end up with the same results. I do get the full
sequences but also a bunch of partials.

ara


On Tue, Jul 30, 2013 at 2:12 AM, Peter Cock <p.j.a.cock at googlemail.com>wrote:

> On Tue, Jul 30, 2013 at 2:45 AM, Ara Kooser <ghashsnaga at gmail.com> wrote:
> > Hello all,
> >
> >    I goofed up on curating accession numbers for part of my PhD project.
> > But I have the sequences in a big fasta file! I wrote a quick script that
> > read in one sequence at a time from the file, blasted it and then
> filtered
> > it based on 0 gaps and 100% id match. I did this for just the first 6
> > sequences as to not anger the NCBI. This worked great! But it's slow
> > (really slow) and I can't submit the whole file.
> >
> >  I installed a local blast db and wrote this script.(attached as
> > meta_data_local.py and the query file, clear_genus_level.fasta ):
> >
> >
> ########################################################################################
> > #I want to read in one sequence at a time from a fasta file and blast it
> > against a local
> > #blast db.
> >
> > from Bio.Blast.Applications import NcbiblastnCommandline
> > from Bio.Blast import NCBIXML
> > from Bio import SeqIO
> > from Bio import Seq
> > from Bio.SeqRecord import SeqRecord
> >
> > nt = "/Users/arakooser/blast/db/nt.00"
> > #Where the database is located at
> > file_out = open("metadata_genus.level.csv","w+")
> >
> > #Contains all the data my boss wants on the sequences
> > file_in = open("clear_genus_level.fasta")
> >
> > #The main fasta file that needs to be blasted
> >
> > fas_rec = SeqIO.parse(file_in,"fasta")
> > #Parses the main fasta file
> >
> > for first_seq in fas_rec:
> > #Hopefully grabs the first sequence
> > #Takes that sequence from standard in and sumbits it to the blast
> > commandline and spits
> > #out an xml
> >     result = NcbiblastnCommandline(query="-", db=nt, evalue=0.001,
> outfmt=5,
> >                                    out="temp.xml")
>
> You could ask BLAST itself to apply the percentage
> identity threshold, blastn has a -perc_identity option.
>
> >     stdout, stderr = result(stdin=first_seq.format("fasta"))
> >
> > #Reading in the xml file.
> > #
> >
> >     record = open("temp.xml")
> >     ...
>
> You never close this file handle, perhaps that is
> causing problems reusing the filename?
>
> It might be safer to use a different temporary
> file each time (there are standard functions to
> generate these names in Python)?
>
> Peter
>



-- 
Quis hic locus, quae regio, quae mundi plaga. Ubi sum. Sub ortu solis an
sub cardine glacialis ursae.

Geoscience website: http://www.tattooedscience.org/



More information about the Biopython mailing list