[Biopython] Biopython local blastn query

Tue Jul 30 15:32:30 UTC 2013

Ivan,

 Thanks! I found the blastn documentation!! This looks like what I want.

I am running blast 2.2.26. I am getting an error with those parameters.

I entered the parameters as:
max_hsps_per_subject=1, num_alignments=1 in the NcbiblastnCommandline line

Error:
Aras-MacBook-Air:CEM Genus arakooser$ python meta_data_local.py
  File "meta_data_local.py", line 30
    -out="temp.xml", max_hsps_per_subject=1, num_alignments=1)
SyntaxError: keyword can't be an expression

I think this means I am not using the correct keyword.

ara

On Tue, Jul 30, 2013 at 9:14 AM, Ivan Gregoretti <ivangreg at gmail.com> wrote:

> Hi Ara,
>
> If you are interested only in the most obvious matches, and I think you
> are, pass the following parameter values to blastn
>
> -max_hsps_per_subject 1 -num_alignments 1
>
> From the blastn documentation:
>
>  -max_hsps_per_subject <Integer, >=0>
>    Override maximum number of HSPs per subject to save for ungapped
> searches
>    (0 means do not override)
>    Default = `0'
>
>  -max_target_seqs <Integer, >=1>
>    Maximum number of aligned sequences to keep
>    Not applicable for outfmt <= 4
>    Default = `500'
>
>
> I hope this helps with your thesis.
>
> Ivan
>
>
>
>
>
> Ivan Gregoretti, PhD
> Bioinformatics
>
>
>
> On Tue, Jul 30, 2013 at 10:14 AM, Ara Kooser <ghashsnaga at gmail.com> wrote:
>
>> Peter,
>>
>>   Thank you for your quick response! I added in the -perc_identity and
>> closed the file. I end up with the same results. I do get the full
>> sequences but also a bunch of partials.
>>
>> ara
>>
>>
>> On Tue, Jul 30, 2013 at 2:12 AM, Peter Cock <p.j.a.cock at googlemail.com
>> >wrote:
>>
>> > On Tue, Jul 30, 2013 at 2:45 AM, Ara Kooser <ghashsnaga at gmail.com>
>> wrote:
>> > > Hello all,
>> > >
>> > >    I goofed up on curating accession numbers for part of my PhD
>> project.
>> > > But I have the sequences in a big fasta file! I wrote a quick script
>> that
>> > > read in one sequence at a time from the file, blasted it and then
>> > filtered
>> > > it based on 0 gaps and 100% id match. I did this for just the first 6
>> > > sequences as to not anger the NCBI. This worked great! But it's slow
>> > > (really slow) and I can't submit the whole file.
>> > >
>> > >  I installed a local blast db and wrote this script.(attached as
>> > > meta_data_local.py and the query file, clear_genus_level.fasta ):
>> > >
>> > >
>> >
>> ########################################################################################
>> > > #I want to read in one sequence at a time from a fasta file and blast
>> it
>> > > against a local
>> > > #blast db.
>> > >
>> > > from Bio.Blast.Applications import NcbiblastnCommandline
>> > > from Bio.Blast import NCBIXML
>> > > from Bio import SeqIO
>> > > from Bio import Seq
>> > > from Bio.SeqRecord import SeqRecord
>> > >
>> > > nt = "/Users/arakooser/blast/db/nt.00"
>> > > #Where the database is located at
>> > > file_out = open("metadata_genus.level.csv","w+")
>> > >
>> > > #Contains all the data my boss wants on the sequences
>> > > file_in = open("clear_genus_level.fasta")
>> > >
>> > > #The main fasta file that needs to be blasted
>> > >
>> > > fas_rec = SeqIO.parse(file_in,"fasta")
>> > > #Parses the main fasta file
>> > >
>> > > for first_seq in fas_rec:
>> > > #Hopefully grabs the first sequence
>> > > #Takes that sequence from standard in and sumbits it to the blast
>> > > commandline and spits
>> > > #out an xml
>> > >     result = NcbiblastnCommandline(query="-", db=nt, evalue=0.001,
>> > outfmt=5,
>> > >                                    out="temp.xml")
>> >
>> > You could ask BLAST itself to apply the percentage
>> > identity threshold, blastn has a -perc_identity option.
>> >
>> > >     stdout, stderr = result(stdin=first_seq.format("fasta"))
>> > >
>> > > #Reading in the xml file.
>> > > #
>> > >
>> > >     record = open("temp.xml")
>> > >     ...
>> >
>> > You never close this file handle, perhaps that is
>> > causing problems reusing the filename?
>> >
>> > It might be safer to use a different temporary
>> > file each time (there are standard functions to
>> > generate these names in Python)?
>> >
>> > Peter
>> >
>>
>>
>>
>> --
>> Quis hic locus, quae regio, quae mundi plaga. Ubi sum. Sub ortu solis an
>> sub cardine glacialis ursae.
>>
>> Geoscience website: http://www.tattooedscience.org/
>> _______________________________________________
>> Biopython mailing list  -  Biopython at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython
>>
>
>

-- 
Quis hic locus, quae regio, quae mundi plaga. Ubi sum. Sub ortu solis an
sub cardine glacialis ursae.

Geoscience website: http://www.tattooedscience.org/