[BioRuby] Blast with file as a query option?

Ben Woodcroft donttrustben at gmail.com
Tue Apr 7 00:30:09 EDT 2009


There is also the -a flag of blastall, which sets the number of processors (CPUs) to use.
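
As a rough sketch, -a could be added to the blastall argument list like this. The helper name blastall_args, the output file name and the CPU count are made up for illustration; the other option values are the ones from the example quoted below.

```ruby
# Hypothetical helper (not part of BioRuby): builds the argument list for a
# legacy NCBI blastall run, including -a for the number of processors.
def blastall_args(query, db, out_path, cpus: 2)
  %w(blastall -p blastn -e 0.0001 -b 20 -v 20) +
    ["-i", query, "-d", db, "-a", cpus.to_s, "-o", out_path]
end

args = blastall_args("Large_list_sequences_1.fasta",
                     "Large_list_sequences_2",
                     "blast.out", cpus: 4)
# system(*args)   # uncomment to run; assumes blastall is on the PATH
```

Passing the arguments as an array keeps the shell out of the picture, so file names with spaces need no quoting.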
ben

2009/4/7 Naohisa GOTO <ngoto at gen-info.osaka-u.ac.jp>

> Hi,
>
> On Sun, 5 Apr 2009 14:13:37 -1000
> Kevin English <kenglish at gmail.com> wrote:
>
> > Hello,
> >   I have two very large local FASTA files that I wish to BLAST against one
> > another, and I want to parse the results in BioRuby. I'm wondering if there
> > is a way to mimic the behavior of this blast command:
> >
> > blastall -p blastn -i Large_list_sequences_1.fasta -d Large_list_sequences_2
> >
> >
> > where Large_list_sequences_2 is a formatted fasta db. My current
> > implementation opens Large_list_sequences_1.fasta and goes through it
> > sequence by sequence. It seems to run pretty slowly. I'm wondering if I can
> > somehow run the above blast command once, loop through the results, and get
> > a performance gain.
>
> To gain performance, adding options to BLAST is strongly recommended.
>  -e  Expectation value (E) [Real]
>    default = 10.0
>  -v  Number of database sequences to show one-line descriptions for (V) [Integer]
>    default = 500
>  -b  Number of database sequence to show alignments for (B) [Integer]
>    default = 250
>
> Changing the above to smaller values will reduce the size of the output
> report, which improves performance.
>
> Running BLAST once with multiple query sequences can also improve performance.
> In addition, when your query sequences are in a local file, calling the
> blastall command directly, without Bio::Blast, may be a good option.
>
> For example,
>
>  require 'bio'
>  require 'tempfile'
>
>  # %w(...) builds an array of words, so more arguments can be appended below
>  command = %w( blastall -p blastn -i Large_list_sequences_1.fasta
>                -d Large_list_sequences_2 -e 0.0001 -b 20 -v 20 )
>  tempfile = Tempfile.new('blastout')
>  tempfile.close(false)
>  command = command + [ "-o", tempfile.path ]
>  system(*command)
>  # After system(), error checks will be needed but skipped.
>  tempfile.open
>  ff = Bio::FlatFile.open(tempfile)
>  ff.each do |report|
>    # For example, prints query_def and target_def
>    report.each do |hit|
>      print report.query_def, "\t", hit.target_def, "\n"
>    end
>  end
>  ff.close
>  tempfile.close(true)
>
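
The error check that the example above skips after system() might look something like this. It is only a sketch: run_or_raise is a hypothetical helper name, and it relies on Kernel#system returning true on success, false on a non-zero exit status, and nil when the command could not be executed.

```ruby
# Hypothetical wrapper around Kernel#system that raises on failure.
def run_or_raise(*argv)
  ok = system(*argv)
  return true if ok
  if ok.nil?
    # nil means the command could not be executed at all
    raise "could not execute #{argv.first}"
  else
    # false means the command ran but exited with a non-zero status
    raise "#{argv.first} exited with status #{$?.exitstatus}"
  end
end
```

In the example above, system(*command) would then become run_or_raise(*command), so that the report is only parsed when blastall actually succeeded.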
> > For anyone curious, my code is on GitHub:
> >
> > http://github.com/kenglishhi/bioflexrails/tree/master
> >
> > The file that is doing the blasts is under app/model/biodatabase.rb.
> >
> > I'm trying to write a Rails app that uses a BioSQL db and allows this
> > biologist to organize his sequences. I'm very new to bioinformatics but have
> > a lot of experience with Ruby on Rails.
> >
> > Thanks in advance for your help.
>
> In general, a BLAST search against a very large database takes a
> very long time, and using a batch queueing system might be needed.
>
> Thanks,
>
> Naohisa Goto
> ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org



-- 
FYI: My email addresses at unimelb, uq and gmail all redirect to the same
place.

