[BioRuby] Blast with file as a query option?
Ben Woodcroft
donttrustben at gmail.com
Tue Apr 7 04:30:09 UTC 2009
There is also the -a flag, which sets the number of processors blastall should use.
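
As a rough sketch of how that could fit into the array-style command quoted below: `blastall_args` is a hypothetical helper name of mine, not part of BioRuby, and the CPU count of 4 is only an example value.

```ruby
# Hypothetical helper (not part of BioRuby): builds the blastall argument
# list as an array, so it can be passed to system(*args) without shell quoting.
def blastall_args(query, db, cpus = 1)
  [ "blastall", "-p", "blastn",
    "-i", query,
    "-d", db,
    "-a", cpus.to_s ]   # -a: number of processors to use
end

args = blastall_args("Large_list_sequences_1.fasta",
                     "Large_list_sequences_2", 4)
puts args.join(" ")

# To actually run it (requires blastall on the PATH):
#   system(*args)
```

Whether more CPUs help will depend on the machine and the size of the database, so it is worth timing a small run first.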
ben
2009/4/7 Naohisa GOTO <ngoto at gen-info.osaka-u.ac.jp>
> Hi,
>
> On Sun, 5 Apr 2009 14:13:37 -1000
> Kevin English <kenglish at gmail.com> wrote:
>
> > Hello,
> > I have two very large local fasta files that I wish to blast against one
> > another and parse the results in BioRuby. I'm wondering if there is a way
> > to mimic the behavior of this blast command:
> >
> > blastall -p blastn -i Large_list_sequences_1.fasta -d Large_list_sequences_2
> >
> >
> > where Large_list_sequences_2 is a formatted fasta db. My current
> > implementation opens Large_list_sequences_1.fasta and goes through it
> > sequence by sequence. It seems to run pretty slowly. I'm wondering if I can
> > in some way run the above blast command, loop through the results, and get
> > a performance gain.
>
> To gain performance, adding options to BLAST is strongly recommended.
>   -e  Expectation value (E) [Real]
>       default = 10.0
>   -v  Number of database sequences to show one-line descriptions for (V) [Integer]
>       default = 500
>   -b  Number of database sequence to show alignments for (B) [Integer]
>       default = 250
>
> Setting these to smaller values reduces the size of the output report,
> which in turn improves performance.
>
> Running BLAST once with multiple query sequences can also improve performance.
> In addition, when the query sequences are in a local file, calling the
> blastall command directly, without Bio::Blast, may be a good option.
>
> For example,
>
> require 'bio'
> require 'tempfile'
>
> # %w builds an array of arguments, so the "-o" option can be appended below
> command = %w( blastall -p blastn -i Large_list_sequences_1.fasta
>               -d Large_list_sequences_2 -e 0.0001 -b 20 -v 20 )
> tempfile = Tempfile.new('blastout')
> tempfile.close(false)   # close the handle but keep the file
> command = command + [ "-o", tempfile.path ]
> system(*command)
> # After system(), error checks will be needed but skipped.
> tempfile.open
> ff = Bio::FlatFile.open(tempfile)
> ff.each do |report|
>   # For example, prints query_def and target_def
>   report.each do |hit|
>     print report.query_def, "\t", hit.target_def, "\n"
>   end
> end
> ff.close
> tempfile.close(true)    # close and delete the temporary file
>
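
On the error checks that the example above skips: one possible way to do them, as a minimal sketch. `run_checked` is a hypothetical name of mine, not a BioRuby method. It relies only on the documented return values of Kernel#system (true, false, or nil).

```ruby
# Hypothetical helper (not part of BioRuby): runs a command given as an
# argument list and raises if it did not succeed.
# Kernel#system returns true on exit status 0, false on a non-zero exit
# status, and nil if the command could not be executed at all.
def run_checked(*args)
  ok = system(*args)
  if ok.nil?
    raise "could not execute #{args.first}"
  elsif !ok
    raise "#{args.first} exited with status #{$?.exitstatus}"
  end
  true
end

# Usage with the command array from the example above
# (requires blastall on the PATH):
#   run_checked(*command)
```

Raising early here avoids handing an empty or partial report file to Bio::FlatFile.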
> > For anyone curious, my code is on github:
> >
> > http://github.com/kenglishhi/bioflexrails/tree/master
> >
> > The file that is doing the blasts is under app/model/biodatabase.rb.
> >
> > I'm trying to write a rails app that uses a biosql db and allows this
> > biologist to organize his sequences. I'm very new to bioinformatics but
> > have a lot of experience with Ruby on Rails.
> >
> > Thanks in advance for your help.
>
> In general, a BLAST search against a very large database takes a
> very long time, and using a batch queueing system might be needed.
>
> Thanks,
>
> Naohisa Goto
> ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
--
FYI: My email addresses at unimelb, uq and gmail all redirect to the same
place.