[BioRuby] using Bio::FlatFileIndex
Naohisa GOTO
ngoto at gen-info.osaka-u.ac.jp
Tue Dec 11 14:59:52 UTC 2007
Hi,
Indexes can be generated with the command-line application br_bioflat.rb
or from within a Ruby script.
Example: create an index from the command line:
% br_bioflat.rb --create --type flat --location /home/xx/dbidx \
--dbname test --files /home/xx/test01.fst /home/xx/test02.fst
equivalent ruby script:
require 'bio'
is_bdb = nil # is_bdb = Bio::FlatFileIndex::MAGIC_BDB for BDB index
dbname = '/home/xx/dbidx/test'
format = nil # file format is automatically determined
options = {}
files = ['/home/xx/test01.fst', '/home/xx/test02.fst' ]
Bio::FlatFileIndex.makeindex(is_bdb, dbname, format, options, *files)
As Bio::FlatFileIndex was first written in 2002 and is
very old, the API is ugly. In addition, its internal structure
is too complicated. It may be rewritten and the API might
be changed in the future.
Add files to the index:
% br_bioflat.rb --update --location /home/xx/dbidx \
--dbname test --files /home/xx/test03.fst /home/xx/test04.fst
equivalent ruby script:
require 'bio'
dbname = '/home/xx/dbidx/test'
options = {}
files = ['/home/xx/test03.fst', '/home/xx/test04.fst' ]
Bio::FlatFileIndex::update_index(dbname, nil, options, *files)
Re-read all files and re-generate the index:
% br_bioflat.rb --update --location /home/xx/dbidx \
--dbname test --renew
equivalent ruby script:
require 'bio'
dbname = '/home/xx/dbidx/test'
options = {}
options['renew'] = true
Bio::FlatFileIndex::update_index(dbname, nil, options, [])
Note that adding files to, or updating, a flat index (without BDB)
is very slow because it actually rebuilds the indexes from scratch.
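Because updating is slow, a script may want to request an update only when the data files have really changed. The following is a minimal sketch of such a check in plain Ruby. take_snapshot and changed_files are hypothetical helper names, not part of BioRuby, and comparing size and modification time is only an assumed heuristic:

```ruby
require 'tempfile'

# Hypothetical helpers (not part of BioRuby): remember each data file's
# size and modification time, and report which files differ later.
def take_snapshot(files)
  files.each_with_object({}) do |path, snap|
    st = File.stat(path)
    snap[path] = [st.size, st.mtime]
  end
end

# Files that are new, or whose size or mtime no longer match the snapshot.
def changed_files(snapshot, files)
  files.reject do |path|
    st = File.stat(path)
    snapshot[path] == [st.size, st.mtime]
  end
end

Tempfile.create('toy_fasta') do |tmp|
  tmp.write(">seq1\nACGT\n")
  tmp.flush
  files = [tmp.path]
  snapshot = take_snapshot(files)
  puts changed_files(snapshot, files).empty?   # nothing changed yet

  tmp.write(">seq2\nTTTT\n")                   # simulate a user update
  tmp.flush
  puts changed_files(snapshot, files).empty?   # size differs now
end
```

A script could then call Bio::FlatFileIndex::update_index only when changed_files returns a non-empty list.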
Retrieving sequences in the index:
% br_bioflat.rb --location /home/xx/dbidx --dbname test M12963
equivalent ruby script:
require 'bio'
dbname = '/home/xx/dbidx/test'
key = 'M12963'
idx = Bio::FlatFileIndex.open(dbname)
results = idx.search(key)
results.each do |str|
  print str
end
idx.close
'results' is a Bio::FlatFileIndex::Results object.
Each search result is a string.
(For more information, please see RDoc
http://bioruby.org/rdoc/classes/Bio/FlatFileIndex/Results.html )
If you want a subsequence of FASTA-formatted data,
for example:
require 'bio'
dbname = '/home/xx/dbidx/test'
key = 'M12963'
idx = Bio::FlatFileIndex.open(dbname)
result = idx.search(key)
result.each do |str|
  ent = Bio::FastaFormat.new(str)
  # [0..100] is a 0-based, inclusive range: the first 101 residues
  # for nucleic acid sequence
  puts ent.naseq[0..100]
  # for amino acid sequence
  puts ent.aaseq[0..100]
  # nucleic or amino acid sequence
  puts ent.seq[0..100]
end
idx.close
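BioPerl's seq($accession, $start, $end) takes 1-based, inclusive coordinates, while Ruby ranges are 0-based. Here is a minimal sketch of the conversion on a plain String, assuming the whole entry has already been fetched from the index. fasta_sequence and subseq_1based are hypothetical helper names, not BioRuby API:

```ruby
# Hypothetical helpers (not part of BioRuby): extract a subsequence from
# a FASTA entry string using 1-based, inclusive coordinates.

# Drop the definition line and join the sequence lines.
def fasta_sequence(entry)
  lines = entry.lines.map(&:chomp)
  lines.reject { |l| l.start_with?('>') || l.empty? }.join
end

# Return residues start..stop (1-based, inclusive), or nil if the
# coordinates are invalid. A stop beyond the end is truncated.
def subseq_1based(seq, start, stop)
  return nil if start < 1 || stop < start || start > seq.length
  seq[(start - 1)..(stop - 1)]
end

entry = ">M12963 example entry\nACGTACGT\nTTGGCCAA\n"
seq = fasta_sequence(entry)
puts subseq_1based(seq, 3, 10)
```

The same index arithmetic applies to the sequence strings returned by Bio::FastaFormat. Either way the whole entry is read first and then sliced.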
Please see the OBDA flat file indexing specifications
for the philosophy and internal structure of the index.
http://code.open-bio.org/cgi/viewcvs.cgi/obda-specs/flatfile/?cvsroot=obf-common
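As a rough sketch of the general idea behind indexing a flat file: record where each entry starts in the data file and how long it is, then seek directly to it on lookup. The plain-Ruby code below is only a toy illustration. It does not read or write the OBDA index format, and build_offset_index and fetch_entry are hypothetical names:

```ruby
require 'tempfile'

# Toy illustration only: NOT the OBDA on-disk format.
# Scan a FASTA file once and map each ID to [byte offset, byte length].
def build_offset_index(path)
  index = {}
  current = nil
  File.open(path, 'rb') do |f|
    until f.eof?
      pos = f.pos
      line = f.readline
      if line.start_with?('>')
        # The ID is the first word after '>' on the definition line.
        current = line[1..-1].split(/\s+/).first
        index[current] = [pos, 0]
      end
      index[current][1] += line.bytesize if current
    end
  end
  index
end

# Fetch one entry by seeking to its recorded offset; nil if unknown ID.
def fetch_entry(path, index, id)
  offset, length = index[id]
  return nil unless offset
  File.open(path, 'rb') do |f|
    f.seek(offset)
    f.read(length)
  end
end

Tempfile.create('toy_fasta') do |tmp|
  tmp.write(">seq1 first\nACGT\nACGT\n>seq2 second\nTTTT\n")
  tmp.flush
  idx = build_offset_index(tmp.path)
  print fetch_entry(tmp.path, idx, 'seq2')
end
```

In this toy version, any change to the data file invalidates the recorded offsets, so the file has to be scanned again.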
Thanks,
Naohisa Goto
ng at bioruby.org / ngoto at gen-info.osaka-u.ac.jp
On Mon, 10 Dec 2007 17:21:31 -0000
"Schwach Frank Dr (CMP)" <F.Schwach at uea.ac.uk> wrote:
>
> Hi,
>
> I need to retrieve sequences from fasta files. In Perl I used to do this with Bio::DB::Fasta but at first I couldn't find an equivalent in Bioruby and was almost about to give up and use Perl for this purpose when I found Bio::FlatFileIndex.
> Unfortunately, this class is not very well documented (unless I missed something). I think I can more or less figure out most of it from the code and the comments in the rdoc (http://bioruby.org/rdoc/classes/Bio/FlatFileIndex.html) but it would really be great to have some examples from people who are more familiar with this class, especially since I am relatively new to Ruby still.
>
> What I want to do is simply:
>
> 1) Build an index for a directory containing a few fasta files
> 2) In a Rails App (or any other Ruby script): retrieve sequences by their accessions and update the index if the fasta db is updated by the user.
>
> Some of the questions I have are:
> What are the options that I can pass to the makeindex method?
> In Bioperl it is possible to retrieve a subsequence straight away like this:
>
> my $seq_db_obj = Bio::DB::Fasta->new($path_to_db);
> my $seq = $seq_db_obj->seq($accession, $start, $end) ; # retrieve (sub)sequence from the database
>
> Can I do this in Ruby too or would I retrieve the entire sequence and then get the subsequence from that?
>
> Any help and examples welcome!
> Thanks a lot!
>
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby