[BioRuby] RegEx search example fasta file
Toshiaki Katayama
ktym at hgc.jp
Tue Mar 23 21:58:20 EST 2004
On 2004/03/21, at 22:33, pjotr at pckassa.com wrote:
> Can this go in the sample directory of bioruby - I have added it to
> the Wiki. Comments welcome.
As for the wiki page: compared with the original BJIA
(http://www.biojava.org/docs/bj_in_anger/FastaParser.htm),
this section is meant to answer how to parse fasta results.
Because Bio::FlatFile.auto in BioRuby detects the file format
automatically and entry.definition is implemented in various
DB classes, your approach of finding entries by regexp
is not limited to FastaFormat. For example:
% re_grep_def.rb 'serine.* kinase' genbank/gb*.seq
% re_grep_def.rb 'serine.* kinase' kegg/genes/*.ent
% re_grep_def.rb 'serine.* kinase' kegg/sequences/*.pep
----------------------------------------------
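#!/usr/bin/env ruby
require 'bio'

# First argument is the pattern; the remaining arguments are read via ARGF.
re = /#{ARGV.shift}/i

# Bio::FlatFile.auto guesses the database format of the input.
Bio::FlatFile.auto(ARGF) do |ff|
  ff.each do |entry|
    # Print the raw text of every entry whose definition matches.
    puts ff.entry_raw if re.match(entry.definition)
  end
end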
----------------------------------------------
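One note on the re = /#{ARGV.shift}/i line: the argument is taken as regexp source as-is, so it should be given without surrounding slashes, and a malformed pattern raises RegexpError before any file is read. Below is a minimal sketch of a guard in plain Ruby. The helper name build_pattern and the sample definition strings are made up for illustration and are not part of BioRuby.

```ruby
# Hypothetical helper (not part of BioRuby): turn a command-line string
# into a case-insensitive Regexp, or return nil if the pattern is invalid.
def build_pattern(source)
  Regexp.new(source, Regexp::IGNORECASE)
rescue RegexpError => e
  warn "invalid pattern #{source.inspect}: #{e.message}"
  nil
end

# Made-up definition lines standing in for entry.definition values.
definitions = [
  'serine/threonine protein kinase',
  'Serine-type kinase homolog',
  'hypothetical protein'
]

re = build_pattern('serine.* kinase')
hits = definitions.select { |d| re.match(d) }
puts hits
```

In the script above, the same helper could replace the regexp literal, with the script exiting when it returns nil.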
-k
>
> Pj.
>
>
> #! /usr/bin/ruby
> #
> # $Id: fastasearch,v 1.1 2004/03/21 13:18:41 wrk Exp $
> # $Source: /home/cvs/home/pjotr/lwrk/luw/fasta/fastasearch,v $
> #
>
> # require 'profile'
>
> COPYRIGHT = "GPL (c) 2003-2004"
>
> usage = <<USAGE
>
> Search fasta file(s) tags using a regular expression (regex)
>
> Usage: fastasearch [-q query] filename(s)
>
> Example:
>
> ruby fastasearch -q '([Hh]uman|[Hh]omo sapiens)' nr.fa
>
> For more information see
>
> http://thebird.nl/bioinformatics/
>
> Pjotr Prins
> Wageningen University and Research Centre
> http://www.wur.nl/
> http://www.dpw.wageningen-ur.nl/nema/
>
> USAGE
>
> # --------------------------------------------------------------------
>
> srcpath=File.dirname($0)
> libpath=File.dirname(srcpath)+'/lib'
> $: << srcpath # ---- Add start path to search libraries
> $: << libpath
>
> require 'getoptlong'
> require 'bio'
>
> # ---- Parse command line
> opts = GetoptLong.new(
>   [ "--help", "-h", GetoptLong::NO_ARGUMENT ],
>   [ "--query", "-q", GetoptLong::REQUIRED_ARGUMENT ]
> )
>
> do_help = false
> query=nil
>
> opts.each do | opt, arg |
>   do_help |= (opt == '--help')
>   query = arg if (opt == '--query')
> end
>
> # ---- Print usage
> if (do_help || ARGV.size==0)
>   print usage
>   exit 1
> end
>
> if !query
>   print "Give query: "
>   query = $stdin.gets.chomp
> end
>
> ARGV.each do | fn |
>   $stderr.print "Loading #{fn}..."
>   f = Bio::FlatFile.auto(fn)
>   $stderr.print " detected: #{f.dbclass}\n"
>   f.each_entry do | e |
>     if e.definition =~ /#{query}/
>       print '>',e.definition,e.data
>     end
>   end
> end
>