'

When I read the files that generated the error with convert_trace to
attempt to transform to scf trace, they are parsed properly, meaning
there is an issue with the way the abif package works or the
Bio::FlatFile iteration procedure in BioRuby. Are there known bugs?
Some solution?

--
---------------
Sincerely
George
Skype: george_g2
Blog: http://biorelated.wordpress.com/
Twitter: http://twitter.com/#!/george_l

From ngoto at gen-info.osaka-u.ac.jp Sun Jul 8 06:25:47 2012
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Sun, 8 Jul 2012 19:25:47 +0900
Subject: [BioRuby] Reading ab1 files in bioruby
In-Reply-To:
References:
Message-ID: <201207081034.q68AYSMv020706@portal.open-bio.org>

Hi,

It seems that Bio::Abif.open opens the file in ASCII mode.
With Ruby 1.9 and/or on Windows, in ASCII mode, conversions
of encoding and/or line ending characters may break binary data.

A simple workaround is to open the file in binary mode.

  f = File.open(filename, 'rb')
  chromatogram_ff = Bio::Abif.open(f)
  chromatogram_ff.each do |ch|
    ch.to_seq
  end

With the latest git version, Bio::FlatFile opens files in
binary mode by default, and the problem should not occur.

Related topic:
http://lists.open-bio.org/pipermail/bioruby/2012-June/002342.html

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org

On Sat, 7 Jul 2012 06:35:59 +0300
George Githinji wrote:

> Dear list,
> Is there a Bioruby best practice way of reading an ab1 chromatogram file?
> I am processing a long list of ab1 chromatogram files like this
>
>   file= "filename"
>   chromatogram_ff = Bio::Abif.open(filename)
>   chromatogram_ff.each do |ch|
>     ch.to_seq
>   end
>
> however I came across this error with some files
>
> /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/db/sanger_chromatogram/abif.rb:107:in `get_entry_data': undefined method `match' for nil:NilClass (NoMethodError)
>   from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/db/sanger_chromatogram/abif.rb:83:in `block in get_directory_entries'
>   from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/db/sanger_chromatogram/abif.rb:77:in `times'
>   from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/db/sanger_chromatogram/abif.rb:77:in `get_directory_entries'
>   from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/db/sanger_chromatogram/abif.rb:42:in `initialize'
>   from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/io/flatfile/splitter.rb:55:in `new'
>   from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/io/flatfile/splitter.rb:55:in `get_parsed_entry'
>   from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/io/flatfile.rb:288:in `next_entry'
>   from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/io/flatfile.rb:335:in `each_entry'
>   from parse_ab1.rb:11:in `<main>'
>
>
> When I read the files that generated the error with convert_trace to
> attempt to transform to scf trace, they are parsed properly meaning
> there is an issue with the way the abif package works or the
> bio::flatfile iteration procedure in bioruby. Are there known bugs?
> some solution?
>
>
> --
> ---------------
> Sincerely
> George
> Skype: george_g2
> Blog: http://biorelated.wordpress.com/
> Twitter: http://twitter.com/#!/george_l
> _______________________________________________
> BioRuby Project - http://www.bioruby.org/
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby

From georgkam at gmail.com Sun Jul 8 09:11:31 2012
From: georgkam at gmail.com (George Githinji)
Date: Sun, 8 Jul 2012 16:11:31 +0300
Subject: [BioRuby] Reading ab1 files in bioruby
In-Reply-To: <201207081034.q68AYSMv020706@portal.open-bio.org>
References: <201207081034.q68AYSMv020706@portal.open-bio.org>
Message-ID:

Hi Naohisa,
Many thanks for the tip. Opening the file in binary mode works fine now.

George

On Sun, Jul 8, 2012 at 1:25 PM, Naohisa GOTO wrote:
> Hi,
>
> It seems that the Bio::Abif.open opens file with ASCII mode.
> With Ruby 1.9 and/or on Windows, with ASCII mode, conversions
> of encoding and/or line ending characters may break binary data.
>
> Simple workaround is to open the file with binary mode.
>
>   f = File.open(filename, 'rb')
>   chromatogram_ff = Bio::Abif.open(f)
>   chromatogram_ff.each do |ch|
>     ch.to_seq
>   end
>
> With the latest git version, Bio::FlatFile opens file with
> binary mode by default, and the problem would not occur.
>
> Related topic:
> http://lists.open-bio.org/pipermail/bioruby/2012-June/002342.html
>
> Naohisa Goto
> ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org

--
---------------
Sincerely
George
Skype: george_g2
Blog: http://biorelated.wordpress.com/
Twitter: http://twitter.com/#!/george_l

From cswh at umich.edu Mon Jul 9 23:21:40 2012
From: cswh at umich.edu (Clayton Wheeler)
Date: Mon, 9 Jul 2012 23:21:40 -0400
Subject: [BioRuby] bio-maf 0.2.0 (and Kyoto Cabinet gem for JRuby)
Message-ID:

Hi all,

I've released version 0.2.0 of bio-maf for BioRuby:

http://csw.github.com/bioruby-maf/blog/2012/07/09/bio-maf_0.2.0/

Notably, this release includes removal of gaps remaining after
filtering out sequences, and 'tiling' multiple alignment blocks
together along with reference sequence data.

Also, last week I released my Kyoto Cabinet support for JRuby as a
separate gem. It's now approaching parity with the standard Ruby
library for Kyoto Cabinet.
http://csw.github.com/bioruby-maf/blog/2012/07/02/kyoto_cabinet_support_for_jruby/

Clayton Wheeler
cswh at umich.edu

From lomereiter at gmail.com Tue Jul 10 06:26:35 2012
From: lomereiter at gmail.com (Artem Tarasov)
Date: Tue, 10 Jul 2012 14:26:35 +0400
Subject: [BioRuby] [GSoC] weekly report #8
Message-ID:

Hello all,

here's the link to the report:
http://lomereiter.wordpress.com/2012/07/10/gsoc-weekly-report-8/

Last week I implemented producing BAI files, and my tool sambamba-index
exploits parallelism and thus is faster than samtools on multicore machines.

Now I'm working on sorting. A basic version already works, but memory
consumption should be improved. In fact, at least for HDDs, the time of
indexing and sorting is bounded by I/O speed, not the number of CPUs. So
for sorting I need to tweak the sizes of read/write buffers in order to
get maximum performance.

By the end of this week, I'm also going to make a utility for merging
several sorted BAM files into one.

From marian.povolny at gmail.com Tue Jul 10 18:01:07 2012
From: marian.povolny at gmail.com (Marjan Povolni)
Date: Wed, 11 Jul 2012 00:01:07 +0200
Subject: [BioRuby] GSoC weekly status report No.7
Message-ID:

http://blog.mpthecoder.com/post/26930939671/gsoc-weekly-status-report-no-7

I was hoping to get more done over the weekend, but the internet connection
was down, so I had to take the weekend off :) Otherwise I'm working toward
the 0.2 version. The deadline is set for Saturday evening.

What will be in it keeps changing, but for now there are new toString() and
recursiveToString() methods in the Feature class, and append_to(...) methods
which accept an Appender object, for more efficient output. The utility for
correctly counting features is now notably faster, and gff3-ffetch has a
new option for passing FASTA data to output.

Currently in planning are: support for new types of records (pragmas and
comments), GDC support and a Ruby interface for the validation utility.
More could be added to this list, but I also have to make a plan for the
second half of the summer, and that will take some time too.

I was hoping to use the GDC which comes with Ubuntu 12.04, but I gave up on
that because of some confusing errors I was receiving in the D stdlib. I
will try to build GDC directly from its GitHub repository and get my
library to compile with it.

Making man pages for binaries in gems is also a problem which currently has
no elegant solution. I don't want to force my users to type 'gem man
command', so I'm planning to split the current repository into two:
gff3-pltools in D, and a second repository for the Ruby library.
gff3-pltools would then get a more traditional installation procedure
and proper man pages.

--
Marjan

From marian.povolny at gmail.com Mon Jul 16 13:16:12 2012
From: marian.povolny at gmail.com (Marjan Povolni)
Date: Mon, 16 Jul 2012 19:16:12 +0200
Subject: [BioRuby] GSoC weekly status report No.8
Message-ID:

http://blog.mpthecoder.com/post/27339349340/gsoc-weekly-status-report-no-8

Summary:

The 0.2 version of gff3-pltools has been released, together with a Ruby gem,
bio-gff3-pltools. Binary and source packages can be downloaded from the
following location:

http://mamarjan.github.com/gff3-pltools/

On Wednesday I'll be traveling to Lodi for the EU-codefest, where I'll be
presenting the project and the current GFF3 parser and tools performance.

For the next release I would like to add parallelism to the parser. I'm
also thinking about adding a new option to gff3-ffetch, which would let the
user specify which fields and attributes to output in tab-separated columns.
Best regards,
Marjan

From cjfields at illinois.edu Mon Jul 16 13:20:06 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 16 Jul 2012 17:20:06 +0000
Subject: [BioRuby] [GSoC] GSoC weekly status report No.8
In-Reply-To:
References:
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF2B63D4B5@CHIMBX5.ad.uillinois.edu>

I'll try to be on IRC (#bioruby and #obf-soc) those days, I may have a few questions.

chris

On Jul 16, 2012, at 12:16 PM, Marjan Povolni wrote:

> http://blog.mpthecoder.com/post/27339349340/gsoc-weekly-status-report-no-8
>
> Summary:
>
> The 0.2 version of gff3-pltools has been released, together with a Ruby gem
> bio-gff3-pltools. Binary and source packages can be downloaded from the
> following location:
>
> http://mamarjan.github.com/gff3-pltools/
>
> On Wednesday I'll be traveling to Lodi for the EU-codefest, there I'll be
> presenting about the project and current GFF3 parser and tools performance.
>
> For the next release I would like to add parallelism to the parser. I'm
> also thinking about adding a new option to gff3-ffetch, which would let the
> user specify which fields and attributes to output in tab-separated columns.
>
> Best regards,
> Marjan
>
> _______________________________________________
> GSoC mailing list
> GSoC at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/gsoc

From pjotr.public14 at thebird.nl Mon Jul 16 13:29:06 2012
From: pjotr.public14 at thebird.nl (Pjotr Prins)
Date: Mon, 16 Jul 2012 19:29:06 +0200
Subject: [BioRuby] [GSoC] GSoC weekly status report No.8
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF2B63D4B5@CHIMBX5.ad.uillinois.edu>
References: <118F034CF4C3EF48A96F86CE585B94BF2B63D4B5@CHIMBX5.ad.uillinois.edu>
Message-ID: <20120716172906.GA20140@thebird.nl>

On Mon, Jul 16, 2012 at 05:20:06PM +0000, Fields, Christopher J wrote:
> I'll try to be on IRC (#bioruby and #obf-soc) those days, I may have a few questions.

Cool :) We will also join gbrowse IRC.
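
The gff3-ffetch idea mentioned in status report No.8 above (letting the user
choose which fields and attributes to print as tab-separated columns) can be
sketched in a few lines of plain Ruby. This is only an illustration, assuming
the standard nine-column GFF3 layout; it is not the gff3-pltools
implementation (which is written in D), and the method name
select_gff3_columns is made up for this example. Escaped characters in
attribute values are not decoded here.

```ruby
# Illustrative only: plain Ruby, not the gff3-ffetch implementation.
# The nine column names follow the GFF3 column order.
GFF3_COLUMNS = %w[seqid source type start end score strand phase attributes].freeze

# Returns the requested fields/attributes of one GFF3 feature line joined by
# tabs, or nil for lines that are not nine-column feature lines (comments,
# pragmas, blank lines). Names that are neither a column nor an attribute
# present on the line are printed as ".".
def select_gff3_columns(line, wanted)
  values = line.chomp.split("\t", -1)
  return nil unless values.size == GFF3_COLUMNS.size

  record = GFF3_COLUMNS.zip(values).to_h
  # Attributes look like "ID=gene1;Name=abc"; ignore entries without "=".
  attributes = record['attributes'].split(';')
                                   .select { |pair| pair.include?('=') }
                                   .map { |pair| pair.split('=', 2) }
                                   .to_h

  wanted.map { |name| record.fetch(name) { attributes.fetch(name, '.') } }.join("\t")
end

line = "chr1\texample\tgene\t100\t200\t.\t+\t.\tID=gene1;Name=abc"
puts select_gff3_columns(line, %w[seqid start end ID Name])
```

Column names are looked up first and attribute names second, so a single list
such as seqid,start,end,ID is enough to describe the output.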
From lomereiter at gmail.com Tue Jul 17 02:47:49 2012
From: lomereiter at gmail.com (Artem Tarasov)
Date: Tue, 17 Jul 2012 10:47:49 +0400
Subject: [BioRuby] [GSoC] weekly report #9
Message-ID:

Hello everybody,

My progress report for the past week is available at
http://lomereiter.wordpress.com/2012/07/17/gsoc-weekly-report-9/

I've implemented sorting and merging, both parallelized and quite fast.
Also my merging tool improves on ideas taken from Picard source code and
merges SAM headers as well as sorted alignment records.

For those who use Debian, packages for amd64 and i386 are now available:

https://github.com/lomereiter/sambamba/downloads

At the moment, alternatives to the following samtools commands are
developed: view, index, sort, merge, flagstat. The current limitation is
that most tools don't work with stdin/stdout and work with BAM files only
(does anybody still use SAM?). Nevertheless, they wisely use multi-core
processors and usually give a better speed.

From pjotr.public14 at thebird.nl Tue Jul 17 03:59:38 2012
From: pjotr.public14 at thebird.nl (Pjotr Prins)
Date: Tue, 17 Jul 2012 09:59:38 +0200
Subject: [BioRuby] [GSoC] weekly report #9
In-Reply-To:
References:
Message-ID: <20120717075938.GA30198@thebird.nl>

Are you going to support STDIN/STDOUT? Another killer feature!

On Tue, Jul 17, 2012 at 10:47:49AM +0400, Artem Tarasov wrote:
> Hello everybody,
>
> My progress report for the past week is available at
> http://lomereiter.wordpress.com/2012/07/17/gsoc-weekly-report-9/
>
> I've implemented sorting and merging, both parallelized and quite fast.
> Also my merging tool improves on ideas taken from Picard source code and
> merges SAM headers as well as sorted alignment records.
>
> For those who use Debian, packages for amd64 and i386 are now available:
>
> https://github.com/lomereiter/sambamba/downloads
>
> At the moment, alternatives to the following samtools commands are
> developed: view, index, sort, merge, flagstat.
> The current limitation is
> that most tools don't work with stdin/stdout and work with BAM files only
> (does anybody still use SAM?). Nevertheless, they wisely use multi-core
> processors and usually give a better speed.

From lomereiter at gmail.com Tue Jul 17 04:38:22 2012
From: lomereiter at gmail.com (Artem Tarasov)
Date: Tue, 17 Jul 2012 12:38:22 +0400
Subject: [BioRuby] [GSoC] weekly report #9
In-Reply-To: <20120717075938.GA30198@thebird.nl>
References: <20120717075938.GA30198@thebird.nl>
Message-ID:

Firstly, I wouldn't call that a killer feature. On Un*x you should be able
to use /dev/stdin and /dev/stdout (or a named pipe) as input/output
filenames; that's the way people pipe Picard tools. Many Un*x tools
(including samtools) facilitate that by using a dash as a shortcut for
stdin/stdout, but this is not a requirement.

Clearly, STDIN can't be used for random access, and some parts of my code
currently rely on the assumption that the input stream is seekable. I should
make that optional, and then named pipes can be used as input.

On Tue, Jul 17, 2012 at 11:59 AM, Pjotr Prins wrote:

> Are you going to support STDIN/STDOUT? Another killer feature!

From pjotr.public14 at thebird.nl Tue Jul 17 06:56:59 2012
From: pjotr.public14 at thebird.nl (Pjotr Prins)
Date: Tue, 17 Jul 2012 12:56:59 +0200
Subject: [BioRuby] the bio-table tool
Message-ID: <20120717105659.GA1871@thebird.nl>

I just want to share the release of one of the most useful tools I
have come up with in a while :).

Most of us have to deal with tabular data, delivered through
spreadsheets, SQL output etc. I found I was repeating myself too
often, writing one-off scripts. So I have come up with a command-line
tool which allows you to transform and edit(!) tables on the fly,
using one-liners.

Just now I wanted to find overlapping marker/geneset combinations
in two files. The command was

  bio-table --overlap 0,2 NA.SUMMARY.RESULTS.REPORT.1.txt gsea_report_for_1_1342469955711.csv > overlap.1.tab

where the columns to compare were 0 and 2. You can diff on columns:

  bio-table --diff 0,3 table2.csv table1.csv

and merge tables (side by side).
You can filter on values

  bio-table table1.csv --num-filter "values[3]-values[6] >= 0.05"

and regex

  bio-table table1.csv --filter "rowname =~ /BGT/ and field[1] =~ /BGT/"

and rewrite values

  bio-table table1.csv --rewrite 'rowname.upcase!; field[1]=nil if field[2].to_f<0.25'

See https://github.com/pjotrp/bioruby-table for more examples.

Pj.

From georgkam at gmail.com Tue Jul 17 07:09:44 2012
From: georgkam at gmail.com (George Githinji)
Date: Tue, 17 Jul 2012 14:09:44 +0300
Subject: [BioRuby] the bio-table tool
In-Reply-To: <20120717105659.GA1871@thebird.nl>
References: <20120717105659.GA1871@thebird.nl>
Message-ID:

Hi,
I have been writing awk scripts to deal with CSV data when I need to.
Having something in Ruby is totally cool! How good is it at handling
tables with hundreds of thousands of lines?
Thanks, Pj.

On Tue, Jul 17, 2012 at 1:56 PM, Pjotr Prins wrote:
> I just want to share the release of one of the most useful tools I
> have come up with in a while :).
>
> Most of us have to deal with tabular data, delivered through
> spreadsheets, SQL output etc. I found I was repeating myself too
> often, writing one-off scripts. So I have come up with a command-line
> tool which allows you to transform and edit(!) tables on the fly,
> using one-liners.
>
> Just now I was wanted to find overlapping marker/geneset combinations
> in two files. The command was
>
>   bio-table --overlap 0,2 NA.SUMMARY.RESULTS.REPORT.1.txt gsea_report_for_1_1342469955711.csv > overlap.1.tab
>
> where the columns to compare were 0 and 2. You can diff on columns:
>
>   bio-table --diff 0,3 table2.csv table1.csv
>
> and merge tables (side by side).
> You can filter on values
>
>   bio-table table1.csv --num-filter "values[3]-values[6] >= 0.05"
>
> and regex
>
>   bio-table table1.csv --filter "rowname =~ /BGT/ and field[1] =~ /BGT/"
>
> and rewrite values
>
>   bio-table table1.csv --rewrite 'rowname.upcase!; field[1]=nil if field[2].to_f<0.25'
>
> See https://github.com/pjotrp/bioruby-table for more examples.
>
> Pj.

--
---------------
Sincerely
George
Skype: george_g2
Blog: http://biorelated.wordpress.com/
Twitter: http://twitter.com/#!/george_l

From pjotr.public14 at thebird.nl Tue Jul 17 11:03:01 2012
From: pjotr.public14 at thebird.nl (Pjotr Prins)
Date: Tue, 17 Jul 2012 17:03:01 +0200
Subject: [BioRuby] the bio-table tool
In-Reply-To:
References: <20120717105659.GA1871@thebird.nl>
Message-ID: <20120717150301.GE3531@thebird.nl>

In principle it is streamed.

On Tue, Jul 17, 2012 at 02:09:44PM +0300, George Githinji wrote:
> Hi,
> I have been writing awk scripts to deal with CSV data when i need to.
> Having something in Ruby is totally cool! How good is it at handling
> tables with hundreds of thousands of lines?
> thanks PJ.

From cswh at umich.edu Wed Jul 18 15:44:58 2012
From: cswh at umich.edu (Clayton Wheeler)
Date: Wed, 18 Jul 2012 15:44:58 -0400
Subject: [BioRuby] bio-maf release 0.3.0
Message-ID:

Hi all,

I've released bio-maf version 0.3.0:

http://csw.github.com/bioruby-maf/blog/2012/07/18/bio-maf_0.3.0/

This version adds features including joining adjacent MAF blocks when
sequences that caused them to be split have been filtered out; returning
bio-alignment objects; and truncating (or "slicing") alignment blocks to
only cover a given genomic interval.

For developers, this also adds a higher-level Bio::MAF::Access API for
working with directories containing indexed MAF files (or, alternatively,
single files), providing all relevant functionality for indexed access in
a simpler way than using the KyotoIndex and Parser classes directly.
The maf_tile(1) utility has been updated to use this functionality; a
directory of indexed MAF files can now be specified, and the correct file
will now be parsed as appropriate.

Usage of Enumerators and blocks has also been substantially improved; all
access methods for multiple blocks such as Access#find, Access#slice,
Parser#each_block now accept a block parameter, which will be called for
each block in turn. If no block parameter is given, they will all return an
Enumerator for the resulting blocks. This is how most of the Ruby standard
library, e.g. Array#each, works.

--
Clayton Wheeler
cswh at umich.edu

From pjotr.public14 at thebird.nl Fri Jul 20 10:29:50 2012
From: pjotr.public14 at thebird.nl (Pjotr Prins)
Date: Fri, 20 Jul 2012 16:29:50 +0200
Subject: [BioRuby] biogems.info now lists other bioinformatics software
Message-ID: <20120720142950.GB21452@thebird.nl>

We have expanded http://biogems.info/ at the USA and EU-codefests to
show useful and tested binary (.deb) installed packages - basically
by querying an installed CloudBiolinux VM, and fetching info from
Debian Bio Med. See

http://www.biogems.info/biolinux.html

Pj.

From lomereiter at gmail.com Tue Jul 24 10:46:09 2012
From: lomereiter at gmail.com (Artem Tarasov)
Date: Tue, 24 Jul 2012 18:46:09 +0400
Subject: [BioRuby] [GSoC] weekly report #10
Message-ID:

Hi all,

During the past week I've added filtering functionality to the sambamba-view
utility. Now the tool parses expressions like

  "mapping_quality >= 50 and [MQ] >=50 and not ([RG] =~ /abcd/i or [RG] == null)",

superseding the functionality given by samtools flags -f, -F, -q, -l, -r.

Also I'm now introducing wget-like text progress bars to my tools; as of now
this is present in sambamba-index only.
More on that is at
http://lomereiter.wordpress.com/2012/07/24/gsoc-weekly-report-10/

From carlcrott at gmail.com Sat Jul 28 21:43:16 2012
From: carlcrott at gmail.com (Carl Crott)
Date: Sat, 28 Jul 2012 21:43:16 -0400
Subject: [BioRuby] BioRuby Digest, Vol 82, Issue 9
In-Reply-To:
References:
Message-ID:

I'm wondering if anyone has established a list of the requirements for
additional bio-gems...

I was hoping to do some development work on a gem to integrate with KEGG /
Genomeweb but it seems there is no central repo for the features which we'd
like to see. This might be something to keep track of in the github
feature area... I can't help but think that feature requests for gems will
be lost in a mailing list like this over time.

Specifically Mr. Goto and Mr. Barton, I've talked to you both about
programming some things ... however maybe something I should work on
instead is pulling together a feature list of all the features for each
bio-gem. Without a doubt they'll change over time .. but I think something
that's slightly more static and more searchable ( Gmail's partial-string
search is surprisingly bad ) would be a good idea.

If you guys like this idea I'd be happy to work on it ... or any API
related bio-gem which happens to have a list of required features.

Thanks for all your hard work!!

-Carl

From pjotr.public14 at thebird.nl Sun Jul 29 03:33:17 2012
From: pjotr.public14 at thebird.nl (Pjotr Prins)
Date: Sun, 29 Jul 2012 09:33:17 +0200
Subject: [BioRuby] BioRuby Digest, Vol 82, Issue 9
In-Reply-To:
References:
Message-ID: <20120729073317.GA31006@thebird.nl>

Hi Carl,

On Sat, Jul 28, 2012 at 09:43:16PM -0400, Carl Crott wrote:
> I'm wondering if anyone has established a list of the requirements for
> additional bio-gems...

Not systematically. github issues, at this point, appear to list new
ideas.

> I was hoping to do some development work on a gem to integrate with KEGG /
> Genomeweb but it seems there is no central repo for the features which we'd
> like to see. This might be something to keep track of in the github
> feature area... I cant help but think that feature requests for gems will
> be lost in a mailing list like this over time.

For sure.

> Specifically MR. Goto and Mr. Barton I've talked to you both about
> programming some things ... however maybe something I should work on
> instead is pulling together a feature list of all the features for each
> bio-gem. Without a doubt they'll change over time .. but I think something
> thats slightly more static and more searchable ( Gmail's partial-string
> search is surprisingly bad ) would be a good idea.
>
> If you guys like this idea I'd be happy to work on it ... or any API
> related bio-gem which happens to have a list of required features.

That would be very interesting :)

Have you seen?

https://www.relishapp.com/cucumber/cucumber/docs/drb-server-integration

Basically it presents features in a nice way. I like cucumber
features, and together with issues we could use that to track feature
requests and new ideas. Like I did with

https://github.com/pjotrp/bioruby-alignment/issues/2

You can see I gave it the label 'Newbie', it could have had a label
'Feature'. There is a link to the feature that describes it:

https://github.com/pjotrp/bioruby-alignment/blob/master/features/edit/gblocks.feature

My proposal would be to track github biogem repositories for issues
and features.
For those features that fit nowhere (such as your
KEGG/Genomeweb gem) we could create a new project

http://github.com/bioruby/new_features

To tie this all together we need some scripting for a web page, which
could be listed on http://biogems.info/features.html.

I would certainly like that!

Also I am interested in your RSS scraper for peer reviewed journals.
It would be nice to have news items on
http://biogems.info/journals.html which would list papers that somehow
seem relevant to FOSS.

Pj.

From ngoto at gen-info.osaka-u.ac.jp Mon Jul 30 06:37:57 2012
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Mon, 30 Jul 2012 19:37:57 +0900
Subject: [BioRuby] BioRuby Digest, Vol 82, Issue 9
In-Reply-To: <20120729073317.GA31006@thebird.nl>
References: <20120729073317.GA31006@thebird.nl>
Message-ID: <201207301044.q6UAiUDH004113@portal.open-bio.org>

Hi,

On Sun, 29 Jul 2012 09:33:17 +0200
Pjotr Prins wrote:

> Hi Carl,
>
> On Sat, Jul 28, 2012 at 09:43:16PM -0400, Carl Crott wrote:
> > I'm wondering if anyone has established a list of the requirements for
> > additional bio-gems...
>
> Not systematically. github issues, at this point, appear to list new
> ideas.

There are "ideas" pages in the bioruby.org Wiki page
http://bioruby.open-bio.org/wiki/
but they are slightly outdated: some ideas have already been
implemented as Biogems.

http://bioruby.open-bio.org/wiki/Bioruby_site_Re-styling
http://bioruby.open-bio.org/wiki/Workflows
http://bioruby.open-bio.org/wiki/Next_Generation_Sequencing
http://bioruby.open-bio.org/wiki/Contributing

Please feel free to change existing pages and/or add new pages.

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org

From p.j.a.cock at googlemail.com Tue Jul 31 06:37:35 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 31 Jul 2012 11:37:35 +0100
Subject: [BioRuby] Travis Continuous Integration testing & pull requests
Message-ID:

Hi all,

I'm cross posting as this is an announcement. Please keep any follow
up discussion to the relevant project specific mailing list, or if
general open-bio-l please.

Those following the OBF blog or the OBF or Bio* Twitter accounts will
have already seen this, which I posted yesterday:
http://news.open-bio.org/news/2012/07/travis-ci-for-testing/

In summary, since earlier this year BioRuby and then Biopython and
BioPerl have been using Travis-CI.org (a hosted continuous integration
service for the open source community) to run their unit tests
automatically whenever their GitHub repositories are updated.
In addition we now have TravisCI automatically running our tests on any new GitHub pull requests - supported by an OBF donation to Travis-CI, see: http://about.travis-ci.org/blog/announcing-pull-request-support/ Currently BioJava only uses GitHub as an SVN mirror - but this should still let you start using TravisCI for automated testing: http://about.travis-ci.org/docs/user/languages/java/ For EMBOSS, this is another incentive to convert from CVS to github - TravisCI recently announced support for C/C++ projects: http://about.travis-ci.org/blog/support_for_go_c_and_cpp/ http://about.travis-ci.org/docs/user/languages/c/ Potentially there are other OBF projects where this would be useful too. Regards, Peter From lomereiter at gmail.com Tue Jul 3 16:40:40 2012 From: lomereiter at gmail.com (Artem Tarasov) Date: Tue, 3 Jul 2012 20:40:40 +0400 Subject: [BioRuby] [GSoC] weekly report #7 Message-ID: Hi all, I wrote a blog post about the previous week: http://lomereiter.wordpress.com/2012/07/03/gsoc-weekly-report-7/ Highlights: First version of bioruby-sambamba gem is released on rubygems.org, but the installation process can be made much more convenient. Producing binaries for all common platforms and distributing them with platform-specific gems seems to be the best way to go. Also, I've done a lot of refactoring (however, a bit more is needed), and significantly improved speed of validation and SAM parsing. In July, I'm planning to implement indexing, sorting and merging BAM files, and also add filtering functionality to Ruby bindings. For the latter, I'm going to introduce a tiny query language so that command-line tools will be able to parse it, and bindings will have some filter classes with a method to generate a query string like. 
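[Editor's sketch] To make the "filter classes that generate a query string" idea concrete, here is a minimal Ruby illustration. It is not the sambamba bindings: the class names (Comparison, AndFilter), the to_query method and the query syntax itself are all assumptions made up for this example.

```ruby
# Hypothetical filter classes; none of these names come from sambamba.
class Comparison
  def initialize(field, op, value)
    @field, @op, @value = field, op, value
  end

  # Render a single condition, e.g. "mapping_quality >= 50".
  def to_query
    "#{@field} #{@op} #{@value}"
  end
end

class AndFilter
  def initialize(*filters)
    @filters = filters
  end

  # Join the sub-filters with "and", parenthesising each one.
  def to_query
    @filters.map { |f| "(#{f.to_query})" }.join(' and ')
  end
end

filter = AndFilter.new(
  Comparison.new('mapping_quality', '>=', 50),
  Comparison.new('[MQ]', '>=', 50)
)
puts filter.to_query
# prints: (mapping_quality >= 50) and ([MQ] >= 50)
```

The point of such a design would be that the command-line tools and the bindings share one parser: the Ruby side only has to build a string.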
From marian.povolny at gmail.com Wed Jul 4 18:56:54 2012 From: marian.povolny at gmail.com (Marjan Povolni) Date: Wed, 4 Jul 2012 20:56:54 +0200 Subject: [BioRuby] GSoC weekly status report No.6 and v0.1.0 Message-ID: http://blog.mpthecoder.com/post/26505431193/gsoc-weekly-status-report-no-6-and-v0-1-0 This post is a little bit late, but I wanted it to be the announcement of the first release, v0.1.0 of gff3-pltools! I've created a minimal website for this project, which can be found here: http://mamarjan.github.com/gff3-pltools/ There are links to binary gems for 32- and 64-bit Linux, a source package for other platforms, binary packages with the D tools only, and a link to the API docs for the Ruby library. Please read the blog post for more information, and the README for even more information. Best regards, Marjan From georgkam at gmail.com Sat Jul 7 03:35:59 2012 From: georgkam at gmail.com (George Githinji) Date: Sat, 7 Jul 2012 06:35:59 +0300 Subject: [BioRuby] Reading ab1 files in bioruby Message-ID: Dear list, Is there a BioRuby best practice way of reading an ab1 chromatogram file? 
I am processing a long list of ab1 chromatogram files like this file= "filename" chromatogram_ff = Bio::Abif.open(filename) chromatogram_ff.each do |ch| ch.to_seq end however I came across this error with some files /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/db/sanger_chromatogram/abif.rb:107:in `get_entry_data': undefined method `match' for nil:NilClass (NoMethodError) from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/db/sanger_chromatogram/abif.rb:83:in `block in get_directory_entries' from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/db/sanger_chromatogram/abif.rb:77:in `times' from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/db/sanger_chromatogram/abif.rb:77:in `get_directory_entries' from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/db/sanger_chromatogram/abif.rb:42:in `initialize' from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/io/flatfile/splitter.rb:55:in `new' from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/io/flatfile/splitter.rb:55:in `get_parsed_entry' from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/io/flatfile.rb:288:in `next_entry' from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/io/flatfile.rb:335:in `each_entry' from parse_ab1.rb:11:in `

' When I read the files that generated the error with convert_trace to attempt to transform them to SCF traces, they are parsed properly, meaning there is an issue with the way the abif package works or with the Bio::FlatFile iteration procedure in BioRuby. Are there known bugs? Some solution? -- --------------- Sincerely George Skype: george_g2 Blog: http://biorelated.wordpress.com/ Twitter: http://twitter.com/#!/george_l From ngoto at gen-info.osaka-u.ac.jp Sun Jul 8 10:25:47 2012 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Sun, 8 Jul 2012 19:25:47 +0900 Subject: [BioRuby] Reading ab1 files in bioruby In-Reply-To: References: Message-ID: <201207081034.q68AYSMv020706@portal.open-bio.org> Hi, It seems that Bio::Abif.open opens the file in ASCII (text) mode. With Ruby 1.9 and/or on Windows, in ASCII mode, conversions of encoding and/or line ending characters may break binary data. A simple workaround is to open the file in binary mode.

    f = File.open(filename, 'rb')
    chromatogram_ff = Bio::Abif.open(f)
    chromatogram_ff.each do |ch|
      ch.to_seq
    end

With the latest git version, Bio::FlatFile opens files in binary mode by default, and the problem would not occur. Related topic: http://lists.open-bio.org/pipermail/bioruby/2012-June/002342.html Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org On Sat, 7 Jul 2012 06:35:59 +0300 George Githinji wrote: > Dear list, > Is there a Bioruby best practice way of reading an ab1 chromatogram file? 
> I am processing a long list of ab1 chromatogram files like this > > file= "filename" > chromatogram_ff = Bio::Abif.open(filename) > chromatogram_ff.each do |ch| > ch.to_seq > end > > however I came across this error with some files > > /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/db/sanger_chromatogram/abif.rb:107:in > `get_entry_data': undefined method `match' for nil:NilClass > (NoMethodError) > from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/db/sanger_chromatogram/abif.rb:83:in > `block in get_directory_entries' > from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/db/sanger_chromatogram/abif.rb:77:in > `times' > from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/db/sanger_chromatogram/abif.rb:77:in > `get_directory_entries' > from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/db/sanger_chromatogram/abif.rb:42:in > `initialize' > from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/io/flatfile/splitter.rb:55:in > `new' > from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/io/flatfile/splitter.rb:55:in > `get_parsed_entry' > from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/io/flatfile.rb:288:in > `next_entry' > from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/io/flatfile.rb:335:in > `each_entry' > from parse_ab1.rb:11:in `

' > > > When I read the files that generated the error with convert_trace to > attempt to transform to scf trace, they are parsed properly meaning > their is an issue with the way the abif package works or the > bio::flatfile iteration procedure in bioruby. Are there known bugs? > some solution? > > > -- > --------------- > Sincerely > George > Skype: george_g2 > Blog: http://biorelated.wordpress.com/ > Twitter: http://twitter.com/#!/george_l > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From georgkam at gmail.com Sun Jul 8 13:11:31 2012 From: georgkam at gmail.com (George Githinji) Date: Sun, 8 Jul 2012 16:11:31 +0300 Subject: [BioRuby] Reading ab1 files in bioruby In-Reply-To: <201207081034.q68AYSMv020706@portal.open-bio.org> References: <201207081034.q68AYSMv020706@portal.open-bio.org> Message-ID: Hi Naohisa, Many thanks for the tip. Opening the file in Binary mode works fine now. George On Sun, Jul 8, 2012 at 1:25 PM, Naohisa GOTO wrote: > Hi, > > It seems that the Bio::Abif.open opens file with ASCII mode. > With Ruby 1.9 and/or on Windows, with ASCII mode, conversions > of encoding and/or line ending characters may break binary data. > > Simple workaround is to open the file with binary mode. > > f = File.open(filename, 'rb') > chromatogram_ff = Bio::Abif.open(f) > chromatogram_ff.each do |ch| > ch.to_seq > end > > With the latest git version, Bio::FlatFile opens file with > binary modde by default, and the problem would not occur. > > Related topic: > http://lists.open-bio.org/pipermail/bioruby/2012-June/002342.html > > Naohisa Goto > ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org > > On Sat, 7 Jul 2012 06:35:59 +0300 > George Githinji wrote: > >> Dear list, >> Is there a Bioruby best practice way of reading an ab1 chromatogram file? 
>> I am processing a long list of ab1 chromatogram files like this >> >> file= "filename" >> chromatogram_ff = Bio::Abif.open(filename) >> chromatogram_ff.each do |ch| >> ch.to_seq >> end >> >> however I came across this error with some files >> >> /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/db/sanger_chromatogram/abif.rb:107:in >> `get_entry_data': undefined method `match' for nil:NilClass >> (NoMethodError) >> from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/db/sanger_chromatogram/abif.rb:83:in >> `block in get_directory_entries' >> from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/db/sanger_chromatogram/abif.rb:77:in >> `times' >> from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/db/sanger_chromatogram/abif.rb:77:in >> `get_directory_entries' >> from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/db/sanger_chromatogram/abif.rb:42:in >> `initialize' >> from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/io/flatfile/splitter.rb:55:in >> `new' >> from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/io/flatfile/splitter.rb:55:in >> `get_parsed_entry' >> from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/io/flatfile.rb:288:in >> `next_entry' >> from /Users/george/.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/bio-1.4.2/lib/bio/io/flatfile.rb:335:in >> `each_entry' >> from parse_ab1.rb:11:in `

' >> >> >> When I read the files that generated the error with convert_trace to >> attempt to transform to scf trace, they are parsed properly meaning >> their is an issue with the way the abif package works or the >> bio::flatfile iteration procedure in bioruby. Are there known bugs? >> some solution? >> >> >> -- >> --------------- >> Sincerely >> George >> Skype: george_g2 >> Blog: http://biorelated.wordpress.com/ >> Twitter: http://twitter.com/#!/george_l >> _______________________________________________ >> BioRuby Project - http://www.bioruby.org/ >> BioRuby mailing list >> BioRuby at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioruby > > > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby -- --------------- Sincerely George Skype: george_g2 Blog: http://biorelated.wordpress.com/ Twitter: http://twitter.com/#!/george_l From cswh at umich.edu Tue Jul 10 03:21:40 2012 From: cswh at umich.edu (Clayton Wheeler) Date: Mon, 9 Jul 2012 23:21:40 -0400 Subject: [BioRuby] bio-maf 0.2.0 (and Kyoto Cabinet gem for JRuby) Message-ID: Hi all, I've released version 0.2.0 of bio-maf for BioRuby: http://csw.github.com/bioruby-maf/blog/2012/07/09/bio-maf_0.2.0/ Notably, this release includes removal of gaps remaining after filtering out sequences, and 'tiling' multiple alignment blocks together along with reference sequence data. Also, last week I released my Kyoto Cabinet support for JRuby as a separate gem. It's now approaching parity with the standard Ruby library for Kyoto Cabinet. 
http://csw.github.com/bioruby-maf/blog/2012/07/02/kyoto_cabinet_support_for_jruby/ Clayton Wheeler cswh at umich.edu From lomereiter at gmail.com Tue Jul 10 10:26:35 2012 From: lomereiter at gmail.com (Artem Tarasov) Date: Tue, 10 Jul 2012 14:26:35 +0400 Subject: [BioRuby] [GSoC] weekly report #8 Message-ID: Hello all, here's the link to the report: http://lomereiter.wordpress.com/2012/07/10/gsoc-weekly-report-8/ Last week I implemented producing BAI files, and my tool sambamba-index exploits parallelism and thus is faster than samtools on multicore machines. Now I'm working on sorting; a basic version already works but memory consumption should be improved. In fact, at least for HDDs, the time for indexing and sorting is bounded by I/O speed, not the number of CPUs. So for sorting I need to tweak the sizes of read/write buffers in order to get maximum performance. By the end of this week, I'm also going to make a utility for merging several sorted BAM files into one. From marian.povolny at gmail.com Tue Jul 10 22:01:07 2012 From: marian.povolny at gmail.com (Marjan Povolni) Date: Wed, 11 Jul 2012 00:01:07 +0200 Subject: [BioRuby] GSoC weekly status report No.7 Message-ID: http://blog.mpthecoder.com/post/26930939671/gsoc-weekly-status-report-no-7 I was hoping to get more done over the weekend, but the internet connection was down, so I had to take the weekend off :) Otherwise I'm working toward the 0.2 version. The deadline is set for Saturday evening. What will be in it keeps changing, but for now there are new toString() and recursiveToString() methods in the Feature class, and append_to(...) methods which accept an Appender object, for more efficient output. The utility for correctly counting features is now notably faster, and gff3-ffetch has a new option for passing FASTA data to output. Currently in planning are: support for new types of records (pragmas and comments), GDC support and a Ruby interface for the validation utility. 
More could be added to this list, but I also have to make a plan for the second half of the summer, and that will take some time too. I was hoping to use the GDC which comes with Ubuntu 12.04, but I gave up on that because of some confusing errors I was receiving in the D stdlib. I will try to build the GDC directly from its GitHub repository and get my library to compile with it. Making man pages for binaries in gems is also a problem which currently has no elegant solution. I don't want to force my users to type 'gem man command', so I'm planning to split the current repository into two: gff3-pltools in D and then the second repository for the Ruby library. The gff3-pltools would then receive a more traditional installation procedure and receive proper man pages. -- Marjan From marian.povolny at gmail.com Mon Jul 16 17:16:12 2012 From: marian.povolny at gmail.com (Marjan Povolni) Date: Mon, 16 Jul 2012 19:16:12 +0200 Subject: [BioRuby] GSoC weekly status report No.8 Message-ID: http://blog.mpthecoder.com/post/27339349340/gsoc-weekly-status-report-no-8 Summary: The 0.2 version of gff3-pltools has been released, together with a Ruby gem bio-gff3-pltools. Binary and source packages can be downloaded from the following location: http://mamarjan.github.com/gff3-pltools/ On Wednesday I'll be traveling to Lodi for the EU-codefest, there I'll be presenting about the project and current GFF3 parser and tools performance. For the next release I would like to add parallelism to the parser. I'm also thinking about adding a new option to gff3-ffetch, which would let the user specify which fields and attributes to output in tab-separated columns. 
Best regards, Marjan From cjfields at illinois.edu Mon Jul 16 17:20:06 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 16 Jul 2012 17:20:06 +0000 Subject: [BioRuby] [GSoC] GSoC weekly status report No.8 In-Reply-To: References: Message-ID: <118F034CF4C3EF48A96F86CE585B94BF2B63D4B5@CHIMBX5.ad.uillinois.edu> I'll try to be on IRC (#bioruby and #obf-soc) those days, I may have a few questions. chris On Jul 16, 2012, at 12:16 PM, Marjan Povolni wrote: > http://blog.mpthecoder.com/post/27339349340/gsoc-weekly-status-report-no-8 > > Summary: > > The 0.2 version of gff3-pltools has been released, together with a Ruby gem > bio-gff3-pltools. Binary and source packages can be downloaded from the > following location: > > http://mamarjan.github.com/gff3-pltools/ > > On Wednesday I'll be traveling to Lodi for the EU-codefest, there I'll be > presenting about the project and current GFF3 parser and tools performance. > > For the next release I would like to add parallelism to the parser. I'm > also thinking about adding a new option to gff3-ffetch, which would let the > user specify which fields and attributes to output in tab-separated columns. > > Best regards, > Marjan > > _______________________________________________ > GSoC mailing list > GSoC at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/gsoc From pjotr.public14 at thebird.nl Mon Jul 16 17:29:06 2012 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Mon, 16 Jul 2012 19:29:06 +0200 Subject: [BioRuby] [GSoC] GSoC weekly status report No.8 In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF2B63D4B5@CHIMBX5.ad.uillinois.edu> References: <118F034CF4C3EF48A96F86CE585B94BF2B63D4B5@CHIMBX5.ad.uillinois.edu> Message-ID: <20120716172906.GA20140@thebird.nl> On Mon, Jul 16, 2012 at 05:20:06PM +0000, Fields, Christopher J wrote: > I'll try to be on IRC (#bioruby and #obf-soc) those days, I may have a few questions. Cool :) We will also join gbrowse IRC. 
From lomereiter at gmail.com Tue Jul 17 06:47:49 2012 From: lomereiter at gmail.com (Artem Tarasov) Date: Tue, 17 Jul 2012 10:47:49 +0400 Subject: [BioRuby] [GSoC] weekly report #9 Message-ID: Hello everybody, My progress report for the past week is available at http://lomereiter.wordpress.com/2012/07/17/gsoc-weekly-report-9/ I've implemented sorting and merging, both parallelized and quite fast. Also my merging tool improves on ideas taken from Picard source code and merges SAM headers as well as sorted alignment records. For those who use Debian, packages for amd64 and i386 are now available: https://github.com/lomereiter/sambamba/downloads At the moment, alternatives to the following samtools commands are developed: view, index, sort, merge, flagstat. The current limitation is that most tools don't work with stdin/stdout and work with BAM files only (does anybody still use SAM?). Nevertheless, they wisely use multi-core processors and usually give a better speed. From pjotr.public14 at thebird.nl Tue Jul 17 07:59:38 2012 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Tue, 17 Jul 2012 09:59:38 +0200 Subject: [BioRuby] [GSoC] weekly report #9 In-Reply-To: References: Message-ID: <20120717075938.GA30198@thebird.nl> Are you going to support STDIN/STDOUT? Another killer feature! On Tue, Jul 17, 2012 at 10:47:49AM +0400, Artem Tarasov wrote: > Hello everybody, > > My progress report for the past week is available at > http://lomereiter.wordpress.com/2012/07/17/gsoc-weekly-report-9/ > > I've implemented sorting and merging, both parallelized and quite fast. > Also my merging tool improves on ideas taken from Picard source code and > merges SAM headers as well as sorted alignment records. > > For those who use Debian, packages for amd64 and i386 are now available: > > https://github.com/lomereiter/sambamba/downloads > > At the moment, alternatives to the following samtools commands are > developed: view, index, sort, merge, flagstat. 
The current limitation is > that most tools don't work with stdin/stdout and work with BAM files only > (does anybody still use SAM?). Nevertheless, they wisely use multi-core > processors and usually give a better speed. > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby > From lomereiter at gmail.com Tue Jul 17 08:38:22 2012 From: lomereiter at gmail.com (Artem Tarasov) Date: Tue, 17 Jul 2012 12:38:22 +0400 Subject: [BioRuby] [GSoC] weekly report #9 In-Reply-To: <20120717075938.GA30198@thebird.nl> References: <20120717075938.GA30198@thebird.nl> Message-ID: Firstly, I wouldn't call that a killer feature. On Un*x you should be able to use /dev/stdin and /dev/stdout (or a named pipe) as input/output filenames, that's the way people pipe Picard tools. Many Un*x tools (including samtools) facilitate that by using dash as a shortcut for stdin/stdout, but this is not a requirement. Clearly, STDIN can't be used for random access, and some parts of my code currently rely on assumption that input stream is seekable. I should make that optional, and then named pipes can be used as input. On Tue, Jul 17, 2012 at 11:59 AM, Pjotr Prins wrote: > Are you going to support STDIN/STDOUT? Another killer feature! > > On Tue, Jul 17, 2012 at 10:47:49AM +0400, Artem Tarasov wrote: > > Hello everybody, > > > > My progress report for the past week is available at > > http://lomereiter.wordpress.com/2012/07/17/gsoc-weekly-report-9/ > > > > I've implemented sorting and merging, both parallelized and quite fast. > > Also my merging tool improves on ideas taken from Picard source code and > > merges SAM headers as well as sorted alignment records. 
> > > > For those who use Debian, packages for amd64 and i386 are now available: > > > > https://github.com/lomereiter/sambamba/downloads > > > > At the moment, alternatives to the following samtools commands are > > developed: view, index, sort, merge, flagstat. The current limitation is > > that most tools don't work with stdin/stdout and work with BAM files only > > (does anybody still use SAM?). Nevertheless, they wisely use multi-core > > processors and usually give a better speed. > > _______________________________________________ > > BioRuby Project - http://www.bioruby.org/ > > BioRuby mailing list > > BioRuby at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioruby > > > From pjotr.public14 at thebird.nl Tue Jul 17 10:56:59 2012 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Tue, 17 Jul 2012 12:56:59 +0200 Subject: [BioRuby] the bio-table tool Message-ID: <20120717105659.GA1871@thebird.nl> I just want to share the release of one of the most useful tools I have come up with in a while :). Most of us have to deal with tabular data, delivered through spreadsheets, SQL output etc. I found I was repeating myself too often, writing one-off scripts. So I have come up with a command-line tool which allows you to transform and edit(!) tables on the fly, using one-liners. Just now I wanted to find overlapping marker/geneset combinations in two files. The command was

    bio-table --overlap 0,2 NA.SUMMARY.RESULTS.REPORT.1.txt gsea_report_for_1_1342469955711.csv > overlap.1.tab

where the columns to compare were 0 and 2. You can diff on columns:

    bio-table --diff 0,3 table2.csv table1.csv

and merge tables (side by side). 
You can filter on values bio-table table1.csv --num-filter "values[3]-values[6] >= 0.05" and regex bio-table table1.csv --filter "rowname =~ /BGT/ and field[1] =~ /BGT/" and rewrite values bio-table table1.csv --rewrite 'rowname.upcase!; field[1]=nil if field[2].to_f<0.25' See https://github.com/pjotrp/bioruby-table for more examples. Pj. From georgkam at gmail.com Tue Jul 17 11:09:44 2012 From: georgkam at gmail.com (George Githinji) Date: Tue, 17 Jul 2012 14:09:44 +0300 Subject: [BioRuby] the bio-table tool In-Reply-To: <20120717105659.GA1871@thebird.nl> References: <20120717105659.GA1871@thebird.nl> Message-ID: Hi, I have been writing awk scripts to deal with CSV data when i need to. Having something in Ruby is totally cool! How good is it at handling tables with hundreds of thousands of lines? thanks PJ. On Tue, Jul 17, 2012 at 1:56 PM, Pjotr Prins wrote: > I just want to share the release of one of the most useful tools I > have come up with in a while :). > > Most of us have to deal with tabular data, delivered through > spreadsheets, SQL output etc. I found I was repeating myself too > often, writing one-off scripts. So I have come up with a command-line > tool which allows you to transform and edit(!) tables on the fly, > using one-liners. > > Just now I was wanted to find overlapping marker/geneset combinations > in two files. The command was > > bio-table --overlap 0,2 NA.SUMMARY.RESULTS.REPORT.1.txt gsea_report_for_1_1342469955711.csv > overlap.1.tab > > where the columns to compare were 0 and 2. You can diff on columns: > > bio-table --diff 0,3 table2.csv table1.csv > > and merge tables (side by side). 
You can filter on values > > bio-table table1.csv --num-filter "values[3]-values[6] >= 0.05" > > and regex > > bio-table table1.csv --filter "rowname =~ /BGT/ and field[1] =~ /BGT/" > > and rewrite values > > bio-table table1.csv --rewrite 'rowname.upcase!; field[1]=nil if field[2].to_f<0.25' > > See https://github.com/pjotrp/bioruby-table for more examples. > > Pj. > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby -- --------------- Sincerely George Skype: george_g2 Blog: http://biorelated.wordpress.com/ Twitter: http://twitter.com/#!/george_l From pjotr.public14 at thebird.nl Tue Jul 17 15:03:01 2012 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Tue, 17 Jul 2012 17:03:01 +0200 Subject: [BioRuby] the bio-table tool In-Reply-To: References: <20120717105659.GA1871@thebird.nl> Message-ID: <20120717150301.GE3531@thebird.nl> In principle it is streamed. On Tue, Jul 17, 2012 at 02:09:44PM +0300, George Githinji wrote: > Hi, > I have been writing awk scripts to deal with CSV data when i need to. > Having something in Ruby is totally cool! How good is it at handling > tables with hundreds of thousands of lines? > thanks PJ. > > > On Tue, Jul 17, 2012 at 1:56 PM, Pjotr Prins wrote: > > I just want to share the release of one of the most useful tools I > > have come up with in a while :). > > > > Most of us have to deal with tabular data, delivered through > > spreadsheets, SQL output etc. I found I was repeating myself too > > often, writing one-off scripts. So I have come up with a command-line > > tool which allows you to transform and edit(!) tables on the fly, > > using one-liners. > > > > Just now I was wanted to find overlapping marker/geneset combinations > > in two files. 
The command was > > > > bio-table --overlap 0,2 NA.SUMMARY.RESULTS.REPORT.1.txt gsea_report_for_1_1342469955711.csv > overlap.1.tab > > > > where the columns to compare were 0 and 2. You can diff on columns: > > > > bio-table --diff 0,3 table2.csv table1.csv > > > > and merge tables (side by side). You can filter on values > > > > bio-table table1.csv --num-filter "values[3]-values[6] >= 0.05" > > > > and regex > > > > bio-table table1.csv --filter "rowname =~ /BGT/ and field[1] =~ /BGT/" > > > > and rewrite values > > > > bio-table table1.csv --rewrite 'rowname.upcase!; field[1]=nil if field[2].to_f<0.25' > > > > See https://github.com/pjotrp/bioruby-table for more examples. > > > > Pj. > > _______________________________________________ > > BioRuby Project - http://www.bioruby.org/ > > BioRuby mailing list > > BioRuby at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioruby > > > > -- > --------------- > Sincerely > George > Skype: george_g2 > Blog: http://biorelated.wordpress.com/ > Twitter: http://twitter.com/#!/george_l > From cswh at umich.edu Wed Jul 18 19:44:58 2012 From: cswh at umich.edu (Clayton Wheeler) Date: Wed, 18 Jul 2012 15:44:58 -0400 Subject: [BioRuby] bio-maf release 0.3.0 Message-ID: Hi all, I've released bio-maf version 0.3.0: http://csw.github.com/bioruby-maf/blog/2012/07/18/bio-maf_0.3.0/ This version adds features including joining adjacent MAF blocks when sequences that caused them to be split have been filtered out; returning bio-alignment objects; and truncating (or 'slicing') alignment blocks to only cover a given genomic interval. For developers, this also adds a higher-level Bio::MAF::Access API for working with directories containing indexed MAF files (or, alternatively, single files), providing all relevant functionality for indexed access in a simpler way than using the KyotoIndex and Parser classes directly. 
The maf_tile(1) utility has been updated to use this functionality; a directory of indexed MAF files can now be specified, and the correct file will now be parsed as appropriate. Usage of Enumerators and blocks has also been substantially improved; all access methods for multiple blocks, such as Access#find, Access#slice and Parser#each_block, now accept a block parameter, which will be called for each block in turn. If no block parameter is given, they will all return an Enumerator for the resulting blocks. This is how most of the Ruby standard library, e.g. Array#each, works. -- Clayton Wheeler cswh at umich.edu From pjotr.public14 at thebird.nl Fri Jul 20 14:29:50 2012 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Fri, 20 Jul 2012 16:29:50 +0200 Subject: [BioRuby] biogems.info now lists other bioinformatics software Message-ID: <20120720142950.GB21452@thebird.nl> We have expanded http://biogems.info/ at the USA and EU-codefests to show useful and tested binary (.deb) installed packages - basically by querying an installed CloudBiolinux VM, and fetching info from Debian Bio Med. See http://www.biogems.info/biolinux.html Pj. From lomereiter at gmail.com Tue Jul 24 14:46:09 2012 From: lomereiter at gmail.com (Artem Tarasov) Date: Tue, 24 Jul 2012 18:46:09 +0400 Subject: [BioRuby] [GSoC] weekly report #10 Message-ID: Hi all, During the past week I've added filtering functionality to the sambamba-view utility. Now the tool parses expressions like "mapping_quality >= 50 and [MQ] >=50 and not ([RG] =~ /abcd/i or [RG] == null)", superseding the functionality given by samtools flags -f, -F, -q, -l, -r. Also I'm now introducing wget-like text progress bars to my tools; as of now this is present in sambamba-index only. 
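[Editor's sketch] For readers unfamiliar with this kind of filter expression, the example quoted above reads roughly like the plain-Ruby predicate below. This is only an illustration of the intended meaning, not how sambamba evaluates it: the Record struct and the passes? method are invented here, and treating a missing [MQ] tag as "does not pass" is an assumption.

```ruby
# Hypothetical in-memory record; real BAM records are not plain structs.
Record = Struct.new(:mapping_quality, :tags)

# Plain-Ruby reading of:
#   mapping_quality >= 50 and [MQ] >= 50 and not ([RG] =~ /abcd/i or [RG] == null)
def passes?(rec)
  mq = rec.tags['MQ']   # mate mapping quality tag, may be absent
  rg = rec.tags['RG']   # read group tag, may be absent
  rec.mapping_quality >= 50 &&
    !mq.nil? && mq >= 50 &&             # assumption: missing MQ fails the test
    !(rg.nil? || !(rg =~ /abcd/i).nil?) # RG must exist and must not match
end

reads = [
  Record.new(60, { 'MQ' => 55, 'RG' => 'lane1' }),  # kept
  Record.new(60, { 'MQ' => 55, 'RG' => 'xABCDx' }), # dropped: RG matches /abcd/i
  Record.new(60, { 'MQ' => 55 }),                   # dropped: RG is null
  Record.new(40, { 'MQ' => 55, 'RG' => 'lane1' })   # dropped: low mapping quality
]
kept = reads.select { |r| passes?(r) }
puts kept.size
```

The square brackets in the expression refer to optional tags, while bare names such as mapping_quality refer to fixed fields of the record.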
More on that is at http://lomereiter.wordpress.com/2012/07/24/gsoc-weekly-report-10/ From carlcrott at gmail.com Sun Jul 29 01:43:16 2012 From: carlcrott at gmail.com (Carl Crott) Date: Sat, 28 Jul 2012 21:43:16 -0400 Subject: [BioRuby] BioRuby Digest, Vol 82, Issue 9 In-Reply-To: References: Message-ID: I'm wondering if anyone has established a list of the requirements for additional bio-gems... I was hoping to do some development work on a gem to integrate with KEGG / Genomeweb but it seems there is no central repo for the features which we'd like to see. This might be something to keep track of in the GitHub feature area... I can't help but think that feature requests for gems will be lost in a mailing list like this over time. Specifically Mr. Goto and Mr. Barton, I've talked to you both about programming some things ... however maybe something I should work on instead is pulling together a feature list of all the features for each bio-gem. Without a doubt they'll change over time .. but I think something that's slightly more static and more searchable ( Gmail's partial-string search is surprisingly bad ) would be a good idea. If you guys like this idea I'd be happy to work on it ... or any API related bio-gem which happens to have a list of required features. Thanks for all your hard work!! -Carl From pjotr.public14 at thebird.nl Sun Jul 29 07:33:17 2012 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Sun, 29 Jul 2012 09:33:17 +0200 Subject: [BioRuby] BioRuby Digest, Vol 82, Issue 9 In-Reply-To: References:

Message-ID: <20120729073317.GA31006@thebird.nl> Hi Carl, On Sat, Jul 28, 2012 at 09:43:16PM -0400, Carl Crott wrote: > I'm wondering if anyone has established a list of the requirements for > additional bio-gems... Not systematically. At this point, GitHub issues appear to be where new ideas get listed. > I was hoping to do some development work on a gem to integrate with KEGG / > Genomeweb but it seems there is no central repo for the features which we'd > like to see. This might be something to keep track of in the github > feature area... I can't help but think that feature requests for gems will > be lost in a mailing list like this over time. For sure. > Specifically, Mr. Goto and Mr. Barton, I've talked to you both about > programming some things ... however maybe something I should work on > instead is pulling together a feature list of all the features for each > bio-gem. Without a doubt they'll change over time .. but I think something > that's slightly more static and more searchable ( Gmail's partial-string > search is surprisingly bad ) would be a good idea. > > If you guys like this idea I'd be happy to work on it ... or any API > related bio-gem which happens to have a list of required features. That would be very interesting :) Have you seen this? https://www.relishapp.com/cucumber/cucumber/docs/drb-server-integration Basically it presents features in a nice way. I like Cucumber features, and together with issues we could use that to track feature requests and new ideas, as I did with https://github.com/pjotrp/bioruby-alignment/issues/2 You can see I gave it the label 'Newbie'; it could also have had the label 'Feature'. There is a link to the feature that describes it: https://github.com/pjotrp/bioruby-alignment/blob/master/features/edit/gblocks.feature My proposal would be to track GitHub biogem repositories for issues and features. 
For those features that fit nowhere (such as your KEGG/Genomeweb gem) we could create a new project http://github.com/bioruby/new_features To tie this all together we need some scripting for a web page, which could be listed on http://biogems.info/features.html. I would certainly like that! Also I am interested in your RSS scraper for peer reviewed journals. It would be nice to have news items on http://biogems.info/journals.html which would list papers that somehow seem relevant to FOSS. Pj. From ngoto at gen-info.osaka-u.ac.jp Mon Jul 30 10:37:57 2012 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Mon, 30 Jul 2012 19:37:57 +0900 Subject: [BioRuby] BioRuby Digest, Vol 82, Issue 9 In-Reply-To: <20120729073317.GA31006@thebird.nl> References: