From rob.syme at gmail.com Wed Jun 1 03:17:30 2011
From: rob.syme at gmail.com (Rob Syme)
Date: Wed, 1 Jun 2011 15:17:30 +0800
Subject: [BioRuby] Parsing large Blast xml files - a new bioruby plugin
Message-ID:

I've written a quick bioruby plugin to help parse blast results that
are too large to fit into memory.

Install: gem install bio-lazyblastxml
Code: github.com/robsyme/bioruby-lazyblastxml
Blog post: biolateral.wordpress.com/2011/05/31/parsing-huge-blast-files-with-bioruby/

The plugin uses LibXML::Reader to iterate through nodes, yielding ruby
objects when required.
The interface is as close to Bio::Blast::Report as I could keep it,
but there are a few changes:
* Iteration.hits, hit.hsps etc do not return arrays. Instead, Report
  is an enumerable that yields iterations, Iteration is an enumerable
  that yields hits, Hits are enumerables that yield hsps, etc.

This is my first attempt at real shared code, and all comments and
criticism are very welcome.

-r

Rob Syme
PhD Candidate
Curtin University
Western Australia

From pjotr.public14 at thebird.nl Wed Jun 1 03:30:16 2011
From: pjotr.public14 at thebird.nl (Pjotr Prins)
Date: Wed, 1 Jun 2011 09:30:16 +0200
Subject: [BioRuby] Parsing large Blast xml files - a new bioruby plugin
In-Reply-To:
References:
Message-ID: <20110601073016.GB22723@thebird.nl>

Hi Rob,

Why did you not start from my lazy, fast, big-data XML parser for
BLAST?

  https://github.com/pjotrp/blastxmlparser

I hear it is being used in the NGS plugin. It would be good to do some
performance tests when you introduce something new.

I have a feeling you were simply not aware of it.

Pj.

On Wed, Jun 01, 2011 at 03:17:30PM +0800, Rob Syme wrote:
> I've written a quick bioruby plugin to help parse blast results that
> are too large to fit into memory.
>
> Install: gem install bio-lazyblastxml
> Code: github.com/robsyme/bioruby-lazyblastxml
> Blog post: biolateral.wordpress.com/2011/05/31/parsing-huge-blast-files-with-bioruby/
>
> The plugin uses LibXML::Reader to iterate through nodes, yielding ruby
> objects when required.
> The interface is as close to Bio::Blast::Report as I could keep it,
> but there are a few changes:
> * Iteration.hits, hit.hsps etc do not return arrays. Instead, Report
>   is an enumerable that yields iterations, Iteration is an enumerable
>   that yields hits, Hits are enumerables that yield hsps, etc.
>
> This is my first attempt at real shared code, and all comments and
> criticism are very welcome.
>
> -r
>
> Rob Syme
> PhD Candidate
> Curtin University
> Western Australia
>
> _______________________________________________
> BioRuby Project - http://www.bioruby.org/
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby
>

From rob.syme at gmail.com Wed Jun 1 04:07:13 2011
From: rob.syme at gmail.com (Rob Syme)
Date: Wed, 1 Jun 2011 16:07:13 +0800
Subject: [BioRuby] Parsing large Blast xml files - a new bioruby plugin
In-Reply-To: <20110601073016.GB22723@thebird.nl>
References: <20110601073016.GB22723@thebird.nl>
Message-ID:

You're right, I hadn't seen your project. My mistake.
-r

From philipp.comans at googlemail.com Wed Jun 1 04:25:37 2011
From: philipp.comans at googlemail.com (Philipp Comans)
Date: Wed, 1 Jun 2011 10:25:37 +0200
Subject: [BioRuby] Parsing large Blast xml files - a new bioruby plugin
In-Reply-To:
References: <20110601073016.GB22723@thebird.nl>
Message-ID: <2738C8712A2F46BAB655CE885CDF4F89@googlemail.com>

Hi,

I had a similar problem recently. I needed an efficient parser for Blast XML results and I discovered that the default parser in BioRuby was not suitable. So I wrote my own using Nokogiri.
In my opinion it is way too hard at the moment to discover BioPlugins. When people use the default XML or GFF parser that comes with BioRuby, they do not expect that there is another, more efficient version. There should be a section on the front page or even in the corresponding parts of the API documentation that makes people aware of the existence of these efficient parsers.

BTW thank you all for BioRuby, I used it in a project recently and it made my life tremendously easier.

Cheers,

Philipp

From rob.syme at gmail.com Wed Jun 1 04:33:36 2011
From: rob.syme at gmail.com (Rob Syme)
Date: Wed, 1 Jun 2011 16:33:36 +0800
Subject: [BioRuby] Parsing large Blast xml files - a new bioruby plugin
In-Reply-To: <2738C8712A2F46BAB655CE885CDF4F89@googlemail.com>
References: <20110601073016.GB22723@thebird.nl> <2738C8712A2F46BAB655CE885CDF4F89@googlemail.com>
Message-ID:

I think that the list at
http://bioruby.open-bio.org/wiki/BioRuby_Plugins is pretty
comprehensive; my mistake was simply not looking.
-r

From pjotr.public14 at thebird.nl Wed Jun 1 04:49:48 2011
From: pjotr.public14 at thebird.nl (Pjotr Prins)
Date: Wed, 1 Jun 2011 10:49:48 +0200
Subject: [BioRuby] Parsing large Blast xml files - a new bioruby plugin
In-Reply-To:
References: <20110601073016.GB22723@thebird.nl> <2738C8712A2F46BAB655CE885CDF4F89@googlemail.com>
Message-ID: <20110601084948.GA23592@thebird.nl>

The general idea is to have a number of 'blessed' plugins tied to
BioRuby releases. A blessed plugin is supposed to be rather solid,
and have a level of documentation and testing.

In addition there are 'development' plugins. Both should be listed on
the plugin page. We are introducing that plumbing shortly. The
duplication of work merely points out we need to get this done ;)

It is interesting to note both XML parsers use lazy iterators. I also
do lazy conversions. Same for my GFF3 plugin. Rob, it would be good to
compare performance on some real-life data.

Pj.

From bonnal at ingm.org Wed Jun 1 06:26:19 2011
From: bonnal at ingm.org (Raoul Bonnal)
Date: Wed, 1 Jun 2011 12:26:19 +0200
Subject: [BioRuby] Parsing large Blast xml files - a new bioruby plugin
In-Reply-To: <20110601084948.GA23592@thebird.nl>
References: <20110601073016.GB22723@thebird.nl> <2738C8712A2F46BAB655CE885CDF4F89@googlemail.com> <20110601084948.GA23592@thebird.nl>
Message-ID: <23D33897-ACAC-47B0-85D1-A3A808D46B48@ingm.org>

What about automating this process on our wiki :-)?

$# gem search -r bio-
bio-assembly (0.1.0)
bio-blastxmlparser (0.6.1)
bio-bwa (0.2.2)
bio-cnls_screenscraper (0.1.0)
bio-emboss_six_frame_nucleotide_sequences (0.1.0)
bio-gem (0.2.2)
bio-genomic-interval (0.1.2)
bio-gex (0.0.0)
bio-gff3 (0.8.6)
bio-graphics (1.4)
bio-hello (0.0.0)
bio-isoelectric_point (0.1.1)
bio-kb-illumina (0.1.0)
bio-lazyblastxml (0.4.0)
bio-logger (0.9.0)
bio-nexml (0.0.1)
bio-octopus (0.1.1)
bio-samtools (0.2.1)
bio-sge (0.0.0)
bio-tm_hmm (0.2.0)
bio-ucsc-api (0.0.4)

wow, quite a long list of plugins :-)
I'm happy to see this boiling soup

--
The only change to succeed is starting from a simple thing.

From rob.syme at gmail.com Wed Jun 1 08:26:25 2011
From: rob.syme at gmail.com (Rob Syme)
Date: Wed, 1 Jun 2011 20:26:25 +0800
Subject: [BioRuby] Parsing large Blast xml files - a new bioruby plugin
In-Reply-To: <20110601084948.GA23592@thebird.nl>
References: <20110601073016.GB22723@thebird.nl> <2738C8712A2F46BAB655CE885CDF4F89@googlemail.com> <20110601084948.GA23592@thebird.nl>
Message-ID:

I pushed a 1.4GB file through each of the parsers, simply counting the
number of hits per iteration:

            user     system      total        real
Rob:   91.510000   0.620000  92.130000 ( 92.527617)
Pjotr: 46.730000   0.430000  47.160000 ( 47.263949)

One of the important differences in the parsers is that mine is lazy
'all the way down', in that the iterations are lazy, the hits are lazy
and the hsps are lazy. No large chunks of XML are ever buffered into a
string and then parsed together.

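To make "lazy all the way down" concrete, here is a minimal sketch in
plain Ruby. It is not the code of either plugin: the ToyReport,
ToyIteration and ToyHit classes and the nested-array data are made up
for illustration, standing in for nodes that a streaming XML reader
would hand over on demand. Each level is an Enumerable whose each
yields its children one at a time, so counting hits per iteration
never builds an array of hits.

```ruby
# Toy stand-ins for a streaming BLAST report (hypothetical names, not
# the bio-lazyblastxml or bio-blastxmlparser API).

class ToyHit
  include Enumerable

  attr_reader :id

  def initialize(id, hsp_scores)
    @id = id
    @hsp_scores = hsp_scores
  end

  # Yields one HSP score at a time.
  def each
    return enum_for(:each) unless block_given?
    @hsp_scores.each { |score| yield score }
  end
end

class ToyIteration
  include Enumerable

  attr_reader :query

  def initialize(query, raw_hits)
    @query = query
    @raw_hits = raw_hits
  end

  # Builds each hit object only when the caller asks for it.
  def each
    return enum_for(:each) unless block_given?
    @raw_hits.each { |id, scores| yield ToyHit.new(id, scores) }
  end
end

class ToyReport
  include Enumerable

  def initialize(raw)
    @raw = raw
  end

  # Builds each iteration object only when the caller asks for it.
  def each
    return enum_for(:each) unless block_given?
    @raw.each { |query, hits| yield ToyIteration.new(query, hits) }
  end
end

# Made-up data: [query, [[hit id, [hsp scores]], ...]]
RAW = [
  ["query1", [["hitA", [50.0, 42.5]], ["hitB", [30.1]]]],
  ["query2", [["hitC", [99.9]]]]
]

# Count hits per iteration by walking the enumerables.
def hits_per_iteration(report)
  report.map { |iteration| [iteration.query, iteration.count] }
end

p hits_per_iteration(ToyReport.new(RAW))
```

Timings laid out in user/system/total/real columns like the ones above
are what Ruby's standard Benchmark.bm prints, so a comparison of this
kind can be repeated on other data with the standard library alone.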
While lazy-loading is a good idea, and should probably be adopted in
more of the BioRuby core, taking it to this extreme is a bit silly.
Pjotr's (more sensible) approach is to chunk up the file by
iterations, and then use XPath to pull out the relevant information
from there. One iteration will never be more than a few kb - certainly
no strain on memory consumption. The IO strain of reading a file in
tiny pieces looks to be the cause of the 2x slowdown in the example
above.

Lesson 1: Pragmatism is a good thing.
Lesson 2: Always check to make sure the work you're doing hasn't been
done before.
Lesson 3: Use Pjotr's parser to make light work of your large Blast
results.

-r

From yannick.wurm at unil.ch Mon Jun 13 01:49:39 2011
From: yannick.wurm at unil.ch (Yannick Wurm)
Date: Mon, 13 Jun 2011 12:49:39 +0700
Subject: [BioRuby] ruby BLAST server (web frontend)
References:
Message-ID: <4E447EC6-D36D-42DF-85B7-E199E7E78042@unil.ch>

Dear list & CC-ed,

let me quote a discussion from a while back ( http://answerpot.com/showthread.php?1292835-rails+blast+server ):

> I'd like to set up a small server for people to run BLAST against some of my sequences & see the results.
> GMOD obviously comes to mind, but it seems like overkill.
> And perhaps there is an almost automagic way to do this with ruby on rails. Has anyone done this yet?

There was no good solution at the time. Anurag Priyam & I have since been working on something that fills this need. Ben Woodcroft has recently been contributing as well. Check:
https://github.com/yannickwurm/sequenceserver or http://www.sequenceserver.com

Some things remain to be improved. But globally the software works great. Thus we thought to share our progress on the list that initiated it. An excerpt of the README highlights some features:

Ease of use for biologists:
* intuitive and helpful web interface: automatic sequence type detection that helps choose appropriate BLAST method and database types
* links to easily download sequences of BLAST hits
* support for advanced options.

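As an aside on "automatic sequence type detection": one common
heuristic is to look at what fraction of the residues are nucleotide
letters. The sketch below is only an assumption about how such a check
could work, not SequenceServer's actual code; the method name
guess_sequence_type and the 0.9 threshold are made up for
illustration.

```ruby
# Guess whether a raw sequence is nucleotide or protein.
# Hypothetical helper; the threshold is chosen for illustration only.
NUCLEOTIDE_LETTERS = "ACGTUN".freeze

def guess_sequence_type(sequence, threshold = 0.9)
  # Keep letters only: drops whitespace, digits and gap characters.
  residues = sequence.upcase.gsub(/[^A-Z]/, "")
  return nil if residues.empty?

  # Fraction of residues that could be nucleotides.
  nucleotide_count = residues.count(NUCLEOTIDE_LETTERS)
  fraction = nucleotide_count.to_f / residues.length
  fraction >= threshold ? :nucleotide : :protein
end

p guess_sequence_type("ATGGCGTACGTTAGC")
p guess_sequence_type("MKVLAAGIVGLLLAQPSW")
```

A result like this could then be used to offer only the BLAST methods
that make sense for the query. Short or low-complexity protein
sequences can fool a composition check, so a real tool would probably
want a fallback.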
Rapid deployment for bioinformatics administrators:
* assisted formatting of BLAST databases (with sequence type detection)
* automatic discovery of formatted BLAST databases during startup
* uses ruby's internal web server (on any open port) or Apache
* add custom hyperlinks from hits (to your genome browser or custom database).

We have been using this as the web frontend for our ant genome blast at http://www.antgenomes.org for a few months.

Comments, suggestions... and contributions are most welcome!

Cheers,

Anurag & Ben & Yannick

-----------------------------
Ant Genomes & Evolution
http://yannick.poulet.org
skype://yannickwurm
-----------------------------

From bonnal at ingm.org Mon Jun 13 03:17:21 2011
From: bonnal at ingm.org (Raoul Bonnal)
Date: Mon, 13 Jun 2011 09:17:21 +0200
Subject: [BioRuby] ruby BLAST server (web frontend)
In-Reply-To: <4E447EC6-D36D-42DF-85B7-E199E7E78042@unil.ch>
References: <4E447EC6-D36D-42DF-85B7-E199E7E78042@unil.ch>
Message-ID:

Dear Yannick and others,
cute work.
Just a few suggestions.
you could build a gem and distribute it, then with a single executable script "sequenceserver" you can call all other tasks,
configuration, database or starting the service like we did with biongs; it's a more consistent approach and the end user has a clear reference to your application.
Installing it as a gem, you then need to build a web environment somewhere else, but it is quite simple to create a scaffold directory ready to be used by a web server (where you put your configuration/database ref, public, js, css etc.)
something like:

sequenceserver database_formatter directory_with_fasta_files
sequenceserver config production --bin="~/ncbi-blast-2.2.24+/bin/" --database="/Users/me/blast_databases/"
sequenceserver start

then if your application runs on ruby 1.8.7, try REE with passenger and nginx; in my opinion it is the easiest web server (NGINX), with a high level of performance http://www.modrails.com/

if you need help to configure nginx I can give you some hints or an example of my config; it works well with rvm as well.

could this become a bioruby plugin ?

From yannick.wurm at unil.ch Mon Jun 13 04:06:43 2011
From: yannick.wurm at unil.ch (Yannick Wurm)
Date: Mon, 13 Jun 2011 15:06:43 +0700
Subject: [BioRuby] ruby BLAST server (web frontend)
In-Reply-To:
References: <4E447EC6-D36D-42DF-85B7-E199E7E78042@unil.ch>
Message-ID:

Thanks for the suggestions Raoul, they could substantially streamline setting things up!

cheers
yannick

-----------------------------
Ant Genomes & Evolution
http://yannick.poulet.org
skype://yannickwurm
-----------------------------
BLAST @ http://antgenomes.org

From anurag08priyam at gmail.com Mon Jun 13 12:10:53 2011
From: anurag08priyam at gmail.com (Anurag Priyam)
Date: Mon, 13 Jun 2011 21:40:53 +0530
Subject: [BioRuby] ruby BLAST server (web frontend)
In-Reply-To:
References: <4E447EC6-D36D-42DF-85B7-E199E7E78042@unil.ch>
Message-ID:

> cute work.

Thanks a lot Raoul :).

> Just a few suggestions.
> you could build a gem and distribute it, then with a single executable script "sequenceserver" you can call all other tasks,
> configuration, database or starting the service like we did with biongs; it's a more consistent approach and the end user has a clear reference to your application.

Agreed. And that is our target for the next release.

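A single executable with subcommands, along the lines suggested above,
could be dispatched roughly like this. It is only a sketch under
assumptions: the subcommand names mirror the ones in the suggestion,
but the handler bodies, the run_cli method, the example paths and the
option handling are invented here and are not SequenceServer's real
interface.

```ruby
require 'optparse'

# Hypothetical dispatcher for a single "sequenceserver" executable.
# Each handler only returns a description of what it would do.
COMMANDS = {
  "database_formatter" => ->(args) {
    dir = args.first || raise(ArgumentError, "directory with FASTA files required")
    "would format FASTA files in #{dir}"
  },
  "config" => ->(args) {
    options = {}
    parser = OptionParser.new do |opts|
      opts.on("--bin=PATH")      { |value| options[:bin] = value }
      opts.on("--database=PATH") { |value| options[:database] = value }
    end
    # parse returns the non-option arguments; the first is the environment.
    environment = parser.parse(args).first || "development"
    "would configure #{environment} with bin=#{options[:bin]} database=#{options[:database]}"
  },
  "start" => ->(_args) { "would start the web server" }
}

# Look up the subcommand and hand it the remaining arguments.
def run_cli(argv)
  command, *rest = argv
  handler = COMMANDS[command]
  return "usage: sequenceserver (#{COMMANDS.keys.join('|')}) [options]" unless handler
  handler.call(rest)
end

puts run_cli(%w[config production --bin=/opt/blast/bin --database=/data/blast])
puts run_cli(%w[start])
```

Keeping the subcommands in one table means the gem only has to install
one script, and the usage message stays in sync with what is actually
available.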
> Installing it as a gem, you then need to build a web environment somewhere else, but it is quite simple to create a scaffold directory ready to be used by a web server (where you put your configuration/database ref, public, js, css etc.)
> something like:
>
> sequenceserver database_formatter directory_with_fasta_files
> sequenceserver config production --bin="~/ncbi-blast-2.2.24+/bin/" --database="/Users/me/blast_databases/"
> sequenceserver start

This looks quite good. I will keep this in mind when pushing forward a
gem release.

> then if your application runs on ruby 1.8.7, try REE with passenger and nginx; in my opinion it is the easiest web server (NGINX), with a high level of performance http://www.modrails.com/
>
> if you need help to configure nginx I can give you some hints or an example of my config; it works well with rvm as well.

That would be great. We are putting forward a wiki page with
instructions on deploying SequenceServer on Apache and Nginx. I am
almost done adding instructions for Apache, but I am not sure how to
do it for Nginx.

> could this become a bioruby plugin ?

So, then would it become bio-sequenceserver? IMO, it doesn't logically
fit in as a BioRuby plugin, as in it doesn't depend on BioRuby. And
BioRuby is more like a library but SequenceServer is more like an end
product. Not sure though :-|.

--
Anurag Priyam
http://about.me/yeban/

From anurag08priyam at gmail.com Mon Jun 13 12:12:25 2011
From: anurag08priyam at gmail.com (Anurag Priyam)
Date: Mon, 13 Jun 2011 21:42:25 +0530
Subject: [BioRuby] ruby BLAST server (web frontend)
In-Reply-To:
References: <4E447EC6-D36D-42DF-85B7-E199E7E78042@unil.ch>
Message-ID:

>> if you need help to configure nginx I can give you some hints or an example of my config; it works well with rvm as well.
>
> That would be great. We are putting forward a wiki page with
> instructions on deploying SequenceServer on Apache and Nginx. I am
> almost done adding instructions for Apache, but I am not sure how to
> do it for Nginx.

Oops, forgot to add the link:
https://github.com/yannickwurm/sequenceserver/wiki/Deploying-Sequence-Server

--
Anurag Priyam
http://about.me/yeban/

From donttrustben at gmail.com Tue Jun 14 09:19:37 2011
From: donttrustben at gmail.com (Ben Woodcroft)
Date: Tue, 14 Jun 2011 23:19:37 +1000
Subject: [BioRuby] ruby BLAST server (web frontend)
In-Reply-To:
References: <4E447EC6-D36D-42DF-85B7-E199E7E78042@unil.ch>
Message-ID:

Hi,

> > could this become a bioruby plugin ?
>
> So, then would it become bio-sequenceserver? IMO, it doesn't logically
> fit in as a BioRuby plugin, as in it doesn't depend on BioRuby. And
> BioRuby is more like a library but SequenceServer is more like an end
> product. Not sure though :-|.
>

To be technical, the branch trying to implement the blast overview graphic
does rely on BioRuby, since that is a dependency of bio-graphics. But that
branch hasn't been merged into the main tree yet, and might remain an
optional thing anyway.

--
Ben J Woodcroft, BE (Hons)

PhD Candidate
Ralph Laboratory
The University of Melbourne
Melbourne, Australia

tel: (+613) 8344 2319
b.woodcroft at pgrad.unimelb.edu.au

From pjotr.public14 at thebird.nl Tue Jun 14 09:26:54 2011
From: pjotr.public14 at thebird.nl (Pjotr Prins)
Date: Tue, 14 Jun 2011 15:26:54 +0200
Subject: [BioRuby] ruby BLAST server (web frontend)
In-Reply-To:
References: <4E447EC6-D36D-42DF-85B7-E199E7E78042@unil.ch>
Message-ID: <20110614132654.GA20916@thebird.nl>

The advantages of making it a plugin:

1. easy install for users
2. visibility from the BioRuby project
3. potentially a member of the stable plugin family
4. developers may use your libraries - even if the focus is an application

Pj.

From mail at michaelbarton.me.uk Thu Jun 23 11:05:48 2011
From: mail at michaelbarton.me.uk (Michael Barton)
Date: Thu, 23 Jun 2011 11:05:48 -0400
Subject: [BioRuby] GFF3 Record Equality Method
Message-ID: <20110623150548.GA1030@Michael-Bartons-MacBook.local>

As far as I can tell the GFF3 record in bioruby uses Object#== for comparison.
I'm implementing a Bio::GFF::GFF3::Record#== method based on comparison of the
GFF3 fields. Would this be a useful addition to the bioruby library?

Cheers

Michael Barton

From ngoto at gen-info.osaka-u.ac.jp Fri Jun 24 08:41:29 2011
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Fri, 24 Jun 2011 21:41:29 +0900
Subject: [BioRuby] GFF3 Record Equality Method
In-Reply-To: <20110623150548.GA1030@Michael-Bartons-MacBook.local>
References: <20110623150548.GA1030@Michael-Bartons-MacBook.local>
Message-ID: <20110624124129.C00871CBC47D@idnmail.gen-info.osaka-u.ac.jp>

On Thu, 23 Jun 2011 11:05:48 -0400
Michael Barton wrote:

> As far as I can tell the GFF3 record in bioruby uses Object#== for comparison.
> I'm implementing a Bio::GFF::GFF3::Record#== method based on comparison of the > GFF3 fields. Would this this be a useful addition to bioruby library? > > Cheers > > Michael Barton Bio::GFF::GFF3::Record inherits Bio::GFF::GFF2::Record, and the GFF2::Record already has its own == method. GFF2::Record#== gives enough functionality for comparing GFF3 records, in addition to GFF2 records. #sample code #----------------------------------------------------------- require 'bio' str1 = "chrI\tSGD\tcentromere\t151467\t151584\t.\t+\t.\t" + "ID=CEN1;Name=CEN1;gene=CEN1;Alias=CEN1,test%3B0001;" + "Note=Chromosome I centromere;dbxref=SGD:S000006463;" + "Target=test%2002 123 456 -,test%2C03 159 314;" + "memo%3Dtest%3Battr=99.9%25%09match" str2 = str1.dup str3 = str1.gsub(/CEN1/, 'CEN2') obj0 = Bio::GFF::GFF3::Record.new(str1) obj1 = Bio::GFF::GFF3::Record.new(str1) obj2 = Bio::GFF::GFF3::Record.new(str2) obj3 = Bio::GFF::GFF3::Record.new(str3) p obj0==obj1 p obj1==obj2 p obj1==obj3 #----------------------------------------------------------- -- Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org From andrew.j.grimm at gmail.com Sun Jun 26 06:16:21 2011 From: andrew.j.grimm at gmail.com (Andrew Grimm) Date: Sun, 26 Jun 2011 20:16:21 +1000 Subject: [BioRuby] Anyone else attending RubyKaigi 2011? Message-ID: I noticed that Goto-san's talk got accepted as a lightning talk. Are any other BioRuby contributors or users attending? I'll be giving a talk, but I'll only briefly mention bioinformatics. I'll be talking about the Small Eigen Collider. In describing why I created the Small Eigen Collider, I'll mention that I'm a bioinformatician, and that I deal with enough information that I am tempted to run Ruby code under implementations other than YARV. 
http://rubykaigi.org/2011/en/schedule/details/18S03 Andrew From rob.syme at gmail.com Wed Jun 1 07:17:30 2011 From: rob.syme at gmail.com (Rob Syme) Date: Wed, 1 Jun 2011 15:17:30 +0800 Subject: [BioRuby] Parsing large Blast xml files - a new bioruby plugin Message-ID: I've written a quick bioruby plugin to help parse blast results that are too large to fit into memory. Install: gem install bio-lazyblastxml Code:?github.com/robsyme/bioruby-lazyblastxml Blog post:?biolateral.wordpress.com/2011/05/31/parsing-huge-blast-files-with-bioruby/ The plugin uses LibXML::Reader to iterate through nodes, yielding ruby objects when required. The interface is as close to Bio::Blast::Report as I could keep it, but there are a few changes: ? Iteration.hits, hit.hsps etc do not return arrays. Instead, Report is a enumerable that yields iterations, Iteration is an enumerable that yields hits, Hits are enumerables that yield hsps, etc. This is my first attempt real shared code, and all comments and criticism are very welcome. -r Rob Syme PhD Candidate Curtin University Western Australia From pjotr.public14 at thebird.nl Wed Jun 1 07:30:16 2011 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Wed, 1 Jun 2011 09:30:16 +0200 Subject: [BioRuby] Parsing large Blast xml files - a new bioruby plugin In-Reply-To: References: Message-ID: <20110601073016.GB22723@thebird.nl> Hi Rob, Why did you not start from my lazy fast and big-data XML parser for BLAST? https://github.com/pjotrp/blastxmlparser I hear it is being used in the NGS plugin. Be good to do some performance tests, when you introduce something new. I have a feeling you were simply not aware of it. Pj. On Wed, Jun 01, 2011 at 03:17:30PM +0800, Rob Syme wrote: > I've written a quick bioruby plugin to help parse blast results that > are too large to fit into memory. 
> > Install: gem install bio-lazyblastxml > Code:?github.com/robsyme/bioruby-lazyblastxml > Blog post:?biolateral.wordpress.com/2011/05/31/parsing-huge-blast-files-with-bioruby/ > > The plugin uses LibXML::Reader to iterate through nodes, yielding ruby > objects when required. > The interface is as close to Bio::Blast::Report as I could keep it, > but there are a few changes: > ? Iteration.hits, hit.hsps etc do not return arrays. Instead, Report > is a enumerable that yields iterations, Iteration is an enumerable > that yields hits, Hits are enumerables that yield hsps, etc. > > This is my first attempt real shared code, and all comments and > criticism are very welcome. > > -r > > Rob Syme > PhD Candidate > Curtin University > Western Australia > > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby > From rob.syme at gmail.com Wed Jun 1 08:07:13 2011 From: rob.syme at gmail.com (Rob Syme) Date: Wed, 1 Jun 2011 16:07:13 +0800 Subject: [BioRuby] Parsing large Blast xml files - a new bioruby plugin In-Reply-To: <20110601073016.GB22723@thebird.nl> References: <20110601073016.GB22723@thebird.nl> Message-ID: You're right, I hadn't seen your project. My mistake. -r On Wed, Jun 1, 2011 at 3:30 PM, Pjotr Prins wrote: > Hi Rob, > > Why did you not start from my lazy fast and big-data XML parser for > BLAST? > > ?https://github.com/pjotrp/blastxmlparser > > I hear it is being used in the NGS plugin. Be good to do some > performance tests, when you introduce something new. > > I have a feeling you were simply not aware of it. > > Pj. > > On Wed, Jun 01, 2011 at 03:17:30PM +0800, Rob Syme wrote: >> I've written a quick bioruby plugin to help parse blast results that >> are too large to fit into memory. 
>> >> Install: gem install bio-lazyblastxml >> Code:?github.com/robsyme/bioruby-lazyblastxml >> Blog post:?biolateral.wordpress.com/2011/05/31/parsing-huge-blast-files-with-bioruby/ >> >> The plugin uses LibXML::Reader to iterate through nodes, yielding ruby >> objects when required. >> The interface is as close to Bio::Blast::Report as I could keep it, >> but there are a few changes: >> ? Iteration.hits, hit.hsps etc do not return arrays. Instead, Report >> is a enumerable that yields iterations, Iteration is an enumerable >> that yields hits, Hits are enumerables that yield hsps, etc. >> >> This is my first attempt real shared code, and all comments and >> criticism are very welcome. >> >> -r >> >> Rob Syme >> PhD Candidate >> Curtin University >> Western Australia >> >> _______________________________________________ >> BioRuby Project - http://www.bioruby.org/ >> BioRuby mailing list >> BioRuby at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioruby >> > From philipp.comans at googlemail.com Wed Jun 1 08:25:37 2011 From: philipp.comans at googlemail.com (Philipp Comans) Date: Wed, 1 Jun 2011 10:25:37 +0200 Subject: [BioRuby] Parsing large Blast xml files - a new bioruby plugin In-Reply-To: References: <20110601073016.GB22723@thebird.nl> Message-ID: <2738C8712A2F46BAB655CE885CDF4F89@googlemail.com> Hi, I had a similar problem recently. I needed an efficient parser for Blast XML results and I discovered that the default parser in BioRuby was not suitable. So I wrote my own using Nokogiri. In my opinion it is way too hard at the moment to discover BioPlugins. When people use the default XML or GFF parser that comes with BioRUby, they do not expect that there is another, more efficient version. There should be a section on the front page or even in the corresponding parts of the API documentation that makes people aware of the existence of these efficient parsers. 
BTW thank you all for BioRuby, I used in a project recently and it made my life tremendously easier. Cheers, Philipp Am Mittwoch, 1. Juni 2011 um 10:07 schrieb Rob Syme: > You're right, I hadn't seen your project. My mistake. > -r > > On Wed, Jun 1, 2011 at 3:30 PM, Pjotr Prins wrote: > > Hi Rob, > > > > Why did you not start from my lazy fast and big-data XML parser for > > BLAST? > > > > https://github.com/pjotrp/blastxmlparser > > > > I hear it is being used in the NGS plugin. Be good to do some > > performance tests, when you introduce something new. > > > > I have a feeling you were simply not aware of it. > > > > Pj. > > > > On Wed, Jun 01, 2011 at 03:17:30PM +0800, Rob Syme wrote: > > > I've written a quick bioruby plugin to help parse blast results that > > > are too large to fit into memory. > > > > > > Install: gem install bio-lazyblastxml > > > Code: github.com/robsyme/bioruby-lazyblastxml (http://github.com/robsyme/bioruby-lazyblastxml) > > > Blog post: biolateral.wordpress.com/2011/05/31/parsing-huge-blast-files-with-bioruby/ (http://biolateral.wordpress.com/2011/05/31/parsing-huge-blast-files-with-bioruby/) > > > > > > The plugin uses LibXML::Reader to iterate through nodes, yielding ruby > > > objects when required. > > > The interface is as close to Bio::Blast::Report as I could keep it, > > > but there are a few changes: > > > Iteration.hits, hit.hsps etc do not return arrays. Instead, Report > > > is a enumerable that yields iterations, Iteration is an enumerable > > > that yields hits, Hits are enumerables that yield hsps, etc. > > > > > > This is my first attempt real shared code, and all comments and > > > criticism are very welcome. 
> > > > > > -r > > > > > > Rob Syme > > > PhD Candidate > > > Curtin University > > > Western Australia > > > > > > _______________________________________________ > > > BioRuby Project - http://www.bioruby.org/ > > > BioRuby mailing list > > > BioRuby at lists.open-bio.org (mailto:BioRuby at lists.open-bio.org) > > > http://lists.open-bio.org/mailman/listinfo/bioruby > > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org (mailto:BioRuby at lists.open-bio.org) > http://lists.open-bio.org/mailman/listinfo/bioruby From rob.syme at gmail.com Wed Jun 1 08:33:36 2011 From: rob.syme at gmail.com (Rob Syme) Date: Wed, 1 Jun 2011 16:33:36 +0800 Subject: [BioRuby] Parsing large Blast xml files - a new bioruby plugin In-Reply-To: <2738C8712A2F46BAB655CE885CDF4F89@googlemail.com> References: <20110601073016.GB22723@thebird.nl> <2738C8712A2F46BAB655CE885CDF4F89@googlemail.com> Message-ID: I think that the list at http://bioruby.open-bio.org/wiki/BioRuby_Plugins is pretty comprehensive, my mistake was simply not looking. -r On Wed, Jun 1, 2011 at 4:25 PM, Philipp Comans wrote: > Hi, > > I had a similar problem recently. I needed an efficient parser for Blast XML results and I discovered that the default parser in BioRuby was not suitable. So I wrote my own using Nokogiri. > In my opinion it is way too hard at the moment to discover BioPlugins. When people use the default XML or GFF parser that comes with BioRUby, they do not expect that there is another, more efficient version. There should be a section on the front page or even in the corresponding parts of the API documentation that makes people aware of the existence of these efficient parsers. > > BTW thank you all for BioRuby, I used in a project recently and it made my life tremendously easier. 
> > Cheers, > > Philipp > From pjotr.public14 at thebird.nl Wed Jun 1 08:49:48 2011 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Wed, 1 Jun 2011 10:49:48 +0200 Subject: [BioRuby] Parsing large Blast xml files - a new bioruby plugin In-Reply-To: References: <20110601073016.GB22723@thebird.nl> <2738C8712A2F46BAB655CE885CDF4F89@googlemail.com> Message-ID: <20110601084948.GA23592@thebird.nl> The general idea is to have a number of 'blessed' plugins tied to BioRuby releases. A blessed plugin is supposed to be rather solid, and have a level of documentation and testing. In addition there are 'development' plugins. Both should be listed on the plugin page. We are introducing that plumbing shortly. The duplication of work merely points out we need to get this done ;) It is interesting to note both XML parsers use lazy iterators. I also do lazy conversions. Same for my GFF3 plugin. Rob, be good to compare performance on some real-life data. Pj. On Wed, Jun 01, 2011 at 04:33:36PM +0800, Rob Syme wrote: > I think that the list at > http://bioruby.open-bio.org/wiki/BioRuby_Plugins is pretty > comprehensive, my mistake was simply not looking. > -r > > > On Wed, Jun 1, 2011 at 4:25 PM, Philipp Comans > wrote: > > Hi, > > > > I had a similar problem recently. I needed an efficient parser for Blast XML results and I discovered that the default parser in BioRuby was not suitable. So I wrote my own using Nokogiri. > > In my opinion it is way too hard at the moment to discover BioPlugins. When people use the default XML or GFF parser that comes with BioRUby, they do not expect that there is another, more efficient version. There should be a section on the front page or even in the corresponding parts of the API documentation that makes people aware of the existence of these efficient parsers. > > > > BTW thank you all for BioRuby, I used in a project recently and it made my life tremendously easier. 
> > > > Cheers, > > > > Philipp > > > From bonnal at ingm.org Wed Jun 1 10:26:19 2011 From: bonnal at ingm.org (Raoul Bonnal) Date: Wed, 1 Jun 2011 12:26:19 +0200 Subject: [BioRuby] Parsing large Blast xml files - a new bioruby plugin In-Reply-To: <20110601084948.GA23592@thebird.nl> References: <20110601073016.GB22723@thebird.nl> <2738C8712A2F46BAB655CE885CDF4F89@googlemail.com> <20110601084948.GA23592@thebird.nl> Message-ID: <23D33897-ACAC-47B0-85D1-A3A808D46B48@ingm.org> what about to automate this process on our wiki :-)? $# gem search -r bio- bio-assembly (0.1.0) bio-blastxmlparser (0.6.1) bio-bwa (0.2.2) bio-cnls_screenscraper (0.1.0) bio-emboss_six_frame_nucleotide_sequences (0.1.0) bio-gem (0.2.2) bio-genomic-interval (0.1.2) bio-gex (0.0.0) bio-gff3 (0.8.6) bio-graphics (1.4) bio-hello (0.0.0) bio-isoelectric_point (0.1.1) bio-kb-illumina (0.1.0) bio-lazyblastxml (0.4.0) bio-logger (0.9.0) bio-nexml (0.0.1) bio-octopus (0.1.1) bio-samtools (0.2.1) bio-sge (0.0.0) bio-tm_hmm (0.2.0) bio-ucsc-api (0.0.4) wow quite long list of plugins :-) I'm happy to see this boiling soup On 01/giu/2011, at 10.49, Pjotr Prins wrote: > The general idea is to have a number of 'blessed' plugins tied to > BioRuby releases. A blessed plugin is supposed to be rather solid, > and have a level of documentation and testing. > > In addition there are 'development' plugins. Both should be listed on > the plugin page. We are introducing that plumbing shortly. The > duplication of work merely points out we need to get this done ;) > > It is interesting to note both XML parsers use lazy iterators. I also > do lazy conversions. Same for my GFF3 plugin. Rob, be good to compare > performance on some real-life data. > > Pj. > > On Wed, Jun 01, 2011 at 04:33:36PM +0800, Rob Syme wrote: >> I think that the list at >> http://bioruby.open-bio.org/wiki/BioRuby_Plugins is pretty >> comprehensive, my mistake was simply not looking. 
>> -r >> >> >> On Wed, Jun 1, 2011 at 4:25 PM, Philipp Comans >> wrote: >>> Hi, >>> >>> I had a similar problem recently. I needed an efficient parser for Blast XML results and I discovered that the default parser in BioRuby was not suitable. So I wrote my own using Nokogiri. >>> In my opinion it is way too hard at the moment to discover BioPlugins. When people use the default XML or GFF parser that comes with BioRUby, they do not expect that there is another, more efficient version. There should be a section on the front page or even in the corresponding parts of the API documentation that makes people aware of the existence of these efficient parsers. >>> >>> BTW thank you all for BioRuby, I used in a project recently and it made my life tremendously easier. >>> >>> Cheers, >>> >>> Philipp >>> >> > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby -- The only change to succeed is starting from a simple thing. From rob.syme at gmail.com Wed Jun 1 12:26:25 2011 From: rob.syme at gmail.com (Rob Syme) Date: Wed, 1 Jun 2011 20:26:25 +0800 Subject: [BioRuby] Parsing large Blast xml files - a new bioruby plugin In-Reply-To: <20110601084948.GA23592@thebird.nl> References: <20110601073016.GB22723@thebird.nl> <2738C8712A2F46BAB655CE885CDF4F89@googlemail.com> <20110601084948.GA23592@thebird.nl> Message-ID: I pushed a 1.4GB file through each of the parsers, simply counting the number of hits per iteration: user system total real Rob: 91.510000 0.620000 92.130000 ( 92.527617) Pjotr: 46.730000 0.430000 47.160000 ( 47.263949) One of the important differences in the parsers is that mine is lazy 'all the way down', in that the iterations are lazy, the hits are lazy and the hsps are lazy. No large chunks of XML are ever buffered into a string and then parsed together. 
While lazy-loading is a good idea, and should probably be adopted in more of the BioRuby core, taking it to this extreme is a bit silly. Pjotr's (more sensible) approach is to chunk up the file by iterations, and then use XPath to pull out the relevant information from there. One iteration will never be more than a few kb - certainly no strain on memory consumption. The IO strain of reading a file in tiny pieces looks to be the cause of the 2x slowdown in the example above. Lesson 1: Pragmatism is a good thing. Lesson 2: Always check to make sure work you're doing hasn't been done before Lesson 3: Use Pjotr's parser to make light work of your large Blast results. -r On Wed, Jun 1, 2011 at 4:49 PM, Pjotr Prins wrote: > The general idea is to have a number of 'blessed' plugins tied to > BioRuby releases. A blessed plugin is supposed to be rather solid, > and have a level of documentation and testing. > > In addition there are 'development' plugins. Both should be listed on > the plugin page. We are introducing that plumbing shortly. The > duplication of work merely points out we need to get this done ;) > > It is interesting to note both XML parsers use lazy iterators. I also > do lazy conversions. Same for my GFF3 plugin. Rob, be good to compare > performance on some real-life data. > > Pj. > > On Wed, Jun 01, 2011 at 04:33:36PM +0800, Rob Syme wrote: > > I think that the list at > > http://bioruby.open-bio.org/wiki/BioRuby_Plugins is pretty > > comprehensive, my mistake was simply not looking. > > -r > > > > > > On Wed, Jun 1, 2011 at 4:25 PM, Philipp Comans > > wrote: > > > Hi, > > > > > > I had a similar problem recently. I needed an efficient parser for > Blast XML results and I discovered that the default parser in BioRuby was > not suitable. So I wrote my own using Nokogiri. > > > In my opinion it is way too hard at the moment to discover BioPlugins. 
> When people use the default XML or GFF parser that comes with BioRUby, they > do not expect that there is another, more efficient version. There should be > a section on the front page or even in the corresponding parts of the API > documentation that makes people aware of the existence of these efficient > parsers. > > > > > > BTW thank you all for BioRuby, I used in a project recently and it made > my life tremendously easier. > > > > > > Cheers, > > > > > > Philipp > > > > > > From yannick.wurm at unil.ch Mon Jun 13 05:49:39 2011 From: yannick.wurm at unil.ch (Yannick Wurm) Date: Mon, 13 Jun 2011 12:49:39 +0700 Subject: [BioRuby] ruby BLAST server (web frontend) References: Message-ID: <4E447EC6-D36D-42DF-85B7-E199E7E78042@unil.ch> Dear list & CC-ed, let me quote a discussion from a while back ( http://answerpot.com/showthread.php?1292835-rails+blast+server ): > I'd like to set up a small server for people to run BLAST against some of my sequences & see the results. > GMOD obviously comes to mind, but it seems like overkill. > And perhaps there is an almost automagic way to do this with ruby on rails. Has anyone done this yet? There was no good solution at the time. Anurag Priyam & I have since been working on something that fills this need. Ben Woodcroft has recently been contributing as well. Check: https://github.com/yannickwurm/sequenceserver or http://www.sequenceserver.com Some things remain to be improved. But globally the software works great. Thus we thought to share our progress on the list that initiated it. An excerpt of the README highlights some features: Ease of use for biologists: * intuitive and helpful web interface: automatic sequence type detection that helps choose appropriate BLAST method and database types * links to easily download sequences of BLAST hits * support for advanced options. 
Rapid deployment for bioinformatics administrators: * assisted formatting of BLAST databases (with sequence type detection) * automatic discovery of formatted BLAST databases during startup * uses ruby's internal web server (on any open port) or Apache * add custom hyperlinks from hits (to your genome browser or custom database). We have been using this as the web frontend for our ant genome blast at http://www.antgenomes.org since a few months. Comments, suggestions... and contributions are most welcome! Cheers, Anurag & Ben & Yannick ----------------------------- Ant Genomes & Evolution http://yannick.poulet.org skype://yannickwurm ----------------------------- From bonnal at ingm.org Mon Jun 13 07:17:21 2011 From: bonnal at ingm.org (Raoul Bonnal) Date: Mon, 13 Jun 2011 09:17:21 +0200 Subject: [BioRuby] ruby BLAST server (web frontend) In-Reply-To: <4E447EC6-D36D-42DF-85B7-E199E7E78042@unil.ch> References: <4E447EC6-D36D-42DF-85B7-E199E7E78042@unil.ch> Message-ID: Dear Yannick and other, cute work. Just few suggestions. you could build a gem and distribute is then with a single executable script "sequenceserver" you can call all other tasks, configuration, database or starting the service like we did with biongs; it's a more consistent approach and the end user has a clear reference to your application. Installing it as gem then you need to build a web environment somewhere else but it is quite simple to create a scaffold directory ready to be used by a web server (where you put your configuration/database ref, public, js, css etc.) 
something like: sequenceserver database_formatter directory_with_fasta_files sequenceserver config production --bin="~/ncbi-blast-2.2.24+/bin/" --database="/Users/me/blast_databases/" sequenceserver start then if your application runs on ruby 1.87, try REE with passenger and nginx, in my opinion is the easiest web server (NGINX) with high level of performances http://www.modrails.com/ if you need help to configure nginx I can give you some hint or example of my config, it works well with rvm as well. could this became a bioruby plugin ? On 13/giu/2011, at 07.49, Yannick Wurm wrote: > Dear list & CC-ed, > > let me quote a discussion from a while back ( http://answerpot.com/showthread.php?1292835-rails+blast+server ): > >> I'd like to set up a small server for people to run BLAST against some of my sequences & see the results. >> GMOD obviously comes to mind, but it seems like overkill. >> And perhaps there is an almost automagic way to do this with ruby on rails. Has anyone done this yet? > > > There was no good solution at the time. Anurag Priyam & I have since been working on something that fills this need. Ben Woodcroft has recently been contributing as well. Check: > https://github.com/yannickwurm/sequenceserver or http://www.sequenceserver.com > > Some things remain to be improved. But globally the software works great. Thus we thought to share our progress on the list that initiated it. An excerpt of the README highlights some features: > > Ease of use for biologists: > * intuitive and helpful web interface: automatic sequence type detection that helps choose appropriate BLAST method and database types > * links to easily download sequences of BLAST hits > * support for advanced options. 
> > Rapid deployment for bioinformatics administrators: > * assisted formatting of BLAST databases (with sequence type detection) > * automatic discovery of formatted BLAST databases during startup > * uses ruby's internal web server (on any open port) or Apache > * add custom hyperlinks from hits (to your genome browser or custom database). > > > We have been using this as the web frontend for our ant genome blast at http://www.antgenomes.org since a few months. > > Comments, suggestions... and contributions are most welcome! > > Cheers, > > Anurag & Ben & Yannick > > > > ----------------------------- > Ant Genomes & Evolution > http://yannick.poulet.org > skype://yannickwurm > ----------------------------- > > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From yannick.wurm at unil.ch Mon Jun 13 08:06:43 2011 From: yannick.wurm at unil.ch (Yannick Wurm) Date: Mon, 13 Jun 2011 15:06:43 +0700 Subject: [BioRuby] ruby BLAST server (web frontend) In-Reply-To: References: <4E447EC6-D36D-42DF-85B7-E199E7E78042@unil.ch> Message-ID: Thanks for the suggestions Raoul, they could substantially streamline setting things up! cheers yannick On 13 Jun 2011, at 14:17, Raoul Bonnal wrote: > Dear Yannick and other, > cute work. > Just few suggestions. > you could build a gem and distribute is then with a single executable script "sequenceserver" you can call all other tasks, > configuration, database or starting the service like we did with biongs; it's a more consistent approach and the end user has a clear reference to your application. > Installing it as gem then you need to build a web environment somewhere else but it is quite simple to create a scaffold directory ready to be used by a web server (where you put your configuration/database ref, public, js, css etc.) 
> something like: > > sequenceserver database_formatter directory_with_fasta_files > sequenceserver config production --bin="~/ncbi-blast-2.2.24+/bin/" --database="/Users/me/blast_databases/" > sequenceserver start > > then if your application runs on ruby 1.87, try REE with passenger and nginx, in my opinion is the easiest web server (NGINX) with high level of performances http://www.modrails.com/ > > if you need help to configure nginx I can give you some hint or example of my config, it works well with rvm as well. > > could this became a bioruby plugin ? > > > > > > On 13/giu/2011, at 07.49, Yannick Wurm wrote: > >> Dear list & CC-ed, >> >> let me quote a discussion from a while back ( http://answerpot.com/showthread.php?1292835-rails+blast+server ): >> >>> I'd like to set up a small server for people to run BLAST against some of my sequences & see the results. >>> GMOD obviously comes to mind, but it seems like overkill. >>> And perhaps there is an almost automagic way to do this with ruby on rails. Has anyone done this yet? >> >> >> There was no good solution at the time. Anurag Priyam & I have since been working on something that fills this need. Ben Woodcroft has recently been contributing as well. Check: >> https://github.com/yannickwurm/sequenceserver or http://www.sequenceserver.com >> >> Some things remain to be improved. But globally the software works great. Thus we thought to share our progress on the list that initiated it. An excerpt of the README highlights some features: >> >> Ease of use for biologists: >> * intuitive and helpful web interface: automatic sequence type detection that helps choose appropriate BLAST method and database types >> * links to easily download sequences of BLAST hits >> * support for advanced options. 
>> >> Rapid deployment for bioinformatics administrators: >> * assisted formatting of BLAST databases (with sequence type detection) >> * automatic discovery of formatted BLAST databases during startup >> * uses ruby's internal web server (on any open port) or Apache >> * add custom hyperlinks from hits (to your genome browser or custom database). >> >> >> We have been using this as the web frontend for our ant genome blast at http://www.antgenomes.org since a few months. >> >> Comments, suggestions... and contributions are most welcome! >> >> Cheers, >> >> Anurag & Ben & Yannick >> >> >> >> ----------------------------- >> Ant Genomes & Evolution >> http://yannick.poulet.org >> skype://yannickwurm >> ----------------------------- >> >> _______________________________________________ >> BioRuby Project - http://www.bioruby.org/ >> BioRuby mailing list >> BioRuby at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioruby > ----------------------------- Ant Genomes & Evolution http://yannick.poulet.org skype://yannickwurm ----------------------------- BLAST @ http://antgenomes.org From anurag08priyam at gmail.com Mon Jun 13 16:10:53 2011 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Mon, 13 Jun 2011 21:40:53 +0530 Subject: [BioRuby] ruby BLAST server (web frontend) In-Reply-To: References: <4E447EC6-D36D-42DF-85B7-E199E7E78042@unil.ch> Message-ID: > cute work. Thanks a lot Raoul :). > Just few suggestions. > you could build a gem and distribute is then with a single executable script "sequenceserver" you can call all other tasks, > configuration, database or starting the service like we did with biongs; it's a more consistent approach and the end user has a clear reference to your application. Agreed. And that is our target for the next release. 
> Installing it as gem then you need to build a web environment somewhere else but it is quite simple to create a scaffold directory ready to be used by a web server (where you put your configuration/database ref, public, js, css etc.) > something like: > > sequenceserver database_formatter directory_with_fasta_files > sequenceserver config production --bin="~/ncbi-blast-2.2.24+/bin/" --database="/Users/me/blast_databases/" > sequenceserver start This looks quite good. I will keep this in mind when pushing forward a gem release. >> then if your application runs on ruby 1.87, try REE with passenger and nginx, in my opinion is the easiest web server (NGINX) with high level of performances http://www.modrails.com/ > > if you need help to configure nginx I can give you some hint or example of my config, it works well with rvm as well. That would be great. We are putting forward a wiki page with instructions on deploying SequenceServer on Apache, and Nginix. I am almost done adding instructions for Apache, but I am not sure how to do it for Nginix. > could this became a bioruby plugin ? So, then would it become bio-sequenceserver? IMO, it doesn't logically fit in as a BioRuby plugin, as in it doesn't depend on BioRuby. And BioRuby is more like library but SequenceServer is more like an end product. Not sure though :-|. -- Anurag Priyam http://about.me/yeban/ From anurag08priyam at gmail.com Mon Jun 13 16:12:25 2011 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Mon, 13 Jun 2011 21:42:25 +0530 Subject: [BioRuby] ruby BLAST server (web frontend) In-Reply-To: References: <4E447EC6-D36D-42DF-85B7-E199E7E78042@unil.ch> Message-ID: >> if you need help to configure nginx I can give you some hint or example of my config, it works well with rvm as well. > > That would be great. We are putting forward a wiki page with > instructions on deploying SequenceServer on Apache, and Nginix. 
I am > almost done adding instructions for Apache, but I am not sure how to > do it for Nginix. Oops, forgot to add the link: https://github.com/yannickwurm/sequenceserver/wiki/Deploying-Sequence-Server -- Anurag Priyam http://about.me/yeban/ From donttrustben at gmail.com Tue Jun 14 13:19:37 2011 From: donttrustben at gmail.com (Ben Woodcroft) Date: Tue, 14 Jun 2011 23:19:37 +1000 Subject: [BioRuby] ruby BLAST server (web frontend) In-Reply-To: References: <4E447EC6-D36D-42DF-85B7-E199E7E78042@unil.ch> Message-ID: Hi, > > could this became a bioruby plugin ? > > So, then would it become bio-sequenceserver? IMO, it doesn't logically > fit in as a BioRuby plugin, as in it doesn't depend on BioRuby. And > BioRuby is more like library but SequenceServer is more like an end > product. Not sure though :-|. > To be technical, the branch trying to implement the blast overview graphic does rely on BioRuby, since that is a dependency of bio-graphics. But that branch hasn't been merged into the main tree yet, and might remain an optional thing anyway. -- Ben J Woodcroft, BE (Hons) PhD Candidate Ralph Laboratory The University of Melbourne Melbourne, Australia tel: (+613) 8344 2319 b.woodcroft at pgrad.unimelb.edu.au From pjotr.public14 at thebird.nl Tue Jun 14 13:26:54 2011 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Tue, 14 Jun 2011 15:26:54 +0200 Subject: [BioRuby] ruby BLAST server (web frontend) In-Reply-To: References: <4E447EC6-D36D-42DF-85B7-E199E7E78042@unil.ch> Message-ID: <20110614132654.GA20916@thebird.nl> The advantages of making it a plugin: 1. easy install for users 2. visibility from the BioRuby project 3. potentially a member of the stable plugin family 4. developers may use your libraries - even if the focus is an application Pj. On Tue, Jun 14, 2011 at 11:19:37PM +1000, Ben Woodcroft wrote: > Hi, > > > > > could this became a bioruby plugin ? > > > > So, then would it become bio-sequenceserver? 
> > IMO, it doesn't logically
> > fit in as a BioRuby plugin, as in it doesn't depend on BioRuby. And
> > BioRuby is more like a library but SequenceServer is more like an end
> > product. Not sure though :-|.
> >
>
> To be technical, the branch trying to implement the blast overview graphic
> does rely on BioRuby, since that is a dependency of bio-graphics. But that
> branch hasn't been merged into the main tree yet, and might remain an
> optional thing anyway.
>
> --
> Ben J Woodcroft, BE (Hons)
>
> PhD Candidate
> Ralph Laboratory
> The University of Melbourne
> Melbourne, Australia
>
> tel: (+613) 8344 2319
> b.woodcroft at pgrad.unimelb.edu.au

From mail at michaelbarton.me.uk Thu Jun 23 15:05:48 2011
From: mail at michaelbarton.me.uk (Michael Barton)
Date: Thu, 23 Jun 2011 11:05:48 -0400
Subject: [BioRuby] GFF3 Record Equality Method
Message-ID: <20110623150548.GA1030@Michael-Bartons-MacBook.local>

As far as I can tell, the GFF3 record in bioruby uses Object#== for
comparison. I'm implementing a Bio::GFF::GFF3::Record#== method based
on comparison of the GFF3 fields. Would this be a useful addition to
the bioruby library?

Cheers

Michael Barton

From ngoto at gen-info.osaka-u.ac.jp Fri Jun 24 12:41:29 2011
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Fri, 24 Jun 2011 21:41:29 +0900
Subject: [BioRuby] GFF3 Record Equality Method
In-Reply-To: <20110623150548.GA1030@Michael-Bartons-MacBook.local>
References: <20110623150548.GA1030@Michael-Bartons-MacBook.local>
Message-ID: <20110624124129.C00871CBC47D@idnmail.gen-info.osaka-u.ac.jp>

On Thu, 23 Jun 2011 11:05:48 -0400
Michael Barton wrote:

> As far as I can tell the GFF3 record in bioruby uses Object#== for comparison.
> I'm implementing a Bio::GFF::GFF3::Record#== method based on comparison of the
> GFF3 fields. Would this be a useful addition to bioruby library?
>
> Cheers
>
> Michael Barton

Bio::GFF::GFF3::Record inherits from Bio::GFF::GFF2::Record, and
GFF2::Record already has its own == method. GFF2::Record#== gives
enough functionality for comparing GFF3 records, in addition to GFF2
records.

#sample code
#-----------------------------------------------------------
require 'bio'

# One GFF3 line, split across several string literals for readability
str1 = "chrI\tSGD\tcentromere\t151467\t151584\t.\t+\t.\t" +
  "ID=CEN1;Name=CEN1;gene=CEN1;Alias=CEN1,test%3B0001;" +
  "Note=Chromosome I centromere;dbxref=SGD:S000006463;" +
  "Target=test%2002 123 456 -,test%2C03 159 314;" +
  "memo%3Dtest%3Battr=99.9%25%09match"
# A separate String object with identical content
str2 = str1.dup
# The same line with every CEN1 replaced by CEN2
str3 = str1.gsub(/CEN1/, 'CEN2')

obj0 = Bio::GFF::GFF3::Record.new(str1)
obj1 = Bio::GFF::GFF3::Record.new(str1)
obj2 = Bio::GFF::GFF3::Record.new(str2)
obj3 = Bio::GFF::GFF3::Record.new(str3)

p obj0==obj1   # two records parsed from the same string
p obj1==obj2   # records parsed from equal but distinct strings
p obj1==obj3   # records whose attributes differ
#-----------------------------------------------------------

--
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org

From andrew.j.grimm at gmail.com Sun Jun 26 10:16:21 2011
From: andrew.j.grimm at gmail.com (Andrew Grimm)
Date: Sun, 26 Jun 2011 20:16:21 +1000
Subject: [BioRuby] Anyone else attending RubyKaigi 2011?
Message-ID:

I noticed that Goto-san's talk got accepted as a lightning talk. Are
any other BioRuby contributors or users attending?

I'll be giving a talk, but I'll only briefly mention bioinformatics.
I'll be talking about the Small Eigen Collider. In describing why I
created the Small Eigen Collider, I'll mention that I'm a
bioinformatician, and that I deal with enough information that I am
tempted to run Ruby code under implementations other than YARV.

http://rubykaigi.org/2011/en/schedule/details/18S03

Andrew