From pjotr2008 at thebird.nl Thu Jun 12 05:02:48 2008 From: pjotr2008 at thebird.nl (Pjotr Prins) Date: Thu, 12 Jun 2008 11:02:48 +0200 Subject: [BioRuby] BioRuby on github + lighthouse In-Reply-To: <0A349FF7-51BD-4546-8744-B342C4EA8950@michaelbarton.me.uk> References: <0A349FF7-51BD-4546-8744-B342C4EA8950@michaelbarton.me.uk> Message-ID: <20080612090248.GA32143@thebird.nl> Hi All, In view of my recent bibtex commit fiasco - which I thought was an improvement, but which was probably a regression, as N. pointed out before rolling it back - I favour moving the sources to a non-centralized repository. This will allow individual development, where the main maintainers can cherry-pick individual patches for inclusion in the stable and development trees. Toshiaki, both Jan and I want to ask you to check out this technology and take the lead by moving a 'blessed' branch into github. The alternative is that I do the same thing - in both cases you can continue as before, but some development will happen on git branches. Technology does not solve problems - like the general lack of activity in the source tree - but at least git will give people a sense of freedom. And it is up to the central maintainers what to include and what not, much like the role Linus plays in kernel development. As ever, with respect, Pj. From ktym at hgc.jp Thu Jun 12 16:32:46 2008 From: ktym at hgc.jp (Toshiaki Katayama) Date: Fri, 13 Jun 2008 05:32:46 +0900 Subject: [BioRuby] TogoWS (Re: BioRuby on github + lighthouse) In-Reply-To: <20080612090248.GA32143@thebird.nl> References: <0A349FF7-51BD-4546-8744-B342C4EA8950@michaelbarton.me.uk> <20080612090248.GA32143@thebird.nl> Message-ID: <99D301E3-DF12-4595-AB33-F0E74B7BA690@hgc.jp> Dear all, Sorry for my long absence after the BioHackathon held this February. However, I'm afraid that I can't spare enough time to act on your request for a while yet. Instead, I need to wrap up the outputs of the BioHackathon first.
The BioRuby team had focused on the generalized sequence model, and by completing that work I can provide a (hopefully) pretty nice feature: parsing any sequence database entry through a REST-like web service API. I hope all of you like the following idea and will help me finish the task by integrating GenBank, EMBL, UniProt and BioSQL with the new Bio::Sequence model, as we discussed during the Hackathon. A sample implementation (TogoWS) is now available at http://togows.dbcls.jp/site/rest.html where you can find links to retrieve database entries with Rails-like "pretty URLs" (sorry for the Japanese text; I'll provide an English version some time). For example, the plain GenBank entry HUMIGHAF is available at http://togows.dbcls.jp/entry/genbank/HUMIGHAF and you can obtain
 * the XML version at http://togows.dbcls.jp/entry/genbank/HUMIGHAF.xml
 * the FASTA version at http://togows.dbcls.jp/entry/genbank/HUMIGHAF.fasta
and, as this service is built on top of the BioRuby library, you can also parse the entry to obtain a specific field by appending the name of any BioRuby method of the Bio::GenBank class after a slash:
 * DEFINITION field: http://togows.dbcls.jp/entry/genbank/HUMIGHAF/definition
However, the methods for fetching a specific field vary from database to database, because the corresponding classes are implemented differently. Fortunately, this would be pretty easy to solve. We just need to convert the GenBank, EMBL, UniProt and BioSQL data models to the generic Bio::Sequence class and use the methods of the generic class. This is the same approach we agreed on during the Hackathon. Along with this, we need to define a set of generic methods to access the internal structure, and also a set of standard output formats (for features, references, cross-references, dates, etc.); this is the slightly tough part. For example, it would be great if I could extract the feature table in a reusable standard format like GFF (or [protein] DAS) instead of a YAML/XML dump of an array of Bio::Feature objects.
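The generic-accessor idea described above can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not TogoWS or BioRuby code: the Struct classes are hypothetical stand-ins whose field names only loosely mirror Bio::GenBank, Bio::EMBL and Bio::Sequence, and `ADAPTERS`, `FIELDS` and `field_for` are invented names. Each database-specific entry is first converted to one generic record, so a path like /entry/genbank/HUMIGHAF/definition resolves to the same method name whatever the source database. Unlike "any method with slash", only whitelisted field names are dispatched, which keeps arbitrary method calls out of the URL.

```ruby
# Hypothetical stand-ins for database-specific parser classes; each one
# names its fields differently, which is the problem described above.
GenBankEntry = Struct.new(:definition, :locus_name, :naseq)
EMBLEntry    = Struct.new(:description, :entry_id, :naseq)

# Hypothetical generic record playing the role of Bio::Sequence.
GenericSequence = Struct.new(:definition, :entry_id, :seq)

# One adapter per source database, mapping its own field names
# onto the generic record.
ADAPTERS = {
  "genbank" => ->(e) { GenericSequence.new(e.definition, e.locus_name, e.naseq) },
  "embl"    => ->(e) { GenericSequence.new(e.description, e.entry_id, e.naseq) }
}.freeze

# Only these generic field names may be reached through a URL segment.
FIELDS = %w[definition entry_id seq].freeze

# Resolve a path such as "/entry/genbank/HUMIGHAF/definition" against
# +store+, a Hash keyed by [database, id] holding parsed entries.
def field_for(path, store)
  _, kind, db, id, field = path.split("/")
  unless kind == "entry" && ADAPTERS.key?(db) && FIELDS.include?(field)
    raise ArgumentError, "unsupported path: #{path}"
  end
  entry = store.fetch([db, id]) # raises KeyError for unknown entries
  ADAPTERS.fetch(db).call(entry).public_send(field)
end
```

With a GenBank and an EMBL entry in the store, `field_for("/entry/genbank/HUMIGHAF/definition", store)` and `field_for("/entry/embl/EXAMPLE1/definition", store)` both go through the same `definition` accessor, even though the underlying classes call that field different things.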
(The following are not yet implemented, but should return the same result.)
 http://togows.dbcls.jp/entry/genbank/J00231.gff
 http://togows.dbcls.jp/entry/genbank/J00231/features
 http://togows.dbcls.jp/entry/embl/J00231.gff
 http://togows.dbcls.jp/entry/embl/J00231/features
 :
All we need is to list the method names and return values (formats) that can be used in common with any sequence database entry. Pj, you may also want to have something like
 http://togows.dbcls.jp/entry/pubmed/16381885.bibtex
 http://togows.dbcls.jp/entry/pubmed/16381885.endnote
 http://togows.dbcls.jp/entry/pubmed/16381885/url
and these are trivial to implement: just add the appropriate methods to the Bio::Reference class. For this purpose, I won't hesitate to change internal logic/APIs as you did, as long as the changes are reasonable. I'm also planning to provide a search interface and converters in a similar way. Converters will include BLAST output to GFF (maybe by using BioPerl :) etc. The outcomes of BioHackathon 2008 were fairly diverse, but I think this approach is one way to evolve the basic infrastructure of bioinformatics resources towards useful integration. Actually, the real problem is that I'm still busy with other tasks and can't devote 100% of my effort to these... Regards, Toshiaki Katayama On 2008/06/12, at 18:02, Pjotr Prins wrote: > Hi All, > > In view of my recent bibtex commit fiasco - which I thought an > improvement, but probably was a regression as N. pointed out and > rolled back - I favour moving the sources to a non-centralized > repository. This will allow individual development where the main > maintainers can cherry-pick individual patches for inclusion in the > stable and development trees. > > Toshiaki, both Jan and I want to ask you to check out this technology > and take the lead by moving a 'blessed' branch into github. The > alternative is that I do the same thing - both cases will allow you > to continue as before, but some development will be on git branches.
> > Technology does not solve problems - like the problem of lack general > action in the source tree - but at least git will allow people to have > a sense of freedom. And it is up to the central maintainers what to > include and what not. Much like the role Linus plays in kernel > development. > > As ever, with respect, > > Pj. > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From pjotr2008 at thebird.nl Thu Jun 12 21:38:09 2008 From: pjotr2008 at thebird.nl (Pjotr Prins) Date: Fri, 13 Jun 2008 03:38:09 +0200 Subject: [BioRuby] bioruby-testing-central on github Message-ID: <20080613013809.GA6689@thebird.nl> The Bioruby repository has been cloned to bioruby-testing-central: http://github.com/pjotrp/bioruby-testing-central/tree/master The convention is to name your repository as 'bioruby-testing-yourname'. So my version will be bioruby-testing-pjotr. If you register yourself with github I can add you as a collaborator. Note: we are *not* competing with the main Bioruby tree - this is a facility to encourage code submissions. It is up to the main Bioruby maintainers whether stuff gets included in the main tree. This is a bioruby-testing tree. Clone the central repository with: git clone git://github.com/pjotrp/bioruby-testing-central.git You don't need to register for that. Patches can be submitted over E-mail. For using git see the tutorial at: http://kernel.org/pub/software/scm/git/docs/gittutorial.html From pjotr2008 at thebird.nl Sat Jun 14 10:07:56 2008 From: pjotr2008 at thebird.nl (Pjotr Prins) Date: Sat, 14 Jun 2008 16:07:56 +0200 Subject: [BioRuby] bioruby-testing-central on github Message-ID: <20080614140756.GA21822@thebird.nl> I have kicked off including support for microarrays in Bioruby with an Affymetrix CEL file reader (which is based on Ben Bolstad's Affyio, part of R/Bioconductor). The mapping is done in biolib (http://biolib.open-bio.org/). 
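In the CEL file reader described above, biolib does the real parsing by wrapping Affyio's C code. Purely to illustrate what reading intensities from a CEL file involves, here is a pure-Ruby sketch. It assumes the older text (version 3) CEL layout, in which an [INTENSITY] section holds a CellHeader line followed by one tab-separated row per cell; binary CEL variants are not handled. `cel_intensities` and `read_cel` are hypothetical helpers, not part of Bioruby or biolib. Gzipped input is detected from the gzip magic bytes rather than from the file name.

```ruby
require "zlib"

GZIP_MAGIC = [0x1f, 0x8b].pack("C2") # first two bytes of any gzip stream

# Collect the MEAN column of the [INTENSITY] section of a text-format
# (version 3) CEL file, in file order. Sketch only: binary CEL files
# are not supported.
def cel_intensities(io)
  intensities = []
  section = nil
  mean_col = nil
  io.each_line do |raw|
    line = raw.strip
    next if line.empty?
    if (m = line.match(/\A\[(\w+)\]\z/))
      section = m[1] # e.g. "CEL", "HEADER", "INTENSITY", "MASKS"
      next
    end
    next unless section == "INTENSITY"
    if line.start_with?("CellHeader=")
      # Locate MEAN among the column names (typically X Y MEAN STDV NPIXELS).
      mean_col = line.sub("CellHeader=", "").split(/\s+/).index("MEAN")
    elsif mean_col && !line.include?("=")
      # Data row: skip key=value lines such as NumberCells=...
      intensities << Float(line.split(/\s+/)[mean_col])
    end
  end
  intensities
end

# Open a plain or gzip-compressed CEL file and return its intensities.
def read_cel(path)
  File.open(path, "rb") do |f|
    gzipped = f.read(2) == GZIP_MAGIC
    f.rewind
    return cel_intensities(f) unless gzipped
    gz = Zlib::GzipReader.new(f)
    begin
      cel_intensities(gz)
    ensure
      gz.finish # release zlib state; File.open closes f itself
    end
  end
end
```

For a text-format file, `read_cel(fn).first(21)` would return the same kind of values that the `intensity(i)` loop prints, assuming cells are listed in file order.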
The Bioruby sources are at:
http://github.com/pjotrp/bioruby-testing-central/commit/5a32ff510208228b61483ee683d386ccbc3d87f2
Simple file loading works, e.g.:

  require 'bio' # assumes the bioruby-testing-central tree (with biolib) is on the load path

  # print the first 21 intensities of each CEL file given on the command line
  ARGV.each do |fn|
    array = Bio::Microarray::Affy.new(fn)
    (0..20).each do |i|
      print array.intensity(i), ", "
    end
    puts
  end

The next steps are probe(set) mapping and support for regular tab-delimited files and CSVs (a bit like read.table in R). If anyone is interested in participating... Pj. From mail at michaelbarton.me.uk Sat Jun 14 10:33:27 2008 From: mail at michaelbarton.me.uk (Michael Barton) Date: Sat, 14 Jun 2008 15:33:27 +0100 Subject: [BioRuby] bioruby-testing-central on github In-Reply-To: <20080613013809.GA6689@thebird.nl> References: <20080613013809.GA6689@thebird.nl> Message-ID: That looks really good. I think BioRuby being the first bio* library to use git and Github for distributed revision control is a really great step, and demonstrates the forward thinking of the BioRuby community. On 13 Jun 2008, at 02:38, Pjotr Prins wrote: > The Bioruby repository has been cloned to bioruby-testing-central: > > http://github.com/pjotrp/bioruby-testing-central/tree/master > > The convention is to name your repository as 'bioruby-testing- > yourname'. > So my version will be bioruby-testing-pjotr. If you register yourself > with github I can add you as a collaborator. Note: we are *not* > competing with the main Bioruby tree - this is a facility to > encourage code submissions. It is up to the main Bioruby maintainers > whether stuff gets included in the main tree. This is a > bioruby-testing tree. > > Clone the central repository with: > > git clone git://github.com/pjotrp/bioruby-testing-central.git > > You don't need to register for that. Patches can be submitted over > E-mail.
> > For using git see the tutorial at: > > http://kernel.org/pub/software/scm/git/docs/gittutorial.html > > > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From email2ants at gmail.com Mon Jun 16 11:00:52 2008 From: email2ants at gmail.com (Anthony Underwood) Date: Mon, 16 Jun 2008 16:00:52 +0100 Subject: [BioRuby] Remaining work to be done after BioHakathon Message-ID: Dear Toshiaki and other biorubyists, I had a look at the new REST interface implemented at TogoWS. It looks really nice (and it's fast). You mention that there is work to be done to convert GenBank, EMBL, UniProt and BioSQL into the new Bio::Sequence model. What is the current status of this? Is the work/progress made at the BioHackathon publicly visible? I would love to contribute and help. How can I help? Perhaps the most recent changes could be added to the repository now on github? Thanks Anthony Dr Anthony Underwood Bioinformatics Unit | Statistics, Modelling and Bioinformatics Department Centre for Infections Health Protection Agency 61 Colindale Avenue London NW9 5HT t: 0208 3276466 f: 0208 3276738 e:anthony.underwood at hpa.org.uk From ngoto at gen-info.osaka-u.ac.jp Wed Jun 18 07:34:10 2008 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Wed, 18 Jun 2008 20:34:10 +0900 Subject: [BioRuby] Remaining work to be done after BioHakathon In-Reply-To: References: Message-ID: <20080618113412.11E401CBC465@idnmail.gen-info.osaka-u.ac.jp> Dear Anthony, On Mon, 16 Jun 2008 16:00:52 +0100 Anthony Underwood wrote: > Dear Toshiaki and other biorubyists, > > I had a look at the new REST interface implemented at TogoWS. It looks > really nice (and it's fast). You mention that there is work to be done > to convert GenBank, EMBL, UniProt, BioSQL in to the new Bio::Sequence > model. > > What is the current status of this?
The current status is:
 * Conversion of GenBank and EMBL from/to Bio::Sequence usually works fine, though some fields are not yet supported (for example, PROJECT, CONSRTM, SEGMENT and CONTIG in GenBank).
 * Code to support BioSQL from/to Bio::Sequence has been added, but is not well tested.
 * UniProt from/to Bio::Sequence is not supported.
 * There are a few documents and unit tests for the Bio::Sequence model and related code.
> Is the work/progress made at the BioHackathon publicly visible? It is stored in the CVS branch BRANCH-biohackathon2008. You can get it via anonymous CVS or CVSWeb; please see http://www.open-bio.org/wiki/SourceCode . Unfortunately, just now I couldn't access the anonymous CVS, which suggests trouble on code.open-bio.org. (Note that the CVS repository for developers is OK.) > I would love to contribute and help - how can I help - perhaps the > most recent changes can be added to the repository now on github? For now, CVS is still used for development. In the future, we will move to svn and/or git (or Mercurial?). Currently, I don't know whether changes made in the CVS are pushed into the github repository. Thanks, -- Naohisa Goto ng at bioruby.org / ngoto at gen-info.osaka-u.ac.jp From pjotr2008 at thebird.nl Wed Jun 18 08:37:24 2008 From: pjotr2008 at thebird.nl (Pjotr Prins) Date: Wed, 18 Jun 2008 14:37:24 +0200 Subject: [BioRuby] Remaining work to be done after BioHakathon In-Reply-To: <20080618113412.11E401CBC465@idnmail.gen-info.osaka-u.ac.jp> References: <20080618113412.11E401CBC465@idnmail.gen-info.osaka-u.ac.jp> Message-ID: <20080618123724.GA30925@thebird.nl> On Wed, Jun 18, 2008 at 08:34:10PM +0900, Naohisa GOTO wrote: > Currently, I don't know whether changes made in the CVS are > pushed into the github repository. According to this: http://issaris.blogspot.com/2005/11/cvs-to-git-and-back.html it is feasible to import the changelog.
My github bioruby-testing tree is not synchronized with the changelog. I don't think that is necessary because, as it stands, diffs will be patched into the main tree when required and get added to the changelog that way. However, the main tree, were it to be hosted on github, could contain the CVS log using the tools mentioned in that blog, and I will rebranch the testing tree from the main tree once you get to that point, so as to have the main changelog again. Note that Jan has reserved the bioruby name on github for us. Pj. P.S. After having used both mercurial (for most of my projects) and now git, I am convinced git is the better choice for Bioruby, mostly because it nicely supports handling a central repository (mercurial has no obvious model for that) and because, thanks to the Linux kernel, it gets loads of developer attention; e.g. CVS/SVN mapping and github itself are major and useful functionalities. I think you will like it (certainly coming from CVS). I have heavily deployed darcs, mercurial, svn and now git. With all of them I have had conflicts and broken repositories. darcs was nice, but often broke with larger repositories; mercurial is really nice, though its conflict resolution can be non-obvious with merges; svn is better than CVS, but not a major step forward (I particularly hate its server deployment, and the BDB backend tends to get upset when ssh and web-service access are mixed). So now we have git. git has impressed me by being so mature and useful. I'll pick it as the winner for large development efforts. From mail at michaelbarton.me.uk Wed Jun 18 09:25:33 2008 From: mail at michaelbarton.me.uk (Michael Barton) Date: Wed, 18 Jun 2008 14:25:33 +0100 Subject: [BioRuby] bioruby-testing-central on github In-Reply-To: References: <20080613013809.GA6689@thebird.nl> Message-ID: I'd like to raise a point about forking vs cloning. In your email, Pjotr, you recommend that new users clone the bioruby repository, which is the way git is meant to be used.
However, since the project is on Github, it can be forked instead of cloned. Here is the Github blurb on this: "By forking a project instead of ((cloning, creating a new GitHub repo, and pushing to it)), you allow us to create a link between your fork and the original. This link helps us keep you informed of changes to the original codebase and make it trivial for you to notify the originator of changes that you have made and would like have reviewed." I think an additional, unmentioned advantage is that a fork only contains the differences between your repository and the original, rather than a complete clone of the repository, so this would save space too. Mike On 14 Jun 2008, at 15:33, Michael Barton wrote: > That looks really good. I think BioRuby being the first bio* library > to use git and Github for distributed revision control is a really > great step, and demonstrates the forward thinking of the BioRuby > community. > > On 13 Jun 2008, at 02:38, Pjotr Prins wrote: > >> The Bioruby repository has been cloned to bioruby-testing-central: >> >> http://github.com/pjotrp/bioruby-testing-central/tree/master >> >> The convention is to name your repository as 'bioruby-testing- >> yourname'. >> So my version will be bioruby-testing-pjotr. If you register >> yourself >> with github I can add you as a collaborator. Note: we are *not* >> competing with the main Bioruby tree - this is a facility to >> encourage code submissions. It is up to the main Bioruby maintainers >> whether stuff gets included in the main tree. This is a >> bioruby-testing tree. >> >> Clone the central repository with: >> >> git clone git://github.com/pjotrp/bioruby-testing-central.git >> >> You don't need to register for that. Patches can be submitted over >> E-mail.
>> >> For using git see the tutorial at: >> >> http://kernel.org/pub/software/scm/git/docs/gittutorial.html >> >> >> _______________________________________________ >> BioRuby mailing list >> BioRuby at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioruby > From pjotr2008 at thebird.nl Wed Jun 18 09:40:27 2008 From: pjotr2008 at thebird.nl (Pjotr Prins) Date: Wed, 18 Jun 2008 15:40:27 +0200 Subject: [BioRuby] bioruby-testing-central on github In-Reply-To: References: <20080613013809.GA6689@thebird.nl> Message-ID: <20080618134027.GA662@thebird.nl> In my E-mail I showed you can individually branch (read 'fork') from bioruby-testing-central. I have forked bioruby-testing-pjotr myself. bioruby-testing-central is a clone of Bioruby since Bioruby is hosted on CVS - not on git. I will fork once there is a git Bioruby repository. Sorry if it all is a bit confusing. We are waiting for Bioruby to migrate from CVS. Pj. On Wed, Jun 18, 2008 at 02:25:33PM +0100, Michael Barton wrote: > I'd like to raise a point about forking vs cloning. In your email > Pjotr you recommend that new users clone the bioruby repository. Which > is the way git is be used. However since the project is on Github, a > project can be forked instead of cloned. Here is the Github blurb on > this > > By forking a project instead of ((cloning, creating a new GitHub repo, > and pushing to it)), you allow us to create a link between your fork > and the original. This link helps us keep you informed of changes to > the original codebase and make it trivial for you to notify the > originator of changes that you have made and would like have reviewed. > > I think an additional unmentioned advantage is that a fork only > contains the differences between yours and an original. Rather than > the complete cloned repo, so this would save space too. > > Mike > > > On 14 Jun 2008, at 15:33, Michael Barton wrote: > > >That looks really good. 
I think BioRuby being the first bio* library > >to use git and Github for distributed revision control is a really > >great step, and demonstrates the forward thinking of the BioRuby > >community. > > > >On 13 Jun 2008, at 02:38, Pjotr Prins wrote: > > > >>The Bioruby repository has been cloned to bioruby-testing-central: > >> > >>http://github.com/pjotrp/bioruby-testing-central/tree/master > >> > >>The convention is to name your repository as 'bioruby-testing- > >>yourname'. > >>So my version will be bioruby-testing-pjotr. If you register > >>yourself > >>with github I can add you as a collaborator. Note: we are *not* > >>competing with the main Bioruby tree - this is a facility to > >>encourage code submissions. It is up to the main Bioruby maintainers > >>whether stuff gets included in the main tree. This is a > >>bioruby-testing tree. > >> > >>Clone the central repository with: > >> > >>git clone git://github.com/pjotrp/bioruby-testing-central.git > >> > >>You don't need to register for that. Patches can be submitted over > >>E-mail. > >> > >>For using git see the tutorial at: > >> > >>http://kernel.org/pub/software/scm/git/docs/gittutorial.html > >> > >> > >>_______________________________________________ > >>BioRuby mailing list > >>BioRuby at lists.open-bio.org > >>http://lists.open-bio.org/mailman/listinfo/bioruby > > > > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From mattscilipoti at possiamo.com Wed Jun 25 13:53:53 2008 From: mattscilipoti at possiamo.com (Matt Scilipoti) Date: Wed, 25 Jun 2008 13:53:53 -0400 Subject: [BioRuby] command not found: blastall and broken pipe from apache2 and passenger (mod_rails) Message-ID: <217377760806251053pdba9a18rff940d5d52814609@mail.gmail.com> I am attempting to use bioruby (blast) with a rails application. It works on my dev machine (OSX), but not the Production Server (suse sles). 
When I attempt to perform a blast query on the Production server I receive "Errno::EPIPE (Broken pipe)" and "command not found: blastall". The permissions and path look correct to me. Is it possible that there is an apache permission issue? This occurred intermittently when I was using a mongrel_cluster (it was usually fixed when I restarted the cluster manually through ssh; if I used capistrano to restart the cluster, that would not fix it).

Production server config:
  Apache2
  passenger 2.0.1 (mod_rails)

> which blastall
/usr/local/bin/blastall

> ls -lsa /usr/local/bin/blastall
0 lrwxrwxrwx 1 root root 40 2008-06-25 13:31 /usr/local/bin/blastall -> /usr/local/lib/blast-2.2.16/bin/blastall

> ls -lsa /usr/local/lib/blast-2.2.16/bin/blastall
4388 -rwxr-xr-x 1 mpr mpr 4488387 2007-03-25 10:28 /usr/local/lib/blast-2.2.16/bin/blastall

> echo $PATH
/usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/games:/opt/gnome/bin:/opt/kde3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin

Errors:
production.log:
Errno::EPIPE (Broken pipe):
  /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:163:in `write'
  /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:163:in `print'
  /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:163:in `query_command_popen'
  /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:161:in `popen'
  /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:161:in `query_command_popen'
  /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:148:in `query_command'
  /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb:245:in `exec_local'
  /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb:212:in `send'
  /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb:212:in `query'
  /app/models/blast_query.rb:83:in `query'

The apache error.log indicates:
/usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb:186: command not found: blastall -p blastp -d STYSiteAbstract -p blastp -m 7 -e 0.001 -F F -M Blosum62Phosbz -v 100 -b 100 -g F -T F -I T -U F -W 3

Thank you,
Matt -- Matt Scilipoti | Possiamo Consulting LLC | 443-538-8656 | Coaching, Training & Development From adamnkraut at gmail.com Wed Jun 25 14:30:19 2008 From: adamnkraut at gmail.com (Adam Kraut) Date: Wed, 25 Jun 2008 14:30:19 -0400 Subject: [BioRuby] command not found: blastall and broken pipe from apache2 and passenger (mod_rails) In-Reply-To: <217377760806251053pdba9a18rff940d5d52814609@mail.gmail.com> References: <217377760806251053pdba9a18rff940d5d52814609@mail.gmail.com> Message-ID: <134ede0b0806251130u3c542a70j7de8196a4ee63c95@mail.gmail.com> It's possible that 'blastall' is not in the path of the user running the apache process. On my system Apache runs as user 'www', so to run blastall from a rails app I would edit the system-wide profile in /etc/profile. You might also want to check how to set the user for Passenger as it may be different from Apache. Using the full path (/usr/local/bin/blastall) should also work for any user but I'm not sure if the bioruby wrapper lets you do this. Cheers, Adam On Wed, Jun 25, 2008 at 1:53 PM, Matt Scilipoti wrote: > I am attempting to use bioruby (blast) with a rails application. It > works on my dev machine (OSX), but not the Production Server (suse > sles). When I attempt to perform a blast query on the Production > server I receive "Errno::EPIPE (Broken pipe)" and "command not found: > blastall". The permissions and path look correct to me. Is it > possible that there is an apache permission issue? This occurred > intermittently when I was using a mongrel_cluster (usually fixed when > I restarted the cluster manually thru ssh. If I used capistrano to > restart the cluster, it would not fix it. 
> > Production server config: > Apache2 > passenger 2.0.1 (mod_rails) > > > which blastall > /usr/local/bin/blastall > > > ls -lsa /usr/local/bin/blastall > 0 lrwxrwxrwx 1 root root 40 2008-06-25 13:31 /usr/local/bin/blastall > -> /usr/local/lib/blast-2.2.16/bin/blastall > > > ls -lsa /usr/local/lib/blast-2.2.16/bin/blastall > 4388 -rwxr-xr-x 1 mpr mpr 4488387 2007-03-25 10:28 > /usr/local/lib/blast-2.2.16/bin/blastall > > > echo $PATH > > /usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/games:/opt/gnome/bin:/opt/kde3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin > > Errors: > production.log: > Errno::EPIPE (Broken pipe): > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:163:in `write' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:163:in `print' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:163:in > `query_command_popen' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:161:in `popen' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:161:in > `query_command_popen' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:148:in > `query_command' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb:245:in > `exec_local' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb:212:in > `send' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb:212:in > `query' > /app/models/blast_query.rb:83:in `query' > > > The apache error.log indicates: > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb:186: > command not found: blastall -p blastp -d STYSiteAbstract -p blastp -m > 7 -e 0.001 -F F -M Blosum62Phosbz -v 100 -b 100 -g F -T F -I T -U F -W > 3 > > Thank you, > Matt > -- > Matt Scilipoti | Possiamo Consulting LLC | 443-538-8656 | Coaching, > Training & Development > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby > From adamnkraut at gmail.com Wed Jun 25 15:59:59 2008 From: adamnkraut 
at gmail.com (Adam Kraut) Date: Wed, 25 Jun 2008 15:59:59 -0400 Subject: [BioRuby] command not found: blastall and broken pipe from apache2 and passenger (mod_rails) In-Reply-To: <217377760806251241l1180fe94q5f788900ee1aab9@mail.gmail.com> References: <217377760806251053pdba9a18rff940d5d52814609@mail.gmail.com> <134ede0b0806251130u3c542a70j7de8196a4ee63c95@mail.gmail.com> <217377760806251241l1180fe94q5f788900ee1aab9@mail.gmail.com> Message-ID: <20B316D5-B461-45D5-8E0C-F47ED5176466@gmail.com> One way to test the 'www' user's environment would be to switch to that user with 'su www' or 'sudo -u www -s'. That way you can check whether blastall is actually in $PATH. Regarding the per-server configuration, do you use multiple production servers, or do you mean the full path will be different between development and production? For the latter, you can set variables in development.rb and production.rb. Something like BLAST_PATH = '/path/to/blastall' can be set per environment, and in your application you then set factory.blastall = BLAST_PATH. That's the way I do it, but there may be better solutions out there. Also, if you reply through the bioruby list, more people can follow the discussion and offer advice. Best, Adam On Jun 25, 2008, at 3:41 PM, Matt Scilipoti wrote: > Thank you. > I tried adding the path to /etc/profile.local (recommended in SUSE), > but I was unsure how to ensure that 'www' was using the new > profile. I tried restarting apache, but the error still occurs. > Is a server restart necessary? This is difficult, but possible. > > The problem is solved by assigning the full path for blastall > (factory.blastall='/usr/local/bin/blastall'). But this solution > requires me to provide some configuration for each server, so a path > solution would be best. > > Thanks again, > Matt > > On Wed, Jun 25, 2008 at 2:30 PM, Adam Kraut > wrote: > It's possible that 'blastall' is not in the path of the user running > the apache process.
On my system Apache runs as user 'www', so to > run blastall from a rails app I would edit the system-wide profile > in /etc/profile. You might also want to check how to set the user > for Passenger as it may be different from Apache. Using the full > path (/usr/local/bin/blastall) should also work for any user but I'm > not sure if the bioruby wrapper lets you do this. > > Cheers, > Adam > > On Wed, Jun 25, 2008 at 1:53 PM, Matt Scilipoti > wrote: > I am attempting to use bioruby (blast) with a rails application. It > works on my dev machine (OSX), but not the Production Server (suse > sles). When I attempt to perform a blast query on the Production > server I receive "Errno::EPIPE (Broken pipe)" and "command not found: > blastall". The permissions and path look correct to me. Is it > possible that there is an apache permission issue? This occurred > intermittently when I was using a mongrel_cluster (usually fixed when > I restarted the cluster manually thru ssh. If I used capistrano to > restart the cluster, it would not fix it. 
> > Production server config: > Apache2 > passenger 2.0.1 (mod_rails) > > > which blastall > /usr/local/bin/blastall > > > ls -lsa /usr/local/bin/blastall > 0 lrwxrwxrwx 1 root root 40 2008-06-25 13:31 /usr/local/bin/blastall > -> /usr/local/lib/blast-2.2.16/bin/blastall > > > ls -lsa /usr/local/lib/blast-2.2.16/bin/blastall > 4388 -rwxr-xr-x 1 mpr mpr 4488387 2007-03-25 10:28 > /usr/local/lib/blast-2.2.16/bin/blastall > > > echo $PATH > /usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/games:/opt/gnome/ > bin:/opt/kde3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin > > Errors: > production.log: > Errno::EPIPE (Broken pipe): > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:163:in > `write' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:163:in > `print' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:163:in > `query_command_popen' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:161:in > `popen' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:161:in > `query_command_popen' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:148:in > `query_command' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb:245:in > `exec_local' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb: > 212:in `send' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb: > 212:in `query' > /app/models/blast_query.rb:83:in `query' > > > The apache error.log indicates: > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb:186: > command not found: blastall -p blastp -d STYSiteAbstract -p blastp -m > 7 -e 0.001 -F F -M Blosum62Phosbz -v 100 -b 100 -g F -T F -I T -U F -W > 3 > > Thank you, > Matt > -- > Matt Scilipoti | Possiamo Consulting LLC | 443-538-8656 | Coaching, > Training & Development > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby > > > > > -- > Matt Scilipoti | Possiamo Consulting LLC | 
443-538-8656 | Coaching, > Training & Development From mattscilipoti at possiamo.com Thu Jun 26 12:10:45 2008 From: mattscilipoti at possiamo.com (Matt Scilipoti) Date: Thu, 26 Jun 2008 12:10:45 -0400 Subject: [BioRuby] command not found: blastall and broken pipe from apache2 and passenger (mod_rails) In-Reply-To: <20B316D5-B461-45D5-8E0C-F47ED5176466@gmail.com> References: <217377760806251053pdba9a18rff940d5d52814609@mail.gmail.com> <134ede0b0806251130u3c542a70j7de8196a4ee63c95@mail.gmail.com> <217377760806251241l1180fe94q5f788900ee1aab9@mail.gmail.com> <20B316D5-B461-45D5-8E0C-F47ED5176466@gmail.com> Message-ID: <217377760806260910y487677a8gde90f93194130df@mail.gmail.com> Thank you for this additional help. In the short term, I chose to use a constant in the environment files as well. We have developers on different platforms, so this is still problematic. I tried updating the PATH in /etc/profile and restarting apache, but it still couldn't find blastall. This is a client's hosted machine. This process has alerted me to the fact that both apache and passenger are running as root. I will research whether this is standard on SUSE or specific to my client's setup, and correct it accordingly. This may also explain why passenger still couldn't find blastall after I updated /etc/profile and restarted apache: root's environment probably wasn't updated. I don't have the root password, so I can't `source /etc/profile` as root. Just to make sure my assumptions are right: if apache and passenger are running under the 'www' user, once I make a change to /etc/profile, will they pick up this change when I restart apache? Or is another action needed? Thanks again, matt On Wed, Jun 25, 2008 at 3:59 PM, Adam Kraut wrote: > One way to test the 'www' user's environment would be to switch to that > user with 'su www' or 'sudo www'. That way you can check if blastall is > actually in $PATH.
Regarding the per-server configuration, do you use > multiple production servers or do you mean the full path will be different > between development and production? For the latter, you can set variables > in development.rb and production.rb. Something like BLAST_PATH = > '/path/to/blastall' can be set per environment and in your application set > (factory.blastall = BLAST_PATH). That's the way I do it but there may be > better solutions out there. Also, if you reply through the bioruby list > more people can follow the discussion and offer advice. > Best, > Adam > > On Jun 25, 2008, at 3:41 PM, Matt Scilipoti wrote: > > Thank you. I tried adding the path to /etc/profile.local (recommended in > SUSE), but I was unsure how to ensure that 'www' was using the new profile. > I tried restarting apache, but the error still occurs. Is a server > restart necessary? This is difficult, but possible. > > The problem is solved by assigning the full path for blastall > (factory.blastall='/usr/local/bin/blastall'). But this solution requires > me to provide some configuration for each server, so a path solution would > be best. > > Thanks again, > Matt > > On Wed, Jun 25, 2008 at 2:30 PM, Adam Kraut wrote: > >> It's possible that 'blastall' is not in the path of the user running the >> apache process. On my system Apache runs as user 'www', so to run blastall >> from a rails app I would edit the system-wide profile in /etc/profile. You >> might also want to check how to set the user for Passenger as it may be >> different from Apache. Using the full path (/usr/local/bin/blastall) should >> also work for any user but I'm not sure if the bioruby wrapper lets you do >> this. >> >> Cheers, >> Adam >> >> On Wed, Jun 25, 2008 at 1:53 PM, Matt Scilipoti < >> mattscilipoti at possiamo.com> wrote: >> >>> I am attempting to use bioruby (blast) with a rails application. It >>> works on my dev machine (OSX), but not the Production Server (suse >>> sles). 
When I attempt to perform a blast query on the Production >>> server I receive "Errno::EPIPE (Broken pipe)" and "command not found: >>> blastall". The permissions and path look correct to me. Is it >>> possible that there is an apache permission issue? This occurred >>> intermittently when I was using a mongrel_cluster (usually fixed when >>> I restarted the cluster manually thru ssh. If I used capistrano to >>> restart the cluster, it would not fix it. >>> >>> Production server config: >>> Apache2 >>> passenger 2.0.1 (mod_rails) >>> >>> > which blastall >>> /usr/local/bin/blastall >>> >>> > ls -lsa /usr/local/bin/blastall >>> 0 lrwxrwxrwx 1 root root 40 2008-06-25 13:31 /usr/local/bin/blastall >>> -> /usr/local/lib/blast-2.2.16/bin/blastall >>> >>> > ls -lsa /usr/local/lib/blast-2.2.16/bin/blastall >>> 4388 -rwxr-xr-x 1 mpr mpr 4488387 2007-03-25 10:28 >>> /usr/local/lib/blast-2.2.16/bin/blastall >>> >>> > echo $PATH >>> >>> /usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/games:/opt/gnome/bin:/opt/kde3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin >>> >>> Errors: >>> production.log: >>> Errno::EPIPE (Broken pipe): >>> /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:163:in >>> `write' >>> /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:163:in >>> `print' >>> /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:163:in >>> `query_command_popen' >>> /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:161:in >>> `popen' >>> /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:161:in >>> `query_command_popen' >>> /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:148:in >>> `query_command' >>> /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb:245:in >>> `exec_local' >>> /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb:212:in >>> `send' >>> /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb:212:in >>> `query' >>> /app/models/blast_query.rb:83:in `query' >>> >>> >>> The apache error.log indicates: >>> 
/usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb:186: >>> command not found: blastall -p blastp -d STYSiteAbstract -p blastp -m >>> 7 -e 0.001 -F F -M Blosum62Phosbz -v 100 -b 100 -g F -T F -I T -U F -W >>> 3 >>> >>> Thank you, >>> Matt >>> -- >>> Matt Scilipoti | Possiamo Consulting LLC | 443-538-8656 | Coaching, >>> Training & Development >>> _______________________________________________ >>> BioRuby mailing list >>> BioRuby at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioruby >>> >> >> > > > -- > Matt Scilipoti | Possiamo Consulting LLC | 443-538-8656 | Coaching, > Training & Development > > > -- Matt Scilipoti | Possiamo Consulting LLC | 443-538-8656 | Coaching, Training & Development From donttrustben at gmail.com Sat Jun 28 00:20:25 2008 From: donttrustben at gmail.com (Ben Woodcroft) Date: Sat, 28 Jun 2008 14:20:25 +1000 Subject: [BioRuby] GFF3 attribute parser problems Message-ID: Hi, I noticed there is a problem with the Bio::GFF::GFF3 class. 
Using bioruby 1.2.1

  class GFF3 < GFF
    VERSION = 3

    private

    def parse_attributes(attributes)
      hash = Hash.new
      attributes.split(/[^\\];/).each do |atr|
        key, value = atr.split('=', 2)
        hash[key] = value
      end
      return hash
    end
  end

My problem is with the split(/[^\\];/) bit: the [^\\] class consumes the character just before each semicolon, so an extra character is chopped off the end of the preceding value:

  irb(main):001:0> 'abc=def;one=two'.split(/[^\\];/)
  => ["abc=de", "one=two"]

where we want

  => ["abc=def", "one=two"]

I took a shortcut to solve it, ignoring the escaping of ';', and just did this:

  hash = Hash.new
  attributes.split(/;/).each do |atr|
    key, value = atr.split('=', 2)
    hash[key] = value
  end
  return hash

Thanks, ben
From donttrustben at gmail.com Sat Jun 28 00:26:14 2008 From: donttrustben at gmail.com (Ben Woodcroft) Date: Sat, 28 Jun 2008 14:26:14 +1000 Subject: [BioRuby] blast -m7 (xml) and multiple queries Message-ID: Hi,

I seem to have run across a bug in the bioruby blast report parser: it isn't able to handle reports that span multiple query sequences. My code for parsing is

  Bio::Blast.reports(ARGF) do |report|
    puts "Hits for " + report.query_def + " against " + report.db
    report.each do |hit|
      hit.each do |hsp|
        puts [report.query_def, hit.accession, hsp.query_from, hsp.query_to,
              hsp.hit_from, hsp.hit_to, hsp.evalue, hit.target_def].join("\t")
      end
    end
  end

When I run this on a blast XML output with 2 queries (the 1st has 10 hits and the 2nd has 7), I get 8 hits shown, which is somewhat confusing. The query sequences are somewhat similar, so they have some hits in common - perhaps this partly explains the number 8. I'm using bioruby from git (http://github.com/bioruby/bioruby/commit/a61b16163d3ca74f3f7c8d8e8f03f5f8c68dee60) with the newest blast (2.2.18). Is this easy to fix? Is there a workaround?

A partial answer: according to http://rubyforge.org//tracker/index.php?func=detail&aid=20272&group_id=769&atid=3037 this is a still-open, unfixed bug, caused by a change in the NCBI XML schema. I can work around it by reblasting with the legacy flag -V.
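To check what the -m 7 file really contains, independently of Bio::Blast::Report, the hits can be counted per query with Ruby's REXML library (a default/bundled gem in current Rubies). This is a diagnostic sketch rather than a fix: it assumes the multi-query layout in which one BlastOutput holds one `Iteration` element per query, each with an `Iteration_query-def` and an `Iteration_hits` list of `Hit` elements, and `hits_per_query` is a hypothetical helper name.

```ruby
require 'rexml/document'

# Diagnostic sketch (hypothetical helper, not part of BioRuby): return
# [query definition, number of hits] for every <Iteration> of a BLAST -m 7
# XML report, assuming one Iteration per query sequence.
def hits_per_query(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//BlastOutput_iterations/Iteration').map do |iteration|
    query_def = iteration.elements['Iteration_query-def']
    hits = REXML::XPath.match(iteration, 'Iteration_hits/Hit')
    [query_def && query_def.text, hits.size]
  end
end
```

For example, `hits_per_query(File.read('report.xml')).each { |q, n| puts "#{q}\t#{n}" }`. If that lists two queries with 10 and 7 hits while the BioRuby loop shows only 8 hits, the discrepancy lies in the parser rather than in the BLAST output.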
Thanks in advance, ben From donttrustben at gmail.com Sat Jun 28 00:45:40 2008 From: donttrustben at gmail.com (Ben Woodcroft) Date: Sat, 28 Jun 2008 14:45:40 +1000 Subject: [BioRuby] GFF3 attribute parser problems In-Reply-To: References: Message-ID: not fixed as of most recent git commit, either http://github.com/bioruby/bioruby/tree/master/lib/bio/db/gff.rb line 120 2008/6/28 Ben Woodcroft : > Hi, > > I noticed there is a problem with the Bio::GFF::GFF3 class. Using bioruby 1.2.1 > > class GFF3 < GFF > VERSION = 3 > > private > > def parse_attributes(attributes) > hash = Hash.new > attributes.split(/[^\\];/).each do |atr| > key, value = atr.split('=', 2) > hash[key] = value > end > return hash > end > > My problem is with the split([/^\\]) bit, because it chops off an > extra character at the end of the key: > > irb(main):001:0> 'abc=def;one=two'.split(/[^\\];/) > => ["abc=de", "one=two"] > > where we want > => ["abc=def", "one=two"] > > > > I took a shortcut to solve it and ignored the escaping of the ; and > just did this > > hash = Hash.new > attributes.split(/;/).each do |atr| > key, value = atr.split('=', 2) > hash[key] = value > end > return hash > > > Thanks, > ben > -- FYI: My email addresses at unimelb, uq and gmail all redirect to the same place. From donttrustben at gmail.com Sat Jun 28 21:57:39 2008 From: donttrustben at gmail.com (Ben Woodcroft) Date: Sun, 29 Jun 2008 11:57:39 +1000 Subject: [BioRuby] GFF3 attribute parser problems In-Reply-To: References: Message-ID: I have attempted a fix, and pushed it to github. I forked the main branch, not the testing one, because I class this as a bug fix, not a new feature. 
Available at http://github.com/wwood/bioruby/tree/master I actually had to create a new class Bio::GFF::GFF3::Record: > not fixed as of most recent git commit, either > http://github.com/bioruby/bioruby/tree/master/lib/bio/db/gff.rb > > line 120 > > 2008/6/28 Ben Woodcroft : >> Hi, >> >> I noticed there is a problem with the Bio::GFF::GFF3 class. Using bioruby 1.2.1 >> >> class GFF3 < GFF >> VERSION = 3 >> >> private >> >> def parse_attributes(attributes) >> hash = Hash.new >> attributes.split(/[^\\];/).each do |atr| >> key, value = atr.split('=', 2) >> hash[key] = value >> end >> return hash >> end >> >> My problem is with the split([/^\\]) bit, because it chops off an >> extra character at the end of the key: >> >> irb(main):001:0> 'abc=def;one=two'.split(/[^\\];/) >> => ["abc=de", "one=two"] >> >> where we want >> => ["abc=def", "one=two"] >> >> >> >> I took a shortcut to solve it and ignored the escaping of the ; and >> just did this >> >> hash = Hash.new >> attributes.split(/;/).each do |atr| >> key, value = atr.split('=', 2) >> hash[key] = value >> end >> return hash >> >> >> Thanks, >> ben >> > > > > -- > FYI: My email addresses at unimelb, uq and gmail all redirect to the same place. > -- FYI: My email addresses at unimelb, uq and gmail all redirect to the same place. From ngoto at gen-info.osaka-u.ac.jp Sun Jun 29 05:51:08 2008 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Sun, 29 Jun 2008 18:51:08 +0900 Subject: [BioRuby] GFF3 attribute parser problems In-Reply-To: References: Message-ID: <20080629095108.F0FD01CBC52E@idnmail.gen-info.osaka-u.ac.jp> Hi, Thank you for reporting the bug. The GFF3 specification http://song.sourceforge.net/gff3.shtml says that URL escaping rules are used for escaping semicolons, not backslashes. (cited from http://song.sourceforge.net/gff3.shtml) >> Column 9: "attributes" >> >> A list of feature attributes in the format tag=value. Multiple >> tag=value pairs are separated by semicolons.
URL escaping rules are >> used for tags or values containing the following characters: ",=;". So, the existing code in BioRuby 1.2.1 is apparently wrong. (I am not sure, but perhaps it was written before the specification was well established?) If nonstandard (and illegal) GFF3 data using backslash for escape is popular, we should also consider it, but the main GFF3 class should follow the official specification. I see your changes in git, but your code still seems to use the "wrong" escaping rule and does not handle escaping of the other characters (",=;&%" and %XX escapes). BTW, I think the Bio::GFF classes in bioruby should be changed to support creating GFF objects from scratch and writing GFF output. Thanks, Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org On Sun, 29 Jun 2008 11:57:39 +1000 "Ben Woodcroft" wrote: > I have attempted a fix, and pushed it to github. I forked the main > branch, not the testing one, because I class this as a bug fix, not a > new feature. Available at > http://github.com/wwood/bioruby/tree/master > > I actually had to create a new class > Bio::GFF::GFF3::Record attributes happens inside the record, not the parser. I'm not sure > this is the most sensible way, but I'm following the laziness virtue > for now. > > I hope these kinds of commits get added to the main repo.. > > Thanks, > ben > > > 2008/6/28 Ben Woodcroft : > > not fixed as of most recent git commit, either > > http://github.com/bioruby/bioruby/tree/master/lib/bio/db/gff.rb > > > > line 120 > > > > 2008/6/28 Ben Woodcroft : > >> Hi, > >> > >> I noticed there is a problem with the Bio::GFF::GFF3 class.
Using bioruby 1.2.1 > >> > >> class GFF3 < GFF > >> VERSION = 3 > >> > >> private > >> > >> def parse_attributes(attributes) > >> hash = Hash.new > >> attributes.split(/[^\\];/).each do |atr| > >> key, value = atr.split('=', 2) > >> hash[key] = value > >> end > >> return hash > >> end > >> > >> My problem is with the split([/^\\]) bit, because it chops off an > >> extra character at the end of the key: > >> > >> irb(main):001:0> 'abc=def;one=two'.split(/[^\\];/) > >> => ["abc=de", "one=two"] > >> > >> where we want > >> => ["abc=def", "one=two"] > >> > >> > >> > >> I took a shortcut to solve it and ignored the escaping of the ; and > >> just did this > >> > >> hash = Hash.new > >> attributes.split(/;/).each do |atr| > >> key, value = atr.split('=', 2) > >> hash[key] = value > >> end > >> return hash > >> > >> > >> Thanks, > >> ben > >> > > > > > > > > -- > > FYI: My email addresses at unimelb, uq and gmail all redirect to the same place. > > > > > > -- > FYI: My email addresses at unimelb, uq and gmail all redirect to the same place. > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From donttrustben at gmail.com Sun Jun 29 10:33:43 2008 From: donttrustben at gmail.com (Ben Woodcroft) Date: Mon, 30 Jun 2008 00:33:43 +1000 Subject: [BioRuby] GFF3 attribute parser problems In-Reply-To: <20080629095108.F0FD01CBC52E@idnmail.gen-info.osaka-u.ac.jp> References: <20080629095108.F0FD01CBC52E@idnmail.gen-info.osaka-u.ac.jp> Message-ID: Thanks for your reply. I've had a look at the spec - it seems to be far more complex than the current bioruby GFF3 class, with the cross references, predefined tags, fasta, etc. - but I agree it would be good to have a fully featured parser/writer. I fixed it to handle all the escaping characters, created the corresponding tests, and committed to the same github repo.
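Following the GFF3 rule quoted in this thread (tag=value pairs separated by semicolons, with reserved characters URL-escaped), an attribute parser can split on bare semicolons and percent-decode afterwards; this also avoids the split(/[^\\];/) problem of swallowing the character before each ';'. This is a minimal sketch, not the code in Ben's branch or BioRuby's API: all method names are hypothetical, values are returned as arrays because GFF3 separates multiple values of one tag with commas, and records are read one line at a time from an IO instead of loading the whole file.

```ruby
# Sketch of GFF3 column-9 (attributes) handling following the spec's
# URL-escaping rule. All method names are hypothetical; this is not the
# Bio::GFF::GFF3 API.

# Decode %XX escapes (e.g. "%3B" -> ";"). Work on a binary copy so that
# escaped non-ASCII bytes can be reassembled, then restore the encoding.
def gff3_unescape(str)
  str.b.gsub(/%([0-9A-Fa-f]{2})/) { $1.hex.chr }.force_encoding(str.encoding)
end

# Escape the characters reserved in column 9 (; = & ,), '%' itself and
# control characters such as tab and newline.
def gff3_escape(str)
  str.gsub(/[;=&,%[:cntrl:]]/) { |c| format('%%%02X', c.ord) }
end

# Parse column 9 into { tag => [values] }. Splitting on a bare ';' is safe
# because a literal ';' inside a tag or value must appear as %3B.
def parse_gff3_attributes(column9)
  attrs = {}
  return attrs if column9.nil? || column9.strip == '.'  # '.' means no attributes
  column9.split(';').each do |pair|
    next if pair.strip.empty?                 # tolerate a trailing ';'
    tag, value = pair.split('=', 2)
    values = value.to_s.split(',').map { |v| gff3_unescape(v) }
    (attrs[gff3_unescape(tag.strip)] ||= []).concat(values)
  end
  attrs
end

# Inverse of parse_gff3_attributes, e.g. for a to_s method.
def format_gff3_attributes(attrs)
  attrs.map { |tag, values|
    "#{gff3_escape(tag)}=#{values.map { |v| gff3_escape(v) }.join(',')}"
  }.join(';')
end

# Stream records from an IO line by line instead of reading the whole file.
# Yields the first 8 columns (Array of String) and the parsed attributes.
def each_gff3_record(io)
  io.each_line do |line|
    line = line.chomp
    break if line.start_with?('##FASTA')  # the rest of the file is sequence
    next if line.empty? || line.start_with?('#')
    cols = line.split("\t", 9)
    next if cols.size < 8                 # skip malformed lines in this sketch
    yield cols[0, 8], parse_gff3_attributes(cols[8])
  end
end
```

Calling `format_gff3_attributes` on a parsed hash gives back a column 9 string, which is the piece a `to_s` method needs; the streaming reader addresses the whole-file-in-memory concern only for reading.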
I don't know of any GFF file that uses the illegal backslashes, so I took that code out. In fact, the current 1.2.1 code doesn't really support them either - as far as I can tell, the backslash code is never reached. I added to_s methods to the GFF3 class and the corresponding Record class, like you suggested. I think a big remaining problem with the GFF classes is that they load the whole file into memory, and the to_s methods don't fix this. Maybe in the future... Thanks, ben 2008/6/29 Naohisa GOTO : > Hi, > > Thank you for reporting bugs. > > The GFF3 specification http://song.sourceforge.net/gff3.shtml > says that URL escaping rule are used for escaping semicolons, > not backslashes. > > (cited from http://song.sourceforge.net/gff3.shtml) >>> Column 9: "attributes" >>> >>> A list of feature attributes in the format tag=value. Multiple >>> tag=value pairs are separated by semicolons. URL escaping rules are >>> used for tags or values containing the following characters: ",=;". > > So, the existing code in BioRuby 1.2.1 is apparently wrong. > (I don't know, but perhaps it might be written before the > specification was well established?) > > If nonstandard (and illegal) GFF3 data using backslash for escape > is popular, we should also consider it, but the main GFF3 class > should keep the official specification. > > I see your changes in git, but your code seems to be still using > "wrong" escaping rule and unconscious of escaping of other characters > (",=;&%" and %XX escapes). > > BTW, I think the Bio::GFF classes in bioruby should be changed > to supportcreating GFF objects from scratch and output of GFFs. > > Thanks, > > Naohisa Goto > ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org > > > On Sun, 29 Jun 2008 11:57:39 +1000 > "Ben Woodcroft" wrote: > >> I have attempted a fix, and pushed it to github. I forked the main >> branch, not the testing one, because I class this as a bug fix, not a >> new feature.
Available at >> http://github.com/wwood/bioruby/tree/master >> >> I actually had to create a new class >> Bio::GFF::GFF3::Record> attributes happens inside the record, not the parser. I'm not sure >> this is the most sensible way, but I'm following the laziness virtue >> for now. >> >> I hope these kinds of commits get added to the main repo.. >> >> Thanks, >> ben >> >> >> 2008/6/28 Ben Woodcroft : >> > not fixed as of most recent git commit, either >> > http://github.com/bioruby/bioruby/tree/master/lib/bio/db/gff.rb >> > >> > line 120 >> > >> > 2008/6/28 Ben Woodcroft : >> >> Hi, >> >> >> >> I noticed there is a problem with the Bio::GFF::GFF3 class. Using bioruby 1.2.1 >> >> >> >> class GFF3 < GFF >> >> VERSION = 3 >> >> >> >> private >> >> >> >> def parse_attributes(attributes) >> >> hash = Hash.new >> >> attributes.split(/[^\\];/).each do |atr| >> >> key, value = atr.split('=', 2) >> >> hash[key] = value >> >> end >> >> return hash >> >> end >> >> >> >> My problem is with the split([/^\\]) bit, because it chops off an >> >> extra character at the end of the key: >> >> >> >> irb(main):001:0> 'abc=def;one=two'.split(/[^\\];/) >> >> => ["abc=de", "one=two"] >> >> >> >> where we want >> >> => ["abc=def", "one=two"] >> >> >> >> >> >> >> >> I took a shortcut to solve it and ignored the escaping of the ; and >> >> just did this >> >> >> >> hash = Hash.new >> >> attributes.split(/;/).each do |atr| >> >> key, value = atr.split('=', 2) >> >> hash[key] = value >> >> end >> >> return hash >> >> >> >> >> >> Thanks, >> >> ben >> >> >> > >> > >> > >> > -- >> > FYI: My email addresses at unimelb, uq and gmail all redirect to the same place. >> > >> >> >> >> -- >> FYI: My email addresses at unimelb, uq and gmail all redirect to the same place. 
>> _______________________________________________ >> BioRuby mailing list >> BioRuby at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioruby > -- FYI: My email addresses at unimelb, uq and gmail all redirect to the same place. From pjotr2008 at thebird.nl Mon Jun 30 03:34:32 2008 From: pjotr2008 at thebird.nl (Pjotr Prins) Date: Mon, 30 Jun 2008 09:34:32 +0200 Subject: [BioRuby] GFF3 attribute parser problems In-Reply-To: References: Message-ID: <20080630073432.GB32016@thebird.nl> On Sun, Jun 29, 2008 at 11:57:39AM +1000, Ben Woodcroft wrote: > I have attempted a fix, and pushed it to github. I forked the main > branch, not the testing one, because I class this as a bug fix, not a > new feature. Available at > http://github.com/wwood/bioruby/tree/master I think that is the correct thing to do. I had an E-mail questioning the role of bioruby-testing-central, i.e. whether new development should simply happen against the main tree instead. I would have it like this: new feature development can happen against the main tree with the blessing of the main tree maintainers. bioruby-testing-central is for development that is, for one reason or another, not encouraged in the main tree. Truly a *testing* tree, for people to play with, like I am doing with my microarray stuff now. Once my microarray code matures it may well go into main. Naturally everyone can branch off in such a way, on his or her own, but this way new feature development is visible to all. That is what it is for. I encourage using this tree for all things experimental. Pj.
From pjotr2008 at thebird.nl Thu Jun 12 09:02:48 2008 From: pjotr2008 at thebird.nl (Pjotr Prins) Date: Thu, 12 Jun 2008 11:02:48 +0200 Subject: [BioRuby] BioRuby on github + lighthouse In-Reply-To: <0A349FF7-51BD-4546-8744-B342C4EA8950@michaelbarton.me.uk> References: <0A349FF7-51BD-4546-8744-B342C4EA8950@michaelbarton.me.uk> Message-ID: <20080612090248.GA32143@thebird.nl> Hi All, In view of my recent bibtex commit fiasco - which I thought an improvement, but which was probably a regression, as N. pointed out before rolling it back - I favour moving the sources to a non-centralized repository. This will allow individual development where the main maintainers can cherry-pick individual patches for inclusion in the stable and development trees. Toshiaki, both Jan and I want to ask you to check out this technology and take the lead by moving a 'blessed' branch into github. The alternative is that I do the same thing - both cases will allow you to continue as before, but some development will be on git branches. Technology does not solve problems - like the problem of a lack of general action in the source tree - but at least git will allow people to have a sense of freedom. And it is up to the central maintainers what to include and what not. Much like the role Linus plays in kernel development. As ever, with respect, Pj. From ktym at hgc.jp Thu Jun 12 20:32:46 2008 From: ktym at hgc.jp (Toshiaki Katayama) Date: Fri, 13 Jun 2008 05:32:46 +0900 Subject: [BioRuby] TogoWS (Re: BioRuby on github + lighthouse) In-Reply-To: <20080612090248.GA32143@thebird.nl> References: <0A349FF7-51BD-4546-8744-B342C4EA8950@michaelbarton.me.uk> <20080612090248.GA32143@thebird.nl> Message-ID: <99D301E3-DF12-4595-AB33-F0E74B7BA690@hgc.jp> Dear all, Sorry for my long absence after the BioHackathon held this February. However, I'm afraid that I can't spare enough time to organize your request for a while yet. Instead, I need to wrap up the outputs from the BioHackathon first.
The BioRuby team focused on the generalized sequence model, and by completing that work I can provide a pretty nice (hopefully) feature -- parsing any sequence database entry through a REST-like web service API. I hope all of you like the following idea and will help me finish the task by integrating GenBank, EMBL, UniProt and BioSQL with the new Bio::Sequence model, as we discussed during the Hackathon. A sample implementation (TogoWS) is now available at http://togows.dbcls.jp/site/rest.html, where you can find links to retrieve database entries with Rails-like "Pretty URLs" (sorry for the Japanese text, I'll provide an English version some time). For example, the plain GenBank entry HUMIGHAF is available at http://togows.dbcls.jp/entry/genbank/HUMIGHAF and you can obtain
* XML version by http://togows.dbcls.jp/entry/genbank/HUMIGHAF.xml
* FASTA version by http://togows.dbcls.jp/entry/genbank/HUMIGHAF.fasta
and, as this service is built on top of the BioRuby library, you can also parse the entry to obtain a specific field by appending a slash and the name of any BioRuby method of the Bio::GenBank class:
* DEFINITION field http://togows.dbcls.jp/entry/genbank/HUMIGHAF/definition
However, the methods to fetch a specific field vary from database to database, because of the different implementations in the corresponding classes. Fortunately, this would be pretty easy to solve: we just need to convert the GenBank, EMBL, UniProt and BioSQL data models to the generic Bio::Sequence class and use the methods of the generic class. This is the same plan we agreed on during the Hackathon. Along with this, we need to define a set of generic methods to access the internal structure, and also a set of standard output formats (for features, references, cross refs, dates etc.) - the slightly tough part. For example, it would be great if I could extract the feature table in a reusable standard format like GFF (or [protein] DAS) instead of a YAML/XML dump of an array of Bio::Feature objects.
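The TogoWS URL scheme described in this message (`/entry/<database>/<id>`, an optional format suffix such as `.xml` or `.fasta`, or a trailing `/<field>` naming an accessor) is easy to generate programmatically. A minimal sketch using only Ruby's standard library: `togows_entry_url` and `fetch_togows` are hypothetical names, the base URL is the one given in the message, and the 2008 service may have changed since, so the fetch helper is untested against the live service.

```ruby
require 'net/http'
require 'uri'

# Base URL as given in the 2008 announcement; it may no longer be current.
TOGOWS_ENTRY_BASE = 'http://togows.dbcls.jp/entry'

# Build an entry URL following the patterns in the message:
#   /entry/<db>/<id>           plain entry
#   /entry/<db>/<id>.<format>  converted entry (e.g. xml, fasta)
#   /entry/<db>/<id>/<field>   a single field (e.g. definition)
# db and id are assumed to be URL-safe already (no escaping is done).
def togows_entry_url(db, id, format: nil, field: nil)
  raise ArgumentError, 'pass either format: or field:, not both' if format && field
  url = "#{TOGOWS_ENTRY_BASE}/#{db}/#{id}"
  url += ".#{format}" if format
  url += "/#{field}" if field
  URI(url)
end

# Fetch the response body as a String (performs a real HTTP request).
def fetch_togows(db, id, **opts)
  Net::HTTP.get(togows_entry_url(db, id, **opts))
end
```

For example, `puts fetch_togows('genbank', 'HUMIGHAF', field: 'definition')` would request the DEFINITION field shown above.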
(The following are not yet implemented, but should all return the same result.)
http://togows.dbcls.jp/entry/genbank/J00231.gff
http://togows.dbcls.jp/entry/genbank/J00231/features
http://togows.dbcls.jp/entry/embl/J00231.gff
http://togows.dbcls.jp/entry/embl/J00231/features
:
All we need is to list the method names and return values (formats) that are usable in common with any sequence database entry. Pj, you may also want to have something like
http://togows.dbcls.jp/entry/pubmed/16381885.bibtex
http://togows.dbcls.jp/entry/pubmed/16381885.endnote
http://togows.dbcls.jp/entry/pubmed/16381885/url
and these are trivial to implement: just add the appropriate methods to the Bio::Reference class. For this purpose, I don't mind changing internal logic/APIs as you did, as long as the change is reasonable. I'm also planning to provide a search interface and converters in a similar way. Converters will include BLAST output to GFF (maybe by using BioPerl :) etc. The outcomes of the BioHackathon 2008 were fairly diverse, but I think this approach is one direction for evolving the basic infrastructure of bioinformatics resources towards useful integration. Actually, the real problem is that I'm still busy with other tasks and can't spend 100% of my effort on these... Regards, Toshiaki Katayama On 2008/06/12, at 18:02, Pjotr Prins wrote: > Hi All, > > In view of my recent bibtex commit fiasco - which I thought an > improvement, but probably was a regression as N. pointed out and > rolled back - I favour moving the sources to a non-centralized > repository. This will allow individual development where the main > maintainers can cherry-pick individual patches for inclusion in the > stable and development trees. > > Toshiaki, both Jan and I want to ask you to check out this technology > and take the lead by moving a 'blessed' branch into github. The > alternative is that I do the same thing - both cases will allow you > to continue as before, but some development will be on git branches.
> > Technology does not solve problems - like the problem of lack general > action in the source tree - but at least git will allow people to have > a sense of freedom. And it is up to the central maintainers what to > include and what not. Much like the role Linus plays in kernel > development. > > As ever, with respect, > > Pj. > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From pjotr2008 at thebird.nl Fri Jun 13 01:38:09 2008 From: pjotr2008 at thebird.nl (Pjotr Prins) Date: Fri, 13 Jun 2008 03:38:09 +0200 Subject: [BioRuby] bioruby-testing-central on github Message-ID: <20080613013809.GA6689@thebird.nl> The Bioruby repository has been cloned to bioruby-testing-central: http://github.com/pjotrp/bioruby-testing-central/tree/master The convention is to name your repository as 'bioruby-testing-yourname'. So my version will be bioruby-testing-pjotr. If you register yourself with github I can add you as a collaborator. Note: we are *not* competing with the main Bioruby tree - this is a facility to encourage code submissions. It is up to the main Bioruby maintainers whether stuff gets included in the main tree. This is a bioruby-testing tree. Clone the central repository with: git clone git://github.com/pjotrp/bioruby-testing-central.git You don't need to register for that. Patches can be submitted over E-mail. For using git see the tutorial at: http://kernel.org/pub/software/scm/git/docs/gittutorial.html From pjotr2008 at thebird.nl Sat Jun 14 14:07:56 2008 From: pjotr2008 at thebird.nl (Pjotr Prins) Date: Sat, 14 Jun 2008 16:07:56 +0200 Subject: [BioRuby] bioruby-testing-central on github Message-ID: <20080614140756.GA21822@thebird.nl> I have kicked off including support for microarrays in Bioruby with an Affymetrix CEL file reader (which is based on Ben Bolstad's Affyio, part of R/Bioconductor). The mapping is done in biolib (http://biolib.open-bio.org/). 
Bioruby sources are on: http://github.com/pjotrp/bioruby-testing-central/commit/5a32ff510208228b61483ee683d386ccbc3d87f2 Simple file loading works. E.g.

  ARGV.each do |fn|                     # fn is a CEL file name, e.g. 'GSM11002.CEL.gz'
    array = Bio::Microarray::Affy.new(fn)
    (0..20).each do |i|
      print array.intensity(i), ", "
    end
  end

The next step is probe(set) mapping and support for regular tab-delimited files and CSVs (a bit like read.table in R). If anyone is interested in participating... Pj. From mail at michaelbarton.me.uk Sat Jun 14 14:33:27 2008 From: mail at michaelbarton.me.uk (Michael Barton) Date: Sat, 14 Jun 2008 15:33:27 +0100 Subject: [BioRuby] bioruby-testing-central on github In-Reply-To: <20080613013809.GA6689@thebird.nl> References: <20080613013809.GA6689@thebird.nl> Message-ID: That looks really good. I think BioRuby being the first bio* library to use git and Github for distributed revision control is a really great step, and demonstrates the forward thinking of the BioRuby community. On 13 Jun 2008, at 02:38, Pjotr Prins wrote: > The Bioruby repository has been cloned to bioruby-testing-central: > > http://github.com/pjotrp/bioruby-testing-central/tree/master > > The convention is to name your repository as 'bioruby-testing- > yourname'. > So my version will be bioruby-testing-pjotr. If you register yourself > with github I can add you as a collaborator. Note: we are *not* > competing with the main Bioruby tree - this is a facility to > encourage code submissions. It is up to the main Bioruby maintainers > whether stuff gets included in the main tree. This is a > bioruby-testing tree. > > Clone the central repository with: > > git clone git://github.com/pjotrp/bioruby-testing-central.git > > You don't need to register for that. Patches can be submitted over > E-mail.
> > For using git see the tutorial at: > > http://kernel.org/pub/software/scm/git/docs/gittutorial.html > > > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From email2ants at gmail.com Mon Jun 16 15:00:52 2008 From: email2ants at gmail.com (Anthony Underwood) Date: Mon, 16 Jun 2008 16:00:52 +0100 Subject: [BioRuby] Remaining work to be done after BioHakathon Message-ID: Dear Toshiaki and other biorubyists, I had a look at the new REST interface implemented at TogoWS. It looks really nice (and it's fast). You mention that there is work to be done to convert GenBank, EMBL, UniProt, BioSQL in to the new Bio::Sequence model. What is the current status of this? Is the work/progress made at the BioHackathon publicly visible? I would love to contribute and help - how can I help - perhaps the most recent changes can be added to the repository now on github? Thanks Anthony Dr Anthony Underwood Bioinformatics Unit | Statistics, Modelling and Bioinformatics Department Centre for Infections Health Protection Agency 61 Colindale Avenue London NW9 5HT t: 0208 3276466 f: 0208 3276738 e:anthony.underwood at hpa.org.uk From ngoto at gen-info.osaka-u.ac.jp Wed Jun 18 11:34:10 2008 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Wed, 18 Jun 2008 20:34:10 +0900 Subject: [BioRuby] Remaining work to be done after BioHakathon In-Reply-To: References: Message-ID: <20080618113412.11E401CBC465@idnmail.gen-info.osaka-u.ac.jp> Dear Anthony, On Mon, 16 Jun 2008 16:00:52 +0100 Anthony Underwood wrote: > Dear Toshiaki and other biorubyists, > > I had a look at the new REST interface implemented at TogoWS. It looks > really nice (and it's fast). You mention that there is work to be done > to convert GenBank, EMBL, UniProt, BioSQL in to the new Bio::Sequence > model. > > What is the current status of this? 
Current status is:
* Conversion of GenBank and EMBL from/to Bio::Sequence usually works fine, though some fields are not yet supported (for example, PROJECT, CONSRTM, SEGMENT, CONTIG in GenBank).
* Code to support BioSQL from/to Bio::Sequence has been added, but is not well tested.
* UniProt from/to Bio::Sequence is not supported.
* A few documents and unit tests exist for the Bio::Sequence model and related code.
> Is the work/progress made at the BioHackathon publicly visible? It is stored in the CVS BRANCH-biohackathon2008 branch. You can get it via anonymous CVS or CVSWeb. Please see http://www.open-bio.org/wiki/SourceCode . Unfortunately, just now, I couldn't access the anonymous CVS, which suggests a problem on code.open-bio.org. (Note that the CVS repository for developers is OK.) > I would love to contribute and help - how can I help - perhaps the > most recent changes can be added to the repository now on github? For now, CVS is still used for development. In the future, we will move to svn and/or git (or Mercurial?). Currently, I don't know whether changes made in CVS are pushed into the github repository. Thanks, -- Naohisa Goto ng at bioruby.org / ngoto at gen-info.osaka-u.ac.jp From pjotr2008 at thebird.nl Wed Jun 18 12:37:24 2008 From: pjotr2008 at thebird.nl (Pjotr Prins) Date: Wed, 18 Jun 2008 14:37:24 +0200 Subject: [BioRuby] Remaining work to be done after BioHakathon In-Reply-To: <20080618113412.11E401CBC465@idnmail.gen-info.osaka-u.ac.jp> References: <20080618113412.11E401CBC465@idnmail.gen-info.osaka-u.ac.jp> Message-ID: <20080618123724.GA30925@thebird.nl> On Wed, Jun 18, 2008 at 08:34:10PM +0900, Naohisa GOTO wrote: > Currently, I don't know whether changes made in the CVS are > pushed into the github repository. According to this: http://issaris.blogspot.com/2005/11/cvs-to-git-and-back.html it is feasible to import the changelog.
My github bioruby-testing tree is not synced with the changelog - I don't think that is necessary because, as it stands, diffs will be patched into the main tree when required and get added to the changelog that way. However, the main tree, were it to be hosted on github, can contain the CVS log using the tools mentioned in that blog, and I will rebranch the testing tree from the main tree once you get to that point so as to have the main changelog again. Note that Jan has reserved the bioruby name on github for us. Pj. P.S. After having used both mercurial (for most of my projects) and now git, I am convinced git is the better choice for Bioruby, mostly because it nicely allows handling a central repository (mercurial has no obvious model for that) and because, thanks to the Linux kernel, it gets loads of developer attention - e.g. CVS/SVN mapping and github itself are major and useful functionalities. I think you will like it (certainly coming from CVS). I have heavily deployed darcs, mercurial, svn and now git. With all of them I have had conflicts and broken repositories. darcs was nice, but often broke with larger repositories; mercurial is really nice, though its conflict resolution can be non-obvious with merges; svn is better than CVS, but not a major step forward (I particularly hate the server deployment, and the BDB backend tends to get upset when mixing ssh and the web service). So now we have git. git has impressed me by being so mature and useful. I'll pick it as the winner for large development efforts. From mail at michaelbarton.me.uk Wed Jun 18 13:25:33 2008 From: mail at michaelbarton.me.uk (Michael Barton) Date: Wed, 18 Jun 2008 14:25:33 +0100 Subject: [BioRuby] bioruby-testing-central on github In-Reply-To: References: <20080613013809.GA6689@thebird.nl> Message-ID: I'd like to raise a point about forking vs cloning. In your email, Pjotr, you recommend that new users clone the bioruby repository, which is the way git is meant to be used.
However, since the project is on Github, it can be forked instead of cloned. Here is the Github blurb on this: By forking a project instead of (cloning, creating a new GitHub repo, and pushing to it), you allow us to create a link between your fork and the original. This link helps us keep you informed of changes to the original codebase and make it trivial for you to notify the originator of changes that you have made and would like to have reviewed. I think an additional unmentioned advantage is that a fork only contains the differences between yours and the original, rather than the complete cloned repo, so this would save space too. Mike On 14 Jun 2008, at 15:33, Michael Barton wrote: > That looks really good. I think BioRuby being the first bio* library > to use git and Github for distributed revision control is a really > great step, and demonstrates the forward thinking of the BioRuby > community. > > On 13 Jun 2008, at 02:38, Pjotr Prins wrote: > >> The Bioruby repository has been cloned to bioruby-testing-central: >> >> http://github.com/pjotrp/bioruby-testing-central/tree/master >> >> The convention is to name your repository as 'bioruby-testing- >> yourname'. >> So my version will be bioruby-testing-pjotr. If you register >> yourself >> with github I can add you as a collaborator. Note: we are *not* >> competing with the main Bioruby tree - this is a facility to >> encourage code submissions. It is up to the main Bioruby maintainers >> whether stuff gets included in the main tree. This is a >> bioruby-testing tree. >> >> Clone the central repository with: >> >> git clone git://github.com/pjotrp/bioruby-testing-central.git >> >> You don't need to register for that. Patches can be submitted over >> E-mail.
>> >> For using git see the tutorial at: >> >> http://kernel.org/pub/software/scm/git/docs/gittutorial.html >> >> >> _______________________________________________ >> BioRuby mailing list >> BioRuby at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioruby > From pjotr2008 at thebird.nl Wed Jun 18 13:40:27 2008 From: pjotr2008 at thebird.nl (Pjotr Prins) Date: Wed, 18 Jun 2008 15:40:27 +0200 Subject: [BioRuby] bioruby-testing-central on github In-Reply-To: References: <20080613013809.GA6689@thebird.nl> Message-ID: <20080618134027.GA662@thebird.nl> In my E-mail I showed you can individually branch (read 'fork') from bioruby-testing-central. I have forked bioruby-testing-pjotr myself. bioruby-testing-central is a clone of Bioruby since Bioruby is hosted on CVS - not on git. I will fork once there is a git Bioruby repository. Sorry if it all is a bit confusing. We are waiting for Bioruby to migrate from CVS. Pj. On Wed, Jun 18, 2008 at 02:25:33PM +0100, Michael Barton wrote: > I'd like to raise a point about forking vs cloning. In your email > Pjotr you recommend that new users clone the bioruby repository. Which > is the way git is be used. However since the project is on Github, a > project can be forked instead of cloned. Here is the Github blurb on > this > > By forking a project instead of ((cloning, creating a new GitHub repo, > and pushing to it)), you allow us to create a link between your fork > and the original. This link helps us keep you informed of changes to > the original codebase and make it trivial for you to notify the > originator of changes that you have made and would like have reviewed. > > I think an additional unmentioned advantage is that a fork only > contains the differences between yours and an original. Rather than > the complete cloned repo, so this would save space too. > > Mike > > > On 14 Jun 2008, at 15:33, Michael Barton wrote: > > >That looks really good. 
I think BioRuby being the first bio* library > >to use git and Github for distributed revision control is a really > >great step, and demonstrates the forward thinking of the BioRuby > >community. > > > >On 13 Jun 2008, at 02:38, Pjotr Prins wrote: > > > >>The Bioruby repository has been cloned to bioruby-testing-central: > >> > >>http://github.com/pjotrp/bioruby-testing-central/tree/master > >> > >>The convention is to name your repository as 'bioruby-testing- > >>yourname'. > >>So my version will be bioruby-testing-pjotr. If you register > >>yourself > >>with github I can add you as a collaborator. Note: we are *not* > >>competing with the main Bioruby tree - this is a facility to > >>encourage code submissions. It is up to the main Bioruby maintainers > >>whether stuff gets included in the main tree. This is a > >>bioruby-testing tree. > >> > >>Clone the central repository with: > >> > >>git clone git://github.com/pjotrp/bioruby-testing-central.git > >> > >>You don't need to register for that. Patches can be submitted over > >>E-mail. > >> > >>For using git see the tutorial at: > >> > >>http://kernel.org/pub/software/scm/git/docs/gittutorial.html > >> > >> > >>_______________________________________________ > >>BioRuby mailing list > >>BioRuby at lists.open-bio.org > >>http://lists.open-bio.org/mailman/listinfo/bioruby > > > > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From mattscilipoti at possiamo.com Wed Jun 25 17:53:53 2008 From: mattscilipoti at possiamo.com (Matt Scilipoti) Date: Wed, 25 Jun 2008 13:53:53 -0400 Subject: [BioRuby] command not found: blastall and broken pipe from apache2 and passenger (mod_rails) Message-ID: <217377760806251053pdba9a18rff940d5d52814609@mail.gmail.com> I am attempting to use bioruby (blast) with a rails application. It works on my dev machine (OSX), but not the Production Server (suse sles). 
When I attempt to perform a blast query on the Production server I receive "Errno::EPIPE (Broken pipe)" and "command not found: blastall". The permissions and path look correct to me. Is it possible that there is an apache permission issue? This occurred intermittently when I was using a mongrel_cluster (it was usually fixed when I restarted the cluster manually through ssh; if I used capistrano to restart the cluster, it would not fix it).

Production server config:
  Apache2
  passenger 2.0.1 (mod_rails)

> which blastall
/usr/local/bin/blastall

> ls -lsa /usr/local/bin/blastall
0 lrwxrwxrwx 1 root root 40 2008-06-25 13:31 /usr/local/bin/blastall -> /usr/local/lib/blast-2.2.16/bin/blastall

> ls -lsa /usr/local/lib/blast-2.2.16/bin/blastall
4388 -rwxr-xr-x 1 mpr mpr 4488387 2007-03-25 10:28 /usr/local/lib/blast-2.2.16/bin/blastall

> echo $PATH
/usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/games:/opt/gnome/bin:/opt/kde3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin

Errors:
production.log:
Errno::EPIPE (Broken pipe):
  /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:163:in `write'
  /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:163:in `print'
  /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:163:in `query_command_popen'
  /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:161:in `popen'
  /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:161:in `query_command_popen'
  /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:148:in `query_command'
  /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb:245:in `exec_local'
  /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb:212:in `send'
  /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb:212:in `query'
  /app/models/blast_query.rb:83:in `query'

The apache error.log indicates:
/usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb:186: command not found: blastall -p blastp -d STYSiteAbstract -p blastp -m 7 -e 0.001 -F F -M Blosum62Phosbz -v 100 -b 100 -g F -T F -I T -U F -W 3

Thank you,
Matt -- Matt Scilipoti | Possiamo Consulting LLC | 443-538-8656 | Coaching, Training & Development From adamnkraut at gmail.com Wed Jun 25 18:30:19 2008 From: adamnkraut at gmail.com (Adam Kraut) Date: Wed, 25 Jun 2008 14:30:19 -0400 Subject: [BioRuby] command not found: blastall and broken pipe from apache2 and passenger (mod_rails) In-Reply-To: <217377760806251053pdba9a18rff940d5d52814609@mail.gmail.com> References: <217377760806251053pdba9a18rff940d5d52814609@mail.gmail.com> Message-ID: <134ede0b0806251130u3c542a70j7de8196a4ee63c95@mail.gmail.com> It's possible that 'blastall' is not in the path of the user running the apache process. On my system Apache runs as user 'www', so to run blastall from a rails app I would edit the system-wide profile in /etc/profile. You might also want to check how to set the user for Passenger as it may be different from Apache. Using the full path (/usr/local/bin/blastall) should also work for any user but I'm not sure if the bioruby wrapper lets you do this. Cheers, Adam On Wed, Jun 25, 2008 at 1:53 PM, Matt Scilipoti wrote: > I am attempting to use bioruby (blast) with a rails application. It > works on my dev machine (OSX), but not the Production Server (suse > sles). When I attempt to perform a blast query on the Production > server I receive "Errno::EPIPE (Broken pipe)" and "command not found: > blastall". The permissions and path look correct to me. Is it > possible that there is an apache permission issue? This occurred > intermittently when I was using a mongrel_cluster (usually fixed when > I restarted the cluster manually thru ssh. If I used capistrano to > restart the cluster, it would not fix it. 
> > Production server config: > Apache2 > passenger 2.0.1 (mod_rails) > > > which blastall > /usr/local/bin/blastall > > > ls -lsa /usr/local/bin/blastall > 0 lrwxrwxrwx 1 root root 40 2008-06-25 13:31 /usr/local/bin/blastall > -> /usr/local/lib/blast-2.2.16/bin/blastall > > > ls -lsa /usr/local/lib/blast-2.2.16/bin/blastall > 4388 -rwxr-xr-x 1 mpr mpr 4488387 2007-03-25 10:28 > /usr/local/lib/blast-2.2.16/bin/blastall > > > echo $PATH > > /usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/games:/opt/gnome/bin:/opt/kde3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin > > Errors: > production.log: > Errno::EPIPE (Broken pipe): > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:163:in `write' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:163:in `print' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:163:in > `query_command_popen' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:161:in `popen' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:161:in > `query_command_popen' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:148:in > `query_command' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb:245:in > `exec_local' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb:212:in > `send' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb:212:in > `query' > /app/models/blast_query.rb:83:in `query' > > > The apache error.log indicates: > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb:186: > command not found: blastall -p blastp -d STYSiteAbstract -p blastp -m > 7 -e 0.001 -F F -M Blosum62Phosbz -v 100 -b 100 -g F -T F -I T -U F -W > 3 > > Thank you, > Matt > -- > Matt Scilipoti | Possiamo Consulting LLC | 443-538-8656 | Coaching, > Training & Development > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby > From adamnkraut at gmail.com Wed Jun 25 19:59:59 2008 From: adamnkraut 
at gmail.com (Adam Kraut) Date: Wed, 25 Jun 2008 15:59:59 -0400 Subject: [BioRuby] command not found: blastall and broken pipe from apache2 and passenger (mod_rails) In-Reply-To: <217377760806251241l1180fe94q5f788900ee1aab9@mail.gmail.com> References: <217377760806251053pdba9a18rff940d5d52814609@mail.gmail.com> <134ede0b0806251130u3c542a70j7de8196a4ee63c95@mail.gmail.com> <217377760806251241l1180fe94q5f788900ee1aab9@mail.gmail.com> Message-ID: <20B316D5-B461-45D5-8E0C-F47ED5176466@gmail.com> One way to test the 'www' user's environment would be to switch to that user with 'su www' or 'sudo -u www -s'. That way you can check if blastall is actually in $PATH. Regarding the per-server configuration, do you use multiple production servers or do you mean the full path will be different between development and production? For the latter, you can set variables in development.rb and production.rb. Something like BLAST_PATH = '/path/to/blastall' can be set per environment and in your application set (factory.blastall = BLAST_PATH). That's the way I do it but there may be better solutions out there. Also, if you reply through the bioruby list, more people can follow the discussion and offer advice. Best, Adam On Jun 25, 2008, at 3:41 PM, Matt Scilipoti wrote: > Thank you. > I tried adding the path to /etc/profile.local (recommended in SUSE), > but I was unsure how to ensure that 'www' was using the new > profile. I tried restarting apache, but the error still occurs. > Is a server restart necessary? This is difficult, but possible. > > The problem is solved by assigning the full path for blastall > (factory.blastall='/usr/local/bin/blastall'). But this solution > requires me to provide some configuration for each server, so a path > solution would be best. > > Thanks again, > Matt > > On Wed, Jun 25, 2008 at 2:30 PM, Adam Kraut > wrote: > It's possible that 'blastall' is not in the path of the user running > the apache process. On my system Apache runs as user 'www', so to
On my system Apache runs as user 'www', so to > run blastall from a rails app I would edit the system-wide profile > in /etc/profile. You might also want to check how to set the user > for Passenger as it may be different from Apache. Using the full > path (/usr/local/bin/blastall) should also work for any user but I'm > not sure if the bioruby wrapper lets you do this. > > Cheers, > Adam > > On Wed, Jun 25, 2008 at 1:53 PM, Matt Scilipoti > wrote: > I am attempting to use bioruby (blast) with a rails application. It > works on my dev machine (OSX), but not the Production Server (suse > sles). When I attempt to perform a blast query on the Production > server I receive "Errno::EPIPE (Broken pipe)" and "command not found: > blastall". The permissions and path look correct to me. Is it > possible that there is an apache permission issue? This occurred > intermittently when I was using a mongrel_cluster (usually fixed when > I restarted the cluster manually thru ssh. If I used capistrano to > restart the cluster, it would not fix it. 
> > Production server config: > Apache2 > passenger 2.0.1 (mod_rails) > > > which blastall > /usr/local/bin/blastall > > > ls -lsa /usr/local/bin/blastall > 0 lrwxrwxrwx 1 root root 40 2008-06-25 13:31 /usr/local/bin/blastall > -> /usr/local/lib/blast-2.2.16/bin/blastall > > > ls -lsa /usr/local/lib/blast-2.2.16/bin/blastall > 4388 -rwxr-xr-x 1 mpr mpr 4488387 2007-03-25 10:28 > /usr/local/lib/blast-2.2.16/bin/blastall > > > echo $PATH > /usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/games:/opt/gnome/ > bin:/opt/kde3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin > > Errors: > production.log: > Errno::EPIPE (Broken pipe): > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:163:in > `write' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:163:in > `print' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:163:in > `query_command_popen' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:161:in > `popen' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:161:in > `query_command_popen' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:148:in > `query_command' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb:245:in > `exec_local' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb: > 212:in `send' > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb: > 212:in `query' > /app/models/blast_query.rb:83:in `query' > > > The apache error.log indicates: > /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb:186: > command not found: blastall -p blastp -d STYSiteAbstract -p blastp -m > 7 -e 0.001 -F F -M Blosum62Phosbz -v 100 -b 100 -g F -T F -I T -U F -W > 3 > > Thank you, > Matt > -- > Matt Scilipoti | Possiamo Consulting LLC | 443-538-8656 | Coaching, > Training & Development > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby > > > > > -- > Matt Scilipoti | Possiamo Consulting LLC | 
443-538-8656 | Coaching, > Training & Development From mattscilipoti at possiamo.com Thu Jun 26 16:10:45 2008 From: mattscilipoti at possiamo.com (Matt Scilipoti) Date: Thu, 26 Jun 2008 12:10:45 -0400 Subject: [BioRuby] command not found: blastall and broken pipe from apache2 and passenger (mod_rails) In-Reply-To: <20B316D5-B461-45D5-8E0C-F47ED5176466@gmail.com> References: <217377760806251053pdba9a18rff940d5d52814609@mail.gmail.com> <134ede0b0806251130u3c542a70j7de8196a4ee63c95@mail.gmail.com> <217377760806251241l1180fe94q5f788900ee1aab9@mail.gmail.com> <20B316D5-B461-45D5-8E0C-F47ED5176466@gmail.com> Message-ID: <217377760806260910y487677a8gde90f93194130df@mail.gmail.com> Thank you for this additional help. In the short term, I chose to use a constant in the environment files as well. We have developers on different platforms, so this is still problematic. I tried updating the PATH in /etc/profile and restarting apache, but it still couldn't find blastall. This is a client's hosted machine. This process has alerted me to the fact that both apache and passenger are running as root. I will research whether this is standard on SUSE or specific to my client's setup, and correct accordingly. This also explains why passenger still couldn't find blastall after I updated /etc/profile and restarted apache: root's environment probably wasn't updated. I don't have the root password, so I can't `source /etc/profile` as root. Just to make sure my assumptions are right: if apache and passenger are running under the 'www' user, once I make a change to /etc/profile, will they get this change when I restart apache? Or is another action needed? Thanks again, matt On Wed, Jun 25, 2008 at 3:59 PM, Adam Kraut wrote: > One way to test the 'www' user's environment would be to switch to that > user with 'su www' or 'sudo www'. That way you can check if blastall is > actually in $PATH.
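Checking PATH from a login shell (via `su`, `sudo` or `/etc/profile`) does not necessarily show what the Apache/Passenger worker sees. That worker's `ENV['PATH']` is the one that matters, and it can differ from any shell's. In the stack trace from the original report, BioRuby 1.2.1 fails while writing the query to the child process (`command.rb:163:in 'write'`). That is consistent with the child never managing to exec `blastall`. The sketch below uses only the Ruby standard library and resolves the path inside the running application. It assumes two hypothetical names that are not part of BioRuby: a `locate_executable` helper that searches the current process's PATH plus optional fallback directories, and a `blastall_path` method that prefers an explicit `BLASTALL_PATH` environment variable.

```ruby
# Hypothetical helper (not part of BioRuby): find an executable the way a
# shell would, but using the PATH of *this* process (e.g. the Passenger
# worker), followed by optional fallback directories.
def locate_executable(name, fallback_dirs = [], path = ENV['PATH'].to_s)
  dirs = path.split(File::PATH_SEPARATOR) + fallback_dirs
  dirs.each do |dir|
    next if dir.nil? || dir.empty?
    candidate = File.join(dir, name)
    # File.file? and File.executable? follow symlinks, so a link such as
    # /usr/local/bin/blastall -> /usr/local/lib/blast-2.2.16/bin/blastall works.
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Hypothetical per-deployment setting: an explicit BLASTALL_PATH environment
# variable wins; otherwise search PATH, then /usr/local/bin as a fallback.
def blastall_path
  explicit = ENV['BLASTALL_PATH']
  return explicit if explicit && File.executable?(explicit)

  locate_executable('blastall', ['/usr/local/bin']) ||
    raise("blastall not found; PATH seen by this process: #{ENV['PATH'].inspect}")
end
```

In the Rails app, the result could be assigned the way the thread already does, e.g. `factory.blastall = blastall_path`. If the lookup fails, the exception message records the PATH that the Passenger worker actually has. Getting that information this way does not require root access.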
Regarding the per-server configuration, do you use > multiple production servers or do you mean the full path will be different > between development and production? For the latter, you can set variables > in development.rb and production.rb. Something like BLAST_PATH = > '/path/to/blastall' can be set per environment and in your application set > (factory.blastall = BLAST_PATH). That's the way I do it but there may be > better solutions out there. Also, if you reply through the bioruby list > more people can follow the discussion and offer advice. > Best, > Adam > > On Jun 25, 2008, at 3:41 PM, Matt Scilipoti wrote: > > Thank you. I tried adding the path to /etc/profile.local (recommended in > SUSE), but I was unsure how to ensure that 'www' was using the new profile. > I tried restarting apache, but the error still occurs. Is a server > restart necessary? This is difficult, but possible. > > The problem is solved by assigning the full path for blastall > (factory.blastall='/usr/local/bin/blastall'). But this solution requires > me to provide some configuration for each server, so a path solution would > be best. > > Thanks again, > Matt > > On Wed, Jun 25, 2008 at 2:30 PM, Adam Kraut wrote: > >> It's possible that 'blastall' is not in the path of the user running the >> apache process. On my system Apache runs as user 'www', so to run blastall >> from a rails app I would edit the system-wide profile in /etc/profile. You >> might also want to check how to set the user for Passenger as it may be >> different from Apache. Using the full path (/usr/local/bin/blastall) should >> also work for any user but I'm not sure if the bioruby wrapper lets you do >> this. >> >> Cheers, >> Adam >> >> On Wed, Jun 25, 2008 at 1:53 PM, Matt Scilipoti < >> mattscilipoti at possiamo.com> wrote: >> >>> I am attempting to use bioruby (blast) with a rails application. It >>> works on my dev machine (OSX), but not the Production Server (suse >>> sles). 
When I attempt to perform a blast query on the Production >>> server I receive "Errno::EPIPE (Broken pipe)" and "command not found: >>> blastall". The permissions and path look correct to me. Is it >>> possible that there is an apache permission issue? This occurred >>> intermittently when I was using a mongrel_cluster (usually fixed when >>> I restarted the cluster manually thru ssh. If I used capistrano to >>> restart the cluster, it would not fix it. >>> >>> Production server config: >>> Apache2 >>> passenger 2.0.1 (mod_rails) >>> >>> > which blastall >>> /usr/local/bin/blastall >>> >>> > ls -lsa /usr/local/bin/blastall >>> 0 lrwxrwxrwx 1 root root 40 2008-06-25 13:31 /usr/local/bin/blastall >>> -> /usr/local/lib/blast-2.2.16/bin/blastall >>> >>> > ls -lsa /usr/local/lib/blast-2.2.16/bin/blastall >>> 4388 -rwxr-xr-x 1 mpr mpr 4488387 2007-03-25 10:28 >>> /usr/local/lib/blast-2.2.16/bin/blastall >>> >>> > echo $PATH >>> >>> /usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/games:/opt/gnome/bin:/opt/kde3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin >>> >>> Errors: >>> production.log: >>> Errno::EPIPE (Broken pipe): >>> /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:163:in >>> `write' >>> /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:163:in >>> `print' >>> /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:163:in >>> `query_command_popen' >>> /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:161:in >>> `popen' >>> /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:161:in >>> `query_command_popen' >>> /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/command.rb:148:in >>> `query_command' >>> /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb:245:in >>> `exec_local' >>> /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb:212:in >>> `send' >>> /usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb:212:in >>> `query' >>> /app/models/blast_query.rb:83:in `query' >>> >>> >>> The apache error.log indicates: >>> 
/usr/lib/ruby/gems/1.8/gems/bio-1.2.1/lib/bio/appl/blast.rb:186: >>> command not found: blastall -p blastp -d STYSiteAbstract -p blastp -m >>> 7 -e 0.001 -F F -M Blosum62Phosbz -v 100 -b 100 -g F -T F -I T -U F -W >>> 3 >>> >>> Thank you, >>> Matt >>> -- >>> Matt Scilipoti | Possiamo Consulting LLC | 443-538-8656 | Coaching, >>> Training & Development >>> _______________________________________________ >>> BioRuby mailing list >>> BioRuby at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioruby >>> >> >> > > > -- > Matt Scilipoti | Possiamo Consulting LLC | 443-538-8656 | Coaching, > Training & Development > > > -- Matt Scilipoti | Possiamo Consulting LLC | 443-538-8656 | Coaching, Training & Development From donttrustben at gmail.com Sat Jun 28 04:20:25 2008 From: donttrustben at gmail.com (Ben Woodcroft) Date: Sat, 28 Jun 2008 14:20:25 +1000 Subject: [BioRuby] GFF3 attribute parser problems Message-ID: Hi, I noticed there is a problem with the Bio::GFF::GFF3 class. 
Using bioruby 1.2.1:

  class GFF3 < GFF
    VERSION = 3

    private

    def parse_attributes(attributes)
      hash = Hash.new
      attributes.split(/[^\\];/).each do |atr|
        key, value = atr.split('=', 2)
        hash[key] = value
      end
      return hash
    end

My problem is with the split(/[^\\];/) bit, because it chops off an extra character at the end of each value (the character just before the semicolon):

  irb(main):001:0> 'abc=def;one=two'.split(/[^\\];/)
  => ["abc=de", "one=two"]

where we want

  => ["abc=def", "one=two"]

I took a shortcut to solve it: I ignored the escaping of ';' and just did this:

  hash = Hash.new
  attributes.split(/;/).each do |atr|
    key, value = atr.split('=', 2)
    hash[key] = value
  end
  return hash

Thanks, ben From donttrustben at gmail.com Sat Jun 28 04:26:14 2008 From: donttrustben at gmail.com (Ben Woodcroft) Date: Sat, 28 Jun 2008 14:26:14 +1000 Subject: [BioRuby] blast -m7 (xml) and multiple queries Message-ID: Hi, I seem to have run across a bug in the bioruby blast report parser, in that it isn't able to handle reports that span multiple query sequences. My code for parsing is:

  Bio::Blast.reports(ARGF) do |report|
    puts "Hits for " + report.query_def + " against " + report.db
    report.each {|hit|
      hit.each do |hsp|
        puts [ report.query_def, hit.accession, hsp.query_from, hsp.query_to,
               hsp.hit_from, hsp.hit_to, hsp.evalue, hit.target_def ].join("\t")
      end
    }
  end

When I run this on blast XML output with 2 queries (the 1st has 10 hits and the 2nd has 7), I get 8 hits shown, which is somewhat confusing. The query sequences are somewhat similar, so they have some hits in common - perhaps this partly explains the number 8. I'm using bioruby from git http://github.com/bioruby/bioruby/commit/a61b16163d3ca74f3f7c8d8e8f03f5f8c68dee60 with the newest blast (2.2.18). Is this easy to fix? Is there a workaround? A partial answer: according to http://rubyforge.org//tracker/index.php?func=detail&aid=20272&group_id=769&atid=3037 this is an unopened, unfixed bug, caused by a change in the NCBI XML schema. I can work around it by reblasting with the legacy flag -V.
Thanks in advance, ben From donttrustben at gmail.com Sat Jun 28 04:45:40 2008 From: donttrustben at gmail.com (Ben Woodcroft) Date: Sat, 28 Jun 2008 14:45:40 +1000 Subject: [BioRuby] GFF3 attribute parser problems In-Reply-To: References: Message-ID: not fixed as of most recent git commit, either http://github.com/bioruby/bioruby/tree/master/lib/bio/db/gff.rb line 120 2008/6/28 Ben Woodcroft : > Hi, > > I noticed there is a problem with the Bio::GFF::GFF3 class. Using bioruby 1.2.1 > > class GFF3 < GFF > VERSION = 3 > > private > > def parse_attributes(attributes) > hash = Hash.new > attributes.split(/[^\\];/).each do |atr| > key, value = atr.split('=', 2) > hash[key] = value > end > return hash > end > > My problem is with the split([/^\\]) bit, because it chops off an > extra character at the end of the key: > > irb(main):001:0> 'abc=def;one=two'.split(/[^\\];/) > => ["abc=de", "one=two"] > > where we want > => ["abc=def", "one=two"] > > > > I took a shortcut to solve it and ignored the escaping of the ; and > just did this > > hash = Hash.new > attributes.split(/;/).each do |atr| > key, value = atr.split('=', 2) > hash[key] = value > end > return hash > > > Thanks, > ben > -- FYI: My email addresses at unimelb, uq and gmail all redirect to the same place. From donttrustben at gmail.com Sun Jun 29 01:57:39 2008 From: donttrustben at gmail.com (Ben Woodcroft) Date: Sun, 29 Jun 2008 11:57:39 +1000 Subject: [BioRuby] GFF3 attribute parser problems In-Reply-To: References: Message-ID: I have attempted a fix, and pushed it to github. I forked the main branch, not the testing one, because I class this as a bug fix, not a new feature. 
Available at http://github.com/wwood/bioruby/tree/master I actually had to create a new class Bio::GFF::GFF3::Record: > not fixed as of most recent git commit, either > http://github.com/bioruby/bioruby/tree/master/lib/bio/db/gff.rb > > line 120 > > 2008/6/28 Ben Woodcroft : >> Hi, >> >> I noticed there is a problem with the Bio::GFF::GFF3 class. Using bioruby 1.2.1 >> >> class GFF3 < GFF >> VERSION = 3 >> >> private >> >> def parse_attributes(attributes) >> hash = Hash.new >> attributes.split(/[^\\];/).each do |atr| >> key, value = atr.split('=', 2) >> hash[key] = value >> end >> return hash >> end >> >> My problem is with the split([/^\\]) bit, because it chops off an >> extra character at the end of the key: >> >> irb(main):001:0> 'abc=def;one=two'.split(/[^\\];/) >> => ["abc=de", "one=two"] >> >> where we want >> => ["abc=def", "one=two"] >> >> >> >> I took a shortcut to solve it and ignored the escaping of the ; and >> just did this >> >> hash = Hash.new >> attributes.split(/;/).each do |atr| >> key, value = atr.split('=', 2) >> hash[key] = value >> end >> return hash >> >> >> Thanks, >> ben >> > > > > -- > FYI: My email addresses at unimelb, uq and gmail all redirect to the same place. > -- FYI: My email addresses at unimelb, uq and gmail all redirect to the same place. From ngoto at gen-info.osaka-u.ac.jp Sun Jun 29 09:51:08 2008 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Sun, 29 Jun 2008 18:51:08 +0900 Subject: [BioRuby] GFF3 attribute parser problems In-Reply-To: References: Message-ID: <20080629095108.F0FD01CBC52E@idnmail.gen-info.osaka-u.ac.jp> Hi, Thank you for reporting bugs. The GFF3 specification http://song.sourceforge.net/gff3.shtml says that URL escaping rules are used for escaping semicolons, not backslashes. (cited from http://song.sourceforge.net/gff3.shtml) >> Column 9: "attributes" >> >> A list of feature attributes in the format tag=value. Multiple >> tag=value pairs are separated by semicolons.
URL escaping rules are >> used for tags or values containing the following characters: ",=;". So, the existing code in BioRuby 1.2.1 is apparently wrong. (I don't know, but perhaps it was written before the specification was well established?) If nonstandard (and illegal) GFF3 data using backslash for escape is popular, we should also consider it, but the main GFF3 class should follow the official specification. I see your changes in git, but your code still seems to use the "wrong" escaping rule and does not handle escaping of other characters (",=;&%" and %XX escapes). BTW, I think the Bio::GFF classes in bioruby should be changed to support creating GFF objects from scratch and writing GFF output. Thanks, Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org On Sun, 29 Jun 2008 11:57:39 +1000 "Ben Woodcroft" wrote: > I have attempted a fix, and pushed it to github. I forked the main > branch, not the testing one, because I class this as a bug fix, not a > new feature. Available at > http://github.com/wwood/bioruby/tree/master > > I actually had to create a new class > Bio::GFF::GFF3::Record attributes happens inside the record, not the parser. I'm not sure > this is the most sensible way, but I'm following the laziness virtue > for now. > > I hope these kinds of commits get added to the main repo.. > > Thanks, > ben > > > 2008/6/28 Ben Woodcroft : > > not fixed as of most recent git commit, either > > http://github.com/bioruby/bioruby/tree/master/lib/bio/db/gff.rb > > > > line 120 > > > > 2008/6/28 Ben Woodcroft : > >> Hi, > >> > >> I noticed there is a problem with the Bio::GFF::GFF3 class.
Using bioruby 1.2.1 > >> > >> class GFF3 < GFF > >> VERSION = 3 > >> > >> private > >> > >> def parse_attributes(attributes) > >> hash = Hash.new > >> attributes.split(/[^\\];/).each do |atr| > >> key, value = atr.split('=', 2) > >> hash[key] = value > >> end > >> return hash > >> end > >> > >> My problem is with the split([/^\\]) bit, because it chops off an > >> extra character at the end of the key: > >> > >> irb(main):001:0> 'abc=def;one=two'.split(/[^\\];/) > >> => ["abc=de", "one=two"] > >> > >> where we want > >> => ["abc=def", "one=two"] > >> > >> > >> > >> I took a shortcut to solve it and ignored the escaping of the ; and > >> just did this > >> > >> hash = Hash.new > >> attributes.split(/;/).each do |atr| > >> key, value = atr.split('=', 2) > >> hash[key] = value > >> end > >> return hash > >> > >> > >> Thanks, > >> ben > >> > > > > > > > > -- > > FYI: My email addresses at unimelb, uq and gmail all redirect to the same place. > > > > > > -- > FYI: My email addresses at unimelb, uq and gmail all redirect to the same place. > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From donttrustben at gmail.com Sun Jun 29 14:33:43 2008 From: donttrustben at gmail.com (Ben Woodcroft) Date: Mon, 30 Jun 2008 00:33:43 +1000 Subject: [BioRuby] GFF3 attribute parser problems In-Reply-To: <20080629095108.F0FD01CBC52E@idnmail.gen-info.osaka-u.ac.jp> References: <20080629095108.F0FD01CBC52E@idnmail.gen-info.osaka-u.ac.jp> Message-ID: Thanks for your reply. I've had a look at the spec - it seems to be far more complex that the current bioruby GFF3 is with the cross references, predefined tags, fasta, etc. - but I agree it would be good to have a fully featured parser/writer. I fixed it so with all the escaping characters, and created the corresponding tests, and committed to the same github repo. 
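A minimal Ruby sketch of spec-style parsing, not the code Ben committed. The 1.2.1 pattern /[^\\];/ matches the preceding non-backslash character together with the semicolon, so split throws that character away. Following the rule Naohisa cites, this sketch splits on a literal ';', splits each pair on the first '=', splits multiple values on ',' (the GFF3 convention for multi-valued tags), and only then decodes %XX escapes. The helper names gff3_unescape and parse_gff3_attributes are hypothetical, not BioRuby API.

```ruby
# Minimal sketch: parse a GFF3 column-9 string using the spec's
# %XX (URL-style) escaping instead of backslashes.
# gff3_unescape / parse_gff3_attributes are hypothetical helpers.

# Decode %XX escapes. Work on a binary copy so bytes >= 0x80 can be
# substituted safely, then restore the original encoding. '+' is left
# untouched because GFF3 is not form-encoded.
def gff3_unescape(str)
  str.b.gsub(/%(\h\h)/) { $1.hex.chr }.force_encoding(str.encoding)
end

# Returns a Hash of tag => Array of values.
def parse_gff3_attributes(column9)
  attributes = {}
  column9.split(';').each do |pair|
    next if pair.strip.empty?             # tolerate empty fields, e.g. ';;'
    tag, value = pair.split('=', 2)
    # Split multiple values on a literal ',' *before* decoding, so an
    # escaped %2C stays inside its value.
    values = value.to_s.split(',').map { |v| gff3_unescape(v) }
    attributes[gff3_unescape(tag)] = values
  end
  attributes
end

# The 1.2.1 pattern consumes the character before each ';':
p 'abc=def;one=two'.split(/[^\\];/)   # => ["abc=de", "one=two"]
p parse_gff3_attributes('ID=gene1;Note=a%3Bb%2Cc;Parent=p1,p2')
```

Decoding happens after all splitting, so an escaped %3B or %2C can never be mistaken for a separator. CGI.unescape would be the wrong tool here, since it also turns '+' into a space.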
I don't know of any GFF file that uses the illegal backslashes, so I took
that code out. Actually, the current 1.2.1 code doesn't either - the
backslash code isn't ever accessed as far as I can tell.

I added to_s methods for the GFF3 class and the corresponding Record class,
like you suggested. I think a big problem with the GFF classes is that they
load the whole file into memory, and the to_s method doesn't fix this.
Maybe in the future...

Thanks,
ben

From pjotr2008 at thebird.nl Mon Jun 30 07:34:32 2008
From: pjotr2008 at thebird.nl (Pjotr Prins)
Date: Mon, 30 Jun 2008 09:34:32 +0200
Subject: [BioRuby] GFF3 attribute parser problems
In-Reply-To:
References:
Message-ID: <20080630073432.GB32016@thebird.nl>

On Sun, Jun 29, 2008 at 11:57:39AM +1000, Ben Woodcroft wrote:
> I have attempted a fix, and pushed it to github. I forked the main
> branch, not the testing one, because I class this as a bug fix, not a
> new feature. Available at
> http://github.com/wwood/bioruby/tree/master

I think that is the correct thing to do.

I had an E-mail questioning the role of bioruby-testing-central - or
whether new development should not happen against the main tree.

I would have it like this: New feature development can happen against
the main tree with the blessing of the main tree maintainers.
bioruby-testing-central is for development that is, for one reason or
another, not encouraged in the main tree. Truly a *testing* tree, for
people to play with. Like I am doing with my microarray stuff now. Once
my microarray code matures, it may well go into main.

Naturally everyone can branch off in such a way, on his or her own. But
this way new feature development is visible to all. That is what it is
for. I encourage using this tree for all things experimental.

Pj.
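On the output side of the GFF3 thread, Ben mentions adding to_s methods and Naohisa lists the characters that need escaping (",=;&%"). A minimal sketch of that inverse operation, assuming attributes are held as a Hash of tag => value or Array of values; gff3_escape and format_gff3_attributes are hypothetical helpers, not BioRuby's to_s. Tab, CR and LF are also escaped here as an assumption, since a raw tab or newline would break the tab-delimited line.

```ruby
# Sketch of the writer side (hypothetical names, not BioRuby's to_s).
# The characters listed in the thread (",=;&%") plus tab/CR/LF are
# percent-encoded so the attribute column stays parseable.
GFF3_ESCAPE_PATTERN = /[,=;&%\t\r\n]/

# Replace each special character with %XX (uppercase hex of its code).
def gff3_escape(str)
  str.to_s.gsub(GFF3_ESCAPE_PATTERN) { |c| format('%%%02X', c.ord) }
end

# Join tag=value pairs with ';' and multiple values with ','.
# Array() lets a single value be given without wrapping it in an Array.
def format_gff3_attributes(attributes)
  attributes.map do |tag, values|
    escaped = Array(values).map { |v| gff3_escape(v) }.join(',')
    "#{gff3_escape(tag)}=#{escaped}"
  end.join(';')
end

puts format_gff3_attributes('ID' => 'gene1',
                            'Note' => ['a;b,c'],
                            'Parent' => %w[p1 p2])
# prints ID=gene1;Note=a%3Bb%2Cc;Parent=p1,p2
```

Escaping '%' itself keeps the round trip unambiguous: without it, a literal "%3B" in a value would be decoded back into ';' by a %XX-aware parser.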