From pjotr.public78 at thebird.nl Tue Dec 1 06:38:35 2009
From: pjotr.public78 at thebird.nl (Pjotr Prins)
Date: Tue, 1 Dec 2009 12:38:35 +0100
Subject: [emboss-dev] Mapping EMBOSS to Ruby, Perl and Python
In-Reply-To: <20091130115717.GA9235@thebird.nl>
References: <20091130115717.GA9235@thebird.nl>
Message-ID: <20091201113835.GA22908@thebird.nl>

Hello, anyone on this list?

Pj.

On Mon, Nov 30, 2009 at 12:57:17PM +0100, Pjotr Prins wrote:
> For your inspection, I have committed a patch for splitting out the
> logic of ./emboss/transeq.c. The patch is here:
>
> http://github.com/pjotrp/EMBOSS/commit/713800c4aa08ddf70b87f245a524c1a0b30c0942
>
> The simplified transeq.c is here:
>
> http://github.com/pjotrp/EMBOSS/blob/biolib/emboss/transeq.c
>
> The new interfaces are here:
>
> http://github.com/pjotrp/EMBOSS/blob/biolib/emboss/function/emboss_transeq.c
>
> Basically I have split out the ACD logic and programming logic and
> given them new names:
>
>   int transeq_acd(int argc, char **argv)
>
>   AjPSeqout transeq(AjPSeqall seqall, AjPStr *framelist,
>                     AjPStr tablename, AjPRange regions,
>                     AjBool trim, AjBool clean, AjBool alternate)
>
> so you can call either from an external program. The advantage being
> the call interface is exactly the same, whether from the command
> line, the web interface, or directly through a shared linked library.
>
> What do you think? I propose to (slowly) accept splitting out the
> other routines in this fashion. As it does not interfere with EMBOSS
> it can be done in small steps.
>
> The file emboss/function/emboss_transeq.c may get some extra
> interfaces - the idea is that it contains nicely named and direct
> methods (unlike the internal 'ajCamelCase' naming conventions). A
> useful one would be a simple one reading frame translation with
> pre-selected translation table (for speed). But more on that later -
> I can also weight-lift that in biolib itself.
>
> The reason I want to do this here is to prevent duplication of
> functionality at different levels.
>
> Pj.
> _______________________________________________
> emboss-dev mailing list
> emboss-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss-dev

From biopython at maubp.freeserve.co.uk Tue Dec 1 07:02:21 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 1 Dec 2009 12:02:21 +0000
Subject: [emboss-dev] Mapping EMBOSS to Ruby, Perl and Python
In-Reply-To: <20091130115717.GA9235@thebird.nl>
References: <20091130115717.GA9235@thebird.nl>
Message-ID: <320fb6e00912010402m7ded2694ne6191c71935ddfa5@mail.gmail.com>

On Mon, Nov 30, 2009 at 11:57 AM, Pjotr Prins wrote:
>
> The file emboss/function/emboss_transeq.c may get some extra
> interfaces - the idea is that it contains nicely named and direct
> methods (unlike the internal 'ajCamelCase' naming conventions).
>

Naming conventions and what is "nice" is a personal judgement.
Why change things? For anyone used to the EMBOSS code base,
it is very advantageous to preserve the old names in any interface.
I would also say it makes sense to keep the "aj" prefix - it acts
as a namespace, avoiding name collisions with internal names.

Peter (an EMBOSS user and minor contributor)

From pjotr.public78 at thebird.nl Tue Dec 1 07:35:10 2009
From: pjotr.public78 at thebird.nl (Pjotr Prins)
Date: Tue, 1 Dec 2009 13:35:10 +0100
Subject: [emboss-dev] Mapping EMBOSS to Ruby, Perl and Python
In-Reply-To: <320fb6e00912010402m7ded2694ne6191c71935ddfa5@mail.gmail.com>
References: <20091130115717.GA9235@thebird.nl> <320fb6e00912010402m7ded2694ne6191c71935ddfa5@mail.gmail.com>
Message-ID: <20091201123510.GA23982@thebird.nl>

On Tue, Dec 01, 2009 at 12:02:21PM +0000, Peter wrote:
> On Mon, Nov 30, 2009 at 11:57 AM, Pjotr Prins wrote:
> >
> > The file emboss/function/emboss_transeq.c may get some extra
> > interfaces - the idea is that it contains nicely named and direct
> > methods (unlike the internal 'ajCamelCase' naming conventions).
> >
> Naming conventions and what is "nice" is a personal judgement.

Sure.

> Why change things? For anyone used to the EMBOSS code base,
> it is very advantageous to preserve the old names in any interface.

The standard naming is available - and not going to disappear. I
don't expect anyone to change his/her ways.

> I would also say it makes sense to keep the "aj" prefix - it acts
> as a namespace, avoiding name collisions with internal names.

It is poor man's namespacing. And it is fine for the backend. There is
no reason not to provide something better for users. Especially if
the front-end is C with extensions. With Biolib I can even introduce
C++, if needed.

Anyway, if EMBOSS does not like it - it does not have to go into the
main code base. I would prefer to export, at least, transeq and
transeq_acd to the external world. That would align with EMBOSS'
documentation of the binaries. It would make sense (to me) to have
that in the main code base.

The only thing I am really asking is to split code out of the main()
functions.

Pj.

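Editorial illustration of the split discussed in this thread: a minimal sketch, in Python rather than EMBOSS C, of keeping the program logic in a plain function and the option parsing in a thin wrapper. Everything here is an assumption made for illustration. The names translate and translate_cli are hypothetical stand-ins for transeq and transeq_acd, argparse stands in for ACD, and the toy translation uses only the standard genetic code. It is not the code from the patch.

```python
import argparse
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(seq, frame=1, trim=False):
    """Library entry point (hypothetical): plain arguments in, protein out.

    Plays the role of transeq() in the proposal. It knows nothing about
    argv, so a binding or another program can call it directly.
    """
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    dna = seq.upper().replace("U", "T")[frame - 1:]
    protein = "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")  # unknown codons become X
        for i in range(0, len(dna) - 2, 3)
    )
    # With trim set, drop trailing stop (*) and unknown (X) residues.
    return protein.rstrip("*X") if trim else protein


def translate_cli(argv):
    """Command-line entry point (hypothetical): parse options, delegate.

    Plays the role of transeq_acd(). A program's main() would only need
    to call this with the argument list.
    """
    parser = argparse.ArgumentParser(prog="translate")
    parser.add_argument("sequence")
    parser.add_argument("--frame", type=int, default=1)
    parser.add_argument("--trim", action="store_true")
    args = parser.parse_args(argv)
    print(translate(args.sequence, frame=args.frame, trim=args.trim))
    return 0
```

Under these assumptions, translate("ATGGCCTAA", trim=True) and translate_cli(["ATGGCCTAA", "--trim"]) go through the same code path, which is the property the proposal is after: one call interface whether the caller is the command line or a library binding.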
From pmr at ebi.ac.uk Thu Dec 10 08:36:14 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 10 Dec 2009 13:36:14 +0000
Subject: [emboss-dev] EMBOSS 6.1.0 patch 1.3
Message-ID: <4B20F94E.4000506@ebi.ac.uk>

A patch for EMBOSS 6.1.0 is on the FTP server.

This fixes problems with extractfeat, using format names with dashes
(fastq-sanger) in USAs, scaling issues in plot outputs, and some minor
bugs.

The files are on our FTP server

ftp://emboss.open-bio.org/pub/EMBOSS/fixes

with a patch file and instructions in the patches subdirectory.

Fix 3.
EMBOSS-6.1.0/ajax/ajfeat.c
EMBOSS-6.1.0/ajax/ajfeat.h
EMBOSS-6.1.0/ajax/ajgraph.c
EMBOSS-6.1.0/ajax/ajmath.c
EMBOSS-6.1.0/ajax/ajseq.c
EMBOSS-6.1.0/ajax/ajseqread.c
EMBOSS-6.1.0/ajax/ajseqwrite.c
EMBOSS-6.1.0/nucleus/embmisc.c
EMBOSS-6.1.0/nucleus/embmisc.h
EMBOSS-6.1.0/nucleus/embpat.c
EMBOSS-6.1.0/emboss/coderet.c
EMBOSS-6.1.0/emboss/extractfeat.c
EMBOSS-6.1.0/emboss/notseq.c
EMBOSS-6.1.0/emboss/prettyplot.c
EMBOSS-6.1.0/emboss/seqmatchall.c
EMBOSS-6.1.0/emboss/showfeat.c
EMBOSS-6.1.0/emboss/showpep.c
EMBOSS-6.1.0/emboss/showseq.c
EMBOSS-6.1.0/emboss/twofeat.c
EMBOSS-6.1.0/jemboss/utils/install-jemboss-server.sh
EMBOSS-6.1.0/jemboss/org/emboss/jemboss/server/AppendToLogFileThread.java
EMBOSS-6.1.0/jemboss/org/emboss/jemboss/server/JembossAuthServer.java

02-Dec-2009: Fixes problems with extractfeat. The fix includes cleaner
definitions of functions used to match feature tags and feature types
which result in minor updates to 6 other applications.

Extractfeat in previous versions used its own text parser to extract
feature data from only a limited set of formats. In release 6.1.0 it
was replaced by the standard EMBOSS feature table.

With no options set, extractfeat rejected all features (type '*' was
needed to extract features). Extractfeat default settings now extract
all features from an entry.

Features on the reverse strand were incorrectly processed (an effect
caused by some of the old extractfeat code remaining).
Reverse strand features are now correctly parsed, including both
"join(complement())" and "complement(join())" syntax in
EMBL/GenBank/DDBJ feature tables.

Fixes an issue in GenBank parsing where the ORIGIN line is absent.

Fixes scaling errors in prettyplot, especially in mEMBOSS when
plotting to a window on screen (the default output). The plplot
library does not report the true width and height for several devices.
The assumptions in prettyplot depend on reasonable size estimates.
Release 6.2.0 will have further corrections to plplot device scaling.

Fixes the counting of non-coding features in coderet.

Fixes a seqmatchall error for short sequences with perfect matches.

When reverse-complementing sequences, also reverses the quality
scores.

Allows '-' in format names in the USA syntax, so that the
fastq-sanger, fastq-illumina and fastq-solexa format names can be
used.

When reading protein sequences, a sequence with only a stop is now
recognized as empty (zero length) after processing ambiguity codes and
stops.

Fixes a problem writing features in PIR format when the feature table
is empty, for example a report file with no hits.

Fixes a dependency on 'ant' to install a Jemboss server.

Fixes a problem in logging Jemboss info/error messages.

regards,

Peter Rice

From biopython at maubp.freeserve.co.uk Tue Dec 15 07:07:48 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 15 Dec 2009 12:07:48 +0000
Subject: [emboss-dev] Updating http://www.open-bio.org/wiki/SourceCode
Message-ID: <320fb6e00912150407o48fd981fn16dedaf0581e5fef@mail.gmail.com>

Hello all,

We just had a puzzling mailing list query about the Biopython CVS
repository, which turned out to be partly due to some very dated
information here:

http://www.open-bio.org/wiki/SourceCode

I've made a few minor improvements, but feel the whole page could be
simplified. Am I right in thinking it is just EMBOSS still using CVS
(all the other projects are now on SVN or github, or obsolete)?
If so, since EMBOSS has nice CVS documentation on their own webpages,
could we remove most of the CVS text from the OBF wiki:

http://emboss.sourceforge.net/developers/cvs.html

Thanks,

Peter (@Biopython)

From biopython at maubp.freeserve.co.uk Tue Dec 15 08:11:55 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 15 Dec 2009 13:11:55 +0000
Subject: [emboss-dev] [Open-bio-l] Updating http://www.open-bio.org/wiki/SourceCode
In-Reply-To:
References: <320fb6e00912150407o48fd981fn16dedaf0581e5fef@mail.gmail.com>
Message-ID: <320fb6e00912150511u928e0a3s2187a11a634bda0a@mail.gmail.com>

On Tue, Dec 15, 2009 at 12:42 PM, Andy Jenkinson wrote:
> To be honest I'm not sure which (if any) of the BioDAS project's
> components are using CVS. IIRC something was but I don't have
> access so have never looked into it. Perhaps someone else can confirm?

This wiki page suggests that BioDAS is also still using CVS:

http://www.biodas.org/wiki/DAS/2#CVS_Access

Peter