From henrikki.almusa at helsinki.fi Thu Jul 1 10:08:06 2004
From: henrikki.almusa at helsinki.fi (Henrikki Almusa)
Date: Thu, 1 Jul 2004 17:08:06 +0300
Subject: Patten lists ajax header, third draft
In-Reply-To: <200406301632.40816.henrikki.almusa@helsinki.fi>
References: <200406281120.54203.henrikki.almusa@helsinki.fi>
 <200406291343.12877.henrikki.almusa@helsinki.fi>
 <200406301632.40816.henrikki.almusa@helsinki.fi>
Message-ID: <200407011708.06797.henrikki.almusa@helsinki.fi>

Hello

Here's the third version of the files 'ajpat.c' and 'ajpat.h'. At the moment
I have tested the regular expression handling and it seems to work now. I can
add a pattern to the list, test it against a string and then clear the list.
There is one compiler warning though (my attempt at fixing it causes deleting
to segfault).

ajpat.c: In function `ajPatternDel':
ajpat.c:53: warning: passing arg 1 of `ajRegFree' from incompatible pointer type

The code for testing was in dreg and was this:

  AjPPatlist plist;
  AjPPattern pat;
  AjPStr file;
  AjPStr tested;

  file=ajStrNewC("pattern.file");
  tested=ajStrNewC("ggagagagagttct");
  plist=ajPatlistNew();
  ajPatlistParsePatternFile(plist,file,1);
  while (ajPatlistGetNext(plist,&pat))
  {
    ajFmtPrint ("name: %S mismatch: %d\n",ajPatternGetName(pat),ajPatternGetMismatch(pat));
    patexp = ajPatternGetCompiledPattern(pat);
    if (ajRegExec(patexp,tested))
      ajFmtPrint (" found from '%d'\n",ajRegOffset(patexp));
  }
  ajDebug ("Starting deleting\n");
  ajPatlistDel(&plist);

Now the main issue with this is still the prosite pattern handling. From my
understanding it could be fixed by making prosite patterns use a struct to
move the needed pieces around. That would then be easy to use with this as
well.

The other point is the overloading of the acd functions. I don't yet know how
to do that.

However I would like some comments on whether this is a good way to do this
(and could be accepted to emboss, when ready).

Thanks,
--
Henrikki Almusa
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ajpat.c
Type: text/x-csrc
Size: 6542 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/emboss-dev/attachments/20040701/7d0f80f2/attachment.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ajpat.h
Type: text/x-chdr
Size: 1964 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/emboss-dev/attachments/20040701/7d0f80f2/attachment-0001.bin

From hegedus at biomembrane.hu Fri Jul 2 14:14:03 2004
From: hegedus at biomembrane.hu (Tamas Hegedus)
Date: Fri, 2 Jul 2004 20:14:03 +0200 (CEST)
Subject: USAs
Message-ID:

Dear All,

I would like to know if it is possible to hack ajax to handle USAs similar
to those listed below:
- USA:kw=something, ft=sthelse.
- USA:SELECT * FROM mytable WHERE..

I see you are working on pattern searches.
It would be great to have the possibility to define patterns in fuzzpro by USA:
fuzzpro -pattern=USA:patt_name USA:seq
I think the implementation of this would be useful.
Return 'value' could be a 'fasta' pattern file:
- seq_id type[regexpr|prosite|matrix]
- pattern

Or, to begin with, going the simplest way:
The return value is the simple pattern. I would be satisfied with only this, too :-)

Thanks for your help and for your answers,
Tamas

From hegedus at biomembrane.hu Fri Jul 2 14:20:16 2004
From: hegedus at biomembrane.hu (Tamas Hegedus)
Date: Fri, 2 Jul 2004 20:20:16 +0200 (CEST)
Subject: USAs2
Message-ID:

Sorry! I was inaccurate: see !!!
-----------------------
Dear All,

I would like to know if it is possible to hack ajax to handle USAs similar
to those listed below and !!!HOW!!!:
- USA:kw=something, ft=sthelse.
- USA:SELECT * FROM mytable WHERE..

I see you are working on pattern searches.
It would be great to have the possibility to define patterns in fuzzpro by USA:
fuzzpro -pattern=USA:patt_name USA:seq
I think the implementation of this would be useful.
Return 'value' could be a 'fasta' pattern file:
!!!
>seq_id type[regexpr|prosite|matrix]
pattern
!!!
Or, to begin with, going the simplest way:
The return value is the simple pattern. I would be satisfied with only this, too :-)

Thanks for your help and for your answers,
Tamas

--
Tamás Hegedűs, Research Associate        | http://www.biomembrane.hu
Membrane Research Group of               | mailto:hegedus at biomembrane.hu
Hungarian Academy of Sciences            | tel: 36-1-3724317
H-1113 Budapest Dioszegi u 64, HUNGARY   | fax: 36-1-3724353

From pmr at ebi.ac.uk Fri Jul 2 14:38:02 2004
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 02 Jul 2004 19:38:02 +0100
Subject: USAs2
In-Reply-To:
References:
Message-ID: <40E5AB8A.7010502@ebi.ac.uk>

Hi Tamas,

Thanks for the suggestion! It is late on Friday, so I will give it some
thought over the weekend.

> I would like to know if it is possible to hack ajax to handle similar USAs
> listed below and !!!HOW!!!:
> - USA:kw=something, ft=sthelse.
> - USA:SELECT * FROM mytable WHERE..

Yes, it is possible. But still a hack ... which means we have not yet
implemented it.

This is really an extended query language. I tried to define such extensions
last year when I moved back to academia, but have not yet had time to
implement anything.

This is an excellent time to start defining extended USAs. My plan was:

Start by thinking about the "SRS query language". You can search for
various "fields":

id (entry ID)
acc (accession number)
sv (sequence version ... and maybe GI number)
des (description)
key (keyword phrase)
org (taxonomy)
... and a few more ...

In SRS, you can use & (and), | (or), ! (but not) to combine search terms.

In SRS you can also use > and < to follow links to and from other databases.
SRS has only one link between any pair of databases - I would rather like to
use named links so we can choose which links to use.

I would like to allow multiple databases in the USA. There are some problems
choosing a good syntax.

I would also like to allow multiple fields - obviously id and acc, or
combining text fields.
Then, as you suggest, some SQL-like syntax would be nice.

It looks complicated, but we can work in small steps.

In all cases, we need to make this work with "EMBLCD" indexing, with reading
flatfile data, and with any other indexing system. We can also try to make
it work with SRS and SRSWWW (easy in some cases, hard in others)

> I see you are working on pattern searches.
> It would be great to have the possibility to define patterns in the
> fuzzpro by USA: fuzzpro -pattern=USA:patt_name USA:seq
> I think the implementation of this would be useful.
> Return 'value' could be a 'fasta' pattern file:

If I understand correctly, you want to define a file of named patterns, and
select one using a "USA" syntax.

This is not so simple ... because programs usually want only one type of
pattern. However, in ACD we can give the pattern a "knowntype" attribute so
EMBOSS (and any wrapper) knows what type of pattern is allowed.

We can then use Henrikki Almusa's pattern list to define a file of patterns,
and some pattern syntax to say which pattern(s) to use.

We do have a problem - we need to make these pattern "USAs" different from
simple patterns.

We also need a name for pattern definitions. I am sure we can think of one.

regards,

Peter Rice

From ableasby at hgmp.mrc.ac.uk Fri Jul 9 09:23:21 2004
From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby)
Date: Fri, 9 Jul 2004 14:23:21 +0100 (BST)
Subject: Developer 2.9.0 pre-release
Message-ID: <200407091323.i69DNL4S000083@bromine.hgmp.mrc.ac.uk>

EMBOSS 2.9.0 is scheduled to be released on the 15th July.

Primarily for GUI developers there is now a pre-release of 2.9.0
in the directory:

ftp://ftp.rfcgr.mrc.ac.uk/pub/EMBOSS/pre/

In the same directory are EMBASSY packages tailored for 2.9.0
(the ones in the directory above are incompatible).

Alan

PS: The real 2.9.0 will contain a few minor changes but, if your
GUI works for the above it will also work for the official release.
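
[Editorial sketch] The 'fasta' pattern file proposed in the USAs2 thread
above (a ">seq_id type" header line followed by a pattern line), together
with the point that a program usually accepts only one pattern type, might
be handled roughly as follows. This is plain C and not AJAX code: PatRecord,
pat_record_parse and pat_record_allowed are hypothetical names invented for
illustration, and the sketch assumes the pattern itself contains no
whitespace.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one named pattern (not an AJAX type). */
typedef struct {
    char name[64];     /* pattern name taken from the header line      */
    char type[16];     /* "regexpr", "prosite" or "matrix"             */
    char pattern[256]; /* the pattern text, assumed whitespace-free    */
} PatRecord;

/* Parse a header of the form ">name type" plus the pattern line below it.
   Returns 1 on success and 0 on malformed input. */
int pat_record_parse(const char *header, const char *body, PatRecord *out)
{
    if (header == NULL || body == NULL || out == NULL || header[0] != '>')
        return 0;
    /* Field widths keep sscanf inside the fixed-size buffers. */
    if (sscanf(header + 1, "%63s %15s", out->name, out->type) != 2)
        return 0;
    if (strcmp(out->type, "regexpr") != 0 &&
        strcmp(out->type, "prosite") != 0 &&
        strcmp(out->type, "matrix") != 0)
        return 0;
    if (sscanf(body, "%255s", out->pattern) != 1)
        return 0;
    assert(strlen(out->pattern) < sizeof out->pattern);
    return 1;
}

/* Stand-in for the "knowntype" idea: does this record have the one
   pattern type the calling program is prepared to accept? */
int pat_record_allowed(const PatRecord *rec, const char *wanted_type)
{
    return rec != NULL && wanted_type != NULL &&
           strcmp(rec->type, wanted_type) == 0;
}
```

A program that only understands prosite patterns would call
pat_record_parse on each header/pattern pair and skip every record for which
pat_record_allowed(&rec, "prosite") returns 0.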
From hegedus at biomembrane.hu Thu Jul 15 14:30:18 2004
From: hegedus at biomembrane.hu (Tamas Hegedus)
Date: Thu, 15 Jul 2004 20:30:18 +0200 (CEST)
Subject: ModBioSQL release 0.12
Message-ID:

Dear All, Dear Peter,

during my work I had to use RDBMS and EMBOSS.
I collected my scripts and experiments into a package called Modular BioSQL,
which has the following features:

-- Modular RDB realization of different biological databases allows
   fine-tuning with increased performance.
-- Storing result sets in RDBMS allows more accurate, more comfortable
   analysis using SQL.
-- User interaction with the RDBMS (installation, loading up and querying
   data) does not need programming skills.
-- Lightweight RDB interaction with analysis packages (only EMBOSS is
   implemented).
-- Optimized loading of flat files into the RDBMS.
-- Using 'fixed value arrays' (*_ref tables) results in both smaller data
   size (smaller than the flat file) and smaller index size, increasing the
   performance (theoretically both the uploading and querying performance).
-- Relatively easily extendable to implement and handle databases other than
   those currently realized.

You may think I suggest Modular BioSQL as a replacement for BioSQL.
I do not think so!

For details, please visit my web site, and send comments and suggestions:
http://www.biomembrane.hu/~hegedus/modbiosql/

Best regards,
Tamas

--
Tamas Hegedus, Research Fellow | phone: 480-301-6041
Mayo Clinic Scottsdale         | fax: 480-301-7017
13000 E. Shea Blvd             | mailto:hegedus.tamas at mayo.edu
Scottsdale, AZ, 85259          | http://www.biomembrane.hu/~hegedus

From raoul.bonnal at itb.cnr.it Mon Jul 19 06:06:52 2004
From: raoul.bonnal at itb.cnr.it (Raoul Jean Pierre Bonnal)
Date: Mon, 19 Jul 2004 12:06:52 +0200
Subject: Baeza-Yates,Perleberg search and Mismatch position
Message-ID: <1090231612.10983.17.camel@localhost.localdomain>

Hi,
when performing a pattern search that allows a number of mismatches, with the
method named in the subject, is it possible to identify the mismatch
positions in the returned patterns, or do I have to locate them in a second
step?

func: embPatBYPSearch
ref:  nucleos/embpat.c
      nucleos/embpat.h

How could embPatBYPSearch be modified to save the mismatch positions?

Thanks in advance.

--
Raoul Jean Pierre Bonnal
I.T.B. - C.N.R.
via Fratelli Cervi, 93
20090 Segrate -Mi-, Italy
Floor 7, Room 13
Tel. +390226422724
Fax. +390226422770
E-mail: raoul.bonnal at itb.cnr.it

From pmr at ebi.ac.uk Fri Jul 23 07:14:57 2004
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 23 Jul 2004 12:14:57 +0100
Subject: [EMBOSS] incorporating old code in 2.9.0
In-Reply-To: <5.2.1.1.1.20040723115152.02dd6770@udcf.gla.ac.uk>
References: <5.2.1.1.1.20040723083507.02dd7650@udcf.gla.ac.uk>
 <5.2.1.1.1.20040723083507.02dd7650@udcf.gla.ac.uk>
 <5.2.1.1.1.20040723115152.02dd6770@udcf.gla.ac.uk>
Message-ID: <4100F331.5080002@ebi.ac.uk>

Derek Gatherer wrote:

(see Derek's previous message to emboss at embnet.org for the problem -
copied to emboss-dev because developers will need to know the answer).

Solution: All variable declarations of the type:

    AjPStr astr, bstr;

must be split into single variables from EMBOSS 2.9.0:

    AjPStr astr;
    AjPStr bstr;

Explanation follows.

> Hi Peter
>
> Here is the full error set for one of the apps:
>
> compact.c: In function `main':
> compact.c:53: error: incompatible types in assignment
>
> and the code is attached.
>
> #include "emboss.h"
> int main (int argc, char **argv)
> {
>   AjPStr cseq, cseqo;
>   cseq = ajStrNew();
>   cseqo = ajStrNew();

Ah, all is now clear. I just kept the relevant lines included above. Note
that the cseq line is fine, the cseqo line is the one that gives the error.

The cause is the redefinition of AjPStr as a macro to make "const AjPStr"
work. Sorry, I forgot to stress this one in the release notes.

The problem is the line:

    AjPStr cseq, cseqo;

Because AjPStr is now a macro that is replaced by "const AjOStr*" the
definition of cseqo becomes:

    const AjOStr* cseq, cseqo;

This is a classic C problem - cseq is now an AjPStr, cseqo is only an AjOStr
(what an AjPStr points to).

The solution ...

    AjPStr cseq;
    AjPStr cseqo;

All AjP definitions now have to be one per line. Sorry - we worked very hard
to avoid this, but the compilers simply fail to put the const in the right
place otherwise so we have to live with the macro and this side effect.

This should solve your problems.

regards,

Peter Rice

From jrvalverde at cnb.uam.es Fri Jul 23 08:27:51 2004
From: jrvalverde at cnb.uam.es (José R. Valverde)
Date: Fri, 23 Jul 2004 14:27:51 +0200
Subject: [EMBOSS] incorporating old code in 2.9.0
In-Reply-To: <4100F331.5080002@ebi.ac.uk>
References: <5.2.1.1.1.20040723083507.02dd7650@udcf.gla.ac.uk>
 <5.2.1.1.1.20040723083507.02dd7650@udcf.gla.ac.uk>
 <5.2.1.1.1.20040723115152.02dd6770@udcf.gla.ac.uk>
 <4100F331.5080002@ebi.ac.uk>
Message-ID: <20040723142751.628764c8.jrvalverde@cnb.uam.es>

> Because AjPStr is now a macro that is replaced by "const AjOStr*" the
> definition of cseqo becomes:
>
> const AjOStr* cseq, cseqo;
>
> This is a classic C problem - cseq is now an AjPStr, cseqo is only an AjOStr
> (what an AjPStr points to).
>

Excuse me, but I've got a doubt regarding this. Wouldn't

    typedef const AjOStr * AjPStr;

fix this and allow for multiple declarations in the same line?

j
--
These opinions are mine and only mine. Hey man, I saw them first!

José R. Valverde

Artificial Intelligence is of no use when the Natural kind is lacking

From gbottu at ben.vub.ac.be Wed Jul 28 10:27:17 2004
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Wed, 28 Jul 2004 16:27:17 +0200
Subject: EMBOSS and the GenomeReviews databank
Message-ID: <20040728142717.GC25875@bigben.ulb.ac.be>

Dear developers,

I just noticed something that might interest you. At the EMBL-EBI they
have a GenomeReviews databank (with complete bacterial chromosomes or
plasmids in one entry EMBL files). They however decided to depart somewhat
from the EMBL format. When I run
seqret -feature grv:u00096_gr
I get a lot of error messages of type :
Warning: U00096_GR: Bad value 'AAC77270.1 {EMBL:U00096}' for tag '/protein_id'
Warning: bad /protein_id value 'AAC77271.1 {EMBL:U00096}'

Sincerely,
Guy Bottu

From rls at ebi.ac.uk Wed Jul 28 11:05:39 2004
From: rls at ebi.ac.uk (Rodrigo Lopez)
Date: Wed, 28 Jul 2004 16:05:39 +0100
Subject: EMBOSS and the GenomeReviews databank
In-Reply-To: <20040728142717.GC25875@bigben.ulb.ac.be>
Message-ID: <000801c474b4$58488720$c500a8c0@castafiore>

Yes, it is very unfortunate that the genome reviews data is
non-standard. I'm forwarding this to the head of that project. He may
have a comment regarding the evidence tags present/future.

R:)

> -----Original Message-----
> From: owner-emboss-dev at hgmp.mrc.ac.uk
> [mailto:owner-emboss-dev at hgmp.mrc.ac.uk] On Behalf Of Guy Bottu
> Sent: 28 July 2004 15:27
> To: emboss-dev at embnet.org
> Subject: EMBOSS and the GenomeReviews databank
>
>
> Dear developers,
>
> I just noticed something that might interest you. At the EMBL-EBI they
> have a GenomeReviews databank (with complete bacterial chromosomes or
> plasmids in one entry EMBL files). They however decided to depart somewhat
> from the EMBL format.
> When I run
> seqret -feature grv:u00096_gr
> I get a lot of error messages of type :
> Warning: U00096_GR: Bad value 'AAC77270.1 {EMBL:U00096}' for tag '/protein_id'
> Warning: bad /protein_id value 'AAC77271.1 {EMBL:U00096}'
>
> Sincerely,
> Guy Bottu
>

From rls at ebi.ac.uk Wed Jul 28 11:20:55 2004
From: rls at ebi.ac.uk (Rodrigo Lopez)
Date: Wed, 28 Jul 2004 16:20:55 +0100
Subject: EMBOSS and the GenomeReviews databank
In-Reply-To: <4107C2F6.5000609@ebi.ac.uk>
Message-ID: <001301c474b6$7a5817c0$c500a8c0@castafiore>

Hi Paul,

Many thanks for the reply. Let's see if Guy has further comments.

R:)

> -----Original Message-----
> From: Paul Kersey [mailto:pkersey at ebi.ac.uk]
> Sent: 28 July 2004 16:15
> To: rls at ebi.ac.uk
> Cc: 'Guy Bottu'; emboss-dev at embnet.org; genome_reviews at ebi.ac.uk
> Subject: Re: EMBOSS and the GenomeReviews databank
>
>
> Rodrigo Lopez wrote:
>
> >Yes, it is very unfortunate that the genome reviews data is
> >non-standard. I'm forwarding this to the head of that project. He may
> >have a comment regarding the evidence tags present/future.
> >
> >R:)
> >
> >>-----Original Message-----
> >>From: owner-emboss-dev at hgmp.mrc.ac.uk
> >>[mailto:owner-emboss-dev at hgmp.mrc.ac.uk] On Behalf Of Guy Bottu
> >>Sent: 28 July 2004 15:27
> >>To: emboss-dev at embnet.org
> >>Subject: EMBOSS and the GenomeReviews databank
> >>
> >>
> >> Dear developers,
> >>
> >>I just noticed something that might interest you. At the EMBL-EBI they
> >>have a GenomeReviews databank (with complete bacterial chromosomes or
> >>plasmids in one entry EMBL files). They however decided to depart somewhat
> >>from the EMBL format.
> >>When I run
> >>seqret -feature grv:u00096_gr
> >>I get a lot of error messages of type :
> >>Warning: U00096_GR: Bad value 'AAC77270.1 {EMBL:U00096}' for tag '/protein_id'
> >>Warning: bad /protein_id value 'AAC77271.1 {EMBL:U00096}'
> >>
> >> Sincerely,
> >> Guy Bottu
> >>
>
> Dear Guy
>
> the evidence tags convey extra information that some users are
> interested in. It was not possible to fit this information within the
> existing definition of EMBL format, hence it was necessary to introduce
> the tags.
>
> However, if you do not want to use the evidence tags, we also distribute
> a program that removes them from the Genome Reviews files.
>
> The following comes from the Genome Reviews user manual:
>
> For users who do not wish to filter information by source, a program is
> provided with this release to remove evidence tags from Genome Reviews
> files, resulting in the production of "normal" EMBL format files. This
> program is written in the Java programming language and will run on any
> platform on which a Java runtime environment has been installed. Such
> environments are available free of charge for many platforms (including
> Microsoft Windows, Mac OS and GNU/Linux) from either Sun Microsystems
> (URL: http://java.sun.com/j2se/) or your hardware vendor.
> The tag removal program itself is available:
>
>   * as source code (RemoveEvidenceTags.jar) from
>     ftp://ftp.ebi.ac.uk/pub/software/genome_reviews/java/uk
>
>   * as an executable jar file from
>     ftp://ftp.ebi.ac.uk/pub/software/genome_reviews/java/RemoveEvidenceTags.jar
>
> Documentation on the use of the tag removal program can be generated
> after download by one of the following commands (the first command is
> for use with the RemoveEvidenceTags.java source code; and the second
> command is for use with the RemoveEvidenceTags.jar file):
>
>   * javadoc -d destination-directory RemoveEvidenceTags.java
>
>     (where the destination-directory is the target directory, where
>     you would like the generated documentation to be placed)
>
>   * jar xf RemoveEvidenceTags.jar
>
>     (the generated documentation is placed in a directory called javaDoc)
>
> The procedure to run the tag removal program is also described below:
>
>   1. Compile the java class, using: javac RemoveEvidenceTags.java
>   2. Run the compiled code using, either:
>        java -cp . uk/ac/ebi/genomeReviews/RemoveEvidenceTags dir
>      or:
>        java -cp . uk/ac/ebi/genomeReviews/RemoveEvidenceTags dir file-name
>
> Alternatively the program can be run from the executable jar
> (RemoveEvidenceTags.jar) as follows:
>
>   1. java -jar RemoveEvidenceTags.jar dir
>      java -jar RemoveEvidenceTags.jar dir file-name
>
> where dir is the path to the directory where the Genome Reviews files
> are located, and file-name is the name of a Genome Reviews file
> contained in this directory. If only the single parameter (dir) is
> used, then the program will remove the evidence tags from ALL Genome
> Reviews files located in that directory. The dir should end with a
> closing file separator.
>
> ---
>
> Best wishes
>
> Paul
>
> --
> "He could consider civilisation, and see the world as a
> microcosm of the cell" - Joseph Heller
>
> ------------------------------------------------------------------
> Dr.
Paul Kersey > EMBL-European Bioinformatics Institute Tel: +44-(0)1223-494601 > Wellcome Trust Genome Campus, Hinxton Fax: +44-(0)1223-494468 > Cambridge CB10 1SD, UK email: pkersey at ebi.ac.uk > > From henrikki.almusa at helsinki.fi Thu Jul 1 14:08:06 2004 From: henrikki.almusa at helsinki.fi (Henrikki Almusa) Date: Thu, 1 Jul 2004 17:08:06 +0300 Subject: Patten lists ajax header, third draft In-Reply-To: <200406301632.40816.henrikki.almusa@helsinki.fi> References: <200406281120.54203.henrikki.almusa@helsinki.fi> <200406291343.12877.henrikki.almusa@helsinki.fi> <200406301632.40816.henrikki.almusa@helsinki.fi> Message-ID: <200407011708.06797.henrikki.almusa@helsinki.fi> Hello Heres the third version of the files 'ajpat.c' and 'ajpat.h'. Atm i have tested the regular expression handling and it seems to work now. I can add pattern to list, test it against a string and then clear the list. There is one compiler warning though (my fixing causes deleting to segfault). ajpat.c: In function `ajPatternDel': ajpat.c:53: warning: passing arg 1 of `ajRegFree' from incompatible pointer type The for testing was in dreg and was this: AjPPatlist plist; AjPPattern pat; AjPStr file; AjPStr tested; file=ajStrNewC("pattern.file"); tested=ajStrNewC("ggagagagagttct"); plist=ajPatlistNew(); ajPatlistParsePatternFile(plist,file,1); while (ajPatlistGetNext(plist,&pat)) { ajFmtPrint ("name: %S mismatch: %d\n",ajPatternGetName(pat),ajPatternGetMismatch(pat)); patexp = ajPatternGetCompiledPattern(pat); if (ajRegExec(patexp,tested)) ajFmtPrint (" found from '%d'\n",ajRegOffset(patexp)); } ajDebug ("Starting deleting\n"); ajPatlistDel(&plist); Now the main issues with this is still the prosite pattern handling. From my understanding it could be fixed by making prosite patterns use a struct to move the needed pieces around. That would be easy then to be used with this as well. Other point is the overloading of the acd functions. I don't yet know how to do that. 
However I would like some comments on whether this is a good way to do this (and could be accepted to emboss, when ready). Thanks, -- Henrikki Almusa -------------- next part -------------- A non-text attachment was scrubbed... Name: ajpat.c Type: text/x-csrc Size: 6542 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ajpat.h Type: text/x-chdr Size: 1964 bytes Desc: not available URL: From hegedus at biomembrane.hu Fri Jul 2 18:14:03 2004 From: hegedus at biomembrane.hu (Tamas Hegedus) Date: Fri, 2 Jul 2004 20:14:03 +0200 (CEST) Subject: USAs Message-ID: Dear All, I would like to know if it is possible to hack ajax to handle similar USAs listed below: - USA:kw=something, ft=sthelse. - USA:SELECT * FROM mytable WHERE.. I see you are working on pattern searches. It would be great to have the possibility to define patterns in the fuzzpro by USA: fuzzpro -pattern=USA:patt_name USA:seq I think the implementation of this would be useful. Return 'value' could be a 'fasta' pattern file: - seq_id type[regexpr|prosite|matrix] - pattern Or at the beginning going on the simplest way: The return value is the simple pattern. I would be satisfied only with this, too :-) Thank for your help, for your answers, Tamas From hegedus at biomembrane.hu Fri Jul 2 18:20:16 2004 From: hegedus at biomembrane.hu (Tamas Hegedus) Date: Fri, 2 Jul 2004 20:20:16 +0200 (CEST) Subject: USAs2 Message-ID: Sorry! I was inaccurate: see !!! ----------------------- Dear All, I would like to know if it is possible to hack ajax to handle similar USAs listed below and !!!HOW!!!: - USA:kw=something, ft=sthelse. - USA:SELECT * FROM mytable WHERE.. I see you are working on pattern searches. It would be great to have the possibility to define patterns in the fuzzpro by USA: fuzzpro -pattern=USA:patt_name USA:seq I think the implementation of this would be useful. Return 'value' could be a 'fasta' pattern file: !!! 
>seq_id type[regexpr|prosite|matrix] pattern !!! Or at the beginning going on the simplest way: The return value is the simple pattern. I would be satisfied only with this, too :-) Thank for your help, for your answers, Tamas -- Tam?s Heged?s, Research Associate | http://www.biomembrane.hu Membrane Research Group of | mailto:hegedus at biomembrane.hu Hungarian Academy of Sciences | tel: 36-1-3724317 H-1113 Budapest Dioszegi u 64, HUNGARY | fax: 36-1-3724353 From pmr at ebi.ac.uk Fri Jul 2 18:38:02 2004 From: pmr at ebi.ac.uk (Peter Rice) Date: Fri, 02 Jul 2004 19:38:02 +0100 Subject: USAs2 In-Reply-To: References: Message-ID: <40E5AB8A.7010502@ebi.ac.uk> Hi Tamas, Thanks for the suggestion! It is late on Friday, so I will give it some thought over the weekend. > I would like to know if it is possible to hack ajax to handle similar USAs > listed below and !!!HOW!!!: > - USA:kw=something, ft=sthelse. > - USA:SELECT * FROM mytable WHERE.. Yes, it is possible. But still a hack ... which means we have not yet implemented it. This is really an extended query language. I tried to define such extensions last year when I moved back to academia, but have not yet had time to implement anything. This is an excellent time to start defining extended USAs. My plan was: Start by thinking about the "SRS query language". You can search for various "fields": id (entry ID) acc (accession number) sv (sequence version ... and maybe GI number) des (description) key (keyword phrase) org (taxonomy) ... and a few more ... In SRS, you can use & (and), | (or) ! (but not) to combine search terms In SRS you can also use > and < to follow links to and from other databases. SRS has only one link between any pair of databases - I would rather like to use named links so we can choose which links to use. I would like to allow mulitple databases in the USA. There are some problems choosing a good syntax. I would also like to allow multiple fields - obviously id and acc, or combining text fields. 
Then, as you suggest, some SQL-like syntax would be nice. It looks complicated, but we can work in small steps. In all cases, we need to make this work with "EMBLCD" indexing, with reading flatfile data, and with any other indexing system. We can also try to make it work with SRS and SRSWWW (easy in some cases, hard in others) > I see you are working on pattern searches. > It would be great to have the possibility to define patterns in the > fuzzpro by USA: fuzzpro -pattern=USA:patt_name USA:seq > I think the implementation of this would be useful. > Return 'value' could be a 'fasta' pattern file: If I understand correctly, you want to define a file of named patterns, and select one using a "USA" syntax. This is not so simple ... because programs usually want only one type of pattern. However, in ACD we can give the pattern a "knowntype" attribute so EMBOSS (and any wrapper) knows what type of pattern is allowed. We can then use Henrikki Almusa's pattern list to define a file of patterns, and some pattern syntax to say which pattern(s) to use. We do have a problem - we need to make these pattern "USAs" different from simple patterns. We also need a name for pattern definitions. I am sure we can think of one. regards, Peter Rice From ableasby at hgmp.mrc.ac.uk Fri Jul 9 13:23:21 2004 From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby) Date: Fri, 9 Jul 2004 14:23:21 +0100 (BST) Subject: Developer 2.9.0 pre-release Message-ID: <200407091323.i69DNL4S000083@bromine.hgmp.mrc.ac.uk> EMBOSS 2.9.0 is scheduled to be released on the 15th July. Primarily for GUI developers there is now a pre-release of 2.9.0 in the directory: ftp://ftp.rfcgr.mrc.ac.uk/pub/EMBOSS/pre/ In the same directory are EMBASSY packages tailored for 2.9.0 (the ones in the directory above are incompatible). Alan PS: The real 2.9.0 will contain a few minor changes but, if your GUI works for the above it will also work for the official release. 
From hegedus at biomembrane.hu Thu Jul 15 18:30:18 2004 From: hegedus at biomembrane.hu (Tamas Hegedus) Date: Thu, 15 Jul 2004 20:30:18 +0200 (CEST) Subject: ModBioSQL release 0.12 Message-ID: Dear All, Dear Peter, during my work I had to use RDBMS and EMBOSS. I collected my scripts and experiments into a package called Modular BioSQL, which has different features: -- Modular RDB realization of different biological databases allows fine-tuning with increased performance. -- Storing result sets in RDBMS allows more accurate, more comfortable analysis using SQL. -- User interaction with the RDBMS (installation, loading up and querying data) does not need programming skills. -- Light weight RDB interaction with analysis packages (only EMBOSS is implemented). -- Optimalized loading of flat files into the RDBMS. -- Using 'fixed value arrays' (*_ref tables) results in both smaller data size (smaller than the flat file) and smaller index size increasing the performance (theoretically both the uploading and querying performance). -- Relatively easily extendable to implement and handle databases other than the currently realized. You may think I suggest Modular BioSQL as a replacement of BioSQL. I do not think so! For details, please visit my web site, and send comments and suggestions: http://www.biomembrane.hu/~hegedus/modbiosql/ Best regards, Tamas -- Tamas Hegedus, Research Fellow | phone: 480-301-6041 Mayo Clinic Scottsdale | fax: 480-301-7017 13000 E. 
Shea Blvd | mailto:hegedus.tamas at mayo.edu Scottsdale, AZ, 85259 | http://www.biomembrane.hu/~hegedus From raoul.bonnal at itb.cnr.it Mon Jul 19 10:06:52 2004 From: raoul.bonnal at itb.cnr.it (Raoul Jean Pierre Bonnal) Date: Mon, 19 Jul 2004 12:06:52 +0200 Subject: Baeza-Yates,Perleberg search and Mismatch position Message-ID: <1090231612.10983.17.camel@localhost.localdomain> Hi, performing a pattern search, allowing a number of mismatches, with the methond in subject, is it possible identify the mimstaches positions into the returned patterns or have I to locate them in a second step ? func embPatBYPSearch rif: nucleos/embpat.c nucleos/embpat.h How func embPatBYPSearch could be modified to save mismatch position ? tnx in advance. -- Raoul Jean Pierre Bonnal I.T.B. - C.N.R. via Fratelli Cervi, 93 20090 Segrate -Mi-, Italy Floor 7, Room 13 Tel. +390226422724 Fax. +390226422770 E-mail: raoul.bonnal at itb.cnr.it From pmr at ebi.ac.uk Fri Jul 23 11:14:57 2004 From: pmr at ebi.ac.uk (Peter Rice) Date: Fri, 23 Jul 2004 12:14:57 +0100 Subject: [EMBOSS] incorporating old code in 2.9.0 In-Reply-To: <5.2.1.1.1.20040723115152.02dd6770@udcf.gla.ac.uk> References: <5.2.1.1.1.20040723083507.02dd7650@udcf.gla.ac.uk> <5.2.1.1.1.20040723083507.02dd7650@udcf.gla.ac.uk> <5.2.1.1.1.20040723115152.02dd6770@udcf.gla.ac.uk> Message-ID: <4100F331.5080002@ebi.ac.uk> Derek Gatherer wrote: (see Derek's previous message to emboss at embnet.org for the problem - copied to emboss-dev because developers will need to know the answer). Solution: All variable declarations of the type: AjPStr astr, bstr; must be split into single variables from EMBOSS 2.9.0: AjPStr astr; AjPStr bstr; Explanation follows. > Hi Peter > > Here is the full error set for one of the apps: > > compact.c: In function `main': > compact.c:53: error: incompatible types in assignment > > and the code is attached. 
>
> #include "emboss.h"
> int main (int argc, char **argv)
> {
>     AjPStr cseq, cseqo;
>     cseq = ajStrNew();
>     cseqo = ajStrNew();

Ah, all is now clear. I have kept only the relevant lines above.

Note that the cseq line is fine; the cseqo line is the one that gives
the error.

The cause is the redefinition of AjPStr as a macro to make "const AjPStr"
work. Sorry, I forgot to stress this one in the release notes.

The problem is the line:

    AjPStr cseq, cseqo;

Because AjPStr is now a macro that is replaced by "const AjOStr*" the
definition of cseqo becomes:

    const AjOStr* cseq, cseqo;

This is a classic C problem - cseq is now an AjPStr, cseqo is only an AjOStr
(what an AjPStr points to).

The solution ...

    AjPStr cseq;
    AjPStr cseqo;

All AjP definitions now have to be one per line.

Sorry - we worked very hard to avoid this, but the compilers simply fail
to put the const in the right place otherwise so we have to live with the
macro and this side effect.

This should solve your problems.

regards,

Peter Rice

From jrvalverde at cnb.uam.es  Fri Jul 23 12:27:51 2004
From: jrvalverde at cnb.uam.es (José R. Valverde)
Date: Fri, 23 Jul 2004 14:27:51 +0200
Subject: [EMBOSS] incorporating old code in 2.9.0
In-Reply-To: <4100F331.5080002@ebi.ac.uk>
References: <5.2.1.1.1.20040723083507.02dd7650@udcf.gla.ac.uk>
    <5.2.1.1.1.20040723083507.02dd7650@udcf.gla.ac.uk>
    <5.2.1.1.1.20040723115152.02dd6770@udcf.gla.ac.uk>
    <4100F331.5080002@ebi.ac.uk>
Message-ID: <20040723142751.628764c8.jrvalverde@cnb.uam.es>

> Because AjPStr is now a macro that is replaced by "const AjOStr*" the
> definition of cseqo becomes:
>
> const AjOStr* cseq, cseqo;
>
> This is a classic C problem - cseq is now an AjPStr, cseqo is only an AjOStr
> (what an AjPStr points to).
>

Excuse me, but I have a question about this. Wouldn't

    typedef const AjOStr * AjPStr;

fix this and allow for multiple declarations in the same line?

    j
--
    These opinions are mine and only mine. Hey man, I saw them first!

    José R. Valverde

    Artificial Intelligence is of no use when the Natural kind is lacking

From gbottu at ben.vub.ac.be  Wed Jul 28 14:27:17 2004
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Wed, 28 Jul 2004 16:27:17 +0200
Subject: EMBOSS and the GenomeReviews databank
Message-ID: <20040728142717.GC25875@bigben.ulb.ac.be>

    Dear developers,

I just noticed something that might interest you. At the EMBL-EBI they
have a GenomeReviews databank (with complete bacterial chromosomes or
plasmids in one entry EMBL files). They however decided to depart somewhat
from the EMBL format. When I run
seqret -feature grv:u00096_gr
I get a lot of error messages of type :
Warning: U00096_GR: Bad value 'AAC77270.1 {EMBL:U00096}' for tag '/protein_id'
Warning: bad /protein_id value 'AAC77271.1 {EMBL:U00096}'

    Sincerely,
    Guy Bottu

From rls at ebi.ac.uk  Wed Jul 28 15:05:39 2004
From: rls at ebi.ac.uk (Rodrigo Lopez)
Date: Wed, 28 Jul 2004 16:05:39 +0100
Subject: EMBOSS and the GenomeReviews databank
In-Reply-To: <20040728142717.GC25875@bigben.ulb.ac.be>
Message-ID: <000801c474b4$58488720$c500a8c0@castafiore>

Yes, it is very unfortunate that the genome reviews data is
non-standard. I'm forwarding this to the head of that project. He may
have a comment regarding the evidence tags present/future.

R:)

> -----Original Message-----
> From: owner-emboss-dev at hgmp.mrc.ac.uk
> [mailto:owner-emboss-dev at hgmp.mrc.ac.uk] On Behalf Of Guy Bottu
> Sent: 28 July 2004 15:27
> To: emboss-dev at embnet.org
> Subject: EMBOSS and the GenomeReviews databank
>
>
> Dear developers,
>
> I just noticed something that might interest you. At the EMBL-EBI they
> have a GenomeReviews databank (with complete bacterial chromosomes or
> plasmids in one entry EMBL files). They however decided to depart somewhat
> from the EMBL format.
> When I run
> seqret -feature grv:u00096_gr
> I get a lot of error messages of type :
> Warning: U00096_GR: Bad value 'AAC77270.1 {EMBL:U00096}' for tag '/protein_id'
> Warning: bad /protein_id value 'AAC77271.1 {EMBL:U00096}'
>
> Sincerely,
> Guy Bottu
>

From rls at ebi.ac.uk  Wed Jul 28 15:20:55 2004
From: rls at ebi.ac.uk (Rodrigo Lopez)
Date: Wed, 28 Jul 2004 16:20:55 +0100
Subject: EMBOSS and the GenomeReviews databank
In-Reply-To: <4107C2F6.5000609@ebi.ac.uk>
Message-ID: <001301c474b6$7a5817c0$c500a8c0@castafiore>

Hi Paul,

Many thanks for the reply. Let's see if Guy has further comments.

R:)

> -----Original Message-----
> From: Paul Kersey [mailto:pkersey at ebi.ac.uk]
> Sent: 28 July 2004 16:15
> To: rls at ebi.ac.uk
> Cc: 'Guy Bottu'; emboss-dev at embnet.org; genome_reviews at ebi.ac.uk
> Subject: Re: EMBOSS and the GenomeReviews databank
>
>
> Rodrigo Lopez wrote:
>
> >Yes, it is very unfortunate that the genome reviews data is
> >non-standard. I'm forwarding this to the head of that project. He may
> >have a comment regarding the evidence tags present/future.
> >
> >R:)
> >
> >
> >>-----Original Message-----
> >>From: owner-emboss-dev at hgmp.mrc.ac.uk
> >>[mailto:owner-emboss-dev at hgmp.mrc.ac.uk] On Behalf Of Guy Bottu
> >>Sent: 28 July 2004 15:27
> >>To: emboss-dev at embnet.org
> >>Subject: EMBOSS and the GenomeReviews databank
> >>
> >>
> >> Dear developers,
> >>
> >>I just noticed something that might interest you. At the EMBL-EBI they
> >>have a GenomeReviews databank (with complete bacterial chromosomes or
> >>plasmids in one entry EMBL files). They however decided to depart somewhat
> >>from the EMBL format.
> >>When I run
> >>seqret -feature grv:u00096_gr
> >>I get a lot of error messages of type :
> >>Warning: U00096_GR: Bad value 'AAC77270.1 {EMBL:U00096}' for tag '/protein_id'
> >>Warning: bad /protein_id value 'AAC77271.1 {EMBL:U00096}'
> >>
> >> Sincerely,
> >> Guy Bottu
> >>
>
> Dear Guy
>
> The evidence tags convey extra information that some users are
> interested in. It was not possible to fit this information within the
> existing definition of EMBL format, hence it was necessary to introduce
> the tags.
>
> However, if you do not want to use the evidence tags, we also distribute
> a program that removes them from the Genome Reviews files.
>
> The following comes from the Genome Reviews user manual:
>
> For users who do not wish to filter information by source, a program is
> provided with this release to remove evidence tags from Genome Reviews
> files, resulting in the production of "normal" EMBL format files. This
> program is written in the Java programming language and will run on any
> platform on which a Java runtime environment has been installed. Such
> environments are available free of charge for many platforms (including
> Microsoft Windows, Mac OS and GNU/Linux) from either Sun Microsystems
> (URL: http://java.sun.com/j2se/) or your hardware vendor.
> The tag removal program itself is available:
>
>     * as source code (RemoveEvidenceTags.jar) from
>       ftp://ftp.ebi.ac.uk/pub/software/genome_reviews/java/uk
>
>     * as an executable jar file from
>       ftp://ftp.ebi.ac.uk/pub/software/genome_reviews/java/RemoveEvidenceTags.jar
>
> Documentation on the use of the tag removal program can be generated
> after download by one of the following commands (the first command is
> for use with the RemoveEvidenceTags.java source code; and the second
> command is for use with the RemoveEvidenceTags.jar file):
>
>     * javadoc -d destination-directory RemoveEvidenceTags.java
>
>       (where the destination-directory is the target directory, where
>       you would like the generated documentation to be placed)
>
>     * jar xf RemoveEvidenceTags.jar
>
>       (the generated documentation is placed in a directory called javaDoc)
>
> The procedure to run the tag removal program is also described below:
>
>    1. Compile the java class, using: javac RemoveEvidenceTags.java
>    2. Run the compiled code using, either:
>           java -cp . uk/ac/ebi/genomeReviews/RemoveEvidenceTags dir
>       or:
>           java -cp . uk/ac/ebi/genomeReviews/RemoveEvidenceTags dir file-name
>
> Alternatively the program can be run from the executable jar
> (RemoveEvidenceTags.jar) as follows:
>
>    1. java -jar RemoveEvidenceTags.jar dir
>       java -jar RemoveEvidenceTags.jar dir file-name
>
> where dir is the path to the directory where the Genome Reviews files
> are located, and file-name is the name of a Genome Reviews file
> contained in this directory. If only the single parameter (dir) is
> used, then the program will remove the evidence tags from ALL Genome
> Reviews files located in that directory. The dir should end with a
> closing file separator.
>
> ---
>
> Best wishes
>
> Paul
>
> --
> "He could consider civilisation, and see the world as a
> microcosm of the cell" - Joseph Heller
>
> ------------------------------------------------------------------
> Dr. Paul Kersey
> EMBL-European Bioinformatics Institute    Tel: +44-(0)1223-494601
> Wellcome Trust Genome Campus, Hinxton     Fax: +44-(0)1223-494468
> Cambridge CB10 1SD, UK                    email: pkersey at ebi.ac.uk
>
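
The warnings quoted in this thread come from evidence tags such as
" {EMBL:U00096}" appended to a qualifier value. As a rough illustration
of what removing such a tag involves, here is a minimal C sketch. It is
not the RemoveEvidenceTags program and not part of AJAX: the function
name strip_evidence_tag is hypothetical, and the sketch assumes the tag
is a single brace-delimited group at the very end of the value, as in
the two warnings above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not an AJAX function): remove one trailing
 * evidence tag of the form " {SOURCE:ID}" from a qualifier value,
 * editing the buffer in place.
 * Assumption: the tag, if present, is the last thing in the value.
 * Returns 1 if a tag was removed, 0 if the value was left unchanged.
 */
int strip_evidence_tag(char *value)
{
    size_t len;
    char *open;

    assert(value != NULL);

    len = strlen(value);
    if (len == 0 || value[len - 1] != '}')
        return 0;                       /* no trailing tag */

    open = strrchr(value, '{');
    if (open == NULL)
        return 0;                       /* unbalanced brace: leave as-is */

    /* Cut at the brace, then drop the whitespace that preceded it. */
    *open = '\0';
    while (open > value && isspace((unsigned char) open[-1]))
        *--open = '\0';

    return 1;
}
```

Applied to a writable copy of the value from the first warning,
"AAC77270.1 {EMBL:U00096}", the buffer is left holding "AAC77270.1".
A value without a trailing tag is returned unchanged.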