From henrikki.almusa at helsinki.fi  Thu Jul  1 10:08:06 2004
From: henrikki.almusa at helsinki.fi (Henrikki Almusa)
Date: Thu, 1 Jul 2004 17:08:06 +0300
Subject: Patten lists ajax header, third draft
In-Reply-To: <200406301632.40816.henrikki.almusa@helsinki.fi>
References: <200406281120.54203.henrikki.almusa@helsinki.fi> <200406291343.12877.henrikki.almusa@helsinki.fi> <200406301632.40816.henrikki.almusa@helsinki.fi>
Message-ID: <200407011708.06797.henrikki.almusa@helsinki.fi>

Hello

Here's the third version of the files 'ajpat.c' and 'ajpat.h'. At the moment I have 
tested the regular expression handling and it seems to work now. I can add a 
pattern to the list, test it against a string and then clear the list. There is 
one compiler warning though (my attempted fix causes deletion to segfault). 

ajpat.c: In function `ajPatternDel':
ajpat.c:53: warning: passing arg 1 of `ajRegFree' from incompatible pointer 
type

The test code was in dreg and was this:

AjPPatlist plist;
AjPPattern pat;
AjPRegexp patexp;   /* compiled regular expression of the current pattern */
AjPStr file;
AjPStr tested;

file = ajStrNewC("pattern.file");
tested = ajStrNewC("ggagagagagttct");
plist = ajPatlistNew();
ajPatlistParsePatternFile(plist, file, 1);

/* Walk the list and test each compiled pattern against the string */
while (ajPatlistGetNext(plist, &pat))
{
    ajFmtPrint("name: %S mismatch: %d\n",
               ajPatternGetName(pat), ajPatternGetMismatch(pat));
    patexp = ajPatternGetCompiledPattern(pat);
    if (ajRegExec(patexp, tested))
        ajFmtPrint("  found from '%d'\n", ajRegOffset(patexp));
}
ajDebug("Starting deleting\n");
ajPatlistDel(&plist);

The main issue with this is still the PROSITE pattern handling. From my 
understanding it could be fixed by making PROSITE patterns use a struct to 
move the needed pieces around. That would then be easy to use with this 
as well.

The other point is the overloading of the ACD functions. I don't yet know how to 
do that. However, I would like some comments on whether this is a good way to 
do this (and whether it could be accepted into EMBOSS when ready).

Thanks,
-- 
Henrikki Almusa
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ajpat.c
Type: text/x-csrc
Size: 6542 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/emboss-dev/attachments/20040701/7d0f80f2/attachment.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ajpat.h
Type: text/x-chdr
Size: 1964 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/emboss-dev/attachments/20040701/7d0f80f2/attachment-0001.bin 

From hegedus at biomembrane.hu  Fri Jul  2 14:14:03 2004
From: hegedus at biomembrane.hu (Tamas Hegedus)
Date: Fri, 2 Jul 2004 20:14:03 +0200 (CEST)
Subject: USAs
Message-ID: <Pine.LNX.4.44.0407022010430.22988-100000@viola.biomembrane.lo>

Dear All,

I would like to know if it is possible to hack ajax to handle USAs similar 
to those listed below:
- USA:kw=something, ft=sthelse.
- USA:SELECT * FROM mytable WHERE..

I see you are working on pattern searches.
It would be great to have the possibility to define patterns in 
fuzzpro by USA: fuzzpro -pattern=USA:patt_name USA:seq
I think the implementation of this would be useful.
Return 'value' could be a 'fasta' pattern file:
-	seq_id type[regexpr|prosite|matrix]
-	pattern

Or, to begin with, the simplest way: the return value is just the 
simple pattern. I would be satisfied with this alone, too :-)

Thanks for your help and for your answers,
Tamas


From hegedus at biomembrane.hu  Fri Jul  2 14:20:16 2004
From: hegedus at biomembrane.hu (Tamas Hegedus)
Date: Fri, 2 Jul 2004 20:20:16 +0200 (CEST)
Subject: USAs2
Message-ID: <Pine.LNX.4.44.0407022017370.23061-100000@viola.biomembrane.lo>

Sorry! I was inaccurate: see !!!
-----------------------
Dear All,

I would like to know if it is possible to hack ajax to handle USAs similar 
to those listed below and !!!HOW!!!:
- USA:kw=something, ft=sthelse.
- USA:SELECT * FROM mytable WHERE..

I see you are working on pattern searches.
It would be great to have the possibility to define patterns in 
fuzzpro by USA: fuzzpro -pattern=USA:patt_name USA:seq
I think the implementation of this would be useful.
Return 'value' could be a 'fasta' pattern file:
!!!
>seq_id type[regexpr|prosite|matrix]
pattern
!!!
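
Reading the header line of such a FASTA-like pattern entry could be sketched as
follows. This is only a sketch under assumptions: the names PatType, PatHeader
and pat_parse_header are hypothetical (they are not part of AJAX), and the
"type[regexpr|prosite|matrix]" notation is read as "the type is one of these
three words".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical types for illustration; not part of AJAX. */
typedef enum { PAT_UNKNOWN, PAT_REGEXPR, PAT_PROSITE, PAT_MATRIX } PatType;

typedef struct {
    char name[64];   /* pattern identifier taken from the header */
    PatType type;    /* which kind of pattern follows on the next line */
} PatHeader;

/* Parse a header such as ">zf1 prosite". Returns 1 on success, 0 otherwise. */
int pat_parse_header(const char *line, PatHeader *out)
{
    char type[16];

    if (line[0] != '>')
        return 0;
    if (sscanf(line + 1, "%63s %15s", out->name, type) != 2)
        return 0;

    if (strcmp(type, "regexpr") == 0)
        out->type = PAT_REGEXPR;
    else if (strcmp(type, "prosite") == 0)
        out->type = PAT_PROSITE;
    else if (strcmp(type, "matrix") == 0)
        out->type = PAT_MATRIX;
    else
        out->type = PAT_UNKNOWN;

    return out->type != PAT_UNKNOWN;
}
```

The line after the header would then be handed to whichever pattern compiler
matches the parsed type.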

Or, to begin with, the simplest way: the return value is just the 
simple pattern. I would be satisfied with this alone, too :-)

Thanks for your help and for your answers,
Tamas


-- 
Tamás Hegedűs, Research Associate      | http://www.biomembrane.hu
Membrane Research Group of             | mailto:hegedus at biomembrane.hu
Hungarian Academy of Sciences          | tel: 36-1-3724317
H-1113 Budapest Dioszegi u 64, HUNGARY | fax: 36-1-3724353


From pmr at ebi.ac.uk  Fri Jul  2 14:38:02 2004
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 02 Jul 2004 19:38:02 +0100
Subject: USAs2
In-Reply-To: <Pine.LNX.4.44.0407022017370.23061-100000@viola.biomembrane.lo>
References: <Pine.LNX.4.44.0407022017370.23061-100000@viola.biomembrane.lo>
Message-ID: <40E5AB8A.7010502@ebi.ac.uk>

Hi Tamas,

Thanks for the suggestion!

It is late on Friday, so I will give it some thought over the weekend.

> I would like to know if it is possible to hack ajax to handle similar USAs 
> listed below and !!!HOW!!!:
> - USA:kw=something, ft=sthelse.
> - USA:SELECT * FROM mytable WHERE..

Yes, it is possible. But still a hack ... which means we have not yet 
implemented it.

This is really an extended query language. I tried to define such 
extensions last year when I moved back to academia, but have not yet had 
time to implement anything.

This is an excellent time to start defining extended USAs.

My plan was:

Start by thinking about the "SRS query language". You can search for 
various "fields":

id (entry ID)
acc (accession number)
sv (sequence version ... and maybe GI number)
des (description)
key (keyword phrase)
org (taxonomy)
... and a few more ...

In SRS, you can use & (and), | (or) and ! (but not) to combine search terms.

In SRS you can also use > and < to follow links to and from other 
databases. SRS has only one link between any pair of databases - I would 
rather like to use named links so we can choose which links to use.

I would like to allow multiple databases in the USA. There are some 
problems choosing a good syntax.

I would also like to allow multiple fields - obviously id and acc, or 
combining text fields.

Then, as you suggest, some SQL-like syntax would be nice.

It looks complicated, but we can work in small steps.
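
One possible first small step is splitting a field query into its terms before
any index is consulted. The sketch below is not an AJAX implementation and makes
several assumptions: terms are written as field=value and separated by commas
(as in the "kw=something, ft=sthelse" example), and the names QueryTerm,
MAX_TERMS and query_split are hypothetical. The SRS operators (&, |, !) and
links (>, <) are not handled.

```c
#include <string.h>

#define MAX_TERMS 8

/* Hypothetical container for one "field=value" search term. */
typedef struct {
    char field[16];
    char value[64];
} QueryTerm;

/* Split "key=something, org=human" into terms.
   Returns the number of terms, or -1 on a malformed or oversized query. */
int query_split(const char *query, QueryTerm terms[], int max_terms)
{
    int n = 0;
    const char *p = query;

    while (*p) {
        const char *eq;
        const char *end;
        size_t flen;
        size_t vlen;

        while (*p == ' ' || *p == ',')      /* skip separators */
            p++;
        if (!*p)
            break;

        eq = strchr(p, '=');
        if (!eq || memchr(p, ',', (size_t)(eq - p)))
            return -1;                      /* term without '=' */

        end = strchr(eq, ',');
        if (!end)
            end = eq + strlen(eq);          /* last term runs to end of string */

        flen = (size_t)(eq - p);
        vlen = (size_t)(end - eq - 1);
        if (n >= max_terms || flen == 0 ||
            flen >= sizeof terms[n].field || vlen >= sizeof terms[n].value)
            return -1;

        memcpy(terms[n].field, p, flen);
        terms[n].field[flen] = '\0';
        memcpy(terms[n].value, eq + 1, vlen);
        terms[n].value[vlen] = '\0';
        n++;
        p = end;
    }
    return n;
}
```

Each resulting term could then be mapped onto whatever the access method
supports (an EMBLCD index lookup, a flatfile scan, or an SRS query).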

In all cases, we need to make this work with "EMBLCD" indexing, with 
reading flatfile data, and with any other indexing system. We can also 
try to make it work with SRS and SRSWWW (easy in some cases, hard in others).

> I see you are working on pattern searches.
> It would be great to have the possibility to define patterns in the 
> fuzzpro by USA: fuzzpro -pattern=USA:patt_name USA:seq
> I think the implementation of this would be useful.
> Return 'value' could be a 'fasta' pattern file:

If I understand correctly, you want to define a file of named patterns, 
and select one using a "USA" syntax.

This is not so simple ... because programs usually want only one type of 
pattern.

However, in ACD we can give the pattern a "knowntype" attribute so 
EMBOSS (and any wrapper) knows what type of pattern is allowed.

We can then use Henrikki Almusa's pattern list to define a file of 
patterns, and some pattern syntax to say which pattern(s) to use.

We do have a problem - we need to make these pattern "USAs" different 
from simple patterns. We also need a name for pattern definitions. I am 
sure we can think of one.

regards,

Peter Rice


From ableasby at hgmp.mrc.ac.uk  Fri Jul  9 09:23:21 2004
From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby)
Date: Fri, 9 Jul 2004 14:23:21 +0100 (BST)
Subject: Developer 2.9.0 pre-release
Message-ID: <200407091323.i69DNL4S000083@bromine.hgmp.mrc.ac.uk>

EMBOSS 2.9.0 is scheduled to be released on the 15th July.
Primarily for GUI developers there is now a pre-release
of 2.9.0 in the directory:

  ftp://ftp.rfcgr.mrc.ac.uk/pub/EMBOSS/pre/

In the same directory are EMBASSY packages tailored for 2.9.0
(the ones in the directory above are incompatible).

Alan

PS: The real 2.9.0 will contain a few minor changes but, if your
    GUI works for the above it will also work for the official
    release.


From hegedus at biomembrane.hu  Thu Jul 15 14:30:18 2004
From: hegedus at biomembrane.hu (Tamas Hegedus)
Date: Thu, 15 Jul 2004 20:30:18 +0200 (CEST)
Subject: ModBioSQL release 0.12
Message-ID: <Pine.LNX.4.44.0407152027310.12253-100000@viola.biomembrane.lo>

Dear All,
Dear Peter,

During my work I had to use an RDBMS and EMBOSS.

I collected my scripts and experiments into a package called Modular 
BioSQL, which has different features:
-- Modular RDB realization of different biological databases allows 
   fine-tuning with increased performance. 
-- Storing result sets in RDBMS allows more accurate, more comfortable 
   analysis using SQL. 

-- User interaction with the RDBMS (installation, loading up and querying 
   data) does not need programming skills. 
-- Light weight RDB interaction with analysis packages (only EMBOSS is 
   implemented). 
-- Optimized loading of flat files into the RDBMS. 
-- Using 'fixed value arrays' (*_ref tables) results in both smaller data 
   size (smaller than the flat file) and smaller index 
   size increasing the performance (theoretically both the uploading and 
   querying performance). 
-- Relatively easy to extend to implement and handle databases other 
   than those currently implemented.

You may think I am suggesting Modular BioSQL as a replacement for BioSQL. I am 
not! For details, please visit my web site, and send 
comments and suggestions:
http://www.biomembrane.hu/~hegedus/modbiosql/

Best regards,
Tamas

--
Tamas Hegedus, Research Fellow | phone: 480-301-6041
Mayo Clinic Scottsdale         | fax:   480-301-7017
13000 E. Shea Blvd             | mailto:hegedus.tamas at mayo.edu
Scottsdale, AZ, 85259          | http://www.biomembrane.hu/~hegedus


From raoul.bonnal at itb.cnr.it  Mon Jul 19 06:06:52 2004
From: raoul.bonnal at itb.cnr.it (Raoul Jean Pierre Bonnal)
Date: Mon, 19 Jul 2004 12:06:52 +0200
Subject: Baeza-Yates,Perleberg search and Mismatch position
Message-ID: <1090231612.10983.17.camel@localhost.localdomain>

Hi,
when performing a pattern search that allows a number of mismatches, with the
method in the subject line, is it possible to identify the mismatch positions
in the returned matches, or do I have to locate them in a second step?

Function: embPatBYPSearch
Ref: nucleus/embpat.c nucleus/embpat.h

How could embPatBYPSearch be modified to save the mismatch positions?
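
For the "second step" option, the mismatch positions can be recovered by
comparing the pattern with the matched window character by character. This is a
minimal sketch, not a change to embPatBYPSearch: the function name
mismatch_positions is hypothetical, and it assumes a plain pattern (no
ambiguity codes or wildcards) and a known start offset for each hit.

```c
#include <stddef.h>

/* Compare pattern[0..pat_len) with text[start..start+pat_len).
   Stores up to max_pos mismatch offsets (relative to the pattern) in pos[].
   Returns the total number of mismatches, or -1 if the window does not
   fit inside the text. */
int mismatch_positions(const char *text, size_t text_len, size_t start,
                       const char *pattern, size_t pat_len,
                       size_t pos[], size_t max_pos)
{
    size_t i;
    int n = 0;

    if (start > text_len || pat_len > text_len - start)
        return -1;

    for (i = 0; i < pat_len; i++) {
        if (text[start + i] != pattern[i]) {
            if ((size_t) n < max_pos)
                pos[n] = i;          /* remember where the mismatch is */
            n++;
        }
    }
    return n;
}
```

Calling this once per reported hit costs one extra pass over each match, which
is small compared with the search itself.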

Thanks in advance.

-- 
Raoul Jean Pierre Bonnal 
I.T.B. - C.N.R.
via Fratelli Cervi, 93
20090 Segrate -Mi-, Italy

Floor 7, Room 13
Tel. +390226422724
Fax. +390226422770
E-mail: raoul.bonnal at itb.cnr.it


From pmr at ebi.ac.uk  Fri Jul 23 07:14:57 2004
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 23 Jul 2004 12:14:57 +0100
Subject: [EMBOSS] incorporating old code in 2.9.0
In-Reply-To: <5.2.1.1.1.20040723115152.02dd6770@udcf.gla.ac.uk>
References: <5.2.1.1.1.20040723083507.02dd7650@udcf.gla.ac.uk> <5.2.1.1.1.20040723083507.02dd7650@udcf.gla.ac.uk> <5.2.1.1.1.20040723115152.02dd6770@udcf.gla.ac.uk>
Message-ID: <4100F331.5080002@ebi.ac.uk>

Derek Gatherer wrote:

(see Derek's previous message to emboss at embnet.org for the problem - copied to 
emboss-dev because developers will need to know the answer).

Solution: All variable declarations of the type:

AjPStr astr, bstr;

must be split into single variables from EMBOSS 2.9.0:

AjPStr astr;
AjPStr bstr;

Explanation follows.

> Hi Peter
> 
> Here is the full error set for one of the apps:
> 
> compact.c: In function `main':
> compact.c:53: error: incompatible types in assignment
> 
> and the code is attached.
> 
> #include "emboss.h"
> int main (int argc, char **argv)
> {
>   AjPStr cseq, cseqo;
>   cseq = ajStrNew();
>   cseqo = ajStrNew();

Ah, all is now clear. I just kept the relevant lines included above.

Note that the cseq line is fine, the cseqo line is the one that gives the error.

The cause is the redefinition of AjPStr as a macro to make "const AjPStr" 
work. Sorry, I forgot to stress this one in the release notes.

The problem is the line:

AjPStr cseq, cseqo;

Because AjPStr is now a macro that is replaced by "const AjOStr*" the 
definition of cseqo becomes:

const AjOStr* cseq, cseqo;

This is a classic C problem - cseq is now an AjPStr, cseqo is only an AjOStr 
(what an AjPStr points to).

The solution ...

AjPStr cseq;
AjPStr cseqo;

All AjP definitions have to now be one per line.

Sorry - we worked very hard to avoid this, but the compilers simply fail to 
put the const in the right place otherwise so we have to live with the macro 
and this side effect.
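
The side effect can be reproduced without EMBOSS. The sketch below uses
hypothetical stand-ins (DemoOStr, DemoPStr and the two demo functions are not
the real EMBOSS definitions) and only assumes that the pointer type is a macro
expanding to "const DemoOStr*", as described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real EMBOSS definitions. */
typedef struct DemoSStr { size_t len; char *ptr; } DemoOStr;

/* Macro form: plain text substitution before the compiler sees the types. */
#define DemoPStr const DemoOStr*

/* Two variables in one declaration: only the first becomes a pointer. */
size_t macro_declaration_demo(void)
{
    /* expands to: const DemoOStr* cseq = NULL, cseqo = { 3, NULL }; */
    DemoPStr cseq = NULL, cseqo = { 3, NULL };

    (void) cseq;
    assert(sizeof cseqo == sizeof(DemoOStr));   /* cseqo is a whole object */
    return cseqo.len;                           /* '.' access, not '->' */
}

/* One declaration per line: both variables are pointers. */
int split_declaration_demo(void)
{
    DemoPStr cseq = NULL;
    DemoPStr cseqo = NULL;

    return cseq == cseqo;
}
```

The struct initializer for cseqo only compiles because cseqo is an object
rather than a pointer, which is the same reason the assignment from a
pointer-returning function fails in the quoted code.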

This should solve your problems.

regards,

Peter Rice


From jrvalverde at cnb.uam.es  Fri Jul 23 08:27:51 2004
From: jrvalverde at cnb.uam.es (José R. Valverde)
Date: Fri, 23 Jul 2004 14:27:51 +0200
Subject: [EMBOSS] incorporating old code in 2.9.0
In-Reply-To: <4100F331.5080002@ebi.ac.uk>
References: <5.2.1.1.1.20040723083507.02dd7650@udcf.gla.ac.uk>
	<5.2.1.1.1.20040723083507.02dd7650@udcf.gla.ac.uk>
	<5.2.1.1.1.20040723115152.02dd6770@udcf.gla.ac.uk>
	<4100F331.5080002@ebi.ac.uk>
Message-ID: <20040723142751.628764c8.jrvalverde@cnb.uam.es>


> Because AjPStr is now a macro that is replaced by "const AjOStr*" the 
> definition of cseqo becomes:
> 
> const AjOStr* cseq, cseqo;
> 
> This is a classic C problem - cseq is now an AjPStr, cseqo is only an AjOStr 
> (what an AjPStr points to).
> 
Excuse me, but I have a question about this. Wouldn't

typedef const AjOStr * AjPStr;

fix this and allow for multiple declarations in the same line?
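
What the typedef form does can be checked with a small sketch. The names below
(DemoOStr, DemoPCStr, DemoPStr and both functions) are hypothetical stand-ins,
not EMBOSS code. The second function shows a standard C behaviour that may be
related to the "const in the right place" remark in the thread; that link is an
assumption, not something stated there.

```c
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real EMBOSS definitions. */
typedef struct DemoSStr { size_t len; char *ptr; } DemoOStr;
typedef const DemoOStr *DemoPCStr;  /* the typedef form asked about above */
typedef DemoOStr *DemoPStr;         /* plain pointer typedef, for comparison */

/* With a typedef, every variable in a multiple declaration is a pointer. */
size_t typedef_multi_demo(void)
{
    DemoOStr x = { 1, NULL };
    DemoOStr y = { 2, NULL };
    DemoPCStr p = &x, q = &y;

    return p->len + q->len;
}

/* "const" in front of a pointer typedef makes the pointer itself const,
   not the object it points to, so the write below is accepted. */
size_t const_typedef_demo(void)
{
    DemoOStr x = { 1, NULL };
    const DemoPStr r = &x;

    r->len = 5;
    return x.len;
}
```

So a typedef does allow several declarations on one line, but "const DemoPStr"
then protects the pointer rather than the string it points to.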

					j

-- 
	These opinions are mine and only mine. Hey man, I saw them first!

			    José R. Valverde

	De nada sirve la Inteligencia Artificial cuando falta la Natural


From gbottu at ben.vub.ac.be  Wed Jul 28 10:27:17 2004
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Wed, 28 Jul 2004 16:27:17 +0200
Subject: EMBOSS and the GenomeReviews databank
Message-ID: <20040728142717.GC25875@bigben.ulb.ac.be>

	Dear developers,

I just noticed something that might interest you. At the EMBL-EBI they 
have a GenomeReviews databank (with complete bacterial chromosomes or 
plasmids as single-entry EMBL files). However, they decided to depart somewhat 
from the EMBL format. When I run 
seqret -feature grv:u00096_gr
I get a lot of warning messages of this type:
Warning: U00096_GR: Bad value 'AAC77270.1 {EMBL:U00096}' for tag '/protein_id'
Warning: bad /protein_id value 'AAC77271.1 {EMBL:U00096}'
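
A tolerant reader could drop the trailing "{...}" part before validating the
qualifier value. The sketch below is only an illustration: strip_evidence_tag
is a hypothetical helper, not part of the EMBOSS feature parser, and it assumes
the extra text is always a trailing brace group as in the warnings above.

```c
#include <string.h>

/* Remove a trailing " {...}" group from a qualifier value, in place.
   Returns 1 if something was removed, 0 if the value had no brace group. */
int strip_evidence_tag(char *value)
{
    char *brace = strchr(value, '{');
    size_t len;

    if (!brace)
        return 0;

    *brace = '\0';                       /* cut at the opening brace */
    len = strlen(value);
    while (len > 0 && value[len - 1] == ' ')
        value[--len] = '\0';             /* trim spaces left before it */
    return 1;
}
```

Applied to 'AAC77270.1 {EMBL:U00096}' this leaves 'AAC77270.1'.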

	Sincerely,
	Guy Bottu


From rls at ebi.ac.uk  Wed Jul 28 11:05:39 2004
From: rls at ebi.ac.uk (Rodrigo Lopez)
Date: Wed, 28 Jul 2004 16:05:39 +0100
Subject: EMBOSS and the GenomeReviews databank
In-Reply-To: <20040728142717.GC25875@bigben.ulb.ac.be>
Message-ID: <000801c474b4$58488720$c500a8c0@castafiore>

Yes, it is very unfortunate that the genome reviews data is non-standard.
I'm forwarding this to the head of that project. He may have a comment
regarding the evidence tags present/future.

R:)


> -----Original Message-----
> From: owner-emboss-dev at hgmp.mrc.ac.uk 
> [mailto:owner-emboss-dev at hgmp.mrc.ac.uk] On Behalf Of Guy Bottu
> Sent: 28 July 2004 15:27
> To: emboss-dev at embnet.org
> Subject: EMBOSS and the GenomeReviews databank
> 
> 
> 	Dear developers,
> 
> I just noticed something that might interest you. At the 
> EMBL-EBI they 
> have a GenomeReviews databank (with complete bacterial chromosomes or 
> plasmids in one entry EMBL files). They however decided to 
> depart somewhat 
> from the EMBL format. When I run 
> seqret -feature grv:u00096_gr
> I get a lot of error messages of type :
> Warning: U00096_GR: Bad value 'AAC77270.1 {EMBL:U00096}' for 
> tag '/protein_id'
> Warning: bad /protein_id value 'AAC77271.1 {EMBL:U00096}'
> 
> 	Sincerely,
> 	Guy Bottu
> 


From rls at ebi.ac.uk  Wed Jul 28 11:20:55 2004
From: rls at ebi.ac.uk (Rodrigo Lopez)
Date: Wed, 28 Jul 2004 16:20:55 +0100
Subject: EMBOSS and the GenomeReviews databank
In-Reply-To: <4107C2F6.5000609@ebi.ac.uk>
Message-ID: <001301c474b6$7a5817c0$c500a8c0@castafiore>

Hi Paul,

Many thanks for the reply. Let's see if Guy has further comments.

R:)


> -----Original Message-----
> From: Paul Kersey [mailto:pkersey at ebi.ac.uk] 
> Sent: 28 July 2004 16:15
> To: rls at ebi.ac.uk
> Cc: 'Guy Bottu'; emboss-dev at embnet.org; genome_reviews at ebi.ac.uk
> Subject: Re: EMBOSS and the GenomeReviews databank
> 
> 
> Rodrigo Lopez wrote:
> 
> >Yes, it is very unfortunate that the genome reviews data is 
> >non-standard. I'm forwarding this to the head of that 
> project. He may 
> >have a comment regarding the evidence tags present/future.
> >
> >R:)
> >
> >
> >  
> >
> >>-----Original Message-----
> >>From: owner-emboss-dev at hgmp.mrc.ac.uk
> >>[mailto:owner-emboss-dev at hgmp.mrc.ac.uk] On Behalf Of Guy Bottu
> >>Sent: 28 July 2004 15:27
> >>To: emboss-dev at embnet.org
> >>Subject: EMBOSS and the GenomeReviews databank
> >>
> >>
> >>	Dear developers,
> >>
> >>I just noticed something that might interest you. At the
> >>EMBL-EBI they 
> >>have a GenomeReviews databank (with complete bacterial 
> chromosomes or 
> >>plasmids in one entry EMBL files). They however decided to 
> >>depart somewhat 
> >>from the EMBL format. When I run 
> >>seqret -feature grv:u00096_gr
> >>I get a lot of error messages of type :
> >>Warning: U00096_GR: Bad value 'AAC77270.1 {EMBL:U00096}' for 
> >>tag '/protein_id'
> >>Warning: bad /protein_id value 'AAC77271.1 {EMBL:U00096}'
> >>
> >>	Sincerely,
> >>	Guy Bottu
> >>
> >>    
> >>
> >
> >  
> >
> Dear Guy
> 
> The evidence tags convey extra information that some users are 
> interested in.  It was not possible to fit this information within the 
> existing definition of EMBL format, hence it was necessary to introduce 
> the tags.
> 
> However, if you do not want to use the evidence tags, we also distribute 
> a program that removes them from the Genome Reviews files.
> 
> The following comes from the Genome Reviews user manual:
> 
> For users who do not wish to filter information by source, a program is 
> provided with this release to remove evidence tags from Genome Reviews 
> files, resulting in the production of "normal" EMBL format files. This 
> program is written in the Java programming language and will run on any 
> platform on which a Java runtime environment has been installed. Such 
> environments are available free of charge for many platforms (including 
> Microsoft Windows, Mac OS and GNU/Linux) from either Sun Microsystems 
> (URL: http://java.sun.com/j2se/) or your hardware vendor. The tag removal 
> program itself is available:
> 
>     * as source code (RemoveEvidenceTags.jar) from
>       ftp://ftp.ebi.ac.uk/pub/software/genome_reviews/java/uk
>       <ftp://ftp.ebi.ac.uk/pub/software/uk>
>     * as an executable jar file from
>       ftp://ftp.ebi.ac.uk/pub/software/genome_reviews/java/RemoveEvidenceTags.jar
>       <ftp://ftp.ebi.ac.uk/pub/software/genome_reviews/RemoveEvidenceTags.jar>
> 
> 
> Documentation on the use of the tag removal program can be generated 
> after download by one of the following commands (the first command is 
> for use with the RemoveEvidenceTags.java source code; and the second 
> command is for use with the RemoveEvidenceTags.jar file):
> 
>     * javadoc -d destination-directory RemoveEvidenceTags.java
> 
>       (where the destination-directory is the target directory, where
>       you would like the generated documentation to be placed)
> 
>     * jar xf RemoveEvidenceTags.jar
> 
>       (the generated documentation is placed in a directory called javaDoc)
> 
> The procedure to run the tag removal program is also described below:
> 
>    1. Compile the java class, using: javac RemoveEvidenceTags.java
>    2. Run the compiled code using, either:
>       java -cp . uk/ac/ebi/genomeReviews/RemoveEvidenceTags dir
>       or:
>       java -cp . uk/ac/ebi/genomeReviews/RemoveEvidenceTags dir file-name
> 
> 
> Alternatively the program can be run from the executable jar 
> (RemoveEvidenceTags.jar) as follows:
> 
>    1. java -jar RemoveEvidenceTags.jar dir
>       java -jar RemoveEvidenceTags.jar dir file-name
> 
> 
> where dir is the path to the directory where the Genome Reviews files 
> are located, and file-name is the name of a Genome Reviews file 
> contained in this directory. If only the single parameter (dir) is 
> used, then the program will remove the evidence tags from ALL Genome 
> Reviews files located in that directory. The dir should end with a 
> closing file separator.
> 
> ---
> 
> Best wishes
> 
> Paul
> 
> -- 
> "He could consider civilisation, and see the world as a 
> microcosm of the cell" - Joseph Heller
> 
> ------------------------------------------------------------------
> Dr. Paul Kersey
> EMBL-European Bioinformatics Institute    Tel: +44-(0)1223-494601
> Wellcome Trust Genome Campus, Hinxton     Fax: +44-(0)1223-494468
> Cambridge CB10 1SD, UK                    email: pkersey at ebi.ac.uk
> 
> 


From henrikki.almusa at helsinki.fi  Thu Jul  1 14:08:06 2004
From: henrikki.almusa at helsinki.fi (Henrikki Almusa)
Date: Thu, 1 Jul 2004 17:08:06 +0300
Subject: Patten lists ajax header, third draft
In-Reply-To: <200406301632.40816.henrikki.almusa@helsinki.fi>
References: <200406281120.54203.henrikki.almusa@helsinki.fi> <200406291343.12877.henrikki.almusa@helsinki.fi> <200406301632.40816.henrikki.almusa@helsinki.fi>
Message-ID: <200407011708.06797.henrikki.almusa@helsinki.fi>

Hello

Heres the third version of the files 'ajpat.c' and 'ajpat.h'. Atm i have 
tested the regular expression handling and it seems to work now. I can add 
pattern to list, test it against a string and then clear the list. There is 
one compiler warning though (my fixing causes deleting to segfault). 

ajpat.c: In function `ajPatternDel':
ajpat.c:53: warning: passing arg 1 of `ajRegFree' from incompatible pointer 
type

The for testing was in dreg and was this:

AjPPatlist plist;
AjPPattern pat;
AjPStr file;
AjPStr tested;

file=ajStrNewC("pattern.file");
tested=ajStrNewC("ggagagagagttct");
plist=ajPatlistNew();
ajPatlistParsePatternFile(plist,file,1);
while (ajPatlistGetNext(plist,&pat))
{
  ajFmtPrint ("name: %S mismatch: 
%d\n",ajPatternGetName(pat),ajPatternGetMismatch(pat));
 patexp = ajPatternGetCompiledPattern(pat);
 if (ajRegExec(patexp,tested))
   ajFmtPrint ("  found from '%d'\n",ajRegOffset(patexp));
}
ajDebug ("Starting deleting\n");
ajPatlistDel(&plist);

Now the main issues with this is still the prosite pattern handling. From my 
understanding it could be fixed by making prosite patterns use a struct to 
move the needed pieces around. That would be easy then to be used with this 
as well.

Other point is the overloading of the acd functions. I don't yet know how to 
do that. However I would like some comments on whether this is a good way to 
do this (and could be accepted to emboss, when ready).

Thanks,
-- 
Henrikki Almusa
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ajpat.c
Type: text/x-csrc
Size: 6542 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/emboss-dev/attachments/20040701/7d0f80f2/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ajpat.h
Type: text/x-chdr
Size: 1964 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/emboss-dev/attachments/20040701/7d0f80f2/attachment-0003.bin>

From hegedus at biomembrane.hu  Fri Jul  2 18:14:03 2004
From: hegedus at biomembrane.hu (Tamas Hegedus)
Date: Fri, 2 Jul 2004 20:14:03 +0200 (CEST)
Subject: USAs
Message-ID: <Pine.LNX.4.44.0407022010430.22988-100000@viola.biomembrane.lo>

Dear All,

I would like to know if it is possible to hack ajax to handle similar USAs 
listed below:
- USA:kw=something, ft=sthelse.
- USA:SELECT * FROM mytable WHERE..

I see you are working on pattern searches.
It would be great to have the possibility to define patterns in the 
fuzzpro by USA: fuzzpro -pattern=USA:patt_name USA:seq
I think the implementation of this would be useful.
Return 'value' could be a 'fasta' pattern file:
-	seq_id type[regexpr|prosite|matrix]
-	pattern

Or at the beginning going on the simplest way: The return value is the 
simple pattern. I would be satisfied only with this, too :-)

Thank for your help, for your answers,
Tamas


From hegedus at biomembrane.hu  Fri Jul  2 18:20:16 2004
From: hegedus at biomembrane.hu (Tamas Hegedus)
Date: Fri, 2 Jul 2004 20:20:16 +0200 (CEST)
Subject: USAs2
Message-ID: <Pine.LNX.4.44.0407022017370.23061-100000@viola.biomembrane.lo>

Sorry! I was inaccurate: see !!!
-----------------------
Dear All,

I would like to know if it is possible to hack ajax to handle similar USAs 
listed below and !!!HOW!!!:
- USA:kw=something, ft=sthelse.
- USA:SELECT * FROM mytable WHERE..

I see you are working on pattern searches.
It would be great to have the possibility to define patterns in the 
fuzzpro by USA: fuzzpro -pattern=USA:patt_name USA:seq
I think the implementation of this would be useful.
Return 'value' could be a 'fasta' pattern file:
!!!
>seq_id type[regexpr|prosite|matrix]
pattern
!!!

Or at the beginning going on the simplest way: The return value is the 
simple pattern. I would be satisfied only with this, too :-)

Thank for your help, for your answers,
Tamas


-- 
Tam?s Heged?s, Research Associate      | http://www.biomembrane.hu
Membrane Research Group of             | mailto:hegedus at biomembrane.hu
Hungarian Academy of Sciences          | tel: 36-1-3724317
H-1113 Budapest Dioszegi u 64, HUNGARY | fax: 36-1-3724353


From pmr at ebi.ac.uk  Fri Jul  2 18:38:02 2004
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 02 Jul 2004 19:38:02 +0100
Subject: USAs2
In-Reply-To: <Pine.LNX.4.44.0407022017370.23061-100000@viola.biomembrane.lo>
References: <Pine.LNX.4.44.0407022017370.23061-100000@viola.biomembrane.lo>
Message-ID: <40E5AB8A.7010502@ebi.ac.uk>

Hi Tamas,

Thanks for the suggestion!

It is late on Friday, so I will give it some thought over the weekend.

> I would like to know if it is possible to hack ajax to handle similar USAs 
> listed below and !!!HOW!!!:
> - USA:kw=something, ft=sthelse.
> - USA:SELECT * FROM mytable WHERE..

Yes, it is possible. But still a hack ... which means we have not yet 
implemented it.

This is really an extended query language. I tried to define such 
extensions last year when I moved back to academia, but have not yet had 
time to implement anything.

This is an excellent time to start defining extended USAs.

My plan was:

Start by thinking about the "SRS query language". You can search for 
various "fields":

id (entry ID)
acc (accession number)
sv (sequence version ... and maybe GI number)
des (description)
key (keyword phrase)
org (taxonomy)
... and a few more ...

In SRS, you can use & (and), | (or) ! (but not) to combine search terms

In SRS you can also use > and < to follow links to and from other 
databases. SRS has only one link between any pair of databases - I would 
rather like to use named links so we can choose which links to use.

I would like to allow mulitple databases in the USA. There are some 
problems choosing a good syntax.

I would also like to allow multiple fields - obviously id and acc, or 
combining text fields.

Then, as you suggest, some SQL-like syntax would be nice.

It looks complicated, but we can work in small steps.

In all cases, we need to make this work with "EMBLCD" indexing, with 
reading flatfile data, and with any other indexing system. We can also 
try to make it work with SRS and SRSWWW (easy in some cases, hard in others)

> I see you are working on pattern searches.
> It would be great to have the possibility to define patterns in the 
> fuzzpro by USA: fuzzpro -pattern=USA:patt_name USA:seq
> I think the implementation of this would be useful.
> Return 'value' could be a 'fasta' pattern file:

If I understand correctly, you want to define a file of named patterns, 
and select one using a "USA" syntax.

This is not so simple ... because programs usually want only one type of 
pattern.

However, in ACD we can give the pattern a "knowntype" attribute so 
EMBOSS (and any wrapper) knows what type of pattern is allowed.

We can then use Henrikki Almusa's pattern list to define a file of 
patterns, and some pattern syntax to say which pattern(s) to use.

We do have a problem - we need to make these pattern "USAs" different 
from simple patterns. We also need a name for pattern definitions. I am 
sure we can think of one.

regards,

Peter Rice


From ableasby at hgmp.mrc.ac.uk  Fri Jul  9 13:23:21 2004
From: ableasby at hgmp.mrc.ac.uk (Alan Bleasby)
Date: Fri, 9 Jul 2004 14:23:21 +0100 (BST)
Subject: Developer 2.9.0 pre-release
Message-ID: <200407091323.i69DNL4S000083@bromine.hgmp.mrc.ac.uk>

EMBOSS 2.9.0 is scheduled to be released on the 15th July.
Primarily for GUI developers there is now a pre-release
of 2.9.0 in the directory:

  ftp://ftp.rfcgr.mrc.ac.uk/pub/EMBOSS/pre/

In the same directory are EMBASSY packages tailored for 2.9.0
(the ones in the directory above are incompatible).

Alan

PS: The real 2.9.0 will contain a few minor changes but, if your
    GUI works for the above it will also work for the official
    release.


From hegedus at biomembrane.hu  Thu Jul 15 18:30:18 2004
From: hegedus at biomembrane.hu (Tamas Hegedus)
Date: Thu, 15 Jul 2004 20:30:18 +0200 (CEST)
Subject: ModBioSQL release 0.12
Message-ID: <Pine.LNX.4.44.0407152027310.12253-100000@viola.biomembrane.lo>

Dear All,
Dear Peter,

during my work I had to use RDBMS and EMBOSS.

I collected my scripts and experiments into a package called Modular 
BioSQL, which has different features:
-- Modular RDB realization of different biological databases allows 
   fine-tuning with increased performance. 
-- Storing result sets in RDBMS allows more accurate, more comfortable 
   analysis using SQL. 

-- User interaction with the RDBMS (installation, loading up and querying 
   data) does not need programming skills. 
-- Light weight RDB interaction with analysis packages (only EMBOSS is 
   implemented). 
-- Optimalized loading of flat files into the RDBMS. 
-- Using 'fixed value arrays' (*_ref tables) results in both smaller data 
   size (smaller than the flat file) and smaller index 
   size increasing the performance (theoretically both the uploading and 
   querying performance). 
-- Relatively easily extendable to implement and handle databases other 
   than the currently realized.

You may think I suggest Modular BioSQL as a replacement of BioSQL. I do 
not think so! For details, please visit my web site, and send 
comments and suggestions:
http://www.biomembrane.hu/~hegedus/modbiosql/

Best regards,
Tamas

--
Tamas Hegedus, Research Fellow | phone: 480-301-6041
Mayo Clinic Scottsdale         | fax:   480-301-7017
13000 E. Shea Blvd             | mailto:hegedus.tamas at mayo.edu
Scottsdale, AZ, 85259          | http://www.biomembrane.hu/~hegedus


From raoul.bonnal at itb.cnr.it  Mon Jul 19 10:06:52 2004
From: raoul.bonnal at itb.cnr.it (Raoul Jean Pierre Bonnal)
Date: Mon, 19 Jul 2004 12:06:52 +0200
Subject: Baeza-Yates,Perleberg search and Mismatch position
Message-ID: <1090231612.10983.17.camel@localhost.localdomain>

Hi,
performing a pattern search, allowing a number of mismatches, with the
methond in subject, is it possible identify the mimstaches positions
into the returned patterns or have I to locate them in a second step ?

func embPatBYPSearch
rif: nucleos/embpat.c nucleos/embpat.h

How func embPatBYPSearch could be modified to save mismatch position ?

tnx in advance.

-- 
Raoul Jean Pierre Bonnal 
I.T.B. - C.N.R.
via Fratelli Cervi, 93
20090 Segrate -Mi-, Italy

Floor 7, Room 13
Tel. +390226422724
Fax. +390226422770
E-mail: raoul.bonnal at itb.cnr.it


From pmr at ebi.ac.uk  Fri Jul 23 11:14:57 2004
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 23 Jul 2004 12:14:57 +0100
Subject: [EMBOSS] incorporating old code in 2.9.0
In-Reply-To: <5.2.1.1.1.20040723115152.02dd6770@udcf.gla.ac.uk>
References: <5.2.1.1.1.20040723083507.02dd7650@udcf.gla.ac.uk> <5.2.1.1.1.20040723083507.02dd7650@udcf.gla.ac.uk> <5.2.1.1.1.20040723115152.02dd6770@udcf.gla.ac.uk>
Message-ID: <4100F331.5080002@ebi.ac.uk>

Derek Gatherer wrote:

(see Derek's previous message to emboss at embnet.org for the problem - copied to 
emboss-dev because developers will need to know the answer).

Solution: All variable declarations of the type:

AjPStr astr, bstr;

must be split into one declaration per variable from EMBOSS 2.9.0 onwards:

AjPStr astr;
AjPStr bstr;

Explanation follows.

> Hi Peter
> 
> Here is the full error set for one of the apps:
> 
> compact.c: In function `main':
> compact.c:53: error: incompatible types in assignment
> 
> and the code is attached.
> 
> #include "emboss.h"
> int main (int argc, char **argv)
> {
>   AjPStr cseq, cseqo;
>   cseq = ajStrNew();
>   cseqo = ajStrNew();

Ah, all is now clear. I just kept the relevant lines included above.

Note that the cseq line is fine; the cseqo line is the one that gives the error.

The cause is the redefinition of AjPStr as a macro to make "const AjPStr" 
work. Sorry, I forgot to stress this one in the release notes.

The problem is the line:

AjPStr cseq, cseqo;

Because AjPStr is now a macro that is replaced by "const AjOStr*" the 
definition of cseqo becomes:

const AjOStr* cseq, cseqo;

This is a classic C problem - cseq is now an AjPStr, cseqo is only an AjOStr 
(what an AjPStr points to).

The solution ...

AjPStr cseq;
AjPStr cseqo;

All AjP definitions have to now be one per line.
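
The effect can be reproduced outside EMBOSS with a small sketch. The names
DemoOStr and DemoPStr below are hypothetical stand-ins, not the real AJAX
definitions; the macro simply mirrors the replacement text quoted above.

```c
#include <stddef.h>

/* Hypothetical stand-in for the AJAX string object (not the real AjOStr). */
typedef struct DemoOStr {
    char text[64];
    size_t len;
} DemoOStr;

/* Macro form: pure text substitution, modelled on the message above. */
#define DemoPStr const DemoOStr*

/* Returns 1 if the second variable of a combined declaration is an
 * object rather than a pointer. */
int second_is_object(void)
{
    /* Expands to: const DemoOStr* first = NULL, second = {"", 0};
     * The '*' binds to 'first' only, so 'second' is a whole struct. */
    DemoPStr first = NULL, second = {"", 0};

    (void) first;
    return sizeof second == sizeof(DemoOStr);
}

/* Returns 1 if both variables are pointers when declared one per line. */
int split_gives_pointers(void)
{
    DemoPStr first = NULL;
    DemoPStr second = NULL;

    return sizeof first == sizeof(const DemoOStr *)
        && sizeof second == sizeof(const DemoOStr *);
}
```

Both functions return 1: the combined declaration leaves the second variable
as a struct, while the one-per-line form gives two pointers.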

Sorry - we worked very hard to avoid this, but the compilers simply fail to 
put the const in the right place otherwise, so we have to live with the macro 
and this side effect.

This should solve your problems.

regards,

Peter Rice


From jrvalverde at cnb.uam.es  Fri Jul 23 12:27:51 2004
From: jrvalverde at cnb.uam.es (José R. Valverde)
Date: Fri, 23 Jul 2004 14:27:51 +0200
Subject: [EMBOSS] incorporating old code in 2.9.0
In-Reply-To: <4100F331.5080002@ebi.ac.uk>
References: <5.2.1.1.1.20040723083507.02dd7650@udcf.gla.ac.uk>
	<5.2.1.1.1.20040723083507.02dd7650@udcf.gla.ac.uk>
	<5.2.1.1.1.20040723115152.02dd6770@udcf.gla.ac.uk>
	<4100F331.5080002@ebi.ac.uk>
Message-ID: <20040723142751.628764c8.jrvalverde@cnb.uam.es>


> Because AjPStr is now a macro that is replaced by "const AjOStr*" the 
> definition of cseqo becomes:
> 
> const AjOStr* cseq, cseqo;
> 
> This is a classic C problem - cseq is now an AjPStr, cseqo is only an AjOStr 
> (what an AjPStr points to).
> 
Excuse me, but I have a question about this. Wouldn't

typedef const AjOStr * AjPStr;

fix this and allow for multiple declarations in the same line?
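
As a rough illustration of the question (a sketch with hypothetical Demo*
names, not the EMBOSS definitions): a typedef does keep every declarator on
one line a pointer, but a const written in front of a typedef name applies to
the pointer itself, not to the object it points to. That may be the "const in
the right place" difficulty, though that reading is an assumption.

```c
#include <stddef.h>

/* Hypothetical stand-in for the AJAX string object (not the real AjOStr). */
typedef struct DemoOStr {
    char text[64];
    size_t len;
} DemoOStr;

/* Pointer typedef: one type name, so every declarator is a pointer. */
typedef DemoOStr *DemoPStr;

/* Returns 1; the comparisons compile only because a and b are both
 * pointers, even though they share one declaration line. */
int both_are_pointers(void)
{
    DemoOStr x = {"", 0};
    DemoOStr y = {"", 0};
    DemoPStr a = &x, b = &y;

    return (a == &x) && (b == &y);
}

/* "const DemoPStr" means "DemoOStr * const": the pointer is constant,
 * but the object behind it can still be modified through it. */
size_t write_through_const_typedef(void)
{
    DemoOStr obj = {"", 0};
    const DemoPStr p = &obj;

    p->len = 5;     /* accepted by the compiler: only p itself is const */
    return obj.len;
}
```

The typedef proposed above, with const inside it, would instead make every
use of the type name a pointer to a const object.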

					j

-- 
	These opinions are mine and only mine. Hey man, I saw them first!

			    José R. Valverde

	De nada sirve la Inteligencia Artificial cuando falta la Natural


From gbottu at ben.vub.ac.be  Wed Jul 28 14:27:17 2004
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Wed, 28 Jul 2004 16:27:17 +0200
Subject: EMBOSS and the GenomeReviews databank
Message-ID: <20040728142717.GC25875@bigben.ulb.ac.be>

	Dear developers,

I just noticed something that might interest you. The EMBL-EBI has a 
GenomeReviews databank (complete bacterial chromosomes or plasmids, each 
as a single-entry EMBL file). However, they decided to depart somewhat 
from the EMBL format. When I run 
seqret -feature grv:u00096_gr
I get a lot of warnings of this type:
Warning: U00096_GR: Bad value 'AAC77270.1 {EMBL:U00096}' for tag '/protein_id'
Warning: bad /protein_id value 'AAC77271.1 {EMBL:U00096}'

	Sincerely,
	Guy Bottu


From rls at ebi.ac.uk  Wed Jul 28 15:05:39 2004
From: rls at ebi.ac.uk (Rodrigo Lopez)
Date: Wed, 28 Jul 2004 16:05:39 +0100
Subject: EMBOSS and the GenomeReviews databank
In-Reply-To: <20040728142717.GC25875@bigben.ulb.ac.be>
Message-ID: <000801c474b4$58488720$c500a8c0@castafiore>

Yes, it is very unfortunate that the genome reviews data is non-standard.
I'm forwarding this to the head of that project. He may have a comment
regarding the evidence tags present/future.

R:)


> -----Original Message-----
> From: owner-emboss-dev at hgmp.mrc.ac.uk 
> [mailto:owner-emboss-dev at hgmp.mrc.ac.uk] On Behalf Of Guy Bottu
> Sent: 28 July 2004 15:27
> To: emboss-dev at embnet.org
> Subject: EMBOSS and the GenomeReviews databank
> 
> 
> 	Dear developers,
> 
> I just noticed something that might interest you. At the 
> EMBL-EBI they 
> have a GenomeReviews databank (with complete bacterial chromosomes or 
> plasmids in one entry EMBL files). They however decided to 
> depart somewhat 
> from the EMBL format. When I run 
> seqret -feature grv:u00096_gr
> I get a lot of error messages of type :
> Warning: U00096_GR: Bad value 'AAC77270.1 {EMBL:U00096}' for 
> tag '/protein_id'
> Warning: bad /protein_id value 'AAC77271.1 {EMBL:U00096}'
> 
> 	Sincerely,
> 	Guy Bottu
> 


From rls at ebi.ac.uk  Wed Jul 28 15:20:55 2004
From: rls at ebi.ac.uk (Rodrigo Lopez)
Date: Wed, 28 Jul 2004 16:20:55 +0100
Subject: EMBOSS and the GenomeReviews databank
In-Reply-To: <4107C2F6.5000609@ebi.ac.uk>
Message-ID: <001301c474b6$7a5817c0$c500a8c0@castafiore>

Hi Paul,

Many thanks for the reply. Let's see if Guy has further comments.

R:)


> -----Original Message-----
> From: Paul Kersey [mailto:pkersey at ebi.ac.uk] 
> Sent: 28 July 2004 16:15
> To: rls at ebi.ac.uk
> Cc: 'Guy Bottu'; emboss-dev at embnet.org; genome_reviews at ebi.ac.uk
> Subject: Re: EMBOSS and the GenomeReviews databank
> 
> 
> Rodrigo Lopez wrote:
> 
> >Yes, it is very unfortunate that the genome reviews data is 
> >non-standard. I'm forwarding this to the head of that 
> project. He may 
> >have a comment regarding the evidence tags present/future.
> >
> >R:)
> >
> >
> >  
> >
> >>-----Original Message-----
> >>From: owner-emboss-dev at hgmp.mrc.ac.uk
> >>[mailto:owner-emboss-dev at hgmp.mrc.ac.uk] On Behalf Of Guy Bottu
> >>Sent: 28 July 2004 15:27
> >>To: emboss-dev at embnet.org
> >>Subject: EMBOSS and the GenomeReviews databank
> >>
> >>
> >>	Dear developers,
> >>
> >>I just noticed something that might interest you. At the
> >>EMBL-EBI they 
> >>have a GenomeReviews databank (with complete bacterial 
> chromosomes or 
> >>plasmids in one entry EMBL files). They however decided to 
> >>depart somewhat 
> >>from the EMBL format. When I run 
> >>seqret -feature grv:u00096_gr
> >>I get a lot of error messages of type :
> >>Warning: U00096_GR: Bad value 'AAC77270.1 {EMBL:U00096}' for 
> >>tag '/protein_id'
> >>Warning: bad /protein_id value 'AAC77271.1 {EMBL:U00096}'
> >>
> >>	Sincerely,
> >>	Guy Bottu
> >>
> >>    
> >>
> >
> >  
> >
> Dear Guy
> 
> the evidence tags convey extra information that some users are 
> interested in.  It was not possible to fit this information within the 
> existing definition of EMBL format, hence it was necessary to introduce 
> the tags.
> 
> However, if you do not want to use the evidence tags, we also distribute 
> a program that removes them from the Genome Reviews files.
> 
> The following comes from the Genome Reviews user manual:
> 
> For users who do not wish to filter information by source, a program is 
> provided with this release to remove evidence tags from Genome Reviews 
> files, resulting in the production of "normal" EMBL format files. This 
> program is written in the Java programming language and will run on any 
> platform on which a Java runtime environment has been installed. Such 
> environments are available free of charge for many platforms (including 
> Microsoft Windows, Mac OS and GNU/Linux) from either Sun Microsystems 
> (URL: http://java.sun.com/j2se/) or your hardware vendor. The tag removal 
> program itself is available:
> 
>     * as source code (RemoveEvidenceTags.jar) from
>       ftp://ftp.ebi.ac.uk/pub/software/genome_reviews/java/uk
>       <ftp://ftp.ebi.ac.uk/pub/software/uk>
>     * as an executable jar file from
>       ftp://ftp.ebi.ac.uk/pub/software/genome_reviews/java/RemoveEvidenceTags.jar
>       <ftp://ftp.ebi.ac.uk/pub/software/genome_reviews/RemoveEvidenceTags.jar>
> 
> 
> Documentation on the use of the tag removal program can be generated 
> after download by one of the following commands (the first command is 
> for use with the RemoveEvidenceTags.java source code; and the second 
> command is for use with the RemoveEvidenceTags.jar file):
> 
>     * javadoc -d destination-directory RemoveEvidenceTags.java
> 
>       (where the destination-directory is the target directory, where
>       you would like the generated documentation to be placed)
> 
>     * jar xf RemoveEvidenceTags.jar
> 
>       (the generated documentation is placed in a directory called javaDoc)
> 
> The procedure to run the tag removal program is also described below:
> 
>    1. Compile the java class, using: javac RemoveEvidenceTags.java
>    2. Run the compiled code using, either:
>       java -cp . uk/ac/ebi/genomeReviews/RemoveEvidenceTags dir
>       or:
>       java -cp . uk/ac/ebi/genomeReviews/RemoveEvidenceTags dir file-name
> 
> 
> Alternatively the program can be run from the executable jar 
> (RemoveEvidenceTags.jar) as follows:
> 
>    1. java -jar RemoveEvidenceTags.jar dir
>       java -jar RemoveEvidenceTags.jar dir file-name
> 
> 
> where dir is the path to the directory where the Genome Reviews files 
> are located, and file-name is the name of a Genome Reviews file 
> contained in this directory. If only the single parameter (dir) is 
> used, then the program will remove the evidence tags from ALL Genome 
> Reviews files located in that directory. The dir should end with a 
> closing file separator.
> 
> ---
> 
> Best wishes
> 
> Paul
> 
> -- 
> "He could consider civilisation, and see the world as a 
> microcosm of the cell" - Joseph Heller
> 
> ------------------------------------------------------------------
> Dr. Paul Kersey
> EMBL-European Bioinformatics Institute    Tel: +44-(0)1223-494601
> Wellcome Trust Genome Campus, Hinxton     Fax: +44-(0)1223-494468
> Cambridge CB10 1SD, UK                    email: pkersey at ebi.ac.uk
> 
>