From henrikki.almusa at helsinki.fi  Thu Dec  4 09:26:32 2003
From: henrikki.almusa at helsinki.fi (Henrikki Almusa)
Date: Thu, 4 Dec 2003 16:26:32 +0200
Subject: tfscan output conversion
Message-ID: <200312041626.32319.henrikki.almusa@helsinki.fi>

Hello,

I'm trying to convert tfscan to write report output (patch attached). This 
basically should work, but it doesn't. So I have one problem and one test request.

Problem: for some reason this patch seems to make it hang if used with more 
than one sequence. I've used the -debug option and attached the last 75 lines 
of the debug output (the whole thing is almost 1 MB). I can't figure out what 
causes this.

Test request: since the binding factor information is now added to this, 
I've put it into the tail of the report. However, I don't get any of these 
using tfscan here, so I'd like someone to check what it looks like with them.

Thanks,
-- 
Henrikki Almusa
-------------- next part --------------
  testing *name
  found [1] 'id'
reportWriteSeqTable jwid 10 jmin 6 tagval 'MOUSE$A21COL_02     '
reportWriteSeqTable subseq 6 seq 840 28..33
ajFeatGetNote 'acc'
  try /note="*id HS$APOE_08          "
  testing *name
  try /note="*acc R00149"
  testing *name
  found [1] 'acc'
reportWriteSeqTable jwid 9 jmin 6 tagval 'R00149'
ajFeatGetNote 'id'
  try /note="*id HS$APOE_08          "
  testing *name
  found [1] 'id'
reportWriteSeqTable jwid 10 jmin 6 tagval 'HS$APOE_08          '
reportWriteSeqTable subseq 5 seq 840 374..378
ajFeatGetNote 'acc'
  try /note="*id HS$ALBU_03          "
  testing *name
  try /note="*acc R00079"
  testing *name
  found [1] 'acc'
reportWriteSeqTable jwid 9 jmin 6 tagval 'R00079'
ajFeatGetNote 'id'
  try /note="*id HS$ALBU_03          "
  testing *name
  found [1] 'id'
reportWriteSeqTable jwid 10 jmin 6 tagval 'HS$ALBU_03          '
ajStrCut 0 0 len: 1 ibegin: 0 iend: 1
ajStrCut 0 0 len: 3 ibegin: 0 iend: 1
ajFeattableDel 80723c0
ajSeqRead: input file 'mRNA.small.twice' still there, try again
ajFeattableDel 0
seqRead: cleared
seqRead: seqin format 10 'fasta'
seqRead: one format specified
ajFileBuffNobuff mRNA.small.twice buffsize: 15
++seqRead known format 10
++seqReadFmt format 10 (fasta) 'mRNA.small.twice' feat No
ajSeqParseNcbi '>Exon_10_head_2 (copied for testing two seqs) 
'
trying ajSeqParseFasta
ajSeqParseFasta '>Exon_10_head_2 (copied for testing two seqs) 
'
result id: 'Exon_10_head_2' acc: '' desc: '(copied for testing two seqs) '
parsed id 'Exon_10_head_2' acc '' sv '' gi '' desc '(copied for testing two seqs) '
seqSetName 'Exon_10_head_2' result: 'Exon_10_head_2'
at EOF: File already read to end mRNA.small.twice
End of file - data in buffer - return ajFalse
ajFileBuffClear (0) Nobuff: Yes
 first: 15 thys->Pos: 15 thys->Size: 15 thys->Nobuff: Yes
ajFileBuffClear 'mRNA.small.twice' (0 lines)
     Y size: 15 pos: 15 removed 15 lines add to free: 0
seqReadFmt success with format 10 (fasta)
seqQueryMatch 'Exon_10_head_2' id '' acc '' Sv '' Des '' Key '' Org ''
No accession number to test
No taxonomy to test
No keyword to test
No description to test
testing sequence 'Exon_10_head_2' type 'DNA' IsNuc No IsProt No
ajSeqTypeCheckIn type 'dna' found (DNA sequence)
Remove all gaps
ajSeqIsNuc Type ''
seqTypeGapnucS test
Convert '?XUu' to 'NNTt'
ajSeqRead: open buffer  usa: 'mRNA.small.twice' returns: Yes
++keep restored 0..0 (N) 'fasta' 10
ajSeqRead: thys->Db '', seqin->Db ''
ajSeqRead: thys->Name 'Exon_10_head_2'
ajSeqRead: thys->Entryname 'Exon_10_head_2', seqin->Entryname ''
ajSeqRead: thys->Name 'Exon_10_head_2'
ajSeqSetRange (len: 840 0..0 old 0..0)
      result: (len: 840 0..0)
ajSeqallNext success
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tfscan_report.patch
Type: text/x-diff
Size: 5759 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/emboss-dev/attachments/20031204/bda0a97e/attachment.bin 

From henrikki.almusa at helsinki.fi  Fri Dec  5 05:08:52 2003
From: henrikki.almusa at helsinki.fi (Henrikki Almusa)
Date: Fri, 5 Dec 2003 12:08:52 +0200
Subject: Report format
Message-ID: <200312051208.52153.henrikki.almusa@helsinki.fi>

Hello,

Small question about the report format. One can add a tag in the style 
'type:value=text_in_file'. Is there some way to give spaces in "text_in_file", 
or to give a "%-xS" type of syntax to make sure a width of at least x 
characters is used?

Thanks,
-- 
Henrikki Almusa


From pmr at ebi.ac.uk  Fri Dec  5 06:20:27 2003
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 05 Dec 2003 11:20:27 +0000
Subject: Report format
In-Reply-To: <200312051208.52153.henrikki.almusa@helsinki.fi>
References: <200312051208.52153.henrikki.almusa@helsinki.fi>
Message-ID: <3FD069FB.7010600@ebi.ac.uk>

Henrikki Almusa wrote:
> Hello,
> 
> Small question about the report format. One can add a tag in the style 
> 'type:value=text_in_file'. Is there some way to give spaces in "text_in_file", 
> or to give a "%-xS" type of syntax to make sure a width of at least x 
> characters is used?

Good idea!

The code has widths for each column already.

We need a syntax to give:

1. minimum column width
2. maximum column width (for example, for sequence data that can be very 
long)

Perhaps type:value%n.n=columnheading
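The proposed syntax is only a suggestion at this stage. As a rough sketch of how it might be parsed, the C below assumes the form type:name%min.max=heading, with the widths and the heading optional and the heading defaulting to the tag name. TagSpec and parse_tagspec are hypothetical names, not part of the EMBOSS report code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of a tag definition such as
 * "str:sequence%10.40=Sequence". Not an EMBOSS structure. */
typedef struct {
    char type[32];     /* e.g. "str", "float", "int" */
    char name[64];     /* tag name stored in the feature table */
    int  minwidth;     /* 0 if not given */
    int  maxwidth;     /* 0 means unlimited */
    char heading[64];  /* column heading; defaults to the tag name */
} TagSpec;

/* Parse "type:name[%min[.max]][=heading]". Returns 1 on success, 0 on error. */
static int parse_tagspec(const char *spec, TagSpec *out)
{
    const char *colon = strchr(spec, ':');
    const char *end;
    size_t len;

    memset(out, 0, sizeof *out);  /* also nul-terminates every field */
    if (colon == NULL || colon == spec)
        return 0;

    len = (size_t)(colon - spec);
    if (len >= sizeof out->type)
        return 0;
    memcpy(out->type, spec, len);

    /* The tag name runs until '%', '=' or the end of the string. */
    end = colon + 1 + strcspn(colon + 1, "%=");
    len = (size_t)(end - (colon + 1));
    if (len == 0 || len >= sizeof out->name)
        return 0;
    memcpy(out->name, colon + 1, len);

    if (*end == '%') {
        char *p;
        out->minwidth = (int)strtol(end + 1, &p, 10);
        if (*p == '.')
            out->maxwidth = (int)strtol(p + 1, &p, 10);
        end = p;
    }

    if (*end == '=') {
        if (strlen(end + 1) >= sizeof out->heading)
            return 0;
        strcpy(out->heading, end + 1);
    } else if (*end == '\0') {
        strcpy(out->heading, out->name);  /* default heading */
    } else {
        return 0;  /* unexpected characters after the widths */
    }
    return 1;
}
```

For example, "str:sequence%10.40=Sequence" would describe a column at least 10 and at most 40 characters wide, headed "Sequence".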

More questions:

Do we need more column types?

Do we need a way to define the "standard" tags - to change them or to 
exclude them (excluding could be done with qualifiers, changing needs a 
tag syntax)

regards,

Peter


From henrikki.almusa at helsinki.fi  Fri Dec  5 07:04:36 2003
From: henrikki.almusa at helsinki.fi (Henrikki Almusa)
Date: Fri, 5 Dec 2003 14:04:36 +0200
Subject: Report format
In-Reply-To: <3FD069FB.7010600@ebi.ac.uk>
References: <200312051208.52153.henrikki.almusa@helsinki.fi> <3FD069FB.7010600@ebi.ac.uk>
Message-ID: <200312051404.36982.henrikki.almusa@helsinki.fi>

On Friday 05 December 2003 13:20, Peter Rice wrote:
> The code has widths for each column already.
>
> We need a syntax to give:
>
> 1. minimum column width
> 2. maximum column width (for example, for sequence data that can be very
> long)
>
> Perhaps type:value%n.n=columnheading

It might be good to make it possible to include spaces there, e.g. in a column 
heading. Possible ways could be either to allow type:value%n.n='column heading' 
or to add a tag_delim setting to report, which defaults to space or whitespace.  

> More questions:
>
> Do we need more column types?

Perhaps a name/id tag. I can't think of much else.

> Do we need a way to define the "standard" tags - to change them or to
> exclude them (excluding could be done with qualifiers, changing needs a
> tag syntax)

This might help. For example, 'cusp' could be adapted if the sequence and the 
start and end points could be dropped. Perhaps this can already be done with 
some report format; I would need to check.

I think there are those '-noscore' etc. qualifiers which do a similar thing. Of 
course, all "standard" tags would then need to be named and added there.

-- 
Henrikki Almusa


From henrikki.almusa at helsinki.fi  Fri Dec  5 07:11:53 2003
From: henrikki.almusa at helsinki.fi (Henrikki Almusa)
Date: Fri, 5 Dec 2003 14:11:53 +0200
Subject: Report format
In-Reply-To: <3FD069FB.7010600@ebi.ac.uk>
References: <200312051208.52153.henrikki.almusa@helsinki.fi> <3FD069FB.7010600@ebi.ac.uk>
Message-ID: <200312051411.53703.henrikki.almusa@helsinki.fi>

Forgot to add to previous mail.

One thing that would be nice to do with the report format is sorting. Would it 
be hard to add an option to sort the feature table in ascending/descending 
order, rather than in the order it was built?

E.g. sort by sequence, then by start point?
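A minimal sketch of such a sort in C, using the standard qsort and assuming a hypothetical simplified FeatRow record (sequence name plus start and end) rather than the real EMBOSS feature table structures:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified feature row; not an EMBOSS structure. */
typedef struct {
    const char *seqname;
    int start;
    int end;
} FeatRow;

/* Ascending order: sequence name first, then start position. */
static int featrow_cmp_asc(const void *a, const void *b)
{
    const FeatRow *x = a;
    const FeatRow *y = b;
    int c = strcmp(x->seqname, y->seqname);

    if (c != 0)
        return c;
    return (x->start > y->start) - (x->start < y->start);
}

/* Descending order is the ascending comparison with the arguments swapped. */
static int featrow_cmp_desc(const void *a, const void *b)
{
    return featrow_cmp_asc(b, a);
}

static void sort_features(FeatRow *rows, size_t n, int descending)
{
    qsort(rows, n, sizeof rows[0],
          descending ? featrow_cmp_desc : featrow_cmp_asc);
}
```

qsort is not stable, so rows with the same sequence and start may come out in either order; a further tie-break (for example on the end position) would be needed for fully reproducible output.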

-- 
Henrikki Almusa


From pmr at ebi.ac.uk  Fri Dec  5 07:38:29 2003
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 05 Dec 2003 12:38:29 +0000
Subject: Report format
In-Reply-To: <200312051404.36982.henrikki.almusa@helsinki.fi>
References: <200312051208.52153.henrikki.almusa@helsinki.fi> <3FD069FB.7010600@ebi.ac.uk> <200312051404.36982.henrikki.almusa@helsinki.fi>
Message-ID: <3FD07C45.2070703@ebi.ac.uk>

Henrikki Almusa wrote:

>>Perhaps type:value%n.n=columnheading
> 
> It might be good to make it possible to include spaces there, e.g. in a column 
> heading. Possible ways could be either to allow type:value%n.n='column heading' 
> or to add a tag_delim setting to report, which defaults to space or whitespace.  

Spaces are tricky - for parsers that read the output.

I prefer column_heading (or ColumnHeading)

Any other comments on this?

>>Do we need more column types?
> 
> Perhaps a name/id tag. I can't think of much else.

Are they different to "str"?

I noticed in checking 2.8.0 that "rstr" works as a right-justified 
string (str is left-justified; anything else is right-justified). 
Perhaps we should strictly check the tag types (anything is allowed in 
2.8.0!)
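To illustrate the justification rule described here, a small sketch using standard printf field widths. format_cell is a hypothetical helper, and its minwidth/maxwidth arguments assume the %n.n widths proposed earlier in the thread; it is not the EMBOSS report code.

```c
#include <stdio.h>
#include <string.h>

/* "str" is left-justified; every other tag type (so "rstr" too) is
 * right-justified. minwidth pads the cell; maxwidth > 0 truncates
 * long values such as sequence data. */
static void format_cell(char *buf, size_t bufsize, const char *type,
                        const char *value, int minwidth, int maxwidth)
{
    /* The precision in "%.*s" limits how many characters are printed. */
    int prec = (maxwidth > 0) ? maxwidth : (int)strlen(value);

    if (strcmp(type, "str") == 0)
        snprintf(buf, bufsize, "%-*.*s", minwidth, prec, value);
    else
        snprintf(buf, bufsize, "%*.*s", minwidth, prec, value);
}
```

With a minimum width of 9, "R00149" comes out as "R00149   " for a "str" tag and "   R00149" for "rstr" (or any other type).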

>>Do we need a way to define the "standard" tags - to change them or to
>>exclude them (excluding could be done with qualifiers, changing needs a
>>tag syntax)

> This might help. For example, 'cusp' could be adapted if the sequence and the 
> start and end points could be dropped. Perhaps this can already be done with 
> some report format; I would need to check.

Will do. We can put the -norstart and -norend qualifiers (see below) 
into the ACD report definition.

> I think there are those '-noscore' etc. qualifiers which do a similar thing. Of 
> course, all "standard" tags would then need to be named and added there.

Yes, will do. Have to check for any report formats that may be strange 
without specific tags (they can, of course, ignore the qualifier)

regards,

Peter


From henrikki.almusa at helsinki.fi  Fri Dec  5 08:46:06 2003
From: henrikki.almusa at helsinki.fi (Henrikki Almusa)
Date: Fri, 5 Dec 2003 15:46:06 +0200
Subject: Report format
In-Reply-To: <3FD07C45.2070703@ebi.ac.uk>
References: <200312051208.52153.henrikki.almusa@helsinki.fi> <200312051404.36982.henrikki.almusa@helsinki.fi> <3FD07C45.2070703@ebi.ac.uk>
Message-ID: <200312051546.06872.henrikki.almusa@helsinki.fi>

On Friday 05 December 2003 14:38, Peter Rice wrote:
> Henrikki Almusa wrote:
> >>Perhaps type:value%n.n=columnheading
> >
> > It might be good to make it possible to include spaces there, e.g. in a
> > column heading. Possible ways could be either to allow
> > type:value%n.n='column heading' or to add a tag_delim setting to report,
> > which defaults to space or whitespace.
>
> Spaces are tricky - for parsers that read the output.
>
> I prefer column_heading (or ColumnHeading)
>
> Any other comments on this?

I just don't like ThisTypeOfCapsing much, but I can live with it, no problem.

> >>Do we need more column types?
> >
> > Perhaps a name/id tag. I can't think of much else.
>
> Are they different to "str"?

Ah, right, I misunderstood it then, and you're right: no different.

> I noticed in checking 2.8.0 that "rstr" works as a right-justified
> string (str is left-justified; anything else is right-justified).
> Perhaps we should strictly check the tag types (anything is allowed in
> 2.8.0!)

And perhaps put some documentation on the web :). There is very little 
information that I could find on what can be done with reports. 

-- 
Henrikki Almusa


From d.counsell at hgmp.mrc.ac.uk  Mon Dec  8 06:28:15 2003
From: d.counsell at hgmp.mrc.ac.uk (Damian Counsell)
Date: Mon, 8 Dec 2003 11:28:15 +0000
Subject: Report format
In-Reply-To: <3FD069FB.7010600@ebi.ac.uk>
References: <200312051208.52153.henrikki.almusa@helsinki.fi> <3FD069FB.7010600@ebi.ac.uk>
Message-ID: <20031208112815.GB5099@dev4.hgmp.mrc.ac.uk>

* Peter Rice <pmr at ebi.ac.uk> [031205 11:23]:
> Henrikki Almusa wrote:
> >Hello,
> >
> >Small question about the report format. One can add a tag in the style 
> >'type:value=text_in_file'. Is there some way to give spaces in 
> >"text_in_file", or to give a "%-xS" type of syntax to make sure a width of 
> >at least x characters is used?
> 
> Good idea!
> 
> The code has widths for each column already.
> 
> We need a syntax to give:
> 
> 1. minimum column width
> 2. maximum column width (for example, for sequence data that can be very 
> long)
> 
> Perhaps type:value%n.n=columnheading
> 
> More questions:
> 
> Do we need more column types?

This may be a completely stupid suggestion, but, if you don't ask...

Could we have decimal tabs, please?  You know: ones smart enough to
align themselves by the position of the floating point, even in the
absence of specifying the number of digits before and after it?
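One way decimal tabs could work without specifying digit counts is to scan the whole column first: the widest integer part and the widest fractional part together fix where the point goes, and each value is then padded with spaces on both sides. A sketch in C; decimal_align and the fixed CELL buffer size are illustrative assumptions, not existing EMBOSS code.

```c
#include <stdio.h>
#include <string.h>

#define CELL 32  /* illustrative fixed cell buffer size */

/* Characters before the decimal point (the whole string if there is none). */
static size_t int_part_len(const char *s)
{
    const char *dot = strchr(s, '.');
    return dot ? (size_t)(dot - s) : strlen(s);
}

/* Align a column of numeric strings on their decimal points using spaces
 * only: leading spaces before the integer part, trailing spaces after the
 * fraction. No digit counts need to be given in advance. */
static void decimal_align(const char **vals, size_t n, char out[][CELL])
{
    size_t maxint = 0, maxfrac = 0, i;

    for (i = 0; i < n; i++) {
        size_t ilen = int_part_len(vals[i]);
        size_t flen = strlen(vals[i]) - ilen;  /* includes the '.' */
        if (ilen > maxint)
            maxint = ilen;
        if (flen > maxfrac)
            maxfrac = flen;
    }
    for (i = 0; i < n; i++) {
        size_t ilen = int_part_len(vals[i]);
        size_t flen = strlen(vals[i]) - ilen;
        snprintf(out[i], CELL, "%*s%s%*s",
                 (int)(maxint - ilen), "", vals[i],
                 (int)(maxfrac - flen), "");
    }
}
```

For the column "12.5", "3.125", "100" this gives " 12.5  ", "  3.125" and "100    ", all seven characters wide with the points in the same position.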


all the best

Damian


-- 
MRC Rosalind Franklin Centre for Genomics Research 
Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SB, UK
phone: +44 (0)1223 494585        fax: +44 (0)1223 494512
email: d.counsell at hgmp.mrc.ac.uk Web: http://www.rfcgr.mrc.ac.uk/~dcounsel/


From pmr at ebi.ac.uk  Mon Dec  8 06:45:59 2003
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 08 Dec 2003 11:45:59 +0000
Subject: Report format
In-Reply-To: <20031208112815.GB5099@dev4.hgmp.mrc.ac.uk>
References: <200312051208.52153.henrikki.almusa@helsinki.fi> <3FD069FB.7010600@ebi.ac.uk> <20031208112815.GB5099@dev4.hgmp.mrc.ac.uk>
Message-ID: <3FD46477.1030602@ebi.ac.uk>

Damian Counsell wrote:
> This may be a completely stupid suggestion, but, if you don't ask...
> 
> Could we have decimal tabs, please?  You know: ones smart enough to
> align themselves by the position of the floating point, even in the
> absence of specifying the number of digits before and after it?

Hmmmm ...

What really happens to these data types is that the values are all 
strings written by the calling program and stored as tag=value pairs in 
an internal feature table.

On output, the (string) tag value is reported.

"str" tags are left justified
Anything else is right justified.

We could try, for floats, reading the value in and rewriting it in a 
fixed precision. A small overhead but maybe worth it.

We could also try removing extra trailing zeroes in some cases.
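A sketch of the read-and-rewrite idea, assuming the float tag value is held as a plain numeric string: strtod reads it back, "%.*f" rewrites it at a fixed precision, and trailing zeros are optionally removed. reformat_float is a hypothetical helper, not an existing EMBOSS function.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Re-read a float tag value (stored as a string) and rewrite it with a
 * fixed number of decimals, right-justified in width columns. If trim is
 * non-zero, trailing zeros and a bare trailing '.' are dropped.
 * Returns 0 (buf untouched) if value is not a plain number. */
static int reformat_float(char *buf, size_t bufsize, const char *value,
                          int width, int precision, int trim)
{
    char *endp;
    char tmp[128];  /* very large magnitudes would be truncated here */
    double d = strtod(value, &endp);

    if (endp == value || *endp != '\0')
        return 0;

    snprintf(tmp, sizeof tmp, "%.*f", precision, d);
    if (trim && strchr(tmp, '.') != NULL) {
        size_t len = strlen(tmp);
        while (len > 0 && tmp[len - 1] == '0')
            tmp[--len] = '\0';
        if (len > 0 && tmp[len - 1] == '.')
            tmp[--len] = '\0';
    }
    snprintf(buf, bufsize, "%*s", width, tmp);
    return 1;
}
```

Trimming zeros and right-justifying pull in opposite directions: once zeros are trimmed, right-justified values no longer line up on the decimal point, so trimming probably only suits output where alignment does not matter.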

Comments?

Peter


From henrikki.almusa at helsinki.fi  Mon Dec  8 09:24:42 2003
From: henrikki.almusa at helsinki.fi (Henrikki Almusa)
Date: Mon, 8 Dec 2003 16:24:42 +0200
Subject: tfscan output conversion
In-Reply-To: <200312041626.32319.henrikki.almusa@helsinki.fi>
References: <200312041626.32319.henrikki.almusa@helsinki.fi>
Message-ID: <200312081624.42660.henrikki.almusa@helsinki.fi>

Hello,

This patch should now actually print the info properly into the tail. I would 
still like someone to confirm that, though. 

But the problem still remains. So, any ideas on why the while loop fails?

-- 
Henrikki Almusa
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tfscan_report.patch
Type: text/x-diff
Size: 6413 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/emboss-dev/attachments/20031208/00108002/attachment.bin 
-------------- next part --------------
ajFeatGetNote 'id'
  try /note="*id MOUSE$A21COL_02     "
  testing *name
  found [1] 'id'
reportWriteSeqTable jwid 10 jmin 6 tagval 'MOUSE$A21COL_02     '
reportWriteSeqTable subseq 6 seq 840 28..33
ajFeatGetNote 'acc'
  try /note="*id HS$APOE_08          "
  testing *name
  try /note="*acc R00149"
  testing *name
  found [1] 'acc'
reportWriteSeqTable jwid 9 jmin 6 tagval 'R00149'
ajFeatGetNote 'id'
  try /note="*id HS$APOE_08          "
  testing *name
  found [1] 'id'
reportWriteSeqTable jwid 10 jmin 6 tagval 'HS$APOE_08          '
reportWriteSeqTable subseq 5 seq 840 374..378
ajFeatGetNote 'acc'
  try /note="*id HS$ALBU_03          "
  testing *name
  try /note="*acc R00079"
  testing *name
  found [1] 'acc'
reportWriteSeqTable jwid 9 jmin 6 tagval 'R00079'
ajFeatGetNote 'id'
  try /note="*id HS$ALBU_03          "
  testing *name
  found [1] 'id'
reportWriteSeqTable jwid 10 jmin 6 tagval 'HS$ALBU_03          '
ajFeattableDel 80723d8
ajSeqRead: input file 'mRNA.small.twice' still there, try again
ajFeattableDel 0
seqRead: cleared
seqRead: seqin format 10 'fasta'
seqRead: one format specified
ajFileBuffNobuff mRNA.small.twice buffsize: 15
++seqRead known format 10
++seqReadFmt format 10 (fasta) 'mRNA.small.twice' feat No
ajSeqParseNcbi '>Exon_10_head_2 (copied for testing two seqs) 
'
trying ajSeqParseFasta
ajSeqParseFasta '>Exon_10_head_2 (copied for testing two seqs) 
'
result id: 'Exon_10_head_2' acc: '' desc: '(copied for testing two seqs) '
parsed id 'Exon_10_head_2' acc '' sv '' gi '' desc '(copied for testing two seqs) '
seqSetName 'Exon_10_head_2' result: 'Exon_10_head_2'
at EOF: File already read to end mRNA.small.twice
End of file - data in buffer - return ajFalse
ajFileBuffClear (0) Nobuff: Yes
 first: 15 thys->Pos: 15 thys->Size: 15 thys->Nobuff: Yes
ajFileBuffClear 'mRNA.small.twice' (0 lines)
     Y size: 15 pos: 15 removed 15 lines add to free: 0
seqReadFmt success with format 10 (fasta)
seqQueryMatch 'Exon_10_head_2' id '' acc '' Sv '' Des '' Key '' Org ''
No accession number to test
No taxonomy to test
No keyword to test
No description to test
testing sequence 'Exon_10_head_2' type 'DNA' IsNuc No IsProt No
ajSeqTypeCheckIn type 'dna' found (DNA sequence)
Remove all gaps
ajSeqIsNuc Type ''
seqTypeGapnucS test
Convert '?XUu' to 'NNTt'
ajSeqRead: open buffer  usa: 'mRNA.small.twice' returns: Yes
++keep restored 0..0 (N) 'fasta' 10
ajSeqRead: thys->Db '', seqin->Db ''
ajSeqRead: thys->Name 'Exon_10_head_2'
ajSeqRead: thys->Entryname 'Exon_10_head_2', seqin->Entryname ''
ajSeqRead: thys->Name 'Exon_10_head_2'
ajSeqSetRange (len: 840 0..0 old 0..0)
      result: (len: 840 0..0)
ajSeqallNext success

From pmr at ebi.ac.uk  Mon Dec  8 10:01:57 2003
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 08 Dec 2003 15:01:57 +0000
Subject: ACD changes for 2.9.0
Message-ID: <3FD49265.3030205@ebi.ac.uk>

Just committed some new ACD validations (in acdvalid). Interface 
developers will need to look for them in 2.9.0.

New section "additional" for qualifiers with additional:"Y" defined.
I would suggest treating this in the same way as "advanced" (for many 
programs it needed only a rename of the advanced section).

New ACD type "toggle" - this is the same as "boolean" and will be used 
for those boolean values that are only used to control other ACD 
qualifiers (-plot for example). acdvalid will allow these toggles in 
other sections, and will (but not yet) check for them in calculated 
values. Boolean values will be expected to appear in the required, 
additional or advanced sections (but can be in the input or output 
sections without problem, as before).

Input and output datatypes now must appear in the input and output 
sections. matrix, datafile and cfile datatypes have been relocated.

The application name in the ACD file must match the true application 
name. This is only checked by acdvalid so far to avoid breaking 
third-party ACD files.

Output outfile, align, report, etc. have new attributes:

nullok - if true, can return a NULL value

nulldefault - if true, defaults to a NULL value. Setting a filename on 
the command line overrides and creates an output file. Setting to "" on 
the commandline ***creates the expected default filename***

missing - if true, can create the expected filename by simply using 
-qualname on the command line (rather than -qualname="") if it is last on 
the command line or followed by another qualifier (if followed by a 
parameter, that will appear to be the filename value)
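The interaction of nulldefault and missing may be easier to see as a small decision function. This is a hypothetical C model of the rules as described above, working on an already-parsed command line; nullok is not modelled, and two details are assumptions: a bare qualifier is treated as an error when missing is false, and "" gives the default name even without nulldefault.

```c
#include <stddef.h>

/* Hypothetical flags named after the ACD attributes; not EMBOSS code. */
typedef struct {
    int nulldefault;  /* output defaults to NULL (no file) */
    int missing;      /* a bare -qualname with no value is accepted */
} OutAttrs;

/* How the qualifier appeared, after command-line parsing. */
typedef enum {
    Q_ABSENT,  /* qualifier not given */
    Q_BARE,    /* -qualname last, or followed by another qualifier */
    Q_VALUE    /* -qualname=value or -qualname value (value may be "") */
} QualUse;

/* Return the filename to create, or NULL for no output file.
 * *err is set to 1 for a bare qualifier when "missing" is not set. */
static const char *resolve_outfile(const OutAttrs *attr, QualUse use,
                                   const char *value,
                                   const char *defaultname, int *err)
{
    *err = 0;
    switch (use) {
    case Q_ABSENT:
        return attr->nulldefault ? NULL : defaultname;
    case Q_BARE:
        if (attr->missing)
            return defaultname;
        *err = 1;  /* assumption: otherwise a value is required */
        return NULL;
    case Q_VALUE:
    default:
        /* An empty value asks for the expected default filename. */
        if (value == NULL || value[0] == '\0')
            return defaultname;
        return value;
    }
}
```

As the description notes, the parsing step is where the caveat bites: a bare -qualname followed by a parameter would be parsed as Q_VALUE with that parameter as the filename.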

I have started to change "string" datatypes to other datatypes if 
appropriate (for example to directory or datafile). There will be more 
of these.

Peter


From d.counsell at hgmp.mrc.ac.uk  Mon Dec  8 10:12:09 2003
From: d.counsell at hgmp.mrc.ac.uk (Damian Counsell)
Date: Mon, 8 Dec 2003 15:12:09 +0000
Subject: Report format
In-Reply-To: <3FD46477.1030602@ebi.ac.uk>
References: <200312051208.52153.henrikki.almusa@helsinki.fi> <3FD069FB.7010600@ebi.ac.uk> <20031208112815.GB5099@dev4.hgmp.mrc.ac.uk> <3FD46477.1030602@ebi.ac.uk>
Message-ID: <20031208151209.GD5099@dev4.hgmp.mrc.ac.uk>

Peter!


* Peter Rice <pmr at ebi.ac.uk> [031208 11:49]:
> Damian Counsell wrote:
> >This may be a completely stupid suggestion, but, if you don't ask...
> >
> >Could we have decimal tabs, please?  You know: ones smart enough to
> >align themselves by the position of the floating point, even in the
> >absence of specifying the number of digits before and after it?
> 
> Hmmmm ...
> 
> What really happens to these data types is that the values are all 
> strings written by the calling program and stored as tag=value pairs in 
> an internal feature table.
> 
> On output, the (string) tag value is reported.
> 
> "str" tags are left justified
> Anything else is right justified.
> 
> We could try, for floats, reading the value in and rewriting it in a 
> fixed precision. A small overhead but maybe worth it.

Thanks for the explanation.  If no one else has any objections this
sounds fine to me.

> We could also try removing extra trailing zeroes in some cases.

I have no problem with trailing zeroes when there is consistent and
controllable length and precision in the output.  Leading zeroes are
another matter; the right number of leading spaces would be the ideal
solution for me of course.


all the best

Damian

-- 
MRC Rosalind Franklin Centre for Genomics Research 
Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SB, UK
phone: +44 (0)1223 494585        fax: +44 (0)1223 494512
email: d.counsell at hgmp.mrc.ac.uk Web: http://www.rfcgr.mrc.ac.uk/~dcounsel/


From gwilliam at hgmp.mrc.ac.uk  Fri Dec 12 08:59:18 2003
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Fri, 12 Dec 2003 13:59:18 +0000
Subject: CpG programs
Message-ID: <3FD9C9B6.BEEF7DC8@hgmp.mrc.ac.uk>

EMBOSS has several programs for finding CpG islands:

cpgreport	Reports all CpG rich regions
newcpgseek	Reports CpG rich regions
newcpgreport	Report CpG rich areas 

The documentation (originally supplied by the author) says that for all
practical purposes you should probably use newcpgreport.

There is probably a case for retiring some of these programs to the
'make check' section of the Makefile? (i.e. remove them from the standard
distribution unless explicitly compiled.)

Which of these, if any, do you use and why?

Regards,
Gary

-- 
Gary Williams
MRC Rosalind Franklin Centre for Genomics Research
Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK
Tel: +44 1223 494522			Fax: +44 1223 494512
E-mail: gwilliam at rfcgr.mrc.ac.uk	Web: http://www.rfcgr.mrc.ac.uk


From rls at ebi.ac.uk  Mon Dec 22 05:19:43 2003
From: rls at ebi.ac.uk (Rodrigo Lopez)
Date: Mon, 22 Dec 2003 10:19:43 -0000
Subject: CpG programs
In-Reply-To: <3FD9C9B6.BEEF7DC8@hgmp.mrc.ac.uk>
Message-ID: <GLEEIFGCKIHKAFEPMLLGOEPHCEAA.rls@ebi.ac.uk>

Hi,

Sorry for the late reply. I'm currently re-writing a small portion of the
code to speed things up together with a collaborator. As soon as this one is
tried and tested we will move to replace the old version of newcpgreport
with this one. As soon as that is done a name change from newcpgreport to
cpgreport will be requested and the old programs can be retired.

Thanks and Merry Xmas to all!!!!

R:)


> -----Original Message-----
> From: owner-emboss-dev at hgmp.mrc.ac.uk
> [mailto:owner-emboss-dev at hgmp.mrc.ac.uk]On Behalf Of Gary Williams, Tel
> 01223 494522
> Sent: 12 December 2003 13:59
> To: emboss-dev at embnet.org
> Subject: CpG programs
>
>
> EMBOSS has several programs for finding CpG islands:
>
> cpgreport	Reports all CpG rich regions
> newcpgseek	Reports CpG rich regions
> newcpgreport	Report CpG rich areas
>
> The documentation (originally supplied by the author) says that for all
> practical purposes you should probably use newcpgreport.
>
> There is probably a case for retiring some of these programs to the
> 'make check' section of the Makefile? (i.e. remove them from the standard
> distribution unless explicitly compiled.)
>
> Which of these, if any, do you use and why?
>
> Regards,
> Gary
>
> --
> Gary Williams
> MRC Rosalind Franklin Centre for Genomics Research
> Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK
> Tel: +44 1223 494522			Fax: +44 1223 494512
> E-mail: gwilliam at rfcgr.mrc.ac.uk	Web: http://www.rfcgr.mrc.ac.uk
>


From henrikki.almusa at helsinki.fi  Thu Dec  4 14:26:32 2003
From: henrikki.almusa at helsinki.fi (Henrikki Almusa)
Date: Thu, 4 Dec 2003 16:26:32 +0200
Subject: tfscan output conversion
Message-ID: <200312041626.32319.henrikki.almusa@helsinki.fi>

Hello,

I'm trying to convert tfscan to write report output (patch attached). This 
basicly should work, but it doesn't. So one problem and one test request.

Problem. For some reason this patch seems to make it hang if used more than 
one sequence. I've used -debug option and taken last 75 lines from debug to 
add as attachment (since whole thing is almost 1meg). I can't figure out what 
causes this.

Test request. Since there is now the binding factor informtion added to this 
i've put that into tail of report. However i don't get these using tfscan 
here, so i'd like someone to see what it looks like with them.

Thanks,
-- 
Henrikki Almusa
-------------- next part --------------
  testing *name
  found [1] 'id'
reportWriteSeqTable jwid 10 jmin 6 tagval 'MOUSE$A21COL_02     '
reportWriteSeqTable subseq 6 seq 840 28..33
ajFeatGetNote 'acc'
  try /note="*id HS$APOE_08          "
  testing *name
  try /note="*acc R00149"
  testing *name
  found [1] 'acc'
reportWriteSeqTable jwid 9 jmin 6 tagval 'R00149'
ajFeatGetNote 'id'
  try /note="*id HS$APOE_08          "
  testing *name
  found [1] 'id'
reportWriteSeqTable jwid 10 jmin 6 tagval 'HS$APOE_08          '
reportWriteSeqTable subseq 5 seq 840 374..378
ajFeatGetNote 'acc'
  try /note="*id HS$ALBU_03          "
  testing *name
  try /note="*acc R00079"
  testing *name
  found [1] 'acc'
reportWriteSeqTable jwid 9 jmin 6 tagval 'R00079'
ajFeatGetNote 'id'
  try /note="*id HS$ALBU_03          "
  testing *name
  found [1] 'id'
reportWriteSeqTable jwid 10 jmin 6 tagval 'HS$ALBU_03          '
ajStrCut 0 0 len: 1 ibegin: 0 iend: 1
ajStrCut 0 0 len: 3 ibegin: 0 iend: 1
ajFeattableDel 80723c0
ajSeqRead: input file 'mRNA.small.twice' still there, try again
ajFeattableDel 0
seqRead: cleared
seqRead: seqin format 10 'fasta'
seqRead: one format specified
ajFileBuffNobuff mRNA.small.twice buffsize: 15
++seqRead known format 10
++seqReadFmt format 10 (fasta) 'mRNA.small.twice' feat No
ajSeqParseNcbi '>Exon_10_head_2 (copied for testing two seqs) 
'
trying ajSeqParseFasta
ajSeqParseFasta '>Exon_10_head_2 (copied for testing two seqs) 
'
result id: 'Exon_10_head_2' acc: '' desc: '(copied for testing two seqs) '
parsed id 'Exon_10_head_2' acc '' sv '' gi '' desc '(copied for testing two seqs) '
seqSetName 'Exon_10_head_2' result: 'Exon_10_head_2'
at EOF: File already read to end mRNA.small.twice
End of file - data in buffer - return ajFalse
ajFileBuffClear (0) Nobuff: Yes
 first: 15 thys->Pos: 15 thys->Size: 15 thys->Nobuff: Yes
ajFileBuffClear 'mRNA.small.twice' (0 lines)
     Y size: 15 pos: 15 removed 15 lines add to free: 0
seqReadFmt success with format 10 (fasta)
seqQueryMatch 'Exon_10_head_2' id '' acc '' Sv '' Des '' Key '' Org ''
No accession number to test
No taxonomy to test
No keyword to test
No description to test
testing sequence 'Exon_10_head_2' type 'DNA' IsNuc No IsProt No
ajSeqTypeCheckIn type 'dna' found (DNA sequence)
Remove all gaps
ajSeqIsNuc Type ''
seqTypeGapnucS test
Convert '?XUu' to 'NNTt'
ajSeqRead: open buffer  usa: 'mRNA.small.twice' returns: Yes
++keep restored 0..0 (N) 'fasta' 10
ajSeqRead: thys->Db '', seqin->Db ''
ajSeqRead: thys->Name 'Exon_10_head_2'
ajSeqRead: thys->Entryname 'Exon_10_head_2', seqin->Entryname ''
ajSeqRead: thys->Name 'Exon_10_head_2'
ajSeqSetRange (len: 840 0..0 old 0..0)
      result: (len: 840 0..0)
ajSeqallNext success
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tfscan_report.patch
Type: text/x-diff
Size: 5759 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/emboss-dev/attachments/20031204/bda0a97e/attachment-0001.bin>

From henrikki.almusa at helsinki.fi  Fri Dec  5 10:08:52 2003
From: henrikki.almusa at helsinki.fi (Henrikki Almusa)
Date: Fri, 5 Dec 2003 12:08:52 +0200
Subject: Report format
Message-ID: <200312051208.52153.henrikki.almusa@helsinki.fi>

Hello,

Small question about report format. Since one can add tag in style 
'type:value=text_in_file'. Is there some way to give spaces in "text_in_file" 
or give sort of "%-xS" type of syntax to make sure atleast x sized are is 
used?

Thanks,
-- 
Henrikki Almusa


From pmr at ebi.ac.uk  Fri Dec  5 11:20:27 2003
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 05 Dec 2003 11:20:27 +0000
Subject: Report format
In-Reply-To: <200312051208.52153.henrikki.almusa@helsinki.fi>
References: <200312051208.52153.henrikki.almusa@helsinki.fi>
Message-ID: <3FD069FB.7010600@ebi.ac.uk>

Henrikki Almusa wrote:
> Hello,
> 
> Small question about report format. Since one can add tag in style 
> 'type:value=text_in_file'. Is there some way to give spaces in "text_in_file" 
> or give sort of "%-xS" type of syntax to make sure atleast x sized are is 
> used?

Good idea!

The code has widths for each column already.

We need a syntax to give:

1. minimum column width
2. maximum column width (for example, for sequence data that can be very 
long)

Perhaps type:value%n.n=columnheading

More questions:

Do we need more column types?

Do we need a way to define the "standard" tags - to change them or to 
exclude them (excluding could be done with qualifiers, changing needs a 
tag syntax)

regards,

Peter


From henrikki.almusa at helsinki.fi  Fri Dec  5 12:04:36 2003
From: henrikki.almusa at helsinki.fi (Henrikki Almusa)
Date: Fri, 5 Dec 2003 14:04:36 +0200
Subject: Report format
In-Reply-To: <3FD069FB.7010600@ebi.ac.uk>
References: <200312051208.52153.henrikki.almusa@helsinki.fi> <3FD069FB.7010600@ebi.ac.uk>
Message-ID: <200312051404.36982.henrikki.almusa@helsinki.fi>

On Friday 05 December 2003 13:20, Peter Rice wrote:
> The code has widths for each column already.
>
> We need a syntax to give:
>
> 1. minimum column width
> 2. maximum column width (for example, for sequence data that can be very
> long)
>
> Perhaps type:value%n.n=columnheading

Might be good to make it possible to add space within there. Eg. column 
heading. Possible ways could be either allow type:value%n.n='column heading' 
or add tag_delim to report, which defaults on space or whitespace.  

> More questions:
>
> Do we need more column types?

Perhaps name/id tag. Can't figure out much else.

> Do we need a way to define the "standard" tags - to change them or to
> exclude them (excluding could be done with qualifiers, changing needs a
> tag syntax)

This might help. For example 'cusp' could be adapted if the sequence and start 
and end points could be dropped. Perhaps this can be done in some report 
format, would need to check.

I think there is those '-noscore' etc which do similar thing. Of course all 
"standard" tags then need to be named and added to there.

-- 
Henrikki Almusa


From henrikki.almusa at helsinki.fi  Fri Dec  5 12:11:53 2003
From: henrikki.almusa at helsinki.fi (Henrikki Almusa)
Date: Fri, 5 Dec 2003 14:11:53 +0200
Subject: Report format
In-Reply-To: <3FD069FB.7010600@ebi.ac.uk>
References: <200312051208.52153.henrikki.almusa@helsinki.fi> <3FD069FB.7010600@ebi.ac.uk>
Message-ID: <200312051411.53703.henrikki.almusa@helsinki.fi>

Forgot to add to previous mail.

One thing that would be nice to do with report format is sorting. Would it be 
hard to try to create a possibility to sort the feature table 
accending/decending order other than how its built?

Eg sort by sequence, then start point?

-- 
Henrikki Almusa


From pmr at ebi.ac.uk  Fri Dec  5 12:38:29 2003
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 05 Dec 2003 12:38:29 +0000
Subject: Report format
In-Reply-To: <200312051404.36982.henrikki.almusa@helsinki.fi>
References: <200312051208.52153.henrikki.almusa@helsinki.fi> <3FD069FB.7010600@ebi.ac.uk> <200312051404.36982.henrikki.almusa@helsinki.fi>
Message-ID: <3FD07C45.2070703@ebi.ac.uk>

Henrikki Almusa wrote:

>>Perhaps type:value%n.n=columnheading
> 
> It might be good to make it possible to include spaces there, e.g. for a column 
> heading. Possible ways would be either to allow type:value%n.n='column heading' 
> or to add a tag_delim to report, which defaults to space or whitespace.  

Spaces are tricky - for parsers that read the output.

I prefer column_heading (or ColumnHeading)

Any other comments on this?
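A rough sketch of parsing the proposed type:value%n.n=columnheading syntax in C. The `TagSpec` struct and `parse_tagspec` function are hypothetical; the %min.max and =heading parts are treated as optional, and headings containing spaces are rejected, in line with the column_heading preference:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of "type:name%min.max=heading",
   e.g. "float:score%6.8=Score_value". */
typedef struct {
    char type[32];
    char name[64];
    int  minwidth;     /* 0 if not given */
    int  maxwidth;     /* 0 if not given (no limit) */
    char heading[64];  /* defaults to the tag name */
} TagSpec;

/* Returns 1 on success, 0 if the specification is malformed. */
int parse_tagspec(const char *spec, TagSpec *out)
{
    const char *colon = strchr(spec, ':');
    const char *p;
    size_t len;

    memset(out, 0, sizeof *out);
    if (!colon || colon == spec || (size_t)(colon - spec) >= sizeof out->type)
        return 0;
    memcpy(out->type, spec, (size_t)(colon - spec));

    /* The tag name runs up to '%', '=' or the end of the string. */
    p = colon + 1;
    len = strcspn(p, "%=");
    if (len == 0 || len >= sizeof out->name)
        return 0;
    memcpy(out->name, p, len);
    p += len;

    /* Optional %min.max widths; either number may be omitted. */
    if (*p == '%') {
        char *endp;
        out->minwidth = (int)strtol(p + 1, &endp, 10);
        if (*endp == '.')
            out->maxwidth = (int)strtol(endp + 1, &endp, 10);
        p = endp;
    }

    /* Optional =heading; spaces are rejected to keep output parseable. */
    if (*p == '=') {
        p++;
        if (*p == '\0' || strchr(p, ' ') || strlen(p) >= sizeof out->heading)
            return 0;
        strcpy(out->heading, p);
    } else if (*p == '\0') {
        strcpy(out->heading, out->name);
    } else {
        return 0;   /* unexpected trailing characters */
    }
    return 1;
}
```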

>>Do we need more column types?
> 
> Perhaps a name/id tag. I can't think of much else.

Are they different to "str"?

I noticed in checking 2.8.0 that "rstr" works as a right-justified 
string (str is left-justified; anything else is right-justified). 
Perhaps we should strictly check the tag types (anything is allowed in 
2.8.0!)
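A sketch of what strict type checking plus the existing justification rule could look like. The list of accepted types here (str, int, float) is an assumption for illustration, not the definitive EMBOSS set; the minimum and maximum column widths map directly onto printf's field width and precision for strings:

```c
#include <stdio.h>
#include <string.h>

/* Assumed set of valid tag types, for illustration only. */
static const char *const known_types[] = { "str", "int", "float", NULL };

/* Returns 1 if 'type' is in the accepted list, 0 otherwise. */
int tagtype_is_valid(const char *type)
{
    size_t i;

    for (i = 0; known_types[i] != NULL; i++)
        if (strcmp(type, known_types[i]) == 0)
            return 1;
    return 0;
}

/* Formats one cell into buf: padded to at least minwidth, truncated to
   maxwidth characters when maxwidth > 0. "str" is left-justified and
   every other valid type right-justified. Returns -1 for an unknown type. */
int format_cell(char *buf, size_t size, const char *type, const char *value,
                int minwidth, int maxwidth)
{
    int prec = (maxwidth > 0) ? maxwidth : (int)strlen(value);

    if (!tagtype_is_valid(type))
        return -1;
    if (strcmp(type, "str") == 0)
        return snprintf(buf, size, "%-*.*s", minwidth, prec, value);
    return snprintf(buf, size, "%*.*s", minwidth, prec, value);
}
```

With a check like this, "rstr" would be rejected rather than silently right-justified.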

>>Do we need a way to define the "standard" tags - to change them or to
>>exclude them (excluding could be done with qualifiers, changing needs a
>>tag syntax)

> This might help. For example, 'cusp' could be adapted if the sequence and 
> the start and end points could be dropped. Perhaps this can be done in some 
> report format; I would need to check.

Will do. We can put the -norstart and -norend qualifiers (see below) 
into the ACD report definition.

> I think there are those '-noscore' qualifiers etc. which do a similar thing. 
> Of course, all "standard" tags would then need to be named and added there.

Yes, will do. I have to check for any report formats that may look strange 
without specific tags (they can, of course, ignore the qualifier).

regards,

Peter


From henrikki.almusa at helsinki.fi  Fri Dec  5 13:46:06 2003
From: henrikki.almusa at helsinki.fi (Henrikki Almusa)
Date: Fri, 5 Dec 2003 15:46:06 +0200
Subject: Report format
In-Reply-To: <3FD07C45.2070703@ebi.ac.uk>
References: <200312051208.52153.henrikki.almusa@helsinki.fi> <200312051404.36982.henrikki.almusa@helsinki.fi> <3FD07C45.2070703@ebi.ac.uk>
Message-ID: <200312051546.06872.henrikki.almusa@helsinki.fi>

On Friday 05 December 2003 14:38, Peter Rice wrote:
> Henrikki Almusa wrote:
> >>Perhaps type:value%n.n=columnheading
> >
> > It might be good to make it possible to include spaces there, e.g. for a
> > column heading. Possible ways would be either to allow
> > type:value%n.n='column heading' or to add a tag_delim to report, which
> > defaults to space or whitespace.
>
> Spaces are tricky - for parsers that read the output.
>
> I prefer column_heading (or ColumnHeading)
>
> Any other comments on this?

I just don't like ThisTypeOfCapitalisation much. But I can live with it, no problem.

> >>Do we need more column types?
> >
> > Perhaps a name/id tag. I can't think of much else.
>
> Are they different to "str"?

Ah, right, I misunderstood it then, and you're right: no different.

> I noticed in checking 2.8.0 that "rstr" works as a right-justified
> string (str is left-justified; anything else is right-justified).
> Perhaps we should strictly check the tag types (anything is allowed in
> 2.8.0!)

And perhaps put some documentation on the web :). I could find very little 
information on what can be done with reports. 

-- 
Henrikki Almusa


From d.counsell at hgmp.mrc.ac.uk  Mon Dec  8 11:28:15 2003
From: d.counsell at hgmp.mrc.ac.uk (Damian Counsell)
Date: Mon, 8 Dec 2003 11:28:15 +0000
Subject: Report format
In-Reply-To: <3FD069FB.7010600@ebi.ac.uk>
References: <200312051208.52153.henrikki.almusa@helsinki.fi> <3FD069FB.7010600@ebi.ac.uk>
Message-ID: <20031208112815.GB5099@dev4.hgmp.mrc.ac.uk>

* Peter Rice <pmr at ebi.ac.uk> [031205 11:23]:
> Henrikki Almusa wrote:
> >Hello,
> >
> >Small question about the report format. Since one can add a tag in the style 
> >'type:value=text_in_file', is there some way to give spaces in 
> >"text_in_file", or to give a "%-xS" type of syntax to make sure at least 
> >an x-sized area is used?
> 
> Good idea!
> 
> The code has widths for each column already.
> 
> We need a syntax to give:
> 
> 1. minimum column width
> 2. maximum column width (for example, for sequence data that can be very 
> long)
> 
> Perhaps type:value%n.n=columnheading
> 
> More questions:
> 
> Do we need more column types?

This may be a completely stupid suggestion, but, if you don't ask...

Could we have decimal tabs, please?  You know: ones smart enough to
align themselves by the position of the floating point, even in the
absence of specifying the number of digits before and after it?
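One way a decimal tab could work on values that are already strings: find the widest integer part and the widest fractional part in the column, then pad each value with leading and trailing spaces so the points line up. A minimal sketch, assuming '.' is the decimal separator; the function names are hypothetical:

```c
#include <stdio.h>
#include <string.h>

/* Characters before the decimal point (whole string if there is none). */
static size_t int_part_len(const char *s)
{
    const char *dot = strchr(s, '.');
    return dot ? (size_t)(dot - s) : strlen(s);
}

/* Characters from the decimal point onwards (0 if there is none). */
static size_t frac_part_len(const char *s)
{
    const char *dot = strchr(s, '.');
    return dot ? strlen(dot) : 0;
}

/* Scan a column of values for its widest integer and fractional parts. */
void decimal_widths(const char **vals, size_t n, size_t *maxint, size_t *maxfrac)
{
    size_t i;

    *maxint = 0;
    *maxfrac = 0;
    for (i = 0; i < n; i++) {
        if (int_part_len(vals[i]) > *maxint)
            *maxint = int_part_len(vals[i]);
        if (frac_part_len(vals[i]) > *maxfrac)
            *maxfrac = frac_part_len(vals[i]);
    }
}

/* Pad one value with leading and trailing spaces so its decimal point
   lands in the same column as every other value. */
int format_decimal_aligned(char *buf, size_t size, const char *val,
                           size_t maxint, size_t maxfrac)
{
    return snprintf(buf, size, "%*s%s%*s",
                    (int)(maxint - int_part_len(val)), "",
                    val,
                    (int)(maxfrac - frac_part_len(val)), "");
}
```

For the column "3.5", "120.25", "7" this gives "  3.5 ", "120.25" and "  7   ", with every point (or implied point) in the fourth character. The padding uses spaces only, so no leading zeroes are introduced.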


all the best

Damian


-- 
MRC Rosalind Franklin Centre for Genomics Research 
Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SB, UK
phone: +44 (0)1223 494585        fax: +44 (0)1223 494512
email: d.counsell at hgmp.mrc.ac.uk Web: http://www.rfcgr.mrc.ac.uk/~dcounsel/


From pmr at ebi.ac.uk  Mon Dec  8 11:45:59 2003
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 08 Dec 2003 11:45:59 +0000
Subject: Report format
In-Reply-To: <20031208112815.GB5099@dev4.hgmp.mrc.ac.uk>
References: <200312051208.52153.henrikki.almusa@helsinki.fi> <3FD069FB.7010600@ebi.ac.uk> <20031208112815.GB5099@dev4.hgmp.mrc.ac.uk>
Message-ID: <3FD46477.1030602@ebi.ac.uk>

Damian Counsell wrote:
> This may be a completely stupid suggestion, but, if you don't ask...
> 
> Could we have decimal tabs, please?  You know: ones smart enough to
> align themselves by the position of the floating point, even in the
> absence of specifying the number of digits before and after it?

Hmmmm ...

What really happens to these data types is that the values are all 
strings written by the calling program and stored as tag=value pairs in 
an internal feature table.

On output, the (string) tag value is reported.

"str" tags are left justified
Anything else is right justified.

We could try, for floats, reading the value in and rewriting it in a 
fixed precision. A small overhead but maybe worth it.
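A minimal sketch of the fixed-precision idea: parse the stored string with `strtod` and write it back with `%.*f`. `rewrite_fixed` is a hypothetical helper; it reports non-numeric values to the caller rather than guessing, so the original text can be kept:

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Re-read a float tag value stored as a string and rewrite it with a fixed
   number of decimal places. Returns 0 on success, or -1 if the value is not
   a plain number (or does not fit in buf), so the caller can keep the
   original text. */
int rewrite_fixed(const char *value, int precision, char *buf, size_t size)
{
    char *end;
    double d;
    int n;

    errno = 0;
    d = strtod(value, &end);
    if (end == value || errno == ERANGE)
        return -1;
    end += strspn(end, " \t");   /* tolerate trailing padding */
    if (*end != '\0')
        return -1;
    n = snprintf(buf, size, "%.*f", precision, d);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```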

We could also try removing extra trailing zeroes in some cases.
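Trimming trailing zeroes could be a small in-place pass over the fixed-precision string. In this hypothetical helper, `keep` sets how many fractional digits must survive, and values with an exponent are left untouched:

```c
#include <string.h>

/* Remove trailing zeroes after the decimal point, keeping at least 'keep'
   fractional digits, and drop the point itself if no digits remain.
   Values without a '.' or with an exponent are left unchanged. */
void trim_trailing_zeros(char *s, size_t keep)
{
    char *dot = strchr(s, '.');
    size_t len, minlen;

    if (dot == NULL || strpbrk(s, "eE") != NULL)
        return;
    len = strlen(s);
    minlen = (size_t)(dot - s) + 1 + keep;   /* integer part, '.', kept digits */
    while (len > minlen && s[len - 1] == '0')
        len--;
    if (len == (size_t)(dot - s) + 1)        /* only the '.' is left */
        len--;
    s[len] = '\0';
}
```

Passing a non-zero `keep` (e.g. the column's precision) leaves the trailing zeroes in place, which keeps the output length consistent.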

Comments?

Peter


From henrikki.almusa at helsinki.fi  Mon Dec  8 14:24:42 2003
From: henrikki.almusa at helsinki.fi (Henrikki Almusa)
Date: Mon, 8 Dec 2003 16:24:42 +0200
Subject: tfscan output conversion
In-Reply-To: <200312041626.32319.henrikki.almusa@helsinki.fi>
References: <200312041626.32319.henrikki.almusa@helsinki.fi>
Message-ID: <200312081624.42660.henrikki.almusa@helsinki.fi>

Hello,

This patch should now print the info properly into the tail. I would still 
like someone to confirm that, though. 

But the problem still remains. Does anyone know why the while loop fails?

-- 
Henrikki Almusa
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tfscan_report.patch
Type: text/x-diff
Size: 6413 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/emboss-dev/attachments/20031208/00108002/attachment-0001.bin>
-------------- next part --------------
ajFeatGetNote 'id'
  try /note="*id MOUSE$A21COL_02     "
  testing *name
  found [1] 'id'
reportWriteSeqTable jwid 10 jmin 6 tagval 'MOUSE$A21COL_02     '
reportWriteSeqTable subseq 6 seq 840 28..33
ajFeatGetNote 'acc'
  try /note="*id HS$APOE_08          "
  testing *name
  try /note="*acc R00149"
  testing *name
  found [1] 'acc'
reportWriteSeqTable jwid 9 jmin 6 tagval 'R00149'
ajFeatGetNote 'id'
  try /note="*id HS$APOE_08          "
  testing *name
  found [1] 'id'
reportWriteSeqTable jwid 10 jmin 6 tagval 'HS$APOE_08          '
reportWriteSeqTable subseq 5 seq 840 374..378
ajFeatGetNote 'acc'
  try /note="*id HS$ALBU_03          "
  testing *name
  try /note="*acc R00079"
  testing *name
  found [1] 'acc'
reportWriteSeqTable jwid 9 jmin 6 tagval 'R00079'
ajFeatGetNote 'id'
  try /note="*id HS$ALBU_03          "
  testing *name
  found [1] 'id'
reportWriteSeqTable jwid 10 jmin 6 tagval 'HS$ALBU_03          '
ajFeattableDel 80723d8
ajSeqRead: input file 'mRNA.small.twice' still there, try again
ajFeattableDel 0
seqRead: cleared
seqRead: seqin format 10 'fasta'
seqRead: one format specified
ajFileBuffNobuff mRNA.small.twice buffsize: 15
++seqRead known format 10
++seqReadFmt format 10 (fasta) 'mRNA.small.twice' feat No
ajSeqParseNcbi '>Exon_10_head_2 (copied for testing two seqs) 
'
trying ajSeqParseFasta
ajSeqParseFasta '>Exon_10_head_2 (copied for testing two seqs) 
'
result id: 'Exon_10_head_2' acc: '' desc: '(copied for testing two seqs) '
parsed id 'Exon_10_head_2' acc '' sv '' gi '' desc '(copied for testing two seqs) '
seqSetName 'Exon_10_head_2' result: 'Exon_10_head_2'
at EOF: File already read to end mRNA.small.twice
End of file - data in buffer - return ajFalse
ajFileBuffClear (0) Nobuff: Yes
 first: 15 thys->Pos: 15 thys->Size: 15 thys->Nobuff: Yes
ajFileBuffClear 'mRNA.small.twice' (0 lines)
     Y size: 15 pos: 15 removed 15 lines add to free: 0
seqReadFmt success with format 10 (fasta)
seqQueryMatch 'Exon_10_head_2' id '' acc '' Sv '' Des '' Key '' Org ''
No accession number to test
No taxonomy to test
No keyword to test
No description to test
testing sequence 'Exon_10_head_2' type 'DNA' IsNuc No IsProt No
ajSeqTypeCheckIn type 'dna' found (DNA sequence)
Remove all gaps
ajSeqIsNuc Type ''
seqTypeGapnucS test
Convert '?XUu' to 'NNTt'
ajSeqRead: open buffer  usa: 'mRNA.small.twice' returns: Yes
++keep restored 0..0 (N) 'fasta' 10
ajSeqRead: thys->Db '', seqin->Db ''
ajSeqRead: thys->Name 'Exon_10_head_2'
ajSeqRead: thys->Entryname 'Exon_10_head_2', seqin->Entryname ''
ajSeqRead: thys->Name 'Exon_10_head_2'
ajSeqSetRange (len: 840 0..0 old 0..0)
      result: (len: 840 0..0)
ajSeqallNext success

From pmr at ebi.ac.uk  Mon Dec  8 15:01:57 2003
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 08 Dec 2003 15:01:57 +0000
Subject: ACD changes for 2.9.0
Message-ID: <3FD49265.3030205@ebi.ac.uk>

Just committed some new ACD validations (in acdvalid). Interface 
developers will need to look for them in 2.9.0.

New section "additional" for qualifiers with additional:"Y" defined.
I would suggest treating this in the same way as "advanced" (for many 
programs it needed only a rename of the advanced section).

New ACD type "toggle" - this is the same as "boolean" and will be used 
for those boolean values that are only used to control other ACD 
qualifiers (-plot for example). acdvalid will allow these toggles in 
other sections, and will eventually (not yet) check for them in calculated 
values. Boolean values will be expected to appear in the required, 
additional or advanced sections (but can be in the input or output 
sections without problem, as before).

Input and output datatypes now must appear in the input and output 
sections. matrix, datafile and cfile datatypes have been relocated.

The application name in the ACD file must match the true application 
name. This is only checked by acdvalid so far to avoid breaking 
third-party ACD files.

Output outfile, align, report, etc. have new attributes:

nullok - if true, can return a NULL value

nulldefault - if true, defaults to a NULL value. Setting a filename on 
the command line overrides and creates an output file. Setting to "" on 
the commandline ***creates the expected default filename***

missing - if true, the expected filename can be created by simply using 
-qualname on the command line (rather than -qualname="") if it is last on 
the command line or followed by another qualifier (if it is followed by a 
parameter, that parameter will appear to be the filename value)
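A sketch of the output-qualifier behaviour described above, written as plain C rather than the real ACD code: the `QualState` enum, `classify_next` and `resolve_output` are hypothetical. The "" case without nulldefault, and treating a bare qualifier without missing as an error, are assumptions, since the rules above only describe the nulldefault and missing cases:

```c
#include <stddef.h>

/* Hypothetical states for an output qualifier such as -outfile. */
typedef enum {
    QUAL_ABSENT,    /* qualifier not given at all */
    QUAL_NO_VALUE,  /* "-outfile" last, or followed by another qualifier */
    QUAL_VALUE      /* a value was given, possibly "" */
} QualState;

/* Decide the state of "-qual" at argv[i] when no "=value" is attached:
   the next word is taken as the value unless it is missing or is itself a
   qualifier, so a following parameter is read as the filename. */
QualState classify_next(int argc, char **argv, int i, const char **value)
{
    if (i + 1 >= argc || argv[i + 1][0] == '-') {
        *value = NULL;
        return QUAL_NO_VALUE;
    }
    *value = argv[i + 1];
    return QUAL_VALUE;
}

/* Returns the filename to create, or NULL for "no output file".
   *error is set when the qualifier needed a value but had none
   (an assumption for the case where 'missing' is false). */
const char *resolve_output(QualState state, const char *value,
                           const char *default_name,
                           int nulldefault, int missing, int *error)
{
    *error = 0;
    switch (state) {
    case QUAL_ABSENT:
        /* nulldefault: no file unless one is asked for */
        return nulldefault ? NULL : default_name;
    case QUAL_NO_VALUE:
        if (missing)
            return default_name;   /* same as -qual="" */
        *error = 1;
        return NULL;
    case QUAL_VALUE:
        /* "" creates the expected default filename (stated for
           nulldefault; assumed to behave the same otherwise) */
        return (value[0] == '\0') ? default_name : value;
    }
    *error = 1;
    return NULL;
}
```

In `classify_next`, a word such as "seqfile" directly after the qualifier is returned as its value, which is the parameter pitfall described for missing.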

I have started to change "string" datatypes to other datatypes where 
appropriate (for example to directory or datafile). There will be more 
of these.

Peter


From d.counsell at hgmp.mrc.ac.uk  Mon Dec  8 15:12:09 2003
From: d.counsell at hgmp.mrc.ac.uk (Damian Counsell)
Date: Mon, 8 Dec 2003 15:12:09 +0000
Subject: Report format
In-Reply-To: <3FD46477.1030602@ebi.ac.uk>
References: <200312051208.52153.henrikki.almusa@helsinki.fi> <3FD069FB.7010600@ebi.ac.uk> <20031208112815.GB5099@dev4.hgmp.mrc.ac.uk> <3FD46477.1030602@ebi.ac.uk>
Message-ID: <20031208151209.GD5099@dev4.hgmp.mrc.ac.uk>

Peter!


* Peter Rice <pmr at ebi.ac.uk> [031208 11:49]:
> Damian Counsell wrote:
> >This may be a completely stupid suggestion, but, if you don't ask...
> >
> >Could we have decimal tabs, please?  You know: ones smart enough to
> >align themselves by the position of the floating point, even in the
> >absence of specifying the number of digits before and after it?
> 
> Hmmmm ...
> 
> What really happens to these data types is that the values are all 
> strings written by the calling program and stored as tag=value pairs in 
> an internal feature table.
> 
> On output, the (string) tag value is reported.
> 
> "str" tags are left justified
> Anything else is right justified.
> 
> We could try, for floats, reading the value in and rewriting it in a 
> fixed precision. A small overhead but maybe worth it.

Thanks for the explanation.  If no one else has any objections this
sounds fine to me.

> We could also try removing extra trailing zeroes in some cases.

I have no problem with trailing zeroes when there is consistent and
controllable length and precision in the output.  Leading zeroes are
another matter; the right number of leading spaces would be the ideal
solution for me of course.


all the best

Damian

-- 
MRC Rosalind Franklin Centre for Genomics Research 
Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SB, UK
phone: +44 (0)1223 494585        fax: +44 (0)1223 494512
email: d.counsell at hgmp.mrc.ac.uk Web: http://www.rfcgr.mrc.ac.uk/~dcounsel/


From gwilliam at hgmp.mrc.ac.uk  Fri Dec 12 13:59:18 2003
From: gwilliam at hgmp.mrc.ac.uk (Gary Williams, Tel 01223 494522)
Date: Fri, 12 Dec 2003 13:59:18 +0000
Subject: CpG programs
Message-ID: <3FD9C9B6.BEEF7DC8@hgmp.mrc.ac.uk>

EMBOSS has several programs for finding CpG islands:

cpgreport	Reports all CpG rich regions
newcpgseek	Reports CpG rich regions
newcpgreport	Report CpG rich areas 

The documentation (originally supplied by the author) says that for all
practical purposes you should probably use newcpgreport.

There is probably a case for retiring some of these programs to the
'make check' section of the Makefile (i.e. removing them from the standard
distribution unless explicitly compiled).

Which of these, if any, do you use and why?

Regards,
Gary

-- 
Gary Williams
MRC Rosalind Franklin Centre for Genomics Research
Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK
Tel: +44 1223 494522			Fax: +44 1223 494512
E-mail: gwilliam at rfcgr.mrc.ac.uk	Web: http://www.rfcgr.mrc.ac.uk


From rls at ebi.ac.uk  Mon Dec 22 10:19:43 2003
From: rls at ebi.ac.uk (Rodrigo Lopez)
Date: Mon, 22 Dec 2003 10:19:43 -0000
Subject: CpG programs
In-Reply-To: <3FD9C9B6.BEEF7DC8@hgmp.mrc.ac.uk>
Message-ID: <GLEEIFGCKIHKAFEPMLLGOEPHCEAA.rls@ebi.ac.uk>

Hi,

Sorry for the late reply. Together with a collaborator, I'm currently
rewriting a small portion of the code to speed things up. Once the new
version is tried and tested, we will replace the old version of newcpgreport
with it. After that, a name change from newcpgreport to cpgreport will be
requested and the old programs can be retired.

Thanks and Merry Xmas to all!!!!

R:)


> -----Original Message-----
> From: owner-emboss-dev at hgmp.mrc.ac.uk
> [mailto:owner-emboss-dev at hgmp.mrc.ac.uk]On Behalf Of Gary Williams, Tel
> 01223 494522
> Sent: 12 December 2003 13:59
> To: emboss-dev at embnet.org
> Subject: CpG programs
>
>
> EMBOSS has several programs for finding CpG islands:
>
> cpgreport	Reports all CpG rich regions
> newcpgseek	Reports CpG rich regions
> newcpgreport	Report CpG rich areas
>
> The documentation (originally supplied by the author) says that for all
> practical purposes you should probably use newcpgreport.
>
> There is probably a case for retiring some of these programs to the
> 'make check' section of the Makefile? (i.e remove them from the standard
> distribution unless explicitly compiled.)
>
> Which of these, if any, do you use and why?
>
> Regards,
> Gary
>
> --
> Gary Williams
> MRC Rosalind Franklin Centre for Genomics Research
> Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK
> Tel: +44 1223 494522			Fax: +44 1223 494512
> E-mail: gwilliam at rfcgr.mrc.ac.uk	Web: http://www.rfcgr.mrc.ac.uk
>