From charles-listes-emboss at plessy.org  Sat Jan 10 00:29:46 2009
From: charles-listes-emboss at plessy.org (Charles Plessy)
Date: Sat, 10 Jan 2009 14:29:46 +0900
Subject: [EMBOSS] Please update the patch
	in	ftp://emboss.open-bio.org/pub/EMBOSS/fixes/patches/
Message-ID: <20090110052946.GA3077@kunpuu.plessy.org>

Dear EMBOSS developers,

I am using the patches in ftp://emboss.open-bio.org/pub/EMBOSS/fixes/patches/
to produce up-to-date Debian packages. I noticed that there are fixes in the
parent directory that are not present in the patch. Could you update it?

Have a nice day,

-- 
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan

From ajb at ebi.ac.uk  Sat Jan 10 06:17:36 2009
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Sat, 10 Jan 2009 11:17:36 -0000 (GMT)
Subject: [EMBOSS] Please update the patch
 in	ftp://emboss.open-bio.org/pub/EMBOSS/fixes/patches/
In-Reply-To: <20090110052946.GA3077@kunpuu.plessy.org>
References: <20090110052946.GA3077@kunpuu.plessy.org>
Message-ID: <50394.86.9.126.186.1231586256.squirrel@webmail.ebi.ac.uk>

Hello Charles,

The patch file is there now (a casualty of the holidays).
It also corrects the copying of 4 data files to the installation
directories (affecting featcopy, infobase, inforesidue and trimspace).
Those Makefile changes are a little tricky to represent
in the 'fixes' directory because some files have the same name, so
that is still a work in progress.

Alan

> Dear EMBOSS developers,
>
> I am using the patches in ftp://emboss.open-bio.org/pub/EMBOSS/fixes/patches/
> to produce up-to-date Debian packages. I noticed that there are fixes in the
> parent directory that are not present in the patch. Could you update it?
>
> Have a nice day,
>
> --
> Charles Plessy
> Debian Med packaging team,
> http://www.debian.org/devel/debian-med
> Tsurumi, Kanagawa, Japan
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
>


From jeedward at yahoo.com  Fri Jan 16 16:57:55 2009
From: jeedward at yahoo.com (John Edward)
Date: Fri, 16 Jan 2009 13:57:55 -0800 (PST)
Subject: [EMBOSS] BCBGC-09 final call for papers
Message-ID: <317305.17140.qm@web45906.mail.sp1.yahoo.com>

BCBGC-09 final call for papers

The 2009 International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-09) (website: http://www.PromoteResearch.org) will be held July 13-16, 2009 in Orlando, FL, USA. We invite draft paper submissions. The conference will take place at the same time and venue as several other international conferences:
- International Conference on Artificial Intelligence and Pattern Recognition (AIPR-09)
- International Conference on Automation, Robotics and Control Systems (ARCS-09)
- International Conference on Enterprise Information Systems and Web Technologies (EISWT-09)
- International Conference on High Performance Computing, Networking and Communication Systems (HPCNCS-09)
- International Conference on Information Security and Privacy (ISP-09)
- International Conference on Recent Advances in Information Technology and Applications (RAITA-09)
- International Conference on Software Engineering Theory and Practice (SETP-09)
- International Conference on Theory and Applications of Computational Science (TACS-09)
- International Conference on Theoretical and Mathematical Foundations of Computer Science (TMFCS-09)

The website http://www.PromoteResearch.org contains more details.

Sincerely,
John Edward
Publicity committee



From charles-listes-emboss at plessy.org  Sun Jan 18 20:42:02 2009
From: charles-listes-emboss at plessy.org (Charles Plessy)
Date: Mon, 19 Jan 2009 10:42:02 +0900
Subject: [EMBOSS] jemboss
In-Reply-To: <21e884180809010804p34882dc2g9b0097162ff68f2e@mail.gmail.com>
References: <21e884180808281625m6f6fde4ci2932ed82a202c642@mail.gmail.com>
	<55740.86.9.126.186.1219996803.squirrel@webmail.ebi.ac.uk>
	<20080829090845.GG15089@kunpuu.plessy.org>
	<21e884180808290821r4da88fc7p542568a6e3589760@mail.gmail.com>
	<20080830015014.GB19735@kunpuu.plessy.org>
	<21e884180809010804p34882dc2g9b0097162ff68f2e@mail.gmail.com>
Message-ID: <20090119014202.GB9537@kunpuu.plessy.org>

On Mon, Sep 01, 2008 at 12:04:28PM -0300, Beny Spira wrote:
> >
> I am not sure about which java is installed by default in Debian, as there
> appears to be more than one (gcj, eclipse and now sun's jre and jdk).
> Is there anything else that may be done to install Jemboss?

Dear Beny,

I prepared an experimental package for jEMBOSS that uses OpenJDK. It is
available from the following URL:

http://packages.debian.org/experimental/jemboss

This package is not yet of high quality; I welcome all comments to improve it.
For the moment it simply collects the files installed by 'make -C
jemboss install' and packages them separately.

Have a nice day,

-- 
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan

From scott at cs.wits.ac.za  Mon Jan 19 08:23:49 2009
From: scott at cs.wits.ac.za (Scott Hazelhurst)
Date: Mon, 19 Jan 2009 15:23:49 +0200
Subject: [EMBOSS] Nthseq issue
Message-ID: <C59A4B85.6A39%scott@cs.wits.ac.za>


I don't know whether this is a bug or a feature, but I discovered that
nthseq skips empty sequences in its counting. So if you have 10 sequences
and the fifth is empty, then nthseq -number 6 actually returns the 7th
sequence. It does print a warning that the sequence is empty, but not
that it is skipping it (and if you are running this in a pipeline you
wouldn't see the warning anyway). I couldn't find any documentation on this.

I found this problem in a data set from some collaborators: we ran dust and
then used biosed to remove Ns. Obviously this makes some sequences
unusable. While it is understandable why nthseq behaves the way it does,
the problem is that in an automated setup it may be difficult to make the
adjustment.
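A minimal Python sketch of the adjustment, assuming plain FASTA input and assuming nthseq counts only non-empty records as reported above. It maps a position that counts every record to the number a skip-empty counter would need. fasta_records and skip_empty_number are hypothetical helpers, not part of EMBOSS.

```python
def fasta_records(text):
    """Yield (header, sequence) pairs in file order, including empty records."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def skip_empty_number(text, n):
    """Map a 1-based position that counts every record to the number a
    counter that ignores empty records would use (hypothetical helper).
    Returns None when record n is itself empty."""
    records = list(fasta_records(text))
    if not 1 <= n <= len(records):
        raise IndexError(f"record {n} requested, file has {len(records)}")
    if not records[n - 1][1]:
        return None
    # Count the non-empty records up to and including position n.
    return sum(1 for _, seq in records[:n] if seq)


if __name__ == "__main__":
    # Ten records with the fifth one empty, as in the example above.
    demo = "".join(f">s{i}\n{'' if i == 5 else 'ACGT'}\n" for i in range(1, 11))
    print(skip_empty_number(demo, 7))  # 6
    print(skip_empty_number(demo, 5))  # None
```

With ten records and the fifth one empty, position 7 maps to 6. This matches nthseq -number 6 returning the 7th sequence.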


Regards

Scott


This communication is intended for the addressee only. It is confidential. If you have received this communication in error, please notify us immediately and destroy the original message. You may not copy or disseminate this communication without the permission of the University. Only authorized signatories are competent to enter into agreements on behalf of the University and recipients are thus advised that the content of this message may not be legally binding on the University and may contain the personal views and opinions of the author, which are not necessarily the views and opinions of The University of the Witwatersrand, Johannesburg. All agreements between the University and outsiders are subject to South African Law unless the University agrees in writing to the contrary.

From pmr at ebi.ac.uk  Thu Jan 22 03:35:50 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 22 Jan 2009 08:35:50 +0000
Subject: [EMBOSS] Nthseq issue
In-Reply-To: <C59A4B85.6A39%scott@cs.wits.ac.za>
References: <C59A4B85.6A39%scott@cs.wits.ac.za>
Message-ID: <49782FE6.40803@ebi.ac.uk>

Scott Hazelhurst wrote:
> 
> I don't know whether this is a bug or a feature, but I discovered that
> nthseq skips empty sequences in its counting. So if you have 10 sequences
> and the fifth is empty, then nthseq -number 6 actually returns the 7th
> sequence. It does print a warning that the sequence is empty, but not
> that it is skipping it (and if you are running this in a pipeline you
> wouldn't see the warning anyway). I couldn't find any documentation on this.
> 
> I found this problem in a data set from some collaborators: we ran dust and
> then used biosed to remove Ns. Obviously this makes some sequences
> unusable. While it is understandable why nthseq behaves the way it does,
> the problem is that in an automated setup it may be difficult to make the
> adjustment.

We will take a look. Zero-length sequences are routinely ignored in
EMBOSS. We will check whether it is possible to use an alternative method
for counting in nthseq and in any other application that counts input sequences.

Of course, if the nth sequence is empty, nthseq would have to report a
failure to read it.
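Under this alternative, every record counts towards the position and an empty nth record is reported as a read failure. A minimal Python sketch of that counting rule on plain FASTA text follows. nth_record and EmptySequenceError are hypothetical names used for illustration, not EMBOSS code.

```python
class EmptySequenceError(ValueError):
    """The requested record exists but contains no residues."""


def nth_record(text, n):
    """Return (header, sequence) for the nth record, counting empty records."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            records[-1][1] += line
    if not 1 <= n <= len(records):
        raise IndexError(f"record {n} requested, file has {len(records)}")
    header, seq = records[n - 1]
    if not seq:
        # Fail loudly instead of moving on to the next non-empty record.
        raise EmptySequenceError(f"record {n} ({header}) is empty")
    return header, seq


if __name__ == "__main__":
    fasta = ">a\nACGT\n>b\n>c\nGGCC\n"
    print(nth_record(fasta, 3))  # ('c', 'GGCC')
    try:
        nth_record(fasta, 2)
    except EmptySequenceError as err:
        print("error:", err)  # error: record 2 (b) is empty
```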

regards,

Peter Rice

From jeedward at yahoo.com  Fri Jan 23 14:41:54 2009
From: jeedward at yahoo.com (John Edward)
Date: Fri, 23 Jan 2009 11:41:54 -0800 (PST)
Subject: [EMBOSS] Final call for papers: BCBGC-09
Message-ID: <326404.38388.qm@web45907.mail.sp1.yahoo.com>


Final call for papers: BCBGC-09

The 2009 International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-09) (website: http://www.PromoteResearch.org) will be held July 13-16, 2009 in Orlando, FL, USA. We invite draft paper submissions. The conference will take place at the same time and venue as several other international conferences:
- International Conference on Artificial Intelligence and Pattern Recognition (AIPR-09)
- International Conference on Automation, Robotics and Control Systems (ARCS-09)
- International Conference on Enterprise Information Systems and Web Technologies (EISWT-09)
- International Conference on High Performance Computing, Networking and Communication Systems (HPCNCS-09)
- International Conference on Information Security and Privacy (ISP-09)
- International Conference on Recent Advances in Information Technology and Applications (RAITA-09)
- International Conference on Software Engineering Theory and Practice (SETP-09)
- International Conference on Theory and Applications of Computational Science (TACS-09)
- International Conference on Theoretical and Mathematical Foundations of Computer Science (TMFCS-09)

The website http://www.PromoteResearch.org contains more details.

Sincerely,
John Edward
Publicity committee



From georgios at biotek.uio.no  Wed Jan 28 06:00:14 2009
From: georgios at biotek.uio.no (George Magklaras)
Date: Wed, 28 Jan 2009 12:00:14 +0100
Subject: [EMBOSS] db formatting (?) and parsing issue -- emboss version 5.0.0
Message-ID: <49803ABE.6080809@biotek.uio.no>

Hi list,

We are still on EMBOSS 5.0.0 (plus patches). We have a problem using
seqret to parse normal IDs from a file, and we cannot understand why.
Here is the story in detail:

I have an .fna file of 454 reads in FASTA format that typically looks
like this:

>FLTU7OB01CIMST length=234 xy=0915_0859 region=1 run=R_2008_12_11_14_44_02_
TTTATTATTTAATCAATAATAAAGTGCTTTAGTCAAATCGTGATGTTTCAATTATTAACA
AGTTTATTATTTCTTCATTTTACCATAATACGCTTCAAAACGTCGATGAACATATGAATT
TGAGGGATTTTTGTAACCAGGTTTTATTTTTTAAAAATCATTAAAAAATGGTGAAGTTTC
TCGAATATCGTGTTCAAAATTCAATTCCGAAATAAGTCGCCCCTAATCTGATGA
>FLTU7OB01DL726 length=211 xy=1366_0736 region=1 run=R_2008_12_11_14_44_02_
AAACAGATAGTCAGTATTGAATTACTTTATGTAGAGCCACAATTTAGAAACAGAGGTTTA
GCTACTATACTGAAGTGTGGTATTGAGACTTGGGCAAAAAGTATAAAAGCGAAACAAATC
ATTAGTACAGTACATAAAGACAACGTGACAATGATATCATTGAACAAGCGGTTAGGGTAT
CAATTAAGTCACGTGAAAATGTATAAAGATA
....

The number of reads is:
cat 068_2023_454Reads.fna | grep ^">" | wc -l
288507

I convert this file to EMBL format using seqret and I get a properly 
formatted file with the same number of sequence entries:

cat staphyl68.dat | grep "^ID" | wc -l
288507

I then build a B+tree index of the id field with dbxflat:
$ dbxflat
Database b+tree indexing for flat file databases
Basename for index files: staphyl68
Resource name: staphyl68
      EMBL : EMBL
     SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
        GB : Genbank, DDBJ
    REFSEQ : Refseq
Entry format [SWISS]: EMBL
....
Index fields [id,acc]: id
Processing file ./staphyl68.dat

(the resource record and database definition are also OK)

That seems to produce the right number of files:

tjonasse at dias ~/mrsa/454/068_reads $ ls  staphyl68.*
staphyl68.dat  staphyl68.ent  staphyl68.pxid  staphyl68.xid

And here the problem starts: we have an input text file 'ahits' with
one sequence ID per line:

FLTU7OB01AH8CG
FLTU7OB01ASKRR
FLTU7OB01AUXQJ
FLTU7OB01DSL0N
FLTU7OB01BB9NP

(no fancy control characters; checked with od:
0000000   F   L   T   U   7   O   B   0   1   A   H   8   C   G  \n   F
0000020   L   T   U   7   O   B   0   1   A   S   K   R   R  \n   F   L
0000040   T   U   7   )
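An od dump of a few lines can easily miss a single stray byte elsewhere in a file. The following is a minimal Python sketch, assuming the ID lists are small plain-text files. It scans every line of each file named on the command line and reports any line that holds a control byte. The script name in the comment is hypothetical.

```python
import sys


def lines_with_control_bytes(data):
    """Return (line_number, raw_line) for each line holding a control byte.

    The LF terminator is consumed by split(), so anything reported here is
    an extra byte such as a DOS carriage return (0x0D) or a tab (0x09).
    """
    hits = []
    for number, raw in enumerate(data.split(b"\n"), start=1):
        if any(byte < 0x20 or byte == 0x7F for byte in raw):
            hits.append((number, raw))
    return hits


if __name__ == "__main__":
    # Usage (hypothetical script name): python find_ctrl.py ahits bhits
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            for number, raw in lines_with_control_bytes(handle.read()):
                print(f"{path}:{number}: {raw!r}")
```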

We extract the 'ahits' sequences (1000 sequences) from the EMBOSS
database simply by doing:

for seq in `cat ahits`; do seqret -filter staphyl68-id:$seq; done > multifasta68.fasta

That produces a multi-FASTA file with exactly 1000 sequences.

Now, then, we have a second file called 'bhits' (697 sequences). This 
file has exactly the same format as 'ahits', but when we try to extract 
the identified sequences, we get the following:

for seq in `cat bhits`; do seqret -filter staphyl68-id:$seq; done

Died: seqret terminated: Bad value for '-sequence' with -auto defined
'rror: Unable to read sequence 'staphyl68-id:FLTU7OB01AJHZO
(one error per sequence ID)

This is wrong. Why? I know that the sequence identifiers in 'bhits' are in
the original .fna file, in the .dat EMBL file and also in the .xid index:

cat 068_2023_454Reads.fna | grep FLTU7OB01AJHZO
>FLTU7OB01AJHZO length=276 xy=0104_3906 region=1 run=R_2008_12_11_14_44_02_

cat staphyl68.dat | grep FLTU7OB01AJHZO
ID   FLTU7OB01AJHZO; SV 1; linear; unassigned DNA; STD; UNC; 276 BP.

strings staphyl68.xid | grep -i FLTU7OB01AJHZO
fltu7ob01ajhzo

In addition, if I try the single identifier on its own, it works:
seqret staphyl68-id:FLTU7OB01AJHZO
Reads and writes (returns) sequences
output sequence(s) [fltu7ob01ajhzo.fasta]:
cat fltu7ob01ajhzo.fasta
>FLTU7OB01AJHZO FLTU7OB01AJHZO.1 length=276 xy=0104_3906 region=1 run=R_2008_12_11_14_44_02_
TCGAATGATTAATCTTGAAAATAAAACCTTCGTAATTATGGGTATTGCTAATAAACGTAG
TATCGGATTTGGCGTTGCAAAGGTATTAGATCAATTAGGGGCTAAACTTGTTTTCACTTA
TCGTAAAGACCGTAGCCGCAAAGAATTAGAAAAATTATTAGAACAATTAAACCAAGAAGA
GCCAAAATTATATCAAATCGATGTTCAAAAAGATGAAGATGTAGTAAATGGTTTTGCTAA
AATTGGCGAAGAAGTAGGCAATATTGATGGCGTATA


so, my question is:

Why does seqret in filter mode fail when invoked inside the for loop
while this one works, and why does the problem occur only with 'bhits'
and not with 'ahits'?

Thanks for any answers.

GM


-- 
--
George Magklaras BSc Hons MPhil
RHCE:805008309135525

Senior Computer Systems Engineer/UNIX-Linux Systems Administrator
EMBnet Technical Management Board
The Biotechnology Centre of Oslo,
University of Oslo
http://folk.uio.no/georgios


From georgios at biotek.uio.no  Wed Jan 28 08:47:40 2009
From: georgios at biotek.uio.no (George Magklaras)
Date: Wed, 28 Jan 2009 14:47:40 +0100
Subject: [EMBOSS] db formatting (?) and parsing issue -- emboss version
 5.0.0
In-Reply-To: <49805B17.5010203@ebi.ac.uk>
References: <49803ABE.6080809@biotek.uio.no> <49805B17.5010203@ebi.ac.uk>
Message-ID: <498061FC.6010101@biotek.uio.no>

Hi Peter, thanks for your reply.

Certainly:

1) For the failed run (for seq in `cat bhits`; do seqret -debug -filter staphyl68-id:$seq; done), seqret.dbg contains:

Debug file seqret.dbg buffered:No
ajAcdInitP pgm 'seqret' package ''
ajFileNewIn '/site/share/EMBOSS/acd/seqret.acd'
EOF ajFileGetsL file /site/share/EMBOSS/acd/seqret.acd
closing file '/site/share/EMBOSS/acd/seqret.acd'
ajFileNewIn '/site/share/EMBOSS/acd/codes.english'
EOF ajFileGetsL file /site/share/EMBOSS/acd/codes.english
closing file '/site/share/EMBOSS/acd/codes.english'
ajTableNewFunctionLen hint 25 size 251
ajTableNewFunctionLen hint 25 size 251
ajTableNewFunctionLen hint 25 size 251
ajFileNewIn '/site/share/EMBOSS/acd/knowntypes.standard'
EOF ajFileGetsL file /site/share/EMBOSS/acd/knowntypes.standard
closing file '/site/share/EMBOSS/acd/knowntypes.standard'
Set acdprotein value '$(sequence.protein)'
ajSeqinClear called
' 0..0(N) '' 0  'staphyl68-id:FLTU7OB01AHJ67
'SA to test: 'staphyl68-id:FLTU7OB01AHJ67

format regexp: No list:No
no format specified in USA

...input format not set
dbname dbexp: Yes
'ound dbname 'staphyl68' level: 'id' qry->QryString: 'FLTU7OB01AHJ67
' Field 'id'ng 'FLTU7OB01AHJ67
' acc '' sv '' gi '' des '' org '' key ''
no wildcard in stored qry
database type: 'N' format 'embl'
use access method 'emboss'
Matched seqAccess[1] 'emboss'
seqAccessEmboss type 1
' acc '' hasacc:Yess/u4/tjonasse/mrsa/454/068_reads/' entry 'fltu7ob01ahj67
ajFileNewIn '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid'
EOF ajFileGetsL file /div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid
closing file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid'
ajFileNewIn '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent'
EOF ajFileGetsL file /div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent
closing file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent'
' acc: '' hasacc:Yesahj67
B+tree Entry failed
' not foundtry id:'fltu7ob01ahj67
seqEmbossQryClose clean up qryd
Database 'staphyl68' : access method 'emboss' failed

2) For the standalone successful run (seqret -debug staphyl68-id:FLTU7OB01AHJ67), seqret.dbg states:
Debug file seqret.dbg buffered:No
ajAcdInitP pgm 'seqret' package ''
ajFileNewIn '/site/share/EMBOSS/acd/seqret.acd'
EOF ajFileGetsL file /site/share/EMBOSS/acd/seqret.acd
closing file '/site/share/EMBOSS/acd/seqret.acd'
ajFileNewIn '/site/share/EMBOSS/acd/codes.english'
EOF ajFileGetsL file /site/share/EMBOSS/acd/codes.english
closing file '/site/share/EMBOSS/acd/codes.english'
ajTableNewFunctionLen hint 25 size 251
ajTableNewFunctionLen hint 25 size 251
ajTableNewFunctionLen hint 25 size 251
ajFileNewIn '/site/share/EMBOSS/acd/knowntypes.standard'
EOF ajFileGetsL file /site/share/EMBOSS/acd/knowntypes.standard
closing file '/site/share/EMBOSS/acd/knowntypes.standard'
Set acdprotein value '$(sequence.protein)'
ajSeqinClear called
++seqUsaProcess 'staphyl68-id:FLTU7OB01AHJ67' 0..0(N) '' 0
USA to test: 'staphyl68-id:FLTU7OB01AHJ67'

format regexp: No list:No
no format specified in USA

...input format not set
dbname dbexp: Yes
found dbname 'staphyl68' level: 'id' qry->QryString: 'FLTU7OB01AHJ67'
   db QryString 'FLTU7OB01AHJ67' Field 'id'
ajSeqQueryWild id 'FLTU7OB01AHJ67' acc '' sv '' gi '' des '' org '' key ''
no wildcard in stored qry
database type: 'N' format 'embl'
use access method 'emboss'
Matched seqAccess[1] 'emboss'
seqAccessEmboss type 1
directory '/div/dias/u4/tjonasse/mrsa/454/068_reads/' entry 'fltu7ob01ahj67' acc '' hasacc:Yes
ajFileNewIn '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid'
EOF ajFileGetsL file /div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid
closing file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid'
ajFileNewIn '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent'
EOF ajFileGetsL file /div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent
closing file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent'
entry id: 'fltu7ob01ahj67' acc: '' hasacc:Yes
ajFileNewIn '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat'
seqEmbossQryClose clean up qryd
seqRead: cleared
seqRead: seqin format 3 'embl'
seqRead: one format specified
ajFileBuffNobuff /div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat buffsize: 0
++seqRead known format 3
++seqReadFmt format 3 (embl) 'staphyl68-id:FLTU7OB01AHJ67' feat No
seqReadEmbl first line 'ID   FLTU7OB01AHJ67; SV 1; linear; unassigned DNA; STD; UNC; 184 BP.
'
seqReadEmbl ID line found
seqSetName word 'FLTU7OB01AHJ67'
seqSetName 'FLTU7OB01AHJ67' result: 'FLTU7OB01AHJ67'
ajTableNewFunctionLen hint 4 size 251
ajTableNewFunctionLen hint 4 size 251
ajTableNewFunctionLen hint 4 size 251
ajTableNewFunctionLen hint 4 size 251
ajFileBuffClear (0) Nobuff: Yes
size 0: Lines: 0 Curr: 0  Prev: 0 Last: 0 Free: 0 Freelast: 0
ajFileBuffClear '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat' (0 lines)
      Y size: 0 pos: 0 removed 0 lines add to free: 0
Trace buffer file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat'
              Pos: 0 Size: 0 FreeSize: 0 Fpos: 153477365 End: N
  Free: 0 Last: -1
seqReadFmt success with format 3 (embl)
seqQueryMatch 'FLTU7OB01AHJ67' id 'fltu7ob01ahj67' acc '' Sv '' Gi '' Des '' Key '' Org '' Case No Done Yes
seqTypeSet 'N'
ajSeqTypeCheckIn type 'gapany' found (any valid sequence with gaps)
Convert gaps to '-'
ajSeqTypeCheckIn: bad characters test passed, convert
Convert '?' to 'X'
ajSeqTypeCheckIn: OK - no badchars
seqDefine: thys->Db 'staphyl68', seqin->Db 'staphyl68'
seqDefine: thys->Name 'FLTU7OB01AHJ67' type: N
seqDefine: thys->Entryname 'FLTU7OB01AHJ67', seqin->Entryname ''
seqDefine: returns thys->Name 'FLTU7OB01AHJ67' type: N
++ajSeqallread set db: 'staphyl68' => 'staphyl68'
ajSeqallGetName ''
ajSeqIsNuc Type 'N'
ajSeqIsNuc Type 'N'
ajSeqIsProt Type 'N'
ajSeqallGetUsa 'staphyl68-id:FLTU7OB01AHJ67'
ajSeqallGetseqName 'FLTU7OB01AHJ67'
... output format not set, default to 'fasta'
ajSeqoutClear called
... output format not set, default to 'fasta'
ajSeqoutOpen dir '' qrydir ''
seqoutUsaProcess
output USA to test: 'fltu7ob01ahj67.fasta'

format regexp: No
no format specified in USA

file:id regexp: Yes
found filename fltu7ob01ahj67.fasta single: No dir: ''
ajFileNewOutD('' 'fltu7ob01ahj67.fasta')
ajFileNewOutD open name 'fltu7ob01ahj67.fasta'

ajSeqSetRange (len: 184 0..0 old 0..0) rev:No reversed:No
       result: (len: 184 0..0)
ajSeqoutWriteSeq 'FLTU7OB01AHJ67' len: 184
ajSeqoutWriteSeq 17 'fasta' single: No feat: No Save: No
seqClone out Setdb '' Db '' seq Setdb '<null>' Db 'staphyl68'
seqClone outseq->Type '' seq->Type 'N'
seqClone 0 .. 0 1 .. 184 len: 184 type: 'N'
   Db: 'staphyl68' Name: 'FLTU7OB01AHJ67' Entryname: 'FLTU7OB01AHJ67'
ajSeqTypeCheckS type 'gapany' found (any valid sequence with gaps)
Convert gaps to '-'
Convert '?' to 'X'
ajSeqoutSetNameDefaultS already has a name 'FLTU7OB01AHJ67'
seqWriteFasta outseq Db 'staphyl68' Setdb '' Setoutdb '' Name 'FLTU7OB01AHJ67'
seqoutUfoLocal Features No Ufo 0 ''
ajSeqoutWriteSeq tests features No tabouitisopen No UfoLocal No ftlocal No
ajSeqRead: input file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat' still there, try again
seqRead: cleared
seqRead: single access - count 1 - call access routine again
seqAccessEmboss type 1
seqEmbossQryReuse: query data all finished
seqRead: seqin->Query->Access->Access(seqin) *failed*
ajSeqRead: open buffer  usa: 'staphyl68-id:FLTU7OB01AHJ67' returns: No
ajSeqallNext failed
ajSeqinClear called
ajFileBuffClear (-1) Nobuff: Yes
size 0: Lines: 0 Curr: 0  Prev: 0 Last: 0 Free: 0 Freelast: 0
ajFileBuffClear '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat' (-1 lines)
      Y size: 0 pos: 0 removed 0 lines add to free: 0
Trace buffer file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat'
              Pos: 0 Size: 0 FreeSize: 0 Fpos: 153477365 End: N
  Free: 0 Last: -1
closing file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat'
ajSeqoutClose 'fltu7ob01ahj67.fasta'
closing file 'fltu7ob01ahj67.fasta'
ajSeqinDel called usa:''
ajSeqQueryDel db:'' id:''

Final Summary
=============

Table usage : 11 opened, 0 closed, 251 maxsize, 40 maxmem
List usage : 27 opened, 27 closed, 1438 maxsize 2380 nodes
List iterator usage : 4 opened, 4 closed
File usage : 1 opened, 9 closed, 3 max, 10 total
ajNamExit done
Regexp usage (bytes): 168 allocated, 1008 freed, -840 in use (sizes change)
Regexp usage (number): 21 allocated, 21 freed 0 in use
Array usage (bytes): 0 allocated, 0 freed, 0 in use
Array usage (number): 0 allocated, 0 freed, 0 resized, 0 in use
Array usage 2D (bytes): 0 allocated, 0 freed, 0 in use
Array usage 2D (number): 0 allocated, 0 freed, 0 resized, 0 in use
Array usage 3D (bytes): 0 allocated, 0 freed, 0 in use
Array usage 3D (number): 0 allocated, 0 freed, 0 resized, 0 in use
String usage (bytes): 268013 allocated, 268270 freed, -257 in use
String usage (number): 4982 allocated, 4979 freed 3 in use
Memory usage (bytes): 535329 allocated, 640 reallocated 503881 zeroed
Memory usage (number): 14393 allocates, 14405 frees, 10 resizes, -12 in use
closing file 'seqret.dbg'


3) The staphyl68.pxid file contains:
Order     60
Fill      42
Pagesize  2048
Level     2
Cachesize 200
Order2    82
Fill2     99
Count     288506
Kwlimit   15


In addition, the database definition and resource record I set up for the
staphyl68 database in my local .embossrc file are the following (the idlen
value should accommodate the length of the id field, shouldn't it?):

DB staphyl68 [
         type: N
         method: emboss
         format: embl
         fields: "id,des"
         file: staphyl68.dat
         indexdirectory: /div/dias/u4/tjonasse/mrsa/454/068_reads/
         comment: "mrsa staphyl68 reads"
]

RES staphyl68 [
    type: Index
    idlen:  20
    deslen: 50
]


Best regards,
GM


Peter Rice wrote:
> Hi George,
> 
>> Why does seqret in filter mode fail when invoked inside the for loop
>> while this one works, and why does the problem occur only with 'bhits'
>> and not with 'ahits'?
> 
> Can you add "-debug" to the seqret command line and send me the
> seqret.dbg file? (It will be for the last seqret run, so you'll need some
> way to make sure the last run failed.)
> 
> Please also send the seqret.dbg file for running seqret standalone with the
> same ID that worked.
> 
> It would also be useful to see the .pxid file for the staphyl68 database
> (it includes the length of ID that was indexed - your IDs are quite long
> for dbxflat)
> 
> regards,
> 
> Peter
> 

-- 


From georgios at biotek.uio.no  Wed Jan 28 09:15:43 2009
From: georgios at biotek.uio.no (George Magklaras)
Date: Wed, 28 Jan 2009 15:15:43 +0100
Subject: [EMBOSS] db formatting (?) and parsing issue -- emboss version
 5.0.0
In-Reply-To: <498063B0.1010207@ebi.ac.uk>
References: <49803ABE.6080809@biotek.uio.no> <498063B0.1010207@ebi.ac.uk>
Message-ID: <4980688F.2010209@biotek.uio.no>

Indeed, a \r\n was to blame. I didn't spot it with od because there was
only one instance, at the beginning of the file, rather than one on every
line. dos2unix to the rescue, and we are back in business.
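The loop kept the carriage return for a specific reason. In 'for seq in `cat bhits`', the shell splits words on the default IFS characters (space, tab and newline), so a trailing \r stays attached to the ID and ends up in the USA given to seqret. The following is a minimal Python sketch of two ways around this, assuming the ID list fits in memory. One option is to convert CRLF to LF, which is what dos2unix did here. The other is to split on all whitespace, which in Python includes the carriage return. The helper names are illustrative only.

```python
def crlf_to_lf(data):
    """Replace DOS CRLF line endings with Unix LF, as dos2unix does."""
    return data.replace(b"\r\n", b"\n")


def read_ids(text):
    """Split an ID list on any whitespace. Python's str.split() treats a
    carriage return as whitespace, unlike the shell's default IFS."""
    return text.split()


if __name__ == "__main__":
    raw = b"FLTU7OB01AH8CG\r\nFLTU7OB01AJHZO\r\n"
    print(crlf_to_lf(raw))  # b'FLTU7OB01AH8CG\nFLTU7OB01AJHZO\n'
    usas = [f"staphyl68-id:{i}" for i in read_ids(raw.decode("ascii"))]
    print(usas)  # ['staphyl68-id:FLTU7OB01AH8CG', 'staphyl68-id:FLTU7OB01AJHZO']
```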

Cheers Peter!

GM

Peter Rice wrote:
> Hi George,
>>  Now, then, we have a second file called 'bhits' (697 sequences). This
>> file has exactly the same format as 'ahits', but when we try to 
>> extract the identified sequences, we get the following:
>>
>> for seq in `cat bhits`; do seqret -filter staphyl68-id:$seq; done
>>
>> Died: seqret terminated: Bad value for '-sequence' with -auto defined
>> 'rror: Unable to read sequence 'staphyl68-id:FLTU7OB01AJHZO
>> (one error per sequence ID)
> 
> Umm ... does the message really start with 'rror'?
> 
> That suggests some non-printing character is involved in the ID. Have 
> you checked bhits does not have any strange characters?
> 
> The error message should be:
> 
> Error: Unable to read sequence 'staphyl68-id:FLTU7OB01AJHZO'
> 
> So something at the end of the ID seems to have moved the final quote to
> the start of the line.
> 
> I can get the same effect by using noreturn -system pc to change the 
> carriage control characters in bhits.
> 
> I suspect that is the cause of your problem.
> 
> Let me know if that doesn't solve it.
> 
> regards,
> 
> Peter
> 
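The 'rror' symptom can be checked with a simplified Python model of a terminal, assuming the ID read from bhits ends in a carriage return. The message then ends in ...FLTU7OB01AJHZO\r'. The cursor jumps back to column 0 at the \r, and the closing quote overwrites the E. The model handles only printable characters and CR, and render_on_terminal is a hypothetical helper.

```python
def render_on_terminal(line):
    """Simplified model of how a terminal shows one line of output:
    a carriage return moves the cursor back to column 0, and later
    characters overwrite what is already on screen."""
    screen = []
    column = 0
    for char in line:
        if char == "\r":
            column = 0
            continue
        if column < len(screen):
            screen[column] = char
        else:
            screen.append(char)
        column += 1
    return "".join(screen)


if __name__ == "__main__":
    seq_id = "FLTU7OB01AJHZO\r"  # ID as read from a CRLF-terminated bhits
    message = f"Error: Unable to read sequence 'staphyl68-id:{seq_id}'"
    print(render_on_terminal(message))
    # 'rror: Unable to read sequence 'staphyl68-id:FLTU7OB01AJHZO
```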


From staffa at niehs.nih.gov  Thu Jan 29 10:45:32 2009
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Thu, 29 Jan 2009 10:45:32 -0500
Subject: [EMBOSS] EMBOSS/Jemboss
In-Reply-To: <4980688F.2010209@biotek.uio.no>
Message-ID: <C5A7394C.CE99%staffa@niehs.nih.gov>

We are working hammer and tongs to make EMBOSS and Jemboss available
institute-wide as a substitute for the GCG package, as much like SeqLab
as possible.
Has anyone succeeded in creating a Jemboss client-server setup on
Mac OS X with a Unix server?


From charles-listes-emboss at plessy.org  Sat Jan 10 05:29:46 2009
From: charles-listes-emboss at plessy.org (Charles Plessy)
Date: Sat, 10 Jan 2009 14:29:46 +0900
Subject: [EMBOSS] Please update the patch
	in	ftp://emboss.open-bio.org/pub/EMBOSS/fixes/patches/
Message-ID: <20090110052946.GA3077@kunpuu.plessy.org>

Dear EMBOSS developers,

I am using the patches in ftp://emboss.open-bio.org/pub/EMBOSS/fixes/patches/
to produce up-to-date Debian packages. I noticed that there are fixes in the
parent directory that are not present in the patch. Could you update it?

Have a nice day,

-- 
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan


From ajb at ebi.ac.uk  Sat Jan 10 11:17:36 2009
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Sat, 10 Jan 2009 11:17:36 -0000 (GMT)
Subject: [EMBOSS] Please update the patch
 in	ftp://emboss.open-bio.org/pub/EMBOSS/fixes/patches/
In-Reply-To: <20090110052946.GA3077@kunpuu.plessy.org>
References: <20090110052946.GA3077@kunpuu.plessy.org>
Message-ID: <50394.86.9.126.186.1231586256.squirrel@webmail.ebi.ac.uk>

Hello Charles,

The patch file is there now (a casualty of the holidays).
It also corrects the copying of 4 data files to the installation
directories (affecting featcopy, infobase, inforesidue and trimspace).
Those Makefile changes are a little tricky to represent
in the 'fixes' directory as some files have the same name. So,
that's a work in progress.

Alan

> Dear EMBOSS developers,
>
> I am using the patches in
> ftp://emboss.open-bio.org/pub/EMBOSS/fixes/patches/
> to produce up-to-date Debian packages. I noticed that there are fixes in
> the
> parent directory that are not present in the patch. Could you update it?
>
> Have a nice day,
>
> --
> Charles Plessy
> Debian Med packaging team,
> http://www.debian.org/devel/debian-med
> Tsurumi, Kanagawa, Japan
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
>


From jeedward at yahoo.com  Fri Jan 16 21:57:55 2009
From: jeedward at yahoo.com (John Edward)
Date: Fri, 16 Jan 2009 13:57:55 -0800 (PST)
Subject: [EMBOSS] BCBGC-09 final call for papers
Message-ID: <317305.17140.qm@web45906.mail.sp1.yahoo.com>

BCBGC-09 final call for papers
?
The 2009 International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-09) (website: http://www.PromoteResearch.org ) will be held during July 13-16 2009 in Orlando, FL, USA. We invite draft paper submissions. The conference will take place at the same time and venue where several other international conferences are taking place. The other conferences include:
????????? International Conference on Artificial Intelligence and Pattern Recognition (AIPR-09) 
????????? International Conference on Automation, Robotics and Control Systems (ARCS-09)
????????? International Conference on Enterprise Information Systems and Web Technologies (EISWT-09)
????????? International Conference on High Performance Computing, Networking and Communication Systems (HPCNCS-09) 
????????? International Conference on Information Security and Privacy (ISP-09)
????????? International Conference on Recent Advances in Information Technology and Applications (RAITA-09)
????????? International Conference on Software Engineering Theory and Practice (SETP-09) 
????????? International Conference on Theory and Applications of Computational Science (TACS-09)
????????? International Conference on Theoretical and Mathematical Foundations of Computer Science (TMFCS-09)
?
The website http://www.PromoteResearch.org contains more details.
?
Sincerely
John Edward
Publicity committee
?


From charles-listes-emboss at plessy.org  Mon Jan 19 01:42:02 2009
From: charles-listes-emboss at plessy.org (Charles Plessy)
Date: Mon, 19 Jan 2009 10:42:02 +0900
Subject: [EMBOSS] jemboss
In-Reply-To: <21e884180809010804p34882dc2g9b0097162ff68f2e@mail.gmail.com>
References: <21e884180808281625m6f6fde4ci2932ed82a202c642@mail.gmail.com>
	<55740.86.9.126.186.1219996803.squirrel@webmail.ebi.ac.uk>
	<20080829090845.GG15089@kunpuu.plessy.org>
	<21e884180808290821r4da88fc7p542568a6e3589760@mail.gmail.com>
	<20080830015014.GB19735@kunpuu.plessy.org>
	<21e884180809010804p34882dc2g9b0097162ff68f2e@mail.gmail.com>
Message-ID: <20090119014202.GB9537@kunpuu.plessy.org>

On Mon, Sep 01, 2008 at 12:04:28PM -0300, Beny Spira wrote:
> >
> I am not sure which Java is installed by default in Debian, as there
> appear to be several (gcj, Eclipse and now Sun's JRE and JDK).
> Is there anything else that can be done to install Jemboss?

Dear Beny,

I prepared an experimental package for jEMBOSS that uses OpenJDK. It is
available from the following URL:

http://packages.debian.org/experimental/jemboss

This package is not yet of high quality; I welcome all comments to improve
it. For the moment, all it does is collect the files installed by 'make -C
jemboss install' and package them separately.

Have a nice day,

-- 
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan


From scott at cs.wits.ac.za  Mon Jan 19 13:23:49 2009
From: scott at cs.wits.ac.za (Scott Hazelhurst)
Date: Mon, 19 Jan 2009 15:23:49 +0200
Subject: [EMBOSS] Nthseq issue
Message-ID: <C59A4B85.6A39%scott@cs.wits.ac.za>


I don't know whether this is a bug or a feature, but I discovered that
nthseq skips empty sequences in its counting. So if you have 10 sequences
and the fifth is empty, then nthseq -number 6 actually returns the 7th
sequence. It does print a warning that the sequence is empty, but not
that it is skipping it (and if you are running this in a pipeline you
wouldn't see the warning anyway). I couldn't find any documentation on this.

I found this problem in a data set from some collaborators: we ran dust and
then used biosed to remove Ns, which obviously leaves some sequences
unusable. While it is understandable why nthseq behaves the way it does,
the problem is that in an automated setup it may be difficult to make the
adjustment.


Regards

Scott




From pmr at ebi.ac.uk  Thu Jan 22 08:35:50 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 22 Jan 2009 08:35:50 +0000
Subject: [EMBOSS] Nthseq issue
In-Reply-To: <C59A4B85.6A39%scott@cs.wits.ac.za>
References: <C59A4B85.6A39%scott@cs.wits.ac.za>
Message-ID: <49782FE6.40803@ebi.ac.uk>

Scott Hazelhurst wrote:
> 
> I don't know whether this is a bug or a feature, but I discovered that
> nthseq skips empty sequences in its counting. So if you have 10 sequences
> and the fifth is empty, then nthseq -number 6 actually returns the 7th
> sequence. It does print a warning that the sequence is empty, but not
> that it is skipping it (and if you are running this in a pipeline you
> wouldn't see the warning anyway). I couldn't find any documentation on this.
> 
> I found this problem in a data set from some collaborators: we ran dust and
> then used biosed to remove Ns, which obviously leaves some sequences
> unusable. While it is understandable why nthseq behaves the way it does,
> the problem is that in an automated setup it may be difficult to make the
> adjustment.

We will take a look. Zero-length sequences are routinely ignored in
EMBOSS. We will check whether it is possible to use an alternative method
for counting in nthseq and in any other application that counts input sequences.

Of course, if the nth sequence is empty, nthseq would have to report a
failure to read it.
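Scott's case (ten sequences, the fifth empty, -number 6 returning the seventh) can be reproduced with a small Python sketch that contrasts the two counting rules discussed here: `nth_skipping_empty` mimics the reported nthseq behaviour, while `nth_counting_all` counts every record and fails if the requested one is empty, as Peter suggests. The FASTA parser and all function names are illustrative assumptions; this is not how nthseq is implemented.

```python
from typing import Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (identifier, sequence)


def read_fasta(text: str) -> Iterator[Record]:
    """Yield (id, sequence) pairs from FASTA text, keeping empty sequences."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line.strip())
    if name is not None:
        yield name, "".join(chunks)


def nth_skipping_empty(text: str, n: int) -> Optional[Record]:
    """Return the nth non-empty record (1-based), as reported for nthseq."""
    count = 0
    for name, seq in read_fasta(text):
        if not seq:
            continue  # empty records are left out of the count
        count += 1
        if count == n:
            return name, seq
    return None


def nth_counting_all(text: str, n: int) -> Record:
    """Return the nth record counting empty ones; fail if that record is empty."""
    for index, (name, seq) in enumerate(read_fasta(text), start=1):
        if index == n:
            if not seq:
                raise ValueError(f"sequence {n} ({name}) is empty")
            return name, seq
    raise IndexError(f"fewer than {n} sequences in the input")


# Ten records, the fifth one empty (the case described above).
reads = "".join(f">s{i}\n{'' if i == 5 else 'ACGT'}\n" for i in range(1, 11))
print(nth_skipping_empty(reads, 6))  # ('s7', 'ACGT')
print(nth_counting_all(reads, 6))    # ('s6', 'ACGT')
```

Counting every record keeps the numbering stable for automated pipelines, at the cost of an explicit error when the requested record is empty.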

regards,

Peter Rice


From jeedward at yahoo.com  Fri Jan 23 19:41:54 2009
From: jeedward at yahoo.com (John Edward)
Date: Fri, 23 Jan 2009 11:41:54 -0800 (PST)
Subject: [EMBOSS] Final call for papers: BCBGC-09
Message-ID: <326404.38388.qm@web45907.mail.sp1.yahoo.com>


Final call for papers: BCBGC-09

The 2009 International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-09) (website: http://www.PromoteResearch.org ) will be held July 13-16, 2009 in Orlando, FL, USA. We invite draft paper submissions. The conference will take place at the same time and venue as several other international conferences:
- International Conference on Artificial Intelligence and Pattern Recognition (AIPR-09)
- International Conference on Automation, Robotics and Control Systems (ARCS-09)
- International Conference on Enterprise Information Systems and Web Technologies (EISWT-09)
- International Conference on High Performance Computing, Networking and Communication Systems (HPCNCS-09)
- International Conference on Information Security and Privacy (ISP-09)
- International Conference on Recent Advances in Information Technology and Applications (RAITA-09)
- International Conference on Software Engineering Theory and Practice (SETP-09)
- International Conference on Theory and Applications of Computational Science (TACS-09)
- International Conference on Theoretical and Mathematical Foundations of Computer Science (TMFCS-09)

The website http://www.PromoteResearch.org contains more details.

Sincerely
John Edward
Publicity committee


From georgios at biotek.uio.no  Wed Jan 28 11:00:14 2009
From: georgios at biotek.uio.no (George Magklaras)
Date: Wed, 28 Jan 2009 12:00:14 +0100
Subject: [EMBOSS] db formatting (?) and parsing issue -- emboss version 5.0.0
Message-ID: <49803ABE.6080809@biotek.uio.no>

Hi list,

We are still on EMBOSS 5.0.0 (plus patches). We have a problem we cannot
understand when using seqret to retrieve sequences by ordinary IDs read
from a file. Here are the details:

I have an .fna file of 454 reads in FASTA format, which typically looks
like this:

>FLTU7OB01CIMST length=234 xy=0915_0859 region=1 run=R_2008_12_11_14_44_02_
TTTATTATTTAATCAATAATAAAGTGCTTTAGTCAAATCGTGATGTTTCAATTATTAACA
AGTTTATTATTTCTTCATTTTACCATAATACGCTTCAAAACGTCGATGAACATATGAATT
TGAGGGATTTTTGTAACCAGGTTTTATTTTTTAAAAATCATTAAAAAATGGTGAAGTTTC
TCGAATATCGTGTTCAAAATTCAATTCCGAAATAAGTCGCCCCTAATCTGATGA
>FLTU7OB01DL726 length=211 xy=1366_0736 region=1 run=R_2008_12_11_14_44_02_
AAACAGATAGTCAGTATTGAATTACTTTATGTAGAGCCACAATTTAGAAACAGAGGTTTA
GCTACTATACTGAAGTGTGGTATTGAGACTTGGGCAAAAAGTATAAAAGCGAAACAAATC
ATTAGTACAGTACATAAAGACAACGTGACAATGATATCATTGAACAAGCGGTTAGGGTAT
CAATTAAGTCACGTGAAAATGTATAAAGATA
....

Number of sequences:
cat 068_2023_454Reads.fna | grep ^">" | wc -l
288507

I convert this file to EMBL format using seqret and I get a properly 
formatted file with the same number of sequence entries:

cat staphyl68.dat | grep "^ID" | wc -l
288507
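The two grep | wc -l checks can also be written as a short Python sketch. It assumes, as in the files shown here, that every FASTA header starts with '>' in column 1 and every EMBL entry starts with an 'ID' line; the function names are illustrative.

```python
from typing import Iterable


def count_fasta_headers(lines: Iterable[str]) -> int:
    """Count FASTA records, i.e. lines starting with '>' (like grep '^>' | wc -l)."""
    return sum(1 for line in lines if line.startswith(">"))


def count_embl_entries(lines: Iterable[str]) -> int:
    """Count EMBL entries, i.e. lines starting with 'ID' (like grep '^ID' | wc -l)."""
    return sum(1 for line in lines if line.startswith("ID"))
```

With the file names from this message, `count_fasta_headers(open("068_2023_454Reads.fna")) == count_embl_entries(open("staphyl68.dat"))` should hold if seqret converted every entry.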

I now make a btree index of the id field with dbxflat:
$ dbxflat
Database b+tree indexing for flat file databases
Basename for index files: staphyl68
Resource name: staphyl68
      EMBL : EMBL
     SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
        GB : Genbank, DDBJ
    REFSEQ : Refseq
Entry format [SWISS]: EMBL
....
Index fields [id,acc]: id
Processing file ./staphyl68.dat

(resource records and db defs also OK)

That seems to produce the right number of files:

tjonasse at dias ~/mrsa/454/068_reads $ ls  staphyl68.*
staphyl68.dat  staphyl68.ent  staphyl68.pxid  staphyl68.xid

Here is where the problem starts. We have an input text file 'ahits' with
one sequence ID per line:

FLTU7OB01AH8CG
FLTU7OB01ASKRR
FLTU7OB01AUXQJ
FLTU7OB01DSL0N
FLTU7OB01BB9NP

No unusual control characters, as checked with od:
0000000   F   L   T   U   7   O   B   0   1   A   H   8   C   G  \n   F
0000020   L   T   U   7   O   B   0   1   A   S   K   R   R  \n   F   L
0000040   T   U   7

We extract the 'ahits' sequences (1000 sequences) from the EMBOSS
database simply by running:

for seq in `cat ahits`; do seqret -filter staphyl68-id:$seq; done > multifasta68.fasta

That produces a multi-FASTA file with exactly 1000 sequences.
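For comparison, a hedged Python sketch of the same extraction loop. It reads the ID file as bytes and splits on any ASCII whitespace, so a trailing carriage return cannot end up inside an ID, then runs `seqret -filter` once per ID (the `-filter` qualifier and the `staphyl68-id:` query syntax are taken from the command above). `usas_from_id_file` and `extract` are hypothetical helper names, and treating a non-zero exit status as a failed read is an assumption.

```python
import subprocess
from typing import Iterable, List


def usas_from_id_file(data: bytes, db: str = "staphyl68") -> List[str]:
    """Turn the raw bytes of an ID list into 'DB-id:ID' queries.

    bytes.split() with no separator splits on any ASCII whitespace,
    including '\\r', so a DOS (CRLF) line ending cannot end up inside an
    ID; blank lines produce no query.
    """
    return [f"{db}-id:{word.decode('ascii')}" for word in data.split()]


def extract(usas: Iterable[str], out_path: str) -> List[str]:
    """Run 'seqret -filter USA' once per query, writing all output to out_path.

    Returns the queries for which seqret exited with a non-zero status
    (assumed here to signal a failed read).
    """
    failed: List[str] = []
    with open(out_path, "wb") as out:
        for usa in usas:
            result = subprocess.run(["seqret", "-filter", usa], stdout=out)
            if result.returncode != 0:
                failed.append(usa)
    return failed
```

Usage, with the file names from this thread: `failed = extract(usas_from_id_file(open("bhits", "rb").read()), "multifasta68.fasta")`.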

Now, we have a second file called 'bhits' (697 sequence IDs). This
file has exactly the same format as 'ahits', but when we try to extract
the listed sequences, we get the following:

for seq in `cat bhits`; do seqret -filter staphyl68-id:$seq; done

Died: seqret terminated: Bad value for '-sequence' with -auto defined
'rror: Unable to read sequence 'staphyl68-id:FLTU7OB01AJHZO
(one error per sequence ID)

This should not happen. Why? I know that the sequence identifiers in 'bhits'
are present in the original .fna file, in the EMBL .dat file and also in the *.xid index:

cat 068_2023_454Reads.fna | grep FLTU7OB01AJHZO
>FLTU7OB01AJHZO length=276 xy=0104_3906 region=1 run=R_2008_12_11_14_44_02_

cat staphyl68.dat | grep FLTU7OB01AJHZO
ID   FLTU7OB01AJHZO; SV 1; linear; unassigned DNA; STD; UNC; 276 BP.

strings staphyl68.xid | grep -i FLTU7OB01AJHZO
fltu7ob01ajhzo

In addition, if I try that single identifier on its own, it works:
seqret staphyl68-id:FLTU7OB01AJHZO
Reads and writes (returns) sequences
output sequence(s) [fltu7ob01ajhzo.fasta]:
cat fltu7ob01ajhzo.fasta
>FLTU7OB01AJHZO FLTU7OB01AJHZO.1 length=276 xy=0104_3906 region=1 run=R_2008_12_11_14_44_02_
TCGAATGATTAATCTTGAAAATAAAACCTTCGTAATTATGGGTATTGCTAATAAACGTAG
TATCGGATTTGGCGTTGCAAAGGTATTAGATCAATTAGGGGCTAAACTTGTTTTCACTTA
TCGTAAAGACCGTAGCCGCAAAGAATTAGAAAAATTATTAGAACAATTAAACCAAGAAGA
GCCAAAATTATATCAAATCGATGTTCAAAAAGATGAAGATGTAGTAAATGGTTTTGCTAA
AATTGGCGAAGAAGTAGGCAATATTGATGGCGTATA


So my question is:

Why does seqret in filter mode fail when invoked inside the for loop while
the standalone run works, and why does the problem affect only 'bhits'
and not 'ahits'?

Thanks for any answers.

GM


-- 
--
George Magklaras BSc Hons MPhil
RHCE:805008309135525

Senior Computer Systems Engineer/UNIX-Linux Systems Administrator
EMBnet Technical Management Board
The Biotechnology Centre of Oslo,
University of Oslo
http://folk.uio.no/georgios


From georgios at biotek.uio.no  Wed Jan 28 13:47:40 2009
From: georgios at biotek.uio.no (George Magklaras)
Date: Wed, 28 Jan 2009 14:47:40 +0100
Subject: [EMBOSS] db formatting (?) and parsing issue -- emboss version
 5.0.0
In-Reply-To: <49805B17.5010203@ebi.ac.uk>
References: <49803ABE.6080809@biotek.uio.no> <49805B17.5010203@ebi.ac.uk>
Message-ID: <498061FC.6010101@biotek.uio.no>

Hi Peter, thanks for your reply.

Certainly:

1) For the failed run (for seq in `cat bhits`; do seqret -debug -filter
staphyl68-id:$seq; done), seqret.dbg contains:

Debug file seqret.dbg buffered:No
ajAcdInitP pgm 'seqret' package ''
ajFileNewIn '/site/share/EMBOSS/acd/seqret.acd'
EOF ajFileGetsL file /site/share/EMBOSS/acd/seqret.acd
closing file '/site/share/EMBOSS/acd/seqret.acd'
ajFileNewIn '/site/share/EMBOSS/acd/codes.english'
EOF ajFileGetsL file /site/share/EMBOSS/acd/codes.english
closing file '/site/share/EMBOSS/acd/codes.english'
ajTableNewFunctionLen hint 25 size 251
ajTableNewFunctionLen hint 25 size 251
ajTableNewFunctionLen hint 25 size 251
ajFileNewIn '/site/share/EMBOSS/acd/knowntypes.standard'
EOF ajFileGetsL file /site/share/EMBOSS/acd/knowntypes.standard
closing file '/site/share/EMBOSS/acd/knowntypes.standard'
Set acdprotein value '$(sequence.protein)'
ajSeqinClear called
' 0..0(N) '' 0  'staphyl68-id:FLTU7OB01AHJ67
'SA to test: 'staphyl68-id:FLTU7OB01AHJ67

format regexp: No list:No
no format specified in USA

...input format not set
dbname dbexp: Yes
'ound dbname 'staphyl68' level: 'id' qry->QryString: 'FLTU7OB01AHJ67
' Field 'id'ng 'FLTU7OB01AHJ67
' acc '' sv '' gi '' des '' org '' key ''
no wildcard in stored qry
database type: 'N' format 'embl'
use access method 'emboss'
Matched seqAccess[1] 'emboss'
seqAccessEmboss type 1
' acc '' hasacc:Yess/u4/tjonasse/mrsa/454/068_reads/' entry 'fltu7ob01ahj67
ajFileNewIn '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid'
EOF ajFileGetsL file /div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid
closing file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid'
ajFileNewIn '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent'
EOF ajFileGetsL file /div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent
closing file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent'
' acc: '' hasacc:Yesahj67
B+tree Entry failed
' not foundtry id:'fltu7ob01ahj67
seqEmbossQryClose clean up qryd
Database 'staphyl68' : access method 'emboss' failed
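Several lines in this failed log look mangled, for example "'SA to test:" where the successful run prints "USA to test:". That pattern is what a terminal shows when a string contains a carriage return: the cursor jumps back to column 0 and the rest of the line overwrites the beginning. The Python sketch below simulates this; `render_cr` is an illustrative helper, not EMBOSS code. Applied to "Error: Unable to read sequence 'staphyl68-id:FLTU7OB01AJHZO\r'", it likewise yields the "'rror: ..." line from the for loop.

```python
def render_cr(line: str) -> str:
    """Return what a terminal displays for `line`.

    A carriage return ('\\r') moves the cursor back to column 0 without
    clearing the line, so the characters that follow overwrite the start.
    """
    screen: list[str] = []
    col = 0
    for ch in line:
        if ch == "\r":
            col = 0
            continue
        if col < len(screen):
            screen[col] = ch
        else:
            screen.append(ch)
        col += 1
    return "".join(screen)


# An ID read from a file with a DOS line ending keeps its trailing '\r'.
logged = "USA to test: 'staphyl68-id:FLTU7OB01AHJ67\r'"
print(render_cr(logged))
# prints: 'SA to test: 'staphyl68-id:FLTU7OB01AHJ67
```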

2) For the successful standalone run (seqret -debug
staphyl68-id:FLTU7OB01AHJ67), seqret.dbg contains:
Debug file seqret.dbg buffered:No
ajAcdInitP pgm 'seqret' package ''
ajFileNewIn '/site/share/EMBOSS/acd/seqret.acd'
EOF ajFileGetsL file /site/share/EMBOSS/acd/seqret.acd
closing file '/site/share/EMBOSS/acd/seqret.acd'
ajFileNewIn '/site/share/EMBOSS/acd/codes.english'
EOF ajFileGetsL file /site/share/EMBOSS/acd/codes.english
closing file '/site/share/EMBOSS/acd/codes.english'
ajTableNewFunctionLen hint 25 size 251
ajTableNewFunctionLen hint 25 size 251
ajTableNewFunctionLen hint 25 size 251
ajFileNewIn '/site/share/EMBOSS/acd/knowntypes.standard'
EOF ajFileGetsL file /site/share/EMBOSS/acd/knowntypes.standard
closing file '/site/share/EMBOSS/acd/knowntypes.standard'
Set acdprotein value '$(sequence.protein)'
ajSeqinClear called
++seqUsaProcess 'staphyl68-id:FLTU7OB01AHJ67' 0..0(N) '' 0
USA to test: 'staphyl68-id:FLTU7OB01AHJ67'

format regexp: No list:No
no format specified in USA

...input format not set
dbname dbexp: Yes
found dbname 'staphyl68' level: 'id' qry->QryString: 'FLTU7OB01AHJ67'
   db QryString 'FLTU7OB01AHJ67' Field 'id'
ajSeqQueryWild id 'FLTU7OB01AHJ67' acc '' sv '' gi '' des '' org '' key ''
no wildcard in stored qry
database type: 'N' format 'embl'
use access method 'emboss'
Matched seqAccess[1] 'emboss'
seqAccessEmboss type 1
directory '/div/dias/u4/tjonasse/mrsa/454/068_reads/' entry 'fltu7ob01ahj67' acc '' hasacc:Yes
ajFileNewIn '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid'
EOF ajFileGetsL file /div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid
closing file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid'
ajFileNewIn '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent'
EOF ajFileGetsL file /div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent
closing file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent'
entry id: 'fltu7ob01ahj67' acc: '' hasacc:Yes
ajFileNewIn '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat'
seqEmbossQryClose clean up qryd
seqRead: cleared
seqRead: seqin format 3 'embl'
seqRead: one format specified
ajFileBuffNobuff /div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat buffsize: 0
++seqRead known format 3
++seqReadFmt format 3 (embl) 'staphyl68-id:FLTU7OB01AHJ67' feat No
seqReadEmbl first line 'ID   FLTU7OB01AHJ67; SV 1; linear; unassigned DNA; STD; UNC; 184 BP.
'
seqReadEmbl ID line found
seqSetName word 'FLTU7OB01AHJ67'
seqSetName 'FLTU7OB01AHJ67' result: 'FLTU7OB01AHJ67'
ajTableNewFunctionLen hint 4 size 251
ajTableNewFunctionLen hint 4 size 251
ajTableNewFunctionLen hint 4 size 251
ajTableNewFunctionLen hint 4 size 251
ajFileBuffClear (0) Nobuff: Yes
size 0: Lines: 0 Curr: 0  Prev: 0 Last: 0 Free: 0 Freelast: 0
ajFileBuffClear '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat' (0 lines)
      Y size: 0 pos: 0 removed 0 lines add to free: 0
Trace buffer file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat'
              Pos: 0 Size: 0 FreeSize: 0 Fpos: 153477365 End: N
  Free: 0 Last: -1
seqReadFmt success with format 3 (embl)
seqQueryMatch 'FLTU7OB01AHJ67' id 'fltu7ob01ahj67' acc '' Sv '' Gi '' Des '' Key '' Org '' Case No Done Yes
seqTypeSet 'N'
ajSeqTypeCheckIn type 'gapany' found (any valid sequence with gaps)
Convert gaps to '-'
ajSeqTypeCheckIn: bad characters test passed, convert
Convert '?' to 'X'
ajSeqTypeCheckIn: OK - no badchars
seqDefine: thys->Db 'staphyl68', seqin->Db 'staphyl68'
seqDefine: thys->Name 'FLTU7OB01AHJ67' type: N
seqDefine: thys->Entryname 'FLTU7OB01AHJ67', seqin->Entryname ''
seqDefine: returns thys->Name 'FLTU7OB01AHJ67' type: N
++ajSeqallread set db: 'staphyl68' => 'staphyl68'
ajSeqallGetName ''
ajSeqIsNuc Type 'N'
ajSeqIsNuc Type 'N'
ajSeqIsProt Type 'N'
ajSeqallGetUsa 'staphyl68-id:FLTU7OB01AHJ67'
ajSeqallGetseqName 'FLTU7OB01AHJ67'
... output format not set, default to 'fasta'
ajSeqoutClear called
... output format not set, default to 'fasta'
ajSeqoutOpen dir '' qrydir ''
seqoutUsaProcess
output USA to test: 'fltu7ob01ahj67.fasta'

format regexp: No
no format specified in USA

file:id regexp: Yes
found filename fltu7ob01ahj67.fasta single: No dir: ''
ajFileNewOutD('' 'fltu7ob01ahj67.fasta')
ajFileNewOutD open name 'fltu7ob01ahj67.fasta'

ajSeqSetRange (len: 184 0..0 old 0..0) rev:No reversed:No
       result: (len: 184 0..0)
ajSeqoutWriteSeq 'FLTU7OB01AHJ67' len: 184
ajSeqoutWriteSeq 17 'fasta' single: No feat: No Save: No
seqClone out Setdb '' Db '' seq Setdb '<null>' Db 'staphyl68'
seqClone outseq->Type '' seq->Type 'N'
seqClone 0 .. 0 1 .. 184 len: 184 type: 'N'
   Db: 'staphyl68' Name: 'FLTU7OB01AHJ67' Entryname: 'FLTU7OB01AHJ67'
ajSeqTypeCheckS type 'gapany' found (any valid sequence with gaps)
Convert gaps to '-'
Convert '?' to 'X'
ajSeqoutSetNameDefaultS already has a name 'FLTU7OB01AHJ67'
seqWriteFasta outseq Db 'staphyl68' Setdb '' Setoutdb '' Name 'FLTU7OB01AHJ67'
seqoutUfoLocal Features No Ufo 0 ''
ajSeqoutWriteSeq tests features No tabouitisopen No UfoLocal No ftlocal No
ajSeqRead: input file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat' still there, try again
seqRead: cleared
seqRead: single access - count 1 - call access routine again
seqAccessEmboss type 1
seqEmbossQryReuse: query data all finished
seqRead: seqin->Query->Access->Access(seqin) *failed*
ajSeqRead: open buffer  usa: 'staphyl68-id:FLTU7OB01AHJ67' returns: No
ajSeqallNext failed
ajSeqinClear called
ajFileBuffClear (-1) Nobuff: Yes
size 0: Lines: 0 Curr: 0  Prev: 0 Last: 0 Free: 0 Freelast: 0
ajFileBuffClear '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat' (-1 lines)
      Y size: 0 pos: 0 removed 0 lines add to free: 0
Trace buffer file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat'
              Pos: 0 Size: 0 FreeSize: 0 Fpos: 153477365 End: N
  Free: 0 Last: -1
closing file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat'
ajSeqoutClose 'fltu7ob01ahj67.fasta'
closing file 'fltu7ob01ahj67.fasta'
ajSeqinDel called usa:''
ajSeqQueryDel db:'' id:''

Final Summary
=============

Table usage : 11 opened, 0 closed, 251 maxsize, 40 maxmem
List usage : 27 opened, 27 closed, 1438 maxsize 2380 nodes
List iterator usage : 4 opened, 4 closed
File usage : 1 opened, 9 closed, 3 max, 10 total
ajNamExit done
Regexp usage (bytes): 168 allocated, 1008 freed, -840 in use (sizes change)
Regexp usage (number): 21 allocated, 21 freed 0 in use
Array usage (bytes): 0 allocated, 0 freed, 0 in use
Array usage (number): 0 allocated, 0 freed, 0 resized, 0 in use
Array usage 2D (bytes): 0 allocated, 0 freed, 0 in use
Array usage 2D (number): 0 allocated, 0 freed, 0 resized, 0 in use
Array usage 3D (bytes): 0 allocated, 0 freed, 0 in use
Array usage 3D (number): 0 allocated, 0 freed, 0 resized, 0 in use
String usage (bytes): 268013 allocated, 268270 freed, -257 in use
String usage (number): 4982 allocated, 4979 freed 3 in use
Memory usage (bytes): 535329 allocated, 640 reallocated 503881 zeroed
Memory usage (number): 14393 allocates, 14405 frees, 10 resizes, -12 in use
closing file 'seqret.dbg'


3) The staphyl68.pxid file contains:
Order     60
Fill      42
Pagesize  2048
Level     2
Cachesize 200
Order2    82
Fill2     99
Count     288506
Kwlimit   15


In addition, the database definition and resource record I defined for
staphyl68 in my local .embossrc file are the following (they should
accommodate the length of the ID field, shouldn't they?):

DB staphyl68 [
         type: N
         method: emboss
         format: embl
         fields: "id,des"
         file: staphyl68.dat
         indexdirectory: /div/dias/u4/tjonasse/mrsa/454/068_reads/
         comment: "mrsa staphyl68 reads"
]

RES staphyl68 [
    type: Index
    idlen:  20
    deslen: 50
]


Best regards,
GM


Peter Rice wrote:
> Hi George,
> 
>> Why does the filter mode seqret invoked inside the for loop fails and 
>> this one works, and the problem does not exist for the 'afile' but 
>> only the 'bfile'?
> 
> Can you add "-debug" to the seqret commandline and send me the
> seqret.dbg file (it will be for the last seqret run so you'll need some
> way to make sure the last run failed)
> 
> and also send the seqret.dbg file for running seqret standalone with the
> same ID that worked.
> 
> It would also be useful to see the .pxid file for the staphyl68 database
> (it includes the length of ID that was indexed - your IDs are quite long
> for dbxflat)
> 
> regards,
> 
> Peter
> 

-- 


From georgios at biotek.uio.no  Wed Jan 28 14:15:43 2009
From: georgios at biotek.uio.no (George Magklaras)
Date: Wed, 28 Jan 2009 15:15:43 +0100
Subject: [EMBOSS] db formatting (?) and parsing issue -- emboss version
 5.0.0
In-Reply-To: <498063B0.1010207@ebi.ac.uk>
References: <49803ABE.6080809@biotek.uio.no> <498063B0.1010207@ebi.ac.uk>
Message-ID: <4980688F.2010209@biotek.uio.no>

Indeed, a \r\n (DOS line ending) was to blame. I didn't spot it with od
because there was only one instance, at the beginning of the file, rather
than one on every line. dos2unix to the rescue, and we are back in business.
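A carriage return that is not on every line is easy to miss in a partial od dump. A hedged Python sketch that scans an entire ID file and lists every line containing a control character other than tab, together with a CRLF-to-LF conversion comparable to what dos2unix does by default; `find_bad_lines` and `crlf_to_lf` are hypothetical names.

```python
import re
from typing import List

# Control characters other than tab (0x09) and newline (0x0a); this includes '\r' (0x0d).
CONTROL_CHARS = re.compile(rb"[\x00-\x08\x0b-\x1f\x7f]")


def find_bad_lines(data: bytes) -> List[int]:
    """Return the 1-based numbers of lines that contain a control character."""
    return [
        number
        for number, line in enumerate(data.split(b"\n"), start=1)
        if CONTROL_CHARS.search(line)
    ]


def crlf_to_lf(data: bytes) -> bytes:
    """Replace DOS line endings (CR LF) with Unix ones (LF); lone CRs are left alone."""
    return data.replace(b"\r\n", b"\n")
```

For example, `find_bad_lines(open("bhits", "rb").read())` lists the offending lines before the IDs are fed to seqret.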

Cheers Peter!

GM

Peter Rice wrote:
> Hi George,
>>  Now, then, we have a second file called 'bhits' (697 sequences). This
>> file has exactly the same format as 'ahits', but when we try to 
>> extract the identified sequences, we get the following:
>>
>> for seq in `cat bhits`; do seqret -filter staphyl68-id:$seq; done
>>
>> Died: seqret terminated: Bad value for '-sequence' with -auto defined
>> 'rror: Unable to read sequence 'staphyl68-id:FLTU7OB01AJHZO
>> (one error per sequence ID)
> 
> Umm ... does the message really start with 'rror'?
> 
> That suggests some non-printing character is involved in the ID. Have 
> you checked bhits does not have any strange characters?
> 
> The error message should be:
> 
> Error: Unable to read sequence 'staphyl68-id:FLTU7OB01AJHZO'
> 
> So something at the end of the ID seems to have moved the final quote to
> the start of the line.
> 
> I can get the same effect by using noreturn -system pc to change the 
> carriage control characters in bhits.
> 
> I suspect that is the cause of your problem.
> 
> Let me know if that doesn't solve it.
> 
> regards,
> 
> Peter
> 


From staffa at niehs.nih.gov  Thu Jan 29 15:45:32 2009
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Thu, 29 Jan 2009 10:45:32 -0500
Subject: [EMBOSS] EMBOSS/Jemboss
In-Reply-To: <4980688F.2010209@biotek.uio.no>
Message-ID: <C5A7394C.CE99%staffa@niehs.nih.gov>

We are working hammer and tongs to make EMBOSS and Jemboss available
institute-wide as a substitute for the GCG package, and to make it as much
like SeqLab as possible.
Has anyone there succeeded in setting up a client-server configuration
with a Jemboss client on Mac OS X and a Unix server?