From charles-listes-emboss at plessy.org Sat Jan 10 00:29:46 2009
From: charles-listes-emboss at plessy.org (Charles Plessy)
Date: Sat, 10 Jan 2009 14:29:46 +0900
Subject: [EMBOSS] Please update the patch in ftp://emboss.open-bio.org/pub/EMBOSS/fixes/patches/
Message-ID: <20090110052946.GA3077@kunpuu.plessy.org>

Dear EMBOSS developers,

I am using the patches in ftp://emboss.open-bio.org/pub/EMBOSS/fixes/patches/
to produce up-to-date Debian packages. I noticed that there are fixes in the
parent directory that are not present in the patch. Could you update it?

Have a nice day,

--
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan

From ajb at ebi.ac.uk Sat Jan 10 06:17:36 2009
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Sat, 10 Jan 2009 11:17:36 -0000 (GMT)
Subject: [EMBOSS] Please update the patch in ftp://emboss.open-bio.org/pub/EMBOSS/fixes/patches/
In-Reply-To: <20090110052946.GA3077@kunpuu.plessy.org>
References: <20090110052946.GA3077@kunpuu.plessy.org>
Message-ID: <50394.86.9.126.186.1231586256.squirrel@webmail.ebi.ac.uk>

Hello Charles,

The patch file is there now (a casualty of the holidays).

It also corrects the copying of 4 data files to the installation
directories (affecting featcopy, infobase, inforesidue and trimspace).
Those Makefile changes are a little tricky to represent in the 'fixes'
directory as some files have the same name. So, that's a work in progress.

Alan

> Dear EMBOSS developers,
>
> I am using the patches in
> ftp://emboss.open-bio.org/pub/EMBOSS/fixes/patches/
> to produce up-to-date Debian packages. I noticed that there are fixes in
> the
> parent directory that are not present in the patch. Could you update it?
>
> Have a nice day,
>
> --
> Charles Plessy
> Debian Med packaging team,
> http://www.debian.org/devel/debian-med
> Tsurumi, Kanagawa, Japan
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
>

From jeedward at yahoo.com Fri Jan 16 16:57:55 2009
From: jeedward at yahoo.com (John Edward)
Date: Fri, 16 Jan 2009 13:57:55 -0800 (PST)
Subject: [EMBOSS] BCBGC-09 final call for papers
Message-ID: <317305.17140.qm@web45906.mail.sp1.yahoo.com>

BCBGC-09 final call for papers

The 2009 International Conference on Bioinformatics, Computational Biology,
Genomics and Chemoinformatics (BCBGC-09) (website:
http://www.PromoteResearch.org ) will be held during July 13-16 2009 in
Orlando, FL, USA. We invite draft paper submissions. The conference will take
place at the same time and venue where several other international conferences
are taking place. The other conferences include:

- International Conference on Artificial Intelligence and Pattern Recognition (AIPR-09)
- International Conference on Automation, Robotics and Control Systems (ARCS-09)
- International Conference on Enterprise Information Systems and Web Technologies (EISWT-09)
- International Conference on High Performance Computing, Networking and Communication Systems (HPCNCS-09)
- International Conference on Information Security and Privacy (ISP-09)
- International Conference on Recent Advances in Information Technology and Applications (RAITA-09)
- International Conference on Software Engineering Theory and Practice (SETP-09)
- International Conference on Theory and Applications of Computational Science (TACS-09)
- International Conference on Theoretical and Mathematical Foundations of Computer Science (TMFCS-09)

The website http://www.PromoteResearch.org contains more details.

Sincerely
John Edward
Publicity committee

From charles-listes-emboss at plessy.org Sun Jan 18 20:42:02 2009
From: charles-listes-emboss at plessy.org (Charles Plessy)
Date: Mon, 19 Jan 2009 10:42:02 +0900
Subject: [EMBOSS] jemboss
In-Reply-To: <21e884180809010804p34882dc2g9b0097162ff68f2e@mail.gmail.com>
References: <21e884180808281625m6f6fde4ci2932ed82a202c642@mail.gmail.com>
 <55740.86.9.126.186.1219996803.squirrel@webmail.ebi.ac.uk>
 <20080829090845.GG15089@kunpuu.plessy.org>
 <21e884180808290821r4da88fc7p542568a6e3589760@mail.gmail.com>
 <20080830015014.GB19735@kunpuu.plessy.org>
 <21e884180809010804p34882dc2g9b0097162ff68f2e@mail.gmail.com>
Message-ID: <20090119014202.GB9537@kunpuu.plessy.org>

On Mon, Sep 01, 2008 at 12:04:28PM -0300, Beny Spira wrote:
>
> I am not sure about which java is installed by default in Debian, as there
> appears to be more than one (gcj, eclipse and now sun's jre and jdk).
> Is there anything else that may be done to install Jemboss?

Dear Beny,

I prepared an experimental package for jEMBOSS that uses OpenJDK. It is
available from the following URL:

http://packages.debian.org/experimental/jemboss

This package is not yet high quality; I welcome all comments to improve it.
For the moment all that is done is to collect the files installed by
'make -C jemboss install' and to package them separately.

Have a nice day,

--
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan

From scott at cs.wits.ac.za Mon Jan 19 08:23:49 2009
From: scott at cs.wits.ac.za (Scott Hazelhurst)
Date: Mon, 19 Jan 2009 15:23:49 +0200
Subject: [EMBOSS] Nthseq issue
Message-ID:

I don't know whether this is a bug or a feature, but I discovered that
nthseq skips empty sequences in its counting. So if you have 10 sequences
and the fifth is empty, then nthseq -number 6 actually returns the 7th
sequence. It does print out a warning that the sequence is empty but not
that it's skipping (and also if you are putting this in a pipeline you
wouldn't see it). I couldn't see any documentation on this.

I found this problem in a data set from some collaborators: we ran dust and
then used biosed to remove Ns. Obviously this makes some sequences not
usable. While it is understandable why nthseq behaves in the way it does,
the problem is that in an automated setup it may be difficult to do the
adjustment.

Regards

Scott

This communication is intended for the addressee only. It is confidential. If you have received this communication in error, please notify us immediately and destroy the original message. You may not copy or disseminate this communication without the permission of the University. Only authorized signatories are competent to enter into agreements on behalf of the University and recipients are thus advised that the content of this message may not be legally binding on the University and may contain the personal views and opinions of the author, which are not necessarily the views and opinions of The University of the Witwatersrand, Johannesburg. All agreements between the University and outsiders are subject to South African Law unless the University agrees in writing to the contrary.
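
[Editorial sketch, not part of the original thread.] One way an automated
pipeline might sidestep the counting behaviour described in the message above
is to select the Nth record itself, counting every header line whether or not
sequence data follows it. The helper name nth_record is hypothetical (it is
not an EMBOSS program), and the sketch assumes plain FASTA input in which each
record starts with a '>' line.

```shell
# nth_record N FILE: print the Nth FASTA record of FILE.
# Every header line counts as one record, including records that have
# no sequence lines at all. Hypothetical helper, not part of EMBOSS.
nth_record() {
    awk -v n="$1" '
        /^>/       { count++ }   # each header line starts a new record
        count == n { print }     # emit the header and body of record n
        count > n  { exit }      # stop reading once record n has ended
    ' "$2"
}
```

With this, `nth_record 6 reads.fasta` (reads.fasta being a placeholder file
name) returns the sixth entry even when the fifth is empty; an empty Nth
record comes back as a bare header line, which the caller can test for.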

From pmr at ebi.ac.uk Thu Jan 22 03:35:50 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 22 Jan 2009 08:35:50 +0000
Subject: [EMBOSS] Nthseq issue
In-Reply-To:
References:
Message-ID: <49782FE6.40803@ebi.ac.uk>

Scott Hazelhurst wrote:
>
> I don't know whether this is a bug or a feature, but I discovered that
> nthseq skips empty sequences in its counting. So if you have 10 sequences
> and the fifth is empty, then nthseq -number 6 actually returns the 7th
> sequence. It does print out a warning that the sequence is empty but not
> that it's skipping (and also if you are putting this in a pipeline you
> wouldn't see it). I couldn't see any documentation on this.
>
> I found this problem in a data set from some collaborators: we ran dust and
> then used biosed to remove Ns. Obviously this makes some sequences not
> usable. While it is understandable why nthseq behaves in the way it does,
> the problem is that in an automated setup it may be difficult to do the
> adjustment.

We will take a look. Zero-length sequences are routinely ignored in EMBOSS.
We will check whether it is possible to use an alternative method for
counting in nthseq and any other application that counts input sequences.

Of course, if the nth sequence is empty nthseq would have to return a
failure to read it.

regards,

Peter Rice

From jeedward at yahoo.com Fri Jan 23 14:41:54 2009
From: jeedward at yahoo.com (John Edward)
Date: Fri, 23 Jan 2009 11:41:54 -0800 (PST)
Subject: [EMBOSS] Final call for papers: BCBGC-09
Message-ID: <326404.38388.qm@web45907.mail.sp1.yahoo.com>

Final call for papers: BCBGC-09

The 2009 International Conference on Bioinformatics, Computational Biology,
Genomics and Chemoinformatics (BCBGC-09) (website:
http://www.PromoteResearch.org ) will be held during July 13-16 2009 in
Orlando, FL, USA. We invite draft paper submissions. The conference will take
place at the same time and venue where several other international conferences
are taking place. The other conferences include:

- International Conference on Artificial Intelligence and Pattern Recognition (AIPR-09)
- International Conference on Automation, Robotics and Control Systems (ARCS-09)
- International Conference on Enterprise Information Systems and Web Technologies (EISWT-09)
- International Conference on High Performance Computing, Networking and Communication Systems (HPCNCS-09)
- International Conference on Information Security and Privacy (ISP-09)
- International Conference on Recent Advances in Information Technology and Applications (RAITA-09)
- International Conference on Software Engineering Theory and Practice (SETP-09)
- International Conference on Theory and Applications of Computational Science (TACS-09)
- International Conference on Theoretical and Mathematical Foundations of Computer Science (TMFCS-09)

The website http://www.PromoteResearch.org contains more details.

Sincerely
John Edward
Publicity committee

From georgios at biotek.uio.no Wed Jan 28 06:00:14 2009
From: georgios at biotek.uio.no (George Magklaras)
Date: Wed, 28 Jan 2009 12:00:14 +0100
Subject: [EMBOSS] db formatting (?) and parsing issue -- emboss version 5.0.0
Message-ID: <49803ABE.6080809@biotek.uio.no>

Hi list,

We are still at emboss 5.0.0 (plus patches). We have a problem using seqret
to parse normal IDs from a file that we cannot understand.
Here is the story with details:

I have an .fna file from a 454 read in fasta format, that goes typically
like this :

>FLTU7OB01CIMST length=234 xy=0915_0859 region=1 run=R_2008_12_11_14_44_02_
TTTATTATTTAATCAATAATAAAGTGCTTTAGTCAAATCGTGATGTTTCAATTATTAACA
AGTTTATTATTTCTTCATTTTACCATAATACGCTTCAAAACGTCGATGAACATATGAATT
TGAGGGATTTTTGTAACCAGGTTTTATTTTTTAAAAATCATTAAAAAATGGTGAAGTTTC
TCGAATATCGTGTTCAAAATTCAATTCCGAAATAAGTCGCCCCTAATCTGATGA
>FLTU7OB01DL726 length=211 xy=1366_0736 region=1 run=R_2008_12_11_14_44_02_
AAACAGATAGTCAGTATTGAATTACTTTATGTAGAGCCACAATTTAGAAACAGAGGTTTA
GCTACTATACTGAAGTGTGGTATTGAGACTTGGGCAAAAAGTATAAAAGCGAAACAAATC
ATTAGTACAGTACATAAAGACAACGTGACAATGATATCATTGAACAAGCGGTTAGGGTAT
CAATTAAGTCACGTGAAAATGTATAAAGATA
....

Length is:

cat 068_2023_454Reads.fna | grep ^">" | wc -l
288507

I convert this file to EMBL format using seqret and I get a properly
formatted file with the same number of sequence entries:

cat staphyl68.dat | grep "^ID" | wc -l
288507

I now make a btree index of the id field with dbxflat:

$ dbxflat
Database b+tree indexing for flat file databases
Basename for index files: staphyl68
Resource name: staphyl68
      EMBL : EMBL
     SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
        GB : Genbank, DDBJ
    REFSEQ : Refseq
Entry format [SWISS]: EMBL
....
Index fields [id,acc]: id
Processing file ./staphyl68.dat

(resource records and db defs also OK)

That seems to produce the right number of files:

tjonasse at dias ~/mrsa/454/068_reads $ ls staphyl68.*
staphyl68.dat  staphyl68.ent  staphyl68.pxid  staphyl68.xid

And here starts the problem: We have an input text file 'ahits' with
sequence IDs per line:

FLTU7OB01AH8CG
FLTU7OB01ASKRR
FLTU7OB01AUXQJ
FLTU7OB01DSL0N
FLTU7OB01BB9NP

(no fancy control characters, checking with od:

0000000   F   L   T   U   7   O   B   0   1   A   H   8   C   G  \n   F
0000020   L   T   U   7   O   B   0   1   A   S   K   R   R  \n   F   L
0000040   T   U   7
)

We extract the 'ahits' sequences (1000 sequences) from the emboss database
by doing simply:

for seq in `cat ahits`; do seqret -filter staphyl68-id:$seq; done > multifasta68.fasta

And that produces exactly a 1000 seq multifasta file.

Now, then, we have a second file called 'bhits' (697 sequences). This file
has exactly the same format as 'ahits', but when we try to extract the
identified sequences, we get the following:

for seq in `cat bhits`; do seqret -filter staphyl68-id:$seq; done

Died: seqret terminated: Bad value for '-sequence' with -auto defined
'rror: Unable to read sequence 'staphyl68-id:FLTU7OB01AJHZO
(one error per sequence ID)

This is wrong. Why? I know that the seq identifiers of 'bhits' are in the
original fna file, the .dat EMBL file and also on the *.xid entry:

cat 068_2023_454Reads.fna | grep FLTU7OB01AJHZO
>FLTU7OB01AJHZO length=276 xy=0104_3906 region=1 run=R_2008_12_11_14_44_02_

cat staphyl68.dat | grep FLTU7OB01AJHZO
ID   FLTU7OB01AJHZO; SV 1; linear; unassigned DNA; STD; UNC; 276 BP.

strings staphyl68.xid | grep -i FLTU7OB01AJHZO
fltu7ob01ajhzo

In addition, if I try the single identifier on its own, it works:

seqret staphyl68-id:FLTU7OB01AJHZO
Reads and writes (returns) sequences
output sequence(s) [fltu7ob01ajhzo.fasta]:

cat fltu7ob01ajhzo.fasta
>FLTU7OB01AJHZO FLTU7OB01AJHZO.1 length=276 xy=0104_3906 region=1 run=R_2008_12_11_14_44_02_
TCGAATGATTAATCTTGAAAATAAAACCTTCGTAATTATGGGTATTGCTAATAAACGTAG
TATCGGATTTGGCGTTGCAAAGGTATTAGATCAATTAGGGGCTAAACTTGTTTTCACTTA
TCGTAAAGACCGTAGCCGCAAAGAATTAGAAAAATTATTAGAACAATTAAACCAAGAAGA
GCCAAAATTATATCAAATCGATGTTCAAAAAGATGAAGATGTAGTAAATGGTTTTGCTAA
AATTGGCGAAGAAGTAGGCAATATTGATGGCGTATA

so, my question is:

Why does the filter mode seqret invoked inside the for loop fails and this
one works, and the problem does not exist for the 'afile' but only the
'bfile'?

Thanks for any answers.

GM

--
--
George Magklaras BSc Hons MPhil
RHCE:805008309135525
Senior Computer Systems Engineer/UNIX-Linux Systems Administrator
EMBnet Technical Management Board
The Biotechnology Centre of Oslo, University of Oslo
http://folk.uio.no/georgios

From georgios at biotek.uio.no Wed Jan 28 08:47:40 2009
From: georgios at biotek.uio.no (George Magklaras)
Date: Wed, 28 Jan 2009 14:47:40 +0100
Subject: [EMBOSS] db formatting (?)
and parsing issue -- emboss version 5.0.0 In-Reply-To: <49805B17.5010203@ebi.ac.uk> References: <49803ABE.6080809@biotek.uio.no> <49805B17.5010203@ebi.ac.uk> Message-ID: <498061FC.6010101@biotek.uio.no> Hi Peter, thanks for your reply Certainly: 1)For the failed run (for seq in `cat bhits`; do seqret -debug -filter staphyl68-id:$seq; done ) the seqret.dbg contains: Debug file seqret.dbg buffered:No ajAcdInitP pgm 'seqret' package '' ajFileNewIn '/site/share/EMBOSS/acd/seqret.acd' EOF ajFileGetsL file /site/share/EMBOSS/acd/seqret.acd closing file '/site/share/EMBOSS/acd/seqret.acd' ajFileNewIn '/site/share/EMBOSS/acd/codes.english' EOF ajFileGetsL file /site/share/EMBOSS/acd/codes.english closing file '/site/share/EMBOSS/acd/codes.english' ajTableNewFunctionLen hint 25 size 251 ajTableNewFunctionLen hint 25 size 251 ajTableNewFunctionLen hint 25 size 251 ajFileNewIn '/site/share/EMBOSS/acd/knowntypes.standard' EOF ajFileGetsL file /site/share/EMBOSS/acd/knowntypes.standard closing file '/site/share/EMBOSS/acd/knowntypes.standard' Set acdprotein value '$(sequence.protein)' ajSeqinClear called ' 0..0(N) '' 0 'staphyl68-id:FLTU7OB01AHJ67 'SA to test: 'staphyl68-id:FLTU7OB01AHJ67 format regexp: No list:No no format specified in USA ...input format not set dbname dbexp: Yes 'ound dbname 'staphyl68' level: 'id' qry->QryString: 'FLTU7OB01AHJ67 ' Field 'id'ng 'FLTU7OB01AHJ67 ' acc '' sv '' gi '' des '' org '' key '' no wildcard in stored qry database type: 'N' format 'embl' use access method 'emboss' Matched seqAccess[1] 'emboss' seqAccessEmboss type 1 ' acc '' hasacc:Yess/u4/tjonasse/mrsa/454/068_reads/' entry 'fltu7ob01ahj67 ajFileNewIn '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid' EOF ajFileGetsL file /div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid closing file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid' ajFileNewIn '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent' EOF ajFileGetsL file 
/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent closing file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent' ' acc: '' hasacc:Yesahj67 B+tree Entry failed ' not foundtry id:'fltu7ob01ahj67 seqEmbossQryClose clean up qryd Database 'staphyl68' : access method 'emboss' failed 2)For the standalone successful run (seqret -debug staphyl68-id:FLTU7OB01AHJ67), seqret.dbg states: Debug file seqret.dbg buffered:No ajAcdInitP pgm 'seqret' package '' ajFileNewIn '/site/share/EMBOSS/acd/seqret.acd' EOF ajFileGetsL file /site/share/EMBOSS/acd/seqret.acd closing file '/site/share/EMBOSS/acd/seqret.acd' ajFileNewIn '/site/share/EMBOSS/acd/codes.english' EOF ajFileGetsL file /site/share/EMBOSS/acd/codes.english closing file '/site/share/EMBOSS/acd/codes.english' ajTableNewFunctionLen hint 25 size 251 ajTableNewFunctionLen hint 25 size 251 ajTableNewFunctionLen hint 25 size 251 ajFileNewIn '/site/share/EMBOSS/acd/knowntypes.standard' EOF ajFileGetsL file /site/share/EMBOSS/acd/knowntypes.standard closing file '/site/share/EMBOSS/acd/knowntypes.standard' Set acdprotein value '$(sequence.protein)' ajSeqinClear called ++seqUsaProcess 'staphyl68-id:FLTU7OB01AHJ67' 0..0(N) '' 0 USA to test: 'staphyl68-id:FLTU7OB01AHJ67' format regexp: No list:No no format specified in USA ...input format not set dbname dbexp: Yes found dbname 'staphyl68' level: 'id' qry->QryString: 'FLTU7OB01AHJ67' db QryString 'FLTU7OB01AHJ67' Field 'id' ajSeqQueryWild id 'FLTU7OB01AHJ67' acc '' sv '' gi '' des '' org '' key '' no wildcard in stored qry database type: 'N' format 'embl' use access method 'emboss' Matched seqAccess[1] 'emboss' seqAccessEmboss type 1 directory '/div/dias/u4/tjonasse/mrsa/454/068_reads/' entry 'fltu7ob01ahj67' acc '' hasacc:Yes ajFileNewIn '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid' EOF ajFileGetsL file /div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid closing file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid' ajFileNewIn 
'/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent' EOF ajFileGetsL file /div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent closing file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent' entry id: 'fltu7ob01ahj67' acc: '' hasacc:Yes ajFileNewIn '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat' seqEmbossQryClose clean up qryd seqRead: cleared seqRead: seqin format 3 'embl' seqRead: one format specified ajFileBuffNobuff /div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat buffsize: 0 ++seqRead known format 3 ++seqReadFmt format 3 (embl) 'staphyl68-id:FLTU7OB01AHJ67' feat No seqReadEmbl first line 'ID FLTU7OB01AHJ67; SV 1; linear; unassigned DNA; STD; UNC; 184 BP. ' seqReadEmbl ID line found seqSetName word 'FLTU7OB01AHJ67' seqSetName 'FLTU7OB01AHJ67' result: 'FLTU7OB01AHJ67' ajTableNewFunctionLen hint 4 size 251 ajTableNewFunctionLen hint 4 size 251 ajTableNewFunctionLen hint 4 size 251 ajTableNewFunctionLen hint 4 size 251 ajFileBuffClear (0) Nobuff: Yes size 0: Lines: 0 Curr: 0 Prev: 0 Last: 0 Free: 0 Freelast: 0 ajFileBuffClear '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat' (0 lines) Y size: 0 pos: 0 removed 0 lines add to free: 0 Trace buffer file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat' Pos: 0 Size: 0 FreeSize: 0 Fpos: 153477365 End: N Free: 0 Last: -1 seqReadFmt success with format 3 (embl) seqQueryMatch 'FLTU7OB01AHJ67' id 'fltu7ob01ahj67' acc '' Sv '' Gi '' Des '' Key '' Org '' Case No Done Yes seqTypeSet 'N' ajSeqTypeCheckIn type 'gapany' found (any valid sequence with gaps) Convert gaps to '-' ajSeqTypeCheckIn: bad characters test passed, convert Convert '?' 
to 'X' ajSeqTypeCheckIn: OK - no badchars seqDefine: thys->Db 'staphyl68', seqin->Db 'staphyl68' seqDefine: thys->Name 'FLTU7OB01AHJ67' type: N seqDefine: thys->Entryname 'FLTU7OB01AHJ67', seqin->Entryname '' seqDefine: returns thys->Name 'FLTU7OB01AHJ67' type: N ++ajSeqallread set db: 'staphyl68' => 'staphyl68' ajSeqallGetName '' ajSeqIsNuc Type 'N' ajSeqIsNuc Type 'N' ajSeqIsProt Type 'N' ajSeqallGetUsa 'staphyl68-id:FLTU7OB01AHJ67' ajSeqallGetseqName 'FLTU7OB01AHJ67' ... output format not set, default to 'fasta' ajSeqoutClear called ... output format not set, default to 'fasta' ajSeqoutOpen dir '' qrydir '' seqoutUsaProcess output USA to test: 'fltu7ob01ahj67.fasta' format regexp: No no format specified in USA file:id regexp: Yes found filename fltu7ob01ahj67.fasta single: No dir: '' ajFileNewOutD('' 'fltu7ob01ahj67.fasta') ajFileNewOutD open name 'fltu7ob01ahj67.fasta' ajSeqSetRange (len: 184 0..0 old 0..0) rev:No reversed:No result: (len: 184 0..0) ajSeqoutWriteSeq 'FLTU7OB01AHJ67' len: 184 ajSeqoutWriteSeq 17 'fasta' single: No feat: No Save: No seqClone out Setdb '' Db '' seq Setdb '' Db 'staphyl68' seqClone outseq->Type '' seq->Type 'N' seqClone 0 .. 0 1 .. 184 len: 184 type: 'N' Db: 'staphyl68' Name: 'FLTU7OB01AHJ67' Entryname: 'FLTU7OB01AHJ67' ajSeqTypeCheckS type 'gapany' found (any valid sequence with gaps) Convert gaps to '-' Convert '?' 
to 'X' ajSeqoutSetNameDefaultS already has a name 'FLTU7OB01AHJ67' seqWriteFasta outseq Db 'staphyl68' Setdb '' Setoutdb '' Name 'FLTU7OB01AHJ67' seqoutUfoLocal Features No Ufo 0 '' ajSeqoutWriteSeq tests features No tabouitisopen No UfoLocal No ftlocal No ajSeqRead: input file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat' still there, try again seqRead: cleared seqRead: single access - count 1 - call access routine again seqAccessEmboss type 1 seqEmbossQryReuse: query data all finished seqRead: seqin->Query->Access->Access(seqin) *failed* ajSeqRead: open buffer usa: 'staphyl68-id:FLTU7OB01AHJ67' returns: No ajSeqallNext failed ajSeqinClear called ajFileBuffClear (-1) Nobuff: Yes size 0: Lines: 0 Curr: 0 Prev: 0 Last: 0 Free: 0 Freelast: 0 ajFileBuffClear '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat' (-1 lines) Y size: 0 pos: 0 removed 0 lines add to free: 0 Trace buffer file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat' Pos: 0 Size: 0 FreeSize: 0 Fpos: 153477365 End: N Free: 0 Last: -1 closing file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat' ajSeqoutClose 'fltu7ob01ahj67.fasta' closing file 'fltu7ob01ahj67.fasta' ajSeqinDel called usa:'' ajSeqQueryDel db:'' id:'' Final Summary ============= Table usage : 11 opened, 0 closed, 251 maxsize, 40 maxmem List usage : 27 opened, 27 closed, 1438 maxsize 2380 nodes List iterator usage : 4 opened, 4 closed File usage : 1 opened, 9 closed, 3 max, 10 total ajNamExit done Regexp usage (bytes): 168 allocated, 1008 freed, -840 in use (sizes change) Regexp usage (number): 21 allocated, 21 freed 0 in use Array usage (bytes): 0 allocated, 0 freed, 0 in use Array usage (number): 0 allocated, 0 freed, 0 resized, 0 in use Array usage 2D (bytes): 0 allocated, 0 freed, 0 in use Array usage 2D (number): 0 allocated, 0 freed, 0 resized, 0 in use Array usage 3D (bytes): 0 allocated, 0 freed, 0 in use Array usage 3D (number): 0 allocated, 0 freed, 0 resized, 0 in use String usage (bytes): 
268013 allocated, 268270 freed, -257 in use String usage (number): 4982 allocated, 4979 freed 3 in use Memory usage (bytes): 535329 allocated, 640 reallocated 503881 zeroed Memory usage (number): 14393 allocates, 14405 frees, 10 resizes, -12 in use closing file 'seqret.dbg' 3)The staphyl68.pxid file contains: Order 60 Fill 42 Pagesize 2048 Level 2 Cachesize 200 Order2 82 Fill2 99 Count 288506 Kwlimit 15 In addition, the definition plus resource record I defined for the the staphyl68 database in my local .embossrc file is the following (which should accommodate for the length of the id field, shouldn't it?): DB staphyl68 [ type: N method: emboss format: embl fields: "id,des" file: staphyl68.dat indexdirectory: /div/dias/u4/tjonasse/mrsa/454/068_reads/ comment: "mrsa staphyl68 reads" ] RES staphyl68 [ type: Index idlen: 20 deslen: 50 ] Best regards, GM Peter Rice wrote: > Hi George, > >> Why does the filter mode seqret invoked inside the for loop fails and >> this one works, and the problem does not exist for the 'afile' but >> only the 'bfile'? > > Can you add "-debug" to the seqret commandline and send me the > seqret.dbg file (it will be for the last seqret run so you'll need some > way to make sure the last run failed) > > and also sent the seqret.dbg file for running seqret standalone with the > same ID that worked. > > It would also be useful to see the .pxid file for the staphyl68 database > (it includes the length of ID that was indexed - your IDs are quite long > for dbxflat) > > regards, > > Peter > -- From georgios at biotek.uio.no Wed Jan 28 09:15:43 2009 From: georgios at biotek.uio.no (George Magklaras) Date: Wed, 28 Jan 2009 15:15:43 +0100 Subject: [EMBOSS] db formatting (?) and parsing issue -- emboss version 5.0.0 In-Reply-To: <498063B0.1010207@ebi.ac.uk> References: <49803ABE.6080809@biotek.uio.no> <498063B0.1010207@ebi.ac.uk> Message-ID: <4980688F.2010209@biotek.uio.no> Indeed there was an \r \n to blame. 
Didn't spot that with od, because it was only one instance at the beginning
of the file and not on every line. dos2unix to the rescue and we are back in
business.

Cheers Peter!

GM

Peter Rice wrote:
> Hi George,
>> Now, then, we have a second file called 'bhits' (697 sequences). This
>> file has exactly the same format as 'ahits', but when we try to
>> extract the identified sequences, we get the following:
>>
>> for seq in `cat bhits`; do seqret -filter staphyl68-id:$seq; done
>>
>> Died: seqret terminated: Bad value for '-sequence' with -auto defined
>> 'rror: Unable to read sequence 'staphyl68-id:FLTU7OB01AJHZO
>> (one error per sequence ID)
>
> Umm ... does the message really start with 'rror'?
>
> That suggests some non-printing character is involved in the ID. Have
> you checked bhits does not have any strange characters?
>
> The error message should be:
>
> Error: Unable to read sequence 'staphyl68-id:FLTU7OB01AJHZO'
>
> So something at the end of the ID seems to have moved the final quote to
> the start of the line.
>
> I can get the same effect by using noreturn -system pc to change the
> carriage control characters in bhits.
>
> I suspect that is the cause of your problem.
>
> Let me know if that doesn't solve it.
>
> regards,
>
> Peter
>

From staffa at niehs.nih.gov Thu Jan 29 10:45:32 2009
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Thu, 29 Jan 2009 10:45:32 -0500
Subject: [EMBOSS] EMBOSS/Jemboss
In-Reply-To: <4980688F.2010209@biotek.uio.no>
Message-ID:

We are working hammer and tongs to make emboss and jemboss available
institute-wide as a substitute for the GCG package to be as much like SeqLab
as possible. Is there anyone there who has been successful creating a
client-server relationship with Jemboss on Mac OS X with a Unix server?
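
[Editorial sketch, not part of the original thread.] The 'bhits' failure in
the "db formatting (?) and parsing issue" thread came down to a carriage
return in the ID list, which ended up inside the query passed to seqret. A
pipeline could guard against that before looping. The function names has_cr
and clean_ids are hypothetical, and the sketch assumes one ID per line as in
the 'ahits' and 'bhits' files from the thread.

```shell
# has_cr FILE: succeed if any line of FILE contains a carriage return.
has_cr() {
    grep -q "$(printf '\r')" "$1"
}

# clean_ids FILE: print the IDs in FILE with carriage returns removed
# and blank lines dropped. The input file is left unchanged.
clean_ids() {
    tr -d '\r' < "$1" | awk 'NF'
}

# Usage, assuming the staphyl68 database from the thread is configured:
#   clean_ids bhits | while read -r seq; do
#       seqret -filter "staphyl68-id:$seq"
#   done > multifasta68.fasta
```

Unlike dos2unix, which rewrites the file, this only filters the stream, so
the original list stays available for checking with od.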
From charles-listes-emboss at plessy.org Sat Jan 10 05:29:46 2009 From: charles-listes-emboss at plessy.org (Charles Plessy) Date: Sat, 10 Jan 2009 14:29:46 +0900 Subject: [EMBOSS] Please update the patch in ftp://emboss.open-bio.org/pub/EMBOSS/fixes/patches/ Message-ID: <20090110052946.GA3077@kunpuu.plessy.org> Dear EMBOSS developers, I am using the patches in ftp://emboss.open-bio.org/pub/EMBOSS/fixes/patches/ to produce up-to-date Debian packages. I noticed that there are fixes in the parent directory that are not present in the patch. Could you update it? Have a nice day, -- Charles Plessy Debian Med packaging team, http://www.debian.org/devel/debian-med Tsurumi, Kanagawa, Japan From ajb at ebi.ac.uk Sat Jan 10 11:17:36 2009 From: ajb at ebi.ac.uk (ajb at ebi.ac.uk) Date: Sat, 10 Jan 2009 11:17:36 -0000 (GMT) Subject: [EMBOSS] Please update the patch in ftp://emboss.open-bio.org/pub/EMBOSS/fixes/patches/ In-Reply-To: <20090110052946.GA3077@kunpuu.plessy.org> References: <20090110052946.GA3077@kunpuu.plessy.org> Message-ID: <50394.86.9.126.186.1231586256.squirrel@webmail.ebi.ac.uk> Hello Charles, The patch file is there now (a casualty of the holidays). It also corrects the copying of 4 data files to the installation directories (affecting featcopy, infobase, inforesidue and trimspace). Those Makefile changes are a little tricky to represent in the 'fixes' directory as some files have the same name. So, that's a work in progress. Alan > Dear EMBOSS developers, > > I am using the patches in > ftp://emboss.open-bio.org/pub/EMBOSS/fixes/patches/ > to produce up-to-date Debian packages. I noticed that there are fixes in > the > parent directory that are not present in the patch. Could you update it? 
> > Have a nice day, > > -- > Charles Plessy > Debian Med packaging team, > http://www.debian.org/devel/debian-med > Tsurumi, Kanagawa, Japan > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss > From jeedward at yahoo.com Fri Jan 16 21:57:55 2009 From: jeedward at yahoo.com (John Edward) Date: Fri, 16 Jan 2009 13:57:55 -0800 (PST) Subject: [EMBOSS] BCBGC-09 final call for papers Message-ID: <317305.17140.qm@web45906.mail.sp1.yahoo.com> BCBGC-09 final call for papers ? The 2009 International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-09) (website: http://www.PromoteResearch.org ) will be held during July 13-16 2009 in Orlando, FL, USA. We invite draft paper submissions. The conference will take place at the same time and venue where several other international conferences are taking place. The other conferences include: ????????? International Conference on Artificial Intelligence and Pattern Recognition (AIPR-09) ????????? International Conference on Automation, Robotics and Control Systems (ARCS-09) ????????? International Conference on Enterprise Information Systems and Web Technologies (EISWT-09) ????????? International Conference on High Performance Computing, Networking and Communication Systems (HPCNCS-09) ????????? International Conference on Information Security and Privacy (ISP-09) ????????? International Conference on Recent Advances in Information Technology and Applications (RAITA-09) ????????? International Conference on Software Engineering Theory and Practice (SETP-09) ????????? International Conference on Theory and Applications of Computational Science (TACS-09) ????????? International Conference on Theoretical and Mathematical Foundations of Computer Science (TMFCS-09) ? The website http://www.PromoteResearch.org contains more details. ? Sincerely John Edward Publicity committee ? 
From charles-listes-emboss at plessy.org Mon Jan 19 01:42:02 2009 From: charles-listes-emboss at plessy.org (Charles Plessy) Date: Mon, 19 Jan 2009 10:42:02 +0900 Subject: [EMBOSS] jemboss In-Reply-To: <21e884180809010804p34882dc2g9b0097162ff68f2e@mail.gmail.com> References: <21e884180808281625m6f6fde4ci2932ed82a202c642@mail.gmail.com> <55740.86.9.126.186.1219996803.squirrel@webmail.ebi.ac.uk> <20080829090845.GG15089@kunpuu.plessy.org> <21e884180808290821r4da88fc7p542568a6e3589760@mail.gmail.com> <20080830015014.GB19735@kunpuu.plessy.org> <21e884180809010804p34882dc2g9b0097162ff68f2e@mail.gmail.com> Message-ID: <20090119014202.GB9537@kunpuu.plessy.org> Le Mon, Sep 01, 2008 at 12:04:28PM -0300, Beny Spira a ?crit : > > > I am not sure about which java is installed by default in Debian, as there > appears to be more than one (gcj, eclipse and now sun's jre and jdk). > Is there anything else that may be done to install Jemboss? Dear Beny, I prepared an experimental package for jEMBOSS that uses OpenJDK. It is available from the following URL: http://packages.debian.org/experimental/jemboss This package is not yet high quality; I welcome all comments to improve it. For the moment all that is done is to collect the files installed by 'make -C jemboss install' and to package them separately. Have a nice day, -- Charles Plessy Debian Med packaging team, http://www.debian.org/devel/debian-med Tsurumi, Kanagawa, Japan From scott at cs.wits.ac.za Mon Jan 19 13:23:49 2009 From: scott at cs.wits.ac.za (Scott Hazelhurst) Date: Mon, 19 Jan 2009 15:23:49 +0200 Subject: [EMBOSS] Nthseq issue Message-ID: I don't know whether this is a bug or a feature, but I discovered that nthseq skips empty sequences in its counting. So if you have 10 sequences and the fifth is empty, then nthseq -number 6 actually returns the 7th sequence. It does print out a warning that the sequence is empty but not that its skipping (and also if you are putting this in a pipeline you wouldn't see it). 
I couldn't find any documentation on this.

I found this problem in a data set from some collaborators: we ran dust and then used biosed to remove the Ns. Obviously this leaves some sequences unusable. While it is understandable why nthseq behaves the way it does, the problem is that in an automated setup it may be difficult to make the adjustment.

Regards
Scott

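A note on the nthseq behaviour described above: until the counting is changed, one possible workaround is to pick the record outside EMBOSS. The following is only a minimal sketch, assuming the input is a plain FASTA file; nth_fasta is a hypothetical helper name, not an EMBOSS program. It counts every '>' header, so an empty record still occupies a position.

```shell
#!/bin/sh
# Hypothetical helper (not part of EMBOSS): print the Nth record of a
# FASTA file, counting every '>' header, including records that have
# no residues at all.
# Usage: nth_fasta NUMBER FILE
nth_fasta() {
    awk -v n="$1" '
        /^>/       { count++ }   # each header line starts a new record
        count == n { print }     # emit header and sequence lines of record n
        count > n  { exit }      # stop as soon as the following record begins
    ' "$2"
}
```

For example, `nth_fasta 6 reads.fasta` (the file name is a placeholder) prints the sixth record by position. If that record is empty, only its header line is printed, so a pipeline can detect the case instead of silently receiving the next sequence.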

From pmr at ebi.ac.uk Thu Jan 22 08:35:50 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 22 Jan 2009 08:35:50 +0000
Subject: [EMBOSS] Nthseq issue
In-Reply-To:
References:
Message-ID: <49782FE6.40803@ebi.ac.uk>

Scott Hazelhurst wrote:
>
> I don't know whether this is a bug or a feature, but I discovered that
> nthseq skips empty sequences in its counting. So if you have 10 sequences
> and the fifth is empty, then nthseq -number 6 actually returns the 7th
> sequence. It does print out a warning that the sequence is empty but not
> that its skipping (and also if you are putting this in a pipeline you
> wouldn't see it). I couldn't see any documentation on this.
>
> I found this problem in a data set from some collaborators, we ran dust and
> then used biosed to remove Ns. Obviously this makes some sequences not
> usable. While it is understandable why nthseq behaves in the way it does,
> the problem is that in an automated set up it may be difficult do the
> adjustment.

We will take a look.

Zero-length sequences are routinely ignored in EMBOSS. We will check whether it is possible to use an alternative method for counting in nthseq and any other application that counts input sequences.

Of course, if the nth sequence is empty, nthseq would have to return a failure to read it.

regards,

Peter Rice

From jeedward at yahoo.com Fri Jan 23 19:41:54 2009
From: jeedward at yahoo.com (John Edward)
Date: Fri, 23 Jan 2009 11:41:54 -0800 (PST)
Subject: [EMBOSS] Final call for papers: BCBGC-09
Message-ID: <326404.38388.qm@web45907.mail.sp1.yahoo.com>

Final call for papers: BCBGC-09

The 2009 International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-09) (website: http://www.PromoteResearch.org ) will be held during July 13-16 2009 in Orlando, FL, USA. We invite draft paper submissions. The conference will take place at the same time and venue where several other international conferences are taking place.
The other conferences include:

- International Conference on Artificial Intelligence and Pattern Recognition (AIPR-09)
- International Conference on Automation, Robotics and Control Systems (ARCS-09)
- International Conference on Enterprise Information Systems and Web Technologies (EISWT-09)
- International Conference on High Performance Computing, Networking and Communication Systems (HPCNCS-09)
- International Conference on Information Security and Privacy (ISP-09)
- International Conference on Recent Advances in Information Technology and Applications (RAITA-09)
- International Conference on Software Engineering Theory and Practice (SETP-09)
- International Conference on Theory and Applications of Computational Science (TACS-09)
- International Conference on Theoretical and Mathematical Foundations of Computer Science (TMFCS-09)

The website http://www.PromoteResearch.org contains more details.

Sincerely
John Edward
Publicity committee

From georgios at biotek.uio.no Wed Jan 28 11:00:14 2009
From: georgios at biotek.uio.no (George Magklaras)
Date: Wed, 28 Jan 2009 12:00:14 +0100
Subject: [EMBOSS] db formatting (?) and parsing issue -- emboss version 5.0.0
Message-ID: <49803ABE.6080809@biotek.uio.no>

Hi list,

We are still at emboss 5.0.0 (plus patches). We have a problem that we cannot understand: seqret fails to parse normal IDs read from a file.
Here is the story with details:

I have an .fna file of 454 reads in fasta format, which typically looks like this:

>FLTU7OB01CIMST length=234 xy=0915_0859 region=1 run=R_2008_12_11_14_44_02_
TTTATTATTTAATCAATAATAAAGTGCTTTAGTCAAATCGTGATGTTTCAATTATTAACA
AGTTTATTATTTCTTCATTTTACCATAATACGCTTCAAAACGTCGATGAACATATGAATT
TGAGGGATTTTTGTAACCAGGTTTTATTTTTTAAAAATCATTAAAAAATGGTGAAGTTTC
TCGAATATCGTGTTCAAAATTCAATTCCGAAATAAGTCGCCCCTAATCTGATGA
>FLTU7OB01DL726 length=211 xy=1366_0736 region=1 run=R_2008_12_11_14_44_02_
AAACAGATAGTCAGTATTGAATTACTTTATGTAGAGCCACAATTTAGAAACAGAGGTTTA
GCTACTATACTGAAGTGTGGTATTGAGACTTGGGCAAAAAGTATAAAAGCGAAACAAATC
ATTAGTACAGTACATAAAGACAACGTGACAATGATATCATTGAACAAGCGGTTAGGGTAT
CAATTAAGTCACGTGAAAATGTATAAAGATA
....

The number of sequences is:

cat 068_2023_454Reads.fna | grep ^">" | wc -l
288507

I convert this file to EMBL format using seqret and I get a properly formatted file with the same number of sequence entries:

cat staphyl68.dat | grep "^ID" | wc -l
288507

I now make a btree index of the id field with dbxflat:

$ dbxflat
Database b+tree indexing for flat file databases
Basename for index files: staphyl68
Resource name: staphyl68
EMBL : EMBL
SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
GB : Genbank, DDBJ
REFSEQ : Refseq
Entry format [SWISS]: EMBL
....
Index fields [id,acc]: id
Processing file ./staphyl68.dat

(resource records and db defs also OK)

That seems to produce the right number of files:

tjonasse at dias ~/mrsa/454/068_reads $ ls staphyl68.*
staphyl68.dat staphyl68.ent staphyl68.pxid staphyl68.xid

And here the problem starts. We have an input text file 'ahits' with one sequence ID per line:

FLTU7OB01AH8CG
FLTU7OB01ASKRR
FLTU7OB01AUXQJ
FLTU7OB01DSL0N
FLTU7OB01BB9NP

(no fancy control characters, checking with od:

0000000 F L T U 7 O B 0 1 A H 8 C G \n F
0000020 L T U 7 O B 0 1 A S K R R \n F L
0000040 T U 7
)

We extract the 'ahits' sequences (1000 sequences) from the emboss database simply by doing:

for seq in `cat ahits`; do seqret -filter staphyl68-id:$seq; done > multifasta68.fasta

And that produces a multifasta file with exactly 1000 sequences.

Now, then, we have a second file called 'bhits' (697 sequences). This file has exactly the same format as 'ahits', but when we try to extract the identified sequences, we get the following:

for seq in `cat bhits`; do seqret -filter staphyl68-id:$seq; done

Died: seqret terminated: Bad value for '-sequence' with -auto defined
'rror: Unable to read sequence 'staphyl68-id:FLTU7OB01AJHZO
(one error per sequence ID)

This is wrong. Why? I know that the sequence identifiers of 'bhits' are in the original fna file, the .dat EMBL file and also in the *.xid index:

cat 068_2023_454Reads.fna | grep FLTU7OB01AJHZO
>FLTU7OB01AJHZO length=276 xy=0104_3906 region=1 run=R_2008_12_11_14_44_02_

cat staphyl68.dat | grep FLTU7OB01AJHZO
ID FLTU7OB01AJHZO; SV 1; linear; unassigned DNA; STD; UNC; 276 BP.

strings staphyl68.xid | grep -i FLTU7OB01AJHZO
fltu7ob01ajhzo

In addition, if I try the single identifier on its own, it works:

seqret staphyl68-id:FLTU7OB01AJHZO
Reads and writes (returns) sequences
output sequence(s) [fltu7ob01ajhzo.fasta]:

cat fltu7ob01ajhzo.fasta
>FLTU7OB01AJHZO FLTU7OB01AJHZO.1 length=276 xy=0104_3906 region=1 run=R_2008_12_11_14_44_02_
TCGAATGATTAATCTTGAAAATAAAACCTTCGTAATTATGGGTATTGCTAATAAACGTAG
TATCGGATTTGGCGTTGCAAAGGTATTAGATCAATTAGGGGCTAAACTTGTTTTCACTTA
TCGTAAAGACCGTAGCCGCAAAGAATTAGAAAAATTATTAGAACAATTAAACCAAGAAGA
GCCAAAATTATATCAAATCGATGTTCAAAAAGATGAAGATGTAGTAAATGGTTTTGCTAA
AATTGGCGAAGAAGTAGGCAATATTGATGGCGTATA

So, my question is: why does seqret in filter mode fail when invoked inside the for loop while the standalone call works, and why does the problem occur only for 'bhits' and not for 'ahits'?

Thanks for any answers.

GM

--
George Magklaras BSc Hons MPhil
RHCE:805008309135525

Senior Computer Systems Engineer/UNIX-Linux Systems Administrator
EMBnet Technical Management Board
The Biotechnology Centre of Oslo,
University of Oslo
http://folk.uio.no/georgios

From georgios at biotek.uio.no Wed Jan 28 13:47:40 2009
From: georgios at biotek.uio.no (George Magklaras)
Date: Wed, 28 Jan 2009 14:47:40 +0100
Subject: [EMBOSS] db formatting (?)
and parsing issue -- emboss version 5.0.0 In-Reply-To: <49805B17.5010203@ebi.ac.uk> References: <49803ABE.6080809@biotek.uio.no> <49805B17.5010203@ebi.ac.uk> Message-ID: <498061FC.6010101@biotek.uio.no> Hi Peter, thanks for your reply Certainly: 1)For the failed run (for seq in `cat bhits`; do seqret -debug -filter staphyl68-id:$seq; done ) the seqret.dbg contains: Debug file seqret.dbg buffered:No ajAcdInitP pgm 'seqret' package '' ajFileNewIn '/site/share/EMBOSS/acd/seqret.acd' EOF ajFileGetsL file /site/share/EMBOSS/acd/seqret.acd closing file '/site/share/EMBOSS/acd/seqret.acd' ajFileNewIn '/site/share/EMBOSS/acd/codes.english' EOF ajFileGetsL file /site/share/EMBOSS/acd/codes.english closing file '/site/share/EMBOSS/acd/codes.english' ajTableNewFunctionLen hint 25 size 251 ajTableNewFunctionLen hint 25 size 251 ajTableNewFunctionLen hint 25 size 251 ajFileNewIn '/site/share/EMBOSS/acd/knowntypes.standard' EOF ajFileGetsL file /site/share/EMBOSS/acd/knowntypes.standard closing file '/site/share/EMBOSS/acd/knowntypes.standard' Set acdprotein value '$(sequence.protein)' ajSeqinClear called ' 0..0(N) '' 0 'staphyl68-id:FLTU7OB01AHJ67 'SA to test: 'staphyl68-id:FLTU7OB01AHJ67 format regexp: No list:No no format specified in USA ...input format not set dbname dbexp: Yes 'ound dbname 'staphyl68' level: 'id' qry->QryString: 'FLTU7OB01AHJ67 ' Field 'id'ng 'FLTU7OB01AHJ67 ' acc '' sv '' gi '' des '' org '' key '' no wildcard in stored qry database type: 'N' format 'embl' use access method 'emboss' Matched seqAccess[1] 'emboss' seqAccessEmboss type 1 ' acc '' hasacc:Yess/u4/tjonasse/mrsa/454/068_reads/' entry 'fltu7ob01ahj67 ajFileNewIn '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid' EOF ajFileGetsL file /div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid closing file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid' ajFileNewIn '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent' EOF ajFileGetsL file 
/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent closing file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent' ' acc: '' hasacc:Yesahj67 B+tree Entry failed ' not foundtry id:'fltu7ob01ahj67 seqEmbossQryClose clean up qryd Database 'staphyl68' : access method 'emboss' failed 2)For the standalone successful run (seqret -debug staphyl68-id:FLTU7OB01AHJ67), seqret.dbg states: Debug file seqret.dbg buffered:No ajAcdInitP pgm 'seqret' package '' ajFileNewIn '/site/share/EMBOSS/acd/seqret.acd' EOF ajFileGetsL file /site/share/EMBOSS/acd/seqret.acd closing file '/site/share/EMBOSS/acd/seqret.acd' ajFileNewIn '/site/share/EMBOSS/acd/codes.english' EOF ajFileGetsL file /site/share/EMBOSS/acd/codes.english closing file '/site/share/EMBOSS/acd/codes.english' ajTableNewFunctionLen hint 25 size 251 ajTableNewFunctionLen hint 25 size 251 ajTableNewFunctionLen hint 25 size 251 ajFileNewIn '/site/share/EMBOSS/acd/knowntypes.standard' EOF ajFileGetsL file /site/share/EMBOSS/acd/knowntypes.standard closing file '/site/share/EMBOSS/acd/knowntypes.standard' Set acdprotein value '$(sequence.protein)' ajSeqinClear called ++seqUsaProcess 'staphyl68-id:FLTU7OB01AHJ67' 0..0(N) '' 0 USA to test: 'staphyl68-id:FLTU7OB01AHJ67' format regexp: No list:No no format specified in USA ...input format not set dbname dbexp: Yes found dbname 'staphyl68' level: 'id' qry->QryString: 'FLTU7OB01AHJ67' db QryString 'FLTU7OB01AHJ67' Field 'id' ajSeqQueryWild id 'FLTU7OB01AHJ67' acc '' sv '' gi '' des '' org '' key '' no wildcard in stored qry database type: 'N' format 'embl' use access method 'emboss' Matched seqAccess[1] 'emboss' seqAccessEmboss type 1 directory '/div/dias/u4/tjonasse/mrsa/454/068_reads/' entry 'fltu7ob01ahj67' acc '' hasacc:Yes ajFileNewIn '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid' EOF ajFileGetsL file /div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid closing file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid' ajFileNewIn 
'/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent' EOF ajFileGetsL file /div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent closing file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent' entry id: 'fltu7ob01ahj67' acc: '' hasacc:Yes ajFileNewIn '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat' seqEmbossQryClose clean up qryd seqRead: cleared seqRead: seqin format 3 'embl' seqRead: one format specified ajFileBuffNobuff /div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat buffsize: 0 ++seqRead known format 3 ++seqReadFmt format 3 (embl) 'staphyl68-id:FLTU7OB01AHJ67' feat No seqReadEmbl first line 'ID FLTU7OB01AHJ67; SV 1; linear; unassigned DNA; STD; UNC; 184 BP. ' seqReadEmbl ID line found seqSetName word 'FLTU7OB01AHJ67' seqSetName 'FLTU7OB01AHJ67' result: 'FLTU7OB01AHJ67' ajTableNewFunctionLen hint 4 size 251 ajTableNewFunctionLen hint 4 size 251 ajTableNewFunctionLen hint 4 size 251 ajTableNewFunctionLen hint 4 size 251 ajFileBuffClear (0) Nobuff: Yes size 0: Lines: 0 Curr: 0 Prev: 0 Last: 0 Free: 0 Freelast: 0 ajFileBuffClear '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat' (0 lines) Y size: 0 pos: 0 removed 0 lines add to free: 0 Trace buffer file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat' Pos: 0 Size: 0 FreeSize: 0 Fpos: 153477365 End: N Free: 0 Last: -1 seqReadFmt success with format 3 (embl) seqQueryMatch 'FLTU7OB01AHJ67' id 'fltu7ob01ahj67' acc '' Sv '' Gi '' Des '' Key '' Org '' Case No Done Yes seqTypeSet 'N' ajSeqTypeCheckIn type 'gapany' found (any valid sequence with gaps) Convert gaps to '-' ajSeqTypeCheckIn: bad characters test passed, convert Convert '?' 
to 'X' ajSeqTypeCheckIn: OK - no badchars seqDefine: thys->Db 'staphyl68', seqin->Db 'staphyl68' seqDefine: thys->Name 'FLTU7OB01AHJ67' type: N seqDefine: thys->Entryname 'FLTU7OB01AHJ67', seqin->Entryname '' seqDefine: returns thys->Name 'FLTU7OB01AHJ67' type: N ++ajSeqallread set db: 'staphyl68' => 'staphyl68' ajSeqallGetName '' ajSeqIsNuc Type 'N' ajSeqIsNuc Type 'N' ajSeqIsProt Type 'N' ajSeqallGetUsa 'staphyl68-id:FLTU7OB01AHJ67' ajSeqallGetseqName 'FLTU7OB01AHJ67' ... output format not set, default to 'fasta' ajSeqoutClear called ... output format not set, default to 'fasta' ajSeqoutOpen dir '' qrydir '' seqoutUsaProcess output USA to test: 'fltu7ob01ahj67.fasta' format regexp: No no format specified in USA file:id regexp: Yes found filename fltu7ob01ahj67.fasta single: No dir: '' ajFileNewOutD('' 'fltu7ob01ahj67.fasta') ajFileNewOutD open name 'fltu7ob01ahj67.fasta' ajSeqSetRange (len: 184 0..0 old 0..0) rev:No reversed:No result: (len: 184 0..0) ajSeqoutWriteSeq 'FLTU7OB01AHJ67' len: 184 ajSeqoutWriteSeq 17 'fasta' single: No feat: No Save: No seqClone out Setdb '' Db '' seq Setdb '' Db 'staphyl68' seqClone outseq->Type '' seq->Type 'N' seqClone 0 .. 0 1 .. 184 len: 184 type: 'N' Db: 'staphyl68' Name: 'FLTU7OB01AHJ67' Entryname: 'FLTU7OB01AHJ67' ajSeqTypeCheckS type 'gapany' found (any valid sequence with gaps) Convert gaps to '-' Convert '?' 
to 'X' ajSeqoutSetNameDefaultS already has a name 'FLTU7OB01AHJ67' seqWriteFasta outseq Db 'staphyl68' Setdb '' Setoutdb '' Name 'FLTU7OB01AHJ67' seqoutUfoLocal Features No Ufo 0 '' ajSeqoutWriteSeq tests features No tabouitisopen No UfoLocal No ftlocal No ajSeqRead: input file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat' still there, try again seqRead: cleared seqRead: single access - count 1 - call access routine again seqAccessEmboss type 1 seqEmbossQryReuse: query data all finished seqRead: seqin->Query->Access->Access(seqin) *failed* ajSeqRead: open buffer usa: 'staphyl68-id:FLTU7OB01AHJ67' returns: No ajSeqallNext failed ajSeqinClear called ajFileBuffClear (-1) Nobuff: Yes size 0: Lines: 0 Curr: 0 Prev: 0 Last: 0 Free: 0 Freelast: 0 ajFileBuffClear '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat' (-1 lines) Y size: 0 pos: 0 removed 0 lines add to free: 0 Trace buffer file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat' Pos: 0 Size: 0 FreeSize: 0 Fpos: 153477365 End: N Free: 0 Last: -1 closing file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat' ajSeqoutClose 'fltu7ob01ahj67.fasta' closing file 'fltu7ob01ahj67.fasta' ajSeqinDel called usa:'' ajSeqQueryDel db:'' id:'' Final Summary ============= Table usage : 11 opened, 0 closed, 251 maxsize, 40 maxmem List usage : 27 opened, 27 closed, 1438 maxsize 2380 nodes List iterator usage : 4 opened, 4 closed File usage : 1 opened, 9 closed, 3 max, 10 total ajNamExit done Regexp usage (bytes): 168 allocated, 1008 freed, -840 in use (sizes change) Regexp usage (number): 21 allocated, 21 freed 0 in use Array usage (bytes): 0 allocated, 0 freed, 0 in use Array usage (number): 0 allocated, 0 freed, 0 resized, 0 in use Array usage 2D (bytes): 0 allocated, 0 freed, 0 in use Array usage 2D (number): 0 allocated, 0 freed, 0 resized, 0 in use Array usage 3D (bytes): 0 allocated, 0 freed, 0 in use Array usage 3D (number): 0 allocated, 0 freed, 0 resized, 0 in use String usage (bytes): 
268013 allocated, 268270 freed, -257 in use
String usage (number): 4982 allocated, 4979 freed 3 in use
Memory usage (bytes): 535329 allocated, 640 reallocated 503881 zeroed
Memory usage (number): 14393 allocates, 14405 frees, 10 resizes, -12 in use
closing file 'seqret.dbg'

3) The staphyl68.pxid file contains:

Order 60
Fill 42
Pagesize 2048
Level 2
Cachesize 200
Order2 82
Fill2 99
Count 288506
Kwlimit 15

In addition, the definition plus resource record I defined for the staphyl68 database in my local .embossrc file is the following (which should accommodate the length of the id field, shouldn't it?):

DB staphyl68 [
  type: N
  method: emboss
  format: embl
  fields: "id,des"
  file: staphyl68.dat
  indexdirectory: /div/dias/u4/tjonasse/mrsa/454/068_reads/
  comment: "mrsa staphyl68 reads"
]

RES staphyl68 [
  type: Index
  idlen: 20
  deslen: 50
]

Best regards,
GM

Peter Rice wrote:
> Hi George,
>
>> Why does the filter mode seqret invoked inside the for loop fails and
>> this one works, and the problem does not exist for the 'afile' but
>> only the 'bfile'?
>
> Can you add "-debug" to the seqret commandline and send me the
> seqret.dbg file (it will be for the last seqret run so you'll need some
> way to make sure the last run failed)
>
> and also send the seqret.dbg file for running seqret standalone with the
> same ID that worked.
>
> It would also be useful to see the .pxid file for the staphyl68 database
> (it includes the length of ID that was indexed - your IDs are quite long
> for dbxflat)
>
> regards,
>
> Peter
>

--

From georgios at biotek.uio.no Wed Jan 28 14:15:43 2009
From: georgios at biotek.uio.no (George Magklaras)
Date: Wed, 28 Jan 2009 15:15:43 +0100
Subject: [EMBOSS] db formatting (?) and parsing issue -- emboss version 5.0.0
In-Reply-To: <498063B0.1010207@ebi.ac.uk>
References: <49803ABE.6080809@biotek.uio.no> <498063B0.1010207@ebi.ac.uk>
Message-ID: <4980688F.2010209@biotek.uio.no>

Indeed there was an \r \n to blame.
Didn't spot that with od, because it was only one instance at the beginning of the file and not on every line. dos2unix to the rescue and we are back in business.

Cheers Peter!

GM

Peter Rice wrote:
> Hi George,
>> Now, then, we have a second file called 'bhits' (697 sequences). This
>> file has exactly the same format as 'ahits', but when we try to
>> extract the identified sequences, we get the following:
>>
>> for seq in `cat bhits`; do seqret -filter staphyl68-id:$seq; done
>>
>> Died: seqret terminated: Bad value for '-sequence' with -auto defined
>> 'rror: Unable to read sequence 'staphyl68-id:FLTU7OB01AJHZO
>> (one error per sequence ID)
>
> Umm ... does the message really start with 'rror'?
>
> That suggests some non-printing character is involved in the ID. Have
> you checked bhits does not have any strange characters?
>
> The error message should be:
>
> Error: Unable to read sequence 'staphyl68-id:FLTU7OB01AJHZO'
>
> So something at the end of the ID seems to have moved the final quote to
> the start of the line.
>
> I can get the same effect by using noreturn -system pc to change the
> carriage control characters in bhits.
>
> I suspect that is the cause of your problem.
>
> Let me know if that doesn't solve it.
>
> regards,
>
> Peter
>

From staffa at niehs.nih.gov Thu Jan 29 15:45:32 2009
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Thu, 29 Jan 2009 10:45:32 -0500
Subject: [EMBOSS] EMBOSS/Jemboss
In-Reply-To: <4980688F.2010209@biotek.uio.no>
Message-ID:

We are working hammer and tongs to make emboss and jemboss available institute-wide as a substitute for the GCG package, and to make them as much like SeqLab as possible. Is there anyone who has succeeded in creating a client-server relationship with Jemboss on Mac OS X with a Unix server?