From works at giandrea.com Mon May 2 11:14:49 2005
From: works at giandrea.com (Andrea Girardi)
Date: Mon May 2 11:09:11 2005
Subject: [Biojava-l] Fasta File
Message-ID: <427643E9.3090200@giandrea.com>

Hi to all,

I'm new to this mailing list. I have a problem with a FASTA file. I downloaded the BioJava libraries today and I'd like to know if there are any methods to open a FASTA file like this:

>gi|5713315|ref|NP_002060.1| (NM_002069) guanine nucleotide binding protein (G protein), alpha inhibiting activity polypeptide 1 [Homo sapiens]gi|12733317|ref|XP_011603.1| (XM_011603) hypothetical protein XP_011603 [Homo sapiens]gi|14749912|ref|XP_034149.1| (XM_034149) guanine nucleotide binding protein (G protein), alpha inhibiting activity polypeptide 1 [Homo sapiens]gi|121019|sp|P04898|GBI1_HUMAN GUANINE NUCLEOTIDE-BINDING PROTEIN G(I), ALPHA-1 SUBUNIT (ADENYLATE CYCLASE-INHIBITING G ALPHA PROTEIN)gi|71892|pir||RGBOI1 GTP-binding regulatory protein Gi alpha-1 chain (adenylate cyclase-inhibiting) - bovinegi|2144867|pir||RGHUI1 GTP-binding regulatory protein Gi alpha-1 chain (adenylate cyclase-inhibiting) - humangi|391|emb|CAA27288.1| (X03642) alpha-subunit [Bos taurus]gi|3005737|gb|AAC09361.1| (AF055013) guanine nucleotide-binding protein alpha-i subunit [Homo sapiens]gi|224920|prf||1204197A protein Gi alpha [Bos taurus]
MGCTLSAEDKAAVERSKMIDRNLREDGEKAAREVKLLLLGAGESGKSTIVKQMKIIHEAGYSEEECKQYKAVVYSNTIQS
IIAIIRAMGRLKIDFGDSARADDARQLFVLAGAAEEGFMTAELAGVIKRLWKDSGVQACFNRSREYQLNDSAAYYLNDLD
RIAQPNYIPTQQDVLRTRVKTTGIVETHFTFKDLHFKMFDVGGQRSERKKWIHCFEGVTAIIFCVALSDYDLVLAEDEEM
NRMHESMKLFDSICNNKWFTDTSIILFLNKKDLFEEKIKKSPLTICYPEYAGSNTYEEAAAYIQCQFEDLNKRKDTKEIY
THFTCATDTKNVQFVFDAVTDVIIKNNLKDCGLF

I have to compare this sequence with a string in an XML file like this:

D:\sequest\Emoclot_1\Fattore_Ottavo_06_E5\03\ 3.1 SR1 1070.4 4507907 Fattore_Ottavo_E5_02_001,493-495 R.ILAGPAGDSNVVK.L 1241.419 2 2.947 0.048 865.5 1 16/24 1

Can someone help me?
Thanks,
Andrea
University of Verona, Italy

From hollandr at gis.a-star.edu.sg Mon May 2 22:08:08 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Mon May 2 22:03:21 2005
Subject: [Biojava-l] Fasta File
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D56019515AA@BIONIC.biopolis.one-north.com>

Have a look at Mark's Biojava in Anger book:

http://www.biojava.org/docs/bj_in_anger/

Particularly this bit:

http://www.biojava.org/docs/bj_in_anger/ReadFasta.htm

This second link has two examples. One reads a FASTA file and stores it as a SequenceDB object, the other stores it as a SequenceIterator object. You can get a SequenceIterator object from a SequenceDB by calling db.getSequenceIterator() (assuming your SequenceDB object is called db).

Once you have a SequenceIterator you use it like this (assuming the instance of this object is called si) to iterate over the sequences in the file:

    while (si.hasNext()) {
        Sequence s = si.nextSequence();
        // Do stuff with each sequence s here
    }

The Sequence object has methods for getting the name, the sequence (represented as a SymbolList object) etc. Look it up in the BioJava Javadocs at http://www.biojava.org/docs/api/

cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199
---------------------------------------------
This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you.
---------------------------------------------

> -----Original Message-----
> From: biojava-l-bounces@portal.open-bio.org
> [mailto:biojava-l-bounces@portal.open-bio.org] On Behalf Of
> Andrea Girardi
> Sent: Monday, May 02, 2005 11:15 PM
> To: biojava-l@biojava.org
> Subject: [Biojava-l] Fasta File
>
> Hi to all
>
> i'm new on this newsletter. I've a problem with Fasta file.
> Thanks,
>
> Andrea
> University of Verona, Italy
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l

From hollandr at gis.a-star.edu.sg Mon May 2 22:18:48 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Mon May 2 22:13:06 2005
Subject: [Biojava-l] Fasta File
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D56019515AE@BIONIC.biopolis.one-north.com>

My apologies - that should have been db.sequenceIterator(), not db.getSequenceIterator()!

cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199
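Independently of the BioJava classes named in this thread, the task from the original question (read FASTA records, then check whether a peptide from the search results occurs in a sequence) can be sketched in plain Java. This is only an illustration under stated assumptions: the class and method names (`FastaPeptideCheck`, `parseFasta`, `stripFlanks`) are hypothetical, the record is toy data rather than the protein posted above, and `R.ILAGPAGDSNVVK.L` is assumed to use the SEQUEST-style notation in which the residues outside the dots are flanking residues.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of BioJava.
class FastaPeptideCheck {

    // Parses FASTA text into a map of header line -> concatenated sequence.
    static Map<String, String> parseFasta(String text) {
        Map<String, String> records = new LinkedHashMap<>();
        String header = null;
        StringBuilder seq = new StringBuilder();
        for (String raw : text.split("\r?\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith(">")) {
                if (header != null) {
                    records.put(header, seq.toString());
                }
                header = line.substring(1);
                seq.setLength(0);
            } else if (header != null) {
                seq.append(line);
            }
        }
        if (header != null) {
            records.put(header, seq.toString());
        }
        return records;
    }

    // Assumed notation: "R.ILAGPAGDSNVVK.L" -> "ILAGPAGDSNVVK".
    static String stripFlanks(String peptide) {
        int first = peptide.indexOf('.');
        int last = peptide.lastIndexOf('.');
        return (first >= 0 && last > first) ? peptide.substring(first + 1, last) : peptide;
    }

    public static void main(String[] args) {
        // Toy record, not the sequence from the thread.
        String fasta = ">toy|1| example protein\nMKTAILAGPAGD\nSNVVKLQW\n";
        String peptide = stripFlanks("R.ILAGPAGDSNVVK.L");
        for (Map.Entry<String, String> e : parseFasta(fasta).entrySet()) {
            System.out.println(e.getKey() + " contains " + peptide + ": "
                    + e.getValue().contains(peptide));
        }
    }
}
```

With BioJava itself, the sequence string obtained from each Sequence in the iterator loop could be passed to the same `contains` check.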

From boehme at mpiib-berlin.mpg.de Wed May 4 09:01:21 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Wed May 4 08:54:16 2005
Subject: [Biojava-l] lage data sets/filtersByAnnotation ?
In-Reply-To: <200505021512.j42FCQfY011298@portal.open-bio.org>
References: <200505021512.j42FCQfY011298@portal.open-bio.org>
Message-ID: <4278C7A1.3080505@mpiib-berlin.mpg.de>

Hi,

1. I was wondering if anyone has tested BioJava/BioSQL with large data sets, say the human genome? Not any sequence analysis/manipulation, just retrieving data. Is it reasonably fast to retrieve, e.g., features filtered by type/ancestor in, say, five levels of nested features, compared to direct DB access?

2. I noticed in the source that filters by annotation also exist, but are commented out. Are there any plans to reactivate these?

Regards,
Martina

From fab.schreiber at gmx.de Wed May 4 05:27:44 2005
From: fab.schreiber at gmx.de (fabian schreiber)
Date: Wed May 4 09:10:49 2005
Subject: [Biojava-l] NPE from FlatModel.<init> with SimpleModelInState
Message-ID:

Hello!

This thread appeared about a year ago, but there was no response to it. Is there any now?

I'm currently working on a complex HMM with several SimpleModelInStates.
When I try to call

[code]
DP dp = DPFactory.DEFAULT.createDP(outer_Hmm);
[/code]

where outer_Hmm is the HMM, I get the following message:

[code]
java.lang.NullPointerException
    at org.biojava.bio.dp.FlatModel.<init>(FlatModel.java:232)
    at org.biojava.bio.dp.DP.flatView(DP.java:96)
    at org.biojava.bio.dp.DPFactory$DefaultFactory.createDP(DPFactory.java:51)
    at start.main(start.java:103)
[/code]

Can someone help me or give me some advice? Or is it just a bug, so that I have to solve the problem differently?

Thanks a lot!
Fabian

From hollandr at gis.a-star.edu.sg Wed May 4 21:32:04 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Wed May 4 21:26:25 2005
Subject: [Biojava-l] Term Synonyms
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D560195170D@BIONIC.biopolis.one-north.com>

I have added the concept of synonyms, found in BioSQL, to the BioJava ontology term objects. I have also made the BioSQL interface part of BioJava aware of these synonyms and capable of retrieving them from, and persisting them to, the database.

This has resulted in the addition of three new methods (getSynonyms(), addSynonym(synonym), removeSynonym(synonym)) to the Term interface, with associated additions in those interfaces which extend it. Likewise, one extra method (createTerm(name,description,synonyms)) has been added to the Ontology interface.

Please let me know if there are any problems.

cheers,
Richard

Richard Holland
Bioinformatics Specialist
Genome Institute of Singapore
60 Biopolis Street, #02-01 Genome, Singapore 138672
Tel: (65) 6478 8000
DID: (65) 6478 8199
Email: hollandr@gis.a-star.edu.sg

From yezhenqing at yahoo.com.cn Tue May 10 03:04:24 2005
From: yezhenqing at yahoo.com.cn (zhenqing ye)
Date: Tue May 10 02:57:30 2005
Subject: [Biojava-l] topic about RestrictionEnzyme
Message-ID: <20050510070425.67595.qmail@web15312.mail.bjs.yahoo.com>

Hi,

I have tried biojava-1.4pre1.zip for my work, but there seem to be some bugs (I am not very sure) in the RestrictionEnzyme-related classes. I have written the following code:

    ...
    RestrictionEnzyme enzyme = RestrictionEnzymeManager.getEnzyme("Bsp24I");
    System.out.println(enzyme.getName());
    System.out.println(enzyme.getCutType());
    try {
        System.out.println(enzyme.getUpstreamEndType());
        System.out.println(enzyme.getUpstreamCut()[0]);
        System.out.println(enzyme.getUpstreamCut()[1]);
        System.out.println(enzyme.getDownstreamEndType());
        System.out.println(enzyme.getDownstreamCut()[0]);
        System.out.println(enzyme.getDownstreamCut()[1]);
    } catch (BioException bioe) {
        System.out.println("some error");
    }
    ...

The output is as follows (# marks a comment):

    Bsp24I  (#Name)
    1       (#CutType)
    1       (#UpstreamEndType)
    8       (#UpstreamCut[0])
    13      (#UpstreamCut[1])
    2       (#DownstreamEndType)
    1       (#DownstreamCut[0])
    1       (#DownstreamCut[1])

Furthermore, the explanation in REBASE version 504 reads as follows:

    ENZYMES WITH UNUSUAL CLEAVAGE PROPERTIES:
    Enzymes that cut on both sides of their recognition sequences, such as BcgI, Bsp24I, CjeI and CjePI, have 4 cleavage sites each instead of 2.

    Bsp24I
    5' ^NNNNNNNNGACNNNNNNTGGNNNNNNNNNNNN^ 3'
    3' ^NNNNNNNNNNNNNCTGNNNNNNACCNNNNNNN^ 5'

    This will be described in some REBASE reports as:
    Bsp24I (8/13)GACNNNNNNTGG(12/7)

So I think the expected values for the last three parameters should be:

    0       (#DownstreamEndType)
    12      (#DownstreamCut[0])
    7       (#DownstreamCut[1])

Has anyone met this situation and can comment on it? Otherwise I should probably try my best to dig into the source code.

Thanks,
Zhenqing Ye

From mark.schreiber at novartis.com Tue May 10 21:35:17 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Tue May 10 21:28:22 2005
Subject: [Biojava-l] topic about RestrictionEnzyme
Message-ID:

Hi -

This does look like an error, although I'm no expert on restriction enzymes. Can anyone else comment? If you do find the source of the error please let us know.

- Mark

From yezhenqing at yahoo.com.cn Tue May 10 22:04:44 2005
From: yezhenqing at yahoo.com.cn (zhenqing ye)
Date: Tue May 10 21:57:26 2005
Subject: [Biojava-l] topic about RestrictionEnzyme
Message-ID: <20050511020444.28516.qmail@web15305.mail.bjs.yahoo.com>

Hi,

I have solved the problem, so let me share it. The following line in RestrictionEnzymeManager.java should be changed as shown:

    ....
    // Create site value splitter
    RegexSplitter site = new RegexSplitter(Pattern.compile("(\\(-?\\d/-?\\d+\\)|[A-Za-z^]+)"), 1);    # old one
    ...
    new RegexSplitter(Pattern.compile("(\\(-?\\d+/-?\\d+\\)|[A-Za-z^]+)"), 1);    # new one

Just add one "+" character. I think it's a small oversight rather than a real bug.

Furthermore, the expected values in my previous message should be changed too:

    0  (#DownstreamEndType)     ----->  0  (#DownstreamEndType)
    12 (#DownstreamCut[0])      ----->  24 (#DownstreamCut[0])
    7  (#DownstreamCut[1])      ----->  19 (#DownstreamCut[1])

    Bsp24I
    5' ^NNNNNNNNGACNNNNNNTGGNNNNNNNNNNNN^ 3'
    3' ^NNNNNNNNNNNNNCTGNNNNNNACCNNNNNNN^ 5'

    length(GACNNNNNNTGGNNNNNNNNNNNN)=24
    length(CTGNNNNNNACCNNNNNNN)=19

Thanks,
zhenqing ye
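The effect of the missing "+" can be checked with plain java.util.regex, without BioJava. The sketch below is an illustration only: it assumes the escaped characters in the posted pattern are literal parentheses (matching REBASE's "(8/13)" notation), the class name `CutSiteRegexDemo` is hypothetical, and the find() loop is a stand-in that may differ from what RegexSplitter does internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; uses only java.util.regex.
class CutSiteRegexDemo {

    // Old pattern: only ONE digit allowed before the slash.
    static final Pattern OLD = Pattern.compile("(\\(-?\\d/-?\\d+\\)|[A-Za-z^]+)");

    // New pattern: one or more digits before the slash.
    static final Pattern NEW = Pattern.compile("(\\(-?\\d+/-?\\d+\\)|[A-Za-z^]+)");

    // Collects every token the pattern finds in a REBASE-style site string.
    static List<String> tokens(Pattern p, String site) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(site);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String site = "(8/13)GACNNNNNNTGG(12/7)";
        // The old pattern cannot match "(12/7)" because "12" has two digits.
        System.out.println("old: " + tokens(OLD, site));
        System.out.println("new: " + tokens(NEW, site));
    }
}
```

Under these assumptions the old pattern never yields the downstream "(12/7)" token, which would be consistent with the wrong downstream values reported at the start of the thread.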

From mark.schreiber at novartis.com Wed May 11 01:09:34 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed May 11 01:03:06 2005
Subject: [Biojava-l] topic about RestrictionEnzyme
Message-ID:

I was going to patch this in CVS but it seems Richard beat me to it!

Mark Schreiber
Principal Scientist (Bioinformatics)
Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road #05-01 Chromos
Singapore 138670
www.nitd.novartis.com
phone +65 6722 2973
fax +65 6722 2910

From mark.schreiber at novartis.com Wed May 11 02:04:54 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed May 11 01:57:52 2005
Subject: [Biojava-l] Ant, Biojava and NetBeans
Message-ID:

Hello -

I have been playing with BioJava and NetBeans 4.0, and I have been pleasantly surprised to find it works very well. It copes with the idea of having lots of source directories, integrates very well with CVS and does a good job of building from the Ant build.xml.

The one thing I have had problems with is that it throws ClassCastExceptions when building Javadocs from the Ant script. This doesn't happen from the command line. It does actually manage to produce HTML (presumably without the nice custom taglets). Googling around for an answer, there is some suggestion it is caused by a JVM bug (something to do with forking processes).

Has anyone else noticed this? Anyone got a solution?

- Mark

From ks_moses at yahoo.com Wed May 11 05:22:50 2005
From: ks_moses at yahoo.com (Simon Moses)
Date: Wed May 11 05:18:41 2005
Subject: [Biojava-l] accessing data in servers(ex. NCBI) from our java programmes
Message-ID: <20050511092250.56885.qmail@web31804.mail.mud.yahoo.com>

Dear sir,

I am new to BioJava. I want to access data from NCBI (or other databanks) from my program. Is it possible to send a 'select query' and get data back? How do I find out the format in which the data is stored (text file, Oracle database, MySQL, etc.)? If the data is stored in an RDBMS, what is the connection string? If the data is in text files, how do I find out the paths and file names? Are the paths fixed?
My intention is to send an SQL query, get the data, process it and display it in my program after checking a few conditions.

I have seen an application which runs on a client machine and accesses these databases. For example, if we give it our DNA sequence and select the server and database, the program searches the NCBI server and returns DNA sequences which are nearly identical.

Thanks in advance,
--Simon Moses

**************************
Visit My Home Page
http://www.geocities.com/ks_moses
updated: 28 Sep 2004.
Simon Moses
**************************

From mark.schreiber at novartis.com Wed May 11 05:40:06 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed May 11 05:33:21 2005
Subject: [Biojava-l] accessing data in servers(ex. NCBI) from our java programmes
Message-ID:

Welcome to Biojava!

To get data from NCBI I recommend using their eutils (http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html).

To read sequence text files (or the sequence text returned by NCBI) have a look at the tutorials about SeqIO on http://www.biojava.org/docs/bj_in_anger/index.htm

To interact with a SQL database I would recommend getting a very recent version of BioJava from CVS and a very recent version of BioSQL (http://obda.open-bio.org/) from the open-bio CVS server (http://cvs.open-bio.org/), then have a look at the BioSQL tutorials at the bottom of the Biojava in Anger page (http://www.biojava.org/docs/bj_in_anger/index.htm).

Stay tuned for updates on the BioJava / BioSQL interface. This is an area of active development.

Best of luck.
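As a rough illustration of the eutils route: NCBI is queried over HTTP rather than with SQL, so a program builds a request URL and reads the text that comes back. The sketch below only constructs an EFetch URL for a FASTA record. The class name `EFetchUrl` is hypothetical, and the base URL and parameter names (db, id, rettype, retmode) reflect the EFetch interface as I understand it today, not anything stated in this thread, so check them against the current E-utilities documentation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; builds a request URL but does not contact NCBI.
class EFetchUrl {

    // Assumed base address of the EFetch utility.
    static final String BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi";

    // Builds a URL asking for one record in FASTA format as plain text.
    static String fastaUrl(String db, String id) {
        return BASE + "?db=" + enc(db) + "&id=" + enc(id) + "&rettype=fasta&retmode=text";
    }

    // URL-encodes a single query parameter value.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // NP_002060.1 is the accession from the FASTA header earlier in this archive.
        System.out.println(fastaUrl("protein", "NP_002060.1"));
    }
}
```

The returned text could then be opened with any HTTP client and handed to a FASTA reader such as the SeqIO routines mentioned above.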

- Mark

From gwaldon at geneinfinity.org Wed May 11 14:30:44 2005
From: gwaldon at geneinfinity.org (george waldon)
Date: Wed May 11 14:25:55 2005
Subject: RE: [Biojava-l] topic about RestrictionEnzyme
Message-ID: <200505111830.j4BIUin9089377@mmm1924.dulles19-verio.com>

From zhenqing ye:

> 0  (#DownstreamEndType)
> 24 (#DownstreamCut[0])
> 19 (#DownstreamCut[1])

BaeI generates 3' overhangs on both sides, so DownstreamEndType equals 1 (OVERHANG_3PRIME).

Looking at the code, I just found another bug. In RestrictionEnzyme.java, the first public constructor calls the second constructor, where "cutType = CUT_COMPOUND;" overwrites the assignment of cutType in the first one.

There is also a bug I found a while ago. In RestrictionEnzymeManager.java, around 2/3 of the way down, putting

    for (Iterator ii = isoschizomers.iterator(); ii.hasNext();) {
        String isoName = (String) ii.next();
        Object re = nameToEnzyme.get(isoName);
        if (re != null)
            tempSet.add(re);
    }

helps to deal with isoschizomers.

I sometimes had a problem with the end of files in org.biojava.bio.program.tagvalue.Parser.java, which disappears after using mark(2):

    // end of record. Is it the last in the file?
    if (tv == null) {
        // scan for eof after whitespace
        boolean eof = false;
        while (true) {
            reader.mark(2);

Although I never investigated whether it was a real bug or not. Also, adding two line breaks at the end of the enzyme file helped.

- George

From mark.schreiber at novartis.com Wed May 11 23:15:25 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed May 11 23:17:41 2005
Subject: [Biojava-l] Changes to BlastSAXParser
Message-ID:

Hello -

I noticed a bug in the BlastSAXParser that truncated the database ID and description after one line. I modified this to fix this problem and checked it into CVS.
The JUnit tests pass but I'm not sure if this affects HMMER parsing. The BlastLikeSAXParser claims it can parse HMMER files but I've never confirmed this. If anyone uses this functionality, could they test it and let me know if there are problems?

- Mark

From yezhenqing at yahoo.com.cn Thu May 12 03:41:18 2005
From: yezhenqing at yahoo.com.cn (zhenqing ye)
Date: Thu May 12 03:34:18 2005
Subject: [Biojava-l] topic about RestrictionEnzyme
Message-ID: <20050512074118.31008.qmail@web15302.mail.bjs.yahoo.com>

> BaeI generates 3' overhangs on both sides so DownstreamEndType equals 1 (OVERHANG_3PRIME).

I don't think so. You can refer to the record in the REBASE database:

    <1>BaeI
    <2>
    <3>(10/15)ACNNNNGTAYC(12/7)

    5'                                                                           3'
     N N N N N^N N N N N N N N N N A C N N N N G T A Y C N N N N N N N N N N N N^
    ^N N N N N N N N N N N N N N N T G N N N N C A T R G N N N N N N N^N N N N N
    3'                                                                           5'

The left side should be OVERHANG_3PRIME=1, and the right side should be OVERHANG_5PRIME=0. The code runs OK for me.

> I just found another bug. In RestrictionEnzyme.java, the first public constructor calls the second constructor where "cutType = CUT_COMPOUND;" overwrites the attribution of cutType in the first one.

You can read the code in the first constructor:

    public RestrictionEnzyme(String name, SymbolList site, int dsForward, int dsReverse)
            throws IllegalAlphabetException {
        this(name, site, null, new int [] { dsForward, dsReverse });
        cutType = CUT_SIMPLE;   // it has been changed back
    }

As for the last situation you mentioned, I am not very sure about it because I have had no chance to try it.

This is only my own opinion! Furthermore, English is not my native language, but I hope it still conveys my meaning.

Thanks,
zhenqing ye
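The constructor-chaining point can be reproduced with a small stand-alone class. This is a toy illustration, not the BioJava source: `ToyEnzyme`, its fields and the constant values are hypothetical. It shows that a statement placed after `this(...)` runs once the delegated constructor has returned, so the later assignment wins.

```java
// Toy class for illustration; not the BioJava RestrictionEnzyme.
class ToyEnzyme {
    static final int CUT_SIMPLE = 0;    // hypothetical values
    static final int CUT_COMPOUND = 1;

    final String name;
    int cutType;

    // "Second" constructor: takes upstream and downstream cut arrays.
    ToyEnzyme(String name, int[] usCut, int[] dsCut) {
        this.name = name;
        this.cutType = CUT_COMPOUND;
    }

    // "First" constructor: delegates, then resets the cut type.
    ToyEnzyme(String name, int dsForward, int dsReverse) {
        this(name, null, new int[] { dsForward, dsReverse });
        cutType = CUT_SIMPLE; // executed after the delegated constructor body
    }

    public static void main(String[] args) {
        ToyEnzyme simple = new ToyEnzyme("simple", 1, 5);
        ToyEnzyme compound = new ToyEnzyme("compound", new int[] { 8, 13 }, new int[] { 24, 19 });
        System.out.println(simple.name + " cutType=" + simple.cutType);
        System.out.println(compound.name + " cutType=" + compound.cutType);
    }
}
```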

From mathias.fuchs at ch.unilog.com Thu May 12 06:28:34 2005
From: mathias.fuchs at ch.unilog.com (Fuchs, Mathias)
Date: Thu May 12 06:21:02 2005
Subject: [Biojava-l] mouse selection of sequences possible?
Message-ID:

Hi everyone,

I am new to BioJava and I am currently investigating suitable resources that I could use for an upcoming project in the pharmaceutical industry. I made some test implementations with BioJava and I think I will use it (or parts of it).

I have played around with some renderers but couldn't find out if there is a way to select (and highlight) a part of a sequence with the mouse and use the selected part for further computing. Is there a simple way to use mouse selection of sequences, or do I have to write my own display for that purpose?

Thanks in advance to everyone

Mathias Fuchs
Unilog IT Services AG - a Unilog company
Clarastrasse 12, CH-4058 Basel, Switzerland
fon +41 (0)61 699 2435
www.unilog-itservices.ch, www.unilog.com

From kalle.naslund at genpat.uu.se Thu May 12 07:11:39 2005
From: kalle.naslund at genpat.uu.se (Kalle Näslund)
Date: Thu May 12 07:06:14 2005
Subject: [Biojava-l] mouse selection of sequences possible?
In-Reply-To:
References:
Message-ID: <428339EB.6060103@genpat.uu.se>

Elloh!
I don't think there are any existing classes in BioJava that do what you want. But I have hacked together some code that does approximately what you describe. I'm doing this by using a LayeredRenderer that has two layers. The top layer holds the normal renderers that do the actual sequence visualisation. The layer underneath has a renderer that draws a given location as colored blocks. This gives the impression that you have marked some areas. Then you just create some mouse listeners that listen for the mouse events that are to be interpreted as selection actions and set the location to be rendered in the bottom layer accordingly. Hacking this together shouldn't be that hard; on the other hand, if you trust my code (it's pretty hackish) I'm willing to share it with you, either as GPL or LGPL licensed code. kind regards Kalle Näslund From gwaldon at geneinfinity.org Thu May 12 12:04:28 2005 From: gwaldon at geneinfinity.org (george waldon) Date: Thu May 12 11:57:11 2005 Subject: RE: [Biojava-l] topic about RestrictionEnzyme Message-ID: <200505121604.j4CG4Shs025924@mmm1924.dulles19-verio.com> Oops! My mistake (and apologies) about the constructors and cutType. Now, I must disagree about the overhangs. When the cut is not blunt-ended, it generates a small asymmetrical overhang, a piece of single-stranded DNA, which is designated by whether it sits at the 3' end or the 5' end of the strand of DNA it belongs to, e.g.

5' overhang:
NNN^3'
|||
NNNNNNN^5'

and 3' overhang:
NNNNNNN^3'
|||
NNN^5'

You can read the original article on BaeI from NEB: Sears LE, Zhou B, Aliotta JM, Morgan RD, Kong H. (1996) "BaeI, another unusual BcgI-like restriction endonuclease" Nucleic Acids Res. 1996 Sep 15;24(18):3590-2, which states on page 3591 that BaeI generates 3' overhangs. Also take a look at http://arbl.cvmbs.colostate.edu/hbooks/genetics/biotech/enzymes/renzymes.html which shows all this with nice pictures.
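The overhang rule described above can be sketched as a few lines of standalone Java. This is only an illustration, not BioJava's RestrictionEnzyme code: the class name `OverhangSketch`, the method `classify` and the coordinate convention (both cut positions counted 5'->3' along the forward strand from the same origin, with negative values for cuts upstream of the site) are assumptions made for the example. The enzyme figures are the REBASE-style values quoted in this thread, HgaI GACGC (5/10) and BaeI (10/15)ACNNNNGTAYC(12/7).

```java
// Sketch only: a standalone classifier, NOT the BioJava implementation.
final class OverhangSketch {

    /** End types, named after the constants mentioned in the thread. */
    enum EndType { OVERHANG_5PRIME, OVERHANG_3PRIME, BLUNT }

    /**
     * Classifies the ends left by one double-strand cut.
     *
     * ASSUMPTION: both arguments are measured on the same axis, i.e. the
     * number of bases, counted 5'->3' along the forward strand from a
     * common origin, after which each strand is cut.
     */
    static EndType classify(int forwardCut, int reverseCut) {
        if (forwardCut < reverseCut) {
            // The forward strand is cut first, so the protruding
            // single-stranded pieces carry 5' ends.
            return EndType.OVERHANG_5PRIME;
        } else if (forwardCut > reverseCut) {
            // The reverse strand is cut first, so the protruding
            // single-stranded pieces carry 3' ends.
            return EndType.OVERHANG_3PRIME;
        }
        return EndType.BLUNT;
    }

    public static void main(String[] args) {
        // HgaI GACGC (5/10): 5 bases past the site on top, 10 on the bottom.
        System.out.println("HgaI downstream: " + classify(5, 10));
        // BaeI (10/15)...(12/7): the upstream cuts are given negative
        // coordinates here (an assumption of this sketch).
        System.out.println("BaeI upstream:   " + classify(-10, -15));
        System.out.println("BaeI downstream: " + classify(12, 7));
        // Same position on both strands: blunt end.
        System.out.println("Blunt cut:       " + classify(3, 3));
    }
}
```

With these inputs the sketch reports a 5' overhang for HgaI and 3' overhangs on both sides of BaeI, which is what the NEB article is said to state. How BioJava itself stores its upstream and downstream cut positions is not shown here and would need to be checked against the source.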
Thanks George From yezhenqing at hotmail.com Fri May 13 20:48:15 2005 From: yezhenqing at hotmail.com (yezhenqing yezhenqing) Date: Fri May 13 20:40:58 2005 Subject: [Biojava-l] Re:topic about RestrictionEnzyme Message-ID: Maybe it's not appropriate for me to answer this question, but I am glad to share my understanding here (just my guess). From a biologist's point of view, the definition of overhangs in the reference is right. But for a cut type where the enzyme cuts at two positions relative to the recognition site, I think it is inconvenient for the code to implement this uniformly according to the biological rules. You can try more enzymes in the code; maybe you will find the rules for the definition of overhangs in the code implementation. For example, HgaI GACGC (5/10) indicates cleavage as follows:

5' GACGCNNNNN^ 3'
3' CTGCGNNNNNNNNNN^ 5'

It should be a 5' overhang according to the biological rule, but in the code implementation it's a 3' overhang. I think the definition of overhang in the code implementation is related to the strand direction. Anyway, it's not harmful to our biological tasks, and not confusing when solving problems, as long as we keep the difference in mind. thanks zhenqing ye From kalle.naslund at genpat.uu.se Mon May 16 04:35:55 2005 From: kalle.naslund at genpat.uu.se (Kalle Näslund) Date: Mon May 16 04:29:29 2005 Subject: [Biojava-l] mouse selection of sequences possible? In-Reply-To: References: Message-ID: <42885B6B.9070708@genpat.uu.se> I have just taken my code and slapped a BJ copyright notice at the top of each file. So just uncompress the file, copy the java files to the appropriate location (for now I just put everything under org.biojava.bio.gui.sequence) and rebuild BioJava. Then you can use the classes.
The code can be found at http://beluga.genpat.uu.se/kalle/selection_code.tgz In time, and if people feel this approach is the right way to add support for displaying selection bars, I will fix this code and put it in BJ. /Kalle From gwaldon at geneinfinity.org Mon May 16 16:44:49 2005 From: gwaldon at geneinfinity.org (george waldon) Date: Mon May 16 16:37:28 2005 Subject: RE: [Biojava-l] Re:topic about RestrictionEnzyme Message-ID: <200505162044.j4GKinf6088795@mmm1924.dulles19-verio.com> zhenqing ye wrote:
>For example HgaI GACGC (5/10) indicates cleavage as
>follows:
> 5' GACGCNNNNN^ 3'
> 3' CTGCGNNNNNNNNNN^ 5'
>it should be 5'-overhang according to biological rule, but
>in the code implementation it's 3'-overhang. i think the
>definition of overhang in the code implementation
>is related with the strand direction.

It's a 5' overhang and if BioJava says otherwise, it is most probably a bug. It is just confusing because we use the word "end" to describe both the enzyme cuts (end type, in the BioJava javadoc terminology) and the strand termini (termination of a strand, either 3' end or 5' end). - George From mark.schreiber at novartis.com Mon May 16 21:40:33 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Mon May 16 21:33:14 2005 Subject: [Biojava-l] Re:topic about RestrictionEnzyme Message-ID: If there is a consensus on how to fix this problem (if there is a problem) let me know so I can check it in to CVS. - Mark "george waldon" Sent by: biojava-l-bounces@portal.open-bio.org 05/17/2005 04:44 AM Please respond to george waldon To: biojava-l@biojava.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: RE: [Biojava-l] Re:topic about RestrictionEnzyme zhenqing ye wrote:
>For example HgaI GACGC (5/10) indicates cleavage as
>follows:
> 5' GACGCNNNNN^ 3'
> 3' CTGCGNNNNNNNNNN^ 5'
>it should be 5'-overhang according to biological rule, but
>in the code implementation it's 3'-overhang.
i think the
>definition of overhang in the code implementation
>is related with the strand direction.

It's a 5' overhang and if BioJava says otherwise, it is most probably a bug. It is just confusing because we use the word "end" to describe both the enzyme cuts (end type, in the BioJava javadoc terminology) and the strand termini (termination of a strand, either 3' end or 5' end). - George _______________________________________________ Biojava-l mailing list - Biojava-l@biojava.org http://biojava.org/mailman/listinfo/biojava-l From yezhenqing at yahoo.com.cn Mon May 16 22:31:36 2005 From: yezhenqing at yahoo.com.cn (zhenqing ye) Date: Mon May 16 22:24:20 2005 Subject: [Biojava-l] Re:topic about RestrictionEnzyme Message-ID: <20050517023136.11335.qmail@web15309.mail.cnb.yahoo.com> Oops! My mistake (and apologies) about my understanding of the definition of overhangs. After thinking for a while and diving into the code again: George is right, maybe it truly is a bug. The confusing code is the following method in RestrictionEnzyme.java

...
    public int getDownstreamEndType()
    {
        if (dsCutPositions[0] > dsCutPositions[1]) //if dsCutPositions[0]>dsCutPositions[1], it should be OVERHANG_3PRIME
            return OVERHANG_5PRIME; //to fix it, we can change the ">" to "<" or change the OVERHANG_5PRIME
        else if (dsCutPositions[0] < dsCutPositions[1]) //vice versa
            return OVERHANG_3PRIME;
        else
            return BLUNT;
    }
...

To confirm this judgement: the following code in the same class does the job correctly. So I am sorry for my mistake. ^_^

...
    public int getUpstreamEndType() throws BioException
    {
        if (cutType == CUT_SIMPLE)
            throw new BioException(name + " does not cut upstream of the recognition site");
        if (usCutPositions[0] > usCutPositions[1]) //this is right
            return OVERHANG_5PRIME;
        else if (usCutPositions[0] < usCutPositions[1]) //this is right
            return OVERHANG_3PRIME;
        else
            return BLUNT;
    }
...

Thanks to George ^_^ Thanks to all zhenqing ye
From boehme at mpiib-berlin.mpg.de Wed May 18 06:07:54 2005 From: boehme at mpiib-berlin.mpg.de (Martina) Date: Wed May 18 06:01:44 2005 Subject: [Biojava-l] nested features Message-ID: <428B13FA.2090306@mpiib-berlin.mpg.de> Hi, I'm still having problems with nested features. I know that the order in which they are retrieved is not constant, but the hierarchy should be. I do:

delete(db, "AF100928");
create(db, sequence);
retrieveSeq(db, "AF100928");

and get:

deleting AF100928
adding a sequence
1
retrieving AF100928
AF100928 contains 2 features
Source: 1 Type: AIF-1 contains: 0
Project : [2572/73, A]
Source: ncbi Type: gen contains: 8
Second Level: Source: AIF Type: siRNA contains: 0
Second Level: Source: AIF 5`450 / AIF 3`682 Type: Amplicon contains: 0
Second Level: Source: AIF Type: siRNA contains: 0
Second Level: Source: AIF 5`450 / AIF 3`682 Type: Amplicon contains: 0
Second Level: Source: AIF Type: siRNA contains: 0
Second Level: Source: AIF 5`450 / AIF 3`682 Type: Amplicon contains: 0
Second Level: Source: AIF Type: siRNA contains: 0
Second Level: Source: AIF 5`450 / AIF 3`682 Type: Amplicon contains: 0

OR, on some runs (at least one in 10) I get:

deleting AF100928
adding a sequence
1
retrieving AF100928
AF100928 contains 1 features
Source: ncbi Type: gen contains: 2
Second Level: Source: AIF Type: siRNA contains: 1
Second Level: Source: AIF 5`450 / AIF 3`682 Type: Amplicon contains: 0

I can't see any difference in the biosql db (MySQL), but I don't know where to look exactly. If I don't delete the record and just retrieve it on different runs, the output stays the same. I'm using biojava live from last month. I would be very grateful if someone could help me with this!
Martina I add it like this: private static void create(BioSQLSequenceDB db, String sequence) throws IllegalSymbolException, IllegalArgumentException, ChangeVetoException, IndexOutOfBoundsException, BioException { Sequence seq = DNATools.createDNASequence(sequence, "AF100928"); // add annotation seq.getAnnotation().setProperty("Lab-Genname", "AIF"); seq.getAnnotation().setProperty("gi", "4323586"); seq.getAnnotation() .setProperty("genename", "apoptosis-inducing factor"); seq.getAnnotation().setProperty("organism", "Homo sapiens"); seq.getAnnotation().setProperty("protein-id", "AAD16436.1"); // fill the template Feature.Template templSeq = new Feature.Template(); templSeq.source = "ncbi"; // templSeq.strand = StrandedFeature.UNKNOWN; templSeq.type = "gen"; templSeq.location = Location.empty; Feature seqF = seq.createFeature(templSeq); // create siRNA StrandedFeature.Template templSt = new StrandedFeature.Template(); templSt.location = new RangeLocation(201, 221); templSt.strand = StrandedFeature.POSITIVE; templSt.source = "AIF"; // =>id templSt.type = "siRNA"; // oder shRNA oder cellline? 
Annotation annoSiRNA = new SimpleAnnotation(); annoSiRNA.setProperty("abbreviation", "AIF"); annoSiRNA.setProperty("no", 3); templSt.annotation = annoSiRNA; Feature sf = seqF.createFeature(templSt); // create charge Feature.Template charge = new Feature.Template(); charge.type = "AIF-1"; charge.source = "1"; // Erste Synthese charge.location = Location.empty; Annotation chargeAnno = new SmallAnnotation(); chargeAnno.setProperty("Batchno", "2572/73"); chargeAnno.setProperty("Project", "A"); charge.annotation = chargeAnno; Feature cf = sf.createFeature(charge); // create Amplicon StrandedFeature.Template templ = new StrandedFeature.Template(); templ.location = new RangeLocation(72, 92); templ.source = "AIF 5`450 / AIF 3`682"; templ.type = "Amplicon"; templ.strand = StrandedFeature.UNKNOWN; Annotation annoAmplicon = new SimpleAnnotation(); annoAmplicon.setProperty("Provider", "Invitrogen"); annoAmplicon.setProperty("designed by", "Chris"); annoAmplicon.setProperty("Stock-Box/Position", "rtS-A3/A5 + rtS-A3/A6"); annoAmplicon.setProperty("Arbeitsl?sung", "rtArb-A1/C3"); // templ.annotation = annoAmplicon; Feature af = seqF.createFeature(templ); System.out.println("adding a sequence"); System.out.println(seq.countFeatures()); db.addSequence(seq); } and here it gets printed: private static void retrieveSeq(BioSQLSequenceDB db, String seqName) throws BioException, NoSuchElementException { Sequence seq; System.out.println("retrieving " + seqName); seq = db.getSequence(seqName); System.out.println(seq.getName() + " contains " + seq.countFeatures() + " features"); for (Iterator i = seq.features(); i.hasNext();) { Feature f = (Feature) i.next(); System.out.println("Source: " + f.getSource() + " Type: " + f.getType() + " contains: " + f.countFeatures()); Annotation anno = f.getAnnotation(); // print each key value pair for (Iterator it = anno.keys().iterator(); it.hasNext();) { Object key = it.next(); System.out.println(key + " : " + anno.getProperty(key)); } for (Iterator it = 
f.features(); it.hasNext();) { Feature fs = (Feature) it.next(); System.out.println("Second Level: Source: " + fs.getSource() + " Type: " + fs.getType() + " contains: " + fs.countFeatures()); } } From fab.schreiber at gmx.de Wed May 18 12:17:12 2005 From: fab.schreiber at gmx.de (Fabian Schreiber) Date: Wed May 18 12:18:49 2005 Subject: [Biojava-l] How to read in an aligment and then start viterbi Message-ID: Hello! I created a Hmm with an Alphabet, that contains 9 different Proteins such as ALA, ARG, etc. After i created this model, i want to read in an aligment from a file and start Viterbi with the aligment as input. When i do so, i always get the following error: [code] org.biojava.bio.symbol.IllegalSymbolException: Symbol ARG not found in alphabet DnaAlphabet at org.biojava.bio.symbol.AbstractAlphabet.validate(AbstractAlphabet.java:278) at org.biojava.bio.symbol.LinearAlphabetIndex.indexForSymbol(LinearAlphabetIndex.java:117) at org.biojava.bio.dist.SimpleDistribution.getWeightImpl(SimpleDistribution.java:131) at org.biojava.bio.dist.AbstractDistribution.getWeight(AbstractDistribution.java:197) at org.biojava.bio.dp.ScoreType$Probability.calculateScore(ScoreType.java:54) at org.biojava.bio.dp.onehead.SingleDP.getEmission(SingleDP.java:100) at org.biojava.bio.dp.onehead.SingleDP.viterbi(SingleDP.java:553) at org.biojava.bio.dp.onehead.SingleDP.viterbi(SingleDP.java:488) [/code] It looks as if the alphabets (one from the hmm and the other from the alignment) differ, but i debugged it carefully and also checked the indices of both alphabets, which are equal, but could not find a problem. Is it a bug? Did anyone of you experienced a similiar problem? Can someone help me please? Its really important. 
Thanks a lot Greetings Fabian From hollandr at gis.a-star.edu.sg Wed May 18 21:52:51 2005 From: hollandr at gis.a-star.edu.sg (Richard HOLLAND) Date: Wed May 18 21:46:26 2005 Subject: [Biojava-l] How to read in an aligment and then start viterbi Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601B94325@BIONIC.biopolis.one-north.com> >From the exception trace it looks as though Viterbi thinks you are using a DnaAlphabet when in fact you have a custom one. However without seeing your code it's hard to tell if this is a bug in BioJava, or a problem in the way Viterbi has been called. Could you provide a sample method which shows (a) how you construct the alphabet, (b) creating the Hmm with it, (c) reading the alignment file, and (d) passing it to Viterbi. It doesn't have to be very complicated or intricate as long as it shows what you are doing and produces similar exceptions. cheers, Richard Richard Holland Bioinformatics Specialist GIS extension 8199 --------------------------------------------- This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you. --------------------------------------------- > -----Original Message----- > From: biojava-l-bounces@portal.open-bio.org > [mailto:biojava-l-bounces@portal.open-bio.org] On Behalf Of > Fabian Schreiber > Sent: Thursday, May 19, 2005 12:17 AM > To: biojava-l@biojava.org > Subject: [Biojava-l] How to read in an aligment and then start viterbi > > > Hello! > I created a Hmm with an Alphabet, that > contains 9 different Proteins such as ALA, ARG, etc. > After i created this model, i want to read in an > aligment from a file and start Viterbi with the aligment as input. 
> When i do so, i always get the following error: > [code] > org.biojava.bio.symbol.IllegalSymbolException: Symbol ARG not > found in > alphabet DnaAlphabet > at > org.biojava.bio.symbol.AbstractAlphabet.validate(AbstractAlpha > bet.java:278) > at > org.biojava.bio.symbol.LinearAlphabetIndex.indexForSymbol(Line > arAlphabetIndex.java:117) > at > org.biojava.bio.dist.SimpleDistribution.getWeightImpl(SimpleDi > stribution.java:131) > at > org.biojava.bio.dist.AbstractDistribution.getWeight(AbstractDi > stribution.java:197) > at > org.biojava.bio.dp.ScoreType$Probability.calculateScore(ScoreT > ype.java:54) > at > org.biojava.bio.dp.onehead.SingleDP.getEmission(SingleDP.java:100) > at > org.biojava.bio.dp.onehead.SingleDP.viterbi(SingleDP.java:553) > at > org.biojava.bio.dp.onehead.SingleDP.viterbi(SingleDP.java:488) > [/code] > > It looks as if the alphabets (one from the hmm and the other from the > alignment) differ, but i debugged it carefully and also checked > the indices of both alphabets, which are equal, but could not find > a problem. > Is it a bug? > Did anyone of you experienced a similiar problem? > > Can someone help me please? > > Its really important. > > Thanks a lot > > Greetings > > Fabian > > _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l > From hollandr at gis.a-star.edu.sg Wed May 18 22:33:59 2005 From: hollandr at gis.a-star.edu.sg (Richard HOLLAND) Date: Wed May 18 22:28:15 2005 Subject: [Biojava-l] nested features Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601B9432F@BIONIC.biopolis.one-north.com> Just to get this straight, if I follow your code correctly, then the second example output you should be seeing is the correct one. Could you confirm that this is what you are expecting? At least then it gives us something to work towards as a definite known correct answer. 
It would also be helpful if you could provide, for each of the two cases, the contents of the seqfeature and seqfeature_relationship tables. You don't need to provide them all. Just find out the bioentry_id of the inserted sequence (look it up in the 'bioentry' table using the 'name' column to look for your sequence name), then record all the rows from seqfeature with bioentry_id being that value, and all rows from seqfeature_relationship where either subject_seqfeature_id or object_seqfeature_id matches any of the seqfeature_id values you just got from the seqfeature table. In other words (assuming Oracle, but similar queries will work elsewhere):

select seqfeature.*
from seqfeature, bioentry
where seqfeature.bioentry_id = bioentry.bioentry_id
and bioentry.name = 'AF100928';

select seqfeature_relationship.*
from seqfeature_relationship, seqfeature, bioentry
where (subject_seqfeature_id = seqfeature.seqfeature_id
or object_seqfeature_id = seqfeature.seqfeature_id)
and seqfeature.bioentry_id = bioentry.bioentry_id
and bioentry.name = 'AF100928';

Also could you change all the "Iterator<...>" references in your test code to just "Iterator", as BioJava does not use Java 1.5 internally and I'm not sure what effect genericising the iterator might have (Java 1.5 can do funny things due to the effect of what they call erasure). This probably won't fix it, but it's worth a try. cheers, Richard Richard Holland Bioinformatics Specialist GIS extension 8199 --------------------------------------------- This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you.
--------------------------------------------- > -----Original Message----- > From: biojava-l-bounces@portal.open-bio.org > [mailto:biojava-l-bounces@portal.open-bio.org] On Behalf Of Martina > Sent: Wednesday, May 18, 2005 6:08 PM > To: biojava-l@biojava.org > Subject: [Biojava-l] nested features > > > Hi, > I'm still having problems with nested features. I know that the order > in which they are retrieved is not constant, but the hierachy should > be. I do: > > delete(db, "AF100928"); > create(db, sequence); > retrieveSeq(db, "AF100928"); > > and get: > > deleting AF100928 > adding a sequence > 1 > retrieving AF100928 > AF100928 contains 2 features > Source: 1 Type: AIF-1 contains: 0 > Project : [2572/73, A] > Source: ncbi Type: gen contains: 8 > Second Level: Source: AIF Type: siRNA contains: 0 > Second Level: Source: AIF 5`450 / AIF 3`682 Type: Amplicon contains: 0 > Second Level: Source: AIF Type: siRNA contains: 0 > Second Level: Source: AIF 5`450 / AIF 3`682 Type: Amplicon contains: 0 > Second Level: Source: AIF Type: siRNA contains: 0 > Second Level: Source: AIF 5`450 / AIF 3`682 Type: Amplicon contains: 0 > Second Level: Source: AIF Type: siRNA contains: 0 > Second Level: Source: AIF 5`450 / AIF 3`682 Type: Amplicon contains: 0 > > > OR, on some runs (at least one in 10) I get: > > deleting AF100928 > adding a sequence > 1 > retrieving AF100928 > AF100928 contains 1 features > Source: ncbi Type: gen contains: 2 > Second Level: Source: AIF Type: siRNA contains: 1 > Second Level: Source: AIF 5`450 / AIF 3`682 Type: Amplicon contains: 0 > > I can't see any difference in the biosql db (MySql), but I don't know > where to look exactly. > If I don't delete the record, just retrieve it on different runs, the > output stays the same. I'm using biojava live from last month. > I would be very grateful if someone could help me with this! 
> > Martina > > I add it like this: > > private static void create(BioSQLSequenceDB db, String sequence) > throws IllegalSymbolException, > IllegalArgumentException, > ChangeVetoException, IndexOutOfBoundsException, BioException { > Sequence seq = > DNATools.createDNASequence(sequence, "AF100928"); > // add annotation > seq.getAnnotation().setProperty("Lab-Genname", "AIF"); > seq.getAnnotation().setProperty("gi", "4323586"); > seq.getAnnotation() > .setProperty("genename", > "apoptosis-inducing factor"); > seq.getAnnotation().setProperty("organism", > "Homo sapiens"); > seq.getAnnotation().setProperty("protein-id", > "AAD16436.1"); > // fill the template > Feature.Template templSeq = new Feature.Template(); > templSeq.source = "ncbi"; > // templSeq.strand = StrandedFeature.UNKNOWN; > templSeq.type = "gen"; > templSeq.location = Location.empty; > Feature seqF = seq.createFeature(templSeq); > // create siRNA > StrandedFeature.Template templSt = new > StrandedFeature.Template(); > templSt.location = new RangeLocation(201, 221); > templSt.strand = StrandedFeature.POSITIVE; > templSt.source = "AIF"; // =>id > templSt.type = "siRNA"; // oder shRNA oder cellline? 
> Annotation annoSiRNA = new SimpleAnnotation(); > annoSiRNA.setProperty("abbreviation", "AIF"); > annoSiRNA.setProperty("no", 3); > templSt.annotation = annoSiRNA; > Feature sf = seqF.createFeature(templSt); > // create charge > Feature.Template charge = new Feature.Template(); > charge.type = "AIF-1"; > charge.source = "1"; // Erste Synthese > charge.location = Location.empty; > Annotation chargeAnno = new SmallAnnotation(); > chargeAnno.setProperty("Batchno", "2572/73"); > chargeAnno.setProperty("Project", "A"); > charge.annotation = chargeAnno; > Feature cf = sf.createFeature(charge); > // create Amplicon > StrandedFeature.Template templ = new > StrandedFeature.Template(); > templ.location = new RangeLocation(72, 92); > templ.source = "AIF 5`450 / AIF 3`682"; > templ.type = "Amplicon"; > templ.strand = StrandedFeature.UNKNOWN; > Annotation annoAmplicon = new SimpleAnnotation(); > annoAmplicon.setProperty("Provider", "Invitrogen"); > annoAmplicon.setProperty("designed by", "Chris"); > annoAmplicon.setProperty("Stock-Box/Position", > "rtS-A3/A5 + rtS-A3/A6"); > annoAmplicon.setProperty("Arbeitsl?sung", > "rtArb-A1/C3"); > // templ.annotation = annoAmplicon; > Feature af = seqF.createFeature(templ); > > System.out.println("adding a sequence"); > System.out.println(seq.countFeatures()); > db.addSequence(seq); > } > > and here it gets printed: > > private static void retrieveSeq(BioSQLSequenceDB db, String seqName) > throws BioException, NoSuchElementException { > Sequence seq; > System.out.println("retrieving " + seqName); > seq = db.getSequence(seqName); > System.out.println(seq.getName() + " contains " + seq.countFeatures() > + " features"); > for (Iterator i = seq.features(); i.hasNext();) { > Feature f = (Feature) i.next(); > System.out.println("Source: " + f.getSource() + " Type: " + > f.getType() + " contains: " + f.countFeatures()); > Annotation anno = f.getAnnotation(); > // print each key value pair > for (Iterator it = anno.keys().iterator(); 
it.hasNext();) { > Object key = it.next(); > System.out.println(key + " : " + anno.getProperty(key)); > } > for (Iterator it = f.features(); it.hasNext();) { > Feature fs = (Feature) it.next(); > System.out.println("Second Level: Source: " + > fs.getSource() > + " Type: " + fs.getType() + " > contains: " > + fs.countFeatures()); > } > } > _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l > From boehme at mpiib-berlin.mpg.de Thu May 19 08:18:09 2005 From: boehme at mpiib-berlin.mpg.de (Martina) Date: Thu May 19 08:12:00 2005 Subject: [Biojava-l] nested features In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5601B9432F@BIONIC.biopolis.one-north.com> References: <6D9E9B9DF347EF4385F6271C64FB8D5601B9432F@BIONIC.biopolis.one-north.com> Message-ID: <428C8401.2080700@mpiib-berlin.mpg.de> Thanks for your interest, Richard, I should have made this clearer: yes, I'm expecting the second output: retrieving AF100928 AF100928 contains 1 features Source: ncbi Type: gen contains: 2 Second Level: Source: AIF Type: siRNA contains: 1 Second Level: Source: AIF 5`450 / AIF 3`682 Type: Amplicon contains: 0 I am trying to upgrade to MySQL 4.1.12 (was 4.1 alpha before (maybe it was a problem there)) and it is not working right now, but I will post the results of the sql statements as soon as the db is up again. Martina Richard HOLLAND wrote: > Just to get this straight, if I follow your code correctly, then the second example output you should be seeing is the correct one. Could you confirm that this is what you are expecting? At least then it gives us something to work towards as a definite known correct answer. > > It would also be helpful if you could provide, for each of the two cases, the contents of the seqfeature and seqfeature_relationship tables. You don't need to provide them all. 
Just find out the bioentry_id of the inserted sequence (look it up in the 'bioentry' table using the 'name' column to look for your sequence name), then record all the rows from seqfeature with bioentry_id being that value, and all rows from seqfeature_relationship where either subject_seqfeature_id or object_seqfeature_id matches any of the seqfeature_id values you just got from the seqfeature table. > > In other words (assuming Oracle, but similar queries will work elsewhere): > > select seqfeature.* > from seqfeature, bioentry > where seqfeature.bioentry_id = bioentry.bioentry_id > and bioentry.name = 'AF100928'; > > select seqfeature_relationship.* > from seqfeature_relationship, seqfeature, bioentry > where (subject_seqfeature_id = seqfeature.seqfeature_id > or object_seqfeature_id = seqfeature.seqfeature_id) > and seqfeature.bioentry_id = bioentry.bioentry_id > and bioentry.name = 'AF100928'; > > Also could you change all the "Iterator" references in your test code to just "Iterator" as BioJava does not use Java 1.5 internally and I'm not sure what effect genericising the iterator might have (Java 1.5 can do funny things due to the effect of what they call erasure). This probably won't fix it, but it's worth a try. From cox at mshri.on.ca Thu May 19 18:43:50 2005 From: cox at mshri.on.ca (Brian Cox) Date: Thu May 19 18:36:01 2005 Subject: [Biojava-l] BioException Error Message-ID: <005701c55cc4$3a591830$426cc50a@ad.mshri.on.ca> I am getting a BioException Error thrown on a new MAC but the same code and BioJava version running on a windows box and a older MAC are okay. Is there any known issue with newer MACs? the error is: AlignExtractTF.java:221: cannot access BioException bad class file: ./BioException.java file does not contain class BioException Please remove or make sure it appears in the correct subdirectory of the classpath. 
public void mapSequence(File fileName)throws FileNotFoundException, IOException{ ^ 1 error the problematic code is; public void mapSequence(File fileName)throws FileNotFoundException, IOException{ try{ BufferedReader in = new BufferedReader (new FileReader (fileName)); SequenceIterator si = (SequenceIterator)SeqIOTools.fileToBiojava("fasta","dna", in); while (si.hasNext()){ Sequence seq = si.nextSequence(); seqMap.put(seq.getName(), seq.seqString()); seqAvailible.append(seq.getName()+"\n"); } } catch (FileNotFoundException ex) { //can't find file specified ex.printStackTrace(); } catch (BioException ex) { //error parsing requested format ex.printStackTrace(); } thanks Brian Cox Samuel Lunenfeld Research Institute Mount Sinai Hospital, Rm 884 Toronto, Ontario Canada 416-586-8266 From hollandr at gis.a-star.edu.sg Thu May 19 21:09:34 2005 From: hollandr at gis.a-star.edu.sg (Richard HOLLAND) Date: Thu May 19 21:03:29 2005 Subject: [Biojava-l] BioException Error Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601B943AB@BIONIC.biopolis.one-north.com> The machine you run your code on should make no difference at all in this case. The problem is more likely to be with the Java environment on your newer machine, and/or the way in which BioJava was compiled. It does look like you may be running different versions of Java on the two machines. This could be a major contributing factor, depending on the vendors/versions of the two JREs you have installed. However, from the exception trace it looks more like you might have some of the individual BioJava .java files on your classpath, which would/should not work (I'm surprised it does on your older machine if the commands you use to run your program are the same on both!). Try compiling the BioJava jar file on your newer machine (using the build.xml Ant script supplied with the BioJava distribution), and adding only the jar files it generates to your classpath. 
Make sure your classpath contains nothing else (except other jar files your project requires, that is). cheers, Richard Richard Holland Bioinformatics Specialist GIS extension 8199 --------------------------------------------- This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you. --------------------------------------------- > -----Original Message----- > From: biojava-l-bounces@portal.open-bio.org > [mailto:biojava-l-bounces@portal.open-bio.org] On Behalf Of Brian Cox > Sent: Friday, May 20, 2005 6:44 AM > To: biojava-l@biojava.org > Subject: [Biojava-l] BioException Error > > > I am getting a BioException Error thrown on a new MAC but the > same code and > BioJava version running on a windows box and a older MAC are > okay. Is there > any known issue with newer MACs? > > the error is: > AlignExtractTF.java:221: cannot access BioException > bad class file: ./BioException.java > file does not contain class BioException > Please remove or make sure it appears in the correct subdirectory of > the classpath. 
> public void mapSequence(File fileName)throws > FileNotFoundException, IOException{ > > > ^ > 1 error > > > the problematic code is; > > > public void mapSequence(File fileName)throws FileNotFoundException, > IOException{ > try{ > BufferedReader in = new BufferedReader (new FileReader (fileName)); > SequenceIterator si = > (SequenceIterator)SeqIOTools.fileToBiojava("fasta","dna", in); > while (si.hasNext()){ > Sequence seq = si.nextSequence(); > seqMap.put(seq.getName(), seq.seqString()); > seqAvailible.append(seq.getName()+"\n"); > } > > } > catch (FileNotFoundException ex) { > //can't find file specified > ex.printStackTrace(); > } > catch (BioException ex) { > //error parsing requested format > ex.printStackTrace(); > } > > > thanks > > > Brian Cox > Samuel Lunenfeld Research Institute > Mount Sinai Hospital, Rm 884 > Toronto, Ontario > Canada > > 416-586-8266 > _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l > From boehme at mpiib-berlin.mpg.de Mon May 23 04:15:18 2005 From: boehme at mpiib-berlin.mpg.de (Martina) Date: Mon May 23 04:09:29 2005 Subject: [Biojava-l] nested features In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5601B9432F@BIONIC.biopolis.one-north.com> References: <6D9E9B9DF347EF4385F6271C64FB8D5601B9432F@BIONIC.biopolis.one-north.com> Message-ID: <42919116.2020701@mpiib-berlin.mpg.de> It looks like it was a database problem and had nothing to do with biojava - since MySQL 4.1.12 is running, I don't have that problem anymore. Sorry to have bothered you! Martina From daniela.saccol at gmail.com Mon May 23 19:39:08 2005 From: daniela.saccol at gmail.com (Daniela Saccol) Date: Mon May 23 19:31:46 2005 Subject: [Biojava-l] Help me, please!! Message-ID: <7bf45b8805052316391808059d@mail.gmail.com> Hi! I am starting to work with the Biojava and I am finding problems for which not meeting the answers. 
If somebody could help me, I would be immensely thankful. I have an HMM that has 81 main states, a beginning state, an end state, 82 delete states and 82 insert states. The delete states do not emit symbols, and neither do the beginning and end states. The main and insert states emit. The problem occurs when I give the following command:

DP dp = DPFactory.DEFAULT.createDP(seqPromoter);

From this point on, the program makes no further progress, but it also reports no errors. I read somewhere that this could happen in one of the versions of BioJava, because it did not accept more than one DotState (I have 82). Is this true? If so, do later versions of BioJava fix this problem, and which ones? If not, does anybody have an idea of how to fix it? Thank you very much, Daniela.

From jolyon.holdstock at ogt.co.uk Tue May 24 05:26:23 2005 From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock) Date: Tue May 24 05:19:08 2005 Subject: [Biojava-l] Scrolling in TranslatedSequencePanel Message-ID: <588D0DD225D05746B5D8CAE1BE971F3F5A8E63@EUCLID.internal.ogtip.com>

Hi, I am using a TranslatedSequencePanel (TSP) to display sequence with associated data and it all looks good (thank you BioJava). I want to add more tracks to the TSP, but the height of the rendered image then becomes bigger than the space it has to fit in. I have implemented scrolling along the sequence OK but cannot work out how to do this for the vertical view. If anybody has had some success with this I would be grateful for any help.
Many thanks, Jolyon Holdstock

From kalle.naslund at genpat.uu.se Tue May 24 07:17:52 2005 From: kalle.naslund at genpat.uu.se (Kalle Näslund) Date: Tue May 24 07:10:18 2005 Subject: [Biojava-l] Scrolling in TranslatedSequencePanel In-Reply-To: <588D0DD225D05746B5D8CAE1BE971F3F5A8E63@EUCLID.internal.ogtip.com> References: <588D0DD225D05746B5D8CAE1BE971F3F5A8E63@EUCLID.internal.ogtip.com> Message-ID: <42930D60.6040000@genpat.uu.se>

The simple approach I use is to put the TranslatedSequencePanel inside a JScrollPane. Using a JScrollPane is the common way to get scrolling in Swing. To get the behaviour most people want, you should most likely set the horizontal scroll bar policy to HORIZONTAL_SCROLLBAR_NEVER and the vertical scroll bar policy to VERTICAL_SCROLLBAR_AS_NEEDED. /Kalle

From kalle.naslund at genpat.uu.se Tue May 24 07:23:53 2005 From: kalle.naslund at genpat.uu.se (Kalle Näslund) Date: Tue May 24 07:13:51 2005 Subject: [Biojava-l] mouse selection of sequences possible? Message-ID: <42930EC9.4040405@genpat.uu.se>

Fuchs, Mathias wrote:
>Hi Kalle,
>Thanks a lot for your help. Unfortunately it seems there is a class "GUITools" missing. You are using it in the SelectionBarRenderer (GUITools.getVisibleRange( src, g )); can you tell me where to find that class? Is it in the BioJava
>framework? I couldn't find it there. I am using version 1.30 for jdk1.4.
>Thanks
>Mathias

The code I provided has only been tested with biojava 1.4 and a reasonably recent CVS source.
So for it to work you will have to use a biojava version that contains the new GUI improvements. A quick look in the webCVS system seems to imply that the class GUITools should exist in 1.3, as revision 1.3.2.1 is on the release-1_3-branch. But my recommendation would be to upgrade to biojava 1.4. /Kalle

-----Original Message----- From: Kalle Näslund [mailto:kalle.naslund@genpat.uu.se] Sent: Monday, May 16, 2005 10:36 AM To: Fuchs, Mathias Cc: 'biojava-l@biojava.org' Subject: Re: [Biojava-l] mouse selection of sequences possible?

I have just taken my code and put a BJ copyright notice at the top of each file. So just uncompress the file, copy the Java files to the appropriate location (for now I just put everything under org.biojava.bio.gui.sequence) and rebuild biojava. Then you can use the classes. The code can be found at http://beluga.genpat.uu.se/kalle/selection_code.tgz

From mark.schreiber at novartis.com Wed May 25 09:40:47 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Wed May 25 09:33:45 2005 Subject: [Biojava-l] Help me, please!! Message-ID:

Hello Daniela - I'm not sure I can immediately help, but can you tell us your biojava version and send an example program that shows the problem? We need to develop some more test cases for the DP package. I suspect there are some bugs that we need to find. - Mark

From hollandr at gis.a-star.edu.sg Tue May 31 23:02:04 2005 From: hollandr at gis.a-star.edu.sg (Richard HOLLAND) Date: Tue May 31 22:56:03 2005 Subject: [Biojava-l] Change Proposal regarding References Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601B94785@BIONIC.biopolis.one-north.com>

Hi all,

This is a two-pronged change proposal - first to allow BioJava to make correct use of the bioentry_dbxref tables in BioSQL, and second to allow it to parse reference information correctly from EMBL, Genbank, Genpept, GenXML, and SwissProt records and store it within Sequence objects in a consistent manner.

Currently, references are loaded from only some of the above formats. Depending on the format, they are stored in different ways within the Sequence object.

Genbank references are stored with each line of the record as a separate annotation, e.g. one annotation with a key saying REFERENCE and a value giving a location, another with a key saying AUTHOR and a value listing the authors, and so on. As simple String/String annotations, they get persisted to the bioentry_qualifier_value table in BioSQL.
As multiple references are read, they get stored with the same keys, so you end up with Annotations for these keys containing ArrayLists of potentially different lengths, depending on which of the original references had which optional fields included (e.g. PUBMED or MEDLINE). This makes it impossible to accurately reconstruct the original reference information when exporting the sequence to a file.

EMBL/Swissprot references do almost the same thing, except the parser here gathers up the various reference tags from the file and wraps each set in its own ReferenceAnnotation class, which is just a map which gets flattened out and persisted to bioentry_qualifier_value as String/String annotation pairs as above. When loaded back in from BioSQL the ReferenceAnnotation objects are not recreated, and you end up with the same ArrayList problem as above, leading to the same problem when trying to export the sequence to a file.

Another problem here is that the two approaches only understand their own methods when it comes to exporting references in their own file formats. So, the Genbank exporter cannot export references that were loaded from EMBL/Swissprot, and vice versa.

Not good!

So, I propose the following:

1) Change the file format parsers above to create, when reading sequences from file, an org.biojava.bibliography.BibRef object for each input reference. This object can then be stored against the Sequence as an annotation, with the key of BibRef.class. As with all other kinds of annotation, if multiple references are loaded then the value of the annotation should be an ArrayList of the various BibRef objects. If only one reference is loaded, then the value should be the single BibRef object itself.
2) Change the file format parsers above to understand, when writing sequences to file, how to convert BibRef annotations into their own formats.
3) There is no restriction on which of the established BibRef subtypes from org.biojava.bibliography.* you can actually use to annotate the sequence. Usually you'll be wanting a BiblioJournalArticle object. However, you MUST use certain fields as follows:
   a) use the 'identifier' field to store the PubMed or MedLine ID (purely the ID, not prefixed with anything).
   b) use the 'publisher' field to store a BiblioOrganisation object with name set to 'PUBMED' or 'MEDLINE' as appropriate (must be upper case - if not, it will get changed to upper case on persistence to BioSQL, so you might as well stick it in upper case to start with).
   c) use the 'type' field to store a TYPE_* value from BibRefSupport to indicate what sort of resource this reference refers to (in most cases you'll want TYPE_JOURNAL_ARTICLE).
4) Alter BioSQLSequenceDB.persistBioentryProperty() to check for annotations with the key of BibRef.class or any of its established subtypes as above, and use special behaviour to persist these to the bioentry_dbxref table (and related tables as appropriate).
5) Alter BioSQLSequenceAnnotation.initAnnotations() to check for and load the bioentry_dbxref data as BibRef.class annotations.

Any suggestions/changes/volunteers/violent objections? I can manage steps 4 and 5 myself quite easily, but will need help from everyone out there in updating the file parsers to use this proposed mechanism.

cheers, Richard

Richard Holland Bioinformatics Specialist Genome Institute of Singapore 60 Biopolis Street, #02-01 Genome, Singapore 138672 Tel: (65) 6478 8000 DID: (65) 6478 8199 Email: hollandr@gis.a-star.edu.sg
From mark.schreiber at novartis.com Tue May 31 23:18:08 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Tue May 31 23:12:32 2005 Subject: [Biojava-l] Change Proposal regarding References Message-ID:

I'd support this and might be able to help out with advice or words of encouragement (coffee at least) for the first few steps.

I would also encourage you to look into the rank column of the appropriate BioSQL tables. The rank column is intended to help preserve the order of comments, dbxrefs, references, qualifiers etc. so that when you dump something out in Genbank format you get everything in the same order it was read in. I'm not sure Biojava makes sensible use of rank columns at the moment.

- Mark

From hollandr at gis.a-star.edu.sg Tue May 31 23:42:30 2005 From: hollandr at gis.a-star.edu.sg (Richard HOLLAND) Date: Tue May 31 23:36:05 2005 Subject: [Biojava-l] Change Proposal regarding References Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601B94799@BIONIC.biopolis.one-north.com>

OK, I'll bear that in mind. Most annotations currently have rank implied by the order they were loaded, as the underlying class in the commonly-used SimpleAnnotation is a LinkedHashMap, which preserves order of iteration. We can use this property of LinkedHashMap to assign ranks as annotations pass into BioSQL. Retrieval will be slightly harder but not impossible - it will involve loading annotations of all kinds from the database into a temporary sorted map of rank->annotation, then creating the SimpleAnnotation object to be returned from the value set of this temporary map ordered by key. (BioSQLSequenceAnnotation will have to be changed to use SimpleAnnotation on retrieving data - currently it uses SmallAnnotation, which is not ordered.)

For sequences annotated with things other than SimpleAnnotation objects or their subtypes, you will find the annotations come back in a different order. However, I'm not sure if this is the case anywhere at present.

I should also point out that we should be using the 'bioentry_reference' and 'reference' tables, and not 'bioentry_dbxref' as I mistakenly mentioned in the original post.

Note that the 'identifier' and 'provider' fields in BibRef are optional and only for use when a PubMed/Medline etc. value has been specified in the original file. They will both be ignored by the BioSQL persistence layer if either is set to null.
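The rank round trip described above can be sketched roughly as follows, using only java.util collections and no BioJava or BioSQL classes. The RankedRow type and the toRows/fromRows helpers are hypothetical names introduced purely for illustration; they stand in for rows of a ranked BioSQL table and are not part of any existing API, and the sample keys and values are made up. Ranks are assigned from LinkedHashMap iteration order on the way in, and a TreeMap keyed by rank restores that order on the way out, even when the rows come back shuffled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RankRoundTrip {

    /** Hypothetical stand-in for one row of a ranked annotation table. */
    static final class RankedRow {
        final int rank;
        final String key;
        final String value;

        RankedRow(int rank, String key, String value) {
            this.rank = rank;
            this.key = key;
            this.value = value;
        }
    }

    /** Assigns ranks 1..n following the iteration order of the map. */
    static List<RankedRow> toRows(Map<String, String> annotation) {
        List<RankedRow> rows = new ArrayList<>();
        int rank = 1;
        for (Map.Entry<String, String> e : annotation.entrySet()) {
            rows.add(new RankedRow(rank++, e.getKey(), e.getValue()));
        }
        return rows;
    }

    /** Rebuilds an insertion-ordered map from rows that arrive in any order. */
    static Map<String, String> fromRows(List<RankedRow> rows) {
        // Temporary sorted map of rank -> row, as suggested in the message.
        TreeMap<Integer, RankedRow> byRank = new TreeMap<>();
        for (RankedRow row : rows) {
            byRank.put(row.rank, row);
        }
        Map<String, String> annotation = new LinkedHashMap<>();
        for (RankedRow row : byRank.values()) {
            annotation.put(row.key, row.value);
        }
        return annotation;
    }

    public static void main(String[] args) {
        Map<String, String> original = new LinkedHashMap<>();
        original.put("REFERENCE", "1 (bases 1 to 100)");
        original.put("AUTHORS", "Doe,J.");
        original.put("TITLE", "An example");

        List<RankedRow> rows = toRows(original);
        Collections.shuffle(rows); // simulate unordered retrieval from the database

        // Prints the keys in their original order: [REFERENCE, AUTHORS, TITLE]
        System.out.println(fromRows(rows).keySet());
    }
}
```

A plain HashMap in place of the LinkedHashMap would lose the ordering at both ends, which is the same concern raised above about SmallAnnotation not being ordered.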
cheers, Richard

Richard Holland Bioinformatics Specialist GIS extension 8199

From boehme at mpiib-berlin.mpg.de Wed May 4 05:50:29 2005 From: boehme at mpiib-berlin.mpg.de (Martina) Date: Mon Jun 13 13:20:55 2005 Subject: [Biojava-l] [BioSQL-l] hierachy of features In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D56019514BC@BIONIC.biopolis.one-north.com> References: <6D9E9B9DF347EF4385F6271C64FB8D56019514BC@BIONIC.biopolis.one-north.com> Message-ID: <42789C8A.8010708@mpiib-berlin.mpg.de>

I get different results on different runs. The hierarchy of features changes (not only the order) - with the same code. I'm using BioJava Live.
I do:

delete(db, "AF100928");
create(db, sequence);
retrieveSeq(db, "AF100928");

and get:

deleting AF100928
adding a sequence
1
retrieving AF100928
AF100928 contains 1 features
Source: ncbi Type: gen contains: 2
Second Level: Source: AIF Type: siRNA contains: 1
Second Level: Source: AIF 5`450 / AIF 3`682 Type: Amplicon contains: 0

OR, on some runs (at least one in 10) I get:

deleting AF100928
adding a sequence
1
retrieving AF100928
AF100928 contains 2 features
Source: 1 Type: AIF-1 contains: 0
Project : [2572/73, A]
Source: ncbi Type: gen contains: 8
Second Level: Source: AIF Type: siRNA contains: 0
Second Level: Source: AIF 5`450 / AIF 3`682 Type: Amplicon contains: 0
Second Level: Source: AIF Type: siRNA contains: 0
Second Level: Source: AIF 5`450 / AIF 3`682 Type: Amplicon contains: 0
Second Level: Source: AIF Type: siRNA contains: 0
Second Level: Source: AIF 5`450 / AIF 3`682 Type: Amplicon contains: 0
Second Level: Source: AIF Type: siRNA contains: 0
Second Level: Source: AIF 5`450 / AIF 3`682 Type: Amplicon contains: 0

Just in case, I attached the source. I can see only 4 features in the table seqfeature - but that might not be the right table to look into? Can anybody help me with this?

Thanks
Martina
-------------- next part --------------
import org.biojava.bio.BioException;
import org.biojava.bio.symbol.*;
import org.biojava.bio.*;
import org.biojava.bio.seq.*;
import org.biojava.utils.*;
import org.biojava.bio.seq.DNATools;
import org.biojava.bio.seq.Sequence;
import org.biojava.bio.seq.db.biosql.BioSQLSequenceDB;
import org.biojava.utils.ChangeVetoException;
import org.biojava.bio.seq.io.SeqIOTools;

import java.io.*;
import java.sql.SQLException;
import java.util.*;

/**
 * Tests a connection to a BioSQLSequenceDB and a simple Sequence write, read
 * and delete
 */
public class bioSqlTest {

    public static void main(String[] args) {
        // url format depends on your jdbc driver
        String dbURL = "jdbc:mysql://somnus/test";
        String dbUser = "xxxxxxx";
        String dbPass = "xxxxx";
        // we will connect to a biodatabase called test
        String biodatabase = "test";
        // or create one if it doesn't exist
        boolean createIfMissing = true;

        try {
            // load a JDBC driver
            Class.forName("com.mysql.jdbc.Driver");
        } catch (ClassNotFoundException ex) {
            System.out.println("Cannot find DB driver, is it on your classpath?");
        }

        try {
            // create a connection
            BioSQLSequenceDB db = new BioSQLSequenceDB(dbURL, dbUser, dbPass,
                    biodatabase, createIfMissing);
            String sequence = "agaggaaagggaaggaggaggtcccgaatagcggtcgccgaaatgttccggtgtggaggcctggcggcgggtgctttgaagcagaagctggtgcccttggtgcggaccgtgtgcgtccgaagcccgaggcagaggaaccggctcccaggcaacttgttccagcgatggcatgttcctctagaactccagatgacaagacaaatggctagctctggtgcatcagggggcaaaatcgataattctgtgttagtccttattgtgggcttatcaacagtaggagctggtgcctatgcctacaagactatgaaagaggacgaaaaaagatacaatgaaagaatttcagggttagggctgacaccagaacagaaacagaaaaaggccgcgttatctgcttcagaaggagaggaagttcctcaagacaaggcgccaagtcatgttcctttcctgctaattggtggaggcacagctgcttttgctgcagccagatccatccgggctcgggatcctggggccagggtactgattgtatctgaagatcctgagctgccgtacatgcgacctcctctttcaaaagaactgtggttttcagatgacccaaatgtcacaaagacactgcgattcaaacagtggaatggaaaagagagaagcatatatttccagccaccttctttctatgtctctgctcaggacctgcctcatattgagaatggtggtgtggctgtcctcactgggaagaaggtagtacagctggatgtgagagacaacatggtgaaacttaatgatggctctcaaataacctatgaaaagtgcttgattgcaacaggaggtactccaagaagtctgtctgccattgatagggctggagcagaggtgaagagtagaacaacgcttttcagaaagattggagactttagaagcttggagaagatttcacgggaagtcaaatcaattacgattatcggtgggggcttccttggtagcgaactggcctgtgctcttggcagaaaggctcgagccttgggcacagaagtgattcaactcttccccgagaaaggaaatatgggaaagatcctccccgaatacctcagcaactggaccatggaaaaagtcagacgagagggggttaaggtgatgcccaatgctattgtgcaatccgttggagtcagcagtggcaagttacttatcaagctgaaagacggcaggaaggtagaaactgaccacatagtggcagctgtgggcctggagcccaatgttgagttggccaagactggtggcctggaaatagactcagattttggtggcttccgggtaaatgcagagct"
                    + "acaagcacgctctaacatctgggtggcaggagatgctgcatgcttctacgatataaagttgggaaggaggcgggtagagcaccatgatcacgctgttgtgagtggaagattggctggagaaaatatgactggagctgctaagccgtactggcatcagtcaatgttctggagtgatttgggccccgatgttggctatgaagctattggtcttgtggacagtagtttgcccacagttggtgtttttgcaaaagcaactgcacaagacaaccccaaatctgccacagagcagtcaggaactggtatccgatcagagagtgagacagagtccgaggcctcagaaattactattcctcccagcaccccggcagttccacaggctcccgtccagggggaggactacggcaaaggtgtcatcttctacctcagggacaaagtggtcgtggggattgtgctatggaacatctttaaccgaatgccaatagcaaggaagatcattaaggacggtgagcagcatgaagatctcaatgaagtagccaaactattcaacattcatgaagactgaagccccacagtggaattggcaa";
            sequence = sequence.replace(" ", "");

            delete(db, "AF100928");
            create(db, sequence);
            retrieveSeq(db, "AF100928");
        // } catch (ChangeVetoException ex) {
        //     System.err.println("Cannot add Sequence, is the DB locked?");
        //     System.exit(1);
        } catch (Exception ex) {
            ex.printStackTrace();
            System.exit(1);
        }
    }

    private static void create(BioSQLSequenceDB db, String sequence)
            throws IllegalSymbolException, IllegalArgumentException,
            ChangeVetoException, IndexOutOfBoundsException, BioException {
        Sequence seq = DNATools.createDNASequence(sequence, "AF100928");

        // add annotation
        seq.getAnnotation().setProperty("Lab-Genname", "AIF");
        seq.getAnnotation().setProperty("gi", "4323586");
        seq.getAnnotation().setProperty("genename", "apoptosis-inducing factor");
        seq.getAnnotation().setProperty("organism", "Homo sapiens");
        seq.getAnnotation().setProperty("protein-id", "AAD16436.1");

        // fill the template
        Feature.Template templSeq = new Feature.Template();
        templSeq.source = "ncbi";
        // templSeq.strand = StrandedFeature.UNKNOWN;
        templSeq.type = "gen";
        templSeq.location = Location.empty;
        Feature seqF = seq.createFeature(templSeq);

        // create siRNA
        StrandedFeature.Template templSt = new StrandedFeature.Template();
        templSt.location = new RangeLocation(201, 221);
        templSt.strand = StrandedFeature.POSITIVE;
        templSt.source = "AIF"; // =>id
        templSt.type = "siRNA"; // or shRNA or cell line?
        Annotation annoSiRNA = new SimpleAnnotation();
        annoSiRNA.setProperty("abbreviation", "AIF");
        annoSiRNA.setProperty("no", 3);
        templSt.annotation = annoSiRNA;
        Feature sf = seqF.createFeature(templSt);

        // create charge (batch)
        Feature.Template charge = new Feature.Template();
        charge.type = "AIF-1"; // name as it appears in the siRNA list
        charge.source = "1"; // first synthesis
        charge.location = Location.empty;
        Annotation chargeAnno = new SmallAnnotation();
        chargeAnno.setProperty("Batchno", "2572/73");
        chargeAnno.setProperty("Project", "A");
        charge.annotation = chargeAnno;
        Feature cf = sf.createFeature(charge);

        // create Amplicon
        StrandedFeature.Template templ = new StrandedFeature.Template();
        templ.location = new RangeLocation(72, 92);
        templ.source = "AIF 5`450 / AIF 3`682";
        templ.type = "Amplicon";
        templ.strand = StrandedFeature.UNKNOWN;
        Annotation annoAmplicon = new SimpleAnnotation();
        annoAmplicon.setProperty("Provider", "Invitrogen");
        annoAmplicon.setProperty("designed by", "Chris");
        annoAmplicon.setProperty("Stock-Box/Position", "rtS-A3/A5 + rtS-A3/A6");
        annoAmplicon.setProperty("Arbeitslösung", "rtArb-A1/C3");
        // templ.annotation = annoAmplicon;
        Feature af = seqF.createFeature(templ);

        System.out.println("adding a sequence");
        System.out.println(seq.countFeatures());
        db.addSequence(seq);
        // seq = null; // cannot remove unless there are no references to the sequence
    }

    private static void delete(BioSQLSequenceDB db, String name) {
        try {
            // delete the record
            System.out.println("deleting " + name);
            db.removeSequence(name);
        } catch (Exception ex) {
            System.err.println("Cannot remove " + name + " is the DB locked?");
        }
    }

    private static void retrieveSeq(BioSQLSequenceDB db, String seqName)
            throws BioException, NoSuchElementException {
        Sequence seq;
        System.out.println("retrieving " + seqName);
        seq = db.getSequence(seqName);
        System.out.println(seq.getName() + " contains " + seq.countFeatures()
                + " features");
        for (Iterator i = seq.features(); i.hasNext();) {
            Feature f = (Feature) i.next();
            /* Print only 'toplevel' features */
            System.out.println("Source: " + f.getSource() + " Type: "
                    + f.getType() + " contains: " + f.countFeatures());
            Annotation anno = f.getAnnotation();
            // print each key value pair
            for (Iterator it = anno.keys().iterator(); it.hasNext();) {
                Object key = it.next();
                System.out.println(key + " : " + anno.getProperty(key));
            }
            for (Iterator it = f.features(); it.hasNext();) {
                Feature fs = (Feature) it.next();
                // print second level features
                System.out.println("Second Level: Source: " + fs.getSource()
                        + " Type: " + fs.getType() + " contains: "
                        + fs.countFeatures());
            }
        }
    }
}