From heatkent at gmail.com Wed Mar 1 16:15:20 2006
From: heatkent at gmail.com (Heather Kent)
Date: Thu Mar 2 01:55:49 2006
Subject: [Biojava-l] problems with SubIntegerAlphabet
Message-ID:

Hi,

I'm currently having a problem with the IntegerAlphabet.SubIntegerAlphabet class. When I call the seqString() method from AbstractSymbolList I get an error, a NoSuchElementException ("parser not supported by Integer Alphabet yet"), in the getTokenization method of my SubIntegerAlphabet class.

The call from seqString() to getTokenization passes "default" as the tokenization name. The getTokenization method of the IntegerAlphabet class accepts both "token" and "default", but the SubIntegerAlphabet class accepts only "token".

Can anyone help me find a way around this when I'm working with SubIntegerAlphabets?

Thanks,
Heather

From mark.schreiber at novartis.com Thu Mar 2 04:09:08 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Thu Mar 2 04:04:33 2006
Subject: [Biojava-l] problems with SubIntegerAlphabet
Message-ID:

Hi -

The actual code in the seqString() method gets the "default" tokenizer from the parent Alphabet (SubIntegerAlphabet in this case) and asks it to tokenize the SymbolList. It is a bug that SubIntegerAlphabet doesn't have a "default" tokenization. However, if you use the code from SimpleSymbolList's seqString() method as an example, you can do the equivalent operation "manually" as a workaround, using "token" instead of "default".

Let me know if you have problems... I will also fix this bug in CVS shortly.

- Mark

_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l

From mark.schreiber at novartis.com Thu Mar 2 06:48:05 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Thu Mar 2 06:43:54 2006
Subject: [Biojava-l] problems with SubIntegerAlphabet
Message-ID:

Fixed in CVS now

- Mark
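The workaround discussed in this thread (asking for the "token" tokenization when "default" is not registered) can be sketched without BioJava on the classpath. Everything in the block below is a hypothetical stand-in: ToyAlphabet, Tokenizer and joinWithSpaces are invented for illustration and are not BioJava classes. Only the method name getTokenization and the names "default" and "token" come from the messages above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

class TokenizationFallback {

    /** Hypothetical stand-in for a tokenization: renders integer symbols as text. */
    interface Tokenizer {
        String tokenize(int[] symbols);
    }

    /** Hypothetical stand-in for an alphabet that registers only some tokenization names. */
    static class ToyAlphabet {
        private final Map<String, Tokenizer> byName = new HashMap<>();

        void register(String name, Tokenizer tokenizer) {
            byName.put(name, tokenizer);
        }

        Tokenizer getTokenization(String name) {
            Tokenizer tokenizer = byName.get(name);
            if (tokenizer == null) {
                throw new NoSuchElementException("No tokenization named '" + name + "'");
            }
            return tokenizer;
        }
    }

    /** Renders symbols separated by single spaces. */
    static String joinWithSpaces(int[] symbols) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < symbols.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(symbols[i]);
        }
        return sb.toString();
    }

    /** Tries "default" first and falls back to "token" if it is not registered. */
    static String seqString(ToyAlphabet alphabet, int[] symbols) {
        Tokenizer tokenizer;
        try {
            tokenizer = alphabet.getTokenization("default");
        } catch (NoSuchElementException e) {
            tokenizer = alphabet.getTokenization("token");
        }
        return tokenizer.tokenize(symbols);
    }

    public static void main(String[] args) {
        // Mimics the reported situation: only "token" is registered.
        ToyAlphabet subAlphabet = new ToyAlphabet();
        subAlphabet.register("token", TokenizationFallback::joinWithSpaces);
        System.out.println(seqString(subAlphabet, new int[] {3, 1, 4}));
    }
}
```

With a real SubIntegerAlphabet the same idea would presumably mean requesting getTokenization("token") directly, as suggested in the reply, rather than relying on seqString().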
From admin at unleashedinformatics.com Thu Mar 2 18:14:22 2006
From: admin at unleashedinformatics.com (Unleashed Informatics Administration)
Date: Thu Mar 2 18:09:28 2006
Subject: [Biojava-l] SeqHound User Support
Message-ID: <44077C4E.9010500@unleashedinformatics.com>

As announced on 2 March 2006, SeqHound has been replaced by "DogBox Online". From 3 April 2006, users of the SeqHound API will be required to provide their e-mail address when beginning a block of SeqHound (now DogBox Online) calls. Users who fail to provide a valid address will not have access to the API, and will not have access to user support. The following FAQ has been posted to the DogBox Online website.

FAQ:

Q. What is DogBox Online?
DogBox Online is a powerful integrated data service for the life science community, and represents the new name for the SeqHound service offered by The Blueprint Initiative. The new service is located at: http://dogboxonline.unleashedinformatics.com

Q. What happened to SeqHound?
The SeqHound service is being phased out. Please change to DogBox Online.

Q. How will DogBox Online differ?
DogBox Online includes several new features previously available only to DogBox customers.

Q. Will my use of the SeqHound API be affected?
Yes. On 3 April 2006, Unleashed Informatics will require SeqHound users to submit a valid e-mail address as part of the initial SHoundInit call. If you were not using a SHoundInit call or similar in your code, ensure that you now begin your scripts with this call.

Q. Can you give me an example?
Using the Perl API, you need to begin your series of SeqHound queries like so:

> use SeqHound;
> SHoundInit('Program Name');

Replace the 'Program Name' text with your valid e-mail address:

> use SeqHound;
> SHoundInit('joe.bloggs@blogme.com');

To avoid disappointment, we recommend you change your scripts now to employ the new URL, http://dogboxonline.unleashedinformatics.com.

Q. What happens if I don't provide a valid email address?
Your use of the API will be blocked until you do so.

Q. Why is this change happening?
1. To obtain feedback from users regarding API use and improvement.
2. To notify users of future developments and new features.

Q. Where do I sign up?
https://secure.unleashedinformatics.com/index.php?pg=support.register

Thank you for your co-operation.

From admin at unleashedinformatics.com Thu Mar 2 18:09:05 2006
From: admin at unleashedinformatics.com (Unleashed Informatics Administration)
Date: Thu Mar 2 20:04:09 2006
Subject: [Biojava-l] Unleashed Informatics Supports DogBox Online Community
Message-ID: <44077B11.7090608@unleashedinformatics.com>

In December 2005, Unleashed Informatics acquired commercial rights to Blueprint Initiative intellectual property from Mount Sinai Hospital. Spun-off from The Blueprint Initiative public research program at Toronto's Mount Sinai Hospital, Unleashed Informatics provides integrated hardware and software products designed to harness the power of increasingly complex scientific data.

On 22 February, Unleashed Informatics released DogBox Online as an open access product to the life science community. DogBox Online is an integrated, online data retrieval service and represents the new, re-named SeqHound service previously offered by the Blueprint Initiative.

The new service is located at http://dogboxonline.unleashedinformatics.com, and requires a free Unleashed Informatics account for unrestricted access.
The DogBox Online registration process will help Unleashed Informatics better understand the resource user base, and ultimately help us improve our open access offerings in line with the needs of the life sciences community. Importantly, the collection of such user feedback is essential for the preparation of planned public good research grant applications aimed at funding the ongoing provision of open source and freely available bioinformatics resources.

Unleashed Informatics is making a concerted effort to develop, maintain and improve open access resources for global researchers. The release this past week of the freely accessible DogBox Online reaffirms the company's commitment to open access resources.

Specific support documentation for the new DogBox Online service can be found in the Help section.

From wendy.wong at gmail.com Fri Mar 3 09:28:18 2006
From: wendy.wong at gmail.com (wendy wong)
Date: Fri Mar 3 22:35:43 2006
Subject: [Biojava-l] odds ratio
Message-ID:

Hi,

I am trying to set up an HMM with a few states. I have a background state with a background distribution. For the rest of the states, I set up the distribution and then use setNullModel to set the null distribution to the background distribution. Am I doing the right thing?

The reason I am asking is that when I tried using the forward-backward algorithm, the score of the state that I am interested in is greater than 2000 at each site. I would expect some sites to have a number less than 1 to indicate that they are more likely to be in the null distribution, or am I doing something totally wrong here?

thanks,
wendy

From mark.schreiber at novartis.com Sun Mar 5 20:24:00 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Sun Mar 5 20:19:05 2006
Subject: [Biojava-l] odds ratio
Message-ID:

>Hi,
>
>I am trying to set up an HMM with a few states. I have a background
>state with background distribution.
>For the rest of the states, I set
>up the distribution and then use the setNullModel to set the null
>distribution to the background distribution. Am I doing the right
>thing?

Sounds correct.

>The reason why I am asking is when I tried using the forward backward
>algorithm, the scores of the state that I am interested in at each
>site is greater than 2000. I would expect some sites to have a number
>less than 1 to indicate that it is more likely to be in the null
>distribution, or am I doing something totally wrong here?

If you used an odds scoring function then you have done things correctly. Sounds weird.

- Mark

From wendy.wong at gmail.com Mon Mar 6 06:16:46 2006
From: wendy.wong at gmail.com (wendy wong)
Date: Mon Mar 6 06:19:04 2006
Subject: [Biojava-l] odds ratio
In-Reply-To:
References:
Message-ID:

Hi,

here is what I have done:

First, I set up the distribution for a state, then do

    siteDistribution.setNullModel(nullDistribution);

to set up the null distribution, then use this function to get the odds ratio of my C3 state (the one I am interested in). Maybe I can try getting the odds ratio for the background state and see what numbers I get?

    public static void getC3PosteriorOdds(MarkovModel model, SymbolList obsSymList)
            throws IllegalArgumentException, BioException {
        DP dp = DPFactory.DEFAULT.createDP(model);
        SymbolList[] obs_array = {obsSymList};
        DPMatrix dpMatrix = dp.forwardsBackwards(obs_array, ScoreType.ODDS);
        State[] states = dp.getStates();

        //find the C3 site
        int c3Index = -1;
        for (int s = 0; s < dp.getDotStatesIndex(); s++) {
            if (states[s].getName().equalsIgnoreCase("C.3")) {
                c3Index = s;
                break;
            }
        }

        for (int i = 0; i < obsSymList.length(); i++) {
            int[] array = { c3Index, i };
            log.debug(i+1 + " " + dpMatrix.getCell(array));
        }
    }

thanks,
wendy
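For reference, the "less than 1" expectation in this thread applies to a raw odds ratio, that is P(symbol | state) divided by P(symbol | null model). If scores are kept in log space instead, the equivalent threshold is 0, and values accumulated over many positions can grow large in either direction. The sketch below is plain Java with made-up distributions; it does not use BioJava, and whether the DPMatrix cells in the code above hold raw or log values is not stated in the thread, so that remains an open assumption.

```java
import java.util.HashMap;
import java.util.Map;

class OddsRatioSketch {

    /** Raw odds: emission probability under the state divided by that under the null model. */
    static double odds(Map<Character, Double> stateDist,
                       Map<Character, Double> nullDist,
                       char symbol) {
        // Both maps are assumed to contain the symbol; a missing key would cause a NullPointerException.
        return stateDist.get(symbol) / nullDist.get(symbol);
    }

    /** Natural-log odds: positive favours the state, negative favours the null model. */
    static double logOdds(Map<Character, Double> stateDist,
                          Map<Character, Double> nullDist,
                          char symbol) {
        return Math.log(odds(stateDist, nullDist, symbol));
    }

    public static void main(String[] args) {
        // Hypothetical emission distribution for a state of interest.
        Map<Character, Double> site = new HashMap<>();
        site.put('a', 0.70);
        site.put('c', 0.10);
        site.put('g', 0.10);
        site.put('t', 0.10);

        // Hypothetical uniform background used as the null model.
        Map<Character, Double> background = new HashMap<>();
        for (char sym : new char[] {'a', 'c', 'g', 't'}) {
            background.put(sym, 0.25);
        }

        for (char sym : new char[] {'a', 'c'}) {
            System.out.printf("%c odds=%.3f logOdds=%.3f%n",
                    sym, odds(site, background, sym), logOdds(site, background, sym));
        }
    }
}
```

Comparing the same quantity for the background state, as suggested in the message above, is a reasonable sanity check: against its own null model every symbol should give odds of 1 (log-odds of 0).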
From jolyon.holdstock at ogt.co.uk Wed Mar 8 05:47:07 2006
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Wed Mar 8 07:11:05 2006
Subject: [Biojava-l] BiojavaX EmblFormat
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3FA7AEC7@EUCLID.internal.ogtip.com>

Hi,

I am using the new format parsers in BioJavaX. GenbankFormat is great, but I am having some trouble with the EMBLFormat class. I have downloaded a sequence file (ID:U00096) from the EBI in EMBL format but I don't believe it is parsing properly.

My code is as follows:

    String fileName = "path to file";
    try {
        RichSequenceIterator rsi = RichSequence.IOTools.readEMBLDNA(
                new BufferedReader(new FileReader(fileName)), null);
        while (rsi.hasNext()) {
            RichSequence seq = rsi.nextRichSequence();
            System.out.println(seq.getURN());
            System.out.println(seq.length());
            System.out.println(seq.getAccession());
        }
    } catch (IOException IOE) {
        System.out.println("BioJava IOException " + IOE);
    } catch (BioException BIOE) {
        System.out.println("BioJavaX BioException " + BIOE);
        BIOE.printStackTrace();
    }

The BioJava parser will read it.

    seq = SeqIOTools.readEmbl(new BufferedReader(new FileReader(fileName))).nextSequence(); //works

I checked the web CVS and the EMBLFormat class is 3 months old so I am using the most recent version.
I have pasted a snippet of the sequence file that retains the problems below.

The errors are:

The ID line isn't parsed because of 'genomic' being there - deleting it removes the problem

org.biojava.bio.BioException: Could not read sequence
Caused by: org.biojava.bio.seq.io.ParseException: Bad ID line found: U00096 standard; circular genomic DNA; PRO; 4639675 BP.

ID U00096 standard; circular genomic DNA; PRO; 4639675 BP. //fails
ID U00096 standard; circular DNA; PRO; 4639675 BP. //works

There is a problem with the RX tag which fails with output:

org.biojava.bio.BioException: Could not read sequence
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
    at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:352)

Replacing

RX DOI; 10.1126/science.277.5331.1453.

with

XX
RX DOI; 10.1126/science.277.5331.1453.

removes the error.

There is an error with parsing the authors

org.biojava.bio.BioException: Could not read sequence
Caused by: java.lang.IllegalArgumentException: Authors string cannot be null
    at org.biojavax.DocRefAuthor$Tools.parseAuthorString(DocRefAuthor.java:75)
    at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:395)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:100)

I am looking at the code trying to see where the problems are but suspect that it may be beyond me. So if anybody has some experience of this I would welcome their input.

Thanks,

Jolyon

This is a snippet of the sequence file that reproduces the errors in my hands.

ID U00096 standard; circular genomic DNA; PRO; 4639675 BP.
XX
AC U00096; AE000111-AE000510;
XX
SV U00096.2
XX
DT 23-FEB-2006 (Rel. 86, Created)
DT 06-MAR-2006 (Rel. 87, Last updated, Version 3)
XX
DE Escherichia coli K-12 MG1655, complete genome.
XX
KW .
XX
OS Escherichia coli K12
OC Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;
OC Enterobacteriaceae; Escherichia.
XX
RN [1]
RP 1-4639675
RX DOI; 10.1126/science.277.5331.1453.
RX PUBMED; 9278503.
RA Blattner F.R., Plunkett G., Bloch C.A., Perna N.T., Burland V., Riley M.,
RA Collado-Vides J., Glasner J.D., Rode C.K., Mayhew G.F., Gregor J.,
RA Davis N.W., Kirkpatrick H.A., Goeden M.A., Rose D.J., Mau B., Shao Y.;
RT "The complete genome sequence of Escherichia coli K-12";
RL Science 277(5331):1453-1474(1997).
XX
RN [2]
RP 1-4639675
RX DOI; 10.1093/nar/gkj150.
RX PUBMED; 16397293.
RA Riley M., Abe T., Arnaud M.B., Berlyn M.K., Blattner F.R., Chaudhuri R.R.,
RA Glasner J.D., Horiuchi T., Keseler I.M., Kosuge T., Mori H., Perna N.T.,
RA Plunkett G. III, Rudd K.E., Serres M.H., Thomas G.H., Thomson N.R.,
RA Wishart D., Wanner B.L.;
RT "Escherichia coli K-12: a cooperatively developed annotation
RT snapshot--2005";
RL (er) Nucleic Acids Res. 34 (1), 1-9 (2006)
XX
RN [3]
RC Woods Hole, Mass., on 14-18 November 2003 (sequence corrections)
RP 1-4639675
RA Arnaud M., Berlyn M.K.B., Blattner F.R., Galperin M.Y., Glasner J.D.,
RA Horiuchi T., Kosuge T., Mori H., Perna N.T., Plunkett G. III, Riley M.,
RA Rudd K.E., Serres M.H., Thomas G.H., Wanner B.L.;
RT "Workshop on Annotation of Escherichia coli K-12";
RL Unpublished.
XX
RN [4]
RC ASAP download 10 June 2004 (annotation updates)
RP 1-4639675
RA Glasner J.D., Perna N.T., Plunkett G. III, Anderson B.D., Bockhorst J.,
RA Hu J.C., Riley M., Rudd K.E., Serres M.H.;
RT "ASAP: Escherichia coli K-12 strain MG1655 version m56";
RL Unpublished.
XX
RN [5]
RC GenBank accessions AG613214 to AG613378 (sequence corrections)
RP 1-4639675
RA Hayashi K., Morooka N., Mori H., Horiuchi T.;
RT "A more accurate sequence comparison between genomes of Escherichia coli
RT K12 W3110 and MG1655 strains";
RL Unpublished.
XX
RN [6]
RC GenBank accession AY605712 (sequence corrections)
RP 1-4639675
RA Perna N.T.;
RT "Escherichia coli K-12 MG1655 yqiK-rfaE intergenic region, genomic sequence
RT correction";
RL Unpublished.
XX
RN [7]
RP 1-4639675
RA Rudd K.E.;
RT "A manual approach to accurate translation start site annotation: an E.
RT coli K-12 case study";
RL Unpublished.
XX
RN [8]
RP 1-4639675
RA Blattner F.R., Plunkett G. III.;
RT ;
RL Submitted (16-JAN-1997) to the EMBL/GenBank/DDBJ databases.
RL Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison,
RL WI 53706-1580, USA
XX
RN [9]
RP 1-4639675
RA Blattner F.R., Plunkett G. III.;
RT ;
RL Submitted (02-SEP-1997) to the EMBL/GenBank/DDBJ databases.
RL Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison,
RL WI 53706-1580, USA
XX
RN [10]
RP 1-4639675
RA Plunkett G. III.;
RT ;
RL Submitted (13-OCT-1998) to the EMBL/GenBank/DDBJ databases.
RL Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison,
RL WI 53706-1580, USA
XX
RN [11]
RC Sequence update by submitter
RP 1-4639675
RA Plunkett G. III.;
RT ;
RL Submitted (10-JUN-2004) to the EMBL/GenBank/DDBJ databases.
RL Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison,
RL WI 53706-1580, USA
XX
RN [12]
RC Protein updates by submitter
RP 1-4639675
RA Plunkett G. III.;
RT ;
RL Submitted (07-FEB-2006) to the EMBL/GenBank/DDBJ databases.
RL Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison,
RL WI 53706-1580, USA
XX
DR EMBL-TPA; BR000242.
XX FH Key Location/Qualifiers FH FT source 1..4639675 FT /organism="Escherichia coli K12" FT /strain="K-12" FT /sub_strain="MG1655" FT /mol_type="genomic DNA" FT /db_xref="taxon:83333" FT gene 190..255 FT /gene="thrL" FT /locus_tag="b0001" FT /note="synonyms: ECK0001, JW4367" FT CDS 190..255 FT /codon_start=1 FT /transl_table=11 FT /gene="thrL" FT /locus_tag="b0001" FT /product="thr operon leader peptide" FT /function="1.5.1.8 metabolism; building block biosynthesis; FT amino acids; threonine" FT /function="leader; Amino acid biosynthesis: Threonine" FT /note="go_process: threonine biosynthesis [goid 0009088]" FT /protein_id="AAC73112.1" FT /translation="MKRISTTITTTITITTGNGAG" FT gene 337..2799 FT /gene="thrA" FT /locus_tag="b0002" FT /note="synonyms: Hs, thrD, ECK0002, JW0001" FT CDS 337..2799 FT /codon_start=1 FT /transl_table=11 FT /gene="thrA" FT /locus_tag="b0002" FT /product="fused aspartokinase I and homoserine FT dehydrogenase I" FT /function="1.5.1.21 metabolism; building block FT biosynthesis; amino acids; homoserine" FT /function="1.5.1.8 metabolism; building block biosynthesis; FT amino acids; threonine" FT /function="7.1 location of gene products; cytoplasm" FT /function="enzyme; Amino acid biosynthesis: Threonine" FT /EC_number="1.1.1.3" FT /EC_number="2.7.2.4" FT /note="bifunctional: aspartokinase I (N-terminal); FT homoserine dehydrogenase I (C-terminal); go_component: FT cytoplasm [goid 0005737]; go_process: threonine FT biosynthesis [goid 0009088]; go_process: homoserine FT biosynthesis [goid 0009090]" FT /protein_id="AAC73113.1" FT /translation="MRVLKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITN FT HLVAMIEKTISGQDALPNISDAERIFAELLTGLAAAQPGFPLAQLKTFVDQEFAQIKHV FT LHGISLLGQCPDSINAALICRGEKMSIAIMAGVLEARGHNVTVIDPVEKLLAVGHYLES FT TVDIAESTRRIAASRIPADHMVLMAGFTAGNEKGELVVLGRNGSDYSAAVLAACLRADC FT CEIWTDVDGVYTCDPRQVPDARLLKSMSYQEAMELSYFGAKVLHPRTITPIAQFQIPCL FT IKNTGNPQAPGTLIGASRDEDELPVKGISNLNNMAMFSVSGPGMKGMVGMAARVFAAMS FT 
RARISVVLITQSSSEYSISFCVPQSDCVRAERAMQEEFYLELKEGLLEPLAVTERLAII FT SVVGDGMRTLRGISAKFFAALARANINIVAIAQGSSERSISVVVNNDDATTGVRVTHQM FT LFNTDQVIEVFVIGVGGVGGALLEQLKRQQSWLKNKHIDLRVCGVANSKALLTNVHGLN FT LENWQEELAQAKEPFNLGRLIRLVKEYHLLNPVIVDCTSSQAVADQYADFLREGFHVVT FT PNKKANTSSMDYYHQLRYAAEKSRRKFLYDTNVGAGLPVIENLQNLLNAGDELMKFSGI FT LSGSLSYIFGKLDEGMSFSEATTLAREMGYTEPDPRDDLSGMDVARKLLILARETGREL FT ELADIEIEPVLPAEFNAEGDVAAFMANLSQLDDLFAARVAKARDEGKVLRYVGNIDEDG FT VCRVKIAEVDGNDPLFKVKNGENALAFYSHYYQPLPLVLRGYGAGNDVTAAGVFADLLR FT TLSWKLGV" FT gene 2801..3733 FT /gene="thrB" FT /locus_tag="b0003" FT /note="synonyms: ECK0003, JW0002" FT CDS 2801..3733 FT /codon_start=1 FT /transl_table=11 FT /gene="thrB" FT /locus_tag="b0003" FT /product="homoserine kinase" FT /function="1.5.1.8 metabolism; building block biosynthesis; FT amino acids; threonine" FT /function="7.1 location of gene products; cytoplasm" FT /function="enzyme; Amino acid biosynthesis: Threonine" FT /EC_number="2.7.1.39" FT /note="go_component: cytoplasm [goid 0005737]; go_process: FT threonine biosynthesis [goid 0009088]" FT /protein_id="AAC73114.1" FT /translation="MVKVYAPASSANMSVGFDVLGAAVTPVDGALLGDVVTVEAAETFS FT LNNLGRFADKLPSEPRENIVYQCWERFCQELGKQIPVAMTLEKNMPIGSGLGSSACSVV FT AALMAMNEHCGKPLNDTRLLALMGELEGRISGSIHYDNVAPCFLGGMQLMIEENDIISQ FT QVPGFDEWLWVLAYPGIKVSTAEARAILPAQYRRQDCIAHGRHLAGFIHACYSRQPELA FT AKLMKDVIAEPYRERLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDKPETAQRVAD FT WLGKNYLQNQEGFVHICRLDTAGARVLEN" FT gene 3734..5020 FT /gene="thrC" FT /locus_tag="b0004" FT /note="synonyms: ECK0004, JW0003" FT CDS 3734..5020 FT /codon_start=1 FT /transl_table=11 FT /gene="thrC" FT /locus_tag="b0004" FT /product="threonine synthase" FT /function="1.5.1.8 metabolism; building block biosynthesis; FT amino acids; threonine" FT /function="7.1 location of gene products; cytoplasm" FT /function="enzyme; Amino acid biosynthesis: Threonine" FT /EC_number="4.2.3.1" FT /note="go_component: cytoplasm [goid 0005737]; go_process: FT threonine biosynthesis [goid 0009088]" FT 
/protein_id="AAC73115.1" FT /translation="MKLYNLKDHNEQVSFAQAVTQGLGKNQGLFFPHDLPEFSLTEIDE FT MLKLDFVTRSAKILSAFIGDEIPQEILEERVRAAFAFPAPVANVESDVGCLELFHGPTL FT AFKDFGGRFMAQMLTHIAGDKPVTILTATSGDTGAAVAHAFYGLPNVKVVILYPRGKIS FT PLQEKLFCTLGGNIETVAIDGDFDACQALVKQAFDDEELKVALGLNSANSINISRLLAQ FT ICYYFEAVAQLPQETRNQLVVSVPSGNFGDLTAGLLAKSLGLPVKRFIAATNVNDTVPR FT FLHDGQWSPKATQATLSNAMDVSQPNNWPRVEELFRRKIWQLKELGYAAVDDETTQQTM FT RELKELGYTSEPHAAVAYRALRDQLNPGEYGLFLGTAHPAKFKESVEAILGETLDLPKE FT LAERADLPLLSHNLPADFAALRKLMMNHQ" XX SQ Sequence 4639675 BP; 1142228 A; 1179554 C; 1176923 G; 1140970 T; 0 other; agcttttcat tctgactgca acgggcaata tgtctctgtg tggattaaaa aaagagtgtc 60 tgatagcagc ttctgaactg gttacctgcc gtgagtaaat taaaatttta ttgacttagg 120 tcactaaata ctttaaccaa tataggcata gcgcacagac agataaaaat tacagagtac 180 acaacatcca tgaaacgcat tagcaccacc attaccacca ccatcaccat taccacaggt 240 aacggtgcgg gctgacgcgt acaggaaaca cagaaaaaag cccgcacctg acagtgcggg 300 cttttttttt cgaccaaagg taacgaggta acaaccatgc gagtgttgaa gttcggcggt 360 acatcagtgg caaatgcaga acgttttctg cgtgttgccg atattctgga aagcaatgcc 420 aggcaggggc aggtggccac cgtcctctct gcccccgcca aaatcaccaa ccacctggtg 480 gcgatgattg aaaaaaccat tagcggccag gatgctttac ccaatatcag cgatgccgaa 540 cgtatttttg ccgaactttt gacgggactc gccgccgccc agccggggtt cccgctggcg 600 caattgaaaa ctttcgtcga tcaggaattt gcccaaataa aacatgtcct gcatggcatt 660 agtttgttgg ggcagtgccc ggatagcatc aacgctgcgc tgatttgccg tggcgagaaa 720 atgtcgatcg ccattatggc cggcgtatta gaagcgcgcg gtcacaacgt tactgttatc 780 gatccggtcg aaaaactgct ggcagtgggg cattacctcg aatctaccgt cgatattgct 840 gagtccaccc gccgtattgc ggcaagccgc attccggctg atcacatggt gctgatggca 900 ggtttcaccg ccggtaatga aaaaggcgaa ctggtggtgc ttggacgcaa cggttccgac 960 tactctgctg cggtgctggc tgcctgttta cgcgccgatt gttgcgagat ttggacggac 1020 gttgacgggg tctatacctg cgacccgcgt caggtgcccg atgcgaggtt gttgaagtcg 1080 atgtcctacc aggaagcgat ggagctttcc tacttcggcg ctaaagttct tcacccccgc 1140 accattaccc ccatcgccca gttccagatc ccttgcctga ttaaaaatac cggaaatcct 1200 caagcaccag 
gtacgctcat tggtgccagc cgtgatgaag acgaattacc ggtcaagggc 1260 atttccaatc tgaataacat ggcaatgttc agcgtttctg gtccggggat gaaagggatg 1320 gtcggcatgg cggcgcgcgt ctttgcagcg atgtcacgcg cccgtatttc cgtggtgctg 1380 attacgcaat catcttccga atacagcatc agtttctgcg ttccacaaag cgactgtgtg 1440 cgagctgaac gggcaatgca ggaagagttc tacctggaac tgaaagaagg cttactggag 1500 ccgctggcag tgacggaacg gctggccatt atctcggtgg taggtgatgg tatgcgcacc 1560 ttgcgtggga tctcggcgaa attctttgcc gcactggccc gcgccaatat caacattgtc 1620 gccattgctc agggatcttc tgaacgctca atctctgtcg tggtaaataa cgatgatgcg 1680 accactggcg tgcgcgttac tcatcagatg ctgttcaata ccgatcaggt tatcgaagtg 1740 tttgtgattg gcgtcggtgg cgttggcggt gcgctgctgg agcaactgaa gcgtcagcaa 1800 agctggctga agaataaaca tatcgactta cgtgtctgcg gtgttgccaa ctcgaaggct 1860 ctgctcacca atgtacatgg ccttaatctg gaaaactggc aggaagaact ggcgcaagcc 1920 aaagagccgt ttaatctcgg gcgcttaatt cgcctcgtga aagaatatca tctgctgaac 1980 ccggtcattg ttgactgcac ttccagccag gcagtggcgg atcaatatgc cgacttcctg 2040 cgcgaaggtt tccacgttgt cacgccgaac aaaaaggcca acacctcgtc gatggattac 2100 taccatcagt tgcgttatgc ggcggaaaaa tcgcggcgta aattcctcta tgacaccaac 2160 gttggggctg gattaccggt tattgagaac ctgcaaaatc tgctcaatgc aggtgatgaa 2220 ttgatgaagt tctccggcat tctttctggt tcgctttctt atatcttcgg caagttagac 2280 gaaggcatga gtttctccga ggcgaccacg ctggcgcggg aaatgggtta taccgaaccg 2340 gacccgcgag atgatctttc tggtatggat gtggcgcgta aactattgat tctcgctcgt 2400 gaaacgggac gtgaactgga gctggcggat attgaaattg aacctgtgct gcccgcagag 2460 tttaacgccg agggtgatgt tgccgctttt atggcgaatc tgtcacaact cgacgatctc 2520 tttgccgcgc gcgtggcgaa ggcccgtgat gaaggaaaag ttttgcgcta tgttggcaat 2580 attgatgaag atggcgtctg ccgcgtgaag attgccgaag tggatggtaa tgatccgctg 2640 ttcaaagtga aaaatggcga aaacgccctg gccttctata gccactatta tcagccgctg 2700 ccgttggtac tgcgcggata tggtgcgggc aatgacgtta cagctgccgg tgtctttgct 2760 gatctgctac gtaccctctc atggaagtta ggagtctgac atggttaaag tttatgcccc 2820 ggcttccagt gccaatatga gcgtcgggtt tgatgtgctc ggggcggcgg tgacacctgt 2880 tgatggtgca ttgctcggag 
atgtagtcac ggttgaggcg gcagagacat tcagtctcaa 2940 caacctcgga cgctttgccg ataagctgcc gtcagaacca cgggaaaata tcgtttatca 3000 gtgctgggag cgtttttgcc aggaactggg taagcaaatt ccagtggcga tgaccctgga 3060 aaagaatatg ccgatcggtt cgggcttagg ctccagtgcc tgttcggtgg tcgcggcgct 3120 gatggcgatg aatgaacact gcggcaagcc gcttaatgac actcgtttgc tggctttgat 3180 gggcgagctg gaaggccgta tctccggcag cattcattac gacaacgtgg caccgtgttt 3240 tctcggtggt atgcagttga tgatcgaaga aaacgacatc atcagccagc aagtgccagg 3300 gtttgatgag tggctgtggg tgctggcgta tccggggatt aaagtctcga cggcagaagc 3360 cagggctatt ttaccggcgc agtatcgccg ccaggattgc attgcgcacg ggcgacatct 3420 ggcaggcttc attcacgcct gctattcccg tcagcctgag cttgccgcga agctgatgaa 3480 agatgttatc gctgaaccct accgtgaacg gttactgcca ggcttccggc aggcgcggca 3540 ggcggtcgcg gaaatcggcg cggtagcgag cggtatctcc ggctccggcc cgaccttgtt 3600 cgctctgtgt gacaagccgg aaaccgccca gcgcgttgcc gactggttgg gtaagaacta 3660 cctgcaaaat caggaaggtt ttgttcatat ttgccggctg gatacggcgg gcgcacgagt 3720 actggaaaac taaatgaaac tctacaatct gaaagatcac aacgagcagg tcagctttgc 3780 gcaagccgta acccaggggt tgggcaaaaa tcaggggctg ttttttccgc acgacctgcc 3840 ggaattcagc ctgactgaaa ttgatgagat gctgaagctg gattttgtca cccgcagtgc 3900 gaagatcctc tcggcgttta ttggtgatga aatcccacag gaaatcctgg aagagcgcgt 3960 gcgcgcggcg tttgccttcc cggctccggt cgccaatgtt gaaagcgatg tcggttgtct 4020 ggaattgttc cacgggccaa cgctggcatt taaagatttc ggcggtcgct ttatggcaca 4080 aatgctgacc catattgcgg gtgataagcc agtgaccatt ctgaccgcga cctccggtga 4140 taccggagcg gcagtggctc atgctttcta cggtttaccg aatgtgaaag tggttatcct 4200 ctatccacga ggcaaaatca gtccactgca agaaaaactg ttctgtacat tgggcggcaa 4260 tatcgaaact gttgccatcg acggcgattt cgatgcctgt caggcgctgg tgaagcaggc 4320 gtttgatgat gaagaactga aagtggcgct agggttaaac tcggctaact cgattaacat 4380 cagccgtttg ctggcgcaga tttgctacta ctttgaagct gttgcgcagc tgccgcagga 4440 gacgcgcaac cagctggttg tctcggtgcc aagcggaaac ttcggcgatt tgacggcggg 4500 tctgctggcg aagtcactcg gtctgccggt gaaacgtttt attgctgcga ccaacgtgaa 4560 cgataccgtg ccacgtttcc tgcacgacgg 
tcagtggtca cccaaagcga ctcaggcgac 4620 gttatccaac gcgatggacg tgagtcagcc gaacaactgg ccgcgtgtgg aagagttgtt 4680 ccgccgcaaa atctggcaac tgaaagagct gggttatgca gccgtggatg atgaaaccac 4740 gcaacagaca atgcgtgagt taaaagaact gggctacact tcggagccgc acgctgccgt 4800 agcttatcgt gcgctgcgtg atcagttgaa tccaggcgaa tatggcttgt tcctcggcac 4860 cgcgcatccg gcgaaattta aagagagcgt ggaagcgatt ctcggtgaaa cgttggatct 4920 gccaaaagag ctggcagaac gtgctgattt acccttgctt tcacataatc tgcccgccga 4980 ttttgctgcg ttgcgtaaat tgatgatgaa tcatcagtaa aatctattca ttatctcaat 5040 caggccgggt ttgcttttat gcagcccggc ttttttatga agaaattatg gagaaaaatg 5100 acagggaaaa aggagaaatt ctcaataaat gcggtaactt agagattagg attgcggaga 5160 ataacaaccg ccgttctcat cgagtaatct ccggatatcg acccataacg ggcaatgata 5220 aaaggagtaa cctgtgaaaa agatgcaatc tatcgtactc gcactttccc tggttctggt 5280 cgctcccatg gcagcacagg ctgcggaaat tacgttagtc ccgtcagtaa aattacagat 5340 aggcgatcgt gataatcgtg gctattactg ggatggaggt cactggcgcg accacggctg 5400 gtggaaacaa cattatgaat ggcgaggcaa tcgctggcac ctacacggac cgccgccacc 5460 gccgcgccac cataagaaag ctcctcatga tcatcacggc ggtcatggtc caggcaaaca 5520 tcaccgctaa atgacaaatg ccgggtaaca atccggcatt cagcgcctga tgcgacgctg 5580 gcgcgtctta tcaggcctac gttaattctg caatatattg aatctgcatg cttttgtagg 5640 caggataagg cgttcacgcc gcatccggca ttgactgcaa acttaacgct gctcgtagcg 5700 tttaaacacc agttcgccat tgctggagga atcttcatca aagaagtaac cttcgctatt 5760 aaaaccagtc agttgctctg gtttggtcag ccgattttca ataatgaaac gactcatcag 5820 accgcgtgct ttcttagcgt agaagctgat gatcttaaat ttgccgttct tctcatcgag 5880 gaacaccggc ttgataatct cggcattcaa tttcttcggc ttcaccgatt taaaatactc 5940 atctgacgcc agattaatca ccacattatc gccttgtgct gcgagcgcct cgttcagctt 6000 gttggtgatg atatctcccc agaattgata cagatctttc cctcgggcat tctcaagacg 6060 gatccccatt tccagacgat aaggctgcat taaatcgagc gggcggagta cgccatacaa 6120 gccggaaagc attcgcaaat gctgttgggc aaaatcgaaa tcgtcttcgc tgaaggtttc 6180 ggcctgcaag ccggtgtaga catcaccttt aaacgccaga atcgcctggc gggcattcgc 6240 cggcgtgaaa tctggctgcc agtcatgaaa gcgagcggcg 
ttgatacccg ccagtttgtc 6300 gctgatgcgc atcagcgtgc taatctgcgg aggcgtcagt ttccgcgcct catggatcaa 6360 ctgctgggaa ttgtctaaca gctccggcag cgtatagcgc gtggtggtca acgggctttg 6420 gtaatcaagc gttttcgcag gtgaaataag aatcagcata tccagtcctt gcaggaaatt 6480 tatgccgact ttagcaaaaa atgagaatga gttgatcgat agttgtgatt actcctgcga 6540 aacatcatcc cacgcgtccg gagaaagctg gcgaccgata tccggataac gcaatggatc 6600 aaacaccggg cgcacgccga gtttacgctg gcgtagataa tcactggcaa tggtatgaac 6660 cacaggcgag agcagtaaaa tggcggtcaa attggtaata gccatgcagg ccattatgat 6720 atctgccagt tgccacatca gcggaaggct tagcaaggtg ccgccgatga ccgttgcgaa 6780 ggtgcagatc cgcaaacacc agatcgcttt agggttgttc aggcgtaaaa agaagagatt 6840 gttttcggca taaatgtagt tggcaacgat ggagctgaag gcaaacagaa taaccacaag 6900 ggtaacaaac tcagcacccc aggaacccat tagcacccgc atcgccttct ggataagctg 6960 aataccttcc agcggcatgt aggttgtgcc gttacccgcc agtaatatca gcatggcgct 7020 tgccgtacag atgaccaggg tgtcgataaa aatgccaatc atctggacaa tcccttgcgc 7080 tgccggatgc ggaggccagg acgccgctgc cgctgccgcg tttggcgtcg aacccattcc 7140 cgcctcattg gaaaacatac tgcgctgaaa accgttagta atcgcctggc ttaaggtata 7200 tcccgccgcg ccgcctgccg cttcctgcca gccaaaagca ctctcaaaaa tagaccaaat 7260 gacgtgggga agttgcccga tattcattac gcaaattacc aggctggtca gtacccagat 7320 tatcgccatc aacgggacaa agccctgcat gagccgggcg acgccatgaa gaccgcgagt 7380 gattgccagc agagtaaaga cagcgagaat aatgcctgtc accagcgggg gaaaatcaaa 7440 agaaaaactc agggcgcggg caacggcgtt cgcttgaact ccgctgaaaa ttatgccata 7500 ggcgatgagc aaaaagacgg cgaacagaac gcccatccag cgcatcccca gcccgcgcgc 7560 catataccat gccggtccgc cacgaaactg cccattgacg tcacgttctt tataaagttg 7620 tgccagagaa cattcggcaa acgaggtcgc catgccgata aacgcggcaa cccacatcca 7680 aaagacggct ccaggtccac cggcggtaat agccagcgca acgccggcca ggttgccgct 7740 acccacgcgc gccgcaagac tggtacacaa tgactgaaat gaggttaaac cgcctggctg 7800 tggatgaatg ctatttttaa gacttttgcc aaactggcgg atgtagcgaa actgcacaaa 7860 tccggtgcga aaagtgaacc aacaacctgc gccgaagagc aggtaaatca ttaccgatcc 7920 ccaaaggacg ctgttaatga aggagaaaaa atctggcatg catatccctc 
ttattgccgg 7980 tcgcgatgac tttcctgtgt aaacgttacc aattgtttaa gaagtatata cgctacgagg 8040 tacttgataa cttctgcgta gcatacatga ggttttgtat aaaaatggcg ggcgatatca 8100 acgcagtgtc agaaatccga aacagtctcg cctggcgata accgtcttgt cggcggttgc 8160 gctgacgttg cgtcgtgata tcatcagggc agaccggtta catcccccta acaagctgtt 8220 taaagagaaa tactatcatg acggacaaat tgacctccct tcgtcagtac accaccgtag 8280 tggccgacac tggggacatc gcggcaatga agctgtatca accgcaggat gccacaacca 8340 acccttctct cattcttaac gcagcgcaga ttccggaata ccgtaagttg attgatgatg 8400 ctgtcgcctg ggcgaaacag cagagcaacg atcgcgcgca gcagatcgtg gacgcgaccg 8460 acaaactggc agtaaatatt ggtctggaaa tcctgaaact ggttccgggc cgtatctcaa 8520 ctgaagttga tgcgcgtctt tcctatgaca ccgaagcgtc aattgcgaaa gcaaaacgcc 8580 tgatcaaact ctacaacgat gctggtatta gcaacgatcg tattctgatc aaactggctt 8640 ctacctggca gggtatccgt gctgcagaac agctggaaaa agaaggcatc aactgtaacc 8700 tgaccctgct gttctccttc gctcaggctc gtgcttgtgc ggaagcgggc gtgttcctga 8760 tctcgccgtt tgttggccgt attcttgact ggtacaaagc gaataccgat aagaaagagt 8820 acgctccggc agaagatccg ggcgtggttt ctgtatctga aatctaccag tactacaaag 8880 agcacggtta tgaaaccgtg gttatgggcg caagcttccg taacatcggc gaaattctgg 8940 aactggcagg ctgcgaccgt ctgaccatcg caccggcact gctgaaagag ctggcggaga 9000 // Jolyon Holdstock Ph.D. Senior Computational Biologist, Oxford Gene Technology (Ops) Ltd. Begbroke Business and Science Park Sandy Lane, Yarnton Oxford, OX5 1PF Tel: 01865 309699 Fax: 01865 842116 Confidentiality Notice: The contents of this email from the Oxford Gene Technology Group of Companies are confidential and intended solely for the person to whom it is addressed. It may contain privileged and confidential information. If you are not the intended recipient you must not read, copy, distribute, discuss or take any action in reliance on it. 
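
The ID-line failure reported in the message above can be reproduced outside the parser. What follows is a minimal sketch under stated assumptions, not the BioJavaX `EMBLFormat` code: the pattern `ID_LINE` and the class `EmblIdLineSketch` are hypothetical names introduced for illustration. The sketch treats the molecule type as free text between the optional `circular` keyword and the next semicolon, so both `DNA` and `genomic DNA` are accepted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmblIdLineSketch {
    // Hypothetical pattern (not taken from BioJavaX): entry name, data class,
    // optional "circular", free-text molecule type, division, length in BP.
    static final Pattern ID_LINE = Pattern.compile(
        "^(\\S+)\\s+(\\S+);\\s+(circular\\s+)?([^;]+);\\s+(\\S+);\\s+(\\d+)\\s+BP\\.$");

    /** Returns {name, molecule type, division, length}, or null if the line does not match. */
    static String[] parse(String idLine) {
        Matcher m = ID_LINE.matcher(idLine.trim());
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(4).trim(), m.group(5), m.group(6) };
    }

    public static void main(String[] args) {
        String[] lines = {
            "U00096 standard; circular genomic DNA; PRO; 4639675 BP.",
            "U00096 standard; circular DNA; PRO; 4639675 BP."
        };
        for (String line : lines) {
            String[] fields = parse(line);
            System.out.println(fields == null ? "no match" : String.join(" | ", fields));
        }
    }
}
```

The content of the ID line (without the leading `ID` tag) is passed in, matching the form shown in the exception message. Whether the real parser should accept arbitrary molecule-type text is a design decision for the BioJavaX maintainers; the sketch only shows that a more permissive field is enough for this record.
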
From dreher at mpiib-berlin.mpg.de Wed Mar 8 13:08:50 2006
From: dreher at mpiib-berlin.mpg.de (Felix Dreher)
Date: Wed Mar 8 13:11:54 2006
Subject: [Biojava-l] Problem: Hibernate - RichSequence Annotation
Message-ID: <440F1DB2.903@mpiib-berlin.mpg.de>

Hello all,

in my last post I described a problem with primary keys. When I tried to save a RichSequence with annotations in a PostgreSQL/BioSQL-Database using Hibernate, among others the exception

---
org.postgresql.util.PSQLException: ERROR: relation "ontology_ontology_id_seq" does not exist
---

was thrown. This could be solved by changing the tag in the ontology.hbm.xml from to ontology_pk_seq (and similarly in the term.hbm.xml file). I'm not sure if this is specific for my project or if it's a general problem.

Anyway, this works fine now, however another problem came up: I want to enrich a Sequence that was downloaded from Genbank and (by enriching) save all the annotations in the RichSequence object.

    Sequence seq = new GenbankSequenceDB().getSequence("NM_008160");
    RichSequence s = RichSequence.Tools.enrich(seq);
    tdb.addSequence(s);

(where tdb is a convenience wrapper for storing and retrieving sequences from the BioSQL-DB, it works with non-enriched sequences).

From the debugging info I got, this works at the object level, but when I try to save the sequence to the DB, the following exception is thrown:

2006-03-08 18:35:00,642 ERROR [httpWorkerThread-28080-9] calling method: org.hibernate.util.JDBCExceptionReporter.logExceptions(JDBCExceptionReporter.java:72)
*ERROR: duplicate key violates unique constraint "seqfeature_bioentry_id_key"*
2006-03-08 18:35:00,643 ERROR [httpWorkerThread-28080-9] calling method: org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:299)
*Could not synchronize database state with session*
org.hibernate.exception.ConstraintViolationException: Could not execute JDBC batch update
    at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:69)
    at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:43)
    at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:202)
    at org.hibernate.jdbc.AbstractBatcher.prepareStatement(AbstractBatcher.java:91)
    at org.hibernate.jdbc.AbstractBatcher.prepareStatement(AbstractBatcher.java:86)
    at org.hibernate.jdbc.AbstractBatcher.prepareBatchStatement(AbstractBatcher.java:171)
    at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2048)
    at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2427)
    at org.hibernate.action.EntityInsertAction.execute(EntityInsertAction.java:51)
    at org.hibernate.engine.ActionQueue.execute(ActionQueue.java:243)
    at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:227)
    at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:140)
    at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:296)
    at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27)
    at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1009)
    at org.hibernate.impl.SessionImpl.managedFlush(SessionImpl.java:356)
    at org.hibernate.transaction.JDBCTransaction.commit(JDBCTransaction.java:106)
    at rnaiprediction.sequence.db.SequenceDB.addSequence(SequenceDB.java:67)
    at rnaiprediction.Queue.prerender(Queue.java:374)
    ......
*Caused by: java.sql.BatchUpdateException: Batch entry 0 insert into seqfeature (bioentry_id, source_term_id, type_term_id, display_name, rank, seqfeature_id) values (126, 269, 269, NULL, 0, 83) was aborted. Call getNextException to see the cause.*
    at org.postgresql.jdbc2.AbstractJdbc2Statement$BatchResultHandler.handleError(AbstractJdbc2Statement.java:2497)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1298)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:347)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeBatch(AbstractJdbc2Statement.java:2559)
    at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:58)
    at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:195)
    ... 57 more

Any suggestions would be highly appreciated!

Regards,
Felix

--
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité Mitte
Department of Immunology
Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany
Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426

From mark.schreiber at novartis.com Wed Mar 8 20:02:09 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed Mar 8 19:57:12 2006
Subject: [Biojava-l] BiojavaX EmblFormat
Message-ID:

The biojavax parser uses regular expressions to parse these lines. I will need to check what needs changing in these regexes to allow parsing of these files.

Thanks for your testing!

- Mark

From mark.schreiber at novartis.com Wed Mar 8 21:18:42 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed Mar 8 21:13:44 2006
Subject: [Biojava-l] Problem: Hibernate - RichSequence Annotation
Message-ID:

Hi Felix -

There are some mapping differences between PostgreSQL, MySQL and Oracle, which mostly seem to center on how they generate primary keys. I think you have solved this with your changes to the hbm.xml files. I will commit these to CVS.

The second problem you describe might be caused by the enrich process. Richard has created a biojavax equivalent of GenbankSequenceDB (RichGenbankSequenceDB I think) which will mean you can avoid using the enrich method. This may solve the problem.

The problem might be with the primary key of some seqfeature, possibly as a result of the enrich() method.

*ERROR: duplicate key violates unique constraint "seqfeature_bioentry_id_key"*

It may also be because of a problem in the postgres mapping of features (although if it only happens with enrich()ed sequences then probably not). It could also be some old entries in your database from previous testing that may need cleaning out (although if the hibernate mapping is correct this is not likely).
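
One way to read the constraint error is sketched below. It assumes the standard BioSQL schema, in which `seqfeature` carries a unique key over (bioentry_id, type_term_id, source_term_id, rank); under that assumption, two features with the same type and source on one bioentry collide unless their ranks differ. The class `SeqFeatureKeySketch` and its methods are hypothetical and only model that key in memory; they are not part of BioJava, Hibernate, or BioSQL.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SeqFeatureKeySketch {
    // In-memory stand-in for the assumed unique key
    // (bioentry_id, type_term_id, source_term_id, rank).
    private final Set<List<Integer>> keys = new HashSet<List<Integer>>();

    /** Returns true if the row is accepted, false if it would violate the unique key. */
    boolean insert(int bioentryId, int typeTermId, int sourceTermId, int rank) {
        return keys.add(Arrays.asList(bioentryId, typeTermId, sourceTermId, rank));
    }

    /** Inserts n features of one type and source, each with its own rank; returns how many were accepted. */
    int insertRanked(int bioentryId, int typeTermId, int sourceTermId, int n) {
        int accepted = 0;
        for (int rank = 0; rank < n; rank++) {
            if (insert(bioentryId, typeTermId, sourceTermId, rank)) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        SeqFeatureKeySketch table = new SeqFeatureKeySketch();
        // Values taken from the failed batch insert: bioentry 126, terms 269/269, rank 0.
        System.out.println(table.insert(126, 269, 269, 0)); // first row with this key
        System.out.println(table.insert(126, 269, 269, 0)); // same key a second time
        System.out.println(table.insert(126, 269, 269, 1)); // same feature type, different rank
    }
}
```

If the assumption about the key holds, the failed insert (bioentry 126, both terms 269, rank 0) suggests that either a row with those values already exists from earlier testing, or several features of the same type are all being written with rank 0.
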
- Mark Felix Dreher Sent by: biojava-l-bounces@portal.open-bio.org 03/09/2006 02:08 AM To: biojava-l@biojava.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-l] Problem: Hibernate - RichSequence Annotation Hello all, in my last post I described a problem with primary keys. When I tried to save a RichSequence with annotations in a PostgreSQL/BioSQL-Database using Hibernate, among others the exception --- org.postgresql.util.PSQLException: ERROR: relation "ontology_ontology_id_seq" does not exist --- was thrown. This could be solved by changing the tag in the ontology.hbm.xml from to ontology_pk_seq (and similarly in the term.hbm.xml file). I'm not sure if this is specific for my project or if it's a general problem. Anyway, this works fine now, however another problem came up: I want to enrich a Sequence that was downloaded from Genbank and (by enriching) save all the annotations in the RichSequence object. Sequence seq = new GenbankSequenceDB().getSequence("NM_008160"); RichSequence s = RichSequence.Tools.enrich(seq); tdb.addSequence(s); (where tdb is a convenience wrapper for storing and retrieving sequences from the BioSQL-DB, it works with non-enriched sequences). 
From the debugging info I got, this works at the object level, but when I try to save the sequence to the DB, the following exception is thrown: 2006-03-08 18:35:00,642 ERROR [httpWorkerThread-28080-9] calling method: org.hibernate.util.JDBCExceptionReporter.logExceptions(JDBCExceptionReporter.java:72) *ERROR: duplicate key violates unique constraint "seqfeature_bioentry_id_key"* 2006-03-08 18:35:00,643 ERROR [httpWorkerThread-28080-9] calling method: org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:299) *Could not synchronize database state with session* org.hibernate.exception.ConstraintViolationException: Could not execute JDBC batch update at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:69) at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:43) at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:202) at org.hibernate.jdbc.AbstractBatcher.prepareStatement(AbstractBatcher.java:91) at org.hibernate.jdbc.AbstractBatcher.prepareStatement(AbstractBatcher.java:86) at org.hibernate.jdbc.AbstractBatcher.prepareBatchStatement(AbstractBatcher.java:171) at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2048) at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2427) at org.hibernate.action.EntityInsertAction.execute(EntityInsertAction.java:51) at org.hibernate.engine.ActionQueue.execute(ActionQueue.java:243) at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:227) at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:140) at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:296) at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27) at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1009) at 
org.hibernate.impl.SessionImpl.managedFlush(SessionImpl.java:356)
at org.hibernate.transaction.JDBCTransaction.commit(JDBCTransaction.java:106)
at rnaiprediction.sequence.db.SequenceDB.addSequence(SequenceDB.java:67)
at rnaiprediction.Queue.prerender(Queue.java:374)
......

*Caused by: java.sql.BatchUpdateException: Batch entry 0 insert into seqfeature (bioentry_id, source_term_id, type_term_id, display_name, rank, seqfeature_id) values (126, 269, 269, NULL, 0, 83) was aborted. Call getNextException to see the cause.*
at org.postgresql.jdbc2.AbstractJdbc2Statement$BatchResultHandler.handleError(AbstractJdbc2Statement.java:2497)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1298)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:347)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeBatch(AbstractJdbc2Statement.java:2559)
at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:58)
at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:195)
... 57 more

Any suggestions would be highly appreciated!

Regards,
Felix

--
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité Mitte
Department of Immunology
Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany
Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426

From anderson.moura at telemar-rj.com.br Thu Mar 9 11:59:37 2006
From: anderson.moura at telemar-rj.com.br (Anderson Moura da Silva)
Date: Thu Mar 9 12:06:21 2006
Subject: [Biojava-l] Alignmente algorithms implemented by BioJava
Message-ID: <3C39C09ED334F243838953854BE43FB6022BDD91@MAILBX02.telemar.corp.net>

Hi,

Can somebody tell me which algorithms BioJava uses to make local alignments and multiple alignments?
I've been searching the documentation but have not found it.
Where could I find it? I'm working on a Java environment for analysis and alignment of sequences!

Thanks,
Anderson Moura - Brazil

This message, including its attachments, may contain privileged and/or confidential information and may not be forwarded without the sender's authorization. If you are not the intended recipient or a person authorized to receive it, please note that its use, disclosure, copying or archiving is prohibited. If you received this message in error, please let us know by replying to this e-mail immediately and then delete it.

From sylvain.foisy at bioneq.qc.ca Thu Mar 9 12:45:45 2006
From: sylvain.foisy at bioneq.qc.ca (Sylvain Foisy)
Date: Thu Mar 9 13:19:11 2006
Subject: [Biojava-l] Alignmente algorithms implemented by BioJava
In-Reply-To: <3C39C09ED334F243838953854BE43FB6022BDD91@MAILBX02.telemar.corp.net>
Message-ID:

Hi there,

The CVS code in BioJava contains the necessary classes to perform both Smith-Waterman and Needleman-Wunsch pairwise alignments. Just have a look at the Javadocs for instructions. The preferred way around here is to use HMMs to perform the alignments; for this, look at the Cookbook section of BioJava's new website (http://biojava.open-bio.org).

As far as MSA is concerned, I don't think there is anything in BJ that does that. You could call clustal from within your program and use BJ's MSA parsing classes to do your stuff.

Hope this helps

Sylvain

==================================================================
Sylvain Foisy, Ph. D.
Directeur - operations / Project Manager
BioneQ - Reseau quebecois de bio-informatique
U. de Montreal / Genome-Quebec
Adresse postale: Departement de biochimie
Pavillon principal
2900, boul. Édouard-Montpetit
Montréal (Québec) H3T 1J4
Tel: (514) 343-6111 x.2545
Fax: (514) 343-7759
Courriel: sylvain.foisy@bioneq.qc.ca
==================================================================

From anderson.moura at telemar-rj.com.br Thu Mar 9 13:17:59 2006
From: anderson.moura at telemar-rj.com.br (Anderson Moura da Silva)
Date: Thu Mar 9 13:22:46 2006
Subject: RES: [Biojava-l] Alignmente algorithms implemented by BioJava
Message-ID: <3C39C09ED334F243838953854BE43FB6022BE082@MAILBX02.telemar.corp.net>

Thanks, I was really looking for these 3 ones!! I'm very new to bioinformatics, so can I ask if there are other algorithms that are really used? I only know these 3 ones (SW, NW and HMM)!!
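For orientation, the local-alignment idea behind Smith-Waterman (SW) fits in a few lines. The following is a minimal sketch and not BioJava's implementation: the class name `SmithWatermanSketch`, the method `score`, and the scoring values are all assumptions made for illustration. It uses a simple linear gap penalty and returns only the best local score, not the alignment itself.

```java
// Sketch only: plain Smith-Waterman local alignment score with a linear gap
// penalty. Names and scoring values are illustrative assumptions and do not
// mirror BioJava's own alignment classes.
class SmithWatermanSketch {

    // Returns the best local alignment score between a and b.
    // match is added for equal characters; mismatch and gap are expected
    // to be negative and are added for substitutions and gaps.
    static int score(String a, String b, int match, int mismatch, int gap) {
        int[][] h = new int[a.length() + 1][b.length() + 1]; // row 0 and column 0 stay 0
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int diag = h[i - 1][j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch);
                int up = h[i - 1][j] + gap;    // gap in b
                int left = h[i][j - 1] + gap;  // gap in a
                // Flooring at zero is what makes the alignment local.
                h[i][j] = Math.max(0, Math.max(diag, Math.max(up, left)));
                best = Math.max(best, h[i][j]);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(score("GGACGTCC", "TTACGTAA", 2, -1, -2));
    }
}
```

Needleman-Wunsch (NW) uses the same recurrence but is global: cells are not floored at zero, the first row and column are initialised with accumulated gap penalties, and the score is read from the last cell of the matrix.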
Thanks
Anderson Moura - Brazil

From guedes at unisul.br Thu Mar 9 14:33:59 2006
From: guedes at unisul.br (Dickson S. Guedes)
Date: Thu Mar 9 15:37:43 2006
Subject: [Biojava-l] Alignmente algorithms implemented by BioJava
In-Reply-To: <3C39C09ED334F243838953854BE43FB6022BDD91@MAILBX02.telemar.corp.net>
References: <3C39C09ED334F243838953854BE43FB6022BDD91@MAILBX02.telemar.corp.net>
Message-ID: <44108327.9040009@unisul.br>

Anderson Moura da Silva wrote:
> Hi,
>
> Can somebody tell me what algorithms Biojava uses to make local alignments and multiples alignments?
> I'm Serching it on the Documentation but I have not found it?
>
> Where could I find it? I'm working on a Java Environment for Analysis and Alignment of Sequences!
>
> Thanks,
> Anderson Moura - Brazil

Sorry all, but I'll reply to this message in my native language.

Hello Anderson,

BioJava does not implement a class for multiple alignments, but you can do pairwise alignment using dynamic programming (DP); there is an example in the CookBook at the link below:

- http://biojava.open-bio.org/wiki/BioJava:CookBook:DP:PairWise

You can choose to do multiple alignments using the STRAP package; there is an example at:

- http://www.charite.de/bioinf/strap/biojavaInAnger_SequenceAligner.html

Regards,
--
Dickson S. Guedes
/*
 * UNISUL - Universidade do Sul de Santa Catarina
 * ATI - Assessoria de Tecnologia da Informação
 * (0xx48) 621-3200 - http://www.unisul.br
 *
 * "Quis custodiet ipsos custodes?"
 */

From dreher at mpiib-berlin.mpg.de Fri Mar 10 10:37:21 2006
From: dreher at mpiib-berlin.mpg.de (Felix Dreher)
Date: Fri Mar 10 10:33:42 2006
Subject: [Biojava-l] Problem: Hibernate - RichSequence Annotation
In-Reply-To:
References:
Message-ID: <44119D31.6010703@mpiib-berlin.mpg.de>

Hi Mark,

I changed my code and now use the new GenbankRichSequenceDB.getRichSequence(String id) method, and this solved my problem.
However I had to change some of the hbm.xml files again. I will commit these to CVS.

Thanks again for your help.

Regards,
Felix

mark.schreiber@novartis.com wrote:

>Hi Felix -
>
>There are some mapping differences between Postgres and MySQL and Oracle,
>mostly seems to center around how they generate primary keys. I think you
>have solved this with your changes to the hbm.xml files. I will commit
>these to CVS.
>
>The second problem you describe might be caused by the enrich process.
>Richard has created a biojavax equivalent of GenbankSequenceDB
>(RichGenbankSequenceDB I think) which will mean you can avoid using the
>enrich method. This may solve the problem.
>
>The problem might be with the primary key of some seqfeature, this might
>be because of the enrich() method.
>
>*ERROR: duplicate key violates unique constraint
>"seqfeature_bioentry_id_key"*
>
>It may also be because of a problem in the postgres mapping of features
>(although if it only happens with enrich()ed sequences then probably not).
>
>It could also be some old entries in your database from previous testing
>that may need cleaning out (although if the hibernate mapping is correct
>this is not likely).
>
>- Mark

--
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité Mitte
Department of Immunology
Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany
Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426

From emy_66 at hotmail.com Mon Mar 13 00:39:41 2006
From: emy_66 at hotmail.com (Emily Wong)
Date: Mon, 13 Mar 2006 05:39:41 +0000
Subject: [Biojava-l] Blast parser for ncbi blast version 2.2.13
Message-ID:

Hi,

Is there a parser that takes into account ncbi blast version 2.2.13 (on their website)? I am trying to use the code here to parse: http://www.biojava.org/docs/bj_in_anger/BlastParser.htm .
If I set the parser from strict to lazy I get this error:

Exception in thread "main" java.lang.NullPointerException
at org.biojava.bio.program.sax.BlastSAXParser.interpret(BlastSAXParser.java:215)
at org.biojava.bio.program.sax.BlastSAXParser.parse(BlastSAXParser.java:164)
at org.biojava.bio.program.sax.BlastLikeSAXParser.onNewDataSet(BlastLikeSAXParser.java:311)
at org.biojava.bio.program.sax.BlastLikeSAXParser.interpret(BlastLikeSAXParser.java:274)
at org.biojava.bio.program.sax.BlastLikeSAXParser.parse(BlastLikeSAXParser.java:160)
at BlastParser.main(BlastParser.java:46)

Thanks,
Emily

From mark.schreiber at novartis.com Mon Mar 13 20:07:39 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 14 Mar 2006 09:07:39 +0800
Subject: [Biojava-l] Blast parser for ncbi blast version 2.2.13
Message-ID:

Possibly some variation in this output is causing the problem.

Can you post some blast output that replicates this error?

Thanks

- Mark

From christoph.gille at charite.de Tue Mar 14 02:47:10 2006
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Tue, 14 Mar 2006 08:47:10 +0100 (CET)
Subject: [Biojava-l] alignment algor in Biojava
In-Reply-To:
References:
Message-ID: <64617.84.190.58.246.1142322430.squirrel@webmail.charite.de>

> Hi,
>
> Can somebody tell me what algorithms Biojava uses to make local alignments
> and multiples alignments? I'm Serching it on the Documentation but I have
> not found it?

at the bottom of the page

http://www.biojava.org/docs/bj_in_anger/index.htm
http://www.charite.de/bioinf/strap/Scripting.html#SequenceAligner

Cheers
Christoph

From koeberle at mpiib-berlin.mpg.de Tue Mar 14 07:28:37 2006
From: koeberle at mpiib-berlin.mpg.de (Christian Köberle)
Date: Tue, 14 Mar 2006 13:28:37 +0100
Subject: [Biojava-l] Feature + BioJAVA-X + BioSQL ?
Message-ID: <4416B6F5.1050905@mpiib-berlin.mpg.de>

Hi,

I am trying to write a Sequence object into a BioSQL DB using the classes of BioJAVA-X. This works well. But if I try to save a Sequence object with two (or more) Features and both Features have equal Types and equal Sources, writing to the DB fails.
Is it wrong to have more than one Feature with the same type and source on one Sequence? Or is this a bug in BioJAVA / BioJAVA-X or BioSQL?

Thanks,
Christian

The error message:

org.hibernate.StaleStateException: Batch update returned unexpected row count from update: 0 actual row count: 0 expected: 1
at org.hibernate.jdbc.BatchingBatcher.checkRowCount(BatchingBatcher.java:93)
at org.hibernate.jdbc.BatchingBatcher.checkRowCounts(BatchingBatcher.java:79)
at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:58)
at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:195)
at org.hibernate.jdbc.AbstractBatcher.prepareStatement(AbstractBatcher.java:91)
at org.hibernate.jdbc.AbstractBatcher.prepareStatement(AbstractBatcher.java:86)
at org.hibernate.jdbc.AbstractBatcher.prepareBatchStatement(AbstractBatcher.java:171)
at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2048)
at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2427)
at org.hibernate.action.EntityInsertAction.execute(EntityInsertAction.java:51)
at org.hibernate.engine.ActionQueue.execute(ActionQueue.java:243)
at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:227)
at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:140)
at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:296)
at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27)
at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1009)
at org.hibernate.impl.SessionImpl.managedFlush(SessionImpl.java:356)
at org.hibernate.transaction.JDBCTransaction.commit(JDBCTransaction.java:106)

--
Christian Köberle
Max Planck Institute for Infection Biology
Department: Immunology
Schumannstr. 21/22
10117 Berlin
Tel: +49 30 28 460 562
e-mail: koeberle at mpiib-berlin.mpg.de

From koeberle at mpiib-berlin.mpg.de Tue Mar 14 09:06:43 2006
From: koeberle at mpiib-berlin.mpg.de (Christian Köberle)
Date: Tue, 14 Mar 2006 15:06:43 +0100
Subject: [Biojava-l] BioJAVA-X + BioSQL + no update
Message-ID: <4416CDF3.4000407@mpiib-berlin.mpg.de>

Hi,

I have the following problem. I put a RichSequence object into a BioSQL DB, using the new classes from BioJAVA-X. Later I get this Sequence object back from the BioSQL DB (also with BioJAVA-X), create new Feature objects and Note objects, and add these to the Sequence object. With BioJAVA 1.4 all Features and Annotations are written into the BioSQL DB. With BioJAVA-X there are no changes in the DB.

Does BioJAVA-X include a method to update the BioSQL DB, or how else can I get the changes into the DB?

Thanks,
Christian

--
Christian Köberle
Max Planck Institute for Infection Biology
Department: Immunology
Schumannstr. 21/22
10117 Berlin
Tel: +49 30 28 460 562
e-mail: koeberle at mpiib-berlin.mpg.de

From bubba.puryear at gmail.com Mon Mar 13 12:27:11 2006
From: bubba.puryear at gmail.com (Bubba Puryear)
Date: Mon, 13 Mar 2006 12:27:11 -0500
Subject: [Biojava-l] biojavax GenbankFormat and legacy genbank records
Message-ID:

Hello,

I work on a webapp for a biotech company that uses biojava to parse plasmid and feature maps (genbank flatfile format) and we store them in a local database. I've wanted to update the version of biojava we use because the current CVS parser handles features that cross the origin on plasmid maps much better than the parser in 1.4.

However, we have a lot of data in various databases that have genbank records formatted in some of the older incarnations of the genbank flatfile format. In particular, some feature maps don't have ACCESSION fields, and/or are missing modification dates and genbank divisions on the LOCUS line. When I try to parse one of those maps with biojavax, I get parse errors.
Should there perhaps be a LegacyGenbankFormat or should the GenbankFormat class be made more tolerant? I know NCBI made several changes to their flatfile format in part because writing parsers for the older specs was tricky. So I'm not sure which direction the bio* folks would like to go with this.

I've attached a small example map that causes parse problems. The data in the map is completely bogus, but the structure was taken from a real map file I have to deal with. The following code snippet illustrates my problems:

    BufferedReader br = new BufferedReader(new StringReader(genbankContent));
    try {
        RichSequenceIterator sequences = IOTools.readGenbankDNA(br, null);
        if (sequences.hasNext()) {
            this.sequence = sequences.nextRichSequence();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }

where genbankContent is a String containing the contents of the attached file.

Thanks much,
Bubba Puryear

-------------- next part --------------
A non-text attachment was scrubbed...
Name: foo.gb
Type: chemical/seq-na-genbank
Size: 1091 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/biojava-l/attachments/20060313/b56af1d0/attachment.bin

From mira.edelstein at gmx.net Tue Mar 14 17:30:01 2006
From: mira.edelstein at gmx.net (Mira)
Date: Tue, 14 Mar 2006 23:30:01 +0100
Subject: [Biojava-l] (no subject)
Message-ID: <001501c647b6$d5954f70$9b7ba8c0@mecom>

please take me off the mailing list

thanks
mira

From mark.schreiber at novartis.com Wed Mar 15 01:42:59 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Wed, 15 Mar 2006 14:42:59 +0800
Subject: [Biojava-l] Feature + BioJAVA-X + BioSQL ?
Message-ID:

This could be a bug; this is bleeding-edge development code. Are you using the most up-to-date CVS code? Also, which database are you using?

As a suggestion, RichFeatures with the same Type, Source and Parent sequence can only be distinguished by rank (in BioSQL and BioJavaX). Can you persist them to the DB if you give one a different rank?
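A minimal sketch of why rank matters, assuming (as described above) that a feature is identified by its parent sequence, type, source and rank. `FeatureKey` and `allDistinct` are hypothetical names used for illustration only; they are not BioJavaX or BioSQL classes, and plain strings and integers stand in for the real term and bioentry objects.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Sketch only: models the uniqueness rule with plain Java collections.
// FeatureKey is a hypothetical stand-in, not a BioJavaX or BioSQL class.
class FeatureKeySketch {

    static final class FeatureKey {
        final int bioentryId;  // parent sequence
        final String type;     // feature type term
        final String source;   // feature source term
        final int rank;        // the only field left to tell two features apart

        FeatureKey(int bioentryId, String type, String source, int rank) {
            this.bioentryId = bioentryId;
            this.type = type;
            this.source = source;
            this.rank = rank;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof FeatureKey)) return false;
            FeatureKey k = (FeatureKey) o;
            return bioentryId == k.bioentryId && rank == k.rank
                    && type.equals(k.type) && source.equals(k.source);
        }

        @Override
        public int hashCode() {
            return Objects.hash(bioentryId, type, source, rank);
        }
    }

    // Returns true only if no two keys collide, i.e. if a store that
    // enforces uniqueness on the whole key would accept all of them.
    static boolean allDistinct(FeatureKey... keys) {
        Set<FeatureKey> seen = new HashSet<>();
        for (FeatureKey key : keys) {
            if (!seen.add(key)) {
                return false;
            }
        }
        return true;
    }
}
```

Two features on the same sequence with the same type and source collide when both keep rank 0, and stop colliding as soon as one of them is given rank 1.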
- Mark

From mark.schreiber at novartis.com Wed Mar 15 02:02:02 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Wed, 15 Mar 2006 15:02:02 +0800
Subject: [Biojava-l] BioJAVA-X + BioSQL + no update
Message-ID:

With BioJavaX if you want any changes to a RichSequence object to persist to the database you need to "save or add it" with Hibernate.

    SessionFactory sessionFactory = new Configuration().configure().buildSessionFactory();
    Session session = sessionFactory.openSession();
    RichObjectFactory.connectToBioSQL(session);

    RichSequence rs = ...; // some sequence you've made or modified
    session.saveOrUpdate("Sequence", rs); // persist the sequence

***
Another way is to do everything inside a transaction (this example is from the BioJavaX docbook in CVS)

    SessionFactory sessionFactory = new Configuration().configure().buildSessionFactory();
    Session session = sessionFactory.openSession();
    RichObjectFactory.connectToBioSQL(session);

    Transaction tx = session.beginTransaction();
    try {

        // print out all the namespaces in the database

        Query q = session.createQuery("from Namespace");
        List namespaces = q.list(); // retrieve all the namespaces from the db
        for (Iterator i = namespaces.iterator(); i.hasNext(); ) {
            Namespace ns = (Namespace)i.next();
            System.out.println(ns.getName()); // print out the name of the namespace

            // print out all the sequences in the namespace
            Query sq = session.createQuery("from BioEntry where namespace= :nsp");
            // set the named parameter "nsp" to ns
            sq.setParameter("nsp", ns);
            List sequences = sq.list();

            for (Iterator j = sequences.iterator(); j.hasNext(); ) {
                BioEntry be = (BioEntry)j.next(); // RichSequences are BioEntrys too
                System.out.println(" " + be.getName()); // print out the name of the sequence

                // if the sequence is called bloggs, change its description to XYZ

                if (be.getName().equals("bloggs")) {
                    be.setDescription("XYZ");
                }
            }

        }

        // commit and tidy up
        tx.commit();
        System.out.println("Changes committed.");

        // all sequences called bloggs now have a description of "XYZ" in the database

    } catch (Exception e) {
        tx.rollback();
        System.out.println("Changes rolled back.");
        e.printStackTrace();
    }

    session.close();

From mark.schreiber at novartis.com Wed Mar 15 02:11:55 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Wed, 15 Mar 2006 15:11:55 +0800
Subject: [Biojava-l] biojavax GenbankFormat and legacy genbank records
Message-ID:

Hi -

I'm happy for the regexps in GenbankFormat and EMBLFormat etc. to be relaxed a little as long as the parsing of fully valid genbank files doesn't suffer. If someone wants to test this thoroughly it would be a great benefit to the whole community.

In some cases it may not be possible. For example, if a feature doesn't have sufficient information to build a proper RichFeature object I don't think we should allow the file.

It might be good to make a collection in CVS of example files that are known to have broken the parser in the past (the files folder in the test suite would be an ideal place).
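One possible shape for such a relaxation, as a sketch only: a LOCUS-line pattern in which everything after the sequence length is optional. The pattern, the class `LocusLineSketch` and its field names are assumptions for illustration and are not the regular expressions actually used in GenbankFormat; a real change would still need testing against fully valid GenBank files, as noted above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: a tolerant GenBank LOCUS-line matcher. The pattern is an
// illustrative assumption, not the one used by BioJavaX's GenbankFormat.
class LocusLineSketch {

    // Name and length are required. Molecule type, topology, division and
    // date are each optional so that older records still match.
    private static final Pattern LOCUS = Pattern.compile(
            "^LOCUS\\s+(\\S+)"                         // 1: locus name
            + "\\s+(\\d+)\\s+(?:bp|aa)"                // 2: sequence length
            + "(?:\\s+((?:ss-|ds-|ms-)?\\w*[DR]NA))?"  // 3: molecule type
            + "(?:\\s+(linear|circular))?"             // 4: topology
            + "(?:\\s+([A-Z]{3}))?"                    // 5: division
            + "(?:\\s+(\\d{2}-[A-Z]{3}-\\d{4}))?"      // 6: modification date
            + "\\s*$");

    static final class LocusLine {
        final String name;
        final int length;
        final String moleculeType; // null when the field is absent
        final String topology;     // null when the field is absent
        final String division;     // null when the field is absent
        final String date;         // null when the field is absent

        LocusLine(Matcher m) {
            this.name = m.group(1);
            this.length = Integer.parseInt(m.group(2));
            this.moleculeType = m.group(3);
            this.topology = m.group(4);
            this.division = m.group(5);
            this.date = m.group(6);
        }
    }

    // Returns the parsed fields, or an empty Optional if the line does not
    // look like a LOCUS line at all.
    static Optional<LocusLine> parse(String line) {
        Matcher m = LOCUS.matcher(line);
        return m.matches() ? Optional.of(new LocusLine(m)) : Optional.empty();
    }
}
```

With this shape a caller can decide per field what to do about a missing value (supply a default, warn, or reject the record), instead of the parser failing on the whole line.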
- Mark Mark Schreiber Research Investigator (Bioinformatics) Novartis Institute for Tropical Diseases (NITD) 10 Biopolis Road #05-01 Chromos Singapore 138670 www.nitd.novartis.com phone +65 6722 2973 fax +65 6722 2910 "Bubba Puryear" Sent by: biojava-l-bounces at lists.open-bio.org 03/14/2006 01:27 AM To: biojava-l at lists.open-bio.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-l] biojavax GenbankFormat and legacy genbank records Hello, I work on a webapp for a biotech company that uses biojava to parse plasmid and feature maps (genbank flatfile format) and we store them in a local database. I've wanted to update the version of biojava we use because the current CVS parser handles features that cross the origin on plasmid maps much better than the parser in 1.4. However, we have a lot of data in various databases that have genbank records formatted in some of the older incarnations of the GFF. In particular, some feature maps don't have ACCESSION fields, and/or are missing modification dates and genbank divisions on the LOCUS line. When I try to parse one of those maps with biojavax, I get parse errors. Should there perhaps be a LegacyGenbankFormat or should the GenbankFormat class be made more tolerant? I know NCBI made several changes to their flatfile format in part because writing parsers for the older specs was tricky. So I'm not sure which direction the bio* folks would like to go with this. I've attached a small example map that causes parse problems. The data in the map is completely bogus, but the structure was taken from a real map file I have to deal with. 
The following code snippet illustrates my problem:

    BufferedReader br = new BufferedReader(new StringReader(genbankContent));
    try {
        RichSequenceIterator sequences = IOTools.readGenbankDNA(br, null);
        if (sequences.hasNext()) {
            this.sequence = sequences.nextRichSequence();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }

where genbankContent is a String containing the contents of the attached file.

Thanks much,
Bubba Puryear

[ Attachment ''FOO.GB'' removed by Mark Schreiber ]

From koeberle at mpiib-berlin.mpg.de Thu Mar 16 05:03:26 2006
From: koeberle at mpiib-berlin.mpg.de (Christian Köberle)
Date: Thu, 16 Mar 2006 11:03:26 +0100
Subject: [Biojava-l] BioJAVA-X + BioSQL + no update
In-Reply-To:
References:
Message-ID: <441937EE.6000204@mpiib-berlin.mpg.de>

Hi Mark,

it works, but the code has to look like this:

    ...
    session.getTransaction().begin();
    session.saveOrUpdate("Sequence",seq);
    session.getTransaction().commit();

It also works with:

    session.update("Sequence",seq);

Thanks,
Christian

mark.schreiber at novartis.com wrote:
> With BioJavaX if you want any changes to a RichSequence object to persist
> to the database you need to "save or add it" with Hibernate.
>
>     SessionFactory sessionFactory = new Configuration().configure().buildSessionFactory();
>     Session session = sessionFactory.openSession();
>     RichObjectFactory.connectToBioSQL(session);
>
>     RichSequence rs = ...; // some sequence you've made or modified
>     session.saveOrUpdate("Sequence",rs); // persist the sequence
>
> ***
> Another way is to do everything inside a transaction (this example is from
> the BioJavaX docbook in CVS)
>
>     SessionFactory sessionFactory = new Configuration().configure().buildSessionFactory();
>     Session session = sessionFactory.openSession();
>     RichObjectFactory.connectToBioSQL(session);
>
>     Transaction tx = session.beginTransaction();
>     try {
>
>         // print out all the namespaces in the database
>
>         Query q = session.createQuery("from Namespace");
>         List namespaces = q.list(); // retrieve all the namespaces from the db
>         for (Iterator i = namespaces.iterator(); i.hasNext(); ) {
>             Namespace ns = (Namespace)i.next();
>             System.out.println(ns.getName()); // print out the name of the namespace
>
>             // print out all the sequences in the namespace
>             Query sq = session.createQuery("from BioEntry where namespace= :nsp");
>             // set the named parameter "nsp" to ns
>             sq.setParameter("nsp",ns);
>             List sequences = sq.list();
>
>             for (Iterator j = sequences.iterator(); j.hasNext(); ) {
>                 BioEntry be = (BioEntry)j.next(); // RichSequences are BioEntrys too
>                 System.out.println(" "+be.getName()); // print out the name of the sequence
>
>                 // if the sequence is called bloggs, change its description to XYZ
>
>                 if (be.getName().equals("bloggs")) {
>                     be.setDescription("XYZ");
>                 }
>             }
>
>         }
>
>         // commit and tidy up
>         tx.commit();
>         System.out.println("Changes committed.");
>
>         // all sequences called bloggs now have a description of "XYZ" in the database
>
>     } catch (Exception e) {
>         tx.rollback();
>         System.out.println("Changes rolled back.");
>         e.printStackTrace();
>     }
>
>     session.close();
>
>
> Christian Köberle
> Sent by:
biojava-l-bounces at lists.open-bio.org
> 03/14/2006 10:06 PM
>
> To: bio java mailing list
> cc: (bcc: Mark Schreiber/GP/Novartis)
> Subject: [Biojava-l] BioJAVA-X + BioSQL + no update
>
> Hi,
> I have the following problem.
> I put a RichSequence-Object into a BioSQL-DB, using the new classes from
> BioJAVA-X.
> Later I get this Sequence-Object from the BioSQL-DB (also with
> BioJAVA-X), create new Feature-Objects and Note-Objects, and add
> these to the Sequence-Object.
> In the case of BioJAVA 1.4 all Features and Annotations are written into
> the BioSQL-DB. In the case of BioJAVA-X there are no changes in the DB.
> Does BioJAVA-X include a method to update the BioSQL-DB, or how can I add
> the changes to the DB?
>
> Thanks,
> Christian
>

--
Christian Köberle
Max Planck Institute for Infection Biology
Department: Immunology
Schumannstr. 21/22
10117 Berlin
Tel: +49 30 28 460 562
e-mail: koeberle at mpiib-berlin.mpg.de

From mark.schreiber at novartis.com Thu Mar 16 21:50:34 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 17 Mar 2006 10:50:34 +0800
Subject: [Biojava-l] ProfileHMM Serialization Problem
Message-ID:

He did fix a number of problems, although possibly not all. Which version are you using? Can you send a stack trace?

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)
Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com
phone +65 6722 2973
fax +65 6722 2910

Todd Riley
03/17/2006 10:33 AM

To: Mark Schreiber/GP/Novartis at PH
cc: biojava-l-bounces at portal.open-bio.org, biojava-l at biojava.org
Subject: ProfileHMM Serialization Problem

Hello all,

I am having a problem with serialized ProfileHMM objects. I can read in one serialized ProfileHMM object, but never more than one (I can't even read in the same serialized object again). It appears that the problem lies with the AlphabetManager. Maybe a clash with alphabet names and/or indexes?

I looked in the archives and found the problem seemed to exist back in Oct of 2002. Has this ever been addressed?

Any help here would be greatly appreciated.

Thanks,
Todd

RE: [Biojava-l] Re: Biojava-l digest, Vol 1 #776 - 5 msgs
Schreiber, Mark
Tue, 08 Oct 2002 13:11:33 -0700

Yup,

It needs fixing; serialization and BioJava just don't seem to play that well :(

The question is what kind of API. The attractive part about serialization is that when it works you get back what you started with. You can also do RMI.

The downside of the XML model is you don't get back what you had before: you get back a MarkovModel, and all of your custom designed methods etc. are lost.

Two ways I can see to get around this. One: write a wrapper class that makes your custom model and the thing returned by the XMLMarkovModel look the same (look like the same interface generally).

The other option is to mimic something like JAXB (not JAXB though, as it won't cope well with BioJava flyweight symbols and alphabets). Somewhere the class name is stored in the XML and through the wonders of introspection things are returned to how they were. This generally requires the class to be designed as a valid bean, or at least point to a nice FactoryClass or something. Ultimately this would be good for all of BioJava.

I know people hate the idea of another XML format but I think that there really isn't one that represents what we are trying to do here. You could also write XSLT to transform into XML flavours that aren't as interested in gory details such as classnames etc. which are needed for serialization.

Just my $0.02

- Mark

> -----Original Message-----
> From: Matthew Pocock [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, 9 October 2002 7:08 a.m.
> To: Lachlan Coin; [EMAIL PROTECTED]
> Subject: Re: [Biojava-l] Re: Biojava-l digest, Vol 1 #776 - 5 msgs
>
> Hi,
>
> HMM serialization (or persistence) seems to be an
> ongoing problem for people. We (OK - I) wrote this
> code a long time ago, back in the dark ages when I
> didn't know much about programming. Does anyone want
> to fix this mess once and for all, and write a HMM
> persistence API? It sounds like that would be a really
> helpful thing to have.
>
> Matthew
>
> --- Lachlan Coin <[EMAIL PROTECTED]> wrote:
> > Hi
> >
> > Having made a mistake in serialising HMMs before -
> > are you writing your serialised object at several
> > points in the code? Unless you write all of the
> > models at the same point, they will not work when
> > you read them back in.
> >
> > Cheers,
> >
> > Lachlan
> >
> > > Message: 1
> > > Subject: RE: [Biojava-l] Create DP object from profileHMM class file
> > > Date: Tue, 8 Oct 2002 08:53:41 +1300
> > > From: "Schreiber, Mark" <[EMAIL PROTECTED]>
> > > To: "Tisanai" <[EMAIL PROTECTED]>, <[EMAIL PROTECTED]>
> > >
> > > Hi -
> > >
> > > The error is coming from the 64th line of your program
> > > (at T_Zscore.main(T_Zscore.java:64))
> > >
> > > I can see two places that the error might be coming from
> > > but I need to know which line is the 64th line of the program.
> > >
> > > Is it: ProfileHMM model = (ProfileHMM) ois_md.readObject();
> > >
> > > Or is it: dp[i] = DPFactory.DEFAULT.createDP(model);
> > >
> > > > -----Original Message-----
> > > > From: Tisanai [mailto:[EMAIL PROTECTED]]
> > > > Sent: Tuesday, 8 October 2002 2:40 a.m.
> > > > To: [EMAIL PROTECTED]
> > > > Subject: [Biojava-l] Create DP object from profileHMM class file
> > > >
> > > > Hi
> > > >
> > > > With this code I would like to create DP objects from several
> > > > phmm files.
> > > >
> > > > for(int i=0;i
> > > >     String model_out_name = md_out_lst.align[i];
> > > >     File md_file = new File(model_out_name);
> > > >
> > > >     FileInputStream fis_md = new FileInputStream(md_file);
> > > >     ObjectInputStream ois_md = new ObjectInputStream(fis_md);
> > > >     ProfileHMM model = (ProfileHMM) ois_md.readObject();
> > > >     ois_md.close();
> > > >     dp[i] = DPFactory.DEFAULT.createDP(model);
> > > > }
> > > >
> > > > I found that it always gets stuck at the second file (i=2). If there
> > > > is only one file in my list this code works fine. But if there is more
> > > > than one file in the list, then when it tries to create the second dp
> > > > object (dp[1]) this kind of error is shown:
> > > >
> > > > org.biojava.bio.BioError: State d-15 is known in states but is not listed in the transFrom table
> > > >     at org.biojava.bio.dp.SimpleMarkovModel.transitionsFrom(SimpleMarkovModel.java:227)
> > > >     at org.biojava.bio.dp.DP$HMMOrderByTransition.transitionsTo(DP.java:599)
> > > >     at org.biojava.bio.dp.DP$HMMOrderByTransition.compare(DP.java:586)
> > > >     at org.biojava.bio.dp.DP.stateList(DP.java:123)
> > > >     at org.biojava.bio.dp.DP.update(DP.java:353)
> > > >     at org.biojava.bio.dp.onehead.SingleDP.update(SingleDP.java:49)
> > > >     at org.biojava.bio.dp.DP.<init>(DP.java:377)
> > > >     at org.biojava.bio.dp.onehead.SingleDP.<init>(SingleDP.java:41)
> > > >     at org.biojava.bio.dp.DPFactory$DefaultFactory.createDP(DPFactory.java:53)
> > > >     at T_Zscore.main(T_Zscore.java:64)
> > > >
> > > > How can I fix my code?
> > > >
> > > > Thanks
> > > > Tisanai
> > > >
> > > =======================================================================
> > > Attention: The information contained in this message and/or attachments
> > > from AgResearch Limited is intended only for the persons or entities
> > > to which it is addressed and may contain confidential and/or privileged
> > > material. Any review, retransmission, dissemination or other use of, or
> > > taking of any action in reliance upon, this information by persons or
> > > entities other than the intended recipients is prohibited by AgResearch
> > > Limited. If you have received this message in error, please notify the
> > > sender immediately.
> > > =======================================================================

From mark.schreiber at novartis.com Thu Mar 16 21:52:52 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 17 Mar 2006 10:52:52 +0800
Subject: [Biojava-l] Away
Message-ID:

Hello -

I'm going to be travelling a lot in the next 5 weeks and may only have patchy access to email and no access to CVS or my development machines. Therefore I won't be able to offer much in the way of technical support.

Hopefully Richard and Michael will be able to deal with any major issues that crop up.

Best regards,

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)
Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com
phone +65 6722 2973
fax +65 6722 2910

From toddri at eden.rutgers.edu Thu Mar 16 21:33:05 2006
From: toddri at eden.rutgers.edu (Todd Riley)
Date: Thu, 16 Mar 2006 21:33:05 -0500
Subject: [Biojava-l] ProfileHMM Serialization Problem
In-Reply-To:
References:
Message-ID: <441A1FE1.9000508@eden.rutgers.edu>

Hello all,

I am having a problem with serialized ProfileHMM objects. I can read in one serialized ProfileHMM object, but never more than one (I can't even read in the same serialized object again). It appears that the problem lies with the AlphabetManager. Maybe a clash with alphabet names and/or indexes?

I looked in the archives and found the problem seemed to exist back in Oct of 2002. Has this ever been addressed?

Any help here would be greatly appreciated.

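
For illustration only: Lachlan's remark in the 2002 thread (write all of the models at the same point) is at least consistent with how plain Java serialization treats shared objects. The sketch below uses hypothetical stand-in classes (`Alpha`, `Model`), not BioJava code, and it is an assumption that this is what actually trips up ProfileHMM. Two objects written to one ObjectOutputStream keep their shared reference when read back; written to separate streams, each read produces its own copy, so identity checks on the "shared" object fail.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SharedReferenceDemo {

    /** Hypothetical stand-in for a shared object such as an alphabet. */
    static class Alpha implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Alpha(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for a model that refers to the shared object. */
    static class Model implements Serializable {
        private static final long serialVersionUID = 1L;
        final Alpha alpha;
        Model(Alpha alpha) { this.alpha = alpha; }
    }

    /** Serializes all given objects into ONE stream and returns its bytes. */
    static byte[] write(Object... objects) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (Object o : objects) {
                out.writeObject(o);
            }
        }
        return bytes.toByteArray();
    }

    /** Reads back 'count' objects from one serialized stream. */
    static Object[] read(byte[] data, int count) throws IOException, ClassNotFoundException {
        Object[] result = new Object[count];
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < count; i++) {
                result[i] = in.readObject();
            }
        }
        return result;
    }

    /** Two models written to one stream: do they still share one Alpha? */
    static boolean sharedAfterSingleStream() throws Exception {
        Alpha dna = new Alpha("DNA");
        Object[] back = read(write(new Model(dna), new Model(dna)), 2);
        return ((Model) back[0]).alpha == ((Model) back[1]).alpha;
    }

    /** Two models written to separate streams: do they still share one Alpha? */
    static boolean sharedAfterSeparateStreams() throws Exception {
        Alpha dna = new Alpha("DNA");
        Model a = (Model) read(write(new Model(dna)), 1)[0];
        Model b = (Model) read(write(new Model(dna)), 1)[0];
        return a.alpha == b.alpha;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("one stream, Alpha shared: " + sharedAfterSingleStream());
        System.out.println("separate streams, Alpha shared: " + sharedAfterSeparateStreams());
    }
}
```

If the real cause is of this kind, code that looks states or symbols up by identity would work for the first model read and fail for later ones, which would match the symptom described; a stack trace from the current release would be needed to confirm it.
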
Thanks,
Todd

From er.sukhdeepsingh at gmail.com Fri Mar 17 06:21:16 2006
From: er.sukhdeepsingh at gmail.com (Sukhdeep Singh)
Date: Fri, 17 Mar 2006 16:51:16 +0530
Subject: [Biojava-l] need help
Message-ID: <40fbb41e0603170321p572b04cdj20d8e84ae5fb3977@mail.gmail.com>

hello guys myself SUKHDEEP SINGH a 2ND YEAR student of AMBALA COLLEGE OF ENGINEERING & APPLIED RESEARCH. pals i am very much dedicated to bioinformatics and want to do something great in it.
i have also done basic & advanced courses in BIOINFORMATICS in my 15 day winter vacation. I have learned the functions of some software such as RASMOL, SWISSPDB, CN3D (V3.1), CLUSTAL-X, HYPERCAM (V7.5 student evaluation version). i am very much dedicated to it because i have a good knowledge of computers, as i have been operating them for about 4 years, but only moderate knowledge of bio. I am also familiar with databases like KEGG, NCBI, PUBMED, ENTREZ etc. so i want you to help me by telling me of any tutorial program for BIOJAVA or BIOPERL, or any institute giving training in bioinformatics or any other subject related to BIOINFORMATICS, for nearly 45 days in the months of july-august. so please friends just help me out with this.

REPLY to me at er.sukhdeepsingh at gmail.com

SUKHDEEP SINGH

From dag at sonsorol.org Tue Mar 21 12:55:11 2006
From: dag at sonsorol.org (Chris Dagdigian)
Date: Tue, 21 Mar 2006 12:55:11 -0500
Subject: [Biojava-l] Important OBF update for biojava developers and users
Message-ID:

Executive summary: biojava.org new DNS is propagating as I write this email. Eventually everyone should see the new wiki-based site running on the new OBF server hardware. Read on for more info on some other upcoming changes...

Hi biojava people,

Sorry for the interruption but I've got some important site and server news. People will also see multiple copies of this note as I slowly transition sites over one at a time.

We are in the midst of moving all of our websites, mailing lists, developers and sourcecode repositories onto more modern hardware located in a 2nd Boston area datacenter facility.

The transition is important for a couple of reasons - the most urgent being that we are going to lose internet connectivity in our current hosting facility on March 27th 2006. That datacenter belongs to Wyeth Research in Cambridge, Massachusetts.
Wyeth Research & Genetics Institute have been long time significant supporters & hosting providers for OBF servers and projects -- we owe them a great deal of gratitude and public acknowledgment for hosting our servers over many years. Speaking as a hardware geek I can tell you that the many years of high-bandwidth, trouble free hosting have been invaluable for our efforts and projects. Sadly, it is no longer possible for them to host our servers as they need to begin making some network and WAN circuit changes that will no longer support direct internet facing servers (such as ours) in Cambridge.

The other major reason for the transition is our need to relocate onto hardware that can better be remotely managed (as our volunteer administrators are scattered all over the globe).

My employer, BioTeam Inc. has donated new server hardware and is also providing the hosting facilities in a Tier 1 Boston area colocation facility. Infrastructure geeks can see pictures of the colocation cage and the new OBF servers online at this URL: http://bioteam.net/gallery/bioteamBDC -- those servers also host EMBOSS FTP/CVS and mailing lists.

Current status of the migration:

- All 57 mailing lists have been moved over to the new hardware (you may have noticed "lists.open-bio.org" showing up in your list messages)

- The new anonymous sourcecode server is running at http://code.open-bio.org. "cvs.biojava.org" is already pointing at it.

- Your website (biojava.org) was moved to the new hardware (and new Wiki site!) about an hour ago

- Developers with CVS accounts have *NOT* been migrated yet

Basically we are trying to relocate everything but the developers over the next few days so we can spend the weekend on the developer and CVS transition.

If DNS has not propagated yet, point your browser at http://biojava.open-bio.org -- that is the new site your group has been building.
What is happening now is DNS pointers for biojava.org and www.biojava.org are slowly changing over to point at the wiki and the new hardware. Eventually you'll see the same site regardless of which URL you use.

For biojava users
--------------------------

Please keep an eye on your website and mailing lists and let support at open-bio.org know if there are any problems with the transition.

In particular your new wiki site contains embedded links to some parts of the 'old' static website. I caught the obvious ones (biojava.org/downloads/ and biojava.org/docs/) but I may have missed some. Please let me know about any broken links. Also someone may want to clean up the biojava logo image now in the wiki to make the white background transparent.

For developers and leaders
---------------------------------------------

Whoever will be updating the static parts of the website (/downloads/ and /docs/) in the future will need login access to our new central webserver machine; please contact support at open-bio.org to request a user account for biojava website maintenance.

For people with CVS commit/write access
---------------------------------------------------------

Also note that when we finally do transition over to the new developer machine (where the real sourcecode lives), ALL developers will need to email support at open-bio.org to request a password reset. Although we can transition usernames, settings and home directories over from the old to the new machine, we cannot transition over existing passwords as they are stored in incompatible hashed formats. All developers are going to need new passwords for the new developer machine.

We will likely make the developer machine swap this weekend.

Reporting Problems / Help & Assistance
------------------------------------------------------

The transition will be complicated, we need your help to spot problems and glitches!
The OBF has a new helpdesk ticketing system set up at "support at open-bio.org" so that all OBF admins can read and respond to issues and problems. Most troubles should be reported to that address. For urgent problems, especially during this transition period, feel free to contact me directly (dag at sonsorol.org) (ichat/aol/aim screen name: bioteamdag).

Regards,
Chris Dagdigian
open-bio.org

From toddri at eden.rutgers.edu Thu Mar 23 16:59:23 2006
From: toddri at eden.rutgers.edu (Todd Riley)
Date: Thu, 23 Mar 2006 16:59:23 -0500
Subject: [Biojava-l] HMM's - Attempting some fancy stuff
In-Reply-To: <44119D31.6010703@mpiib-berlin.mpg.de>
References: <44119D31.6010703@mpiib-berlin.mpg.de>
Message-ID: <44231A3B.6030902@eden.rutgers.edu>

Hello,

After successfully implementing some TFBS search models using the ProfileHMM and DP classes, I am ready to attempt some fancier stuff that is going to require some serious coding. Before I begin, I thought that I might field some questions to the BioJava users/programmers that have some experience and/or interest in the BioJava HMM classes. I want to be sure to implement features in a fashion that will maximize usability in the simplest way....

Questions:

1. Many of the TFBS sites that I am modeling are palindromic or repetitive. I wish to associate transition and emission distributions (as prior knowledge) during training in order to enforce a palindromic and/or repetitive pattern and thus also greatly reduce the parameter space.

Example: A p53 TFBS is palindromic and repetitive. A 20 column Profile HMM can be greatly reduced to an HMM with the match-state topology of

    1 2 3 4 5 C(5) C(4) C(3) C(2) C(1) 1 2 3 4 5 C(5) C(4) C(3) C(2) C(1),

where C() means DNA complement. Notice that with this model, I now have only 5 match-state emissions as opposed to 20 to train. (C(n) is a complement view over distribution n).
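
A minimal sketch of the complement-view idea, for illustration only. It is written in plain Java with hypothetical names (`Base`, `Emission`, `SimpleEmission`, `ComplementView`, `halfSite`) and does not use BioJava's own Distribution interface, which is richer. The point it shows is that only the five owned distributions carry parameters; the complement columns just delegate to them, so training one updates both.

```java
import java.util.EnumMap;
import java.util.Map;

public class ComplementViewDemo {

    /** DNA bases with Watson-Crick complement. */
    enum Base {
        A, C, G, T;

        Base complement() {
            switch (this) {
                case A: return T;
                case C: return G;
                case G: return C;
                default: return A; // T
            }
        }
    }

    /** Hypothetical minimal emission distribution over DNA. */
    interface Emission {
        double weight(Base b);
    }

    /** A distribution that owns its parameters (these are what gets trained). */
    static class SimpleEmission implements Emission {
        private final Map<Base, Double> weights = new EnumMap<>(Base.class);

        SimpleEmission() {
            for (Base b : Base.values()) {
                weights.put(b, 0.25); // start uniform
            }
        }

        public double weight(Base b) { return weights.get(b); }

        void setWeight(Base b, double w) { weights.put(b, w); }
    }

    /** Read-only view: the weight of b is the parent's weight of complement(b). */
    static class ComplementView implements Emission {
        private final Emission parent;

        ComplementView(Emission parent) { this.parent = parent; }

        public double weight(Base b) { return parent.weight(b.complement()); }
    }

    /** Lays out columns 1..n followed by C(n)..C(1) over n owned distributions. */
    static Emission[] halfSite(SimpleEmission[] owned) {
        int n = owned.length;
        Emission[] columns = new Emission[2 * n];
        for (int i = 0; i < n; i++) {
            columns[i] = owned[i];
            columns[2 * n - 1 - i] = new ComplementView(owned[i]);
        }
        return columns;
    }

    public static void main(String[] args) {
        SimpleEmission[] owned = new SimpleEmission[5];
        for (int i = 0; i < owned.length; i++) {
            owned[i] = new SimpleEmission();
        }
        Emission[] columns = halfSite(owned);

        // Changing column 1 is immediately visible through C(1), the last column.
        owned[0].setWeight(Base.A, 0.7);
        System.out.println("column 1,  P(A) = " + columns[0].weight(Base.A));
        System.out.println("column 10, P(T) = " + columns[9].weight(Base.T));
    }
}
```

The 20-column layout in the example would be these ten columns repeated twice over the same five owned distributions. How such views would plug into the existing training code without changes to the DP classes is not addressed by this sketch.
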
There are also far fewer transition distributions to train if I impose that the transitions from a->b are the same as b->a or C(b)->C(a), but in the opposite direction.

I wish to implement this in a fashion that does not require any changes to the current Viterbi, forward, Baum Welch, etc, algorithms, or the DP class. I have already started writing classes that provide a view (or complement view) over an existing distribution. My plan is to use these views as a means to correlate emission and transition distributions from and between different columns in the Profile HMM.

Has anyone ever tried this or thought of trying this? Any ideas about how to implement this could be very useful.

2. I wish to use more complicated background models than just a 0-th order background distribution. I would like to use a Dirichlet mixture and/or higher order Markov models.

Has anyone looked into this? Any ideas as to how to implement this in the current release?

-Todd

From toddri at eden.rutgers.edu Thu Mar 23 18:04:45 2006
From: toddri at eden.rutgers.edu (Todd Riley)
Date: Thu, 23 Mar 2006 18:04:45 -0500
Subject: [Biojava-l] HMM's - Attempting some fancy stuff
In-Reply-To: <1143153837.13405.184.camel@elm.mcb.mcgill.ca>
References: <44119D31.6010703@mpiib-berlin.mpg.de> <44231A3B.6030902@eden.rutgers.edu> <1143153837.13405.184.camel@elm.mcb.mcgill.ca>
Message-ID: <4423298D.8000901@eden.rutgers.edu>

Yes, I agree that the palindromes are not always identical. However, often my unaligned training data is not complete enough to train the model well without some simplification. So far, I have been using Cross-validation, sensitivity, and specificity to determine the effectiveness of this simplification approach.

-Todd

Francois Pepin wrote:

>>1. Many of the TFBS sites that I am modeling are palindromic or
>>repetitive.
I wish to associate transition and emission distributions
>>(as prior knowledge) during training in order to enforce a palindromic
>>and/or repetitive pattern and thus also greatly reduce the parameter space.
>>
>
>Just as a note, we haven't found this to be ideal, if you have
>sufficient training data. It is often the case that one of the
>palindromes is more conserved than the other, and you would be treating
>them the same way.
>
>Of course, it depends how much of an in-depth study you'll want to be
>doing.
>
>Francois
>

From mark.schreiber at novartis.com Thu Mar 23 21:28:04 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 24 Mar 2006 10:28:04 +0800
Subject: [Biojava-l] HMM's - Attempting some fancy stuff
Message-ID:

I think you could do a palindrome as a push-down automaton or similar. Alternatively you could do something like a HMM with emission duration as in Borodovsky's GeneMarkHMM programs, but that would require a lot of new code for the DP library (good to have though).

To use a Dirichlet mixture as your background you could calculate one and give it to a Distribution, although it might be best to implement the Distribution interface with a class that generates one for you.

To go to higher order models you just need a higher order alphabet (http://biojava.org/wiki/BioJava:Cookbook:Alphabets:CrossProduct) and possibly use an OrderNDistribution for background and emission (http://biojava.org/wiki/BioJava:CookBook:Distribution:Custom)

- Mark

Todd Riley
Sent by: biojava-l-bounces at lists.open-bio.org
03/24/2006 07:04 AM

To: Francois Pepin
cc: biojava-l at biojava.org, Mark Schreiber/GP/Novartis at PH
Subject: Re: [Biojava-l] HMM's - Attempting some fancy stuff

Yes, I agree that the palindromes are not always identical. However, often my unaligned training data is not complete enough to train the model well without some simplification.
So far, I have been using Cross-validation, sensitivity, and specificity to determine the effectiveness of this simplification approach. -Todd Francois Pepin wrote: >>1. Many of the TFBS sites that I am modeling are palindromic or >>repetitive. I wish to associate transition and emission distributions >>(as prior knowledge) during training in order to enforce a palindromic >>and/or repetitive pattern and thus also greatly reduce the parameter space. >> >> > >Just as a note, we haven't found this to be ideal, if you have >sufficient training data. It is often the case that one of the >palindromes is more conserved than the other, and you would treating >them the same way. > >Of course, it depends how much of an in-depth study you'll want to be >doing. > >Francois > > > _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-l From jolyon.holdstock at ogt.co.uk Fri Mar 24 06:26:44 2006 From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock) Date: Fri, 24 Mar 2006 11:26:44 -0000 Subject: [Biojava-l] RichSequence annotations... 
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3FAFC0B7@EUCLID.internal.ogtip.com>

Hi,

I use the following code to extract all the genes from a sequence file; I load the sequence then filter out only CDS features; iterating through these lets me get the gene annotation for the feature

//===============================================================================
Sequence seq = null;
String fileName = "C:/Scripts/Java/BioJava/BioJavaX/biojava-live/demos/seq/AL121903.embl";
try {
    seq = SeqIOTools.readEmbl(new BufferedReader(new FileReader(fileName))).nextSequence();
}
catch (IOException IOE) {
    System.out.println("IOException " + IOE);
}
catch (BioException BIOE) {
    System.out.println("BioException " + BIOE);
}

//Create a feature filter for CDS features only
FeatureFilter ff = new FeatureFilter.ByType("CDS");

//Get the filtered Features
FeatureHolder fh = seq.filter(ff);

//Iterate over the Features in fh
for (Iterator i = fh.features(); i.hasNext(); ) {
    Feature f = (Feature)i.next();
    Annotation annotation = f.getAnnotation();
    Object key = "gene";
    hash.put(annotation.getProperty(key), f);
}
//===============================================================================

I am now using the new BioJavaX classes which I cannot get to work. Does anyone have any pointers for this?

I use the sequence data so have to use a RichSequence rather than a BioEntry

//===============================================================================
RichSequence richSeq = null;
String fileName = "C:/Scripts/Java/BioJava/BioJavaX/biojava-live/demos/seq/AL121903.embl";
try {
    richSeq = RichSequence.IOTools.readEMBLDNA(new BufferedReader(new FileReader(fileName)), null).nextRichSequence();
}
catch (IOException IOE) {
    System.out.println("IOException " + IOE);
}
catch (BioException BIOE) {
    System.out.println("BioException " + BIOE);
}

//Create a feature filter for CDS features only
FeatureFilter ff = new FeatureFilter.ByType("CDS");

//Get the filtered Features
FeatureHolder fh = richSeq.filter(ff);

//Iterate through the features
for (Iterator i = fh.features(); i.hasNext(); ) {
    RichFeature rf = (RichFeature) i.next();
    System.out.println("RichFeature: " + rf.toString());
    RichAnnotation ra = (RichAnnotation) rf.getAnnotation();
    System.out.println("RichAnnotation: " + ra.toString());
}
//===============================================================================

The output shows that CDS features have been filtered successfully and that the gene name is in the annotation

RichFeature: (#1) lcl:HSDJ155G6/AL121903.13:CDS,EMBL(biojavax:join:[<5642..5793,10804..10976,12496..12656,14136..14266])
RichAnnotation: [(#2) biojavax:clone_lib: RPCI-1"
14403..14532,16852..16987,17821..17959,18068..18122,
19456..19570,23623..23753,25885..26053,29102..29240,
32621..32738,33595..33771],[(#3) biojavax:codon_start: 1],[(#4) biojavax:evidence: NOT_EXPERIMENTAL],[(#5) biojavax:note: match: proteins: Tr:Q9Y6D5 Tr:O46382 Tr:Q9Y6D6],[(#6) biojavax:gene: dJ155G6.1],[(#7) biojavax:product: dJ155G6.1 (brefeldin A-inhibited guanine nucleotide-exchange protein 2)],[(#8) biojavax:protein_id: CAB86643.1]

If I add the following then I can see what keys are in the annotation

//===============================================================================
Set keySet = ra.keys();
for (Iterator it = keySet.iterator(); it.hasNext(); ) {
    String key = it.next().toString();
    System.out.println("Key: " + key);
}
//===============================================================================

The output shows that there is a gene

Key: biojavax:clone_lib
Key: biojavax:codon_start
Key: biojavax:evidence
Key: biojavax:gene
Key: biojavax:note
Key: biojavax:product
Key: biojavax:protein_id

My understanding is that I need to use a ComparableTerm to access the value but when I create it I get a NoSuchElementException error

ComparableTerm gene = RichObjectFactory.getDefaultOntology().getOrCreateTerm("gene");
System.out.println("Gene: " + ra.getProperty(gene));

java.util.NoSuchElementException: No such property: biojavax:gene, rank 0

cheers,

Jolyon

Jolyon Holdstock Ph.D.
Senior Computational Biologist,
Oxford Gene Technology (Ops) Ltd.
Begbroke Business and Science Park
Sandy Lane, Yarnton
Oxford, OX5 1PF
Tel: 01865 309699
Fax: 01865 842116

From richard.holland at ebi.ac.uk Fri Mar 24 08:16:49 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 24 Mar 2006 13:16:49 +0000
Subject: [Biojava-l] RichSequence annotations...
In-Reply-To: <588D0DD225D05746B5D8CAE1BE971F3FAFC0B7@EUCLID.internal.ogtip.com>
References: <588D0DD225D05746B5D8CAE1BE971F3FAFC0B7@EUCLID.internal.ogtip.com>
Message-ID: <1143206209.3899.84.camel@texas.ebi.ac.uk>

The terms are ranked in RichAnnotations. getProperty(term) searches for a Note with that term and a rank of zero. If you don't know the ranks, you need to use the public Note[] getProperties(Object key); method on the RichAnnotation object instead.
This will return a list of all matching Note objects with the given term regardless of rank.

cheers,
Richard

--
Richard Holland
European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-494416
---------------

From dag at sonsorol.org Sat Mar 25 18:50:57 2006
From: dag at sonsorol.org (Chris Dagdigian)
Date: Sat, 25 Mar 2006 18:50:57 -0500
Subject: [Biojava-l] Important news for developers on open-bio machines
Message-ID: <1BB8AE37-91CA-45C7-AA81-A12826D5F422@sonsorol.org>

Hi, apologies for the massive cross-post. I'll keep it short!

This message is a last-ditch attempt to contact people with developer accounts on pub.open-bio.org who may have not received the individual mails we've been sending via the obf-developers at lists.open-bio.org mailing list. We suspect that there are a number of devs out there for whom we don't have up to date email addresses.

All open-bio services have been migrated to new hardware and a new datacenter. Part of this migration process involved moving all developer accounts and all source-code repositories to a new server. The developer migration was completed a few minutes ago.

An unavoidable side effect of the move is that all developers are now locked out of their accounts until they contact us for a password reset.

If you are a developer and this news comes as a surprise to you, it means we don't have your contact info. Your best way to get up to speed on the history and technical details behind the migration is to point your browser here:

http://lists.open-bio.org/mailman/private/obf-developers/2006-March/thread.html

... and read the various messages we've posted this month. Included in the first message is the information on how to request an account reset.
Regards,
Chris Dagdigian
open-bio.org

From duze at gmx.de Tue Mar 28 01:44:38 2006
From: duze at gmx.de ("Andreas Dräger")
Date: Tue, 28 Mar 2006 08:44:38 +0200 (MEST)
Subject: [Biojava-l] (no subject)
Message-ID: <2493.1143528278@www086.gmx.net>

Hi,

I just tried the GA-Example from the BioJava Cookbook. Therefore I included all sources from the biojava-live directory from CVS. The following line seems to cause problems:

genAlg.run(new DemoStopping());

After execution one receives the following (error) message:

gen,average_fitness,best_fitness
0,49.98,67.0
Exception in thread "main" java.lang.Error: Unresolved compilation problem:
	Syntax error on token "assert", assert expected

	at org.biojava.utils.ChangeSupport.firePreChangeEvent(ChangeSupport.java:280)
	at org.biojava.bio.symbol.SimpleSymbolList.edit(SimpleSymbolList.java:339)
	at org.biojavax.ga.functions.SimpleCrossOverFunction.performCrossOver(SimpleCrossOverFunction.java:80)
	at org.biojavax.ga.impl.SimpleGeneticAlgorithm.run(SimpleGeneticAlgorithm.java:108)
	at GADemo.main(GADemo.java:91)

I do not know how to proceed, so I post this message to you.

Sincerely,
Andreas Dräger

From richard.holland at ebi.ac.uk Tue Mar 28 02:42:33 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Tue, 28 Mar 2006 08:42:33 +0100
Subject: [Biojava-l] (no subject)
In-Reply-To: <2493.1143528278@www086.gmx.net>
References: <2493.1143528278@www086.gmx.net>
Message-ID: <1143531753.3898.45.camel@texas.ebi.ac.uk>

Hi Andreas.

This sounds like a compiler version or flags problem. Could you check that you are running javac from a Java 1.4 or later installation? Also, see http://java.sun.com/j2se/1.4.2/docs/guide/lang/assert.html#compatibility (The Ant script uses the flag '-source 1.4' for everything).

Then try doing an 'ant clean' before your 'ant package-biojava' to make sure everything gets recompiled.

cheers,
Richard

--
Richard Holland
European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-494416
---------------

From andreas.draeger at clever-telefonieren.de Tue Mar 28 03:29:32 2006
From: andreas.draeger at clever-telefonieren.de (Andreas Dräger)
Date: Tue, 28 Mar 2006 10:29:32 +0200
Subject: [Biojava-l] GA-Package
Message-ID: <4428F3EC.9050507@clever-telefonieren.de>

Thanks,

Now it works fine!

Cheers,
Andreas

--
==================================
Andreas Dräger
PhD student
Eberhard Karls University Tübingen
Center for Bioinformatics (ZBIT)
Phone: +49-7071-29-70436
==================================

From andreas.draeger at clever-telefonieren.de Tue Mar 28 03:34:20 2006
From: andreas.draeger at clever-telefonieren.de (Andreas Dräger)
Date: Tue, 28 Mar 2006 10:34:20 +0200
Subject: [Biojava-l] GA-Package
Message-ID: <4428F50C.4070104@clever-telefonieren.de>

Thanks,

Now it works fine!

Cheers,
Andreas

--
==================================
Andreas Dräger
PhD student
Eberhard Karls University Tübingen
Center for Bioinformatics (ZBIT)
Phone: +49-7071-29-70436
==================================

From wendy.wong at gmail.com Thu Mar 30 10:41:47 2006
From: wendy.wong at gmail.com (wendy wong)
Date: Thu, 30 Mar 2006 16:41:47 +0100
Subject: [Biojava-l] unsupervised training of transition weights
Message-ID:

Hi,

I am trying to train my HMM using unsupervised training (I don't need to train the emission probabilities). I was wondering how I can do so in biojava. do I have to implement the TransitionTrainer interface?

my second question is: I implemented getWeightImpl in my custom distribution to set up my emission states and it works fine. but is it possible to get the program to access it only when there's a certain symbol in the observed sequence, (instead of precalculated)? and I also found that (although I might be wrong) the weights are calculated twice, once was when the distribution was created, and then when I call viterbi it calls getWeightImpl again. I am not sure what I did wrong here :(

any input would be very much appreciated! thank you!
wendy

From td2 at sanger.ac.uk Fri Mar 31 05:58:38 2006
From: td2 at sanger.ac.uk (Thomas Down)
Date: Fri, 31 Mar 2006 11:58:38 +0100
Subject: [Biojava-l] unsupervised training of transition weights
In-Reply-To:
References:
Message-ID: <5D3C5E2A-25E7-4516-B0E8-D1F57EAAFE1A@sanger.ac.uk>

On 30 Mar 2006, at 16:41, wendy wong wrote:

> Hi,
>
> I am trying to train my HMM using unsupervised training (I don't need
> to train the emission probabilities). I was wondering how I can do so
> in biojava. do I have to implement the TransitionTrainer interface?

The easiest way to do this is to use UntrainableDistributions for all the transition-sets that you don't want to be trained:

http://www.biojava.org/docs/api14/org/biojava/bio/dist/UntrainableDistribution.html

If UntrainableDistribution doesn't fit your requirements, the alternative is to create your own Distribution implementation with a registerTrainer method that creates a "dummy" (i.e. doesn't do anything) DistributionTrainer. UntrainableDistribution is just a subclass of SimpleDistribution which replaces the registerTrainer method with a non-functional version.

> my second question is:
> I implemented getWeightImpl in my custom distribution to set up my
> emission states and it works fine. but is it possible to get the
> program to access it only when there's a certain symbol in the observed
> sequence, (instead of precalculated)? and I also found that (although
> I might be wrong) the weights are calculated twice, once was when the
> distribution was created, and then when I call viterbi it calls
> getWeightImpl again. I am not sure what I did wrong here :(

The DP code does some caching of probabilities, I don't think there's any way to turn this off without modifying the DP implementations.

Thomas.
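
The "trainer that does nothing" idea in the reply above can be sketched in plain Java, without BioJava on the classpath. This is only an illustration of the pattern under simplifying assumptions: CountingDistribution, FixedDistribution and TrainingSketch are hypothetical names, not BioJava classes, and the train method stands in for BioJava's trainer registration by simply renormalising observed counts.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a trainable distribution; NOT a BioJava class.
class CountingDistribution {
    protected final Map<String, Double> weights = new LinkedHashMap<>();

    CountingDistribution(Map<String, Double> initial) {
        weights.putAll(initial);
    }

    double getWeight(String symbol) {
        return weights.getOrDefault(symbol, 0.0);
    }

    // Replace the weights with the normalised counts from one training pass.
    void train(Map<String, Double> counts) {
        double total = 0.0;
        for (double c : counts.values()) {
            total += c;
        }
        if (total <= 0.0) {
            return; // nothing observed, keep the current weights
        }
        weights.clear();
        for (Map.Entry<String, Double> e : counts.entrySet()) {
            weights.put(e.getKey(), e.getValue() / total);
        }
    }
}

// Same interface, but training is a no-op, so the weights stay fixed.
class FixedDistribution extends CountingDistribution {
    FixedDistribution(Map<String, Double> initial) {
        super(initial);
    }

    @Override
    void train(Map<String, Double> counts) {
        // deliberately empty: this distribution ignores training
    }
}

class TrainingSketch {
    public static void main(String[] args) {
        Map<String, Double> uniform = new LinkedHashMap<>();
        uniform.put("a", 0.5);
        uniform.put("b", 0.5);

        Map<String, Double> counts = new LinkedHashMap<>();
        counts.put("a", 3.0);
        counts.put("b", 1.0);

        CountingDistribution trainable = new CountingDistribution(uniform);
        CountingDistribution fixed = new FixedDistribution(uniform);

        // The training loop treats both alike; only one of them changes.
        trainable.train(counts);
        fixed.train(counts);

        System.out.println("trainable a = " + trainable.getWeight("a"));
        System.out.println("fixed a     = " + fixed.getWeight("a"));
    }
}
```

The appeal of handling this at the distribution level is that the training loop does not need to know which distributions are held fixed, so the surrounding algorithm code can stay as it is.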
From matthew.pocock at ncl.ac.uk Fri Mar 31 12:05:25 2006
From: matthew.pocock at ncl.ac.uk (Matthew Pocock)
Date: Fri, 31 Mar 2006 18:05:25 +0100
Subject: [Biojava-l] unsupervised training of transition weights
In-Reply-To: <5D3C5E2A-25E7-4516-B0E8-D1F57EAAFE1A@sanger.ac.uk>
References: <5D3C5E2A-25E7-4516-B0E8-D1F57EAAFE1A@sanger.ac.uk>
Message-ID: <200603311805.25861.matthew.pocock@ncl.ac.uk>

> The DP code does some caching of probabilities, I don't think there's
> any way to turn this off without modifying the DP implementations.
>
> Thomas.

My recollection is that if you did turn this off, the algorithm would run very, very much more slowly. Internally to the DP objects, the distribution probabilities (in fact, they aren't even probabilities by this stage) are stored in a data structure optimized for the type of lookups performed during the dynamic programming recursions.

Matthew

From emy_66 at hotmail.com Mon Mar 13 05:39:41 2006
From: emy_66 at hotmail.com (Emily Wong)
Date: Mon, 13 Mar 2006 05:39:41 +0000
Subject: [Biojava-l] Blast parser for ncbi blast version 2.2.13
Message-ID:

Hi,

Is there a parser that takes into account ncbi blast version 2.2.13 (on their website)? I am trying to use the code here to parse: http://www.biojava.org/docs/bj_in_anger/BlastParser.htm .

If I set the parser from strict to lazy I get these comments:

Exception in thread "main" java.lang.NullPointerException
	at org.biojava.bio.program.sax.BlastSAXParser.interpret(BlastSAXParser.java:215)
	at org.biojava.bio.program.sax.BlastSAXParser.parse(BlastSAXParser.java:164)
	at org.biojava.bio.program.sax.BlastLikeSAXParser.onNewDataSet(BlastLikeSAXParser.java:311)
	at org.biojava.bio.program.sax.BlastLikeSAXParser.interpret(BlastLikeSAXParser.java:274)
	at org.biojava.bio.program.sax.BlastLikeSAXParser.parse(BlastLikeSAXParser.java:160)
	at BlastParser.main(BlastParser.java:46)

Thanks,
Emily

From mark.schreiber at novartis.com Tue Mar 14 01:07:39 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 14 Mar 2006 09:07:39 +0800
Subject: [Biojava-l] Blast parser for ncbi blast version 2.2.13
Message-ID:

Possibly some variation in this output is causing the problem. Can you post some blast output that replicates this error?

Thanks

- Mark

From emy_66 at hotmail.com Tue Mar 14 02:30:55 2006
From: emy_66 at hotmail.com (Emily Wong)
Date: Tue, 14 Mar 2006 02:30:55 +0000
Subject: [Biojava-l] Blast parser for ncbi blast version 2.2.13
In-Reply-To:
Message-ID:

Hi,

Here is a truncated example of the blast output that causes the biojava blast parser to print out error messages.

Thanks,
Emily

BLASTN 2.2.13 [Nov-27-2005]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.

RID: 1142225124-9513-115032994966.BLASTQ1

Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
3,777,692 sequences; 16,788,289,139 total letters

Query= gi|27477090|ref|NM_000589.2| Homo sapiens interleukin 4 (IL4), transcript variant 1, mRNA
Length=921

Score E
Sequences producing significant alignments: (Bits) Value

gi|27477090|ref|NM_000589.2| Homo sapiens interleukin 4 (IL4), t 1649 0.0
gi|45708995|gb|BC067514.1| Homo sapiens interleukin 4, transc...
1045 0.0 gi|186334|gb|M13982.1|HUMIL4 Human interleukin 4 (IL-4) mRNA, co 1039 0.0 gi|45709847|gb|BC067515.1| Homo sapiens interleukin 4, transc... 1037 0.0 gi|47123366|gb|BC070123.1| Homo sapiens interleukin 4, transc... 1023 0.0 gi|42490870|gb|BC066277.1| Homo sapiens interleukin 4, transc... 1023 0.0 gi|42490899|gb|BC066278.1| Homo sapiens cDNA clone IMAGE:6971781 1021 0.0 gi|27477091|ref|NM_172348.1| Homo sapiens interleukin 4 (IL4), t 1003 0.0 gi|2811101|gb|AC004039.1| Homo sapiens chromosome 5, P1 clone... 1001 0.0 gi|14600279|gb|AF395008.1|AF395008 Homo sapiens interleukin 4 (I 1001 0.0 gi|1930572|gb|L81582.1|HSL81582 Homo sapiens (subclone 9_c5 f... 1001 0.0 gi|33831|emb|X06750.1|HSIL45 Human interleukin 4 gene 5'-region 1001 0.0 gi|186336|gb|M23442.1|HUMIL4A Human interleukin 4 (IL-4) gene, c 1001 0.0 gi|39980474|gb|AY480012.1| Pan troglodytes interleukin 13 (IL... 977 0.0 gi|37703741|gb|AY339646.1| Gorilla gorilla interleukin 4 (IL4... 977 0.0 gi|37703740|gb|AY339645.1| Pan paniscus interleukin 4 (IL4) g... 977 0.0 gi|37703739|gb|AY339644.1| Pan troglodytes interleukin 4 (IL4... 977 0.0 gi|62990254|gb|AC157216.2| Pan troglodytes BAC clone CH251-66... 977 0.0 gi|37703742|gb|AY339647.1| Pongo pygmaeus interleukin 4 (IL4)... 969 0.0 gi|61358435|gb|AY888625.1| Synthetic construct Homo sapiens c... 916 0.0 gi|61368530|gb|AY891269.1| Synthetic construct Homo sapiens c... 912 0.0 gi|60828588|gb|AY893811.1| Synthetic construct Homo sapiens c... 912 0.0 gi|60816922|gb|AY893365.1| Synthetic construct Homo sapiens c... 912 0.0 gi|58743320|ref|NM_001011714.1| Pan troglodytes interleukin 4 (I 908 0.0 gi|22858883|gb|AY130260.1| Pan troglodytes interleukin-4 (IL-4) 908 0.0 gi|37703743|gb|AY339648.1| Papio papio interleukin 4 (IL4) ge... 
904 0.0 gi|1841299|dbj|AB000515.1| Macaca fascicularis mRNA for IL-4 pre 797 0.0 gi|514383|gb|L26027.1|MACIN4A Macaca mulatta interleukin 4 mRNA, 797 0.0 gi|644793|gb|U19838.1|CTU19838 Cercocebus torquatus interleukin- 789 0.0 gi|29569760|gb|AY234221.1| Papio anubis interleukin-4 precursor 779 0.0 gi|74136370|ref|NM_001032904.1| Macaca mulatta interleukin-4 (LO 773 0.0 gi|37014164|gb|AY376144.1| Macaca mulatta interleukin-4 mRNA, co 773 0.0 gi|20452370|gb|AF465829.1| Homo sapiens interleukin 4-like (IL4) 684 0.0 gi|40804376|gb|AY486435.1| Cercocebus torquatus atys interleu... 642 0.0 gi|40804375|gb|AY486434.1| Macaca mulatta interleukin 4 (IL-4) g 634 1e-178 gi|555892|gb|U14131.1|BTIL4S1 Bos taurus interleukin 4 (IL4) gen 561 1e-156 gi|673418|emb|X81851.1|HSIL4SV H. sapiens IL-4 gene splice varia 559 6e-156 gi|58736974|dbj|AB102862.1| Homo sapiens IL4 mRNA for interle... 559 6e-156 gi|22858887|gb|AY130262.1| Macaca fascicularis interleukin-4 ... 559 6e-156 gi|19918909|gb|AY083267.1| Macaca mulatta IL-4 gene, promoter re 553 3e-154 gi|58743332|ref|NM_001008993.1| Pan troglodytes interleukin 4 (I 551 1e-153 gi|22858885|gb|AY130261.1| Pan troglodytes interleukin-4 delt... 
551 1e-153 gi|2905623|gb|AF043336.1|AF043336 Homo sapiens interleukin 4 del 551 1e-153 gi|19918910|gb|AY083268.1| Macaca radiata IL-4 gene, promoter re 545 8e-152 gi|4102679|gb|AF014509.1| Aotus nancymaae interleukin-4 (IL-4) m 545 8e-152 gi|6648935|gb|AF097321.1| Aotus lemurinus interleukin-4 (IL-4) m 545 8e-152 gi|6648933|gb|AF097320.1| Aotus nigriceps interleukin-4 (IL-4) m 545 8e-152 gi|4102669|gb|AF014504.1| Aotus vociferans interleukin-4 (IL-4) 537 2e-149 gi|25991896|gb|AF457197.1| Macaca mulatta interleukin 4 mRNA, pa 502 1e-138 gi|8575472|gb|AF235038.1|AF235038 Homo sapiens interleukin 4 (IL 486 7e-134 gi|18654097|gb|L81736.1|HUM11DC9S Homo sapiens (subclone 5_f7 fr 355 2e-94 gi|18654096|gb|L81735.1|HUM11DC92S Homo sapiens (subclone 8_e7 f 355 2e-94 gi|1930575|gb|L81579.1|HSL81579 Homo sapiens (subclone 5_f7 f... 355 2e-94 gi|1930574|gb|L81580.1|HSL81580 Homo sapiens (subclone 8_e7 f... 355 2e-94 gi|34419654|gb|AC107611.6| Rattus norvegicus 10 BAC CH230-195... 278 3e-71 gi|56481|emb|X53087.1|RNIL4E12 R.norvegicus gene for interleukin 270 7e-69 gi|545218|gb|S69238.1| IL4 {promoter} [mice, BALB/c, Genomic, 82 222 2e-54 gi|52678|emb|X05064.1|MMIL4G12 Mouse interleukin 4 gene exons 1 222 2e-54 gi|57157560|dbj|AB174764.1| Mus musculus molossinus IL-4 gene... 220 6e-54 gi|3687208|gb|AC005742.1| Mus musculus chromosome 11, BAC clo... 214 4e-52 gi|11038606|gb|AC084392.1| Mus musculus BAC clone GS1-182G5 from 214 4e-52 gi|545217|gb|S69237.1| IL4 {promoter} [mice, C57BL/6, Genomic, 8 214 4e-52 gi|21211994|emb|AL645741.15| Mouse DNA sequence from clone RP... 214 4e-52 gi|27960676|gb|AF463769.1| Mus musculus strain C57BL/6J cytok... 214 4e-52 gi|1930579|gb|L81578.1|HSL81578 Homo sapiens (subclone 2_b2 f... 
204 4e-49 gi|4996849|dbj|AB020732.1| Tursiops truncatus mRNA for interleuk 192 1e-45 gi|163212|gb|M84745.1|BOVIL4XX Bovine interleukin 4 (IL4) gene, 180 5e-42 gi|20530678|gb|AF493991.1| Sus scrofa interleukin-4 precursor (I 141 5e-30 gi|1997|emb|X68330.1|SSILK4 S.scrofa mRNA for interleukin-4 141 5e-30 gi|55742621|ref|NM_214123.1| Sus scrofa interleukin 4 (IL4), mRN 141 5e-30 gi|1730275|gb|U34273.1|CHU34273 Capra hircus interleukin-4 mRNA, 141 5e-30 gi|294220|gb|L12991.1|PIGIL4A Pig interleukin 4 mRNA, complete c 133 1e-27 gi|29603606|dbj|AB107648.1| Lama glama IL-4 mRNA for interleukin 131 4e-27 gi|32186777|gb|AY294020.1| Sus scrofa interleukin 4 mRNA, comple 125 3e-25 gi|57527819|ref|NM_001009313.2| Ovis aries interleukin 4 (IL4), 125 3e-25 gi|165950|gb|M96845.1|SHPIL4A Sheep interleukin 4 mRNA, complete 125 3e-25 gi|84794457|dbj|AB246673.1| Camelus bactrianus IL-4 mRNA for int 123 1e-24 gi|163891|gb|L07081.1|CEUINTERLU Cervus elaphus (clone SH3) inte 119 2e-23 gi|164233|gb|L06010.1|HRSIL4X Horse interleukin 4 mRNA, partial 97.6 6e-17 gi|50096517|gb|AY648947.1| Capra hircus interleukin-4 mRNA, comp 87.7 6e-14 gi|31416286|gb|AY293620.1| Bubalus bubalis interleukin 4 (IL-4) 83.8 9e-13 gi|84875352|dbj|AB246356.1| Bubalus bubalis x Bubalus caraban... 83.8 9e-13 gi|84871721|dbj|AB246355.1| Bubalus bubalis IL-4 mRNA for interl 83.8 9e-13 gi|84871699|dbj|AB246275.1| Bubalus carabanensis IL-4 mRNA for i 83.8 9e-13 gi|31343261|ref|NM_173921.2| Bos taurus interleukin 4 (IL4), mRN 83.8 9e-13 gi|163210|gb|M77120.1|BOVIL4X Bovine interleukin 4 (IL4) mRNA, c 83.8 9e-13 gi|46310147|gb|AY575607.1| Ovis aries interleukin 4 (IL-4) mR... 81.8 4e-12 gi|46310145|gb|AY575606.1| Ovis aries interleukin 4 (IL-4) mR... 81.8 4e-12 gi|21217734|gb|AY096800.1| Ovis aries interleukin-4 precursor, m 81.8 4e-12 gi|2654199|gb|AF035404.1| Equus caballus interleukin-4 precur... 
81.8 4e-12 gi|1277|emb|Z11897.1|OAIL4MRNA O.aries IL-4 mRNA for interleukin 81.8 4e-12 gi|5732983|gb|AF172168.1|AF172168 Ovis aries interleukin 4 mRNA, 81.8 4e-12 gi|10716183|gb|AF305617.1|AF305617 Equus caballus interleukin 4 81.8 4e-12 gi|60687486|gb|AY939910.1| Boselaphus tragocamelus interleukin-4 81.8 4e-12 gi|50978885|ref|NM_001003159.1| Canis familiaris interleukin 4 ( 75.8 2e-10 gi|7330263|gb|AF239917.1|AF239917 Canis familiaris interleukin-4 75.8 2e-10 gi|13346438|gb|AF346295.1|AF346295 Phocoena phocoena interleukin 75.8 2e-10 gi|6007792|gb|AF187322.1|AF187322 Canis familiaris interleukin-4 75.8 2e-10 gi|4185290|gb|AF083270.1|AF083270 Canis familiaris interleuki... 75.8 2e-10 gi|14029512|gb|AF333965.1| Marmota monax interleukin-4 mRNA, par 67.9 5e-08 ALIGNMENTS >gi|27477090|ref|NM_000589.2| Homo sapiens interleukin 4 (IL4), transcript >variant 1, mRNA Length=921 Score = 1649 bits (832), Expect = 0.0 Identities = 832/832 (100%), Gaps = 0/832 (0%) Strand=Plus/Plus Query 1 TTCTATGCAAAGCAAAAAGCCAGCAGCAGCCCCAAGCTGATAAGATTAATCTAAAGAGCA 60 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 1 TTCTATGCAAAGCAAAAAGCCAGCAGCAGCCCCAAGCTGATAAGATTAATCTAAAGAGCA 60 Query 61 AATTATGGTGTAATTTCCTATGCTGAAACTTTGTAGTTAATTTTTTAAAAAGGTTTCATT 120 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 61 AATTATGGTGTAATTTCCTATGCTGAAACTTTGTAGTTAATTTTTTAAAAAGGTTTCATT 120 Query 121 TTCCTATTGGTCTGATTTCACAGGAACATTTTACCTGTTTGTGAGGCATTTTTTCTCCTG 180 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 121 TTCCTATTGGTCTGATTTCACAGGAACATTTTACCTGTTTGTGAGGCATTTTTTCTCCTG 180 Query 181 GAAGAGAGGTGCTGATTGGCCCCAAGTGACTGACAATCTGGTGTAACGAAAATTTCCAAT 240 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 181 GAAGAGAGGTGCTGATTGGCCCCAAGTGACTGACAATCTGGTGTAACGAAAATTTCCAAT 240 Query 241 GTAAACTCATTTTCCCTCGGTTTCAGCAATTTTAAATCTATATATAGAGATATCTTTGTC 300 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 241 
GTAAACTCATTTTCCCTCGGTTTCAGCAATTTTAAATCTATATATAGAGATATCTTTGTC 300 Query 301 AGCATTGCATCGTTAGCTTCTCCTGATAAACTAATTGCCTCACATTGTCACTGCAAATCG 360 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 301 AGCATTGCATCGTTAGCTTCTCCTGATAAACTAATTGCCTCACATTGTCACTGCAAATCG 360 Query 361 ACACCTATTAATGGGTCTCACCTCCCAACTGCTTCCCCCTCTGTTCTTCCTGCTAGCATG 420 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 361 ACACCTATTAATGGGTCTCACCTCCCAACTGCTTCCCCCTCTGTTCTTCCTGCTAGCATG 420 Query 421 TGCCGGCAACTTTGTCCACGGACACAAGTGCGATATCACCTTACAGGAGATCATCAAAAC 480 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 421 TGCCGGCAACTTTGTCCACGGACACAAGTGCGATATCACCTTACAGGAGATCATCAAAAC 480 Query 481 TTTGAACAGCCTCACAGAGCAGAAGACTCTGTGCACCGAGTTGACCGTAACAGACATCTT 540 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 481 TTTGAACAGCCTCACAGAGCAGAAGACTCTGTGCACCGAGTTGACCGTAACAGACATCTT 540 Query 541 TGCTGCCTCCAAGAACACAACTGAGAAGGAAACCTTCTGCAGGGCTGCGACTGTGCTCCG 600 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 541 TGCTGCCTCCAAGAACACAACTGAGAAGGAAACCTTCTGCAGGGCTGCGACTGTGCTCCG 600 Query 601 GCAGTTCTACAGCCACCATGAGAAGGACACTCGCTGCCTGGGTGCGACTGCACAGCAGTT 660 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 601 GCAGTTCTACAGCCACCATGAGAAGGACACTCGCTGCCTGGGTGCGACTGCACAGCAGTT 660 Query 661 CCACAGGCACAAGCAGCTGATCCGATTCCTGAAACGGCTCGACAGGAACCTCTGGGGCCT 720 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 661 CCACAGGCACAAGCAGCTGATCCGATTCCTGAAACGGCTCGACAGGAACCTCTGGGGCCT 720 Query 721 GGCGGGCTTGAATTCCTGTCCTGTGAAGGAAGCCAACCAGAGTACGTTGGAAAACTTCTT 780 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 721 GGCGGGCTTGAATTCCTGTCCTGTGAAGGAAGCCAACCAGAGTACGTTGGAAAACTTCTT 780 Query 781 GGAAAGGCTAAAGACGATCATGAGAGAGAAATATTCAAAGTGTTCGAGCTGA 832 |||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 781 GGAAAGGCTAAAGACGATCATGAGAGAGAAATATTCAAAGTGTTCGAGCTGA 832 >gi|45708995|gb|BC067514.1| Homo sapiens interleukin 4, 
transcript variant >1, mRNA (cDNA clone MGC:79403 IMAGE:6971780), complete cds Length=528 Score = 1045 bits (527), Expect = 0.0 Identities = 527/527 (100%), Gaps = 0/527 (0%) Strand=Plus/Plus Query 306 TGCATCGTTAGCTTCTCCTGATAAACTAATTGCCTCACATTGTCACTGCAAATCGACACC 365 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 1 TGCATCGTTAGCTTCTCCTGATAAACTAATTGCCTCACATTGTCACTGCAAATCGACACC 60 Query 366 TATTAATGGGTCTCACCTCCCAACTGCTTCCCCCTCTGTTCTTCCTGCTAGCATGTGCCG 425 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 61 TATTAATGGGTCTCACCTCCCAACTGCTTCCCCCTCTGTTCTTCCTGCTAGCATGTGCCG 120 Query 426 GCAACTTTGTCCACGGACACAAGTGCGATATCACCTTACAGGAGATCATCAAAACTTTGA 485 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 121 GCAACTTTGTCCACGGACACAAGTGCGATATCACCTTACAGGAGATCATCAAAACTTTGA 180 Query 486 ACAGCCTCACAGAGCAGAAGACTCTGTGCACCGAGTTGACCGTAACAGACATCTTTGCTG 545 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 181 ACAGCCTCACAGAGCAGAAGACTCTGTGCACCGAGTTGACCGTAACAGACATCTTTGCTG 240 Query 546 CCTCCAAGAACACAACTGAGAAGGAAACCTTCTGCAGGGCTGCGACTGTGCTCCGGCAGT 605 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 241 CCTCCAAGAACACAACTGAGAAGGAAACCTTCTGCAGGGCTGCGACTGTGCTCCGGCAGT 300 Query 606 TCTACAGCCACCATGAGAAGGACACTCGCTGCCTGGGTGCGACTGCACAGCAGTTCCACA 665 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 301 TCTACAGCCACCATGAGAAGGACACTCGCTGCCTGGGTGCGACTGCACAGCAGTTCCACA 360 Query 666 GGCACAAGCAGCTGATCCGATTCCTGAAACGGCTCGACAGGAACCTCTGGGGCCTGGCGG 725 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 361 GGCACAAGCAGCTGATCCGATTCCTGAAACGGCTCGACAGGAACCTCTGGGGCCTGGCGG 420 Query 726 GCTTGAATTCCTGTCCTGTGAAGGAAGCCAACCAGAGTACGTTGGAAAACTTCTTGGAAA 785 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 421 GCTTGAATTCCTGTCCTGTGAAGGAAGCCAACCAGAGTACGTTGGAAAACTTCTTGGAAA 480 Query 786 GGCTAAAGACGATCATGAGAGAGAAATATTCAAAGTGTTCGAGCTGA 832 ||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 481 
GGCTAAAGACGATCATGAGAGAGAAATATTCAAAGTGTTCGAGCTGA 527 >gi|186334|gb|M13982.1|HUMIL4 Human interleukin 4 (IL-4) mRNA, complete cds Length=614 Score = 1039 bits (524), Expect = 0.0 Identities = 524/524 (100%), Gaps = 0/524 (0%) Strand=Plus/Plus Query 309 ATCGTTAGCTTCTCCTGATAAACTAATTGCCTCACATTGTCACTGCAAATCGACACCTAT 368 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 2 ATCGTTAGCTTCTCCTGATAAACTAATTGCCTCACATTGTCACTGCAAATCGACACCTAT 61 Query 369 TAATGGGTCTCACCTCCCAACTGCTTCCCCCTCTGTTCTTCCTGCTAGCATGTGCCGGCA 428 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 62 TAATGGGTCTCACCTCCCAACTGCTTCCCCCTCTGTTCTTCCTGCTAGCATGTGCCGGCA 121 Query 429 ACTTTGTCCACGGACACAAGTGCGATATCACCTTACAGGAGATCATCAAAACTTTGAACA 488 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 122 ACTTTGTCCACGGACACAAGTGCGATATCACCTTACAGGAGATCATCAAAACTTTGAACA 181 Query 489 GCCTCACAGAGCAGAAGACTCTGTGCACCGAGTTGACCGTAACAGACATCTTTGCTGCCT 548 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 182 GCCTCACAGAGCAGAAGACTCTGTGCACCGAGTTGACCGTAACAGACATCTTTGCTGCCT 241 Query 549 CCAAGAACACAACTGAGAAGGAAACCTTCTGCAGGGCTGCGACTGTGCTCCGGCAGTTCT 608 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 242 CCAAGAACACAACTGAGAAGGAAACCTTCTGCAGGGCTGCGACTGTGCTCCGGCAGTTCT 301 Query 609 ACAGCCACCATGAGAAGGACACTCGCTGCCTGGGTGCGACTGCACAGCAGTTCCACAGGC 668 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 302 ACAGCCACCATGAGAAGGACACTCGCTGCCTGGGTGCGACTGCACAGCAGTTCCACAGGC 361 Query 669 ACAAGCAGCTGATCCGATTCCTGAAACGGCTCGACAGGAACCTCTGGGGCCTGGCGGGCT 728 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 362 ACAAGCAGCTGATCCGATTCCTGAAACGGCTCGACAGGAACCTCTGGGGCCTGGCGGGCT 421 Query 729 TGAATTCCTGTCCTGTGAAGGAAGCCAACCAGAGTACGTTGGAAAACTTCTTGGAAAGGC 788 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 422 TGAATTCCTGTCCTGTGAAGGAAGCCAACCAGAGTACGTTGGAAAACTTCTTGGAAAGGC 481 Query 789 TAAAGACGATCATGAGAGAGAAATATTCAAAGTGTTCGAGCTGA 832 
|||||||||||||||||||||||||||||||||||||||||||| Sbjct 482 TAAAGACGATCATGAGAGAGAAATATTCAAAGTGTTCGAGCTGA 525 >From: mark.schreiber at novartis.com >To: "Emily Wong" >CC: biojava-l at lists.open-bio.org, biojava-l-bounces at lists.open-bio.org >Subject: Re: [Biojava-l] Blast parser for ncbi blast version 2.2.13 >Date: Tue, 14 Mar 2006 09:07:39 +0800 > >Possibly some variation in this output is causing the problem > >Can you post some blast output that replicates this error? > >Thanks > >- Mark > > > > > >"Emily Wong" >Sent by: biojava-l-bounces at lists.open-bio.org >03/13/2006 01:39 PM > > > To: biojava-l at lists.open-bio.org > cc: (bcc: Mark Schreiber/GP/Novartis) > Subject: [Biojava-l] Blast parser for ncbi blast version >2.2.13 > > >Hi, > >Is there a parser that takes into account ncbi blast version 2.2.13(on >their >website)? I am trying to use the code here to parse : >http://www.biojava.org/docs/bj_in_anger/BlastParser.htm . If I set the >parser from strict to lazy I get these comments : >Exception in thread "main" java.lang.NullPointerException > at >org.biojava.bio.program.sax.BlastSAXParser.interpret(BlastSAXParser.java:215) > at >org.biojava.bio.program.sax.BlastSAXParser.parse(BlastSAXParser.java:164) > at >org.biojava.bio.program.sax.BlastLikeSAXParser.onNewDataSet(BlastLikeSAXParser.java:311) > at >org.biojava.bio.program.sax.BlastLikeSAXParser.interpret(BlastLikeSAXParser.java:274) > at >org.biojava.bio.program.sax.BlastLikeSAXParser.parse(BlastLikeSAXParser.java:160) > at BlastParser.main(BlastParser.java:46) > >Thanks, > >Emily > > >_______________________________________________ >Biojava-l mailing list - Biojava-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/biojava-l > > > From christoph.gille at charite.de Tue Mar 14 07:47:10 2006 From: christoph.gille at charite.de (Dr. 
Christoph Gille)
Date: Tue, 14 Mar 2006 08:47:10 +0100 (CET)
Subject: [Biojava-l] alignment algor in Biojava
In-Reply-To:
References:
Message-ID: <64617.84.190.58.246.1142322430.squirrel@webmail.charite.de>

> Hi,
>
> Can somebody tell me what algorithms Biojava uses to make local alignments
> and multiple alignments? I'm searching for it in the documentation but I have
> not found it.

at the bottom of the page
http://www.biojava.org/docs/bj_in_anger/index.htm

http://www.charite.de/bioinf/strap/Scripting.html#SequenceAligner

Cheers
Christoph

From koeberle at mpiib-berlin.mpg.de  Tue Mar 14 12:28:37 2006
From: koeberle at mpiib-berlin.mpg.de (Christian Köberle)
Date: Tue, 14 Mar 2006 13:28:37 +0100
Subject: [Biojava-l] Feature + BioJAVA-X + BioSQL ?
Message-ID: <4416B6F5.1050905@mpiib-berlin.mpg.de>

Hi,
I am trying to write a Sequence object into a BioSQL DB using the classes of BioJAVA-X. This works well. But if I try to save a Sequence object with two (or more) Features, and both Features have equal Types and equal Sources, writing to the DB fails.
Is it wrong to have more than one Feature with the same type and source on one Sequence? Or is this a bug in BioJAVA / BioJAVA-X or BioSQL?

Thanks,
Christian

The error message:
org.hibernate.StaleStateException: Batch update returned unexpected row count from update: 0 actual row count: 0 expected: 1
    at org.hibernate.jdbc.BatchingBatcher.checkRowCount(BatchingBatcher.java:93)
    at org.hibernate.jdbc.BatchingBatcher.checkRowCounts(BatchingBatcher.java:79)
    at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:58)
    at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:195)
    at org.hibernate.jdbc.AbstractBatcher.prepareStatement(AbstractBatcher.java:91)
    at org.hibernate.jdbc.AbstractBatcher.prepareStatement(AbstractBatcher.java:86)
    at org.hibernate.jdbc.AbstractBatcher.prepareBatchStatement(AbstractBatcher.java:171)
    at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2048)
    at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2427)
    at org.hibernate.action.EntityInsertAction.execute(EntityInsertAction.java:51)
    at org.hibernate.engine.ActionQueue.execute(ActionQueue.java:243)
    at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:227)
    at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:140)
    at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:296)
    at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27)
    at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1009)
    at org.hibernate.impl.SessionImpl.managedFlush(SessionImpl.java:356)
    at org.hibernate.transaction.JDBCTransaction.commit(JDBCTransaction.java:106)

--
Christian Köberle
Max Planck Institute for Infection Biology
Department: Immunology
Schumannstr.
21/22
10117 Berlin
Tel: +49 30 28 460 562
e-mail: koeberle at mpiib-berlin.mpg.de

From koeberle at mpiib-berlin.mpg.de  Tue Mar 14 14:06:43 2006
From: koeberle at mpiib-berlin.mpg.de (Christian Köberle)
Date: Tue, 14 Mar 2006 15:06:43 +0100
Subject: [Biojava-l] BioJAVA-X + BioSQL + no update
Message-ID: <4416CDF3.4000407@mpiib-berlin.mpg.de>

Hi,
I have the following problem.
I put a RichSequence object into a BioSQL DB, using the new classes from BioJAVA-X.
Later I get this Sequence object from the BioSQL DB (also with BioJAVA-X), create new Feature objects and Note objects, and add these to the Sequence object.
With BioJAVA 1.4 all Features and Annotations are written into the BioSQL DB. With BioJAVA-X there are no changes in the DB.
Does BioJAVA-X include a method to update the BioSQL DB, or how else can I add the changes to the DB?

Thanks,
Christian

--
Christian Köberle
Max Planck Institute for Infection Biology
Department: Immunology
Schumannstr. 21/22
10117 Berlin
Tel: +49 30 28 460 562
e-mail: koeberle at mpiib-berlin.mpg.de

From bubba.puryear at gmail.com  Mon Mar 13 17:27:11 2006
From: bubba.puryear at gmail.com (Bubba Puryear)
Date: Mon, 13 Mar 2006 12:27:11 -0500
Subject: [Biojava-l] biojavax GenbankFormat and legacy genbank records
Message-ID:

Hello,

I work on a webapp for a biotech company that uses biojava to parse plasmid and feature maps (genbank flatfile format) and we store them in a local database. I've wanted to update the version of biojava we use because the current CVS parser handles features that cross the origin on plasmid maps much better than the parser in 1.4.

However, we have a lot of data in various databases that have genbank records formatted in some of the older incarnations of the GFF. In particular, some feature maps don't have ACCESSION fields, and/or are missing modification dates and genbank divisions on the LOCUS line. When I try to parse one of those maps with biojavax, I get parse errors.
Should there perhaps be a LegacyGenbankFormat or should the GenbankFormat class be made more tolerant? I know NCBI made several changes to their flatfile format in part because writing parsers for the older specs was tricky. So I'm not sure which direction the bio* folks would like to go with this.

I've attached a small example map that causes parse problems. The data in the map is completely bogus, but the structure was taken from a real map file I have to deal with. The following code snippet illustrates my problems:

    BufferedReader br = new BufferedReader(new StringReader(genbankContent));
    try {
        RichSequenceIterator sequences = IOTools.readGenbankDNA(br, null);
        if (sequences.hasNext()) {
            this.sequence = sequences.nextRichSequence();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }

where genbankContent is a String containing the contents of the attached file.

Thanks much,
Bubba Puryear
-------------- next part --------------
A non-text attachment was scrubbed...
Name: foo.gb
Type: chemical/seq-na-genbank
Size: 1091 bytes
Desc: not available
URL:

From mira.edelstein at gmx.net  Tue Mar 14 22:30:01 2006
From: mira.edelstein at gmx.net (Mira)
Date: Tue, 14 Mar 2006 23:30:01 +0100
Subject: [Biojava-l] (no subject)
Message-ID: <001501c647b6$d5954f70$9b7ba8c0@mecom>

please take me from the mailing list
thanks
mira

From mark.schreiber at novartis.com  Wed Mar 15 06:42:59 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Wed, 15 Mar 2006 14:42:59 +0800
Subject: [Biojava-l] Feature + BioJAVA-X + BioSQL ?
Message-ID:

This could be a bug, this is bleeding edge development code. Are you using the most up to date CVS code? Also which database are you using?

As a suggestion RichFeatures with the same Type, Source and Parent sequence can only be distinguished by rank (In BioSQL and BioJavaX). Can you persist them to the DB if you give one a different rank?
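
A minimal sketch of why the rank matters, written in plain Java rather than against BioJavaX. It assumes, as described above, that the identity of a stored feature is the combination of parent sequence, type, source and rank. FeatureKey and FeatureRankSketch are hypothetical names invented for this illustration; they are not BioJavaX or BioSQL classes, and the integer parent id is likewise made up.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for the identity a feature row would have in the database.
// Not a BioJavaX class; all names here are assumptions for illustration only.
final class FeatureKey {
    final int parentId;   // id of the parent sequence (made-up surrogate)
    final String type;    // feature type term, e.g. "CDS"
    final String source;  // feature source term, e.g. "GenBank"
    final int rank;       // the only field left to tell otherwise equal features apart

    FeatureKey(int parentId, String type, String source, int rank) {
        this.parentId = parentId;
        this.type = type;
        this.source = source;
        this.rank = rank;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FeatureKey)) return false;
        FeatureKey k = (FeatureKey) o;
        return parentId == k.parentId && rank == k.rank
                && type.equals(k.type) && source.equals(k.source);
    }

    @Override
    public int hashCode() {
        return Objects.hash(parentId, type, source, rank);
    }
}

final class FeatureRankSketch {
    // Returns true only if no two keys are equal, i.e. no two features would collide.
    static boolean allDistinct(FeatureKey... keys) {
        Set<FeatureKey> seen = new HashSet<>();
        for (FeatureKey k : keys) {
            if (!seen.add(k)) {
                return false; // same parent, type, source and rank: a clash
            }
        }
        return true;
    }
}
```

Two features on the same sequence with the same type, source and rank collide in this model; giving one of them a different rank makes the keys distinct, which is the experiment suggested above.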
- Mark Christian K?berle Sent by: biojava-l-bounces at lists.open-bio.org 03/14/2006 08:28 PM To: bio java mailing list cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-l] Feature + BioJAVA-X + BioSQL ? Hi, I try to write a Sequence-Object into BioSQL-DB using the Classes of BioJAVA-X. This works well. But if I try to save a Sequence-Object with two (or more) Features and both Feature have equal Types and equal Sources, writing in DB fails. Is the idea wrong to have more than one Feature with same type and source at one Sequence. Or is this a bug of BioJAVA / BioJAVA-X or BioSQL. Thanks, Christian The Errormessage: org.hibernate.StaleStateException: Batch update returned unexpected row count from update: 0 actual row count: 0 expected: 1 at org.hibernate.jdbc.BatchingBatcher.checkRowCount(BatchingBatcher.java:93) at org.hibernate.jdbc.BatchingBatcher.checkRowCounts(BatchingBatcher.java:79) at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:58) at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:195) at org.hibernate.jdbc.AbstractBatcher.prepareStatement(AbstractBatcher.java:91) at org.hibernate.jdbc.AbstractBatcher.prepareStatement(AbstractBatcher.java:86) at org.hibernate.jdbc.AbstractBatcher.prepareBatchStatement(AbstractBatcher.java:171) at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2048) at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2427) at org.hibernate.action.EntityInsertAction.execute(EntityInsertAction.java:51) at org.hibernate.engine.ActionQueue.execute(ActionQueue.java:243) at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:227) at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:140) at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:296) at 
org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27) at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1009) at org.hibernate.impl.SessionImpl.managedFlush(SessionImpl.java:356) at org.hibernate.transaction.JDBCTransaction.commit(JDBCTransaction.java:106) -- Christian K?berle Max Planck Institute for Infection Biology Department: Immunology Schumannstr. 21/22 10117 Berlin Tel: +49 30 28 460 562 e-mail: koeberle at mpiib-berlin.mpg.de _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-l From mark.schreiber at novartis.com Wed Mar 15 07:02:02 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Wed, 15 Mar 2006 15:02:02 +0800 Subject: [Biojava-l] BioJAVA-X + BioSQL + no update Message-ID: With BioJavaX if you want any changes to a RichSequence object to persist to the database you need to "save or add it" with Hibernate. 

    SessionFactory sessionFactory = new Configuration().configure().buildSessionFactory();
    Session session = sessionFactory.openSession();
    RichObjectFactory.connectToBioSQL(session);

    RichSequence rs = ...; // some sequence you've made or modified
    session.saveOrUpdate("Sequence",rs); // persist the sequence

***
Another way is to do everything inside a transaction (this example is from the BioJavaX docbook in CVS)

    SessionFactory sessionFactory = new Configuration().configure().buildSessionFactory();
    Session session = sessionFactory.openSession();
    RichObjectFactory.connectToBioSQL(session);

    Transaction tx = session.beginTransaction();
    try {

        // print out all the namespaces in the database

        Query q = session.createQuery("from Namespace");
        List namespaces = q.list(); // retrieve all the namespaces from the db
        for (Iterator i = namespaces.iterator(); i.hasNext(); ) {
            Namespace ns = (Namespace)i.next();
            System.out.println(ns.getName()); // print out the name of the namespace

            // print out all the sequences in the namespace
            Query sq = session.createQuery("from BioEntry where namespace= :nsp");
            // set the named parameter "nsp" to ns
            sq.setParameter("nsp",ns);
            List sequences = sq.list();

            for (Iterator j = sequences.iterator(); j.hasNext(); ) {
                BioEntry be = (BioEntry)j.next(); // RichSequences are BioEntrys too
                System.out.println(" "+be.getName()); // print out the name of the sequence

                // if the sequence is called bloggs, change its description to XYZ

                if (be.getName().equals("bloggs")) {
                    be.setDescription("XYZ");
                }
            }

        }

        // commit and tidy up
        tx.commit();
        System.out.println("Changes committed.");

        // all sequences called bloggs now have a description of "XYZ" in the database

    } catch (Exception e) {
        tx.rollback();
        System.out.println("Changes rolled back.");
        e.printStackTrace();
    }

    session.close();


Christian Köberle
Sent by: biojava-l-bounces at lists.open-bio.org
03/14/2006 10:06 PM

        To:     bio java mailing list
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [Biojava-l] BioJAVA-X + BioSQL +
no update

Hi,
I have the following problem.
I put a RichSequence object into a BioSQL DB, using the new classes from BioJAVA-X.
Later I get this Sequence object from the BioSQL DB (also with BioJAVA-X), create new Feature objects and Note objects, and add these to the Sequence object.
With BioJAVA 1.4 all Features and Annotations are written into the BioSQL DB. With BioJAVA-X there are no changes in the DB.
Does BioJAVA-X include a method to update the BioSQL DB, or how else can I add the changes to the DB?

Thanks,
Christian

--
Christian Köberle
Max Planck Institute for Infection Biology
Department: Immunology
Schumannstr. 21/22
10117 Berlin
Tel: +49 30 28 460 562
e-mail: koeberle at mpiib-berlin.mpg.de
_______________________________________________
Biojava-l mailing list - Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l

From mark.schreiber at novartis.com  Wed Mar 15 07:11:55 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Wed, 15 Mar 2006 15:11:55 +0800
Subject: [Biojava-l] biojavax GenbankFormat and legacy genbank records
Message-ID:

Hi -

I'm happy for the regexps in GenbankFormat and EMBLFormat etc to be relaxed a little as long as the parsing of fully valid genbank files doesn't suffer. If someone wants to test this thoroughly it would be a great benefit to the whole community.

In some cases it may not be possible. For example, if a feature doesn't have sufficient information to build a proper RichFeature object I don't think we should allow the file.

It might be good to make a collection in CVS of example files that are known to have broken the parser in the past (the files folder in the test suite would be an ideal place).
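
One way the relaxed matching could look, as a rough sketch only. The pattern below is not the regexp used in GenbankFormat; LenientLocusSketch, the order of the returned fields and the sample LOCUS lines in the usage note are all assumptions made up for illustration. It treats molecule type, topology, division and date as optional, so a legacy record still matches while a complete LOCUS line yields every field.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of BioJavaX.
final class LenientLocusSketch {

    // Name and length are required; everything after "bp" is optional.
    private static final Pattern LOCUS = Pattern.compile(
            "^LOCUS\\s+(\\S+)\\s+(\\d+)\\s+bp"        // 1: name, 2: length
            + "(?:\\s+(\\S*[DR]NA))?"                 // 3: molecule type, e.g. DNA, mRNA, ss-RNA
            + "(?:\\s+(linear|circular))?"            // 4: topology
            + "(?:\\s+([A-Z]{3}))?"                   // 5: division, e.g. PLN
            + "(?:\\s+(\\d{2}-[A-Z]{3}-\\d{4}))?"     // 6: modification date
            + "\\s*$");

    // Returns {name, length, molecule, topology, division, date};
    // missing optional fields are null. Returns null if the line is not a LOCUS line.
    static String[] parse(String line) {
        Matcher m = LOCUS.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[6];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }
}
```

For example, parse("LOCUS  pFOO  4361 bp  DNA  circular") leaves the division and date slots null instead of failing, so the caller can decide whether the remaining information is enough to build the record. Fully valid lines are unaffected, which is the condition set out above; a collection of known-bad files would be the natural regression input for a check like this.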
- Mark Mark Schreiber Research Investigator (Bioinformatics) Novartis Institute for Tropical Diseases (NITD) 10 Biopolis Road #05-01 Chromos Singapore 138670 www.nitd.novartis.com phone +65 6722 2973 fax +65 6722 2910 "Bubba Puryear" Sent by: biojava-l-bounces at lists.open-bio.org 03/14/2006 01:27 AM To: biojava-l at lists.open-bio.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-l] biojavax GenbankFormat and legacy genbank records Hello, I work on a webapp for a biotech company that uses biojava to parse plasmid and feature maps (genbank flatfile format) and we store them in a local database. I've wanted to update the version of biojava we use because the current CVS parser handles features that cross the origin on plasmid maps much better than the parser in 1.4. However, we have a lot of data in various databases that have genbank records formatted in some of the older incarnations of the GFF. In particular, some feature maps don't have ACCESSION fields, and/or are missing modification dates and genbank divisions on the LOCUS line. When I try to parse one of those maps with biojavax, I get parse errors. Should there perhaps be a LegacyGenbankFormat or should the GenbankFormat class be made more tolerant? I know NCBI made several changes to their flatfile format in part because writing parsers for the older specs was tricky. So I'm not sure which direction the bio* folks would like to go with this. I've attached a small example map that causes parse problems. The data in the map is completely bogus, but the structure was taken from a real map file I have to deal with. 
The following code snippet illustrates my problems:

    BufferedReader br = new BufferedReader(new StringReader(genbankContent));
    try {
        RichSequenceIterator sequences = IOTools.readGenbankDNA(br, null);
        if (sequences.hasNext()) {
            this.sequence = sequences.nextRichSequence();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }

where genbankContent is a String containing the contents of the attached file.

Thanks much,
Bubba Puryear
_______________________________________________
Biojava-l mailing list - Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l

[ Attachment ''FOO.GB'' removed by Mark Schreiber ]

From koeberle at mpiib-berlin.mpg.de  Thu Mar 16 10:03:26 2006
From: koeberle at mpiib-berlin.mpg.de (Christian Köberle)
Date: Thu, 16 Mar 2006 11:03:26 +0100
Subject: [Biojava-l] BioJAVA-X + BioSQL + no update
In-Reply-To:
References:
Message-ID: <441937EE.6000204@mpiib-berlin.mpg.de>

Hi Mark,

it works but the code has to look like that:

    ...
    session.getTransaction().begin();
    session.saveOrUpdate("Sequence",seq);
    session.getTransaction().commit();

it also works with:

    session.update("Sequence",seq);

Thanks,
Christian

mark.schreiber at novartis.com wrote:
> With BioJavaX if you want any changes to a RichSequence object to persist
> to the database you need to "save or add it" with Hibernate.
> > > SessionFactory sessionFactory = new > Configuration().configure().buildSessionFactory(); > Session session = sessionFactory.openSession(); > RichObjectFactory.connectToBioSQL(session); > > RichSequence rs = ...; // some sequence you've made or > modified > session.saveOrUpdate("Sequence",rs); // persist the sequence > > *** > Another way is to do everything inside a transaction (this example is from > the BioJavaX docbook in CVS) > > SessionFactory sessionFactory = new > Configuration().configure().buildSessionFactory(); > Session session = sessionFactory.openSession(); > RichObjectFactory.connectToBioSQL(session); > > Transaction tx = session.beginTransaction(); > try { > > // print out all the namespaces in the database > > Query q = session.createQuery("from Namespace"); > List namespaces = q.list(); // retrieve all the > namespaces from the db > for (Iterator i = namespaces.iterator(); i.hasNext(); ) { > Namespace ns = (Namespace)i.next(); > System.out.println(ns.getName()); // print out the name of the > namespace > > // print out all the sequences in the namespace > Query sq = session.createQuery("from BioEntry where namespace= > :nsp"); > // set the named parameter "nsp" to ns > sq.setParameter("nsp",ns); > List sequences = sq.list(); > > for (Iterator j = sequences.iterator(); j.hasNext(); ) { > BioEntry be = (BioEntry)j.next(); // RichSequences are > BioEntrys too > System.out.println(" "+be.getName()); // print out the name > of the sequence > > // if the sequence is called bloggs, change its description to > XYZ > > if (be.getName().equals("bloggs")) { > be.setDescription("XYZ"); > } > } > > } > > // commit and tidy up > tx.commit(); > System.out.println("Changes committed."); > > // all sequences called bloggs now have a description of "XYZ" in the > database > > } catch (Exception e) { > tx.rollback(); > System.out.println("Changes rolled back."); > e.printStackTrace(); > } > > session.close(); > > > > > > > Christian K?berle > Sent by: 
biojava-l-bounces at lists.open-bio.org > 03/14/2006 10:06 PM > > > To: bio java mailing list > cc: (bcc: Mark Schreiber/GP/Novartis) > Subject: [Biojava-l] BioJAVA-X + BioSQL + no update > > > Hi, > I have following problem. > I put a RichSequence-Object into a BioSQL-DB, using the new Classes from > BioJAVA-X. > Later I get these Sequence-Object from the BioSQL-DB (also with > BioJAVA-X) and create new Faeture-Objects and Note-Objects and add > these to the Sequence-Object. > In the case of BioJAVA 1.4 all Features and Annotations are written into > the BioSQL-DB. In case of BioJAVA-X there are no changes ind the DB. > Includes BioJAVA-X a method to update the BioSQL-DB or how can I add the > changes into the DB. > > Thanks, > Christian > > -- Christian K?berle Max Planck Institute for Infection Biology Department: Immunology Schumannstr. 21/22 10117 Berlin Tel: +49 30 28 460 562 e-mail: koeberle at mpiib-berlin.mpg.de From mark.schreiber at novartis.com Fri Mar 17 02:50:34 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Fri, 17 Mar 2006 10:50:34 +0800 Subject: [Biojava-l] ProfileHMM Serialization Problem Message-ID: He did fix a number of problems, although possibly not all, Which version are you using? Can you send a stack trace? - Mark Mark Schreiber Research Investigator (Bioinformatics) Novartis Institute for Tropical Diseases (NITD) 10 Biopolis Road #05-01 Chromos Singapore 138670 www.nitd.novartis.com phone +65 6722 2973 fax +65 6722 2910 Todd Riley 03/17/2006 10:33 AM To: Mark Schreiber/GP/Novartis at PH cc: biojava-l-bounces at portal.open-bio.org, biojava-l at biojava.org Subject: ProfileHMM Serialization Problem Hello all, I am having a problem with serialized ProfileHMM objects. I can read in one serialized ProfileHMM object, but never more than one (I can't even read in the same serialized object again.) It appears that the problem lies with the AlphabetManager. Maybe a clash with alphabet names and/or indexes? 
I looked in the archives and found that the problem seemed to exist back in Oct of 2002. Has this ever been addressed?

Any help here would be greatly appreciated.

Thanks,
Todd

RE: [Biojava-l] Re: Biojava-l digest, Vol 1 #776 - 5 msgs
Schreiber, Mark
Tue, 08 Oct 2002 13:11:33 -0700

Yup,

It needs fixing; serialization and BioJava just don't seem to play that well :(

The question is what kind of API. The attractive part about serialization is that when it works you get back what you started with. You can also do RMI. The downside of the XML model is that you don't get back what you had before: you get back a MarkovModel, and all of your custom-designed methods etc. are lost.

I can see two ways to get around this. One: write a wrapper class that makes your custom model and the thing returned by the XMLMarkovModel look the same (generally, look like the same interface). The other option is to mimic something like JAXB (not JAXB itself, though, as it won't cope well with BioJava flyweight symbols and alphabets). Somewhere the class name is stored in the XML, and through the wonders of introspection things are returned to how they were. This generally requires the class to be designed as a valid bean, or at least to point to a nice FactoryClass or something. Ultimately this would be good for all of BioJava.

I know people hate the idea of another XML format, but I think that there really isn't one that represents what we are trying to do here. You could also write XSLT to transform into XML flavours that aren't as interested in gory details, such as class names, which are needed for serialization.

Just my $0.02

- Mark

> -----Original Message-----
> From: Matthew Pocock [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, 9 October 2002 7:08 a.m.
> To: Lachlan Coin; [EMAIL PROTECTED]
> Subject: Re: [Biojava-l] Re: Biojava-l digest, Vol 1 #776 - 5 msgs
>
> Hi,
>
> HMM serialization (or persistence) seems to be an
> ongoing problem for people. We (OK - I) wrote this
> code a long time ago, back in the dark ages when I
> didn't know much about programming. Does anyone want
> to fix this mess once and for all, and write an HMM
> persistence API? It sounds like that would be a really
> helpful thing to have.
>
> Matthew
>
> --- Lachlan Coin <[EMAIL PROTECTED]> wrote:
> > Hi
> >
> > Having made a mistake in serialising HMMs before -
> > are you writing your serialised object at several points in the code?
> > Unless you write all of the models at the same point, they will not work
> > when you read them back in.
> >
> > Cheers,
> >
> > Lachlan
> >
> > > Message: 1
> > > Subject: RE: [Biojava-l] Create DP object from profileHMM class file
> > > Date: Tue, 8 Oct 2002 08:53:41 +1300
> > > From: "Schreiber, Mark" <[EMAIL PROTECTED]>
> > > To: "Tisanai" <[EMAIL PROTECTED]>, <[EMAIL PROTECTED]>
> > >
> > > Hi -
> > >
> > > The error is coming from the 64th line of your program
> > > (at T_Zscore.main(T_Zscore.java:64)).
> > >
> > > I can see two places that the error might be coming from, but I need to
> > > know which line is the 64th line of the program.
> > >
> > > Is it: ProfileHMM model = (ProfileHMM) ois_md.readObject();
> > >
> > > Or is it: dp[i] = DPFactory.DEFAULT.createDP(model);
> > >
> > > > -----Original Message-----
> > > > From: Tisanai [mailto:[EMAIL PROTECTED]]
> > > > Sent: Tuesday, 8 October 2002 2:40 a.m.
> > > > To: [EMAIL PROTECTED]
> > > > Subject: [Biojava-l] Create DP object from profileHMM class file
> > > >
> > > > Hi
> > > >
> > > > With this code I would like to create DP objects from several
> > > > phmm files.
> > > >
> > > > for(int i=0;i
> > > >   String model_out_name = md_out_lst.align[i];
> > > >   File md_file = new File(model_out_name);
> > > >
> > > >   FileInputStream fis_md = new FileInputStream(md_file);
> > > >   ObjectInputStream ois_md = new ObjectInputStream(fis_md);
> > > >   ProfileHMM model = (ProfileHMM) ois_md.readObject();
> > > >   ois_md.close();
> > > >   dp[i] = DPFactory.DEFAULT.createDP(model);
> > > > }
> > > >
> > > > I found that it always gets stuck at the second file (i=1). If there is
> > > > only one file in my list, this code works fine. But if there is more
> > > > than one file in the list, it fails when it tries to create the second
> > > > dp object (dp[1]). This kind of error is shown:
> > > >
> > > > org.biojava.bio.BioError: State d-15 is known in states but is not listed in the transFrom table
> > > >   at org.biojava.bio.dp.SimpleMarkovModel.transitionsFrom(SimpleMarkovModel.java:227)
> > > >   at org.biojava.bio.dp.DP$HMMOrderByTransition.transitionsTo(DP.java:599)
> > > >   at org.biojava.bio.dp.DP$HMMOrderByTransition.compare(DP.java:586)
> > > >   at org.biojava.bio.dp.DP.stateList(DP.java:123)
> > > >   at org.biojava.bio.dp.DP.update(DP.java:353)
> > > >   at org.biojava.bio.dp.onehead.SingleDP.update(SingleDP.java:49)
> > > >   at org.biojava.bio.dp.DP.<init>(DP.java:377)
> > > >   at org.biojava.bio.dp.onehead.SingleDP.<init>(SingleDP.java:41)
> > > >   at org.biojava.bio.dp.DPFactory$DefaultFactory.createDP(DPFactory.java:53)
> > > >   at T_Zscore.main(T_Zscore.java:64)
> > > >
> > > > How can I fix my code?
> > > >
> > > > Thanks
> > > > Tisanai
> > >
> > > =======================================================================
> > > Attention: The information contained in this message and/or attachments
> > > from AgResearch Limited is intended only for the persons or entities
> > > to which it is addressed and may contain confidential and/or privileged
> > > material. Any review, retransmission, dissemination or other use of, or
> > > taking of any action in reliance upon, this information by persons or
> > > entities other than the intended recipients is prohibited by AgResearch
> > > Limited. If you have received this message in error, please notify the
> > > sender immediately.
> > > =======================================================================

From mark.schreiber at novartis.com Fri Mar 17 02:52:52 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 17 Mar 2006 10:52:52 +0800
Subject: [Biojava-l] Away
Message-ID:

Hello -

I'm going to be travelling a lot in the next 5 weeks and may only have patchy access to email and no access to CVS or my development machines. Therefore I won't be able to offer much in the way of technical support. Hopefully Richard and Michael will be able to deal with any major issues that crop up.

Best regards,

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)
Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road #05-01 Chromos
Singapore 138670
www.nitd.novartis.com
phone +65 6722 2973
fax +65 6722 2910

From toddri at eden.rutgers.edu Fri Mar 17 02:33:05 2006
From: toddri at eden.rutgers.edu (Todd Riley)
Date: Thu, 16 Mar 2006 21:33:05 -0500
Subject: [Biojava-l] ProfileHMM Serialization Problem
In-Reply-To:
References:
Message-ID: <441A1FE1.9000508@eden.rutgers.edu>

Hello all,

I am having a problem with serialized ProfileHMM objects. I can read in one serialized ProfileHMM object, but never more than one (I can't even read in the same serialized object again). It appears that the problem lies with the AlphabetManager. Maybe a clash with alphabet names and/or indexes?

I looked in the archives and found that the problem seemed to exist back in Oct of 2002. Has this ever been addressed?

Any help here would be greatly appreciated.
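The 2002 advice quoted in this thread (write all of the models at the same point) can be illustrated in plain Java. This is only a minimal sketch under stated assumptions, not BioJava code: `SingleStreamModels` and `Model` are hypothetical stand-ins for a serializable ProfileHMM, and the shared `alphabet` list merely plays the role of a shared alphabet object. It shows that objects written to one ObjectOutputStream and read back from one ObjectInputStream keep their shared references; whether that is enough to avoid the AlphabetManager problem described above is not confirmed here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SingleStreamModels {

    /** Hypothetical stand-in for a serializable model; NOT a BioJava class. */
    static final class Model implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final List<String> alphabet; // shared between models, like a shared alphabet

        Model(String name, List<String> alphabet) {
            this.name = name;
            this.alphabet = alphabet;
        }
    }

    /** Writes every model to ONE stream, so shared objects are written once. */
    static byte[] writeAll(List<Model> models) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(models.size());
            for (Model m : models) {
                out.writeObject(m);
            }
        }
        return bytes.toByteArray();
    }

    /** Reads every model back from ONE stream, restoring shared references. */
    static List<Model> readAll(byte[] data) throws IOException, ClassNotFoundException {
        List<Model> models = new ArrayList<>();
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                models.add((Model) in.readObject());
            }
        }
        return models;
    }

    public static void main(String[] args) throws Exception {
        List<String> dna = new ArrayList<>(Arrays.asList("a", "c", "g", "t"));
        List<Model> restored =
                readAll(writeAll(Arrays.asList(new Model("m1", dna), new Model("m2", dna))));
        // Both restored models point at the same alphabet instance.
        System.out.println(restored.get(0).alphabet == restored.get(1).alphabet); // prints true
    }
}
```

Had each model been written to its own stream, every read would produce its own private copy of the shared object, which is one plausible way for name or index clashes to appear.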
Thanks,
Todd

From er.sukhdeepsingh at gmail.com Fri Mar 17 11:21:16 2006
From: er.sukhdeepsingh at gmail.com (Sukhdeep Singh)
Date: Fri, 17 Mar 2006 16:51:16 +0530
Subject: [Biojava-l] need help
Message-ID: <40fbb41e0603170321p572b04cdj20d8e84ae5fb3977@mail.gmail.com>

hello guys myself SUKHDEEP SINGH a 2ND YEAR student of AMBALA COLLEGE OF ENGINEERING & APPLIED RESEARCH. pals i am very much dedicated to bioinformatics and want to do something great in it.
I have also done basic & advanced courses in BIOINFORMATICS in my 15 day winter vacation. I have learned the functions of some software such as RASMOL, SWISSPDB, CN3D (V3.1), CLUSTAL-X, HYPERCAM (V7.5 student evaluation version). I am very much dedicated to it because I have a good knowledge of computers, as I have been operating them for about 4 years, but only moderate knowledge of bio. I am also familiar with databases like KEGG, NCBI, PUBMED, ENTREZ etc. So I want you to help me by telling me about any tutorial program for BIOJAVA or BIOPERL, or any institute giving training in bioinformatics or any other subject related to BIOINFORMATICS, for nearly 45 days in the months of July-August. So please friends, just help me out with this.

REPLY to me at er.sukhdeepsingh at gmail.com

SUKHDEEP SINGH

From dag at sonsorol.org Tue Mar 21 17:55:11 2006
From: dag at sonsorol.org (Chris Dagdigian)
Date: Tue, 21 Mar 2006 12:55:11 -0500
Subject: [Biojava-l] Important OBF update for biojava developers and users
Message-ID:

Executive summary: biojava.org new DNS is propagating as I write this email. Eventually everyone should see the new wiki-based site running on the new OBF server hardware. Read on for more info on some other upcoming changes...

Hi biojava people,

Sorry for the interruption but I've got some important site and server news. People will also see multiple copies of this note as I slowly transition sites over one at a time.

We are in the midst of moving all of our websites, mailing lists, developers and sourcecode repositories onto more modern hardware located in a 2nd Boston area datacenter facility.

The transition is important for a couple of reasons - the most urgent being that we are going to lose internet connectivity in our current hosting facility on March 27th 2006. That datacenter belongs to Wyeth Research in Cambridge, Massachusetts. Wyeth Research & Genetics Institute have been long-time significant supporters & hosting providers for OBF servers and projects -- we owe them a great deal of gratitude and public acknowledgment for hosting our servers over many years. Speaking as a hardware geek I can tell you that the many years of high-bandwidth, trouble-free hosting have been invaluable for our efforts and projects. Sadly, it is no longer possible for them to host our servers as they need to begin making some network and WAN circuit changes that will no longer support direct internet-facing servers (such as ours) in Cambridge.

The other major reason for the transition is our need to relocate onto hardware that can better be remotely managed (as our volunteer administrators are scattered all over the globe). My employer, BioTeam Inc., has donated new server hardware and is also providing the hosting facilities in a Tier 1 Boston area colocation facility. Infrastructure geeks can see pictures of the colocation cage and the new OBF servers online at this URL: http://bioteam.net/gallery/bioteamBDC -- those servers also host EMBOSS FTP/CVS and mailing lists.

Current status of the migration:

- All 57 mailing lists have been moved over to the new hardware (you may have noticed "lists.open-bio.org" showing up in your list messages)
- The new anonymous sourcecode server is running at http://code.open-bio.org. "cvs.biojava.org" is already pointing at it.
- Your website (biojava.org) was moved to the new hardware (and new Wiki site!) about an hour ago
- Developers with CVS accounts have *NOT* been migrated yet

Basically we are trying to relocate everything but the developers over the next few days so we can spend the weekend on the developer and CVS transition.

If DNS has not propagated yet, point your browser at http://biojava.open-bio.org -- that is the new site your group has been building. What is happening now is that DNS pointers for biojava.org and www.biojava.org are slowly changing over to point at the wiki and the new hardware. Eventually you'll see the same site regardless of which URL you use.

For biojava users
--------------------------

Please keep an eye on your website and mailing lists and let support at open-bio.org know if there are any problems with the transition. In particular, your new wiki site contains embedded links to some parts of the 'old' static website. I caught the obvious ones (biojava.org/downloads/ and biojava.org/docs/) but I may have missed some. Please let me know about any broken links. Also someone may want to clean up the biojava logo image now in the wiki to make the white background transparent.

For developers and leaders
---------------------------------------------

Whoever will be updating the static parts of the website (/download/ and /docs/) in the future will need login access to our new central webserver machine; please contact support at open-bio.org to request a user account for biojava website maintenance.

For people with CVS commit/write access
---------------------------------------------------------

Also note that when we finally do transition over to the new developer machine (where the real sourcecode lives), ALL developers will need to email support at open-bio.org to request a password reset. Although we can transition usernames, settings and home directories over from the old to the new machine, we cannot transition over existing passwords as they are stored in incompatible hashed formats. All developers are going to need new passwords for the new developer machine. We will likely make the developer machine swap this weekend.

Reporting Problems / Help & Assistance
------------------------------------------------------

The transition will be complicated; we need your help to spot problems and glitches! The OBF has a new helpdesk ticketing system set up at "support at open-bio.org" so that all OBF admins can read and respond to issues and problems. Most troubles should be reported to that address. For urgent problems, especially during this transition period, feel free to contact me directly (dag at sonsorol.org) (ichat/aol/aim screen name: bioteamdag).

Regards,
Chris Dagdigian
open-bio.org

From toddri at eden.rutgers.edu Thu Mar 23 21:59:23 2006
From: toddri at eden.rutgers.edu (Todd Riley)
Date: Thu, 23 Mar 2006 16:59:23 -0500
Subject: [Biojava-l] HMM's - Attempting some fancy stuff
In-Reply-To: <44119D31.6010703@mpiib-berlin.mpg.de>
References: <44119D31.6010703@mpiib-berlin.mpg.de>
Message-ID: <44231A3B.6030902@eden.rutgers.edu>

Hello,

After successfully implementing some TFBS search models using the ProfileHMM and DP classes, I am ready to attempt some fancier stuff that is going to require some serious coding. Before I begin, I thought that I might field some questions to the BioJava users/programmers who have some experience and/or interest in the BioJava HMM classes. I want to be sure to implement features in a fashion that will maximize usability in the simplest way....

Questions:

1. Many of the TFBS sites that I am modeling are palindromic or repetitive. I wish to associate transition and emission distributions (as prior knowledge) during training in order to enforce a palindromic and/or repetitive pattern and thus also greatly reduce the parameter space.

Example: A p53 TFBS is palindromic and repetitive. A 20-column Profile HMM can be greatly reduced to an HMM with the match-state topology of

1 2 3 4 5 C(5) C(4) C(3) C(2) C(1) 1 2 3 4 5 C(5) C(4) C(3) C(2) C(1),

where C() means DNA complement. Notice that with this model, I now have only 5 match-state emissions to train, as opposed to 20. (C(n) is a complement view over distribution n).
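The tied topology in the p53 example can be illustrated without touching any DP code. The following is a minimal sketch in plain Java, not BioJava: `TiedEmissions`, `complementView` and `expand` are hypothetical names, emissions are plain `double[4]` arrays indexed A, C, G, T, and the "view" here is a copy for simplicity (a real view would delegate to the underlying distribution so that training one column also updates its partner).

```java
import java.util.Arrays;

final class TiedEmissions {

    /**
     * Complement of an emission array indexed A=0, C=1, G=2, T=3.
     * A pairs with T and C pairs with G, so complementing reverses the order.
     */
    static double[] complementView(double[] dist) {
        return new double[] { dist[3], dist[2], dist[1], dist[0] };
    }

    /**
     * Expands k free columns into the layout
     * 1..k, C(k)..C(1), repeated `repeats` times.
     */
    static double[][] expand(double[][] free, int repeats) {
        int k = free.length;
        double[][] cols = new double[2 * k * repeats][];
        int pos = 0;
        for (int r = 0; r < repeats; r++) {
            for (int i = 0; i < k; i++) {
                cols[pos++] = free[i].clone();            // columns 1..k
            }
            for (int i = k - 1; i >= 0; i--) {
                cols[pos++] = complementView(free[i]);    // columns C(k)..C(1)
            }
        }
        return cols;
    }

    public static void main(String[] args) {
        double[][] free = new double[5][];
        for (int i = 0; i < free.length; i++) {
            free[i] = new double[] { 0.7, 0.1, 0.1, 0.1 };
        }
        double[][] cols = expand(free, 2);
        System.out.println(cols.length); // prints 20: twenty columns from five free ones
        System.out.println(Arrays.toString(cols[5]));
    }
}
```

Only the five free arrays would need to be trained; the other fifteen columns are derived from them.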
There are also far fewer transition distributions to train if I impose that the transitions from a->b are the same as b->a or C(b)->C(a), but in the opposite direction.

I wish to implement this in a fashion that does not require any changes to the current Viterbi, forward, Baum-Welch, etc. algorithms, or the DP class. I have already started writing classes that provide a view (or complement view) over an existing distribution. My plan is to use these views as a means to correlate emission and transition distributions from and between different columns in the Profile HMM. Has anyone ever tried this or thought of trying this? Any ideas about how to implement this could be very useful.

2. I wish to use more complicated background models than just a 0-th order background distribution. I would like to use a Dirichlet mixture and/or higher-order Markov models. Has anyone looked into this? Any ideas as to how to implement this in the current release?

-Todd

From toddri at eden.rutgers.edu Thu Mar 23 23:04:45 2006
From: toddri at eden.rutgers.edu (Todd Riley)
Date: Thu, 23 Mar 2006 18:04:45 -0500
Subject: [Biojava-l] HMM's - Attempting some fancy stuff
In-Reply-To: <1143153837.13405.184.camel@elm.mcb.mcgill.ca>
References: <44119D31.6010703@mpiib-berlin.mpg.de> <44231A3B.6030902@eden.rutgers.edu> <1143153837.13405.184.camel@elm.mcb.mcgill.ca>
Message-ID: <4423298D.8000901@eden.rutgers.edu>

Yes, I agree that the palindromes are not always identical. However, often my unaligned training data is not complete enough to train the model well without some simplification. So far, I have been using cross-validation, sensitivity, and specificity to determine the effectiveness of this simplification approach.

-Todd

Francois Pepin wrote:

>>1. Many of the TFBS sites that I am modeling are palindromic or
>>repetitive. I wish to associate transition and emission distributions
>>(as prior knowledge) during training in order to enforce a palindromic
>>and/or repetitive pattern and thus also greatly reduce the parameter space.
>
>Just as a note, we haven't found this to be ideal, if you have
>sufficient training data. It is often the case that one of the
>palindromes is more conserved than the other, and you would be treating
>them the same way.
>
>Of course, it depends how much of an in-depth study you'll want to be
>doing.
>
>Francois

From mark.schreiber at novartis.com Fri Mar 24 02:28:04 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 24 Mar 2006 10:28:04 +0800
Subject: [Biojava-l] HMM's - Attempting some fancy stuff
Message-ID:

I think you could do a palindrome as a push-down automaton or similar. Alternatively you could do something like an HMM with emission duration, as in Borodovsky's GeneMarkHMM programs, but that would require a lot of new code for the DP library (good to have though).

To use a Dirichlet mixture as your background you could calculate one and give it to a Distribution, although it might be best to implement the Distribution interface with a class that generates one for you.

To go to higher-order models you just need a higher-order alphabet (http://biojava.org/wiki/BioJava:Cookbook:Alphabets:CrossProduct) and possibly an OrderNDistribution for background and emission (http://biojava.org/wiki/BioJava:CookBook:Distribution:Custom)

- Mark

From jolyon.holdstock at ogt.co.uk Fri Mar 24 11:26:44 2006
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Fri, 24 Mar 2006 11:26:44 -0000
Subject: [Biojava-l] RichSequence annotations...
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3FAFC0B7@EUCLID.internal.ogtip.com>

Hi,

I use the following code to extract all the genes from a sequence file. I load the sequence, then filter for CDS features only; iterating through these lets me get the gene annotation for each feature.

//===============================================================================
Sequence seq;
String fileName = "C:/Scripts/Java/BioJava/BioJavaX/biojava-live/demos/seq/AL121903.embl";
try {
    seq = SeqIOTools.readEmbl(new BufferedReader(new FileReader(fileName))).nextSequence();
}
catch (IOException IOE) {
    System.out.println("IOException " + IOE);
}
catch (BioException BIOE) {
    System.out.println("BioException " + BIOE);
}

//Create a feature filter for CDS features only
FeatureFilter ff = new FeatureFilter.ByType("CDS");

//Get the filtered Features
FeatureHolder fh = seq.filter(ff);

//Iterate over the Features in fh
for (Iterator i = fh.features(); i.hasNext(); ) {
    Feature f = (Feature)i.next();
    Annotation annotation = f.getAnnotation();
    Object key = "gene";
    hash.put(annotation.getProperty(key), f);
}
//===============================================================================

I am now using the new BioJavaX classes, which I cannot get to work. Does anyone have any pointers for this?

I use the sequence data, so I have to use a RichSequence rather than a BioEntry.

//===============================================================================
RichSequence richSeq;
String fileName = "C:/Scripts/Java/BioJava/BioJavaX/biojava-live/demos/seq/AL121903.embl";
try {
    richSeq = RichSequence.IOTools.readEMBLDNA(new BufferedReader(new FileReader(fileName)), null).nextRichSequence();
}
catch (IOException IOE) {
    System.out.println("IOException " + IOE);
}
catch (BioException BIOE) {
    System.out.println("BioException " + BIOE);
}

//Create a feature filter for CDS features only
FeatureFilter ff = new FeatureFilter.ByType("CDS");

//Get the filtered Features
FeatureHolder fh = richSeq.filter(ff);

//Iterate through the features
for (Iterator i = fh.features(); i.hasNext(); ) {
    RichFeature rf = (RichFeature) i.next();
    System.out.println("RichFeature: " + rf.toString());
    RichAnnotation ra = (RichAnnotation) rf.getAnnotation();
    System.out.println("RichAnnotation: " + ra.toString());
}
//===============================================================================

The output shows that CDS features have been filtered successfully and that the gene name is in the annotation:

RichFeature: (#1) lcl:HSDJ155G6/AL121903.13:CDS,EMBL(biojavax:join:[<5642..5793,10804..10976,12496..12656,14136..14266])
RichAnnotation: [(#2) biojavax:clone_lib: RPCI-1" 14403..14532,16852..16987,17821..17959,18068..18122, 19456..19570,23623..23753,25885..26053,29102..29240, 32621..32738,33595..33771],[(#3) biojavax:codon_start: 1],[(#4) biojavax:evidence: NOT_EXPERIMENTAL],[(#5) biojavax:note: match: proteins: Tr:Q9Y6D5 Tr:O46382 Tr:Q9Y6D6],[(#6) biojavax:gene: dJ155G6.1],[(#7) biojavax:product: dJ155G6.1 (brefeldin A-inhibited guanine nucleotide-exchange protein 2)],[(#8) biojavax:protein_id: CAB86643.1]

If I add the following then I can see what keys are in the annotation:

//===============================================================================
Set keySet = ra.keys();
for (Iterator it = keySet.iterator(); it.hasNext(); ) {
    String key = it.next().toString();
    System.out.println("Key: " + key);
}
//===============================================================================

The output shows that there is a gene key:

Key: biojavax:clone_lib
Key: biojavax:codon_start
Key: biojavax:evidence
Key: biojavax:gene
Key: biojavax:note
Key: biojavax:product
Key: biojavax:protein_id

My understanding is that I need to use a ComparableTerm to access the value, but when I create it I get a NoSuchElementException:

ComparableTerm gene = RichObjectFactory.getDefaultOntology().getOrCreateTerm("gene");
System.out.println("Gene: " + ra.getProperty(gene));

java.util.NoSuchElementException: No such property: biojavax:gene, rank 0

cheers,

Jolyon

Jolyon Holdstock Ph.D.
Senior Computational Biologist,
Oxford Gene Technology (Ops) Ltd.
Begbroke Business and Science Park
Sandy Lane, Yarnton
Oxford, OX5 1PF
Tel: 01865 309699
Fax: 01865 842116

Confidentiality Notice: The contents of this email from the Oxford Gene Technology Group of Companies are confidential and intended solely for the person to whom it is addressed. It may contain privileged and confidential information. If you are not the intended recipient you must not read, copy, distribute, discuss or take any action in reliance on it.

From richard.holland at ebi.ac.uk Fri Mar 24 13:16:49 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 24 Mar 2006 13:16:49 +0000
Subject: [Biojava-l] RichSequence annotations...
In-Reply-To: <588D0DD225D05746B5D8CAE1BE971F3FAFC0B7@EUCLID.internal.ogtip.com>
References: <588D0DD225D05746B5D8CAE1BE971F3FAFC0B7@EUCLID.internal.ogtip.com>
Message-ID: <1143206209.3899.84.camel@texas.ebi.ac.uk>

The terms are ranked in RichAnnotations. getProperty(term) searches for a Note with that term and a rank of zero. If you don't know the ranks, you need to use the public Note[] getProperties(Object key); method on the RichAnnotation object instead.
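The difference between a rank-0 lookup and an any-rank lookup can be illustrated with a small stand-in model. This is a minimal sketch in plain Java, not the BioJavaX API: `RankedNotes` and its nested `Note` are hypothetical classes that only mimic the behaviour described here, and it assumes that the "(#6)" printed in front of biojavax:gene is that note's rank.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

final class RankedNotes {

    /** Hypothetical stand-in for a ranked annotation note; NOT the BioJavaX Note type. */
    static final class Note {
        final String term;
        final String value;
        final int rank;

        Note(String term, String value, int rank) {
            this.term = term;
            this.value = value;
            this.rank = rank;
        }
    }

    private final List<Note> notes = new ArrayList<>();

    void add(String term, String value, int rank) {
        notes.add(new Note(term, value, rank));
    }

    /** Mimics the rank-0-only lookup: fails when the note has any other rank. */
    String getProperty(String term) {
        for (Note n : notes) {
            if (n.term.equals(term) && n.rank == 0) {
                return n.value;
            }
        }
        throw new NoSuchElementException("No such property: " + term + ", rank 0");
    }

    /** Mimics the any-rank lookup: returns every note with the given term. */
    List<Note> getProperties(String term) {
        List<Note> matches = new ArrayList<>();
        for (Note n : notes) {
            if (n.term.equals(term)) {
                matches.add(n);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        RankedNotes ra = new RankedNotes();
        ra.add("gene", "dJ155G6.1", 6); // rank 6 is an assumption read off the printed output
        for (Note n : ra.getProperties("gene")) {
            System.out.println("Gene: " + n.value + " (rank " + n.rank + ")");
        }
    }
}
```

In this model, calling `getProperty("gene")` on the same object throws, because the only gene note has a non-zero rank, while `getProperties("gene")` finds it.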
This will return a list of all matching Note objects with the given term regardless of rank.

cheers,
Richard

On Fri, 2006-03-24 at 11:26 +0000, Jolyon Holdstock wrote:
> Hi,
>
> I use the following code to extract all the genes from a sequence file;
> I load the sequence then filter out only CDS features; iterating through
> these lets me get the gene annotation for the feature
>
> //===============================================================================
> Sequence seq;
> String fileName = new File("C:/Scripts/Java/BioJava/BioJavaX/biojava-live/demos/seq/AL121903.embl");
> try {
>     seq = SeqIOTools.readEmbl(new BufferedReader(new FileReader(fileName))).nextSequence();
> }
> catch (IOException IOE) {
>     System.out.println("IOException " + IOE);
> }
> catch (BioException BIOE) {
>     System.out.println("BioException " + BIOE);
> }
>
> //Create a feature filter for CDS features only
> FeatureFilter ff = new FeatureFilter.ByType("CDS");
>
> //Get the filtered Features
> FeatureHolder fh = seq.filter(ff);
>
> //Iterate over the Features in fh
> for (Iterator i = fh.features(); i.hasNext(); ) {
>     Feature f = (Feature)i.next();
>     Annotation annotation = f.getAnnotation();
>     Object key = "gene";
>     hash.put(annotation.getProperty(key), f);
> }
> //===============================================================================
>
> I am now using the new BioJavaX classes which I cannot get to work. Does
> anyone has any pointers for this?

-- 
Richard Holland
European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-494416
---------------

From dag at sonsorol.org  Sat Mar 25 23:50:57 2006
From: dag at sonsorol.org (Chris Dagdigian)
Date: Sat, 25 Mar 2006 18:50:57 -0500
Subject: [Biojava-l] Important news for developers on open-bio machines
Message-ID: <1BB8AE37-91CA-45C7-AA81-A12826D5F422@sonsorol.org>

Hi, apologies for the massive cross-post. I'll keep it short!

This message is a last-ditch attempt to contact people with developer accounts on pub.open-bio.org who may have not received the individual mails we've been sending via the obf-developers at lists.open-bio.org mailing list. We suspect that there are a number of devs out there for whom we don't have up to date email addresses.

All open-bio services have been migrated to new hardware and a new datacenter. Part of this migration process involved moving all developer accounts and all source-code repositories to a new server. The developer migration was completed a few minutes ago.

An unavoidable side effect of the move is that all developers are now locked out of their accounts until they contact us for a password reset.

If you are a developer and this news comes as a surprise to you, it means we don't have your contact info. Your best way to get up to speed on the history and technical details behind the migration is to point your browser here:

http://lists.open-bio.org/mailman/private/obf-developers/2006-March/thread.html

... and read the various messages we've posted this month. Included in the first message is the information on how to request an account reset.
Regards,
Chris Dagdigian
open-bio.org

From duze at gmx.de  Tue Mar 28 06:44:38 2006
From: duze at gmx.de ("Andreas Dräger")
Date: Tue, 28 Mar 2006 08:44:38 +0200 (MEST)
Subject: [Biojava-l] (no subject)
Message-ID: <2493.1143528278@www086.gmx.net>

Hi,

I just tried the GA-Example from the BioJava Cookbook. Therefore I included all sources from the biojava-live directory from CVS. The following line seems to cause problems:

    genAlg.run(new DemoStopping());

After execution one receives the following (error) message:

gen,average_fitness,best_fitness
0,49.98,67.0
Exception in thread "main" java.lang.Error: Unresolved compilation problem:
    Syntax error on token "assert", assert expected

    at org.biojava.utils.ChangeSupport.firePreChangeEvent(ChangeSupport.java:280)
    at org.biojava.bio.symbol.SimpleSymbolList.edit(SimpleSymbolList.java:339)
    at org.biojavax.ga.functions.SimpleCrossOverFunction.performCrossOver(SimpleCrossOverFunction.java:80)
    at org.biojavax.ga.impl.SimpleGeneticAlgorithm.run(SimpleGeneticAlgorithm.java:108)
    at GADemo.main(GADemo.java:91)

I do not know how to proceed, so I post this message to you.

Sincerely,
Andreas Dräger

From richard.holland at ebi.ac.uk  Tue Mar 28 07:42:33 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Tue, 28 Mar 2006 08:42:33 +0100
Subject: [Biojava-l] (no subject)
In-Reply-To: <2493.1143528278@www086.gmx.net>
References: <2493.1143528278@www086.gmx.net>
Message-ID: <1143531753.3898.45.camel@texas.ebi.ac.uk>

Hi Andreas.

This sounds like a compiler version or flags problem. Could you check that you are running javac from a Java 1.4 or later installation? Also, see

http://java.sun.com/j2se/1.4.2/docs/guide/lang/assert.html#compatibility

(The Ant script uses the flag '-source 1.4' for everything).
Then try doing an 'ant clean' before your 'ant package-biojava' to make sure everything gets recompiled.

cheers,
Richard

-- 
Richard Holland
European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-494416
---------------

From andreas.draeger at clever-telefonieren.de  Tue Mar 28 08:29:32 2006
From: andreas.draeger at clever-telefonieren.de (Andreas Dräger)
Date: Tue, 28 Mar 2006 10:29:32 +0200
Subject: [Biojava-l] GA-Package
Message-ID: <4428F3EC.9050507@clever-telefonieren.de>

Thanks,

Now it works fine!

Cheers,
Andreas

Richard Holland wrote:

Hi Andreas.

This sounds like a compiler version or flags problem. Could you check that you are running javac from a Java 1.4 or later installation? Also, see

http://java.sun.com/j2se/1.4.2/docs/guide/lang/assert.html#compatibility

(The Ant script uses the flag '-source 1.4' for everything).
Then try doing an 'ant clean' before your 'ant package-biojava' to make sure everything gets recompiled.

cheers,
Richard

-- 
==================================
Andreas Dräger
PhD student
Eberhard Karls University Tübingen
Center for Bioinformatics (ZBIT)
Phone: +49-7071-29-70436
==================================

From andreas.draeger at clever-telefonieren.de  Tue Mar 28 08:34:20 2006
From: andreas.draeger at clever-telefonieren.de (Andreas Dräger)
Date: Tue, 28 Mar 2006 10:34:20 +0200
Subject: [Biojava-l] GA-Package
Message-ID: <4428F50C.4070104@clever-telefonieren.de>

Thanks,

Now it works fine!

Cheers,
Andreas

-- 
==================================
Andreas Dräger
PhD student
Eberhard Karls University Tübingen
Center for Bioinformatics (ZBIT)
Phone: +49-7071-29-70436
==================================

From wendy.wong at gmail.com  Thu Mar 30 15:41:47 2006
From: wendy.wong at gmail.com (wendy wong)
Date: Thu, 30 Mar 2006 16:41:47 +0100
Subject: [Biojava-l] unsupervised training of transition weights
Message-ID:

Hi,

I am trying to train my HMM using unsupervised training (I don't need to train the emission probabilities). I was wondering how I can do so in biojava. Do I have to implement the TransitionTrainer interface?

My second question is: I implemented getWeightImpl in my custom distribution to set up my emission states and it works fine. But is it possible to get the program to access it only when there's a certain symbol in the observed sequence (instead of precalculated)? I also found that (although I might be wrong) the weights are calculated twice: once when the distribution was created, and again when I call viterbi, which calls getWeightImpl again. I am not sure what I did wrong here :(

Any input would be very much appreciated! Thank you!
wendy

From td2 at sanger.ac.uk  Fri Mar 31 10:58:38 2006
From: td2 at sanger.ac.uk (Thomas Down)
Date: Fri, 31 Mar 2006 11:58:38 +0100
Subject: [Biojava-l] unsupervised training of transition weights
In-Reply-To:
References:
Message-ID: <5D3C5E2A-25E7-4516-B0E8-D1F57EAAFE1A@sanger.ac.uk>

On 30 Mar 2006, at 16:41, wendy wong wrote:
> Hi,
>
> I am trying to train my HMM using unsupervised training (I don't need
> to train the emission probabilities). I was wondering how I can do so
> in biojava. Do I have to implement the TransitionTrainer interface?

The easiest way to do this is to use UntrainableDistributions for all the transition-sets that you don't want to be trained:

http://www.biojava.org/docs/api14/org/biojava/bio/dist/UntrainableDistribution.html

If UntrainableDistribution doesn't fit your requirements, the alternative is to create your own Distribution implementation with a registerTrainer method that creates a "dummy" (i.e. doesn't do anything) DistributionTrainer. UntrainableDistribution is just a subclass of SimpleDistribution which replaces the registerTrainer method with a non-functional version.

> My second question is:
> I implemented getWeightImpl in my custom distribution to set up my
> emission states and it works fine. But is it possible to get the
> program to access it only when there's a certain symbol in the observed
> sequence (instead of precalculated)? I also found that (although
> I might be wrong) the weights are calculated twice: once when the
> distribution was created, and again when I call viterbi, which calls
> getWeightImpl again. I am not sure what I did wrong here :(

The DP code does some caching of probabilities; I don't think there's any way to turn this off without modifying the DP implementations.

Thomas.
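
The "dummy trainer" idea in the reply above can be sketched without BioJava on the classpath. This is only an illustration of the pattern, not BioJava's API: WeightTrainer, CountingTrainer, NoOpTrainer, SimpleDist and FixedDist are hypothetical stand-ins invented here, and the real Distribution and DistributionTrainer interfaces have different signatures.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration; NOT BioJava's real interfaces.
interface WeightTrainer {
    void addCount(String symbol, double count);
    void train(Map<String, Double> weights);
}

// Ordinary trainer: turns accumulated counts into normalised weights.
class CountingTrainer implements WeightTrainer {
    private final Map<String, Double> counts = new HashMap<>();

    public void addCount(String symbol, double count) {
        counts.merge(symbol, count, Double::sum);
    }

    public void train(Map<String, Double> weights) {
        double total = 0.0;
        for (double c : counts.values()) {
            total += c;
        }
        if (total == 0.0) {
            return; // nothing was observed, so leave the weights alone
        }
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            e.setValue(counts.getOrDefault(e.getKey(), 0.0) / total);
        }
    }
}

// Dummy trainer: accepts counts and discards them, so weights never change.
class NoOpTrainer implements WeightTrainer {
    public void addCount(String symbol, double count) { }
    public void train(Map<String, Double> weights) { }
}

// A trainable distribution (assumed shape, for the sketch only).
class SimpleDist {
    private final Map<String, Double> weights = new HashMap<>();

    SimpleDist(Map<String, Double> initial) {
        weights.putAll(initial);
    }

    double getWeight(String symbol) {
        return weights.getOrDefault(symbol, 0.0);
    }

    WeightTrainer registerTrainer() {
        return new CountingTrainer();
    }

    void applyTraining(WeightTrainer trainer) {
        trainer.train(weights);
    }
}

// Same distribution, but registerTrainer hands back the dummy trainer.
class FixedDist extends SimpleDist {
    FixedDist(Map<String, Double> initial) {
        super(initial);
    }

    @Override
    WeightTrainer registerTrainer() {
        return new NoOpTrainer();
    }
}
```

The training loop in this sketch never needs to know which distributions are fixed: it feeds counts to whatever trainer each distribution registers, and the fixed ones simply ignore them.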
From matthew.pocock at ncl.ac.uk  Fri Mar 31 17:05:25 2006
From: matthew.pocock at ncl.ac.uk (Matthew Pocock)
Date: Fri, 31 Mar 2006 18:05:25 +0100
Subject: [Biojava-l] unsupervised training of transition weights
In-Reply-To: <5D3C5E2A-25E7-4516-B0E8-D1F57EAAFE1A@sanger.ac.uk>
References: <5D3C5E2A-25E7-4516-B0E8-D1F57EAAFE1A@sanger.ac.uk>
Message-ID: <200603311805.25861.matthew.pocock@ncl.ac.uk>

> The DP code does some caching of probabilities, I don't think there's
> any way to turn this off without modifying the DP implementations.
>
> Thomas.

My recollection is that if you did turn this off, the algorithm would run very, very much more slowly. Internally to the DP objects, the distribution probabilities (in fact, they aren't even probabilities by this stage) are stored in a data structure optimized for the type of lookups performed during the dynamic programming recursions.

Matthew
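
The trade-off described above can be illustrated with a small, library-free sketch. It is not BioJava's DP code: EmissionWeight and EmissionScoreCache are hypothetical names, and storing natural-log scores is an assumption made here (the message only says the cached values are no longer probabilities). Each weight is computed once up front and copied into a dense array, so the inner loop of a recursion does plain array lookups.

```java
// Hypothetical callback standing in for a possibly expensive weight
// computation; NOT a BioJava interface.
interface EmissionWeight {
    double weight(int state, int symbol);
}

// Precomputes one score per (state, symbol) pair into a dense table.
class EmissionScoreCache {
    private final double[][] logScores; // indexed as [state][symbol]

    EmissionScoreCache(int nStates, int nSymbols, EmissionWeight source) {
        logScores = new double[nStates][nSymbols];
        for (int s = 0; s < nStates; s++) {
            for (int x = 0; x < nSymbols; x++) {
                // Assumption: scores are kept on a log scale.
                logScores[s][x] = Math.log(source.weight(s, x));
            }
        }
    }

    // Constant-time lookup; never calls back into the weight source.
    double score(int state, int symbol) {
        return logScores[state][symbol];
    }
}
```

With a table like this, the weight source is consulted once per (state, symbol) pair when the cache is built, rather than once per cell of the dynamic programming matrix.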