From bugzilla-daemon at portal.open-bio.org Wed Oct 1 16:48:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 1 Oct 2008 16:48:15 -0400
Subject: [Biojava-dev] [Bug 2602] New: ParseException thrown when parsing Genbank file.
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2602

           Summary: ParseException thrown when parsing Genbank file.
           Product: BioJava
           Version: live (CVS source)
          Platform: All
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P1
         Component: seq.io
        AssignedTo: biojava-dev at biojava.org
        ReportedBy: tritt at wisc.edu


When attempting to read in a Genbank file using RichSequence.IOTools, I
received a ParseException. When using SeqIOTools, I do not have this problem.
The code that exposed the bug is given below.

public static void main(String[] args) {
    String dnaDir = args[args.length-1];
    BufferedReader[] br = new BufferedReader[8];
    FileReader orthologs = null;
    for (int i = 0; i < br.length; i++)
        br[i] = null;
    try {
        orthologs = new FileReader(args[0]);
        for (int i = 0; i < br.length; i++)
            br[i] = new BufferedReader(new FileReader(args[i+1]));
    } catch (FileNotFoundException ex) {
        ex.printStackTrace();
        System.exit(-1);
    }

    RichSequenceIterator[] seqIt = new RichSequenceIterator[8];
    HashMap[] features = new HashMap[8];
    for (int i = 0; i < features.length; i++) {
        features[i] = new HashMap();
    }
    for (int i = 0; i < br.length; i++)
        seqIt[i] = RichSequence.IOTools.readGenbankDNA(br[i], null);

    for (int i = 0; i < seqIt.length; i++) {
        RichSequence seq = null;
        try {
            seq = seqIt[i].nextRichSequence();
            seqIt[i] = null;
            br[i] = null;
        } catch (NoSuchElementException ex) {
            ex.printStackTrace();
            System.exit(-1);
        } catch (BioException ex) {
            ex.printStackTrace();
            System.exit(-1);
        }
        .
        .
        .

The following error message was received.
org.biojava.bio.BioException: Could not read sequence
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
    at OrthologSeqExtractor.main(OrthologSeqExtractor.java:76)
Caused by: org.biojava.bio.seq.io.ParseException:
A Exception Has Occurred During Parsing.
Please submit the details that follow to biojava-l at biojava.org or post a bug report to http://bugzilla.open-bio.org/

Format_object=org.biojavax.bio.seq.io.GenbankFormat
Accession=EDL933
Id=null
Comments=Bad dbxref
Parse_block=FEATURES Location/Qualifierssource 1..5528423/db_xref "GenBank:AE005174"/db_xref "RefSeq_NA:NC_002655"/db_xref "ATCC:700927"/db_xref "taxon:155864"/db_xref "ERIC:SOP"/mol_type "genomic DNA"/note "enterohemorrhagic"/organism "Escherichia coli"/serotype "O157:H7:K-"/strain "EDL933"/transl_table 11/db_xref "ASAP:ABH-0023909"/db_xref "ERIC:ABH-0023909"
.
.
.
Stack trace follows ....

    at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:462)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
    ... 1 more

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Thu Oct 2 03:54:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 2 Oct 2008 03:54:42 -0400
Subject: [Biojava-dev] [Bug 2603] New: StringIndexOutOfBoundsException while parsing blastresult
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2603

           Summary: StringIndexOutOfBoundsException while parsing blastresult
           Product: BioJava
           Version: unspecified
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: bio
        AssignedTo: biojava-dev at biojava.org
        ReportedBy: dtoomey at rcsi.ie


While parsing a blast result I get a StringIndexOutOfBoundsException.
I have narrowed down the cause of the error to this section:

Query= sp|P62368|ISPF_PLAF7 2-C-methyl-D-erythritol
2,4-cyclodiphosphate synthase OS=Plasmodium falciparum (isolate 3D7)
GN=ISPF

What I have found is that if the 3rd line is less than 11 characters long, the
error is thrown. If I add text or even extra spaces to this line, the error
does not occur. I have also noticed that it does not happen to the first entry
in a file containing multiple blast searches.

I have tried this on both Windows and Linux and get the same error. I have been
using blast version 2.2.18 but have also tried 2.2.17.

From bugzilla-daemon at portal.open-bio.org Fri Oct 3 06:30:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 3 Oct 2008 06:30:16 -0400
Subject: [Biojava-dev] [Bug 2603] StringIndexOutOfBoundsException while parsing blastresult
In-Reply-To:
Message-ID: <200810031030.m93AUGcD007688@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2603


------- Comment #1 from dtoomey at rcsi.ie 2008-10-03 06:30 EST -------
I have narrowed down the offending line to

    oParsedSeq = poLine.substring( iOffset).concat( new String( oPadding ) );

from 'BlastLikeAlignmentSAXParser.java'.

I have put in a hack which at least allows me to run the code:

    try {
        oParsedSeq = poLine.substring( iOffset).concat( new String( oPadding ) );
    } catch (StringIndexOutOfBoundsException ex) {
        System.out.println("Caught sub string error for poLine: " + poLine
                + " Offset is " + String.valueOf(iOffset));
        oParsedSeq = poLine.concat( new String( oPadding ) );
    }

(In reply to comment #0)
> While parsing a blast result I get a StringIndexOutOfBoundsException.
> I have narrowed down the cause of the error to this section
> Query= sp|P62368|ISPF_PLAF7 2-C-methyl-D-erythritol
> 2,4-cyclodiphosphate synthase OS=Plasmodium falciparum (isolate 3D7)
> GN=ISPF
> What I have found is that if the 3rd line is less than 11 characters long the
> error is thrown. If I add text or even extra spaces to this line then the error
> does not occur. Also I have noticed that it does not happen to the first entry
> in a file containing multiple blast searches.
> I have tried this on both Windows and Linux and get the same error. I have been
> using blast version 2.2.18 but have also tried 2.2.17

From bugzilla-daemon at portal.open-bio.org Wed Oct 15 04:12:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 15 Oct 2008 04:12:18 -0400
Subject: [Biojava-dev] [Bug 2617] New: Cookbook blast parser example fails on a tblastn example
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2617

           Summary: Cookbook blast parser example fails on a tblastn example
           Product: BioJava
           Version: live (CVS source)
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: search
        AssignedTo: biojava-dev at biojava.org
        ReportedBy: holland at ebi.ac.uk


(raised on behalf of user Charles Imbusch)

Hello,

for a project I want to parse a tblastn result with BioJava.
I used the code at http://biojava.org/wiki/BioJava:CookBook:Blast:Parser as is,
and I get the following error message:

Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -3
    at java.lang.String.substring(String.java:1938)
    at java.lang.String.substring(String.java:1905)
    at org.biojava.bio.program.sax.BlastLikeAlignmentSAXParser.parseLine(BlastLikeAlignmentSAXParser.java:289)
    at org.biojava.bio.program.sax.BlastLikeAlignmentSAXParser.parse(BlastLikeAlignmentSAXParser.java:115)
    at org.biojava.bio.program.sax.HitSectionSAXParser.outputHSPInfo(HitSectionSAXParser.java:514)
    at org.biojava.bio.program.sax.HitSectionSAXParser.firstHSPEvent(HitSectionSAXParser.java:287)
    at org.biojava.bio.program.sax.HitSectionSAXParser.interpret(HitSectionSAXParser.java:251)
    at org.biojava.bio.program.sax.HitSectionSAXParser.parse(HitSectionSAXParser.java:118)
    at org.biojava.bio.program.sax.BlastSAXParser.hitsSectionReached(BlastSAXParser.java:635)
    at org.biojava.bio.program.sax.BlastSAXParser.interpret(BlastSAXParser.java:337)
    at org.biojava.bio.program.sax.BlastSAXParser.parse(BlastSAXParser.java:164)
    at org.biojava.bio.program.sax.BlastLikeSAXParser.onNewDataSet(BlastLikeSAXParser.java:313)
    at org.biojava.bio.program.sax.BlastLikeSAXParser.interpret(BlastLikeSAXParser.java:276)
    at org.biojava.bio.program.sax.BlastLikeSAXParser.parse(BlastLikeSAXParser.java:162)
    at BlastEcho.echo(BlastEcho.java:29)
    at BlastEcho.main(BlastEcho.java:75)

I uploaded the Blast output file I want to parse here:
http://charles.imbusch.net/tmp/blastresult.txt

Any answer is appreciated.

Cheers,
Charles
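Both bug 2603 and bug 2617 end in the same call, poLine.substring(iOffset), which throws whenever the alignment line is shorter than the offset at which the sequence is expected to start. In JDK 6-era class libraries, String.substring(int) delegates to substring(int, int) (hence the two substring frames in the trace), and the negative number in "String index out of range: -3" appears to be the line length minus the offset, i.e. the offset overshot the line by three characters. The sketch below is neither BioJava's code nor the patch attached to bug 2603: the class SafeSubstringSketch and its methods are hypothetical, the offset of 11 is an assumption chosen to echo the "less than 11 characters" observation, and "Query: 1 MSTIRPV" is a made-up alignment line. It guards the call and returns an empty tail instead of throwing; unlike the catch-block hack in comment #1, which concatenates the whole poLine, it would not let the line's own text leak into the parsed sequence.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not BioJava code: guard String.substring(offset)
 * so that a line shorter than the expected offset yields an empty tail
 * (plus padding) instead of a StringIndexOutOfBoundsException.
 */
public class SafeSubstringSketch {

    /** Returns line.substring(offset), or "" when the line is too short. */
    public static String tailOrEmpty(String line, int offset) {
        if (offset < 0) {
            throw new IllegalArgumentException("negative offset: " + offset);
        }
        return offset >= line.length() ? "" : line.substring(offset);
    }

    /** Tail of the line followed by `padding` spaces, mirroring the concat in the report. */
    public static String tailPadded(String line, int offset, int padding) {
        char[] pad = new char[padding];
        Arrays.fill(pad, ' ');
        return tailOrEmpty(line, offset).concat(new String(pad));
    }

    public static void main(String[] args) {
        String shortLine = "GN=ISPF"; // 7 characters: shorter than the assumed offset of 11
        try {
            shortLine.substring(11); // the unguarded call fails here
        } catch (StringIndexOutOfBoundsException ex) {
            System.out.println("unguarded substring threw: " + ex.getMessage());
        }
        System.out.println("[" + tailPadded(shortLine, 11, 3) + "]");
        System.out.println("[" + tailPadded("Query: 1 MSTIRPV", 9, 0) + "]");
    }
}
```

The exact exception message differs between JDK versions, so the sketch only prints it; the guarded calls print "[   ]" and "[MSTIRPV]". Whether an empty fragment is the right result for a wrapped Query= header line is a decision the real parser still has to make.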
From f.jossinet at ibmc.u-strasbg.fr Wed Oct 15 04:36:09 2008
From: f.jossinet at ibmc.u-strasbg.fr (Fabrice Jossinet)
Date: Wed, 15 Oct 2008 10:36:09 +0200
Subject: [Biojava-dev] Proposition of participation to the BioJava project
Message-ID:

Dear BioJava team,

my name is Fabrice Jossinet. I'm working as an assistant professor at a French
university (Louis Pasteur University in Strasbourg).
I have been developing bioinformatics tools in Java since 2002. Before that, I
did a PhD as a molecular biologist at the bench ;)
I'm interested in the study of RNA. At the moment I'm focused on their
structural features, but I'm also interested in non-coding RNA genes in genomes.
You can have a look at my current project at this address:
http://paradise-ibmc.u-strasbg.fr/ . At present this project has 60,000 lines
of code and uses more than 10 external libraries.

I have been following BioJava for several years now. I would like to extend it
with RNA concepts. If you think that I can participate, don't hesitate to
answer me ;)

All the best

Fabrice

-- 
Dr. Fabrice Jossinet
Laboratoire de Bioinformatique, modelisation et simulation des acides nucleiques
Universite Louis Pasteur
Institut de biologie moleculaire et cellulaire du CNRS
UPR9002, Architecture et Reactivite de l'ARN
15 rue Rene Descartes
F-67084 Strasbourg Cedex
France

Tel + 33 (0) 3 88 417053
FAX + 33 (0) 3 88 60 22 18

f.jossinet at ibmc.u-strasbg.fr
fjossinet at gmail.com
http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
http://fjossinet.u-strasbg.fr/

From simpleyrx at 163.com Wed Oct 15 05:11:50 2008
From: simpleyrx at 163.com (simpleyrx)
Date: Wed, 15 Oct 2008 17:11:50 +0800 (CST)
Subject: [Biojava-dev] can biojava calcaulte profile-profile alignment ?
In-Reply-To:
References:
Message-ID: <7852810.354291224061911001.JavaMail.coremail@app143.163.com>

Dear experts,

I wonder whether BioJava can calculate profile-profile alignments.
-- 
student

From bugzilla-daemon at portal.open-bio.org Wed Oct 15 12:05:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 15 Oct 2008 12:05:23 -0400
Subject: [Biojava-dev] [Bug 2617] Cookbook blast parser example fails on a tblastn example
In-Reply-To:
Message-ID: <200810151605.m9FG5Nhb004488@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2617


holland at ebi.ac.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE


------- Comment #1 from holland at ebi.ac.uk 2008-10-15 12:05 EST -------

*** This bug has been marked as a duplicate of bug 2603 ***

From bugzilla-daemon at portal.open-bio.org Wed Oct 15 12:05:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 15 Oct 2008 12:05:25 -0400
Subject: [Biojava-dev] [Bug 2603] StringIndexOutOfBoundsException while parsing blastresult
In-Reply-To:
Message-ID: <200810151605.m9FG5PZo004505@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2603


holland at ebi.ac.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |holland at ebi.ac.uk


------- Comment #2 from holland at ebi.ac.uk 2008-10-15 12:05 EST -------
*** Bug 2617 has been marked as a duplicate of this bug. ***
From holland at eaglegenomics.com Wed Oct 15 12:25:16 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 15 Oct 2008 17:25:16 +0100
Subject: [Biojava-dev] Proposition of participation to the BioJava project
In-Reply-To:
References:
Message-ID:

You're absolutely welcome to contribute! We appreciate all the help we can get.

I will be sending out an email to the BioJava mailing lists in the next couple
of days inviting contributions for the new BioJava 3 code and describing how to
go about it. I think your RNA ideas would be a great starting point.

cheers,
Richard

2008/10/15 Fabrice Jossinet

> Dear BioJava team,
>
> my name is Fabrice Jossinet. I'm working as an assistant professor at a French
> university (Louis Pasteur University in Strasbourg).
> I have been developing bioinformatics tools in Java since 2002.
> Before that, I did a PhD as a molecular biologist at the bench ;)
> I'm interested in the study of RNA. At the moment I'm focused on their structural
> features, but I'm also interested in non-coding RNA genes in genomes.
> You can have a look at my current project at this address:
> http://paradise-ibmc.u-strasbg.fr/. At present this project has 60,000
> lines of code and uses more than 10 external libraries.
>
> I have been following BioJava for several years now. I would like to extend it
> with RNA concepts. If you think that I can participate, don't hesitate to
> answer me ;)
>
> All the best
>
> Fabrice
>
> --
> Dr. Fabrice Jossinet
> Laboratoire de Bioinformatique, modelisation et simulation des acides nucleiques
> Universite Louis Pasteur
> Institut de biologie moleculaire et cellulaire du CNRS
> UPR9002, Architecture et Reactivite de l'ARN
> 15 rue Rene Descartes
> F-67084 Strasbourg Cedex
> France
>
> Tel + 33 (0) 3 88 417053
> FAX + 33 (0) 3 88 60 22 18
>
> f.jossinet at ibmc.u-strasbg.fr
> fjossinet at gmail.com
> http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
> http://fjossinet.u-strasbg.fr/
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From holland at eaglegenomics.com Wed Oct 15 12:29:59 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 15 Oct 2008 17:29:59 +0100
Subject: [Biojava-dev] can biojava calcaulte profile-profile alignment ?
In-Reply-To: <7852810.354291224061911001.JavaMail.coremail@app143.163.com>
References: <7852810.354291224061911001.JavaMail.coremail@app143.163.com>
Message-ID:

The short answer: no.

The long answer: not yet! But if someone would like to contribute some code
that can do it, watch out for my email to the mailing lists in the next couple
of days inviting contributions for the new BioJava 3 code base.

cheers,
Richard

2008/10/15 simpleyrx

>
> Dear experts,
>
> I wonder whether BioJava can calculate profile-profile alignments.
>
> --
> student
>

-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From bugzilla-daemon at portal.open-bio.org Thu Oct 16 02:15:05 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 16 Oct 2008 02:15:05 -0400
Subject: [Biojava-dev] [Bug 2603] StringIndexOutOfBoundsException while parsing blastresult
In-Reply-To:
Message-ID: <200810160615.m9G6F5Tk014016@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2603


------- Comment #3 from tbanks at agr.gc.ca 2008-10-16 02:15 EST -------
Created an attachment (id=1007)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1007&action=view)
patch file 1 for bug 2603

From bugzilla-daemon at portal.open-bio.org Thu Oct 16 02:15:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 16 Oct 2008 02:15:46 -0400
Subject: [Biojava-dev] [Bug 2603] StringIndexOutOfBoundsException while parsing blastresult
In-Reply-To:
Message-ID: <200810160615.m9G6FkaF014096@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2603


------- Comment #4 from tbanks at agr.gc.ca 2008-10-16 02:15 EST -------
Created an attachment (id=1008)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1008&action=view)
patch file 2 for bug 2603
From bugzilla-daemon at portal.open-bio.org Thu Oct 16 02:18:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 16 Oct 2008 02:18:10 -0400
Subject: [Biojava-dev] [Bug 2603] StringIndexOutOfBoundsException while parsing blastresult
In-Reply-To:
Message-ID: <200810160618.m9G6IATb014290@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2603


------- Comment #5 from tbanks at agr.gc.ca 2008-10-16 02:18 EST -------
I've written up a fix for this bug. As Richard suspected, this fix takes care
of bug 2617 (I've tested both). I've attached the patch files for the two
affected files. If the patches don't take, let me know and I'll email the files.

- Travis

From f.jossinet at ibmc.u-strasbg.fr Thu Oct 16 04:50:54 2008
From: f.jossinet at ibmc.u-strasbg.fr (Fabrice Jossinet)
Date: Thu, 16 Oct 2008 10:50:54 +0200
Subject: [Biojava-dev] Proposition of participation to the BioJava project
In-Reply-To:
References:
Message-ID: <65EB20E6-6137-441B-AC13-26031D46BDFE@ibmc.u-strasbg.fr>

Dear Richard,

Thank you very much. I'm looking forward to this invitation.

All the best

Fabrice

On 15 Oct 08, at 18:25, Richard Holland wrote:

> You're absolutely welcome to contribute! We appreciate all the help
> we can get.
>
> I will be sending out an email to the BioJava mailing lists in the
> next couple of days inviting contributions for the new BioJava 3
> code and describing how to go about it. I think your RNA ideas would
> be a great starting point.
>
> cheers,
> Richard
>
> 2008/10/15 Fabrice Jossinet
> Dear BioJava team,
>
> my name is Fabrice Jossinet. I'm working as an assistant professor at a
> French university (Louis Pasteur University in Strasbourg).
> I have been developing bioinformatics tools in Java since 2002.
> Before that, I did a PhD as a molecular biologist at the bench ;)
> I'm interested in the study of RNA. At the moment I'm focused on their
> structural features, but I'm also interested in non-coding RNA genes
> in genomes.
> You can have a look at my current project at this address:
> http://paradise-ibmc.u-strasbg.fr/ . At present this project has 60,000
> lines of code and uses more than 10 external libraries.
>
> I have been following BioJava for several years now. I would like to
> extend it with RNA concepts. If you think that I can participate,
> don't hesitate to answer me ;)
>
> All the best
>
> Fabrice
>
> --
> Dr. Fabrice Jossinet
> Laboratoire de Bioinformatique, modelisation et simulation des acides nucleiques
> Universite Louis Pasteur
> Institut de biologie moleculaire et cellulaire du CNRS
> UPR9002, Architecture et Reactivite de l'ARN
> 15 rue Rene Descartes
> F-67084 Strasbourg Cedex
> France
>
> Tel + 33 (0) 3 88 417053
> FAX + 33 (0) 3 88 60 22 18
>
> f.jossinet at ibmc.u-strasbg.fr
> fjossinet at gmail.com
> http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
> http://fjossinet.u-strasbg.fr/
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/

From bugzilla-daemon at portal.open-bio.org Thu Oct 16 05:39:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 16 Oct 2008 05:39:11 -0400
Subject: [Biojava-dev] [Bug 2603] StringIndexOutOfBoundsException while parsing blastresult
In-Reply-To:
Message-ID: <200810160939.m9G9dBGm028921@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2603


------- Comment #6 from holland at ebi.ac.uk 2008-10-16 05:39 EST -------
Thanks for the patches!
Could you email me the complete two files that you've modified (it's easier for
me to just copy-and-paste the entire files)? I'll then commit them to SVN.

From fbristow at gmail.com Fri Oct 17 14:58:08 2008
From: fbristow at gmail.com (Franklin Bristow)
Date: Fri, 17 Oct 2008 13:58:08 -0500
Subject: [Biojava-dev] Writing Swissprot/Uniprot formatted files
Message-ID: <50a7756d0810171158k51aa3ee4l5f7078321633ebc5@mail.gmail.com>

Hello everyone,

I've been doing some work with swissprot, and I've been needing to make use of
the file reading and writing facilities in biojava.

I was using biojava 1.5, but I've recently moved to using biojava-live so that
I can actually step through the code to see what's going on.

I have successfully created an index of my swissprot database and I can read my
sequences out of that indexed database. All of the appropriate information is
loaded from the records in the file into the appropriate objects. I am quite
happy with this.

The problem that I am having has to do with writing swissprot records.

When I started using biojava, the recommended way to do this was using
SeqIOTools:

    SeqIOTools.writeSwissprot(byteStream, swissSequence);

While this works (i.e., no exceptions are thrown), the record that is printed
to the byteStream looks pretty ugly (it's littered with XX lines) and is not
valid as per the current swissprot file spec
(http://www.expasy.ch/sprot/userman.html). While this record is invalid, it
does contain all of the information that was originally in the swissprot file.
I would include what I get as an output here, but it's irrelevant.
SeqIOTools became deprecated in favour of this:

    RichSequence.IOTools.writeUniProt(byteStream, swissSequence, null);

Once again, while this works (and this time the record is valid), the record
that is printed contains almost none of the original information that is
contained in the swissprot record. This is the output that I get when I call
this method (the spacing may not look right because of fonts, but that is not
the problem):

> ID Q4UVA7_null STANDARD; 273 AA.
> AC Q4UVA7;
> DT null, integrated into UniProtKB/?.
> DT null, sequence version 0.
> DT null, entry version 0.
> DE null.
> FT any 1 273
> FT any 153 160
> SQ SEQUENCE 273 AA; 30853 MW; 604FB6C6437A9D90 CRC64;
> MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE
> RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF
> ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT
> EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF
> QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF
> //

But what I am expecting to see looks like this (again, the spacing is the
fault of the font, not the output):

> ID Y1953_XANC8 Reviewed; 273 AA.
> AC Q4UVA7;
> DT 10-JAN-2006, integrated into UniProtKB/Swiss-Prot.
> DT 05-JUL-2005, sequence version 1.
> DT 06-FEB-2007, entry version 12.
> DE UPF0085 protein XC_1953.
> GN OrderedLocusNames=XC_1953;
> OS Xanthomonas campestris pv. campestris (strain 8004).
> OC Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales;
> OC Xanthomonadaceae; Xanthomonas.
> OX NCBI_TaxID=314565;
> RN [1]
> RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
> RX PubMed=15899963; DOI=10.1101/gr.3378705;
> RA Qian W., Jia Y., Ren S.-X., He Y.-Q., Feng J.-X., Lu L.-F., Sun Q.,
> RA Ying G., Tang D.-J., Tang H., Wu W., Hao P., Wang L., Jiang B.-L.,
> RA Zeng S., Gu W.-Y., Lu G., Rong L., Tian Y., Yao Z., Fu G., Chen B.,
> RA Fang R., Qiang B., Chen Z., Zhao G.-P., Tang J.-L., He C.;
> RT "Comparative and functional genomic analyses of the pathogenicity of
> RT phytopathogen Xanthomonas campestris pv. campestris.";
> RL Genome Res. 15:757-767(2005).
> CC -!- SIMILARITY: Belongs to the UPF0085 family.
> CC -----------------------------------------------------------------------
> CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
> CC Distributed under the Creative Commons Attribution-NoDerivs License
> CC -----------------------------------------------------------------------
> DR EMBL; CP000050; AAY49016.1; -; Genomic_DNA.
> DR GenomeReviews; CP000050_GR; XC_1953.
> DR KEGG; xcb:XC_1953; -.
> DR GO; GO:0005524; F:ATP binding; IEA:HAMAP.
> DR HAMAP; MF_01062; -; 1.
> DR InterPro; IPR005177; DUF299.
> DR Pfam; PF03618; DUF299; 1.
> KW ATP-binding; Complete proteome; Nucleotide-binding.
> FT CHAIN 1 273 UPF0085 protein XC_1953.
> FT /FTId=PRO_0000196744.
> FT NP_BIND 153 160 ATP (Potential).
> SQ SEQUENCE 273 AA; 30853 MW; 604FB6C6437A9D90 CRC64;
> MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE
> RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF
> ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT
> EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF
> QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF
> //

Needless to say, there is a considerable loss of information.

At first I wasn't sure if this was a problem with parsing the database that I
had, so I inspected the object that was retrieved from the database. As I
mentioned before, the parsing seems to be working fine.
I get a SimpleSequence object that has all of the correct annotations and other
information loaded into it.

I then continued to step through the writeUniProt method in
RichSequence.IOTools and found that this method first calls "enrich" on the
SimpleSequence, which turns it into a SimpleRichSequence. There appears to be
some loss of information at this point, specifically in the feature set, where
the 'key name' is lost -- it just becomes 'any'.

It is when we get to the actual process of writing to the stream in
UniProtFormat.writeSequence that we have the problems. All of the code appears
to be there for printing out the information that I'm expecting. I think the
problem is that in the process of "enrich"-ing the sequence, the data is still
stored in the object, but it is no longer where it is expected to be. For
example, when we get to writing the comments out:

    // comments - if any
    if (!rs.getComments().isEmpty()) {

The List of comments IS empty, but there are comments in the
SimpleRichSequence; they are stored in the notes data member.

So, after this lengthy explanation of my problem, I am wondering if I am merely
not doing this correctly. Is there a better way to pass my information to the
writeUniProt method -- should I be transforming my SimpleSequence objects into
a SimpleRichSequence manually? Am I just going about this entirely the wrong
way?

If I am going about this correctly and the functionality to do this is merely
not there or hasn't been implemented correctly, I would be more than happy to
help out... I can supply patches, create bug reports, or anything else that is
necessary.

Any guidance in this matter would be greatly appreciated!
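The diagnosis above is that the comment text survives enrich but lands in the notes rather than in the comments list that the writer checks. The self-contained sketch below models that situation with hypothetical names (CommentPromotionSketch, ToyRecord, writeCommentBlock and promoteNotesToComments are illustrative, not BioJava API): a writer that only reads the comments list emits nothing until the notes are copied across, which is roughly the "do the conversion manually" route. The CC layout assumes the UniProt convention of a two-letter line code followed by three blanks, with continuation lines aligned after "-!- "; the 75-character width is an assumption, not something stated in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model, not BioJava API: comment text that sits in a notes map
 * is invisible to a writer that only looks at the comments list, until it is
 * copied across. Output lines follow the UniProt "CC   -!- TOPIC: text" layout.
 */
public class CommentPromotionSketch {

    /** Assumed maximum line width for UniProt-style flat-file lines. */
    public static final int WIDTH = 75;

    /** Toy record: comments is where the writer looks, notes is where the data ended up. */
    public static class ToyRecord {
        public final List<String[]> comments = new ArrayList<>();
        public final Map<String, String> notes = new LinkedHashMap<>();
    }

    /** Word-wraps "TOPIC: text" into CC lines; continuation lines align after "-!- ". */
    public static List<String> formatComment(String topic, String text) {
        List<String> out = new ArrayList<>();
        StringBuilder line = new StringBuilder("CC   -!- ");
        boolean lineHasWord = false;
        for (String word : (topic + ": " + text).split("\\s+")) {
            if (lineHasWord && line.length() + 1 + word.length() > WIDTH) {
                out.add(line.toString());
                line = new StringBuilder("CC       ");
                lineHasWord = false;
            }
            if (lineHasWord) {
                line.append(' ');
            }
            line.append(word);
            lineHasWord = true;
        }
        out.add(line.toString());
        return out;
    }

    /** Mimics a writer that emits CC lines only from the comments list. */
    public static List<String> writeCommentBlock(ToyRecord r) {
        List<String> out = new ArrayList<>();
        if (r.comments.isEmpty()) {
            return out; // same shape as the "comments - if any" check quoted above
        }
        for (String[] c : r.comments) {
            out.addAll(formatComment(c[0], c[1]));
        }
        return out;
    }

    /** Manual conversion: copy each note (topic -> text) into the comments list. */
    public static void promoteNotesToComments(ToyRecord r) {
        for (Map.Entry<String, String> e : r.notes.entrySet()) {
            r.comments.add(new String[] {e.getKey(), e.getValue()});
        }
    }

    public static void main(String[] args) {
        ToyRecord r = new ToyRecord();
        r.notes.put("SIMILARITY", "Belongs to the UPF0085 family.");
        System.out.println(writeCommentBlock(r).size()); // 0: nothing is written yet
        promoteNotesToComments(r);
        for (String line : writeCommentBlock(r)) {
            System.out.println(line);
        }
    }
}
```

Running main prints 0 and then the single line "CC   -!- SIMILARITY: Belongs to the UPF0085 family.". In real BioJavaX code the copying step would target the rich sequence's own comment API rather than this toy record.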
-- 
Franklin

From holland at eaglegenomics.com Fri Oct 17 16:08:25 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 17 Oct 2008 21:08:25 +0100
Subject: [Biojava-dev] Writing Swissprot/Uniprot formatted files
In-Reply-To: <50a7756d0810171158k51aa3ee4l5f7078321633ebc5@mail.gmail.com>
References: <50a7756d0810171158k51aa3ee4l5f7078321633ebc5@mail.gmail.com>
Message-ID:

Hello.

I'm not sure how you're getting your uniprot records out of your swissprot
database, or what format your swissprot database is in.

If it's BioSQL, then the way BioJava interacts with it has altered significantly
with BioJavaX - previous versions basically stuffed everything in as comments,
hence all the XX lines you got when writing it back out again.

However, if it's not BioSQL and you've written something custom of your own,
then I couldn't really comment!

BioJavaX will attempt to convert the old sequence objects into rich sequence
objects, but there's not much in common between the way uniprot data is stored
in the old object model and the new one. Therefore the enrich method can't do a
very good job - especially for stuff which the original parser stored as
comments instead of properly distributing it across the object model. Data
which the original parser stored in this comment format will mostly get ignored
by the conversion process, because the conversion process has no idea where the
record came from and therefore what to do with the comments inside it.

Your best bet is to read your data out of your database directly as rich
sequence objects or, if that's not possible, to do the conversion manually.

cheers,
Richard

2008/10/17 Franklin Bristow

> Hello everyone,
> I've been doing some work with swissprot, and I've been needing to make use
> of the file reading and writing facilities in biojava.
>
> I was using biojava 1.5, but I've recently moved to using biojava-live so
> that I can actually step through the code to see what's going on.
>
> I have successfully created an index of my swissprot database and I can
> read my sequences out of that indexed database. All of the appropriate
> information is loaded from the records in the file into the appropriate
> objects. I am quite happy with this.
>
> The problem that I am having has to do with writing swissprot records.
>
> When I started using biojava, the recommended way to do this was using
> SeqIOTools:
> SeqIOTools.writeSwissprot(byteStream, swissSequence);
>
> While this works (i.e., no exceptions are thrown), the record that is printed
> to the byteStream looks pretty ugly (it's littered with XX lines) and is not
> valid as per the current swissprot file spec
> (http://www.expasy.ch/sprot/userman.html). While this record is invalid, it
> does contain all of the information that was originally in the swissprot
> file. I would include what I get as an output here, but it's irrelevant.
>
> SeqIOTools became deprecated in favour of this:
> RichSequence.IOTools.writeUniProt(byteStream, swissSequence, null);
>
> Once again, while this works (and this time the record is valid), the record
> that is printed contains almost none of the original information that is
> contained in the swissprot record. This is the output that I get when I
> call this method (the spacing may not look right because of fonts, but
> that is not the problem):
>
> > ID Q4UVA7_null STANDARD; 273 AA.
> > AC Q4UVA7;
> > DT null, integrated into UniProtKB/?.
> > DT null, sequence version 0.
> > DT null, entry version 0.
> > DE null.
> > FT any 1 273 > > FT any 153 160 > > SQ SEQUENCE 273 AA; 30853 MW; 604FB6C6437A9D90 CRC64; > > MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE > > RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF > > ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT > > EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF > > QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF > > // > > > > But what I am expecting to see looks like this (again, the spacing is the > fault of the font, not the output): > > > ID Y1953_XANC8 Reviewed; 273 AA. > > AC Q4UVA7; > > DT 10-JAN-2006, integrated into UniProtKB/Swiss-Prot. > > DT 05-JUL-2005, sequence version 1. > > DT 06-FEB-2007, entry version 12. > > DE UPF0085 protein XC_1953. > > GN OrderedLocusNames=XC_1953; > > OS Xanthomonas campestris pv. campestris (strain 8004). > > OC Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales; > > OC Xanthomonadaceae; Xanthomonas. > > OX NCBI_TaxID=314565; > > RN [1] > > RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA]. > > RX PubMed=15899963; DOI=10.1101/gr.3378705; > > RA Qian W., Jia Y., Ren S.-X., He Y.-Q., Feng J.-X., Lu L.-F., Sun Q., > > RA Ying G., Tang D.-J., Tang H., Wu W., Hao P., Wang L., Jiang B.-L., > > RA Zeng S., Gu W.-Y., Lu G., Rong L., Tian Y., Yao Z., Fu G., Chen B., > > RA Fang R., Qiang B., Chen Z., Zhao G.-P., Tang J.-L., He C.; > > RT "Comparative and functional genomic analyses of the pathogenicity of > > RT phytopathogen Xanthomonas campestris pv. campestris."; > > RL Genome Res. 15:757-767(2005). > > CC -!- SIMILARITY: Belongs to the UPF0085 family. 
> > CC ------------------------------------------------------------ > > ----------- > > CC Copyrighted by the UniProt Consortium, see > > http://www.uniprot.org/terms > > CC Distributed under the Creative Commons Attribution-NoDerivs License > > CC ------------------------------------------------------------ > > ----------- > > DR EMBL; CP000050; AAY49016.1; -; Genomic_DNA. > > DR GenomeReviews; CP000050_GR; XC_1953. > > DR KEGG; xcb:XC_1953; -. > > DR GO; GO:0005524; F:ATP binding; IEA:HAMAP. > > DR HAMAP; MF_01062; -; 1. > > DR InterPro; IPR005177; DUF299. > > DR Pfam; PF03618; DUF299; 1. > > KW ATP-binding; Complete proteome; Nucleotide-binding. > > FT CHAIN 1 273 UPF0085 protein XC_1953. > > FT /FTId=PRO_0000196744. > > FT NP_BIND 153 160 ATP (Potential). > > SQ SEQUENCE 273 AA; 30853 MW; 604FB6C6437A9D90 CRC64; > > MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE > > RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF > > ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT > > EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF > > QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF > > // > > > > Needless to say, there is a considerable loss of information. > > At first I wasn't sure if this was a problem with parsing the database that > I had, so I inspected the object that was retrieved from the database. As > I > mentioned before, the parsing seems to be working fine. I get a > SimpleSequence object that has all of the correct annotations and other > information loaded into it. > > I then continued to step through the writeUniProt method in > RichSequence.IOTools and found that this method first calls "enrich" on > SimpleSequence which turns it into a SimpleRichSequence. There appears to > be some loss of information at this point, specifically in the feature set > where the 'key name' is lost -- it just becomes 'any'. 
> > It is when we get to the actual process of writing to the stream in > UniprotFormat.writeSequence that we have the problems. All of the code > appears to be there for printing the information out that I'm expecting. I > think the problem is that in the process of "enrich"-ing the sequence, the > data is still stored in the object, but it is no longer where it is > expected > to be. For example, when we get to writing the comments out: > // comments - if any > if (!rs.getComments().isEmpty()) { > > The List of comments IS empty, but there are comments in the > SimpleRichSequence, they are stored in the notes data member. > > So. After this lengthy explanation of my problem, I am wondering if I am > merely not doing this correctly. Is there a better way to pass my > information to the writeUniprot method -- should I be transforming my > SimpleSequence objects into a SimpleRichSequence manually? Am I just going > about this entirely the wrong way? > > If I am going about this correctly and the functionality to do this is > merely not there or hasn't been implemented correctly, I would be more than > happy to help out... I can supply patches, create bug reports, or anything > else that is necessary. > > Any guidance in this matter would be greatly appreciated! > > -- > Franklin > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From holland at eaglegenomics.com Sun Oct 19 20:18:29 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Mon, 20 Oct 2008 01:18:29 +0100 Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please! Message-ID: Hi all, I've just committed some new code to the biojava3 branch of the biojava-live subversion repository. 
It's the foundations of a brand new alphabet+symbol set of classes, and an example of how to use them to represent DNA. You'll notice that the new code is very lightweight and allows for a lot more flexibility than the old code - for instance, the concept of Alphabet has changed radically. It also makes much more extensive use of the Collections API. I haven't got any test cases or usage examples yet but give me a shout if you don't understand the code and I'll explain how it works. (Hint: SymbolFormat is there to convert Strings into SymbolList objects, and vice versa). So, now we want some volunteers! We're starting from scratch here so there's a lot of work to do. The whole of BioJava needs 'translating' into BJ3, whether it be copy-and-paste existing classes and modify them to suit the new style, or write completely new ones to provide equivalent functionality. I'll post an example of how to do file parsing soon, probably starting with FASTA. In the meantime, a good place to start would be for people to design object models to represent their favourite data types (e.g. Genbank, or microarray data). Utility classes to manipulate those objects would be great too. The object models need to be normalised as much as possible - e.g. if your data has a lot of comments, and the order of those comments is important, then give your object model a collection of comment objects. The object model for each data type should be completely independent and use basic data types wherever possible (e.g. store sequences as strings, don't attempt to parse them into anything fancy like SymbolLists). The closer the object model is to the original data format, the better. There's going to be clever tricks when it comes to converting data between different object models (e.g. Genbank to INSDSeq), which I will explain later when I put the file parsing examples up. You'll notice how the biojava3 branch uses Maven instead of Ant. 
This is because we want to make it as modular as possible, so if you want to write microarray stuff, create a new microarray sub-project (as per the dna example that's already there). This way if someone only wants the microarray bit of BJ3, they only need install the appropriate JAR file and can ignore the rest. (The 'core' module is for stuff that is so generic it could be used anywhere, or is used in every single other module.) If coding isn't your cup of tea, then we would very much welcome testers (particularly those who enjoy writing test cases!), documenters (particularly code commenters), translators (for internationalisation of the code), and of course all those who wish to contribute ideas and suggestions no matter how off-the-wall they might be. In particular if you'd like to take charge of an area of the development process, e.g. Documentation Chief, or Protein Champion, then that would be much appreciated. I'm very much looking forward to working with everyone on this. Good luck, and happy coding! cheers, Richard PS. Please don't forget to attach the appropriate licence to your code. You can copy-and-paste it from the existing classes I just committed this evening. PPS. For those who are worried about backwards compatibility - this was discussed on the lists a while back and it was made clear that BJ3 is a clean break. However, the existing code will continue to be maintained and bugfixed for a couple of years so you don't have to upgrade if you don't want to - it just won't have any new features developed for it. This is largely because it'll probably take just that long to write all the new BJ3 code. When we do decide to desupport the existing BJ code, plenty of notice will be given (i.e. years as opposed to months). 
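A minimal sketch of the kind of normalised object model described in this announcement, assuming hypothetical class names (SimpleGenbankRecord, RecordComment) that are not part of the biojava3 branch: the sequence is kept as a plain String rather than parsed into a SymbolList, and comments are held as an ordered collection of comment objects so that their original order survives.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical comment type: one free-text comment from a record.
final class RecordComment {
    private final String text;

    RecordComment(String text) {
        this.text = text;
    }

    String getText() {
        return text;
    }
}

// Hypothetical normalised model of a GenBank-style record. The sequence is
// kept as a plain String (no SymbolList), and comments live in a List so
// that their original order is preserved.
final class SimpleGenbankRecord {
    private final String accession;
    private final String sequence;
    private final List<RecordComment> comments = new ArrayList<>();

    SimpleGenbankRecord(String accession, String sequence) {
        this.accession = accession;
        this.sequence = sequence;
    }

    String getAccession() {
        return accession;
    }

    String getSequence() {
        return sequence;
    }

    // Appends a comment after any comments already added.
    void addComment(String text) {
        comments.add(new RecordComment(text));
    }

    // Read-only view of the comments, in insertion order.
    List<RecordComment> getComments() {
        return Collections.unmodifiableList(comments);
    }
}

class NormalisedRecordDemo {
    public static void main(String[] args) {
        SimpleGenbankRecord rec = new SimpleGenbankRecord("EXAMPLE1", "ATGCGTAA");
        rec.addComment("First comment, as it appeared in the file.");
        rec.addComment("Second comment.");
        for (RecordComment c : rec.getComments()) {
            System.out.println(rec.getAccession() + ": " + c.getText());
        }
    }
}
```

Converting between such models (e.g. Genbank to INSDSeq) is deliberately left out of this sketch, since the announcement defers that to the later file-parsing examples.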
-- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From markjschreiber at gmail.com Mon Oct 20 00:13:01 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Mon, 20 Oct 2008 12:13:01 +0800 Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please! In-Reply-To: References: Message-ID: <93b45ca50810192113g4ef0484cm2154f97c3c440f3f@mail.gmail.com> Hi - Just a comment ... Does an alphabet need to be a Singleton in this new paradigm? If it does then do you want to have an equals() method? Currently you could have: Alphabet a; Alphabet b; a.equals(b) //true; a == b //false Unless there is a strong reason why Alphabet needs to be a Singleton I don't think it should be (Singletons make life hard when transporting between JVMs). You can get a similar kind of behaviour with caching where it doesn't hurt if there is more than one instance of an equal alphabet but when they pass through the cache they can get cleaned up (like the interning behaviour of Strings). Put it this way. If I have two copies of the DNA alphabet, will it matter (other than a bit of memory waste)? - Mark On Mon, Oct 20, 2008 at 8:18 AM, Richard Holland wrote: > Hi all, > > I've just committed some new code to the biojava3 branch of the biojava-live > subversion repository. It's the foundations of a brand new alphabet+symbol > set of classes, and an example of how to use them to represent DNA. You'll > notice that the new code is very lightweight and allows for a lot more > flexibility than the old code - for instance, the concept of Alphabet has > changed radically. It also makes much more extensive use of the Collections > API. > > I haven't got any test cases or usage examples yet but give me a shout if > you don't understand the code and I'll explain how it works. (Hint: > SymbolFormat is there to convert Strings into SymbolList objects, and vice > versa). > > So, now we want some volunteers!
We're starting from scratch here so there's > a lot of work to do. The whole of BioJava needs 'translating' into BJ3, > whether it be copy-and-paste existing classes and modify them to suit the > new style, or write completely new ones to provide equivalent functionality. > > > I'll post an example of how to do file parsing soon, probably starting with > FASTA. In the meantime, a good place to start would be for people to design > object models to represent their favourite data types (e.g. Genbank, or > microarray data). Utility classes to manipulate those objects would be great > too. > > The object models need to be normalised as much as possible - e.g. if your > data has a lot of comments, and the order of those comments is important, > then give your object model a collection of comment objects. The object > model for each data type should be completely independent and use basic data > types wherever possible (e.g. store sequences as strings, don't attempt to > parse them into anything fancy like SymbolLists). The closer the object > model is to the original data format, the better. There's going to be clever > tricks when it comes to converting data between different object models > (e.g. Genbank to INSDSeq), which I will explain later when I put the file > parsing examples up. > > You'll notice how the biojava3 branch uses Maven instead of Ant. This is > because we want to make it as modular as possible, so if you want to write > microarray stuff, create a new microarray sub-project (as per the dna > example that's already there). This way if someone only wants the microarray > bit of BJ3, they only need install the appropriate JAR file and can ignore > the rest. (The 'core' module is for stuff that is so generic it could be > used anywhere, or is used in every single other module.) 
> > If coding isn't your cup of tea, then we would very much welcome testers > (particularly those who enjoy writing test cases!), documenters > (particularly code commenters), translators (for internationalisation of the > code), and of course all those who wish to contribute ideas and suggestions > no matter how off-the-wall they might be. In particular if you'd like to > take charge of an area of the development process, e.g. Documentation Chief, > or Protein Champion, then that would be much appreciated. > > I'm very much looking forward to working with everyone on this. Good luck, > and happy coding! > > cheers, > Richard > > PS. Please don't forget to attach the appropriate licence to your code. You > can copy-and-paste it from the existing classes I just committed this > evening. > > PPS. For those who are worried about backwards compatibility - this was > discussed on the lists a while back and it was made clear that BJ3 is a > clean break. However, the existing code will continue to be maintained and > bugfixed for a couple of years so you don't have to upgrade if you don't > want to - it just won't have any new features developed for it. This is > largely because it'll probably take just that long to write all the new BJ3 > code. When we do decide to desupport the existing BJ code, plenty of notice > will be given (i.e. years as opposed to months). > > > -- > Richard Holland, BSc MBCS > Finance Director, Eagle Genomics Ltd > M: +44 7500 438846 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From holland at eaglegenomics.com Mon Oct 20 04:23:17 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Mon, 20 Oct 2008 09:23:17 +0100 Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please! 
In-Reply-To: <93b45ca50810192113g4ef0484cm2154f97c3c440f3f@mail.gmail.com> References: <93b45ca50810192113g4ef0484cm2154f97c3c440f3f@mail.gmail.com> Message-ID: Good point, and the answer is no, it doesn't really matter! So I will remove the singleton-ishness of Alphabet. 2008/10/20 Mark Schreiber > Hi - > > Just a comment ... > > Does an alphabet need to be a Singleton in this new paradigm? If it > does then do you want to have an equals() method? Currently you could > have: > > Alphabet a; Alphabet b; > > a.equals(b) //true; > a == b //false > > Unless there is a strong reason why Alphabet needs to be a Singleton I > don't think it should be (Singletons make life hard when transporting > between JVMs). You can get a similar kind of behaviour with caching > where it doesn't hurt if there is more than one instance of an equal > alphabet but when they pass through the cache they can get cleaned up > (like the interning behaviour of Strings). > > Put it this way. If I have two copies of the DNA alphabet, will it > matter (other than a bit of memory waste)? > > - Mark > > On Mon, Oct 20, 2008 at 8:18 AM, Richard Holland > wrote: > > Hi all, > > > > I've just committed some new code to the biojava3 branch of the > biojava-live > > subversion repository. It's the foundations of a brand new > alphabet+symbol > > set of classes, and an example of how to use them to represent DNA. > You'll > > notice that the new code is very lightweight and allows for a lot more > > flexibility than the old code - for instance, the concept of Alphabet has > > changed radically. It also makes much more extensive use of the > Collections > > API. > > > > I haven't got any test cases or usage examples yet but give me a shout if > > you don't understand the code and I'll explain how it works. (Hint: > > SymbolFormat is there to convert Strings into SymbolList objects, and > vice > > versa). > > > > So, now we want some volunteers!
We're starting from scratch here so > there's > > a lot of work to do. The whole of BioJava needs 'translating' into BJ3, > > whether it be copy-and-paste existing classes and modify them to suit the > > new style, or write completely new ones to provide equivalent > functionality. > > > > > > I'll post an example of how to do file parsing soon, probably starting > with > > FASTA. In the meantime, a good place to start would be for people to > design > > object models to represent their favourite data types (e.g. Genbank, or > > microarray data). Utility classes to manipulate those objects would be > great > > too. > > > > The object models need to be normalised as much as possible - e.g. if > your > > data has a lot of comments, and the order of those comments is important, > > then give your object model a collection of comment objects. The object > > model for each data type should be completely independent and use basic > data > > types wherever possible (e.g. store sequences as strings, don't attempt > to > > parse them into anything fancy like SymbolLists). The closer the object > > model is to the original data format, the better. There's going to be > clever > > tricks when it comes to converting data between different object models > > (e.g. Genbank to INSDSeq), which I will explain later when I put the file > > parsing examples up. > > > > You'll notice how the biojava3 branch uses Maven instead of Ant. This is > > because we want to make it as modular as possible, so if you want to > write > > microarray stuff, create a new microarray sub-project (as per the dna > > example that's already there). This way if someone only wants the > microarray > > bit of BJ3, they only need install the appropriate JAR file and can > ignore > > the rest. (The 'core' module is for stuff that is so generic it could be > > used anywhere, or is used in every single other module.) 
> > > > If coding isn't your cup of tea, then we would very much welcome testers > > (particularly those who enjoy writing test cases!), documenters > > (particularly code commenters), translators (for internationalisation of > the > > code), and of course all those who wish to contribute ideas and > suggestions > > no matter how off-the-wall they might be. In particular if you'd like to > > take charge of an area of the development process, e.g. Documentation > Chief, > > or Protein Champion, then that would be much appreciated. > > > > I'm very much looking forward to working with everyone on this. Good > luck, > > and happy coding! > > > > cheers, > > Richard > > > > PS. Please don't forget to attach the appropriate licence to your code. > You > > can copy-and-paste it from the existing classes I just committed this > > evening. > > > > PPS. For those who are worried about backwards compatibility - this was > > discussed on the lists a while back and it was made clear that BJ3 is a > > clean break. However, the existing code will continue to be maintained > and > > bugfixed for a couple of years so you don't have to upgrade if you don't > > want to - it just won't have any new features developed for it. This is > > largely because it'll probably take just that long to write all the new > BJ3 > > code. When we do decide to desupport the existing BJ code, plenty of > notice > > will be given (i.e. years as opposed to months). 
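Mark's suggestion in this thread (value equality through equals() instead of a Singleton, plus an optional interning cache) can be sketched in plain Java as follows. ValueAlphabet and AlphabetCache are hypothetical names, not the classes committed to the biojava3 branch. Equality is defined by name and symbol set, so a second copy of the DNA alphabet, for example one deserialised in another JVM, is harmless; the cache canonicalises equal instances in the same spirit as String.intern().

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical value-style alphabet: instances with the same name and the
// same symbols are equal, so duplicates do no harm beyond some memory.
final class ValueAlphabet {
    private final String name;
    private final Set<Character> symbols;

    ValueAlphabet(String name, Set<Character> symbols) {
        this.name = name;
        // Defensive, read-only copy; equals/hashCode delegate to the set contents.
        this.symbols = Collections.unmodifiableSet(new LinkedHashSet<>(symbols));
    }

    String getName() { return name; }

    Set<Character> getSymbols() { return symbols; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ValueAlphabet)) return false;
        ValueAlphabet other = (ValueAlphabet) o;
        return name.equals(other.name) && symbols.equals(other.symbols);
    }

    @Override
    public int hashCode() {
        return 31 * name.hashCode() + symbols.hashCode();
    }
}

// Hypothetical optional cache: returns one canonical instance per equal alphabet.
final class AlphabetCache {
    private static final ConcurrentMap<ValueAlphabet, ValueAlphabet> CACHE =
            new ConcurrentHashMap<>();

    private AlphabetCache() {}

    // Returns the first cached instance equal to a, caching a if it is new.
    static ValueAlphabet intern(ValueAlphabet a) {
        ValueAlphabet existing = CACHE.putIfAbsent(a, a);
        return existing != null ? existing : a;
    }
}

class AlphabetDemo {
    public static void main(String[] args) {
        Set<Character> acgt = new LinkedHashSet<>(Arrays.asList('A', 'C', 'G', 'T'));
        ValueAlphabet a = new ValueAlphabet("DNA", acgt);
        ValueAlphabet b = new ValueAlphabet("DNA", acgt);
        System.out.println(a.equals(b));
        System.out.println(a == b);
        System.out.println(AlphabetCache.intern(a) == AlphabetCache.intern(b));
    }
}
```

With this design a == b can be false while a.equals(b) is true, which is exactly the situation Mark describes, so client code would compare alphabets with equals() rather than ==.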
> > > > > > -- > > Richard Holland, BSc MBCS > > Finance Director, Eagle Genomics Ltd > > M: +44 7500 438846 | E: holland at eaglegenomics.com > > http://www.eaglegenomics.com/ > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From fbristow at gmail.com Mon Oct 20 09:36:15 2008 From: fbristow at gmail.com (Franklin Bristow) Date: Mon, 20 Oct 2008 08:36:15 -0500 Subject: [Biojava-dev] Writing Swissprot/Uniprot formatted files In-Reply-To: References: <50a7756d0810171158k51aa3ee4l5f7078321633ebc5@mail.gmail.com> Message-ID: <50a7756d0810200636l4355f3cbj367b155e573e1612@mail.gmail.com> Hi Richard, I'm getting my records from an indexed flat file. I indexed the file using IndexTools.indexSwissprot(). I am then retrieving the records from the flat file "database" using the SequenceDBLite interface which is being provided to me using the Registry and SystemRegistry classes. 
The following is a simple example of what I am doing: First I index the flat file: > File[] files = new File[] { new File("/home/fbristow/db/uniprot_sprot.dat") > }; > try { > IndexTools.indexSwissprot("uniprot_sprot", new > File("/home/fbristow/db/index/uniprot_sprot"), files); > } catch (BioException bioE) { > bioE.printStackTrace(); > } catch (ParserException parseE) { > parseE.printStackTrace(); > } catch (IOException ioE) { > ioE.printStackTrace(); > } Then I get a handle on that file by doing: > Registry registry = SystemRegistry.instance(); > setSwissDatabase(registry.getDatabase("swissprot")); > And I have a file in /etc that tells the registry how to find the indexes with the swissprot identifier as per http://biojava.org/docs/api/org/biojava/directory/SystemRegistry.html Ultimately, this gives me a class that implements the interface SequenceDBLite, and when I query this interface for sequences it returns Sequence objects to me. I can't seem to see anything that would give me a RichSequence, so I think that I'll continue to get them in this manner, but I'll convert the Sequence objects into RichSequence objects myself. Thanks for your attention! On Fri, Oct 17, 2008 at 3:08 PM, Richard Holland wrote: > Hello. > > I'm not sure how you're getting your uniprot records out of your swissprot > database, or what format your swissprot database is in? If it's BioSQL, then > the way BioJava interacts with it has altered significantly with BioJavaX - > previous versions basically stuffed everything in as comments, hence all the > XX lines you got when writing it back out again. However if it's not BioSQL > and you've written something custom of your own, then I couldn't really > comment! > > BioJavaX will attempt to convert the old sequence objects into rich > sequence objects, but there's not much in common between the way uniprot > data is stored in the old object model and the new one.
Therefore the enrich > method can't do a very good job - especially for stuff which the original > parser stored as comments instead of properly distributing it across the > object model. Data which the original parser stored in this comment format > will mostly get ignored by the conversion process, because the conversion > process has no idea where the record came from and therefore what to do with > the comments inside it. > > Your best bet is to read your data out of your database directly as rich > sequence objects, or if not possible, then do the conversion manually. > > cheers, > Richard > > > 2008/10/17 Franklin Bristow > >> Hello everyone, >> I've been doing some work with swissprot, and I've been needing to make >> use >> of the file reading and writing facilities in biojava. >> >> I was using biojava 1.5, but I've recently moved to using biojava-live so >> that I can actually step through the code to see what's going on. >> >> I have successfully created an index of my swissprot database and I can >> read >> my sequences out of that indexed database. All of the appropriate >> information is loaded from the records in the file into the appropriate >> objects. I am quite happy with this. >> >> The problem that I am having has to do with writing swissprot records. >> >> When I started using biojava, the recommended way to do this was using >> SeqIOTools: >> SeqIOTools.writeSwissprot(byteStream, swissSequence); >> >> While this works (ie: no exceptions are thrown), the record that is >> printed >> to the byteStream looks pretty ugly (it's littered with XX lines) and is >> not >> valid as per the current swissprot file spec ( >> http://www.expasy.ch/sprot/userman.html). While this record is invalid, >> it >> does contain all of the information that was originally in the swissprot >> file. I would include what I get as an output here, but it's irrelevant. 
>> >> SeqIOTools became deprecated in favour of this: >> RichSequence.IOTools.writeUniProt(byteStream, swissSequence, null); >> >> Once again, while this works (and this time the record is valid), the >> record >> that is printed contains almost none of the original information that is >> contained in the swissprot record. This is the output that I get when I >> call this method (the spacing is may not look right because of fonts, but >> that is not the problem): >> >> ID Q4UVA7_null STANDARD; 273 AA. >> > AC Q4UVA7; >> > DT null, integrated into UniProtKB/?. >> > DT null, sequence version 0. >> > DT null, entry version 0. >> > DE null. >> > FT any 1 273 >> > FT any 153 160 >> > SQ SEQUENCE 273 AA; 30853 MW; 604FB6C6437A9D90 CRC64; >> > MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE >> > RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF >> > ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT >> > EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF >> > QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF >> > // >> > >> >> But what I am expecting to see looks like this (again, the spacing is the >> fault of the font, not the output): >> >> > ID Y1953_XANC8 Reviewed; 273 AA. >> > AC Q4UVA7; >> > DT 10-JAN-2006, integrated into UniProtKB/Swiss-Prot. >> > DT 05-JUL-2005, sequence version 1. >> > DT 06-FEB-2007, entry version 12. >> > DE UPF0085 protein XC_1953. >> > GN OrderedLocusNames=XC_1953; >> > OS Xanthomonas campestris pv. campestris (strain 8004). >> > OC Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales; >> > OC Xanthomonadaceae; Xanthomonas. >> > OX NCBI_TaxID=314565; >> > RN [1] >> > RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA]. 
>> > RX PubMed=15899963; DOI=10.1101/gr.3378705; >> > RA Qian W., Jia Y., Ren S.-X., He Y.-Q., Feng J.-X., Lu L.-F., Sun Q., >> > RA Ying G., Tang D.-J., Tang H., Wu W., Hao P., Wang L., Jiang B.-L., >> > RA Zeng S., Gu W.-Y., Lu G., Rong L., Tian Y., Yao Z., Fu G., Chen B., >> > RA Fang R., Qiang B., Chen Z., Zhao G.-P., Tang J.-L., He C.; >> > RT "Comparative and functional genomic analyses of the pathogenicity >> of >> > RT phytopathogen Xanthomonas campestris pv. campestris."; >> > RL Genome Res. 15:757-767(2005). >> > CC -!- SIMILARITY: Belongs to the UPF0085 family. >> > CC ------------------------------------------------------------ >> > ----------- >> > CC Copyrighted by the UniProt Consortium, see >> > http://www.uniprot.org/terms >> > CC Distributed under the Creative Commons Attribution-NoDerivs License >> > CC ------------------------------------------------------------ >> > ----------- >> > DR EMBL; CP000050; AAY49016.1; -; Genomic_DNA. >> > DR GenomeReviews; CP000050_GR; XC_1953. >> > DR KEGG; xcb:XC_1953; -. >> > DR GO; GO:0005524; F:ATP binding; IEA:HAMAP. >> > DR HAMAP; MF_01062; -; 1. >> > DR InterPro; IPR005177; DUF299. >> > DR Pfam; PF03618; DUF299; 1. >> > KW ATP-binding; Complete proteome; Nucleotide-binding. >> > FT CHAIN 1 273 UPF0085 protein XC_1953. >> > FT /FTId=PRO_0000196744. >> > FT NP_BIND 153 160 ATP (Potential). >> > SQ SEQUENCE 273 AA; 30853 MW; 604FB6C6437A9D90 CRC64; >> > MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE >> > RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF >> > ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT >> > EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF >> > QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF >> > // >> > >> >> Needless to say, there is a considerable loss of information. >> >> At first I wasn't sure if this was a problem with parsing the database >> that >> I had, so I inspected the object that was retrieved from the database. 
As >> I >> mentioned before, the parsing seems to be working fine. I get a >> SimpleSequence object that has all of the correct annotations and other >> information loaded into it. >> >> I then continued to step through the writeUniProt method in >> RichSequence.IOTools and found that this method first calls "enrich" on >> SimpleSequence which turns it into a SimpleRichSequence. There appears to >> be some loss of information at this point, specifically in the feature set >> where the 'key name' is lost -- it just becomes 'any'. >> >> It is when we get to the actual process of writing to the stream in >> UniprotFormat.writeSequence that we have the problems. All of the code >> appears to be there for printing the information out that I'm expecting. >> I >> think the problem is that in the process of "enrich"-ing the sequence, the >> data is still stored in the object, but it is no longer where it is >> expected >> to be. For example, when we get to writing the comments out: >> // comments - if any >> if (!rs.getComments().isEmpty()) { >> >> The List of comments IS empty, but there are comments in the >> SimpleRichSequence, they are stored in the notes data member. >> >> So. After this lengthy explanation of my problem, I am wondering if I am >> merely not doing this correctly. Is there a better way to pass my >> information to the writeUniprot method -- should I be transforming my >> SimpleSequence objects into a SimpleRichSequence manually? Am I just >> going >> about this entirely the wrong way? >> >> If I am going about this correctly and the functionality to do this is >> merely not there or hasn't been implemented correctly, I would be more >> than >> happy to help out... I can supply patches, create bug reports, or >> anything >> else that is necessary. >> >> Any guidance in this matter would be greatly appreciated! 
>> >> -- >> Franklin >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> > > > > -- > Richard Holland, BSc MBCS > Finance Director, Eagle Genomics Ltd > M: +44 7500 438846 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > -- Franklin From holland at eaglegenomics.com Mon Oct 20 09:51:36 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Mon, 20 Oct 2008 14:51:36 +0100 Subject: [Biojava-dev] BioJava3 contribution In-Reply-To: References: Message-ID: Excellent! Thanks for your offer of help! Yes, an advanced RNA module would be very helpful indeed. You should probably call it 'rna'. As long as everyone who intends to work on BJ3 declares their intentions here, as you just have, then basically it's first come, first served. I won't be doing any official supervision other than keeping an eye on committed code once in a while to make sure it all looks OK. So feel free to start coding straight away! All new modules should probably be set up as follows: 1. copy the existing dna module to something new, like 'rna' in this case. 2. remove all the hidden .svn directories from the copy. 3. update the pom.xml in the copy (do a search-and-replace on dna and change it to the new name, rna in this case), delete the existing source packages in src/main/java (org.biojava.dna) and create suitable new ones (org.biojava.rna in this case). 4. empty out the target/ folder, then svn add the new module. 5. svn:ignore the target/ directory in your new module. 6. include your new module in the list at the end of the pom.xml in the root directory of the biojava3 branch. cheers, Richard 2008/10/20 Fabrice Jossinet > Dear Richard, > > I'm answering your "official call" to offer my help with the > development of the biojava3 code.
With the modularity of Maven, I also would > like to proposes you my help for the development of a module that will use > the biojava3 code to manage more specialized RNA stuff (secondary and > tertiary structures, base-pairs classifications, modified nucleotides, RNA > alignments,....). > > What will be the next step for me? Will you make a selection? > > Best Regards > > Fabrice Jossinet > > -- > Dr. Fabrice Jossinet > Laboratoire de Bioinformatique, modelisation et simulation des acides > nucleiques > Universite Louis Pasteur > Institut de biologie moleculaire et cellulaire du CNRS > UPR9002, Architecture et Reactivite de l'ARN > 15 rue Rene Descartes > F-67084 Strasbourg Cedex > France > > Tel + 33 (0) 3 88 417053 > FAX + 33 (0) 3 88 60 22 18 > > f.jossinet at ibmc.u-strasbg.fr > fjossinet at gmail.com > http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html > http://fjossinet.u-strasbg.fr/ > > > > > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From holland at eaglegenomics.com Mon Oct 20 10:17:34 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Mon, 20 Oct 2008 15:17:34 +0100 Subject: [Biojava-dev] Writing Swissprot/Uniprot formatted files In-Reply-To: <50a7756d0810200636l4355f3cbj367b155e573e1612@mail.gmail.com> References: <50a7756d0810171158k51aa3ee4l5f7078321633ebc5@mail.gmail.com> <50a7756d0810200636l4355f3cbj367b155e573e1612@mail.gmail.com> Message-ID: Wow, I didn't know anyone was actually using the registry thing. I certainly never have! That's probably why it was left out of the whole update to RichSequences. There will probably be equivalent functionality in BioJava3 at some point but I doubt anyone will backport the RichSequence updates to the existing registry setup (unless there's any volunteers!). Good luck with the conversion process. 
cheers, Richard 2008/10/20 Franklin Bristow > Hi Richard, > I'm getting my records from an indexed flat file. I indexed the file using > IndexTools.indexSwissprot(). I am then retrieving the records from the flat > file "database" using the SequenceDBLite interface which is being provided > to me using the Registry and SystemRegistry classes. The following is a simple > example of what I am doing: > > First I index the flat file: > >> File[] files = new File[] { new >> File("/home/fbristow/db/uniprot_sprot.dat") }; >> try { >> IndexTools.indexSwissprot("uniprot_sprot", new >> File("/home/fbristow/db/index/uniprot_sprot"), files); >> } catch (BioException bioE) { >> bioE.printStackTrace(); >> } catch (ParserException parseE) { >> parseE.printStackTrace(); >> } catch (IOException ioE) { >> ioE.printStackTrace(); >> } > > > Then I get a handle on that file by doing: > >> Registry registry = SystemRegistry.instance(); >> setSwissDatabase(registry.getDatabase("swissprot")); >> > > And I have a file in /etc that tells the registry how to find the indexes > with the swissprot identifier as per > http://biojava.org/docs/api/org/biojava/directory/SystemRegistry.html > > Ultimately, this gives me a class that implements the interface > SequenceDBLite, and when I query this interface for sequences it returns to > me Sequence objects. I can't seem to see anything that would give me a > RichSequence, so I think that I'll continue to get them in this manner, but > I'll convert the Sequence objects into RichSequence objects myself. > > Thanks for your attention! > > > On Fri, Oct 17, 2008 at 3:08 PM, Richard Holland < > holland at eaglegenomics.com> wrote: > >> Hello. >> >> I'm not sure how you're getting your uniprot records out of your swissprot >> database, or what format your swissprot database is in?
If it's BioSQL, then >> the way BioJava interacts with it has altered significantly with BioJavaX - >> previous versions basically stuffed everything in as comments, hence all the >> XX lines you got when writing it back out again. However if it's not BioSQL >> and you've written something custom of your own, then I couldn't really >> comment! >> >> BioJavaX will attempt to convert the old sequence objects into rich >> sequence objects, but there's not much in common between the way uniprot >> data is stored in the old object model and the new one. Therefore the enrich >> method can't do a very good job - especially for stuff which the original >> parser stored as comments instead of properly distributing it across the >> object model. Data which the original parser stored in this comment format >> will mostly get ignored by the conversion process, because the conversion >> process has no idea where the record came from and therefore what to do with >> the comments inside it. >> >> Your best bet is to read your data out of your database directly as rich >> sequence objects, or if not possible, then do the conversion manually. >> >> cheers, >> Richard >> >> >> 2008/10/17 Franklin Bristow >> >>> Hello everyone, >>> I've been doing some work with swissprot, and I've been needing to make >>> use >>> of the file reading and writing facilities in biojava. >>> >>> I was using biojava 1.5, but I've recently moved to using biojava-live so >>> that I can actually step through the code to see what's going on. >>> >>> I have successfully created an index of my swissprot database and I can >>> read >>> my sequences out of that indexed database. All of the appropriate >>> information is loaded from the records in the file into the appropriate >>> objects. I am quite happy with this. >>> >>> The problem that I am having has to do with writing swissprot records. 
>>> >>> When I started using biojava, the recommended way to do this was using >>> SeqIOTools: >>> SeqIOTools.writeSwissprot(byteStream, swissSequence); >>> >>> While this works (ie: no exceptions are thrown), the record that is >>> printed >>> to the byteStream looks pretty ugly (it's littered with XX lines) and is >>> not >>> valid as per the current swissprot file spec ( >>> http://www.expasy.ch/sprot/userman.html). While this record is invalid, >>> it >>> does contain all of the information that was originally in the swissprot >>> file. I would include what I get as an output here, but it's irrelevant. >>> >>> SeqIOTools became deprecated in favour of this: >>> RichSequence.IOTools.writeUniProt(byteStream, swissSequence, null); >>> >>> Once again, while this works (and this time the record is valid), the >>> record >>> that is printed contains almost none of the original information that is >>> contained in the swissprot record. This is the output that I get when I >>> call this method (the spacing may not look right because of fonts, but >>> that is not the problem): >>> >>> ID Q4UVA7_null STANDARD; 273 AA. >>> > AC Q4UVA7; >>> > DT null, integrated into UniProtKB/?. >>> > DT null, sequence version 0. >>> > DT null, entry version 0. >>> > DE null. >>> > FT any 1 273 >>> > FT any 153 160 >>> > SQ SEQUENCE 273 AA; 30853 MW; 604FB6C6437A9D90 CRC64; >>> > MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE >>> > RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF >>> > ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT >>> > EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF >>> > QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF >>> > // >>> > >>> >>> But what I am expecting to see looks like this (again, the spacing is the >>> fault of the font, not the output): >>> >>> > ID Y1953_XANC8 Reviewed; 273 AA. >>> > AC Q4UVA7; >>> > DT 10-JAN-2006, integrated into UniProtKB/Swiss-Prot.
>>> > DT 05-JUL-2005, sequence version 1. >>> > DT 06-FEB-2007, entry version 12. >>> > DE UPF0085 protein XC_1953. >>> > GN OrderedLocusNames=XC_1953; >>> > OS Xanthomonas campestris pv. campestris (strain 8004). >>> > OC Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales; >>> > OC Xanthomonadaceae; Xanthomonas. >>> > OX NCBI_TaxID=314565; >>> > RN [1] >>> > RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA]. >>> > RX PubMed=15899963; DOI=10.1101/gr.3378705; >>> > RA Qian W., Jia Y., Ren S.-X., He Y.-Q., Feng J.-X., Lu L.-F., Sun >>> Q., >>> > RA Ying G., Tang D.-J., Tang H., Wu W., Hao P., Wang L., Jiang B.-L., >>> > RA Zeng S., Gu W.-Y., Lu G., Rong L., Tian Y., Yao Z., Fu G., Chen >>> B., >>> > RA Fang R., Qiang B., Chen Z., Zhao G.-P., Tang J.-L., He C.; >>> > RT "Comparative and functional genomic analyses of the pathogenicity >>> of >>> > RT phytopathogen Xanthomonas campestris pv. campestris."; >>> > RL Genome Res. 15:757-767(2005). >>> > CC -!- SIMILARITY: Belongs to the UPF0085 family. >>> > CC ------------------------------------------------------------ >>> > ----------- >>> > CC Copyrighted by the UniProt Consortium, see >>> > http://www.uniprot.org/terms >>> > CC Distributed under the Creative Commons Attribution-NoDerivs >>> License >>> > CC ------------------------------------------------------------ >>> > ----------- >>> > DR EMBL; CP000050; AAY49016.1; -; Genomic_DNA. >>> > DR GenomeReviews; CP000050_GR; XC_1953. >>> > DR KEGG; xcb:XC_1953; -. >>> > DR GO; GO:0005524; F:ATP binding; IEA:HAMAP. >>> > DR HAMAP; MF_01062; -; 1. >>> > DR InterPro; IPR005177; DUF299. >>> > DR Pfam; PF03618; DUF299; 1. >>> > KW ATP-binding; Complete proteome; Nucleotide-binding. >>> > FT CHAIN 1 273 UPF0085 protein XC_1953. >>> > FT /FTId=PRO_0000196744. >>> > FT NP_BIND 153 160 ATP (Potential). 
>>> > SQ SEQUENCE 273 AA; 30853 MW; 604FB6C6437A9D90 CRC64; >>> > MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE >>> > RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF >>> > ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT >>> > EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF >>> > QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF >>> > // >>> > >>> >>> Needless to say, there is a considerable loss of information. >>> >>> At first I wasn't sure if this was a problem with parsing the database >>> that >>> I had, so I inspected the object that was retrieved from the database. >>> As I >>> mentioned before, the parsing seems to be working fine. I get a >>> SimpleSequence object that has all of the correct annotations and other >>> information loaded into it. >>> >>> I then continued to step through the writeUniProt method in >>> RichSequence.IOTools and found that this method first calls "enrich" on >>> SimpleSequence which turns it into a SimpleRichSequence. There appears >>> to >>> be some loss of information at this point, specifically in the feature >>> set >>> where the 'key name' is lost -- it just becomes 'any'. >>> >>> It is when we get to the actual process of writing to the stream in >>> UniprotFormat.writeSequence that we have the problems. All of the code >>> appears to be there for printing the information out that I'm expecting. >>> I >>> think the problem is that in the process of "enrich"-ing the sequence, >>> the >>> data is still stored in the object, but it is no longer where it is >>> expected >>> to be. For example, when we get to writing the comments out: >>> // comments - if any >>> if (!rs.getComments().isEmpty()) { >>> >>> The List of comments IS empty, but there are comments in the >>> SimpleRichSequence, they are stored in the notes data member. >>> >>> So. After this lengthy explanation of my problem, I am wondering if I am >>> merely not doing this correctly. 
Is there a better way to pass my >>> information to the writeUniprot method -- should I be transforming my >>> SimpleSequence objects into a SimpleRichSequence manually? Am I just >>> going >>> about this entirely the wrong way? >>> >>> If I am going about this correctly and the functionality to do this is >>> merely not there or hasn't been implemented correctly, I would be more >>> than >>> happy to help out... I can supply patches, create bug reports, or >>> anything >>> else that is necessary. >>> >>> Any guidance in this matter would be greatly appreciated! >>> >>> -- >>> Franklin >>> _______________________________________________ >>> biojava-dev mailing list >>> biojava-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >>> >> >> >> >> -- >> Richard Holland, BSc MBCS >> Finance Director, Eagle Genomics Ltd >> M: +44 7500 438846 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> > > > > -- > Franklin > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From f.jossinet at ibmc.u-strasbg.fr Mon Oct 20 09:04:29 2008 From: f.jossinet at ibmc.u-strasbg.fr (Fabrice Jossinet) Date: Mon, 20 Oct 2008 15:04:29 +0200 Subject: [Biojava-dev] BioJava3 contribution Message-ID: Dear Richard, I'm answering to your "official call", to propose you my help for the development of the biojava3 code. With the modularity of Maven, I also would like to proposes you my help for the development of a module that will use the biojava3 code to manage more specialized RNA stuff (secondary and tertiary structures, base-pairs classifications, modified nucleotides, RNA alignments,....). What will be the next step for me? Will you make a selection? Best Regards Fabrice Jossinet -- Dr. 
Fabrice Jossinet Laboratoire de Bioinformatique, modelisation et simulation des acides nucleiques Universite Louis Pasteur Institut de biologie moleculaire et cellulaire du CNRS UPR9002, Architecture et Reactivite de l'ARN 15 rue Rene Descartes F-67084 Strasbourg Cedex France Tel + 33 (0) 3 88 417053 FAX + 33 (0) 3 88 60 22 18 f.jossinet at ibmc.u-strasbg.fr fjossinet at gmail.com http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html http://fjossinet.u-strasbg.fr/ From andreas at sdsc.edu Mon Oct 20 15:18:48 2008 From: andreas at sdsc.edu (Andreas Prlic) Date: Mon, 20 Oct 2008 12:18:48 -0700 Subject: [Biojava-dev] BioJava3 contribution In-Reply-To: References: Message-ID: <59a41c430810201218n194660e2udb17be18f8029779@mail.gmail.com> Hi Fabrice, Regarding the tertiary structure representation we should work together. There is a set of tools available already in the current biojava 1.7 which I was intending to maintain and migrate to biojava v3. Let me know if you have specific RNA-related requests... Andreas On Mon, Oct 20, 2008 at 6:51 AM, Richard Holland wrote: > Excellent! Thanks for your offer of help! > > Yes, an advanced RNA module would be very helpful indeed. You should > probably call it 'rna'. > > As long as everyone who intends to work on BJ3 declares their intentions > here, as you just have, then basically it's first come first served. I won't > be doing any official supervision other than keeping an eye on committed > code once in a while to make sure it all looks OK. So feel free to start > coding straight away! > > All new modules should probably start by: > > 1. copying the existing dna module to something new, like 'rna' in this > case. > 2. remove all the hidden .svn directories from the copy, > 3. update the pom.xml in the copy (do a search-and-replace on dna and change > to the new name, rna in this case), delete the existing source packages in > src/main/java (org.biojava.dna) and create suitable new ones > (org.biojava.rna in this case). > 4.
empty out the target/ folder then svn add the new module > 5. svn:ignore the target/ directory in your new module, > 6. include your new module in the list at the end of the pom.xml in the root > directory of the biojava3 branch. > > cheers, > Richard > > > > 2008/10/20 Fabrice Jossinet > >> Dear Richard, >> >> I'm answering to your "official call", to propose you my help for the >> development of the biojava3 code. With the modularity of Maven, I also would >> like to proposes you my help for the development of a module that will use >> the biojava3 code to manage more specialized RNA stuff (secondary and >> tertiary structures, base-pairs classifications, modified nucleotides, RNA >> alignments,....). >> >> What will be the next step for me? Will you make a selection? >> >> Best Regards >> >> Fabrice Jossinet >> >> -- >> Dr. Fabrice Jossinet >> Laboratoire de Bioinformatique, modelisation et simulation des acides >> nucleiques >> Universite Louis Pasteur >> Institut de biologie moleculaire et cellulaire du CNRS >> UPR9002, Architecture et Reactivite de l'ARN >> 15 rue Rene Descartes >> F-67084 Strasbourg Cedex >> France >> >> Tel + 33 (0) 3 88 417053 >> FAX + 33 (0) 3 88 60 22 18 >> >> f.jossinet at ibmc.u-strasbg.fr >> fjossinet at gmail.com >> http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html >> http://fjossinet.u-strasbg.fr/ >> >> >> >> >> > > > -- > Richard Holland, BSc MBCS > Finance Director, Eagle Genomics Ltd > M: +44 7500 438846 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From fjossinet at orange.fr Mon Oct 20 16:40:26 2008 From: fjossinet at orange.fr (Fabrice Jossinet) Date: Mon, 20 Oct 2008 22:40:26 +0200 Subject: [Biojava-dev] BioJava3 contribution In-Reply-To: <59a41c430810201218n194660e2udb17be18f8029779@mail.gmail.com> References: 
<59a41c430810201218n194660e2udb17be18f8029779@mail.gmail.com> Message-ID: <086C2EC4-C9AD-4C00-B348-F7D781C0F3EC@orange.fr> Hi Andreas, yes of course, I really would like to work with you (I like your work with SPICE). I wanted to contact you about this point before starting. Concerning the tertiary structure representation, I need to annotate an RNA tertiary structure with base-pair families (as described in http://www.ncbi.nlm.nih.gov/pubmed/12177293 or in http://prion.bchs.uh.edu/bp_type/ ) and structural motifs (like those listed in the SCOR database http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=308814). The idea is to attach these features to a 3D structure in the same way as features are attached to a sequence (1D). What do you think? Fabrice On 20 Oct. 08, at 21:18, Andreas Prlic wrote: > Hi Fabrice, > > Regarding the tertiary structure representation we should work > together. There is a set of tools available already in the current > biojava 1.7 which I was intending to maintain and migrate to biojava v > 3. Let me know if you have specific RNA related requests... > > Andreas > > On Mon, Oct 20, 2008 at 6:51 AM, Richard Holland > wrote: >> Excellent! Thanks for your offer of help! >> >> Yes, an advanced RNA module would be very helpful indeed. You should >> probably call it 'rna'. >> >> As long as everyone who intends to work on BJ3 declares their >> intentions >> here, as you just have, then basically it's first come first >> served. I won't >> be doing any official supervision other than keeping an eye on >> committed >> code once in a while to make sure it all looks OK. So feel free to >> start >> coding straight away! >> >> All new modules should probably start by: >> >> 1. copying the existing dna module to something new, like 'rna' in >> this >> case. >> 2. remove all the hidden .svn directories from the copy, >> 3.
update the pom.xml in the copy (do a search-and-replace on dna >> and change >> to the new name, rna in this case), delete the existing source >> packages in >> src/main/java (org.biojava.dna) and create suitable new ones >> (org.biojava.rna in this case). >> 4. empty out the target/ folder then svn add the new module >> 5. svn:ignore the target/ directory in your new module, >> 6. include your new module in the list at the end of the pom.xml in >> the root >> directory of the biojava3 branch. >> >> cheers, >> Richard >> >> >> >> 2008/10/20 Fabrice Jossinet >> >>> Dear Richard, >>> >>> I'm answering to your "official call", to propose you my help for >>> the >>> development of the biojava3 code. With the modularity of Maven, I >>> also would >>> like to proposes you my help for the development of a module that >>> will use >>> the biojava3 code to manage more specialized RNA stuff (secondary >>> and >>> tertiary structures, base-pairs classifications, modified >>> nucleotides, RNA >>> alignments,....). >>> >>> What will be the next step for me? Will you make a selection? >>> >>> Best Regards >>> >>> Fabrice Jossinet >>> >>> -- >>> Dr. 
Fabrice Jossinet >>> Laboratoire de Bioinformatique, modelisation et simulation des >>> acides >>> nucleiques >>> Universite Louis Pasteur >>> Institut de biologie moleculaire et cellulaire du CNRS >>> UPR9002, Architecture et Reactivite de l'ARN >>> 15 rue Rene Descartes >>> F-67084 Strasbourg Cedex >>> France >>> >>> Tel + 33 (0) 3 88 417053 >>> FAX + 33 (0) 3 88 60 22 18 >>> >>> f.jossinet at ibmc.u-strasbg.fr >>> fjossinet at gmail.com >>> http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html >>> http://fjossinet.u-strasbg.fr/ >>> >>> >>> >>> >>> >> >> >> -- >> Richard Holland, BSc MBCS >> Finance Director, Eagle Genomics Ltd >> M: +44 7500 438846 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> -- Dr. Fabrice Jossinet Laboratoire de Bioinformatique, modelisation et simulation des acides nucleiques Universite Louis Pasteur Institut de biologie moleculaire et cellulaire du CNRS UPR9002, Architecture et Reactivite de l'ARN 15 rue Rene Descartes F-67084 Strasbourg Cedex France Tel + 33 (0) 3 88 417053 FAX + 33 (0) 3 88 60 22 18 f.jossinet at ibmc.u-strasbg.fr fjossinet at gmail.com http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html http://fjossinet.u-strasbg.fr/ From markjschreiber at gmail.com Mon Oct 20 22:54:27 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Tue, 21 Oct 2008 10:54:27 +0800 Subject: [Biojava-dev] Biojava / BioSQL entity beans Message-ID: <93b45ca50810201954k44ab0f65xb94a0214d8eb4e13@mail.gmail.com> Hi - Richard has kindly uploaded some JPA Entity beans that map to the BioSQL database schema as a BioSQL module for BJ3. These entity beans were generated as part of the Tokyo webservices workshop. As Entities they are useful as POJOs as well as for data transfer via JPA and JAXB, and can be used in EJB containers or a plain old JVM.
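The conventions laid out in the package documentation quoted below (bean-style accessors, equality on the primary key only, a null key never equal to another object, no getClass() check because of JPA proxies, and a wrapper class that supplies biological behaviour such as IUPAC ambiguity) can be illustrated with a plain-Java sketch. FooEntity and FooSequence are hypothetical names, JPA annotations such as @Entity and @Id are left out so it runs without a persistence provider, and the IUPAC table is written out by hand rather than taken from BioJava's Symbol classes.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Illustrative sketch only: FooEntity and FooSequence are hypothetical names
// following the conventions described in the package documentation.
public class EntityEqualityDemo {

    // Bean-style entity: public no-arg constructor, public getters/setters,
    // equality and hashing based only on the primary key (the @Id field in JPA).
    public static class FooEntity {
        private Integer id;       // primary key; null until the entity is persisted
        private String sequence;  // raw residues: a 'w' is just the character 'w'

        public FooEntity() { }
        public FooEntity(Integer id) { this.id = id; }

        public Integer getId() { return id; }
        public void setId(Integer id) { this.id = id; }
        public String getSequence() { return sequence; }
        public void setSequence(String sequence) { this.sequence = sequence; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            // instanceof rather than getClass() so that JPA proxy subclasses still match
            if (!(o instanceof FooEntity)) return false;
            Integer otherId = ((FooEntity) o).getId();
            // a null primary key is never equal to a different object
            return id != null && otherId != null && id.equals(otherId);
        }

        @Override
        public int hashCode() {
            // must agree with equals, so it changes when the primary key is assigned
            return id == null ? 0 : id.hashCode();
        }
    }

    // Wrapper "view" that adds the biological behaviour the entity lacks
    // (IUPAC nucleotide ambiguity) and defines equality on the residues instead.
    public static class FooSequence {
        private final FooEntity entity;

        public FooSequence(FooEntity entity) { this.entity = entity; }

        // Bases that an IUPAC nucleotide code may stand for.
        static String expand(char code) {
            switch (Character.toLowerCase(code)) {
                case 'a': return "a";    case 'c': return "c";
                case 'g': return "g";    case 't': return "t";
                case 'r': return "ag";   case 'y': return "ct";
                case 's': return "cg";   case 'w': return "at";
                case 'k': return "gt";   case 'm': return "ac";
                case 'b': return "cgt";  case 'd': return "agt";
                case 'h': return "act";  case 'v': return "acg";
                case 'n': return "acgt";
                default: throw new IllegalArgumentException("not an IUPAC code: " + code);
            }
        }

        // True if the residue at pos (0-based) could be the given base.
        public boolean couldBe(int pos, char base) {
            return expand(entity.getSequence().charAt(pos)).indexOf(Character.toLowerCase(base)) >= 0;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof FooSequence
                    && Objects.equals(entity.getSequence(), ((FooSequence) o).entity.getSequence());
        }

        @Override
        public int hashCode() { return Objects.hashCode(entity.getSequence()); }
    }

    public static void main(String[] args) {
        // The hashed-collection pitfall: the key is assigned after insertion.
        FooEntity foo = new FooEntity();
        Set<FooEntity> set = new HashSet<FooEntity>();
        set.add(foo);
        foo.setId(42);                          // stands in for "persist and generate the PK"
        System.out.println(set.contains(foo));  // false: the entry sits under the old hash
        System.out.println(new HashSet<FooEntity>(set).contains(foo)); // true after rehashing

        foo.setSequence("awt");
        System.out.println(new FooSequence(foo).couldBe(1, 't')); // true: 'w' is 'a' or 't'
    }
}
```

The main method reproduces the HashSet scenario from the documentation: once the key is assigned, the lookup hashes to a different bucket, so contains returns false until the set is rebuilt. Basing the wrapper's equality on the residues keeps it independent of when the key is generated, although it carries the same caveat if the residues are modified after insertion.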
They have no biological smarts, and the intention was/is that these will be provided by wrapping them in Bio-aware (and more thread-safe) wrappers that implement interfaces from other BJ3 modules. In essence it is a persistence layer. The following is copied verbatim from the package-info.java and gives you some idea of how I intend the package to be used (obviously some of this is still to come). There is also some discussion of some of the gotchas that might trip you up when playing with object-relational persistence. BTW the naming convention is to call something FooEntity. Where BioSQL requires a compound primary key, this is implemented as an Embeddable object called FooEntityPK, which is the key for FooEntity. The other thing you may see is FooEntityUK, which is the same concept but represents some of the cases where BioSQL tables don't have a primary key (even a compound one) but implicitly do, because all the fields have the SQL unique constraint. In these cases JPA still requires an Embeddable key to track updates. As far as Java is concerned they are the same as a FooEntityPK, but I used a different name to make the distinction. The annotations provide mapping to tables from a Derby database. This is the reference Java in-memory DB, which can run from any JVM and is also found in Glassfish. The mappings will likely also work with MySQL. For Oracle (and possibly others) you would need to override the @GeneratedValue strategy for generating primary keys. I believe this can be done with external XML config files. You may also wish to override the default eager loading and cascade annotations depending on your JPA persistence method and preferences. This has been lightly tested using Glassfish, Derby and TopLink Essentials and is a work in progress, but seems to work OK. Best regards, - Mark /** * The package contains Entity representations of BioJava classes.
* The purpose of these entities is to allow simple serialization of BioJava data * using binary serialization for protocols that require this (e.g. RPC between * Java application servers) as well as persistence mechanisms that require * bean-like objects such as the Java Persistence API (JPA) or the * Java Architecture for XML Binding (JAXB). For this reason all objects in this package * should provide a parameterless public constructor and public get/set methods * for relevant fields. *

* Given the public nature of the constructors and the setters in these beans, * these classes are not intended for direct use in general programming when * using the BioJava v3 API. This is because it is possible to leave a bean in * an inconsistent state, and they are not thread safe unless synchronization is * controlled externally (via synchronization blocks or via an application container). *

* The Entities are intended to back other objects that a * programmer will interact with directly. For example Foo.class will be backed * by FooEntity.class. Generally interaction with Foo.class is to be preferred and * will often be more sensible, as the entities typically provide no 'biological * behaviour'. Relevant behaviour should be provided by the wrapping class. It is best * to think of Foo as a view onto the data that is held in the * FooEntity. A good example is the sophisticated Symbol * behaviour that can represent biological logic about IUPAC ambiguity symbols. * For example a 'w' in a Biosequence represents an ambiguity between 'a' and 't', * whereas a 'w' in BiosequenceEntity is simply a 'w' and nothing else. *

* The wrapper entity pattern is intended to allow for a lot of the advanced * behaviour in the original BioJava while also allowing use of modern transport * and persistence packages. This is achieved by persisting and transporting the * entity without the wrapper and re-wrapping it at the other end. *

* Currently BioJava v3 uses annotated @Id fields to define * equals(Object o). Consistent definition is critical to how * the object will behave when persisted to a database. In the case of: *

 * Foo f = ... initialize
 * Foo fo = ... initialize
 * boolean b = f.equals(fo);
 * 
* b would be true if both objects share the same value * (or embeddable object) in the field that represents the primary key in the * database, even if all other fields differ. This is desirable because * two entities representing the same DB record may be retrieved from two different * sessions. Additionally, these are the identity fields, so logically they should map to * the concept of identity. Finally, searching a collection is made very simple * without requiring an iterator: *
 * Integer id = //code to initialize
 * collection.contains(new Foo(id));
 * 
* By default BioJava v3 entities use only the primary key field for equality. * If either record has null as the primary key value it is never equal * to another. When implementing equals(Object o) it is not advisable to perform * the test this.getClass() == o.getClass() because of the possibility of proxy * classes used in JPA. Basing equality on the primary key can, however, lead to an issue with the * hashCode() method. Consider the following code: *
 * Foo foo = new Foo(); // no primary key yet
 * Set<Foo> set = new HashSet<Foo>();
 * set.add(foo);
 * // code here to persist foo and consequently generate its PK
 * boolean b = set.contains(foo);
 * 
* Because only the PK is used for equality, the PK must also be used in the hash code. * This means that b is probably going to be false, because foo * will have been stored in a hash bucket chosen with the old hash code, which is * now different, even though the set actually does contain a pointer to foo. * Although this is a potential deficiency, it is unlikely to be a major problem for * BioJava v3 developers, because using entity-backed objects is preferred to direct * interaction with entities. If you need to use entities directly, use hashed * collections with caution. * *

Wrapper classes can either delegate their equals call to the underlying * entity or do something that is more biologically sensible * (as PK values are typically not exposed in the wrapper). It is probably more * sensible for a wrapper to define its own equals (and hashCode) * implementations due to the limitations of the default @Id based system * described above, especially the potential hashCode problems. * * For example FooSequence.class might want to base * equality on an exact match of the DNA sequence it holds even though * FooSequenceEntity.class may only use the PK field. Whether or not delegation * is used, it should be clearly documented. *

* *

* @author Mark Schreiber */ package org.biojava.biosql.entity; From andreas at sdsc.edu Mon Oct 20 23:17:28 2008 From: andreas at sdsc.edu (Andreas Prlic) Date: Mon, 20 Oct 2008 20:17:28 -0700 Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please! In-Reply-To: References: Message-ID: <59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com> Hi, Couple of thoughts regarding biojava v3: License: Since it seems we will end up copying code from biojava 1.6 to biojava 3.0, we need to keep the license the same (LGPL 2.1). I.e. people should still use the same biojava license headers when committing new files and all code will be considered to be LGPL, if no header is present. Do NOT commit code under other licenses. Installation: We need some installation instructions on the wiki site, e.g. how to get the maven setup running. What are the code conventions for the new version? Blast: the Blast parsing modules are among the most frequently used ones in biojava 1.6. To make people use biojava v3 it will be crucial to have a port of them to the new version. Does anybody want to take care of that? Automated builds: is it interesting to have automated builds set up for the new version at this stage, or should we wait until a more mature stage? I could easily add another auto-build similar to the one for biojava 1.6 at http://www.spice-3d.org/cruise/ Andreas On Sun, Oct 19, 2008 at 5:18 PM, Richard Holland wrote: > Hi all, > > I've just committed some new code to the biojava3 branch of the biojava-live > subversion repository. It's the foundations of a brand new alphabet+symbol > set of classes, and an example of how to use them to represent DNA. You'll > notice that the new code is very lightweight and allows for a lot more > flexibility than the old code - for instance, the concept of Alphabet has > changed radically. It also makes much more extensive use of the Collections > API. 
> > I haven't got any test cases or usage examples yet but give me a shout if > you don't understand the code and I'll explain how it works. (Hint: > SymbolFormat is there to convert Strings into SymbolList objects, and vice > versa). > > So, now we want some volunteers! We're starting from scratch here so there's > a lot of work to do. The whole of BioJava needs 'translating' into BJ3, > whether it be copy-and-paste existing classes and modify them to suit the > new style, or write completely new ones to provide equivalent functionality. > > > I'll post an example of how to do file parsing soon, probably starting with > FASTA. In the meantime, a good place to start would be for people to design > object models to represent their favourite data types (e.g. Genbank, or > microarray data). Utility classes to manipulate those objects would be great > too. > > The object models need to be normalised as much as possible - e.g. if your > data has a lot of comments, and the order of those comments is important, > then give your object model a collection of comment objects. The object > model for each data type should be completely independent and use basic data > types wherever possible (e.g. store sequences as strings, don't attempt to > parse them into anything fancy like SymbolLists). The closer the object > model is to the original data format, the better. There's going to be clever > tricks when it comes to converting data between different object models > (e.g. Genbank to INSDSeq), which I will explain later when I put the file > parsing examples up. > > You'll notice how the biojava3 branch uses Maven instead of Ant. This is > because we want to make it as modular as possible, so if you want to write > microarray stuff, create a new microarray sub-project (as per the dna > example that's already there). This way if someone only wants the microarray > bit of BJ3, they only need install the appropriate JAR file and can ignore > the rest. 
(The 'core' module is for stuff that is so generic it could be > used anywhere, or is used in every single other module.) > > If coding isn't your cup of tea, then we would very much welcome testers > (particularly those who enjoy writing test cases!), documenters > (particularly code commenters), translators (for internationalisation of the > code), and of course all those who wish to contribute ideas and suggestions > no matter how off-the-wall they might be. In particular if you'd like to > take charge of an area of the development process, e.g. Documentation Chief, > or Protein Champion, then that would be much appreciated. > > I'm very much looking forward to working with everyone on this. Good luck, > and happy coding! > > cheers, > Richard > > PS. Please don't forget to attach the appropriate licence to your code. You > can copy-and-paste it from the existing classes I just committed this > evening. > > PPS. For those who are worried about backwards compatibility - this was > discussed on the lists a while back and it was made clear that BJ3 is a > clean break. However, the existing code will continue to be maintained and > bugfixed for a couple of years so you don't have to upgrade if you don't > want to - it just won't have any new features developed for it. This is > largely because it'll probably take just that long to write all the new BJ3 > code. When we do decide to desupport the existing BJ code, plenty of notice > will be given (i.e. years as opposed to months). 
> > > -- > Richard Holland, BSc MBCS > Finance Director, Eagle Genomics Ltd > M: +44 7500 438846 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From fjossinet at orange.fr Tue Oct 21 03:09:46 2008 From: fjossinet at orange.fr (Fabrice Jossinet) Date: Tue, 21 Oct 2008 09:09:46 +0200 Subject: [Biojava-dev] BioJava3 contribution In-Reply-To: References: Message-ID: Hi Richard, I did everything but, with my IntelliJ IDE, I cannot commit the new rna module due to a failure in authentication. Do I have to register somewhere to have an account? (but perhaps it's a wrong configuration on my side) Fabrice On 20 Oct. 08, at 15:51, Richard Holland wrote: > Excellent! Thanks for your offer of help! > > Yes, an advanced RNA module would be very helpful indeed. You should > probably call it 'rna'. > > As long as everyone who intends to work on BJ3 declares their > intentions here, as you just have, then basically it's first come > first served. I won't be doing any official supervision other than > keeping an eye on committed code once in a while to make sure it all > looks OK. So feel free to start coding straight away! > > All new modules should probably start by: > > 1. copying the existing dna module to something new, like 'rna' in > this case. > 2. remove all the hidden .svn directories from the copy, > 3. update the pom.xml in the copy (do a search-and-replace on dna > and change to the new name, rna in this case), delete the existing > source packages in src/main/java (org.biojava.dna) and create > suitable new ones (org.biojava.rna in this case). > 4. empty out the target/ folder then svn add the new module > 5. svn:ignore the target/ directory in your new module, > 6. include your new module in the list at the end of the pom.xml in > the root directory of the biojava3 branch.
> > cheers, > Richard > > > > 2008/10/20 Fabrice Jossinet > Dear Richard, > > I'm answering to your "official call", to propose you my help for > the development of the biojava3 code. With the modularity of Maven, > I also would like to proposes you my help for the development of a > module that will use the biojava3 code to manage more specialized > RNA stuff (secondary and tertiary structures, base-pairs > classifications, modified nucleotides, RNA alignments,....). > > What will be the next step for me? Will you make a selection? > > Best Regards > > Fabrice Jossinet > > -- > Dr. Fabrice Jossinet > Laboratoire de Bioinformatique, modelisation et simulation des acides > nucleiques > Universite Louis Pasteur > Institut de biologie moleculaire et cellulaire du CNRS > UPR9002, Architecture et Reactivite de l'ARN > 15 rue Rene Descartes > F-67084 Strasbourg Cedex > France > > Tel + 33 (0) 3 88 417053 > FAX + 33 (0) 3 88 60 22 18 > > f.jossinet at ibmc.u-strasbg.fr > fjossinet at gmail.com > http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html > http://fjossinet.u-strasbg.fr/ > > > > > > > > -- > Richard Holland, BSc MBCS > Finance Director, Eagle Genomics Ltd > M: +44 7500 438846 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ -- Dr. Fabrice Jossinet Laboratoire de Bioinformatique, modelisation et simulation des acides nucleiques Universite Louis Pasteur Institut de biologie moleculaire et cellulaire du CNRS UPR9002, Architecture et Reactivite de l'ARN 15 rue Rene Descartes F-67084 Strasbourg Cedex France Tel + 33 (0) 3 88 417053 FAX + 33 (0) 3 88 60 22 18 f.jossinet at ibmc.u-strasbg.fr fjossinet at gmail.com http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html http://fjossinet.u-strasbg.fr/ From holland at eaglegenomics.com Tue Oct 21 05:06:41 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 21 Oct 2008 10:06:41 +0100 Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please! 
In-Reply-To: <59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com> References: <59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com> Message-ID: > > > License: Since it seems we will end up copying code from biojava 1.6 > to biojava 3.0, we need to keep the license the same (LGPL 2.1). I.e. > people should still use the same biojava license headers when > committing new files and all code will be considered to be LGPL, if no > header is present. Do NOT commit code under other licenses. > > Installation: We need some installation instructions on the wiki site, > e.g. how to get the maven setup running. What are the code > conventions for the new version? Not sure where best to put it in the Wiki, but I agree it needs to go there somewhere. Installation is a one-liner from within the top level of the project: mvn install This compiles and installs the JARs into your local Maven repository, and also downloads and installs any external dependencies. Then you can add the installed modules as dependencies in your own Maven projects. If you need to write a launcher script for your project, or you want to use the JAR files outside Maven, you can use this command to generate the CLASSPATH for use outside Maven. This only includes external dependencies - you'll also need to add to it the individual JAR files from inside the various target/ folders that Maven built for you: mvn dependency:build-classpath Code conventions are simple: 1. I'm not fussed about the specific formatter people use in each module, as long as the code is all formatted using some kind of consistent method. I personally just use the default settings from Format code in NetBeans. 2. Use 'this' wherever possible, and for static references, use the classname prefix (e.g. MyClass.staticField). I hate having to try and work out in my head which references are going where, and which are static and which are not! 3. Comment every single method, even if it's private. 
This helps understand the flow of your code. Also comment liberally inside methods if they are longer than just a few lines (i.e. if you can't fit the entire method within the code panel in NetBeans, its going to need internal comments). 4. When writing getters/setters, follow the Java beans conventions so that automated frameworks like Spring can easily pick it up and work with it. 5. Please write tests for your code using JUnit conventions, inside the test/ folder of each module. I know I haven't done this myself yet, but I'm going to! > > > Blast: the Blast parsing modules are among the most frequently used > ones in biojava 1.6. To make people use biojava v3 it will be crucial > to have a port of them to the new version. Does anybody want to take > care of that? I'll second that. Blast is vital. We'd really appreciate a volunteer, please! > > Automated builds: is it interesting to have automated builds set up > for the new version at this stage, or should we wait until a more > mature stage? I could easily add another auto-build similar to the one > for biojava 1.6 at http://www.spice-3d.org/cruise/ You could do, although I don't think they'd be much use yet. But why not start early then we won't forget to do it later. Richard > > Andreas > > On Sun, Oct 19, 2008 at 5:18 PM, Richard Holland > wrote: > > Hi all, > > > > I've just committed some new code to the biojava3 branch of the > biojava-live > > subversion repository. It's the foundations of a brand new > alphabet+symbol > > set of classes, and an example of how to use them to represent DNA. > You'll > > notice that the new code is very lightweight and allows for a lot more > > flexibility than the old code - for instance, the concept of Alphabet has > > changed radically. It also makes much more extensive use of the > Collections > > API. > > > > I haven't got any test cases or usage examples yet but give me a shout if > > you don't understand the code and I'll explain how it works. 
(Hint: > > SymbolFormat is there to convert Strings into SymbolList objects, and > vice > > versa). > > > > So, now we want some volunteers! We're starting from scratch here so > there's > > a lot of work to do. The whole of BioJava needs 'translating' into BJ3, > > whether it be copy-and-paste existing classes and modify them to suit the > > new style, or write completely new ones to provide equivalent > functionality. > > > > > > I'll post an example of how to do file parsing soon, probably starting > with > > FASTA. In the meantime, a good place to start would be for people to > design > > object models to represent their favourite data types (e.g. Genbank, or > > microarray data). Utility classes to manipulate those objects would be > great > > too. > > > > The object models need to be normalised as much as possible - e.g. if > your > > data has a lot of comments, and the order of those comments is important, > > then give your object model a collection of comment objects. The object > > model for each data type should be completely independent and use basic > data > > types wherever possible (e.g. store sequences as strings, don't attempt > to > > parse them into anything fancy like SymbolLists). The closer the object > > model is to the original data format, the better. There's going to be > clever > > tricks when it comes to converting data between different object models > > (e.g. Genbank to INSDSeq), which I will explain later when I put the file > > parsing examples up. > > > > You'll notice how the biojava3 branch uses Maven instead of Ant. This is > > because we want to make it as modular as possible, so if you want to > write > > microarray stuff, create a new microarray sub-project (as per the dna > > example that's already there). This way if someone only wants the > microarray > > bit of BJ3, they only need install the appropriate JAR file and can > ignore > > the rest. 
(The 'core' module is for stuff that is so generic it could be > > used anywhere, or is used in every single other module.) > > > > If coding isn't your cup of tea, then we would very much welcome testers > > (particularly those who enjoy writing test cases!), documenters > > (particularly code commenters), translators (for internationalisation of > the > > code), and of course all those who wish to contribute ideas and > suggestions > > no matter how off-the-wall they might be. In particular if you'd like to > > take charge of an area of the development process, e.g. Documentation > Chief, > > or Protein Champion, then that would be much appreciated. > > > > I'm very much looking forward to working with everyone on this. Good > luck, > > and happy coding! > > > > cheers, > > Richard > > > > PS. Please don't forget to attach the appropriate licence to your code. > You > > can copy-and-paste it from the existing classes I just committed this > > evening. > > > > PPS. For those who are worried about backwards compatibility - this was > > discussed on the lists a while back and it was made clear that BJ3 is a > > clean break. However, the existing code will continue to be maintained > and > > bugfixed for a couple of years so you don't have to upgrade if you don't > > want to - it just won't have any new features developed for it. This is > > largely because it'll probably take just that long to write all the new > BJ3 > > code. When we do decide to desupport the existing BJ code, plenty of > notice > > will be given (i.e. years as opposed to months). 
> > > > > > -- > > Richard Holland, BSc MBCS > > Finance Director, Eagle Genomics Ltd > > M: +44 7500 438846 | E: holland at eaglegenomics.com > > http://www.eaglegenomics.com/ > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From holland at eaglegenomics.com Tue Oct 21 05:09:26 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 21 Oct 2008 10:09:26 +0100 Subject: [Biojava-dev] BioJava3 contribution In-Reply-To: References: Message-ID: Ah, yes. The person to talk to is Andreas. He has control over the SVN repository. 2008/10/21 Fabrice Jossinet > Hi Richard, > I did everything but, with my IntelliJ IDE, I cannot commit the new rna > module due to a failure in authentication. Do I have to register somewhere > to have an account? (but perhaps it's a wrong configuration on my side) > > Fabrice > > On 20 Oct. 08, at 15:51, Richard Holland wrote: > > Excellent! Thanks for your offer of help! > > Yes, an advanced RNA module would be very helpful indeed. You should > probably call it 'rna'. > > As long as everyone who intends to work on BJ3 declares their intentions > here, as you just have, then basically it's first come first served. I won't > be doing any official supervision other than keeping an eye on committed > code once in a while to make sure it all looks OK. So feel free to start > coding straight away! > > All new modules should probably start by: > > 1. copying the existing dna module to something new, like 'rna' in this > case. > 2. remove all the hidden .svn directories from the copy, > 3.
update the pom.xml in the copy (do a search-and-replace on dna and > change to the new name, rna in this case), delete the existing source > packages in src/main/java (org.biojava.dna) and create suitable new ones > (org.biojava.rna in this case). > 4. empty out the target/ folder then svn add the new module > 5. svn:ignore the target/ directory in your new module, > 6. include your new module in the list at the end of the pom.xml in the > root directory of the biojava3 branch. > > cheers, > Richard > > > > 2008/10/20 Fabrice Jossinet > >> Dear Richard, >> >> I'm answering to your "official call", to propose you my help for the >> development of the biojava3 code. With the modularity of Maven, I also would >> like to proposes you my help for the development of a module that will use >> the biojava3 code to manage more specialized RNA stuff (secondary and >> tertiary structures, base-pairs classifications, modified nucleotides, RNA >> alignments,....). >> >> What will be the next step for me? Will you make a selection? >> >> Best Regards >> >> Fabrice Jossinet >> >> -- >> Dr. Fabrice Jossinet >> Laboratoire de Bioinformatique, modelisation et simulation des acides >> nucleiques >> Universite Louis Pasteur >> Institut de biologie moleculaire et cellulaire du CNRS >> UPR9002, Architecture et Reactivite de l'ARN >> 15 rue Rene Descartes >> F-67084 Strasbourg Cedex >> France >> >> Tel + 33 (0) 3 88 417053 >> FAX + 33 (0) 3 88 60 22 18 >> >> f.jossinet at ibmc.u-strasbg.fr >> fjossinet at gmail.com >> http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html >> http://fjossinet.u-strasbg.fr/ >> >> >> >> >> > > > -- > Richard Holland, BSc MBCS > Finance Director, Eagle Genomics Ltd > M: +44 7500 438846 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > > > > > -- > Dr. 
Fabrice Jossinet > Laboratoire de Bioinformatique, modelisation et simulation des acides > nucleiques > Universite Louis Pasteur > Institut de biologie moleculaire et cellulaire du CNRS > UPR9002, Architecture et Reactivite de l'ARN > 15 rue Rene Descartes > F-67084 Strasbourg Cedex > France > > Tel + 33 (0) 3 88 417053 > FAX + 33 (0) 3 88 60 22 18 > > f.jossinet at ibmc.u-strasbg.fr > fjossinet at gmail.com > http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html > http://fjossinet.u-strasbg.fr/ > > > > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From markjschreiber at gmail.com Tue Oct 21 05:26:41 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Tue, 21 Oct 2008 17:26:41 +0800 Subject: [Biojava-dev] [Biojava-l] BioJava 3 Begins - Volunteers please! In-Reply-To: References: <59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com> Message-ID: <93b45ca50810210226t79cfbcbfhcadaedcfe8735676@mail.gmail.com> >> Blast: the Blast parsing modules are among the most frequently used >> ones in biojava 1.6. To make people use biojava v3 it will be crucial >> to have a port of them to the new version. Does anybody want to take >> care of that? > > > I'll second that. Blast is vital. We'd really appreciate a volunteer, > please! > BlastXML output would certainly be the easiest place to start. I also think with the new Thing/ ThingBuilder framework it will be possible to develop all manner of parsers for the vagaries of Blast text output that come with each new release of Blast. Possible but maybe not a good idea. I don't think that output was ever supposed to be machine readable. The table formatted output (-m8 I think) would be a better option. Given the DTD it should be possible to do a quick JAXB binding. How would that work in the Thing/ ThingBuilder paradigm? 
- Mark From holland at eaglegenomics.com Tue Oct 21 06:18:40 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 21 Oct 2008 11:18:40 +0100 Subject: [Biojava-dev] [Biojava-l] BioJava 3 Begins - Volunteers please! In-Reply-To: <93b45ca50810210226t79cfbcbfhcadaedcfe8735676@mail.gmail.com> References: <59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com> <93b45ca50810210226t79cfbcbfhcadaedcfe8735676@mail.gmail.com> Message-ID: JAXB would follow the exact same Thing/ThingBuilder pattern, but with the following subtle differences... 0. Your root data model object as generated by JAXB should be modified to implement Thing, making it a JAXBThing. 1. JAXBReader (extends ThingReader) would open and read the file using JAXB and directly construct JAXBThings. 2. JAXBReceiver (extends ThingReceiver) would be a pass-through interface with just one method, something like setJAXBThing() to pass in the already-parsed JAXBThing directly. 3. Any converters would expand/deflate data from other formats to/from the JAXBThing object directly. Richard. 2008/10/21 Mark Schreiber > >> Blast: the Blast parsing modules are among the most frequently used > >> ones in biojava 1.6. To make people use biojava v3 it will be crucial > >> to have a port of them to the new version. Does anybody want to take > >> care of that? > > > > > > I'll second that. Blast is vital. We'd really appreciate a volunteer, > > please! > > > > BlastXML output would certainly be the easiest place to start. I also > think with the new Thing/ ThingBuilder framework it will be possible > to develop all manner of parsers for the vagaries of Blast text output > that come with each new release of Blast. Possible but maybe not a > good idea. I don't think that output was ever supposed to be machine > readable. The table formatted output (-m8 I think) would be a better > option. > > Given the DTD it should be possible to do a quick JAXB binding. How > would that work in the Thing/ ThingBuilder paradigm?
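Richard's four-point outline for the JAXB variant can be sketched roughly as below. This is a minimal, hypothetical sketch: the interface names Thing, ThingReceiver, ThingReader and the method setJAXBThing() come from the thread, but every signature here (readNext, the generic bounds, the injected unmarshaller function) is invented for illustration, and BlastOutput is a hand-written stand-in for the class JAXB would generate from the BLAST XML DTD. The point it shows is that the receiver for a JAXB format has a single pass-through method, because the reader already hands over fully built objects.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-ins for the biojava3 core interfaces named in the
// thread; the method signatures here are invented for illustration.
interface Thing extends Serializable {}

interface ThingReceiver {}

interface ThingReader<R extends ThingReceiver> {
    /** Pushes the next record to the receiver; returns false when exhausted. */
    boolean readNext(R receiver);
}

// Point 0: the root object JAXB would generate, modified to implement Thing.
// Hand-written here (assumption) so the sketch runs without JAXB.
class BlastOutput implements Thing {
    private static final long serialVersionUID = 1L;
    private final String program;

    BlastOutput(String program) {
        this.program = program;
    }

    public String getProgram() {
        return this.program;
    }
}

// Point 2: a pass-through receiver with just one method.
interface JAXBReceiver<T extends Thing> extends ThingReceiver {
    void setJAXBThing(T thing);
}

// Point 1: a reader that unmarshals whole documents and hands each finished
// object straight to the receiver. The unmarshaller is injected so that a
// real JAXB-based one could be plugged in instead of a toy function.
class JAXBReader<T extends Thing> implements ThingReader<JAXBReceiver<T>> {
    private final List<String> documents;
    private final Function<String, T> unmarshaller;
    private int next = 0;

    JAXBReader(List<String> documents, Function<String, T> unmarshaller) {
        this.documents = new ArrayList<>(documents);
        this.unmarshaller = unmarshaller;
    }

    @Override
    public boolean readNext(JAXBReceiver<T> receiver) {
        if (this.next >= this.documents.size()) {
            return false;
        }
        receiver.setJAXBThing(this.unmarshaller.apply(this.documents.get(this.next++)));
        return true;
    }
}

// A builder for a JAXB format has nothing left to assemble: it just keeps the objects.
class JAXBBuilder<T extends Thing> implements JAXBReceiver<T> {
    private final List<T> built = new ArrayList<>();

    @Override
    public void setJAXBThing(T thing) {
        this.built.add(thing);
    }

    public List<T> getBuilt() {
        return this.built;
    }
}
```

In real use the toy function would be replaced by one wrapping an Unmarshaller obtained from JAXBContext.newInstance(...).createUnmarshaller(), with its checked JAXBException wrapped, since java.util.function.Function cannot throw checked exceptions.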
> > - Mark > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From dicknetherlands at gmail.com Tue Oct 21 07:14:29 2008 From: dicknetherlands at gmail.com (Richard Holland) Date: Tue, 21 Oct 2008 12:14:29 +0100 Subject: [Biojava-dev] [Biojava-l] File parsing in BJ3 In-Reply-To: <93b45ca50810210335j5ef4a206y545e5a1869cedc03@mail.gmail.com> References: <93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com> <48FD97AB.70503@ebi.ac.uk> <93b45ca50810210335j5ef4a206y545e5a1869cedc03@mail.gmail.com> Message-ID: For now, yes it's empty. But I can envisage situations where it might be nice to have Thing implement some common methods (e.g. isMachineGenerated(), isManuallyCurated(), etc.). I'd rather have it there now to be a placeholder for future expansion, than have to re-engineer everything should we identify a need for common functions in future. You'll see that Thing already extends Serializable, implying that all Things must be able to persist to an object backing store. Serializable itself is also an empty interface! Also I like the idea of having Thing, not Object, as a kind of marker of intention. To me it makes it clearer when reading code to avoid Object wherever possible. Thing may not be any more clever than Object, but it immediately declares an intention when reading code as to what kind of Object should be expected. 2008/10/21 Mark Schreiber > Is there any need for Thing at all? Can't a bulder be typed to produce > something that extends Object? > > If Thing provides no behaivour contract or meta-information then why > does it exist? > > - Mark > > On Tue, Oct 21, 2008 at 4:49 PM, Andy Yates wrote: > > Depends on what you want to program. If you want to have a collection of > > objects which are Things & perform a common action on them then > > annotations are not the way forward. 
> > > > If you want to have some kind of meta-programming occurring & need a > > class to be multiple things then annotations are right. There is > > currently no way to enforce compile time dependencies on annotations & > > my thinking is that this is right. Annotations should be meta data or > > provide a way to alter a class in a non-invasive way (think Web Service > > annotations creating WS Servers & Clients without any alteration of the > > class). > > > > Andy > > > > Richard Holland wrote: > >> Spot on. > >> > >> Annotation/interface.... i think Annotation is probably better as you > >> suggest, but I'd have to look into that. Not sure how it works with > >> collections and generics. If it does turn out to be a better bet, I'll > >> change it over. > >> > >> With the BioSQL dependencies, take a look at the pom.xml file inside the > >> biojava-dna module. It declares a dependency on biojava-core. If you > want to > >> add dependencies to external JARs, take a look at biojava-biosql's > pom.xml > >> to see how it depends on javax.persistence. (The easiest way to add > these is > >> via an IDE such as NetBeans, which is what I'm using at the moment). > >> > >> cheers, > >> Richard > >> > >> 2008/10/21 Mark Schreiber > >> > >>> So if I want to build a BioSQL loader from Genbank then would the > >>> classes (or there wrappers) in the BioSQL Entity package need to > >>> implement Thing? Would maven have an issue with that or would it just > >>> create a dependency on core? (you can tell I've never used Maven > >>> right). > >>> > >>> From a design point of view should Thing be an interface or an > >>> Annotation? The reason I ask is that it doesn't define any methods so > >>> it is more of a tag than an interface. > >>> > >>> Anyway, my understanding is that I would use a Genbank parser (or > >>> write one). 
Write a EntityReceiver interface (probably more than one > >>> given the number of entities in BioSQL, implement a EntityBuilder > >>> (again possibly more than one) that implements EntityReceiver and > >>> builds Entity beans from messages it receives. In this case I probably > >>> wouldn't provide a writer as JPA would be writing the beans to the > >>> database. Would this be how you imagine it? > >>> > >>> - Mark > >>> > >>> > >>> On Tue, Oct 21, 2008 at 1:52 AM, Richard Holland > >>> wrote: > >>>> (From now on I will only be posting these development messages to > >>>> biojava-dev, which is the intended purpose of that list. Those of you > who > >>>> wish to keep track of things but are currently only subscribed to > >>> biojava-l > >>>> should also subscribe to biojava-dev in order to keep up to date.) > >>>> > >>>> As promised, I've committed a new package in the biojava-core module > that > >>>> should help understand how to do file parsing and conversion and > writing > >>> in > >>>> the new BJ3 modules. Here's an example of how to use it to write a > >>> Genbank > >>>> parser (note no parsers actually exist yet!): > >>>> > >>>> 1. Design yourself a Genbank class which implements the interface > Thing > >>> and > >>>> can fully represent all the data that might possibly occur inside a > >>> Genbank > >>>> file. > >>>> > >>>> 2. Write an interface called GenbankReceiver, which extends > ThingReceiver > >>>> and defines all the methods you might need in order to construct a > >>> Genbank > >>>> object in an asynchronous fashion. > >>>> > >>>> 3. Write a GenbankBuilder class which implements GenbankReceiver and > >>>> ThingBuilder. It's job is to receive data via method calls, use that > data > >>> to > >>>> construct a Genbank object, then provide that object on demand. > >>>> > >>>> 4. Write a GenbankWriter class which implements GenbankReceiver and > >>>> ThingWriter. 
It's job is similar to GenbankBuilder, but instead of > >>>> constructing new Genbank objects, it writes Genbank records to file > that > >>>> reflect the data it receives. > >>>> > >>>> 5. Write a GenbankReader class which implements ThingReader. It can > read > >>>> GenbankFiles and output the data to the methods of the ThingReceiver > >>>> provided to it, which in this case could be anything which implements > the > >>>> interface GenbankReceiver. > >>>> > >>>> 6. Write a GenbankEmitter class which implements ThingEmitter. It > takes a > >>>> Genbank object and will fire off data from it to the provided > >>> ThingReceiver > >>>> (a GenbankReceiver instance) as if the Genbank object was being read > from > >>> a > >>>> file or some other source. > >>>> > >>>> That's it! OK so it's a minimum of 6 classes instead of the original 1 > or > >>> 2, > >>>> but the additional steps are necessary for flexibility in converting > >>> between > >>>> formats. > >>>> > >>>> Now to use it (you'll probably want a GenbankTools class to wrap these > >>> steps > >>>> up for user-friendliness, including various options for opening files, > >>>> etc.): > >>>> > >>>> 1. To read a file - instantiate ThingParser with your GenbankReader as > >>> the > >>>> reader, and GenbankBuilder as the receiver. Use the iterator methods > on > >>>> ThingParser to get the objects out. > >>>> > >>>> 2. To write a file - instantiate ThingParser with a GenbankEmitter > >>> wrapping > >>>> your Genbank object, and a GenbankWriter as the receiver. Use the > >>> parseAll() > >>>> method on the ThingParser to dump the whole lot to your chosen output. > >>>> > >>>> The clever bit comes when you want to convert between files. Imagine > >>> you've > >>>> done all the above for Genbank, and you've also done it for FASTA. How > to > >>>> convert between them? What you need to do is this: > >>>> > >>>> 1. Implement all the classes for both Genbank and FASTA. > >>>> > >>>> 2. 
Write a GenbankFASTAConverter class that implements > >>> ThingConverter > >>>> and GenbankReceiver, and will internally convert the data received and > >>> pass > >>>> it on out to the receiver provided, which will be a FASTAReceiver > >>> instance. > >>>> 3. Write a FASTAGenbankConverter class that operates in exactly the > >>> opposite > >>>> way, implementing ThingConverter and FASTAReceiver. > >>>> > >>>> Then to convert you use ThingParser again: > >>>> > >>>> 1. From FASTA file to Genbank object: Instantiate ThingParser with a > >>>> FASTAReader reader, a GenbankBuilder receiver, and add a > >>>> FASTAGenbankConverter instance to the converter chain. Use the > iterator > >>> to > >>>> get your Genbank objects out of your FASTA file. > >>>> > >>>> 2. From FASTA file to Genbank file: Same as option 1, but provide a > >>>> GenbankWriter instead and use parseAll() instead of the iterator > methos. > >>>> > >>>> 3. From FASTA object to Genbank object: Same as option 1, but provide > a > >>>> FASTAEmitter wrapping your FASTA object as the reader instead. > >>>> > >>>> 4. From FASTA object to Genbank file: Same as option 1, but swap both > the > >>>> reader and the receiver as per options 2 and 3. > >>>> > >>>> 5/6/7/8. From Genbank * to FASTA * - same as 1,2,3,4 but swap all > >>> mentions > >>>> of FASTA and Genbank, and use GenbankFASTAConverter instead. > >>>> > >>>> One last and very important feature of this approach is that if you > >>> discover > >>>> that nobody has written the appropriate converter for your chosen pair > of > >>>> formats A and C, but converters do exist to map A to some other format > B > >>> and > >>>> that other format B on to C, then you can just put the two converts > A-B > >>> and > >>>> B-C into the ThingParser chain and it'll work perfectly. > >>>> > >>>> Enjoy! 
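The converter-chaining idea (A-B plus B-C standing in for a missing A-C converter) can be illustrated with a small, hypothetical sketch. The receiver methods, the EMBL receiver used as the third format, and setReceiver() are all invented for illustration; in particular the chain here is wired by hand rather than through ThingParser, whose API is not shown in the thread. Each converter is a receiver for its source format that forwards translated calls to a receiver for its target format, so converters compose like links in a chain.

```java
import java.util.ArrayList;
import java.util.List;

interface ThingReceiver {}

// Hypothetical: a converter receives one format and forwards to another.
interface ThingConverter<T extends ThingReceiver> extends ThingReceiver {
    void setReceiver(T target);
}

// Invented, heavily simplified receivers for three formats.
interface FastaReceiver extends ThingReceiver {
    void setDescriptionLine(String description);
    void setSequence(String sequence);
}

interface GenbankReceiver extends ThingReceiver {
    void setDefinition(String definition);
    void setOrigin(String sequence);
}

interface EmblReceiver extends ThingReceiver {
    void setDescription(String description);
    void setSequenceData(String sequence);
}

// Converter A-B: FASTA calls in, Genbank calls out.
class FastaGenbankConverter implements FastaReceiver, ThingConverter<GenbankReceiver> {
    private GenbankReceiver target;

    public void setReceiver(GenbankReceiver target) { this.target = target; }
    public void setDescriptionLine(String description) { this.target.setDefinition(description); }
    public void setSequence(String sequence) { this.target.setOrigin(sequence); }
}

// Converter B-C: Genbank calls in, EMBL calls out.
class GenbankEmblConverter implements GenbankReceiver, ThingConverter<EmblReceiver> {
    private EmblReceiver target;

    public void setReceiver(EmblReceiver target) { this.target = target; }
    public void setDefinition(String definition) { this.target.setDescription(definition); }
    public void setOrigin(String sequence) { this.target.setSequenceData(sequence); }
}

// End of the chain: records which EMBL-level calls arrived.
class EmblCollector implements EmblReceiver {
    final List<String> events = new ArrayList<>();

    public void setDescription(String description) { this.events.add("DE " + description); }
    public void setSequenceData(String sequence) { this.events.add("SQ " + sequence); }
}
```

A FASTA reader (or emitter) would only ever see the head of the chain as a FastaReceiver, which is why no direct FASTA-to-EMBL converter is needed.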
> >>>> > >>>> cheers, > >>>> Richard > >>>> > >>>> -- > >>>> Richard Holland, BSc MBCS > >>>> Finance Director, Eagle Genomics Ltd > >>>> M: +44 7500 438846 | E: holland at eaglegenomics.com > >>>> http://www.eaglegenomics.com/ > >>>> _______________________________________________ > >>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org > >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l > >>>> > >> > >> > >> > > > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From markjschreiber at gmail.com Tue Oct 21 07:24:13 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Tue, 21 Oct 2008 19:24:13 +0800 Subject: [Biojava-dev] [Biojava-l] File parsing in BJ3 In-Reply-To: References: <93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com> <48FD97AB.70503@ebi.ac.uk> <93b45ca50810210335j5ef4a206y545e5a1869cedc03@mail.gmail.com> Message-ID: <93b45ca50810210424g5a9288f0w803e6d5ca4b840d3@mail.gmail.com> Depending on what you want them for isMachineGenerated(), isManuallyCurated(), would possibly be better as annotations (@MachineGenerated, @ManuallyCurated). This is true metadata. Probably if Java had annotations in version 1.1 Serializable would also be an Annotation. I would agree with the idea that ThingBuilder etc should be typed on extends Serializable. - Mark On Tue, Oct 21, 2008 at 7:14 PM, Richard Holland wrote: > For now, yes it's empty. But I can envisage situations where it might be > nice to have Thing implement some common methods (e.g. isMachineGenerated(), > isManuallyCurated(), etc.). I'd rather have it there now to be a placeholder > for future expansion, than have to re-engineer everything should we identify > a need for common functions in future. > > You'll see that Thing already extends Serializable, implying that all Things > must be able to persist to an object backing store. Serializable itself is > also an empty interface! 
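The two positions in this exchange (an empty Thing marker interface extending Serializable, versus Mark's suggestion of runtime annotations such as @MachineGenerated plus builders bounded on Serializable) can be compared side by side. This is an illustrative sketch only: the annotation names come from Mark's message, while getFinishedThing(), PredictedGene, CuratedGene and ThingMetadata are hypothetical. It also shows the trade-off Andy raises: the marker interface can be enforced by the compiler through generic bounds, whereas annotations are only visible through reflection at runtime.

```java
import java.io.Serializable;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Option 1: an empty marker interface, usable in generic bounds.
interface Thing extends Serializable {}

// Option 2: metadata as annotations retained at runtime (Mark's suggestion).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface MachineGenerated {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ManuallyCurated {}

// A builder typed on Serializable rather than Thing; method name is hypothetical.
interface ThingBuilder<T extends Serializable> {
    T getFinishedThing();
}

@MachineGenerated
class PredictedGene implements Thing {
    private static final long serialVersionUID = 1L;
}

@ManuallyCurated
class CuratedGene implements Thing {
    private static final long serialVersionUID = 1L;
}

class PredictedGeneBuilder implements ThingBuilder<PredictedGene> {
    @Override
    public PredictedGene getFinishedThing() {
        return new PredictedGene();
    }
}

// Annotation checks happen by reflection, not at compile time.
class ThingMetadata {
    static boolean isMachineGenerated(Object thing) {
        return thing.getClass().isAnnotationPresent(MachineGenerated.class);
    }

    static boolean isManuallyCurated(Object thing) {
        return thing.getClass().isAnnotationPresent(ManuallyCurated.class);
    }
}
```

Note that without RetentionPolicy.RUNTIME the isAnnotationPresent() checks would always return false, which is one practical detail to settle if the annotation route is taken.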
> > Also I like the idea of having Thing, not Object, as a kind of marker of > intention. To me it makes it clearer when reading code to avoid Object > wherever possible. Thing may not be any more clever than Object, but it > immediately declares an intention when reading code as to what kind of > Object should be expected. > > > 2008/10/21 Mark Schreiber >> >> Is there any need for Thing at all? Can't a bulder be typed to produce >> something that extends Object? >> >> If Thing provides no behaivour contract or meta-information then why >> does it exist? >> >> - Mark >> >> On Tue, Oct 21, 2008 at 4:49 PM, Andy Yates wrote: >> > Depends on what you want to program. If you want to have a collection of >> > objects which are Things & perform a common action on them then >> > annotations are not the way forward. >> > >> > If you want to have some kind of meta-programming occurring & need a >> > class to be multiple things then annotations are right. There is >> > currently no way to enforce compile time dependencies on annotations & >> > my thinking is that this is right. Annotations should be meta data or >> > provide a way to alter a class in a non-invasive way (think Web Service >> > annotations creating WS Servers & Clients without any alteration of the >> > class). >> > >> > Andy >> > >> > Richard Holland wrote: >> >> Spot on. >> >> >> >> Annotation/interface.... i think Annotation is probably better as you >> >> suggest, but I'd have to look into that. Not sure how it works with >> >> collections and generics. If it does turn out to be a better bet, I'll >> >> change it over. >> >> >> >> With the BioSQL dependencies, take a look at the pom.xml file inside >> >> the >> >> biojava-dna module. It declares a dependency on biojava-core. If you >> >> want to >> >> add dependencies to external JARs, take a look at biojava-biosql's >> >> pom.xml >> >> to see how it depends on javax.persistence. 
(The easiest way to add >> >> these is >> >> via an IDE such as NetBeans, which is what I'm using at the moment). >> >> >> >> cheers, >> >> Richard >> >> >> >> 2008/10/21 Mark Schreiber >> >> >> >>> So if I want to build a BioSQL loader from Genbank then would the >> >>> classes (or their wrappers) in the BioSQL Entity package need to >> >>> implement Thing? Would Maven have an issue with that or would it just >> >>> create a dependency on core? (you can tell I've never used Maven >> >>> right). >> >>> >> >>> From a design point of view should Thing be an interface or an >> >>> Annotation? The reason I ask is that it doesn't define any methods so >> >>> it is more of a tag than an interface. >> >>> >> >>> Anyway, my understanding is that I would use a Genbank parser (or >> >>> write one), write an EntityReceiver interface (probably more than one >> >>> given the number of entities in BioSQL), implement an EntityBuilder >> >>> (again possibly more than one) that implements EntityReceiver and >> >>> builds Entity beans from messages it receives. In this case I probably >> >>> wouldn't provide a writer as JPA would be writing the beans to the >> >>> database. Would this be how you imagine it? >> >>> >> >>> - Mark >> >>> >> >>> >> >>> On Tue, Oct 21, 2008 at 1:52 AM, Richard Holland >> >>> wrote: >> >>>> (From now on I will only be posting these development messages to >> >>>> biojava-dev, which is the intended purpose of that list. Those of you >> >>>> who >> >>>> wish to keep track of things but are currently only subscribed to >> >>> biojava-l >> >>>> should also subscribe to biojava-dev in order to keep up to date.) >> >>>> >> >>>> As promised, I've committed a new package in the biojava-core module >> >>>> that >> >>>> should help you understand how to do file parsing and conversion and >> >>>> writing >> >>> in >> >>>> the new BJ3 modules.
Here's an example of how to use it to write a >> >>> Genbank >> >>>> parser (note: no parsers actually exist yet!): >> >>>> >> >>>> 1. Design yourself a Genbank class which implements the interface >> >>>> Thing >> >>> and >> >>>> can fully represent all the data that might possibly occur inside a >> >>> Genbank >> >>>> file. >> >>>> >> >>>> 2. Write an interface called GenbankReceiver, which extends >> >>>> ThingReceiver >> >>>> and defines all the methods you might need in order to construct a >> >>> Genbank >> >>>> object in an asynchronous fashion. >> >>>> >> >>>> 3. Write a GenbankBuilder class which implements GenbankReceiver and >> >>>> ThingBuilder. Its job is to receive data via method calls, use that >> >>>> data >> >>> to >> >>>> construct a Genbank object, then provide that object on demand. >> >>>> >> >>>> 4. Write a GenbankWriter class which implements GenbankReceiver and >> >>>> ThingWriter. Its job is similar to GenbankBuilder, but instead of >> >>>> constructing new Genbank objects, it writes Genbank records to file >> >>>> that >> >>>> reflect the data it receives. >> >>>> >> >>>> 5. Write a GenbankReader class which implements ThingReader. It can >> >>>> read >> >>>> Genbank files and output the data to the methods of the ThingReceiver >> >>>> provided to it, which in this case could be anything which implements >> >>>> the >> >>>> interface GenbankReceiver. >> >>>> >> >>>> 6. Write a GenbankEmitter class which implements ThingEmitter. It >> >>>> takes a >> >>>> Genbank object and will fire off data from it to the provided >> >>> ThingReceiver >> >>>> (a GenbankReceiver instance) as if the Genbank object were being read >> >>>> from >> >>> a >> >>>> file or some other source. >> >>>> >> >>>> That's it! OK, so it's a minimum of 6 classes instead of the original >> >>>> 1 or >> >>> 2, >> >>>> but the additional steps are necessary for flexibility in converting >> >>> between >> >>>> formats.
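The six roles listed above can be sketched as follows. This is a hypothetical illustration, not the committed biojava-core package, and several parts of it are assumptions made to keep it short:

- the method signatures on `Thing`, `ThingReceiver`, `ThingBuilder`, `ThingWriter`, `ThingReader` and `ThingEmitter`;
- the callbacks `startRecord`, `setAccession`, `setSequence` and `endRecord`;
- the one-line `ACCESSION<TAB>SEQUENCE` record format, which stands in for real Genbank records.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Serializable;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical core interfaces; names from the message, signatures assumed.
interface Thing extends Serializable {}
interface ThingReceiver {}
interface ThingWriter {}
interface ThingBuilder<T extends Thing> { T getFinishedThing(); }
interface ThingReader<R extends ThingReceiver> {
    boolean canReadNextThing();
    void readNextThing(R receiver) throws IOException;
}
interface ThingEmitter<R extends ThingReceiver> {
    void emit(R receiver) throws IOException;
}

// Step 1: the data object (toy version: accession and sequence only).
class Genbank implements Thing {
    private static final long serialVersionUID = 1L;
    String accession;
    String sequence;
}

// Step 2: the callbacks needed to assemble one record piece by piece.
interface GenbankReceiver extends ThingReceiver {
    void startRecord();
    void setAccession(String accession);
    void setSequence(String sequence);
    void endRecord() throws IOException;
}

// Step 3: turns callbacks into a Genbank object.
class GenbankBuilder implements GenbankReceiver, ThingBuilder<Genbank> {
    private Genbank current;
    private Genbank finished;
    public void startRecord() { current = new Genbank(); }
    public void setAccession(String a) { current.accession = a; }
    public void setSequence(String s) { current.sequence = s; }
    public void endRecord() { finished = current; current = null; }
    public Genbank getFinishedThing() { return finished; }
}

// Step 4: same callbacks, but writes each record as a line of text instead.
class GenbankWriter implements GenbankReceiver, ThingWriter {
    private final Writer out;
    private String accession;
    private String sequence;
    GenbankWriter(Writer out) { this.out = out; }
    public void startRecord() { accession = null; sequence = null; }
    public void setAccession(String a) { accession = a; }
    public void setSequence(String s) { sequence = s; }
    public void endRecord() throws IOException { out.write(accession + "\t" + sequence + "\n"); }
}

// Step 5: parses toy "ACCESSION<TAB>SEQUENCE" lines and fires callbacks
// (no error handling for malformed lines).
class GenbankReader implements ThingReader<GenbankReceiver> {
    private final BufferedReader in;
    private String nextLine;
    GenbankReader(BufferedReader in) throws IOException {
        this.in = in;
        this.nextLine = in.readLine();
    }
    public boolean canReadNextThing() { return nextLine != null; }
    public void readNextThing(GenbankReceiver r) throws IOException {
        String[] fields = nextLine.split("\t", 2);
        r.startRecord();
        r.setAccession(fields[0]);
        r.setSequence(fields[1]);
        r.endRecord();
        nextLine = in.readLine();
    }
}

// Step 6: replays an existing object through the same callbacks.
class GenbankEmitter implements ThingEmitter<GenbankReceiver> {
    private final Genbank record;
    GenbankEmitter(Genbank record) { this.record = record; }
    public void emit(GenbankReceiver r) throws IOException {
        r.startRecord();
        r.setAccession(record.accession);
        r.setSequence(record.sequence);
        r.endRecord();
    }
}

class ReceiverDemo {
    public static void main(String[] args) throws IOException {
        GenbankReader reader = new GenbankReader(
                new BufferedReader(new StringReader("ABC123\tATGC\n")));
        GenbankBuilder builder = new GenbankBuilder();
        StringWriter text = new StringWriter();
        while (reader.canReadNextThing()) {
            reader.readNextThing(builder);                // file -> object
            new GenbankEmitter(builder.getFinishedThing())
                    .emit(new GenbankWriter(text));       // object -> text
        }
        System.out.print(text);
    }
}
```

Pairing the reader with the builder corresponds to reading a file. Pairing the emitter with the writer corresponds to writing one. Both the builder and the writer implement `GenbankReceiver`, so either can sit downstream of either source. That is the flexibility the extra classes buy.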
>> >>>> >> >>>> Now to use it (you'll probably want a GenbankTools class to wrap >> >>>> these >> >>> steps >> >>>> up for user-friendliness, including various options for opening >> >>>> files, >> >>>> etc.): >> >>>> >> >>>> 1. To read a file - instantiate ThingParser with your GenbankReader >> >>>> as >> >>> the >> >>>> reader, and GenbankBuilder as the receiver. Use the iterator methods >> >>>> on >> >>>> ThingParser to get the objects out. >> >>>> >> >>>> 2. To write a file - instantiate ThingParser with a GenbankEmitter >> >>> wrapping >> >>>> your Genbank object, and a GenbankWriter as the receiver. Use the >> >>> parseAll() >> >>>> method on the ThingParser to dump the whole lot to your chosen >> >>>> output. >> >>>> >> >>>> The clever bit comes when you want to convert between files. Imagine >> >>> you've >> >>>> done all the above for Genbank, and you've also done it for FASTA. >> >>>> How to >> >>>> convert between them? What you need to do is this: >> >>>> >> >>>> 1. Implement all the classes for both Genbank and FASTA. >> >>>> >> >>>> 2. Write a GenbankFASTAConverter class that implements >> >>> ThingConverter >> >>>> and GenbankReceiver, and will internally convert the data received >> >>>> and >> >>> pass >> >>>> it on out to the receiver provided, which will be a FASTAReceiver >> >>> instance. >> >>>> 3. Write a FASTAGenbankConverter class that operates in exactly the >> >>> opposite >> >>>> way, implementing ThingConverter and FASTAReceiver. >> >>>> >> >>>> Then to convert you use ThingParser again: >> >>>> >> >>>> 1. From FASTA file to Genbank object: Instantiate ThingParser with a >> >>>> FASTAReader reader, a GenbankBuilder receiver, and add a >> >>>> FASTAGenbankConverter instance to the converter chain. Use the >> >>>> iterator >> >>> to >> >>>> get your Genbank objects out of your FASTA file. >> >>>> >> >>>> 2. 
From FASTA file to Genbank file: Same as option 1, but provide a >> >>>> GenbankWriter instead and use parseAll() instead of the iterator >> >>>> methods. >> >>>> >> >>>> 3. From FASTA object to Genbank object: Same as option 1, but provide >> >>>> a >> >>>> FASTAEmitter wrapping your FASTA object as the reader instead. >> >>>> >> >>>> 4. From FASTA object to Genbank file: Same as option 1, but swap both >> >>>> the >> >>>> reader and the receiver as per options 2 and 3. >> >>>> >> >>>> 5/6/7/8. From Genbank * to FASTA * - same as 1,2,3,4 but swap all >> >>> mentions >> >>>> of FASTA and Genbank, and use GenbankFASTAConverter instead. >> >>>> >> >>>> One last and very important feature of this approach is that if you >> >>> discover >> >>>> that nobody has written the appropriate converter for your chosen >> >>>> pair of >> >>>> formats A and C, but converters do exist to map A to some other >> >>>> format B >> >>> and >> >>>> that other format B on to C, then you can just put the two converters >> >>>> A-B >> >>> and >> >>>> B-C into the ThingParser chain and it'll work perfectly. >> >>>> >> >>>> Enjoy!
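Converter chaining works because each converter implements the receiver interface of its input format and forwards translated calls to a receiver of its output format. Two converters therefore compose by simple nesting. The sketch below uses three toy formats. The names `AReceiver`, `BReceiver`, `CReceiver`, `AtoBConverter`, `BtoCConverter`, `CCollector` and `ChainDemo` are hypothetical. The chain is wired by hand rather than through `ThingParser`, because that class's API is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Toy receiver interfaces for three formats A, B and C (all hypothetical).
interface AReceiver { void id(String id); void residues(String residues); }
interface BReceiver { void accession(String acc); void sequence(String seq); }
interface CReceiver { void record(String text); }

// Converter A->B: accepts A callbacks and re-emits them as B callbacks.
class AtoBConverter implements AReceiver {
    private final BReceiver next;
    AtoBConverter(BReceiver next) { this.next = next; }
    public void id(String id) { next.accession(id); }
    // Example translation step: format B is assumed to use upper-case residues.
    public void residues(String r) { next.sequence(r.toUpperCase()); }
}

// Converter B->C: buffers the accession and emits one C record per sequence.
class BtoCConverter implements BReceiver {
    private final CReceiver next;
    private String pendingAccession;
    BtoCConverter(CReceiver next) { this.next = next; }
    public void accession(String acc) { pendingAccession = acc; }
    public void sequence(String seq) { next.record(pendingAccession + ":" + seq); }
}

// Terminal receiver for C that simply collects what it is sent.
class CCollector implements CReceiver {
    final List<String> records = new ArrayList<>();
    public void record(String text) { records.add(text); }
}

class ChainDemo {
    // A -> B -> C by nesting: each converter wraps the next receiver.
    static AReceiver chain(CReceiver sink) {
        return new AtoBConverter(new BtoCConverter(sink));
    }

    public static void main(String[] args) {
        CCollector sink = new CCollector();
        AReceiver in = chain(sink);
        // A reader or emitter for format A would make these calls.
        in.id("seq1");
        in.residues("acgt");
        System.out.println(sink.records); // [seq1:ACGT]
    }
}
```

Only the outermost converter needs to know about format A, and the sink only ever sees format C. Adding a further hop means wrapping one more converter.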
>> >>>> >> >>>> cheers, >> >>>> Richard >> >>>> >> >>>> -- >> >>>> Richard Holland, BSc MBCS >> >>>> Finance Director, Eagle Genomics Ltd >> >>>> M: +44 7500 438846 | E: holland at eaglegenomics.com >> >>>> http://www.eaglegenomics.com/ >> >>>> _______________________________________________ >> >>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >>>> >> >> >> >> >> >> >> > > > > > -- > Richard Holland, BSc MBCS > Finance Director, Eagle Genomics Ltd > M: +44 7500 438846 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > From andreas at sdsc.edu Tue Oct 21 07:31:40 2008 From: andreas at sdsc.edu (Andreas Prlic) Date: Tue, 21 Oct 2008 04:31:40 -0700 Subject: [Biojava-dev] BioJava3 contribution In-Reply-To: <086C2EC4-C9AD-4C00-B348-F7D781C0F3EC@orange.fr> References: <59a41c430810201218n194660e2udb17be18f8029779@mail.gmail.com> <086C2EC4-C9AD-4C00-B348-F7D781C0F3EC@orange.fr> Message-ID: <59a41c430810210431v2a9e1647w6a6fca991926f175@mail.gmail.com> Hi Fabrice, The biojava 1 features could only accept integer positions as start and stop. For protein structures, an amino acid is uniquely identified by a number and an insertion code. As such, in the biojava 1 world it was not possible to implement this for protein structures. If we have a cleaner interface definition for that in biojava 3, it should be no problem. Andreas On Mon, Oct 20, 2008 at 1:40 PM, Fabrice Jossinet wrote: > Hi Andreas, > yes of course, I really would like to work with you (I like your work with > SPICE). I wanted to contact you about this point before starting.
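Structure features need positions that are more than plain integers. One way to provide them is a small position type that pairs the residue number with its insertion code. The sketch below is hypothetical: `ResidueNumber`, `StructureFeature` and `ResidueDemo` are not BioJava classes. It assumes PDB-style numbering such as 52, 52A, 52B, 53. Ordering by number and then by insertion code matches the common case, but the order in which residues appear in the file remains authoritative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical position type: a residue number plus an optional insertion code.
final class ResidueNumber implements Comparable<ResidueNumber> {
    final int seqNum;
    final char insCode; // ' ' means "no insertion code"

    ResidueNumber(int seqNum, char insCode) {
        this.seqNum = seqNum;
        this.insCode = insCode;
    }

    // Parses labels such as "52", "52A" or "-3".
    static ResidueNumber parse(String label) {
        char last = label.charAt(label.length() - 1);
        if (Character.isLetter(last)) {
            return new ResidueNumber(
                    Integer.parseInt(label.substring(0, label.length() - 1)), last);
        }
        return new ResidueNumber(Integer.parseInt(label), ' ');
    }

    // Number first, then insertion code; ' ' sorts before any letter.
    @Override
    public int compareTo(ResidueNumber o) {
        int c = Integer.compare(seqNum, o.seqNum);
        return c != 0 ? c : Character.compare(insCode, o.insCode);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ResidueNumber && compareTo((ResidueNumber) o) == 0;
    }

    @Override
    public int hashCode() { return Objects.hash(seqNum, insCode); }

    @Override
    public String toString() {
        return insCode == ' ' ? Integer.toString(seqNum) : seqNum + String.valueOf(insCode);
    }
}

// A feature located on a structure by residue positions rather than int offsets.
final class StructureFeature {
    final String type;
    final ResidueNumber start;
    final ResidueNumber end;

    StructureFeature(String type, ResidueNumber start, ResidueNumber end) {
        this.type = type;
        this.start = start;
        this.end = end;
    }

    boolean contains(ResidueNumber r) {
        return start.compareTo(r) <= 0 && r.compareTo(end) <= 0;
    }
}

class ResidueDemo {
    public static void main(String[] args) {
        List<ResidueNumber> residues = new ArrayList<>();
        for (String label : new String[] {"53", "52A", "52", "52B"}) {
            residues.add(ResidueNumber.parse(label));
        }
        Collections.sort(residues);
        System.out.println(residues); // [52, 52A, 52B, 53]
        StructureFeature motif = new StructureFeature(
                "motif", ResidueNumber.parse("52"), ResidueNumber.parse("52B"));
        System.out.println(motif.contains(ResidueNumber.parse("52A"))); // true
    }
}
```

The same kind of position type would also serve RNA base-pair or motif annotations on a 3D structure, since nucleotides in structure files are numbered the same way.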
Concerning > the tertiary structure representation, I need to annotate an RNA tertiary > structure with base-pair families (as described in > http://www.ncbi.nlm.nih.gov/pubmed/12177293 or in > http://prion.bchs.uh.edu/bp_type/) and structural motifs (like those listed > in the SCOR database > http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=308814). The idea > is to attach these features to a 3D structure in the same way as the features > attached to a sequence (1D). > What do you think? > Fabrice > On 20 Oct 08, at 21:18, Andreas Prlic wrote: > > Hi Fabrice, > > Regarding the tertiary structure representation, we should work > together. There is a set of tools available already in the current > biojava 1.7 which I was intending to maintain and migrate to biojava v > 3. Let me know if you have specific RNA-related requests... > > Andreas > > On Mon, Oct 20, 2008 at 6:51 AM, Richard Holland > wrote: > > Excellent! Thanks for your offer of help! > > Yes, an advanced RNA module would be very helpful indeed. You should > > probably call it 'rna'. > > As long as everyone who intends to work on BJ3 declares their intentions > > here, as you just have, then basically it's first come, first served. I won't > > be doing any official supervision other than keeping an eye on committed > > code once in a while to make sure it all looks OK. So feel free to start > > coding straight away! > > All new modules should probably be set up as follows: > > 1. copy the existing dna module to something new, like 'rna' in this > > case, > > 2. remove all the hidden .svn directories from the copy, > > 3. update the pom.xml in the copy (do a search-and-replace on dna and change > > it to the new name, rna in this case), delete the existing source packages in > > src/main/java (org.biojava.dna) and create suitable new ones > > (org.biojava.rna in this case), > > 4. empty out the target/ folder, then svn add the new module, > > 5. svn:ignore the target/ directory in your new module, > > 6.
include your new module in the list at the end of the pom.xml in the root > > directory of the biojava3 branch. > > cheers, > > Richard > > > > 2008/10/20 Fabrice Jossinet > > Dear Richard, > > I'm answering your "official call" to offer my help with the > > development of the biojava3 code. With the modularity of Maven, I would also > > like to offer my help with the development of a module that will use > > the biojava3 code to manage more specialized RNA stuff (secondary and > > tertiary structures, base-pair classifications, modified nucleotides, RNA > > alignments, ...). > > What will be the next step for me? Will you make a selection? > > Best Regards > > Fabrice Jossinet > > -- > > Dr. Fabrice Jossinet > > Laboratoire de Bioinformatique, modelisation et simulation des acides > > nucleiques > > Universite Louis Pasteur > > Institut de biologie moleculaire et cellulaire du CNRS > > UPR9002, Architecture et Reactivite de l'ARN > > 15 rue Rene Descartes > > F-67084 Strasbourg Cedex > > France > > Tel + 33 (0) 3 88 417053 > > FAX + 33 (0) 3 88 60 22 18 > > f.jossinet at ibmc.u-strasbg.fr > > fjossinet at gmail.com > > http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html > > http://fjossinet.u-strasbg.fr/ > > > > > > > > -- > > Richard Holland, BSc MBCS > > Finance Director, Eagle Genomics Ltd > > M: +44 7500 438846 | E: holland at eaglegenomics.com > > http://www.eaglegenomics.com/ > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > > > -- > Dr.
Fabrice Jossinet > Laboratoire de Bioinformatique, modelisation et simulation des acides > nucleiques > Universite Louis Pasteur > Institut de biologie moleculaire et cellulaire du CNRS > UPR9002, Architecture et Reactivite de l'ARN > 15 rue Rene Descartes > F-67084 Strasbourg Cedex > France > Tel + 33 (0) 3 88 417053 > FAX + 33 (0) 3 88 60 22 18 > f.jossinet at ibmc.u-strasbg.fr > fjossinet at gmail.com > http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html > http://fjossinet.u-strasbg.fr/ > > > From holland at eaglegenomics.com Tue Oct 21 07:39:44 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 21 Oct 2008 12:39:44 +0100 Subject: [Biojava-dev] [Biojava-l] File parsing in BJ3 In-Reply-To: <93b45ca50810210424g5a9288f0w803e6d5ca4b840d3@mail.gmail.com> References: <93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com> <48FD97AB.70503@ebi.ac.uk> <93b45ca50810210335j5ef4a206y545e5a1869cedc03@mail.gmail.com> <93b45ca50810210424g5a9288f0w803e6d5ca4b840d3@mail.gmail.com> Message-ID: The two examples I gave would be better as annotations, it's true. Serializable, and Cloneable for that matter, would definitely work better that way. Well, we could do away with Thing altogether then. I'll update the code. 2008/10/21 Mark Schreiber > Depending on what you want them for isMachineGenerated(), > isManuallyCurated(), would possibly be better as annotations > (@MachineGenerated, @ManuallyCurated). This is true metadata. > > Probably if Java had annotations in version 1.1 Serializable would > also be an Annotation. I would agree with the idea that ThingBuilder > etc should be typed on extends Serializable. > > - Mark > > On Tue, Oct 21, 2008 at 7:14 PM, Richard Holland > wrote: > > For now, yes it's empty. But I can envisage situations where it might be > > nice to have Thing implement some common methods (e.g. > isMachineGenerated(), > > isManuallyCurated(), etc.).
> >> >>>> > >> >>>> cheers, > >> >>>> Richard > >> >>>> > >> >>>> -- > >> >>>> Richard Holland, BSc MBCS > >> >>>> Finance Director, Eagle Genomics Ltd > >> >>>> M: +44 7500 438846 | E: holland at eaglegenomics.com > >> >>>> http://www.eaglegenomics.com/ > >> >>>> _______________________________________________ > >> >>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org > >> >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l > >> >>>> > >> >> > >> >> > >> >> > >> > > > > > > > > > -- > > Richard Holland, BSc MBCS > > Finance Director, Eagle Genomics Ltd > > M: +44 7500 438846 | E: holland at eaglegenomics.com > > http://www.eaglegenomics.com/ > > > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From ayates at ebi.ac.uk Tue Oct 21 10:32:45 2008 From: ayates at ebi.ac.uk (Andy Yates) Date: Tue, 21 Oct 2008 15:32:45 +0100 Subject: [Biojava-dev] [Biojava-l] File parsing in BJ3 In-Reply-To: References: <93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com> <48FD97AB.70503@ebi.ac.uk> <93b45ca50810210335j5ef4a206y545e5a1869cedc03@mail.gmail.com> <93b45ca50810210424g5a9288f0w803e6d5ca4b840d3@mail.gmail.com> Message-ID: <48FDE80D.1040106@ebi.ac.uk> If "Thing" has gone then what impact does this have on remaining classes? Considering methods like canReadNextThing() & readNextThing(); should this be canReadNext() & readNext()? Just an idle thought .... Andy Richard Holland wrote: > The two examples I gave would be better as annotations, it's true. > Serializable, and Cloneable for that matter, would definitely work better > that way. > > Well, we could do away with Thing altogether then. I'll update the code. > > > 2008/10/21 Mark Schreiber > >> Depending on what you want them for isMachineGenerated(), >> isManuallyCurated(), would possibly be better as annotations >> (@MachineGenerated, @ManuallyCurated). This is true metadata.
>> >> Probably if Java had annotations in version 1.1 Serializable would >> also be an Annotation. I would agree with the idea that ThingBuilder >> etc should be typed on extends Serializable. >> >> - Mark >> >> On Tue, Oct 21, 2008 at 7:14 PM, Richard Holland >> wrote: >>> For now, yes it's empty. But I can envisage situations where it might be >>> nice to have Thing implement some common methods (e.g. >> isMachineGenerated(), >>> isManuallyCurated(), etc.). I'd rather have it there now to be a >> placeholder >>> for future expansion, than have to re-engineer everything should we >> identify >>> a need for common functions in future. >>> >>> You'll see that Thing already extends Serializable, implying that all >> Things >>> must be able to persist to an object backing store. Serializable itself >> is >>> also an empty interface! >>> >>> Also I like the idea of having Thing, not Object, as a kind of marker of >>> intention. To me it makes it clearer when reading code to avoid Object >>> wherever possible. Thing may not be any more clever than Object, but it >>> immediately declares an intention when reading code as to what kind of >>> Object should be expected. >>> >>> >>> 2008/10/21 Mark Schreiber >>>> Is there any need for Thing at all? Can't a bulder be typed to produce >>>> something that extends Object? >>>> >>>> If Thing provides no behaivour contract or meta-information then why >>>> does it exist? >>>> >>>> - Mark >>>> >>>> On Tue, Oct 21, 2008 at 4:49 PM, Andy Yates wrote: >>>>> Depends on what you want to program. If you want to have a collection >> of >>>>> objects which are Things & perform a common action on them then >>>>> annotations are not the way forward. >>>>> >>>>> If you want to have some kind of meta-programming occurring & need a >>>>> class to be multiple things then annotations are right. There is >>>>> currently no way to enforce compile time dependencies on annotations & >>>>> my thinking is that this is right. 
Annotations should be meta data or >>>>> provide a way to alter a class in a non-invasive way (think Web >> Service >>>>> annotations creating WS Servers & Clients without any alteration of >> the >>>>> class). >>>>> >>>>> Andy >>>>> >>>>> Richard Holland wrote: >>>>>> Spot on. >>>>>> >>>>>> Annotation/interface.... i think Annotation is probably better as you >>>>>> suggest, but I'd have to look into that. Not sure how it works with >>>>>> collections and generics. If it does turn out to be a better bet, >> I'll >>>>>> change it over. >>>>>> >>>>>> With the BioSQL dependencies, take a look at the pom.xml file inside >>>>>> the >>>>>> biojava-dna module. It declares a dependency on biojava-core. If you >>>>>> want to >>>>>> add dependencies to external JARs, take a look at biojava-biosql's >>>>>> pom.xml >>>>>> to see how it depends on javax.persistence. (The easiest way to add >>>>>> these is >>>>>> via an IDE such as NetBeans, which is what I'm using at the moment). >>>>>> >>>>>> cheers, >>>>>> Richard >>>>>> >>>>>> 2008/10/21 Mark Schreiber >>>>>> >>>>>>> So if I want to build a BioSQL loader from Genbank then would the >>>>>>> classes (or there wrappers) in the BioSQL Entity package need to >>>>>>> implement Thing? Would maven have an issue with that or would it >> just >>>>>>> create a dependency on core? (you can tell I've never used Maven >>>>>>> right). >>>>>>> >>>>>>> From a design point of view should Thing be an interface or an >>>>>>> Annotation? The reason I ask is that it doesn't define any methods >> so >>>>>>> it is more of a tag than an interface. >>>>>>> >>>>>>> Anyway, my understanding is that I would use a Genbank parser (or >>>>>>> write one). Write a EntityReceiver interface (probably more than one >>>>>>> given the number of entities in BioSQL, implement a EntityBuilder >>>>>>> (again possibly more than one) that implements EntityReceiver and >>>>>>> builds Entity beans from messages it receives. 
In this case I >> probably >>>>>>> wouldn't provide a writer as JPA would be writing the beans to the >>>>>>> database. Would this be how you imagine it? >>>>>>> >>>>>>> - Mark >>>>>>> >>>>>>> >>>>>>> On Tue, Oct 21, 2008 at 1:52 AM, Richard Holland >>>>>>> wrote: >>>>>>>> (From now on I will only be posting these development messages to >>>>>>>> biojava-dev, which is the intended purpose of that list. Those of >> you >>>>>>>> who >>>>>>>> wish to keep track of things but are currently only subscribed to >>>>>>> biojava-l >>>>>>>> should also subscribe to biojava-dev in order to keep up to date.) >>>>>>>> >>>>>>>> As promised, I've committed a new package in the biojava-core >> module >>>>>>>> that >>>>>>>> should help understand how to do file parsing and conversion and >>>>>>>> writing >>>>>>> in >>>>>>>> the new BJ3 modules. Here's an example of how to use it to write a >>>>>>> Genbank >>>>>>>> parser (note no parsers actually exist yet!): >>>>>>>> >>>>>>>> 1. Design yourself a Genbank class which implements the interface >>>>>>>> Thing >>>>>>> and >>>>>>>> can fully represent all the data that might possibly occur inside a >>>>>>> Genbank >>>>>>>> file. >>>>>>>> >>>>>>>> 2. Write an interface called GenbankReceiver, which extends >>>>>>>> ThingReceiver >>>>>>>> and defines all the methods you might need in order to construct a >>>>>>> Genbank >>>>>>>> object in an asynchronous fashion. >>>>>>>> >>>>>>>> 3. Write a GenbankBuilder class which implements GenbankReceiver >> and >>>>>>>> ThingBuilder. It's job is to receive data via method calls, use >> that >>>>>>>> data >>>>>>> to >>>>>>>> construct a Genbank object, then provide that object on demand. >>>>>>>> >>>>>>>> 4. Write a GenbankWriter class which implements GenbankReceiver and >>>>>>>> ThingWriter. It's job is similar to GenbankBuilder, but instead of >>>>>>>> constructing new Genbank objects, it writes Genbank records to file >>>>>>>> that >>>>>>>> reflect the data it receives. >>>>>>>> >>>>>>>> 5. 
Write a GenbankReader class which implements ThingReader. It can >>>>>>>> read >>>>>>>> GenbankFiles and output the data to the methods of the >> ThingReceiver >>>>>>>> provided to it, which in this case could be anything which >> implements >>>>>>>> the >>>>>>>> interface GenbankReceiver. >>>>>>>> >>>>>>>> 6. Write a GenbankEmitter class which implements ThingEmitter. It >>>>>>>> takes a >>>>>>>> Genbank object and will fire off data from it to the provided >>>>>>> ThingReceiver >>>>>>>> (a GenbankReceiver instance) as if the Genbank object was being >> read >>>>>>>> from >>>>>>> a >>>>>>>> file or some other source. >>>>>>>> >>>>>>>> That's it! OK so it's a minimum of 6 classes instead of the >> original >>>>>>>> 1 or >>>>>>> 2, >>>>>>>> but the additional steps are necessary for flexibility in >> converting >>>>>>> between >>>>>>>> formats. >>>>>>>> >>>>>>>> Now to use it (you'll probably want a GenbankTools class to wrap >>>>>>>> these >>>>>>> steps >>>>>>>> up for user-friendliness, including various options for opening >>>>>>>> files, >>>>>>>> etc.): >>>>>>>> >>>>>>>> 1. To read a file - instantiate ThingParser with your GenbankReader >>>>>>>> as >>>>>>> the >>>>>>>> reader, and GenbankBuilder as the receiver. Use the iterator >> methods >>>>>>>> on >>>>>>>> ThingParser to get the objects out. >>>>>>>> >>>>>>>> 2. To write a file - instantiate ThingParser with a GenbankEmitter >>>>>>> wrapping >>>>>>>> your Genbank object, and a GenbankWriter as the receiver. Use the >>>>>>> parseAll() >>>>>>>> method on the ThingParser to dump the whole lot to your chosen >>>>>>>> output. >>>>>>>> >>>>>>>> The clever bit comes when you want to convert between files. >> Imagine >>>>>>> you've >>>>>>>> done all the above for Genbank, and you've also done it for FASTA. >>>>>>>> How to >>>>>>>> convert between them? What you need to do is this: >>>>>>>> >>>>>>>> 1. Implement all the classes for both Genbank and FASTA. >>>>>>>> >>>>>>>> 2. 
Write a GenbankFASTAConverter class that implements >>>>>>> ThingConverter >>>>>>>> and GenbankReceiver, and will internally convert the data received >>>>>>>> and >>>>>>> pass >>>>>>>> it on out to the receiver provided, which will be a FASTAReceiver >>>>>>> instance. >>>>>>>> 3. Write a FASTAGenbankConverter class that operates in exactly the >>>>>>> opposite >>>>>>>> way, implementing ThingConverter and FASTAReceiver. >>>>>>>> >>>>>>>> Then to convert you use ThingParser again: >>>>>>>> >>>>>>>> 1. From FASTA file to Genbank object: Instantiate ThingParser with >> a >>>>>>>> FASTAReader reader, a GenbankBuilder receiver, and add a >>>>>>>> FASTAGenbankConverter instance to the converter chain. Use the >>>>>>>> iterator >>>>>>> to >>>>>>>> get your Genbank objects out of your FASTA file. >>>>>>>> >>>>>>>> 2. From FASTA file to Genbank file: Same as option 1, but provide a >>>>>>>> GenbankWriter instead and use parseAll() instead of the iterator >>>>>>>> methos. >>>>>>>> >>>>>>>> 3. From FASTA object to Genbank object: Same as option 1, but >> provide >>>>>>>> a >>>>>>>> FASTAEmitter wrapping your FASTA object as the reader instead. >>>>>>>> >>>>>>>> 4. From FASTA object to Genbank file: Same as option 1, but swap >> both >>>>>>>> the >>>>>>>> reader and the receiver as per options 2 and 3. >>>>>>>> >>>>>>>> 5/6/7/8. From Genbank * to FASTA * - same as 1,2,3,4 but swap all >>>>>>> mentions >>>>>>>> of FASTA and Genbank, and use GenbankFASTAConverter instead. >>>>>>>> >>>>>>>> One last and very important feature of this approach is that if you >>>>>>> discover >>>>>>>> that nobody has written the appropriate converter for your chosen >>>>>>>> pair of >>>>>>>> formats A and C, but converters do exist to map A to some other >>>>>>>> format B >>>>>>> and >>>>>>>> that other format B on to C, then you can just put the two converts >>>>>>>> A-B >>>>>>> and >>>>>>>> B-C into the ThingParser chain and it'll work perfectly. >>>>>>>> >>>>>>>> Enjoy! 
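The read/convert/write flow described above can be sketched in plain Java. This is a minimal, in-memory illustration under stated assumptions, not the API committed to biojava-core. The interface names echo the message (ThingReader, ThingConverter), but every method signature is hypothetical, as are the three tiny receiver interfaces, the FASTA parsing rules, the FASTA-to-Genbank and Genbank-to-tab mappings, and parseAll's parameters. It uses Java 8 lambdas for brevity, which postdate this thread. What it shows is the chaining trick: with only A-to-B and B-to-C converters, wrapping one inside the other yields a working A-to-C pipeline.

```java
// Sketch only: hypothetical types illustrating reader -> converter chain -> receiver.
public class ThingPipelineSketch {

    // Hypothetical: a reader pushes parse events into a format-specific receiver R.
    interface ThingReader<R> {
        void readAll(R receiver);
    }

    // Hypothetical: a converter wraps a downstream receiver of format T and returns
    // a receiver of format S that translates each event before passing it on.
    interface ThingConverter<S, T> {
        S wrap(T downstream);
    }

    // Deliberately tiny event interfaces for three formats.
    interface FastaReceiver { void record(String header, String residues); }
    interface GenbankReceiver { void record(String locus, String definition, String origin); }
    interface TabReceiver { void row(String id, String sequence); }

    // Compose A->B and B->C into A->C.
    static <A, B, C> ThingConverter<A, C> chain(ThingConverter<A, B> ab, ThingConverter<B, C> bc) {
        return downstream -> ab.wrap(bc.wrap(downstream));
    }

    // Reads in-memory FASTA text; no validation or error handling.
    static ThingReader<FastaReceiver> fastaReader(String text) {
        return receiver -> {
            String header = null;
            StringBuilder residues = new StringBuilder();
            for (String line : text.split("\n")) {
                if (line.startsWith(">")) {
                    if (header != null) receiver.record(header, residues.toString());
                    header = line.substring(1).trim();
                    residues.setLength(0);
                } else {
                    residues.append(line.trim());
                }
            }
            if (header != null) receiver.record(header, residues.toString());
        };
    }

    // FASTA -> Genbank: first header token becomes the locus, the rest the definition.
    static final ThingConverter<FastaReceiver, GenbankReceiver> FASTA_TO_GENBANK =
        genbank -> (header, residues) -> {
            int space = header.indexOf(' ');
            String locus = space < 0 ? header : header.substring(0, space);
            String definition = space < 0 ? "" : header.substring(space + 1);
            genbank.record(locus, definition, residues.toLowerCase());
        };

    // Genbank -> tab-separated: keep the locus and the upper-cased sequence.
    static final ThingConverter<GenbankReceiver, TabReceiver> GENBANK_TO_TAB =
        tab -> (locus, definition, origin) -> tab.row(locus, origin.toUpperCase());

    // Receiver that writes rows as text (plays the role of a writer).
    static final class TabWriter implements TabReceiver {
        private final StringBuilder out = new StringBuilder();

        @Override
        public void row(String id, String sequence) {
            out.append(id).append('\t').append(sequence).append('\n');
        }

        @Override
        public String toString() {
            return out.toString();
        }
    }

    // Stands in for a parseAll()-style driver: push everything through the converter.
    static <S, T> void parseAll(ThingReader<S> reader, ThingConverter<S, T> converter, T receiver) {
        reader.readAll(converter.wrap(receiver));
    }

    public static void main(String[] args) {
        String fasta = ">seq1 first test\nacgt\nggcc\n>seq2 second\nttaa\n";
        TabWriter writer = new TabWriter();
        // No direct FASTA -> tab converter exists, so chain FASTA -> Genbank -> tab.
        parseAll(fastaReader(fasta), chain(FASTA_TO_GENBANK, GENBANK_TO_TAB), writer);
        System.out.print(writer); // prints "seq1<TAB>ACGTGGCC" and "seq2<TAB>TTAA"
    }
}
```

Because each converter only wraps the next receiver, the same chain works whether the final receiver builds objects (a builder) or writes text (as TabWriter does here).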
>>>>>>>> >>>>>>>> cheers, >>>>>>>> Richard >>>>>>>> >>>>>>>> -- >>>>>>>> Richard Holland, BSc MBCS >>>>>>>> Finance Director, Eagle Genomics Ltd >>>>>>>> M: +44 7500 438846 | E: holland at eaglegenomics.com >>>>>>>> http://www.eaglegenomics.com/ >>>>>>>> _______________________________________________ >>>>>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>>>>>>> >>>>>> >>>>>> >>> >>> >>> -- >>> Richard Holland, BSc MBCS >>> Finance Director, Eagle Genomics Ltd >>> M: +44 7500 438846 | E: holland at eaglegenomics.com >>> http://www.eaglegenomics.com/ >>> > > > From holland at eaglegenomics.com Tue Oct 21 12:13:37 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 21 Oct 2008 17:13:37 +0100 Subject: [Biojava-dev] [Biojava-l] File parsing in BJ3 In-Reply-To: <48FDE80D.1040106@ebi.ac.uk> References: <93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com> <48FD97AB.70503@ebi.ac.uk> <93b45ca50810210335j5ef4a206y545e5a1869cedc03@mail.gmail.com> <93b45ca50810210424g5a9288f0w803e6d5ca4b840d3@mail.gmail.com> <48FDE80D.1040106@ebi.ac.uk> Message-ID: Yup - why not. Feel free to go in and edit. :) 2008/10/21 Andy Yates > If "Thing" has gone then what impact does this have on remaining > classes? Considering methods like canReadNextThing() & readNextThing(); > should this be canReadNext() & readNext()? > > Just an idle thought .... > > Andy > > Richard Holland wrote: > > The two examples I gave would be better as annotations, its true. > > Serializable, and Cloneable for that matter, would definitely work better > > that way. > > > > Well, we could do away with Thing altogether then. I'll update the code. > > > > > > 2008/10/21 Mark Schreiber > > > >> Depending on what you want them for isMachineGenerated(), > >> isManuallyCurated(), would possibly be better as annotations > >> (@MachineGenerated, @ManuallyCurated). This is true metadata. 
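The marker-interface versus annotation trade-off discussed in this thread can be illustrated in plain Java. In the sketch below, Thing and @MachineGenerated take the names used in the thread. GenbankRecord, PredictedGeneRecord and countMachineGenerated are hypothetical. The sketch shows that an empty interface still lets the compiler reject non-Things in a generic method or collection. A runtime annotation, by contrast, is pure metadata that can only be checked by reflection.

```java
import java.io.Serializable;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;
import java.util.List;

public class MarkerVsAnnotationSketch {

    // Marker interface: declares no methods, but the compiler can enforce it.
    interface Thing extends Serializable {}

    // Marker annotation: pure metadata, visible at runtime only via reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface MachineGenerated {}

    // Hypothetical record types used only for this illustration.
    static final class GenbankRecord implements Thing {}

    @MachineGenerated
    static final class PredictedGeneRecord implements Thing {}

    // The generic bound means only Things can be passed in (checked at compile time);
    // the annotation check happens at runtime.
    static <T extends Thing> int countMachineGenerated(List<T> things) {
        int n = 0;
        for (T t : things) {
            if (t.getClass().isAnnotationPresent(MachineGenerated.class)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<Thing> things = Arrays.asList(new GenbankRecord(), new PredictedGeneRecord());
        System.out.println(countMachineGenerated(things)); // prints 1
        // countMachineGenerated(Arrays.asList("not a Thing")); // would not compile
    }
}
```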
> >> > >> Probably if Java had annotations in version 1.1 Serializable would > >> also be an Annotation. I would agree with the idea that ThingBuilder > >> etc should be typed on extends Serializable. > >> > >> - Mark > >> > >> On Tue, Oct 21, 2008 at 7:14 PM, Richard Holland > >> wrote: > >>> For now, yes it's empty. But I can envisage situations where it might > be > >>> nice to have Thing implement some common methods (e.g. > >> isMachineGenerated(), > >>> isManuallyCurated(), etc.). I'd rather have it there now to be a > >> placeholder > >>> for future expansion, than have to re-engineer everything should we > >> identify > >>> a need for common functions in future. > >>> > >>> You'll see that Thing already extends Serializable, implying that all > >> Things > >>> must be able to persist to an object backing store. Serializable itself > >> is > >>> also an empty interface! > >>> > >>> Also I like the idea of having Thing, not Object, as a kind of marker > of > >>> intention. To me it makes it clearer when reading code to avoid Object > >>> wherever possible. Thing may not be any more clever than Object, but it > >>> immediately declares an intention when reading code as to what kind of > >>> Object should be expected. > >>> > >>> > >>> 2008/10/21 Mark Schreiber > >>>> Is there any need for Thing at all? Can't a bulder be typed to produce > >>>> something that extends Object? > >>>> > >>>> If Thing provides no behaivour contract or meta-information then why > >>>> does it exist? > >>>> > >>>> - Mark > >>>> > >>>> On Tue, Oct 21, 2008 at 4:49 PM, Andy Yates wrote: > >>>>> Depends on what you want to program. If you want to have a collection > >> of > >>>>> objects which are Things & perform a common action on them then > >>>>> annotations are not the way forward. > >>>>> > >>>>> If you want to have some kind of meta-programming occurring & need a > >>>>> class to be multiple things then annotations are right. 
There is > >>>>> currently no way to enforce compile time dependencies on annotations > & > >>>>> my thinking is that this is right. Annotations should be meta data or > >>>>> provide a way to alter a class in a non-invasive way (think Web > >> Service > >>>>> annotations creating WS Servers & Clients without any alteration of > >> the > >>>>> class). > >>>>> > >>>>> Andy > >>>>> > >>>>> Richard Holland wrote: > >>>>>> Spot on. > >>>>>> > >>>>>> Annotation/interface.... i think Annotation is probably better as > you > >>>>>> suggest, but I'd have to look into that. Not sure how it works with > >>>>>> collections and generics. If it does turn out to be a better bet, > >> I'll > >>>>>> change it over. > >>>>>> > >>>>>> With the BioSQL dependencies, take a look at the pom.xml file inside > >>>>>> the > >>>>>> biojava-dna module. It declares a dependency on biojava-core. If you > >>>>>> want to > >>>>>> add dependencies to external JARs, take a look at biojava-biosql's > >>>>>> pom.xml > >>>>>> to see how it depends on javax.persistence. (The easiest way to add > >>>>>> these is > >>>>>> via an IDE such as NetBeans, which is what I'm using at the moment). > >>>>>> > >>>>>> cheers, > >>>>>> Richard > >>>>>> > >>>>>> 2008/10/21 Mark Schreiber > >>>>>> > >>>>>>> So if I want to build a BioSQL loader from Genbank then would the > >>>>>>> classes (or there wrappers) in the BioSQL Entity package need to > >>>>>>> implement Thing? Would maven have an issue with that or would it > >> just > >>>>>>> create a dependency on core? (you can tell I've never used Maven > >>>>>>> right). > >>>>>>> > >>>>>>> From a design point of view should Thing be an interface or an > >>>>>>> Annotation? The reason I ask is that it doesn't define any methods > >> so > >>>>>>> it is more of a tag than an interface. > >>>>>>> > >>>>>>> Anyway, my understanding is that I would use a Genbank parser (or > >>>>>>> write one). 
Write a EntityReceiver interface (probably more than > one > >>>>>>> given the number of entities in BioSQL, implement a EntityBuilder > >>>>>>> (again possibly more than one) that implements EntityReceiver and > >>>>>>> builds Entity beans from messages it receives. In this case I > >> probably > >>>>>>> wouldn't provide a writer as JPA would be writing the beans to the > >>>>>>> database. Would this be how you imagine it? > >>>>>>> > >>>>>>> - Mark > >>>>>>> > >>>>>>> > >>>>>>> On Tue, Oct 21, 2008 at 1:52 AM, Richard Holland > >>>>>>> wrote: > >>>>>>>> (From now on I will only be posting these development messages to > >>>>>>>> biojava-dev, which is the intended purpose of that list. Those of > >> you > >>>>>>>> who > >>>>>>>> wish to keep track of things but are currently only subscribed to > >>>>>>> biojava-l > >>>>>>>> should also subscribe to biojava-dev in order to keep up to date.) > >>>>>>>> > >>>>>>>> As promised, I've committed a new package in the biojava-core > >> module > >>>>>>>> that > >>>>>>>> should help understand how to do file parsing and conversion and > >>>>>>>> writing > >>>>>>> in > >>>>>>>> the new BJ3 modules. Here's an example of how to use it to write a > >>>>>>> Genbank > >>>>>>>> parser (note no parsers actually exist yet!): > >>>>>>>> > >>>>>>>> 1. Design yourself a Genbank class which implements the interface > >>>>>>>> Thing > >>>>>>> and > >>>>>>>> can fully represent all the data that might possibly occur inside > a > >>>>>>> Genbank > >>>>>>>> file. > >>>>>>>> > >>>>>>>> 2. Write an interface called GenbankReceiver, which extends > >>>>>>>> ThingReceiver > >>>>>>>> and defines all the methods you might need in order to construct a > >>>>>>> Genbank > >>>>>>>> object in an asynchronous fashion. > >>>>>>>> > >>>>>>>> 3. Write a GenbankBuilder class which implements GenbankReceiver > >> and > >>>>>>>> ThingBuilder. 
It's job is to receive data via method calls, use > >> that > >>>>>>>> data > >>>>>>> to > >>>>>>>> construct a Genbank object, then provide that object on demand. > >>>>>>>> > >>>>>>>> 4. Write a GenbankWriter class which implements GenbankReceiver > and > >>>>>>>> ThingWriter. It's job is similar to GenbankBuilder, but instead of > >>>>>>>> constructing new Genbank objects, it writes Genbank records to > file > >>>>>>>> that > >>>>>>>> reflect the data it receives. > >>>>>>>> > >>>>>>>> 5. Write a GenbankReader class which implements ThingReader. It > can > >>>>>>>> read > >>>>>>>> GenbankFiles and output the data to the methods of the > >> ThingReceiver > >>>>>>>> provided to it, which in this case could be anything which > >> implements > >>>>>>>> the > >>>>>>>> interface GenbankReceiver. > >>>>>>>> > >>>>>>>> 6. Write a GenbankEmitter class which implements ThingEmitter. It > >>>>>>>> takes a > >>>>>>>> Genbank object and will fire off data from it to the provided > >>>>>>> ThingReceiver > >>>>>>>> (a GenbankReceiver instance) as if the Genbank object was being > >> read > >>>>>>>> from > >>>>>>> a > >>>>>>>> file or some other source. > >>>>>>>> > >>>>>>>> That's it! OK so it's a minimum of 6 classes instead of the > >> original > >>>>>>>> 1 or > >>>>>>> 2, > >>>>>>>> but the additional steps are necessary for flexibility in > >> converting > >>>>>>> between > >>>>>>>> formats. > >>>>>>>> > >>>>>>>> Now to use it (you'll probably want a GenbankTools class to wrap > >>>>>>>> these > >>>>>>> steps > >>>>>>>> up for user-friendliness, including various options for opening > >>>>>>>> files, > >>>>>>>> etc.): > >>>>>>>> > >>>>>>>> 1. To read a file - instantiate ThingParser with your > GenbankReader > >>>>>>>> as > >>>>>>> the > >>>>>>>> reader, and GenbankBuilder as the receiver. Use the iterator > >> methods > >>>>>>>> on > >>>>>>>> ThingParser to get the objects out. > >>>>>>>> > >>>>>>>> 2. 
To write a file - instantiate ThingParser with a GenbankEmitter > >>>>>>> wrapping > >>>>>>>> your Genbank object, and a GenbankWriter as the receiver. Use the > >>>>>>> parseAll() > >>>>>>>> method on the ThingParser to dump the whole lot to your chosen > >>>>>>>> output. > >>>>>>>> > >>>>>>>> The clever bit comes when you want to convert between files. > >> Imagine > >>>>>>> you've > >>>>>>>> done all the above for Genbank, and you've also done it for FASTA. > >>>>>>>> How to > >>>>>>>> convert between them? What you need to do is this: > >>>>>>>> > >>>>>>>> 1. Implement all the classes for both Genbank and FASTA. > >>>>>>>> > >>>>>>>> 2. Write a GenbankFASTAConverter class that implements > >>>>>>> ThingConverter > >>>>>>>> and GenbankReceiver, and will internally convert the data received > >>>>>>>> and > >>>>>>> pass > >>>>>>>> it on out to the receiver provided, which will be a FASTAReceiver > >>>>>>> instance. > >>>>>>>> 3. Write a FASTAGenbankConverter class that operates in exactly > the > >>>>>>> opposite > >>>>>>>> way, implementing ThingConverter and FASTAReceiver. > >>>>>>>> > >>>>>>>> Then to convert you use ThingParser again: > >>>>>>>> > >>>>>>>> 1. From FASTA file to Genbank object: Instantiate ThingParser with > >> a > >>>>>>>> FASTAReader reader, a GenbankBuilder receiver, and add a > >>>>>>>> FASTAGenbankConverter instance to the converter chain. Use the > >>>>>>>> iterator > >>>>>>> to > >>>>>>>> get your Genbank objects out of your FASTA file. > >>>>>>>> > >>>>>>>> 2. From FASTA file to Genbank file: Same as option 1, but provide > a > >>>>>>>> GenbankWriter instead and use parseAll() instead of the iterator > >>>>>>>> methos. > >>>>>>>> > >>>>>>>> 3. From FASTA object to Genbank object: Same as option 1, but > >> provide > >>>>>>>> a > >>>>>>>> FASTAEmitter wrapping your FASTA object as the reader instead. > >>>>>>>> > >>>>>>>> 4. 
From FASTA object to Genbank file: Same as option 1, but swap > >> both > >>>>>>>> the > >>>>>>>> reader and the receiver as per options 2 and 3. > >>>>>>>> > >>>>>>>> 5/6/7/8. From Genbank * to FASTA * - same as 1,2,3,4 but swap all > >>>>>>> mentions > >>>>>>>> of FASTA and Genbank, and use GenbankFASTAConverter instead. > >>>>>>>> > >>>>>>>> One last and very important feature of this approach is that if > you > >>>>>>> discover > >>>>>>>> that nobody has written the appropriate converter for your chosen > >>>>>>>> pair of > >>>>>>>> formats A and C, but converters do exist to map A to some other > >>>>>>>> format B > >>>>>>> and > >>>>>>>> that other format B on to C, then you can just put the two > converts > >>>>>>>> A-B > >>>>>>> and > >>>>>>>> B-C into the ThingParser chain and it'll work perfectly. > >>>>>>>> > >>>>>>>> Enjoy! > >>>>>>>> > >>>>>>>> cheers, > >>>>>>>> Richard > >>>>>>>> > >>>>>>>> -- > >>>>>>>> Richard Holland, BSc MBCS > >>>>>>>> Finance Director, Eagle Genomics Ltd > >>>>>>>> M: +44 7500 438846 | E: holland at eaglegenomics.com > >>>>>>>> http://www.eaglegenomics.com/ > >>>>>>>> _______________________________________________ > >>>>>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org > >>>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l > >>>>>>>> > >>>>>> > >>>>>> > >>> > >>> > >>> -- > >>> Richard Holland, BSc MBCS > >>> Finance Director, Eagle Genomics Ltd > >>> M: +44 7500 438846 | E: holland at eaglegenomics.com > >>> http://www.eaglegenomics.com/ > >>> > > > > > > > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From fjossinet at orange.fr Tue Oct 21 15:55:47 2008 From: fjossinet at orange.fr (Fabrice Jossinet) Date: Tue, 21 Oct 2008 21:55:47 +0200 Subject: [Biojava-dev] Biojava 3 and intermolecular features Message-ID: Hi all, When I used the previous releases of biojava, i had some problems to model 
inter-molecular features, for example interactions between two sequences/molecules in a tertiary structure, or interactions between two molecular partners in an interaction network. Such a feature should be a single object shared by (at least) two molecules, but attached to a different location on each molecule. In the current BioJava model, a feature has a single location on a given sequence. Consequently, for the development of my previous software, I changed the BioJava paradigm slightly. For example, to model an intermolecular interaction between region 23-35 of mySeq1 and region 34-46 of mySeq2, I have:

Feature myFeature = new InterMolecularInteraction();
mySeq1.addAnnotation(new Annotation(myFeature, new Location("23-35")));
mySeq2.addAnnotation(new Annotation(myFeature, new Location("34-46")));

The Annotation concept links a feature to a location and is attached to a sequence (it is unrelated to the Annotation concept in BioJava). With this kind of model, I was also able to use the same concepts and strategy to model multiple alignments, which can also be seen as a kind of "inter-molecular relation". Is there any plan to model these kinds of features in BioJava 3? If not, could my proposal be a good start? Fabrice -- Dr. Fabrice Jossinet Laboratoire de Bioinformatique, modelisation et simulation des acides nucleiques Universite Louis Pasteur Institut de biologie moleculaire et cellulaire du CNRS UPR9002, Architecture et Reactivite de l'ARN 15 rue Rene Descartes F-67084 Strasbourg Cedex France Tel + 33 (0) 3 88 417053 FAX + 33 (0) 3 88 60 22 18 f.jossinet at ibmc.u-strasbg.fr fjossinet at gmail.com http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html http://fjossinet.u-strasbg.fr/ From heuermh at acm.org Thu Oct 23 01:12:07 2008 From: heuermh at acm.org (Michael Heuer) Date: Thu, 23 Oct 2008 01:12:07 -0400 (EDT) Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please!
In-Reply-To: Message-ID: Sorry, I'm a bit late to the game. Hope I didn't miss anything exciting yet! Would it be better to commit this to trunk, and put the current codebase out to pasture on a branch? Is it possible (or desirable) to send SVN commit messages to the dev mailing list? Or alternatively, should someone create a project entry for biojava on CIA.vc? http://cia.vc As soon as I can remember my dev.open-bio.org password I'll start committing stuff, otherwise I'll post patches to bugzilla. michael On Mon, 20 Oct 2008, Richard Holland wrote: > Hi all, > > I've just committed some new code to the biojava3 branch of the biojava-live > subversion repository. It's the foundations of a brand new alphabet+symbol > set of classes, and an example of how to use them to represent DNA. You'll > notice that the new code is very lightweight and allows for a lot more > flexibility than the old code - for instance, the concept of Alphabet has > changed radically. It also makes much more extensive use of the Collections > API. > > I haven't got any test cases or usage examples yet but give me a shout if > you don't understand the code and I'll explain how it works. (Hint: > SymbolFormat is there to convert Strings into SymbolList objects, and vice > versa). > > So, now we want some volunteers! We're starting from scratch here so there's > a lot of work to do. The whole of BioJava needs 'translating' into BJ3, > whether it be copy-and-paste existing classes and modify them to suit the > new style, or write completely new ones to provide equivalent functionality. > > > I'll post an example of how to do file parsing soon, probably starting with > FASTA. In the meantime, a good place to start would be for people to design > object models to represent their favourite data types (e.g. Genbank, or > microarray data). Utility classes to manipulate those objects would be great > too. > > The object models need to be normalised as much as possible - e.g.
if your > data has a lot of comments, and the order of those comments is important, > then give your object model a collection of comment objects. The object > model for each data type should be completely independent and use basic data > types wherever possible (e.g. store sequences as strings, don't attempt to > parse them into anything fancy like SymbolLists). The closer the object > model is to the original data format, the better. There's going to be clever > tricks when it comes to converting data between different object models > (e.g. Genbank to INSDSeq), which I will explain later when I put the file > parsing examples up. > > You'll notice how the biojava3 branch uses Maven instead of Ant. This is > because we want to make it as modular as possible, so if you want to write > microarray stuff, create a new microarray sub-project (as per the dna > example that's already there). This way if someone only wants the microarray > bit of BJ3, they only need install the appropriate JAR file and can ignore > the rest. (The 'core' module is for stuff that is so generic it could be > used anywhere, or is used in every single other module.) > > If coding isn't your cup of tea, then we would very much welcome testers > (particularly those who enjoy writing test cases!), documenters > (particularly code commenters), translators (for internationalisation of the > code), and of course all those who wish to contribute ideas and suggestions > no matter how off-the-wall they might be. In particular if you'd like to > take charge of an area of the development process, e.g. Documentation Chief, > or Protein Champion, then that would be much appreciated. > > I'm very much looking forward to working with everyone on this. Good luck, > and happy coding! > > cheers, > Richard > > PS. Please don't forget to attach the appropriate licence to your code. You > can copy-and-paste it from the existing classes I just committed this > evening. > > PPS. 
For those who are worried about backwards compatibility - this was > discussed on the lists a while back and it was made clear that BJ3 is a > clean break. However, the existing code will continue to be maintained and > bugfixed for a couple of years so you don't have to upgrade if you don't > want to - it just won't have any new features developed for it. This is > largely because it'll probably take just that long to write all the new BJ3 > code. When we do decide to desupport the existing BJ code, plenty of notice > will be given (i.e. years as opposed to months). > > > -- > Richard Holland, BSc MBCS > Finance Director, Eagle Genomics Ltd > M: +44 7500 438846 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From holland at eaglegenomics.com Thu Oct 23 02:04:23 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Thu, 23 Oct 2008 07:04:23 +0100 Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please! In-Reply-To: References: Message-ID: > > > Would it be better to commit this to trunk, and put the current codebase > out to pasture on a branch? Andreas is Mr.SVN. Andreas, what do you think? > > Is it possible (or desireable) to send SVN commit messages to the dev > mailing list? Or alternatively, should someone create a project entry for > biojava on CIA.vc? > > http://cia.vc I think commit messages to biojava-dev would be very useful. If nothing else, it provides a good indicator of activity to casual observers, and also lets people keep an automated eye (by mail filtering) on commits in the areas that interest them most. > > As soon as I can remember my dev.open-bio.org password I'll start > committing stuff, otherwise I'll post patches to bugzilla. If you've forgotten it, let support at OBF know and they'll reset it for you. 
cheers, Richard > > > michael > > > On Mon, 20 Oct 2008, Richard Holland wrote: > > > Hi all, > > > > I've just committed some new code to the biojava3 branch of the > biojava-live > > subversion repository. It's the foundations of a brand new > alphabet+symbol > > set of classes, and an example of how to use them to represent DNA. > You'll > > notice that the new code is very lightweight and allows for a lot more > > flexibility than the old code - for instance, the concept of Alphabet has > > changed radically. It also makes much more extensive use of the > Collections > > API. > > > > I haven't got any test cases or usage examples yet but give me a shout if > > you don't understand the code and I'll explain how it works. (Hint: > > SymbolFormat is there to convert Strings into SymbolList objects, and > vice > > versa). > > > > So, now we want some volunteers! We're starting from scratch here so > there's > > a lot of work to do. The whole of BioJava needs 'translating' into BJ3, > > whether it be copy-and-paste existing classes and modify them to suit the > > new style, or write completely new ones to provide equivalent > functionality. > > > > > > I'll post an example of how to do file parsing soon, probably starting > with > > FASTA. In the meantime, a good place to start would be for people to > design > > object models to represent their favourite data types (e.g. Genbank, or > > microarray data). Utility classes to manipulate those objects would be > great > > too. > > > > The object models need to be normalised as much as possible - e.g. if > your > > data has a lot of comments, and the order of those comments is important, > > then give your object model a collection of comment objects. The object > > model for each data type should be completely independent and use basic > data > > types wherever possible (e.g. store sequences as strings, don't attempt > to > > parse them into anything fancy like SymbolLists). 
The closer the object > > model is to the original data format, the better. There's going to be > clever > > tricks when it comes to converting data between different object models > > (e.g. Genbank to INSDSeq), which I will explain later when I put the file > > parsing examples up. > > > > You'll notice how the biojava3 branch uses Maven instead of Ant. This is > > because we want to make it as modular as possible, so if you want to > write > > microarray stuff, create a new microarray sub-project (as per the dna > > example that's already there). This way if someone only wants the > microarray > > bit of BJ3, they only need install the appropriate JAR file and can > ignore > > the rest. (The 'core' module is for stuff that is so generic it could be > > used anywhere, or is used in every single other module.) > > > > If coding isn't your cup of tea, then we would very much welcome testers > > (particularly those who enjoy writing test cases!), documenters > > (particularly code commenters), translators (for internationalisation of > the > > code), and of course all those who wish to contribute ideas and > suggestions > > no matter how off-the-wall they might be. In particular if you'd like to > > take charge of an area of the development process, e.g. Documentation > Chief, > > or Protein Champion, then that would be much appreciated. > > > > I'm very much looking forward to working with everyone on this. Good > luck, > > and happy coding! > > > > cheers, > > Richard > > > > PS. Please don't forget to attach the appropriate licence to your code. > You > > can copy-and-paste it from the existing classes I just committed this > > evening. > > > > PPS. For those who are worried about backwards compatibility - this was > > discussed on the lists a while back and it was made clear that BJ3 is a > > clean break. 
However, the existing code will continue to be maintained > and > > bugfixed for a couple of years so you don't have to upgrade if you don't > > want to - it just won't have any new features developed for it. This is > > largely because it'll probably take just that long to write all the new > BJ3 > > code. When we do decide to desupport the existing BJ code, plenty of > notice > > will be given (i.e. years as opposed to months). > > > > > > -- > > Richard Holland, BSc MBCS > > Finance Director, Eagle Genomics Ltd > > M: +44 7500 438846 | E: holland at eaglegenomics.com > > http://www.eaglegenomics.com/ > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From ch.koeberle at googlemail.com Thu Oct 23 04:58:15 2008 From: ch.koeberle at googlemail.com (Christian Köberle) Date: Thu, 23 Oct 2008 10:58:15 +0200 Subject: [Biojava-dev] BioSQL postgre BioEntryRelationship Message-ID: Hi, I found a bug in the PostgreSQL mapping file for BioEntryRelationship. line: The value for the attribute class has to be "BioEntry". For BioEntry, I am missing methods to access the BioEntryRelationships in which it is the subject_bioentry. I think BioEntryRelationship is a parent-child relationship, so it would be nice to have access in both directions. Furthermore, the Hibernate mapping strategy for BioSQL is quite slow and produces a lot of queries to the database, because lazy fetching is disabled for all lists and sets. In this mode Hibernate executes one query for each element in a list or set.
The faster way is to enable lazy fetching and use methods that load the list explicitly. Each of these methods executes only one query. For example:

import java.util.List;
import org.biojavax.bio.BioEntry;
import org.hibernate.Query;

// Returns the parents (relationship subjects) of bioEntry;
// session is an open Hibernate Session.
public List getParents(BioEntry bioEntry) {
    String stmt = "SELECT r.subject FROM BioEntryRelationship r WHERE r.object = :object";
    Query query = session.createQuery(stmt);
    query.setParameter("object", bioEntry);
    return query.list();
}

This is 2 to 4 times faster than BioEntry.getRelationships(). When loading all dependencies of a BioEntry object, a select with lazy fetching can be 500 times faster than a select with eager fetching (for example, for UniGene cluster Hs.4). Here is an example for the relationship between UniGene cluster Hs.2 and the gene BC067218 (we use BioSQL to store UniGene): getParents(): runtime: 14 msec SQL: Hibernate: select bioentry1_.bioentry_id as bioentry1_89_, bioentry1_.name as name89_, bioentry1_.identifier as identifier89_, bioentry1_.accession as accession89_, bioentry1_.description as descript5_89_, bioentry1_.version as version89_, bioentry1_.division as division89_, bioentry1_.taxon_id as taxon8_89_, bioentry1_.biodatabase_id as biodatab9_89_, bioentry1_1_.version as version93_, bioentry1_1_.length as length93_, bioentry1_1_.alphabet as alphabet93_, bioentry1_1_.seq as seq93_, case when bioentry1_1_.bioentry_id is not null then 2 when bioentry1_.bioentry_id is not null then 0 end as clazz_ from unigene.bioentry_relationship bioentryre0_ inner join unigene.bioentry bioentry1_ on bioentryre0_.subject_bioentry_id=bioentry1_.bioentry_id left outer join unigene.biosequence bioentry1_1_ on bioentry1_.bioentry_id=bioentry1_1_.bioentry_id left outer join unigene.biosequence bioentry1_2_ on bioentry1_.bioentry_id=bioentry1_2_.bioentry_id where bioentryre0_.object_bioentry_id=?
bioEntry.getRelationships(): runtime: 36 msec SQL:Hibernate: select bioentry0_.bioentry_id as bioentry1_89_, bioentry0_.name as name89_, bioentry0_.identifier as identifier89_, bioentry0_.accession as accession89_, bioentry0_.description as descript5_89_, bioentry0_.version as version89_, bioentry0_.division as division89_, bioentry0_.taxon_id as taxon8_89_, bioentry0_.biodatabase_id as biodatab9_89_, bioentry0_1_.version as version93_, bioentry0_1_.length as length93_, bioentry0_1_.alphabet as alphabet93_, bioentry0_1_.seq as seq93_, case when bioentry0_1_.bioentry_id is not null then 2 when bioentry0_.bioentry_id is not null then 0 end as clazz_ from unigene.bioentry bioentry0_ left outer join unigene.biosequence bioentry0_1_ on bioentry0_.bioentry_id=bioentry0_1_.bioentry_id left outer join unigene.biosequence bioentry0_2_ on bioentry0_.bioentry_id=bioentry0_2_.bioentry_id where bioentry0_.name=? Hibernate: select relationsh0_.object_bioentry_id as object3_1_, relationsh0_.bioentry_relationship_id as bioentry1_1_, relationsh0_.bioentry_relationship_id as bioentry1_95_0_, relationsh0_.term_id as term2_95_0_, relationsh0_.object_bioentry_id as object3_95_0_, relationsh0_.subject_bioentry_id as subject4_95_0_, relationsh0_.rank as rank95_0_ from unigene.bioentry_relationship relationsh0_ where relationsh0_.object_bioentry_id=? Hibernate: select namespace0_.biodatabase_id as biodatab1_80_0_, namespace0_.name as name80_0_, namespace0_.authority as authority80_0_, namespace0_.description as descript4_80_0_ from unigene.biodatabase namespace0_ where namespace0_.biodatabase_id=? 
Hibernate: select bioentry0_.bioentry_id as bioentry1_89_0_, bioentry0_.name as name89_0_, bioentry0_.identifier as identifier89_0_, bioentry0_.accession as accession89_0_, bioentry0_.description as descript5_89_0_, bioentry0_.version as version89_0_, bioentry0_.division as division89_0_, bioentry0_.taxon_id as taxon8_89_0_, bioentry0_.biodatabase_id as biodatab9_89_0_, bioentry0_1_.version as version93_0_, bioentry0_1_.length as length93_0_, bioentry0_1_.alphabet as alphabet93_0_, bioentry0_1_.seq as seq93_0_, case when bioentry0_1_.bioentry_id is not null then 2 when bioentry0_.bioentry_id is not null then 0 end as clazz_0_ from unigene.bioentry bioentry0_ left outer join unigene.biosequence bioentry0_1_ on bioentry0_.bioentry_id=bioentry0_1_.bioentry_id left outer join unigene.biosequence bioentry0_2_ on bioentry0_.bioentry_id=bioentry0_2_.bioentry_id where bioentry0_.bioentry_id=? Hibernate: select namespace0_.biodatabase_id as biodatab1_80_0_, namespace0_.name as name80_0_, namespace0_.authority as authority80_0_, namespace0_.description as descript4_80_0_ from unigene.biodatabase namespace0_ where namespace0_.biodatabase_id=? Hibernate: select term0_.term_id as term1_84_0_, term0_.name as name84_0_, term0_.identifier as identifier84_0_, term0_.definition as definition84_0_, term0_.is_obsolete as is5_84_0_, term0_.ontology_id as ontology6_84_0_ from unigene.term term0_ where term0_.term_id=? Hibernate: select ontology0_.ontology_id as ontology1_83_0_, ontology0_.name as name83_0_, ontology0_.definition as definition83_0_ from unigene.ontology ontology0_ where ontology0_.ontology_id=? Hibernate: select termset0_.ontology_id as ontology6_1_, termset0_.term_id as term1_1_, termset0_.term_id as term1_84_0_, termset0_.name as name84_0_, termset0_.identifier as identifier84_0_, termset0_.definition as definition84_0_, termset0_.is_obsolete as is5_84_0_, termset0_.ontology_id as ontology6_84_0_ from unigene.term termset0_ where termset0_.ontology_id=? 
Hibernate: select tripleset0_.ontology_id as ontology5_1_, tripleset0_.term_relationship_id as term1_1_, tripleset0_.term_relationship_id as term1_87_0_, tripleset0_.subject_term_id as subject2_87_0_, tripleset0_.object_term_id as object3_87_0_, tripleset0_.predicate_term_id as predicate4_87_0_, tripleset0_.ontology_id as ontology5_87_0_ from unigene.term_relationship tripleset0_ where tripleset0_.ontology_id=? Hibernate: select rankedcros0_.term_id as term1_0_, rankedcros0_.dbxref_id as dbxref2_0_, rankedcros0_.rank as rank0_ from unigene.term_dbxref rankedcros0_ where rankedcros0_.term_id=? Hibernate: select synonymset0_.term_id as term1_0_, synonymset0_.synonym as synonym0_ from unigene.term_synonym synonymset0_ where synonymset0_.term_id=? -- Christian Köberle From dicknetherlands at gmail.com Thu Oct 23 05:45:53 2008 From: dicknetherlands at gmail.com (Richard Holland) Date: Thu, 23 Oct 2008 10:45:53 +0100 Subject: [Biojava-dev] BioSQL postgre BioEntryRelationship In-Reply-To: References: Message-ID: Christian, Thanks for your comments. I'm not sure which file you're referring to, or what version of BioJava you have, as the line you quote does not appear in any of the current hbm.xml files in the trunk of Subversion. Also, the BioEntryRelationship interface and its implementations already have getSubject() and getObject() methods, which return the parent and child BioEntry instances. The BioEntry interface itself has a getBioEntryRelationships() method, which returns all relationships in which it is the object BioEntry. You could use HQL to obtain those for which it is the subject, but you are right that it would be good to have a method that returns the latter. Could you raise a Bugzilla request for this? It would be good if you could do some thorough testing of your lazy loading suggestions on some other use cases before we decide whether or not to adopt that approach in future developments. Use cases would include: 1.
Have a very large database with thousands of related records in it (e.g. load the whole of GenBank). Iterate over all the records in the database and perform a simple read operation on each that hits the modified methods. See if you run out of memory. 2. Like 1, but perform a series of repeated read/write operations using the modified methods, with a final commit that attempts to write the results back, to see if they still persist correctly. The reason is that the modified methods might cause problems for people who are processing large volumes of data in their databases. If all related records are loaded at once, even if only on demand, instead of one at a time, it will cause memory issues. The trade-off is therefore memory vs. speed. We opted for memory because it saves most novice coders from having to trace out-of-memory exceptions (these can still occur with the existing methods, but less often). Also, your method reruns the query every time it is called. It should probably cache the results after the first call, to prevent objects being reloaded unnecessarily, and to prevent objects from a previous call being modified and then overwritten by a subsequent call. Also, if Hibernate does not get back the same set that it auto-loaded as a property via the default get() method when it comes to save the object, it will throw a wobbly and refuse to commit. cheers, Richard
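Richard's suggestion to cache the results after the first call can be sketched as a small memoizing wrapper. This is a hedged, Hibernate-free illustration: `CachedQueryDemo`, `QueryRunner` and `CachedQueryResult` are hypothetical names, and the anonymous runner only simulates a query that, in real code, might wrap an HQL call such as getParents(). It sticks to Java 5 syntax (anonymous classes, no lambdas or diamond operator).

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: run a query once, then hand back the same result on later calls. */
public class CachedQueryDemo {

    /** Hypothetical stand-in for "run this query", e.g. an HQL call. */
    interface QueryRunner<T> {
        List<T> run();
    }

    /** Runs the query on the first get() and returns the same List instance afterwards. */
    static class CachedQueryResult<T> {
        private final QueryRunner<T> runner;
        private List<T> cached;
        private boolean loaded = false; // true once the query has run

        CachedQueryResult(QueryRunner<T> runner) {
            this.runner = runner;
        }

        synchronized List<T> get() {
            if (!loaded) {
                cached = runner.run();
                loaded = true;
            }
            return cached;
        }

        /** Drops the cached list so that the next get() runs the query again. */
        synchronized void invalidate() {
            cached = null;
            loaded = false;
        }
    }

    public static void main(String[] args) {
        final int[] queryCount = {0};
        CachedQueryResult<String> parents = new CachedQueryResult<String>(
                new QueryRunner<String>() {
                    public List<String> run() {
                        queryCount[0]++; // count simulated database round trips
                        List<String> result = new ArrayList<String>();
                        result.add("Hs.2");
                        return result;
                    }
                });

        List<String> first = parents.get();
        List<String> second = parents.get();
        System.out.println(queryCount[0]);   // 1: the query ran only once
        System.out.println(first == second); // true: same List instance both times
    }
}
```

Returning the same List instance on every call is what avoids reloading objects and stops a later query from replacing objects the caller has already modified. In a real application the cache would probably also need to be discarded (here via `invalidate()`) whenever the owning Hibernate session is closed.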
-- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From bugzilla-daemon at portal.open-bio.org Thu Oct 23 09:16:43 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 23 Oct 2008 09:16:43 -0400 Subject: [Biojava-dev] [Bug 2625] New: Parent Child Relationship of BioEntry via BioEntryRelationship Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2625 Summary: Parent Child Relationship of BioEntry via BioEntryRelationship Product: BioJava Version: unspecified Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: DB / BioSQL AssignedTo: biojava-dev at biojava.org ReportedBy: ch.koeberle at googlemail.com A BioEntry object has only the method getRelationships(), which returns all BioEntryRelationship objects where the BioEntry object is the result of BioEntryRelationship.getObject(). This is because BioEntry.hbm.xml contains only this mapping: I am missing something like this: BioEntry.getReverseRelationships() (or getChilds()) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From andreas at sdsc.edu Thu Oct 23 09:57:41 2008 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 23 Oct 2008 06:57:41 -0700 Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please! In-Reply-To: References: Message-ID: <59a41c430810230657p73b5d10kbf497c20fdfbe893@mail.gmail.com> >> Would it be better to commit this to trunk, and put the current codebase >> out to pasture on a branch? At the moment we have a number of unreleased bug fixes in biojava-live/trunk.
Also, if somebody were to start using BJ at present, I would still recommend 1.6. So for the moment I would say let's leave it the way it is. Once we reach the alpha stage we could release a final BioJava 1.7 and afterwards switch the branches in SVN. About the commit messages sent to this list: could they be sent once per day? I could also set something up as part of CruiseControl... Andreas From andreas at sdsc.edu Thu Oct 23 13:24:27 2008 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 23 Oct 2008 10:24:27 -0700 Subject: [Biojava-dev] svn write access In-Reply-To: <61C028BE-074B-4E16-A883-B8A2F6AD883E@ibmc.u-strasbg.fr> References: <6F5AE187-46C5-405C-80FB-495F97C704B5@ibmc.u-strasbg.fr> <59a41c430810230738p400c185chbc6a96f871dbb71b@mail.gmail.com> <61C028BE-074B-4E16-A883-B8A2F6AD883E@ibmc.u-strasbg.fr> Message-ID: <59a41c430810231024m7b5daf92t3bf6a1a354723301@mail.gmail.com> Hi Fabrice, to obtain a developer checkout you have to follow the procedure described on http://biojava.org/wiki/CVS_to_SVN_Migration under the section "Developer checkout". code.open-bio is a read-only copy of the SVN repository for anonymous checkout. The "real" developer repository is on the dev.open-bio machine and you can only access it via ssh. This setup is for security reasons. code.open-bio and dev.open-bio are synchronized approximately every 20 minutes. Andreas On Thu, Oct 23, 2008 at 8:22 AM, Fabrice Jossinet wrote: > Ok, I did that with the "code.open-bio.org" server and like that: > > svn co svn://code.open-bio.org/biojava/biojava-live/branches/biojava3 > --username fjossinet --password blabla > > In this case, it seems it doesn't work. > > I will try the other way as described in the biojava homepage > > Thanx > > F > On 23 Oct 08, at 16:38, Andreas Prlic wrote: > >> you need to check out with that account, so the svn flags are all set >> correctly. >> >> see the biojava homepage for how to check out with a developer account.
>> A >> >> 2008/10/23 Fabrice Jossinet : >>> >>> Hi Andreas, >>> >>> Mauricio has created me the account fjossinet for the machine >>> dev.open-bio.org. But I think this is only the first step since I still >>> don't have the write access on the svn machine. >>> >>> Thank you for your help >>> >>> Regards >>> >>> Fabrice >>> >>> >>> -- >>> Dr. Fabrice Jossinet >>> Laboratoire de Bioinformatique, modelisation et simulation des acides >>> nucleiques >>> Universite Louis Pasteur >>> Institut de biologie moleculaire et cellulaire du CNRS >>> UPR9002, Architecture et Reactivite de l'ARN >>> 15 rue Rene Descartes >>> F-67084 Strasbourg Cedex >>> France >>> >>> Tel + 33 (0) 3 88 417053 >>> FAX + 33 (0) 3 88 60 22 18 >>> >>> f.jossinet at ibmc.u-strasbg.fr >>> fjossinet at gmail.com >>> http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html >>> http://fjossinet.u-strasbg.fr/ From andreas at sdsc.edu Thu Oct 23 23:17:02 2008 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 23 Oct 2008 20:17:02 -0700 Subject: [Biojava-dev] biojava 3 docu on wiki Message-ID: <59a41c430810232017wbc8874fnf829c5b9e7ced4a9@mail.gmail.com> Hi, I summarized the current status of the BioJava3 project at http://biojava.org/wiki/BioJava3_project Feel free to update/add/comment. Andreas From andreas at sdsc.edu Fri Oct 24 00:01:31 2008 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 23 Oct 2008 21:01:31 -0700 Subject: [Biojava-dev] biojava 3 - java version Message-ID: <59a41c430810232101q7e3c2d36r822634c0bae4ad43@mail.gmail.com> Hi, I just tried to get an initial svn checkout of biojava3 on my Mac at home. It fails to build since there is no Java 1.6 available for my OS X 10.4.11... Is there a strong reason why we should enforce Java 1.6?
Otherwise it would be good to support 1.5+. Andreas From f.jossinet at ibmc.u-strasbg.fr Fri Oct 24 04:21:15 2008 From: f.jossinet at ibmc.u-strasbg.fr (Fabrice Jossinet) Date: Fri, 24 Oct 2008 10:21:15 +0200 Subject: [Biojava-dev] svn write access In-Reply-To: <59a41c430810231024m7b5daf92t3bf6a1a354723301@mail.gmail.com> References: <6F5AE187-46C5-405C-80FB-495F97C704B5@ibmc.u-strasbg.fr> <59a41c430810230738p400c185chbc6a96f871dbb71b@mail.gmail.com> <61C028BE-074B-4E16-A883-B8A2F6AD883E@ibmc.u-strasbg.fr> <59a41c430810231024m7b5daf92t3bf6a1a354723301@mail.gmail.com> Message-ID: <4CF8A26B-C50A-40F2-A7A5-B9F958F0F677@ibmc.u-strasbg.fr> Hi Andreas, Thank you for these details. I have added the new RNA module to the biojava3 branch and updated the pom.xml file in the root directory of this branch. Fabrice On 23 Oct 08, at 19:24, Andreas Prlic wrote: > Hi Fabrice, > > in order to obtain a developer checkout you have to follow the > procedure as it is described on > http://biojava.org/wiki/CVS_to_SVN_Migration > under the section > Developer checkout > > code.open-bio is a read only copy of the SVN repository for anonymous > checkout. The "real" developer repository is on the dev.open-bio > machine and you can only access it via ssh. This setup is for security > reasons. code.open-bio and dev.open-bio are getting synchronized > approx ev. 20 min. > > Andreas > > On Thu, Oct 23, 2008 at 8:22 AM, Fabrice Jossinet > wrote: >> Ok, I did that with the "code.open-bio.org" server and like that: >> >> svn co svn://code.open-bio.org/biojava/biojava-live/branches/biojava3 >> --username fjossinet --password blabla >> >> In this case, it seems it doesn't work. >> >> I will try the other way as described in the biojava homepage >> >> Thanx >> >> F >> On 23 Oct 08, at 16:38, Andreas Prlic wrote: >> >>> you need to check out with that account, so the svn flags are all >>> set >>> correctly. >>> >>> see the biojava homepage for how to check out with a developer >>> account.
>>> A >>> >>> 2008/10/23 Fabrice Jossinet : >>>> >>>> Hi Andreas, >>>> >>>> Mauricio has created me the account fjossinet for the machine >>>> dev.open-bio.org. But I think this is only the first step since I >>>> still >>>> don't have the write access on the svn machine. >>>> >>>> Thank you for your help >>>> >>>> Regards >>>> >>>> Fabrice From dicknetherlands at gmail.com Fri Oct 24 05:58:18 2008 From: dicknetherlands at gmail.com (Richard Holland) Date: Fri, 24 Oct 2008 10:58:18 +0100 Subject: [Biojava-dev] biojava 3 - java version In-Reply-To: <59a41c430810232101q7e3c2d36r822634c0bae4ad43@mail.gmail.com> References: <59a41c430810232101q7e3c2d36r822634c0bae4ad43@mail.gmail.com> Message-ID: It's only the older PPC Mac models (running Mac OS X 10.4 or older) that can't get an official Java version newer than 1.5 / 5.0. However, there is an alternative (free) route to a Java 1.6 / 6.0 compiler for these older machines: http://landonf.bikemonkey.org/static/soylatte/ We wanted to move to Java 6 because it will likely take about a year to get BJ3 fully up and running, by which time Java 6 will probably be the oldest version of Java still supported by Sun (5.0 is already end-of-lifed, and with 7.0 due out in January it is likely to be desupported very soon.
When 8.0 comes out, probably about 12 months after BJ3 is finished, 5.0 will definitely be desupported). cheers, Richard 2008/10/24 Andreas Prlic > Hi, > > I just tried to get an initial svn checkout of biojava3 on my mac at > home. It fails to build since there is no Java 1.6 available for my > OSX 10.4.11 ... > Is there a strong reason why we should enforce java 1.6? otherwise > would be good to support 1.5+ > > Andreas -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From f.jossinet at ibmc.u-strasbg.fr Fri Oct 24 06:20:59 2008 From: f.jossinet at ibmc.u-strasbg.fr (Fabrice Jossinet) Date: Fri, 24 Oct 2008 12:20:59 +0200 Subject: [Biojava-dev] biojava 3 - java version In-Reply-To: <59a41c430810232101q7e3c2d36r822634c0bae4ad43@mail.gmail.com> References: <59a41c430810232101q7e3c2d36r822634c0bae4ad43@mail.gmail.com> Message-ID: <6DF53C0D-E0CC-4504-979B-9122AD39EF62@ibmc.u-strasbg.fr> Just to refresh everyone's memory... Major changes included in Java 6:
* Support for older Win9x versions dropped. The last version for Windows 98 and Windows ME is Java Runtime Environment Version 5.0 Update 16 (1.5.0.16).
* Scripting Language Support (JSR 223): Generic API for tight integration with scripting languages, and built-in Mozilla JavaScript Rhino integration.
* Dramatic performance improvements for the core platform, and Swing.
* Improved Web Service support through JAX-WS (JSR 224).
* JDBC 4.0 support (JSR 221).
* Java Compiler API (JSR 199): an API allowing a Java program to select and invoke a Java Compiler programmatically.
* Upgrade of JAXB to version 2.0: Including integration of a StAX parser.
* Support for pluggable annotations (JSR 269).
* Many GUI improvements, such as integration of SwingWorker in the API, table sorting and filtering, and true Swing double-buffering (eliminating the gray-area effect).

Perhaps the core module could target Java 1.5, and if someone needs, for example, the GUI improvements for their module, that module could target a newer version. Is that possible? F On 24 Oct 08, at 06:01, Andreas Prlic wrote: > Hi, > > I just tried to get an initial svn checkout of biojava3 on my mac at > home. It fails to build since there is no Java 1.6 available for my > OSX 10.4.11 ... > Is there a strong reason why we should enforce java 1.6? otherwise > would be good to support 1.5+ > > Andreas From dicknetherlands at gmail.com Fri Oct 24 07:14:43 2008 From: dicknetherlands at gmail.com (Richard Holland) Date: Fri, 24 Oct 2008 12:14:43 +0100 Subject: [Biojava-dev] biojava 3 - java version In-Reply-To: <6DF53C0D-E0CC-4504-979B-9122AD39EF62@ibmc.u-strasbg.fr> References: <59a41c430810232101q7e3c2d36r822634c0bae4ad43@mail.gmail.com> <6DF53C0D-E0CC-4504-979B-9122AD39EF62@ibmc.u-strasbg.fr> Message-ID: If you can find a way to make Maven do that, then I'm happy for you to make the relevant changes. cheers, Richard 2008/10/24 Fabrice Jossinet > Just to refresh the memory.... > > Major changes included in Java 6: > > * Support for older Win9x versions dropped. The last version for Windows > 98 and Windows ME is Java Runtime Environment Version 5.0 Update 16 ( > 1.5.0.16). > * Scripting Language Support (JSR 223): Generic API for tight > integration with scripting languages, and built-in Mozilla Javascript Rhino > integration > * Dramatic performance improvements for the core platform, and Swing. > * Improved Web Service support through JAX-WS (JSR 224) > * JDBC 4.0 support (JSR 221).
> * Java Compiler API (JSR 199): an API allowing a Java program to select > and invoke a Java Compiler programmatically. > * Upgrade of JAXB to version 2.0: Including integration of a StAX > parser. > * Support for pluggable annotations (JSR 269). > * Many GUI improvements, such as integration of SwingWorker in the API, > table sorting and filtering, and true Swing double-buffering (eliminating > the gray-area effect). > > Perhaps the core module can be linked to the 1.5 version. And if someone > needs, for example, the improvements of the GUI for his module, this module > will be linked to another version. > > Possible or not ? > > F > > On 24 Oct 08, at 06:01, Andreas Prlic wrote: > > > Hi, >> >> I just tried to get an initial svn checkout of biojava3 on my mac at >> home. It fails to build since there is no Java 1.6 available for my >> OSX 10.4.11 ... >> Is there a strong reason why we should enforce java 1.6? otherwise >> would be good to support 1.5+ >> >> Andreas -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From ayates at ebi.ac.uk Fri Oct 24 07:28:56 2008 From: ayates at ebi.ac.uk (Andy Yates) Date: Fri, 24 Oct 2008 12:28:56 +0100 Subject: [Biojava-dev] biojava 3 - java version In-Reply-To: References: <59a41c430810232101q7e3c2d36r822634c0bae4ad43@mail.gmail.com> <6DF53C0D-E0CC-4504-979B-9122AD39EF62@ibmc.u-strasbg.fr> Message-ID: <4901B178.7090307@ebi.ac.uk> Yes, I believe it is possible to compile a module for a different version of Java, as described here:
http://maven.apache.org/plugins/maven-compiler-plugin/howto.html However, to do this properly the code has to be compiled against the 1.5 JDK class libraries, because the source/target settings alone do not stop code from calling APIs that only exist in Java 6. This matters especially if we are going to use the API as much as we can. My group has already run into this with the changes to the java.sql.Connection interface (JDBC 4.0 in Java 6 added methods to it), which mean we have to compile against the 1.5 libraries. Andy Richard Holland wrote: > If you can find a way to make Maven do that, then I'm happy for you to make > the relevant changes. > > cheers, > Richard > > 2008/10/24 Fabrice Jossinet > >> Just to refresh the memory.... >> >> Major changes included in Java 6: >> >> * Support for older Win9x versions dropped. The last version for Windows >> 98 and Windows ME is Java Runtime Environment Version 5.0 Update 16 ( >> 1.5.0.16). >> * Scripting Language Support (JSR 223): Generic API for tight >> integration with scripting languages, and built-in Mozilla Javascript Rhino >> integration >> * Dramatic performance improvements for the core platform, and Swing. >> * Improved Web Service support through JAX-WS (JSR 224) >> * JDBC 4.0 support (JSR 221). >> * Java Compiler API (JSR 199): an API allowing a Java program to select >> and invoke a Java Compiler programmatically. >> * Upgrade of JAXB to version 2.0: Including integration of a StAX >> parser. >> * Support for pluggable annotations (JSR 269). >> * Many GUI improvements, such as integration of SwingWorker in the API, >> table sorting and filtering, and true Swing double-buffering (eliminating >> the gray-area effect). >> >> Perhaps the core module can be linked to the 1.5 version. And if someone >> needs, for example, the improvements of the GUI for his module, this module >> will be linked to another version. >> >> Possible or not ? >> >> F >> >> On 24 Oct 08, at 06:01, Andreas Prlic wrote: >> >> >> Hi, >>> I just tried to get an initial svn checkout of biojava3 on my mac at >>> home. It fails to build since there is no Java 1.6 available for my >>> OSX 10.4.11 ...
>>> Is there a strong reason why we should enforce java 1.6? otherwise >>> would be good to support 1.5+ >>> >>> Andreas From pzgyuanf at gmail.com Sat Oct 25 10:00:17 2008 From: pzgyuanf at gmail.com (pprun) Date: Sat, 25 Oct 2008 22:00:17 +0800 Subject: [Biojava-dev] Test failed for Alphabet.getSymbolMatchType method Message-ID: <49032671.1080309@gmail.com> Hi, the current implementation uses the same condition, equalsIgnoreCase, for both EXACT_STRING_MATCH and MIXED_CASE_MATCH, so the MIXED_CASE_MATCH branch can never be reached:

public SymbolMatchType getSymbolMatchType(Symbol a, Symbol b) {
    ...
    if (a.toString().equalsIgnoreCase(b.toString())) {
        return SymbolMatchType.EXACT_STRING_MATCH;
    }
    if (a.toString().equalsIgnoreCase(b.toString())) {
        return SymbolMatchType.MIXED_CASE_MATCH;
    }
    ...

String.equals should be used for EXACT_STRING_MATCH:

public SymbolMatchType getSymbolMatchType(Symbol a, Symbol b) {
    ...
    // case-sensitive comparison first, so identical strings are an exact match
    if (a.toString().equals(b.toString())) {
        return SymbolMatchType.EXACT_STRING_MATCH;
    }
    // only strings that differ in case alone reach this branch
    if (a.toString().equalsIgnoreCase(b.toString())) {
        return SymbolMatchType.MIXED_CASE_MATCH;
    }
    ...

The test case used to identify the above bug is:

/*
 * BioJava development code
 *
 * This code may be freely distributed and modified under the
 * terms of the GNU Lesser General Public Licence. This should
 * be distributed with the code. If you do not have a copy,
 * see:
 *
 *      http://www.gnu.org/copyleft/lesser.html
 *
 * Copyright for this code is held jointly by the individual
 * authors. These should be listed in @author doc comments.
 *
 * For more information on the BioJava project and its aims,
 * or to join the biojava-l mailing list, visit the home page
 * at:
 *
 *      http://www.biojava.org/
 *
 */
package org.biojava.core.symbol;

import org.junit.Test;
import static org.junit.Assert.*;

/**
 * @author pprun
 */
public class AlphabetTest {

    /**
     * Test of getSymbolMatchType method, of class Alphabet.
     */
    @Test
    public void testGetSymbolMatchType() {
        Alphabet testAlphabet = new Alphabet("testGetSymbolMatchType");

        // 1. exact match
        Symbol a = Symbol.get("ATGC");
        Symbol b = Symbol.get("ATGC");
        SymbolMatchType expResult = SymbolMatchType.EXACT_MATCH;
        SymbolMatchType result = testAlphabet.getSymbolMatchType(a, b);
        assertEquals(expResult, result);

        // 2. mixed case match
        a = Symbol.get("ATGC");
        b = Symbol.get("aTGC");
        expResult = SymbolMatchType.MIXED_CASE_MATCH;
        result = testAlphabet.getSymbolMatchType(a, b);
        assertEquals(expResult, result);
    }
}

BTW, how can I get a developer/tester role? Then I could contribute to BJ3 development or testing (I'm still a beginner in the bio field). Thanks, Pprun From andreas at sdsc.edu Tue Oct 28 00:40:35 2008 From: andreas at sdsc.edu (Andreas Prlic) Date: Mon, 27 Oct 2008 21:40:35 -0700 Subject: [Biojava-dev] BioSQL postgre BioEntryRelationship In-Reply-To: References: Message-ID: <59a41c430810272140h290a8a91q26af24946c2c63a5@mail.gmail.com> Hi Richard, I updated the 1.6 release with your fixes: http://www.biojava.org/download/bj16/all/biojava-1.6.1-all.jar Can you please verify it and, if it is correct, update the download page on the wiki?
Andreas

On Thu, Oct 23, 2008 at 6:24 AM, Richard Holland wrote:
> Andreas - is it possible to rebuild biojava-1.6-all.jar with the following
> fix made to it?
>
> cheers,
> Richard
>
> ---------- Forwarded message ----------
> From: Christian Köberle
> Date: 2008/10/23
> Subject: Re: [Biojava-dev] BioSQL postgre BioEntryRelationship
> To: Richard Holland
>
>
> Hi Richard,
>
> I found the error in the current download of BioJava 1.6
> (http://www.biojava.org/download/bj16/all/biojava-1.6-all.jar) in the file
> src/org/biojavax/bio/db/biosql/pg/BioEntryRelationship.hbm.xml
>
> table="bioentry_relationship" node="sequenceRelation"
> entity-name="BioEntryRelationship">
> column="bioentry_relationship_id" node="@id">
>
> bioentry_relationship_pk_seq
>
> cascade="persist,merge,save-update" node="@termId" embed-xml="false"/>
> not-null="true" cascade="persist,merge,save-update" node="@objectFeatureId"
> embed-xml="false"/>
> not-null="true" cascade="persist,merge,save-update"
> node="@subjectBioEntryId" embed-xml="false"/>
>
> cheers,
> Christian
>
>
> 2008/10/23 Richard Holland
>>
>> Christian,
>>
>> Thanks for your comments.
>>
>> I'm not sure which file you're referring to, or what version of BioJava
>> you have, as the line you quote does not appear in any of the current
>> hbm.xml files in the trunk of Subversion.
>>
>> Also, the BioEntryRelationship interface and its implementations
>> already have getSubject() and getObject() methods, which return the parent
>> and child BioEntry instances.
>>
>> The BioEntry interface itself has a getBioEntryRelationships() method
>> which returns all relationships in which it is the object BioEntry. You
>> could use HQL to obtain those for which it is the subject, but you are right
>> that it would be good to have a method that returns the latter. Could you
>> raise a Bugzilla request for this?
>>
>> It would be good if you could do some thorough testing of your lazy
>> loading suggestions on some other use cases before we decide whether or not
>> to adopt that approach in future developments. Use cases would include:
>>
>> 1. Have a very large database with thousands of related records in it
>> (e.g. load the whole of GenBank). Iterate over all the records in the
>> database and perform a simple read operation on each that hits the modified
>> methods. See if you run out of memory.
>>
>> 2. Like 1, but perform a series of repeated read/write operations using
>> the modified methods, with a final commit to attempt to write the results
>> back, to see if they still persist correctly.
>>
>> The reason is that the modified methods might cause problems for
>> people who are processing large volumes of data in their databases. If all
>> related records are loaded at once, even only on demand, instead of one at a
>> time, it will cause memory issues. The trade-off is therefore memory vs.
>> speed. We opted for the memory option because it spares most novice coders
>> from having to trace out-of-memory exceptions (these can still occur with
>> the existing methods, but less often).
>>
>> Also, your method reruns the query every time it is called. It should
>> probably cache the results after the first call, both to avoid reloading
>> objects unnecessarily and to avoid problems when objects from a previous
>> call are modified and then overwritten by a subsequent call.
>> Also, if Hibernate does not receive back the same set that it auto-loaded as
>> a property via the default get() method when it comes to save the object, it
>> will throw a wobbly and refuse to commit.
>>
>> cheers,
>> Richard
>>
>>
>>
>> 2008/10/23 Christian Köberle
>>>
>>> Hi,
>>> I found a bug in the postgre mapping file for BioEntryRelationship.
>>> line:
>>> >> not-null="true" cascade="persist,merge,save-update"
>>> node="@objectFeatureId"
>>> embed-xml="false"/>
>>> The value of the class attribute has to be "BioEntry".
>>>
>>> For BioEntry, I miss methods to access the BioEntryRelationships in which
>>> it is the subject_bioentry. I think BioEntryRelationship is a parent-child
>>> relationship, so it would be nice to have access to both directions.
>>>
>>> Furthermore, the Hibernate mapping strategy for BioSQL is quite slow
>>> and produces a lot of queries to the database, because lazy fetching is
>>> disabled for all lists and sets. In this mode Hibernate executes one
>>> query for each element in a list or set. The faster way is to enable
>>> lazy fetching and use methods to load the list; each of these methods
>>> executes only one query.
>>> For example:
>>>
>>>     public List getParents(BioEntry bioEntry) {
>>>         // parents are the subjects of relationships whose object is bioEntry
>>>         String stmt = "SELECT r.subject FROM BioEntryRelationship r WHERE r.object = :object";
>>>         Query query = session.createQuery(stmt);
>>>         query.setParameter("object", bioEntry);
>>>         return query.list();
>>>     }
>>>
>>> This is a factor of 2 to 4 faster than the method
>>> BioEntry.getRelationships().
>>> When loading all dependencies of a BioEntry object, a select with lazy
>>> fetching can be 500 times faster than a select with eager fetching (in the
>>> case of UniGene cluster Hs.4, for example).
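To make the query-count argument above concrete, here is a toy model in plain Java (no Hibernate) contrasting per-element loading with a single join query. The class and its in-memory "relationship table" are hypothetical; the counts only illustrate the one-query-per-element pattern Christian describes, not BioSQL's actual behaviour or timings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of two loading strategies; counts "queries" instead of running SQL. */
class FetchCountSketch {
    // Hypothetical relationship table: object entry id -> subject (parent) entry ids.
    private final Map<Integer, List<Integer>> parentsByObject;
    private int queries = 0;

    FetchCountSketch(Map<Integer, List<Integer>> parentsByObject) {
        this.parentsByObject = parentsByObject;
    }

    int queryCount() {
        return queries;
    }

    /** One query for the relationship rows, then one more per related entry. */
    List<String> loadOneByOne(int objectId) {
        queries++;
        List<String> result = new ArrayList<>();
        for (int id : parentsByObject.getOrDefault(objectId, Collections.<Integer>emptyList())) {
            queries++;                       // separate select for each related entry
            result.add("entry-" + id);
        }
        return result;
    }

    /** A single join query returning all related entries, in the style of getParents(). */
    List<String> loadWithJoin(int objectId) {
        queries++;
        List<String> result = new ArrayList<>();
        for (int id : parentsByObject.getOrDefault(objectId, Collections.<Integer>emptyList())) {
            result.add("entry-" + id);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> table = new HashMap<>();
        table.put(1, Arrays.asList(10, 11, 12));
        FetchCountSketch perElement = new FetchCountSketch(table);
        perElement.loadOneByOne(1);
        FetchCountSketch joined = new FetchCountSketch(table);
        joined.loadWithJoin(1);
        System.out.println(perElement.queryCount() + " vs " + joined.queryCount()); // prints "4 vs 1"
    }
}
```

The gap grows linearly with the number of related entries, which is why the effect is largest for big clusters; Richard's memory concern in the thread is the other side of this trade-off.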
>>> Here is an example for the relationship between UniGene cluster Hs.2 and the gene
>>> BC067218 (we use BioSQL to store UniGene):
>>>
>>> getParents():
>>> runtime: 14 msec
>>> SQL: Hibernate: select bioentry1_.bioentry_id as bioentry1_89_, >>> bioentry1_.name as name89_, bioentry1_.identifier as identifier89_, >>> bioentry1_.accession as accession89_, bioentry1_.description as >>> descript5_89_, bioentry1_.version as version89_, bioentry1_.division as >>> division89_, bioentry1_.taxon_id as taxon8_89_, bioentry1_.biodatabase_id >>> as >>> biodatab9_89_, bioentry1_1_.version as version93_, bioentry1_1_.length as >>> length93_, bioentry1_1_.alphabet as alphabet93_, bioentry1_1_.seq as >>> seq93_, >>> case when bioentry1_1_.bioentry_id is not null then 2 when >>> bioentry1_.bioentry_id is not null then 0 end as clazz_ from >>> unigene.bioentry_relationship bioentryre0_ inner join unigene.bioentry >>> bioentry1_ on bioentryre0_.subject_bioentry_id=bioentry1_.bioentry_id >>> left >>> outer join unigene.biosequence bioentry1_1_ on >>> bioentry1_.bioentry_id=bioentry1_1_.bioentry_id left outer >>> join unigene.biosequence bioentry1_2_ on >>> bioentry1_.bioentry_id=bioentry1_2_.bioentry_id where >>> bioentryre0_.object_bioentry_id=?
>>> >>> >>> bioEntry.getRelationships(): >>> runtime: 36 msec >>> SQL:Hibernate: select bioentry0_.bioentry_id as bioentry1_89_, >>> bioentry0_.name as name89_, bioentry0_.identifier as identifier89_, >>> bioentry0_.accession as accession89_, bioentry0_.description as >>> descript5_89_, bioentry0_.version as version89_, bioentry0_.division as >>> division89_, bioentry0_.taxon_id as taxon8_89_, bioentry0_.biodatabase_id >>> as >>> biodatab9_89_, bioentry0_1_.version as version93_, bioentry0_1_.length as >>> length93_, bioentry0_1_.alphabet as alphabet93_, bioentry0_1_.seq as >>> seq93_, >>> case when bioentry0_1_.bioentry_id is not null then 2 when >>> bioentry0_.bioentry_id is not null then 0 end as clazz_ from >>> unigene.bioentry bioentry0_ left outer join unigene.biosequence >>> bioentry0_1_ >>> on bioentry0_.bioentry_id=bioentry0_1_.bioentry_id left outer join >>> unigene.biosequence bioentry0_2_ on >>> bioentry0_.bioentry_id=bioentry0_2_.bioentry_id where bioentry0_.name=? >>> Hibernate: select relationsh0_.object_bioentry_id as object3_1_, >>> relationsh0_.bioentry_relationship_id as bioentry1_1_, >>> relationsh0_.bioentry_relationship_id as bioentry1_95_0_, >>> relationsh0_.term_id as term2_95_0_, relationsh0_.object_bioentry_id as >>> object3_95_0_, relationsh0_.subject_bioentry_id as subject4_95_0_, >>> relationsh0_.rank as rank95_0_ from unigene.bioentry_relationship >>> relationsh0_ where relationsh0_.object_bioentry_id=? >>> Hibernate: select namespace0_.biodatabase_id as biodatab1_80_0_, >>> namespace0_.name as name80_0_, namespace0_.authority as authority80_0_, >>> namespace0_.description as descript4_80_0_ from unigene.biodatabase >>> namespace0_ where namespace0_.biodatabase_id=? 
>>> Hibernate: select bioentry0_.bioentry_id as bioentry1_89_0_, >>> bioentry0_.name >>> as name89_0_, bioentry0_.identifier as identifier89_0_, >>> bioentry0_.accession >>> as accession89_0_, bioentry0_.description as descript5_89_0_, >>> bioentry0_.version as version89_0_, bioentry0_.division as division89_0_, >>> bioentry0_.taxon_id as taxon8_89_0_, bioentry0_.biodatabase_id as >>> biodatab9_89_0_, bioentry0_1_.version as version93_0_, >>> bioentry0_1_.length >>> as length93_0_, bioentry0_1_.alphabet as alphabet93_0_, bioentry0_1_.seq >>> as >>> seq93_0_, case when bioentry0_1_.bioentry_id is not null then 2 when >>> bioentry0_.bioentry_id is not null then 0 end as clazz_0_ from >>> unigene.bioentry bioentry0_ left outer join unigene.biosequence >>> bioentry0_1_ >>> on bioentry0_.bioentry_id=bioentry0_1_.bioentry_id left outer join >>> unigene.biosequence bioentry0_2_ on >>> bioentry0_.bioentry_id=bioentry0_2_.bioentry_id where >>> bioentry0_.bioentry_id=? >>> Hibernate: select namespace0_.biodatabase_id as biodatab1_80_0_, >>> namespace0_.name as name80_0_, namespace0_.authority as authority80_0_, >>> namespace0_.description as descript4_80_0_ from unigene.biodatabase >>> namespace0_ where namespace0_.biodatabase_id=? >>> Hibernate: select term0_.term_id as term1_84_0_, term0_.name as >>> name84_0_, >>> term0_.identifier as identifier84_0_, term0_.definition as >>> definition84_0_, >>> term0_.is_obsolete as is5_84_0_, term0_.ontology_id as ontology6_84_0_ >>> from >>> unigene.term term0_ where term0_.term_id=? >>> Hibernate: select ontology0_.ontology_id as ontology1_83_0_, >>> ontology0_.name >>> as name83_0_, ontology0_.definition as definition83_0_ from >>> unigene.ontology >>> ontology0_ where ontology0_.ontology_id=? 
>>> Hibernate: select termset0_.ontology_id as ontology6_1_, >>> termset0_.term_id >>> as term1_1_, termset0_.term_id as term1_84_0_, termset0_.name as >>> name84_0_, >>> termset0_.identifier as identifier84_0_, termset0_.definition as >>> definition84_0_, termset0_.is_obsolete as is5_84_0_, >>> termset0_.ontology_id >>> as ontology6_84_0_ from unigene.term termset0_ where >>> termset0_.ontology_id=? >>> Hibernate: select tripleset0_.ontology_id as ontology5_1_, >>> tripleset0_.term_relationship_id as term1_1_, >>> tripleset0_.term_relationship_id as term1_87_0_, >>> tripleset0_.subject_term_id >>> as subject2_87_0_, tripleset0_.object_term_id as object3_87_0_, >>> tripleset0_.predicate_term_id as predicate4_87_0_, >>> tripleset0_.ontology_id >>> as ontology5_87_0_ from unigene.term_relationship tripleset0_ where >>> tripleset0_.ontology_id=? >>> Hibernate: select rankedcros0_.term_id as term1_0_, >>> rankedcros0_.dbxref_id >>> as dbxref2_0_, rankedcros0_.rank as rank0_ from unigene.term_dbxref >>> rankedcros0_ where rankedcros0_.term_id=? >>> Hibernate: select synonymset0_.term_id as term1_0_, synonymset0_.synonym >>> as >>> synonym0_ from unigene.term_synonym synonymset0_ where >>> synonymset0_.term_id=?
>>>
>>> --
>>> Christian Köberle
>>
>>
>>
>> --
>> Richard Holland, BSc MBCS
>> Finance Director, Eagle Genomics Ltd
>> M: +44 7500 438846 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>
>
>
> --
> Christian Köberle
> Schönholzerstr. 5
> 10115 Berlin
> Mobil: 0179 79 35 345
>
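Richard's suggestion in the BioEntryRelationship thread above, caching the result of a getParents()-style query after the first call, could look roughly like this. The loader function is a hypothetical stand-in for the HQL query; the sketch deliberately ignores Hibernate session scope and dirty-checking, which Richard points out are the tricky part in practice.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Runs a per-key lookup at most once and then serves the cached result.
 * The loader stands in for a hypothetical getParents(entry) HQL query.
 */
class CachingRelationLoader<K, V> {
    private final Function<K, List<V>> loader;
    private final Map<K, List<V>> cache = new HashMap<>();

    CachingRelationLoader(Function<K, List<V>> loader) {
        this.loader = loader;
    }

    /** Returns the same read-only list instance on every call for a given key. */
    List<V> get(K key) {
        // computeIfAbsent invokes the loader only on the first request for this key.
        return cache.computeIfAbsent(key, k -> Collections.unmodifiableList(loader.apply(k)));
    }

    /** Forgets cached results, e.g. when the session that loaded them ends. */
    void clear() {
        cache.clear();
    }

    public static void main(String[] args) {
        int[] calls = {0};
        CachingRelationLoader<String, String> parents = new CachingRelationLoader<>(entry -> {
            calls[0]++;                          // count how often the "query" runs
            return Arrays.asList(entry + "-parent");
        });
        parents.get("BC067218");
        parents.get("BC067218");
        System.out.println(calls[0]);            // prints 1
    }
}
```

Handing back the same read-only list on repeated calls keeps callers from mutating a copy that a later call would silently replace; whether that is enough to keep Hibernate's own collection tracking happy is outside what this sketch shows.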
From bugzilla-daemon at portal.open-bio.org  Fri Oct  3 10:30:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 3 Oct 2008 06:30:16 -0400
Subject: [Biojava-dev] [Bug 2603] StringIndexOutOfBoundsException while parsing blastresult
In-Reply-To:
Message-ID: <200810031030.m93AUGcD007688@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2603

------- Comment #1 from dtoomey at rcsi.ie  2008-10-03 06:30 EST -------
I have narrowed down the offending line to

    oParsedSeq = poLine.substring( iOffset).concat( new String( oPadding ) );

in BlastLikeAlignmentSAXParser.java. I have put in a hack which at least allows me to run the code:

    try {
        oParsedSeq = poLine.substring( iOffset).concat( new String( oPadding ) );
    } catch (StringIndexOutOfBoundsException ex) {
        System.out.println("Caught sub string error for poLine: " + poLine
                + " Offset is " + String.valueOf(iOffset));
        oParsedSeq = poLine.concat( new String( oPadding ) );
    }
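A sketch of a bounds-checked alternative to the try/catch hack in comment #1. The parameter names mirror the report; falling back to an empty segment (rather than the whole line, as the hack does) is an assumption about what the parser needs, so this is illustrative and not the patch that was later attached to the bug. On the Java 6 String implementation the one-argument substring delegates to the two-argument form, so the "String index out of range: -3" in the related tblastn report is consistent with an offset three characters past the end of the line.

```java
/**
 * Bounds-checked version of the expression from comment #1:
 *   poLine.substring(iOffset).concat(new String(oPadding))
 * Returning an empty segment for a too-short line is an assumption.
 */
class SafeAlignmentSlice {

    static String sliceAndPad(String poLine, int iOffset, char[] oPadding) {
        int start = Math.max(iOffset, 0);
        // A line that ends before the offset has no sequence columns left.
        String tail = start < poLine.length() ? poLine.substring(start) : "";
        return tail.concat(new String(oPadding));
    }

    public static void main(String[] args) {
        System.out.println("[" + sliceAndPad("Query: 1 ACGT", 9, new char[0]) + "]");       // prints [ACGT]
        System.out.println("[" + sliceAndPad("GN=ISPF", 11, new char[] {' ', ' '}) + "]"); // prints [  ]
    }
}
```

An explicit length check avoids using an exception for control flow and makes the chosen fallback visible in one place.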
From bugzilla-daemon at portal.open-bio.org  Wed Oct 15 08:12:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 15 Oct 2008 04:12:18 -0400
Subject: [Biojava-dev] [Bug 2617] New: Cookbook blast parser example fails on a tblastn example
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2617

           Summary: Cookbook blast parser example fails on a tblastn example
           Product: BioJava
           Version: live (CVS source)
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: search
        AssignedTo: biojava-dev at biojava.org
        ReportedBy: holland at ebi.ac.uk


(raised on behalf of user Charles Imbusch)

Hello,

for a project I want to parse a tblastn result with BioJava. I used the code at
http://biojava.org/wiki/BioJava:CookBook:Blast:Parser as is, and I get the
following error message:

Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -3
        at java.lang.String.substring(String.java:1938)
        at java.lang.String.substring(String.java:1905)
        at org.biojava.bio.program.sax.BlastLikeAlignmentSAXParser.parseLine(BlastLikeAlignmentSAXParser.java:289)
        at org.biojava.bio.program.sax.BlastLikeAlignmentSAXParser.parse(BlastLikeAlignmentSAXParser.java:115)
        at org.biojava.bio.program.sax.HitSectionSAXParser.outputHSPInfo(HitSectionSAXParser.java:514)
        at org.biojava.bio.program.sax.HitSectionSAXParser.firstHSPEvent(HitSectionSAXParser.java:287)
        at org.biojava.bio.program.sax.HitSectionSAXParser.interpret(HitSectionSAXParser.java:251)
        at org.biojava.bio.program.sax.HitSectionSAXParser.parse(HitSectionSAXParser.java:118)
        at org.biojava.bio.program.sax.BlastSAXParser.hitsSectionReached(BlastSAXParser.java:635)
        at org.biojava.bio.program.sax.BlastSAXParser.interpret(BlastSAXParser.java:337)
        at org.biojava.bio.program.sax.BlastSAXParser.parse(BlastSAXParser.java:164)
        at org.biojava.bio.program.sax.BlastLikeSAXParser.onNewDataSet(BlastLikeSAXParser.java:313)
        at org.biojava.bio.program.sax.BlastLikeSAXParser.interpret(BlastLikeSAXParser.java:276)
        at org.biojava.bio.program.sax.BlastLikeSAXParser.parse(BlastLikeSAXParser.java:162)
        at BlastEcho.echo(BlastEcho.java:29)
        at BlastEcho.main(BlastEcho.java:75)

I uploaded the BLAST output file I want to parse here:
http://charles.imbusch.net/tmp/blastresult.txt

Any answer is appreciated.

Cheers,
Charles

From f.jossinet at ibmc.u-strasbg.fr  Wed Oct 15 08:36:09 2008
From: f.jossinet at ibmc.u-strasbg.fr (Fabrice Jossinet)
Date: Wed, 15 Oct 2008 10:36:09 +0200
Subject: [Biojava-dev] Proposition of participation to the BioJava project
Message-ID:

Dear BioJava team,

my name is Fabrice Jossinet. I'm working as an assistant professor at a French university (Louis Pasteur University in Strasbourg). I have been developing bioinformatics tools in Java since 2002. Before that, I did a PhD as a molecular biologist at the bench ;)

I'm interested in the study of RNA. At the moment I'm focused on its structural features, but I'm also interested in non-coding RNA genes in genomes. You can have a look at my current project at this address: http://paradise-ibmc.u-strasbg.fr/ . The project currently has about 60,000 lines of code and uses more than 10 external libraries.

I have been following BioJava for several years now. I would like to extend it with RNA concepts. If you think that I can participate, don't hesitate to reply ;)

All the best

Fabrice

--
Dr. Fabrice Jossinet
Laboratoire de Bioinformatique, modelisation et simulation des acides nucleiques
Universite Louis Pasteur
Institut de biologie moleculaire et cellulaire du CNRS
UPR9002, Architecture et Reactivite de l'ARN
15 rue Rene Descartes
F-67084 Strasbourg Cedex
France

Tel + 33 (0) 3 88 417053
FAX + 33 (0) 3 88 60 22 18

f.jossinet at ibmc.u-strasbg.fr
fjossinet at gmail.com
http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
http://fjossinet.u-strasbg.fr/

From simpleyrx at 163.com  Wed Oct 15 09:11:50 2008
From: simpleyrx at 163.com (simpleyrx)
Date: Wed, 15 Oct 2008 17:11:50 +0800 (CST)
Subject: [Biojava-dev] can biojava calcaulte profile-profile alignment ?
In-Reply-To:
References:
Message-ID: <7852810.354291224061911001.JavaMail.coremail@app143.163.com>

Dear experts,

I wonder whether BioJava can calculate profile-profile alignments?

--
student

From bugzilla-daemon at portal.open-bio.org  Wed Oct 15 16:05:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 15 Oct 2008 12:05:23 -0400
Subject: [Biojava-dev] [Bug 2617] Cookbook blast parser example fails on a tblastn example
In-Reply-To:
Message-ID: <200810151605.m9FG5Nhb004488@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2617

holland at ebi.ac.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE

------- Comment #1 from holland at ebi.ac.uk  2008-10-15 12:05 EST -------

*** This bug has been marked as a duplicate of bug 2603 ***
From bugzilla-daemon at portal.open-bio.org  Wed Oct 15 16:05:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 15 Oct 2008 12:05:25 -0400
Subject: [Biojava-dev] [Bug 2603] StringIndexOutOfBoundsException while parsing blastresult
In-Reply-To:
Message-ID: <200810151605.m9FG5PZo004505@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2603

holland at ebi.ac.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |holland at ebi.ac.uk

------- Comment #2 from holland at ebi.ac.uk  2008-10-15 12:05 EST -------

*** Bug 2617 has been marked as a duplicate of this bug. ***

From holland at eaglegenomics.com  Wed Oct 15 16:25:16 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 15 Oct 2008 17:25:16 +0100
Subject: [Biojava-dev] Proposition of participation to the BioJava project
In-Reply-To:
References:
Message-ID:

You're absolutely welcome to contribute! We appreciate all the help we can get.

I will be sending out an email to the BioJava mailing lists in the next couple of days inviting contributions for the new BioJava 3 code and describing how to go about it. I think your RNA ideas would be a great starting point.

cheers,
Richard

From holland at eaglegenomics.com  Wed Oct 15 16:29:59 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 15 Oct 2008 17:29:59 +0100
Subject: [Biojava-dev] can biojava calcaulte profile-profile alignment ?
In-Reply-To: <7852810.354291224061911001.JavaMail.coremail@app143.163.com>
References: <7852810.354291224061911001.JavaMail.coremail@app143.163.com>
Message-ID:

The short answer: no.

The long answer: not yet! But if someone would like to contribute some code that can do it, watch out for my email to the mailing lists in the next couple of days inviting contributions for the new BioJava 3 code base.

cheers,
Richard

From bugzilla-daemon at portal.open-bio.org  Thu Oct 16 06:15:05 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 16 Oct 2008 02:15:05 -0400
Subject: [Biojava-dev] [Bug 2603] StringIndexOutOfBoundsException while parsing blastresult
In-Reply-To:
Message-ID: <200810160615.m9G6F5Tk014016@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2603

------- Comment #3 from tbanks at agr.gc.ca  2008-10-16 02:15 EST -------
Created an attachment (id=1007)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1007&action=view)
patch file 1 for bug 2603

From bugzilla-daemon at portal.open-bio.org  Thu Oct 16 06:15:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 16 Oct 2008 02:15:46 -0400
Subject: [Biojava-dev] [Bug 2603] StringIndexOutOfBoundsException while parsing blastresult
In-Reply-To:
Message-ID: <200810160615.m9G6FkaF014096@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2603

------- Comment #4 from tbanks at agr.gc.ca  2008-10-16 02:15 EST -------
Created an attachment (id=1008)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1008&action=view)
patch file 2 for bug 2603
From bugzilla-daemon at portal.open-bio.org  Thu Oct 16 06:18:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 16 Oct 2008 02:18:10 -0400
Subject: [Biojava-dev] [Bug 2603] StringIndexOutOfBoundsException while parsing blastresult
In-Reply-To:
Message-ID: <200810160618.m9G6IATb014290@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2603

------- Comment #5 from tbanks at agr.gc.ca  2008-10-16 02:18 EST -------
I've written up a fix for this bug. As Richard suspected, this fix also takes care of bug 2617 (I've tested both). I've attached patch files for the two affected files. If the patches don't apply, let me know and I'll email the files.

- Travis

From f.jossinet at ibmc.u-strasbg.fr  Thu Oct 16 08:50:54 2008
From: f.jossinet at ibmc.u-strasbg.fr (Fabrice Jossinet)
Date: Thu, 16 Oct 2008 10:50:54 +0200
Subject: [Biojava-dev] Proposition of participation to the BioJava project
In-Reply-To:
References:
Message-ID: <65EB20E6-6137-441B-AC13-26031D46BDFE@ibmc.u-strasbg.fr>

Dear Richard,

Thank you very much. I'm looking forward to this invitation.

All the best

Fabrice

From bugzilla-daemon at portal.open-bio.org  Thu Oct 16 09:39:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 16 Oct 2008 05:39:11 -0400
Subject: [Biojava-dev] [Bug 2603] StringIndexOutOfBoundsException while parsing blastresult
In-Reply-To:
Message-ID: <200810160939.m9G9dBGm028921@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2603

------- Comment #6 from holland at ebi.ac.uk  2008-10-16 05:39 EST -------
Thanks for the patches!
Could you email me the two complete files that you've modified? (It's easier for me to just copy and paste entire files.) I'll then commit them to SVN. From fbristow at gmail.com Fri Oct 17 18:58:08 2008 From: fbristow at gmail.com (Franklin Bristow) Date: Fri, 17 Oct 2008 13:58:08 -0500 Subject: [Biojava-dev] Writing Swissprot/Uniprot formatted files Message-ID: <50a7756d0810171158k51aa3ee4l5f7078321633ebc5@mail.gmail.com> Hello everyone, I've been doing some work with swissprot, and I need to make use of the file reading and writing facilities in biojava. I was using biojava 1.5, but I've recently moved to using biojava-live so that I can actually step through the code to see what's going on. I have successfully created an index of my swissprot database and I can read my sequences out of that indexed database. All of the appropriate information is loaded from the records in the file into the appropriate objects. I am quite happy with this. The problem that I am having has to do with writing swissprot records. When I started using biojava, the recommended way to do this was using SeqIOTools: SeqIOTools.writeSwissprot(byteStream, swissSequence); While this works (i.e., no exceptions are thrown), the record that is printed to the byteStream looks pretty ugly (it's littered with XX lines) and is not valid according to the current swissprot file spec (http://www.expasy.ch/sprot/userman.html). While this record is invalid, it does contain all of the information that was originally in the swissprot file. I would include that output here, but it's irrelevant.
SeqIOTools has since been deprecated in favour of this: RichSequence.IOTools.writeUniProt(byteStream, swissSequence, null); Once again, while this works (and this time the record is valid), the record that is printed contains almost none of the original information that is contained in the swissprot record. This is the output that I get when I call this method (the spacing may not look right because of fonts, but that is not the problem): ID Q4UVA7_null STANDARD; 273 AA. > AC Q4UVA7; > DT null, integrated into UniProtKB/?. > DT null, sequence version 0. > DT null, entry version 0. > DE null. > FT any 1 273 > FT any 153 160 > SQ SEQUENCE 273 AA; 30853 MW; 604FB6C6437A9D90 CRC64; > MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE > RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF > ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT > EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF > QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF > // > But what I am expecting to see looks like this (again, the spacing is the fault of the font, not the output): > ID Y1953_XANC8 Reviewed; 273 AA. > AC Q4UVA7; > DT 10-JAN-2006, integrated into UniProtKB/Swiss-Prot. > DT 05-JUL-2005, sequence version 1. > DT 06-FEB-2007, entry version 12. > DE UPF0085 protein XC_1953. > GN OrderedLocusNames=XC_1953; > OS Xanthomonas campestris pv. campestris (strain 8004). > OC Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales; > OC Xanthomonadaceae; Xanthomonas. > OX NCBI_TaxID=314565; > RN [1] > RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
> RX PubMed=15899963; DOI=10.1101/gr.3378705; > RA Qian W., Jia Y., Ren S.-X., He Y.-Q., Feng J.-X., Lu L.-F., Sun Q., > RA Ying G., Tang D.-J., Tang H., Wu W., Hao P., Wang L., Jiang B.-L., > RA Zeng S., Gu W.-Y., Lu G., Rong L., Tian Y., Yao Z., Fu G., Chen B., > RA Fang R., Qiang B., Chen Z., Zhao G.-P., Tang J.-L., He C.; > RT "Comparative and functional genomic analyses of the pathogenicity of > RT phytopathogen Xanthomonas campestris pv. campestris."; > RL Genome Res. 15:757-767(2005). > CC -!- SIMILARITY: Belongs to the UPF0085 family. > CC ------------------------------------------------------------ > ----------- > CC Copyrighted by the UniProt Consortium, see > http://www.uniprot.org/terms > CC Distributed under the Creative Commons Attribution-NoDerivs License > CC ------------------------------------------------------------ > ----------- > DR EMBL; CP000050; AAY49016.1; -; Genomic_DNA. > DR GenomeReviews; CP000050_GR; XC_1953. > DR KEGG; xcb:XC_1953; -. > DR GO; GO:0005524; F:ATP binding; IEA:HAMAP. > DR HAMAP; MF_01062; -; 1. > DR InterPro; IPR005177; DUF299. > DR Pfam; PF03618; DUF299; 1. > KW ATP-binding; Complete proteome; Nucleotide-binding. > FT CHAIN 1 273 UPF0085 protein XC_1953. > FT /FTId=PRO_0000196744. > FT NP_BIND 153 160 ATP (Potential). > SQ SEQUENCE 273 AA; 30853 MW; 604FB6C6437A9D90 CRC64; > MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE > RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF > ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT > EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF > QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF > // > Needless to say, there is a considerable loss of information. At first I wasn't sure if this was a problem with parsing the database that I had, so I inspected the object that was retrieved from the database. As I mentioned before, the parsing seems to be working fine. 
I get a SimpleSequence object that has all of the correct annotations and other information loaded into it. I then continued to step through the writeUniProt method in RichSequence.IOTools and found that this method first calls "enrich" on the SimpleSequence, which turns it into a SimpleRichSequence. There appears to be some loss of information at this point, specifically in the feature set, where the 'key name' is lost -- it just becomes 'any'. The problems appear when we get to the actual process of writing to the stream in UniprotFormat.writeSequence. All of the code for printing the information I'm expecting appears to be there. I think the problem is that in the process of "enrich"-ing the sequence, the data is still stored in the object, but it is no longer where the writer expects it to be. For example, when we get to writing the comments out: // comments - if any if (!rs.getComments().isEmpty()) { The List of comments IS empty, but there are comments in the SimpleRichSequence; they are stored in the notes data member. So, after this lengthy explanation of my problem, I am wondering whether I am simply not doing this correctly. Is there a better way to pass my information to the writeUniProt method -- should I be transforming my SimpleSequence objects into SimpleRichSequence objects manually? Am I just going about this entirely the wrong way? If I am going about this correctly and the functionality is simply not there or hasn't been implemented correctly, I would be more than happy to help out... I can supply patches, create bug reports, or anything else that is necessary. Any guidance in this matter would be greatly appreciated!
-- Franklin From holland at eaglegenomics.com Fri Oct 17 20:08:25 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 17 Oct 2008 21:08:25 +0100 Subject: [Biojava-dev] Writing Swissprot/Uniprot formatted files In-Reply-To: <50a7756d0810171158k51aa3ee4l5f7078321633ebc5@mail.gmail.com> References: <50a7756d0810171158k51aa3ee4l5f7078321633ebc5@mail.gmail.com> Message-ID: Hello. I'm not sure how you're getting your uniprot records out of your swissprot database, or what format your swissprot database is in. If it's BioSQL, then the way BioJava interacts with it has altered significantly with BioJavaX - previous versions basically stuffed everything in as comments, hence all the XX lines you got when writing it back out again. However, if it's not BioSQL and you've written something custom of your own, then I couldn't really comment! BioJavaX will attempt to convert the old sequence objects into rich sequence objects, but there's not much in common between the way uniprot data is stored in the old object model and the new one. Therefore the enrich method can't do a very good job - especially for stuff which the original parser stored as comments instead of properly distributing it across the object model. Data which the original parser stored in this comment format will mostly get ignored by the conversion process, because the conversion process has no idea where the record came from and therefore what to do with the comments inside it. Your best bet is to read your data out of your database directly as rich sequence objects or, if that's not possible, to do the conversion manually. cheers, Richard 2008/10/17 Franklin Bristow > Hello everyone, > I've been doing some work with swissprot, and I've been needing to make use > of the file reading and writing facilities in biojava. > > I was using biojava 1.5, but I've recently moved to using biojava-live so > that I can actually step through the code to see what's going on.
> > I have successfully created an index of my swissprot database and I can > read > my sequences out of that indexed database. All of the appropriate > information is loaded from the records in the file into the appropriate > objects. I am quite happy with this. > > The problem that I am having has to do with writing swissprot records. > > When I started using biojava, the recommended way to do this was using > SeqIOTools: > SeqIOTools.writeSwissprot(byteStream, swissSequence); > > While this works (ie: no exceptions are thrown), the record that is printed > to the byteStream looks pretty ugly (it's littered with XX lines) and is > not > valid as per the current swissprot file spec ( > http://www.expasy.ch/sprot/userman.html). While this record is invalid, > it > does contain all of the information that was originally in the swissprot > file. I would include what I get as an output here, but it's irrelevant. > > SeqIOTools became deprecated in favour of this: > RichSequence.IOTools.writeUniProt(byteStream, swissSequence, null); > > Once again, while this works (and this time the record is valid), the > record > that is printed contains almost none of the original information that is > contained in the swissprot record. This is the output that I get when I > call this method (the spacing is may not look right because of fonts, but > that is not the problem): > > ID Q4UVA7_null STANDARD; 273 AA. > > AC Q4UVA7; > > DT null, integrated into UniProtKB/?. > > DT null, sequence version 0. > > DT null, entry version 0. > > DE null. 
> > FT any 1 273 > > FT any 153 160 > > SQ SEQUENCE 273 AA; 30853 MW; 604FB6C6437A9D90 CRC64; > > MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE > > RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF > > ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT > > EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF > > QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF > > // > > > > But what I am expecting to see looks like this (again, the spacing is the > fault of the font, not the output): > > > ID Y1953_XANC8 Reviewed; 273 AA. > > AC Q4UVA7; > > DT 10-JAN-2006, integrated into UniProtKB/Swiss-Prot. > > DT 05-JUL-2005, sequence version 1. > > DT 06-FEB-2007, entry version 12. > > DE UPF0085 protein XC_1953. > > GN OrderedLocusNames=XC_1953; > > OS Xanthomonas campestris pv. campestris (strain 8004). > > OC Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales; > > OC Xanthomonadaceae; Xanthomonas. > > OX NCBI_TaxID=314565; > > RN [1] > > RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA]. > > RX PubMed=15899963; DOI=10.1101/gr.3378705; > > RA Qian W., Jia Y., Ren S.-X., He Y.-Q., Feng J.-X., Lu L.-F., Sun Q., > > RA Ying G., Tang D.-J., Tang H., Wu W., Hao P., Wang L., Jiang B.-L., > > RA Zeng S., Gu W.-Y., Lu G., Rong L., Tian Y., Yao Z., Fu G., Chen B., > > RA Fang R., Qiang B., Chen Z., Zhao G.-P., Tang J.-L., He C.; > > RT "Comparative and functional genomic analyses of the pathogenicity of > > RT phytopathogen Xanthomonas campestris pv. campestris."; > > RL Genome Res. 15:757-767(2005). > > CC -!- SIMILARITY: Belongs to the UPF0085 family. 
> > CC ------------------------------------------------------------ > > ----------- > > CC Copyrighted by the UniProt Consortium, see > > http://www.uniprot.org/terms > > CC Distributed under the Creative Commons Attribution-NoDerivs License > > CC ------------------------------------------------------------ > > ----------- > > DR EMBL; CP000050; AAY49016.1; -; Genomic_DNA. > > DR GenomeReviews; CP000050_GR; XC_1953. > > DR KEGG; xcb:XC_1953; -. > > DR GO; GO:0005524; F:ATP binding; IEA:HAMAP. > > DR HAMAP; MF_01062; -; 1. > > DR InterPro; IPR005177; DUF299. > > DR Pfam; PF03618; DUF299; 1. > > KW ATP-binding; Complete proteome; Nucleotide-binding. > > FT CHAIN 1 273 UPF0085 protein XC_1953. > > FT /FTId=PRO_0000196744. > > FT NP_BIND 153 160 ATP (Potential). > > SQ SEQUENCE 273 AA; 30853 MW; 604FB6C6437A9D90 CRC64; > > MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE > > RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF > > ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT > > EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF > > QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF > > // > > > > Needless to say, there is a considerable loss of information. > > At first I wasn't sure if this was a problem with parsing the database that > I had, so I inspected the object that was retrieved from the database. As > I > mentioned before, the parsing seems to be working fine. I get a > SimpleSequence object that has all of the correct annotations and other > information loaded into it. > > I then continued to step through the writeUniProt method in > RichSequence.IOTools and found that this method first calls "enrich" on > SimpleSequence which turns it into a SimpleRichSequence. There appears to > be some loss of information at this point, specifically in the feature set > where the 'key name' is lost -- it just becomes 'any'. 
> > It is when we get to the actual process of writing to the stream in > UniprotFormat.writeSequence that we have the problems. All of the code > appears to be there for printing the information out that I'm expecting. I > think the problem is that in the process of "enrich"-ing the sequence, the > data is still stored in the object, but it is no longer where it is > expected > to be. For example, when we get to writing the comments out: > // comments - if any > if (!rs.getComments().isEmpty()) { > > The List of comments IS empty, but there are comments in the > SimpleRichSequence, they are stored in the notes data member. > > So. After this lengthy explanation of my problem, I am wondering if I am > merely not doing this correctly. Is there a better way to pass my > information to the writeUniprot method -- should I be transforming my > SimpleSequence objects into a SimpleRichSequence manually? Am I just going > about this entirely the wrong way? > > If I am going about this correctly and the functionality to do this is > merely not there or hasn't been implemented correctly, I would be more than > happy to help out... I can supply patches, create bug reports, or anything > else that is necessary. > > Any guidance in this matter would be greatly appreciated! > > -- > Franklin > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From holland at eaglegenomics.com Mon Oct 20 00:18:29 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Mon, 20 Oct 2008 01:18:29 +0100 Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please! Message-ID: Hi all, I've just committed some new code to the biojava3 branch of the biojava-live subversion repository. 
It's the foundation of a brand-new set of alphabet and symbol classes, and an example of how to use them to represent DNA. You'll notice that the new code is very lightweight and allows for a lot more flexibility than the old code - for instance, the concept of Alphabet has changed radically. It also makes much more extensive use of the Collections API. I haven't got any test cases or usage examples yet, but give me a shout if you don't understand the code and I'll explain how it works. (Hint: SymbolFormat is there to convert Strings into SymbolList objects, and vice versa.) So, now we want some volunteers! We're starting from scratch here, so there's a lot of work to do. The whole of BioJava needs 'translating' into BJ3, whether by copying and pasting existing classes and modifying them to suit the new style, or by writing completely new ones to provide equivalent functionality. I'll post an example of how to do file parsing soon, probably starting with FASTA. In the meantime, a good place to start would be for people to design object models to represent their favourite data types (e.g. Genbank, or microarray data). Utility classes to manipulate those objects would be great too. The object models need to be normalised as much as possible - e.g. if your data has a lot of comments, and the order of those comments is important, then give your object model a collection of comment objects. The object model for each data type should be completely independent and use basic data types wherever possible (e.g. store sequences as strings, don't attempt to parse them into anything fancy like SymbolLists). The closer the object model is to the original data format, the better. There are going to be clever tricks when it comes to converting data between different object models (e.g. Genbank to INSDSeq), which I will explain later when I put the file parsing examples up. You'll notice that the biojava3 branch uses Maven instead of Ant.
This is because we want to make it as modular as possible, so if you want to write microarray stuff, create a new microarray sub-project (as per the dna example that's already there). This way, if someone only wants the microarray bit of BJ3, they only need to install the appropriate JAR file and can ignore the rest. (The 'core' module is for stuff that is so generic it could be used anywhere, or is used in every single other module.) If coding isn't your cup of tea, then we would very much welcome testers (particularly those who enjoy writing test cases!), documenters (particularly code commenters), translators (for internationalisation of the code), and of course all those who wish to contribute ideas and suggestions, no matter how off-the-wall they might be. In particular, if you'd like to take charge of an area of the development process, e.g. Documentation Chief or Protein Champion, then that would be much appreciated. I'm very much looking forward to working with everyone on this. Good luck, and happy coding! cheers, Richard PS. Please don't forget to attach the appropriate licence to your code. You can copy and paste it from the existing classes I just committed this evening. PPS. For those who are worried about backwards compatibility - this was discussed on the lists a while back and it was made clear that BJ3 is a clean break. However, the existing code will continue to be maintained and bugfixed for a couple of years, so you don't have to upgrade if you don't want to - it just won't have any new features developed for it. This is largely because it'll probably take just that long to write all the new BJ3 code. When we do decide to desupport the existing BJ code, plenty of notice will be given (i.e. years as opposed to months).
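A minimal sketch of the object-model guidance in the announcement above, assuming a plain Java class for a Genbank-like record: comments are kept as an ordered list of separate comment objects, and the sequence is stored as a plain String rather than parsed into a SymbolList. The names GenbankRecordSketch, Comment and addComment are hypothetical illustrations; they are not part of the code committed to the biojava3 branch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch of a normalised, format-specific record model.
 * None of these names come from the biojava3 branch.
 */
public class GenbankRecordSketch {

    /** A single comment, kept as its own object rather than merged into one string. */
    public static final class Comment {
        private final String text;

        public Comment(String text) {
            this.text = text;
        }

        public String getText() {
            return text;
        }
    }

    private final String accession;
    // The sequence stays a plain String; no parsing into SymbolList-like objects.
    private final String sequence;
    // A List keeps comments in the order they appeared in the source file.
    private final List<Comment> comments = new ArrayList<Comment>();

    public GenbankRecordSketch(String accession, String sequence) {
        this.accession = accession;
        this.sequence = sequence;
    }

    public String getAccession() {
        return accession;
    }

    public String getSequence() {
        return sequence;
    }

    /** Appends a comment after any existing ones, preserving source order. */
    public void addComment(Comment comment) {
        comments.add(comment);
    }

    /** Read-only view, so callers cannot reorder or drop comments. */
    public List<Comment> getComments() {
        return Collections.unmodifiableList(comments);
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        GenbankRecordSketch record = new GenbankRecordSketch("EXAMPLE01", "ATGCGTTAG");
        record.addComment(new Comment("first comment"));
        record.addComment(new Comment("second comment"));
        for (Comment c : record.getComments()) {
            System.out.println(c.getText());
        }
    }
}
```

Because each comment is a separate object in an ordered List, a writer for the same format could emit the comments back in their original order, which keeps the model close to the original data format as the announcement recommends.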
-- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From markjschreiber at gmail.com Mon Oct 20 04:13:01 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Mon, 20 Oct 2008 12:13:01 +0800 Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please! In-Reply-To: References: Message-ID: <93b45ca50810192113g4ef0484cm2154f97c3c440f3f@mail.gmail.com> Hi - Just a comment ... Does an alphabet need to be a Singleton in this new paradigm? If it does, then do you want to have an equals() method? Currently you could have: Alphabet a; Alphabet b; a.equals(b) //true; a == b //false Unless there is a strong reason why Alphabet needs to be a Singleton, I don't think it should be (Singletons make life hard when transporting between JVMs). You can get a similar kind of behaviour with caching, where it doesn't hurt if there is more than one instance of an equal alphabet, but when they pass through the cache they can get cleaned up (like the interning behaviour of Strings). Put it this way. If I have two copies of the DNA alphabet, will it matter (other than a bit of memory waste)? - Mark On Mon, Oct 20, 2008 at 8:18 AM, Richard Holland wrote: > Hi all, > > I've just committed some new code to the biojava3 branch of the biojava-live > subversion repository. It's the foundations of a brand new alphabet+symbol > set of classes, and an example of how to use them to represent DNA. You'll > notice that the new code is very lightweight and allows for a lot more > flexibility than the old code - for instance, the concept of Alphabet has > changed radically. It also makes much more extensive use of the Collections > API. > > I haven't got any test cases or usage examples yet but give me a shout if > you don't understand the code and I'll explain how it works. (Hint: > SymbolFormat is there to convert Strings into SymbolList objects, and vice > versa). > > So, now we want some volunteers!
We're starting from scratch here so there's > a lot of work to do. The whole of BioJava needs 'translating' into BJ3, > whether it be copy-and-paste existing classes and modify them to suit the > new style, or write completely new ones to provide equivalent functionality. > > > I'll post an example of how to do file parsing soon, probably starting with > FASTA. In the meantime, a good place to start would be for people to design > object models to represent their favourite data types (e.g. Genbank, or > microarray data). Utility classes to manipulate those objects would be great > too. > > The object models need to be normalised as much as possible - e.g. if your > data has a lot of comments, and the order of those comments is important, > then give your object model a collection of comment objects. The object > model for each data type should be completely independent and use basic data > types wherever possible (e.g. store sequences as strings, don't attempt to > parse them into anything fancy like SymbolLists). The closer the object > model is to the original data format, the better. There's going to be clever > tricks when it comes to converting data between different object models > (e.g. Genbank to INSDSeq), which I will explain later when I put the file > parsing examples up. > > You'll notice how the biojava3 branch uses Maven instead of Ant. This is > because we want to make it as modular as possible, so if you want to write > microarray stuff, create a new microarray sub-project (as per the dna > example that's already there). This way if someone only wants the microarray > bit of BJ3, they only need install the appropriate JAR file and can ignore > the rest. (The 'core' module is for stuff that is so generic it could be > used anywhere, or is used in every single other module.) 
> > If coding isn't your cup of tea, then we would very much welcome testers > (particularly those who enjoy writing test cases!), documenters > (particularly code commenters), translators (for internationalisation of the > code), and of course all those who wish to contribute ideas and suggestions > no matter how off-the-wall they might be. In particular if you'd like to > take charge of an area of the development process, e.g. Documentation Chief, > or Protein Champion, then that would be much appreciated. > > I'm very much looking forward to working with everyone on this. Good luck, > and happy coding! > > cheers, > Richard > > PS. Please don't forget to attach the appropriate licence to your code. You > can copy-and-paste it from the existing classes I just committed this > evening. > > PPS. For those who are worried about backwards compatibility - this was > discussed on the lists a while back and it was made clear that BJ3 is a > clean break. However, the existing code will continue to be maintained and > bugfixed for a couple of years so you don't have to upgrade if you don't > want to - it just won't have any new features developed for it. This is > largely because it'll probably take just that long to write all the new BJ3 > code. When we do decide to desupport the existing BJ code, plenty of notice > will be given (i.e. years as opposed to months). > > > -- > Richard Holland, BSc MBCS > Finance Director, Eagle Genomics Ltd > M: +44 7500 438846 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From holland at eaglegenomics.com Mon Oct 20 08:23:17 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Mon, 20 Oct 2008 09:23:17 +0100 Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please! 
In-Reply-To: <93b45ca50810192113g4ef0484cm2154f97c3c440f3f@mail.gmail.com> References: <93b45ca50810192113g4ef0484cm2154f97c3c440f3f@mail.gmail.com> Message-ID: Good point, and the answer is no, it doesn't really matter! So I will remove the singleton-ishness of Alphabet. 2008/10/20 Mark Schreiber > Hi - > > Just a comment ... > > Does an alphabet need to be a Singleton in this new paradigm? If it > does then do you want to have an equals() method? Currently you could > have: > > Alphabet a; Alphabet b; > > a.equals(b) //true; > a == b //false > > Unless there is a strong reason why Alphabet needs to be a Singleton I > don't think it should be (Singletons make life hard when transporting > between JVMs). You can get a similar kind of behaivor with caching > where it doesn't hurt if there is more than one instance of an equal > alphabet but when they pass through the cache they can get cleaned up > (like the interning behaivour of Strings). > > Put it this way. If I have two copies of the DNA alphabet will it > matter (other than a bit of memory waste)? > > - Mark > > On Mon, Oct 20, 2008 at 8:18 AM, Richard Holland > wrote: > > Hi all, > > > > I've just committed some new code to the biojava3 branch of the > biojava-live > > subversion repository. It's the foundations of a brand new > alphabet+symbol > > set of classes, and an example of how to use them to represent DNA. > You'll > > notice that the new code is very lightweight and allows for a lot more > > flexibility than the old code - for instance, the concept of Alphabet has > > changed radically. It also makes much more extensive use of the > Collections > > API. > > > > I haven't got any test cases or usage examples yet but give me a shout if > > you don't understand the code and I'll explain how it works. (Hint: > > SymbolFormat is there to convert Strings into SymbolList objects, and > vice > > versa). > > > > So, now we want some volunteers!
We're starting from scratch here so > there's > > a lot of work to do. The whole of BioJava needs 'translating' into BJ3, > > whether it be copy-and-paste existing classes and modify them to suit the > > new style, or write completely new ones to provide equivalent > functionality. > > > > > > I'll post an example of how to do file parsing soon, probably starting > with > > FASTA. In the meantime, a good place to start would be for people to > design > > object models to represent their favourite data types (e.g. Genbank, or > > microarray data). Utility classes to manipulate those objects would be > great > > too. > > > > The object models need to be normalised as much as possible - e.g. if > your > > data has a lot of comments, and the order of those comments is important, > > then give your object model a collection of comment objects. The object > > model for each data type should be completely independent and use basic > data > > types wherever possible (e.g. store sequences as strings, don't attempt > to > > parse them into anything fancy like SymbolLists). The closer the object > > model is to the original data format, the better. There's going to be > clever > > tricks when it comes to converting data between different object models > > (e.g. Genbank to INSDSeq), which I will explain later when I put the file > > parsing examples up. > > > > You'll notice how the biojava3 branch uses Maven instead of Ant. This is > > because we want to make it as modular as possible, so if you want to > write > > microarray stuff, create a new microarray sub-project (as per the dna > > example that's already there). This way if someone only wants the > microarray > > bit of BJ3, they only need install the appropriate JAR file and can > ignore > > the rest. (The 'core' module is for stuff that is so generic it could be > > used anywhere, or is used in every single other module.) 
> > > > If coding isn't your cup of tea, then we would very much welcome testers > > (particularly those who enjoy writing test cases!), documenters > > (particularly code commenters), translators (for internationalisation of > the > > code), and of course all those who wish to contribute ideas and > suggestions > > no matter how off-the-wall they might be. In particular if you'd like to > > take charge of an area of the development process, e.g. Documentation > Chief, > > or Protein Champion, then that would be much appreciated. > > > > I'm very much looking forward to working with everyone on this. Good > luck, > > and happy coding! > > > > cheers, > > Richard > > > > PS. Please don't forget to attach the appropriate licence to your code. > You > > can copy-and-paste it from the existing classes I just committed this > > evening. > > > > PPS. For those who are worried about backwards compatibility - this was > > discussed on the lists a while back and it was made clear that BJ3 is a > > clean break. However, the existing code will continue to be maintained > and > > bugfixed for a couple of years so you don't have to upgrade if you don't > > want to - it just won't have any new features developed for it. This is > > largely because it'll probably take just that long to write all the new > BJ3 > > code. When we do decide to desupport the existing BJ code, plenty of > notice > > will be given (i.e. years as opposed to months). 
> > > > > > -- > > Richard Holland, BSc MBCS > > Finance Director, Eagle Genomics Ltd > > M: +44 7500 438846 | E: holland at eaglegenomics.com > > http://www.eaglegenomics.com/ > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From fbristow at gmail.com Mon Oct 20 13:36:15 2008 From: fbristow at gmail.com (Franklin Bristow) Date: Mon, 20 Oct 2008 08:36:15 -0500 Subject: [Biojava-dev] Writing Swissprot/Uniprot formatted files In-Reply-To: References: <50a7756d0810171158k51aa3ee4l5f7078321633ebc5@mail.gmail.com> Message-ID: <50a7756d0810200636l4355f3cbj367b155e573e1612@mail.gmail.com> Hi Richard, I'm getting my records from an indexed flat file. I indexed the file using IndexTools.indexSwissprot(). I am then retrieving the records from the flat file "database" using the SequenceDBLite interface which is being provided to me using the Registry and SystemRegistry classes. 
The following is a simple example of what I am doing: First I index the flat file: > File[] files = new File[] { new File("/home/fbristow/db/uniprot_sprot.dat") > }; > try { > IndexTools.indexSwissprot("uniprot_sprot", new > File("/home/fbristow/db/index/uniprot_sprot"), files); > } catch (BioException bioE) { > bioE.printStackTrace(); > } catch (ParserException parseE) { > parseE.printStackTrace(); > } catch (IOException ioE) { > ioE.printStackTrace(); > } Then I get a handle on the indexed database by doing: > Registry registry = SystemRegistry.instance(); > setSwissDatabase(registry.getDatabase("swissprot")); > And I have a file in /etc that tells the registry how to find the indexes under the swissprot identifier, as per http://biojava.org/docs/api/org/biojava/directory/SystemRegistry.html Ultimately, this gives me a class that implements the SequenceDBLite interface, and when I query this interface for sequences it returns Sequence objects. I can't see anything that would give me a RichSequence, so I think I'll continue to get them in this manner, but I'll convert the Sequence objects into RichSequence objects myself. Thanks for your attention! On Fri, Oct 17, 2008 at 3:08 PM, Richard Holland wrote: > Hello. > > I'm not sure how you're getting your uniprot records out of your swissprot > database, or what format your swissprot database is in? If it's BioSQL, then > the way BioJava interacts with it has altered significantly with BioJavaX - > previous versions basically stuffed everything in as comments, hence all the > XX lines you got when writing it back out again. However if it's not BioSQL > and you've written something custom of your own, then I couldn't really > comment! > > BioJavaX will attempt to convert the old sequence objects into rich > sequence objects, but there's not much in common between the way uniprot > data is stored in the old object model and the new one.
Therefore the enrich > method can't do a very good job - especially for stuff which the original > parser stored as comments instead of properly distributing it across the > object model. Data which the original parser stored in this comment format > will mostly get ignored by the conversion process, because the conversion > process has no idea where the record came from and therefore what to do with > the comments inside it. > > Your best bet is to read your data out of your database directly as rich > sequence objects, or if not possible, then do the conversion manually. > > cheers, > Richard > > > 2008/10/17 Franklin Bristow > >> Hello everyone, >> I've been doing some work with swissprot, and I've been needing to make >> use >> of the file reading and writing facilities in biojava. >> >> I was using biojava 1.5, but I've recently moved to using biojava-live so >> that I can actually step through the code to see what's going on. >> >> I have successfully created an index of my swissprot database and I can >> read >> my sequences out of that indexed database. All of the appropriate >> information is loaded from the records in the file into the appropriate >> objects. I am quite happy with this. >> >> The problem that I am having has to do with writing swissprot records. >> >> When I started using biojava, the recommended way to do this was using >> SeqIOTools: >> SeqIOTools.writeSwissprot(byteStream, swissSequence); >> >> While this works (ie: no exceptions are thrown), the record that is >> printed >> to the byteStream looks pretty ugly (it's littered with XX lines) and is >> not >> valid as per the current swissprot file spec ( >> http://www.expasy.ch/sprot/userman.html). While this record is invalid, >> it >> does contain all of the information that was originally in the swissprot >> file. I would include what I get as an output here, but it's irrelevant. 
>> >> SeqIOTools became deprecated in favour of this: >> RichSequence.IOTools.writeUniProt(byteStream, swissSequence, null); >> >> Once again, while this works (and this time the record is valid), the >> record >> that is printed contains almost none of the original information that is >> contained in the swissprot record. This is the output that I get when I >> call this method (the spacing is may not look right because of fonts, but >> that is not the problem): >> >> ID Q4UVA7_null STANDARD; 273 AA. >> > AC Q4UVA7; >> > DT null, integrated into UniProtKB/?. >> > DT null, sequence version 0. >> > DT null, entry version 0. >> > DE null. >> > FT any 1 273 >> > FT any 153 160 >> > SQ SEQUENCE 273 AA; 30853 MW; 604FB6C6437A9D90 CRC64; >> > MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE >> > RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF >> > ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT >> > EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF >> > QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF >> > // >> > >> >> But what I am expecting to see looks like this (again, the spacing is the >> fault of the font, not the output): >> >> > ID Y1953_XANC8 Reviewed; 273 AA. >> > AC Q4UVA7; >> > DT 10-JAN-2006, integrated into UniProtKB/Swiss-Prot. >> > DT 05-JUL-2005, sequence version 1. >> > DT 06-FEB-2007, entry version 12. >> > DE UPF0085 protein XC_1953. >> > GN OrderedLocusNames=XC_1953; >> > OS Xanthomonas campestris pv. campestris (strain 8004). >> > OC Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales; >> > OC Xanthomonadaceae; Xanthomonas. >> > OX NCBI_TaxID=314565; >> > RN [1] >> > RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA]. 
>> > RX PubMed=15899963; DOI=10.1101/gr.3378705; >> > RA Qian W., Jia Y., Ren S.-X., He Y.-Q., Feng J.-X., Lu L.-F., Sun Q., >> > RA Ying G., Tang D.-J., Tang H., Wu W., Hao P., Wang L., Jiang B.-L., >> > RA Zeng S., Gu W.-Y., Lu G., Rong L., Tian Y., Yao Z., Fu G., Chen B., >> > RA Fang R., Qiang B., Chen Z., Zhao G.-P., Tang J.-L., He C.; >> > RT "Comparative and functional genomic analyses of the pathogenicity >> of >> > RT phytopathogen Xanthomonas campestris pv. campestris."; >> > RL Genome Res. 15:757-767(2005). >> > CC -!- SIMILARITY: Belongs to the UPF0085 family. >> > CC ------------------------------------------------------------ >> > ----------- >> > CC Copyrighted by the UniProt Consortium, see >> > http://www.uniprot.org/terms >> > CC Distributed under the Creative Commons Attribution-NoDerivs License >> > CC ------------------------------------------------------------ >> > ----------- >> > DR EMBL; CP000050; AAY49016.1; -; Genomic_DNA. >> > DR GenomeReviews; CP000050_GR; XC_1953. >> > DR KEGG; xcb:XC_1953; -. >> > DR GO; GO:0005524; F:ATP binding; IEA:HAMAP. >> > DR HAMAP; MF_01062; -; 1. >> > DR InterPro; IPR005177; DUF299. >> > DR Pfam; PF03618; DUF299; 1. >> > KW ATP-binding; Complete proteome; Nucleotide-binding. >> > FT CHAIN 1 273 UPF0085 protein XC_1953. >> > FT /FTId=PRO_0000196744. >> > FT NP_BIND 153 160 ATP (Potential). >> > SQ SEQUENCE 273 AA; 30853 MW; 604FB6C6437A9D90 CRC64; >> > MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE >> > RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF >> > ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT >> > EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF >> > QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF >> > // >> > >> >> Needless to say, there is a considerable loss of information. >> >> At first I wasn't sure if this was a problem with parsing the database >> that >> I had, so I inspected the object that was retrieved from the database. 
As >> I >> mentioned before, the parsing seems to be working fine. I get a >> SimpleSequence object that has all of the correct annotations and other >> information loaded into it. >> >> I then continued to step through the writeUniProt method in >> RichSequence.IOTools and found that this method first calls "enrich" on >> SimpleSequence which turns it into a SimpleRichSequence. There appears to >> be some loss of information at this point, specifically in the feature set >> where the 'key name' is lost -- it just becomes 'any'. >> >> It is when we get to the actual process of writing to the stream in >> UniprotFormat.writeSequence that we have the problems. All of the code >> appears to be there for printing the information out that I'm expecting. >> I >> think the problem is that in the process of "enrich"-ing the sequence, the >> data is still stored in the object, but it is no longer where it is >> expected >> to be. For example, when we get to writing the comments out: >> // comments - if any >> if (!rs.getComments().isEmpty()) { >> >> The List of comments IS empty, but there are comments in the >> SimpleRichSequence, they are stored in the notes data member. >> >> So. After this lengthy explanation of my problem, I am wondering if I am >> merely not doing this correctly. Is there a better way to pass my >> information to the writeUniprot method -- should I be transforming my >> SimpleSequence objects into a SimpleRichSequence manually? Am I just >> going >> about this entirely the wrong way? >> >> If I am going about this correctly and the functionality to do this is >> merely not there or hasn't been implemented correctly, I would be more >> than >> happy to help out... I can supply patches, create bug reports, or >> anything >> else that is necessary. >> >> Any guidance in this matter would be greatly appreciated! 
>> >> -- >> Franklin >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> > > > > -- > Richard Holland, BSc MBCS > Finance Director, Eagle Genomics Ltd > M: +44 7500 438846 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > -- Franklin From holland at eaglegenomics.com Mon Oct 20 13:51:36 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Mon, 20 Oct 2008 14:51:36 +0100 Subject: [Biojava-dev] BioJava3 contribution In-Reply-To: References: Message-ID: Excellent! Thanks for your offer of help! Yes, an advanced RNA module would be very helpful indeed. You should probably call it 'rna'. As long as everyone who intends to work on BJ3 declares their intentions here, as you just have, then basically it's first come, first served. I won't be doing any official supervision other than keeping an eye on committed code once in a while to make sure it all looks OK. So feel free to start coding straight away! All new modules should probably be set up as follows:

1. copy the existing dna module to something new, like 'rna' in this case;
2. remove all the hidden .svn directories from the copy;
3. update the pom.xml in the copy (do a search-and-replace on dna and change it to the new name, rna in this case), delete the existing source packages in src/main/java (org.biojava.dna) and create suitable new ones (org.biojava.rna in this case);
4. empty out the target/ folder, then svn add the new module;
5. svn:ignore the target/ directory in your new module;
6. include your new module in the list at the end of the pom.xml in the root directory of the biojava3 branch.

cheers, Richard 2008/10/20 Fabrice Jossinet > Dear Richard, > > I'm answering to your "official call", to propose you my help for the > development of the biojava3 code.
> With the modularity of Maven, I also would > like to proposes you my help for the development of a module that will use > the biojava3 code to manage more specialized RNA stuff (secondary and > tertiary structures, base-pairs classifications, modified nucleotides, RNA > alignments,....). > > What will be the next step for me? Will you make a selection? > > Best Regards > > Fabrice Jossinet > > -- > Dr. Fabrice Jossinet > Laboratoire de Bioinformatique, modelisation et simulation des acides > nucleiques > Universite Louis Pasteur > Institut de biologie moleculaire et cellulaire du CNRS > UPR9002, Architecture et Reactivite de l'ARN > 15 rue Rene Descartes > F-67084 Strasbourg Cedex > France > > Tel + 33 (0) 3 88 417053 > FAX + 33 (0) 3 88 60 22 18 > > f.jossinet at ibmc.u-strasbg.fr > fjossinet at gmail.com > http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html > http://fjossinet.u-strasbg.fr/ > > > > > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From holland at eaglegenomics.com Mon Oct 20 14:17:34 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Mon, 20 Oct 2008 15:17:34 +0100 Subject: [Biojava-dev] Writing Swissprot/Uniprot formatted files In-Reply-To: <50a7756d0810200636l4355f3cbj367b155e573e1612@mail.gmail.com> References: <50a7756d0810171158k51aa3ee4l5f7078321633ebc5@mail.gmail.com> <50a7756d0810200636l4355f3cbj367b155e573e1612@mail.gmail.com> Message-ID: Wow, I didn't know anyone was actually using the registry thing. I certainly never have! That's probably why it was left out of the whole update to RichSequences. There will probably be equivalent functionality in BioJava3 at some point, but I doubt anyone will backport the RichSequence updates to the existing registry setup (unless there are any volunteers!). Good luck with the conversion process.
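Franklin's plan is to do the Sequence-to-RichSequence conversion himself, because the automatic enrich step leaves data where the UniProt writer never looks (comments end up in notes, feature keys become `any`). A minimal sketch of that mapping step in plain Java follows, using data from the Q4UVA7 record in this thread. `OldRecord`, `RichRecord` and `convert` are hypothetical stand-ins, not BioJava classes, and modelling the old annotation as a map keyed by flat-file line tag (`DE`, `CC`, ...) is an assumption; with real BioJava objects each branch would instead populate the corresponding RichSequence field.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a manual "enrich" step. OldRecord and RichRecord are
 * illustrative stand-ins, not BioJava classes; the tag-keyed annotation map is
 * an assumed model of how the old parser stores a record.
 */
public class ManualEnrich {

    /** Old-style record: annotation values keyed by flat-file line tag. */
    static final class OldRecord {
        final String accession;
        final Map<String, List<String>> annotation = new LinkedHashMap<>();
        final List<String[]> features = new ArrayList<>(); // {type, start, end}

        OldRecord(String accession) {
            this.accession = accession;
        }

        void annotate(String tag, String value) {
            annotation.computeIfAbsent(tag, k -> new ArrayList<>()).add(value);
        }
    }

    /** Rich-style record: one dedicated field per kind of data a writer reads. */
    static final class RichRecord {
        String accession;
        String description;
        final List<String> comments = new ArrayList<>();
        final List<String> featureTypes = new ArrayList<>();
        final Map<String, List<String>> notes = new LinkedHashMap<>(); // unmapped tags
    }

    /** Moves each known tag into its dedicated field; unknown tags go to notes. */
    static RichRecord convert(OldRecord old) {
        RichRecord rich = new RichRecord();
        rich.accession = old.accession;
        for (Map.Entry<String, List<String>> e : old.annotation.entrySet()) {
            switch (e.getKey()) {
                case "DE":
                    rich.description = String.join(" ", e.getValue());
                    break;
                case "CC":
                    rich.comments.addAll(e.getValue()); // where a writer checks for comments
                    break;
                default:
                    rich.notes.put(e.getKey(), new ArrayList<>(e.getValue()));
            }
        }
        for (String[] feature : old.features) {
            rich.featureTypes.add(feature[0]); // keep the real key (CHAIN, NP_BIND), not "any"
        }
        return rich;
    }

    public static void main(String[] args) {
        OldRecord old = new OldRecord("Q4UVA7");
        old.annotate("DE", "UPF0085 protein XC_1953.");
        old.annotate("CC", "-!- SIMILARITY: Belongs to the UPF0085 family.");
        old.annotate("KW", "ATP-binding");
        old.features.add(new String[] {"CHAIN", "1", "273"});
        old.features.add(new String[] {"NP_BIND", "153", "160"});

        RichRecord rich = convert(old);
        System.out.println(rich.description);    // UPF0085 protein XC_1953.
        System.out.println(rich.comments);       // [-!- SIMILARITY: Belongs to the UPF0085 family.]
        System.out.println(rich.featureTypes);   // [CHAIN, NP_BIND]
        System.out.println(rich.notes.keySet()); // [KW]
    }
}
```

Anything that lands in `notes` is a tag the mapping does not handle yet, so printing that map is a quick way to see which parts of a record still need a dedicated branch.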
cheers, Richard 2008/10/20 Franklin Bristow > Hi Richard, > I'm getting my records from an indexed flat file. I indexed the file using > IndexTools.indexSwissprot(). I am then retrieving the records from the flat > file "database" using the SequenceDBLite interface which is being provided > to me using the Registry and SystemRegistry classes. The following a simple > example of what I am doing: > > First I index the flat file: > >> File[] files = new File[] { new >> File("/home/fbristow/db/uniprot_sprot.dat") }; >> try { >> IndexTools.indexSwissprot("uniprot_sprot", new >> File("/home/fbristow/db/index/uniprot_sprot"), files); >> } catch (BioException bioE) { >> bioE.printStackTrace(); >> } catch (ParserException parseE) { >> parseE.printStackTrace(); >> } catch (IOException ioE) { >> ioE.printStackTrace(); >> } > > > Then I get a handle on that file by doing: > >> Registry registry = SystemRegistry.instance(); >> setSwissDatabase(registry.getDatabase("swissprot")) >> > > And I have a file in /etc that tells the registry how to find the indexes > with the swissprot identifier as per > http://biojava.org/docs/api/org/biojava/directory/SystemRegistry.html > > Ultimately, this gives me a class that implements the interface > SequenceDBLite, and when I query this interface for sequences it returns to > me Sequence objects. I can't seem to see anything that would give me a > RichSequence, so I think that I'll continue to get them in this manner, but > I'll convert the Sequence objects into RichSequence objects myself. > > Thanks for your attention! > > > On Fri, Oct 17, 2008 at 3:08 PM, Richard Holland < > holland at eaglegenomics.com> wrote: > >> Hello. >> >> I'm not sure how you're getting your uniprot records out of your swissprot >> database, or what format your swissprot database is in? 
If it's BioSQL, then >> the way BioJava interacts with it has altered significantly with BioJavaX - >> previous versions basically stuffed everything in as comments, hence all the >> XX lines you got when writing it back out again. However if it's not BioSQL >> and you've written something custom of your own, then I couldn't really >> comment! >> >> BioJavaX will attempt to convert the old sequence objects into rich >> sequence objects, but there's not much in common between the way uniprot >> data is stored in the old object model and the new one. Therefore the enrich >> method can't do a very good job - especially for stuff which the original >> parser stored as comments instead of properly distributing it across the >> object model. Data which the original parser stored in this comment format >> will mostly get ignored by the conversion process, because the conversion >> process has no idea where the record came from and therefore what to do with >> the comments inside it. >> >> Your best bet is to read your data out of your database directly as rich >> sequence objects, or if not possible, then do the conversion manually. >> >> cheers, >> Richard >> >> >> 2008/10/17 Franklin Bristow >> >>> Hello everyone, >>> I've been doing some work with swissprot, and I've been needing to make >>> use >>> of the file reading and writing facilities in biojava. >>> >>> I was using biojava 1.5, but I've recently moved to using biojava-live so >>> that I can actually step through the code to see what's going on. >>> >>> I have successfully created an index of my swissprot database and I can >>> read >>> my sequences out of that indexed database. All of the appropriate >>> information is loaded from the records in the file into the appropriate >>> objects. I am quite happy with this. >>> >>> The problem that I am having has to do with writing swissprot records. 
>>> >>> When I started using biojava, the recommended way to do this was using >>> SeqIOTools: >>> SeqIOTools.writeSwissprot(byteStream, swissSequence); >>> >>> While this works (ie: no exceptions are thrown), the record that is >>> printed >>> to the byteStream looks pretty ugly (it's littered with XX lines) and is >>> not >>> valid as per the current swissprot file spec ( >>> http://www.expasy.ch/sprot/userman.html). While this record is invalid, >>> it >>> does contain all of the information that was originally in the swissprot >>> file. I would include what I get as an output here, but it's irrelevant. >>> >>> SeqIOTools became deprecated in favour of this: >>> RichSequence.IOTools.writeUniProt(byteStream, swissSequence, null); >>> >>> Once again, while this works (and this time the record is valid), the >>> record >>> that is printed contains almost none of the original information that is >>> contained in the swissprot record. This is the output that I get when I >>> call this method (the spacing is may not look right because of fonts, but >>> that is not the problem): >>> >>> ID Q4UVA7_null STANDARD; 273 AA. >>> > AC Q4UVA7; >>> > DT null, integrated into UniProtKB/?. >>> > DT null, sequence version 0. >>> > DT null, entry version 0. >>> > DE null. >>> > FT any 1 273 >>> > FT any 153 160 >>> > SQ SEQUENCE 273 AA; 30853 MW; 604FB6C6437A9D90 CRC64; >>> > MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE >>> > RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF >>> > ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT >>> > EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF >>> > QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF >>> > // >>> > >>> >>> But what I am expecting to see looks like this (again, the spacing is the >>> fault of the font, not the output): >>> >>> > ID Y1953_XANC8 Reviewed; 273 AA. >>> > AC Q4UVA7; >>> > DT 10-JAN-2006, integrated into UniProtKB/Swiss-Prot. 
>>> > DT 05-JUL-2005, sequence version 1. >>> > DT 06-FEB-2007, entry version 12. >>> > DE UPF0085 protein XC_1953. >>> > GN OrderedLocusNames=XC_1953; >>> > OS Xanthomonas campestris pv. campestris (strain 8004). >>> > OC Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales; >>> > OC Xanthomonadaceae; Xanthomonas. >>> > OX NCBI_TaxID=314565; >>> > RN [1] >>> > RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA]. >>> > RX PubMed=15899963; DOI=10.1101/gr.3378705; >>> > RA Qian W., Jia Y., Ren S.-X., He Y.-Q., Feng J.-X., Lu L.-F., Sun >>> Q., >>> > RA Ying G., Tang D.-J., Tang H., Wu W., Hao P., Wang L., Jiang B.-L., >>> > RA Zeng S., Gu W.-Y., Lu G., Rong L., Tian Y., Yao Z., Fu G., Chen >>> B., >>> > RA Fang R., Qiang B., Chen Z., Zhao G.-P., Tang J.-L., He C.; >>> > RT "Comparative and functional genomic analyses of the pathogenicity >>> of >>> > RT phytopathogen Xanthomonas campestris pv. campestris."; >>> > RL Genome Res. 15:757-767(2005). >>> > CC -!- SIMILARITY: Belongs to the UPF0085 family. >>> > CC ------------------------------------------------------------ >>> > ----------- >>> > CC Copyrighted by the UniProt Consortium, see >>> > http://www.uniprot.org/terms >>> > CC Distributed under the Creative Commons Attribution-NoDerivs >>> License >>> > CC ------------------------------------------------------------ >>> > ----------- >>> > DR EMBL; CP000050; AAY49016.1; -; Genomic_DNA. >>> > DR GenomeReviews; CP000050_GR; XC_1953. >>> > DR KEGG; xcb:XC_1953; -. >>> > DR GO; GO:0005524; F:ATP binding; IEA:HAMAP. >>> > DR HAMAP; MF_01062; -; 1. >>> > DR InterPro; IPR005177; DUF299. >>> > DR Pfam; PF03618; DUF299; 1. >>> > KW ATP-binding; Complete proteome; Nucleotide-binding. >>> > FT CHAIN 1 273 UPF0085 protein XC_1953. >>> > FT /FTId=PRO_0000196744. >>> > FT NP_BIND 153 160 ATP (Potential). 
>>> > SQ SEQUENCE 273 AA; 30853 MW; 604FB6C6437A9D90 CRC64; >>> > MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE >>> > RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF >>> > ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT >>> > EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF >>> > QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF >>> > // >>> > >>> >>> Needless to say, there is a considerable loss of information. >>> >>> At first I wasn't sure if this was a problem with parsing the database >>> that >>> I had, so I inspected the object that was retrieved from the database. >>> As I >>> mentioned before, the parsing seems to be working fine. I get a >>> SimpleSequence object that has all of the correct annotations and other >>> information loaded into it. >>> >>> I then continued to step through the writeUniProt method in >>> RichSequence.IOTools and found that this method first calls "enrich" on >>> SimpleSequence which turns it into a SimpleRichSequence. There appears >>> to >>> be some loss of information at this point, specifically in the feature >>> set >>> where the 'key name' is lost -- it just becomes 'any'. >>> >>> It is when we get to the actual process of writing to the stream in >>> UniprotFormat.writeSequence that we have the problems. All of the code >>> appears to be there for printing the information out that I'm expecting. >>> I >>> think the problem is that in the process of "enrich"-ing the sequence, >>> the >>> data is still stored in the object, but it is no longer where it is >>> expected >>> to be. For example, when we get to writing the comments out: >>> // comments - if any >>> if (!rs.getComments().isEmpty()) { >>> >>> The List of comments IS empty, but there are comments in the >>> SimpleRichSequence, they are stored in the notes data member. >>> >>> So. After this lengthy explanation of my problem, I am wondering if I am >>> merely not doing this correctly. 
Is there a better way to pass my >>> information to the writeUniprot method -- should I be transforming my >>> SimpleSequence objects into a SimpleRichSequence manually? Am I just >>> going >>> about this entirely the wrong way? >>> >>> If I am going about this correctly and the functionality to do this is >>> merely not there or hasn't been implemented correctly, I would be more >>> than >>> happy to help out... I can supply patches, create bug reports, or >>> anything >>> else that is necessary. >>> >>> Any guidance in this matter would be greatly appreciated! >>> >>> -- >>> Franklin >>> _______________________________________________ >>> biojava-dev mailing list >>> biojava-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >>> >> >> >> >> -- >> Richard Holland, BSc MBCS >> Finance Director, Eagle Genomics Ltd >> M: +44 7500 438846 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> > > > > -- > Franklin > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From f.jossinet at ibmc.u-strasbg.fr Mon Oct 20 13:04:29 2008 From: f.jossinet at ibmc.u-strasbg.fr (Fabrice Jossinet) Date: Mon, 20 Oct 2008 15:04:29 +0200 Subject: [Biojava-dev] BioJava3 contribution Message-ID: Dear Richard, I'm answering to your "official call", to propose you my help for the development of the biojava3 code. With the modularity of Maven, I also would like to proposes you my help for the development of a module that will use the biojava3 code to manage more specialized RNA stuff (secondary and tertiary structures, base-pairs classifications, modified nucleotides, RNA alignments,....). What will be the next step for me? Will you make a selection? Best Regards Fabrice Jossinet -- Dr. 
Fabrice Jossinet Laboratoire de Bioinformatique, modelisation et simulation des acides nucleiques Universite Louis Pasteur Institut de biologie moleculaire et cellulaire du CNRS UPR9002, Architecture et Reactivite de l'ARN 15 rue Rene Descartes F-67084 Strasbourg Cedex France Tel + 33 (0) 3 88 417053 FAX + 33 (0) 3 88 60 22 18 f.jossinet at ibmc.u-strasbg.fr fjossinet at gmail.com http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html http://fjossinet.u-strasbg.fr/ From andreas at sdsc.edu Mon Oct 20 19:18:48 2008 From: andreas at sdsc.edu (Andreas Prlic) Date: Mon, 20 Oct 2008 12:18:48 -0700 Subject: [Biojava-dev] BioJava3 contribution In-Reply-To: References: Message-ID: <59a41c430810201218n194660e2udb17be18f8029779@mail.gmail.com> Hi Fabrice, Regarding the tertiary structure representation, we should work together. There is a set of tools already available in the current biojava 1.7, which I was intending to maintain and migrate to biojava v3. Let me know if you have specific RNA-related requests... Andreas On Mon, Oct 20, 2008 at 6:51 AM, Richard Holland wrote: > Excellent! Thanks for your offer of help! > > Yes, an advanced RNA module would be very helpful indeed. You should > probably call it 'rna'. > > As long as everyone who intends to work on BJ3 declares their intentions > here, as you just have, then basically it's first come first served. I won't > be doing any official supervision other than keeping an eye on committed > code once in a while to make sure it all looks OK. So feel free to start > coding straight away! > > All new modules should probably start by: > > 1. copying the existing dna module to something new, like 'rna' in this > case. > 2. remove all the hidden .svn directories from the copy, > 3. update the pom.xml in the copy (do a search-and-replace on dna and change > to the new name, rna in this case), delete the existing source packages in > src/main/java (org.biojava.dna) and create suitable new ones > (org.biojava.rna in this case). > 4.
empty out the target/ folder then svn add the new module > 5. svn:ignore the target/ directory in your new module, > 6. include your new module in the list at the end of the pom.xml in the root > directory of the biojava3 branch. > > cheers, > Richard > > > > 2008/10/20 Fabrice Jossinet > >> Dear Richard, >> >> I'm answering to your "official call", to propose you my help for the >> development of the biojava3 code. With the modularity of Maven, I also would >> like to proposes you my help for the development of a module that will use >> the biojava3 code to manage more specialized RNA stuff (secondary and >> tertiary structures, base-pairs classifications, modified nucleotides, RNA >> alignments,....). >> >> What will be the next step for me? Will you make a selection? >> >> Best Regards >> >> Fabrice Jossinet >> >> -- >> Dr. Fabrice Jossinet >> Laboratoire de Bioinformatique, modelisation et simulation des acides >> nucleiques >> Universite Louis Pasteur >> Institut de biologie moleculaire et cellulaire du CNRS >> UPR9002, Architecture et Reactivite de l'ARN >> 15 rue Rene Descartes >> F-67084 Strasbourg Cedex >> France >> >> Tel + 33 (0) 3 88 417053 >> FAX + 33 (0) 3 88 60 22 18 >> >> f.jossinet at ibmc.u-strasbg.fr >> fjossinet at gmail.com >> http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html >> http://fjossinet.u-strasbg.fr/ >> >> >> >> >> > > > -- > Richard Holland, BSc MBCS > Finance Director, Eagle Genomics Ltd > M: +44 7500 438846 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From fjossinet at orange.fr Mon Oct 20 20:40:26 2008 From: fjossinet at orange.fr (Fabrice Jossinet) Date: Mon, 20 Oct 2008 22:40:26 +0200 Subject: [Biojava-dev] BioJava3 contribution In-Reply-To: <59a41c430810201218n194660e2udb17be18f8029779@mail.gmail.com> References: 
<59a41c430810201218n194660e2udb17be18f8029779@mail.gmail.com> Message-ID: <086C2EC4-C9AD-4C00-B348-F7D781C0F3EC@orange.fr> Hi Andreas, yes, of course, I would really like to work with you (I like your work with SPICE). I wanted to contact you about this before starting. Concerning the tertiary structure representation, I need to annotate an RNA tertiary structure with base-pair families (as described in http://www.ncbi.nlm.nih.gov/pubmed/12177293 or in http://prion.bchs.uh.edu/bp_type/) and structural motifs (like those listed in the SCOR database http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=308814). The idea is to attach these features to a 3D structure in the same way that features are attached to a sequence (1D). What do you think? Fabrice On 20 Oct 08, at 21:18, Andreas Prlic wrote: > Hi Fabrice, > > Regarding the tertiaty structure representation we should work > together. There is a seet of tools available already in the current > biojava 1.7 which I was intending to maintain and migrate to biojava v > 3. Let me know if you have specific RNA related requests... > > Andreas > > On Mon, Oct 20, 2008 at 6:51 AM, Richard Holland > wrote: >> Excellent! Thanks for your offer of help! >> >> Yes, an advanced RNA module would be very helpful indeed. You should >> probably call it 'rna'. >> >> As long as everyone who intends to work on BJ3 declares their >> intentions >> here, as you just have, then basically it's first come first >> served. I won't >> be doing any official supervision other than keeping an eye on >> committed >> code once in a while to make sure it all looks OK. So feel free to >> start >> coding straight away! >> >> All new modules should probably start by: >> >> 1. copying the existing dna module to something new, like 'rna' in >> this >> case. >> 2. remove all the hidden .svn directories from the copy, >> 3.
update the pom.xml in the copy (do a search-and-replace on dna >> and change >> to the new name, rna in this case), delete the existing source >> packages in >> src/main/java (org.biojava.dna) and create suitable new ones >> (org.biojava.rna in this case). >> 4. empty out the target/ folder then svn add the new module >> 5. svn:ignore the target/ directory in your new module, >> 6. include your new module in the list at the end of the pom.xml in >> the root >> directory of the biojava3 branch. >> >> cheers, >> Richard >> >> >> >> 2008/10/20 Fabrice Jossinet >> >>> Dear Richard, >>> >>> I'm answering to your "official call", to propose you my help for >>> the >>> development of the biojava3 code. With the modularity of Maven, I >>> also would >>> like to proposes you my help for the development of a module that >>> will use >>> the biojava3 code to manage more specialized RNA stuff (secondary >>> and >>> tertiary structures, base-pairs classifications, modified >>> nucleotides, RNA >>> alignments,....). >>> >>> What will be the next step for me? Will you make a selection? >>> >>> Best Regards >>> >>> Fabrice Jossinet >>> >>> -- >>> Dr. 
Fabrice Jossinet >>> Laboratoire de Bioinformatique, modelisation et simulation des >>> acides >>> nucleiques >>> Universite Louis Pasteur >>> Institut de biologie moleculaire et cellulaire du CNRS >>> UPR9002, Architecture et Reactivite de l'ARN >>> 15 rue Rene Descartes >>> F-67084 Strasbourg Cedex >>> France >>> >>> Tel + 33 (0) 3 88 417053 >>> FAX + 33 (0) 3 88 60 22 18 >>> >>> f.jossinet at ibmc.u-strasbg.fr >>> fjossinet at gmail.com >>> http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html >>> http://fjossinet.u-strasbg.fr/ >>> >>> >>> >>> >>> >> >> >> -- >> Richard Holland, BSc MBCS >> Finance Director, Eagle Genomics Ltd >> M: +44 7500 438846 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> -- Dr. Fabrice Jossinet Laboratoire de Bioinformatique, modelisation et simulation des acides nucleiques Universite Louis Pasteur Institut de biologie moleculaire et cellulaire du CNRS UPR9002, Architecture et Reactivite de l'ARN 15 rue Rene Descartes F-67084 Strasbourg Cedex France Tel + 33 (0) 3 88 417053 FAX + 33 (0) 3 88 60 22 18 f.jossinet at ibmc.u-strasbg.fr fjossinet at gmail.com http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html http://fjossinet.u-strasbg.fr/ From markjschreiber at gmail.com Tue Oct 21 02:54:27 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Tue, 21 Oct 2008 10:54:27 +0800 Subject: [Biojava-dev] Biojava / BioSQL entity beans Message-ID: <93b45ca50810201954k44ab0f65xb94a0214d8eb4e13@mail.gmail.com> Hi - Richard has kindly uploaded some JPA Entity beans that map to the BioSQL database schema as a BioSQL module for BJ3. These entity beans were generated as part of the Tokyo webservices workshop. As entities they are useful as POJOs as well as for data transfer via JPA and JAXB, and they can be used in EJB containers or a plain old JVM.
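Mark describes the entities as plain POJOs that can travel through JPA, JAXB or binary serialization. A minimal sketch of such a bean follows, assuming a hypothetical `BiodatabaseEntity` mapped to BioSQL's `biodatabase` table (field names assumed: id, name, authority, description). It has a parameterless public constructor, getters/setters, and `equals`/`hashCode` based only on the id field. The JPA annotations (`@Entity`, `@Id`) are left out so the sketch compiles against the JDK alone; it is an illustration, not the code Richard uploaded.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Objects;

/**
 * Hypothetical plain-Java entity bean. In a real JPA module the class would
 * carry @Entity and the id field @Id; both are omitted to stay JDK-only.
 * Field names are assumed from the BioSQL biodatabase table.
 */
public class BiodatabaseEntity implements Serializable {
    private static final long serialVersionUID = 1L;

    private Integer biodatabaseId; // would be the @Id field
    private String name;
    private String authority;
    private String description;

    public BiodatabaseEntity() { } // parameterless constructor expected by JPA and JAXB

    public Integer getBiodatabaseId() { return biodatabaseId; }
    public void setBiodatabaseId(Integer id) { this.biodatabaseId = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getAuthority() { return authority; }
    public void setAuthority(String authority) { this.authority = authority; }
    public String getDescription() { return description; }
    public void setDescription(String description) { this.description = description; }

    // Identity comes from the id field only, so two beans for the same row are equal.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof BiodatabaseEntity)) return false;
        return Objects.equals(biodatabaseId, ((BiodatabaseEntity) o).biodatabaseId);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(biodatabaseId);
    }

    /** Round-trips a bean through Java binary serialization, as an RPC layer might. */
    static BiodatabaseEntity roundTrip(BiodatabaseEntity in)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(in);
        }
        try (ObjectInputStream back = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (BiodatabaseEntity) back.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        BiodatabaseEntity db = new BiodatabaseEntity();
        db.setBiodatabaseId(1);
        db.setName("swissprot");
        BiodatabaseEntity copy = roundTrip(db);
        System.out.println(copy.getName() + " " + copy.equals(db)); // swissprot true
    }
}
```

One consequence of id-only equality: two new beans whose id is still `null` compare equal until the database assigns keys, which is the kind of gotcha that makes a consistent `equals` definition matter once objects are persisted.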
They have no biological smarts, and the intention was/is that these will be provided by wrapping them in Bio-aware (and more thread-safe) wrappers that implement interfaces from other BJ3 modules. In essence it is a persistence layer. The following is copied verbatim from the package-info.java and gives you some idea of how I intend the package to be used (obviously some of this is still to come). There is also some discussion of some of the gotchas that might trip you up when playing with object-relational persistence. BTW, the naming convention is to call something FooEntity. Where BioSQL requires a compound primary key, this is implemented as an Embeddable object called FooEntityPK, which is the key for FooEntity. The other thing you may see is FooEntityUK, which is the same concept but represents the cases where BioSQL tables don't have a primary key (even a compound one), but implicitly they do because all the fields have the SQL unique restriction. In these cases JPA still requires an Embeddable key to track updates. As far as Java is concerned they are the same as a FooEntityPK, but I used a different name to make the distinction. The annotations provide mapping to tables from a Derby database. This is the reference Java in-memory DB, which can run from any JVM and is also found in Glassfish. The mappings will likely also work with MySQL. For Oracle (and possibly others) you would need to override the @GeneratedValue strategy for generating primary keys. I believe this can be done with external XML config files. You may also wish to override the default eager loading and cascade annotations depending on your JPA persistence method and preferences. This has been lightly tested using Glassfish, Derby and Toplink Essentials and is a work in progress, but seems to work OK. Best regards, - Mark /** * The package contains Entity representations of BioJava classes.
* The purpose of these entities is to allow simple serialization of BioJava data * using binary serialization for protocols that require this (e.g. RPC between * Java application servers) as well as persistence mechanisms that require bean- * like objects such as the Java Persistence API (JPA) or the * Java Architecture for XML Binding (JAXB). For this reason all objects in this package * should provide a parameterless public constructor and public get/set methods * for relevant fields. *
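The bean requirements above (a parameterless public constructor, public getters/setters, and serializability for binary transport) can be sketched in plain Java. The class names follow the FooEntity/FooEntityPK naming convention described earlier in this message, but the fields (`bioentryId`, `rank`, `value`) are hypothetical, and the JPA annotations (`@Entity`, `@Embeddable`, `@EmbeddedId`) are left out so the sketch compiles without `javax.persistence` on the classpath. It is an illustration of the shape, not the code committed to the BioSQL module.

```java
import java.io.Serializable;
import java.util.Objects;

/**
 * Hypothetical composite key following the FooEntityPK naming convention.
 * In the real package this would be annotated @Embeddable.
 */
class FooEntityPK implements Serializable {
    private static final long serialVersionUID = 1L;

    private Integer bioentryId; // hypothetical key column
    private Integer rank;       // hypothetical key column

    /** Parameterless public constructor, required by JPA and JAXB. */
    public FooEntityPK() { }

    public FooEntityPK(Integer bioentryId, Integer rank) {
        this.bioentryId = bioentryId;
        this.rank = rank;
    }

    public Integer getBioentryId() { return this.bioentryId; }
    public void setBioentryId(Integer bioentryId) { this.bioentryId = bioentryId; }
    public Integer getRank() { return this.rank; }
    public void setRank(Integer rank) { this.rank = rank; }

    /** A key class needs value equality over all of its key fields. */
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FooEntityPK)) return false;
        FooEntityPK other = (FooEntityPK) o;
        return Objects.equals(this.bioentryId, other.getBioentryId())
                && Objects.equals(this.rank, other.getRank());
    }

    @Override
    public int hashCode() {
        return Objects.hash(this.bioentryId, this.rank);
    }
}

/**
 * Hypothetical entity bean keyed by FooEntityPK.
 * In the real package this would carry @Entity, with @EmbeddedId on the id field.
 */
class FooEntity implements Serializable {
    private static final long serialVersionUID = 1L;

    private FooEntityPK id;
    private String value; // hypothetical payload column

    public FooEntity() { }

    public FooEntityPK getId() { return this.id; }
    public void setId(FooEntityPK id) { this.id = id; }
    public String getValue() { return this.value; }
    public void setValue(String value) { this.value = value; }
}
```

Because every field is reachable through a public getter/setter pair and the no-arg constructor exists, frameworks such as JPA or JAXB can instantiate and populate these beans reflectively; that same openness is why the next paragraph warns against using them directly.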

* Given the public nature of the constructors and the setters in these beans, * these classes are not intended for direct use in general programming with * the BioJava v3 API. It is possible to leave a bean in * an inconsistent state, and the beans are not thread-safe unless synchronization is * controlled externally (via synchronized blocks or via an application container). *

* The Entities are intended to back other objects that a * programmer will interact with directly. For example Foo.class will be backed * by FooEntity.class. Generally interaction with Foo.class is to be preferred and * will often be more sensible, as the entities typically provide no 'biological * behaviour'. Relevant behaviour should be provided by the wrapping class. It is best * to think of Foo as a view onto the data that is held in the * FooEntity. A good example is the sophisticated Symbol * behaviour that can represent biological logic about IUPAC ambiguity symbols. * For example a 'w' in a Biosequence represents an ambiguity between 'a' and 't', * whereas a 'w' in BiosequenceEntity is simply a 'w' and nothing else. *
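The "view onto the entity" idea can be sketched with the Biosequence/BiosequenceEntity example itself. Both classes below are hypothetical stand-ins: the `couldBe` method and the small IUPAC table are assumptions made for illustration, not the BioJava v3 Symbol API. The entity stores raw characters, and only the wrapper knows that 'w' means "a or t".

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical bean: residues are stored as plain characters with no biological meaning. */
class BiosequenceEntity {
    private String seq;

    public BiosequenceEntity() { }
    public String getSeq() { return this.seq; }
    public void setSeq(String seq) { this.seq = seq; }
}

/** Hypothetical wrapper: a view over the entity that adds IUPAC ambiguity semantics. */
class Biosequence {
    /** A subset of the IUPAC nucleotide codes, mapped to the bases each code can stand for. */
    private static final Map<Character, String> IUPAC = new HashMap<Character, String>();
    static {
        Biosequence.IUPAC.put('a', "a");
        Biosequence.IUPAC.put('c', "c");
        Biosequence.IUPAC.put('g', "g");
        Biosequence.IUPAC.put('t', "t");
        Biosequence.IUPAC.put('w', "at");   // weak
        Biosequence.IUPAC.put('s', "cg");   // strong
        Biosequence.IUPAC.put('r', "ag");   // purine
        Biosequence.IUPAC.put('y', "ct");   // pyrimidine
        Biosequence.IUPAC.put('n', "acgt"); // any base
    }

    private final BiosequenceEntity entity;

    public Biosequence(BiosequenceEntity entity) {
        this.entity = entity;
    }

    /** Returns true if the symbol at index could represent the given unambiguous base. */
    public boolean couldBe(int index, char base) {
        char symbol = Character.toLowerCase(this.entity.getSeq().charAt(index));
        String bases = Biosequence.IUPAC.get(symbol);
        return bases != null && bases.indexOf(Character.toLowerCase(base)) >= 0;
    }
}
```

The wrapper holds no copy of the data; it reads through to the entity each time, so whatever the persistence layer loads into the entity is immediately visible through the biologically aware view.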

* The wrapper entity pattern is intended to preserve a lot of the advanced * behaviour of the original BioJava while also allowing use of modern transport * and persistence packages. This is achieved by persisting and transporting the * entity without the wrapper and re-wrapping it at the other end. *
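A minimal sketch of "transport the entity, re-wrap it at the other end", using plain Java serialization as a stand-in for RPC or JPA. `SequenceEntity`, `Sequence` and `WrapperTransport` are hypothetical names invented for this illustration: only the entity is written to the stream, and the wrapper (which is deliberately not `Serializable`) is rebuilt on the receiving side.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical entity: the only object that is transported or persisted. */
class SequenceEntity implements Serializable {
    private static final long serialVersionUID = 1L;

    private Integer id;
    private String residues;

    public SequenceEntity() { }
    public Integer getId() { return this.id; }
    public void setId(Integer id) { this.id = id; }
    public String getResidues() { return this.residues; }
    public void setResidues(String residues) { this.residues = residues; }
}

/** Hypothetical wrapper: deliberately not Serializable, it is rebuilt around an entity. */
class Sequence {
    private final SequenceEntity entity;

    Sequence(SequenceEntity entity) { this.entity = entity; }
    SequenceEntity getEntity() { return this.entity; }
    int length() { return this.entity.getResidues().length(); }
}

/** Stand-in for an RPC or persistence boundary, using Java serialization. */
class WrapperTransport {
    /** Unwraps and serializes only the entity. */
    static byte[] send(Sequence sequence) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(sequence.getEntity());
        }
        return bytes.toByteArray();
    }

    /** Deserializes the entity and re-wraps it on the receiving side. */
    static Sequence receive(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return new Sequence((SequenceEntity) in.readObject());
        }
    }
}
```

Keeping the wrapper out of the byte stream means its behaviour can evolve without breaking the serialized or persisted form, which only ever contains entity fields.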

* Currently BioJava v3 uses annotated @Id fields to define * equals(Object o). Consistent definition is critical to how * the object will behave when persisted to a database. In the case of: *

 * Foo f = ...;  // initialize
 * Foo fo = ...; // initialize
 * boolean b = f.equals(fo);
 * 
* b would be true if both objects share the same value * (or embeddable object) in the field that represents the primary key in the * database, even if all other fields differ. This is desirable because * two entities representing the same DB record may be retrieved from two different * sessions. Additionally these are the identity fields, so logically they should map to * the concept of identity. Finally, searching a collection is made very simple * without requiring an iterator: *
 * Integer id = ...; // code to initialize
 * collection.contains(new Foo(id));
 * 
* By default BioJava v3 entities use only the primary key field for equality. * If either record has null as the primary key value it is never equal * to another. When implementing equals(Object o) it is not advisable to perform * the test this.getClass() == o.getClass() because of the possibility of proxy * classes used in JPA. Basing equality on the PK alone can, however, lead to an issue with the * hashCode() method. Consider the following code: *
 * Foo foo = new Foo(); // no primary key yet
 * Set<Foo> set = new HashSet<Foo>();
 * set.add(foo);
 * // code here to persist foo and consequently generate its PK
 * boolean b = set.contains(foo);
 * 
* Because only the PK is used for equality, only the PK can be used in the hash code. * This means that b is very likely to be false, because * foo was stored in a hash bucket chosen by its old hash code, which is * now different, even though the set actually does contain a reference to foo. * Although this is a potential deficiency, it is unlikely to be a major problem for * BioJava v3 developers, because using entity-backed objects is preferred to direct * interaction with entities. If you need to use entities directly, then use hashed * collections with caution. * *
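The equality rules and the hash-bucket pitfall described above can be reproduced in a few lines. `Foo` and `PkEqualityDemo` are hypothetical classes written for this sketch (no JPA involved; calling `setId` stands in for the PK being generated on persist). It shows `instanceof` instead of `getClass()`, null PKs never equal to another record, lookup via `new Foo(id)`, and the stale-bucket effect.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical entity whose equality is defined only by its primary key. */
class Foo {
    private Integer id; // null until the persistence layer assigns a PK

    public Foo() { }
    public Foo(Integer id) { this.id = id; }

    public Integer getId() { return this.id; }
    public void setId(Integer id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        // instanceof rather than getClass(), so JPA proxy subclasses can still be equal;
        // the getter (not the field) is read because a proxy may populate fields lazily
        if (!(o instanceof Foo)) return false;
        Integer otherId = ((Foo) o).getId();
        // a record with a null PK is never equal to another record
        return this.id != null && this.id.equals(otherId);
    }

    @Override
    public int hashCode() {
        // must agree with equals, so it can depend on nothing but the PK
        return this.id == null ? 0 : this.id.hashCode();
    }
}

class PkEqualityDemo {
    /** Mirrors the javadoc scenario: add a transient Foo to a HashSet, then assign its PK. */
    static boolean containsAfterPkAssigned() {
        Foo foo = new Foo();          // no primary key yet
        Set<Foo> set = new HashSet<Foo>();
        set.add(foo);                 // stored under hashCode() == 0
        foo.setId(99);                // stands in for the PK generated on persist
        return set.contains(foo);     // looks under hashCode() == 99 instead
    }
}
```

`containsAfterPkAssigned()` returns false even though iterating over the set would still find `foo`; only hash-based lookups miss it. If transient entities must go into a hashed collection, one workaround is to remove them before the PK is assigned and add them back afterwards.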

Wrapper classes can either delegate their equals call to the underlying * entity or do something that is more biologically sensible * (as PK values are typically not exposed in the wrapper). It is probably more * sensible for a wrapper to define its own equals (and hashCode) * implementations because of the limitations of the default @Id based system * described above, especially the potential hash code problems. * * For example FooSequence.class might want to base * equality on the exact match of the DNA sequence it holds even though * FooSequenceEntity.class may only use the PK field. Whether or not delegation * is used, the choice should be clearly documented. *
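The FooSequence/FooSequenceEntity example can be sketched as follows. Both classes are hypothetical: the entity keeps the PK-only equality described above, while the wrapper does not delegate and instead compares the exact DNA string. A side effect is that the wrapper's hash code no longer changes when a PK is assigned.

```java
/** Hypothetical entity: equality by PK only, following the @Id-based default. */
class FooSequenceEntity {
    private Integer id;
    private String dna;

    public FooSequenceEntity() { }
    public FooSequenceEntity(Integer id, String dna) {
        this.id = id;
        this.dna = dna;
    }

    public Integer getId() { return this.id; }
    public void setId(Integer id) { this.id = id; }
    public String getDna() { return this.dna; }
    public void setDna(String dna) { this.dna = dna; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FooSequenceEntity)) return false;
        return this.id != null && this.id.equals(((FooSequenceEntity) o).getId());
    }

    @Override
    public int hashCode() {
        return this.id == null ? 0 : this.id.hashCode();
    }
}

/**
 * Hypothetical wrapper that does not delegate equality: two FooSequences are equal
 * when they hold exactly the same DNA string, whatever their PKs.
 */
final class FooSequence {
    private final FooSequenceEntity entity;

    FooSequence(FooSequenceEntity entity) { this.entity = entity; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FooSequence)) return false;
        return this.entity.getDna().equals(((FooSequence) o).entity.getDna());
    }

    @Override
    public int hashCode() {
        // depends only on the DNA, so assigning a PK does not move the object between buckets
        return this.entity.getDna().hashCode();
    }
}
```

The trade-off is that the wrapper's hash code now depends on the DNA, so changing the sequence through the entity while the wrapper sits in a hashed collection would reintroduce the same kind of problem.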

* *

* @author Mark Schreiber */ package org.biojava.biosql.entity; From andreas at sdsc.edu Tue Oct 21 03:17:28 2008 From: andreas at sdsc.edu (Andreas Prlic) Date: Mon, 20 Oct 2008 20:17:28 -0700 Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please! In-Reply-To: References: Message-ID: <59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com> Hi, Couple of thoughts regarding biojava v3: License: Since it seems we will end up copying code from biojava 1.6 to biojava 3.0, we need to keep the license the same (LGPL 2.1). I.e. people should still use the same biojava license headers when committing new files and all code will be considered to be LGPL, if no header is present. Do NOT commit code under other licenses. Installation: We need some installation instructions on the wiki site, e.g. how to get the maven setup running. What are the code conventions for the new version? Blast: the Blast parsing modules are among the most frequently used ones in biojava 1.6. To make people use biojava v3 it will be crucial to have a port of them to the new version. Does anybody want to take care of that? Automated builds: is it interesting to have automated builds set up for the new version at this stage, or should we wait until a more mature stage? I could easily add another auto-build similar to the one for biojava 1.6 at http://www.spice-3d.org/cruise/ Andreas On Sun, Oct 19, 2008 at 5:18 PM, Richard Holland wrote: > Hi all, > > I've just committed some new code to the biojava3 branch of the biojava-live > subversion repository. It's the foundations of a brand new alphabet+symbol > set of classes, and an example of how to use them to represent DNA. You'll > notice that the new code is very lightweight and allows for a lot more > flexibility than the old code - for instance, the concept of Alphabet has > changed radically. It also makes much more extensive use of the Collections > API. 
> > I haven't got any test cases or usage examples yet but give me a shout if > you don't understand the code and I'll explain how it works. (Hint: > SymbolFormat is there to convert Strings into SymbolList objects, and vice > versa). > > So, now we want some volunteers! We're starting from scratch here so there's > a lot of work to do. The whole of BioJava needs 'translating' into BJ3, > whether it be copy-and-paste existing classes and modify them to suit the > new style, or write completely new ones to provide equivalent functionality. > > > I'll post an example of how to do file parsing soon, probably starting with > FASTA. In the meantime, a good place to start would be for people to design > object models to represent their favourite data types (e.g. Genbank, or > microarray data). Utility classes to manipulate those objects would be great > too. > > The object models need to be normalised as much as possible - e.g. if your > data has a lot of comments, and the order of those comments is important, > then give your object model a collection of comment objects. The object > model for each data type should be completely independent and use basic data > types wherever possible (e.g. store sequences as strings, don't attempt to > parse them into anything fancy like SymbolLists). The closer the object > model is to the original data format, the better. There's going to be clever > tricks when it comes to converting data between different object models > (e.g. Genbank to INSDSeq), which I will explain later when I put the file > parsing examples up. > > You'll notice how the biojava3 branch uses Maven instead of Ant. This is > because we want to make it as modular as possible, so if you want to write > microarray stuff, create a new microarray sub-project (as per the dna > example that's already there). This way if someone only wants the microarray > bit of BJ3, they only need install the appropriate JAR file and can ignore > the rest. 
(The 'core' module is for stuff that is so generic it could be > used anywhere, or is used in every single other module.) > > If coding isn't your cup of tea, then we would very much welcome testers > (particularly those who enjoy writing test cases!), documenters > (particularly code commenters), translators (for internationalisation of the > code), and of course all those who wish to contribute ideas and suggestions > no matter how off-the-wall they might be. In particular if you'd like to > take charge of an area of the development process, e.g. Documentation Chief, > or Protein Champion, then that would be much appreciated. > > I'm very much looking forward to working with everyone on this. Good luck, > and happy coding! > > cheers, > Richard > > PS. Please don't forget to attach the appropriate licence to your code. You > can copy-and-paste it from the existing classes I just committed this > evening. > > PPS. For those who are worried about backwards compatibility - this was > discussed on the lists a while back and it was made clear that BJ3 is a > clean break. However, the existing code will continue to be maintained and > bugfixed for a couple of years so you don't have to upgrade if you don't > want to - it just won't have any new features developed for it. This is > largely because it'll probably take just that long to write all the new BJ3 > code. When we do decide to desupport the existing BJ code, plenty of notice > will be given (i.e. years as opposed to months). 
> > > -- > Richard Holland, BSc MBCS > Finance Director, Eagle Genomics Ltd > M: +44 7500 438846 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From fjossinet at orange.fr Tue Oct 21 07:09:46 2008 From: fjossinet at orange.fr (Fabrice Jossinet) Date: Tue, 21 Oct 2008 09:09:46 +0200 Subject: [Biojava-dev] BioJava3 contribution In-Reply-To: References: Message-ID: Hi Richard, I did everything but, with my IntelliJ IDE, I cannot commit the new rna module due to a failure in authentication. Do I have to register somewhere to have an account? (but perhaps it's a wrong configuration on my side) Fabrice On 20 Oct 08, at 15:51, Richard Holland wrote: > Excellent! Thanks for your offer of help! > > Yes, an advanced RNA module would be very helpful indeed. You should > probably call it 'rna'. > > As long as everyone who intends to work on BJ3 declares their > intentions here, as you just have, then basically it's first come > first served. I won't be doing any official supervision other than > keeping an eye on committed code once in a while to make sure it all > looks OK. So feel free to start coding straight away! > > All new modules should probably start by: > > 1. copying the existing dna module to something new, like 'rna' in > this case. > 2. remove all the hidden .svn directories from the copy, > 3. update the pom.xml in the copy (do a search-and-replace on dna > and change to the new name, rna in this case), delete the existing > source packages in src/main/java (org.biojava.dna) and create > suitable new ones (org.biojava.rna in this case). > 4. empty out the target/ folder then svn add the new module > 5. svn:ignore the target/ directory in your new module, > 6. include your new module in the list at the end of the pom.xml in > the root directory of the biojava3 branch.
> > cheers, > Richard > > > > 2008/10/20 Fabrice Jossinet > Dear Richard, > > I'm answering to your "official call", to propose you my help for > the development of the biojava3 code. With the modularity of Maven, > I also would like to proposes you my help for the development of a > module that will use the biojava3 code to manage more specialized > RNA stuff (secondary and tertiary structures, base-pairs > classifications, modified nucleotides, RNA alignments,....). > > What will be the next step for me? Will you make a selection? > > Best Regards > > Fabrice Jossinet > > -- > Dr. Fabrice Jossinet > Laboratoire de Bioinformatique, modelisation et simulation des acides > nucleiques > Universite Louis Pasteur > Institut de biologie moleculaire et cellulaire du CNRS > UPR9002, Architecture et Reactivite de l'ARN > 15 rue Rene Descartes > F-67084 Strasbourg Cedex > France > > Tel + 33 (0) 3 88 417053 > FAX + 33 (0) 3 88 60 22 18 > > f.jossinet at ibmc.u-strasbg.fr > fjossinet at gmail.com > http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html > http://fjossinet.u-strasbg.fr/ > > > > > > > > -- > Richard Holland, BSc MBCS > Finance Director, Eagle Genomics Ltd > M: +44 7500 438846 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ -- Dr. Fabrice Jossinet Laboratoire de Bioinformatique, modelisation et simulation des acides nucleiques Universite Louis Pasteur Institut de biologie moleculaire et cellulaire du CNRS UPR9002, Architecture et Reactivite de l'ARN 15 rue Rene Descartes F-67084 Strasbourg Cedex France Tel + 33 (0) 3 88 417053 FAX + 33 (0) 3 88 60 22 18 f.jossinet at ibmc.u-strasbg.fr fjossinet at gmail.com http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html http://fjossinet.u-strasbg.fr/ From holland at eaglegenomics.com Tue Oct 21 09:06:41 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 21 Oct 2008 10:06:41 +0100 Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please! 
In-Reply-To: <59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com> References: <59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com> Message-ID: > > > License: Since it seems we will end up copying code from biojava 1.6 > to biojava 3.0, we need to keep the license the same (LGPL 2.1). I.e. > people should still use the same biojava license headers when > committing new files and all code will be considered to be LGPL, if no > header is present. Do NOT commit code under other licenses. > > Installation: We need some installation instructions on the wiki site, > e.g. how to get the maven setup running. What are the code > conventions for the new version? Not sure where best to put it in the Wiki, but I agree it needs to go there somewhere. Installation is a one-liner from within the top level of the project: mvn install This compiles and installs the JARs into your local Maven repository, and also downloads and installs any external dependencies. Then you can add the installed modules as dependencies in your own Maven projects. If you need to write a launcher script for your project, or you want to use the JAR files outside Maven, you can use this command to generate the CLASSPATH for use outside Maven. This only includes external dependencies - you'll also need to add to it the individual JAR files from inside the various target/ folders that Maven built for you: mvn dependency:build-classpath Code conventions are simple: 1. I'm not fussed about the specific formatter people use in each module, as long as the code is all formatted using some kind of consistent method. I personally just use the default settings from Format code in NetBeans. 2. Use 'this' wherever possible, and for static references, use the classname prefix (e.g. MyClass.staticField). I hate having to try and work out in my head which references are going where, and which are static and which are not! 3. Comment every single method, even if it's private. 
This helps others understand the flow of your code. Also comment liberally inside methods if they are longer than just a few lines (i.e. if you can't fit the entire method within the code panel in NetBeans, it's going to need internal comments). 4. When writing getters/setters, follow the JavaBeans conventions so that automated frameworks like Spring can easily pick them up and work with them. 5. Please write tests for your code using JUnit conventions, inside the test/ folder of each module. I know I haven't done this myself yet, but I'm going to! > > > Blast: the Blast parsing modules are among the most frequently used > ones in biojava 1.6. To make people use biojava v3 it will be crucial > to have a port of them to the new version. Does anybody want to take > care of that? I'll second that. Blast is vital. We'd really appreciate a volunteer, please! > > Automated builds: is it interesting to have automated builds set up > for the new version at this stage, or should we wait until a more > mature stage? I could easily add another auto-build similar to the one > for biojava 1.6 at http://www.spice-3d.org/cruise/ You could, although I don't think they'd be much use yet. But why not start early, so that we won't forget to do it later. Richard > > Andreas > > On Sun, Oct 19, 2008 at 5:18 PM, Richard Holland > wrote: > > Hi all, > > > > I've just committed some new code to the biojava3 branch of the > biojava-live > > subversion repository. It's the foundations of a brand new > alphabet+symbol > > set of classes, and an example of how to use them to represent DNA. > You'll > > notice that the new code is very lightweight and allows for a lot more > > flexibility than the old code - for instance, the concept of Alphabet has > > changed radically. It also makes much more extensive use of the > Collections > > API.
(Hint: > > SymbolFormat is there to convert Strings into SymbolList objects, and > vice > > versa). > > > > So, now we want some volunteers! We're starting from scratch here so > there's > > a lot of work to do. The whole of BioJava needs 'translating' into BJ3, > > whether it be copy-and-paste existing classes and modify them to suit the > > new style, or write completely new ones to provide equivalent > functionality. > > > > > > I'll post an example of how to do file parsing soon, probably starting > with > > FASTA. In the meantime, a good place to start would be for people to > design > > object models to represent their favourite data types (e.g. Genbank, or > > microarray data). Utility classes to manipulate those objects would be > great > > too. > > > > The object models need to be normalised as much as possible - e.g. if > your > > data has a lot of comments, and the order of those comments is important, > > then give your object model a collection of comment objects. The object > > model for each data type should be completely independent and use basic > data > > types wherever possible (e.g. store sequences as strings, don't attempt > to > > parse them into anything fancy like SymbolLists). The closer the object > > model is to the original data format, the better. There's going to be > clever > > tricks when it comes to converting data between different object models > > (e.g. Genbank to INSDSeq), which I will explain later when I put the file > > parsing examples up. > > > > You'll notice how the biojava3 branch uses Maven instead of Ant. This is > > because we want to make it as modular as possible, so if you want to > write > > microarray stuff, create a new microarray sub-project (as per the dna > > example that's already there). This way if someone only wants the > microarray > > bit of BJ3, they only need install the appropriate JAR file and can > ignore > > the rest. 
(The 'core' module is for stuff that is so generic it could be > > used anywhere, or is used in every single other module.) > > > > If coding isn't your cup of tea, then we would very much welcome testers > > (particularly those who enjoy writing test cases!), documenters > > (particularly code commenters), translators (for internationalisation of > the > > code), and of course all those who wish to contribute ideas and > suggestions > > no matter how off-the-wall they might be. In particular if you'd like to > > take charge of an area of the development process, e.g. Documentation > Chief, > > or Protein Champion, then that would be much appreciated. > > > > I'm very much looking forward to working with everyone on this. Good > luck, > > and happy coding! > > > > cheers, > > Richard > > > > PS. Please don't forget to attach the appropriate licence to your code. > You > > can copy-and-paste it from the existing classes I just committed this > > evening. > > > > PPS. For those who are worried about backwards compatibility - this was > > discussed on the lists a while back and it was made clear that BJ3 is a > > clean break. However, the existing code will continue to be maintained > and > > bugfixed for a couple of years so you don't have to upgrade if you don't > > want to - it just won't have any new features developed for it. This is > > largely because it'll probably take just that long to write all the new > BJ3 > > code. When we do decide to desupport the existing BJ code, plenty of > notice > > will be given (i.e. years as opposed to months). 
> > > > > > -- > > Richard Holland, BSc MBCS > > Finance Director, Eagle Genomics Ltd > > M: +44 7500 438846 | E: holland at eaglegenomics.com > > http://www.eaglegenomics.com/ > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From holland at eaglegenomics.com Tue Oct 21 09:09:26 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 21 Oct 2008 10:09:26 +0100 Subject: [Biojava-dev] BioJava3 contribution In-Reply-To: References: Message-ID: Ah, yes. The person to talk to is Andreas. He has control over the SVN repository. 2008/10/21 Fabrice Jossinet > Hi Richard, > I did everything but, with my IntelliJ IDE, I cannot commit the new rna > module due to a failure in authentication. Do I have to register somewhere > to have an account? (but perhaps it's a wrong configuration on my side) > > Fabrice > > On 20 Oct 08, at 15:51, Richard Holland wrote: > > Excellent! Thanks for your offer of help! > > Yes, an advanced RNA module would be very helpful indeed. You should > probably call it 'rna'. > > As long as everyone who intends to work on BJ3 declares their intentions > here, as you just have, then basically it's first come first served. I won't > be doing any official supervision other than keeping an eye on committed > code once in a while to make sure it all looks OK. So feel free to start > coding straight away! > > All new modules should probably start by: > > 1. copying the existing dna module to something new, like 'rna' in this > case. > 2. remove all the hidden .svn directories from the copy, > 3.
update the pom.xml in the copy (do a search-and-replace on dna and > change to the new name, rna in this case), delete the existing source > packages in src/main/java (org.biojava.dna) and create suitable new ones > (org.biojava.rna in this case). > 4. empty out the target/ folder then svn add the new module > 5. svn:ignore the target/ directory in your new module, > 6. include your new module in the list at the end of the pom.xml in the > root directory of the biojava3 branch. > > cheers, > Richard > > > > 2008/10/20 Fabrice Jossinet > >> Dear Richard, >> >> I'm answering to your "official call", to propose you my help for the >> development of the biojava3 code. With the modularity of Maven, I also would >> like to proposes you my help for the development of a module that will use >> the biojava3 code to manage more specialized RNA stuff (secondary and >> tertiary structures, base-pairs classifications, modified nucleotides, RNA >> alignments,....). >> >> What will be the next step for me? Will you make a selection? >> >> Best Regards >> >> Fabrice Jossinet >> >> -- >> Dr. Fabrice Jossinet >> Laboratoire de Bioinformatique, modelisation et simulation des acides >> nucleiques >> Universite Louis Pasteur >> Institut de biologie moleculaire et cellulaire du CNRS >> UPR9002, Architecture et Reactivite de l'ARN >> 15 rue Rene Descartes >> F-67084 Strasbourg Cedex >> France >> >> Tel + 33 (0) 3 88 417053 >> FAX + 33 (0) 3 88 60 22 18 >> >> f.jossinet at ibmc.u-strasbg.fr >> fjossinet at gmail.com >> http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html >> http://fjossinet.u-strasbg.fr/ >> >> >> >> >> > > > -- > Richard Holland, BSc MBCS > Finance Director, Eagle Genomics Ltd > M: +44 7500 438846 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > > > > > -- > Dr. 
Fabrice Jossinet > Laboratoire de Bioinformatique, modelisation et simulation des acides > nucleiques > Universite Louis Pasteur > Institut de biologie moleculaire et cellulaire du CNRS > UPR9002, Architecture et Reactivite de l'ARN > 15 rue Rene Descartes > F-67084 Strasbourg Cedex > France > > Tel + 33 (0) 3 88 417053 > FAX + 33 (0) 3 88 60 22 18 > > f.jossinet at ibmc.u-strasbg.fr > fjossinet at gmail.com > http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html > http://fjossinet.u-strasbg.fr/ > > > > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From markjschreiber at gmail.com Tue Oct 21 09:26:41 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Tue, 21 Oct 2008 17:26:41 +0800 Subject: [Biojava-dev] [Biojava-l] BioJava 3 Begins - Volunteers please! In-Reply-To: References: <59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com> Message-ID: <93b45ca50810210226t79cfbcbfhcadaedcfe8735676@mail.gmail.com> >> Blast: the Blast parsing modules are among the most frequently used >> ones in biojava 1.6. To make people use biojava v3 it will be crucial >> to have a port of them to the new version. Does anybody want to take >> care of that? > > > I'll second that. Blast is vital. We'd really appreciate a volunteer, > please! > BlastXML output would certainly be the easiest place to start. I also think with the new Thing/ ThingBuilder framework it will be possible to develop all manner of parsers for the vagaries of Blast text output that come with each new release of Blast. Possible but maybe not a good idea. I don't think that output was ever supposed to be machine readable. The table formatted output (-m8 I think) would be a better option. Given the DTD it should be possible to do a quick JAXB binding. How would that work in the Thing/ ThingBuilder paradigm? 
- Mark From holland at eaglegenomics.com Tue Oct 21 10:18:40 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 21 Oct 2008 11:18:40 +0100 Subject: [Biojava-dev] [Biojava-l] BioJava 3 Begins - Volunteers please! In-Reply-To: <93b45ca50810210226t79cfbcbfhcadaedcfe8735676@mail.gmail.com> References: <59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com> <93b45ca50810210226t79cfbcbfhcadaedcfe8735676@mail.gmail.com> Message-ID: JAXB would follow exactly the same Thing/ThingBuilder pattern, but with the following subtle differences... 0. Your root data model object as generated by JAXB should be modified to implement Thing, making it a JAXBThing. 1. JAXBReader (extends ThingReader) would open and read the file using JAXB and directly construct JAXBThings. 2. JAXBReceiver (extends ThingReceiver) would be a pass-through interface with just one method, something like setJAXBThing(), to pass in the already-parsed JAXBThing directly. 3. Any converters would expand/deflate data from other formats to/from the JAXBThing object directly. Richard. 2008/10/21 Mark Schreiber > >> Blast: the Blast parsing modules are among the most frequently used > >> ones in biojava 1.6. To make people use biojava v3 it will be crucial > >> to have a port of them to the new version. Does anybody want to take > >> care of that? > > > > > > I'll second that. Blast is vital. We'd really appreciate a volunteer, > > please! > > > > BlastXML output would certainly be the easiest place to start. I also > think with the new Thing/ ThingBuilder framework it will be possible > to develop all manner of parsers for the vagaries of Blast text output > that come with each new release of Blast. Possible but maybe not a > good idea. I don't think that output was ever supposed to be machine > readable. The table formatted output (-m8 I think) would be a better > option. > > Given the DTD it should be possible to do a quick JAXB binding. How > would that work in the Thing/ ThingBuilder paradigm?
> > - Mark > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From dicknetherlands at gmail.com Tue Oct 21 11:14:29 2008 From: dicknetherlands at gmail.com (Richard Holland) Date: Tue, 21 Oct 2008 12:14:29 +0100 Subject: [Biojava-dev] [Biojava-l] File parsing in BJ3 In-Reply-To: <93b45ca50810210335j5ef4a206y545e5a1869cedc03@mail.gmail.com> References: <93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com> <48FD97AB.70503@ebi.ac.uk> <93b45ca50810210335j5ef4a206y545e5a1869cedc03@mail.gmail.com> Message-ID: For now, yes it's empty. But I can envisage situations where it might be nice to have Thing implement some common methods (e.g. isMachineGenerated(), isManuallyCurated(), etc.). I'd rather have it there now to be a placeholder for future expansion, than have to re-engineer everything should we identify a need for common functions in future. You'll see that Thing already extends Serializable, implying that all Things must be able to persist to an object backing store. Serializable itself is also an empty interface! Also I like the idea of having Thing, not Object, as a kind of marker of intention. To me it makes it clearer when reading code to avoid Object wherever possible. Thing may not be any more clever than Object, but it immediately declares an intention when reading code as to what kind of Object should be expected. 2008/10/21 Mark Schreiber > Is there any need for Thing at all? Can't a bulder be typed to produce > something that extends Object? > > If Thing provides no behaivour contract or meta-information then why > does it exist? > > - Mark > > On Tue, Oct 21, 2008 at 4:49 PM, Andy Yates wrote: > > Depends on what you want to program. If you want to have a collection of > > objects which are Things & perform a common action on them then > > annotations are not the way forward. 
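Andy's point about interfaces versus annotations can be made concrete. The `Thing`, `ThingBuilder`, `FastaRecord`, `FastaRecordBuilder` and `Things` types below are hypothetical simplifications written for this illustration, not the interfaces committed to biojava-core. Even with no methods, a marker interface that extends `Serializable` lets the compiler restrict a generic builder or a collection to Things, which an annotation cannot enforce at compile time.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical marker interface: no methods, but it declares intent and implies Serializable. */
interface Thing extends Serializable { }

/** Hypothetical builder, typed on the kind of Thing it produces. */
interface ThingBuilder<T extends Thing> {
    T getFinishedThing();
}

/** Hypothetical Thing used only for illustration. */
class FastaRecord implements Thing {
    private static final long serialVersionUID = 1L;

    private final String description;
    private final String sequence;

    FastaRecord(String description, String sequence) {
        this.description = description;
        this.sequence = sequence;
    }

    String getDescription() { return this.description; }
    String getSequence() { return this.sequence; }
}

/** Hypothetical builder that accumulates data, then produces a FastaRecord on demand. */
class FastaRecordBuilder implements ThingBuilder<FastaRecord> {
    private String description;
    private final StringBuilder sequence = new StringBuilder();

    void setDescription(String description) { this.description = description; }
    void appendSequence(String chunk) { this.sequence.append(chunk); }

    @Override
    public FastaRecord getFinishedThing() {
        return new FastaRecord(this.description, this.sequence.toString());
    }
}

class Things {
    /**
     * A common action over a collection of builders. The bound T extends Thing is
     * checked by the compiler; a ThingBuilder<String> would not even compile.
     */
    static <T extends Thing> List<T> finishAll(List<? extends ThingBuilder<T>> builders) {
        List<T> things = new ArrayList<T>();
        for (ThingBuilder<T> builder : builders) {
            things.add(builder.getFinishedThing());
        }
        return things;
    }
}
```

With an annotation-based `@Thing` marker instead, `finishAll` could only be typed against `Object`, and a non-serializable result would only be discovered at runtime.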
> > > > If you want to have some kind of meta-programming occurring & need a > > class to be multiple things then annotations are right. There is > > currently no way to enforce compile time dependencies on annotations & > > my thinking is that this is right. Annotations should be meta data or > > provide a way to alter a class in a non-invasive way (think Web Service > > annotations creating WS Servers & Clients without any alteration of the > > class). > > > > Andy > > > > Richard Holland wrote: > >> Spot on. > >> > >> Annotation/interface.... I think Annotation is probably better as you > >> suggest, but I'd have to look into that. Not sure how it works with > >> collections and generics. If it does turn out to be a better bet, I'll > >> change it over. > >> > >> With the BioSQL dependencies, take a look at the pom.xml file inside the > >> biojava-dna module. It declares a dependency on biojava-core. If you > want to > >> add dependencies to external JARs, take a look at biojava-biosql's > pom.xml > >> to see how it depends on javax.persistence. (The easiest way to add > these is > >> via an IDE such as NetBeans, which is what I'm using at the moment). > >> > >> cheers, > >> Richard > >> > >> 2008/10/21 Mark Schreiber > >> > >>> So if I want to build a BioSQL loader from Genbank then would the > >>> classes (or their wrappers) in the BioSQL Entity package need to > >>> implement Thing? Would maven have an issue with that or would it just > >>> create a dependency on core? (you can tell I've never used Maven > >>> right). > >>> > >>> From a design point of view should Thing be an interface or an > >>> Annotation? The reason I ask is that it doesn't define any methods so > >>> it is more of a tag than an interface. > >>> > >>> Anyway, my understanding is that I would use a Genbank parser (or > >>> write one).
Write a EntityReceiver interface (probably more than one > >>> given the number of entities in BioSQL, implement a EntityBuilder > >>> (again possibly more than one) that implements EntityReceiver and > >>> builds Entity beans from messages it receives. In this case I probably > >>> wouldn't provide a writer as JPA would be writing the beans to the > >>> database. Would this be how you imagine it? > >>> > >>> - Mark > >>> > >>> > >>> On Tue, Oct 21, 2008 at 1:52 AM, Richard Holland > >>> wrote: > >>>> (From now on I will only be posting these development messages to > >>>> biojava-dev, which is the intended purpose of that list. Those of you > who > >>>> wish to keep track of things but are currently only subscribed to > >>> biojava-l > >>>> should also subscribe to biojava-dev in order to keep up to date.) > >>>> > >>>> As promised, I've committed a new package in the biojava-core module > that > >>>> should help understand how to do file parsing and conversion and > writing > >>> in > >>>> the new BJ3 modules. Here's an example of how to use it to write a > >>> Genbank > >>>> parser (note no parsers actually exist yet!): > >>>> > >>>> 1. Design yourself a Genbank class which implements the interface > Thing > >>> and > >>>> can fully represent all the data that might possibly occur inside a > >>> Genbank > >>>> file. > >>>> > >>>> 2. Write an interface called GenbankReceiver, which extends > ThingReceiver > >>>> and defines all the methods you might need in order to construct a > >>> Genbank > >>>> object in an asynchronous fashion. > >>>> > >>>> 3. Write a GenbankBuilder class which implements GenbankReceiver and > >>>> ThingBuilder. It's job is to receive data via method calls, use that > data > >>> to > >>>> construct a Genbank object, then provide that object on demand. > >>>> > >>>> 4. Write a GenbankWriter class which implements GenbankReceiver and > >>>> ThingWriter. 
It's job is similar to GenbankBuilder, but instead of > >>>> constructing new Genbank objects, it writes Genbank records to file > that > >>>> reflect the data it receives. > >>>> > >>>> 5. Write a GenbankReader class which implements ThingReader. It can > read > >>>> GenbankFiles and output the data to the methods of the ThingReceiver > >>>> provided to it, which in this case could be anything which implements > the > >>>> interface GenbankReceiver. > >>>> > >>>> 6. Write a GenbankEmitter class which implements ThingEmitter. It > takes a > >>>> Genbank object and will fire off data from it to the provided > >>> ThingReceiver > >>>> (a GenbankReceiver instance) as if the Genbank object was being read > from > >>> a > >>>> file or some other source. > >>>> > >>>> That's it! OK so it's a minimum of 6 classes instead of the original 1 > or > >>> 2, > >>>> but the additional steps are necessary for flexibility in converting > >>> between > >>>> formats. > >>>> > >>>> Now to use it (you'll probably want a GenbankTools class to wrap these > >>> steps > >>>> up for user-friendliness, including various options for opening files, > >>>> etc.): > >>>> > >>>> 1. To read a file - instantiate ThingParser with your GenbankReader as > >>> the > >>>> reader, and GenbankBuilder as the receiver. Use the iterator methods > on > >>>> ThingParser to get the objects out. > >>>> > >>>> 2. To write a file - instantiate ThingParser with a GenbankEmitter > >>> wrapping > >>>> your Genbank object, and a GenbankWriter as the receiver. Use the > >>> parseAll() > >>>> method on the ThingParser to dump the whole lot to your chosen output. > >>>> > >>>> The clever bit comes when you want to convert between files. Imagine > >>> you've > >>>> done all the above for Genbank, and you've also done it for FASTA. How > to > >>>> convert between them? What you need to do is this: > >>>> > >>>> 1. Implement all the classes for both Genbank and FASTA. > >>>> > >>>> 2. 
Write a GenbankFASTAConverter class that implements > >>> ThingConverter > >>>> and GenbankReceiver, and will internally convert the data received and > >>> pass > >>>> it on out to the receiver provided, which will be a FASTAReceiver > >>> instance. > >>>> 3. Write a FASTAGenbankConverter class that operates in exactly the > >>> opposite > >>>> way, implementing ThingConverter and FASTAReceiver. > >>>> > >>>> Then to convert you use ThingParser again: > >>>> > >>>> 1. From FASTA file to Genbank object: Instantiate ThingParser with a > >>>> FASTAReader reader, a GenbankBuilder receiver, and add a > >>>> FASTAGenbankConverter instance to the converter chain. Use the > iterator > >>> to > >>>> get your Genbank objects out of your FASTA file. > >>>> > >>>> 2. From FASTA file to Genbank file: Same as option 1, but provide a > >>>> GenbankWriter instead and use parseAll() instead of the iterator > methos. > >>>> > >>>> 3. From FASTA object to Genbank object: Same as option 1, but provide > a > >>>> FASTAEmitter wrapping your FASTA object as the reader instead. > >>>> > >>>> 4. From FASTA object to Genbank file: Same as option 1, but swap both > the > >>>> reader and the receiver as per options 2 and 3. > >>>> > >>>> 5/6/7/8. From Genbank * to FASTA * - same as 1,2,3,4 but swap all > >>> mentions > >>>> of FASTA and Genbank, and use GenbankFASTAConverter instead. > >>>> > >>>> One last and very important feature of this approach is that if you > >>> discover > >>>> that nobody has written the appropriate converter for your chosen pair > of > >>>> formats A and C, but converters do exist to map A to some other format > B > >>> and > >>>> that other format B on to C, then you can just put the two converts > A-B > >>> and > >>>> B-C into the ThingParser chain and it'll work perfectly. > >>>> > >>>> Enjoy! 
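The reading half of the recipe can be sketched compactly. This covers steps 1, 2, 3 and 5, plus "To read a file". Every name below is a hypothetical stand-in written for illustration: `Genbank`, `ThingReceiver`, `GenbankReceiver`, `GenbankBuilder` and `GenbankReader`, as well as the toy record format that uses only `LOCUS`, `ORIGIN` and `//` lines. None of this is the API committed to biojava-core, and a real Genbank parser handles far more fields. The key property is that the reader knows only the receiver interface, so a builder, a writer or a converter can be handed to it interchangeably.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Step 1 (hypothetical): a plain data class for one record.
class Genbank {
    String locus;
    final StringBuilder sequence = new StringBuilder();
}

// Hypothetical base interface for anything that receives parse events.
interface ThingReceiver {
    void startRecord();
    void endRecord();
}

// Step 2: format-specific callbacks.
interface GenbankReceiver extends ThingReceiver {
    void setLocus(String locus);
    void addSequence(String residues);
}

// Step 3: turns callbacks into Genbank objects.
class GenbankBuilder implements GenbankReceiver {
    private Genbank current;
    private final List<Genbank> built = new ArrayList<>();

    public void startRecord() { current = new Genbank(); }
    public void setLocus(String locus) { current.locus = locus; }
    public void addSequence(String residues) { current.sequence.append(residues); }
    public void endRecord() { built.add(current); current = null; }

    List<Genbank> getBuilt() { return built; }
}

// Step 5: reads text and fires events at whatever GenbankReceiver it is given.
class GenbankReader {
    void read(BufferedReader in, GenbankReceiver receiver) throws IOException {
        String line;
        boolean inSequence = false;
        while ((line = in.readLine()) != null) {
            if (line.startsWith("LOCUS")) {
                receiver.startRecord();
                // The first token after LOCUS is the locus name.
                receiver.setLocus(line.substring(5).trim().split("\\s+")[0]);
            } else if (line.startsWith("ORIGIN")) {
                inSequence = true;
            } else if (line.startsWith("//")) {
                inSequence = false;
                receiver.endRecord();
            } else if (inSequence) {
                // Strip the position numbers and spaces of the ORIGIN block.
                receiver.addSequence(line.replaceAll("[^A-Za-z]", ""));
            }
        }
    }
}
```

In the proposed design, ThingParser would wire these two together and hand out the built objects through its iterator. Here the builder simply collects them in a list.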
> >>>> > >>>> cheers, > >>>> Richard > >>>> > >>>> -- > >>>> Richard Holland, BSc MBCS > >>>> Finance Director, Eagle Genomics Ltd > >>>> M: +44 7500 438846 | E: holland at eaglegenomics.com > >>>> http://www.eaglegenomics.com/ > >>>> _______________________________________________ > >>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org > >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l > >>>> > >> > >> > >> > > > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From markjschreiber at gmail.com Tue Oct 21 11:24:13 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Tue, 21 Oct 2008 19:24:13 +0800 Subject: [Biojava-dev] [Biojava-l] File parsing in BJ3 In-Reply-To: References: <93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com> <48FD97AB.70503@ebi.ac.uk> <93b45ca50810210335j5ef4a206y545e5a1869cedc03@mail.gmail.com> Message-ID: <93b45ca50810210424g5a9288f0w803e6d5ca4b840d3@mail.gmail.com> Depending on what you want them for, isMachineGenerated() and isManuallyCurated() would possibly be better as annotations (@MachineGenerated, @ManuallyCurated). This is true metadata. Probably if Java had had annotations in version 1.1, Serializable would also be an annotation. I would agree with the idea that ThingBuilder etc. should be typed on extends Serializable. - Mark On Tue, Oct 21, 2008 at 7:14 PM, Richard Holland wrote: > For now, yes it's empty. But I can envisage situations where it might be > nice to have Thing implement some common methods (e.g. isMachineGenerated(), > isManuallyCurated(), etc.). I'd rather have it there now to be a placeholder > for future expansion, than have to re-engineer everything should we identify > a need for common functions in future. > > You'll see that Thing already extends Serializable, implying that all Things > must be able to persist to an object backing store. Serializable itself is > also an empty interface!
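Mark's suggestion that builders be typed on `extends Serializable` instead of on `Thing` can be sketched as a generic bound. `ThingBuilder` below is a hypothetical one-method interface written for this illustration, not the BJ3 interface of the same name. The round trip through `ObjectOutputStream`/`ObjectInputStream` shows that an empty marker such as `Serializable` still carries a real contract, because the JDK refuses to serialize an object whose class does not implement it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical builder interface bounded on Serializable instead of a custom marker.
interface ThingBuilder<T extends Serializable> {
    T build();
}

class SerializableBoundDemo {
    // Writes any serializable value to bytes and reads it back.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    // Returns true if the JDK rejects the object because its class lacks the marker.
    static boolean rejectedWithoutMarker(Object candidate) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(candidate);
            return false;
        } catch (NotSerializableException e) {
            return true;
        }
    }
}
```

For example, `ThingBuilder<String> b = () -> "EDL933";` compiles because `String` is serializable. A builder typed on a class that does not implement `Serializable` would be rejected by the compiler.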
> > Also I like the idea of having Thing, not Object, as a kind of marker of > intention. To me it makes it clearer when reading code to avoid Object > wherever possible. Thing may not be any more clever than Object, but it > immediately declares an intention when reading code as to what kind of > Object should be expected. > > > 2008/10/21 Mark Schreiber >> >> Is there any need for Thing at all? Can't a bulder be typed to produce >> something that extends Object? >> >> If Thing provides no behaivour contract or meta-information then why >> does it exist? >> >> - Mark >> >> On Tue, Oct 21, 2008 at 4:49 PM, Andy Yates wrote: >> > Depends on what you want to program. If you want to have a collection of >> > objects which are Things & perform a common action on them then >> > annotations are not the way forward. >> > >> > If you want to have some kind of meta-programming occurring & need a >> > class to be multiple things then annotations are right. There is >> > currently no way to enforce compile time dependencies on annotations & >> > my thinking is that this is right. Annotations should be meta data or >> > provide a way to alter a class in a non-invasive way (think Web Service >> > annotations creating WS Servers & Clients without any alteration of the >> > class). >> > >> > Andy >> > >> > Richard Holland wrote: >> >> Spot on. >> >> >> >> Annotation/interface.... i think Annotation is probably better as you >> >> suggest, but I'd have to look into that. Not sure how it works with >> >> collections and generics. If it does turn out to be a better bet, I'll >> >> change it over. >> >> >> >> With the BioSQL dependencies, take a look at the pom.xml file inside >> >> the >> >> biojava-dna module. It declares a dependency on biojava-core. If you >> >> want to >> >> add dependencies to external JARs, take a look at biojava-biosql's >> >> pom.xml >> >> to see how it depends on javax.persistence. 
(The easiest way to add >> >> these is >> >> via an IDE such as NetBeans, which is what I'm using at the moment). >> >> >> >> cheers, >> >> Richard >> >> >> >> 2008/10/21 Mark Schreiber >> >> >> >>> So if I want to build a BioSQL loader from Genbank then would the >> >>> classes (or there wrappers) in the BioSQL Entity package need to >> >>> implement Thing? Would maven have an issue with that or would it just >> >>> create a dependency on core? (you can tell I've never used Maven >> >>> right). >> >>> >> >>> From a design point of view should Thing be an interface or an >> >>> Annotation? The reason I ask is that it doesn't define any methods so >> >>> it is more of a tag than an interface. >> >>> >> >>> Anyway, my understanding is that I would use a Genbank parser (or >> >>> write one). Write a EntityReceiver interface (probably more than one >> >>> given the number of entities in BioSQL, implement a EntityBuilder >> >>> (again possibly more than one) that implements EntityReceiver and >> >>> builds Entity beans from messages it receives. In this case I probably >> >>> wouldn't provide a writer as JPA would be writing the beans to the >> >>> database. Would this be how you imagine it? >> >>> >> >>> - Mark >> >>> >> >>> >> >>> On Tue, Oct 21, 2008 at 1:52 AM, Richard Holland >> >>> wrote: >> >>>> (From now on I will only be posting these development messages to >> >>>> biojava-dev, which is the intended purpose of that list. Those of you >> >>>> who >> >>>> wish to keep track of things but are currently only subscribed to >> >>> biojava-l >> >>>> should also subscribe to biojava-dev in order to keep up to date.) >> >>>> >> >>>> As promised, I've committed a new package in the biojava-core module >> >>>> that >> >>>> should help understand how to do file parsing and conversion and >> >>>> writing >> >>> in >> >>>> the new BJ3 modules. 
Here's an example of how to use it to write a >> >>> Genbank >> >>>> parser (note no parsers actually exist yet!): >> >>>> >> >>>> 1. Design yourself a Genbank class which implements the interface >> >>>> Thing >> >>> and >> >>>> can fully represent all the data that might possibly occur inside a >> >>> Genbank >> >>>> file. >> >>>> >> >>>> 2. Write an interface called GenbankReceiver, which extends >> >>>> ThingReceiver >> >>>> and defines all the methods you might need in order to construct a >> >>> Genbank >> >>>> object in an asynchronous fashion. >> >>>> >> >>>> 3. Write a GenbankBuilder class which implements GenbankReceiver and >> >>>> ThingBuilder. It's job is to receive data via method calls, use that >> >>>> data >> >>> to >> >>>> construct a Genbank object, then provide that object on demand. >> >>>> >> >>>> 4. Write a GenbankWriter class which implements GenbankReceiver and >> >>>> ThingWriter. It's job is similar to GenbankBuilder, but instead of >> >>>> constructing new Genbank objects, it writes Genbank records to file >> >>>> that >> >>>> reflect the data it receives. >> >>>> >> >>>> 5. Write a GenbankReader class which implements ThingReader. It can >> >>>> read >> >>>> GenbankFiles and output the data to the methods of the ThingReceiver >> >>>> provided to it, which in this case could be anything which implements >> >>>> the >> >>>> interface GenbankReceiver. >> >>>> >> >>>> 6. Write a GenbankEmitter class which implements ThingEmitter. It >> >>>> takes a >> >>>> Genbank object and will fire off data from it to the provided >> >>> ThingReceiver >> >>>> (a GenbankReceiver instance) as if the Genbank object was being read >> >>>> from >> >>> a >> >>>> file or some other source. >> >>>> >> >>>> That's it! OK so it's a minimum of 6 classes instead of the original >> >>>> 1 or >> >>> 2, >> >>>> but the additional steps are necessary for flexibility in converting >> >>> between >> >>>> formats. 
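Steps 4 and 6 are where the extra classes pay off. A writer consumes the same callbacks as a builder, and an emitter replays an in-memory object as if it were being read. The sketch below is minimal and uses hypothetical names and a toy two-field record rather than the real BJ3 interfaces.

```java
import java.io.PrintWriter;

// Hypothetical in-memory record (step 1).
class Genbank {
    final String locus;
    final String sequence;
    Genbank(String locus, String sequence) {
        this.locus = locus;
        this.sequence = sequence;
    }
}

// Hypothetical receiver interface (step 2), shared by builders and writers.
interface GenbankReceiver {
    void startRecord();
    void setLocus(String locus);
    void addSequence(String residues);
    void endRecord();
}

// Step 4: consumes the same events as a builder, but writes text.
class GenbankWriter implements GenbankReceiver {
    private final PrintWriter out;
    private final StringBuilder residues = new StringBuilder();

    GenbankWriter(PrintWriter out) { this.out = out; }

    public void startRecord() { residues.setLength(0); }
    public void setLocus(String locus) { out.println("LOCUS       " + locus); }
    public void addSequence(String chunk) { residues.append(chunk); }
    public void endRecord() {
        out.println("ORIGIN");
        // Simplified: a real writer would wrap residues into numbered 60-residue lines.
        out.println("        1 " + residues);
        out.println("//");
        out.flush();
    }
}

// Step 6: replays an existing object as if it were being read from a file.
class GenbankEmitter {
    private final Genbank record;
    GenbankEmitter(Genbank record) { this.record = record; }

    void emitTo(GenbankReceiver receiver) {
        receiver.startRecord();
        receiver.setLocus(record.locus);
        receiver.addSequence(record.sequence);
        receiver.endRecord();
    }
}
```

Passing the same emitter a builder instead of a writer would produce a copy of the object. That interchangeability is the flexibility the extra classes buy.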
>> >>>> >> >>>> Now to use it (you'll probably want a GenbankTools class to wrap >> >>>> these >> >>> steps >> >>>> up for user-friendliness, including various options for opening >> >>>> files, >> >>>> etc.): >> >>>> >> >>>> 1. To read a file - instantiate ThingParser with your GenbankReader >> >>>> as >> >>> the >> >>>> reader, and GenbankBuilder as the receiver. Use the iterator methods >> >>>> on >> >>>> ThingParser to get the objects out. >> >>>> >> >>>> 2. To write a file - instantiate ThingParser with a GenbankEmitter >> >>> wrapping >> >>>> your Genbank object, and a GenbankWriter as the receiver. Use the >> >>> parseAll() >> >>>> method on the ThingParser to dump the whole lot to your chosen >> >>>> output. >> >>>> >> >>>> The clever bit comes when you want to convert between files. Imagine >> >>> you've >> >>>> done all the above for Genbank, and you've also done it for FASTA. >> >>>> How to >> >>>> convert between them? What you need to do is this: >> >>>> >> >>>> 1. Implement all the classes for both Genbank and FASTA. >> >>>> >> >>>> 2. Write a GenbankFASTAConverter class that implements >> >>> ThingConverter >> >>>> and GenbankReceiver, and will internally convert the data received >> >>>> and >> >>> pass >> >>>> it on out to the receiver provided, which will be a FASTAReceiver >> >>> instance. >> >>>> 3. Write a FASTAGenbankConverter class that operates in exactly the >> >>> opposite >> >>>> way, implementing ThingConverter and FASTAReceiver. >> >>>> >> >>>> Then to convert you use ThingParser again: >> >>>> >> >>>> 1. From FASTA file to Genbank object: Instantiate ThingParser with a >> >>>> FASTAReader reader, a GenbankBuilder receiver, and add a >> >>>> FASTAGenbankConverter instance to the converter chain. Use the >> >>>> iterator >> >>> to >> >>>> get your Genbank objects out of your FASTA file. >> >>>> >> >>>> 2. 
From FASTA file to Genbank file: Same as option 1, but provide a >> >>>> GenbankWriter instead and use parseAll() instead of the iterator >> >>>> methos. >> >>>> >> >>>> 3. From FASTA object to Genbank object: Same as option 1, but provide >> >>>> a >> >>>> FASTAEmitter wrapping your FASTA object as the reader instead. >> >>>> >> >>>> 4. From FASTA object to Genbank file: Same as option 1, but swap both >> >>>> the >> >>>> reader and the receiver as per options 2 and 3. >> >>>> >> >>>> 5/6/7/8. From Genbank * to FASTA * - same as 1,2,3,4 but swap all >> >>> mentions >> >>>> of FASTA and Genbank, and use GenbankFASTAConverter instead. >> >>>> >> >>>> One last and very important feature of this approach is that if you >> >>> discover >> >>>> that nobody has written the appropriate converter for your chosen >> >>>> pair of >> >>>> formats A and C, but converters do exist to map A to some other >> >>>> format B >> >>> and >> >>>> that other format B on to C, then you can just put the two converts >> >>>> A-B >> >>> and >> >>>> B-C into the ThingParser chain and it'll work perfectly. >> >>>> >> >>>> Enjoy! 
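A converter is a receiver for one format that translates each callback into the other format's callbacks and forwards it. The sketch below shows a single Genbank-to-FASTA converter, using hypothetical `GenbankReceiver`/`FASTAReceiver` interfaces (not the committed BJ3 code) and a `FASTAWriter` that collects output lines. The Genbank locus name becomes the FASTA description. Upper-casing the residues is an illustrative choice, not something the thread specifies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical receiver interfaces for the two formats.
interface GenbankReceiver {
    void setLocus(String locus);
    void addSequence(String residues);
    void endRecord();
}

interface FASTAReceiver {
    void setDescription(String description);
    void addResidues(String residues);
    void endRecord();
}

// Hypothetical FASTA writer that collects output lines.
class FASTAWriter implements FASTAReceiver {
    final List<String> lines = new ArrayList<>();
    private final StringBuilder residues = new StringBuilder();

    public void setDescription(String description) { lines.add(">" + description); }
    public void addResidues(String chunk) { residues.append(chunk); }
    public void endRecord() {
        lines.add(residues.toString());
        residues.setLength(0);
    }
}

// Receives Genbank events and emits FASTA events to the next receiver.
class GenbankFASTAConverter implements GenbankReceiver {
    private final FASTAReceiver next;
    GenbankFASTAConverter(FASTAReceiver next) { this.next = next; }

    public void setLocus(String locus) { next.setDescription(locus); }
    // Illustrative conversion step: upper-case the residues.
    public void addSequence(String residues) { next.addResidues(residues.toUpperCase()); }
    public void endRecord() { next.endRecord(); }
}
```

A Genbank reader given this converter as its receiver would produce FASTA output without ever building a Genbank object.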
>> >>>> >> >>>> cheers, >> >>>> Richard >> >>>> >> >>>> -- >> >>>> Richard Holland, BSc MBCS >> >>>> Finance Director, Eagle Genomics Ltd >> >>>> M: +44 7500 438846 | E: holland at eaglegenomics.com >> >>>> http://www.eaglegenomics.com/ >> >>>> _______________________________________________ >> >>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >>>> >> >> >> >> >> >> >> > > > > > -- > Richard Holland, BSc MBCS > Finance Director, Eagle Genomics Ltd > M: +44 7500 438846 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > From andreas at sdsc.edu Tue Oct 21 11:31:40 2008 From: andreas at sdsc.edu (Andreas Prlic) Date: Tue, 21 Oct 2008 04:31:40 -0700 Subject: [Biojava-dev] BioJava3 contribution In-Reply-To: <086C2EC4-C9AD-4C00-B348-F7D781C0F3EC@orange.fr> References: <59a41c430810201218n194660e2udb17be18f8029779@mail.gmail.com> <086C2EC4-C9AD-4C00-B348-F7D781C0F3EC@orange.fr> Message-ID: <59a41c430810210431v2a9e1647w6a6fca991926f175@mail.gmail.com> Hi Fabrice, The biojava 1 features could only accept integer positions as start and stop. For protein structures, an amino acid is uniquely identified by a residue number and an insertion code. As such, in the biojava 1 world it was not possible to implement this for protein structures. If we have a cleaner interface definition for that in biojava 3, it should be no problem. Andreas On Mon, Oct 20, 2008 at 1:40 PM, Fabrice Jossinet wrote: > Hi Andreas, > yes of course, I really would like to work with you (I like your work with > SPICE). I wanted to contact you about this point before starting.
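Andreas's point about residue numbering can be made concrete. In a PDB entry, a residue is addressed by an integer sequence number plus an optional one-character insertion code (e.g. 52, 52A, 52B, 53), so a feature location built on plain `int` start/stop values cannot point at residue 52A. The sketch below is a minimal comparable position type. The class name `ResiduePosition` is hypothetical, and the ordering assumes the common convention that a blank insertion code sorts before any letter. Some numbering schemes deviate from that convention, so order in the file remains authoritative.

```java
// Hypothetical position type: residue number plus optional insertion code.
final class ResiduePosition implements Comparable<ResiduePosition> {
    final int number;
    final char insertionCode; // ' ' when the residue has no insertion code

    ResiduePosition(int number, char insertionCode) {
        this.number = number;
        this.insertionCode = insertionCode;
    }

    // Parses labels such as "52" or "52A".
    static ResiduePosition parse(String label) {
        char last = label.charAt(label.length() - 1);
        if (Character.isLetter(last)) {
            return new ResiduePosition(
                Integer.parseInt(label.substring(0, label.length() - 1)), last);
        }
        return new ResiduePosition(Integer.parseInt(label), ' ');
    }

    @Override
    public int compareTo(ResiduePosition other) {
        int byNumber = Integer.compare(number, other.number);
        // ' ' (0x20) sorts before 'A', so 52 comes before 52A.
        return byNumber != 0 ? byNumber : Character.compare(insertionCode, other.insertionCode);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ResiduePosition)) return false;
        ResiduePosition p = (ResiduePosition) o;
        return number == p.number && insertionCode == p.insertionCode;
    }

    @Override
    public int hashCode() { return 31 * number + insertionCode; }

    @Override
    public String toString() {
        return insertionCode == ' ' ? Integer.toString(number) : number + "" + insertionCode;
    }
}
```

A feature interface parameterised on a position type like this, instead of on `int`, would cover both sequence and structure coordinates.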
Concerning > the tertiary structure representation, I need to annotate an RNA tertiary > structure with base-pair families (as described in > http://www.ncbi.nlm.nih.gov/pubmed/12177293 or in > http://prion.bchs.uh.edu/bp_type/ ) and structural motifs (like those listed > in the SCOR database > http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=308814). The idea > is to attach these features to a 3D structure in the same way as the features > attached to a sequence (1D). > What do you think? > Fabrice > On 20 Oct 08, at 21:18, Andreas Prlic wrote: > > Hi Fabrice, > > Regarding the tertiary structure representation we should work > together. There is a set of tools available already in the current > biojava 1.7 which I was intending to maintain and migrate to biojava v > 3. Let me know if you have specific RNA-related requests... > > Andreas > > On Mon, Oct 20, 2008 at 6:51 AM, Richard Holland > wrote: > > Excellent! Thanks for your offer of help! > > Yes, an advanced RNA module would be very helpful indeed. You should > > probably call it 'rna'. > > As long as everyone who intends to work on BJ3 declares their intentions > > here, as you just have, then basically it's first come, first served. I won't > > be doing any official supervision other than keeping an eye on committed > > code once in a while to make sure it all looks OK. So feel free to start > > coding straight away! > > All new modules should probably start by: > > 1. copying the existing dna module to something new, like 'rna' in this > > case. > > 2. remove all the hidden .svn directories from the copy, > > 3. update the pom.xml in the copy (do a search-and-replace on dna and change > > to the new name, rna in this case), delete the existing source packages in > > src/main/java (org.biojava.dna) and create suitable new ones > > (org.biojava.rna in this case). > > 4. empty out the target/ folder then svn add the new module > > 5. svn:ignore the target/ directory in your new module, > > 6.
include your new module in the list at the end of the pom.xml in the root > > directory of the biojava3 branch. > > cheers, > > Richard > > > > 2008/10/20 Fabrice Jossinet > > Dear Richard, > > I'm answering to your "official call", to propose you my help for the > > development of the biojava3 code. With the modularity of Maven, I also would > > like to proposes you my help for the development of a module that will use > > the biojava3 code to manage more specialized RNA stuff (secondary and > > tertiary structures, base-pairs classifications, modified nucleotides, RNA > > alignments,....). > > What will be the next step for me? Will you make a selection? > > Best Regards > > Fabrice Jossinet > > -- > > Dr. Fabrice Jossinet > > Laboratoire de Bioinformatique, modelisation et simulation des acides > > nucleiques > > Universite Louis Pasteur > > Institut de biologie moleculaire et cellulaire du CNRS > > UPR9002, Architecture et Reactivite de l'ARN > > 15 rue Rene Descartes > > F-67084 Strasbourg Cedex > > France > > Tel + 33 (0) 3 88 417053 > > FAX + 33 (0) 3 88 60 22 18 > > f.jossinet at ibmc.u-strasbg.fr > > fjossinet at gmail.com > > http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html > > http://fjossinet.u-strasbg.fr/ > > > > > > > > -- > > Richard Holland, BSc MBCS > > Finance Director, Eagle Genomics Ltd > > M: +44 7500 438846 | E: holland at eaglegenomics.com > > http://www.eaglegenomics.com/ > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > > > -- > Dr. 
Fabrice Jossinet > Laboratoire de Bioinformatique, modelisation et simulation des acides > nucleiques > Universite Louis Pasteur > Institut de biologie moleculaire et cellulaire du CNRS > UPR9002, Architecture et Reactivite de l'ARN > 15 rue Rene Descartes > F-67084 Strasbourg Cedex > France > Tel + 33 (0) 3 88 417053 > FAX + 33 (0) 3 88 60 22 18 > f.jossinet at ibmc.u-strasbg.fr > fjossinet at gmail.com > http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html > http://fjossinet.u-strasbg.fr/ > > > From holland at eaglegenomics.com Tue Oct 21 11:39:44 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 21 Oct 2008 12:39:44 +0100 Subject: [Biojava-dev] [Biojava-l] File parsing in BJ3 In-Reply-To: <93b45ca50810210424g5a9288f0w803e6d5ca4b840d3@mail.gmail.com> References: <93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com> <48FD97AB.70503@ebi.ac.uk> <93b45ca50810210335j5ef4a206y545e5a1869cedc03@mail.gmail.com> <93b45ca50810210424g5a9288f0w803e6d5ca4b840d3@mail.gmail.com> Message-ID: The two examples I gave would be better as annotations, it's true. Serializable, and Cloneable for that matter, would definitely work better that way. Well, we could do away with Thing altogether then. I'll update the code. 2008/10/21 Mark Schreiber > Depending on what you want them for isMachineGenerated(), > isManuallyCurated(), would possibly be better as annotations > (@MachineGenerated, @ManuallyCurated). This is true metadata. > > Probably if Java had annotations in version 1.1 Serializable would > also be an Annotation. I would agree with the idea that ThingBuilder > etc should be typed on extends Serializable. > > - Mark > > On Tue, Oct 21, 2008 at 7:14 PM, Richard Holland > wrote: > > For now, yes it's empty. But I can envisage situations where it might be > > nice to have Thing implement some common methods (e.g. > isMachineGenerated(), > > isManuallyCurated(), etc.).
I'd rather have it there now to be a > placeholder > > for future expansion, than have to re-engineer everything should we > identify > > a need for common functions in future. > > > > You'll see that Thing already extends Serializable, implying that all > Things > > must be able to persist to an object backing store. Serializable itself > is > > also an empty interface! > > > > Also I like the idea of having Thing, not Object, as a kind of marker of > > intention. To me it makes it clearer when reading code to avoid Object > > wherever possible. Thing may not be any more clever than Object, but it > > immediately declares an intention when reading code as to what kind of > > Object should be expected. > > > > > > 2008/10/21 Mark Schreiber > >> > >> Is there any need for Thing at all? Can't a bulder be typed to produce > >> something that extends Object? > >> > >> If Thing provides no behaivour contract or meta-information then why > >> does it exist? > >> > >> - Mark > >> > >> On Tue, Oct 21, 2008 at 4:49 PM, Andy Yates wrote: > >> > Depends on what you want to program. If you want to have a collection > of > >> > objects which are Things & perform a common action on them then > >> > annotations are not the way forward. > >> > > >> > If you want to have some kind of meta-programming occurring & need a > >> > class to be multiple things then annotations are right. There is > >> > currently no way to enforce compile time dependencies on annotations & > >> > my thinking is that this is right. Annotations should be meta data or > >> > provide a way to alter a class in a non-invasive way (think Web > Service > >> > annotations creating WS Servers & Clients without any alteration of > the > >> > class). > >> > > >> > Andy > >> > > >> > Richard Holland wrote: > >> >> Spot on. > >> >> > >> >> Annotation/interface.... i think Annotation is probably better as you > >> >> suggest, but I'd have to look into that. Not sure how it works with > >> >> collections and generics. 
If it does turn out to be a better bet, > I'll > >> >> change it over. > >> >> > >> >> With the BioSQL dependencies, take a look at the pom.xml file inside > >> >> the > >> >> biojava-dna module. It declares a dependency on biojava-core. If you > >> >> want to > >> >> add dependencies to external JARs, take a look at biojava-biosql's > >> >> pom.xml > >> >> to see how it depends on javax.persistence. (The easiest way to add > >> >> these is > >> >> via an IDE such as NetBeans, which is what I'm using at the moment). > >> >> > >> >> cheers, > >> >> Richard > >> >> > >> >> 2008/10/21 Mark Schreiber > >> >> > >> >>> So if I want to build a BioSQL loader from Genbank then would the > >> >>> classes (or there wrappers) in the BioSQL Entity package need to > >> >>> implement Thing? Would maven have an issue with that or would it > just > >> >>> create a dependency on core? (you can tell I've never used Maven > >> >>> right). > >> >>> > >> >>> From a design point of view should Thing be an interface or an > >> >>> Annotation? The reason I ask is that it doesn't define any methods > so > >> >>> it is more of a tag than an interface. > >> >>> > >> >>> Anyway, my understanding is that I would use a Genbank parser (or > >> >>> write one). Write a EntityReceiver interface (probably more than one > >> >>> given the number of entities in BioSQL, implement a EntityBuilder > >> >>> (again possibly more than one) that implements EntityReceiver and > >> >>> builds Entity beans from messages it receives. In this case I > probably > >> >>> wouldn't provide a writer as JPA would be writing the beans to the > >> >>> database. Would this be how you imagine it? > >> >>> > >> >>> - Mark > >> >>> > >> >>> > >> >>> On Tue, Oct 21, 2008 at 1:52 AM, Richard Holland > >> >>> wrote: > >> >>>> (From now on I will only be posting these development messages to > >> >>>> biojava-dev, which is the intended purpose of that list. 
Those of > you > >> >>>> who > >> >>>> wish to keep track of things but are currently only subscribed to > >> >>> biojava-l > >> >>>> should also subscribe to biojava-dev in order to keep up to date.) > >> >>>> > >> >>>> As promised, I've committed a new package in the biojava-core > module > >> >>>> that > >> >>>> should help understand how to do file parsing and conversion and > >> >>>> writing > >> >>> in > >> >>>> the new BJ3 modules. Here's an example of how to use it to write a > >> >>> Genbank > >> >>>> parser (note no parsers actually exist yet!): > >> >>>> > >> >>>> 1. Design yourself a Genbank class which implements the interface > >> >>>> Thing > >> >>> and > >> >>>> can fully represent all the data that might possibly occur inside a > >> >>> Genbank > >> >>>> file. > >> >>>> > >> >>>> 2. Write an interface called GenbankReceiver, which extends > >> >>>> ThingReceiver > >> >>>> and defines all the methods you might need in order to construct a > >> >>> Genbank > >> >>>> object in an asynchronous fashion. > >> >>>> > >> >>>> 3. Write a GenbankBuilder class which implements GenbankReceiver > and > >> >>>> ThingBuilder. It's job is to receive data via method calls, use > that > >> >>>> data > >> >>> to > >> >>>> construct a Genbank object, then provide that object on demand. > >> >>>> > >> >>>> 4. Write a GenbankWriter class which implements GenbankReceiver and > >> >>>> ThingWriter. It's job is similar to GenbankBuilder, but instead of > >> >>>> constructing new Genbank objects, it writes Genbank records to file > >> >>>> that > >> >>>> reflect the data it receives. > >> >>>> > >> >>>> 5. Write a GenbankReader class which implements ThingReader. It can > >> >>>> read > >> >>>> GenbankFiles and output the data to the methods of the > ThingReceiver > >> >>>> provided to it, which in this case could be anything which > implements > >> >>>> the > >> >>>> interface GenbankReceiver. > >> >>>> > >> >>>> 6. 
Write a GenbankEmitter class which implements ThingEmitter. It > >> >>>> takes a > >> >>>> Genbank object and will fire off data from it to the provided > >> >>> ThingReceiver > >> >>>> (a GenbankReceiver instance) as if the Genbank object was being > read > >> >>>> from > >> >>> a > >> >>>> file or some other source. > >> >>>> > >> >>>> That's it! OK so it's a minimum of 6 classes instead of the > original > >> >>>> 1 or > >> >>> 2, > >> >>>> but the additional steps are necessary for flexibility in > converting > >> >>> between > >> >>>> formats. > >> >>>> > >> >>>> Now to use it (you'll probably want a GenbankTools class to wrap > >> >>>> these > >> >>> steps > >> >>>> up for user-friendliness, including various options for opening > >> >>>> files, > >> >>>> etc.): > >> >>>> > >> >>>> 1. To read a file - instantiate ThingParser with your GenbankReader > >> >>>> as > >> >>> the > >> >>>> reader, and GenbankBuilder as the receiver. Use the iterator > methods > >> >>>> on > >> >>>> ThingParser to get the objects out. > >> >>>> > >> >>>> 2. To write a file - instantiate ThingParser with a GenbankEmitter > >> >>> wrapping > >> >>>> your Genbank object, and a GenbankWriter as the receiver. Use the > >> >>> parseAll() > >> >>>> method on the ThingParser to dump the whole lot to your chosen > >> >>>> output. > >> >>>> > >> >>>> The clever bit comes when you want to convert between files. > Imagine > >> >>> you've > >> >>>> done all the above for Genbank, and you've also done it for FASTA. > >> >>>> How to > >> >>>> convert between them? What you need to do is this: > >> >>>> > >> >>>> 1. Implement all the classes for both Genbank and FASTA. > >> >>>> > >> >>>> 2. Write a GenbankFASTAConverter class that implements > >> >>> ThingConverter > >> >>>> and GenbankReceiver, and will internally convert the data received > >> >>>> and > >> >>> pass > >> >>>> it on out to the receiver provided, which will be a FASTAReceiver > >> >>> instance. > >> >>>> 3. 
Write a FASTAGenbankConverter class that operates in exactly the > >> >>> opposite > >> >>>> way, implementing ThingConverter and FASTAReceiver. > >> >>>> > >> >>>> Then to convert you use ThingParser again: > >> >>>> > >> >>>> 1. From FASTA file to Genbank object: Instantiate ThingParser with > a > >> >>>> FASTAReader reader, a GenbankBuilder receiver, and add a > >> >>>> FASTAGenbankConverter instance to the converter chain. Use the > >> >>>> iterator > >> >>> to > >> >>>> get your Genbank objects out of your FASTA file. > >> >>>> > >> >>>> 2. From FASTA file to Genbank file: Same as option 1, but provide a > >> >>>> GenbankWriter instead and use parseAll() instead of the iterator > >> >>>> methos. > >> >>>> > >> >>>> 3. From FASTA object to Genbank object: Same as option 1, but > provide > >> >>>> a > >> >>>> FASTAEmitter wrapping your FASTA object as the reader instead. > >> >>>> > >> >>>> 4. From FASTA object to Genbank file: Same as option 1, but swap > both > >> >>>> the > >> >>>> reader and the receiver as per options 2 and 3. > >> >>>> > >> >>>> 5/6/7/8. From Genbank * to FASTA * - same as 1,2,3,4 but swap all > >> >>> mentions > >> >>>> of FASTA and Genbank, and use GenbankFASTAConverter instead. > >> >>>> > >> >>>> One last and very important feature of this approach is that if you > >> >>> discover > >> >>>> that nobody has written the appropriate converter for your chosen > >> >>>> pair of > >> >>>> formats A and C, but converters do exist to map A to some other > >> >>>> format B > >> >>> and > >> >>>> that other format B on to C, then you can just put the two converts > >> >>>> A-B > >> >>> and > >> >>>> B-C into the ThingParser chain and it'll work perfectly. > >> >>>> > >> >>>> Enjoy! 
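Chaining works because each converter is itself a receiver for its input format and forwards to a receiver of its output format. A-to-B and B-to-C therefore compose by simple nesting. The sketch below uses three hypothetical toy formats: Genbank-like, FASTA-like, and a bare sequence format. How the real ThingParser manages its converter chain is not shown in this thread, so the explicit nesting in `ChainDemo` is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Format A: Genbank-like events (hypothetical).
interface GenbankReceiver { void locus(String name); void sequence(String residues); }
// Format B: FASTA-like events (hypothetical).
interface FASTAReceiver { void description(String text); void residues(String residues); }
// Format C: bare sequence events (hypothetical).
interface RawReceiver { void residues(String residues); }

// Converter A -> B.
class GenbankFASTAConverter implements GenbankReceiver {
    private final FASTAReceiver next;
    GenbankFASTAConverter(FASTAReceiver next) { this.next = next; }
    public void locus(String name) { next.description(name); }
    public void sequence(String residues) { next.residues(residues); }
}

// Converter B -> C: format C has no description, so it is dropped.
class FASTARawConverter implements FASTAReceiver {
    private final RawReceiver next;
    FASTARawConverter(RawReceiver next) { this.next = next; }
    public void description(String text) { /* no equivalent in format C */ }
    public void residues(String residues) { next.residues(residues); }
}

// Terminal receiver for format C.
class RawCollector implements RawReceiver {
    final List<String> chunks = new ArrayList<>();
    public void residues(String residues) { chunks.add(residues); }
}

class ChainDemo {
    // A -> B -> C by nesting: no direct Genbank-to-raw converter is needed.
    static GenbankReceiver genbankToRaw(RawReceiver sink) {
        return new GenbankFASTAConverter(new FASTARawConverter(sink));
    }
}
```

Keep in mind that a chain can only carry what the intermediate format B can represent. Anything B has no field for never reaches C.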
> >> >>>> > >> >>>> cheers, > >> >>>> Richard > >> >>>> > >> >>>> -- > >> >>>> Richard Holland, BSc MBCS > >> >>>> Finance Director, Eagle Genomics Ltd > >> >>>> M: +44 7500 438846 | E: holland at eaglegenomics.com > >> >>>> http://www.eaglegenomics.com/ > >> >>>> _______________________________________________ > >> >>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org > >> >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l > >> >>>> > >> >> > >> >> > >> >> > >> > > > > > > > > > -- > > Richard Holland, BSc MBCS > > Finance Director, Eagle Genomics Ltd > > M: +44 7500 438846 | E: holland at eaglegenomics.com > > http://www.eaglegenomics.com/ > > > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From ayates at ebi.ac.uk Tue Oct 21 14:32:45 2008 From: ayates at ebi.ac.uk (Andy Yates) Date: Tue, 21 Oct 2008 15:32:45 +0100 Subject: [Biojava-dev] [Biojava-l] File parsing in BJ3 In-Reply-To: References: <93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com> <48FD97AB.70503@ebi.ac.uk> <93b45ca50810210335j5ef4a206y545e5a1869cedc03@mail.gmail.com> <93b45ca50810210424g5a9288f0w803e6d5ca4b840d3@mail.gmail.com> Message-ID: <48FDE80D.1040106@ebi.ac.uk> If "Thing" has gone then what impact does this have on remaining classes? Considering methods like canReadNextThing() & readNextThing(); should this be canReadNext() & readNext()? Just an idle thought .... Andy Richard Holland wrote: > The two examples I gave would be better as annotations, it's true. > Serializable, and Cloneable for that matter, would definitely work better > that way. > > Well, we could do away with Thing altogether then. I'll update the code. > > > 2008/10/21 Mark Schreiber > >> Depending on what you want them for isMachineGenerated(), >> isManuallyCurated(), would possibly be better as annotations >> (@MachineGenerated, @ManuallyCurated). This is true metadata.
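Mark's suggestion of `@MachineGenerated` and `@ManuallyCurated` annotations, combined with typing builders on `Serializable` rather than on a `Thing` marker interface, might look roughly like the sketch below. The two annotation names come from the thread. The `Builder` interface (a stand-in for `ThingBuilder`, not its committed signature) and the two record classes are hypothetical. The sketch also illustrates Andy's point: the collection is typed by an interface, while the annotations are visible only reflectively at run time and are never checked by the compiler.

```java
import java.io.Serializable;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

public class MarkerSketch {

    /** Metadata markers as suggested in the thread; RUNTIME retention makes them visible via reflection. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface MachineGenerated {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface ManuallyCurated {}

    /** Hypothetical stand-in for ThingBuilder, typed on Serializable instead of Thing. */
    interface Builder<T extends Serializable> {
        T build();
    }

    @ManuallyCurated
    static final class CuratedRecord implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;
        CuratedRecord(String id) { this.id = id; }
    }

    @MachineGenerated
    static final class PredictedRecord implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;
        PredictedRecord(String id) { this.id = id; }
    }

    /** The annotation is only checked here, at run time; the compiler never enforces it. */
    static boolean isManuallyCurated(Object o) {
        return o.getClass().isAnnotationPresent(ManuallyCurated.class);
    }

    public static void main(String[] args) {
        Builder<CuratedRecord> builder = () -> new CuratedRecord("rec1");
        // The common type for collections is the interface (Serializable), not an annotation.
        List<Serializable> things = new ArrayList<>();
        things.add(builder.build());
        things.add(new PredictedRecord("pred1"));
        for (Serializable s : things) {
            System.out.println(s.getClass().getSimpleName() + " curated=" + isManuallyCurated(s));
        }
    }
}
```

Running `main` prints `CuratedRecord curated=true` followed by `PredictedRecord curated=false`. Grouping objects relies on the interface type, while the annotations carry per-class metadata, which matches the division of labour discussed in the thread.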
>> >> Probably if Java had annotations in version 1.1 Serializable would >> also be an Annotation. I would agree with the idea that ThingBuilder >> etc should be typed on extends Serializable. >> >> - Mark >> >> On Tue, Oct 21, 2008 at 7:14 PM, Richard Holland >> wrote: >>> For now, yes it's empty. But I can envisage situations where it might be >>> nice to have Thing implement some common methods (e.g. >> isMachineGenerated(), >>> isManuallyCurated(), etc.). I'd rather have it there now to be a >> placeholder >>> for future expansion, than have to re-engineer everything should we >> identify >>> a need for common functions in future. >>> >>> You'll see that Thing already extends Serializable, implying that all >> Things >>> must be able to persist to an object backing store. Serializable itself >> is >>> also an empty interface! >>> >>> Also I like the idea of having Thing, not Object, as a kind of marker of >>> intention. To me it makes it clearer when reading code to avoid Object >>> wherever possible. Thing may not be any more clever than Object, but it >>> immediately declares an intention when reading code as to what kind of >>> Object should be expected. >>> >>> >>> 2008/10/21 Mark Schreiber >>>> Is there any need for Thing at all? Can't a builder be typed to produce >>>> something that extends Object? >>>> >>>> If Thing provides no behaviour contract or meta-information then why >>>> does it exist? >>>> >>>> - Mark >>>> >>>> On Tue, Oct 21, 2008 at 4:49 PM, Andy Yates wrote: >>>>> Depends on what you want to program. If you want to have a collection >> of >>>>> objects which are Things & perform a common action on them then >>>>> annotations are not the way forward. >>>>> >>>>> If you want to have some kind of meta-programming occurring & need a >>>>> class to be multiple things then annotations are right. There is >>>>> currently no way to enforce compile time dependencies on annotations & >>>>> my thinking is that this is right.
Annotations should be meta data or >>>>> provide a way to alter a class in a non-invasive way (think Web >> Service >>>>> annotations creating WS Servers & Clients without any alteration of >> the >>>>> class). >>>>> >>>>> Andy >>>>> >>>>> Richard Holland wrote: >>>>>> Spot on. >>>>>> >>>>>> Annotation/interface.... I think Annotation is probably better as you >>>>>> suggest, but I'd have to look into that. Not sure how it works with >>>>>> collections and generics. If it does turn out to be a better bet, >> I'll >>>>>> change it over. >>>>>> >>>>>> With the BioSQL dependencies, take a look at the pom.xml file inside >>>>>> the >>>>>> biojava-dna module. It declares a dependency on biojava-core. If you >>>>>> want to >>>>>> add dependencies to external JARs, take a look at biojava-biosql's >>>>>> pom.xml >>>>>> to see how it depends on javax.persistence. (The easiest way to add >>>>>> these is >>>>>> via an IDE such as NetBeans, which is what I'm using at the moment). >>>>>> >>>>>> cheers, >>>>>> Richard >>>>>> >>>>>> 2008/10/21 Mark Schreiber >>>>>> >>>>>>> So if I want to build a BioSQL loader from Genbank then would the >>>>>>> classes (or their wrappers) in the BioSQL Entity package need to >>>>>>> implement Thing? Would maven have an issue with that or would it >> just >>>>>>> create a dependency on core? (you can tell I've never used Maven >>>>>>> right). >>>>>>> >>>>>>> From a design point of view should Thing be an interface or an >>>>>>> Annotation? The reason I ask is that it doesn't define any methods >> so >>>>>>> it is more of a tag than an interface. >>>>>>> >>>>>>> Anyway, my understanding is that I would use a Genbank parser (or >>>>>>> write one). Write an EntityReceiver interface (probably more than one >>>>>>> given the number of entities in BioSQL), implement an EntityBuilder >>>>>>> (again possibly more than one) that implements EntityReceiver and >>>>>>> builds Entity beans from messages it receives.
In this case I >> probably >>>>>>> wouldn't provide a writer as JPA would be writing the beans to the >>>>>>> database. Would this be how you imagine it? >>>>>>> >>>>>>> - Mark >>>>>>> >>>>>>> >>>>>>> On Tue, Oct 21, 2008 at 1:52 AM, Richard Holland >>>>>>> wrote: >>>>>>>> (From now on I will only be posting these development messages to >>>>>>>> biojava-dev, which is the intended purpose of that list. Those of >> you >>>>>>>> who >>>>>>>> wish to keep track of things but are currently only subscribed to >>>>>>> biojava-l >>>>>>>> should also subscribe to biojava-dev in order to keep up to date.) >>>>>>>> >>>>>>>> As promised, I've committed a new package in the biojava-core >> module >>>>>>>> that >>>>>>>> should help you understand how to do file parsing and conversion and >>>>>>>> writing >>>>>>> in >>>>>>>> the new BJ3 modules. Here's an example of how to use it to write a >>>>>>> Genbank >>>>>>>> parser (note no parsers actually exist yet!): >>>>>>>> >>>>>>>> 1. Design yourself a Genbank class which implements the interface >>>>>>>> Thing >>>>>>> and >>>>>>>> can fully represent all the data that might possibly occur inside a >>>>>>> Genbank >>>>>>>> file. >>>>>>>> >>>>>>>> 2. Write an interface called GenbankReceiver, which extends >>>>>>>> ThingReceiver >>>>>>>> and defines all the methods you might need in order to construct a >>>>>>> Genbank >>>>>>>> object in an asynchronous fashion. >>>>>>>> >>>>>>>> 3. Write a GenbankBuilder class which implements GenbankReceiver >> and >>>>>>>> ThingBuilder. Its job is to receive data via method calls, use >> that >>>>>>>> data >>>>>>> to >>>>>>>> construct a Genbank object, then provide that object on demand. >>>>>>>> >>>>>>>> 4. Write a GenbankWriter class which implements GenbankReceiver and >>>>>>>> ThingWriter. Its job is similar to GenbankBuilder, but instead of >>>>>>>> constructing new Genbank objects, it writes Genbank records to file >>>>>>>> that >>>>>>>> reflect the data it receives. >>>>>>>> >>>>>>>> 5.
Write a GenbankReader class which implements ThingReader. It can >>>>>>>> read >>>>>>>> GenbankFiles and output the data to the methods of the >> ThingReceiver >>>>>>>> provided to it, which in this case could be anything which >> implements >>>>>>>> the >>>>>>>> interface GenbankReceiver. >>>>>>>> >>>>>>>> 6. Write a GenbankEmitter class which implements ThingEmitter. It >>>>>>>> takes a >>>>>>>> Genbank object and will fire off data from it to the provided >>>>>>> ThingReceiver >>>>>>>> (a GenbankReceiver instance) as if the Genbank object was being >> read >>>>>>>> from >>>>>>> a >>>>>>>> file or some other source. >>>>>>>> >>>>>>>> That's it! OK so it's a minimum of 6 classes instead of the >> original >>>>>>>> 1 or >>>>>>> 2, >>>>>>>> but the additional steps are necessary for flexibility in >> converting >>>>>>> between >>>>>>>> formats. >>>>>>>> >>>>>>>> Now to use it (you'll probably want a GenbankTools class to wrap >>>>>>>> these >>>>>>> steps >>>>>>>> up for user-friendliness, including various options for opening >>>>>>>> files, >>>>>>>> etc.): >>>>>>>> >>>>>>>> 1. To read a file - instantiate ThingParser with your GenbankReader >>>>>>>> as >>>>>>> the >>>>>>>> reader, and GenbankBuilder as the receiver. Use the iterator >> methods >>>>>>>> on >>>>>>>> ThingParser to get the objects out. >>>>>>>> >>>>>>>> 2. To write a file - instantiate ThingParser with a GenbankEmitter >>>>>>> wrapping >>>>>>>> your Genbank object, and a GenbankWriter as the receiver. Use the >>>>>>> parseAll() >>>>>>>> method on the ThingParser to dump the whole lot to your chosen >>>>>>>> output. >>>>>>>> >>>>>>>> The clever bit comes when you want to convert between files. >> Imagine >>>>>>> you've >>>>>>>> done all the above for Genbank, and you've also done it for FASTA. >>>>>>>> How to >>>>>>>> convert between them? What you need to do is this: >>>>>>>> >>>>>>>> 1. Implement all the classes for both Genbank and FASTA. >>>>>>>> >>>>>>>> 2. 
Write a GenbankFASTAConverter class that implements >>>>>>> ThingConverter >>>>>>>> and GenbankReceiver, and will internally convert the data received >>>>>>>> and >>>>>>> pass >>>>>>>> it on out to the receiver provided, which will be a FASTAReceiver >>>>>>> instance. >>>>>>>> 3. Write a FASTAGenbankConverter class that operates in exactly the >>>>>>> opposite >>>>>>>> way, implementing ThingConverter and FASTAReceiver. >>>>>>>> >>>>>>>> Then to convert you use ThingParser again: >>>>>>>> >>>>>>>> 1. From FASTA file to Genbank object: Instantiate ThingParser with >> a >>>>>>>> FASTAReader reader, a GenbankBuilder receiver, and add a >>>>>>>> FASTAGenbankConverter instance to the converter chain. Use the >>>>>>>> iterator >>>>>>> to >>>>>>>> get your Genbank objects out of your FASTA file. >>>>>>>> >>>>>>>> 2. From FASTA file to Genbank file: Same as option 1, but provide a >>>>>>>> GenbankWriter instead and use parseAll() instead of the iterator >>>>>>>> methods. >>>>>>>> >>>>>>>> 3. From FASTA object to Genbank object: Same as option 1, but >> provide >>>>>>>> a >>>>>>>> FASTAEmitter wrapping your FASTA object as the reader instead. >>>>>>>> >>>>>>>> 4. From FASTA object to Genbank file: Same as option 1, but swap >> both >>>>>>>> the >>>>>>>> reader and the receiver as per options 2 and 3. >>>>>>>> >>>>>>>> 5/6/7/8. From Genbank * to FASTA * - same as 1,2,3,4 but swap all >>>>>>> mentions >>>>>>>> of FASTA and Genbank, and use GenbankFASTAConverter instead. >>>>>>>> >>>>>>>> One last and very important feature of this approach is that if you >>>>>>> discover >>>>>>>> that nobody has written the appropriate converter for your chosen >>>>>>>> pair of >>>>>>>> formats A and C, but converters do exist to map A to some other >>>>>>>> format B >>>>>>> and >>>>>>>> that other format B on to C, then you can just put the two converters >>>>>>>> A-B >>>>>>> and >>>>>>>> B-C into the ThingParser chain and it'll work perfectly. >>>>>>>> >>>>>>>> Enjoy!
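The converter-chaining idea in the last paragraph works because each converter is itself a receiver for its source format that forwards translated calls to a receiver for its target format. Below is a minimal sketch with three toy formats. The receiver interfaces, the tab-delimited third format and the `readFasta` helper are all hypothetical. The B-to-C step simply drops `definition`, to show that a converter may discard information the target format cannot hold.

```java
import java.util.ArrayList;
import java.util.List;

public class ConverterChainSketch {

    /** Receiver for a FASTA-like event stream (format A, hypothetical). */
    interface FastaReceiver {
        void header(String description);
        void residues(String seq);
        void end();
    }

    /** Receiver for a heavily simplified Genbank-like event stream (format B, hypothetical). */
    interface GenbankReceiver {
        void locus(String name);
        void definition(String text);
        void origin(String seq);
        void end();
    }

    /** Receiver for a tab-delimited format (format C, hypothetical). */
    interface TabReceiver {
        void row(String id, String seq);
    }

    /** A-B converter: accepts FASTA events and re-fires them as Genbank events. */
    static final class FastaGenbankConverter implements FastaReceiver {
        private final GenbankReceiver next;
        private final StringBuilder seq = new StringBuilder();
        FastaGenbankConverter(GenbankReceiver next) { this.next = next; }
        public void header(String description) {
            String[] parts = description.split(" ", 2); // first word becomes the locus name
            next.locus(parts[0]);
            next.definition(parts.length > 1 ? parts[1] : "");
            seq.setLength(0);
        }
        public void residues(String s) { seq.append(s); }
        public void end() { next.origin(seq.toString()); next.end(); }
    }

    /** B-C converter: accepts Genbank events and re-fires them as tab rows. */
    static final class GenbankTabConverter implements GenbankReceiver {
        private final TabReceiver next;
        private String name;
        GenbankTabConverter(TabReceiver next) { this.next = next; }
        public void locus(String n) { name = n; }
        public void definition(String text) { /* format C has no column for this; dropped */ }
        public void origin(String seq) { next.row(name, seq); }
        public void end() { }
    }

    /** Final receiver collecting tab-delimited lines (stands in for a writer). */
    static final class TabWriter implements TabReceiver {
        final List<String> lines = new ArrayList<>();
        public void row(String id, String seq) { lines.add(id + "\t" + seq); }
    }

    /** Toy FASTA reader: fires header/residues/end events for each record. */
    static void readFasta(String text, FastaReceiver r) {
        boolean open = false;
        for (String line : text.split("\n")) {
            if (line.startsWith(">")) {
                if (open) r.end();
                r.header(line.substring(1).trim());
                open = true;
            } else if (!line.trim().isEmpty()) {
                r.residues(line.trim());
            }
        }
        if (open) r.end();
    }

    public static void main(String[] args) {
        TabWriter writer = new TabWriter();
        // Chain A-B and B-C: no direct FASTA-to-tab converter is needed.
        FastaReceiver chain = new FastaGenbankConverter(new GenbankTabConverter(writer));
        readFasta(">seq1 first example\nACGT\nTTGA\n>seq2\nGGCC\n", chain);
        writer.lines.forEach(System.out::println);
    }
}
```

Running `main` prints `seq1` and `ACGTTTGA` separated by a tab, then `seq2` and `GGCC`. Wrapping one converter in the other gives the A-B-C path without anyone having written an A-C converter.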
>>>>>>>> >>>>>>>> cheers, >>>>>>>> Richard >>>>>>>> >>>>>>>> -- >>>>>>>> Richard Holland, BSc MBCS >>>>>>>> Finance Director, Eagle Genomics Ltd >>>>>>>> M: +44 7500 438846 | E: holland at eaglegenomics.com >>>>>>>> http://www.eaglegenomics.com/ >>>>>>>> _______________________________________________ >>>>>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>>>>>>> >>>>>> >>>>>> >>> >>> >>> -- >>> Richard Holland, BSc MBCS >>> Finance Director, Eagle Genomics Ltd >>> M: +44 7500 438846 | E: holland at eaglegenomics.com >>> http://www.eaglegenomics.com/ >>> > > > From holland at eaglegenomics.com Tue Oct 21 16:13:37 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 21 Oct 2008 17:13:37 +0100 Subject: [Biojava-dev] [Biojava-l] File parsing in BJ3 In-Reply-To: <48FDE80D.1040106@ebi.ac.uk> References: <93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com> <48FD97AB.70503@ebi.ac.uk> <93b45ca50810210335j5ef4a206y545e5a1869cedc03@mail.gmail.com> <93b45ca50810210424g5a9288f0w803e6d5ca4b840d3@mail.gmail.com> <48FDE80D.1040106@ebi.ac.uk> Message-ID: Yup - why not. Feel free to go in and edit. :) 2008/10/21 Andy Yates > If "Thing" has gone then what impact does this have on remaining > classes? Considering methods like canReadNextThing() & readNextThing(); > should this be canReadNext() & readNext()? > > Just an idle thought .... > > Andy -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From fjossinet at orange.fr Tue Oct 21 19:55:47 2008 From: fjossinet at orange.fr (Fabrice Jossinet) Date: Tue, 21 Oct 2008 21:55:47 +0200 Subject: [Biojava-dev] Biojava 3 and intermolecular features Message-ID: Hi all, When I used the previous releases of biojava, i had some problems to model
inter-molecular features, for example interactions between two sequences/molecules in a tertiary structure, or interactions between two molecular partners in an interaction network. Such a feature should be a single object, shared by (at least) two molecules, but it can be attached to a different location on each molecule. In the current biojava model, a feature has one location on a given sequence. Consequently, when developing my previous software, I decided to change the biojava paradigm a little. For example, to model an intermolecular interaction between region 23-35 of mySeq1 and region 34-46 of mySeq2, I have:

    Feature myFeature = new InterMolecularInteraction();
    mySeq1.addAnnotation(new Annotation(myFeature, new Location("23-35")));
    mySeq2.addAnnotation(new Annotation(myFeature, new Location("34-46")));

The Annotation concept links a feature to a location and is attached to a sequence (this concept is unrelated to the existing BioJava Annotation concept). With this kind of model, I was also able to use the same concepts and strategy to model multiple alignments, which can also be seen as a kind of "inter-molecular relation". Are there any plans to model these kinds of features in biojava3? If not, could my proposal be a good start? Fabrice -- Dr. Fabrice Jossinet Laboratoire de Bioinformatique, modelisation et simulation des acides nucleiques Universite Louis Pasteur Institut de biologie moleculaire et cellulaire du CNRS UPR9002, Architecture et Reactivite de l'ARN 15 rue Rene Descartes F-67084 Strasbourg Cedex France Tel + 33 (0) 3 88 417053 FAX + 33 (0) 3 88 60 22 18 f.jossinet at ibmc.u-strasbg.fr fjossinet at gmail.com http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html http://fjossinet.u-strasbg.fr/ From heuermh at acm.org Thu Oct 23 05:12:07 2008 From: heuermh at acm.org (Michael Heuer) Date: Thu, 23 Oct 2008 01:12:07 -0400 (EDT) Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please!
In-Reply-To: Message-ID: Sorry, I'm a bit late to the game. Hope I didn't miss anything exciting yet! Would it be better to commit this to trunk, and put the current codebase out to pasture on a branch? Is it possible (or desirable) to send SVN commit messages to the dev mailing list? Or alternatively, should someone create a project entry for biojava on CIA.vc? http://cia.vc As soon as I can remember my dev.open-bio.org password I'll start committing stuff, otherwise I'll post patches to bugzilla. michael On Mon, 20 Oct 2008, Richard Holland wrote: > Hi all, > > I've just committed some new code to the biojava3 branch of the biojava-live > subversion repository. It's the foundations of a brand new alphabet+symbol > set of classes, and an example of how to use them to represent DNA. You'll > notice that the new code is very lightweight and allows for a lot more > flexibility than the old code - for instance, the concept of Alphabet has > changed radically. It also makes much more extensive use of the Collections > API. > > I haven't got any test cases or usage examples yet but give me a shout if > you don't understand the code and I'll explain how it works. (Hint: > SymbolFormat is there to convert Strings into SymbolList objects, and vice > versa). > > So, now we want some volunteers! We're starting from scratch here so there's > a lot of work to do. The whole of BioJava needs 'translating' into BJ3, > whether it be copy-and-paste existing classes and modify them to suit the > new style, or write completely new ones to provide equivalent functionality. > > > I'll post an example of how to do file parsing soon, probably starting with > FASTA. In the meantime, a good place to start would be for people to design > object models to represent their favourite data types (e.g. Genbank, or > microarray data). Utility classes to manipulate those objects would be great > too. > > The object models need to be normalised as much as possible - e.g.
if your > data has a lot of comments, and the order of those comments is important, > then give your object model a collection of comment objects. The object > model for each data type should be completely independent and use basic data > types wherever possible (e.g. store sequences as strings, don't attempt to > parse them into anything fancy like SymbolLists). The closer the object > model is to the original data format, the better. There's going to be clever > tricks when it comes to converting data between different object models > (e.g. Genbank to INSDSeq), which I will explain later when I put the file > parsing examples up. > > You'll notice how the biojava3 branch uses Maven instead of Ant. This is > because we want to make it as modular as possible, so if you want to write > microarray stuff, create a new microarray sub-project (as per the dna > example that's already there). This way if someone only wants the microarray > bit of BJ3, they only need to install the appropriate JAR file and can ignore > the rest. (The 'core' module is for stuff that is so generic it could be > used anywhere, or is used in every single other module.) > > If coding isn't your cup of tea, then we would very much welcome testers > (particularly those who enjoy writing test cases!), documenters > (particularly code commenters), translators (for internationalisation of the > code), and of course all those who wish to contribute ideas and suggestions > no matter how off-the-wall they might be. In particular if you'd like to > take charge of an area of the development process, e.g. Documentation Chief, > or Protein Champion, then that would be much appreciated. > > I'm very much looking forward to working with everyone on this. Good luck, > and happy coding! > > cheers, > Richard > > PS. Please don't forget to attach the appropriate licence to your code. You > can copy-and-paste it from the existing classes I just committed this > evening. > > PPS.
For those who are worried about backwards compatibility - this was > discussed on the lists a while back and it was made clear that BJ3 is a > clean break. However, the existing code will continue to be maintained and > bugfixed for a couple of years so you don't have to upgrade if you don't > want to - it just won't have any new features developed for it. This is > largely because it'll probably take just that long to write all the new BJ3 > code. When we do decide to desupport the existing BJ code, plenty of notice > will be given (i.e. years as opposed to months). > > > -- > Richard Holland, BSc MBCS > Finance Director, Eagle Genomics Ltd > M: +44 7500 438846 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From holland at eaglegenomics.com Thu Oct 23 06:04:23 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Thu, 23 Oct 2008 07:04:23 +0100 Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please! In-Reply-To: References: Message-ID: > > > Would it be better to commit this to trunk, and put the current codebase > out to pasture on a branch? Andreas is Mr.SVN. Andreas, what do you think? > > Is it possible (or desirable) to send SVN commit messages to the dev > mailing list? Or alternatively, should someone create a project entry for > biojava on CIA.vc? > > http://cia.vc I think commit messages to biojava-dev would be very useful. If nothing else, it provides a good indicator of activity to casual observers, and also lets people keep an automated eye (by mail filtering) on commits in the areas that interest them most. > > As soon as I can remember my dev.open-bio.org password I'll start > committing stuff, otherwise I'll post patches to bugzilla. If you've forgotten it, let support at OBF know and they'll reset it for you.
cheers, Richard -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From ch.koeberle at googlemail.com Thu Oct 23 08:58:15 2008 From: ch.koeberle at googlemail.com (Christian Köberle) Date: Thu, 23 Oct 2008 10:58:15 +0200 Subject: [Biojava-dev] BioSQL postgre BioEntryRelationship Message-ID: Hi, I found a bug in the PostgreSQL mapping file for BioEntryRelationship. line: The value for the attribute class has to be "BioEntry". For BioEntry I am missing methods to access the BioEntryRelationships in which it is the subject (subject_bioentry). I think BioEntryRelationship is a parent-child relationship, so it would be nice to have access to both directions. Furthermore, the Hibernate mapping strategy for BioSQL is quite slow and produces a lot of queries to the database, because lazy fetching is disabled for all lists and sets. In this mode Hibernate executes one query for each element in a list or set.
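A toy model may make the per-element cost concrete. The sketch below is not Hibernate or BioJava code: `CountingDb`, `FetchStrategies` and all their methods are hypothetical names, and the "database" is an in-memory map that only counts how many queries it receives. `loadOneByOne` follows the per-element pattern described above (one query for the related ids, then one query per related row, often called the "N+1 selects" problem), while `loadWithJoin` fetches all related rows with a single joined query. It sticks to Java 5 language features.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a database: it only stores rows and counts queries.
class CountingDb {
    private final Map<Integer, String> names = new HashMap<Integer, String>();
    private final Map<Integer, List<Integer>> related = new HashMap<Integer, List<Integer>>();
    int queryCount = 0;

    void addEntry(int id, String name) {
        names.put(id, name);
    }

    // Records that subjectId is related to objectId (like a row in a relationship table).
    void relate(int objectId, int subjectId) {
        List<Integer> ids = related.get(objectId);
        if (ids == null) {
            ids = new ArrayList<Integer>();
            related.put(objectId, ids);
        }
        ids.add(subjectId);
    }

    private List<Integer> relatedIds(int objectId) {
        List<Integer> ids = related.get(objectId);
        return ids == null ? new ArrayList<Integer>() : ids;
    }

    // One query: the ids of all rows related to objectId.
    List<Integer> selectRelatedIds(int objectId) {
        queryCount++;
        return new ArrayList<Integer>(relatedIds(objectId));
    }

    // One query: a single row looked up by its primary key.
    String selectNameById(int id) {
        queryCount++;
        return names.get(id);
    }

    // One query: relationship rows joined to entry rows, all returned together.
    List<String> selectRelatedNamesJoined(int objectId) {
        queryCount++;
        List<String> result = new ArrayList<String>();
        for (int id : relatedIds(objectId)) {
            result.add(names.get(id));
        }
        return result;
    }
}

class FetchStrategies {
    // Per-element loading: 1 query for the ids plus 1 query per related row (N + 1 in total).
    static List<String> loadOneByOne(CountingDb db, int objectId) {
        List<String> result = new ArrayList<String>();
        for (int id : db.selectRelatedIds(objectId)) {
            result.add(db.selectNameById(id));
        }
        return result;
    }

    // Single joined query, however many rows are related.
    static List<String> loadWithJoin(CountingDb db, int objectId) {
        return db.selectRelatedNamesJoined(objectId);
    }
}
```

With five related rows, `loadOneByOne` sends six queries and `loadWithJoin` sends one, and the gap grows with every additional related row. The joined version still materialises every related row at once, which is the memory-versus-speed trade-off Richard raises in his reply.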
The faster way is to enable lazy fetching and use dedicated methods to load the lists. Each of these methods executes only one query. For example:

    // needs java.util.List, org.biojavax.bio.BioEntry, org.hibernate.Query,
    // and an open org.hibernate.Session in the field "session"
    public List getParents(BioEntry bioEntry) {
        // one HQL query: the subjects of all relationships in which bioEntry is the object
        String stmt = "SELECT r.subject FROM BioEntryRelationship r WHERE r.object = :object";
        Query query = session.createQuery(stmt);
        query.setParameter("object", bioEntry);
        return query.list();
    }

This is a factor of 2 to 4 faster than the method BioEntry.getRelationships(). When loading all dependencies of a BioEntry object, a select with lazy fetching can be 500 times faster than a select with eager fetching (for example, for UniGene cluster Hs.4). Here is an example for the relationship between UniGene cluster Hs.2 and the gene BC067218 (we use BioSQL to store UniGene):

getParents(): runtime: 14 msec SQL:
Hibernate: select bioentry1_.bioentry_id as bioentry1_89_, bioentry1_.name as name89_, bioentry1_.identifier as identifier89_, bioentry1_.accession as accession89_, bioentry1_.description as descript5_89_, bioentry1_.version as version89_, bioentry1_.division as division89_, bioentry1_.taxon_id as taxon8_89_, bioentry1_.biodatabase_id as biodatab9_89_, bioentry1_1_.version as version93_, bioentry1_1_.length as length93_, bioentry1_1_.alphabet as alphabet93_, bioentry1_1_.seq as seq93_, case when bioentry1_1_.bioentry_id is not null then 2 when bioentry1_.bioentry_id is not null then 0 end as clazz_ from unigene.bioentry_relationship bioentryre0_ inner join unigene.bioentry bioentry1_ on bioentryre0_.subject_bioentry_id=bioentry1_.bioentry_id left outer join unigene.biosequence bioentry1_1_ on bioentry1_.bioentry_id=bioentry1_1_.bioentry_id left outer join unigene.biosequence bioentry1_2_ on bioentry1_.bioentry_id=bioentry1_2_.bioentry_id where bioentryre0_.object_bioentry_id=?
bioEntry.getRelationships(): runtime: 36 msec SQL:
Hibernate: select bioentry0_.bioentry_id as bioentry1_89_, bioentry0_.name as name89_, bioentry0_.identifier as identifier89_, bioentry0_.accession as accession89_, bioentry0_.description as descript5_89_, bioentry0_.version as version89_, bioentry0_.division as division89_, bioentry0_.taxon_id as taxon8_89_, bioentry0_.biodatabase_id as biodatab9_89_, bioentry0_1_.version as version93_, bioentry0_1_.length as length93_, bioentry0_1_.alphabet as alphabet93_, bioentry0_1_.seq as seq93_, case when bioentry0_1_.bioentry_id is not null then 2 when bioentry0_.bioentry_id is not null then 0 end as clazz_ from unigene.bioentry bioentry0_ left outer join unigene.biosequence bioentry0_1_ on bioentry0_.bioentry_id=bioentry0_1_.bioentry_id left outer join unigene.biosequence bioentry0_2_ on bioentry0_.bioentry_id=bioentry0_2_.bioentry_id where bioentry0_.name=?
Hibernate: select relationsh0_.object_bioentry_id as object3_1_, relationsh0_.bioentry_relationship_id as bioentry1_1_, relationsh0_.bioentry_relationship_id as bioentry1_95_0_, relationsh0_.term_id as term2_95_0_, relationsh0_.object_bioentry_id as object3_95_0_, relationsh0_.subject_bioentry_id as subject4_95_0_, relationsh0_.rank as rank95_0_ from unigene.bioentry_relationship relationsh0_ where relationsh0_.object_bioentry_id=?
Hibernate: select namespace0_.biodatabase_id as biodatab1_80_0_, namespace0_.name as name80_0_, namespace0_.authority as authority80_0_, namespace0_.description as descript4_80_0_ from unigene.biodatabase namespace0_ where namespace0_.biodatabase_id=?
Hibernate: select bioentry0_.bioentry_id as bioentry1_89_0_, bioentry0_.name as name89_0_, bioentry0_.identifier as identifier89_0_, bioentry0_.accession as accession89_0_, bioentry0_.description as descript5_89_0_, bioentry0_.version as version89_0_, bioentry0_.division as division89_0_, bioentry0_.taxon_id as taxon8_89_0_, bioentry0_.biodatabase_id as biodatab9_89_0_, bioentry0_1_.version as version93_0_, bioentry0_1_.length as length93_0_, bioentry0_1_.alphabet as alphabet93_0_, bioentry0_1_.seq as seq93_0_, case when bioentry0_1_.bioentry_id is not null then 2 when bioentry0_.bioentry_id is not null then 0 end as clazz_0_ from unigene.bioentry bioentry0_ left outer join unigene.biosequence bioentry0_1_ on bioentry0_.bioentry_id=bioentry0_1_.bioentry_id left outer join unigene.biosequence bioentry0_2_ on bioentry0_.bioentry_id=bioentry0_2_.bioentry_id where bioentry0_.bioentry_id=?
Hibernate: select namespace0_.biodatabase_id as biodatab1_80_0_, namespace0_.name as name80_0_, namespace0_.authority as authority80_0_, namespace0_.description as descript4_80_0_ from unigene.biodatabase namespace0_ where namespace0_.biodatabase_id=?
Hibernate: select term0_.term_id as term1_84_0_, term0_.name as name84_0_, term0_.identifier as identifier84_0_, term0_.definition as definition84_0_, term0_.is_obsolete as is5_84_0_, term0_.ontology_id as ontology6_84_0_ from unigene.term term0_ where term0_.term_id=?
Hibernate: select ontology0_.ontology_id as ontology1_83_0_, ontology0_.name as name83_0_, ontology0_.definition as definition83_0_ from unigene.ontology ontology0_ where ontology0_.ontology_id=?
Hibernate: select termset0_.ontology_id as ontology6_1_, termset0_.term_id as term1_1_, termset0_.term_id as term1_84_0_, termset0_.name as name84_0_, termset0_.identifier as identifier84_0_, termset0_.definition as definition84_0_, termset0_.is_obsolete as is5_84_0_, termset0_.ontology_id as ontology6_84_0_ from unigene.term termset0_ where termset0_.ontology_id=?
Hibernate: select tripleset0_.ontology_id as ontology5_1_, tripleset0_.term_relationship_id as term1_1_, tripleset0_.term_relationship_id as term1_87_0_, tripleset0_.subject_term_id as subject2_87_0_, tripleset0_.object_term_id as object3_87_0_, tripleset0_.predicate_term_id as predicate4_87_0_, tripleset0_.ontology_id as ontology5_87_0_ from unigene.term_relationship tripleset0_ where tripleset0_.ontology_id=?
Hibernate: select rankedcros0_.term_id as term1_0_, rankedcros0_.dbxref_id as dbxref2_0_, rankedcros0_.rank as rank0_ from unigene.term_dbxref rankedcros0_ where rankedcros0_.term_id=?
Hibernate: select synonymset0_.term_id as term1_0_, synonymset0_.synonym as synonym0_ from unigene.term_synonym synonymset0_ where synonymset0_.term_id=?
-- Christian Köberle From dicknetherlands at gmail.com Thu Oct 23 09:45:53 2008 From: dicknetherlands at gmail.com (Richard Holland) Date: Thu, 23 Oct 2008 10:45:53 +0100 Subject: [Biojava-dev] BioSQL postgre BioEntryRelationship In-Reply-To: References: Message-ID: Christian, Thanks for your comments. I'm not sure which file you're referring to, or which version of BioJava you have, as the line you quote does not appear in any of the current hbm.xml files in the trunk of Subversion. Also, the BioEntryRelationship interface and its implementations already have getSubject() and getObject() methods, which return the parent and child BioEntry instances. The BioEntry interface itself has a getBioEntryRelationships() method which returns all relationships in which it is the object BioEntry. You could use HQL to obtain those for which it is the subject, but you are right that it would be good to have a method that returns the latter. Could you raise a BugZilla request for this? It would be good if you could do some thorough testing of your lazy loading suggestions on some other use cases before we decide whether or not to adopt that approach in future developments. Use cases would include: 1.
have a very large database with thousands of related records in it (e.g. load the whole of GenBank). Iterate over all the records in the database and perform a simple read operation on each that hits the modified methods. See if you run out of memory. 2. like 1, but perform a series of repeated read/write operations using the modified methods, with a final commit that writes the results back, to see whether they still persist correctly. The reason is that the modified methods might cause problems for people who are processing large volumes of data in their databases. If all related records are loaded at once, even if only on demand, instead of one at a time, it will cause memory issues. The trade-off is therefore memory vs. speed. We opted for memory because it spares most novice coders from having to trace out-of-memory exceptions (these can still occur with the existing methods, but less often). Also, your method reruns the query every time it is called. It should probably cache the results after the first call, to prevent objects being reloaded unnecessarily, and to prevent problems where objects from a previous call are modified and a subsequent call then attempts to overwrite them. Also, if Hibernate does not get back the same set that it auto-loaded as a property via the default get() method when it comes to save the object, it will throw a wobbly and refuse to commit. cheers, Richard 2008/10/23 Christian Köberle > Hi, > I found a bug in the postgre mapping file for BioEntryRelationship. > line: > not-null="true" cascade="persist,merge,save-update" node="@objectFeatureId" > embed-xml="false"/> > The value for the attribute class has to be "BioEntry" -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From bugzilla-daemon at portal.open-bio.org Thu Oct 23 13:16:43 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 23 Oct 2008 09:16:43 -0400 Subject: [Biojava-dev] [Bug 2625] New: Parent Child Relationship of BioEntry via BioEntryRelationship Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2625 Summary: Parent Child Relationship of BioEntry via BioEntryRelationship Product: BioJava Version: unspecified Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: DB / BioSQL AssignedTo: biojava-dev at biojava.org ReportedBy: ch.koeberle at googlemail.com A BioEntry object has only the method getRelationships(), which returns all BioEntryRelationship objects in which the BioEntry object is the result of BioEntryRelationship.getObject(), because BioEntry.hbm.xml contains only this mapping: I am missing something like this: BioEntry.getReverseRelationships() (or getChilds()). From andreas at sdsc.edu Thu Oct 23 13:57:41 2008 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 23 Oct 2008 06:57:41 -0700 Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please! In-Reply-To: References: Message-ID: <59a41c430810230657p73b5d10kbf497c20fdfbe893@mail.gmail.com> >> Would it be better to commit this to trunk, and put the current codebase >> out to pasture on a branch? At the moment we have a number of unreleased bug fixes in biojava-live/trunk.
Also, if somebody were to start using BJ at present, I would still recommend using 1.6. So for the moment I would say let's leave it the way it is. Once we reach the alpha stage we could release a final BioJava 1.7 and afterwards switch the branches in SVN. About the commit messages sent to this list: can we make this once per day? I can also set something up as part of CruiseControl... Andreas From andreas at sdsc.edu Thu Oct 23 17:24:27 2008 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 23 Oct 2008 10:24:27 -0700 Subject: [Biojava-dev] svn write access In-Reply-To: <61C028BE-074B-4E16-A883-B8A2F6AD883E@ibmc.u-strasbg.fr> References: <6F5AE187-46C5-405C-80FB-495F97C704B5@ibmc.u-strasbg.fr> <59a41c430810230738p400c185chbc6a96f871dbb71b@mail.gmail.com> <61C028BE-074B-4E16-A883-B8A2F6AD883E@ibmc.u-strasbg.fr> Message-ID: <59a41c430810231024m7b5daf92t3bf6a1a354723301@mail.gmail.com> Hi Fabrice, in order to obtain a developer checkout you have to follow the procedure described on http://biojava.org/wiki/CVS_to_SVN_Migration under the section "Developer checkout". code.open-bio is a read-only copy of the SVN repository for anonymous checkout. The "real" developer repository is on the dev.open-bio machine and you can only access it via ssh. This setup is for security reasons. code.open-bio and dev.open-bio are synchronized approximately every 20 minutes. Andreas On Thu, Oct 23, 2008 at 8:22 AM, Fabrice Jossinet wrote: > Ok, I did that with the "code.open-bio.org" server like this: > > svn co svn://code.open-bio.org/biojava/biojava-live/branches/biojava3 > --username fjossinet --password blabla > > In this case, it seems it doesn't work. > > I will try the other way as described in the biojava homepage > > Thanx > > F > On 23 Oct 08, at 16:38, Andreas Prlic wrote: > >> you need to check out with that account, so the svn flags are all set >> correctly. >> >> see the biojava homepage for how to check out with a developer account.
>> A >> >> 2008/10/23 Fabrice Jossinet : >>> >>> Hi Andreas, >>> >>> Mauricio has created the account fjossinet for me on the machine >>> dev.open-bio.org. But I think this is only the first step, since I still >>> don't have write access on the svn machine. >>> >>> Thank you for your help >>> >>> Regards >>> >>> Fabrice >>> >>> >>> -- >>> Dr. Fabrice Jossinet >>> Laboratoire de Bioinformatique, modelisation et simulation des acides >>> nucleiques >>> Universite Louis Pasteur >>> Institut de biologie moleculaire et cellulaire du CNRS >>> UPR9002, Architecture et Reactivite de l'ARN >>> 15 rue Rene Descartes >>> F-67084 Strasbourg Cedex >>> France >>> >>> Tel + 33 (0) 3 88 417053 >>> FAX + 33 (0) 3 88 60 22 18 >>> >>> f.jossinet at ibmc.u-strasbg.fr >>> fjossinet at gmail.com >>> http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html >>> http://fjossinet.u-strasbg.fr/ From andreas at sdsc.edu Fri Oct 24 03:17:02 2008 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 23 Oct 2008 20:17:02 -0700 Subject: [Biojava-dev] biojava 3 docu on wiki Message-ID: <59a41c430810232017wbc8874fnf829c5b9e7ced4a9@mail.gmail.com> Hi, I summarized the current status of the BioJava3 project at http://biojava.org/wiki/BioJava3_project Feel free to update/add/comment. Andreas From andreas at sdsc.edu Fri Oct 24 04:01:31 2008 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 23 Oct 2008 21:01:31 -0700 Subject: [Biojava-dev] biojava 3 - java version Message-ID: <59a41c430810232101q7e3c2d36r822634c0bae4ad43@mail.gmail.com> Hi, I just tried to get an initial svn checkout of biojava3 on my Mac at home. It fails to build since there is no Java 1.6 available for my OS X 10.4.11... Is there a strong reason why we should enforce Java 1.6?
otherwise would be good to support 1.5+ Andreas From f.jossinet at ibmc.u-strasbg.fr Fri Oct 24 08:21:15 2008 From: f.jossinet at ibmc.u-strasbg.fr (Fabrice Jossinet) Date: Fri, 24 Oct 2008 10:21:15 +0200 Subject: [Biojava-dev] svn write access In-Reply-To: <59a41c430810231024m7b5daf92t3bf6a1a354723301@mail.gmail.com> References: <6F5AE187-46C5-405C-80FB-495F97C704B5@ibmc.u-strasbg.fr> <59a41c430810230738p400c185chbc6a96f871dbb71b@mail.gmail.com> <61C028BE-074B-4E16-A883-B8A2F6AD883E@ibmc.u-strasbg.fr> <59a41c430810231024m7b5daf92t3bf6a1a354723301@mail.gmail.com> Message-ID: <4CF8A26B-C50A-40F2-A7A5-B9F958F0F677@ibmc.u-strasbg.fr> Hi Andreas, Thank you for these details. I have added the new RNA module to biojava3 branch and I have updated the pom.xml file in the root directory of this branch. Fabrice Le 23 oct. 08 ? 19:24, Andreas Prlic a ?crit : > Hi Fabrice, > > in order to obtain a developer checkout you have to follow the > procedure as it is described on > http://biojava.org/wiki/CVS_to_SVN_Migration > under the section > Developer checkout > > code.open-bio is a read only copy of the SVN repository for anonymous > checkout. The "real" developer repository is on the dev.open-bio > machine and you can only access it via ssh. This setup is for security > reasons. code.open-bio and dev.open-bio are getting synchronized > approx ev. 20 min. > > Andreas > > On Thu, Oct 23, 2008 at 8:22 AM, Fabrice Jossinet > wrote: >> Ok, I did that with the "code.open-bio.org" server and like that: >> >> svn co svn://code.open-bio.org/biojava/biojava-live/branches/biojava3 >> --username fjossinet --password blabla >> >> In this case, it seems it doesn't work. >> >> I will try the other way as described in the biojava homepage >> >> Thanx >> >> F >> Le 23 oct. 08 ? 16:38, Andreas Prlic a ?crit : >> >>> you need to check out with that account, so the svn flags are all >>> set >>> correctly. >>> >>> see the biojava homepage for how to check out with a developer >>> account. 
>>> A >>> >>> 2008/10/23 Fabrice Jossinet : >>>> >>>> Hi Andreas, >>>> >>>> Mauricio has created me the account fjossinet for the machine >>>> dev.open-bio.org. But I think this is only the first step since I >>>> still >>>> don't have the write access on the svn machine. >>>> >>>> Thank you for your help >>>> >>>> Regards >>>> >>>> Fabrice >>>> >>>> >>>> -- >>>> Dr. Fabrice Jossinet >>>> Laboratoire de Bioinformatique, modelisation et simulation des >>>> acides >>>> nucleiques >>>> Universite Louis Pasteur >>>> Institut de biologie moleculaire et cellulaire du CNRS >>>> UPR9002, Architecture et Reactivite de l'ARN >>>> 15 rue Rene Descartes >>>> F-67084 Strasbourg Cedex >>>> France >>>> >>>> Tel + 33 (0) 3 88 417053 >>>> FAX + 33 (0) 3 88 60 22 18 >>>> >>>> f.jossinet at ibmc.u-strasbg.fr >>>> fjossinet at gmail.com >>>> http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html >>>> http://fjossinet.u-strasbg.fr/ >>>> >>>> >>>> >>>> >>>> >> >> From dicknetherlands at gmail.com Fri Oct 24 09:58:18 2008 From: dicknetherlands at gmail.com (Richard Holland) Date: Fri, 24 Oct 2008 10:58:18 +0100 Subject: [Biojava-dev] biojava 3 - java version In-Reply-To: <59a41c430810232101q7e3c2d36r822634c0bae4ad43@mail.gmail.com> References: <59a41c430810232101q7e3c2d36r822634c0bae4ad43@mail.gmail.com> Message-ID: It's only the older PPC Mac models (running Mac OS X 10.4 or older) which can't get any newer official versions of Java than 1.5 / 5.0. However, an alternative (free) route for obtaining a Java 1.6 / 6.0 compiler is provided for these older machines: http://landonf.bikemonkey.org/static/soylatte/ We wanted to move to Java 6 because it'll likely take about a year to get BJ3 fully up and running, by which time Java 6 will probably be the oldest supported version of Java available from Sun (5.0 is already end-of-lifed, and with 7.0 due out in January it is likely to be desupported very soon. 
When 8.0 probably about 12 months after BJ3 is finished then 5.0 will definitely become desupported). cheers, Richard 2008/10/24 Andreas Prlic > Hi, > > I just tried to get an initial svn checkout of biojava3 on my mac at > home. It fails to build since there is no Java 1.6 available for my > OSX 10.4.11 ... > Is there a strong reason why we should enforce java 1.6? otherwise > would be good to support 1.5+ > > Andreas > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From f.jossinet at ibmc.u-strasbg.fr Fri Oct 24 10:20:59 2008 From: f.jossinet at ibmc.u-strasbg.fr (Fabrice Jossinet) Date: Fri, 24 Oct 2008 12:20:59 +0200 Subject: [Biojava-dev] biojava 3 - java version In-Reply-To: <59a41c430810232101q7e3c2d36r822634c0bae4ad43@mail.gmail.com> References: <59a41c430810232101q7e3c2d36r822634c0bae4ad43@mail.gmail.com> Message-ID: <6DF53C0D-E0CC-4504-979B-9122AD39EF62@ibmc.u-strasbg.fr> Just to refresh the memory.... Major changes included in Java 6: * Support for older Win9x versions dropped. The last version for Windows 98 and Windows ME is Java Runtime Environment Version 5.0 Update 16 (1.5.0.16). * Scripting Language Support (JSR 223): Generic API for tight integration with scripting languages, and built-in Mozilla Javascript Rhino integration * Dramatic performance improvements for the core platform, and Swing. * Improved Web Service support through JAX-WS (JSR 224) * JDBC 4.0 support (JSR 221). * Java Compiler API (JSR 199): an API allowing a Java program to select and invoke a Java Compiler programmatically. * Upgrade of JAXB to version 2.0: Including integration of a StAX parser. * Support for pluggable annotations (JSR 269). 
* Many GUI improvements, such as integration of SwingWorker in the API, table sorting and filtering, and true Swing double-buffering (eliminating the gray-area effect). Perhaps the core module could target Java 1.5, and if someone needs, for example, the GUI improvements for their module, that module could target a later version. Possible or not? F On 24 Oct 2008, at 06:01, Andreas Prlic wrote: > Hi, > > I just tried to get an initial svn checkout of biojava3 on my mac at > home. It fails to build since there is no Java 1.6 available for my > OSX 10.4.11 ... > Is there a strong reason why we should enforce java 1.6? otherwise > would be good to support 1.5+ > > Andreas > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From dicknetherlands at gmail.com Fri Oct 24 11:14:43 2008 From: dicknetherlands at gmail.com (Richard Holland) Date: Fri, 24 Oct 2008 12:14:43 +0100 Subject: [Biojava-dev] biojava 3 - java version In-Reply-To: <6DF53C0D-E0CC-4504-979B-9122AD39EF62@ibmc.u-strasbg.fr> References: <59a41c430810232101q7e3c2d36r822634c0bae4ad43@mail.gmail.com> <6DF53C0D-E0CC-4504-979B-9122AD39EF62@ibmc.u-strasbg.fr> Message-ID: If you can find a way to make Maven do that, then I'm happy for you to make the relevant changes. cheers, Richard 2008/10/24 Fabrice Jossinet > Just to refresh the memory.... > > Major changes included in Java 6: > > * Support for older Win9x versions dropped. The last version for Windows > 98 and Windows ME is Java Runtime Environment Version 5.0 Update 16 ( > 1.5.0.16). > * Scripting Language Support (JSR 223): Generic API for tight > integration with scripting languages, and built-in Mozilla Javascript Rhino > integration > * Dramatic performance improvements for the core platform, and Swing. > * Improved Web Service support through JAX-WS (JSR 224) > * JDBC 4.0 support (JSR 221).
> * Java Compiler API (JSR 199): an API allowing a Java program to select > and invoke a Java Compiler programmatically. > * Upgrade of JAXB to version 2.0: Including integration of a StAX > parser. > * Support for pluggable annotations (JSR 269). > * Many GUI improvements, such as integration of SwingWorker in the API, > table sorting and filtering, and true Swing double-buffering (eliminating > the gray-area effect). > > Perhaps the core module can be linked to the 1.5 version. And if someone > needs, for example, the improvements of the GUI for his module, this module > will be linked to another version. > > Possible or not ? > > F > > On 24 Oct 2008, at 06:01, Andreas Prlic wrote: > > > Hi, >> >> I just tried to get an initial svn checkout of biojava3 on my mac at >> home. It fails to build since there is no Java 1.6 available for my >> OSX 10.4.11 ... >> Is there a strong reason why we should enforce java 1.6? otherwise >> would be good to support 1.5+ >> >> Andreas >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From ayates at ebi.ac.uk Fri Oct 24 11:28:56 2008 From: ayates at ebi.ac.uk (Andy Yates) Date: Fri, 24 Oct 2008 12:28:56 +0100 Subject: [Biojava-dev] biojava 3 - java version In-Reply-To: References: <59a41c430810232101q7e3c2d36r822634c0bae4ad43@mail.gmail.com> <6DF53C0D-E0CC-4504-979B-9122AD39EF62@ibmc.u-strasbg.fr> Message-ID: <4901B178.7090307@ebi.ac.uk> Yes, I believe it is possible to get a module compiled for a different version of Java, as described here:
http://maven.apache.org/plugins/maven-compiler-plugin/howto.html However, to do this properly we would need to compile the code against the 1.5 JDK class libraries, especially if we are going to leverage the API as much as we can. My group has already run into this with changes to the java.sql.Connection interface, which means we have to compile against the 1.5 libraries. Andy Richard Holland wrote: > If you can find a way to make Maven do that, then I'm happy for you to make > the relevant changes. > > cheers, > Richard > > 2008/10/24 Fabrice Jossinet > >> Just to refresh the memory.... >> >> Major changes included in Java 6: >> >> * Support for older Win9x versions dropped. The last version for Windows >> 98 and Windows ME is Java Runtime Environment Version 5.0 Update 16 ( >> 1.5.0.16). >> * Scripting Language Support (JSR 223): Generic API for tight >> integration with scripting languages, and built-in Mozilla Javascript Rhino >> integration >> * Dramatic performance improvements for the core platform, and Swing. >> * Improved Web Service support through JAX-WS (JSR 224) >> * JDBC 4.0 support (JSR 221). >> * Java Compiler API (JSR 199): an API allowing a Java program to select >> and invoke a Java Compiler programmatically. >> * Upgrade of JAXB to version 2.0: Including integration of a StAX >> parser. >> * Support for pluggable annotations (JSR 269). >> * Many GUI improvements, such as integration of SwingWorker in the API, >> table sorting and filtering, and true Swing double-buffering (eliminating >> the gray-area effect). >> >> Perhaps the core module can be linked to the 1.5 version. And if someone >> needs, for example, the improvements of the GUI for his module, this module >> will be linked to another version. >> >> Possible or not ? >> >> F >> >> On 24 Oct 2008, at 06:01, Andreas Prlic wrote: >> >> >> Hi, >>> I just tried to get an initial svn checkout of biojava3 on my mac at >>> home. It fails to build since there is no Java 1.6 available for my >>> OSX 10.4.11 ...
>>> Is there a strong reason why we should enforce java 1.6? otherwise >>> would be good to support 1.5+ >>> >>> Andreas >>> _______________________________________________ >>> biojava-dev mailing list >>> biojava-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >>> >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> > > > From pzgyuanf at gmail.com Sat Oct 25 14:00:17 2008 From: pzgyuanf at gmail.com (pprun) Date: Sat, 25 Oct 2008 22:00:17 +0800 Subject: [Biojava-dev] Test failed for Alphabet.getSymbolMatchType method Message-ID: <49032671.1080309@gmail.com> Hi, The current implementation uses the same condition, equalsIgnoreCase, for both EXACT_STRING_MATCH and MIXED_CASE_MATCH: public SymbolMatchType getSymbolMatchType(Symbol a, Symbol b) { ... if (a.toString().equalsIgnoreCase(b.toString())) { return SymbolMatchType.EXACT_STRING_MATCH; } if (a.toString().equalsIgnoreCase(b.toString())) { return SymbolMatchType.MIXED_CASE_MATCH; } ... String.equals should be used for EXACT_STRING_MATCH instead: public SymbolMatchType getSymbolMatchType(Symbol a, Symbol b) { ... if (a.toString().equals(b.toString())) { return SymbolMatchType.EXACT_STRING_MATCH; } if (a.toString().equalsIgnoreCase(b.toString())) { return SymbolMatchType.MIXED_CASE_MATCH; } ... The test case used to identify the above bug is: /* * BioJava development code * * This code may be freely distributed and modified under the * terms of the GNU Lesser General Public Licence. This should * be distributed with the code. If you do not have a copy, * see: * * http://www.gnu.org/copyleft/lesser.html * * Copyright for this code is held jointly by the individual * authors. These should be listed in @author doc comments.
* * For more information on the BioJava project and its aims, * or to join the biojava-l mailing list, visit the home page * at: * * http://www.biojava.org/ * */ package org.biojava.core.symbol; import org.junit.After; import org.junit.AfterClass; import org.junit.Before; import org.junit.BeforeClass; import org.junit.Test; import static org.junit.Assert.*; /** * * @author pprun */ public class AlphabetTest { public AlphabetTest() { } @BeforeClass public static void setUpClass() throws Exception { } @AfterClass public static void tearDownClass() throws Exception { } @Before public void setUp() { } @After public void tearDown() { } /** * Test of getSymbolMatchType method, of class Alphabet. */ @Test public void testGetSymbolMatchType() { System.out.println("getSymbolMatchType"); Alphabet testAlphabet = new Alphabet("testGetSymbolMatchType"); // 1. exact match Symbol a = Symbol.get("ATGC"); Symbol b = Symbol.get("ATGC"); SymbolMatchType expResult = SymbolMatchType.EXACT_MATCH; SymbolMatchType result = testAlphabet.getSymbolMatchType(a, b); assertEquals(expResult, result); // 2. mixed case match a = Symbol.get("ATGC"); b = Symbol.get("aTGC"); expResult = SymbolMatchType.MIXED_CASE_MATCH; result = testAlphabet.getSymbolMatchType(a, b); assertEquals(expResult, result); } } BTW, how can I get developer/tester access? Then I could contribute to BJ3 development or testing (I'm still a beginner in the bio field). Thanks, Pprun From andreas at sdsc.edu Tue Oct 28 04:40:35 2008 From: andreas at sdsc.edu (Andreas Prlic) Date: Mon, 27 Oct 2008 21:40:35 -0700 Subject: [Biojava-dev] BioSQL postgre BioEntryRelationship In-Reply-To: References: Message-ID: <59a41c430810272140h290a8a91q26af24946c2c63a5@mail.gmail.com> Hi Richard, I updated the 1.6 release with your fixes: http://www.biojava.org/download/bj16/all/biojava-1.6.1-all.jar Can you please verify it and, if it is correct, update the download page on the wiki?
Andreas On Thu, Oct 23, 2008 at 6:24 AM, Richard Holland wrote: > Andreas - is it possible to rebuild biojava-1.6-all.jar with the following > fix made to it? > > cheers, > Richard > > ---------- Forwarded message ---------- > From: Christian K?berle > Date: 2008/10/23 > Subject: Re: [Biojava-dev] BioSQL postgre BioEntryRelationship > To: Richard Holland > > > Hi Richard, > > I found the error in the current download of biojava 1.6 > (http://www.biojava.org/download/bj16/all/biojava-1.6-all.jar) in the file > src/org/biojavax/bio/db/biosql/pg/BioEntryRelationship.hbm.xml > > > table="bioentry_relationship" node="sequenceRelation" > entity-name="BioEntryRelationship"> > column="bioentry_relationship_id" node="@id"> > > bioentry_relationship_pk_seq > > > cascade="persist,merge,save-update" node="@termId" embed-xml="false"/> > not-null="true" cascade="persist,merge,save-update" node="@objectFeatureId" > embed-xml="false"/> > not-null="true" cascade="persist,merge,save-update" > node="@subjectBioEntryId" embed-xml="false"/> > > > > > cheers, > Christian > > > 2008/10/23 Richard Holland >> >> Christian, >> >> Thanks for your comments. >> >> I'm not sure which file you're referring to, or what version of BioJava >> you have, as the line you quote does not appear in any of the current >> hbm.xml files in the trunk of Subversion. >> >> Also, the BioEntryRelationship interface and its implementations do >> already have getSubject() and getObject() methods which return the parent >> and child BioEntry instances. >> >> The BioEntry interface itself has a getBioEntryRelationships() method >> which returns all relationships in which it is the object BioEntry. You >> could use HQL to obtain those for which it is the subject, but you are right >> that it would be good to have a method that returns the latter. Could you >> raise a BugZilla request for this?
>> >> It would be good if you could do some thorough testing of your lazy >> loading suggestions on some other use cases before we decide whether or not >> to adopt that approach in future developments. Use cases would include: >> >> 1. have a very large database with thousands of related records in it >> (e.g. load the whole of GenBank). Iterate over all the records in the >> database and perform a simple read operation on each that hits the modified >> methods. See if you run out of memory. >> >> 2. like 1, but perform a series of repeated read/write operations using >> the modified methods, with a final commit to attempt to write the results >> back to see if they still persist correctly. >> >> The reason is that the modified methods might cause problems with those >> people who are processing large volumes of data in their databases. If all >> related records are loaded at once, even only on demand, instead of one at a >> time, it will cause memory issues. The trade off is therefore memory vs. >> speed. We opted for the memory option because it makes life easier for most >> novice coders to not have to trace out-of-memory exceptions (although they >> will still occur using the existing methods, but it happens less often). >> >> Also, your method reruns the query every time it is called. It probably >> should cache the results after the first call, to prevent objects being >> reloaded unnecessarily, and to prevent problems with objects from a previous >> call being modified then attempted to be overwritten by a subsequent call? >> Also if Hibernate does not receive the same set back that it auto-loaded as >> a property via the default get() method when it comes to save the object, it >> will throw a wobbly and refuse to commit. >> >> cheers, >> Richard >> >> >> >> 2008/10/23 Christian K?berle >>> >>> Hi, >>> I found a bug in the postgre mapping file for BioEntryRelationship. 
line: >>> >> not-null="true" cascade="persist,merge,save-update" >>> node="@objectFeatureId" >>> embed-xml="false"/> >>> The value for the attribute class has to be "BioEntry". >>> >>> BioEntry lacks methods to access the BioEntryRelationships in which it is the >>> subject_bioentry. I think BioEntryRelationship is a parent-child >>> relationship, so it would be nice to have access to both sides. >>> >>> Furthermore, the Hibernate mapping strategy for BioSQL is quite slow >>> and produces a lot of queries to the database, because lazy fetching is >>> disabled for all lists and sets. In this mode Hibernate executes one >>> query for each element in a list or set. The faster way is to enable >>> lazy fetching and use dedicated methods to load each list. Each of these >>> methods executes only one query. >>> For example: >>> >>> public List getParents(BioEntry bioEntry){ >>> >>> String stmt = "SELECT r.subject FROM BioEntryRelationship r WHERE r.object >>> = :object"; >>> Query query = session.createQuery(stmt); >>> query.setParameter("object", bioEntry); >>> return query.list(); >>> >>> } >>> >>> >>> This is 2 to 4 times faster than the method >>> BioEntry.getRelationships(). >>> When loading all dependencies of a BioEntry object, a select with lazy >>> fetching can be 500 times faster than a select with eager fetching (in >>> the case >>> of unigene cluster Hs.4, for example).
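Richard's reply above suggests that a method like getParents() should cache its results after the first call instead of rerunning the query each time, and notes that Hibernate expects to get the same collection back when saving. A minimal sketch of that caching idea, assuming a plain in-memory map per key; `ParentLookupCache` is a hypothetical name, and the `loader` function merely stands in for the HQL query (no Hibernate API is used). It needs Java 8 (lambdas, `computeIfAbsent`), so it would not compile on the Java 5/6 toolchains discussed in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical per-key cache for relationship lookups such as getParents().
 * The loader stands in for the database query; nothing here talks to Hibernate.
 */
public class ParentLookupCache<K, V> {

    private final Function<K, List<V>> loader;
    private final Map<K, List<V>> cache = new HashMap<>();

    public ParentLookupCache(Function<K, List<V>> loader) {
        this.loader = loader;
    }

    /** Runs the loader at most once per key; later calls return the same read-only list. */
    public List<V> get(K key) {
        return cache.computeIfAbsent(key,
                k -> Collections.unmodifiableList(new ArrayList<V>(loader.apply(k))));
    }

    /** Forgets the cached result for one key so that the next get() reloads it. */
    public void invalidate(K key) {
        cache.remove(key);
    }

    public static void main(String[] args) {
        int[] queries = {0};
        ParentLookupCache<String, String> parents = new ParentLookupCache<>(id -> {
            queries[0]++; // counts how often the stand-in "query" actually runs
            return Collections.singletonList("parent-of-" + id);
        });
        parents.get("BC067218");
        parents.get("BC067218");
        System.out.println(queries[0]); // prints 1: the second call is served from the cache
    }
}
```

Returning the same unmodifiable list instance for a key avoids repeated loading and keeps callers from swapping in a different collection. The trade-off Richard raises still applies: every cached list stays in memory until it is invalidated.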
>>> Here is an example for the relationship between unigene cluster Hs.2 and the gene >>> BC067218 (we use BioSQL to store UniGene) >>> >>> getParents(): >>> runtime: 14 msec >>> SQL: Hibernate: select bioentry1_.bioentry_id as bioentry1_89_, >>> bioentry1_.name as name89_, bioentry1_.identifier as identifier89_, >>> bioentry1_.accession as accession89_, bioentry1_.description as >>> descript5_89_, bioentry1_.version as version89_, bioentry1_.division as >>> division89_, bioentry1_.taxon_id as taxon8_89_, bioentry1_.biodatabase_id >>> as >>> biodatab9_89_, bioentry1_1_.version as version93_, bioentry1_1_.length as >>> length93_, bioentry1_1_.alphabet as alphabet93_, bioentry1_1_.seq as >>> seq93_, >>> case when bioentry1_1_.bioentry_id is not null then 2 when >>> bioentry1_.bioentry_id is not null then 0 end as clazz_ from >>> unigene.bioentry_relationship bioentryre0_ inner join unigene.bioentry >>> bioentry1_ on bioentryre0_.subject_bioentry_id=bioentry1_.bioentry_id >>> left >>> outer join unigene.biosequence bioentry1_1_ on >>> bioentry1_.bioentry_id=bioentry1_1_.bioentry_id left outer >>> join unigene.biosequence bioentry1_2_ on >>> bioentry1_.bioentry_id=bioentry1_2_.bioentry_id where >>> bioentryre0_.object_bioentry_id=?
>>> >>> >>> bioEntry.getRelationships(): >>> runtime: 36 msec >>> SQL:Hibernate: select bioentry0_.bioentry_id as bioentry1_89_, >>> bioentry0_.name as name89_, bioentry0_.identifier as identifier89_, >>> bioentry0_.accession as accession89_, bioentry0_.description as >>> descript5_89_, bioentry0_.version as version89_, bioentry0_.division as >>> division89_, bioentry0_.taxon_id as taxon8_89_, bioentry0_.biodatabase_id >>> as >>> biodatab9_89_, bioentry0_1_.version as version93_, bioentry0_1_.length as >>> length93_, bioentry0_1_.alphabet as alphabet93_, bioentry0_1_.seq as >>> seq93_, >>> case when bioentry0_1_.bioentry_id is not null then 2 when >>> bioentry0_.bioentry_id is not null then 0 end as clazz_ from >>> unigene.bioentry bioentry0_ left outer join unigene.biosequence >>> bioentry0_1_ >>> on bioentry0_.bioentry_id=bioentry0_1_.bioentry_id left outer join >>> unigene.biosequence bioentry0_2_ on >>> bioentry0_.bioentry_id=bioentry0_2_.bioentry_id where bioentry0_.name=? >>> Hibernate: select relationsh0_.object_bioentry_id as object3_1_, >>> relationsh0_.bioentry_relationship_id as bioentry1_1_, >>> relationsh0_.bioentry_relationship_id as bioentry1_95_0_, >>> relationsh0_.term_id as term2_95_0_, relationsh0_.object_bioentry_id as >>> object3_95_0_, relationsh0_.subject_bioentry_id as subject4_95_0_, >>> relationsh0_.rank as rank95_0_ from unigene.bioentry_relationship >>> relationsh0_ where relationsh0_.object_bioentry_id=? >>> Hibernate: select namespace0_.biodatabase_id as biodatab1_80_0_, >>> namespace0_.name as name80_0_, namespace0_.authority as authority80_0_, >>> namespace0_.description as descript4_80_0_ from unigene.biodatabase >>> namespace0_ where namespace0_.biodatabase_id=? 
>>> Hibernate: select bioentry0_.bioentry_id as bioentry1_89_0_, >>> bioentry0_.name >>> as name89_0_, bioentry0_.identifier as identifier89_0_, >>> bioentry0_.accession >>> as accession89_0_, bioentry0_.description as descript5_89_0_, >>> bioentry0_.version as version89_0_, bioentry0_.division as division89_0_, >>> bioentry0_.taxon_id as taxon8_89_0_, bioentry0_.biodatabase_id as >>> biodatab9_89_0_, bioentry0_1_.version as version93_0_, >>> bioentry0_1_.length >>> as length93_0_, bioentry0_1_.alphabet as alphabet93_0_, bioentry0_1_.seq >>> as >>> seq93_0_, case when bioentry0_1_.bioentry_id is not null then 2 when >>> bioentry0_.bioentry_id is not null then 0 end as clazz_0_ from >>> unigene.bioentry bioentry0_ left outer join unigene.biosequence >>> bioentry0_1_ >>> on bioentry0_.bioentry_id=bioentry0_1_.bioentry_id left outer join >>> unigene.biosequence bioentry0_2_ on >>> bioentry0_.bioentry_id=bioentry0_2_.bioentry_id where >>> bioentry0_.bioentry_id=? >>> Hibernate: select namespace0_.biodatabase_id as biodatab1_80_0_, >>> namespace0_.name as name80_0_, namespace0_.authority as authority80_0_, >>> namespace0_.description as descript4_80_0_ from unigene.biodatabase >>> namespace0_ where namespace0_.biodatabase_id=? >>> Hibernate: select term0_.term_id as term1_84_0_, term0_.name as >>> name84_0_, >>> term0_.identifier as identifier84_0_, term0_.definition as >>> definition84_0_, >>> term0_.is_obsolete as is5_84_0_, term0_.ontology_id as ontology6_84_0_ >>> from >>> unigene.term term0_ where term0_.term_id=? >>> Hibernate: select ontology0_.ontology_id as ontology1_83_0_, >>> ontology0_.name >>> as name83_0_, ontology0_.definition as definition83_0_ from >>> unigene.ontology >>> ontology0_ where ontology0_.ontology_id=? 
>>> Hibernate: select termset0_.ontology_id as ontology6_1_, >>> termset0_.term_id >>> as term1_1_, termset0_.term_id as term1_84_0_, termset0_.name as >>> name84_0_, >>> termset0_.identifier as identifier84_0_, termset0_.definition as >>> definition84_0_, termset0_.is_obsolete as is5_84_0_, >>> termset0_.ontology_id >>> as ontology6_84_0_ from unigene.term termset0_ where >>> termset0_.ontology_id=? >>> Hibernate: select tripleset0_.ontology_id as ontology5_1_, >>> tripleset0_.term_relationship_id as term1_1_, >>> tripleset0_.term_relationship_id as term1_87_0_, >>> tripleset0_.subject_term_id >>> as subject2_87_0_, tripleset0_.object_term_id as object3_87_0_, >>> tripleset0_.predicate_term_id as predicate4_87_0_, >>> tripleset0_.ontology_id >>> as ontology5_87_0_ from unigene.term_relationship tripleset0_ where >>> tripleset0_.ontology_id=? >>> Hibernate: select rankedcros0_.term_id as term1_0_, >>> rankedcros0_.dbxref_id >>> as dbxref2_0_, rankedcros0_.rank as rank0_ from unigene.term_dbxref >>> rankedcros0_ where rankedcros0_.term_id=? >>> Hibernate: select synonymset0_.term_id as term1_0_, synonymset0_.synonym >>> as >>> synonym0_ from unigene.term_synonym synonymset0_ where >>> synonymset0_.term_id=? >>> >>> -- >>> Christian K?berle >>> >>> _______________________________________________ >>> biojava-dev mailing list >>> biojava-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> >> >> >> -- >> Richard Holland, BSc MBCS >> Finance Director, Eagle Genomics Ltd >> M: +44 7500 438846 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ > > > > -- > Christian K?berle > Sch?nholzerstr. 5 > 10115 Berlin > Mobil: 0179 79 35 345 > > > > -- > Richard Holland, BSc MBCS > Finance Director, Eagle Genomics Ltd > M: +44 7500 438846 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ >
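The getSymbolMatchType report earlier in this archive boils down to check order: the case-sensitive comparison has to run before the case-insensitive one, or a case-only difference such as "ATGC" vs "aTGC" is classified as an exact match. Here is a minimal standalone sketch of the corrected order. It works on plain strings instead of Symbol objects. `StringSymbolMatcher` and its `MatchKind` enum are hypothetical stand-ins, not the BioJava 3 `Alphabet`/`SymbolMatchType` API. The real enum also has values such as `EXACT_MATCH`, which this sketch does not model.

```java
import java.util.Objects;

/**
 * Hypothetical stand-in for the corrected getSymbolMatchType() check order,
 * using plain strings instead of BioJava Symbol objects.
 */
public class StringSymbolMatcher {

    /** Hypothetical subset of the match categories discussed in the bug report. */
    public enum MatchKind { EXACT_STRING_MATCH, MIXED_CASE_MATCH, MISMATCH }

    public static MatchKind classify(String a, String b) {
        Objects.requireNonNull(a, "a");
        Objects.requireNonNull(b, "b");
        // Case-sensitive test first; with equalsIgnoreCase here (the reported bug),
        // "ATGC" vs "aTGC" would already be returned as an exact string match.
        if (a.equals(b)) {
            return MatchKind.EXACT_STRING_MATCH;
        }
        // Only strings that differ solely in letter case reach this branch.
        if (a.equalsIgnoreCase(b)) {
            return MatchKind.MIXED_CASE_MATCH;
        }
        return MatchKind.MISMATCH;
    }

    public static void main(String[] args) {
        System.out.println(classify("ATGC", "ATGC")); // EXACT_STRING_MATCH
        System.out.println(classify("ATGC", "aTGC")); // MIXED_CASE_MATCH
        System.out.println(classify("ATGC", "ATGG")); // MISMATCH
    }
}
```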