From aradwen at gmail.com Sat May 1 05:42:10 2010
From: aradwen at gmail.com (Radwen Aniba)
Date: Sat, 1 May 2010 11:42:10 +0200
Subject: [Biojava-dev] Parsing PfamScan results
Message-ID:

Hi everyone,

Has anyone already tried to implement a method to parse PfamScan output? It is not too hard to do, and I'm planning to write it in Java, but first I would like to know whether this is already possible with BioJava or whether someone has already done something along these lines. Any pointers would be helpful.

Regards

Radwen

--
R. ANIBA

From andreas at sdsc.edu Mon May 3 00:53:28 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 2 May 2010 21:53:28 -0700
Subject: [Biojava-dev] Parsing PfamScan results
In-Reply-To:
References:
Message-ID:

Hi Radwen,

So far there is no PfamScan parser in BioJava....

Andreas

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From aradwen at gmail.com Mon May 3 00:59:53 2010
From: aradwen at gmail.com (Radwen Aniba)
Date: Mon, 3 May 2010 06:59:53 +0200
Subject: [Biojava-dev] Parsing PfamScan results
In-Reply-To:
References:
Message-ID:

Hi Andreas,

Yes, I searched the BioJava API for something like that and then started to write my own parser in Java. If someone needs it, I can post the code here.
Thx anyway

Rad

--
R. ANIBA

Bioinformatics PhD
Laboratoire de Bioinformatique et Génomique Intégratives,
Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC),
1 rue Laurent Fries,
67404 Illkirch, France.
http://www-igbmc.u-strasbg.fr
http://alnitak.u-strasbg.fr/~aniba/alexsys

From jianjiong.gao at gmail.com Mon May 3 18:00:44 2010
From: jianjiong.gao at gmail.com (Jianjiong Gao)
Date: Mon, 3 May 2010 17:00:44 -0500
Subject: [Biojava-dev] accepted GSoC projects
In-Reply-To:
References: <4BD7B711.9090108@cs.wisc.edu> <6C3A102F-AF2B-4E29-9C84-BB6B881BD083@scripps.edu> <4BD8DC33.7010607@cs.wisc.edu>
Message-ID:

Hi Dr. Prlic and Everyone,

Thanks for the guide. I was able to follow your instructions and am now playing with the code.

I have also created a wiki page for my GSoC project: http://www.biojava.org/wiki/GSoC:PTM. Currently, it is almost the same as my project proposal. I will keep updating this page over the summer to reflect the project progress.
If you have any comments/suggestions, please add them on the wiki page or send them to me or the list.

Best Regards,
-JJ

On Fri, Apr 30, 2010 at 10:29 AM, Andreas Prlic wrote:
> Hi Mark and Jianjiong,
>
> In the meanwhile you should have received your login info for the development
> SVN server. I suggest the following things as next steps:
>
> *) If you have not done so already, sign up to the biojava-l and biojava-dev
> mailing lists
>
> *) Get a biojava checkout from the developmental SVN server.
>
> *) add the LGPL license javadoc header
> http://www.biojava.org/wiki/BioJava3_license to the templates in your IDE.
>
> *) Take a look at the JUnit tests and add a new test for something that is
> related to your projects
>
> *) Take a look at the Wiki pages (e.g.
> http://www.biojava.org/wiki/BioJava:CookBook ), get an account on the wiki
> and improve one of the documentation pages
>
> *) take a look at the javadocs at http://www.biojava.org/docs/api/index.html
>
> Andreas
>

From andreas at sdsc.edu Mon May 3 20:30:09 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 3 May 2010 17:30:09 -0700
Subject: [Biojava-dev] Parsing PfamScan results
In-Reply-To:
References:
Message-ID:

sure, just post the code here, we can do a review...

Andreas

From HWillis at scripps.edu Mon May 3 21:08:34 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Mon, 3 May 2010 21:08:34 -0400
Subject: [Biojava-dev] AbstractSequence
Message-ID: <59BF0994-BF57-4AFB-AA6F-B35A9E310865@scripps.edu>

Andy

I am trying to finish up the code for the GFF parser, where we start with a scaffold/DNA sequence and, by mapping all the various CDS regions, extract the encoded protein sequence, handling negative strand and phase shift.

Each DNASequence can have a collection of genes.
Each gene will have a collection of TranscriptionSequences.
Each TranscriptionSequence will have a collection of CDSSequences regions with strand and phase attributes.
From the CDSSequences owned by the parent GeneSequence we can pull out Intron/Exon sequences by superimposing all CDS regions, which will then form the exon regions. Whatever is not an exon region is an intron region.

As it currently stands, DNASequence actually contains the sequence data, and you can't create a GeneSequence without passing in a parent DNA sequence. GeneSequence, TranscriptSequence and CDSSequences all extend DNASequence but do not have a reference to a backend store, although for all modeling purposes they are DNASequences. When I call getSubSequence(begin,end) on the CDS sequence, we don't handle the case where we need to walk up the parents to find a valid backend store. I should be able to fix it with some minor changes in AbstractSequence, giving AbstractSequence a reference to a possible ParentSequence.

Before making any changes I wanted to make sure you are all checked in so I don't run into major architectural changes on your end. I will be working on this genome-related code for the next 30 days, which will help me allocate time to getting the core architecture fully functional.

Thanks

Scooter

From heuermh at acm.org Tue May 4 01:59:36 2010
From: heuermh at acm.org (Michael Heuer)
Date: Tue, 4 May 2010 01:59:36 -0400 (EDT)
Subject: [Biojava-dev] biojava SVN
In-Reply-To:
Message-ID:

We recently went through the process of syncing a project to the maven central repository using the OSS Repository Nexus installation at Sonatype.

http://code.google.com/p/piccolo2d/issues/detail?id=168

https://docs.sonatype.com/display/NX/OSS+Repository+Hosting

Using Nexus as a repository manager is quite nice (we use it at work too) but syncing to central carries some additional requirements (e.g. all releases must be signed with a PGP key hosted on the MIT keyserver) which take some time to work through.

All biojava dependencies (including maven plugins) must already be available on maven central or we won't be able to sync -- kind of a chicken and egg problem I know.
michael


On Fri, 30 Apr 2010, Richard Holland wrote:

> Could a small MVN repo be set up at OBF?
>
> On 30 Apr 2010, at 16:48, Andy Yates wrote:
>
> > Does anyone know how hard it would be to get these into the public maven repository? The EBI repo is all well & good but updating it relies on BioJava always having a committer at the EBI. Now I know that is a very likely statement but is it something we can rely on?
> >
> > Andy
> >
> > On 30 Apr 2010, at 16:44, Andreas Prlic wrote:
> >
> >> Hi,
> >>
> >> The BioJava SVN has not been fully compiling ever since the Hackathon. I
> >> guess things were quite in flux the last months and it is now time to make
> >> sure SVN fully compiles again. There are a few things we need to figure out
> >> in order for that:
> >>
> >> * Jar files for libraries that are not in a public Maven repository. Jules:
> >> at some point you indicated that we might be able to get such jar files
> >> hosted by the EBI Maven repository. Do you think that is still a
> >> possibility and could you get a few libraries into that? In particular that
> >> would be Jmol, Astex, and probably one or two other Jar files. That would
> >> make the BioJava checkout process much smoother and not require a developer
> >> to manually install jars for full functionality.
> >>
> >> * We have a couple of modules that are fragmented and broken. This is due to
> >> historic leftovers from when we started the re-factoring process. If all the
> >> functionality has been moved into the new biojava3-core module, I would vote
> >> for removing the modules starting with sequence*
> >>
> >> Andreas
> >
> > --
> > Andrew Yates                   Ensembl Genomes Engineer
> > EMBL-EBI                       Tel: +44-(0)1223-492538
> > Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
> > Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

From ayates at ebi.ac.uk Tue May 4 11:25:44 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 4 May 2010 16:25:44 +0100
Subject: [Biojava-dev] AbstractSequence
In-Reply-To: <59BF0994-BF57-4AFB-AA6F-B35A9E310865@scripps.edu>
References: <59BF0994-BF57-4AFB-AA6F-B35A9E310865@scripps.edu>
Message-ID:

Hi Scooter,

So by the sounds of things, what we're really saying here is that we need a way of decorating the sequences to make them behave accordingly, whilst letting someone instantiate a GeneSequence (for example) if required. That sounds a reasonable thing to do and the way that you've plotted it out sounds good as well.

The walking code shouldn't be too hard since each level should be able to delegate if & when required (to all intents & purposes each sequence is a backing store to another sequence).
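The delegation described above can be sketched roughly as follows. This is only a minimal illustration and not the BioJava code: the class `Seq`, its fields and the choice of local 1-based coordinates are hypothetical names and assumptions made for the example. A child sequence holds no residues of its own; it translates a request into its parent's coordinates and passes it up until a sequence that owns the data can answer.

```java
public class ParentWalkSketch {

    /** Hypothetical stand-in for a sequence that may or may not own its residues. */
    static class Seq {
        private final String data;  // non-null only for the sequence that owns the residues
        private final Seq parent;   // null for the top-level (scaffold) sequence
        private final int bioBegin; // 1-based, inclusive, in the parent's coordinates
        private final int bioEnd;   // 1-based, inclusive, in the parent's coordinates

        /** Top-level sequence with its own backing data. */
        Seq(String data) {
            this.data = data;
            this.parent = null;
            this.bioBegin = 1;
            this.bioEnd = data.length();
        }

        /** Child sequence defined as a region of its parent, with no data of its own. */
        Seq(Seq parent, int bioBegin, int bioEnd) {
            this.data = null;
            this.parent = parent;
            this.bioBegin = bioBegin;
            this.bioEnd = bioEnd;
        }

        int length() {
            return bioEnd - bioBegin + 1;
        }

        /** Residues begin..end (1-based, inclusive) in this sequence's local coordinates. */
        String getSubSequence(int begin, int end) {
            if (data != null) {
                // This level owns the residues: convert to Java's 0-based, end-exclusive form.
                return data.substring(begin - 1, end);
            }
            // No local store: shift into the parent's coordinates and delegate upwards.
            int offset = bioBegin - 1;
            return parent.getSubSequence(begin + offset, end + offset);
        }
    }
}
```

With a scaffold `ACGTACGTAC`, a gene defined as scaffold positions 3 to 8 and a CDS defined as gene positions 2 to 4, asking the CDS for its full length walks two levels up before any residues are read.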
So from my side of things I've got some things to check in. I've relaxed the typing rules on Fasta parsing/writing because it wouldn't let you write a Sequence object back out, which is poor (and happened only because of the changes I made). The only other code I have is an implementation of a 2bit sequence storage engine, mostly because my group was trying their best to decipher a 2bit encoded sequence (as in encoded, but not in the UCSC .2bit file format) and I decided to turn our efforts into an example SequenceBackingStore. That said, I'm a bit wary of committing it since it is not something you _need_ and it would therefore be better off in an extensions library (but we don't have one yet). What do you think?

Andy

From HWillis at scripps.edu Tue May 4 11:38:12 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Tue, 4 May 2010 11:38:12 -0400
Subject: [Biojava-dev] AbstractSequence
In-Reply-To:
References: <59BF0994-BF57-4AFB-AA6F-B35A9E310865@scripps.edu>
Message-ID: <7716EA6A-0284-40BA-9BBB-580BD0F335D5@scripps.edu>

Andy

I am knee deep in testing the changes, so don't worry about checking in code; we can sort that out after I have it settled. I am trying to work through whether we need SequenceView as an abstract concept, since I think I have now built that into the default behavior based on the parent/child relationships in the sequences. You can get a sub sequence of a DNA sequence of any type, and it will have a start and end based on the parent sequence. It won't currently work if you want to use relative indexing on a CDS region that is defined by an index relative to some high level parent DNA scaffold sequence. It is easy enough to add relative indexing. Not sure if SequenceView will be required.

I would like to be in a position where you can have a CDS sequence and ask for intron sequence data relative to the start of the CDS sequence with local coordinates. It gets complicated with the negative strand where you want 3' or 5' data, but if I close my eyes and put the code in the correct spot it should work.
If we get all the relationships correct and name the method calls correctly, it should be straightforward. I also changed the getBegin and getEnd methods to getBioBegin() and getBioEnd() to be very clear we are using 1 based indexing for all APIs.

Scooter

From ayates at ebi.ac.uk Tue May 4 12:10:35 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 4 May 2010 17:10:35 +0100
Subject: [Biojava-dev] AbstractSequence
In-Reply-To: <7716EA6A-0284-40BA-9BBB-580BD0F335D5@scripps.edu>
References: <59BF0994-BF57-4AFB-AA6F-B35A9E310865@scripps.edu> <7716EA6A-0284-40BA-9BBB-580BD0F335D5@scripps.edu>
Message-ID: <346E0954-4EF9-4167-B050-D8608451857D@ebi.ac.uk>

Okay, I'll go ahead with the commit then and try to make sure it works as it gets committed (to make sure I'm not going to annoy Andreas).

If that kind of functionality is baked into the Sequence implementation then I agree there should be no reason why we should need SequenceView. The only reason I think it existed originally was to provide a quick way of returning sub-Sequences with relative indexes. The only thing to be aware of is that I extended ComplementSequenceView and ReversedSequenceView from the abstract classes. These should still exist in some capacity; however, their implementation means it would be easy to build a version which is _just_ a Sequence (there's nothing fancy it requires WRT SequenceViews, just that it is a decorated sequence).

Wow, okay, evil stuff but yeah, great if you can do it. Don't forget that we've got to think about circular coordinates as well. So long as we can bake it in early it should be fine ... or we offer a CircularChromosome class (which would be used for mitochondria & plasmids amongst other things).

Okay, I'm not a fan of that kind of naming of coordinate methods but I know why you're doing it :). It does make it clearer ... so long as there is nothing that lets you index in 0 base then I'm happy. If you're working with sequence then you should only ever be working in 1 based indexing.
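To make the two coordinate points above concrete, here is a small sketch of 1 based, inclusive indexing on a linear sequence and on a circular one. It is an illustration under stated assumptions, not BioJava code: the class `BioIndexSketch` and its methods are hypothetical, and treating "end smaller than begin" as "wraps past the origin" is just one possible convention for circular molecules.

```java
public class BioIndexSketch {

    /** Residues bioBegin..bioEnd, 1-based and inclusive, on a linear sequence. */
    static String linear(String seq, int bioBegin, int bioEnd) {
        if (bioBegin < 1 || bioEnd > seq.length() || bioBegin > bioEnd) {
            throw new IndexOutOfBoundsException(
                    "Invalid 1-based range " + bioBegin + ".." + bioEnd);
        }
        // Java strings are 0-based and end-exclusive, so only the start shifts by one.
        return seq.substring(bioBegin - 1, bioEnd);
    }

    /**
     * Same range on a circular sequence. Assumption for this sketch: when bioEnd is
     * smaller than bioBegin, the region runs to the last residue and continues from 1.
     */
    static String circular(String seq, int bioBegin, int bioEnd) {
        int n = seq.length();
        if (bioBegin < 1 || bioBegin > n || bioEnd < 1 || bioEnd > n) {
            throw new IndexOutOfBoundsException(
                    "Invalid 1-based range " + bioBegin + ".." + bioEnd);
        }
        if (bioBegin <= bioEnd) {
            return linear(seq, bioBegin, bioEnd);
        }
        return linear(seq, bioBegin, n) + linear(seq, 1, bioEnd);
    }
}
```

Because position 0 is rejected outright, there is no way to slip a 0 based index through either method, which is the property asked for above.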
Andy

From ayates at ebi.ac.uk Tue May 4 12:14:53 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 4 May 2010 17:14:53 +0100
Subject: [Biojava-dev] biojava SVN
In-Reply-To:
References:
Message-ID: <42C46DBC-0290-4E4E-96A1-361BF1E6ED94@ebi.ac.uk>

Well, it's not too bad a chicken & egg situation really. Looking at the Maven file, the current BioJava has only 1 dependency on a non-Mavenized project, which is bytecode.jar. I can have a look at doing this for 1.7 if people don't mind.

Andy

From ayates at ebi.ac.uk Tue May 4 12:22:57 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 4 May 2010 17:22:57 +0100
Subject: [Biojava-dev] AbstractSequence
In-Reply-To: <7716EA6A-0284-40BA-9BBB-580BD0F335D5@scripps.edu>
References: <59BF0994-BF57-4AFB-AA6F-B35A9E310865@scripps.edu> <7716EA6A-0284-40BA-9BBB-580BD0F335D5@scripps.edu>
Message-ID: <39D33BC6-A73F-49D4-92F2-5258A024E94B@ebi.ac.uk>

I should also mention the appearance of what I've called (badly) mixins, which are static methods to be used in different sequence implementations without having to inherit the code from an abstract class.
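As a rough picture of that idea, a helper of this kind could look like the sketch below. It is hypothetical and not the code being checked in: the class name `SequenceMixinSketch`, the method `countCompounds` and the use of a plain `Iterable` in place of a Sequence type are all assumptions made for the example. The point is only that any sequence implementation can call the static method without sharing a base class.

```java
import java.util.Arrays;
import java.util.List;

public final class SequenceMixinSketch {

    private SequenceMixinSketch() {
        // static helpers only, never instantiated
    }

    /** Counts how many elements of the sequence equal any of the given compounds. */
    @SafeVarargs
    static <C> int countCompounds(Iterable<C> sequence, C... compounds) {
        List<C> wanted = Arrays.asList(compounds);
        int count = 0;
        for (C compound : sequence) {
            if (wanted.contains(compound)) {
                count++;
            }
        }
        return count;
    }
}
```

Anything that can be iterated compound by compound can reuse the helper, for example to count G and C in a list of bases.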
This is quite limited in what it brings but is a far better place for cross-cutting Sequence methods to go, e.g. counting the number of times a Compound appears in a Sequence. The name is poor but I think what they bring makes them worth keeping. Andy On 4 May 2010, at 16:38, Scooter Willis wrote: > Andy > > I am knee deep in testing the changes so don't worry about check in of code and we can sort that out after I have it settled. I am trying to work myself through the need of SequenceView as an abstract concept where I think I have now built that into the default behavior based on parent-child relationships in the sequences. You can get a sub sequence of a DNA sequence of any type where it will have a start and end based on parent sequence. Won't work currently if you want to use relative indexing on a CDS region that is defined as an index relative to some high level parentDNA scaffold sequence. Easy enough to add relative indexing. Not sure if SequenceView will be required. > > I would like to be in a position that you can have a CDS sequence and ask for intron sequence data relative to the start of the CDS sequence with local coordinates. Gets complicated with negative strand where you want 3' or 5' data but if I close my eyes and put the code in the correct spot it should work. If we get all the relationships correct and name the method calls correctly it should be straightforward. I also changed the getBegin and getEnd methods to getBioBegin() and getBioEnd() to be very clear we are using 1-based indexing for all APIs. > > Scooter > > > On May 4, 2010, at 11:25 AM, Andy Yates wrote: > >> Hi Scooter, >> >> So by the sounds of things really we're saying here that we need a way of decorating the sequences to make them behave accordingly whilst letting someone instantiate an instance of a GeneSequence (for example) if required. That sounds a reasonable thing to do and the way that you've plotted it out sounds good as well.
>> >> The walking code shouldn't be too hard since each level should be able to delegate if & when required (to all intents & purposes each sequence is a backing store to another sequence). >> >> So from my side of things I've got some things to check in. I've relaxed the typing rules on Fasta parsing/writing because it wouldn't let you write a Sequence object back out which is poor (and happened only because of the changes I made). The only other code I have is an implementation of a 2bit sequence storage engine. Mostly because my group was trying their best to decipher a 2bit encoded sequence (as in encoded but not in the UCSC .2bit file format) and I decided to take our efforts into an example SequenceBackingStore. That said I'm a bit wary of committing it in since it is not something you _need_ and therefore would be better going into an extensions library (but we don't have one yet). What do you think? >> >> Andy >> >> On 4 May 2010, at 02:08, Scooter Willis wrote: >> >>> Andy >>> >>> Trying to finish up the code for the gff parser where we start with a scaffold/dna sequence and by mapping all the various CDS regions we can extract the encoded protein sequence handling negative strand and phase shift. >>> >>> Each DNASequence can have a collection of genes. >>> Each gene will have a collection of TranscriptionSequences >>> Each TranscriptionSequence will have a collection of CDSSequences regions with strand and phase attributes. >>>> From the CDSSequences owned by the parent GeneSequence we can pull out Intron/Exon sequences by superimposing all CDS regions which will then form an exon region. If not an exon region then the remainder is intron regions. >>> >>> As it currently stands DNASequence would actually contain the sequence data where you can't create a GeneSequence without passing in a parent DNA sequence. 
GeneSequence,TranscriptSequence and CDSSequences all extend DNASequence but do not have a reference to backend store but for all modeling purposes they are DNASequences. When I call getSubSequence(begin,end) for the CDS sequence we don't handle the case where we will walk up the parents to find a valid backend store. I should be able to fix it with some minor changes in AbstractSequence and giving AbstractSequence a reference to a possible ParentSequence. >>> >>> Before making any changes I wanted to make sure you are all checked in so I don't run into major architectural changes on your end. I will be working on this genome related code for the next 30 days which will help me allocate to getting the core architecture full functional. >>> >>> Thanks >>> >>> Scooter >> >> -- >> Andrew Yates Ensembl Genomes Engineer >> EMBL-EBI Tel: +44-(0)1223-492538 >> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 >> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ >> >> >> >> > -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From jacobsen at ebi.ac.uk Tue May 4 13:12:35 2010 From: jacobsen at ebi.ac.uk (Jules Jacobsen) Date: Tue, 04 May 2010 18:12:35 +0100 Subject: [Biojava-dev] Parsing PfamScan results In-Reply-To: References: Message-ID: <4BE05583.3050101@ebi.ac.uk> Is this the stockholm format you're talking about? If so I've partly written one for BioJava3 although came a bit unstuck with the multiple sequence alignment sequence and the new Sequence class. I need to reply to Scooter and Andy's thread regarding the Sequence stuff though. Sorry - got to get the bus now...! Jules On 04/05/2010 01:30, Andreas Prlic wrote: > sure, just post the code here, we can do a review... 
> > Andreas > > On Sun, May 2, 2010 at 9:59 PM, Radwen Aniba wrote: > >> Hi Anreas, >> >> Yes I've parsed Biojava API looking for something like that but started to >> write my own parser in Java, may be if someone could need it I can write a >> message with a code here. >> >> Thx anyway >> >> Rad >> >> 2010/5/3 Andreas Prlic >> >> Hi Radwen, >>> >>> So far there is no PfamScan parser in BioJava.... >>> >>> Andreas >>> >>> On Sat, May 1, 2010 at 2:42 AM, Radwen Aniba wrote: >>> >>>> Hi everyone, >>>> >>>> Do anyone already tried to implement a method to parse PfamScan outputs ? >>>> That's not too hard to do but I'm planning to write it in Java, but >>>> before I >>>> ignore if this is possible with Biojava or if someone already did >>>> somthing >>>> in that sense, it would be helpful. >>>> >>>> Regards >>>> >>>> Radwen >>>> >>>> -- >>>> R. ANIBA >>>> _______________________________________________ >>>> biojava-dev mailing list >>>> biojava-dev at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >>>> >>> >>> >>> >>> -- >>> ----------------------------------------------------------------------- >>> Dr. Andreas Prlic >>> Senior Scientist, RCSB PDB Protein Data Bank >>> University of California, San Diego >>> (+1) 858.246.0526 >>> ----------------------------------------------------------------------- >>> >> >> >> >> -- >> R. ANIBA >> >> Bioinformatics PhD >> Laboratoire de Bioinformatique et Génomique Intégratives, >> Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), >> 1 rue Laurent Fries, >> 67404 Illkirch, France.
>> http://www-igbmc.u-strasbg.fr >> http://alnitak.u-strasbg.fr/~aniba/alexsys >> > > > From andreas.prlic at gmail.com Tue May 4 16:12:12 2010 From: andreas.prlic at gmail.com (Andreas Prlic) Date: Tue, 4 May 2010 13:12:12 -0700 Subject: [Biojava-dev] biojava SVN In-Reply-To: <42C46DBC-0290-4E4E-96A1-361BF1E6ED94@ebi.ac.uk> References: <42C46DBC-0290-4E4E-96A1-361BF1E6ED94@ebi.ac.uk> Message-ID: The answer how to get non-mavenized Jars out is probably a simple one: here how to set up our own repository. I will look at setting one up at one of the OBF servers. This can then be pointed to from our pom files... http://steve-on-sakai.blogspot.com/2009/12/setting-up-your-own-maven-repository.html Andreas On Tue, May 4, 2010 at 9:14 AM, Andy Yates wrote: > Well it's too bad a chicken & egg situation really. Looking at the Maven file the current BioJava has only 1 dependency on a non-Mavenized project which is bytecode.jar. > > I can have a look at doing this for 1.7 if people don't mind. > > Andy > > On 4 May 2010, at 06:59, Michael Heuer wrote: > >> >> We recently went through the process of syncing a project to maven central >> repository using the OSS Repository Nexus installation at Sonatype. >> >> http://code.google.com/p/piccolo2d/issues/detail?id=168 >> >> https://docs.sonatype.com/display/NX/OSS+Repository+Hosting >> >> Using Nexus as a repository manager is quite nice (we use it at work too) >> but syncing to central carries some additional requirements (e.g. all >> releases must be signed with a PGP key hosted on the MIT keyserver) which >> take some time to work through. >> >> All biojava dependencies (including maven plugins) must already be >> available on maven central or we won't be able to sync -- kind of a >> chicken and egg problem I know. >> >> ? michael >> >> >> On Fri, 30 Apr 2010, Richard Holland wrote: >> >>> Could a small MVN repo be set up at OBF? 
>>> >>> On 30 Apr 2010, at 16:48, Andy Yates wrote: >>> >>>> Does anyone know how hard it would be to get these into the public maven repository? The EBI repo is all well & good but updating it relies on BioJava always having a committer at the EBI. Now I know that is a very likely statement but is it something we can rely on? >>>> >>>> Andy >>>> >>>> On 30 Apr 2010, at 16:44, Andreas Prlic wrote: >>>> >>>>> Hi, >>>>> >>>>> The BioJava SVN has not been fully compiling ever since the Hackathon. I >>>>> guess things were quite in flux the last months and it is now time to make >>>>> sure SVN fully compiles again. ?There is a few things we need to figure out >>>>> in order for that: >>>>> >>>>> * Jar files for libraries that are not in a public Maven repository. Jules : >>>>> at some point you indicated that we might be able to get such jar files >>>>> hosted by the EBI Maven repository. Do you think that is still an >>>>> possibility and could you get a few libraries into that? In particular that >>>>> would be Jmol, Astex, and probably one or two other Jar files. That would >>>>> make the BioJava checkout process much smoother and not require a developer >>>>> to manually install jars for full functionality. >>>>> >>>>> * We have a couple of modules that are fragmented and broken. This is due to >>>>> historic leftovers from when we started the re-factoring process. If all the >>>>> functionality has been moved into the new biojava3-core module, I would vote >>>>> for removing the modules starting with sequence* >>>>> >>>>> Andreas >>>>> >>>>> >>>>> -- >>>>> ----------------------------------------------------------------------- >>>>> Dr. 
Andreas Prlic >>>>> Senior Scientist, RCSB PDB Protein Data Bank >>>>> University of California, San Diego >>>>> (+1) 858.246.0526 >>>>> ----------------------------------------------------------------------- >>>>> _______________________________________________ >>>>> biojava-dev mailing list >>>>> biojava-dev at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >>>> >>>> -- >>>> Andrew Yates Ensembl Genomes Engineer >>>> EMBL-EBI Tel: +44-(0)1223-492538 >>>> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 >>>> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> biojava-dev mailing list >>>> biojava-dev at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >>> >>> -- >>> Richard Holland, BSc MBCS >>> Operations and Delivery Director, Eagle Genomics Ltd >>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >>> http://www.eaglegenomics.com/ >>> >>> >>> _______________________________________________ >>> biojava-dev mailing list >>> biojava-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >>> > > -- > Andrew Yates Ensembl Genomes Engineer > EMBL-EBI Tel: +44-(0)1223-492538 > Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 > Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ > > > > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > -- ----------------------------------------------------------------------- Dr.
Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From heuermh at acm.org Tue May 4 23:22:09 2010 From: heuermh at acm.org (Michael Heuer) Date: Tue, 4 May 2010 23:22:09 -0400 (EDT) Subject: [Biojava-dev] biojava SVN In-Reply-To: Message-ID: On Tue, 4 May 2010, Andreas Prlic wrote: > The answer how to get non-mavenized Jars out is probably a simple one: > > here how to set up our own repository. I will look at setting one up > at one of the OBF servers. This can then be pointed to from our pom > files... > > http://steve-on-sakai.blogspot.com/2009/12/setting-up-your-own-maven-repository.html Right, but that helps only temporarily. We can't have any non-maven central repositories or any non-maven central repo dependencies if you want to rsync a local repo to maven central. If we are setting up a local repo, I would recommend installing Nexus. michael From andreas.prlic at gmail.com Wed May 5 00:38:45 2010 From: andreas.prlic at gmail.com (Andreas Prlic) Date: Tue, 4 May 2010 21:38:45 -0700 Subject: [Biojava-dev] biojava SVN In-Reply-To: References:

Message-ID: > Right, but that helps only temporarily. We can't have any non-maven > central repositories or any non-maven central repo dependencies if you > want to rsync a local repo to maven central. > Ok. Is it at all possible to upload 3rd party jars to the central repo? > If we are setting up a local repo, I would recommend installing Nexus. > I will try to talk to the BioMoby guys about what they are using for their repository... In the meanwhile I started to play around with an biojava specific repository... http://www.biojava.org/download/maven/ Andreas From ayates at ebi.ac.uk Wed May 5 03:59:52 2010 From: ayates at ebi.ac.uk (Andy Yates) Date: Wed, 5 May 2010 08:59:52 +0100 Subject: [Biojava-dev] biojava SVN In-Reply-To: References:

Message-ID: Which 3rd party jars? Looking at the dependency list of the core 3rd party dependencies which were not in maven central was 0 (apart from bytecode). Andy On 5 May 2010, at 05:38, Andreas Prlic wrote: >> Right, but that helps only temporarily. We can't have any non-maven >> central repositories or any non-maven central repo dependencies if you >> want to rsync a local repo to maven central. >> > > Ok. Is it at all possible to upload 3rd party jars to the central repo? > > >> If we are setting up a local repo, I would recommend installing Nexus. >> > > I will try to talk to the BioMoby guys about what they are using for their > repository... > > In the meanwhile I started to play around with an biojava specific > repository... > http://www.biojava.org/download/maven/ > > Andreas > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From andreas.prlic at gmail.com Wed May 5 11:22:40 2010 From: andreas.prlic at gmail.com (Andreas Prlic) Date: Wed, 5 May 2010 08:22:40 -0700 Subject: [Biojava-dev] biojava SVN In-Reply-To: References:

Message-ID: Hi Andy, I am talking about biojava3: the structure-gui module requires a visualization library (Jmol) and phylo depends on forester... Andreas On Wed, May 5, 2010 at 12:59 AM, Andy Yates wrote: > Which 3rd party jars? Looking at the dependency list of the core 3rd party dependencies which were not in maven central was 0 (apart from bytecode). > > Andy > > On 5 May 2010, at 05:38, Andreas Prlic wrote: > >>> Right, but that helps only temporarily. We can't have any non-maven >>> central repositories or any non-maven central repo dependencies if you >>> want to rsync a local repo to maven central. >>> >> >> Ok. Is it at all possible to upload 3rd party jars to the central repo? >> >> >>> If we are setting up a local repo, I would recommend installing Nexus. >>> >> >> I will try to talk to the BioMoby guys about what they are using for their >> repository... >> >> In the meanwhile I started to play around with an biojava specific >> repository... >> http://www.biojava.org/download/maven/ >> >> Andreas >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > > -- > Andrew Yates Ensembl Genomes Engineer > EMBL-EBI Tel: +44-(0)1223-492538 > Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 > Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ > > > > > -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From ayates at ebi.ac.uk Wed May 5 11:33:18 2010 From: ayates at ebi.ac.uk (Andy Yates) Date: Wed, 5 May 2010 16:33:18 +0100 Subject: [Biojava-dev] biojava SVN In-Reply-To: References:

Message-ID: Ah fair enough. So Jmol should be okay so long as we can get the okay from those guys. Forester is a bit harder since there is no official release of that code. However I thought that code base had been committed into our repo. Andy On 5 May 2010, at 16:22, Andreas Prlic wrote: > Hi Andy, > > I am talking about biojava3: the structure-gui module requires a > visualization library (Jmol) and phylo depends on forester... > > Andreas > > On Wed, May 5, 2010 at 12:59 AM, Andy Yates wrote: >> Which 3rd party jars? Looking at the dependency list of the core 3rd party dependencies which were not in maven central was 0 (apart from bytecode). >> >> Andy >> >> On 5 May 2010, at 05:38, Andreas Prlic wrote: >> >>>> Right, but that helps only temporarily. We can't have any non-maven >>>> central repositories or any non-maven central repo dependencies if you >>>> want to rsync a local repo to maven central. >>>> >>> >>> Ok. Is it at all possible to upload 3rd party jars to the central repo? >>> >>> >>>> If we are setting up a local repo, I would recommend installing Nexus. >>>> >>> >>> I will try to talk to the BioMoby guys about what they are using for their >>> repository... >>> >>> In the meanwhile I started to play around with an biojava specific >>> repository... >>> http://www.biojava.org/download/maven/ >>> >>> Andreas >>> _______________________________________________ >>> biojava-dev mailing list >>> biojava-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> >> -- >> Andrew Yates Ensembl Genomes Engineer >> EMBL-EBI Tel: +44-(0)1223-492538 >> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 >> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ >> >> >> >> >> > > > > -- > ----------------------------------------------------------------------- > Dr. 
Andreas Prlic > Senior Scientist, RCSB PDB Protein Data Bank > University of California, San Diego > (+1) 858.246.0526 > ----------------------------------------------------------------------- -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From HWillis at scripps.edu Wed May 5 11:51:39 2010 From: HWillis at scripps.edu (Scooter Willis) Date: Wed, 5 May 2010 11:51:39 -0400 Subject: [Biojava-dev] biojava SVN In-Reply-To: References:

Message-ID: <089AD916-7C05-4E82-9A8A-1F74435ED7E9@scripps.edu> I will send an email off to Forester developer to see if he wants to leverage the work where he can indicate the location for the maven repository. It would be nice to get him in the officially in the BioJava family. That way he may be willing to support and update it. It would be nice to not have "source" code for forester in our build unless we plan on doing some work on the code. The code is very solid and feature complete so I don't imagine we will have that requirement. Scooter On May 5, 2010, at 11:33 AM, Andy Yates wrote: > Ah fair enough. So Jmol should be okay so long as we can get the okay from those guys. Forester is a bit harder since there is no official release of that code. However I thought that code base had been committed into our repo. > > Andy > > On 5 May 2010, at 16:22, Andreas Prlic wrote: > >> Hi Andy, >> >> I am talking about biojava3: the structure-gui module requires a >> visualization library (Jmol) and phylo depends on forester... >> >> Andreas >> >> On Wed, May 5, 2010 at 12:59 AM, Andy Yates wrote: >>> Which 3rd party jars? Looking at the dependency list of the core 3rd party dependencies which were not in maven central was 0 (apart from bytecode). >>> >>> Andy >>> >>> On 5 May 2010, at 05:38, Andreas Prlic wrote: >>> >>>>> Right, but that helps only temporarily. We can't have any non-maven >>>>> central repositories or any non-maven central repo dependencies if you >>>>> want to rsync a local repo to maven central. >>>>> >>>> >>>> Ok. Is it at all possible to upload 3rd party jars to the central repo? >>>> >>>> >>>>> If we are setting up a local repo, I would recommend installing Nexus. >>>>> >>>> >>>> I will try to talk to the BioMoby guys about what they are using for their >>>> repository... >>>> >>>> In the meanwhile I started to play around with an biojava specific >>>> repository... 
>>>> http://www.biojava.org/download/maven/ >>>> >>>> Andreas >>>> _______________________________________________ >>>> biojava-dev mailing list >>>> biojava-dev at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >>> >>> -- >>> Andrew Yates Ensembl Genomes Engineer >>> EMBL-EBI Tel: +44-(0)1223-492538 >>> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 >>> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ >>> >>> >>> >>> >>> >> >> >> >> -- >> ----------------------------------------------------------------------- >> Dr. Andreas Prlic >> Senior Scientist, RCSB PDB Protein Data Bank >> University of California, San Diego >> (+1) 858.246.0526 >> ----------------------------------------------------------------------- > > -- > Andrew Yates Ensembl Genomes Engineer > EMBL-EBI Tel: +44-(0)1223-492538 > Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 > Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ > > > > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From bugzilla-daemon at portal.open-bio.org Sun May 9 18:27:53 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 9 May 2010 18:27:53 -0400 Subject: [Biojava-dev] [Bug 2854] Selection of protein alphabet is hardcoded in ProteinTools class In-Reply-To: Message-ID: <201005092227.o49MRr0q009095@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2854 kmjuris at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon May 10 21:29:00 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 May 2010 21:29:00 -0400 Subject: [Biojava-dev] [Bug 2854] Selection of protein alphabet is hardcoded in ProteinTools class In-Reply-To: Message-ID: <201005110129.o4B1T0Mj028287@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2854 ------- Comment #1 from kmjuris at gmail.com 2010-05-10 21:28 EST ------- Created an attachment (id=1499) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1499&action=view) Unit Tests The attached Unit Tests test the proposed ProteinTools fix. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon May 10 21:31:33 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 May 2010 21:31:33 -0400 Subject: [Biojava-dev] [Bug 2854] Selection of protein alphabet is hardcoded in ProteinTools class In-Reply-To: Message-ID: <201005110131.o4B1VXag028435@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2854 ------- Comment #2 from kmjuris at gmail.com 2010-05-10 21:31 EST ------- Created an attachment (id=1500) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1500&action=view) Proposed fix to ProteinTools The proposed ProteinTools fix does not hardcode the selection of the ProteinTools.getTAlphabet() when calling createProtein(). Rather, the fix determines the appropriate alphabet based on theProtein string input. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
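The attachment for Bug 2854 is not reproduced in this archive, so the following is only a minimal sketch of the idea stated in Comment #2: pick the protein alphabet from the input string instead of hardcoding one. It assumes the deciding factor is whether the string contains a termination symbol `*`; that rule, the class name AlphabetChooser, the enum ProteinAlphabet and the method chooseAlphabet() are all hypothetical and are not part of BioJava or of the proposed patch.

```java
public class AlphabetChooser {

    /** Hypothetical stand-ins for the two protein alphabets; not BioJava types. */
    enum ProteinAlphabet { PROTEIN, PROTEIN_TERM }

    /**
     * Chooses an alphabet from the residues themselves.
     * Assumption: a '*' in the input means the termination-aware alphabet is needed.
     */
    static ProteinAlphabet chooseAlphabet(String residues) {
        if (residues == null) {
            throw new IllegalArgumentException("residues must not be null");
        }
        return residues.indexOf('*') >= 0
                ? ProteinAlphabet.PROTEIN_TERM
                : ProteinAlphabet.PROTEIN;
    }

    public static void main(String[] args) {
        // A plain peptide and one carrying a termination symbol.
        System.out.println(chooseAlphabet("MKVLA"));
        System.out.println(chooseAlphabet("MKVLA*"));
    }
}
```

In a real fix the returned choice would presumably be mapped onto the corresponding ProteinTools alphabet before the symbol list is built; that mapping is left out here because the patch itself is not shown.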
From trevor.paterson at roslin.ed.ac.uk Tue May 11 10:45:43 2010 From: trevor.paterson at roslin.ed.ac.uk (PATERSON Trevor) Date: Tue, 11 May 2010 15:45:43 +0100 Subject: [Biojava-dev] DNASequence not being a bean Message-ID: <428F360A0BA69249B73A710B58435CFE04DBD60305@EXSEEDMD.is.ed.ac.uk> Hi as you may be aware I am working with Andy Law at Roslin to kick off development of a Java version of the ensembl-api. It makes good sense for us to integrate with the new Bio-Java code, however we have a few fundamental issues with the immutability of Sequence objects. Because the BioJava Sequence objects require initialization at construction time with the actual sequence we will have problems using Ibatis mapping to create Sequences from SQL queries, and we will not easily be able to use LazyLoad to fetch the sequence only when we need it. Ibatis uses bean setters to set properties on beans, which must have an empty constructor - so setSequence(String seqString, CompoundSet compoundSet) and setSequence(SequenceProxyLoader proxyLoader, CompoundSet compoundSet) would be very useful alternatively we could hack round this if we could access the backing store in our own subclasses of DNASequence - i.e give the Sequence properties 'protected' rather than 'private' visibility. In essence what we are wanting to do is implement DatasourceAware subclasses of BioJava DNASequence, which we can retrieve partially filled in from Ensembl, but only retrieve and set the actual DNASequence by LazyLoad when we want the DNA Sequence. It may be that we can implement this lazy loading using extensions of the SequenceProxyLoader interface (I am guessing that this is what it is for) but again we'd still benefit from accessing the backing store directly. Obviously I am not up-to-speed with the design ideas behind the DNASequence Object and maybe I am barking up the wrong tree in trying to subclass it to my own ends, so any hints and tips would be most welcome. 
cheers for now Trevor Paterson PhD new email trevor.paterson at roslin.ed.ac.uk Bioinformatics The Roslin Institute The Royal (Dick) School of Veterinary Studies University of Edinburgh Scotland EH25 9PS phone +44 (0)131 5274197 http://www.roslin.ed.ac.uk http://www.resspecies.org http://www.thearkdb.org -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From HWillis at scripps.edu Tue May 11 12:03:11 2010 From: HWillis at scripps.edu (Scooter Willis) Date: Tue, 11 May 2010 12:03:11 -0400 Subject: [Biojava-dev] DNASequence not being a bean In-Reply-To: <428F360A0BA69249B73A710B58435CFE04DBD60305@EXSEEDMD.is.ed.ac.uk> References: <428F360A0BA69249B73A710B58435CFE04DBD60305@EXSEEDMD.is.ed.ac.uk> Message-ID: <01F954F7-9999-4292-98AC-B567E722DCD0@scripps.edu> Trevor Andy Yates and I are knee deep in this at the moment and about to do a code check in to help clarify some of the concepts with classes that have better descriptive names. We will send out an email when we have all the issues resolved. SequenceProxyLoader concept has a name change to SequenceProxyReader to help leverage the abstract concepts of InputStreams or File Readers in Java. We realize it is confusing at the moment and I am working on examples to make it a little clearer. I had an internal deadline for a project where I needed the code so haven't had time to do test cases and give examples. That deadline was this morning so now I can get back to balanced programming and get this finalized. The backing store and Sequence Reader are interfaces to the same concept. The Sequence is stored either on disk, as objects, in a string, at uniprot, in a database etc where the AbstractSequence doesn't need to know those details. Andy Yates is working on the storage mechanism to allow edits etc so he owns that portion of the code. I agree having an empty constructor is important to be a proper Bean. 
It does however place an additional programming contract burden on developers that are just getting started: the object is not valid if you don't call setSequence(). Easy enough to make the change if we don't get any feedback arguing against. I have added in parent-child relationships where you can't create a TranscriptSequence on its own, as it must be a child of a GeneSequence, which must be a child of a DNASequence. Only the DNASequence has a constructor exposed to pass in the actual Sequence Data. Indexing for features is relative to the sequence storage. Instead of having Features as place holders we are trying to model the relationships of a Feature as a class with methods that correspond to the Feature. I don't want to expose the empty constructor on these types of sequences because I want to enforce the relationships; otherwise lots of checking code has to go into the base classes. For example, when you add a CDS feature to a TranscriptSequence you get back a CDSSequence; when you actually ask for the sequence as a string, the underlying code will use the parent-child relationships to find the parent with the backing store/Sequence reader and get the sub-sequence. Of course you can pass in a null sequence as the parent but I can also defend against that in one spot to enforce a contract for what is expected. Can you describe the scenario where you would create Sequences and then at some point in the future give the actual SequenceSQLReader implementation to load the data? Your SequenceSQLReader could load the actual data at the time of instantiation, or all calls that actually consume data could make a check/call to the lazyload/init. Lots of ways to optimize based on your use case. In one example I use if(isInit() == false) init() in each of the methods, which makes the lazy loading fairly easy.
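The lazy-loading guard described above could look roughly like the sketch below. It is not the BioJava reader interface: the class LazySequenceReader, its Supplier-based loader and the getLoadCount() helper are hypothetical names used only for illustration, and the loader stands in for whatever SQL query or file read a real implementation would perform.

```java
import java.util.function.Supplier;

public class LazySequenceReader {

    private final Supplier<String> loader; // hypothetical: would wrap a SQL query or file read
    private String residues;               // stays null until the first consuming call
    private boolean initialised = false;
    private int loadCount = 0;             // only here to show the loader runs once

    public LazySequenceReader(Supplier<String> loader) {
        this.loader = loader;
    }

    private boolean isInit() {
        return initialised;
    }

    private void init() {
        residues = loader.get();
        initialised = true;
        loadCount++;
    }

    /** Every method that consumes data starts with the same guard. */
    public int getLength() {
        if (!isInit()) init();
        return residues.length();
    }

    /** Uses 1-based, inclusive coordinates, in the spirit of getBioBegin()/getBioEnd(). */
    public String getSubSequence(int bioBegin, int bioEnd) {
        if (!isInit()) init();
        return residues.substring(bioBegin - 1, bioEnd);
    }

    public int getLoadCount() {
        return loadCount;
    }

    public static void main(String[] args) {
        LazySequenceReader reader = new LazySequenceReader(() -> "ATGCCGTA");
        System.out.println(reader.getLoadCount());       // nothing loaded yet
        System.out.println(reader.getSubSequence(2, 4)); // first consuming call triggers the load
        System.out.println(reader.getLength());
        System.out.println(reader.getLoadCount());       // loader has run exactly once
    }
}
```

The guard is not thread-safe as written; a reader shared between threads would need synchronisation around init().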
If you are working with sequence data at the chromosome level then it is optimal to send all calls to sql server because you are typically grabbing sub-sequence data that covers a feature so you don't take the local memory hit or load time of pulling all the data. The goal was to make it flexible and I think it will work for almost any case except for the one you pointed out where having an empty constructor for integration with bean aware tools. Scooter On May 11, 2010, at 10:45 AM, PATERSON Trevor wrote: > Hi > > as you may be aware I am working with Andy Law at Roslin to kick off development of a Java version of the ensembl-api. > > It makes good sense for us to integrate with the new Bio-Java code, however we have a few fundamental issues with the immutability of Sequence objects. > > Because the BioJava Sequence objects require initialization at construction time with the actual sequence we will have problems using Ibatis mapping to create Sequences from SQL queries, and we will not easily be able to use LazyLoad to fetch the sequence only when we need it. > > Ibatis uses bean setters to set properties on beans, which must have an empty constructor > > - so setSequence(String seqString, CompoundSet compoundSet) > and setSequence(SequenceProxyLoader proxyLoader, CompoundSet compoundSet) would be very useful > > alternatively we could hack round this if we could access the backing store in our own subclasses of DNASequence - i.e give the Sequence properties 'protected' rather than 'private' visibility. > > > In essence what we are wanting to do is implement DatasourceAware subclasses of BioJava DNASequence, which we can retrieve partially filled in from Ensembl, but only retrieve and set the actual DNASequence by LazyLoad when we want the DNA Sequence. 
> > It may be that we can implement this lazy loading using extensions of the SequenceProxyLoader interface (I am guessing that this is what it is for) but again we'd still benefit from accessing the backing store directly. > > Obviously I am not up-to-speed with the design ideas behind the DNASequence Object and maybe I am barking up the wrong tree in trying to subclass it to my own ends, so any hints and tips would be most welcome. > > cheers for now > > Trevor Paterson PhD > new email trevor.paterson at roslin.ed.ac.uk > > Bioinformatics > The Roslin Institute > The Royal (Dick) School of Veterinary Studies > University of Edinburgh > Scotland EH25 9PS > phone +44 (0)131 5274197 > http://www.roslin.ed.ac.uk > http://www.resspecies.org > http://www.thearkdb.org > > > > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From holland at eaglegenomics.com Tue May 11 11:58:37 2010 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 11 May 2010 16:58:37 +0100 Subject: [Biojava-dev] DNASequence not being a bean In-Reply-To: <428F360A0BA69249B73A710B58435CFE04DBD60305@EXSEEDMD.is.ed.ac.uk> References: <428F360A0BA69249B73A710B58435CFE04DBD60305@EXSEEDMD.is.ed.ac.uk> Message-ID: <514259DB-B344-44D9-BCFC-357482AB598F@eaglegenomics.com> To construct a delayed-instantiation form of DNASequence you could use the Proxy pattern: 1. Create a class that subclasses DNASequence AND implements SequenceProxyLoader, has an internal instance of DNASequence itself (set to null initially), and make all the DNASequence methods just delegate to the equivalents on the internal instance. 2. Add a null constructor (DNASequence already has one that does nothing so this is OK). 3. 
Add required additional setter methods from SequenceProxyLoader and make them remember/track when they are called and with what values. 4. Once all the SequenceProxyLoader setters have been called use the data to instantiate and construct a DNASequence instance and set it as the internal instance. Alternatively just have Ibatis create SequenceProxyLoader instances directly and just convert them to DNASequence in your code whenever you need to, using the DNASequence(SequenceProxyLoader) constructor. cheers, Richard On 11 May 2010, at 15:45, PATERSON Trevor wrote: > Hi > > as you may be aware I am working with Andy Law at Roslin to kick off development of a Java version of the ensembl-api. > > It makes good sense for us to integrate with the new Bio-Java code, however we have a few fundamental issues with the immutability of Sequence objects. > > Because the BioJava Sequence objects require initialization at construction time with the actual sequence we will have problems using Ibatis mapping to create Sequences from SQL queries, and we will not easily be able to use LazyLoad to fetch the sequence only when we need it. > > Ibatis uses bean setters to set properties on beans, which must have an empty constructor > > - so setSequence(String seqString, CompoundSet compoundSet) > and setSequence(SequenceProxyLoader proxyLoader, CompoundSet compoundSet) would be very useful > > alternatively we could hack round this if we could access the backing store in our own subclasses of DNASequence - i.e give the Sequence properties 'protected' rather than 'private' visibility. > > > In essence what we are wanting to do is implement DatasourceAware subclasses of BioJava DNASequence, which we can retrieve partially filled in from Ensembl, but only retrieve and set the actual DNASequence by LazyLoad when we want the DNA Sequence. 
> > It may be that we can implement this lazy loading using extensions of the SequenceProxyLoader interface (I am guessing that this is what it is for) but again we'd still benefit from accessing the backing store directly. > > Obviously I am not up-to-speed with the design ideas behind the DNASequence Object and maybe I am barking up the wrong tree in trying to subclass it to my own ends, so any hints and tips would be most welcome. > > cheers for now > > Trevor Paterson PhD > new email trevor.paterson at roslin.ed.ac.uk > > Bioinformatics > The Roslin Institute > The Royal (Dick) School of Veterinary Studies > University of Edinburgh > Scotland EH25 9PS > phone +44 (0)131 5274197 > http://www.roslin.ed.ac.uk > http://www.resspecies.org > http://www.thearkdb.org > > > > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From ayates at ebi.ac.uk Tue May 11 12:16:25 2010 From: ayates at ebi.ac.uk (Andy Yates) Date: Tue, 11 May 2010 17:16:25 +0100 Subject: [Biojava-dev] DNASequence not being a bean In-Reply-To: <01F954F7-9999-4292-98AC-B567E722DCD0@scripps.edu> References: <428F360A0BA69249B73A710B58435CFE04DBD60305@EXSEEDMD.is.ed.ac.uk> <01F954F7-9999-4292-98AC-B567E722DCD0@scripps.edu> Message-ID: Hi guys, Just in the middle of a reply & I saw this response :) Firstly can I say I'm glad that you're trying to use this new code Trevor; it'll provide a very useful & driving use-case for the new API and I'm also glad that the Ensembl API is still being developed. It's bound to help us out in the long run. So I see two ways around this. 
As Scooter said, you can implement a SequenceProxyReader to do the work of getting and setting data to and from the Ensembl schema. Bulk loading the data from Ensembl probably isn't a great idea, as the sequence can be quite large depending on who has built the DB. If I remember correctly, the Perl API grabs 250KB chunks and puts them in an LRU cache. This would be my first port of call; however, it does still mean you need a way of doing bean-style construction. The other option is not to use the existing hierarchy of objects and to implement your own Sequence. This does mean that, whatever we do for working with features in BioJava, you may only be able to leverage a portion of it. However, using that interface and a generic type does give you complete freedom to do whatever you want. Andy On 11 May 2010, at 17:03, Scooter Willis wrote: > Trevor > > Andy Yates and I are knee deep in this at the moment and about to do a code check in to help clarify some of the concepts with classes that have better descriptive names. We will send out an email when we have all the issues resolved. SequenceProxyLoader concept has a name change to SequenceProxyReader to help leverage the abstract concepts of InputStreams or File Readers in Java. We realize it is confusing at the moment and I am working on examples to make it a little clearer. I had an internal deadline for a project where I needed the code so haven't had time to do test cases and give examples. That deadline was this morning so now I can get back to balanced programming and get this finalized. > > The backing store and Sequence Reader are interfaces to the same concept. The Sequence is stored either on disk, as objects, in a string, at uniprot, in a database etc where the AbstractSequence doesn't need to know those details. Andy Yates is working on the storage mechanism to allow edits etc so he owns that portion of the code. I agree having an empty constructor is important to be a proper Bean. 
It does however place an additional programming contract burden on developers that are just getting started that the object is not valid if you don't call setSequence(). Easy enough to make the change if we don't get any feedback arguing against. > > I have added in parent child relationships where you can't can't create a TranscriptSequence as that must be a child of a GeneSequence which must be a child of a DNASequence. Only the DNASequence has a constructor exposed to pass in the actual Sequence Data. Indexing for features is relative to the sequence storage. Instead of having Features as place holders we are trying to model the relationships of a Feature as a class with methods that correspond to the Feature. I don't want to expose the empty constructor on these types of sequences because I want to enforce the relationships if not lots of code checking has to occur in the base classes. For example when you add a CDS feature to a transcriptSequence you get back a CDSSequence that based on the parent child relationships when you actually ask for the sequence as a string the underlying code will find the parent with the backing store/Sequence reader and get the sub-sequence. Of course you can pass in a null sequence as! > the parent but I can also defend against that in one spot to enforce a contract for what is expected. > > Can you describe the scenario where you would create Sequences and then at some point in the future give the actual SequenceSQLReader implementation to load the data. Your SequenceSQLReader could load the actual data at the time of instantiation or all calls that actually consume data make a check/call to the lazyload/init. Lots of ways to optimize based on your use case. I use in one example if(isInit() == false) init() in each of the methods which makes the lazy loading fairly easy. 
If you are working with sequence data at the chromosome level then it is optimal to send all calls to sql server because you are typically grabbing sub-sequence data that covers a feature so you don't take the local memory hit or load time of pulling all the data. The goal was to make it flexible and I think it will work for almost any case except for the one you pointed out where having an empty constructor for integration with bean aware tools. > > Scooter > > > On May 11, 2010, at 10:45 AM, PATERSON Trevor wrote: > >> Hi >> >> as you may be aware I am working with Andy Law at Roslin to kick off development of a Java version of the ensembl-api. >> >> It makes good sense for us to integrate with the new Bio-Java code, however we have a few fundamental issues with the immutability of Sequence objects. >> >> Because the BioJava Sequence objects require initialization at construction time with the actual sequence we will have problems using Ibatis mapping to create Sequences from SQL queries, and we will not easily be able to use LazyLoad to fetch the sequence only when we need it. >> >> Ibatis uses bean setters to set properties on beans, which must have an empty constructor >> >> - so setSequence(String seqString, CompoundSet compoundSet) >> and setSequence(SequenceProxyLoader proxyLoader, CompoundSet compoundSet) would be very useful >> >> alternatively we could hack round this if we could access the backing store in our own subclasses of DNASequence - i.e give the Sequence properties 'protected' rather than 'private' visibility. >> >> >> In essence what we are wanting to do is implement DatasourceAware subclasses of BioJava DNASequence, which we can retrieve partially filled in from Ensembl, but only retrieve and set the actual DNASequence by LazyLoad when we want the DNA Sequence. 
>> >> It may be that we can implement this lazy loading using extensions of the SequenceProxyLoader interface (I am guessing that this is what it is for) but again we'd still benefit from accessing the backing store directly. >> >> Obviously I am not up-to-speed with the design ideas behind the DNASequence Object and maybe I am barking up the wrong tree in trying to subclass it to my own ends, so any hints and tips would be most welcome. >> >> cheers for now >> >> Trevor Paterson PhD >> new email trevor.paterson at roslin.ed.ac.uk >> >> Bioinformatics >> The Roslin Institute >> The Royal (Dick) School of Veterinary Studies >> University of Edinburgh >> Scotland EH25 9PS >> phone +44 (0)131 5274197 >> http://www.roslin.ed.ac.uk >> http://www.resspecies.org >> http://www.thearkdb.org >> >> >> >> >> -- >> The University of Edinburgh is a charitable body, registered in >> Scotland, with registration number SC005336. >> >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From HWillis at scripps.edu Tue May 11 21:39:14 2010 From: HWillis at scripps.edu (Scooter Willis) Date: Tue, 11 May 2010 21:39:14 -0400 Subject: [Biojava-dev] Biojava3-Core Message-ID: I just committed a fairly significant number of changes into BioJava3-core. If you don't have any code changes in core then it is probably better to delete the core package and check out again to avoid resolving lots of conflicts. I also checked in the beginning of biojava3-genome, which makes significant use of biojava3-core. 
The code is being used to take gene predictions from GeneMarkHMM, GeneID and GeneMark which all have their own version of gff, gff3 and gtf output. The code will take features from the predictions and maps them onto DNASequences and allows you to get ProteinSequence for each prediction. It is very much a work in progress and will be adding additional features in the next 30 days for a gene annotation project that is part of the day job. Let me know if you have any comments or suggestions. I promise better java docs, test cases and wiki overview soon. Scooter From ayates at ebi.ac.uk Wed May 12 05:10:12 2010 From: ayates at ebi.ac.uk (Andy Yates) Date: Wed, 12 May 2010 10:10:12 +0100 Subject: [Biojava-dev] Biojava3-Core In-Reply-To: References: Message-ID: <246E113A-08DC-477E-B624-1CD2E69CEBB8@ebi.ac.uk> One thing as a co-contributor that I've noticed is there is a difference in coding style. I'm happy to adopt any coding style the majority agree with but I'd rather not do a huge commit changing 2 spaced tabs into 4 spaced tabs :). Can you send around your netbeans coding profile? Andy On 12 May 2010, at 02:39, Scooter Willis wrote: > > I just committed a fairly significant number of changes into BioJava3-core. If you don't have any code changes in core then probably better to delete the core package and check out again o avoid resolving lots of conflicts. > > I also checked in the beginning of biojava3-genome which makes significant use of biojava3-core. The code is being used to take gene predictions from GeneMarkHMM, GeneID and GeneMark which all have their own version of gff, gff3 and gtf output. The code will take features from the predictions and maps them onto DNASequences and allows you to get ProteinSequence for each prediction. It is very much a work in progress and will be adding additional features in the next 30 days for a gene annotation project that is part of the day job. > > Let me know if you have any comments or suggestions. 
I promise better java docs, test cases and wiki overview soon. > > Scooter > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From ayates at ebi.ac.uk Wed May 12 05:43:55 2010 From: ayates at ebi.ac.uk (Andy Yates) Date: Wed, 12 May 2010 10:43:55 +0100 Subject: [Biojava-dev] Biojava3-Core In-Reply-To: <85BDCDB4-A7D7-4B30-A44E-25CCA71AE670@exseed.ed.ac.uk> References: <246E113A-08DC-477E-B624-1CD2E69CEBB8@ebi.ac.uk> <85BDCDB4-A7D7-4B30-A44E-25CCA71AE670@exseed.ed.ac.uk> Message-ID: CVS used to let you run code over checkins say a beautifier; not sure if SVN or Git support this ... On 12 May 2010, at 10:24, LAW Andrew wrote: > Has anyone found a maven plugin that will reliably enforce coding styles? > > > On 12 May 2010, at 10:10, Andy Yates wrote: > >> One thing as a co-contributor that I've noticed is there is a difference in coding style. I'm happy to adopt any coding style the majority agree with but I'd rather not do a huge commit changing 2 spaced tabs into 4 spaced tabs :). Can you send around your netbeans coding profile? >> >> Andy >> >> On 12 May 2010, at 02:39, Scooter Willis wrote: >> >>> >>> I just committed a fairly significant number of changes into BioJava3-core. If you don't have any code changes in core then probably better to delete the core package and check out again o avoid resolving lots of conflicts. >>> >>> I also checked in the beginning of biojava3-genome which makes significant use of biojava3-core. The code is being used to take gene predictions from GeneMarkHMM, GeneID and GeneMark which all have their own version of gff, gff3 and gtf output. 
The code will take features from the predictions and maps them onto DNASequences and allows you to get ProteinSequence for each prediction. It is very much a work in progress and will be adding additional features in the next 30 days for a gene annotation project that is part of the day job. >>> >>> Let me know if you have any comments or suggestions. I promise better java docs, test cases and wiki overview soon. >>> >>> Scooter >>> _______________________________________________ >>> biojava-dev mailing list >>> biojava-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> >> -- >> Andrew Yates Ensembl Genomes Engineer >> EMBL-EBI Tel: +44-(0)1223-492538 >> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 >> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ >> >> >> >> >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > > Later, > > Andy > -------- > Yada, yada, yada... > > The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336 > Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. > > > > > > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. 
> -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From andy.law at roslin.ed.ac.uk Wed May 12 05:24:06 2010 From: andy.law at roslin.ed.ac.uk (LAW Andrew) Date: Wed, 12 May 2010 10:24:06 +0100 Subject: [Biojava-dev] Biojava3-Core In-Reply-To: <246E113A-08DC-477E-B624-1CD2E69CEBB8@ebi.ac.uk> References: <246E113A-08DC-477E-B624-1CD2E69CEBB8@ebi.ac.uk> Message-ID: <85BDCDB4-A7D7-4B30-A44E-25CCA71AE670@exseed.ed.ac.uk> Has anyone found a maven plugin that will reliably enforce coding styles? On 12 May 2010, at 10:10, Andy Yates wrote: > One thing as a co-contributor that I've noticed is there is a difference in coding style. I'm happy to adopt any coding style the majority agree with but I'd rather not do a huge commit changing 2 spaced tabs into 4 spaced tabs :). Can you send around your netbeans coding profile? > > Andy > > On 12 May 2010, at 02:39, Scooter Willis wrote: > >> >> I just committed a fairly significant number of changes into BioJava3-core. If you don't have any code changes in core then probably better to delete the core package and check out again o avoid resolving lots of conflicts. >> >> I also checked in the beginning of biojava3-genome which makes significant use of biojava3-core. The code is being used to take gene predictions from GeneMarkHMM, GeneID and GeneMark which all have their own version of gff, gff3 and gtf output. The code will take features from the predictions and maps them onto DNASequences and allows you to get ProteinSequence for each prediction. It is very much a work in progress and will be adding additional features in the next 30 days for a gene annotation project that is part of the day job. >> >> Let me know if you have any comments or suggestions. I promise better java docs, test cases and wiki overview soon. 
>> >> Scooter >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > > -- > Andrew Yates Ensembl Genomes Engineer > EMBL-EBI Tel: +44-(0)1223-492538 > Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 > Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ > > > > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev Later, Andy -------- Yada, yada, yada... The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336 Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From trevor.paterson at roslin.ed.ac.uk Wed May 12 05:37:35 2010 From: trevor.paterson at roslin.ed.ac.uk (PATERSON Trevor) Date: Wed, 12 May 2010 10:37:35 +0100 Subject: [Biojava-dev] DNASequence not being a bean In-Reply-To: <246E113A-08DC-477E-B624-1CD2E69CEBB8@ebi.ac.uk> References: <246E113A-08DC-477E-B624-1CD2E69CEBB8@ebi.ac.uk> Message-ID: <428F360A0BA69249B73A710B58435CFE04DC1BBA6F@EXSEEDMD.is.ed.ac.uk> Thanks for your comments and ideas... I am mulling things over in the meantime - i haven't been able to connect to the SVN for a while...... there seems to be something wrong with open-bio connections, does it still allow anonymous connections? 
I can get the code tarred up from github....but it is missing a module so won't compile >>org/biojava3/core/sequence/DNATests.java:[13,33] package org.biojava3.core.features does not exist trevor > -----Original Message----- > From: biojava-dev-bounces at lists.open-bio.org > [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of > Andy Yates > Sent: 12 May 2010 10:10 > To: Scooter Willis > Cc: biojava-dev at lists.open-bio.org > Subject: Re: [Biojava-dev] Biojava3-Core > > One thing as a co-contributor that I've noticed is there is a > difference in coding style. I'm happy to adopt any coding > style the majority agree with but I'd rather not do a huge > commit changing 2 spaced tabs into 4 spaced tabs :). Can you > send around your netbeans coding profile? > > Andy > > On 12 May 2010, at 02:39, Scooter Willis wrote: > > > > > I just committed a fairly significant number of changes > into BioJava3-core. If you don't have any code changes in > core then probably better to delete the core package and > check out again o avoid resolving lots of conflicts. > > > > I also checked in the beginning of biojava3-genome which > makes significant use of biojava3-core. The code is being > used to take gene predictions from GeneMarkHMM, GeneID and > GeneMark which all have their own version of gff, gff3 and > gtf output. The code will take features from the predictions > and maps them onto DNASequences and allows you to get > ProteinSequence for each prediction. It is very much a work > in progress and will be adding additional features in the > next 30 days for a gene annotation project that is part of > the day job. > > > > Let me know if you have any comments or suggestions. I > promise better java docs, test cases and wiki overview soon. 
> > > > Scooter > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > -- > Andrew Yates Ensembl Genomes Engineer > EMBL-EBI Tel: +44-(0)1223-492538 > Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 > Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ > > > > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From biopython at maubp.freeserve.co.uk Wed May 12 06:15:22 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 12 May 2010 11:15:22 +0100 Subject: [Biojava-dev] DNASequence not being a bean In-Reply-To: <428F360A0BA69249B73A710B58435CFE04DC1BBA6F@EXSEEDMD.is.ed.ac.uk> References: <246E113A-08DC-477E-B624-1CD2E69CEBB8@ebi.ac.uk> <428F360A0BA69249B73A710B58435CFE04DC1BBA6F@EXSEEDMD.is.ed.ac.uk> Message-ID: On Wed, May 12, 2010 at 10:37 AM, PATERSON Trevor wrote: > > Thanks for your comments and ideas... > > I am mulling things over > > in the meantime - i haven't been able to connect to the SVN for a > while...... there seems to be something wrong with open-bio > connections, does it still allow anonymous connections? I assume there are still ongoing issues with the VM running the anonymous SVN/CVS server at code.open-bio.org :( Peter From HWillis at scripps.edu Wed May 12 06:17:06 2010 From: HWillis at scripps.edu (Scooter Willis) Date: Wed, 12 May 2010 06:17:06 -0400 Subject: [Biojava-dev] Biojava3-Core Message-ID: Using the default NetBeans profile. I will extract the details. If I am the only NetBeans user then I can use an Eclipse setup if you provide details. 
The only thing that I have a personal preference for is using a full line versus breaking a method call into multiple lines. I understand the need for short lines in a VI world. Scooter Sent from G1 -----Original Message----- From: Andy Yates [ayates at ebi.ac.uk] Received: 5/12/10 5:10 AM To: Scooter Willis [HWillis at scripps.edu] CC: biojava-dev at lists.open-bio.org [biojava-dev at lists.open-bio.org] Subject: Re: [Biojava-dev] Biojava3-Core One thing as a co-contributor that I've noticed is there is a difference in coding style. I'm happy to adopt any coding style the majority agree with but I'd rather not do a huge commit changing 2 spaced tabs into 4 spaced tabs :). Can you send around your netbeans coding profile? Andy On 12 May 2010, at 02:39, Scooter Willis wrote: > > I just committed a fairly significant number of changes into BioJava3-core. If you don't have any code changes in core then probably better to delete the core package and check out again o avoid resolving lots of conflicts. > > I also checked in the beginning of biojava3-genome which makes significant use of biojava3-core. The code is being used to take gene predictions from GeneMarkHMM, GeneID and GeneMark which all have their own version of gff, gff3 and gtf output. The code will take features from the predictions and maps them onto DNASequences and allows you to get ProteinSequence for each prediction. It is very much a work in progress and will be adding additional features in the next 30 days for a gene annotation project that is part of the day job. > > Let me know if you have any comments or suggestions. I promise better java docs, test cases and wiki overview soon. 
> > Scooter > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From andy.law at roslin.ed.ac.uk Wed May 12 06:23:18 2010 From: andy.law at roslin.ed.ac.uk (LAW Andrew) Date: Wed, 12 May 2010 11:23:18 +0100 Subject: [Biojava-dev] Biojava3-Core In-Reply-To: References: Message-ID: <84E58D87-0921-4441-8F50-C080E5A53CCC@exseed.ed.ac.uk> We use NetBeans too. And I *like* the NetBeans conventions :o} On 12 May 2010, at 11:17, Scooter Willis wrote: > Using the default netbeans profile. I will extract the details. If I am the only netbeans user then I can use eclipse setup if you provide details. The only thing that I have a personal preference for is using a full line versus breaking a method call into multiple lines. I understand the need for short lines in a VI world. > > Scooter > > Sent from G1 > > -----Original Message----- > From: Andy Yates [ayates at ebi.ac.uk] > Received: 5/12/10 5:10 AM > To: Scooter Willis [HWillis at scripps.edu] > CC: biojava-dev at lists.open-bio.org [biojava-dev at lists.open-bio.org] > Subject: Re: [Biojava-dev] Biojava3-Core > > > > One thing as a co-contributor that I've noticed is there is a difference in coding style. I'm happy to adopt any coding style the majority agree with but I'd rather not do a huge commit changing 2 spaced tabs into 4 spaced tabs :). Can you send around your netbeans coding profile? > > Andy > > On 12 May 2010, at 02:39, Scooter Willis wrote: > >> >> I just committed a fairly significant number of changes into BioJava3-core. If you don't have any code changes in core then probably better to delete the core package and check out again o avoid resolving lots of conflicts. 
>> >> I also checked in the beginning of biojava3-genome which makes significant use of biojava3-core. The code is being used to take gene predictions from GeneMarkHMM, GeneID and GeneMark which all have their own version of gff, gff3 and gtf output. The code will take features from the predictions and maps them onto DNASequences and allows you to get ProteinSequence for each prediction. It is very much a work in progress and will be adding additional features in the next 30 days for a gene annotation project that is part of the day job. >> >> Let me know if you have any comments or suggestions. I promise better java docs, test cases and wiki overview soon. >> >> Scooter >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > > -- > Andrew Yates Ensembl Genomes Engineer > EMBL-EBI Tel: +44-(0)1223-492538 > Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 > Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ > > > > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev Later, Andy -------- Yada, yada, yada... The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336 Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From HWillis at scripps.edu Wed May 12 06:27:25 2010 From: HWillis at scripps.edu (Scooter Willis) Date: Wed, 12 May 2010 06:27:25 -0400 Subject: [Biojava-dev] DNASequence not being a bean Message-ID: We are seeing some differences as well and andy yates is reviewing. 
The features package was moved to the genome module, so I am not sure how package deletes get propagated to the different repositories. Can you make calls to static methods with your database mapping tool? If so, we can have a helper class return the classes without an empty constructor. Sent from G1 -----Original Message----- From: PATERSON Trevor [trevor.paterson at roslin.ed.ac.uk] Received: 5/12/10 6:20 AM To: biojava-dev at lists.open-bio.org [biojava-dev at lists.open-bio.org] Subject: Re: [Biojava-dev] DNASequence not being a bean Thanks for your comments and ideas... I am mulling things over in the meantime - i haven't been able to connect to the SVN for a while...... there seems to be something wrong with open-bio connections, does it still allow anonymous connections? I can get the code tarred up from github....but it is missing a module so won't compile >>org/biojava3/core/sequence/DNATests.java:[13,33] package org.biojava3.core.features does not exist trevor > -----Original Message----- > From: biojava-dev-bounces at lists.open-bio.org > [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of > Andy Yates > Sent: 12 May 2010 10:10 > To: Scooter Willis > Cc: biojava-dev at lists.open-bio.org > Subject: Re: [Biojava-dev] Biojava3-Core > > One thing as a co-contributor that I've noticed is there is a > difference in coding style. I'm happy to adopt any coding > style the majority agree with but I'd rather not do a huge > commit changing 2 spaced tabs into 4 spaced tabs :). Can you > send around your netbeans coding profile? > > Andy > > On 12 May 2010, at 02:39, Scooter Willis wrote: > > > > > I just committed a fairly significant number of changes > into BioJava3-core. If you don't have any code changes in > core then probably better to delete the core package and > check out again o avoid resolving lots of conflicts. > > > > I also checked in the beginning of biojava3-genome which > makes significant use of biojava3-core. 
The code is being > used to take gene predictions from GeneMarkHMM, GeneID and > GeneMark which all have their own version of gff, gff3 and > gtf output. The code will take features from the predictions > and maps them onto DNASequences and allows you to get > ProteinSequence for each prediction. It is very much a work > in progress and will be adding additional features in the > next 30 days for a gene annotation project that is part of > the day job. > > > > Let me know if you have any comments or suggestions. I > promise better java docs, test cases and wiki overview soon. > > > > Scooter > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > -- > Andrew Yates Ensembl Genomes Engineer > EMBL-EBI Tel: +44-(0)1223-492538 > Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 > Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ > > > > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. _______________________________________________ biojava-dev mailing list biojava-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-dev From ayates at ebi.ac.uk Wed May 12 06:46:11 2010 From: ayates at ebi.ac.uk (Andy Yates) Date: Wed, 12 May 2010 11:46:11 +0100 Subject: [Biojava-dev] Biojava3-Core In-Reply-To: <84E58D87-0921-4441-8F50-C080E5A53CCC@exseed.ed.ac.uk> References: <84E58D87-0921-4441-8F50-C080E5A53CCC@exseed.ed.ac.uk> Message-ID: I'm an Eclipse man but I'm more than happy to switch to Netbeans for BioJava. On 12 May 2010, at 11:23, LAW Andrew wrote: > We use NetBeans too. 
> > And I *like* the NetBeans conventions :o} > > > On 12 May 2010, at 11:17, Scooter Willis wrote: > >> Using the default netbeans profile. I will extract the details. If I am the only netbeans user then I can use eclipse setup if you provide details. The only thing that I have a personal preference for is using a full line versus breaking a method call into multiple lines. I understand the need for short lines in a VI world. >> >> Scooter >> >> Sent from G1 >> >> -----Original Message----- >> From: Andy Yates [ayates at ebi.ac.uk] >> Received: 5/12/10 5:10 AM >> To: Scooter Willis [HWillis at scripps.edu] >> CC: biojava-dev at lists.open-bio.org [biojava-dev at lists.open-bio.org] >> Subject: Re: [Biojava-dev] Biojava3-Core >> >> >> >> One thing as a co-contributor that I've noticed is there is a difference in coding style. I'm happy to adopt any coding style the majority agree with but I'd rather not do a huge commit changing 2 spaced tabs into 4 spaced tabs :). Can you send around your netbeans coding profile? >> >> Andy >> >> On 12 May 2010, at 02:39, Scooter Willis wrote: >> >>> >>> I just committed a fairly significant number of changes into BioJava3-core. If you don't have any code changes in core then probably better to delete the core package and check out again o avoid resolving lots of conflicts. >>> >>> I also checked in the beginning of biojava3-genome which makes significant use of biojava3-core. The code is being used to take gene predictions from GeneMarkHMM, GeneID and GeneMark which all have their own version of gff, gff3 and gtf output. The code will take features from the predictions and maps them onto DNASequences and allows you to get ProteinSequence for each prediction. It is very much a work in progress and will be adding additional features in the next 30 days for a gene annotation project that is part of the day job. >>> >>> Let me know if you have any comments or suggestions. I promise better java docs, test cases and wiki overview soon. 
>>> >>> Scooter >>> _______________________________________________ >>> biojava-dev mailing list >>> biojava-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> >> -- >> Andrew Yates Ensembl Genomes Engineer >> EMBL-EBI Tel: +44-(0)1223-492538 >> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 >> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ >> >> >> >> >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > > Later, > > Andy > -------- > Yada, yada, yada... > The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336 > Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. > > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From ayates at ebi.ac.uk Wed May 12 06:49:39 2010 From: ayates at ebi.ac.uk (Andy Yates) Date: Wed, 12 May 2010 11:49:39 +0100 Subject: [Biojava-dev] Biojava3-Core In-Reply-To: References: <84E58D87-0921-4441-8F50-C080E5A53CCC@exseed.ed.ac.uk> Message-ID: <1F631C17-C53E-4192-8A06-0009A654A558@ebi.ac.uk> Okay so what's the ruling on indentation. Netbeans defaults to spaces with 4 per tab. Everyone okay with that? On 12 May 2010, at 11:46, Andy Yates wrote: > I'm an Eclipse man but I'm more than happy to switch to Netbeans for BioJava. 
>> >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > > -- > Andrew Yates Ensembl Genomes Engineer > EMBL-EBI Tel: +44-(0)1223-492538 > Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 > Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ > > > > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From HWillis at scripps.edu Wed May 12 06:58:07 2010 From: HWillis at scripps.edu (Scooter Willis) Date: Wed, 12 May 2010 06:58:07 -0400 Subject: [Biojava-dev] Biojava3-Core Message-ID: You will never go back to the dark side once you see the light! Sent from G1 -----Original Message----- From: Andy Yates [ayates at ebi.ac.uk] Received: 5/12/10 6:46 AM To: LAW Andrew [andy.law at roslin.ed.ac.uk] CC: Scooter Willis [HWillis at scripps.edu]; biojava-dev at lists.open-bio.org [biojava-dev at lists.open-bio.org] Subject: Re: [Biojava-dev] Biojava3-Core I'm an Eclipse man but I'm more than happy to switch to Netbeans for BioJava. On 12 May 2010, at 11:23, LAW Andrew wrote: > We use NetBeans too. > > And I *like* the NetBeans conventions :o} > > > On 12 May 2010, at 11:17, Scooter Willis wrote: > >> Using the default netbeans profile. I will extract the details. If I am the only netbeans user then I can use eclipse setup if you provide details. The only thing that I have a personal preference for is using a full line versus breaking a method call into multiple lines. I understand the need for short lines in a VI world. 
>>> >>> Scooter >>> _______________________________________________ >>> biojava-dev mailing list >>> biojava-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> >> -- >> Andrew Yates Ensembl Genomes Engineer >> EMBL-EBI Tel: +44-(0)1223-492538 >> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 >> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ >> >> >> >> >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > > Later, > > Andy > -------- > Yada, yada, yada... > The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336 > Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. > > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From ayates at ebi.ac.uk Wed May 12 06:59:06 2010 From: ayates at ebi.ac.uk (Andy Yates) Date: Wed, 12 May 2010 11:59:06 +0100 Subject: [Biojava-dev] Biojava3-Core In-Reply-To: References: Message-ID: <300E1D23-AED1-4CCB-8EA3-A0DC58420075@ebi.ac.uk> I'll still be an Eclipse boy at heart; after all it's where my Perl editor with working debugger lives On 12 May 2010, at 11:58, Scooter Willis wrote: > You will never go back to the dark side once you see the light! 
> > Sent from G1 > > -----Original Message----- > From: Andy Yates [ayates at ebi.ac.uk] > Received: 5/12/10 6:46 AM > To: LAW Andrew [andy.law at roslin.ed.ac.uk] > CC: Scooter Willis [HWillis at scripps.edu]; biojava-dev at lists.open-bio.org [biojava-dev at lists.open-bio.org] > Subject: Re: [Biojava-dev] Biojava3-Core > > > I'm an Eclipse man but I'm more than happy to switch to Netbeans for BioJava. > > On 12 May 2010, at 11:23, LAW Andrew wrote: > > > We use NetBeans too. > > > > And I *like* the NetBeans conventions :o} > > > > > > On 12 May 2010, at 11:17, Scooter Willis wrote: > > > >> Using the default netbeans profile. I will extract the details. If I am the only netbeans user then I can use eclipse setup if you provide details. The only thing that I have a personal preference for is using a full line versus breaking a method call into multiple lines. I understand the need for short lines in a VI world. > >> > >> Scooter > >> > >> Sent from G1 > >> > >> -----Original Message----- > >> From: Andy Yates [ayates at ebi.ac.uk] > >> Received: 5/12/10 5:10 AM > >> To: Scooter Willis [HWillis at scripps.edu] > >> CC: biojava-dev at lists.open-bio.org [biojava-dev at lists.open-bio.org] > >> Subject: Re: [Biojava-dev] Biojava3-Core > >> > >> > >> > >> One thing as a co-contributor that I've noticed is there is a difference in coding style. I'm happy to adopt any coding style the majority agree with but I'd rather not do a huge commit changing 2 spaced tabs into 4 spaced tabs :). Can you send around your netbeans coding profile? > >> > >> Andy > >> > >> On 12 May 2010, at 02:39, Scooter Willis wrote: > >> > >>> > >>> I just committed a fairly significant number of changes into BioJava3-core. If you don't have any code changes in core then probably better to delete the core package and check out again o avoid resolving lots of conflicts. > >>> > >>> I also checked in the beginning of biojava3-genome which makes significant use of biojava3-core. 
> > > > > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > -- > Andrew Yates Ensembl Genomes Engineer > EMBL-EBI Tel: +44-(0)1223-492538 > Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 > Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ > > > > > -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From trevor.paterson at roslin.ed.ac.uk Wed May 12 07:04:30 2010 From: trevor.paterson at roslin.ed.ac.uk (PATERSON Trevor) Date: Wed, 12 May 2010 12:04:30 +0100 Subject: [Biojava-dev] Biojava3-Core In-Reply-To: <300E1D23-AED1-4CCB-8EA3-A0DC58420075@ebi.ac.uk> References: <300E1D23-AED1-4CCB-8EA3-A0DC58420075@ebi.ac.uk> Message-ID: <428F360A0BA69249B73A710B58435CFE04DC1BBA70@EXSEEDMD.is.ed.ac.uk> I was a NetBeans die-hard until I had to delve into the Perl ensembl-api... where Eclipse certainly wins over Emacs :) > -----Original Message----- > From: biojava-dev-bounces at lists.open-bio.org > [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of > Andy Yates > Sent: 12 May 2010 11:59 > To: Scooter Willis > Cc: LAW Andrew; biojava-dev at lists.open-bio.org > Subject: Re: [Biojava-dev] Biojava3-Core > > I'll still be an Eclipse boy at heart; after all it's where > my Perl editor with working debugger lives > > On 12 May 2010, at 11:58, Scooter Willis wrote: > > > You will never go back to the dark side once you see the light!
> > > > > > > > > _______________________________________________ > > > biojava-dev mailing list > > > biojava-dev at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > -- > > Andrew Yates Ensembl Genomes Engineer > > EMBL-EBI Tel: +44-(0)1223-492538 > > Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 > > Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ > > > > > > > > > > > > -- > Andrew Yates Ensembl Genomes Engineer > EMBL-EBI Tel: +44-(0)1223-492538 > Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 > Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ > > > > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From andy.law at roslin.ed.ac.uk Wed May 12 07:09:14 2010 From: andy.law at roslin.ed.ac.uk (LAW Andrew) Date: Wed, 12 May 2010 12:09:14 +0100 Subject: [Biojava-dev] Biojava3-Core In-Reply-To: <93614637-B1A9-4029-BD30-AD724759C6E8@eaglegenomics.com> References: <84E58D87-0921-4441-8F50-C080E5A53CCC@exseed.ed.ac.uk> <1F631C17-C53E-4192-8A06-0009A654A558@ebi.ac.uk> <93614637-B1A9-4029-BD30-AD724759C6E8@eaglegenomics.com> Message-ID: <098592F1-757C-495C-939F-97E9ABA9F41F@exseed.ed.ac.uk> On 12 May 2010, at 11:52, Richard Holland wrote: > I see a need for a formal coding style here, regardless of what platform people are using. Taking Netbeans as a basis is a good start but it needs to be documented for contributors to read and follow, then enforced across the whole project. Absolutely. But the pain in making use those conventions can be numbing. Later, Andy -------- Yada, yada, yada... 
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336 Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From holland at eaglegenomics.com Wed May 12 06:52:55 2010 From: holland at eaglegenomics.com (Richard Holland) Date: Wed, 12 May 2010 11:52:55 +0100 Subject: [Biojava-dev] Biojava3-Core In-Reply-To: <1F631C17-C53E-4192-8A06-0009A654A558@ebi.ac.uk> References: <84E58D87-0921-4441-8F50-C080E5A53CCC@exseed.ed.ac.uk> <1F631C17-C53E-4192-8A06-0009A654A558@ebi.ac.uk> Message-ID: <93614637-B1A9-4029-BD30-AD724759C6E8@eaglegenomics.com> I see a need for a formal coding style here, regardless of what platform people are using. Taking Netbeans as a basis is a good start but it needs to be documented for contributors to read and follow, then enforced across the whole project. On 12 May 2010, at 11:49, Andy Yates wrote: > Okay so what's the ruling on indentation. Netbeans defaults to spaces with 4 per tab. Everyone okay with that? > > On 12 May 2010, at 11:46, Andy Yates wrote: > >> I'm an Eclipse man but I'm more than happy to switch to Netbeans for BioJava. >> >> On 12 May 2010, at 11:23, LAW Andrew wrote: >> >>> We use NetBeans too. >>> >>> And I *like* the NetBeans conventions :o} >>> >>> >>> On 12 May 2010, at 11:17, Scooter Willis wrote: >>> >>>> Using the default netbeans profile. I will extract the details. If I am the only netbeans user then I can use eclipse setup if you provide details. The only thing that I have a personal preference for is using a full line versus breaking a method call into multiple lines. I understand the need for short lines in a VI world. 
>>> >>> >>> _______________________________________________ >>> biojava-dev mailing list >>> biojava-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> >> -- >> Andrew Yates Ensembl Genomes Engineer >> EMBL-EBI Tel: +44-(0)1223-492538 >> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 >> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ >> >> >> >> >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > > -- > Andrew Yates Ensembl Genomes Engineer > EMBL-EBI Tel: +44-(0)1223-492538 > Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 > Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ > > > > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From ayates at ebi.ac.uk Wed May 12 07:26:26 2010 From: ayates at ebi.ac.uk (Andy Yates) Date: Wed, 12 May 2010 12:26:26 +0100 Subject: [Biojava-dev] Biojava3-Core In-Reply-To: <098592F1-757C-495C-939F-97E9ABA9F41F@exseed.ed.ac.uk> References: <84E58D87-0921-4441-8F50-C080E5A53CCC@exseed.ed.ac.uk> <1F631C17-C53E-4192-8A06-0009A654A558@ebi.ac.uk> <93614637-B1A9-4029-BD30-AD724759C6E8@eaglegenomics.com> <098592F1-757C-495C-939F-97E9ABA9F41F@exseed.ed.ac.uk> Message-ID: <653183E9-D0A3-4536-ABFB-F983522AFDEA@ebi.ac.uk> So long as they are documented then developers shouldn't complain when their formatting changes. I just want to avoid a situation like now with Scooter's check-in which has a lot of formatting changes so the code changes are quite hard to pick out. I've gone and started a page about conventions and have started a discussion. 
Once we are agreed on what should go into the conventions then they will migrate to the wiki page. Andy On 12 May 2010, at 12:09, LAW Andrew wrote: > > On 12 May 2010, at 11:52, Richard Holland wrote: > >> I see a need for a formal coding style here, regardless of what platform people are using. Taking Netbeans as a basis is a good start but it needs to be documented for contributors to read and follow, then enforced across the whole project. > > Absolutely. But the pain in making use those conventions can be numbing. > > > > Later, > > Andy > -------- > Yada, yada, yada... > > The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336 > Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. > > > > > > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From ayates at ebi.ac.uk Wed May 12 07:56:01 2010 From: ayates at ebi.ac.uk (Andy Yates) Date: Wed, 12 May 2010 12:56:01 +0100 Subject: [Biojava-dev] Biojava3-Core In-Reply-To: <6AD9B907-565B-4CDD-8647-9FD66ED41075@exseed.ed.ac.uk> References: <84E58D87-0921-4441-8F50-C080E5A53CCC@exseed.ed.ac.uk> <1F631C17-C53E-4192-8A06-0009A654A558@ebi.ac.uk> <93614637-B1A9-4029-BD30-AD724759C6E8@eaglegenomics.com> <098592F1-757C-495C-939F-97E9ABA9F41F@exseed.ed.ac.uk> <653183E9-D0A3-4536-ABFB-F983522AFDEA@ebi.ac.uk> <6AD9B907-565B-4CDD-8647-9FD66ED41075@exseed.ed.ac.uk> Message-ID: <45A3FA41-CC98-472D-96ED-36C56F1670BA@ebi.ac.uk> I can say that the Eclipse diff tool does care about whitespace and I'm quite sure that the Unix diff cares about it as well. 
I can't say anything about Netbeans since I haven't used it in years.

With all due respect, if the formatting rules say 4 spaces per indentation & code comes in that does not follow them, then it'll be reformatted. This is nothing against the person who does the commits; they should just be aware that it _could_ happen. I should say my personal preference is 2 spaces per indentation; I am flexible about this & will go with what the majority agrees. That said, if I'm being flexible about it I hope others will be as well.

Andy

On 12 May 2010, at 12:41, LAW Andrew wrote:

> I think the main thing should be for each of us to find a diff tool that doesn't care about whitespace. The other stuff (camelCase, use of braces [ALWAYS!!!], splitting lines) are more important and can/should be defined but if you tell me that we must use 2 spaces to indent and my editor of choice uses a tab then I'm probably not going to listen to you in all practical situations.
>
> Not being awkward, just pragmatic.
> -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From holland at eaglegenomics.com Wed May 12 08:21:31 2010 From: holland at eaglegenomics.com (Richard Holland) Date: Wed, 12 May 2010 13:21:31 +0100 Subject: [Biojava-dev] Biojava3-Core In-Reply-To: <45A3FA41-CC98-472D-96ED-36C56F1670BA@ebi.ac.uk> References: <84E58D87-0921-4441-8F50-C080E5A53CCC@exseed.ed.ac.uk> <1F631C17-C53E-4192-8A06-0009A654A558@ebi.ac.uk> <93614637-B1A9-4029-BD30-AD724759C6E8@eaglegenomics.com> <098592F1-757C-495C-939F-97E9ABA9F41F@exseed.ed.ac.uk> <653183E9-D0A3-4536-ABFB-F983522AFDEA@ebi.ac.uk> <6AD9B907-565B-4CDD-8647-9FD66ED41075@exseed.ed.ac.uk> <45A3FA41-CC98-472D-96ED-36C56F1670BA@ebi.ac.uk> Message-ID: <6AB63499-373D-4A20-A1A9-4C65CC8BA32D@eaglegenomics.com> The below is a massive generalisation based on my experience of having worked as a software developer both in academia and in commercial environments (including an airline and a bank). The standards of academic-generated software tend to be defined as whatever the author (usually a sole author) identifies as requiring least effort to achieve, so the library that gets used is the one they've just found on Google that looks like it probably works, the formatting is their editor's default, documentation is minimal (because most users will probably just email and ask questions anyway), user requirements gathering is based on asking friends at coffee, the code commenting is for their own short-term reference (because most software ceases to be developed any further after the paper about it is published or the author leaves the institute), etc. The standards tend to be selected as whatever makes the author's life easiest, as the software's purpose is to support some other research and usually is not the author's main goal - it gets published as a side-effect of their main research interests. 
In commercial software, standards tend to be defined by best practice, with authors (usually teams) being asked to adhere strictly to defined customs such as code formatting, and to thoroughly test any new libraries before using them. Detailed user documentation is required because helpdesks are expensive things to run, and proper detailed user requirements gathering is important to minimise subsequent helpdesk contact and improvement requests. Code commenting is also critical to help new members of the team take over when old members leave and there's no overlap to transfer knowledge face-to-face. These processes slow down development and increase the costs of the software produced, but they do ensure that the code is of higher quality and is more maintainable, and that the users get more benefit from it.

The above comments might seem harsh over-generalisations, but I can put my hand up and say I've definitely been guilty in the past of most of those academic charges myself, and I know plenty of others who have been there too. Likewise I've experienced how depressing it is to work in a commercial environment so strictly locked down that you think development is heading nowhere because it's just all too much effort.

BioJava sits between the two, along with most other Bio* open source projects. It's produced and maintained mostly by a team, as in commercial software, but that team is made up mostly of academics. If we want BioJava to be accepted as quality software we have to adopt at least some of the above commercial software development techniques, and then we need to ensure that everyone who develops for BioJava sticks to them to the letter. That could include being asked to reconfigure default settings on editors, post detailed explanations about choice of external libraries, etc.
cheers, Richard On 12 May 2010, at 12:56, Andy Yates wrote: > I can say that the Eclipse diff tool does care about whitespace and I'm quite sure that the Unix diff cares about it as well. I can't say anything about Netbeans since I haven't used it in years. > > With all the best respect if the formatting rules say 4 spaces per indentation & code comes in not behaving in that manner then it'll be reformatted. This is nothing against the person who does the commits just they should be aware that it _could_ happen. I should say my personal preference is 2 spaces per indentation; I am flexible about this & will go with what the majority agree. That said if I'm being flexible about it I hope others will be as well. > > Andy > > On 12 May 2010, at 12:41, LAW Andrew wrote: > >> I think the main thing should be for each of us to find a diff tool that doesn't care about whitespace. The other stuff (camelCase, use of braces [ALWAYS!!!], splitting lines) are more important and can/should be defined but if you tell me that we must use 2 spaces to indent and my editor of choice uses a tab then I'm probably not going to listen to you in all practical situations. >> >> Not being awkward, just pragmatic. >> >> >> On 12 May 2010, at 12:26, Andy Yates wrote: >> >>> So long as they are documented then developers shouldn't complain when their formatting changes. I just want to avoid a situation like now with Scooter's check-in which has a lot of formatting changes so the code changes are quite hard to pick out. >>> >>> I've gone and started a page about conventions and have started a discussion. Once we are agreed on what should go into the conventions then they will migrate to the wiki page. >>> >>> Andy >>> >>> On 12 May 2010, at 12:09, LAW Andrew wrote: >>> >>>> >>>> On 12 May 2010, at 11:52, Richard Holland wrote: >>>> >>>>> I see a need for a formal coding style here, regardless of what platform people are using. 
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From andy.law at roslin.ed.ac.uk Wed May 12 07:41:05 2010
From: andy.law at roslin.ed.ac.uk (LAW Andrew)
Date: Wed, 12 May 2010 12:41:05 +0100
Subject: [Biojava-dev] Biojava3-Core
In-Reply-To: <653183E9-D0A3-4536-ABFB-F983522AFDEA@ebi.ac.uk>
References: <84E58D87-0921-4441-8F50-C080E5A53CCC@exseed.ed.ac.uk> <1F631C17-C53E-4192-8A06-0009A654A558@ebi.ac.uk> <93614637-B1A9-4029-BD30-AD724759C6E8@eaglegenomics.com> <098592F1-757C-495C-939F-97E9ABA9F41F@exseed.ed.ac.uk> <653183E9-D0A3-4536-ABFB-F983522AFDEA@ebi.ac.uk>
Message-ID: <6AD9B907-565B-4CDD-8647-9FD66ED41075@exseed.ed.ac.uk>

I think the main thing should be for each of us to find a diff tool that doesn't care about whitespace. The other stuff (camelCase, use of braces [ALWAYS!!!], splitting lines) is more important and can/should be defined, but if you tell me that we must use 2 spaces to indent and my editor of choice uses a tab, then I'm probably not going to listen to you in all practical situations.

Not being awkward, just pragmatic.

On 12 May 2010, at 12:26, Andy Yates wrote:

> So long as they are documented then developers shouldn't complain when their formatting changes. I just want to avoid a situation like now with Scooter's check-in which has a lot of formatting changes so the code changes are quite hard to pick out.
>
> I've gone and started a page about conventions and have started a discussion. Once we are agreed on what should go into the conventions then they will migrate to the wiki page.
Later,

Andy
--------
Yada, yada, yada...

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336
Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender.
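The idea of a diff that "doesn't care about whitespace" can be illustrated with a minimal Java sketch. This is only an illustration under stated assumptions: the class `WhitespaceInsensitiveDiff` and its methods are hypothetical names, not part of BioJava or of any IDE, and the comparison is a naive line-by-line one rather than a real diff algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper (not part of BioJava): reports which lines of two
 * versions of a file differ once all whitespace is ignored.
 */
public class WhitespaceInsensitiveDiff {

    /** Removes every whitespace character (spaces, tabs, line ends). */
    public static String stripWhitespace(String line) {
        return line.replaceAll("\\s+", "");
    }

    /** True when the two lines are equal after whitespace is removed. */
    public static boolean sameIgnoringWhitespace(String a, String b) {
        return stripWhitespace(a).equals(stripWhitespace(b));
    }

    /**
     * Naive positional comparison: returns the 1-based numbers of the lines
     * that differ in more than whitespace. A line present in only one
     * version counts as changed.
     */
    public static List<Integer> changedLines(List<String> before, List<String> after) {
        List<Integer> changed = new ArrayList<Integer>();
        int max = Math.max(before.size(), after.size());
        for (int i = 0; i < max; i++) {
            boolean missing = i >= before.size() || i >= after.size();
            if (missing || !sameIgnoringWhitespace(before.get(i), after.get(i))) {
                changed.add(i + 1);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<String> tabs = Arrays.asList("if (x) {", "\treturn 1;", "}");
        List<String> spaces = Arrays.asList("if (x) {", "  return 1;", "}");
        List<String> edited = Arrays.asList("if (x) {", "  return 2;", "}");
        System.out.println(changedLines(tabs, spaces)); // prints []
        System.out.println(changedLines(tabs, edited)); // prints [2]
    }
}
```

With this kind of comparison, a check-in that only swaps tabs for spaces shows no changed lines, while a genuine code change still stands out. The trade-off is that stripping all whitespace also hides changes inside string literals ("a b" versus "ab"), which is one reason an agreed formatting convention may still be worth having. On the command line, GNU diff offers a similar mode through its `-w` option.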
From HWillis at scripps.edu Wed May 12 08:49:39 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Wed, 12 May 2010 08:49:39 -0400
Subject: [Biojava-dev] Biojava3-Core
In-Reply-To: <6AB63499-373D-4A20-A1A9-4C65CC8BA32D@eaglegenomics.com>
References: <84E58D87-0921-4441-8F50-C080E5A53CCC@exseed.ed.ac.uk> <1F631C17-C53E-4192-8A06-0009A654A558@ebi.ac.uk> <93614637-B1A9-4029-BD30-AD724759C6E8@eaglegenomics.com> <098592F1-757C-495C-939F-97E9ABA9F41F@exseed.ed.ac.uk> <653183E9-D0A3-4536-ABFB-F983522AFDEA@ebi.ac.uk> <6AD9B907-565B-4CDD-8647-9FD66ED41075@exseed.ed.ac.uk> <45A3FA41-CC98-472D-96ED-36C56F1670BA@ebi.ac.uk> <6AB63499-373D-4A20-A1A9-4C65CC8BA32D@eaglegenomics.com>
Message-ID:

I agree this problem is not unique to our development effort, and we should lean towards stricter adherence to standards as BioJava3 represents a reset. It is always easier to start out with more rules versus adding them in later. As code becomes stable in the libraries the cost/hassle of following guidelines goes to zero. I am carrying around huge guilt for lack of test cases, javadoc and wiki overview. I just haven't had the borrowed time at work to do docs. Things are clearing up and I will be making a focused effort to get docs in place.

On May 12, 2010, at 8:21 AM, Richard Holland wrote:

> The below is a massive generalisation based on my experience of having worked as a software developer both in academia and in commercial environments (including an airline and a bank).
From andy.law at roslin.ed.ac.uk Wed May 12 09:25:17 2010
From: andy.law at roslin.ed.ac.uk (LAW Andrew)
Date: Wed, 12 May 2010 14:25:17 +0100
Subject: [Biojava-dev] Biojava3-Core
In-Reply-To: <6AB63499-373D-4A20-A1A9-4C65CC8BA32D@eaglegenomics.com>
References: <84E58D87-0921-4441-8F50-C080E5A53CCC@exseed.ed.ac.uk> <1F631C17-C53E-4192-8A06-0009A654A558@ebi.ac.uk> <93614637-B1A9-4029-BD30-AD724759C6E8@eaglegenomics.com> <098592F1-757C-495C-939F-97E9ABA9F41F@exseed.ed.ac.uk> <653183E9-D0A3-4536-ABFB-F983522AFDEA@ebi.ac.uk> <6AD9B907-565B-4CDD-8647-9FD66ED41075@exseed.ed.ac.uk> <45A3FA41-CC98-472D-96ED-36C56F1670BA@ebi.ac.uk> <6AB63499-373D-4A20-A1A9-4C65CC8BA32D@eaglegenomics.com>
Message-ID:

Don't think that there's anything I'd disagree with in what you've written. And he who controls the commit phase, controls the game (to paraphrase an obnoxious NZ rugby referee's tattoo).
Later,

Andy
--------
Yada, yada, yada...

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336
Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender.
From ayates at ebi.ac.uk Wed May 12 09:42:55 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 12 May 2010 14:42:55 +0100
Subject: [Biojava-dev] Biojava3-Core
In-Reply-To:
References: <84E58D87-0921-4441-8F50-C080E5A53CCC@exseed.ed.ac.uk> <1F631C17-C53E-4192-8A06-0009A654A558@ebi.ac.uk> <93614637-B1A9-4029-BD30-AD724759C6E8@eaglegenomics.com> <098592F1-757C-495C-939F-97E9ABA9F41F@exseed.ed.ac.uk> <653183E9-D0A3-4536-ABFB-F983522AFDEA@ebi.ac.uk> <6AD9B907-565B-4CDD-8647-9FD66ED41075@exseed.ed.ac.uk> <45A3FA41-CC98-472D-96ED-36C56F1670BA@ebi.ac.uk> <6AB63499-373D-4A20-A1A9-4C65CC8BA32D@eaglegenomics.com>
Message-ID: <7FEEF486-B259-4E22-8298-DA376B14DC7B@ebi.ac.uk>

Can't think of a NZ ref you may be thinking of ;)

In the spirit of this, I have been trying to move towards a more professional approach to BioJava development, but this is counteracted by the limited time I can devote to it, which produces work closer to the first pattern of work that Richard mentioned. Quick & easy standards where IDEs can help are at least a step towards helping me out and mean that I get to spend more time worrying about the code, tests & docs.

Going back to the second point of the thread: if Scooter can share his Netbeans formatting profile then I will adopt that; if it is the default then I will use that unchanged. Anyone else who is working on the code & in Netbeans, feel free to also use the same profile. Eclipse users are going to be left out in the cold on this since I'll be leaving the shores of Eclipse for BioJava3. Maybe this can be the start of a defined set of coding practices, but for the moment it will ease code sharing for myself & Scooter.

Andy

On 12 May 2010, at 14:25, LAW Andrew wrote:

> Don't think that there's anything I'd disagree with in what you've written. And he who controls the commit phase, controls the game (to paraphrase an obnoxious NZ rugby referee's tattoo).
> > > On 12 May 2010, at 13:21, Richard Holland wrote: > >> The below is a massive generalisation based on my experience of having worked as a software developer both in academia and in commercial environments (including an airline and a bank). >> >> The standards of academic-generated software tend to be defined as whatever the author (usually a sole author) identifies as requiring least effort to achieve, so the library that gets used is the one they've just found on Google that looks like it probably works, the formatting is their editor's default, documentation is minimal (because most users will probably just email and ask questions anyway), user requirements gathering is based on asking friends at coffee, the code commenting is for their own short-term reference (because most software ceases to be developed any further after the paper about it is published or the author leaves the institute), etc. The standards tend to be selected as whatever makes the author's life easiest, as the software's purpose is to support some other research and usually is not the author's main goal - it gets published as a side-effect of their main research interests. >> >> In commercial software standards tend to be defined by best practice, with authors (usually teams) being asked to adhere strictly to defined customs such a code formatting, and to thoroughly test any new libraries before using them. Detailed user documentation is required because helpdesks are expensive things to run, and proper detailed user requirements gathering is important to minimise subsequent helpdesk contact and improvement requests. Code commenting is also critical to aid new members of the team to take over when old members leave and there's no overlap to transfer knowledge face-to-face. These processes slow down development and increase the costs of the software produced but they do ensure that the code is of higher quality and is more maintainable, and that the users get more benefit from it. 
> >>
> >> The above comments might seem harsh over-generalisations but I can put my hand up and say I've definitely been guilty in the past of most of those academic charges myself, and I know plenty of others who have been there too. Likewise I've experienced how depressing it is to work in a commercial environment so strictly locked down that you think development is heading nowhere because it's just all too much effort.
> >>
> >> BioJava sits between the two, along with most other Bio* open source projects. It's produced and maintained mostly by a team, as in commercial software, but that team is made up mostly of academics. If we want BioJava to be accepted as quality software we have to adopt at least some of the above commercial software development techniques, and then we need to ensure that everyone who develops for BioJava sticks to them to the letter. That could include being asked to reconfigure default settings on editors, post detailed explanations about choice of external libraries, etc.
> >>
> >> cheers,
> >> Richard
>
> Later,
>
> Andy
> --------
> Yada, yada, yada...
>
> The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336
> Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender.
>

--
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From holland at eaglegenomics.com Wed May 12 09:53:42 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 12 May 2010 14:53:42 +0100
Subject: [Biojava-dev] Biojava3-Core
In-Reply-To: <7FEEF486-B259-4E22-8298-DA376B14DC7B@ebi.ac.uk>
References: <84E58D87-0921-4441-8F50-C080E5A53CCC@exseed.ed.ac.uk> <1F631C17-C53E-4192-8A06-0009A654A558@ebi.ac.uk> <93614637-B1A9-4029-BD30-AD724759C6E8@eaglegenomics.com> <098592F1-757C-495C-939F-97E9ABA9F41F@exseed.ed.ac.uk> <653183E9-D0A3-4536-ABFB-F983522AFDEA@ebi.ac.uk> <6AD9B907-565B-4CDD-8647-9FD66ED41075@exseed.ed.ac.uk> <45A3FA41-CC98-472D-96ED-36C56F1670BA@ebi.ac.uk> <6AB63499-373D-4A20-A1A9-4C65CC8BA32D@eaglegenomics.com> <7FEEF486-B259-4E22-8298-DA376B14DC7B@ebi.ac.uk>
Message-ID: <9515BF3E-A65D-4B7F-9C28-E603FC4C4896@eaglegenomics.com>

it's cold out here! (says me, the last remaining eclipse user...)

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From ayates at ebi.ac.uk Wed May 12 09:56:33 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 12 May 2010 14:56:33 +0100
Subject: [Biojava-dev] Biojava3-Core
In-Reply-To: <9515BF3E-A65D-4B7F-9C28-E603FC4C4896@eaglegenomics.com>
References: <84E58D87-0921-4441-8F50-C080E5A53CCC@exseed.ed.ac.uk> <1F631C17-C53E-4192-8A06-0009A654A558@ebi.ac.uk> <93614637-B1A9-4029-BD30-AD724759C6E8@eaglegenomics.com> <098592F1-757C-495C-939F-97E9ABA9F41F@exseed.ed.ac.uk> <653183E9-D0A3-4536-ABFB-F983522AFDEA@ebi.ac.uk> <6AD9B907-565B-4CDD-8647-9FD66ED41075@exseed.ed.ac.uk> <45A3FA41-CC98-472D-96ED-36C56F1670BA@ebi.ac.uk> <6AB63499-373D-4A20-A1A9-4C65CC8BA32D@eaglegenomics.com> <7FEEF486-B259-4E22-8298-DA376B14DC7B@ebi.ac.uk> <9515BF3E-A65D-4B7F-9C28-E603FC4C4896@eaglegenomics.com>
Message-ID:

Come over to the Netbeans island. It's sunny & they have Mojitos

From markjschreiber at gmail.com Wed May 12 09:58:06 2010
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 12 May 2010 21:58:06 +0800
Subject: [Biojava-dev] Biojava3-Core
In-Reply-To: <7FEEF486-B259-4E22-8298-DA376B14DC7B@ebi.ac.uk>
References: <1F631C17-C53E-4192-8A06-0009A654A558@ebi.ac.uk> <93614637-B1A9-4029-BD30-AD724759C6E8@eaglegenomics.com> <098592F1-757C-495C-939F-97E9ABA9F41F@exseed.ed.ac.uk> <653183E9-D0A3-4536-ABFB-F983522AFDEA@ebi.ac.uk> <6AD9B907-565B-4CDD-8647-9FD66ED41075@exseed.ed.ac.uk> <45A3FA41-CC98-472D-96ED-36C56F1670BA@ebi.ac.uk> <6AB63499-373D-4A20-A1A9-4C65CC8BA32D@eaglegenomics.com> <7FEEF486-B259-4E22-8298-DA376B14DC7B@ebi.ac.uk>
Message-ID:

What's wrong with New Zealand referees?

Like Richard, having seen both sides of this game I think that we should at least adhere to some minimum standards. It does however mean that code needs to be reviewed by either some code review tool or a couple of people who will do some minimal checks for quality. Pleasingly there does seem to be some more developers on the list now so this might be feasible. It will also mean that there will be fewer people with commit privileges.

- Mark

From markjschreiber at gmail.com Wed May 12 10:00:22 2010
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 12 May 2010 22:00:22 +0800
Subject: [Biojava-dev] Biojava3-Core
In-Reply-To:
References: <098592F1-757C-495C-939F-97E9ABA9F41F@exseed.ed.ac.uk> <653183E9-D0A3-4536-ABFB-F983522AFDEA@ebi.ac.uk> <6AD9B907-565B-4CDD-8647-9FD66ED41075@exseed.ed.ac.uk> <45A3FA41-CC98-472D-96ED-36C56F1670BA@ebi.ac.uk> <6AB63499-373D-4A20-A1A9-4C65CC8BA32D@eaglegenomics.com> <7FEEF486-B259-4E22-8298-DA376B14DC7B@ebi.ac.uk> <9515BF3E-A65D-4B7F-9C28-E603FC4C4896@eaglegenomics.com>
Message-ID:

Eclipse does have better Android, AppEngine and AWS support, I use both, Netbeans for Desktops and Servers and Eclipse for Phones and Clouds.

From andy.law at roslin.ed.ac.uk Wed May 12 10:15:45 2010
From: andy.law at roslin.ed.ac.uk (LAW Andrew)
Date: Wed, 12 May 2010 15:15:45 +0100
Subject: [Biojava-dev] Biojava3-Core
In-Reply-To:
References: <098592F1-757C-495C-939F-97E9ABA9F41F@exseed.ed.ac.uk> <653183E9-D0A3-4536-ABFB-F983522AFDEA@ebi.ac.uk> <6AD9B907-565B-4CDD-8647-9FD66ED41075@exseed.ed.ac.uk> <45A3FA41-CC98-472D-96ED-36C56F1670BA@ebi.ac.uk> <6AB63499-373D-4A20-A1A9-4C65CC8BA32D@eaglegenomics.com> <7FEEF486-B259-4E22-8298-DA376B14DC7B@ebi.ac.uk> <9515BF3E-A65D-4B7F-9C28-E603FC4C4896@eaglegenomics.com>
Message-ID:

On 12 May 2010, at 15:00, Mark Schreiber wrote:

> Eclipse does have better Android, AppEngine and AWS support, I use
> both, Netbeans for Desktops and Servers and Eclipse for Phones and
> Clouds.

Hmmm. BioJava on Android anyone? :o}

Later,

Andy
--------
Yada, yada, yada...

From ayates at ebi.ac.uk Wed May 12 10:45:20 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 12 May 2010 15:45:20 +0100
Subject: [Biojava-dev] DNASequence not being a bean
In-Reply-To: <428F360A0BA69249B73A710B58435CFE04DC1BBA6F@EXSEEDMD.is.ed.ac.uk>
References: <246E113A-08DC-477E-B624-1CD2E69CEBB8@ebi.ac.uk> <428F360A0BA69249B73A710B58435CFE04DC1BBA6F@EXSEEDMD.is.ed.ac.uk>
Message-ID:

Let us know what you think as soon as possible and how you're going to proceed.
I'm going through the code later on today and will attempt to clean up all of these problems. I'm sorry, but for the moment the SVN will be broken. My only fix in the meantime is to ignore the GC tests, which are the ones that will probably be causing you the problems.

Andy

On 12 May 2010, at 10:37, PATERSON Trevor wrote:

> Thanks for your comments and ideas...
>
> I am mulling things over
>
> In the meantime, I haven't been able to connect to the SVN for a while... there seems to be something wrong with open-bio connections. Does it still allow anonymous connections?
>
> I can get the code tarred up from github... but it is missing a module so won't compile
>
>>> org/biojava3/core/sequence/DNATests.java:[13,33] package org.biojava3.core.features does not exist
>
> trevor
>
>> -----Original Message-----
>> From: biojava-dev-bounces at lists.open-bio.org
>> [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of Andy Yates
>> Sent: 12 May 2010 10:10
>> To: Scooter Willis
>> Cc: biojava-dev at lists.open-bio.org
>> Subject: Re: [Biojava-dev] Biojava3-Core
>>
>> One thing as a co-contributor that I've noticed is there is a difference in coding style. I'm happy to adopt any coding style the majority agree with but I'd rather not do a huge commit changing 2 spaced tabs into 4 spaced tabs :). Can you send around your netbeans coding profile?
>>
>> Andy
>>
>> On 12 May 2010, at 02:39, Scooter Willis wrote:
>>
>>> I just committed a fairly significant number of changes into BioJava3-core. If you don't have any code changes in core then it is probably better to delete the core package and check out again to avoid resolving lots of conflicts.
>>>
>>> I also checked in the beginning of biojava3-genome which makes significant use of biojava3-core. The code is being used to take gene predictions from GeneMarkHMM, GeneID and GeneMark which all have their own version of gff, gff3 and gtf output. The code takes features from the predictions, maps them onto DNASequences and allows you to get ProteinSequence for each prediction. It is very much a work in progress and I will be adding additional features in the next 30 days for a gene annotation project that is part of the day job.
>>>
>>> Let me know if you have any comments or suggestions. I promise better java docs, test cases and wiki overview soon.
>>>
>>> Scooter

From HWillis at scripps.edu Wed May 12 16:35:33 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Wed, 12 May 2010 16:35:33 -0400
Subject: [Biojava-dev] Code checkout
References: <1AA4133C-6722-4F2C-8406-7810FC2C5680@scripps.edu>
Message-ID: <684C21AE-E91D-4692-A6E5-3B5C1F441463@scripps.edu>

Andy

Yesterday when I was fighting code updates I only pulled the source tree and not the test source (forgot). So all test cases passing was only true for all old test cases.
TranslationTest is failing on multiFrameTranslation. This could be a result of all the code changes we made or something you are working on. I didn't try to fix it, so that test case fails, but all others pass. I also commented out the GCStats test case, as I'm not sure whether that belongs in the genome module or is functionality that should be in the core.

Your test case coverage is amazing. When are you going to get around to actually writing code that does something! Kidding of course and you have set the bar very high....

I moved FastaReaderHelper and FastaWriterHelper to the sequence.io package as they were in the sequence package. Not sure if we should adopt the Helper naming convention, but I like having a class designed to bring together code abstraction in easy-to-use working code. The newbie programmer is not going to invest time in figuring out how to read a fasta file with all the flexibility we are providing, so typical use cases can be represented in the helper class. It also makes it easier to have modular working examples in one place. The helper classes should all have static methods and very few lines of code per method.

Scooter

From sbliven at ucsd.edu Wed May 12 19:11:42 2010
From: sbliven at ucsd.edu (Spencer Bliven)
Date: Wed, 12 May 2010 16:11:42 -0700
Subject: [Biojava-dev] Biojava3-Core
In-Reply-To: <45A3FA41-CC98-472D-96ED-36C56F1670BA@ebi.ac.uk>
References: <84E58D87-0921-4441-8F50-C080E5A53CCC@exseed.ed.ac.uk> <1F631C17-C53E-4192-8A06-0009A654A558@ebi.ac.uk> <93614637-B1A9-4029-BD30-AD724759C6E8@eaglegenomics.com> <098592F1-757C-495C-939F-97E9ABA9F41F@exseed.ed.ac.uk> <653183E9-D0A3-4536-ABFB-F983522AFDEA@ebi.ac.uk> <6AD9B907-565B-4CDD-8647-9FD66ED41075@exseed.ed.ac.uk> <45A3FA41-CC98-472D-96ED-36C56F1670BA@ebi.ac.uk>
Message-ID:

Regarding diff tools, I think all the major tools ignore whitespace if you set up the options right. In Eclipse, this can be changed by right clicking. For diff, use 'diff -b'. For svn, use 'svn diff -x -b'. Meld (Linux) and FileMerge (Mac) also can easily switch back and forth. I assume the same is true in NetBeans, although I've never used it.

That said, a good diff tool won't prevent the annoyance of trying to read code with inconsistent whitespace. Let's set a standard (I like 4 spaces too) and all agree to stick to it. Setting up pre-commit hooks in SVN seems like it's usually more trouble than it's worth (http://stackoverflow.com/questions/1017953?tab=votes#tab-top).

-Spencer

On Wed, May 12, 2010 at 4:56 AM, Andy Yates wrote:

> I can say that the Eclipse diff tool does care about whitespace and I'm quite sure that the Unix diff cares about it as well. I can't say anything about Netbeans since I haven't used it in years.
>
> With all the best respect if the formatting rules say 4 spaces per indentation & code comes in not behaving in that manner then it'll be reformatted. This is nothing against the person who does the commits just they should be aware that it _could_ happen. I should say my personal preference is 2 spaces per indentation; I am flexible about this & will go with what the majority agree. That said if I'm being flexible about it I hope others will be as well.
>
> Andy
>
> On 12 May 2010, at 12:41, LAW Andrew wrote:
>
> > I think the main thing should be for each of us to find a diff tool that doesn't care about whitespace. The other stuff (camelCase, use of braces [ALWAYS!!!], splitting lines) are more important and can/should be defined but if you tell me that we must use 2 spaces to indent and my editor of choice uses a tab then I'm probably not going to listen to you in all practical situations.
> >
> > Not being awkward, just pragmatic.
> >
> > On 12 May 2010, at 12:26, Andy Yates wrote:
> >
> >> So long as they are documented then developers shouldn't complain when their formatting changes.
I just want to avoid a situation like now with > Scooter's check-in which has a lot of formatting changes so the code changes > are quite hard to pick out. > >> > >> I've gone and started a page about conventions and have started a > discussion. Once we are agreed on what should go into the conventions then > they will migrate to the wiki page. > >> > >> Andy > >> > >> On 12 May 2010, at 12:09, LAW Andrew wrote: > >> > >>> > >>> On 12 May 2010, at 11:52, Richard Holland wrote: > >>> > >>>> I see a need for a formal coding style here, regardless of what > platform people are using. Taking Netbeans as a basis is a good start but it > needs to be documented for contributors to read and follow, then enforced > across the whole project. > >>> > >>> Absolutely. But the pain in making > use those conventions can be numbing. > >>> > >>> > >>> > >>> Later, > >>> > >>> Andy > >>> -------- > >>> Yada, yada, yada... > >>> > >>> The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336 > >>> Disclaimer: This e-mail and any attachments are confidential and > intended solely for the use of the recipient(s) to whom they are addressed. > If you have received it in error, please destroy all copies and inform the > sender. > >>> > >>> > >>> > >>> > >>> > >>> > >>> -- > >>> The University of Edinburgh is a charitable body, registered in > >>> Scotland, with registration number SC005336. > >>> > >> > >> -- > >> Andrew Yates Ensembl Genomes Engineer > >> EMBL-EBI Tel: +44-(0)1223-492538 > >> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 > >> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ > >> > >> > >> > >> > > > > Later, > > > > Andy > > -------- > > Yada, yada, yada... 

From andreas at sdsc.edu Wed May 12 19:34:22 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 12 May 2010 16:34:22 -0700
Subject: [Biojava-dev] Biojava3-Core
In-Reply-To:
References: <84E58D87-0921-4441-8F50-C080E5A53CCC@exseed.ed.ac.uk> <1F631C17-C53E-4192-8A06-0009A654A558@ebi.ac.uk> <93614637-B1A9-4029-BD30-AD724759C6E8@eaglegenomics.com> <098592F1-757C-495C-939F-97E9ABA9F41F@exseed.ed.ac.uk> <653183E9-D0A3-4536-ABFB-F983522AFDEA@ebi.ac.uk> <6AD9B907-565B-4CDD-8647-9FD66ED41075@exseed.ed.ac.uk> <45A3FA41-CC98-472D-96ED-36C56F1670BA@ebi.ac.uk>
Message-ID:

Eclipse has code style templates. Are they based on a standard that can be shared between IDEs? if yes the easiest would be to define such a template and make everybody use it....

Andreas

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From heuermh at acm.org Thu May 13 01:06:30 2010
From: heuermh at acm.org (Michael Heuer)
Date: Thu, 13 May 2010 01:06:30 -0400 (EDT)
Subject: [Biojava-dev] Biojava3-Core
In-Reply-To:
Message-ID:

Sorry to come into this conversation late.
I have gone through this in several different FOSS and in-house projects, and have found that the best way to do it is

1) Configure Checkstyle plugin (supported via ant, maven, eclipse, maybe others) to a standard that most developers are comfortable with and that doesn't result in 10000s of errors per class in the existing source tree

http://checkstyle.sf.net

2) Configure the Eclipse formatter (or similar tool in another IDE) to match the Checkstyle configuration as closely as possible and run it on the existing source tree

3) Tweak both configurations until you have a reasonable result (say 100 or fewer errors per class) and check those in to revision control

4) Include Checkstyle reports in continuous integration builds, and make sure the number of Checkstyle errors goes down over time

If developers use the IDE of choice, then they should use the formatter before check in; if not, then they are otherwise responsible for making sure they check in code that doesn't increase the number of Checkstyle errors significantly.

Also, developers should try to commit style-fix-only changes in commits separate from feature-or-bug-fix changes.

michael

From trevor.paterson at roslin.ed.ac.uk Thu May 13 08:21:47 2010
From: trevor.paterson at roslin.ed.ac.uk (PATERSON Trevor)
Date: Thu, 13 May 2010 13:21:47 +0100
Subject: [Biojava-dev] EnsemblApi use case for DNASequences
Message-ID: <428F360A0BA69249B73A710B58435CFE04DD42E1F6@EXSEEDMD.is.ed.ac.uk>

Perhaps if I describe our initial use case and how we hope to address it using Ibatis and BioJava API, I can get some pointers on how much of this is already supported in BioJava, how much I am going to need to implement and how I would be best doing this to generate useful reusable code.

For each genome assembly build Ensembl stores different levels of DNA sequence regions; it calls these coordinate_systems (eg clones, contigs, chromosomes etc).

For each genome assembly there is one 'TopLevel' coordinate_system (eg chromosomes), and one 'SequenceLevel' coordinate_system (eg contigs).

Each sequence region in the database records its length and coordinate_system BUT ONLY DNA regions which are at 'SequenceLevel' have actual DNA Sequence recorded; all other regions must have their actual sequence recovered by 'projecting' from their level to DNA regions at 'SequenceLevel'.

so our initial use case is
__________________________

1. retrieve Chromosome 25 for Chicken from the database.

What we get back are some properties (Name, coordinateSystemID and length) - and what we map this to in ibatis is an AssembledSequence Object - with these properties

2. fetch the sequence level assembly details for this Chromosome.
We get back a table mapping from-to coordinates of the chromosome versus from-to coordinates of the contigs that are at Sequence Level

diagrammatically this looks like

<--------------------------------------------------------------> chr25
<--> <---> <----> <--> <--> <-----> <--> <--> <----> <---> contigs
<-----> <--> <----> <------->

you will note that there are
- overlaps
- gaps
- potentially mismatches (I am ignoring these for the moment)

3. to get the DNA sequence, the ensembl perl api stitches together the contigs into one 'Sequence' - filling gaps with gap sequences of the correct length, so it generates an ordered list of mappings between the chromosome coordinate system and coordinates of contigs and gaps

<--> <---> ---> <--> > <-----> -> ---> <---> contigs
--> <--> <----> ------->
nn n n n nn gaps

the perl api can then fetch the actual DNA sequence for any region of the chromosome by looking up, in this projection map, the contig regions whose sequence it needs to fetch.

Remember that chromosomes, contigs and gaps can all be very long, or very short!

Our Java API
____________

I have mirrored what the perl api does

fetching a chromosome object - which Ibatis instantiates as an AssembledSequence object, which extends BioJava DNASequence Object - but obviously just has a couple of new properties set at this time (length, name, coord_system).

fetching an Assembly Object for this Chromosome Object - this contains an ordered List of Mapping Objects which contain Source (ie the Chromosome), SourceCoordinates, Target (a new DNASequence Object for each contig), TargetCoordinates

This Assembly Object can stitch together the Mapping Projection for all or some of the Chromosome, just like the perl API, creating a new ordered List of Mapping Objects where the TargetCoordinates are altered to remove overlaps, and new GapSequence objects have been inserted.
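
The stitching step described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: `Segment` and `AssemblyStitcher` are hypothetical names (not BioJava or Ensembl classes), coordinates are 1-based and inclusive, the input mappings are assumed to be sorted by chromosome start, strand/orientation and mismatches are ignored, and a gap is represented by a null contig name purely for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical value type: one stretch of the chromosome, backed by a contig slice or a gap. */
final class Segment {
    final int chrStart;     // first chromosome position covered (1-based, inclusive)
    final int chrEnd;       // last chromosome position covered (inclusive)
    final String contig;    // contig name, or null when this segment is a gap
    final int contigStart;  // position in the contig that maps to chrStart (0 for a gap)

    Segment(int chrStart, int chrEnd, String contig, int contigStart) {
        this.chrStart = chrStart;
        this.chrEnd = chrEnd;
        this.contig = contig;
        this.contigStart = contigStart;
    }

    boolean isGap() { return contig == null; }

    int length() { return chrEnd - chrStart + 1; }
}

/** Hypothetical helper that turns raw contig mappings into a gap-filled, non-overlapping tiling. */
final class AssemblyStitcher {

    /** Assumes mappings are sorted by chrStart and all lie within 1..chrLength. */
    static List<Segment> stitch(List<Segment> mappings, int chrLength) {
        List<Segment> stitched = new ArrayList<>();
        int next = 1; // next chromosome position that is not yet covered
        for (Segment m : mappings) {
            if (m.chrEnd < next) {
                continue; // entirely covered by earlier contigs, nothing left to use
            }
            if (m.chrStart > next) {
                // Uncovered stretch before this contig: insert a gap of the right length.
                stitched.add(new Segment(next, m.chrStart - 1, null, 0));
                next = m.chrStart;
            }
            // Trim the part of this contig that overlaps what is already covered.
            int trimmed = next - m.chrStart;
            stitched.add(new Segment(next, m.chrEnd, m.contig, m.contigStart + trimmed));
            next = m.chrEnd + 1;
        }
        if (next <= chrLength) {
            // Trailing gap up to the end of the chromosome.
            stitched.add(new Segment(next, chrLength, null, 0));
        }
        return stitched;
    }
}
```

For example, contigs mapped to 1-10, 8-15 and 20-25 on a chromosome of length 30 would come back as five segments: the first contig at 1-10, the second at 11-15 (starting from its fourth base), a gap at 16-19, the third contig at 20-25 and a trailing gap at 26-30.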
[Gaps are problematic - do I really want DNASequence Objects that contain N of length x, allowing me to use the Gaps just like any other DNASequence but with all the overhead that involves, or should I just omit these mappings, or do I set the Target to Null in a Mapping - and then I will need code to handle these wherever I use sequences that contain null spacers - PERHAPS there is some representation to handle Gaps generically in the BioJava API.]

So now I am at the point of fetching actual DNA Sequence for regions of interest on the Chromosome. This will involve a look up of the stitched Mapping List for the contig regions to retrieve from Ensembl, and then setting the actual DNA sequence in these.

Hence my simplistic extension of DNA Sequences in the above scenario falls over because of the Ibatis Bean requirement for setting properties directly on Objects, which I can't work around if the DNASequence objects don't allow for setters.

I'm playing with lots of different ideas - possibly the simplest is just to forget about extending BioJava DNASequence for my ensembl objects (chromosomes, contigs) - and just create DNASequences for the 'real' Sequences that I get back as base strings from ensembl, which would then be contained or referenced in my chromosomes/contig objects etc. I am sure however that this would mean that I end up having to reimplement much of the BioJava functionality in the new model Objects, whereas I was hoping to leverage this transparently by simply extending DNASequence.

I guess one of my biggest concerns about extending BioJava to represent very big sequences is the potential overhead if I have to instantiate them with backing stores containing the 'real' sequences - we are obviously hoping to lazy load (sub)sequences from ensembl when they are actually needed.
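
A minimal sketch of that lazy-loading idea, under stated assumptions: `LazySequence` is a hypothetical class (it does not extend or implement anything in BioJava), and the fetcher function stands in for whatever query would pull bases from Ensembl. The length is supplied up front from the initial query, so asking for it never touches the backing store; bases are fetched only for the sub-range actually requested.

```java
import java.util.function.BiFunction;

/** Hypothetical lazily loaded sequence: metadata is held locally, bases are fetched on demand. */
final class LazySequence {
    private final String name;
    private final int length; // known from the initial query, no sequence needed
    // Hypothetical stand-in for a database call: (start, end), 1-based inclusive -> bases.
    private final BiFunction<Integer, Integer, String> fetcher;
    private int fetchCount = 0; // only here to make the laziness observable

    LazySequence(String name, int length, BiFunction<Integer, Integer, String> fetcher) {
        this.name = name;
        this.length = length;
        this.fetcher = fetcher;
    }

    String getName() { return name; }

    /** Answered from the stored property; the backing store is not consulted. */
    int getLength() { return length; }

    /** Fetches only the requested region rather than the whole sequence. */
    String getSubSequence(int start, int end) {
        if (start < 1 || end > length || start > end) {
            throw new IllegalArgumentException(
                    "Range " + start + "-" + end + " is outside 1-" + length);
        }
        fetchCount++;
        return fetcher.apply(start, end);
    }

    int getFetchCount() { return fetchCount; }
}
```

This sketch fetches on every call; whether and how fetched regions should be cached is left open.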
We would have to be very careful to override all the methods that called back to the backing store if we already had the information we needed or could lazy load it, without grabbing the whole sequence (e.g. the simple case of the chromosome - we have the length from the initial query - so wouldn't want to retrieve it from the backing store).

So probably the correct way of doing things is to implement our own SequenceProxyReader for EnsemblAware Sequences to handle lazy loads, which also provides all of the required backing store functionality. As usual the correct way will turn out to be the most work!

Cheers Trevor
--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

From ayates at ebi.ac.uk Thu May 13 08:38:10 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Thu, 13 May 2010 13:38:10 +0100
Subject: [Biojava-dev] EnsemblApi use case for DNASequences
In-Reply-To: <428F360A0BA69249B73A710B58435CFE04DD42E1F6@EXSEEDMD.is.ed.ac.uk>
References: <428F360A0BA69249B73A710B58435CFE04DD42E1F6@EXSEEDMD.is.ed.ac.uk>
Message-ID:

I have to say that, from working with Ensembl for the past 2 years, hearing this is what it does to store sequence scares **** out of me; you've really hit onto the hardest part of the schema there.

As you said at the end of your email, the best way to accomplish this is by creating a SequenceProxyReader which can do all this logic and lets you work with the "right" objects and not have to re-implement that code. Now this leaves a few alternatives to how you can represent this in memory. We already have a 2bit implementation (will be called TwoBitSequenceReader) for storing very large pieces of Sequence but that only has support for ACGT and no support for gaps or Ns. This could be extended to bring in support for these as features, or you could materialise that sequence and then push it into another Sequence object I have been working with (not checked in at the moment) which lets you join Sequences together.
This combined with a Sequence which returns Compounds of a particular type e.g. Ns for any given length would let you represent massive amounts of Sequence in a very small amount of space. All of these updates will be in place soon but I cannot say exactly when The other option would be to cache chunks of the DNA indexed by the seq_region_id. Pushing this into a LRU cache with soft references (so they'll be cleaned up when you'd run out of memory) could be a good way to go. Either way the simple way really isn't the way to go IMHO; on the flip side it would get you to a prototype quicker. Of course this depends on what type of code you are writing. If it is prototype code then great or if it's what normally happens in Bioinformatics (we claim it's a prototype but in reality it's the real deal) then go with the proper solution Andy On 13 May 2010, at 13:21, PATERSON Trevor wrote: > Perhaps if I describe our initial use case and how we hope to address it > using Ibatis and BioJava API, I can get some pointers on how much of this > is already supported in BioJava, how much I am going to need to implement > and how I would be best doing this to generate useful reusable code. > > For each genome assembly build Ensembl stores different levels of DNA > sequence regions, it calls these coordinate_sytems (eg clones, contigs, chromosomes etc). > > For each genome assembly there is one 'TopLevel' coordinate_system (eg chromosomes). > And one 'SequenceLevel' coordinate_system (eg contigs). > > Each sequence region in the database records its length and coordinate_system > BUT ONLY DNA regions which are at 'SequenceLevel' have actual DNA Sequence > recorded, all other regions must have their actual sequence recovered by > 'projecting' from their level to DNA regions at 'SequenceLevel'. > > > so our initial use case is > __________________________ > > 1. retrieve Chromosome 25 for Chicken from the database. 
>
> What we get back are some properties (Name, coordinateSystemID and length)
> - and what we map this to in Ibatis is an AssembledSequence Object - with these properties
>
> 2. fetch the sequence level assembly details for this Chromosome.
>
> We get back a table mapping from-to coordinates of the chromosome versus from-to
> coordinates of the contigs that are at Sequence Level
>
> diagrammatically this looks like
>
>
> <--------------------------------------------------------------> chr25
> <--> <---> <----> <--> <--> <-----> <--> <--> <----> <---> contigs
> <-----> <--> <----> <------->
>
> you will note that there are
> - overlaps
> - gaps
> - potentially mismatches (I am ignoring these for the moment)
>
> 3. to get the DNA sequence, the Ensembl Perl API stitches together the contigs into
> one 'Sequence' - filling gaps with gap sequences of the correct length, so it generates
> an ordered list of mappings between the chromosome coordinate system and coordinates of
> contigs and gaps
>
> <--> <---> ---> <--> > <-----> -> ---> <---> contigs
> --> <--> <----> ------->
> nn n n n nn gaps
>
> the Perl API can then fetch the actual DNA sequence for any region of the chromosome
> by looking up, in this projection map, the contig regions it needs in order to fetch
> the projected sequence.
>
> Remember that chromosomes, contigs and gaps can all be very long, or very short!
>
> Our Java API
> ____________
>
> I have mirrored what the Perl API does
>
> fetching a chromosome object - which Ibatis instantiates as an AssembledSequence object,
> which extends BioJava DNASequence Object - but obviously just has a couple of new properties
> set at this time (length, name, coord_system).
>
> fetching an Assembly Object for this Chromosome Object - this contains an ordered List of Mapping
> Objects which contain Source (ie the Chromosome), SourceCoordinates, Target (a new DNASequence Object
> for each contig), TargetCoordinates
>
> This Assembly Object can stitch together the Mapping Projection for all or some of the
> Chromosome, just like the Perl API, creating a new ordered List of Mapping Objects where
> the TargetCoordinates are altered to remove overlaps, and new GapSequence objects have been
> inserted. [Gaps are problematic - do I really want DNASequence Objects that contain N of
> length x, allowing me to use the Gaps just like any other DNASequence but with all the overhead
> that involves, or should I just omit these mappings, or do I set the Target to Null in a Mapping
> - and then I will need code to handle these wherever I use sequences that contain null spacers -
> PERHAPS there is some representation to handle Gaps generically in the BioJava API.]
>
> So now I am at the point of fetching actual DNA Sequence for regions of interest on the
> Chromosome. This will involve a look up of the stitched Mapping List for the contig regions
> to retrieve from Ensembl, and then setting the actual DNA sequence in these.
>
> Hence my simplistic extension of DNA Sequences in the above scenario falls over because of the
> Ibatis Bean requirement for setting properties directly on Objects, which I can't work around if
> the DNASequence objects don't allow for setters.
>
> I'm playing with lots of different ideas - possibly the simplest is just to forget about
> extending BioJava DNASequence for my Ensembl objects (chromosomes, contigs)
> - and just create DNASequences for the 'real' Sequences that I get back as base strings
> from Ensembl, which would then be contained or referenced in my chromosomes/contig objects etc.
> I am sure however that this would mean that I end up having to
> reimplement much of the BioJava functionality in the new model Objects, whereas I was hoping
> to leverage this transparently by simply extending DNASequence.
>
> I guess one of my biggest concerns about extending BioJava to represent very big sequences is
> the potential overhead if I have to instantiate them with backing stores containing the 'real'
> sequences - we are obviously hoping to lazy load (sub)sequences from Ensembl when they are actually
> needed. We would have to be very careful to override all the methods that called back to the backing
> store if we already had the information we needed or could lazy load it, without grabbing the whole sequence.
> (e.g. the simple case of the chromosome - we have the length from the initial query - so wouldn't want
> to retrieve it from the backing store).
>
> So probably the correct way of doing things is to implement our own SequenceProxyReader for EnsemblAware
> Sequences to handle lazy loads, which also provides all of the required backing store functionality. As
> usual the correct way will turn out to be the most work!
>
> Cheers Trevor
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
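
For reference, the stitching described in step 3 of the quoted message could be sketched roughly as below. This is only an illustration under stated assumptions: Mapping and AssemblyStitcher are hypothetical names (not Ensembl or BioJava classes), the input is assumed to be sorted by chromosome start with all contigs on the forward strand, and overlaps are resolved by trimming the later contig, which may not be what the Perl API actually does.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical value class: a chromosome interval mapped onto a contig, or a gap when contig is null. */
class Mapping {
    final int sourceStart;  // 1-based inclusive start on the chromosome
    final int sourceEnd;    // 1-based inclusive end on the chromosome
    final String contig;    // contig name, or null for a gap
    final int targetStart;  // 1-based start on the contig (0 for gaps)

    Mapping(int sourceStart, int sourceEnd, String contig, int targetStart) {
        this.sourceStart = sourceStart;
        this.sourceEnd = sourceEnd;
        this.contig = contig;
        this.targetStart = targetStart;
    }

    static Mapping gap(int start, int end) {
        return new Mapping(start, end, null, 0);
    }

    boolean isGap() {
        return contig == null;
    }

    int length() {
        return sourceEnd - sourceStart + 1;
    }
}

/** Hypothetical helper that turns raw contig mappings into a gap-filled, non-overlapping projection. */
class AssemblyStitcher {

    /** Input must be sorted by sourceStart; forward-strand contigs only (assumptions of this sketch). */
    static List<Mapping> stitch(List<Mapping> sortedContigs, int chromosomeLength) {
        List<Mapping> stitched = new ArrayList<>();
        int next = 1; // first chromosome position not yet covered
        for (Mapping m : sortedContigs) {
            if (m.sourceEnd < next) {
                continue; // entirely covered by earlier contigs
            }
            if (m.sourceStart > next) {
                // uncovered stretch before this contig becomes an explicit gap
                stitched.add(Mapping.gap(next, m.sourceStart - 1));
            }
            // skip the bases of this contig that an earlier contig already covers
            int trim = Math.max(0, next - m.sourceStart);
            stitched.add(new Mapping(m.sourceStart + trim, m.sourceEnd, m.contig, m.targetStart + trim));
            next = m.sourceEnd + 1;
        }
        if (next <= chromosomeLength) {
            stitched.add(Mapping.gap(next, chromosomeLength)); // trailing gap
        }
        return stitched;
    }
}
```

Representing a gap as a Mapping with a null contig is just one of the options raised in the quoted message; a dedicated gap sequence type would slot into the same loop.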
--
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From trevor.paterson at roslin.ed.ac.uk Thu May 13 08:44:09 2010
From: trevor.paterson at roslin.ed.ac.uk (PATERSON Trevor)
Date: Thu, 13 May 2010 13:44:09 +0100
Subject: [Biojava-dev] EnsemblApi use case for DNASequences
In-Reply-To:
References: <428F360A0BA69249B73A710B58435CFE04DD42E1F6@EXSEEDMD.is.ed.ac.uk>
Message-ID: <428F360A0BA69249B73A710B58435CFE04DD42E1FA@EXSEEDMD.is.ed.ac.uk>

That all sounds very useful - I'll get started now :)

> -----Original Message-----
> From: Andy Yates [mailto:andyyatz at gmail.com] On Behalf Of Andy Yates
> Sent: 13 May 2010 13:38
> To: PATERSON Trevor
> Cc: biojava-dev at lists.open-bio.org
> Subject: Re: [Biojava-dev] EnsemblApi use case for DNASequences
>
> I have to say that, from working with Ensembl for the past 2
> years, hearing this is what it does to store sequence scares
> **** out of me; you've really hit onto the hardest part of
> the schema there.
>
> As you said at the end of your email, the best way to
> accomplish this is by creating a SequenceProxyReader which
> can do all this logic and lets you work with the "right"
> objects and not have to re-implement that code. Now this
> leaves a few alternatives for how you can represent this in
> memory. We already have a 2bit implementation (will be called
> TwoBitSequenceReader) for storing very large pieces of
> Sequence but that only has support for ACGT and no support
> for gaps or Ns. This could be extended to bring in support
> for these as features, or you could materialise that sequence
> and then push it into another Sequence object I have been
> working with (not checked in at the moment) which lets you join
> Sequences together. This, combined with a Sequence which
> returns Compounds of a particular type, e.g. Ns for any given
> length, would let you represent massive amounts of Sequence in
> a very small amount of space. All of these updates will be in
> place soon but I cannot say exactly when.
>
> The other option would be to cache chunks of the DNA indexed
> by the seq_region_id. Pushing this into an LRU cache with soft
> references (so they'll be cleaned up when you'd run out of
> memory) could be a good way to go.
>
> Either way the simple way really isn't the way to go IMHO; on
> the flip side it would get you to a prototype quicker. Of
> course this depends on what type of code you are writing. If
> it is prototype code then great, or if it's what normally
> happens in Bioinformatics (we claim it's a prototype but in
> reality it's the real deal) then go with the proper solution.
>
> Andy

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

From biopython at maubp.freeserve.co.uk Thu May 13 09:20:51 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 13 May 2010 14:20:51 +0100
Subject: [Biojava-dev] EnsemblApi use case for DNASequences
In-Reply-To:
References: <428F360A0BA69249B73A710B58435CFE04DD42E1F6@EXSEEDMD.is.ed.ac.uk>
Message-ID:

On Thu, May 13, 2010 at 1:38 PM, Andy Yates wrote:
>
> As you said at the end of your email, the best way to accomplish this
> is by creating a SequenceProxyReader which can do all this logic
> and lets you work with the "right" objects and not have to re-implement
> that code. Now this leaves a few alternatives for how you can represent
> this in memory. We already have a 2bit implementation (will be called
> TwoBitSequenceReader) for storing very large pieces of Sequence
> but that only has support for ACGT and no support for gaps or Ns.
> This could be extended to bring in support for these as features, or
> you could materialise that sequence and then push it into another
> Sequence object I have been working with (not checked in at the moment)
> which lets you join Sequences together. This, combined with a
> Sequence which returns Compounds of a particular type, e.g. Ns for
> any given length, would let you represent massive amounts of
> Sequence in a very small amount of space. All of these updates
> will be in place soon but I cannot say exactly when.

Does BioJava have a 4bit sequence implementation for ambiguous DNA (or RNA)?
That would let you treat N as 1111 (all four bits set) and a gap as 0000 (none of the bits set).

Peter

From ayates at ebi.ac.uk Thu May 13 09:31:15 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Thu, 13 May 2010 14:31:15 +0100
Subject: [Biojava-dev] EnsemblApi use case for DNASequences
In-Reply-To:
References: <428F360A0BA69249B73A710B58435CFE04DD42E1F6@EXSEEDMD.is.ed.ac.uk>
Message-ID: <460E8BC6-5EBF-444E-9B01-0015C4E9F292@ebi.ac.uk>

Not at the moment. The 2bit implementation has a worker and has been built with the idea that it _could_ be extended to, as you say, a 4bit implementation. If it were written I wouldn't restrict it to DNA or RNA; it could serve any CompoundSet with 16 or fewer compounds.

Andy

On 13 May 2010, at 14:20, Peter wrote:

> Does BioJava have a 4bit sequence implementation for ambiguous
> DNA (or RNA)? That would let you treat N as 1111 (all four bits set)
> and a gap as 0000 (none of the bits set).
>
> Peter

--
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From HWillis at scripps.edu Thu May 13 09:35:15 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Thu, 13 May 2010 09:35:15 -0400
Subject: [Biojava-dev] EnsemblApi use case for DNASequences
In-Reply-To: <428F360A0BA69249B73A710B58435CFE04DD42E1FA@EXSEEDMD.is.ed.ac.uk>
References: <428F360A0BA69249B73A710B58435CFE04DD42E1F6@EXSEEDMD.is.ed.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E1FA@EXSEEDMD.is.ed.ac.uk>
Message-ID: <607BBC52-46BF-4DB2-BF2B-57B76B487116@scripps.edu>

Trevor

I agree that going the ProxySequenceReader route sounds like the best approach. If you look at SequenceFileProxyLoader in the source, it is a simple example I put in place for loading large FASTA files; the main method in FastaReader shows how it is used. I parse the file to get the accession id and the corresponding sequence offset in the file. This creates a sequence proxy that has all the details needed to get the actual sequence data if a call is ever made. This is a lazy load example where the sequence only gets pulled into memory when a request is made for actual data. This way you can load very large FASTA files without a memory penalty.

The other use case, as you describe, is to never load the full sequence into memory. You can come up with your own optimization steps where you keep various sub-sequences in memory as they are requested and clear them from the cache based on a buffer size or the length of time since that region was last accessed. It all depends on your use case. If you are going to iterate through the sequence one position at a time, then you would implement a read buffer for sequence data to avoid the overhead of multiple calls to the database.
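
A minimal sketch of that buffering idea, assuming fixed-size chunks and a simple "least recently used" eviction rule, might look like the following. ChunkedSequenceCache and its loader callback are hypothetical names used only for illustration, not BioJava classes; the loader stands in for whatever database or web service call returns the bases of one chunk. The soft references mentioned earlier in the thread are deliberately left out to keep the sketch short.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical helper: lazily loads fixed-size chunks of a long sequence and keeps only the most recent ones. */
class ChunkedSequenceCache {
    private final int chunkSize;
    private final IntFunction<String> loader; // chunk index (0-based) -> bases of that chunk
    private final Map<Integer, String> chunks;
    int loads = 0; // number of loader calls, kept only to show when the backing store is hit

    ChunkedSequenceCache(int chunkSize, final int maxChunks, IntFunction<String> loader) {
        this.chunkSize = chunkSize;
        this.loader = loader;
        // An access-ordered LinkedHashMap drops the least recently used chunk once the cache is full.
        this.chunks = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > maxChunks;
            }
        };
    }

    /** Returns the bases for 1-based inclusive coordinates start..end, loading chunks only when needed. */
    String subSequence(int start, int end) {
        StringBuilder sb = new StringBuilder(end - start + 1);
        int pos = start - 1; // 0-based position of the next base to copy
        while (pos < end) {
            int index = pos / chunkSize;
            String chunk = chunks.get(index);
            if (chunk == null) {
                chunk = loader.apply(index); // the only place the backing store is touched
                loads++;
                chunks.put(index, chunk);
            }
            int offset = pos % chunkSize;
            int take = Math.min(chunk.length() - offset, end - pos);
            if (take <= 0) {
                throw new IndexOutOfBoundsException("position " + (pos + 1) + " is past the end of the sequence");
            }
            sb.append(chunk, offset, offset + take);
            pos += take;
        }
        return sb.toString();
    }
}
```

A reader built this way answers repeated requests for the same region from memory and only goes back to the backing store for chunks that have been evicted or never loaded; the chunk size and the number of chunks kept would need tuning for the access pattern.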
What we haven't had a chance to do yet is put examples in place that pull down features for a sequence. The following is what we have been kicking around as a design concept, but there are lots of details to work out. You should be able to create sequences with the following examples.

    DNASequence ncbiDNASequence = new DNASequence(new NCBISequenceReader("NC_000012.9"));

with the ability to do this with no memory penalty or load time problem.

    new DNASequence(new NCBISequenceReader("NC_000013.9"));
    new DNASequence(new NCBISequenceReader("NC_000014.9"));
    new DNASequence(new NCBISequenceReader("NC_000015.9"));
    new DNASequence(new NCBISequenceReader("NC_000016.9"));

This would also be a use case

    ProteinSequence rxraProteinSequence = new ProteinSequence(new UniprotSequenceReader("RXRA_HUMAN"));

The calls to the SequenceReader would know how to query the necessary details about the sequence as it relates to features etc. In the NCBISequenceReader example, the reader would set all the GeneSequences, TranscriptSequences, CDSSequences etc. on the DNASequence, delaying the loading of sequence data as long as possible. If I wanted to work with promoter regions I could get the corresponding GeneSequence and then retrieve 5000 nucleotides in the 5' and 3' flanking regions of the gene, or grab all the sequence data located in the intron regions of the gene. Each request for sequence data would send the request to NCBI via the appropriate API (web service, REST, URL, toolkit) or, in your example, to the database.

In the UniprotSequenceReader example I would want the following call to work

    GeneSequence rxraGeneSequence = rxraProteinSequence.getGeneSequence();

where UniprotSequenceReader, based on knowing everything about RXRA_HUMAN, would create a GeneSequence with a call to GeneSequence(new NCBISequenceReader(rxra_gene_index)) and return it.

In the biojava3-genome package you can take a look at GeneFeatureHelper, which is the code I am using to take GFF/GTF/GFF3 features and create GeneSequences.
Any input or suggestions on method/class names etc. are welcome. Andy and I bounced around how to model the relationships and would benefit from input from others.

For your IBATIS example, would the following work?

    public class SequenceCreator {

        public SequenceCreator() {
        }

        public DNASequence getDNASequence(SequenceReaderProxy sequenceReaderProxy) {
            return new DNASequence(sequenceReaderProxy);
        }
    }

Thanks

Scooter

On May 13, 2010, at 8:44 AM, PATERSON Trevor wrote:

> That all sounds very useful - I'll get started now :)

From thomas.a.down at googlemail.com Thu May 13 09:43:04 2010
From: thomas.a.down at googlemail.com (Thomas Down)
Date: Thu, 13 May 2010 14:43:04 +0100
Subject: [Biojava-dev] EnsemblApi use case for DNASequences
In-Reply-To:
References: <428F360A0BA69249B73A710B58435CFE04DD42E1F6@EXSEEDMD.is.ed.ac.uk>
Message-ID:

On Thu, May 13, 2010 at 1:38 PM, Andy Yates wrote:

> We already have a 2bit implementation (will be called TwoBitSequenceReader)
> for storing very large pieces of Sequence but that only has support for ACGT
> and no support for gaps or Ns.

If you haven't already, I'd recommend taking a look at how the UCSC .2bit file format handles Ns. Quite elegant, and seems to cover most genomic use cases very efficiently. I've got a BioJava (1.x, I'm afraid) SequenceDB implementation that's backed by a .2bit file (in a MappedByteBuffer) if you're curious.

Thomas.

From ayates at ebi.ac.uk Thu May 13 09:48:02 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Thu, 13 May 2010 14:48:02 +0100
Subject: [Biojava-dev] EnsemblApi use case for DNASequences
In-Reply-To:
References: <428F360A0BA69249B73A710B58435CFE04DD42E1F6@EXSEEDMD.is.ed.ac.uk>
Message-ID: <29FD44FF-0880-4F4E-A330-3182A88ACDF3@ebi.ac.uk>

I did like the UCSC .2bit format, but this was originally written because of a use case where someone has stored DNA in a DB using 2bit encoding but not as a .2bit file. The file format does handle gaps and Ns very well, but that's only because it stores where the runs of those features are. The 2bit sequence reader does not do that at the moment; however, it was developed with that kind of extension in mind. When it is unable to translate a String into a Compound it triggers a method which normally will throw an exception.
It would be possible to override this and then provide identical functionality to .2bit, but the process of doing this scared me :). Plus there are other things to be getting on with, and I'm happy to leave it in a state which means it could be extended.

Andy

On 13 May 2010, at 14:43, Thomas Down wrote:

> On Thu, May 13, 2010 at 1:38 PM, Andy Yates wrote:
> We already have a 2bit implementation (will be called TwoBitSequenceReader) for storing very large pieces of Sequence but that only has support for ACGT and no support for gaps or Ns.
>
> If you haven't already, I'd recommend taking a look at how the UCSC .2bit file format handles Ns. Quite elegant, and seems to cover most genomic use cases very efficiently. I've got a BioJava (1.x, I'm afraid) SequenceDB implementation that's backed by a .2bit file (in a MappedByteBuffer) if you're curious.
>
> Thomas.

--
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From aradwen at gmail.com Fri May 14 06:35:19 2010
From: aradwen at gmail.com (Radwen Aniba)
Date: Fri, 14 May 2010 12:35:19 +0200
Subject: [Biojava-dev] Alignment GUI and Netbeans
Message-ID:

Hello everybody

I've just taken a look at
http://www.biojava.org/wiki/BioJava:CookBook:Interfaces:Alignments_II
and I'm calling on your experience as developers.

I would like to adapt this code to put AlignmentPanel_II inside a Panel using NetBeans.

Could someone tell me how to proceed? Just the first steps (illustrations would be a plus).

Thanks

Rad

--

From jolyon.holdstock at ogt.co.uk Fri May 14 07:02:05 2010
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Fri, 14 May 2010 12:02:05 +0100
Subject: [Biojava-dev] Alignment GUI and Netbeans[Scanned]
In-Reply-To:
References:
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3F02F28CE9@EUCLID.internal.ogtip.com>

Hi Rad,

The TranslatedSequencePanel class extends JComponent and so does the JPanel class.
Whatever you want to do with the JPanel can be done directly with the TranslatedSequencePanel.

So you can create a TranslatedSequencePanel, add whatever components you want and then use it in your application the same way as you would use a JPanel.

Is this what you need to know?

Jolyon

This email has been scanned by Oxford Gene Technology Security Systems.

From aradwen at gmail.com Fri May 14 07:06:22 2010
From: aradwen at gmail.com (Radwen Aniba)
Date: Fri, 14 May 2010 13:06:22 +0200
Subject: [Biojava-dev] Alignment GUI and Netbeans[Scanned]
In-Reply-To: <588D0DD225D05746B5D8CAE1BE971F3F02F28CE9@EUCLID.internal.ogtip.com>
References: <588D0DD225D05746B5D8CAE1BE971F3F02F28CE9@EUCLID.internal.ogtip.com>
Message-ID:

What I want to do is render an MSA (just like the example given in the link), but instead of creating a new panel I just want my alignment drawn in a JPanel. I don't yet see how I can modify this code.

2010/5/14 Jolyon Holdstock

> Hi Rad,
>
> The TranslatedSequencePanel class extends JComponent and so does the
> JPanel class.
>
> Whatever you want to do with the JPanel can be done directly with the
> TranslatedSequencePanel.
>
> So you can create a TranslatedSequencePanel, add whatever components you
> want and then use it in your application the same way as you would use a
> JPanel.
>
> Is this what you need to know?
>
> Jolyon

--
R. ANIBA
Bioinformatics PhD
Laboratoire de Bioinformatique et Génomique Intégratives,
Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC),
1 rue Laurent Fries, 67404 Illkirch, France.
http://www-igbmc.u-strasbg.fr
http://alnitak.u-strasbg.fr/~aniba/alexsys

From jolyon.holdstock at ogt.co.uk Fri May 14 09:35:20 2010
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Fri, 14 May 2010 14:35:20 +0100
Subject: [Biojava-dev] Alignment GUI and Netbeans[Scanned]
In-Reply-To:
References: <588D0DD225D05746B5D8CAE1BE971F3F02F28CE9@EUCLID.internal.ogtip.com>
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3F02F28D4B@EUCLID.internal.ogtip.com>

Hi Rad,

If you draw the alignment in a TranslatedSequencePanel, it can be treated like a JPanel.

What do you want to do with the JPanel?

Cheers,

J

From: Radwen Aniba [mailto:aradwen at gmail.com]
Sent: 14 May 2010 12:06
To: Jolyon Holdstock
Cc: biojava-dev at lists.open-bio.org
Subject: Re: [Biojava-dev] Alignment GUI and Netbeans[Scanned]

What I want to do is render an MSA (just like the example given in the link), but instead of creating a new panel I just want my alignment drawn in a JPanel. I don't yet see how I can modify this code.

--
R. ANIBA
Bioinformatics PhD
Laboratoire de Bioinformatique et Génomique Intégratives,
Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC),
1 rue Laurent Fries, 67404 Illkirch, France.
http://www-igbmc.u-strasbg.fr
http://alnitak.u-strasbg.fr/~aniba/alexsys

This email has been scanned by Oxford Gene Technology Security Systems.

From trevor.paterson at roslin.ed.ac.uk Mon May 17 08:15:54 2010
From: trevor.paterson at roslin.ed.ac.uk (PATERSON Trevor)
Date: Mon, 17 May 2010 13:15:54 +0100
Subject: [Biojava-dev] Errors versus Exceptions
Message-ID: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk>

Resending because of bad headers.

________________________________

From: PATERSON Trevor
Sent: 17 May 2010 13:02
To: biojava-dev at lists.open-bio.org
Subject: Errors versus Exceptions

Could I ask a quick question about why BJ3 seems to use Errors rather than Exceptions? Maybe this is already documented somewhere on the wiki for BJ3 or 1, and you can point me there.

Obviously, for those of us loading data from remote databases, we need a way to catch connection, SQL and data-mapping errors etc. We usually throw DataAccessExceptions when this happens, which wrap any java.net, iBATIS and SQL exceptions.

It is difficult for us to plug our lazy loading in over your signatures in the BJ3 sequence readers, as these don't throw exceptions, because you seem to use Errors throughout, which don't need to be declared. In fact, I can't actually see examples of you catching and responding to thrown errors.

For example, the setContents routine of the Sequence Readers throws a CompoundNotFoundError if there is no mapping for a base, but I can't see what is done with this.

According to the Java Lang Spec, errors are used for "serious problems that a reasonable application should not try to catch", but I would have thought finding an unrecognized base in a sequence was just the sort of thing that should be thrown and caught and acted on.
As Error throwables are not reported in the signature, developers don't have any clue (or requirement) that they should be catching and dealing with errors, which seems a bit dangerous to me... If Exceptions were thrown rather than Errors, that would force the developer to handle runtime errors.

Cheers Trevor

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

From HWillis at scripps.edu Mon May 17 08:51:28 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Mon, 17 May 2010 08:51:28 -0400
Subject: [Biojava-dev] Errors versus Exceptions
In-Reply-To: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk>
References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk>
Message-ID: <21563263-02F3-494E-BD37-DEA988CCC059@scripps.edu>

Trevor

I think this was discussed only briefly at the Hackathon and needs more attention. I agree we should make better use of exceptions versus errors.

Scooter

On May 17, 2010, at 8:15 AM, PATERSON Trevor wrote:

> resending cos of bad headers
>
> ________________________________
>
> From: PATERSON Trevor
> Sent: 17 May 2010 13:02
> To: biojava-dev at lists.open-bio.org
> Subject: Errors versus Exceptions
>
>
> Could I ask a quick question about why BJ3 seems to use Errors rather than Exceptions
> - maybe this is already documented somewhere on the wiki for BJ3 or 1 - and you can point me there.
>
> Obviously for us dataloading from remote databases we need to have a method to catch connection,sql, datamapping errors etc.
> We ususally throw DataAccessExceptions when this happens, which wrap any java.net, Ibatis and SQL exceptions.
>
> It is difficult for us to plug in our lazyload over your signatures in the BJ3 sequence readers, as these don't throw exceptions,
> because you seem to use Errors throughout - which dont need to be declared. Infact I cant actually see examples of you catching
> and responding to thrown errors.
> > for example the setContents routine of the Sequence Readers throws a CompoundNotFoundError if there is no mapping for a base - > but I can't see what is done with this. > > According to the Java Lang Spec, errors are used for "serious problems that a reasonable application > should not try to catch" - but I would have thought finding an unrecognized base in a sequence was > just the sort of thing that should be thrown and caught and acted on. As error throwables are not reported > in the signature - developers don't have any clue ( or requirement) that they should be catching and dealing with errors - which > seems a bit dangerous to me... If Exceptions were thrownrather than Errors that would force the developer to handle runtime errors > > Cheers Trevor > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From ayates at ebi.ac.uk Mon May 17 09:24:41 2010 From: ayates at ebi.ac.uk (Andy Yates) Date: Mon, 17 May 2010 14:24:41 +0100 Subject: [Biojava-dev] Errors versus Exceptions In-Reply-To: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk> References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk> Message-ID: Is the problem in the code the fact that errors are used or that they are not declared on the method signatures? On a more design note I do not like checked exceptions for two reasons. The first is they seem to make more junior developers catch Exception and swallow it. The second is whenever I've been Java coding in the past whenever a checked exception gets thrown (say IOException because of incorrect file permissions) I cannot deal with it which in the past has meant I either forward on the problem or re-throw in an unchecked exception. 
That said as Scooter mentioned the exception system was rushed out in the hackathon and did not have much work put into it. Andy On 17 May 2010, at 13:15, PATERSON Trevor wrote: > resending cos of bad headers > > ________________________________ > > From: PATERSON Trevor > Sent: 17 May 2010 13:02 > To: biojava-dev at lists.open-bio.org > Subject: Errors versus Exceptions > > > Could I ask a quick question about why BJ3 seems to use Errors rather than Exceptions > - maybe this is already documented somewhere on the wiki for BJ3 or 1 - and you can point me there. > > Obviously for us dataloading from remote databases we need to have a method to catch connection,sql, datamapping errors etc. > We ususally throw DataAccessExceptions when this happens, which wrap any java.net, Ibatis and SQL exceptions. > > It is difficult for us to plug in our lazyload over your signatures in the BJ3 sequence readers, as these don't throw exceptions, > because you seem to use Errors throughout - which dont need to be declared. Infact I cant actually see examples of you catching > and responding to thrown errors. > > for example the setContents routine of the Sequence Readers throws a CompoundNotFoundError if there is no mapping for a base - > but I can't see what is done with this. > > According to the Java Lang Spec, errors are used for "serious problems that a reasonable application > should not try to catch" - but I would have thought finding an unrecognized base in a sequence was > just the sort of thing that should be thrown and caught and acted on. As error throwables are not reported > in the signature - developers don't have any clue ( or requirement) that they should be catching and dealing with errors - which > seems a bit dangerous to me... If Exceptions were thrownrather than Errors that would force the developer to handle runtime errors > > Cheers Trevor > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. 
> > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From HWillis at scripps.edu Mon May 17 09:37:32 2010 From: HWillis at scripps.edu (Scooter Willis) Date: Mon, 17 May 2010 09:37:32 -0400 Subject: [Biojava-dev] Errors versus Exceptions In-Reply-To: References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk> Message-ID: <6ABE7208-CE97-482B-BD03-1425AF594E14@scripps.edu> Andy I think it is reasonable in the creation of sequences to have invalid chars passed in where we have probably covered the obvious cases - . etc. We recently came across N for DNA and X for aligned output from Blast. I know in our lab they started using different chars for PTM's so it will be tough to predict all cases. The developer would have two options at this point based on the error/exception they change the sequence data to a correct format or do their own custom compound mapping. Since that is more of a user input error and outside our control we should probably throw the appropriate exception and let the user deal with it. Scooter On May 17, 2010, at 9:24 AM, Andy Yates wrote: > Is the problem in the code the fact that errors are used or that they are not declared on the method signatures? > > On a more design note I do not like checked exceptions for two reasons. The first is they seem to make more junior developers catch Exception and swallow it. The second is whenever I've been Java coding in the past whenever a checked exception gets thrown (say IOException because of incorrect file permissions) I cannot deal with it which in the past has meant I either forward on the problem or re-throw in an unchecked exception. 
> > That said as Scooter mentioned the exception system was rushed out in the hackathon and did not have much work put into it. > > Andy > > On 17 May 2010, at 13:15, PATERSON Trevor wrote: > >> resending cos of bad headers >> >> ________________________________ >> >> From: PATERSON Trevor >> Sent: 17 May 2010 13:02 >> To: biojava-dev at lists.open-bio.org >> Subject: Errors versus Exceptions >> >> >> Could I ask a quick question about why BJ3 seems to use Errors rather than Exceptions >> - maybe this is already documented somewhere on the wiki for BJ3 or 1 - and you can point me there. >> >> Obviously for us dataloading from remote databases we need to have a method to catch connection,sql, datamapping errors etc. >> We ususally throw DataAccessExceptions when this happens, which wrap any java.net, Ibatis and SQL exceptions. >> >> It is difficult for us to plug in our lazyload over your signatures in the BJ3 sequence readers, as these don't throw exceptions, >> because you seem to use Errors throughout - which dont need to be declared. Infact I cant actually see examples of you catching >> and responding to thrown errors. >> >> for example the setContents routine of the Sequence Readers throws a CompoundNotFoundError if there is no mapping for a base - >> but I can't see what is done with this. >> >> According to the Java Lang Spec, errors are used for "serious problems that a reasonable application >> should not try to catch" - but I would have thought finding an unrecognized base in a sequence was >> just the sort of thing that should be thrown and caught and acted on. As error throwables are not reported >> in the signature - developers don't have any clue ( or requirement) that they should be catching and dealing with errors - which >> seems a bit dangerous to me... 
If Exceptions were thrown rather than Errors that would force the developer to handle runtime errors
>>
>> Cheers Trevor
>>
>> --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>
> --
> Andrew Yates Ensembl Genomes Engineer
> EMBL-EBI Tel: +44-(0)1223-492538
> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/

From ayates at ebi.ac.uk Mon May 17 09:49:57 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Mon, 17 May 2010 14:49:57 +0100
Subject: [Biojava-dev] Errors versus Exceptions
In-Reply-To: <6ABE7208-CE97-482B-BD03-1425AF594E14@scripps.edu>
References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk> <6ABE7208-CE97-482B-BD03-1425AF594E14@scripps.edu>
Message-ID: <6CE441D2-1CC4-4838-B187-540E09439F28@ebi.ac.uk>

I think this brings us back to one of the initial aims of BioJava3, which was to make it easier for the entry-level programmer as well as the more experienced programmer. Much like having one's cake & eating it too :). Assuming the following signature:

setContents() throws UnknownCompoundException;

which of the following would give us the best API?

1). UnknownCompoundException extends Exception

This has the knock-on effect of turning our Sequence constructors into:

DNASequence s;
try {
  s = new DNASequence("ATGC");
}
catch(UnknownCompoundException e) {
  //....
}

There is of course the option of swallowing/reporting exceptions under certain circumstances (say this constructor), but that results in inconsistent API behaviour.

2). UnknownCompoundException extends RuntimeException

This has the problem that anything that *needs* to deal with the exception is not forced to.

I don't know what the right answer is here, so I'm happy to be led by whoever feels the strongest about it.

Andy

On 17 May 2010, at 14:37, Scooter Willis wrote:

> Andy
>
> I think it is reasonable in the creation of sequences to have invalid chars passed in where we have probably covered the obvious cases - . etc. We recently came across N for DNA and X for aligned output from Blast. I know in our lab they started using different chars for PTM's so it will be tough to predict all cases. The developer would have two options at this point based on the error/exception they change the sequence data to a correct format or do their own custom compound mapping. Since that is more of a user input error and outside our control we should probably throw the appropriate exception and let the user deal with it.
>
> Scooter
>
>
>
> On May 17, 2010, at 9:24 AM, Andy Yates wrote:
>
>> Is the problem in the code the fact that errors are used or that they are not declared on the method signatures?
>>
>> On a more design note I do not like checked exceptions for two reasons. The first is they seem to make more junior developers catch Exception and swallow it. The second is whenever I've been Java coding in the past whenever a checked exception gets thrown (say IOException because of incorrect file permissions) I cannot deal with it which in the past has meant I either forward on the problem or re-throw in an unchecked exception.
>>
>> That said as Scooter mentioned the exception system was rushed out in the hackathon and did not have much work put into it.
>> >> Andy >> >> On 17 May 2010, at 13:15, PATERSON Trevor wrote: >> >>> resending cos of bad headers >>> >>> ________________________________ >>> >>> From: PATERSON Trevor >>> Sent: 17 May 2010 13:02 >>> To: biojava-dev at lists.open-bio.org >>> Subject: Errors versus Exceptions >>> >>> >>> Could I ask a quick question about why BJ3 seems to use Errors rather than Exceptions >>> - maybe this is already documented somewhere on the wiki for BJ3 or 1 - and you can point me there. >>> >>> Obviously for us dataloading from remote databases we need to have a method to catch connection,sql, datamapping errors etc. >>> We ususally throw DataAccessExceptions when this happens, which wrap any java.net, Ibatis and SQL exceptions. >>> >>> It is difficult for us to plug in our lazyload over your signatures in the BJ3 sequence readers, as these don't throw exceptions, >>> because you seem to use Errors throughout - which dont need to be declared. Infact I cant actually see examples of you catching >>> and responding to thrown errors. >>> >>> for example the setContents routine of the Sequence Readers throws a CompoundNotFoundError if there is no mapping for a base - >>> but I can't see what is done with this. >>> >>> According to the Java Lang Spec, errors are used for "serious problems that a reasonable application >>> should not try to catch" - but I would have thought finding an unrecognized base in a sequence was >>> just the sort of thing that should be thrown and caught and acted on. As error throwables are not reported >>> in the signature - developers don't have any clue ( or requirement) that they should be catching and dealing with errors - which >>> seems a bit dangerous to me... If Exceptions were thrownrather than Errors that would force the developer to handle runtime errors >>> >>> Cheers Trevor >>> >>> -- >>> The University of Edinburgh is a charitable body, registered in >>> Scotland, with registration number SC005336. 
--
Andrew Yates Ensembl Genomes Engineer
EMBL-EBI Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/

From trevor.paterson at roslin.ed.ac.uk Mon May 17 09:58:08 2010
From: trevor.paterson at roslin.ed.ac.uk (PATERSON Trevor)
Date: Mon, 17 May 2010 14:58:08 +0100
Subject: [Biojava-dev] Errors versus Exceptions
In-Reply-To:
References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk>
Message-ID: <428F360A0BA69249B73A710B58435CFE04DD42E20C@EXSEEDMD.is.ed.ac.uk>

> Is the problem in the code the fact that errors are used or
> that they are not declared on the method signatures?

Declaring them would help, but even if you declare Errors in the signature, there is nothing to force you to catch them. So there is nothing to alert you to the chance that they may be thrown when you are writing code.

public class Demo {
    public static void main(String[] args) throws Exception {
        Demo.demo();
        System.out.println("never reached");
    }

    public static void demo() throws Error {
        throw new Error("kills the runtime");
    }
}

This compiles OK with or without declaring the throwable, but throws a runtime error.

So IMHO throwing Errors is not very useful to the developer: they can use them, IF they know the code well enough to know they might be thrown.

If Exceptions were used

1. they would HAVE to be dealt with - if a developer just swallows something bad, then that's their responsibility for shitty code ;)
2. like Errors they would be extendable, so real-case dataloaders could use their own exception types and decide how to deal with or ignore them

I guess in my background of coding for data access over the wire with JDBC and web services, you need to be aware of the real probability of connection and request failures, so explicitly using Exceptions forces the developer to code defensively.

> > On a more design note I do not like checked exceptions for > two reasons. The first is they seem to make more junior > developers catch Exception and swallow it. The second is > whenever I've been Java coding in the past whenever a checked > exception gets thrown (say IOException because of incorrect > file permissions) I cannot deal with it which in the past has > meant I either forward on the problem or re-throw in an > unchecked exception. > > That said as Scooter mentioned the exception system was > rushed out in the hackathon and did not have much work put into it. > > Andy > > On 17 May 2010, at 13:15, PATERSON Trevor wrote: > > > resending cos of bad headers > > > > ________________________________ > > > > From: PATERSON Trevor > > Sent: 17 May 2010 13:02 > > To: biojava-dev at lists.open-bio.org > > Subject: Errors versus Exceptions > > > > > > Could I ask a quick question about why BJ3 seems to use > Errors rather > > than Exceptions > > - maybe this is already documented somewhere on the wiki > for BJ3 or 1 - and you can point me there. > > > > Obviously for us dataloading from remote databases we need > to have a method to catch connection,sql, datamapping errors etc. > > We ususally throw DataAccessExceptions when this happens, > which wrap any java.net, Ibatis and SQL exceptions.
> > > > It is difficult for us to plug in our lazyload over your > signatures in > > the BJ3 sequence readers, as these don't throw exceptions, > because you > > seem to use Errors throughout - which dont need to be > declared. Infact I cant actually see examples of you catching > and responding to thrown errors. > > > > for example the setContents routine of the Sequence Readers > throws a > > CompoundNotFoundError if there is no mapping for a base - > but I can't see what is done with this. > > > > According to the Java Lang Spec, errors are used for > "serious problems > > that a reasonable application should not try to catch" - > but I would > > have thought finding an unrecognized base in a sequence was > just the > > sort of thing that should be thrown and caught and acted > on. As error > > throwables are not reported in the signature - developers > don't have > > any clue ( or requirement) that they should be catching and dealing > > with errors - which seems a bit dangerous to me... If > Exceptions were > > thrownrather than Errors that would force the developer to handle > > runtime errors > > > > Cheers Trevor > > > > -- > > The University of Edinburgh is a charitable body, registered in > > Scotland, with registration number SC005336. > > > > > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > -- > Andrew Yates Ensembl Genomes Engineer > EMBL-EBI Tel: +44-(0)1223-492538 > Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 > Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ > > > > > -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. 
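
The trade-off Trevor and Andy describe can be put side by side in a small sketch. This is only an illustration under stated assumptions: CompoundCheckSketch, its two methods, UncheckedUnknownCompoundException and the four-letter alphabet are all hypothetical, and only the name UnknownCompoundException is borrowed from Andy's proposal earlier in the thread. None of this is BioJava code.

```java
// Hypothetical sketch: none of these classes are BioJava API.

/** Option 1 from the thread: a checked exception the caller must handle. */
class UnknownCompoundException extends Exception {
    UnknownCompoundException(String message) {
        super(message);
    }
}

/** Option 2 from the thread: an unchecked exception the caller may ignore. */
class UncheckedUnknownCompoundException extends RuntimeException {
    UncheckedUnknownCompoundException(String message) {
        super(message);
    }
}

class CompoundCheckSketch {
    // Assumed alphabet, for illustration only; real compound sets are richer.
    private static final String DNA = "ACGT";

    /** Checked variant: the compiler enforces the throws clause on callers. */
    static String setContentsChecked(String sequence) throws UnknownCompoundException {
        for (int i = 0; i < sequence.length(); i++) {
            char base = sequence.charAt(i);
            if (DNA.indexOf(base) < 0) {
                throw new UnknownCompoundException(
                        "Unknown compound '" + base + "' at position " + i);
            }
        }
        return sequence;
    }

    /** Unchecked variant: same check, but callers are not forced to catch. */
    static String setContentsUnchecked(String sequence) {
        try {
            return setContentsChecked(sequence);
        } catch (UnknownCompoundException e) {
            throw new UncheckedUnknownCompoundException(e.getMessage());
        }
    }

    public static void main(String[] args) {
        try {
            setContentsChecked("ACGN");
        } catch (UnknownCompoundException e) {
            // Recoverable case: e.g. ask the user for another sequence.
            System.out.println("Recovered: " + e.getMessage());
        }
        // Compiles without any handler; bad input would only fail at run time.
        System.out.println(setContentsUnchecked("ACGT"));
    }
}
```

With the checked variant the compiler rejects any caller that neither catches nor declares the exception; with the unchecked variant the same bad input only surfaces at run time, which is the behaviour Trevor objects to for Errors.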
From HWillis at scripps.edu Mon May 17 10:09:09 2010 From: HWillis at scripps.edu (Scooter Willis) Date: Mon, 17 May 2010 10:09:09 -0400 Subject: [Biojava-dev] Errors versus Exceptions In-Reply-To: <6CE441D2-1CC4-4838-B187-540E09439F28@ebi.ac.uk> References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk> <6ABE7208-CE97-482B-BD03-1425AF594E14@scripps.edu> <6CE441D2-1CC4-4838-B187-540E09439F28@ebi.ac.uk> Message-ID: Andy I don't have a good solution either. On this one I think we are stuck and probably need to "error" on the side of throwing an exception where the developer needs to deal with it. Typically I will declare methods "throws Exception" that I can't deal with exceptions and want to propagate the error without forcing try catch everywhere only to throw the exception again or exit. It would be nice through annotations to alter the behavior of Error versus Exception but beyond our reach. A quick google search finds this as a discussion point but I didn't see anything that looked like it would solve our problem. Only spent 5 minutes to see if others had provided guidance/solution. Scooter On May 17, 2010, at 9:49 AM, Andy Yates wrote: > I think this opens us up to one of the initial points about BioJava3 was to make it easier for the entry level programmer as well as to the more experienced programmer. Much like having one's cake & eating it too :). Assuming the following signature: > > setContents() throws UnknownCompoundException; > > What would give us the best API. Where > > 1). UnknownCompoundException extends Exception > > This has the knock on effect of turning our Sequence constructors into: > > DNASequence s; > try { > s = new DNASequence("ATGC"); > } > catch(UnknownCompoundException e) { > //.... > } > > There is of course the option of swallowing/reporting exceptions under certain circumstances (say this constructor) but that results in inconsistent API behaviour. > > 2). 
UnknownCompoundException extends RuntimeException
>
> Has the problem that should anything *need* to deal with the exception it doesn't have to.
>
> I don't know what the right answer is here so I'm happy to be led by whoever feels the strongest about it
>
> Andy
>
> On 17 May 2010, at 14:37, Scooter Willis wrote:
>
>> Andy
>>
>> I think it is reasonable in the creation of sequences to have invalid chars passed in where we have probably covered the obvious cases - . etc. We recently came across N for DNA and X for aligned output from Blast. I know in our lab they started using different chars for PTMs, so it will be tough to predict all cases. The developer would have two options at this point, based on the error/exception: change the sequence data to a correct format, or do their own custom compound mapping. Since that is more of a user input error and outside our control, we should probably throw the appropriate exception and let the user deal with it.
>>
>> Scooter
>>
>> On May 17, 2010, at 9:24 AM, Andy Yates wrote:
>>
>>> Is the problem in the code the fact that errors are used or that they are not declared on the method signatures?
>>>
>>> On a more design note I do not like checked exceptions for two reasons. The first is they seem to make more junior developers catch Exception and swallow it. The second is that whenever a checked exception has been thrown in my Java code (say IOException because of incorrect file permissions) I could not deal with it, which has meant I either forwarded on the problem or re-threw it in an unchecked exception.
>>>
>>> That said, as Scooter mentioned, the exception system was rushed out in the hackathon and did not have much work put into it.
>>>
>>> Andy
>>>
>>> On 17 May 2010, at 13:15, PATERSON Trevor wrote:
>>>
>>>> resending cos of bad headers
>>>>
>>>> ________________________________
>>>>
>>>> From: PATERSON Trevor
>>>> Sent: 17 May 2010 13:02
>>>> To: biojava-dev at lists.open-bio.org
>>>> Subject: Errors versus Exceptions
>>>>
>>>> Could I ask a quick question about why BJ3 seems to use Errors rather than Exceptions
>>>> - maybe this is already documented somewhere on the wiki for BJ3 or 1 - and you can point me there.
>>>>
>>>> Obviously for us, dataloading from remote databases, we need to have a method to catch connection, SQL, datamapping errors etc.
>>>> We usually throw DataAccessExceptions when this happens, which wrap any java.net, Ibatis and SQL exceptions.
>>>>
>>>> It is difficult for us to plug in our lazyload over your signatures in the BJ3 sequence readers, as these don't throw exceptions,
>>>> because you seem to use Errors throughout - which don't need to be declared. In fact I can't actually see examples of you catching
>>>> and responding to thrown errors.
>>>>
>>>> For example the setContents routine of the Sequence Readers throws a CompoundNotFoundError if there is no mapping for a base -
>>>> but I can't see what is done with this.
>>>>
>>>> According to the Java Lang Spec, errors are used for "serious problems that a reasonable application
>>>> should not try to catch" - but I would have thought finding an unrecognized base in a sequence was
>>>> just the sort of thing that should be thrown and caught and acted on. As error throwables are not reported
>>>> in the signature - developers don't have any clue (or requirement) that they should be catching and dealing with errors - which
>>>> seems a bit dangerous to me... If Exceptions were thrown rather than Errors that would force the developer to handle runtime errors
>>>>
>>>> Cheers Trevor
>>>>
>>>> --
>>>> The University of Edinburgh is a charitable body, registered in
>>>> Scotland, with registration number SC005336.

From andreas at sdsc.edu Mon May 17 10:11:48 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 17 May 2010 07:11:48 -0700
Subject: [Biojava-dev] Errors versus Exceptions
In-Reply-To: <6CE441D2-1CC4-4838-B187-540E09439F28@ebi.ac.uk>
References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk> <6ABE7208-CE97-482B-BD03-1425AF594E14@scripps.edu> <6CE441D2-1CC4-4838-B187-540E09439F28@ebi.ac.uk>
Message-ID:

Hi Andy,

On this topic I am with Joshua Bloch (Effective Java): his point is that checked exceptions should be used for recoverable errors and runtime exceptions for programming errors.

Passing in a wrong sequence is a recoverable error (the software should ask the user to select a different sequence). As such, this would suggest the use of checked exceptions.

Andreas

On Mon, May 17, 2010 at 6:49 AM, Andy Yates wrote:
> I think this brings us to one of the initial aims of BioJava3, which was to make things easier for the entry-level programmer as well as the more experienced programmer.
Much like having one's cake & eating it too :). Assuming the following signature:
>
> setContents() throws UnknownCompoundException;
>
> What would give us the best API. Where
>
> 1). UnknownCompoundException extends Exception
>
> This has the knock on effect of turning our Sequence constructors into:
>
> DNASequence s;
> try {
>   s = new DNASequence("ATGC");
> }
> catch(UnknownCompoundException e) {
>   //....
> }
>
> There is of course the option of swallowing/reporting exceptions under certain circumstances (say this constructor) but that results in inconsistent API behaviour.
>
> 2). UnknownCompoundException extends RuntimeException
>
> Has the problem that should anything *need* to deal with the exception it doesn't have to.
>
> I don't know what the right answer is here so I'm happy to be led by whoever feels the strongest about it
>
> Andy

--
-----------------------------------------------------------------------
Dr.
Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From ayates at ebi.ac.uk Mon May 17 10:26:25 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Mon, 17 May 2010 15:26:25 +0100
Subject: [Biojava-dev] Errors versus Exceptions
In-Reply-To:
References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk> <6ABE7208-CE97-482B-BD03-1425AF594E14@scripps.edu> <6CE441D2-1CC4-4838-B187-540E09439F28@ebi.ac.uk>
Message-ID: <0DA4461A-3299-4F5F-8036-236C26899241@ebi.ac.uk>

Hi Andreas,

That is a fair point and Effective Java is the Perl Best Practices for Java. ERH (opinionated but right more often than not) also gives some good advice on this:

http://cafe.elharo.com/programming/bruce-eckel-is-wrong

If everyone is okay with checked exceptions then that's fine. We'll have to tighten up the parts of the API which currently throw these and make sure that we do not pass back Exception but the right types of Exceptions where applicable.

Andy

--
Andrew Yates Ensembl Genomes Engineer
EMBL-EBI Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/

From ayates at ebi.ac.uk Mon May 17 10:30:46 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Mon, 17 May 2010 15:30:46 +0100
Subject: [Biojava-dev] Errors versus Exceptions
In-Reply-To: <428F360A0BA69249B73A710B58435CFE04DD42E20C@EXSEEDMD.is.ed.ac.uk>
References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E20C@EXSEEDMD.is.ed.ac.uk>
Message-ID: <73D847AF-1C33-4229-BB74-E20C354C13BD@ebi.ac.uk>

There's something that irks me with checked exceptions & I found code worked better once I went to a fail-fast method of coding, but for an API I can see the usefulness of them.

Wow I think this could be the fastest any group has ever dealt with the Checked/Unchecked exception argument :)

Andy

On 17 May 2010, at 14:58, PATERSON Trevor wrote:

>> Is the problem in the code the fact that errors are used or
>> that they are not declared on the method signatures?
>
> Declaring them would help - but
> even if you declare Errors in the signature, there is nothing to force you to catch them.
> So nothing to alert you to the chance that they may be thrown when you are writing code.
>
> public class Demo {
>
>     public static void main(String[] args) throws Exception {
>         Demo.demo();
>         System.out.println("never reached");
>     }
>
>     public static void demo() throws Error {
>         throw new Error("kills the runtime");
>     }
> }
>
> Compiles OK with or without declaring the throwable - but throws a runtime error.
>
> :.
IMHO throwing errors is not very useful to the developer
> - they can use them - IF they know the code well enough to know they might be thrown
>
> If Exceptions were used
> 1. they would HAVE to be dealt with - if a developer just swallows something bad, then that's their responsibility for shitty code ;)
> 2. like Errors they would be extendable, so real case dataloaders could use their own exception types and decide how to deal with or ignore them
>
> I guess in my background of coding for data access over the wire with jdbc and webservices
> - you need to be aware of the real probability of connection and request failures
> - so explicitly using Exceptions forces the developer to code defensively

From trevor.paterson at roslin.ed.ac.uk Mon May 17 11:24:34 2010
From: trevor.paterson at roslin.ed.ac.uk (PATERSON Trevor)
Date: Mon, 17 May 2010 16:24:34 +0100
Subject: [Biojava-dev] Using DNASequence reverseComplement
Message-ID: <428F360A0BA69249B73A710B58435CFE04DD42E20D@EXSEEDMD.is.ed.ac.uk>

Sorry for raising that behemoth earlier..

I have a separate problem with the DNASequence API -
probably I just don't understand how to use the View objects.

If I make a DNASequence

DNASequence seq = new DNASequence("AAAAACCCCGGGTT");

i.e. length = 14, I might reasonably want to get the ReverseComplement of bases 11-14, which should 'be' "AACC".

But I cannot manage to get this in one easy step....

seq.toString(): AAAAACCCCGGGTT --> FINE
seq.getReverseComplement().getSequenceAsString(): AACCCGGGGTTTTT --> FINE

But when I try to use bounds on this complement - methods refer back to the original seq's iterator, not the complement

seq.getReverseComplement().getSequenceAsString(11,14,Strand.POSITIVE): GGTT
i.e. the same as seq.getSequenceAsString(11,14,Strand.POSITIVE)
seq.getReverseComplement().getSequenceAsString(11,14,Strand.NEGATIVE): TTGG
i.e. the same as seq.getSequenceAsString(11,14,Strand.NEGATIVE)

Is this the desired behaviour? How would I get the desired reverseComplement fragment?
The only obvious way that I can see is

DNASequence subseq = new DNASequence(seq.getSequenceAsString(11, 14, Strand.POSITIVE));
System.out.println(""+ subseq.getReverseComplement().getSequenceAsString());

_____________________________________________________________________________________________

On a related point I was mightily confused by the Strand.POSITIVE/Strand.NEGATIVE enumeration.

I was naively interpreting them to refer to the strand of the DNA,
whereas they in fact refer to the directionality of the Iterator *on the same Strand*.

A better name might be Direction.FORWARDS/Direction.BACKWARDS?
Positive and negative strand has loaded biological meaning for newbies like me (sense versus antisense),
so I made the assumption that a Strand.NEGATIVE call would itself reverseComplement.

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

From ayates at ebi.ac.uk Mon May 17 12:07:36 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Mon, 17 May 2010 17:07:36 +0100
Subject: [Biojava-dev] Using DNASequence reverseComplement
In-Reply-To: <428F360A0BA69249B73A710B58435CFE04DD42E20D@EXSEEDMD.is.ed.ac.uk>
References: <428F360A0BA69249B73A710B58435CFE04DD42E20D@EXSEEDMD.is.ed.ac.uk>
Message-ID:

Hi Trevor,

You've stumbled right into something Scooter and I are trying to clean up now. The assumption I had originally made is that all operations on things like getSequenceAsString() would go via the view, since that's where the logic is located for both the reversing & the complementing code. That call now delegates onto the backing store and not the view, which means you get these very odd results happening.

For the moment I think the following code would do what you're expecting:

seq.getReverseComplement().getSubSequence(1,4).getSequenceAsString()

It's annoying: because you're in the reverse coordinate system you've got to reverse the original coordinates, so asking for position 11,14 just isn't going to work.
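
The coordinate flip described above can be sketched in plain Java, independent of BioJava. This is only an illustration under stated assumptions: the class and method names (RevCompCoords, toReverseCoords, reverseComplement) are hypothetical and are not part of the BioJava API, and coordinates are taken to be 1-based and inclusive as in the examples in this thread.

```java
// Hypothetical helper, not a BioJava class: shows how a 1-based inclusive
// range on the forward sequence maps onto the reverse complement.
final class RevCompCoords {

    /** Maps forward coordinates [start, end] to the matching range on the reverse complement. */
    static int[] toReverseCoords(int start, int end, int length) {
        return new int[] { length - end + 1, length - start + 1 };
    }

    /** Plain-string reverse complement for the four unambiguous DNA bases. */
    static String reverseComplement(String dna) {
        StringBuilder out = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            char base = dna.charAt(i);
            switch (base) {
                case 'A': out.append('T'); break;
                case 'T': out.append('A'); break;
                case 'C': out.append('G'); break;
                case 'G': out.append('C'); break;
                default: throw new IllegalArgumentException("Unexpected base: " + base);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String seq = "AAAAACCCCGGGTT";
        String revComp = reverseComplement(seq);
        int[] r = toReverseCoords(11, 14, seq.length());
        System.out.println(revComp);                            // AACCCGGGGTTTTT
        System.out.println(r[0] + ".." + r[1]);                 // 1..4
        // substring is 0-based and end-exclusive, hence the -1 on the start only
        System.out.println(revComp.substring(r[0] - 1, r[1]));  // AACC
    }
}
```

Bases 11-14 of the 14-base forward sequence become positions 1-4 of the reverse complement, which is why getSubSequence(1,4) on the reverse-complement view is the call that yields "AACC".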
The other way of working with this would be to construct the views yourself and pass in a subsequence of the original sequence, i.e.

new ReversedSequenceView(new ComplementSequenceView(seq.getSubSequence(11,14)));

This is really a problem with the erasure of the Sequence types from DNASequence. If DNASequence returned the same type from its subsequence method then you would just call revComp on that and it would have been fine.

The things to take away from this are:

* getSequenceAsString(Integer, Integer, Strand) is not well supported at the moment
* So long as we are sure it should remain, it will be
* There should be no reason to materialise the Sequence into a String to get a part of the API working. If there is then we've messed up

Andy

p.s. The strand stuff is confusing; originally it was meant to be +ve & -ve strands but assumed that the Sequence you had was always on the +ve strand. Eventually the meaning will come back but will require the methods to be more aware of the strand the DNA is on to make the right call about what you want to do. This all ties in with circular genomes support and locations.

From HWillis at scripps.edu Mon May 17 12:31:33 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Mon, 17 May 2010 12:31:33 -0400
Subject: [Biojava-dev] Using DNASequence reverseComplement
In-Reply-To: <428F360A0BA69249B73A710B58435CFE04DD42E20D@EXSEEDMD.is.ed.ac.uk>
References: <428F360A0BA69249B73A710B58435CFE04DD42E20D@EXSEEDMD.is.ed.ac.uk>
Message-ID: <15ACF8DF-34BB-4C64-9758-8308259C2DDF@scripps.edu>

Trevor

Andy and I have been working on the concept of views and how to handle this so it is not confusing. We have not accomplished that goal yet. We both have different approaches we are trying to sort out, and you may have come across a bug in methods that should be private, or we haven't explored all possible use cases with corresponding test cases. The problem is that ReverseComplement implies the negative strand, and the API is currently confusing in that methods are exposed that allow these types of nonsensical use cases.

Comments below

On May 17, 2010, at 11:24 AM, PATERSON Trevor wrote:

> Sorry for raising that behemoth earlier..
>
> I have a separate problem with the DNASequence API -
>
> Probably I just don't understand how to use the View objects
>
> If I make a DNASequence
>
> DNASequence seq = new DNASequence("AAAAACCCCGGGTT");
>
> i.e. length = 14,
>
> I might reasonably want to get the ReverseComplement of bases 11-14, which should 'be' "AACC"
>
> But I cannot manage to get this in one easy step....
>
> seq.toString(): AAAAACCCCGGGTT --> FINE
>
> seq.getReverseComplement().getSequenceAsString(): AACCCGGGGTTTTT --> FINE
>
> But when I try to use bounds on this complement - methods refer back to the original seq's iterator, not the complement

We would like to minimize the use of String as an intermediary to go between objects. Strings should be used for creating the sequence and export. Andy has some sequence views where you would pass in DNASequence to a SequenceView to get the desired transformation. I will let Andy comment on what he recommends. If you look through the test cases you will see what he has set up.

> seq.getReverseComplement().getSequenceAsString(11,14,Strand.POSITIVE): GGTT
> i.e. the same as seq.getSequenceAsString(11,14,Strand.POSITIVE)
> seq.getReverseComplement().getSequenceAsString(11,14,Strand.NEGATIVE): TTGG
> i.e. the same as seq.getSequenceAsString(11,14,Strand.NEGATIVE)
>
> Is this the desired behaviour? How would I get the desired reverseComplement fragment?
>
> The only obvious way that I can see is
>
> DNASequence subseq = new DNASequence(seq.getSequenceAsString(11, 14, Strand.POSITIVE));
> System.out.println(""+ subseq.getReverseComplement().getSequenceAsString());
>
> _____________________________________________________________________________________________
>
> On a related point I was mightily confused by the Strand.POSITIVE/Strand.NEGATIVE enumeration

This is a struggle between the computer science domain and the biology domain. Currently Sense and Strand have meaning and we need to stick with that, if only to force the correct vocabulary on the programmer who needs to discuss with the biologists. A DNA sequence or RNASequence has different interpretations depending on the context where it is used. You may want to look at the sequence in both directions and that does not imply reverseComplement behavior. For example, if you are working on DNA patterns for DNA docking/binding you need to consider both strands of DNA as they form a 3D structure.

> I was naively interpreting them to refer to the strand of the DNA:
> Whereas they in fact refer to the directionality of the Iterator *on the same Strand*
>
> A better name might be Direction.FORWARDS/Direction.BACKWARDS?
> Positive and negative strand has loaded biological meaning for newbies like me ( sense versus antisense )
> So I made the assumption that a Strand.NEGATIVE call would itself reverseComplement

Keep the input/feedback coming as that is the best way for us to sort out the API.

Thanks

Scooter

From markjschreiber at gmail.com Mon May 17 23:53:52 2010
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 18 May 2010 11:53:52 +0800
Subject: [Biojava-dev] Errors versus Exceptions
In-Reply-To: <73D847AF-1C33-4229-BB74-E20C354C13BD@ebi.ac.uk>
References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E20C@EXSEEDMD.is.ed.ac.uk> <73D847AF-1C33-4229-BB74-E20C354C13BD@ebi.ac.uk>
Message-ID:

To give some historical perspective. This started happening in BioJava 1.5. We switched BioException to extend RuntimeException (which is not checked). This came from some thinking at the time that Java's exception checking was a little bit broken (as exemplified by BioJava).

Checked exceptions are good as long as you don't swallow them in the API or just use them to spew a stack trace. Spitting a stack trace is pointless as the JVM will do a much better job of this when it dumps you out.
Swallowing them in the API is stupid because it is the developer who needs to do something with them, unless the problem is truly an Error; unrecoverable problems are the only case where an Error should be used.

RuntimeExceptions should be used for any type of exception that can be avoided by defensive programming. NullPointerException and IndexOutOfBoundsException are classic examples. There are hundreds of places in Java where you could get one of these, but you can easily avoid them by checking whether a collection contains an item, doing simple if (x == null) checks, using array.length, etc. For this reason these and some other Java exceptions extend RuntimeException and are not checked. Can you imagine what Java would look like if you actually had to put every array access in a try/catch statement?

Therefore, as much as possible, I think BioJava3 should have exceptions that extend RuntimeException and provide a defensive mechanism to avoid having them happen. For example, when creating a DNA sequence from text you could provide a validator which checks whether there are any "incorrect" characters. Good code examples on the wiki should show the use of defensive programming and not just surround everything with a try/catch. Importantly, RuntimeExceptions should be declared so people are aware they may occur. This is not required by the compiler, but it is good practice for documentation purposes. The Exception class itself should probably contain some Javadoc which explains how it can be avoided in the first place. Checked exceptions are good for cases where you can recover (although this is often more challenging than people think), but it is always easier and faster to check yourself: throwing the exception, generating the stack trace, etc. takes quite a lot of effort from the JVM.

Finally, I don't think you should be throwing Errors too often. Although Errors are unchecked and give you the same effect as RuntimeExceptions, they imply something really bad has gone wrong.
These should be reserved for things like configuration files being corrupt, or things that would prevent BioJava from being used, like not finding a required JAR file or plugin. You can't recover from these. If your API has lots of places where Errors can occur, it might indicate poor design.

- Mark

On Mon, May 17, 2010 at 10:30 PM, Andy Yates wrote:
>
> There's something that irks me with checked exceptions & I found code worked better once I went to a fail-fast method of coding but for an API I can see the usefulness of them.
>
> Wow I think this could be the fastest any group has ever dealt with the Checked/Unchecked exception argument :)
>
> Andy
>
> On 17 May 2010, at 14:58, PATERSON Trevor wrote:
>
> >> Is the problem in the code the fact that errors are used or
> >> that they are not declared on the method signatures?
> >
> > Declaring them would help - but
> > Even if you declare Errors in the signature, there is nothing to force you to catch them.
> > So nothing to alert you to the chance that they may be thrown when you are writing code.
> >
> > public class Demo {
> >
> >     public static void main(String[] args) throws Exception {
> >         Demo.demo();
> >         System.out.println("never reached");
> >     }
> >
> >     public static void demo() throws Error {
> >         throw new Error("kills the runtime");
> >     }
> > }
> >
> > Compiles OK with or without declaring the throwable - but throws a runtime error.
> >
> > :. IMHO throwing errors is not very useful to developer
> > - they can use them - IF they know the code well enough to know they might be thrown
> >
> > If Exceptions were used
> > 1. they would HAVE to be dealt with - if a developer just swallows something bad, then that's their responsibility for shitty code ;)
> > 2.
like Errors they would be extendable, so real case dataloaders could use their own exception types and decide how to deal with or ignore them > > > > I guess in my background of coding for data access over the wire with jdbc and webservices > > - you need to be aware of the real probability of connection and request failures > > - so explicitly using Exceptions forces the developer to code defensively > > > > > > > > > >> > >> On a more design note I do not like checked exceptions for > >> two reasons. The first is they seem to make more junior > >> developers catch Exception and swallow it. The second is > >> whenever I've been Java coding in the past whenever a checked > >> exception gets thrown (say IOException because of incorrect > >> file permissions) I cannot deal with it which in the past has > >> meant I either forward on the problem or re-throw in an > >> unchecked exception. > >> > >> That said as Scooter mentioned the exception system was > >> rushed out in the hackathon and did not have much work put into it. > >> > >> Andy > >> > >> On 17 May 2010, at 13:15, PATERSON Trevor wrote: > >> > >>> resending cos of bad headers > >>> > >>> ________________________________ > >>> > >>> From: PATERSON Trevor > >>> Sent: 17 May 2010 13:02 > >>> To: biojava-dev at lists.open-bio.org > >>> Subject: Errors versus Exceptions > >>> > >>> > >>> Could I ask a quick question about why BJ3 seems to use > >> Errors rather > >>> than Exceptions > >>> - maybe this is already documented somewhere on the wiki > >> for BJ3 or 1 - and you can point me there. > >>> > >>> Obviously for us dataloading from remote databases we need > >> to have a method to catch connection,sql, datamapping errors etc. > >>> We ususally throw DataAccessExceptions when this happens, > >> which wrap any java.net, Ibatis and SQL exceptions. 
> >>> > >>> It is difficult for us to plug in our lazyload over your > >> signatures in > >>> the BJ3 sequence readers, as these don't throw exceptions, > >> because you > >>> seem to use Errors throughout - which dont need to be > >> declared. Infact I cant actually see examples of you catching > >> and responding to thrown errors. > >>> > >>> for example the setContents routine of the Sequence Readers > >> throws a > >>> CompoundNotFoundError if there is no mapping for a base - > >> but I can't see what is done with this. > >>> > >>> According to the Java Lang Spec, errors are used for > >> "serious problems > >>> that a reasonable application should not try to catch" - > >> but I would > >>> have thought finding an unrecognized base in a sequence was > >> just the > >>> sort of thing that should be thrown and caught and acted > >> on. As error > >>> throwables are not reported in the signature - developers > >> don't have > >>> any clue ( or requirement) that they should be catching and dealing > >>> with errors - which seems a bit dangerous to me... If > >> Exceptions were > >>> thrownrather than Errors that would force the developer to handle > >>> runtime errors > >>> > >>> Cheers Trevor > >>> > >>> -- > >>> The University of Edinburgh is a charitable body, registered in > >>> Scotland, with registration number SC005336. > >>> > >>> > >>> _______________________________________________ > >>> biojava-dev mailing list > >>> biojava-dev at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev > >> > >> -- > >> Andrew Yates ? ? ? ? ? ? ? ? ? Ensembl Genomes Engineer > >> EMBL-EBI ? ? ? ? ? ? ? ? ? ? ? Tel: +44-(0)1223-492538 > >> Wellcome Trust Genome Campus ? Fax: +44-(0)1223-494468 > >> Cambridge CB10 1SD, UK ? ? ? ? http://www.ensemblgenomes.org/ > >> > >> > >> > >> > >> > > -- > > The University of Edinburgh is a charitable body, registered in > > Scotland, with registration number SC005336. 
> --
> Andrew Yates                   Ensembl Genomes Engineer
> EMBL-EBI                       Tel: +44-(0)1223-492538
> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
> Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From ayates at ebi.ac.uk Tue May 18 04:52:35 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 18 May 2010 09:52:35 +0100
Subject: [Biojava-dev] Errors versus Exceptions
In-Reply-To:
References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E20C@EXSEEDMD.is.ed.ac.uk> <73D847AF-1C33-4229-BB74-E20C354C13BD@ebi.ac.uk>
Message-ID: <99C91E4D-0CEE-4F65-8BCD-184F4011FE8B@ebi.ac.uk>

You've made that previous decision a lot harder now, Mark (I knew it was over too quickly).

On reflection, I'm not sure how well you can recover from something like an unknown compound. When you hit that exception, what are you going to do with it? With something like an IOException for a file that cannot be opened, you can do something about it (say it's on NFS, you could always retry after an allotted amount of time has passed). But an unknown compound ... I'm not sure how you can deal with that apart from callbacks in the parsers. If you think of this in terms of FASTA input into an aligner program and you have an unknown compound there, all you're going to do is pass a message back to the user saying you didn't understand the input sequence, with the error message from the Exception being something like "Compound '?' is not supported". You could always try to recover by moving through different CompoundSets to see if one works, but are you going to do that?
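
The two styles being weighed in this thread can be sketched in plain Java. This is only an illustration under stated assumptions: UnknownCompoundException, DnaInputSketch, firstUnknown and parse are hypothetical names invented here, the alphabet is assumed to be the four unambiguous DNA letters, and none of it is BioJava API.

```java
/** Hypothetical unchecked exception for illustration; not a BioJava class. */
final class UnknownCompoundException extends RuntimeException {
    UnknownCompoundException(char compound, int position) {
        super("Compound '" + compound + "' at position " + position + " is not supported");
    }
}

/** Illustrative helper, assuming a plain four-letter DNA alphabet. */
final class DnaInputSketch {
    private static final String DNA = "ACGT";

    /**
     * Defensive check a caller can run before building a sequence.
     * Returns -1 if every character is known, otherwise the 1-based
     * position of the first unknown character.
     */
    static int firstUnknown(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (DNA.indexOf(text.charAt(i)) < 0) {
                return i + 1;
            }
        }
        return -1;
    }

    /** Fails fast with an unchecked exception when the input is not valid. */
    static String parse(String text) {
        int bad = firstUnknown(text);
        if (bad > 0) {
            throw new UnknownCompoundException(text.charAt(bad - 1), bad);
        }
        return text;
    }

    public static void main(String[] args) {
        String input = "AAAXCC";

        // Check-first style: no exception object is ever created.
        if (firstUnknown(input) > 0) {
            System.out.println("input rejected before parsing");
        }

        // Catch style: the application reports the message back to the user.
        try {
            parse(input);
        } catch (UnknownCompoundException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the exception is unchecked, a short script can ignore it and die with the message, while a long-running application can catch it around the request that failed.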
I guess the thing to do is to start with checked exceptions in potentially recoverable situations and, if they prove to be too clumsy, then switch to RuntimeException as a base class, as BioJava did previously. We should also endeavour to pass on exceptions and avoid excessive exception wrapping.

I do think there is a situation where we can relax these rules: I would like to see them relaxed when a developer is building, say, a DNASequence using only a String. The same should happen for helper classes.

That's my 2p :)

Andy

--
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From trevor.paterson at roslin.ed.ac.uk Tue May 18 05:00:22 2010
From: trevor.paterson at roslin.ed.ac.uk (PATERSON Trevor)
Date: Tue, 18 May 2010 10:00:22 +0100
Subject: [Biojava-dev] Using DNASequence reverseComplement
In-Reply-To:
References: <428F360A0BA69249B73A710B58435CFE04DD42E20D@EXSEEDMD.is.ed.ac.uk>
Message-ID: <428F360A0BA69249B73A710B58435CFE04DD42E20E@EXSEEDMD.is.ed.ac.uk>

Hi Andy,

I know this is all work in progress, so I'm not too hung up on everything not working out of the box.

What I am doing is stitching together contigs into an assembly, so I have an AssembledDNASequence object that extends DNASequence but defers its methods to an Assembly object, which contains a Map of DNASequences with ranges and orientations.
This map is created from a single SQL call, and is populated with DNASequence objects that lazy load their actual sequence data from the database when required.

The only way I can check that the map is giving me the correct assembly is to print out the sequence as a string for each component in order, i.e.

Frag 1, from 1 to 200
Gap of 150 N
Frag 2, from 1 to 5000 REVERSED
Frag 3, from 150 to 6000
Gap of 2000 N
etc

I think your suggestion actually demonstrates the 'bug' (or 'feature' ;)

seq.getReverseComplement().getSubSequence(1,4).getSequenceAsString()

returns the whole reverse complement, not just 1-4: AACCCGGGGTTTTT

because getSequenceAsString is looking at the parent of the view.

Something even more horrible happens if you try to bound that query:

seq.getReverseComplement().getSubSequence(1,4).getSequenceAsString(1,4,Strand.POSITIVE)

returns AAAA (i.e. this is 1-4 of the parent, not even the reverse complement).

And perhaps more interestingly

seq.getReverseComplement().getSubSequence(1,4).getSequenceAsString(1,4,Strand.NEGATIVE)

also returns AAAA (which I just plain can't understand..)

___________________________________________________________________________________

Anyway, I am probably at a stage where I don't want to do any more development on this at the moment... I am reasonably happy that an EnsemblAPI will be able to use the BioJava Sequence objects down the road.
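
For reference, the result expected in this thread (bases 11-14 of AAAAACCCCGGGTT reverse complementing to AACC, which is also positions 1-4 of the full reverse complement) can be checked with a small plain-Java sketch. It deliberately uses Strings and no BioJava classes; RevCompSketch, sub and flip are hypothetical names chosen for illustration, and only the four unambiguous bases are handled.

```java
/** Plain-Java illustration; none of these names are BioJava API. */
final class RevCompSketch {

    /** Complement of a single unambiguous DNA base. */
    static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default: throw new IllegalArgumentException("Unsupported base: " + base);
        }
    }

    /** Walks the input backwards, complementing each base. */
    static String reverseComplement(String dna) {
        StringBuilder out = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            out.append(complement(dna.charAt(i)));
        }
        return out.toString();
    }

    /** 1-based inclusive subsequence, matching the coordinates used in the thread. */
    static String sub(String dna, int start, int end) {
        return dna.substring(start - 1, end);
    }

    /** Maps a 1-based position on the original onto the reverse complement. */
    static int flip(int position, int length) {
        return length - position + 1;
    }

    public static void main(String[] args) {
        String seq = "AAAAACCCCGGGTT";

        // Route 1: slice first, then reverse complement the slice.
        System.out.println(reverseComplement(sub(seq, 11, 14)));

        // Route 2: reverse complement first, then slice with flipped coordinates.
        int start = flip(14, seq.length());
        int end = flip(11, seq.length());
        System.out.println(sub(reverseComplement(seq), start, end));
    }
}
```

Both routes print AACC, which is why asking the reverse-complemented view for positions 11-14 cannot give the fragment wanted here: in the reversed coordinate system that fragment sits at 1-4.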
I have made an EnsemblDNASequenceReader that extends and implements ProxySequenceReader. This is used to create a DataSourceAwareDNASequence and is responsible for lazy loading the actual sequence data from Ensembl when required.

To do this I needed to give the AbstractSequence's (SequenceReader) sequenceStorage property and its getter protected rather than private access, because there needs to be exchange of info from one to the other.

I will need to implement some sort of chunking and caching for retrieval of larger sequences, and maybe even pass all the methods to a buffered reader for large sequences, but I'm not going to worry about this at the moment. There are plenty of other things for me to prototype in Ensembl at the moment!

Cheers

Trevor

> -----Original Message-----
> From: Andy Yates [mailto:andyyatz at gmail.com] On Behalf Of Andy Yates
> Sent: 17 May 2010 17:08
> To: PATERSON Trevor
> Cc: biojava-dev at lists.open-bio.org
> Subject: Re: [Biojava-dev] Using DNASequence reverseComplement
>
> Hi Trevor,
>
> You've stumbled right into something myself & Scooter are trying to clean-up now. The assumption I had originally made is that all operations on things like getSequenceAsString() would go via the view since that's where the logic is located for both the reversing & the complementing code. That call now delegates onto the backing store and not the view which means you get these very odd results happening.
>
> For the moment I think the following code would do what you're expecting:
>
> seq.getReverseComplement().getSubSequence(1,4).getSequenceAsString()
>
> It's annoying because you're in the reverse coordinate system you've got to reverse the original coordinates so asking for position 11,14 just isn't going to work. The other way of working with this would be to construct the views yourself and pass in a subsequence of the original sequence i.e.
> > new ReversedSequenceView(new > ComplementSequenceView(seq.getSubSequence(11,14)); > > This is really a problem with the erasure of the Sequence > types from DNASequence. If DNASequence returned the same type > from its subsequence method then you would just call revComp > on that and it would have been fine. > > The thing to take away from are: > > * getSequenceAsString(Integer, Integer, Strand) is not well > supported atmo > > * So long as we are sure it should remain it will be > > * There should be no reason to materialise the Sequence into > a String to get a part of the API working. If there is then > we've messed up > > Andy > > p.s. The strand stuff is confusing; originally it was meant > to be +ve & -ve strands but assumed that the Sequence you had > was always on the +ve strand. Eventually the meaning will > come back but will require the methods to be more aware of > the strand DNA is on to make the right call about what you > want to do. This all ties in with circular genomes support > and locations > > On 17 May 2010, at 16:24, PATERSON Trevor wrote: > > > Sorry for raising that behemoth earlier.. > > > > I have a separate problem with the DNASequence API - > > > > Probably I just don't understand how to use the View objects > > > > > > If I make a DNASequence > > > > DNASequence seq = new DNASequence("AAAAACCCCGGGTT"); > > > > i.e. length = 14, > > > > I might reasonably want to get the ReverseComplement of > bases 11-14, which should 'be' "AACC" > > > > But I cannot manage to get this in one easy step.... 
> > > > seq.toString(): AAAAACCCCGGGTT --> FINE > > > > seq.getReverseComplement().getSequenceAsString(): > AACCCGGGGTTTTT --> > > FINE > > > > But when I try to use bounds on this complement - methods > refer back > > to the original seq's iterator, not the complement > > > > > seq.getReverseComplement().getSequenceAsString(11,14,Strand.PO > SITIVE): GGTT > > i.e the same as seq.getSequenceAsString(11,14,Strand.POSITIVE) > > > seq.getReverseComplement().getSequenceAsString(11,14,Strand.NE > GATIVE): TTGG > > i.e the same as seq.getSequenceAsString(11,14,Strand.NEGATIVE) > > > > Is this the desired behaviour? How would I get the desired > reverseComplement fragment? > > > > The only obvious way that I can see is > > > > DNASequence subseq = new > DNASequence(seq.getSequenceAsString(11, 14, Strand.POSITIVE)); > > System.out.println(""+ > > subseq.getReverseComplement().getSequenceAsString()); > > > > > ______________________________________________________________________ > > _______________________ > > > > On a related point I was mightily confused by the > > Strand.POSITIVE/Strand.NEGATIVE enumeration > > > > I was naively interpreting them to refer to the strand of the DNA: > > Whereas they infact refer to the directionality of the Iterator *on > > the same Strand* > > > > A better name might be Direction:FORWARDS/Direction.BACKWARDS? > > Positive and negative strand has loaded biological meaning > for newbies > > like me ( sense versus antisense ) So I made the assumption that a > > Strand.NEGATIVE call would itself reverseComplement > > -- > > The University of Edinburgh is a charitable body, registered in > > Scotland, with registration number SC005336. 
--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

From trevor.paterson at roslin.ed.ac.uk Tue May 18 05:06:01 2010
From: trevor.paterson at roslin.ed.ac.uk (PATERSON Trevor)
Date: Tue, 18 May 2010 10:06:01 +0100
Subject: [Biojava-dev] Errors versus Exceptions
In-Reply-To: <99C91E4D-0CEE-4F65-8BCD-184F4011FE8B@ebi.ac.uk>
References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E20C@EXSEEDMD.is.ed.ac.uk> <73D847AF-1C33-4229-BB74-E20C354C13BD@ebi.ac.uk> <99C91E4D-0CEE-4F65-8BCD-184F4011FE8B@ebi.ac.uk>
Message-ID: <428F360A0BA69249B73A710B58435CFE04DD42E20F@EXSEEDMD.is.ed.ac.uk>

> On reflection I'm not sure how well you can recover from
> something like an unknown compound. When you hit that
> exception what are you going to do with it?

I think it depends on what level your application is working at. If your application is just a script doing some data munging, maybe that is a fatal exception that you want to cause the script to die gracefully...
But if you are running a large graphical display program and you ask for the sequence of a particular component, you need to be catching the exception and handling the failed request.

That's my 2.5p ;)

Trevor
If > >>>> Exceptions were > >>>>> thrownrather than Errors that would force the developer > to handle > >>>>> runtime errors > >>>>> > >>>>> Cheers Trevor > >>>>> > >>>>> -- > >>>>> The University of Edinburgh is a charitable body, registered in > >>>>> Scotland, with registration number SC005336. > >>>>> > >>>>> > >>>>> _______________________________________________ > >>>>> biojava-dev mailing list > >>>>> biojava-dev at lists.open-bio.org > >>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev > >>>> > >>>> -- > >>>> Andrew Yates Ensembl Genomes Engineer > >>>> EMBL-EBI Tel: +44-(0)1223-492538 > >>>> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 > >>>> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ > >>>> > >>>> > >>>> > >>>> > >>>> > >>> -- > >>> The University of Edinburgh is a charitable body, registered in > >>> Scotland, with registration number SC005336. > >>> > >>> > >>> _______________________________________________ > >>> biojava-dev mailing list > >>> biojava-dev at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev > >> > >> -- > >> Andrew Yates Ensembl Genomes Engineer > >> EMBL-EBI Tel: +44-(0)1223-492538 > >> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 > >> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ > >> > >> > >> > >> > >> > >> _______________________________________________ > >> biojava-dev mailing list > >> biojava-dev at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > > -- > Andrew Yates Ensembl Genomes Engineer > EMBL-EBI Tel: +44-(0)1223-492538 > Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 > Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ > > > > > -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. 
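A minimal sketch of the three categories argued over in the quoted thread (a defensive check, an unchecked exception that is declared for documentation only, and an Error). Every name here is an assumption made for illustration: `ExceptionStyles`, `UnknownCompoundException`, `isValidDna`, `parseDna` and `failHard` are hypothetical and are not part of BioJava.

```java
/** Hypothetical illustration only; none of these names come from BioJava. */
class ExceptionStyles {

    /** Unchecked: a caller can avoid it entirely by validating first. */
    static class UnknownCompoundException extends RuntimeException {
        UnknownCompoundException(String message) {
            super(message);
        }
    }

    /** Defensive check a caller can run before trying to build a sequence. */
    static boolean isValidDna(String text) {
        if (text == null || text.isEmpty()) {
            return false;
        }
        for (char c : text.toCharArray()) {
            if ("ACGTN".indexOf(Character.toUpperCase(c)) < 0) {
                return false;
            }
        }
        return true;
    }

    /**
     * The throws clause documents the unchecked exception; the compiler
     * does not force any caller to catch it.
     */
    static String parseDna(String text) throws UnknownCompoundException {
        if (!isValidDna(text)) {
            throw new UnknownCompoundException("Unsupported compound in: " + text);
        }
        return text.toUpperCase();
    }

    /** Declaring an Error also compiles, and again nothing obliges a catch. */
    static void failHard() throws Error {
        throw new Error("unrecoverable");
    }

    public static void main(String[] args) {
        String input = "ACGTXX";
        // Defensive style: check first, so no try/catch is needed here.
        if (isValidDna(input)) {
            System.out.println(parseDna(input));
        } else {
            System.out.println("Rejected before parsing: " + input);
        }
        // Callers that skip the check can still recover from the exception.
        try {
            parseDna(input);
        } catch (UnknownCompoundException e) {
            System.out.println("Recovered: " + e.getMessage());
        }
    }
}
```

With a checked exception (one extending `Exception` directly) the `try`/`catch` in `main` would be mandatory rather than optional, which is the trade-off being debated above.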
From andy.law at roslin.ed.ac.uk Tue May 18 05:10:30 2010
From: andy.law at roslin.ed.ac.uk (LAW Andrew)
Date: Tue, 18 May 2010 10:10:30 +0100
Subject: [Biojava-dev] Using DNASequence reverseComplement
In-Reply-To: <428F360A0BA69249B73A710B58435CFE04DD42E20E@EXSEEDMD.is.ed.ac.uk>
References: <428F360A0BA69249B73A710B58435CFE04DD42E20D@EXSEEDMD.is.ed.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E20E@EXSEEDMD.is.ed.ac.uk>
Message-ID: <3FA5C783-F771-4038-BBD7-232EE2D5BDF8@exseed.ed.ac.uk>

Another philosophical question from the sidelines.

Does a BJ3 object return a *new* object when asked for a reverseComplement/subSequence/whatever (I suspect from what I have read here and heard from Trevor that it does not)? Should it?

If I have DNA sequence object A and I ask it for the reverseComplement of the last 16 bases (call it sequence B) and then modify sequence A to change the last base (if that is possible?), what would happen to the sequence B? (And what do we think *should* happen to sequence B)?

On 18 May 2010, at 10:00, PATERSON Trevor wrote:

> Hi andy
>
> I know this is all work in progress so I'm not too hung up on everything not working out the box..
>
> What I am doing is stitching together contigs into an assembly, so I have an AssembledDNASequence Object,
> that extends DNASequence but defers its method to an Assembly Object which contains a Map of DNASequences
> with ranges and orientations.
> This map is created from a single SQL call, and is populated with DNASequence
> Objects that lazy load their actual sequence data from the database when required
>
> The only way I can check that the map is giving me the correct assembly is to print out
> the sequence as string of each component in order,
> ie
> Frag 1, from 1 to 200
> Gap of 150 N
> Frag 2, from 1 to 5000 REVERSED
> Frag 3, from 150 to 6000
> Gap of 2000 N
>
> etc
>
> I think your suggestion actually demonstrates the 'bug' (or 'feature' ;)
>
> seq.getReverseComplement().getSubSequence(1,4).getSequenceAsString()
> returns the whole reverse complement - not just 1-4: AACCCGGGGTTTTT
> Because the getSequenceAsString is looking at the parent of the view.
>
> Something even more horrible happens if you try to bound that query
> seq.getReverseComplement().getSubSequence(1,4).getSequenceAsString(1,4,Strand.POSITIVE)
> Returns AAAA (ie this is 1-4 of the parent, not even the reverse complement.)
> And perhaps more interestingly
> seq.getReverseComplement().getSubSequence(1,4).getSequenceAsString(1,4,Strand.NEGATIVE)
> Also returns AAAA (which I just plain can't understand..)
>
> ___________________________________________________________________________________
>
> Anyway - I am probably at a stage where I don't want to do any more development on this at the moment...
> I am reasonably happy that an EnsemblAPI will be able to use the BioJava Sequence objects down the road.
>
> I have made an EnsemblDNASequenceReader that extends and implements ProxySequenceReader
> This is used to create a DataSourceAwareDNASequence and responsible for lazy loading the actual sequence data
> from Ensembl when required
>
> To do this I needed to give the AbstractSequence's (SequenceReader) sequenceStorage property and its getter protected rather
> than private access, because there needs to be exchange of info from one to the other
>
> I will need to implement some sort of chunking and caching for retrieval of larger sequences, and maybe even pass
> all the methods to a buffered reader for large sequences, but I'm not going to worry about this at the moment.
>
> There are plenty of other things for me to prototype in ensembl at the moment!
>
> Cheers Trevor
>
>> -----Original Message-----
>> From: Andy Yates [mailto:andyyatz at gmail.com] On Behalf Of Andy Yates
>> Sent: 17 May 2010 17:08
>> To: PATERSON Trevor
>> Cc: biojava-dev at lists.open-bio.org
>> Subject: Re: [Biojava-dev] Using DNASequence reverseComplement
>>
>> Hi Trevor,
>>
>> You've stumbled right into something myself & Scooter are
>> trying to clean-up now. The assumption I had originally made
>> is that all operations on things like getSequenceAsString()
>> would go via the view since that's where the logic is located
>> for both the reversing & the complementing code. That call
>> now delegates onto the backing store and not the view which
>> means you get these very odd results happening.
>>
>> For the moment I think the following code would do what
>> you're expecting:
>>
>> seq.getReverseComplement().getSubSequence(1,4).getSequenceAsString()
>>
>> It's annoying because you're in the reverse coordinate system
>> you've got to reverse the original coordinates so asking for
>> position 11,14 just isn't going to work. The other way of
>> working with this would be to construct the views yourself
>> and pass in a subsequence of the original sequence i.e.
>>
>> new ReversedSequenceView(new
>> ComplementSequenceView(seq.getSubSequence(11,14)));
>>
>> This is really a problem with the erasure of the Sequence
>> types from DNASequence. If DNASequence returned the same type
>> from its subsequence method then you would just call revComp
>> on that and it would have been fine.
>>
>> The things to take away are:
>>
>> * getSequenceAsString(Integer, Integer, Strand) is not well
>> supported atm
>>
>> * So long as we are sure it should remain it will be
>>
>> * There should be no reason to materialise the Sequence into
>> a String to get a part of the API working. If there is then
>> we've messed up
>>
>> Andy
>>
>> p.s. The strand stuff is confusing; originally it was meant
>> to be +ve & -ve strands but assumed that the Sequence you had
>> was always on the +ve strand. Eventually the meaning will
>> come back but will require the methods to be more aware of
>> the strand DNA is on to make the right call about what you
>> want to do. This all ties in with circular genomes support
>> and locations
>>
>> On 17 May 2010, at 16:24, PATERSON Trevor wrote:
>>
>>> Sorry for raising that behemoth earlier..
>>>
>>> I have a separate problem with the DNASequence API -
>>>
>>> Probably I just don't understand how to use the View objects
>>>
>>> If I make a DNASequence
>>>
>>> DNASequence seq = new DNASequence("AAAAACCCCGGGTT");
>>>
>>> i.e. length = 14,
>>>
>>> I might reasonably want to get the ReverseComplement of
>>> bases 11-14, which should 'be' "AACC"
>>>
>>> But I cannot manage to get this in one easy step....
>>>
>>> seq.toString(): AAAAACCCCGGGTT --> FINE
>>>
>>> seq.getReverseComplement().getSequenceAsString(): AACCCGGGGTTTTT --> FINE
>>>
>>> But when I try to use bounds on this complement - methods refer back
>>> to the original seq's iterator, not the complement
>>>
>>> seq.getReverseComplement().getSequenceAsString(11,14,Strand.POSITIVE): GGTT
>>> i.e. the same as seq.getSequenceAsString(11,14,Strand.POSITIVE)
>>>
>>> seq.getReverseComplement().getSequenceAsString(11,14,Strand.NEGATIVE): TTGG
>>> i.e. the same as seq.getSequenceAsString(11,14,Strand.NEGATIVE)
>>>
>>> Is this the desired behaviour? How would I get the desired
>>> reverseComplement fragment?
>>>
>>> The only obvious way that I can see is
>>>
>>> DNASequence subseq = new DNASequence(seq.getSequenceAsString(11, 14, Strand.POSITIVE));
>>> System.out.println(""+ subseq.getReverseComplement().getSequenceAsString());
>>>
>>> _____________________________________________________________________________________________
>>>
>>> On a related point I was mightily confused by the
>>> Strand.POSITIVE/Strand.NEGATIVE enumeration
>>>
>>> I was naively interpreting them to refer to the strand of the DNA:
>>> Whereas they in fact refer to the directionality of the Iterator *on
>>> the same Strand*
>>>
>>> A better name might be Direction.FORWARDS/Direction.BACKWARDS?
>>> Positive and negative strand has loaded biological meaning for newbies
>>> like me (sense versus antisense) So I made the assumption that a
>>> Strand.NEGATIVE call would itself reverseComplement
>>> --
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
Later,

Andy
--------
Yada, yada, yada...

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336

Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender.

From markjschreiber at gmail.com Tue May 18 05:10:47 2010
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 18 May 2010 17:10:47 +0800
Subject: [Biojava-dev] Errors versus Exceptions
In-Reply-To: <99C91E4D-0CEE-4F65-8BCD-184F4011FE8B@ebi.ac.uk>
References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E20C@EXSEEDMD.is.ed.ac.uk> <73D847AF-1C33-4229-BB74-E20C354C13BD@ebi.ac.uk> <99C91E4D-0CEE-4F65-8BCD-184F4011FE8B@ebi.ac.uk>
Message-ID:

Your ability to recover really depends on the context and how much effort you want to make. If you are in a long-running app on a server you should try very hard to recover or at least fail gracefully without crashing. For small programs, who cares?
In interactive contexts you can request feedback from the user. For example, if your FASTA file is not FASTA you could ask for another file.

Clarifying a little more: unchecked exceptions are for things that could easily be avoided, if the API allows for it, by doing simple checks (testing for null, validating input etc). Checked exceptions should be for things that are difficult to check or can be unpredictable; IOException is a good example. An IOException can happen because of something completely out of your control. It needs to be checked because defensive programming is near impossible; you could check that a file exists but a sporadic network failure between calls might make it unreachable. Finally, Errors are for the unrecoverable: things that will probably even bring down a long-running app or server app (or at least require some reconfiguration or redeployment).

I think a well-designed API can make most exceptions unchecked (and avoidable with good programming), with a small number of checked exceptions for things that can't be avoided with defensive programming and a very small number of things that can cause Errors.

- Mark

On Tue, May 18, 2010 at 4:52 PM, Andy Yates wrote:
>
> You've made that previous decision a lot harder now Mark (I knew it was
> over too quickly).
>
> On reflection I'm not sure how well you can recover from something like
> an unknown compound. When you hit that exception what are you going to
> do with it? Something like IOExceptions for a file not being open you
> can do something about that (say it's NFS you could always retry after
> an allotted amount of time has passed). But an unknown compound ... not
> sure how you can deal with that apart from callbacks in the parsers. If
> you think of this in terms of FASTA input into an aligner program and
> you have an unknown compound there all you're going to do is to pass a
> message back to the user saying you didn't understand the input
> sequence with the error message from the Exception something like
> "Compound '?' is not supported". You could always recover attempting to
> move through different CompoundSets seeing if one works but are you
> going to do that?
>
> I guess the thing to do is to start with checked exceptions in
> potentially recoverable situations and if they prove to be too clumsy
> then as BioJava did previously we will switch to the RuntimeException
> as a base class. We should also endeavour to pass on exceptions and
> avoid excessive exception wrapping.
>
> I do think there is a situation where we can relax these rules; I would
> like to see the relaxation of them when a developer is building say a
> DNASequence but only using a String. The same should happen for helper
> classes.
>
> That's my 2p :)
>
> Andy

From ayates at ebi.ac.uk Tue May 18 05:16:32 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 18 May 2010 10:16:32 +0100
Subject: [Biojava-dev] Using DNASequence reverseComplement
In-Reply-To: <428F360A0BA69249B73A710B58435CFE04DD42E20E@EXSEEDMD.is.ed.ac.uk>
References: <428F360A0BA69249B73A710B58435CFE04DD42E20D@EXSEEDMD.is.ed.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E20E@EXSEEDMD.is.ed.ac.uk>
Message-ID: <16ED427D-DD92-4D02-908A-8FF620992440@ebi.ac.uk>

Hey Trevor,

So first things first: have a look at the current SVN status (or GitHub), as I did a commit last night bringing in a lot of bugfixes & throwing more unsupported errors for things I haven't had time to implement (mostly to do with the stranded version of sequence as string). If that doesn't work then let me know asap and I'll take your example as another test case to make sure we don't regress.
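Trevor's example from this thread could be written down as a self-contained check along the following lines. This is only a sketch in plain Java: `ReverseComplementCheck`, `Seq`, `of`, `subSequence`, `reverseComplement` and `asString` are hypothetical stand-ins and are not the BioJava 3 classes. The one design point it illustrates is that the string is always built by reading through the view itself, never through the view's parent.

```java
/** Plain-Java stand-ins for illustration; not the BioJava 3 API. */
class ReverseComplementCheck {

    /** Minimal read-only sequence with 1-based positions. */
    interface Seq {
        int length();
        char baseAt(int position);
    }

    /** Backing store wrapping an immutable String. */
    static Seq of(final String bases) {
        return new Seq() {
            public int length() { return bases.length(); }
            public char baseAt(int position) { return bases.charAt(position - 1); }
        };
    }

    /** View over positions start..end (inclusive) of the parent. */
    static Seq subSequence(final Seq parent, final int start, final int end) {
        if (start < 1 || end > parent.length() || start > end) {
            throw new IndexOutOfBoundsException(start + ".." + end);
        }
        return new Seq() {
            public int length() { return end - start + 1; }
            public char baseAt(int position) { return parent.baseAt(start + position - 1); }
        };
    }

    /** View that reads the parent backwards and complements each base. */
    static Seq reverseComplement(final Seq parent) {
        return new Seq() {
            public int length() { return parent.length(); }
            public char baseAt(int position) {
                return complement(parent.baseAt(parent.length() - position + 1));
            }
        };
    }

    static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default: throw new IllegalArgumentException("Unsupported base: " + base);
        }
    }

    /** Builds the string by reading through the view, not its parent. */
    static String asString(Seq seq) {
        StringBuilder out = new StringBuilder(seq.length());
        for (int i = 1; i <= seq.length(); i++) {
            out.append(seq.baseAt(i));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Seq seq = of("AAAAACCCCGGGTT");
        // Whole reverse complement: AACCCGGGGTTTTT
        System.out.println(asString(reverseComplement(seq)));
        // Reverse complement of bases 11-14: AACC
        System.out.println(asString(reverseComplement(subSequence(seq, 11, 14))));
        // Same fragment in reverse coordinates (1-4 of the reverse complement): AACC
        System.out.println(asString(subSequence(reverseComplement(seq), 1, 4)));
    }
}
```

Because every object here is a view over the same backing store, a change to that store (if it were mutable) would show through in all of them; returning copies instead would decouple them, which is the trade-off Andy Law asks about.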
There are also some new bits of code in there which may help you out. There's a Sequence which will return the same compound for a given length no matter what you call out to (something I thought may be of some use for large runs of Ns) without consuming the space for that. There's also some code in there for working with Edits and for handling translations which don't translate well; a feature of Ensembl databases and something Scooter was after.

It's a pity it needs that protected access, but maybe there is something we can do to help you out and lock down the access for others (some kind of hint if you allow this level of access).

I'm glad that you're happy we can accommodate your needs. If you're happy with what's been done, would you be willing to merge these changes & publish them to github as a fork? If not, fair enough :)

Andy

On 18 May 2010, at 10:00, PATERSON Trevor wrote:

> Hi andy
>
> I know this is all work in progress so I'm not too hung up on everything not working out the box..
>
> What I am doing is stitching together contigs into an assembly, so I have an AssembledDNASequence Object,
> that extends DNASequence but defers its method to an Assembly Object which contains a Map of DNASequences
> with ranges and orientations.
>>> >>> >>> _______________________________________________ >>> biojava-dev mailing list >>> biojava-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> >> -- >> Andrew Yates Ensembl Genomes Engineer >> EMBL-EBI Tel: +44-(0)1223-492538 >> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 >> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ >> >> >> >> >> > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From trevor.paterson at roslin.ed.ac.uk Tue May 18 05:22:39 2010 From: trevor.paterson at roslin.ed.ac.uk (PATERSON Trevor) Date: Tue, 18 May 2010 10:22:39 +0100 Subject: [Biojava-dev] Errors versus Exceptions In-Reply-To: References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E20C@EXSEEDMD.is.ed.ac.uk> <73D847AF-1C33-4229-BB74-E20C354C13BD@ebi.ac.uk> <99C91E4D-0CEE-4F65-8BCD-184F4011FE8B@ebi.ac.uk> Message-ID: <428F360A0BA69249B73A710B58435CFE04DD42E210@EXSEEDMD.is.ed.ac.uk> Yes... But... If our implementation relies on a lazy load underneath every Sequence method implementation, then every method can throw an IO-type Exception. In essence, you can instantiate a valid Sequence object with a lazy-loading Reader before you have the valid sequence data, unlike the case where you can check up front that you can make a valid Sequence object from the Reader. > -----Original Message----- > From: Mark Schreiber [mailto:markjschreiber at gmail.com] > Sent: 18 May 2010 10:11 > To: Andy Yates > Cc: PATERSON Trevor; biojava-dev at lists.open-bio.org > Subject: Re: [Biojava-dev] Errors versus Exceptions > > Your ability to recover really depends on the context and how > much effort you want to make.
If you are in a long running > app on a server you should try very hard to recover or at > least fail gracefully without crashing. For small programs, > who cares? In interactive contexts you can request feedback > from the user. For example, if your FASTA file is not FASTA > you could ask for another file. > > Clarifying a little more, Unchecked exceptions are for things > that could easily be avoided if the API allows for it by > doing simple checks (testing for null, validating input etc). > Checked exceptions should be for things that are difficult to > check or can be unpredictable; IOException is a good example. > An IOException can happen because of something completely out > of your control. It needs to be checked because defensive > programming is near impossible; you could check that a file > exists but a sporadic network failure between calls might > make it unreachable. Finally, Errors are for the > unrecoverable. Things that will probably even bring down a > long running app or server app (or at least require some > reconfiguration or redeployment). > > I think a well designed API can make most exceptions > unchecked (and avoidable with good programming), small > numbers of Checked exceptions for things that can't be > avoided with defensive programming and a very small number of > things that can cause errors. > > - Mark > > On Tue, May 18, 2010 at 4:52 PM, Andy Yates wrote: > > > > You've made that previous decision a lot harder now Mark (I > knew it was over too quickly). > > > > On reflection I'm not sure how well you can recover from > something like an unknown compound. When you hit that > exception what are you going to do with it? Something like > IOExceptions for a file not being open you can do something > about that (say it's NFS you could always retry after an > allotted amount of time has passed). But an unknown compound > ... not sure how you can deal with that apart from callbacks > in the parsers. 
If you think of this in terms of FASTA input > into an aligner program and you have an unknown compound > there all you're going to do is to pass a message back to the > user saying you didn't understand the input sequence with the > error message from the Exception something like "Compound '?' > is not supported". You could always recover attempting to > move through different CompoundSets seeing if one works but > are you going to do that? > > > > I guess the thing to do is to start with checked exceptions > in potentially recoverable situations and if they prove to be > too clumsy then as BioJava did previously we will switch to > the RuntimeException as a base class. We should also > endeavour to pass on exceptions and avoid excessive exception > wrapping. > > > > I do think there is a situation where we can relax these > rules; I would like to see the relaxation of them when a > developer is building say a DNASequence but only using a > String. The same should happen for helper classes. > > > > That's my 2p :) > > > > Andy > > > > > > On 18 May 2010, at 04:53, Mark Schreiber wrote: > > > > > To give some historical perspective. This started happening in > > > BioJava 1.5. We switched BioException to extend RuntimeException > > > (which is not checked). This came from some thinking at the time > > > that Java's exception checking was a little bit broken (as > > > exemplified by BioJava). Checked exceptions are good as > long as you > > > don't swallow them in the API or just use them to spew a stack > > > trace. Spitting a stack trace is pointless as the JVM > will do a much > > > better job of this when it dumps you out. Swallowing them > in the API > > > is stupid because it is the developer who needs to do > something with > > > it. Unless it is truly an Error which is the only time an Error > > > should be used (unrecoverable problems). 
> > > > > > RuntimeExceptions should be used for any type of > exception that can > > > be avoided by defensive programming. NullPointerExceptions and > > > IndexOutOfBounds exceptions are classic examples. There > are hundreds > > > of places in Java where you could get one of these but you can > > > easily avoid them by checking if a collection contains an item, > > > doing simple if( x == null) checks, using array.length > etc. For this > > > reason these and some other Java exceptions extend > RuntimeException > > > and are not checked. Can you imagine what Java would look like if > > > you actually had to put every array access in a try, > catch statement? > > > > > > Therefore, as much as possible I think BioJava3 should have > > > exceptions that extend RuntimeException and provide a defensive > > > mechanism to avoid having them happen. For example in > creating a DNA > > > sequence from text you could provide a validator which > will check if > > > there are any "incorrect" characters. Good code examples > on the wiki > > > should show the use of defensive programming and not just > surround > > > everything with a try catch. Importantly > RuntimeExceptions should > > > be declared so people are aware they may occur. This is > not required > > > by the compiler but it is good practice for documentation > purposes. > > > The Exception class itself should probably contain some Javadoc > > > which explains how it can be avoided in the first place. Checked > > > exceptions are good for cases where you can recover > (although this > > > is often more challenging than people think) but it is > always easier > > > and faster to check yourself, throwing the exception and > generating > > > the stacktrace etc takes quite a lot of effort from the JVM. > > > > > > Finally, I don't think you should be throwing Errors to often. > > > Although Errors are unchecked and give you the same effect as > > > RuntimeExceptions they imply something really bad has gone wrong. 
> > > These should be reserved for things like configuration > files being > > > corrupt or things that would prevent BioJava from being > used, like > > > not finding a required JAR file or plugin. You can't > recover from these. > > > If your API has lots of places where Errors can occur it might > > > indicate poor design. > > > > > > - Mark > > > > > > On Mon, May 17, 2010 at 10:30 PM, Andy Yates > wrote: > > >> > > >> There's something that irks me with checked exceptions & > I found code worked better once I went to a fail-fast method > of coding but for an API I can see the usefulness of them. > > >> > > >> Wow I think this could be the fastest any group has ever > dealt with > > >> the Checked/Unchecked exception argument :) > > >> > > >> Andy > > >> > > >> On 17 May 2010, at 14:58, PATERSON Trevor wrote: > > >> > > >>> > > >>> > > >>>> Is the problem in the code the fact that errors are > used or that > > >>>> they are not declared on the method signatures? > > >>> > > >>> Declaring them would help - but > > >>> Even if you declare Errors in the signature, there is > nothing to force you to catch them. > > >>> So nothing to alert you to the chance that they may be > thrown when you are writing code. > > >>> > > >>> public class Demo { > > >>> > > >>> public static void main(String[] args) throws Exception{ > > >>> > > >>> Demo.demo(); > > >>> System.out.println("never reached"); > > >>> > > >>> } > > >>> > > >>> public static void demo() throws Error { > > >>> throw new Error("kills the runtime"); > > >>> } > > >>> } > > >>> > > >>> Compiles OK with or without declaring the throwable - > but throws a runtime error. > > >>> > > >>> :. IMHO throwing errors is not very useful to developer > > >>> - they can use them - IF they know the code well enough to know > > >>> they might be thrown > > >>> > > >>> If Exceptions were used > > >>> 1. 
they would HAVE to be dealt with - if a developer > just swallows > > >>> something bad, than that's their responsibility for > shitty code ;) > > >>> 2. like Errors they would be extendable, so real case > dataloaders > > >>> could use their own exception types and decide how to > deal with or > > >>> ignore them > > >>> > > >>> I guess in my background of coding for data access over > the wire > > >>> with jdbc and webservices > > >>> - you need to be aware of the real probability of > connection and > > >>> request failures > > >>> - so explicitly using Exceptions forces the developer to code > > >>> defensively > > >>> > > >>> > > >>> > > >>> > > >>>> > > >>>> On a more design note I do not like checked exceptions for two > > >>>> reasons. The first is they seem to make more junior developers > > >>>> catch Exception and swallow it. The second is whenever > I've been > > >>>> Java coding in the past whenever a checked exception > gets thrown > > >>>> (say IOException because of incorrect file > permissions) I cannot > > >>>> deal with it which in the past has meant I either > forward on the > > >>>> problem or re-throw in an unchecked exception. > > >>>> > > >>>> That said as Scooter mentioned the exception system was rushed > > >>>> out in the hackathon and did not have much work put into it. > > >>>> > > >>>> Andy > > >>>> > > >>>> On 17 May 2010, at 13:15, PATERSON Trevor wrote: > > >>>> > > >>>>> resending cos of bad headers > > >>>>> > > >>>>> ________________________________ > > >>>>> > > >>>>> From: PATERSON Trevor > > >>>>> Sent: 17 May 2010 13:02 > > >>>>> To: biojava-dev at lists.open-bio.org > > >>>>> Subject: Errors versus Exceptions > > >>>>> > > >>>>> > > >>>>> Could I ask a quick question about why BJ3 seems to use > > >>>> Errors rather > > >>>>> than Exceptions > > >>>>> - maybe this is already documented somewhere on the wiki > > >>>> for BJ3 or 1 - and you can point me there. 
> > >>>>> > > >>>>> Obviously for us dataloading from remote databases we need > > >>>> to have a method to catch connection,sql, datamapping > errors etc. > > >>>>> We ususally throw DataAccessExceptions when this happens, > > >>>> which wrap any java.net, Ibatis and SQL exceptions. > > >>>>> > > >>>>> It is difficult for us to plug in our lazyload over your > > >>>> signatures in > > >>>>> the BJ3 sequence readers, as these don't throw exceptions, > > >>>> because you > > >>>>> seem to use Errors throughout - which dont need to be > > >>>> declared. Infact I cant actually see examples of you > catching and > > >>>> responding to thrown errors. > > >>>>> > > >>>>> for example the setContents routine of the Sequence Readers > > >>>> throws a > > >>>>> CompoundNotFoundError if there is no mapping for a base - > > >>>> but I can't see what is done with this. > > >>>>> > > >>>>> According to the Java Lang Spec, errors are used for > > >>>> "serious problems > > >>>>> that a reasonable application should not try to catch" - > > >>>> but I would > > >>>>> have thought finding an unrecognized base in a sequence was > > >>>> just the > > >>>>> sort of thing that should be thrown and caught and acted > > >>>> on. As error > > >>>>> throwables are not reported in the signature - developers > > >>>> don't have > > >>>>> any clue ( or requirement) that they should be catching and > > >>>>> dealing with errors - which seems a bit dangerous to me... If > > >>>> Exceptions were > > >>>>> thrownrather than Errors that would force the developer to > > >>>>> handle runtime errors > > >>>>> > > >>>>> Cheers Trevor > > >>>>> > > >>>>> -- > > >>>>> The University of Edinburgh is a charitable body, > registered in > > >>>>> Scotland, with registration number SC005336. 
> > >>>>> > > >>>>> > > >>>>> _______________________________________________ > > >>>>> biojava-dev mailing list > > >>>>> biojava-dev at lists.open-bio.org > > >>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev > > >>>> > > >>>> -- > > >>>> Andrew Yates Ensembl Genomes > Engineer EMBL-EBI > > >>>> Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: > > >>>> +44-(0)1223-494468 Cambridge CB10 1SD, UK > > >>>> http://www.ensemblgenomes.org/ > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>> -- > > >>> The University of Edinburgh is a charitable body, registered in > > >>> Scotland, with registration number SC005336. > > >>> > > >>> > > >>> _______________________________________________ > > >>> biojava-dev mailing list > > >>> biojava-dev at lists.open-bio.org > > >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev > > >> > > >> -- > > >> Andrew Yates Ensembl Genomes Engineer > EMBL-EBI > > >> Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: > > >> +44-(0)1223-494468 Cambridge CB10 1SD, UK > > >> http://www.ensemblgenomes.org/ > > >> > > >> > > >> > > >> > > >> > > >> _______________________________________________ > > >> biojava-dev mailing list > > >> biojava-dev at lists.open-bio.org > > >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > -- > > Andrew Yates Ensembl Genomes Engineer > EMBL-EBI > > Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: > > +44-(0)1223-494468 Cambridge CB10 1SD, UK > > http://www.ensemblgenomes.org/ > > > > > > > > > -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. 
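The trade-off argued over in this thread can be sketched in a few lines of Java. This is only a minimal sketch under stated assumptions: LazyLoadSketch, SequenceLoader, LazySequence and isValidDna are hypothetical names invented for illustration and are not part of the BioJava API. It shows a lazily loaded sequence whose load may fail with an IOException, once exposed as a checked exception and once wrapped in the JDK's unchecked UncheckedIOException, next to a defensive validation check of the kind Mark suggests.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class LazyLoadSketch {

    /** Hypothetical loader; stands in for a database or web-service call. */
    interface SequenceLoader {
        String load() throws IOException;
    }

    /** Hypothetical lazily loaded sequence; NOT a BioJava class. */
    static final class LazySequence {
        private final SequenceLoader loader;
        private String cached; // stays null until the first successful load

        LazySequence(SequenceLoader loader) {
            this.loader = loader;
        }

        /** Checked variant: the compiler forces the caller to handle the failure. */
        String getSequenceAsStringChecked() throws IOException {
            if (cached == null) {
                cached = loader.load();
            }
            return cached;
        }

        /** Unchecked variant: the signature stays clean, the cause is preserved. */
        String getSequenceAsString() {
            try {
                return getSequenceAsStringChecked();
            } catch (IOException e) {
                throw new UncheckedIOException("lazy load failed", e);
            }
        }
    }

    /** Defensive check: lets a caller avoid an exception by validating first. */
    static boolean isValidDna(String s) {
        return s != null && s.matches("[ACGTNacgtn]*");
    }

    public static void main(String[] args) {
        LazySequence ok = new LazySequence(() -> "AAAAACCCCGGGTT");
        System.out.println(ok.getSequenceAsString());

        // Simulated connection failure: the object is valid, the data is not there.
        LazySequence broken = new LazySequence(() -> {
            throw new IOException("connection refused");
        });
        try {
            broken.getSequenceAsString();
        } catch (UncheckedIOException e) {
            System.out.println("recovered: " + e.getCause().getMessage());
        }

        System.out.println(isValidDna("AAXT"));
    }
}
```

Either variant lets a caller recover from a failed load; the difference is only whether the compiler insists on it. An Error thrown at the same point would compile just as silently as the unchecked variant, but conventionally signals that recovery should not be attempted.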
From ayates at ebi.ac.uk Tue May 18 05:23:14 2010 From: ayates at ebi.ac.uk (Andy Yates) Date: Tue, 18 May 2010 10:23:14 +0100 Subject: [Biojava-dev] Using DNASequence reverseComplement In-Reply-To: <3FA5C783-F771-4038-BBD7-232EE2D5BDF8@exseed.ed.ac.uk> References: <428F360A0BA69249B73A710B58435CFE04DD42E20D@EXSEEDMD.is.ed.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E20E@EXSEEDMD.is.ed.ac.uk> <3FA5C783-F771-4038-BBD7-232EE2D5BDF8@exseed.ed.ac.uk> Message-ID: <8F638588-FBDC-4A79-8CC3-0FAB14DE19CD@ebi.ac.uk> At the moment if you ask for these they are returned as views on the original sequence; the idea of this was that Sequence is immutable. If mutability is brought in then that does raise some serious issues for the code but under the current situation then this is a safe & memory efficient way of getting this working. Andy On 18 May 2010, at 10:10, LAW Andrew wrote: > Another philosophical question from the sidelines. > > Does a BJ3 object return a *new* object when asked for a reverseComplement/subSequence/whatever (I suspect from what I have read here and heard from Trevor that it does not)? > > Should it? > > > If I have DNA sequence object A and I ask it for the reverseComplement of the last 16 bases (call it sequence B) and then modify sequence A to change the last base (if that is possible?), what would happen to the sequence B? (And what do we think *should* happen to sequence B)? > > > > On 18 May 2010, at 10:00, PATERSON Trevor wrote: > >> Hi andy >> >> I know this is all work in progress so I'm not too hung up on everything not working out the box.. >> >> What I am doing is stitching together contigs into an assembly, so I have an AssembledDNASequence Object, >> that extends DNASequence but defers its method to an Assembly Object which contains a Map of DNASequences >> with ranges and orientations. 
> -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From markjschreiber at gmail.com Tue May 18 05:23:47 2010 From: markjschreiber at gmail.com (Mark Schreiber) Date: Tue, 18 May 2010 17:23:47 +0800 Subject: [Biojava-dev] Errors versus Exceptions In-Reply-To: <428F360A0BA69249B73A710B58435CFE04DD42E20F@EXSEEDMD.is.ed.ac.uk> References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E20C@EXSEEDMD.is.ed.ac.uk> <73D847AF-1C33-4229-BB74-E20C354C13BD@ebi.ac.uk> <99C91E4D-0CEE-4F65-8BCD-184F4011FE8B@ebi.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E20F@EXSEEDMD.is.ed.ac.uk> Message-ID: The script example is a good one. If you truly want to do something quick and dirty you could use BioJava with Groovy and ignore exceptions completely. This also alludes to a problem with making BioJava both easy to use and safe. I have come to the conclusion that you can't really make a Java API easy for beginners. You can make it safe using all kinds of best practices from Bloch's Effective Java (private setters, no public constructors etc) but that makes it quite a bit harder for people who are newer to Java (and are used to bean-like patterns). Alternatively you can use bean-like patterns (useful in JEE settings) at the great danger that people will corrupt objects using public setters and cause all kinds of concurrency problems with threads. An experienced developer should be able to handle both. Someone new will struggle with both. Languages like Groovy might be best for the casual or new user. Very quick to get up and running and somewhat safe (scripting languages are never really safe). It wouldn't be too hard to make some convenience Groovy objects; you could make a GSequence for example to make use of Groovy's overloaded operators.
Anyhow, making BioJava easier to use was one argument for making exceptions inherit from RuntimeException. Ultimately I don't think we made it easier to use for beginners although it did make code tidier and hopefully forced people to think about defensive code. - Mark On Tue, May 18, 2010 at 5:06 PM, PATERSON Trevor wrote: > > >> On reflection I'm not sure how well you can recover from >> something like an unknown compound. When you hit that >> exception what are you going to do with it? > > I think it depends on what level your application is working at - > if your application is just a script doing some data munging maybe > that is a fatal exception that you want to cause the script to die gracefully... > But if you are running a large graphical display program and you ask for the > sequence of have a particular component - you need to be catching the exception > and handling the failed request > > That's my 2.5p ;) > > Trevor > > > > > > >> -----Original Message----- >> From: Andy Yates [mailto:andyyatz at gmail.com] On Behalf Of Andy Yates >> Sent: 18 May 2010 09:53 >> To: Mark Schreiber >> Cc: PATERSON Trevor; biojava-dev at lists.open-bio.org >> Subject: Re: [Biojava-dev] Errors versus Exceptions >> >> You've made that previous decision a lot harder now Mark (I >> knew it was over too quickly). >> >> On reflection I'm not sure how well you can recover from >> something like an unknown compound. When you hit that >> exception what are you going to do with it? Something like >> IOExceptions for a file not being open you can do something >> about that (say it's NFS you could always retry after an >> allotted amount of time has passed). But an unknown compound >> ... not sure how you can deal with that apart from callbacks >> in the parsers.
If you think of this in terms of FASTA input >> into an aligner program and you have an unknown compound >> there all you're going to do is to pass a message back to the >> user saying you didn't understand the input sequence with the >> error message from the Exception something like "Compound '?' >> is not supported". You could always recover attempting to >> move through different CompoundSets seeing if one works but >> are you going to do that? >> >> I guess the thing to do is to start with checked exceptions >> in potentially recoverable situations and if they prove to be >> too clumsy then as BioJava did previously we will switch to >> the RuntimeException as a base class. We should also >> endeavour to pass on exceptions and avoid excessive exception >> wrapping. >> >> I do think there is a situation where we can relax these >> rules; I would like to see the relaxation of them when a >> developer is building say a DNASequence but only using a >> String. The same should happen for helper classes. >> >> That's my 2p :) >> >> Andy >> >> >> On 18 May 2010, at 04:53, Mark Schreiber wrote: >> >> > To give some historical perspective. This started happening >> in BioJava >> > 1.5. We switched BioException to extend RuntimeException >> (which is not >> > checked). This came from some thinking at the time that Java's >> > exception checking was a little bit broken (as exemplified by >> > BioJava). ?Checked exceptions are good as long as you don't swallow >> > them in the API or just use them to spew a stack trace. Spitting a >> > stack trace is pointless as the JVM will do a much better >> job of this >> > when it dumps you out. Swallowing them in the API is stupid >> because it >> > is the developer who needs to do something with it. Unless >> it is truly >> > an Error which is the only time an Error should be used >> (unrecoverable >> > problems). 
>> > >> > RuntimeExceptions should be used for any type of exception >> that can be >> > avoided by defensive programming. NullPointerExceptions and >> > IndexOutOfBounds exceptions are classic examples. There are >> hundreds >> > of places in Java where you could get one of these but you >> can easily >> > avoid them by checking if a collection contains an item, >> doing simple >> > if( x == null) checks, using array.length etc. For this >> reason these >> > and some other Java exceptions extend RuntimeException and are not >> > checked. Can you imagine what Java would look like if you >> actually had >> > to put every array access in a try, catch statement? >> > >> > Therefore, as much as possible I think BioJava3 should have >> exceptions >> > that extend RuntimeException and provide a defensive mechanism to >> > avoid having them happen. For example in creating a DNA >> sequence from >> > text you could provide a validator which will check if >> there are any >> > "incorrect" characters. Good code examples on the wiki >> should show the >> > use of defensive programming and not just surround >> everything with a >> > try catch. Importantly RuntimeExceptions should be >> declared so people >> > are aware they may occur. This is not required by the >> compiler but it >> > is good practice for documentation purposes. The Exception class >> > itself should probably contain some Javadoc which explains >> how it can >> > be avoided in the first place. Checked exceptions are good >> for cases >> > where you can recover (although this is often more challenging than >> > people think) but it is always easier and faster to check yourself, >> > throwing the exception and generating the stacktrace etc >> takes quite a >> > lot of effort from the JVM. >> > >> > Finally, I don't think you should be throwing Errors too often. >> > Although Errors are unchecked and give you the same effect as >> > RuntimeExceptions they imply something really bad has gone wrong.
>> > These should be reserved for things like configuration files being >> > corrupt or things that would prevent BioJava from being >> used, like not >> > finding a required JAR file or plugin. You can't recover from these. >> > If your API has lots of places where Errors can occur it might >> > indicate poor design. >> > >> > - Mark >> > >> > On Mon, May 17, 2010 at 10:30 PM, Andy Yates >> wrote: >> >> >> >> There's something that irks me with checked exceptions & I >> found code worked better once I went to a fail-fast method of >> coding but for an API I can see the usefulness of them. >> >> >> >> Wow I think this could be the fastest any group has ever >> dealt with >> >> the Checked/Unchecked exception argument :) >> >> >> >> Andy >> >> >> >> On 17 May 2010, at 14:58, PATERSON Trevor wrote: >> >> >> >>> >> >>> >> >>>> Is the problem in the code the fact that errors are used or that >> >>>> they are not declared on the method signatures? >> >>> >> >>> Declaring them would help - but >> >>> Even if you declare Errors in the signature, there is >> nothing to force you to catch them. >> >>> So nothing to alert you to the chance that they may be >> thrown when you are writing code.
>> >>>
>> >>> public class Demo {
>> >>>
>> >>>     public static void main(String[] args) throws Exception {
>> >>>
>> >>>         Demo.demo();
>> >>>         System.out.println("never reached");
>> >>>
>> >>>     }
>> >>>
>> >>>     public static void demo() throws Error {
>> >>>         throw new Error("kills the runtime");
>> >>>     }
>> >>> }
>> >>>
>> >>> Compiles OK with or without declaring the throwable - but >> throws a runtime error. >> >>> >> >>> :. IMHO throwing errors is not very useful to developer >> >>> - they can use them - IF they know the code well enough >> to know they >> >>> might be thrown >> >>> >> >>> If Exceptions were used >> >>> 1.
they would HAVE to be dealt with - if a developer just >> swallows >> >>> something bad, than that's their responsibility for >> shitty code ;) >> >>> 2. like Errors they would be extendable, so real case dataloaders >> >>> could use their own exception types and decide how to >> deal with or >> >>> ignore them >> >>> >> >>> I guess in my background of coding for data access over the wire >> >>> with jdbc and webservices >> >>> - you need to be aware of the real probability of connection and >> >>> request failures >> >>> - so explicitly using Exceptions forces the developer to code >> >>> defensively >> >>> >> >>> >> >>> >> >>> >> >>>> >> >>>> On a more design note I do not like checked exceptions for two >> >>>> reasons. The first is they seem to make more junior developers >> >>>> catch Exception and swallow it. The second is whenever I've been >> >>>> Java coding in the past whenever a checked exception gets thrown >> >>>> (say IOException because of incorrect file permissions) I cannot >> >>>> deal with it which in the past has meant I either forward on the >> >>>> problem or re-throw in an unchecked exception. >> >>>> >> >>>> That said as Scooter mentioned the exception system was >> rushed out >> >>>> in the hackathon and did not have much work put into it. >> >>>> >> >>>> Andy >> >>>> >> >>>> On 17 May 2010, at 13:15, PATERSON Trevor wrote: >> >>>> >> >>>>> resending cos of bad headers >> >>>>> >> >>>>> ________________________________ >> >>>>> >> >>>>> From: PATERSON Trevor >> >>>>> Sent: 17 May 2010 13:02 >> >>>>> To: biojava-dev at lists.open-bio.org >> >>>>> Subject: Errors versus Exceptions >> >>>>> >> >>>>> >> >>>>> Could I ask a quick question about why BJ3 seems to use >> >>>> Errors rather >> >>>>> than Exceptions >> >>>>> - maybe this is already documented somewhere on the wiki >> >>>> for BJ3 or 1 - and you can point me there. 
>> >>>>> >> >>>>> Obviously for us dataloading from remote databases we need >> >>>> to have a method to catch connection, SQL, datamapping errors etc. >> >>>>> We usually throw DataAccessExceptions when this happens, >> >>>> which wrap any java.net, Ibatis and SQL exceptions. >> >>>>> >> >>>>> It is difficult for us to plug in our lazyload over your >> >>>> signatures in >> >>>>> the BJ3 sequence readers, as these don't throw exceptions, >> >>>> because you >> >>>>> seem to use Errors throughout - which don't need to be >> >>>> declared. In fact I can't actually see examples of you >> catching and >> >>>> responding to thrown errors. >> >>>>> >> >>>>> For example the setContents routine of the Sequence Readers >> >>>> throws a >> >>>>> CompoundNotFoundError if there is no mapping for a base - >> >>>> but I can't see what is done with this. >> >>>>> >> >>>>> According to the Java Lang Spec, errors are used for >> >>>> "serious problems >> >>>>> that a reasonable application should not try to catch" - >> >>>> but I would >> >>>>> have thought finding an unrecognized base in a sequence was >> >>>> just the >> >>>>> sort of thing that should be thrown and caught and acted >> >>>> on. As error >> >>>>> throwables are not reported in the signature - developers >> >>>> don't have >> >>>>> any clue (or requirement) that they should be catching and >> >>>>> dealing with errors - which seems a bit dangerous to me... If >> >>>> Exceptions were >> >>>>> thrown rather than Errors that would force the developer >> to handle >> >>>>> runtime errors >> >>>>> >> >>>>> Cheers Trevor >> >>>>> >> >>>>> -- >> >>>>> The University of Edinburgh is a charitable body, registered in >> >>>>> Scotland, with registration number SC005336. >> >>>>> >> >>>>> >> >>>>> _______________________________________________ >> >>>>> biojava-dev mailing list >> >>>>> biojava-dev at lists.open-bio.org >> >>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> >>>> >> >>>> -- >> >>>> Andrew Yates
Ensembl Genomes Engineer >> >>>> EMBL-EBI Tel: +44-(0)1223-492538 >> >>>> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 >> >>>> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> >>> -- >> >>> The University of Edinburgh is a charitable body, registered in >> >>> Scotland, with registration number SC005336. >> >>> >> >>> >> >>> _______________________________________________ >> >>> biojava-dev mailing list >> >>> biojava-dev at lists.open-bio.org >> >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> >> >> >> -- >> >> Andrew Yates Ensembl Genomes Engineer >> >> EMBL-EBI Tel: +44-(0)1223-492538 >> >> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 >> >> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ >> >> >> >> >> >> >> >> >> >> >> >> _______________________________________________ >> >> biojava-dev mailing list >> >> biojava-dev at lists.open-bio.org >> >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> >> -- >> Andrew Yates Ensembl Genomes Engineer >> EMBL-EBI Tel: +44-(0)1223-492538 >> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 >> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ >> >> >> >> >> > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336.
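A minimal sketch of the validator idea raised in this thread: an unchecked exception that is declared and documented, paired with a cheap check that lets callers avoid it altogether. `UnknownCompoundException`, `isValidDna` and `parseDna` are hypothetical names for illustration, not BioJava classes, and the ACGT-only alphabet is an assumption.

```java
public class ValidatorSketch {

    /** Hypothetical unchecked exception; not a BioJava class. */
    static class UnknownCompoundException extends RuntimeException {
        UnknownCompoundException(String message) {
            super(message);
        }
    }

    // Assumed alphabet for the sketch: unambiguous DNA bases only.
    private static final String DNA_BASES = "ACGT";

    /** Defensive check: callers can test input first and never trigger the exception. */
    static boolean isValidDna(String text) {
        if (text == null) {
            return false;
        }
        for (int i = 0; i < text.length(); i++) {
            if (DNA_BASES.indexOf(Character.toUpperCase(text.charAt(i))) < 0) {
                return false;
            }
        }
        return true;
    }

    /**
     * The throws clause is for documentation only; the compiler does not enforce it.
     * Avoid the exception by calling isValidDna(text) first.
     *
     * @throws UnknownCompoundException if text is null or contains a character outside ACGT
     */
    static String parseDna(String text) throws UnknownCompoundException {
        if (!isValidDna(text)) {
            throw new UnknownCompoundException("Input is not plain DNA: " + text);
        }
        return text.toUpperCase();
    }

    public static void main(String[] args) {
        String input = "ACGT?";
        if (isValidDna(input)) {
            System.out.println(parseDna(input));
        } else {
            System.out.println("Rejected input without throwing: " + input);
        }
    }
}
```

A caller that skips the check still gets a descriptive exception, while a caller that validates first never pays for a stack trace.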
> > From ayates at ebi.ac.uk Tue May 18 05:27:38 2010 From: ayates at ebi.ac.uk (Andy Yates) Date: Tue, 18 May 2010 10:27:38 +0100 Subject: [Biojava-dev] Errors versus Exceptions In-Reply-To: <428F360A0BA69249B73A710B58435CFE04DD42E210@EXSEEDMD.is.ed.ac.uk> References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E20C@EXSEEDMD.is.ed.ac.uk> <73D847AF-1C33-4229-BB74-E20C354C13BD@ebi.ac.uk> <99C91E4D-0CEE-4F65-8BCD-184F4011FE8B@ebi.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E210@EXSEEDMD.is.ed.ac.uk> Message-ID: <1A231587-4E8B-4C41-8297-CE0C9F963F9B@ebi.ac.uk> Doesn't that bring it back to unchecked exceptions? On 18 May 2010, at 10:22, PATERSON Trevor wrote: > Yes... But... > > If our implementation relies on a lazy load underneath every Sequence method implementation.. > > Then every method can throw an IO type Exception > > In essence you can instantiate a valid Sequence object with a LazyLoading Reader before you get the valid SequenceData, > unlike the case where you can check that you can make a valid Sequence object from the Reader. > > >> -----Original Message----- >> From: Mark Schreiber [mailto:markjschreiber at gmail.com] >> Sent: 18 May 2010 10:11 >> To: Andy Yates >> Cc: PATERSON Trevor; biojava-dev at lists.open-bio.org >> Subject: Re: [Biojava-dev] Errors versus Exceptions >> >> Your ability to recover really depends on the context and how >> much effort you want to make. If you are in a long running >> app on a server you should try very hard to recover or at >> least fail gracefully without crashing. For small programs, >> who cares? In interactive contexts you can request feedback >> from the user. For example, if your FASTA file is not FASTA >> you could ask for another file. >> >> Clarifying a little more, Unchecked exceptions are for things >> that could easily be avoided if the API allows for it by >> doing simple checks (testing for null, validating input etc). 
>> Checked exceptions should be for things that are difficult to >> check or can be unpredictable; IOException is a good example. >> An IOException can happen because of something completely out >> of your control. It needs to be checked because defensive >> programming is near impossible; you could check that a file >> exists but a sporadic network failure between calls might >> make it unreachable. Finally, Errors are for the >> unrecoverable. Things that will probably even bring down a >> long running app or server app (or at least require some >> reconfiguration or redeployment). >> >> I think a well designed API can make most exceptions >> unchecked (and avoidable with good programming), small >> numbers of Checked exceptions for things that can't be >> avoided with defensive programming and a very small number of >> things that can cause errors. >> >> - Mark >> >> On Tue, May 18, 2010 at 4:52 PM, Andy Yates wrote: >>> >>> You've made that previous decision a lot harder now Mark (I >> knew it was over too quickly). >>> >>> On reflection I'm not sure how well you can recover from >> something like an unknown compound. When you hit that >> exception what are you going to do with it? Something like >> IOExceptions for a file not being open you can do something >> about that (say it's NFS you could always retry after an >> allotted amount of time has passed). But an unknown compound >> ... not sure how you can deal with that apart from callbacks >> in the parsers. If you think of this in terms of FASTA input >> into an aligner program and you have an unknown compound >> there all you're going to do is to pass a message back to the >> user saying you didn't understand the input sequence with the >> error message from the Exception something like "Compound '?' >> is not supported". You could always recover attempting to >> move through different CompoundSets seeing if one works but >> are you going to do that? 
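The recovery strategy floated above, moving through different compound sets until one accepts the input, might look roughly like this. The alphabets here are plain strings invented for the sketch and stand in for BioJava's CompoundSets; none of these names come from the BioJava API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class CompoundSetFallback {

    /** Returns the name of the first alphabet that covers every character, if any. */
    static Optional<String> firstMatchingAlphabet(String sequence, Map<String, String> alphabets) {
        for (Map.Entry<String, String> entry : alphabets.entrySet()) {
            if (coveredBy(sequence, entry.getValue())) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }

    /** True when every character of the sequence appears in the allowed string. */
    static boolean coveredBy(String sequence, String allowed) {
        for (int i = 0; i < sequence.length(); i++) {
            if (allowed.indexOf(sequence.charAt(i)) < 0) {
                return false;
            }
        }
        return true;
    }

    /** Hypothetical alphabets, tried in insertion order (strictest first). */
    static Map<String, String> defaultAlphabets() {
        Map<String, String> alphabets = new LinkedHashMap<>();
        alphabets.put("DNA", "ACGT");
        alphabets.put("ambiguous DNA", "ACGTRYSWKMBDHVN");
        alphabets.put("protein", "ACDEFGHIKLMNPQRSTVWY");
        return alphabets;
    }

    public static void main(String[] args) {
        for (String input : new String[] {"ACGT", "ACGN", "MKL", "AC?T"}) {
            String result = firstMatchingAlphabet(input, defaultAlphabets())
                    .map(name -> "parsed as " + name)
                    .orElse("compound not supported in any known alphabet");
            System.out.println(input + ": " + result);
        }
    }
}
```

One caveat with this kind of guessing: a short protein fragment such as "MKV" would be accepted as ambiguous DNA here, because M, K and V are all IUPAC nucleotide codes, so falling through alphabets may silently pick the wrong one.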
> -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From markjschreiber at gmail.com Tue May 18 05:28:24 2010 From: markjschreiber at gmail.com (Mark Schreiber) Date: Tue, 18 May 2010 17:28:24 +0800 Subject: [Biojava-dev] Errors versus Exceptions In-Reply-To: <428F360A0BA69249B73A710B58435CFE04DD42E210@EXSEEDMD.is.ed.ac.uk> References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E20C@EXSEEDMD.is.ed.ac.uk> <73D847AF-1C33-4229-BB74-E20C354C13BD@ebi.ac.uk> <99C91E4D-0CEE-4F65-8BCD-184F4011FE8B@ebi.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E210@EXSEEDMD.is.ed.ac.uk> Message-ID: Yikes! OK, in that case you will need to declare IOExceptions in lots of places. In this day and age I would not advise making your lazy loading dependent on a file system. You will need to allow for distributed FS's, databases, AmazonS3, distributed memory caches. Could turn into a bit of a headache - Mark On Tue, May 18, 2010 at 5:22 PM, PATERSON Trevor wrote: > Yes... But... > > If our implementation relies on a lazy load underneath every Sequence method implementation.. > > Then every method can throw an IO type Exception > > In essence you can instantiate a valid Sequence object with a LazyLoading Reader before you get the valid SequenceData, > unlike the case where you can check that you can make a valid Sequence object from the Reader. > > >> -----Original Message----- >> From: Mark Schreiber [mailto:markjschreiber at gmail.com] >> Sent: 18 May 2010 10:11 >> To: Andy Yates >> Cc: PATERSON Trevor; biojava-dev at lists.open-bio.org >> Subject: Re: [Biojava-dev] Errors versus Exceptions >> >> Your ability to recover really depends on the context and how >> much effort you want to make. 
If you are in a long running >> app on a server you should try very hard to recover or at >> least fail gracefully without crashing. For small programs, >> who cares? In interactive contexts you can request feedback >> from the user. For example, if your FASTA file is not FASTA >> you could ask for another file. >> >> Clarifying a little more, Unchecked exceptions are for things >> that could easily be avoided if the API allows for it by >> doing simple checks (testing for null, validating input etc). >> Checked exceptions should be for things that are difficult to >> check or can be unpredictable; IOException is a good example. >> An IOException can happen because of something completely out >> of your control. It needs to be checked because defensive >> programming is near impossible; you could check that a file >> exists but a sporadic network failure between calls might >> make it unreachable. ?Finally, Errors are for the >> unrecoverable. Things that will probably even bring down a >> long running app or server app (or at least require some >> reconfiguration or redeployment). >> >> I think a well designed API can make most exceptions >> unchecked (and avoidable with good programming), small >> numbers of Checked exceptions for things that can't be >> avoided with defensive programming and a very small number of >> things that can cause errors. >> >> - Mark >> >> On Tue, May 18, 2010 at 4:52 PM, Andy Yates wrote: >> > >> > You've made that previous decision a lot harder now Mark (I >> knew it was over too quickly). >> > >> > On reflection I'm not sure how well you can recover from >> something like an unknown compound. When you hit that >> exception what are you going to do with it? Something like >> IOExceptions for a file not being open you can do something >> about that (say it's NFS you could always retry after an >> allotted amount of time has passed). But an unknown compound >> ... 
not sure how you can deal with that apart from callbacks >> in the parsers. If you think of this in terms of FASTA input >> into an aligner program and you have an unknown compound >> there all you're going to do is to pass a message back to the >> user saying you didn't understand the input sequence with the >> error message from the Exception something like "Compound '?' >> is not supported". You could always recover attempting to >> move through different CompoundSets seeing if one works but >> are you going to do that? >> > >> > I guess the thing to do is to start with checked exceptions >> in potentially recoverable situations and if they prove to be >> too clumsy then as BioJava did previously we will switch to >> the RuntimeException as a base class. We should also >> endeavour to pass on exceptions and avoid excessive exception >> wrapping. >> > >> > I do think there is a situation where we can relax these >> rules; I would like to see the relaxation of them when a >> developer is building say a DNASequence but only using a >> String. The same should happen for helper classes. >> > >> > That's my 2p :) >> > >> > Andy >> > >> > >> > On 18 May 2010, at 04:53, Mark Schreiber wrote: >> > >> > > To give some historical perspective. This started happening in >> > > BioJava 1.5. We switched BioException to extend RuntimeException >> > > (which is not checked). This came from some thinking at the time >> > > that Java's exception checking was a little bit broken (as >> > > exemplified by BioJava). ?Checked exceptions are good as >> long as you >> > > don't swallow them in the API or just use them to spew a stack >> > > trace. Spitting a stack trace is pointless as the JVM >> will do a much >> > > better job of this when it dumps you out. Swallowing them >> in the API >> > > is stupid because it is the developer who needs to do >> something with >> > > it. Unless it is truly an Error which is the only time an Error >> > > should be used (unrecoverable problems). 
>> > > >> > > RuntimeExceptions should be used for any type of >> exception that can >> > > be avoided by defensive programming. NullPointerExceptions and >> > > IndexOutOfBounds exceptions are classic examples. There >> are hundreds >> > > of places in Java where you could get one of these but you can >> > > easily avoid them by checking if a collection contains an item, >> > > doing simple if( x == null) checks, using array.length >> etc. For this >> > > reason these and some other Java exceptions extend >> RuntimeException >> > > and are not checked. Can you imagine what Java would look like if >> > > you actually had to put every array access in a try, >> catch statement? >> > > >> > > Therefore, as much as possible I think BioJava3 should have >> > > exceptions that extend RuntimeException and provide a defensive >> > > mechanism to avoid having them happen. For example in >> creating a DNA >> > > sequence from text you could provide a validator which >> will check if >> > > there are any "incorrect" characters. Good code examples >> on the wiki >> > > should show the use of defensive programming and not just >> surround >> > > everything with a try catch. ?Importantly >> RuntimeExceptions should >> > > be declared so people are aware they may occur. This is >> not required >> > > by the compiler but it is good practice for documentation >> purposes. >> > > The Exception class itself should probably contain some Javadoc >> > > which explains how it can be avoided in the first place. Checked >> > > exceptions are good for cases where you can recover >> (although this >> > > is often more challenging than people think) but it is >> always easier >> > > and faster to check yourself, throwing the exception and >> generating >> > > the stacktrace etc takes quite a lot of effort from the JVM. >> > > >> > > Finally, I don't think you should be throwing Errors to often. 
>> > > Although Errors are unchecked and give you the same effect as >> > > RuntimeExceptions they imply something really bad has gone wrong. >> > > These should be reserved for things like configuration >> files being >> > > corrupt or things that would prevent BioJava from being >> used, like >> > > not finding a required JAR file or plugin. You can't >> recover from these. >> > > If your API has lots of places where Errors can occur it might >> > > indicate poor design. >> > > >> > > - Mark >> > > >> > > On Mon, May 17, 2010 at 10:30 PM, Andy Yates >> wrote: >> > >> >> > >> There's something that irks me with checked exceptions & >> I found code worked better once I went to a fail-fast method >> of coding but for an API I can see the usefulness of them. >> > >> >> > >> Wow I think this could be the fastest any group has ever >> dealt with >> > >> the Checked/Unchecked exception argument :) >> > >> >> > >> Andy >> > >> >> > >> On 17 May 2010, at 14:58, PATERSON Trevor wrote: >> > >> >> > >>> >> > >>> >> > >>>> Is the problem in the code the fact that errors are >> used or that >> > >>>> they are not declared on the method signatures? >> > >>> >> > >>> Declaring them would help - but >> > >>> Even if you declare Errors in the signature, there is >> nothing to force you to catch them. >> > >>> So nothing to alert you to the chance that they may be >> thrown when you are writing code. >> > >>> >> > >>> public class Demo { >> > >>> >> > >>> ? ? ? public ?static void main(String[] args) throws Exception{ >> > >>> >> > >>> ? ? ? Demo.demo(); >> > >>> ? ? ? ? ? ? ? System.out.println("never reached"); >> > >>> >> > >>> ? ? ? } >> > >>> >> > >>> ? ? ? public static void demo() throws Error { >> > >>> ? ? ? ? ? ? ? throw new Error("kills the runtime"); >> > >>> ? ? ? } >> > >>> } >> > >>> >> > >>> Compiles OK with or without declaring the throwable - >> but throws a runtime error. >> > >>> >> > >>> :. 
IMHO throwing errors is not very useful to the developer >> > >>> - they can use them - IF they know the code well enough to know >> > >>> they might be thrown >> > >>> >> > >>> If Exceptions were used >> > >>> 1. they would HAVE to be dealt with - if a developer >> just swallows >> > >>> something bad, then that's their responsibility for >> shitty code ;) >> > >>> 2. like Errors they would be extendable, so real case >> dataloaders >> > >>> could use their own exception types and decide how to >> deal with or >> > >>> ignore them >> > >>> >> > >>> I guess in my background of coding for data access over >> the wire >> > >>> with jdbc and webservices >> > >>> - you need to be aware of the real probability of >> connection and >> > >>> request failures >> > >>> - so explicitly using Exceptions forces the developer to code >> > >>> defensively >> > >>> >> > >>> >> > >>> >> > >>> >> > >>>> >> > >>>> On a more design note I do not like checked exceptions for two >> > >>>> reasons. The first is they seem to make more junior developers >> > >>>> catch Exception and swallow it. The second is whenever >> I've been >> > >>>> Java coding in the past whenever a checked exception >> gets thrown >> > >>>> (say IOException because of incorrect file >> permissions) I cannot >> > >>>> deal with it which in the past has meant I either >> forward on the >> > >>>> problem or re-throw in an unchecked exception. >> > >>>> >> > >>>> That said as Scooter mentioned the exception system was rushed >> > >>>> out in the hackathon and did not have much work put into it.
>> > >>>> >> > >>>> Andy >> > >>>> >> > >>>> On 17 May 2010, at 13:15, PATERSON Trevor wrote: >> > >>>> >> > >>>>> resending cos of bad headers >> > >>>>> >> > >>>>> ________________________________ >> > >>>>> >> > >>>>> From: PATERSON Trevor >> > >>>>> Sent: 17 May 2010 13:02 >> > >>>>> To: biojava-dev at lists.open-bio.org >> > >>>>> Subject: Errors versus Exceptions >> > >>>>> >> > >>>>> >> > >>>>> Could I ask a quick question about why BJ3 seems to use >> > >>>> Errors rather >> > >>>>> than Exceptions >> > >>>>> - maybe this is already documented somewhere on the wiki >> > >>>> for BJ3 or 1 - and you can point me there. >> > >>>>> >> > >>>>> Obviously for us dataloading from remote databases we need >> > >>>> to have a method to catch connection,sql, datamapping >> errors etc. >> > >>>>> We usually throw DataAccessExceptions when this happens, >> > >>>> which wrap any java.net, Ibatis and SQL exceptions. >> > >>>>> >> > >>>>> It is difficult for us to plug in our lazyload over your >> > >>>> signatures in >> > >>>>> the BJ3 sequence readers, as these don't throw exceptions, >> > >>>> because you >> > >>>>> seem to use Errors throughout - which don't need to be >> > >>>> declared. In fact I can't actually see examples of you >> catching and >> > >>>> responding to thrown errors. >> > >>>>> >> > >>>>> for example the setContents routine of the Sequence Readers >> > >>>> throws a >> > >>>>> CompoundNotFoundError if there is no mapping for a base - >> > >>>> but I can't see what is done with this. >> > >>>>> >> > >>>>> According to the Java Lang Spec, errors are used for >> > >>>> "serious problems >> > >>>>> that a reasonable application should not try to catch" - >> > >>>> but I would >> > >>>>> have thought finding an unrecognized base in a sequence was >> > >>>> just the >> > >>>>> sort of thing that should be thrown and caught and acted >> > >>>> on.
As error >> > >>>>> throwables are not reported in the signature - developers >> > >>>> don't have >> > >>>>> any clue (or requirement) that they should be catching and >> > >>>>> dealing with errors - which seems a bit dangerous to me... If >> > >>>> Exceptions were >> > >>>>> thrown rather than Errors that would force the developer to >> > >>>>> handle runtime errors >> > >>>>> >> > >>>>> Cheers Trevor >> > >>>>> >> > >>>>> -- >> > >>>>> The University of Edinburgh is a charitable body, >> registered in >> > >>>>> Scotland, with registration number SC005336. >> > >>>>> >> > >>>>> >> > >>>>> _______________________________________________ >> > >>>>> biojava-dev mailing list >> > >>>>> biojava-dev at lists.open-bio.org >> > >>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> > >>>> >> > >>>> -- >> > >>>> Andrew Yates Ensembl Genomes >> Engineer EMBL-EBI >> > >>>> Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: >> > >>>> +44-(0)1223-494468 Cambridge CB10 1SD, UK >> > >>>> http://www.ensemblgenomes.org/ >> > >>>> >> > >>>> >> > >>>> >> > >>>> >> > >>>> >> > >>> -- >> > >>> The University of Edinburgh is a charitable body, registered in >> > >>> Scotland, with registration number SC005336. >> > >>> >> > >>> >> > >>> _______________________________________________ >> > >>> biojava-dev mailing list >> > >>> biojava-dev at lists.open-bio.org >> > >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> > >> >> > >> -- >> > >> Andrew Yates Ensembl Genomes Engineer >> EMBL-EBI >> > >> Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: >> > >> +44-(0)1223-494468 Cambridge CB10 1SD, UK >> > >> http://www.ensemblgenomes.org/ >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> _______________________________________________ >> > >> biojava-dev mailing list >> > >> biojava-dev at lists.open-bio.org >> > >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> > >> > -- >> > Andrew Yates
Ensembl Genomes Engineer >> EMBL-EBI >> > Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: >> > +44-(0)1223-494468 Cambridge CB10 1SD, UK >> > http://www.ensemblgenomes.org/ >> > >> > >> > >> > >> > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > > From trevor.paterson at roslin.ed.ac.uk Tue May 18 05:31:34 2010 From: trevor.paterson at roslin.ed.ac.uk (PATERSON Trevor) Date: Tue, 18 May 2010 10:31:34 +0100 Subject: [Biojava-dev] Errors versus Exceptions In-Reply-To: References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E20C@EXSEEDMD.is.ed.ac.uk> <73D847AF-1C33-4229-BB74-E20C354C13BD@ebi.ac.uk> <99C91E4D-0CEE-4F65-8BCD-184F4011FE8B@ebi.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E210@EXSEEDMD.is.ed.ac.uk> Message-ID: <428F360A0BA69249B73A710B58435CFE04DD42E211@EXSEEDMD.is.ed.ac.uk> In our use case the lazy loading is dependent on remote Ensembl MySQL access - through Ibatis ;) It is already a headache :( > -----Original Message----- > From: Mark Schreiber [mailto:markjschreiber at gmail.com] > Sent: 18 May 2010 10:28 > To: PATERSON Trevor > Cc: Andy Yates; biojava-dev at lists.open-bio.org > Subject: Re: [Biojava-dev] Errors versus Exceptions > > Yikes! OK, in that case you will need to declare IOExceptions > in lots of places. In this day and age I would not advise > making your lazy loading dependent on a file system. You will > need to allow for distributed FS's, databases, Amazon S3, > distributed memory caches. > Could turn into a bit of a headache. > > - Mark > > On Tue, May 18, 2010 at 5:22 PM, PATERSON Trevor > wrote: > > Yes... But... > > > > If our implementation relies on a lazy load underneath > every Sequence method implementation..
> > > > Then every method can throw an IO type Exception > > > > In essence you can instantiate a valid Sequence object with a > > LazyLoading Reader before you get the valid SequenceData, > unlike the case where you can check that you can make a valid > Sequence object from the Reader. > > > > > >> -----Original Message----- > >> From: Mark Schreiber [mailto:markjschreiber at gmail.com] > >> Sent: 18 May 2010 10:11 > >> To: Andy Yates > >> Cc: PATERSON Trevor; biojava-dev at lists.open-bio.org > >> Subject: Re: [Biojava-dev] Errors versus Exceptions > >> > >> Your ability to recover really depends on the context and how much > >> effort you want to make. If you are in a long running app > on a server > >> you should try very hard to recover or at least fail gracefully > >> without crashing. For small programs, who cares? In interactive > >> contexts you can request feedback from the user. For > example, if your > >> FASTA file is not FASTA you could ask for another file. > >> > >> Clarifying a little more, Unchecked exceptions are for things that > >> could easily be avoided if the API allows for it by doing simple > >> checks (testing for null, validating input etc). > >> Checked exceptions should be for things that are difficult > to check > >> or can be unpredictable; IOException is a good example. > >> An IOException can happen because of something completely > out of your > >> control. It needs to be checked because defensive > programming is near > >> impossible; you could check that a file exists but a > sporadic network > >> failure between calls might make it unreachable. Finally, > Errors are > >> for the unrecoverable. Things that will probably even bring down a > >> long running app or server app (or at least require some > >> reconfiguration or redeployment). 
> >> > >> I think a well designed API can make most exceptions > unchecked (and > >> avoidable with good programming), small numbers of Checked > exceptions > >> for things that can't be avoided with defensive programming and a > >> very small number of things that can cause errors. > >> > >> - Mark > >> > >> On Tue, May 18, 2010 at 4:52 PM, Andy Yates > wrote: > >> > > >> > You've made that previous decision a lot harder now Mark (I > >> knew it was over too quickly). > >> > > >> > On reflection I'm not sure how well you can recover from > >> something like an unknown compound. When you hit that > exception what > >> are you going to do with it? Something like IOExceptions > for a file > >> not being open you can do something about that (say it's NFS you > >> could always retry after an allotted amount of time has > passed). But > >> an unknown compound ... not sure how you can deal with that apart > >> from callbacks in the parsers. If you think of this in > terms of FASTA > >> input into an aligner program and you have an unknown > compound there > >> all you're going to do is to pass a message back to the > user saying > >> you didn't understand the input sequence with the error > message from > >> the Exception something like "Compound '?' > >> is not supported". You could always recover attempting to move > >> through different CompoundSets seeing if one works but are > you going > >> to do that? > >> > > >> > I guess the thing to do is to start with checked exceptions > >> in potentially recoverable situations and if they prove to be too > >> clumsy then as BioJava did previously we will switch to the > >> RuntimeException as a base class. We should also endeavour > to pass on > >> exceptions and avoid excessive exception wrapping. > >> > > >> > I do think there is a situation where we can relax these > >> rules; I would like to see the relaxation of them when a > developer is > >> building say a DNASequence but only using a String. 
The > same should > >> happen for helper classes. > >> > > >> > That's my 2p :) > >> > > >> > Andy > >> > > >> > > >> > On 18 May 2010, at 04:53, Mark Schreiber wrote: > >> > > >> > > To give some historical perspective. This started happening in > >> > > BioJava 1.5. We switched BioException to extend > RuntimeException > >> > > (which is not checked). This came from some thinking > at the time > >> > > that Java's exception checking was a little bit broken (as > >> > > exemplified by BioJava). Checked exceptions are good as > >> long as you > >> > > don't swallow them in the API or just use them to spew a stack > >> > > trace. Spitting a stack trace is pointless as the JVM > >> will do a much > >> > > better job of this when it dumps you out. Swallowing them > >> in the API > >> > > is stupid because it is the developer who needs to do > >> something with > >> > > it. Unless it is truly an Error which is the only time > an Error > >> > > should be used (unrecoverable problems). > >> > > > >> > > RuntimeExceptions should be used for any type of > >> exception that can > >> > > be avoided by defensive programming. NullPointerExceptions and > >> > > IndexOutOfBounds exceptions are classic examples. There > >> are hundreds > >> > > of places in Java where you could get one of these but you can > >> > > easily avoid them by checking if a collection contains > an item, > >> > > doing simple if( x == null) checks, using array.length > >> etc. For this > >> > > reason these and some other Java exceptions extend > >> RuntimeException > >> > > and are not checked. Can you imagine what Java would > look like if > >> > > you actually had to put every array access in a try, > >> catch statement? > >> > > > >> > > Therefore, as much as possible I think BioJava3 should have > >> > > exceptions that extend RuntimeException and provide a > defensive > >> > > mechanism to avoid having them happen. 
For example in > >> creating a DNA > >> > > sequence from text you could provide a validator which > >> will check if > >> > > there are any "incorrect" characters. Good code examples > >> on the wiki > >> > > should show the use of defensive programming and not just > >> surround > >> > > everything with a try catch. Importantly > >> RuntimeExceptions should > >> > > be declared so people are aware they may occur. This is > >> not required > >> > > by the compiler but it is good practice for documentation > >> purposes. > >> > > The Exception class itself should probably contain > some Javadoc > >> > > which explains how it can be avoided in the first > place. Checked > >> > > exceptions are good for cases where you can recover > >> (although this > >> > > is often more challenging than people think) but it is > >> always easier > >> > > and faster to check yourself, throwing the exception and > >> generating > >> > > the stacktrace etc takes quite a lot of effort from the JVM. > >> > > > >> > > Finally, I don't think you should be throwing Errors too often. > >> > > Although Errors are unchecked and give you the same effect as > >> > > RuntimeExceptions they imply something really bad has > gone wrong. > >> > > These should be reserved for things like configuration > >> files being > >> > > corrupt or things that would prevent BioJava from being > >> used, like > >> > > not finding a required JAR file or plugin. You can't > >> recover from these. > >> > > If your API has lots of places where Errors can occur it might > >> > > indicate poor design. > >> > > > >> > > - Mark > >> > > > >> > > On Mon, May 17, 2010 at 10:30 PM, Andy Yates > >> wrote: > >> > >> > >> > >> There's something that irks me with checked exceptions & > >> I found code worked better once I went to a fail-fast method of > >> coding but for an API I can see the usefulness of them.
> >> > >> > >> > >> Wow I think this could be the fastest any group has ever > >> dealt with > >> > >> the Checked/Unchecked exception argument :) > >> > >> > >> > >> Andy > >> > >> > >> > >> On 17 May 2010, at 14:58, PATERSON Trevor wrote: > >> > >> > >> > >>> > >> > >>> > >> > >>>> Is the problem in the code the fact that errors are > >> used or that > >> > >>>> they are not declared on the method signatures? > >> > >>> > >> > >>> Declaring them would help - but Even if you declare > Errors in > >> > >>> the signature, there is > >> nothing to force you to catch them. > >> > >>> So nothing to alert you to the chance that they may be > >> thrown when you are writing code. > >> > >>> > >> > >>> public class Demo { > >> > >>> > >> > >>> public static void main(String[] args) throws > Exception{ > >> > >>> > >> > >>> Demo.demo(); > >> > >>> System.out.println("never reached"); > >> > >>> > >> > >>> } > >> > >>> > >> > >>> public static void demo() throws Error { > >> > >>> throw new Error("kills the runtime"); > >> > >>> } > >> > >>> } > >> > >>> > >> > >>> Compiles OK with or without declaring the throwable - > >> but throws a runtime error. > >> > >>> > >> > >>> :. IMHO throwing errors is not very useful to the developer > >> > >>> - they can use them - IF they know the code well > enough to know > >> > >>> they might be thrown > >> > >>> > >> > >>> If Exceptions were used > >> > >>> 1. they would HAVE to be dealt with - if a developer > >> just swallows > >> > >>> something bad, then that's their responsibility for > >> shitty code ;) > >> > >>> 2.
like Errors they would be extendable, so real case > >> dataloaders > >> > >>> could use their own exception types and decide how to > >> deal with or > >> > >>> ignore them > >> > >>> > >> > >>> I guess in my background of coding for data access over > >> the wire > >> > >>> with jdbc and webservices > >> > >>> - you need to be aware of the real probability of > >> connection and > >> > >>> request failures > >> > >>> - so explicitly using Exceptions forces the > developer to code > >> > >>> defensively > >> > >>> > >> > >>> > >> > >>> > >> > >>> > >> > >>>> > >> > >>>> On a more design note I do not like checked > exceptions for two > >> > >>>> reasons. The first is they seem to make more junior > developers > >> > >>>> catch Exception and swallow it. The second is whenever > >> I've been > >> > >>>> Java coding in the past whenever a checked exception > >> gets thrown > >> > >>>> (say IOException because of incorrect file > >> permissions) I cannot > >> > >>>> deal with it which in the past has meant I either > >> forward on the > >> > >>>> problem or re-throw in an unchecked exception. > >> > >>>> > >> > >>>> That said as Scooter mentioned the exception system > was rushed > >> > >>>> out in the hackathon and did not have much work put into it. > >> > >>>> > >> > >>>> Andy > >> > >>>> > >> > >>>> On 17 May 2010, at 13:15, PATERSON Trevor wrote: > >> > >>>> > >> > >>>>> resending cos of bad headers > >> > >>>>> > >> > >>>>> ________________________________ > >> > >>>>> > >> > >>>>> From: PATERSON Trevor > >> > >>>>> Sent: 17 May 2010 13:02 > >> > >>>>> To: biojava-dev at lists.open-bio.org > >> > >>>>> Subject: Errors versus Exceptions > >> > >>>>> > >> > >>>>> > >> > >>>>> Could I ask a quick question about why BJ3 seems to use > >> > >>>> Errors rather > >> > >>>>> than Exceptions > >> > >>>>> - maybe this is already documented somewhere on the wiki > >> > >>>> for BJ3 or 1 - and you can point me there. 
> >> > >>>>> > >> > >>>>> Obviously for us dataloading from remote databases we need > >> > >>>> to have a method to catch connection,sql, datamapping > >> errors etc. > >> > >>>>> We usually throw DataAccessExceptions when this happens, > >> > >>>> which wrap any java.net, Ibatis and SQL exceptions. > >> > >>>>> > >> > >>>>> It is difficult for us to plug in our lazyload over your > >> > >>>> signatures in > >> > >>>>> the BJ3 sequence readers, as these don't throw exceptions, > >> > >>>> because you > >> > >>>>> seem to use Errors throughout - which don't need to be > >> > >>>> declared. In fact I can't actually see examples of you > >> catching and > >> > >>>> responding to thrown errors. > >> > >>>>> > >> > >>>>> for example the setContents routine of the Sequence Readers > >> > >>>> throws a > >> > >>>>> CompoundNotFoundError if there is no mapping for a base - > >> > >>>> but I can't see what is done with this. > >> > >>>>> > >> > >>>>> According to the Java Lang Spec, errors are used for > >> > >>>> "serious problems > >> > >>>>> that a reasonable application should not try to catch" - > >> > >>>> but I would > >> > >>>>> have thought finding an unrecognized base in a sequence was > >> > >>>> just the > >> > >>>>> sort of thing that should be thrown and caught and acted > >> > >>>> on. As error > >> > >>>>> throwables are not reported in the signature - developers > >> > >>>> don't have > >> > >>>>> any clue (or requirement) that they should be > catching and > >> > >>>>> dealing with errors - which seems a bit dangerous > to me... If > >> > >>>> Exceptions were > >> > >>>>> thrown rather than Errors that would force the developer to > >> > >>>>> handle runtime errors > >> > >>>>> > >> > >>>>> Cheers Trevor > >> > >>>>> > >> > >>>>> -- > >> > >>>>> The University of Edinburgh is a charitable body, > >> registered in > >> > >>>>> Scotland, with registration number SC005336.
> >> > >>>>> > >> > >>>>> > >> > >>>>> _______________________________________________ > >> > >>>>> biojava-dev mailing list > >> > >>>>> biojava-dev at lists.open-bio.org > >> > >>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev > >> > >>>> > >> > >>>> -- > >> > >>>> Andrew Yates Ensembl Genomes > >> Engineer EMBL-EBI > >> > >>>> Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: > >> > >>>> +44-(0)1223-494468 Cambridge CB10 1SD, UK > >> > >>>> http://www.ensemblgenomes.org/ > >> > >>>> > >> > >>>> > >> > >>>> > >> > >>>> > >> > >>>> > >> > >>> -- > >> > >>> The University of Edinburgh is a charitable body, > registered in > >> > >>> Scotland, with registration number SC005336. > >> > >>> > >> > >>> > >> > >>> _______________________________________________ > >> > >>> biojava-dev mailing list > >> > >>> biojava-dev at lists.open-bio.org > >> > >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev > >> > >> > >> > >> -- > >> > >> Andrew Yates Ensembl Genomes Engineer > >> EMBL-EBI > >> > >> Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: > >> > >> +44-(0)1223-494468 Cambridge CB10 1SD, UK > >> > >> http://www.ensemblgenomes.org/ > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> _______________________________________________ > >> > >> biojava-dev mailing list > >> > >> biojava-dev at lists.open-bio.org > >> > >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > >> > > >> > -- > >> > Andrew Yates Ensembl Genomes Engineer > >> EMBL-EBI > >> > Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: > >> > +44-(0)1223-494468 Cambridge CB10 1SD, UK > >> > http://www.ensemblgenomes.org/ > >> > > >> > > >> > > >> > > >> > > > > -- > > The University of Edinburgh is a charitable body, registered in > > Scotland, with registration number SC005336. > > > > > -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. 
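The distinction argued over in this thread can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not BioJava code: UnknownCompoundException, DataAccessException, isValidDna, toDna and loadRemote are hypothetical names made up for the example (the thread mentions a DataAccessException wrapper but never shows it). The sketch assumes an unchecked exception for a mistake the caller can rule out with a validator first, and a checked exception for a remote load that can fail however careful the caller is.

```java
import java.util.Locale;

public class ExceptionStyles {

    /** Unchecked: avoidable by validating the input first (hypothetical name). */
    static class UnknownCompoundException extends RuntimeException {
        UnknownCompoundException(String message) { super(message); }
    }

    /** Checked: a remote load can fail whatever the caller does (hypothetical name). */
    static class DataAccessException extends Exception {
        DataAccessException(String message, Throwable cause) { super(message, cause); }
    }

    /** Defensive check a caller can run before building a sequence. */
    static boolean isValidDna(String text) {
        if (text == null || text.isEmpty()) {
            return false;
        }
        for (char c : text.toUpperCase(Locale.ROOT).toCharArray()) {
            if ("ACGTN".indexOf(c) < 0) {
                return false;
            }
        }
        return true;
    }

    /** Throws the unchecked exception only if the caller skipped the check. */
    static String toDna(String text) {
        if (!isValidDna(text)) {
            throw new UnknownCompoundException("Not a DNA sequence: " + text);
        }
        return text.toUpperCase(Locale.ROOT);
    }

    /** Stand-in for a lazy remote load; the compiler forces callers to handle failure. */
    static String loadRemote(boolean connectionUp) throws DataAccessException {
        if (!connectionUp) {
            throw new DataAccessException("connection lost",
                    new java.io.IOException("timeout"));
        }
        return "ACGT";
    }

    public static void main(String[] args) {
        String input = "ACGX";
        // Defensive style: check first, so no try/catch is needed here.
        System.out.println(isValidDna(input) ? toDna(input) : "rejected: " + input);

        // Checked style: this call does not compile without try/catch or a throws clause.
        try {
            System.out.println(loadRemote(false));
        } catch (DataAccessException e) {
            System.out.println("load failed: " + e.getMessage());
        }
    }
}
```

The trade-off is the one raised above: the validator keeps calling code tidy but relies on the caller remembering to use it, while the checked exception is noisier but cannot be overlooked at compile time.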
From trevor.paterson at roslin.ed.ac.uk Tue May 18 05:51:49 2010 From: trevor.paterson at roslin.ed.ac.uk (PATERSON Trevor) Date: Tue, 18 May 2010 10:51:49 +0100 Subject: [Biojava-dev] Errors versus Exceptions In-Reply-To: References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E20C@EXSEEDMD.is.ed.ac.uk> <73D847AF-1C33-4229-BB74-E20C354C13BD@ebi.ac.uk> <99C91E4D-0CEE-4F65-8BCD-184F4011FE8B@ebi.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E20F@EXSEEDMD.is.ed.ac.uk> Message-ID: <428F360A0BA69249B73A710B58435CFE04DD42E212@EXSEEDMD.is.ed.ac.uk> > You can make it safe using all > kinds of best practices from Bloch's Effective Java (private > setters, no public constructors etc) but that makes it quite > a bit harder for people who are newer to Java (and are used > to bean-like patterns) I'm not new to Java :) But bean properties and public setters and constructors are kind of central to the idea of lazy load and datamapping through Ibatis ... Maybe there does come a point where: "hey if I can't make and use these BioJava objects without jumping through enormous hoops I'll just make my own objects..." - which obviously would be 'bad'. trevor > -----Original Message----- > From: Mark Schreiber [mailto:markjschreiber at gmail.com] > Sent: 18 May 2010 10:24 > To: PATERSON Trevor > Cc: Andy Yates; biojava-dev at lists.open-bio.org > Subject: Re: [Biojava-dev] Errors versus Exceptions > > The script example is a good one. If you truly want to do > something quick and dirty you could use BioJava with Groovy > and ignore exceptions completely. This also alludes to a > problem with making BioJava easy to use and safe. > > I have come to the conclusion that you can't really make a > Java API easy for beginners.
You can make it safe using all > kinds of best practices from Bloch's Effective Java (private > setters, no public constructors etc) but that makes it quite > a bit harder for people who are newer to Java (and are used > to bean-like patterns). Alternatively you can use bean-like > patterns (useful in JEE settings) at the great danger that > people will corrupt objects using public setters and cause > all kinds of concurrency problems with threads. An experienced > developer should be able to handle both. Someone new will > struggle with both. > > Languages like Groovy might be best for the casual or new > user. Very quick to get up and running and somewhat safe > (scripting languages are never really safe). It wouldn't be > too hard to make some convenience Groovy objects; you could > make a GSequence for example to make use of Groovy's > overloaded operators. > > Anyhow, making BioJava easier to use was one argument for > making exceptions inherit from RuntimeException. Ultimately I > don't think we made it easier to use for beginners although > it did make code tidier and hopefully forced people to think > about defensive code. > > - Mark > > On Tue, May 18, 2010 at 5:06 PM, PATERSON Trevor > wrote: > > > > > >> On reflection I'm not sure how well you can recover from something > >> like an unknown compound. When you hit that exception what are you > >> going to do with it? > > > > I think it depends on what level your application is > working at - if > > your application is just a script doing some data munging > maybe that > > is a fatal exception that you want to cause the script to > die gracefully...
> > But if you are running a large graphical display program > and you ask > > for the sequence of a particular component - you need to be > > catching the exception and handling the failed request > > > > That's my 2.5p ;) > > > > Trevor > > > > > > > > > > > > > >> -----Original Message----- > >> From: Andy Yates [mailto:andyyatz at gmail.com] On Behalf Of > Andy Yates > >> Sent: 18 May 2010 09:53 > >> To: Mark Schreiber > >> Cc: PATERSON Trevor; biojava-dev at lists.open-bio.org > >> Subject: Re: [Biojava-dev] Errors versus Exceptions > >> > >> You've made that previous decision a lot harder now Mark > (I knew it > >> was over too quickly). > >> > >> On reflection I'm not sure how well you can recover from something > >> like an unknown compound. When you hit that exception what are you > >> going to do with it? Something like IOExceptions for a > file not being > >> open you can do something about that (say it's NFS you > could always > >> retry after an allotted amount of time has passed). But an unknown > >> compound ... not sure how you can deal with that apart > from callbacks > >> in the parsers. If you think of this in terms of FASTA > input into an > >> aligner program and you have an unknown compound there all you're > >> going to do is to pass a message back to the user saying > you didn't > >> understand the input sequence with the error message from the > >> Exception something like "Compound '?' > >> is not supported". You could always recover attempting to move > >> through different CompoundSets seeing if one works but are > you going > >> to do that? > >> > >> I guess the thing to do is to start with checked exceptions in > >> potentially recoverable situations and if they prove to be > too clumsy > >> then as BioJava did previously we will switch to the > RuntimeException > >> as a base class. We should also endeavour to pass on > exceptions and > >> avoid excessive exception wrapping.
> >> > >> I do think there is a situation where we can relax these rules; I > >> would like to see the relaxation of them when a developer > is building > >> say a DNASequence but only using a String. The same should > happen for > >> helper classes. > >> > >> That's my 2p :) > >> > >> Andy > >> > >> > >> On 18 May 2010, at 04:53, Mark Schreiber wrote: > >> > >> > To give some historical perspective. This started happening > >> in BioJava > >> > 1.5. We switched BioException to extend RuntimeException > >> (which is not > >> > checked). This came from some thinking at the time that Java's > >> > exception checking was a little bit broken (as exemplified by > >> > BioJava). Checked exceptions are good as long as you > don't swallow > >> > them in the API or just use them to spew a stack trace. > Spitting a > >> > stack trace is pointless as the JVM will do a much better > >> job of this > >> > when it dumps you out. Swallowing them in the API is stupid > >> because it > >> > is the developer who needs to do something with it. Unless > >> it is truly > >> > an Error which is the only time an Error should be used > >> (unrecoverable > >> > problems). > >> > > >> > RuntimeExceptions should be used for any type of exception > >> that can be > >> > avoided by defensive programming. NullPointerExceptions and > >> > IndexOutOfBounds exceptions are classic examples. There are > >> hundreds > >> > of places in Java where you could get one of these but you > >> can easily > >> > avoid them by checking if a collection contains an item, > >> doing simple > >> > if( x == null) checks, using array.length etc. For this > >> reason these > >> > and some other Java exceptions extend RuntimeException > and are not > >> > checked. Can you imagine what Java would look like if you > >> actually had > >> > to put every array access in a try, catch statement? 
> >> > > >> > Therefore, as much as possible I think BioJava3 should have > >> exceptions > >> > that extend RuntimeException and provide a defensive > mechanism to > >> > avoid having them happen. For example in creating a DNA > >> sequence from > >> > text you could provide a validator which will check if > >> there are any > >> > "incorrect" characters. Good code examples on the wiki > >> should show the > >> > use of defensive programming and not just surround > >> everything with a > >> > try catch. Importantly RuntimeExceptions should be > >> declared so people > >> > are aware they may occur. This is not required by the > >> compiler but it > >> > is good practice for documentation purposes. The Exception class > >> > itself should probably contain some Javadoc which explains > >> how it can > >> > be avoided in the first place. Checked exceptions are good > >> for cases > >> > where you can recover (although this is often more > challenging than > >> > people think) but it is always easier and faster to > check yourself, > >> > throwing the exception and generating the stacktrace etc > >> takes quite a > >> > lot of effort from the JVM. > >> > > >> > Finally, I don't think you should be throwing Errors too often. > >> > Although Errors are unchecked and give you the same effect as > >> > RuntimeExceptions they imply something really bad has gone wrong. > >> > These should be reserved for things like configuration > files being > >> > corrupt or things that would prevent BioJava from being > >> used, like not > >> > finding a required JAR file or plugin. You can't recover > from these. > >> > If your API has lots of places where Errors can occur it might > >> > indicate poor design.
> >> > > >> > - Mark > >> > > >> > On Mon, May 17, 2010 at 10:30 PM, Andy Yates > >> wrote: > >> >> > >> >> There's something that irks me with checked exceptions & I > >> found code worked better once I went to a fail-fast method > of coding > >> but for an API I can see the usefulness of them. > >> >> > >> >> Wow I think this could be the fastest any group has ever > >> dealt with > >> >> the Checked/Unchecked exception argument :) > >> >> > >> >> Andy > >> >> > >> >> On 17 May 2010, at 14:58, PATERSON Trevor wrote: > >> >> > >> >>> > >> >>> > >> >>>> Is the problem in the code the fact that errors are > used or that > >> >>>> they are not declared on the method signatures? > >> >>> > >> >>> Declaring them would help - but > >> >>> Even if you declare Errors in the signature, there is > >> nothing to force you to catch them. > >> >>> So nothing to alert you to the chance that they may be > >> thrown when you are writing code. > >> >>> > >> >>> public class Demo { > >> >>> > >> >>> public static void main(String[] args) throws Exception{ > >> >>> > >> >>> Demo.demo(); > >> >>> System.out.println("never reached"); > >> >>> > >> >>> } > >> >>> > >> >>> public static void demo() throws Error { > >> >>> throw new Error("kills the runtime"); > >> >>> } > >> >>> } > >> >>> > >> >>> Compiles OK with or without declaring the throwable - but > >> throws a runtime error. > >> >>> > >> >>> :. IMHO throwing errors is not very useful to the developer > >> >>> - they can use them - IF they know the code well enough > >> to know they > >> >>> might be thrown > >> >>> > >> >>> If Exceptions were used > >> >>> 1. they would HAVE to be dealt with - if a developer just > >> swallows > >> >>> something bad, then that's their responsibility for > >> shitty code ;) > >> >>> 2.
like Errors they would be extendable, so real case > dataloaders > >> >>> could use their own exception types and decide how to > >> deal with or > >> >>> ignore them > >> >>> > >> >>> I guess in my background of coding for data access > over the wire > >> >>> with jdbc and webservices > >> >>> - you need to be aware of the real probability of > connection and > >> >>> request failures > >> >>> - so explicitly using Exceptions forces the developer to code > >> >>> defensively > >> >>> > >> >>> > >> >>> > >> >>> > >> >>>> > >> >>>> On a more design note I do not like checked > exceptions for two > >> >>>> reasons. The first is they seem to make more junior > developers > >> >>>> catch Exception and swallow it. The second is > whenever I've been > >> >>>> Java coding in the past whenever a checked exception > gets thrown > >> >>>> (say IOException because of incorrect file > permissions) I cannot > >> >>>> deal with it which in the past has meant I either > forward on the > >> >>>> problem or re-throw in an unchecked exception. > >> >>>> > >> >>>> That said as Scooter mentioned the exception system was > >> rushed out > >> >>>> in the hackathon and did not have much work put into it. > >> >>>> > >> >>>> Andy > >> >>>> > >> >>>> On 17 May 2010, at 13:15, PATERSON Trevor wrote: > >> >>>> > >> >>>>> resending cos of bad headers > >> >>>>> > >> >>>>> ________________________________ > >> >>>>> > >> >>>>> From: PATERSON Trevor > >> >>>>> Sent: 17 May 2010 13:02 > >> >>>>> To: biojava-dev at lists.open-bio.org > >> >>>>> Subject: Errors versus Exceptions > >> >>>>> > >> >>>>> > >> >>>>> Could I ask a quick question about why BJ3 seems to use > >> >>>> Errors rather > >> >>>>> than Exceptions > >> >>>>> - maybe this is already documented somewhere on the wiki > >> >>>> for BJ3 or 1 - and you can point me there. > >> >>>>> > >> >>>>> Obviously for us dataloading from remote databases we need > >> >>>> to have a method to catch connection,sql, datamapping > errors etc. 
> >> >>>>> We ususally throw DataAccessExceptions when this happens, > >> >>>> which wrap any java.net, Ibatis and SQL exceptions. > >> >>>>> > >> >>>>> It is difficult for us to plug in our lazyload over your > >> >>>> signatures in > >> >>>>> the BJ3 sequence readers, as these don't throw exceptions, > >> >>>> because you > >> >>>>> seem to use Errors throughout - which dont need to be > >> >>>> declared. Infact I cant actually see examples of you > >> catching and > >> >>>> responding to thrown errors. > >> >>>>> > >> >>>>> for example the setContents routine of the Sequence Readers > >> >>>> throws a > >> >>>>> CompoundNotFoundError if there is no mapping for a base - > >> >>>> but I can't see what is done with this. > >> >>>>> > >> >>>>> According to the Java Lang Spec, errors are used for > >> >>>> "serious problems > >> >>>>> that a reasonable application should not try to catch" - > >> >>>> but I would > >> >>>>> have thought finding an unrecognized base in a sequence was > >> >>>> just the > >> >>>>> sort of thing that should be thrown and caught and acted > >> >>>> on. As error > >> >>>>> throwables are not reported in the signature - developers > >> >>>> don't have > >> >>>>> any clue ( or requirement) that they should be catching and > >> >>>>> dealing with errors - which seems a bit dangerous to me... If > >> >>>> Exceptions were > >> >>>>> thrownrather than Errors that would force the developer > >> to handle > >> >>>>> runtime errors > >> >>>>> > >> >>>>> Cheers Trevor > >> >>>>> > >> >>>>> -- > >> >>>>> The University of Edinburgh is a charitable body, > registered in > >> >>>>> Scotland, with registration number SC005336. 
> >> >>>>> > >> >>>>> > >> >>>>> _______________________________________________ > >> >>>>> biojava-dev mailing list > >> >>>>> biojava-dev at lists.open-bio.org > >> >>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev > >> >>>> > >> >>>> -- > >> >>>> Andrew Yates Ensembl Genomes > Engineer EMBL-EBI > >> >>>> Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: > >> >>>> +44-(0)1223-494468 Cambridge CB10 1SD, UK > >> >>>> http://www.ensemblgenomes.org/ > >> >>>> > >> >>>> > >> >>>> > >> >>>> > >> >>>> > >> >>> -- > >> >>> The University of Edinburgh is a charitable body, > registered in > >> >>> Scotland, with registration number SC005336. > >> >>> > >> >>> > >> >>> _______________________________________________ > >> >>> biojava-dev mailing list > >> >>> biojava-dev at lists.open-bio.org > >> >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev > >> >> > >> >> -- > >> >> Andrew Yates Ensembl Genomes Engineer > EMBL-EBI > >> >> Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: > >> >> +44-(0)1223-494468 Cambridge CB10 1SD, UK > >> >> http://www.ensemblgenomes.org/ > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> _______________________________________________ > >> >> biojava-dev mailing list > >> >> biojava-dev at lists.open-bio.org > >> >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > >> > >> -- > >> Andrew Yates Ensembl Genomes Engineer > EMBL-EBI > >> Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: > >> +44-(0)1223-494468 Cambridge CB10 1SD, UK > >> http://www.ensemblgenomes.org/ > >> > >> > >> > >> > >> > > -- > > The University of Edinburgh is a charitable body, registered in > > Scotland, with registration number SC005336. > > > > > -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. 
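A minimal sketch of the defensive style Mark Schreiber describes in the quoted text above: an exception that extends RuntimeException, declared in the signature for documentation only, plus a validator the caller can use so the exception is never triggered. Every name here (DnaValidationSketch, UnknownCompoundException, isValidDna, firstInvalidIndex, parseDna) is hypothetical and is not BioJava API; the alphabet is assumed to be plain A, C, G, T.

```java
import java.util.Locale;

public class DnaValidationSketch {

    /** Hypothetical unchecked exception; not part of BioJava. */
    public static class UnknownCompoundException extends RuntimeException {
        public UnknownCompoundException(String message) {
            super(message);
        }
    }

    /** Returns the index of the first character that is not A, C, G or T, or -1 if there is none. */
    public static int firstInvalidIndex(String text) {
        for (int i = 0; i < text.length(); i++) {
            if ("ACGTacgt".indexOf(text.charAt(i)) < 0) {
                return i;
            }
        }
        return -1;
    }

    /** Defensive check: call this first to avoid UnknownCompoundException entirely. */
    public static boolean isValidDna(String text) {
        return text != null && firstInvalidIndex(text) < 0;
    }

    /**
     * Fail-fast conversion of text to an upper-case DNA string.
     * The throws clause is not required by the compiler for an unchecked
     * exception; it is declared purely as documentation.
     */
    public static String parseDna(String text) throws UnknownCompoundException {
        if (text == null) {
            throw new IllegalArgumentException("text must not be null");
        }
        int bad = firstInvalidIndex(text);
        if (bad >= 0) {
            throw new UnknownCompoundException(
                    "Compound '" + text.charAt(bad) + "' at position " + bad + " is not supported");
        }
        return text.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String input = "ACGT?A";
        // Check first instead of wrapping the call in try/catch.
        if (isValidDna(input)) {
            System.out.println(parseDna(input));
        } else {
            System.out.println("rejected input, first bad character at index " + firstInvalidIndex(input));
        }
    }
}
```

A caller that checks isValidDna first never sees the exception; a quick script that skips the check simply fails fast with a readable message.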
From ayates at ebi.ac.uk  Tue May 18 05:55:07 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 18 May 2010 10:55:07 +0100
Subject: [Biojava-dev] Errors versus Exceptions
In-Reply-To: <428F360A0BA69249B73A710B58435CFE04DD42E212@EXSEEDMD.is.ed.ac.uk>
References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk>
    <428F360A0BA69249B73A710B58435CFE04DD42E20C@EXSEEDMD.is.ed.ac.uk>
    <73D847AF-1C33-4229-BB74-E20C354C13BD@ebi.ac.uk>
    <99C91E4D-0CEE-4F65-8BCD-184F4011FE8B@ebi.ac.uk>
    <428F360A0BA69249B73A710B58435CFE04DD42E20F@EXSEEDMD.is.ed.ac.uk>
    <428F360A0BA69249B73A710B58435CFE04DD42E212@EXSEEDMD.is.ed.ac.uk>
Message-ID:

Now that's the situation we want to avoid like the plague :)

On 18 May 2010, at 10:51, PATERSON Trevor wrote:

>> You can make it safe using all
>> kinds of best practices from Bloch's Effective Java (private
>> setters, no public constructors etc) but that makes it quite
>> a bit harder for people who are newer to Java (and are used
>> to bean like patterns)
>
> I'm not new to Java :)
> But bean properties and public setters and constructors are kind of central to the
> idea of lazy load and datamapping through Ibatis ...
> Maybe there does come a point where:
> "hey if I can't make and use these biojava objects without jumping through enormous hoops
> I'll just make my own objects..." - which obviously would be 'bad'.
>
> trevor
>
>> -----Original Message-----
>> From: Mark Schreiber [mailto:markjschreiber at gmail.com]
>> Sent: 18 May 2010 10:24
>> To: PATERSON Trevor
>> Cc: Andy Yates; biojava-dev at lists.open-bio.org
>> Subject: Re: [Biojava-dev] Errors versus Exceptions
>>
>> The script example is a good one. If you truly want to do
>> something quick and dirty you could use BioJava with Groovy
>> and ignore exceptions completely. This also alludes to a
>> problem with making BioJava easy to use and safe.
>>
>> I have come to the conclusion that you can't really make a
>> Java API easy for beginners.
-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From heuermh at acm.org  Wed May 19 22:44:59 2010
From: heuermh at acm.org (Michael Heuer)
Date: Wed, 19 May 2010 22:44:59 -0400 (EDT)
Subject: [Biojava-dev] Errors versus Exceptions
In-Reply-To: <428F360A0BA69249B73A710B58435CFE04DD42E210@EXSEEDMD.is.ed.ac.uk>
Message-ID:

On Tue, 18 May 2010, PATERSON Trevor wrote:

> Yes... But...
>
> If our implementation relies on a lazy load underneath every Sequence method implementation..
>
> Then every method can throw an IO type Exception
>
> In essence you can instantiate a valid Sequence object with a LazyLoading Reader before you get the valid SequenceData,
> unlike the case where you can check that you can make a valid Sequence object from the Reader.

Don't make a lazy load Sequence then. Have the reader populate a
(package private) memento object, and then use that to create a fully
formed Sequence.

Lazy loading is a bad smell.

   michael

From markjschreiber at gmail.com  Thu May 20 00:02:12 2010
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 20 May 2010 12:02:12 +0800
Subject: [Biojava-dev] Errors versus Exceptions
In-Reply-To:
References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk>
    <428F360A0BA69249B73A710B58435CFE04DD42E20C@EXSEEDMD.is.ed.ac.uk>
    <73D847AF-1C33-4229-BB74-E20C354C13BD@ebi.ac.uk>
    <99C91E4D-0CEE-4F65-8BCD-184F4011FE8B@ebi.ac.uk>
    <428F360A0BA69249B73A710B58435CFE04DD42E20F@EXSEEDMD.is.ed.ac.uk>
    <428F360A0BA69249B73A710B58435CFE04DD42E212@EXSEEDMD.is.ed.ac.uk>
Message-ID:

Sorry, you lost me on that thread. What are we avoiding like the plague?
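
Michael Heuer's memento suggestion above could look roughly like the following. It is only a sketch under stated assumptions: MementoSketch, SequenceMemento, SimpleSequence and read are hypothetical names, not BioJava classes, and the input is assumed to be a single FASTA-like record. All I/O, and therefore the only IOException, is confined to read(); the resulting sequence object is immutable and its accessors cannot fail.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class MementoSketch {

    /** Package-private mutable holder that only the reader fills in (hypothetical). */
    static final class SequenceMemento {
        String name;
        final StringBuilder residues = new StringBuilder();
    }

    /** Immutable, fully formed sequence built from a completed memento (hypothetical). */
    public static final class SimpleSequence {
        private final String name;
        private final String residues;

        SimpleSequence(SequenceMemento memento) {
            if (memento.name == null) {
                throw new IllegalArgumentException("record has no header line");
            }
            this.name = memento.name;
            this.residues = memento.residues.toString();
        }

        public String getName() { return name; }
        public String getResidues() { return residues; }
        public int length() { return residues.length(); }
    }

    /** Eagerly reads one FASTA-like record; this is the only place I/O can fail. */
    public static SimpleSequence read(Reader in) throws IOException {
        SequenceMemento memento = new SequenceMemento();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith(">")) {
                memento.name = line.substring(1).trim();
            } else {
                memento.residues.append(line);
            }
        }
        return new SimpleSequence(memento);
    }

    public static void main(String[] args) throws IOException {
        SimpleSequence seq = read(new StringReader(">demo\nACGT\nTTGA\n"));
        System.out.println(seq.getName() + " " + seq.length());
    }
}
```

The trade-off is the one raised in the thread: the whole record is held in memory up front, in exchange for Sequence methods that never need to declare or throw I/O exceptions.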

On Tue, May 18, 2010 at 5:55 PM, Andy Yates wrote:
> Now that's the situation we want to avoid like the plague :)
>
> On 18 May 2010, at 10:51, PATERSON Trevor wrote:
>
>>> You can make it safe using all
>>> kinds of best practices from Bloch's Effective Java (private
>>> setters, no public constructors etc) but that makes it quite
>>> a bit harder for people who are newer to Java (and are used
>>> to bean like patterns)
>>
>> I'm not new to Java :)
>> But bean properties and public setters and constructors are kind of central to the
>> idea of lazy load and datamapping through Ibatis ...
>> Maybe there does come a point where:
>> "hey if I can't make and use these biojava objects without jumping through enormous hoops
>> I'll just make my own objects..." - which obviously would be 'bad'.
>>
>> trevor
>>
>>> -----Original Message-----
>>> From: Mark Schreiber [mailto:markjschreiber at gmail.com]
>>> Sent: 18 May 2010 10:24
>>> To: PATERSON Trevor
>>> Cc: Andy Yates; biojava-dev at lists.open-bio.org
>>> Subject: Re: [Biojava-dev] Errors versus Exceptions
>>>
>>> The script example is a good one. If you truly want to do
>>> something quick and dirty you could use BioJava with Groovy
>>> and ignore exceptions completely. This also alludes to a
>>> problem with making BioJava easy to use and safe.
>>>
>>> I have come to the conclusion that you can't really make a
>>> Java API easy for beginners. You can make it safe using all
>>> kinds of best practices from Bloch's Effective Java (private
>>> setters, no public constructors etc) but that makes it quite
>>> a bit harder for people who are newer to Java (and are used
>>> to bean like patterns). Alternatively you can use bean like
>>> patterns (useful in JEE settings) at the great danger that
>>> people will corrupt objects using public setters and cause
>>> all kinds of concurrent problems with threads. An experienced
>>> developer should be able to handle both. Someone new will
>>> struggle with both.
>>>
>>> Languages like Groovy might be best for the casual or new
>>> user. Very quick to get up and running and somewhat safe
>>> (scripting languages are never really safe). It wouldn't be
>>> too hard to make some convenience Groovy objects; you could
>>> make a GSequence for example to make use of Groovy's
>>> overloaded operators.
>>>
>>> Anyhow, making BioJava easier to use was one argument for
>>> making exceptions inherit from RuntimeException. Ultimately I
>>> don't think we made it easier to use for beginners although
>>> it did make code tidier and hopefully forced people to think
>>> about defensive code.
>>>
>>> - Mark
>>>
>>> On Tue, May 18, 2010 at 5:06 PM, PATERSON Trevor wrote:
>>>>
>>>>> On reflection I'm not sure how well you can recover from something
>>>>> like an unknown compound. When you hit that exception what are you
>>>>> going to do with it?
>>>>
>>>> I think it depends on what level your application is working at - if
>>>> your application is just a script doing some data munging maybe that
>>>> is a fatal exception that you want to cause the script to die gracefully...
>>>> But if you are running a large graphical display program and you ask
>>>> for the sequence of have a particular component - you need to be
>>>> catching the exception and handling the failed request
>>>>
>>>> That's my 2.5p ;)
>>>>
>>>> Trevor
>>>>
>>>>> -----Original Message-----
>>>>> From: Andy Yates [mailto:andyyatz at gmail.com] On Behalf Of Andy Yates
>>>>> Sent: 18 May 2010 09:53
>>>>> To: Mark Schreiber
>>>>> Cc: PATERSON Trevor; biojava-dev at lists.open-bio.org
>>>>> Subject: Re: [Biojava-dev] Errors versus Exceptions
>>>>>
>>>>> You've made that previous decision a lot harder now Mark (I knew it
>>>>> was over too quickly).
>>>>>
>>>>> On reflection I'm not sure how well you can recover from something
>>>>> like an unknown compound. When you hit that exception what are you
>>>>> going to do with it? Something like IOExceptions for a file not being
>>>>> open you can do something about that (say it's NFS you could always
>>>>> retry after an allotted amount of time has passed). But an unknown
>>>>> compound ... not sure how you can deal with that apart from callbacks
>>>>> in the parsers. If you think of this in terms of FASTA input into an
>>>>> aligner program and you have an unknown compound there all you're
>>>>> going to do is to pass a message back to the user saying you didn't
>>>>> understand the input sequence with the error message from the
>>>>> Exception something like "Compound '?'
>>>>> is not supported". You could always recover attempting to move
>>>>> through different CompoundSets seeing if one works but are you going
>>>>> to do that?
>>>>>
>>>>> I guess the thing to do is to start with checked exceptions in
>>>>> potentially recoverable situations and if they prove to be too clumsy
>>>>> then as BioJava did previously we will switch to the RuntimeException
>>>>> as a base class. We should also endeavour to pass on exceptions and
>>>>> avoid excessive exception wrapping.
>>>>>
>>>>> I do think there is a situation where we can relax these rules; I
>>>>> would like to see the relaxation of them when a developer is building
>>>>> say a DNASequence but only using a String. The same should happen for
>>>>> helper classes.
>>>>>
>>>>> That's my 2p :)
>>>>>
>>>>> Andy
>>>>>
>>>>> On 18 May 2010, at 04:53, Mark Schreiber wrote:
>>>>>
>>>>>> To give some historical perspective. This started happening in BioJava
>>>>>> 1.5. We switched BioException to extend RuntimeException (which is not
>>>>>> checked). This came from some thinking at the time that Java's
>>>>>> exception checking was a little bit broken (as exemplified by
>>>>>> BioJava). Checked exceptions are good as long as you don't swallow
>>>>>> them in the API or just use them to spew a stack trace. Spitting a
>>>>>> stack trace is pointless as the JVM will do a much better job of this
>>>>>> when it dumps you out. Swallowing them in the API is stupid because it
>>>>>> is the developer who needs to do something with it. Unless it is truly
>>>>>> an Error which is the only time an Error should be used (unrecoverable
>>>>>> problems).
>>>>>>
>>>>>> RuntimeExceptions should be used for any type of exception that can be
>>>>>> avoided by defensive programming. NullPointerExceptions and
>>>>>> IndexOutOfBounds exceptions are classic examples. There are hundreds
>>>>>> of places in Java where you could get one of these but you can easily
>>>>>> avoid them by checking if a collection contains an item, doing simple
>>>>>> if( x == null) checks, using array.length etc. For this reason these
>>>>>> and some other Java exceptions extend RuntimeException and are not
>>>>>> checked. Can you imagine what Java would look like if you actually had
>>>>>> to put every array access in a try, catch statement?
>>>>>>
>>>>>> Therefore, as much as possible I think BioJava3 should have exceptions
>>>>>> that extend RuntimeException and provide a defensive mechanism to
>>>>>> avoid having them happen. For example in creating a DNA sequence from
>>>>>> text you could provide a validator which will check if there are any
>>>>>> "incorrect" characters. Good code examples on the wiki should show the
>>>>>> use of defensive programming and not just surround everything with a
>>>>>> try catch. Importantly RuntimeExceptions should be declared so people
>>>>>> are aware they may occur. This is not required by the compiler but it
>>>>>> is good practice for documentation purposes. The Exception class
>>>>>> itself should probably contain some Javadoc which explains how it can
>>>>>> be avoided in the first place. Checked exceptions are good for cases
>>>>>> where you can recover (although this is often more challenging than
>>>>>> people think) but it is always easier and faster to check yourself,
>>>>>> throwing the exception and generating the stacktrace etc takes quite a
>>>>>> lot of effort from the JVM.
>>>>>>
>>>>>> Finally, I don't think you should be throwing Errors too often.
>>>>>> Although Errors are unchecked and give you the same effect as
>>>>>> RuntimeExceptions they imply something really bad has gone wrong.
>>>>>> These should be reserved for things like configuration files being
>>>>>> corrupt or things that would prevent BioJava from being used, like not
>>>>>> finding a required JAR file or plugin. You can't recover from these.
>>>>>> If your API has lots of places where Errors can occur it might
>>>>>> indicate poor design.
>>>>>>
>>>>>> - Mark
>>>>>>
>>>>>> On Mon, May 17, 2010 at 10:30 PM, Andy Yates wrote:
>>>>>>>
>>>>>>> There's something that irks me with checked exceptions & I found code worked better once I went to a fail-fast method of coding but for an API I can see the usefulness of them.
>>>>>>>
>>>>>>> Wow I think this could be the fastest any group has ever dealt with
>>>>>>> the Checked/Unchecked exception argument :)
>>>>>>>
>>>>>>> Andy
>>>>>>>
>>>>>>> On 17 May 2010, at 14:58, PATERSON Trevor wrote:
>>>>>>>
>>>>>>>>> Is the problem in the code the fact that errors are used or that
>>>>>>>>> they are not declared on the method signatures?
>>>>>>>>
>>>>>>>> Declaring them would help - but
>>>>>>>> Even if you declare Errors in the signature, there is nothing to force you to catch them.
>>>>>>>> So nothing to alert you to the chance that they may be thrown when you are writing code.
>>>>>>>>
>>>>>>>> public class Demo {
>>>>>>>>
>>>>>>>>     public static void main(String[] args) throws Exception {
>>>>>>>>
>>>>>>>>         Demo.demo();
>>>>>>>>         System.out.println("never reached");
>>>>>>>>
>>>>>>>>     }
>>>>>>>>
>>>>>>>>     public static void demo() throws Error {
>>>>>>>>         throw new Error("kills the runtime");
>>>>>>>>     }
>>>>>>>> }
>>>>>>>>
>>>>>>>> Compiles OK with or without declaring the throwable - but throws a runtime error.
>>>>>>>>
>>>>>>>> :. IMHO throwing errors is not very useful to developer
>>>>>>>> - they can use them - IF they know the code well enough to know they
>>>>>>>> might be thrown
>>>>>>>>
>>>>>>>> If Exceptions were used
>>>>>>>> 1. they would HAVE to be dealt with - if a developer just swallows
>>>>>>>> something bad, then that's their responsibility for shitty code ;)
>>>>>>>> 2. like Errors they would be extendable, so real case dataloaders
>>>>>>>> could use their own exception types and decide how to deal with or
>>>>>>>> ignore them
>>>>>>>>
>>>>>>>> I guess in my background of coding for data access over the wire
>>>>>>>> with jdbc and webservices
>>>>>>>> - you need to be aware of the real probability of connection and
>>>>>>>> request failures
>>>>>>>> - so explicitly using Exceptions forces the developer to code
>>>>>>>> defensively
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On a more design note I do not like checked exceptions for two
>>>>>>>>> reasons. The first is they seem to make more junior developers
>>>>>>>>> catch Exception and swallow it. The second is whenever I've been
>>>>>>>>> Java coding in the past whenever a checked exception gets thrown
>>>>>>>>> (say IOException because of incorrect file permissions) I cannot
>>>>>>>>> deal with it which in the past has meant I either forward on the
>>>>>>>>> problem or re-throw in an unchecked exception.
>>>>>>>>>
>>>>>>>>> That said as Scooter mentioned the exception system was rushed out
>>>>>>>>> in the hackathon and did not have much work put into it.
>>>>>>>>>
>>>>>>>>> Andy
>>>>>>>>>
>>>>>>>>> On 17 May 2010, at 13:15, PATERSON Trevor wrote:
>>>>>>>>>
>>>>>>>>>> resending cos of bad headers
>>>>>>>>>>
>>>>>>>>>> ________________________________
>>>>>>>>>>
>>>>>>>>>> From: PATERSON Trevor
>>>>>>>>>> Sent: 17 May 2010 13:02
>>>>>>>>>> To: biojava-dev at lists.open-bio.org
>>>>>>>>>> Subject: Errors versus Exceptions
>>>>>>>>>>
>>>>>>>>>> Could I ask a quick question about why BJ3 seems to use Errors rather
>>>>>>>>>> than Exceptions
>>>>>>>>>> - maybe this is already documented somewhere on the wiki for BJ3 or 1 - and you can point me there.
>>>>>>>>>>
>>>>>>>>>> Obviously for us dataloading from remote databases we need to have a method to catch connection, sql, datamapping errors etc.
>>>>>>>>>> We usually throw DataAccessExceptions when this happens, which wrap any java.net, Ibatis and SQL exceptions.
>>>>>>>>>>
>>>>>>>>>> It is difficult for us to plug in our lazyload over your signatures in
>>>>>>>>>> the BJ3 sequence readers, as these don't throw exceptions, because you
>>>>>>>>>> seem to use Errors throughout - which don't need to be declared. In fact I can't actually see examples of you catching and responding to thrown errors.
>>>>>>>>>>
>>>>>>>>>> for example the setContents routine of the Sequence Readers throws a
>>>>>>>>>> CompoundNotFoundError if there is no mapping for a base - but I can't see what is done with this.
>>>>>>>>>> >>>>>>>>>> According to the Java Lang Spec, errors are used for >>>>>>>>> "serious problems >>>>>>>>>> that a reasonable application should not try to catch" - >>>>>>>>> but I would >>>>>>>>>> have thought finding an unrecognized base in a sequence was >>>>>>>>> just the >>>>>>>>>> sort of thing that should be thrown and caught and acted >>>>>>>>> on. As error >>>>>>>>>> throwables are not reported in the signature - developers >>>>>>>>> don't have >>>>>>>>>> any clue ( or requirement) that they should be catching and >>>>>>>>>> dealing with errors - which seems a bit dangerous to me... If >>>>>>>>> Exceptions were >>>>>>>>>> thrownrather than Errors that would force the developer >>>>> to handle >>>>>>>>>> runtime errors >>>>>>>>>> >>>>>>>>>> Cheers Trevor >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> The University of Edinburgh is a charitable body, >>> registered in >>>>>>>>>> Scotland, with registration number SC005336. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> biojava-dev mailing list >>>>>>>>>> biojava-dev at lists.open-bio.org >>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Andrew Yates ? ? ? ? ? ? ? ? ? Ensembl Genomes >>> Engineer EMBL-EBI >>>>>>>>> Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus ? Fax: >>>>>>>>> +44-(0)1223-494468 Cambridge CB10 1SD, UK >>>>>>>>> http://www.ensemblgenomes.org/ >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> -- >>>>>>>> The University of Edinburgh is a charitable body, >>> registered in >>>>>>>> Scotland, with registration number SC005336. >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> biojava-dev mailing list >>>>>>>> biojava-dev at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >>>>>>> >>>>>>> -- >>>>>>> Andrew Yates ? ? ? ? ? ? ? ? ? Ensembl Genomes Engineer >>> EMBL-EBI >>>>>>> Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus ? 
From markjschreiber at gmail.com Thu May 20 00:11:33 2010
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 20 May 2010 12:11:33 +0800
Subject: [Biojava-dev] Errors versus Exceptions
In-Reply-To: <428F360A0BA69249B73A710B58435CFE04DD42E212@EXSEEDMD.is.ed.ac.uk>
References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E20C@EXSEEDMD.is.ed.ac.uk> <73D847AF-1C33-4229-BB74-E20C354C13BD@ebi.ac.uk> <99C91E4D-0CEE-4F65-8BCD-184F4011FE8B@ebi.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E20F@EXSEEDMD.is.ed.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E212@EXSEEDMD.is.ed.ac.uk>
Message-ID:

If you are using BioJava objects as fake DTOs or EntityBean look-alikes, you should really question why you are using BioJava objects in the first place. I am not sure what BioJava3 objects will look like, but BJ1.X objects are definitely not good at this.

It also raises an interesting point which I haven't seen discussed much on the list: what will BJ3 be for (or not for)? One of the painful lessons (for me) from working on BioJava is that you can't make an API do everything. The more modular approach to BJ3 should help avoid this. I see nothing wrong with having a module that is more suitable for the kind of sequence-data-binding work you are proposing. This module's objects should definitely have public constructors and public setters. Why not make use of Entity Beans (post EJB 3) while you're at it? If it is in its own module it will not corrupt the other parts of BioJava with "unsafe" beany objects.

In this case making your own objects (and sharing them) would actually be a whole lot better than trying to shoehorn an API that wasn't made for this. Some IDEs will even auto-generate databinding objects for you (although I understand there are some strange cases in Ensembl that might not make this a good idea).

- Mark

On Tue, May 18, 2010 at 5:51 PM, PATERSON Trevor wrote:
>> You can make it safe using all kinds of best practices from Bloch's Effective Java (private
>> setters, no public constructors etc) but that makes it quite a bit harder for people who are
>> newer to Java (and are used to bean like patterns)
>
> I'm not new to Java :)
> But bean properties and public setters and constructors are kind of central to the
> idea of lazy load and datamapping through Ibatis ...
> Maybe there does come a point where:
> "hey if I can't make and use these biojava objects without jumping through enormous hoops
> I'll just make my own objects..." - which obviously would be 'bad'.
> > > trevor > > > > >> -----Original Message----- >> From: Mark Schreiber [mailto:markjschreiber at gmail.com] >> Sent: 18 May 2010 10:24 >> To: PATERSON Trevor >> Cc: Andy Yates; biojava-dev at lists.open-bio.org >> Subject: Re: [Biojava-dev] Errors versus Exceptions >> >> The script example is a good one. If you truly want to do >> something quick and dirty you could use BioJava with Groovy >> and ignore exceptions completely. ?This also eludes to a >> problem with making BioJava easy to use and safe. >> >> I have come to the conclusion that you can't really make a >> Java API easy for beginners. You can make it safe using all >> kinds of best practices from Bloch's effective Java (private >> setters, no public constructers etc) but that makes it quite >> a bit harder for people who are newer to Java (and are used >> to bean like patterns). Alternatively you can use bean like >> patterns (useful in JEE settings) at the great danger that >> people will corrupt objects using public setters and cause >> all kinds of concurrent problems with threads. An experience >> developer should be able to handle both. Someone new will >> struggle with both. >> >> Languages like Groovy might be best for the casual or new >> user. Very quick to get up and running and somewhat safe >> (scripting languages are never really safe). It wouldn't be >> too hard to make some convenience Groovy objects; you could >> make a GSequence for example to make use of Groovy's >> overloaded operators. >> >> Anyhow, making BioJava easier to use was one argument for >> making exceptions inherit from RuntimeException. Ultimately I >> don't think we made it easier to use for beginners although >> it did make code tidier and hopefully forced people to think >> about defensive code. >> >> - Mark >> >> On Tue, May 18, 2010 at 5:06 PM, PATERSON Trevor >> wrote: >> > >> > >> >> On reflection I'm not sure how well you can recover from something >> >> like an unknown compound. 
When you hit that exception what are you >> >> going to do with it? >> > >> > I think it depends on what level your application is >> working at - if >> > your application is just a script doing some data munging >> maybe that >> > is a fatal exception that you want to cause the script to >> die gracefully... >> > But if you are running a large graphical display program >> and you ask >> > for the sequence of have a particular component ?- you need to be >> > catching the exception and handling the failed request >> > >> > That's my 2.5p ;) >> > >> > Trevor >> > >> > >> > >> > >> > >> > >> >> -----Original Message----- >> >> From: Andy Yates [mailto:andyyatz at gmail.com] On Behalf Of >> Andy Yates >> >> Sent: 18 May 2010 09:53 >> >> To: Mark Schreiber >> >> Cc: PATERSON Trevor; biojava-dev at lists.open-bio.org >> >> Subject: Re: [Biojava-dev] Errors versus Exceptions >> >> >> >> You've made that previous decision a lot harder now Mark >> (I knew it >> >> was over too quickly). >> >> >> >> On reflection I'm not sure how well you can recover from something >> >> like an unknown compound. When you hit that exception what are you >> >> going to do with it? Something like IOExceptions for a >> file not being >> >> open you can do something about that (say it's NFS you >> could always >> >> retry after an allotted amount of time has passed). But an unknown >> >> compound ... not sure how you can deal with that apart >> from callbacks >> >> in the parsers. If you think of this in terms of FASTA >> input into an >> >> aligner program and you have an unknown compound there all you're >> >> going to do is to pass a message back to the user saying >> you didn't >> >> understand the input sequence with the error message from the >> >> Exception something like "Compound '?' >> >> is not supported". You could always recover attempting to move >> >> through different CompoundSets seeing if one works but are >> you going >> >> to do that? 
>> >> >> >> I guess the thing to do is to start with checked exceptions in >> >> potentially recoverable situations and if they prove to be >> too clumsy >> >> then as BioJava did previously we will switch to the >> RuntimeException >> >> as a base class. We should also endeavour to pass on >> exceptions and >> >> avoid excessive exception wrapping. >> >> >> >> I do think there is a situation where we can relax these rules; I >> >> would like to see the relaxation of them when a developer >> is building >> >> say a DNASequence but only using a String. The same should >> happen for >> >> helper classes. >> >> >> >> That's my 2p :) >> >> >> >> Andy >> >> >> >> >> >> On 18 May 2010, at 04:53, Mark Schreiber wrote: >> >> >> >> > To give some historical perspective. This started happening >> >> in BioJava >> >> > 1.5. We switched BioException to extend RuntimeException >> >> (which is not >> >> > checked). This came from some thinking at the time that Java's >> >> > exception checking was a little bit broken (as exemplified by >> >> > BioJava). ?Checked exceptions are good as long as you >> don't swallow >> >> > them in the API or just use them to spew a stack trace. >> Spitting a >> >> > stack trace is pointless as the JVM will do a much better >> >> job of this >> >> > when it dumps you out. Swallowing them in the API is stupid >> >> because it >> >> > is the developer who needs to do something with it. Unless >> >> it is truly >> >> > an Error which is the only time an Error should be used >> >> (unrecoverable >> >> > problems). >> >> > >> >> > RuntimeExceptions should be used for any type of exception >> >> that can be >> >> > avoided by defensive programming. NullPointerExceptions and >> >> > IndexOutOfBounds exceptions are classic examples. 
There are >> >> hundreds >> >> > of places in Java where you could get one of these but you >> >> can easily >> >> > avoid them by checking if a collection contains an item, >> >> doing simple >> >> > if( x == null) checks, using array.length etc. For this >> >> reason these >> >> > and some other Java exceptions extend RuntimeException >> and are not >> >> > checked. Can you imagine what Java would look like if you >> >> actually had >> >> > to put every array access in a try, catch statement? >> >> > >> >> > Therefore, as much as possible I think BioJava3 should have >> >> exceptions >> >> > that extend RuntimeException and provide a defensive >> mechanism to >> >> > avoid having them happen. For example in creating a DNA >> >> sequence from >> >> > text you could provide a validator which will check if >> >> there are any >> >> > "incorrect" characters. Good code examples on the wiki >> >> should show the >> >> > use of defensive programming and not just surround >> >> everything with a >> >> > try catch. ?Importantly RuntimeExceptions should be >> >> declared so people >> >> > are aware they may occur. This is not required by the >> >> compiler but it >> >> > is good practice for documentation purposes. The Exception class >> >> > itself should probably contain some Javadoc which explains >> >> how it can >> >> > be avoided in the first place. Checked exceptions are good >> >> for cases >> >> > where you can recover (although this is often more >> challenging than >> >> > people think) but it is always easier and faster to >> check yourself, >> >> > throwing the exception and generating the stacktrace etc >> >> takes quite a >> >> > lot of effort from the JVM. >> >> > >> >> > Finally, I don't think you should be throwing Errors to often. >> >> > Although Errors are unchecked and give you the same effect as >> >> > RuntimeExceptions they imply something really bad has gone wrong. 
>> >> > These should be reserved for things like configuration >> files being >> >> > corrupt or things that would prevent BioJava from being >> >> used, like not >> >> > finding a required JAR file or plugin. You can't recover >> from these. >> >> > If your API has lots of places where Errors can occur it might >> >> > indicate poor design. >> >> > >> >> > - Mark >> >> > >> >> > On Mon, May 17, 2010 at 10:30 PM, Andy Yates >> >> wrote: >> >> >> >> >> >> There's something that irks me with checked exceptions & I >> >> found code worked better once I went to a fail-fast method >> of coding >> >> but for an API I can see the usefulness of them. >> >> >> >> >> >> Wow I think this could be the fastest any group has ever >> >> dealt with >> >> >> the Checked/Unchecked exception argument :) >> >> >> >> >> >> Andy >> >> >> >> >> >> On 17 May 2010, at 14:58, PATERSON Trevor wrote: >> >> >> >> >> >>> >> >> >>> >> >> >>>> Is the problem in the code the fact that errors are >> used or that >> >> >>>> they are not declared on the method signatures? >> >> >>> >> >> >>> Declaring them would help - but >> >> >>> Even if you declare Errors in the signature, there is >> >> nothing to force you to catch them. >> >> >>> So nothing to alert you to the chance that they may be >> >> thrown when you are writing code. >> >> >>> >> >> >>> public class Demo { >> >> >>> >> >> >>> ? ? ? public ?static void main(String[] args) throws Exception{ >> >> >>> >> >> >>> ? ? ? Demo.demo(); >> >> >>> ? ? ? ? ? ? ? System.out.println("never reached"); >> >> >>> >> >> >>> ? ? ? } >> >> >>> >> >> >>> ? ? ? public static void demo() throws Error { >> >> >>> ? ? ? ? ? ? ? throw new Error("kills the runtime"); >> >> >>> ? ? ? } >> >> >>> } >> >> >>> >> >> >>> Compiles OK with or without declaring the throwable - but >> >> throws a runtime error. >> >> >>> >> >> >>> :. 
IMHO throwing errors is not very useful to developer >> >> >>> - they can use them - IF they know the code well enough >> >> to know they >> >> >>> might be thrown >> >> >>> >> >> >>> If Exceptions were used >> >> >>> 1. they would HAVE to be dealt with - if a developer just >> >> swallows >> >> >>> something bad, than that's their responsibility for >> >> shitty code ;) >> >> >>> 2. like Errors they would be extendable, so real case >> dataloaders >> >> >>> could use their own exception types and decide how to >> >> deal with or >> >> >>> ignore them >> >> >>> >> >> >>> I guess in my background of coding for data access >> over the wire >> >> >>> with jdbc and webservices >> >> >>> - you need to be aware of the real probability of >> connection and >> >> >>> request failures >> >> >>> - so explicitly using Exceptions forces the developer to code >> >> >>> defensively >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >> >>>> >> >> >>>> On a more design note I do not like checked >> exceptions for two >> >> >>>> reasons. The first is they seem to make more junior >> developers >> >> >>>> catch Exception and swallow it. The second is >> whenever I've been >> >> >>>> Java coding in the past whenever a checked exception >> gets thrown >> >> >>>> (say IOException because of incorrect file >> permissions) I cannot >> >> >>>> deal with it which in the past has meant I either >> forward on the >> >> >>>> problem or re-throw in an unchecked exception. >> >> >>>> >> >> >>>> That said as Scooter mentioned the exception system was >> >> rushed out >> >> >>>> in the hackathon and did not have much work put into it. 
>> >> >>>> >> >> >>>> Andy >> >> >>>> >> >> >>>> On 17 May 2010, at 13:15, PATERSON Trevor wrote: >> >> >>>> >> >> >>>>> resending cos of bad headers >> >> >>>>> >> >> >>>>> ________________________________ >> >> >>>>> >> >> >>>>> From: PATERSON Trevor >> >> >>>>> Sent: 17 May 2010 13:02 >> >> >>>>> To: biojava-dev at lists.open-bio.org >> >> >>>>> Subject: Errors versus Exceptions >> >> >>>>> >> >> >>>>> >> >> >>>>> Could I ask a quick question about why BJ3 seems to use >> >> >>>> Errors rather >> >> >>>>> than Exceptions >> >> >>>>> - maybe this is already documented somewhere on the wiki >> >> >>>> for BJ3 or 1 - and you can point me there. >> >> >>>>> >> >> >>>>> Obviously for us dataloading from remote databases we need >> >> >>>> to have a method to catch connection,sql, datamapping >> errors etc. >> >> >>>>> We ususally throw DataAccessExceptions when this happens, >> >> >>>> which wrap any java.net, Ibatis and SQL exceptions. >> >> >>>>> >> >> >>>>> It is difficult for us to plug in our lazyload over your >> >> >>>> signatures in >> >> >>>>> the BJ3 sequence readers, as these don't throw exceptions, >> >> >>>> because you >> >> >>>>> seem to use Errors throughout - which dont need to be >> >> >>>> declared. Infact I cant actually see examples of you >> >> catching and >> >> >>>> responding to thrown errors. >> >> >>>>> >> >> >>>>> for example the setContents routine of the Sequence Readers >> >> >>>> throws a >> >> >>>>> CompoundNotFoundError if there is no mapping for a base - >> >> >>>> but I can't see what is done with this. >> >> >>>>> >> >> >>>>> According to the Java Lang Spec, errors are used for >> >> >>>> "serious problems >> >> >>>>> that a reasonable application should not try to catch" - >> >> >>>> but I would >> >> >>>>> have thought finding an unrecognized base in a sequence was >> >> >>>> just the >> >> >>>>> sort of thing that should be thrown and caught and acted >> >> >>>> on. 
As error >> >> >>>>> throwables are not reported in the signature - developers >> >> >>>> don't have >> >> >>>>> any clue ( or requirement) that they should be catching and >> >> >>>>> dealing with errors - which seems a bit dangerous to me... If >> >> >>>> Exceptions were >> >> >>>>> thrownrather than Errors that would force the developer >> >> to handle >> >> >>>>> runtime errors >> >> >>>>> >> >> >>>>> Cheers Trevor >> >> >>>>> >> >> >>>>> -- >> >> >>>>> The University of Edinburgh is a charitable body, >> registered in >> >> >>>>> Scotland, with registration number SC005336. >> >> >>>>> >> >> >>>>> >> >> >>>>> _______________________________________________ >> >> >>>>> biojava-dev mailing list >> >> >>>>> biojava-dev at lists.open-bio.org >> >> >>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> >> >>>> >> >> >>>> -- >> >> >>>> Andrew Yates ? ? ? ? ? ? ? ? ? Ensembl Genomes >> Engineer EMBL-EBI >> >> >>>> Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus ? Fax: >> >> >>>> +44-(0)1223-494468 Cambridge CB10 1SD, UK >> >> >>>> http://www.ensemblgenomes.org/ >> >> >>>> >> >> >>>> >> >> >>>> >> >> >>>> >> >> >>>> >> >> >>> -- >> >> >>> The University of Edinburgh is a charitable body, >> registered in >> >> >>> Scotland, with registration number SC005336. >> >> >>> >> >> >>> >> >> >>> _______________________________________________ >> >> >>> biojava-dev mailing list >> >> >>> biojava-dev at lists.open-bio.org >> >> >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> >> >> >> >> >> -- >> >> >> Andrew Yates ? ? ? ? ? ? ? ? ? Ensembl Genomes Engineer >> EMBL-EBI >> >> >> Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus ? 
From markjschreiber at gmail.com Thu May 20 00:17:43 2010
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 20 May 2010 12:17:43 +0800
Subject: [Biojava-dev] Errors versus Exceptions
In-Reply-To:
References: <428F360A0BA69249B73A710B58435CFE04DD42E210@EXSEEDMD.is.ed.ac.uk>
Message-ID:

Totally agree with Michael. Extensive lazy loading will cause a big hit on your performance. Be selective in how you use it; the right choice depends entirely on your use case. If people are likely to list every sequence ID in the database but only work on a few, then you definitely want lazy loading at that point. Once they have selected the few they will work on, you may want to eagerly load the parts they will need (such as the sequence and feature table). It all depends on what your application is for.

I would usually say that an API for general use shouldn't specify the loading mode. It should be configured by the app developer to suit their application (assuming you are going to share your Ensembl data binding API).
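A minimal sketch of the "let the application choose the loading mode" idea, for illustration only. The names `LoadingMode`, `Lazy` and `SequenceRecord` are hypothetical and are not BioJava or Ensembl classes; the "database" is a stand-in lambda that counts how often it is called.

```java
import java.util.function.Supplier;

public class LoadingModeSketch {

    /** Hypothetical switch set by the application developer; not a BioJava type. */
    public enum LoadingMode { LAZY, EAGER }

    /** Wraps an expensive loader and runs it at most once, on first access. */
    public static final class Lazy<T> implements Supplier<T> {
        private Supplier<T> loader;
        private T value;
        private boolean loaded;

        public Lazy(Supplier<T> loader) {
            this.loader = loader;
        }

        @Override
        public synchronized T get() {
            if (!loaded) {
                value = loader.get();
                loaded = true;
                loader = null; // release the loader once the value is cached
            }
            return value;
        }
    }

    /** Hypothetical record: the ID is always in memory, the residues may be deferred. */
    public static final class SequenceRecord {
        private final String id;
        private final Lazy<String> residues;

        public SequenceRecord(String id, Supplier<String> residueLoader, LoadingMode mode) {
            this.id = id;
            this.residues = new Lazy<>(residueLoader);
            if (mode == LoadingMode.EAGER) {
                this.residues.get(); // pay the loading cost up front
            }
        }

        public String getId() { return id; }

        public String getResidues() { return residues.get(); }
    }

    public static void main(String[] args) {
        int[] loads = {0}; // counts how often the stand-in "database" is hit
        Supplier<String> fakeDatabase = () -> { loads[0]++; return "ACGT"; };

        // Listing many records: only IDs are needed, so nothing is loaded yet.
        SequenceRecord listed = new SequenceRecord("seq1", fakeDatabase, LoadingMode.LAZY);
        System.out.println(listed.getId() + " loads=" + loads[0]);

        // First access to the residues triggers the single load.
        System.out.println(listed.getResidues() + " loads=" + loads[0]);

        // A record the user has selected to work on is loaded when it is created.
        SequenceRecord selected = new SequenceRecord("seq2", fakeDatabase, LoadingMode.EAGER);
        System.out.println(selected.getId() + " loads=" + loads[0]);
    }
}
```

The point of the sketch is only that the mode is a constructor argument chosen by the caller, so the same data-binding code can serve a "list everything" view and a "work on a few" view.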
- Mark

On Thu, May 20, 2010 at 10:44 AM, Michael Heuer wrote:
> On Tue, 18 May 2010, PATERSON Trevor wrote:
>
>> Yes... But...
>>
>> If our implementation relies on a lazy load underneath every Sequence method implementation..
>>
>> Then every method can throw an IO type Exception
>>
>> In essence you can instantiate a valid Sequence object with a LazyLoading Reader before you get the valid SequenceData,
>> unlike the case where you can check that you can make a valid Sequence object from the Reader.
>
> Don't make a lazy load Sequence then. Have the reader populate a (package
> private) memento object, and then use that to create a fully formed
> Sequence.
>
> Lazy loading is a bad smell.
>
>   michael

From andy.law at roslin.ed.ac.uk Thu May 20 04:30:51 2010
From: andy.law at roslin.ed.ac.uk (LAW Andrew)
Date: Thu, 20 May 2010 09:30:51 +0100
Subject: [Biojava-dev] Errors versus Exceptions
In-Reply-To:
References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E20C@EXSEEDMD.is.ed.ac.uk> <73D847AF-1C33-4229-BB74-E20C354C13BD@ebi.ac.uk> <99C91E4D-0CEE-4F65-8BCD-184F4011FE8B@ebi.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E20F@EXSEEDMD.is.ed.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E212@EXSEEDMD.is.ed.ac.uk>
Message-ID:

On 20 May 2010, at 05:11, Mark Schreiber wrote:

> If you are using BioJava objects as fake DTOs or EntityBean
> look-alikes, you should really question why you are using BioJava
> objects in the first place. I am not sure what BioJava3 objects will look
> like, but BJ1.X objects are definitely not good at this.
>
> It also raises an interesting point which I haven't seen discussed
> much on the list: what will BJ3 be for (or not for)? One of the
> painful lessons (for me) from working on BioJava is that you can't make an
> API do everything. The more modular approach to BJ3 should help avoid
> this. I see nothing wrong with having a module that is more suitable
> for the kind of sequence-data-binding work you are proposing. This
> module's objects should definitely have public constructors and public
> setters. Why not make use of Entity Beans (post EJB 3) while you're at
> it? If it is in its own module it will not corrupt the other parts of
> BioJava with "unsafe" beany objects.
>
> In this case making your own objects (and sharing them) would actually
> be a whole lot better than trying to shoehorn an API that wasn't made
> for this. Some IDEs will even auto-generate databinding objects for
> you (although I understand there are some strange cases in Ensembl
> that might not make this a good idea).

I think this is really the point that we have been picking at all along. The way that BJ3 objects currently seem to be set up makes them difficult to use in any manner other than that intended by the core BJ3 developers. We were hoping that all the generic sequence and coordinate transformation "goodness" would be available to us in a bean format, and then all we would have to do (!!) would be to define the data access methods necessary to populate those objects. That seems not to be the way that things are set up, and mapping our ideas and thought processes to BJ3 has not been as easy as we would have liked.

Later,

Andy
--------
Yada, yada, yada...

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336
Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender.
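The two suggestions quoted in this part of the thread (a separate module of bean-style objects with public constructors and setters, and a memento that a reader populates before a fully formed Sequence is created) can be sketched together as follows. This is an illustration under stated assumptions, not BioJava code: `SequenceMemento` and `DnaSequence` are hypothetical names, the allowed alphabet `ACGTN` is an arbitrary choice for the example, and the use of an unchecked `IllegalArgumentException` is just one of the options debated in the thread.

```java
import java.util.Locale;

public class MementoSketch {

    /** Hypothetical mutable, bean-style holder that a reader or data-binding layer fills in. */
    public static final class SequenceMemento {
        private String accession;
        private String residues;

        public SequenceMemento() { }

        public String getAccession() { return accession; }
        public void setAccession(String accession) { this.accession = accession; }

        public String getResidues() { return residues; }
        public void setResidues(String residues) { this.residues = residues; }
    }

    /** Hypothetical immutable sequence; it can only be built from a complete, valid memento. */
    public static final class DnaSequence {
        private final String accession;
        private final String residues;

        private DnaSequence(String accession, String residues) {
            this.accession = accession;
            this.residues = residues;
        }

        /** Validates once, at the boundary, so later method calls cannot fail on bad data. */
        public static DnaSequence from(SequenceMemento memento) {
            if (memento.getAccession() == null || memento.getResidues() == null) {
                throw new IllegalArgumentException("memento is not fully populated");
            }
            String upper = memento.getResidues().toUpperCase(Locale.ROOT);
            if (!upper.matches("[ACGTN]*")) { // assumed alphabet, for illustration only
                throw new IllegalArgumentException(
                        "unsupported compound in " + memento.getAccession());
            }
            return new DnaSequence(memento.getAccession(), upper);
        }

        public String getAccession() { return accession; }
        public String getResidues() { return residues; }
    }

    public static void main(String[] args) {
        // The data-access layer populates the bean through its public setters...
        SequenceMemento memento = new SequenceMemento();
        memento.setAccession("X1");
        memento.setResidues("acgt");

        // ...and the immutable object is created only once all the data is present.
        DnaSequence sequence = DnaSequence.from(memento);
        System.out.println(sequence.getAccession() + " " + sequence.getResidues());
    }
}
```

In this arrangement all I/O happens while the memento is being filled, so any connection or SQL failure surfaces there rather than from every method of the finished sequence object.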
From ayates at ebi.ac.uk Thu May 20 04:53:43 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Thu, 20 May 2010 09:53:43 +0100
Subject: [Biojava-dev] Where BioJava3 is going
In-Reply-To:
References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E20C@EXSEEDMD.is.ed.ac.uk> <73D847AF-1C33-4229-BB74-E20C354C13BD@ebi.ac.uk> <99C91E4D-0CEE-4F65-8BCD-184F4011FE8B@ebi.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E20F@EXSEEDMD.is.ed.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E212@EXSEEDMD.is.ed.ac.uk>
Message-ID: <020C9EB8-C989-4193-817D-1578DD6A31E3@ebi.ac.uk>

Changed the subject to something more relevant.

I think this reflects the current state of the API rather than the intention of where it can go. Scooter's main focus is working towards solid objects to represent entities on a Sequence; GeneSequence and ChromosomeSequence are perfect examples of this. My focus is more on working at the generic level, so more programming against Sequence interfaces; translation, reversing & complementing strongly point towards this. Between us, though, I really do hope we can generate a framework which lets a user come in and use the more solid classes but also lets another API use the backing classes.

The rest will be coming soon. I am also looking at supporting features and attributes against Sequences, but this is still some time off. I'm also looking at coordinate translation, but again this is some time away (mostly because I do not have the test case to do it). The enemy here is time, as I am sure it is for your group.

Probably the best thing I can do is put some design documents on the wiki about where I think parts of the API should go, and people can pick them to pieces as much as they want to. Interfaces and test cases describing the intended behaviour would also help. Again, this will take time :(

Andy

On 20 May 2010, at 09:30, LAW Andrew wrote:

>
> On 20 May 2010, at 05:11, Mark Schreiber wrote:
>
>> If you are using BioJava objects as fake DTOs or EntityBean
>> look-alikes, you should really question why you are using BioJava
>> objects in the first place. I am not sure what BioJava3 objects will look
>> like, but BJ1.X objects are definitely not good at this.
>>
>> It also raises an interesting point which I haven't seen discussed
>> much on the list: what will BJ3 be for (or not for)? One of the
>> painful lessons (for me) from working on BioJava is that you can't make an API do everything.
The more modular approach to BJ3 should help avoid >> this. I see nothing wrong with having a module that is more suitable >> for the kind of work sequence-data-binding you are proposing. This >> modules objects should definitely have public constructors and public >> setters. Why not make use of Entity Beans (Post EJB 3) while your at >> it. If it is in it's own module it will not corrupt the other parts of >> BioJava with "unsafe" beany objects. >> >> In this case making your own objects (and sharing them) would actually >> be a whole lot better than trying to shoe horn an API that wasn't made >> for this. Some IDEs will even auto-generate databinding objects for >> you; although, I understand there is some strange cases in Ensembl >> that might not make this a good idea). > > > I think this is really the point that we have been picking at all along. The current way that BJ3 objects seem to be set up makes them difficult to use in any manner other than that intended by the core BJ3 developers. We were hoping that there would be all the generic sequence and coordinate transformation "goodness" available for us in a bean format and then all we would have to do (!!) would be to define the data access methods necessary to populate those objects. That seems to be not the way that things are set up and mapping our ideas and thought processes to BJ3 has not been as easy as we would have liked. > > > > Later, > > Andy > -------- > Yada, yada, yada... > > The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336 > Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. > > > > > > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. 
> > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From markjschreiber at gmail.com Thu May 20 04:59:08 2010 From: markjschreiber at gmail.com (Mark Schreiber) Date: Thu, 20 May 2010 16:59:08 +0800 Subject: [Biojava-dev] Where BioJava3 is going In-Reply-To: <020C9EB8-C989-4193-817D-1578DD6A31E3@ebi.ac.uk> References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk> <73D847AF-1C33-4229-BB74-E20C354C13BD@ebi.ac.uk> <99C91E4D-0CEE-4F65-8BCD-184F4011FE8B@ebi.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E20F@EXSEEDMD.is.ed.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E212@EXSEEDMD.is.ed.ac.uk>

<020C9EB8-C989-4193-817D-1578DD6A31E3@ebi.ac.uk> Message-ID:

Any reason why we couldn't have a Maven module where the Sequence<C extends Compound> interface goodness is backed by an Ensembl-mapping beany implementation? This would seem to me to be the best approach. That way, if you want to use Ensembl and BioJava, you could just drop in the appropriate module.

- Mark

From ayates at ebi.ac.uk Thu May 20 05:38:26 2010 From: ayates at ebi.ac.uk (Andy Yates) Date: Thu, 20 May 2010 10:38:26 +0100 Subject: [Biojava-dev] Where BioJava3 is going In-Reply-To: References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk> <73D847AF-1C33-4229-BB74-E20C354C13BD@ebi.ac.uk> <99C91E4D-0CEE-4F65-8BCD-184F4011FE8B@ebi.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E20F@EXSEEDMD.is.ed.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E212@EXSEEDMD.is.ed.ac.uk>

<020C9EB8-C989-4193-817D-1578DD6A31E3@ebi.ac.uk> Message-ID:

That's got to be the way to go, but the interface needs to be slightly more flexible WRT features & attributes. Currently Sequence defines just that whereas Sequence would really be used as the container for compounds & "other things". If that's in place then hopefully this will help the guys at the Roslin to do what they need to do.

Andy

--
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From markjschreiber at gmail.com Thu May 20 05:42:09 2010 From: markjschreiber at gmail.com (Mark Schreiber) Date: Thu, 20 May 2010 17:42:09 +0800 Subject: [Biojava-dev] Where BioJava3 is going In-Reply-To: References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk> <99C91E4D-0CEE-4F65-8BCD-184F4011FE8B@ebi.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E20F@EXSEEDMD.is.ed.ac.uk>
<428F360A0BA69249B73A710B58435CFE04DD42E212@EXSEEDMD.is.ed.ac.uk>

<020C9EB8-C989-4193-817D-1578DD6A31E3@ebi.ac.uk>

Message-ID:

If the interface doesn't quite fit you could build some bridge conversion code, although this takes you further from BioJava3 and makes its use more questionable for your purpose.

- Mark

From ayates at ebi.ac.uk Thu May 20 05:44:00 2010 From: ayates at ebi.ac.uk (Andy Yates) Date: Thu, 20 May 2010 10:44:00 +0100 Subject: [Biojava-dev] Where BioJava3 is going In-Reply-To: References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk> <99C91E4D-0CEE-4F65-8BCD-184F4011FE8B@ebi.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E20F@EXSEEDMD.is.ed.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E212@EXSEEDMD.is.ed.ac.uk>

<020C9EB8-C989-4193-817D-1578DD6A31E3@ebi.ac.uk>

Message-ID: <49849EE0-B319-47C7-820E-CD14EA12088E@ebi.ac.uk>

If someone is going to use the interface & code then it's got to be 1:1 IMHO. Glue code is fine for existing APIs but not for new ones :)

--
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From andy.law at roslin.ed.ac.uk Thu May 20 05:55:57 2010 From: andy.law at roslin.ed.ac.uk (LAW Andrew) Date: Thu, 20 May 2010 10:55:57 +0100 Subject: [Biojava-dev] Where BioJava3 is going In-Reply-To: <49849EE0-B319-47C7-820E-CD14EA12088E@ebi.ac.uk> References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk> <99C91E4D-0CEE-4F65-8BCD-184F4011FE8B@ebi.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E20F@EXSEEDMD.is.ed.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E212@EXSEEDMD.is.ed.ac.uk>

<020C9EB8-C989-4193-817D-1578DD6A31E3@ebi.ac.uk>

<49849EE0-B319-47C7-820E-CD14EA12088E@ebi.ac.uk> Message-ID: <85CBA861-F912-45A5-86CE-24CFD4825CA2@exseed.ed.ac.uk>

An interface alone is no use to me if I want the coordinate transformation goodness. That requires real, active code. My ideal design has three layers: an Interface layer (good for coding and testing), underpinned by a Bean-style Implementation layer (good for shared common code that actually implements the functionality), underpinned in turn by a third package of data access implementation. You can build your third layer and I can build mine, but we can all leverage layers 1 and 2.

Later,

Andy
--------
Yada, yada, yada...

From ayates at ebi.ac.uk Thu May 20 06:16:21 2010 From: ayates at ebi.ac.uk (Andy Yates) Date: Thu, 20 May 2010 11:16:21 +0100 Subject: [Biojava-dev] Where BioJava3 is going In-Reply-To: <85CBA861-F912-45A5-86CE-24CFD4825CA2@exseed.ed.ac.uk> References: <428F360A0BA69249B73A710B58435CFE04DD42E20B@EXSEEDMD.is.ed.ac.uk> <99C91E4D-0CEE-4F65-8BCD-184F4011FE8B@ebi.ac.uk> <428F360A0BA69249B73A710B58435CFE04DD42E20F@EXSEEDMD.is.ed.ac.uk>