From gwaldon at geneinfinity.org Wed Sep 6 19:14:28 2006
From: gwaldon at geneinfinity.org (george waldon)
Date: Wed, 06 Sep 2006 16:14:28 -0700
Subject: [Biojava-dev] GenbankFormat and BASE COUNT
Message-ID: <200609062314.k86NESGu081640@mmm1924.dulles19-verio.com>

>From: mark.schreiber at novartis.com [mailto:mark.schreiber at novartis.com]
>Are you OK to watch for format changes?

Sorry for the delay in responding. There are effectively a few incoming modifications.

- new naturally occurring amino acid pyrrolysine (Pyl/O - 22nd) will become official on release 156.0, same with EMBL this fall. We'll have to adjust the PROTEIN and PROTEIN_TERM alphabets and maybe have more translation tables.
- talking about translation tables, I noticed a while ago that the official genbank/EMBL/DDBJ feature table contains 23 genetic code tables whereas Biojava only describes 13. We should probably stick to genbank/EMBL/DDBJ translation tables.
- Xle/J (leucine/isoleucine) will be legal starting Genbank 156.0 (October 2006).
- Feature location syntax X.Y to be discontinued as of October 2006. Record will be changed, although the conversion rule is not given. Maybe it is time to remove this type of fuzziness from Biojava?

Still not taken into account in org.biojavax.bio.seq.io.GenbankFormat:

- SEGMENT keyword, not currently parsed, maybe on purpose.
- CONTIG keyword, same as above. Example: AE014134, this is an entire chromosome.

I can do the table and alphabet modifications when they become official.

George

From gwaldon at geneinfinity.org Mon Sep 11 20:38:41 2006
From: gwaldon at geneinfinity.org (george waldon)
Date: Mon, 11 Sep 2006 17:38:41 -0700
Subject: [Biojava-dev] Problem with ranks
Message-ID: <200609120038.k8C0cfDV065591@mmm1924.dulles19-verio.com>

Hi,

I am having difficulties to use ranking with some objects found in SimpleRichSequence.
There are 6 objects contained in SimpleRichSequence which are found within collections, namely SimpleComment, SimpleRankedCrossRef, SimpleRankedDocRef, SimpleNote, SimpleBioEntryRelationShip, and SimpleRichFeature. Each of them is associated with a TreeSet and uses to some extent ranking for comparison.

Ranks are never described but the name suggests that they are positive integers, in consecutive order and not identical for similar objects within the same sequence. Here are some questions:

- Can rank be negative? We would assume not but this is never checked.
- If rank cannot be negative, where do they start, 0, 1? SimpleBioEntryRelationShip suggests that they start at 1 with 0 reserved for absence of ranking.
- Are we expecting ranks to be in consecutive order (or in reasonable consecutive order) or values like 1000, 2000, etc. are possible or even expected?
- Can we have duplicate ranks? We would assume not but SimpleRichFeature javadoc indicates that equal ranks are *acceptable*.

SimpleBioEntryRelationship getRank method returns an Integer object, all the other objects return an integer number. Any reason for this?

Moreover 3 of these objects do not have a setRank method: SimpleComment, SimpleRankedCrossRef and SimpleRankedDocRef. How do I insert a comment in the middle of other comments, how do I change the order of these objects without creating new ones?

All these objects have an ordering consistent with equality except SimpleRichFeature. SimpleRichFeature are sorted by rank only. Its compareTo method also never returns 0. A consequence is that removeFeature in ThinRichSequence never works because TreeSet uses compareTo for testing equality.

All compareTo methods use rank first except SimpleRankedDocRef which does not use rank at all (but is ranked as its name indicates).

A few objects are nearly identical when they are equal but not all. SimpleNote compares by rank then by term but not by value.
SimpleNotes of same rank and term but different values are nevertheless equal. SimpleRankedDocRef can be equal and have different locations? I can understand this.

We need a clear definition of what ranks are, what the ordering they imply is intended for and how to deal with duplicate ranks. Maybe we could have an interface that encapsulates the concept of ranking (e.g. interface Ranked, methods setRank() and getRank()) and all this information grouped in the javadoc. It seems easier to derive exceptions from a common pattern than the opposite. Maybe we also need separate comparators when they are not consistent with equal.

Thanks,
George

From mark.schreiber at novartis.com Mon Sep 11 23:37:55 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 12 Sep 2006 11:37:55 +0800
Subject: [Biojava-dev] Problem with ranks
Message-ID:

Hi George, thanks for raising these issues. We should fix this before biojava 1.5 finishes its beta testing. See my responses below. Richard Holland and David Scott will no doubt have comments too.

>I am having difficulties to use ranking with some objects found in SimpleRichSequence. There are 6 objects
>contained in SimpleRichSequence which are found within collections, namely SimpleComment, SimpleRankedCrossRef,
>SimpleRankedDocRef, SimpleNote, SimpleBioEntryRelationShip, and SimpleRichFeature. Each of them is associated with
>a TreeSet and uses to some extent ranking for comparison.
>
>Ranks are never described but the name suggests that they are positive integer, in consecutive order and not
>identical for similar objects within the same sequence. Here are some questions:

Ranks actually come from the BioSQL schema. They are used so that lists of features, comments etc that are stored in database tables (or any other collection) can be reassembled in the same order that they are found in the original flatfile (Genbank etc). Simply put they are used to preserve order.

> - Can rank be negative?
> We would assume not but this is never checked.

I suppose it could be but it would make no sense given the above description. We should probably document this in the javadocs and suggest that classes enforce the non-negative rule.

> - If rank cannot be negative, where do they start, 0, 1? SimpleBioEntryRelationShip suggests that they start at 1 with 0 reserved for absence of ranking.

At the moment this strictly depends on the creating object. Typically this would be a RichSequenceFormat implementation. The Genbank format appears to start numbering from either 0 or 1 (for comments). There should be a common rule.

>- Are we expecting ranks to be in consecutive order (or in reasonable consecutive order) or values like 1000,
>2000, etc. are possible or even expected?

Is there any reason why we need to enforce this rule? It would be tidier but it would be a pain to have to re-order everything just because one object is deleted. The genbank parser currently numbers sequentially.

>- Can we have duplicate ranks? We would assume not but SimpleRichFeature javadoc indicates that equal ranks are
>*acceptable*.

Certainly all the RankedCrossRefs returned by the Genbank parser have the same rank (0). It is possible as long as the objects are somehow unique. If equals() is true then the objects are overwritten. I don't think any Ranked object currently relies only on rank for equality (or for the compare() method either). The Unit tests do a pretty good job of testing equals and compare and making sure they return logically equivalent values.

Although it is possible it may not be desirable. Any thoughts?

>SimpleBioEntryRelationship getRank method returns an Integer object, all the other objects return an integer
>number. Any reason for this?

I think Richard has a reason. Something to do with Hibernate?? Richard??

>Moreover 3 of these objects do not have a setRank method: SimpleComment, SimpleRankedCrossRef and
>SimpleRankedDocRef.
>How do I insert a comment in the middle of other comments, how do I change the order of these
>objects without creating new ones?

Possibly they should. Making things mutable is always tricky but the other objects with setRank methods register change listeners and have the option of vetoing the change so it can be done safely. The ChangeListener could be in charge of re-ordering ranks if you insert into the middle.

>All these objects have an ordering consistent with equality except SimpleRichFeature. SimpleRichFeature are sorted
>by rank only. Its compareTo method also never returns 0. A consequence is that removeFeature in ThinRichSequence
>never works because TreeSet uses compareTo for testing equality.

OK, that sounds like a bug that we have missed in the Unit tests. I will report it to bugzilla and fix it when I have time.

>All compareTo methods use rank first except SimpleRankedDocRef which does not use rank at all (but is ranked as
>its name indicates).

We should change this. Another bugzilla report.

>A few objects are nearly identical when they are equal but not all. SimpleNote compares by rank then by term but
>not by value. SimpleNotes of same rank and term but different values are nevertheless equal. SimpleRankedDocRef
>can be equal and have different locations? I can understand this.

This is because the term of a SimpleNote is an ontology term and should therefore have only one value. Two Notes with the same term are therefore the same (or should be). For example if the term or keyword of the Note is Organism: there should only be one of these Notes.

>We need a clear definition of what ranks are, what the ordering they imply is intended for and how to deal with
>duplicate ranks. Maybe we could have an interface that encapsulates the concept of ranking (e.g. interface Ranked,
>methods setRank() and getRank()) and all this information grouped in the javadoc. It seems easier to derive
>exceptions from a common pattern than the opposite.
>Maybe we also need separate comparators when they are not
>consistent with equal.

I think we should have a 'Ranked' interface with clear rules in the javadoc. I can't think of any good reason why comparable and equal should not be consistent. We should try and keep them the same as much as possible.

- Mark

From Robin.Emig at pioneer.com Tue Sep 12 18:34:34 2006
From: Robin.Emig at pioneer.com (Emig, Robin)
Date: Tue, 12 Sep 2006 15:34:34 -0700
Subject: [Biojava-dev] Java1.5
In-Reply-To:
Message-ID:

I'm a little confused about whether the Biojava 1.5 is using java 1.5. Looking through the email list it appears to be so, but the default compile options in the build file are still for java 1.4. Can anyone clarify for me?

Thanks
Robin

This communication is for use by the intended recipient and contains information that may be Privileged, confidential or copyrighted under applicable law. If you are not the intended recipient, you are hereby formally notified that any use, copying or distribution of this e-mail, in whole or in part, is strictly prohibited. Please notify the sender by return e-mail and delete this e-mail from your system. Unless explicitly and conspicuously designated as "E-Contract Intended", this e-mail does not constitute a contract offer, a contract amendment, or an acceptance of a contract offer. This e-mail does not constitute a consent to the use of sender's contact information for direct marketing purposes or for transfers of data to third parties.

Francais Deutsch Italiano Espanol Portugues Japanese Chinese Korean
http://www.DuPont.com/corp/email_disclaimer.html

From mark.schreiber at novartis.com Tue Sep 12 21:02:15 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Wed, 13 Sep 2006 09:02:15 +0800
Subject: [Biojava-dev] Java1.5
Message-ID:

Biojava 1.5 officially uses JDK 1.4

- Mark

"Emig, Robin"
09/13/2006 06:34 AM
To:
cc:
Subject: Java1.5

I'm a little confused about whether the Biojava 1.5 is using java 1.5.
Looking through the email list it appears to be so, but the default compile options in the build file are still for java 1.4. Can anyone clarify for me?

Thanks
Robin

From gwaldon at geneinfinity.org Wed Sep 13 01:33:43 2006
From: gwaldon at geneinfinity.org (george waldon)
Date: Tue, 12 Sep 2006 22:33:43 -0700
Subject: [Biojava-dev] Re: Problem with ranks
Message-ID: <200609130533.k8D5Xi63019465@mmm1924.dulles19-verio.com>

Thank you Mark and Richard for your exhaustive answers. This is very much appreciated. I am not a database person and I was completely missing the other side of the story.

Perhaps the Bio* projects could agree quickly on ranks before someone populates a database with exotic values. It seems that there is a consensus on this list for having ranks positive and non null integers when they are defined and equal to zero otherwise. This would also solve the problem of the nullable rank of BioEntryRelationship (which could be then equivalent to an integer value equal to zero).

Also, I improperly reported that SimpleRankedDocRef compareTo method does not use rank. My apologies for the mistake.

Thanks
George

From bugzilla-daemon at portal.open-bio.org Mon Sep 25 05:52:56 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 25 Sep 2006 05:52:56 -0400
Subject: [Biojava-dev] [Bug 2107] New: LabelledSequenceRenderer
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2107

           Summary: LabelledSequenceRenderer
           Product: BioJava
           Version: 1.4
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: bio
        AssignedTo: biojava-dev at biojava.org
        ReportedBy: jolyon.holdstock at ogt.co.uk

Using a LabelledSequenceRenderer works as expected in a SequencePanel, but not in TranslatedSequencePanel. In the latter the label is not displayed. Also while sequence is displayed from the correct start point the actual sequence is incorrect. Below is some example code that demonstrates the problem.
//Example code --------------------------------------------------
//Java libraries
import java.awt.Color;
import java.awt.BorderLayout;
//Java extension libraries
import javax.swing.JFrame;
//BioJava libraries
import org.biojava.bio.BioException;
import org.biojava.utils.ChangeVetoException;
import org.biojava.bio.symbol.RangeLocation;
import org.biojava.bio.gui.sequence.SymbolSequenceRenderer;
import org.biojava.bio.seq.Sequence;
import org.biojava.bio.seq.DNATools;
import org.biojava.bio.gui.sequence.SequencePanel;
import org.biojava.bio.gui.sequence.TranslatedSequencePanel;
import org.biojava.bio.gui.sequence.LabelledSequenceRenderer;

public class TestSequencePanel extends JFrame {
    private Sequence seq;
    private SequencePanel sp;
    private TranslatedSequencePanel tsp;

    public TestSequencePanel(){
        try {
            //Create the SequencePanel and TranslatedSequencePanel
            sp = new SequencePanel();
            tsp = new TranslatedSequencePanel();

            //Create a DNA sequence
            seq = DNATools.createDNASequence("AGATAGCTAGCTAGATATGATAGATCGATAGCAAGCTAGCATCGACTACGATC","DNA");

            //Create a renderer for the sequence
            SymbolSequenceRenderer ssr = new SymbolSequenceRenderer();

            //Create the LabelledSequenceRenderer
            LabelledSequenceRenderer lsr = new LabelledSequenceRenderer(50, 50);
            lsr.setFillColor(Color.white);
            lsr.setRenderer(ssr);
            lsr.addLabelString("Seq");

            //Set up the SequencePanel
            sp.setSequence(seq);
            sp.setRenderer(lsr);
            sp.setRange(new RangeLocation(1,300));

            //Set up the TranslatedSequencePanel
            tsp.setSequence(seq);
            tsp.setRenderer(lsr);
            tsp.setScale(12);
        }
        catch(ChangeVetoException e){
            System.out.println("ChangeVetoException: " + e);
        }
        catch(BioException e){
            System.out.println("BioException: " + e);
        }

        //Add the panels to the frame
        this.getContentPane().setLayout(new BorderLayout());
        this.getContentPane().add(sp, BorderLayout.NORTH);
        this.getContentPane().add(tsp, BorderLayout.CENTER);
        setLocation(100,100);
        setSize(400,200);
        setVisible(true);
    }

    public static void main(String[] args) {
        new TestSequencePanel();
    }
}

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From ClevelandJ at BATTELLE.ORG Tue Sep 26 10:42:57 2006
From: ClevelandJ at BATTELLE.ORG (Cleveland, John S)
Date: Tue, 26 Sep 2006 10:42:57 -0400
Subject: [Biojava-dev] Percentage similarity
Message-ID: <251E388086D4D64B8413DCC66EA438084164DB@WS-BSO-MSE1.milky-way.battelle.org>

Does anyone know how to retrieve the percentage similarity from a BLAST result using BioJava? This field is not available from SeqSimilaritySearchSubHit. SeqSimilaritySearchSubHit does have the getEValue() and getScore() methods, so I was a little confused about not finding the "percentage identity" and "percentage similarity" fields.

I followed the directions in http://biojava.org/wiki/BioJava:CookBook:Blast:Echo, but again the percentage similarity does not seem to be getting parsed by the BlastLikeSaxParser.
Here is the result of the aforementioned code:

startHit()
HitProp: subjectSequenceLength: 299
HitProp: subjectId: lcd|5392-AAA98259
HitProp: subjectDescription:
startSubHit()
SubHitProp: score: 24.6
SubHitProp: expectValue: 6.9
SubHitProp: numberOfIdentities: 14
SubHitProp: alignmentSize: 42
SubHitProp: percentageIdentity: 33
SubHitProp: numberOfIdentities: 14
SubHitProp: numberOfPositives: 23
SubHitProp: queryFrame: plus1
SubHitProp: querySequenceStart: 928
SubHitProp: querySequenceEnd: 1047
SubHitProp: querySequence: TKDGKTQEWEMDNPGN--DFMTGSKDTYTFKLKDENLKIDDI
SubHitProp: subjectSequenceStart: 126
SubHitProp: subjectSequenceEnd: 167
SubHitProp: subjectSequence: TDDGKIREYELPNKGSYPSFITLGSDNALWFTENQNNAIGRI
endSubHit()
endHit()

Thanks,
John Cleveland

From bugzilla-daemon at portal.open-bio.org Thu Sep 28 05:27:11 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 28 Sep 2006 05:27:11 -0400
Subject: [Biojava-dev] [Bug 2107] LabelledSequenceRenderer
In-Reply-To:
Message-ID: <200609280927.k8S9RB8w018668@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2107

jolyon.holdstock at ogt.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jolyon.holdstock at ogt.co.uk

------- Comment #1 from jolyon.holdstock at ogt.co.uk 2006-09-28 05:27 -------
I've had a look at the TranslatedSequencePanel code and seem to have a work around. I say 'seem' as I'm not an expert on Graphics2D

When using the LabelledSequenceRenderer in the TSP the paint method in the TSP doesn't set the clip for the renderer correctly. I have edited the following code in the TSP to change

clip.x
clip.width
The point for g2.translate

This sets the clip correctly, the label renders and the correct sequence displayed.
//OLD CODE ==========================================================
if (direction == HORIZONTAL) {
    // Clip x to edge of delegate renderer's leader
    clip.x = renderer.getMinimumLeader(this);
    clip.y = 0.0;
    // Set the width to visible symbols + the delegate
    // renderer's minimum trailer (which may have something in
    // it to render).
    clip.width = sequenceToGraphics(getVisibleSymbolCount() + 1)
        + renderer.getMinimumTrailer(this);
    clip.height = renderer.getDepth(this);
    g2.translate(leadingBorder.getSize() + insets.left, insets.top);
}

//NEW CODE ============================================================
if (direction == HORIZONTAL) {
    // Clip x to edge of delegate renderer's leader
    //clip.x = renderer.getMinimumLeader(this);
    clip.x = 0 - renderer.getMinimumLeader(this);
    clip.y = 0.0;
    // Set the width to visible symbols + the delegate
    // renderer's minimum trailer (which may have something in
    // it to render).
    clip.width = sequenceToGraphics(getVisibleSymbolCount() + 1)
        + renderer.getMinimumLeader(this)
        + renderer.getMinimumTrailer(this);
    clip.height = renderer.getDepth(this);
    g2.translate(leadingBorder.getSize() - clip.x + insets.left, insets.top);
}

I have used this code with the RulerRenderer via the MultiLineRenderer and think that the ruler doesn't render numbers/ticks accurately for the sequence in the TSP. It's marginal and only relevant at high resolution but I'll have a look at this.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From smh1008 at cam.ac.uk Thu Sep 28 11:51:36 2006
From: smh1008 at cam.ac.uk (David Huen)
Date: 28 Sep 2006 16:51:36 +0100
Subject: [Biojava-dev] tRNA anticodon alphabet
Message-ID:

Hi,

Would there be any objection to adding an alphabet to deal with anticodons? This would involve an additional alphabet comprising the current RNA alphabet extended with inosine.
Regards,
David Huen

From smh1008 at cam.ac.uk Thu Sep 28 11:48:25 2006
From: smh1008 at cam.ac.uk (David Huen)
Date: 28 Sep 2006 16:48:25 +0100
Subject: [Biojava-dev] CodonPrefTools API
Message-ID:

Hi,

I wish to add to the CodonPrefTools API convenience methods that return each of the 64 codons. It would seem better to put it here in this less used API than clutter up the RNATools API. If anyone wishes to object could they do so now please?

Regards,
David Huen

From mark.schreiber at novartis.com Thu Sep 28 21:17:13 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 29 Sep 2006 09:17:13 +0800
Subject: [Biojava-dev] tRNA anticodon alphabet
Message-ID:

I think it would be a useful addition.

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)
Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com
phone +65 6722 2973
fax +65 6722 2910

David Huen
Sent by: biojava-dev-bounces at lists.open-bio.org
09/28/2006 11:51 PM
To: biojava-dev at biojava.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-dev] tRNA anticodon alphabet

Hi,

Would there be any objection to adding an alphabet to deal with anticodons? This would involve an additional alphabet comprising the current RNA alphabet extended with inosine.

Regards,
David Huen
_______________________________________________
biojava-dev mailing list
biojava-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-dev

From bubba.puryear at gmail.com Fri Sep 29 12:22:03 2006
From: bubba.puryear at gmail.com (Bubba Puryear)
Date: Fri, 29 Sep 2006 12:22:03 -0400
Subject: [Biojava-dev] GenbankFormat (biojavax) and comments with leading whitespace
Message-ID:

Hey all,

I've been using biojava for some time now on my project for reading genbank flat files, but until recently I haven't been writing any.
Our client makes extensive use of VectorNTI (version 9, I think) and I was doing some edits to genbank files (via biojavax) and noticed that comment values get their whitespace trimmed.

Turns out VNTI splats a load of state that it needs in the comment section in a fairly lispish looking syntax... but indentation appears to be important. In particular, VNTI won't read the files I've edited that have had their whitespace munged. I have some local changes to the parser that preserve leading/trailing whitespace for section values for top level sections.

I've run the tests locally (and added one for testing indented comments) and run this against ~ 3000 files I have locally. I wanted to get some feedback on this before I committed, though.

As an example of the kind of thing that currently gets munged:

COMMENT Vector_NTI_Display_Data_(Do_Not_Edit!)
COMMENT (SXF
COMMENT (CGexDoc "11460" 0 6359
COMMENT (CDBMol 0 0 1 1 1 0 0 1633772385 0 "" "" 0 0 0 0 (CObList) (CObList)
COMMENT (CObList) (CObList) -1)
COMMENT (CDocSetData 1 0 0 0 0 0 "MAIN" 1 1 1 1 0 0 1 1 0 1 10 5 40 50 0 1 0
....

The level of indentation can get quite deep.

Thanks,
Bubba

From markjschreiber at gmail.com Sat Sep 30 08:29:41 2006
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Sat, 30 Sep 2006 20:29:41 +0800
Subject: [Biojava-dev] GenbankFormat (biojavax) and comments with leading whitespace
In-Reply-To:
References:
Message-ID: <93b45ca50609300529r8491cf0p22784589bea59618@mail.gmail.com>

I think this should be fine to commit as long as biojava can still read in the file again (and other files). You should probably also comment the code to say VNTI needs this and to be doubly certain put in a unit test.

- Mark

On 9/30/06, Bubba Puryear wrote:
> Hey all,
>
> I've been using biojava for some time now on my project for reading
> genbank flat files, but until recently I haven't been writing any.
> Our client makes extensive use of VectorNTI (version 9, I think) and I
> was doing some edits to genbank files (via biojavax) and noticed that
> comment values get their whitespace trimmed.
>
> Turns out VNTI splats a load of state that it needs in the comment
> section in a fairly lispish looking syntax... but indentation appears
> to be important. In particular, VNTI won't read the files I've edited
> that have had their whitespace munged. I have some local changes to
> the parser that preserve leading/trailing whitespace for section
> values for top level sections.
>
> I've run the tests locally (and added one for testing indented
> comments) and run this against ~ 3000 files I have locally. I wanted
> to get some feedback on this before I committed, though.
>
> As an example of the kind of thing that currently gets munged:
>
> COMMENT Vector_NTI_Display_Data_(Do_Not_Edit!)
> COMMENT (SXF
> COMMENT (CGexDoc "11460" 0 6359
> COMMENT (CDBMol 0 0 1 1 1 0 0 1633772385 0 "" "" 0 0 0 0
> (CObList) (CObList)
> COMMENT (CObList) (CObList) -1)
> COMMENT (CDocSetData 1 0 0 0 0 0 "MAIN" 1 1 1 1 0 0 1 1 0 1 10 5
> 40 50 0 1 0
> ....
>
> The level of indentation can get quite deep.
>
> Thanks,
> Bubba
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

From holland at ebi.ac.uk Tue Sep 12 05:36:47 2006
From: holland at ebi.ac.uk (Richard Holland)
Date: Tue, 12 Sep 2006 09:36:47 -0000
Subject: [Biojava-dev] Problem with ranks
In-Reply-To: <200609120038.k8C0cfDV065591@mmm1924.dulles19-verio.com>
References: <200609120038.k8C0cfDV065591@mmm1924.dulles19-verio.com>
Message-ID: <45067DF3.7020403@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> Ranks are never described but the name suggests that they are positive integer, in consecutive order and not identical for similar objects within the same sequence.
> Here are some questions:

Ranks in general are defined by BioSQL, but as with much else in that schema they are not defined very well and so everyone has their own interpretation of what should go where. BioJava uses them in the way which I thought was most logical at the time, but BioPerl often ignores them completely and populates them all with zeroes. As BioJava can be connected to a database which could have been populated by BioPerl, it has to be able to cope with these different situations and potentially many others.

It would be nice for all the Bio* projects to agree on exactly how to store various bits of information in BioSQL, especially as to how best to represent specific file formats such as GenBank, but this is probably highly unlikely given the limited number of times when representatives of all the projects are in the same place at the same time (basically only at BOSC, and even then not always - there was nobody from BioJava there this year).

> - Can rank be negative? We would assume not but this is never checked.

Yes. It can be any integer you want.

> - If rank cannot be negative, where do they start, 0, 1? SimpleBioEntryRelationShip suggests that they start at 1 with 0 reserved for absence of ranking.

I tried to start them all from 1, and used 0 for no-rank where rank is compulsory, and null where rank is optional (see below). If you find anywhere where I've been inconsistent, please feel free to raise a Bugzilla bug to point out where I've gone wrong so I can fix them.

> - Are we expecting ranks to be in consecutive order (or in reasonable consecutive order) or values like 1000, 2000, etc. are possible or even expected?

They don't have to be consecutive.

> - Can we have duplicate ranks? We would assume not but SimpleRichFeature javadoc indicates that equal ranks are *acceptable*.

Yes, duplicates are fine.

> SimpleBioEntryRelationship getRank method returns an Integer object, all the other objects return an integer number. Any reason for this?
In BioSQL, BioEntryRelationship has a nullable rank, whereas all other ranked objects have non-null ranks. Hence I have to use an Integer object here to be able to cater for the null case, as this cannot be done with a plain int like the others.

> Moreover 3 of these objects do not have a setRank method: SimpleComment, SimpleRankedCrossRef and SimpleRankedDocRef. How do I insert a comment in the middle of other comments, how do I change the order of these objects without creating new ones?

This is a bug. They should be mutable and fire appropriate change events.

> All these objects have an ordering consistent with equality except SimpleRichFeature. SimpleRichFeature are sorted by rank only. Its compareTo method also never returns 0. A consequence is that removeFeature in ThinRichSequence never works because TreeSet uses compareTo for testing equality.

This is another bug. compareTo, equals and hashCode should always be working with the same fields. In this case, compareTo is missing a bunch. It shouldn't be.

A word of warning though - when objects are loaded by Hibernate, often they are instantiated and added to a set _before_ all the setXXX methods are called to populate the various fields. Therefore, if you find nulls in any of the fields required for comparison then you should assume the object is still incomplete and return a non-zero result, to prevent the object from accidentally replacing an existing object that matches the fields populated so far.

> All compareTo methods use rank first except SimpleRankedDocRef which does not use rank at all (but is ranked as its name indicates).

Another bug. It should be using rank as well.

> A few objects are nearly identical when they are equal but not all. SimpleNote compares by rank then by term but not by value. SimpleNotes of same rank and term but different values are nevertheless equal. SimpleRankedDocRef can be equal and have different locations? I can understand this.
SimpleNote is correct - two notes are equal if they have the same rank and term. SimpleRankedDocRef however is incorrect - it should include location in the equals/compareTo/hashCode methods. Another bug then, but check for non-null locations during Hibernate loading as above. If you or Mark can report all these to Bugzilla, then one of us will get round to fixing them before the end of the beta testing. (Reporting them to Bugzilla makes a nice todo list which is far more reliable than me trying to keep track of everything on paper...). cheers, Richard -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFFBn3z4C5LeMEKA/QRAmU4AJ9TJ5oh7EnUdJNLHryEx3RxNJ0CXwCfe2eY e8Qww/i+MMBA8sgRJVvV+Z8= =UURD -----END PGP SIGNATURE----- From markjschreiber at gmail.com Wed Sep 27 05:18:42 2006 From: markjschreiber at gmail.com (Mark Schreiber) Date: Wed, 27 Sep 2006 17:18:42 +0800 Subject: [Biojava-dev] resources for gui? Message-ID: <93b45ca50609270218n76f19a2bxcd6c6b4d53dbea15@mail.gmail.com> Hi - Can someone tell me what the purpose of the files in resources/org/biojava/bio is? Thanks, - Mark
From mark.schreiber at novartis.com Tue Sep 12 03:37:55 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Tue, 12 Sep 2006 11:37:55 +0800 Subject: [Biojava-dev] Problem with ranks Message-ID: Hi George, thanks for raising these issues. We should fix this before biojava 1.5 finishes it's beta testing. See my responses below. Richard Holland and David Scott will no doubt have comments too. >I am having difficulties to use ranking with some objects found in SimpleRichSequence. There are 6 objects >contained in SimpleRichSequence which are found within collections, namely SimpleComment, SimpleRankedCrossRef, >SimpleRankedDocRef, SimpleNote, SimpleBioEntryRelationShip, and SimpleRichFeature. Each of them is associated with >a TreeSet and uses to some extend ranking for comparison. > >Ranks are never described but the name suggests that they are positive integer, in consecutive order and not >identical for similar objects within the same sequence. Here are some questions: Ranks actually come from the BioSQL schema. They are used so that lists of features, comments etc that are stored in database tables (or any other collection) can be reassembled in the same order that they are found in the original flatfile (Genbank etc). Simply put they are used to preserve order. > - Can rank be negative? We would assume not but this is never checked. I suppose it could be but it would make no sense given the above description. We should probably document this in the javadocs and suggest that classes enforce the non-negative rule. - If rank cannot be negative, where do they start, 0, 1? SimpleBioEntryRelationShip suggests that they start at 1 with 0 reserved for absence of ranking. At the moment this strictly depends on the creating object. Typically this would be a RichSequenceFormat implementation. The Genbank format appears to start numbering from either 0 or 1 (for comments). There should be a common rule.
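As a rough illustration of the point that ranks exist only to preserve flatfile order, the sketch below sorts rows by rank and rejects negative values. It is a minimal sketch under stated assumptions: RankedComment and RankOrder are hypothetical names invented for this example, not BioJava or BioSQL classes, and the non-negative check is only the rule suggested in this thread, not something the library is known to enforce.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a ranked database row; not a BioJava class.
final class RankedComment {
    private final int rank;
    private final String text;

    RankedComment(int rank, String text) {
        // Assumed rule from the discussion: ranks are never negative.
        if (rank < 0) {
            throw new IllegalArgumentException("rank must be >= 0, got " + rank);
        }
        this.rank = rank;
        this.text = text;
    }

    int getRank() { return rank; }

    String getText() { return text; }
}

// Hypothetical helper; restores the original ordering from the ranks alone.
final class RankOrder {
    // Returns a copy sorted by ascending rank. Collections.sort is stable,
    // so rows that share a rank keep their incoming relative order, and
    // gaps such as 1, 1000, 2000 are harmless.
    static List<RankedComment> inFlatfileOrder(List<RankedComment> rows) {
        List<RankedComment> sorted = new ArrayList<RankedComment>(rows);
        Collections.sort(sorted, new Comparator<RankedComment>() {
            public int compare(RankedComment a, RankedComment b) {
                return Integer.compare(a.getRank(), b.getRank());
            }
        });
        return sorted;
    }
}
```

Because only the relative order matters here, deleting a row never forces the remaining ranks to be renumbered.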
>- Are we expecting ranks to be in consecutive order (or in reasonable consecutive order) or values like 1000, >2000, etc. are possible or even expected? Is there any reason why we need to enforce this rule? It would be tidier but it would be a pain to have to re-order everything just because one object is deleted. The genbank parser currently numbers sequentially. >- Can we have duplicate ranks? We would assume not but SimpleRichFeature javadoc indicates that equal ranks are >*acceptable*. Certainly all the RankedCrossRefs returned by the Genbank parser have the same rank (0). It is possible as long as the objects are somehow unique. If equals() is true then the objects are overwritten. I don't think any Ranked object currently relies only on rank for equality (or for the compare() method either). The Unit tests do a pretty good job of testing equals and compare and making sure they return logically equivalent values. Although it is possible it may not be desirable. Any thoughts? >SimpleBioEntryRelationship getRank method returns an Integer object, all the other objects return an integer >number. Any reason for this? I think Richard has a reason. Something to do with Hibernate?? Richard?? >Moreover 3 of these objects do not have a setRank method: SimpleComment, SimpleRankedCrossRef and >SimpleRankedDocRef. How do I insert a comment in the middle of other comments, how do I change the order of these >objects without creating new ones? Possibly they should. Making things mutable is always tricky but the other objects with setRank methods register change listeners and have the option of vetoing the change so it can be done safely. The ChangeListener could be in charge of re-ordering ranks if you insert into the middle. >All these objects have an ordering consistent with equality except SimpleRichFeature. SimpleRichFeature are sorted >by rank only. Its compareTo method also never returns 0. 
A consequence is that removeFeature in ThinRichSequence >never works because TreeSet uses compareTo for testing equality. OK, that sounds like a bug that we have missed in the Unit tests. I will report it to bugzilla and fix it when I have time. >All compareTo methods use rank first except SimpleRankedDocRef which does not use rank at all (but is ranked as >its name indicates). We should change this. Another bugzilla report. >A few objects are nearly identical when they are equal but not all. SimpleNote compares by rank then by term but >not by value. SimpleNotes of same rank and term but different values are nevertheless equal. SimpleRankedDocRef >can be equal and have different locations ? I can understand this. This is because the term of a SimpleNote is an ontology term and should therefore have only one value. Two Notes with the same term are therefore the same (or should be). For example if the term or keyword of the Note is Organism: there should only be one of these Notes. >We need a clear definition of what ranks are, what the ordering they imply is intended for and how to deal with >duplicate ranks? Maybe we could have an interface that encapsulates the concept of ranking, e.g. interface Ranked, >methods setRank() and getRank()) and all these information grouped in the javadoc. It seems easier to derive >exceptions from a common pattern that the opposite. Maybe we also need separate comparators when they are not >consistent with equal. I think we should have a 'Ranked' interface with clear rules in the javadoc. I can't think of any good reason why comparable and equal should not be consistent. We should try and keep them the same as much as possible. - Mark From Robin.Emig at pioneer.com Tue Sep 12 22:34:34 2006 From: Robin.Emig at pioneer.com (Emig, Robin) Date: Tue, 12 Sep 2006 15:34:34 -0700 Subject: [Biojava-dev] Java1.5 In-Reply-To: Message-ID: I'm a little confused about whether the Biojava 1.5 is using java 1.5. 
Looking through the email list it appears to be so, but the default compile options in the build file are still for java 1.4. Can anyone clarify for me? Thanks Robin From mark.schreiber at novartis.com Wed Sep 13 01:02:15 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Wed, 13 Sep 2006 09:02:15 +0800 Subject: [Biojava-dev] Java1.5 Message-ID: Biojava 1.5 officially uses JDK 1.4 - Mark "Emig, Robin" 09/13/2006 06:34 AM To: cc: Subject: Java1.5 I'm a little confused about whether the Biojava 1.5 is using java 1.5. Looking through the email list it appears to be so, but the default compile options in the build file are still for java 1.4. Can anyone clarify for me? Thanks Robin From gwaldon at geneinfinity.org Wed Sep 13 05:33:43 2006 From: gwaldon at geneinfinity.org (george waldon) Date: Tue, 12 Sep 2006 22:33:43 -0700 Subject: [Biojava-dev] Re: Problem with ranks Message-ID: <200609130533.k8D5Xi63019465@mmm1924.dulles19-verio.com> Thank you Mark and Richard for your exhaustive answers. This is very much appreciated.
I am not a database person and I was completely missing the other side of the story. Perhaps the Bio* projects could agree quickly on ranks before someone populates a database with exotic values. It seems that there is a consensus on this list for having ranks be positive, non-null integers when they are defined and equal to zero otherwise. This would also solve the problem of the nullable rank of BioEntryRelationship (a null rank could then be treated as equivalent to an integer value of zero). Also, I improperly reported that the SimpleRankedDocRef compareTo method does not use rank. My apologies for the mistake. Thanks George
From bugzilla-daemon at portal.open-bio.org Mon Sep 25 09:52:56 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 25 Sep 2006 05:52:56 -0400
Subject: [Biojava-dev] [Bug 2107] New: LabelledSequenceRenderer
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2107

Summary: LabelledSequenceRenderer
Product: BioJava
Version: 1.4
Platform: PC
OS/Version: Windows XP
Status: NEW
Severity: normal
Priority: P2
Component: bio
AssignedTo: biojava-dev at biojava.org
ReportedBy: jolyon.holdstock at ogt.co.uk

Using a LabelledSequenceRenderer works as expected in a SequencePanel, but not in a TranslatedSequencePanel. In the latter the label is not displayed. Also, while the sequence is displayed from the correct start point, the actual sequence shown is incorrect. Below is some example code that demonstrates the problem.
//Example code --------------------------------------------------

//Java libraries
import java.awt.Color;
import java.awt.BorderLayout;

//Java extension libraries
import javax.swing.JFrame;

//BioJava libraries
import org.biojava.bio.BioException;
import org.biojava.utils.ChangeVetoException;
import org.biojava.bio.symbol.RangeLocation;
import org.biojava.bio.gui.sequence.SymbolSequenceRenderer;
import org.biojava.bio.seq.Sequence;
import org.biojava.bio.seq.DNATools;
import org.biojava.bio.gui.sequence.SequencePanel;
import org.biojava.bio.gui.sequence.TranslatedSequencePanel;
import org.biojava.bio.gui.sequence.LabelledSequenceRenderer;

public class TestSequencePanel extends JFrame {
  private Sequence seq;
  private SequencePanel sp;
  private TranslatedSequencePanel tsp;

  public TestSequencePanel(){
    try {
      //Create the SequencePanel and TranslatedSequencePanel
      sp = new SequencePanel();
      tsp = new TranslatedSequencePanel();

      //Create a DNA sequence
      seq = DNATools.createDNASequence("AGATAGCTAGCTAGATATGATAGATCGATAGCAAGCTAGCATCGACTACGATC","DNA");

      //Create a renderer for the sequence
      SymbolSequenceRenderer ssr = new SymbolSequenceRenderer();

      //Create the LabelledSequenceRenderer
      LabelledSequenceRenderer lsr = new LabelledSequenceRenderer(50, 50);
      lsr.setFillColor(Color.white);
      lsr.setRenderer(ssr);
      lsr.addLabelString("Seq");

      //Set up the SequencePanel
      sp.setSequence(seq);
      sp.setRenderer(lsr);
      sp.setRange(new RangeLocation(1,300));

      //Set up the TranslatedSequencePanel
      tsp.setSequence(seq);
      tsp.setRenderer(lsr);
      tsp.setScale(12);
    }
    catch(ChangeVetoException e){
      System.out.println("ChangeVetoException: " + e);
    }
    catch(BioException e){
      System.out.println("BioException: " + e);
    }

    //Add the panels to the frame
    this.getContentPane().setLayout(new BorderLayout());
    this.getContentPane().add(sp, BorderLayout.NORTH);
    this.getContentPane().add(tsp, BorderLayout.CENTER);
    setLocation(100,100);
    setSize(400,200);
    setVisible(true);
  }

  public static void main(String[] args) {
    new TestSequencePanel();
  }
}

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From ClevelandJ at BATTELLE.ORG Tue Sep 26 14:42:57 2006
From: ClevelandJ at BATTELLE.ORG (Cleveland, John S)
Date: Tue, 26 Sep 2006 10:42:57 -0400
Subject: [Biojava-dev] Percentage similarity
Message-ID: <251E388086D4D64B8413DCC66EA438084164DB@WS-BSO-MSE1.milky-way.battelle.org>

Does anyone know how to retrieve the percentage similarity from a BLAST result using BioJava? This field is not available from SeqSimilaritySearchSubHit. SeqSimilaritySearchSubHit does have the getEValue() and getScore() methods, so I was a little confused about not finding the "percentage identity" and "percentage similarity" fields. I followed the directions in http://biojava.org/wiki/BioJava:CookBook:Blast:Echo, but again the percentage similarity does not seem to be getting parsed by the BlastLikeSaxParser.
Here is the result of the aforementioned code:

startHit()
HitProp: subjectSequenceLength: 299
HitProp: subjectId: lcd|5392-AAA98259
HitProp: subjectDescription:
startSubHit()
SubHitProp: score: 24.6
SubHitProp: expectValue: 6.9
SubHitProp: numberOfIdentities: 14
SubHitProp: alignmentSize: 42
SubHitProp: percentageIdentity: 33
SubHitProp: numberOfIdentities: 14
SubHitProp: numberOfPositives: 23
SubHitProp: queryFrame: plus1
SubHitProp: querySequenceStart: 928
SubHitProp: querySequenceEnd: 1047
SubHitProp: querySequence: TKDGKTQEWEMDNPGN--DFMTGSKDTYTFKLKDENLKIDDI
SubHitProp: subjectSequenceStart: 126
SubHitProp: subjectSequenceEnd: 167
SubHitProp: subjectSequence: TDDGKIREYELPNKGSYPSFITLGSDNALWFTENQNNAIGRI
endSubHit()
endHit()

Thanks,
John Cleveland

From bugzilla-daemon at portal.open-bio.org Thu Sep 28 09:27:11 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 28 Sep 2006 05:27:11 -0400
Subject: [Biojava-dev] [Bug 2107] LabelledSequenceRenderer
In-Reply-To:
Message-ID: <200609280927.k8S9RB8w018668@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2107

jolyon.holdstock at ogt.co.uk changed:

What |Removed |Added
----------------------------------------------------------------------------
CC   |        |jolyon.holdstock at ogt.co.uk

------- Comment #1 from jolyon.holdstock at ogt.co.uk 2006-09-28 05:27 -------
I've had a look at the TranslatedSequencePanel code and seem to have a work around. I say 'seem' as I'm not an expert on Graphics2D

When using the LabelledSequenceRenderer in the TSP the paint method in the TSP doesn't set the clip for the renderer correctly.

I have edited the following code in the TSP to change
clip.x
clip.width
The point for g2.translate

This sets the clip correctly, the label renders and the correct sequence displayed.
//OLD CODE ==========================================================
if (direction == HORIZONTAL) {
    // Clip x to edge of delegate renderer's leader
    clip.x = renderer.getMinimumLeader(this);
    clip.y = 0.0;
    // Set the width to visible symbols + the delegate
    // renderer's minimum trailer (which may have something in
    // it to render).
    clip.width = sequenceToGraphics(getVisibleSymbolCount() + 1)
               + renderer.getMinimumTrailer(this);
    clip.height = renderer.getDepth(this);
    g2.translate(leadingBorder.getSize() + insets.left, insets.top);
}

//NEW CODE ============================================================
if (direction == HORIZONTAL) {
    // Clip x to edge of delegate renderer's leader
    //clip.x = renderer.getMinimumLeader(this);
    clip.x = 0 - renderer.getMinimumLeader(this);
    clip.y = 0.0;
    // Set the width to visible symbols + the delegate
    // renderer's minimum trailer (which may have something in
    // it to render).
    clip.width = sequenceToGraphics(getVisibleSymbolCount() + 1)
               + renderer.getMinimumLeader(this)
               + renderer.getMinimumTrailer(this);
    clip.height = renderer.getDepth(this);
    g2.translate(leadingBorder.getSize() - clip.x + insets.left, insets.top);
}

I have used this code with the RulerRenderer via the MultiLineRenderer and think that the ruler doesn't render numbers/ticks accurately for the sequence in the TSP. It's marginal and only relevant at high resolution but I'll have a look at this.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From smh1008 at cam.ac.uk Thu Sep 28 15:51:36 2006
From: smh1008 at cam.ac.uk (David Huen)
Date: 28 Sep 2006 16:51:36 +0100
Subject: [Biojava-dev] tRNA anticodon alphabet
Message-ID:

Hi, Would there be any objection to adding an alphabet to deal with anticodons? This would involve an additional alphabet comprising the current RNA alphabet extended with inosine.
Regards, David Huen From smh1008 at cam.ac.uk Thu Sep 28 15:48:25 2006 From: smh1008 at cam.ac.uk (David Huen) Date: 28 Sep 2006 16:48:25 +0100 Subject: [Biojava-dev] CodonPrefTools API Message-ID: Hi, I wish to add to the CodonPrefTools API convenience methods that return each of the 64 codons. It would seem better to put them here in this less used API than to clutter up the RNATools API. If anyone wishes to object could they do so now please? Regards, David Huen From mark.schreiber at novartis.com Fri Sep 29 01:17:13 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Fri, 29 Sep 2006 09:17:13 +0800 Subject: [Biojava-dev] tRNA anticodon alphabet Message-ID: I think it would be a useful addition. - Mark Mark Schreiber Research Investigator (Bioinformatics) Novartis Institute for Tropical Diseases (NITD) 10 Biopolis Road #05-01 Chromos Singapore 138670 www.nitd.novartis.com phone +65 6722 2973 fax +65 6722 2910 David Huen Sent by: biojava-dev-bounces at lists.open-bio.org 09/28/2006 11:51 PM To: biojava-dev at biojava.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-dev] tRNA anticodon alphabet Hi, Would there be any object to adding an alphabet to deal with anticodons? This would involve an additional alphabet comprising the current RNA alphabet extended with inosine. Regards, David Huen _______________________________________________ biojava-dev mailing list biojava-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-dev From bubba.puryear at gmail.com Fri Sep 29 16:22:03 2006 From: bubba.puryear at gmail.com (Bubba Puryear) Date: Fri, 29 Sep 2006 12:22:03 -0400 Subject: [Biojava-dev] GenbankFormat (biojavax) and comments with leading whitespace Message-ID: Hey all, I've been using biojava for some time now on my project for reading genbank flat files, but until recently I haven't been writing any.
Our client makes extensive use of VectorNTI (version 9, I think) and I was doing some edits to genbank files (via biojavax) and noticed that comment values get their whitespace trimmed.

Turns out VNTI splats a load of state that it needs into the comment section in a fairly lispish looking syntax... but indentation appears to be important. In particular, VNTI won't read the files I've edited that have had their whitespace munged. I have some local changes to the parser that preserve leading/trailing whitespace for section values for top level sections.

I've run the tests locally (and added one for testing indented comments) and run this against ~ 3000 files I have locally. I wanted to get some feedback on this before I committed, though.

As an example of the kind of thing that currently gets munged:

COMMENT Vector_NTI_Display_Data_(Do_Not_Edit!)
COMMENT (SXF
COMMENT (CGexDoc "11460" 0 6359
COMMENT (CDBMol 0 0 1 1 1 0 0 1633772385 0 "" "" 0 0 0 0 (CObList) (CObList)
COMMENT (CObList) (CObList) -1)
COMMENT (CDocSetData 1 0 0 0 0 0 "MAIN" 1 1 1 1 0 0 1 1 0 1 10 5 40 50 0 1 0
....

The level of indentation can get quite deep.

Thanks,
Bubba

From markjschreiber at gmail.com Sat Sep 30 12:29:41 2006
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Sat, 30 Sep 2006 20:29:41 +0800
Subject: [Biojava-dev] GenbankFormat (biojavax) and comments with leading whitespace
In-Reply-To:
References:
Message-ID: <93b45ca50609300529r8491cf0p22784589bea59618@mail.gmail.com>

I think this should be fine to commit as long as biojava can still read in the file again (and other files). You should probably also comment the code to say VNTI needs this and, to be doubly certain, put in a unit test.

- Mark

On 9/30/06, Bubba Puryear wrote: > Hey all, > > I've been using biojava for some time now on my project for reading > genbank flat files, but until reacently I haven't been writing any.
> Our client makes extensive use of VectorNTI (version 9, I think) and I > was doing some edits to genbank files (via biojavax) and notice that > comment values get their whitespace trimmed. > > Turns out VNTI splats a load of state that it needs in the comment > section is a fairly lispish looking syntax... but indentation appears > to be important. In particular, VNTI won't read the files I've edited > that have had their whitespace munged. I have some local changes to > the parser that preserve leading/trailing whitespace for section > values for top level sections. > > I've run the tests locally (and added one for testing indented > comments) and run this against ~ 3000 files I have locally. I wanted > to get some feedback on this before I committed, though. > > As an example of the kind of thing that currently gets munged: > > COMMENT Vector_NTI_Display_Data_(Do_Not_Edit!) > COMMENT (SXF > COMMENT (CGexDoc "11460" 0 6359 > COMMENT (CDBMol 0 0 1 1 1 0 0 1633772385 0 "" "" 0 0 0 0 > (CObList) (CObList) > COMMENT (CObList) (CObList) -1) > COMMENT (CDocSetData 1 0 0 0 0 0 "MAIN" 1 1 1 1 0 0 1 1 0 1 10 5 > 40 50 0 1 0 > .... > > The level of indentation can get quite deep. > > Thanks, > Bubba > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From holland at ebi.ac.uk Tue Sep 12 09:36:47 2006 From: holland at ebi.ac.uk (Richard Holland) Date: Tue, 12 Sep 2006 09:36:47 -0000 Subject: [Biojava-dev] Problem with ranks In-Reply-To: <200609120038.k8C0cfDV065591@mmm1924.dulles19-verio.com> References: <200609120038.k8C0cfDV065591@mmm1924.dulles19-verio.com> Message-ID: <45067DF3.7020403@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > Ranks are never described but the name suggests that they are positive integer, in consecutive order and not identical for similar objects within the same sequence. 
Here are some questions: Ranks in general are defined by BioSQL, but as much else in that schema they are not defined very well and so everyone has their own interpretation of what should go where. BioJava uses them in the way which I thought was most logical at the time, but BioPerl often ignores them completely and populates them all with zeroes. As BioJava can be connected to a database which could have been populated by BioPerl, it has to be able to cope with these different situations and potentially many others. It would be nice for all the Bio* projects to agree on exactly how to store various bits of information in BioSQL, especially as to how best to represent specific file formats such as GenBank, but this is probably highly unlikely given the limited amount of times when representatives of all the projects are in the same place at the same time (basically only at BOSC, and even then not always - there was nobody from BioJava there this year). > - Can rank be negative? We would assume not but this is never checked. Yes. It can be any integer you want. > - If rank cannot be negative, where do they start, 0, 1? SimpleBioEntryRelationShip suggests that they start at 1 with 0 reserved for absence of ranking. I tried to start them all from 1, and used 0 for no-rank where rank is compulsory, and null where rank is optional (see below). If you find anywhere where I've been inconsistent, please feel free to raise a Bugzilla bug to point out where I've gone wrong so I can fix them. > - Are we expecting ranks to be in consecutive order (or in reasonable consecutive order) or values like 1000, 2000, etc. are possible or even expected? They don't have to be consecutive. > - Can we have duplicate ranks? We would assume not but SimpleRichFeature javadoc indicates that equal ranks are *acceptable*. Yes, duplicates are fine. > SimpleBioEntryRelationship getRank method returns an Integer object, all the other objects return an integer number. Any reason for this? 
In BioSQL, BioEntryRelationship has a nullable rank, whereas all other ranked objects have non-null ranks. Hence I have to use an Integer object here to be able to cater for the null case, as this cannot be done with a plain int like the others. > Moreover 3 of these objects do not have a setRank method: SimpleComment, SimpleRankedCrossRef and SimpleRankedDocRef. How do I insert a comment in the middle of other comments, how do I change the order of these objects without creating new ones? This is a bug. They should be mutable and fire appropriate change events. > All these objects have an ordering consistent with equality except SimpleRichFeature. SimpleRichFeature are sorted by rank only. Its compareTo method also never returns 0. A consequence is that removeFeature in ThinRichSequence never works because TreeSet uses compareTo for testing equality. This is another bug. compareTo, equals and hashCode should always be working with the same fields. In this case, compareTo is missing a bunch. It shouldn't be. A word of warning though - when objects are loaded by Hibernate, often they are instantiated and added to a set _before_ all the setXXX methods are called to populate the various fields. Therefore, if you find nulls in any of the fields required for comparison then you should assume the object is still incomplete and return a non-zero result, to prevent the object from accidentally replacing an existing object that matches the fields populated so far. > All compareTo methods use rank first except SimpleRankedDocRef which does not use rank at all (but is ranked as its name indicates). Another bug. It should be using rank as well. > A few objects are nearly identical when they are equal but not all. SimpleNote compares by rank then by term but not by value. SimpleNotes of same rank and term but different values are nevertheless equal. SimpleRankedDocRef can be equal and have different locations ? I can understand this. 
SimpleNote is correct - two notes are equal if they have the same rank and term. SimpleRankedDocRef however is incorrect - it should include location in the equals/compareTo/hashCode methods. Another bug then, but check for non-null locations during Hibernate loading as above. If you or Mark can report all these to Bugzilla, then one of us will get round to fixing them before the end of the beta testing. (Reporting them to Bugzilla makes a nice todo list which is far more reliable than me trying to keep track of everything on paper...). cheers, Richard -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFFBn3z4C5LeMEKA/QRAmU4AJ9TJ5oh7EnUdJNLHryEx3RxNJ0CXwCfe2eY e8Qww/i+MMBA8sgRJVvV+Z8= =UURD -----END PGP SIGNATURE----- From markjschreiber at gmail.com Wed Sep 27 09:18:42 2006 From: markjschreiber at gmail.com (Mark Schreiber) Date: Wed, 27 Sep 2006 17:18:42 +0800 Subject: [Biojava-dev] resources for gui? Message-ID: <93b45ca50609270218n76f19a2bxcd6c6b4d53dbea15@mail.gmail.com> Hi - Can someone tell me what the purpose of the files in resources/org/biojava/bio is? Thanks, - Mark
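
Richard's advice in the "Problem with ranks" thread (compare, equals and hashCode over the same fields, and never report a match while a field is still null) can be sketched roughly as follows. This is a minimal sketch under stated assumptions: RankedNote is a hypothetical class written for this illustration, not BioJava's SimpleNote, and the choice of rank plus term as the identifying fields simply follows the description in the thread. Returning a fixed non-zero value for incomplete objects deliberately bends the usual Comparable contract, which is the trade-off the thread describes for objects that are still being populated.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class for illustration only; it is not BioJava's SimpleNote.
final class RankedNote implements Comparable<RankedNote> {
    private final int rank;
    private String term; // may be null while a persistence layer is still populating the object

    RankedNote(int rank, String term) {
        this.rank = rank;
        this.term = term;
    }

    void setTerm(String term) { this.term = term; }

    // Uses rank first, then term: the same fields as equals() and hashCode().
    public int compareTo(RankedNote other) {
        if (this == other) return 0;
        // An incomplete object never matches a different instance, so it
        // cannot silently replace an existing entry in a sorted set.
        if (this.term == null || other.term == null) return -1;
        int byRank = Integer.compare(this.rank, other.rank);
        return byRank != 0 ? byRank : this.term.compareTo(other.term);
    }

    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof RankedNote)) return false;
        RankedNote other = (RankedNote) o;
        if (this.term == null || other.term == null) return false;
        return this.rank == other.rank && this.term.equals(other.term);
    }

    public int hashCode() {
        // Constant while incomplete; note the value changes once term is set,
        // so such objects should not sit in hash-based sets during loading.
        if (term == null) return 17;
        return 31 * rank + term.hashCode();
    }
}

// Small demonstration that removal from a TreeSet works once compareTo can return 0.
final class RankedNoteDemo {
    static int sizeAfterRemove() {
        Set<RankedNote> notes = new TreeSet<RankedNote>();
        notes.add(new RankedNote(1, "organism"));
        notes.add(new RankedNote(1, "organism")); // same rank and term: treated as a duplicate
        notes.add(new RankedNote(2, "organism"));
        notes.remove(new RankedNote(1, "organism"));
        return notes.size();
    }
}
```

With a compareTo that never returns 0, the remove call in the demonstration could not locate its target, which is the removeFeature failure George reported.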