From gwaldon at geneinfinity.org  Wed Sep  6 19:14:28 2006
From: gwaldon at geneinfinity.org (george waldon)
Date: Wed, 06 Sep 2006 16:14:28 -0700
Subject: [Biojava-dev] GenbankFormat and BASE COUNT
Message-ID: <200609062314.k86NESGu081640@mmm1924.dulles19-verio.com>


>From: mark.schreiber at novartis.com [mailto:mark.schreiber at novartis.com] 
>Are you OK to watch for format changes?

Sorry for the delay in responding. There are indeed a few upcoming modifications.

- new naturally occurring amino acid pyrrolysine (Pyl/O - 22nd) will become official on release 156.0, same with EMBL this fall. We'll have to adjust the PROTEIN and PROTEIN_TERM alphabets and maybe have more translation tables. 

- talking about translation tables, I noticed a while ago that the official genbank/EMBL/DDBJ feature table contains 23 genetic code tables whereas Biojava only describes 13. We should probably stick to genbank/EMBL/DDBJ translation tables.

- Xle/J (leucine/isoleucine) will be legal starting Genbank 156.0 (October 2006).

- Feature location syntax X.Y to be discontinued as of October 2006. Record will be changed, although the conversion rule is not given. Maybe it is time to remove this type of fuzziness from Biojava?

Still not taken into account in org.biojavax.bio.seq.io.GenbankFormat:

- SEGMENT keyword, not currently parsed, maybe on purpose. 

- CONTIG keyword, same as above. Example: AE014134, this is an entire chromosome.

I can do the table and alphabet modifications when they become official.
George

From gwaldon at geneinfinity.org  Mon Sep 11 20:38:41 2006
From: gwaldon at geneinfinity.org (george waldon)
Date: Mon, 11 Sep 2006 17:38:41 -0700
Subject: [Biojava-dev] Problem with ranks
Message-ID: <200609120038.k8C0cfDV065591@mmm1924.dulles19-verio.com>

Hi,

I am having difficulty using ranking with some objects found in SimpleRichSequence. There are 6 objects contained in SimpleRichSequence which are found within collections, namely SimpleComment, SimpleRankedCrossRef, SimpleRankedDocRef, SimpleNote, SimpleBioEntryRelationship, and SimpleRichFeature. Each of them is associated with a TreeSet and uses ranking to some extent for comparison.

Ranks are never described, but the name suggests that they are positive integers, in consecutive order, and not identical for similar objects within the same sequence. Here are some questions:

- Can rank be negative? We would assume not but this is never checked.
- If rank cannot be negative, where do they start: 0 or 1? SimpleBioEntryRelationship suggests that they start at 1 with 0 reserved for absence of ranking.
- Are we expecting ranks to be in consecutive order (or in reasonably consecutive order), or are values like 1000, 2000, etc. possible or even expected?
- Can we have duplicate ranks? We would assume not but SimpleRichFeature javadoc indicates that equal ranks are *acceptable*.

The getRank method of SimpleBioEntryRelationship returns an Integer object, whereas all the other objects return a primitive int. Any reason for this?

Moreover 3 of these objects do not have a setRank method: SimpleComment, SimpleRankedCrossRef and SimpleRankedDocRef. How do I insert a comment in the middle of other comments, how do I change the order of these objects without creating new ones?

All these objects have an ordering consistent with equality except SimpleRichFeature. SimpleRichFeature objects are sorted by rank only, and their compareTo method never returns 0. A consequence is that removeFeature in ThinRichSequence never works, because TreeSet uses compareTo rather than equals for testing equality.
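
The TreeSet behaviour described above can be reproduced without BioJava. The following is only a sketch: NeverEqualFeature is a hypothetical stand-in (not the real SimpleRichFeature) whose compareTo sorts by rank and never returns 0, and it uses generics, so it assumes Java 5 or later.

```java
import java.util.TreeSet;

// Hypothetical stand-in for a ranked feature; not the real SimpleRichFeature.
class NeverEqualFeature implements Comparable<NeverEqualFeature> {
    final int rank;

    NeverEqualFeature(int rank) { this.rank = rank; }

    // Sorts by rank only and never returns 0, mimicking the reported problem.
    @Override
    public int compareTo(NeverEqualFeature other) {
        return this.rank < other.rank ? -1 : 1;
    }
}

class TreeSetRemoveDemo {
    // Returns true if the feature could be removed from the set.
    static boolean tryRemove() {
        TreeSet<NeverEqualFeature> features = new TreeSet<NeverEqualFeature>();
        NeverEqualFeature f = new NeverEqualFeature(1);
        features.add(f);
        features.add(new NeverEqualFeature(2));
        // TreeSet locates elements with compareTo, not equals,
        // so even the very same instance is never "found".
        return features.remove(f);
    }
}
```

Here tryRemove() returns false even though f is in the set, which matches the symptom seen with removeFeature.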

All compareTo methods use rank first except SimpleRankedDocRef which does not use rank at all (but is ranked as its name indicates).

A few objects are nearly identical when they are equal, but not all. SimpleNote compares by rank then by term but not by value, so SimpleNotes of the same rank and term but different values are nevertheless equal. SimpleRankedDocRef can be equal and have different locations? I can understand this. 

We need a clear definition of what ranks are, what the ordering they imply is intended for, and how to deal with duplicate ranks. Maybe we could have an interface that encapsulates the concept of ranking (e.g. interface Ranked, with methods setRank() and getRank()) and all this information grouped in the javadoc. It seems easier to derive exceptions from a common pattern than the opposite. Maybe we also need separate comparators when they are not consistent with equals. 
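
One possible shape for such an interface, as a sketch only. Nothing here exists in BioJava: Ranked, RankComparator and SimpleRanked are illustrative names, and the "no negative ranks, 0 means unranked" rule in setRank is an assumption, since the rules are exactly what is being asked about.

```java
import java.util.Comparator;

// Hypothetical interface along the lines proposed above; not part of BioJava.
interface Ranked {
    int getRank();
    void setRank(int rank);
}

// Orders Ranked objects by rank alone. It is deliberately NOT consistent with
// equals, so it suits sorting a List rather than backing a TreeSet.
class RankComparator implements Comparator<Ranked> {
    @Override
    public int compare(Ranked a, Ranked b) {
        return Integer.compare(a.getRank(), b.getRank());
    }
}

// Minimal implementation used only to exercise the comparator.
class SimpleRanked implements Ranked {
    private int rank;

    SimpleRanked(int rank) { setRank(rank); }

    public int getRank() { return rank; }

    public void setRank(int rank) {
        // Assumed rule: ranks are never negative; 0 means "no rank".
        if (rank < 0) throw new IllegalArgumentException("rank must be >= 0");
        this.rank = rank;
    }
}
```

Keeping the rank-only comparator separate from equals would let each class define equality on its own fields while still sharing one documented notion of rank.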

Thanks,
George


From mark.schreiber at novartis.com  Mon Sep 11 23:37:55 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 12 Sep 2006 11:37:55 +0800
Subject: [Biojava-dev] Problem with ranks
Message-ID: <OFDAC0A613.57020768-ON482571E7.000A58D9-482571E7.0013F38A@ah.novartis.com>

Hi George, thanks for raising these issues. We should fix this before 
biojava 1.5 finishes its beta testing. See my responses below. Richard 
Holland and David Scott will no doubt have comments too.

>I am having difficulties to use ranking with some objects found in
>SimpleRichSequence. There are 6 objects contained in SimpleRichSequence
>which are found within collections, namely SimpleComment,
>SimpleRankedCrossRef, SimpleRankedDocRef, SimpleNote,
>SimpleBioEntryRelationShip, and SimpleRichFeature. Each of them is
>associated with a TreeSet and uses to some extend ranking for comparison.
>
>Ranks are never described but the name suggests that they are positive
>integer, in consecutive order and not identical for similar objects
>within the same sequence. Here are some questions:

Ranks actually come from the BioSQL schema. They are used so that lists of 
features, comments etc that are stored in database tables (or any other 
collection) can be reassembled in the same order that they are found in 
the original flatfile (Genbank etc). Simply put they are used to preserve 
order.

> - Can rank be negative? We would assume not but this is never checked.

I suppose it could be but it would make no sense given the above 
description. We should probably document this in the javadocs and suggest 
that classes enforce the non-negative rule.

>- If rank cannot be negative, where do they start, 0, 1?
>SimpleBioEntryRelationShip suggests that they start at 1 with 0 reserved
>for absence of ranking.

At the moment this strictly depends on the creating object. Typically this 
would be a RichSequenceFormat implementation. The Genbank format appears 
to start numbering from either 0 or 1 (for comments). There should be a 
common rule.

>- Are we expecting ranks to be in consecutive order (or in reasonable
>consecutive order) or values like 1000, 2000, etc. are possible or even
>expected?

Is there any reason why we need to enforce this rule? It would be tidier 
but it would be a pain to have to re-order everything just because one 
object is deleted. The genbank parser currently numbers sequentially.

>- Can we have duplicate ranks? We would assume not but SimpleRichFeature
>javadoc indicates that equal ranks are *acceptable*.

Certainly all the RankedCrossRefs returned by the Genbank parser have the 
same rank (0). It is possible as long as the objects are somehow unique. 
If equals() is true then the objects are overwritten. I don't think any 
Ranked object currently relies only on rank for equality (or for the 
compare() method either). The Unit tests do a pretty good job of testing 
equals and compare and making sure they return logically equivalent 
values. Although it is possible it may not be desirable. Any thoughts?

>SimpleBioEntryRelationship getRank method returns an Integer object, all
>the other objects return an integer number. Any reason for this?

I think Richard has a reason. Something to do with Hibernate?? Richard??

>Moreover 3 of these objects do not have a setRank method: SimpleComment,
>SimpleRankedCrossRef and SimpleRankedDocRef. How do I insert a comment in
>the middle of other comments, how do I change the order of these objects
>without creating new ones?

Possibly they should. Making things mutable is always tricky but the other 
objects with setRank methods register change listeners and have the option 
of vetoing the change so it can be done safely. The ChangeListener could 
be in charge of re-ordering ranks if you insert into the middle.

>All these objects have an ordering consistent with equality except
>SimpleRichFeature. SimpleRichFeature are sorted by rank only. Its
>compareTo method also never returns 0. A consequence is that removeFeature
>in ThinRichSequence never works because TreeSet uses compareTo for
>testing equality.

OK, that sounds like a bug that we have missed in the Unit tests. I will 
report it to bugzilla and fix it when I have time.

>All compareTo methods use rank first except SimpleRankedDocRef which does
>not use rank at all (but is ranked as its name indicates).

We should change this. Another bugzilla report.

>A few objects are nearly identical when they are equal but not all.
>SimpleNote compares by rank then by term but not by value. SimpleNotes of
>same rank and term but different values are nevertheless equal.
>SimpleRankedDocRef can be equal and have different locations ? I can
>understand this.

This is because the term of a SimpleNote is an ontology term and should 
therefore have only one value. Two Notes with the same term are therefore 
the same (or should be). For example if the term or keyword of the Note is 
Organism: there should only be one of these Notes.

>We need a clear definition of what ranks are, what the ordering they
>imply is intended for and how to deal with duplicate ranks? Maybe we
>could have an interface that encapsulates the concept of ranking, e.g.
>interface Ranked, methods setRank() and getRank()) and all these
>information grouped in the javadoc. It seems easier to derive exceptions
>from a common pattern that the opposite. Maybe we also need separate
>comparators when they are not consistent with equal.

I think we should have a 'Ranked' interface with clear rules in the 
javadoc. I can't think of any good reason why comparable and equal should 
not be consistent. We should try and keep them the same as much as 
possible.

- Mark


From Robin.Emig at pioneer.com  Tue Sep 12 18:34:34 2006
From: Robin.Emig at pioneer.com (Emig, Robin)
Date: Tue, 12 Sep 2006 15:34:34 -0700
Subject: [Biojava-dev] Java1.5
In-Reply-To: <OF455B6B7D.2E38347C-ON482571AB.002F72A2-482571AB.002F810E@EU.novartis.net>
Message-ID: <BE0B1B72A3A05C448AD3F6FB6B6D9A137D682C@rcy1ms01.phibred.com>

   I'm a little confused about whether BioJava 1.5 is using Java
1.5. Looking through the email list it appears to be so, but the default
compile options in the build file are still for Java 1.4. Can anyone
clarify for me?
Thanks
Robin



From mark.schreiber at novartis.com  Tue Sep 12 21:02:15 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Wed, 13 Sep 2006 09:02:15 +0800
Subject: [Biojava-dev] Java1.5
Message-ID: <OF062F7DA8.74BF22F7-ON482571E8.0005A60C-482571E8.0005B345@ah.novartis.com>

Biojava 1.5 officially uses JDK 1.4

- Mark


From gwaldon at geneinfinity.org  Wed Sep 13 01:33:43 2006
From: gwaldon at geneinfinity.org (george waldon)
Date: Tue, 12 Sep 2006 22:33:43 -0700
Subject: [Biojava-dev]  Re:  Problem with ranks
Message-ID: <200609130533.k8D5Xi63019465@mmm1924.dulles19-verio.com>

Thank you Mark and Richard for your exhaustive answers. This is very much appreciated. I am not a database person and I was completely missing the other side of the story.

Perhaps the Bio* projects could agree quickly on ranks before someone populates a database with exotic values. It seems that there is a consensus on this list that ranks should be positive, non-null integers when they are defined and equal to zero otherwise. This would also solve the problem of the nullable rank of BioEntryRelationship (a null rank would then be equivalent to an integer value of zero).

Also, I improperly reported that SimpleRankedDocRef compareTo method does not use rank. My apologies for the mistake.

Thanks
George

From bugzilla-daemon at portal.open-bio.org  Mon Sep 25 05:52:56 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 25 Sep 2006 05:52:56 -0400
Subject: [Biojava-dev] [Bug 2107] New: LabelledSequenceRenderer
Message-ID: <bug-2107-485@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2107

           Summary: LabelledSequenceRenderer
           Product: BioJava
           Version: 1.4
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: bio
        AssignedTo: biojava-dev at biojava.org
        ReportedBy: jolyon.holdstock at ogt.co.uk


Using a LabelledSequenceRenderer works as expected in a SequencePanel, but not
in a TranslatedSequencePanel. In the latter the label is not displayed. Also,
while the sequence is displayed from the correct start point, the actual
sequence is incorrect.

Below is some example code that demonstrates the problem.

//Example code --------------------------------------------------
//Java libraries
import java.awt.Color;
import java.awt.BorderLayout;
//Java extension libraries
import javax.swing.JFrame;
//BioJava libraries
import org.biojava.bio.BioException;
import org.biojava.utils.ChangeVetoException;
import org.biojava.bio.symbol.RangeLocation;
import org.biojava.bio.gui.sequence.SymbolSequenceRenderer;
import org.biojava.bio.seq.Sequence;
import org.biojava.bio.seq.DNATools;
import org.biojava.bio.gui.sequence.SequencePanel;
import org.biojava.bio.gui.sequence.TranslatedSequencePanel;
import org.biojava.bio.gui.sequence.LabelledSequenceRenderer;

public class TestSequencePanel extends JFrame {

  private Sequence seq;
  private SequencePanel sp;
  private TranslatedSequencePanel tsp;

  public TestSequencePanel(){
    try {
      //Create the SequencePanel and TranslatedSequencePanel
      sp = new SequencePanel();
      tsp = new TranslatedSequencePanel();

      //Create a DNA sequence
      seq =
DNATools.createDNASequence("AGATAGCTAGCTAGATATGATAGATCGATAGCAAGCTAGCATCGACTACGATC","DNA");

      //Create a renderer for the sequence
      SymbolSequenceRenderer ssr = new SymbolSequenceRenderer();

      //Create the LabelledSequenceRenderer
      LabelledSequenceRenderer lsr = new LabelledSequenceRenderer(50, 50);
      lsr.setFillColor(Color.white);
      lsr.setRenderer(ssr);
      lsr.addLabelString("Seq");

      //Set up the SequencePanel
      sp.setSequence(seq);
      sp.setRenderer(lsr);
      sp.setRange(new RangeLocation(1,300));

      //Set up the TranslatedSequencePanel
      tsp.setSequence(seq);
      tsp.setRenderer(lsr);
      tsp.setScale(12);
    }
    catch(ChangeVetoException e){
      System.out.println("ChangeVetoException: " + e);
    }
    catch(BioException e){
      System.out.println("BioException: " + e);
    }

    //Add the panels to the frame
    this.getContentPane().setLayout(new BorderLayout());
    this.getContentPane().add(sp, BorderLayout.NORTH);
    this.getContentPane().add(tsp, BorderLayout.CENTER);

    setLocation(100,100);
    setSize(400,200);
    setVisible(true);
  }
  public static void main(String[] args) {
    new TestSequencePanel();
  }
}


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From ClevelandJ at BATTELLE.ORG  Tue Sep 26 10:42:57 2006
From: ClevelandJ at BATTELLE.ORG (Cleveland, John S)
Date: Tue, 26 Sep 2006 10:42:57 -0400
Subject: [Biojava-dev] Percentage similarity
Message-ID: <251E388086D4D64B8413DCC66EA438084164DB@WS-BSO-MSE1.milky-way.battelle.org>

Does anyone know how to retrieve the percentage similarity from a BLAST
result using BioJava?

 
This field is not available from SeqSimilaritySearchSubHit.
SeqSimilaritySearchSubHit does have the getEValue() and getScore()
methods, so I was a little confused about not finding the "percentage
identity" and "percentage similarity" fields.  I followed the directions
in http://biojava.org/wiki/BioJava:CookBook:Blast:Echo, but again the
percentage similarity does not seem to be getting parsed by the
BlastLikeSaxParser.  Here is the result of the aforementioned code:
 

startHit()
      HitProp:    subjectSequenceLength: 299
      HitProp:    subjectId: lcd|5392-AAA98259
      HitProp:    subjectDescription: 
startSubHit()
      SubHitProp: score: 24.6
      SubHitProp: expectValue: 6.9
      SubHitProp: numberOfIdentities: 14
      SubHitProp: alignmentSize: 42
      SubHitProp: percentageIdentity: 33
      SubHitProp: numberOfIdentities: 14
      SubHitProp: numberOfPositives: 23
      SubHitProp: queryFrame: plus1
      SubHitProp: querySequenceStart: 928
      SubHitProp: querySequenceEnd: 1047
      SubHitProp: querySequence: TKDGKTQEWEMDNPGN--DFMTGSKDTYTFKLKDENLKIDDI
      SubHitProp: subjectSequenceStart: 126
      SubHitProp: subjectSequenceEnd: 167
      SubHitProp: subjectSequence: TDDGKIREYELPNKGSYPSFITLGSDNALWFTENQNNAIGRI
endSubHit()
endHit()
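
A possible stopgap is to derive the figure from the properties that are reported. This is a sketch under an assumption: that percentage similarity is numberOfPositives divided by alignmentSize, by analogy with percentageIdentity (14 of 42 gives the reported 33). BlastPercentages is a hypothetical helper, not a BioJava class.

```java
class BlastPercentages {
    // Percentage of aligned columns counted as hits, e.g. identities or positives.
    static double percentage(int count, int alignmentSize) {
        if (alignmentSize <= 0) {
            throw new IllegalArgumentException("alignmentSize must be positive");
        }
        return 100.0 * count / alignmentSize;
    }

    public static void main(String[] args) {
        // Values taken from the SubHitProp output above.
        System.out.println(percentage(14, 42)); // identity
        System.out.println(percentage(23, 42)); // similarity (assumed definition)
    }
}
```

With the values above, 23 positives over an alignment of 42 works out to roughly 55 percent.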
 
 
Thanks,
John Cleveland

 
From bugzilla-daemon at portal.open-bio.org  Thu Sep 28 05:27:11 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 28 Sep 2006 05:27:11 -0400
Subject: [Biojava-dev] [Bug 2107] LabelledSequenceRenderer
In-Reply-To: <bug-2107-485@http.bugzilla.open-bio.org/>
Message-ID: <200609280927.k8S9RB8w018668@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2107


jolyon.holdstock at ogt.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jolyon.holdstock at ogt.co.uk


------- Comment #1 from jolyon.holdstock at ogt.co.uk  2006-09-28 05:27 -------
I've had a look at the TranslatedSequencePanel code and seem to have a
workaround. I say 'seem' as I'm not an expert on Graphics2D.

When using the LabelledSequenceRenderer in the TSP, the paint method in the TSP
doesn't set the clip for the renderer correctly.

I have edited the following code in the TSP to change:

- clip.x
- clip.width
- the point for g2.translate

This sets the clip correctly: the label renders and the correct sequence is
displayed. 

//OLD CODE ==========================================================
if (direction == HORIZONTAL) {
  // Clip x to edge of delegate renderer's leader
  clip.x = renderer.getMinimumLeader(this);
  clip.y = 0.0;
  // Set the width to visible symbols + the delegate
  // renderer's minimum trailer (which may have something in
  // it to render).
  clip.width = sequenceToGraphics(getVisibleSymbolCount() + 1)
      + renderer.getMinimumTrailer(this);
  clip.height = renderer.getDepth(this);
  g2.translate(leadingBorder.getSize() + insets.left, insets.top);
}

//NEW CODE ============================================================
if (direction == HORIZONTAL) {
  // Start the clip one leader width to the left of the origin so that
  // the delegate renderer's leader (e.g. a label) is not clipped away.
  //clip.x = renderer.getMinimumLeader(this);
  clip.x = 0 - renderer.getMinimumLeader(this);
  clip.y = 0.0;
  // Set the width to visible symbols + the delegate renderer's
  // minimum leader and minimum trailer (which may have something
  // in them to render).
  clip.width = sequenceToGraphics(getVisibleSymbolCount() + 1)
      + renderer.getMinimumLeader(this) + renderer.getMinimumTrailer(this);
  clip.height = renderer.getDepth(this);
  g2.translate(leadingBorder.getSize() - clip.x + insets.left, insets.top);
}


I have used this code with the RulerRenderer via the MultiLineRenderer
and think that the ruler doesn't render numbers/ticks accurately for
the sequence in the TSP. It's marginal and only relevant at high resolution but
I'll have a look at this.
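
The effect of the clip change can be checked in isolation with plain java.awt.geom. This is a rough sketch with made-up numbers (a 50 pixel leader and trailer, 400 pixels of visible symbols); ClipBounds is a hypothetical helper, and it assumes the label is painted in the leader region, i.e. at negative x once the origin has been translated.

```java
import java.awt.geom.Rectangle2D;

class ClipBounds {
    // Clip rectangle as computed by the old code: starts at +leader.
    static Rectangle2D.Double oldClip(double leader, double trailer,
                                      double symbolsWidth, double depth) {
        return new Rectangle2D.Double(leader, 0.0, symbolsWidth + trailer, depth);
    }

    // Clip rectangle as computed by the new code: starts at -leader and is
    // widened by the leader so the right-hand edge still covers the trailer.
    static Rectangle2D.Double newClip(double leader, double trailer,
                                      double symbolsWidth, double depth) {
        return new Rectangle2D.Double(-leader, 0.0,
                                      symbolsWidth + leader + trailer, depth);
    }

    public static void main(String[] args) {
        // A point halfway into the leader region (x = -25), mid-height.
        System.out.println(oldClip(50, 50, 400, 20).contains(-25, 10));
        System.out.println(newClip(50, 50, 400, 20).contains(-25, 10));
    }
}
```

Under these assumptions the old rectangle excludes the leader region and the new one includes it, which would explain why the label only renders after the change.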



From smh1008 at cam.ac.uk  Thu Sep 28 11:51:36 2006
From: smh1008 at cam.ac.uk (David Huen)
Date: 28 Sep 2006 16:51:36 +0100
Subject: [Biojava-dev] tRNA anticodon alphabet
Message-ID: <Prayer.1.0.18.0609281651360.17217@hermes-2.csi.cam.ac.uk>

Hi, would there be any objection to adding an alphabet to deal with 
anticodons? This would involve an additional alphabet comprising the 
current RNA alphabet extended with inosine.

Regards,
David Huen

From smh1008 at cam.ac.uk  Thu Sep 28 11:48:25 2006
From: smh1008 at cam.ac.uk (David Huen)
Date: 28 Sep 2006 16:48:25 +0100
Subject: [Biojava-dev] CodonPrefTools API
Message-ID: <Prayer.1.0.18.0609281648250.17217@hermes-2.csi.cam.ac.uk>

Hi, I wish to add to the CodonPrefTools API convenience methods that return 
each of the 64 codons. It would seem better to put it here in this less 
used API than clutter up the RNATools API.

If anyone wishes to object could they do so now please?
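
For reference, the full set of 64 codons is just every ordered triple of the four RNA bases. The sketch below enumerates them as plain strings; CodonEnumerator is a hypothetical name, and it does not use or reflect the CodonPrefTools or RNATools API.

```java
import java.util.ArrayList;
import java.util.List;

class CodonEnumerator {
    private static final char[] BASES = {'a', 'c', 'g', 'u'};

    // Builds every three-letter combination of the four RNA bases.
    static List<String> allCodons() {
        List<String> codons = new ArrayList<String>(64);
        for (char first : BASES) {
            for (char second : BASES) {
                for (char third : BASES) {
                    codons.add("" + first + second + third);
                }
            }
        }
        return codons;
    }
}
```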

Regards,
David Huen

From mark.schreiber at novartis.com  Thu Sep 28 21:17:13 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 29 Sep 2006 09:17:13 +0800
Subject: [Biojava-dev] tRNA anticodon alphabet
Message-ID: <OF1F06A041.D888F59F-ON482571F8.000707B1-482571F8.000711D4@ah.novartis.com>

I think it would be a useful addition.

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910


_______________________________________________
biojava-dev mailing list
biojava-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-dev


From bubba.puryear at gmail.com  Fri Sep 29 12:22:03 2006
From: bubba.puryear at gmail.com (Bubba Puryear)
Date: Fri, 29 Sep 2006 12:22:03 -0400
Subject: [Biojava-dev] GenbankFormat (biojavax) and comments with leading
	whitespace
Message-ID: <d2f7533b0609290922w6aee8e07ybae05536a8d634b6@mail.gmail.com>

Hey all,

  I've been using biojava for some time now on my project for reading
genbank flat files, but until recently I haven't been writing any.
Our client makes extensive use of VectorNTI (version 9, I think) and I
was doing some edits to genbank files (via biojavax) and noticed that
comment values get their whitespace trimmed.

  Turns out VNTI splats a load of state that it needs into the comment
section in a fairly lispish looking syntax... but indentation appears
to be important. In particular, VNTI won't read the files I've edited
that have had their whitespace munged. I have some local changes to
the parser that preserve leading/trailing whitespace for section
values for top level sections.

  I've run the tests locally (and added one for testing indented
comments) and run this against ~ 3000 files I have locally. I wanted
to get some feedback on this before I committed, though.

  As an example of the kind of thing that currently gets munged:

COMMENT     Vector_NTI_Display_Data_(Do_Not_Edit!)
COMMENT     (SXF
COMMENT      (CGexDoc "11460" 0 6359
COMMENT       (CDBMol 0 0 1 1 1 0 0 1633772385 0 "" "" 0 0 0 0 (CObList) (CObList)
COMMENT        (CObList) (CObList) -1)
COMMENT       (CDocSetData 1 0 0 0 0 0 "MAIN" 1 1 1 1 0 0 1 1 0 1 10 5 40 50 0 1 0
....

   The level of indentation can get quite deep.
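
To make the difference concrete, here is a small sketch of the two behaviours. It is not the actual parser change: CommentLineValue is a hypothetical helper, and it relies only on the fact that GenBank keyword lines reserve the first 12 columns for the keyword.

```java
class CommentLineValue {
    // GenBank keyword lines reserve the first 12 columns for the keyword.
    private static final int KEYWORD_COLUMNS = 12;

    // Behaviour as described above: trimming the value loses the indentation.
    static String trimmedValue(String line) {
        return line.length() <= KEYWORD_COLUMNS
                ? ""
                : line.substring(KEYWORD_COLUMNS).trim();
    }

    // Alternative: drop only the keyword columns and keep the rest verbatim.
    static String preservedValue(String line) {
        return line.length() <= KEYWORD_COLUMNS
                ? ""
                : line.substring(KEYWORD_COLUMNS);
    }
}
```

For the third line of the example, preservedValue keeps the one leading space before (CGexDoc whereas trimmedValue drops it.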

Thanks,
Bubba

From markjschreiber at gmail.com  Sat Sep 30 08:29:41 2006
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Sat, 30 Sep 2006 20:29:41 +0800
Subject: [Biojava-dev] GenbankFormat (biojavax) and comments with
	leading whitespace
In-Reply-To: <d2f7533b0609290922w6aee8e07ybae05536a8d634b6@mail.gmail.com>
References: <d2f7533b0609290922w6aee8e07ybae05536a8d634b6@mail.gmail.com>
Message-ID: <93b45ca50609300529r8491cf0p22784589bea59618@mail.gmail.com>

I think this should be fine to commit as long as biojava can still
read in the file again (and other files).

You should probably also comment the code to say VNTI needs this and
to be doubly certain put in a unit test.

- Mark


From holland at ebi.ac.uk  Tue Sep 12 05:36:47 2006
From: holland at ebi.ac.uk (Richard Holland)
Date: Tue, 12 Sep 2006 09:36:47 -0000
Subject: [Biojava-dev] Problem with ranks
In-Reply-To: <200609120038.k8C0cfDV065591@mmm1924.dulles19-verio.com>
References: <200609120038.k8C0cfDV065591@mmm1924.dulles19-verio.com>
Message-ID: <45067DF3.7020403@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> Ranks are never described but the name suggests that they are positive integer, in consecutive order and not identical for similar objects within the same sequence. Here are some questions:

Ranks in general are defined by BioSQL, but like much else in that schema
they are not defined very well and so everyone has their own
interpretation of what should go where.

BioJava uses them in the way which I thought was most logical at the
time, but BioPerl often ignores them completely and populates them all
with zeroes. As BioJava can be connected to a database which could have
been populated by BioPerl, it has to be able to cope with these
different situations and potentially many others.
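
One way to cope with both conventions, sketched in plain Java rather than taken from BioJava: order rows by rank with a stable sort, so that rows sharing a rank (including the all-zero case) simply keep the order in which they were read. RankedRow and RankOrdering are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical row as it might come back from a ranked table.
class RankedRow {
    final int rank;
    final String text;

    RankedRow(int rank, String text) {
        this.rank = rank;
        this.text = text;
    }
}

class RankOrdering {
    // Collections.sort is stable, so rows with equal ranks keep their input order.
    static List<String> inRankOrder(List<RankedRow> rows) {
        List<RankedRow> sorted = new ArrayList<RankedRow>(rows);
        Collections.sort(sorted, new Comparator<RankedRow>() {
            public int compare(RankedRow a, RankedRow b) {
                return Integer.compare(a.rank, b.rank);
            }
        });
        List<String> texts = new ArrayList<String>();
        for (RankedRow row : sorted) {
            texts.add(row.text);
        }
        return texts;
    }
}
```

Note this works on a List; a TreeSet keyed on rank alone would instead discard all but one of the zero-ranked rows.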

It would be nice for all the Bio* projects to agree on exactly how to
store various bits of information in BioSQL, especially as to how best
to represent specific file formats such as GenBank, but this is
unlikely given the limited number of occasions when representatives
of all the projects are in the same place at the same time (basically
only at BOSC, and even then not always - there was nobody from BioJava
there this year).

> - Can rank be negative? We would assume not but this is never checked.

Yes. It can be any integer you want.

> - If rank cannot be negative, where do they start, 0, 1? SimpleBioEntryRelationShip suggests that they start at 1 with 0 reserved for absence of ranking.

I tried to start them all from 1, and used 0 for no-rank where rank is
compulsory, and null where rank is optional (see below). If you find
anywhere where I've been inconsistent, please feel free to raise a
Bugzilla bug to point out where I've gone wrong so I can fix them.

> - Are we expecting ranks to be in consecutive order (or in reasonable consecutive order) or values like 1000, 2000, etc. are possible or even expected?

They don't have to be consecutive.

> - Can we have duplicate ranks? We would assume not but SimpleRichFeature javadoc indicates that equal ranks are *acceptable*.

Yes, duplicates are fine.

> SimpleBioEntryRelationship getRank method returns an Integer object, all the other objects return an integer number. Any reason for this?

In BioSQL, BioEntryRelationship has a nullable rank, whereas all other
ranked objects have non-null ranks. Hence I have to use an Integer
object here to be able to cater for the null case, as this cannot be
done with a plain int like the others.
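
As a small illustration of the difference (OptionalRank is a hypothetical class, and mapping null to 0 is only one possible convention, not something the current code does):

```java
// Hypothetical holder showing why a nullable column needs Integer, not int.
class OptionalRank {
    private Integer rank; // null means "no rank stored", as in a nullable column

    Integer getRank() { return rank; }

    void setRank(Integer rank) { this.rank = rank; }

    // Maps the nullable form onto a plain int, using 0 for "no rank".
    int getRankOrZero() {
        return rank == null ? 0 : rank.intValue();
    }
}
```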

> Moreover 3 of these objects do not have a setRank method: SimpleComment, SimpleRankedCrossRef and SimpleRankedDocRef. How do I insert a comment in the middle of other comments, how do I change the order of these objects without creating new ones?

This is a bug. They should be mutable and fire appropriate change events.

> All these objects have an ordering consistent with equality except SimpleRichFeature. SimpleRichFeature are sorted by rank only. Its compareTo method also never returns 0. A consequence is that removeFeature in ThinRichSequence never works because TreeSet uses compareTo for testing equality.

This is another bug. compareTo, equals and hashCode should always be
working with the same fields. In this case, compareTo is missing a
bunch. It shouldn't be.

A word of warning though - when objects are loaded by Hibernate, often
they are instantiated and added to a set _before_ all the setXXX methods
are called to populate the various fields. Therefore, if you find nulls
in any of the fields required for comparison then you should assume the
object is still incomplete and return a non-zero result, to prevent the
object from accidentally replacing an existing object that matches the
fields populated so far.
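
A sketch of what that looks like in practice. RankedNote and its fields are illustrative only, not BioJava's SimpleNote: compareTo, equals and hashCode all use rank and term, and an object whose term is still null never compares as equal to another object.

```java
// Hypothetical note-like class; field names are illustrative, not BioJava's.
class RankedNote implements Comparable<RankedNote> {
    private int rank;
    private String term; // may still be null while the ORM is populating the object

    void setRank(int rank) { this.rank = rank; }

    void setTerm(String term) { this.term = term; }

    @Override
    public int compareTo(RankedNote o) {
        if (this == o) return 0;
        // Incomplete object: report "not equal" so it cannot replace a complete one.
        if (term == null || o.term == null) return -1;
        if (rank != o.rank) return rank < o.rank ? -1 : 1;
        return term.compareTo(o.term);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof RankedNote)) return false;
        RankedNote o = (RankedNote) obj;
        if (term == null || o.term == null) return false;
        return rank == o.rank && term.equals(o.term);
    }

    @Override
    public int hashCode() {
        // Same fields as equals and compareTo.
        return 31 * rank + (term == null ? 0 : term.hashCode());
    }
}
```

The null branch knowingly bends the compareTo contract (both directions return -1), which is the price of keeping half-loaded objects from colliding in a TreeSet.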

> All compareTo methods use rank first except SimpleRankedDocRef which does not use rank at all (but is ranked as its name indicates).

Another bug. It should be using rank as well.

> A few objects are nearly identical when they are equal but not all. SimpleNote compares by rank then by term but not by value. SimpleNotes of same rank and term but different values are nevertheless equal. SimpleRankedDocRef can be equal and have different locations ? I can understand this. 

SimpleNote is correct - two notes are equal if they have the same rank
and term.

SimpleRankedDocRef however is incorrect - it should include location in
the equals/compareTo/hashCode methods. Another bug then, but check for
non-null locations during Hibernate loading as above.

If you or Mark can report all these to Bugzilla, then one of us will get
round to fixing them before the end of the beta testing. (Reporting them
to Bugzilla makes a nice todo list which is far more reliable than me
trying to keep track of everything on paper...).

cheers,
Richard

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFBn3z4C5LeMEKA/QRAmU4AJ9TJ5oh7EnUdJNLHryEx3RxNJ0CXwCfe2eY
e8Qww/i+MMBA8sgRJVvV+Z8=
=UURD
-----END PGP SIGNATURE-----

From markjschreiber at gmail.com  Wed Sep 27 05:18:42 2006
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 27 Sep 2006 17:18:42 +0800
Subject: [Biojava-dev] resources for gui?
Message-ID: <93b45ca50609270218n76f19a2bxcd6c6b4d53dbea15@mail.gmail.com>

Hi -

Can someone tell me what the purpose of the files in
resources/org/biojava/bio is?

Thanks,

- Mark

From gwaldon at geneinfinity.org  Tue Sep 12 00:38:41 2006
From: gwaldon at geneinfinity.org (george waldon)
Date: Mon, 11 Sep 2006 17:38:41 -0700
Subject: [Biojava-dev] Problem with ranks
Message-ID: <200609120038.k8C0cfDV065591@mmm1924.dulles19-verio.com>

Hi,

I am having difficulty using ranking with some objects found in SimpleRichSequence. There are 6 objects contained in SimpleRichSequence which are found within collections, namely SimpleComment, SimpleRankedCrossRef, SimpleRankedDocRef, SimpleNote, SimpleBioEntryRelationShip, and SimpleRichFeature. Each of them is associated with a TreeSet and uses ranking to some extent for comparison.

Ranks are never described, but the name suggests that they are positive integers, in consecutive order, and not identical for similar objects within the same sequence. Here are some questions:

- Can rank be negative? We would assume not but this is never checked.
- If ranks cannot be negative, where do they start: 0 or 1? SimpleBioEntryRelationShip suggests that they start at 1 with 0 reserved for absence of ranking.
- Are we expecting ranks to be in consecutive order (or in reasonably consecutive order), or are values like 1000, 2000, etc. possible or even expected?
- Can we have duplicate ranks? We would assume not but SimpleRichFeature javadoc indicates that equal ranks are *acceptable*.

SimpleBioEntryRelationship getRank method returns an Integer object, all the other objects return an integer number. Any reason for this?

Moreover 3 of these objects do not have a setRank method: SimpleComment, SimpleRankedCrossRef and SimpleRankedDocRef. How do I insert a comment in the middle of other comments, how do I change the order of these objects without creating new ones?

All these objects have an ordering consistent with equality except SimpleRichFeature. SimpleRichFeature are sorted by rank only. Its compareTo method also never returns 0. A consequence is that removeFeature in ThinRichSequence never works because TreeSet uses compareTo for testing equality.

All compareTo methods use rank first except SimpleRankedDocRef which does not use rank at all (but is ranked as its name indicates).

A few objects are nearly identical when they are equal but not all. SimpleNote compares by rank then by term but not by value. SimpleNotes of same rank and term but different values are nevertheless equal. SimpleRankedDocRef can be equal and have different locations ? I can understand this. 

We need a clear definition of what ranks are, what the ordering they imply is intended for, and how to deal with duplicate ranks. Maybe we could have an interface that encapsulates the concept of ranking (e.g. interface Ranked, with methods setRank() and getRank()) and all this information grouped in the javadoc. It seems easier to derive exceptions from a common pattern than the opposite. Maybe we also need separate comparators when they are not consistent with equals.

Thanks,
George


From mark.schreiber at novartis.com  Tue Sep 12 03:37:55 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 12 Sep 2006 11:37:55 +0800
Subject: [Biojava-dev] Problem with ranks
Message-ID: <OFDAC0A613.57020768-ON482571E7.000A58D9-482571E7.0013F38A@ah.novartis.com>

Hi George, thanks for raising these issues. We should fix this before 
biojava 1.5 finishes its beta testing. See my responses below. Richard 
Holland and David Scott will no doubt have comments too.

> I am having difficulties to use ranking with some objects found in
> SimpleRichSequence. There are 6 objects contained in SimpleRichSequence
> which are found within collections, namely SimpleComment,
> SimpleRankedCrossRef, SimpleRankedDocRef, SimpleNote,
> SimpleBioEntryRelationShip, and SimpleRichFeature. Each of them is
> associated with a TreeSet and uses to some extend ranking for comparison.
>
> Ranks are never described but the name suggests that they are positive
> integer, in consecutive order and not identical for similar objects
> within the same sequence. Here are some questions:

Ranks actually come from the BioSQL schema. They are used so that lists of 
features, comments etc that are stored in database tables (or any other 
collection) can be reassembled in the same order that they are found in 
the original flatfile (Genbank etc). Simply put they are used to preserve 
order.

> - Can rank be negative? We would assume not but this is never checked.

I suppose it could be but it would make no sense given the above 
description. We should probably document this in the javadocs and suggest 
that classes enforce the non-negative rule.

> - If rank cannot be negative, where do they start, 0, 1?
> SimpleBioEntryRelationShip suggests that they start at 1 with 0 reserved
> for absence of ranking.

At the moment this strictly depends on the creating object. Typically this 
would be a RichSequenceFormat implementation. The Genbank format appears 
to start numbering from either 0 or 1 (for comments). There should be a 
common rule.

> - Are we expecting ranks to be in consecutive order (or in reasonable
> consecutive order) or values like 1000, 2000, etc. are possible or even
> expected?

Is there any reason why we need to enforce this rule? It would be tidier 
but it would be a pain to have to re-order everything just because one 
object is deleted. The genbank parser currently numbers sequentially.

> - Can we have duplicate ranks? We would assume not but SimpleRichFeature
> javadoc indicates that equal ranks are *acceptable*.

Certainly all the RankedCrossRefs returned by the Genbank parser have the 
same rank (0). It is possible as long as the objects are somehow unique. 
If equals() is true then the objects are overwritten. I don't think any 
Ranked object currently relies only on rank for equality (or for the 
compare() method either). The Unit tests do a pretty good job of testing 
equals and compare and making sure they return logically equivalent 
values. Although it is possible it may not be desirable. Any thoughts?

> SimpleBioEntryRelationship getRank method returns an Integer object, all
> the other objects return an integer number. Any reason for this?

I think Richard has a reason. Something to do with Hibernate?? Richard??

> Moreover 3 of these objects do not have a setRank method: SimpleComment,
> SimpleRankedCrossRef and SimpleRankedDocRef. How do I insert a comment in
> the middle of other comments, how do I change the order of these objects
> without creating new ones?

Possibly they should. Making things mutable is always tricky but the other 
objects with setRank methods register change listeners and have the option 
of vetoing the change so it can be done safely. The ChangeListener could 
be in charge of re-ordering ranks if you insert into the middle.

> All these objects have an ordering consistent with equality except
> SimpleRichFeature. SimpleRichFeature are sorted by rank only. Its
> compareTo method also never returns 0. A consequence is that removeFeature
> in ThinRichSequence never works because TreeSet uses compareTo for
> testing equality.

OK, that sounds like a bug that we have missed in the Unit tests. I will 
report it to bugzilla and fix it when I have time.
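
For anyone wanting to reproduce the effect outside BioJava, here is a minimal sketch of why a compareTo that never returns 0 defeats TreeSet.remove. RankOnlyFeature is a hypothetical stand-in, not the real SimpleRichFeature, and the sketch uses generics for brevity, so it assumes Java 5 or later rather than the JDK 1.4 target.

```java
import java.util.TreeSet;

// Hypothetical stand-in for a ranked feature; not the real SimpleRichFeature.
class RankOnlyFeature implements Comparable<RankOnlyFeature> {
    final int rank;

    RankOnlyFeature(int rank) {
        this.rank = rank;
    }

    // Mimics the reported behaviour: orders by rank but never returns 0,
    // not even when an object is compared with itself.
    public int compareTo(RankOnlyFeature other) {
        return this.rank < other.rank ? -1 : 1;
    }
}

class TreeSetRemoveDemo {
    // Returns true only if the TreeSet removes the very object that was added.
    static boolean removeSucceeds() {
        TreeSet<RankOnlyFeature> features = new TreeSet<RankOnlyFeature>();
        RankOnlyFeature f = new RankOnlyFeature(1);
        features.add(f);
        // TreeSet locates elements with compareTo, so without a 0 result
        // it never finds f and leaves the set unchanged.
        return features.remove(f);
    }
}
```

Calling TreeSetRemoveDemo.removeSucceeds() returns false, which matches the symptom described for removeFeature.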

> All compareTo methods use rank first except SimpleRankedDocRef which does
> not use rank at all (but is ranked as its name indicates).

We should change this. Another bugzilla report.

> A few objects are nearly identical when they are equal but not all.
> SimpleNote compares by rank then by term but not by value. SimpleNotes of
> same rank and term but different values are nevertheless equal.
> SimpleRankedDocRef can be equal and have different locations ? I can
> understand this.

This is because the term of a SimpleNote is an ontology term and should 
therefore have only one value. Two Notes with the same term are therefore 
the same (or should be). For example if the term or keyword of the Note is 
Organism: there should only be one of these Notes.

> We need a clear definition of what ranks are, what the ordering they
> imply is intended for and how to deal with duplicate ranks? Maybe we
> could have an interface that encapsulates the concept of ranking, e.g.
> interface Ranked, methods setRank() and getRank()) and all these
> information grouped in the javadoc. It seems easier to derive exceptions
> from a common pattern that the opposite. Maybe we also need separate
> comparators when they are not consistent with equal.

I think we should have a 'Ranked' interface with clear rules in the 
javadoc. I can't think of any good reason why comparable and equal should 
not be consistent. We should try and keep them the same as much as 
possible.
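
As a starting point for discussion, a rough sketch of what such an interface might look like. The names Ranked and SimpleRanked are hypothetical, the non-negative check reflects only the rule suggested above (not current behaviour), and change events and ChangeVetoException are left out to keep the sketch short.

```java
// Hypothetical interface following the suggestion in this thread;
// it does not exist in BioJava as written here.
interface Ranked {
    /** Returns the rank used to preserve the original flatfile order. */
    int getRank();

    /** Sets the rank; this sketch rejects negative values. */
    void setRank(int rank);
}

// Minimal illustrative implementation of the proposed rule.
class SimpleRanked implements Ranked {
    private int rank;

    SimpleRanked(int rank) {
        setRank(rank);
    }

    public int getRank() {
        return rank;
    }

    public void setRank(int rank) {
        // Assumption: ranks are non-negative, with 0 meaning "no rank".
        if (rank < 0) {
            throw new IllegalArgumentException("rank must be non-negative: " + rank);
        }
        this.rank = rank;
    }
}
```

A real implementation would presumably fire change events from setRank in the same way as the classes that already have one.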

- Mark


From Robin.Emig at pioneer.com  Tue Sep 12 22:34:34 2006
From: Robin.Emig at pioneer.com (Emig, Robin)
Date: Tue, 12 Sep 2006 15:34:34 -0700
Subject: [Biojava-dev] Java1.5
In-Reply-To: <OF455B6B7D.2E38347C-ON482571AB.002F72A2-482571AB.002F810E@EU.novartis.net>
Message-ID: <BE0B1B72A3A05C448AD3F6FB6B6D9A137D682C@rcy1ms01.phibred.com>

   I'm a little confused about whether the Biojava 1.5 is using java
1.5. Looking through the email list it appears to be so, but the default
compile options in the build file are still for java 1.4. Can anyone
clarify for me?
Thanks
Robin


From mark.schreiber at novartis.com  Wed Sep 13 01:02:15 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Wed, 13 Sep 2006 09:02:15 +0800
Subject: [Biojava-dev] Java1.5
Message-ID: <OF062F7DA8.74BF22F7-ON482571E8.0005A60C-482571E8.0005B345@ah.novartis.com>

Biojava 1.5 officially uses JDK 1.4

- Mark


"Emig, Robin" <Robin.Emig at pioneer.com>
09/13/2006 06:34 AM

 
        To:     <mark.schreiber at novartis.com>
        cc:     <biojava-dev at biojava.org>
        Subject:        Java1.5


   I'm a little confused about whether the Biojava 1.5 is using java
1.5. Looking through the email list it appears to be so, but the default
compile options in the build file are still for java 1.4. Can anyone
clarify for me?
Thanks
Robin


From gwaldon at geneinfinity.org  Wed Sep 13 05:33:43 2006
From: gwaldon at geneinfinity.org (george waldon)
Date: Tue, 12 Sep 2006 22:33:43 -0700
Subject: [Biojava-dev]  Re:  Problem with ranks
Message-ID: <200609130533.k8D5Xi63019465@mmm1924.dulles19-verio.com>

Thank you Mark and Richard for your exhaustive answers. This is very much appreciated. I am not a database person and I was completely missing the other side of the story.

Perhaps the Bio* projects could agree quickly on ranks before someone populates a database with exotic values. It seems that there is a consensus on this list for having ranks be positive, non-null integers when they are defined and equal to zero otherwise. This would also solve the problem of the nullable rank of BioEntryRelationship (which could then be equivalent to an integer value of zero).

Also, I improperly reported that SimpleRankedDocRef compareTo method does not use rank. My apologies for the mistake.

Thanks
George


From bugzilla-daemon at portal.open-bio.org  Mon Sep 25 09:52:56 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 25 Sep 2006 05:52:56 -0400
Subject: [Biojava-dev] [Bug 2107] New: LabelledSequenceRenderer
Message-ID: <bug-2107-485@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2107

           Summary: LabelledSequenceRenderer
           Product: BioJava
           Version: 1.4
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: bio
        AssignedTo: biojava-dev at biojava.org
        ReportedBy: jolyon.holdstock at ogt.co.uk


Using a LabelledSequenceRenderer works as expected in a SequencePanel, but not
in TranslatedSequencePanel. In the latter the label is not displayed. Also,
while the sequence is displayed from the correct start point, the actual
sequence shown is incorrect.

Below is some example code that demonstrates the problem.

//Example code --------------------------------------------------
//Java libraries
import java.awt.Color;
import java.awt.BorderLayout;
//Java extension libraries
import javax.swing.JFrame;
//BioJava libraries
import org.biojava.bio.BioException;
import org.biojava.utils.ChangeVetoException;
import org.biojava.bio.symbol.RangeLocation;
import org.biojava.bio.gui.sequence.SymbolSequenceRenderer;
import org.biojava.bio.seq.Sequence;
import org.biojava.bio.seq.DNATools;
import org.biojava.bio.gui.sequence.SequencePanel;
import org.biojava.bio.gui.sequence.TranslatedSequencePanel;
import org.biojava.bio.gui.sequence.LabelledSequenceRenderer;

public class TestSequencePanel extends JFrame {

  private Sequence seq;
  private SequencePanel sp;
  private TranslatedSequencePanel tsp;

  public TestSequencePanel(){
    try {
      //Create the SequencePanel and TranslatedSequencePanel
      sp = new SequencePanel();
      tsp = new TranslatedSequencePanel();

      //Create a DNA sequence
      seq = DNATools.createDNASequence(
          "AGATAGCTAGCTAGATATGATAGATCGATAGCAAGCTAGCATCGACTACGATC", "DNA");

      //Create a renderer for the sequence
      SymbolSequenceRenderer ssr = new SymbolSequenceRenderer();

      //Create the LabelledSequenceRenderer
      LabelledSequenceRenderer lsr = new LabelledSequenceRenderer(50, 50);
      lsr.setFillColor(Color.white);
      lsr.setRenderer(ssr);
      lsr.addLabelString("Seq");

      //Set up the SequencePanel
      sp.setSequence(seq);
      sp.setRenderer(lsr);
      sp.setRange(new RangeLocation(1,300));

      //Set up the TranslatedSequencePanel
      tsp.setSequence(seq);
      tsp.setRenderer(lsr);
      tsp.setScale(12);
    }
    catch(ChangeVetoException e){
      System.out.println("ChangeVetoException: " + e);
    }
    catch(BioException e){
      System.out.println("BioException: " + e);
    }

    //Add the panels to the frame
    this.getContentPane().setLayout(new BorderLayout());
    this.getContentPane().add(sp, BorderLayout.NORTH);
    this.getContentPane().add(tsp, BorderLayout.CENTER);

    setLocation(100,100);
    setSize(400,200);
    setVisible(true);
  }
  public static void main(String[] args) {
    new TestSequencePanel();
  }
}


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From ClevelandJ at BATTELLE.ORG  Tue Sep 26 14:42:57 2006
From: ClevelandJ at BATTELLE.ORG (Cleveland, John S)
Date: Tue, 26 Sep 2006 10:42:57 -0400
Subject: [Biojava-dev] Percentage similarity
Message-ID: <251E388086D4D64B8413DCC66EA438084164DB@WS-BSO-MSE1.milky-way.battelle.org>

Does anyone know how to retrieve the percentage similarity from a BLAST
result using BioJava?

 
This field is not available from SeqSimilaritySearchSubHit.
SeqSimilaritySearchSubHit does have the getEValue() and getScore()
methods, so I was a little confused about not finding the "percentage
identity" and "percentage similarity" fields.  I followed the directions
in http://biojava.org/wiki/BioJava:CookBook:Blast:Echo, but again the
percentage similarity does not seem to be getting parsed by the
BlastLikeSaxParser.  Here is the result of the aforementioned code:
 

startHit()
      HitProp:    subjectSequenceLength: 299
      HitProp:    subjectId: lcd|5392-AAA98259
      HitProp:    subjectDescription: 
startSubHit()
      SubHitProp: score: 24.6
      SubHitProp: expectValue: 6.9
      SubHitProp: numberOfIdentities: 14
      SubHitProp: alignmentSize: 42
      SubHitProp: percentageIdentity: 33
      SubHitProp: numberOfIdentities: 14
      SubHitProp: numberOfPositives: 23
      SubHitProp: queryFrame: plus1
      SubHitProp: querySequenceStart: 928
      SubHitProp: querySequenceEnd: 1047
      SubHitProp: querySequence: TKDGKTQEWEMDNPGN--DFMTGSKDTYTFKLKDENLKIDDI
      SubHitProp: subjectSequenceStart: 126
      SubHitProp: subjectSequenceEnd: 167
      SubHitProp: subjectSequence: TDDGKIREYELPNKGSYPSFITLGSDNALWFTENQNNAIGRI
endSubHit()

endHit()
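
Although no percentage similarity property appears in this output, numberOfPositives and alignmentSize do, so one possible fallback is to derive the value by hand. The sketch below assumes that "percentage similarity" means positives divided by alignment length; BlastPercent is a hypothetical helper, not part of BioJava.

```java
// Hypothetical helper; assumes the two counts have already been read
// from the numberOfPositives and alignmentSize sub-hit properties.
class BlastPercent {
    // Returns count as a percentage of the alignment length.
    static double percent(int count, int alignmentSize) {
        if (alignmentSize <= 0) {
            throw new IllegalArgumentException("alignmentSize must be positive");
        }
        return 100.0 * count / alignmentSize;
    }
}
```

With the values shown here, percent(14, 42) gives roughly 33.3, in line with the reported percentageIdentity of 33, and percent(23, 42) gives roughly 54.8 for the positives.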
 
 
Thanks,
John Cleveland

 
From bugzilla-daemon at portal.open-bio.org  Thu Sep 28 09:27:11 2006
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 28 Sep 2006 05:27:11 -0400
Subject: [Biojava-dev] [Bug 2107] LabelledSequenceRenderer
In-Reply-To: <bug-2107-485@http.bugzilla.open-bio.org/>
Message-ID: <200609280927.k8S9RB8w018668@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2107


jolyon.holdstock at ogt.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jolyon.holdstock at ogt.co.uk


------- Comment #1 from jolyon.holdstock at ogt.co.uk  2006-09-28 05:27 -------
I've had a look at the TranslatedSequencePanel code and seem to have a work
around. I say 'seem' as I'm not an expert on Graphics2D

When using the LabelledSequenceRenderer in the TSP the paint method in the TSP
doesn't set the clip for the renderer correctly.

I have edited the following code in the TSP to change

clip.x
clip.width
The point for g2.translate

This sets the clip correctly, the label renders and the correct sequence
displayed. 

//OLD CODE ==========================================================
if (direction == HORIZONTAL) {
  // Clip x to edge of delegate renderer's leader
  clip.x = renderer.getMinimumLeader(this);
  clip.y = 0.0;
  // Set the width to visible symbols + the delegate
  // renderer's minimum trailer (which may have something in
  // it to render).
  clip.width = sequenceToGraphics(getVisibleSymbolCount() + 1)
      + renderer.getMinimumTrailer(this);
  clip.height = renderer.getDepth(this);
  g2.translate(leadingBorder.getSize() + insets.left, insets.top);
}

//NEW CODE ============================================================
if (direction == HORIZONTAL) {
  // Clip x to edge of delegate renderer's leader
  //clip.x = renderer.getMinimumLeader(this);
  clip.x = 0 - renderer.getMinimumLeader(this);
  clip.y = 0.0;
  // Set the width to visible symbols + the delegate
  // renderer's minimum trailer (which may have something in
  // it to render).
  clip.width = sequenceToGraphics(getVisibleSymbolCount() + 1)
      + renderer.getMinimumLeader(this) + renderer.getMinimumTrailer(this);
  clip.height = renderer.getDepth(this);
  g2.translate(leadingBorder.getSize() - clip.x + insets.left, insets.top);
}


I have used this code with the RulerRenderer via the MultiLineRenderer
and think that the ruler doesn't render numbers/ticks accurately for
the sequence in the TSP. It's marginal and only relevant at high resolution but
I'll have a look at this.




From smh1008 at cam.ac.uk  Thu Sep 28 15:51:36 2006
From: smh1008 at cam.ac.uk (David Huen)
Date: 28 Sep 2006 16:51:36 +0100
Subject: [Biojava-dev] tRNA anticodon alphabet
Message-ID: <Prayer.1.0.18.0609281651360.17217@hermes-2.csi.cam.ac.uk>

Hi, Would there be any objection to adding an alphabet to deal with 
anticodons? This would involve an additional alphabet comprising the 
current RNA alphabet extended with inosine.

Regards,
David Huen


From smh1008 at cam.ac.uk  Thu Sep 28 15:48:25 2006
From: smh1008 at cam.ac.uk (David Huen)
Date: 28 Sep 2006 16:48:25 +0100
Subject: [Biojava-dev] CodonPrefTools API
Message-ID: <Prayer.1.0.18.0609281648250.17217@hermes-2.csi.cam.ac.uk>

Hi, I wish to add to the CodonPrefTools API convenience methods that return 
each of the 64 codons. It would seem better to put it here in this less 
used API than clutter up the RNATools API.

If anyone wishes to object could they do so now please?

Regards,
David Huen


From mark.schreiber at novartis.com  Fri Sep 29 01:17:13 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 29 Sep 2006 09:17:13 +0800
Subject: [Biojava-dev] tRNA anticodon alphabet
Message-ID: <OF1F06A041.D888F59F-ON482571F8.000707B1-482571F8.000711D4@ah.novartis.com>

I think it would be a useful addition.

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910


David Huen <smh1008 at cam.ac.uk>
Sent by: biojava-dev-bounces at lists.open-bio.org
09/28/2006 11:51 PM

 
        To:     biojava-dev at biojava.org
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [Biojava-dev] tRNA anticodon alphabet


Hi, Would there be any object to adding an alphabet to deal with 
anticodons? This would involve an additional alphabet comprising the 
current RNA alphabet extended with inosine.

Regards,
David Huen
_______________________________________________
biojava-dev mailing list
biojava-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-dev


From bubba.puryear at gmail.com  Fri Sep 29 16:22:03 2006
From: bubba.puryear at gmail.com (Bubba Puryear)
Date: Fri, 29 Sep 2006 12:22:03 -0400
Subject: [Biojava-dev] GenbankFormat (biojavax) and comments with leading
	whitespace
Message-ID: <d2f7533b0609290922w6aee8e07ybae05536a8d634b6@mail.gmail.com>

Hey all,

  I've been using biojava for some time now on my project for reading
genbank flat files, but until recently I haven't been writing any.
Our client makes extensive use of VectorNTI (version 9, I think) and I
was doing some edits to genbank files (via biojavax) and noticed that
comment values get their whitespace trimmed.

  Turns out VNTI splats a load of state that it needs in the comment
section in a fairly lispish looking syntax... but indentation appears
to be important. In particular, VNTI won't read the files I've edited
that have had their whitespace munged. I have some local changes to
the parser that preserve leading/trailing whitespace for section
values for top level sections.

  I've run the tests locally (and added one for testing indented
comments) and run this against ~ 3000 files I have locally. I wanted
to get some feedback on this before I committed, though.

  As an example of the kind of thing that currently gets munged:

COMMENT     Vector_NTI_Display_Data_(Do_Not_Edit!)
COMMENT     (SXF
COMMENT      (CGexDoc "11460" 0 6359
COMMENT       (CDBMol 0 0 1 1 1 0 0 1633772385 0 "" "" 0 0 0 0 (CObList) (CObList)
COMMENT        (CObList) (CObList) -1)
COMMENT       (CDocSetData 1 0 0 0 0 0 "MAIN" 1 1 1 1 0 0 1 1 0 1 10 5 40 50 0 1 0
....

   The level of indentation can get quite deep.

Thanks,
Bubba


From markjschreiber at gmail.com  Sat Sep 30 12:29:41 2006
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Sat, 30 Sep 2006 20:29:41 +0800
Subject: [Biojava-dev] GenbankFormat (biojavax) and comments with
	leading whitespace
In-Reply-To: <d2f7533b0609290922w6aee8e07ybae05536a8d634b6@mail.gmail.com>
References: <d2f7533b0609290922w6aee8e07ybae05536a8d634b6@mail.gmail.com>
Message-ID: <93b45ca50609300529r8491cf0p22784589bea59618@mail.gmail.com>

I think this should be fine to commit as long as biojava can still
read in the file again (and other files).

You should probably also comment the code to say VNTI needs this and
to be doubly certain put in a unit test.

- Mark

On 9/30/06, Bubba Puryear <bubba.puryear at gmail.com> wrote:
> Hey all,
>
>   I've been using biojava for some time now on my project for reading
> genbank flat files, but until reacently I haven't been writing any.
> Our client makes extensive use of VectorNTI (version 9, I think) and I
> was doing some edits to genbank files (via biojavax) and notice that
> comment values get their whitespace trimmed.
>
>   Turns out VNTI splats a load of state that it needs in the comment
> section is a fairly lispish looking syntax... but indentation appears
> to be important. In particular, VNTI won't read the files I've edited
> that have had their whitespace munged. I have some local changes to
> the parser that preserve leading/trailing whitespace for section
> values for top level sections.
>
>   I've run the tests locally (and added one for testing indented
> comments) and run this against ~ 3000 files I have locally. I wanted
> to get some feedback on this before I committed, though.
>
>   As an example of the kind of thing that currently gets munged:
>
> COMMENT     Vector_NTI_Display_Data_(Do_Not_Edit!)
> COMMENT     (SXF
> COMMENT      (CGexDoc "11460" 0 6359
> COMMENT       (CDBMol 0 0 1 1 1 0 0 1633772385 0 "" "" 0 0 0 0
> (CObList) (CObList)
> COMMENT        (CObList) (CObList) -1)
> COMMENT       (CDocSetData 1 0 0 0 0 0 "MAIN" 1 1 1 1 0 0 1 1 0 1 10 5
> 40 50 0 1 0
> ....
>
>    The level of indentation can get quite deep.
>
> Thanks,
> Bubba
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>


From holland at ebi.ac.uk  Tue Sep 12 09:36:47 2006
From: holland at ebi.ac.uk (Richard Holland)
Date: Tue, 12 Sep 2006 09:36:47 -0000
Subject: [Biojava-dev] Problem with ranks
In-Reply-To: <200609120038.k8C0cfDV065591@mmm1924.dulles19-verio.com>
References: <200609120038.k8C0cfDV065591@mmm1924.dulles19-verio.com>
Message-ID: <45067DF3.7020403@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> Ranks are never described but the name suggests that they are positive integer, in consecutive order and not identical for similar objects within the same sequence. Here are some questions:

Ranks in general are defined by BioSQL, but as much else in that schema
they are not defined very well and so everyone has their own
interpretation of what should go where.

BioJava uses them in the way which I thought was most logical at the
time, but BioPerl often ignores them completely and populates them all
with zeroes. As BioJava can be connected to a database which could have
been populated by BioPerl, it has to be able to cope with these
different situations and potentially many others.

It would be nice for all the Bio* projects to agree on exactly how to
store various bits of information in BioSQL, especially as to how best
to represent specific file formats such as GenBank, but this is probably
highly unlikely given the limited amount of times when representatives
of all the projects are in the same place at the same time (basically
only at BOSC, and even then not always - there was nobody from BioJava
there this year).

> - Can rank be negative? We would assume not but this is never checked.

Yes. It can be any integer you want.

> - If rank cannot be negative, where do they start, 0, 1? SimpleBioEntryRelationShip suggests that they start at 1 with 0 reserved for absence of ranking.

I tried to start them all from 1, and used 0 for no-rank where rank is
compulsory, and null where rank is optional (see below). If you find
anywhere where I've been inconsistent, please feel free to raise a
Bugzilla bug to point out where I've gone wrong so I can fix them.

> - Are we expecting ranks to be in consecutive order (or in reasonable consecutive order) or values like 1000, 2000, etc. are possible or even expected?

They don't have to be consecutive.

> - Can we have duplicate ranks? We would assume not but SimpleRichFeature javadoc indicates that equal ranks are *acceptable*.

Yes, duplicates are fine.

> SimpleBioEntryRelationship getRank method returns an Integer object, all the other objects return an integer number. Any reason for this?

In BioSQL, BioEntryRelationship has a nullable rank, whereas all other
ranked objects have non-null ranks. Hence I have to use an Integer
object here to be able to cater for the null case, as this cannot be
done with a plain int like the others.

> Moreover 3 of these objects do not have a setRank method: SimpleComment, SimpleRankedCrossRef and SimpleRankedDocRef. How do I insert a comment in the middle of other comments, how do I change the order of these objects without creating new ones?

This is a bug. They should be mutable and fire appropriate change events.

> All these objects have an ordering consistent with equality except SimpleRichFeature. SimpleRichFeature are sorted by rank only. Its compareTo method also never returns 0. A consequence is that removeFeature in ThinRichSequence never works because TreeSet uses compareTo for testing equality.

This is another bug. compareTo, equals and hashCode should always
work with the same set of fields. In this case, compareTo is missing
several of them, and it shouldn't be.
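
To make the failure mode concrete, here is a small sketch with
hypothetical classes (not the BioJava ones). BrokenFeature orders by
rank only and never returns 0 from compareTo; ConsistentFeature uses
the same two fields (rank and an assumed name field) in compareTo,
equals and hashCode.

```java
import java.util.TreeSet;

// Hypothetical: ordered by rank only, and compareTo never returns 0.
class BrokenFeature implements Comparable<BrokenFeature> {
    final int rank;

    BrokenFeature(int rank) { this.rank = rank; }

    @Override
    public int compareTo(BrokenFeature other) {
        return rank < other.rank ? -1 : 1; // equal ranks reported as "greater"
    }
}

// Hypothetical: compareTo, equals and hashCode all use rank and name.
class ConsistentFeature implements Comparable<ConsistentFeature> {
    final int rank;
    final String name;

    ConsistentFeature(int rank, String name) {
        this.rank = rank;
        this.name = name;
    }

    @Override
    public int compareTo(ConsistentFeature other) {
        int byRank = Integer.compare(rank, other.rank);
        return byRank != 0 ? byRank : name.compareTo(other.name);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ConsistentFeature)) return false;
        ConsistentFeature f = (ConsistentFeature) o;
        return rank == f.rank && name.equals(f.name);
    }

    @Override
    public int hashCode() { return 31 * rank + name.hashCode(); }
}

class TreeSetRemovalDemo {
    // Adds one element, then tries to remove that same element.
    static boolean removeBroken() {
        TreeSet<BrokenFeature> set = new TreeSet<>();
        BrokenFeature f = new BrokenFeature(1);
        set.add(f);
        return set.remove(f);
    }

    static boolean removeConsistent() {
        TreeSet<ConsistentFeature> set = new TreeSet<>();
        ConsistentFeature f = new ConsistentFeature(1, "gene");
        set.add(f);
        return set.remove(f);
    }

    public static void main(String[] args) {
        System.out.println("broken removed: " + removeBroken());
        System.out.println("consistent removed: " + removeConsistent());
    }
}
```

TreeSet locates elements purely through compareTo, so an element whose
compareTo never returns 0 can be added but never found again, not even
by passing the very same instance to remove or contains.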

A word of warning though - when objects are loaded by Hibernate, often
they are instantiated and added to a set _before_ all the setXXX methods
are called to populate the various fields. Therefore, if you find nulls
in any of the fields required for comparison then you should assume the
object is still incomplete and return a non-zero result, to prevent the
object from accidentally replacing an existing object that matches the
fields populated so far.
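
One possible shape for such a guard, as a sketch only. GuardedNote is a
hypothetical class compared by rank and term; the identity shortcut and
the constant hash for incomplete objects are my own assumptions, not
something taken from the BioJava sources.

```java
// Hypothetical ranked object whose term may still be null while an ORM
// is populating it through setters.
class GuardedNote implements Comparable<GuardedNote> {
    private int rank;
    private String term; // null until setTerm has been called

    GuardedNote() {} // no-arg constructor, as an ORM would use

    GuardedNote(int rank, String term) {
        this.rank = rank;
        this.term = term;
    }

    void setRank(int rank) { this.rank = rank; }
    void setTerm(String term) { this.term = term; }

    @Override
    public int compareTo(GuardedNote other) {
        if (this == other) return 0; // assumption: same instance is always equal
        // Incomplete on either side: never report equality.
        if (term == null || other.term == null) return -1;
        int byRank = Integer.compare(rank, other.rank);
        return byRank != 0 ? byRank : term.compareTo(other.term);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GuardedNote)) return false;
        GuardedNote n = (GuardedNote) o;
        if (term == null || n.term == null) return false;
        return rank == n.rank && term.equals(n.term);
    }

    @Override
    public int hashCode() {
        // Assumption: a fixed value while the object is incomplete.
        return term == null ? 17 : 31 * rank + term.hashCode();
    }
}

class GuardedNoteDemo {
    public static void main(String[] args) {
        GuardedNote incomplete = new GuardedNote();
        GuardedNote complete = new GuardedNote(1, "note");
        System.out.println(incomplete.compareTo(complete) != 0);
        System.out.println(complete.equals(new GuardedNote(1, "note")));
    }
}
```

The trade-off is that returning a fixed non-zero value for incomplete
objects bends the usual Comparable symmetry rule, so the position of
such an object inside a sorted set should not be relied upon until it
has been fully populated.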

> All compareTo methods use rank first except SimpleRankedDocRef which does not use rank at all (but is ranked as its name indicates).

Another bug. It should be using rank as well.

> A few objects are nearly identical when they are equal but not all. SimpleNote compares by rank then by term but not by value. SimpleNotes of same rank and term but different values are nevertheless equal. SimpleRankedDocRef can be equal and have different locations ? I can understand this. 

SimpleNote is correct - two notes are equal if they have the same rank
and term.

SimpleRankedDocRef however is incorrect - it should include location in
the equals/compareTo/hashCode methods. Another bug then, but check for
non-null locations during Hibernate loading as above.

If you or Mark can report all these to Bugzilla, then one of us will get
round to fixing them before the end of the beta testing. (Reporting them
to Bugzilla makes a nice todo list which is far more reliable than me
trying to keep track of everything on paper...).

cheers,
Richard



From markjschreiber at gmail.com  Wed Sep 27 09:18:42 2006
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 27 Sep 2006 17:18:42 +0800
Subject: [Biojava-dev] resources for gui?
Message-ID: <93b45ca50609270218n76f19a2bxcd6c6b4d53dbea15@mail.gmail.com>

Hi -

Can someone tell me what the purpose of the files in
resources/org/biojava/bio is?

Thanks,

- Mark