From bugzilla-daemon at portal.open-bio.org  Wed Oct  1 16:48:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 1 Oct 2008 16:48:15 -0400
Subject: [Biojava-dev] [Bug 2602] New: ParseException thrown when parsing
	Genbank file.
Message-ID: <bug-2602-485@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2602

           Summary: ParseException thrown when parsing Genbank file.
           Product: BioJava
           Version: live (CVS source)
          Platform: All
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P1
         Component: seq.io
        AssignedTo: biojava-dev at biojava.org
        ReportedBy: tritt at wisc.edu


When attempting to read in a Genbank file using RichSequence.IOTools, I
received a ParseException. When using SeqIOTools, I do not have this problem.
The code that exposed the bug is given below. 

public static void main(String[] args) {

        String dnaDir = args[args.length-1];

        BufferedReader[] br = new BufferedReader[8];

        FileReader orthologs = null;
        for (int i = 0; i < br.length; i++)
                br[i] = null;

        try {
                orthologs = new FileReader(args[0]);
                for (int i = 0; i < br.length; i++)
                        br[i] = new BufferedReader(new FileReader(args[i+1]));
        } catch (FileNotFoundException ex){
                ex.printStackTrace();
                System.exit(-1);
        }

        RichSequenceIterator[] seqIt = new RichSequenceIterator[8];

        HashMap<String,RichFeature>[] features = new HashMap[8];
        for (int i = 0; i < features.length; i++){
                features[i] = new HashMap<String,RichFeature>();
        }

        for (int i = 0; i < br.length; i++)
                seqIt[i] = RichSequence.IOTools.readGenbankDNA(br[i], null);

        for (int i = 0; i < seqIt.length; i++){
                RichSequence seq = null;
                try {
                        seq = seqIt[i].nextRichSequence();
                        seqIt[i] = null;
                        br[i] = null;
                } catch (NoSuchElementException ex) {
                        ex.printStackTrace();
                        System.exit(-1);
                } catch (BioException ex) {
                        ex.printStackTrace();
                        System.exit(-1);
                }
                 .
                 .
                 .

The following error message was received.

org.biojava.bio.BioException: Could not read sequence
        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
        at OrthologSeqExtractor.main(OrthologSeqExtractor.java:76)
Caused by: org.biojava.bio.seq.io.ParseException: 

A Exception Has Occurred During Parsing. 
Please submit the details that follow to biojava-l at biojava.org or post a bug
report to http://bugzilla.open-bio.org/ 

Format_object=org.biojavax.bio.seq.io.GenbankFormat
Accession=EDL933
Id=null
Comments=Bad dbxref
Parse_block=FEATURES   Location/Qualifierssource   1..5528423/db_xref  
"GenBank:AE005174"/db_xref   "RefSeq_NA:NC_002655"/db_xref  
"ATCC:700927"/db_xref   "taxon:155864"/db_xref   "ERIC:SOP"/mol_type   "genomic
DNA"/note   "enterohemorrhagic"/organism   "Escherichia coli"/serotype  
"O157:H7:K-"/strain   "EDL933"/transl_table   11/db_xref  
"ASAP:ABH-0023909"/db_xref   "ERIC:ABH-0023909"

                  .
                  .
                  .

Stack trace follows ....


        at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:462)
        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
        ... 1 more
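
For illustration: the trace above nests the ParseException inside a
BioException, so only the outer message shows up unless the cause chain is
walked. The sketch below does that with plain JDK calls. The class and method
names are hypothetical, and the exceptions in main are stand-ins rather than
the BioJava types.

```java
import java.util.ArrayList;
import java.util.List;

/** Walks an exception's cause chain; plain JDK, no BioJava types needed. */
final class CauseChain {

    /** Returns the throwable followed by each nested cause, outermost first. */
    static List<Throwable> chain(Throwable t) {
        List<Throwable> out = new ArrayList<Throwable>();
        // Stop at the end of the chain, or if a cause repeats (guards against cycles).
        while (t != null && !out.contains(t)) {
            out.add(t);
            t = t.getCause();
        }
        return out;
    }

    /** Returns the innermost cause; the argument must not be null. */
    static Throwable rootCause(Throwable t) {
        List<Throwable> c = chain(t);
        return c.get(c.size() - 1);
    }

    public static void main(String[] args) {
        // Stand-ins for the BioException -> ParseException nesting in the report.
        Exception inner = new IllegalStateException("Bad dbxref");
        Exception outer = new RuntimeException("Could not read sequence", inner);
        for (Throwable t : chain(outer)) {
            System.out.println(t.getClass().getName() + ": " + t.getMessage());
        }
    }
}
```

In the reporter's catch block, calling something like rootCause(ex) on the
BioException would surface the "Bad dbxref" comment directly.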


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Thu Oct  2 03:54:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 2 Oct 2008 03:54:42 -0400
Subject: [Biojava-dev] [Bug 2603] New: StringIndexOutOfBoundsException while
	parsing blastresult
Message-ID: <bug-2603-485@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2603

           Summary: StringIndexOutOfBoundsException while parsing
                    blastresult
           Product: BioJava
           Version: unspecified
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: bio
        AssignedTo: biojava-dev at biojava.org
        ReportedBy: dtoomey at rcsi.ie


While parsing a BLAST result I get a StringIndexOutOfBoundsException. I have
narrowed down the cause of the error to this section:

Query= sp|P62368|ISPF_PLAF7 2-C-methyl-D-erythritol
2,4-cyclodiphosphate synthase OS=Plasmodium falciparum (isolate 3D7)
GN=ISPF

What I have found is that if the 3rd line is fewer than 11 characters long, the
error is thrown. If I add text or even extra spaces to this line, the error
does not occur. I have also noticed that it does not happen for the first entry
in a file containing multiple BLAST searches.

I have tried this on both Windows and Linux and get the same error. I have been
using BLAST version 2.2.18 but have also tried 2.2.17.
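
For illustration, the symptom matches what String.substring does when the
start index is past the end of the string. The sketch below uses only JDK
calls; the offset of 11 is an assumption taken from the "11 characters"
observation, and the class name is hypothetical.

```java
/** Reproduces the failure mode with plain String calls; no BioJava code involved. */
final class ShortLineDemo {

    /** True if line.substring(offset) would throw for this line. */
    static boolean wouldThrow(String line, int offset) {
        try {
            line.substring(offset);
            return false;
        } catch (StringIndexOutOfBoundsException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        int offset = 11; // assumed, from the "11 characters" observation

        String shortLine = "GN=ISPF";                       // 7 characters
        String padded = String.format("%-11s", shortLine);  // space-padded to 11

        System.out.println("short line throws:  " + wouldThrow(shortLine, offset));
        System.out.println("padded line throws: " + wouldThrow(padded, offset));
    }
}
```

A start index equal to the string length is allowed and yields an empty
string, which would explain why padding the line with spaces avoids the error.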



From bugzilla-daemon at portal.open-bio.org  Fri Oct  3 06:30:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 3 Oct 2008 06:30:16 -0400
Subject: [Biojava-dev] [Bug 2603] StringIndexOutOfBoundsException while
	parsing blastresult
In-Reply-To: <bug-2603-485@http.bugzilla.open-bio.org/>
Message-ID: <200810031030.m93AUGcD007688@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2603


------- Comment #1 from dtoomey at rcsi.ie  2008-10-03 06:30 EST -------
I have narrowed down the offending line to

oParsedSeq = poLine.substring( iOffset).concat( new String( oPadding ) );

from 'BlastLikeAlignmentSAXParser.java'

I have put in a hack which at least allows me to run the code

                try {
                        oParsedSeq = poLine.substring( iOffset ).concat( new String( oPadding ) );
                } catch (StringIndexOutOfBoundsException ex) {
                        System.out.println("Caught sub string error for poLine: "
                                        + poLine + " Offset is " + String.valueOf(iOffset));
                        oParsedSeq = poLine.concat( new String( oPadding ) );
                }
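
For illustration, the same fallback can be written with a length check
instead of catching the exception. This is only a sketch: the helper name and
signature are hypothetical, it is not the patch that was later attached to
this bug, and whether falling back to the whole line is the right behaviour
for the parser is an open question.

```java
/** Length-checked variant of the workaround above; hypothetical helper, not BioJava code. */
final class SubstringGuard {

    /**
     * Returns line.substring(offset) followed by the padding. If the offset is
     * negative or past the end of the line, the whole line is used instead,
     * mirroring the fallback in the try/catch version.
     */
    static String tailWithPadding(String line, int offset, char[] padding) {
        boolean inRange = offset >= 0 && offset <= line.length();
        String tail = inRange ? line.substring(offset) : line;
        return tail.concat(new String(padding));
    }

    public static void main(String[] args) {
        char[] padding = {' ', ' '};
        // Long enough: characters from index 11 onward are kept.
        System.out.println("[" + tailWithPadding("0123456789ABCDEF", 11, padding) + "]");
        // Too short: the whole line is kept, as in the workaround.
        System.out.println("[" + tailWithPadding("GN=ISPF", 11, padding) + "]");
    }
}
```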

(In reply to comment #0)
> While parsing a blast result I get a StringIndexOutOfBoundsException. I have
> narrowed down the cause of the error to this section
> Query= sp|P62368|ISPF_PLAF7 2-C-methyl-D-erythritol
> 2,4-cyclodiphosphate synthase OS=Plasmodium falciparum (isolate 3D7)
> GN=ISPF
> What I have found is that if the 3rd line is less than 11 characters long the
> error is thrown. If I add text or even extra spaces to this line then the error
> does not occur. Also I have noticed that it does not happen to the first entry
> in a file containing multiple blast searches.
> I have tried this on both Windows and Linux and get the same error. I have been
> using blast version 2.2.18 but have also tried 2.2.17



From bugzilla-daemon at portal.open-bio.org  Wed Oct 15 04:12:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 15 Oct 2008 04:12:18 -0400
Subject: [Biojava-dev] [Bug 2617] New: Cookbook blast parser example fails
	on a tblastn example
Message-ID: <bug-2617-485@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2617

           Summary: Cookbook blast parser example fails on a tblastn example
           Product: BioJava
           Version: live (CVS source)
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: search
        AssignedTo: biojava-dev at biojava.org
        ReportedBy: holland at ebi.ac.uk


(raised on behalf of user Charles Imbusch)

Hello,

for a project I want to parse a tblastn result with BioJava. I used the code
on http://biojava.org/wiki/BioJava:CookBook:Blast:Parser as it is and I get an
error message as follows:

Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -3
  at java.lang.String.substring(String.java:1938)
  at java.lang.String.substring(String.java:1905)
  at org.biojava.bio.program.sax.BlastLikeAlignmentSAXParser.parseLine(BlastLikeAlignmentSAXParser.java:289)
  at org.biojava.bio.program.sax.BlastLikeAlignmentSAXParser.parse(BlastLikeAlignmentSAXParser.java:115)
  at org.biojava.bio.program.sax.HitSectionSAXParser.outputHSPInfo(HitSectionSAXParser.java:514)
  at org.biojava.bio.program.sax.HitSectionSAXParser.firstHSPEvent(HitSectionSAXParser.java:287)
  at org.biojava.bio.program.sax.HitSectionSAXParser.interpret(HitSectionSAXParser.java:251)
  at org.biojava.bio.program.sax.HitSectionSAXParser.parse(HitSectionSAXParser.java:118)
  at org.biojava.bio.program.sax.BlastSAXParser.hitsSectionReached(BlastSAXParser.java:635)
  at org.biojava.bio.program.sax.BlastSAXParser.interpret(BlastSAXParser.java:337)
  at org.biojava.bio.program.sax.BlastSAXParser.parse(BlastSAXParser.java:164)
  at org.biojava.bio.program.sax.BlastLikeSAXParser.onNewDataSet(BlastLikeSAXParser.java:313)
  at org.biojava.bio.program.sax.BlastLikeSAXParser.interpret(BlastLikeSAXParser.java:276)
  at org.biojava.bio.program.sax.BlastLikeSAXParser.parse(BlastLikeSAXParser.java:162)
  at BlastEcho.echo(BlastEcho.java:29)
  at BlastEcho.main(BlastEcho.java:75)

I uploaded the Blast output file I want to parse here:
http://charles.imbusch.net/tmp/blastresult.txt

Any answer is appreciated.

Cheers,
 Charles



From f.jossinet at ibmc.u-strasbg.fr  Wed Oct 15 04:36:09 2008
From: f.jossinet at ibmc.u-strasbg.fr (Fabrice Jossinet)
Date: Wed, 15 Oct 2008 10:36:09 +0200
Subject: [Biojava-dev] Proposition of participation to the BioJava project
Message-ID: <A2FA53C9-195D-413A-8BEF-C201049272F9@ibmc.u-strasbg.fr>

Dear BioJava team,

My name is Fabrice Jossinet. I work as an assistant professor at a
French university (Louis Pasteur University in Strasbourg).
I have been developing bioinformatics tools in Java since 2002.
Before that, I did a PhD as a molecular biologist at the bench ;)
I am interested in the study of RNA. At present I am focused on RNA
structural features, but I am also interested in non-coding RNA genes
in genomes.
You can have a look at my current project at this address:
http://paradise-ibmc.u-strasbg.fr/
The project currently has 60,000 lines of code and uses more
than 10 external libraries.

I have been following BioJava for several years now and would like to
extend it with RNA concepts. If you think that I can participate, don't
hesitate to contact me ;)

All the best

Fabrice

--
Dr. Fabrice Jossinet
Laboratoire de Bioinformatique, modelisation et simulation des acides
nucleiques
Universite Louis Pasteur
Institut de biologie moleculaire et cellulaire du CNRS
UPR9002, Architecture et Reactivite de l'ARN
15 rue Rene Descartes
F-67084 Strasbourg Cedex
France

Tel + 33 (0) 3 88 417053
FAX + 33 (0) 3 88 60 22 18

f.jossinet at ibmc.u-strasbg.fr
fjossinet at gmail.com
http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
http://fjossinet.u-strasbg.fr/


From simpleyrx at 163.com  Wed Oct 15 05:11:50 2008
From: simpleyrx at 163.com (simpleyrx)
Date: Wed, 15 Oct 2008 17:11:50 +0800 (CST)
Subject: [Biojava-dev] can biojava calcaulte profile-profile alignment ?
In-Reply-To: <mailman.2377.1224061218.3070.biojava-dev@lists.open-bio.org>
References: <mailman.2377.1224061218.3070.biojava-dev@lists.open-bio.org>
Message-ID: <7852810.354291224061911001.JavaMail.coremail@app143.163.com>

 
Dear experts,
 
         I wonder whether BioJava can calculate profile-profile alignments?
 
 
--


student  

From bugzilla-daemon at portal.open-bio.org  Wed Oct 15 12:05:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 15 Oct 2008 12:05:23 -0400
Subject: [Biojava-dev] [Bug 2617] Cookbook blast parser example fails on a
	tblastn example
In-Reply-To: <bug-2617-485@http.bugzilla.open-bio.org/>
Message-ID: <200810151605.m9FG5Nhb004488@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2617


holland at ebi.ac.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE


------- Comment #1 from holland at ebi.ac.uk  2008-10-15 12:05 EST -------


*** This bug has been marked as a duplicate of bug 2603 ***



From bugzilla-daemon at portal.open-bio.org  Wed Oct 15 12:05:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 15 Oct 2008 12:05:25 -0400
Subject: [Biojava-dev] [Bug 2603] StringIndexOutOfBoundsException while
	parsing blastresult
In-Reply-To: <bug-2603-485@http.bugzilla.open-bio.org/>
Message-ID: <200810151605.m9FG5PZo004505@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2603


holland at ebi.ac.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |holland at ebi.ac.uk


------- Comment #2 from holland at ebi.ac.uk  2008-10-15 12:05 EST -------
*** Bug 2617 has been marked as a duplicate of this bug. ***



From holland at eaglegenomics.com  Wed Oct 15 12:25:16 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 15 Oct 2008 17:25:16 +0100
Subject: [Biojava-dev] Proposition of participation to the BioJava
	project
In-Reply-To: <A2FA53C9-195D-413A-8BEF-C201049272F9@ibmc.u-strasbg.fr>
References: <A2FA53C9-195D-413A-8BEF-C201049272F9@ibmc.u-strasbg.fr>
Message-ID: <a0d826f40810150925o2c97e5eeob00de5e9e58f5976@mail.gmail.com>

You're absolutely welcome to contribute! We appreciate all the help we can
get.

I will be sending out an email to the BioJava mailing lists in the next
couple of days inviting contributions for the new BioJava 3 code and
describing how to go about it. I think your RNA ideas would be a great
starting point.

cheers,
Richard

2008/10/15 Fabrice Jossinet <f.jossinet at ibmc.u-strasbg.fr>

> Dear BioJava team,
>
> my name is Fabrice Jossinet. I'm working as assistant professor in a french
> university (Louis Pasteur University in Strasbourg).
> I'm developing bioinformatics tool with the Java language since 2002.
> Before that, I did a PhD as  a molecular biologist at the bench ;)
> I'm interested in the study of RNA. At now I'm focused on their structural
> features, but i'm also interested in non-coding RNA genes in genomes.
> You can have a look at my current project at this address:
> http://paradise-ibmc.u-strasbg.fr/. At now this project has a size of 60
> 000 lines of code and uses more than 10 external libraries.
>
> I'm following BioJava since several years now. I would like to extend it
> with RNA concepts. If you think that I can participate, don't hesitate to
> answer me ;)
>
> All the best
>
> Fabrice
>
> --
> Dr. Fabrice Jossinet
> Laboratoire de Bioinformatique, modelisation et simulation des acides
> nucleiques
> Universite Louis Pasteur
> Institut de biologie moleculaire et cellulaire du CNRS
> UPR9002, Architecture et Reactivite de l'ARN
> 15 rue Rene Descartes
> F-67084 Strasbourg Cedex
> France
>
> Tel + 33 (0) 3 88 417053
> FAX + 33 (0) 3 88 60 22 18
>
> f.jossinet at ibmc.u-strasbg.fr
> fjossinet at gmail.com
> http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
> http://fjossinet.u-strasbg.fr/
>
>
>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From holland at eaglegenomics.com  Wed Oct 15 12:29:59 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 15 Oct 2008 17:29:59 +0100
Subject: [Biojava-dev] can biojava calcaulte profile-profile alignment ?
In-Reply-To: <7852810.354291224061911001.JavaMail.coremail@app143.163.com>
References: <mailman.2377.1224061218.3070.biojava-dev@lists.open-bio.org>
	<7852810.354291224061911001.JavaMail.coremail@app143.163.com>
Message-ID: <a0d826f40810150929n76d861a0r16476be43a9ca831@mail.gmail.com>

The short answer: no.

The long answer: not yet! But if someone would like to contribute some code
that can do it, watch out for my email to the mailing lists in the next
couple of days inviting contributions for the new BioJava 3 code base.

cheers,
Richard

2008/10/15 simpleyrx <simpleyrx at 163.com>

>
> Dear experts,
>
>         I wonder that can biojava can calcaulte profile-profile alignment ?
>
>
>
>
> --
>
>
> student
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
>


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From bugzilla-daemon at portal.open-bio.org  Thu Oct 16 02:15:05 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 16 Oct 2008 02:15:05 -0400
Subject: [Biojava-dev] [Bug 2603] StringIndexOutOfBoundsException while
	parsing blastresult
In-Reply-To: <bug-2603-485@http.bugzilla.open-bio.org/>
Message-ID: <200810160615.m9G6F5Tk014016@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2603


------- Comment #3 from tbanks at agr.gc.ca  2008-10-16 02:15 EST -------
Created an attachment (id=1007)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1007&action=view)
patch file 1 for bug 2603



From bugzilla-daemon at portal.open-bio.org  Thu Oct 16 02:15:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 16 Oct 2008 02:15:46 -0400
Subject: [Biojava-dev] [Bug 2603] StringIndexOutOfBoundsException while
	parsing blastresult
In-Reply-To: <bug-2603-485@http.bugzilla.open-bio.org/>
Message-ID: <200810160615.m9G6FkaF014096@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2603


------- Comment #4 from tbanks at agr.gc.ca  2008-10-16 02:15 EST -------
Created an attachment (id=1008)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1008&action=view)
patch file 2 for bug 2603



From bugzilla-daemon at portal.open-bio.org  Thu Oct 16 02:18:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 16 Oct 2008 02:18:10 -0400
Subject: [Biojava-dev] [Bug 2603] StringIndexOutOfBoundsException while
	parsing blastresult
In-Reply-To: <bug-2603-485@http.bugzilla.open-bio.org/>
Message-ID: <200810160618.m9G6IATb014290@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2603


------- Comment #5 from tbanks at agr.gc.ca  2008-10-16 02:18 EST -------
I've written up a fix for this bug.  As Richard suspected this fix takes care
of bug 2617 (I've tested both).  I've attached the patch files for the two
affected files.  If the patches don't take let me know and I'll email the
files.

- Travis



From f.jossinet at ibmc.u-strasbg.fr  Thu Oct 16 04:50:54 2008
From: f.jossinet at ibmc.u-strasbg.fr (Fabrice Jossinet)
Date: Thu, 16 Oct 2008 10:50:54 +0200
Subject: [Biojava-dev] Proposition of participation to the BioJava
	project
In-Reply-To: <a0d826f40810150925o2c97e5eeob00de5e9e58f5976@mail.gmail.com>
References: <A2FA53C9-195D-413A-8BEF-C201049272F9@ibmc.u-strasbg.fr>
	<a0d826f40810150925o2c97e5eeob00de5e9e58f5976@mail.gmail.com>
Message-ID: <65EB20E6-6137-441B-AC13-26031D46BDFE@ibmc.u-strasbg.fr>

Dear Richard,

Thank you very much. I'm looking forward to this invitation.

All the best

Fabrice

On 15 Oct 2008, at 18:25, Richard Holland wrote:

> You're absolutely welcome to contribute! We appreciate all the help  
> we can get.
>
> I will be sending out an email to the BioJava mailing lists in the  
> next couple of days inviting contributions for the new BioJava 3  
> code and describing how to go about it. I think your RNA ideas would  
> be a great starting point.
>
> cheers,
> Richard
>
> 2008/10/15 Fabrice Jossinet <f.jossinet at ibmc.u-strasbg.fr>
> Dear BioJava team,
>
> my name is Fabrice Jossinet. I'm working as assistant professor in a  
> french university (Louis Pasteur University in Strasbourg).
> I'm developing bioinformatics tool with the Java language since  
> 2002. Before that, I did a PhD as  a molecular biologist at the  
> bench ;)
> I'm interested in the study of RNA. At now I'm focused on their  
> structural features, but i'm also interested in non-coding RNA genes  
> in genomes.
> You can have a look at my current project at this address: http://paradise-ibmc.u-strasbg.fr/ 
> . At now this project has a size of 60 000 lines of code and uses  
> more than 10 external libraries.
>
> I'm following BioJava since several years now. I would like to  
> extend it with RNA concepts. If you think that I can participate,  
> don't hesitate to answer me ;)
>
> All the best
>
> Fabrice
>
> --
> Dr. Fabrice Jossinet
> Laboratoire de Bioinformatique, modelisation et simulation des acides
> nucleiques
> Universite Louis Pasteur
> Institut de biologie moleculaire et cellulaire du CNRS
> UPR9002, Architecture et Reactivite de l'ARN
> 15 rue Rene Descartes
> F-67084 Strasbourg Cedex
> France
>
> Tel + 33 (0) 3 88 417053
> FAX + 33 (0) 3 88 60 22 18
>
> f.jossinet at ibmc.u-strasbg.fr
> fjossinet at gmail.com
> http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
> http://fjossinet.u-strasbg.fr/
>
>
>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
>
>
> -- 
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/


From bugzilla-daemon at portal.open-bio.org  Thu Oct 16 05:39:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 16 Oct 2008 05:39:11 -0400
Subject: [Biojava-dev] [Bug 2603] StringIndexOutOfBoundsException while
	parsing blastresult
In-Reply-To: <bug-2603-485@http.bugzilla.open-bio.org/>
Message-ID: <200810160939.m9G9dBGm028921@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2603


------- Comment #6 from holland at ebi.ac.uk  2008-10-16 05:39 EST -------
Thanks for the patches! Could you email me the two complete files that you've
modified? (It's easier for me to just copy and paste the entire file.) I'll then
commit them to SVN.



From fbristow at gmail.com  Fri Oct 17 14:58:08 2008
From: fbristow at gmail.com (Franklin Bristow)
Date: Fri, 17 Oct 2008 13:58:08 -0500
Subject: [Biojava-dev] Writing Swissprot/Uniprot formatted files
Message-ID: <50a7756d0810171158k51aa3ee4l5f7078321633ebc5@mail.gmail.com>

Hello everyone,
I've been doing some work with swissprot, and I've been needing to make use
of the file reading and writing facilities in biojava.

I was using biojava 1.5, but I've recently moved to using biojava-live so
that I can actually step through the code to see what's going on.

I have successfully created an index of my swissprot database and I can read
my sequences out of that indexed database.  All of the appropriate
information is loaded from the records in the file into the appropriate
objects.  I am quite happy with this.

The problem that I am having has to do with writing swissprot records.

When I started using biojava, the recommended way to do this was using
SeqIOTools:
SeqIOTools.writeSwissprot(byteStream, swissSequence);

While this works (i.e. no exceptions are thrown), the record that is printed
to the byteStream looks pretty ugly (it's littered with XX lines) and is not
valid as per the current swissprot file spec
(http://www.expasy.ch/sprot/userman.html).  While this record is invalid, it
does contain all of the information that was originally in the swissprot
file.  I would include what I get as an output here, but it's irrelevant.

SeqIOTools became deprecated in favour of this:
RichSequence.IOTools.writeUniProt(byteStream, swissSequence, null);

Once again, while this works (and this time the record is valid), the record
that is printed contains almost none of the original information that is
contained in the swissprot record.  This is the output that I get when I
call this method (the spacing may not look right because of fonts, but
that is not the problem):

> ID   Q4UVA7_null             STANDARD;         273 AA.
> AC   Q4UVA7;
> DT   null, integrated into UniProtKB/?.
> DT   null, sequence version 0.
> DT   null, entry version 0.
> DE   null.
> FT   any           1    273
> FT   any         153    160
> SQ   SEQUENCE   273 AA;  30853 MW;  604FB6C6437A9D90 CRC64;
>      MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE
>      RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF
>      ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT
>      EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF
>      QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF
> //
>

But what I am expecting to see looks like this (again, the spacing is the
fault of the font, not the output):

> ID   Y1953_XANC8             Reviewed;         273 AA.
> AC   Q4UVA7;
> DT   10-JAN-2006, integrated into UniProtKB/Swiss-Prot.
> DT   05-JUL-2005, sequence version 1.
> DT   06-FEB-2007, entry version 12.
> DE   UPF0085 protein XC_1953.
> GN   OrderedLocusNames=XC_1953;
> OS   Xanthomonas campestris pv. campestris (strain 8004).
> OC   Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales;
> OC   Xanthomonadaceae; Xanthomonas.
> OX   NCBI_TaxID=314565;
> RN   [1]
> RP   NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
> RX   PubMed=15899963; DOI=10.1101/gr.3378705;
> RA   Qian W., Jia Y., Ren S.-X., He Y.-Q., Feng J.-X., Lu L.-F., Sun Q.,
> RA   Ying G., Tang D.-J., Tang H., Wu W., Hao P., Wang L., Jiang B.-L.,
> RA   Zeng S., Gu W.-Y., Lu G., Rong L., Tian Y., Yao Z., Fu G., Chen B.,
> RA   Fang R., Qiang B., Chen Z., Zhao G.-P., Tang J.-L., He C.;
> RT   "Comparative and functional genomic analyses of the pathogenicity of
> RT   phytopathogen Xanthomonas campestris pv. campestris.";
> RL   Genome Res. 15:757-767(2005).
> CC   -!- SIMILARITY: Belongs to the UPF0085 family.
> CC   -----------------------------------------------------------------------
> CC   Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
> CC   Distributed under the Creative Commons Attribution-NoDerivs License
> CC   -----------------------------------------------------------------------
> DR   EMBL; CP000050; AAY49016.1; -; Genomic_DNA.
> DR   GenomeReviews; CP000050_GR; XC_1953.
> DR   KEGG; xcb:XC_1953; -.
> DR   GO; GO:0005524; F:ATP binding; IEA:HAMAP.
> DR   HAMAP; MF_01062; -; 1.
> DR   InterPro; IPR005177; DUF299.
> DR   Pfam; PF03618; DUF299; 1.
> KW   ATP-binding; Complete proteome; Nucleotide-binding.
> FT   CHAIN         1    273       UPF0085 protein XC_1953.
> FT                                /FTId=PRO_0000196744.
> FT   NP_BIND     153    160       ATP (Potential).
> SQ   SEQUENCE   273 AA;  30853 MW;  604FB6C6437A9D90 CRC64;
>      MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE
>      RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF
>      ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT
>      EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF
>      QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF
> //
>

Needless to say, there is a considerable loss of information.

At first I wasn't sure if this was a problem with parsing the database that
I had, so I inspected the object that was retrieved from the database.  As I
mentioned before, the parsing seems to be working fine.  I get a
SimpleSequence object that has all of the correct annotations and other
information loaded into it.

I then continued to step through the writeUniProt method in
RichSequence.IOTools and found that this method first calls "enrich" on
SimpleSequence which turns it into a SimpleRichSequence.  There appears to
be some loss of information at this point, specifically in the feature set
where the 'key name' is lost -- it just becomes 'any'.

It is when we get to the actual process of writing to the stream in
UniprotFormat.writeSequence that we have the problems.  All of the code
appears to be there for printing the information out that I'm expecting.  I
think the problem is that in the process of "enrich"-ing the sequence, the
data is still stored in the object, but it is no longer where it is expected
to be.  For example, when we get to writing the comments out:
        // comments - if any
        if (!rs.getComments().isEmpty()) {

The List of comments IS empty, but there are comments in the
SimpleRichSequence, they are stored in the notes data member.
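
For illustration, the mismatch described above can be modelled with a toy
record. Everything in this sketch is a stand-in: ToyRecord and ToyWriter are
not BioJava classes, and the "comment" note key is a hypothetical name. It
only shows why a writer that checks the comments list prints nothing while
the text sits in the notes, and what a manual conversion step would change.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a sequence record; NOT the BioJava object model. */
final class ToyRecord {
    final List<String> comments = new ArrayList<String>();
    final Map<String, String> notes = new LinkedHashMap<String, String>();
}

final class ToyWriter {

    /** Emits CC lines from the comments list only, like the check quoted above. */
    static String writeComments(ToyRecord r) {
        StringBuilder sb = new StringBuilder();
        if (!r.comments.isEmpty()) {
            for (String c : r.comments) {
                sb.append("CC   ").append(c).append('\n');
            }
        }
        return sb.toString();
    }

    /** Manual conversion step: move the note stored under the given key into comments. */
    static void promoteNote(ToyRecord r, String key) {
        String value = r.notes.remove(key);
        if (value != null) {
            r.comments.add(value);
        }
    }

    public static void main(String[] args) {
        ToyRecord r = new ToyRecord();
        // After conversion the text exists, but only as a note (hypothetical key).
        r.notes.put("comment", "-!- SIMILARITY: Belongs to the UPF0085 family.");

        System.out.println("before: [" + writeComments(r) + "]");
        promoteNote(r, "comment");
        System.out.println("after:  [" + writeComments(r) + "]");
    }
}
```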

So, after this lengthy explanation of my problem, I am wondering whether I am
simply not doing this correctly.  Is there a better way to pass my
information to the writeUniProt method?  Should I be transforming my
SimpleSequence objects into SimpleRichSequence objects manually?  Am I just
going about this entirely the wrong way?

If I am going about this correctly and the functionality to do this is
merely not there or hasn't been implemented correctly, I would be more than
happy to help out...  I can supply patches, create bug reports, or anything
else that is necessary.

Any guidance in this matter would be greatly appreciated!

-- 
Franklin

From holland at eaglegenomics.com  Fri Oct 17 16:08:25 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 17 Oct 2008 21:08:25 +0100
Subject: [Biojava-dev] Writing Swissprot/Uniprot formatted files
In-Reply-To: <50a7756d0810171158k51aa3ee4l5f7078321633ebc5@mail.gmail.com>
References: <50a7756d0810171158k51aa3ee4l5f7078321633ebc5@mail.gmail.com>
Message-ID: <a0d826f40810171308s5b2788aah27a982cb6a3b45e@mail.gmail.com>

Hello.

I'm not sure how you're getting your uniprot records out of your swissprot
database, or what format your swissprot database is in. If it's BioSQL, then
the way BioJava interacts with it has altered significantly with BioJavaX -
previous versions basically stuffed everything in as comments, hence all the
XX lines you got when writing it back out again. However if it's not BioSQL
and you've written something custom of your own, then I couldn't really
comment!

BioJavaX will attempt to convert the old sequence objects into rich sequence
objects, but there's not much in common between the way uniprot data is
stored in the old object model and the new one. Therefore the enrich method
can't do a very good job - especially for stuff which the original parser
stored as comments instead of properly distributing it across the object
model. Data which the original parser stored in this comment format will
mostly get ignored by the conversion process, because the conversion process
has no idea where the record came from and therefore what to do with the
comments inside it.

Your best bet is to read your data out of your database directly as rich
sequence objects, or if not possible, then do the conversion manually.

cheers,
Richard


2008/10/17 Franklin Bristow <fbristow at gmail.com>

> Hello everyone,
> I've been doing some work with swissprot, and I've been needing to make use
> of the file reading and writing facilities in biojava.
>
> I was using biojava 1.5, but I've recently moved to using biojava-live so
> that I can actually step through the code to see what's going on.
>
> I have successfully created an index of my swissprot database and I can
> read
> my sequences out of that indexed database.  All of the appropriate
> information is loaded from the records in the file into the appropriate
> objects.  I am quite happy with this.
>
> The problem that I am having has to do with writing swissprot records.
>
> When I started using biojava, the recommended way to do this was using
> SeqIOTools:
> SeqIOTools.writeSwissprot(byteStream, swissSequence);
>
> While this works (ie: no exceptions are thrown), the record that is printed
> to the byteStream looks pretty ugly (it's littered with XX lines) and is
> not
> valid as per the current swissprot file spec (
> http://www.expasy.ch/sprot/userman.html).  While this record is invalid,
> it
> does contain all of the information that was originally in the swissprot
> file.  I would include what I get as an output here, but it's irrelevant.
>
> SeqIOTools became deprecated in favour of this:
> RichSequence.IOTools.writeUniProt(byteStream, swissSequence, null);
>
> Once again, while this works (and this time the record is valid), the
> record
> that is printed contains almost none of the original information that is
> contained in the swissprot record.  This is the output that I get when I
> call this method (the spacing may not look right because of fonts, but
> that is not the problem):
>
> ID   Q4UVA7_null             STANDARD;         273 AA.
> > AC   Q4UVA7;
> > DT   null, integrated into UniProtKB/?.
> > DT   null, sequence version 0.
> > DT   null, entry version 0.
> > DE   null.
> > FT   any           1    273
> > FT   any         153    160
> > SQ   SEQUENCE   273 AA;  30853 MW;  604FB6C6437A9D90 CRC64;
> >      MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE
> >      RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF
> >      ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT
> >      EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF
> >      QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF
> > //
> >
>
> But what I am expecting to see looks like this (again, the spacing is the
> fault of the font, not the output):
>
> > ID   Y1953_XANC8             Reviewed;         273 AA.
> > AC   Q4UVA7;
> > DT   10-JAN-2006, integrated into UniProtKB/Swiss-Prot.
> > DT   05-JUL-2005, sequence version 1.
> > DT   06-FEB-2007, entry version 12.
> > DE   UPF0085 protein XC_1953.
> > GN   OrderedLocusNames=XC_1953;
> > OS   Xanthomonas campestris pv. campestris (strain 8004).
> > OC   Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales;
> > OC   Xanthomonadaceae; Xanthomonas.
> > OX   NCBI_TaxID=314565;
> > RN   [1]
> > RP   NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
> > RX   PubMed=15899963; DOI=10.1101/gr.3378705;
> > RA   Qian W., Jia Y., Ren S.-X., He Y.-Q., Feng J.-X., Lu L.-F., Sun Q.,
> > RA   Ying G., Tang D.-J., Tang H., Wu W., Hao P., Wang L., Jiang B.-L.,
> > RA   Zeng S., Gu W.-Y., Lu G., Rong L., Tian Y., Yao Z., Fu G., Chen B.,
> > RA   Fang R., Qiang B., Chen Z., Zhao G.-P., Tang J.-L., He C.;
> > RT   "Comparative and functional genomic analyses of the pathogenicity of
> > RT   phytopathogen Xanthomonas campestris pv. campestris.";
> > RL   Genome Res. 15:757-767(2005).
> > CC   -!- SIMILARITY: Belongs to the UPF0085 family.
> > CC   -----------------------------------------------------------------------
> > CC   Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
> > CC   Distributed under the Creative Commons Attribution-NoDerivs License
> > CC   -----------------------------------------------------------------------
> > DR   EMBL; CP000050; AAY49016.1; -; Genomic_DNA.
> > DR   GenomeReviews; CP000050_GR; XC_1953.
> > DR   KEGG; xcb:XC_1953; -.
> > DR   GO; GO:0005524; F:ATP binding; IEA:HAMAP.
> > DR   HAMAP; MF_01062; -; 1.
> > DR   InterPro; IPR005177; DUF299.
> > DR   Pfam; PF03618; DUF299; 1.
> > KW   ATP-binding; Complete proteome; Nucleotide-binding.
> > FT   CHAIN         1    273       UPF0085 protein XC_1953.
> > FT                                /FTId=PRO_0000196744.
> > FT   NP_BIND     153    160       ATP (Potential).
> > SQ   SEQUENCE   273 AA;  30853 MW;  604FB6C6437A9D90 CRC64;
> >      MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE
> >      RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF
> >      ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT
> >      EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF
> >      QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF
> > //
> >
>
> Needless to say, there is a considerable loss of information.
>
> At first I wasn't sure if this was a problem with parsing the database that
> I had, so I inspected the object that was retrieved from the database.  As
> I
> mentioned before, the parsing seems to be working fine.  I get a
> SimpleSequence object that has all of the correct annotations and other
> information loaded into it.
>
> I then continued to step through the writeUniProt method in
> RichSequence.IOTools and found that this method first calls "enrich" on
> SimpleSequence which turns it into a SimpleRichSequence.  There appears to
> be some loss of information at this point, specifically in the feature set
> where the 'key name' is lost -- it just becomes 'any'.
>
> It is when we get to the actual process of writing to the stream in
> UniprotFormat.writeSequence that we have the problems.  All of the code
> appears to be there for printing the information out that I'm expecting.  I
> think the problem is that in the process of "enrich"-ing the sequence, the
> data is still stored in the object, but it is no longer where it is
> expected
> to be.  For example, when we get to writing the comments out:
>        // comments - if any
>        if (!rs.getComments().isEmpty()) {
>
> The List of comments IS empty, but there are comments in the
> SimpleRichSequence, they are stored in the notes data member.
>
> So.  After this lengthy explanation of my problem, I am wondering if I am
> merely not doing this correctly.  Is there a better way to pass my
> information to the writeUniProt method -- should I be transforming my
> SimpleSequence objects into a SimpleRichSequence manually?  Am I just going
> about this entirely the wrong way?
>
> If I am going about this correctly and the functionality to do this is
> merely not there or hasn't been implemented correctly, I would be more than
> happy to help out...  I can supply patches, create bug reports, or anything
> else that is necessary.
>
> Any guidance in this matter would be greatly appreciated!
>
> --
> Franklin
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From holland at eaglegenomics.com  Sun Oct 19 20:18:29 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 20 Oct 2008 01:18:29 +0100
Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please!
Message-ID: <a0d826f40810191718x4f89c210l27f20f2532ed24d3@mail.gmail.com>

Hi all,

I've just committed some new code to the biojava3 branch of the biojava-live
subversion repository. It's the foundations of a brand new alphabet+symbol
set of classes, and an example of how to use them to represent DNA. You'll
notice that the new code is very lightweight and allows for a lot more
flexibility than the old code - for instance, the concept of Alphabet has
changed radically. It also makes much more extensive use of the Collections
API.

I haven't got any test cases or usage examples yet but give me a shout if
you don't understand the code and I'll explain how it works. (Hint:
SymbolFormat is there to convert Strings into SymbolList objects, and vice
versa).

So, now we want some volunteers! We're starting from scratch here so there's
a lot of work to do. The whole of BioJava needs 'translating' into BJ3,
whether it be copy-and-paste existing classes and modify them to suit the
new style, or write completely new ones to provide equivalent functionality.


I'll post an example of how to do file parsing soon, probably starting with
FASTA. In the meantime, a good place to start would be for people to design
object models to represent their favourite data types (e.g. Genbank, or
microarray data). Utility classes to manipulate those objects would be great
too.

The object models need to be normalised as much as possible - e.g. if your
data has a lot of comments, and the order of those comments is important,
then give your object model a collection of comment objects. The object
model for each data type should be completely independent and use basic data
types wherever possible (e.g. store sequences as strings, don't attempt to
parse them into anything fancy like SymbolLists). The closer the object
model is to the original data format, the better. There's going to be clever
tricks when it comes to converting data between different object models
(e.g. Genbank to INSDSeq), which I will explain later when I put the file
parsing examples up.
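
As a rough illustration of the kind of object model described above, here is a
small sketch. ToyRecord and ToyComment are made-up names, not classes in the
biojava3 branch, and the fields are assumptions chosen only to show the two
points made in the text: comments kept as an ordered collection of comment
objects, and the sequence kept as a plain String.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical comment object: one entry per comment in the source record.
final class ToyComment {
    private final String text;

    ToyComment(String text) {
        this.text = text;
    }

    String getText() {
        return text;
    }
}

// Hypothetical record model that stays close to the flat-file layout.
final class ToyRecord {
    private final String accession;
    private final String sequence; // plain String, not parsed into symbols
    private final List<ToyComment> comments = new ArrayList<ToyComment>();

    ToyRecord(String accession, String sequence) {
        this.accession = accession;
        this.sequence = sequence;
    }

    // Comments are appended in file order, so the order is preserved.
    void addComment(ToyComment comment) {
        comments.add(comment);
    }

    String getAccession() {
        return accession;
    }

    String getSequence() {
        return sequence;
    }

    // Read-only view so callers cannot reorder or drop comments by accident.
    List<ToyComment> getComments() {
        return Collections.unmodifiableList(comments);
    }
}
```

A List is used rather than a Set or Map because the text says the order of
the comments can matter; a format-specific model would add its own fields in
the same spirit.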

You'll notice how the biojava3 branch uses Maven instead of Ant. This is
because we want to make it as modular as possible, so if you want to write
microarray stuff, create a new microarray sub-project (as per the dna
example that's already there). This way if someone only wants the microarray
bit of BJ3, they only need install the appropriate JAR file and can ignore
the rest. (The 'core' module is for stuff that is so generic it could be
used anywhere, or is used in every single other module.)

If coding isn't your cup of tea, then we would very much welcome testers
(particularly those who enjoy writing test cases!), documenters
(particularly code commenters), translators (for internationalisation of the
code), and of course all those who wish to contribute ideas and suggestions
no matter how off-the-wall they might be. In particular if you'd like to
take charge of an area of the development process, e.g. Documentation Chief,
or Protein Champion, then that would be much appreciated.

I'm very much looking forward to working with everyone on this. Good luck,
and happy coding!

cheers,
Richard

PS. Please don't forget to attach the appropriate licence to your code. You
can copy-and-paste it from the existing classes I just committed this
evening.

PPS. For those who are worried about backwards compatibility - this was
discussed on the lists a while back and it was made clear that BJ3 is a
clean break. However, the existing code will continue to be maintained and
bugfixed for a couple of years so you don't have to upgrade if you don't
want to - it just won't have any new features developed for it. This is
largely because it'll probably take just that long to write all the new BJ3
code. When we do decide to desupport the existing BJ code, plenty of notice
will be given (i.e. years as opposed to months).


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From markjschreiber at gmail.com  Mon Oct 20 00:13:01 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Mon, 20 Oct 2008 12:13:01 +0800
Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please!
In-Reply-To: <a0d826f40810191718x4f89c210l27f20f2532ed24d3@mail.gmail.com>
References: <a0d826f40810191718x4f89c210l27f20f2532ed24d3@mail.gmail.com>
Message-ID: <93b45ca50810192113g4ef0484cm2154f97c3c440f3f@mail.gmail.com>

Hi -

Just a comment ...

Does an alphabet need to be a Singleton in this new paradigm? If it
does then do you want to have an equals() method? Currently you could
have:

Alphabet a; Alphabet b;

a.equals(b) //true;
a == b //false

Unless there is a strong reason why Alphabet needs to be a Singleton I
don't think it should be (Singletons make life hard when transporting
between JVMs).  You can get a similar kind of behaviour with caching
where it doesn't hurt if there is more than one instance of an equal
alphabet but when they pass through the cache they can get cleaned up
(like the interning behaviour of Strings).

Put it this way. If I have two copies of the DNA alphabet will it
matter (other than a bit of memory waste)?
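
The caching idea can be sketched roughly as follows. This is only an
illustration under assumptions: ToyAlphabet and AlphabetCache are invented
names, not the classes committed to the biojava3 branch, and the alphabet is
reduced to a name plus a set of characters.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical value-style alphabet: equality is by content, not identity.
final class ToyAlphabet {
    private final String name;
    private final Set<Character> symbols = new TreeSet<Character>();

    ToyAlphabet(String name, String symbolChars) {
        this.name = name;
        for (char c : symbolChars.toCharArray()) {
            symbols.add(c);
        }
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ToyAlphabet)) {
            return false;
        }
        ToyAlphabet other = (ToyAlphabet) o;
        return name.equals(other.name) && symbols.equals(other.symbols);
    }

    @Override
    public int hashCode() {
        return 31 * name.hashCode() + symbols.hashCode();
    }
}

// Interning cache: equal alphabets collapse to one canonical instance,
// much like String.intern(), but extra copies elsewhere do no harm.
final class AlphabetCache {
    private static final Map<ToyAlphabet, ToyAlphabet> CACHE =
            new HashMap<ToyAlphabet, ToyAlphabet>();

    static synchronized ToyAlphabet intern(ToyAlphabet a) {
        ToyAlphabet existing = CACHE.get(a);
        if (existing == null) {
            CACHE.put(a, a);
            return a;
        }
        return existing;
    }
}
```

Two separately constructed DNA alphabets are equals() but not ==; after both
pass through AlphabetCache.intern they refer to the same instance. Code that
never goes through the cache still works, it just holds an extra copy.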

- Mark

On Mon, Oct 20, 2008 at 8:18 AM, Richard Holland
<holland at eaglegenomics.com> wrote:
> Hi all,
>
> I've just committed some new code to the biojava3 branch of the biojava-live
> subversion repository. It's the foundations of a brand new alphabet+symbol
> set of classes, and an example of how to use them to represent DNA. You'll
> notice that the new code is very lightweight and allows for a lot more
> flexibility than the old code - for instance, the concept of Alphabet has
> changed radically. It also makes much more extensive use of the Collections
> API.
>
> I haven't got any test cases or usage examples yet but give me a shout if
> you don't understand the code and I'll explain how it works. (Hint:
> SymbolFormat is there to convert Strings into SymbolList objects, and vice
> versa).
>
> So, now we want some volunteers! We're starting from scratch here so there's
> a lot of work to do. The whole of BioJava needs 'translating' into BJ3,
> whether it be copy-and-paste existing classes and modify them to suit the
> new style, or write completely new ones to provide equivalent functionality.
>
>
> I'll post an example of how to do file parsing soon, probably starting with
> FASTA. In the meantime, a good place to start would be for people to design
> object models to represent their favourite data types (e.g. Genbank, or
> microarray data). Utility classes to manipulate those objects would be great
> too.
>
> The object models need to be normalised as much as possible - e.g. if your
> data has a lot of comments, and the order of those comments is important,
> then give your object model a collection of comment objects. The object
> model for each data type should be completely independent and use basic data
> types wherever possible (e.g. store sequences as strings, don't attempt to
> parse them into anything fancy like SymbolLists). The closer the object
> model is to the original data format, the better. There's going to be clever
> tricks when it comes to converting data between different object models
> (e.g. Genbank to INSDSeq), which I will explain later when I put the file
> parsing examples up.
>
> You'll notice how the biojava3 branch uses Maven instead of Ant. This is
> because we want to make it as modular as possible, so if you want to write
> microarray stuff, create a new microarray sub-project (as per the dna
> example that's already there). This way if someone only wants the microarray
> bit of BJ3, they only need install the appropriate JAR file and can ignore
> the rest. (The 'core' module is for stuff that is so generic it could be
> used anywhere, or is used in every single other module.)
>
> If coding isn't your cup of tea, then we would very much welcome testers
> (particularly those who enjoy writing test cases!), documenters
> (particularly code commenters), translators (for internationalisation of the
> code), and of course all those who wish to contribute ideas and suggestions
> no matter how off-the-wall they might be. In particular if you'd like to
> take charge of an area of the development process, e.g. Documentation Chief,
> or Protein Champion, then that would be much appreciated.
>
> I'm very much looking forward to working with everyone on this. Good luck,
> and happy coding!
>
> cheers,
> Richard
>
> PS. Please don't forget to attach the appropriate licence to your code. You
> can copy-and-paste it from the existing classes I just committed this
> evening.
>
> PPS. For those who are worried about backwards compatibility - this was
> discussed on the lists a while back and it was made clear that BJ3 is a
> clean break. However, the existing code will continue to be maintained and
> bugfixed for a couple of years so you don't have to upgrade if you don't
> want to - it just won't have any new features developed for it. This is
> largely because it'll probably take just that long to write all the new BJ3
> code. When we do decide to desupport the existing BJ code, plenty of notice
> will be given (i.e. years as opposed to months).
>
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>

From holland at eaglegenomics.com  Mon Oct 20 04:23:17 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 20 Oct 2008 09:23:17 +0100
Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please!
In-Reply-To: <93b45ca50810192113g4ef0484cm2154f97c3c440f3f@mail.gmail.com>
References: <a0d826f40810191718x4f89c210l27f20f2532ed24d3@mail.gmail.com>
	<93b45ca50810192113g4ef0484cm2154f97c3c440f3f@mail.gmail.com>
Message-ID: <a0d826f40810200123x3e3b4d79s71d4aaa89545f4b5@mail.gmail.com>

Good point, and the answer is no, it doesn't really matter! So I will remove
the singleton-ishness of Alphabet.


2008/10/20 Mark Schreiber <markjschreiber at gmail.com>

> Hi -
>
> Just a comment ...
>
> Does an alphabet need to be a Singleton in this new paradigm? If it
> does then do you want to have an equals() method? Currently you could
> have:
>
> Alphabet a; Alphabet b;
>
> a.equals(b) //true;
> a == b //false
>
> Unless there is a strong reason why Alphabet needs to be a Singleton I
> don't think it should be (Singletons make life hard when transporting
> between JVMs).  You can get a similar kind of behaviour with caching
> where it doesn't hurt if there is more than one instance of an equal
> alphabet but when they pass through the cache they can get cleaned up
> (like the interning behaviour of Strings).
>
> Put it this way. If I have two copies of the DNA alphabet will it
> matter (other than a bit of memory waste)?
>
> - Mark
>
> On Mon, Oct 20, 2008 at 8:18 AM, Richard Holland
> <holland at eaglegenomics.com> wrote:
> > Hi all,
> >
> > I've just committed some new code to the biojava3 branch of the
> biojava-live
> > subversion repository. It's the foundations of a brand new
> alphabet+symbol
> > set of classes, and an example of how to use them to represent DNA.
> You'll
> > notice that the new code is very lightweight and allows for a lot more
> > flexibility than the old code - for instance, the concept of Alphabet has
> > changed radically. It also makes much more extensive use of the
> Collections
> > API.
> >
> > I haven't got any test cases or usage examples yet but give me a shout if
> > you don't understand the code and I'll explain how it works. (Hint:
> > SymbolFormat is there to convert Strings into SymbolList objects, and
> vice
> > versa).
> >
> > So, now we want some volunteers! We're starting from scratch here so
> there's
> > a lot of work to do. The whole of BioJava needs 'translating' into BJ3,
> > whether it be copy-and-paste existing classes and modify them to suit the
> > new style, or write completely new ones to provide equivalent
> functionality.
> >
> >
> > I'll post an example of how to do file parsing soon, probably starting
> with
> > FASTA. In the meantime, a good place to start would be for people to
> design
> > object models to represent their favourite data types (e.g. Genbank, or
> > microarray data). Utility classes to manipulate those objects would be
> great
> > too.
> >
> > The object models need to be normalised as much as possible - e.g. if
> your
> > data has a lot of comments, and the order of those comments is important,
> > then give your object model a collection of comment objects. The object
> > model for each data type should be completely independent and use basic
> data
> > types wherever possible (e.g. store sequences as strings, don't attempt
> to
> > parse them into anything fancy like SymbolLists). The closer the object
> > model is to the original data format, the better. There's going to be
> clever
> > tricks when it comes to converting data between different object models
> > (e.g. Genbank to INSDSeq), which I will explain later when I put the file
> > parsing examples up.
> >
> > You'll notice how the biojava3 branch uses Maven instead of Ant. This is
> > because we want to make it as modular as possible, so if you want to
> write
> > microarray stuff, create a new microarray sub-project (as per the dna
> > example that's already there). This way if someone only wants the
> microarray
> > bit of BJ3, they only need install the appropriate JAR file and can
> ignore
> > the rest. (The 'core' module is for stuff that is so generic it could be
> > used anywhere, or is used in every single other module.)
> >
> > If coding isn't your cup of tea, then we would very much welcome testers
> > (particularly those who enjoy writing test cases!), documenters
> > (particularly code commenters), translators (for internationalisation of
> the
> > code), and of course all those who wish to contribute ideas and
> suggestions
> > no matter how off-the-wall they might be. In particular if you'd like to
> > take charge of an area of the development process, e.g. Documentation
> Chief,
> > or Protein Champion, then that would be much appreciated.
> >
> > I'm very much looking forward to working with everyone on this. Good
> luck,
> > and happy coding!
> >
> > cheers,
> > Richard
> >
> > PS. Please don't forget to attach the appropriate licence to your code.
> You
> > can copy-and-paste it from the existing classes I just committed this
> > evening.
> >
> > PPS. For those who are worried about backwards compatibility - this was
> > discussed on the lists a while back and it was made clear that BJ3 is a
> > clean break. However, the existing code will continue to be maintained
> and
> > bugfixed for a couple of years so you don't have to upgrade if you don't
> > want to - it just won't have any new features developed for it. This is
> > largely because it'll probably take just that long to write all the new
> BJ3
> > code. When we do decide to desupport the existing BJ code, plenty of
> notice
> > will be given (i.e. years as opposed to months).
> >
> >
> > --
> > Richard Holland, BSc MBCS
> > Finance Director, Eagle Genomics Ltd
> > M: +44 7500 438846 | E: holland at eaglegenomics.com
> > http://www.eaglegenomics.com/
> >
>


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From fbristow at gmail.com  Mon Oct 20 09:36:15 2008
From: fbristow at gmail.com (Franklin Bristow)
Date: Mon, 20 Oct 2008 08:36:15 -0500
Subject: [Biojava-dev] Writing Swissprot/Uniprot formatted files
In-Reply-To: <a0d826f40810171308s5b2788aah27a982cb6a3b45e@mail.gmail.com>
References: <50a7756d0810171158k51aa3ee4l5f7078321633ebc5@mail.gmail.com>
	<a0d826f40810171308s5b2788aah27a982cb6a3b45e@mail.gmail.com>
Message-ID: <50a7756d0810200636l4355f3cbj367b155e573e1612@mail.gmail.com>

Hi Richard,
I'm getting my records from an indexed flat file.  I indexed the file using
IndexTools.indexSwissprot().  I am then retrieving the records from the flat
file "database" using the SequenceDBLite interface which is being provided
to me using the Registry and SystemRegistry classes.  The following is a simple
example of what I am doing:

First I index the flat file:

> File[] files = new File[] { new File("/home/fbristow/db/uniprot_sprot.dat") };
> try {
>       IndexTools.indexSwissprot("uniprot_sprot",
>               new File("/home/fbristow/db/index/uniprot_sprot"), files);
> } catch (BioException bioE) {
>       bioE.printStackTrace();
> } catch (ParserException parseE) {
>       parseE.printStackTrace();
> } catch (IOException ioE) {
>       ioE.printStackTrace();
> }


Then I get a handle on that file by doing:

> Registry registry = SystemRegistry.instance();
> setSwissDatabase(registry.getDatabase("swissprot"));
>

And I have a file in /etc that tells the registry how to find the indexes
with the swissprot identifier as per
http://biojava.org/docs/api/org/biojava/directory/SystemRegistry.html

Ultimately, this gives me a class that implements the interface
SequenceDBLite, and when I query this interface for sequences it returns to
me Sequence objects.  I can't seem to see anything that would give me a
RichSequence, so I think that I'll continue to get them in this manner, but
I'll convert the Sequence objects into RichSequence objects myself.

Thanks for your attention!


On Fri, Oct 17, 2008 at 3:08 PM, Richard Holland
<holland at eaglegenomics.com> wrote:

> Hello.
>
> I'm not sure how you're getting your uniprot records out of your swissprot
> database, or what format your swissprot database is in? If it's BioSQL, then
> the way BioJava interacts with it has altered significantly with BioJavaX -
> previous versions basically stuffed everything in as comments, hence all the
> XX lines you got when writing it back out again. However if it's not BioSQL
> and you've written something custom of your own, then I couldn't really
> comment!
>
> BioJavaX will attempt to convert the old sequence objects into rich
> sequence objects, but there's not much in common between the way uniprot
> data is stored in the old object model and the new one. Therefore the enrich
> method can't do a very good job - especially for stuff which the original
> parser stored as comments instead of properly distributing it across the
> object model. Data which the original parser stored in this comment format
> will mostly get ignored by the conversion process, because the conversion
> process has no idea where the record came from and therefore what to do with
> the comments inside it.
>
> Your best bet is to read your data out of your database directly as rich
> sequence objects, or if not possible, then do the conversion manually.
>
> cheers,
> Richard
>
>
> 2008/10/17 Franklin Bristow <fbristow at gmail.com>
>
>> Hello everyone,
>> I've been doing some work with swissprot, and I've been needing to make
>> use
>> of the file reading and writing facilities in biojava.
>>
>> I was using biojava 1.5, but I've recently moved to using biojava-live so
>> that I can actually step through the code to see what's going on.
>>
>> I have successfully created an index of my swissprot database and I can
>> read
>> my sequences out of that indexed database.  All of the appropriate
>> information is loaded from the records in the file into the appropriate
>> objects.  I am quite happy with this.
>>
>> The problem that I am having has to do with writing swissprot records.
>>
>> When I started using biojava, the recommended way to do this was using
>> SeqIOTools:
>> SeqIOTools.writeSwissprot(byteStream, swissSequence);
>>
>> While this works (ie: no exceptions are thrown), the record that is
>> printed
>> to the byteStream looks pretty ugly (it's littered with XX lines) and is
>> not
>> valid as per the current swissprot file spec (
>> http://www.expasy.ch/sprot/userman.html).  While this record is invalid,
>> it
>> does contain all of the information that was originally in the swissprot
>> file.  I would include what I get as an output here, but it's irrelevant.
>>
>> SeqIOTools became deprecated in favour of this:
>> RichSequence.IOTools.writeUniProt(byteStream, swissSequence, null);
>>
>> Once again, while this works (and this time the record is valid), the
>> record
>> that is printed contains almost none of the original information that is
>> contained in the swissprot record.  This is the output that I get when I
>> call this method (the spacing may not look right because of fonts, but
>> that is not the problem):
>>
>> ID   Q4UVA7_null             STANDARD;         273 AA.
>> > AC   Q4UVA7;
>> > DT   null, integrated into UniProtKB/?.
>> > DT   null, sequence version 0.
>> > DT   null, entry version 0.
>> > DE   null.
>> > FT   any           1    273
>> > FT   any         153    160
>> > SQ   SEQUENCE   273 AA;  30853 MW;  604FB6C6437A9D90 CRC64;
>> >      MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE
>> >      RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF
>> >      ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT
>> >      EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF
>> >      QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF
>> > //
>> >
>>
>> But what I am expecting to see looks like this (again, the spacing is the
>> fault of the font, not the output):
>>
>> > ID   Y1953_XANC8             Reviewed;         273 AA.
>> > AC   Q4UVA7;
>> > DT   10-JAN-2006, integrated into UniProtKB/Swiss-Prot.
>> > DT   05-JUL-2005, sequence version 1.
>> > DT   06-FEB-2007, entry version 12.
>> > DE   UPF0085 protein XC_1953.
>> > GN   OrderedLocusNames=XC_1953;
>> > OS   Xanthomonas campestris pv. campestris (strain 8004).
>> > OC   Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales;
>> > OC   Xanthomonadaceae; Xanthomonas.
>> > OX   NCBI_TaxID=314565;
>> > RN   [1]
>> > RP   NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
>> > RX   PubMed=15899963; DOI=10.1101/gr.3378705;
>> > RA   Qian W., Jia Y., Ren S.-X., He Y.-Q., Feng J.-X., Lu L.-F., Sun Q.,
>> > RA   Ying G., Tang D.-J., Tang H., Wu W., Hao P., Wang L., Jiang B.-L.,
>> > RA   Zeng S., Gu W.-Y., Lu G., Rong L., Tian Y., Yao Z., Fu G., Chen B.,
>> > RA   Fang R., Qiang B., Chen Z., Zhao G.-P., Tang J.-L., He C.;
>> > RT   "Comparative and functional genomic analyses of the pathogenicity of
>> > RT   phytopathogen Xanthomonas campestris pv. campestris.";
>> > RL   Genome Res. 15:757-767(2005).
>> > CC   -!- SIMILARITY: Belongs to the UPF0085 family.
>> > CC   -----------------------------------------------------------------------
>> > CC   Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
>> > CC   Distributed under the Creative Commons Attribution-NoDerivs License
>> > CC   -----------------------------------------------------------------------
>> > DR   EMBL; CP000050; AAY49016.1; -; Genomic_DNA.
>> > DR   GenomeReviews; CP000050_GR; XC_1953.
>> > DR   KEGG; xcb:XC_1953; -.
>> > DR   GO; GO:0005524; F:ATP binding; IEA:HAMAP.
>> > DR   HAMAP; MF_01062; -; 1.
>> > DR   InterPro; IPR005177; DUF299.
>> > DR   Pfam; PF03618; DUF299; 1.
>> > KW   ATP-binding; Complete proteome; Nucleotide-binding.
>> > FT   CHAIN         1    273       UPF0085 protein XC_1953.
>> > FT                                /FTId=PRO_0000196744.
>> > FT   NP_BIND     153    160       ATP (Potential).
>> > SQ   SEQUENCE   273 AA;  30853 MW;  604FB6C6437A9D90 CRC64;
>> >      MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE
>> >      RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF
>> >      ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT
>> >      EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF
>> >      QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF
>> > //
>> >
>>
>> Needless to say, there is a considerable loss of information.
>>
>> At first I wasn't sure if this was a problem with parsing the database
>> that
>> I had, so I inspected the object that was retrieved from the database.  As
>> I
>> mentioned before, the parsing seems to be working fine.  I get a
>> SimpleSequence object that has all of the correct annotations and other
>> information loaded into it.
>>
>> I then continued to step through the writeUniProt method in
>> RichSequence.IOTools and found that this method first calls "enrich" on
>> SimpleSequence which turns it into a SimpleRichSequence.  There appears to
>> be some loss of information at this point, specifically in the feature set
>> where the 'key name' is lost -- it just becomes 'any'.
>>
>> It is when we get to the actual process of writing to the stream in
>> UniprotFormat.writeSequence that we have the problems.  All of the code
>> appears to be there for printing the information out that I'm expecting.
>>  I
>> think the problem is that in the process of "enrich"-ing the sequence, the
>> data is still stored in the object, but it is no longer where it is
>> expected
>> to be.  For example, when we get to writing the comments out:
>>        // comments - if any
>>        if (!rs.getComments().isEmpty()) {
>>
>> The List of comments IS empty, but there are comments in the
>> SimpleRichSequence; they are stored in the notes data member.
>>
>> So.  After this lengthy explanation of my problem, I am wondering if I am
>> merely not doing this correctly.  Is there a better way to pass my
>> information to the writeUniProt method -- should I be transforming my
>> SimpleSequence objects into a SimpleRichSequence manually?  Am I just
>> going
>> about this entirely the wrong way?
>>
>> If I am going about this correctly and the functionality to do this is
>> merely not there or hasn't been implemented correctly, I would be more
>> than
>> happy to help out...  I can supply patches, create bug reports, or
>> anything
>> else that is necessary.
>>
>> Any guidance in this matter would be greatly appreciated!
>>
>> --
>> Franklin
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
>
>
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>


-- 
Franklin

From holland at eaglegenomics.com  Mon Oct 20 09:51:36 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 20 Oct 2008 14:51:36 +0100
Subject: [Biojava-dev] BioJava3 contribution
In-Reply-To: <B6195C9D-2722-4B2A-A197-79C83ADF2B37@ibmc.u-strasbg.fr>
References: <B6195C9D-2722-4B2A-A197-79C83ADF2B37@ibmc.u-strasbg.fr>
Message-ID: <a0d826f40810200651v30b8161ka20de9ebba28f96b@mail.gmail.com>

Excellent! Thanks for your offer of help!

Yes, an advanced RNA module would be very helpful indeed. You should
probably call it 'rna'.

As long as everyone who intends to work on BJ3 declares their intentions
here, as you just have, then basically it's first come first served. I won't
be doing any official supervision other than keeping an eye on committed
code once in a while to make sure it all looks OK. So feel free to start
coding straight away!

All new modules should probably start by:

1. copying the existing dna module to something new, like 'rna' in this
case,
2. removing all the hidden .svn directories from the copy,
3. updating the pom.xml in the copy (do a search-and-replace on dna and change
to the new name, rna in this case), deleting the existing source packages in
src/main/java (org.biojava.dna) and creating suitable new ones
(org.biojava.rna in this case),
4. emptying out the target/ folder, then running svn add on the new module,
5. setting svn:ignore on the target/ directory in your new module,
6. including your new module in the list at the end of the pom.xml in the root
directory of the biojava3 branch.
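
Steps 1 to 3 lend themselves to a small script. What follows is only a rough
Java sketch and not part of the BioJava build: the class name ModuleSkeleton
and the method copyModule are made up for illustration, it assumes the dna
module sits in a local working copy, and the naive text replacement rewrites
every occurrence of the old name in pom.xml, so the result should be checked by
hand. Renaming the source packages, the svn steps and the root pom.xml edit are
still left to do manually.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

/** Hypothetical helper for illustration; not part of BioJava. */
final class ModuleSkeleton {

    /**
     * Copies the module directory 'from' to a new directory 'to', leaving out
     * .svn metadata and the top-level target/ build output, then replaces
     * oldName with newName throughout the copied pom.xml.
     */
    static void copyModule(Path from, Path to, String oldName, String newName)
            throws IOException {
        Files.walkFileTree(from, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                    throws IOException {
                String name = dir.getFileName().toString();
                boolean topLevelTarget =
                        name.equals("target") && from.equals(dir.getParent());
                if (name.equals(".svn") || topLevelTarget) {
                    return FileVisitResult.SKIP_SUBTREE; // do not copy this subtree
                }
                Files.createDirectories(to.resolve(from.relativize(dir)));
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.copy(file, to.resolve(from.relativize(file)));
                return FileVisitResult.CONTINUE;
            }
        });

        // Naive search-and-replace: every occurrence of oldName is rewritten.
        Path pom = to.resolve("pom.xml");
        if (Files.exists(pom)) {
            String text = new String(Files.readAllBytes(pom), StandardCharsets.UTF_8);
            Files.write(pom, text.replace(oldName, newName).getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

Run from the root of the branch checkout, a call such as
ModuleSkeleton.copyModule(Paths.get("dna"), Paths.get("rna"), "dna", "rna")
would produce the starting point for the new module. The target directory must
not already exist, since existing files are not overwritten.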

cheers,
Richard


2008/10/20 Fabrice Jossinet <f.jossinet at ibmc.u-strasbg.fr>

> Dear Richard,
>
> I'm answering your "official call" to offer my help with the
> development of the biojava3 code. With the modularity of Maven, I would also
> like to offer my help with the development of a module that will use
> the biojava3 code to manage more specialized RNA stuff (secondary and
> tertiary structures, base-pair classifications, modified nucleotides, RNA
> alignments, ...).
>
> What will be the next step for me? Will you make a selection?
>
> Best Regards
>
> Fabrice Jossinet
>
> --
> Dr. Fabrice Jossinet
> Laboratoire de Bioinformatique, modelisation et simulation des acides
> nucleiques
> Universite Louis Pasteur
> Institut de biologie moleculaire et cellulaire du CNRS
> UPR9002, Architecture et Reactivite de l'ARN
> 15 rue Rene Descartes
> F-67084 Strasbourg Cedex
> France
>
> Tel + 33 (0) 3 88 417053
> FAX + 33 (0) 3 88 60 22 18
>
> f.jossinet at ibmc.u-strasbg.fr
> fjossinet at gmail.com
> http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
> http://fjossinet.u-strasbg.fr/
>
>
>
>
>


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From holland at eaglegenomics.com  Mon Oct 20 10:17:34 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 20 Oct 2008 15:17:34 +0100
Subject: [Biojava-dev] Writing Swissprot/Uniprot formatted files
In-Reply-To: <50a7756d0810200636l4355f3cbj367b155e573e1612@mail.gmail.com>
References: <50a7756d0810171158k51aa3ee4l5f7078321633ebc5@mail.gmail.com>
	<a0d826f40810171308s5b2788aah27a982cb6a3b45e@mail.gmail.com>
	<50a7756d0810200636l4355f3cbj367b155e573e1612@mail.gmail.com>
Message-ID: <a0d826f40810200717l2d1a2373n756dbd1d083eaa3a@mail.gmail.com>

Wow, I didn't know anyone was actually using the registry thing. I certainly
never have! That's probably why it was left out of the whole update to
RichSequences. There will probably be equivalent functionality in BioJava3
at some point but I doubt anyone will backport the RichSequence updates to
the existing registry setup (unless there are any volunteers!).

Good luck with the conversion process.

cheers,
Richard

2008/10/20 Franklin Bristow <fbristow at gmail.com>

> Hi Richard,
> I'm getting my records from an indexed flat file.  I indexed the file using
> IndexTools.indexSwissprot().  I am then retrieving the records from the flat
> file "database" using the SequenceDBLite interface which is being provided
> to me using the Registry and SystemRegistry classes.  The following is a simple
> example of what I am doing:
>
> First I index the flat file:
>
>> File[] files = new File[] {
>>       new File("/home/fbristow/db/uniprot_sprot.dat") };
>> try {
>>       IndexTools.indexSwissprot("uniprot_sprot",
>>             new File("/home/fbristow/db/index/uniprot_sprot"), files);
>> } catch (BioException bioE) {
>>       bioE.printStackTrace();
>> } catch (ParserException parseE) {
>>       parseE.printStackTrace();
>> } catch (IOException ioE) {
>>       ioE.printStackTrace();
>> }
>
>
> Then I get a handle on that file by doing:
>
>> Registry registry = SystemRegistry.instance();
>> setSwissDatabase(registry.getDatabase("swissprot"));
>>
>
> And I have a file in /etc that tells the registry how to find the indexes
> with the swissprot identifier as per
> http://biojava.org/docs/api/org/biojava/directory/SystemRegistry.html
>
> Ultimately, this gives me a class that implements the interface
> SequenceDBLite, and when I query this interface for sequences it returns to
> me Sequence objects.  I can't seem to see anything that would give me a
> RichSequence, so I think that I'll continue to get them in this manner, but
> I'll convert the Sequence objects into RichSequence objects myself.
>
> Thanks for your attention!
>
>
> On Fri, Oct 17, 2008 at 3:08 PM, Richard Holland <
> holland at eaglegenomics.com> wrote:
>
>> Hello.
>>
>> I'm not sure how you're getting your uniprot records out of your swissprot
>> database, or what format your swissprot database is in? If it's BioSQL, then
>> the way BioJava interacts with it has altered significantly with BioJavaX -
>> previous versions basically stuffed everything in as comments, hence all the
>> XX lines you got when writing it back out again. However if it's not BioSQL
>> and you've written something custom of your own, then I couldn't really
>> comment!
>>
>> BioJavaX will attempt to convert the old sequence objects into rich
>> sequence objects, but there's not much in common between the way uniprot
>> data is stored in the old object model and the new one. Therefore the enrich
>> method can't do a very good job - especially for stuff which the original
>> parser stored as comments instead of properly distributing it across the
>> object model. Data which the original parser stored in this comment format
>> will mostly get ignored by the conversion process, because the conversion
>> process has no idea where the record came from and therefore what to do with
>> the comments inside it.
>>
>> Your best bet is to read your data out of your database directly as rich
>> sequence objects, or if not possible, then do the conversion manually.
>>
>> cheers,
>> Richard
>>
>>
>> 2008/10/17 Franklin Bristow <fbristow at gmail.com>
>>
>>> Hello everyone,
>>> I've been doing some work with swissprot, and I've been needing to make
>>> use
>>> of the file reading and writing facilities in biojava.
>>>
>>> I was using biojava 1.5, but I've recently moved to using biojava-live so
>>> that I can actually step through the code to see what's going on.
>>>
>>> I have successfully created an index of my swissprot database and I can
>>> read
>>> my sequences out of that indexed database.  All of the appropriate
>>> information is loaded from the records in the file into the appropriate
>>> objects.  I am quite happy with this.
>>>
>>> The problem that I am having has to do with writing swissprot records.
>>>
>>> When I started using biojava, the recommended way to do this was using
>>> SeqIOTools:
>>> SeqIOTools.writeSwissprot(byteStream, swissSequence);
>>>
>>> While this works (ie: no exceptions are thrown), the record that is
>>> printed
>>> to the byteStream looks pretty ugly (it's littered with XX lines) and is
>>> not
>>> valid as per the current swissprot file spec (
>>> http://www.expasy.ch/sprot/userman.html).  While this record is invalid,
>>> it
>>> does contain all of the information that was originally in the swissprot
>>> file.  I would include what I get as an output here, but it's irrelevant.
>>>
>>> SeqIOTools became deprecated in favour of this:
>>> RichSequence.IOTools.writeUniProt(byteStream, swissSequence, null);
>>>
>>> Once again, while this works (and this time the record is valid), the
>>> record
>>> that is printed contains almost none of the original information that is
>>> contained in the swissprot record.  This is the output that I get when I
>>> call this method (the spacing may not look right because of fonts, but
>>> that is not the problem):
>>>
>>> > ID   Q4UVA7_null             STANDARD;         273 AA.
>>> > AC   Q4UVA7;
>>> > DT   null, integrated into UniProtKB/?.
>>> > DT   null, sequence version 0.
>>> > DT   null, entry version 0.
>>> > DE   null.
>>> > FT   any           1    273
>>> > FT   any         153    160
>>> > SQ   SEQUENCE   273 AA;  30853 MW;  604FB6C6437A9D90 CRC64;
>>> >      MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE
>>> >      RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF
>>> >      ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT
>>> >      EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF
>>> >      QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF
>>> > //
>>> >
>>>
>>> But what I am expecting to see looks like this (again, the spacing is the
>>> fault of the font, not the output):
>>>
>>> > ID   Y1953_XANC8             Reviewed;         273 AA.
>>> > AC   Q4UVA7;
>>> > DT   10-JAN-2006, integrated into UniProtKB/Swiss-Prot.
>>> > DT   05-JUL-2005, sequence version 1.
>>> > DT   06-FEB-2007, entry version 12.
>>> > DE   UPF0085 protein XC_1953.
>>> > GN   OrderedLocusNames=XC_1953;
>>> > OS   Xanthomonas campestris pv. campestris (strain 8004).
>>> > OC   Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales;
>>> > OC   Xanthomonadaceae; Xanthomonas.
>>> > OX   NCBI_TaxID=314565;
>>> > RN   [1]
>>> > RP   NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
>>> > RX   PubMed=15899963; DOI=10.1101/gr.3378705;
>>> > RA   Qian W., Jia Y., Ren S.-X., He Y.-Q., Feng J.-X., Lu L.-F., Sun Q.,
>>> > RA   Ying G., Tang D.-J., Tang H., Wu W., Hao P., Wang L., Jiang B.-L.,
>>> > RA   Zeng S., Gu W.-Y., Lu G., Rong L., Tian Y., Yao Z., Fu G., Chen B.,
>>> > RA   Fang R., Qiang B., Chen Z., Zhao G.-P., Tang J.-L., He C.;
>>> > RT   "Comparative and functional genomic analyses of the pathogenicity of
>>> > RT   phytopathogen Xanthomonas campestris pv. campestris.";
>>> > RL   Genome Res. 15:757-767(2005).
>>> > CC   -!- SIMILARITY: Belongs to the UPF0085 family.
>>> > CC   -----------------------------------------------------------------------
>>> > CC   Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
>>> > CC   Distributed under the Creative Commons Attribution-NoDerivs License
>>> > CC   -----------------------------------------------------------------------
>>> > DR   EMBL; CP000050; AAY49016.1; -; Genomic_DNA.
>>> > DR   GenomeReviews; CP000050_GR; XC_1953.
>>> > DR   KEGG; xcb:XC_1953; -.
>>> > DR   GO; GO:0005524; F:ATP binding; IEA:HAMAP.
>>> > DR   HAMAP; MF_01062; -; 1.
>>> > DR   InterPro; IPR005177; DUF299.
>>> > DR   Pfam; PF03618; DUF299; 1.
>>> > KW   ATP-binding; Complete proteome; Nucleotide-binding.
>>> > FT   CHAIN         1    273       UPF0085 protein XC_1953.
>>> > FT                                /FTId=PRO_0000196744.
>>> > FT   NP_BIND     153    160       ATP (Potential).
>>> > SQ   SEQUENCE   273 AA;  30853 MW;  604FB6C6437A9D90 CRC64;
>>> >      MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE
>>> >      RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF
>>> >      ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT
>>> >      EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF
>>> >      QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF
>>> > //
>>> >
>>>
>>> Needless to say, there is a considerable loss of information.
>>>
>>> At first I wasn't sure if this was a problem with parsing the database
>>> that
>>> I had, so I inspected the object that was retrieved from the database.
>>>  As I
>>> mentioned before, the parsing seems to be working fine.  I get a
>>> SimpleSequence object that has all of the correct annotations and other
>>> information loaded into it.
>>>
>>> I then continued to step through the writeUniProt method in
>>> RichSequence.IOTools and found that this method first calls "enrich" on
>>> SimpleSequence which turns it into a SimpleRichSequence.  There appears
>>> to
>>> be some loss of information at this point, specifically in the feature
>>> set
>>> where the 'key name' is lost -- it just becomes 'any'.
>>>
>>> It is when we get to the actual process of writing to the stream in
>>> UniprotFormat.writeSequence that we have the problems.  All of the code
>>> appears to be there for printing the information out that I'm expecting.
>>>  I
>>> think the problem is that in the process of "enrich"-ing the sequence,
>>> the
>>> data is still stored in the object, but it is no longer where it is
>>> expected
>>> to be.  For example, when we get to writing the comments out:
>>>        // comments - if any
>>>        if (!rs.getComments().isEmpty()) {
>>>
>>> The List of comments IS empty, but there are comments in the
>>> SimpleRichSequence; they are stored in the notes data member.
>>>
>>> So.  After this lengthy explanation of my problem, I am wondering if I am
>>> merely not doing this correctly.  Is there a better way to pass my
>>> information to the writeUniProt method -- should I be transforming my
>>> SimpleSequence objects into a SimpleRichSequence manually?  Am I just
>>> going
>>> about this entirely the wrong way?
>>>
>>> If I am going about this correctly and the functionality to do this is
>>> merely not there or hasn't been implemented correctly, I would be more
>>> than
>>> happy to help out...  I can supply patches, create bug reports, or
>>> anything
>>> else that is necessary.
>>>
>>> Any guidance in this matter would be greatly appreciated!
>>>
>>> --
>>> Franklin
>>> _______________________________________________
>>> biojava-dev mailing list
>>> biojava-dev at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>
>>
>>
>>
>> --
>> Richard Holland, BSc MBCS
>> Finance Director, Eagle Genomics Ltd
>> M: +44 7500 438846 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>
>
>
>
> --
> Franklin
>


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From f.jossinet at ibmc.u-strasbg.fr  Mon Oct 20 09:04:29 2008
From: f.jossinet at ibmc.u-strasbg.fr (Fabrice Jossinet)
Date: Mon, 20 Oct 2008 15:04:29 +0200
Subject: [Biojava-dev] BioJava3 contribution
Message-ID: <B6195C9D-2722-4B2A-A197-79C83ADF2B37@ibmc.u-strasbg.fr>

Dear Richard,

I'm answering your "official call" to offer my help with the
development of the biojava3 code. With the modularity of Maven, I
would also like to offer my help with the development of a module
that will use the biojava3 code to manage more specialized RNA stuff
(secondary and tertiary structures, base-pair classifications,
modified nucleotides, RNA alignments, ...).

What will be the next step for me? Will you make a selection?

Best Regards

Fabrice Jossinet

--
Dr. Fabrice Jossinet
Laboratoire de Bioinformatique, modelisation et simulation des acides
nucleiques
Universite Louis Pasteur
Institut de biologie moleculaire et cellulaire du CNRS
UPR9002, Architecture et Reactivite de l'ARN
15 rue Rene Descartes
F-67084 Strasbourg Cedex
France

Tel + 33 (0) 3 88 417053
FAX + 33 (0) 3 88 60 22 18

f.jossinet at ibmc.u-strasbg.fr
fjossinet at gmail.com
http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
http://fjossinet.u-strasbg.fr/


From andreas at sdsc.edu  Mon Oct 20 15:18:48 2008
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 20 Oct 2008 12:18:48 -0700
Subject: [Biojava-dev] BioJava3 contribution
In-Reply-To: <a0d826f40810200651v30b8161ka20de9ebba28f96b@mail.gmail.com>
References: <B6195C9D-2722-4B2A-A197-79C83ADF2B37@ibmc.u-strasbg.fr>
	<a0d826f40810200651v30b8161ka20de9ebba28f96b@mail.gmail.com>
Message-ID: <59a41c430810201218n194660e2udb17be18f8029779@mail.gmail.com>

Hi Fabrice,

Regarding the tertiary structure representation we should work
together. There is a set of tools available already in the current
biojava 1.7 which I was intending to maintain and migrate to biojava
v3. Let me know if you have specific RNA related requests...

Andreas

On Mon, Oct 20, 2008 at 6:51 AM, Richard Holland
<holland at eaglegenomics.com> wrote:
> Excellent! Thanks for your offer of help!
>
> Yes, an advanced RNA module would be very helpful indeed. You should
> probably call it 'rna'.
>
> As long as everyone who intends to work on BJ3 declares their intentions
> here, as you just have, then basically it's first come first served. I won't
> be doing any official supervision other than keeping an eye on committed
> code once in a while to make sure it all looks OK. So feel free to start
> coding straight away!
>
> All new modules should probably start by:
>
> 1. copying the existing dna module to something new, like 'rna' in this
> case,
> 2. removing all the hidden .svn directories from the copy,
> 3. updating the pom.xml in the copy (do a search-and-replace on dna and change
> to the new name, rna in this case), deleting the existing source packages in
> src/main/java (org.biojava.dna) and creating suitable new ones
> (org.biojava.rna in this case),
> 4. emptying out the target/ folder, then running svn add on the new module,
> 5. setting svn:ignore on the target/ directory in your new module,
> 6. including your new module in the list at the end of the pom.xml in the root
> directory of the biojava3 branch.
>
> cheers,
> Richard
>
>
>
> 2008/10/20 Fabrice Jossinet <f.jossinet at ibmc.u-strasbg.fr>
>
>> Dear Richard,
>>
>> I'm answering your "official call" to offer my help with the
>> development of the biojava3 code. With the modularity of Maven, I would also
>> like to offer my help with the development of a module that will use
>> the biojava3 code to manage more specialized RNA stuff (secondary and
>> tertiary structures, base-pair classifications, modified nucleotides, RNA
>> alignments, ...).
>>
>> What will be the next step for me? Will you make a selection?
>>
>> Best Regards
>>
>> Fabrice Jossinet
>>
>> --
>> Dr. Fabrice Jossinet
>> Laboratoire de Bioinformatique, modelisation et simulation des acides
>> nucleiques
>> Universite Louis Pasteur
>> Institut de biologie moleculaire et cellulaire du CNRS
>> UPR9002, Architecture et Reactivite de l'ARN
>> 15 rue Rene Descartes
>> F-67084 Strasbourg Cedex
>> France
>>
>> Tel + 33 (0) 3 88 417053
>> FAX + 33 (0) 3 88 60 22 18
>>
>> f.jossinet at ibmc.u-strasbg.fr
>> fjossinet at gmail.com
>> http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
>> http://fjossinet.u-strasbg.fr/
>>
>>
>>
>>
>>
>
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

From fjossinet at orange.fr  Mon Oct 20 16:40:26 2008
From: fjossinet at orange.fr (Fabrice Jossinet)
Date: Mon, 20 Oct 2008 22:40:26 +0200
Subject: [Biojava-dev] BioJava3 contribution
In-Reply-To: <59a41c430810201218n194660e2udb17be18f8029779@mail.gmail.com>
References: <B6195C9D-2722-4B2A-A197-79C83ADF2B37@ibmc.u-strasbg.fr>
	<a0d826f40810200651v30b8161ka20de9ebba28f96b@mail.gmail.com>
	<59a41c430810201218n194660e2udb17be18f8029779@mail.gmail.com>
Message-ID: <086C2EC4-C9AD-4C00-B348-F7D781C0F3EC@orange.fr>

Hi Andreas,

Yes, of course, I would really like to work with you (I like your work
with SPICE). I wanted to contact you about this point before starting.
Concerning the tertiary structure representation, I need to annotate
an RNA tertiary structure with base-pair families (as described in
http://www.ncbi.nlm.nih.gov/pubmed/12177293 or in
http://prion.bchs.uh.edu/bp_type/ ) and structural motifs (like those
listed in the SCOR database,
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=308814 ).
The idea is to attach these features to a 3D structure in the same way
as features are attached to a sequence (1D).

What do you think?

Fabrice

On 20 Oct 2008, at 21:18, Andreas Prlic wrote:

> Hi Fabrice,
>
> Regarding the tertiary structure representation we should work
> together. There is a set of tools available already in the current
> biojava 1.7 which I was intending to maintain and migrate to biojava
> v3. Let me know if you have specific RNA related requests...
>
> Andreas
>
> On Mon, Oct 20, 2008 at 6:51 AM, Richard Holland
> <holland at eaglegenomics.com> wrote:
>> Excellent! Thanks for your offer of help!
>>
>> Yes, an advanced RNA module would be very helpful indeed. You should
>> probably call it 'rna'.
>>
>> As long as everyone who intends to work on BJ3 declares their intentions
>> here, as you just have, then basically it's first come first served. I won't
>> be doing any official supervision other than keeping an eye on committed
>> code once in a while to make sure it all looks OK. So feel free to start
>> coding straight away!
>>
>> All new modules should probably start by:
>>
>> 1. copying the existing dna module to something new, like 'rna' in this
>> case,
>> 2. removing all the hidden .svn directories from the copy,
>> 3. updating the pom.xml in the copy (do a search-and-replace on dna and change
>> to the new name, rna in this case), deleting the existing source packages in
>> src/main/java (org.biojava.dna) and creating suitable new ones
>> (org.biojava.rna in this case),
>> 4. emptying out the target/ folder, then running svn add on the new module,
>> 5. setting svn:ignore on the target/ directory in your new module,
>> 6. including your new module in the list at the end of the pom.xml in the root
>> directory of the biojava3 branch.
>>
>> cheers,
>> Richard
>>
>>
>>
>> 2008/10/20 Fabrice Jossinet <f.jossinet at ibmc.u-strasbg.fr>
>>
>>> Dear Richard,
>>>
>>> I'm answering your "official call" to offer my help with the
>>> development of the biojava3 code. With the modularity of Maven, I would also
>>> like to offer my help with the development of a module that will use
>>> the biojava3 code to manage more specialized RNA stuff (secondary and
>>> tertiary structures, base-pair classifications, modified nucleotides, RNA
>>> alignments, ...).
>>>
>>> What will be the next step for me? Will you make a selection?
>>>
>>> Best Regards
>>>
>>> Fabrice Jossinet
>>>
>>> --
>>> Dr. Fabrice Jossinet
>>> Laboratoire de Bioinformatique, modelisation et simulation des  
>>> acides
>>> nucleiques
>>> Universite Louis Pasteur
>>> Institut de biologie moleculaire et cellulaire du CNRS
>>> UPR9002, Architecture et Reactivite de l'ARN
>>> 15 rue Rene Descartes
>>> F-67084 Strasbourg Cedex
>>> France
>>>
>>> Tel + 33 (0) 3 88 417053
>>> FAX + 33 (0) 3 88 60 22 18
>>>
>>> f.jossinet at ibmc.u-strasbg.fr
>>> fjossinet at gmail.com
>>> http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
>>> http://fjossinet.u-strasbg.fr/
>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Richard Holland, BSc MBCS
>> Finance Director, Eagle Genomics Ltd
>> M: +44 7500 438846 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>


--
Dr. Fabrice Jossinet
Laboratoire de Bioinformatique, modelisation et simulation des acides
nucleiques
Universite Louis Pasteur
Institut de biologie moleculaire et cellulaire du CNRS
UPR9002, Architecture et Reactivite de l'ARN
15 rue Rene Descartes
F-67084 Strasbourg Cedex
France

Tel + 33 (0) 3 88 417053
FAX + 33 (0) 3 88 60 22 18

f.jossinet at ibmc.u-strasbg.fr
fjossinet at gmail.com
http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
http://fjossinet.u-strasbg.fr/


From markjschreiber at gmail.com  Mon Oct 20 22:54:27 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 21 Oct 2008 10:54:27 +0800
Subject: [Biojava-dev] Biojava / BioSQL entity beans
Message-ID: <93b45ca50810201954k44ab0f65xb94a0214d8eb4e13@mail.gmail.com>

Hi -

Richard has kindly uploaded some JPA entity beans that map to the
BioSQL database schema as a BioSQL module for BJ3.  These entity beans
were generated as part of the Tokyo webservices workshop.  As
entities they are useful as POJOs as well as for data transfer via JPA
and JAXB, and can be used in EJB containers or a plain old JVM.  They have
no biological smarts and the intention was/is that these will be
provided by wrapping them in Bio-aware (and more thread safe) wrappers
that implement interfaces from other BJ3 modules.  In essence it is a
persistence layer.
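
To make the split between entity and wrapper concrete, here is a rough sketch
in plain Java. It is not the code in the module: BiosequenceEntity is the name
used in the package documentation, but the seq property, the BiosequenceView
wrapper and its matches method are assumptions made up for this illustration,
and only the 'w' ambiguity symbol is handled.

```java
import java.io.Serializable;

/** Bean-style entity: no-arg constructor, get/set, no biological behaviour. */
class BiosequenceEntity implements Serializable {
    private static final long serialVersionUID = 1L;

    private String seq; // hypothetical property name

    public BiosequenceEntity() { }

    public String getSeq() { return seq; }

    public void setSeq(String seq) { this.seq = seq; }
}

/** Hypothetical wrapper: a biology-aware view onto the data held by the entity. */
final class BiosequenceView {
    private final BiosequenceEntity entity;

    BiosequenceView(BiosequenceEntity entity) {
        this.entity = entity;
    }

    /** True if the symbol at a zero-based position is compatible with the given base. */
    boolean matches(int position, char base) {
        char symbol = Character.toLowerCase(entity.getSeq().charAt(position));
        char b = Character.toLowerCase(base);
        if (symbol == 'w') {
            return b == 'a' || b == 't'; // to the wrapper, 'w' means a or t
        }
        return symbol == b;              // all other symbols compared literally here
    }

    /** Hands back the bare entity for persistence or transport. */
    BiosequenceEntity unwrap() {
        return entity;
    }
}
```

The entity would be what gets persisted or sent over the wire; the receiving
side wraps it again to get the biological behaviour back.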

The following is copied verbatim from the package-info.java and gives
you some idea of how I intend the package to be used (obviously some
of this is still to come).  There is also some discussion of some of
the gotchas that might trip you up when playing with object-relational
persistence.

BTW the naming convention is to call something FooEntity. Where BioSQL
requires a compound primary key this is implemented as an Embeddable
object called FooEntityPK which is the key for FooEntity.  The other
thing you may see is FooEntityUK which is the same concept but
represents some of the cases where BioSQL tables don't have a primary
key (even a compound one) but implicitly they do because all the
fields have the SQL unique restriction. In these cases JPA still
requires an Embeddable key to track updates. As far as Java is
concerned they are the same as a FooEntityPK but I used a different
name to make the distinction.
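
As a rough illustration of the naming convention, here is a plain-Java sketch
of a FooEntity with a compound FooEntityPK. It is not taken from the module:
the field names barId, rank and description are made up, and the JPA
annotations are only mentioned in comments so that the snippet compiles without
a persistence library. The equals method follows the key-only equality rule
described in the package documentation quoted below.

```java
import java.io.Serializable;
import java.util.Objects;

/** Compound key; in a JPA module this class would be marked @Embeddable. */
final class FooEntityPK implements Serializable {
    private static final long serialVersionUID = 1L;

    private Integer barId; // hypothetical key column
    private Integer rank;  // hypothetical key column

    public FooEntityPK() { }

    public FooEntityPK(Integer barId, Integer rank) {
        this.barId = barId;
        this.rank = rank;
    }

    public Integer getBarId() { return barId; }
    public void setBarId(Integer barId) { this.barId = barId; }
    public Integer getRank() { return rank; }
    public void setRank(Integer rank) { this.rank = rank; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FooEntityPK)) return false;
        FooEntityPK other = (FooEntityPK) o;
        return Objects.equals(barId, other.barId) && Objects.equals(rank, other.rank);
    }

    @Override
    public int hashCode() { return Objects.hash(barId, rank); }
}

/** Entity keyed by FooEntityPK; the id field would be marked @EmbeddedId. */
class FooEntity implements Serializable {
    private static final long serialVersionUID = 1L;

    private FooEntityPK id;
    private String description; // hypothetical non-key field

    public FooEntity() { }

    public FooEntityPK getId() { return id; }
    public void setId(FooEntityPK id) { this.id = id; }
    public String getDescription() { return description; }
    public void setDescription(String description) { this.description = description; }

    /** Key-only equality; an entity with a null key equals nothing but itself. */
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FooEntity)) return false; // instanceof so subclasses/proxies match
        FooEntity other = (FooEntity) o;
        return getId() != null && getId().equals(other.getId());
    }

    @Override
    public int hashCode() { return Objects.hashCode(getId()); }
}
```

A FooEntityUK would look the same as FooEntityPK as far as Java is concerned.
One caveat with this design is that changing the key of an entity that is
already stored in a HashSet or used as a HashMap key changes its hash code.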

The annotations provide mapping to tables from a Derby database. This
is the reference Java in-memory DB which can run from any JVM and is
also found in Glassfish. The mappings will likely also work with
MySQL. For Oracle (and possibly others) you would need to override the
@GeneratedValue strategy for generating primary keys. I believe this
can be done with external XML config files. You may also wish to
override the default eager loading and cascade annotations depending on
your JPA persistence method and preferences.

This has been lightly tested using Glassfish, Derby and TopLink
Essentials and is a work in progress but seems to work OK.

Best regards,

- Mark

/**
 * The package contains Entity representations of BioJava classes.
 * The purpose of these entities is to allow simple serialization of BioJava
 * data using binary serialization for protocols that require this (e.g. RPC
 * between Java application servers) as well as persistence mechanisms that
 * require bean-like objects, such as the Java Persistence API (JPA) or the
 * Java Architecture for XML Binding (JAXB). For this reason all objects in
 * this package should provide a parameterless public constructor and public
 * get/set methods for relevant fields.
 * <p>
 * Given the public nature of the constructors and the setters in these beans,
 * these classes are not intended for direct use in general programming when
 * using the BioJava v3 API. This is because it is possible to leave the bean in
 * an inconsistent state and they are <b>not thread safe</b> unless
 * synchronization is controlled externally (via synchronization blocks or via
 * an application container).
 * </p><p>
 * The Entities are intended to back other objects that a programmer will
 * interact with directly. For example <code>Foo.class</code> will be backed
 * by <code>FooEntity.class</code>. Generally interaction with
 * <code>Foo.class</code> is to be preferred and will often be more sensible,
 * as the entities typically provide no 'biological behaviour'. Relevant
 * behaviour should be provided by the wrapping class. It is best to think of
 * <code>Foo</code> as a view onto the data that is held in the
 * <code>FooEntity</code>.  A good example is the sophisticated Symbol
 * behaviour that can represent biological logic about IUPAC ambiguity symbols.
 * For example a 'w' in a Biosequence represents an ambiguity between 'a' and
 * 't', whereas a 'w' in BiosequenceEntity is simply a 'w' and nothing else.
 * </p><p>
 * The wrapper entity pattern is intended to allow for a lot of the advanced
 * behaviour in the original BioJava while also allowing use of modern
 * transport and persistence packages. This is achieved by persisting and
 * transporting the entity without the wrapper and re-wrapping it at the
 * other end.
 * </p><p>
 * Currently BioJava v3 uses annotated @Id fields to define
 * <code>equals(Object o)</code>. Consistent definition is critical to how
 * the object will behave when persisted to a database. In the case of:
 * <pre>
 * Foo f = ... initialize
 * Foo fo = ... initialize
 * boolean b = f.equals(fo);
 * </pre>
 * <code>b</code> would be true if both objects share the same value
 * (or embeddable object) in the field that represents the primary key in the
 * database <b>even</b> if all other fields differ. This is desirable because
 * two entities representing the same DB record may be retrieved from two
 * different sessions. Additionally these are the identity fields, so
 * logically they should map to the concept of identity. Finally, searching
 * a collection is made very simple without requiring an iterator:
 * <pre>
 * Integer id = //code to initialize
 * collection.contains(new Foo(id));
 * </pre>
 * By default BioJava v3 entities use <b>only</b> the primary key field for
 * equality. If either record has <code>null</code> as the primary key value
 * it is never equal to another. When implementing
 * <code>equals(Object o)</code> it is not advisable to perform the test
 * <code>this.getClass() == o.getClass()</code> because of the possibility of
 * proxy classes used in JPA. Using the PK for equality can, however, lead to
 * an issue with the <code>hashCode()</code> method. Consider the following
 * code:
 * <pre>
 * Foo foo = new Foo(); // no primary key
 * HashSet set = new HashSet();
 * set.add(foo);
 * // code here to persist foo and consequently generate its PK
 * boolean b = set.contains(foo);
 * </pre>
 * Because only the PK is used for equality, the PK is also used in the
 * hash code. This means that <code>b</code> is probably going to be false
 * because foo would have been stored in a hash bucket using the old hash
 * code, which will now be different, even though the set actually does
 * contain a pointer to foo. Although a potential deficiency, it is unlikely
 * to be a major problem for BioJava v3 developers because using entity
 * backed objects is preferred to direct interaction with entities. If you
 * need to use entities directly then use hashed collections with caution.
 *
 * <p>A wrapper class can either delegate its equals call to the underlying
 * entity or it can do something that is more biologically sensible
 * (as PK values are typically not exposed in the wrapper). It is probably
 * more sensible for a wrapper to define its own <code>equals</code> (and
 * <code>hashCode</code>) implementations due to the limitations of the
 * default @Id based system described above, especially the potential
 * hash code problems.
 * </p><p>
 * For example <code>FooSequence.class</code> might want to base
 * equality on the exact match of the DNA sequence it holds even though
 * <code>FooSequenceEntity.class</code> may only use the PK field. Whether or
 * not delegation is used, it should be clearly documented.
 * </p>
 * @author Mark Schreiber
 */
package org.biojava.biosql.entity;
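
As a rough illustration of the hash code caveat described in the package
comment, here is a minimal, self-contained sketch. FooEntity and
PkEqualityDemo are hypothetical names invented for this example (they are
not classes in the package), and the id is set by hand where a JPA
provider would normally generate it on persist.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical entity; the name and fields are illustrative only. */
class FooEntity {
    private Integer id; // primary key, null until the record is persisted

    /** Parameterless constructor, as required for bean-style entities. */
    public FooEntity() {}

    /** Convenience constructor for an entity with a known key. */
    public FooEntity(Integer id) { this.id = id; }

    public Integer getId() { return this.id; }

    public void setId(Integer id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        // instanceof rather than getClass(), so a proxy subclass still matches
        if (!(o instanceof FooEntity)) return false;
        FooEntity other = (FooEntity) o;
        // a null key is never equal to another record
        return this.id != null && this.id.equals(other.id);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(this.id); // 0 while the id is null
    }
}

class PkEqualityDemo {
    /** Returns whether a HashSet still finds the entity after its key is set. */
    static boolean foundAfterKeyAssigned() {
        Set<FooEntity> set = new HashSet<>();
        FooEntity foo = new FooEntity(); // no primary key yet
        set.add(foo);                    // stored under the hash of a null id
        foo.setId(42);                   // stands in for "persist and generate PK"
        return set.contains(foo);        // looked up under the new hash
    }

    public static void main(String[] args) {
        System.out.println(new FooEntity(1).equals(new FooEntity(1)));
        System.out.println(new FooEntity().equals(new FooEntity()));
        System.out.println(PkEqualityDemo.foundAfterKeyAssigned());
    }
}
```

The lookup fails because HashSet compares the stored hash with the current
one before it ever calls equals, which is why a wrapper with its own
equals and hashCode is the safer thing to put into hashed collections.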

From andreas at sdsc.edu  Mon Oct 20 23:17:28 2008
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 20 Oct 2008 20:17:28 -0700
Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please!
In-Reply-To: <a0d826f40810191718x4f89c210l27f20f2532ed24d3@mail.gmail.com>
References: <a0d826f40810191718x4f89c210l27f20f2532ed24d3@mail.gmail.com>
Message-ID: <59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com>

Hi,

Couple of thoughts regarding biojava v3:

License: Since it seems we will end up copying code from biojava 1.6
to biojava 3.0, we need to keep the license the same (LGPL 2.1). I.e.
people should still use the same biojava license headers when
committing new files and all code will be considered to be LGPL, if no
header is present. Do NOT commit code under other licenses.

Installation: We need some installation instructions on the wiki site,
e.g. how to get the maven setup running.  What are the code
conventions for the new version?

Blast: the Blast parsing modules are among the most frequently used
ones in biojava 1.6. To make people use biojava v3 it will be crucial
to have a port of them to the new version. Does anybody want to take
care of that?

Automated builds: is it interesting to have automated builds set up
for the new version at this stage, or should we wait until a more
mature stage? I could easily add another auto-build similar to the one
for biojava 1.6 at http://www.spice-3d.org/cruise/

Andreas

On Sun, Oct 19, 2008 at 5:18 PM, Richard Holland
<holland at eaglegenomics.com> wrote:
> Hi all,
>
> I've just committed some new code to the biojava3 branch of the biojava-live
> subversion repository. It's the foundations of a brand new alphabet+symbol
> set of classes, and an example of how to use them to represent DNA. You'll
> notice that the new code is very lightweight and allows for a lot more
> flexibility than the old code - for instance, the concept of Alphabet has
> changed radically. It also makes much more extensive use of the Collections
> API.
>
> I haven't got any test cases or usage examples yet but give me a shout if
> you don't understand the code and I'll explain how it works. (Hint:
> SymbolFormat is there to convert Strings into SymbolList objects, and vice
> versa).
>
> So, now we want some volunteers! We're starting from scratch here so there's
> a lot of work to do. The whole of BioJava needs 'translating' into BJ3,
> whether it be copy-and-paste existing classes and modify them to suit the
> new style, or write completely new ones to provide equivalent functionality.
>
>
> I'll post an example of how to do file parsing soon, probably starting with
> FASTA. In the meantime, a good place to start would be for people to design
> object models to represent their favourite data types (e.g. Genbank, or
> microarray data). Utility classes to manipulate those objects would be great
> too.
>
> The object models need to be normalised as much as possible - e.g. if your
> data has a lot of comments, and the order of those comments is important,
> then give your object model a collection of comment objects. The object
> model for each data type should be completely independent and use basic data
> types wherever possible (e.g. store sequences as strings, don't attempt to
> parse them into anything fancy like SymbolLists). The closer the object
> model is to the original data format, the better. There's going to be clever
> tricks when it comes to converting data between different object models
> (e.g. Genbank to INSDSeq), which I will explain later when I put the file
> parsing examples up.
>
> You'll notice how the biojava3 branch uses Maven instead of Ant. This is
> because we want to make it as modular as possible, so if you want to write
> microarray stuff, create a new microarray sub-project (as per the dna
> example that's already there). This way if someone only wants the microarray
> bit of BJ3, they only need install the appropriate JAR file and can ignore
> the rest. (The 'core' module is for stuff that is so generic it could be
> used anywhere, or is used in every single other module.)
>
> If coding isn't your cup of tea, then we would very much welcome testers
> (particularly those who enjoy writing test cases!), documenters
> (particularly code commenters), translators (for internationalisation of the
> code), and of course all those who wish to contribute ideas and suggestions
> no matter how off-the-wall they might be. In particular if you'd like to
> take charge of an area of the development process, e.g. Documentation Chief,
> or Protein Champion, then that would be much appreciated.
>
> I'm very much looking forward to working with everyone on this. Good luck,
> and happy coding!
>
> cheers,
> Richard
>
> PS. Please don't forget to attach the appropriate licence to your code. You
> can copy-and-paste it from the existing classes I just committed this
> evening.
>
> PPS. For those who are worried about backwards compatibility - this was
> discussed on the lists a while back and it was made clear that BJ3 is a
> clean break. However, the existing code will continue to be maintained and
> bugfixed for a couple of years so you don't have to upgrade if you don't
> want to - it just won't have any new features developed for it. This is
> largely because it'll probably take just that long to write all the new BJ3
> code. When we do decide to desupport the existing BJ code, plenty of notice
> will be given (i.e. years as opposed to months).
>
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/

From fjossinet at orange.fr  Tue Oct 21 03:09:46 2008
From: fjossinet at orange.fr (Fabrice Jossinet)
Date: Tue, 21 Oct 2008 09:09:46 +0200
Subject: [Biojava-dev] BioJava3 contribution
In-Reply-To: <a0d826f40810200651v30b8161ka20de9ebba28f96b@mail.gmail.com>
References: <B6195C9D-2722-4B2A-A197-79C83ADF2B37@ibmc.u-strasbg.fr>
	<a0d826f40810200651v30b8161ka20de9ebba28f96b@mail.gmail.com>
Message-ID: <CC8BE015-CF9C-4FA1-AF96-3626EDD83360@orange.fr>

Hi Richard,

I did everything but, with my IntelliJ IDE, I cannot commit the new
rna module because authentication fails. Do I have to register
somewhere to get an account? (Perhaps it is a misconfiguration
on my side.)

Fabrice

On 20 Oct 2008, at 15:51, Richard Holland wrote:

> Excellent! Thanks for your offer of help!
>
> Yes, an advanced RNA module would be very helpful indeed. You should  
> probably call it 'rna'.
>
> As long as everyone who intends to work on BJ3 declares their  
> intentions here, as you just have, then basically it's first come  
> first served. I won't be doing any official supervision other than  
> keeping an eye on committed code once in a while to make sure it all  
> looks OK. So feel free to start coding straight away!
>
> All new modules should probably start by:
>
> 1. copying the existing dna module to something new, like 'rna' in  
> this case.
> 2. remove all the hidden .svn directories from the copy,
> 3. update the pom.xml in the copy (do a search-and-replace on dna  
> and change to the new name, rna in this case), delete the existing  
> source packages in src/main/java (org.biojava.dna) and create  
> suitable new ones (org.biojava.rna in this case).
> 4. empty out the target/ folder then svn add the new module
> 5. svn:ignore the target/ directory in your new module,
> 6. include your new module in the list at the end of the pom.xml in  
> the root directory of the biojava3 branch.
>
> cheers,
> Richard
>
>
>
> 2008/10/20 Fabrice Jossinet <f.jossinet at ibmc.u-strasbg.fr>
> Dear Richard,
>
> I'm answering your "official call" to offer my help with the
> development of the biojava3 code. Given the modularity of Maven,
> I would also like to help develop a module that uses the biojava3
> code to manage more specialized RNA data (secondary and tertiary
> structures, base-pair classifications, modified nucleotides, RNA
> alignments, ...).
>
> What will be the next step for me? Will you make a selection?
>
> Best Regards
>
> Fabrice Jossinet
>


--
Dr. Fabrice Jossinet
Laboratoire de Bioinformatique, modelisation et simulation des acides
nucleiques
Universite Louis Pasteur
Institut de biologie moleculaire et cellulaire du CNRS
UPR9002, Architecture et Reactivite de l'ARN
15 rue Rene Descartes
F-67084 Strasbourg Cedex
France

Tel + 33 (0) 3 88 417053
FAX + 33 (0) 3 88 60 22 18

f.jossinet at ibmc.u-strasbg.fr
fjossinet at gmail.com
http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
http://fjossinet.u-strasbg.fr/


From holland at eaglegenomics.com  Tue Oct 21 05:06:41 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 21 Oct 2008 10:06:41 +0100
Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please!
In-Reply-To: <59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com>
References: <a0d826f40810191718x4f89c210l27f20f2532ed24d3@mail.gmail.com>
	<59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com>
Message-ID: <a0d826f40810210206m590f90e0t3669254273b56ef@mail.gmail.com>

>
>
> License: Since it seems we will end up copying code from biojava 1.6
> to biojava 3.0, we need to keep the license the same (LGPL 2.1). I.e.
> people should still use the same biojava license headers when
> committing new files and all code will be considered to be LGPL, if no
> header is present. Do NOT commit code under other licenses.
>
> Installation: We need some installation instructions on the wiki site,
> e.g. how to get the maven setup running.  What are the code
> conventions for the new version?


Not sure where best to put it in the Wiki, but I agree it needs to go there
somewhere.

Installation is a one-liner from within the top level of the project:

   mvn install

This compiles and installs the JARs into your local Maven repository, and
also downloads and installs any external dependencies. Then you can add the
installed modules as dependencies in your own Maven projects.

If you need to write a launcher script for your project, or you want to use
the JAR files outside Maven, you can use this command to generate the
CLASSPATH for use outside Maven. This only includes external dependencies -
you'll also need to add to it the individual JAR files from inside the
various target/ folders that Maven built for you:

  mvn dependency:build-classpath

Code conventions are simple:

1. I'm not fussed about the specific formatter people use in each module, as
long as the code is all formatted using some kind of consistent method. I
personally just use the default settings from Format code in NetBeans.

2. Use 'this' wherever possible, and for static references, use the
classname prefix (e.g. MyClass.staticField). I hate having to try and work
out in my head which references are going where, and which are static and
which are not!

3. Comment every single method, even if it's private. This helps readers
understand the flow of your code. Also comment liberally inside methods if
they are longer than just a few lines (i.e. if you can't fit the entire
method within the code panel in NetBeans, it's going to need internal
comments).

4. When writing getters/setters, follow the Java beans conventions so that
automated frameworks like Spring can easily pick it up and work with it.

5. Please write tests for your code using JUnit conventions, inside the
test/ folder of each module. I know I haven't done this myself yet, but I'm
going to!
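
To make conventions 2 to 4 concrete, here is a small sketch of a bean
written in that style. SequenceRecord and its fields are hypothetical
names made up for this illustration; nothing here is taken from the BJ3
source.

```java
/**
 * Hypothetical bean illustrating the conventions; not part of BioJava.
 */
class SequenceRecord {

    /** Default name used when none has been set. */
    static final String DEFAULT_NAME = "unnamed";

    /** The name of this record. */
    private String name;

    /**
     * Parameterless constructor, as bean frameworks expect.
     * The static field is referenced via the class name (convention 2).
     */
    public SequenceRecord() {
        this.name = SequenceRecord.DEFAULT_NAME;
    }

    /**
     * Returns the record name (Java beans getter, convention 4).
     *
     * @return the current name, never null unless set to null explicitly
     */
    public String getName() {
        return this.name;
    }

    /**
     * Sets the record name (Java beans setter, convention 4).
     *
     * @param name the new name
     */
    public void setName(String name) {
        this.name = name;
    }
}
```

A matching JUnit test (convention 5) would live under the module's test/
folder and exercise the getter and setter in the same way.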


>
>
> Blast: the Blast parsing modules are among the most frequently used
> ones in biojava 1.6. To make people use biojava v3 it will be crucial
> to have a port of them to the new version. Does anybody want to take
> care of that?


I'll second that. Blast is vital. We'd really appreciate a volunteer,
please!


>
> Automated builds: is it interesting to have automated builds set up
> for the new version at this stage, or should we wait until a more
> mature stage? I could easily add another auto-build similar to the one
> for biojava 1.6 at http://www.spice-3d.org/cruise/


You could do, although I don't think they'd be much use yet. But why not
start early, so that we don't forget to do it later?


Richard




-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From holland at eaglegenomics.com  Tue Oct 21 05:09:26 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 21 Oct 2008 10:09:26 +0100
Subject: [Biojava-dev] BioJava3 contribution
In-Reply-To: <CC8BE015-CF9C-4FA1-AF96-3626EDD83360@orange.fr>
References: <B6195C9D-2722-4B2A-A197-79C83ADF2B37@ibmc.u-strasbg.fr>
	<a0d826f40810200651v30b8161ka20de9ebba28f96b@mail.gmail.com>
	<CC8BE015-CF9C-4FA1-AF96-3626EDD83360@orange.fr>
Message-ID: <a0d826f40810210209x698b786ag86414f58e97ef45d@mail.gmail.com>

Ah, yes. The person to talk to is Andreas. He has control over the SVN
repository.




-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From markjschreiber at gmail.com  Tue Oct 21 05:26:41 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 21 Oct 2008 17:26:41 +0800
Subject: [Biojava-dev] [Biojava-l] BioJava 3 Begins - Volunteers please!
In-Reply-To: <a0d826f40810210206m590f90e0t3669254273b56ef@mail.gmail.com>
References: <a0d826f40810191718x4f89c210l27f20f2532ed24d3@mail.gmail.com>
	<59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com>
	<a0d826f40810210206m590f90e0t3669254273b56ef@mail.gmail.com>
Message-ID: <93b45ca50810210226t79cfbcbfhcadaedcfe8735676@mail.gmail.com>

>> Blast: the Blast parsing modules are among the most frequently used
>> ones in biojava 1.6. To make people use biojava v3 it will be crucial
>> to have a port of them to the new version. Does anybody want to take
>> care of that?
>
>
> I'll second that. Blast is vital. We'd really appreciate a volunteer,
> please!
>

BlastXML output would certainly be the easiest place to start. I also
think with the new Thing/ ThingBuilder framework it will be possible
to develop all manner of parsers for the vagaries of Blast text output
that come with each new release of Blast. Possible but maybe not a
good idea. I don't think that output was ever supposed to be machine
readable.  The table formatted output (-m8 I think) would be a better
option.
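
To give an idea of how little code the tabular route needs, here is a
minimal sketch of a parser for one line of that output. It assumes the
usual 12 tab-separated columns (query id, subject id, % identity,
alignment length, mismatches, gap openings, query start/end, subject
start/end, e-value, bit score). BlastTabHit is a hypothetical class name
and is not part of BioJava; any input line used with it here is made up.

```java
/** Hypothetical holder for one row of tabular BLAST output; not BioJava API. */
class BlastTabHit {
    final String queryId;
    final String subjectId;
    final double percentIdentity;
    final int alignmentLength;
    final double eValue;
    final double bitScore;

    /** Builds a hit from the already-split columns. */
    private BlastTabHit(String[] f) {
        this.queryId = f[0];
        this.subjectId = f[1];
        this.percentIdentity = Double.parseDouble(f[2]);
        this.alignmentLength = Integer.parseInt(f[3]);
        this.eValue = Double.parseDouble(f[10]);
        this.bitScore = Double.parseDouble(f[11]);
    }

    /**
     * Parses one tab-separated line.
     *
     * @throws IllegalArgumentException if fewer than 12 columns are present
     */
    static BlastTabHit parse(String line) {
        String[] fields = line.trim().split("\t");
        if (fields.length < 12) {
            throw new IllegalArgumentException(
                    "expected 12 columns, got " + fields.length);
        }
        return new BlastTabHit(fields);
    }
}
```

Only six of the columns are kept here to keep the sketch short; a real
object model would presumably store all of them as basic types.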

Given the DTD it should be possible to do a quick JAXB binding. How
would that work in the Thing/ ThingBuilder paradigm?

- Mark

From holland at eaglegenomics.com  Tue Oct 21 06:18:40 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 21 Oct 2008 11:18:40 +0100
Subject: [Biojava-dev] [Biojava-l] BioJava 3 Begins - Volunteers please!
In-Reply-To: <93b45ca50810210226t79cfbcbfhcadaedcfe8735676@mail.gmail.com>
References: <a0d826f40810191718x4f89c210l27f20f2532ed24d3@mail.gmail.com>
	<59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com>
	<a0d826f40810210206m590f90e0t3669254273b56ef@mail.gmail.com>
	<93b45ca50810210226t79cfbcbfhcadaedcfe8735676@mail.gmail.com>
Message-ID: <a0d826f40810210318gbb8b352jd8468395a1926c48@mail.gmail.com>

JAXB would follow the exact same Thing/ThingBuilder pattern, but with the
following subtle differences...

0. Your root data model object as generated by JAXB should be modified to
implement Thing, making it a JAXBThing.
1. JAXBReader (extends ThingReader) would open and read the file using JAXB
and directly construct JAXBThings.
2. JAXBReceiver (extends ThingReceiver) would be a pass-through interface
with just one method, something like setJAXBThing(), to pass in the
already-parsed JAXBThing directly.
3. Any converters would expand/deflate data from other formats to/from the
JAXBThing object directly.
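
Steps 0 to 2 above might look roughly like the sketch below. Every type
in it is a hypothetical stand-in written for this illustration (the real
Thing, ThingReader and ThingReceiver in the biojava3 branch may look
different), and the JAXB unmarshalling itself is left out so that the
example has no external dependencies.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Every type here is a hypothetical stand-in, not the real BJ3 source.

/** Empty marker interface. */
interface Thing extends Serializable {}

/** Base receiver type; left empty in this sketch. */
interface ThingReceiver {}

/** Step 0: stands in for a JAXB-generated root object made to implement Thing. */
class BlastOutputThing implements Thing {
    private static final long serialVersionUID = 1L;
    private final List<String> hitIds = new ArrayList<String>();

    /** Returns the mutable list of hit identifiers. */
    public List<String> getHitIds() { return this.hitIds; }
}

/** Step 2: pass-through receiver with a single method. */
interface BlastOutputReceiver extends ThingReceiver {
    void setBlastOutputThing(BlastOutputThing thing);
}

/** A receiver that simply keeps the object it is given. */
class BlastOutputBuilder implements BlastOutputReceiver {
    private BlastOutputThing thing;

    public void setBlastOutputThing(BlastOutputThing thing) { this.thing = thing; }

    public BlastOutputThing getThing() { return this.thing; }
}

/** Step 1: the reader hands over an object that is already fully built. */
class BlastOutputReader {
    /** In real code, 'unmarshalled' would come from a JAXB Unmarshaller. */
    public void read(BlastOutputThing unmarshalled, BlastOutputReceiver receiver) {
        receiver.setBlastOutputThing(unmarshalled);
    }
}
```

The point of the sketch is that the receiver does no event-by-event
building: the whole parsed object goes through in a single call.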


Richard.



-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From dicknetherlands at gmail.com  Tue Oct 21 07:14:29 2008
From: dicknetherlands at gmail.com (Richard Holland)
Date: Tue, 21 Oct 2008 12:14:29 +0100
Subject: [Biojava-dev] [Biojava-l] File parsing in BJ3
In-Reply-To: <93b45ca50810210335j5ef4a206y545e5a1869cedc03@mail.gmail.com>
References: <a0d826f40810201052o341f3c4cj10dd2765167ce19f@mail.gmail.com>
	<93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com>
	<a0d826f40810210134h56f9e0dv5abd988ecdd7b7b5@mail.gmail.com>
	<48FD97AB.70503@ebi.ac.uk>
	<93b45ca50810210335j5ef4a206y545e5a1869cedc03@mail.gmail.com>
Message-ID: <a0d826f40810210414u33ea8048p37ebe9d8fa357ea4@mail.gmail.com>

For now, yes it's empty. But I can envisage situations where it might be
nice to have Thing implement some common methods (e.g. isMachineGenerated(),
isManuallyCurated(), etc.). I'd rather have it there now to be a placeholder
for future expansion, than have to re-engineer everything should we identify
a need for common functions in future.

You'll see that Thing already extends Serializable, implying that all Things
must be able to persist to an object backing store. Serializable itself is
also an empty interface!

Also I like the idea of having Thing, not Object, as a kind of marker of
intention. To me it makes it clearer when reading code to avoid Object
wherever possible. Thing may not be any more clever than Object, but it
immediately declares an intention when reading code as to what kind of
Object should be expected.
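
As a small sketch of the "marker of intention" idea: an empty interface
can still be used as a generic bound, so the compiler rejects anything
that is not a Thing. The names below (ThingBuilder, Fasta, ThingDemo) are
hypothetical and chosen only for this example; they are not meant to
mirror the actual BJ3 classes.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; a sketch of the idea, not the BJ3 source.

/** Marker interface: no methods, but usable as a generic bound. */
interface Thing extends Serializable {}

/** A builder typed so that it can only produce Things. */
interface ThingBuilder<T extends Thing> {
    T build();
}

/** Example data object that declares itself to be a Thing. */
class Fasta implements Thing {
    private static final long serialVersionUID = 1L;
    private final String sequence;

    Fasta(String sequence) { this.sequence = sequence; }

    String getSequence() { return this.sequence; }
}

class ThingDemo {
    /**
     * Runs every builder and collects the results. The bound on T means a
     * builder of plain Objects or Strings would not compile here.
     */
    static <T extends Thing> List<T> buildAll(List<ThingBuilder<T>> builders) {
        List<T> out = new ArrayList<T>();
        for (ThingBuilder<T> builder : builders) {
            out.add(builder.build());
        }
        return out;
    }
}
```

This is the kind of compile-time check that an annotation alone would not
give, which is one argument for keeping Thing as an interface.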


2008/10/21 Mark Schreiber <markjschreiber at gmail.com>

> Is there any need for Thing at all? Can't a builder be typed to produce
> something that extends Object?
>
> If Thing provides no behaviour contract or meta-information then why
> does it exist?
>
> - Mark
>
> On Tue, Oct 21, 2008 at 4:49 PM, Andy Yates <ayates at ebi.ac.uk> wrote:
> > Depends on what you want to program. If you want to have a collection of
> > objects which are Things & perform a common action on them then
> > annotations are not the way forward.
> >
> > If you want to have some kind of meta-programming occurring & need a
> > class to be multiple things then annotations are right. There is
> > currently no way to enforce compile time dependencies on annotations &
> > my thinking is that this is right. Annotations should be meta data or
> > provide a way to alter a class in a non-invasive way (think Web Service
> > annotations creating WS Servers & Clients without any alteration of the
> > class).
> >
> > Andy
> >
> > Richard Holland wrote:
> >> Spot on.
> >>
> >> Annotation/interface.... I think Annotation is probably better as you
> >> suggest, but I'd have to look into that. Not sure how it works with
> >> collections and generics. If it does turn out to be a better bet, I'll
> >> change it over.
> >>
> >> With the BioSQL dependencies, take a look at the pom.xml file inside the
> >> biojava-dna module. It declares a dependency on biojava-core. If you
> >> want to add dependencies to external JARs, take a look at
> >> biojava-biosql's pom.xml to see how it depends on javax.persistence.
> >> (The easiest way to add these is via an IDE such as NetBeans, which is
> >> what I'm using at the moment).
> >>
> >> cheers,
> >> Richard
> >>
> >> 2008/10/21 Mark Schreiber <markjschreiber at gmail.com>
> >>
> >>> So if I want to build a BioSQL loader from Genbank then would the
> >>> classes (or their wrappers) in the BioSQL Entity package need to
> >>> implement Thing?  Would maven have an issue with that or would it just
> >>> create a dependency on core? (You can tell I've never used Maven,
> >>> right?)
> >>>
> >>> From a design point of view should Thing be an interface or an
> >>> Annotation? The reason I ask is that it doesn't define any methods so
> >>> it is more of a tag than an interface.
> >>>
> >>> Anyway, my understanding is that I would use a Genbank parser (or
> >>> write one), write an EntityReceiver interface (probably more than one
> >>> given the number of entities in BioSQL), and implement an EntityBuilder
> >>> (again possibly more than one) that implements EntityReceiver and
> >>> builds Entity beans from messages it receives. In this case I probably
> >>> wouldn't provide a writer as JPA would be writing the beans to the
> >>> database.  Would this be how you imagine it?
> >>>
> >>> - Mark
> >>>
> >>>
> >>> On Tue, Oct 21, 2008 at 1:52 AM, Richard Holland
> >>> <holland at eaglegenomics.com> wrote:
> >>>> (From now on I will only be posting these development messages to
> >>>> biojava-dev, which is the intended purpose of that list. Those of you
> >>>> who wish to keep track of things but are currently only subscribed to
> >>>> biojava-l should also subscribe to biojava-dev in order to keep up to
> >>>> date.)
> >>>>
> >>>> As promised, I've committed a new package in the biojava-core module
> >>>> that should help understand how to do file parsing and conversion and
> >>>> writing in the new BJ3 modules. Here's an example of how to use it to
> >>>> write a Genbank parser (note no parsers actually exist yet!):
> >>>>
> >>>> 1. Design yourself a Genbank class which implements the interface
> >>>> Thing and can fully represent all the data that might possibly occur
> >>>> inside a Genbank file.
> >>>>
> >>>> 2. Write an interface called GenbankReceiver, which extends
> >>>> ThingReceiver and defines all the methods you might need in order to
> >>>> construct a Genbank object in an asynchronous fashion.
> >>>>
> >>>> 3. Write a GenbankBuilder class which implements GenbankReceiver and
> >>>> ThingBuilder. Its job is to receive data via method calls, use that
> >>>> data to construct a Genbank object, then provide that object on
> >>>> demand.
> >>>>
> >>>> 4. Write a GenbankWriter class which implements GenbankReceiver and
> >>>> ThingWriter. Its job is similar to GenbankBuilder, but instead of
> >>>> constructing new Genbank objects, it writes Genbank records to file
> >>>> that reflect the data it receives.
> >>>>
> >>>> 5. Write a GenbankReader class which implements ThingReader. It can
> >>>> read Genbank files and output the data to the methods of the
> >>>> ThingReceiver provided to it, which in this case could be anything
> >>>> which implements the interface GenbankReceiver.
> >>>>
> >>>> 6. Write a GenbankEmitter class which implements ThingEmitter. It
> >>>> takes a Genbank object and will fire off data from it to the provided
> >>>> ThingReceiver (a GenbankReceiver instance) as if the Genbank object
> >>>> was being read from a file or some other source.
> >>>>
> >>>> That's it! OK so it's a minimum of 6 classes instead of the original
> >>>> 1 or 2, but the additional steps are necessary for flexibility in
> >>>> converting between formats.
> >>>>
> >>>> Now to use it (you'll probably want a GenbankTools class to wrap these
> >>>> steps up for user-friendliness, including various options for opening
> >>>> files, etc.):
> >>>>
> >>>> 1. To read a file - instantiate ThingParser with your GenbankReader as
> >>>> the reader, and GenbankBuilder as the receiver. Use the iterator
> >>>> methods on ThingParser to get the objects out.
> >>>>
> >>>> 2. To write a file - instantiate ThingParser with a GenbankEmitter
> >>>> wrapping your Genbank object, and a GenbankWriter as the receiver. Use
> >>>> the parseAll() method on the ThingParser to dump the whole lot to your
> >>>> chosen output.
> >>>>
> >>>> The clever bit comes when you want to convert between files. Imagine
> >>>> you've done all the above for Genbank, and you've also done it for
> >>>> FASTA. How to convert between them? What you need to do is this:
> >>>>
> >>>> 1. Implement all the classes for both Genbank and FASTA.
> >>>>
> >>>> 2. Write a GenbankFASTAConverter class that implements
> >>>> ThingConverter<FASTA> and GenbankReceiver, and will internally convert
> >>>> the data received and pass it on out to the receiver provided, which
> >>>> will be a FASTAReceiver instance.
> >>>>
> >>>> 3. Write a FASTAGenbankConverter class that operates in exactly the
> >>>> opposite way, implementing ThingConverter<Genbank> and FASTAReceiver.
> >>>>
> >>>> Then to convert you use ThingParser again:
> >>>>
> >>>> 1. From FASTA file to Genbank object: Instantiate ThingParser with a
> >>>> FASTAReader reader, a GenbankBuilder receiver, and add a
> >>>> FASTAGenbankConverter instance to the converter chain. Use the
> >>>> iterator to get your Genbank objects out of your FASTA file.
> >>>>
> >>>> 2. From FASTA file to Genbank file: Same as option 1, but provide a
> >>>> GenbankWriter instead and use parseAll() instead of the iterator
> >>>> methods.
> >>>>
> >>>> 3. From FASTA object to Genbank object: Same as option 1, but provide
> >>>> a FASTAEmitter wrapping your FASTA object as the reader instead.
> >>>>
> >>>> 4. From FASTA object to Genbank file: Same as option 1, but swap both
> >>>> the reader and the receiver as per options 2 and 3.
> >>>>
> >>>> 5/6/7/8. From Genbank * to FASTA * - same as 1,2,3,4 but swap all
> >>>> mentions of FASTA and Genbank, and use GenbankFASTAConverter instead.
> >>>>
> >>>> One last and very important feature of this approach is that if you
> >>>> discover that nobody has written the appropriate converter for your
> >>>> chosen pair of formats A and C, but converters do exist to map A to
> >>>> some other format B and that other format B on to C, then you can just
> >>>> put the two converters A-B and B-C into the ThingParser chain and
> >>>> it'll work perfectly.
> >>>>
> >>>> Enjoy!
> >>>>
> >>>> cheers,
> >>>> Richard
> >>>>
> >>>> --
> >>>> Richard Holland, BSc MBCS
> >>>> Finance Director, Eagle Genomics Ltd
> >>>> M: +44 7500 438846 | E: holland at eaglegenomics.com
> >>>> http://www.eaglegenomics.com/
> >>>> _______________________________________________
> >>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
> >>>>
> >>
> >>
> >>
> >
>


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From markjschreiber at gmail.com  Tue Oct 21 07:24:13 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 21 Oct 2008 19:24:13 +0800
Subject: [Biojava-dev] [Biojava-l] File parsing in BJ3
In-Reply-To: <a0d826f40810210414u33ea8048p37ebe9d8fa357ea4@mail.gmail.com>
References: <a0d826f40810201052o341f3c4cj10dd2765167ce19f@mail.gmail.com>
	<93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com>
	<a0d826f40810210134h56f9e0dv5abd988ecdd7b7b5@mail.gmail.com>
	<48FD97AB.70503@ebi.ac.uk>
	<93b45ca50810210335j5ef4a206y545e5a1869cedc03@mail.gmail.com>
	<a0d826f40810210414u33ea8048p37ebe9d8fa357ea4@mail.gmail.com>
Message-ID: <93b45ca50810210424g5a9288f0w803e6d5ca4b840d3@mail.gmail.com>

Depending on what you want them for, isMachineGenerated() and
isManuallyCurated() would possibly be better as annotations
(@MachineGenerated, @ManuallyCurated). This is true metadata.

If Java had had annotations in version 1.1, Serializable would probably
also be an annotation.  I would agree with the idea that ThingBuilder
etc. should be typed on extends Serializable.
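
As a rough illustration of the annotation route, the sketch below declares the two annotations and reads them back by reflection. The annotation names are the ones suggested above; the record classes and the ProvenanceDemo helper are hypothetical, and runtime retention is an assumption made so that the check can be done reflectively.

```java
import java.io.Serializable;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Assumption: RUNTIME retention, so the metadata is visible via reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface MachineGenerated {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ManuallyCurated {}

// Hypothetical data classes, tagged with metadata instead of extra methods.
@ManuallyCurated
final class CuratedRecord implements Serializable {
    private static final long serialVersionUID = 1L;
}

@MachineGenerated
final class PredictedRecord implements Serializable {
    private static final long serialVersionUID = 1L;
}

// Hypothetical helper: answers the same questions that isManuallyCurated()
// and isMachineGenerated() would, without any shared interface.
final class ProvenanceDemo {
    static boolean isManuallyCurated(Object o) {
        return o.getClass().isAnnotationPresent(ManuallyCurated.class);
    }

    static boolean isMachineGenerated(Object o) {
        return o.getClass().isAnnotationPresent(MachineGenerated.class);
    }
}
```

The trade-off is the one Andy raised: the check happens at run time, so the compiler cannot insist that a collection contains only curated records.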

- Mark


From andreas at sdsc.edu  Tue Oct 21 07:31:40 2008
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 21 Oct 2008 04:31:40 -0700
Subject: [Biojava-dev] BioJava3 contribution
In-Reply-To: <086C2EC4-C9AD-4C00-B348-F7D781C0F3EC@orange.fr>
References: <B6195C9D-2722-4B2A-A197-79C83ADF2B37@ibmc.u-strasbg.fr>
	<a0d826f40810200651v30b8161ka20de9ebba28f96b@mail.gmail.com>
	<59a41c430810201218n194660e2udb17be18f8029779@mail.gmail.com>
	<086C2EC4-C9AD-4C00-B348-F7D781C0F3EC@orange.fr>
Message-ID: <59a41c430810210431v2a9e1647w6a6fca991926f175@mail.gmail.com>

Hi Fabrice,

The biojava 1 features could only accept integer positions as start
and stop. For protein structures an amino acid is uniquely identified
by a number and an insertion code. As such, in the biojava 1 world it
was not possible to implement this for protein structures. If we
have a cleaner interface definition for that in biojava 3, it should be
no problem.
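
A minimal sketch of what such a position type might look like, assuming a feature location is allowed to hold something richer than an int. ResiduePosition is a hypothetical name, not an existing BioJava class, and ordering insertion codes alphabetically after the bare number is an assumption of this sketch rather than a rule taken from any file format.

```java
import java.util.Objects;

// Hypothetical position type: residue number plus optional insertion code.
final class ResiduePosition implements Comparable<ResiduePosition> {
    static final char NO_INSERTION = ' ';

    final int number;
    final char insertionCode;

    ResiduePosition(int number, char insertionCode) {
        this.number = number;
        this.insertionCode = insertionCode;
    }

    ResiduePosition(int number) {
        this(number, NO_INSERTION);
    }

    // Orders by number first, then by insertion code; the blank code sorts
    // before any letter because ' ' precedes 'A' in character order.
    @Override
    public int compareTo(ResiduePosition other) {
        int byNumber = Integer.compare(number, other.number);
        return byNumber != 0
                ? byNumber
                : Character.compare(insertionCode, other.insertionCode);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ResiduePosition)) {
            return false;
        }
        ResiduePosition p = (ResiduePosition) o;
        return number == p.number && insertionCode == p.insertionCode;
    }

    @Override
    public int hashCode() {
        return Objects.hash(number, insertionCode);
    }

    // Renders "52" when there is no insertion code and "52A" otherwise.
    @Override
    public String toString() {
        return insertionCode == NO_INSERTION
                ? Integer.toString(number)
                : number + String.valueOf(insertionCode);
    }
}
```

A feature defined with start and stop of this type could then address residues 52 and 52A separately, which an integer-only location cannot do.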

Andreas

On Mon, Oct 20, 2008 at 1:40 PM, Fabrice Jossinet <fjossinet at orange.fr> wrote:
> Hi Andreas,
> yes of course, I really would like to work with you (I like your work with
> SPICE). I wanted to contact you about this point before starting. Concerning
> the tertiary structure representation, I need to annotate an RNA tertiary
> structure with base-pair families (as described in
> http://www.ncbi.nlm.nih.gov/pubmed/12177293 or in
> http://prion.bchs.uh.edu/bp_type/ ) and structural motifs (like those listed
> in the SCOR database
> http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=308814). The idea
> is to attach these features to a 3D structure in the same way as features
> are attached to a sequence (1D).
> What do you think?
> Fabrice
> On 20 Oct 08 at 21:18, Andreas Prlic wrote:
>
> Hi Fabrice,
>
> Regarding the tertiary structure representation we should work
> together. There is a set of tools available already in the current
> biojava 1.7 which I was intending to maintain and migrate to
> biojava v 3. Let me know if you have specific RNA related requests...
>
> Andreas
>
> On Mon, Oct 20, 2008 at 6:51 AM, Richard Holland
> <holland at eaglegenomics.com> wrote:
>
> Excellent! Thanks for your offer of help!
>
> Yes, an advanced RNA module would be very helpful indeed. You should
> probably call it 'rna'.
>
> As long as everyone who intends to work on BJ3 declares their intentions
> here, as you just have, then basically it's first come first served. I won't
> be doing any official supervision other than keeping an eye on committed
> code once in a while to make sure it all looks OK. So feel free to start
> coding straight away!
>
> All new modules should probably start by:
>
> 1. copying the existing dna module to something new, like 'rna' in this
> case,
> 2. removing all the hidden .svn directories from the copy,
> 3. updating the pom.xml in the copy (do a search-and-replace on dna and
> change to the new name, rna in this case), deleting the existing source
> packages in src/main/java (org.biojava.dna) and creating suitable new ones
> (org.biojava.rna in this case),
> 4. emptying out the target/ folder, then svn add the new module,
> 5. svn:ignore the target/ directory in your new module,
> 6. including your new module in the list at the end of the pom.xml in the
> root directory of the biojava3 branch.
>
> cheers,
> Richard
>
> 2008/10/20 Fabrice Jossinet <f.jossinet at ibmc.u-strasbg.fr>
>
> Dear Richard,
>
> I'm answering your "official call" to offer my help with the
> development of the biojava3 code. With the modularity of Maven, I would
> also like to offer my help with the development of a module that will use
> the biojava3 code to manage more specialized RNA stuff (secondary and
> tertiary structures, base-pair classifications, modified nucleotides, RNA
> alignments,....).
>
> What will be the next step for me? Will you make a selection?
>
> Best Regards
>
> Fabrice Jossinet
>
> --
> Dr. Fabrice Jossinet
> Laboratoire de Bioinformatique, modelisation et simulation des acides
> nucleiques
> Universite Louis Pasteur
> Institut de biologie moleculaire et cellulaire du CNRS
> UPR9002, Architecture et Reactivite de l'ARN
> 15 rue Rene Descartes
> F-67084 Strasbourg Cedex
> France
> Tel + 33 (0) 3 88 417053
> FAX + 33 (0) 3 88 60 22 18
> f.jossinet at ibmc.u-strasbg.fr
> fjossinet at gmail.com
> http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
> http://fjossinet.u-strasbg.fr/
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
>
> --
> Dr. Fabrice Jossinet
> Laboratoire de Bioinformatique, modelisation et simulation des acides
> nucleiques
> Universite Louis Pasteur
> Institut de biologie moleculaire et cellulaire du CNRS
> UPR9002, Architecture et Reactivite de l'ARN
> 15 rue Rene Descartes
> F-67084 Strasbourg Cedex
> France
> Tel + 33 (0) 3 88 417053
> FAX + 33 (0) 3 88 60 22 18
> f.jossinet at ibmc.u-strasbg.fr
> fjossinet at gmail.com
> http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
> http://fjossinet.u-strasbg.fr/
>
>
>


From holland at eaglegenomics.com  Tue Oct 21 07:39:44 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 21 Oct 2008 12:39:44 +0100
Subject: [Biojava-dev] [Biojava-l] File parsing in BJ3
In-Reply-To: <93b45ca50810210424g5a9288f0w803e6d5ca4b840d3@mail.gmail.com>
References: <a0d826f40810201052o341f3c4cj10dd2765167ce19f@mail.gmail.com>
	<93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com>
	<a0d826f40810210134h56f9e0dv5abd988ecdd7b7b5@mail.gmail.com>
	<48FD97AB.70503@ebi.ac.uk>
	<93b45ca50810210335j5ef4a206y545e5a1869cedc03@mail.gmail.com>
	<a0d826f40810210414u33ea8048p37ebe9d8fa357ea4@mail.gmail.com>
	<93b45ca50810210424g5a9288f0w803e6d5ca4b840d3@mail.gmail.com>
Message-ID: <a0d826f40810210439q4533b963gebd3b03edf5233c4@mail.gmail.com>

The two examples I gave would be better as annotations, it's true.
Serializable, and Cloneable for that matter, would definitely work better
that way.

Well, we could do away with Thing altogether then. I'll update the code.
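
Roughly what that change might look like, as a sketch only: the builder is bounded on Serializable directly, with no Thing in between. SketchBuilder and FastaEvents are hypothetical stand-ins for ThingBuilder and a FASTA receiver; the real BJ3 signatures are not shown in this thread and may well differ.

```java
import java.io.Serializable;

// Hypothetical stand-in for ThingBuilder, with the Thing bound replaced by
// Serializable.
interface SketchBuilder<T extends Serializable> {
    T build();
}

// Hypothetical receiver: the callbacks a FASTA reader might fire.
interface FastaEvents {
    void header(String text);
    void sequenceChunk(String residues);
}

// Plain serializable data class; it implements no marker beyond Serializable.
final class FastaRecord implements Serializable {
    private static final long serialVersionUID = 1L;
    final String header;
    final String sequence;

    FastaRecord(String header, String sequence) {
        this.header = header;
        this.sequence = sequence;
    }
}

// Receives data via method calls, then provides the finished object on demand.
final class FastaRecordBuilder implements FastaEvents, SketchBuilder<FastaRecord> {
    private String header = "";
    private final StringBuilder sequence = new StringBuilder();

    @Override
    public void header(String text) {
        header = text;
    }

    @Override
    public void sequenceChunk(String residues) {
        sequence.append(residues);
    }

    @Override
    public FastaRecord build() {
        return new FastaRecord(header, sequence.toString());
    }
}
```

Any existing serializable class can then be produced by a builder without being retrofitted with a BioJava-specific interface.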


2008/10/21 Mark Schreiber <markjschreiber at gmail.com>

> Depending on what you want them for isMachineGenerated(),
> isManuallyCurated(), would possibly be better as annotations
> (@MachineGenerated, @ManuallyCurated). This is true metadata.
>
> Probably if Java had annotations in version 1.1 Serializable would
> also be an Annotation.  I would agree with the idea that ThingBuilder
> etc should be typed on extends Serializable.
>
> - Mark
>
> On Tue, Oct 21, 2008 at 7:14 PM, Richard Holland
> <dicknetherlands at gmail.com> wrote:
> > For now, yes it's empty. But I can envisage situations where it might be
> > nice to have Thing implement some common methods (e.g.
> isMachineGenerated(),
> > isManuallyCurated(), etc.). I'd rather have it there now to be a
> placeholder
> > for future expansion, than have to re-engineer everything should we
> identify
> > a need for common functions in future.
> >
> > You'll see that Thing already extends Serializable, implying that all
> Things
> > must be able to persist to an object backing store. Serializable itself
> is
> > also an empty interface!
> >
> > Also I like the idea of having Thing, not Object, as a kind of marker of
> > intention. To me it makes it clearer when reading code to avoid Object
> > wherever possible. Thing may not be any more clever than Object, but it
> > immediately declares an intention when reading code as to what kind of
> > Object should be expected.
> >
> >
> > 2008/10/21 Mark Schreiber <markjschreiber at gmail.com>
> >>
> >> Is there any need for Thing at all? Can't a builder be typed to produce
> >> something that extends Object?
> >>
> >> If Thing provides no behaviour contract or meta-information then why
> >> does it exist?
> >>
> >> - Mark
> >>
> >> On Tue, Oct 21, 2008 at 4:49 PM, Andy Yates <ayates at ebi.ac.uk> wrote:
> >> > Depends on what you want to program. If you want to have a collection
> of
> >> > objects which are Things & perform a common action on them then
> >> > annotations are not the way forward.
> >> >
> >> > If you want to have some kind of meta-programming occurring & need a
> >> > class to be multiple things then annotations are right. There is
> >> > currently no way to enforce compile time dependencies on annotations &
> >> > my thinking is that this is right. Annotations should be meta data or
> >> > provide a way to alter a class in a non-invasive way (think Web
> Service
> >> > annotations creating WS Servers & Clients without any alteration of
> the
> >> > class).
> >> >
> >> > Andy
> >> >
> >> > Richard Holland wrote:
> >> >> Spot on.
> >> >>
> >> >> Annotation/interface.... I think Annotation is probably better as you
> >> >> suggest, but I'd have to look into that. Not sure how it works with
> >> >> collections and generics. If it does turn out to be a better bet,
> >> >> I'll change it over.
> >> >>
> >> >> With the BioSQL dependencies, take a look at the pom.xml file inside
> >> >> the
> >> >> biojava-dna module. It declares a dependency on biojava-core. If you
> >> >> want to
> >> >> add dependencies to external JARs, take a look at biojava-biosql's
> >> >> pom.xml
> >> >> to see how it depends on javax.persistence. (The easiest way to add
> >> >> these is
> >> >> via an IDE such as NetBeans, which is what I'm using at the moment).
> >> >>
> >> >> cheers,
> >> >> Richard
> >> >>
> >> >> 2008/10/21 Mark Schreiber <markjschreiber at gmail.com>
> >> >>
> >> >>> So if I want to build a BioSQL loader from Genbank then would the
> >> >>> classes (or their wrappers) in the BioSQL Entity package need to
> >> >>> implement Thing?  Would maven have an issue with that or would it
> >> >>> just create a dependency on core? (You can tell I've never used
> >> >>> Maven, right?)
> >> >>>
> >> >>> From a design point of view should Thing be an interface or an
> >> >>> Annotation? The reason I ask is that it doesn't define any methods
> >> >>> so it is more of a tag than an interface.
> >> >>>
> >> >>> Anyway, my understanding is that I would use a Genbank parser (or
> >> >>> write one), write an EntityReceiver interface (probably more than
> >> >>> one given the number of entities in BioSQL), and implement an
> >> >>> EntityBuilder (again possibly more than one) that implements
> >> >>> EntityReceiver and builds Entity beans from messages it receives.
> >> >>> In this case I probably wouldn't provide a writer as JPA would be
> >> >>> writing the beans to the database.  Would this be how you imagine
> >> >>> it?
> >> >>>
> >> >>> - Mark
> >> >>>
> >> >>>
> >> >>> On Tue, Oct 21, 2008 at 1:52 AM, Richard Holland
> >> >>> <holland at eaglegenomics.com> wrote:
> >> >>>> (From now on I will only be posting these development messages to
> >> >>>> biojava-dev, which is the intended purpose of that list. Those of
> you
> >> >>>> who
> >> >>>> wish to keep track of things but are currently only subscribed to
> >> >>> biojava-l
> >> >>>> should also subscribe to biojava-dev in order to keep up to date.)
> >> >>>>
> >> >>>> As promised, I've committed a new package in the biojava-core
> module
> >> >>>> that
> >> >>>> should help understand how to do file parsing and conversion and
> >> >>>> writing
> >> >>> in
> >> >>>> the new BJ3 modules. Here's an example of how to use it to write a
> >> >>> Genbank
> >> >>>> parser (note no parsers actually exist yet!):
> >> >>>>
> >> >>>> 1. Design yourself a Genbank class which implements the interface
> >> >>>> Thing
> >> >>> and
> >> >>>> can fully represent all the data that might possibly occur inside a
> >> >>> Genbank
> >> >>>> file.
> >> >>>>
> >> >>>> 2. Write an interface called GenbankReceiver, which extends
> >> >>>> ThingReceiver
> >> >>>> and defines all the methods you might need in order to construct a
> >> >>> Genbank
> >> >>>> object in an asynchronous fashion.
> >> >>>>
> >> >>>> 3. Write a GenbankBuilder class which implements GenbankReceiver
> >> >>>> and ThingBuilder. Its job is to receive data via method calls, use
> >> >>>> that data to construct a Genbank object, then provide that object
> >> >>>> on demand.
> >> >>>>
> >> >>>> 4. Write a GenbankWriter class which implements GenbankReceiver and
> >> >>>> ThingWriter. Its job is similar to GenbankBuilder, but instead of
> >> >>>> constructing new Genbank objects, it writes Genbank records to file
> >> >>>> that reflect the data it receives.
> >> >>>>
> >> >>>> 5. Write a GenbankReader class which implements ThingReader. It can
> >> >>>> read
> >> >>>> GenbankFiles and output the data to the methods of the
> ThingReceiver
> >> >>>> provided to it, which in this case could be anything which
> implements
> >> >>>> the
> >> >>>> interface GenbankReceiver.
> >> >>>>
> >> >>>> 6. Write a GenbankEmitter class which implements ThingEmitter. It
> >> >>>> takes a
> >> >>>> Genbank object and will fire off data from it to the provided
> >> >>> ThingReceiver
> >> >>>> (a GenbankReceiver instance) as if the Genbank object was being
> read
> >> >>>> from
> >> >>> a
> >> >>>> file or some other source.
> >> >>>>
> >> >>>> That's it! OK so it's a minimum of 6 classes instead of the
> original
> >> >>>> 1 or
> >> >>> 2,
> >> >>>> but the additional steps are necessary for flexibility in
> converting
> >> >>> between
> >> >>>> formats.
> >> >>>>
> >> >>>> Now to use it (you'll probably want a GenbankTools class to wrap
> >> >>>> these
> >> >>> steps
> >> >>>> up for user-friendliness, including various options for opening
> >> >>>> files,
> >> >>>> etc.):
> >> >>>>
> >> >>>> 1. To read a file - instantiate ThingParser with your GenbankReader
> >> >>>> as
> >> >>> the
> >> >>>> reader, and GenbankBuilder as the receiver. Use the iterator
> methods
> >> >>>> on
> >> >>>> ThingParser to get the objects out.
> >> >>>>
> >> >>>> 2. To write a file - instantiate ThingParser with a GenbankEmitter
> >> >>> wrapping
> >> >>>> your Genbank object, and a GenbankWriter as the receiver. Use the
> >> >>> parseAll()
> >> >>>> method on the ThingParser to dump the whole lot to your chosen
> >> >>>> output.
> >> >>>>
> >> >>>> The clever bit comes when you want to convert between files.
> Imagine
> >> >>> you've
> >> >>>> done all the above for Genbank, and you've also done it for FASTA.
> >> >>>> How to
> >> >>>> convert between them? What you need to do is this:
> >> >>>>
> >> >>>> 1. Implement all the classes for both Genbank and FASTA.
> >> >>>>
> >> >>>> 2. Write a GenbankFASTAConverter class that implements
> >> >>> ThingConverter<FASTA>
> >> >>>> and GenbankReceiver, and will internally convert the data received
> >> >>>> and
> >> >>> pass
> >> >>>> it on out to the receiver provided, which will be a FASTAReceiver
> >> >>> instance.
> >> >>>> 3. Write a FASTAGenbankConverter class that operates in exactly the
> >> >>> opposite
> >> >>>> way, implementing ThingConverter<Genbank> and FASTAReceiver.
> >> >>>>
> >> >>>> Then to convert you use ThingParser again:
> >> >>>>
> >> >>>> 1. From FASTA file to Genbank object: Instantiate ThingParser with
> a
> >> >>>> FASTAReader reader, a GenbankBuilder receiver, and add a
> >> >>>> FASTAGenbankConverter instance to the converter chain. Use the
> >> >>>> iterator
> >> >>> to
> >> >>>> get your Genbank objects out of your FASTA file.
> >> >>>>
> >> >>>> 2. From FASTA file to Genbank file: Same as option 1, but provide a
> >> >>>> GenbankWriter instead and use parseAll() instead of the iterator
> >> >>>> methods.
> >> >>>>
> >> >>>> 3. From FASTA object to Genbank object: Same as option 1, but
> provide
> >> >>>> a
> >> >>>> FASTAEmitter wrapping your FASTA object as the reader instead.
> >> >>>>
> >> >>>> 4. From FASTA object to Genbank file: Same as option 1, but swap
> both
> >> >>>> the
> >> >>>> reader and the receiver as per options 2 and 3.
> >> >>>>
> >> >>>> 5/6/7/8. From Genbank * to FASTA * - same as 1,2,3,4 but swap all
> >> >>> mentions
> >> >>>> of FASTA and Genbank, and use GenbankFASTAConverter instead.
> >> >>>>
> >> >>>> One last and very important feature of this approach is that if you
> >> >>> discover
> >> >>>> that nobody has written the appropriate converter for your chosen
> >> >>>> pair of
> >> >>>> formats A and C, but converters do exist to map A to some other
> >> >>>> format B
> >> >>> and
> >> >>>> that other format B on to C, then you can just put the two converters
> >> >>>> A-B
> >> >>> and
> >> >>>> B-C into the ThingParser chain and it'll work perfectly.
> >> >>>>
> >> >>>> Enjoy!
> >> >>>>
> >> >>>> cheers,
> >> >>>> Richard
> >> >>>>
> >> >>>> --
> >> >>>> Richard Holland, BSc MBCS
> >> >>>> Finance Director, Eagle Genomics Ltd
> >> >>>> M: +44 7500 438846 | E: holland at eaglegenomics.com
> >> >>>> http://www.eaglegenomics.com/
> >> >>>> _______________________________________________
> >> >>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> >> >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
> >> >>>>
> >> >>
> >> >>
> >> >>
> >> >
> >
> >
> >
> > --
> > Richard Holland, BSc MBCS
> > Finance Director, Eagle Genomics Ltd
> > M: +44 7500 438846 | E: holland at eaglegenomics.com
> > http://www.eaglegenomics.com/
> >
>
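
The reader, converter and builder roles described in the quoted message
can be sketched roughly as follows. This is only an illustration under
stated assumptions: the event methods (header, sequence, locus, origin),
the class ChainSketch and the static readFasta helper are all made up
for this example and are not the BJ3 API, which is not shown in this
thread. A ThingParser would normally do the wiring that main() does
here by hand.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ChainSketch {
    /** Hypothetical receiver of FASTA-style events (not the BJ3 interface). */
    public interface FastaReceiver {
        void header(String description);
        void sequence(String residues);
    }

    /** Hypothetical receiver of Genbank-style events (not the BJ3 interface). */
    public interface GenbankReceiver {
        void locus(String name);
        void origin(String residues);
    }

    /** Converter: accepts FASTA events and forwards Genbank events downstream. */
    public static class FastaGenbankConverter implements FastaReceiver {
        private final GenbankReceiver next;
        public FastaGenbankConverter(GenbankReceiver next) { this.next = next; }
        public void header(String description) {
            // Use the first whitespace-delimited token as the locus name.
            next.locus(description.trim().split("\\s+")[0]);
        }
        public void sequence(String residues) {
            next.origin(residues.toLowerCase(Locale.ROOT));
        }
    }

    /** Builder: collects Genbank events and provides the result on demand. */
    public static class GenbankBuilder implements GenbankReceiver {
        private String name = "";
        private String residues = "";
        public void locus(String name) { this.name = name; }
        public void origin(String residues) { this.residues = residues; }
        public String build() { return "LOCUS " + name + "\nORIGIN " + residues; }
    }

    /** Reader stand-in: fires FASTA events for a single record given as lines. */
    public static void readFasta(List<String> lines, FastaReceiver receiver) {
        receiver.header(lines.get(0).substring(1)); // drop the leading '>'
        StringBuilder sb = new StringBuilder();
        for (String line : lines.subList(1, lines.size())) {
            sb.append(line.trim());
        }
        receiver.sequence(sb.toString());
    }

    public static void main(String[] args) {
        GenbankBuilder builder = new GenbankBuilder();
        // FASTA reader -> FASTA-to-Genbank converter -> Genbank builder
        readFasta(Arrays.asList(">seq1 test", "ACG", "TAC"),
                  new FastaGenbankConverter(builder));
        System.out.println(builder.build());
    }
}
```

Swapping the builder for a receiver that writes to a stream would give
the "file to file" case without touching the reader or the converter,
which is the flexibility the extra classes are meant to buy.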


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From ayates at ebi.ac.uk  Tue Oct 21 10:32:45 2008
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 21 Oct 2008 15:32:45 +0100
Subject: [Biojava-dev] [Biojava-l] File parsing in BJ3
In-Reply-To: <a0d826f40810210439q4533b963gebd3b03edf5233c4@mail.gmail.com>
References: <a0d826f40810201052o341f3c4cj10dd2765167ce19f@mail.gmail.com>	
	<93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com>	
	<a0d826f40810210134h56f9e0dv5abd988ecdd7b7b5@mail.gmail.com>	
	<48FD97AB.70503@ebi.ac.uk>	
	<93b45ca50810210335j5ef4a206y545e5a1869cedc03@mail.gmail.com>	
	<a0d826f40810210414u33ea8048p37ebe9d8fa357ea4@mail.gmail.com>	
	<93b45ca50810210424g5a9288f0w803e6d5ca4b840d3@mail.gmail.com>
	<a0d826f40810210439q4533b963gebd3b03edf5233c4@mail.gmail.com>
Message-ID: <48FDE80D.1040106@ebi.ac.uk>

If "Thing" has gone, what impact does this have on the remaining
classes? Consider methods like canReadNextThing() & readNextThing();
should these become canReadNext() & readNext()?
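
A minimal sketch of what the renamed methods might look like on a
reader that is typed on its record class rather than on Thing. The
names RecordReader, ListReader and readAll are hypothetical and chosen
only for this illustration; only canReadNext() and readNext() come from
the suggestion above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReaderSketch {
    /** Hypothetical reader typed on T instead of a Thing marker interface. */
    public interface RecordReader<T> {
        boolean canReadNext();
        T readNext();
    }

    /** Toy implementation backed by an in-memory list, for illustration only. */
    public static class ListReader<T> implements RecordReader<T> {
        private final Iterator<T> iterator;
        public ListReader(List<T> records) { this.iterator = records.iterator(); }
        public boolean canReadNext() { return iterator.hasNext(); }
        public T readNext() { return iterator.next(); }
    }

    /** Drains a reader into a list using the two renamed methods. */
    public static <T> List<T> readAll(RecordReader<T> reader) {
        List<T> out = new ArrayList<>();
        while (reader.canReadNext()) {
            out.add(reader.readNext());
        }
        return out;
    }

    public static void main(String[] args) {
        RecordReader<String> reader = new ListReader<>(Arrays.asList("rec1", "rec2"));
        System.out.println(readAll(reader));
    }
}
```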

Just an idle thought ....

Andy

Richard Holland wrote:
> The two examples I gave would be better as annotations, it's true.
> Serializable, and Cloneable for that matter, would definitely work better
> that way.
> 
> Well, we could do away with Thing altogether then. I'll update the code.
> 
> 
> 2008/10/21 Mark Schreiber <markjschreiber at gmail.com>
> 
>> Depending on what you want them for isMachineGenerated(),
>> isManuallyCurated(), would possibly be better as annotations
>> (@MachineGenerated, @ManuallyCurated). This is true metadata.
>>
>> Probably if Java had annotations in version 1.1 Serializable would
>> also be an Annotation.  I would agree with the idea that ThingBuilder
>> etc should be typed on extends Serializable.
>>
>> - Mark
>>
>> On Tue, Oct 21, 2008 at 7:14 PM, Richard Holland
>> <dicknetherlands at gmail.com> wrote:
>>> For now, yes it's empty. But I can envisage situations where it might be
>>> nice to have Thing implement some common methods (e.g.
>> isMachineGenerated(),
>>> isManuallyCurated(), etc.). I'd rather have it there now to be a
>> placeholder
>>> for future expansion, than have to re-engineer everything should we
>> identify
>>> a need for common functions in future.
>>>
>>> You'll see that Thing already extends Serializable, implying that all
>> Things
>>> must be able to persist to an object backing store. Serializable itself
>> is
>>> also an empty interface!
>>>
>>> Also I like the idea of having Thing, not Object, as a kind of marker of
>>> intention. To me it makes it clearer when reading code to avoid Object
>>> wherever possible. Thing may not be any more clever than Object, but it
>>> immediately declares an intention when reading code as to what kind of
>>> Object should be expected.
>>>
>>>
>>> 2008/10/21 Mark Schreiber <markjschreiber at gmail.com>
>>>> Is there any need for Thing at all? Can't a builder be typed to produce
>>>> something that extends Object?
>>>>
>>>> If Thing provides no behaviour contract or meta-information then why
>>>> does it exist?
>>>>
>>>> - Mark
>>>>
>>>> On Tue, Oct 21, 2008 at 4:49 PM, Andy Yates <ayates at ebi.ac.uk> wrote:
>>>>> Depends on what you want to program. If you want to have a collection
>> of
>>>>> objects which are Things & perform a common action on them then
>>>>> annotations are not the way forward.
>>>>>
>>>>> If you want to have some kind of meta-programming occurring & need a
>>>>> class to be multiple things then annotations are right. There is
>>>>> currently no way to enforce compile time dependencies on annotations &
>>>>> my thinking is that this is right. Annotations should be meta data or
>>>>> provide a way to alter a class in a non-invasive way (think Web
>> Service
>>>>> annotations creating WS Servers & Clients without any alteration of
>> the
>>>>> class).
>>>>>
>>>>> Andy
>>>>>
>>>>> Richard Holland wrote:
>>>>>> Spot on.
>>>>>>
>>>>>> Annotation/interface.... I think Annotation is probably better as you
>>>>>> suggest, but I'd have to look into that. Not sure how it works with
>>>>>> collections and generics. If it does turn out to be a better bet,
>> I'll
>>>>>> change it over.
>>>>>>
>>>>>> With the BioSQL dependencies, take a look at the pom.xml file inside
>>>>>> the
>>>>>> biojava-dna module. It declares a dependency on biojava-core. If you
>>>>>> want to
>>>>>> add dependencies to external JARs, take a look at biojava-biosql's
>>>>>> pom.xml
>>>>>> to see how it depends on javax.persistence. (The easiest way to add
>>>>>> these is
>>>>>> via an IDE such as NetBeans, which is what I'm using at the moment).
>>>>>>
>>>>>> cheers,
>>>>>> Richard
>>>>>>
>>>>>> 2008/10/21 Mark Schreiber <markjschreiber at gmail.com>
>>>>>>
>>>>>>> So if I want to build a BioSQL loader from Genbank then would the
>>>>>>> classes (or their wrappers) in the BioSQL Entity package need to
>>>>>>> implement Thing?  Would maven have an issue with that or would it
>> just
>>>>>>> create a dependency on core? (you can tell I've never used Maven
>>>>>>> right).
>>>>>>>
>>>>>>> From a design point of view should Thing be an interface or an
>>>>>>> Annotation? The reason I ask is that it doesn't define any methods
>> so
>>>>>>> it is more of a tag than an interface.
>>>>>>>
>>>>>>> Anyway, my understanding is that I would use a Genbank parser (or
>>>>>>> write one). Write an EntityReceiver interface (probably more than one
>>>>>>> given the number of entities in BioSQL), implement an EntityBuilder
>>>>>>> (again possibly more than one) that implements EntityReceiver and
>>>>>>> builds Entity beans from messages it receives. In this case I
>> probably
>>>>>>> wouldn't provide a writer as JPA would be writing the beans to the
>>>>>>> database.  Would this be how you imagine it?
>>>>>>>
>>>>>>> - Mark
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 21, 2008 at 1:52 AM, Richard Holland
>>>>>>> <holland at eaglegenomics.com> wrote:
>>>>>>>> (From now on I will only be posting these development messages to
>>>>>>>> biojava-dev, which is the intended purpose of that list. Those of
>> you
>>>>>>>> who
>>>>>>>> wish to keep track of things but are currently only subscribed to
>>>>>>> biojava-l
>>>>>>>> should also subscribe to biojava-dev in order to keep up to date.)
>>>>>>>>
>>>>>>>> As promised, I've committed a new package in the biojava-core
>> module
>>>>>>>> that
>>>>>>>> should help understand how to do file parsing and conversion and
>>>>>>>> writing
>>>>>>> in
>>>>>>>> the new BJ3 modules. Here's an example of how to use it to write a
>>>>>>> Genbank
>>>>>>>> parser (note no parsers actually exist yet!):
>>>>>>>>
>>>>>>>> 1. Design yourself a Genbank class which implements the interface
>>>>>>>> Thing
>>>>>>> and
>>>>>>>> can fully represent all the data that might possibly occur inside a
>>>>>>> Genbank
>>>>>>>> file.
>>>>>>>>
>>>>>>>> 2. Write an interface called GenbankReceiver, which extends
>>>>>>>> ThingReceiver
>>>>>>>> and defines all the methods you might need in order to construct a
>>>>>>> Genbank
>>>>>>>> object in an asynchronous fashion.
>>>>>>>>
>>>>>>>> 3. Write a GenbankBuilder class which implements GenbankReceiver
>> and
>>>>>>>> ThingBuilder. Its job is to receive data via method calls, use
>> that
>>>>>>>> data
>>>>>>> to
>>>>>>>> construct a Genbank object, then provide that object on demand.
>>>>>>>>
>>>>>>>> 4. Write a GenbankWriter class which implements GenbankReceiver and
>>>>>>>> ThingWriter. Its job is similar to GenbankBuilder, but instead of
>>>>>>>> constructing new Genbank objects, it writes Genbank records to file
>>>>>>>> that
>>>>>>>> reflect the data it receives.
>>>>>>>>
>>>>>>>> 5. Write a GenbankReader class which implements ThingReader. It can
>>>>>>>> read
>>>>>>>> Genbank files and output the data to the methods of the
>> ThingReceiver
>>>>>>>> provided to it, which in this case could be anything which
>> implements
>>>>>>>> the
>>>>>>>> interface GenbankReceiver.
>>>>>>>>
>>>>>>>> 6. Write a GenbankEmitter class which implements ThingEmitter. It
>>>>>>>> takes a
>>>>>>>> Genbank object and will fire off data from it to the provided
>>>>>>> ThingReceiver
>>>>>>>> (a GenbankReceiver instance) as if the Genbank object was being
>> read
>>>>>>>> from
>>>>>>> a
>>>>>>>> file or some other source.
>>>>>>>>
>>>>>>>> That's it! OK so it's a minimum of 6 classes instead of the
>> original
>>>>>>>> 1 or
>>>>>>> 2,
>>>>>>>> but the additional steps are necessary for flexibility in
>> converting
>>>>>>> between
>>>>>>>> formats.
>>>>>>>>
>>>>>>>> Now to use it (you'll probably want a GenbankTools class to wrap
>>>>>>>> these
>>>>>>> steps
>>>>>>>> up for user-friendliness, including various options for opening
>>>>>>>> files,
>>>>>>>> etc.):
>>>>>>>>
>>>>>>>> 1. To read a file - instantiate ThingParser with your GenbankReader
>>>>>>>> as
>>>>>>> the
>>>>>>>> reader, and GenbankBuilder as the receiver. Use the iterator
>> methods
>>>>>>>> on
>>>>>>>> ThingParser to get the objects out.
>>>>>>>>
>>>>>>>> 2. To write a file - instantiate ThingParser with a GenbankEmitter
>>>>>>> wrapping
>>>>>>>> your Genbank object, and a GenbankWriter as the receiver. Use the
>>>>>>> parseAll()
>>>>>>>> method on the ThingParser to dump the whole lot to your chosen
>>>>>>>> output.
>>>>>>>>
>>>>>>>> The clever bit comes when you want to convert between files.
>> Imagine
>>>>>>> you've
>>>>>>>> done all the above for Genbank, and you've also done it for FASTA.
>>>>>>>> How to
>>>>>>>> convert between them? What you need to do is this:
>>>>>>>>
>>>>>>>> 1. Implement all the classes for both Genbank and FASTA.
>>>>>>>>
>>>>>>>> 2. Write a GenbankFASTAConverter class that implements
>>>>>>> ThingConverter<FASTA>
>>>>>>>> and GenbankReceiver, and will internally convert the data received
>>>>>>>> and
>>>>>>> pass
>>>>>>>> it on out to the receiver provided, which will be a FASTAReceiver
>>>>>>> instance.
>>>>>>>> 3. Write a FASTAGenbankConverter class that operates in exactly the
>>>>>>> opposite
>>>>>>>> way, implementing ThingConverter<Genbank> and FASTAReceiver.
>>>>>>>>
>>>>>>>> Then to convert you use ThingParser again:
>>>>>>>>
>>>>>>>> 1. From FASTA file to Genbank object: Instantiate ThingParser with
>> a
>>>>>>>> FASTAReader reader, a GenbankBuilder receiver, and add a
>>>>>>>> FASTAGenbankConverter instance to the converter chain. Use the
>>>>>>>> iterator
>>>>>>> to
>>>>>>>> get your Genbank objects out of your FASTA file.
>>>>>>>>
>>>>>>>> 2. From FASTA file to Genbank file: Same as option 1, but provide a
>>>>>>>> GenbankWriter instead and use parseAll() instead of the iterator
>>>>>>>> methods.
>>>>>>>>
>>>>>>>> 3. From FASTA object to Genbank object: Same as option 1, but
>> provide
>>>>>>>> a
>>>>>>>> FASTAEmitter wrapping your FASTA object as the reader instead.
>>>>>>>>
>>>>>>>> 4. From FASTA object to Genbank file: Same as option 1, but swap
>> both
>>>>>>>> the
>>>>>>>> reader and the receiver as per options 2 and 3.
>>>>>>>>
>>>>>>>> 5/6/7/8. From Genbank * to FASTA * - same as 1,2,3,4 but swap all
>>>>>>> mentions
>>>>>>>> of FASTA and Genbank, and use GenbankFASTAConverter instead.
>>>>>>>>
>>>>>>>> One last and very important feature of this approach is that if you
>>>>>>> discover
>>>>>>>> that nobody has written the appropriate converter for your chosen
>>>>>>>> pair of
>>>>>>>> formats A and C, but converters do exist to map A to some other
>>>>>>>> format B
>>>>>>> and
>>>>>>>> that other format B on to C, then you can just put the two converters
>>>>>>>> A-B
>>>>>>> and
>>>>>>>> B-C into the ThingParser chain and it'll work perfectly.
>>>>>>>>
>>>>>>>> Enjoy!
>>>>>>>>
>>>>>>>> cheers,
>>>>>>>> Richard
>>>>>>>>
>>>>>>>> --
>>>>>>>> Richard Holland, BSc MBCS
>>>>>>>> Finance Director, Eagle Genomics Ltd
>>>>>>>> M: +44 7500 438846 | E: holland at eaglegenomics.com
>>>>>>>> http://www.eaglegenomics.com/
>>>>>>>> _______________________________________________
>>>>>>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>>>>>
>>>>>>
>>>>>>
>>>
>>>
>>> --
>>> Richard Holland, BSc MBCS
>>> Finance Director, Eagle Genomics Ltd
>>> M: +44 7500 438846 | E: holland at eaglegenomics.com
>>> http://www.eaglegenomics.com/
>>>
> 
> 
> 

From holland at eaglegenomics.com  Tue Oct 21 12:13:37 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 21 Oct 2008 17:13:37 +0100
Subject: [Biojava-dev] [Biojava-l] File parsing in BJ3
In-Reply-To: <48FDE80D.1040106@ebi.ac.uk>
References: <a0d826f40810201052o341f3c4cj10dd2765167ce19f@mail.gmail.com>
	<93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com>
	<a0d826f40810210134h56f9e0dv5abd988ecdd7b7b5@mail.gmail.com>
	<48FD97AB.70503@ebi.ac.uk>
	<93b45ca50810210335j5ef4a206y545e5a1869cedc03@mail.gmail.com>
	<a0d826f40810210414u33ea8048p37ebe9d8fa357ea4@mail.gmail.com>
	<93b45ca50810210424g5a9288f0w803e6d5ca4b840d3@mail.gmail.com>
	<a0d826f40810210439q4533b963gebd3b03edf5233c4@mail.gmail.com>
	<48FDE80D.1040106@ebi.ac.uk>
Message-ID: <a0d826f40810210913x36fb7332jdec072c4c1aea0d2@mail.gmail.com>

Yup - why not. Feel free to go in and edit. :)

2008/10/21 Andy Yates <ayates at ebi.ac.uk>

> If "Thing" has gone then what impact does this have on remaining
> classes? Considering methods like canReadNextThing() & readNextThing();
> should this be canReadNext() & readNext()?
>
> Just an idle thought ....
>
> Andy


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From fjossinet at orange.fr  Tue Oct 21 15:55:47 2008
From: fjossinet at orange.fr (Fabrice Jossinet)
Date: Tue, 21 Oct 2008 21:55:47 +0200
Subject: [Biojava-dev] Biojava 3 and intermolecular features
Message-ID: <F4BAD9C5-8883-4DE8-9DBC-B4A144E6309B@orange.fr>

Hi all,

When I used the previous releases of BioJava, I had some problems
modelling inter-molecular features, for example interactions between two
sequences/molecules in a tertiary structure, or interactions between two
molecular partners in an interaction network. The feature should be the
same object, shared by (at least) two molecules, but attached to a
different location on each molecule.

With the current BioJava model, a feature is composed of one location
for a given sequence. Consequently, for the development of my previous
software, I decided to change the BioJava paradigm a little. For
example, to model an intermolecular interaction between the region
23-35 of mySeq1 and the region 34-46 of mySeq2 I have:

Feature myFeature = new InterMolecularInteraction();

mySeq1.addAnnotation(new Annotation(myFeature, new Location("23-35")));
mySeq2.addAnnotation(new Annotation(myFeature, new Location("34-46")));

The Annotation concept links a feature to a location and is attached
to a sequence (it is unrelated to the Annotation concept already
provided by BioJava).
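
To make the idea concrete, here is a self-contained sketch of the model
described above. The class bodies are assumptions written for this
illustration (only the names Feature, InterMolecularInteraction,
Annotation, Location and addAnnotation appear in the snippet above);
Sequence, getAnnotations and partnersOf are hypothetical additions, and
none of this is existing BioJava code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SharedFeatureSketch {
    /** Marker for anything that can be attached to a region of a sequence. */
    public interface Feature {}

    public static class InterMolecularInteraction implements Feature {}

    /** Inclusive range parsed from "start-end", e.g. "23-35". */
    public static class Location {
        public final int start;
        public final int end;
        public Location(String range) {
            String[] parts = range.split("-");
            this.start = Integer.parseInt(parts[0].trim());
            this.end = Integer.parseInt(parts[1].trim());
        }
    }

    /** Links one (possibly shared) feature to a location on one sequence. */
    public static class Annotation {
        public final Feature feature;
        public final Location location;
        public Annotation(Feature feature, Location location) {
            this.feature = feature;
            this.location = location;
        }
    }

    /** Hypothetical sequence holder; residues are omitted for brevity. */
    public static class Sequence {
        public final String name;
        private final List<Annotation> annotations = new ArrayList<>();
        public Sequence(String name) { this.name = name; }
        public void addAnnotation(Annotation a) { annotations.add(a); }
        public List<Annotation> getAnnotations() { return annotations; }
    }

    /** Returns the candidates that carry the very same feature object. */
    public static List<Sequence> partnersOf(Feature f, List<Sequence> candidates) {
        List<Sequence> partners = new ArrayList<>();
        for (Sequence s : candidates) {
            for (Annotation a : s.getAnnotations()) {
                if (a.feature == f) { // identity: one feature shared by several molecules
                    partners.add(s);
                    break;
                }
            }
        }
        return partners;
    }

    public static void main(String[] args) {
        Sequence mySeq1 = new Sequence("mySeq1");
        Sequence mySeq2 = new Sequence("mySeq2");
        Feature myFeature = new InterMolecularInteraction();
        mySeq1.addAnnotation(new Annotation(myFeature, new Location("23-35")));
        mySeq2.addAnnotation(new Annotation(myFeature, new Location("34-46")));
        for (Sequence s : partnersOf(myFeature, Arrays.asList(mySeq1, mySeq2))) {
            System.out.println(s.name);
        }
    }
}
```

Because the feature is compared by identity, both molecules point to
one interaction object while each keeps its own location for it.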

With this kind of model, I was also able to use the same concepts
and strategy to model multiple alignments, which can also be seen as a
kind of "inter-molecular relation".

Are there any plans to model these kinds of features in BioJava 3? If
not, could my proposal be a good start?

Fabrice


--
Dr. Fabrice Jossinet
Laboratoire de Bioinformatique, modelisation et simulation des acides
nucleiques
Universite Louis Pasteur
Institut de biologie moleculaire et cellulaire du CNRS
UPR9002, Architecture et Reactivite de l'ARN
15 rue Rene Descartes
F-67084 Strasbourg Cedex
France

Tel + 33 (0) 3 88 417053
FAX + 33 (0) 3 88 60 22 18

f.jossinet at ibmc.u-strasbg.fr
fjossinet at gmail.com
http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
http://fjossinet.u-strasbg.fr/


From heuermh at acm.org  Thu Oct 23 01:12:07 2008
From: heuermh at acm.org (Michael Heuer)
Date: Thu, 23 Oct 2008 01:12:07 -0400 (EDT)
Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please!
In-Reply-To: <a0d826f40810191718x4f89c210l27f20f2532ed24d3@mail.gmail.com>
Message-ID: <Pine.GSO.4.44.0810230057590.16122-100000@shell3.shore.net>

Sorry, I'm a bit late to the game.  Hope I didn't miss anything
exciting yet!

Would it be better to commit this to trunk, and put the current codebase
out to pasture on a branch?

Is it possible (or desirable) to send SVN commit messages to the dev
mailing list?  Or alternatively, should someone create a project entry for
biojava on CIA.vc?

http://cia.vc


As soon as I can remember my dev.open-bio.org password I'll start
committing stuff, otherwise I'll post patches to bugzilla.

   michael


On Mon, 20 Oct 2008, Richard Holland wrote:

> Hi all,
>
> I've just committed some new code to the biojava3 branch of the biojava-live
> subversion repository. It's the foundations of a brand new alphabet+symbol
> set of classes, and an example of how to use them to represent DNA. You'll
> notice that the new code is very lightweight and allows for a lot more
> flexibility than the old code - for instance, the concept of Alphabet has
> changed radically. It also makes much more extensive use of the Collections
> API.
>
> I haven't got any test cases or usage examples yet but give me a shout if
> you don't understand the code and I'll explain how it works. (Hint:
> SymbolFormat is there to convert Strings into SymbolList objects, and vice
> versa).
>
> So, now we want some volunteers! We're starting from scratch here so there's
> a lot of work to do. The whole of BioJava needs 'translating' into BJ3,
> whether it be copy-and-paste existing classes and modify them to suit the
> new style, or write completely new ones to provide equivalent functionality.
>
>
> I'll post an example of how to do file parsing soon, probably starting with
> FASTA. In the meantime, a good place to start would be for people to design
> object models to represent their favourite data types (e.g. Genbank, or
> microarray data). Utility classes to manipulate those objects would be great
> too.
>
> The object models need to be normalised as much as possible - e.g. if your
> data has a lot of comments, and the order of those comments is important,
> then give your object model a collection of comment objects. The object
> model for each data type should be completely independent and use basic data
> types wherever possible (e.g. store sequences as strings, don't attempt to
> parse them into anything fancy like SymbolLists). The closer the object
> model is to the original data format, the better. There's going to be clever
> tricks when it comes to converting data between different object models
> (e.g. Genbank to INSDSeq), which I will explain later when I put the file
> parsing examples up.
>
> You'll notice how the biojava3 branch uses Maven instead of Ant. This is
> because we want to make it as modular as possible, so if you want to write
> microarray stuff, create a new microarray sub-project (as per the dna
> example that's already there). This way if someone only wants the microarray
> bit of BJ3, they only need install the appropriate JAR file and can ignore
> the rest. (The 'core' module is for stuff that is so generic it could be
> used anywhere, or is used in every single other module.)
>
> If coding isn't your cup of tea, then we would very much welcome testers
> (particularly those who enjoy writing test cases!), documenters
> (particularly code commenters), translators (for internationalisation of the
> code), and of course all those who wish to contribute ideas and suggestions
> no matter how off-the-wall they might be. In particular if you'd like to
> take charge of an area of the development process, e.g. Documentation Chief,
> or Protein Champion, then that would be much appreciated.
>
> I'm very much looking forward to working with everyone on this. Good luck,
> and happy coding!
>
> cheers,
> Richard
>
> PS. Please don't forget to attach the appropriate licence to your code. You
> can copy-and-paste it from the existing classes I just committed this
> evening.
>
> PPS. For those who are worried about backwards compatibility - this was
> discussed on the lists a while back and it was made clear that BJ3 is a
> clean break. However, the existing code will continue to be maintained and
> bugfixed for a couple of years so you don't have to upgrade if you don't
> want to - it just won't have any new features developed for it. This is
> largely because it'll probably take just that long to write all the new BJ3
> code. When we do decide to desupport the existing BJ code, plenty of notice
> will be given (i.e. years as opposed to months).
>
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>


From holland at eaglegenomics.com  Thu Oct 23 02:04:23 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 23 Oct 2008 07:04:23 +0100
Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please!
In-Reply-To: <Pine.GSO.4.44.0810230057590.16122-100000@shell3.shore.net>
References: <a0d826f40810191718x4f89c210l27f20f2532ed24d3@mail.gmail.com>
	<Pine.GSO.4.44.0810230057590.16122-100000@shell3.shore.net>
Message-ID: <a0d826f40810222304w47be582bp5107e1d5718683c9@mail.gmail.com>

>
>
> Would it be better to commit this to trunk, and put the current codebase
> out to pasture on a branch?


Andreas is Mr.SVN. Andreas, what do you think?


>
> Is it possible (or desirable) to send SVN commit messages to the dev
> mailing list?  Or alternatively, should someone create a project entry for
> biojava on CIA.vc?
>
> http://cia.vc


I think commit messages to biojava-dev would be very useful. If nothing
else, it provides a good indicator of activity to casual observers, and also
lets people keep an automated eye (by mail filtering) on commits in the
areas that interest them most.


>
> As soon as I can remember my dev.open-bio.org password I'll start
> committing stuff, otherwise I'll post patches to bugzilla.


If you've forgotten it, let support at OBF know and they'll reset it for
you.

cheers,
Richard




-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From ch.koeberle at googlemail.com  Thu Oct 23 04:58:15 2008
From: ch.koeberle at googlemail.com (=?ISO-8859-1?Q?Christian_K=F6berle?=)
Date: Thu, 23 Oct 2008 10:58:15 +0200
Subject: [Biojava-dev] BioSQL postgre BioEntryRelationship
Message-ID: <ee99d5730810230158r6834481y54751b821cc8873d@mail.gmail.com>

Hi,
I found a bug in the PostgreSQL mapping file for BioEntryRelationship,
in this line:
<many-to-one name="object" class="Feature" column="object_bioentry_id"
not-null="true" cascade="persist,merge,save-update" node="@objectFeatureId"
embed-xml="false"/>
The value of the class attribute has to be "BioEntry".

On BioEntry I miss methods for accessing the BioEntryRelationships in
which it is the subject (subject_bioentry). I think a
BioEntryRelationship is a parent-child relationship, so it would be nice
to have access to both sides.

Furthermore, the Hibernate mapping strategy for BioSQL is quite slow
and produces a lot of database queries, because lazy fetching is
disabled for all lists and sets. In this mode Hibernate executes one
query for each element in a list or set. A faster way is to enable lazy
fetching and use dedicated methods to load each list; each of these
methods executes only one query.
For example:

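// Assumes an enclosing class with an open Hibernate Session in the field
// "session", and imports of java.util.List and org.hibernate.Query.
@SuppressWarnings("unchecked") // Query.list() returns a raw List
public List<BioEntry> getParents(BioEntry bioEntry) {
    // HQL: selecting r.subject matches the join on subject_bioentry_id
    // in the SQL logged below.
    String stmt = "SELECT r.subject FROM BioEntryRelationship r "
            + "WHERE r.object = :object";
    Query query = session.createQuery(stmt);
    query.setParameter("object", bioEntry);
    return query.list();
}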


This is a factor of 2 to 4 faster than the method
BioEntry.getRelationships(). When all dependencies of a BioEntry object
are loaded, a select with lazy fetching can be 500 times faster than a
select with eager fetching (for UniGene cluster Hs.4, for example).
Here is an example for the relationship between UniGene cluster Hs.2
and the gene BC067218 (we use BioSQL to store UniGene):

getParents():
runtime: 14 msec
SQL: Hibernate: select bioentry1_.bioentry_id as bioentry1_89_,
bioentry1_.name as name89_, bioentry1_.identifier as identifier89_,
bioentry1_.accession as accession89_, bioentry1_.description as
descript5_89_, bioentry1_.version as version89_, bioentry1_.division as
division89_, bioentry1_.taxon_id as taxon8_89_, bioentry1_.biodatabase_id as
biodatab9_89_, bioentry1_1_.version as version93_, bioentry1_1_.length as
length93_, bioentry1_1_.alphabet as alphabet93_, bioentry1_1_.seq as seq93_,
case when bioentry1_1_.bioentry_id is not null then 2 when
bioentry1_.bioentry_id is not null then 0 end as clazz_ from
unigene.bioentry_relationship bioentryre0_ inner join unigene.bioentry
bioentry1_ on bioentryre0_.subject_bioentry_id=bioentry1_.bioentry_id left
outer join unigene.biosequence bioentry1_1_ on
bioentry1_.bioentry_id=bioentry1_1_.bioentry_id left outer
join unigene.biosequence bioentry1_2_ on
bioentry1_.bioentry_id=bioentry1_2_.bioentry_id where
bioentryre0_.object_bioentry_id=?


bioEntry.getRelationships():
runtime: 36 msec
SQL:Hibernate: select bioentry0_.bioentry_id as bioentry1_89_,
bioentry0_.name as name89_, bioentry0_.identifier as identifier89_,
bioentry0_.accession as accession89_, bioentry0_.description as
descript5_89_, bioentry0_.version as version89_, bioentry0_.division as
division89_, bioentry0_.taxon_id as taxon8_89_, bioentry0_.biodatabase_id as
biodatab9_89_, bioentry0_1_.version as version93_, bioentry0_1_.length as
length93_, bioentry0_1_.alphabet as alphabet93_, bioentry0_1_.seq as seq93_,
case when bioentry0_1_.bioentry_id is not null then 2 when
bioentry0_.bioentry_id is not null then 0 end as clazz_ from
unigene.bioentry bioentry0_ left outer join unigene.biosequence bioentry0_1_
on bioentry0_.bioentry_id=bioentry0_1_.bioentry_id left outer join
unigene.biosequence bioentry0_2_ on
bioentry0_.bioentry_id=bioentry0_2_.bioentry_id where bioentry0_.name=?
Hibernate: select relationsh0_.object_bioentry_id as object3_1_,
relationsh0_.bioentry_relationship_id as bioentry1_1_,
relationsh0_.bioentry_relationship_id as bioentry1_95_0_,
relationsh0_.term_id as term2_95_0_, relationsh0_.object_bioentry_id as
object3_95_0_, relationsh0_.subject_bioentry_id as subject4_95_0_,
relationsh0_.rank as rank95_0_ from unigene.bioentry_relationship
relationsh0_ where relationsh0_.object_bioentry_id=?
Hibernate: select namespace0_.biodatabase_id as biodatab1_80_0_,
namespace0_.name as name80_0_, namespace0_.authority as authority80_0_,
namespace0_.description as descript4_80_0_ from unigene.biodatabase
namespace0_ where namespace0_.biodatabase_id=?
Hibernate: select bioentry0_.bioentry_id as bioentry1_89_0_, bioentry0_.name
as name89_0_, bioentry0_.identifier as identifier89_0_, bioentry0_.accession
as accession89_0_, bioentry0_.description as descript5_89_0_,
bioentry0_.version as version89_0_, bioentry0_.division as division89_0_,
bioentry0_.taxon_id as taxon8_89_0_, bioentry0_.biodatabase_id as
biodatab9_89_0_, bioentry0_1_.version as version93_0_, bioentry0_1_.length
as length93_0_, bioentry0_1_.alphabet as alphabet93_0_, bioentry0_1_.seq as
seq93_0_, case when bioentry0_1_.bioentry_id is not null then 2 when
bioentry0_.bioentry_id is not null then 0 end as clazz_0_ from
unigene.bioentry bioentry0_ left outer join unigene.biosequence bioentry0_1_
on bioentry0_.bioentry_id=bioentry0_1_.bioentry_id left outer join
unigene.biosequence bioentry0_2_ on
bioentry0_.bioentry_id=bioentry0_2_.bioentry_id where
bioentry0_.bioentry_id=?
Hibernate: select namespace0_.biodatabase_id as biodatab1_80_0_,
namespace0_.name as name80_0_, namespace0_.authority as authority80_0_,
namespace0_.description as descript4_80_0_ from unigene.biodatabase
namespace0_ where namespace0_.biodatabase_id=?
Hibernate: select term0_.term_id as term1_84_0_, term0_.name as name84_0_,
term0_.identifier as identifier84_0_, term0_.definition as definition84_0_,
term0_.is_obsolete as is5_84_0_, term0_.ontology_id as ontology6_84_0_ from
unigene.term term0_ where term0_.term_id=?
Hibernate: select ontology0_.ontology_id as ontology1_83_0_, ontology0_.name
as name83_0_, ontology0_.definition as definition83_0_ from unigene.ontology
ontology0_ where ontology0_.ontology_id=?
Hibernate: select termset0_.ontology_id as ontology6_1_, termset0_.term_id
as term1_1_, termset0_.term_id as term1_84_0_, termset0_.name as name84_0_,
termset0_.identifier as identifier84_0_, termset0_.definition as
definition84_0_, termset0_.is_obsolete as is5_84_0_, termset0_.ontology_id
as ontology6_84_0_ from unigene.term termset0_ where termset0_.ontology_id=?
Hibernate: select tripleset0_.ontology_id as ontology5_1_,
tripleset0_.term_relationship_id as term1_1_,
tripleset0_.term_relationship_id as term1_87_0_, tripleset0_.subject_term_id
as subject2_87_0_, tripleset0_.object_term_id as object3_87_0_,
tripleset0_.predicate_term_id as predicate4_87_0_, tripleset0_.ontology_id
as ontology5_87_0_ from unigene.term_relationship tripleset0_ where
tripleset0_.ontology_id=?
Hibernate: select rankedcros0_.term_id as term1_0_, rankedcros0_.dbxref_id
as dbxref2_0_, rankedcros0_.rank as rank0_ from unigene.term_dbxref
rankedcros0_ where rankedcros0_.term_id=?
Hibernate: select synonymset0_.term_id as term1_0_, synonymset0_.synonym as
synonym0_ from unigene.term_synonym synonymset0_ where
synonymset0_.term_id=?
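
The difference in query counts seen in the two logs above can be
illustrated with a small simulation. This is only a sketch: it does not
use Hibernate or BioSQL, and CountingStore and its methods are
hypothetical names for an in-memory stand-in that counts the queries it
answers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class QueryCountDemo {
    /** Per-element loading: one query for the ids, then one query per element. */
    static List<String> loadOneByOne(CountingStore store, int objectId) {
        List<String> names = new ArrayList<>();
        for (int subjectId : store.findSubjectIds(objectId)) {
            names.add(store.findName(subjectId));
        }
        return names;
    }

    /** Dedicated loader: a single joined query returns every element. */
    static List<String> loadWithJoin(CountingStore store, int objectId) {
        return store.findSubjectNames(objectId);
    }

    public static void main(String[] args) {
        CountingStore store = new CountingStore();
        for (int i = 1; i <= 5; i++) {
            store.relate(100, i, "entry" + i);
        }
        loadOneByOne(store, 100);
        System.out.println("one by one: " + store.queries + " queries"); // 6
        store.queries = 0;
        loadWithJoin(store, 100);
        System.out.println("joined: " + store.queries + " queries"); // 1
    }
}

/** In-memory stand-in for a database; it only counts the queries it answers. */
class CountingStore {
    private final Map<Integer, List<Integer>> subjectIdsByObject = new HashMap<>();
    private final Map<Integer, String> nameById = new HashMap<>();
    int queries = 0;

    void relate(int objectId, int subjectId, String subjectName) {
        subjectIdsByObject.computeIfAbsent(objectId, k -> new ArrayList<>()).add(subjectId);
        nameById.put(subjectId, subjectName);
    }

    /** One query: the ids of all subjects related to the given object. */
    List<Integer> findSubjectIds(int objectId) {
        queries++;
        return subjectIdsByObject.getOrDefault(objectId, Collections.emptyList());
    }

    /** One query: a single row looked up by id. */
    String findName(int id) {
        queries++;
        return nameById.get(id);
    }

    /** One query: all related rows at once, as a join would return them. */
    List<String> findSubjectNames(int objectId) {
        queries++;
        List<String> names = new ArrayList<>();
        for (int id : subjectIdsByObject.getOrDefault(objectId, Collections.emptyList())) {
            names.add(nameById.get(id));
        }
        return names;
    }
}
```

With five related entries, the per-element path issues six queries and
the joined path issues one. The simulation counts round trips only; it
says nothing about memory use.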

-- 
Christian Köberle


From dicknetherlands at gmail.com  Thu Oct 23 05:45:53 2008
From: dicknetherlands at gmail.com (Richard Holland)
Date: Thu, 23 Oct 2008 10:45:53 +0100
Subject: [Biojava-dev] BioSQL postgre BioEntryRelationship
In-Reply-To: <ee99d5730810230158r6834481y54751b821cc8873d@mail.gmail.com>
References: <ee99d5730810230158r6834481y54751b821cc8873d@mail.gmail.com>
Message-ID: <a0d826f40810230245q730eb936g4109c71b9893d47@mail.gmail.com>

Christian,

Thanks for your comments.

I'm not sure which file you're referring to, or what version of BioJava you
have, as the line you quote does not appear in any of the current hbm.xml
files in the trunk of Subversion.

Also, the BioEntryRelationship interface and its implementations do already
have getSubject() and getObject() methods which return the parent and child
BioEntry instances.

The BioEntry interface itself has a getBioEntryRelationships() method which
returns all relationships in which it is the object BioEntry. You could use
HQL to obtain those for which it is the subject, but you are right that it
would be good to have a method that returns the latter. Could you raise a
Bugzilla request for this?

It would be good if you could do some thorough testing of your lazy loading
suggestions on some other use cases before we decide whether or not to adopt
that approach in future developments. Use cases would include:

1. have a very large database with thousands of related records in it (e.g.
load the whole of GenBank). Iterate over all the records in the database and
perform a simple read operation on each that hits the modified methods. See
if you run out of memory.

2. like 1, but perform a series of repeated read/write operations using the
modified methods, with a final commit to attempt to write the results back
to see if they still persist correctly.

The reason is that the modified methods might cause problems with those
people who are processing large volumes of data in their databases. If all
related records are loaded at once, even only on demand, instead of one at a
time, it will cause memory issues. The trade off is therefore memory vs.
speed. We opted for the memory option because it makes life easier for most
novice coders to not have to trace out-of-memory exceptions (although they
will still occur using the existing methods, but it happens less often).

Also, your method reruns the query every time it is called. It should
probably cache the results after the first call, both to prevent objects
being reloaded unnecessarily and to prevent objects that were modified
after a previous call from being overwritten by a subsequent call.
Also, if Hibernate, when it comes to save the object, does not receive
back the same set that it auto-loaded as a property via the default get()
method, it will throw a wobbly and refuse to commit.
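
The caching idea above could be sketched like this, in plain Java with no
Hibernate involved. CachedList is a hypothetical helper, not an existing
BioJava or Hibernate class, and the lambda stands in for the real query.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

class CachedListDemo {
    public static void main(String[] args) {
        int[] loads = {0}; // counts how often the "query" actually runs
        CachedList<String> parents = new CachedList<String>(() -> {
            loads[0]++;
            return Arrays.asList("parentA", "parentB");
        });
        List<String> first = parents.get();
        List<String> second = parents.get();
        // prints "loads: 1, same instance: true"
        System.out.println("loads: " + loads[0] + ", same instance: " + (first == second));
    }
}

/** Runs the loader on first access and hands back the same list afterwards. Not thread-safe. */
class CachedList<T> {
    private final Supplier<List<T>> loader;
    private List<T> cached; // null until the first load

    CachedList(Supplier<List<T>> loader) {
        this.loader = loader;
    }

    List<T> get() {
        if (cached == null) {
            cached = loader.get();
        }
        return cached;
    }

    /** Forgets the cached list so that the next get() reloads it. */
    void invalidate() {
        cached = null;
    }
}
```

Returning the same list instance on every call is the point here:
callers never see two diverging copies of the same data.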

cheers,
Richard




-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From bugzilla-daemon at portal.open-bio.org  Thu Oct 23 09:16:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 23 Oct 2008 09:16:43 -0400
Subject: [Biojava-dev] [Bug 2625] New: Parent Child Relationship of BioEntry
	via BioEntryRelationship
Message-ID: <bug-2625-485@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2625

           Summary: Parent Child Relationship of BioEntry via
                    BioEntryRelationship
           Product: BioJava
           Version: unspecified
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DB / BioSQL
        AssignedTo: biojava-dev at biojava.org
        ReportedBy: ch.koeberle at googlemail.com


A BioEntry object has only the method getRelationships(), which returns all
BioEntryRelationship objects for which the BioEntry object is the result of
BioEntryRelationship.getObject(). This is because BioEntry.hbm.xml contains
only this mapping:
<set name="relationships" lazy="false" cascade="all-delete-orphan"
sort="natural" inverse="true">
<key column="object_bioentry_id"/>
<one-to-many class="BioEntryRelationship" embed-xml="true"/>
</set>

I miss something like this:
BioEntry.getReverseRelationships() (or getChilds())
<set name="reverseRelationships" lazy="false" cascade="all-delete-orphan"
sort="natural" inverse="true">
<key column="subject_bioentry_id"/>
<one-to-many class="BioEntryRelationship" embed-xml="true"/>
</set>
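
On the Java side, the two-sided navigation requested above might look
roughly like the following sketch. Entry and Relationship are
hypothetical stand-ins, not the BioJava interfaces, and no Hibernate
mapping is involved.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

class ReverseRelationshipDemo {
    public static void main(String[] args) {
        Entry a = new Entry("A");
        Entry b = new Entry("B");
        Relationship r = Entry.relate(a, b); // a is the object, b is the subject
        System.out.println(a.getRelationships().contains(r));        // true
        System.out.println(b.getReverseRelationships().contains(r)); // true
        System.out.println(b.getRelationships().isEmpty());          // true
    }
}

/** Hypothetical stand-in for one relationship row: an object entry and a subject entry. */
class Relationship {
    final Entry object;
    final Entry subject;

    Relationship(Entry object, Entry subject) {
        this.object = object;
        this.subject = subject;
    }
}

/** Hypothetical stand-in for an entry that can be navigated from both sides. */
class Entry {
    final String name;
    // Relationships in which this entry is the object.
    private final Set<Relationship> relationships = new LinkedHashSet<>();
    // Relationships in which this entry is the subject.
    private final Set<Relationship> reverseRelationships = new LinkedHashSet<>();

    Entry(String name) {
        this.name = name;
    }

    /** Creates a relationship and registers it on both participating entries. */
    static Relationship relate(Entry object, Entry subject) {
        Relationship r = new Relationship(object, subject);
        object.relationships.add(r);
        subject.reverseRelationships.add(r);
        return r;
    }

    Set<Relationship> getRelationships() {
        return Collections.unmodifiableSet(relationships);
    }

    Set<Relationship> getReverseRelationships() {
        return Collections.unmodifiableSet(reverseRelationships);
    }
}
```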


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From andreas at sdsc.edu  Thu Oct 23 09:57:41 2008
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 23 Oct 2008 06:57:41 -0700
Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please!
In-Reply-To: <a0d826f40810222304w47be582bp5107e1d5718683c9@mail.gmail.com>
References: <a0d826f40810191718x4f89c210l27f20f2532ed24d3@mail.gmail.com>
	<Pine.GSO.4.44.0810230057590.16122-100000@shell3.shore.net>
	<a0d826f40810222304w47be582bp5107e1d5718683c9@mail.gmail.com>
Message-ID: <59a41c430810230657p73b5d10kbf497c20fdfbe893@mail.gmail.com>

>> Would it be better to commit this to trunk, and put the current codebase
>> out to pasture on a branch?

At the moment we have a number of unreleased bug fixes in
biojava-live/trunk. Also, if somebody were to start using BJ at
present, I would still recommend using 1.6. As such I would say: for
the moment, let's leave it the way it is. Once we reach alpha stage we
could release a final biojava 1.7 and afterwards switch the branches
in svn.

About the commit messages sent to this list: can we make this once
per day? I can also set something up as part of CruiseControl...

Andreas

From andreas at sdsc.edu  Thu Oct 23 13:24:27 2008
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 23 Oct 2008 10:24:27 -0700
Subject: [Biojava-dev] svn write access
In-Reply-To: <61C028BE-074B-4E16-A883-B8A2F6AD883E@ibmc.u-strasbg.fr>
References: <6F5AE187-46C5-405C-80FB-495F97C704B5@ibmc.u-strasbg.fr>
	<59a41c430810230738p400c185chbc6a96f871dbb71b@mail.gmail.com>
	<61C028BE-074B-4E16-A883-B8A2F6AD883E@ibmc.u-strasbg.fr>
Message-ID: <59a41c430810231024m7b5daf92t3bf6a1a354723301@mail.gmail.com>

Hi Fabrice,

in order to obtain a developer checkout you have to follow the
procedure as it is described on
http://biojava.org/wiki/CVS_to_SVN_Migration
under the section
Developer checkout

code.open-bio is a read-only copy of the SVN repository for anonymous
checkout. The "real" developer repository is on the dev.open-bio
machine and you can only access it via ssh. This setup is for security
reasons. code.open-bio and dev.open-bio are synchronized approximately
every 20 minutes.

Andreas

On Thu, Oct 23, 2008 at 8:22 AM, Fabrice Jossinet
<f.jossinet at ibmc.u-strasbg.fr> wrote:
> Ok, I did that with the "code.open-bio.org" server and like that:
>
> svn co svn://code.open-bio.org/biojava/biojava-live/branches/biojava3
> --username fjossinet --password blabla
>
> In this case, it seems it doesn't work.
>
> I will try the other way as described in the biojava homepage
>
> Thanx
>
> F
> On 23 Oct 2008, at 16:38, Andreas Prlic wrote:
>
>> you need to check out with that account, so the svn flags are all set
>> correctly.
>>
>> see the biojava  homepage for how to check out with a developer account.
>> A
>>
>> 2008/10/23 Fabrice Jossinet <f.jossinet at ibmc.u-strasbg.fr>:
>>>
>>> Hi Andreas,
>>>
>>> Mauricio has created the account fjossinet for me on the machine
>>> dev.open-bio.org. But I think this is only the first step, since I still
>>> don't have write access on the svn machine.
>>>
>>> Thank you for your help
>>>
>>> Regards
>>>
>>> Fabrice
>>>
>>>
>>> --
>>> Dr. Fabrice Jossinet
>>> Laboratoire de Bioinformatique, modelisation et simulation des acides
>>> nucleiques
>>> Universite Louis Pasteur
>>> Institut de biologie moleculaire et cellulaire du CNRS
>>> UPR9002, Architecture et Reactivite de l'ARN
>>> 15 rue Rene Descartes
>>> F-67084 Strasbourg Cedex
>>> France
>>>
>>> Tel + 33 (0) 3 88 417053
>>> FAX + 33 (0) 3 88 60 22 18
>>>
>>> f.jossinet at ibmc.u-strasbg.fr
>>> fjossinet at gmail.com
>>> http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
>>> http://fjossinet.u-strasbg.fr/
>>>
>>>
>>>
>>>
>>>
>
>


From andreas at sdsc.edu  Thu Oct 23 23:17:02 2008
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 23 Oct 2008 20:17:02 -0700
Subject: [Biojava-dev] biojava 3 docu on wiki
Message-ID: <59a41c430810232017wbc8874fnf829c5b9e7ced4a9@mail.gmail.com>

Hi,

I summarized the current status of the BioJava3 project at

http://biojava.org/wiki/BioJava3_project

feel free to update/add/comment.

Andreas

From andreas at sdsc.edu  Fri Oct 24 00:01:31 2008
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 23 Oct 2008 21:01:31 -0700
Subject: [Biojava-dev] biojava 3 - java version
Message-ID: <59a41c430810232101q7e3c2d36r822634c0bae4ad43@mail.gmail.com>

Hi,

I just tried to get an initial svn checkout of biojava3 on my Mac at
home. It fails to build since there is no Java 1.6 available for my
OS X 10.4.11 ...
Is there a strong reason why we should enforce Java 1.6? Otherwise it
would be good to support 1.5+.

Andreas

From f.jossinet at ibmc.u-strasbg.fr  Fri Oct 24 04:21:15 2008
From: f.jossinet at ibmc.u-strasbg.fr (Fabrice Jossinet)
Date: Fri, 24 Oct 2008 10:21:15 +0200
Subject: [Biojava-dev] svn write access
In-Reply-To: <59a41c430810231024m7b5daf92t3bf6a1a354723301@mail.gmail.com>
References: <6F5AE187-46C5-405C-80FB-495F97C704B5@ibmc.u-strasbg.fr>
	<59a41c430810230738p400c185chbc6a96f871dbb71b@mail.gmail.com>
	<61C028BE-074B-4E16-A883-B8A2F6AD883E@ibmc.u-strasbg.fr>
	<59a41c430810231024m7b5daf92t3bf6a1a354723301@mail.gmail.com>
Message-ID: <4CF8A26B-C50A-40F2-A7A5-B9F958F0F677@ibmc.u-strasbg.fr>

Hi Andreas,

Thank you for these details. I have added the new RNA module to the
biojava3 branch and updated the pom.xml file in the root directory of
this branch.

Fabrice

On 23 Oct 2008, at 19:24, Andreas Prlic wrote:

> Hi Fabrice,
>
> in order to obtain a developer checkout you have to follow the
> procedure as it is described on
> http://biojava.org/wiki/CVS_to_SVN_Migration
> under the section
> Developer checkout
>
> code.open-bio is a read only copy of the SVN repository for anonymous
> checkout. The "real" developer repository is on the dev.open-bio
> machine and you can only access it via ssh. This setup is for security
> reasons. code.open-bio and dev.open-bio are getting synchronized
> approx ev. 20 min.
>
> Andreas
>
> On Thu, Oct 23, 2008 at 8:22 AM, Fabrice Jossinet
> <f.jossinet at ibmc.u-strasbg.fr> wrote:
>> Ok, I did that with the "code.open-bio.org" server and like that:
>>
>> svn co svn://code.open-bio.org/biojava/biojava-live/branches/biojava3
>> --username fjossinet --password blabla
>>
>> In this case, it seems it doesn't work.
>>
>> I will try the other way as described in the biojava homepage
>>
>> Thanx
>>
>> F
>> On 23 Oct 2008, at 16:38, Andreas Prlic wrote:
>>
>>> you need to check out with that account, so the svn flags are all set
>>> correctly.
>>>
>>> see the biojava homepage for how to check out with a developer account.
>>> A
>>>
>>> 2008/10/23 Fabrice Jossinet <f.jossinet at ibmc.u-strasbg.fr>:
>>>>
>>>> Hi Andreas,
>>>>
>>>> Mauricio has created the account fjossinet for me on the machine
>>>> dev.open-bio.org. But I think this is only the first step, since I still
>>>> don't have write access on the svn machine.
>>>>
>>>> Thank you for your help
>>>>
>>>> Regards
>>>>
>>>> Fabrice
>>>>
>>>>
>>>> --
>>>> Dr. Fabrice Jossinet
>>>> Laboratoire de Bioinformatique, modelisation et simulation des acides
>>>> nucleiques
>>>> Universite Louis Pasteur
>>>> Institut de biologie moleculaire et cellulaire du CNRS
>>>> UPR9002, Architecture et Reactivite de l'ARN
>>>> 15 rue Rene Descartes
>>>> F-67084 Strasbourg Cedex
>>>> France
>>>>
>>>> Tel + 33 (0) 3 88 417053
>>>> FAX + 33 (0) 3 88 60 22 18
>>>>
>>>> f.jossinet at ibmc.u-strasbg.fr
>>>> fjossinet at gmail.com
>>>> http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
>>>> http://fjossinet.u-strasbg.fr/
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>>


From dicknetherlands at gmail.com  Fri Oct 24 05:58:18 2008
From: dicknetherlands at gmail.com (Richard Holland)
Date: Fri, 24 Oct 2008 10:58:18 +0100
Subject: [Biojava-dev] biojava 3 - java version
In-Reply-To: <59a41c430810232101q7e3c2d36r822634c0bae4ad43@mail.gmail.com>
References: <59a41c430810232101q7e3c2d36r822634c0bae4ad43@mail.gmail.com>
Message-ID: <a0d826f40810240258t14b75c86r616671cb34af011@mail.gmail.com>

It's only the older PPC Mac models (running Mac OS X 10.4 or older) which
can't get any newer official versions of Java than 1.5 / 5.0.

However, an alternative (free) route for obtaining a Java 1.6 / 6.0 compiler
is provided for these older machines:
http://landonf.bikemonkey.org/static/soylatte/

We wanted to move to Java 6 because it'll likely take about a year to get
BJ3 fully up and running, by which time Java 6 will probably be the oldest
supported version of Java available from Sun (5.0 is already end-of-lifed,
and with 7.0 due out in January it is likely to be desupported very soon.
When 8.0 arrives, probably about 12 months after BJ3 is finished, 5.0 will
definitely become desupported).

cheers,
Richard


2008/10/24 Andreas Prlic <andreas at sdsc.edu>

> Hi,
>
> I just tried to get an initial svn checkout of biojava3 on my mac at
> home. It fails to build since there is no Java 1.6 available for my
> OSX 10.4.11 ...
> Is there a strong reason why we should enforce java 1.6? otherwise
> would be good to support 1.5+
>
> Andreas
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From f.jossinet at ibmc.u-strasbg.fr  Fri Oct 24 06:20:59 2008
From: f.jossinet at ibmc.u-strasbg.fr (Fabrice Jossinet)
Date: Fri, 24 Oct 2008 12:20:59 +0200
Subject: [Biojava-dev] biojava 3 - java version
In-Reply-To: <59a41c430810232101q7e3c2d36r822634c0bae4ad43@mail.gmail.com>
References: <59a41c430810232101q7e3c2d36r822634c0bae4ad43@mail.gmail.com>
Message-ID: <6DF53C0D-E0CC-4504-979B-9122AD39EF62@ibmc.u-strasbg.fr>

Just to refresh the memory....

Major changes included in Java 6:

     * Support for older Win9x versions dropped. The last version for
       Windows 98 and Windows ME is Java Runtime Environment Version 5.0
       Update 16 (1.5.0.16).
     * Scripting Language Support (JSR 223): Generic API for tight
       integration with scripting languages, and built-in Mozilla
       JavaScript Rhino integration.
     * Dramatic performance improvements for the core platform, and
       Swing.
     * Improved Web Service support through JAX-WS (JSR 224).
     * JDBC 4.0 support (JSR 221).
     * Java Compiler API (JSR 199): an API allowing a Java program to
       select and invoke a Java Compiler programmatically.
     * Upgrade of JAXB to version 2.0: Including integration of a StAX
       parser.
     * Support for pluggable annotations (JSR 269).
     * Many GUI improvements, such as integration of SwingWorker in
       the API, table sorting and filtering, and true Swing
       double-buffering (eliminating the gray-area effect).

Perhaps the core module could be built against version 1.5. And if
someone needs, for example, the GUI improvements for a module, that
module could be built against a newer version.

Is that possible or not?

F

On 24 Oct 2008, at 06:01, Andreas Prlic wrote:

> Hi,
>
> I just tried to get an initial svn checkout of biojava3 on my mac at
> home. It fails to build since there is no Java 1.6 available for my
> OSX 10.4.11 ...
> Is there a strong reason why we should enforce java 1.6? otherwise
> would be good to support 1.5+
>
> Andreas
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev


From dicknetherlands at gmail.com  Fri Oct 24 07:14:43 2008
From: dicknetherlands at gmail.com (Richard Holland)
Date: Fri, 24 Oct 2008 12:14:43 +0100
Subject: [Biojava-dev] biojava 3 - java version
In-Reply-To: <6DF53C0D-E0CC-4504-979B-9122AD39EF62@ibmc.u-strasbg.fr>
References: <59a41c430810232101q7e3c2d36r822634c0bae4ad43@mail.gmail.com>
	<6DF53C0D-E0CC-4504-979B-9122AD39EF62@ibmc.u-strasbg.fr>
Message-ID: <a0d826f40810240414w2fffda69nb171634f0808fb73@mail.gmail.com>

If you can find a way to make Maven do that, then I'm happy for you to make
the relevant changes.

cheers,
Richard

2008/10/24 Fabrice Jossinet <f.jossinet at ibmc.u-strasbg.fr>

> Just to refresh the memory....
>
> Major changes included in Java 6:
>
>    * Support for older Win9x versions dropped. The last version for Windows
> 98 and Windows ME is Java Runtime Environment Version 5.0 Update 16 (
> 1.5.0.16).
>    * Scripting Language Support (JSR 223): Generic API for tight
> integration with scripting languages, and built-in Mozilla Javascript Rhino
> integration
>    * Dramatic performance improvements for the core platform, and Swing.
>    * Improved Web Service support through JAX-WS (JSR 224)
>    * JDBC 4.0 support (JSR 221).
>    * Java Compiler API (JSR 199): an API allowing a Java program to select
> and invoke a Java Compiler programmatically.
>    * Upgrade of JAXB to version 2.0: Including integration of a StAX
> parser.
>    * Support for pluggable annotations (JSR 269).
>    * Many GUI improvements, such as integration of SwingWorker in the API,
> table sorting and filtering, and true Swing double-buffering (eliminating
> the gray-area effect).
>
> Perhaps the core module can be linked to the 1.5 version. And if someone
> needs, for example, the improvements of the GUI for his module, this module
> will be linked to another version.
>
> Possible or not ?
>
> F
>
> On 24 Oct 2008, at 06:01, Andreas Prlic wrote:
>
>
>  Hi,
>>
>> I just tried to get an initial svn checkout of biojava3 on my mac at
>> home. It fails to build since there is no Java 1.6 available for my
>> OSX 10.4.11 ...
>> Is there a strong reason why we should enforce java 1.6? otherwise
>> would be good to support 1.5+
>>
>> Andreas
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From ayates at ebi.ac.uk  Fri Oct 24 07:28:56 2008
From: ayates at ebi.ac.uk (Andy Yates)
Date: Fri, 24 Oct 2008 12:28:56 +0100
Subject: [Biojava-dev] biojava 3 - java version
In-Reply-To: <a0d826f40810240414w2fffda69nb171634f0808fb73@mail.gmail.com>
References: <59a41c430810232101q7e3c2d36r822634c0bae4ad43@mail.gmail.com>	<6DF53C0D-E0CC-4504-979B-9122AD39EF62@ibmc.u-strasbg.fr>
	<a0d826f40810240414w2fffda69nb171634f0808fb73@mail.gmail.com>
Message-ID: <4901B178.7090307@ebi.ac.uk>

Yes, I believe it is possible to get a module compiled against a
different version of Java, as seen here:

http://maven.apache.org/plugins/maven-compiler-plugin/howto.html

However, to do this properly the code has to be compiled against the 1.5
JDK itself, especially if we are going to leverage the API as much as we
can. My group has already encountered this with changes to the
java.sql.Connection interface, meaning we have to compile against the
1.5 JDK.
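
The Connection case is one where source and target compiler settings alone
do not help: javac still resolves types against the class library of the
JDK running the build unless a 1.5 boot classpath is supplied. As a rough
illustration only (not something proposed in this thread), a dynamic proxy
is one way to provide a java.sql.Connection without spelling out its
methods at compile time, so the same source compiles whatever methods the
interface declares. The class name ConnectionStub and its behaviour are
invented for this sketch:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.sql.Connection;

final class ConnectionStub {
    /** Builds a Connection whose methods are dispatched by name at run time. */
    static Connection create() {
        InvocationHandler handler = new InvocationHandler() {
            private boolean closed = false;

            public Object invoke(Object proxy, Method method, Object[] args) {
                String name = method.getName();
                if (name.equals("close")) {
                    closed = true;
                    return null; // close() returns void
                }
                if (name.equals("isClosed")) {
                    return Boolean.valueOf(closed);
                }
                // Everything else, including methods added in later JDKs,
                // is simply not supported by this stub.
                throw new UnsupportedOperationException(name);
            }
        };
        return (Connection) Proxy.newProxyInstance(
                ConnectionStub.class.getClassLoader(),
                new Class<?>[] { Connection.class },
                handler);
    }

    public static void main(String[] args) throws Exception {
        Connection c = create();
        System.out.println(c.isClosed()); // prints false
        c.close();
        System.out.println(c.isClosed()); // prints true
    }
}
```

A class that writes "implements Connection" directly has to list every
method of the interface, so it compiles against only one JDK's version of
that interface; that is the situation described above.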

Andy

Richard Holland wrote:
> If you can find a way to make Maven do that, then I'm happy for you to make
> the relevant changes.
> 
> cheers,
> Richard
> 
> 2008/10/24 Fabrice Jossinet <f.jossinet at ibmc.u-strasbg.fr>
> 
>> Just to refresh the memory....
>>
>> Major changes included in Java 6:
>>
>>    * Support for older Win9x versions dropped. The last version for Windows
>> 98 and Windows ME is Java Runtime Environment Version 5.0 Update 16 (
>> 1.5.0.16).
>>    * Scripting Language Support (JSR 223): Generic API for tight
>> integration with scripting languages, and built-in Mozilla Javascript Rhino
>> integration
>>    * Dramatic performance improvements for the core platform, and Swing.
>>    * Improved Web Service support through JAX-WS (JSR 224)
>>    * JDBC 4.0 support (JSR 221).
>>    * Java Compiler API (JSR 199): an API allowing a Java program to select
>> and invoke a Java Compiler programmatically.
>>    * Upgrade of JAXB to version 2.0: Including integration of a StAX
>> parser.
>>    * Support for pluggable annotations (JSR 269).
>>    * Many GUI improvements, such as integration of SwingWorker in the API,
>> table sorting and filtering, and true Swing double-buffering (eliminating
>> the gray-area effect).
>>
>> Perhaps the core module can be linked to the 1.5 version. And if someone
>> needs, for example, the improvements of the GUI for his module, this module
>> will be linked to another version.
>>
>> Possible or not ?
>>
>> F
>>
>> On 24 Oct 2008, at 06:01, Andreas Prlic wrote:
>>
>>
>>  Hi,
>>> I just tried to get an initial svn checkout of biojava3 on my mac at
>>> home. It fails to build since there is no Java 1.6 available for my
>>> OSX 10.4.11 ...
>>> Is there a strong reason why we should enforce java 1.6? otherwise
>>> would be good to support 1.5+
>>>
>>> Andreas
>>> _______________________________________________
>>> biojava-dev mailing list
>>> biojava-dev at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>
>>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
> 
> 
> 

From pzgyuanf at gmail.com  Sat Oct 25 10:00:17 2008
From: pzgyuanf at gmail.com (pprun)
Date: Sat, 25 Oct 2008 22:00:17 +0800
Subject: [Biojava-dev] Test failed for Alphabet.getSymbolMatchType method
Message-ID: <49032671.1080309@gmail.com>

Hi,
The current implementation uses the same condition, equalsIgnoreCase, for
both EXACT_STRING_MATCH and MIXED_CASE_MATCH, so the second branch can
never be reached:


public SymbolMatchType getSymbolMatchType(Symbol a, Symbol b) {
    ...
    if (a.toString().equalsIgnoreCase(b.toString())) {
        return SymbolMatchType.EXACT_STRING_MATCH;
    }
    if (a.toString().equalsIgnoreCase(b.toString())) {
        return SymbolMatchType.MIXED_CASE_MATCH;
    }
    ...

String.equals should be used for EXACT_STRING_MATCH:

public SymbolMatchType getSymbolMatchType(Symbol a, Symbol b) {
    ...
    if (a.toString().equals(b.toString())) {
        return SymbolMatchType.EXACT_STRING_MATCH;
    }
    if (a.toString().equalsIgnoreCase(b.toString())) {
        return SymbolMatchType.MIXED_CASE_MATCH;
    }
    ...

The test case used to identify the above bug is:

/*
 * BioJava development code
 *
 * This code may be freely distributed and modified under the
 * terms of the GNU Lesser General Public Licence. This should
 * be distributed with the code. If you do not have a copy,
 * see:
 *
 * http://www.gnu.org/copyleft/lesser.html
 *
 * Copyright for this code is held jointly by the individual
 * authors. These should be listed in @author doc comments.
 *
 * For more information on the BioJava project and its aims,
 * or to join the biojava-l mailing list, visit the home page
 * at:
 *
 * http://www.biojava.org/
 *
 */
package org.biojava.core.symbol;

import org.junit.After;
import org.junit.AfterClass;
import org.junit.Before;
import org.junit.BeforeClass;
import org.junit.Test;
import static org.junit.Assert.*;

/**
 *
 * @author pprun
 */
public class AlphabetTest {

    public AlphabetTest() {
    }

    @BeforeClass
    public static void setUpClass() throws Exception {
    }

    @AfterClass
    public static void tearDownClass() throws Exception {
    }

    @Before
    public void setUp() {
    }

    @After
    public void tearDown() {
    }

    /**
     * Test of getSymbolMatchType method, of class Alphabet.
     */
    @Test
    public void testGetSymbolMatchType() {
        System.out.println("getSymbolMatchType");

        Alphabet testAlphabet = new Alphabet("testGetSymbolMatchType");

        // 1. exact match
        Symbol a = Symbol.get("ATGC");
        Symbol b = Symbol.get("ATGC");
        SymbolMatchType expResult = SymbolMatchType.EXACT_MATCH;
        SymbolMatchType result = testAlphabet.getSymbolMatchType(a, b);
        assertEquals(expResult, result);

        // 2. mixed case match
        a = Symbol.get("ATGC");
        b = Symbol.get("aTGC");
        expResult = SymbolMatchType.MIXED_CASE_MATCH;
        result = testAlphabet.getSymbolMatchType(a, b);
        assertEquals(expResult, result);
    }
}
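
The ordering problem described above can also be reproduced without any
BioJava classes. A minimal sketch on plain strings, where MatchOrderDemo,
MatchType and classify are hypothetical names for illustration and not
part of the BJ3 API:

```java
final class MatchOrderDemo {
    enum MatchType { EXACT_STRING_MATCH, MIXED_CASE_MATCH, NO_MATCH }

    /** Checks the strictest comparison first, then the looser one. */
    static MatchType classify(String a, String b) {
        if (a.equals(b)) {
            return MatchType.EXACT_STRING_MATCH;
        }
        if (a.equalsIgnoreCase(b)) {
            return MatchType.MIXED_CASE_MATCH;
        }
        return MatchType.NO_MATCH;
    }

    public static void main(String[] args) {
        System.out.println(classify("ATGC", "ATGC")); // prints EXACT_STRING_MATCH
        System.out.println(classify("ATGC", "aTGC")); // prints MIXED_CASE_MATCH
        System.out.println(classify("ATGC", "TTGC")); // prints NO_MATCH
    }
}
```

If the first test used equalsIgnoreCase as well, the "aTGC" case would be
reported as an exact string match and the mixed-case branch would be dead
code.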


BTW, how can I get the dev/test role?
Then I could contribute to development or testing for BJ3 (I'm still a
beginner in the bio field).

Thanks,
Pprun


From andreas at sdsc.edu  Tue Oct 28 00:40:35 2008
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 27 Oct 2008 21:40:35 -0700
Subject: [Biojava-dev] BioSQL postgre BioEntryRelationship
In-Reply-To: <a0d826f40810230624i594909a9o74015ad3dd65501a@mail.gmail.com>
References: <ee99d5730810230158r6834481y54751b821cc8873d@mail.gmail.com>
	<a0d826f40810230245q730eb936g4109c71b9893d47@mail.gmail.com>
	<ee99d5730810230621i53a05c6bo16e047d37b2fd578@mail.gmail.com>
	<a0d826f40810230624i594909a9o74015ad3dd65501a@mail.gmail.com>
Message-ID: <59a41c430810272140h290a8a91q26af24946c2c63a5@mail.gmail.com>

Hi Richard,

I updated the 1.6 release with your fixes:
http://www.biojava.org/download/bj16/all/biojava-1.6.1-all.jar
Can you please verify it and, if it is correct, update the download page on the wiki?

Andreas

On Thu, Oct 23, 2008 at 6:24 AM, Richard Holland
<dicknetherlands at gmail.com> wrote:
> Andreas - is it possible to rebuild biojava-1.6-all.jar with the following
> fix made to it?
>
> cheers,
> Richard
>
> ---------- Forwarded message ----------
> From: Christian Köberle <ch.koeberle at googlemail.com>
> Date: 2008/10/23
> Subject: Re: [Biojava-dev] BioSQL postgre BioEntryRelationship
> To: Richard Holland <dicknetherlands at gmail.com>
>
>
> Hi Richard,
>
> I found the error in the current download of biojava 1.6
> (http://www.biojava.org/download/bj16/all/biojava-1.6-all.jar) in the file
> src/org/biojavax/bio/db/biosql/pg/BioEntryRelationship.hbm.xml
>
> <hibernate-mapping>
>     <class name="org.biojavax.bio.SimpleBioEntryRelationship"
> table="bioentry_relationship" node="sequenceRelation"
> entity-name="BioEntryRelationship">
>     <id name="id" type="integer" unsaved-value="null"
> column="bioentry_relationship_id" node="@id">
>     <generator class="sequence">
> <param name="sequence">bioentry_relationship_pk_seq</param>
> </generator>
> </id>
> <many-to-one name="term" class="Term" column="term_id" not-null="true"
> cascade="persist,merge,save-update" node="@termId" embed-xml="false"/>
> <many-to-one name="object" class="Feature" column="object_bioentry_id"
> not-null="true" cascade="persist,merge,save-update" node="@objectFeatureId"
> embed-xml="false"/>
> <many-to-one name="subject" class="BioEntry" column="subject_bioentry_id"
> not-null="true" cascade="persist,merge,save-update"
> node="@subjectBioEntryId" embed-xml="false"/>
> <property name="rank" node="@rank"/>
> </class>
> </hibernate-mapping>
>
> cheers,
> Christian
>
>
> 2008/10/23 Richard Holland <dicknetherlands at gmail.com>
>>
>> Christian,
>>
>> Thanks for your comments.
>>
>> I'm not sure which file you're referring to, or what version of BioJava
>> you have, as the line you quote does not appear in any of the current
>> hbm.xml files in the trunk of Subversion.
>>
>> Also, the BioEntryRelationship interface and its implementations do
>> already have getSubject() and getObject() methods which return the parent
>> and child BioEntry instances.
>>
>> The BioEntry interface itself has a getBioEntryRelationships() method
>> which returns all relationships in which it is the object BioEntry. You
>> could use HQL to obtain those for which it is the subject, but you are right
>> that it would be good to have a method that returns the latter. Could you
>> raise a BugZilla request for this?
>>
>> It would be good if you could do some thorough testing of your lazy
>> loading suggestions on some other use cases before we decide whether or not
>> to adopt that approach in future developments. Use cases would include:
>>
>> 1. have a very large database with thousands of related records in it
>> (e.g. load the whole of GenBank). Iterate over all the records in the
>> database and perform a simple read operation on each that hits the modified
>> methods. See if you run out of memory.
>>
>> 2. like 1, but perform a series of repeated read/write operations using
>> the modified methods, with a final commit to attempt to write the results
>> back to see if they still persist correctly.
>>
>> The reason is that the modified methods might cause problems with those
>> people who are processing large volumes of data in their databases. If all
>> related records are loaded at once, even only on demand, instead of one at a
>> time, it will cause memory issues. The trade off is therefore memory vs.
>> speed. We opted for the memory option because it makes life easier for most
>> novice coders to not have to trace out-of-memory exceptions (although they
>> will still occur using the existing methods, but it happens less often).
>>
>> Also, your method reruns the query every time it is called. It probably
>> should cache the results after the first call, to prevent objects being
>> reloaded unnecessarily, and to prevent objects that were modified after a
>> previous call from being overwritten by a subsequent call.
>> Also, if Hibernate does not receive the same set back that it auto-loaded as
>> a property via the default get() method when it comes to save the object, it
>> will throw a wobbly and refuse to commit.
>>
>> cheers,
>> Richard
>>
>>
>>
>> 2008/10/23 Christian Köberle <ch.koeberle at googlemail.com>
>>>
>>> Hi,
>>> I found a bug in the PostgreSQL mapping file for BioEntryRelationship,
>>> in the line:
>>> <many-to-one name="object" class="Feature" column="object_bioentry_id"
>>> not-null="true" cascade="persist,merge,save-update"
>>> node="@objectFeatureId"
>>> embed-xml="false"/>
>>> The value for the attribute class has to be "BioEntry"
>>>
>>> For BioEntry I am missing methods to access the BioEntryRelationships in
>>> which it is the subject_bioentry. I think a BioEntryRelationship is a
>>> parent-child relationship, so it would be nice to have access to both
>>> sides.
>>>
>>> Furthermore, the Hibernate mapping strategy for BioSQL is quite slow
>>> and produces a lot of queries to the database, because lazy fetch mode
>>> is disabled for all lists and sets. In this mode Hibernate executes one
>>> query for each element in a list or set. The faster way is to enable
>>> lazy fetch mode and use methods to load the list. Each of these methods
>>> executes only one query.
>>> For example:
>>> public List<BioEntry> getParents(BioEntry bioEntry){
>>>
>>>     String stmt = "SELECT r.object FROM BioEntryRelationship r "
>>>             + "WHERE r.object = :subject";
>>>     Query query = session.createQuery(stmt);
>>>     query.setParameter("subject", bioEntry);
>>>     return query.list();
>>>
>>> }
>>>
>>>
>>> This is a factor of 2 to 4 faster than the method
>>> BioEntry.getRelationships().
>>> When loading all dependencies of a BioEntry object, a select with lazy
>>> fetching can be 500 times faster than a select with eager fetching (in
>>> the case of unigene cluster Hs.4, for example).
>>> Here is an example for the relationship between unigene cluster Hs.2 and
>>> the gene BC067218 (we use BioSQL to store Unigene):
>>>
>>> getParents():
>>> runtime: 14 msec
>>> SQL: Hibernate: select bioentry1_.bioentry_id as bioentry1_89_,
>>> bioentry1_.name as name89_, bioentry1_.identifier as identifier89_,
>>> bioentry1_.accession as accession89_, bioentry1_.description as
>>> descript5_89_, bioentry1_.version as version89_, bioentry1_.division as
>>> division89_, bioentry1_.taxon_id as taxon8_89_, bioentry1_.biodatabase_id
>>> as
>>> biodatab9_89_, bioentry1_1_.version as version93_, bioentry1_1_.length as
>>> length93_, bioentry1_1_.alphabet as alphabet93_, bioentry1_1_.seq as
>>> seq93_,
>>> case when bioentry1_1_.bioentry_id is not null then 2 when
>>> bioentry1_.bioentry_id is not null then 0 end as clazz_ from
>>> unigene.bioentry_relationship bioentryre0_ inner join unigene.bioentry
>>> bioentry1_ on bioentryre0_.subject_bioentry_id=bioentry1_.bioentry_id
>>> left
>>> outer join unigene.biosequence bioentry1_1_ on
>>> bioentry1_.bioentry_id=bioentry1_1_.bioentry_id left outer
>>> join unigene.biosequence bioentry1_2_ on
>>> bioentry1_.bioentry_id=bioentry1_2_.bioentry_id where
>>> bioentryre0_.object_bioentry_id=?
>>>
>>>
>>> bioEntry.getRelationships():
>>> runtime: 36 msec
>>> SQL:Hibernate: select bioentry0_.bioentry_id as bioentry1_89_,
>>> bioentry0_.name as name89_, bioentry0_.identifier as identifier89_,
>>> bioentry0_.accession as accession89_, bioentry0_.description as
>>> descript5_89_, bioentry0_.version as version89_, bioentry0_.division as
>>> division89_, bioentry0_.taxon_id as taxon8_89_, bioentry0_.biodatabase_id
>>> as
>>> biodatab9_89_, bioentry0_1_.version as version93_, bioentry0_1_.length as
>>> length93_, bioentry0_1_.alphabet as alphabet93_, bioentry0_1_.seq as
>>> seq93_,
>>> case when bioentry0_1_.bioentry_id is not null then 2 when
>>> bioentry0_.bioentry_id is not null then 0 end as clazz_ from
>>> unigene.bioentry bioentry0_ left outer join unigene.biosequence
>>> bioentry0_1_
>>> on bioentry0_.bioentry_id=bioentry0_1_.bioentry_id left outer join
>>> unigene.biosequence bioentry0_2_ on
>>> bioentry0_.bioentry_id=bioentry0_2_.bioentry_id where bioentry0_.name=?
>>> Hibernate: select relationsh0_.object_bioentry_id as object3_1_,
>>> relationsh0_.bioentry_relationship_id as bioentry1_1_,
>>> relationsh0_.bioentry_relationship_id as bioentry1_95_0_,
>>> relationsh0_.term_id as term2_95_0_, relationsh0_.object_bioentry_id as
>>> object3_95_0_, relationsh0_.subject_bioentry_id as subject4_95_0_,
>>> relationsh0_.rank as rank95_0_ from unigene.bioentry_relationship
>>> relationsh0_ where relationsh0_.object_bioentry_id=?
>>> Hibernate: select namespace0_.biodatabase_id as biodatab1_80_0_,
>>> namespace0_.name as name80_0_, namespace0_.authority as authority80_0_,
>>> namespace0_.description as descript4_80_0_ from unigene.biodatabase
>>> namespace0_ where namespace0_.biodatabase_id=?
>>> Hibernate: select bioentry0_.bioentry_id as bioentry1_89_0_,
>>> bioentry0_.name
>>> as name89_0_, bioentry0_.identifier as identifier89_0_,
>>> bioentry0_.accession
>>> as accession89_0_, bioentry0_.description as descript5_89_0_,
>>> bioentry0_.version as version89_0_, bioentry0_.division as division89_0_,
>>> bioentry0_.taxon_id as taxon8_89_0_, bioentry0_.biodatabase_id as
>>> biodatab9_89_0_, bioentry0_1_.version as version93_0_,
>>> bioentry0_1_.length
>>> as length93_0_, bioentry0_1_.alphabet as alphabet93_0_, bioentry0_1_.seq
>>> as
>>> seq93_0_, case when bioentry0_1_.bioentry_id is not null then 2 when
>>> bioentry0_.bioentry_id is not null then 0 end as clazz_0_ from
>>> unigene.bioentry bioentry0_ left outer join unigene.biosequence
>>> bioentry0_1_
>>> on bioentry0_.bioentry_id=bioentry0_1_.bioentry_id left outer join
>>> unigene.biosequence bioentry0_2_ on
>>> bioentry0_.bioentry_id=bioentry0_2_.bioentry_id where
>>> bioentry0_.bioentry_id=?
>>> Hibernate: select namespace0_.biodatabase_id as biodatab1_80_0_,
>>> namespace0_.name as name80_0_, namespace0_.authority as authority80_0_,
>>> namespace0_.description as descript4_80_0_ from unigene.biodatabase
>>> namespace0_ where namespace0_.biodatabase_id=?
>>> Hibernate: select term0_.term_id as term1_84_0_, term0_.name as
>>> name84_0_,
>>> term0_.identifier as identifier84_0_, term0_.definition as
>>> definition84_0_,
>>> term0_.is_obsolete as is5_84_0_, term0_.ontology_id as ontology6_84_0_
>>> from
>>> unigene.term term0_ where term0_.term_id=?
>>> Hibernate: select ontology0_.ontology_id as ontology1_83_0_,
>>> ontology0_.name
>>> as name83_0_, ontology0_.definition as definition83_0_ from
>>> unigene.ontology
>>> ontology0_ where ontology0_.ontology_id=?
>>> Hibernate: select termset0_.ontology_id as ontology6_1_,
>>> termset0_.term_id
>>> as term1_1_, termset0_.term_id as term1_84_0_, termset0_.name as
>>> name84_0_,
>>> termset0_.identifier as identifier84_0_, termset0_.definition as
>>> definition84_0_, termset0_.is_obsolete as is5_84_0_,
>>> termset0_.ontology_id
>>> as ontology6_84_0_ from unigene.term termset0_ where
>>> termset0_.ontology_id=?
>>> Hibernate: select tripleset0_.ontology_id as ontology5_1_,
>>> tripleset0_.term_relationship_id as term1_1_,
>>> tripleset0_.term_relationship_id as term1_87_0_,
>>> tripleset0_.subject_term_id
>>> as subject2_87_0_, tripleset0_.object_term_id as object3_87_0_,
>>> tripleset0_.predicate_term_id as predicate4_87_0_,
>>> tripleset0_.ontology_id
>>> as ontology5_87_0_ from unigene.term_relationship tripleset0_ where
>>> tripleset0_.ontology_id=?
>>> Hibernate: select rankedcros0_.term_id as term1_0_,
>>> rankedcros0_.dbxref_id
>>> as dbxref2_0_, rankedcros0_.rank as rank0_ from unigene.term_dbxref
>>> rankedcros0_ where rankedcros0_.term_id=?
>>> Hibernate: select synonymset0_.term_id as term1_0_, synonymset0_.synonym
>>> as
>>> synonym0_ from unigene.term_synonym synonymset0_ where
>>> synonymset0_.term_id=?
>>>
>>> --
>>> Christian Köberle
>>>
>>> _______________________________________________
>>> biojava-dev mailing list
>>> biojava-dev at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
>>
>>
>> --
>> Richard Holland, BSc MBCS
>> Finance Director, Eagle Genomics Ltd
>> M: +44 7500 438846 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>
>
>
> --
> Christian Köberle
> Schönholzerstr. 5
> 10115 Berlin
> Mobil: 0179 79 35 345
>
>
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>


From bugzilla-daemon at portal.open-bio.org  Wed Oct  1 20:48:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 1 Oct 2008 16:48:15 -0400
Subject: [Biojava-dev] [Bug 2602] New: ParseException thrown when parsing
	Genbank file.
Message-ID: <bug-2602-485@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2602

           Summary: ParseException thrown when parsing Genbank file.
           Product: BioJava
           Version: live (CVS source)
          Platform: All
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P1
         Component: seq.io
        AssignedTo: biojava-dev at biojava.org
        ReportedBy: tritt at wisc.edu


When attempting to read in a Genbank file using RichSequence.IOTools, I
received a ParseException. When using SeqIOTools, I do not have this problem.
The code that exposed the bug is given below. 

public static void main(String[] args) {

        String dnaDir = args[args.length-1];

        BufferedReader[] br = new BufferedReader[8];

        FileReader orthologs = null;
        for (int i = 0; i < br.length; i++)
                br[i] = null;

        try {
                orthologs = new FileReader(args[0]);
                for (int i = 0; i < br.length; i++)
                        br[i] = new BufferedReader(new FileReader(args[i+1]));
        } catch (FileNotFoundException ex){
                ex.printStackTrace();
                System.exit(-1);
        }

        RichSequenceIterator[] seqIt = new RichSequenceIterator[8];

        HashMap<String,RichFeature>[] features = new HashMap[8];
        for (int i = 0; i < features.length; i++){
                features[i] = new HashMap<String,RichFeature>();
        }

        for (int i = 0; i < br.length; i++)
                seqIt[i] = RichSequence.IOTools.readGenbankDNA(br[i], null);

        for (int i = 0; i < seqIt.length; i++){
                RichSequence seq = null;
                try {
                        seq = seqIt[i].nextRichSequence();
                        seqIt[i] = null;
                        br[i] = null;
                } catch (NoSuchElementException ex) {
                        ex.printStackTrace();
                        System.exit(-1);
                } catch (BioException ex) {
                        ex.printStackTrace();
                        System.exit(-1);
                }
                 .
                 .
                 .

The following error message was received.

org.biojava.bio.BioException: Could not read sequence
        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
        at OrthologSeqExtractor.main(OrthologSeqExtractor.java:76)
Caused by: org.biojava.bio.seq.io.ParseException: 

A Exception Has Occurred During Parsing. 
Please submit the details that follow to biojava-l at biojava.org or post a bug
report to http://bugzilla.open-bio.org/ 

Format_object=org.biojavax.bio.seq.io.GenbankFormat
Accession=EDL933
Id=null
Comments=Bad dbxref
Parse_block=FEATURES   Location/Qualifierssource   1..5528423/db_xref  
"GenBank:AE005174"/db_xref   "RefSeq_NA:NC_002655"/db_xref  
"ATCC:700927"/db_xref   "taxon:155864"/db_xref   "ERIC:SOP"/mol_type   "genomic
DNA"/note   "enterohemorrhagic"/organism   "Escherichia coli"/serotype  
"O157:H7:K-"/strain   "EDL933"/transl_table   11/db_xref  
"ASAP:ABH-0023909"/db_xref   "ERIC:ABH-0023909"

                  .
                  .
                  .

Stack trace follows ....


        at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:462)
        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
        ... 1 more


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Oct  2 07:54:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 2 Oct 2008 03:54:42 -0400
Subject: [Biojava-dev] [Bug 2603] New: StringIndexOutOfBoundsException while
	parsing blastresult
Message-ID: <bug-2603-485@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2603

           Summary: StringIndexOutOfBoundsException while parsing
                    blastresult
           Product: BioJava
           Version: unspecified
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: bio
        AssignedTo: biojava-dev at biojava.org
        ReportedBy: dtoomey at rcsi.ie


While parsing a blast result I get a StringIndexOutOfBoundsException. I have
narrowed down the cause of the error to this section:

Query= sp|P62368|ISPF_PLAF7 2-C-methyl-D-erythritol
2,4-cyclodiphosphate synthase OS=Plasmodium falciparum (isolate 3D7)
GN=ISPF

What I have found is that if the 3rd line is less than 11 characters long the
error is thrown. If I add text or even extra spaces to this line then the error
does not occur. Also I have noticed that it does not happen to the first entry
in a file containing multiple blast searches.
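
To illustrate the symptom described above: in Java, String.substring(int)
throws a StringIndexOutOfBoundsException when the start index is greater than
the length of the string, which would explain why a short third line fails
while a padded one does not. The following is only a minimal sketch. The fixed
offset of 11 is an assumption inferred from the observation above, not
something confirmed from the BioJava source, and the class and method names
are hypothetical.

```java
// Hypothetical demo class; only String.substring is real API here.
final class ShortLineDemo {

    /** Returns true if line.substring(offset) throws for this input. */
    static boolean substringFails(String line, int offset) {
        try {
            line.substring(offset);
            return false;
        } catch (StringIndexOutOfBoundsException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        int offset = 11; // assumed from the "less than 11 characters" observation

        // "GN=ISPF" is 7 characters long, shorter than the assumed offset.
        System.out.println(substringFails("GN=ISPF", offset));

        // The same line padded with spaces to 11 characters.
        System.out.println(substringFails("GN=ISPF    ", offset));
    }
}
```

A start index equal to the string length is still legal and yields an empty
string, which matches the report that adding spaces makes the error go away.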

I have tried this on both Windows and Linux and get the same error. I have been
using blast version 2.2.18 but have also tried 2.2.17




From bugzilla-daemon at portal.open-bio.org  Fri Oct  3 10:30:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 3 Oct 2008 06:30:16 -0400
Subject: [Biojava-dev] [Bug 2603] StringIndexOutOfBoundsException while
	parsing blastresult
In-Reply-To: <bug-2603-485@http.bugzilla.open-bio.org/>
Message-ID: <200810031030.m93AUGcD007688@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2603


------- Comment #1 from dtoomey at rcsi.ie  2008-10-03 06:30 EST -------
I have narrowed down the offending line to

oParsedSeq = poLine.substring( iOffset).concat( new String( oPadding ) );

from 'BlastLikeAlignmentSAXParser.java'

I have put in a hack which at least allows me to run the code

                try {
                        oParsedSeq = poLine.substring( iOffset).concat( new String( oPadding ) );
                } catch (StringIndexOutOfBoundsException ex) {
                        System.out.println("Caught sub string error for poLine: " + poLine + " Offset is " + String.valueOf(iOffset));
                        oParsedSeq = poLine.concat( new String( oPadding ) );
                }
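
An explicit bounds check could replace the try/catch in the hack above. This
is a sketch only: the parameter names mirror poLine, iOffset and oPadding from
the quoted line, but the class and method are hypothetical and this is not the
code from BlastLikeAlignmentSAXParser or from any patch attached to this bug.
Treating a too-short line as an empty sequence is also an assumption; the hack
above keeps the whole line instead.

```java
import java.util.Arrays;

// Hypothetical helper; not taken from the BioJava source.
final class SafeSubstring {

    /**
     * Returns the part of line starting at offset, followed by the padding.
     * If the line is shorter than offset, the sequence part is treated as
     * empty (an assumption made for this sketch).
     */
    static String parseSeq(String line, int offset, char[] padding) {
        String tail = offset <= line.length() ? line.substring(offset) : "";
        return tail.concat(new String(padding));
    }

    public static void main(String[] args) {
        char[] padding = new char[3];
        Arrays.fill(padding, ' ');

        // A line long enough to contain sequence data at the offset.
        System.out.println("[" + parseSeq("Query: 1   MSTIRPVFYV", 11, padding) + "]");

        // A line shorter than the offset: only the padding is returned.
        System.out.println("[" + parseSeq("GN=ISPF", 11, padding) + "]");
    }
}
```

Checking the length first keeps the normal path free of exceptions and makes
the short-line case explicit, so it is easier to decide what the parser should
do with such lines.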





From bugzilla-daemon at portal.open-bio.org  Wed Oct 15 08:12:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 15 Oct 2008 04:12:18 -0400
Subject: [Biojava-dev] [Bug 2617] New: Cookbook blast parser example fails
	on a tblastn example
Message-ID: <bug-2617-485@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2617

           Summary: Cookbook blast parser example fails on a tblastn example
           Product: BioJava
           Version: live (CVS source)
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: search
        AssignedTo: biojava-dev at biojava.org
        ReportedBy: holland at ebi.ac.uk


(raised on behalf of user Charles Imbusch)

Hello,

for a project I want to parse a tblastn result with BioJava. I used the code
on http://biojava.org/wiki/BioJava:CookBook:Blast:Parser as it is and I get an
error message as follows:

Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -3
  at java.lang.String.substring(String.java:1938)
  at java.lang.String.substring(String.java:1905)
  at org.biojava.bio.program.sax.BlastLikeAlignmentSAXParser.parseLine(BlastLikeAlignmentSAXParser.java:289)
  at org.biojava.bio.program.sax.BlastLikeAlignmentSAXParser.parse(BlastLikeAlignmentSAXParser.java:115)
  at org.biojava.bio.program.sax.HitSectionSAXParser.outputHSPInfo(HitSectionSAXParser.java:514)
  at org.biojava.bio.program.sax.HitSectionSAXParser.firstHSPEvent(HitSectionSAXParser.java:287)
  at org.biojava.bio.program.sax.HitSectionSAXParser.interpret(HitSectionSAXParser.java:251)
  at org.biojava.bio.program.sax.HitSectionSAXParser.parse(HitSectionSAXParser.java:118)
  at org.biojava.bio.program.sax.BlastSAXParser.hitsSectionReached(BlastSAXParser.java:635)
  at org.biojava.bio.program.sax.BlastSAXParser.interpret(BlastSAXParser.java:337)
  at org.biojava.bio.program.sax.BlastSAXParser.parse(BlastSAXParser.java:164)
  at org.biojava.bio.program.sax.BlastLikeSAXParser.onNewDataSet(BlastLikeSAXParser.java:313)
  at org.biojava.bio.program.sax.BlastLikeSAXParser.interpret(BlastLikeSAXParser.java:276)
  at org.biojava.bio.program.sax.BlastLikeSAXParser.parse(BlastLikeSAXParser.java:162)
  at BlastEcho.echo(BlastEcho.java:29)
  at BlastEcho.main(BlastEcho.java:75)

I uploaded the Blast output file I want to parse here:
http://charles.imbusch.net/tmp/blastresult.txt

Any answer is appreciated.

Cheers,
 Charles




From f.jossinet at ibmc.u-strasbg.fr  Wed Oct 15 08:36:09 2008
From: f.jossinet at ibmc.u-strasbg.fr (Fabrice Jossinet)
Date: Wed, 15 Oct 2008 10:36:09 +0200
Subject: [Biojava-dev] Proposition of participation to the BioJava project
Message-ID: <A2FA53C9-195D-413A-8BEF-C201049272F9@ibmc.u-strasbg.fr>

Dear BioJava team,

my name is Fabrice Jossinet. I work as an assistant professor at a
French university (Louis Pasteur University in Strasbourg).
I have been developing bioinformatics tools in Java since 2002.
Before that, I did a PhD as a molecular biologist at the bench ;)
I'm interested in the study of RNA. At present I'm focused on its
structural features, but I'm also interested in non-coding RNA genes
in genomes.
You can have a look at my current project at this address:
http://paradise-ibmc.u-strasbg.fr/. At present this project has
60,000 lines of code and uses more than 10 external libraries.

I have been following BioJava for several years now. I would like to
extend it with RNA concepts. If you think that I can participate, don't
hesitate to get back to me ;)

All the best

Fabrice

--
Dr. Fabrice Jossinet
Laboratoire de Bioinformatique, modelisation et simulation des acides
nucleiques
Universite Louis Pasteur
Institut de biologie moleculaire et cellulaire du CNRS
UPR9002, Architecture et Reactivite de l'ARN
15 rue Rene Descartes
F-67084 Strasbourg Cedex
France

Tel + 33 (0) 3 88 417053
FAX + 33 (0) 3 88 60 22 18

f.jossinet at ibmc.u-strasbg.fr
fjossinet at gmail.com
http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
http://fjossinet.u-strasbg.fr/


From simpleyrx at 163.com  Wed Oct 15 09:11:50 2008
From: simpleyrx at 163.com (simpleyrx)
Date: Wed, 15 Oct 2008 17:11:50 +0800 (CST)
Subject: [Biojava-dev] can biojava calcaulte profile-profile alignment ?
In-Reply-To: <mailman.2377.1224061218.3070.biojava-dev@lists.open-bio.org>
References: <mailman.2377.1224061218.3070.biojava-dev@lists.open-bio.org>
Message-ID: <7852810.354291224061911001.JavaMail.coremail@app143.163.com>

 
Dear experts,
 
         I wonder whether BioJava can calculate profile-profile alignments?
 
 
--


student  


From bugzilla-daemon at portal.open-bio.org  Wed Oct 15 16:05:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 15 Oct 2008 12:05:23 -0400
Subject: [Biojava-dev] [Bug 2617] Cookbook blast parser example fails on a
	tblastn example
In-Reply-To: <bug-2617-485@http.bugzilla.open-bio.org/>
Message-ID: <200810151605.m9FG5Nhb004488@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2617


holland at ebi.ac.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE


------- Comment #1 from holland at ebi.ac.uk  2008-10-15 12:05 EST -------


*** This bug has been marked as a duplicate of bug 2603 ***




From bugzilla-daemon at portal.open-bio.org  Wed Oct 15 16:05:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 15 Oct 2008 12:05:25 -0400
Subject: [Biojava-dev] [Bug 2603] StringIndexOutOfBoundsException while
	parsing blastresult
In-Reply-To: <bug-2603-485@http.bugzilla.open-bio.org/>
Message-ID: <200810151605.m9FG5PZo004505@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2603


holland at ebi.ac.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |holland at ebi.ac.uk


------- Comment #2 from holland at ebi.ac.uk  2008-10-15 12:05 EST -------
*** Bug 2617 has been marked as a duplicate of this bug. ***




From holland at eaglegenomics.com  Wed Oct 15 16:25:16 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 15 Oct 2008 17:25:16 +0100
Subject: [Biojava-dev] Proposition of participation to the BioJava
	project
In-Reply-To: <A2FA53C9-195D-413A-8BEF-C201049272F9@ibmc.u-strasbg.fr>
References: <A2FA53C9-195D-413A-8BEF-C201049272F9@ibmc.u-strasbg.fr>
Message-ID: <a0d826f40810150925o2c97e5eeob00de5e9e58f5976@mail.gmail.com>

You're absolutely welcome to contribute! We appreciate all the help we can
get.

I will be sending out an email to the BioJava mailing lists in the next
couple of days inviting contributions for the new BioJava 3 code and
describing how to go about it. I think your RNA ideas would be a great
starting point.

cheers,
Richard



-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From holland at eaglegenomics.com  Wed Oct 15 16:29:59 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 15 Oct 2008 17:29:59 +0100
Subject: [Biojava-dev] can biojava calcaulte profile-profile alignment ?
In-Reply-To: <7852810.354291224061911001.JavaMail.coremail@app143.163.com>
References: <mailman.2377.1224061218.3070.biojava-dev@lists.open-bio.org>
	<7852810.354291224061911001.JavaMail.coremail@app143.163.com>
Message-ID: <a0d826f40810150929n76d861a0r16476be43a9ca831@mail.gmail.com>

The short answer: no.

The long answer: not yet! But if someone would like to contribute some code
that can do it, watch out for my email to the mailing lists in the next
couple of days inviting contributions for the new BioJava 3 code base.

cheers,
Richard



-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From bugzilla-daemon at portal.open-bio.org  Thu Oct 16 06:15:05 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 16 Oct 2008 02:15:05 -0400
Subject: [Biojava-dev] [Bug 2603] StringIndexOutOfBoundsException while
	parsing blastresult
In-Reply-To: <bug-2603-485@http.bugzilla.open-bio.org/>
Message-ID: <200810160615.m9G6F5Tk014016@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2603


------- Comment #3 from tbanks at agr.gc.ca  2008-10-16 02:15 EST -------
Created an attachment (id=1007)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1007&action=view)
patch file 1 for bug 2603




From bugzilla-daemon at portal.open-bio.org  Thu Oct 16 06:15:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 16 Oct 2008 02:15:46 -0400
Subject: [Biojava-dev] [Bug 2603] StringIndexOutOfBoundsException while
	parsing blastresult
In-Reply-To: <bug-2603-485@http.bugzilla.open-bio.org/>
Message-ID: <200810160615.m9G6FkaF014096@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2603


------- Comment #4 from tbanks at agr.gc.ca  2008-10-16 02:15 EST -------
Created an attachment (id=1008)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1008&action=view)
patch file 2 for bug 2603




From bugzilla-daemon at portal.open-bio.org  Thu Oct 16 06:18:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 16 Oct 2008 02:18:10 -0400
Subject: [Biojava-dev] [Bug 2603] StringIndexOutOfBoundsException while
	parsing blastresult
In-Reply-To: <bug-2603-485@http.bugzilla.open-bio.org/>
Message-ID: <200810160618.m9G6IATb014290@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2603


------- Comment #5 from tbanks at agr.gc.ca  2008-10-16 02:18 EST -------
I've written up a fix for this bug.  As Richard suspected this fix takes care
of bug 2617 (I've tested both).  I've attached the patch files for the two
affected files.  If the patches don't take let me know and I'll email the
files.

- Travis




From f.jossinet at ibmc.u-strasbg.fr  Thu Oct 16 08:50:54 2008
From: f.jossinet at ibmc.u-strasbg.fr (Fabrice Jossinet)
Date: Thu, 16 Oct 2008 10:50:54 +0200
Subject: [Biojava-dev] Proposition of participation to the BioJava
	project
In-Reply-To: <a0d826f40810150925o2c97e5eeob00de5e9e58f5976@mail.gmail.com>
References: <A2FA53C9-195D-413A-8BEF-C201049272F9@ibmc.u-strasbg.fr>
	<a0d826f40810150925o2c97e5eeob00de5e9e58f5976@mail.gmail.com>
Message-ID: <65EB20E6-6137-441B-AC13-26031D46BDFE@ibmc.u-strasbg.fr>

Dear Richard,

Thank you very much. I'm looking forward to this invitation.

All the best

Fabrice



From bugzilla-daemon at portal.open-bio.org  Thu Oct 16 09:39:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 16 Oct 2008 05:39:11 -0400
Subject: [Biojava-dev] [Bug 2603] StringIndexOutOfBoundsException while
	parsing blastresult
In-Reply-To: <bug-2603-485@http.bugzilla.open-bio.org/>
Message-ID: <200810160939.m9G9dBGm028921@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2603


------- Comment #6 from holland at ebi.ac.uk  2008-10-16 05:39 EST -------
Thanks for the patches! Could you email me the complete two files that you've
modified (it's easier for me to just copy-and-paste the entire file). I'll then
commit them to SVN.




From fbristow at gmail.com  Fri Oct 17 18:58:08 2008
From: fbristow at gmail.com (Franklin Bristow)
Date: Fri, 17 Oct 2008 13:58:08 -0500
Subject: [Biojava-dev] Writing Swissprot/Uniprot formatted files
Message-ID: <50a7756d0810171158k51aa3ee4l5f7078321633ebc5@mail.gmail.com>

Hello everyone,
I've been doing some work with swissprot, and I've been needing to make use
of the file reading and writing facilities in biojava.

I was using biojava 1.5, but I've recently moved to using biojava-live so
that I can actually step through the code to see what's going on.

I have successfully created an index of my swissprot database and I can read
my sequences out of that indexed database.  All of the appropriate
information is loaded from the records in the file into the appropriate
objects.  I am quite happy with this.

The problem that I am having has to do with writing swissprot records.

When I started using biojava, the recommended way to do this was using
SeqIOTools:
SeqIOTools.writeSwissprot(byteStream, swissSequence);

While this works (i.e., no exceptions are thrown), the record that is printed
to the byteStream looks pretty ugly (it's littered with XX lines) and is not
valid as per the current swissprot file spec
(http://www.expasy.ch/sprot/userman.html).  While this record is invalid, it
does contain all of the information that was originally in the swissprot
file.  I would include what I get as an output here, but it's irrelevant.

SeqIOTools became deprecated in favour of this:
RichSequence.IOTools.writeUniProt(byteStream, swissSequence, null);

Once again, while this works (and this time the record is valid), the record
that is printed contains almost none of the original information that is
contained in the swissprot record.  This is the output that I get when I
call this method (the spacing may not look right because of fonts, but
that is not the problem):

> ID   Q4UVA7_null             STANDARD;         273 AA.
> AC   Q4UVA7;
> DT   null, integrated into UniProtKB/?.
> DT   null, sequence version 0.
> DT   null, entry version 0.
> DE   null.
> FT   any           1    273
> FT   any         153    160
> SQ   SEQUENCE   273 AA;  30853 MW;  604FB6C6437A9D90 CRC64;
>      MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE
>      RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF
>      ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT
>      EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF
>      QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF
> //
>

But what I am expecting to see looks like this (again, the spacing is the
fault of the font, not the output):

> ID   Y1953_XANC8             Reviewed;         273 AA.
> AC   Q4UVA7;
> DT   10-JAN-2006, integrated into UniProtKB/Swiss-Prot.
> DT   05-JUL-2005, sequence version 1.
> DT   06-FEB-2007, entry version 12.
> DE   UPF0085 protein XC_1953.
> GN   OrderedLocusNames=XC_1953;
> OS   Xanthomonas campestris pv. campestris (strain 8004).
> OC   Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales;
> OC   Xanthomonadaceae; Xanthomonas.
> OX   NCBI_TaxID=314565;
> RN   [1]
> RP   NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
> RX   PubMed=15899963; DOI=10.1101/gr.3378705;
> RA   Qian W., Jia Y., Ren S.-X., He Y.-Q., Feng J.-X., Lu L.-F., Sun Q.,
> RA   Ying G., Tang D.-J., Tang H., Wu W., Hao P., Wang L., Jiang B.-L.,
> RA   Zeng S., Gu W.-Y., Lu G., Rong L., Tian Y., Yao Z., Fu G., Chen B.,
> RA   Fang R., Qiang B., Chen Z., Zhao G.-P., Tang J.-L., He C.;
> RT   "Comparative and functional genomic analyses of the pathogenicity of
> RT   phytopathogen Xanthomonas campestris pv. campestris.";
> RL   Genome Res. 15:757-767(2005).
> CC   -!- SIMILARITY: Belongs to the UPF0085 family.
> CC   -----------------------------------------------------------------------
> CC   Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
> CC   Distributed under the Creative Commons Attribution-NoDerivs License
> CC   -----------------------------------------------------------------------
> DR   EMBL; CP000050; AAY49016.1; -; Genomic_DNA.
> DR   GenomeReviews; CP000050_GR; XC_1953.
> DR   KEGG; xcb:XC_1953; -.
> DR   GO; GO:0005524; F:ATP binding; IEA:HAMAP.
> DR   HAMAP; MF_01062; -; 1.
> DR   InterPro; IPR005177; DUF299.
> DR   Pfam; PF03618; DUF299; 1.
> KW   ATP-binding; Complete proteome; Nucleotide-binding.
> FT   CHAIN         1    273       UPF0085 protein XC_1953.
> FT                                /FTId=PRO_0000196744.
> FT   NP_BIND     153    160       ATP (Potential).
> SQ   SEQUENCE   273 AA;  30853 MW;  604FB6C6437A9D90 CRC64;
>      MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE
>      RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF
>      ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT
>      EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF
>      QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF
> //
>

Needless to say, there is a considerable loss of information.

At first I wasn't sure if this was a problem with parsing the database that
I had, so I inspected the object that was retrieved from the database.  As I
mentioned before, the parsing seems to be working fine.  I get a
SimpleSequence object that has all of the correct annotations and other
information loaded into it.

I then continued to step through the writeUniProt method in
RichSequence.IOTools and found that this method first calls "enrich" on
SimpleSequence which turns it into a SimpleRichSequence.  There appears to
be some loss of information at this point, specifically in the feature set
where the 'key name' is lost -- it just becomes 'any'.

It is when we get to the actual process of writing to the stream in
UniprotFormat.writeSequence that we have the problems.  All of the code
appears to be there for printing the information out that I'm expecting.  I
think the problem is that in the process of "enrich"-ing the sequence, the
data is still stored in the object, but it is no longer where it is expected
to be.  For example, when we get to writing the comments out:
        // comments - if any
        if (!rs.getComments().isEmpty()) {

The List of comments IS empty, but there are comments in the
SimpleRichSequence; they are stored in the notes data member.

So, after this lengthy explanation of my problem, I am wondering if I am
merely not doing this correctly.  Is there a better way to pass my
information to the writeUniProt method -- should I be transforming my
SimpleSequence objects into SimpleRichSequence objects manually?  Am I just
going about this entirely the wrong way?

If I am going about this correctly and the functionality to do this is
merely not there or hasn't been implemented correctly, I would be more than
happy to help out...  I can supply patches, create bug reports, or anything
else that is necessary.

Any guidance in this matter would be greatly appreciated!

-- 
Franklin


From holland at eaglegenomics.com  Fri Oct 17 20:08:25 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 17 Oct 2008 21:08:25 +0100
Subject: [Biojava-dev] Writing Swissprot/Uniprot formatted files
In-Reply-To: <50a7756d0810171158k51aa3ee4l5f7078321633ebc5@mail.gmail.com>
References: <50a7756d0810171158k51aa3ee4l5f7078321633ebc5@mail.gmail.com>
Message-ID: <a0d826f40810171308s5b2788aah27a982cb6a3b45e@mail.gmail.com>

Hello.

I'm not sure how you're getting your uniprot records out of your swissprot
database, or what format your swissprot database is in? If it's BioSQL, then
the way BioJava interacts with it has altered significantly with BioJavaX -
previous versions basically stuffed everything in as comments, hence all the
XX lines you got when writing it back out again. However if it's not BioSQL
and you've written something custom of your own, then I couldn't really
comment!

BioJavaX will attempt to convert the old sequence objects into rich sequence
objects, but there's not much in common between the way uniprot data is
stored in the old object model and the new one. Therefore the enrich method
can't do a very good job - especially for stuff which the original parser
stored as comments instead of properly distributing it across the object
model. Data which the original parser stored in this comment format will
mostly get ignored by the conversion process, because the conversion process
has no idea where the record came from and therefore what to do with the
comments inside it.

Your best bet is to read your data out of your database directly as rich
sequence objects, or if not possible, then do the conversion manually.

cheers,
Richard


2008/10/17 Franklin Bristow <fbristow at gmail.com>

> Hello everyone,
> I've been doing some work with swissprot, and I've been needing to make use
> of the file reading and writing facilities in biojava.
>
> I was using biojava 1.5, but I've recently moved to using biojava-live so
> that I can actually step through the code to see what's going on.
>
> I have successfully created an index of my swissprot database and I can
> read
> my sequences out of that indexed database.  All of the appropriate
> information is loaded from the records in the file into the appropriate
> objects.  I am quite happy with this.
>
> The problem that I am having has to do with writing swissprot records.
>
> When I started using biojava, the recommended way to do this was using
> SeqIOTools:
> SeqIOTools.writeSwissprot(byteStream, swissSequence);
>
> While this works (ie: no exceptions are thrown), the record that is printed
> to the byteStream looks pretty ugly (it's littered with XX lines) and is
> not
> valid as per the current swissprot file spec (
> http://www.expasy.ch/sprot/userman.html).  While this record is invalid,
> it
> does contain all of the information that was originally in the swissprot
> file.  I would include what I get as an output here, but it's irrelevant.
>
> SeqIOTools became deprecated in favour of this:
> RichSequence.IOTools.writeUniProt(byteStream, swissSequence, null);
>
> Once again, while this works (and this time the record is valid), the
> record
> that is printed contains almost none of the original information that is
> contained in the swissprot record.  This is the output that I get when I
> call this method (the spacing may not look right because of fonts, but
> that is not the problem):
>
> ID   Q4UVA7_null             STANDARD;         273 AA.
> > AC   Q4UVA7;
> > DT   null, integrated into UniProtKB/?.
> > DT   null, sequence version 0.
> > DT   null, entry version 0.
> > DE   null.
> > FT   any           1    273
> > FT   any         153    160
> > SQ   SEQUENCE   273 AA;  30853 MW;  604FB6C6437A9D90 CRC64;
> >      MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE
> >      RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF
> >      ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT
> >      EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF
> >      QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF
> > //
> >
>
> But what I am expecting to see looks like this (again, the spacing is the
> fault of the font, not the output):
>
> > ID   Y1953_XANC8             Reviewed;         273 AA.
> > AC   Q4UVA7;
> > DT   10-JAN-2006, integrated into UniProtKB/Swiss-Prot.
> > DT   05-JUL-2005, sequence version 1.
> > DT   06-FEB-2007, entry version 12.
> > DE   UPF0085 protein XC_1953.
> > GN   OrderedLocusNames=XC_1953;
> > OS   Xanthomonas campestris pv. campestris (strain 8004).
> > OC   Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales;
> > OC   Xanthomonadaceae; Xanthomonas.
> > OX   NCBI_TaxID=314565;
> > RN   [1]
> > RP   NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
> > RX   PubMed=15899963; DOI=10.1101/gr.3378705;
> > RA   Qian W., Jia Y., Ren S.-X., He Y.-Q., Feng J.-X., Lu L.-F., Sun Q.,
> > RA   Ying G., Tang D.-J., Tang H., Wu W., Hao P., Wang L., Jiang B.-L.,
> > RA   Zeng S., Gu W.-Y., Lu G., Rong L., Tian Y., Yao Z., Fu G., Chen B.,
> > RA   Fang R., Qiang B., Chen Z., Zhao G.-P., Tang J.-L., He C.;
> > RT   "Comparative and functional genomic analyses of the pathogenicity of
> > RT   phytopathogen Xanthomonas campestris pv. campestris.";
> > RL   Genome Res. 15:757-767(2005).
> > CC   -!- SIMILARITY: Belongs to the UPF0085 family.
> > CC   ------------------------------------------------------------
> > -----------
> > CC   Copyrighted by the UniProt Consortium, see
> > http://www.uniprot.org/terms
> > CC   Distributed under the Creative Commons Attribution-NoDerivs License
> > CC   ------------------------------------------------------------
> > -----------
> > DR   EMBL; CP000050; AAY49016.1; -; Genomic_DNA.
> > DR   GenomeReviews; CP000050_GR; XC_1953.
> > DR   KEGG; xcb:XC_1953; -.
> > DR   GO; GO:0005524; F:ATP binding; IEA:HAMAP.
> > DR   HAMAP; MF_01062; -; 1.
> > DR   InterPro; IPR005177; DUF299.
> > DR   Pfam; PF03618; DUF299; 1.
> > KW   ATP-binding; Complete proteome; Nucleotide-binding.
> > FT   CHAIN         1    273       UPF0085 protein XC_1953.
> > FT                                /FTId=PRO_0000196744.
> > FT   NP_BIND     153    160       ATP (Potential).
> > SQ   SEQUENCE   273 AA;  30853 MW;  604FB6C6437A9D90 CRC64;
> >      MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE
> >      RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF
> >      ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT
> >      EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF
> >      QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF
> > //
> >
>
> Needless to say, there is a considerable loss of information.
>
> At first I wasn't sure if this was a problem with parsing the database that
> I had, so I inspected the object that was retrieved from the database.  As
> I
> mentioned before, the parsing seems to be working fine.  I get a
> SimpleSequence object that has all of the correct annotations and other
> information loaded into it.
>
> I then continued to step through the writeUniProt method in
> RichSequence.IOTools and found that this method first calls "enrich" on
> SimpleSequence which turns it into a SimpleRichSequence.  There appears to
> be some loss of information at this point, specifically in the feature set
> where the 'key name' is lost -- it just becomes 'any'.
>
> It is when we get to the actual process of writing to the stream in
> UniprotFormat.writeSequence that we have the problems.  All of the code
> appears to be there for printing the information out that I'm expecting.  I
> think the problem is that in the process of "enrich"-ing the sequence, the
> data is still stored in the object, but it is no longer where it is
> expected
> to be.  For example, when we get to writing the comments out:
>        // comments - if any
>        if (!rs.getComments().isEmpty()) {
>
> The List of comments IS empty, but there are comments in the
> SimpleRichSequence, they are stored in the notes data member.
>
> So.  After this lengthy explanation of my problem, I am wondering if I am
> merely not doing this correctly.  Is there a better way to pass my
> information to the writeUniprot method -- should I be transforming my
> SimpleSequence objects into a SimpleRichSequence manually?  Am I just going
> about this entirely the wrong way?
>
> If I am going about this correctly and the functionality to do this is
> merely not there or hasn't been implemented correctly, I would be more than
> happy to help out...  I can supply patches, create bug reports, or anything
> else that is necessary.
>
> Any guidance in this matter would be greatly appreciated!
>
> --
> Franklin
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From holland at eaglegenomics.com  Mon Oct 20 00:18:29 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 20 Oct 2008 01:18:29 +0100
Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please!
Message-ID: <a0d826f40810191718x4f89c210l27f20f2532ed24d3@mail.gmail.com>

Hi all,

I've just committed some new code to the biojava3 branch of the biojava-live
subversion repository. It's the foundations of a brand new alphabet+symbol
set of classes, and an example of how to use them to represent DNA. You'll
notice that the new code is very lightweight and allows for a lot more
flexibility than the old code - for instance, the concept of Alphabet has
changed radically. It also makes much more extensive use of the Collections
API.

I haven't got any test cases or usage examples yet but give me a shout if
you don't understand the code and I'll explain how it works. (Hint:
SymbolFormat is there to convert Strings into SymbolList objects, and vice
versa).

So, now we want some volunteers! We're starting from scratch here so there's
a lot of work to do. The whole of BioJava needs 'translating' into BJ3,
whether that means copying and pasting existing classes and modifying them
to suit the new style, or writing completely new ones to provide equivalent
functionality.


I'll post an example of how to do file parsing soon, probably starting with
FASTA. In the meantime, a good place to start would be for people to design
object models to represent their favourite data types (e.g. Genbank, or
microarray data). Utility classes to manipulate those objects would be great
too.

The object models need to be normalised as much as possible - e.g. if your
data has a lot of comments, and the order of those comments is important,
then give your object model a collection of comment objects. The object
model for each data type should be completely independent and use basic data
types wherever possible (e.g. store sequences as strings, don't attempt to
parse them into anything fancy like SymbolLists). The closer the object
model is to the original data format, the better. There's going to be clever
tricks when it comes to converting data between different object models
(e.g. Genbank to INSDSeq), which I will explain later when I put the file
parsing examples up.
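
As a rough illustration of the kind of object model described above, here is
a minimal sketch. The class names (RecordComment, FlatFileRecord) and their
fields are hypothetical and are not part of the committed biojava3 code; the
sketch only shows an ordered collection of comment objects and a sequence
kept as a plain String.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical comment object; the name and field are illustrative only.
final class RecordComment {
    private final String text;

    RecordComment(String text) { this.text = text; }

    String getText() { return text; }
}

// Hypothetical record model that stays close to the flat-file layout.
final class FlatFileRecord {
    private final String accession;
    private final String sequence; // plain String, not parsed into symbols
    // A List keeps the comments in the order they appeared in the file.
    private final List<RecordComment> comments = new ArrayList<RecordComment>();

    FlatFileRecord(String accession, String sequence) {
        this.accession = accession;
        this.sequence = sequence;
    }

    void addComment(String text) { comments.add(new RecordComment(text)); }

    String getAccession() { return accession; }

    String getSequence() { return sequence; }

    List<RecordComment> getComments() {
        return Collections.unmodifiableList(comments);
    }
}

class FlatFileRecordDemo {
    public static void main(String[] args) {
        FlatFileRecord record = new FlatFileRecord("Q4UVA7", "MSTIRPVFYV");
        record.addComment("-!- SIMILARITY: Belongs to the UPF0085 family.");
        record.addComment("Copyrighted by the UniProt Consortium.");
        // Comments come back in insertion order.
        for (RecordComment c : record.getComments()) {
            System.out.println("CC   " + c.getText());
        }
    }
}
```

Nothing in this model depends on any other data type, which is the
independence asked for above; conversion to another model would live in a
separate class.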

You'll notice how the biojava3 branch uses Maven instead of Ant. This is
because we want to make it as modular as possible, so if you want to write
microarray stuff, create a new microarray sub-project (as per the dna
example that's already there). This way if someone only wants the microarray
bit of BJ3, they only need install the appropriate JAR file and can ignore
the rest. (The 'core' module is for stuff that is so generic it could be
used anywhere, or is used in every single other module.)

If coding isn't your cup of tea, then we would very much welcome testers
(particularly those who enjoy writing test cases!), documenters
(particularly code commenters), translators (for internationalisation of the
code), and of course all those who wish to contribute ideas and suggestions
no matter how off-the-wall they might be. In particular if you'd like to
take charge of an area of the development process, e.g. Documentation Chief,
or Protein Champion, then that would be much appreciated.

I'm very much looking forward to working with everyone on this. Good luck,
and happy coding!

cheers,
Richard

PS. Please don't forget to attach the appropriate licence to your code. You
can copy-and-paste it from the existing classes I just committed this
evening.

PPS. For those who are worried about backwards compatibility - this was
discussed on the lists a while back and it was made clear that BJ3 is a
clean break. However, the existing code will continue to be maintained and
bugfixed for a couple of years so you don't have to upgrade if you don't
want to - it just won't have any new features developed for it. This is
largely because it'll probably take just that long to write all the new BJ3
code. When we do decide to desupport the existing BJ code, plenty of notice
will be given (i.e. years as opposed to months).


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From markjschreiber at gmail.com  Mon Oct 20 04:13:01 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Mon, 20 Oct 2008 12:13:01 +0800
Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please!
In-Reply-To: <a0d826f40810191718x4f89c210l27f20f2532ed24d3@mail.gmail.com>
References: <a0d826f40810191718x4f89c210l27f20f2532ed24d3@mail.gmail.com>
Message-ID: <93b45ca50810192113g4ef0484cm2154f97c3c440f3f@mail.gmail.com>

Hi -

Just a comment ...

Does an alphabet need to be a Singleton in this new paradigm? If it
does then do you want to have an equals() method? Currently you could
have:

Alphabet a; Alphabet b;

a.equals(b) //true;
a == b //false

Unless there is a strong reason why Alphabet needs to be a Singleton I
don't think it should be (Singletons make life hard when transporting
between JVMs).  You can get a similar kind of behaviour with caching
where it doesn't hurt if there is more than one instance of an equal
alphabet but when they pass through the cache they can get cleaned up
(like the interning behaviour of Strings).

Put it this way. If I have two copies of the DNA alphabet will it
matter (other than a bit of memory waste)?
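
To make the caching idea concrete, here is a minimal sketch. SimpleAlphabet
and its intern() method are hypothetical names for illustration, not the BJ3
Alphabet class: equality is by value, several equal instances may coexist,
and passing them through the cache collapses them to one canonical instance.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for illustration; not the BJ3 Alphabet class.
final class SimpleAlphabet {
    // Cache of canonical instances, keyed by value equality.
    private static final Map<SimpleAlphabet, SimpleAlphabet> CACHE =
            new ConcurrentHashMap<SimpleAlphabet, SimpleAlphabet>();

    private final String name;
    private final Set<Character> symbols;

    SimpleAlphabet(String name, String symbolChars) {
        this.name = name;
        Set<Character> s = new LinkedHashSet<Character>();
        for (char c : symbolChars.toCharArray()) {
            s.add(c);
        }
        this.symbols = Collections.unmodifiableSet(s);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SimpleAlphabet)) return false;
        SimpleAlphabet other = (SimpleAlphabet) o;
        return name.equals(other.name) && symbols.equals(other.symbols);
    }

    @Override
    public int hashCode() {
        return 31 * name.hashCode() + symbols.hashCode();
    }

    // Returns the canonical instance equal to this one, much like String.intern().
    SimpleAlphabet intern() {
        SimpleAlphabet existing = CACHE.putIfAbsent(this, this);
        return existing != null ? existing : this;
    }
}

class AlphabetDemo {
    public static void main(String[] args) {
        SimpleAlphabet a = new SimpleAlphabet("DNA", "ACGT");
        SimpleAlphabet b = new SimpleAlphabet("DNA", "ACGT");
        System.out.println(a.equals(b));              // true
        System.out.println(a == b);                   // false
        System.out.println(a.intern() == b.intern()); // true
    }
}
```

With this approach a second copy arriving from another JVM is harmless: it is
still equal to the first, and interning it merely discards the duplicate.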

- Mark

On Mon, Oct 20, 2008 at 8:18 AM, Richard Holland
<holland at eaglegenomics.com> wrote:
> Hi all,
>
> I've just committed some new code to the biojava3 branch of the biojava-live
> subversion repository. It's the foundations of a brand new alphabet+symbol
> set of classes, and an example of how to use them to represent DNA. You'll
> notice that the new code is very lightweight and allows for a lot more
> flexibility than the old code - for instance, the concept of Alphabet has
> changed radically. It also makes much more extensive use of the Collections
> API.
>
> I haven't got any test cases or usage examples yet but give me a shout if
> you don't understand the code and I'll explain how it works. (Hint:
> SymbolFormat is there to convert Strings into SymbolList objects, and vice
> versa).
>
> So, now we want some volunteers! We're starting from scratch here so there's
> a lot of work to do. The whole of BioJava needs 'translating' into BJ3,
> whether it be copy-and-paste existing classes and modify them to suit the
> new style, or write completely new ones to provide equivalent functionality.
>
>
> I'll post an example of how to do file parsing soon, probably starting with
> FASTA. In the meantime, a good place to start would be for people to design
> object models to represent their favourite data types (e.g. Genbank, or
> microarray data). Utility classes to manipulate those objects would be great
> too.
>
> The object models need to be normalised as much as possible - e.g. if your
> data has a lot of comments, and the order of those comments is important,
> then give your object model a collection of comment objects. The object
> model for each data type should be completely independent and use basic data
> types wherever possible (e.g. store sequences as strings, don't attempt to
> parse them into anything fancy like SymbolLists). The closer the object
> model is to the original data format, the better. There's going to be clever
> tricks when it comes to converting data between different object models
> (e.g. Genbank to INSDSeq), which I will explain later when I put the file
> parsing examples up.
>
> You'll notice how the biojava3 branch uses Maven instead of Ant. This is
> because we want to make it as modular as possible, so if you want to write
> microarray stuff, create a new microarray sub-project (as per the dna
> example that's already there). This way if someone only wants the microarray
> bit of BJ3, they only need install the appropriate JAR file and can ignore
> the rest. (The 'core' module is for stuff that is so generic it could be
> used anywhere, or is used in every single other module.)
>
> If coding isn't your cup of tea, then we would very much welcome testers
> (particularly those who enjoy writing test cases!), documenters
> (particularly code commenters), translators (for internationalisation of the
> code), and of course all those who wish to contribute ideas and suggestions
> no matter how off-the-wall they might be. In particular if you'd like to
> take charge of an area of the development process, e.g. Documentation Chief,
> or Protein Champion, then that would be much appreciated.
>
> I'm very much looking forward to working with everyone on this. Good luck,
> and happy coding!
>
> cheers,
> Richard
>
> PS. Please don't forget to attach the appropriate licence to your code. You
> can copy-and-paste it from the existing classes I just committed this
> evening.
>
> PPS. For those who are worried about backwards compatibility - this was
> discussed on the lists a while back and it was made clear that BJ3 is a
> clean break. However, the existing code will continue to be maintained and
> bugfixed for a couple of years so you don't have to upgrade if you don't
> want to - it just won't have any new features developed for it. This is
> largely because it'll probably take just that long to write all the new BJ3
> code. When we do decide to desupport the existing BJ code, plenty of notice
> will be given (i.e. years as opposed to months).
>
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>


From holland at eaglegenomics.com  Mon Oct 20 08:23:17 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 20 Oct 2008 09:23:17 +0100
Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please!
In-Reply-To: <93b45ca50810192113g4ef0484cm2154f97c3c440f3f@mail.gmail.com>
References: <a0d826f40810191718x4f89c210l27f20f2532ed24d3@mail.gmail.com>
	<93b45ca50810192113g4ef0484cm2154f97c3c440f3f@mail.gmail.com>
Message-ID: <a0d826f40810200123x3e3b4d79s71d4aaa89545f4b5@mail.gmail.com>

Good point, and the answer is no, it doesn't really matter! So I will remove
the singleton-ishness of Alphabet.


2008/10/20 Mark Schreiber <markjschreiber at gmail.com>

> Hi -
>
> Just a comment ...
>
> Does an alphabet need to be a Singleton in this new paradigm? If it
> does then do you want to have an equals() method? Currently you could
> have:
>
> Alphabet a; Alphabet b;
>
> a.equals(b) //true;
> a == b //false
>
> Unless there is a strong reason why Alphabet needs to be a Singleton I
> don't think it should be (Singletons make life hard when transporting
> between JVMs).  You can get a similar kind of behaviour with caching
> where it doesn't hurt if there is more than one instance of an equal
> alphabet but when they pass through the cache they can get cleaned up
> (like the interning behaviour of Strings).
>
> Put it this way. If I have two copies of the DNA alphabet will it
> matter (other than a bit of memory waste)?
>
> - Mark
>
> On Mon, Oct 20, 2008 at 8:18 AM, Richard Holland
> <holland at eaglegenomics.com> wrote:
> > Hi all,
> >
> > I've just committed some new code to the biojava3 branch of the
> biojava-live
> > subversion repository. It's the foundations of a brand new
> alphabet+symbol
> > set of classes, and an example of how to use them to represent DNA.
> You'll
> > notice that the new code is very lightweight and allows for a lot more
> > flexibility than the old code - for instance, the concept of Alphabet has
> > changed radically. It also makes much more extensive use of the
> Collections
> > API.
> >
> > I haven't got any test cases or usage examples yet but give me a shout if
> > you don't understand the code and I'll explain how it works. (Hint:
> > SymbolFormat is there to convert Strings into SymbolList objects, and
> vice
> > versa).
> >
> > So, now we want some volunteers! We're starting from scratch here so
> there's
> > a lot of work to do. The whole of BioJava needs 'translating' into BJ3,
> > whether it be copy-and-paste existing classes and modify them to suit the
> > new style, or write completely new ones to provide equivalent
> functionality.
> >
> >
> > I'll post an example of how to do file parsing soon, probably starting
> with
> > FASTA. In the meantime, a good place to start would be for people to
> design
> > object models to represent their favourite data types (e.g. Genbank, or
> > microarray data). Utility classes to manipulate those objects would be
> great
> > too.
> >
> > The object models need to be normalised as much as possible - e.g. if
> your
> > data has a lot of comments, and the order of those comments is important,
> > then give your object model a collection of comment objects. The object
> > model for each data type should be completely independent and use basic
> data
> > types wherever possible (e.g. store sequences as strings, don't attempt
> to
> > parse them into anything fancy like SymbolLists). The closer the object
> > model is to the original data format, the better. There's going to be
> clever
> > tricks when it comes to converting data between different object models
> > (e.g. Genbank to INSDSeq), which I will explain later when I put the file
> > parsing examples up.
> >
> > You'll notice how the biojava3 branch uses Maven instead of Ant. This is
> > because we want to make it as modular as possible, so if you want to
> write
> > microarray stuff, create a new microarray sub-project (as per the dna
> > example that's already there). This way if someone only wants the
> microarray
> > bit of BJ3, they only need install the appropriate JAR file and can
> ignore
> > the rest. (The 'core' module is for stuff that is so generic it could be
> > used anywhere, or is used in every single other module.)
> >
> > If coding isn't your cup of tea, then we would very much welcome testers
> > (particularly those who enjoy writing test cases!), documenters
> > (particularly code commenters), translators (for internationalisation of
> the
> > code), and of course all those who wish to contribute ideas and
> suggestions
> > no matter how off-the-wall they might be. In particular if you'd like to
> > take charge of an area of the development process, e.g. Documentation
> Chief,
> > or Protein Champion, then that would be much appreciated.
> >
> > I'm very much looking forward to working with everyone on this. Good
> luck,
> > and happy coding!
> >
> > cheers,
> > Richard
> >
> > PS. Please don't forget to attach the appropriate licence to your code.
> You
> > can copy-and-paste it from the existing classes I just committed this
> > evening.
> >
> > PPS. For those who are worried about backwards compatibility - this was
> > discussed on the lists a while back and it was made clear that BJ3 is a
> > clean break. However, the existing code will continue to be maintained
> and
> > bugfixed for a couple of years so you don't have to upgrade if you don't
> > want to - it just won't have any new features developed for it. This is
> > largely because it'll probably take just that long to write all the new
> BJ3
> > code. When we do decide to desupport the existing BJ code, plenty of
> notice
> > will be given (i.e. years as opposed to months).
> >
> >
> > --
> > Richard Holland, BSc MBCS
> > Finance Director, Eagle Genomics Ltd
> > M: +44 7500 438846 | E: holland at eaglegenomics.com
> > http://www.eaglegenomics.com/
> >
>


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From fbristow at gmail.com  Mon Oct 20 13:36:15 2008
From: fbristow at gmail.com (Franklin Bristow)
Date: Mon, 20 Oct 2008 08:36:15 -0500
Subject: [Biojava-dev] Writing Swissprot/Uniprot formatted files
In-Reply-To: <a0d826f40810171308s5b2788aah27a982cb6a3b45e@mail.gmail.com>
References: <50a7756d0810171158k51aa3ee4l5f7078321633ebc5@mail.gmail.com>
	<a0d826f40810171308s5b2788aah27a982cb6a3b45e@mail.gmail.com>
Message-ID: <50a7756d0810200636l4355f3cbj367b155e573e1612@mail.gmail.com>

Hi Richard,
I'm getting my records from an indexed flat file.  I indexed the file using
IndexTools.indexSwissprot().  I am then retrieving the records from the flat
file "database" using the SequenceDBLite interface which is being provided
to me using the Registry and SystemRegistry classes.  The following is a
simple example of what I am doing:

First I index the flat file:

> File[] files = new File[] {
>       new File("/home/fbristow/db/uniprot_sprot.dat")
> };
> try {
>       IndexTools.indexSwissprot("uniprot_sprot",
>               new File("/home/fbristow/db/index/uniprot_sprot"), files);
> } catch (BioException bioE) {
>       bioE.printStackTrace();
> } catch (ParserException parseE) {
>       parseE.printStackTrace();
> } catch (IOException ioE) {
>       ioE.printStackTrace();
> }


Then I get a handle on that file by doing:

> Registry registry = SystemRegistry.instance();
> setSwissDatabase(registry.getDatabase("swissprot"));
>

And I have a file in /etc that tells the registry how to find the indexes
with the swissprot identifier as per
http://biojava.org/docs/api/org/biojava/directory/SystemRegistry.html

Ultimately, this gives me a class that implements the interface
SequenceDBLite, and when I query this interface for sequences it returns to
me Sequence objects.  I can't seem to see anything that would give me a
RichSequence, so I think that I'll continue to get them in this manner, but
I'll convert the Sequence objects into RichSequence objects myself.
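
The manual conversion I have in mind looks roughly like the sketch below. It
does not use the BioJava classes: RichRecord and ManualConverter are
hypothetical stand-ins, and the "CC" property key is an assumption about
where the old-style object keeps its comment text. The idea is just to move
known properties into the dedicated fields that the writer checks, and to
leave everything else as generic notes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a rich sequence; not a BioJava class.
final class RichRecord {
    final List<String> comments = new ArrayList<String>();
    final Map<String, String> notes = new LinkedHashMap<String, String>();
}

final class ManualConverter {
    // Assumed key under which the old-style object stores comment lines.
    static final String COMMENT_KEY = "CC";

    // Copies comment text into the dedicated comments list and keeps
    // every other property as a generic note.
    static RichRecord convert(Map<String, String> oldProperties) {
        RichRecord rich = new RichRecord();
        for (Map.Entry<String, String> e : oldProperties.entrySet()) {
            if (COMMENT_KEY.equals(e.getKey())) {
                for (String line : e.getValue().split("\n")) {
                    rich.comments.add(line);
                }
            } else {
                rich.notes.put(e.getKey(), e.getValue());
            }
        }
        return rich;
    }
}

class ManualConverterDemo {
    public static void main(String[] args) {
        Map<String, String> old = new LinkedHashMap<String, String>();
        old.put("DE", "UPF0085 protein XC_1953.");
        old.put("CC", "-!- SIMILARITY: Belongs to the UPF0085 family.");
        RichRecord rich = ManualConverter.convert(old);
        System.out.println(rich.comments.size() + " comment(s), "
                + rich.notes.size() + " note(s)");
    }
}
```

In the real code the same loop would read from the Sequence's annotation and
write into the RichSequence, so that a check such as
!rs.getComments().isEmpty() in the writer finds the comments where it expects
them.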

Thanks for your attention!


On Fri, Oct 17, 2008 at 3:08 PM, Richard Holland
<holland at eaglegenomics.com> wrote:

> Hello.
>
> I'm not sure how you're getting your uniprot records out of your swissprot
> database, or what format your swissprot database is in? If it's BioSQL, then
> the way BioJava interacts with it has altered significantly with BioJavaX -
> previous versions basically stuffed everything in as comments, hence all the
> XX lines you got when writing it back out again. However if it's not BioSQL
> and you've written something custom of your own, then I couldn't really
> comment!
>
> BioJavaX will attempt to convert the old sequence objects into rich
> sequence objects, but there's not much in common between the way uniprot
> data is stored in the old object model and the new one. Therefore the enrich
> method can't do a very good job - especially for stuff which the original
> parser stored as comments instead of properly distributing it across the
> object model. Data which the original parser stored in this comment format
> will mostly get ignored by the conversion process, because the conversion
> process has no idea where the record came from and therefore what to do with
> the comments inside it.
>
> Your best bet is to read your data out of your database directly as rich
> sequence objects, or if not possible, then do the conversion manually.
>
> cheers,
> Richard
>
>
> 2008/10/17 Franklin Bristow <fbristow at gmail.com>
>
>> Hello everyone,
>> I've been doing some work with swissprot, and I've been needing to make
>> use
>> of the file reading and writing facilities in biojava.
>>
>> I was using biojava 1.5, but I've recently moved to using biojava-live so
>> that I can actually step through the code to see what's going on.
>>
>> I have successfully created an index of my swissprot database and I can
>> read
>> my sequences out of that indexed database.  All of the appropriate
>> information is loaded from the records in the file into the appropriate
>> objects.  I am quite happy with this.
>>
>> The problem that I am having has to do with writing swissprot records.
>>
>> When I started using biojava, the recommended way to do this was using
>> SeqIOTools:
>> SeqIOTools.writeSwissprot(byteStream, swissSequence);
>>
>> While this works (ie: no exceptions are thrown), the record that is
>> printed
>> to the byteStream looks pretty ugly (it's littered with XX lines) and is
>> not
>> valid as per the current swissprot file spec (
>> http://www.expasy.ch/sprot/userman.html).  While this record is invalid,
>> it
>> does contain all of the information that was originally in the swissprot
>> file.  I would include what I get as an output here, but it's irrelevant.
>>
>> SeqIOTools became deprecated in favour of this:
>> RichSequence.IOTools.writeUniProt(byteStream, swissSequence, null);
>>
>> Once again, while this works (and this time the record is valid), the
>> record
>> that is printed contains almost none of the original information that is
>> contained in the swissprot record.  This is the output that I get when I
>> call this method (the spacing may not look right because of fonts, but
>> that is not the problem):
>>
>> ID   Q4UVA7_null             STANDARD;         273 AA.
>> > AC   Q4UVA7;
>> > DT   null, integrated into UniProtKB/?.
>> > DT   null, sequence version 0.
>> > DT   null, entry version 0.
>> > DE   null.
>> > FT   any           1    273
>> > FT   any         153    160
>> > SQ   SEQUENCE   273 AA;  30853 MW;  604FB6C6437A9D90 CRC64;
>> >      MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE
>> >      RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF
>> >      ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT
>> >      EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF
>> >      QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF
>> > //
>> >
>>
>> But what I am expecting to see looks like this (again, the spacing is the
>> fault of the font, not the output):
>>
>> > ID   Y1953_XANC8             Reviewed;         273 AA.
>> > AC   Q4UVA7;
>> > DT   10-JAN-2006, integrated into UniProtKB/Swiss-Prot.
>> > DT   05-JUL-2005, sequence version 1.
>> > DT   06-FEB-2007, entry version 12.
>> > DE   UPF0085 protein XC_1953.
>> > GN   OrderedLocusNames=XC_1953;
>> > OS   Xanthomonas campestris pv. campestris (strain 8004).
>> > OC   Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales;
>> > OC   Xanthomonadaceae; Xanthomonas.
>> > OX   NCBI_TaxID=314565;
>> > RN   [1]
>> > RP   NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
>> > RX   PubMed=15899963; DOI=10.1101/gr.3378705;
>> > RA   Qian W., Jia Y., Ren S.-X., He Y.-Q., Feng J.-X., Lu L.-F., Sun Q.,
>> > RA   Ying G., Tang D.-J., Tang H., Wu W., Hao P., Wang L., Jiang B.-L.,
>> > RA   Zeng S., Gu W.-Y., Lu G., Rong L., Tian Y., Yao Z., Fu G., Chen B.,
>> > RA   Fang R., Qiang B., Chen Z., Zhao G.-P., Tang J.-L., He C.;
>> > RT   "Comparative and functional genomic analyses of the pathogenicity
>> of
>> > RT   phytopathogen Xanthomonas campestris pv. campestris.";
>> > RL   Genome Res. 15:757-767(2005).
>> > CC   -!- SIMILARITY: Belongs to the UPF0085 family.
>> > CC   ------------------------------------------------------------
>> > -----------
>> > CC   Copyrighted by the UniProt Consortium, see
>> > http://www.uniprot.org/terms
>> > CC   Distributed under the Creative Commons Attribution-NoDerivs License
>> > CC   ------------------------------------------------------------
>> > -----------
>> > DR   EMBL; CP000050; AAY49016.1; -; Genomic_DNA.
>> > DR   GenomeReviews; CP000050_GR; XC_1953.
>> > DR   KEGG; xcb:XC_1953; -.
>> > DR   GO; GO:0005524; F:ATP binding; IEA:HAMAP.
>> > DR   HAMAP; MF_01062; -; 1.
>> > DR   InterPro; IPR005177; DUF299.
>> > DR   Pfam; PF03618; DUF299; 1.
>> > KW   ATP-binding; Complete proteome; Nucleotide-binding.
>> > FT   CHAIN         1    273       UPF0085 protein XC_1953.
>> > FT                                /FTId=PRO_0000196744.
>> > FT   NP_BIND     153    160       ATP (Potential).
>> > SQ   SEQUENCE   273 AA;  30853 MW;  604FB6C6437A9D90 CRC64;
>> >      MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE
>> >      RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF
>> >      ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT
>> >      EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF
>> >      QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF
>> > //
>> >
>>
>> Needless to say, there is a considerable loss of information.
>>
>> At first I wasn't sure if this was a problem with parsing the database
>> that
>> I had, so I inspected the object that was retrieved from the database.  As
>> I
>> mentioned before, the parsing seems to be working fine.  I get a
>> SimpleSequence object that has all of the correct annotations and other
>> information loaded into it.
>>
>> I then continued to step through the writeUniProt method in
>> RichSequence.IOTools and found that this method first calls "enrich" on
>> SimpleSequence which turns it into a SimpleRichSequence.  There appears to
>> be some loss of information at this point, specifically in the feature set
>> where the 'key name' is lost -- it just becomes 'any'.
>>
>> It is when we get to the actual process of writing to the stream in
>> UniprotFormat.writeSequence that we have the problems.  All of the code
>> appears to be there for printing the information out that I'm expecting.
>>  I
>> think the problem is that in the process of "enrich"-ing the sequence, the
>> data is still stored in the object, but it is no longer where it is
>> expected
>> to be.  For example, when we get to writing the comments out:
>>        // comments - if any
>>        if (!rs.getComments().isEmpty()) {
>>
>> The List of comments IS empty, but there are comments in the
>> SimpleRichSequence, they are stored in the notes data member.
>>
>> So.  After this lengthy explanation of my problem, I am wondering if I am
>> merely not doing this correctly.  Is there a better way to pass my
>> information to the writeUniprot method -- should I be transforming my
>> SimpleSequence objects into a SimpleRichSequence manually?  Am I just
>> going
>> about this entirely the wrong way?
>>
>> If I am going about this correctly and the functionality to do this is
>> merely not there or hasn't been implemented correctly, I would be more
>> than
>> happy to help out...  I can supply patches, create bug reports, or
>> anything
>> else that is necessary.
>>
>> Any guidance in this matter would be greatly appreciated!
>>
>> --
>> Franklin
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
>
>
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>


-- 
Franklin


From holland at eaglegenomics.com  Mon Oct 20 13:51:36 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 20 Oct 2008 14:51:36 +0100
Subject: [Biojava-dev] BioJava3 contribution
In-Reply-To: <B6195C9D-2722-4B2A-A197-79C83ADF2B37@ibmc.u-strasbg.fr>
References: <B6195C9D-2722-4B2A-A197-79C83ADF2B37@ibmc.u-strasbg.fr>
Message-ID: <a0d826f40810200651v30b8161ka20de9ebba28f96b@mail.gmail.com>

Excellent! Thanks for your offer of help!

Yes, an advanced RNA module would be very helpful indeed. You should
probably call it 'rna'.

As long as everyone who intends to work on BJ3 declares their intentions
here, as you just have, then basically it's first come first served. I won't
be doing any official supervision other than keeping an eye on committed
code once in a while to make sure it all looks OK. So feel free to start
coding straight away!

All new modules should probably be started as follows:

1. Copy the existing dna module to something new, like 'rna' in this
case.
2. Remove all the hidden .svn directories from the copy.
3. Update the pom.xml in the copy (do a search-and-replace on dna and change
to the new name, rna in this case), delete the existing source packages in
src/main/java (org.biojava.dna) and create suitable new ones
(org.biojava.rna in this case).
4. Empty out the target/ folder, then svn add the new module.
5. Set svn:ignore on the target/ directory in your new module.
6. Include your new module in the list at the end of the pom.xml in the root
directory of the biojava3 branch.

cheers,
Richard


2008/10/20 Fabrice Jossinet <f.jossinet at ibmc.u-strasbg.fr>

> Dear Richard,
>
> I'm answering to your "official call", to propose you my help for the
> development of the biojava3 code. With the modularity of Maven, I also would
> like to proposes you my help for the development of a module that will use
> the biojava3 code to manage more specialized RNA stuff (secondary and
> tertiary structures, base-pairs classifications, modified nucleotides, RNA
> alignments,....).
>
> What will be the next step for me? Will you make a selection?
>
> Best Regards
>
> Fabrice Jossinet
>
> --
> Dr. Fabrice Jossinet
> Laboratoire de Bioinformatique, modelisation et simulation des acides
> nucleiques
> Universite Louis Pasteur
> Institut de biologie moleculaire et cellulaire du CNRS
> UPR9002, Architecture et Reactivite de l'ARN
> 15 rue Rene Descartes
> F-67084 Strasbourg Cedex
> France
>
> Tel + 33 (0) 3 88 417053
> FAX + 33 (0) 3 88 60 22 18
>
> f.jossinet at ibmc.u-strasbg.fr
> fjossinet at gmail.com
> http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
> http://fjossinet.u-strasbg.fr/
>
>
>
>
>


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From holland at eaglegenomics.com  Mon Oct 20 14:17:34 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 20 Oct 2008 15:17:34 +0100
Subject: [Biojava-dev] Writing Swissprot/Uniprot formatted files
In-Reply-To: <50a7756d0810200636l4355f3cbj367b155e573e1612@mail.gmail.com>
References: <50a7756d0810171158k51aa3ee4l5f7078321633ebc5@mail.gmail.com>
	<a0d826f40810171308s5b2788aah27a982cb6a3b45e@mail.gmail.com>
	<50a7756d0810200636l4355f3cbj367b155e573e1612@mail.gmail.com>
Message-ID: <a0d826f40810200717l2d1a2373n756dbd1d083eaa3a@mail.gmail.com>

Wow, I didn't know anyone was actually using the registry thing. I certainly
never have! That's probably why it was left out of the whole update to
RichSequences. There will probably be equivalent functionality in BioJava3
at some point but I doubt anyone will backport the RichSequence updates to
the existing registry setup (unless there's any volunteers!).

Good luck with the conversion process.

cheers,
Richard

2008/10/20 Franklin Bristow <fbristow at gmail.com>

> Hi Richard,
> I'm getting my records from an indexed flat file.  I indexed the file using
> IndexTools.indexSwissprot().  I am then retrieving the records from the flat
> file "database" using the SequenceDBLite interface which is being provided
> to me using the Registry and SystemRegistry classes.  The following is a simple
> example of what I am doing:
>
> First I index the flat file:
>
>> File[] files = new File[] { new
>> File("/home/fbristow/db/uniprot_sprot.dat") };
>> try {
>>       IndexTools.indexSwissprot("uniprot_sprot", new
>> File("/home/fbristow/db/index/uniprot_sprot"), files);
>> } catch (BioException bioE) {
>>       bioE.printStackTrace();
>> } catch (ParserException parseE) {
>>       parseE.printStackTrace();
>> } catch (IOException ioE) {
>>       ioE.printStackTrace();
>> }
>
>
> Then I get a handle on that file by doing:
>
>> Registry registry = SystemRegistry.instance();
>> setSwissDatabase(registry.getDatabase("swissprot"));
>>
>
> And I have a file in /etc that tells the registry how to find the indexes
> with the swissprot identifier as per
> http://biojava.org/docs/api/org/biojava/directory/SystemRegistry.html
>
> Ultimately, this gives me a class that implements the interface
> SequenceDBLite, and when I query this interface for sequences it returns to
> me Sequence objects.  I can't seem to see anything that would give me a
> RichSequence, so I think that I'll continue to get them in this manner, but
> I'll convert the Sequence objects into RichSequence objects myself.
>
> Thanks for your attention!
>
>
> On Fri, Oct 17, 2008 at 3:08 PM, Richard Holland <
> holland at eaglegenomics.com> wrote:
>
>> Hello.
>>
>> I'm not sure how you're getting your uniprot records out of your swissprot
>> database, or what format your swissprot database is in? If it's BioSQL, then
>> the way BioJava interacts with it has altered significantly with BioJavaX -
>> previous versions basically stuffed everything in as comments, hence all the
>> XX lines you got when writing it back out again. However if it's not BioSQL
>> and you've written something custom of your own, then I couldn't really
>> comment!
>>
>> BioJavaX will attempt to convert the old sequence objects into rich
>> sequence objects, but there's not much in common between the way uniprot
>> data is stored in the old object model and the new one. Therefore the enrich
>> method can't do a very good job - especially for stuff which the original
>> parser stored as comments instead of properly distributing it across the
>> object model. Data which the original parser stored in this comment format
>> will mostly get ignored by the conversion process, because the conversion
>> process has no idea where the record came from and therefore what to do with
>> the comments inside it.
>>
>> Your best bet is to read your data out of your database directly as rich
>> sequence objects, or if not possible, then do the conversion manually.
>>
>> cheers,
>> Richard
>>
>>
>> 2008/10/17 Franklin Bristow <fbristow at gmail.com>
>>
>>> Hello everyone,
>>> I've been doing some work with swissprot, and I've been needing to make
>>> use
>>> of the file reading and writing facilities in biojava.
>>>
>>> I was using biojava 1.5, but I've recently moved to using biojava-live so
>>> that I can actually step through the code to see what's going on.
>>>
>>> I have successfully created an index of my swissprot database and I can
>>> read
>>> my sequences out of that indexed database.  All of the appropriate
>>> information is loaded from the records in the file into the appropriate
>>> objects.  I am quite happy with this.
>>>
>>> The problem that I am having has to do with writing swissprot records.
>>>
>>> When I started using biojava, the recommended way to do this was using
>>> SeqIOTools:
>>> SeqIOTools.writeSwissprot(byteStream, swissSequence);
>>>
>>> While this works (ie: no exceptions are thrown), the record that is
>>> printed
>>> to the byteStream looks pretty ugly (it's littered with XX lines) and is
>>> not
>>> valid as per the current swissprot file spec (
>>> http://www.expasy.ch/sprot/userman.html).  While this record is invalid,
>>> it
>>> does contain all of the information that was originally in the swissprot
>>> file.  I would include what I get as an output here, but it's irrelevant.
>>>
>>> SeqIOTools became deprecated in favour of this:
>>> RichSequence.IOTools.writeUniProt(byteStream, swissSequence, null);
>>>
>>> Once again, while this works (and this time the record is valid), the
>>> record
>>> that is printed contains almost none of the original information that is
>>> contained in the swissprot record.  This is the output that I get when I
>>> call this method (the spacing may not look right because of fonts, but
>>> that is not the problem):
>>>
>>> ID   Q4UVA7_null             STANDARD;         273 AA.
>>> > AC   Q4UVA7;
>>> > DT   null, integrated into UniProtKB/?.
>>> > DT   null, sequence version 0.
>>> > DT   null, entry version 0.
>>> > DE   null.
>>> > FT   any           1    273
>>> > FT   any         153    160
>>> > SQ   SEQUENCE   273 AA;  30853 MW;  604FB6C6437A9D90 CRC64;
>>> >      MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE
>>> >      RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF
>>> >      ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT
>>> >      EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF
>>> >      QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF
>>> > //
>>> >
>>>
>>> But what I am expecting to see looks like this (again, the spacing is the
>>> fault of the font, not the output):
>>>
>>> > ID   Y1953_XANC8             Reviewed;         273 AA.
>>> > AC   Q4UVA7;
>>> > DT   10-JAN-2006, integrated into UniProtKB/Swiss-Prot.
>>> > DT   05-JUL-2005, sequence version 1.
>>> > DT   06-FEB-2007, entry version 12.
>>> > DE   UPF0085 protein XC_1953.
>>> > GN   OrderedLocusNames=XC_1953;
>>> > OS   Xanthomonas campestris pv. campestris (strain 8004).
>>> > OC   Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales;
>>> > OC   Xanthomonadaceae; Xanthomonas.
>>> > OX   NCBI_TaxID=314565;
>>> > RN   [1]
>>> > RP   NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
>>> > RX   PubMed=15899963; DOI=10.1101/gr.3378705;
>>> > RA   Qian W., Jia Y., Ren S.-X., He Y.-Q., Feng J.-X., Lu L.-F., Sun
>>> Q.,
>>> > RA   Ying G., Tang D.-J., Tang H., Wu W., Hao P., Wang L., Jiang B.-L.,
>>> > RA   Zeng S., Gu W.-Y., Lu G., Rong L., Tian Y., Yao Z., Fu G., Chen
>>> B.,
>>> > RA   Fang R., Qiang B., Chen Z., Zhao G.-P., Tang J.-L., He C.;
>>> > RT   "Comparative and functional genomic analyses of the pathogenicity
>>> of
>>> > RT   phytopathogen Xanthomonas campestris pv. campestris.";
>>> > RL   Genome Res. 15:757-767(2005).
>>> > CC   -!- SIMILARITY: Belongs to the UPF0085 family.
>>> > CC   ------------------------------------------------------------
>>> > -----------
>>> > CC   Copyrighted by the UniProt Consortium, see
>>> > http://www.uniprot.org/terms
>>> > CC   Distributed under the Creative Commons Attribution-NoDerivs
>>> License
>>> > CC   ------------------------------------------------------------
>>> > -----------
>>> > DR   EMBL; CP000050; AAY49016.1; -; Genomic_DNA.
>>> > DR   GenomeReviews; CP000050_GR; XC_1953.
>>> > DR   KEGG; xcb:XC_1953; -.
>>> > DR   GO; GO:0005524; F:ATP binding; IEA:HAMAP.
>>> > DR   HAMAP; MF_01062; -; 1.
>>> > DR   InterPro; IPR005177; DUF299.
>>> > DR   Pfam; PF03618; DUF299; 1.
>>> > KW   ATP-binding; Complete proteome; Nucleotide-binding.
>>> > FT   CHAIN         1    273       UPF0085 protein XC_1953.
>>> > FT                                /FTId=PRO_0000196744.
>>> > FT   NP_BIND     153    160       ATP (Potential).
>>> > SQ   SEQUENCE   273 AA;  30853 MW;  604FB6C6437A9D90 CRC64;
>>> >      MSTIRPVFYV SDGTGITAET IGHSLLTQFS GFNFVTDRMS FIDDADKARD AALRVRAAGE
>>> >      RYQVRPVVVN SCVDPQLSMI LAESGALMLD VFAPFIEPLE RELNAPRHSR VGRAHGMVDF
>>> >      ETYHRRINAM NFALSHDDGI ALNYDEADVI LVAVSRAGKT PTCIYLALHY GIRAANYPLT
>>> >      EEDLESERLP PRLRNYRSKL FGLTIDPERL QQIRQERRAN SRYSAAETCR REVATAERMF
>>> >      QMERIPTLST TNTSIEEISS KVLSTLGLQR EMF
>>> > //
>>> >
>>>
>>> Needless to say, there is a considerable loss of information.
>>>
>>> At first I wasn't sure if this was a problem with parsing the database
>>> that
>>> I had, so I inspected the object that was retrieved from the database.
>>>  As I
>>> mentioned before, the parsing seems to be working fine.  I get a
>>> SimpleSequence object that has all of the correct annotations and other
>>> information loaded into it.
>>>
>>> I then continued to step through the writeUniProt method in
>>> RichSequence.IOTools and found that this method first calls "enrich" on
>>> SimpleSequence which turns it into a SimpleRichSequence.  There appears
>>> to
>>> be some loss of information at this point, specifically in the feature
>>> set
>>> where the 'key name' is lost -- it just becomes 'any'.
>>>
>>> It is when we get to the actual process of writing to the stream in
>>> UniprotFormat.writeSequence that we have the problems.  All of the code
>>> appears to be there for printing the information out that I'm expecting.
>>>  I
>>> think the problem is that in the process of "enrich"-ing the sequence,
>>> the
>>> data is still stored in the object, but it is no longer where it is
>>> expected
>>> to be.  For example, when we get to writing the comments out:
>>>        // comments - if any
>>>        if (!rs.getComments().isEmpty()) {
>>>
>>> The List of comments IS empty, but there are comments in the
>>> SimpleRichSequence, they are stored in the notes data member.
>>>
>>> So.  After this lengthy explanation of my problem, I am wondering if I am
>>> merely not doing this correctly.  Is there a better way to pass my
>>> information to the writeUniprot method -- should I be transforming my
>>> SimpleSequence objects into a SimpleRichSequence manually?  Am I just
>>> going
>>> about this entirely the wrong way?
>>>
>>> If I am going about this correctly and the functionality to do this is
>>> merely not there or hasn't been implemented correctly, I would be more
>>> than
>>> happy to help out...  I can supply patches, create bug reports, or
>>> anything
>>> else that is necessary.
>>>
>>> Any guidance in this matter would be greatly appreciated!
>>>
>>> --
>>> Franklin
>>> _______________________________________________
>>> biojava-dev mailing list
>>> biojava-dev at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>
>>
>>
>>
>> --
>> Richard Holland, BSc MBCS
>> Finance Director, Eagle Genomics Ltd
>> M: +44 7500 438846 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>
>
>
>
> --
> Franklin
>


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From f.jossinet at ibmc.u-strasbg.fr  Mon Oct 20 13:04:29 2008
From: f.jossinet at ibmc.u-strasbg.fr (Fabrice Jossinet)
Date: Mon, 20 Oct 2008 15:04:29 +0200
Subject: [Biojava-dev] BioJava3 contribution
Message-ID: <B6195C9D-2722-4B2A-A197-79C83ADF2B37@ibmc.u-strasbg.fr>

Dear Richard,

I'm answering your "official call" to offer my help with the
development of the biojava3 code. Given the modularity of Maven, I would
also like to offer my help with the development of a module that will
use the biojava3 code to manage more specialized RNA stuff (secondary
and tertiary structures, base-pair classifications, modified
nucleotides, RNA alignments, ...).

What will be the next step for me? Will you make a selection?

Best Regards

Fabrice Jossinet

--
Dr. Fabrice Jossinet
Laboratoire de Bioinformatique, modelisation et simulation des acides
nucleiques
Universite Louis Pasteur
Institut de biologie moleculaire et cellulaire du CNRS
UPR9002, Architecture et Reactivite de l'ARN
15 rue Rene Descartes
F-67084 Strasbourg Cedex
France

Tel + 33 (0) 3 88 417053
FAX + 33 (0) 3 88 60 22 18

f.jossinet at ibmc.u-strasbg.fr
fjossinet at gmail.com
http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
http://fjossinet.u-strasbg.fr/


From andreas at sdsc.edu  Mon Oct 20 19:18:48 2008
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 20 Oct 2008 12:18:48 -0700
Subject: [Biojava-dev] BioJava3 contribution
In-Reply-To: <a0d826f40810200651v30b8161ka20de9ebba28f96b@mail.gmail.com>
References: <B6195C9D-2722-4B2A-A197-79C83ADF2B37@ibmc.u-strasbg.fr>
	<a0d826f40810200651v30b8161ka20de9ebba28f96b@mail.gmail.com>
Message-ID: <59a41c430810201218n194660e2udb17be18f8029779@mail.gmail.com>

Hi Fabrice,

Regarding the tertiary structure representation, we should work
together. There is a set of tools already available in the current
biojava 1.7, which I was intending to maintain and migrate to biojava
v3. Let me know if you have specific RNA-related requests...

Andreas

On Mon, Oct 20, 2008 at 6:51 AM, Richard Holland
<holland at eaglegenomics.com> wrote:
> Excellent! Thanks for your offer of help!
>
> Yes, an advanced RNA module would be very helpful indeed. You should
> probably call it 'rna'.
>
> As long as everyone who intends to work on BJ3 declares their intentions
> here, as you just have, then basically it's first come first served. I won't
> be doing any official supervision other than keeping an eye on committed
> code once in a while to make sure it all looks OK. So feel free to start
> coding straight away!
>
> All new modules should probably start by:
>
> 1. copying the existing dna module to something new, like 'rna' in this
> case.
> 2. remove all the hidden .svn directories from the copy,
> 3. update the pom.xml in the copy (do a search-and-replace on dna and change
> to the new name, rna in this case), delete the existing source packages in
> src/main/java (org.biojava.dna) and create suitable new ones
> (org.biojava.rna in this case).
> 4. empty out the target/ folder then svn add the new module
> 5. svn:ignore the target/ directory in your new module,
> 6. include your new module in the list at the end of the pom.xml in the root
> directory of the biojava3 branch.
>
> cheers,
> Richard
>
>
>
> 2008/10/20 Fabrice Jossinet <f.jossinet at ibmc.u-strasbg.fr>
>
>> Dear Richard,
>>
>> I'm answering to your "official call", to propose you my help for the
>> development of the biojava3 code. With the modularity of Maven, I also would
>> like to proposes you my help for the development of a module that will use
>> the biojava3 code to manage more specialized RNA stuff (secondary and
>> tertiary structures, base-pairs classifications, modified nucleotides, RNA
>> alignments,....).
>>
>> What will be the next step for me? Will you make a selection?
>>
>> Best Regards
>>
>> Fabrice Jossinet
>>
>> --
>> Dr. Fabrice Jossinet
>> Laboratoire de Bioinformatique, modelisation et simulation des acides
>> nucleiques
>> Universite Louis Pasteur
>> Institut de biologie moleculaire et cellulaire du CNRS
>> UPR9002, Architecture et Reactivite de l'ARN
>> 15 rue Rene Descartes
>> F-67084 Strasbourg Cedex
>> France
>>
>> Tel + 33 (0) 3 88 417053
>> FAX + 33 (0) 3 88 60 22 18
>>
>> f.jossinet at ibmc.u-strasbg.fr
>> fjossinet at gmail.com
>> http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
>> http://fjossinet.u-strasbg.fr/
>>
>>
>>
>>
>>
>
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>


From fjossinet at orange.fr  Mon Oct 20 20:40:26 2008
From: fjossinet at orange.fr (Fabrice Jossinet)
Date: Mon, 20 Oct 2008 22:40:26 +0200
Subject: [Biojava-dev] BioJava3 contribution
In-Reply-To: <59a41c430810201218n194660e2udb17be18f8029779@mail.gmail.com>
References: <B6195C9D-2722-4B2A-A197-79C83ADF2B37@ibmc.u-strasbg.fr>
	<a0d826f40810200651v30b8161ka20de9ebba28f96b@mail.gmail.com>
	<59a41c430810201218n194660e2udb17be18f8029779@mail.gmail.com>
Message-ID: <086C2EC4-C9AD-4C00-B348-F7D781C0F3EC@orange.fr>

Hi Andreas,

yes of course, I would really like to work with you (I like your work
with SPICE). I wanted to contact you about this point before starting.
Concerning the tertiary structure representation, I need to annotate
an RNA tertiary structure with base-pair families (as described in
http://www.ncbi.nlm.nih.gov/pubmed/12177293 or in
http://prion.bchs.uh.edu/bp_type/ ) and structural motifs (like those
listed in the SCOR database,
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=308814).
The idea is to attach these features to a 3D structure in the same way
as features are attached to a sequence (1D).
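To make the idea a little more concrete, here is a rough sketch in plain
Java. Everything in it is an assumption made for illustration: the class
names (ResidueFeatureSketch, AnnotatedStructureSketch), the use of 1-based
residue numbers and the free-text type labels are hypothetical and are not
existing BioJava classes. It only shows a feature being attached to a set
of residues of a structure, much as a sequence feature is attached to a
location.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Demo: annotate a toy structure and list the features touching one residue. */
class StructureFeatureDemo {
    public static void main(String[] args) {
        AnnotatedStructureSketch rna = new AnnotatedStructureSketch(12);
        rna.addFeature(new ResidueFeatureSketch("base-pair family (label)", Arrays.asList(1, 12)));
        rna.addFeature(new ResidueFeatureSketch("structural motif (label)", Arrays.asList(5, 6, 7, 8)));
        for (ResidueFeatureSketch f : rna.featuresAt(6)) {
            System.out.println(f.getType() + " " + f.getResidues());
        }
    }
}

/** Hypothetical feature: a typed annotation over a set of residue numbers. */
class ResidueFeatureSketch {
    private final String type;
    private final List<Integer> residues;

    ResidueFeatureSketch(String type, List<Integer> residues) {
        this.type = type;
        // Defensive copy so the feature cannot change after it is attached.
        this.residues = Collections.unmodifiableList(new ArrayList<Integer>(residues));
    }

    String getType() { return type; }
    List<Integer> getResidues() { return residues; }
}

/** Hypothetical structure holder; coordinates are omitted, only the residue count is kept. */
class AnnotatedStructureSketch {
    private final int residueCount;
    private final List<ResidueFeatureSketch> features = new ArrayList<ResidueFeatureSketch>();

    AnnotatedStructureSketch(int residueCount) {
        this.residueCount = residueCount;
    }

    /** Attaches a feature after checking that every residue number exists (1-based). */
    void addFeature(ResidueFeatureSketch feature) {
        for (int r : feature.getResidues()) {
            if (r < 1 || r > residueCount) {
                throw new IllegalArgumentException("No such residue: " + r);
            }
        }
        features.add(feature);
    }

    /** Returns every feature that involves the given residue. */
    List<ResidueFeatureSketch> featuresAt(int residue) {
        List<ResidueFeatureSketch> hits = new ArrayList<ResidueFeatureSketch>();
        for (ResidueFeatureSketch f : features) {
            if (f.getResidues().contains(residue)) {
                hits.add(f);
            }
        }
        return hits;
    }
}
```

A base pair is then just a feature over two residues and a motif a feature
over several, so both can be queried through the same featuresAt call.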

What do you think?

Fabrice

On 20 Oct 2008, at 21:18, Andreas Prlic wrote:

> Hi Fabrice,
>
> Regarding the tertiary structure representation we should work
> together. There is a set of tools available already in the current
> biojava 1.7 which I was intending to maintain and migrate to biojava v
> 3. Let me know if you have specific RNA related requests...
>
> Andreas
>
> On Mon, Oct 20, 2008 at 6:51 AM, Richard Holland
> <holland at eaglegenomics.com> wrote:
>> Excellent! Thanks for your offer of help!
>>
>> Yes, an advanced RNA module would be very helpful indeed. You should
>> probably call it 'rna'.
>>
>> As long as everyone who intends to work on BJ3 declares their intentions
>> here, as you just have, then basically it's first come first served. I won't
>> be doing any official supervision other than keeping an eye on committed
>> code once in a while to make sure it all looks OK. So feel free to start
>> coding straight away!
>>
>> All new modules should probably start by:
>>
>> 1. copying the existing dna module to something new, like 'rna' in this
>> case.
>> 2. remove all the hidden .svn directories from the copy,
>> 3. update the pom.xml in the copy (do a search-and-replace on dna and change
>> to the new name, rna in this case), delete the existing source packages in
>> src/main/java (org.biojava.dna) and create suitable new ones
>> (org.biojava.rna in this case).
>> 4. empty out the target/ folder then svn add the new module
>> 5. svn:ignore the target/ directory in your new module,
>> 6. include your new module in the list at the end of the pom.xml in the root
>> directory of the biojava3 branch.
>>
>> cheers,
>> Richard
>>
>>
>>
>> 2008/10/20 Fabrice Jossinet <f.jossinet at ibmc.u-strasbg.fr>
>>
>>> Dear Richard,
>>>
>>> I'm answering to your "official call", to propose you my help for the
>>> development of the biojava3 code. With the modularity of Maven, I also would
>>> like to proposes you my help for the development of a module that will use
>>> the biojava3 code to manage more specialized RNA stuff (secondary and
>>> tertiary structures, base-pairs classifications, modified nucleotides, RNA
>>> alignments,....).
>>>
>>> What will be the next step for me? Will you make a selection?
>>>
>>> Best Regards
>>>
>>> Fabrice Jossinet
>>>
>>> --
>>> Dr. Fabrice Jossinet
>>> Laboratoire de Bioinformatique, modelisation et simulation des acides
>>> nucleiques
>>> Universite Louis Pasteur
>>> Institut de biologie moleculaire et cellulaire du CNRS
>>> UPR9002, Architecture et Reactivite de l'ARN
>>> 15 rue Rene Descartes
>>> F-67084 Strasbourg Cedex
>>> France
>>>
>>> Tel + 33 (0) 3 88 417053
>>> FAX + 33 (0) 3 88 60 22 18
>>>
>>> f.jossinet at ibmc.u-strasbg.fr
>>> fjossinet at gmail.com
>>> http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
>>> http://fjossinet.u-strasbg.fr/
>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Richard Holland, BSc MBCS
>> Finance Director, Eagle Genomics Ltd
>> M: +44 7500 438846 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>


--
Dr. Fabrice Jossinet
Laboratoire de Bioinformatique, modelisation et simulation des acides
nucleiques
Universite Louis Pasteur
Institut de biologie moleculaire et cellulaire du CNRS
UPR9002, Architecture et Reactivite de l'ARN
15 rue Rene Descartes
F-67084 Strasbourg Cedex
France

Tel + 33 (0) 3 88 417053
FAX + 33 (0) 3 88 60 22 18

f.jossinet at ibmc.u-strasbg.fr
fjossinet at gmail.com
http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
http://fjossinet.u-strasbg.fr/


From markjschreiber at gmail.com  Tue Oct 21 02:54:27 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 21 Oct 2008 10:54:27 +0800
Subject: [Biojava-dev] Biojava / BioSQL entity beans
Message-ID: <93b45ca50810201954k44ab0f65xb94a0214d8eb4e13@mail.gmail.com>

Hi -

Richard has kindly uploaded some JPA Entity beans that map to the
BioSQL database schema as a BioSQL module for BJ3.  These entity beans
were generated as part of the Tokyo webservices workshop.  As
Entities they are useful as POJOs as well as for data transfer via JPA
and JAXB, and can be used in EJB containers or a plain old JVM.  They
have no biological smarts and the intention was/is that these will be
provided by wrapping them in Bio-aware (and more thread safe) wrappers
that implement interfaces from other BJ3 modules.  In essence it is a
persistence layer.
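
A minimal sketch of that wrapper idea in plain Java, under stated
assumptions: BiosequenceEntitySketch and BiosequenceViewSketch are
hypothetical names invented for this illustration, not the classes in the
module, and the lookup table covers only a, c, g, t and the IUPAC code w
(a or t). The bean holds raw characters and nothing else; the wrapper is
where the biological reading of a symbol lives.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Demo: the same character is a plain 'w' in the bean and "a or t" through the wrapper. */
class WrapperEntityDemo {
    public static void main(String[] args) {
        BiosequenceEntitySketch entity = new BiosequenceEntitySketch();
        entity.setSeq("aw");
        BiosequenceViewSketch view = new BiosequenceViewSketch(entity);
        System.out.println("raw symbol in the entity: " + entity.getSeq().charAt(1));
        System.out.println("bases seen by the wrapper: " + view.basesAt(1));
    }
}

/** Hypothetical bean: public no-arg constructor, get/set, no biological logic. */
class BiosequenceEntitySketch implements Serializable {
    private static final long serialVersionUID = 1L;
    private String seq = "";

    public BiosequenceEntitySketch() { }

    public String getSeq() { return seq; }
    public void setSeq(String seq) { this.seq = seq; }
}

/** Hypothetical wrapper: a read-only, biology-aware view onto the bean. */
class BiosequenceViewSketch {
    // Partial table for illustration only; the full IUPAC alphabet has more codes.
    private static final Map<Character, List<Character>> BASES = new HashMap<Character, List<Character>>();
    static {
        BASES.put('a', Collections.singletonList('a'));
        BASES.put('c', Collections.singletonList('c'));
        BASES.put('g', Collections.singletonList('g'));
        BASES.put('t', Collections.singletonList('t'));
        BASES.put('w', Arrays.asList('a', 't'));
    }

    private final BiosequenceEntitySketch entity;

    BiosequenceViewSketch(BiosequenceEntitySketch entity) {
        this.entity = entity;
    }

    /** Returns the concrete bases that the symbol at the given 0-based index may stand for. */
    List<Character> basesAt(int index) {
        char symbol = Character.toLowerCase(entity.getSeq().charAt(index));
        List<Character> bases = BASES.get(symbol);
        if (bases == null) {
            throw new IllegalArgumentException("Unsupported symbol: " + symbol);
        }
        return bases;
    }
}
```

Only the bean would be persisted or sent over the wire; the wrapper can be
recreated around it on the receiving side.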

The following is copied verbatim from the package-info.java and gives
you some idea of how I intend the package to be used (obviously some
of this is still to come).  There is also some discussion of the
gotchas that might trip you up when playing with object
relational persistence.

BTW the naming convention is to call something FooEntity. Where BioSQL
requires a compound primary key this is implemented as an Embeddable
object called FooEntityPK which is the key for FooEntity.  The other
thing you may see is FooEntityUK which is the same concept but
represents some of the cases where BioSQL tables don't have a primary
key (even a compound one) but implicitly they do because all the
fields have the SQL unique restriction. In these cases JPA still
requires an Embeddable key to track updates. As far as Java is
concerned they are the same as a FooEntityPK but I used a different
name to make the distinction.
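
As a sketch of what such a key class might look like, here is a plain-Java
version with no JPA dependency. The names FooEntitySketch and
FooEntityPKSketch and the fields parentId, rank and description are
hypothetical; in the real module the key would carry the JPA @Embeddable
annotation and the entity would reference it through an id field. The
entity's equals is written against the key only, which is the behaviour
the package documentation quoted below describes.

```java
import java.io.Serializable;
import java.util.Objects;

/** Demo: two beans with the same compound key are equal even if other fields differ. */
class CompoundKeyDemo {
    public static void main(String[] args) {
        FooEntitySketch first = new FooEntitySketch();
        first.setId(new FooEntityPKSketch(7, 1));
        first.setDescription("loaded in session A");

        FooEntitySketch second = new FooEntitySketch();
        second.setId(new FooEntityPKSketch(7, 1));
        second.setDescription("loaded in session B");

        System.out.println(first.equals(second)); // prints true
    }
}

/** Hypothetical compound key; under JPA this class would be annotated @Embeddable. */
class FooEntityPKSketch implements Serializable {
    private static final long serialVersionUID = 1L;
    private int parentId;
    private int rank;

    public FooEntityPKSketch() { }

    public FooEntityPKSketch(int parentId, int rank) {
        this.parentId = parentId;
        this.rank = rank;
    }

    public int getParentId() { return parentId; }
    public void setParentId(int parentId) { this.parentId = parentId; }
    public int getRank() { return rank; }
    public void setRank(int rank) { this.rank = rank; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FooEntityPKSketch)) return false;
        FooEntityPKSketch other = (FooEntityPKSketch) o;
        return parentId == other.parentId && rank == other.rank;
    }

    @Override
    public int hashCode() {
        return Objects.hash(parentId, rank);
    }
}

/** Hypothetical entity keyed by the compound key above. */
class FooEntitySketch implements Serializable {
    private static final long serialVersionUID = 1L;
    private FooEntityPKSketch id;
    private String description;

    public FooEntitySketch() { }

    public FooEntityPKSketch getId() { return id; }
    public void setId(FooEntityPKSketch id) { this.id = id; }
    public String getDescription() { return description; }
    public void setDescription(String description) { this.description = description; }

    /** Identity is the key alone; a bean without a key equals only itself. */
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FooEntitySketch)) return false;
        FooEntitySketch other = (FooEntitySketch) o;
        return id != null && id.equals(other.id);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id);
    }
}
```

A FooEntityUK would look the same as the key class here; only the name
signals that the underlying table has unique columns rather than a declared
primary key.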

The annotations provide mapping to tables from a Derby database. This
is the reference Java in-memory DB, which can run from any JVM and is
also found in Glassfish. The mappings will likely also work with
MySQL. For Oracle (and possibly others) you would need to override the
@GeneratedValue strategy for generating primary keys. I believe this
can be done with external XML config files. You may also wish to
override the default eager loading and cascade annotations depending on
your JPA persistence method and preferences.

This has been lightly tested using Glassfish, Derby and TopLink
Essentials; it is a work in progress but seems to work OK.

Best regards,

- Mark

/**
 * The package contains Entity representations of BioJava classes.
 * The purpose of these entities is to allow simple serialization of
BioJava data
 * using binary serialization for protocols that require this (eg RPC between
 * Java application servers) as well as persistence mechanisms that require bean
 * like ojbects such as the Java Persistence Architechture (JPA) or the
 * Java API for XML Binding (JAXB). For this reason all objects in this package
 * should provide a parameterless public constructor and public get/set methods
 * for relevant fields.
 * <p>
 * Given the public nature of the constructors and the setters in these beans
 * these classes are not intended for direct use in general programming when
 * using the BioJava v3 API. This is because it is possible to leave the bean in
 * an inconsistent state and they are <b>not thread safe</b> unless
 * synchronization is controlled externally (via synchronized blocks or via an
 * application container).
 * </p><p>
 * The Entities are intended to back other objects that a
 * programmer will interact with directly. For example <code>Foo.class</code>
 * will be backed by <code>FooEntity.class</code>. Generally interaction with
 * <code>Foo.class</code> is to be preferred and will often be more sensible as
 * the entities typically provide no 'biological behaviour'. Relevant
 * behaviour should be provided by the wrapping class. It is best
 * to think of <code>Foo</code> as a view onto the data that is held in the
 * <code>FooEntity</code>.  A good example is the sophisticated Symbol
 * behaviour that can represent biological logic about IUPAC ambiguity symbols.
 * For example a 'w' in a Biosequence represents an ambiguity between 'a' and
 * 't', whereas a 'w' in BiosequenceEntity is simply a 'w' and nothing else.
 * </p><p>
 * The wrapper entity pattern is intended to allow for a lot of the advanced
 * behaviour in the original BioJava while also allowing use of modern transport
 * and persistence packages. This is achieved by persisting and transporting the
 * entity without the wrapper and re-wrapping it at the other end.
 * </p><p>
 * Currently BioJava v3 uses annotated @Id fields to define
 * <code>equals(Object o)</code>. Consistent definition is critical to how
 * the object will behave when persisted to a database. In the case of:
 * <pre>
 * Foo f = ... initialize
 * Foo fo = ... initialize
 * boolean b = f.equals(fo);
 * </pre>
 * <code>b</code> would be true if both objects share the same value
 * (or embeddable object) in the field that represents the primary key in the
 * database <b>even</b> if all other fields differ. This is desirable because
 * two entities representing the same DB record may be retrieved from two
 * different sessions. Additionally these are the identity fields, so
 * logically they should map to the concept of identity. Finally, searching a
 * collection is made very simple without requiring an iterator:
 * <pre>
 * Integer id = //code to initialize
 * collection.contains(new Foo(id));
 * </pre>
 * By default BioJava v3 entities use <b>only</b> the primary key field for
 * equality. If either record has <code>null</code> as the primary key value it
 * is never equal to another. When implementing <code>equals(Object o)</code>
 * it is not advisable to perform the test
 * <code>this.getClass() == o.getClass()</code> because of the possibility of
 * proxy classes used in JPA. Using only the primary key can, however, lead to
 * an issue with the <code>hashCode()</code> method.  Consider the following
 * code:
 * <pre>
 * Foo foo = new Foo(); // no primary key
 * HashSet set = new HashSet();
 * set.add(foo);
 * // code here to persist foo and consequently generate its PK
 * boolean b = set.contains(foo);
 * </pre>
 * Because only the PK is used for equality, the PK is also used in the
 * hash code. This means that <code>b</code> is probably going to be false
 * because foo was stored in a hash bucket using the old hash code, which will
 * now be different, even though the set actually does contain a pointer to foo.
 * Although a potential deficiency it is unlikely to be a major problem for
 * BioJava v3 developers because using entity backed objects is preferred to
 * direct interaction with entities. If you need to use entities directly
 * then use hashed collections with caution.
 *
 * <p>A wrapper class can either delegate its equals call to the underlying
 * entity or it can do something that is more biologically sensible
 * (as PK values are typically not exposed in the wrapper). It is probably more
 * sensible for a wrapper to define its own <code>equals</code> (and
 * <code>hashCode</code>) implementations due to the limitations of the
 * default @Id based system described above, especially the potential hash
 * code problems.
 * </p><p>
 * For example <code>FooSequence.class</code> might want to base
 * equality on the exact match of the DNA sequence it holds even though
 * <code>FooSequenceEntity.class</code> may only use the PK field. Whether or
 * not delegation is used, it should be clearly documented.
 * </p>
 * @author Mark Schreiber
 */
package org.biojava.biosql.entity;
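
The equality rules in the package comment above can be sketched roughly as
follows. This is only an illustration: FooEntity, its Integer key and the
demo class are hypothetical names, not taken from the BioJava source, and
the JPA annotations are left out so the snippet compiles on a bare JDK.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical entity; in real code the id field would carry the JPA @Id annotation. */
class FooEntity {
    private Integer id; // primary key, null until the record is persisted

    public FooEntity() { }

    public FooEntity(Integer id) { this.id = id; }

    public Integer getId() { return this.id; }

    public void setId(Integer id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        // instanceof rather than getClass() == o.getClass(), so proxy subclasses still match
        if (!(o instanceof FooEntity)) return false;
        FooEntity other = (FooEntity) o;
        // a null primary key is never equal to another record
        return this.id != null && this.id.equals(other.id);
    }

    @Override
    public int hashCode() {
        // only the primary key contributes, mirroring equals
        return this.id == null ? 0 : this.id.hashCode();
    }
}

class EntityEqualityDemo {
    /** Adds an unsaved entity to a HashSet, assigns a key, then looks it up again. */
    static boolean foundAfterKeyAssigned() {
        FooEntity foo = new FooEntity();
        Set<FooEntity> set = new HashSet<FooEntity>();
        set.add(foo);
        foo.setId(Integer.valueOf(42)); // stands in for the provider generating the PK
        return set.contains(foo);
    }

    public static void main(String[] args) {
        System.out.println(new FooEntity(1).equals(new FooEntity(1))); // same key
        System.out.println(new FooEntity().equals(new FooEntity()));   // both keys null
        System.out.println(EntityEqualityDemo.foundAfterKeyAssigned());
    }
}
```

The last call shows the hashed-collection pitfall: the entity was filed
under the hash code it had before the key was assigned, so the lookup with
the new hash code misses it.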


From andreas at sdsc.edu  Tue Oct 21 03:17:28 2008
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 20 Oct 2008 20:17:28 -0700
Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please!
In-Reply-To: <a0d826f40810191718x4f89c210l27f20f2532ed24d3@mail.gmail.com>
References: <a0d826f40810191718x4f89c210l27f20f2532ed24d3@mail.gmail.com>
Message-ID: <59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com>

Hi,

Couple of thoughts regarding biojava v3:

License: Since it seems we will end up copying code from biojava 1.6
to biojava 3.0, we need to keep the license the same (LGPL 2.1). I.e.
people should still use the same biojava license headers when
committing new files and all code will be considered to be LGPL, if no
header is present. Do NOT commit code under other licenses.

Installation: We need some installation instructions on the wiki site,
e.g. how to get the maven setup running.  What are the code
conventions for the new version?

Blast: the Blast parsing modules are among the most frequently used
ones in biojava 1.6. To make people use biojava v3 it will be crucial
to have a port of them to the new version. Does anybody want to take
care of that?

Automated builds: is it interesting to have automated builds set up
for the new version at this stage, or should we wait until a more
mature stage? I could easily add another auto-build similar to the one
for biojava 1.6 at http://www.spice-3d.org/cruise/

Andreas

On Sun, Oct 19, 2008 at 5:18 PM, Richard Holland
<holland at eaglegenomics.com> wrote:
> Hi all,
>
> I've just committed some new code to the biojava3 branch of the biojava-live
> subversion repository. It's the foundations of a brand new alphabet+symbol
> set of classes, and an example of how to use them to represent DNA. You'll
> notice that the new code is very lightweight and allows for a lot more
> flexibility than the old code - for instance, the concept of Alphabet has
> changed radically. It also makes much more extensive use of the Collections
> API.
>
> I haven't got any test cases or usage examples yet but give me a shout if
> you don't understand the code and I'll explain how it works. (Hint:
> SymbolFormat is there to convert Strings into SymbolList objects, and vice
> versa).
>
> So, now we want some volunteers! We're starting from scratch here so there's
> a lot of work to do. The whole of BioJava needs 'translating' into BJ3,
> whether it be copy-and-paste existing classes and modify them to suit the
> new style, or write completely new ones to provide equivalent functionality.
>
>
> I'll post an example of how to do file parsing soon, probably starting with
> FASTA. In the meantime, a good place to start would be for people to design
> object models to represent their favourite data types (e.g. Genbank, or
> microarray data). Utility classes to manipulate those objects would be great
> too.
>
> The object models need to be normalised as much as possible - e.g. if your
> data has a lot of comments, and the order of those comments is important,
> then give your object model a collection of comment objects. The object
> model for each data type should be completely independent and use basic data
> types wherever possible (e.g. store sequences as strings, don't attempt to
> parse them into anything fancy like SymbolLists). The closer the object
> model is to the original data format, the better. There's going to be clever
> tricks when it comes to converting data between different object models
> (e.g. Genbank to INSDSeq), which I will explain later when I put the file
> parsing examples up.
>
> You'll notice how the biojava3 branch uses Maven instead of Ant. This is
> because we want to make it as modular as possible, so if you want to write
> microarray stuff, create a new microarray sub-project (as per the dna
> example that's already there). This way if someone only wants the microarray
> bit of BJ3, they only need install the appropriate JAR file and can ignore
> the rest. (The 'core' module is for stuff that is so generic it could be
> used anywhere, or is used in every single other module.)
>
> If coding isn't your cup of tea, then we would very much welcome testers
> (particularly those who enjoy writing test cases!), documenters
> (particularly code commenters), translators (for internationalisation of the
> code), and of course all those who wish to contribute ideas and suggestions
> no matter how off-the-wall they might be. In particular if you'd like to
> take charge of an area of the development process, e.g. Documentation Chief,
> or Protein Champion, then that would be much appreciated.
>
> I'm very much looking forward to working with everyone on this. Good luck,
> and happy coding!
>
> cheers,
> Richard
>
> PS. Please don't forget to attach the appropriate licence to your code. You
> can copy-and-paste it from the existing classes I just committed this
> evening.
>
> PPS. For those who are worried about backwards compatibility - this was
> discussed on the lists a while back and it was made clear that BJ3 is a
> clean break. However, the existing code will continue to be maintained and
> bugfixed for a couple of years so you don't have to upgrade if you don't
> want to - it just won't have any new features developed for it. This is
> largely because it'll probably take just that long to write all the new BJ3
> code. When we do decide to desupport the existing BJ code, plenty of notice
> will be given (i.e. years as opposed to months).
>
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>


From fjossinet at orange.fr  Tue Oct 21 07:09:46 2008
From: fjossinet at orange.fr (Fabrice Jossinet)
Date: Tue, 21 Oct 2008 09:09:46 +0200
Subject: [Biojava-dev] BioJava3 contribution
In-Reply-To: <a0d826f40810200651v30b8161ka20de9ebba28f96b@mail.gmail.com>
References: <B6195C9D-2722-4B2A-A197-79C83ADF2B37@ibmc.u-strasbg.fr>
	<a0d826f40810200651v30b8161ka20de9ebba28f96b@mail.gmail.com>
Message-ID: <CC8BE015-CF9C-4FA1-AF96-3626EDD83360@orange.fr>

Hi Richard,

I did everything but, with my IntelliJ IDE, I cannot commit the new
rna module due to an authentication failure. Do I have to register
somewhere to get an account? (Perhaps it is a misconfiguration
on my side.)

Fabrice

On 20 Oct 2008, at 15:51, Richard Holland wrote:

> Excellent! Thanks for your offer of help!
>
> Yes, an advanced RNA module would be very helpful indeed. You should  
> probably call it 'rna'.
>
> As long as everyone who intends to work on BJ3 declares their  
> intentions here, as you just have, then basically it's first come  
> first served. I won't be doing any official supervision other than  
> keeping an eye on committed code once in a while to make sure it all  
> looks OK. So feel free to start coding straight away!
>
> All new modules should probably start by:
>
> 1. copying the existing dna module to something new, like 'rna' in
> this case,
> 2. removing all the hidden .svn directories from the copy,
> 3. updating the pom.xml in the copy (do a search-and-replace on dna
> and change to the new name, rna in this case), deleting the existing
> source packages in src/main/java (org.biojava.dna) and creating
> suitable new ones (org.biojava.rna in this case),
> 4. emptying out the target/ folder, then running svn add on the new module,
> 5. setting svn:ignore on the target/ directory in your new module,
> 6. including your new module in the list at the end of the pom.xml in
> the root directory of the biojava3 branch.
>
> cheers,
> Richard
>
>
>
> 2008/10/20 Fabrice Jossinet <f.jossinet at ibmc.u-strasbg.fr>
> Dear Richard,
>
> I'm answering your "official call" to offer my help with the
> development of the biojava3 code. With the modularity of Maven,
> I would also like to offer my help with the development of a
> module that will use the biojava3 code to manage more specialized
> RNA work (secondary and tertiary structures, base-pair
> classifications, modified nucleotides, RNA alignments, ...).
>
> What will be the next step for me? Will you make a selection?
>
> Best Regards
>
> Fabrice Jossinet
>
> --
> Dr. Fabrice Jossinet
> Laboratoire de Bioinformatique, modelisation et simulation des acides
> nucleiques
> Universite Louis Pasteur
> Institut de biologie moleculaire et cellulaire du CNRS
> UPR9002, Architecture et Reactivite de l'ARN
> 15 rue Rene Descartes
> F-67084 Strasbourg Cedex
> France
>
> Tel + 33 (0) 3 88 417053
> FAX + 33 (0) 3 88 60 22 18
>
> f.jossinet at ibmc.u-strasbg.fr
> fjossinet at gmail.com
> http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
> http://fjossinet.u-strasbg.fr/
>
> -- 
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/


--
Dr. Fabrice Jossinet
Laboratoire de Bioinformatique, modelisation et simulation des acides
nucleiques
Universite Louis Pasteur
Institut de biologie moleculaire et cellulaire du CNRS
UPR9002, Architecture et Reactivite de l'ARN
15 rue Rene Descartes
F-67084 Strasbourg Cedex
France

Tel + 33 (0) 3 88 417053
FAX + 33 (0) 3 88 60 22 18

f.jossinet at ibmc.u-strasbg.fr
fjossinet at gmail.com
http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
http://fjossinet.u-strasbg.fr/


From holland at eaglegenomics.com  Tue Oct 21 09:06:41 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 21 Oct 2008 10:06:41 +0100
Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please!
In-Reply-To: <59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com>
References: <a0d826f40810191718x4f89c210l27f20f2532ed24d3@mail.gmail.com>
	<59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com>
Message-ID: <a0d826f40810210206m590f90e0t3669254273b56ef@mail.gmail.com>

>
>
> License: Since it seems we will end up copying code from biojava 1.6
> to biojava 3.0, we need to keep the license the same (LGPL 2.1). I.e.
> people should still use the same biojava license headers when
> committing new files and all code will be considered to be LGPL, if no
> header is present. Do NOT commit code under other licenses.
>
> Installation: We need some installation instructions on the wiki site,
> e.g. how to get the maven setup running.  What are the code
> conventions for the new version?


Not sure where best to put it in the Wiki, but I agree it needs to go there
somewhere.

Installation is a one-liner from within the top level of the project:

   mvn install

This compiles and installs the JARs into your local Maven repository, and
also downloads and installs any external dependencies. Then you can add the
installed modules as dependencies in your own Maven projects.

If you need to write a launcher script for your project, or you want to use
the JAR files outside Maven, you can use this command to generate the
CLASSPATH for use outside Maven. This only includes external dependencies -
you'll also need to add to it the individual JAR files from inside the
various target/ folders that Maven built for you:

  mvn dependency:build-classpath

Code conventions are simple:

1. I'm not fussed about the specific formatter people use in each module, as
long as the code is all formatted using some kind of consistent method. I
personally just use the default settings from Format code in NetBeans.

2. Use 'this' wherever possible, and for static references, use the
classname prefix (e.g. MyClass.staticField). I hate having to try and work
out in my head which references are going where, and which are static and
which are not!

3. Comment every single method, even if it's private. This helps understand
the flow of your code. Also comment liberally inside methods if they are
longer than just a few lines (i.e. if you can't fit the entire method within
the code panel in NetBeans, it's going to need internal comments).

4. When writing getters/setters, follow the Java beans conventions so that
automated frameworks like Spring can easily pick it up and work with it.

5. Please write tests for your code using JUnit conventions, inside the
test/ folder of each module. I know I haven't done this myself yet, but I'm
going to!
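
As a rough illustration of conventions 2 to 4 above, here is a small bean
written in that style. SequenceRecord and everything in it is a made-up
example, not a class from the biojava3 branch.

```java
/** Hypothetical bean illustrating the conventions; not part of BioJava. */
class SequenceRecord {

    /** Default name used until setName is called. */
    private static final String DEFAULT_NAME = "unnamed";

    private String name;
    private String sequence;

    /** Parameterless constructor, as bean-based frameworks expect. */
    public SequenceRecord() {
        // static reference uses the class name prefix, instance fields use 'this'
        this.name = SequenceRecord.DEFAULT_NAME;
        this.sequence = "";
    }

    /** @return the record name */
    public String getName() {
        return this.name;
    }

    /** @param name the record name */
    public void setName(String name) {
        this.name = name;
    }

    /** @return the raw sequence string, never null */
    public String getSequence() {
        return this.sequence;
    }

    /** @param sequence the raw sequence string; null is stored as empty */
    public void setSequence(String sequence) {
        this.sequence = (sequence == null) ? "" : sequence;
    }

    /** @return the fraction of G and C characters, or 0.0 for an empty sequence */
    public double getGcFraction() {
        if (this.sequence.isEmpty()) {
            return 0.0;
        }
        return (double) this.countGc() / this.sequence.length();
    }

    /** Counts G and C characters, ignoring case. Private, but still commented. */
    private int countGc() {
        int count = 0;
        for (int i = 0; i < this.sequence.length(); i++) {
            char c = Character.toUpperCase(this.sequence.charAt(i));
            if (c == 'G' || c == 'C') {
                count++;
            }
        }
        return count;
    }
}
```

A JUnit test for convention 5 would then live under the module's test
folder and exercise getGcFraction with a few known sequences.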


>
>
> Blast: the Blast parsing modules are among the most frequently used
> ones in biojava 1.6. To make people use biojava v3 it will be crucial
> to have a port of them to the new version. Does anybody want to take
> care of that?


I'll second that. Blast is vital. We'd really appreciate a volunteer,
please!


>
> Automated builds: is it interesting to have automated builds set up
> for the new version at this stage, or should we wait until a more
> mature stage? I could easily add another auto-build similar to the one
> for biojava 1.6 at http://www.spice-3d.org/cruise/


You could do, although I don't think they'd be much use yet. But why not
start early, so that we don't forget to do it later?


Richard




-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From holland at eaglegenomics.com  Tue Oct 21 09:09:26 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 21 Oct 2008 10:09:26 +0100
Subject: [Biojava-dev] BioJava3 contribution
In-Reply-To: <CC8BE015-CF9C-4FA1-AF96-3626EDD83360@orange.fr>
References: <B6195C9D-2722-4B2A-A197-79C83ADF2B37@ibmc.u-strasbg.fr>
	<a0d826f40810200651v30b8161ka20de9ebba28f96b@mail.gmail.com>
	<CC8BE015-CF9C-4FA1-AF96-3626EDD83360@orange.fr>
Message-ID: <a0d826f40810210209x698b786ag86414f58e97ef45d@mail.gmail.com>

Ah, yes. The person to talk to is Andreas. He has control over the SVN
repository.


2008/10/21 Fabrice Jossinet <fjossinet at orange.fr>

> Hi Richard,
> I did everything but, with my IntelliJ IDE, I cannot commit the new rna
> module due to an authentication failure. Do I have to register somewhere
> to get an account? (Perhaps it is a misconfiguration on my side.)
>
> Fabrice


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From markjschreiber at gmail.com  Tue Oct 21 09:26:41 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 21 Oct 2008 17:26:41 +0800
Subject: [Biojava-dev] [Biojava-l] BioJava 3 Begins - Volunteers please!
In-Reply-To: <a0d826f40810210206m590f90e0t3669254273b56ef@mail.gmail.com>
References: <a0d826f40810191718x4f89c210l27f20f2532ed24d3@mail.gmail.com>
	<59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com>
	<a0d826f40810210206m590f90e0t3669254273b56ef@mail.gmail.com>
Message-ID: <93b45ca50810210226t79cfbcbfhcadaedcfe8735676@mail.gmail.com>

>> Blast: the Blast parsing modules are among the most frequently used
>> ones in biojava 1.6. To make people use biojava v3 it will be crucial
>> to have a port of them to the new version. Does anybody want to take
>> care of that?
>
>
> I'll second that. Blast is vital. We'd really appreciate a volunteer,
> please!
>

BlastXML output would certainly be the easiest place to start. I also
think with the new Thing/ ThingBuilder framework it will be possible
to develop all manner of parsers for the vagaries of Blast text output
that come with each new release of Blast. Possible but maybe not a
good idea. I don't think that output was ever supposed to be machine
readable.  The table formatted output (-m8 I think) would be a better
option.
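
To give an idea of how little code the tabular route needs, here is a
minimal sketch of a line parser. It assumes the usual twelve tab-separated
columns of that format (query id, subject id, % identity, alignment
length, mismatches, gap openings, query start/end, subject start/end,
e-value, bit score). BlastTabularHit is a hypothetical class name and the
sketch is not tied to the Thing/ThingBuilder framework.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical value class for one line of tabular BLAST output. */
class BlastTabularHit {
    final String queryId;
    final String subjectId;
    final double percentIdentity;
    final int alignmentLength;
    final int mismatches;
    final int gapOpenings;
    final int queryStart;
    final int queryEnd;
    final int subjectStart;
    final int subjectEnd;
    final double eValue;
    final double bitScore;

    /** Builds a hit from exactly twelve column values. */
    private BlastTabularHit(String[] f) {
        this.queryId = f[0].trim();
        this.subjectId = f[1].trim();
        this.percentIdentity = Double.parseDouble(f[2].trim());
        this.alignmentLength = Integer.parseInt(f[3].trim());
        this.mismatches = Integer.parseInt(f[4].trim());
        this.gapOpenings = Integer.parseInt(f[5].trim());
        this.queryStart = Integer.parseInt(f[6].trim());
        this.queryEnd = Integer.parseInt(f[7].trim());
        this.subjectStart = Integer.parseInt(f[8].trim());
        this.subjectEnd = Integer.parseInt(f[9].trim());
        this.eValue = Double.parseDouble(f[10].trim());
        this.bitScore = Double.parseDouble(f[11].trim());
    }

    /** Parses one data line; rejects lines without twelve columns. */
    static BlastTabularHit parseLine(String line) {
        String[] fields = line.split("\t");
        if (fields.length != 12) {
            throw new IllegalArgumentException(
                    "expected 12 columns but found " + fields.length);
        }
        return new BlastTabularHit(fields);
    }

    /** Parses many lines, skipping blank lines and '#' comment lines. */
    static List<BlastTabularHit> parseAll(Iterable<String> lines) {
        List<BlastTabularHit> hits = new ArrayList<BlastTabularHit>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            hits.add(BlastTabularHit.parseLine(line));
        }
        return hits;
    }
}
```

Because every column has a fixed position and type, a parser like this
should not need changing when the human-readable report layout changes.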

Given the DTD it should be possible to do a quick JAXB binding. How
would that work in the Thing/ ThingBuilder paradigm?

- Mark


From holland at eaglegenomics.com  Tue Oct 21 10:18:40 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 21 Oct 2008 11:18:40 +0100
Subject: [Biojava-dev] [Biojava-l] BioJava 3 Begins - Volunteers please!
In-Reply-To: <93b45ca50810210226t79cfbcbfhcadaedcfe8735676@mail.gmail.com>
References: <a0d826f40810191718x4f89c210l27f20f2532ed24d3@mail.gmail.com>
	<59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com>
	<a0d826f40810210206m590f90e0t3669254273b56ef@mail.gmail.com>
	<93b45ca50810210226t79cfbcbfhcadaedcfe8735676@mail.gmail.com>
Message-ID: <a0d826f40810210318gbb8b352jd8468395a1926c48@mail.gmail.com>

JAXB would follow the exact same Thing/ThingBuilder pattern, but with the
following subtle differences...

0. Your root data model object as generated by JAXB should be modified to
implement Thing, making it a JAXBThing.
1. JAXBReader (extends ThingReader) would open and read the file using JAXB
and directly construct JAXBThings.
2. JAXBReceiver (extends ThingReceiver) would be a pass-through interface with
just one method, something like setJAXBThing() to pass in the already-parsed
JAXBThing directly.
3. Any converters would expand/deflate data from other formats to/from the
JAXBThing object directly.
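
In code, points 0 to 2 above might look something like the sketch below.
Every type here is a stand-in written for this example: the real Thing,
ThingReader and ThingReceiver in the biojava3 branch may have different
signatures, BlastOutput plays the part of a JAXB-generated root class, and
the unmarshalling step is faked so the snippet has no JAXB dependency.

```java
import java.io.Serializable;

/** Stand-in for the BJ3 marker interface. */
interface Thing extends Serializable { }

/** Stand-in for the BJ3 receiver base type. */
interface ThingReceiver { }

/** Stand-in for the BJ3 reader base type; the read() method is an assumption. */
abstract class ThingReader {
    abstract void read();
}

/** Plays the JAXB-generated root class, modified to implement Thing (point 0). */
class BlastOutput implements Thing {
    private static final long serialVersionUID = 1L;
    private String program;

    public BlastOutput() { }

    public String getProgram() { return this.program; }

    public void setProgram(String program) { this.program = program; }
}

/** Pass-through receiver with a single method (point 2). */
interface JAXBReceiver extends ThingReceiver {
    void setJAXBThing(BlastOutput thing);
}

/** Reader that hands the fully parsed object straight to the receiver (point 1). */
class JAXBReader extends ThingReader {
    private final JAXBReceiver receiver;
    private final String programName; // placeholder for the real input source

    JAXBReader(JAXBReceiver receiver, String programName) {
        this.receiver = receiver;
        this.programName = programName;
    }

    @Override
    void read() {
        // Real code would obtain this object from a JAXB Unmarshaller.
        BlastOutput parsed = new BlastOutput();
        parsed.setProgram(this.programName);
        this.receiver.setJAXBThing(parsed);
    }
}

/** Trivial receiver that just remembers what it was given. */
class CapturingReceiver implements JAXBReceiver {
    private BlastOutput received;

    public void setJAXBThing(BlastOutput thing) { this.received = thing; }

    BlastOutput getReceived() { return this.received; }
}

class JaxbPatternDemo {
    public static void main(String[] args) {
        CapturingReceiver receiver = new CapturingReceiver();
        new JAXBReader(receiver, "blastn").read();
        System.out.println(receiver.getReceived().getProgram());
    }
}
```

The point of the single-method receiver is that there are no per-field
events to forward: JAXB has already built the whole object by the time
the reader sees it.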


Richard.



-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From dicknetherlands at gmail.com  Tue Oct 21 11:14:29 2008
From: dicknetherlands at gmail.com (Richard Holland)
Date: Tue, 21 Oct 2008 12:14:29 +0100
Subject: [Biojava-dev] [Biojava-l] File parsing in BJ3
In-Reply-To: <93b45ca50810210335j5ef4a206y545e5a1869cedc03@mail.gmail.com>
References: <a0d826f40810201052o341f3c4cj10dd2765167ce19f@mail.gmail.com>
	<93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com>
	<a0d826f40810210134h56f9e0dv5abd988ecdd7b7b5@mail.gmail.com>
	<48FD97AB.70503@ebi.ac.uk>
	<93b45ca50810210335j5ef4a206y545e5a1869cedc03@mail.gmail.com>
Message-ID: <a0d826f40810210414u33ea8048p37ebe9d8fa357ea4@mail.gmail.com>

For now, yes it's empty. But I can envisage situations where it might be
nice to have Thing implement some common methods (e.g. isMachineGenerated(),
isManuallyCurated(), etc.). I'd rather have it there now to be a placeholder
for future expansion, than have to re-engineer everything should we identify
a need for common functions in future.

You'll see that Thing already extends Serializable, implying that all Things
must be able to persist to an object backing store. Serializable itself is
also an empty interface!

Also I like the idea of having Thing, not Object, as a kind of marker of
intention. To me it makes it clearer when reading code to avoid Object
wherever possible. Thing may not be any more clever than Object, but it
immediately declares an intention when reading code as to what kind of
Object should be expected.
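
A small sketch of what I mean by a marker of intention. The names below
(ThingBuilder's build() method, FastaRecord, FastaBuilder) are made up for
this example and are not the signatures in the biojava3 branch.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical marker interface: no methods, like Serializable itself. */
interface Thing extends Serializable { }

/** Hypothetical builder, bounded so that it can only produce Things. */
interface ThingBuilder<T extends Thing> {
    T build();
}

/** Example data type that declares itself to be a Thing. */
class FastaRecord implements Thing {
    private static final long serialVersionUID = 1L;
    private final String header;
    private final String sequence;

    FastaRecord(String header, String sequence) {
        this.header = header;
        this.sequence = sequence;
    }

    String getHeader() { return this.header; }

    String getSequence() { return this.sequence; }
}

/** Builder that accumulates a FASTA record piece by piece. */
class FastaBuilder implements ThingBuilder<FastaRecord> {
    private String header = "";
    private final StringBuilder sequence = new StringBuilder();

    void setHeader(String header) { this.header = header; }

    void appendSequence(String chunk) { this.sequence.append(chunk); }

    public FastaRecord build() {
        return new FastaRecord(this.header, this.sequence.toString());
    }
}

class MarkerDemo {
    /** Accepts any collection of Things; a List of Strings would not compile here. */
    static int count(List<? extends Thing> things) {
        return things.size();
    }

    public static void main(String[] args) {
        FastaBuilder builder = new FastaBuilder();
        builder.setHeader("seq1");
        builder.appendSequence("ACGT");
        builder.appendSequence("TTGA");

        List<Thing> things = new ArrayList<Thing>();
        things.add(builder.build());
        // things.add("not a thing");  // rejected by the compiler
        System.out.println(MarkerDemo.count(things));
    }
}
```

With an interface the compiler enforces the bound in generics and
collections. An annotation on the class would carry the same label, but
nothing would stop a non-annotated object being passed in until a runtime
check caught it.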


2008/10/21 Mark Schreiber <markjschreiber at gmail.com>

> Is there any need for Thing at all? Can't a builder be typed to produce
> something that extends Object?
>
> If Thing provides no behaviour contract or meta-information then why
> does it exist?
>
> - Mark
>
> On Tue, Oct 21, 2008 at 4:49 PM, Andy Yates <ayates at ebi.ac.uk> wrote:
> > Depends on what you want to program. If you want to have a collection of
> > objects which are Things & perform a common action on them then
> > annotations are not the way forward.
> >
> > If you want to have some kind of meta-programming occurring & need a
> > class to be multiple things then annotations are right. There is
> > currently no way to enforce compile time dependencies on annotations &
> > my thinking is that this is right. Annotations should be meta data or
> > provide a way to alter a class in a non-invasive way (think Web Service
> > annotations creating WS Servers & Clients without any alteration of the
> > class).
> >
> > Andy
> >
> > Richard Holland wrote:
> >> Spot on.
> >>
> >> Annotation/interface.... I think Annotation is probably better as you
> >> suggest, but I'd have to look into that. Not sure how it works with
> >> collections and generics. If it does turn out to be a better bet, I'll
> >> change it over.
> >>
> >> With the BioSQL dependencies, take a look at the pom.xml file inside the
> >> biojava-dna module. It declares a dependency on biojava-core. If you
> want to
> >> add dependencies to external JARs, take a look at biojava-biosql's
> pom.xml
> >> to see how it depends on javax.persistence. (The easiest way to add
> these is
> >> via an IDE such as NetBeans, which is what I'm using at the moment).
> >>
> >> cheers,
> >> Richard
> >>
> >> 2008/10/21 Mark Schreiber <markjschreiber at gmail.com>
> >>
> >>> So if I want to build a BioSQL loader from Genbank then would the
> >>> classes (or their wrappers) in the BioSQL Entity package need to
> >>> implement Thing?  Would maven have an issue with that or would it just
> >>> create a dependency on core? (you can tell I've never used Maven
> >>> right).
> >>>
> >>> From a design point of view should Thing be an interface or an
> >>> Annotation? The reason I ask is that it doesn't define any methods so
> >>> it is more of a tag than an interface.
> >>>
> >>> Anyway, my understanding is that I would use a Genbank parser (or
> >>> write one), write an EntityReceiver interface (probably more than one
> >>> given the number of entities in BioSQL), and implement an EntityBuilder
> >>> (again possibly more than one) that implements EntityReceiver and
> >>> builds Entity beans from messages it receives. In this case I probably
> >>> wouldn't provide a writer as JPA would be writing the beans to the
> >>> database.  Would this be how you imagine it?
> >>>
> >>> - Mark
> >>>
> >>>
> >>> On Tue, Oct 21, 2008 at 1:52 AM, Richard Holland
> >>> <holland at eaglegenomics.com> wrote:
> >>>> (From now on I will only be posting these development messages to
> >>>> biojava-dev, which is the intended purpose of that list. Those of you
> who
> >>>> wish to keep track of things but are currently only subscribed to
> >>> biojava-l
> >>>> should also subscribe to biojava-dev in order to keep up to date.)
> >>>>
> >>>> As promised, I've committed a new package in the biojava-core module
> that
> >>>> should help understand how to do file parsing and conversion and
> writing
> >>> in
> >>>> the new BJ3 modules. Here's an example of how to use it to write a
> >>> Genbank
> >>>> parser (note no parsers actually exist yet!):
> >>>>
> >>>> 1. Design yourself a Genbank class which implements the interface
> Thing
> >>> and
> >>>> can fully represent all the data that might possibly occur inside a
> >>> Genbank
> >>>> file.
> >>>>
> >>>> 2. Write an interface called GenbankReceiver, which extends
> ThingReceiver
> >>>> and defines all the methods you might need in order to construct a
> >>> Genbank
> >>>> object in an asynchronous fashion.
> >>>>
> >>>> 3. Write a GenbankBuilder class which implements GenbankReceiver and
> >>>> ThingBuilder. Its job is to receive data via method calls, use that
> data
> >>> to
> >>>> construct a Genbank object, then provide that object on demand.
> >>>>
> >>>> 4. Write a GenbankWriter class which implements GenbankReceiver and
> >>>> ThingWriter. Its job is similar to GenbankBuilder, but instead of
> >>>> constructing new Genbank objects, it writes Genbank records to file
> that
> >>>> reflect the data it receives.
> >>>>
> >>>> 5. Write a GenbankReader class which implements ThingReader. It can
> read
> >>>> Genbank files and output the data to the methods of the ThingReceiver
> >>>> provided to it, which in this case could be anything which implements
> the
> >>>> interface GenbankReceiver.
> >>>>
> >>>> 6. Write a GenbankEmitter class which implements ThingEmitter. It
> takes a
> >>>> Genbank object and will fire off data from it to the provided
> >>> ThingReceiver
> >>>> (a GenbankReceiver instance) as if the Genbank object was being read
> from
> >>> a
> >>>> file or some other source.
> >>>>
> >>>> That's it! OK so it's a minimum of 6 classes instead of the original 1
> or
> >>> 2,
> >>>> but the additional steps are necessary for flexibility in converting
> >>> between
> >>>> formats.
> >>>>
> >>>> Now to use it (you'll probably want a GenbankTools class to wrap these
> >>> steps
> >>>> up for user-friendliness, including various options for opening files,
> >>>> etc.):
> >>>>
> >>>> 1. To read a file - instantiate ThingParser with your GenbankReader as
> >>> the
> >>>> reader, and GenbankBuilder as the receiver. Use the iterator methods
> on
> >>>> ThingParser to get the objects out.
> >>>>
> >>>> 2. To write a file - instantiate ThingParser with a GenbankEmitter
> >>> wrapping
> >>>> your Genbank object, and a GenbankWriter as the receiver. Use the
> >>> parseAll()
> >>>> method on the ThingParser to dump the whole lot to your chosen output.
> >>>>
> >>>> The clever bit comes when you want to convert between files. Imagine
> >>> you've
> >>>> done all the above for Genbank, and you've also done it for FASTA. How
> to
> >>>> convert between them? What you need to do is this:
> >>>>
> >>>> 1. Implement all the classes for both Genbank and FASTA.
> >>>>
> >>>> 2. Write a GenbankFASTAConverter class that implements
> >>> ThingConverter<FASTA>
> >>>> and GenbankReceiver, and will internally convert the data received and
> >>> pass
> >>>> it on out to the receiver provided, which will be a FASTAReceiver
> >>> instance.
> >>>> 3. Write a FASTAGenbankConverter class that operates in exactly the
> >>> opposite
> >>>> way, implementing ThingConverter<Genbank> and FASTAReceiver.
> >>>>
> >>>> Then to convert you use ThingParser again:
> >>>>
> >>>> 1. From FASTA file to Genbank object: Instantiate ThingParser with a
> >>>> FASTAReader reader, a GenbankBuilder receiver, and add a
> >>>> FASTAGenbankConverter instance to the converter chain. Use the
> iterator
> >>> to
> >>>> get your Genbank objects out of your FASTA file.
> >>>>
> >>>> 2. From FASTA file to Genbank file: Same as option 1, but provide a
> >>>> GenbankWriter instead and use parseAll() instead of the iterator
> methods.
> >>>>
> >>>> 3. From FASTA object to Genbank object: Same as option 1, but provide
> a
> >>>> FASTAEmitter wrapping your FASTA object as the reader instead.
> >>>>
> >>>> 4. From FASTA object to Genbank file: Same as option 1, but swap both
> the
> >>>> reader and the receiver as per options 2 and 3.
> >>>>
> >>>> 5/6/7/8. From Genbank * to FASTA * - same as 1,2,3,4 but swap all
> >>> mentions
> >>>> of FASTA and Genbank, and use GenbankFASTAConverter instead.
> >>>>
> >>>> One last and very important feature of this approach is that if you
> >>> discover
> >>>> that nobody has written the appropriate converter for your chosen pair
> of
> >>>> formats A and C, but converters do exist to map A to some other format
> B
> >>> and
> >>>> that other format B on to C, then you can just put the two converters
> A-B
> >>> and
> >>>> B-C into the ThingParser chain and it'll work perfectly.
> >>>>
> >>>> Enjoy!
> >>>>
> >>>> cheers,
> >>>> Richard
> >>>>
> >>>> --
> >>>> Richard Holland, BSc MBCS
> >>>> Finance Director, Eagle Genomics Ltd
> >>>> M: +44 7500 438846 | E: holland at eaglegenomics.com
> >>>> http://www.eaglegenomics.com/
> >>>> _______________________________________________
> >>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
> >>>>
> >>
> >>
> >>
> >
>
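
The reader, receiver, builder and converter roles described in the quoted message can be sketched roughly as follows. This is an illustration under stated assumptions, not the committed BJ3 code: every method name on the receiver interfaces is invented here, the Genbank class is a two-field stand-in, and ThingParser is not modelled (the chain is wired by hand instead).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event interfaces; the method names are assumptions.
interface FastaReceiver {
    void startRecord();
    void header(String text);
    void sequence(String residues);
    void endRecord();
}

interface GenbankReceiver {
    void startRecord();
    void locus(String name);
    void origin(String residues);
    void endRecord();
}

// Minimal stand-in for a Genbank object.
final class Genbank {
    String locus;
    String sequence;
}

// Receives Genbank events and assembles objects from them.
final class GenbankBuilder implements GenbankReceiver {
    private final List<Genbank> finished = new ArrayList<>();
    private Genbank current;

    public void startRecord() { current = new Genbank(); }
    public void locus(String name) { current.locus = name; }
    public void origin(String residues) { current.sequence = residues; }
    public void endRecord() { finished.add(current); current = null; }

    List<Genbank> built() { return finished; }
}

// Receives FASTA events and forwards them as Genbank events.
final class FastaGenbankConverter implements FastaReceiver {
    private final GenbankReceiver next;

    FastaGenbankConverter(GenbankReceiver next) { this.next = next; }

    public void startRecord() { next.startRecord(); }
    // Uses the first whitespace-delimited token of the header as the locus.
    public void header(String text) { next.locus(text.split("\\s+")[0]); }
    public void sequence(String residues) { next.origin(residues); }
    public void endRecord() { next.endRecord(); }
}

// Reads FASTA text and fires events at whatever receiver it is given.
final class FastaReader {
    void read(String text, FastaReceiver out) {
        boolean open = false;
        StringBuilder seq = new StringBuilder();
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith(">")) {
                if (open) {
                    out.sequence(seq.toString());
                    out.endRecord();
                }
                seq.setLength(0);
                out.startRecord();
                out.header(line.substring(1));
                open = true;
            } else if (open) {
                seq.append(line);
            }
        }
        if (open) {
            out.sequence(seq.toString());
            out.endRecord();
        }
    }
}
```

Passing `new FastaGenbankConverter(builder)` to `FastaReader.read` corresponds to the "FASTA file to Genbank object" case; replacing the builder with a writer that implements GenbankReceiver would give the file-to-file case without touching the reader.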


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From markjschreiber at gmail.com  Tue Oct 21 11:24:13 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 21 Oct 2008 19:24:13 +0800
Subject: [Biojava-dev] [Biojava-l] File parsing in BJ3
In-Reply-To: <a0d826f40810210414u33ea8048p37ebe9d8fa357ea4@mail.gmail.com>
References: <a0d826f40810201052o341f3c4cj10dd2765167ce19f@mail.gmail.com>
	<93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com>
	<a0d826f40810210134h56f9e0dv5abd988ecdd7b7b5@mail.gmail.com>
	<48FD97AB.70503@ebi.ac.uk>
	<93b45ca50810210335j5ef4a206y545e5a1869cedc03@mail.gmail.com>
	<a0d826f40810210414u33ea8048p37ebe9d8fa357ea4@mail.gmail.com>
Message-ID: <93b45ca50810210424g5a9288f0w803e6d5ca4b840d3@mail.gmail.com>

Depending on what you want them for, isMachineGenerated() and
isManuallyCurated() would possibly be better as annotations
(@MachineGenerated, @ManuallyCurated). This is true metadata.

If Java had had annotations in version 1.1, Serializable would probably
also be an annotation.  I would agree with the idea that ThingBuilder
etc. should be typed on something that extends Serializable.
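
A short sketch of the annotation approach, assuming the two annotation names suggested above. PredictedGene, ReviewedGene and the Provenance helper are hypothetical and exist only to show how such metadata could be read back.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// RUNTIME retention is required for the reflective lookup below.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface MachineGenerated {
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ManuallyCurated {
}

// Hypothetical record classes tagged with the metadata.
@MachineGenerated
class PredictedGene {
}

@ManuallyCurated
class ReviewedGene {
}

// Hypothetical helper that answers the two questions via reflection.
final class Provenance {
    private Provenance() {
    }

    static boolean isMachineGenerated(Object o) {
        return o.getClass().isAnnotationPresent(MachineGenerated.class);
    }

    static boolean isManuallyCurated(Object o) {
        return o.getClass().isAnnotationPresent(ManuallyCurated.class);
    }
}
```

Note that an annotation cannot be used as a generic bound, so a collection cannot be restricted at compile time to annotated classes; that is the limitation raised earlier in the thread.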

- Mark

On Tue, Oct 21, 2008 at 7:14 PM, Richard Holland
<dicknetherlands at gmail.com> wrote:
> For now, yes it's empty. But I can envisage situations where it might be
> nice to have Thing implement some common methods (e.g. isMachineGenerated(),
> isManuallyCurated(), etc.). I'd rather have it there now to be a placeholder
> for future expansion, than have to re-engineer everything should we identify
> a need for common functions in future.
>
> You'll see that Thing already extends Serializable, implying that all Things
> must be able to persist to an object backing store. Serializable itself is
> also an empty interface!
>
> Also I like the idea of having Thing, not Object, as a kind of marker of
> intention. To me it makes it clearer when reading code to avoid Object
> wherever possible. Thing may not be any more clever than Object, but it
> immediately declares an intention when reading code as to what kind of
> Object should be expected.
>


From andreas at sdsc.edu  Tue Oct 21 11:31:40 2008
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 21 Oct 2008 04:31:40 -0700
Subject: [Biojava-dev] BioJava3 contribution
In-Reply-To: <086C2EC4-C9AD-4C00-B348-F7D781C0F3EC@orange.fr>
References: <B6195C9D-2722-4B2A-A197-79C83ADF2B37@ibmc.u-strasbg.fr>
	<a0d826f40810200651v30b8161ka20de9ebba28f96b@mail.gmail.com>
	<59a41c430810201218n194660e2udb17be18f8029779@mail.gmail.com>
	<086C2EC4-C9AD-4C00-B348-F7D781C0F3EC@orange.fr>
Message-ID: <59a41c430810210431v2a9e1647w6a6fca991926f175@mail.gmail.com>

Hi Fabrice,

The biojava 1 features could only accept integer positions as start
and stop. For protein structures an amino acid is uniquely identified
by a number and an insertion code. As such, in the biojava 1 world it
was not possible to implement this for protein structures. If we
have a cleaner interface definition for that in biojava 3, it should be
no problem.
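
To illustrate why a plain integer is not enough, here is a minimal sketch of a position made of a residue number plus an optional insertion code. ResiduePosition is a hypothetical name, not a biojava class, and the ordering rule (number first, then insertion code with "none" sorting first) is an assumption made for the example.

```java
import java.util.Objects;

// Hypothetical position type: residue number plus optional insertion code.
final class ResiduePosition implements Comparable<ResiduePosition> {
    final int number;
    final Character insertionCode; // null when the residue has no code

    ResiduePosition(int number, Character insertionCode) {
        this.number = number;
        this.insertionCode = insertionCode;
    }

    // Assumed ordering: by number, then by insertion code (absent sorts first).
    @Override
    public int compareTo(ResiduePosition other) {
        int byNumber = Integer.compare(number, other.number);
        if (byNumber != 0) {
            return byNumber;
        }
        char a = insertionCode == null ? ' ' : insertionCode;
        char b = other.insertionCode == null ? ' ' : other.insertionCode;
        return Character.compare(a, b);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ResiduePosition)) {
            return false;
        }
        ResiduePosition p = (ResiduePosition) o;
        return number == p.number && Objects.equals(insertionCode, p.insertionCode);
    }

    @Override
    public int hashCode() {
        return Objects.hash(number, insertionCode);
    }

    @Override
    public String toString() {
        return insertionCode == null ? String.valueOf(number) : number + insertionCode.toString();
    }
}
```

Residues 52 and 52A are distinct positions here, which an int start/stop pair cannot express.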

Andreas

On Mon, Oct 20, 2008 at 1:40 PM, Fabrice Jossinet <fjossinet at orange.fr> wrote:
> Hi Andreas,
> yes of course, I really would like to work with you (I like your work with
> SPICE). I wanted to contact you about this point before starting. Concerning
> the tertiary structure representation, I need to annotate an RNA tertiary
> structure with base-pair families (as described in
> http://www.ncbi.nlm.nih.gov/pubmed/12177293 or in
> http://prion.bchs.uh.edu/bp_type/ ) and structural motifs (like those listed
> in the SCOR database
>  http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=308814). The idea
> is to attach these features to a 3D structure in the same way that features
> are attached to a sequence (1D).
> What do you think?
> Fabrice
> On 20 Oct 2008, at 21:18, Andreas Prlic wrote:
>
> Hi Fabrice,
>
> Regarding the tertiary structure representation we should work
> together. There is a set of tools available already in the current
> biojava 1.7 which I was intending to maintain and migrate to biojava
> v3. Let me know if you have specific RNA related requests...
>
> Andreas
>
> On Mon, Oct 20, 2008 at 6:51 AM, Richard Holland
> <holland at eaglegenomics.com> wrote:
>
> Excellent! Thanks for your offer of help!
>
> Yes, an advanced RNA module would be very helpful indeed. You should
> probably call it 'rna'.
>
> As long as everyone who intends to work on BJ3 declares their intentions
> here, as you just have, then basically it's first come first served. I won't
> be doing any official supervision other than keeping an eye on committed
> code once in a while to make sure it all looks OK. So feel free to start
> coding straight away!
>
> All new modules should probably start by:
>
> 1. copying the existing dna module to something new, like 'rna' in this
> case.
>
> 2. remove all the hidden .svn directories from the copy,
>
> 3. update the pom.xml in the copy (do a search-and-replace on dna and change
> to the new name, rna in this case), delete the existing source packages in
> src/main/java (org.biojava.dna) and create suitable new ones
> (org.biojava.rna in this case).
>
> 4. empty out the target/ folder then svn add the new module
>
> 5. svn:ignore the target/ directory in your new module,
>
> 6. include your new module in the list at the end of the pom.xml in the root
> directory of the biojava3 branch.
>
> cheers,
> Richard
>
> 2008/10/20 Fabrice Jossinet <f.jossinet at ibmc.u-strasbg.fr>
>
> Dear Richard,
>
> I'm answering your "official call" to offer my help with the
> development of the biojava3 code. Given the modularity of Maven, I would also
> like to offer my help with the development of a module that will use
> the biojava3 code to manage more specialized RNA stuff (secondary and
> tertiary structures, base-pair classifications, modified nucleotides, RNA
> alignments, ...).
>
> What will be the next step for me? Will you make a selection?
>
> Best Regards
> Fabrice Jossinet
>
> --
> Dr. Fabrice Jossinet
> Laboratoire de Bioinformatique, modelisation et simulation des acides
> nucleiques
> Universite Louis Pasteur
> Institut de biologie moleculaire et cellulaire du CNRS
> UPR9002, Architecture et Reactivite de l'ARN
> 15 rue Rene Descartes
> F-67084 Strasbourg Cedex
> France
> Tel + 33 (0) 3 88 417053
> FAX + 33 (0) 3 88 60 22 18
> f.jossinet at ibmc.u-strasbg.fr
> fjossinet at gmail.com
> http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
> http://fjossinet.u-strasbg.fr/
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
> --
> Dr. Fabrice Jossinet
> Laboratoire de Bioinformatique, modelisation et simulation des acides
> nucleiques
> Universite Louis Pasteur
> Institut de biologie moleculaire et cellulaire du CNRS
> UPR9002, Architecture et Reactivite de l'ARN
> 15 rue Rene Descartes
> F-67084 Strasbourg Cedex
> France
> Tel + 33 (0) 3 88 417053
> FAX + 33 (0) 3 88 60 22 18
> f.jossinet at ibmc.u-strasbg.fr
> fjossinet at gmail.com
> http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
> http://fjossinet.u-strasbg.fr/
>
>
>


From holland at eaglegenomics.com  Tue Oct 21 11:39:44 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 21 Oct 2008 12:39:44 +0100
Subject: [Biojava-dev] [Biojava-l] File parsing in BJ3
In-Reply-To: <93b45ca50810210424g5a9288f0w803e6d5ca4b840d3@mail.gmail.com>
References: <a0d826f40810201052o341f3c4cj10dd2765167ce19f@mail.gmail.com>
	<93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com>
	<a0d826f40810210134h56f9e0dv5abd988ecdd7b7b5@mail.gmail.com>
	<48FD97AB.70503@ebi.ac.uk>
	<93b45ca50810210335j5ef4a206y545e5a1869cedc03@mail.gmail.com>
	<a0d826f40810210414u33ea8048p37ebe9d8fa357ea4@mail.gmail.com>
	<93b45ca50810210424g5a9288f0w803e6d5ca4b840d3@mail.gmail.com>
Message-ID: <a0d826f40810210439q4533b963gebd3b03edf5233c4@mail.gmail.com>

The two examples I gave would be better as annotations, it's true.
Serializable, and Cloneable for that matter, would definitely work better
that way.

Well, we could do away with Thing altogether then. I'll update the code.
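
A rough sketch of what a builder typed on Serializable rather than Thing might look like. SerializableBuilder and Persistence are hypothetical names chosen for this example; the actual signatures in the BJ3 code are not shown in this thread and may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical builder contract bounded by Serializable instead of Thing.
interface SerializableBuilder<T extends Serializable> {
    T build();
}

final class Persistence {
    private Persistence() {
    }

    // Round-trips a built object through Java serialization, which the
    // Serializable bound guarantees is permitted at compile time.
    static <T extends Serializable> T roundTrip(SerializableBuilder<T> builder) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(builder.build());
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            // Wrapped so callers of this sketch need no checked handling.
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }
}
```

With this bound any existing Serializable class (String, for example) can be built without implementing a project-specific interface, at the cost of losing the declared intent that Thing was meant to carry.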


2008/10/21 Mark Schreiber <markjschreiber at gmail.com>

> Depending on what you want them for isMachineGenerated(),
> isManuallyCurated(), would possibly be better as annotations
> (@MachineGenerated, @ManuallyCurated). This is true metadata.
>
> Probably if Java had annotations in version 1.1 Serializable would
> also be an Annotation.  I would agree with the idea that ThingBuilder
> etc should be typed on extends Serializable.
>
> - Mark
>
> On Tue, Oct 21, 2008 at 7:14 PM, Richard Holland
> <dicknetherlands at gmail.com> wrote:
> > For now, yes it's empty. But I can envisage situations where it might be
> > nice to have Thing implement some common methods (e.g.
> isMachineGenerated(),
> > isManuallyCurated(), etc.). I'd rather have it there now to be a
> placeholder
> > for future expansion, than have to re-engineer everything should we
> identify
> > a need for common functions in future.
> >
> > You'll see that Thing already extends Serializable, implying that all
> Things
> > must be able to persist to an object backing store. Serializable itself
> is
> > also an empty interface!
> >
> > Also I like the idea of having Thing, not Object, as a kind of marker of
> > intention. To me it makes it clearer when reading code to avoid Object
> > wherever possible. Thing may not be any more clever than Object, but it
> > immediately declares an intention when reading code as to what kind of
> > Object should be expected.
> >
> >
> > 2008/10/21 Mark Schreiber <markjschreiber at gmail.com>
> >>
> >> Is there any need for Thing at all? Can't a builder be typed to produce
> >> something that extends Object?
> >>
> >> If Thing provides no behaviour contract or meta-information then why
> >> does it exist?
> >>
> >> - Mark
> >>
> >> On Tue, Oct 21, 2008 at 4:49 PM, Andy Yates <ayates at ebi.ac.uk> wrote:
> >> > Depends on what you want to program. If you want to have a collection
> of
> >> > objects which are Things & perform a common action on them then
> >> > annotations are not the way forward.
> >> >
> >> > If you want to have some kind of meta-programming occurring & need a
> >> > class to be multiple things then annotations are right. There is
> >> > currently no way to enforce compile time dependencies on annotations &
> >> > my thinking is that this is right. Annotations should be meta data or
> >> > provide a way to alter a class in a non-invasive way (think Web
> Service
> >> > annotations creating WS Servers & Clients without any alteration of
> the
> >> > class).
> >> >
> >> > Andy
> >> >
> >> > Richard Holland wrote:
> >> >> Spot on.
> >> >>
> >> >> Annotation/interface.... i think Annotation is probably better as you
> >> >> suggest, but I'd have to look into that. Not sure how it works with
> >> >> collections and generics. If it does turn out to be a better bet,
> I'll
> >> >> change it over.
> >> >>
> >> >> With the BioSQL dependencies, take a look at the pom.xml file inside
> >> >> the
> >> >> biojava-dna module. It declares a dependency on biojava-core. If you
> >> >> want to
> >> >> add dependencies to external JARs, take a look at biojava-biosql's
> >> >> pom.xml
> >> >> to see how it depends on javax.persistence. (The easiest way to add
> >> >> these is
> >> >> via an IDE such as NetBeans, which is what I'm using at the moment).
> >> >>
> >> >> cheers,
> >> >> Richard
> >> >>
> >> >> 2008/10/21 Mark Schreiber <markjschreiber at gmail.com>
> >> >>
> >> >>> So if I want to build a BioSQL loader from Genbank then would the
> >> >>> classes (or their wrappers) in the BioSQL Entity package need to
> >> >>> implement Thing?  Would maven have an issue with that or would it
> just
> >> >>> create a dependency on core? (you can tell I've never used Maven
> >> >>> right).
> >> >>>
> >> >>> From a design point of view should Thing be an interface or an
> >> >>> Annotation? The reason I ask is that it doesn't define any methods
> so
> >> >>> it is more of a tag than an interface.
> >> >>>
> >> >>> Anyway, my understanding is that I would use a Genbank parser (or
> >> >>> write one). Write an EntityReceiver interface (probably more than one
> >> >>> given the number of entities in BioSQL), implement an EntityBuilder
> >> >>> (again possibly more than one) that implements EntityReceiver and
> >> >>> builds Entity beans from messages it receives. In this case I
> probably
> >> >>> wouldn't provide a writer as JPA would be writing the beans to the
> >> >>> database.  Would this be how you imagine it?
> >> >>>
> >> >>> - Mark
> >> >>>
> >> >>>
> >> >>> On Tue, Oct 21, 2008 at 1:52 AM, Richard Holland
> >> >>> <holland at eaglegenomics.com> wrote:
> >> >>>> (From now on I will only be posting these development messages to
> >> >>>> biojava-dev, which is the intended purpose of that list. Those of
> you
> >> >>>> who
> >> >>>> wish to keep track of things but are currently only subscribed to
> >> >>> biojava-l
> >> >>>> should also subscribe to biojava-dev in order to keep up to date.)
> >> >>>>
> >> >>>> As promised, I've committed a new package in the biojava-core
> module
> >> >>>> that
> >> >>>> should help understand how to do file parsing and conversion and
> >> >>>> writing
> >> >>> in
> >> >>>> the new BJ3 modules. Here's an example of how to use it to write a
> >> >>> Genbank
> >> >>>> parser (note no parsers actually exist yet!):
> >> >>>>
> >> >>>> 1. Design yourself a Genbank class which implements the interface
> >> >>>> Thing
> >> >>> and
> >> >>>> can fully represent all the data that might possibly occur inside a
> >> >>> Genbank
> >> >>>> file.
> >> >>>>
> >> >>>> 2. Write an interface called GenbankReceiver, which extends
> >> >>>> ThingReceiver
> >> >>>> and defines all the methods you might need in order to construct a
> >> >>> Genbank
> >> >>>> object in an asynchronous fashion.
> >> >>>>
> >> >>>> 3. Write a GenbankBuilder class which implements GenbankReceiver
> and
> >> >>>> ThingBuilder. Its job is to receive data via method calls, use
> that
> >> >>>> data
> >> >>> to
> >> >>>> construct a Genbank object, then provide that object on demand.
> >> >>>>
> >> >>>> 4. Write a GenbankWriter class which implements GenbankReceiver and
> >> >>>> ThingWriter. Its job is similar to GenbankBuilder, but instead of
> >> >>>> constructing new Genbank objects, it writes Genbank records to file
> >> >>>> that
> >> >>>> reflect the data it receives.
> >> >>>>
> >> >>>> 5. Write a GenbankReader class which implements ThingReader. It can
> >> >>>> read
> >> >>>> GenbankFiles and output the data to the methods of the
> ThingReceiver
> >> >>>> provided to it, which in this case could be anything which
> implements
> >> >>>> the
> >> >>>> interface GenbankReceiver.
> >> >>>>
> >> >>>> 6. Write a GenbankEmitter class which implements ThingEmitter. It
> >> >>>> takes a
> >> >>>> Genbank object and will fire off data from it to the provided
> >> >>> ThingReceiver
> >> >>>> (a GenbankReceiver instance) as if the Genbank object was being
> read
> >> >>>> from
> >> >>> a
> >> >>>> file or some other source.
> >> >>>>
> >> >>>> That's it! OK so it's a minimum of 6 classes instead of the
> original
> >> >>>> 1 or
> >> >>> 2,
> >> >>>> but the additional steps are necessary for flexibility in
> converting
> >> >>> between
> >> >>>> formats.
> >> >>>>
> >> >>>> Now to use it (you'll probably want a GenbankTools class to wrap
> >> >>>> these
> >> >>> steps
> >> >>>> up for user-friendliness, including various options for opening
> >> >>>> files,
> >> >>>> etc.):
> >> >>>>
> >> >>>> 1. To read a file - instantiate ThingParser with your GenbankReader
> >> >>>> as
> >> >>> the
> >> >>>> reader, and GenbankBuilder as the receiver. Use the iterator
> methods
> >> >>>> on
> >> >>>> ThingParser to get the objects out.
> >> >>>>
> >> >>>> 2. To write a file - instantiate ThingParser with a GenbankEmitter
> >> >>> wrapping
> >> >>>> your Genbank object, and a GenbankWriter as the receiver. Use the
> >> >>> parseAll()
> >> >>>> method on the ThingParser to dump the whole lot to your chosen
> >> >>>> output.
> >> >>>>
> >> >>>> The clever bit comes when you want to convert between files.
> Imagine
> >> >>> you've
> >> >>>> done all the above for Genbank, and you've also done it for FASTA.
> >> >>>> How to
> >> >>>> convert between them? What you need to do is this:
> >> >>>>
> >> >>>> 1. Implement all the classes for both Genbank and FASTA.
> >> >>>>
> >> >>>> 2. Write a GenbankFASTAConverter class that implements
> >> >>> ThingConverter<FASTA>
> >> >>>> and GenbankReceiver, and will internally convert the data received
> >> >>>> and
> >> >>> pass
> >> >>>> it on out to the receiver provided, which will be a FASTAReceiver
> >> >>> instance.
> >> >>>> 3. Write a FASTAGenbankConverter class that operates in exactly the
> >> >>> opposite
> >> >>>> way, implementing ThingConverter<Genbank> and FASTAReceiver.
> >> >>>>
> >> >>>> Then to convert you use ThingParser again:
> >> >>>>
> >> >>>> 1. From FASTA file to Genbank object: Instantiate ThingParser with
> a
> >> >>>> FASTAReader reader, a GenbankBuilder receiver, and add a
> >> >>>> FASTAGenbankConverter instance to the converter chain. Use the
> >> >>>> iterator
> >> >>> to
> >> >>>> get your Genbank objects out of your FASTA file.
> >> >>>>
> >> >>>> 2. From FASTA file to Genbank file: Same as option 1, but provide a
> >> >>>> GenbankWriter instead and use parseAll() instead of the iterator
> >> >>>> methos.
> >> >>>>
> >> >>>> 3. From FASTA object to Genbank object: Same as option 1, but
> provide
> >> >>>> a
> >> >>>> FASTAEmitter wrapping your FASTA object as the reader instead.
> >> >>>>
> >> >>>> 4. From FASTA object to Genbank file: Same as option 1, but swap
> both
> >> >>>> the
> >> >>>> reader and the receiver as per options 2 and 3.
> >> >>>>
> >> >>>> 5/6/7/8. From Genbank * to FASTA * - same as 1,2,3,4 but swap all
> >> >>> mentions
> >> >>>> of FASTA and Genbank, and use GenbankFASTAConverter instead.
> >> >>>>
> >> >>>> One last and very important feature of this approach is that if you
> >> >>> discover
> >> >>>> that nobody has written the appropriate converter for your chosen
> >> >>>> pair of
> >> >>>> formats A and C, but converters do exist to map A to some other
> >> >>>> format B
> >> >>> and
> >> >>>> that other format B on to C, then you can just put the two converts
> >> >>>> A-B
> >> >>> and
> >> >>>> B-C into the ThingParser chain and it'll work perfectly.
> >> >>>>
> >> >>>> Enjoy!
> >> >>>>
> >> >>>> cheers,
> >> >>>> Richard
> >> >>>>
> >> >>>> --
> >> >>>> Richard Holland, BSc MBCS
> >> >>>> Finance Director, Eagle Genomics Ltd
> >> >>>> M: +44 7500 438846 | E: holland at eaglegenomics.com
> >> >>>> http://www.eaglegenomics.com/
> >> >>>> _______________________________________________
> >> >>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> >> >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
> >> >>>>
> >> >>
> >> >>
> >> >>
> >> >
> >
> >
> >
> > --
> > Richard Holland, BSc MBCS
> > Finance Director, Eagle Genomics Ltd
> > M: +44 7500 438846 | E: holland at eaglegenomics.com
> > http://www.eaglegenomics.com/
> >
>


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From ayates at ebi.ac.uk  Tue Oct 21 14:32:45 2008
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 21 Oct 2008 15:32:45 +0100
Subject: [Biojava-dev] [Biojava-l] File parsing in BJ3
In-Reply-To: <a0d826f40810210439q4533b963gebd3b03edf5233c4@mail.gmail.com>
References: <a0d826f40810201052o341f3c4cj10dd2765167ce19f@mail.gmail.com>	
	<93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com>	
	<a0d826f40810210134h56f9e0dv5abd988ecdd7b7b5@mail.gmail.com>	
	<48FD97AB.70503@ebi.ac.uk>	
	<93b45ca50810210335j5ef4a206y545e5a1869cedc03@mail.gmail.com>	
	<a0d826f40810210414u33ea8048p37ebe9d8fa357ea4@mail.gmail.com>	
	<93b45ca50810210424g5a9288f0w803e6d5ca4b840d3@mail.gmail.com>
	<a0d826f40810210439q4533b963gebd3b03edf5233c4@mail.gmail.com>
Message-ID: <48FDE80D.1040106@ebi.ac.uk>

If "Thing" has gone then what impact does this have on remaining
classes? Considering methods like canReadNextThing() & readNextThing();
should this be canReadNext() & readNext()?

Just an idle thought ....
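To make the renaming concrete, here is a minimal sketch of what a reader could look like with the shorter names. Only canReadNext() and readNext() come from the suggestion above; the Reader and ListReader types and the readAll() helper are hypothetical and are not taken from the BJ3 code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReaderNamingSketch {
    /** Hypothetical reader interface using the shorter method names. */
    interface Reader<T> {
        boolean canReadNext();
        T readNext();
    }

    /** Toy reader over an in-memory list, for illustration only. */
    static class ListReader<T> implements Reader<T> {
        private final Iterator<T> it;
        ListReader(List<T> items) { this.it = items.iterator(); }
        public boolean canReadNext() { return it.hasNext(); }
        public T readNext() { return it.next(); }
    }

    /** Drains a reader into a list using only the two renamed methods. */
    static <T> List<T> readAll(Reader<T> reader) {
        List<T> out = new ArrayList<>();
        while (reader.canReadNext()) {
            out.add(reader.readNext());
        }
        return out;
    }

    public static void main(String[] args) {
        Reader<String> r = new ListReader<>(Arrays.asList("ACGT", "TTGA"));
        System.out.println(readAll(r));
    }
}
```

With the type parameter carrying the information, the word "Thing" in the method names adds nothing that the generic signature does not already say.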

Andy

Richard Holland wrote:
> The two examples I gave would be better as annotations, it's true.
> Serializable, and Cloneable for that matter, would definitely work better
> that way.
> 
> Well, we could do away with Thing altogether then. I'll update the code.
> 
> 
> 2008/10/21 Mark Schreiber <markjschreiber at gmail.com>
> 
>> Depending on what you want them for, isMachineGenerated() and
>> isManuallyCurated() would possibly be better as annotations
>> (@MachineGenerated, @ManuallyCurated). This is true metadata.
>>
>> Probably if Java had annotations in version 1.1 Serializable would
>> also be an Annotation.  I would agree with the idea that ThingBuilder
>> etc should be typed on extends Serializable.
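A rough sketch of both ideas, under stated assumptions: @MachineGenerated is the annotation name suggested above, while the Builder interface, the PredictedGene class and the isMachineGenerated() helper are hypothetical names made up for illustration. The annotation needs RUNTIME retention, otherwise it cannot be seen through reflection.

```java
import java.io.Serializable;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class AnnotationSketch {
    /** Metadata tag; RUNTIME retention so it can be read reflectively. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface MachineGenerated {}

    /** Hypothetical builder typed on Serializable rather than on Thing. */
    interface Builder<T extends Serializable> {
        T build();
    }

    /** Hypothetical data class carrying the metadata tag. */
    @MachineGenerated
    static class PredictedGene implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;
        PredictedGene(String id) { this.id = id; }
    }

    /** The metadata is looked up on the class, not called as a method. */
    static boolean isMachineGenerated(Object o) {
        return o.getClass().isAnnotationPresent(MachineGenerated.class);
    }

    public static void main(String[] args) {
        Builder<PredictedGene> b = () -> new PredictedGene("gene-1");
        PredictedGene g = b.build();
        System.out.println(isMachineGenerated(g));      // annotated class
        System.out.println(isMachineGenerated("text")); // plain String
    }
}
```

The bound on Builder is checked by the compiler, whereas the annotation is only discoverable at run time, which is the trade-off discussed further down the thread.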
>>
>> - Mark
>>
>> On Tue, Oct 21, 2008 at 7:14 PM, Richard Holland
>> <dicknetherlands at gmail.com> wrote:
>>> For now, yes it's empty. But I can envisage situations where it might be
>>> nice to have Thing implement some common methods (e.g. isMachineGenerated(),
>>> isManuallyCurated(), etc.). I'd rather have it there now to be a placeholder
>>> for future expansion, than have to re-engineer everything should we identify
>>> a need for common functions in future.
>>>
>>> You'll see that Thing already extends Serializable, implying that all Things
>>> must be able to persist to an object backing store. Serializable itself is
>>> also an empty interface!
>>>
>>> Also I like the idea of having Thing, not Object, as a kind of marker of
>>> intention. To me it makes it clearer when reading code to avoid Object
>>> wherever possible. Thing may not be any more clever than Object, but it
>>> immediately declares an intention when reading code as to what kind of
>>> Object should be expected.
>>>
>>>
>>> 2008/10/21 Mark Schreiber <markjschreiber at gmail.com>
>>>> Is there any need for Thing at all? Can't a builder be typed to produce
>>>> something that extends Object?
>>>>
>>>> If Thing provides no behaviour contract or meta-information then why
>>>> does it exist?
>>>>
>>>> - Mark
>>>>
>>>> On Tue, Oct 21, 2008 at 4:49 PM, Andy Yates <ayates at ebi.ac.uk> wrote:
>>>>> Depends on what you want to program. If you want to have a collection of
>>>>> objects which are Things & perform a common action on them then
>>>>> annotations are not the way forward.
>>>>>
>>>>> If you want to have some kind of meta-programming occurring & need a
>>>>> class to be multiple things then annotations are right. There is
>>>>> currently no way to enforce compile time dependencies on annotations &
>>>>> my thinking is that this is right. Annotations should be meta data or
>>>>> provide a way to alter a class in a non-invasive way (think Web Service
>>>>> annotations creating WS Servers & Clients without any alteration of the
>>>>> class).
>>>>>
>>>>> Andy
>>>>>
>>>>> Richard Holland wrote:
>>>>>> Spot on.
>>>>>>
>>>>>> Annotation/interface.... I think Annotation is probably better as you
>>>>>> suggest, but I'd have to look into that. Not sure how it works with
>>>>>> collections and generics. If it does turn out to be a better bet, I'll
>>>>>> change it over.
>>>>>>
>>>>>> With the BioSQL dependencies, take a look at the pom.xml file inside the
>>>>>> biojava-dna module. It declares a dependency on biojava-core. If you want to
>>>>>> add dependencies to external JARs, take a look at biojava-biosql's pom.xml
>>>>>> to see how it depends on javax.persistence. (The easiest way to add these is
>>>>>> via an IDE such as NetBeans, which is what I'm using at the moment).
>>>>>>
>>>>>> cheers,
>>>>>> Richard
>>>>>>
>>>>>> 2008/10/21 Mark Schreiber <markjschreiber at gmail.com>
>>>>>>
>>>>>>> So if I want to build a BioSQL loader from Genbank then would the
>>>>>>> classes (or their wrappers) in the BioSQL Entity package need to
>>>>>>> implement Thing?  Would maven have an issue with that or would it just
>>>>>>> create a dependency on core? (You can tell I've never used Maven,
>>>>>>> right?)
>>>>>>>
>>>>>>> From a design point of view should Thing be an interface or an
>>>>>>> Annotation? The reason I ask is that it doesn't define any methods so
>>>>>>> it is more of a tag than an interface.
>>>>>>>
>>>>>>> Anyway, my understanding is that I would use a Genbank parser (or
>>>>>>> write one), write an EntityReceiver interface (probably more than one
>>>>>>> given the number of entities in BioSQL), and implement an EntityBuilder
>>>>>>> (again possibly more than one) that implements EntityReceiver and
>>>>>>> builds Entity beans from messages it receives. In this case I probably
>>>>>>> wouldn't provide a writer as JPA would be writing the beans to the
>>>>>>> database.  Would this be how you imagine it?
>>>>>>>
>>>>>>> - Mark
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 21, 2008 at 1:52 AM, Richard Holland
>>>>>>> <holland at eaglegenomics.com> wrote:
>>>>>>>> (From now on I will only be posting these development messages to
>>>>>>>> biojava-dev, which is the intended purpose of that list. Those of you who
>>>>>>>> wish to keep track of things but are currently only subscribed to biojava-l
>>>>>>>> should also subscribe to biojava-dev in order to keep up to date.)
>>>>>>>>
>>>>>>>> As promised, I've committed a new package in the biojava-core module that
>>>>>>>> should help understand how to do file parsing and conversion and writing in
>>>>>>>> the new BJ3 modules. Here's an example of how to use it to write a Genbank
>>>>>>>> parser (note no parsers actually exist yet!):
>>>>>>>>
>>>>>>>> 1. Design yourself a Genbank class which implements the interface Thing and
>>>>>>>> can fully represent all the data that might possibly occur inside a Genbank
>>>>>>>> file.
>>>>>>>>
>>>>>>>> 2. Write an interface called GenbankReceiver, which extends ThingReceiver
>>>>>>>> and defines all the methods you might need in order to construct a Genbank
>>>>>>>> object in an asynchronous fashion.
>>>>>>>>
>>>>>>>> 3. Write a GenbankBuilder class which implements GenbankReceiver and
>>>>>>>> ThingBuilder. Its job is to receive data via method calls, use that data to
>>>>>>>> construct a Genbank object, then provide that object on demand.
>>>>>>>>
>>>>>>>> 4. Write a GenbankWriter class which implements GenbankReceiver and
>>>>>>>> ThingWriter. Its job is similar to GenbankBuilder, but instead of
>>>>>>>> constructing new Genbank objects, it writes Genbank records to file that
>>>>>>>> reflect the data it receives.
>>>>>>>>
>>>>>>>> 5. Write a GenbankReader class which implements ThingReader. It can read
>>>>>>>> Genbank files and output the data to the methods of the ThingReceiver
>>>>>>>> provided to it, which in this case could be anything which implements the
>>>>>>>> interface GenbankReceiver.
>>>>>>>>
>>>>>>>> 6. Write a GenbankEmitter class which implements ThingEmitter. It takes a
>>>>>>>> Genbank object and will fire off data from it to the provided ThingReceiver
>>>>>>>> (a GenbankReceiver instance) as if the Genbank object was being read from a
>>>>>>>> file or some other source.
>>>>>>>>
>>>>>>>> That's it! OK so it's a minimum of 6 classes instead of the original 1 or 2,
>>>>>>>> but the additional steps are necessary for flexibility in converting between
>>>>>>>> formats.
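The six pieces above can be sketched on a toy format. This is only an illustration under loud assumptions: the format is a made-up "id<TAB>sequence" line per record, not Genbank, and every name here (Rec, RecReceiver, RecBuilder, RecWriter, RecReader, RecEmitter, roundTrip) is hypothetical. The real Thing* interfaces are not shown in this thread, so the sketch does not implement them.

```java
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

public class SixClassSketch {
    /** Step 1: the data object (toy stand-in for a Genbank class). */
    static class Rec {
        final String id, seq;
        Rec(String id, String seq) { this.id = id; this.seq = seq; }
    }

    /** Step 2: the events that any consumer of this format must accept. */
    interface RecReceiver {
        void startRecord();
        void id(String id);
        void sequence(String seq);
        void endRecord();
    }

    /** Step 3: builds Rec objects from the events it receives. */
    static class RecBuilder implements RecReceiver {
        final List<Rec> built = new ArrayList<>();
        private String id, seq;
        public void startRecord() { id = null; seq = null; }
        public void id(String id) { this.id = id; }
        public void sequence(String seq) { this.seq = seq; }
        public void endRecord() { built.add(new Rec(id, seq)); }
    }

    /** Step 4: writes "id<TAB>sequence" lines from the same events. */
    static class RecWriter implements RecReceiver {
        private final StringWriter out;
        RecWriter(StringWriter out) { this.out = out; }
        public void startRecord() { }
        public void id(String id) { out.write(id); }
        public void sequence(String seq) { out.write("\t" + seq); }
        public void endRecord() { out.write("\n"); }
    }

    /** Step 5: reads the toy text format and fires events at a receiver. */
    static class RecReader {
        void read(String text, RecReceiver r) {
            for (String line : text.split("\n")) {
                if (line.isEmpty()) continue;
                String[] parts = line.split("\t", 2);
                r.startRecord();
                r.id(parts[0]);
                r.sequence(parts.length > 1 ? parts[1] : "");
                r.endRecord();
            }
        }
    }

    /** Step 6: replays an in-memory object as if it were being read. */
    static class RecEmitter {
        void emit(Rec rec, RecReceiver r) {
            r.startRecord();
            r.id(rec.id);
            r.sequence(rec.seq);
            r.endRecord();
        }
    }

    /** Text to objects (reader + builder), then objects to text (emitter + writer). */
    static String roundTrip(String text) {
        RecBuilder builder = new RecBuilder();
        new RecReader().read(text, builder);
        StringWriter out = new StringWriter();
        RecWriter writer = new RecWriter(out);
        for (Rec rec : builder.built) {
            new RecEmitter().emit(rec, writer);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(roundTrip("s1\tACGT\ns2\tTTGA\n"));
    }
}
```

The point of the split is that the reader and the emitter both talk to the same receiver interface, so the builder and the writer are interchangeable as destinations.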
>>>>>>>>
>>>>>>>> Now to use it (you'll probably want a GenbankTools class to wrap these steps
>>>>>>>> up for user-friendliness, including various options for opening files,
>>>>>>>> etc.):
>>>>>>>>
>>>>>>>> 1. To read a file - instantiate ThingParser with your GenbankReader as the
>>>>>>>> reader, and GenbankBuilder as the receiver. Use the iterator methods on
>>>>>>>> ThingParser to get the objects out.
>>>>>>>>
>>>>>>>> 2. To write a file - instantiate ThingParser with a GenbankEmitter wrapping
>>>>>>>> your Genbank object, and a GenbankWriter as the receiver. Use the parseAll()
>>>>>>>> method on the ThingParser to dump the whole lot to your chosen output.
>>>>>>>>
>>>>>>>> The clever bit comes when you want to convert between files. Imagine you've
>>>>>>>> done all the above for Genbank, and you've also done it for FASTA. How to
>>>>>>>> convert between them? What you need to do is this:
>>>>>>>>
>>>>>>>> 1. Implement all the classes for both Genbank and FASTA.
>>>>>>>>
>>>>>>>> 2. Write a GenbankFASTAConverter class that implements ThingConverter<FASTA>
>>>>>>>> and GenbankReceiver, and will internally convert the data received and pass
>>>>>>>> it on out to the receiver provided, which will be a FASTAReceiver instance.
>>>>>>>>
>>>>>>>> 3. Write a FASTAGenbankConverter class that operates in exactly the opposite
>>>>>>>> way, implementing ThingConverter<Genbank> and FASTAReceiver.
>>>>>>>>
>>>>>>>> Then to convert you use ThingParser again:
>>>>>>>>
>>>>>>>> 1. From FASTA file to Genbank object: Instantiate ThingParser with a
>>>>>>>> FASTAReader reader, a GenbankBuilder receiver, and add a
>>>>>>>> FASTAGenbankConverter instance to the converter chain. Use the iterator to
>>>>>>>> get your Genbank objects out of your FASTA file.
>>>>>>>>
>>>>>>>> 2. From FASTA file to Genbank file: Same as option 1, but provide a
>>>>>>>> GenbankWriter instead and use parseAll() instead of the iterator methods.
>>>>>>>>
>>>>>>>> 3. From FASTA object to Genbank object: Same as option 1, but provide a
>>>>>>>> FASTAEmitter wrapping your FASTA object as the reader instead.
>>>>>>>>
>>>>>>>> 4. From FASTA object to Genbank file: Same as option 1, but swap both the
>>>>>>>> reader and the receiver as per options 2 and 3.
>>>>>>>>
>>>>>>>> 5/6/7/8. From Genbank * to FASTA * - same as 1,2,3,4 but swap all mentions
>>>>>>>> of FASTA and Genbank, and use GenbankFASTAConverter instead.
>>>>>>>>
>>>>>>>> One last and very important feature of this approach is that if you discover
>>>>>>>> that nobody has written the appropriate converter for your chosen pair of
>>>>>>>> formats A and C, but converters do exist to map A to some other format B and
>>>>>>>> that other format B on to C, then you can just put the two converters A-B and
>>>>>>>> B-C into the ThingParser chain and it'll work perfectly.
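The chaining idea can be illustrated without ThingParser itself. In this sketch the three formats A, B and C, their receiver interfaces, the two converters and the convert() helper are all hypothetical; the only thing taken from the message above is that a converter is a receiver for one format which forwards to a receiver for another.

```java
import java.util.ArrayList;
import java.util.List;

public class ConverterChainSketch {
    /** Receivers for three hypothetical formats A, B and C. */
    interface AReceiver { void aName(String name); void aResidues(String residues); }
    interface BReceiver { void bRecord(String name, String residues); }
    interface CReceiver { void cLine(String line); }

    /** A-to-B converter: consumes A events, forwards B events downstream. */
    static class ABConverter implements AReceiver {
        private final BReceiver next;
        private String name;
        ABConverter(BReceiver next) { this.next = next; }
        public void aName(String name) { this.name = name; }
        public void aResidues(String residues) { next.bRecord(name, residues); }
    }

    /** B-to-C converter: consumes B events, forwards C events downstream. */
    static class BCConverter implements BReceiver {
        private final CReceiver next;
        BCConverter(CReceiver next) { this.next = next; }
        public void bRecord(String name, String residues) {
            next.cLine(">" + name);
            next.cLine(residues);
        }
    }

    /** Final receiver: collects C lines in memory. */
    static class CCollector implements CReceiver {
        final List<String> lines = new ArrayList<>();
        public void cLine(String line) { lines.add(line); }
    }

    /** Wires A-B and B-C together; no direct A-C converter is needed. */
    static List<String> convert(String name, String residues) {
        CCollector sink = new CCollector();
        AReceiver chain = new ABConverter(new BCConverter(sink));
        chain.aName(name);
        chain.aResidues(residues);
        return sink.lines;
    }

    public static void main(String[] args) {
        System.out.println(convert("seq1", "ACGT"));
    }
}
```

One caveat worth keeping in mind: anything the intermediate format B cannot represent is lost on the way from A to C, so a chained conversion can be less faithful than a direct one.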
>>>>>>>>
>>>>>>>> Enjoy!
>>>>>>>>
>>>>>>>> cheers,
>>>>>>>> Richard
>>>>>>>>


From holland at eaglegenomics.com  Tue Oct 21 16:13:37 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 21 Oct 2008 17:13:37 +0100
Subject: [Biojava-dev] [Biojava-l] File parsing in BJ3
In-Reply-To: <48FDE80D.1040106@ebi.ac.uk>
References: <a0d826f40810201052o341f3c4cj10dd2765167ce19f@mail.gmail.com>
	<93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com>
	<a0d826f40810210134h56f9e0dv5abd988ecdd7b7b5@mail.gmail.com>
	<48FD97AB.70503@ebi.ac.uk>
	<93b45ca50810210335j5ef4a206y545e5a1869cedc03@mail.gmail.com>
	<a0d826f40810210414u33ea8048p37ebe9d8fa357ea4@mail.gmail.com>
	<93b45ca50810210424g5a9288f0w803e6d5ca4b840d3@mail.gmail.com>
	<a0d826f40810210439q4533b963gebd3b03edf5233c4@mail.gmail.com>
	<48FDE80D.1040106@ebi.ac.uk>
Message-ID: <a0d826f40810210913x36fb7332jdec072c4c1aea0d2@mail.gmail.com>

Yup - why not. Feel free to go in and edit. :)

2008/10/21 Andy Yates <ayates at ebi.ac.uk>

> If "Thing" has gone then what impact does this have on remaining
> classes? Considering methods like canReadNextThing() & readNextThing();
> should this be canReadNext() & readNext()?
>
> Just an idle thought ....
>
> Andy


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From fjossinet at orange.fr  Tue Oct 21 19:55:47 2008
From: fjossinet at orange.fr (Fabrice Jossinet)
Date: Tue, 21 Oct 2008 21:55:47 +0200
Subject: [Biojava-dev] Biojava 3 and intermolecular features
Message-ID: <F4BAD9C5-8883-4DE8-9DBC-B4A144E6309B@orange.fr>

Hi all,

When I used the previous releases of biojava, I had some problems
modelling inter-molecular features, for example interactions between two
sequences/molecules in a tertiary structure, or interactions
between two molecular partners in an interaction network. The feature
should be the same, shared by (at least) 2 molecules, but can be
attached to different locations for each molecule.

With the current biojava model, a feature is composed of one location
for a given sequence. Consequently, for the development of my previous
software, I decided to change the biojava paradigm a little. For
example, to model an intermolecular interaction between the region
23-35 of mySeq1 and the region 34-46 of mySeq2 I have:

Feature myFeature = new InterMolecularInteraction();

mySeq1.addAnnotation(new Annotation(myFeature, new Location("23-35")));
mySeq2.addAnnotation(new Annotation(myFeature, new Location("34-46")));

The Annotation concept links a feature to a location and is attached
to a sequence (this concept has no relation to the Annotation
concept proposed by Biojava).
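A self-contained sketch of this model, to make the sharing explicit. The class names Feature, Location, Annotation and the addAnnotation() call follow the snippet above; everything else (the Sequence class, the carries() and partners() helpers, the example() method, and using a type string in place of an InterMolecularInteraction subclass) is an assumption made for illustration and is not biojava API.

```java
import java.util.ArrayList;
import java.util.List;

public class InterMolecularSketch {
    /** A feature that may be shared by several sequences. */
    static class Feature {
        final String type;
        Feature(String type) { this.type = type; }
    }

    /** Inclusive start-end range parsed from text such as "23-35". */
    static class Location {
        final int start, end;
        Location(String range) {
            String[] parts = range.split("-");
            this.start = Integer.parseInt(parts[0].trim());
            this.end = Integer.parseInt(parts[1].trim());
        }
    }

    /** Links one shared feature to one location on one sequence. */
    static class Annotation {
        final Feature feature;
        final Location location;
        Annotation(Feature feature, Location location) {
            this.feature = feature;
            this.location = location;
        }
    }

    /** Hypothetical sequence holder; residues are omitted for brevity. */
    static class Sequence {
        final String name;
        final List<Annotation> annotations = new ArrayList<>();
        Sequence(String name) { this.name = name; }
        void addAnnotation(Annotation a) { annotations.add(a); }
        /** True if this sequence carries this exact feature instance. */
        boolean carries(Feature f) {
            for (Annotation a : annotations) {
                if (a.feature == f) return true;
            }
            return false;
        }
    }

    /** Names of all sequences that share the given feature instance. */
    static List<String> partners(Feature f, List<Sequence> all) {
        List<String> names = new ArrayList<>();
        for (Sequence s : all) {
            if (s.carries(f)) names.add(s.name);
        }
        return names;
    }

    static List<String> example() {
        Feature myFeature = new Feature("InterMolecularInteraction");
        Sequence mySeq1 = new Sequence("mySeq1");
        Sequence mySeq2 = new Sequence("mySeq2");
        Sequence mySeq3 = new Sequence("mySeq3"); // not part of the interaction
        mySeq1.addAnnotation(new Annotation(myFeature, new Location("23-35")));
        mySeq2.addAnnotation(new Annotation(myFeature, new Location("34-46")));
        List<Sequence> all = new ArrayList<>();
        all.add(mySeq1);
        all.add(mySeq2);
        all.add(mySeq3);
        return partners(myFeature, all);
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

Because the two Annotation objects hold the same Feature instance, the interaction partners can be recovered by identity, while each sequence keeps its own location.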

With this kind of model, I was also able to use the same concepts
and strategy to model multiple alignments, which can also be seen as a
kind of "inter-molecular relation".

Is there any plan to model this kind of feature in biojava3? If not,
could my proposal be a good start?

Fabrice


--
Dr. Fabrice Jossinet
Laboratoire de Bioinformatique, modelisation et simulation des acides
nucleiques
Universite Louis Pasteur
Institut de biologie moleculaire et cellulaire du CNRS
UPR9002, Architecture et Reactivite de l'ARN
15 rue Rene Descartes
F-67084 Strasbourg Cedex
France

Tel + 33 (0) 3 88 417053
FAX + 33 (0) 3 88 60 22 18

f.jossinet at ibmc.u-strasbg.fr
fjossinet at gmail.com
http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
http://fjossinet.u-strasbg.fr/


From heuermh at acm.org  Thu Oct 23 05:12:07 2008
From: heuermh at acm.org (Michael Heuer)
Date: Thu, 23 Oct 2008 01:12:07 -0400 (EDT)
Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please!
In-Reply-To: <a0d826f40810191718x4f89c210l27f20f2532ed24d3@mail.gmail.com>
Message-ID: <Pine.GSO.4.44.0810230057590.16122-100000@shell3.shore.net>

Sorry, I'm a bit late to the game.  Hope I didn't miss anything
exciting yet!

Would it be better to commit this to trunk, and put the current codebase
out to pasture on a branch?

Is it possible (or desirable) to send SVN commit messages to the dev
mailing list?  Or alternatively, should someone create a project entry for
biojava on CIA.vc?

http://cia.vc


As soon as I can remember my dev.open-bio.org password I'll start
committing stuff, otherwise I'll post patches to bugzilla.

   michael


On Mon, 20 Oct 2008, Richard Holland wrote:

> Hi all,
>
> I've just committed some new code to the biojava3 branch of the biojava-live
> subversion repository. It's the foundations of a brand new alphabet+symbol
> set of classes, and an example of how to use them to represent DNA. You'll
> notice that the new code is very lightweight and allows for a lot more
> flexibility than the old code - for instance, the concept of Alphabet has
> changed radically. It also makes much more extensive use of the Collections
> API.
>
> I haven't got any test cases or usage examples yet but give me a shout if
> you don't understand the code and I'll explain how it works. (Hint:
> SymbolFormat is there to convert Strings into SymbolList objects, and vice
> versa).
>
> So, now we want some volunteers! We're starting from scratch here so there's
> a lot of work to do. The whole of BioJava needs 'translating' into BJ3,
> whether it be copy-and-paste existing classes and modify them to suit the
> new style, or write completely new ones to provide equivalent functionality.
>
>
> I'll post an example of how to do file parsing soon, probably starting with
> FASTA. In the meantime, a good place to start would be for people to design
> object models to represent their favourite data types (e.g. Genbank, or
> microarray data). Utility classes to manipulate those objects would be great
> too.
>
> The object models need to be normalised as much as possible - e.g. if your
> data has a lot of comments, and the order of those comments is important,
> then give your object model a collection of comment objects. The object
> model for each data type should be completely independent and use basic data
> types wherever possible (e.g. store sequences as strings, don't attempt to
> parse them into anything fancy like SymbolLists). The closer the object
> model is to the original data format, the better. There's going to be clever
> tricks when it comes to converting data between different object models
> (e.g. Genbank to INSDSeq), which I will explain later when I put the file
> parsing examples up.
>
> You'll notice how the biojava3 branch uses Maven instead of Ant. This is
> because we want to make it as modular as possible, so if you want to write
> microarray stuff, create a new microarray sub-project (as per the dna
> example that's already there). This way if someone only wants the microarray
> bit of BJ3, they only need install the appropriate JAR file and can ignore
> the rest. (The 'core' module is for stuff that is so generic it could be
> used anywhere, or is used in every single other module.)
>
> If coding isn't your cup of tea, then we would very much welcome testers
> (particularly those who enjoy writing test cases!), documenters
> (particularly code commenters), translators (for internationalisation of the
> code), and of course all those who wish to contribute ideas and suggestions
> no matter how off-the-wall they might be. In particular if you'd like to
> take charge of an area of the development process, e.g. Documentation Chief,
> or Protein Champion, then that would be much appreciated.
>
> I'm very much looking forward to working with everyone on this. Good luck,
> and happy coding!
>
> cheers,
> Richard
>
> PS. Please don't forget to attach the appropriate licence to your code. You
> can copy-and-paste it from the existing classes I just committed this
> evening.
>
> PPS. For those who are worried about backwards compatibility - this was
> discussed on the lists a while back and it was made clear that BJ3 is a
> clean break. However, the existing code will continue to be maintained and
> bugfixed for a couple of years so you don't have to upgrade if you don't
> want to - it just won't have any new features developed for it. This is
> largely because it'll probably take just that long to write all the new BJ3
> code. When we do decide to desupport the existing BJ code, plenty of notice
> will be given (i.e. years as opposed to months).
>
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>


From holland at eaglegenomics.com  Thu Oct 23 06:04:23 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 23 Oct 2008 07:04:23 +0100
Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please!
In-Reply-To: <Pine.GSO.4.44.0810230057590.16122-100000@shell3.shore.net>
References: <a0d826f40810191718x4f89c210l27f20f2532ed24d3@mail.gmail.com>
	<Pine.GSO.4.44.0810230057590.16122-100000@shell3.shore.net>
Message-ID: <a0d826f40810222304w47be582bp5107e1d5718683c9@mail.gmail.com>

>
>
> Would it be better to commit this to trunk, and put the current codebase
> out to pasture on a branch?


Andreas is Mr.SVN. Andreas, what do you think?


>
> Is it possible (or desirable) to send SVN commit messages to the dev
> mailing list?  Or alternatively, should someone create a project entry for
> biojava on CIA.vc?
>
> http://cia.vc


I think commit messages to biojava-dev would be very useful. If nothing
else, it provides a good indicator of activity to casual observers, and also
lets people keep an automated eye (by mail filtering) on commits in the
areas that interest them most.


>
> As soon as I can remember my dev.open-bio.org password I'll start
> committing stuff, otherwise I'll post patches to bugzilla.


If you've forgotten it, let support at OBF know and they'll reset it for
you.

cheers,
Richard




-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From ch.koeberle at googlemail.com  Thu Oct 23 08:58:15 2008
From: ch.koeberle at googlemail.com (Christian Köberle)
Date: Thu, 23 Oct 2008 10:58:15 +0200
Subject: [Biojava-dev] BioSQL postgre BioEntryRelationship
Message-ID: <ee99d5730810230158r6834481y54751b821cc8873d@mail.gmail.com>

Hi,
I found a bug in the PostgreSQL mapping file for BioEntryRelationship, in
this line:
<many-to-one name="object" class="Feature" column="object_bioentry_id"
not-null="true" cascade="persist,merge,save-update" node="@objectFeatureId"
embed-xml="false"/>
The value of the class attribute has to be "BioEntry".

BioEntry also lacks a method for accessing the BioEntryRelationships in
which it is the subject_bioentry. As I understand it, a
BioEntryRelationship is a parent-child relationship, so it would be nice to
have access to both sides.

Furthermore, the Hibernate mapping strategy for BioSQL is quite slow and
produces a lot of database queries, because lazy fetching is disabled for
all lists and sets. In this mode Hibernate executes one query for each
element in a list or set. A faster way is to enable lazy fetching and use
dedicated methods to load each list; each of these methods executes only
one query.
For example:

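public List<BioEntry> getParents(BioEntry bioEntry){

// Select the subject side of every relationship whose object is bioEntry;
// this matches the join in the SQL logged for getParents() below.
// "session" is assumed to be an open Hibernate Session.
String stmt = "SELECT r.subject FROM BioEntryRelationship r "
        + "WHERE r.object = :object";
Query query = session.createQuery(stmt);
query.setParameter("object", bioEntry);
return query.list();

}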
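
The difference between one query per element and one query per list can be illustrated without a database. The following is only a toy sketch: FakeDb is a hypothetical in-memory stand-in that counts "queries", it is not Hibernate, and the method names are invented for the illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of query counting; FakeDb is hypothetical and not Hibernate.
class QueryCountSketch {

    static class FakeDb {
        private final Map<Integer, String> nameById = new HashMap<>();
        private final Map<Integer, List<Integer>> relatedIds = new HashMap<>();
        int queries = 0;

        void addEntry(int id, String name) { nameById.put(id, name); }

        void relate(int entryId, int relatedId) {
            relatedIds.computeIfAbsent(entryId, k -> new ArrayList<>()).add(relatedId);
        }

        /** One query that returns only the ids of the related entries. */
        List<Integer> selectRelatedIds(int entryId) {
            queries++;
            return new ArrayList<>(relatedIds.getOrDefault(entryId, Collections.emptyList()));
        }

        /** One query per single entry. */
        String selectName(int id) {
            queries++;
            return nameById.get(id);
        }

        /** One joined query that returns all related names at once. */
        List<String> selectRelatedNames(int entryId) {
            queries++;
            List<String> names = new ArrayList<>();
            for (int id : relatedIds.getOrDefault(entryId, Collections.emptyList())) {
                names.add(nameById.get(id));
            }
            return names;
        }
    }

    /** Per-element loading: 1 query for the ids plus 1 per related entry. */
    static List<String> loadPerElement(FakeDb db, int entryId) {
        List<String> names = new ArrayList<>();
        for (int id : db.selectRelatedIds(entryId)) {
            names.add(db.selectName(id));
        }
        return names;
    }

    /** Dedicated loader method: a single query for the whole list. */
    static List<String> loadInOneQuery(FakeDb db, int entryId) {
        return db.selectRelatedNames(entryId);
    }

    public static void main(String[] args) {
        FakeDb db = new FakeDb();
        db.addEntry(1, "cluster");
        for (int id = 2; id <= 4; id++) {
            db.addEntry(id, "gene" + id);
            db.relate(1, id);
        }
        loadPerElement(db, 1);
        System.out.println("per element: " + db.queries + " queries");
        db.queries = 0;
        loadInOneQuery(db, 1);
        System.out.println("one query:   " + db.queries + " queries");
    }
}
```

With N related entries the first strategy issues N + 1 queries and the second issues one, which is the effect the timings below are meant to show on a real database.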


This is a factor of 2 to 4 faster than the method BioEntry.getRelationships().
When all dependencies of a BioEntry object are involved, a select with lazy
fetching can be 500 times faster than a select with eager fetching (for
UniGene cluster Hs.4, for example).
Here is an example for the relationship between UniGene cluster Hs.2 and the
gene BC067218 (we use BioSQL to store UniGene):

getParents():
runtime: 14 msec
SQL: Hibernate: select bioentry1_.bioentry_id as bioentry1_89_,
bioentry1_.name as name89_, bioentry1_.identifier as identifier89_,
bioentry1_.accession as accession89_, bioentry1_.description as
descript5_89_, bioentry1_.version as version89_, bioentry1_.division as
division89_, bioentry1_.taxon_id as taxon8_89_, bioentry1_.biodatabase_id as
biodatab9_89_, bioentry1_1_.version as version93_, bioentry1_1_.length as
length93_, bioentry1_1_.alphabet as alphabet93_, bioentry1_1_.seq as seq93_,
case when bioentry1_1_.bioentry_id is not null then 2 when
bioentry1_.bioentry_id is not null then 0 end as clazz_ from
unigene.bioentry_relationship bioentryre0_ inner join unigene.bioentry
bioentry1_ on bioentryre0_.subject_bioentry_id=bioentry1_.bioentry_id left
outer join unigene.biosequence bioentry1_1_ on
bioentry1_.bioentry_id=bioentry1_1_.bioentry_id left outer join
unigene.biosequence bioentry1_2_ on
bioentry1_.bioentry_id=bioentry1_2_.bioentry_id where
bioentryre0_.object_bioentry_id=?


bioEntry.getRelationships():
runtime: 36 msec
SQL: Hibernate: select bioentry0_.bioentry_id as bioentry1_89_,
bioentry0_.name as name89_, bioentry0_.identifier as identifier89_,
bioentry0_.accession as accession89_, bioentry0_.description as
descript5_89_, bioentry0_.version as version89_, bioentry0_.division as
division89_, bioentry0_.taxon_id as taxon8_89_, bioentry0_.biodatabase_id as
biodatab9_89_, bioentry0_1_.version as version93_, bioentry0_1_.length as
length93_, bioentry0_1_.alphabet as alphabet93_, bioentry0_1_.seq as seq93_,
case when bioentry0_1_.bioentry_id is not null then 2 when
bioentry0_.bioentry_id is not null then 0 end as clazz_ from
unigene.bioentry bioentry0_ left outer join unigene.biosequence bioentry0_1_
on bioentry0_.bioentry_id=bioentry0_1_.bioentry_id left outer join
unigene.biosequence bioentry0_2_ on
bioentry0_.bioentry_id=bioentry0_2_.bioentry_id where bioentry0_.name=?
Hibernate: select relationsh0_.object_bioentry_id as object3_1_,
relationsh0_.bioentry_relationship_id as bioentry1_1_,
relationsh0_.bioentry_relationship_id as bioentry1_95_0_,
relationsh0_.term_id as term2_95_0_, relationsh0_.object_bioentry_id as
object3_95_0_, relationsh0_.subject_bioentry_id as subject4_95_0_,
relationsh0_.rank as rank95_0_ from unigene.bioentry_relationship
relationsh0_ where relationsh0_.object_bioentry_id=?
Hibernate: select namespace0_.biodatabase_id as biodatab1_80_0_,
namespace0_.name as name80_0_, namespace0_.authority as authority80_0_,
namespace0_.description as descript4_80_0_ from unigene.biodatabase
namespace0_ where namespace0_.biodatabase_id=?
Hibernate: select bioentry0_.bioentry_id as bioentry1_89_0_, bioentry0_.name
as name89_0_, bioentry0_.identifier as identifier89_0_, bioentry0_.accession
as accession89_0_, bioentry0_.description as descript5_89_0_,
bioentry0_.version as version89_0_, bioentry0_.division as division89_0_,
bioentry0_.taxon_id as taxon8_89_0_, bioentry0_.biodatabase_id as
biodatab9_89_0_, bioentry0_1_.version as version93_0_, bioentry0_1_.length
as length93_0_, bioentry0_1_.alphabet as alphabet93_0_, bioentry0_1_.seq as
seq93_0_, case when bioentry0_1_.bioentry_id is not null then 2 when
bioentry0_.bioentry_id is not null then 0 end as clazz_0_ from
unigene.bioentry bioentry0_ left outer join unigene.biosequence bioentry0_1_
on bioentry0_.bioentry_id=bioentry0_1_.bioentry_id left outer join
unigene.biosequence bioentry0_2_ on
bioentry0_.bioentry_id=bioentry0_2_.bioentry_id where
bioentry0_.bioentry_id=?
Hibernate: select namespace0_.biodatabase_id as biodatab1_80_0_,
namespace0_.name as name80_0_, namespace0_.authority as authority80_0_,
namespace0_.description as descript4_80_0_ from unigene.biodatabase
namespace0_ where namespace0_.biodatabase_id=?
Hibernate: select term0_.term_id as term1_84_0_, term0_.name as name84_0_,
term0_.identifier as identifier84_0_, term0_.definition as definition84_0_,
term0_.is_obsolete as is5_84_0_, term0_.ontology_id as ontology6_84_0_ from
unigene.term term0_ where term0_.term_id=?
Hibernate: select ontology0_.ontology_id as ontology1_83_0_, ontology0_.name
as name83_0_, ontology0_.definition as definition83_0_ from unigene.ontology
ontology0_ where ontology0_.ontology_id=?
Hibernate: select termset0_.ontology_id as ontology6_1_, termset0_.term_id
as term1_1_, termset0_.term_id as term1_84_0_, termset0_.name as name84_0_,
termset0_.identifier as identifier84_0_, termset0_.definition as
definition84_0_, termset0_.is_obsolete as is5_84_0_, termset0_.ontology_id
as ontology6_84_0_ from unigene.term termset0_ where termset0_.ontology_id=?
Hibernate: select tripleset0_.ontology_id as ontology5_1_,
tripleset0_.term_relationship_id as term1_1_,
tripleset0_.term_relationship_id as term1_87_0_, tripleset0_.subject_term_id
as subject2_87_0_, tripleset0_.object_term_id as object3_87_0_,
tripleset0_.predicate_term_id as predicate4_87_0_, tripleset0_.ontology_id
as ontology5_87_0_ from unigene.term_relationship tripleset0_ where
tripleset0_.ontology_id=?
Hibernate: select rankedcros0_.term_id as term1_0_, rankedcros0_.dbxref_id
as dbxref2_0_, rankedcros0_.rank as rank0_ from unigene.term_dbxref
rankedcros0_ where rankedcros0_.term_id=?
Hibernate: select synonymset0_.term_id as term1_0_, synonymset0_.synonym as
synonym0_ from unigene.term_synonym synonymset0_ where
synonymset0_.term_id=?

-- 
Christian Köberle


From dicknetherlands at gmail.com  Thu Oct 23 09:45:53 2008
From: dicknetherlands at gmail.com (Richard Holland)
Date: Thu, 23 Oct 2008 10:45:53 +0100
Subject: [Biojava-dev] BioSQL postgre BioEntryRelationship
In-Reply-To: <ee99d5730810230158r6834481y54751b821cc8873d@mail.gmail.com>
References: <ee99d5730810230158r6834481y54751b821cc8873d@mail.gmail.com>
Message-ID: <a0d826f40810230245q730eb936g4109c71b9893d47@mail.gmail.com>

Christian,

Thanks for your comments.

I'm not sure which file you're referring to, or what version of BioJava you
have, as the line you quote does not appear in any of the current hbm.xml
files in the trunk of Subversion.

Also, the BioEntryRelationship interface and its implementations do already
have getSubject() and getObject() methods which return the parent and child
BioEntry instances.

The BioEntry interface itself has a getBioEntryRelationships() method which
returns all relationships in which it is the object BioEntry. You could use
HQL to obtain those for which it is the subject, but you are right that it
would be good to have a method that returns the latter. Could you raise a
BugZilla request for this?

It would be good if you could do some thorough testing of your lazy loading
suggestions on some other use cases before we decide whether or not to adopt
that approach in future developments. Use cases would include:

1. have a very large database with thousands of related records in it (e.g.
load the whole of GenBank). Iterate over all the records in the database and
perform a simple read operation on each that hits the modified methods. See
if you run out of memory.

2. like 1, but perform a series of repeated read/write operations using the
modified methods, with a final commit to attempt to write the results back
to see if they still persist correctly.

The reason is that the modified methods might cause problems with those
people who are processing large volumes of data in their databases. If all
related records are loaded at once, even only on demand, instead of one at a
time, it will cause memory issues. The trade off is therefore memory vs.
speed. We opted for the memory option because it makes life easier for most
novice coders to not have to trace out-of-memory exceptions (although they
will still occur using the existing methods, but it happens less often).

Also, your method reruns the query every time it is called. It should
probably cache the results after the first call, both to prevent objects
being reloaded unnecessarily and to prevent objects that were modified after
a previous call from being overwritten by a subsequent call. In addition, if
Hibernate does not receive back the same set that it auto-loaded as a
property via the default get() method when it comes to save the object, it
will throw a wobbly and refuse to commit.
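
As a rough illustration of the caching idea only, here is a sketch of a loader that runs its query once per key and returns the same list on later calls. The class name and its loader function are hypothetical; the sketch does not touch Hibernate and does not address the commit problem described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: caches the result of an expensive lookup per key.
class CachedLookupSketch<K, V> {
    private final Function<K, List<V>> loader;
    private final Map<K, List<V>> cache = new HashMap<>();
    int loads = 0; // how many times the underlying loader actually ran

    CachedLookupSketch(Function<K, List<V>> loader) {
        this.loader = loader;
    }

    /** Runs the loader on the first call for a key, then reuses the result. */
    List<V> get(K key) {
        List<V> result = cache.get(key);
        if (result == null) {
            loads++;
            // Copy and wrap so callers cannot change the cached list itself.
            result = Collections.unmodifiableList(new ArrayList<>(loader.apply(key)));
            cache.put(key, result);
        }
        return result;
    }

    /** Drops a cached result, e.g. after the underlying data was changed. */
    void invalidate(K key) {
        cache.remove(key);
    }

    public static void main(String[] args) {
        CachedLookupSketch<String, String> parents =
                new CachedLookupSketch<>(name -> Collections.singletonList("parent of " + name));
        parents.get("Hs.2");
        parents.get("Hs.2");
        System.out.println("loader runs: " + parents.loads);
    }
}
```

Note that every cached list stays in memory until it is invalidated, so this moves further towards the speed side of the memory vs. speed trade-off mentioned earlier.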

cheers,
Richard




-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From bugzilla-daemon at portal.open-bio.org  Thu Oct 23 13:16:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 23 Oct 2008 09:16:43 -0400
Subject: [Biojava-dev] [Bug 2625] New: Parent Child Relationship of BioEntry
	via BioEntryRelationship
Message-ID: <bug-2625-485@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2625

           Summary: Parent Child Relationship of BioEntry via
                    BioEntryRelationship
           Product: BioJava
           Version: unspecified
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DB / BioSQL
        AssignedTo: biojava-dev at biojava.org
        ReportedBy: ch.koeberle at googlemail.com


A BioEntry object has only the method getRelationships(), which returns all
BioEntryRelationship objects for which the BioEntry object is the result of
BioEntryRelationship.getObject(). This is because BioEntry.hbm.xml contains
only this mapping:
<set name="relationships" lazy="false" cascade="all-delete-orphan"
sort="natural" inverse="true">
<key column="object_bioentry_id"/>
<one-to-many class="BioEntryRelationship" embed-xml="true"/>
</set>

I am missing something like this:
BioEntry.getReverseRelationships() (or getChilds())
<set name="reverseRelationships" lazy="false" cascade="all-delete-orphan"
sort="natural" inverse="true">
<key column="subject_bioentry_id"/>
<one-to-many class="BioEntryRelationship" embed-xml="true"/>
</set>
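
In plain Java, the two collections requested above could look roughly like the sketch below. The Entry and Relationship classes are simplified, hypothetical stand-ins and not the BioJava BioEntry and BioEntryRelationship types; the sketch shows only the two directions of navigation, not the Hibernate mapping.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Simplified, hypothetical stand-ins for BioEntry and BioEntryRelationship.
class RelationshipSketch {

    static class Entry {
        final String name;
        // Relationships in which this entry is the object.
        private final Set<Relationship> relationships = new LinkedHashSet<>();
        // Relationships in which this entry is the subject.
        private final Set<Relationship> reverseRelationships = new LinkedHashSet<>();

        Entry(String name) { this.name = name; }

        Set<Relationship> getRelationships() {
            return Collections.unmodifiableSet(relationships);
        }

        Set<Relationship> getReverseRelationships() {
            return Collections.unmodifiableSet(reverseRelationships);
        }
    }

    static class Relationship {
        final Entry object;
        final Entry subject;

        /** Registers the new relationship on both of its ends. */
        Relationship(Entry object, Entry subject) {
            this.object = object;
            this.subject = subject;
            object.relationships.add(this);
            subject.reverseRelationships.add(this);
        }
    }

    public static void main(String[] args) {
        Entry first = new Entry("entry A");
        Entry second = new Entry("entry B");
        new Relationship(first, second);
        System.out.println(first.name + " as object:  " + first.getRelationships().size());
        System.out.println(second.name + " as subject: " + second.getReverseRelationships().size());
    }
}
```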


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From andreas at sdsc.edu  Thu Oct 23 13:57:41 2008
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 23 Oct 2008 06:57:41 -0700
Subject: [Biojava-dev] BioJava 3 Begins - Volunteers please!
In-Reply-To: <a0d826f40810222304w47be582bp5107e1d5718683c9@mail.gmail.com>
References: <a0d826f40810191718x4f89c210l27f20f2532ed24d3@mail.gmail.com>
	<Pine.GSO.4.44.0810230057590.16122-100000@shell3.shore.net>
	<a0d826f40810222304w47be582bp5107e1d5718683c9@mail.gmail.com>
Message-ID: <59a41c430810230657p73b5d10kbf497c20fdfbe893@mail.gmail.com>

>> Would it be better to commit this to trunk, and put the current codebase
>> out to pasture on a branch?

At the moment we have a number of unreleased bug fixes in
biojava-live/trunk. Also, if somebody were to start using BJ right now,
I would still recommend using 1.6. So I would say let's leave it the
way it is for the moment. Once we reach alpha stage we could release a
final BioJava 1.7 and afterwards switch the branches in svn.

About the commit messages sent to this list: can we make this once
per day? I can also set something up as part of CruiseControl...

Andreas


From andreas at sdsc.edu  Thu Oct 23 17:24:27 2008
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 23 Oct 2008 10:24:27 -0700
Subject: [Biojava-dev] svn write access
In-Reply-To: <61C028BE-074B-4E16-A883-B8A2F6AD883E@ibmc.u-strasbg.fr>
References: <6F5AE187-46C5-405C-80FB-495F97C704B5@ibmc.u-strasbg.fr>
	<59a41c430810230738p400c185chbc6a96f871dbb71b@mail.gmail.com>
	<61C028BE-074B-4E16-A883-B8A2F6AD883E@ibmc.u-strasbg.fr>
Message-ID: <59a41c430810231024m7b5daf92t3bf6a1a354723301@mail.gmail.com>

Hi Fabrice,

In order to obtain a developer checkout you have to follow the
procedure described on
http://biojava.org/wiki/CVS_to_SVN_Migration
under the section
Developer checkout

code.open-bio is a read-only copy of the SVN repository for anonymous
checkout. The "real" developer repository is on the dev.open-bio
machine and you can only access it via ssh. This setup is for security
reasons. code.open-bio and dev.open-bio are synchronized
approximately every 20 minutes.

Andreas

On Thu, Oct 23, 2008 at 8:22 AM, Fabrice Jossinet
<f.jossinet at ibmc.u-strasbg.fr> wrote:
> Ok, I did that with the "code.open-bio.org" server and like that:
>
> svn co svn://code.open-bio.org/biojava/biojava-live/branches/biojava3
> --username fjossinet --password blabla
>
> In this case, it seems it doesn't work.
>
> I will try the other way as described in the biojava homepage
>
> Thanx
>
> F
> On 23 Oct 2008, at 16:38, Andreas Prlic wrote:
>
>> you need to check out with that account, so the svn flags are all set
>> correctly.
>>
>> see the biojava  homepage for how to check out with a developer account.
>> A
>>
>> 2008/10/23 Fabrice Jossinet <f.jossinet at ibmc.u-strasbg.fr>:
>>>
>>> Hi Andreas,
>>>
>>> Mauricio has created me the account fjossinet for the machine
>>> dev.open-bio.org. But I think this is only the first step since I still
>>> don't have the write access on the svn machine.
>>>
>>> Thank you for your help
>>>
>>> Regards
>>>
>>> Fabrice
>>>
>>>
>>> --
>>> Dr. Fabrice Jossinet
>>> Laboratoire de Bioinformatique, modelisation et simulation des acides
>>> nucleiques
>>> Universite Louis Pasteur
>>> Institut de biologie moleculaire et cellulaire du CNRS
>>> UPR9002, Architecture et Reactivite de l'ARN
>>> 15 rue Rene Descartes
>>> F-67084 Strasbourg Cedex
>>> France
>>>
>>> Tel + 33 (0) 3 88 417053
>>> FAX + 33 (0) 3 88 60 22 18
>>>
>>> f.jossinet at ibmc.u-strasbg.fr
>>> fjossinet at gmail.com
>>> http://www-ibmc.u-strasbg.fr/arn/Westhof/index.html
>>> http://fjossinet.u-strasbg.fr/
>>>
>>>
>>>
>>>
>>>
>
>


From andreas at sdsc.edu  Fri Oct 24 03:17:02 2008
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 23 Oct 2008 20:17:02 -0700
Subject: [Biojava-dev] biojava 3 docu on wiki
Message-ID: <59a41c430810232017wbc8874fnf829c5b9e7ced4a9@mail.gmail.com>

Hi,

I summarized the current status of the BioJava3 project at

http://biojava.org/wiki/BioJava3_project

feel free to update/add/comment.

Andreas


From andreas at sdsc.edu  Fri Oct 24 04:01:31 2008
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 23 Oct 2008 21:01:31 -0700
Subject: [Biojava-dev] biojava 3 - java version
Message-ID: <59a41c430810232101q7e3c2d36r822634c0bae4ad43@mail.gmail.com>

Hi,

I just tried to get an initial svn checkout of biojava3 on my mac at
home. It fails to build since there is no Java 1.6 available for my
OSX 10.4.11 ...
Is there a strong reason why we should enforce Java 1.6? Otherwise it
would be good to support 1.5+.

Andreas


From f.jossinet at ibmc.u-strasbg.fr  Fri Oct 24 08:21:15 2008
From: f.jossinet at ibmc.u-strasbg.fr (Fabrice Jossinet)
Date: Fri, 24 Oct 2008 10:21:15 +0200
Subject: [Biojava-dev] svn write access
In-Reply-To: <59a41c430810231024m7b5daf92t3bf6a1a354723301@mail.gmail.com>
References: <6F5AE187-46C5-405C-80FB-495F97C704B5@ibmc.u-strasbg.fr>
	<59a41c430810230738p400c185chbc6a96f871dbb71b@mail.gmail.com>
	<61C028BE-074B-4E16-A883-B8A2F6AD883E@ibmc.u-strasbg.fr>
	<59a41c430810231024m7b5daf92t3bf6a1a354723301@mail.gmail.com>
Message-ID: <4CF8A26B-C50A-40F2-A7A5-B9F958F0F677@ibmc.u-strasbg.fr>

Hi Andreas,

Thank you for these details. I have added the new RNA module to the
biojava3 branch and I have updated the pom.xml file in the root
directory of this branch.

Fabrice



From dicknetherlands at gmail.com  Fri Oct 24 09:58:18 2008
From: dicknetherlands at gmail.com (Richard Holland)
Date: Fri, 24 Oct 2008 10:58:18 +0100
Subject: [Biojava-dev] biojava 3 - java version
In-Reply-To: <59a41c430810232101q7e3c2d36r822634c0bae4ad43@mail.gmail.com>
References: <59a41c430810232101q7e3c2d36r822634c0bae4ad43@mail.gmail.com>
Message-ID: <a0d826f40810240258t14b75c86r616671cb34af011@mail.gmail.com>

It's only the older PPC Mac models (running Mac OS X 10.4 or older) which
can't get any newer official versions of Java than 1.5 / 5.0.

However, an alternative (free) route for obtaining a Java 1.6 / 6.0 compiler
is provided for these older machines:
http://landonf.bikemonkey.org/static/soylatte/

We wanted to move to Java 6 because it'll likely take about a year to get
BJ3 fully up and running, by which time Java 6 will probably be the oldest
supported version of Java available from Sun. 5.0 is already end-of-lifed,
and with 7.0 due out in January it is likely to be desupported very soon.
When 8.0 arrives, probably about 12 months after BJ3 is finished, 5.0 will
definitely become desupported.

cheers,
Richard


2008/10/24 Andreas Prlic <andreas at sdsc.edu>

> Hi,
>
> I just tried to get an initial svn checkout of biojava3 on my mac at
> home. It fails to build since there is no Java 1.6 available for my
> OSX 10.4.11 ...
> Is there a strong reason why we should enforce java 1.6? otherwise
> would be good to support 1.5+
>
> Andreas
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From f.jossinet at ibmc.u-strasbg.fr  Fri Oct 24 10:20:59 2008
From: f.jossinet at ibmc.u-strasbg.fr (Fabrice Jossinet)
Date: Fri, 24 Oct 2008 12:20:59 +0200
Subject: [Biojava-dev] biojava 3 - java version
In-Reply-To: <59a41c430810232101q7e3c2d36r822634c0bae4ad43@mail.gmail.com>
References: <59a41c430810232101q7e3c2d36r822634c0bae4ad43@mail.gmail.com>
Message-ID: <6DF53C0D-E0CC-4504-979B-9122AD39EF62@ibmc.u-strasbg.fr>

Just to refresh the memory....

Major changes included in Java 6:

     * Support for older Win9x versions dropped. The last version for
       Windows 98 and Windows ME is Java Runtime Environment Version 5.0
       Update 16 (1.5.0.16).
     * Scripting Language Support (JSR 223): Generic API for tight
       integration with scripting languages, and built-in Mozilla
       Javascript Rhino integration.
     * Dramatic performance improvements for the core platform, and
       Swing.
     * Improved Web Service support through JAX-WS (JSR 224).
     * JDBC 4.0 support (JSR 221).
     * Java Compiler API (JSR 199): an API allowing a Java program to
       select and invoke a Java Compiler programmatically.
     * Upgrade of JAXB to version 2.0: Including integration of a StAX
       parser.
     * Support for pluggable annotations (JSR 269).
     * Many GUI improvements, such as integration of SwingWorker in
       the API, table sorting and filtering, and true Swing
       double-buffering (eliminating the gray-area effect).

Perhaps the core module could target Java 1.5, and if someone needs,
for example, the GUI improvements for their module, that module could
target a newer version.

Would that be possible?

F

On 24 Oct 2008, at 06:01, Andreas Prlic wrote:

> Hi,
>
> I just tried to get an initial svn checkout of biojava3 on my mac at
> home. It fails to build since there is no Java 1.6 available for my
> OSX 10.4.11 ...
> Is there a strong reason why we should enforce java 1.6? otherwise
> would be good to support 1.5+
>
> Andreas


From dicknetherlands at gmail.com  Fri Oct 24 11:14:43 2008
From: dicknetherlands at gmail.com (Richard Holland)
Date: Fri, 24 Oct 2008 12:14:43 +0100
Subject: [Biojava-dev] biojava 3 - java version
In-Reply-To: <6DF53C0D-E0CC-4504-979B-9122AD39EF62@ibmc.u-strasbg.fr>
References: <59a41c430810232101q7e3c2d36r822634c0bae4ad43@mail.gmail.com>
	<6DF53C0D-E0CC-4504-979B-9122AD39EF62@ibmc.u-strasbg.fr>
Message-ID: <a0d826f40810240414w2fffda69nb171634f0808fb73@mail.gmail.com>

If you can find a way to make Maven do that, then I'm happy for you to make
the relevant changes.

cheers,
Richard

2008/10/24 Fabrice Jossinet <f.jossinet at ibmc.u-strasbg.fr>

> Just to refresh the memory....
>
> Major changes included in Java 6:
>
>    * Support for older Win9x versions dropped. The last version for Windows
> 98 and Windows ME is Java Runtime Environment Version 5.0 Update 16 (
> 1.5.0.16).
>    * Scripting Language Support (JSR 223): Generic API for tight
> integration with scripting languages, and built-in Mozilla Javascript Rhino
> integration
>    * Dramatic performance improvements for the core platform, and Swing.
>    * Improved Web Service support through JAX-WS (JSR 224)
>    * JDBC 4.0 support (JSR 221).
>    * Java Compiler API (JSR 199): an API allowing a Java program to select
> and invoke a Java Compiler programmatically.
>    * Upgrade of JAXB to version 2.0: Including integration of a StAX
> parser.
>    * Support for pluggable annotations (JSR 269).
>    * Many GUI improvements, such as integration of SwingWorker in the API,
> table sorting and filtering, and true Swing double-buffering (eliminating
> the gray-area effect).
>
> Perhaps the core module can be linked to the 1.5 version. And if someone
> needs, for example, the improvements of the GUI for his module, this module
> will be linked to another version.
>
> Possible or not ?
>
> F
>
> On 24 Oct 2008, at 06:01, Andreas Prlic wrote:
>
>
>  Hi,
>>
>> I just tried to get an initial svn checkout of biojava3 on my mac at
>> home. It fails to build since there is no Java 1.6 available for my
>> OSX 10.4.11 ...
>> Is there a strong reason why we should enforce java 1.6? otherwise
>> would be good to support 1.5+
>>
>> Andreas
>>
>
>
>


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From ayates at ebi.ac.uk  Fri Oct 24 11:28:56 2008
From: ayates at ebi.ac.uk (Andy Yates)
Date: Fri, 24 Oct 2008 12:28:56 +0100
Subject: [Biojava-dev] biojava 3 - java version
In-Reply-To: <a0d826f40810240414w2fffda69nb171634f0808fb73@mail.gmail.com>
References: <59a41c430810232101q7e3c2d36r822634c0bae4ad43@mail.gmail.com>	<6DF53C0D-E0CC-4504-979B-9122AD39EF62@ibmc.u-strasbg.fr>
	<a0d826f40810240414w2fffda69nb171634f0808fb73@mail.gmail.com>
Message-ID: <4901B178.7090307@ebi.ac.uk>

Yes, I believe it is possible to get a module compiled against a
different version of Java, as seen here:

http://maven.apache.org/plugins/maven-compiler-plugin/howto.html

However, to do this properly requires compiling the code using the 1.5
JDK sources, especially if we are going to leverage the API as much as we
can. My group has already encountered this with changes to the
java.sql.Connection interfaces, meaning we have to compile against 1.5
sources.
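
As a small illustration of why the target setting alone may not be enough (this example is not from the thread): String.isEmpty() only exists since Java 6, so code that calls it can still compile on a JDK 6 with -source 1.5 -target 1.5 unless the compiler is also pointed at the 1.5 class library, and would then fail on a 1.5 runtime with a NoSuchMethodError. The helper below is hypothetical and simply shows a 1.5-compatible alternative.

```java
// Hypothetical helper for illustration; not part of BioJava.
class Java5SafeStrings {

    // String.isEmpty() was added in Java 6. Checking the length works on
    // Java 1.5 as well. Unlike String.isEmpty(), this variant also treats
    // null as empty instead of throwing a NullPointerException.
    static boolean isEmpty(String s) {
        return s == null || s.length() == 0;
    }

    public static void main(String[] args) {
        System.out.println(isEmpty(""));     // true
        System.out.println(isEmpty("ATGC")); // false
    }
}
```

Compiling the 1.5 modules against the real 1.5 class library turns this kind of mistake into a compile-time error rather than a runtime one.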

Andy

Richard Holland wrote:
> If you can find a way to make Maven do that, then I'm happy for you to make
> the relevant changes.
> 
> cheers,
> Richard
> 
> 2008/10/24 Fabrice Jossinet <f.jossinet at ibmc.u-strasbg.fr>
> 
>> Just to refresh the memory....
>>
>> Major changes included in Java 6:
>>
>>    * Support for older Win9x versions dropped. The last version for Windows
>> 98 and Windows ME is Java Runtime Environment Version 5.0 Update 16 (
>> 1.5.0.16).
>>    * Scripting Language Support (JSR 223): Generic API for tight
>> integration with scripting languages, and built-in Mozilla Javascript Rhino
>> integration
>>    * Dramatic performance improvements for the core platform, and Swing.
>>    * Improved Web Service support through JAX-WS (JSR 224)
>>    * JDBC 4.0 support (JSR 221).
>>    * Java Compiler API (JSR 199): an API allowing a Java program to select
>> and invoke a Java Compiler programmatically.
>>    * Upgrade of JAXB to version 2.0: Including integration of a StAX
>> parser.
>>    * Support for pluggable annotations (JSR 269).
>>    * Many GUI improvements, such as integration of SwingWorker in the API,
>> table sorting and filtering, and true Swing double-buffering (eliminating
>> the gray-area effect).
>>
>> Perhaps the core module can be linked to the 1.5 version. And if someone
>> needs, for example, the improvements of the GUI for his module, this module
>> will be linked to another version.
>>
>> Possible or not ?
>>
>> F
>>
>> On 24 Oct 2008, at 06:01, Andreas Prlic wrote:
>>
>>
>>  Hi,
>>> I just tried to get an initial svn checkout of biojava3 on my mac at
>>> home. It fails to build since there is no Java 1.6 available for my
>>> OSX 10.4.11 ...
>>> Is there a strong reason why we should enforce java 1.6? otherwise
>>> would be good to support 1.5+
>>>
>>> Andreas
>>>
>>
>>
> 
> 
> 


From pzgyuanf at gmail.com  Sat Oct 25 14:00:17 2008
From: pzgyuanf at gmail.com (pprun)
Date: Sat, 25 Oct 2008 22:00:17 +0800
Subject: [Biojava-dev] Test failed for Alphabet.getSymbolMatchType method
Message-ID: <49032671.1080309@gmail.com>

Hi,
The current implementation uses the same condition, equalsIgnoreCase, for
both EXACT_STRING_MATCH and MIXED_CASE_MATCH, so the second branch can
never be reached and MIXED_CASE_MATCH is never returned:


public SymbolMatchType getSymbolMatchType(Symbol a, Symbol b) {
    ...
    if (a.toString().equalsIgnoreCase(b.toString())) {
        return SymbolMatchType.EXACT_STRING_MATCH;
    }
    if (a.toString().equalsIgnoreCase(b.toString())) {
        return SymbolMatchType.MIXED_CASE_MATCH;
    }
    ...

String.equals should be used for EXACT_STRING_MATCH:

public SymbolMatchType getSymbolMatchType(Symbol a, Symbol b) {
    ...
    if (a.toString().equals(b.toString())) {
        return SymbolMatchType.EXACT_STRING_MATCH;
    }
    if (a.toString().equalsIgnoreCase(b.toString())) {
        return SymbolMatchType.MIXED_CASE_MATCH;
    }
    ...
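
The effect of the proposed ordering can be sketched on plain strings, independent of the BJ3 classes. The enum below is a hypothetical stand-in that reuses the two constant names quoted above; NO_MATCH is an assumption added only to make the sketch complete.

```java
// Stand-alone sketch; MatchType is a hypothetical stand-in, not the BJ3 enum.
class SymbolMatchSketch {

    enum MatchType { EXACT_STRING_MATCH, MIXED_CASE_MATCH, NO_MATCH }

    // The case-sensitive check must come first; otherwise the
    // case-insensitive check would also swallow the exact matches.
    static MatchType classify(String a, String b) {
        if (a.equals(b)) {
            return MatchType.EXACT_STRING_MATCH;
        }
        if (a.equalsIgnoreCase(b)) {
            return MatchType.MIXED_CASE_MATCH;
        }
        return MatchType.NO_MATCH;
    }

    public static void main(String[] args) {
        System.out.println(classify("ATGC", "ATGC")); // EXACT_STRING_MATCH
        System.out.println(classify("ATGC", "aTGC")); // MIXED_CASE_MATCH
        System.out.println(classify("ATGC", "TTGC")); // NO_MATCH
    }
}
```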

The test case used to identify the above bug is:

/*
 * BioJava development code
 *
 * This code may be freely distributed and modified under the
 * terms of the GNU Lesser General Public Licence. This should
 * be distributed with the code. If you do not have a copy,
 * see:
 *
 * http://www.gnu.org/copyleft/lesser.html
 *
 * Copyright for this code is held jointly by the individual
 * authors. These should be listed in @author doc comments.
 *
 * For more information on the BioJava project and its aims,
 * or to join the biojava-l mailing list, visit the home page
 * at:
 *
 * http://www.biojava.org/
 *
 */
package org.biojava.core.symbol;

import org.junit.After;
import org.junit.AfterClass;
import org.junit.Before;
import org.junit.BeforeClass;
import org.junit.Test;
import static org.junit.Assert.*;

/**
 *
 * @author pprun
 */
public class AlphabetTest {

    public AlphabetTest() {
    }

    @BeforeClass
    public static void setUpClass() throws Exception {
    }

    @AfterClass
    public static void tearDownClass() throws Exception {
    }

    @Before
    public void setUp() {
    }

    @After
    public void tearDown() {
    }

    /**
     * Test of getSymbolMatchType method, of class Alphabet.
     */
    @Test
    public void testGetSymbolMatchType() {
        System.out.println("getSymbolMatchType");

        Alphabet testAlphabet = new Alphabet("testGetSymbolMatchType");

        // 1. exact match
        Symbol a = Symbol.get("ATGC");
        Symbol b = Symbol.get("ATGC");
        SymbolMatchType expResult = SymbolMatchType.EXACT_MATCH;
        SymbolMatchType result = testAlphabet.getSymbolMatchType(a, b);
        assertEquals(expResult, result);

        // 2. mixed case match
        a = Symbol.get("ATGC");
        b = Symbol.get("aTGC");
        expResult = SymbolMatchType.MIXED_CASE_MATCH;
        result = testAlphabet.getSymbolMatchType(a, b);
        assertEquals(expResult, result);
    }
}


BTW, how can I get the dev/test role?
Then I could contribute to the development or testing of BJ3 (I'm still
a beginner in the bio field).

Thanks,
Pprun


From andreas at sdsc.edu  Tue Oct 28 04:40:35 2008
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 27 Oct 2008 21:40:35 -0700
Subject: [Biojava-dev] BioSQL postgre BioEntryRelationship
In-Reply-To: <a0d826f40810230624i594909a9o74015ad3dd65501a@mail.gmail.com>
References: <ee99d5730810230158r6834481y54751b821cc8873d@mail.gmail.com>
	<a0d826f40810230245q730eb936g4109c71b9893d47@mail.gmail.com>
	<ee99d5730810230621i53a05c6bo16e047d37b2fd578@mail.gmail.com>
	<a0d826f40810230624i594909a9o74015ad3dd65501a@mail.gmail.com>
Message-ID: <59a41c430810272140h290a8a91q26af24946c2c63a5@mail.gmail.com>

Hi Richard,

I updated the 1.6 release with your fixes:
http://www.biojava.org/download/bj16/all/biojava-1.6.1-all.jar
Can you please verify it and, if it is correct, update the download page on the wiki?

Andreas

On Thu, Oct 23, 2008 at 6:24 AM, Richard Holland
<dicknetherlands at gmail.com> wrote:
> Andreas - is it possible to rebuild biojava-1.6-all.jar with the following
> fix made to it?
>
> cheers,
> Richard
>
> ---------- Forwarded message ----------
> From: Christian Köberle <ch.koeberle at googlemail.com>
> Date: 2008/10/23
> Subject: Re: [Biojava-dev] BioSQL postgre BioEntryRelationship
> To: Richard Holland <dicknetherlands at gmail.com>
>
>
> Hi Richard,
>
> I found the error in the current download of biojava 1.6
> (http://www.biojava.org/download/bj16/all/biojava-1.6-all.jar) in the file
> src/org/biojavax/bio/db/biosql/pg/BioEntryRelationship.hbm.xml
>
> <hibernate-mapping>
>     <class name="org.biojavax.bio.SimpleBioEntryRelationship"
> table="bioentry_relationship" node="sequenceRelation"
> entity-name="BioEntryRelationship">
>     <id name="id" type="integer" unsaved-value="null"
> column="bioentry_relationship_id" node="@id">
>     <generator class="sequence">
> <param name="sequence">bioentry_relationship_pk_seq</param>
> </generator>
> </id>
> <many-to-one name="term" class="Term" column="term_id" not-null="true"
> cascade="persist,merge,save-update" node="@termId" embed-xml="false"/>
> <many-to-one name="object" class="Feature" column="object_bioentry_id"
> not-null="true" cascade="persist,merge,save-update" node="@objectFeatureId"
> embed-xml="false"/>
> <many-to-one name="subject" class="BioEntry" column="subject_bioentry_id"
> not-null="true" cascade="persist,merge,save-update"
> node="@subjectBioEntryId" embed-xml="false"/>
> <property name="rank" node="@rank"/>
> </class>
> </hibernate-mapping>
>
> cheers,
> Christian
>
>
> 2008/10/23 Richard Holland <dicknetherlands at gmail.com>
>>
>> Christian,
>>
>> Thanks for your comments.
>>
>> I'm not sure which file you're referring to, or what version of BioJava
>> you have, as the line you quote does not appear in any of the current
>> hbm.xml files in the trunk of Subversion.
>>
>> Also, the BioEntryRelationship interface and its implementations do
>> already have getSubject() and getObject() methods which return the parent
>> and child BioEntry instances.
>>
>> The BioEntry interface itself has a getBioEntryRelationships() method
>> which returns all relationships in which it is the object BioEntry. You
>> could use HQL to obtain those for which it is the subject, but you are right
>> that it would be good to have a method that returns the latter. Could you
>> raise a BugZilla request for this?
>>
>> It would be good if you could do some thorough testing of your lazy
>> loading suggestions on some other use cases before we decide whether or not
>> to adopt that approach in future developments. Use cases would include:
>>
>> 1. have a very large database with thousands of related records in it
>> (e.g. load the whole of GenBank). Iterate over all the records in the
>> database and perform a simple read operation on each that hits the modified
>> methods. See if you run out of memory.
>>
>> 2. like 1, but perform a series of repeated read/write operations using
>> the modified methods, with a final commit to attempt to write the results
>> back to see if they still persist correctly.
>>
>> The reason is that the modified methods might cause problems with those
>> people who are processing large volumes of data in their databases. If all
>> related records are loaded at once, even only on demand, instead of one at a
>> time, it will cause memory issues. The trade off is therefore memory vs.
>> speed. We opted for the memory option because it makes life easier for most
>> novice coders to not have to trace out-of-memory exceptions (although they
>> will still occur using the existing methods, but it happens less often).
>>
>> Also, your method reruns the query every time it is called. It should
>> probably cache the results after the first call, both to prevent objects
>> being reloaded unnecessarily and to prevent objects that were modified
>> after a previous call from being overwritten by a subsequent call.
>> Also, if Hibernate does not receive back the same set that it auto-loaded
>> as a property via the default get() method when it comes to save the
>> object, it will throw a wobbly and refuse to commit.
>>
>> cheers,
>> Richard
>>
>>
>>
>> 2008/10/23 Christian Köberle <ch.koeberle at googlemail.com>
>>>
>>> Hi,
>>> I found a bug in the PostgreSQL mapping file for BioEntryRelationship.
>>> Line:
>>> <many-to-one name="object" class="Feature" column="object_bioentry_id"
>>> not-null="true" cascade="persist,merge,save-update"
>>> node="@objectFeatureId"
>>> embed-xml="false"/>
>>> The value of the class attribute has to be "BioEntry".
>>>
>>> For BioEntry, I am missing methods to access the BioEntryRelationships
>>> in which it is the subject_bioentry. I think a BioEntryRelationship is a
>>> parent-child relationship, so it would be nice to have access to both
>>> sides.
>>>
>>> Furthermore, the Hibernate mapping strategy for BioSQL is quite slow
>>> and produces a lot of queries to the database, because lazy fetch mode
>>> is disabled for all lists and sets. In this mode Hibernate will execute
>>> one query for each element in a list or set. The faster way is to enable
>>> lazy fetch mode and use methods to load the list. Each of these methods
>>> executes only one query.
>>> For example:
>>>
>>> public List<BioEntry> getParents(BioEntry bioEntry){
>>>
>>> String stmt = "SELECT r.subject FROM BioEntryRelationship r "
>>>     + "WHERE r.object = :object";
>>> Query query = session.createQuery(stmt);
>>> query.setParameter("object", bioEntry);
>>> return query.list();
>>>
>>> }
>>>
>>>
>>> This is a factor of 2 to 4 faster than the method
>>> BioEntry.getRelationships().
>>> When all dependencies of a BioEntry object are loaded, a select with
>>> lazy fetching can be 500 times faster than a select with eager fetching
>>> (for UniGene cluster Hs.4, for example).
>>> Here is an example for the relationship between UniGene cluster Hs.2
>>> and the gene BC067218 (we use BioSQL to store UniGene):
>>>
>>> getParents():
>>> runtime: 14 msec
>>> SQL: Hibernate: select bioentry1_.bioentry_id as bioentry1_89_,
>>> bioentry1_.name as name89_, bioentry1_.identifier as identifier89_,
>>> bioentry1_.accession as accession89_, bioentry1_.description as
>>> descript5_89_, bioentry1_.version as version89_, bioentry1_.division as
>>> division89_, bioentry1_.taxon_id as taxon8_89_, bioentry1_.biodatabase_id
>>> as
>>> biodatab9_89_, bioentry1_1_.version as version93_, bioentry1_1_.length as
>>> length93_, bioentry1_1_.alphabet as alphabet93_, bioentry1_1_.seq as
>>> seq93_,
>>> case when bioentry1_1_.bioentry_id is not null then 2 when
>>> bioentry1_.bioentry_id is not null then 0 end as clazz_ from
>>> unigene.bioentry_relationship bioentryre0_ inner join unigene.bioentry
>>> bioentry1_ on bioentryre0_.subject_bioentry_id=bioentry1_.bioentry_id
>>> left
>>> outer join unigene.biosequence bioentry1_1_ on
>>> bioentry1_.bioentry_id=bioentry1_1_.bioentry_id left outer
>>> join unigene.biosequence bioentry1_2_ on
>>> bioentry1_.bioentry_id=bioentry1_2_.bioentry_id where
>>> bioentryre0_.object_bioentry_id=?
>>>
>>>
>>> bioEntry.getRelationships():
>>> runtime: 36 msec
>>> SQL:Hibernate: select bioentry0_.bioentry_id as bioentry1_89_,
>>> bioentry0_.name as name89_, bioentry0_.identifier as identifier89_,
>>> bioentry0_.accession as accession89_, bioentry0_.description as
>>> descript5_89_, bioentry0_.version as version89_, bioentry0_.division as
>>> division89_, bioentry0_.taxon_id as taxon8_89_, bioentry0_.biodatabase_id
>>> as
>>> biodatab9_89_, bioentry0_1_.version as version93_, bioentry0_1_.length as
>>> length93_, bioentry0_1_.alphabet as alphabet93_, bioentry0_1_.seq as
>>> seq93_,
>>> case when bioentry0_1_.bioentry_id is not null then 2 when
>>> bioentry0_.bioentry_id is not null then 0 end as clazz_ from
>>> unigene.bioentry bioentry0_ left outer join unigene.biosequence
>>> bioentry0_1_
>>> on bioentry0_.bioentry_id=bioentry0_1_.bioentry_id left outer join
>>> unigene.biosequence bioentry0_2_ on
>>> bioentry0_.bioentry_id=bioentry0_2_.bioentry_id where bioentry0_.name=?
>>> Hibernate: select relationsh0_.object_bioentry_id as object3_1_,
>>> relationsh0_.bioentry_relationship_id as bioentry1_1_,
>>> relationsh0_.bioentry_relationship_id as bioentry1_95_0_,
>>> relationsh0_.term_id as term2_95_0_, relationsh0_.object_bioentry_id as
>>> object3_95_0_, relationsh0_.subject_bioentry_id as subject4_95_0_,
>>> relationsh0_.rank as rank95_0_ from unigene.bioentry_relationship
>>> relationsh0_ where relationsh0_.object_bioentry_id=?
>>> Hibernate: select namespace0_.biodatabase_id as biodatab1_80_0_,
>>> namespace0_.name as name80_0_, namespace0_.authority as authority80_0_,
>>> namespace0_.description as descript4_80_0_ from unigene.biodatabase
>>> namespace0_ where namespace0_.biodatabase_id=?
>>> Hibernate: select bioentry0_.bioentry_id as bioentry1_89_0_,
>>> bioentry0_.name
>>> as name89_0_, bioentry0_.identifier as identifier89_0_,
>>> bioentry0_.accession
>>> as accession89_0_, bioentry0_.description as descript5_89_0_,
>>> bioentry0_.version as version89_0_, bioentry0_.division as division89_0_,
>>> bioentry0_.taxon_id as taxon8_89_0_, bioentry0_.biodatabase_id as
>>> biodatab9_89_0_, bioentry0_1_.version as version93_0_,
>>> bioentry0_1_.length
>>> as length93_0_, bioentry0_1_.alphabet as alphabet93_0_, bioentry0_1_.seq
>>> as
>>> seq93_0_, case when bioentry0_1_.bioentry_id is not null then 2 when
>>> bioentry0_.bioentry_id is not null then 0 end as clazz_0_ from
>>> unigene.bioentry bioentry0_ left outer join unigene.biosequence
>>> bioentry0_1_
>>> on bioentry0_.bioentry_id=bioentry0_1_.bioentry_id left outer join
>>> unigene.biosequence bioentry0_2_ on
>>> bioentry0_.bioentry_id=bioentry0_2_.bioentry_id where
>>> bioentry0_.bioentry_id=?
>>> Hibernate: select namespace0_.biodatabase_id as biodatab1_80_0_,
>>> namespace0_.name as name80_0_, namespace0_.authority as authority80_0_,
>>> namespace0_.description as descript4_80_0_ from unigene.biodatabase
>>> namespace0_ where namespace0_.biodatabase_id=?
>>> Hibernate: select term0_.term_id as term1_84_0_, term0_.name as
>>> name84_0_,
>>> term0_.identifier as identifier84_0_, term0_.definition as
>>> definition84_0_,
>>> term0_.is_obsolete as is5_84_0_, term0_.ontology_id as ontology6_84_0_
>>> from
>>> unigene.term term0_ where term0_.term_id=?
>>> Hibernate: select ontology0_.ontology_id as ontology1_83_0_,
>>> ontology0_.name
>>> as name83_0_, ontology0_.definition as definition83_0_ from
>>> unigene.ontology
>>> ontology0_ where ontology0_.ontology_id=?
>>> Hibernate: select termset0_.ontology_id as ontology6_1_,
>>> termset0_.term_id
>>> as term1_1_, termset0_.term_id as term1_84_0_, termset0_.name as
>>> name84_0_,
>>> termset0_.identifier as identifier84_0_, termset0_.definition as
>>> definition84_0_, termset0_.is_obsolete as is5_84_0_,
>>> termset0_.ontology_id
>>> as ontology6_84_0_ from unigene.term termset0_ where
>>> termset0_.ontology_id=?
>>> Hibernate: select tripleset0_.ontology_id as ontology5_1_,
>>> tripleset0_.term_relationship_id as term1_1_,
>>> tripleset0_.term_relationship_id as term1_87_0_,
>>> tripleset0_.subject_term_id
>>> as subject2_87_0_, tripleset0_.object_term_id as object3_87_0_,
>>> tripleset0_.predicate_term_id as predicate4_87_0_,
>>> tripleset0_.ontology_id
>>> as ontology5_87_0_ from unigene.term_relationship tripleset0_ where
>>> tripleset0_.ontology_id=?
>>> Hibernate: select rankedcros0_.term_id as term1_0_,
>>> rankedcros0_.dbxref_id
>>> as dbxref2_0_, rankedcros0_.rank as rank0_ from unigene.term_dbxref
>>> rankedcros0_ where rankedcros0_.term_id=?
>>> Hibernate: select synonymset0_.term_id as term1_0_, synonymset0_.synonym
>>> as
>>> synonym0_ from unigene.term_synonym synonymset0_ where
>>> synonymset0_.term_id=?
>>>
>>> --
>>> Christian Köberle
>>>
>>
>>
>>
>> --
>> Richard Holland, BSc MBCS
>> Finance Director, Eagle Genomics Ltd
>> M: +44 7500 438846 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>
>
>
> --
> Christian Köberle
> Schönholzerstr. 5
> 10115 Berlin
> Mobil: 0179 79 35 345
>
>
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>