From gmicha at gmail.com  Sat Aug  1 11:49:50 2009
From: gmicha at gmail.com (Micha Sammeth)
Date: Sat, 01 Aug 2009 17:49:50 +0200
Subject: [Biojava-dev] apidoc in org.biojava.bio.symbol.SimpleSymbolList
Message-ID: <4A74641E.80104@gmail.com>

Hi,

the class header in my copy (1.7) contains the example

..
FiniteAlphabet dna = (FiniteAlphabet) 
AlphabetManager.alphabetForName("DNA");
SymbolParser parser = dna.getParser("token");
..

but the version I check out from the CVS does not contain a method 
FiniteAlphabet.getParser(). I think it should read

parser = dna.getTokenization("token");

right? Just wanted to bring to attention..

Best,

micha.

From bugzilla-daemon at portal.open-bio.org  Sun Aug  2 13:31:09 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 2 Aug 2009 13:31:09 -0400
Subject: [Biojava-dev] [Bug 2540] RichSequenceIterator does not skip
	sequence when exception is thrown
In-Reply-To: <bug-2540-485@http.bugzilla.open-bio.org/>
Message-ID: <200908021731.n72HV9W4010985@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2540


------- Comment #1 from vdmerwe.karen at gmail.com  2009-08-02 13:31 EST -------
Created an attachment (id=1352)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1352&action=view)
Code to make the RichSequenceIterator skip sequence when exception is thrown

Any feedback regarding the use of this proposed solution will be appreciated.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From gmicha at gmail.com  Sun Aug  2 15:28:10 2009
From: gmicha at gmail.com (Micha Sammeth)
Date: Sun, 02 Aug 2009 21:28:10 +0200
Subject: [Biojava-dev] Sequence and Feature
Message-ID: <4A75E8CA.3040904@gmail.com>

Hi,

I am writing a parser for aligned sequencing reads and I plan to 
separate the read information (sequence, qualities) from the alignment 
information, for reasons of redundancy and sorting.

I planned the following classes:

Read extends AbstractChangeable implements Sequence, Qualitative

Alignment extends AbstractChangeable implements Feature

Alignment I put directly as inner class of Read, to delegate the 
Feature.getSequence() directly via the outer Object. I also have sort of 
alignment groups which are inserted as additional Feature in between 
these two, but I think for the sketched toy example they are not important.

One doubt is: Alignment links a subpart of the read with a subpart of 
the genomic sequence, which is big and probably I will never hold an 
instance of it. So, getSequence() here refers to the subpart of the read 
that gets aligned and I have a couple of custom attributes that annotate 
the location in the genome. Is this in keeping with the philosophy of the 
class hierarchy design?
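
To make the layout concrete, here is a minimal sketch in plain Java. It is only an illustration under stated assumptions: Read and Alignment here are hypothetical stand-ins that do not implement the BioJava Sequence or Feature interfaces, and the chromosome and genomeStart attributes are invented names for the "custom attributes" mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a sequencing read; NOT a BioJava class.
class Read {
    private final String bases;
    private final List<Alignment> alignments = new ArrayList<>();

    Read(String bases) {
        this.bases = bases;
    }

    String getBases() {
        return bases;
    }

    List<Alignment> getAlignments() {
        return alignments;
    }

    // Registers an alignment covering read positions readStart..readEnd
    // (1-based, inclusive) at an assumed genomic location.
    Alignment addAlignment(int readStart, int readEnd, String chromosome, long genomeStart) {
        Alignment a = new Alignment(readStart, readEnd, chromosome, genomeStart);
        alignments.add(a);
        return a;
    }

    // Inner (non-static) class: every Alignment keeps an implicit reference
    // to the Read that created it.
    class Alignment {
        private final int readStart;
        private final int readEnd;
        private final String chromosome; // genome location kept as plain attributes,
        private final long genomeStart;  // no genomic sequence object is held

        private Alignment(int readStart, int readEnd, String chromosome, long genomeStart) {
            this.readStart = readStart;
            this.readEnd = readEnd;
            this.chromosome = chromosome;
            this.genomeStart = genomeStart;
        }

        // Plays the role of Feature.getSequence(): returns only the aligned
        // part of the read, obtained from the outer Read.
        String getSequence() {
            return bases.substring(readStart - 1, readEnd);
        }

        String getChromosome() {
            return chromosome;
        }

        long getGenomeStart() {
            return genomeStart;
        }
    }
}
```

With this layout, `new Read("ACGTACGT").addAlignment(3, 6, "chr1", 1000L).getSequence()` yields the aligned part of the read without any genomic sequence being loaded.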

It would be nice if someone with a bit more experience in Biojava could 
comment on whether I am going in the right direction, or whether there is 
a more natural way to fit my hierarchy into Biojava.

Thanks and cheers!

micha.

From holland at eaglegenomics.com  Mon Aug  3 04:01:57 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 3 Aug 2009 09:01:57 +0100
Subject: [Biojava-dev] Sequence and Feature
In-Reply-To: <4A75E8CA.3040904@gmail.com>
References: <4A75E8CA.3040904@gmail.com>
Message-ID: <2DEC4F45-25E2-497B-A0E7-100A2AD1693C@eaglegenomics.com>

Yes, Feature.getSequence() is intended only to return the sequence of  
the feature itself - so it would be fine not to store the whole  
genomic sequence, and instead just store locations referring to it.

Have you looked into the existing Alignment classes in BioJava? They  
might be of some help to you.

cheers,
Richard


--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From holland at eaglegenomics.com  Mon Aug  3 07:51:19 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 3 Aug 2009 12:51:19 +0100
Subject: [Biojava-dev] Hackathon update
Message-ID: <FA7D98EE-7839-4851-B71C-A78ED7273762@eaglegenomics.com>

Hi guys,

10 people responded (including me). 5 of those are in Cambridge, UK, 3  
are in the US, 1 in Spain, and 1 in Singapore. 2 wanted to combine the  
hackathon with a holiday, and 3 suggested linking the hackathon with a  
conference, which would almost certainly increase chances of getting  
funding for travel/accommodation from employers.

So, I have two options. Venues in both cases to be worked out later:

   1. Cambridge, UK, January 18th-22nd 2010. I know this is the middle  
of the winter in the UK, but on the bright side, the Cambridge Winter  
Beer Festival runs from the 22nd-24th, so that's something to cheer  
you up at the end of the hackathon.

   2. Boston, USA, July 5th-8th 2010 (immediately before BOSC which is  
9th-10th (TBC), then ISMB which is 11th-14th).

Both have pros and cons - the Cambridge meeting means 50% of the  
delegates could attend for free and we might even be able to get a  
free venue, whereas the Boston meeting would be attractive to anyone  
already planning to attend BOSC or ISMB who might otherwise not be  
able to find funding for travel.

I'm going to stick my neck out and suggest that BOSC/ISMB is the  
better choice, simply because of the wider range of potential  
delegates to attend the hackathon. We could always have a Cambridge  
mini-meeting at some other time. So, unless anyone objects, pencil in  
your diary for July 5th-8th in Boston.

Please could all those interested vote yes or no for this plan so that  
I can find a suitably sized venue. Attendance will need to be  
confirmed by the date the venue sets for final booking/payment.

cheers,
Richard



From holland at eaglegenomics.com  Mon Aug  3 09:29:17 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 3 Aug 2009 14:29:17 +0100
Subject: [Biojava-dev] Hackathon update
In-Reply-To: <C69C5A09.1960%HWillis@scripps.edu>
References: <C69C5A09.1960%HWillis@scripps.edu>
Message-ID: <0BD11B39-1695-4C07-9695-20D095172A9C@eaglegenomics.com>

Good plan - my worry is whether or not people can get 2 weeks off in  
the same year for the purposes of a hackathon.

But, if people are willing, I'm happy to set up both. It does mean  
extra cost in terms of venue hire etc. - do you have any ideas as to  
good sponsors?


On 3 Aug 2009, at 14:10, Scooter Willis wrote:

> Richard
>
> It probably wouldn't hurt to try and do both. Waiting a year delays  
> getting started and because the two events are six months apart it  
> increases the odds of those who may be able to attend both. This way  
> at BOSC/ISMB we can have good momentum and stability for the current  
> modules. The BOSC/ISMB can then be focused on recruiting new  
> developers with a focus on new modules, code examples, docs etc.
>
> It also probably makes sense to try and identify/recruit Java based  
> bioinformatics open source applications that have needed or  
> interesting functionality to "biojava" enable the algorithm of the  
> application. This could be a good theme for the BOSC/ISMB conference  
> to have current Biojava developers work with developers of other  
> java bioinformatics application to port key functionality so that it  
> works with Biojava core.
>
> Scooter



From markjschreiber at gmail.com  Mon Aug  3 12:38:32 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 4 Aug 2009 00:38:32 +0800
Subject: [Biojava-dev] Hackathon update
In-Reply-To: <FA7D98EE-7839-4851-B71C-A78ED7273762@eaglegenomics.com>
References: <FA7D98EE-7839-4851-B71C-A78ED7273762@eaglegenomics.com>
Message-ID: <93b45ca50908030938j7899572et780fd2ccd0f2f417@mail.gmail.com>

Boston++


From andreas at sdsc.edu  Tue Aug  4 02:09:37 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 3 Aug 2009 23:09:37 -0700
Subject: [Biojava-dev] Hackathon update
In-Reply-To: <0BD11B39-1695-4C07-9695-20D095172A9C@eaglegenomics.com>
References: <C69C5A09.1960%HWillis@scripps.edu>
	<0BD11B39-1695-4C07-9695-20D095172A9C@eaglegenomics.com>
Message-ID: <59a41c430908032309l7b380c92hf018c12d38dd566f@mail.gmail.com>

Hi Richard,

I think it is a great idea to plan a hackathon prior to next BOSC. Still, this
is almost a year ahead and as such a long time away. Ideally I would
like to have something earlier than that... San Diego is far away
from the UK, but I would be happy to organize and host something here, if
people would be up for the longish journey...

Andreas




From bugzilla-daemon at portal.open-bio.org  Tue Aug  4 13:28:58 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Aug 2009 13:28:58 -0400
Subject: [Biojava-dev] [Bug 2540] RichSequenceIterator does not skip
	sequence when exception is thrown
In-Reply-To: <bug-2540-485@http.bugzilla.open-bio.org/>
Message-ID: <200908041728.n74HSwfd027233@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2540


vdmerwe.karen at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1352 is|0                           |1
           obsolete|                            |


------- Comment #2 from vdmerwe.karen at gmail.com  2009-08-04 13:28 EST -------
Created an attachment (id=1356)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1356&action=view)
Updated the previous solution



From florian.mittag at uni-tuebingen.de  Wed Aug  5 08:45:41 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Wed, 5 Aug 2009 14:45:41 +0200
Subject: [Biojava-dev]  How to parse large Genbank files?
In-Reply-To: <E8A261BA-DED7-4BE5-A946-1561648BB527@eaglegenomics.com>
References: <200907241929.08768.florian.mittag@uni-tuebingen.de>
	<200907281414.55156.florian.mittag@uni-tuebingen.de>
	<E8A261BA-DED7-4BE5-A946-1561648BB527@eaglegenomics.com>
Message-ID: <200908051445.42345.florian.mittag@uni-tuebingen.de>

On Tuesday, 28. July 2009 14:52, Richard Holland wrote:
> > Btw: Should we move this to Biojava-dev?
>> probably, yes! :)

done ;)


> If you want to explore my ideas for a replacement Sequence model, the
> code and docs are here (sequence handling is in the 'core' module with
> DNA-specifics in the 'dna' module):
>
> http://biojava.org/wiki/BioJava3:HowTo
> http://www.biojava.org/wiki/BioJava3_project
>
> (Methods such as file parsers would request Strings (or ideally
> CharSequence - more flexible, and String extends it) as parameters
> whenever they don't care about content - if they care about content
> but don't care in advance about size or random access then they should
> request Iterator<Symbol> which can be used to wrap a String and parse
> on demand, and if they need full functionality then they should
> request List<Symbol>, the default implementation of which uses
> ArrayLists, but there's no reason a String-backed one couldn't be written
> as well).

For now, I was mostly interested in a quick and dirty solution. I first 
attempted to create a new class StringSymbolList that would use the String as 
the representation of the sequence and only convert to Symbols on demand. Since 
SimpleRichSequence uses SimpleSymbolList hard-coded, I wanted to implement a 
new RichSequence as well, but I was back-stabbed by Hibernate, because the 
bindings are set to SimpleRichSequence and when retrieving objects from the 
DB it uses the original BioJava classes again.

My solution now works. It consists of my own implementations of 
GenbankFormat, RichSequenceBuilder, and RichSequence, a new class called 
StringSymbolList as described above, and a change to SimpleRichSequence, 
adding the method:

@Override
public String seqString() {
    return seqstring;
}

which circumvents most of the array copying stuff.
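
The String-backed idea can be sketched in plain Java. This is only an illustration under stated assumptions: StringBackedSymbols is a hypothetical, simplified stand-in, it is not the StringSymbolList mentioned above, and it uses java.util.List<Character> instead of BioJava's SymbolList and Symbol types.

```java
import java.util.AbstractList;

// Hypothetical, simplified stand-in: a read-only list view over a String.
// BioJava's SymbolList has a different and larger interface.
final class StringBackedSymbols extends AbstractList<Character> {
    private final String seq;

    StringBackedSymbols(String seq) {
        this.seq = seq;
    }

    // Converts a single position to a "symbol" only when it is asked for.
    // Indices are 0-based, following java.util.List.
    @Override
    public Character get(int index) {
        return seq.charAt(index);
    }

    @Override
    public int size() {
        return seq.length();
    }

    // Hands back the backing String directly, without building
    // intermediate lists or arrays.
    String seqString() {
        return seq;
    }
}
```

The only per-sequence storage is the String itself; list operations inherited from AbstractList (iteration, subList) read from it on demand.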

I also noticed that processing the Genbank files became slower with every 
file, so I closed the Hibernate session after each chromosome and opened a 
new one. (I also tried session.clear(), but somehow this didn't work.)

For now, it seems like everything is fine and I have no more OutOfMemory 
exceptions.

- Florian


>
> cheers,
> Richard
>
> > - Florian
> >
> >> On Mon, Jul 27, 2009 at 8:16 PM, Florian
> >>
> >> Mittag<florian.mittag at uni-tuebingen.de> wrote:
> >>> Hi Mark!
> >>>
> >>> On Saturday, 25. July 2009 04:20, Mark Schreiber wrote:
> >>>> I don't think anyone has done much or anything to optimize these
> >>>> parsers. The process you outline sounds extremely inefficient. It is
> >>>> also likely to lead to memory leaks due to the number of copy
> >>>> operations.
> >>>
> >>> I wouldn't necessarily say that it leads to memory leaks, but it
> >>> definitely leads to a high memory consumption (2GB are not enough
> >>> for a 200MB file). Also, my outline of the process is based on only
> >>> 2 hours of viewing the code, so actually I expected to be corrected
> >>> on this. Unfortunately, it seems like I did get the right idea and
> >>> it IS extremely inefficient.
> >>>
> >>> I mean, I understand that this is a high level of abstraction that
> >>> might come in handy in many situations, but it certainly is more of
> >>> an obstacle in my specific case.
> >>>
> >>>> As always with java, don't try and optimize without a profiler
> >>>> which will tell you which methods are taking a long time and which
> >>>> objects take the most memory.
> >>>
> >>> I think we should continue this discussion on the biojava-dev list
> >>> or in a private conversation, as it will probably get very detailed
> >>> and technical.
> >>>
> >>>
> >>> My question to this list again:
> >>> Is there a way to achieve my goal of parsing a 200MB Genbank file
> >>> with the current biojava version without code changes?
> >>>
> >>>
> >>> - Florian
> >>>
> >>>> On 25 Jul 2009, 1:33 AM, "Florian Mittag"
> >>>> <florian.mittag at uni-tuebingen.de> wrote:
> >>>>
> >>>> Hi!
> >>>>
> >>>> I think this is a problem worthy of its own thread, so I'll start
> >>>> one:
> >>>>
> >>>> I want to store all human chromosomes in a BioSQL database after I
> >>>> loaded the information from .gbk files. The files I get from NCBI
> >>>> with the following URIs, where the id ranges from nc_000001 to
> >>>> nc_000024 plus nc_001804:
> >>>>
> >>>> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=nc_000023&rettype=gbwithparts&retmode=text
> >>>>
> >>>> I then try to parse the files as described in
> >>>> http://biojava.org/wiki/BioJava:BioJavaXDocs#Tools_for_reading.2Fwriting_files
> >>>> but it won't work. While there are no problems parsing 1804 and 24,
> >>>> chromosome 23 leads to an OutOfMemory exception although I gave it
> >>>> 2GB of heap space.
> >>>>
> >>>> Here is a stack trace (the line numbers might differ, because I
> >>>> already tried to improve GenbankFormat.java in memory efficiency):
> >>>>
> >>>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> >>>>        at org.biojava.bio.seq.io.ChunkedSymbolListFactory.addSymbols(ChunkedSymbolListFactory.java:222)
> >>>>        at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.addSymbols(SimpleRichSequenceBuilder.java:256)
> >>>>        at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:535)
> >>>>        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
> >>>>        at org.prodge.sequence_viewer.db.UpdateDB_Main.updateChromosome(UpdateDB_Main.java:537)
> >>>>        at org.prodge.sequence_viewer.db.UpdateDB_Main.newGenome(UpdateDB_Main.java:468)
> >>>>        at org.prodge.sequence_viewer.db.UpdateDB_Main.main(UpdateDB_Main.java:164)
> >>>>
> >>>> The line in GenbankFormat.java is:
> >>>>
> >>>> rlistener.addSymbols(
> >>>>        symParser.getAlphabet(),
> >>>>        (Symbol[])(sl.toList().toArray(new Symbol[0])),
> >>>>        0, sl.length());
> >>>>
> >>>> Sometimes it fails at the sl.toList().toArray()-part, sometimes it
> >>>> fails later inside the addSymbols method, but it always fails.
> >>>>
> >>>> How can this be? I mean, the file is only 190MB in size, so 2GB of
> >>>> memory should be more than enough. Browsing through the source
> >>>> code, I discovered what I think of as very inefficient handling of
> >>>> sequences:
> >>>>
> >>>> 1) the sequence string is read from file into a StringBuffer
> >>>> 2) it is converted to a string (with whitespaces removed)
> >>>> 3) a SimpleSymbolList is created out of the string
> >>>> 4) the SymbolList is converted to a List of Symbols
> >>>> 5) the List is converted to an array of Symbols
> >>>> 6) the array is passed to addSymbols
> >>>> 7) there it is added to a ChunkedSymbolListFactory
> >>>> 8) if at some point the sequence is requested, a SymbolList is
> >>>>    created and then converted to a string.
> >>>>
> >>>> You see, there is a lot of copying and converting, but in the end
> >>>> I have the same string I started with. Well, I had the string, if
> >>>> it ever reached the end, because it will crash before completing
> >>>> this process.
> >>>>
> >>>>
> >>>> Am I doing something wrong or is there a great potential of
> >>>> improving parsing of Genbank files?
> >>>>
> >>>>
> >>>> Regards,
> >>>>   Florian
> >>>> _______________________________________________
> >>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
> >>>

-- 
Dipl. Inf. Florian Mittag
Universität Tuebingen
WSI-RA, Sand 1
72076 Tuebingen, Germany
Phone: +49 7071 / 29 78985  Fax: +49 7071 / 29 5091


From markjschreiber at gmail.com  Wed Aug  5 09:16:03 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 5 Aug 2009 21:16:03 +0800
Subject: [Biojava-dev] How to parse large Genbank files?
In-Reply-To: <200908051445.42345.florian.mittag@uni-tuebingen.de>
References: <200907241929.08768.florian.mittag@uni-tuebingen.de>
	<200907281414.55156.florian.mittag@uni-tuebingen.de>
	<E8A261BA-DED7-4BE5-A946-1561648BB527@eaglegenomics.com>
	<200908051445.42345.florian.mittag@uni-tuebingen.de>
Message-ID: <93b45ca50908050616n210bd2a3u8391d9ad7114015a@mail.gmail.com>

Would it be better for the biojava SimpleRichSequence to be backed by a
String and do symbol operations on the fly? Alternatively the default
hibernate mapping could be to a more stringy sequence.

Arguably in the absence of JPA and entity beans Hibernate should probably be
talking to biojava via DTOs. An efficient BioSQL loader would directly use
the DTOs or Entity beans (which could implement biojava interfaces) and not
go through all the symbol hassle.

Might be worth considering for BJ3

- Mark



From florian.mittag at uni-tuebingen.de  Wed Aug  5 11:41:24 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Wed, 5 Aug 2009 17:41:24 +0200
Subject: [Biojava-dev] Error loading Ontology with Hibernate
Message-ID: <200908051741.24367.florian.mittag@uni-tuebingen.de>

Hi, it's me again ;-)

I'm really sorry to bother you with yet another problem, but I seem to attract 
those problems.

When I parse Genbank files and store them in a BioSQL DB, all features 
like "gap", "mRNA", "gene", etc. are represented by newly created Terms in 
the ontology "biojavax" with the comment "autocreated by biojavax". I 
searched for an appropriate ontology and found the Sequence Ontology, which I 
loaded into the DB using BioPerl's load_ontology.pl.

I tried setting the default ontology using 
RichObjectFactory.setDefaultOntologyName("sequence"), but when it comes to 
instantiating the SimpleRichSequenceBuilder, a multi-nested exception is 
thrown. I followed it in the code and found the cause in Hibernate:

[SEVERE] <init>(): illegal access to loading collection >>
org.hibernate.LazyInitializationException: illegal access to loading collection
	at org.hibernate.collection.AbstractPersistentCollection.initialize(AbstractPersistentCollection.java:341)
	at org.hibernate.collection.AbstractPersistentCollection.read(AbstractPersistentCollection.java:86)
	at org.hibernate.collection.PersistentSet.toString(PersistentSet.java:309)
	at java.lang.String.valueOf(String.java:2827)
	at java.lang.StringBuilder.append(StringBuilder.java:115)
	at java.util.AbstractCollection.toString(AbstractCollection.java:422)
	at org.hibernate.engine.StatefulPersistenceContext.initializeNonLazyCollections(StatefulPersistenceContext.java:844)

probably caused by this exception:

org.hibernate.PropertyAccessException: Null value was assigned to a property 
of primitive type setter of org.biojavax.SimpleRankedCrossRef.rank


The code to reproduce this:

sessionFactory = new Configuration().configure().buildSessionFactory();  
session = sessionFactory.openSession();                                         
RichObjectFactory.connectToBioSQL(session);
RichObjectFactory.setDefaultOntologyName("sequence");
Ontology onto = RichObjectFactory.getDefaultOntology();

My DB has the following ontologies listed:
- biological_process
- gene_ontology
- molecular_function
- cellular_component
- sequence
- biojavax

The above code snippet runs without failure only for "gene_ontology" and 
"biojavax". All ontologies were loaded with the load_ontology.pl script.


What might be the cause?

Thanks

- Florian


-- 
Dipl. Inf. Florian Mittag
Universität Tuebingen
WSI-RA, Sand 1
72076 Tuebingen, Germany
Phone: +49 7071 / 29 78985  Fax: +49 7071 / 29 5091


From florian.mittag at uni-tuebingen.de  Thu Aug  6 09:16:50 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Thu, 6 Aug 2009 15:16:50 +0200
Subject: [Biojava-dev] Error loading Ontology with Hibernate
In-Reply-To: <200908051741.24367.florian.mittag@uni-tuebingen.de>
References: <200908051741.24367.florian.mittag@uni-tuebingen.de>
Message-ID: <200908061516.50183.florian.mittag@uni-tuebingen.de>

Found the cause.

After importing an ontology (Gene or Sequence Ontology) into the BioSQL DB 
using load_ontology.pl, the table "term_dbxref" has only NULL values in the 
rank column. I tried it with DB2 and MySQL, with the same error in both.

The way I see it, this is not a problem of Hibernate. Can I set the "rank" to 
an arbitrary value to circumvent this problem?
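
A minimal sketch of the failure mode, assuming the "rank" property is set
through a reflective call to a setter that takes a primitive int (which is
what the PropertyAccessException message suggests). This uses plain
java.lang.reflect, not Hibernate itself, and the classes PrimitiveRank and
BoxedRank are hypothetical stand-ins, not the actual SimpleRankedCrossRef:

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

class NullRankDemo {

    /** Hypothetical bean with a primitive property, like a rank mapped to int. */
    public static class PrimitiveRank {
        private int rank;
        public void setRank(int rank) { this.rank = rank; }
        public int getRank() { return rank; }
    }

    /** Hypothetical bean whose property uses the wrapper type, so it can hold null. */
    public static class BoxedRank {
        private Integer rank;
        public void setRank(Integer rank) { this.rank = rank; }
        public Integer getRank() { return rank; }
    }

    /**
     * Calls bean.setRank(value) reflectively, roughly the way a persistence
     * layer might, and reports whether the assignment succeeded.
     */
    public static boolean trySetRank(Object bean, Class<?> paramType, Integer value) {
        try {
            Method setter = bean.getClass().getMethod("setRank", paramType);
            setter.invoke(bean, new Object[] { value });
            return true;
        } catch (IllegalArgumentException | NullPointerException e) {
            // A null cannot be unwrapped into a primitive parameter. The reflection
            // API documents IllegalArgumentException for this; NullPointerException
            // is caught as well purely as a precaution.
            return false;
        } catch (NoSuchMethodException | IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // NULL from the DB into a primitive setter
        System.out.println(trySetRank(new PrimitiveRank(), int.class, null));
        // a real value into a primitive setter
        System.out.println(trySetRank(new PrimitiveRank(), int.class, 0));
        // NULL from the DB into a wrapper-typed setter
        System.out.println(trySetRank(new BoxedRank(), Integer.class, null));
    }
}
```

Under these assumptions, either filling the NULL ranks in the table or
mapping the property to a wrapper type would avoid the error; which of the
two is appropriate for BioJava is not settled here.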


On Wednesday, 5. August 2009 17:41, Florian Mittag wrote:
> Hi, it's me again ;-)
>
> I'm really sorry to bother you with yet another problem, but I seem to
> attract those problems.
>
> When I parse Genbank files and store them in a BioSQL DB, all features
> like "gap", "mRNA", "gene", etc. are represented by newly created Terms in
> the ontology "biojavax" with the comment "autocreated by biojavax". I
> searched for an appropriate ontology and found the Sequence Ontology, which
> I loaded into the DB using BioPerl's load_ontology.pl
>
> I tried setting the default ontology using
> RichObjectFactory.setDefaultOntologyName("sequence"), but when it comes to
> instantiating the SimpleRichSequenceBuilder, a multi-nested exception is
> thrown. I followed it in the code and found the cause in Hibernate:
>
> [SEVERE] <init>(): illegal access to loading collection >>
> org.hibernate.LazyInitializationException: illegal access to loading collection
> 	at org.hibernate.collection.AbstractPersistentCollection.initialize(AbstractPersistentCollection.java:341)
> 	at org.hibernate.collection.AbstractPersistentCollection.read(AbstractPersistentCollection.java:86)
> 	at org.hibernate.collection.PersistentSet.toString(PersistentSet.java:309)
> 	at java.lang.String.valueOf(String.java:2827)
> 	at java.lang.StringBuilder.append(StringBuilder.java:115)
> 	at java.util.AbstractCollection.toString(AbstractCollection.java:422)
> 	at org.hibernate.engine.StatefulPersistenceContext.initializeNonLazyCollections(StatefulPersistenceContext.java:844)
>
> probably caused by this exception:
>
> org.hibernate.PropertyAccessException: Null value was assigned to a
> property of primitive type setter of org.biojavax.SimpleRankedCrossRef.rank
>
>
> The code to reproduce this:
>
> sessionFactory = new Configuration().configure().buildSessionFactory();
> session = sessionFactory.openSession();
> RichObjectFactory.connectToBioSQL(session);
> RichObjectFactory.setDefaultOntologyName("sequence");
> Ontology onto = RichObjectFactory.getDefaultOntology();
>
> My DB has the following ontologies listed:
> - biological_process
> - gene_ontology
> - molecular_function
> - cellular_component
> - sequence
> - biojavax
>
> and only for "gene_ontology" and "biojavax" the above code snippet runs
> without failure. All ontologies were loaded with the load_ontology.pl
> script.
>
>
> What might be the cause?
>
> Thanks
>
> - Florian

-- 
Dipl. Inf. Florian Mittag
Universität Tuebingen
WSI-RA, Sand 1
72076 Tuebingen, Germany
Phone: +49 7071 / 29 78985  Fax: +49 7071 / 29 5091


From markjschreiber at gmail.com  Thu Aug  6 09:48:37 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 6 Aug 2009 21:48:37 +0800
Subject: [Biojava-dev] Error loading Ontology with Hibernate
In-Reply-To: <200908061516.50183.florian.mittag@uni-tuebingen.de>
References: <200908051741.24367.florian.mittag@uni-tuebingen.de>
	<200908061516.50183.florian.mittag@uni-tuebingen.de>
Message-ID: <93b45ca50908060648p2451096ax46a179e058a09551@mail.gmail.com>

There shouldn't be an issue with using an arbitrary value. The ranks in
biosql are mainly to preserve the order of features etc. during
roundtripping. It will affect sorting of ontology terms but this is probably
not a problem.

- mark

On Aug 6, 2009 9:42 PM, "Florian Mittag" <florian.mittag at uni-tuebingen.de>
wrote:

Found the cause.

After importing an ontology (Gene or Sequence Ontology) into the BioSQL
using load_ontology.pl, the table "term_dbxref" has only NULL values in the
rank column. I tried it with DB2 and MySQL, same results/error.

The way I see it, this is not a problem of Hibernate. Can I set the "rank"
to an arbitrary value to circumvent this problem?

From florian.mittag at uni-tuebingen.de  Thu Aug  6 10:14:02 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Thu, 6 Aug 2009 16:14:02 +0200
Subject: [Biojava-dev] Error loading Ontology with Hibernate
In-Reply-To: <93b45ca50908060648p2451096ax46a179e058a09551@mail.gmail.com>
References: <200908051741.24367.florian.mittag@uni-tuebingen.de>
	<200908061516.50183.florian.mittag@uni-tuebingen.de>
	<93b45ca50908060648p2451096ax46a179e058a09551@mail.gmail.com>
Message-ID: <200908061614.03033.florian.mittag@uni-tuebingen.de>

On Thursday, 6. August 2009 15:48, you wrote:
> There shouldn't be an issue with using an arbitrary value. The ranks in
> biosql are mainly to preserve the order of features etc. during
> roundtripping. It will affect sorting of ontology terms but this is
> probably not a problem.

Ok, then I will try this as a quick hack until I've found out if the NULL 
values are a bug and if it can be fixed.

Thanks for the quick answer!

- Florian


> On Aug 6, 2009 9:42 PM, "Florian Mittag" <florian.mittag at uni-tuebingen.de>
> wrote:
>
> Found the cause.
>
> After importing an ontology (Gene or Sequence Ontology) into the BioSQL
> using load_ontology.pl, the table "term_dbxref" has only NULL values in the
> rank column. I tried it with DB2 and MySQL, same results/error.
>
> The way I see it, this is not a problem of Hibernate. Can I set the "rank"
> to an arbitrary value to circumvent this problem?

From holland at eaglegenomics.com  Fri Aug  7 13:51:59 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 7 Aug 2009 18:51:59 +0100
Subject: [Biojava-dev] Hackathon update
In-Reply-To: <Pine.GSO.4.44.0908071313400.28289-100000@shell3.shore.net>
References: <Pine.GSO.4.44.0908071313400.28289-100000@shell3.shore.net>
Message-ID: <0AA4618C-2A99-4ACD-B07D-0AA05FE77665@eaglegenomics.com>

Several have said the same. I'll try to get both organised. Watch this  
space.

cheers,
Richard

On 7 Aug 2009, at 18:23, Michael Heuer wrote:

> Richard Holland wrote:
>
>> 10 people responded (including me). 5 of those are in Cambridge, UK, 3
>> are in the US, 1 in Spain, and 1 in Singapore. 2 wanted to combine the
>> hackathon with a holiday, and 3 suggested linking the hackathon with a
>> conference, which would almost certainly increase chances of getting
>> funding for travel/accommodation from employers.
>>
>> So, I have two options. Venues in both cases to be worked out later:
>>
>>   1. Cambridge, UK, January 18th-22nd 2010. I know this is the middle
>> of the winter in the UK, but on the bright side, the Cambridge Winter
>> Beer Festival runs from the 22nd-24th, so that's something to cheer
>> you up at the end of the hackathon.
>>
>>   2. Boston, USA, July 5th-8th 2010 (immediately before BOSC which is
>> 9th-10th (TBC), then ISMB which is 11th-14th).
>
>
> I would suggest trying for both.  Winter in the UK means that a lot of
> work would get done.  Attendance would probably be better for Boston.
>
> I would caution that accommodations in Boston are quite expensive, and
> that the 4th of July week is the busiest week of the year with tourists.
> Perhaps the hackathon in Boston might be arranged flexibly around the
> actual days of the conference, evenings and late nights and so on.
>
>   michael
>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From heuermh at acm.org  Fri Aug  7 13:23:53 2009
From: heuermh at acm.org (Michael Heuer)
Date: Fri, 7 Aug 2009 13:23:53 -0400 (EDT)
Subject: [Biojava-dev] Hackathon update
In-Reply-To: <FA7D98EE-7839-4851-B71C-A78ED7273762@eaglegenomics.com>
Message-ID: <Pine.GSO.4.44.0908071313400.28289-100000@shell3.shore.net>

Richard Holland wrote:

> 10 people responded (including me). 5 of those are in Cambridge, UK, 3
> are in the US, 1 in Spain, and 1 in Singapore. 2 wanted to combine the
> hackathon with a holiday, and 3 suggested linking the hackathon with a
> conference, which would almost certainly increase chances of getting
> funding for travel/accommodation from employers.
>
> So, I have two options. Venues in both cases to be worked out later:
>
>    1. Cambridge, UK, January 18th-22nd 2010. I know this is the middle
> of the winter in the UK, but on the bright side, the Cambridge Winter
> Beer Festival runs from the 22nd-24th, so that's something to cheer
> you up at the end of the hackathon.
>
>    2. Boston, USA, July 5th-8th 2010 (immediately before BOSC which is
> 9th-10th (TBC), then ISMB which is 11th-14th).


I would suggest trying for both.  Winter in the UK means that a lot of
work would get done.  Attendance would probably be better for Boston.

I would caution that accommodations in Boston are quite expensive, and
that the 4th of July week is the busiest week of the year with tourists.
Perhaps the hackathon in Boston might be arranged flexibly around the
actual days of the conference, evenings and late nights and so on.

   michael


From andreas at sdsc.edu  Sun Aug 16 17:41:03 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 16 Aug 2009 14:41:03 -0700
Subject: [Biojava-dev] plans for next months
Message-ID: <59a41c430908161441l3ae3ebao524237a1b7b868fe@mail.gmail.com>

Hi,

Here is a quick summary of what I propose to be our action plan for the
next months for BioJava:

* I would like to call for a code-freeze in 2 weeks (or so) in order
to finalize the new modularized and mavenized version of biojava for
the developers. The current developmental trunk will remain
permanently frozen and all future work should continue at a new
location in SVN. As such it will be important that all developers
commit any changes they are working on before that.

* We will update the documentation for how to obtain a new mavenized
checkout on the wiki.

* After the change, the new modules need to be tested and, if no major
problems are found, the OK will be given to continue working on the
new modules (at the new location).

* All developers should obtain a new checkout.

* We need to identify sub-module leaders who will take over leadership
of the sub-modules.

In order to come up with a new release of biojava we should continue
development on the new modules for a few months. Talking off list with
Richard Holland it looks like we will have a hackathon in January in
Cambridge, U.K. (details to be finalized and announced). I suggest
that we use that opportunity to focus on further developing the
modules and make a new public BioJava release shortly after that.

At present I see the following topics that would be great to work
on until and during the hackathon in order to prepare a shiny new
version of BioJava for public release:

+ Work on standardizing the organization of the modules (tests,
examples, source, documentation etc.)
+ Add new modules
+ Improve existing modules
+ Anything the module leaders deem necessary for their modules.
+ Use OSGi for visualisation related modules

I can post a more detailed and specific list of things to work on if
people are interested.

Andreas

From andreas at sdsc.edu  Mon Aug 24 00:18:14 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 23 Aug 2009 21:18:14 -0700
Subject: [Biojava-dev] BioJava code freeze,
	modularization and action items for sub modules
Message-ID: <59a41c430908232118k2fff9564of1a45fba447eb922@mail.gmail.com>

Hi,

In order to push the modularization and migration to Maven, I would like to
declare a code freeze on the current developmental trunk. Please commit all
new changes by

Thursday 27th of August 23:00 GMT.

In the week after I would like to refactor the code base and commit the
initial set of modules to a new developmental trunk.  All future development
will happen on that new trunk.

You will be able to follow the ongoing status of this at

http://biojava.org/wiki/BioJava:MavenMigration


Once the modules are in place it is a good moment to hand over the
leadership of the sub-modules to the new module-project leaders. It will be
up to the module-lead to take the modules in the direction that he/she
feels is important. I would like to take this opportunity to suggest a couple
of people as module-leaders and propose some action items for the modules.
Feel free to comment or make additional suggestions...

Here is a list of modules / action items and the people that I would propose
to become module leaders:

Module: biojava-core Lead: Andreas Prlic
 - break the new modules out of core
 - bring up to modern Java standards, use Generics
 - declare old/unused code obsolete
 - don't break backwards compatibility

Module: biojava-sequence Lead: Richard Holland
 - Bring in Richard's new code that he started to develop on the biojava-3
branch.
 - provide a more scalable and efficient basis for dealing with large
sequence files

Module: biojava-alignment Lead: Andreas Draeger
 - allow better access to underlying dynamic programming data structures
 - allow more customizable display of pairwise alignments (HTML/plain text,
etc)

Module: biojava-blast Lead: still looking for a leader
 - provide access to all details of the blast output
 - add support for RPS blast

Module: biojava-phylo Lead: Scooter Willis
 - provide improved NJtree /Jalview

Module: biojava-biosql Lead: Richard Holland
 - merge the new biojava-sequence module with the current biojava-biosql
code


Module: biojava-structure Lead: Andreas Prlic
 - add support for SCOP file parsing
 - add support for easy access of domains (in terms of coordinates)
 - add secondary structure assignment
 - improve structure alignments
 - better integration with 3D viewers (Jmol, RCSB viewers)

Module: biojava-web services:
The details still seem to be under discussion, and perhaps we need multiple
modules here?
Also, what about REST vs. SOAP? To be discussed. People who expressed
interest are:
Niall Haslam, Scooter Willis, Sylvain Foisy

Module?: biojava-ws-blast
Module?: biojava-ws-biolit

Module: biojava-sequencing Lead: ???
 - support FastQ files
 - support parsing of output for various new sequencing machines

This is only an initial set of modules and I think it is safe to say that
more modules will be added after more discussions (and people volunteering
to contribute).

Andreas

From simpleyrx at 163.com  Mon Aug 24 12:48:01 2009
From: simpleyrx at 163.com (simpleyrx)
Date: Tue, 25 Aug 2009 00:48:01 +0800 (CST)
Subject: [Biojava-dev] Adding profile-profile alignment algorithms to Biojava
Message-ID: <9551386.424471251132481047.JavaMail.coremail@app180.163.com>


Experts,
 
Profile-profile alignment and HMM-HMM alignment have become more important
in the protein bioinformatics field than ever before. So I think that if we
can implement profile-profile alignment and HMM-HMM alignment algorithms in
the BioJava package, it will be more useful to researchers who are
interested in protein bioinformatics.


From holland at eaglegenomics.com  Mon Aug 24 13:30:31 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 24 Aug 2009 18:30:31 +0100
Subject: [Biojava-dev] Adding profile-profile alignment algorithms to
	Biojava
In-Reply-To: <9551386.424471251132481047.JavaMail.coremail@app180.163.com>
References: <9551386.424471251132481047.JavaMail.coremail@app180.163.com>
Message-ID: <ECEEC52A-B615-4140-B84E-52097A36A4D0@eaglegenomics.com>

Contributions of code would be welcome! Are you volunteering? :)

cheers,
Richard

On 24 Aug 2009, at 17:48, simpleyrx wrote:

>
> Experts,
>
>           Profile-profile alignment or HMM-HMM alignments have  
> become more important in the protein bioinformatics field than ever
> before.  So I think, if we can implement Profile-profile alignment
> and HMM-HMM alignments algorithms in Biojava package, it will be
> more useful to the researchers who are interested in protein
> bioinformatics.
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From heuermh at acm.org  Mon Aug 24 21:19:24 2009
From: heuermh at acm.org (Michael Heuer)
Date: Mon, 24 Aug 2009 21:19:24 -0400 (EDT)
Subject: [Biojava-dev] BioJava code freeze,
 modularization and action items for sub modules
In-Reply-To: <59a41c430908232118k2fff9564of1a45fba447eb922@mail.gmail.com>
Message-ID: <Pine.GSO.4.44.0908242114240.18799-100000@shell3.shore.net>

Andreas Prlic wrote:

> In order to push the modularization and migration to Maven, I would like to
> declare a code freeze on the current developmental trunk. Please commit all
> new changes by
>
> Thursday 27th of August 23:00 GMT.
>
> In the week after I would like to refactor the code base and commit the
> initial set of modules to a new developmental trunk.  All future development
> will happen on that new trunk.
>
> You will be able to follow the ongoing status of this at
>
> http://biojava.org/wiki/BioJava:MavenMigration
>
>
> Once the modules are in place it is a good moment to hand over the
> leadership of the sub-modules to the new module-project leaders. It will be
> up to the module-lead to take the modules into the direction that he/she
> feels important. I would like to take this opportunity to suggest a couple
> of people as module-leaders and propose some action items for the modules.
> Feel free to comment or make additional suggestions...

Sign me up for help with maven configuration/reporting, unit testing, and
generics API matters if you wish.


> Here a list of modules / action items and the people that I would propose to
> become module leaders:
>
> Module: biojava-core Lead: Andreas Prlic
>  - break the new modules out of core
>  - bring up to modern Java standards, use Generics
>  - declare old/unused code obsolete
>  - don't break backwards compatibility

Seems to me the last one will greatly hamper the rest of this effort.
The next version needs to be binary compatible with 1.7?

   michael


From andreas at sdsc.edu  Mon Aug 24 22:17:00 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 24 Aug 2009 19:17:00 -0700
Subject: [Biojava-dev] BioJava code freeze,
	modularization and action 	items for sub modules
In-Reply-To: <Pine.GSO.4.44.0908242114240.18799-100000@shell3.shore.net>
References: <59a41c430908232118k2fff9564of1a45fba447eb922@mail.gmail.com>
	<Pine.GSO.4.44.0908242114240.18799-100000@shell3.shore.net>
Message-ID: <59a41c430908241917r6beb5329wb862ce8913ac74d7@mail.gmail.com>

>> Once the modules are in place it is a good moment to hand over the
>> leadership of the sub-modules to the new module-project leaders. It will be
>> up to the module-lead to take the modules into the direction that he/she
>> feels important. I would like to take this opportunity to suggest a couple
>> of people as module-leaders and propose some action items for the modules.
>> Feel free to comment or make additional suggestions...
>
> Sign me up for help with maven configuration/reporting, unit testing, and
> generics API matters if you wish.

Excellent, I will come back to you on this :-)

>>  - don't break backwards compatibility
>
> Seems to me the last one will greatly hamper the rest of this effort.
> The next version needs to be binary compatible with 1.7?


What I mean is that we should try to disrupt things as little as is
reasonable. I am all for a pragmatic approach. While trying to be
conservative, I guess refactoring should be discussed on a case by case
basis. To give an example: an area where I am supporting re-factoring
is the blast parser. The package name is confusing and we probably
need some code changes to expose more details of the parser. Are you
thinking of any other situations where you think breaking backwards
compatibility will be inevitable?

Andreas


From heuermh at acm.org  Mon Aug 24 22:50:09 2009
From: heuermh at acm.org (Michael Heuer)
Date: Mon, 24 Aug 2009 22:50:09 -0400 (EDT)
Subject: [Biojava-dev] BioJava code freeze,
 modularization and action  items for sub modules
In-Reply-To: <59a41c430908241917r6beb5329wb862ce8913ac74d7@mail.gmail.com>
Message-ID: <Pine.GSO.4.44.0908242243520.18799-100000@shell3.shore.net>

Andreas Prlic wrote:

> >> Once the modules are in place it is a good moment to hand over the
> >> leadership of the sub-modules to the new module-project leaders. It will be
> >> up to the module-lead to take the modules into the direction that he/she
> >> feels important. I would like to take this opportunity to suggest a couple
> >> of people as module-leaders and propose some action items for the modules.
> >> Feel free to comment or make additional suggestions...
> >
> > Sign me up for help with maven configuration/reporting, unit testing, and
> > generics API matters if you wish.
>
> Excellent, I will come back to you on this :-)
>
> >>  - don't break backwards compatibility
> >
> > Seems to me the last one will greatly hamper the rest of this effort.
> > The next version needs to be binary compatible with 1.7?
>
>
> What I mean is that we should try not to disrupt things as much as is
> reasonable. I am all for a pragmatic approach. While trying to be
> conservative I guess refactoring should be discussed on a case by case
> basis. To give an example: an area where I am supporting re-factoring
> is the blast parser. The package name is confusing and we probably
> need some code changes to expose more details of the parser. Are you
> thinking of any other situations where you think breaking backwards
> compatibility will be inevitable?

Ah yes, pragmatically backwards compatible with 1.7 is a better goal.

Maintaining binary compatibility is very difficult, and something we
haven't really done in the past.  Consider the following biojava 1.6.1 vs
biojava 1.7 clirr [1] report.

   michael


[1] http://clirr.sf.net
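
To illustrate why a changed parameter or return type counts as a binary
incompatibility: the JVM links a call by method name plus exact parameter
and return types, so code compiled against setMatch(double) does not find
setMatch(short) at runtime. The sketch below only emulates that lookup with
reflection; OldAligner and NewAligner are hypothetical classes, not the
actual NeedlemanWunsch code:

```java
import java.lang.reflect.Method;

class BinaryCompatDemo {

    /** Hypothetical "old" API: costs are doubles. */
    public static class OldAligner {
        private double match;
        public void setMatch(double match) { this.match = match; }
        public double getMatch() { return match; }
    }

    /** Hypothetical "new" API: the same method names, but with short costs. */
    public static class NewAligner {
        private short match;
        public void setMatch(short match) { this.match = match; }
        public short getMatch() { return match; }
    }

    /**
     * Returns true if the class has a public method with exactly this name,
     * return type and parameter types. This mimics, in a simplified way,
     * what already-compiled client code needs in order to link.
     */
    public static boolean resolves(Class<?> type, String name,
                                   Class<?> returnType, Class<?>... params) {
        try {
            Method m = type.getMethod(name, params);
            return m.getReturnType() == returnType;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // signature the old client was compiled against, checked on the old class
        System.out.println(resolves(OldAligner.class, "setMatch", void.class, double.class));
        // same signature checked on the new class: the parameter type changed
        System.out.println(resolves(NewAligner.class, "setMatch", void.class, double.class));
        // getter with the old return type, checked on the new class
        System.out.println(resolves(NewAligner.class, "getMatch", double.class));
    }
}
```

Recompiling the client against the new classes may or may not succeed,
depending on how the values are used, but an already-compiled client would
fail to link either way.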

---
ERROR: 6004: org.biojava.bio.alignment.NeedlemanWunsch: Changed type of field CostMatrix from double[][] to int[][]
ERROR: 7005: org.biojava.bio.alignment.NeedlemanWunsch: Parameter 1 of 'public NeedlemanWunsch(double, double, double, double, double, org.biojava.bio.alignment.SubstitutionMatrix)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.NeedlemanWunsch: Parameter 2 of 'public NeedlemanWunsch(double, double, double, double, double, org.biojava.bio.alignment.SubstitutionMatrix)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.NeedlemanWunsch: Parameter 3 of 'public NeedlemanWunsch(double, double, double, double, double, org.biojava.bio.alignment.SubstitutionMatrix)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.NeedlemanWunsch: Parameter 4 of 'public NeedlemanWunsch(double, double, double, double, double, org.biojava.bio.alignment.SubstitutionMatrix)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.NeedlemanWunsch: Parameter 5 of 'public NeedlemanWunsch(double, double, double, double, double, org.biojava.bio.alignment.SubstitutionMatrix)' has changed its type to short
ERROR: 7006: org.biojava.bio.alignment.NeedlemanWunsch: Return type of method 'public double getDelete()' has been changed to short
ERROR: 7006: org.biojava.bio.alignment.NeedlemanWunsch: Return type of method 'public double getEditDistance()' has been changed to int
ERROR: 7006: org.biojava.bio.alignment.NeedlemanWunsch: Return type of method 'public double getGapExt()' has been changed to short
ERROR: 7006: org.biojava.bio.alignment.NeedlemanWunsch: Return type of method 'public double getInsert()' has been changed to short
ERROR: 7006: org.biojava.bio.alignment.NeedlemanWunsch: Return type of method 'public double getMatch()' has been changed to short
ERROR: 7006: org.biojava.bio.alignment.NeedlemanWunsch: Return type of method 'public double getReplace()' has been changed to short
ERROR: 7005: org.biojava.bio.alignment.NeedlemanWunsch: Parameter 1 of 'protected double min(double, double, double)' has changed its type to int
ERROR: 7005: org.biojava.bio.alignment.NeedlemanWunsch: Parameter 2 of 'protected double min(double, double, double)' has changed its type to int
ERROR: 7005: org.biojava.bio.alignment.NeedlemanWunsch: Parameter 3 of 'protected double min(double, double, double)' has changed its type to int
ERROR: 7006: org.biojava.bio.alignment.NeedlemanWunsch: Return type of method 'protected double min(double, double, double)' has been changed to int
ERROR: 7006: org.biojava.bio.alignment.NeedlemanWunsch: Return type of method 'public double pairwiseAlignment(org.biojava.bio.symbol.SymbolList, org.biojava.bio.symbol.SymbolList)' has been changed to int
ERROR: 7005: org.biojava.bio.alignment.NeedlemanWunsch: Parameter 1 of 'public java.lang.String printCostMatrix(double[][], char[], char[])' has changed its type to int[][]
ERROR: 7005: org.biojava.bio.alignment.NeedlemanWunsch: Parameter 1 of 'public void setDelete(double)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.NeedlemanWunsch: Parameter 1 of 'public void setGapExt(double)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.NeedlemanWunsch: Parameter 1 of 'public void setInsert(double)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.NeedlemanWunsch: Parameter 1 of 'public void setMatch(double)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.NeedlemanWunsch: Parameter 1 of 'public void setReplace(double)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.SequenceAlignment: Parameter 11 of 'public java.lang.String formatOutput(java.lang.String, java.lang.String, java.lang.String[], java.lang.String, int, int, long, int, int, long, double, long)' has changed its type to int
ERROR: 7006: org.biojava.bio.alignment.SequenceAlignment: Return type of method 'public java.lang.String formatOutput(java.lang.String, java.lang.String, java.lang.String[], java.lang.String, int, int, long, int, int, long, double, long)' has been changed to java.lang.StringBuffer
ERROR: 7006: org.biojava.bio.alignment.SequenceAlignment: Return type of method 'public double pairwiseAlignment(org.biojava.bio.symbol.SymbolList, org.biojava.bio.symbol.SymbolList)' has been changed to int
ERROR: 7005: org.biojava.bio.alignment.SmithWaterman: Parameter 1 of 'public SmithWaterman(double, double, double, double, double, org.biojava.bio.alignment.SubstitutionMatrix)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.SmithWaterman: Parameter 2 of 'public SmithWaterman(double, double, double, double, double, org.biojava.bio.alignment.SubstitutionMatrix)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.SmithWaterman: Parameter 3 of 'public SmithWaterman(double, double, double, double, double, org.biojava.bio.alignment.SubstitutionMatrix)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.SmithWaterman: Parameter 4 of 'public SmithWaterman(double, double, double, double, double, org.biojava.bio.alignment.SubstitutionMatrix)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.SmithWaterman: Parameter 5 of 'public SmithWaterman(double, double, double, double, double, org.biojava.bio.alignment.SubstitutionMatrix)' has changed its type to short
ERROR: 7006: org.biojava.bio.alignment.SmithWaterman: Return type of method 'public double pairwiseAlignment(org.biojava.bio.symbol.SymbolList, org.biojava.bio.symbol.SymbolList)' has been changed to int
ERROR: 7005: org.biojava.bio.alignment.SmithWaterman: Parameter 1 of 'public void setDelete(double)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.SmithWaterman: Parameter 1 of 'public void setGapExt(double)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.SmithWaterman: Parameter 1 of 'public void setInsert(double)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.SmithWaterman: Parameter 1 of 'public void setMatch(double)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.SmithWaterman: Parameter 1 of 'public void setReplace(double)' has changed its type to short
ERROR: 6004: org.biojava.bio.alignment.SubstitutionMatrix: Changed type of field matrix from int[][] to short[][]
ERROR: 6004: org.biojava.bio.alignment.SubstitutionMatrix: Changed type of field max from int to short
ERROR: 6004: org.biojava.bio.alignment.SubstitutionMatrix: Changed type of field min from int to short
ERROR: 7005: org.biojava.bio.alignment.SubstitutionMatrix: Parameter 2 of 'public SubstitutionMatrix(org.biojava.bio.symbol.FiniteAlphabet, int, int)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.SubstitutionMatrix: Parameter 3 of 'public SubstitutionMatrix(org.biojava.bio.symbol.FiniteAlphabet, int, int)' has changed its type to short
INFO: 7011: org.biojava.bio.alignment.SubstitutionMatrix: Method 'public SubstitutionMatrix(java.io.File)' has been added
ERROR: 7006: org.biojava.bio.alignment.SubstitutionMatrix: Return type of method 'public int getMax()' has been changed to short
ERROR: 7006: org.biojava.bio.alignment.SubstitutionMatrix: Return type of method 'public int getMin()' has been changed to short
INFO: 7011: org.biojava.bio.alignment.SubstitutionMatrix: Method 'public org.biojava.bio.alignment.SubstitutionMatrix getSubstitutionMatrix(java.io.BufferedReader)' has been added
ERROR: 7006: org.biojava.bio.alignment.SubstitutionMatrix: Return type of method 'public int getValueAt(org.biojava.bio.symbol.Symbol, org.biojava.bio.symbol.Symbol)' has been changed to short
ERROR: 7005: org.biojava.bio.alignment.SubstitutionMatrix: Parameter 1 of 'protected int[][] parseMatrix(java.lang.String)' has changed its type to java.lang.Object
ERROR: 7006: org.biojava.bio.alignment.SubstitutionMatrix: Return type of method 'protected int[][] parseMatrix(java.lang.String)' has been changed to short[][]
ERROR: 7009: org.biojava.bio.alignment.SubstitutionMatrix: Accessibility of method 'protected int[][] parseMatrix(java.lang.String)' has been decreased from protected to private
INFO: 7003: org.biojava.bio.dp.onehead.SmallCursor: Method 'public boolean canAdvance()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.dp.onehead.SmallCursor: Method 'public org.biojava.bio.symbol.Symbol currentRes()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.dp.onehead.SmallCursor: Method 'public org.biojava.bio.symbol.Symbol lastRes()' has been removed, but an inherited definition exists.
INFO: 7011: org.biojava.bio.gui.glyph.ArrowGlyph: Method 'public ArrowGlyph(java.awt.Paint, java.awt.Paint)' has been added
INFO: 7011: org.biojava.bio.gui.glyph.ArrowGlyph: Method 'public ArrowGlyph(java.awt.geom.Rectangle2D$Float, java.awt.Paint, java.awt.Paint)' has been added
INFO: 7011: org.biojava.bio.gui.glyph.ArrowGlyph: Method 'public java.awt.Paint getFillPaint()' has been added
INFO: 7011: org.biojava.bio.gui.glyph.ArrowGlyph: Method 'public java.awt.Paint getOuterPaint()' has been added
INFO: 7011: org.biojava.bio.gui.glyph.ArrowGlyph: Method 'public void setDirection(int)' has been added
INFO: 7011: org.biojava.bio.gui.glyph.ArrowGlyph: Method 'public void setFillPaint(java.awt.Paint)' has been added
INFO: 7011: org.biojava.bio.gui.glyph.ArrowGlyph: Method 'public void setOuterPaint(java.awt.Paint)' has been added
INFO: 7011: org.biojava.bio.gui.glyph.RectangleGlyph: Method 'public java.awt.Paint getPaint()' has been added
INFO: 7011: org.biojava.bio.gui.glyph.RectangleGlyph: Method 'public void setPaint(java.awt.Paint)' has been added
INFO: 7011: org.biojava.bio.gui.glyph.TurnGlyph: Method 'public java.awt.Paint getPaint()' has been added
INFO: 7011: org.biojava.bio.gui.glyph.TurnGlyph: Method 'public void setPaint(java.awt.Paint)' has been added
INFO: 6009: org.biojava.bio.gui.sequence.GlyphFeatureRenderer: Accessibility of field fList has been increased from private to protected
INFO: 6009: org.biojava.bio.gui.sequence.GlyphFeatureRenderer: Accessibility of field gList has been increased from private to protected
INFO: 7011: org.biojava.bio.gui.sequence.GlyphFeatureRenderer: Method 'public boolean containsFilter(org.biojava.bio.seq.FeatureFilter)' has been added
INFO: 7011: org.biojava.bio.gui.sequence.GlyphFeatureRenderer: Method 'public org.biojava.bio.seq.FeatureFilter getFeatureFilter(int)' has been added
INFO: 7011: org.biojava.bio.gui.sequence.GlyphFeatureRenderer: Method 'public org.biojava.bio.gui.glyph.Glyph getGlyphForFilter(org.biojava.bio.seq.FeatureFilter)' has been added
INFO: 7011: org.biojava.bio.gui.sequence.GlyphFeatureRenderer: Method 'public void removeFilterWithGlyph(org.biojava.bio.seq.FeatureFilter)' has been added
INFO: 7011: org.biojava.bio.gui.sequence.GlyphFeatureRenderer: Method 'public void setGlyphForFilter(org.biojava.bio.seq.FeatureFilter, org.biojava.bio.gui.glyph.Glyph)' has been added
INFO: 6009: org.biojava.bio.gui.sequence.SequencePanelWrapper: Accessibility of field seqPanels has been increased from private to protected
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public void addPrefixMapping(java.lang.String, java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public org.xml.sax.ContentHandler getContentHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public org.xml.sax.DTDHandler getDTDHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public org.xml.sax.EntityResolver getEntityResolver()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public org.xml.sax.ErrorHandler getErrorHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public boolean getFeature(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public java.lang.String getNamespacePrefix()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public boolean getNamespacePrefixes()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public boolean getNamespaces()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public java.lang.Object getProperty(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public java.lang.String getURIFromPrefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public void parse(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public java.lang.String prefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public void setContentHandler(org.xml.sax.ContentHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public void setDTDHandler(org.xml.sax.DTDHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public void setEntityResolver(org.xml.sax.EntityResolver)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public void setErrorHandler(org.xml.sax.ErrorHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public void setFeature(java.lang.String, boolean)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public void setNamespacePrefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public void setProperty(java.lang.String, java.lang.Object)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public void addPrefixMapping(java.lang.String, java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public org.xml.sax.ContentHandler getContentHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public org.xml.sax.DTDHandler getDTDHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public org.xml.sax.EntityResolver getEntityResolver()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public org.xml.sax.ErrorHandler getErrorHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public boolean getFeature(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public java.lang.String getNamespacePrefix()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public boolean getNamespacePrefixes()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public boolean getNamespaces()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public java.lang.Object getProperty(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public java.lang.String getURIFromPrefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public void parse(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public java.lang.String prefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public void setContentHandler(org.xml.sax.ContentHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public void setDTDHandler(org.xml.sax.DTDHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public void setEntityResolver(org.xml.sax.EntityResolver)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public void setErrorHandler(org.xml.sax.ErrorHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public void setFeature(java.lang.String, boolean)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public void setNamespacePrefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public void setProperty(java.lang.String, java.lang.Object)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public void addPrefixMapping(java.lang.String, java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public org.xml.sax.ContentHandler getContentHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public org.xml.sax.DTDHandler getDTDHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public org.xml.sax.EntityResolver getEntityResolver()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public org.xml.sax.ErrorHandler getErrorHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public boolean getFeature(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public java.lang.String getNamespacePrefix()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public boolean getNamespacePrefixes()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public boolean getNamespaces()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public java.lang.Object getProperty(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public java.lang.String getURIFromPrefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public void parse(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public java.lang.String prefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public void setContentHandler(org.xml.sax.ContentHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public void setDTDHandler(org.xml.sax.DTDHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public void setEntityResolver(org.xml.sax.EntityResolver)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public void setErrorHandler(org.xml.sax.ErrorHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public void setFeature(java.lang.String, boolean)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public void setNamespacePrefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public void setProperty(java.lang.String, java.lang.Object)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public void addPrefixMapping(java.lang.String, java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public org.xml.sax.ContentHandler getContentHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public org.xml.sax.DTDHandler getDTDHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public org.xml.sax.EntityResolver getEntityResolver()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public org.xml.sax.ErrorHandler getErrorHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public boolean getFeature(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public java.lang.String getNamespacePrefix()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public boolean getNamespacePrefixes()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public boolean getNamespaces()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public java.lang.Object getProperty(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public java.lang.String getURIFromPrefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public void parse(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public java.lang.String prefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public void setContentHandler(org.xml.sax.ContentHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public void setDTDHandler(org.xml.sax.DTDHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public void setEntityResolver(org.xml.sax.EntityResolver)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public void setErrorHandler(org.xml.sax.ErrorHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public void setFeature(java.lang.String, boolean)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public void setNamespacePrefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public void setProperty(java.lang.String, java.lang.Object)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public void addPrefixMapping(java.lang.String, java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public org.xml.sax.ContentHandler getContentHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public org.xml.sax.DTDHandler getDTDHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public org.xml.sax.EntityResolver getEntityResolver()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public org.xml.sax.ErrorHandler getErrorHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public boolean getFeature(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public java.lang.String getNamespacePrefix()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public boolean getNamespacePrefixes()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public boolean getNamespaces()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public java.lang.Object getProperty(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public java.lang.String getURIFromPrefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public java.lang.String prefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public void setContentHandler(org.xml.sax.ContentHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public void setDTDHandler(org.xml.sax.DTDHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public void setEntityResolver(org.xml.sax.EntityResolver)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public void setErrorHandler(org.xml.sax.ErrorHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public void setFeature(java.lang.String, boolean)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public void setNamespacePrefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public void setProperty(java.lang.String, java.lang.Object)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public void addPrefixMapping(java.lang.String, java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public org.xml.sax.ContentHandler getContentHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public org.xml.sax.DTDHandler getDTDHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public org.xml.sax.EntityResolver getEntityResolver()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public org.xml.sax.ErrorHandler getErrorHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public boolean getFeature(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public java.lang.String getNamespacePrefix()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public boolean getNamespacePrefixes()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public boolean getNamespaces()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public java.lang.Object getProperty(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public java.lang.String getURIFromPrefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public void parse(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public java.lang.String prefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public void setContentHandler(org.xml.sax.ContentHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public void setDTDHandler(org.xml.sax.DTDHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public void setEntityResolver(org.xml.sax.EntityResolver)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public void setErrorHandler(org.xml.sax.ErrorHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public void setFeature(java.lang.String, boolean)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public void setNamespacePrefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public void setProperty(java.lang.String, java.lang.Object)' has been removed, but an inherited definition exists.


From holland at eaglegenomics.com  Tue Aug 25 00:32:24 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 25 Aug 2009 05:32:24 +0100
Subject: [Biojava-dev] BioJava code freeze,
	modularization and action 	items for sub modules
In-Reply-To: <59a41c430908241917r6beb5329wb862ce8913ac74d7@mail.gmail.com>
References: <59a41c430908232118k2fff9564of1a45fba447eb922@mail.gmail.com>
	<Pine.GSO.4.44.0908242114240.18799-100000@shell3.shore.net>
	<59a41c430908241917r6beb5329wb862ce8913ac74d7@mail.gmail.com>
Message-ID: <459AAD48-B5F5-4725-9142-287726BBB931@eaglegenomics.com>

>
>
> What I mean is that we should try not to disrupt things as much as is
> reasonable. I am all for a pragmatic approach. While trying to be
> conservative I guess refactoring should be discussed on a case by case
> basis. To give an example: an area where I am supporting re-factoring
> is the blast parser. The package name is confusing and we probably
> need some code changes to expose more details of the parser. Are you
> thinking of any other situations, where you think breaking backwards
> compatibility will be inevitable?

Almost all the parsers would fit this category, as would any realistic  
attempt to 'fix' the sequence model by moving bits of the APIs around  
(for instance, Sequences have Features which have Strands, but  
Locations do _not_ have Strands - which is all wrong, because Strand  
is a Location-level concept, not a Feature-level concept).

My original plan was to not even attempt to make new versions backward  
compatible, and instead to have a separate module which coerced the  
new objects into complying with the old API interface declarations (by  
using the facade model).
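
A minimal sketch of how such a facade module could look, assuming an old
contract where strand hangs off the feature and a new model where strand
lives on the location. Every type name here (LegacyFeature,
StrandedLocation, NewFeature, LegacyFeatureFacade) is hypothetical and
chosen only for illustration; none of them is an existing BioJava class.

```java
// Hypothetical illustration only: none of these types exist in BioJava.
final class FacadeSketch {

    enum Strand { POSITIVE, NEGATIVE, UNKNOWN }

    /** Stand-in for an old API contract, where strand hangs off the feature. */
    interface LegacyFeature {
        int getMin();
        int getMax();
        Strand getStrand();
    }

    /** Stand-in for a new model, where strand is part of the location. */
    static final class StrandedLocation {
        final int min;
        final int max;
        final Strand strand;

        StrandedLocation(int min, int max, Strand strand) {
            if (min > max) {
                throw new IllegalArgumentException("min must not exceed max");
            }
            this.min = min;
            this.max = max;
            this.strand = strand;
        }
    }

    /** Stand-in for a new feature type that only knows its type and location. */
    static final class NewFeature {
        final String type;
        final StrandedLocation location;

        NewFeature(String type, StrandedLocation location) {
            this.type = type;
            this.location = location;
        }
    }

    /** Facade: exposes a NewFeature through the legacy interface by delegation. */
    static final class LegacyFeatureFacade implements LegacyFeature {
        private final NewFeature delegate;

        LegacyFeatureFacade(NewFeature delegate) {
            this.delegate = delegate;
        }

        @Override public int getMin() { return delegate.location.min; }
        @Override public int getMax() { return delegate.location.max; }
        // The legacy caller asks the feature; the answer comes from the location.
        @Override public Strand getStrand() { return delegate.location.strand; }
    }

    public static void main(String[] args) {
        NewFeature gene =
                new NewFeature("gene", new StrandedLocation(10, 250, Strand.NEGATIVE));
        LegacyFeature old = new LegacyFeatureFacade(gene);
        System.out.println(old.getMin() + ".." + old.getMax() + " " + old.getStrand());
    }
}
```

In this arrangement the new objects stay free of the old interfaces, and
only the facade module would need to change if the old API were finally
retired.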

cheers,
Richard

> Andreas
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From markjschreiber at gmail.com  Tue Aug 25 02:58:40 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 25 Aug 2009 14:58:40 +0800
Subject: [Biojava-dev] BioJava code freeze,
	modularization and action 	items for sub modules
In-Reply-To: <459AAD48-B5F5-4725-9142-287726BBB931@eaglegenomics.com>
References: <59a41c430908232118k2fff9564of1a45fba447eb922@mail.gmail.com> 
	<Pine.GSO.4.44.0908242114240.18799-100000@shell3.shore.net> 
	<59a41c430908241917r6beb5329wb862ce8913ac74d7@mail.gmail.com> 
	<459AAD48-B5F5-4725-9142-287726BBB931@eaglegenomics.com>
Message-ID: <93b45ca50908242358x4181df07ye61197a2d23b6a0@mail.gmail.com>

I would agree with Richard on this. I think the changes being proposed
are not compatible with the current API. There are a couple of things
wrong with the current model (such as the Feature, Strand, Location
issues). There are also several areas where best-practices of the past
(parts of BioJava are 10 years old) are not considered best practices
now (some like Singletons are often thought of as anti-patterns these
days).

Add to that the fact that we have never been truly backwards
compatible (except maybe 1.3 and 1.3.1?) and I think we can
justifiably try and avoid the claim that BJ1.7 should be backwards
compatible.  We can continue to make older Jars available for people
who need them although most likely people who have a need for legacy
support already have the Jars that they need bundled up with their
apps. Shared libraries have very much fallen out of favor in recent
years in almost all languages and system wide classpaths are asking
for trouble.  Hard-drives are cheap so it is no big deal to have a
dedicated version of the BioJava jar bundled with each app that needs
it.

We could adopt the idea that backwards compatible builds get
minor-version numbers eg 1.1 while other builds get major version
numbers. I guess this would mean we are at BioJava 7 ?
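
To make the numbering rule concrete, here is a small sketch of the check
it implies: a build is treated as a drop-in replacement only when the
major number matches and the minor number is not older. The class and
method names (VersionPolicySketch, Version, isDropInReplacement) are
hypothetical, and the rule encoded is just one reading of the proposal
above, not an agreed project policy.

```java
// Hypothetical illustration of a "minor = compatible, major = breaking" rule.
final class VersionPolicySketch {

    static final class Version {
        final int major;
        final int minor;

        Version(int major, int minor) {
            this.major = major;
            this.minor = minor;
        }

        /** Parses "major" or "major.minor[.patch...]"; parts after minor are ignored. */
        static Version parse(String text) {
            String[] parts = text.trim().split("\\.");
            int major = Integer.parseInt(parts[0]);
            int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
            return new Version(major, minor);
        }
    }

    /**
     * Returns true if code built against one version may expect to run
     * unchanged on the installed version under the assumed rule:
     * same major number, and an installed minor number that is not older.
     */
    static boolean isDropInReplacement(Version builtAgainst, Version installed) {
        return installed.major == builtAgainst.major
                && installed.minor >= builtAgainst.minor;
    }

    public static void main(String[] args) {
        Version builtAgainst = Version.parse("1.3");
        // Same major, same minor: treated as compatible.
        System.out.println(isDropInReplacement(builtAgainst, Version.parse("1.3.1")));
        // Different major: no compatibility promise.
        System.out.println(isDropInReplacement(builtAgainst, Version.parse("2.0")));
    }
}
```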

Backwards compatibility would be great to have but not if the effort
required hinders innovation.

- Mark

On Tue, Aug 25, 2009 at 12:32 PM, Richard Holland
<holland at eaglegenomics.com> wrote:
>>
>>
>> What I mean is that we should try not to disrupt things as much as is
>> reasonable. I am all for a pragmatic approach. While trying to be
>> conservative I guess refactoring should be discussed on a case by case
>> basis. To give an example: an area where I am supporting re-factoring
>> is the blast parser. The package name is confusing and we probably
>> need some code changes to expose more details of the parser. Are you
>> thinking of any other situations, where you think breaking backwards
>> compatibility will be inevitable?
>
> Almost all the parsers would fit this category, as would any realistic attempt to 'fix' the sequence model by moving bits of the APIs around (for instance, Sequences have Features which have Strands, but Locations do _not_ have Strands - which is all wrong, because Strand is a Location-level concept, not a Feature-level concept).
>
> My original plan was to not even attempt to make new versions backward compatible, and instead to have a separate module which coerced the new objects into complying with the old API interface declarations (by using the facade model).
>
> cheers,
> Richard
>
>> Andreas
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/


From jacobsen at ebi.ac.uk  Tue Aug 25 04:45:52 2009
From: jacobsen at ebi.ac.uk (Jules Jacobsen)
Date: Tue, 25 Aug 2009 09:45:52 +0100
Subject: [Biojava-dev] BioJava code freeze,
	modularization and action 	items for sub modules
In-Reply-To: <93b45ca50908242358x4181df07ye61197a2d23b6a0@mail.gmail.com>
References: <59a41c430908232118k2fff9564of1a45fba447eb922@mail.gmail.com> 
	<Pine.GSO.4.44.0908242114240.18799-100000@shell3.shore.net> 
	<59a41c430908241917r6beb5329wb862ce8913ac74d7@mail.gmail.com> 
	<459AAD48-B5F5-4725-9142-287726BBB931@eaglegenomics.com>
	<93b45ca50908242358x4181df07ye61197a2d23b6a0@mail.gmail.com>
Message-ID: <12c279870908250145waf21d9fmed256a3573a9ee1d@mail.gmail.com>

I think Mark has a good point here - there are certain aspects of
BioJava which are considered to be unnecessarily over-complicated and
these things have been deal-breakers for the people concerned - I
remember a couple of cases from the EBI where they have implemented
their own system instead of using and supporting BioJava.

Fixing areas of confusion, simplifying and moving forwards without
maintaining backwards-compatibility might be a good idea for
increasing user numbers and elevating the general perception of the
project, whilst potentially risking alienating some existing users.

I think his idea of maintaining compatibility within point releases
and stating that full version releases may not have backwards
compatibility would make it clearer for users as to what to expect
from a release. It may also help the developers stay on track with the
task and general design focus for that release by constraining them to
the current system during a point release whilst highlighting
confusing areas which can be dealt with in a more satisfactory manner
in the next full release.
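
To make the expectation concrete, here is a minimal Java sketch of the
rule as I understand it: an upgrade is only expected to be backwards
compatible when it stays inside the same full release. ReleasePolicy and
its methods are made-up names for illustration, not anything in BioJava.

```java
/** Hypothetical helper illustrating the proposed release policy; not part of BioJava. */
final class ReleasePolicy {

    /** Parses the leading "major.minor" of a version string such as "1.7" or "1.3.1". */
    static int[] majorMinor(String version) {
        String[] parts = version.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return new int[] { major, minor };
    }

    /**
     * Under the proposed policy an upgrade is expected to be backwards
     * compatible only if it stays within the same full (major) release
     * and does not go back to an earlier point release.
     */
    static boolean expectedCompatible(String from, String to) {
        int[] a = majorMinor(from);
        int[] b = majorMinor(to);
        return a[0] == b[0] && b[1] >= a[1];
    }

    public static void main(String[] args) {
        System.out.println(expectedCompatible("1.3", "1.3.1")); // true: same full release
        System.out.println(expectedCompatible("1.7", "2.0"));   // false: new full release
    }
}
```

Whether a third digit (1.3.1) should carry any meaning of its own is left
open here; the sketch simply ignores it.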

 Jules

On Tue, Aug 25, 2009 at 7:58 AM, Mark Schreiber<markjschreiber at gmail.com> wrote:
> I would agree with Richard on this. I think the changes being proposed
> are not compatible with the current API. There are a couple of things
> wrong with the current model (such as the Feature, Strand, Location
> issues). There are also several areas where best-practices of the past
> (parts of BioJava are 10 years old) are not considered best practices
> now (some like Singletons are often thought of as anti-patterns these
> days).
>
> Add to that the fact that we have never been truly backwards
> compatible (except maybe 1.3 and 1.3.1 ?) and I think we can
> justifiably try and avoid the claim that BJ1.7 should be backwards
> compatible. We can continue to make older Jars available for people
> who need them although most likely people who have a need for legacy
> support already have the Jars that they need bundled up with their
> apps. Shared libraries have very much fallen out of favor in recent
> years in almost all languages and system wide classpaths are asking
> for trouble. Hard-drives are cheap so it is no big deal to have a
> dedicated version of the BioJava jar bundled with each app that needs
> it.
>
> We could adopt the idea that backwards compatible builds get
> minor-version numbers eg 1.1 while other builds get major version
> numbers. I guess this would mean we are at BioJava 7 ?
>
> Backwards compatibility would be great to have but not if the effort
> required hinders innovation.
>
> - Mark
>
> On Tue, Aug 25, 2009 at 12:32 PM, Richard Holland
> <holland at eaglegenomics.com> wrote:
>>>
>>>
>>> What I mean is that we should try not to disrupt things as much as is
>>> reasonable. I am all for a pragmatic approach. While trying to be
>>> conservative I guess refactoring should be discussed on a case by case
>>> basis. To give an example: an area where I am supporting re-factoring
>>> is the blast parser. The package name is confusing and we probably
>>> need some code changes to expose more details of the parser. Are you
>>> thinking of any other situations, where you think breaking backwards
>>> compatibility will be inevitable?
>>
>> Almost all the parsers would fit this category, as would any realistic attempt to 'fix' the sequence model by moving bits of the APIs around (for instance, Sequences have Features which have Strands, but Locations do _not_ have Strands - which is all wrong, because Strand is a Location-level concept, not a Feature-level concept).
>>
>> My original plan was to not even attempt to make new versions backward compatible, and instead to have a separate module which coerced the new objects into complying with the old API interface declarations (by using the facade model).
>>
>> cheers,
>> Richard
>>
>>> Andreas
>>>
>>> _______________________________________________
>>> biojava-dev mailing list
>>> biojava-dev at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>


Jules Jacobsen

UniProt-PDB Integration
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge
CB10 1SD
UK


From andreas at sdsc.edu  Tue Aug 25 13:36:45 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 25 Aug 2009 10:36:45 -0700
Subject: [Biojava-dev] BioJava code freeze,
	modularization and action 	items for sub modules
In-Reply-To: <12c279870908250145waf21d9fmed256a3573a9ee1d@mail.gmail.com>
References: <59a41c430908232118k2fff9564of1a45fba447eb922@mail.gmail.com>
	<Pine.GSO.4.44.0908242114240.18799-100000@shell3.shore.net>
	<59a41c430908241917r6beb5329wb862ce8913ac74d7@mail.gmail.com>
	<459AAD48-B5F5-4725-9142-287726BBB931@eaglegenomics.com>
	<93b45ca50908242358x4181df07ye61197a2d23b6a0@mail.gmail.com>
	<12c279870908250145waf21d9fmed256a3573a9ee1d@mail.gmail.com>
Message-ID: <59a41c430908251036s616ab5f3m825d95223e758d85@mail.gmail.com>

I agree with all that has been said so far. The Sequence/Feature model
is definitely not good enough and well, also does not work for protein
structures.  (There can be alternate positions and the numbering can
be non-sequential and have negative positions.)

Still the question is, do we need to throw away the backwards
compatibility? The new modularization will allow a plug and play
architecture and we could easily have two generations of code in
different modules. That way legacy code could depend on the older
"core" (perhaps we should find a different name) while newly written
code will be based on biojava-sequence, which would contain Richard's
new code. That way we could prepare the code for the future, while
still embracing the past.
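
As a rough illustration of how the two generations could coexist, here
is a Java sketch of the facade idea Richard mentions in the quoted text
below: the new model keeps the strand on the location, and a small class
in a separate legacy module presents it through an old-style interface.
Every type name in it (NewLocation, NewFeature, LegacyStrandedFeature,
LegacyFeatureFacade) is invented for the example and is not an existing
BioJava class.

```java
/** All names here are hypothetical stand-ins, not BioJava types. */
enum Strand { POSITIVE, NEGATIVE, UNKNOWN }

/** New-generation model: the strand belongs to the location. */
final class NewLocation {
    final int start;
    final int end;
    final Strand strand;

    NewLocation(int start, int end, Strand strand) {
        this.start = start;
        this.end = end;
        this.strand = strand;
    }
}

/** New-generation feature: it has a type and a location, but no strand of its own. */
final class NewFeature {
    final String type;
    final NewLocation location;

    NewFeature(String type, NewLocation location) {
        this.type = type;
        this.location = location;
    }
}

/** Old-generation contract: callers ask the feature itself for its strand. */
interface LegacyStrandedFeature {
    String getType();
    int getMin();
    int getMax();
    Strand getStrand();
}

/** Facade that would live in the legacy module and wrap a new-model feature. */
final class LegacyFeatureFacade implements LegacyStrandedFeature {
    private final NewFeature delegate;

    LegacyFeatureFacade(NewFeature delegate) {
        this.delegate = delegate;
    }

    @Override public String getType() { return delegate.type; }
    @Override public int getMin() { return delegate.location.start; }
    @Override public int getMax() { return delegate.location.end; }

    /** The old API expects the strand on the feature, so forward to the location. */
    @Override public Strand getStrand() { return delegate.location.strand; }
}
```

Legacy code would then only ever see LegacyStrandedFeature, while new
code works with the new model directly.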

One example that heavily uses the Sequence and Distributions APIs is
NestedMica. It is a pretty cool machine learning software and I was
hoping that we could bring that closer to biojava. (a machine learning
module in BJ would be cool, no?)

Andreas


On Tue, Aug 25, 2009 at 1:45 AM, Jules Jacobsen<jacobsen at ebi.ac.uk> wrote:
> I think Mark has a good point here - there are certain aspects of
> BioJava which are considered to be un-necessarily over-complicated and
> these things have been deal-breakers for the people concerned - I
> remember a couple of cases from the EBI where they have implemented
> their own system instead of using and supporting BioJava.
>
> Fixing areas of confusion, simplifying and moving forwards without
> maintaining backwards-compatibility might be a good idea for
> increasing user numbers and elevating the general perception of the
> project, whilst potentially risking alienating some existing users.
>
> I think his idea of maintaining compatibility within point releases
> and stating that full version releases may not have backwards
> compatibility would make it clearer for users as to what to expect
> from a release. It may also help the developers stay on track with the
> task and general design focus for that release by constraining them to
> the current system during a point release whilst highlighting
> confusing areas which can be dealt with in a more satisfactory manner
> in the next full release.
>
>  Jules
>
> On Tue, Aug 25, 2009 at 7:58 AM, Mark Schreiber<markjschreiber at gmail.com> wrote:
>> I would agree with Richard on this. I think the changes being proposed
>> are not compatible with the current API. There are a couple of things
>> wrong with the current model (such as the Feature, Strand, Location
>> issues). There are also several areas where best-practices of the past
>> (parts of BioJava are 10 years old) are not considered best practices
>> now (some like Singletons are often thought of as anti-patterns these
>> days).
>>
>> Add to that the fact that we have never been truly backwards
>> compatible (except maybe 1.3 and 1.3.1 ?) and I think we can
>> justifiably try and avoid the claim that BJ1.7 should be backwards
>> compatible. We can continue to make older Jars available for people
>> who need them although most likely people who have a need for legacy
>> support already have the Jars that they need bundled up with their
>> apps. Shared libraries have very much fallen out of favor in recent
>> years in almost all languages and system wide classpaths are asking
>> for trouble. Hard-drives are cheap so it is no big deal to have a
>> dedicated version of the BioJava jar bundled with each app that needs
>> it.
>>
>> We could adopt the idea that backwards compatible builds get
>> minor-version numbers eg 1.1 while other builds get major version
>> numbers. I guess this would mean we are at BioJava 7 ?
>>
>> Backwards compatibility would be great to have but not if the effort
>> required hinders innovation.
>>
>> - Mark
>>
>> On Tue, Aug 25, 2009 at 12:32 PM, Richard Holland
>> <holland at eaglegenomics.com> wrote:
>>>>
>>>>
>>>> What I mean is that we should try not to disrupt things as much as is
>>>> reasonable. I am all for a pragmatic approach. While trying to be
>>>> conservative I guess refactoring should be discussed on a case by case
>>>> basis. To give an example: an area where I am supporting re-factoring
>>>> is the blast parser. The package name is confusing and we probably
>>>> need some code changes to expose more details of the parser. Are you
>>>> thinking of any other situations, where you think breaking backwards
>>>> compatibility will be inevitable?
>>>
>>> Almost all the parsers would fit this category, as would any realistic attempt to 'fix' the sequence model by moving bits of the APIs around (for instance, Sequences have Features which have Strands, but Locations do _not_ have Strands - which is all wrong, because Strand is a Location-level concept, not a Feature-level concept).
>>>
>>> My original plan was to not even attempt to make new versions backward compatible, and instead to have a separate module which coerced the new objects into complying with the old API interface declarations (by using the facade model).
>>>
>>> cheers,
>>> Richard
>>>
>>>> Andreas
>>>>
>>>> _______________________________________________
>>>> biojava-dev mailing list
>>>> biojava-dev at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>
>>> --
>>> Richard Holland, BSc MBCS
>>> Operations and Delivery Director, Eagle Genomics Ltd
>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>>> http://www.eaglegenomics.com/
>>>
>>> _______________________________________________
>>> biojava-dev mailing list
>>> biojava-dev at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
>
>
>
> Jules Jacobsen
>
> UniProt-PDB Integration
> EMBL-EBI
> Wellcome Trust Genome Campus
> Hinxton
> Cambridge
> CB10 1SD
> UK
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>


From cmasak at gmail.com  Wed Aug 26 09:29:24 2009
From: cmasak at gmail.com (=?ISO-8859-1?Q?Carl_M=E4sak?=)
Date: Wed, 26 Aug 2009 15:29:24 +0200
Subject: [Biojava-dev] [BUG] Infinite regress when calling
	DNATools.createDNASequence with a DNA string containing a '~' char
Message-ID: <16d769b70908260629m15512bc4tb8798d41d53fad0f@mail.gmail.com>

Hello,

Two things:

1. The BioJava wiki links to a Bugzilla instance, saying bugs should
be posted there ([1]). As I write this, that Bugzilla instance gives a
500 Internal Server Error ([2]).

[1] <http://biojava.org/wiki/BioJava:MailingLists#Bug_Reports>
[2] <http://bugzilla.open-bio.org/enter_bug.cgi?product=BioJava>

2. In the face of this, I hope you don't mind I leave my bug report
here for the time being. We're wrapping BioJava in the Bioclipse
project. We've found what appears to be a logical bug causing an
infinite regress and a stack overflow.

Let's call DNATools.createDNASequence("~", ""). The following code in
that method (org/biojava/bio/seq/DNATools.java:188) will be executed.

  public static Sequence createDNASequence(String dna, String name)
  throws IllegalSymbolException {
    //should I be calling createGappedDNASequence?
    if(dna.indexOf('-') != -1 || dna.indexOf('~') != -1){//there is a gap
        return createGappedDNASequence(dna, name);
    }

The following code in createGappedDNASequence (DNATools.java:207) will
be executed:

    /** Get a new dna as a GappedSequence */
    public static GappedSequence createGappedDNASequence(String dna,
String name) throws IllegalSymbolException{
        String dna1 = dna.replaceAll("-", "");
        Sequence dnaSeq = createDNASequence(dna1, name);

The infinite regress is caused by these two methods calling each
other, for ever. There is no bottoming-out, because none of these
lines removes '~' characters.
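
The effect can be shown without BioJava on the classpath. The sketch
below only mirrors the two string operations quoted above; hasGap,
stripAsIn17 and stripBoth are names made up for this mail, and stripBoth
is just one possible fix, not a committed patch.

```java
/** Stand-alone illustration of the gap handling; none of these methods exist in BioJava. */
final class GapStripSketch {

    /** Mirrors the check in createDNASequence: does the string contain a gap character? */
    static boolean hasGap(String dna) {
        return dna.indexOf('-') != -1 || dna.indexOf('~') != -1;
    }

    /** What createGappedDNASequence does in 1.7: only '-' is removed, so '~' survives. */
    static String stripAsIn17(String dna) {
        return dna.replaceAll("-", "");
    }

    /** Possible fix (an assumption on my side): remove both gap characters. */
    static String stripBoth(String dna) {
        return dna.replaceAll("[-~]", "");
    }

    public static void main(String[] args) {
        String input = "AC~G-T";
        // Still reports a gap, so createDNASequence would bounce the string
        // straight back to createGappedDNASequence.
        System.out.println(hasGap(stripAsIn17(input))); // true
        // With both characters removed the recursion has a base case.
        System.out.println(hasGap(stripBoth(input)));   // false
    }
}
```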

We experience this problem in Biojava 1.6, but the above code and line
numbers are from 1.7, where the issue remains.

Regards,
// Carl Mäsak


From heuermh at acm.org  Thu Aug 27 13:01:31 2009
From: heuermh at acm.org (Michael Heuer)
Date: Thu, 27 Aug 2009 13:01:31 -0400 (EDT)
Subject: [Biojava-dev] BioJava code freeze,
 modularization and action items for sub modules
In-Reply-To: <59a41c430908232118k2fff9564of1a45fba447eb922@mail.gmail.com>
Message-ID: <Pine.GSO.4.44.0908271254430.17078-100000@shell3.shore.net>

Andreas Prlic wrote:

> Here a list of modules / action items and the people that I would propose to
> become module leaders:
> ...
>
> Module: biojava-sequencing Lead:  Michael Heuer
>   - support FastQ files
>   - support parsing of output for various new sequencing machines

I have volunteered on the open-bio mailing list to implement FASTQ
support.  A nice collection of test data is being created in collaboration
with the other open-bio projects.  If anyone has interest in a particular
data set, please let me know, as I will also need data for performance
tuning.
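
For anyone who wants to try test data before the module exists, below is
a minimal stand-alone Java sketch of a FASTQ reader. It is not the
planned BioJava API: FastqSketch, Record, read and parse are names
invented here, and it assumes the simple four-line record layout
(header, sequence, '+' line, qualities), so files with wrapped sequence
or quality lines are out of scope.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical four-line FASTQ reader; a sketch, not the biojava-sequencing design. */
final class FastqSketch {

    /** One read: identifier (without the leading '@'), bases and quality string. */
    static final class Record {
        final String id;
        final String sequence;
        final String quality;

        Record(String id, String sequence, String quality) {
            this.id = id;
            this.sequence = sequence;
            this.quality = quality;
        }
    }

    /** Reads all records, assuming exactly four lines per record. */
    static List<Record> read(BufferedReader in) throws IOException {
        List<Record> records = new ArrayList<>();
        String header;
        while ((header = in.readLine()) != null) {
            if (header.isEmpty()) {
                continue; // tolerate blank lines between records
            }
            String seq = in.readLine();
            String plus = in.readLine();
            String qual = in.readLine();
            if (!header.startsWith("@") || seq == null || plus == null
                    || !plus.startsWith("+") || qual == null
                    || qual.length() != seq.length()) {
                throw new IOException("Malformed FASTQ record near: " + header);
            }
            records.add(new Record(header.substring(1), seq, qual));
        }
        return records;
    }

    /** Convenience wrapper for in-memory text; rethrows I/O problems unchecked. */
    static List<Record> parse(String text) {
        try {
            return read(new BufferedReader(new StringReader(text)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The quality string is kept as raw characters on purpose, since decoding
it depends on which score encoding a given sequencing machine uses.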

   michael


From andreas at sdsc.edu  Thu Aug 27 13:30:08 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 27 Aug 2009 10:30:08 -0700
Subject: [Biojava-dev] BioJava code freeze,
	modularization and action 	items for sub modules
In-Reply-To: <Pine.GSO.4.44.0908271254430.17078-100000@shell3.shore.net>
References: <59a41c430908232118k2fff9564of1a45fba447eb922@mail.gmail.com>
	<Pine.GSO.4.44.0908271254430.17078-100000@shell3.shore.net>
Message-ID: <59a41c430908271030p7318c468u8d145f5750369cb3@mail.gmail.com>

Great, thanks for "volunteering", Michael.

To add another Module:

biojava-das : Lead: Jonathan Warren
probably deprecate the old DAS code in BJ and replace it with
the up to date Dasobert library

Thanks to Jonathan for volunteering as well.

Andreas


On Thu, Aug 27, 2009 at 10:01 AM, Michael Heuer<heuermh at acm.org> wrote:
> Andreas Prlic wrote:
>
>> Here a list of modules / action items and the people that I would propose to
>> become module leaders:
>> ...
>>
>> Module: biojava-sequencing Lead:  Michael Heuer
>>   - support FastQ files
>>   - support parsing of output for various new sequencing machines
>
> I have volunteered on the open-bio mailing list to implement FASTQ
> support.  A nice collection of test data is being created in collaboration
> with the other open-bio projects.  If anyone has interest in a particular
> data set, please let me know, as I will also need data for performance
> tuning.
>
>   michael
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>


From markjschreiber at gmail.com  Fri Aug 28 01:37:59 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 28 Aug 2009 13:37:59 +0800
Subject: [Biojava-dev] [Biojava-l]  BioJava code freeze,
	modularization and 	action items for sub modules
In-Reply-To: <59a41c430908271030p7318c468u8d145f5750369cb3@mail.gmail.com>
References: <59a41c430908232118k2fff9564of1a45fba447eb922@mail.gmail.com> 
	<Pine.GSO.4.44.0908271254430.17078-100000@shell3.shore.net> 
	<59a41c430908271030p7318c468u8d145f5750369cb3@mail.gmail.com>
Message-ID: <93b45ca50908272237k2485a1d8le343a8b1dc10ae12@mail.gmail.com>

I'm happy to volunteer code for:


   1. BLASTXML parser as long as I can change the ssbind APIs (other parsers
   could go into a legacy module??). Actually I would prefer to completely
   decouple from the sequence/ feature module as many people would like a blast
   parser without the rest of biojava thrown in.
   2. BioSQL/ JPA bindings. I have already generated JPA compliant entity
   beans for mapping to BioSQL as well as JPA handler code that makes sure
   modifications persist properly. Currently the object model very closely
   follows the BioSQL table structure.  Also the current beans are what people
   call Anaemic beans in that they hold data and provide getters and setters
   but no biological behaviour. I can easily provide bio-smarts to the beans
   but it might be better to hold off until there is a module that contains
   sequence/feature interfaces which the beans could implement.
   3. Happy to provide code for an enterprise module if there is sufficient
   interest. This would probably take the form of SessionBeans and WebServices
   that can be deployed to Glassfish/ JBoss etc to provide biological services
   for people who want to make client server or SOA apps.
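
To illustrate what "anaemic" means in item 2, here is a small Java
sketch. It is not the generated code: BiosequenceBean,
SmartBiosequenceBean and the SequenceLike interface are names assumed
for this example, and the JPA annotations are left out so that it
compiles without any extra jars.

```java
/** Hypothetical interface of the kind a future sequence/feature module might define. */
interface SequenceLike {
    String getSeq();
    int length();
}

/**
 * Anaemic bean: mirrors a BioSQL-style row and only holds data.
 * JPA annotations are omitted to keep the sketch dependency-free.
 */
class BiosequenceBean {
    private String seq;
    private String alphabet;

    public String getSeq() { return seq; }
    public void setSeq(String seq) { this.seq = seq; }
    public String getAlphabet() { return alphabet; }
    public void setAlphabet(String alphabet) { this.alphabet = alphabet; }
}

/** The same data with a little behaviour added by implementing the interface. */
class SmartBiosequenceBean extends BiosequenceBean implements SequenceLike {
    @Override
    public int length() {
        return getSeq() == null ? 0 : getSeq().length();
    }
}
```

Holding off until the interfaces exist means the behaviour only has to
be written once, against whatever SequenceLike turns out to be.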

- Mark


On Fri, Aug 28, 2009 at 1:30 AM, Andreas Prlic <andreas at sdsc.edu> wrote:

> Great, thanks for "volunteering", Michael.
>
> To add another Module:
>
> biojava-das : Lead: Jonathan Warren
> probably deprecate the old DAS code in BJ and replace it with
> the up to date Dasobert library
>
> Thanks to Jonathan for volunteering as well.
>
> Andreas
>
>
>
>
> On Thu, Aug 27, 2009 at 10:01 AM, Michael Heuer<heuermh at acm.org> wrote:
> > Andreas Prlic wrote:
> >
> >> Here a list of modules / action items and the people that I would
> propose to
> >> become module leaders:
> >> ...
> >>
> >> Module: biojava-sequencing Lead:  Michael Heuer
> >>   - support FastQ files
> >>   - support parsing of output for various new sequencing machines
> >
> > I have volunteered on the open-bio mailing list to implement FASTQ
> > support.  A nice collection of test data is being created in
> collaboration
> > with the other open-bio projects.  If anyone has interest in a particular
> > data set, please let me know, as I will also need data for performance
> > tuning.
> >
> >   michael
> >
> > _______________________________________________
> > biojava-dev mailing list
> > biojava-dev at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-dev
> >
>
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From andreas at sdsc.edu  Fri Aug 28 11:10:03 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 28 Aug 2009 08:10:03 -0700
Subject: [Biojava-dev] [Biojava-l]  BioJava code freeze,
	modularization and 	action items for sub modules
In-Reply-To: <93b45ca50908272237k2485a1d8le343a8b1dc10ae12@mail.gmail.com>
References: <59a41c430908232118k2fff9564of1a45fba447eb922@mail.gmail.com>
	<Pine.GSO.4.44.0908271254430.17078-100000@shell3.shore.net>
	<59a41c430908271030p7318c468u8d145f5750369cb3@mail.gmail.com>
	<93b45ca50908272237k2485a1d8le343a8b1dc10ae12@mail.gmail.com>
Message-ID: <59a41c430908280810s1720cfckbc36168f2fbc73a8@mail.gmail.com>

Thanks, Mark.

Guess we should start collecting all this info on a wiki page. I started to edit
http://biojava.org/wiki/BioJava:Modules

module leaders: feel free to edit the plans for your module...

Andreas


On Thu, Aug 27, 2009 at 10:37 PM, Mark
Schreiber<markjschreiber at gmail.com> wrote:
> I'm happy to volunteer code for:
>
> 1. BLASTXML parser as long as I can change the ssbind APIs (other parsers
>    could go into a legacy module??). Actually I would prefer to completely
>    decouple from the sequence/ feature module as many people would like a
>    blast parser without the rest of biojava thrown in.
> 2. BioSQL/ JPA bindings. I have already generated JPA compliant entity beans
>    for mapping to BioSQL as well as JPA handler code that makes sure
>    modifications persist properly. Currently the object model very closely
>    follows the BioSQL table structure.  Also the current beans are what people
>    call Anaemic beans in that they hold data and provide getters and setters
>    but no biological behaviour. I can easily provide bio-smarts to the beans
>    but it might be better to hold off until there is a module that contains
>    sequence/feature interfaces which the beans could implement.
> 3. Happy to provide code for an enterprise module if there is sufficient
>    interest. This would probably take the form of SessionBeans and WebServices
>    that can be deployed to Glassfish/ JBoss etc to provide biological services
>    for people who want to make client server or SOA apps.
>
> - Mark
>
>
> On Fri, Aug 28, 2009 at 1:30 AM, Andreas Prlic <andreas at sdsc.edu> wrote:
>>
>> Great, thanks for "volunteering", Michael.
>>
>> To add another Module:
>>
>> biojava-das : Lead: Jonathan Warren
>> probably deprecate the old DAS code in BJ and replace it with
>> the up to date Dasobert library
>>
>> Thanks to Jonathan for volunteering as well.
>>
>> Andreas
>>
>>
>>
>>
>> On Thu, Aug 27, 2009 at 10:01 AM, Michael Heuer<heuermh at acm.org> wrote:
>> > Andreas Prlic wrote:
>> >
>> >> Here a list of modules / action items and the people that I would
>> >> propose to
>> >> become module leaders:
>> >> ...
>> >>
>> >> Module: biojava-sequencing Lead:  Michael Heuer
>> >>   - support FastQ files
>> >>   - support parsing of output for various new sequencing machines
>> >
>> > I have volunteered on the open-bio mailing list to implement FASTQ
>> > support.  A nice collection of test data is being created in
>> > collaboration
>> > with the other open-bio projects.  If anyone has interest in a
>> > particular
>> > data set, please let me know, as I will also need data for performance
>> > tuning.
>> >
>> >   michael
>> >
>> > _______________________________________________
>> > biojava-dev mailing list
>> > biojava-dev at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/biojava-dev
>> >
>>
>> _______________________________________________
>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
>


From andreas at sdsc.edu  Sun Aug 30 21:23:03 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 30 Aug 2009 18:23:03 -0700
Subject: [Biojava-dev] maven progress
Message-ID: <59a41c430908301823s6e2e3d7fi6caffc47e1a8c0ff@mail.gmail.com>

Hi,

I started to split up biojava into submodules and am mavenizing the
build process. The new SVN location is emerging here:

http://dev.open-bio.org/home/svn-repositories/biojava/biojava-live/biojava

or in your browser:

http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/biojava

A few questions so far from my side.

1) bytecode.jar: at the present the core module depends on this.   So
far it is in the /jars subfolder of the module and needs to be
installed by hand. What is the best way to deal with this in SVN?

2) Sequence module (Richard's original biojava v.3 branch) Since this
consists of sub-modules I have set it up as a few hierarchically
organized submodules. There is some biosql code there as well.
Richard/Mark: not sure how to arrange this. I think it would be good to
have a biosql module. Shall I refactor the current biosql code out of
core into a new biosql module or will the current code be obsoleted
and  replaced with the new code in the sequence module?

Andreas

From gmicha at gmail.com  Sun Aug  2 19:28:10 2009
From: gmicha at gmail.com (Micha Sammeth)
Date: Sun, 02 Aug 2009 21:28:10 +0200
Subject: [Biojava-dev] Sequence and Feature
Message-ID: <4A75E8CA.3040904@gmail.com>

Hi,

I am writing a parser for aligned sequencing reads and I plan to 
separate the read information (sequence, qualities) from the alignment 
information for reasons of redundancy and sorting.

I planned the following classes:

Read extends AbstractChangeable implements Sequence, Qualitative

Alignment extends AbstractChangeable implements Feature

Alignment I put directly as inner class of Read, to delegate the 
Feature.getSequence() directly via the outer Object. I also have sort of 
alignment groups which are inserted as additional Feature in between 
these two, but I think for the sketched toy example they are not important.

One doubt is: Alignment links a subpart of the read with a subpart of 
the genomic sequence, which is big and probably I will never hold an 
instance of it. So, getSequence() here refers to the subpart of the read 
that gets aligned and I have a couple of custom attributes that annotate 
the location in the genome. Is this in the philosophy of the class 
hierarchy design?

It would be nice if someone with a bit more experience in Biojava could 
leave a comment if I go the right direction, or if there is a more 
natural way to get my hierarchy into biojava.

Thanks and cheers!

micha.


From holland at eaglegenomics.com  Mon Aug  3 08:01:57 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 3 Aug 2009 09:01:57 +0100
Subject: [Biojava-dev] Sequence and Feature
In-Reply-To: <4A75E8CA.3040904@gmail.com>
References: <4A75E8CA.3040904@gmail.com>
Message-ID: <2DEC4F45-25E2-497B-A0E7-100A2AD1693C@eaglegenomics.com>

Yes, Feature.getSequence() is intended only to return the sequence of  
the feature itself - so it would be fine not to store the whole  
genomic sequence, and instead just store locations referring to it.
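
A stripped-down Java sketch of that arrangement is below. It does not
implement the real Sequence or Feature interfaces (they have many more
methods); Read, Alignment, align and the chromosome/genomeStart fields
are simplified stand-ins assumed for illustration only.

```java
/** Simplified, hypothetical stand-in for a read; not BioJava's Sequence interface. */
final class Read {
    private final String bases;
    private final byte[] qualities;

    Read(String bases, byte[] qualities) {
        this.bases = bases;
        this.qualities = qualities;
    }

    String getBases() { return bases; }
    byte[] getQualities() { return qualities; }

    /** Inner class: every Alignment belongs to exactly one Read and reaches it via Read.this. */
    final class Alignment {
        private final int readStart;   // 1-based, inclusive, on the read
        private final int readEnd;     // 1-based, inclusive, on the read
        private final String chromosome; // genomic position kept as plain annotation
        private final int genomeStart;

        private Alignment(int readStart, int readEnd, String chromosome, int genomeStart) {
            this.readStart = readStart;
            this.readEnd = readEnd;
            this.chromosome = chromosome;
            this.genomeStart = genomeStart;
        }

        /** Returns only the aligned part of the read; no genomic sequence is held. */
        String getSequence() {
            return bases.substring(readStart - 1, readEnd);
        }

        Read getRead() { return Read.this; }
        String getChromosome() { return chromosome; }
        int getGenomeStart() { return genomeStart; }
    }

    /** Creates an alignment of read positions readStart..readEnd to a genomic start position. */
    Alignment align(int readStart, int readEnd, String chromosome, int genomeStart) {
        return new Alignment(readStart, readEnd, chromosome, genomeStart);
    }
}
```

The genome itself never has to be loaded; the alignment only carries
coordinates that refer to it.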

Have you looked into the existing Alignment classes in BioJava? They  
might be of some help to you.

cheers,
Richard

On 2 Aug 2009, at 20:28, Micha Sammeth wrote:

> Hi,
>
> I am writing a parser for aligned sequencing reads and I plan to  
> separate the read information (sequence, qualities) from the  
> alignment information for reasons of redundancy and sorting.
>
> I planned the following classes:
>
> Read extends AbstractChangeable implements Sequence, Qualitative
>
> Alignment extends AbstractChangeable implements Feature
>
> Alignment I put directly as inner class of Read, to delegate the  
> Feature.getSequence() directly via the outer Object. I also have  
> sort of alignment groups which are inserted as additional Feature in  
> between these two, but I think for the sketched toy example they are  
> not important.
>
> One doubt is: Alignment links a subpart of the read with a subpart  
> of the genomic sequence, which is big and probably I will never hold  
> an instance of it. So, getSequence() here refers to the subpart of  
> the read that gets aligned and I have a couple of custom attributes  
> that annotate the location in the genome. Is this in the philosophy  
> of the class hierarchy design?
>
> It would be nice if someone with a bit more experience in Biojava  
> could leave a comment if I go the right direction, or if there is a  
> more natural way to get my hierarchy into biojava.
>
> Thanks and cheers!
>
> micha.
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From holland at eaglegenomics.com  Mon Aug  3 11:51:19 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 3 Aug 2009 12:51:19 +0100
Subject: [Biojava-dev] Hackathon update
Message-ID: <FA7D98EE-7839-4851-B71C-A78ED7273762@eaglegenomics.com>

Hi guys,

10 people responded (including me). 5 of those are in Cambridge, UK, 3  
are in the US, 1 in Spain, and 1 in Singapore. 2 wanted to combine the  
hackathon with a holiday, and 3 suggested linking the hackathon with a  
conference, which would almost certainly increase chances of getting  
funding for travel/accommodation from employers.

So, I have two options. Venues in both cases to be worked out later:

   1. Cambridge, UK, January 18th-22nd 2010. I know this is the middle  
of the winter in the UK, but on the bright side, the Cambridge Winter  
Beer Festival runs from the 22nd-24th, so that's something to cheer  
you up at the end of the hackathon.

   2. Boston, USA, July 5th-8th 2010 (immediately before BOSC which is  
9th-10th (TBC), then ISMB which is 11th-14th).

Both have pros and cons - the Cambridge meeting means 50% of the  
delegates could attend for free and we might even be able to get a  
free venue, whereas the Boston meeting would be attractive to anyone  
already planning to attend BOSC or ISMB who might otherwise not be  
able to find funding for travel.

I'm going to stick my neck out and suggest that BOSC/ISMB is the  
better choice, simply because of the wider range of potential  
delegates to attend the hackathon. We could always have a Cambridge  
mini-meeting at some other time. So, unless anyone objects, pencil in  
your diary for July 5th-8th in Boston.

Please could all those interested vote yes or no for this plan so that  
I can find a suitably sized venue. Attendance will need to be  
confirmed by the date the venue sets for final booking/payment.

cheers,
Richard

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From holland at eaglegenomics.com  Mon Aug  3 13:29:17 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 3 Aug 2009 14:29:17 +0100
Subject: [Biojava-dev] Hackathon update
In-Reply-To: <C69C5A09.1960%HWillis@scripps.edu>
References: <C69C5A09.1960%HWillis@scripps.edu>
Message-ID: <0BD11B39-1695-4C07-9695-20D095172A9C@eaglegenomics.com>

Good plan - my worry is whether or not people can get 2 weeks off in  
the same year for the purposes of a hackathon.

But, if people are willing, I'm happy to set up both. It does mean  
extra cost in terms of venue hire etc. - do you have any ideas as to  
good sponsors?


On 3 Aug 2009, at 14:10, Scooter Willis wrote:

> Richard
>
> It probably wouldn't hurt to try and do both. Waiting a year delays
> getting started and because the two events are six months apart it  
> increases the odds of those who may be able to attend both. This way  
> at BOSC/ISMB we can have good momentum and stability for the current  
> modules. The BOSC/ISMB can then be focused on recruiting new  
> developers with a focus on new modules, code examples, docs etc.
>
> It also probably makes sense to try and identify/recruit Java based  
> bioinformatics open source applications that have needed or  
> interesting functionality to "biojava" enable the algorithm of the
> application. This could be a good theme for the BOSC/ISMB conference  
> to have current Biojava developers work with developers of other  
> java bioinformatics application to port key functionality so that it  
> works with Biojava core.
>
> Scooter
>
>
> On 8/3/09 7:51 AM, "Richard Holland" <holland at eaglegenomics.com>  
> wrote:
>
> Hi guys,
>
> 10 people responded (including me). 5 of those are in Cambridge, UK, 3
> are in the US, 1 in Spain, and 1 in Singapore. 2 wanted to combine the
> hackathon with a holiday, and 3 suggested linking the hackathon with a
> conference, which would almost certainly increase chances of getting
> funding for travel/accommodation from employers.
>
> So, I have two options. Venues in both cases to be worked out later:
>
>    1. Cambridge, UK, January 18th-22nd 2010. I know this is the middle
> of the winter in the UK, but on the bright side, the Cambridge Winter
> Beer Festival runs from the 22nd-24th, so that's something to cheer
> you up at the end of the hackathon.
>
>    2. Boston, USA, July 5th-8th 2010 (immediately before BOSC which is
> 9th-10th (TBC), then ISMB which is 11th-14th).
>
> Both have pros and cons - the Cambridge meeting means 50% of the
> delegates could attend for free and we might even be able to get a
> free venue, whereas the Boston meeting would be attractive to anyone
> already planning to attend BOSC or ISMB who might otherwise not be
> able to find funding for travel.
>
> I'm going to stick my neck out and suggest that BOSC/ISMB is the
> better choice, simply because of the wider range of potential
> delegates to attend the hackathon. We could always have a Cambridge
> mini-meeting at some other time. So, unless anyone objects, pencil in
> your diary for July 5th-8th in Boston.
>
> Please could all those interested vote yes or no for this plan so that
> I can find a suitably sized venue. Attendance will need to be
> confirmed by the date the venue sets for final booking/payment.
>
> cheers,
> Richard
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From markjschreiber at gmail.com  Mon Aug  3 16:38:32 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 4 Aug 2009 00:38:32 +0800
Subject: [Biojava-dev] Hackathon update
In-Reply-To: <FA7D98EE-7839-4851-B71C-A78ED7273762@eaglegenomics.com>
References: <FA7D98EE-7839-4851-B71C-A78ED7273762@eaglegenomics.com>
Message-ID: <93b45ca50908030938j7899572et780fd2ccd0f2f417@mail.gmail.com>

Boston++

On 3 Aug 2009, 8:52 PM, "Richard Holland" <holland at eaglegenomics.com> wrote:

Hi guys,

10 people responded (including me). 5 of those are in Cambridge, UK, 3 are
in the US, 1 in Spain, and 1 in Singapore. 2 wanted to combine the hackathon
with a holiday, and 3 suggested linking the hackathon with a conference,
which would almost certainly increase chances of getting funding for
travel/accommodation from employers.

So, I have two options. Venues in both cases to be worked out later:

 1. Cambridge, UK, January 18th-22nd 2010. I know this is the middle of the
winter in the UK, but on the bright side, the Cambridge Winter Beer Festival
runs from the 22nd-24th, so that's something to cheer you up at the end of
the hackathon.

 2. Boston, USA, July 5th-8th 2010 (immediately before BOSC which is
9th-10th (TBC), then ISMB which is 11th-14th).

Both have pros and cons - the Cambridge meeting means 50% of the delegates
could attend for free and we might even be able to get a free venue, whereas
the Boston meeting would be attractive to anyone already planning to attend
BOSC or ISMB who might otherwise not be able to find funding for travel.

I'm going to stick my neck out and suggest that BOSC/ISMB is the better
choice, simply because of the wider range of potential delegates to attend
the hackathon. We could always have a Cambridge mini-meeting at some other
time. So, unless anyone objects, pencil in your diary for July 5th-8th in
Boston.

Please could all those interested vote yes or no for this plan so that I can
find a suitably sized venue. Attendance will need to be confirmed by the
date the venue sets for final booking/payment.

cheers,
Richard

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

_______________________________________________
biojava-dev mailing list
biojava-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-dev


From andreas at sdsc.edu  Tue Aug  4 06:09:37 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 3 Aug 2009 23:09:37 -0700
Subject: [Biojava-dev] Hackathon update
In-Reply-To: <0BD11B39-1695-4C07-9695-20D095172A9C@eaglegenomics.com>
References: <C69C5A09.1960%HWillis@scripps.edu>
	<0BD11B39-1695-4C07-9695-20D095172A9C@eaglegenomics.com>
Message-ID: <59a41c430908032309l7b380c92hf018c12d38dd566f@mail.gmail.com>

Hi Richard,

I think it is a great idea to plan a hackathon prior to next BOSC. Still,
this is almost a year ahead and as such a long time away. Ideally I would
like to have something earlier than that... San Diego is far away
from the UK, but I would be happy to organize and host something here, if
people would be up for the longish journey...

Andreas


On Mon, Aug 3, 2009 at 6:29 AM, Richard Holland
<holland at eaglegenomics.com> wrote:

> Good plan - my worry is whether or not people can get 2 weeks off in the
> same year for the purposes of a hackathon.
>
> But, if people are willing, I'm happy to set up both. It does mean extra
> cost in terms of venue hire etc. - do you have any ideas as to good
> sponsors?
>
>
> On 3 Aug 2009, at 14:10, Scooter Willis wrote:
>
>> Richard
>>
>> It probably wouldn't hurt to try and do both. Waiting a year delays
>> getting started and because the two events are six months apart it increases
>> the odds of those who may be able to attend both. This way at BOSC/ISMB we
>> can have good momentum and stability for the current modules. The BOSC/ISMB
>> can then be focused on recruiting new developers with a focus on new
>> modules, code examples, docs etc.
>>
>> It also probably makes sense to try and identify/recruit Java based
>> bioinformatics open source applications that have needed or interesting
>> functionality to "biojava" enable the algorithm of the application. This
>> could be a good theme for the BOSC/ISMB conference to have current Biojava
>> developers work with developers of other java bioinformatics application to
>> port key functionality so that it works with Biojava core.
>>
>> Scooter
>>
>>
>>
>> On 8/3/09 7:51 AM, "Richard Holland" <holland at eaglegenomics.com> wrote:
>>
>> Hi guys,
>>
>> 10 people responded (including me). 5 of those are in Cambridge, UK, 3
>> are in the US, 1 in Spain, and 1 in Singapore. 2 wanted to combine the
>> hackathon with a holiday, and 3 suggested linking the hackathon with a
>> conference, which would almost certainly increase chances of getting
>> funding for travel/accommodation from employers.
>>
>> So, I have two options. Venues in both cases to be worked out later:
>>
>>   1. Cambridge, UK, January 18th-22nd 2010. I know this is the middle
>> of the winter in the UK, but on the bright side, the Cambridge Winter
>> Beer Festival runs from the 22nd-24th, so that's something to cheer
>> you up at the end of the hackathon.
>>
>>   2. Boston, USA, July 5th-8th 2010 (immediately before BOSC which is
>> 9th-10th (TBC), then ISMB which is 11th-14th).
>>
>> Both have pros and cons - the Cambridge meeting means 50% of the
>> delegates could attend for free and we might even be able to get a
>> free venue, whereas the Boston meeting would be attractive to anyone
>> already planning to attend BOSC or ISMB who might otherwise not be
>> able to find funding for travel.
>>
>> I'm going to stick my neck out and suggest that BOSC/ISMB is the
>> better choice, simply because of the wider range of potential
>> delegates to attend the hackathon. We could always have a Cambridge
>> mini-meeting at some other time. So, unless anyone objects, pencil in
>> your diary for July 5th-8th in Boston.
>>
>> Please could all those interested vote yes or no for this plan so that
>> I can find a suitably sized venue. Attendance will need to be
>> confirmed by the date the venue sets for final booking/payment.
>>
>> cheers,
>> Richard
>>
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
>>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>


From bugzilla-daemon at portal.open-bio.org  Tue Aug  4 17:28:58 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Aug 2009 13:28:58 -0400
Subject: [Biojava-dev] [Bug 2540] RichSequenceIterator does not skip
	sequence when exception is thrown
In-Reply-To: <bug-2540-485@http.bugzilla.open-bio.org/>
Message-ID: <200908041728.n74HSwfd027233@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2540


vdmerwe.karen at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1352 is|0                           |1
           obsolete|                            |


------- Comment #2 from vdmerwe.karen at gmail.com  2009-08-04 13:28 EST -------
Created an attachment (id=1356)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1356&action=view)
Updated the previous solution


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From florian.mittag at uni-tuebingen.de  Wed Aug  5 12:45:41 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Wed, 5 Aug 2009 14:45:41 +0200
Subject: [Biojava-dev]  How to parse large Genbank files?
In-Reply-To: <E8A261BA-DED7-4BE5-A946-1561648BB527@eaglegenomics.com>
References: <200907241929.08768.florian.mittag@uni-tuebingen.de>
	<200907281414.55156.florian.mittag@uni-tuebingen.de>
	<E8A261BA-DED7-4BE5-A946-1561648BB527@eaglegenomics.com>
Message-ID: <200908051445.42345.florian.mittag@uni-tuebingen.de>

On Tuesday, 28. July 2009 14:52, Richard Holland wrote:
> > Btw: Should we move this to Biojava-dev?
> probably, yes! :)

done ;)


> If you want to explore my ideas for a replacement Sequence model, the
> code and docs are here (sequence handling is in the 'core' module with
> DNA-specifics in the 'dna' module):
>
> http://biojava.org/wiki/BioJava3:HowTo
> http://www.biojava.org/wiki/BioJava3_project
>
> (Methods such as file parsers would request Strings (or ideally
> CharSequence - more flexible, and String implements it) as parameters
> whenever they don't care about content - if they care about content
> but don't care in advance about size or random access then they should
> request Iterator<Symbol> which can be used to wrap a String and parse
> on demand, and if they need full functionality then they should
> request List<Symbol>, whose default implementation uses ArrayLists,
> but there's no reason a String-backed one couldn't be written as
> well).

By now, I was mostly interested in a quick and dirty solution. I first 
attempted to create a new class StringSymbolList that would use the String as 
representation for the sequence and only convert to Symbols on demand. Since 
SimpleRichSequence uses SimpleSymbolList hard-coded, I wanted to implement a 
new RichSequence as well, but I was back-stabbed by Hibernate, because the 
bindings are set to SimpleRichSequence and when retrieving objects from the 
DB it uses the original BioJava classes again.

My solution now works and it consists of my own implementation of 
GenbankFormat, RichSequenceBuilder, and RichSequence, a new class called 
StringSymbolList as described above and a change to SimpleRichSequence, 
adding the method:

@Override
public String seqString() {
    return seqstring;
}

which circumvents most of the array copying stuff.
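
The String-backed idea described above can be sketched in plain Java. This
is only an illustration under assumptions, not the actual StringSymbolList:
`Nucleotide` and `StringBackedSymbols` are hypothetical names that stand in
for BioJava's Symbol and SymbolList, and indexing follows java.util.List
(0-based). The point is that symbols are produced only when requested and
the original String is handed back without any copying.

```java
import java.util.AbstractList;

/** Hypothetical stand-in for a BioJava Symbol; not the real interface. */
enum Nucleotide {
    A, C, G, T, N;

    static Nucleotide fromChar(char c) {
        switch (Character.toUpperCase(c)) {
            case 'A': return A;
            case 'C': return C;
            case 'G': return G;
            case 'T': return T;
            default:  return N; // anything unrecognised is treated as "unknown"
        }
    }
}

/** Read-only List view over a String; symbols are created only on demand. */
final class StringBackedSymbols extends AbstractList<Nucleotide> {
    private final String seq;

    StringBackedSymbols(String seq) {
        this.seq = seq;
    }

    @Override
    public Nucleotide get(int index) {
        // Converts a single character per call; no Symbol[] is ever built.
        return Nucleotide.fromChar(seq.charAt(index));
    }

    @Override
    public int size() {
        return seq.length();
    }

    /** Hands back the backing String itself, without copying. */
    String seqString() {
        return seq;
    }
}
```

Because the class extends AbstractList, iteration and subList() come for
free, while the memory cost stays at roughly that of the String itself.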

I also noticed that processing the Genbank files became slower with every 
file, so I closed the Hibernate session after each chromosome and opened a 
new one. (I also tried session.clear(), but somehow this didn't work).

For now, it seems like everything is fine and I have no more OutOfMemory 
exceptions.

- Florian


>
> cheers,
> Richard
>
> > - Florian
> >
> >> On Mon, Jul 27, 2009 at 8:16 PM, Florian
> >>
> >> Mittag<florian.mittag at uni-tuebingen.de> wrote:
> >>> Hi Mark!
> >>>
> >>> On Saturday, 25. July 2009 04:20, Mark Schreiber wrote:
> >>>> I don't think anyone has done much or anything to optimize these
> >>>> parsers. The process you outline sounds extremely inefficient. It
> >>>> is
> >>>> also likely to lead to memory leaks due to the number of copy
> >>>> operations.
> >>>
> >>> I wouldn't necessarily say that it leads to memory leaks, but it
> >>> definitively leads to a high memory consumption (2GB are not
> >>> enough for a
> >>> 200MB file). Also, my outline of the process is based on only 2
> >>> hours of
> >>> viewing the code, so actually I expected to be corrected on this.
> >>> Unfortunately, it seems like I did get the right idea and it IS
> >>> extremely
> >>> inefficient.
> >>>
> >>> I mean, I understand that this is a high level of abstraction that
> >>> might
> >>> come in handy in many situations, but it certainly is more of an
> >>> obstacle
> >>> in my specific case.
> >>>
> >>>> As always with java, don't try and optimize without a profiler
> >>>> which
> >>>> will tell you which methods are taking a long time and which
> >>>> objects
> >>>> take the most memory.
> >>>
> >>> I think we should continue this discussion on the biojava-dev list
> >>> or in
> >>> a private conversation, as it will probably get very detailed and
> >>> technical.
> >>>
> >>>
> >>> My question to this list again:
> >>> Is there a way to achieve my goal of parsing a 200MB Genbank file
> >>> with
> >>> the current biojava version without code changes?
> >>>
> >>>
> >>> - Florian
> >>>
> >>>> On 25 Jul 2009, 1:33 AM, "Florian Mittag"
> >>>> <florian.mittag at uni-tuebingen.de> wrote:
> >>>>
> >>>> Hi!
> >>>>
> >>>> I think this is a problem worth of its own thread, so I'll start
> >>>> one:
> >>>>
> >>>> I want to store all human chromosomes in a BioSQL database after I
> >>>> loaded the
> >>>> information from .gbk files. The files I get from NCBI with the
> >>>> following URIs, where the id ranges from nc_000001 to nc_000024
> >>>> plus
> >>>> nc_001804:
> >>>>
> > >>>> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=nc_000023&rettype=gbwithparts&retmode=text
> >>>>
> > >>>> I then try to parse the files as described in
> > >>>> http://biojava.org/wiki/BioJava:BioJavaXDocs#Tools_for_reading.2Fwriting_files
> > >>>> but it won't work. While there are no problems parsing 1804 and
> > >>>> 24, chromosome 23 leads to an OutOfMemory exception although I
> > >>>> gave it 2GB of heap space.
> >>>>
> >>>> Here is a stack trace (the line numbers might differ, because I
> >>>> already
> >>>> tried
> >>>> to improve GenbankFormat.java in memory efficiency):
> >>>>
> > >>>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> > >>>>        at org.biojava.bio.seq.io.ChunkedSymbolListFactory.addSymbols(ChunkedSymbolListFactory.java:222)
> > >>>>        at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.addSymbols(SimpleRichSequenceBuilder.java:256)
> > >>>>        at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:535)
> > >>>>        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
> > >>>>        at org.prodge.sequence_viewer.db.UpdateDB_Main.updateChromosome(UpdateDB_Main.java:537)
> > >>>>        at org.prodge.sequence_viewer.db.UpdateDB_Main.newGenome(UpdateDB_Main.java:468)
> > >>>>        at org.prodge.sequence_viewer.db.UpdateDB_Main.main(UpdateDB_Main.java:164)
> >>>>
> >>>> The line in GenbankFormat.java is:
> >>>>
> >>>> rlistener.addSymbols(
> >>>>        symParser.getAlphabet(),
> >>>>        (Symbol[])(sl.toList().toArray(new Symbol[0])),
> >>>>        0, sl.length());
> >>>>
> >>>> Sometimes it fails at the sl.toList().toArray()-part, sometimes
> >>>> it fails
> >>>> later
> >>>> inside the addSymbols method, but it always fails.
> >>>>
> >>>> How can this be? I mean, the file is only 190MB in size, so 2GB of
> >>>> memory should be more than enough. Browsing through the source
> >>>> code, I
> >>>> discovered what I think of as very inefficient handling of
> >>>> sequences:
> >>>>
> >>>> 1) the sequence string is read from file into a StringBuffer
> >>>> 2) it is converted to a string (with whitespaces removed)
> >>>> 3) a SimpleSymbolList is created out of the string
> >>>> 4) the SymbolList is converted to a List of Symbols
> >>>> 5) the List is converted to an array of Symbols
> >>>> 6) the array is passed to addSymbols
> >>>> 7) there it is added to a ChunkedSymbolListFactory
> >>>> 8) if at some point the sequence is requested, a SymbolList is
> >>>> created
> >>>> and then converted to a string.
> >>>>
> >>>> You see, there is a lot of copying and converting, but in the end
> >>>> I have
> >>>> the same string I started with. Well, I had the string, if it ever
> >>>> reached the end, because it will crash before completing this
> >>>> process.
> >>>>
> >>>>
> >>>> Am I doing something wrong or is there a great potential of
> >>>> improving
> >>>> parsing
> >>>> of Genbank files?
> >>>>
> >>>>
> >>>> Regards,
> >>>>   Florian
> >>>> _______________________________________________
> >>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
> >>>
> >>> --
> >>> Dipl. Inf. Florian Mittag
> > >>> Universität Tuebingen
> >>> WSI-RA, Sand 1
> >>> 72076 Tuebingen, Germany
> >>> Phone: +49 7071 / 29 78985  Fax: +49 7071 / 29 5091
> >
> > --
> > Dipl. Inf. Florian Mittag
> > Universität Tuebingen
> > WSI-RA, Sand 1
> > 72076 Tuebingen, Germany
> > Phone: +49 7071 / 29 78985  Fax: +49 7071 / 29 5091

-- 
Dipl. Inf. Florian Mittag
Universität Tuebingen
WSI-RA, Sand 1
72076 Tuebingen, Germany
Phone: +49 7071 / 29 78985  Fax: +49 7071 / 29 5091


From markjschreiber at gmail.com  Wed Aug  5 13:16:03 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 5 Aug 2009 21:16:03 +0800
Subject: [Biojava-dev] How to parse large Genbank files?
In-Reply-To: <200908051445.42345.florian.mittag@uni-tuebingen.de>
References: <200907241929.08768.florian.mittag@uni-tuebingen.de>
	<200907281414.55156.florian.mittag@uni-tuebingen.de>
	<E8A261BA-DED7-4BE5-A946-1561648BB527@eaglegenomics.com>
	<200908051445.42345.florian.mittag@uni-tuebingen.de>
Message-ID: <93b45ca50908050616n210bd2a3u8391d9ad7114015a@mail.gmail.com>

Would it be better for the biojava SimpleRichSequence to be backed by a
String and do symbol operations on the fly? Alternatively the default
hibernate mapping could be to a more stringy sequence.
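
As a rough illustration of "symbol operations on the fly" over a
String-backed sequence, the sketch below exposes a reverse complement as a
view instead of a copy. It is an assumption-laden example, not BioJava code:
`ReverseComplementView` is a hypothetical name, only unambiguous DNA letters
are handled, and everything else is mapped to 'n'.

```java
/** Read-only reverse-complement view over a DNA string; nothing is copied up front. */
final class ReverseComplementView implements CharSequence {
    private final CharSequence forward;

    ReverseComplementView(CharSequence forward) {
        this.forward = forward;
    }

    @Override
    public int length() {
        return forward.length();
    }

    @Override
    public char charAt(int index) {
        // Position i of the view is the complement of position (n - 1 - i) of the original.
        return complement(forward.charAt(forward.length() - 1 - index));
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return toString().subSequence(start, end);
    }

    @Override
    public String toString() {
        // Materialises the view; only needed when a real String is required.
        StringBuilder sb = new StringBuilder(length());
        for (int i = 0; i < length(); i++) {
            sb.append(charAt(i));
        }
        return sb.toString();
    }

    private static char complement(char base) {
        switch (Character.toLowerCase(base)) {
            case 'a': return 't';
            case 'c': return 'g';
            case 'g': return 'c';
            case 't': return 'a';
            default:  return 'n'; // ambiguity codes are not handled in this sketch
        }
    }
}
```

A caller that only needs a few positions pays for just those charAt() calls;
the full string is built only if toString() is invoked.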

Arguably in the absence of JPA and entity beans Hibernate should probably be
talking to biojava via DTOs. An efficient BioSQL loader would directly use
the DTOs or Entity beans (which could implement biojava interfaces) and not
go through all the symbol hassle.

Might be worth considering for BJ3

- Mark

On Aug 5, 2009 8:45 PM, "Florian Mittag" <florian.mittag at uni-tuebingen.de>
wrote:

On Tuesday, 28. July 2009 14:52, Richard Holland wrote: > > Btw: Should we
move this to Biojava-dev?...
done ;)

> If you want to explore my ideas for a replacement Sequence model, the >
code and docs are here (...
By now, I was mostly interested in a quick and dirty solution. I first
attempted to create a new class StringSymbolList that would use the String
as
representation for the sequence and only convert to Symbols on demand. Since
SimpleRichSequence uses SimpleSymbolList hard-coded, I wanted to implement a
new RichSequence as well, but I was back-stabbed by Hibernate, because the
bindings are set to SimpleRichSequence and when retrieving objects from the
DB it uses the original BioJava classes again

My solution now works and it consists of my own implementation of
GenbankFormat, RichSequenceBuilder, and RichSequence, a new class called
StringSymbolList as described above and a change to SimpleRichSequence,
adding the method:

@Override
public String seqString() {
   return seqstring;
}

which circumvents most of the array copying stuff.

I also noticed that processing the Genbank files became slower with every
file, so I closed the Hibernate session after each chromosome and opened a
new one. (I also tried session.clear(), but somehow this didn't work).

For now, it seems like everything is fine and I have no more OutOfMemory
exceptions.

- Florian



From florian.mittag at uni-tuebingen.de  Wed Aug  5 15:41:24 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Wed, 5 Aug 2009 17:41:24 +0200
Subject: [Biojava-dev] Error loading Ontology with Hibernate
Message-ID: <200908051741.24367.florian.mittag@uni-tuebingen.de>

Hi, it's me again ;-)

I'm really sorry to bother you with yet another problem, but I seem to attract 
those problems.

When I parse Genbank files and store them in a BioSQL DB, all features 
like "gap", "mRNA", "gene", etc. are represented by newly created Terms in 
the ontology "biojavax" with the comment "autocreated by biojavax". I 
searched for an appropriate ontology and found the Sequence Ontology, which I 
loaded into the DB using BioPerl's load_ontology.pl.

I tried setting the default ontology using 
RichObjectFactory.setDefaultOntologyName("sequence"), but when it comes to 
instantiating the SimpleRichSequenceBuilder, a multi-nested exception is 
thrown. I followed it in the code and found the cause in Hibernate:

[SEVERE] <init>(): illegal access to loading collection >> 
org.hibernate.LazyInitializationException: illegal access to loading collection
	at org.hibernate.collection.AbstractPersistentCollection.initialize(AbstractPersistentCollection.java:341)
	at org.hibernate.collection.AbstractPersistentCollection.read(AbstractPersistentCollection.java:86)
	at org.hibernate.collection.PersistentSet.toString(PersistentSet.java:309)
	at java.lang.String.valueOf(String.java:2827)
	at java.lang.StringBuilder.append(StringBuilder.java:115)
	at java.util.AbstractCollection.toString(AbstractCollection.java:422)
	at org.hibernate.engine.StatefulPersistenceContext.initializeNonLazyCollections(StatefulPersistenceContext.java:844)

probably caused by this exception

org.hibernate.PropertyAccessException: Null value was assigned to a property 
of primitive type setter of org.biojavax.SimpleRankedCrossRef.rank


The code to reproduce this:

SessionFactory sessionFactory = new Configuration().configure().buildSessionFactory();
Session session = sessionFactory.openSession();
RichObjectFactory.connectToBioSQL(session);
RichObjectFactory.setDefaultOntologyName("sequence");
Ontology onto = RichObjectFactory.getDefaultOntology();

My DB has the following ontologies listed:
- biological_process
- gene_ontology
- molecular_function
- cellular_component
- sequence
- biojavax

and only for "gene_ontology" and "biojavax" the above code snippet runs 
without failure. All ontologies were loaded with the load_ontology.pl script.


What might be the cause?

Thanks

- Florian


-- 
Dipl. Inf. Florian Mittag
Universität Tuebingen
WSI-RA, Sand 1
72076 Tuebingen, Germany
Phone: +49 7071 / 29 78985  Fax: +49 7071 / 29 5091


From florian.mittag at uni-tuebingen.de  Thu Aug  6 13:16:50 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Thu, 6 Aug 2009 15:16:50 +0200
Subject: [Biojava-dev] Error loading Ontology with Hibernate
In-Reply-To: <200908051741.24367.florian.mittag@uni-tuebingen.de>
References: <200908051741.24367.florian.mittag@uni-tuebingen.de>
Message-ID: <200908061516.50183.florian.mittag@uni-tuebingen.de>

Found the cause.

After importing an ontology (Gene or Sequence Ontology) into the BioSQL using 
load_ontology.pl, the table "term_dbxref" has only NULL values in the rank 
column. I tried it with DB2 and MySQL, same results/error.

The way I see it, this is not a problem of Hibernate. Can I set the "rank" to 
an arbitrary value to circumvent this problem?
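
The failure mode can be reproduced without Hibernate or BioSQL. The sketch
below is an illustration under assumptions: `RankedRef` and
`NullIntoPrimitiveDemo` are hypothetical names, and the reflective call only
stands in for the way a persistence layer might populate a bean from a row;
it is not Hibernate's actual code. Passing null (as a NULL rank column would
yield) to a setter that takes a primitive int is rejected, while a boxed
number is accepted.

```java
import java.lang.reflect.Method;

/** Hypothetical bean with a primitive property, like a rank column mapped to int. */
final class RankedRef {
    private int rank;

    public void setRank(int rank) {
        this.rank = rank;
    }

    public int getRank() {
        return rank;
    }
}

final class NullIntoPrimitiveDemo {
    /** Returns true if a reflective setRank(value) call is rejected. */
    static boolean setterRejects(Object value) {
        try {
            Method setter = RankedRef.class.getMethod("setRank", int.class);
            setter.invoke(new RankedRef(), value);
            return false; // value was unboxed and stored
        } catch (IllegalArgumentException e) {
            return true;  // null (or a wrong type) cannot become an int
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(setterRejects(null)); // NULL column value
        System.out.println(setterRejects(42));   // any concrete rank
    }
}
```

This is consistent with the observation above: once every row has some
non-NULL rank, a primitive setter has a value it can accept.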


On Wednesday, 5. August 2009 17:41, Florian Mittag wrote:
> Hi, it's me again ;-)
>
> I'm really sorry to bother you with yet another problem, but I seem to
> attract those problems.
>
> When I parse Genbank files and store them in a BioSQL DB, all features
> like "gap", "mRNA", "gene", etc. are represented by newly created Terms in
> the ontology "biojavax" with the comment "autocreated by biojavax". I
> searched for an appropriate ontology and found the Sequence Ontology, which
> I loaded into the DB using BioPerl's load_ontology.pl.
>
> I tried setting the default ontology using
> RichObjectFactory.setDefaultOntologyName("sequence"), but when it comes to
> instantiating the SimpleRichSequenceBuilder, a multi-nested exception is
> thrown. I followed it in the code and found the cause in Hibernate:
>
> [SEVERE] <init>(): illegal access to loading collection >>
> org.hibernate.LazyInitializationException: illegal access to loading collection
> 	at org.hibernate.collection.AbstractPersistentCollection.initialize(AbstractPersistentCollection.java:341)
> 	at org.hibernate.collection.AbstractPersistentCollection.read(AbstractPersistentCollection.java:86)
> 	at org.hibernate.collection.PersistentSet.toString(PersistentSet.java:309)
> 	at java.lang.String.valueOf(String.java:2827)
> 	at java.lang.StringBuilder.append(StringBuilder.java:115)
> 	at java.util.AbstractCollection.toString(AbstractCollection.java:422)
> 	at org.hibernate.engine.StatefulPersistenceContext.initializeNonLazyCollections(StatefulPersistenceContext.java:844)
>
> probably caused by this exception
>
> org.hibernate.PropertyAccessException: Null value was assigned to a
> property of primitive type setter of org.biojavax.SimpleRankedCrossRef.rank
>
>
> The code to reproduce this:
>
> sessionFactory = new Configuration().configure().buildSessionFactory();
> session = sessionFactory.openSession();
> RichObjectFactory.connectToBioSQL(session);
> RichObjectFactory.setDefaultOntologyName("sequence");
> Ontology onto = RichObjectFactory.getDefaultOntology();
>
> My DB has the following ontologies listed:
> - biological_process
> - gene_ontology
> - molecular_function
> - cellular_component
> - sequence
> - biojavax
>
> and only for "gene_ontology" and "biojavax" the above code snippet runs
> without failure. All ontologies were loaded with the load_ontology.pl
> script.
>
>
> What might be the cause?
>
> Thanks
>
> - Florian

-- 
Dipl. Inf. Florian Mittag
Universität Tuebingen
WSI-RA, Sand 1
72076 Tuebingen, Germany
Phone: +49 7071 / 29 78985  Fax: +49 7071 / 29 5091


From markjschreiber at gmail.com  Thu Aug  6 13:48:37 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 6 Aug 2009 21:48:37 +0800
Subject: [Biojava-dev] Error loading Ontology with Hibernate
In-Reply-To: <200908061516.50183.florian.mittag@uni-tuebingen.de>
References: <200908051741.24367.florian.mittag@uni-tuebingen.de>
	<200908061516.50183.florian.mittag@uni-tuebingen.de>
Message-ID: <93b45ca50908060648p2451096ax46a179e058a09551@mail.gmail.com>

There shouldn't be an issue with using an arbitrary value. The ranks in
biosql are mainly to preserve the order of features etc. during
roundtripping. It will affect sorting of ontology terms but this is probably
not a problem.

- mark

On Aug 6, 2009 9:42 PM, "Florian Mittag" <florian.mittag at uni-tuebingen.de>
wrote:

Found the cause.

After importing an ontology (Gene or Sequence Ontology) into the BioSQL
using
load_ontology.pl, the table "term_dbxref" has only NULL values in the rank
column. I tried it with DB2 and MySQL, same results/error.

The way I see it, this is not a problem of Hibernate. Can I set the "rank"
to
an arbitrary value to circumvent this problem?



From florian.mittag at uni-tuebingen.de  Thu Aug  6 14:14:02 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Thu, 6 Aug 2009 16:14:02 +0200
Subject: [Biojava-dev] Error loading Ontology with Hibernate
In-Reply-To: <93b45ca50908060648p2451096ax46a179e058a09551@mail.gmail.com>
References: <200908051741.24367.florian.mittag@uni-tuebingen.de>
	<200908061516.50183.florian.mittag@uni-tuebingen.de>
	<93b45ca50908060648p2451096ax46a179e058a09551@mail.gmail.com>
Message-ID: <200908061614.03033.florian.mittag@uni-tuebingen.de>

On Thursday, 6. August 2009 15:48, you wrote:
> There shouldn't be an issue with using an arbitrary value. The ranks in
> biosql are mainly to preserve the order of features etc. during
> roundtripping. It will affect sorting of ontology terms but this is
> probably not a problem.

Ok, then I will try this as a quick hack until I've found out if the NULL 
values are a bug and if it can be fixed.

Thanks for the quick answer!

- Florian


> On Aug 6, 2009 9:42 PM, "Florian Mittag" <florian.mittag at uni-tuebingen.de>
> wrote:
>
> Found the cause.
>
> After importing an ontology (Gene or Sequence Ontology) into the BioSQL
> using
> load_ontology.pl, the table "term_dbxref" has only NULL values in the rank
> column. I tried it with DB2 and MySQL, same results/error.
>
> The way I see it, this is not a problem of Hibernate. Can I set the "rank"
> to
> an arbitrary value to circumvent this problem?
>


From holland at eaglegenomics.com  Fri Aug  7 17:51:59 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 7 Aug 2009 18:51:59 +0100
Subject: [Biojava-dev] Hackathon update
In-Reply-To: <Pine.GSO.4.44.0908071313400.28289-100000@shell3.shore.net>
References: <Pine.GSO.4.44.0908071313400.28289-100000@shell3.shore.net>
Message-ID: <0AA4618C-2A99-4ACD-B07D-0AA05FE77665@eaglegenomics.com>

Several have said the same. I'll try to get both organised. Watch this  
space.

cheers,
Richard

On 7 Aug 2009, at 18:23, Michael Heuer wrote:

> Richard Holland wrote:
>
>> 10 people responded (including me). 5 of those are in Cambridge,  
>> UK, 3
>> are in the US, 1 in Spain, and 1 in Singapore. 2 wanted to combine  
>> the
>> hackathon with a holiday, and 3 suggested linking the hackathon  
>> with a
>> conference, which would almost certainly increase chances of getting
>> funding for travel/accommodation from employers.
>>
>> So, I have two options. Venues in both cases to be worked out later:
>>
>>   1. Cambridge, UK, January 18th-22nd 2010. I know this is the middle
>> of the winter in the UK, but on the bright side, the Cambridge Winter
>> Beer Festival runs from the 22nd-24th, so that's something to cheer
>> you up at the end of the hackathon.
>>
>>   2. Boston, USA, July 5th-8th 2010 (immediately before BOSC which is
>> 9th-10th (TBC), then ISMB which is 11th-14th).
>
>
> I would suggest trying for both.  Winter in the UK means that a lot of
> work would get done.  Attendance would probably be better for Boston.
>
> I would caution that accommodations in Boston are quite expensive, and
> that the 4th of July week is the busiest week of the year with  
> tourists.
> Perhaps the hackathon in Boston might be arranged flexibly around the
> actual days of the conference, evenings and late nights and so on.
>
>   michael
>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From heuermh at acm.org  Fri Aug  7 17:23:53 2009
From: heuermh at acm.org (Michael Heuer)
Date: Fri, 7 Aug 2009 13:23:53 -0400 (EDT)
Subject: [Biojava-dev] Hackathon update
In-Reply-To: <FA7D98EE-7839-4851-B71C-A78ED7273762@eaglegenomics.com>
Message-ID: <Pine.GSO.4.44.0908071313400.28289-100000@shell3.shore.net>

Richard Holland wrote:

> 10 people responded (including me). 5 of those are in Cambridge, UK, 3
> are in the US, 1 in Spain, and 1 in Singapore. 2 wanted to combine the
> hackathon with a holiday, and 3 suggested linking the hackathon with a
> conference, which would almost certainly increase chances of getting
> funding for travel/accommodation from employers.
>
> So, I have two options. Venues in both cases to be worked out later:
>
>    1. Cambridge, UK, January 18th-22nd 2010. I know this is the middle
> of the winter in the UK, but on the bright side, the Cambridge Winter
> Beer Festival runs from the 22nd-24th, so that's something to cheer
> you up at the end of the hackathon.
>
>    2. Boston, USA, July 5th-8th 2010 (immediately before BOSC which is
> 9th-10th (TBC), then ISMB which is 11th-14th).


I would suggest trying for both.  Winter in the UK means that a lot of
work would get done.  Attendance would probably be better for Boston.

I would caution that accommodations in Boston are quite expensive, and
that the 4th of July week is the busiest week of the year with tourists.
Perhaps the hackathon in Boston might be arranged flexibly around the
actual days of the conference, evenings and late nights and so on.

   michael


From andreas at sdsc.edu  Sun Aug 16 21:41:03 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 16 Aug 2009 14:41:03 -0700
Subject: [Biojava-dev] plans for next months
Message-ID: <59a41c430908161441l3ae3ebao524237a1b7b868fe@mail.gmail.com>

Hi,

Here is a quick summary of what I propose as our action plan for the
next months for BioJava:

* I would like to call for a code-freeze in 2 weeks (or so) in order
to finalize the new modularized and mavenized version of biojava for
the developers. The current developmental trunk will remain
permanently frozen and all future work should continue at a new
location in SVN. As such it will be important that all developers
commit any changes they are working on before that.

* We will update the documentation for how to obtain a new mavenized
checkout on the wiki.

* After the change the new modules need to be tested and if no major
problems are found, the ok will be given to continue working on the
new modules (at the new location)

* All developers should obtain a new checkout.

* We need to identify sub-module leaders who will take over leadership
of the sub-modules.

In order to come up with a new release of biojava we should continue
development on the new modules for a few months. Talking off list with
Richard Holland it looks like we will have a hackathon in January in
Cambridge, U.K. (details to be finalized and announced). I suggest
that we use that opportunity to focus on further developing the
modules and make a new public BioJava release shortly after that.

At present I see the following topics that would be great to work
on until and during the hackathon in order to prepare a shiny new
version of BioJava for public release:

+ Work on standardizing the organization of the modules (tests,
examples, source, documentation etc.)
+ Add new modules
+ Improve existing modules
+ Anything the module leaders deem necessary for their modules.
+ Use OSGi for visualisation related modules

I can post a more detailed and specific list of things to work on if
people are interested.

Andreas


From andreas at sdsc.edu  Mon Aug 24 04:18:14 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 23 Aug 2009 21:18:14 -0700
Subject: [Biojava-dev] BioJava code freeze,
	modularization and action items for sub modules
Message-ID: <59a41c430908232118k2fff9564of1a45fba447eb922@mail.gmail.com>

Hi,

In order to push the modularization and migration to Maven, I would like to
declare a code freeze on the current developmental trunk. Please commit all
new changes by

Thursday 27th of August 23:00 GMT.

In the week after I would like to refactor the code base and commit the
initial set of modules to a new developmental trunk.  All future development
will happen on that new trunk.

You will be able to follow the ongoing status of this at

http://biojava.org/wiki/BioJava:MavenMigration


Once the modules are in place it is a good moment to hand over the
leadership of the sub-modules to the new module-project leaders. It will be
up to the module-lead to take the modules into the direction that he/she
feels important. I would like to take this opportunity to suggest a couple
of people as module-leaders and propose some action items for the modules.
Feel free to comment or make additional suggestions...

Here is a list of modules / action items and the people that I would propose to
become module leaders:

Module: biojava-core Lead: Andreas Prlic
 - break the new modules out of core
 - bring up to modern Java standards, use Generics
 - declare old/unused code obsolete
 - don't break backwards compatibility

Module: biojava-sequence Lead: Richard Holland
 - Bring in Richard's new code that he started to develop on the biojava-3
branch.
 - provide a more scalable and efficient basis for dealing with large
sequence files

Module: biojava-alignment Lead: Andreas Draeger
 - allow better access to underlying dynamic programming data structures
 - allow more customizable display of pairwise alignments (HTML/plain text,
etc)

Module: biojava-blast Lead: still looking for a leader
 - provide access to all details of the blast output
 - add support for RPS blast

Module: biojava-phylo Lead: Scooter Willis
 - provide improved NJtree /Jalview

Module: biojava-biosql Lead: Richard Holland
 - merge the new biojava-sequence module with the current biojava-biosql
code


Module: biojava-structure Lead: Andreas Prlic
 - add support for SCOP file parsing
 - add support for easy access of domains (in terms of coordinates)
 - add secondary structure assignment
 - improve structure alignments
 - better integration with 3D viewers (Jmol, RCSB viewers)

Module: biojava-web services:
The details still seem to be under discussion; perhaps we need multiple
modules here? Also, what about REST vs. SOAP? To be discussed. People who
expressed interest are:
Niall Haslam, Scooter Willis, Sylvain Foisy

Module?: biojava-ws-blast
Module?: biojava-ws-biolit

Module: biojava-sequencing Lead: ???
 - support FastQ files
 - support parsing of output for various new sequencing machines

This is only an initial set of modules and I think it is safe to say that
more modules will be added after more discussions (and people volunteering
to contribute).

Andreas


From simpleyrx at 163.com  Mon Aug 24 16:48:01 2009
From: simpleyrx at 163.com (simpleyrx)
Date: Tue, 25 Aug 2009 00:48:01 +0800 (CST)
Subject: [Biojava-dev] Adding profile-profile alignment algorithms to Biojava
Message-ID: <9551386.424471251132481047.JavaMail.coremail@app180.163.com>


Experts,
 
Profile-profile and HMM-HMM alignments have become more important in
protein bioinformatics than ever before. So I think that if we can
implement profile-profile and HMM-HMM alignment algorithms in the BioJava
package, it will be more useful to researchers who are interested in
protein bioinformatics.


From holland at eaglegenomics.com  Mon Aug 24 17:30:31 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 24 Aug 2009 18:30:31 +0100
Subject: [Biojava-dev] Adding profile-profile alignment algorithms to
	Biojava
In-Reply-To: <9551386.424471251132481047.JavaMail.coremail@app180.163.com>
References: <9551386.424471251132481047.JavaMail.coremail@app180.163.com>
Message-ID: <ECEEC52A-B615-4140-B84E-52097A36A4D0@eaglegenomics.com>

Contributions of code would be welcome! Are you volunteering? :)

cheers,
Richard

On 24 Aug 2009, at 17:48, simpleyrx wrote:

>
> Experts,
>
> Profile-profile and HMM-HMM alignments have become more important in
> protein bioinformatics than ever before. So I think that if we can
> implement profile-profile and HMM-HMM alignment algorithms in the BioJava
> package, it will be more useful to researchers who are interested in
> protein bioinformatics.
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From heuermh at acm.org  Tue Aug 25 01:19:24 2009
From: heuermh at acm.org (Michael Heuer)
Date: Mon, 24 Aug 2009 21:19:24 -0400 (EDT)
Subject: [Biojava-dev] BioJava code freeze,
 modularization and action items for sub modules
In-Reply-To: <59a41c430908232118k2fff9564of1a45fba447eb922@mail.gmail.com>
Message-ID: <Pine.GSO.4.44.0908242114240.18799-100000@shell3.shore.net>

Andreas Prlic wrote:

> In order to push the modularization and migration to Maven, I would like to
> declare a code freeze on the current developmental trunk. Please commit all
> new changes by
>
> Thursday 27th of August 23:00 GMT.
>
> In the week after I would like to refactor the code base and commit the
> initial set of modules to a new developmental trunk.  All future development
> will happen on that new trunk.
>
> You will be able to follow the ongoing status of this at
>
> http://biojava.org/wiki/BioJava:MavenMigration
>
>
> Once the modules are in place it is a good moment to hand over the
> leadership of the sub-modules to the new module-project leaders. It will be
> up to the module-lead to take the modules into the direction that he/she
> feels important. I would like to take this opportunity to suggest a couple
> of people as module-leaders and propose some action items for the modules.
> Feel free to comment or make additional suggestions...

Sign me up for help with maven configuration/reporting, unit testing, and
generics API matters if you wish.


> Here is a list of modules / action items and the people that I would propose to
> become module leaders:
>
> Module: biojava-core Lead: Andreas Prlic
>  - break the new modules out of core
>  - bring up to modern Java standards, use Generics
>  - declare old/unused code obsolete
>  - don't break backwards compatibility

Seems to me the last one will greatly hamper the rest of this effort.
The next version needs to be binary compatible with 1.7?

   michael


From andreas at sdsc.edu  Tue Aug 25 02:17:00 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 24 Aug 2009 19:17:00 -0700
Subject: [Biojava-dev] BioJava code freeze,
	modularization and action 	items for sub modules
In-Reply-To: <Pine.GSO.4.44.0908242114240.18799-100000@shell3.shore.net>
References: <59a41c430908232118k2fff9564of1a45fba447eb922@mail.gmail.com>
	<Pine.GSO.4.44.0908242114240.18799-100000@shell3.shore.net>
Message-ID: <59a41c430908241917r6beb5329wb862ce8913ac74d7@mail.gmail.com>

>> Once the modules are in place it is a good moment to hand over the
>> leadership of the sub-modules to the new module-project leaders. It will be
>> up to the module-lead to take the modules into the direction that he/she
>> feels important. I would like to take this opportunity to suggest a couple
>> of people as module-leaders and propose some action items for the modules.
>> Feel free to comment or make additional suggestions...
>
> Sign me up for help with maven configuration/reporting, unit testing, and
> generics API matters if you wish.

Excellent, I will come back to you on this :-)

>>  - don't break backwards compatibility
>
> Seems to me the last one will greatly hamper the rest of this effort.
> The next version needs to be binary compatible with 1.7?


What I mean is that we should try not to disrupt things as much as is
reasonable. I am all for a pragmatic approach. While trying to be
conservative I guess refactoring should be discussed on a case by case
basis. To give an example: an area where I am supporting re-factoring
is the blast parser. The package name is confusing and we probably
need some code changes to expose more details of the parser. Are you
thinking of any other situations, where you think breaking backwards
compatibility will be inevitable?

Andreas


From heuermh at acm.org  Tue Aug 25 02:50:09 2009
From: heuermh at acm.org (Michael Heuer)
Date: Mon, 24 Aug 2009 22:50:09 -0400 (EDT)
Subject: [Biojava-dev] BioJava code freeze,
 modularization and action  items for sub modules
In-Reply-To: <59a41c430908241917r6beb5329wb862ce8913ac74d7@mail.gmail.com>
Message-ID: <Pine.GSO.4.44.0908242243520.18799-100000@shell3.shore.net>

Andreas Prlic wrote:

> >> Once the modules are in place it is a good moment to hand over the
> >> leadership of the sub-modules to the new module-project leaders. It will be
> >> up to the module-lead to take the modules into the direction that he/she
> >> feels important. I would like to take this opportunity to suggest a couple
> >> of people as module-leaders and propose some action items for the modules.
> >> Feel free to comment or make additional suggestions...
> >
> > Sign me up for help with maven configuration/reporting, unit testing, and
> > generics API matters if you wish.
>
> Excellent, I will come back to you on this :-)
>
> >>  - don't break backwards compatibility
> >
> > Seems to me the last one will greatly hamper the rest of this effort.
> > The next version needs to be binary compatible with 1.7?
>
>
> What I mean is that we should try not to disrupt things as much as is
> reasonable. I am all for a pragmatic approach. While trying to be
> conservative I guess refactoring should be discussed on a case by case
> basis. To give an example: an area where I am supporting re-factoring
> is the blast parser. The package name is confusing and we probably
> need some code changes to expose more details of the parser. Are you
> thinking of any other situations, where you think breaking backwards
> compatibility will be inevitable?

Ah yes, pragmatically backwards compatible with 1.7 is a better goal.

Maintaining binary compatibility is very difficult, and something we
haven't really done in the past.  Consider the following biojava 1.6.1 vs
biojava 1.7 clirr [1] report.
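
To illustrate why a changed return type counts as a binary break even when
most calling source would still compile, here is a minimal sketch. The nested
classes Before and After are hypothetical stand-ins for a single getter, not
the real NeedlemanWunsch class. Compiled callers link against the full JVM
method descriptor, which includes the return type, so a caller built against
the double version looks for a method that no longer exists once the type
becomes short.

```java
import java.lang.invoke.MethodType;

public class DescriptorSketch {

    // Hypothetical stand-ins for one getter before and after the type change.
    static class Before { public double getMatch() { return 1.0; } }
    static class After  { public short  getMatch() { return 1; } }

    // Returns the JVM descriptor of the no-argument getMatch() method of c.
    static String descriptorOfGetMatch(Class<?> c) {
        try {
            Class<?> returnType = c.getMethod("getMatch").getReturnType();
            return MethodType.methodType(returnType).toMethodDescriptorString();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A caller compiled against Before references getMatch with the first descriptor.
        System.out.println("before: getMatch" + descriptorOfGetMatch(Before.class));
        // After only offers the second one, so that old caller fails to link.
        System.out.println("after:  getMatch" + descriptorOfGetMatch(After.class));
    }
}
```

Under these assumptions the two descriptors differ, which is the kind of
change the 7006 lines in the report below are flagging.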

   michael


[1] http://clirr.sf.net

---
ERROR: 6004: org.biojava.bio.alignment.NeedlemanWunsch: Changed type of field CostMatrix from double[][] to int[][]
ERROR: 7005: org.biojava.bio.alignment.NeedlemanWunsch: Parameter 1 of 'public NeedlemanWunsch(double, double, double, double, double, org.biojava.bio.alignment.SubstitutionMatrix)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.NeedlemanWunsch: Parameter 2 of 'public NeedlemanWunsch(double, double, double, double, double, org.biojava.bio.alignment.SubstitutionMatrix)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.NeedlemanWunsch: Parameter 3 of 'public NeedlemanWunsch(double, double, double, double, double, org.biojava.bio.alignment.SubstitutionMatrix)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.NeedlemanWunsch: Parameter 4 of 'public NeedlemanWunsch(double, double, double, double, double, org.biojava.bio.alignment.SubstitutionMatrix)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.NeedlemanWunsch: Parameter 5 of 'public NeedlemanWunsch(double, double, double, double, double, org.biojava.bio.alignment.SubstitutionMatrix)' has changed its type to short
ERROR: 7006: org.biojava.bio.alignment.NeedlemanWunsch: Return type of method 'public double getDelete()' has been changed to short
ERROR: 7006: org.biojava.bio.alignment.NeedlemanWunsch: Return type of method 'public double getEditDistance()' has been changed to int
ERROR: 7006: org.biojava.bio.alignment.NeedlemanWunsch: Return type of method 'public double getGapExt()' has been changed to short
ERROR: 7006: org.biojava.bio.alignment.NeedlemanWunsch: Return type of method 'public double getInsert()' has been changed to short
ERROR: 7006: org.biojava.bio.alignment.NeedlemanWunsch: Return type of method 'public double getMatch()' has been changed to short
ERROR: 7006: org.biojava.bio.alignment.NeedlemanWunsch: Return type of method 'public double getReplace()' has been changed to short
ERROR: 7005: org.biojava.bio.alignment.NeedlemanWunsch: Parameter 1 of 'protected double min(double, double, double)' has changed its type to int
ERROR: 7005: org.biojava.bio.alignment.NeedlemanWunsch: Parameter 2 of 'protected double min(double, double, double)' has changed its type to int
ERROR: 7005: org.biojava.bio.alignment.NeedlemanWunsch: Parameter 3 of 'protected double min(double, double, double)' has changed its type to int
ERROR: 7006: org.biojava.bio.alignment.NeedlemanWunsch: Return type of method 'protected double min(double, double, double)' has been changed to int
ERROR: 7006: org.biojava.bio.alignment.NeedlemanWunsch: Return type of method 'public double pairwiseAlignment(org.biojava.bio.symbol.SymbolList, org.biojava.bio.symbol.SymbolList)' has been changed to int
ERROR: 7005: org.biojava.bio.alignment.NeedlemanWunsch: Parameter 1 of 'public java.lang.String printCostMatrix(double[][], char[], char[])' has changed its type to int[][]
ERROR: 7005: org.biojava.bio.alignment.NeedlemanWunsch: Parameter 1 of 'public void setDelete(double)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.NeedlemanWunsch: Parameter 1 of 'public void setGapExt(double)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.NeedlemanWunsch: Parameter 1 of 'public void setInsert(double)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.NeedlemanWunsch: Parameter 1 of 'public void setMatch(double)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.NeedlemanWunsch: Parameter 1 of 'public void setReplace(double)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.SequenceAlignment: Parameter 11 of 'public java.lang.String formatOutput(java.lang.String, java.lang.String, java.lang.String[], java.lang.String, int, int, long, int, int, long, double, long)' has changed its type to int
ERROR: 7006: org.biojava.bio.alignment.SequenceAlignment: Return type of method 'public java.lang.String formatOutput(java.lang.String, java.lang.String, java.lang.String[], java.lang.String, int, int, long, int, int, long, double, long)' has been changed to java.lang.StringBuffer
ERROR: 7006: org.biojava.bio.alignment.SequenceAlignment: Return type of method 'public double pairwiseAlignment(org.biojava.bio.symbol.SymbolList, org.biojava.bio.symbol.SymbolList)' has been changed to int
ERROR: 7005: org.biojava.bio.alignment.SmithWaterman: Parameter 1 of 'public SmithWaterman(double, double, double, double, double, org.biojava.bio.alignment.SubstitutionMatrix)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.SmithWaterman: Parameter 2 of 'public SmithWaterman(double, double, double, double, double, org.biojava.bio.alignment.SubstitutionMatrix)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.SmithWaterman: Parameter 3 of 'public SmithWaterman(double, double, double, double, double, org.biojava.bio.alignment.SubstitutionMatrix)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.SmithWaterman: Parameter 4 of 'public SmithWaterman(double, double, double, double, double, org.biojava.bio.alignment.SubstitutionMatrix)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.SmithWaterman: Parameter 5 of 'public SmithWaterman(double, double, double, double, double, org.biojava.bio.alignment.SubstitutionMatrix)' has changed its type to short
ERROR: 7006: org.biojava.bio.alignment.SmithWaterman: Return type of method 'public double pairwiseAlignment(org.biojava.bio.symbol.SymbolList, org.biojava.bio.symbol.SymbolList)' has been changed to int
ERROR: 7005: org.biojava.bio.alignment.SmithWaterman: Parameter 1 of 'public void setDelete(double)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.SmithWaterman: Parameter 1 of 'public void setGapExt(double)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.SmithWaterman: Parameter 1 of 'public void setInsert(double)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.SmithWaterman: Parameter 1 of 'public void setMatch(double)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.SmithWaterman: Parameter 1 of 'public void setReplace(double)' has changed its type to short
ERROR: 6004: org.biojava.bio.alignment.SubstitutionMatrix: Changed type of field matrix from int[][] to short[][]
ERROR: 6004: org.biojava.bio.alignment.SubstitutionMatrix: Changed type of field max from int to short
ERROR: 6004: org.biojava.bio.alignment.SubstitutionMatrix: Changed type of field min from int to short
ERROR: 7005: org.biojava.bio.alignment.SubstitutionMatrix: Parameter 2 of 'public SubstitutionMatrix(org.biojava.bio.symbol.FiniteAlphabet, int, int)' has changed its type to short
ERROR: 7005: org.biojava.bio.alignment.SubstitutionMatrix: Parameter 3 of 'public SubstitutionMatrix(org.biojava.bio.symbol.FiniteAlphabet, int, int)' has changed its type to short
INFO: 7011: org.biojava.bio.alignment.SubstitutionMatrix: Method 'public SubstitutionMatrix(java.io.File)' has been added
ERROR: 7006: org.biojava.bio.alignment.SubstitutionMatrix: Return type of method 'public int getMax()' has been changed to short
ERROR: 7006: org.biojava.bio.alignment.SubstitutionMatrix: Return type of method 'public int getMin()' has been changed to short
INFO: 7011: org.biojava.bio.alignment.SubstitutionMatrix: Method 'public org.biojava.bio.alignment.SubstitutionMatrix getSubstitutionMatrix(java.io.BufferedReader)' has been added
ERROR: 7006: org.biojava.bio.alignment.SubstitutionMatrix: Return type of method 'public int getValueAt(org.biojava.bio.symbol.Symbol, org.biojava.bio.symbol.Symbol)' has been changed to short
ERROR: 7005: org.biojava.bio.alignment.SubstitutionMatrix: Parameter 1 of 'protected int[][] parseMatrix(java.lang.String)' has changed its type to java.lang.Object
ERROR: 7006: org.biojava.bio.alignment.SubstitutionMatrix: Return type of method 'protected int[][] parseMatrix(java.lang.String)' has been changed to short[][]
ERROR: 7009: org.biojava.bio.alignment.SubstitutionMatrix: Accessibility of method 'protected int[][] parseMatrix(java.lang.String)' has been decreased from protected to private
INFO: 7003: org.biojava.bio.dp.onehead.SmallCursor: Method 'public boolean canAdvance()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.dp.onehead.SmallCursor: Method 'public org.biojava.bio.symbol.Symbol currentRes()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.dp.onehead.SmallCursor: Method 'public org.biojava.bio.symbol.Symbol lastRes()' has been removed, but an inherited definition exists.
INFO: 7011: org.biojava.bio.gui.glyph.ArrowGlyph: Method 'public ArrowGlyph(java.awt.Paint, java.awt.Paint)' has been added
INFO: 7011: org.biojava.bio.gui.glyph.ArrowGlyph: Method 'public ArrowGlyph(java.awt.geom.Rectangle2D$Float, java.awt.Paint, java.awt.Paint)' has been added
INFO: 7011: org.biojava.bio.gui.glyph.ArrowGlyph: Method 'public java.awt.Paint getFillPaint()' has been added
INFO: 7011: org.biojava.bio.gui.glyph.ArrowGlyph: Method 'public java.awt.Paint getOuterPaint()' has been added
INFO: 7011: org.biojava.bio.gui.glyph.ArrowGlyph: Method 'public void setDirection(int)' has been added
INFO: 7011: org.biojava.bio.gui.glyph.ArrowGlyph: Method 'public void setFillPaint(java.awt.Paint)' has been added
INFO: 7011: org.biojava.bio.gui.glyph.ArrowGlyph: Method 'public void setOuterPaint(java.awt.Paint)' has been added
INFO: 7011: org.biojava.bio.gui.glyph.RectangleGlyph: Method 'public java.awt.Paint getPaint()' has been added
INFO: 7011: org.biojava.bio.gui.glyph.RectangleGlyph: Method 'public void setPaint(java.awt.Paint)' has been added
INFO: 7011: org.biojava.bio.gui.glyph.TurnGlyph: Method 'public java.awt.Paint getPaint()' has been added
INFO: 7011: org.biojava.bio.gui.glyph.TurnGlyph: Method 'public void setPaint(java.awt.Paint)' has been added
INFO: 6009: org.biojava.bio.gui.sequence.GlyphFeatureRenderer: Accessibility of field fList has been increased from private to protected
INFO: 6009: org.biojava.bio.gui.sequence.GlyphFeatureRenderer: Accessibility of field gList has been increased from private to protected
INFO: 7011: org.biojava.bio.gui.sequence.GlyphFeatureRenderer: Method 'public boolean containsFilter(org.biojava.bio.seq.FeatureFilter)' has been added
INFO: 7011: org.biojava.bio.gui.sequence.GlyphFeatureRenderer: Method 'public org.biojava.bio.seq.FeatureFilter getFeatureFilter(int)' has been added
INFO: 7011: org.biojava.bio.gui.sequence.GlyphFeatureRenderer: Method 'public org.biojava.bio.gui.glyph.Glyph getGlyphForFilter(org.biojava.bio.seq.FeatureFilter)' has been added
INFO: 7011: org.biojava.bio.gui.sequence.GlyphFeatureRenderer: Method 'public void removeFilterWithGlyph(org.biojava.bio.seq.FeatureFilter)' has been added
INFO: 7011: org.biojava.bio.gui.sequence.GlyphFeatureRenderer: Method 'public void setGlyphForFilter(org.biojava.bio.seq.FeatureFilter, org.biojava.bio.gui.glyph.Glyph)' has been added
INFO: 6009: org.biojava.bio.gui.sequence.SequencePanelWrapper: Accessibility of field seqPanels has been increased from private to protected
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public void addPrefixMapping(java.lang.String, java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public org.xml.sax.ContentHandler getContentHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public org.xml.sax.DTDHandler getDTDHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public org.xml.sax.EntityResolver getEntityResolver()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public org.xml.sax.ErrorHandler getErrorHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public boolean getFeature(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public java.lang.String getNamespacePrefix()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public boolean getNamespacePrefixes()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public boolean getNamespaces()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public java.lang.Object getProperty(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public java.lang.String getURIFromPrefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public void parse(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public java.lang.String prefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public void setContentHandler(org.xml.sax.ContentHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public void setDTDHandler(org.xml.sax.DTDHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public void setEntityResolver(org.xml.sax.EntityResolver)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public void setErrorHandler(org.xml.sax.ErrorHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public void setFeature(java.lang.String, boolean)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public void setNamespacePrefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.BlastLikeSAXParser: Method 'public void setProperty(java.lang.String, java.lang.Object)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public void addPrefixMapping(java.lang.String, java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public org.xml.sax.ContentHandler getContentHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public org.xml.sax.DTDHandler getDTDHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public org.xml.sax.EntityResolver getEntityResolver()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public org.xml.sax.ErrorHandler getErrorHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public boolean getFeature(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public java.lang.String getNamespacePrefix()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public boolean getNamespacePrefixes()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public boolean getNamespaces()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public java.lang.Object getProperty(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public java.lang.String getURIFromPrefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public void parse(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public java.lang.String prefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public void setContentHandler(org.xml.sax.ContentHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public void setDTDHandler(org.xml.sax.DTDHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public void setEntityResolver(org.xml.sax.EntityResolver)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public void setErrorHandler(org.xml.sax.ErrorHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public void setFeature(java.lang.String, boolean)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public void setNamespacePrefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.ClustalWAlignmentSAXParser: Method 'public void setProperty(java.lang.String, java.lang.Object)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public void addPrefixMapping(java.lang.String, java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public org.xml.sax.ContentHandler getContentHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public org.xml.sax.DTDHandler getDTDHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public org.xml.sax.EntityResolver getEntityResolver()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public org.xml.sax.ErrorHandler getErrorHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public boolean getFeature(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public java.lang.String getNamespacePrefix()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public boolean getNamespacePrefixes()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public boolean getNamespaces()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public java.lang.Object getProperty(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public java.lang.String getURIFromPrefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public void parse(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public java.lang.String prefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public void setContentHandler(org.xml.sax.ContentHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public void setDTDHandler(org.xml.sax.DTDHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public void setEntityResolver(org.xml.sax.EntityResolver)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public void setErrorHandler(org.xml.sax.ErrorHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public void setFeature(java.lang.String, boolean)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public void setNamespacePrefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSearchSAXParser: Method 'public void setProperty(java.lang.String, java.lang.Object)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public void addPrefixMapping(java.lang.String, java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public org.xml.sax.ContentHandler getContentHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public org.xml.sax.DTDHandler getDTDHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public org.xml.sax.EntityResolver getEntityResolver()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public org.xml.sax.ErrorHandler getErrorHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public boolean getFeature(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public java.lang.String getNamespacePrefix()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public boolean getNamespacePrefixes()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public boolean getNamespaces()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public java.lang.Object getProperty(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public java.lang.String getURIFromPrefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public void parse(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public java.lang.String prefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public void setContentHandler(org.xml.sax.ContentHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public void setDTDHandler(org.xml.sax.DTDHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public void setEntityResolver(org.xml.sax.EntityResolver)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public void setErrorHandler(org.xml.sax.ErrorHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public void setFeature(java.lang.String, boolean)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public void setNamespacePrefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.FastaSequenceSAXParser: Method 'public void setProperty(java.lang.String, java.lang.Object)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public void addPrefixMapping(java.lang.String, java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public org.xml.sax.ContentHandler getContentHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public org.xml.sax.DTDHandler getDTDHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public org.xml.sax.EntityResolver getEntityResolver()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public org.xml.sax.ErrorHandler getErrorHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public boolean getFeature(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public java.lang.String getNamespacePrefix()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public boolean getNamespacePrefixes()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public boolean getNamespaces()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public java.lang.Object getProperty(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public java.lang.String getURIFromPrefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public java.lang.String prefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public void setContentHandler(org.xml.sax.ContentHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public void setDTDHandler(org.xml.sax.DTDHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public void setEntityResolver(org.xml.sax.EntityResolver)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public void setErrorHandler(org.xml.sax.ErrorHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public void setFeature(java.lang.String, boolean)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public void setNamespacePrefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.PdbSAXParser: Method 'public void setProperty(java.lang.String, java.lang.Object)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public void addPrefixMapping(java.lang.String, java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public org.xml.sax.ContentHandler getContentHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public org.xml.sax.DTDHandler getDTDHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public org.xml.sax.EntityResolver getEntityResolver()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public org.xml.sax.ErrorHandler getErrorHandler()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public boolean getFeature(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public java.lang.String getNamespacePrefix()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public boolean getNamespacePrefixes()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public boolean getNamespaces()' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public java.lang.Object getProperty(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public java.lang.String getURIFromPrefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public void parse(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public java.lang.String prefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public void setContentHandler(org.xml.sax.ContentHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public void setDTDHandler(org.xml.sax.DTDHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public void setEntityResolver(org.xml.sax.EntityResolver)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public void setErrorHandler(org.xml.sax.ErrorHandler)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public void setFeature(java.lang.String, boolean)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public void setNamespacePrefix(java.lang.String)' has been removed, but an inherited definition exists.
INFO: 7003: org.biojava.bio.program.sax.SequenceAlignmentSAXParser: Method 'public void setProperty(java.lang.String, java.lang.Object)' has been removed, but an inherited definition exists.


From holland at eaglegenomics.com  Tue Aug 25 04:32:24 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 25 Aug 2009 05:32:24 +0100
Subject: [Biojava-dev] BioJava code freeze,
	modularization and action 	items for sub modules
In-Reply-To: <59a41c430908241917r6beb5329wb862ce8913ac74d7@mail.gmail.com>
References: <59a41c430908232118k2fff9564of1a45fba447eb922@mail.gmail.com>
	<Pine.GSO.4.44.0908242114240.18799-100000@shell3.shore.net>
	<59a41c430908241917r6beb5329wb862ce8913ac74d7@mail.gmail.com>
Message-ID: <459AAD48-B5F5-4725-9142-287726BBB931@eaglegenomics.com>

>
>
> What I mean is that we should try not to disrupt things as much as is
> reasonable. I am all for a pragmatic approach. While trying to be
> conservative I guess refactoring should be discussed on a case by case
> basis. To give an example: an area where I am supporting re-factoring
> is the blast parser. The package name is confusing and we probably
> need some code changes to expose more details of the parser. Are you
> thinking of any other situations, where you think breaking backwards
> compatibility will be inevitable?

Almost all the parsers would fit this category, as would any realistic  
attempt to 'fix' the sequence model by moving bits of the APIs around  
(for instance, Sequences have Features which have Strands, but  
Locations do _not_ have Strands - which is all wrong, because Strand  
is a Location-level concept, not a Feature-level concept).

My original plan was to not even attempt to make new versions backward  
compatible, and instead to have a separate module which coerced the  
new objects into complying with the old API interface declarations (by  
using the facade model).
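
To make the two points above concrete, here is a minimal sketch, not
actual BioJava code. It assumes a new-style model in which the strand is
stored on the location, plus a facade that answers an old-style
"feature has a strand" call by delegating to that location. Every type
name below (Strand, StrandedLocation, NewFeature, LegacyFeature,
LegacyFeatureFacade) is hypothetical and chosen only for illustration.

```java
// Hypothetical sketch: none of these types are BioJava classes.

/** Strand as its own small type. */
enum Strand { POSITIVE, NEGATIVE, UNKNOWN }

/** New-style location: the strand is stored next to the coordinates. */
final class StrandedLocation {
    private final int min;
    private final int max;
    private final Strand strand;

    StrandedLocation(int min, int max, Strand strand) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        this.min = min;
        this.max = max;
        this.strand = strand;
    }

    int getMin() { return min; }
    int getMax() { return max; }
    Strand getStrand() { return strand; }
}

/** New-style feature: knows its type and location, but holds no strand of its own. */
final class NewFeature {
    private final String type;
    private final StrandedLocation location;

    NewFeature(String type, StrandedLocation location) {
        this.type = type;
        this.location = location;
    }

    String getType() { return type; }
    StrandedLocation getLocation() { return location; }
}

/** Stand-in for an old interface in which the feature itself reports a strand. */
interface LegacyFeature {
    String getType();
    int getStart();
    int getEnd();
    Strand getStrand();
}

/** Facade: exposes a NewFeature through the legacy interface by delegation. */
final class LegacyFeatureFacade implements LegacyFeature {
    private final NewFeature delegate;

    LegacyFeatureFacade(NewFeature delegate) {
        this.delegate = delegate;
    }

    @Override public String getType() { return delegate.getType(); }
    @Override public int getStart() { return delegate.getLocation().getMin(); }
    @Override public int getEnd() { return delegate.getLocation().getMax(); }
    // The legacy call is answered from the location, where the strand now lives.
    @Override public Strand getStrand() { return delegate.getLocation().getStrand(); }
}

class FacadeDemo {
    public static void main(String[] args) {
        NewFeature gene = new NewFeature("gene", new StrandedLocation(100, 250, Strand.NEGATIVE));
        LegacyFeature old = new LegacyFeatureFacade(gene);
        System.out.println(old.getType() + " " + old.getStart() + ".." + old.getEnd() + " " + old.getStrand());
    }
}
```

Old client code would keep compiling against the legacy interface, while
the new model stays free to put the strand where it belongs.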

cheers,
Richard


--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From markjschreiber at gmail.com  Tue Aug 25 06:58:40 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 25 Aug 2009 14:58:40 +0800
Subject: [Biojava-dev] BioJava code freeze,
	modularization and action 	items for sub modules
In-Reply-To: <459AAD48-B5F5-4725-9142-287726BBB931@eaglegenomics.com>
References: <59a41c430908232118k2fff9564of1a45fba447eb922@mail.gmail.com> 
	<Pine.GSO.4.44.0908242114240.18799-100000@shell3.shore.net> 
	<59a41c430908241917r6beb5329wb862ce8913ac74d7@mail.gmail.com> 
	<459AAD48-B5F5-4725-9142-287726BBB931@eaglegenomics.com>
Message-ID: <93b45ca50908242358x4181df07ye61197a2d23b6a0@mail.gmail.com>

I would agree with Richard on this. I think the changes being proposed
are not compatible with the current API. There are a couple of things
wrong with the current model (such as the Feature, Strand, Location
issues). There are also several areas where best-practices of the past
(parts of BioJava are 10 years old) are not considered best practices
now (some like Singletons are often thought of as anti-patterns these
days).

Add to that the fact that we have never been truly backwards
compatible (except maybe 1.3 and 1.3.1?) and I think we can
justifiably try and avoid the claim that BJ1.7 should be backwards
compatible.  We can continue to make older Jars available for people
who need them although most likely people who have a need for legacy
support already have the Jars that they need bundled up with their
apps. Shared libraries have very much fallen out of favor in recent
years in almost all languages and system wide classpaths are asking
for trouble.  Hard-drives are cheap so it is no big deal to have a
dedicated version of the BioJava jar bundled with each app that needs
it.

We could adopt the idea that backwards-compatible builds get
minor-version numbers, e.g. 1.1, while other builds get major-version
numbers. I guess this would mean we are at BioJava 7?
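
The numbering rule above could be sketched roughly as follows. This is
only an illustration under the stated assumption that a release can
replace an older one when the major number is equal and the minor number
is not lower. ReleaseVersion and its methods are hypothetical names, not
part of BioJava.

```java
// Hypothetical sketch of the numbering rule; ReleaseVersion is not a BioJava class.
final class ReleaseVersion {
    private final int major;
    private final int minor;

    ReleaseVersion(int major, int minor) {
        if (major < 0 || minor < 0) {
            throw new IllegalArgumentException("version components must be non-negative");
        }
        this.major = major;
        this.minor = minor;
    }

    /** Parses "major.minor"; further components (such as the ".1" in "1.3.1") are ignored. */
    static ReleaseVersion parse(String text) {
        String[] parts = text.trim().split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected major.minor, got: " + text);
        }
        return new ReleaseVersion(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
    }

    /**
     * Under the assumed rule, code written against 'older' keeps working with
     * this release only if the major number matches and the minor number has
     * not gone backwards.
     */
    boolean isDropInReplacementFor(ReleaseVersion older) {
        return major == older.major && minor >= older.minor;
    }

    @Override public String toString() { return major + "." + minor; }
}

class VersionDemo {
    public static void main(String[] args) {
        ReleaseVersion installed = ReleaseVersion.parse("1.3");
        for (String candidate : new String[] {"1.3.1", "1.7", "2.0"}) {
            ReleaseVersion v = ReleaseVersion.parse(candidate);
            System.out.println(v + " replaces " + installed + "? " + v.isDropInReplacementFor(installed));
        }
    }
}
```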

Backwards compatibility would be great to have but not if the effort
required hinders innovation.

- Mark



From jacobsen at ebi.ac.uk  Tue Aug 25 08:45:52 2009
From: jacobsen at ebi.ac.uk (Jules Jacobsen)
Date: Tue, 25 Aug 2009 09:45:52 +0100
Subject: [Biojava-dev] BioJava code freeze,
	modularization and action 	items for sub modules
In-Reply-To: <93b45ca50908242358x4181df07ye61197a2d23b6a0@mail.gmail.com>
References: <59a41c430908232118k2fff9564of1a45fba447eb922@mail.gmail.com> 
	<Pine.GSO.4.44.0908242114240.18799-100000@shell3.shore.net> 
	<59a41c430908241917r6beb5329wb862ce8913ac74d7@mail.gmail.com> 
	<459AAD48-B5F5-4725-9142-287726BBB931@eaglegenomics.com>
	<93b45ca50908242358x4181df07ye61197a2d23b6a0@mail.gmail.com>
Message-ID: <12c279870908250145waf21d9fmed256a3573a9ee1d@mail.gmail.com>

I think Mark has a good point here - there are certain aspects of
BioJava which are considered to be un-necessarily over-complicated and
these things have been deal-breakers for the people concerned - I
remember a couple of cases from the EBI where they have implemented
their own system instead of using and supporting BioJava.

Fixing areas of confusion, simplifying and moving forwards without
maintaining backwards-compatibility might be a good idea for
increasing user numbers and elevating the general perception of the
project, whilst potentially risking alienating some existing users.

I think his idea of maintaining compatibility within point releases
and stating that full version releases may not have backwards
compatibility would make it clearer for users as to what to expect
from a release. It may also help the developers stay on track with the
task and general design focus for that release by constraining them to
the current system during a point release whilst highlighting
confusing areas which can be dealt with in a more satisfactory manner
in the next full release.

 Jules


Jules Jacobsen

UniProt-PDB Integration
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge
CB10 1SD
UK


From andreas at sdsc.edu  Tue Aug 25 17:36:45 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 25 Aug 2009 10:36:45 -0700
Subject: [Biojava-dev] BioJava code freeze,
	modularization and action 	items for sub modules
In-Reply-To: <12c279870908250145waf21d9fmed256a3573a9ee1d@mail.gmail.com>
References: <59a41c430908232118k2fff9564of1a45fba447eb922@mail.gmail.com>
	<Pine.GSO.4.44.0908242114240.18799-100000@shell3.shore.net>
	<59a41c430908241917r6beb5329wb862ce8913ac74d7@mail.gmail.com>
	<459AAD48-B5F5-4725-9142-287726BBB931@eaglegenomics.com>
	<93b45ca50908242358x4181df07ye61197a2d23b6a0@mail.gmail.com>
	<12c279870908250145waf21d9fmed256a3573a9ee1d@mail.gmail.com>
Message-ID: <59a41c430908251036s616ab5f3m825d95223e758d85@mail.gmail.com>

I agree with all that has been said so far. The Sequence/Feature model
is definitely not good enough and, well, it also does not work for
protein structures. (There can be alternate positions, and the numbering
can be non-sequential and include negative positions.)
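
As a rough illustration of the numbering problem, the sketch below treats
a residue number as an opaque label (an integer that may be negative,
plus an optional insertion code of the kind PDB files use) and looks up
its position in the chain through a map instead of offset arithmetic.
ResidueNumber and ResidueChain are hypothetical names, not existing
BioJava classes, and alternate positions are not modelled here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical residue label: a number that may be negative, plus an optional insertion code. */
final class ResidueNumber {
    private final int number;
    private final char insertionCode; // ' ' means "no insertion code"

    ResidueNumber(int number, char insertionCode) {
        this.number = number;
        this.insertionCode = insertionCode;
    }

    ResidueNumber(int number) {
        this(number, ' ');
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof ResidueNumber)) {
            return false;
        }
        ResidueNumber other = (ResidueNumber) o;
        return number == other.number && insertionCode == other.insertionCode;
    }

    @Override public int hashCode() {
        return 31 * number + insertionCode;
    }

    @Override public String toString() {
        return insertionCode == ' ' ? Integer.toString(number) : number + String.valueOf(insertionCode);
    }
}

/** Hypothetical chain: residues are kept in file order, with a lookup from label to index. */
final class ResidueChain {
    private final List<ResidueNumber> order = new ArrayList<>();
    private final Map<ResidueNumber, Integer> indexByNumber = new HashMap<>();

    void append(ResidueNumber r) {
        if (indexByNumber.containsKey(r)) {
            throw new IllegalArgumentException("duplicate residue " + r);
        }
        indexByNumber.put(r, order.size());
        order.add(r);
    }

    /** Returns the zero-based position in the chain, or -1 if the label is absent. */
    int indexOf(ResidueNumber r) {
        Integer i = indexByNumber.get(r);
        return i == null ? -1 : i;
    }

    int size() {
        return order.size();
    }
}

class NumberingDemo {
    public static void main(String[] args) {
        ResidueChain chain = new ResidueChain();
        // Labels start below zero, skip 0, contain an insertion (2A) and a gap (3..9).
        chain.append(new ResidueNumber(-1));
        chain.append(new ResidueNumber(1));
        chain.append(new ResidueNumber(2));
        chain.append(new ResidueNumber(2, 'A'));
        chain.append(new ResidueNumber(10));
        System.out.println("index of 2A: " + chain.indexOf(new ResidueNumber(2, 'A')));
        System.out.println("index of 10: " + chain.indexOf(new ResidueNumber(10)));
    }
}
```

A plain integer-offset sequence model cannot express labels like these,
which is one reason a structure-aware model needs its own position type.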

Still the question is, do we need to throw away the backwards
compatibility? The new modularization will allow a plug and play
architecture and we could easily have two generations of code in
different modules. That way legacy code could depend on the older
"core" (perhaps we should find a different name) while newly written
code will be based on biojava-sequence, which would contain Richard's
new code. That way we could prepare the code for the future, while
still embracing the past.

One example that heavily uses the Sequence and Distributions APIs is
NestedMica. It is a pretty cool machine learning software and I was
hoping that we could bring that closer to biojava. (a machine learning
module in BJ would be cool, no?)

Andreas


On Tue, Aug 25, 2009 at 1:45 AM, Jules Jacobsen<jacobsen at ebi.ac.uk> wrote:
> I think Mark has a good point here - there are certain aspects of
> BioJava which are considered to be un-necessarily over-complicated and
> these things have been deal-breakers for the people concerned - I
> remember a couple of cases from the EBI where they have implemented
> their own system instead of using and supporting BioJava.
>
> Fixing areas of confusion, simplifying and moving forwards without
> maintaining backwards-compatibility might be a good idea for
> increasing user numbers and elevating the general perception of the
> project, whilst potentially risking alienating some existing users.
>
> I think his idea of maintaining compatibility within point releases
> and stating that full version releases may not have backwards
> compatibility would make it clearer for users as to what to expect
> from a release. It may also help the developers stay on track with the
> task and general design focus for that release by constraining them to
> the current system during a point release whilst highlighting
> confusing areas which can be dealt with in a more satisfactory manner
> in the next full release.
>
> Jules
>
> On Tue, Aug 25, 2009 at 7:58 AM, Mark Schreiber<markjschreiber at gmail.com> wrote:
>> I would agree with Richard on this. I think the changes being proposed
>> are not compatible with the current API. There are a couple of things
>> wrong with the current model (such as the Feature, Strand, Location
>> issues). There are also several areas where best-practices of the past
>> (parts of BioJava are 10 years old) are not considered best practices
>> now (some like Singletons are often thought of as anti-patterns these
>> days).
>>
>> Add to that the fact that we have never been truly backwards
>> compatible (except maybe 1.3 and 1.3.1?) and I think we can
>> justifiably try and avoid the claim that BJ1.7 should be backwards
>> compatible.  We can continue to make older Jars available for people
>> who need them although most likely people who have a need for legacy
>> support already have the Jars that they need bundled up with their
>> apps. Shared libraries have very much fallen out of favor in recent
>> years in almost all languages and system wide classpaths are asking
>> for trouble.  Hard-drives are cheap so it is no big deal to have a
>> dedicated version of the BioJava jar bundled with each app that needs
>> it.
>>
>> We could adopt the idea that backwards compatible builds get
>> minor-version numbers eg 1.1 while other builds get major version
>> numbers. I guess this would mean we are at BioJava 7 ?
>>
>> Backwards compatibility would be great to have but not if the effort
>> required hinders innovation.
>>
>> - Mark
>>
>> On Tue, Aug 25, 2009 at 12:32 PM, Richard Holland
>> <holland at eaglegenomics.com> wrote:
>>>>
>>>>
>>>> What I mean is that we should try not to disrupt things as much as is
>>>> reasonable. I am all for a pragmatic approach. While trying to be
>>>> conservative I guess refactoring should be discussed on a case by case
>>>> basis. To give an example: an area where I am supporting re-factoring
>>>> is the blast parser. The package name is confusing and we probably
>>>> need some code changes to expose more details of the parser. Are you
>>>> thinking of any other situations, where you think breaking backwards
>>>> compatibility will be inevitable?
>>>
>>> Almost all the parsers would fit this category, as would any realistic attempt to 'fix' the sequence model by moving bits of the APIs around (for instance, Sequences have Features which have Strands, but Locations do _not_ have Strands - which is all wrong, because Strand is a Location-level concept, not a Feature-level concept).
>>>
>>> My original plan was to not even attempt to make new versions backward compatible, and instead to have a separate module which coerced the new objects into complying with the old API interface declarations (by using the facade model).
>>>
>>> cheers,
>>> Richard
>>>
>>>> Andreas
>>>>
>>>> _______________________________________________
>>>> biojava-dev mailing list
>>>> biojava-dev at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>
>>> --
>>> Richard Holland, BSc MBCS
>>> Operations and Delivery Director, Eagle Genomics Ltd
>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>>> http://www.eaglegenomics.com/
>>>
>>> _______________________________________________
>>> biojava-dev mailing list
>>> biojava-dev at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
>
>
>
> Jules Jacobsen
>
> UniProt-PDB Integration
> EMBL-EBI
> Wellcome Trust Genome Campus
> Hinxton
> Cambridge
> CB10 1SD
> UK
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>


From cmasak at gmail.com  Wed Aug 26 13:29:24 2009
From: cmasak at gmail.com (=?ISO-8859-1?Q?Carl_M=E4sak?=)
Date: Wed, 26 Aug 2009 15:29:24 +0200
Subject: [Biojava-dev] [BUG] Infinite regress when calling
	DNATools.createDNASequence with a DNA string containing a '~' char
Message-ID: <16d769b70908260629m15512bc4tb8798d41d53fad0f@mail.gmail.com>

Hello,

Two things:

1. The BioJava wiki links to a Bugzilla instance, saying bugs should
be posted there ([1]). As I write this, that Bugzilla instance gives a
500 Internal Server Error ([2]).

[1] <http://biojava.org/wiki/BioJava:MailingLists#Bug_Reports>
[2] <http://bugzilla.open-bio.org/enter_bug.cgi?product=BioJava>

2. In the face of this, I hope you don't mind I leave my bug report
here for the time being. We're wrapping BioJava in the Bioclipse
project. We've found what appears to be a logical bug causing an
infinite regress and a stack overflow.

Let's call DNATools.createDNASequence("~", ""). The following code in
that method (org/biojava/bio/seq/DNATools.java:188) will be executed.

  public static Sequence createDNASequence(String dna, String name)
  throws IllegalSymbolException {
    //should I be calling createGappedDNASequence?
    if(dna.indexOf('-') != -1 || dna.indexOf('~') != -1){//there is a gap
        return createGappedDNASequence(dna, name);
    }

The following code in createGappedDNASequence (DNATools.java:207) will
be executed:

    /** Get a new dna as a GappedSequence */
    public static GappedSequence createGappedDNASequence(String dna, String name)
    throws IllegalSymbolException{
        String dna1 = dna.replaceAll("-", "");
        Sequence dnaSeq = createDNASequence(dna1, name);

The infinite regress is caused by these two methods calling each
other forever. createGappedDNASequence strips only '-' characters, so
a '~' survives, the guard in createDNASequence fires again, and the
recursion never bottoms out.

We experience this problem in Biojava 1.6, but the above code and line
numbers are from 1.7, where the issue remains.
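
One possible fix, as a minimal sketch: have the gapped method strip every
character that the guard in createDNASequence tests for, so the call back
cannot re-enter the gapped branch. The class and method names below are
hypothetical stand-ins, and plain Strings replace BioJava's Sequence and
GappedSequence types so that the sketch runs without the library. It only
shows the termination logic; the real method would still have to re-insert
the gaps into the GappedSequence afterwards.

```java
// Stand-in for the DNATools logic (assumption: not the actual BioJava code).
final class GapRecursionSketch {

    /** Mirrors the guard in createDNASequence: gapped input is delegated. */
    static String createSequence(String dna) {
        if (dna.indexOf('-') != -1 || dna.indexOf('~') != -1) {
            return createGappedSequence(dna);
        }
        return dna;
    }

    /** Hypothetical fix: remove both gap characters the guard checks for. */
    static String createGappedSequence(String dna) {
        String ungapped = dna.replaceAll("[-~]", "");
        // ungapped holds neither '-' nor '~', so this call cannot bounce back.
        return createSequence(ungapped);
    }

    public static void main(String[] args) {
        System.out.println("[" + createSequence("~") + "]"); // prints "[]"
        System.out.println(createSequence("ac-g~t"));        // prints "acgt"
    }
}
```

With the original replaceAll("-", "") in place of the character class, the
first call in main would recurse until the stack overflows.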

Regards,
// Carl Mäsak


From heuermh at acm.org  Thu Aug 27 17:01:31 2009
From: heuermh at acm.org (Michael Heuer)
Date: Thu, 27 Aug 2009 13:01:31 -0400 (EDT)
Subject: [Biojava-dev] BioJava code freeze,
 modularization and action items for sub modules
In-Reply-To: <59a41c430908232118k2fff9564of1a45fba447eb922@mail.gmail.com>
Message-ID: <Pine.GSO.4.44.0908271254430.17078-100000@shell3.shore.net>

Andreas Prlic wrote:

> Here a list of modules / action items and the people that I would propose to
> become module leaders:
> ...
>
> Module: biojava-sequencing Lead:  Michael Heuer
>   - support FastQ files
>   - support parsing of output for various new sequencing machines

I have volunteered on the open-bio mailing list to implement FASTQ
support.  A nice collection of test data is being created in collaboration
with the other open-bio projects.  If anyone has interest in a particular
data set, please let me know, as I will also need data for performance
tuning.

   michael


From andreas at sdsc.edu  Thu Aug 27 17:30:08 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 27 Aug 2009 10:30:08 -0700
Subject: [Biojava-dev] BioJava code freeze,
	modularization and action 	items for sub modules
In-Reply-To: <Pine.GSO.4.44.0908271254430.17078-100000@shell3.shore.net>
References: <59a41c430908232118k2fff9564of1a45fba447eb922@mail.gmail.com>
	<Pine.GSO.4.44.0908271254430.17078-100000@shell3.shore.net>
Message-ID: <59a41c430908271030p7318c468u8d145f5750369cb3@mail.gmail.com>

Great, thanks for "volunteering", Michael.

To add another Module:

biojava-das : Lead: Jonathan Warren
probably deprecate the old DAS code in BJ and replace it with
the up to date Dasobert library

Thanks to Jonathan for volunteering as well.

Andreas


On Thu, Aug 27, 2009 at 10:01 AM, Michael Heuer<heuermh at acm.org> wrote:
> Andreas Prlic wrote:
>
>> Here a list of modules / action items and the people that I would propose to
>> become module leaders:
>> ...
>>
>> Module: biojava-sequencing Lead:  Michael Heuer
>>   - support FastQ files
>>   - support parsing of output for various new sequencing machines
>
> I have volunteered on the open-bio mailing list to implement FASTQ
> support.  A nice collection of test data is being created in collaboration
> with the other open-bio projects.  If anyone has interest in a particular
> data set, please let me know, as I will also need data for performance
> tuning.
>
>   michael
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>


From markjschreiber at gmail.com  Fri Aug 28 05:37:59 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 28 Aug 2009 13:37:59 +0800
Subject: [Biojava-dev] [Biojava-l]  BioJava code freeze,
	modularization and 	action items for sub modules
In-Reply-To: <59a41c430908271030p7318c468u8d145f5750369cb3@mail.gmail.com>
References: <59a41c430908232118k2fff9564of1a45fba447eb922@mail.gmail.com> 
	<Pine.GSO.4.44.0908271254430.17078-100000@shell3.shore.net> 
	<59a41c430908271030p7318c468u8d145f5750369cb3@mail.gmail.com>
Message-ID: <93b45ca50908272237k2485a1d8le343a8b1dc10ae12@mail.gmail.com>

I'm happy to volunteer code for:


   1. BLASTXML parser as long as I can change the ssbind APIs (other parsers
   could go into a legacy module??). Actually I would prefer to completely
   decouple from the sequence/ feature module as many people would like a blast
   parser without the rest of biojava thrown in.
   2. BioSQL/ JPA bindings. I have already generated JPA compliant entity
   beans for mapping to BioSQL as well as JPA handler code that makes sure
   modifications persist properly. Currently the object model very closely
   follows the BioSQL table structure.  Also the current beans are what people
   call Anaemic beans in that they hold data and provide getters and setters
   but no biological behaviour. I can easily provide bio-smarts to the beans
   but it might be better to hold off until there is a module that contains
   sequence/feature interfaces which the beans could implement.
   3. Happy to provide code for an enterprise module if there is sufficient
   interest. This would probably take the form of SessionBeans and WebServices
   that can be deployed to Glassfish/ JBoss etc to provide biological services
   for people who want to make client server or SOA apps.
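
To illustrate the "anaemic bean" point in item 2, here is a rough sketch of
the difference between a bean that only carries data and one that gains
behaviour by implementing an interface. Everything named here is
hypothetical: SequenceLike stands in for whatever a future sequence/feature
module might define, and the bean classes are not the generated BioSQL
entities. A real JPA entity would also carry mapping annotations, which are
left out so the sketch compiles without a persistence provider.

```java
final class AnaemicBeanSketch {

    /** Hypothetical interface a future sequence/feature module might define. */
    interface SequenceLike {
        String seqString();
        double gcFraction();
    }

    /** Anaemic bean: state plus getters and setters, no biological behaviour. */
    static class BiosequenceBean {
        private String alphabet;
        private String seq;

        public String getAlphabet() { return alphabet; }
        public void setAlphabet(String alphabet) { this.alphabet = alphabet; }
        public String getSeq() { return seq; }
        public void setSeq(String seq) { this.seq = seq; }
    }

    /** Same data, with "bio-smarts" added by implementing the interface. */
    static class SmartBiosequenceBean extends BiosequenceBean implements SequenceLike {
        @Override
        public String seqString() {
            return getSeq() == null ? "" : getSeq();
        }

        @Override
        public double gcFraction() {
            String s = seqString();
            if (s.isEmpty()) {
                return 0.0;
            }
            int gc = 0;
            for (int i = 0; i < s.length(); i++) {
                char c = Character.toLowerCase(s.charAt(i));
                if (c == 'g' || c == 'c') {
                    gc++;
                }
            }
            return (double) gc / s.length();
        }
    }

    public static void main(String[] args) {
        SmartBiosequenceBean bean = new SmartBiosequenceBean();
        bean.setAlphabet("dna");
        bean.setSeq("ggca");
        System.out.println(bean.gcFraction()); // prints 0.75
    }
}
```

Holding off until the interfaces exist avoids having to rename or re-shape
methods such as gcFraction() on the beans later.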

- Mark


On Fri, Aug 28, 2009 at 1:30 AM, Andreas Prlic <andreas at sdsc.edu> wrote:

> Great, thanks for "volunteering", Michael.
>
> To add another Module:
>
> biojava-das : Lead: Jonathan Warren
> probably deprecate the old DAS code in BJ and replace it with
> the up to date Dasobert library
>
> Thanks to Jonathan for volunteering as well.
>
> Andreas
>
>
>
>
> On Thu, Aug 27, 2009 at 10:01 AM, Michael Heuer<heuermh at acm.org> wrote:
> > Andreas Prlic wrote:
> >
> >> Here a list of modules / action items and the people that I would
> propose to
> >> become module leaders:
> >> ...
> >>
> >> Module: biojava-sequencing Lead:  Michael Heuer
> >>   - support FastQ files
> >>   - support parsing of output for various new sequencing machines
> >
> > I have volunteered on the open-bio mailing list to implement FASTQ
> > support.  A nice collection of test data is being created in
> collaboration
> > with the other open-bio projects.  If anyone has interest in a particular
> > data set, please let me know, as I will also need data for performance
> > tuning.
> >
> >   michael
> >
> > _______________________________________________
> > biojava-dev mailing list
> > biojava-dev at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-dev
> >
>
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>


From andreas at sdsc.edu  Fri Aug 28 15:10:03 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 28 Aug 2009 08:10:03 -0700
Subject: [Biojava-dev] [Biojava-l]  BioJava code freeze,
	modularization and 	action items for sub modules
In-Reply-To: <93b45ca50908272237k2485a1d8le343a8b1dc10ae12@mail.gmail.com>
References: <59a41c430908232118k2fff9564of1a45fba447eb922@mail.gmail.com>
	<Pine.GSO.4.44.0908271254430.17078-100000@shell3.shore.net>
	<59a41c430908271030p7318c468u8d145f5750369cb3@mail.gmail.com>
	<93b45ca50908272237k2485a1d8le343a8b1dc10ae12@mail.gmail.com>
Message-ID: <59a41c430908280810s1720cfckbc36168f2fbc73a8@mail.gmail.com>

Thanks, Mark.

Guess we should start collecting all this info on a wiki page. I started to edit
http://biojava.org/wiki/BioJava:Modules

module leaders: feel free to edit the plans for your module...

Andreas


On Thu, Aug 27, 2009 at 10:37 PM, Mark
Schreiber<markjschreiber at gmail.com> wrote:
> I'm happy to volunteer code for:
>
> 1. BLASTXML parser as long as I can change the ssbind APIs (other parsers could
>    go into a legacy module??). Actually I would prefer to completely decouple
>    from the sequence/ feature module as many people would like a blast parser
>    without the rest of biojava thrown in.
> 2. BioSQL/ JPA bindings. I have already generated JPA compliant entity beans
>    for mapping to BioSQL as well as JPA handler code that makes sure
>    modifications persist properly. Currently the object model very closely
>    follows the BioSQL table structure.  Also the current beans are what people
>    call Anaemic beans in that they hold data and provide getters and setters
>    but no biological behaviour. I can easily provide bio-smarts to the beans
>    but it might be better to hold off until there is a module that contains
>    sequence/feature interfaces which the beans could implement.
> 3. Happy to provide code for an enterprise module if there is sufficient
>    interest. This would probably take the form of SessionBeans and WebServices
>    that can be deployed to Glassfish/ JBoss etc to provide biological services
>    for people who want to make client server or SOA apps.
>
> - Mark
>
>
> On Fri, Aug 28, 2009 at 1:30 AM, Andreas Prlic <andreas at sdsc.edu> wrote:
>>
>> Great, thanks for "volunteering", Michael.
>>
>> To add another Module:
>>
>> biojava-das : Lead: Jonathan Warren
>> probably deprecate the old DAS code in BJ and replace it with
>> the up to date Dasobert library
>>
>> Thanks to Jonathan for volunteering as well.
>>
>> Andreas
>>
>>
>>
>>
>> On Thu, Aug 27, 2009 at 10:01 AM, Michael Heuer<heuermh at acm.org> wrote:
>> > Andreas Prlic wrote:
>> >
>> >> Here a list of modules / action items and the people that I would
>> >> propose to
>> >> become module leaders:
>> >> ...
>> >>
>> >> Module: biojava-sequencing Lead:  Michael Heuer
>> >>   - support FastQ files
>> >>   - support parsing of output for various new sequencing machines
>> >
>> > I have volunteered on the open-bio mailing list to implement FASTQ
>> > support.  A nice collection of test data is being created in
>> > collaboration
>> > with the other open-bio projects.  If anyone has interest in a
>> > particular
>> > data set, please let me know, as I will also need data for performance
>> > tuning.
>> >
>> >   michael
>> >
>> > _______________________________________________
>> > biojava-dev mailing list
>> > biojava-dev at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/biojava-dev
>> >
>>
>> _______________________________________________
>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
>


From andreas at sdsc.edu  Mon Aug 31 01:23:03 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 30 Aug 2009 18:23:03 -0700
Subject: [Biojava-dev] maven progress
Message-ID: <59a41c430908301823s6e2e3d7fi6caffc47e1a8c0ff@mail.gmail.com>

Hi,

I started to split up biojava into submodules and am mavenizing the
build process. The new SVN location is emerging here:

http://dev.open-bio.org/home/svn-repositories/biojava/biojava-live/biojava

or in your browser:

http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/biojava

A few questions so far from my side.

1) bytecode.jar: at present the core module depends on this. So
far it is in the /jars subfolder of the module and needs to be
installed by hand. What is the best way to deal with this in SVN?

2) Sequence module (Richard's original biojava v.3 branch): since this
consists of sub-modules, I have set it up as a few hierarchically
organized submodules. There is some biosql code there as well.
Richard/Mark: I am not sure how to arrange this. I think it would be
good to have a biosql module. Shall I refactor the current biosql code
out of core into a new biosql module, or will the current code be
obsoleted and replaced with the new code in the sequence module?

Andreas