From heuermh at acm.org  Wed Jul  1 12:56:36 2009
From: heuermh at acm.org (Michael Heuer)
Date: Wed, 1 Jul 2009 12:56:36 -0400 (EDT)
Subject: [Biojava-dev] Singletons are bad
In-Reply-To: <93b45ca50906300133w58109024vb89c6970a8446fed@mail.gmail.com>
Message-ID: <Pine.GSO.4.44.0907011251340.2527-100000@shell3.shore.net>

Mark Schreiber wrote:

> I came across this today which is an interesting article about how
> singletons seem like a good idea but after a while you realise they get you
> into serious trouble. After playing with BioJava for over 10 years I
> completely concur. Singletons and fly-weight objects are (IMHO) the most
> serious problem in the BioJava code base and as the article predicts the BJ
> code base is completely infected with them.
>
> The article is here:
> http://tech.puredanger.com/2007/07/03/pattern-hate-singleton/
>
>
> But I have copied the paragraph below as it seems to offer a way out without
> completely breaking everything.  This should be seriously considered for
> future BJ releases.
>
> ... paste starts here
> But I already have a bunch of singletons in my code!
> ...

I've had good luck using Google Guice in several for-work projects:

> http://code.google.com/p/google-guice/

@Inject is the new new as they say.  :)
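
To make the idea concrete, here is a minimal sketch in plain Java rather than
Guice itself: instead of calling a static singleton, a class receives its
collaborator through its constructor, which is the kind of wiring that Guice's
@Inject automates. All names below (AlphabetRegistry, MapAlphabetRegistry,
SequenceParser) are hypothetical and are not BioJava or Guice API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical collaborator that might otherwise be a global singleton. */
interface AlphabetRegistry {
    String lookup(String name);
}

/** Simple in-memory implementation; a unit test can substitute its own. */
class MapAlphabetRegistry implements AlphabetRegistry {
    private final Map<String, String> symbols = new HashMap<>();

    MapAlphabetRegistry register(String name, String allowed) {
        symbols.put(name, allowed);
        return this;
    }

    @Override
    public String lookup(String name) {
        String found = symbols.get(name);
        if (found == null) {
            throw new IllegalArgumentException("Unknown alphabet: " + name);
        }
        return found;
    }
}

/** Receives its dependency via the constructor instead of a static getInstance(). */
class SequenceParser {
    private final AlphabetRegistry registry;

    SequenceParser(AlphabetRegistry registry) {
        this.registry = registry;
    }

    /** True if every character of the sequence belongs to the named alphabet. */
    boolean isValid(String alphabetName, String sequence) {
        String allowed = registry.lookup(alphabetName);
        for (char c : sequence.toCharArray()) {
            if (allowed.indexOf(c) < 0) {
                return false;
            }
        }
        return true;
    }
}
```

Because SequenceParser never touches global state, two parsers with different
registries can coexist in one JVM, which is hard to arrange with a singleton.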

   michael


From sylvain.foisy at diploide.net  Thu Jul  2 09:12:44 2009
From: sylvain.foisy at diploide.net (Sylvain Foisy)
Date: Thu, 02 Jul 2009 09:12:44 -0400
Subject: [Biojava-dev] Preliminary QBlast support in biojava-live
Message-ID: <C6722A8C.13B63%sylvain.foisy@diploide.net>

Hi all,

I just put some material into a new package (org.biojavax.bio.alignment) for
creating a remote service for alignment with its implementation for QBlast.

The philosophy for using these is this:

- Create an implementation of RemotePairwiseAlignmentService for a specific
remote service;

- Create an implementation of RemotePairwiseAlignementProperties to set
parameters for alignment;

- Use the sendAlignmentRequest() method with a sequence and the implemented
RemotePairwiseAlignementProperties to submit the sequence for alignment;

- Retrieve the results with an implementation of
RemotePairwiseAlignmentOutputProperties which specifies the format of the
output to get from the service.

This is done so that submission of sequence and retrieval of results can be
dissociated.

I think that I have addressed most of the points of a few weeks back. If
not, let me know ;-) I created a demo in the demos folder.

Best regards

Sylvain
===================================================================

 Sylvain Foisy, Ph. D.
 Consultant Bio-informatique / Bioinformatics
 Diploide.net - TI pour la vie / IT for Life

 Courriel: sylvain.foisy at diploide.net
 Web: http://www.diploide.net
 Tel: (514) 893-4363
===================================================================


From aradwen at gmail.com  Thu Jul  2 09:28:00 2009
From: aradwen at gmail.com (Radwen ANIBA)
Date: Thu, 2 Jul 2009 15:28:00 +0200
Subject: [Biojava-dev] Parsing Interpro results
Message-ID: <e591b1bd0907020628m7c2e75aboa841b75f88b2dc83@mail.gmail.com>

Hello everyone,

I looked through the BioJava documentation and searched the internet, but I
did not find how to parse InterProScan results (XML as well as tabular formats).
It would not be hard to code this in Java, but I wanted to know whether it
already exists.

Regards
Rad

From hunter at ebi.ac.uk  Thu Jul  2 10:56:28 2009
From: hunter at ebi.ac.uk (Sarah Hunter)
Date: Thu, 02 Jul 2009 15:56:28 +0100
Subject: [Biojava-dev] Parsing Interpro results
In-Reply-To: <12c279870907020749s1f6b9890ub7090b89c473340c@mail.gmail.com>
References: <e591b1bd0907020628m7c2e75aboa841b75f88b2dc83@mail.gmail.com>
	<12c279870907020749s1f6b9890ub7090b89c473340c@mail.gmail.com>
Message-ID: <4A4CCA9C.1080404@ebi.ac.uk>

Hi Radwen (and the rest of the biojava guys),

As far as I am aware, there isn't a biojava parser for InterProScan results.

However, we are undergoing a complete re-write of InterPro and InterProScan at the moment and it is 
our intention to provide a java API for accessing all of our data.  If you wish to be involved in 
testing this API, please contact the InterPro team via the EBI's support pages 
(http://www.ebi.ac.uk/support/)

Many thanks for your interest.

Sarah Hunter

---
  Sarah Hunter

  InterPro Team Leader
  European Bioinformatics Institute
  Wellcome Trust Genome Campus
  Hinxton
  Cambridge
  CB10 1SD, UK


From fbristow at gmail.com  Tue Jul  7 09:34:09 2009
From: fbristow at gmail.com (Franklin Bristow)
Date: Tue, 7 Jul 2009 08:34:09 -0500
Subject: [Biojava-dev] Extended ABIF Parser and SCF Writer
Message-ID: <50a7756d0907070634s31051bc0t3410ea2e33fa0686@mail.gmail.com>

Hi everyone,
Now that you're all back from ISMB (I hope you all had a good time!) I
thought it would be a good time to bring this up.

A while back I wrote to the list about an ABIF parser and SCF writer that I
had written.  I got some pointers on things to change and I've since made
the suggested changes.  Now I was wondering how I should go about getting
these files into BioJava....

-- 
Franklin

From andreas at sdsc.edu  Tue Jul  7 12:51:05 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 7 Jul 2009 09:51:05 -0700
Subject: [Biojava-dev] Extended ABIF Parser and SCF Writer
In-Reply-To: <50a7756d0907070634s31051bc0t3410ea2e33fa0686@mail.gmail.com>
References: <50a7756d0907070634s31051bc0t3410ea2e33fa0686@mail.gmail.com>
Message-ID: <59a41c430907070951r16e22bf3h7e83b16012cb2700@mail.gmail.com>

Hi Franklin,

The theme of the moment is modularization... I wonder if we should make a
module for parsing the output of sequencers...

This topic is also a bit related to the discussion we had around BOSC last
week, how to contribute modules, and what is the role of a module
maintainer. I will send out a more detailed summary on that a bit later.

Andreas



From gmicha at gmail.com  Tue Jul  7 13:08:07 2009
From: gmicha at gmail.com (Micha Sammeth)
Date: Tue, 07 Jul 2009 19:08:07 +0200
Subject: [Biojava-dev] Extended ABIF Parser and SCF Writer
In-Reply-To: <59a41c430907070951r16e22bf3h7e83b16012cb2700@mail.gmail.com>
References: <50a7756d0907070634s31051bc0t3410ea2e33fa0686@mail.gmail.com>
	<59a41c430907070951r16e22bf3h7e83b16012cb2700@mail.gmail.com>
Message-ID: <4A5380F7.6090602@gmail.com>

Hi,

It has been a while since I worked on SCF and ABI files, but let's see if I
can contribute something here. Having worked on NGS for a year, I could
imagine that readers for the standard output of pipelines by Illumina &
Co would also fit there.

If there is a reader module, I would argue for a low-level interface for
accessing sequences/qualities and re-usable data containers during I/O.
But maybe it is too early to talk about that when not even the existence
of the module has been decided.

cheers - micha.



From fbristow at gmail.com  Tue Jul  7 13:29:56 2009
From: fbristow at gmail.com (Franklin Bristow)
Date: Tue, 7 Jul 2009 12:29:56 -0500
Subject: [Biojava-dev] Extended ABIF Parser and SCF Writer
In-Reply-To: <4A5380F7.6090602@gmail.com>
References: <50a7756d0907070634s31051bc0t3410ea2e33fa0686@mail.gmail.com>
	<59a41c430907070951r16e22bf3h7e83b16012cb2700@mail.gmail.com>
	<4A5380F7.6090602@gmail.com>
Message-ID: <50a7756d0907071029s45ee0983y85f2ee307765e65c@mail.gmail.com>

Hello,
I think I like the idea of having a module for the I/O of sequencers in
general.  I really only have familiarity with ABI sequencers (i.e. 31xx and
37xx) and the data that they spit out, so I would be able to offer some help
there.  Needless to say, the documentation that ABI released regarding their
binary format was much appreciated when I was going through the code.

To Micha:  when you talk about 'during I/O', do you mean having some kind of
an event-based parser?  When I wrote my extended ABIF parser I modelled it
after the Perl module Bio::Trace::ABIF, so there are accessors for many of
the tags that are defined in the ABI spec.



-- 
Franklin

From sreekanth.m at ocimumbio.com  Wed Jul  8 01:51:15 2009
From: sreekanth.m at ocimumbio.com (Sreekanth Mogullapally)
Date: Wed, 8 Jul 2009 11:21:15 +0530
Subject: [Biojava-dev] Reg: Source of BioJava
Message-ID: <2DDF09AFEB46E54894A3843CEF9CB3A4409F7CE20F@EXCHMB.ocimumbio.com>

Dear all,

I have just started using BioJava for next-generation sequencing work.
I have all the jar files needed to work with BioJava, and I have found many things that are very useful to me.
I need the source jar file of BioJava. If anybody has it, please send it to me.

Thanks in advance.

Thanks & Regards,
Sreekanth.M


From andreas at sdsc.edu  Wed Jul  8 02:20:59 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 7 Jul 2009 23:20:59 -0700
Subject: [Biojava-dev] summary biojava user meeting
Message-ID: <59a41c430907072320k3d5a4415u962d59a10d286beb@mail.gmail.com>

Hi,

Here is a quick summary of the BioJava user meeting we had last week at the
BOSC conference:

The following people were present:

Mattias Piipari
Martijn Devisscher
Frederik Decouttere
Richard Holland
Andreas Prlic

The new modularized code base will allow individual people to take over
responsibility for some of the sub-modules, as well as to contribute new
modules; I welcome both greatly. It was therefore great to have
Mattias, Martijn and Frederik there expressing their interest in this.

Mattias is interested in contributing a new module related to machine
learning. Martijn and Frederik are interested in providing a new GUI module
(seqpad). Because of this, our discussions were mainly about how to organize
the contribution of new modules and their maintenance:

* Before starting a new module the code should undergo public code review
* New modules need documentation (wiki cookbook) and JUnit tests.
* A Module Maintainer (MM) is the main person responsible for everything
related to the module.
* MM coordinates patches and other user contributions for the module
* MM can write papers related to the code in the module without having to
cite all of the other BioJava contributors.
* A MM volunteers to support the module for (at least) a year.
* All MMs will be listed by name on a wiki page in order to clarify
responsibilities

Andreas

From holland at eaglegenomics.com  Wed Jul  8 12:41:54 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 08 Jul 2009 18:41:54 +0200
Subject: [Biojava-dev] Reg: Source of BioJava
In-Reply-To: <2DDF09AFEB46E54894A3843CEF9CB3A4409F7CE20F@EXCHMB.ocimumbio.com>
References: <2DDF09AFEB46E54894A3843CEF9CB3A4409F7CE20F@EXCHMB.ocimumbio.com>
Message-ID: <1247071314.3792.9.camel@buzzybee>

The source code can be obtained by following these instructions:

http://biojava.org/wiki/CVS_to_SVN_Migration

Richard.

-- 
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From florian.mittag at uni-tuebingen.de  Thu Jul  9 11:16:12 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Thu, 9 Jul 2009 17:16:12 +0200
Subject: [Biojava-dev] Problems in DB2 with VARCHAR,
	TEXT and CLOB using BioJava
Message-ID: <200907091716.13639.florian.mittag@uni-tuebingen.de>

Hi all!

I'm posting this to both the BioSQL and the BioJava-dev mailing lists because
the problem spans both domains. I hope this is okay.

We're working on getting BioJava to run with a DB2 Express-C backend for 
various reasons. We've encountered several problems during this task, but 
this one seems to have no real solution.

When adapting the BioSQL schema to DB2, the official IBM conversion guide 
tells us to use the data type CLOB where MySQL uses TEXT.

(Chapter 11 in
ftp://ftp.software.ibm.com/software/data/db2/migration/mtk/mtk_2050.pdf)

So far, no problem. But when we tried reading some GenBank files with
BioJava, the DB2 driver threw an exception:

SQL0401N  The data types of the operands for the operation "=" are not 
compatible.  SQLSTATE=42818 SQLCODE=-401

Explanation:
The class org.biojavax.bio.db.biosql.BioSQLRichObjectBuilder defines some 
Hibernate queries, of which one has the conditions:

"from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?"

All three columns "authors", "location", and "title" are of type TEXT in MySQL 
and of type CLOB in DB2, so comparing them with "=" leads to the above error 
message.


The way I see it, there are only two possible solutions to this problem:
1) Change the query to
"from DocRef as cr where cr.authors LIKE ? and cr.location LIKE ? and
cr.title LIKE ?"

2) Change the data type to something comparable with "=", like VARCHAR.

Solution 1 is no real solution to me, because comparing values with "LIKE" 
usually is slow and it seems a bit odd to change a query that works with 
other databases just for DB2.

But taking a closer look, solution 2 has some problems, too:
Although VARCHARs in DB2 can have a length of theoretically 32767, in reality 
they are limited by the page size of the database, which can be 32K at 
maximum. Since this particular table "reference" has three columns of this 
type, the sum of their lengths must not exceed 32767, so they could only be 
something like VARCHAR(10000).

I have never encountered cases in which values come even close to the length 
of 10000, but you can never be sure.


And that is why I post here. For me, the way to go is pretty clear, but we 
intend to be as compatible as possible with the original BioSQL. Maybe you 
could give me some input on how to solve this problem with as few casualties 
as possible ;-)


Thanks,
Florian

From aradwen at gmail.com  Fri Jul 10 10:45:35 2009
From: aradwen at gmail.com (Radwen ANIBA)
Date: Fri, 10 Jul 2009 16:45:35 +0200
Subject: [Biojava-dev] ExternalProcess class
Message-ID: <e591b1bd0907100745k7ea36398j2d8ed5b0084f27c2@mail.gmail.com>

Hi everyone,

Does anybody have a usage example of the ExternalProcess class (like the
examples in the BioJava cookbook)?

Let's say we want to run a local clustalw program with it; is that possible?

Any example code?

Thank you

Radwen

From sylvain.foisy at diploide.net  Fri Jul 10 12:43:09 2009
From: sylvain.foisy at diploide.net (Sylvain Foisy)
Date: Fri, 10 Jul 2009 12:43:09 -0400
Subject: [Biojava-dev] ExternalProcess class
In-Reply-To: <mailman.25.1247241605.31128.biojava-dev@lists.open-bio.org>
Message-ID: <C67CE7DD.13D14%sylvain.foisy@diploide.net>

Hi,

There is no such example to the best of my knowledge. If you do use this
class, you are welcome to share your experience by contributing. As for
running something like clustalw, I don't see why you could not make use of
this class. You could actually build a wrapper class to execute clustalw and
do something with its output.
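
As a rough illustration of such a wrapper, here is a sketch built on the
standard java.lang.ProcessBuilder rather than on BioJava's ExternalProcess
(whose exact API is not shown in this thread). The class name ClustalWRunner
is hypothetical, and the command-line flags are assumptions that should be
checked against the locally installed clustalw.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical wrapper around a local clustalw binary; not part of BioJava. */
class ClustalWRunner {
    private final String executable;

    ClustalWRunner(String executable) {
        this.executable = executable;
    }

    /** Builds the command line. The flags are assumptions; verify them locally. */
    List<String> buildCommand(String inputFasta, String outputAln) {
        List<String> cmd = new ArrayList<>();
        cmd.add(executable);
        cmd.add("-INFILE=" + inputFasta);
        cmd.add("-OUTFILE=" + outputAln);
        cmd.add("-ALIGN");
        return cmd;
    }

    /** Runs the command, collects merged stdout/stderr, fails on non-zero exit. */
    String run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // read one stream so the child cannot block on stderr
        Process process = builder.start();
        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command exited with code " + exitCode + ": " + output);
        }
        return output.toString();
    }
}
```

Usage would look like runner.run(runner.buildCommand("in.fasta", "out.aln")),
after which the alignment file can be parsed separately.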

Best regards

Sylvain


===================================================================

 Sylvain Foisy, Ph. D.
 Consultant Bio-informatique / Bioinformatics
 Diploide.net - TI pour la vie / IT for Life

 Courriel: sylvain.foisy at diploide.net
 Web: http://www.diploide.net
 Tel: (514) 893-4363
===================================================================


From hlapp at gmx.net  Sat Jul 11 07:47:34 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 11 Jul 2009 07:47:34 -0400
Subject: [Biojava-dev] [BioSQL-l] Problems in DB2 with VARCHAR,
	TEXT and CLOB using BioJava
In-Reply-To: <200907091716.13639.florian.mittag@uni-tuebingen.de>
References: <200907091716.13639.florian.mittag@uni-tuebingen.de>
Message-ID: <5614AEDA-3406-4844-8690-7653A2C4297C@gmx.net>

Hi Florian:

On Jul 9, 2009, at 11:16 AM, Florian Mittag wrote:

> [...]
> 2) Change the data type to something comparable with "=", like  
> VARCHAR.

That's the way to go. The reason they are not VARCHAR in MySQL is
because VARCHAR was limited to 255 characters there (before MySQL 5.0.3).

> [...]
> Although VARCHARs in DB2 can have a length of theoretically 32767,  
> in reality
> they are limited by the page size of the database, which can be 32K at
> maximum. Since this particular table "reference" has three columns  
> of this
> type, the sum of their lengths must not exceed 32767, so they could  
> only be
> something like VARCHAR(10000).

That sounds great though. You may have noticed that the columns are  
all of type VARCHAR in the Oracle version of the schema with the  
following widths:

        Title                VARCHAR2(1000)
        Authors              VARCHAR2(4000)
        Location             VARCHAR2(512)

That has always served me well. Feel free to use larger widths though  
if you think you need them.
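
For what it is worth, the arithmetic can be checked quickly. This is only a
sketch: the limit of 32767 is the figure quoted earlier in this thread (not
verified against the DB2 documentation), widths are treated as bytes, and the
class name ReferenceWidths is hypothetical.

```java
import java.util.Map;

/** Hypothetical helper for checking combined VARCHAR widths against a row limit. */
class ReferenceWidths {
    /** Combined-length limit quoted in this thread for a 32K page size. */
    static final int LIMIT = 32767;

    /** Sums the declared widths of all columns. */
    static int total(Map<String, Integer> widths) {
        int sum = 0;
        for (int width : widths.values()) {
            sum += width;
        }
        return sum;
    }

    /** True if the combined widths stay within the limit. */
    static boolean fits(Map<String, Integer> widths) {
        return total(widths) <= LIMIT;
    }
}
```

With the Oracle widths above (1000 + 4000 + 512) the total is 5512, leaving
27255 of headroom, and three columns of VARCHAR(10000) would also fit.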

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From HWillis at scripps.edu  Sat Jul 11 07:08:32 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Sat, 11 Jul 2009 07:08:32 -0400
Subject: [Biojava-dev] ExternalProcess class
In-Reply-To: <C67CE7DD.13D14%sylvain.foisy@diploide.net>
References: <mailman.25.1247241605.31128.biojava-dev@lists.open-bio.org>,
	<C67CE7DD.13D14%sylvain.foisy@diploide.net>
Message-ID: <FF46E20B7EF40940B038819E2CFBA8522B0032E4FB@CCRMB1.fl.ad.scripps.edu>

Biojava already has a Clustalw class that executes Clustalw as an external process  if that is your larger goal.

http://www.biojava.org/wiki/BioJava:Tutorial:MultiAlignClustalW

Thanks

Scooter


From paolo.pavan at gmail.com  Sun Jul 12 17:41:05 2009
From: paolo.pavan at gmail.com (Paolo Pavan)
Date: Sun, 12 Jul 2009 23:41:05 +0200
Subject: [Biojava-dev] Fwd: Assembly data reading
In-Reply-To: <56be91b60907090858t41f2c72cwf7db057e6390d6db@mail.gmail.com>
References: <56be91b60907090858t41f2c72cwf7db057e6390d6db@mail.gmail.com>
Message-ID: <56be91b60907121441h7602d917m496b675d0fa9fd68@mail.gmail.com>

Hi,
I would like to repost, with some adjustments, a question I asked some
time ago, because this may be the more appropriate list; apologies for the
repetition.
Can someone kindly give me some advice?

thank you in advance,
Paolo


---------- Forwarded message ----------
From: Paolo Pavan <paolo.pavan at gmail.com>
Date: 2009/7/9
Subject: Assembly data reading
To: Biojava-l at lists.open-bio.org


Hi everybody,
I'm fairly new to this topic. I would like to know if there is
something that can help me load data from a large 454 contig into my
Java program. I also need to keep in memory, and access, the data from
the single reads forming the contig.
I suppose this information is in a *.sff file; if it is not
possible to load such a file, it would be fine to load a *.ace (phrap)
data file, which I also have.
Many thanks for any suggestions you can give me!

Greetings,
Paolo

From holland at eaglegenomics.com  Mon Jul 13 01:20:35 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 13 Jul 2009 06:20:35 +0100
Subject: [Biojava-dev] Fwd: Assembly data reading
In-Reply-To: <56be91b60907121441h7602d917m496b675d0fa9fd68@mail.gmail.com>
References: <56be91b60907090858t41f2c72cwf7db057e6390d6db@mail.gmail.com>
	<56be91b60907121441h7602d917m496b675d0fa9fd68@mail.gmail.com>
Message-ID: <1247462435.25217.7.camel@buzzybee>

Nothing within BJ can parse the 454 .sff files directly. However I think
there is a growing need for it so if anyone is willing to contribute
code, it would be very welcome.

There is also no .ace parser, although in 2007 someone volunteered to
write one but nothing happened, and there was a previous post (many
years ago!) from someone else who already had some working code but
again nothing seems to have happened: 

http://portal.open-bio.org/pipermail/biojava-l/2001-June/001283.html
http://lists.open-bio.org/pipermail/biojava-l/2007-July/005900.html

So to start with, someone (perhaps yourself? that would be nice! :) )
needs to volunteer to write either a .ace or .sff parser, or both. 

The thing to bear in mind with 454 contigs as you rightly point out is
the sheer size of the things. The requirement to keep them entirely in
memory is likely to be unworkable as it would leave little room for
anything else to run on your average machine. I would suggest either
memory-mapping the file itself, or parsing and writing out a
memory-mapped summary file containing the bits of data you're interested
in. (Memory-mapping is where you keep an index in memory indicating
where in the file each record is, so that when you need to access them
you load them on-the-fly from the file and drop them out of memory again
immediately after use. An accelerated form of this is to put the loaded
records into some kind of LRU cache which holds only the most recently
accessed records and then check that cache first to see if you've
already loaded the record before accessing the file directly.)
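
The index-plus-cache approach described above could be sketched roughly as
follows. This is not a parser for .sff or .ace: it assumes a hypothetical
plain-text file with one ASCII record per line, and the class name
IndexedRecordReader is made up for illustration. The LRU cache is an
access-ordered java.util.LinkedHashMap.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reader: one ASCII record per line, indexed by byte offset. */
class IndexedRecordReader implements AutoCloseable {
    private final RandomAccessFile file;
    private final List<Long> offsets = new ArrayList<>();
    private final Map<Integer, String> cache;
    private int diskReads = 0;

    IndexedRecordReader(String path, final int cacheSize) throws IOException {
        this.file = new RandomAccessFile(path, "r");
        // Access-ordered map: the eldest entry is the least recently used one.
        this.cache = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > cacheSize;
            }
        };
        // One pass over the file to remember where each record starts.
        long length = file.length();
        long position = 0;
        while (position < length) {
            offsets.add(position);
            file.readLine();
            position = file.getFilePointer();
        }
    }

    int size() {
        return offsets.size();
    }

    /** Number of records actually read from the file (cache misses). */
    int diskReads() {
        return diskReads;
    }

    /** Returns the record, from the cache if present, otherwise from the file. */
    String get(int index) throws IOException {
        String cached = cache.get(index);
        if (cached != null) {
            return cached;
        }
        file.seek(offsets.get(index));
        String record = file.readLine();
        diskReads++;
        cache.put(index, record);
        return record;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Only the offsets and at most cacheSize records stay in memory; everything else
is reloaded from the file on demand.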

cheers,
Richard


-- 
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From markjschreiber at gmail.com  Mon Jul 13 02:29:56 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Mon, 13 Jul 2009 14:29:56 +0800
Subject: [Biojava-dev] Fwd: Assembly data reading
In-Reply-To: <1247462435.25217.7.camel@buzzybee>
References: <56be91b60907090858t41f2c72cwf7db057e6390d6db@mail.gmail.com> 
	<56be91b60907121441h7602d917m496b675d0fa9fd68@mail.gmail.com> 
	<1247462435.25217.7.camel@buzzybee>
Message-ID: <93b45ca50907122329g65cef108h112399da01d6b322@mail.gmail.com>

I would agree that there is a strong need for this kind of thing in biojava.

As Richard says you probably can't fit it in memory so you may want to
memory map it. There are classes in the java.nio package that can help a
lot with this.
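
A minimal sketch of that, assuming you already know the byte offset and length
of the region you want: FileChannel.map returns a MappedByteBuffer covering
only that region, so the rest of the file is never loaded. The class name
MappedSlice is hypothetical, and the data is assumed to be ASCII.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper that reads one region of a file through a memory mapping. */
class MappedSlice {
    /** Maps length bytes starting at offset (read-only) and decodes them as ASCII. */
    static String read(Path path, long offset, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
            byte[] bytes = new byte[length];
            buffer.get(bytes); // copies the mapped region into the array
            return new String(bytes, StandardCharsets.US_ASCII);
        }
    }
}
```

The requested region has to lie within the file; for very large files the
mapping would normally be created once and reused rather than per call.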

Also I have had some success with in-memory compression of large files using
LZ compression. Essentially the memory representation of the file is LZ
compressed and compression and decompression are handled on the fly. Again
there are Java utility classes that can help.
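
One way to do this with the standard library is java.util.zip, whose Deflater
and Inflater implement DEFLATE (LZ77 plus Huffman coding). The sketch below
keeps a string compressed in memory and inflates it only when asked; the class
name CompressedText is hypothetical and this is not necessarily the exact
scheme I used.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical holder that keeps text DEFLATE-compressed in memory. */
class CompressedText {
    private final byte[] compressed;
    private final int originalLength;

    CompressedText(String text) {
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        originalLength = raw.length;
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int written = deflater.deflate(chunk);
            out.write(chunk, 0, written);
        }
        deflater.end(); // release native resources
        compressed = out.toByteArray();
    }

    /** Number of bytes held in memory for the compressed form. */
    int compressedSize() {
        return compressed.length;
    }

    /** Decompresses on the fly and returns the original text. */
    String text() throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] raw = new byte[originalLength];
        int filled = 0;
        while (filled < originalLength && !inflater.finished()) {
            filled += inflater.inflate(raw, filled, originalLength - filled);
        }
        inflater.end();
        return new String(raw, 0, filled, StandardCharsets.UTF_8);
    }
}
```

Repetitive sequence data tends to compress well, at the cost of CPU time on
every access, so this trades speed for memory.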

- Mark


From sreekanth.m at ocimumbio.com  Mon Jul 13 07:28:52 2009
From: sreekanth.m at ocimumbio.com (Sreekanth Mogullapally)
Date: Mon, 13 Jul 2009 16:58:52 +0530
Subject: [Biojava-dev] Reg: Quality values of ABI File
Message-ID: <2DDF09AFEB46E54894A3843CEF9CB3A4409F82CA8E@EXCHMB.ocimumbio.com>

Hi Everybody,

I am working with "ABI" Files. I am able to get the pixel values for Chromatogram viewer from it,
but I need the Quality values for each base.
Please help me in this regard.

Thanks in Advance
Sreekanth.M


From fbristow at gmail.com  Mon Jul 13 09:15:40 2009
From: fbristow at gmail.com (Franklin Bristow)
Date: Mon, 13 Jul 2009 08:15:40 -0500
Subject: [Biojava-dev] Reg: Quality values of ABI File
In-Reply-To: <2DDF09AFEB46E54894A3843CEF9CB3A4409F82CA8E@EXCHMB.ocimumbio.com>
References: <2DDF09AFEB46E54894A3843CEF9CB3A4409F82CA8E@EXCHMB.ocimumbio.com>
Message-ID: <50a7756d0907130615i3d7cca5ar941f5d2d09657e5@mail.gmail.com>

Hi Sreekanth,
The quality values are stored under the PCON 1 and 2 tags.  The information
you need is in the offsetData field of the TaggedDataRecord.  You can treat
this byte array as an array of shorts containing the quality values for each
base.
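
A minimal sketch of that decoding step, using only the JDK. The class
and method names are hypothetical (they are not part of BioJava), and
the big-endian byte order is an assumption about the ABIF layout. Check
the element size recorded for the tag before choosing between the
two-byte and the one-byte reading.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper: turns the raw bytes of a PCON record into quality values. */
final class QualityDecoder {
    private QualityDecoder() {}

    /** Reads the bytes as 16-bit values, two bytes per base (big-endian is assumed). */
    static short[] asShorts(byte[] raw) {
        if (raw.length % 2 != 0) {
            throw new IllegalArgumentException("Odd byte count: " + raw.length);
        }
        short[] out = new short[raw.length / 2];
        ByteBuffer.wrap(raw).order(ByteOrder.BIG_ENDIAN).asShortBuffer().get(out);
        return out;
    }

    /** Alternative reading: one unsigned byte per base. */
    static int[] asUnsignedBytes(byte[] raw) {
        int[] out = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = raw[i] & 0xFF; // mask so values above 127 stay positive
        }
        return out;
    }
}
```

The byte array passed in would be the offsetData field mentioned above.
For example, asShorts(new byte[] {0, 20, 0, 45}) gives the two values
20 and 45.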

Take a look at this PDF for more information about the different tags
available to you:
http://www.appliedbiosystems.com/support/software_community/ABIF_File_Format.pdf

-- 
Franklin

On Mon, Jul 13, 2009 at 6:28 AM, Sreekanth Mogullapally <
sreekanth.m at ocimumbio.com> wrote:

> Hi Everybody,
>
> I am working with "ABI" Files. I am able to get the pixel values for
> Chromatogram viewer from it,
> but I need the Quality values for each base.
> Please help me in this regard.
>
> Thanks in Advance
> Sreekanth.M
>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

From sylvain.foisy at diploide.net  Mon Jul 13 09:16:34 2009
From: sylvain.foisy at diploide.net (Sylvain Foisy)
Date: Mon, 13 Jul 2009 09:16:34 -0400
Subject: [Biojava-dev] ExternalProcess class
In-Reply-To: <FF46E20B7EF40940B038819E2CFBA8522B0032E4FB@CCRMB1.fl.ad.scripps.edu>
Message-ID: <C680ABF2.13D3D%sylvain.foisy@diploide.net>

Hi Scooter,

Actually, this is not in BJ 1.7. This material was created by Dickson
Guedes but never formalized into a class for BJ. Maybe now would be the
time to do so?

Best regards

Sylvain


On 11/07/09 07:08, "[NAME]" <[ADDRESS]> wrote:

> Biojava already has a Clustalw class that executes Clustalw as an external
> process  if that is your larger goal.
> 
> http://www.biojava.org/wiki/BioJava:Tutorial:MultiAlignClustalW
> 
> Thanks
> 
> Scooter
> ________________________________________
> From: biojava-dev-bounces at lists.open-bio.org
> [biojava-dev-bounces at lists.open-bio.org] On Behalf Of Sylvain Foisy
> [sylvain.foisy at diploide.net]
> Sent: Friday, July 10, 2009 12:43 PM
> To: biojava-dev at lists.open-bio.org
> Subject: Re: [Biojava-dev] ExternalProcess class
> 
> Hi,
> 
> There is no such example to the best of my knowledge. If you do use this
> class, you are welcome to share your experience by contributing. As far as
> running something like clustalw, I don't see why you could not make use of
> this class. You could actually build a wrapper class to execute clustalw and
> do something with its output.
> 
> Best regards
> 
> Sylvain
> 
> On 10/07/09 12:00, "[NAME]" <[ADDRESS]> wrote:
> 
>> Hi everyone,
>> 
>> Do somebody have (as examples of biojava cookbook) a usage example of
>> ExternalProcess class ?
>> 
>> Let's say we want to run a local clustalw program with it, is it possible ?
>> 
>> Any example code ?
>> 
>> Thank you
> 
> 
> ===================================================================
> 
>  Sylvain Foisy, Ph. D.
>  Consultant Bio-informatique / Bioinformatics
>  Diploide.net - TI pour la vie / IT for Life
> 
>  Courriel: sylvain.foisy at diploide.net
>  Web: http://www.diploide.net
>  Tel: (514) 893-4363
> ===================================================================
> 
> 
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev


From markjschreiber at gmail.com  Mon Jul 13 09:21:42 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Mon, 13 Jul 2009 21:21:42 +0800
Subject: [Biojava-dev] Reg: Quality values of ABI File
In-Reply-To: <2DDF09AFEB46E54894A3843CEF9CB3A4409F82CA8E@EXCHMB.ocimumbio.com>
References: <2DDF09AFEB46E54894A3843CEF9CB3A4409F82CA8E@EXCHMB.ocimumbio.com>
Message-ID: <93b45ca50907130621m6ebd37bn72aad9fafe195db7@mail.gmail.com>

Hi -
You would usually use a program like Phred/Phrap for this. There is a
BioJava package for reading and processing the Phred output.

- Mark

On Mon, Jul 13, 2009 at 7:28 PM, Sreekanth Mogullapally <
sreekanth.m at ocimumbio.com> wrote:

> Hi Everybody,
>
> I am working with "ABI" Files. I am able to get the pixel values for
> Chromatogram viewer from it,
> but I need the Quality values for each base.
> Please help me in this regard.
>
> Thanks in Advance
> Sreekanth.M
>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

From sreekanth.m at ocimumbio.com  Mon Jul 13 10:49:26 2009
From: sreekanth.m at ocimumbio.com (Sreekanth Mogullapally)
Date: Mon, 13 Jul 2009 20:19:26 +0530
Subject: [Biojava-dev] Reg: Quality values of ABI File
In-Reply-To: <50a7756d0907130615i3d7cca5ar941f5d2d09657e5@mail.gmail.com>
References: <2DDF09AFEB46E54894A3843CEF9CB3A4409F82CA8E@EXCHMB.ocimumbio.com>
	<50a7756d0907130615i3d7cca5ar941f5d2d09657e5@mail.gmail.com>
Message-ID: <2DDF09AFEB46E54894A3843CEF9CB3A4409F82CBC6@EXCHMB.ocimumbio.com>

Dear Franklin,

Thank you very much for your quick response.
I am now able to get the Quality values, which meets my requirement.


Thanks & Regards,
Sreekanth.M


From holland at eaglegenomics.com  Mon Jul 13 13:14:13 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 13 Jul 2009 18:14:13 +0100
Subject: [Biojava-dev] Hackathon
Message-ID: <1247505253.27493.15.camel@buzzybee>

Hi all.

Andreas and I would like to organise a hackathon to get the
modularisation and general improvement plans for BJ3 into action, and
bring the project forward into the 21st century (only 10 years late!).

At this time I'm trying to gather interest and gauge who might
realistically be able to attend. We will attempt to site the hackathon
at a location closest to the majority of attendees.

To help me plan numbers and likely costs (for potential sponsors) could
all those who are interested please answer the following questions for
me:

 1. Name,
 2. Specialist interest within Biojava (e.g. proteomics, microarrays,
sequencing, etc.),
 3. Your physical location (country and nearest major city - e.g.
Cambridge, London, Newcastle, San Diego, Singapore, etc.),
 4. Whether you think your employer would help pay your airfare and/or 1
week in a hotel to attend (and how far you think you could go on such
funding),
 5. Approximate availability for the next 12 months.

To get the ball rolling, here's me:

 1. Richard Holland,
 2. Making the whole thing more consistent, efficient, and easier to
use,
 3. Southampton (UK),
 4. Possibly but probably only within UK/Europe,
 5. Only available in mid-Jan 2010, otherwise can't do anything until
mid-March 2010 onwards.

Looking forward to hearing your comments! Once I have a good idea of
numbers and distribution, I can get some costs together to give you (and
any potential sponsors) the best idea of what might be involved.

cheers,
Richard

-- 
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From HWillis at scripps.edu  Mon Jul 13 16:12:16 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Mon, 13 Jul 2009 16:12:16 -0400
Subject: [Biojava-dev] Hackathon
In-Reply-To: <1247505253.27493.15.camel@buzzybee>
Message-ID: <C6810D60.1652%HWillis@scripps.edu>

Richard

I would be up for a week away as a "vacation" to do some Java programming. So I am flexible on all elements of time and ability to travel. Bonus points for going some place where the weather is reasonable for the location since it appears we have a global option (January in the UK not my first choice/January in Colorado better choice). Probably wouldn't be a bad idea to try and bookend/overlap a bioinformatics related conference to help justify travel costs for those who need support from work.

We should also consider online options for those who can't travel but can allocate the time.

Thanks

Scooter


From paolo.pavan at gmail.com  Tue Jul 14 12:08:11 2009
From: paolo.pavan at gmail.com (Paolo Pavan)
Date: Tue, 14 Jul 2009 18:08:11 +0200
Subject: [Biojava-dev] Fwd: Assembly data reading
In-Reply-To: <93b45ca50907122329g65cef108h112399da01d6b322@mail.gmail.com>
References: <56be91b60907090858t41f2c72cwf7db057e6390d6db@mail.gmail.com>
	<56be91b60907121441h7602d917m496b675d0fa9fd68@mail.gmail.com>
	<1247462435.25217.7.camel@buzzybee>
	<93b45ca50907122329g65cef108h112399da01d6b322@mail.gmail.com>
Message-ID: <56be91b60907140908t6d6aa906ifcd334afd9e44882@mail.gmail.com>

Dear all,
I took a day for a quick search to get a clearer picture of the
situation.
- I found the specification of the .sff file format in the 454
instrument manual; it is fully described and seems to be enough to
build a reader.
- However, on a more careful reading it seems that a *.sff file carries
no information about the automatic contig assembly and only stores
flowgram data, i.e. the "reads" (unlike a *.ace file).
- Two hidden binary files can be found in a 454 gsAssembler project
folder: .ChordMatrixMetadata and .SeqCacheMetadata. They are not
described in the manual, but the former seems to contain the
nucleotide data and the latter the read names. They are big enough to
contain that kind of data; the problem is that we don't know how to
parse them.
- We need to decide on a "memory structure" in which to store the
information read. I agree with the "memory mapping" solution, maybe
implemented with a Map object that associates the name of each read
with its location in the file.
- The parser class should then expose methods to:
	1) iterate through reads, although this may be heavy and could perhaps be avoided
	2) access a read sequence by name
- If the parser should manage the assembled contigs too (which depends
on what is explained in the third bullet point), it should expose
methods to:
	1) iterate through contig names
	2) iterate through contig consensus sequences
	3) access consensus sequences by name (this is a sub-problem of point 2)
	4) access arbitrary aligned portions (I mean a "slice") of the assembly
given start-end positions, returning an alignment object
- Any more suggestions?
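
The Map-based index above, combined with the LRU cache Richard
suggested, could be sketched roughly as follows. This is only an
illustration using the JDK: IndexedReadStore and its methods are
hypothetical names, and it assumes some parser has already worked out
the byte offset and length of each read in the file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: name -> file location index with a small LRU cache in front. */
final class IndexedReadStore implements AutoCloseable {
    private final RandomAccessFile file;
    // Index kept in memory: read name -> {offset, length}. The records stay on disk.
    private final Map<String, long[]> index = new HashMap<>();
    // LinkedHashMap in access order drops the least recently used entry when full.
    private final Map<String, byte[]> cache;

    IndexedReadStore(String path, final int cacheSize) throws IOException {
        this.file = new RandomAccessFile(path, "r");
        this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > cacheSize;
            }
        };
    }

    /** Records where a read lives in the file (normally filled in by a parser pass). */
    void register(String name, long offset, int length) {
        index.put(name, new long[] {offset, length});
    }

    /** Returns the bytes of a read, checking the cache before touching the file. */
    byte[] read(String name) throws IOException {
        byte[] hit = cache.get(name);
        if (hit != null) {
            return hit;
        }
        long[] span = index.get(name);
        if (span == null) {
            throw new IllegalArgumentException("Unknown read: " + name);
        }
        byte[] data = new byte[(int) span[1]];
        file.seek(span[0]);
        file.readFully(data);
        cache.put(name, data);
        return data;
    }

    /** Number of records currently held in the cache. */
    int cachedCount() {
        return cache.size();
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Only the index and the few cached records stay in memory, so iterating
over contig or read names is cheap while sequence data is loaded on
demand.
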
I would be glad to be involved in the biojava community through this
project and I could try, but first of all I want to say that I'm not a
guru like most of the people here ( :-p ) and, to tell the truth, the job
my company requires of me is different, so if a workaround exists I
should honestly choose it.
So let me think a bit about starting such an adventure; if I can combine
my job with contributing to the community's growth I'll be happy to share
my work! Any suggestions welcome.

Bye bye,
Paolo


2009/7/13 Mark Schreiber <markjschreiber at gmail.com>:
> I would agree that there is a strong need for this kind of thing in biojava.
>
> As Richard says you probably can't fit it in memory so you may want to
> memory map it. There are classes in the java.nio package that can help a
> lot with this.
>
> Also I have had some success with in-memory compression of large files using
> LZ compression. Essentially the memory representation of the file is LZ
> compressed and compression and decompression are handled on the fly. Again
> there are Java utility classes that can help.
>
> - Mark
>
> On Mon, Jul 13, 2009 at 1:20 PM, Richard Holland <holland at eaglegenomics.com>
> wrote:
>>
>> Nothing within BJ can parse the 454 .sff files directly. However I think
>> there is a growing need for it so if anyone is willing to contribute
>> code, it would be very welcome.
>>
>> There is also no .ace parser, although in 2007 someone volunteered to
>> write one but nothing happened, and there was a previous post (many
>> years ago!) from someone else who already had some working code but
>> again nothing seems to have happened:
>>
>> http://portal.open-bio.org/pipermail/biojava-l/2001-June/001283.html
>> http://lists.open-bio.org/pipermail/biojava-l/2007-July/005900.html
>>
>> So to start with, someone (perhaps yourself? that would be nice! :) )
>> needs to volunteer to write either a .ace or .sff parser, or both.
>>
>> The thing to bear in mind with 454 contigs as you rightly point out is
>> the sheer size of the things. The requirement to keep them entirely in
>> memory is likely to be unworkable as it would leave little room for
>> anything else to run on your average machine. I would suggest either
>> memory-mapping the file itself, or parsing and writing out a
>> memory-mapped summary file containing the bits of data you're interested
>> in. (Memory-mapping is where you keep an index in memory indicating
>> where in the file each record is, so that when you need to access them
>> you load them on-the-fly from the file and drop them out of memory again
>> immediately after use. An accelerated form of this is to put the loaded
>> records into some kind of LRU cache which holds only the most recently
>> accessed records and then check that cache first to see if you've
>> already loaded the record before accessing the file directly.)
>>
>> cheers,
>> Richard
>>
>>
>> On Sun, 2009-07-12 at 23:41 +0200, Paolo Pavan wrote:
>> > Hi,
>> > I would like to post again with some adjustments a question I put some
>> > times ago because maybe this is a more correct list, apologize for the
>> > repeating.
>> > Can someone kindly give me his advise?
>> >
>> > thank you in advance,
>> > Paolo
>> >
>> >
>> > ---------- Forwarded message ----------
>> > From: Paolo Pavan <paolo.pavan at gmail.com>
>> > Date: 2009/7/9
>> > Subject: Assembly data reading
>> > To: Biojava-l at lists.open-bio.org
>> >
>> >
>> > Hi everybody,
>> > I'm almost new to this topic, I would like to know if there is
>> > something can help me to load in my java program data from a large 454
>> > contig. I need to retain in memory and access data from the single
>> > reads forming the contig too.
>> > I suppose these informations are in a *.sff file, if it is not
>> > possible to load such file it should be ok to load a *.ace (phrap)
>> > data file that I have too.
>> > Many thanks for any suggestion you can give me!
>> >
>> > Greetings,
>> > Paolo
>> > _______________________________________________
>> > biojava-dev mailing list
>> > biojava-dev at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/biojava-dev
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
>
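
As a footnote to Mark's java.nio suggestion quoted above, here is a
minimal sketch of reading one slice of a large file through a
memory-mapped buffer. MappedSlice is a hypothetical helper name;
FileChannel.map and MappedByteBuffer are standard JDK classes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: copies one region of a file out of a read-only mapping. */
final class MappedSlice {
    private MappedSlice() {}

    static byte[] slice(Path file, long offset, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // Map only the requested region; the OS pages it in on demand.
            MappedByteBuffer region =
                    channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
            byte[] out = new byte[length];
            region.get(out);
            return out;
        }
    }
}
```

A single mapping cannot exceed Integer.MAX_VALUE bytes, so a very large
assembly file would have to be mapped region by region, as done here.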


From aradwen at gmail.com  Wed Jul 15 07:10:03 2009
From: aradwen at gmail.com (Radwen ANIBA)
Date: Wed, 15 Jul 2009 13:10:03 +0200
Subject: [Biojava-dev] PsiPred
Message-ID: <e591b1bd0907150410jb37063bl66e4a986d7134e87@mail.gmail.com>

Hi mates,

I was wondering if Biojava can parse PsiPred outputs (protein secondary
structures), e.g. to say that this protein has a helix starting at ...
and ending at ..., a sheet starting at ... and ending at ... Is there any
class or method for this? If so, I'm interested.

thank you

From andreas at sdsc.edu  Wed Jul 15 11:13:07 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 15 Jul 2009 08:13:07 -0700
Subject: [Biojava-dev] PsiPred
In-Reply-To: <e591b1bd0907150410jb37063bl66e4a986d7134e87@mail.gmail.com>
References: <e591b1bd0907150410jb37063bl66e4a986d7134e87@mail.gmail.com>
Message-ID: <59a41c430907150813q7fd58a81uc2d4372f01d2e89a@mail.gmail.com>

Hi Radwen,

At present there is no parser for PsiPred. I would be happy about any
contribution for that...

Andreas


On Wed, Jul 15, 2009 at 4:10 AM, Radwen ANIBA<aradwen at gmail.com> wrote:
> Hi mates,
>
> I was wondering if Biojava could handle PsiPred outputs (protein secondary
> structures) for parsing (eg saying well this protein have helix start at ..
> end at ..., sheet start at ... end at ... ), is there any class or methods
> that was done in that sens, if yes i'm interested.
>
> thank you
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

From heuermh at acm.org  Wed Jul 15 15:22:14 2009
From: heuermh at acm.org (Michael Heuer)
Date: Wed, 15 Jul 2009 15:22:14 -0400 (EDT)
Subject: [Biojava-dev] Hackathon
In-Reply-To: <1247505253.27493.15.camel@buzzybee>
Message-ID: <Pine.GSO.4.44.0907151504410.1033-100000@shell3.shore.net>

Richard Holland wrote:

> Andreas and I would like to organise a hackathon to get the
> modularisation and general improvement plans for BJ3 into action, and
> bring the project forward into the 21st century (only 10 years late!).
>
> At this time I'm trying to gather interest and gauge who might
> realistically be able to attend. We will attempt to site the hackathon
> at a location closest to the majority of attendees.
>
> To help me plan numbers and likely costs (for potential sponsors) could
> all those who are interested please answer the following questions for
> me:
>
>  1. Name,
>  2. Specialist interest within Biojava (e.g. proteomics, microarrays,
> sequencing, etc.),
>  3. Your physical location (country and nearest major city - e.g.
> Cambridge, London, Newcastle, San Diego, Singapore, etc.),
>  4. Whether you think your employer would help pay your airfare and/or 1
> week in a hotel to attend (and how far you think you could go on such
> funding),
>  5. Approximate availability for the next 12 months.


A hackathon would be great.

I'm closest to MSP airport.  My employer would probably not cover air fare
or hotel.  I would then recommend choosing an interesting location so
that it would be worth spending out-of-pocket to get there.

I don't use biojava for my day job any more, so I'm most interested in
helping with architecture and build issues.  My day job is currently a lot
of data viz, maybe better integration with viz tools like Cytoscape,
Piccolo2D, prefuse, and Processing would be fun to work on.

   michael


From sreekanth.m at ocimumbio.com  Thu Jul 16 02:07:19 2009
From: sreekanth.m at ocimumbio.com (Sreekanth Mogullapally)
Date: Thu, 16 Jul 2009 11:37:19 +0530
Subject: [Biojava-dev] Reg: SeqIOTools Class
Message-ID: <2DDF09AFEB46E54894A3843CEF9CB3A4409F891673@EXCHMB.ocimumbio.com>


Hi Everybody,

I am new to biojava. I am trying to read a Fasta file with the following code:

SequenceIterator stream = SeqIOTools.readFastaDNA(new BufferedReader(new FileReader(fileName)));

For this I am importing the following class:
import org.biojava.bio.seq.io.SeqIOTools;
But it shows a warning that "SeqIOTools" is deprecated.

Is there any other class that provides all the functionality of the "SeqIOTools" class?

Please advise me in this regard.

Thanks in Advance
Sreekanth. M


From mark.schreiber at novartis.com  Thu Jul 16 02:11:52 2009
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Thu, 16 Jul 2009 14:11:52 +0800
Subject: [Biojava-dev] Reg: SeqIOTools Class
In-Reply-To: <2DDF09AFEB46E54894A3843CEF9CB3A4409F891673@EXCHMB.ocimumbio.com>
Message-ID: <OF2508938A.213CDBBD-ON482575F5.0021F5AA-482575F5.00220BB2@ah.novartis.com>

Hi -

The replacement for this class is RichSequence.IOTools (in the
org.biojavax.bio.seq package).

- Mark


biojava-dev-bounces at lists.open-bio.org wrote on 07/16/2009 02:07:19 PM:

> 
> Hi Everybody,
> 
> I am newly working with biojava. While I am trying to read Fasta 
> file using biojava with following code
> 
> SequenceIterator stream = SeqIOTools.readFastaDNA(new 
> BufferedReader(new FileReader(fileName)));
> 
> for this I am importing following class
> import org.biojava.bio.seq.io.SeqIOTools;
> But it is showing a warning that "SeqIOTools" is deprecated.
> 
> Is there any other class which satisfies all the functionality of 
> "SeqIOTools" class.
> 
> Please suggest me in this regard.
> 
> Thanks in Advance
> Sreekanth. M
> 
> 
> 
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev


From sreekanth.m at ocimumbio.com  Thu Jul 16 02:38:31 2009
From: sreekanth.m at ocimumbio.com (Sreekanth Mogullapally)
Date: Thu, 16 Jul 2009 12:08:31 +0530
Subject: [Biojava-dev] Reg: Exporting into fasta format
In-Reply-To: <OF2508938A.213CDBBD-ON482575F5.0021F5AA-482575F5.00220BB2@ah.novartis.com>
References: <2DDF09AFEB46E54894A3843CEF9CB3A4409F891673@EXCHMB.ocimumbio.com>
	<OF2508938A.213CDBBD-ON482575F5.0021F5AA-482575F5.00220BB2@ah.novartis.com>
Message-ID: <2DDF09AFEB46E54894A3843CEF9CB3A4409F8916A7@EXCHMB.ocimumbio.com>

Hi Everybody,

I need to export sequences in fasta format.
Please advise me on how to do this.
I have written my own code to export them, but I want to implement it using biojava.


Thanks & Regards,
Sreekanth. M


From ayates at ebi.ac.uk  Sun Jul 19 15:15:41 2009
From: ayates at ebi.ac.uk (Andy Yates)
Date: Sun, 19 Jul 2009 20:15:41 +0100
Subject: [Biojava-dev] Hackathon
In-Reply-To: <1247505253.27493.15.camel@buzzybee>
References: <1247505253.27493.15.camel@buzzybee>
Message-ID: <3735A549-2F00-418D-9194-638D7C923FC7@ebi.ac.uk>

Okay another one for the ball:

1). Andy Yates
2). Erm well getting it right so I guess that lands me in testing/ 
integration/killing every singleton
3). Cambridge
4). Currently in a Perl group so not a chance really
5). Quite flexible

How does that sound?

On 13 Jul 2009, at 18:14, Richard Holland wrote:

> Hi all.
>
> Andreas and I would like to organise a hackathon to get the
> modularisation and general improvement plans for BJ3 into action, and
> bring the project forward into the 21st century (only 10 years late!).
>
> At this time I'm trying to gather interest and gauge who might
> realistically be able to attend. We will attempt to site the hackathon
> at a location closest to the majority of attendees.
>
> To help me plan numbers and likely costs (for potential sponsors)  
> could
> all those who are interested please answer the following questions for
> me:
>
> 1. Name,
> 2. Specialist interest within Biojava (e.g. proteomics, microarrays,
> sequencing, etc.),
> 3. Your physical location (country and nearest major city - e.g.
> Cambridge, London, Newcastle, San Diego, Singapore, etc.),
> 4. Whether you think your employer would help pay your airfare and/ 
> or 1
> week in a hotel to attend (and how far you think you could go on such
> funding),
> 5. Approximate availability for the next 12 months.
>
> To get the ball rolling, here's me:
>
> 1. Richard Holland,  2. Making the whole thing more consistent,
> efficient, and easier to use,  3. Southampton (UK),  4. Possibly but
> probably only within UK/Europe,  5. Only available in mid-Jan 2010,
> otherwise can't do anything until mid-March 2010 onwards.
>
> Looking forward to hearing your comments! Once I have a good idea of
> numbers and distribution, I can get some costs together to give you  
> (and
> any potential sponsors) the best idea of what might be involved.
>
> cheers,
> Richard
>
> -- 
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev


From andreas at sdsc.edu  Mon Jul 20 16:11:52 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 20 Jul 2009 13:11:52 -0700
Subject: [Biojava-dev] Fwd: Assembly data reading
In-Reply-To: <56be91b60907140908t6d6aa906ifcd334afd9e44882@mail.gmail.com>
References: <56be91b60907090858t41f2c72cwf7db057e6390d6db@mail.gmail.com>
	<56be91b60907121441h7602d917m496b675d0fa9fd68@mail.gmail.com>
	<1247462435.25217.7.camel@buzzybee>
	<93b45ca50907122329g65cef108h112399da01d6b322@mail.gmail.com>
	<56be91b60907140908t6d6aa906ifcd334afd9e44882@mail.gmail.com>
Message-ID: <59a41c430907201311o4d287651k3adc2069b2c95f61@mail.gmail.com>

Hi Paolo,

Not sure if you got a response to your mail off list. If there is
sufficient interest from the people working on processing the output
of the various sequencers, it would be great if those people would
work together to get a new biojava module started. Most probably
somebody needs to take initiative and lead the development, otherwise
it won't happen.

Cheers,
Andreas


On Tue, Jul 14, 2009 at 9:08 AM, Paolo Pavan<paolo.pavan at gmail.com> wrote:
> Dear all,
> I took a day for a quick search to get a clearer picture of the
> situation.
> - I found the specification of the .sff file format in the 454
> instrument manual; it is fully described and seems to be enough to
> build a reader.
> - However, on a more careful reading it seems that a *.sff file carries
> no information about the automatic contig assembly and only stores
> flowgram data, i.e. the "reads" (unlike a *.ace file).
> - Two hidden binary files can be found in a 454 gsAssembler project
> folder: .ChordMatrixMetadata and .SeqCacheMetadata. They are not
> described in the manual, but the former seems to contain the
> nucleotide data and the latter the read names. They are big enough to
> contain that kind of data; the problem is that we don't know how to
> parse them.
> - We need to decide on a "memory structure" in which to store the
> information read. I agree with the "memory mapping" solution, maybe
> implemented with a Map object that associates the name of each read
> with its location in the file.
> - The parser class should then expose methods to:
>     1) iterate through reads, although this may be heavy and could perhaps be avoided
>     2) access a read sequence by name
> - If the parser should manage the assembled contigs too (which depends
> on what is explained in the third bullet point), it should expose
> methods to:
>     1) iterate through contig names
>     2) iterate through contig consensus sequences
>     3) access consensus sequences by name (this is a sub-problem of point 2)
>     4) access arbitrary aligned portions (I mean a "slice") of the assembly
> given start-end positions, returning an alignment object
> - Any more suggestions?
> I would be glad to be involved in the biojava community through this
> project and I could try, but first of all I want to say that I'm not a
> guru like most of the people here ( :-p ) and, to tell the truth, the job
> my company requires of me is different, so if a workaround exists I
> should honestly choose it.
> So let me think a bit about starting such an adventure; if I can combine
> my job with contributing to the community's growth I'll be happy to share
> my work! Any suggestions welcome.
>
> Bye bye,
> Paolo
>
>
> 2009/7/13 Mark Schreiber <markjschreiber at gmail.com>:
>> I would agree that there is a strong need for this kind of thing in biojava.
>>
>> As Richard says you probably can't fit it in memory so you may want to
>> memory map it. There are classes in the java.nio package that can help a
>> lot with this.
>>
>> Also I have had some success with in-memory compression of large files using
>> LZ compression. Essentially the memory representation of the file is LZ
>> compressed and compression and decompression are handled on the fly. Again
>> there are Java utility classes that can help.
>>
>> - Mark
>>
>> On Mon, Jul 13, 2009 at 1:20 PM, Richard Holland <holland at eaglegenomics.com>
>> wrote:
>>>
>>> Nothing within BJ can parse the 454 .sff files directly. However I think
>>> there is a growing need for it so if anyone is willing to contribute
>>> code, it would be very welcome.
>>>
>>> There is also no .ace parser, although in 2007 someone volunteered to
>>> write one but nothing happened, and there was a previous post (many
>>> years ago!) from someone else who already had some working code but
>>> again nothing seems to have happened:
>>>
>>> http://portal.open-bio.org/pipermail/biojava-l/2001-June/001283.html
>>> http://lists.open-bio.org/pipermail/biojava-l/2007-July/005900.html
>>>
>>> So to start with, someone (perhaps yourself? that would be nice! :) )
>>> needs to volunteer to write either a .ace or .sff parser, or both.
>>>
>>> The thing to bear in mind with 454 contigs as you rightly point out is
>>> the sheer size of the things. The requirement to keep them entirely in
>>> memory is likely to be unworkable as it would leave little room for
>>> anything else to run on your average machine. I would suggest either
>>> memory-mapping the file itself, or parsing and writing out a
>>> memory-mapped summary file containing the bits of data you're interested
>>> in. (Memory-mapping is where you keep an index in memory indicating
>>> where in the file each record is, so that when you need to access them
>>> you load them on-the-fly from the file and drop them out of memory again
>>> immediately after use. An accelerated form of this is to put the loaded
>>> records into some kind of LRU cache which holds only the most recently
>>> accessed records and then check that cache first to see if you've
>>> already loaded the record before accessing the file directly.)
>>>
>>> cheers,
>>> Richard
>>>
>>>
>>> On Sun, 2009-07-12 at 23:41 +0200, Paolo Pavan wrote:
>>> > Hi,
>>> > I would like to post again with some adjustments a question I put some
>>> > times ago because maybe this is a more correct list, apologize for the
>>> > repeating.
>>> > Can someone kindly give me his advise?
>>> >
>>> > thank you in advance,
>>> > Paolo
>>> >
>>> >
>>> > ---------- Forwarded message ----------
>>> > From: Paolo Pavan <paolo.pavan at gmail.com>
>>> > Date: 2009/7/9
>>> > Subject: Assembly data reading
>>> > To: Biojava-l at lists.open-bio.org
>>> >
>>> >
>>> > Hi everybody,
>>> > I'm almost new to this topic, I would like to know if there is
>>> > something can help me to load in my java program data from a large 454
>>> > contig. I need to retain in memory and access data from the single
>>> > reads forming the contig too.
>>> > I suppose these informations are in a *.sff file, if it is not
>>> > possible to load such file it should be ok to load a *.ace (phrap)
>>> > data file that I have too.
>>> > Many thanks for any suggestion you can give me!
>>> >
>>> > Greetings,
>>> > Paolo
>>> > _______________________________________________
>>> > biojava-dev mailing list
>>> > biojava-dev at lists.open-bio.org
>>> > http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>> --
>>> Richard Holland, BSc MBCS
>>> Operations and Delivery Director, Eagle Genomics Ltd
>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>>> http://www.eaglegenomics.com/
>>>
>>
>>
>
>
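
The index-plus-cache approach described in the quoted thread (keep only a name-to-location index in memory, load records on demand, and hold the most recently used ones in an LRU cache) could be sketched roughly as follows. This is a minimal illustration, not BioJava code: the class name IndexedReadStore, the method names, and the cache size are all hypothetical, and a real .sff or .ace parser would have to fill the index itself while scanning the file. Note that this is file indexing rather than operating-system memory mapping; the latter is available in Java through java.nio's FileChannel.map.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: read name -> file location index with a small LRU cache. */
class IndexedReadStore implements AutoCloseable {
    private static final int CACHE_SIZE = 1000; // arbitrary illustrative limit

    private final RandomAccessFile file;

    // read name -> {offset, length} of the record in the file
    private final Map<String, long[]> index = new HashMap<String, long[]>();

    // An access-ordered LinkedHashMap drops the least recently used entry
    // once the cache grows beyond CACHE_SIZE.
    private final Map<String, String> cache =
        new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > CACHE_SIZE;
            }
        };

    IndexedReadStore(String path) throws IOException {
        this.file = new RandomAccessFile(path, "r");
    }

    /** Registers where a record lives; a real parser would call this while scanning. */
    void addToIndex(String name, long offset, int length) {
        index.put(name, new long[] {offset, length});
    }

    /** Returns the record from the cache if present, otherwise loads it from disk. */
    String getRead(String name) throws IOException {
        String cached = cache.get(name);
        if (cached != null) {
            return cached;
        }
        long[] location = index.get(name);
        if (location == null) {
            return null; // unknown read name
        }
        byte[] buffer = new byte[(int) location[1]];
        file.seek(location[0]);
        file.readFully(buffer);
        String record = new String(buffer, StandardCharsets.US_ASCII);
        cache.put(name, record);
        return record;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Only the index and at most CACHE_SIZE records stay in memory, so the heap cost grows with the number of reads rather than with the size of the file.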


From matias.piipari at gmail.com  Tue Jul 21 18:07:47 2009
From: matias.piipari at gmail.com (Matias Piipari)
Date: Tue, 21 Jul 2009 23:07:47 +0100
Subject: [Biojava-dev] Hackathon
In-Reply-To: <3735A549-2F00-418D-9194-638D7C923FC7@ebi.ac.uk>
References: <1247505253.27493.15.camel@buzzybee>
	<3735A549-2F00-418D-9194-638D7C923FC7@ebi.ac.uk>
Message-ID: <15cdf3360907211507t6cb9b519y36a61197ae01378f@mail.gmail.com>

 1. Matias Piipari
 2. sequences + sequence motifs
 3. Cambridge
 4. Possible but not terribly likely.
 5. Flexible


On Sun, Jul 19, 2009 at 8:15 PM, Andy Yates <ayates at ebi.ac.uk> wrote:

> Okay another one for the ball:
>
> 1). Andy Yates
> 2). Erm well getting it right so I guess that lands me in
> testing/integration/killing every singleton
> 3). Cambridge
> 4). Currently in a Perl group so not a chance really
> 5). Quite flexible
>
> How does that sound?
>
>
> On 13 Jul 2009, at 18:14, Richard Holland wrote:
>
>  Hi all.
>>
>> Andreas and I would like to organise a hackathon to get the
>> modularisation and general improvement plans for BJ3 into action, and
>> bring the project forward into the 21st century (only 10 years late!).
>>
>> At this time I'm trying to gather interest and gauge who might
>> realistically be able to attend. We will attempt to site the hackathon
>> at a location closest to the majority of attendees.
>>
>> To help me plan numbers and likely costs (for potential sponsors) could
>> all those who are interested please answer the following questions for
>> me:
>>
>> 1. Name,
>> 2. Specialist interest within Biojava (e.g. proteomics, microarrays,
>> sequencing, etc.),
>> 3. Your physical location (country and and nearest major city - e.g.
>> Cambridge, London, Newcastle, San Diego, Singapore, etc.),
>> 4. Whether you think your employer would help pay your airfare and/or 1
>> week in a hotel to attend (and how far you think you could go on such
>> funding),
>> 5. Approximate availability for the next 12 months.
>>
>> To get the ball rolling, here's me:
>>
>> 1. Richard Holland,  2. Making the whole thing more consistent,
>> efficient, and easier to use,  3. Southampton (UK),  4. Possibly but
>> probably only within UK/Europe,  5. Only available in mid-Jan 2010,
>> otherwise can't do anything until mid-March 2010 onwards.
>>
>> Looking forward to hearing your comments! Once I have a good idea of
>> numbers and distribution, I can get some costs together to give you (and
>> any potential sponsors) the best idea of what might be involved.
>>
>> cheers,
>> Richard
>>
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>

From HWillis at scripps.edu  Wed Jul 22 14:34:26 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Wed, 22 Jul 2009 14:34:26 -0400
Subject: [Biojava-dev] Bug Tracking
Message-ID: <C68CD3F2.17BE%HWillis@scripps.edu>

Do we have a formal defect/feature request tracking setup for Biojava?

In case we don't have something formally set up, or if we do and are not using it, I wanted to suggest the following.

I am getting ready to install the following jumpbox at work to do subversion/defect tracking/wiki/project management. http://www.jumpbox.com/app/redmine

Everything is already installed in a vmware instance and sitting through the demo it looked very promising. You create a project in redmine and that provisions the subversion repository. You can then report bugs, add feature requests against the project. You can assign defects to members of the project and get timelines and other project management related reports. Combine this with a wiki for doing docs and demo code it looks like an interesting combination. I have been very impressed with what jumpbox is doing in other bundled software components so if they are providing redmine as a jumpbox it is probably very good. So far I have only played around with the demo instance at redmine.

Thanks

Scooter


From andreas at sdsc.edu  Wed Jul 22 15:15:50 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 22 Jul 2009 12:15:50 -0700
Subject: [Biojava-dev] Bug Tracking
In-Reply-To: <C68CD3F2.17BE%HWillis@scripps.edu>
References: <C68CD3F2.17BE%HWillis@scripps.edu>
Message-ID: <59a41c430907221215y10b95473y25fa9d94c82afc3f@mail.gmail.com>

Hi Scooter,

we have bugzilla running at:
http://bugzilla.open-bio.org/

Andreas

On Wed, Jul 22, 2009 at 11:34 AM, Scooter Willis<HWillis at scripps.edu> wrote:
> Do we have a formal defect/feature request tracking setup for Biojava?
>
> In case we don't have something formally setup or if we do and are not using it I wanted to suggest the following.
>
> I am getting ready to install the following jumpbox at work to do subversion/defect tracking/wiki/project management. http://www.jumpbox.com/app/redmine
>
> Everything is already installed in a vmware instance and sitting through the demo it looked very promising. You create a project in redmine and that provisions the subversion repository. You can then report bugs, add feature requests against the project. You can assign defects to members of the project and get timelines and other project management related reports. Combine this with a wiki for doing docs and demo code it looks like an interesting combination. I have been very impressed with what jumpbox is doing in other bundled software components so if they are providing redmine as a jumpbox it is probably very good. So far I have only played around with the demo instance at redmine.
>
> Thanks
>
> Scooter
>


From holland at eaglegenomics.com  Wed Jul 22 15:16:19 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 22 Jul 2009 20:16:19 +0100
Subject: [Biojava-dev] Bug Tracking
In-Reply-To: <C68CD3F2.17BE%HWillis@scripps.edu>
References: <C68CD3F2.17BE%HWillis@scripps.edu>
Message-ID: <1248290179.28124.54.camel@buzzybee>

Yup, we do. It's here:

http://bugzilla.open-bio.org/

cheers,
Richard

On Wed, 2009-07-22 at 14:34 -0400, Scooter Willis wrote:
> Do we have a formal defect/feature request tracking setup for Biojava?
> 
> In case we don't have something formally setup or if we do and are not using it I wanted to suggest the following.
> 
> I am getting ready to install the following jumpbox at work to do subversion/defect tracking/wiki/project management. http://www.jumpbox.com/app/redmine
> 
> Everything is already installed in a vmware instance and sitting through the demo it looked very promising. You create a project in redmine and that provisions the subversion repository. You can then report bugs, add feature requests against the project. You can assign defects to members of the project and get timelines and other project management related reports. Combine this with a wiki for doing docs and demo code it looks like an interesting combination. I have been very impressed with what jumpbox is doing in other bundled software components so if they are providing redmine as a jumpbox it is probably very good. So far I have only played around with the demo instance at redmine.
> 
> Thanks
> 
> Scooter
> 
-- 
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From jw12 at sanger.ac.uk  Fri Jul 24 05:12:21 2009
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Fri, 24 Jul 2009 10:12:21 +0100
Subject: [Biojava-dev] Hackathon
In-Reply-To: <1247505253.27493.15.camel@buzzybee>
References: <1247505253.27493.15.camel@buzzybee>
Message-ID: <C98C8A44-8C85-46D7-8F31-94DD26FBC9E8@sanger.ac.uk>

Hi Richard and Andreas, you both know where I'm situated, but for the  
record:

1. Jonathan Warren
2. Any DAS related libraries, visualization
3. Cambridge
4. Yes I think so, definitely if in UK.
5. Anytime.... although obviously I'm very busy, just not many  
definite commitments ;)

There seem to be many DAS classes under the biojava-live site, but  
many of them are not used in any of the DAS related code I have, with  
the exception of Structure and Alignment classes that are used in some  
dazzle plugins. So it might be good to update some of these classes to  
new more relevant code. I also notice that Apollo was trying to get  
away from using Biojava code for its DAS 1.5 adapter. Maybe the new  
modular design of biojava will resolve issues that Apollo developers  
had?

Anyway- I guess I'm asking Andreas or anyone else if they know the  
history of some of these classes e.g. org.biojava.bio.program.das  
package?


On 13 Jul 2009, at 18:14, Richard Holland wrote:

> 1. Name,
> 2. Specialist interest within Biojava (e.g. proteomics, microarrays,
> sequencing, etc.),
> 3. Your physical location (country and and nearest major city - e.g.
> Cambridge, London, Newcastle, San Diego, Singapore, etc.),
> 4. Whether you think your employer would help pay your airfare and/ 
> or 1
> week in a hotel to attend (and how far you think you could go on such
> funding),
> 5. Approximate availability for the next 12 months.

Jonathan Warren
Senior Developer and DAS coordinator
jw12 at sanger.ac.uk


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

From julie at flymine.org  Fri Jul 24 05:25:03 2009
From: julie at flymine.org (Julie Sullivan)
Date: Fri, 24 Jul 2009 10:25:03 +0100
Subject: [Biojava-dev] Hackathon
In-Reply-To: <15cdf3360907211507t6cb9b519y36a61197ae01378f@mail.gmail.com>
References: <1247505253.27493.15.camel@buzzybee>	<3735A549-2F00-418D-9194-638D7C923FC7@ebi.ac.uk>
	<15cdf3360907211507t6cb9b519y36a61197ae01378f@mail.gmail.com>
Message-ID: <4A697DEF.2020304@flymine.org>

1. Julie Sullivan
2. InterMine uses BioJava to handle sequences and pdb data
3. Cambridge
4. No
5. Flexible

>>> Andreas and I would like to organise a hackathon to get the
>>> modularisation and general improvement plans for BJ3 into action, and
>>> bring the project forward into the 21st century (only 10 years late!).
>>>
>>> At this time I'm trying to gather interest and gauge who might
>>> realistically be able to attend. We will attempt to site the hackathon
>>> at a location closest to the majority of attendees.
>>>
>>> To help me plan numbers and likely costs (for potential sponsors) could
>>> all those who are interested please answer the following questions for
>>> me:
>>>
>>> 1. Name,
>>> 2. Specialist interest within Biojava (e.g. proteomics, microarrays,
>>> sequencing, etc.),
>>> 3. Your physical location (country and and nearest major city - e.g.
>>> Cambridge, London, Newcastle, San Diego, Singapore, etc.),
>>> 4. Whether you think your employer would help pay your airfare and/or 1
>>> week in a hotel to attend (and how far you think you could go on such
>>> funding),
>>> 5. Approximate availability for the next 12 months.
>>>
>>> To get the ball rolling, here's me:
>>>
>>> 1. Richard Holland,  2. Making the whole thing more consistent,
>>> efficient, and easier to use,  3. Southampton (UK),  4. Possibly but
>>> probably only within UK/Europe,  5. Only available in mid-Jan 2010,
>>> otherwise can't do anything until mid-March 2010 onwards.
>>>
>>> Looking forward to hearing your comments! Once I have a good idea of
>>> numbers and distribution, I can get some costs together to give you (and
>>> any potential sponsors) the best idea of what might be involved.
>>>
>>> cheers,
>>> Richard
>>>
>>> --
>>> Richard Holland, BSc MBCS
>>> Operations and Delivery Director, Eagle Genomics Ltd
>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>>> http://www.eaglegenomics.com/
>>>

From gmicha at gmail.com  Fri Jul 24 06:38:04 2009
From: gmicha at gmail.com (Micha Sammeth)
Date: Fri, 24 Jul 2009 12:38:04 +0200
Subject: [Biojava-dev] Hackathon
In-Reply-To: <1247505253.27493.15.camel@buzzybee>
References: <1247505253.27493.15.camel@buzzybee>
Message-ID: <4A698F0C.9090402@gmail.com>

1) Michael Sammeth

2) sequencing, gene expression, splicing, alignment

3) Barcelona beach

4) continental maybe yes, states probably not

5) not available from mid Oct to end of Nov,
    the rest of the time probably just busy as
    usual

Cheers,

micha


Richard Holland wrote:
> Hi all.
> 
> Andreas and I would like to organise a hackathon to get the
> modularisation and general improvement plans for BJ3 into action, and
> bring the project forward into the 21st century (only 10 years late!).
> 
> At this time I'm trying to gather interest and gauge who might
> realistically be able to attend. We will attempt to site the hackathon
> at a location closest to the majority of attendees.
> 
> To help me plan numbers and likely costs (for potential sponsors) could
> all those who are interested please answer the following questions for
> me:
> 
>  1. Name,
>  2. Specialist interest within Biojava (e.g. proteomics, microarrays,
> sequencing, etc.),
>  3. Your physical location (country and and nearest major city - e.g.
> Cambridge, London, Newcastle, San Diego, Singapore, etc.),
>  4. Whether you think your employer would help pay your airfare and/or 1
> week in a hotel to attend (and how far you think you could go on such
> funding),
>  5. Approximate availability for the next 12 months.
> 
> To get the ball rolling, here's me:
> 
>  1. Richard Holland,  2. Making the whole thing more consistent,
> efficient, and easier to use,  3. Southampton (UK),  4. Possibly but
> probably only within UK/Europe,  5. Only available in mid-Jan 2010,
> otherwise can't do anything until mid-March 2010 onwards.
> 
> Looking forward to hearing your comments! Once I have a good idea of
> numbers and distribution, I can get some costs together to give you (and
> any potential sponsors) the best idea of what might be involved.
> 
> cheers,
> Richard
> 


-- 
O       o O       o O       o    Dr. Michael Sammeth
| O   o | | O   o | | O   o |         http://www.sammeth.net
| | O | | | | O | GRIB| O   |         Phone: +34-933-160-166
| o   O | | o   O | | o   O |    Fax:   +34 933-969-983
o       O o       O o       O    Dr. Aiguader 88, 08003 Barcelona

From andreas at sdsc.edu  Fri Jul 24 11:23:42 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 24 Jul 2009 08:23:42 -0700
Subject: [Biojava-dev] Hackathon
In-Reply-To: <C98C8A44-8C85-46D7-8F31-94DD26FBC9E8@sanger.ac.uk>
References: <1247505253.27493.15.camel@buzzybee>
	<C98C8A44-8C85-46D7-8F31-94DD26FBC9E8@sanger.ac.uk>
Message-ID: <59a41c430907240823u5dc152i9c6a5854adf9ba0e@mail.gmail.com>

> There seem to be many DAS classes under the biojava-live site, but many of
> them are not used in any of the DAS related code I have, with the exception
> of Structure and Alignment classes that are used in some dazzle plugins.

Much of the org.biojava.bio.program.das code is quite ancient and
should be deprecated... In a new biojava-das module it would be
nice to merge in one of the more modern DAS libraries (dasobert?) as a
replacement...

Andreas


> it might be good to update some of these classes to new more relevant code.
> I also notice that Apollo was trying to get away from using Biojava code for
> its DAS 1.5 adapter. Maybe the new modular design of biojava will resolve
> issues that Apollo developers had?
>
> Anyway- I guess I'm asking Andreas or anyone else if they know the history
> of some of these classes e.g. org.biojava.bio.program.das package?
>
>
>
> On 13 Jul 2009, at 18:14, Richard Holland wrote:
>
>> 1. Name,
>> 2. Specialist interest within Biojava (e.g. proteomics, microarrays,
>> sequencing, etc.),
>> 3. Your physical location (country and and nearest major city - e.g.
>> Cambridge, London, Newcastle, San Diego, Singapore, etc.),
>> 4. Whether you think your employer would help pay your airfare and/or 1
>> week in a hotel to attend (and how far you think you could go on such
>> funding),
>> 5. Approximate availability for the next 12 months.
>
> Jonathan Warren
> Senior Developer and DAS coordinator
> jw12 at sanger.ac.uk
>

From holland at eaglegenomics.com  Mon Jul 27 09:23:19 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 27 Jul 2009 14:23:19 +0100
Subject: [Biojava-dev] [Biojava-l] How to parse large Genbank files?
In-Reply-To: <200907271416.33485.florian.mittag@uni-tuebingen.de>
References: <200907241929.08768.florian.mittag@uni-tuebingen.de>
	<93b45ca50907241920r60c28931p1b43bf6b6a101b46@mail.gmail.com>
	<200907271416.33485.florian.mittag@uni-tuebingen.de>
Message-ID: <1248700999.2803.103.camel@buzzybee>


> My question to this list again:
> Is there a way to achieve my goal of parsing a 200MB Genbank file with the 
> current biojava version without code changes?

Probably not. The internal requirement to convert everything into
SymbolLists and back again really does get in the way. This is one of
the main drivers behind BioJava3 - to refactor out unnecessary
complexity, of which this is a prime example.

The ideal solution would be to parse the file and keep the sequence as a
string, only to be converted into Symbols when _absolutely necessary_ -
otherwise to remain as a string (or even just as a pointer to a string
stored on a disk-based temporary file repository somewhere, to save
memory). Hibernate et al could then work directly with the string.

cheers,
Richard

> 
> - Florian
> 
> 
> 
> > On 25 Jul 2009, 1:33 AM, "Florian Mittag" <florian.mittag at uni-tuebingen.de>
> > wrote:
> >
> > Hi!
> >
> > I think this is a problem worth of its own thread, so I'll start one:
> >
> > I want to store all human chromosomes in a BioSQL database after I loaded
> > the information from .gbk files. The files I get from NCBI with the
> > following URIs, where the id ranges from nc_000001 to nc_000024 plus
> > nc_001804:
> >
> > http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=nc_000023&rettype=gbwithparts&retmode=text
> >
> > I then try to parse the files as described in
> > http://biojava.org/wiki/BioJava:BioJavaXDocs#Tools_for_reading.2Fwriting_files
> > but it won't work. While there are no problems parsing 1804 and 24,
> > chromosome 23 leads to an OutOfMemory exception although I gave it 2GB of
> > heap space.
> >
> > Here is a stack trace (the line numbers might differ, because I already
> > tried to improve GenbankFormat.java in memory efficiency):
> >
> > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> >        at org.biojava.bio.seq.io.ChunkedSymbolListFactory.addSymbols(ChunkedSymbolListFactory.java:222)
> >        at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.addSymbols(SimpleRichSequenceBuilder.java:256)
> >        at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:535)
> >        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
> >        at org.prodge.sequence_viewer.db.UpdateDB_Main.updateChromosome(UpdateDB_Main.java:537)
> >        at org.prodge.sequence_viewer.db.UpdateDB_Main.newGenome(UpdateDB_Main.java:468)
> >        at org.prodge.sequence_viewer.db.UpdateDB_Main.main(UpdateDB_Main.java:164)
> >
> > The line in GenbankFormat.java is:
> >
> > rlistener.addSymbols(
> >        symParser.getAlphabet(),
> >        (Symbol[])(sl.toList().toArray(new Symbol[0])),
> >        0, sl.length());
> >
> > Sometimes it fails at the sl.toList().toArray()-part, sometimes it fails
> > later inside the addSymbols method, but it always fails.
> >
> > How can this be? I mean, the file is only 190MB in size, so 2GB of memory
> > should be more than enough. Browsing through the source code, I discovered
> > what I think of as very inefficient handling of sequences:
> >
> > 1) the sequence string is read from file into a StringBuffer
> > 2) it is converted to a string (with whitespaces removed)
> > 3) a SimpleSymbolList is created out of the string
> > 4) the SymbolList is converted to a List of Symbols
> > 5) the List is converted to an array of Symbols
> > 6) the array is passed to addSymbols
> > 7) there it is added to a ChunkedSymbolListFactory
> > 8) if at some point the sequence is requested, a SymbolList is created and
> > then converted to a string.
> >
> > You see, there is a lot of copying and converting, but in the end I have
> > the same string I started with. Well, I had the string, if it ever reached
> > the end, because it will crash before completing this process.
> >
> >
> > Am I doing something wrong or is there a great potential of improving
> > parsing of Genbank files?
> >
> >
> > Regards,
> >   Florian
> > _______________________________________________
> > Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
> 
-- 
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
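
The idea above of keeping the parsed sequence as a string and converting to symbols only when absolutely necessary could look roughly like the sketch below. Everything in it is hypothetical (the class LazyDnaSequence, the Base enum, the method names); it does not use or reproduce the BioJava SymbolList API and only illustrates the principle of deferring conversion.

```java
import java.util.AbstractList;
import java.util.List;

/** Hypothetical sketch, not BioJava API: keep the raw string, convert on demand. */
class LazyDnaSequence {
    enum Base { A, C, G, T, N }

    private final String raw; // the sequence exactly as parsed from the file

    LazyDnaSequence(String raw) {
        this.raw = raw;
    }

    int length() {
        return raw.length();
    }

    /** Cheap path: callers that only need text never trigger a conversion. */
    String seqString() {
        return raw;
    }

    /** Converts a single position (1-based) into a symbol; unknown letters map to N. */
    Base symbolAt(int pos) {
        switch (Character.toUpperCase(raw.charAt(pos - 1))) {
            case 'A': return Base.A;
            case 'C': return Base.C;
            case 'G': return Base.G;
            case 'T': return Base.T;
            default:  return Base.N;
        }
    }

    /** A read-only List view; no Base[] or ArrayList copy of the sequence is made. */
    List<Base> asList() {
        return new AbstractList<Base>() {
            @Override
            public Base get(int i) {
                return symbolAt(i + 1);
            }

            @Override
            public int size() {
                return raw.length();
            }
        };
    }
}
```

Because the List is a view over the string, code that iterates symbol by symbol still works, while persistence code can take seqString() directly without any intermediate copies.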


From paolo.pavan at gmail.com  Mon Jul 27 12:47:46 2009
From: paolo.pavan at gmail.com (Paolo Pavan)
Date: Mon, 27 Jul 2009 18:47:46 +0200
Subject: [Biojava-dev] [Biojava-l] How to parse large Genbank files?
In-Reply-To: <1248700999.2803.103.camel@buzzybee>
References: <200907241929.08768.florian.mittag@uni-tuebingen.de>
	<93b45ca50907241920r60c28931p1b43bf6b6a101b46@mail.gmail.com>
	<200907271416.33485.florian.mittag@uni-tuebingen.de>
	<1248700999.2803.103.camel@buzzybee>
Message-ID: <56be91b60907270947kf03afa3v7976c569d806ad16@mail.gmail.com>

Calling the garbage collector between the steps wouldn't help,
would it?

2009/7/27 Richard Holland <holland at eaglegenomics.com>:
>
>> My question to this list again:
>> Is there a way to achieve my goal of parsing a 200MB Genbank file with the
>> current biojava version without code changes?
>
> Probably not. The internal requirement to convert everything into
> SymbolLists and back again really does get in the way. This is one of
> the main drivers behind BioJava3 - to refactor out unnecessary
> complexity, of which this is a prime example.
>
> The ideal solution would be to parse the file and keep the sequence as a
> string, only to be converted into Symbols when _absolutely necessary_ -
> otherwise to remain as a string (or even just as a pointer to a string
> stored on a disk-based temporary file repository somewhere, to save
> memory). Hibernate et al could then work directly with the string.
>
> cheers,
> Richard
>
>>
>> - Florian
>>
>>
>>
>> > On 25 Jul 2009, 1:33 AM, "Florian Mittag" <florian.mittag at uni-tuebingen.de>
>> > wrote:
>> >
>> > Hi!
>> >
>> > I think this is a problem worth of its own thread, so I'll start one:
>> >
>> > I want to store all human chromosomes in a BioSQL database after I loaded
>> > the information from .gbk files. The files I get from NCBI with the
>> > following URIs, where the id ranges from nc_000001 to nc_000024 plus
>> > nc_001804:
>> >
>> > http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=nc_000023&rettype=gbwithparts&retmode=text
>> >
>> > I then try to parse the files as described in
>> > http://biojava.org/wiki/BioJava:BioJavaXDocs#Tools_for_reading.2Fwriting_files
>> > but it won't work. While there are no problems parsing 1804 and 24,
>> > chromosome 23 leads to an OutOfMemory exception although I gave it 2GB of
>> > heap space.
>> >
>> > Here is a stack trace (the line numbers might differ, because I already
>> > tried to improve GenbankFormat.java in memory efficiency):
>> >
>> > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>> >        at org.biojava.bio.seq.io.ChunkedSymbolListFactory.addSymbols(ChunkedSymbolListFactory.java:222)
>> >        at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.addSymbols(SimpleRichSequenceBuilder.java:256)
>> >        at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:535)
>> >        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
>> >        at org.prodge.sequence_viewer.db.UpdateDB_Main.updateChromosome(UpdateDB_Main.java:537)
>> >        at org.prodge.sequence_viewer.db.UpdateDB_Main.newGenome(UpdateDB_Main.java:468)
>> >        at org.prodge.sequence_viewer.db.UpdateDB_Main.main(UpdateDB_Main.java:164)
>> >
>> > The line in GenbankFormat.java is:
>> >
>> > rlistener.addSymbols(
>> >        symParser.getAlphabet(),
>> >        (Symbol[])(sl.toList().toArray(new Symbol[0])),
>> >        0, sl.length());
>> >
>> > Sometimes it fails at the sl.toList().toArray()-part, sometimes it fails
>> > later inside the addSymbols method, but it always fails.
>> >
>> > How can this be? I mean, the file is only 190MB in size, so 2GB of memory
>> > should be more than enough. Browsing through the source code, I discovered
>> > what I think of as very inefficient handling of sequences:
>> >
>> > 1) the sequence string is read from file into a StringBuffer
>> > 2) it is converted to a string (with whitespaces removed)
>> > 3) a SimpleSymbolList is created out of the string
>> > 4) the SymbolList is converted to a List of Symbols
>> > 5) the List is converted to an array of Symbols
>> > 6) the array is passed to addSymbols
>> > 7) there it is added to a ChunkedSymbolListFactory
>> > 8) if at some point the sequence is requested, a SymbolList is created and
>> > then converted to a string.
>> >
>> > You see, there is a lot of copying and converting, but in the end I have
>> > the same string I started with. Well, I had the string, if it ever reached
>> > the end, because it will crash before completing this process.
>> >
>> >
>> > Am I doing something wrong or is there a great potential of improving
>> > parsing of Genbank files?
>> >
>> >
>> > Regards,
>> >   Florian
>>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/


From markjschreiber at gmail.com  Mon Jul 27 22:52:44 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 28 Jul 2009 10:52:44 +0800
Subject: [Biojava-dev] [Biojava-l] How to parse large Genbank files?
In-Reply-To: <56be91b60907270947kf03afa3v7976c569d806ad16@mail.gmail.com>
References: <200907241929.08768.florian.mittag@uni-tuebingen.de> 
	<93b45ca50907241920r60c28931p1b43bf6b6a101b46@mail.gmail.com> 
	<200907271416.33485.florian.mittag@uni-tuebingen.de>
	<1248700999.2803.103.camel@buzzybee> 
	<56be91b60907270947kf03afa3v7976c569d806ad16@mail.gmail.com>
Message-ID: <93b45ca50907271952k689c2f27h78bf7b9cc47e7d45@mail.gmail.com>

Dear Paolo -

Calling the garbage collector is generally not required and often not
recommended. Modern JVMs do a better job of this than programmers do.
Also, a garbage collector cannot release memory held by objects that
are still referenced. I suspect the problem here is that objects are
being copied while references to the old copies are retained. These old
copies are not really required, so the references can be set to null,
which will allow the GC to clean them up.

Also, manually calling the GC is very aggressive and forces the JVM to
dump all classes it is not currently using. When such a class is called
again, the classloader will need to reload it, which can result in a
performance hit.
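
A minimal sketch of the point about retained references, not taken from
the BioJava sources. The class SequenceHolder and its methods are
hypothetical names invented for this illustration. Once the compact copy
exists, the field that held the old copy is set to null so that nothing
keeps it reachable and the GC is free to reclaim it on its own schedule.

```java
import java.util.Arrays;

/** Hypothetical holder, for illustration only; not a BioJava class. */
final class SequenceHolder {
    private StringBuilder rawText; // text as read from the file, whitespace included
    private char[] residues;       // compact copy used from then on

    SequenceHolder(String text) {
        this.rawText = new StringBuilder(text);
    }

    /** Builds the compact copy on first use and drops the old copy. */
    char[] residues() {
        if (residues == null) {
            residues = stripWhitespace(rawText);
            rawText = null; // release the old copy; no explicit System.gc() call
        }
        return residues;
    }

    /** True while the original text is still referenced by this object. */
    boolean holdsRawText() {
        return rawText != null;
    }

    private static char[] stripWhitespace(CharSequence s) {
        char[] buf = new char[s.length()];
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!Character.isWhitespace(c)) {
                buf[n++] = c;
            }
        }
        return Arrays.copyOf(buf, n);
    }
}
```

The same idea applies to local variables in a long method: an
intermediate copy stays reachable until the variable goes out of use, so
keeping conversion steps in small methods tends to have the same effect
as nulling.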

- Mark

On Tue, Jul 28, 2009 at 12:47 AM, Paolo Pavan<paolo.pavan at gmail.com> wrote:
> Calling a garbage collection among the steps doesn't bring to
> anything, isn't it?
>
> 2009/7/27 Richard Holland <holland at eaglegenomics.com>:
>>
>>> My question to this list again:
>>> Is there a way to achieve my goal of parsing a 200MB Genbank file with the
>>> current biojava version without code changes?
>>
>> Probably not. The internal requirement to convert everything into
>> SymbolLists and back again really does get in the way. This is one of
>> the main drivers behind BioJava3 - to refactor out unnecessary
>> complexity, of which this is a prime example.
>>
>> The ideal solution would be to parse the file and keep the sequence as a
>> string, only to be converted into Symbols when _absolutely necessary_ -
>> otherwise to remain as a string (or even just as a pointer to a string
>> stored on a disk-based temporary file repository somewhere, to save
>> memory). Hibernate et al could then work directly with the string.
>>
>> cheers,
>> Richard
>>
>>>
>>> - Florian
>>>
>>>
>>>
>>> > On 25 Jul 2009, 1:33 AM, "Florian Mittag" <florian.mittag at uni-tuebingen.de>
>>> > wrote:
>>> >
>>> > Hi!
>>> >
>>> > I think this is a problem worthy of its own thread, so I'll start one:
>>> >
>>> > I want to store all human chromosomes in a BioSQL database after I loaded
>>> > the information from .gbk files. The files I get from NCBI with the
>>> > following URIs, where the id ranges from nc_000001 to nc_000024 plus
>>> > nc_001804:
>>> >
>>> > http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=nc_000023&rettype=gbwithparts&retmode=text
>>> >
>>> > I then try to parse the files as described in
>>> > http://biojava.org/wiki/BioJava:BioJavaXDocs#Tools_for_reading.2Fwriting_files
>>> > but it won't work. While there are no problems parsing 1804 and 24,
>>> > chromosome 23 leads to an OutOfMemory exception although I gave it 2GB of
>>> > heap space.
>>> >
>>> > Here is a stack trace (the line numbers might differ, because I already
>>> > tried to improve GenbankFormat.java in memory efficiency):
>>> >
>>> > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>> >     at org.biojava.bio.seq.io.ChunkedSymbolListFactory.addSymbols(ChunkedSymbolListFactory.java:222)
>>> >     at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.addSymbols(SimpleRichSequenceBuilder.java:256)
>>> >     at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:535)
>>> >     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
>>> >     at org.prodge.sequence_viewer.db.UpdateDB_Main.updateChromosome(UpdateDB_Main.java:537)
>>> >     at org.prodge.sequence_viewer.db.UpdateDB_Main.newGenome(UpdateDB_Main.java:468)
>>> >     at org.prodge.sequence_viewer.db.UpdateDB_Main.main(UpdateDB_Main.java:164)
>>> >
>>> > The line in GenbankFormat.java is:
>>> >
>>> > rlistener.addSymbols(
>>> >         symParser.getAlphabet(),
>>> >         (Symbol[])(sl.toList().toArray(new Symbol[0])),
>>> >         0, sl.length());
>>> >
>>> > Sometimes it fails at the sl.toList().toArray()-part, sometimes it fails
>>> > later inside the addSymbols method, but it always fails.
>>> >
>>> > How can this be? I mean, the file is only 190MB in size, so 2GB of memory
>>> > should be more than enough. Browsing through the source code, I discovered
>>> > what I think of as very inefficient handling of sequences:
>>> >
>>> > 1) the sequence string is read from file into a StringBuffer
>>> > 2) it is converted to a string (with whitespaces removed)
>>> > 3) a SimpleSymbolList is created out of the string
>>> > 4) the SymbolList is converted to a List of Symbols
>>> > 5) the List is converted to an array of Symbols
>>> > 6) the array is passed to addSymbols
>>> > 7) there it is added to a ChunkedSymbolListFactory
>>> > 8) if at some point the sequence is requested, a SymbolList is created and
>>> > then converted to a string.
>>> >
>>> > You see, there is a lot of copying and converting, but in the end I have
>>> > the same string I started with. Well, I had the string, if it ever reached
>>> > the end, because it will crash before completing this process.
>>> >
>>> >
>>> > Am I doing something wrong or is there a great potential of improving
>>> > parsing of Genbank files?
>>> >
>>> >
>>> > Regards,
>>> >   Florian
>>>
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/


From abhishek.vit at gmail.com  Tue Jul 28 15:04:00 2009
From: abhishek.vit at gmail.com (Abhishek Pratap)
Date: Tue, 28 Jul 2009 15:04:00 -0400
Subject: [Biojava-dev] Hi.. Calling Java from Perl
Message-ID: <be9b52410907281204l4ebe3340qade871abdc1c05b5@mail.gmail.com>

Hi Guys

Before I ask the question, let me introduce myself. I am Abhishek,
primarily a bioinformatician, and this is my first mail here. I
realized sooner rather than later that I have to use BioJava to make my
life easier. :)

So basically we have a lot of Perl code into which we would like to plug
some BioJava code and some in-house packages/classes. I am just
wondering what the best way to do so is. Clearly I am not a Java guy,
so please excuse me in case I am asking something very basic.
I found a couple of solutions after a few Google searches, but I am not
sure which is the most efficient one.

Thanks,
-Abhi

From ayates at ebi.ac.uk  Tue Jul 28 17:54:17 2009
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 28 Jul 2009 22:54:17 +0100
Subject: [Biojava-dev] Hi.. Calling Java from Perl
In-Reply-To: <be9b52410907281204l4ebe3340qade871abdc1c05b5@mail.gmail.com>
References: <be9b52410907281204l4ebe3340qade871abdc1c05b5@mail.gmail.com>
Message-ID: <A76BE049-94FA-4C81-BE55-5AB1A118D972@ebi.ac.uk>

Hi Abhi,

Well, to answer your first question, the only real way to do this is by
shelling out to Java. Inter-process communication could then be dealt
with by writing to temporary files or maybe communicating back over
STDOUT.
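
As a rough sketch of the STDOUT route, assuming the Java side can be
started once and kept running: the program below reads one sequence per
line from standard input and writes one result per line to standard
output. RevCompWorker and reverseComplement are hypothetical names, and
the reverse complement is only a stand-in for whatever BioJava-based
method is really needed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;

/** Hypothetical line-oriented worker; not part of BioJava. */
final class RevCompWorker {

    /** Stand-in for the real work: reverse complement, lower-case output. */
    static String reverseComplement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            sb.append(complement(dna.charAt(i)));
        }
        return sb.toString();
    }

    private static char complement(char base) {
        switch (Character.toLowerCase(base)) {
            case 'a': return 't';
            case 't': return 'a';
            case 'c': return 'g';
            case 'g': return 'c';
            default:  return 'n'; // anything else is reported as unknown
        }
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        PrintStream out = System.out;
        String line;
        // One request per input line, one reply per output line.
        while ((line = in.readLine()) != null) {
            out.println(reverseComplement(line.trim()));
            out.flush(); // reply immediately so the caller is not left waiting
        }
    }
}
```

On the Perl side the process would be opened once (for example with the
core IPC::Open2 module) and fed a line per call, so the JVM start-up cost
is paid a single time rather than per call.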

The question I would ask you though is what particular part of BioJava
are you using? Is there any reason why another similarly named Bio
project (I shall not mention it here, as I think people think I'm
becoming weak when it comes to Perl) cannot be used? When programming,
it is always a good idea to avoid shelling out to another program if
possible. Sometimes it cannot be avoided, say if you want to run
clustalw, but shelling out to delete a file, for example, is unnecessary.

Andy



From abhishek.vit at gmail.com  Tue Jul 28 18:23:49 2009
From: abhishek.vit at gmail.com (Abhishek Pratap)
Date: Tue, 28 Jul 2009 18:23:49 -0400
Subject: [Biojava-dev] Hi.. Calling Java from Perl
In-Reply-To: <A76BE049-94FA-4C81-BE55-5AB1A118D972@ebi.ac.uk>
References: <be9b52410907281204l4ebe3340qade871abdc1c05b5@mail.gmail.com>
	<A76BE049-94FA-4C81-BE55-5AB1A118D972@ebi.ac.uk>
Message-ID: <be9b52410907281523y46b1a36ey81a6fba2021c07d0@mail.gmail.com>

Hi Andy

Thanks for the quick reply. I think shelling out will be too process
intensive, as we expect thousands of calls to the same Java method. I also
read about the Perl module Inline::Java. Is that any good?

And to answer your second question, I am basically using an in-house
method which in turn uses a lot of BioJava classes for DNA
manipulation.

Thanks,
-Abhi


From markjschreiber at gmail.com  Tue Jul 28 19:24:30 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 29 Jul 2009 07:24:30 +0800
Subject: [Biojava-dev] Hi.. Calling Java from Perl
In-Reply-To: <93b45ca50907281619q4fc1572aoed7333a72534fa14@mail.gmail.com>
References: <be9b52410907281204l4ebe3340qade871abdc1c05b5@mail.gmail.com>
	<A76BE049-94FA-4C81-BE55-5AB1A118D972@ebi.ac.uk>
	<be9b52410907281523y46b1a36ey81a6fba2021c07d0@mail.gmail.com>
	<93b45ca50907281619q4fc1572aoed7333a72534fa14@mail.gmail.com>
Message-ID: <93b45ca50907281624u1a8bb7cy81c50f1c90434f5@mail.gmail.com>

Hi -

You could try and use something like CORBA but that would be quite ugly.

A nicer alternative would be to put the BioJava functionality in a web
service and send sequences as FASTA or some custom format??

I think WS is considered the best way for Java and .NET to talk so probably
it is for Perl too.
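
A minimal sketch of the web service idea, with one deliberate
simplification: it uses plain HTTP through the JDK's built-in
com.sun.net.httpserver package rather than a SOAP stack. The class name
SequenceService, the /process path, port 8080 and the process() method
are all assumptions made for the example; process() is a placeholder for
the real BioJava-based code.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

/** Hypothetical HTTP wrapper around a sequence-processing method. */
final class SequenceService {

    /** Placeholder for the real work: strips whitespace and upper-cases. */
    static String process(String payload) {
        return payload.replaceAll("\\s+", "").toUpperCase();
    }

    /** Starts a server on the loopback interface; the JVM then stays up. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/process", SequenceService::handle);
        server.start();
        return server;
    }

    private static void handle(HttpExchange exchange) throws IOException {
        String request = new String(readAll(exchange.getRequestBody()), StandardCharsets.UTF_8);
        byte[] reply = process(request).getBytes(StandardCharsets.UTF_8);
        // A length of -1 tells the server that there is no response body.
        exchange.sendResponseHeaders(200, reply.length == 0 ? -1 : reply.length);
        try (OutputStream os = exchange.getResponseBody()) {
            os.write(reply);
        }
    }

    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        start(8080);
    }
}
```

A Perl client would then POST the sequence text to
http://127.0.0.1:8080/process, for instance with LWP::UserAgent, and read
the reply body. A full SOAP service would add a formal interface
description on top of the same basic exchange.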

- Mark


From ayates at ebi.ac.uk  Wed Jul 29 04:48:33 2009
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 29 Jul 2009 09:48:33 +0100
Subject: [Biojava-dev] Hi.. Calling Java from Perl
In-Reply-To: <93b45ca50907281624u1a8bb7cy81c50f1c90434f5@mail.gmail.com>
References: <be9b52410907281204l4ebe3340qade871abdc1c05b5@mail.gmail.com>	
	<A76BE049-94FA-4C81-BE55-5AB1A118D972@ebi.ac.uk>	
	<be9b52410907281523y46b1a36ey81a6fba2021c07d0@mail.gmail.com>	
	<93b45ca50907281619q4fc1572aoed7333a72534fa14@mail.gmail.com>
	<93b45ca50907281624u1a8bb7cy81c50f1c90434f5@mail.gmail.com>
Message-ID: <4A700CE1.3050901@ebi.ac.uk>

Indeed I would agree with Mark here and go for web services as the
desired solution. JAX-WS & CXF are both popular frameworks for doing Web
Services, and as far as I remember Spring had some very nice helper
classes for quickly exposing any Java class as a web service. Then there
are other remoting protocols such as Hessian, Burlap, Protocol Buffers
or Thrift, all of which are good in their own ways.

However, Web Services should be the quickest way (in terms of
implementation) to communicate with a persistent Java process.

Personally I would stay away from Inline::Java.

Andy


From abhishek.vit at gmail.com  Wed Jul 29 12:04:04 2009
From: abhishek.vit at gmail.com (Abhishek Pratap)
Date: Wed, 29 Jul 2009 12:04:04 -0400
Subject: [Biojava-dev] Hi.. Calling Java from Perl
In-Reply-To: <4A700CE1.3050901@ebi.ac.uk>
References: <be9b52410907281204l4ebe3340qade871abdc1c05b5@mail.gmail.com>
	<A76BE049-94FA-4C81-BE55-5AB1A118D972@ebi.ac.uk>
	<be9b52410907281523y46b1a36ey81a6fba2021c07d0@mail.gmail.com>
	<93b45ca50907281619q4fc1572aoed7333a72534fa14@mail.gmail.com>
	<93b45ca50907281624u1a8bb7cy81c50f1c90434f5@mail.gmail.com>
	<4A700CE1.3050901@ebi.ac.uk>
Message-ID: <be9b52410907290904y2d92bbaep6217321e62d15577@mail.gmail.com>

Thanks all. I think a Java WS is the way out for me then. As you said, it
would be code agnostic and will help me in updating the core code
later.

Just a quick question: do you happen to know of any good tutorial on
implementing a WS for a Java process?

Thanks,
-Abhi



From ayates at ebi.ac.uk  Wed Jul 29 12:21:12 2009
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 29 Jul 2009 17:21:12 +0100
Subject: [Biojava-dev] Hi.. Calling Java from Perl
In-Reply-To: <be9b52410907290904y2d92bbaep6217321e62d15577@mail.gmail.com>
References: <be9b52410907281204l4ebe3340qade871abdc1c05b5@mail.gmail.com>	
	<A76BE049-94FA-4C81-BE55-5AB1A118D972@ebi.ac.uk>	
	<be9b52410907281523y46b1a36ey81a6fba2021c07d0@mail.gmail.com>	
	<93b45ca50907281619q4fc1572aoed7333a72534fa14@mail.gmail.com>	
	<93b45ca50907281624u1a8bb7cy81c50f1c90434f5@mail.gmail.com>	
	<4A700CE1.3050901@ebi.ac.uk>
	<be9b52410907290904y2d92bbaep6217321e62d15577@mail.gmail.com>
Message-ID: <4A7076F8.2040308@ebi.ac.uk>

Depends on what you're going to use, but when I last did it I bought into
the Spring way of things and found that the Spring manual was very good.
The WS bit is:

http://static.springsource.org/spring/docs/3.0.x/spring-framework-reference/html/ch21s05.html

It goes through doing it for JAX-WS & XFire.

There's also a JAX-WS tutorial from:

http://java.sun.com/javaee/5/docs/tutorial/doc/?wp405739&JAXWS.html#wp72279

To be honest though Google is your best friend here.

Good luck,

Andy


From Russell.Smithies at agresearch.co.nz  Wed Jul 29 16:25:05 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 30 Jul 2009 08:25:05 +1200
Subject: [Biojava-dev] Hi.. Calling Java from Perl
In-Reply-To: <be9b52410907281204l4ebe3340qade871abdc1c05b5@mail.gmail.com>
References: <be9b52410907281204l4ebe3340qade871abdc1c05b5@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32AAB5A4EC5@exchsth.agresearch.co.nz>

You could always use BioPerl instead :-)
http://www.bioperl.org/wiki/Main_Page


Russell Smithies 

Bioinformatics Applications Developer 
T +64 3 489 9085 
E  russell.smithies at agresearch.co.nz 

Invermay Research Centre 
Puddle Alley, 
Mosgiel, 
New Zealand 
T  +64 3 489 3809 
F  +64 3 489 9174 
www.agresearch.co.nz 




From ayates at ebi.ac.uk  Wed Jul 29 16:56:34 2009
From: ayates at ebi.ac.uk (ayates at ebi.ac.uk)
Date: Wed, 29 Jul 2009 21:56:34 +0100 (BST)
Subject: [Biojava-dev] Hi.. Calling Java from Perl
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32AAB5A4EC5@exchsth.agresearch.co.nz>
References: <be9b52410907281204l4ebe3340qade871abdc1c05b5@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32AAB5A4EC5@exchsth.agresearch.co.nz>
Message-ID: <35760.86.9.203.151.1248900994.squirrel@webmail.ebi.ac.uk>

That was my original point; however, it sounds from the original post as if
the system which is in Perl needs to call out to an already implemented
system in BioJava. In a perfect world this mismatch would never happen, but
hey, we all know it can :)

Andy



From abhishek.vit at gmail.com  Wed Jul 29 17:06:54 2009
From: abhishek.vit at gmail.com (Abhishek Pratap)
Date: Wed, 29 Jul 2009 17:06:54 -0400
Subject: [Biojava-dev] Hi.. Calling Java from Perl
In-Reply-To: <35760.86.9.203.151.1248900994.squirrel@webmail.ebi.ac.uk>
References: <be9b52410907281204l4ebe3340qade871abdc1c05b5@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32AAB5A4EC5@exchsth.agresearch.co.nz>
	<35760.86.9.203.151.1248900994.squirrel@webmail.ebi.ac.uk>
Message-ID: <be9b52410907291406t6845bdeau99cd047e146e3395@mail.gmail.com>

Yeah, it is part of the development cycle. We need to integrate the
code, part of which is in Perl and part in Java.

From your suggestions I feel I clearly have two options:
1. Use a web service to talk between Java and Perl. This might not be
very efficient as we expect to make thousands of calls per run.
2. Port the whole Java code to BioPerl.

#2 is scary but I might just have to do it.
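
On the efficiency worry in option 1, one possibility (an assumption, not
something proposed in the thread) is to batch the work: send many
sequences in a single FASTA payload and process them all in one call.
The sketch below shows only the Java-side parsing of such a batch;
FastaBatch and parse are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that splits a multi-record FASTA payload. */
final class FastaBatch {

    /**
     * Returns the records in input order, keyed by the first word of each
     * header line. Sequence lines of one record are concatenated.
     */
    static Map<String, String> parse(String fasta) {
        Map<String, String> records = new LinkedHashMap<>();
        String id = null;
        StringBuilder seq = new StringBuilder();
        for (String rawLine : fasta.split("\\r?\\n")) {
            String line = rawLine.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith(">")) {
                if (id != null) {
                    records.put(id, seq.toString()); // close the previous record
                }
                id = line.substring(1).trim().split("\\s+")[0];
                seq.setLength(0);
            } else if (id != null) {
                seq.append(line); // text before the first header is ignored
            }
        }
        if (id != null) {
            records.put(id, seq.toString());
        }
        return records;
    }
}
```

With a batch interface, a run that needs thousands of results could make
a handful of requests instead of thousands, which is where most of the
per-call overhead would otherwise go.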

Thanks again to all of you,
-Abhi



From niall at sgenomics.org  Thu Jul 30 12:32:00 2009
From: niall at sgenomics.org (Niall Haslam)
Date: Thu, 30 Jul 2009 18:32:00 +0200
Subject: [Biojava-dev] Webservices
Message-ID: <200907301832.01103.niall@sgenomics.org>

Hi,

I know this was brought up on the users list a month or two ago, but I wanted
to ask on the dev list what the consensus is on creating a BioJava module for
web service clients. I am interested and have a little code to contribute. I
think it would consist mainly of example code showing how to use each web
service and, critically, would not incorporate the stub code generated by
Axis. I would also favour Axis2. I think this could have the benefit of making
services more standards compliant, but we will probably have to decide on a
case-by-case basis.

I'd also like to know whether other people are interested in using or
writing some of it.

Thanks and looking forward to your input,

Niall.

From HWillis at scripps.edu  Fri Jul 31 10:04:31 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Fri, 31 Jul 2009 10:04:31 -0400
Subject: [Biojava-dev] Webservices
In-Reply-To: <200907301832.01103.niall@sgenomics.org>
Message-ID: <C698722F.1935%HWillis@scripps.edu>

Niall

I have the web services BioJava implementation on my list of things to do! I
have an upcoming project where running BLAST through web services, against
both external and internal sources, will make things easier. I like what Axis2
is doing to make it easy to publish web services, although with NetBeans, for
example, it is already fairly painless to create a web service. Since we are
mainly focused on consuming web services, it would be nice to use the built-in
support in Java 6 to keep the external library count as low as possible. That
also helps avoid conflicts when an external application uses a different
version of the same external library.

As you mention, I think the main driving force will be the provider of each
web service, which determines which web services client library is needed.

Thanks

Scooter


On 7/30/09 12:32 PM, "Niall Haslam" <niall at sgenomics.org> wrote:

Hi,

I know it was brought up in the users list a month or two ago. But I wanted to
ask in the Dev list what the consensus is on creating a biojava module for
webservices clients. I am interested and have a little code to contribute. I
think it would consist of mainly example code in how to use the webservice.
And critically would not incorporate the stub code generated by axis. I would
also bump for axis2. I think this could have the benefit of making services
more standards compliant. But we'll probably have to do it on a case by case
basis.

I'd also like to know if there are people who are interested in using or
writing some of it as well.

Thanks and looking forward to your input,

Niall.


From heuermh at acm.org  Wed Jul  1 16:56:36 2009
From: heuermh at acm.org (Michael Heuer)
Date: Wed, 1 Jul 2009 12:56:36 -0400 (EDT)
Subject: [Biojava-dev] Singletons are bad
In-Reply-To: <93b45ca50906300133w58109024vb89c6970a8446fed@mail.gmail.com>
Message-ID: <Pine.GSO.4.44.0907011251340.2527-100000@shell3.shore.net>

Mark Schreiber wrote:

> I came across this today which is an interesting article about how
> singletons seem like a good idea but after a while you realise they get you
> into serious trouble. After playing with BioJava for over 10 years I
> completely concur. Singletons and fly-weight objects are (IMHO) the most
> serious problem in the BioJava code base and as the article predicts the BJ
> code base is completely infected with them.
>
> The article is here:
> http://tech.puredanger.com/2007/07/03/pattern-hate-singleton/
>
>
> But I have copied the paragraph below as it seems to offer a way out without
> completely breaking everything.  This should be seriously considered for
> future BJ releases.
>
> ... paste starts here
> But I already have a bunch of singletons in my code!
> ...

I've had good luck using Google Guice in several for-work projects:

> http://code.google.com/p/google-guice/

@Inject is the new new as they say.  :)
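
A minimal sketch of the idea, assuming nothing about BioJava's actual classes:
the names AlphabetRegistry, MapAlphabetRegistry and SequenceValidator are
hypothetical and exist only for illustration. The dependency is handed in
through the constructor instead of being fetched from a static getInstance().
With Guice the constructor would carry @Inject and the container would supply
the argument; here it is wired by hand so the example has no external
dependency.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical dependency, expressed as an interface so tests can swap it. */
interface AlphabetRegistry {
    String symbolsFor(String alphabetName);
}

/** A plain implementation. Note there is no static getInstance(). */
class MapAlphabetRegistry implements AlphabetRegistry {
    private final Map<String, String> symbols = new HashMap<>();

    MapAlphabetRegistry() {
        symbols.put("DNA", "ACGT");
        symbols.put("RNA", "ACGU");
    }

    @Override
    public String symbolsFor(String alphabetName) {
        String s = symbols.get(alphabetName);
        if (s == null) {
            throw new IllegalArgumentException("Unknown alphabet: " + alphabetName);
        }
        return s;
    }
}

/** Receives its registry through the constructor, not through a global lookup. */
class SequenceValidator {
    private final AlphabetRegistry registry;

    // With Guice this constructor would be annotated with @Inject.
    SequenceValidator(AlphabetRegistry registry) {
        this.registry = registry;
    }

    /** Returns true if every character of the sequence is in the named alphabet. */
    boolean isValid(String alphabetName, String sequence) {
        String allowed = registry.symbolsFor(alphabetName);
        for (int i = 0; i < sequence.length(); i++) {
            if (allowed.indexOf(sequence.charAt(i)) < 0) {
                return false;
            }
        }
        return true;
    }
}
```

Because the registry is just a constructor argument, a test can pass in a stub
(for example a lambda) without touching any global state, which is the main
thing a singleton makes awkward.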

   michael


From sylvain.foisy at diploide.net  Thu Jul  2 13:12:44 2009
From: sylvain.foisy at diploide.net (Sylvain Foisy)
Date: Thu, 02 Jul 2009 09:12:44 -0400
Subject: [Biojava-dev] Preliminary QBlast support in biojava-live
Message-ID: <C6722A8C.13B63%sylvain.foisy@diploide.net>

Hi all,

I just put some material into a new package (org.biojavax.bio.alignment) for
creating a remote service for alignment with its implementation for QBlast.

The philosophy for using these is this:

- Create an implementation of RemotePairwiseAlignmentService for a specific
remote service;

- Create an implementation of RemotePairwiseAlignmentProperties to set
parameters for alignment;

- Use the sendAlignmentRequest() method with a sequence and the implemented
RemotePairwiseAlignmentProperties to submit the sequence for alignment;

- Retrieve the results with an implementation of
RemotePairwiseAlignmentOutputProperties, which specifies the format of the
output to get from the service.

This is done so that submission of a sequence and retrieval of results can be
decoupled.
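
The submit-then-retrieve flow described above can be sketched roughly as
follows. This is not the code in org.biojavax.bio.alignment: only the method
name sendAlignmentRequest() comes from this message, while SubmitOptions,
OutputOptions, isReady(), getAlignmentResults() and the in-memory service are
assumptions made up for illustration, and no network call is made.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the alignment parameters. */
final class SubmitOptions {
    final String program;
    final String database;
    SubmitOptions(String program, String database) {
        this.program = program;
        this.database = database;
    }
}

/** Hypothetical stand-in for the output-format parameters. */
final class OutputOptions {
    final String format;
    OutputOptions(String format) {
        this.format = format;
    }
}

/** Submission and retrieval are separate calls linked only by a request ID. */
interface RemoteAlignmentService {
    String sendAlignmentRequest(String sequence, SubmitOptions options);
    boolean isReady(String requestId);
    String getAlignmentResults(String requestId, OutputOptions output);
}

/** Toy implementation that keeps "jobs" in a map instead of calling a server. */
final class InMemoryAlignmentService implements RemoteAlignmentService {
    private final Map<String, String> jobs = new HashMap<>();
    private int nextId = 1;

    @Override
    public String sendAlignmentRequest(String sequence, SubmitOptions options) {
        String id = "REQ" + nextId++;
        jobs.put(id, options.program + " vs " + options.database + ": " + sequence);
        return id;
    }

    @Override
    public boolean isReady(String requestId) {
        // A real service would poll the server here.
        return jobs.containsKey(requestId);
    }

    @Override
    public String getAlignmentResults(String requestId, OutputOptions output) {
        String job = jobs.get(requestId);
        if (job == null) {
            throw new IllegalArgumentException("Unknown request: " + requestId);
        }
        return "[" + output.format + "] " + job;
    }
}
```

Since the caller only holds a request ID between the two steps, it can submit
several sequences, do other work, and choose the output format at retrieval
time.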

I think that I have addressed most of the points of a few weeks back. If
not, let me know ;-) I created a demo in the demos folder.

Best regards

Sylvain
===================================================================

 Sylvain Foisy, Ph. D.
 Consultant Bio-informatique / Bioinformatics
 Diploide.net - TI pour la vie / IT for Life

 Courriel: sylvain.foisy at diploide.net
 Web: http://www.diploide.net
 Tel: (514) 893-4363
===================================================================


From aradwen at gmail.com  Thu Jul  2 13:28:00 2009
From: aradwen at gmail.com (Radwen ANIBA)
Date: Thu, 2 Jul 2009 15:28:00 +0200
Subject: [Biojava-dev] Parsing Interpro results
Message-ID: <e591b1bd0907020628m7c2e75aboa841b75f88b2dc83@mail.gmail.com>

Hello everyone,

I looked through the BioJava docs and around the internet but I didn't find
how to parse InterProScan results (XML as well as tabular formats).
It is not hard to code in Java, but I just wanted to know whether this exists
or not.

Regards
Rad


From hunter at ebi.ac.uk  Thu Jul  2 14:56:28 2009
From: hunter at ebi.ac.uk (Sarah Hunter)
Date: Thu, 02 Jul 2009 15:56:28 +0100
Subject: [Biojava-dev] Parsing Interpro results
In-Reply-To: <12c279870907020749s1f6b9890ub7090b89c473340c@mail.gmail.com>
References: <e591b1bd0907020628m7c2e75aboa841b75f88b2dc83@mail.gmail.com>
	<12c279870907020749s1f6b9890ub7090b89c473340c@mail.gmail.com>
Message-ID: <4A4CCA9C.1080404@ebi.ac.uk>

Hi Radwen (and the rest of the biojava guys),

As far as I am aware, there isn't a BioJava parser for InterProScan results.

However, we are undergoing a complete re-write of InterPro and InterProScan at the moment and it is
our intention to provide a Java API for accessing all of our data.  If you wish to be involved in
testing this API, please contact the InterPro team via the EBI's support pages
(http://www.ebi.ac.uk/support/).

Many thanks for your interest.

Sarah Hunter

---
  Sarah Hunter

  InterPro Team Leader
  European Bioinformatics Institute
  Wellcome Trust Genome Campus
  Hinxton
  Cambridge
  CB10 1SD, UK

=====================================


> From: Radwen ANIBA <aradwen at gmail.com>
> Date: Thu, Jul 2, 2009 at 2:28 PM
> Subject: [Biojava-dev] Parsing Interpro results
> To: biojava-dev at lists.open-bio.org
> 
> 
> Hello everyone,
> 
> I looked around in Biojava doc and through internet but I did'nt found how
> to parse Interproscan results (xml as well as tabular formats)
> It is not hard to code it in Java, But I just wanted to know if this exists
> or not.
> 
> Regards
> Rad


From fbristow at gmail.com  Tue Jul  7 13:34:09 2009
From: fbristow at gmail.com (Franklin Bristow)
Date: Tue, 7 Jul 2009 08:34:09 -0500
Subject: [Biojava-dev] Extended ABIF Parser and SCF Writer
Message-ID: <50a7756d0907070634s31051bc0t3410ea2e33fa0686@mail.gmail.com>

Hi everyone,
Now that you're all back from ISMB (I hope you all had a good time!) I
thought it would be a good time to bring this up.

A while back I wrote to the list about an ABIF parser and SCF writer that I
had written.  I got some pointers on things to change and I've since made
the suggested changes.  Now I was wondering how I should go about getting
these files into BioJava....

-- 
Franklin


From andreas at sdsc.edu  Tue Jul  7 16:51:05 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 7 Jul 2009 09:51:05 -0700
Subject: [Biojava-dev] Extended ABIF Parser and SCF Writer
In-Reply-To: <50a7756d0907070634s31051bc0t3410ea2e33fa0686@mail.gmail.com>
References: <50a7756d0907070634s31051bc0t3410ea2e33fa0686@mail.gmail.com>
Message-ID: <59a41c430907070951r16e22bf3h7e83b16012cb2700@mail.gmail.com>

Hi Franklin,

The theme of the moment is modularization... I wonder if we should make a
module for parsing the output of sequencers...

This topic is also a bit related to the discussion we had around BOSC last
week, how to contribute modules, and what is the role of a module
maintainer. I will send out a more detailed summary on that a bit later.

Andreas


On Tue, Jul 7, 2009 at 6:34 AM, Franklin Bristow <fbristow at gmail.com> wrote:

> Hi everyone,
> Now that you're all back from ISMB (I hope you all had a good time!) I
> thought it would be a good time to bring this up.
>
> A while back I wrote to the list about an ABIF parser and SCF writer that I
> had written.  I got some pointers on things to change and I've since made
> the suggested changes.  Now I was wondering how I should go about getting
> these files into BioJava....
>
> --
> Franklin


From gmicha at gmail.com  Tue Jul  7 17:08:07 2009
From: gmicha at gmail.com (Micha Sammeth)
Date: Tue, 07 Jul 2009 19:08:07 +0200
Subject: [Biojava-dev] Extended ABIF Parser and SCF Writer
In-Reply-To: <59a41c430907070951r16e22bf3h7e83b16012cb2700@mail.gmail.com>
References: <50a7756d0907070634s31051bc0t3410ea2e33fa0686@mail.gmail.com>
	<59a41c430907070951r16e22bf3h7e83b16012cb2700@mail.gmail.com>
Message-ID: <4A5380F7.6090602@gmail.com>

Hi,

it has been a while since I worked on SCF and ABI files, but let's see if I
can contribute something here. Having worked on NGS for a year, I could
imagine that readers for the standard output of pipelines by Illumina &
Co. would also fit there.

If there is a reader module, I would plead for a low-level interface for
accessing sequences/qualities and for reusable data containers during I/O.
But maybe it is too early to talk about that when not even
the existence of the module has been decided.
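
As a rough illustration of what a low-level interface with a reusable
container could look like (all names here are hypothetical, and the four-line
record layout is just an assumed FASTQ-like text format, not anything decided
for BioJava): the reader fills one caller-owned ReadRecord over and over
instead of allocating a new object per read.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Mutable container that a reader refills for every record. */
final class ReadRecord {
    String name;
    final StringBuilder bases = new StringBuilder();
    final StringBuilder qualities = new StringBuilder(); // raw quality characters, not decoded

    void clear() {
        name = null;
        bases.setLength(0);
        qualities.setLength(0);
    }
}

/** Low-level access: the caller owns the container and decides what to keep. */
interface ReadSource {
    /** Fills 'reuse' with the next read; returns false at end of input. */
    boolean next(ReadRecord reuse);
}

/** Reads records of four lines: @name, bases, '+' separator, qualities. */
final class FourLineReadSource implements ReadSource {
    private final BufferedReader in;

    FourLineReadSource(BufferedReader in) {
        this.in = in;
    }

    /** Convenience factory for small in-memory examples. */
    static FourLineReadSource fromString(String text) {
        return new FourLineReadSource(new BufferedReader(new StringReader(text)));
    }

    @Override
    public boolean next(ReadRecord reuse) {
        try {
            String header = in.readLine();
            if (header == null) {
                return false;
            }
            String seq = in.readLine();
            String plus = in.readLine();
            String qual = in.readLine();
            if (!header.startsWith("@") || seq == null || plus == null || qual == null) {
                throw new IllegalStateException("Truncated or malformed record");
            }
            reuse.clear();
            reuse.name = header.substring(1);
            reuse.bases.append(seq);
            reuse.qualities.append(qual);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A typical loop would be `ReadRecord rec = new ReadRecord(); while
(src.next(rec)) { ... }`, copying out only the reads it needs to retain.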

cheers - micha.

Andreas Prlic wrote:
> Hi Franklin,
> 
> The theme of the moment is modularization... I wonder if we should make a
> module for parsing the output of sequencers...
> 
> This topic is also a bit related to the discussion we had around BOSC last
> week, how to contribute modules, and what is the role of a module
> maintainer. I will send out a more detailed summary on that a bit later.
> 
> Andreas
> 
> 
> On Tue, Jul 7, 2009 at 6:34 AM, Franklin Bristow <fbristow at gmail.com> wrote:
> 
>> Hi everyone,
>> Now that you're all back from ISMB (I hope you all had a good time!) I
>> thought it would be a good time to bring this up.
>>
>> A while back I wrote to the list about an ABIF parser and SCF writer that I
>> had written.  I got some pointers on things to change and I've since made
>> the suggested changes.  Now I was wondering how I should go about getting
>> these files into BioJava....
>>
>> --
>> Franklin


From fbristow at gmail.com  Tue Jul  7 17:29:56 2009
From: fbristow at gmail.com (Franklin Bristow)
Date: Tue, 7 Jul 2009 12:29:56 -0500
Subject: [Biojava-dev] Extended ABIF Parser and SCF Writer
In-Reply-To: <4A5380F7.6090602@gmail.com>
References: <50a7756d0907070634s31051bc0t3410ea2e33fa0686@mail.gmail.com>
	<59a41c430907070951r16e22bf3h7e83b16012cb2700@mail.gmail.com>
	<4A5380F7.6090602@gmail.com>
Message-ID: <50a7756d0907071029s45ee0983y85f2ee307765e65c@mail.gmail.com>

Hello,
I think I like the idea of having a module for the I/O of sequencers in
general.  I really only have familiarity with ABI sequencers (i.e. 31xx and
37xx) and the data that they produce, so I would be able to offer some help
there.  Needless to say, the documentation that ABI released regarding their
binary format was much appreciated when I was going through the code.

To Micha:  when you talk about 'during I/O', do you mean having some kind of
event-based parser?  When I wrote my extended ABIF parser I modelled it
after the Perl module Bio::Trace::ABIF, so there are accessors for many of
the tags that are defined in the ABI spec.
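
To make the distinction concrete, here is a small sketch of an event-based
(push) parser next to an accessor-style view built on top of it. Everything in
it is hypothetical: the TraceListener interface, the toy "TAG=value" text
format and the class names are invented for illustration and have nothing to
do with the real ABIF binary layout or with my parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Callbacks fired by the parser as it walks the input. */
interface TraceListener {
    void startTrace(String name);
    void tag(String tagName, String value);
    void endTrace();
}

/** Pushes events for a toy format: first line is the trace name, then TAG=value lines. */
final class ToyTraceParser {
    static void parse(String text, TraceListener listener) {
        String[] lines = text.split("\n");
        listener.startTrace(lines[0].trim());
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i].trim();
            if (line.isEmpty()) {
                continue;
            }
            int eq = line.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Expected TAG=value: " + line);
            }
            listener.tag(line.substring(0, eq), line.substring(eq + 1));
        }
        listener.endTrace();
    }
}

/** Accessor-style view: collects the events so callers can ask for tags by name. */
final class CollectingListener implements TraceListener {
    String name;
    boolean finished;
    private final Map<String, String> tags = new LinkedHashMap<>();

    @Override
    public void startTrace(String name) {
        this.name = name;
    }

    @Override
    public void tag(String tagName, String value) {
        tags.put(tagName, value);
    }

    @Override
    public void endTrace() {
        finished = true;
    }

    /** Returns the value of the named tag, or null if it was not seen. */
    String get(String tagName) {
        return tags.get(tagName);
    }
}
```

The two styles are not exclusive: a listener that stores everything gives the
accessor API, while a listener that only reacts to the tags it cares about
avoids holding the whole file in memory.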

On Tue, Jul 7, 2009 at 12:08 PM, Micha Sammeth <gmicha at gmail.com> wrote:

> Hi,
>
> the time I worked on SCF and ABI files is a bit ago, but lets see if I can
> contribute something here. Working since a year on NGS, I could imagine that
> readers for the standard-output of pipelines by Illumina & Co would also fit
> there.
>
> If there is a reader module, I would plead for a low-level interface for
> accessing sequences/qualities and re-usable data containers during I/O. But
> maybe that is rather an early stage to talk about that when not even the
> existence of the module is decided.
>
> cheers - micha.
>
>
> Andreas Prlic wrote:
>
>> Hi Franklin,
>>
>> The theme of the moment is modularization... I wonder if we should make a
>> module for parsing the output of sequencers...
>>
>> This topic is also a bit related to the discussion we had around BOSC last
>> week, how to contribute modules, and what is the role of a module
>> maintainer. I will send out a more detailed summary on that a bit later.
>>
>> Andreas
>>
>>
>> On Tue, Jul 7, 2009 at 6:34 AM, Franklin Bristow <fbristow at gmail.com>
>> wrote:
>>
>>  Hi everyone,
>>> Now that you're all back from ISMB (I hope you all had a good time!) I
>>> thought it would be a good time to bring this up.
>>>
>>> A while back I wrote to the list about an ABIF parser and SCF writer that
>>> I
>>> had written.  I got some pointers on things to change and I've since made
>>> the suggested changes.  Now I was wondering how I should go about getting
>>> these files into BioJava....
>>>
>>> --
>>> Franklin


-- 
Franklin


From sreekanth.m at ocimumbio.com  Wed Jul  8 05:51:15 2009
From: sreekanth.m at ocimumbio.com (Sreekanth Mogullapally)
Date: Wed, 8 Jul 2009 11:21:15 +0530
Subject: [Biojava-dev] Reg: Source of BioJava
Message-ID: <2DDF09AFEB46E54894A3843CEF9CB3A4409F7CE20F@EXCHMB.ocimumbio.com>

Dear all,

I have just started working with BioJava for next-generation sequencing.
I got all the jar files needed to work with BioJava, and I found many things that are very useful to me.
I need the source jar file of BioJava. If anybody has it, please send it to me.

Thanks in advance.

Thanks & Regards,
Sreekanth.M


From andreas at sdsc.edu  Wed Jul  8 06:20:59 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 7 Jul 2009 23:20:59 -0700
Subject: [Biojava-dev] summary biojava user meeting
Message-ID: <59a41c430907072320k3d5a4415u962d59a10d286beb@mail.gmail.com>

Hi,

Here is a quick summary of the BioJava user meeting we had last week at the
BOSC conference:

The following people were present:

Mattias Piipari
Martijn Devisscher
Frederik Decouttere
Richard Holland
Andreas Prlic

The new modularized code base will allow individual people to take over
responsibility for some of the sub-modules and to contribute new
modules, both of which I welcome greatly. It was therefore great to have
Mattias, Martijn and Frederik there expressing their interest in this.

Mattias is interested in contributing a new module related to machine
learning. Martijn and Frederik are interested in providing a new GUI module
(seqpad). Because of this, our discussions were mainly about how to organize
the contribution of new modules and their maintenance:

* Before starting a new module the code should undergo public code review.
* New modules need documentation (wiki cookbook) and JUnit tests.
* A Module Maintainer (MM) is the main person responsible for everything
related to the module.
* The MM coordinates patches and other user contributions for the module.
* The MM can write papers related to the code in the module without having to
cite all of the other BioJava contributors.
* An MM volunteers to support the module for (at least) a year.
* All MMs will be listed by name on a wiki page in order to clarify
responsibilities.

Andreas


From holland at eaglegenomics.com  Wed Jul  8 16:41:54 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 08 Jul 2009 18:41:54 +0200
Subject: [Biojava-dev] Reg: Source of BioJava
In-Reply-To: <2DDF09AFEB46E54894A3843CEF9CB3A4409F7CE20F@EXCHMB.ocimumbio.com>
References: <2DDF09AFEB46E54894A3843CEF9CB3A4409F7CE20F@EXCHMB.ocimumbio.com>
Message-ID: <1247071314.3792.9.camel@buzzybee>

The source code can be obtained by following these instructions:

http://biojava.org/wiki/CVS_to_SVN_Migration

Richard.

On Wed, 2009-07-08 at 11:21 +0530, Sreekanth Mogullapally wrote:
> Dear all,
> 
> Just now I started working with biojava to work in next generation sequencing.
> I got all the jar files to work with biojava, and i found many things which are very useful to me.
> I require sourde jar file of biojava. If anybody has it please send it to me.
> 
> Thanks in advance.
> 
> Thanks & Regards,
> Sreekanth.M
> 
> 
-- 
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From florian.mittag at uni-tuebingen.de  Thu Jul  9 15:16:12 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Thu, 9 Jul 2009 17:16:12 +0200
Subject: [Biojava-dev] Problems in DB2 with VARCHAR,
	TEXT and CLOB using BioJava
Message-ID: <200907091716.13639.florian.mittag@uni-tuebingen.de>

Hi all!

I'm posting this to both the BioSQL and the BioJava-dev mailing lists because
the problem resides in both domains. I hope this is okay.

We're working on getting BioJava to run with a DB2 Express-C backend for 
various reasons. We've encountered several problems during this task, but 
this one seems to have no real solution.

When adapting the BioSQL schema to DB2, the official IBM conversion guide 
tells us to use the data type CLOB where MySQL uses TEXT.

(Chapter 11 in
ftp://ftp.software.ibm.com/software/data/db2/migration/mtk/mtk_2050.pdf)

So far, no problem. But when we tried reading some GenBank files with
BioJava, the DB2 driver threw an exception:

SQL0401N  The data types of the operands for the operation "=" are not 
compatible.  SQLSTATE=42818 SQLCODE=-401

Explanation:
The class org.biojavax.bio.db.biosql.BioSQLRichObjectBuilder defines some 
Hibernate queries, of which one has the conditions:

"from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?"

All three columns "authors", "location", and "title" are of type TEXT in MySQL 
and of type CLOB in DB2, so comparing them with "=" leads to the above error 
message.


The way I see it, there are only two possible solutions to this problem:
1) Change the query to
"from DocRef as cr where cr.authors LIKE '?' and cr.location LIKE '?' and 
cr.title LIKE '?'"

2) Change the data type to something comparable with "=", like VARCHAR.

Solution 1 is no real solution to me, because comparing values with "LIKE" 
usually is slow and it seems a bit odd to change a query that works with 
other databases just for DB2.

But taking a closer look, solution 2 has some problems, too:
Although VARCHARs in DB2 can have a length of theoretically 32767, in reality 
they are limited by the page size of the database, which can be 32K at 
maximum. Since this particular table "reference" has three columns of this 
type, the sum of their lengths must not exceed 32767, so they could only be 
something like VARCHAR(10000).

I have never encountered cases in which values come even close to the length 
of 10000, but you can never be sure.


And that is why I post here. For me, the way to go is pretty clear, but we 
intend to be as compatible as possible with the original BioSQL. Maybe you 
could give me some input on how to solve this problem with as few casualties 
as possible ;-)


Thanks,
Florian


From aradwen at gmail.com  Fri Jul 10 14:45:35 2009
From: aradwen at gmail.com (Radwen ANIBA)
Date: Fri, 10 Jul 2009 16:45:35 +0200
Subject: [Biojava-dev] ExternalProcess class
Message-ID: <e591b1bd0907100745k7ea36398j2d8ed5b0084f27c2@mail.gmail.com>

Hi everyone,

Does anybody have (for example in the BioJava cookbook) a usage example of the
ExternalProcess class?

Let's say we want to run a local clustalw program with it; is that possible?

Any example code?

Thank you

Radwen


From sylvain.foisy at diploide.net  Fri Jul 10 16:43:09 2009
From: sylvain.foisy at diploide.net (Sylvain Foisy)
Date: Fri, 10 Jul 2009 12:43:09 -0400
Subject: [Biojava-dev] ExternalProcess class
In-Reply-To: <mailman.25.1247241605.31128.biojava-dev@lists.open-bio.org>
Message-ID: <C67CE7DD.13D14%sylvain.foisy@diploide.net>

Hi,

There is no such example to the best of my knowledge. If you do use this
class, you are welcome to share your experience by contributing. As far as
running something like clustalw, I don't see why you could not make use of
this class. You could actually build a wrapper class to execute clustalw and
do something with its output.
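
Such a wrapper could look roughly like the sketch below. Note that it uses the
JDK's java.lang.ProcessBuilder rather than BioJava's ExternalProcess class,
whose API is not shown in this thread. The class name ClustalwRunner is
hypothetical, and the -INFILE, -OUTFILE and -ALIGN flags are assumptions about
the local clustalw build that should be checked against its own help output.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical wrapper: builds a clustalw command line and runs it. */
final class ClustalwRunner {

    /** Assembles the argument list; flag names are assumptions, see note above. */
    static List<String> buildCommand(String executable, String inFile, String outFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add(executable);
        cmd.add("-INFILE=" + inFile);
        cmd.add("-OUTFILE=" + outFile);
        cmd.add("-ALIGN");
        return cmd;
    }

    /** Runs the command and returns its combined stdout/stderr; fails on a non-zero exit. */
    static String run(List<String> command) {
        try {
            Process p = new ProcessBuilder(command)
                    .redirectErrorStream(true) // merge stderr into stdout so one reader drains both
                    .start();
            StringBuilder out = new StringBuilder();
            try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = r.readLine()) != null) {
                    out.append(line).append('\n');
                }
            }
            int exit = p.waitFor();
            if (exit != 0) {
                throw new IllegalStateException("Exit code " + exit + ":\n" + out);
            }
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for process", e);
        }
    }
}
```

Usage would be along the lines of
`ClustalwRunner.run(ClustalwRunner.buildCommand("clustalw", "in.fasta", "out.aln"))`,
after which the alignment file can be parsed separately.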

Best regards

Sylvain

On 10/07/09 12:00, "[NAME]" <[ADDRESS]> wrote:

> Hi everyone,
> 
> Do somebody have (as examples of biojava cookbook) a usage example of
> ExternalProcess class ?
> 
> Let's say we want to run a local clustalw program with it, is it possible ?
> 
> Any example code ?
> 
> Thank you


===================================================================

 Sylvain Foisy, Ph. D.
 Consultant Bio-informatique / Bioinformatics
 Diploide.net - TI pour la vie / IT for Life

 Courriel: sylvain.foisy at diploide.net
 Web: http://www.diploide.net
 Tel: (514) 893-4363
===================================================================


From hlapp at gmx.net  Sat Jul 11 11:47:34 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 11 Jul 2009 07:47:34 -0400
Subject: [Biojava-dev] [BioSQL-l] Problems in DB2 with VARCHAR,
	TEXT and CLOB using BioJava
In-Reply-To: <200907091716.13639.florian.mittag@uni-tuebingen.de>
References: <200907091716.13639.florian.mittag@uni-tuebingen.de>
Message-ID: <5614AEDA-3406-4844-8690-7653A2C4297C@gmx.net>

Hi Florian:

On Jul 9, 2009, at 11:16 AM, Florian Mittag wrote:

> [...]
> 2) Change the data type to something comparable with "=", like  
> VARCHAR.

That's the way to go. The reason they are not VARCHAR in MySQL is  
because VARCHAR is limited to 255 characters there (before MySQL 5.0.3).

> [...]
> Although VARCHARs in DB2 can have a length of theoretically 32767,  
> in reality
> they are limited by the page size of the database, which can be 32K at
> maximum. Since this particular table "reference" has three columns  
> of this
> type, the sum of their lengths must not exceed 32767, so they could  
> only be
> something like VARCHAR(10000).

That sounds great though. You may have noticed that the columns are  
all of type VARCHAR in the Oracle version of the schema with the  
following widths:

        Title                VARCHAR2(1000)
        Authors              VARCHAR2(4000)
        Location             VARCHAR2(512)

That has always served me well. Feel free to use larger widths though  
if you think you need them.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From HWillis at scripps.edu  Sat Jul 11 11:08:32 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Sat, 11 Jul 2009 07:08:32 -0400
Subject: [Biojava-dev] ExternalProcess class
In-Reply-To: <C67CE7DD.13D14%sylvain.foisy@diploide.net>
References: <mailman.25.1247241605.31128.biojava-dev@lists.open-bio.org>,
	<C67CE7DD.13D14%sylvain.foisy@diploide.net>
Message-ID: <FF46E20B7EF40940B038819E2CFBA8522B0032E4FB@CCRMB1.fl.ad.scripps.edu>

BioJava already has a Clustalw class that executes ClustalW as an external process, if that is your larger goal.

http://www.biojava.org/wiki/BioJava:Tutorial:MultiAlignClustalW

Thanks

Scooter
________________________________________
From: biojava-dev-bounces at lists.open-bio.org [biojava-dev-bounces at lists.open-bio.org] On Behalf Of Sylvain Foisy [sylvain.foisy at diploide.net]
Sent: Friday, July 10, 2009 12:43 PM
To: biojava-dev at lists.open-bio.org
Subject: Re: [Biojava-dev] ExternalProcess class

Hi,

There is no such example to the best of my knowledge. If you do use this
class, you are welcome to share your experience by contributing. As far as
running something like clustalw, I don't see why you could not make use of
this class. You could actually build a wrapper class to execute clustalw and
do something with its output.

Best regards

Sylvain

On 10/07/09 12:00, "[NAME]" <[ADDRESS]> wrote:

> Hi everyone,
>
> Do somebody have (as examples of biojava cookbook) a usage example of
> ExternalProcess class ?
>
> Let's say we want to run a local clustalw program with it, is it possible ?
>
> Any example code ?
>
> Thank you


===================================================================

 Sylvain Foisy, Ph. D.
 Consultant Bio-informatique / Bioinformatics
 Diploide.net - TI pour la vie / IT for Life

 Courriel: sylvain.foisy at diploide.net
 Web: http://www.diploide.net
 Tel: (514) 893-4363
===================================================================




From paolo.pavan at gmail.com  Sun Jul 12 21:41:05 2009
From: paolo.pavan at gmail.com (Paolo Pavan)
Date: Sun, 12 Jul 2009 23:41:05 +0200
Subject: [Biojava-dev] Fwd: Assembly data reading
In-Reply-To: <56be91b60907090858t41f2c72cwf7db057e6390d6db@mail.gmail.com>
References: <56be91b60907090858t41f2c72cwf7db057e6390d6db@mail.gmail.com>
Message-ID: <56be91b60907121441h7602d917m496b675d0fa9fd68@mail.gmail.com>

Hi,
I would like to repost, with some adjustments, a question I asked some
time ago, because this may be the more appropriate list. Apologies for the
repetition.
Can someone kindly give me some advice?

thank you in advance,
Paolo


---------- Forwarded message ----------
From: Paolo Pavan <paolo.pavan at gmail.com>
Date: 2009/7/9
Subject: Assembly data reading
To: Biojava-l at lists.open-bio.org


Hi everybody,
I'm fairly new to this topic, and I would like to know if there is
something that can help me load data from a large 454 contig into my
Java program. I also need to keep in memory, and access, data from the
single reads forming the contig.
I suppose this information is in a *.sff file; if it is not
possible to load such a file, it would be fine to load the *.ace (phrap)
data file that I also have.
Many thanks for any suggestion you can give me!

Greetings,
Paolo


From holland at eaglegenomics.com  Mon Jul 13 05:20:35 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 13 Jul 2009 06:20:35 +0100
Subject: [Biojava-dev] Fwd: Assembly data reading
In-Reply-To: <56be91b60907121441h7602d917m496b675d0fa9fd68@mail.gmail.com>
References: <56be91b60907090858t41f2c72cwf7db057e6390d6db@mail.gmail.com>
	<56be91b60907121441h7602d917m496b675d0fa9fd68@mail.gmail.com>
Message-ID: <1247462435.25217.7.camel@buzzybee>

Nothing within BJ can parse the 454 .sff files directly. However I think
there is a growing need for it so if anyone is willing to contribute
code, it would be very welcome.

There is also no .ace parser, although in 2007 someone volunteered to
write one but nothing happened, and there was a previous post (many
years ago!) from someone else who already had some working code but
again nothing seems to have happened: 

http://portal.open-bio.org/pipermail/biojava-l/2001-June/001283.html
http://lists.open-bio.org/pipermail/biojava-l/2007-July/005900.html

So to start with, someone (perhaps yourself? that would be nice! :) )
needs to volunteer to write either a .ace or .sff parser, or both. 

The thing to bear in mind with 454 contigs as you rightly point out is
the sheer size of the things. The requirement to keep them entirely in
memory is likely to be unworkable as it would leave little room for
anything else to run on your average machine. I would suggest either
memory-mapping the file itself, or parsing and writing out a
memory-mapped summary file containing the bits of data you're interested
in. (Memory-mapping is where you keep an index in memory indicating
where in the file each record is, so that when you need to access them
you load them on-the-fly from the file and drop them out of memory again
immediately after use. An accelerated form of this is to put the loaded
records into some kind of LRU cache which holds only the most recently
accessed records and then check that cache first to see if you've
already loaded the record before accessing the file directly.)
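
The index-plus-LRU-cache idea above can be sketched as follows. This is only
an illustration under simplifying assumptions: records are taken to be
newline-terminated lines of text rather than real .sff or .ace records, and
the class name IndexedRecordFile is hypothetical. Strictly speaking this is
indexed random access; mapping the file itself into memory would instead go
through java.nio (FileChannel.map).

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Keeps only record offsets in memory; record bodies are loaded on demand. */
final class IndexedRecordFile {
    private final RandomAccessFile file;
    private final List<long[]> index = new ArrayList<>(); // {offset, length} per record
    private final Map<Integer, String> cache;
    int diskReads = 0; // exposed so the effect of the cache is visible

    IndexedRecordFile(String path, final int cacheSize) {
        try {
            // One sequential pass records where each line starts and how long it is.
            try (InputStream in = new BufferedInputStream(new FileInputStream(path))) {
                long pos = 0;
                long start = 0;
                int b;
                while ((b = in.read()) != -1) {
                    pos++;
                    if (b == '\n') {
                        index.add(new long[] {start, pos - 1 - start});
                        start = pos;
                    }
                }
                if (pos > start) {
                    index.add(new long[] {start, pos - start}); // last line without newline
                }
            }
            file = new RandomAccessFile(path, "r");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Access-ordered LinkedHashMap that evicts the least recently used entry.
        cache = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > cacheSize;
            }
        };
    }

    /** Demo helper: writes the content to a temporary file and indexes it. */
    static IndexedRecordFile overTempFile(String content, int cacheSize) {
        try {
            Path p = Files.createTempFile("records", ".txt");
            p.toFile().deleteOnExit();
            Files.write(p, content.getBytes(StandardCharsets.UTF_8));
            return new IndexedRecordFile(p.toString(), cacheSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int size() {
        return index.size();
    }

    /** Returns record i, from the cache if possible, otherwise from the file. */
    String get(int i) {
        String hit = cache.get(i);
        if (hit != null) {
            return hit;
        }
        long[] entry = index.get(i);
        byte[] buf = new byte[(int) entry[1]];
        try {
            file.seek(entry[0]);
            file.readFully(buf);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        diskReads++;
        String record = new String(buf, StandardCharsets.UTF_8);
        cache.put(i, record);
        return record;
    }
}
```

The memory cost is then one small index entry per record plus at most
cacheSize loaded records, regardless of how large the file is.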

cheers,
Richard


On Sun, 2009-07-12 at 23:41 +0200, Paolo Pavan wrote:
> Hi,
> I would like to post again with some adjustments a question I put some
> times ago because maybe this is a more correct list, apologize for the
> repeating.
> Can someone kindly give me his advise?
> 
> thank you in advance,
> Paolo
> 
> 
> ---------- Forwarded message ----------
> From: Paolo Pavan <paolo.pavan at gmail.com>
> Date: 2009/7/9
> Subject: Assembly data reading
> To: Biojava-l at lists.open-bio.org
> 
> 
> Hi everybody,
> I'm almost new to this topic, I would like to know if there is
> something can help me to load in my java program data from a large 454
> contig. I need to retain in memory and access data from the single
> reads forming the contig too.
> I suppose these informations are in a *.sff file, if it is not
> possible to load such file it should be ok to load a *.ace (phrap)
> data file that I have too.
> Many thanks for any suggestion you can give me!
> 
> Greetings,
> Paolo
-- 
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From markjschreiber at gmail.com  Mon Jul 13 06:29:56 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Mon, 13 Jul 2009 14:29:56 +0800
Subject: [Biojava-dev] Fwd: Assembly data reading
In-Reply-To: <1247462435.25217.7.camel@buzzybee>
References: <56be91b60907090858t41f2c72cwf7db057e6390d6db@mail.gmail.com> 
	<56be91b60907121441h7602d917m496b675d0fa9fd68@mail.gmail.com> 
	<1247462435.25217.7.camel@buzzybee>
Message-ID: <93b45ca50907122329g65cef108h112399da01d6b322@mail.gmail.com>

I would agree that there is a strong need for this kind of thing in biojava.

As Richard says you probably can't fit it in memory so you may want to
memory map it. There are classes in the java.nio package that can help a
lot with this.
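
A minimal sketch of what memory-mapping with java.nio might look like. The
class name MappedSlice and its read method are made up for the example;
FileChannel.map and MappedByteBuffer are the standard library pieces. Note
that a single mapping cannot exceed Integer.MAX_VALUE bytes, so a very large
file would have to be mapped in several windows.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: read a byte range of a file through a read-only mapping. */
class MappedSlice {
    static String read(Path path, long offset, int length) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            // Map only the requested window; the OS pages it in on demand.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, offset, length);
            byte[] out = new byte[length];
            buf.get(out);
            // Assumes the region holds ASCII text such as sequence data.
            return new String(out, StandardCharsets.US_ASCII);
        }
    }
}
```

Calling MappedSlice.read(path, 10, 4) would return the four bytes starting
at offset 10 without reading the rest of the file into the heap.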

Also I have had some success with in-memory compression of large files using
LZ compression. Essentially the memory representation of the file is LZ
compressed and compression and decompression are handled on the fly. Again
there are Java utility classes that can help.
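
As a rough illustration of on-the-fly compression, here is a sketch that
uses java.util.zip. That choice is an assumption: the message does not say
which utility classes were used, and Deflater/Inflater implement DEFLATE
(LZ77 plus Huffman coding), which may differ from the LZ variant meant
above. The class name CompressedBlock is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical holder that keeps a byte block compressed while it sits in memory. */
class CompressedBlock {
    private final byte[] packed;
    private final int originalLength;

    CompressedBlock(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        packed = out.toByteArray();
        originalLength = raw.length;
    }

    /** Decompresses a fresh copy; the caller drops it again after use. */
    byte[] unpack() throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        byte[] raw = new byte[originalLength];
        int off = 0;
        while (off < raw.length && !inflater.finished()) {
            int n = inflater.inflate(raw, off, raw.length - off);
            if (n == 0) break; // no progress: input exhausted or corrupt
            off += n;
        }
        inflater.end();
        if (off != raw.length) throw new DataFormatException("truncated data");
        return raw;
    }

    int packedSize() { return packed.length; }
}
```

Repetitive data such as nucleotide text tends to compress well, at the cost
of CPU time on every access.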

- Mark

On Mon, Jul 13, 2009 at 1:20 PM, Richard Holland
<holland at eaglegenomics.com> wrote:

> Nothing within BJ can parse the 454 .sff files directly. However I think
> there is a growing need for it so if anyone is willing to contribute
> code, it would be very welcome.
>
> There is also no .ace parser, although in 2007 someone volunteered to
> write one but nothing happened, and there was a previous post (many
> years ago!) from someone else who already had some working code but
> again nothing seems to have happened:
>
> http://portal.open-bio.org/pipermail/biojava-l/2001-June/001283.html
> http://lists.open-bio.org/pipermail/biojava-l/2007-July/005900.html
>
> So to start with, someone (perhaps yourself? that would be nice! :) )
> needs to volunteer to write either a .ace or .sff parser, or both.
>
> The thing to bear in mind with 454 contigs as you rightly point out is
> the sheer size of the things. The requirement to keep them entirely in
> memory is likely to be unworkable as it would leave little room for
> anything else to run on your average machine. I would suggest either
> memory-mapping the file itself, or parsing and writing out a
> memory-mapped summary file containing the bits of data you're interested
> in. (Memory-mapping is where you keep an index in memory indicating
> where in the file each record is, so that when you need to access them
> you load them on-the-fly from the file and drop them out of memory again
> immediately after use. An accelerated form of this is to put the loaded
> records into some kind of LRU cache which holds only the most recently
> accessed records and then check that cache first to see if you've
> already loaded the record before accessing the file directly.)
>
> cheers,
> Richard


From sreekanth.m at ocimumbio.com  Mon Jul 13 11:28:52 2009
From: sreekanth.m at ocimumbio.com (Sreekanth Mogullapally)
Date: Mon, 13 Jul 2009 16:58:52 +0530
Subject: [Biojava-dev] Reg: Quality values of ABI File
Message-ID: <2DDF09AFEB46E54894A3843CEF9CB3A4409F82CA8E@EXCHMB.ocimumbio.com>

Hi Everybody,

I am working with "ABI" Files. I am able to get the pixel values for Chromatogram viewer from it,
but I need the Quality values for each base.
Please help me in this regard.

Thanks in Advance
Sreekanth.M


From fbristow at gmail.com  Mon Jul 13 13:15:40 2009
From: fbristow at gmail.com (Franklin Bristow)
Date: Mon, 13 Jul 2009 08:15:40 -0500
Subject: [Biojava-dev] Reg: Quality values of ABI File
In-Reply-To: <2DDF09AFEB46E54894A3843CEF9CB3A4409F82CA8E@EXCHMB.ocimumbio.com>
References: <2DDF09AFEB46E54894A3843CEF9CB3A4409F82CA8E@EXCHMB.ocimumbio.com>
Message-ID: <50a7756d0907130615i3d7cca5ar941f5d2d09657e5@mail.gmail.com>

Hi Sreekanth,
The quality values are stored under the PCON 1 and 2 tags.  The information
you need is in the offsetData field of the TaggedDataRecord.  You can treat
this byte array as an array of shorts containing the quality values for each
base.
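
A hedged sketch of turning such a byte array into per-base numbers. It
takes a plain byte[] (standing in for the offsetData field) and does not
call any BioJava API; the class and method names are made up. The ABIF
document linked in this message appears to list PCON as one byte per base,
so fromBytes is probably what you want; fromShorts is included only in case
your data really holds 16-bit values (ABIF is big-endian). Please check
against your own files.

```java
import java.nio.ByteBuffer;

/** Hypothetical helpers for decoding a PCON payload; not part of BioJava. */
class PconQualities {
    /** Assumes one unsigned byte per base. */
    static int[] fromBytes(byte[] data) {
        int[] q = new int[data.length];
        for (int i = 0; i < data.length; i++) {
            q[i] = data[i] & 0xFF; // mask to avoid negative values
        }
        return q;
    }

    /** Assumes two bytes per base, big-endian (ByteBuffer's default order). */
    static int[] fromShorts(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int[] q = new int[data.length / 2];
        for (int i = 0; i < q.length; i++) {
            q[i] = buf.getShort();
        }
        return q;
    }
}
```

If the number of values returned does not match the number of called bases,
the other interpretation is likely the right one.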

Take a look at this PDF for more information about the different tags
available to you:
http://www.appliedbiosystems.com/support/software_community/ABIF_File_Format.pdf

-- 
Franklin



From sylvain.foisy at diploide.net  Mon Jul 13 13:16:34 2009
From: sylvain.foisy at diploide.net (Sylvain Foisy)
Date: Mon, 13 Jul 2009 09:16:34 -0400
Subject: [Biojava-dev] ExternalProcess class
In-Reply-To: <FF46E20B7EF40940B038819E2CFBA8522B0032E4FB@CCRMB1.fl.ad.scripps.edu>
Message-ID: <C680ABF2.13D3D%sylvain.foisy@diploide.net>

Hi Scooter,

Actually, this is not in BJ 1.7. This material was created by Dickson
Guedes but never formalized into a class for BJ. Maybe now would be the
time to do so?

Best regards

Sylvain


On 11/07/09 07:08, "[NAME]" <[ADDRESS]> wrote:

> Biojava already has a Clustalw class that executes Clustalw as an external
> process  if that is your larger goal.
> 
> http://www.biojava.org/wiki/BioJava:Tutorial:MultiAlignClustalW
> 
> Thanks
> 
> Scooter
> ________________________________________
> From: biojava-dev-bounces at lists.open-bio.org
> [biojava-dev-bounces at lists.open-bio.org] On Behalf Of Sylvain Foisy
> [sylvain.foisy at diploide.net]
> Sent: Friday, July 10, 2009 12:43 PM
> To: biojava-dev at lists.open-bio.org
> Subject: Re: [Biojava-dev] ExternalProcess class
> 
> Hi,
> 
> There is no such example to the best of my knowledge. If you do use this
> class, you are welcome to share your experience by contributing. As far as
> running something like clustalw, I don't see why you could not make use of
> this class. You could actually build a wrapper class to execute clustalw and
> do something with its output.
> 
> Best regards
> 
> Sylvain
> 
> On 10/07/09 12:00, "[NAME]" <[ADDRESS]> wrote:
> 
>> Hi everyone,
>> 
>> Does anybody have a usage example of the ExternalProcess class (like
>> the examples in the BioJava cookbook)?
>> 
>> Say we want to run a local clustalw program with it: is that possible?
>> 
>> Any example code?
>> 
>> Thank you
> 
> 
> ===================================================================
> 
>  Sylvain Foisy, Ph. D.
>  Consultant Bio-informatique / Bioinformatics
>  Diploide.net - TI pour la vie / IT for Life
> 
>  Courriel: sylvain.foisy at diploide.net
>  Web: http://www.diploide.net
>  Tel: (514) 893-4363
> ===================================================================
> 
> 
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev


From markjschreiber at gmail.com  Mon Jul 13 13:21:42 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Mon, 13 Jul 2009 21:21:42 +0800
Subject: [Biojava-dev] Reg: Quality values of ABI File
In-Reply-To: <2DDF09AFEB46E54894A3843CEF9CB3A4409F82CA8E@EXCHMB.ocimumbio.com>
References: <2DDF09AFEB46E54894A3843CEF9CB3A4409F82CA8E@EXCHMB.ocimumbio.com>
Message-ID: <93b45ca50907130621m6ebd37bn72aad9fafe195db7@mail.gmail.com>

Hi -
You would usually use a program like Phred/Phrap for this. There is a
BioJava package for reading and processing the Phred output.

- Mark



From sreekanth.m at ocimumbio.com  Mon Jul 13 14:49:26 2009
From: sreekanth.m at ocimumbio.com (Sreekanth Mogullapally)
Date: Mon, 13 Jul 2009 20:19:26 +0530
Subject: [Biojava-dev] Reg: Quality values of ABI File
In-Reply-To: <50a7756d0907130615i3d7cca5ar941f5d2d09657e5@mail.gmail.com>
References: <2DDF09AFEB46E54894A3843CEF9CB3A4409F82CA8E@EXCHMB.ocimumbio.com>
	<50a7756d0907130615i3d7cca5ar941f5d2d09657e5@mail.gmail.com>
Message-ID: <2DDF09AFEB46E54894A3843CEF9CB3A4409F82CBC6@EXCHMB.ocimumbio.com>

Dear Franklin,

Thank you very much for your quick response.
I am now able to get the quality values, which meets my requirement.


Thanks & Regards,
Sreekanth.M



From holland at eaglegenomics.com  Mon Jul 13 17:14:13 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 13 Jul 2009 18:14:13 +0100
Subject: [Biojava-dev] Hackathon
Message-ID: <1247505253.27493.15.camel@buzzybee>

Hi all.

Andreas and I would like to organise a hackathon to get the
modularisation and general improvement plans for BJ3 into action, and
bring the project forward into the 21st century (only 10 years late!).

At this time I'm trying to gather interest and gauge who might
realistically be able to attend. We will attempt to site the hackathon
at a location closest to the majority of attendees.

To help me plan numbers and likely costs (for potential sponsors) could
all those who are interested please answer the following questions for
me:

 1. Name,
 2. Specialist interest within Biojava (e.g. proteomics, microarrays,
sequencing, etc.),
 3. Your physical location (country and nearest major city - e.g.
Cambridge, London, Newcastle, San Diego, Singapore, etc.),
 4. Whether you think your employer would help pay your airfare and/or 1
week in a hotel to attend (and how far you think you could go on such
funding),
 5. Approximate availability for the next 12 months.

To get the ball rolling, here's me:

 1. Richard Holland,
 2. Making the whole thing more consistent, efficient, and easier to
use,
 3. Southampton (UK),
 4. Possibly but probably only within UK/Europe,
 5. Only available in mid-Jan 2010, otherwise can't do anything until
mid-March 2010 onwards.

Looking forward to hearing your comments! Once I have a good idea of
numbers and distribution, I can get some costs together to give you (and
any potential sponsors) the best idea of what might be involved.

cheers,
Richard

-- 
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From HWillis at scripps.edu  Mon Jul 13 20:12:16 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Mon, 13 Jul 2009 16:12:16 -0400
Subject: [Biojava-dev] Hackathon
In-Reply-To: <1247505253.27493.15.camel@buzzybee>
Message-ID: <C6810D60.1652%HWillis@scripps.edu>

Richard

I would be up for a week away as a "vacation" to do some Java programming. So I am flexible on all elements of time and ability to travel. Bonus points for going some place where the weather is reasonable for the location since it appears we have a global option (January in the UK not my first choice/January in Colorado better choice). Probably wouldn't be a bad idea to try and bookend/overlap a bioinformatics related conference to help justify travel costs for those who need support from work.

We should also consider online options for those who can't travel but can allocate the time.

Thanks

Scooter




From paolo.pavan at gmail.com  Tue Jul 14 16:08:11 2009
From: paolo.pavan at gmail.com (Paolo Pavan)
Date: Tue, 14 Jul 2009 18:08:11 +0200
Subject: [Biojava-dev] Fwd: Assembly data reading
In-Reply-To: <93b45ca50907122329g65cef108h112399da01d6b322@mail.gmail.com>
References: <56be91b60907090858t41f2c72cwf7db057e6390d6db@mail.gmail.com>
	<56be91b60907121441h7602d917m496b675d0fa9fd68@mail.gmail.com>
	<1247462435.25217.7.camel@buzzybee>
	<93b45ca50907122329g65cef108h112399da01d6b322@mail.gmail.com>
Message-ID: <56be91b60907140908t6d6aa906ifcd334afd9e44882@mail.gmail.com>

Dear all,
I took a day to do a quick search and get a clearer picture of the
situation.
- I found the specification of the .sff file in the 454 instrument
  manual; it is fully described and seems sufficient to build a
  reader.
- However, on a more careful reading, it seems that a *.sff file carries
  no information about the automatic contig assembly and only stores
  flowgram data, i.e. the "reads" (unlike a *.ace file).
- Two hidden binary files can be found in a 454 gsAssembler project
  folder: .ChordMatrixMetadata and .SeqCacheMetadata. They are not
  described in the manual, but the former seems to contain the
  nucleotide data and the latter the read names. They are big enough to
  contain that kind of data; the problem is that we don't know how to
  parse them.
- We need to decide on a "memory structure" in which to store the
  information read. I agree with the "memory mapping" solution, perhaps
  implemented with a Map object that associates the name of each read
  with its location in the file.
- The parser class should then expose methods to:
	1) iterate through reads, though this may be heavy and could perhaps be avoided
	2) access a read sequence by name
- If the parser should manage the assembled contigs too (which depends
  on the problem described in the third bullet point), it should
  expose methods to:
	1) iterate through contig names
	2) iterate through contig consensus sequences
	3) access a consensus sequence by name (this is a sub-problem of point 2)
	4) access arbitrary aligned portions (I mean a "slice") of the assembly,
	   given start-end positions, returning an alignment object
- Any more suggestions?
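
To make the method list above more concrete, here is a rough sketch of
what such an API could look like. Every name in it (AssemblySource,
InMemoryAssembly and the methods) is hypothetical and nothing like this
exists in BioJava yet. It also simplifies point 4: the slice method returns
a piece of the consensus string rather than an alignment object.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical API following the method list above. */
interface AssemblySource {
    Iterator<String> readNames();
    String readSequence(String readName);
    Iterator<String> contigNames();
    // Iterating consensus sequences = iterating names and calling this.
    String consensus(String contigName);
    /** Simplified slice: consensus substring, 1-based inclusive coordinates. */
    String consensusSlice(String contigName, int start, int end);
}

/** Toy in-memory implementation, only to show how the interface is used. */
class InMemoryAssembly implements AssemblySource {
    private final Map<String, String> reads = new LinkedHashMap<>();
    private final Map<String, String> contigs = new LinkedHashMap<>();

    void addRead(String name, String seq) { reads.put(name, seq); }
    void addContig(String name, String consensus) { contigs.put(name, consensus); }

    public Iterator<String> readNames() { return reads.keySet().iterator(); }
    public String readSequence(String readName) { return reads.get(readName); }
    public Iterator<String> contigNames() { return contigs.keySet().iterator(); }
    public String consensus(String contigName) { return contigs.get(contigName); }

    public String consensusSlice(String contigName, int start, int end) {
        // Convert 1-based inclusive positions to Java's 0-based half-open range.
        return contigs.get(contigName).substring(start - 1, end);
    }
}
```

A file-backed implementation could keep the same interface and replace the
two maps with a name-to-offset index, so callers would not need to change.
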
I would be glad to get involved in the BioJava community through this
project and I could try, but first of all I want to say that I'm not a
guru like most of the people here ( :-p ) and, to tell the truth, the job
my company requires of me is different, so if a workaround exists I
should honestly choose it.
So let me think a bit about starting such an adventure; if I can combine
my job with contributing to the community's growth, I'll be happy to
share my work! Any suggestions welcome.

Bye bye,
Paolo




From aradwen at gmail.com  Wed Jul 15 11:10:03 2009
From: aradwen at gmail.com (Radwen ANIBA)
Date: Wed, 15 Jul 2009 13:10:03 +0200
Subject: [Biojava-dev] PsiPred
Message-ID: <e591b1bd0907150410jb37063bl66e4a986d7134e87@mail.gmail.com>

Hi mates,

I was wondering whether BioJava can parse PsiPred output (protein secondary
structure predictions), e.g. reporting that a protein has a helix from
position ... to ..., a sheet from ... to ..., and so on. Is there any class
or method written for that purpose? If so, I'm interested.

thank you


From andreas at sdsc.edu  Wed Jul 15 15:13:07 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 15 Jul 2009 08:13:07 -0700
Subject: [Biojava-dev] PsiPred
In-Reply-To: <e591b1bd0907150410jb37063bl66e4a986d7134e87@mail.gmail.com>
References: <e591b1bd0907150410jb37063bl66e4a986d7134e87@mail.gmail.com>
Message-ID: <59a41c430907150813q7fd58a81uc2d4372f01d2e89a@mail.gmail.com>

Hi Radwen,

At present there is no parser for PsiPred. I would be happy to receive any
contribution for that...

Andreas




From heuermh at acm.org  Wed Jul 15 19:22:14 2009
From: heuermh at acm.org (Michael Heuer)
Date: Wed, 15 Jul 2009 15:22:14 -0400 (EDT)
Subject: [Biojava-dev] Hackathon
In-Reply-To: <1247505253.27493.15.camel@buzzybee>
Message-ID: <Pine.GSO.4.44.0907151504410.1033-100000@shell3.shore.net>

Richard Holland wrote:

> Andreas and I would like to organise a hackathon to get the
> modularisation and general improvement plans for BJ3 into action, and
> bring the project forward into the 21st century (only 10 years late!).
>
> At this time I'm trying to gather interest and gauge who might
> realistically be able to attend. We will attempt to site the hackathon
> at a location closest to the majority of attendees.
>
> To help me plan numbers and likely costs (for potential sponsors) could
> all those who are interested please answer the following questions for
> me:
>
>  1. Name,
>  2. Specialist interest within Biojava (e.g. proteomics, microarrays,
> sequencing, etc.),
>  3. Your physical location (country and nearest major city - e.g.
> Cambridge, London, Newcastle, San Diego, Singapore, etc.),
>  4. Whether you think your employer would help pay your airfare and/or 1
> week in a hotel to attend (and how far you think you could go on such
> funding),
>  5. Approximate availability for the next 12 months.


A hackathon would be great.

I'm closest to MSP airport.  My employer would probably not cover air fare
or hotel.  I would then recommend choosing an interesting location so
that it would be worth spending out-of-pocket to get there.

I don't use biojava for my day job any more, so I'm most interested in
helping with architecture and build issues.  My day job is currently a lot
of data viz, maybe better integration with viz tools like Cytoscape,
Piccolo2D, prefuse, and Processing would be fun to work on.

   michael


From sreekanth.m at ocimumbio.com  Thu Jul 16 06:07:19 2009
From: sreekanth.m at ocimumbio.com (Sreekanth Mogullapally)
Date: Thu, 16 Jul 2009 11:37:19 +0530
Subject: [Biojava-dev] Reg: SeqIOTools Class
Message-ID: <2DDF09AFEB46E54894A3843CEF9CB3A4409F891673@EXCHMB.ocimumbio.com>


Hi Everybody,

I am new to BioJava. I am trying to read a FASTA file with the following code:

SequenceIterator stream = SeqIOTools.readFastaDNA(new BufferedReader(new FileReader(fileName)));

For this I import the following class:
import org.biojava.bio.seq.io.SeqIOTools;
But I get a warning that "SeqIOTools" is deprecated.

Is there another class that provides all the functionality of the "SeqIOTools" class?

Please advise.

Thanks in Advance
Sreekanth. M


From mark.schreiber at novartis.com  Thu Jul 16 06:11:52 2009
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Thu, 16 Jul 2009 14:11:52 +0800
Subject: [Biojava-dev] Reg: SeqIOTools Class
In-Reply-To: <2DDF09AFEB46E54894A3843CEF9CB3A4409F891673@EXCHMB.ocimumbio.com>
Message-ID: <OF2508938A.213CDBBD-ON482575F5.0021F5AA-482575F5.00220BB2@ah.novartis.com>

Hi -

The replacement for this class is RichSequence.IOTools

- Mark




From sreekanth.m at ocimumbio.com  Thu Jul 16 06:38:31 2009
From: sreekanth.m at ocimumbio.com (Sreekanth Mogullapally)
Date: Thu, 16 Jul 2009 12:08:31 +0530
Subject: [Biojava-dev] Reg: Exporting into fasta format
In-Reply-To: <OF2508938A.213CDBBD-ON482575F5.0021F5AA-482575F5.00220BB2@ah.novartis.com>
References: <2DDF09AFEB46E54894A3843CEF9CB3A4409F891673@EXCHMB.ocimumbio.com>
	<OF2508938A.213CDBBD-ON482575F5.0021F5AA-482575F5.00220BB2@ah.novartis.com>
Message-ID: <2DDF09AFEB46E54894A3843CEF9CB3A4409F8916A7@EXCHMB.ocimumbio.com>

Hi Everybody,

I need to export sequences in FASTA format.
Could you suggest how to do this?
I have written my own code for the export, but I would like to implement it using BioJava.


Thanks & Regards,
Sreekanth. M


From ayates at ebi.ac.uk  Sun Jul 19 19:15:41 2009
From: ayates at ebi.ac.uk (Andy Yates)
Date: Sun, 19 Jul 2009 20:15:41 +0100
Subject: [Biojava-dev] Hackathon
In-Reply-To: <1247505253.27493.15.camel@buzzybee>
References: <1247505253.27493.15.camel@buzzybee>
Message-ID: <3735A549-2F00-418D-9194-638D7C923FC7@ebi.ac.uk>

Okay another one for the ball:

1). Andy Yates
2). Erm well getting it right so I guess that lands me in testing/ 
integration/killing every singleton
3). Cambridge
4). Currently in a Perl group so not a chance really
5). Quite flexible

How does that sound?



From andreas at sdsc.edu  Mon Jul 20 20:11:52 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 20 Jul 2009 13:11:52 -0700
Subject: [Biojava-dev] Fwd: Assembly data reading
In-Reply-To: <56be91b60907140908t6d6aa906ifcd334afd9e44882@mail.gmail.com>
References: <56be91b60907090858t41f2c72cwf7db057e6390d6db@mail.gmail.com>
	<56be91b60907121441h7602d917m496b675d0fa9fd68@mail.gmail.com>
	<1247462435.25217.7.camel@buzzybee>
	<93b45ca50907122329g65cef108h112399da01d6b322@mail.gmail.com>
	<56be91b60907140908t6d6aa906ifcd334afd9e44882@mail.gmail.com>
Message-ID: <59a41c430907201311o4d287651k3adc2069b2c95f61@mail.gmail.com>

Hi Paolo,

Not sure if you got a response to your mail off list. If there is
sufficient interest from the people working on processing the output
of the various sequencers, it would be great if those people would
work together to get a new biojava module started. Most probably
somebody needs to take initiative and lead the development, otherwise
it won't happen.

Cheers,
Andreas


On Tue, Jul 14, 2009 at 9:08 AM, Paolo Pavan<paolo.pavan at gmail.com> wrote:
> Dear all,
> I took a day to do a quick search and get a clearer picture of the
> situation.
>   - I found the specification of the .sff file in the 454 instrument
> manual; it is fully described and seems to be enough to build a
> reader.
>   - However, on a more careful read it seems that a *.sff file carries
> no information about the automatic contig assembly and only stores
> flowgram info, i.e. the "reads" (unlike a *.ace file).
>   - Two hidden binary files can be found in a 454 gsAssembler project
> folder: .ChordMatrixMetadata and .SeqCacheMetadata. They are
> not described in the manual, but they seem to contain the
> nucleotide data and the read names respectively; they are big enough to
> contain that kind of data. The problem is that we don't know how to
> parse them.
>   - It is necessary to decide on a "memory structure" in which to store the
> information read. I agree with the "memory mapping" solution, maybe
> implemented with a Map object that associates the name of each read
> with its location in the file.
>   - The parser class should then expose methods to:
>        1) iterate through reads, though this may be heavy and should be avoidable
>        2) access a read sequence by name
>   - If the parser should manage the assembled contigs too (this
> depends on what is explained in the third bullet point), it should
> expose methods to:
>        1) iterate through contig names
>        2) iterate through contig consensus sequences
>        3) access a consensus sequence by name (this is a sub-problem of point 2)
>        4) access random aligned portions (I mean a "slice") of the assembly
> given start-end positions, returning an alignment object
>   - Any more suggestions?
> I would be glad to be involved in the biojava community through this
> project and I could try, but first of all I want to say that I'm not a
> guru like most of the people here ( :-p ) and, to tell the truth, the job
> my company requires of me is different, so if a workaround exists
> I should honestly choose it.
> So let me think a bit about starting such an adventure; if I can combine
> my job with contributing to the community's growth I'll be happy to share my
> work! Any suggestions welcome.
>
> Bye bye,
> Paolo
>
>
> 2009/7/13 Mark Schreiber <markjschreiber at gmail.com>:
>> I would agree that there is a strong need for this kind of thing in biojava.
>>
>> As Richard says you probably can't fit it in memory so you may want to
>> memory map it. There are classes in the java.nio package that can help a
>> lot with this.
>>
>> Also I have had some success with in-memory compression of large files using
>> LZ compression. Essentially the memory representation of the file is LZ
>> compressed and compression and decompression are handled on the fly. Again
>> there are Java utility classes that can help.
>>
>> - Mark
>>
>> On Mon, Jul 13, 2009 at 1:20 PM, Richard Holland <holland at eaglegenomics.com>
>> wrote:
>>>
>>> Nothing within BJ can parse the 454 .sff files directly. However I think
>>> there is a growing need for it so if anyone is willing to contribute
>>> code, it would be very welcome.
>>>
>>> There is also no .ace parser, although in 2007 someone volunteered to
>>> write one but nothing happened, and there was a previous post (many
>>> years ago!) from someone else who already had some working code but
>>> again nothing seems to have happened:
>>>
>>> http://portal.open-bio.org/pipermail/biojava-l/2001-June/001283.html
>>> http://lists.open-bio.org/pipermail/biojava-l/2007-July/005900.html
>>>
>>> So to start with, someone (perhaps yourself? that would be nice! :) )
>>> needs to volunteer to write either a .ace or .sff parser, or both.
>>>
>>> The thing to bear in mind with 454 contigs as you rightly point out is
>>> the sheer size of the things. The requirement to keep them entirely in
>>> memory is likely to be unworkable as it would leave little room for
>>> anything else to run on your average machine. I would suggest either
>>> memory-mapping the file itself, or parsing and writing out a
>>> memory-mapped summary file containing the bits of data you're interested
>>> in. (Memory-mapping is where you keep an index in memory indicating
>>> where in the file each record is, so that when you need to access them
>>> you load them on-the-fly from the file and drop them out of memory again
>>> immediately after use. An accelerated form of this is to put the loaded
>>> records into some kind of LRU cache which holds only the most recently
>>> accessed records and then check that cache first to see if you've
>>> already loaded the record before accessing the file directly.)
>>>
>>> cheers,
>>> Richard
>>>
>>>
>>> On Sun, 2009-07-12 at 23:41 +0200, Paolo Pavan wrote:
>>> > Hi,
>>> > I would like to post again, with some adjustments, a question I asked some
>>> > time ago, because this may be the more appropriate list; apologies for the
>>> > repetition.
>>> > Can someone kindly give me some advice?
>>> >
>>> > thank you in advance,
>>> > Paolo
>>> >
>>> >
>>> > ---------- Forwarded message ----------
>>> > From: Paolo Pavan <paolo.pavan at gmail.com>
>>> > Date: 2009/7/9
>>> > Subject: Assembly data reading
>>> > To: Biojava-l at lists.open-bio.org
>>> >
>>> >
>>> > Hi everybody,
>>> > I'm almost new to this topic. I would like to know if there is
>>> > something that can help me load data from a large 454 contig into
>>> > my Java program. I also need to retain in memory and access data from
>>> > the single reads forming the contig.
>>> > I suppose this information is in a *.sff file; if it is not
>>> > possible to load such a file, it would be OK to load a *.ace (phrap)
>>> > data file, which I have too.
>>> > Many thanks for any suggestion you can give me!
>>> >
>>> > Greetings,
>>> > Paolo
>>> > _______________________________________________
>>> > biojava-dev mailing list
>>> > biojava-dev at lists.open-bio.org
>>> > http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>> --
>>> Richard Holland, BSc MBCS
>>> Operations and Delivery Director, Eagle Genomics Ltd
>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>>> http://www.eaglegenomics.com/
>>>
>>> _______________________________________________
>>> biojava-dev mailing list
>>> biojava-dev at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
>>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
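
The index-plus-cache approach described in this thread (a Map from read
name to file location, records loaded on demand, an LRU cache in front of
the file) can be sketched roughly as follows. This is only an illustration
under stated assumptions: the input is a hypothetical plain-text file with
one name<TAB>sequence record per line, not the real .sff or .ace layout,
and IndexedReadStore, names() and sequence() are made-up names, not
BioJava API.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: read name -> file offset index plus a small LRU cache. */
class IndexedReadStore implements AutoCloseable {
    private final RandomAccessFile file;
    // Read name -> byte offset of the start of its line in the file.
    private final Map<String, Long> offsets = new LinkedHashMap<>();
    // Access-ordered map that evicts the least recently used sequence.
    private final Map<String, String> cache;

    IndexedReadStore(String path, final int cacheSize) throws IOException {
        file = new RandomAccessFile(path, "r");
        cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > cacheSize;
            }
        };
        // Single pass: remember where each record starts, keep no sequence data.
        long pos = file.getFilePointer();
        String line;
        while ((line = file.readLine()) != null) {
            int tab = line.indexOf('\t');
            if (tab > 0) {
                offsets.put(line.substring(0, tab), pos);
            }
            pos = file.getFilePointer();
        }
    }

    /** Names of all indexed reads, in file order. */
    Set<String> names() {
        return offsets.keySet();
    }

    /** Returns the sequence for a read, or null if the name is unknown. */
    String sequence(String name) throws IOException {
        String cached = cache.get(name);
        if (cached != null) {
            return cached;                 // served from the LRU cache
        }
        Long offset = offsets.get(name);
        if (offset == null) {
            return null;
        }
        file.seek(offset);                 // load the record on the fly
        String line = file.readLine();
        String seq = line.substring(line.indexOf('\t') + 1);
        cache.put(name, seq);
        return seq;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

RandomAccessFile.readLine() is unbuffered, so a real parser would probably
want a buffered or memory-mapped reader (for example java.nio's
FileChannel.map) for the indexing pass; the shape of the index and the
cache would stay the same.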


From matias.piipari at gmail.com  Tue Jul 21 22:07:47 2009
From: matias.piipari at gmail.com (Matias Piipari)
Date: Tue, 21 Jul 2009 23:07:47 +0100
Subject: [Biojava-dev] Hackathon
In-Reply-To: <3735A549-2F00-418D-9194-638D7C923FC7@ebi.ac.uk>
References: <1247505253.27493.15.camel@buzzybee>
	<3735A549-2F00-418D-9194-638D7C923FC7@ebi.ac.uk>
Message-ID: <15cdf3360907211507t6cb9b519y36a61197ae01378f@mail.gmail.com>

 1. Matias Piipari
 2. sequences + sequence motifs
 3. Cambridge
 4. Possible but not terribly likely.
 5. Flexible


On Sun, Jul 19, 2009 at 8:15 PM, Andy Yates <ayates at ebi.ac.uk> wrote:

> Okay another one for the ball:
>
> 1). Andy Yates
> 2). Erm well getting it right so I guess that lands me in
> testing/integration/killing every singleton
> 3). Cambridge
> 4). Currently in a Perl group so not a chance really
> 5). Quite flexible
>
> How does that sound?
>
>
> On 13 Jul 2009, at 18:14, Richard Holland wrote:
>
>  Hi all.
>>
>> Andreas and I would like to organise a hackathon to get the
>> modularisation and general improvement plans for BJ3 into action, and
>> bring the project forward into the 21st century (only 10 years late!).
>>
>> At this time I'm trying to gather interest and gauge who might
>> realistically be able to attend. We will attempt to site the hackathon
>> at a location closest to the majority of attendees.
>>
>> To help me plan numbers and likely costs (for potential sponsors) could
>> all those who are interested please answer the following questions for
>> me:
>>
>> 1. Name,
>> 2. Specialist interest within Biojava (e.g. proteomics, microarrays,
>> sequencing, etc.),
>> 3. Your physical location (country and nearest major city - e.g.
>> Cambridge, London, Newcastle, San Diego, Singapore, etc.),
>> 4. Whether you think your employer would help pay your airfare and/or 1
>> week in a hotel to attend (and how far you think you could go on such
>> funding),
>> 5. Approximate availability for the next 12 months.
>>
>> To get the ball rolling, here's me:
>>
>> 1. Richard Holland,  2. Making the whole thing more consistent,
>> efficient, and easier to use,  3. Southampton (UK),  4. Possibly but
>> probably only within UK/Europe,  5. Only available in mid-Jan 2010,
>> otherwise can't do anything until mid-March 2010 onwards.
>>
>> Looking forward to hearing your comments! Once I have a good idea of
>> numbers and distribution, I can get some costs together to give you (and
>> any potential sponsors) the best idea of what might be involved.
>>
>> cheers,
>> Richard
>>
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>


From HWillis at scripps.edu  Wed Jul 22 18:34:26 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Wed, 22 Jul 2009 14:34:26 -0400
Subject: [Biojava-dev] Bug Tracking
Message-ID: <C68CD3F2.17BE%HWillis@scripps.edu>

Do we have a formal defect/feature request tracking setup for Biojava?

In case we don't have something formally set up, or if we do and are not using it, I wanted to suggest the following.

I am getting ready to install the following jumpbox at work to do subversion/defect tracking/wiki/project management. http://www.jumpbox.com/app/redmine

Everything is already installed in a vmware instance and sitting through the demo it looked very promising. You create a project in redmine and that provisions the subversion repository. You can then report bugs, add feature requests against the project. You can assign defects to members of the project and get timelines and other project management related reports. Combine this with a wiki for doing docs and demo code it looks like an interesting combination. I have been very impressed with what jumpbox is doing in other bundled software components so if they are providing redmine as a jumpbox it is probably very good. So far I have only played around with the demo instance at redmine.

Thanks

Scooter


From andreas at sdsc.edu  Wed Jul 22 19:15:50 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 22 Jul 2009 12:15:50 -0700
Subject: [Biojava-dev] Bug Tracking
In-Reply-To: <C68CD3F2.17BE%HWillis@scripps.edu>
References: <C68CD3F2.17BE%HWillis@scripps.edu>
Message-ID: <59a41c430907221215y10b95473y25fa9d94c82afc3f@mail.gmail.com>

Hi Scooter,

we have bugzilla running at:
http://bugzilla.open-bio.org/

Andreas

On Wed, Jul 22, 2009 at 11:34 AM, Scooter Willis<HWillis at scripps.edu> wrote:
> Do we have a formal defect/feature request tracking setup for Biojava?
>
> In case we don't have something formally set up, or if we do and are not using it, I wanted to suggest the following.
>
> I am getting ready to install the following jumpbox at work to do subversion/defect tracking/wiki/project management. http://www.jumpbox.com/app/redmine
>
> Everything is already installed in a vmware instance and sitting through the demo it looked very promising. You create a project in redmine and that provisions the subversion repository. You can then report bugs, add feature requests against the project. You can assign defects to members of the project and get timelines and other project management related reports. Combine this with a wiki for doing docs and demo code it looks like an interesting combination. I have been very impressed with what jumpbox is doing in other bundled software components so if they are providing redmine as a jumpbox it is probably very good. So far I have only played around with the demo instance at redmine.
>
> Thanks
>
> Scooter
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>


From holland at eaglegenomics.com  Wed Jul 22 19:16:19 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 22 Jul 2009 20:16:19 +0100
Subject: [Biojava-dev] Bug Tracking
In-Reply-To: <C68CD3F2.17BE%HWillis@scripps.edu>
References: <C68CD3F2.17BE%HWillis@scripps.edu>
Message-ID: <1248290179.28124.54.camel@buzzybee>

Yup, we do. It's here:

http://bugzilla.open-bio.org/

cheers,
Richard

On Wed, 2009-07-22 at 14:34 -0400, Scooter Willis wrote:
> Do we have a formal defect/feature request tracking setup for Biojava?
> 
> In case we don't have something formally set up, or if we do and are not using it, I wanted to suggest the following.
> 
> I am getting ready to install the following jumpbox at work to do subversion/defect tracking/wiki/project management. http://www.jumpbox.com/app/redmine
> 
> Everything is already installed in a vmware instance and sitting through the demo it looked very promising. You create a project in redmine and that provisions the subversion repository. You can then report bugs, add feature requests against the project. You can assign defects to members of the project and get timelines and other project management related reports. Combine this with a wiki for doing docs and demo code it looks like an interesting combination. I have been very impressed with what jumpbox is doing in other bundled software components so if they are providing redmine as a jumpbox it is probably very good. So far I have only played around with the demo instance at redmine.
> 
> Thanks
> 
> Scooter
> 
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
-- 
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From jw12 at sanger.ac.uk  Fri Jul 24 09:12:21 2009
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Fri, 24 Jul 2009 10:12:21 +0100
Subject: [Biojava-dev] Hackathon
In-Reply-To: <1247505253.27493.15.camel@buzzybee>
References: <1247505253.27493.15.camel@buzzybee>
Message-ID: <C98C8A44-8C85-46D7-8F31-94DD26FBC9E8@sanger.ac.uk>

Hi Richard and Andreas, you both know where I'm situated, but for the  
record:

1. Jonathan Warren
2. Any DAS related libraries, visualization
3. Cambridge
4. Yes I think so, definitely if in UK.
5. Anytime.... although obviously I'm very busy, just not many  
definite commitments ;)

There seem to be many DAS classes under the biojava-live site, but  
many of them are not used in any of the DAS related code I have, with  
the exception of Structure and Alignment classes that are used in some  
dazzle plugins. So it might be good to update some of these classes to  
new more relevant code. I also notice that Apollo was trying to get  
away from using Biojava code for its DAS 1.5 adapter. Maybe the new
modular design of biojava will resolve issues that Apollo developers  
had?

Anyway- I guess I'm asking Andreas or anyone else if they know the  
history of some of these classes e.g. org.biojava.bio.program.das  
package?


On 13 Jul 2009, at 18:14, Richard Holland wrote:

> 1. Name,
> 2. Specialist interest within Biojava (e.g. proteomics, microarrays,
> sequencing, etc.),
> 3. Your physical location (country and nearest major city - e.g.
> Cambridge, London, Newcastle, San Diego, Singapore, etc.),
> 4. Whether you think your employer would help pay your airfare and/or 1
> week in a hotel to attend (and how far you think you could go on such
> funding),
> 5. Approximate availability for the next 12 months.

Jonathan Warren
Senior Developer and DAS coordinator
jw12 at sanger.ac.uk


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From julie at flymine.org  Fri Jul 24 09:25:03 2009
From: julie at flymine.org (Julie Sullivan)
Date: Fri, 24 Jul 2009 10:25:03 +0100
Subject: [Biojava-dev] Hackathon
In-Reply-To: <15cdf3360907211507t6cb9b519y36a61197ae01378f@mail.gmail.com>
References: <1247505253.27493.15.camel@buzzybee>	<3735A549-2F00-418D-9194-638D7C923FC7@ebi.ac.uk>
	<15cdf3360907211507t6cb9b519y36a61197ae01378f@mail.gmail.com>
Message-ID: <4A697DEF.2020304@flymine.org>

1. Julie Sullivan
2. InterMine uses BioJava to handle sequences and pdb data
3. Cambridge
4. No
5. Flexible

>>> Andreas and I would like to organise a hackathon to get the
>>> modularisation and general improvement plans for BJ3 into action, and
>>> bring the project forward into the 21st century (only 10 years late!).
>>>
>>> At this time I'm trying to gather interest and gauge who might
>>> realistically be able to attend. We will attempt to site the hackathon
>>> at a location closest to the majority of attendees.
>>>
>>> To help me plan numbers and likely costs (for potential sponsors) could
>>> all those who are interested please answer the following questions for
>>> me:
>>>
>>> 1. Name,
>>> 2. Specialist interest within Biojava (e.g. proteomics, microarrays,
>>> sequencing, etc.),
>>> 3. Your physical location (country and nearest major city - e.g.
>>> Cambridge, London, Newcastle, San Diego, Singapore, etc.),
>>> 4. Whether you think your employer would help pay your airfare and/or 1
>>> week in a hotel to attend (and how far you think you could go on such
>>> funding),
>>> 5. Approximate availability for the next 12 months.
>>>
>>> To get the ball rolling, here's me:
>>>
>>> 1. Richard Holland,  2. Making the whole thing more consistent,
>>> efficient, and easier to use,  3. Southampton (UK),  4. Possibly but
>>> probably only within UK/Europe,  5. Only available in mid-Jan 2010,
>>> otherwise can't do anything until mid-March 2010 onwards.
>>>
>>> Looking forward to hearing your comments! Once I have a good idea of
>>> numbers and distribution, I can get some costs together to give you (and
>>> any potential sponsors) the best idea of what might be involved.
>>>
>>> cheers,
>>> Richard
>>>
>>> --
>>> Richard Holland, BSc MBCS
>>> Operations and Delivery Director, Eagle Genomics Ltd
>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>>> http://www.eaglegenomics.com/
>>>
>>> _______________________________________________
>>> biojava-dev mailing list
>>> biojava-dev at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
> 


From gmicha at gmail.com  Fri Jul 24 10:38:04 2009
From: gmicha at gmail.com (Micha Sammeth)
Date: Fri, 24 Jul 2009 12:38:04 +0200
Subject: [Biojava-dev] Hackathon
In-Reply-To: <1247505253.27493.15.camel@buzzybee>
References: <1247505253.27493.15.camel@buzzybee>
Message-ID: <4A698F0C.9090402@gmail.com>

1) Michael Sammeth

2) sequencing, gene expression, splicing, alignment

3) Barcelona beach

4) continental maybe yes, states probably not

5) not available from mid Oct to end of Nov,
    the rest of the time probably just busy as
    usual

Cheers,

micha


Richard Holland wrote:
> Hi all.
> 
> Andreas and I would like to organise a hackathon to get the
> modularisation and general improvement plans for BJ3 into action, and
> bring the project forward into the 21st century (only 10 years late!).
> 
> At this time I'm trying to gather interest and gauge who might
> realistically be able to attend. We will attempt to site the hackathon
> at a location closest to the majority of attendees.
> 
> To help me plan numbers and likely costs (for potential sponsors) could
> all those who are interested please answer the following questions for
> me:
> 
>  1. Name,
>  2. Specialist interest within Biojava (e.g. proteomics, microarrays,
> sequencing, etc.),
>  3. Your physical location (country and nearest major city - e.g.
> Cambridge, London, Newcastle, San Diego, Singapore, etc.),
>  4. Whether you think your employer would help pay your airfare and/or 1
> week in a hotel to attend (and how far you think you could go on such
> funding),
>  5. Approximate availability for the next 12 months.
> 
> To get the ball rolling, here's me:
> 
>  1. Richard Holland,  2. Making the whole thing more consistent,
> efficient, and easier to use,  3. Southampton (UK),  4. Possibly but
> probably only within UK/Europe,  5. Only available in mid-Jan 2010,
> otherwise can't do anything until mid-March 2010 onwards.
> 
> Looking forward to hearing your comments! Once I have a good idea of
> numbers and distribution, I can get some costs together to give you (and
> any potential sponsors) the best idea of what might be involved.
> 
> cheers,
> Richard
> 


-- 
O       o O       o O       o    Dr. Michael Sammeth
| O   o | | O   o | | O   o |         http://www.sammeth.net
| | O | | | | O | GRIB| O   |         Phone: +34-933-160-166
| o   O | | o   O | | o   O |    Fax:   +34 933-969-983
o       O o       O o       O    Dr. Aiguader 88, 08003 Barcelona


From andreas at sdsc.edu  Fri Jul 24 15:23:42 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 24 Jul 2009 08:23:42 -0700
Subject: [Biojava-dev] Hackathon
In-Reply-To: <C98C8A44-8C85-46D7-8F31-94DD26FBC9E8@sanger.ac.uk>
References: <1247505253.27493.15.camel@buzzybee>
	<C98C8A44-8C85-46D7-8F31-94DD26FBC9E8@sanger.ac.uk>
Message-ID: <59a41c430907240823u5dc152i9c6a5854adf9ba0e@mail.gmail.com>

> There seem to be many DAS classes under the biojava-live site, but many of
> them are not used in any of the DAS related code I have, with the exception
> of Structure and Alignment classes that are used in some dazzle plugins.

Much of the org.biojava.bio.program.das code is quite ancient and
should be deprecated... In a new biojava-das module it would be
nice to merge in one of the more modern DAS libraries (dasobert?) as a
replacement...

Andreas


> it might be good to update some of these classes to new more relevant code.
> I also notice that Apollo was trying to get away from using Biojava code for
> its DAS 1.5 adapter. Maybe the new modular design of biojava will resolve
> issues that Apollo developers had?
>
> Anyway- I guess I'm asking Andreas or anyone else if they know the history
> of some of these classes e.g. org.biojava.bio.program.das package?
>
>
>
> On 13 Jul 2009, at 18:14, Richard Holland wrote:
>
>> 1. Name,
>> 2. Specialist interest within Biojava (e.g. proteomics, microarrays,
>> sequencing, etc.),
>> 3. Your physical location (country and nearest major city - e.g.
>> Cambridge, London, Newcastle, San Diego, Singapore, etc.),
>> 4. Whether you think your employer would help pay your airfare and/or 1
>> week in a hotel to attend (and how far you think you could go on such
>> funding),
>> 5. Approximate availability for the next 12 months.
>
> Jonathan Warren
> Senior Developer and DAS coordinator
> jw12 at sanger.ac.uk
>
>
>
>
>
>
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a
> charity registered in England with number 1021457 and a company registered in
> England with number 2742969, whose registered office is 215 Euston Road,
> London, NW1 2BE.
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>


From holland at eaglegenomics.com  Mon Jul 27 13:23:19 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 27 Jul 2009 14:23:19 +0100
Subject: [Biojava-dev] [Biojava-l] How to parse large Genbank files?
In-Reply-To: <200907271416.33485.florian.mittag@uni-tuebingen.de>
References: <200907241929.08768.florian.mittag@uni-tuebingen.de>
	<93b45ca50907241920r60c28931p1b43bf6b6a101b46@mail.gmail.com>
	<200907271416.33485.florian.mittag@uni-tuebingen.de>
Message-ID: <1248700999.2803.103.camel@buzzybee>


> My question to this list again:
> Is there a way to achieve my goal of parsing a 200MB Genbank file with the 
> current biojava version without code changes?

Probably not. The internal requirement to convert everything into
SymbolLists and back again really does get in the way. This is one of
the main drivers behind BioJava3 - to refactor out unnecessary
complexity, of which this is a prime example.

The ideal solution would be to parse the file and keep the sequence as a
string, only to be converted into Symbols when _absolutely necessary_ -
otherwise to remain as a string (or even just as a pointer to a string
stored on a disk-based temporary file repository somewhere, to save
memory). Hibernate et al could then work directly with the string.

cheers,
Richard

> 
> - Florian
> 
> 
> 
> > On 25 Jul 2009, 1:33 AM, "Florian Mittag" <florian.mittag at uni-tuebingen.de>
> > wrote:
> >
> > Hi!
> >
> > I think this is a problem worthy of its own thread, so I'll start one:
> >
> > I want to store all human chromosomes in a BioSQL database after I loaded
> > the
> > information from .gbk files. The files I get from NCBI with the following
> > URIs, where the id ranges from nc_000001 to nc_000024 plus nc_001804:
> >
> > http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=nc_000023&rettype=gbwithparts&retmode=text
> >
> > I then try to parse the files as described in
> > http://biojava.org/wiki/BioJava:BioJavaXDocs#Tools_for_reading.2Fwriting_files
> > but it won't work. While there are no problems parsing 1804 and 24,
> > chromosome 23 leads to an OutOfMemoryError although I gave it 2GB of
> > heap space.
> >
> > Here is a stack trace (the line numbers might differ, because I already
> > tried
> > to improve GenbankFormat.java in memory efficiency):
> >
> > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> >        at org.biojava.bio.seq.io.ChunkedSymbolListFactory.addSymbols(ChunkedSymbolListFactory.java:222)
> >        at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.addSymbols(SimpleRichSequenceBuilder.java:256)
> >        at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:535)
> >        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
> >        at org.prodge.sequence_viewer.db.UpdateDB_Main.updateChromosome(UpdateDB_Main.java:537)
> >        at org.prodge.sequence_viewer.db.UpdateDB_Main.newGenome(UpdateDB_Main.java:468)
> >        at org.prodge.sequence_viewer.db.UpdateDB_Main.main(UpdateDB_Main.java:164)
> >
> > The line in GenbankFormat.java is:
> >
> > rlistener.addSymbols(
> >        symParser.getAlphabet(),
> >        (Symbol[])(sl.toList().toArray(new Symbol[0])),
> >        0, sl.length());
> >
> > Sometimes it fails at the sl.toList().toArray()-part, sometimes it fails
> > later
> > inside the addSymbols method, but it always fails.
> >
> > How can this be? I mean, the file is only 190MB in size, so 2GB of memory
> > should be more than enough. Browsing through the source code, I discovered
> > what I think of as very inefficient handling of sequences:
> >
> > 1) the sequence string is read from file into a StringBuffer
> > 2) it is converted to a string (with whitespaces removed)
> > 3) a SimpleSymbolList is created out of the string
> > 4) the SymbolList is converted to a List of Symbols
> > 5) the List is converted to an array of Symbols
> > 6) the array is passed to addSymbols
> > 7) there it is added to a ChunkedSymbolListFactory
> > 8) if at some point the sequence is requested, a SymbolList is created and
> > then converted to a string.
> >
> > You see, there is a lot of copying and converting, but in the end I have
> > the same string I started with. Well, I had the string, if it ever reached
> > the end, because it will crash before completing this process.
> >
> >
> > Am I doing something wrong or is there a great potential of improving
> > parsing
> > of Genbank files?
> >
> >
> > Regards,
> >   Florian
> > _______________________________________________
> > Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
> 
-- 
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
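
The "keep the sequence as a string" idea above can be sketched with nothing
but the JDK. This is a rough illustration, not the BioJava or BioJavaX
parser: OriginReader and readOrigin are hypothetical names, annotation
lines are simply skipped, and the expected length is assumed to be known
up front (for example from the LOCUS line).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

/** Hypothetical sketch: pull the ORIGIN block of one GenBank record into one buffer. */
class OriginReader {
    /**
     * Reads sequence letters between the ORIGIN line and the terminating "//".
     * The sequence is written once into a presized buffer, with no
     * intermediate lists or arrays of per-base objects.
     */
    static StringBuilder readOrigin(Reader in, int expectedLength) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        StringBuilder seq = new StringBuilder(expectedLength);
        boolean inOrigin = false;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.startsWith("//")) {
                break;                      // end of record
            }
            if (line.startsWith("ORIGIN")) {
                inOrigin = true;            // sequence lines follow
                continue;
            }
            if (!inOrigin) {
                continue;                   // skip header and feature lines
            }
            for (int i = 0; i < line.length(); i++) {
                char c = line.charAt(i);
                if (Character.isLetter(c)) {
                    seq.append(c);          // drop position numbers and spaces
                }
            }
        }
        return seq;
    }
}
```

Conversion to Symbols would then happen later, and only for the region that
is actually needed, rather than for the whole chromosome at parse time.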


From paolo.pavan at gmail.com  Mon Jul 27 16:47:46 2009
From: paolo.pavan at gmail.com (Paolo Pavan)
Date: Mon, 27 Jul 2009 18:47:46 +0200
Subject: [Biojava-dev] [Biojava-l] How to parse large Genbank files?
In-Reply-To: <1248700999.2803.103.camel@buzzybee>
References: <200907241929.08768.florian.mittag@uni-tuebingen.de>
	<93b45ca50907241920r60c28931p1b43bf6b6a101b46@mail.gmail.com>
	<200907271416.33485.florian.mittag@uni-tuebingen.de>
	<1248700999.2803.103.camel@buzzybee>
Message-ID: <56be91b60907270947kf03afa3v7976c569d806ad16@mail.gmail.com>

Calling the garbage collector between the steps wouldn't help,
would it?

2009/7/27 Richard Holland <holland at eaglegenomics.com>:
>
>> My question to this list again:
>> Is there a way to achieve my goal of parsing a 200MB Genbank file with the
>> current biojava version without code changes?
>
> Probably not. The internal requirement to convert everything into
> SymbolLists and back again really does get in the way. This is one of
> the main drivers behind BioJava3 - to refactor out unnecessary
> complexity, of which this is a prime example.
>
> The ideal solution would be to parse the file and keep the sequence as a
> string, only to be converted into Symbols when _absolutely necessary_ -
> otherwise to remain as a string (or even just as a pointer to a string
> stored on a disk-based temporary file repository somewhere, to save
> memory). Hibernate et al could then work directly with the string.
>
> cheers,
> Richard
>
>>
>> - Florian
>>
>>
>>
>> > On 25 Jul 2009, 1:33 AM, "Florian Mittag" <florian.mittag at uni-tuebingen.de>
>> > wrote:
>> >
>> > Hi!
>> >
>> > I think this is a problem worth of its own thread, so I'll start one:
>> >
>> > I want to store all human chromosomes in a BioSQL database after loading
>> > the information from .gbk files. I get the files from NCBI with the
>> > following URIs, where the id ranges from nc_000001 to nc_000024 plus
>> > nc_001804:
>> >
>> > http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=nc_000023&rettype=gbwithparts&retmode=text
>> >
>> > I then try to parse the files as described in
>> > http://biojava.org/wiki/BioJava:BioJavaXDocs#Tools_for_reading.2Fwriting_files
>> > but it won't work. While there are no problems parsing 1804 and 24,
>> > chromosome 23 leads to an OutOfMemoryError although I gave it 2GB of heap
>> > space.
>> >
>> > Here is a stack trace (the line numbers might differ, because I already
>> > tried to improve the memory efficiency of GenbankFormat.java):
>> >
>> > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>> >         at org.biojava.bio.seq.io.ChunkedSymbolListFactory.addSymbols(ChunkedSymbolListFactory.java:222)
>> >         at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.addSymbols(SimpleRichSequenceBuilder.java:256)
>> >         at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:535)
>> >         at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
>> >         at org.prodge.sequence_viewer.db.UpdateDB_Main.updateChromosome(UpdateDB_Main.java:537)
>> >         at org.prodge.sequence_viewer.db.UpdateDB_Main.newGenome(UpdateDB_Main.java:468)
>> >         at org.prodge.sequence_viewer.db.UpdateDB_Main.main(UpdateDB_Main.java:164)
>> >
>> > The line in GenbankFormat.java is:
>> >
>> > rlistener.addSymbols(
>> >         symParser.getAlphabet(),
>> >         (Symbol[])(sl.toList().toArray(new Symbol[0])),
>> >         0, sl.length());
>> >
>> > Sometimes it fails at the sl.toList().toArray()-part, sometimes it fails
>> > later inside the addSymbols method, but it always fails.
>> >
>> > How can this be? I mean, the file is only 190MB in size, so 2GB of memory
>> > should be more than enough. Browsing through the source code, I discovered
>> > what I think of as very inefficient handling of sequences:
>> >
>> > 1) the sequence string is read from file into a StringBuffer
>> > 2) it is converted to a string (with whitespaces removed)
>> > 3) a SimpleSymbolList is created out of the string
>> > 4) the SymbolList is converted to a List of Symbols
>> > 5) the List is converted to an array of Symbols
>> > 6) the array is passed to addSymbols
>> > 7) there it is added to a ChunkedSymbolListFactory
>> > 8) if at some point the sequence is requested, a SymbolList is created and
>> > then converted to a string.
>> >
>> > You see, there is a lot of copying and converting, but in the end I have
>> > the same string I started with. Well, I had the string, if it ever reached
>> > the end, because it will crash before completing this process.
>> >
>> >
>> > Am I doing something wrong or is there a great potential of improving
>> > parsing of Genbank files?
>> >
>> >
>> > Regards,
>> >   Florian
>> > _______________________________________________
>> > Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>


From markjschreiber at gmail.com  Tue Jul 28 02:52:44 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 28 Jul 2009 10:52:44 +0800
Subject: [Biojava-dev] [Biojava-l] How to parse large Genbank files?
In-Reply-To: <56be91b60907270947kf03afa3v7976c569d806ad16@mail.gmail.com>
References: <200907241929.08768.florian.mittag@uni-tuebingen.de> 
	<93b45ca50907241920r60c28931p1b43bf6b6a101b46@mail.gmail.com> 
	<200907271416.33485.florian.mittag@uni-tuebingen.de>
	<1248700999.2803.103.camel@buzzybee> 
	<56be91b60907270947kf03afa3v7976c569d806ad16@mail.gmail.com>
Message-ID: <93b45ca50907271952k689c2f27h78bf7b9cc47e7d45@mail.gmail.com>

Dear Paolo -

Calling the garbage collector is generally not required and often not
recommended. Modern JVMs do a better job of this than programmers do.
Also, a garbage collector cannot release memory that is allocated to
objects that are still referenced. I suspect the problem here is
that objects are being copied and references to the old copies are
being retained. These old copies are not really required, so the
references can be set to null, which will allow the GC to clean them
up.

Also, manually calling the GC is very aggressive and can force the JVM
to dump classes it is not currently using. When such a class is used
again, the classloader will need to reload it, which can result in a
performance hit.
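
As a rough illustration of the point about retained references, here is a
minimal sketch. It is not BioJava code: the class SequenceHolder and its
methods are made up for this example, and the buffer merely stands in for
the intermediate copies made during parsing.

```java
class SequenceHolder {
    // Large intermediate buffer filled while parsing (illustrative only).
    private StringBuilder rawBuffer = new StringBuilder();
    private String sequence;

    void append(String chunk) {
        rawBuffer.append(chunk);
    }

    /** Converts the buffer to the final string and releases the buffer. */
    String finish() {
        sequence = rawBuffer.toString();
        // Without this line the holder keeps the buffer reachable, so the GC
        // cannot reclaim it, no matter how often System.gc() is called.
        rawBuffer = null;
        return sequence;
    }

    boolean holdsBuffer() {
        return rawBuffer != null;
    }

    public static void main(String[] args) {
        SequenceHolder holder = new SequenceHolder();
        holder.append("ACGT");
        holder.append("TTGA");
        System.out.println(holder.finish());
        System.out.println("buffer retained: " + holder.holdsBuffer());
    }
}
```

Once finish() has run, only the final string is still reachable, so the
old copy becomes eligible for collection without any explicit GC call.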

- Mark

On Tue, Jul 28, 2009 at 12:47 AM, Paolo Pavan<paolo.pavan at gmail.com> wrote:
> Calling a garbage collection among the steps doesn't bring to
> anything, isn't it?
>
> 2009/7/27 Richard Holland <holland at eaglegenomics.com>:
>>
>>> My question to this list again:
>>> Is there a way to achieve my goal of parsing a 200MB Genbank file with the
>>> current biojava version without code changes?
>>
>> Probably not. The internal requirement to convert everything into
>> SymbolLists and back again really does get in the way. This is one of
>> the main drivers behind BioJava3 - to refactor out unnecessary
>> complexity, of which this is a prime example.
>>
>> The ideal solution would be to parse the file and keep the sequence as a
>> string, only to be converted into Symbols when _absolutely necessary_ -
>> otherwise to remain as a string (or even just as a pointer to a string
>> stored on a disk-based temporary file repository somewhere, to save
>> memory). Hibernate et al could then work directly with the string.
>>
>> cheers,
>> Richard
>>
>>>
>>> - Florian
>>>
>>>
>>>
>>> > On 25 Jul 2009, 1:33 AM, "Florian Mittag" <florian.mittag at uni-tuebingen.de>
>>> > wrote:
>>> >
>>> > Hi!
>>> >
>>> > I think this is a problem worth of its own thread, so I'll start one:
>>> >
>>> > I want to store all human chromosomes in a BioSQL database after loading
>>> > the information from .gbk files. I get the files from NCBI with the
>>> > following URIs, where the id ranges from nc_000001 to nc_000024 plus
>>> > nc_001804:
>>> >
>>> > http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=nc_000023&rettype=gbwithparts&retmode=text
>>> >
>>> > I then try to parse the files as described in
>>> > http://biojava.org/wiki/BioJava:BioJavaXDocs#Tools_for_reading.2Fwriting_files
>>> > but it won't work. While there are no problems parsing 1804 and 24,
>>> > chromosome 23 leads to an OutOfMemoryError although I gave it 2GB of heap
>>> > space.
>>> >
>>> > Here is a stack trace (the line numbers might differ, because I already
>>> > tried to improve the memory efficiency of GenbankFormat.java):
>>> >
>>> > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>> >         at org.biojava.bio.seq.io.ChunkedSymbolListFactory.addSymbols(ChunkedSymbolListFactory.java:222)
>>> >         at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.addSymbols(SimpleRichSequenceBuilder.java:256)
>>> >         at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:535)
>>> >         at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
>>> >         at org.prodge.sequence_viewer.db.UpdateDB_Main.updateChromosome(UpdateDB_Main.java:537)
>>> >         at org.prodge.sequence_viewer.db.UpdateDB_Main.newGenome(UpdateDB_Main.java:468)
>>> >         at org.prodge.sequence_viewer.db.UpdateDB_Main.main(UpdateDB_Main.java:164)
>>> >
>>> > The line in GenbankFormat.java is:
>>> >
>>> > rlistener.addSymbols(
>>> >         symParser.getAlphabet(),
>>> >         (Symbol[])(sl.toList().toArray(new Symbol[0])),
>>> >         0, sl.length());
>>> >
>>> > Sometimes it fails at the sl.toList().toArray()-part, sometimes it fails
>>> > later inside the addSymbols method, but it always fails.
>>> >
>>> > How can this be? I mean, the file is only 190MB in size, so 2GB of memory
>>> > should be more than enough. Browsing through the source code, I discovered
>>> > what I think of as very inefficient handling of sequences:
>>> >
>>> > 1) the sequence string is read from file into a StringBuffer
>>> > 2) it is converted to a string (with whitespaces removed)
>>> > 3) a SimpleSymbolList is created out of the string
>>> > 4) the SymbolList is converted to a List of Symbols
>>> > 5) the List is converted to an array of Symbols
>>> > 6) the array is passed to addSymbols
>>> > 7) there it is added to a ChunkedSymbolListFactory
>>> > 8) if at some point the sequence is requested, a SymbolList is created and
>>> > then converted to a string.
>>> >
>>> > You see, there is a lot of copying and converting, but in the end I have
>>> > the same string I started with. Well, I had the string, if it ever reached
>>> > the end, because it will crash before completing this process.
>>> >
>>> >
>>> > Am I doing something wrong or is there a great potential of improving
>>> > parsing of Genbank files?
>>> >
>>> >
>>> > Regards,
>>> >   Florian
>>> > _______________________________________________
>>> > Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>> > http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>


From abhishek.vit at gmail.com  Tue Jul 28 19:04:00 2009
From: abhishek.vit at gmail.com (Abhishek Pratap)
Date: Tue, 28 Jul 2009 15:04:00 -0400
Subject: [Biojava-dev] Hi.. Calling Java from Perl
Message-ID: <be9b52410907281204l4ebe3340qade871abdc1c05b5@mail.gmail.com>

Hi Guys

Before I ask the question, let me introduce myself. I am Abhishek,
primarily a bioinformatician, and this is my first mail here. I
realized sooner rather than later that I have to use BioJava to make
my life easier. :)

So basically we have a lot of Perl code into which we would like to plug
some BioJava code and some in-house packages/classes. I am just
wondering what the best way to do so is. Clearly I am not a Java guy,
so please excuse me in case I am asking something very basic.
I found a couple of solutions after a few Google searches, but I am not
sure which is the most efficient one.

Thanks,
-Abhi


From ayates at ebi.ac.uk  Tue Jul 28 21:54:17 2009
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 28 Jul 2009 22:54:17 +0100
Subject: [Biojava-dev] Hi.. Calling Java from Perl
In-Reply-To: <be9b52410907281204l4ebe3340qade871abdc1c05b5@mail.gmail.com>
References: <be9b52410907281204l4ebe3340qade871abdc1c05b5@mail.gmail.com>
Message-ID: <A76BE049-94FA-4C81-BE55-5AB1A118D972@ebi.ac.uk>

Hi Abhi,

Well, to answer your first question: the only real way to do this is by
shelling out to Java. Inter-process communication could then be dealt
with by writing to temporary files or maybe communicating back over
STDOUT.
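
A minimal sketch of what the Java side of such a setup could look like,
assuming one sequence per line on STDIN and one result per line on STDOUT.
The class name SequenceFilter is hypothetical, and the reverse complement is
only a stand-in for whatever BioJava-based method is really needed:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.Reader;

class SequenceFilter {

    /** Reverse-complements a DNA string; a stand-in for the real logic. */
    static String reverseComplement(String dna) {
        StringBuilder out = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            char c = Character.toUpperCase(dna.charAt(i));
            switch (c) {
                case 'A': out.append('T'); break;
                case 'T': out.append('A'); break;
                case 'C': out.append('G'); break;
                case 'G': out.append('C'); break;
                default:  out.append('N'); break; // anything else becomes N
            }
        }
        return out.toString();
    }

    /** Reads one sequence per line and writes one result per line. */
    static void process(Reader in, PrintStream out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            out.println(reverseComplement(line.trim()));
            out.flush(); // flush per line so the caller can read each answer at once
        }
    }

    public static void main(String[] args) throws IOException {
        process(new InputStreamReader(System.in), System.out);
    }
}
```

Because the program keeps reading until STDIN is closed, the calling script
can start it once and keep the pipe open (in Perl, for instance, with
IPC::Open2) rather than launching a new JVM for every sequence.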

The question I would ask you, though, is which particular part of BioJava
you are using. Is there any reason why another similarly named Bio
project (I shall not mention it here, as I think people think I'm
becoming weak when it comes to Perl) cannot be used? When programming,
it is generally a good idea to avoid shelling out to another program
where possible. Sometimes it cannot be avoided, say if you want to run
clustalw, but shelling out just to delete a file, for example, is
unnecessary.

Andy

On 28 Jul 2009, at 20:04, Abhishek Pratap wrote:

> Hi Guys
>
> Before I ask the question, let me introduce myself. I am Abhishek
> primarily a Bioinformatician and this is my first mail here. I
> realized sooner thn later that I have to use BioJava to make my life
> easier. :)
>
> So basically we have a lot of perl code where we would like to plugin
> some Biojava code and some inhouse written packages/classes. I am just
> wondering what is the best way to do so. Clearly I am not a java guy
> so please excuse me in case I am asking something which is very basic.
> I found couple of solutions after few googles but not sure which is
> the efficient one.
>
> Thanks,
> -Abhi
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev


From abhishek.vit at gmail.com  Tue Jul 28 22:23:49 2009
From: abhishek.vit at gmail.com (Abhishek Pratap)
Date: Tue, 28 Jul 2009 18:23:49 -0400
Subject: [Biojava-dev] Hi.. Calling Java from Perl
In-Reply-To: <A76BE049-94FA-4C81-BE55-5AB1A118D972@ebi.ac.uk>
References: <be9b52410907281204l4ebe3340qade871abdc1c05b5@mail.gmail.com>
	<A76BE049-94FA-4C81-BE55-5AB1A118D972@ebi.ac.uk>
Message-ID: <be9b52410907281523y46b1a36ey81a6fba2021c07d0@mail.gmail.com>

Hi Andy

Thanks for the quick reply. I think shelling out will be too process
intensive, as we expect thousands of calls to the same Java method. I also
read about the Perl module Inline::Java. Is that any good?

And to answer your second question, I am basically using an in-house
method which in turn uses a lot of BioJava classes for DNA
manipulation.

Thanks,
-Abhi

On Tue, Jul 28, 2009 at 5:54 PM, Andy Yates<ayates at ebi.ac.uk> wrote:
> Hi Abhi,
>
> Well to answer your first question the only real way to do this is by
> shelling out to Java. Inter-process communication could then be dealt with
> by writing to temporary files or maybe communicating back over STDOUT.
>
> The question I would ask you though is what particular part of BioJava are
> you using? Is there any reason why another similarly named Bio project
> (shall not mention it here as I think people think I'm becoming weak when it
> comes to Perl) cannot be used? As always when programming avoiding shelling
> out to another program if possible is always a good idea; sometimes it
> cannot happen say if you want to run clustalw but say shelling out to delete
> a file is unnecessary.
>
> Andy
>
> On 28 Jul 2009, at 20:04, Abhishek Pratap wrote:
>
>> Hi Guys
>>
>> Before I ask the question, let me introduce myself. I am Abhishek
>> primarily a Bioinformatician and this is my first mail here. I
>> realized sooner thn later that I have to use BioJava to make my life
>> easier. :)
>>
>> So basically we have a lot of perl code where we would like to plugin
>> some Biojava code and some inhouse written packages/classes. I am just
>> wondering what is the best way to do so. Clearly I am not a java guy
>> so please excuse me in case I am asking something which is very basic.
>> I found couple of solutions after few googles but not sure which is
>> the efficient one.
>>
>> Thanks,
>> -Abhi
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
>


From markjschreiber at gmail.com  Tue Jul 28 23:24:30 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 29 Jul 2009 07:24:30 +0800
Subject: [Biojava-dev] Hi.. Calling Java from Perl
In-Reply-To: <93b45ca50907281619q4fc1572aoed7333a72534fa14@mail.gmail.com>
References: <be9b52410907281204l4ebe3340qade871abdc1c05b5@mail.gmail.com>
	<A76BE049-94FA-4C81-BE55-5AB1A118D972@ebi.ac.uk>
	<be9b52410907281523y46b1a36ey81a6fba2021c07d0@mail.gmail.com>
	<93b45ca50907281619q4fc1572aoed7333a72534fa14@mail.gmail.com>
Message-ID: <93b45ca50907281624u1a8bb7cy81c50f1c90434f5@mail.gmail.com>

Hi -

You could try and use something like CORBA but that would be quite ugly.

A nicer alternative would be to put the BioJava functionality in a web
service and send sequences as FASTA or some custom format.

I think WS is considered the best way for Java and .NET to talk, so it
probably is for Perl too.

- Mark

On 29 Jul 2009, 6:24 AM, "Abhishek Pratap" <abhishek.vit at gmail.com> wrote:

Hi Andy

Thanks for a quick reply.  I think SHELLING out will be too process
intensive as we expect thousands of call to same Java method. I also
read about the Perl modules Java::Inline. Is that any good ?

And to answer your second question I am basically using a inhouse
method which in turns used a lot of BioJava classes for DNA
manipulation.

Thanks,
-Abhi

On Tue, Jul 28, 2009 at 5:54 PM, Andy Yates<ayates at ebi.ac.uk> wrote:
> Hi Abhi,
>
> Well to answer ...


From ayates at ebi.ac.uk  Wed Jul 29 08:48:33 2009
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 29 Jul 2009 09:48:33 +0100
Subject: [Biojava-dev] Hi.. Calling Java from Perl
In-Reply-To: <93b45ca50907281624u1a8bb7cy81c50f1c90434f5@mail.gmail.com>
References: <be9b52410907281204l4ebe3340qade871abdc1c05b5@mail.gmail.com>	
	<A76BE049-94FA-4C81-BE55-5AB1A118D972@ebi.ac.uk>	
	<be9b52410907281523y46b1a36ey81a6fba2021c07d0@mail.gmail.com>	
	<93b45ca50907281619q4fc1572aoed7333a72534fa14@mail.gmail.com>
	<93b45ca50907281624u1a8bb7cy81c50f1c90434f5@mail.gmail.com>
Message-ID: <4A700CE1.3050901@ebi.ac.uk>

Indeed, I would agree with Mark here and go for web services as the
desired solution. JAX-WS & CXF are both popular frameworks for doing Web
Services and, as far as I remember, Spring had some very nice helper
classes for quickly exposing any Java class as a web service. Then there
are other remoting protocols such as Hessian, Burlap, Protocol Buffers
or Thrift, all of which are good in their own ways.

However, Web Services should be the quickest way (in terms of
implementation effort) to communicate with a persistent Java process.
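
To give a feel for the persistent-process idea, here is a minimal sketch. It
is deliberately not JAX-WS, CXF or Spring: it uses the small HTTP server that
ships with the Sun/Oracle JDK (com.sun.net.httpserver, Java 6 onwards) as a
stand-in. The class name, the /gc path, port 8080 and the GC-count method
are all assumptions made up for this example.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;

class SequenceService {

    /** Counts G and C residues; a stand-in for the real BioJava-based method. */
    static int gcCount(String dna) {
        int n = 0;
        for (int i = 0; i < dna.length(); i++) {
            char c = Character.toUpperCase(dna.charAt(i));
            if (c == 'G' || c == 'C') {
                n++;
            }
        }
        return n;
    }

    /** Reads a whole stream into a UTF-8 string. */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toString("UTF-8");
    }

    /** Starts a server that answers POSTed sequences with their GC count. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/gc", new HttpHandler() {
            public void handle(HttpExchange exchange) throws IOException {
                String body = readAll(exchange.getRequestBody()).trim();
                byte[] reply = String.valueOf(gcCount(body)).getBytes("UTF-8");
                exchange.sendResponseHeaders(200, reply.length);
                OutputStream os = exchange.getResponseBody();
                os.write(reply);
                os.close();
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(8080); // the JVM stays up and serves requests until killed
    }
}
```

The JVM is started once and each call is a cheap HTTP request, which the
Perl side could make with something like LWP::UserAgent. A real JAX-WS or
Spring service would add a proper service contract on top of the same idea.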

Personally I would stay away from Inline::Java.

Andy

Mark Schreiber wrote:
> Hi -
> 
> You could try and use something like CORBA but that would be quite ugly.
> 
> A nicer alternative would be to put the BioJava functionality in a web
> service and send sequences as FASTA or some custom format??
> 
> I think WS is considered the best way for Java and .NET to talk so probably
> it is for Perl too.
> 
> - Mark
> 
> On 29 Jul 2009, 6:24 AM, "Abhishek Pratap" <abhishek.vit at gmail.com> wrote:
> 
> Hi Andy
> 
> Thanks for a quick reply.  I think SHELLING out will be too process
> intensive as we expect thousands of call to same Java method. I also
> read about the Perl modules Java::Inline. Is that any good ?
> 
> And to answer your second question I am basically using a inhouse
> method which in turns used a lot of BioJava classes for DNA
> manipulation.
> 
> Thanks,
> -Abhi
> 
> On Tue, Jul 28, 2009 at 5:54 PM, Andy Yates<ayates at ebi.ac.uk> wrote: > Hi
> Abhi, > > Well to answer ...
> 


From abhishek.vit at gmail.com  Wed Jul 29 16:04:04 2009
From: abhishek.vit at gmail.com (Abhishek Pratap)
Date: Wed, 29 Jul 2009 12:04:04 -0400
Subject: [Biojava-dev] Hi.. Calling Java from Perl
In-Reply-To: <4A700CE1.3050901@ebi.ac.uk>
References: <be9b52410907281204l4ebe3340qade871abdc1c05b5@mail.gmail.com>
	<A76BE049-94FA-4C81-BE55-5AB1A118D972@ebi.ac.uk>
	<be9b52410907281523y46b1a36ey81a6fba2021c07d0@mail.gmail.com>
	<93b45ca50907281619q4fc1572aoed7333a72534fa14@mail.gmail.com>
	<93b45ca50907281624u1a8bb7cy81c50f1c90434f5@mail.gmail.com>
	<4A700CE1.3050901@ebi.ac.uk>
Message-ID: <be9b52410907290904y2d92bbaep6217321e62d15577@mail.gmail.com>

Thanks all. I think a Java WS is the way forward for me then. As you said, it
would be language agnostic and will help me when updating the core code
later.

Just a quick question: do you happen to know of any good tutorial on
implementing a WS for a Java process?

Thanks,
-Abhi

On Wed, Jul 29, 2009 at 4:48 AM, Andy Yates<ayates at ebi.ac.uk> wrote:
> Indeed I would agree with Mark here and go for web services as the
> desired solution. JAX-WS & CXF are both popular frameworks for doing Web
> Services and as far as I remember Spring had some very nice helper
> classes for quickly exposing any Java class as a web service. Then there
> are other remoting protocols such as Hessian, Burlap, Protocol Buffers
> or Thrift all of which are good in their own ways.
>
> However Web Services should be the quickest (re implementation) way to
> communicate with a persistent Java process.
>
> Personally I would stay away from Java::Inline.
>
> Andy
>
> Mark Schreiber wrote:
>> Hi -
>>
>> You could try and use something like CORBA but that would be quite ugly.
>>
>> A nicer alternative would be to put the BioJava functionality in a web
>> service and send sequences as FASTA or some custom format??
>>
>> I think WS is considered the best way for Java and .NET to talk so probably
>> it is for Perl too.
>>
>> - Mark
>>
>> On 29 Jul 2009, 6:24 AM, "Abhishek Pratap" <abhishek.vit at gmail.com> wrote:
>>
>> Hi Andy
>>
>> Thanks for a quick reply.  I think SHELLING out will be too process
>> intensive as we expect thousands of call to same Java method. I also
>> read about the Perl modules Java::Inline. Is that any good ?
>>
>> And to answer your second question I am basically using a inhouse
>> method which in turns used a lot of BioJava classes for DNA
>> manipulation.
>>
>> Thanks,
>> -Abhi
>>
>> On Tue, Jul 28, 2009 at 5:54 PM, Andy Yates<ayates at ebi.ac.uk> wrote: > Hi
>> Abhi, > > Well to answer ...
>>
>


From ayates at ebi.ac.uk  Wed Jul 29 16:21:12 2009
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 29 Jul 2009 17:21:12 +0100
Subject: [Biojava-dev] Hi.. Calling Java from Perl
In-Reply-To: <be9b52410907290904y2d92bbaep6217321e62d15577@mail.gmail.com>
References: <be9b52410907281204l4ebe3340qade871abdc1c05b5@mail.gmail.com>	
	<A76BE049-94FA-4C81-BE55-5AB1A118D972@ebi.ac.uk>	
	<be9b52410907281523y46b1a36ey81a6fba2021c07d0@mail.gmail.com>	
	<93b45ca50907281619q4fc1572aoed7333a72534fa14@mail.gmail.com>	
	<93b45ca50907281624u1a8bb7cy81c50f1c90434f5@mail.gmail.com>	
	<4A700CE1.3050901@ebi.ac.uk>
	<be9b52410907290904y2d92bbaep6217321e62d15577@mail.gmail.com>
Message-ID: <4A7076F8.2040308@ebi.ac.uk>

It depends on what you're going to use, but when I last did it I bought into
the Spring way of things and found that the Spring manual was very good.
The WS bit is:

http://static.springsource.org/spring/docs/3.0.x/spring-framework-reference/html/ch21s05.html

It goes through doing it for JAX-WS & XFire.

There's also a JAX-WS tutorial from:

http://java.sun.com/javaee/5/docs/tutorial/doc/?wp405739&JAXWS.html#wp72279

To be honest though Google is your best friend here.

Good luck,

Andy

Abhishek Pratap wrote:
> Thanks all. I think Java WS is a way out for me then. As you said it
> would be code agnostic and will help me in updating the core code
> later.
> 
> Just a quick question . Do you happen to know of any good tutorial to
> implement a WS for a java process.
> 
> Thanks,
> -Abhi
> 
> On Wed, Jul 29, 2009 at 4:48 AM, Andy Yates<ayates at ebi.ac.uk> wrote:
>> Indeed I would agree with Mark here and go for web services as the
>> desired solution. JAX-WS & CXF are both popular frameworks for doing Web
>> Services and as far as I remember Spring had some very nice helper
>> classes for quickly exposing any Java class as a web service. Then there
>> are other remoting protocols such as Hessian, Burlap, Protocol Buffers
>> or Thrift all of which are good in their own ways.
>>
>> However Web Services should be the quickest (re implementation) way to
>> communicate with a persistent Java process.
>>
>> Personally I would stay away from Java::Inline.
>>
>> Andy
>>
>> Mark Schreiber wrote:
>>> Hi -
>>>
>>> You could try and use something like CORBA but that would be quite ugly.
>>>
>>> A nicer alternative would be to put the BioJava functionality in a web
>>> service and send sequences as FASTA or some custom format??
>>>
>>> I think WS is considered the best way for Java and .NET to talk so probably
>>> it is for Perl too.
>>>
>>> - Mark
>>>
>>> On 29 Jul 2009, 6:24 AM, "Abhishek Pratap" <abhishek.vit at gmail.com> wrote:
>>>
>>> Hi Andy
>>>
>>> Thanks for a quick reply.  I think SHELLING out will be too process
>>> intensive as we expect thousands of call to same Java method. I also
>>> read about the Perl modules Java::Inline. Is that any good ?
>>>
>>> And to answer your second question I am basically using a inhouse
>>> method which in turns used a lot of BioJava classes for DNA
>>> manipulation.
>>>
>>> Thanks,
>>> -Abhi
>>>
>>> On Tue, Jul 28, 2009 at 5:54 PM, Andy Yates<ayates at ebi.ac.uk> wrote: > Hi
>>> Abhi, > > Well to answer ...
>>>


From Russell.Smithies at agresearch.co.nz  Wed Jul 29 20:25:05 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 30 Jul 2009 08:25:05 +1200
Subject: [Biojava-dev] Hi.. Calling Java from Perl
In-Reply-To: <be9b52410907281204l4ebe3340qade871abdc1c05b5@mail.gmail.com>
References: <be9b52410907281204l4ebe3340qade871abdc1c05b5@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32AAB5A4EC5@exchsth.agresearch.co.nz>

You could always use BioPerl instead :-)
http://www.bioperl.org/wiki/Main_Page


Russell Smithies 

Bioinformatics Applications Developer 
T +64 3 489 9085 
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz 


> -----Original Message-----
> From: biojava-dev-bounces at lists.open-bio.org [mailto:biojava-dev-
> bounces at lists.open-bio.org] On Behalf Of Abhishek Pratap
> Sent: Wednesday, 29 July 2009 7:04 a.m.
> To: biojava-dev at lists.open-bio.org
> Subject: [Biojava-dev] Hi.. Calling Java from Perl
> 
> Hi Guys
> 
> Before I ask the question, let me introduce myself. I am Abhishek
> primarily a Bioinformatician and this is my first mail here. I
> realized sooner thn later that I have to use BioJava to make my life
> easier. :)
> 
> So basically we have a lot of perl code where we would like to plugin
> some Biojava code and some inhouse written packages/classes. I am just
> wondering what is the best way to do so. Clearly I am not a java guy
> so please excuse me in case I am asking something which is very basic.
> I found couple of solutions after few googles but not sure which is
> the efficient one.
> 
> Thanks,
> -Abhi
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From ayates at ebi.ac.uk  Wed Jul 29 20:56:34 2009
From: ayates at ebi.ac.uk (ayates at ebi.ac.uk)
Date: Wed, 29 Jul 2009 21:56:34 +0100 (BST)
Subject: [Biojava-dev] Hi.. Calling Java from Perl
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32AAB5A4EC5@exchsth.agresearch.co.nz>
References: <be9b52410907281204l4ebe3340qade871abdc1c05b5@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32AAB5A4EC5@exchsth.agresearch.co.nz>
Message-ID: <35760.86.9.203.151.1248900994.squirrel@webmail.ebi.ac.uk>

That was my original point; however, it sounds from the original post as if
the system which is in Perl needs to call out to an already implemented
system in BioJava. In a perfect world this mismatch would never happen, but
hey, we all know it can :)

Andy

> You could always use BioPerl instead :-)
> http://www.bioperl.org/wiki/Main_Page
>
>
>
>
> Russell Smithies
>
> Bioinformatics Applications Developer
> T +64 3 489 9085
> E russell.smithies at agresearch.co.nz
>
> Invermay Research Centre
> Puddle Alley,
> Mosgiel,
> New Zealand
> T +64 3 489 3809
> F +64 3 489 9174
> www.agresearch.co.nz
>
>
>
>> -----Original Message-----
>> From: biojava-dev-bounces at lists.open-bio.org [mailto:biojava-dev-
>> bounces at lists.open-bio.org] On Behalf Of Abhishek Pratap
>> Sent: Wednesday, 29 July 2009 7:04 a.m.
>> To: biojava-dev at lists.open-bio.org
>> Subject: [Biojava-dev] Hi.. Calling Java from Perl
>>
>> Hi Guys
>>
>> Before I ask the question, let me introduce myself. I am Abhishek
>> primarily a Bioinformatician and this is my first mail here. I
>> realized sooner thn later that I have to use BioJava to make my life
>> easier. :)
>>
>> So basically we have a lot of perl code where we would like to plugin
>> some Biojava code and some inhouse written packages/classes. I am just
>> wondering what is the best way to do so. Clearly I am not a java guy
>> so please excuse me in case I am asking something which is very basic.
>> I found couple of solutions after few googles but not sure which is
>> the efficient one.
>>
>> Thanks,
>> -Abhi
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>


From abhishek.vit at gmail.com  Wed Jul 29 21:06:54 2009
From: abhishek.vit at gmail.com (Abhishek Pratap)
Date: Wed, 29 Jul 2009 17:06:54 -0400
Subject: [Biojava-dev] Hi.. Calling Java from Perl
In-Reply-To: <35760.86.9.203.151.1248900994.squirrel@webmail.ebi.ac.uk>
References: <be9b52410907281204l4ebe3340qade871abdc1c05b5@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32AAB5A4EC5@exchsth.agresearch.co.nz>
	<35760.86.9.203.151.1248900994.squirrel@webmail.ebi.ac.uk>
Message-ID: <be9b52410907291406t6845bdeau99cd047e146e3395@mail.gmail.com>

Yeah, it is part of the development cycle. We need to integrate the
code, some of which is in Perl and some in Java.

From your suggestions I feel I clearly have two options:
1. Use a web service to talk between Java and Perl. This might not be
very efficient as we expect to make thousands of calls per run.
2. Port the whole Java code to BioPerl.

#2 is scary but I might just have to do it.

Thanks again to all of you,
-Abhi

On Wed, Jul 29, 2009 at 4:56 PM, <ayates at ebi.ac.uk> wrote:
> That was my original point; however, it sounds from the original poster as
> though the system, which is in Perl, needs to call out to an already
> implemented system in BioJava. In a perfect world this mismatch would never
> happen, but hey, we all know it can :)
>
> Andy
>
>> You could always use BioPerl instead :-)
>> http://www.bioperl.org/wiki/Main_Page
>>
>>
>>
>>
>> Russell Smithies
>>
>> Bioinformatics Applications Developer
>> T +64 3 489 9085
>> E russell.smithies at agresearch.co.nz
>>
>> Invermay Research Centre
>> Puddle Alley,
>> Mosgiel,
>> New Zealand
>> T +64 3 489 3809
>> F +64 3 489 9174
>> www.agresearch.co.nz
>>
>>
>>
>>> -----Original Message-----
>>> From: biojava-dev-bounces at lists.open-bio.org [mailto:biojava-dev-
>>> bounces at lists.open-bio.org] On Behalf Of Abhishek Pratap
>>> Sent: Wednesday, 29 July 2009 7:04 a.m.
>>> To: biojava-dev at lists.open-bio.org
>>> Subject: [Biojava-dev] Hi.. Calling Java from Perl
>>>
>>> Hi Guys
>>>
>>> Before I ask the question, let me introduce myself. I am Abhishek,
>>> primarily a bioinformatician, and this is my first mail here. I
>>> realized sooner rather than later that I have to use BioJava to make
>>> my life easier. :)
>>>
>>> So basically we have a lot of Perl code into which we would like to
>>> plug some BioJava code and some in-house packages/classes. I am just
>>> wondering what the best way to do so is. Clearly I am not a Java guy,
>>> so please excuse me in case I am asking something very basic.
>>> I found a couple of solutions after a few Google searches, but I am
>>> not sure which is the most efficient one.
>>>
>>> Thanks,
>>> -Abhi
>>
>
>
>


From niall at sgenomics.org  Thu Jul 30 16:32:00 2009
From: niall at sgenomics.org (Niall Haslam)
Date: Thu, 30 Jul 2009 18:32:00 +0200
Subject: [Biojava-dev] Webservices
Message-ID: <200907301832.01103.niall@sgenomics.org>

Hi,

I know it was brought up on the users list a month or two ago, but I wanted
to ask on the dev list what the consensus is on creating a BioJava module for
web service clients. I am interested and have a little code to contribute. I
think it would consist mainly of example code showing how to use each web
service and, critically, would not incorporate the stub code generated by
Axis. I would also bump for Axis2. I think this could have the benefit of
making services more standards compliant, but we'll probably have to do it on
a case-by-case basis.

I'd also like to know if there are people who are interested in using or 
writing some of it as well.

Thanks and looking forward to your input,

Niall.


From HWillis at scripps.edu  Fri Jul 31 14:04:31 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Fri, 31 Jul 2009 10:04:31 -0400
Subject: [Biojava-dev] Webservices
In-Reply-To: <200907301832.01103.niall@sgenomics.org>
Message-ID: <C698722F.1935%HWillis@scripps.edu>

Niall

I have the web services BioJava implementation on my list of things to do! I
have an upcoming project where doing Blast through web services, against both
external and internal sources, will make things easier. I like what Axis2 is
doing to make it easy to publish web services, but, using NetBeans as an
example, it is already fairly painless to create a web service. Since we are
mainly focused on consuming web services, it would be nice to use the built-in
support in Java 6 to keep the external library count as low as possible, which
also helps avoid conflicts when an external application is using a different
version of the same external library.

I think the main driving force, as you mention, is the provider of the web
service: much will depend on the provider as to which web services client
library is needed.
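
To make the "no external libraries" idea concrete, here is a rough sketch of
a client that uses only classes bundled with the JDK (java.net for the
request, javax.xml.parsers for the reply). It is plain HTTP rather than SOAP,
so it only fits providers that offer such an interface. The class name
PlainHttpClient, the parameter names "program" and "query", and the <id>
element are assumptions for illustration, not any real provider's interface.

```java
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.StringReader;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class PlainHttpClient {

    // Encode parameters as application/x-www-form-urlencoded, in map order.
    public static String formEncode(Map<String, String> params) {
        try {
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
            return sb.toString();
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always available", ex);
        }
    }

    // POST the parameters to the endpoint and return the reply body as text.
    public static String post(String endpoint, Map<String, String> params) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        OutputStream out = conn.getOutputStream();
        try {
            out.write(formEncode(params).getBytes("UTF-8"));
        } finally {
            out.close();
        }
        InputStream in = conn.getInputStream();
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return buf.toString("UTF-8");
        } finally {
            in.close();
        }
    }

    // Return the text of the first element with the given tag, or null if absent.
    public static String firstElementText(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName(tag);
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse XML reply", ex);
        }
    }

    public static void main(String[] args) {
        // Offline demonstration only: no request is sent.
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("program", "blastn"); // hypothetical parameter names
        params.put("query", "ACGT");
        System.out.println(formEncode(params));
        System.out.println(firstElementText("<reply><id>42</id></reply>", "id"));
    }
}
```

A provider-specific client would then be a thin wrapper around post() and
firstElementText(), so a different provider means a new wrapper rather than
a new dependency.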

Thanks

Scooter

