From daniel.rohrbach at web.de  Sun Feb 11 07:11:13 2007
From: daniel.rohrbach at web.de (Daniel Rohrbach)
Date: Sun, 11 Feb 2007 13:11:13 +0100
Subject: [Biojava-dev] Hi, introduction of myself
Message-ID: <1386536776@web.de>

Hello,

My name is Daniel. I have just registered on the mailing list, and I hope this is the right place to introduce myself.

I'm a bioinformatics student at the Martin-Luther-University in Germany. I am in my seventh term, and my English is unfortunately not very good; please excuse any mistakes below.

A few days ago I heard of BioJava. I'm currently working on a project for a genetics lecturer. We are trying to develop a program to classify certain effector proteins of plants and animals. To that end, we want to use several algorithms and data structures such as BLAST, HMMs, and Bayesian and neural networks. I have not looked very deeply into the BioJava package yet, but after reading the tutorial I am confident that BioJava can be a good base for our work.

Before this project started, I worked on a voluntary project for a professor: an image analyzer based on Java technology and the JAI (Java Advanced Imaging) API.

Apart from the genetics project, I'm interested in participating as a volunteer developer in the BioJava project. Please write to me if there is a way to work on this project.

In short: my skills are programming in Java, CVS, and various bioinformatics methods such as sequence analysis and statistical methods and models.

Many greetings 
Daniel


From holland at ebi.ac.uk  Mon Feb 12 06:20:19 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Mon, 12 Feb 2007 11:20:19 +0000
Subject: [Biojava-dev] Hi, introduction of myself
In-Reply-To: <1386536776@web.de>
References: <1386536776@web.de>
Message-ID: <45D04D73.9060000@ebi.ac.uk>

The best thing to do is to identify an area of BioJava which you feel
needs improvement and jump right in. Most of BioJava is
sequence-centric, which was fine when it was written but things have
moved on since then. We therefore could do with some new functionality
for the more popular current research areas, e.g. proteomics,
microarrays, phylogenetics, etc.

cheers,
Richard

From smh1008 at cam.ac.uk  Tue Feb 13 08:47:03 2007
From: smh1008 at cam.ac.uk (David Huen)
Date: 13 Feb 2007 13:47:03 +0000
Subject: [Biojava-dev] AlphabetManager.createSymbol(...)
Message-ID: <Prayer.1.0.18.0702131347030.29216@hermes-1.csi.cam.ac.uk>

Hi,

The current implementation of the above for basis symbols creates a
symbol and then caches it. I suggest that this is undesirable behaviour.

First, a cross-product alphabet can have a potentially huge number of
symbols, and a significant fraction of these may be instantiated only
once, e.g. when reading through a 12-species genome alignment. Caching
every instantiated cross-product symbol under these circumstances is very
expensive in memory and also pointless.

Next, the existing cache is a Map keyed on a list of Symbols. This forces
all caching to go through this one implementation, which can be
inefficient for certain alphabets.

I propose to change the behaviour so that all symbol implementation
details and caching for cross-product/basis alphabets (including
uniqueness checking) are left to the alphabet implementation. Are there
any implications that I may not have considered (is it OK with
serialisation?), or any objections? I think this change can be done
without breaking the API.
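
A minimal sketch of what leaving creation and caching to the alphabet
could look like. It is plain Java with no BioJava dependency: TupleSymbol,
SymbolSource, CachingSource and TransientSource are hypothetical names
invented for illustration, and the uncached variant assumes that value
equality (rather than object identity) is acceptable for uniqueness
checking, which would need to be verified against existing code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a cross-product symbol (not BioJava's Symbol).
final class TupleSymbol {
    private final List<String> parts;

    TupleSymbol(List<String> parts) {
        this.parts = Collections.unmodifiableList(new ArrayList<>(parts));
    }

    // Value equality, so uncached instances of the same tuple compare equal.
    @Override public boolean equals(Object other) {
        return other instanceof TupleSymbol
                && parts.equals(((TupleSymbol) other).parts);
    }

    @Override public int hashCode() {
        return parts.hashCode();
    }
}

// Hypothetical policy: each alphabet decides how symbols are created/cached.
interface SymbolSource {
    TupleSymbol symbolFor(List<String> parts);
}

// Suits small core alphabets: one shared instance per tuple.
final class CachingSource implements SymbolSource {
    private final Map<List<String>, TupleSymbol> cache = new HashMap<>();

    @Override public TupleSymbol symbolFor(List<String> parts) {
        List<String> key = Collections.unmodifiableList(new ArrayList<>(parts));
        return cache.computeIfAbsent(key, TupleSymbol::new);
    }
}

// Suits huge alphabets whose symbols are mostly seen once: nothing retained.
final class TransientSource implements SymbolSource {
    @Override public TupleSymbol symbolFor(List<String> parts) {
        return new TupleSymbol(parts);
    }
}

final class SymbolSourceDemo {
    public static void main(String[] args) {
        List<String> column = Arrays.asList("a", "c", "g");
        SymbolSource cached = new CachingSource();
        SymbolSource fresh = new TransientSource();
        // Same instance from the caching source.
        System.out.println(cached.symbolFor(column) == cached.symbolFor(column));
        // Different instances from the transient source, but equal by value.
        System.out.println(fresh.symbolFor(column) == fresh.symbolFor(column));
        System.out.println(fresh.symbolFor(column).equals(fresh.symbolFor(column)));
    }
}
```

The point of the interface is only that callers no longer depend on one
global Map keyed on a list of symbols; each alphabet can pick whichever
strategy fits its size and access pattern.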

Another change which I would like to be considered at some future stage (BJ 
2.0?) is a means of dealing with really large alphabets (think DNA**n). The 
size of the alphabet can readily exceed the limits of an int and therefore 
a solution will require breaking our FiniteAlphabet and AlphabetIndex APIs. 
I propose some extension that allows returning results for size and index 
in terms of BigInteger.
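
To illustrate the kind of extension meant here, the following sketch
computes size and index for an n-fold cross product of a single base
alphabet using BigInteger. BigCrossProductIndex and its methods are
hypothetical and are not part of the FiniteAlphabet or AlphabetIndex APIs;
the index ordering (first position most significant) is also an
assumption.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical helper, not a BioJava class: size and index arithmetic for
// an n-fold cross product of one base alphabet, in BigInteger throughout.
final class BigCrossProductIndex {
    private final int baseSize;      // e.g. 4 for unambiguous DNA
    private final int order;         // number of positions per tuple
    private final BigInteger base;

    BigCrossProductIndex(int baseSize, int order) {
        if (baseSize < 1 || order < 1) {
            throw new IllegalArgumentException("baseSize and order must be >= 1");
        }
        this.baseSize = baseSize;
        this.order = order;
        this.base = BigInteger.valueOf(baseSize);
    }

    // Number of symbols in the cross product: baseSize ^ order.
    BigInteger size() {
        return base.pow(order);
    }

    // True if the size still fits in a signed 32-bit int.
    boolean fitsInInt() {
        return size().bitLength() < 32;
    }

    // Fixed-radix index of a tuple given the int index of each component
    // in the base alphabet; the first position is the most significant.
    BigInteger indexOf(int[] componentIndices) {
        if (componentIndices.length != order) {
            throw new IllegalArgumentException("expected " + order + " components");
        }
        BigInteger index = BigInteger.ZERO;
        for (int component : componentIndices) {
            if (component < 0 || component >= baseSize) {
                throw new IllegalArgumentException("component out of range: " + component);
            }
            index = index.multiply(base).add(BigInteger.valueOf(component));
        }
        return index;
    }

    public static void main(String[] args) {
        BigCrossProductIndex dna16 = new BigCrossProductIndex(4, 16);
        int[] last = new int[16];
        Arrays.fill(last, 3);
        System.out.println("size       = " + dna16.size());
        System.out.println("fits int   = " + dna16.fitsInInt());
        System.out.println("last index = " + dna16.indexOf(last));
    }
}
```

With a base alphabet of 4 symbols, the size already passes
Integer.MAX_VALUE at order 16 (4^16 = 4294967296), which is why an
int-returning size() cannot describe such an alphabet.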

Regards,
David

-- 
David Huen
Dept of Genetics
University of Cambridge
CB2 3EH
U.K.


From mark.schreiber at novartis.com  Thu Feb 15 04:31:08 2007
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Thu, 15 Feb 2007 17:31:08 +0800
Subject: [Biojava-dev] AlphabetManager.createSymbol(...)
Message-ID: <OFC83783D6.284BDBE8-ON48257283.0033C24C-48257283.00344AEE@ah.novartis.com>

>Hi, The current implementation of the above for basis symbols creates a
>symbol then caches it. I suggest that this is an undesirable behaviour.
>
>First, it is quite possible for a cross-product behaviour to have a
>potentially huge number of symbols and that a significant fraction of
>these can be instantiated once only, e.g. when reading thru a 12-species
>genome alignment. Caching every instantiated cross-product symbol under
>these circumstances is very expensive on memory and also pointless.
>
>Next, the existing cache is a Map keyed on a list of Symbols. This forces
>all caching to run off this implementation which can be inefficient for
>certain alphabets.
>
>I propose to change the behaviour to leave all symbol implementation
>details and caching in cross-product/basis alphabets (including
>uniqueness checking) to the alphabet implementation. Are there any
>implications that I may not have considered (is it OK with
>serialisation?). Or objections? I think this change can be done without
>breaking the API.

I think as long as it is well documented how to do the caching, that would
be fine. Would you keep caching for core alphabets like DNA?

I suspect it might cause problems with serialisation, but they might be
avoidable. As long as there are unit tests for serialisation of both
cached and uncached alphabets it should be OK. Careful attention might be
needed for gaps?

>Another change which I would like to be considered at some future stage
>(BJ 2.0?) is a means of dealing with really large alphabets (think
>DNA**n). The size of the alphabet can readily exceed the limits of an int
>and therefore a solution will require breaking our FiniteAlphabet and
>AlphabetIndex APIs. I propose some extension that allows returning
>results for size and index in terms of BigInteger.

A similar suggestion has been made in the past for indexing SymbolLists in
terms of BigInteger. How practical would such a large alphabet be? E.g.,
unless you expect it to be pretty sparse in terms of the number of
possible symbols that are actually seen, you might get major problems with
memory.

- Mark



From smh1008 at cam.ac.uk  Thu Feb 15 07:16:49 2007
From: smh1008 at cam.ac.uk (David Huen)
Date: 15 Feb 2007 12:16:49 +0000
Subject: [Biojava-dev] AlphabetManager.createSymbol(...)
In-Reply-To: <OFC83783D6.284BDBE8-ON48257283.0033C24C-48257283.00344AEE@ah.novartis.com>
References: <OFC83783D6.284BDBE8-ON48257283.0033C24C-48257283.00344AEE@ah.novartis.com>
Message-ID: <Prayer.1.0.18.0702151216490.2112@hermes-1.csi.cam.ac.uk>

On Feb 15 2007, mark.schreiber at novartis.com wrote:

>
>A similar suggestion has been made in the past for indexing SymbolLists in 
>terms of BigInteger. How practical would such a large alphabet be? Eg 
>unless you expect it to be pretty sparse in terms of the number of 
>possible symbols that are actually seen you might get major problems with 
>memory.
>
I think it is practical in the sense that even a simple (AA)^10 alphabet
will exceed the range of an int, but an alignment of 10 proteins may only
be, say, 1000 residues long, so at most 1000 symbols will ever be
instantiated, with far fewer needing to remain instantiated throughout the
run. I see less point for SymbolLists, in that it seems unlikely that any
chromosome could have more than an int's worth of bases.
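
The sparsity argument can be made concrete with a small sketch. It assumes
a 20-letter amino-acid alphabet (the size of the real protein alphabet may
differ) and uses a toy three-row alignment invented for illustration;
ColumnUsage is a hypothetical class, not BioJava code.

```java
import java.math.BigInteger;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration: how many cross-product symbols an alignment
// actually touches, compared with how many the alphabet contains.
final class ColumnUsage {
    // Collects the distinct columns of an alignment given as equal-length rows.
    static Set<String> distinctColumns(String[] rows) {
        if (rows.length == 0) {
            throw new IllegalArgumentException("alignment has no rows");
        }
        int length = rows[0].length();
        for (String row : rows) {
            if (row.length() != length) {
                throw new IllegalArgumentException("rows differ in length");
            }
        }
        Set<String> seen = new HashSet<>();
        for (int col = 0; col < length; col++) {
            StringBuilder column = new StringBuilder(rows.length);
            for (String row : rows) {
                column.append(row.charAt(col));
            }
            seen.add(column.toString());
        }
        return seen;
    }

    // Size of an n-fold cross product of an alphabet with baseSize symbols.
    static BigInteger crossProductSize(int baseSize, int copies) {
        return BigInteger.valueOf(baseSize).pow(copies);
    }

    public static void main(String[] args) {
        String[] toyAlignment = { "MKVLLA", "MKILLA", "MRVLLA" };
        Set<String> used = distinctColumns(toyAlignment);
        System.out.println("columns in alignment : " + toyAlignment[0].length());
        System.out.println("distinct symbols used: " + used.size());
        System.out.println("(AA)^3 alphabet size : " + crossProductSize(20, 3));
        System.out.println("(AA)^10 alphabet size: " + crossProductSize(20, 10));
        System.out.println("Integer.MAX_VALUE    : " + Integer.MAX_VALUE);
    }
}
```

The number of distinct symbols can never exceed the number of columns, so
a 1000-column alignment touches at most 1000 of the 20^10 =
10240000000000 possible symbols under the assumption above.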

The main reason I need these huge alphabets is for 1-D HMMs that run over
genome alignments. I also hope to represent symbols in these alphabets
internally by the BigInteger values of their alphabet index.

Incidentally, the SparseCrossProductAlphabet appeared to be caching every 
symbol it was ever asked for and I have changed that to a WeakValueHashMap 
internally now.
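
For readers unfamiliar with the idea, a weak-value cache can be sketched
in standalone Java as below. This is not the WeakValueHashMap used inside
BioJava, whose API is not shown in this thread; WeakValueCache and its
methods are hypothetical and built only on java.lang.ref.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

// Standalone sketch of a weak-value cache: values stay cached only while
// something else still holds a strong reference to them.
final class WeakValueCache<K, V> {
    // Weak reference that remembers its key so a stale entry can be removed.
    private static final class Entry<K, V> extends WeakReference<V> {
        final K key;

        Entry(K key, V value, ReferenceQueue<V> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<K, Entry<K, V>> map = new HashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    // Returns the cached value, creating it if absent or already collected.
    synchronized V get(K key, Function<K, V> factory) {
        purge();
        Entry<K, V> entry = map.get(key);
        V value = (entry == null) ? null : entry.get();
        if (value == null) {
            value = Objects.requireNonNull(factory.apply(key), "factory returned null");
            map.put(key, new Entry<>(key, value, queue));
        }
        return value;
    }

    // Number of entries still present after dropping collected ones.
    synchronized int size() {
        purge();
        return map.size();
    }

    // Removes entries whose values the garbage collector has cleared.
    @SuppressWarnings("unchecked")
    private void purge() {
        Entry<K, V> stale;
        while ((stale = (Entry<K, V>) queue.poll()) != null) {
            // Two-argument remove: leave the entry if the key was re-populated.
            map.remove(stale.key, stale);
        }
    }

    public static void main(String[] args) {
        WeakValueCache<String, Object> cache = new WeakValueCache<>();
        Object held = cache.get("ac", key -> new Object());
        // While 'held' is strongly reachable, the same instance comes back.
        System.out.println(cache.get("ac", key -> new Object()) == held);
    }
}
```

Once no caller holds a symbol any more, the collector is free to reclaim
it and the entry disappears on a later call, so a single pass over a long
alignment does not accumulate every symbol it has seen.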

Regards,
David


-- 
David Huen
Dept of Genetics
University of Cambridge
CB2 3EH
U.K.


From hl450 at hotmail.com  Tue Feb 13 02:14:00 2007
From: hl450 at hotmail.com (Lee Heewook)
Date: Tue, 13 Feb 2007 07:14:00 -0000
Subject: [Biojava-dev] Changing the sample name of the ABI file
Message-ID: <BAY127-F25B709EDDC57334A92E002FA900@phx.gbl>

Is there a way to change the sample name of the ABI file?

Hee

