From daniel.rohrbach at web.de Sun Feb 11 07:11:13 2007
From: daniel.rohrbach at web.de (Daniel Rohrbach)
Date: Sun, 11 Feb 2007 13:11:13 +0100
Subject: [Biojava-dev] Hi, introduction of myself
Message-ID: <1386536776@web.de>

Hello,

my name is Daniel. I have just registered on the mailing list, and I hope this is the right way to introduce myself.

I'm a student of Bioinformatics at the Martin-Luther-University in Germany. I am in the seventh term and my English is unfortunately not so good. Please excuse this in what follows.

A few days ago I heard of BioJava. I'm currently working on a project for a genetics lecturer. We are trying to develop a program to classify some special effector proteins of plants and animals. To that end, we want to use several algorithms and data structures such as BLAST, HMMs, and Bayesian and neural networks. I have not looked very deeply into the BioJava package yet, but after reading the tutorial I am confident that BioJava can be a good base for our work.

Before the project started, I worked on a voluntary project for a professor. That work involved an image analyzer based on Java technology and the JAI (Java Advanced Imaging) API.

Apart from the work on the genetics project, I'm interested in participating as a volunteer developer for the BioJava project. Please write to me if there is a way to work on this project.

In short, my skills are programming in Java, CVS, and various bioinformatics methods such as sequence analysis and statistical methods/models.

Best regards,
Daniel
From holland at ebi.ac.uk Mon Feb 12 06:20:19 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Mon, 12 Feb 2007 11:20:19 +0000
Subject: [Biojava-dev] Hi, introduction of myself
In-Reply-To: <1386536776@web.de>
References: <1386536776@web.de>
Message-ID: <45D04D73.9060000@ebi.ac.uk>

The best thing to do is to identify an area of BioJava which you feel needs improvement and jump right in. Most of BioJava is sequence-centric, which was fine when it was written but things have moved on since then. We therefore could do with some new functionality for the more popular current research areas, e.g. proteomics, microarrays, phylogenetics, etc.

cheers,
Richard

From smh1008 at cam.ac.uk Tue Feb 13 08:47:03 2007
From: smh1008 at cam.ac.uk (David Huen)
Date: 13 Feb 2007 13:47:03 +0000
Subject: [Biojava-dev] AlphabetManager.createSymbol(...)
Message-ID:

Hi,

The current implementation of the above for basis symbols creates a symbol then caches it. I suggest that this is an undesirable behaviour.

First, it is quite possible for a cross-product behaviour to have a potentially huge number of symbols and that a significant fraction of these can be instantiated once only, e.g. when reading thru a 12-species genome alignment. Caching every instantiated cross-product symbol under these circumstances is very expensive on memory and also pointless.

Next, the existing cache is a Map keyed on a list of Symbols. This forces all caching to run off this implementation which can be inefficient for certain alphabets.

I propose to change the behaviour to leave all symbol implementation details and caching in cross-product/basis alphabets (including uniqueness checking) to the alphabet implementation. Are there any implications that I may not have considered (is it OK with serialisation?). Or objections?
I think this change can be done without breaking the API.

Another change which I would like to be considered at some future stage (BJ 2.0?) is a means of dealing with really large alphabets (think DNA**n). The size of the alphabet can readily exceed the limits of an int and therefore a solution will require breaking our FiniteAlphabet and AlphabetIndex APIs. I propose some extension that allows returning results for size and index in terms of BigInteger.

Regards,
David

--
David Huen
Dept of Genetics
University of Cambridge
CB2 3EH
U.K.

From mark.schreiber at novartis.com Thu Feb 15 04:31:08 2007
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Thu, 15 Feb 2007 17:31:08 +0800
Subject: [Biojava-dev] AlphabetManager.createSymbol(...)
Message-ID:

>Hi, The current implementation of the above for basis symbols creates a
>symbol then caches it. I suggest that this is an undesirable behaviour.
>
>First, it is quite possible for a cross-product behaviour to have a
>potentially huge number of symbols and that a significant fraction of these
>can be instantiated once only, e.g. when reading thru a 12-species genome
>alignment. Caching every instantiated cross-product symbol under these
>circumstances is very expensive on memory and also pointless.
>
>Next, the existing cache is a Map keyed on a list of Symbols. This forces
>all caching to run off this implementation which can be inefficient for
>certain alphabets.
>
>I propose to change the behaviour to leave all symbol implementation
>details and caching in cross-product/basis alphabets (including uniqueness
>checking) to the alphabet implementation. Are there any implications that I
>may not have considered (is it OK with serialisation?). Or objections? I
>think this change can be done without breaking the API.

I think as long as it is well documented how to do the caching that would be fine. Would you keep caching for core alphabets like DNA?

I suspect it might cause problems with serialisation but it might be avoidable. As long as there are unit tests for serialisation of both cached and uncached alphabets it should be OK. Careful attention might be needed for Gaps??

>Another change which I would like to be considered at some future stage (BJ
>2.0?) is a means of dealing with really large alphabets (think DNA**n). The
>size of the alphabet can readily exceed the limits of an int and therefore
>a solution will require breaking our FiniteAlphabet and AlphabetIndex APIs.
>I propose some extension that allows returning results for size and index
>in terms of BigInteger.

A similar suggestion has been made in the past for indexing SymbolLists in terms of BigInteger. How practical would such a large alphabet be? Eg unless you expect it to be pretty sparse in terms of the number of possible symbols that are actually seen you might get major problems with memory.

- Mark

>Regards,
>David

--
David Huen
Dept of Genetics
University of Cambridge
CB2 3EH
U.K.
_______________________________________________
biojava-dev mailing list
biojava-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-dev

From smh1008 at cam.ac.uk Thu Feb 15 07:16:49 2007
From: smh1008 at cam.ac.uk (David Huen)
Date: 15 Feb 2007 12:16:49 +0000
Subject: [Biojava-dev] AlphabetManager.createSymbol(...)
In-Reply-To:
References:
Message-ID:

On Feb 15 2007, mark.schreiber at novartis.com wrote:

>
>A similar suggestion has been made in the past for indexing SymbolLists in
>terms of BigInteger. How practical would such a large alphabet be? Eg
>unless you expect it to be pretty sparse in terms of the number of
>possible symbols that are actually seen you might get major problems with
>memory.
>

I think it is practical in the sense that even a simple (AA)^10 alphabet will exceed the range of int, but an alignment of 10 proteins may only be, say, 1000 residues long, so only a max of 1000 symbols will ever be instantiated, with far fewer needing to remain instantiated throughout the run.

I see less point for SymbolLists in that it seems unlikely that any chromosome could have more than an int's worth of bases.

The main reason I need these huge alphabets is for 1-D HMMs that run over genome alignments. I also hope to represent symbols in these alphabets internally by the BigInteger values of their alphabet index.

Incidentally, the SparseCrossProductAlphabet appeared to be caching every symbol it was ever asked for and I have changed that to a WeakValueHashMap internally now.

Regards,
David

--
David Huen
Dept of Genetics
University of Cambridge
CB2 3EH
U.K.

From kirankumarvoona at hotmail.com Wed Feb 21 12:37:05 2007
From: kirankumarvoona at hotmail.com (kiran kumar)
Date: Wed, 21 Feb 2007 09:37:05 -0800 (PST)
Subject: [Biojava-dev] Invitation to connect on LinkedIn
Message-ID: <10449181.1172079425867.JavaMail.app@app04.prod>

s,

I'd like to add you to my professional network on LinkedIn.

-kiran

PS: Here is the link: https://www.linkedin.com/e/isd/73729563/Jww4EOmW/

It is free to join and takes less than 60 seconds to sign up.

This is an exclusive invitation from kiran kumar to s biojava-dev at biojava.org. For security reasons, please do not forward this invitation.

From hl450 at hotmail.com Tue Feb 13 02:14:00 2007
From: hl450 at hotmail.com (Lee Heewook)
Date: Tue, 13 Feb 2007 07:14:00 -0000
Subject: [Biojava-dev] Changing the sample name of the ABI file
Message-ID:

Is there a way to change the sample name of the ABI file?
Hee
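
Editorial sketch for the AlphabetManager.createSymbol(...) thread above. It illustrates the two points David Huen raises: the size of a cross-product alphabet such as (AA)^10 does not fit in an int but is easy to express as a BigInteger, and a cache that holds its values only weakly stops pinning symbols that are used once. All names here (CrossProductSketch, crossProductSize, fitsInInt, WeakValueCache) are hypothetical and are not BioJava API. The cache is a simplified stand-in written for illustration, not the WeakValueHashMap mentioned in the thread.

```java
import java.lang.ref.WeakReference;
import java.math.BigInteger;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; none of these names come from BioJava.
class CrossProductSketch {

    /** Size of a cross-product of `factors` copies of an alphabet with `basisSize` symbols. */
    static BigInteger crossProductSize(int basisSize, int factors) {
        return BigInteger.valueOf(basisSize).pow(factors);
    }

    /** True if a non-negative size can still be returned as a Java int. */
    static boolean fitsInInt(BigInteger size) {
        return size.bitLength() < 32;
    }

    /**
     * Canonicalising cache: while a caller holds a value strongly, the same
     * instance is returned for an equal key; once nothing else references the
     * value, the garbage collector is free to reclaim it.
     * Simplification: cleared entries are only replaced on the next lookup of
     * the same key; a fuller version would purge them with a ReferenceQueue.
     */
    static final class WeakValueCache<K, V> {
        private final Map<K, WeakReference<V>> map = new HashMap<>();

        synchronized V getOrCreate(K key, Function<K, V> factory) {
            WeakReference<V> ref = map.get(key);
            V value = (ref == null) ? null : ref.get();
            if (value == null) {
                value = factory.apply(key);
                map.put(key, new WeakReference<>(value));
            }
            return value;
        }
    }

    public static void main(String[] args) {
        // (AA)^10: ten aligned protein sequences, 20 amino acids each.
        BigInteger aa10 = crossProductSize(20, 10);
        System.out.println(aa10 + " fits in int: " + fitsInInt(aa10));

        // One alignment column, keyed on its list of per-sequence symbols.
        WeakValueCache<List<String>, Object> cache = new WeakValueCache<>();
        Object symbol = cache.getOrCreate(Arrays.asList("A", "R", "N"), k -> new Object());
        Object again = cache.getOrCreate(Arrays.asList("A", "R", "N"), k -> new Object());
        System.out.println("same instance while referenced: " + (symbol == again));
    }
}
```

Keying the cache on the list of component symbols mirrors the "Map keyed on a list of Symbols" described in the thread. Whether a particular alphabet would want that key type, or any cache at all, is the design question the thread leaves to each alphabet implementation.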