From andreas.prlic at gmail.com Tue Nov 1 18:55:08 2011
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Tue, 1 Nov 2011 15:55:08 -0700
Subject: [Biojava-l] queries on biojava
In-Reply-To: <1319952341.16116.YahooMailNeo@web112102.mail.gq1.yahoo.com>
References: <1319952341.16116.YahooMailNeo@web112102.mail.gq1.yahoo.com>
Message-ID:

Hi Parinita,

Please don't send BioJava related questions to me directly, but to the
mailing list.

Probably your genome sequences are very long and require a lot of
memory? Try increasing your Java memory to the maximum available on
your computer with something like -Xmx4G. If this does not help and
your DNA sequences are too long, you might have to use either shorter
DNA sequences or one of the alignment algorithms that are available
for the alignment of larger genomes. E.g., Mummer might be one of the
options.

Andreas

On Sat, Oct 29, 2011 at 10:25 PM, parinita ujoodha wrote:
> Hello andreas,
> I am currently an undergraduate final year student on
> the computer science course at the university of Mauritius. I am doing a
> project on a genome comparison tool and using java and biojava as
> programming tools. I have certain queries and hope you can answer them.
> I have tried the MSA in the cookbook and it works fine but when I try to
> align genomes, the program crashes.
> I would sincerely like to know whether there is a way of
aligning genomes
> and displaying them in a pleasant interface
> I thank you in advance
>
> Regards
> Parinita
>
>
>

From andreas at sdsc.edu Tue Nov 1 19:19:17 2011
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 1 Nov 2011 16:19:17 -0700
Subject: [Biojava-l] [Biojava-dev] Genbank import
In-Reply-To:
References:
Message-ID:

Hi Ed,

Interesting that you found this, this ancient tutorial has not been
modified since the early 2000s and has been replaced with the Cookbook
pages:

http://biojava.org/wiki/BioJava:CookBook

Some of the code for this is available as part of the BioJava-legacy
(v 1.8) project. BioJava3 currently does not have a Genbank parser
yet, only fasta. I added it to the (new) missing features page:

http://www.biojava.org/wiki/BioJava3_Feature_Requests

Andreas

On Sun, Oct 30, 2011 at 7:54 PM, Ed Beaty wrote:
> Hello all.  Is it possible to import Genbank (*.gb) format files using BioJava 3?  I can't find the SeqIOTools class described in the GeneralReader example in the http://www.biojava.org/docs/bj_in_anger/ReadingGES.htm doc.
> Thanks,
> Ed Beaty
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

From biojava at hannes.oib.com Thu Nov 3 09:08:22 2011
From: biojava at hannes.oib.com (Hannes Brandstätter-Müller)
Date: Thu, 3 Nov 2011 14:08:22 +0100
Subject: [Biojava-l] Calculating edit distance between 2 DNA Sequences
Message-ID:

Hi!

Is there a Class/Method in Biojava that calculates the Levenshtein
distance between two sequences? I could not find anything in the docs
at first search.

I need to compare 2 DNASequences (or Strings) and get the number of
insertions, deletions, and substitutions. Ideally, there would be an
option to abort the comparison if the number of mismatches exceeds a
certain number.

Hannes

From biojava at hannes.oib.com Thu Nov 3 10:17:05 2011
From: biojava at hannes.oib.com (Hannes Brandstätter-Müller)
Date: Thu, 3 Nov 2011 15:17:05 +0100
Subject: [Biojava-l] Sequence Alignment and IUB Codes
Message-ID:

Does the biojava sequence alignment support IUB Codes?
http://en.wikipedia.org/wiki/FASTA_format#Sequence_representation

Hannes

From andreas at sdsc.edu Fri Nov 4 02:18:12 2011
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 3 Nov 2011 23:18:12 -0700
Subject: [Biojava-l] Sequence Alignment and IUB Codes
In-Reply-To:
References:
Message-ID:

Hi Hannes,

I think that depends which substitution matrix you are using. Some of
them don't support ambiguity codes.

Andreas

On Thu, Nov 3, 2011 at 7:17 AM, Hannes Brandstätter-Müller wrote:
> Does the biojava sequence alignment support IUB Codes?
> http://en.wikipedia.org/wiki/FASTA_format#Sequence_representation
>
> Hannes
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From biojava at hannes.oib.com Fri Nov 4 02:20:53 2011
From: biojava at hannes.oib.com (Hannes Brandstätter-Müller)
Date: Fri, 4 Nov 2011 07:20:53 +0100
Subject: [Biojava-l] Sequence Alignment and IUB Codes
In-Reply-To:
References:
Message-ID:

Thanks Andreas, of course, that's where I can configure it. Thanks for
the clarification.

Hannes

On Fri, Nov 4, 2011 at 07:18, Andreas Prlic wrote:
> Hi Hannes,
>
> I think that depends which substitution matrix you are using. Some of
> them don't support ambiguity codes.
>
> Andreas
>
> On Thu, Nov 3, 2011 at 7:17 AM, Hannes Brandstätter-Müller
> wrote:
>> Does the biojava sequence alignment support IUB Codes?
>> http://en.wikipedia.org/wiki/FASTA_format#Sequence_representation
>>
>> Hannes
>> _______________________________________________
>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>

From andreas at sdsc.edu Fri Nov 4 02:21:57 2011
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 3 Nov 2011 23:21:57 -0700
Subject: [Biojava-l] BioJava in education
Message-ID:

Hi,

Is anybody using BioJava in teaching, or has been introduced to
BioJava as part of a course?

Andreas

From cfriedline at vcu.edu Fri Nov 4 09:50:52 2011
From: cfriedline at vcu.edu (Chris Friedline)
Date: Fri, 4 Nov 2011 09:50:52 -0400
Subject: [Biojava-l] BioJava in education
In-Reply-To:
References:
Message-ID:

Andreas,

I had to come to it on my own. Seems the focus at this university is
on Perl only and there's a serious (IMO) disconnect with our CS
department. Java (or some other OO language) is only a minor
requirement for bioinformatics undergrads/graduate students. However,
I'd love to see this change and/or be involved in the development of a
course to do just that. We've been kicking around developing some
intersession modules around dealing with genetic data sets using
Java/BioJava, but doing actual research continues to stand in our way.

On a side note, have you seen the Java Evolutionary Biology Library?
Would be excellent to combine forces of these development efforts
since JEBL is well-vetted (e.g., FigTree, BEAST, Geneious), though
that's purely from a selfish user perspective. ;-)

Chris

On Nov 4, 2011, at 2:21 AM, Andreas Prlic wrote:

> Hi,
>
> Is anybody using BioJava in teaching, or has been introduced to
> BioJava as part of a course?
>
> Andreas
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

From andreas at sdsc.edu Fri Nov 4 19:11:46 2011
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 4 Nov 2011 16:11:46 -0700
Subject: [Biojava-l] BioJava in education
In-Reply-To:
References:
Message-ID:

Hi,

to quickly follow up: A few people have contacted me off list that
they were introduced to BioJava through some course. Interestingly no
teachers though who arranged this.

I would be interested in organising the exchange of teaching
materials. If anybody wants to exchange power points, homework
assignment ideas, etc. please let me know.

Chris wrote:
> On a side note, have you seen the Java Evolutionary Biology Library?  Would be excellent to combine forces of these development efforts since JEBL is well-vetted (e.g., FigTree, BEAST, Geneious), though that's purely from a selfish user perspective.  ;-)

That is a great suggestion. Seems that project does not really have a
web presence. Any idea how to contact the main developers? There is no
mailing list. I guess one could try via sourceforge somehow.

Andreas

> Chris
>
> On Nov 4, 2011, at 2:21 AM, Andreas Prlic wrote:
>
>> Hi,
>>
>> Is anybody using BioJava in teaching, or has been introduced to
>> BioJava as part of a course?
>>
>> Andreas
>> _______________________________________________
>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
>
>

From simon.rayner.cn at gmail.com Fri Nov 4 19:18:33 2011
From: simon.rayner.cn at gmail.com (simon rayner)
Date: Sat, 5 Nov 2011 07:18:33 +0800
Subject: [Biojava-l] BioJava in education
In-Reply-To:
References:
Message-ID:

i've been trying to use biojava as a segment of an advanced bioinformatics
course for masters students but these are students who are studying
virology with a specific interest in bioinformatics.
Nevertheless, some students take to it very easily but, for me, it
seems the major drawback with biojava is it is a lot more disjointed
compared to bioperl. By this i mean, it's not always easy to find the
classes you need, or if the class is available, it was broken during a
previous release, or simply wasn't included in the latest version. I
think biojava is the way to go for training and a course development
project might be the way to get a more cohesive documentation set.

On Fri, Nov 4, 2011 at 9:50 PM, Chris Friedline wrote:
>
>> Andreas,
>>
>> I had to come to it on my own. Seems the focus at this university is on
>> Perl only and there's a serious (IMO) disconnect with our CS department.
>> Java (or some other OO language) is only a minor requirement for
>> bioinformatics undergrads/graduate students. However, I'd love to see this
>> change and/or be involved in the development of a course to do just that.
>> We've been kicking around developing some intersession modules around
>> dealing with genetic data sets using Java/BioJava, but doing actual
>> research continues to stand in our way.
>>
>> On a side note, have you seen the Java Evolutionary Biology Library?
>> Would be excellent to combine forces of these development efforts since
>> JEBL is well-vetted (e.g., FigTree, BEAST, Geneious), though that's purely
>> from a selfish user perspective. ;-)
>>
>> Chris
>>
>> On Nov 4, 2011, at 2:21 AM, Andreas Prlic wrote:
>>
>> > Hi,
>> >
>> > Is anybody using BioJava in teaching, or has been introduced to
>> > BioJava as part of a course?
>> >
>> > Andreas
>> > _______________________________________________
>> > Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>>
>>
>> _______________________________________________
>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>
>
>
> --
> Simon Rayner
>
> State Key Laboratory of Virology
> Wuhan Institute of Virology
> Chinese Academy of Sciences
> Wuhan, Hubei 430071
> P.R.China
>
> +86 (27) 87199895 (office)
> +86 18627113001 (cell)
>
>

--
Simon Rayner

State Key Laboratory of Virology
Wuhan Institute of Virology
Chinese Academy of Sciences
Wuhan, Hubei 430071
P.R.China

+86 (27) 87199895 (office)
+86 18627113001 (cell)

From andreas at sdsc.edu Fri Nov 4 20:37:03 2011
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 4 Nov 2011 17:37:03 -0700
Subject: [Biojava-l] BioJava in education
In-Reply-To:
References:
Message-ID:

Thanks Simon, v. interesting. Can you provide some more details? What
kind of tasks were you requiring the students to perform?

I agree, a project to prepare course material would be great.

Andreas

On Fri, Nov 4, 2011 at 4:18 PM, simon rayner wrote:
> i've been trying to use biojava as a segment of an advanced bioinformatics
> course for masters students but these are students who are studying
> virology with a specific interest in bioinformatics.  Nevertheless, some
> students take to it very easily but, for me, it seems the major drawback
> with biojava is it is a lot more disjointed compared to bioperl. By this i
> mean, it's not always easy to find the classes you need, or if the class is
> available, it was broken during a previous release, or simply wasn't
> included in the latest version. I think biojava is the way to go for
> training and a course development project might be the way to get a more
> cohesive documentation set.
>
> On Fri, Nov 4, 2011 at 9:50 PM, Chris Friedline wrote:
>>
>>> Andreas,
>>>
>>> I had to come to it on my own.  Seems the focus at this university is on
>>> Perl only and there's a serious (IMO) disconnect with our CS department.
>>>  Java (or some other OO language) is only a minor requirement for
>>> bioinformatics undergrads/graduate students.  However, I'd love to see this
>>> change and/or be involved in the development of a course to do just that.
>>>  We've been kicking around developing some intersession modules around
>>> dealing with genetic data sets using Java/BioJava, but doing actual
>>> research continues to stand in our way.
>>>
>>> On a side note, have you seen the Java Evolutionary Biology Library?
>>>  Would be excellent to combine forces of these development efforts since
>>> JEBL is well-vetted (e.g., FigTree, BEAST, Geneious), though that's purely
>>> from a selfish user perspective.  ;-)
>>>
>>> Chris
>>>
>>> On Nov 4, 2011, at 2:21 AM, Andreas Prlic wrote:
>>>
>>> > Hi,
>>> >
>>> > Is anybody using BioJava in teaching, or has been introduced to
>>> > BioJava as part of a course?
>>> >
>>> > Andreas
>>> > _______________________________________________
>>> > Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>> > http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>
>>>
>>>
>>> _______________________________________________
>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>
>>
>>
>>
>> --
>> Simon Rayner
>>
>> State Key Laboratory of Virology
>> Wuhan Institute of Virology
>> Chinese Academy of Sciences
>> Wuhan, Hubei 430071
>> P.R.China
>>
>> +86 (27) 87199895 (office)
>> +86 18627113001 (cell)
>>
>>
>
>
> --
> Simon Rayner
>
> State Key Laboratory of Virology
> Wuhan Institute of Virology
> Chinese Academy of Sciences
> Wuhan, Hubei 430071
> P.R.China
>
> +86 (27) 87199895 (office)
> +86 18627113001 (cell)
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From cfriedline at vcu.edu Sat Nov 5 10:18:18 2011
From: cfriedline at vcu.edu (Chris Friedline)
Date: Sat, 5 Nov 2011 10:18:18 -0400
Subject: [Biojava-l] BioJava in education
In-Reply-To:
References:
Message-ID: <9B411F3A-FCC2-449A-9F3C-44AD91DF30BD@vcu.edu>

On Nov 4, 2011, at 7:11 PM, Andreas Prlic wrote:
>
> That is a great suggestion. Seems that project does not really have a
> web presence. Any idea how to contact the main developers? There is no
> mailing list. I guess one could try via sourceforge somehow.

You may want to contact either Alexei Drummond
(http://compevol.auckland.ac.nz/people/) or Andrew Rambaut
(http://tree.bio.ed.ac.uk/). I think their emails are on their
respective sites.
Chris

From biojava at hannes.oib.com Mon Nov 7 08:39:16 2011
From: biojava at hannes.oib.com (Hannes Brandstätter-Müller)
Date: Mon, 7 Nov 2011 14:39:16 +0100
Subject: [Biojava-l] Calculating edit distance between 2 DNA Sequences
In-Reply-To:
References:
Message-ID:

Following up:

If there is no such thing, should I make it available if I write it?

Hannes

On Thu, Nov 3, 2011 at 14:08, Hannes Brandstätter-Müller wrote:
> Hi!
>
> Is there a Class/Method in Biojava that calculates the Levenshtein
> distance between two sequences? I could not find anything in the docs
> at first search.
>
> I need to compare 2 DNASequences (or Strings) and get the number of
> insertions, deletions, and substitutions. Ideally, there would be an
> option to abort the comparison if the number of mismatches exceeds a
> certain number.
>
> Hannes
>

From andreas at sdsc.edu Mon Nov 7 09:42:57 2011
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 7 Nov 2011 06:42:57 -0800
Subject: [Biojava-l] Calculating edit distance between 2 DNA Sequences
In-Reply-To:
References:
Message-ID:

Hi Hannes,

you are right, this does not exist yet. Somebody else asked the same
question a few weeks ago. As such it would be great if you could
provide a patch, there might be other people interested in that, too.

Andreas

On Mon, Nov 7, 2011 at 5:39 AM, Hannes Brandstätter-Müller wrote:
> Following up:
>
> If there is no such thing, should I make it available if I write it?
>
> Hannes
>
> On Thu, Nov 3, 2011 at 14:08, Hannes Brandstätter-Müller
> wrote:
>> Hi!
>>
>> Is there a Class/Method in Biojava that calculates the Levenshtein
>> distance between two sequences? I could not find anything in the docs
>> at first search.
>>
>> I need to compare 2 DNASequences (or Strings) and get the number of
>> insertions, deletions, and substitutions. Ideally, there would be an
>> option to abort the comparison if the number of mismatches exceeds a
>> certain number.
>>
>> Hannes
>>
>
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From forumjspro at gmail.com Mon Nov 7 10:03:55 2011
From: forumjspro at gmail.com (forumjspro at gmail.com)
Date: Mon, 7 Nov 2011 16:03:55 +0100
Subject: [Biojava-l] Calculating edit distance between 2 DNA Sequences
In-Reply-To:
References:
Message-ID: <6671E3C3-790E-4545-B228-17D8238DD81C@gmail.com>

Hi Hannes,

You could do such a comparison by using the Needleman-Wunsch aligner
with the gap penalty set to -1 and the matrix set to -1 for mismatches
and 0 for matches. The absolute value of the resulting score is exactly
the number of errors.

But it will not stop when a maximal number of errors is reached ...

JS

On 7 Nov 2011 at 15:42, Andreas Prlic wrote:

> Hi Hannes,
>
> you are right, this does not exist yet. Somebody else asked the same
> question a few weeks ago. As such it would be great if you could
> provide a patch, there might be other people interested in that, too.
>
> Andreas
>
> On Mon, Nov 7, 2011 at 5:39 AM, Hannes Brandstätter-Müller
> wrote:
>> Following up:
>>
>> If there is no such thing, should I make it available if I write it?
>>
>> Hannes
>>
>> On Thu, Nov 3, 2011 at 14:08, Hannes Brandstätter-Müller
>> wrote:
>>> Hi!
>>>
>>> Is there a Class/Method in Biojava that calculates the Levenshtein
>>> distance between two sequences? I could not find anything in the docs
>>> at first search.
>>>
>>> I need to compare 2 DNASequences (or Strings) and get the number of
>>> insertions, deletions, and substitutions. Ideally, there would be an
>>> option to abort the comparison if the number of mismatches exceeds a
>>> certain number.
>>>
>>> Hannes
>>>
>>
>> _______________________________________________
>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

From cfriedline at vcu.edu Mon Nov 7 10:58:28 2011
From: cfriedline at vcu.edu (Chris Friedline)
Date: Mon, 7 Nov 2011 10:58:28 -0500
Subject: [Biojava-l] Calculating edit distance between 2 DNA Sequences
In-Reply-To:
References:
Message-ID: <7E0C3BDA-03F4-4723-BE91-0F4B274DD671@vcu.edu>

Hannes,

If you really want to use Levenshtein distance, there's a method in
Apache Commons Lang StringUtils.

Chris

On Nov 3, 2011, at 9:08 AM, Hannes Brandstätter-Müller wrote:

> Hi!
>
> Is there a Class/Method in Biojava that calculates the Levenshtein
> distance between two sequences? I could not find anything in the docs
> at first search.
>
> I need to compare 2 DNASequences (or Strings) and get the number of
> insertions, deletions, and substitutions. Ideally, there would be an
> option to abort the comparison if the number of mismatches exceeds a
> certain number.
>
> Hannes
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

From amr_alhossary at hotmail.com Mon Nov 7 09:48:28 2011
From: amr_alhossary at hotmail.com (Amr AL-Hossary)
Date: Mon, 7 Nov 2011 16:48:28 +0200
Subject: [Biojava-l] BioJava in education
In-Reply-To:
References:
Message-ID:

Sorry for the late reply.

Although I came to it on my own, I was interested in voluntarily
teaching it to the Bioinformatics group of ITi (a famous IT institute
here in Egypt). I focused on BioJava 1.7 and I may send you the
presentation if you like. Unfortunately, this department was closed
after 2 successive intakes, so I didn't update it.
Amr

--------------------------------------------------
From: "Andreas Prlic"
Sent: Friday, November 04, 2011 8:21 AM
To: "Biojava"
Subject: [Biojava-l] BioJava in education

> Hi,
>
> Is anybody using BioJava in teaching, or has been introduced to
> BioJava as part of a course?
>
> Andreas
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From biojava at hannes.oib.com Wed Nov 9 04:01:15 2011
From: biojava at hannes.oib.com (Hannes Brandstätter-Müller)
Date: Wed, 9 Nov 2011 10:01:15 +0100
Subject: [Biojava-l] Calculating edit distance between 2 DNA Sequences
In-Reply-To: <6671E3C3-790E-4545-B228-17D8238DD81C@gmail.com>
References: <6671E3C3-790E-4545-B228-17D8238DD81C@gmail.com>
Message-ID:

Thanks.

I am thinking about implementing a modified MatrixAligner to fit my
needs here.
Direct Levenshtein distance is not exactly right for my
application, because there are 0 to n insertions/deletions.

A direct Levenshtein distance as implemented in would not give me all
the information I need.

Thanks for the hints and input.

Hannes

PS we seriously need more examples in the Cookbook - I'll submit some
later, but the modified N-W Aligner mentioned below would make a good
example too, don't you think? ;)

On Mon, Nov 7, 2011 at 16:03, wrote:
> Hi Hannes,
>
> You could do such a comparison by using the Needleman-Wunsch aligner with gap penalty set to -1 and the matrix set to -1 for mismatches and 0 for matches. The absolute value of the resulting score is exactly the number of errors.
>
> But it will not stop when a maximal number of errors is reached ...
>
> JS
>
> On 7 Nov 2011 at 15:42, Andreas Prlic wrote:
>
>> Hi Hannes,
>>
>> you are right, this does not exist yet. Somebody else asked the same
>> question a few weeks ago. As such it would be great if you could
>> provide a patch, there might be other people interested in that, too.
>>
>> Andreas
>>
>> On Mon, Nov 7, 2011 at 5:39 AM, Hannes Brandstätter-Müller
>> wrote:
>>> Following up:
>>>
>>> If there is no such thing, should I make it available if I write it?
>>>
>>> Hannes
>>>
>>> On Thu, Nov 3, 2011 at 14:08, Hannes Brandstätter-Müller
>>> wrote:
>>>> Hi!
>>>>
>>>> Is there a Class/Method in Biojava that calculates the Levenshtein
>>>> distance between two sequences? I could not find anything in the docs
>>>> at first search.
>>>>
>>>> I need to compare 2 DNASequences (or Strings) and get the number of
>>>> insertions, deletions, and substitutions. Ideally, there would be an
>>>> option to abort the comparison if the number of mismatches exceeds a
>>>> certain number.
>>>>
>>>> Hannes
>>>>
>>>
>>> _______________________________________________
>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>
>>
>> _______________________________________________
>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
>

From khalil.elmazouari at gmail.com Mon Nov 14 11:00:37 2011
From: khalil.elmazouari at gmail.com (Khalil El Mazouari)
Date: Mon, 14 Nov 2011 17:00:37 +0100
Subject: [Biojava-l] Rename RichSequence
Message-ID: <97C210D8-F5C4-4E0D-AF59-FDB4BCE193F7@gmail.com>

Hi,

how to rename a RichSequence?

Thanks

From Dietmar.Birzer at biologie.uni-regensburg.de Mon Nov 14 11:53:29 2011
From: Dietmar.Birzer at biologie.uni-regensburg.de (Dietmar Birzer)
Date: Mon, 14 Nov 2011 17:53:29 +0100
Subject: [Biojava-l] Exception thrown when parsing GenBank file
Message-ID: <4EC15599020000AA0001B58D@gwsmtp1.uni-regensburg.de>

Hi all,

I am currently trying to debug a little software application which
uses BioJava's core-1.8.1.jar library because it has started to
throw exceptions a while ago.

I guess the problem is that the GenbankLocationParser is not
able to handle "Het" entries in the features section of the
GenBank/GenPept format, e.g.

     Het             join(bond(9),bond(125))
                     /heterogen="( NA,   5 )"

for database id 14719485 (http://www.ncbi.nlm.nih.gov/protein/14719485).

Calling

new GenpeptRichSequenceDB().getRichSequence("14719485");

will result in an error message like:

Error while querying 14719485!
org.biojava.bio.BioException: Failed to read Genbank sequence
	at org.biojavax.bio.db.ncbi.GenpeptRichSequenceDB.getRichSequence(GenpeptRichSequenceDB.java:158)
Caused by: org.biojava.bio.BioException: Could not read sequence
	at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
	at org.biojavax.bio.db.ncbi.GenpeptRichSequenceDB.getRichSequence(GenpeptRichSequenceDB.java:154)
	... 2 more
Caused by: org.biojava.bio.seq.io.ParseException: Could not understand position: bond(9
	at org.biojavax.bio.seq.io.GenbankLocationParser.parsePosition(GenbankLocationParser.java:286)
	at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocString(GenbankLocationParser.java:272)
	at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocString(GenbankLocationParser.java:237)
	at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocation(GenbankLocationParser.java:132)
	at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:508)
	at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
	... 3 more

I have checked several other database entries and none of the ones
that worked had a "Het" entry. Parsing also failed for 56965892,
13786715, 209156668 and 12084365.

Has anybody else come across this problem or knows how to fix it?

Thanks,
Dietmar

From p.j.a.cock at googlemail.com Mon Nov 14 12:29:31 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 14 Nov 2011 17:29:31 +0000
Subject: [Biojava-l] Exception thrown when parsing GenBank file
In-Reply-To: <4EC15599020000AA0001B58D@gwsmtp1.uni-regensburg.de>
References: <4EC15599020000AA0001B58D@gwsmtp1.uni-regensburg.de>
Message-ID:

On Mon, Nov 14, 2011 at 4:53 PM, Dietmar Birzer wrote:
>
> Hi all,
>
> I am currently trying to debug a little software application which
> uses BioJava's core-1.8.1.jar library because it has started to
> throw exceptions a while ago.
>
> I guess the problem is, that the GenbankLocationParser is not
> able to handle "Het" entries in the features section of the
> GenBank/GenPept format, e.g.
>
>      Het             join(bond(9),bond(125))
>                      /heterogen="( NA,   5 )"
>
> for database id 14719485 (http://www.ncbi.nlm.nih.gov/protein/14719485) .

Interesting - note the bond locations are not in the official
DDBJ/EMBL/GenBank feature table specification (v9, Oct 2011):
http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html

However, as noted on http://www.bioperl.org/wiki/BioPerl_Locations
that seems to be intended only for nucleotides and not proteins
as here.

It might be worth contacting the NCBI to find out if there is an
official specification covering these location strings?

Peter

From gwaldon at geneinfinity.org Mon Nov 14 18:20:02 2011
From: gwaldon at geneinfinity.org (George Waldon)
Date: Mon, 14 Nov 2011 17:20:02 -0600
Subject: [Biojava-l] Rename RichSequence
In-Reply-To: <97C210D8-F5C4-4E0D-AF59-FDB4BCE193F7@gmail.com>
References: <97C210D8-F5C4-4E0D-AF59-FDB4BCE193F7@gmail.com>
Message-ID: <20111114172002.19321y9clytb4ug0@gator1273.hostgator.com>

Quoting Khalil El Mazouari :

> Hi,
>
> how to rename a RichSequence?
>
> Thanks
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

Hi,

You cannot rename SimpleRichSequence by design. A simple way is to
wrap the sequence in your own implementation of RichSequence and
delegate to it.

Regards,
George

From biojava at hannes.oib.com Tue Nov 15 09:10:43 2011
From: biojava at hannes.oib.com (Hannes Brandstätter-Müller)
Date: Tue, 15 Nov 2011 15:10:43 +0100
Subject: [Biojava-l] Calculating edit distance between 2 DNA Sequences
In-Reply-To:
References: <6671E3C3-790E-4545-B228-17D8238DD81C@gmail.com>
Message-ID:

Well, I have implemented a first version that is running quite well
for me and my needs/specifications, although I did not integrate it
directly into the biojava class hierarchy yet.
Is anyone interested in taking a look at it and giving me some
feedback if I should invest the time and work to make it includable
into biojava?

Hannes

On Wed, Nov 9, 2011 at 10:01, Hannes Brandstätter-Müller wrote:
> Thanks.
>
> I am thinking about implementing a modified MatrixAligner to fit my
> needs here. Direct Levenshtein distance is not exactly right for my
> application, because there are 0 to n insertions/deletions.
>
> A direct Levenshtein distance as implemented in would not give me all
> the information I need.
>
> Thanks for the hints and input.
>
> Hannes
>
> PS we seriously need more examples in the Cookbook - I'll submit some
> later, but the modified N-W Aligner mentioned below would make a good
> example too, don't you think? ;)
>
> On Mon, Nov 7, 2011 at 16:03,  wrote:
>> Hi Hannes,
>>
>> You could do such a comparison by using the Needleman-Wunsch aligner with gap penalty set to -1 and the matrix set to -1 for mismatches and 0 for matches. The absolute value of the resulting score is exactly the number of errors.
>>
>> But it will not stop when a maximal number of errors is reached ...
>>
>> JS
>>
>> On 7 Nov 2011 at 15:42, Andreas Prlic wrote:
>>
>>> Hi Hannes,
>>>
>>> you are right, this does not exist yet. Somebody else asked the same
>>> question a few weeks ago. As such it would be great if you could
>>> provide a patch, there might be other people interested in that, too.
>>>
>>> Andreas
>>>
>>> On Mon, Nov 7, 2011 at 5:39 AM, Hannes Brandstätter-Müller
>>> wrote:
>>>> Following up:
>>>>
>>>> If there is no such thing, should I make it available if I write it?
>>>>
>>>> Hannes
>>>>
>>>> On Thu, Nov 3, 2011 at 14:08, Hannes Brandstätter-Müller
>>>> wrote:
>>>>> Hi!
>>>>>
>>>>> Is there a Class/Method in Biojava that calculates the Levenshtein
>>>>> distance between two sequences? I could not find anything in the docs
>>>>> at first search.
>>>>>
>>>>> I need to compare 2 DNASequences (or Strings) and get the number of
>>>>> insertions, deletions, and substitutions.
Ideally, there would be an >>>>> option to abort the comparison if the number of mismatches exceeds a >>>>> certain number. >>>>> >>>>> Hannes >>>>> >>>> >>>> _______________________________________________ >>>> Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>>> >>> >>> _______________________________________________ >>> Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> > From andreas at sdsc.edu Tue Nov 15 10:53:50 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Tue, 15 Nov 2011 07:53:50 -0800 Subject: [Biojava-l] Calculating edit distance between 2 DNA Sequences In-Reply-To: References: <6671E3C3-790E-4545-B228-17D8238DD81C@gmail.com> Message-ID: you could send it to the list here and ask for feedback.. Andreas On Tue, Nov 15, 2011 at 6:10 AM, Hannes Brandst?tter-M?ller wrote: > Well, I have implemented a first version that is running quite well > for me and my needs/specifications, although I did not integrate it > directly into the biojava class hierarchy yet. > Is anyone interested in taking a look at it and giving me some > feedback if I shoud invest the time and work to make it includable > into biojava? > > Hannes > > On Wed, Nov 9, 2011 at 10:01, Hannes Brandst?tter-M?ller > wrote: >> Thanks. >> >> I am thinking about implementing a modified MatrixAligner to fit my >> needs here. Direct Levenstein Distance is not exactly right for my >> application, because there are 0 to n insertions/deletions. >> >> A direct LevensteinDistance as implemented in would not give me all >> the Information I need. >> >> Thanks for the hints and input. >> >> Hannes >> >> PS we seriously need more examples in the Cookbook - I'll submit some >> later, but the modified N-W Aligner mentioned below would make a good >> example too, don't you think? ;) >> >> On Mon, Nov 7, 2011 at 16:03, ? 
wrote: >>> Hi Hannes, >>> >>> You could do such a comparison by using the Needleman-Wunsh aligner with gap penalty set to -1 and the matrix set to -1 for mismatches and 0 for matches. The absolute value of the resulting score is exactly the number of errors. >>> >>> But it will not stop when a maximal number of errors is reached ... >>> >>> JS >>> >>> Le 7 nov. 2011 ? 15:42, Andreas Prlic a ?crit : >>> >>>> Hi Hannes, >>>> >>>> you are right, this does not exist yet. Somebody else asked the same >>>> question a few weeks ago. As such it would be great if you could >>>> provide a patch, there might be other people interested in that, too. >>>> >>>> Andreas >>>> >>>> On Mon, Nov 7, 2011 at 5:39 AM, Hannes Brandst?tter-M?ller >>>> wrote: >>>>> Following up: >>>>> >>>>> If there is no such thing, should I make it available if I write it? >>>>> >>>>> Hannes >>>>> >>>>> On Thu, Nov 3, 2011 at 14:08, Hannes Brandst?tter-M?ller >>>>> wrote: >>>>>> Hi! >>>>>> >>>>>> Is there a Class/Method in Biojava that calculates the Levenshtein >>>>>> distance between two sequences? I could not find anything in the docs >>>>>> at first search. >>>>>> >>>>>> I need to compare 2 DNASequences (or Strings) and get the number of >>>>>> insertions, deletions, and substitutions. Ideally, there would be an >>>>>> option to abort the comparison if the number of mismatches exceeds a >>>>>> certain number. 
>>>>>> >>>>>> Hannes >>>>>> >>>>> >>>>> _______________________________________________ >>>>> Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>>>> >>>> >>>> _______________________________________________ >>>> Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >>> >> > > _______________________________________________ > Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From lfrohman at gmail.com Wed Nov 16 11:14:18 2011 From: lfrohman at gmail.com (Lance Frohman) Date: Wed, 16 Nov 2011 08:14:18 -0800 Subject: [Biojava-l] biojava-ensembl Message-ID: Where do I find the source code for biojava-ensembl? thanks From Dietmar.Birzer at biologie.uni-regensburg.de Wed Nov 23 10:18:29 2011 From: Dietmar.Birzer at biologie.uni-regensburg.de (Dietmar Birzer) Date: Wed, 23 Nov 2011 16:18:29 +0100 Subject: [Biojava-l] Antw: Re: Exception thrown when parsing GenBank file In-Reply-To: References: <4EC15599020000AA0001B58D@gwsmtp1.uni-regensburg.de> Message-ID: <4ECD1CD5020000AA0001BADF@gwsmtp1.uni-regensburg.de> Hi all, as the GenbankLocationParser from biojava-1.8.1 is not working properly anymore, I was wondering if there is an equivalent way to do this ( GenpeptRichSequenceDB().getRichSequence("14719485") ) using BioJava 3. Unfortunately I could not find any GenBank/GenPept parser so far. Is that because it does not exist (yet), or just because I have not looked properly? 
Best wishes Dietmar >>> Peter Cock 11/14/2011 6:29 PM >>> On Mon, Nov 14, 2011 at 4:53 PM, Dietmar Birzer wrote: > > Hi all, > > I am currently trying to debug a little software application which > uses BioJava's core-1.8.1.jar library because it has started to > throw exceptions a while ago. > > I guess the problem is, that the GenbankLocationParser is not > able to handle "Het" entries in the features section of the > GenBank/GenPept format, e.g. > > Het join(bond(9),bond(125)) > /heterogen="( NA, 5 )" > > for database id 14719485 (http://www.ncbi.nlm.nih.gov/protein/14719485) . Interesting - note the bond locations are not in the official DDBJ/EMBL/GenBank feature table specification (v9, Oct 2011): http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html However, as noted on http://www.bioperl.org/wiki/BioPerl_Locations that seems to be intended only for nucleotides and not proteins as here. It might be worth contacting the NCBI to find out if there is an official specification covering these location strings? Peter From andreas at sdsc.edu Wed Nov 23 10:34:46 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 23 Nov 2011 07:34:46 -0800 Subject: [Biojava-l] Antw: Re: Exception thrown when parsing GenBank file In-Reply-To: <4ECD1CD5020000AA0001BADF@gwsmtp1.uni-regensburg.de> References: <4EC15599020000AA0001B58D@gwsmtp1.uni-regensburg.de> <4ECD1CD5020000AA0001BADF@gwsmtp1.uni-regensburg.de> Message-ID: Hi Dietmar, The genbank parser is on top of the feature request list for biojava 3: http://biojava.org/wiki/BioJava3_Feature_Requests Anybody who wants to take an initiative here and claim ownership of this topic is welcome... Andreas On Wed, Nov 23, 2011 at 7:18 AM, Dietmar Birzer wrote: > ?Hi all, > > as the GenbankLocationParser from biojava-1.8.1 is not working properly anymore, I was wondering if there is an equivalent way to do this ( GenpeptRichSequenceDB().getRichSequence("14719485") ) using BioJava 3. 
> Unfortunately I could not find any GenBank/GenPept parser so far. Is that because it does not exist (yet), or just because I have not looked properly? > > Best wishes > ?Dietmar > >>>> Peter Cock 11/14/2011 6:29 PM >>> > On Mon, Nov 14, 2011 at 4:53 PM, Dietmar Birzer wrote: >> >> Hi all, >> >> I am currently trying to debug a little software application which >> uses BioJava's core-1.8.1.jar library because it has started to >> throw exceptions a while ago. >> >> I guess the problem is, that the GenbankLocationParser is not >> able to handle "Het" entries in the features section of the >> GenBank/GenPept format, e.g. >> >> ?Het ? ? ? ? ? ? join(bond(9),bond(125)) >> ? ? ? ? ? ? ? ? ? ? /heterogen="( NA, ? 5 )" >> >> for database id 14719485 (http://www.ncbi.nlm.nih.gov/protein/14719485) . > > Interesting - note the bond locations are not in the official > DDBJ/EMBL/GenBank feature table specification (v9, Oct 2011): > http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html > > However, as noted on http://www.bioperl.org/wiki/BioPerl_Locations > that seems to be intended only for nucleotides and not proteins as here. > It might be worth contacting the NCBI to find out if there is an official > specification covering these location strings? 
> > Peter > > > > _______________________________________________ > Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From gwaldon at geneinfinity.org Wed Nov 23 11:06:57 2011 From: gwaldon at geneinfinity.org (George Waldon) Date: Wed, 23 Nov 2011 10:06:57 -0600 Subject: [Biojava-l] Antw: Re: Exception thrown when parsing GenBank file In-Reply-To: <4ECD1CD5020000AA0001BADF@gwsmtp1.uni-regensburg.de> References: <4EC15599020000AA0001B58D@gwsmtp1.uni-regensburg.de> <4ECD1CD5020000AA0001BADF@gwsmtp1.uni-regensburg.de> Message-ID: <20111123100657.23712tvrvitfya0w@gator1273.hostgator.com> Hi Dietmar, The GenbankLocationParser from biojava-1.8.1 is working perfectly as long as you feed him with DNA locations that use the syntax described in the DDBJ/EMBL/GenBank feature table definition. What you've got (join(bond(9),bond(125))...) is a pseudo GenBank format that apparently describes protein structure and uses a slightly different syntax to indicate heteroatom locations. That is why you get all these exceptions thrown. George Quoting Dietmar Birzer : > Hi all, > > as the GenbankLocationParser from biojava-1.8.1 is not working > properly anymore, I was wondering if there is an equivalent way to > do this ( GenpeptRichSequenceDB().getRichSequence("14719485") ) > using BioJava 3. > Unfortunately I could not find any GenBank/GenPept parser so far. Is > that because it does not exist (yet), or just because I have not > looked properly? > > Best wishes > Dietmar > >>>> Peter Cock 11/14/2011 6:29 PM >>> > On Mon, Nov 14, 2011 at 4:53 PM, Dietmar Birzer wrote: >> >> Hi all, >> >> I am currently trying to debug a little software application which >> uses BioJava's core-1.8.1.jar library because it has started to >> throw exceptions a while ago. >> >> I guess the problem is, that the GenbankLocationParser is not >> able to handle "Het" entries in the features section of the >> GenBank/GenPept format, e.g. 
>> >> Het join(bond(9),bond(125)) >> /heterogen="( NA, 5 )" >> >> for database id 14719485 (http://www.ncbi.nlm.nih.gov/protein/14719485) . > > Interesting - note the bond locations are not in the official > DDBJ/EMBL/GenBank feature table specification (v9, Oct 2011): > http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html > > However, as noted on http://www.bioperl.org/wiki/BioPerl_Locations > that seems to be intended only for nucleotides and not proteins as here. > It might be worth contacting the NCBI to find out if there is an official > specification covering these location strings? > > Peter > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From p.j.a.cock at googlemail.com Wed Nov 23 12:01:11 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 23 Nov 2011 17:01:11 +0000 Subject: [Biojava-l] Antw: Re: Exception thrown when parsing GenBank file In-Reply-To: <20111123100657.23712tvrvitfya0w@gator1273.hostgator.com> References: <4EC15599020000AA0001B58D@gwsmtp1.uni-regensburg.de> <4ECD1CD5020000AA0001BADF@gwsmtp1.uni-regensburg.de> <20111123100657.23712tvrvitfya0w@gator1273.hostgator.com> Message-ID: On Wed, Nov 23, 2011 at 4:06 PM, George Waldon wrote: > Hi Dietmar, > > The GenbankLocationParser from biojava-1.8.1 is working perfectly as long as > you feed him with DNA locations that use the syntax described in the > DDBJ/EMBL/GenBank feature table definition. As mentioned earlier, that document only covers nucleotide sequences, not proteins as used in GenPept and the EMBL patents database: http://lists.open-bio.org/pipermail/biojava-l/2011-November/007889.html > What you've got (join(bond(9),bond(125))...) is a pseudo GenBank format > that apparently describes protein structure and uses a slightly different > syntax to indicate heteroatom locations. That is why you get all these > exceptions thrown. 
> > George It would be interesting to see if the EBI offer the same record(s) in EMBL format, and if that too uses the (undocumented?) bond locations. Peter From andreas.prlic at gmail.com Tue Nov 1 22:55:08 2011 From: andreas.prlic at gmail.com (Andreas Prlic) Date: Tue, 1 Nov 2011 15:55:08 -0700 Subject: [Biojava-l] queries on biojava In-Reply-To: <1319952341.16116.YahooMailNeo@web112102.mail.gq1.yahoo.com> References: <1319952341.16116.YahooMailNeo@web112102.mail.gq1.yahoo.com> Message-ID: Hi Parinita, Please don't send BioJava related questions to me directly, but to the mailing list. Probably your genome sequences are very long and require a lot of memory? Try increasing your Java memory to the maximum available on your computer with something like -Xmx4G . If this does not help and your DNA sequences are too long, you might have to use either shorter DNA sequences, or you use one of the alignment algorithms that are available for the alignment of larger genomes. Eg. Mummer might be one of the options. Andreas On Sat, Oct 29, 2011 at 10:25 PM, parinita ujoodha wrote: > Hello andreas, > ??? ??? ??? ??? ??? I am currently an undergraduate final year student on > the computer science course at the university of Mauritius.? I am doing a > project on a genome comparison tool and using java and biojava as > programming tools.? I have certain queries and hope you can answer them. > I have tried the MSA in the cookbook and it works fine but when I try to > align genomes, the program crashes. > I would sincerely like to know whether there is a way of? 
aligning genomes > and displaying them in a pleasant interface > I thank you in advance > > Regards > Parinita > > > From andreas at sdsc.edu Tue Nov 1 23:19:17 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Tue, 1 Nov 2011 16:19:17 -0700 Subject: [Biojava-l] [Biojava-dev] Genbank import In-Reply-To: References: Message-ID: Hi Ed, Interesting that you found this, this ancient tutorial has not been modified since the early 2000s and has been replaced with the Cookbook pages: http://biojava.org/wiki/BioJava:CookBook Some of the code for this is available as part of the BioJava-legacy (v 1.8) project. BioJava3 currently does not have a Genbank parser yet, only fasta. I added it to the (new) missing features page : http://www.biojava.org/wiki/BioJava3_Feature_Requests Andreas On Sun, Oct 30, 2011 at 7:54 PM, Ed Beaty wrote: > Hello all. ?Is it possible to import Genbank (*.gb) format files using BioJava 3? ?I can't find the SeqIOTools class described in the GeneralReader example in the http://www.biojava.org/docs/bj_in_anger/ReadingGES.htm doc. > Thanks, > Ed Beaty > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From member at linkedin.com Wed Nov 2 02:57:54 2011 From: member at linkedin.com (Huijie Qiao via LinkedIn) Date: Wed, 2 Nov 2011 02:57:54 +0000 (UTC) Subject: [Biojava-l] Invitation to connect on LinkedIn Message-ID: <1927181069.419433.1320202674916.JavaMail.app@ela4-bed82.prod> LinkedIn ------------ Huijie Qiao requested to add you as a connection on LinkedIn: ------------------------------------------ Christopher, I'd like to add you to my professional network on LinkedIn. 
Accept invitation from Huijie Qiao http://www.linkedin.com/e/triamj-guhql3hb-5d/zz8MFWe4hXWC7m_VsDmWDUKsZA0p5qlGHsp1420HEwv/blk/I24690062_16/pmpxnSRJrSdvj4R5fnhv9ClRsDgZp6lQs6lzoQ5AomZIpn8_dz5vczoMc3ASd399bQtmrkhhdlhObPsRd38Ve3gVcPgLrCBxbOYWrSlI/EML_comm_afe/?hs=false&tok=1poUfgEBGNLAY1 View invitation from Huijie Qiao http://www.linkedin.com/e/triamj-guhql3hb-5d/zz8MFWe4hXWC7m_VsDmWDUKsZA0p5qlGHsp1420HEwv/blk/I24690062_16/0SclYOdz0MejoQcAALqnpPbOYWrSlI/svi/?hs=false&tok=14dMLFvIWNLAY1 ------------------------------------------ Why might connecting with Huijie Qiao be a good idea? Have a question? Huijie Qiao's network will probably have an answer: You can use LinkedIn Answers to distribute your professional questions to Huijie Qiao and your extended network. You can get high-quality answers from experienced professionals. http://www.linkedin.com/e/triamj-guhql3hb-5d/ash/inv19_ayn/?hs=false&tok=1eLzvWwpiNLAY1 -- (c) 2011, LinkedIn Corporation From biojava at hannes.oib.com Thu Nov 3 13:08:22 2011 From: biojava at hannes.oib.com (=?ISO-8859-1?Q?Hannes_Brandst=E4tter=2DM=FCller?=) Date: Thu, 3 Nov 2011 14:08:22 +0100 Subject: [Biojava-l] Calculating edit distance between 2 DNA Sequences Message-ID: Hi! Is there a Class/Method in Biojava that calculates the Levenshtein distance between two sequences? I could not find anything in the docs at first search. I need to compare 2 DNASequences (or Strings) and get the number of insertions, deletions, and substitutions. Ideally, there would be an option to abort the comparison if the number of mismatches exceeds a certain number. Hannes From biojava at hannes.oib.com Thu Nov 3 14:17:05 2011 From: biojava at hannes.oib.com (=?ISO-8859-1?Q?Hannes_Brandst=E4tter=2DM=FCller?=) Date: Thu, 3 Nov 2011 15:17:05 +0100 Subject: [Biojava-l] Sequence Alignment and IUB Codes Message-ID: Does the biojava sequence alignment support IUB Codes? 
http://en.wikipedia.org/wiki/FASTA_format#Sequence_representation Hannes From andreas at sdsc.edu Fri Nov 4 06:18:12 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 3 Nov 2011 23:18:12 -0700 Subject: [Biojava-l] Sequence Alignment and IUB Codes In-Reply-To: References: Message-ID: Hi Hannes, I think that depends which substitution matrix you are using. Some of them don't support ambiguity codes. Andreas On Thu, Nov 3, 2011 at 7:17 AM, Hannes Brandst?tter-M?ller wrote: > Does the biojava sequence alignment support IUB Codes? > http://en.wikipedia.org/wiki/FASTA_format#Sequence_representation > > Hannes > _______________________________________________ > Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From biojava at hannes.oib.com Fri Nov 4 06:20:53 2011 From: biojava at hannes.oib.com (=?ISO-8859-1?Q?Hannes_Brandst=E4tter=2DM=FCller?=) Date: Fri, 4 Nov 2011 07:20:53 +0100 Subject: [Biojava-l] Sequence Alignment and IUB Codes In-Reply-To: References: Message-ID: Thanks Andreas, of course, that's where I can configure it. Thanks for the clarification. Hannes On Fri, Nov 4, 2011 at 07:18, Andreas Prlic wrote: > Hi Hannes, > > I think that depends which substitution matrix you are using. Some of > them don't support ambiguity codes. > > Andreas > > On Thu, Nov 3, 2011 at 7:17 AM, Hannes Brandst?tter-M?ller > wrote: >> Does the biojava sequence alignment support IUB Codes? 
>> http://en.wikipedia.org/wiki/FASTA_format#Sequence_representation >> >> Hannes >> _______________________________________________ >> Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > From andreas at sdsc.edu Fri Nov 4 06:21:57 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 3 Nov 2011 23:21:57 -0700 Subject: [Biojava-l] BioJava in education Message-ID: Hi, Is anybody using BioJava in teaching, or has been introduced to BioJava as part of a course? Andreas From cfriedline at vcu.edu Fri Nov 4 13:50:52 2011 From: cfriedline at vcu.edu (Chris Friedline) Date: Fri, 4 Nov 2011 09:50:52 -0400 Subject: [Biojava-l] BioJava in education In-Reply-To: References: Message-ID: Andreas, I had to come to it on my own. Seems the focus at this university is on Perl only and there's a serious (IMO) disconnect with our CS department. Java (or some other OO language) is only a minor requirement for bioinformatics undergrads/graduate students. However, I'd love to see this change and/or be involved in the development of a course to do just that. We've been kicking around developing some intersession modules around dealing with genetic data sets using Java/BioJava, but doing actual research continues to stand in our way. On a side note, have you seen the Java Evolutionary Biology Library? Would be excellent to combine forces of these development efforts since JEBL is well-vetted (e.g., FigTree, BEAST, Geneious), though that's purely from a selfish user perspective. ;-) Chris On Nov 4, 2011, at 2:21 AM, Andreas Prlic wrote: > Hi, > > Is anybody using BioJava in teaching, or has been introduced to > BioJava as part of a course? 
> > Andreas > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From andreas at sdsc.edu Fri Nov 4 23:11:46 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Fri, 4 Nov 2011 16:11:46 -0700 Subject: [Biojava-l] BioJava in education In-Reply-To: References: Message-ID: Hi, to quickly follow up: A few people have contacted me off list that they were introduced to BioJava through some course. Interestingly no teachers though who arranged this. I would be interested in organising the exchange of teaching materials. If anybody wants to exchange power points, homework assignment ideas, etc. please let me know. Chris wrote: > On a side note, have you seen the Java Evolutionary Biology Library? ?Would be excellent to combine forces of these development efforts since JEBL is well-vetted (e.g., FigTree, BEAST, Geneious), though that's purely from a selfish user perspective. ?;-) That is a great suggestion. Seems that project does not really have a web presence. Any idea how to contact the main developers? There is no mailing list. I guess one could try via sourceforge somehow. Andreas > Chris > > On Nov 4, 2011, at 2:21 AM, Andreas Prlic wrote: > >> Hi, >> >> Is anybody using BioJava in teaching, or has been introduced to >> BioJava as part of a course? >> >> Andreas >> _______________________________________________ >> Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l > > > From simon.rayner.cn at gmail.com Fri Nov 4 23:18:33 2011 From: simon.rayner.cn at gmail.com (simon rayner) Date: Sat, 5 Nov 2011 07:18:33 +0800 Subject: [Biojava-l] BioJava in education In-Reply-To: References: Message-ID: i've been trying to use biojava as a segment of an advanced bioinformatics course for masters students but these are students who are studying virology with a specific interest in bioinformatics. 
Nevertheless, some students take to it very easily but, for me, it seems the major drawback with biojava is it is a lot more disjointed compared to bioperl,. By this i mean, it's not always easy to find the classes you need, or if the class is available, it was broken during a previous release, or simply wasn't included in the latest version. I think biojava is the way to go for training and a course development project might be the way to get a more cohesive documentation set. On Fri, Nov 4, 2011 at 9:50 PM, Chris Friedline wrote: > >> Andreas, >> >> I had to come to it on my own. Seems the focus at this university is on >> Perl only and there's a serious (IMO) disconnect with our CS department. >> Java (or some other OO language) is only a minor requirement for >> bioinformatics undergrads/graduate students. However, I'd love to see this >> change and/or be involved in the development of a course to do just that. >> We've been kicking around developing some intersession modules around >> dealing with genetic data sets using Java/BioJava, but doing actual >> research continues to stand in our way. >> >> On a side note, have you seen the Java Evolutionary Biology Library? >> Would be excellent to combine forces of these development efforts since >> JEBL is well-vetted (e.g., FigTree, BEAST, Geneious), though that's purely >> from a selfish user perspective. ;-) >> >> Chris >> >> On Nov 4, 2011, at 2:21 AM, Andreas Prlic wrote: >> >> > Hi, >> > >> > Is anybody using BioJava in teaching, or has been introduced to >> > BioJava as part of a course? 
>> > >> > Andreas >> > _______________________________________________ >> > Biojava-l mailing list - Biojava-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > > > > -- > Simon Rayner > > State Key Laboratory of Virology > Wuhan Institute of Virology > Chinese Academy of Sciences > Wuhan, Hubei 430071 > P.R.China > > +86 (27) 87199895 (office) > +86 18627113001 (cell) > > -- Simon Rayner State Key Laboratory of Virology Wuhan Institute of Virology Chinese Academy of Sciences Wuhan, Hubei 430071 P.R.China +86 (27) 87199895 (office) +86 18627113001 (cell) From andreas at sdsc.edu Sat Nov 5 00:37:03 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Fri, 4 Nov 2011 17:37:03 -0700 Subject: [Biojava-l] BioJava in education In-Reply-To: References: Message-ID: Thanks Simon, v. interesting. Can you provide some more details? What kind of tasks were you requiring the students to perform? I agree, a project to prepare course material would be great. Andreas On Fri, Nov 4, 2011 at 4:18 PM, simon rayner wrote: > i've been trying to use biojava as a segment of an advanced bioinformatics > course for masters students but these are students who are studying > virology with a specific interest in bioinformatics. ?Nevertheless, some > students take to it very easily but, for me, it seems the major drawback > with biojava is it is a lot more disjointed compared to bioperl,. By this i > mean, it's not always easy to find the classes you need, or if the class is > available, it was broken during a previous release, or simply wasn't > included in the latest version. I think biojava is the way to go for > training and a course development project might be the way to get a more > cohesive documentation set. 
> > On Fri, Nov 4, 2011 at 9:50 PM, Chris Friedline wrote: >> >>> Andreas, >>> >>> I had to come to it on my own. ?Seems the focus at this university is on >>> Perl only and there's a serious (IMO) disconnect with our CS department. >>> ?Java (or some other OO language) is only a minor requirement for >>> bioinformatics undergrads/graduate students. ?However, I'd love to see this >>> change and/or be involved in the development of a course to do just that. >>> ?We've been kicking around developing some intersession modules around >>> dealing with genetic data sets using Java/BioJava, but doing actual >>> research continues to stand in our way. >>> >>> On a side note, have you seen the Java Evolutionary Biology Library? >>> ?Would be excellent to combine forces of these development efforts since >>> JEBL is well-vetted (e.g., FigTree, BEAST, Geneious), though that's purely >>> from a selfish user perspective. ?;-) >>> >>> Chris >>> >>> On Nov 4, 2011, at 2:21 AM, Andreas Prlic wrote: >>> >>> > Hi, >>> > >>> > Is anybody using BioJava in teaching, or has been introduced to >>> > BioJava as part of a course? 
>>> > >>> > Andreas >>> > _______________________________________________ >>> > Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org >>> > http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >>> >>> >>> _______________________________________________ >>> Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >> >> >> >> -- >> Simon Rayner >> >> State Key Laboratory of Virology >> Wuhan Institute of Virology >> Chinese Academy of Sciences >> Wuhan, Hubei 430071 >> P.R.China >> >> +86 (27) 87199895 (office) >> +86 18627113001 (cell) >> >> > > > -- > Simon Rayner > > State Key Laboratory of Virology > Wuhan Institute of Virology > Chinese Academy of Sciences > Wuhan, Hubei 430071 > P.R.China > > +86 (27) 87199895?(office) > +86 18627113001?(cell) > _______________________________________________ > Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From cfriedline at vcu.edu Sat Nov 5 14:18:18 2011 From: cfriedline at vcu.edu (Chris Friedline) Date: Sat, 5 Nov 2011 10:18:18 -0400 Subject: [Biojava-l] BioJava in education In-Reply-To: References: Message-ID: <9B411F3A-FCC2-449A-9F3C-44AD91DF30BD@vcu.edu> On Nov 4, 2011, at 7:11 PM, Andreas Prlic wrote: > > That is a great suggestion. Seems that project does not really have a > web presence. Any idea how to contact the main developers? There is no > mailing list. I guess one could try via sourceforge somehow. You may want to contact either Alexei Drummond (http://compevol.auckland.ac.nz/people/) or Andrew Rambaut (http://tree.bio.ed.ac.uk/). I think their emails are on their respective sites. 
Chris From biojava at hannes.oib.com Mon Nov 7 13:39:16 2011 From: biojava at hannes.oib.com (=?ISO-8859-1?Q?Hannes_Brandst=E4tter=2DM=FCller?=) Date: Mon, 7 Nov 2011 14:39:16 +0100 Subject: [Biojava-l] Calculating edit distance between 2 DNA Sequences In-Reply-To: References: Message-ID: Following up: If there is no such thing, should I make it available if I write it? Hannes On Thu, Nov 3, 2011 at 14:08, Hannes Brandst?tter-M?ller wrote: > Hi! > > Is there a Class/Method in Biojava that calculates the Levenshtein > distance between two sequences? I could not find anything in the docs > at first search. > > I need to compare 2 DNASequences (or Strings) and get the number of > insertions, deletions, and substitutions. Ideally, there would be an > option to abort the comparison if the number of mismatches exceeds a > certain number. > > Hannes > From andreas at sdsc.edu Mon Nov 7 14:42:57 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Mon, 7 Nov 2011 06:42:57 -0800 Subject: [Biojava-l] Calculating edit distance between 2 DNA Sequences In-Reply-To: References: Message-ID: Hi Hannes, you are right, this does not exist yet. Somebody else asked the same question a few weeks ago. As such it would be great if you could provide a patch, there might be other people interested in that, too. Andreas On Mon, Nov 7, 2011 at 5:39 AM, Hannes Brandst?tter-M?ller wrote: > Following up: > > If there is no such thing, should I make it available if I write it? > > Hannes > > On Thu, Nov 3, 2011 at 14:08, Hannes Brandst?tter-M?ller > wrote: >> Hi! >> >> Is there a Class/Method in Biojava that calculates the Levenshtein >> distance between two sequences? I could not find anything in the docs >> at first search. >> >> I need to compare 2 DNASequences (or Strings) and get the number of >> insertions, deletions, and substitutions. Ideally, there would be an >> option to abort the comparison if the number of mismatches exceeds a >> certain number. 
>>
>> Hannes
>>
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From forumjspro at gmail.com Mon Nov 7 15:03:55 2011
From: forumjspro at gmail.com (forumjspro at gmail.com)
Date: Mon, 7 Nov 2011 16:03:55 +0100
Subject: [Biojava-l] Calculating edit distance between 2 DNA Sequences
In-Reply-To:
References:
Message-ID: <6671E3C3-790E-4545-B228-17D8238DD81C@gmail.com>

Hi Hannes,

You could do such a comparison by using the Needleman-Wunsch aligner with the gap penalty set to -1 and the matrix set to -1 for mismatches and 0 for matches. The absolute value of the resulting score is exactly the number of errors.

But it will not stop when a maximal number of errors is reached ...

JS

On 7 Nov 2011 at 15:42, Andreas Prlic wrote:

> Hi Hannes,
>
> you are right, this does not exist yet. Somebody else asked the same
> question a few weeks ago. As such it would be great if you could
> provide a patch, there might be other people interested in that, too.
>
> Andreas
>
> On Mon, Nov 7, 2011 at 5:39 AM, Hannes Brandstätter-Müller
> wrote:
>> Following up:
>>
>> If there is no such thing, should I make it available if I write it?
>>
>> Hannes
>>
>> On Thu, Nov 3, 2011 at 14:08, Hannes Brandstätter-Müller
>> wrote:
>>> Hi!
>>>
>>> Is there a Class/Method in Biojava that calculates the Levenshtein
>>> distance between two sequences? I could not find anything in the docs
>>> at first search.
>>>
>>> I need to compare 2 DNASequences (or Strings) and get the number of
>>> insertions, deletions, and substitutions. Ideally, there would be an
>>> option to abort the comparison if the number of mismatches exceeds a
>>> certain number.
>>> >>> Hannes >>> >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From cfriedline at vcu.edu Mon Nov 7 15:58:28 2011 From: cfriedline at vcu.edu (Chris Friedline) Date: Mon, 7 Nov 2011 10:58:28 -0500 Subject: [Biojava-l] Calculating edit distance between 2 DNA Sequences In-Reply-To: References: Message-ID: <7E0C3BDA-03F4-4723-BE91-0F4B274DD671@vcu.edu> Hannes, If you really want to use Levenshtein distance, there's a method for it (getLevenshteinDistance) in Apache Commons Lang StringUtils. Chris On Nov 3, 2011, at 9:08 AM, Hannes Brandstätter-Müller wrote: > Hi! > > Is there a Class/Method in Biojava that calculates the Levenshtein > distance between two sequences? I could not find anything in the docs > at first search. > > I need to compare 2 DNASequences (or Strings) and get the number of > insertions, deletions, and substitutions. Ideally, there would be an > option to abort the comparison if the number of mismatches exceeds a > certain number. > > Hannes > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From amr_alhossary at hotmail.com Mon Nov 7 14:48:28 2011 From: amr_alhossary at hotmail.com (Amr AL-Hossary) Date: Mon, 7 Nov 2011 16:48:28 +0200 Subject: [Biojava-l] BioJava in education In-Reply-To: References: Message-ID: Sorry for the late reply. Although I came to BioJava on my own, I was interested in voluntarily teaching it to the Bioinformatics group of ITi (a famous IT institute here in Egypt). I focused on BioJava 1.7, and I can send you the presentation if you like. Unfortunately, this department was closed after 2 successive intakes, so I didn't update it.
Amr -------------------------------------------------- From: "Andreas Prlic" Sent: Friday, November 04, 2011 8:21 AM To: "Biojava" Subject: [Biojava-l] BioJava in education > Hi, > > Is anybody using BioJava in teaching, or has been introduced to > BioJava as part of a course? > > Andreas > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From biojava at hannes.oib.com Wed Nov 9 09:01:15 2011 From: biojava at hannes.oib.com (=?ISO-8859-1?Q?Hannes_Brandst=E4tter=2DM=FCller?=) Date: Wed, 9 Nov 2011 10:01:15 +0100 Subject: [Biojava-l] Calculating edit distance between 2 DNA Sequences In-Reply-To: <6671E3C3-790E-4545-B228-17D8238DD81C@gmail.com> References: <6671E3C3-790E-4545-B228-17D8238DD81C@gmail.com> Message-ID: Thanks. I am thinking about implementing a modified MatrixAligner to fit my needs here.
Direct Levenshtein distance is not exactly right for my application, because there are 0 to n insertions/deletions. A direct LevenshteinDistance as implemented in would not give me all the information I need. Thanks for the hints and input. Hannes PS we seriously need more examples in the Cookbook - I'll submit some later, but the modified N-W Aligner mentioned below would make a good example too, don't you think? ;) On Mon, Nov 7, 2011 at 16:03, wrote: > Hi Hannes, > > You could do such a comparison by using the Needleman-Wunsch aligner with gap penalty set to -1 and the matrix set to -1 for mismatches and 0 for matches. The absolute value of the resulting score is exactly the number of errors. > > But it will not stop when a maximal number of errors is reached ... > > JS > > On 7 Nov 2011 at 15:42, Andreas Prlic wrote: > >> Hi Hannes, >> >> you are right, this does not exist yet. Somebody else asked the same >> question a few weeks ago. As such it would be great if you could >> provide a patch, there might be other people interested in that, too. >> >> Andreas >> >> On Mon, Nov 7, 2011 at 5:39 AM, Hannes Brandstätter-Müller >> wrote: >>> Following up: >>> >>> If there is no such thing, should I make it available if I write it? >>> >>> Hannes >>> >>> On Thu, Nov 3, 2011 at 14:08, Hannes Brandstätter-Müller >>> wrote: >>>> Hi! >>>> >>>> Is there a Class/Method in Biojava that calculates the Levenshtein >>>> distance between two sequences? I could not find anything in the docs >>>> at first search. >>>> >>>> I need to compare 2 DNASequences (or Strings) and get the number of >>>> insertions, deletions, and substitutions. Ideally, there would be an >>>> option to abort the comparison if the number of mismatches exceeds a >>>> certain number.
>>>> >>>> Hannes >>>> >>> >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l > > From khalil.elmazouari at gmail.com Mon Nov 14 16:00:37 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Mon, 14 Nov 2011 17:00:37 +0100 Subject: [Biojava-l] Rename RichSequence Message-ID: <97C210D8-F5C4-4E0D-AF59-FDB4BCE193F7@gmail.com> Hi, how can I rename a RichSequence? Thanks From Dietmar.Birzer at biologie.uni-regensburg.de Mon Nov 14 16:53:29 2011 From: Dietmar.Birzer at biologie.uni-regensburg.de (Dietmar Birzer) Date: Mon, 14 Nov 2011 17:53:29 +0100 Subject: [Biojava-l] Exception thrown when parsing GenBank file Message-ID: <4EC15599020000AA0001B58D@gwsmtp1.uni-regensburg.de> Hi all, I am currently trying to debug a little software application which uses BioJava's core-1.8.1.jar library because it started throwing exceptions a while ago. I guess the problem is that the GenbankLocationParser is not able to handle "Het" entries in the features section of the GenBank/GenPept format, e.g. Het join(bond(9),bond(125)) /heterogen="( NA, 5 )" for database id 14719485 (http://www.ncbi.nlm.nih.gov/protein/14719485) . Calling new GenpeptRichSequenceDB().getRichSequence("14719485"); will result in an error message like: Error while querying 14719485! org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenpeptRichSequenceDB.getRichSequence(GenpeptRichSequenceDB.java:158) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113) at org.biojavax.bio.db.ncbi.GenpeptRichSequenceDB.getRichSequence(GenpeptRichSequenceDB.java:154) ...
2 more Caused by: org.biojava.bio.seq.io.ParseException: Could not understand position: bond(9 at org.biojavax.bio.seq.io.GenbankLocationParser.parsePosition(GenbankLocationParser.java:286) at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocString(GenbankLocationParser.java:272) at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocString(GenbankLocationParser.java:237) at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocation(GenbankLocationParser.java:132) at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:508) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) ... 3 more I have checked several other database entries and none of the ones that worked had a "Het" entry. Parsing also failed for 56965892, 13786715, 209156668 and 12084365. Has anybody else come across this problem, or does anybody know how to fix it? Thanks, Dietmar From p.j.a.cock at googlemail.com Mon Nov 14 17:29:31 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 14 Nov 2011 17:29:31 +0000 Subject: [Biojava-l] Exception thrown when parsing GenBank file In-Reply-To: <4EC15599020000AA0001B58D@gwsmtp1.uni-regensburg.de> References: <4EC15599020000AA0001B58D@gwsmtp1.uni-regensburg.de> Message-ID: On Mon, Nov 14, 2011 at 4:53 PM, Dietmar Birzer wrote: > > Hi all, > > I am currently trying to debug a little software application which > uses BioJava's core-1.8.1.jar library because it has started to > throw exceptions a while ago. > > I guess the problem is, that the GenbankLocationParser is not > able to handle "Het" entries in the features section of the > GenBank/GenPept format, e.g. > > Het join(bond(9),bond(125)) > /heterogen="( NA, 5 )" > > for database id 14719485 (http://www.ncbi.nlm.nih.gov/protein/14719485) .
Interesting - note the bond locations are not in the official DDBJ/EMBL/GenBank feature table specification (v9, Oct 2011): http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html However, as noted on http://www.bioperl.org/wiki/BioPerl_Locations that seems to be intended only for nucleotides and not proteins as here. It might be worth contacting the NCBI to find out if there is an official specification covering these location strings? Peter From gwaldon at geneinfinity.org Mon Nov 14 23:20:02 2011 From: gwaldon at geneinfinity.org (George Waldon) Date: Mon, 14 Nov 2011 17:20:02 -0600 Subject: [Biojava-l] Rename RichSequence In-Reply-To: <97C210D8-F5C4-4E0D-AF59-FDB4BCE193F7@gmail.com> References: <97C210D8-F5C4-4E0D-AF59-FDB4BCE193F7@gmail.com> Message-ID: <20111114172002.19321y9clytb4ug0@gator1273.hostgator.com> Quoting Khalil El Mazouari : > Hi, > > how to rename a RichSequence? > > Thanks > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > Hi, By design, you cannot rename a SimpleRichSequence. A simple workaround is to wrap the sequence in your own implementation of RichSequence and delegate to it. Regards, George From biojava at hannes.oib.com Tue Nov 15 14:10:43 2011 From: biojava at hannes.oib.com (Hannes Brandstätter-Müller) Date: Tue, 15 Nov 2011 15:10:43 +0100 Subject: [Biojava-l] Calculating edit distance between 2 DNA Sequences In-Reply-To: References: <6671E3C3-790E-4545-B228-17D8238DD81C@gmail.com> Message-ID: Well, I have implemented a first version that is running quite well for me and my needs/specifications, although I did not integrate it directly into the biojava class hierarchy yet. Is anyone interested in taking a look at it and giving me some feedback on whether I should invest the time and work to make it includable into biojava?
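For readers following the thread: the bounded comparison asked for in the original question can be sketched in a few lines of plain Java. This is not the implementation mentioned in this message and it is not BioJava code; the class name EditDistanceSketch, the method boundedEditDistance, and the convention of returning -1 when the limit is exceeded are all assumptions made for illustration. It computes the same number that the scoring scheme suggested earlier (gap -1, mismatch -1, match 0) yields as an absolute score, but it stops as soon as a whole row of the matrix is above the allowed number of errors.

```java
/**
 * Sketch only: plain Levenshtein distance with an early abort.
 * Not part of BioJava; all names here are hypothetical.
 */
final class EditDistanceSketch {

    /**
     * Returns the edit distance between a and b, or -1 as soon as it is
     * certain that the distance exceeds maxErrors.
     */
    static int boundedEditDistance(String a, String b, int maxErrors) {
        // The length difference is a lower bound on the distance.
        if (Math.abs(a.length() - b.length()) > maxErrors) {
            return -1;
        }
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // j insertions turn the empty prefix into b[0..j)
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // i deletions turn a[0..i) into the empty prefix
            int rowMin = curr[0];
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(prev[j] + 1, curr[j - 1] + 1),
                        prev[j - 1] + cost);
                rowMin = Math.min(rowMin, curr[j]);
            }
            // Row minima never decrease, so once a whole row is above the
            // limit the final value must be above it as well.
            if (rowMin > maxErrors) {
                return -1;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        int distance = prev[b.length()];
        return distance <= maxErrors ? distance : -1;
    }

    public static void main(String[] args) {
        System.out.println(boundedEditDistance("ACGTACGT", "ACGTTCGT", 3)); // 1
        System.out.println(boundedEditDistance("ACGT", "AGT", 3));          // 1
        System.out.println(boundedEditDistance("AAAAAAAA", "CCCCCCCC", 2)); // -1
    }
}
```

The sketch only returns the total; reporting insertions, deletions, and substitutions separately would need the full matrix and a traceback, which is closer to what a modified aligner would do.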
Hannes On Wed, Nov 9, 2011 at 10:01, Hannes Brandstätter-Müller wrote: > Thanks. > > I am thinking about implementing a modified MatrixAligner to fit my > needs here. Direct Levenshtein distance is not exactly right for my > application, because there are 0 to n insertions/deletions. > > A direct LevenshteinDistance as implemented in would not give me all > the information I need. > > Thanks for the hints and input. > > Hannes > > PS we seriously need more examples in the Cookbook - I'll submit some > later, but the modified N-W Aligner mentioned below would make a good > example too, don't you think? ;) > > On Mon, Nov 7, 2011 at 16:03, wrote: >> Hi Hannes, >> >> You could do such a comparison by using the Needleman-Wunsch aligner with gap penalty set to -1 and the matrix set to -1 for mismatches and 0 for matches. The absolute value of the resulting score is exactly the number of errors. >> >> But it will not stop when a maximal number of errors is reached ... >> >> JS >> >> On 7 Nov 2011 at 15:42, Andreas Prlic wrote: >> >>> Hi Hannes, >>> >>> you are right, this does not exist yet. Somebody else asked the same >>> question a few weeks ago. As such it would be great if you could >>> provide a patch, there might be other people interested in that, too. >>> >>> Andreas >>> >>> On Mon, Nov 7, 2011 at 5:39 AM, Hannes Brandstätter-Müller >>> wrote: >>>> Following up: >>>> >>>> If there is no such thing, should I make it available if I write it? >>>> >>>> Hannes >>>> >>>> On Thu, Nov 3, 2011 at 14:08, Hannes Brandstätter-Müller >>>> wrote: >>>>> Hi! >>>>> >>>>> Is there a Class/Method in Biojava that calculates the Levenshtein >>>>> distance between two sequences? I could not find anything in the docs >>>>> at first search. >>>>> >>>>> I need to compare 2 DNASequences (or Strings) and get the number of >>>>> insertions, deletions, and substitutions.
Ideally, there would be an >>>>> option to abort the comparison if the number of mismatches exceeds a >>>>> certain number. >>>>> >>>>> Hannes >>>>> >>>> >>>> _______________________________________________ >>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>>> >>> >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> > From andreas at sdsc.edu Tue Nov 15 15:53:50 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Tue, 15 Nov 2011 07:53:50 -0800 Subject: [Biojava-l] Calculating edit distance between 2 DNA Sequences In-Reply-To: References: <6671E3C3-790E-4545-B228-17D8238DD81C@gmail.com> Message-ID: You could send it to the list here and ask for feedback. Andreas On Tue, Nov 15, 2011 at 6:10 AM, Hannes Brandstätter-Müller wrote: > Well, I have implemented a first version that is running quite well > for me and my needs/specifications, although I did not integrate it > directly into the biojava class hierarchy yet. > Is anyone interested in taking a look at it and giving me some > feedback if I should invest the time and work to make it includable > into biojava? > > Hannes > > On Wed, Nov 9, 2011 at 10:01, Hannes Brandstätter-Müller > wrote: >> Thanks. >> >> I am thinking about implementing a modified MatrixAligner to fit my >> needs here. Direct Levenshtein distance is not exactly right for my >> application, because there are 0 to n insertions/deletions. >> >> A direct LevenshteinDistance as implemented in would not give me all >> the information I need. >> >> Thanks for the hints and input. >> >> Hannes >> >> PS we seriously need more examples in the Cookbook - I'll submit some >> later, but the modified N-W Aligner mentioned below would make a good >> example too, don't you think? ;) >> >> On Mon, Nov 7, 2011 at 16:03,
wrote: >>> Hi Hannes, >>> >>> You could do such a comparison by using the Needleman-Wunsch aligner with gap penalty set to -1 and the matrix set to -1 for mismatches and 0 for matches. The absolute value of the resulting score is exactly the number of errors. >>> >>> But it will not stop when a maximal number of errors is reached ... >>> >>> JS >>> >>> On 7 Nov 2011 at 15:42, Andreas Prlic wrote: >>> >>>> Hi Hannes, >>>> >>>> you are right, this does not exist yet. Somebody else asked the same >>>> question a few weeks ago. As such it would be great if you could >>>> provide a patch, there might be other people interested in that, too. >>>> >>>> Andreas >>>> >>>> On Mon, Nov 7, 2011 at 5:39 AM, Hannes Brandstätter-Müller >>>> wrote: >>>>> Following up: >>>>> >>>>> If there is no such thing, should I make it available if I write it? >>>>> >>>>> Hannes >>>>> >>>>> On Thu, Nov 3, 2011 at 14:08, Hannes Brandstätter-Müller >>>>> wrote: >>>>>> Hi! >>>>>> >>>>>> Is there a Class/Method in Biojava that calculates the Levenshtein >>>>>> distance between two sequences? I could not find anything in the docs >>>>>> at first search. >>>>>> >>>>>> I need to compare 2 DNASequences (or Strings) and get the number of >>>>>> insertions, deletions, and substitutions. Ideally, there would be an >>>>>> option to abort the comparison if the number of mismatches exceeds a >>>>>> certain number.
>>>>>> >>>>>> Hannes >>>>>> >>>>> >>>>> _______________________________________________ >>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>>>> >>>> >>>> _______________________________________________ >>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >>> >> > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From lfrohman at gmail.com Wed Nov 16 16:14:18 2011 From: lfrohman at gmail.com (Lance Frohman) Date: Wed, 16 Nov 2011 08:14:18 -0800 Subject: [Biojava-l] biojava-ensembl Message-ID: Where do I find the source code for biojava-ensembl? thanks From Dietmar.Birzer at biologie.uni-regensburg.de Wed Nov 23 15:18:29 2011 From: Dietmar.Birzer at biologie.uni-regensburg.de (Dietmar Birzer) Date: Wed, 23 Nov 2011 16:18:29 +0100 Subject: [Biojava-l] Antw: Re: Exception thrown when parsing GenBank file In-Reply-To: References: <4EC15599020000AA0001B58D@gwsmtp1.uni-regensburg.de> Message-ID: <4ECD1CD5020000AA0001BADF@gwsmtp1.uni-regensburg.de> Hi all, as the GenbankLocationParser from biojava-1.8.1 is not working properly anymore, I was wondering if there is an equivalent way to do this ( GenpeptRichSequenceDB().getRichSequence("14719485") ) using BioJava 3. Unfortunately I could not find any GenBank/GenPept parser so far. Is that because it does not exist (yet), or just because I have not looked properly?
Best wishes Dietmar >>> Peter Cock 11/14/2011 6:29 PM >>> On Mon, Nov 14, 2011 at 4:53 PM, Dietmar Birzer wrote: > > Hi all, > > I am currently trying to debug a little software application which > uses BioJava's core-1.8.1.jar library because it has started to > throw exceptions a while ago. > > I guess the problem is, that the GenbankLocationParser is not > able to handle "Het" entries in the features section of the > GenBank/GenPept format, e.g. > > Het join(bond(9),bond(125)) > /heterogen="( NA, 5 )" > > for database id 14719485 (http://www.ncbi.nlm.nih.gov/protein/14719485) . Interesting - note the bond locations are not in the official DDBJ/EMBL/GenBank feature table specification (v9, Oct 2011): http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html However, as noted on http://www.bioperl.org/wiki/BioPerl_Locations that seems to be intended only for nucleotides and not proteins as here. It might be worth contacting the NCBI to find out if there is an official specification covering these location strings? Peter From andreas at sdsc.edu Wed Nov 23 15:34:46 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 23 Nov 2011 07:34:46 -0800 Subject: [Biojava-l] Antw: Re: Exception thrown when parsing GenBank file In-Reply-To: <4ECD1CD5020000AA0001BADF@gwsmtp1.uni-regensburg.de> References: <4EC15599020000AA0001B58D@gwsmtp1.uni-regensburg.de> <4ECD1CD5020000AA0001BADF@gwsmtp1.uni-regensburg.de> Message-ID: Hi Dietmar, The genbank parser is on top of the feature request list for biojava 3: http://biojava.org/wiki/BioJava3_Feature_Requests Anybody who wants to take an initiative here and claim ownership of this topic is welcome... Andreas On Wed, Nov 23, 2011 at 7:18 AM, Dietmar Birzer wrote: > Hi all, > > as the GenbankLocationParser from biojava-1.8.1 is not working properly anymore, I was wondering if there is an equivalent way to do this ( GenpeptRichSequenceDB().getRichSequence("14719485") ) using BioJava 3.
> Unfortunately I could not find any GenBank/GenPept parser so far. Is that because it does not exist (yet), or just because I have not looked properly? > > Best wishes > Dietmar > >>>> Peter Cock 11/14/2011 6:29 PM >>> > On Mon, Nov 14, 2011 at 4:53 PM, Dietmar Birzer wrote: >> >> Hi all, >> >> I am currently trying to debug a little software application which >> uses BioJava's core-1.8.1.jar library because it has started to >> throw exceptions a while ago. >> >> I guess the problem is, that the GenbankLocationParser is not >> able to handle "Het" entries in the features section of the >> GenBank/GenPept format, e.g. >> >> Het join(bond(9),bond(125)) >> /heterogen="( NA, 5 )" >> >> for database id 14719485 (http://www.ncbi.nlm.nih.gov/protein/14719485) . > > Interesting - note the bond locations are not in the official > DDBJ/EMBL/GenBank feature table specification (v9, Oct 2011): > http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html > > However, as noted on http://www.bioperl.org/wiki/BioPerl_Locations > that seems to be intended only for nucleotides and not proteins as here. > It might be worth contacting the NCBI to find out if there is an official > specification covering these location strings?
> > Peter > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From gwaldon at geneinfinity.org Wed Nov 23 16:06:57 2011 From: gwaldon at geneinfinity.org (George Waldon) Date: Wed, 23 Nov 2011 10:06:57 -0600 Subject: [Biojava-l] Antw: Re: Exception thrown when parsing GenBank file In-Reply-To: <4ECD1CD5020000AA0001BADF@gwsmtp1.uni-regensburg.de> References: <4EC15599020000AA0001B58D@gwsmtp1.uni-regensburg.de> <4ECD1CD5020000AA0001BADF@gwsmtp1.uni-regensburg.de> Message-ID: <20111123100657.23712tvrvitfya0w@gator1273.hostgator.com> Hi Dietmar, The GenbankLocationParser from biojava-1.8.1 is working perfectly as long as you feed him with DNA locations that use the syntax described in the DDBJ/EMBL/GenBank feature table definition. What you've got (join(bond(9),bond(125))...) is a pseudo GenBank format that apparently describes protein structure and uses a slightly different syntax to indicate heteroatom locations. That is why you get all these exceptions thrown. George Quoting Dietmar Birzer : > Hi all, > > as the GenbankLocationParser from biojava-1.8.1 is not working > properly anymore, I was wondering if there is an equivalent way to > do this ( GenpeptRichSequenceDB().getRichSequence("14719485") ) > using BioJava 3. > Unfortunately I could not find any GenBank/GenPept parser so far. Is > that because it does not exist (yet), or just because I have not > looked properly? > > Best wishes > Dietmar > >>>> Peter Cock 11/14/2011 6:29 PM >>> > On Mon, Nov 14, 2011 at 4:53 PM, Dietmar Birzer wrote: >> >> Hi all, >> >> I am currently trying to debug a little software application which >> uses BioJava's core-1.8.1.jar library because it has started to >> throw exceptions a while ago. >> >> I guess the problem is, that the GenbankLocationParser is not >> able to handle "Het" entries in the features section of the >> GenBank/GenPept format, e.g.
>> >> Het join(bond(9),bond(125)) >> /heterogen="( NA, 5 )" >> >> for database id 14719485 (http://www.ncbi.nlm.nih.gov/protein/14719485) . > > Interesting - note the bond locations are not in the official > DDBJ/EMBL/GenBank feature table specification (v9, Oct 2011): > http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html > > However, as noted on http://www.bioperl.org/wiki/BioPerl_Locations > that seems to be intended only for nucleotides and not proteins as here. > It might be worth contacting the NCBI to find out if there is an official > specification covering these location strings? > > Peter > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From p.j.a.cock at googlemail.com Wed Nov 23 17:01:11 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 23 Nov 2011 17:01:11 +0000 Subject: [Biojava-l] Antw: Re: Exception thrown when parsing GenBank file In-Reply-To: <20111123100657.23712tvrvitfya0w@gator1273.hostgator.com> References: <4EC15599020000AA0001B58D@gwsmtp1.uni-regensburg.de> <4ECD1CD5020000AA0001BADF@gwsmtp1.uni-regensburg.de> <20111123100657.23712tvrvitfya0w@gator1273.hostgator.com> Message-ID: On Wed, Nov 23, 2011 at 4:06 PM, George Waldon wrote: > Hi Dietmar, > > The GenbankLocationParser from biojava-1.8.1 is working perfectly as long as > you feed him with DNA locations that use the syntax described in the > DDBJ/EMBL/GenBank feature table definition. As mentioned earlier, that document only covers nucleotide sequences, not proteins as used in GenPept and the EMBL patents database: http://lists.open-bio.org/pipermail/biojava-l/2011-November/007889.html > What you've got (join(bond(9),bond(125))...) is a pseudo GenBank format > that apparently describes protein structure and uses a slightly different > syntax to indicate heteroatom locations. That is why you get all these > exceptions thrown. 
> > George It would be interesting to see if the EBI offer the same record(s) in EMBL format, and if that too uses the (undocumented?) bond locations. Peter
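To make the location string from this thread concrete, here is a minimal sketch that picks the residue positions out of join(bond(9),bond(125)) with a regular expression. It is not BioJava code and does not touch GenbankLocationParser; the class BondLocationSketch and the method bondPositions are hypothetical names, and the bond(N) syntax is assumed only from the single example quoted above, since no official specification for it has been identified in the discussion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch only: extracts the integer positions from bond(N) terms.
 * The grammar is guessed from one GenPept example, not from a spec.
 */
final class BondLocationSketch {

    // Matches a single "bond(<digits>)" term and captures the digits.
    private static final Pattern BOND = Pattern.compile("bond\\((\\d+)\\)");

    /** Returns every position found in bond(N) terms, in order of appearance. */
    static List<Integer> bondPositions(String location) {
        List<Integer> positions = new ArrayList<>();
        Matcher matcher = BOND.matcher(location);
        while (matcher.find()) {
            positions.add(Integer.parseInt(matcher.group(1)));
        }
        return positions;
    }

    public static void main(String[] args) {
        // The location string reported in the thread.
        System.out.println(bondPositions("join(bond(9),bond(125))")); // [9, 125]
        // An ordinary nucleotide-style location contains no bond terms.
        System.out.println(bondPositions("join(1..10,20..30)"));      // []
    }
}
```

A pre-check like this could be used to detect records with bond locations before handing them to a parser that only understands the nucleotide feature table syntax; whether skipping or rewriting such features is acceptable depends on the application.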