From gwaldon at geneinfinity.org Fri Apr 1 20:11:02 2011
From: gwaldon at geneinfinity.org (George Waldon)
Date: Fri, 01 Apr 2011 19:11:02 -0500
Subject: [Biojava-l] Expasy pI calculation algorythm
Message-ID: <20110401191102.44902rooroy3x9k4@gator1273.hostgator.com>

Hello,

Sorry if this comes a bit late; we had to solve some email issues. Thanks again to Andreas for doing it.

This is part of the email exchange I had with Christine Hoogland and Gregoire Rossier a few years ago about the algorithm used by "Compute pI/Mw" on the ExPASy server. The code they gave me is included at the end of this email; I used it to update bj1.

Good luck to all GSoC candidates,
George

On Tue, May 22, 2007 at 9:26 AM, Christine Hoogland via RT wrote:

Dear George,

Please find enclosed the algorithm we are using on ExPASy. I hope this helps.

Best regards
Christine

> > The pK values used for "Compute pI/Mw" can be found in
> >
> > # Bjellqvist, B., Hughes, G.J., Pasquali, Ch., Paquet, N., Ravier, F., Sanchez, J.-Ch., Frutiger, S. & Hochstrasser, D.F. The focusing positions of polypeptides in immobilized pH gradients can be predicted from their amino acid sequences. Electrophoresis 1993, 14, 1023-1031.
> >
> > MEDLINE: 8125050
> >
> > # Bjellqvist, B., Basse, B., Olsen, E. and Celis, J.E. Reference points for comparisons of two-dimensional maps of proteins from different human cell types defined in a pH scale where isoelectric points correlate with polypeptide compositions. Electrophoresis 1994, 15, 529-539.
> >
> > MEDLINE: 8055880
> >
> > The pK values were defined by examining polypeptide migration between pH 4.5 and 7.3 in an immobilised pH gradient gel environment with 9.2M and 9.8M urea at 15°C or 25°C. Prediction of protein pI for highly basic proteins is yet to be studied, and it is possible that current Compute pI/Mw predictions may not be adequate for this purpose.
> >
> > I hope this helps.
> >
> > Best regards
> > Gregoire Rossier

--------------------------------------------------------
Christine Hoogland
Swiss Institute of Bioinformatics
CMU - 1, rue Michel Servet         Tel. (+41 22) 379 58 28
CH - 1211 Geneva 4 Switzerland     Fax (+41 22) 379 58 58
Christine.Hoogland at isb-sib.ch   http://www.expasy.org/

--------------------------------------------------------

```c
// VERSION : 1.6
// DATE    : 1/25/95
// Copyright 1993 by Swiss Institute of Bioinformatics. All rights reserved.
//
// Table of pK values:
// Note: the current algorithm does not use the last two columns. Each
// row corresponds to an amino acid starting with Ala. J, O and U are
// nonexistent, but are here only in order to have the complete alphabet.
//
//          Ct    Nt    Sm     Sc     Sn
//
static double cPk[26][5] = {
    3.55, 7.59, 0.   , 0.   , 0.   ,  // A
    3.55, 7.50, 0.   , 0.   , 0.   ,  // B
    3.55, 7.50, 9.00 , 9.00 , 9.00 ,  // C
    4.55, 7.50, 4.05 , 4.05 , 4.05 ,  // D
    4.75, 7.70, 4.45 , 4.45 , 4.45 ,  // E
    3.55, 7.50, 0.   , 0.   , 0.   ,  // F
    3.55, 7.50, 0.   , 0.   , 0.   ,  // G
    3.55, 7.50, 5.98 , 5.98 , 5.98 ,  // H
    3.55, 7.50, 0.   , 0.   , 0.   ,  // I
    0.00, 0.00, 0.   , 0.   , 0.   ,  // J
    3.55, 7.50, 10.00, 10.00, 10.00,  // K
    3.55, 7.50, 0.   , 0.   , 0.   ,  // L
    3.55, 7.00, 0.   , 0.   , 0.   ,  // M
    3.55, 7.50, 0.   , 0.   , 0.   ,  // N
    0.00, 0.00, 0.   , 0.   , 0.   ,  // O
    3.55, 8.36, 0.   , 0.   , 0.   ,  // P
    3.55, 7.50, 0.   , 0.   , 0.   ,  // Q
    3.55, 7.50, 12.0 , 12.0 , 12.0 ,  // R
    3.55, 6.93, 0.   , 0.   , 0.   ,  // S
    3.55, 6.82, 0.   , 0.   , 0.   ,  // T
    0.00, 0.00, 0.   , 0.   , 0.   ,  // U
    3.55, 7.44, 0.   , 0.   , 0.   ,  // V
    3.55, 7.50, 0.   , 0.   , 0.   ,  // W
    3.55, 7.50, 0.   , 0.   , 0.   ,  // X
    3.55, 7.50, 10.00, 10.00, 10.00,  // Y
    3.55, 7.50, 0.   , 0.   , 0.   }; // Z

#define PH_MIN  0       /* minimum pH value */
#define PH_MAX  14      /* maximum pH value */
#define MAXLOOP 2000    /* maximum number of iterations */
#define EPSI    0.0001  /* desired precision */

//
// Compute the amino-acid composition.
//
for (i = 0; i < sequenceLength; i++)
    comp[sequence[i] - 'A']++;

//
// Look up N-terminal and C-terminal residue.
//
nTermResidue = sequence[0] - 'A';
cTermResidue = sequence[sequenceLength - 1] - 'A';

phMin = PH_MIN;
phMax = PH_MAX;

for (i = 0, charge = 1.0; i < MAXLOOP && (phMax - phMin) > EPSI; i++)
{
    phMid = phMin + (phMax - phMin) / 2;

    cter = exp10(-cPk[cTermResidue][0]) /
           (exp10(-cPk[cTermResidue][0]) + exp10(-phMid));
    nter = exp10(-phMid) /
           (exp10(-cPk[nTermResidue][1]) + exp10(-phMid));

    carg = comp[R] * exp10(-phMid) / (exp10(-cPk[R][2]) + exp10(-phMid));
    chis = comp[H] * exp10(-phMid) / (exp10(-cPk[H][2]) + exp10(-phMid));
    clys = comp[K] * exp10(-phMid) / (exp10(-cPk[K][2]) + exp10(-phMid));

    casp = comp[D] * exp10(-cPk[D][2]) / (exp10(-cPk[D][2]) + exp10(-phMid));
    cglu = comp[E] * exp10(-cPk[E][2]) / (exp10(-cPk[E][2]) + exp10(-phMid));
    ccys = comp[C] * exp10(-cPk[C][2]) / (exp10(-cPk[C][2]) + exp10(-phMid));
    ctyr = comp[Y] * exp10(-cPk[Y][2]) / (exp10(-cPk[Y][2]) + exp10(-phMid));

    charge = carg + clys + chis + nter - (casp + cglu + ctyr + ccys + cter);

    if (charge > 0.0)
        phMin = phMid;
    else
        phMax = phMid;
}
} /* closes an enclosing block that this excerpt does not show */
```

From frieman6 at gmail.com Mon Apr 4 08:59:50 2011
From: frieman6 at gmail.com (omer f)
Date: Mon, 4 Apr 2011 15:59:50 +0300
Subject: [Biojava-l] Fwd: Google Summer of Coding: Java project
In-Reply-To:
References:
Message-ID:

To all it may concern,

Dear BioJava,

My name is Omer Frieman and I am a student of Computer Science (B.Sc.) and International Politics at the Hebrew University in Jerusalem, Israel. Last week I attended a conference about Google SoC, which I found very interesting and something I would like to experience. I am in the second year of my degree, and so far I have learned the programming languages Java, C and C++, as well as Object Oriented Programming in Java. However, I have no working experience in programming.

While searching the list of SoC projects I came across the Bioinformatics Foundation projects.
I found the Java project very interesting: first, for its content, the research subject of bioinformatics; and second, because I was looking specifically for a Java project in which to gain experience.

Before sending my application with the short coding exercise, I am sending you this mail to introduce myself. My only concern is whether my lack of experience could hurt my chances of being accepted to the project. I would appreciate your opinion on that matter. Furthermore, I was wondering how I can contact the project mentor (Peter Troshin).

Sincerely,
Omer.

From p.v.troshin at dundee.ac.uk Mon Apr 4 10:33:38 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Mon, 04 Apr 2011 15:33:38 +0100
Subject: [Biojava-l] Fwd: Google Summer of Coding: Java project
In-Reply-To:
References:
Message-ID: <4D99D6C2.9060107@dundee.ac.uk>

Hi Omer,

I am on the BioJava list, so I got your email.

> >> I am in the second year of my degree, and so far i have learned
> >> the program languages: Java, C, C++ and in addition Object
> >> Oriented Programming in Java.

Sounds good!

> >> Though, i have no working experience in programming.

Thank you for being honest (:-))

> >> My only concern is, will My lack of experience could hurt my
> >> chances to be accepted to the project.

It may, but it may not. Some people pick up things pretty quickly - are you one of them? Seriously, have a look at the methods that you are about to implement, search the web for more information about them and see if you have a good idea of how to implement them. You do not have to know anything about BioJava at this stage; you can learn about it later.

For this project you also need to know the basics of chemistry and molecular biology. Nothing too complicated, but without this knowledge the learning curve for this project is going to be pretty steep.

I hope I answered your question.
Regards,
Peter

_______________________________________________
Biojava-l mailing list - Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l

From x4roth at gmail.com Mon Apr 4 11:34:13 2011
From: x4roth at gmail.com (Xaroth)
Date: Mon, 4 Apr 2011 11:34:13 -0400
Subject: [Biojava-l] GSoC Intent - Amino Acids Physico-Chemical Properties Calculation
Message-ID:

Hello everyone and project mentor Peter Troshin,

My name is Justin Pugh. I am a 3rd year Computer Science student at the University of Central Florida. I am proficient in C and Java and I prefer to do all of my coding work in Java.
I've recently learned about the Google Summer of Code program and I've spent the past week or so looking over the available open-source projects. I am excited about participating in GSoC because I have been anxious to be part of a larger project than the small-scope programs that I have to write for classes. Sadly, I do not have any experience with programming projects on this scale, but I am very interested in learning. The largest project I have completed was a compiler and virtual machine for PL/0. I'm currently writing a game from scratch as a vehicle for implementing some AI techniques, and I am learning a lot about using others' code provided in libraries and integrating it into my own through jMonkeyEngine (an open-source, Java-based 3D game engine). I want to take my skills to the next level with GSoC.

Out of all the projects I have looked at, BioJava has stood out the most for me (in particular, the Amino Acids Physico-Chemical Properties Calculation idea). I wanted to work on something written entirely in Java so I wouldn't have to spend any time learning a new language and could focus more on the actual project. I feel I am able to wrap my head around the Amino Acids Physico-Chemical Properties Calculation proposal, and I think it falls within my ability to get it done and, even more, to have fun with it and spend time refining it. I do not know anything about the calculations that must be implemented, but I am of course willing to learn them. I have had only limited instruction in multi-threaded Java programming, but I hope this will not be a severe handicap for me.

I plan on spending the next couple of days doing some research and developing my proposal. I just wanted to announce my intent to submit an application and to introduce myself to the group.
Please let me know your opinions regarding my ability to complete this project - I am open to criticism and pointers! :)

Thank you for your time,
-Justin Pugh

From p.v.troshin at dundee.ac.uk Tue Apr 5 06:23:10 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Tue, 05 Apr 2011 11:23:10 +0100
Subject: [Biojava-l] GSoC Intent - Amino Acids Physico-Chemical Properties Calculation
In-Reply-To:
References:
Message-ID: <4D9AED8E.3070309@dundee.ac.uk>

Hi Justin,

Welcome to the BioJava mailing list and thank you for your interest in the project. It sounds like you have plenty of experience in programming, which is definitely a plus. Do not worry about your lack of experience with multi-threaded programs; in the end there must be something for you to learn too!

I hope that this project can give you what you are looking for: the experience of implementing algorithms in pure Java and integrating them with the relatively large BioJava code base. However, do not underestimate the difficulty of the algorithms for calculating the physico-chemical properties. Although the algorithms may not be particularly complex, to understand them you would need some knowledge of chemistry, molecular biology and maths. Programming skills alone will not be sufficient for this project.

Good luck with your application.

Regards,
Peter

From p.v.troshin at dundee.ac.uk Tue Apr 5 12:00:32 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Tue, 05 Apr 2011 17:00:32 +0100
Subject: [Biojava-l] Final Application (Paiu Alexandru ) (added project plan)
In-Reply-To:
References:
Message-ID: <4D9B3CA0.6000707@dundee.ac.uk>

Hi Alex,

I had a look at your plan and I'd suggest that you add a few more details to it. Right now I'd say it is not sufficiently detailed. What are the deliverables of your project? What steps will you take at every stage of the project? How will you make sure that your implementation gives the correct results? Are you going to use BioJava? If so, how? Also, I would not bother with any implementation details as yet. Finally, if you have worked with any version control systems, it would help to state this.

I hope this will help you to improve your proposal.

Regards,
Peter

On 28/03/2011 17:17, Alexandru Paiu wrote:
> 1. Your complete contact information, including full name, physical address, preferred email address, and telephone number, plus other pertinent contact information such as IRC handles, etc.
>
> Full Name: Paiu Alexandru
>
> Address: Country Romania, city Constanta, Bld. Aurel Vlaicu, Nr. 41, Bl. Pc1 sc. B, Et 6, Apt. 46
>
> E-mail: paiualex12 at google.com or paiualex12 at yahoo.com
>
> Telephone number: 40733924684
>
> 2. Why you are interested in the project you are proposing and are well-suited to undertake it.
>
> This project suits me perfectly, because the interested students should have a general knowledge of core Java programming and of multi-threaded programming. I have been learning Java for one and a half years, and I have used a lot of threads in applications and projects.
>
> This is the only project I am applying to, because I haven't found a more interesting project than this one.
>
> 3. A summary of your programming experience and skills.
>
> I've done a lot of mini-projects and applications for school and for myself. I've made projects like:
>
> a) Lanchat Client-Server using TCP/IP: I wrote two applications, one for the client and one for the server. I used a JApplet with Swing elements for the client. I used threads, especially in the server-side application, and sockets.
>
> b) Lanchat Peer-to-Peer using UDP and multicasting: I wrote only an application for the client. I used threads and multicast sockets.
>
> c) A project for administering a database, using a JApplet with Connector/J and MySQL. It has two applications, one for clients and one for the administrator.
>
> 4. Programs or projects you have previously authored or contributed to available as open-source, including, if applicable, any past Summer of Code involvement.
>
> I haven't worked on any open-source project yet and I have no past experience with GSoC; this is the first time I am applying to an open-source project. I haven't worked for a company either.
>
> 5. A project plan for the project you are proposing, even if your proposed project is directly based on one of the proposed project ideas for member projects.
>
> I wish to apply to the project called Amino Acids Physico-Chemical Properties Calculation.
>
> I have been thinking for some time about a possible implementation and settled on a single one (which I think is the best).
>
> I will use two main classes. One will represent an atom of a substance (for example He, H, O, etc.), with parameters like atomic weight, name, abbreviation and valence. I'll use the second class for constructing amino acids from this class. So, the second class will extend the class of atoms.
> So, for example, suppose I have to instantiate a molecule of H2O (water). I will have a constructor with a string parameter that builds the substance. For example, let's say that the second class is called Aminoacids and the first one Atoms.
>
> Let's say I choose H2O from a combo box (it's only an example). Then I pass the string "H2O1" to the Aminoacids class to instantiate an Aminoacids object. The constructor will evaluate the string char by char. If it finds a char or two chars, that means I have to instantiate an atom of that char or chars. If it finds a number, that number is the multiplier of the atom before it.
>
> So the Aminoacids class will have a private Object[] array, which will hold numbers and Atoms objects.
>
> So for H2O the array will look like this: array[0] = atom of H (hydrogen), array[1] = 2, array[2] = O (oxygen), array[3] = 1.
>
> All the known substances will be in a file called atoms.txt with atom mass, name, abbreviation, etc. The Atoms class will have a method to add new atoms to the list.
>
> And for calculating the molecular weight the algorithm is very simple. We already have array = {H, 2, O, 1}, and the atoms have their atomic weight as a parameter, so all we have to do is:
>
> Mol. Weight = H.weight*2 + O.weight*1
>
> The plan for implementing:
>
> - May 20 - June 20: implementing the two classes and the first two methods
> - June 20 - July 20: implementing the rest of the methods
> - July 20 until the end: final retouching, documentation for end users, and 1 method proposed by me
>
> 6. Any obligations, vacations, or plans for the summer that may require scheduling during the GSoC work period.
>
> I will have school final exams during 20 May - 20 June, so I won't be able to work at maximum capacity. That's all.
>
> 7. PS
>
> I hope you've got my short coding exercise program (I received some kind of error when sending a mail with an attachment).
>
> thanks

From p.v.troshin at dundee.ac.uk Tue Apr 5 12:44:08 2011
From: p.v.troshin at dundee.ac.uk (Peter Troshin)
Date: Tue, 05 Apr 2011 17:44:08 +0100
Subject: [Biojava-l] attention to GSoC Amino Acids Physico-Chemical Properties prospective students
In-Reply-To:
References:
Message-ID: <4D9B46D8.6070906@dundee.ac.uk>

Hello prospective GSoC students,

I have seen a few project plans so far, and none of them goes to the trouble of providing the formulas for the calculations. Yet it would definitely strengthen your proposal if you could demonstrate that you are capable of extracting this information from the web and/or scientific papers. That said, you do not have to include everything in your proposal, but at least show that you know where to find what you need. There is a balance to strike here, and this is something that is going to set you apart.

I'd like to stress that the ideal student for this project should not only be a good Java programmer but also have a keen interest in bioinformatics! So please feel free to extend the original idea as you see fit.

Good luck with your applications,
Peter

P.S. The project plan is of course yours, and you should decide whether to include anything from the above or leave it out; this is only advice.
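As an illustration of the formula-level detail Peter asks for, the molecular-weight idea in Alex's quoted plan (parse a formula string such as "H2O", then sum atomic weight times count) can be sketched in Java. This is a minimal sketch under stated assumptions: `FormulaWeight` is a hypothetical name, a plain map replaces the Atoms/Aminoacids class pair he proposes, only five elements with standard atomic weights are included, and the syntax is limited to element symbols followed by optional counts (no parentheses).

```java
import java.util.Map;

/** Hypothetical sketch of the formula-parsing idea from Alex's plan (not BioJava code). */
final class FormulaWeight {
    // Standard atomic weights in g/mol for a few elements (assumed subset; extend as needed).
    private static final Map<String, Double> ATOMIC_WEIGHT = Map.of(
            "H", 1.008, "C", 12.011, "N", 14.007, "O", 15.999, "S", 32.06);

    private FormulaWeight() {}

    /** Sums weight * count for formulas such as "H2O", "H2O1" or "C6H12O6". */
    static double molecularWeight(String formula) {
        double total = 0.0;
        int i = 0;
        while (i < formula.length()) {
            if (!Character.isUpperCase(formula.charAt(i))) {
                throw new IllegalArgumentException("Expected an element symbol at index " + i);
            }
            int start = i++;
            // A symbol is one upper-case letter, optionally followed by one lower-case letter.
            if (i < formula.length() && Character.isLowerCase(formula.charAt(i))) {
                i++;
            }
            String symbol = formula.substring(start, i);
            // The digits after the symbol are the multiplier; a missing count means 1.
            int digitsStart = i;
            while (i < formula.length() && Character.isDigit(formula.charAt(i))) {
                i++;
            }
            int count = (i == digitsStart) ? 1 : Integer.parseInt(formula.substring(digitsStart, i));
            Double weight = ATOMIC_WEIGHT.get(symbol);
            if (weight == null) {
                throw new IllegalArgumentException("Unknown element: " + symbol);
            }
            total += weight * count;
        }
        return total;
    }
}
```

With these weights, `molecularWeight("H2O")` gives 2 × 1.008 + 15.999 = 18.015. The average mass of a protein would instead be computed as the sum of its residue masses plus one water; that residue table is not part of Alex's plan and is not shown here.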
Dr Peter Troshin
Bioinformatics Software Developer
Phone: +44 (0)1382 388589
Fax: +44 (0)1382 385764
The Barton Group
College of Life Sciences
Medical Sciences Institute
University of Dundee
Dundee DD1 5EH
UK

From andreas at sdsc.edu Wed Apr 6 00:33:47 2011
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 5 Apr 2011 21:33:47 -0700
Subject: [Biojava-l] last days before submitting for GSoC
Message-ID:

Hi,

Just a quick reminder that the deadline to submit proposals for GSoC is approaching rapidly (Friday). To apply correctly, you need to submit your application at http://www.google-melange.com/

Don't forget to discuss your projects with the potential mentors or on this list.

A couple more links that might be useful:

How to write a proposal:
http://www.booki.cc/gsocstudentguide/_v/1.0/writing-a-proposal/

Here are two proposals that got funded last year:
http://biojava.org/wiki/GSoC:MSA
http://biojava.org/wiki/GSoC:PTM

Andreas

From madhatterkkt at gmail.com Wed Apr 6 00:37:47 2011
From: madhatterkkt at gmail.com (K T)
Date: Tue, 5 Apr 2011 21:37:47 -0700
Subject: [Biojava-l] Retrieving data from Genbank online
Message-ID:

I'm new to BioJava3, and recently downloaded 3.0.1. I'm looking for a way to retrieve entries from NCBI online (not from a flat file) using a RefSeq accession ID. I was able to do this with BioPerl, and I've also seen that there was a way to do this under BioJava 1.x. Can someone point me to a code snippet where this is done for BioJava3? I tried searching online, but most of my search hits either point to BioJava 1.x or refer to retrieving entries from a flat file. I'm trying to get them from NCBI's online database directly.

Thanks in advance.
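This digest contains no reply to this question. One possible approach, sketched here in plain Java rather than with any BioJava 3 class, is to download the record from NCBI's E-utilities `efetch` service. Given an accession, that service returns the record as GenBank flat-file text: use db=nuccore with rettype=gb for nucleotide RefSeq records such as NM_ accessions, or db=protein with rettype=gp for NP_ records. `EfetchClient` and its methods are hypothetical names. `fetch` needs network access, and parsing the returned text is not shown.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: download a GenBank/GenPept record from NCBI E-utilities. */
final class EfetchClient {
    private static final String EFETCH =
            "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi";

    private EfetchClient() {}

    /** Builds an efetch URL, e.g. db "nuccore" with rettype "gb", or "protein" with "gp". */
    static String efetchUrl(String db, String accession, String rettype) {
        return EFETCH
                + "?db=" + URLEncoder.encode(db, StandardCharsets.UTF_8)
                + "&id=" + URLEncoder.encode(accession, StandardCharsets.UTF_8)
                + "&rettype=" + URLEncoder.encode(rettype, StandardCharsets.UTF_8)
                + "&retmode=text";
    }

    /** Performs the HTTP GET and returns the flat-file text (requires network access). */
    static String fetch(String db, String accession, String rettype) throws IOException {
        StringBuilder record = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                URI.create(efetchUrl(db, accession, rettype)).toURL().openStream(),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                record.append(line).append('\n');
            }
        }
        return record.toString();
    }
}
```

For example, `EfetchClient.fetch("nuccore", "NM_000546", "gb")` would return the GenBank text for that accession. Keeping URL construction separate from the network call makes the request testable offline.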
From alastair.m.kilpatrick at googlemail.com Wed Apr 6 05:07:27 2011
From: alastair.m.kilpatrick at googlemail.com (Alastair Kilpatrick)
Date: Wed, 6 Apr 2011 10:07:27 +0100
Subject: [Biojava-l] Multiple sequence alignment in BioJava
Message-ID:

Dear all,

I'm pretty new to BioJava so I may be missing something - I've checked through the list archives without much luck.

I've been trying to do some multiple sequence alignments but I've been running into strange errors. To try to find where the problem is, I copied the code straight from http://biojava.org/wiki/BioJava:CookBook3:MSA and tried to run it, but I'm still getting an error:

    Caused by: java.lang.ClassNotFoundException:
    org.forester.phylogenyinference.DistanceMatrix

It seems this has something to do with forester.jar, which I had to download separately (from http://code.google.com/p/forester/) as per http://biojava.org/wiki/BioJava:CookBook#biojava3-alignment. When I look through the jar in Eclipse, there isn't anything named 'phylogenyinference', although there are 'phylogeny', 'phylogeny.data', 'phylogeny.factories' and 'phylogeny.iterators'. Is there something I'm doing wrong, or has either BioJava or forester been updated and broken things somewhere (I see the CookBook page was updated in July 2010 but forester was updated just last month)? Either way, any ideas would be much appreciated!

Many thanks,
Alastair Kilpatrick

PhD candidate,
School of Informatics, University of Edinburgh

From alastair.m.kilpatrick at googlemail.com Wed Apr 6 12:40:26 2011
From: alastair.m.kilpatrick at googlemail.com (Alastair Kilpatrick)
Date: Wed, 6 Apr 2011 17:40:26 +0100
Subject: [Biojava-l] Multiple sequence alignment in BioJava
In-Reply-To:
References:
Message-ID:

Thanks - I had downloaded the BioJava jars manually, so that has fixed the CookBook code.
However, I've made some changes in order to align DNA sequences instead and am running into more errors. This code:

```java
import java.util.ArrayList;
import java.util.List;

import org.biojava3.alignment.Alignments;
import org.biojava3.alignment.template.Profile;
import org.biojava3.core.sequence.DNASequence;
import org.biojava3.core.sequence.compound.NucleotideCompound;
import org.biojava3.core.util.ConcurrencyTools;

public class CookbookMSA {
    public static void main(String[] args) {
        String[] seqs = {"GATTACATTT", "CGATTACATG", "ATGGATTACA"};
        List<DNASequence> lst = new ArrayList<DNASequence>();
        for (String seq : seqs) {
            lst.add(new DNASequence(seq));
        }
        Profile<DNASequence, NucleotideCompound> profile =
                Alignments.getMultipleSequenceAlignment(lst); //**
        System.out.println(profile);
        ConcurrencyTools.shutdown();
    }
}
```

gives:

    java.util.concurrent.ExecutionException: java.lang.NullPointerException
        at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
        at java.util.concurrent.FutureTask.get(Unknown Source)
        at org.biojava3.alignment.Alignments.getListFromFutures(Alignments.java:282)
        at org.biojava3.alignment.Alignments.runPairwiseScorers(Alignments.java:602)
        at org.biojava3.alignment.Alignments.getMultipleSequenceAlignment(Alignments.java:173)
        at CookbookMSA.main(CookbookMSA.java:49)

at the 'alignment' line (**). I'm not sure what the problem is here; I see that getMultipleSequenceAlignment() takes extra argument(s) in the Javadoc, but these weren't required in the example?

Final question (hopefully!): once I have the alignments I require, I'd like to create a sequence logo. Is there a way of doing this in BioJava3? From a Google search I've seen references to DistributionTools.distOverAlignment() and similar, but I can't find anything like that in the new API.

Thanks again, and sorry everyone for all the questions!

Alastair

On 6 April 2011 12:09, Scooter Willis wrote:
>
> You need to use the forester.jar file from the biojava3 code check out. If you are using maven this should be automatic.

From paiualex12 at gmail.com Wed Apr 6 13:05:38 2011
From: paiualex12 at gmail.com (Alexandru Paiu)
Date: Wed, 6 Apr 2011 20:05:38 +0300
Subject: [Biojava-l] Identified bug in BioJava(Paiu Alexandru)
Message-ID:

Hi to all.

I've identified a bug in BioJava. Usually the first element in an array is at index 0 (pos 0). I've been trying to learn more about Java and I tried the next 2 instructions:

```java
ProteinSequence a = new ProteinSequence("A");
AminoAcidCompound b = a.getCompoundAt(0);
```

This gives a null exception (out of array bounds or something like that). When I put AminoAcidCompound b = a.getCompoundAt(1); it works and identifies the first amino acid as Alanine.
That's all for today From willishf at ufl.edu Wed Apr 6 14:30:11 2011 From: willishf at ufl.edu (Scooter Willis) Date: Wed, 6 Apr 2011 14:30:11 -0400 Subject: [Biojava-l] Identified bug in BioJava(Paiu Alexandru) In-Reply-To: References: Message-ID: Alexandru To make it easier on how biologist think about the first position(1) versus how computer scientists think(0) we opted to go with the biologist view of the world. So use 1 for the first sequence position. Thanks Scooter On Wed, Apr 6, 2011 at 1:05 PM, Alexandru Paiu wrote: > Hi to all . > I've identified a bug in BioJava . > Usually the first element in an array is on index 0 (pos 0 ) . > I've tried to learn more about Java and I've tryed the next 2 instructions : > > ProteinSequence a=new ProteinSequence("A"); > AminoAcidCompound b=a.getCompoundAt(0); > > > This will give a null execption (out of array bounds or something like that > ) . > when I put AminoAcidCompound b=a.getCompoundAt(1); ?it works and identifies > the first > aminoacid as Alanine . > > That's all for today > _______________________________________________ > Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > From andreas at sdsc.edu Thu Apr 7 01:03:40 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 6 Apr 2011 22:03:40 -0700 Subject: [Biojava-l] Multiple sequence alignment in BioJava In-Reply-To: References: Message-ID: Hi Alastair, BioJava 1.X can do distributions, however there is no counterpart for this yet in BioJava 3. Andreas On Wed, Apr 6, 2011 at 9:40 AM, Alastair Kilpatrick wrote: > Thanks - I had just downloaded the BioJava jars manually so that's > fixed the CookBook code. However, I've made some changes in order to > align DNA sequences instead and am running into more errors - this > code: > > ? ? ? ?public static void main(String[] args) { > ? ? ? ? ? ? ? ?String[] seqs = {"GATTACATTT", "CGATTACATG", "ATGGATTACA"}; > ? ? ? ? ? ? ? 
?List lst = new ArrayList(); > ? ? ? ? ? ? ? ?for(String seq : seqs) { > ? ? ? ? ? ? ? ? ? ? ? ?lst.add(new DNASequence(seq)); > ? ? ? ? ? ? ? ?} > ? ? ? ? ? ? ? ?Profile profile = > Alignments.getMultipleSequenceAlignment(lst); //** > ? ? ? ? ? ? ? ?System.out.println(profile); > ? ? ? ? ? ? ? ?ConcurrencyTools.shutdown(); > ? ? ? ?} > > gives: > java.util.concurrent.ExecutionException: java.lang.NullPointerException > ? ? ? ?at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source) > ? ? ? ?at java.util.concurrent.FutureTask.get(Unknown Source) > ? ? ? ?at org.biojava3.alignment.Alignments.getListFromFutures(Alignments.java:282) > ? ? ? ?at org.biojava3.alignment.Alignments.runPairwiseScorers(Alignments.java:602) > ? ? ? ?at org.biojava3.alignment.Alignments.getMultipleSequenceAlignment(Alignments.java:173) > ? ? ? ?at CookbookMSA.main(CookbookMSA.java:49) > > at the 'alignment' line (**) - not sure what the problem is here, I > see that getMultipleSequenceAlignment() has an extra argument(s) in > the Javadoc but these weren't required in the example? > Final question (hopefully!) - once I have the alignments I require I'd > like to create a sequence logo - is there a way of doing this in > BioJava3? From a google search I've seen references to > DistributionTools.distOverAlignment() and similar, but can't find > anything like that in the new api. > > Thanks again, sorry everyone for all the questions! > > Alastair > > > On 6 April 2011 12:09, Scooter Willis wrote: >> >> You need to use the forester.jar file from the biojava3 code check out. If you are using maven this should be automatic. >> >> On Apr 6, 2011 5:07 AM, "Alastair Kilpatrick" wrote: >> >> Dear all, >> I'm pretty new to BioJava so I may be missing something - I've checked >> through the list archives without much luck. >> I've been trying to do some multiple sequence alignments but I've been >> running into strange errors. 
To try and find where the problem is, >> I've just copied the code straight from >> http://biojava.org/wiki/BioJava:CookBook3:MSA and tried to run it, but >> I'm still getting an error: >> >> Caused by: java.lang.ClassNotFoundException: >> org.forester.phylogenyinference.DistanceMatrix >> >> It would seem that this is something to do with forester.jar, which I >> had to download separately (from http://code.google.com/p/forester/) >> as per http://biojava.org/wiki/BioJava:CookBook#biojava3-alignment - >> when I look through the jar in Eclipse, there isn't anything named >> 'phylogenyinference', although there is 'phylogeny', 'phylogeny.data', >> 'phylogeny.factories' and 'phylogeny.iterators'. Is there something >> I'm doing wrong, or is it the case that either one of BioJava or >> forester has been updated and things have broken somewhere (I see the >> CookBook page was updated in July 2010 but forester was updated just >> last month)? Either way, any ideas would be much appreciated! >> >> Many thanks, >> Alastair Kilpatrick >> >> PhD candidate, >> School of Informatics, University of Edinburgh >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From effat34 at gmail.com Thu Apr 7 07:26:43 2011 From: effat34 at gmail.com (effat farhana) Date: Thu, 7 Apr 2011 17:26:43 +0600 Subject: [Biojava-l] Amino acids physico-chemical properties calculation Message-ID: Hi, I'm a student of Computer Science and Engineering background. I heard about GSoC a few days earlier and very much eager to participate in it. I'm quite familiar with C++, Java multithreading. The idea of this project proposal seems quite interesting to me.
One of the methods to be implemented is to calculate the numbers of different amino acids in protein. Would you please explain how the input of this method will be given? Will the protein sequence be represented by String as sequence of amino acids and I've just count the different types of amino acid? Looking forward for your quick reply Thanks in advance farhana From p.v.troshin at dundee.ac.uk Thu Apr 7 17:47:20 2011 From: p.v.troshin at dundee.ac.uk (Peter Troshin) Date: Thu, 07 Apr 2011 22:47:20 +0100 Subject: [Biojava-l] Amino acids physico-chemical properties calculation In-Reply-To: References: Message-ID: <4D9E30E8.3090501@dundee.ac.uk> > >> Would you please explain how the input of this method will be > >> given? Just assume that this would be a String for now. For example "MFVAWLMLADAELGMGDTTAGEMAVQRGLALHPGHPEAVARLGR". Hope that helps. Regards, Peter On 07/04/2011 12:26, effat farhana wrote: > Hi, I'm a student of Computer Science and Engineering background. I > heard about GSoC a few days earlier and very much eager to > participate in it. I'm quite familiar with C++, Java multithreading. > The idea of this project proposal seems quite interesting to me. > > One of the methods to be implemented is to calculate the numbers of > different amino acids in protein. Would you please explain how the > input of this method will be given? Will the protein sequence be > represented by String as sequence of amino acids and I've just count > the different types of amino acid? 
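Peter's reply suggests treating the input as a plain String, such as "MFVAWLMLADAELGMGDTTAGEMAVQRGLALHPGHPEAVARLGR". Under that assumption, counting residues reduces to tallying characters. The following is a minimal sketch. The class name AminoAcidComposition, the decision to upper-case letters and skip whitespace, and the absence of validation against the 20 standard residues are illustrative assumptions, not part of any BioJava API.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: amino-acid composition of a protein given as one-letter codes. */
final class AminoAcidComposition {

    /** Counts each residue letter; case-insensitive, whitespace ignored, no alphabet check. */
    static Map<Character, Integer> count(String protein) {
        Map<Character, Integer> counts = new TreeMap<>(); // keys kept in alphabetical order
        for (char c : protein.toCharArray()) {
            if (Character.isWhitespace(c)) continue;
            counts.merge(Character.toUpperCase(c), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String seq = "MFVAWLMLADAELGMGDTTAGEMAVQRGLALHPGHPEAVARLGR";
        System.out.println(count(seq));
    }
}
```

A real implementation would probably reject or report characters outside the expected amino-acid alphabet rather than counting them silently.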
> > > Looking forward for your quick reply Thanks in advance > > farhana _______________________________________________ Biojava-l > mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From p.v.troshin at dundee.ac.uk Thu Apr 7 18:26:39 2011 From: p.v.troshin at dundee.ac.uk (Peter Troshin) Date: Thu, 07 Apr 2011 23:26:39 +0100 Subject: [Biojava-l] google summer of code proposal In-Reply-To: References: Message-ID: <4D9E3A1F.8050306@dundee.ac.uk> Dear Mihaly, I am sorry for the belated response. I'd suggest simplifying your plan a little. For example, you do not need to write what you are going to do every day of GSoC; one-week precision will be fine. On the other hand, it would be good to include formulas in the plan, as well as to expand the original idea with something that is in line with the project and interests you. Did you have a go at the coding exercise? If not, I would recommend that you do. Finally, do not forget to send your proposal to Melange. Regards, Peter On 06/04/2011 21:35, Mihaly Csokas wrote: > > Dear Peter, > > I am very interested in GSoC and I would like to apply for the BioJava > - Amino acids physico-chemical properties calculation project. > > I am sending you my proposal as an attachment. If you have a little > time, please read it through and give me your opinion. > > Thank you in advance! > > Yours sincerely: > > Mihaly Csokas > From andreas at sdsc.edu Fri Apr 8 00:45:41 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 7 Apr 2011 21:45:41 -0700 Subject: [Biojava-l] GSoC last day to submit proposals Message-ID: Hi, This is the final reminder that the deadline for submitting proposals for the Google Summer of Code is Friday April 8th, 19:00 UTC.
Andreas From kohchuanhock at gmail.com Fri Apr 8 13:46:21 2011 From: kohchuanhock at gmail.com (Chuan Hock Koh) Date: Sat, 9 Apr 2011 01:46:21 +0800 Subject: [Biojava-l] Submission of Short Coding Exercise Message-ID: Dear Dr Peter Troshin, I have completed the short coding exercise. As I was about to submit it, I realized that I am unsure about the behavior that runme.jar should exhibit. In Goal 2, it was given that the example command is "java FindEnds inputFile.txt outputFile.txt" Am I right to say that runme.jar should replace FindEnds and run as follows? "java -jar runme.jar inputFile.txt outputFile.txt" Please advise. Thanks. Looking forward to your reply. Regards, Chuan Hock -- http://compbio.ddns.comp.nus.edu.sg/~ChuanHockKoh From p.v.troshin at dundee.ac.uk Fri Apr 8 17:43:29 2011 From: p.v.troshin at dundee.ac.uk (Peter Troshin) Date: Fri, 08 Apr 2011 22:43:29 +0100 Subject: [Biojava-l] Submission of Short Coding Exercise In-Reply-To: References: Message-ID: <4D9F8181.9040204@dundee.ac.uk> Hello Chuan, Yes, you are right. Peter On 08/04/2011 18:46, Chuan Hock Koh wrote: > Dear Dr Peter Troshin, > > I have completed the short coding exercise. As I was about to submit it, I > realized that I am unsure about the behavior that runme.jar should exhibit. > > In Goal 2, it was given that the example command is "java FindEnds > inputFile.txt outputFile.txt" > > Am I right to say that runme.jar should replace FindEnds and run as follows? > > "java -jar runme.jar inputFile.txt outputFile.txt" > > Please advise. Thanks. > > Looking forward to your reply. > > Regards, > Chuan Hock > From khalil.elmazouari at gmail.com Mon Apr 11 16:25:15 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Mon, 11 Apr 2011 22:25:15 +0200 Subject: [Biojava-l] RichSequenceIterator.hasNext from empty file return true!!! Message-ID: <63EB0592-5431-4AB1-A73A-744832E7B547@gmail.com> Hi, RichSequenceIterator.hasNext from empty file return true!!!
and throws an infinite BioException loop!!! Any explanation?? Thanks khalil test using the following code public static void main(String[] args) { String path = "emptyFile.txt"; BufferedReader br = null; try { br = new BufferedReader(new FileReader(path)); } catch (FileNotFoundException ex) { ex.printStackTrace(); } RichSequenceIterator seqs = RichSequence.IOTools.readFastaProtein(br, null); while (seqs.hasNext()) { try { RichSequence seq = seqs.nextRichSequence(); } catch (NoSuchElementException ex) { ex.printStackTrace(); } catch (BioException ex) { ex.printStackTrace(); } } ==== org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113) at com.kem.ae.core.Empty.main(Empty.java:51) Caused by: java.io.IOException: Premature stream end at org.biojavax.bio.seq.io.FastaFormat.readRichSequence(FastaFormat.java:178) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) ... 1 more org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113) at com.kem.ae.core.Empty.main(Empty.java:51) Caused by: java.io.IOException: Premature stream end at org.biojavax.bio.seq.io.FastaFormat.readRichSequence(FastaFormat.java:178) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) infinite loop.... From chapmanb at 50mail.com Tue Apr 12 08:36:32 2011 From: chapmanb at 50mail.com (Brad Chapman) Date: Tue, 12 Apr 2011 08:36:32 -0400 Subject: [Biojava-l] Bioinformatics Open Source Conference (BOSC 2011)--Abstracts due April 18th! Message-ID: <20110412123632.GE2105@kunkel> Only one week left to submit an abstract to BOSC 2011! We have two great keynote speakers lined up (Lawrence Hunter and Matt Wood) and session topics that include parallel and cloud-based approaches to bioinformatics, genome content management, and tools for next-generation sequencing. 
We'd love to hear about your Open Source bioinformatics project! The 12th Annual Bioinformatics Open Source Conference (BOSC 2011) An ISMB 2011 Special Interest Group (SIG) July 15-16, 2011, in Vienna, Austria http://www.open-bio.org/wiki/BOSC_2011 Important Dates: April 18, 2011: Deadline for submitting abstracts to BOSC 2011 May 9, 2011: Notifications of accepted abstracts emailed to corresponding authors July 13-14, 2011: Codefest 2011 programming session (see http://www.open-bio.org/wiki/Codefest_2011 for details) July 15-16, 2011: BOSC 2011 July 17-19, 2011: ISMB 2011 The Bioinformatics Open Source Conference (BOSC) is sponsored by the Open Bioinformatics Foundation (O|B|F), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development within the biological research community. To be considered for acceptance, software systems representing the central topic in a presentation submitted to BOSC must be licensed with a recognized Open Source License, and be freely available for download in source code form. We invite you to submit abstracts for talks and posters. Sessions include: - Approaches to parallel processing - Cloud-based approaches to improving software and data accessibility - The Semantic Web in open source bioinformatics - Data visualization - Tools for next-generation sequencing - Other Open Source software In addition to the above sessions, there will be a panel discussion about "Meeting the challenges of inter-institutional collaboration". We are also working to arrange a joint session with one of the other ISMB SIGs. Thanks to generous sponsorship from Eagle Genomics and an anonymous donor, we are pleased to announce a competition for three Student Travel Awards for BOSC 2011. Each winner will be awarded $250 to defray the costs of travel to BOSC 2011. All students whose abstracts are accepted for talks will be considered for this award. 
For instructions on submitting your abstract, please visit http://www.open-bio.org/wiki/BOSC_2011#Abstract_Submission_Information BOSC 2011 Organizing Committee: Nomi Harris and Peter Rice (co-chairs); Brad Chapman, Peter Cock, Erwin Frise, Darin London, Ron Taylor From gwaldon at geneinfinity.org Tue Apr 12 14:02:28 2011 From: gwaldon at geneinfinity.org (George Waldon) Date: Tue, 12 Apr 2011 13:02:28 -0500 Subject: [Biojava-l] RichSequenceIterator.hasNext from empty file return true!!! In-Reply-To: <63EB0592-5431-4AB1-A73A-744832E7B547@gmail.com> References: <63EB0592-5431-4AB1-A73A-744832E7B547@gmail.com> Message-ID: <20110412130228.31866c2hd15fztm8@gator1273.hostgator.com> Hi Khalil, In my hands I found "java.io.FileNotFoundException: emptyFile.txt (The system cannot find the file specified)", which is not exactly the same thing as an empty file. Anyway, I think you are right, and RichStreamReader should switch the moreSequenceAvailable flag before throwing the exception. In your case, just break out of the loop and you should be safe. Thanks for reporting. George Quoting Khalil El Mazouari : > Hi, > > RichSequenceIterator.hasNext from empty file return true!!! and > throws an infinite BioException loop!!! > Any explanation??
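George suggests leaving the loop once reading fails, rather than calling hasNext() again on an iterator that never flips to false. The following is a minimal, self-contained sketch of that pattern. RecordSource and SafeReader are hypothetical stand-ins, not BioJava types. With the real RichSequenceIterator, nextRichSequence() throws BioException, so the catch clause would catch BioException and break in the same way.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an iterator whose hasNext() may stay true after a failed read. */
interface RecordSource<T> {
    boolean hasNext();
    T next() throws Exception;
}

final class SafeReader {

    /** Reads until hasNext() is false or the first failure; a failed read is never retried. */
    static <T> List<T> readAll(RecordSource<T> source) {
        List<T> records = new ArrayList<>();
        while (source.hasNext()) {
            try {
                records.add(source.next());
            } catch (Exception ex) {
                System.err.println("Stopping after read failure: " + ex.getMessage());
                break; // leave the loop instead of spinning on the same error forever
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // Simulates the reported behaviour: hasNext() always true, next() always fails.
        RecordSource<String> broken = new RecordSource<String>() {
            public boolean hasNext() { return true; }
            public String next() throws Exception { throw new Exception("Premature stream end"); }
        };
        System.out.println(readAll(broken).size()); // terminates instead of looping forever
    }
}
```

Stopping at the first error trades completeness for safety. If a format allowed skipping a bad record, a retry counter would be an alternative, but that is beyond what the thread discusses.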
Thanks > > khalil > > test using the following code > > public static void main(String[] args) { > String path = "emptyFile.txt"; > > BufferedReader br = null; > try { > br = new BufferedReader(new FileReader(path)); > } catch (FileNotFoundException ex) { > ex.printStackTrace(); > } > RichSequenceIterator seqs = > RichSequence.IOTools.readFastaProtein(br, null); > while (seqs.hasNext()) { > try { > RichSequence seq = seqs.nextRichSequence(); > } catch (NoSuchElementException ex) { > ex.printStackTrace(); > } catch (BioException ex) { > ex.printStackTrace(); > } > } > > ==== > > org.biojava.bio.BioException: Could not read sequence > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113) > at com.kem.ae.core.Empty.main(Empty.java:51) > Caused by: java.io.IOException: Premature stream end > at > org.biojavax.bio.seq.io.FastaFormat.readRichSequence(FastaFormat.java:178) > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) > ... 1 more > org.biojava.bio.BioException: Could not read sequence > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113) > at com.kem.ae.core.Empty.main(Empty.java:51) > Caused by: java.io.IOException: Premature stream end > at > org.biojavax.bio.seq.io.FastaFormat.readRichSequence(FastaFormat.java:178) > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) > > infinite loop.... > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From mandarijnopw8 at gmail.com Wed Apr 13 10:47:12 2011 From: mandarijnopw8 at gmail.com (Shamanou van Leeuwen) Date: Wed, 13 Apr 2011 16:47:12 +0200 Subject: [Biojava-l] GSOC Message-ID: <4DA5B770.7040809@gmail.com> Hi everyone, i am student who would like to join GSOC with another programmer. But we still are in need of a mentor who can guide us. 
is there someone with biojava experience who can help us during GSOC? greetings, Shamanou van Leeuwen From rmb32 at cornell.edu Wed Apr 13 11:30:59 2011 From: rmb32 at cornell.edu (Robert Buels) Date: Wed, 13 Apr 2011 08:30:59 -0700 Subject: [Biojava-l] last call for Google Summer of Code mentors Message-ID: <4DA5C1B3.4010504@cornell.edu> Hi all, This is the last call for mentors for Google Summer of Code. We have a good crop of student proposals this year for doing work on OBF projects, and money from Google to fund them, but we need experienced Bio* developers to mentor them. If you'd like to see the student proposals, participate in their scoring, and possibly volunteer to mentor them (remotely of course) over the summer, do two things: 1.) Create an account on http://google-melange.com and send a request to be an admin from the OBF page on there, http://www.google-melange.com/gsoc/org/google/gsoc2011/obf 2.) Join the OBF GSoC mentors mailing list at http://lists.open-bio.org/mailman/listinfo/gsoc-mentors Even if you just want to see the student applications and help with scoring, but don't necessarily have time to mentor a student, your input in the scoring process is appreciated. :-) Rob ---- Robert Buels OBF GSoC 2011 Administrator From andreas at sdsc.edu Wed Apr 13 23:17:29 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 13 Apr 2011 20:17:29 -0700 Subject: [Biojava-l] GSOC In-Reply-To: <4DA5B770.7040809@gmail.com> References: <4DA5B770.7040809@gmail.com> Message-ID: Hi Shamanou, The deadline for submitting proposals was last Friday. Usually the funding is for one student only and not for pairs of students... Andreas On Wed, Apr 13, 2011 at 7:47 AM, Shamanou van Leeuwen wrote: > Hi everyone, > > i am student who would like to join GSOC with another programmer. > But we still are in need of a mentor who can guide us. > > is there someone with biojava experience who can help us during GSOC? 
> > greetings, > Shamanou van Leeuwen > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From p.v.troshin at dundee.ac.uk Wed Apr 6 11:49:59 2011 From: p.v.troshin at dundee.ac.uk (Peter Troshin) Date: Wed, 06 Apr 2011 16:49:59 +0100 Subject: [Biojava-l] Fwd: Google Summer of Coding: Java project In-Reply-To: References: <4D99D6C2.9060107@dundee.ac.uk> Message-ID: <4D9C8BA7.4020806@dundee.ac.uk> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: molecularweight.gif Type: image/gif Size: 1390 bytes Desc: not available URL: From mostafa.shokrof at gmail.com Thu Apr 14 07:35:57 2011 From: mostafa.shokrof at gmail.com (Mostafa Shokrof) Date: Thu, 14 Apr 2011 13:35:57 +0200 Subject: [Biojava-l] Google summer code proposal Message-ID: On Thu, Apr 14, 2011 at 8:48 AM, Jason Stajich wrote: > Mostafa - > > I may have accidentally deleted your post to biojava-l about GSOC that was in > the held queue for the mailman system. If you can resend it to the list one > more time I will make sure it goes through. > > > Name: Mostafa EL-Sayed Shokrof > > Email: mostafa.shokrof at gmail.com > > Physical Address: 7 el gomhoreya off tarh el bahr st - el shark -portsaid > -Egypt > > Telephone Number: 02-016755576 > > I am a third-year student in the Bioinformatics department of the Faculty of > Computer Science, Ain Shams University, with an average grade of A. I am interested in > applying for the amino acid properties calculation project, which I believe > is a great opportunity to enhance my experience.
I chose Open-Bio > organization as it offers interaction with real world Bioinformatics applications > which would certainly help me a lot in my graduation project next semester. > Choosing this project in particular was based on several reasons. Firstly I > believe in the project's importance as it addresses core Bioinformatics > problems and applies Bioinformatics algorithms..Second it will be > implemented in Java ,which has two main advantages ,it is Object oriented > and cross platform . ,so I think BIO-Java project is very useful. > > I also believe I am well suited for this project as I passed both > Algorithms and OOP courses with grade A ,besides my strong background in > biology as I studied biology and Bioinformatics at college. > > Not only did I gain my programming experience by attending training courses > in CIT Global Mobi dev company for mobile applications last summer , but I > also studied the following courses: > > 1.Structural Programming > > 2.Data Structure > > 3.OOP > > 4.Algorithms > > 5.File Organization > > 6.Software Engineering > > Courses being studied this semester are: > > 6.Numerical Analysis > > 7.System Analysis > > > > Concerning my biological background,I have studied: > > 1.Introduction to Biology > > 2.Introduction to Biochemistry > > 3.Introduction to Biophysics > > 4.Introduction to Molecular Cell Biology > > 5.Advanced Molecular Genetics > > 6.Introduction to Bioinformatics > > Courses being studied in this semester: > > 7.Biotechnology > > 8.Structural Bioinformatics > > > > Unfortunately Google schedule does not fit me since our exams are expected > to be delayed due to the current circumstances in Egypt ,so I hope I will be > free on the beginning of July . > > But after that I will dedicate my summer to GSCO .I can work an average 8 > hours a day ,five days a week of total 40 hours a week. The New semester > starts at the half of September. 
so I will have plenty of time. Finally, I > would be really grateful if you gave me this cherished opportunity, which > would help me further on in my career. > > > > > > I have attached a timeline for the project, a UML design, an abstract of how the > implementation will look, and the short code exercise with the proposal. > > I will appreciate your feedback. > resources: https://docs.google.com/leaf?id=0B-lMAeKAGH9gYjFjNWFmZmItMGZkMC00NDIwLWFlODgtMDUzYzhlYWEwNzk0&hl=en > > > my regards, > > Mostafa Shokrof > > Bioinformatics student > Ain Shams University > From erikclarke at gmail.com Thu Apr 14 22:41:48 2011 From: erikclarke at gmail.com (Erik C) Date: Thu, 14 Apr 2011 19:41:48 -0700 Subject: [Biojava-l] needleman-wunsch score problems Message-ID: Hi all, I'm having some trouble reconciling the scores from the NeedlemanWunsch sequence alignment object in BioJava with the scores I'm getting from the EMBL-EBI command-line tool 'needle'. Specifically, for the same sequences, matrix, and penalties, BioJava returns 274 (in one case), while `needle' returns 163. Does anybody have any ideas as to why this might be happening? Is there a parameter or setting I'm missing? My implementation of the n.w. code in biojava is below: public long alignTwoSequences(ProteinSequence subject, ProteinSequence target) { SubstitutionMatrix blosum62 = SubstitutionMatrixHelper.getBlosum62(); GapPenalty penalties = new SimpleGapPenalty(); penalties.setExtensionPenalty((short) .5); penalties.setOpenPenalty((short) 10); NeedlemanWunsch nw = new NeedlemanWunsch(subject, target, penalties, blosum62); return nw.getScore(); } Thanks, Erik From andreas at sdsc.edu Thu Apr 14 23:14:16 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 14 Apr 2011 20:14:16 -0700 Subject: [Biojava-l] needleman-wunsch score problems In-Reply-To: References: Message-ID: Hi Erik, Did you compare the alignments, i.e. which pairs of amino acids are getting aligned? There might be subtle differences.
Andreas On Thu, Apr 14, 2011 at 7:41 PM, Erik C wrote: > Hi all, > I'm having some trouble reconciling the scores from the NeedlemanWunsch > sequence alignment object in BioJava with the scores I'm getting from the > EMBL-EBI command-line tool 'needle'. Specifically, for the same sequences, > matrix, and penalties, BioJava returns 274 (in one case), while `needle' > returns 163. > Does anybody have any ideas as to why this might be happening? Is there a > parameter or setting I'm missing? My implementation of the n.w. code in > biojava is below: > > public long alignTwoSequences(ProteinSequence subject, > ProteinSequence target) { > > SubstitutionMatrix blosum62 = > SubstitutionMatrixHelper.getBlosum62(); > GapPenalty penalties = new SimpleGapPenalty(); > penalties.setExtensionPenalty((short) .5); > penalties.setOpenPenalty((short) 10); > NeedlemanWunsch nw = new > NeedlemanWunsch(subject, target, > penalties, blosum62); > return nw.getScore(); > > } > > Thanks, > Erik > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From jayunit100 at gmail.com Fri Apr 15 13:16:18 2011 From: jayunit100 at gmail.com (Jay Vyas) Date: Fri, 15 Apr 2011 13:16:18 -0400 Subject: [Biojava-l] Biojava-l Digest, Vol 99, Issue 12 In-Reply-To: References: Message-ID: I adopted a Java Needleman-Wunsch for proteins and it works fine; if someone wants to integrate it or compare it with BioJava's implementation, I can provide it: just email me and I will send the source. Otherwise I can potentially just try to commit it to a sandbox repo if one exists....
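One detail in Erik's snippet may account for part of the score difference. The code casts 0.5 to short before passing it as the extension penalty, and a Java narrowing cast from double truncates toward zero, so the value actually passed is 0. With a zero extension penalty, a long gap costs no more than its opening penalty, which would tend to push BioJava's score above needle's. The following sketch demonstrates the cast and compares gap costs. The class name GapPenaltyCast is hypothetical. The affine formula open + (length - 1) * extend is one common convention assumed here for illustration. BioJava and EMBOSS may count gap positions differently.

```java
/** Hypothetical demo: a fractional gap-extension penalty does not survive a cast to short. */
final class GapPenaltyCast {

    /** Affine gap cost under an assumed convention: open + (length - 1) * extend. */
    static double affineGapCost(double open, double extend, int length) {
        return length <= 0 ? 0.0 : open + (length - 1) * extend;
    }

    public static void main(String[] args) {
        short extension = (short) .5; // narrowing conversion truncates toward zero, giving 0
        short open = (short) 10;
        System.out.println("extension after cast = " + extension);
        // Cost of a 20-residue gap with the intended vs. the truncated extension penalty
        System.out.println(affineGapCost(10, 0.5, 20));         // intended: 19.5
        System.out.println(affineGapCost(open, extension, 20)); // after the cast: 10.0
    }
}
```

If the aligner only accepts integral penalties, one option is to scale the substitution matrix and both penalties by the same factor (for example, multiply everything by 2). Whether that is practical depends on the matrix classes available.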
From andreas at sdsc.edu Fri Apr 15 15:14:34 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Fri, 15 Apr 2011 12:14:34 -0700 Subject: [Biojava-l] Biojava-l Digest, Vol 99, Issue 12 In-Reply-To: <507BA137-A032-4D2E-AAF6-10898EB411B6@gmail.com> References: <507BA137-A032-4D2E-AAF6-10898EB411B6@gmail.com> Message-ID: Hi Jay, CCing the list again; somehow I dropped it off this thread. If I understand your example right, you want the alignment to span only the domain that is shared between the two proteins. This suggests you would use a local alignment, not penalizing end gaps. If you provide some example IDs we can be more specific... Andreas On Fri, Apr 15, 2011 at 11:32 AM, JAX wrote: > Thanks Andreas: Is there a way to globally align two sequences using Smith-Waterman's optimal local alignment, i.e. so as to get an ideal alignment of two multi-domain proteins (which only share one domain) that is gapped in such a way as to cover the whole protein length? > > Sent from my iPad > > On Apr 15, 2011, at 2:00 PM, Andreas Prlic wrote: > >> Hi Jay, >> >> thanks for the suggestion. There is already a Needleman-Wunsch and >> Smith-Waterman implementation available as part of the alignment >> module. Have you seen those? >> >> http://biojava.org/wiki/BioJava:CookBook3:PSA >> >> Andreas >> >> On Fri, Apr 15, 2011 at 10:16 AM, Jay Vyas wrote: >>> I adopted a Java Needleman-Wunsch for proteins and it works fine; if someone >>> wants to integrate it or compare it with BioJava's implementation, >>> I can provide it: just email me and I will send the source. >>> >>> Otherwise I can potentially just try to commit it to a sandbox repo if one >>> exists....
>>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> > From jayunit100 at gmail.com Fri Apr 15 15:30:30 2011 From: jayunit100 at gmail.com (Jay Vyas) Date: Fri, 15 Apr 2011 15:30:30 -0400 Subject: [Biojava-l] Biojava-l Digest, Vol 99, Issue 12 In-Reply-To: References: <507BA137-A032-4D2E-AAF6-10898EB411B6@gmail.com> Message-ID: Consider Kalirin, a multidomain protein. http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?INPUT_TYPE=live&SEQUENCE=NP_003938.1 Now, let's say I aligned Kalirin with a single-domain member of the RHO-GEF family. Then only one part of the sequence would match. But what if I WANTED a global alignment, for programmatic purposes? Then I would want an alignment like this: ------------------------------------------------RHOGEFRHOGEFRHOG--FRHOGEFRHOGEFRHOGEF------ With lots of gaps at the beginning, and lots of matches at the end, where the length of the alignment between the two proteins was exactly equivalent. Contrast this with a Smith-Waterman output, which would look like this: RHOGEFRHOGEFRHOG--FRHOGEFRHOGEFRHOGEF.... So is there a way to get the specificity of Smith-Waterman in a Needleman-Wunsch alignment? From andreas at sdsc.edu Fri Apr 15 17:49:36 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Fri, 15 Apr 2011 14:49:36 -0700 Subject: [Biojava-l] Biojava-l Digest, Vol 99, Issue 12 In-Reply-To: References: <507BA137-A032-4D2E-AAF6-10898EB411B6@gmail.com> Message-ID: It sounds like you are talking about doing a Needleman-Wunsch alignment while not penalizing end gaps. This should give you quite similar results to just doing a Smith-Waterman one. Is it the alignment display that you are concerned about? You could easily fill up the ends with unaligned regions for Smith-Waterman results. I don't find end-gaps particularly informative.
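Andreas suggests a Needleman-Wunsch alignment that does not penalize end gaps. A small scorer can illustrate the idea. This is a hypothetical sketch, not BioJava's aligner. It uses simple match/mismatch scores and a linear gap cost in place of a substitution matrix and affine gaps. When freeEndGaps is set, leading gaps are free because the first row and column are zeroed. Trailing gaps are free because the best cell is taken from the last row or column. As a result, a single domain can sit inside a longer multidomain sequence without paying for the overhang.

```java
/** Hypothetical Needleman-Wunsch scorer (linear gaps) with optional free end gaps. */
final class EndGapFreeNW {

    static int score(String a, String b, int match, int mismatch, int gap, boolean freeEndGaps) {
        int n = a.length(), m = b.length();
        int[][] dp = new int[n + 1][m + 1];
        // First column/row: leading gaps cost nothing when end gaps are free.
        for (int i = 1; i <= n; i++) dp[i][0] = freeEndGaps ? 0 : i * gap;
        for (int j = 1; j <= m; j++) dp[0][j] = freeEndGaps ? 0 : j * gap;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int diag = dp[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch);
                int up = dp[i - 1][j] + gap;
                int left = dp[i][j - 1] + gap;
                dp[i][j] = Math.max(diag, Math.max(up, left));
            }
        }
        if (!freeEndGaps) return dp[n][m];
        // Trailing gaps are free too: take the best score in the last row or last column.
        int best = Integer.MIN_VALUE;
        for (int i = 0; i <= n; i++) best = Math.max(best, dp[i][m]);
        for (int j = 0; j <= m; j++) best = Math.max(best, dp[n][j]);
        return best;
    }

    public static void main(String[] args) {
        String domain = "ACGT", multiDomain = "TTACGTTT";
        System.out.println(score(domain, multiDomain, 1, -1, -2, false)); // -4: overhang penalized
        System.out.println(score(domain, multiDomain, 1, -1, -2, true));  // 4: overhang free
    }
}
```

The full-length display Jay describes would then follow from a traceback that pads the unaligned overhangs with '-' characters, which matches Andreas's point that the ends can be filled in after a local alignment.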
Usually an alignment display will give you the aligned positions; then you can easily see (or script) whether there are end gaps. Hope that makes sense. Andreas On Fri, Apr 15, 2011 at 12:30 PM, Jay Vyas wrote: > Consider Kalirin, a multidomain protein. > http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?INPUT_TYPE=live&SEQUENCE=NP_003938.1 > Now, let's say I aligned Kalirin with a single-domain member of the RHO-GEF > family. Then only one part of the sequence would match. But what if > I WANTED a global alignment, for programmatic purposes? Then I would want > an alignment like this: > ------------------------------------------------RHOGEFRHOGEFRHOG--FRHOGEFRHOGEFRHOGEF------ > With lots of gaps at the beginning, and lots of matches at the end, where > the length of the alignment between the two proteins was exactly equivalent. > Contrast this with a Smith-Waterman output, which would look like this: > RHOGEFRHOGEFRHOG--FRHOGEFRHOGEFRHOGEF.... > So is there a way to get the specificity of Smith-Waterman in a > Needleman-Wunsch alignment? From Wim.DeSmet at UGent.be Mon Apr 18 11:22:26 2011 From: Wim.DeSmet at UGent.be (Wim De Smet) Date: Mon, 18 Apr 2011 17:22:26 +0200 Subject: [Biojava-l] comparison of the pairwise aligner to emboss' needle Message-ID: <4DAC5732.5010507@UGent.be> Hi all, I've been trying to generate some global alignments with BioJava and comparing them with what needle returns. Doing this, I can't seem to reproduce needle's alignment with BioJava. The score returned from BioJava seems to be worse than that from needle, so I'm not sure what's happening here. The sequences are AB004720 and Y17238 (I didn't attach a FASTA file to avoid spamming people; let me know if you want one).
I align them with: GapPenalty penalty = new SimpleGapPenalty((short)-14, (short)-4); PairwiseSequenceAligner aligner = Alignments.getPairwiseAligner( new DNASequence(query, AmbiguityDNACompoundSet.getDNACompoundSet()), new DNASequence(target, AmbiguityDNACompoundSet.getDNACompoundSet()), PairwiseSequenceAlignerType.GLOBAL, penalty, SubstitutionMatrixHelper.getNuc4_4()); SequencePair alignment = aligner.getPair(); This gives me an alignment with only 23% similarity and a gap at the end. Varying the gap penalties can give me a gap in front too, but that's about it. When aligning in needle, I get a sequence with a higher score (6784 vs (-)5862) and 94% similarity (which seems closer to home). Needle I just run with defaults (so it uses EDNAFULL) and a go/ge of 14/4. Could this be a bug or am I misunderstanding some of the options? BTW, if I use a really large gapextend, say -4000, I also get a nullpointer exception. TIA, Wim De Smet -- Wim De Smet http://www.straininfo.net/ From andreas at sdsc.edu Mon Apr 18 14:34:57 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Mon, 18 Apr 2011 11:34:57 -0700 Subject: [Biojava-l] comparison of the pairwise aligner to emboss' needle In-Reply-To: <4DAC5732.5010507@UGent.be> References: <4DAC5732.5010507@UGent.be> Message-ID: Hi Wim, thanks for tracking this down. I agree, something does not look right here. I'll try to see what is going on... Andreas On Mon, Apr 18, 2011 at 8:22 AM, Wim De Smet wrote: > Hi all, > > I've been trying to generate some global alignments with biojava and > comparing them with what needle returns. Doing this, I can't seem to > reproduce needle's alignment with biojava. The score returned from biojava > seems to be worse than that from needle, so I'm not sure what's happening > here. > > The sequences are AB004720 and Y17238 (I didn't attach a fasta file to avoid > spamming people, let me know if you want one). 
I align them with: > GapPenalty penalty = new SimpleGapPenalty((short)-14, (short)-4); > PairwiseSequenceAligner aligner = > Alignments.getPairwiseAligner( > new DNASequence(query, AmbiguityDNACompoundSet.getDNACompoundSet()), > new DNASequence(target, AmbiguityDNACompoundSet.getDNACompoundSet()), > PairwiseSequenceAlignerType.GLOBAL, > penalty, SubstitutionMatrixHelper.getNuc4_4()); > SequencePair > alignment = aligner.getPair(); > > This gives me an alignment with only 23% similarity and a gap at the end. > Varying the gap penalties can give me a gap in front too, but that's about > it. When aligning in needle, I get a sequence with a higher score (6784 vs > (-)5862) and 94% similarity (which seems closer to home). Needle I just run > with defaults (so it uses EDNAFULL) and a go/ge of 14/4. > > Could this be a bug or am I misunderstanding some of the options? > > BTW, if I use a really large gapextend, say -4000, I also get a nullpointer > exception. > > TIA, > Wim De Smet > > -- > Wim De Smet > http://www.straininfo.net/ > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From andreas at sdsc.edu Wed Apr 20 13:33:03 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 20 Apr 2011 10:33:03 -0700 Subject: [Biojava-l] comparison of the pairwise aligner to emboss' needle In-Reply-To: <4DAC5732.5010507@UGent.be> References: <4DAC5732.5010507@UGent.be> Message-ID: Hi Wim, are you sure you are using the correct sequences in your test? When I run the code at the bottom of this email I am getting 95 and 97% sequence ID, which is similar to what you are expecting.
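When comparing BioJava output with needle, it helps to check what a percent-identity figure is divided by. needle reports Identity and Similarity separately, and similarity also counts non-identical pairs that score positively in the substitution matrix. The following sketch is a hypothetical helper, not a BioJava method. It counts identical non-gap columns in two gapped rows of equal length. It then reports identity relative to the alignment length and relative to the shorter ungapped sequence, two common but different denominators.

```java
/** Hypothetical helper: percent identity of two gapped, equal-length alignment rows. */
final class PercentIdentity {

    /** Number of columns where both rows carry the same non-gap residue. */
    static int identities(String row1, String row2) {
        if (row1.length() != row2.length()) throw new IllegalArgumentException("rows must have equal length");
        int same = 0;
        for (int i = 0; i < row1.length(); i++) {
            char x = Character.toUpperCase(row1.charAt(i));
            char y = Character.toUpperCase(row2.charAt(i));
            if (x != '-' && x == y) same++;
        }
        return same;
    }

    /** Identity relative to the full alignment length (gap columns count against it). */
    static double overAlignment(String row1, String row2) {
        return 100.0 * identities(row1, row2) / row1.length();
    }

    /** Identity relative to the shorter of the two ungapped sequences. */
    static double overShorterSequence(String row1, String row2) {
        int len1 = row1.replace("-", "").length();
        int len2 = row2.replace("-", "").length();
        return 100.0 * identities(row1, row2) / Math.min(len1, len2);
    }

    public static void main(String[] args) {
        String q = "ACGT--ACGT", t = "ACGTTTACGA";
        System.out.println(overAlignment(q, t));       // 70.0
        System.out.println(overShorterSequence(q, t)); // 87.5
    }
}
```

The same alignment can therefore be reported with noticeably different percentages, so it is worth confirming which definition each tool uses before concluding that the alignments differ.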
Andreas Here my code: (using latest code from SVN) package demo; import org.biojava3.alignment.Alignments; import org.biojava3.alignment.Alignments.PairwiseSequenceAlignerType; import org.biojava3.alignment.SimpleGapPenalty; import org.biojava3.alignment.SubstitutionMatrixHelper; import org.biojava3.alignment.template.GapPenalty; import org.biojava3.alignment.template.PairwiseSequenceAligner; import org.biojava3.alignment.template.SequencePair; import org.biojava3.core.sequence.DNASequence; import org.biojava3.core.sequence.compound.AmbiguityDNACompoundSet; import org.biojava3.core.sequence.compound.NucleotideCompound; public class TestDNANeedlemanWunsch { public static void main(String[] args){ String query = "AGGATGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGGTGAAGCCCAGCTTGCTGGGTGGATCA" + "GTGGCGAACGGGTGAGTAACACGTGAGCAACCTGCCCCTGACTCTGGGATAAGCGCTGGAAACGGTGTCT" + "AATACTGGATATGAGCTACCACCGCATGGTGAGTGGTTGGAAAGATTTTTCGGTTGGGGATGGGCTCGCG" + "GCCTATCAGCTTGTTGGTGAGGTAATGGCTCACCAAGGCGTCGACGGGTAGCCGGCCTGAGAGGGTGACC" + "GGCCACACTGGGACTGAGACACGGCCCAGACTCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGC" + "GGAAGCCTGATGCAGCAACGCCGCGTGAGGGACGACGGCTTCGGGTTGTAAACCTCTTTTAGCAGGGAAG" + "AAGCGAGAGTGACGGTACCTGCAGAAAAAGCGCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAG" + "GGCGCAAGCGTTATCCGGAATTATTGGGCGTAAAGAGCTCGTAGGCGGTTTGTCGCGTCTGCTGTGAAAA" + "CCCGAGGCTCAACCTNNGGGCTGCAGTGGGTACGGGCAGACTAGAGTGCGGTAGGGGAGATTGGAATTCC" + "TGGTGTAGCGGTGGAATGCGCAGATATCAGGAGGAACACCGATGGCGAAGGCAGATCTCTGGGCCGTAAC" + "TGACGCTGAGGAGCGAAAGGGTGGGGAGCAAACAGGCTTAGATACCCTGGTAGTCCACCCCGTAAACGTT" + "GGGAACTAGTTGTGGGGTCCTTTCCACGGATTCCGTGACGCACGTAACGCATTAAGTTCCCCGCCTGGGG" + "AGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGCGGAGCATGCGGATTA" + "AATCGATGCAACGCGAAGAACCTTACCAAGGCTTGACATACACGAGAACGCTGCAGAAATGTAGAACTCT" + "TTGGACACTCGTGAACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCG" + "CAACGAGCGCAACCCTCGTTCTATGTTGCCAGCACGTAATGGTGGGAACTCATGGGATACTGCCGGGGTC" + "AACTCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGTCTTGGGCTTCACGCATGCTACA" + 
"ATGGCCGGTACAAAGGGCTGCAATACCGTGAGGTGGAGCGAATCCCAAAAAGCCGGTCCCAGTTCGGATT" + "GAGGTCTGCAACTCGACCTCATGAAGTCGGAGTCGCTAGTAATCGCAGATCAGCAACGCTGCGGTGAATA" + "CGTTCCCGGGTCTTGTACACACCGCCCGTCAAGTCATGAAAGTCGGTAACACCTGAAGCCGGTGGCCTAA" + "CCCTTGTGGAGGGAGCCGGTAATTAAA"; String target = "CTGGCCGCCTGCTTAACACATCCAAGTCGAACGGTGAAGCCCCANCTTACTGGGTGGATCAGTGCCGAAC" + "GGGTGAGTAACACGTGAGCAACCTCCCCCTGACTCTGGGATAAGCGCTGGAANCGGTGTCTAATACTGGA" + "TATGAGCTACCACCGCATGGTGAGTGGTTGGAAAGATTTTTCGGTTGGGGATGGGCTCGCGCCCTATGAG" + "CTTGTTGGTGAGGTAATGGCTCACCAAGCCGTCGACGGGTAGCCGGCCTGAGAGGGTGACCGNCCACACT" + "GGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGGAAGCCT" + "GATTCANCAACCCCGCGTGAGGGACGACGGCCTTCGGGTTGTAAACCTCTTTTAGCAGGGAAGAAGCGAG" + "AGTGACGGTACCTGCAGAAAAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGCGCAA" + "GCGTTATCCGGAATTATTGGGCGTAAAGAGCTCGTAGGCGGTTTGTCGCGTCTGCTGTGAAAACCCGAGG" + "CTCAACCTCGGGCCTGCAGTGGGTACGGGCAGACTAGAGTGCGGTAGGGGAGATTGGAATTCCTGGTGTA" + "GCGGTGGAATGCGCAGATATCAGGAGGAACACCGATGGCGAAGGCAGATCTCTGGGCCGTAACTGACGCT" + "GAGGAGCGAAAGGGTGGGGAGCAAACAGGCTTAGATACCCTGGTAGTCCACCCCGTAAACGTTGGGAACT" + "AGTTGTGGGGTCCTTTCCACGGATTCCGTGACGCAGCTAACGCATTAAGTTCCCCGCCTGGGGAGTACGG" + "CCGCAAGGCTAAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGCGGAGCATGCGGATTAATTCGAT" + "GCAACGCGAAGAACCTTACCAAGGCTTGACATACACGAGAACGCTGCAGAAATGTAGAACTCTTTGGACA" + "CTCGTGAACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAG" + "CGCAACCCTCGTTCTATGTTGCCAGCACGTAATGGTGGGAACTCATGGGATACTGCCGGGGTCAACTCGG" + "AGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGTCTTGGGCTTCACGCATGCTACAATGGCCG" + "GTACAAAGGGCTGCAATACCGTGAGGTGGAGCGAATCCCAAAAAGCCGGTCCCAGTTCGGATTGAGGTCT" + "GCAACTCGACCTCATGAAGTCGGAGTCGCTAGTAATCGCAGATCAGCAACGCTGCGGTGAATACGTTCCC" + "GGGTCTTGTACACACCGCCCGTCAAGTCATGAAAGTCGGTAACACCTGAAGCCGGTGGCCCAACCCTTGT" + "GGAGGGAGCCGTCGAAGGTGGGATCGGTAATTAGGACTAAGTCGTAACAAGGTAGCCGTACC"; GapPenalty penalty = new SimpleGapPenalty((short)-14, (short)-4); PairwiseSequenceAligner aligner = Alignments.getPairwiseAligner( new DNASequence(query, 
AmbiguityDNACompoundSet.getDNACompoundSet()),
            new DNASequence(target, AmbiguityDNACompoundSet.getDNACompoundSet()),
            PairwiseSequenceAlignerType.GLOBAL,
            penalty, SubstitutionMatrixHelper.getNuc4_4());
        SequencePair alignment = aligner.getPair();

        System.out.println(alignment);

        int identical = alignment.getNumIdenticals();
        System.out.println("Number of identical residues: " + identical);
        System.out.println("% identical query: " + identical / (float) query.length());
        System.out.println("% identical target: " + identical / (float) target.length());
    }
}

On Mon, Apr 18, 2011 at 8:22 AM, Wim De Smet wrote: > Hi all, > > I've been trying to generate some global alignments with biojava and > comparing them with what needle returns. Doing this, I can't seem to > reproduce needle's alignment with biojava. The score returned from biojava > seems to be worse than that from needle, so I'm not sure what's happening > here. > > The sequences are AB004720 and Y17238 (I didn't attach a fasta file to avoid > spamming people, let me know if you want one). I align them with: > GapPenalty penalty = new SimpleGapPenalty((short)-14, (short)-4); > PairwiseSequenceAligner aligner = > Alignments.getPairwiseAligner( > new DNASequence(query, AmbiguityDNACompoundSet.getDNACompoundSet()), > new DNASequence(target, AmbiguityDNACompoundSet.getDNACompoundSet()), > PairwiseSequenceAlignerType.GLOBAL, > penalty, SubstitutionMatrixHelper.getNuc4_4()); > SequencePair > alignment = aligner.getPair(); > > This gives me an alignment with only 23% similarity and a gap at the end. > Varying the gap penalties can give me a gap in front too, but that's about > it. When aligning in needle, I get a sequence with a higher score (6784 vs > (-)5862) and 94% similarity (which seems closer to home). Needle I just run > with defaults (so it uses EDNAFULL) and a go/ge of 14/4. > > Could this be a bug or am I misunderstanding some of the options?
> > BTW, if I use a really large gapextend, say -4000, I also get a nullpointer > exception. > > TIA, > Wim De Smet > > -- > Wim De Smet > http://www.straininfo.net/ > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From jayunit100 at gmail.com Wed Apr 20 13:54:44 2011 From: jayunit100 at gmail.com (Jay Vyas) Date: Wed, 20 Apr 2011 13:54:44 -0400 Subject: [Biojava-l] biojava resolver issues in maven Message-ID: Hi guys, I've intermittently been getting some weird biojava/maven resolution errors. Any ideas on this? 4/20/11 1:52:13 PM EDT: [WARN] Failure to transfer org.biojava:biojava3-structure:3.0-alpha3-SNAPSHOT/maven-metadata.xml from http://download.java.net/maven/1 was cached in the local repository, resolution will not be reattempted until the update interval of maven-repository.dev.java.net has elapsed or updates are forced. From andreas at sdsc.edu Wed Apr 20 14:04:38 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 20 Apr 2011 11:04:38 -0700 Subject: [Biojava-l] biojava resolver issues in maven In-Reply-To: References: Message-ID: BioJava jar files are hosted at our own repository, not the public Maven ones. It should be in your list of repositories; however, if you are setting up your own custom pom file, make sure to add:

    ...
    <repository>
      <id>biojava-maven-repo</id>
      <name>BioJava repository</name>
      <url>http://www.biojava.org/download/maven/</url>
      <releases>
        <enabled>true</enabled>
      </releases>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
    </repository>

Andreas On Wed, Apr 20, 2011 at 10:54 AM, Jay Vyas wrote: > Hi guys, I've intermittently been getting some weird biojava/maven > resolution errors. Any ideas on this ?
> > 4/20/11 1:52:13 PM EDT: [WARN] Failure to transfer > org.biojava:biojava3-structure:3.0-alpha3-SNAPSHOT/maven-metadata.xml from > http://download.java.net/maven/1 was cached in the local repository, > resolution will not be reattempted until the update interval of > maven-repository.dev.java.net has elapsed or updates are forced. > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From frieman6 at gmail.com Thu Apr 21 02:44:02 2011 From: frieman6 at gmail.com (omer f) Date: Thu, 21 Apr 2011 09:44:02 +0300 Subject: [Biojava-l] Bio Java API - Amino Acids Data Message-ID: Hi All, I applied to the "Amino Acid" project. I found that the BioJava API has amino acid objects, but what I couldn't find is where the data (weight, charge...) for each amino acid is kept. My question: is there a basic database for amino acids, or should we get it from user input? Thank you, Omer. From Wim.DeSmet at UGent.be Thu Apr 21 03:54:56 2011 From: Wim.DeSmet at UGent.be (Wim De Smet) Date: Thu, 21 Apr 2011 09:54:56 +0200 Subject: [Biojava-l] comparison of the pairwise aligner to emboss' needle In-Reply-To: References: <4DAC5732.5010507@UGent.be> Message-ID: <4DAFE2D0.7090601@UGent.be> Hi Andreas Thanks for having a look. I found the issue: if your sequence is lower case, the alignment is different.
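The workaround Wim describes (upper-casing both inputs before building the `DNASequence` objects) can be sketched in plain Java as follows. This is a minimal sketch that assumes letter case and stray whitespace are the only problems with the input; `normalizeNucleotides` is a hypothetical helper, not a BioJava method, and the sketch does not explain why the BioJava 3 aligner scored lower-case input differently.

```java
public class NormalizeForAlignment {

    /**
     * Hypothetical helper (not part of BioJava): upper-cases every character and
     * drops whitespace, so "acgt" and "ACGT" reach the aligner as the same letters.
     */
    static String normalizeNucleotides(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (!Character.isWhitespace(c)) {
                sb.append(Character.toUpperCase(c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String query = "aggatgaacg ctggcggcgt\n";
        // The normalized string is what would be handed to new DNASequence(...).
        System.out.println(normalizeNucleotides(query)); // AGGATGAACGCTGGCGGCGT
    }
}
```

In the test program from this thread, that would mean constructing the inputs as, for example, `new DNASequence(normalizeNucleotides(query), AmbiguityDNACompoundSet.getDNACompoundSet())`.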
Try aligning with query.toLowerCase() and target.toLowerCase(), then the output is: Number of identical residues: 334 % identical query: 0.23405746 % identical query: 0.22845417 vs upper case Number of identical residues: 1394 % identical query: 0.9768746 % identical query: 0.95348835 So I just inserted a toUpperCase() and it works now. Regards Wim On 20-04-11 19:33, Andreas Prlic wrote: > Hi Wim, > > are you sure you are using the correct sequences in your test? When I > run the code at the bottom of this emails I am getting 95 and 97% > sequence ID, which is similar to what you are expecting. > > Andreas > > Here my code: (using latest code from SVN) > > package demo; > > import org.biojava3.alignment.Alignments; > import org.biojava3.alignment.Alignments.PairwiseSequenceAlignerType; > import org.biojava3.alignment.SimpleGapPenalty; > import org.biojava3.alignment.SubstitutionMatrixHelper; > import org.biojava3.alignment.template.GapPenalty; > import org.biojava3.alignment.template.PairwiseSequenceAligner; > import org.biojava3.alignment.template.SequencePair; > import org.biojava3.core.sequence.DNASequence; > import org.biojava3.core.sequence.compound.AmbiguityDNACompoundSet; > import org.biojava3.core.sequence.compound.NucleotideCompound; > > public class TestDNANeedlemanWunsch { > public static void main(String[] args){ > > String query = > "AGGATGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGGTGAAGCCCAGCTTGCTGGGTGGATCA" > + > "GTGGCGAACGGGTGAGTAACACGTGAGCAACCTGCCCCTGACTCTGGGATAAGCGCTGGAAACGGTGTCT" + > "AATACTGGATATGAGCTACCACCGCATGGTGAGTGGTTGGAAAGATTTTTCGGTTGGGGATGGGCTCGCG" + > "GCCTATCAGCTTGTTGGTGAGGTAATGGCTCACCAAGGCGTCGACGGGTAGCCGGCCTGAGAGGGTGACC" + > "GGCCACACTGGGACTGAGACACGGCCCAGACTCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGC" + > "GGAAGCCTGATGCAGCAACGCCGCGTGAGGGACGACGGCTTCGGGTTGTAAACCTCTTTTAGCAGGGAAG" + > "AAGCGAGAGTGACGGTACCTGCAGAAAAAGCGCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAG" + > "GGCGCAAGCGTTATCCGGAATTATTGGGCGTAAAGAGCTCGTAGGCGGTTTGTCGCGTCTGCTGTGAAAA" + > 
"CCCGAGGCTCAACCTNNGGGCTGCAGTGGGTACGGGCAGACTAGAGTGCGGTAGGGGAGATTGGAATTCC" + > "TGGTGTAGCGGTGGAATGCGCAGATATCAGGAGGAACACCGATGGCGAAGGCAGATCTCTGGGCCGTAAC" + > "TGACGCTGAGGAGCGAAAGGGTGGGGAGCAAACAGGCTTAGATACCCTGGTAGTCCACCCCGTAAACGTT" + > "GGGAACTAGTTGTGGGGTCCTTTCCACGGATTCCGTGACGCACGTAACGCATTAAGTTCCCCGCCTGGGG" + > "AGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGCGGAGCATGCGGATTA" + > "AATCGATGCAACGCGAAGAACCTTACCAAGGCTTGACATACACGAGAACGCTGCAGAAATGTAGAACTCT" + > "TTGGACACTCGTGAACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCG" + > "CAACGAGCGCAACCCTCGTTCTATGTTGCCAGCACGTAATGGTGGGAACTCATGGGATACTGCCGGGGTC" + > "AACTCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGTCTTGGGCTTCACGCATGCTACA" + > "ATGGCCGGTACAAAGGGCTGCAATACCGTGAGGTGGAGCGAATCCCAAAAAGCCGGTCCCAGTTCGGATT" + > "GAGGTCTGCAACTCGACCTCATGAAGTCGGAGTCGCTAGTAATCGCAGATCAGCAACGCTGCGGTGAATA" + > "CGTTCCCGGGTCTTGTACACACCGCCCGTCAAGTCATGAAAGTCGGTAACACCTGAAGCCGGTGGCCTAA" + > "CCCTTGTGGAGGGAGCCGGTAATTAAA"; > > String target = > "CTGGCCGCCTGCTTAACACATCCAAGTCGAACGGTGAAGCCCCANCTTACTGGGTGGATCAGTGCCGAAC" > + > "GGGTGAGTAACACGTGAGCAACCTCCCCCTGACTCTGGGATAAGCGCTGGAANCGGTGTCTAATACTGGA" + > "TATGAGCTACCACCGCATGGTGAGTGGTTGGAAAGATTTTTCGGTTGGGGATGGGCTCGCGCCCTATGAG" + > "CTTGTTGGTGAGGTAATGGCTCACCAAGCCGTCGACGGGTAGCCGGCCTGAGAGGGTGACCGNCCACACT" + > "GGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGGAAGCCT" + > "GATTCANCAACCCCGCGTGAGGGACGACGGCCTTCGGGTTGTAAACCTCTTTTAGCAGGGAAGAAGCGAG" + > "AGTGACGGTACCTGCAGAAAAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGCGCAA" + > "GCGTTATCCGGAATTATTGGGCGTAAAGAGCTCGTAGGCGGTTTGTCGCGTCTGCTGTGAAAACCCGAGG" + > "CTCAACCTCGGGCCTGCAGTGGGTACGGGCAGACTAGAGTGCGGTAGGGGAGATTGGAATTCCTGGTGTA" + > "GCGGTGGAATGCGCAGATATCAGGAGGAACACCGATGGCGAAGGCAGATCTCTGGGCCGTAACTGACGCT" + > "GAGGAGCGAAAGGGTGGGGAGCAAACAGGCTTAGATACCCTGGTAGTCCACCCCGTAAACGTTGGGAACT" + > "AGTTGTGGGGTCCTTTCCACGGATTCCGTGACGCAGCTAACGCATTAAGTTCCCCGCCTGGGGAGTACGG" + > "CCGCAAGGCTAAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGCGGAGCATGCGGATTAATTCGAT" + > 
"GCAACGCGAAGAACCTTACCAAGGCTTGACATACACGAGAACGCTGCAGAAATGTAGAACTCTTTGGACA" + > "CTCGTGAACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAG" + > "CGCAACCCTCGTTCTATGTTGCCAGCACGTAATGGTGGGAACTCATGGGATACTGCCGGGGTCAACTCGG" + > "AGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGTCTTGGGCTTCACGCATGCTACAATGGCCG" + > "GTACAAAGGGCTGCAATACCGTGAGGTGGAGCGAATCCCAAAAAGCCGGTCCCAGTTCGGATTGAGGTCT" + > "GCAACTCGACCTCATGAAGTCGGAGTCGCTAGTAATCGCAGATCAGCAACGCTGCGGTGAATACGTTCCC" + > "GGGTCTTGTACACACCGCCCGTCAAGTCATGAAAGTCGGTAACACCTGAAGCCGGTGGCCCAACCCTTGT" + > "GGAGGGAGCCGTCGAAGGTGGGATCGGTAATTAGGACTAAGTCGTAACAAGGTAGCCGTACC"; > > GapPenalty penalty = new SimpleGapPenalty((short)-14, (short)-4); > PairwiseSequenceAligner aligner = > Alignments.getPairwiseAligner( > new DNASequence(query, AmbiguityDNACompoundSet.getDNACompoundSet()), > new DNASequence(target, AmbiguityDNACompoundSet.getDNACompoundSet()), > PairwiseSequenceAlignerType.GLOBAL, > penalty, SubstitutionMatrixHelper.getNuc4_4()); > SequencePair > alignment = aligner.getPair(); > > System.out.println(alignment); > > int identical = alignment.getNumIdenticals(); > System.out.println("Number of identical residues: " + identical); > System.out.println("% identical query: " + identical / (float) > query.length() ); > System.out.println("% identical query: " + identical / (float) > target.length() ); > } > } > > > > > > > On Mon, Apr 18, 2011 at 8:22 AM, Wim De Smet wrote: >> Hi all, >> >> I've been trying to generate some global alignments with biojava and >> comparing them with what needle returns. Doing this, I can't seem to >> reproduce needle's alignment with biojava. The score returned from biojava >> seems to be worse than that from needle, so I'm not sure what's happening >> here. >> >> The sequences are AB004720 and Y17238 (I didn't attach a fasta file to avoid >> spamming people, let me know if you want one). 
I align them with: >> GapPenalty penalty = new SimpleGapPenalty((short)-14, (short)-4); >> PairwiseSequenceAligner aligner = >> Alignments.getPairwiseAligner( >> new DNASequence(query, AmbiguityDNACompoundSet.getDNACompoundSet()), >> new DNASequence(target, AmbiguityDNACompoundSet.getDNACompoundSet()), >> PairwiseSequenceAlignerType.GLOBAL, >> penalty, SubstitutionMatrixHelper.getNuc4_4()); >> SequencePair >> alignment = aligner.getPair(); >> >> This gives me an alignment with only 23% similarity and a gap at the end. >> Varying the gap penalties can give me a gap in front too, but that's about >> it. When aligning in needle, I get a sequence with a higher score (6784 vs >> (-)5862) and 94% similarity (which seems closer to home). Needle I just run >> with defaults (so it uses EDNAFULL) and a go/ge of 14/4. >> >> Could this be a bug or am I misunderstanding some of the options? >> >> BTW, if I use a really large gapextend, say -4000, I also get a nullpointer >> exception. >> >> TIA, >> Wim De Smet >> >> -- >> Wim De Smet >> http://www.straininfo.net/ >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > > > -- Wim De Smet http://www.straininfo.net/ From khalil.elmazouari at gmail.com Thu Apr 21 06:36:29 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Thu, 21 Apr 2011 12:36:29 +0200 Subject: [Biojava-l] Codon count Message-ID: <7348CEDD-3D91-4351-A24E-130B4D72DAE8@gmail.com> Hi, I am looking for a simple method or class to count the number of a specific AA codon on NT seq. Counting on both strands. Any suggestion is welcome. 
Regards, khalil From ayates at ebi.ac.uk Thu Apr 21 07:18:55 2011 From: ayates at ebi.ac.uk (Andy Yates) Date: Thu, 21 Apr 2011 12:18:55 +0100 Subject: [Biojava-l] Bio Java API - Amino Acids Data In-Reply-To: References: Message-ID: <4408CC30-6456-4269-B192-E9F66CEDAC3C@ebi.ac.uk> Hi Omer, If you are using BioJava3 then weight is available from the AminoAcidCompound object when taken from an AminoAcidCompoundSet. The other data points are not available from within BioJava. I know Andreas mentioned ages ago linking into one of the compound databases e.g. ChEBI but I don't think that's been done. Regards, Andy On 21 Apr 2011, at 07:44, omer f wrote: > Hi All, > > I Applied to the "Amino Acid" Project, > I found in the Bio Java API there is Amino Acids Objects, > But what i couldn't find - where there is Data (weight,Charge...) about each > Amino Acid, > My question, is there basic Data base for Amino Acids or we should get it > from user input, > > Thank you, > Omer. > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From ayates at ebi.ac.uk Thu Apr 21 07:23:47 2011 From: ayates at ebi.ac.uk (Andy Yates) Date: Thu, 21 Apr 2011 12:23:47 +0100 Subject: [Biojava-l] Codon count In-Reply-To: <7348CEDD-3D91-4351-A24E-130B4D72DAE8@gmail.com> References: <7348CEDD-3D91-4351-A24E-130B4D72DAE8@gmail.com> Message-ID: Hi Khalil, I'm not 100% sure what you want here. If you just want to know the potential number of codons on both strands of DNA then it would be (length / 3)*2. If what you are actually asking for is how many codons code for an amino acid then you would have to perform work similar to the transcription engine in BJ3. 
All codon tables are available from the IUPACParser class & then it would be up to you to use a WindowedSequence over the top of your NT sequence to get the windows or SequenceMixin.nonOverlappingKmers() which shortcuts the creation of the WindowedSequence. Regards, Andy On 21 Apr 2011, at 11:36, Khalil El Mazouari wrote: > Hi, > > I am looking for a simple method or class to count the number of a specific AA codon on NT seq. Counting on both strands. > > Any suggestion is welcome. > > Regards, > > khalil > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From khalil.elmazouari at gmail.com Thu Apr 21 07:37:16 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Thu, 21 Apr 2011 13:37:16 +0200 Subject: [Biojava-l] Codon count In-Reply-To: References: <7348CEDD-3D91-4351-A24E-130B4D72DAE8@gmail.com> Message-ID: <1DECE13D-34A1-4EFC-ADED-320395E9320A@gmail.com> Thanks Andy, it's the second option I am looking for. Regards, khalil On 21 Apr 2011, at 13:23, Andy Yates wrote: > Hi Khalil, > > I'm not 100% sure what you want here. If you just want to know the potential number of codons on both strands of DNA then it would be (length / 3)*2. If what you are actually asking for is how many codons code for an amino acid then you would have to perform work similar to the transcription engine in BJ3. All codon tables are available from the IUPACParser class & then it would be up to you to use a WindowedSequence over the top of your NT sequence to get the windows or SequenceMixin.nonOverlappingKmers() which shortcuts the creation of the WindowedSequence. 
> > Regards, > > Andy > > On 21 Apr 2011, at 11:36, Khalil El Mazouari wrote: > >> Hi, >> >> I am looking for a simple method or class to count the number of a specific AA codon on NT seq. Counting on both strands. >> >> Any suggestion is welcome. >> >> Regards, >> >> khalil >> >> >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l > > -- > Andrew Yates Ensembl Genomes Engineer > EMBL-EBI Tel: +44-(0)1223-492538 > Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 > Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ > > > > From ayates at ebi.ac.uk Thu Apr 21 07:40:00 2011 From: ayates at ebi.ac.uk (Andy Yates) Date: Thu, 21 Apr 2011 12:40:00 +0100 Subject: [Biojava-l] Codon count In-Reply-To: <1DECE13D-34A1-4EFC-ADED-320395E9320A@gmail.com> References: <7348CEDD-3D91-4351-A24E-130B4D72DAE8@gmail.com> <1DECE13D-34A1-4EFC-ADED-320395E9320A@gmail.com> Message-ID: <6E526B83-5E96-4364-A248-9270FE139D7D@ebi.ac.uk> Hi Khalil, Then I think windowed sequence is the only way to go. Actually one particularly "interesting" idea has just sprung to mind. What if you translated the entire sequence in frame 1 forward & reverse? Then finding the amount of correct codons is a case of looking for amino acids which are not a stop or unknown amino acid. Andy On 21 Apr 2011, at 12:37, Khalil El Mazouari wrote: > Thanks Andy, > it's the second option I am looking for. > > Regards, > khalil > > > > On 21 Apr 2011, at 13:23, Andy Yates wrote: > >> Hi Khalil, >> >> I'm not 100% sure what you want here. If you just want to know the potential number of codons on both strands of DNA then it would be (length / 3)*2. If what you are actually asking for is how many codons code for an amino acid then you would have to perform work similar to the transcription engine in BJ3. 
All codon tables are available from the IUPACParser class & then it would be up to you to use a WindowedSequence over the top of your NT sequence to get the windows or SequenceMixin.nonOverlappingKmers() which shortcuts the creation of the WindowedSequence. >> >> Regards, >> >> Andy >> >> On 21 Apr 2011, at 11:36, Khalil El Mazouari wrote: >> >>> Hi, >>> >>> I am looking for a simple method or class to count the number of a specific AA codon on NT seq. Counting on both strands. >>> >>> Any suggestion is welcome. >>> >>> Regards, >>> >>> khalil >>> >>> >>> >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> -- >> Andrew Yates Ensembl Genomes Engineer >> EMBL-EBI Tel: +44-(0)1223-492538 >> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 >> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ >> >> >> >> > -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From khalil.elmazouari at gmail.com Thu Apr 21 07:54:23 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Thu, 21 Apr 2011 13:54:23 +0200 Subject: [Biojava-l] Codon count In-Reply-To: <6E526B83-5E96-4364-A248-9270FE139D7D@ebi.ac.uk> References: <7348CEDD-3D91-4351-A24E-130B4D72DAE8@gmail.com> <1DECE13D-34A1-4EFC-ADED-320395E9320A@gmail.com> <6E526B83-5E96-4364-A248-9270FE139D7D@ebi.ac.uk> Message-ID: <7DFE6B97-A465-4A1A-8EC6-A6BEC1EFCBF3@gmail.com> Hi Andy, I am actually counting codons via 6 ORFs translations. I am working on ≈100.000 seq/run => 600.000 ORFs to check. So, performance is an issue for my job. I am just wondering if counting codons directly on the NT seq (both strands) will be faster vs translation + AA counting.
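Khalil's question (whether counting codons directly on the nucleotide string beats six-frame translation) can be explored with a plain-Java sketch like the one below. It is an illustration under stated assumptions, not BioJava code and not a benchmark: each base is packed into a 0..63 codon index, the forward strand and its reverse complement are scanned in all three frames, codons containing N or another ambiguity code are skipped, and whole reading frames are counted rather than ORFs bounded by start and stop codons. The leucine codon set (TTA, TTG, CTT, CTC, CTA, CTG) is only an example target; `codonSet` and `countInSixFrames` are hypothetical names.

```java
public class SixFrameCodonCounter {

    // Maps A/C/G/T (either case) to 0..3; N and other ambiguity codes map to -1.
    static int code(char c) {
        switch (Character.toUpperCase(c)) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default:  return -1;
        }
    }

    /** Builds a 64-entry lookup table flagging the given codons (e.g. all codons of one amino acid). */
    static boolean[] codonSet(String... codons) {
        boolean[] table = new boolean[64];
        for (String codon : codons) {
            table[16 * code(codon.charAt(0)) + 4 * code(codon.charAt(1)) + code(codon.charAt(2))] = true;
        }
        return table;
    }

    /** Counts flagged codons over all six reading frames (3 forward, 3 on the reverse complement). */
    static long countInSixFrames(String seq, boolean[] target) {
        int n = seq.length();
        int[] forward = new int[n];
        int[] reverse = new int[n];
        for (int i = 0; i < n; i++) {
            int b = code(seq.charAt(i));
            forward[i] = b;
            // Reverse complement: the complement of base i lands at n-1-i; with A=0,C=1,G=2,T=3 it is 3-b.
            reverse[n - 1 - i] = (b < 0) ? -1 : 3 - b;
        }
        return countStrand(forward, target) + countStrand(reverse, target);
    }

    // Scans the non-overlapping triplets of one strand in frames 0, 1 and 2.
    private static long countStrand(int[] bases, boolean[] target) {
        long count = 0;
        for (int frame = 0; frame < 3; frame++) {
            for (int i = frame; i + 2 < bases.length; i += 3) {
                int a = bases[i], b = bases[i + 1], c = bases[i + 2];
                if (a < 0 || b < 0 || c < 0) {
                    continue; // skip codons containing an ambiguous base
                }
                if (target[16 * a + 4 * b + c]) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        boolean[] leucine = codonSet("TTA", "TTG", "CTT", "CTC", "CTA", "CTG");
        System.out.println(countInSixFrames("CTGCTA", leucine)); // 2 (both in forward frame 1)
    }
}
```

The design choice here is to avoid building any protein strings: each codon costs one table lookup. Whether that is measurably faster than translating six frames for ~100,000 sequences is an open question, so it is worth benchmarking against the translation route before switching.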
Regards, khalil On 21 Apr 2011, at 13:40, Andy Yates wrote: > Hi Khalil, > > Then I think windowed sequence is the only way to go. Actually one particularly "interesting" idea has just sprung to mind. What if you translated the entire sequence in frame 1 forward & reverse? Then finding the amount of correct codons is a case of looking for amino acids which are not a stop or unknown amino acid. > > Andy > > On 21 Apr 2011, at 12:37, Khalil El Mazouari wrote: > >> Thanks Andy, >> it's the second option I am looking for. >> >> Regards, >> khalil >> >> >> >> On 21 Apr 2011, at 13:23, Andy Yates wrote: >> >>> Hi Khalil, >>> >>> I'm not 100% sure what you want here. If you just want to know the potential number of codons on both strands of DNA then it would be (length / 3)*2. If what you are actually asking for is how many codons code for an amino acid then you would have to perform work similar to the transcription engine in BJ3. All codon tables are available from the IUPACParser class & then it would be up to you to use a WindowedSequence over the top of your NT sequence to get the windows or SequenceMixin.nonOverlappingKmers() which shortcuts the creation of the WindowedSequence. >>> >>> Regards, >>> >>> Andy >>> >>> On 21 Apr 2011, at 11:36, Khalil El Mazouari wrote: >>> >>>> Hi, >>>> >>>> I am looking for a simple method or class to count the number of a specific AA codon on NT seq. Counting on both strands. >>>> >>>> Any suggestion is welcome. 
>>>> >>>> Regards, >>>> >>>> khalil >>>> >>>> >>>> >>>> _______________________________________________ >>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >>> -- >>> Andrew Yates Ensembl Genomes Engineer >>> EMBL-EBI Tel: +44-(0)1223-492538 >>> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 >>> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ >>> >>> >>> >>> >> > > -- > Andrew Yates Ensembl Genomes Engineer > EMBL-EBI Tel: +44-(0)1223-492538 > Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 > Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ > > > > From ayates at ebi.ac.uk Thu Apr 21 08:06:35 2011 From: ayates at ebi.ac.uk (Andy Yates) Date: Thu, 21 Apr 2011 13:06:35 +0100 Subject: [Biojava-l] Codon count In-Reply-To: <7DFE6B97-A465-4A1A-8EC6-A6BEC1EFCBF3@gmail.com> References: <7348CEDD-3D91-4351-A24E-130B4D72DAE8@gmail.com> <1DECE13D-34A1-4EFC-ADED-320395E9320A@gmail.com> <6E526B83-5E96-4364-A248-9270FE139D7D@ebi.ac.uk> <7DFE6B97-A465-4A1A-8EC6-A6BEC1EFCBF3@gmail.com> Message-ID: <6177ca65-9f8b-40bb-8856-03bf6ad62361@email.android.com> There will be a performance hit but you'll be rewriting the translation code so maybe the speed reduction isn't worth the recoding task. Give it a benchmark before recoding. I can't remember the exact speed but it isn't too slow. Andy Khalil El Mazouari wrote: >Hi Andy, > >I am actually counting codons via 6 ORFs translations. I am working on >≈100.000 seq/run => 600.000 ORFs to check. So, performance is an issue >for my job. > >I am just wondering if counting Codons directly on NT seq (both strand) >will be faster vs translation + AA counting. > >Regards, > >khalil > > >On 21 Apr 2011, at 13:40, Andy Yates wrote: > >> Hi Khalil, >> >> Then I think windowed sequence is the only way to go. Actually one >particularly "interesting" idea has just sprung to mind. What if you >translated the entire sequence in frame 1 forward & reverse?
Then >finding the amount of correct codons is a case of looking for amino >acids which are not a stop or unknown amino acid. >> >> Andy >> >> On 21 Apr 2011, at 12:37, Khalil El Mazouari wrote: >> >>> Thanks Andy, >>> it's the second option I am looking for. >>> >>> Regards, >>> khalil >>> >>> >>> >>> On 21 Apr 2011, at 13:23, Andy Yates wrote: >>> >>>> Hi Khalil, >>>> >>>> I'm not 100% sure what you want here. If you just want to know the >potential number of codons on both strands of DNA then it would be >(length / 3)*2. If what you are actually asking for is how many codons >code for an amino acid then you would have to perform work similar to >the transcription engine in BJ3. All codon tables are available from >the IUPACParser class & then it would be up to you to use a >WindowedSequence over the top of your NT sequence to get the windows or >SequenceMixin.nonOverlappingKmers() which shortcuts the creation of the >WindowedSequence. >>>> >>>> Regards, >>>> >>>> Andy >>>> >>>> On 21 Apr 2011, at 11:36, Khalil El Mazouari wrote: >>>> >>>>> Hi, >>>>> >>>>> I am looking for a simple method or class to count the number of a >specific AA codon on NT seq. Counting on both strands. >>>>> >>>>> Any suggestion is welcome. 
>>>>> >>>>> Regards, >>>>> >>>>> khalil >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>>> >>>> -- >>>> Andrew Yates Ensembl Genomes Engineer >>>> EMBL-EBI Tel: +44-(0)1223-492538 >>>> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 >>>> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ >>>> >>>> >>>> >>>> >>> >> >> -- >> Andrew Yates Ensembl Genomes Engineer >> EMBL-EBI Tel: +44-(0)1223-492538 >> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 >> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ >> >> >> >> From flf.mib at gmail.com Thu Apr 21 17:18:01 2011 From: flf.mib at gmail.com (François Le Fèvre) Date: Thu, 21 Apr 2011 23:18:01 +0200 Subject: [Biojava-l] translation and leucine Message-ID: <4DB09F09.4060303@gmail.com> Dear all, I have a quick question about translation with BioJava 3. I would like to retrieve the fact that the codon CTA codes for a leucine. But some leucine codons also code for a start in the Universal Genetic Code. Here I have built a very short example: given a short DNA sequence composed of a start codon and 6 leucine codons

    TranscriptionEngine e = TranscriptionEngine.getDefault();
    DNASequence dd = new DNASequence("ATGTTGTTACTTCTCCTACTG");
    //return MLLLLLL : OK

    DNASequence dd = new DNASequence("CTATTGTTACTTCTCCTACTG");
    //return MLLLLLL : KO, I would prefer to have LLLLLLL !

    DNASequence dd = new DNASequence("CTA");
    //return M : KO, I would prefer to have L !

Could someone explain this feature to me? How does the default TranscriptionEngine work? How can I ask the TranscriptionEngine to give me the amino acid corresponding to CTA when it is not in first position? Thanks a lot for your help !
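To make the behaviour François describes concrete, here is a self-contained plain-Java sketch of frame-1 translation with an optional init-Met rule. It is not BioJava's TranscriptionEngine: the codon table is the standard genetic code (NCBI table 1), and the start-codon set {ATG, TTG, CTG} is an assumption taken from that table's listed initiation codons, so it will not reproduce the "CTA" to "M" result reported above. The boolean `initMet` parameter only mirrors the idea behind the `initMet(false)` builder option discussed in this thread; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class InitMetTranslationSketch {

    // Standard genetic code (NCBI table 1); codon index = 16*b1 + 4*b2 + b3 with T=0, C=1, A=2, G=3.
    private static final String AMINO_ACIDS =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    // Assumption: the initiation codons listed for NCBI table 1.
    private static final Set<String> START_CODONS =
        new HashSet<>(Arrays.asList("ATG", "TTG", "CTG"));

    private static int baseIndex(char c) {
        switch (c) {
            case 'T': return 0;
            case 'C': return 1;
            case 'A': return 2;
            case 'G': return 3;
            default: throw new IllegalArgumentException("Unsupported base: " + c);
        }
    }

    /**
     * Translates reading frame 1 of an unambiguous DNA string (a trailing partial codon is ignored).
     * If initMet is true and the first codon is in START_CODONS, it is emitted as 'M'.
     */
    static String translate(String dna, boolean initMet) {
        String s = dna.toUpperCase();
        StringBuilder protein = new StringBuilder(s.length() / 3);
        for (int i = 0; i + 2 < s.length(); i += 3) {
            String codon = s.substring(i, i + 3);
            if (i == 0 && initMet && START_CODONS.contains(codon)) {
                protein.append('M');
                continue;
            }
            int idx = 16 * baseIndex(codon.charAt(0))
                    + 4 * baseIndex(codon.charAt(1))
                    + baseIndex(codon.charAt(2));
            protein.append(AMINO_ACIDS.charAt(idx));
        }
        return protein.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("TTGTTGTTACTTCTCCTACTG", true));  // MLLLLLL
        System.out.println(translate("TTGTTGTTACTTCTCCTACTG", false)); // LLLLLLL
        System.out.println(translate("CTA", true));                    // L (CTA is not in the assumed start set)
    }
}
```

With the init-Met rule switched off, a leading leucine codon is translated as L like any other position, which is the output François asked for.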
Francois From ayates at ebi.ac.uk Thu Apr 21 17:46:14 2011 From: ayates at ebi.ac.uk (Andy Yates) Date: Thu, 21 Apr 2011 22:46:14 +0100 Subject: [Biojava-l] translation and leucine In-Reply-To: <4DB09F09.4060303@gmail.com> References: <4DB09F09.4060303@gmail.com> Message-ID: <44E003C7-40C8-4595-BCAC-1613C296FFEC@ebi.ac.uk> Hi Francois, The engine by default will always return the first amino acid as an init met if the amino acid could have been a start codon (a bit stupid at the moment, but it will be altered). If you do not want this behaviour then you'll have to create your own. You can do this using the following:

    TranscriptionEngine e = new TranscriptionEngine.Builder().initMet(false).build();

HTH, Andy On 21 Apr 2011, at 22:18, François Le Fèvre wrote: > Dear all, > i have a quick question about translation with biojava 3 > > I would like to retrieve that the codon CTA is coding for a leucine. > But some leucine codons code also for a start in Universal Genetic code > > Here I have build a very short example: > given a short dna sequence coomposed of a start codon and 6 leucine codons > > TranscriptionEngine e = TranscriptionEngine.getDefault(); > DNASequence dd = new DNASequence("ATGTTGTTACTTCTCCTACTG"); > //return MLLLLLL : OK > > DNASequence dd = new DNASequence("CTATTGTTACTTCTCCTACTG"); > //return MLLLLLL : KO I would prefer have LLLLLLL ! > > DNASequence dd = new DNASequence("CTA"); > //return M : KO I would prefer have L ! > > Could someone explain me this feature ? > How the default transcritionEngine works? > > How can I ask to the TranscriptionEngine give me the aminoacids corresponding to CTA when it is not in first position? > > Thanks a lot for your help !
> > Francois > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From flf.mib at gmail.com Sat Apr 23 02:12:53 2011 From: flf.mib at gmail.com (François Le Fèvre) Date: Sat, 23 Apr 2011 08:12:53 +0200 Subject: [Biojava-l] translation and leucine In-Reply-To: <44E003C7-40C8-4595-BCAC-1613C296FFEC@ebi.ac.uk> References: <4DB09F09.4060303@gmail.com> <44E003C7-40C8-4595-BCAC-1613C296FFEC@ebi.ac.uk> Message-ID: <4DB26DE5.8000805@gmail.com> Andy ok, perfect I did not know it! Thanks Francois > Hi Francois, > > The engine by default will always return the first amino acid as an init met if the amino acid could have been a start codon (but stupid atmo but it will be altered). If you do not want this behaviour then you'll have to create your own. You can do this using the following: > > TranscriptionEngine e = new TranscriptionEngine.Builder().initMet(false).build(); > > HTH, > > Andy > > On 21 Apr 2011, at 22:18, François Le Fèvre wrote: > >> Dear all, >> i have a quick question about translation with biojava 3 >> >> I would like to retrieve that the codon CTA is coding for a leucine. >> But some leucine codons code also for a start in Universal Genetic code >> >> Here I have build a very short example: >> given a short dna sequence coomposed of a start codon and 6 leucine codons >> >> TranscriptionEngine e = TranscriptionEngine.getDefault(); >> DNASequence dd = new DNASequence("ATGTTGTTACTTCTCCTACTG"); >> //return MLLLLLL : OK >> >> DNASequence dd = new DNASequence("CTATTGTTACTTCTCCTACTG"); >> //return MLLLLLL : KO I would prefer have LLLLLLL ! >> >> DNASequence dd = new DNASequence("CTA"); >> //return M : KO I would prefer have L !
>> >> Could someone explain me this feature ? >> How the default transcritionEngine works? >> >> How can I ask to the TranscriptionEngine give me the aminoacids corresponding to CTA when it is not in first position? >> >> Thanks a lot for your help ! >> >> Francois >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l From rmb32 at cornell.edu Mon Apr 25 17:42:48 2011 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 25 Apr 2011 14:42:48 -0700 Subject: [Biojava-l] Announcing OBF Google Summer of Code Accepted Students Message-ID: <4DB5EAD8.1020905@cornell.edu> Hello all, I'm very pleased and excited to announce that the Open Bioinformatics Foundation has selected 6 very capable students to work on OBF projects this summer as part of the Google Summer of Code program. The accepted students, their projects, and their mentors (in alphabetical order): Justinas Vygintas Daugmaudis Michele dos Santos da Silva (2 students!) Mocapy++Biopython: from data to probabilistic models of biomolecules mentored by Thomas Hamelryck and Eric Talevich Chuan Hock Koh BioJava - Amino acids physico-chemical properties calculation mentored by Peter Troshin, Andreas Prlic, and Jay Vyas Michał Koziarski Representing bio-objects and related information with images (BioRuby) mentored by Raoul J.P. Bonnal and Francesco Strozzi Sheena Scroggins Major BioPerl Reorganization mentored by Robert Buels and Chris Fields Mikael Eric Trellet Interface analysis module for BioPython mentored by João Rodrigues and Eric Talevich Once again this year, we received many great applications and ideas. However, funding and mentor resources are limited, and we were not able to accept as many as we would have liked.
Our deepest thanks to all the students who applied: we sincerely appreciate the time and effort you put into your applications, and hope you will still consider being a part of the OBF's open source projects, even without Google funding. I speak for myself and all of the mentors who read and scored applications when I say that we were truly honored by the number and quality of the applications we received. For the accepted students: congratulations! You have risen to the top of a very competitive application process. Now it's time to "put your money where your mouth is", as the saying goes. Let's get out there and write some great code this summer! Best regards, Rob ---- Robert Buels OBF GSoC 2011 Administrator From pavlo.lutsik at googlemail.com Tue Apr 26 07:07:42 2011 From: pavlo.lutsik at googlemail.com (Pavlo Lutsik) Date: Tue, 26 Apr 2011 13:07:42 +0200 Subject: [Biojava-l] NCBIQBlastService Message-ID: Hi, I cannot get the subject class working in the Cookbook example. Even a simple rbw.printRemoteBlastInfo() call throws java.lang.Exception: Impossible to get info from QBlast service at this time. Check your network connection. Any ideas? Best, Pavlo Lutsik From p.v.troshin at dundee.ac.uk Wed Apr 27 09:49:54 2011 From: p.v.troshin at dundee.ac.uk (Peter Troshin) Date: Wed, 27 Apr 2011 14:49:54 +0100 Subject: [Biojava-l] amino acid physico-chemical properties calculation project In-Reply-To: References: Message-ID: <4DB81F02.2020504@dundee.ac.uk> Dear students, I would like to thank all of you for your interest in and enthusiasm for the amino acid physico-chemical properties calculation project for Google Summer of Code 2011. Unfortunately, only one student could be chosen, and I am sorry if it was not you. This idea generated enormous interest: we received a total of 18 applications for it, and many of them were of good or very good quality.
I worked with many of you during the application process, and was very impressed by the level of enthusiasm, energy, and capability I saw in the applications and our conversations on the mailing list. It wasn't easy to choose the best applicant, but we had to do it. A few general comments on the solutions for the short coding exercise. 1) Make sure you understand the task. About half of the applications failed to stick to the specification. 2) Only a few people used threads correctly. Among them only one person used the java.util.concurrent package, which I expected everyone to use. I would recommend reading B. Goetz, "Java Concurrency in Practice", if you want to learn more about multithreading in Java. 3) Do not overcomplicate the solution; a good programmer does just what was asked. 4) You can determine whether your solution worked correctly by processing the following input: aaabbbccc bbcccalkfw aaaabb aaabbbcc caaabbbccc aaabbbccc aaaaabbbbb abbbbbccdddd sjdhfjksdhfk weiuriweiru ddddrepeatrepeat repeatrepeat and comparing it to the output that should have been produced: aaab a c aaaa sjdhfjksdhfk dddd Thanks to everyone again, and I wish you all the best of luck with whatever endeavour you take on. Regards, Peter On 26/04/2011 07:33, Alexandru Paiu wrote: > Hi Peter . > > I want to know what should I improve in the next gsoc . I wan't to > know what wasn't right for my application . > > In my opinion I think that I did a good job with the solutions for > instability index and isoelectric point which are the hardest methods > . I even used the jay vyas suggestion with those hashtables .
> > Please , give me an advice > > Best regards Paiu Alexandru From p.v.troshin at dundee.ac.uk Wed Apr 27 10:05:42 2011 From: p.v.troshin at dundee.ac.uk (Peter Troshin) Date: Wed, 27 Apr 2011 15:05:42 +0100 Subject: [Biojava-l] Please give me an advice In-Reply-To: References: Message-ID: <4DB822B6.9030003@dundee.ac.uk> Dear Alex, There was nothing particularly wrong with your project plan; it just was not as good as the winning proposal. Your solution for the coding exercise did not run out of the box (java -jar runme.jar), was not correct and could have been better engineered. I'd recommend that you contribute to an open source project of your choice before applying to GSoC next year. This way you will be in a much better position for the next GSoC. Regards, Peter On 26/04/2011 07:33, Alexandru Paiu wrote: > Hi Peter . > > I want to know what should I improve in the next gsoc . I wan't to > know what wasn't right for my application . > > In my opinion I think that I did a good job with the solutions for > instability index and isoelectric point which are the hardest methods > . I even used the jay vyas suggestion with those hashtables . > > Please , give me an advice > > Best regards > Paiu Alexandru From flf.mib at gmail.com Wed Apr 27 15:54:28 2011 From: flf.mib at gmail.com (François Le Fèvre) Date: Wed, 27 Apr 2011 21:54:28 +0200 Subject: [Biojava-l] biojava3 and blast xml parser Message-ID: <4DB87474.1080307@gmail.com> Dear all, I have just one difficulty: is there still a BLAST XML parser in the BioJava 3.0-SNAPSHOT? I am not able to find it... I have found http://www.biojava.org/docs/api1.8/org/biojava/bio/program/sax/blastxml/BlastXMLParser.html with a good tutorial here http://biojava.org/wiki/BioJava:CookBook:Blast:Echo but it seems to be part of BioJava 1.8. Can you confirm?
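Scooter Willis's reply to this question describes the approach he uses in place of a dedicated parser: load the BLAST XML into a DOM and query it with XPath. A minimal sketch of that idea using only the JDK's `javax.xml` APIs follows. The element names (`Hit`, `Hit_id`, `Hit_hsps`, `Hsp`, `Hsp_evalue`) follow NCBI's BLAST XML output format; the class and method names are hypothetical, and this is not the `org.biojava3.genome.BlastXMLQuery` code he mentions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of BioJava.
class BlastXPathSketch {
    /** Returns the Hit_id of every hit whose first HSP has an e-value <= maxEvalue. */
    static List<String> hitIdsBelow(String blastXml, double maxEvalue) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Real BLAST XML starts with a DOCTYPE naming an NCBI DTD; tell the
            // JDK's built-in parser not to download that external DTD.
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(blastXml.getBytes(StandardCharsets.UTF_8)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate("//Hit", doc, XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                Node hit = hits.item(i);
                // Relative expressions are evaluated against the current <Hit> node.
                String evalueText = xpath.evaluate("Hit_hsps/Hsp[1]/Hsp_evalue", hit).trim();
                if (Double.parseDouble(evalueText) <= maxEvalue) {
                    ids.add(xpath.evaluate("Hit_id", hit).trim());
                }
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not query BLAST XML", e);
        }
    }
}
```

For a report on disk, the same queries work after parsing the file with `newDocumentBuilder().parse(new File(path))`; other fields such as `Hit_def` or `Hsp_bit-score` can be read the same way.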
Thanks Francois From willishf at ufl.edu Wed Apr 27 17:30:52 2011 From: willishf at ufl.edu (Scooter Willis) Date: Wed, 27 Apr 2011 17:30:52 -0400 Subject: [Biojava-l] biojava3 and blast xml parser In-Reply-To: <4DB87474.1080307@gmail.com> References: <4DB87474.1080307@gmail.com> Message-ID: We did not migrate the blast xml parser from 1.8 to 3.0. Typically I will load the XML as a DOM object and use xpath to query the desired results. I have some code in the biojava3-genome module (org.biojava3.genome.BlastXMLQuery) that could possibly provide a solution for you depending on what you are trying to accomplish. Let me know if you have specific requirements that could help motivate formal blast XML->Java objects support in Biojava3. Thanks Scooter 2011/4/27 François Le Fèvre : > Dear all, > > I have just a difficulty: is there still a blast xml parser in the biojava > 3.0-SNAPSHOT > I am not able to find it... > > I have found > http://www.biojava.org/docs/api1.8/org/biojava/bio/program/sax/blastxml/BlastXMLParser.html > with a good tutorial here > http://biojava.org/wiki/BioJava:CookBook:Blast:Echo > > but it seems to be part of biojava 1.8 > > can you confirm me? > > Thanks > > Francois > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > From gwaldon at geneinfinity.org Sat Apr 2 00:11:02 2011 From: gwaldon at geneinfinity.org (George Waldon) Date: Fri, 01 Apr 2011 19:11:02 -0500 Subject: [Biojava-l] Expasy pI calculation algorythm Message-ID: <20110401191102.44902rooroy3x9k4@gator1273.hostgator.com> Hello, Sorry if this comes a bit late; we had to solve some email issues. Thanks again to Andreas for doing it. This is part of the email exchange I had with Christine Hoogland and Gregoire Rossier a few years ago regarding the algorithm used by "Compute pI/Mw" on the ExPASy server.
The code which was given to me is included at the end of this email; I used it to update bj1. Good luck to all GSoC candidates, George On Tue, May 22, 2007 at 9:26 AM, Christine Hoogland via RT wrote: Dear George, Please find enclosed the algorithm we are using on ExPASy. I hope this helps. Best regards Christine > > The pK values used for "Compute pI/Mw" can be found in > > # Bjellqvist, B.,Hughes, G.J., Pasquali, Ch., Paquet, N., Ravier, F., > Sanchez, J.-Ch., Frutiger, S. & Hochstrasser, D.F. The focusing > positions of polypeptides in immobilized pH gradients can be predicted > from their amino acid sequences. Electrophoresis 1993, 14, 1023-1031. > > MEDLINE: 8125050 > > # Bjellqvist, B., Basse, B., Olsen, E. and Celis, J.E. Reference > points > for comparisons of two-dimensional maps of proteins from different > human > cell types defined in a pH scale where isoelectric points correlate > with > polypeptide compositions. Electrophoresis 1994, 15, 529-539. > > MEDLINE: 8055880 > > The pK were defined by examining polypeptide migration between pH 4.5 > to > 7.3 in an immobilised pH gradient gel environment with 9.2M and 9.8M > urea at 15°C or 25°C. Prediction of protein pI for highly basic > proteins > is yet to be studied and it is possible that current Compute pI/Mw > predictions may not be adequate for this purpose. > > I hope this helps. > > > Best regards > Gregoire Rossier > > -------------------------------------------------------- Christine Hoogland Swiss Institute of Bioinformatics CMU - 1, rue Michel Servet Tel. (+41 22) 379 58 28 CH - 1211 Geneva 4 Switzerland Fax (+41 22) 379 58 58 Christine.Hoogland at isb-sib.ch http://www.expasy.org/ --------------------------------------------------------
// VERSION : 1.6
// DATE    : 1/25/95
// Copyright 1993 by Swiss Institute of Bioinformatics. All rights reserved.
//
// Table of pk values :
// Note: the current algorithm does not use the last two columns. Each
// row corresponds to an amino acid starting with Ala. J, O and U are
// inexistant, but here only in order to have the complete alphabet.
//
//   Ct    Nt    Sm     Sc     Sn
//
static double cPk[26][5] = {
    3.55, 7.59, 0.   , 0.   , 0.   ,   // A
    3.55, 7.50, 0.   , 0.   , 0.   ,   // B
    3.55, 7.50, 9.00 , 9.00 , 9.00 ,   // C
    4.55, 7.50, 4.05 , 4.05 , 4.05 ,   // D
    4.75, 7.70, 4.45 , 4.45 , 4.45 ,   // E
    3.55, 7.50, 0.   , 0.   , 0.   ,   // F
    3.55, 7.50, 0.   , 0.   , 0.   ,   // G
    3.55, 7.50, 5.98 , 5.98 , 5.98 ,   // H
    3.55, 7.50, 0.   , 0.   , 0.   ,   // I
    0.00, 0.00, 0.   , 0.   , 0.   ,   // J
    3.55, 7.50, 10.00, 10.00, 10.00,   // K
    3.55, 7.50, 0.   , 0.   , 0.   ,   // L
    3.55, 7.00, 0.   , 0.   , 0.   ,   // M
    3.55, 7.50, 0.   , 0.   , 0.   ,   // N
    0.00, 0.00, 0.   , 0.   , 0.   ,   // O
    3.55, 8.36, 0.   , 0.   , 0.   ,   // P
    3.55, 7.50, 0.   , 0.   , 0.   ,   // Q
    3.55, 7.50, 12.0 , 12.0 , 12.0 ,   // R
    3.55, 6.93, 0.   , 0.   , 0.   ,   // S
    3.55, 6.82, 0.   , 0.   , 0.   ,   // T
    0.00, 0.00, 0.   , 0.   , 0.   ,   // U
    3.55, 7.44, 0.   , 0.   , 0.   ,   // V
    3.55, 7.50, 0.   , 0.   , 0.   ,   // W
    3.55, 7.50, 0.   , 0.   , 0.   ,   // X
    3.55, 7.50, 10.00, 10.00, 10.00,   // Y
    3.55, 7.50, 0.   , 0.   , 0.   }; // Z

#define PH_MIN 0         /* minimum pH value */
#define PH_MAX 14        /* maximum pH value */
#define MAXLOOP 2000     /* maximum number of iterations */
#define EPSI 0.0001      /* desired precision */

//
// Compute the amino-acid composition.
//
for (i = 0; i < sequenceLength; i++)
    comp[sequence[i] - 'A']++;

//
// Look up N-terminal and C-terminal residue.
//
nTermResidue = sequence[0] - 'A';
cTermResidue = sequence[sequenceLength - 1] - 'A';

phMin = PH_MIN;
phMax = PH_MAX;

for (i = 0, charge = 1.0; i < MAXLOOP && (phMax - phMin) > EPSI; i++)
{
    phMid = phMin + (phMax - phMin) / 2;

    cter = exp10(-cPk[cTermResidue][0]) / (exp10(-cPk[cTermResidue][0]) + exp10(-phMid));
    nter = exp10(-phMid) / (exp10(-cPk[nTermResidue][1]) + exp10(-phMid));

    carg = comp[R] * exp10(-phMid) / (exp10(-cPk[R][2]) + exp10(-phMid));
    chis = comp[H] * exp10(-phMid) / (exp10(-cPk[H][2]) + exp10(-phMid));
    clys = comp[K] * exp10(-phMid) / (exp10(-cPk[K][2]) + exp10(-phMid));

    casp = comp[D] * exp10(-cPk[D][2]) / (exp10(-cPk[D][2]) + exp10(-phMid));
    cglu = comp[E] * exp10(-cPk[E][2]) / (exp10(-cPk[E][2]) + exp10(-phMid));

    ccys = comp[C] * exp10(-cPk[C][2]) / (exp10(-cPk[C][2]) + exp10(-phMid));
    ctyr = comp[Y] * exp10(-cPk[Y][2]) / (exp10(-cPk[Y][2]) + exp10(-phMid));

    charge = carg + clys + chis + nter - (casp + cglu + ctyr + ccys + cter);

    if (charge > 0.0)
        phMin = phMid;
    else
        phMax = phMid;
}
}
From frieman6 at gmail.com Mon Apr 4 12:59:50 2011 From: frieman6 at gmail.com (omer f) Date: Mon, 4 Apr 2011 15:59:50 +0300 Subject: [Biojava-l] Fwd: Google Summer of Coding: Java project In-Reply-To: References: Message-ID: To all it may concern, Dear BioJava, My name Omer Frieman and i am a student of Computer Science (B.Sc) and International Politics at the Hebrew university in Jerusalem, Israel. Last week, I attended a conference about Google SOC, Which i find most interesting and something that i would like to experience. I am in the second year of my degree, and so far i have learned the program languages: Java, C, C++ and in addition Object Oriented Programming in Java. Though, i have no working experience in programming. While searching the list of SOC projects i came across the Bioinformatics Foundation Projects.
I found the Java Project very interesting, First, for the content - the research subject of Bio-informatics, and Second, I was looking specifically for a Java project for gaining experience in it, Before sending my application with the short coding exercise, i send you this mail to Introduce myself. My only concern is, will My lack of experience could hurt my chances to be accepted to the project. I would appreciate your opinion in that matter. Furthermore, i was wondering how do i connect the Project mentor (Peter Troshin). Sincerely, Omer. From p.v.troshin at dundee.ac.uk Mon Apr 4 14:33:38 2011 From: p.v.troshin at dundee.ac.uk (Peter Troshin) Date: Mon, 04 Apr 2011 15:33:38 +0100 Subject: [Biojava-l] Fwd: Google Summer of Coding: Java project In-Reply-To: References: Message-ID: <4D99D6C2.9060107@dundee.ac.uk> Hi Omer, I am on the BioJava list so I got your email. > >> I am in the second year of my degree, and so far i have learned > >> the program languages: Java, C, C++ and in addition Object > >> Oriented Programming in Java. Sounds good! > >> Though, i have no working experience in programming. Thank you for being honest (:-)) > >> My only concern is, will My lack of experience could hurt my > >> chances to be accepted to the project. It may but it may not. Some people pick up things pretty quickly - are you one of them? Seriously, have a look at the methods that you are about to implement, search the web for more information about them and see if you have a good idea how to do that. You do not have to know anything about the BioJava at this stage, you can learn about it later. For this project you also need to know the basics of Chemistry and Molecular Biology, nothing too complicated, but still without this knowledge the learning curve for this project is going to be pretty steep. I hope I answered your question. 
Regards, Peter On 04/04/2011 13:59, omer f wrote: > To all it may concern, > > Dear BioJava, > > My name Omer Frieman and i am a student of Computer Science (B.Sc) > and International Politics at the Hebrew university in Jerusalem, > Israel. Last week, I attended a conference about Google SOC, Which i > find most interesting and something that i would like to experience. > I am in the second year of my degree, and so far i have learned the > program languages: Java, C, C++ and in addition Object Oriented > Programming in Java. Though, i have no working experience in > programming. > > While searching the list of SOC projects i came across the > Bioinformatics Foundation Projects. I found the Java Project very > interesting, First, for the content - the research subject of > Bio-informatics, and Second, I was looking specifically for a Java > project for gaining experience in it, > > Before sending my application with the short coding exercise, i send > you this mail to Introduce myself. My only concern is, will My lack > of experience could hurt my chances to be accepted to the project. I > would appreciate your opinion in that matter. > > Furthermore, i was wondering how do i connect the Project mentor > (Peter Troshin). > > Sincerely, Omer. _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From x4roth at gmail.com Mon Apr 4 15:34:13 2011 From: x4roth at gmail.com (Xaroth) Date: Mon, 4 Apr 2011 11:34:13 -0400 Subject: [Biojava-l] GSoC Intent - Amino Acids Physico-Chemical Properties Calculation Message-ID: Hello everyone and project mentor Peter Troshin, My name is Justin Pugh. I am a 3rd year Computer Science student at the University of Central Florida. I am proficient in C and Java and I prefer to do all of my coding work in Java. 
I've recently learned about the Google Summer of Code program and I've been spending the past week or so looking over the available open-source projects. I am excited about participating in the GSoC because I have been anxious to be a part of a larger project than the small-scope programs that I have to write for classes. Sadly, I do not have any experience doing any programming projects on this scale but I am very interested in learning. The largest project I have completed was to write a compiler and virtual machine for PL/0. I'm currently working on writing a game from scratch as a vehicle for implementing some AI techniques and I am learning a lot about making use of others' code provided in libraries and integrating it into my own through the use of jMonkeyEngine (an open source java-based 3d game engine). I want to take my skills to the next level with GSoC. Out of all the projects I have looked at, BioJava has stood out the most for me (in particular, the Amino Acids Physico-Chemical Properties Calculation idea). I wanted to work on something written entirely in Java so I wouldn't have to spend any time learning a new language and I would be able to focus more on the actual project. I feel like I am able to wrap my head around the Amino Acids Physico-Chemical Properties Calculation proposal and I think it falls within my ability to get it done and even more - to have fun with it and be able to spend time refining it. I do not know anything about the calculations which must be implemented, but I am of course willing to learn them. I have had only a little instruction in the area of multi-threaded Java programming but I hope this will not be a severe handicap for me. I plan on spending the next couple days doing some research and developing my proposal. I just wanted to announce my intent to submit an application and to introduce myself to the group.
Please let me know your opinions regarding my ability to complete this project - I am open to criticism and pointers! :) Thank you for your time, -Justin Pugh From p.v.troshin at dundee.ac.uk Tue Apr 5 10:23:10 2011 From: p.v.troshin at dundee.ac.uk (Peter Troshin) Date: Tue, 05 Apr 2011 11:23:10 +0100 Subject: [Biojava-l] GSoC Intent - Amino Acids Physico-Chemical Properties Calculation In-Reply-To: References: Message-ID: <4D9AED8E.3070309@dundee.ac.uk> Hi Justin, Welcome to the BioJava mailing list and thank you for your interest in the project. It sounds like you have plenty of experience in programming, which is definitely a plus. Do not worry about your lack of experience with multi-threaded programs; in the end there must be something for you to learn too! I hope that this project can give you what you are looking for: the experience of implementing algorithms in pure Java and integrating them with the relatively large BioJava code base. However, do not underestimate the difficulty of the algorithms for calculating the physico-chemical properties. Although the algorithms may not be particularly complex, in order to understand them you would need some knowledge of chemistry, molecular biology and maths. Programming skills alone will not be sufficient for this project. Good luck with your application. Regards, Peter On 04/04/2011 16:34, Xaroth wrote: > Hello everyone and project mentor Peter Troshin, > > My name is Justin Pugh. I am a 3rd year Computer Science student at the > University of Central Florida. I am proficient in C and Java and I prefer to > do all of my coding work in Java. I've recently learned about the Google > Summer of Code program and I've been spending the past week or so looking > over the available open-source projects. I am excited about participating in > the GSoC because I have been anxious to be a part of a larger project than > the small-scope programs that I have to write for classes.
Sadly, I do not > have any experience doing any programming projects on this scale but I am > very interested in learning. The largest project I have completed was to > write a compiler and virtual machine for PL/0. I'm currently working on > writing a game from scratch as a vehicle for implementing some AI techniques > and I am learning a lot about making use of others' code provided in > libraries and integrating it into my own through the use of jMonkeyEngine > (an open source java-based 3d game engine). I want to take my skills to the > next level with GSoC. > > Out of all the projects I have looked at, BioJava has stood out the most for > me (in particular, the Amino Acids Physico-Chemical Properties Calculation > idea). I wanted to work on something written entirely in Java so I wouldnt > have to spend any time learning a new language and I would be able to focus > more on the actual project. I feel like I am able to wrap my head around the > Amino Acids Physico-Chemical Properties Calculation proposal and I think it > falls within my ability to get it done and even more - to have fun with it > and be able to spend time refining it. I do not know anything about the > calculations which must be implemented, but I am of course willing to learn > them. I have only had small instruction in the area of multi-threaded Java > programming but I hope this will not be a severe handicap for me. > > I plan on spending the next couple days doing some research and developing > my proposal. I just wanted to announce my intent to submit an application > and to introduce myself to the group. Please let me know your opinions > regarding my ability to complete this project - I am open to criticism and > pointers! 
:) > > Thank you for your time, > -Justin Pugh > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From p.v.troshin at dundee.ac.uk Tue Apr 5 16:00:32 2011 From: p.v.troshin at dundee.ac.uk (Peter Troshin) Date: Tue, 05 Apr 2011 17:00:32 +0100 Subject: [Biojava-l] Final Application (Paiu Alexandru ) (added project plan) In-Reply-To: References: Message-ID: <4D9B3CA0.6000707@dundee.ac.uk> Hi Alex, I had a look at your plan and I'd suggest that you add a few more details to it. Right now I'd say it does not look sufficiently detailed. What are the deliverables of your project? What steps will you take at every stage of the project? How will you make sure that your implementation gives the correct results? Are you going to use BioJava? If so, how? Also, I would not bother with any implementation details as yet. Finally, if you have worked with any version control systems, it would help to state this. I hope this will help you to improve your proposal. Regards, Peter On 28/03/2011 17:17, Alexandru Paiu wrote: > 1. Your complete contact information, including full name, > physical address, preferred email address, and telephone number, plus other > pertinent contact information such as IRC handles, etc. > > > > Full Name : Paiu Alexandru > > Address : Country Romania , city Constanta , Bld. Aurel Vlaicu , Nr. 41 , > Bl. Pc1 sc. B , Et 6 , Apt. 46 > > E-mail : paiualex12 at google.com or paiualex12 at yahoo.com > > Telephone number : 40733924684 > > > > 2. Why you are interested in the project you are proposing and > are well-suited to undertake it. > > > > This project suits me perfectly , because the interested students should > have a general knowledge of core Java programming, knowledge of > multi-threaded programming . I've started learning Java for 1 and a half > years , and I used a lot of Threads in applications and projects .
> > This is the only project that I apply , because I haven't found a more > interesting project than this one . > > > > 3. A summary of your programming experience and skills. > > > > I've did a lot of miniproject and applications for school and for me . I've > made projects like : > > a) Lanchat Client-Server using TCP/IP: I wrote two applications : one > for the client and one for the server . I used an JApplet for the client > with Swing elements . I've used Threads especially in the server sider > application , and sockets > > b) Lanchat Peer-to-Peer using UDP and multicasting . I wrote only a > application for the client . I used Threads and multicast sockets . > > c) A project for administrating a database , using a JApplet with > Connector/J and MySql . It has to applications , one for clients and for the > administrator . > > > 4. Programs or projects you have previously authored or > contributed to available as open-source, including, if applicable, any past > Summer of Code involvement. > > > > I haven't worked yet for any open-source and a I haven't any past experience > with GSoc , and it's the first time a apply for a open-source project . I > haven't either worked for a company . > > > > 5. A project plan for the project you are proposing, even if > your proposed project is directly based on one of the proposed project ideas > for member projects. > > > > I wish to apply to the project called Amino Acids physic-chemical > properties calculation . > > I've been thinking since some time of a possible implementation and I > stopped at a single one (that I think it's the best) . > > > > I will use two main classes . One that will represent an atom of a substance > ( for example He , H , O , etc ) , that will have params like : atom weight > , name , abbreviation , valence . I'll use the second class for > constructing amino-acids from this class . So , the second class will extend > the class of atoms .
So for example I have to initiate a molecule of H2O > (water) . I will have a constructor with a string param , that will build > the substance . For example , let's say that the second class it's called > Aminoacids , and the first one Atoms . > > Let's say I choose from a Combo box H2O ( it's only a example) . Then I send > the string "H2O1" to the aminoacids class , to initiate an object of > aminoacid . That constructor will be evaluated char by char . If it's found > a char or two chars that means that I have to initiate an atom of that char > or chars . If it's found a number , then that means that it's the multiplier > of that atom before it . > > So the class aminoacids will have a private Object [] array , in which will > be number and objects called atoms . > > So for H2O the array will look like this : array[0] = atom of H (Hydrogen) , > array[1]=2 , array[2]=O (Oxygen) , array[3]=1 . > > All the known substances will be in a file called atoms.txt with atom mass , > name , abbreviation etc . The atoms class will have a method to add new > atoms to the list . > > > > And for calculating the molecular weight the algorithm is very simple . We > already have array={H,2,O,1} , and the atoms will have as params the atoms > weight so all we have to do is just : > > Mol. Weight=H.weight*2+O.weight*1 > > > > The plan for implementing : > > > > - May 20-June 20: implementing the two classes and the first two > methods > > - June 20 - 20 July: Implementing the rest of the methods > > - 20 July - until the final: final retouching , documentation for > end users , and 1 method proposed by me > > - > > 6. Any obligations, vacations, or plans for the summer that may > require scheduling during the GSoC work period. > > > > I will have School final exams during 20 May - 20 June . So I won't be able > to work at maximum capacity . That's all . I > > > 7.
PS > > I hope you've got my short coding exercise program ( I received a kinda > error for sending a mail will atachement) > > thanks > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From p.v.troshin at dundee.ac.uk Tue Apr 5 16:44:08 2011 From: p.v.troshin at dundee.ac.uk (Peter Troshin) Date: Tue, 05 Apr 2011 17:44:08 +0100 Subject: [Biojava-l] attention to GSoC Amino Acids Physico-Chemical Properties prospective students In-Reply-To: References: Message-ID: <4D9B46D8.6070906@dundee.ac.uk> Hello prospective GSoC students, I have seen a few project plans so far and none of them seems to go to the trouble of providing the formulas for the calculations. Yet it would definitely strengthen your proposal if you could demonstrate that you are capable of extracting this information from the web and/or scientific papers. That said, you do not have to include everything in your proposal, but at least show that you know where to find what you need. There is a balance to strike here, and this is something that is going to set you apart. I'd like to stress that the ideal student for this project should not only be a good Java programmer but also have a keen interest in Bioinformatics! So please feel free to extend the original idea as you see fit. Good luck with your applications, Peter P.S. The project plan is of course yours, and you should decide whether to include anything from the above or leave it out; this is only advice.
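Peter's point about stating formulas can be made concrete with the isoelectric point. In the ExPASy routine George Waldon posted to this list, the net charge at a given pH is the sum of n·10^-pH/(10^-pK + 10^-pH) over the N-terminus and the R, H and K side chains, minus the sum of n·10^-pK/(10^-pK + 10^-pH) over the C-terminus and the D, E, C and Y side chains; the pI is the pH where that charge crosses zero, found by bisection between pH 0 and 14. The following is a minimal Java port under stated assumptions: the pK values are transcribed from the quoted cPk table (only the Ct, Nt and Sm columns, which are all the C loop reads), the class and method names are hypothetical, the input is a non-empty one-letter sequence of standard residues, and it returns the last bisection midpoint because the C fragment does not show what it returns.

```java
import java.util.Map;

// Hypothetical Java port of the ExPASy bisection quoted in this archive.
class ExpasyPiSketch {
    // Search interval, iteration cap and precision, as in the #defines of the C code.
    private static final double PH_MIN = 0.0, PH_MAX = 14.0, EPSI = 0.0001;
    private static final int MAXLOOP = 2000;

    // pK values transcribed from the cPk table (Ct, Nt and Sm columns only).
    private static final double DEFAULT_CTERM_PK = 3.55, DEFAULT_NTERM_PK = 7.50;
    private static final Map<Character, Double> CTERM_PK = Map.of('D', 4.55, 'E', 4.75);
    private static final Map<Character, Double> NTERM_PK = Map.of(
        'A', 7.59, 'E', 7.70, 'M', 7.00, 'P', 8.36, 'S', 6.93, 'T', 6.82, 'V', 7.44);
    private static final Map<Character, Double> POSITIVE_PK = Map.of('R', 12.0, 'H', 5.98, 'K', 10.0);
    private static final Map<Character, Double> NEGATIVE_PK = Map.of('D', 4.05, 'E', 4.45, 'C', 9.0, 'Y', 10.0);

    /** Net charge of a non-empty, upper-case one-letter sequence at the given pH. */
    static double netCharge(String seq, double ph) {
        double h = Math.pow(10, -ph);
        double kN = Math.pow(10, -NTERM_PK.getOrDefault(seq.charAt(0), DEFAULT_NTERM_PK));
        double kC = Math.pow(10, -CTERM_PK.getOrDefault(seq.charAt(seq.length() - 1), DEFAULT_CTERM_PK));
        double charge = h / (kN + h) - kC / (kC + h); // nter - cter
        for (char aa : seq.toCharArray()) {
            Double pos = POSITIVE_PK.get(aa);
            Double neg = NEGATIVE_PK.get(aa);
            if (pos != null) charge += h / (Math.pow(10, -pos) + h);                   // R, H, K
            if (neg != null) charge -= Math.pow(10, -neg) / (Math.pow(10, -neg) + h); // D, E, C, Y
        }
        return charge;
    }

    /** Bisection: a positive charge at the midpoint means the pI lies above it. */
    static double isoelectricPoint(String sequence) {
        String seq = sequence.toUpperCase();
        double phMin = PH_MIN, phMax = PH_MAX, phMid = PH_MIN;
        for (int i = 0; i < MAXLOOP && (phMax - phMin) > EPSI; i++) {
            phMid = phMin + (phMax - phMin) / 2;
            if (netCharge(seq, phMid) > 0.0) phMin = phMid; else phMax = phMid;
        }
        return phMid;
    }
}
```

Keeping `netCharge` separate from the search makes it easy to check a result: the charge evaluated at the returned pI should be close to zero.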
Dr Peter Troshin Bioinformatics Software Developer Phone: +44 (0)1382 388589 Fax: +44 (0)1382 385764 The Barton Group College of Life Sciences Medical Sciences Institute University of Dundee Dundee DD1 5EH UK From andreas at sdsc.edu Wed Apr 6 04:33:47 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Tue, 5 Apr 2011 21:33:47 -0700 Subject: [Biojava-l] last days before submitting for GSoC Message-ID: Hi, just a quick reminder that the deadline to submit proposals for GSoC is approaching rapidly (Friday). In order to apply correctly you need to submit your applications at http://www.google-melange.com/ Don't forget to discuss your projects with the potential mentors or this list. A couple more links that might be useful: how to write a proposal http://www.booki.cc/gsocstudentguide/_v/1.0/writing-a-proposal/ Here are two proposals that got funded last year: http://biojava.org/wiki/GSoC:MSA http://biojava.org/wiki/GSoC:PTM Andreas From madhatterkkt at gmail.com Wed Apr 6 04:37:47 2011 From: madhatterkkt at gmail.com (K T) Date: Tue, 5 Apr 2011 21:37:47 -0700 Subject: [Biojava-l] Retrieving data from Genbank online Message-ID: I'm new to BioJava3 and recently downloaded 3.0.1. I'm looking for a way to retrieve entries from NCBI online (not from a flat file) using a RefSeq accession ID. I was able to do this with BioPerl, and I've also seen that there was a way to do this under BioJava 1.x. Can someone point me to a code snippet where this is done in BioJava3? I tried searching online, but most of my search hits either point to BioJava 1.x or refer to retrieving entries from a flat file. I'm trying to get them from NCBI's online database directly. Thanks in advance.
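NCBI's E-utilities `efetch` service is one library-independent way to pull a GenBank record straight from NCBI, whatever BioJava3 itself offers. It returns the flat-file text over HTTP. The sketch below is not a BioJava3 API, and it rests on several assumptions. It uses the `nuccore` database (a protein accession would need `db=protein` instead). `NM_000546` is only an example accession. Parsing the returned text is left to whichever GenBank reader you use. NCBI's E-utilities usage guidelines, such as request-rate limits, apply to real use.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Sketch: fetch one GenBank flat file from NCBI E-utilities efetch (hypothetical helper). */
final class GenBankFetch {
    private static final String EFETCH =
            "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi";

    /** Builds the efetch URL; accessions are restricted to letters, digits, '_' and '.'. */
    static String efetchUrl(String accession) {
        if (!accession.matches("[A-Za-z0-9_.]+")) {
            throw new IllegalArgumentException("Unexpected accession: " + accession);
        }
        return EFETCH + "?db=nuccore&id=" + accession + "&rettype=gb&retmode=text";
    }

    /** Downloads the record as GenBank-format text (needs network access). */
    static String fetchGenBank(String accession) throws IOException {
        URL url = URI.create(efetchUrl(accession)).toURL();
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String accession = args.length > 0 ? args[0] : "NM_000546"; // example accession only
        System.out.println(fetchGenBank(accession));
    }
}
```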
From alastair.m.kilpatrick at googlemail.com Wed Apr 6 09:07:27 2011 From: alastair.m.kilpatrick at googlemail.com (Alastair Kilpatrick) Date: Wed, 6 Apr 2011 10:07:27 +0100 Subject: [Biojava-l] Multiple sequence alignment in BioJava Message-ID: Dear all, I'm pretty new to BioJava so I may be missing something - I've checked through the list archives without much luck. I've been trying to do some multiple sequence alignments but I've been running into strange errors. To try and find where the problem is, I've just copied the code straight from http://biojava.org/wiki/BioJava:CookBook3:MSA and tried to run it, but I'm still getting an error: Caused by: java.lang.ClassNotFoundException: org.forester.phylogenyinference.DistanceMatrix It would seem that this is something to do with forester.jar, which I had to download separately (from http://code.google.com/p/forester/) as per http://biojava.org/wiki/BioJava:CookBook#biojava3-alignment - when I look through the jar in Eclipse, there isn't anything named 'phylogenyinference', although there is 'phylogeny', 'phylogeny.data', 'phylogeny.factories' and 'phylogeny.iterators'. Is there something I'm doing wrong, or is it the case that either one of BioJava or forester has been updated and things have broken somewhere (I see the CookBook page was updated in July 2010 but forester was updated just last month)? Either way, any ideas would be much appreciated! Many thanks, Alastair Kilpatrick PhD candidate, School of Informatics, University of Edinburgh From alastair.m.kilpatrick at googlemail.com Wed Apr 6 16:40:26 2011 From: alastair.m.kilpatrick at googlemail.com (Alastair Kilpatrick) Date: Wed, 6 Apr 2011 17:40:26 +0100 Subject: [Biojava-l] Multiple sequence alignment in BioJava In-Reply-To: References: Message-ID: Thanks - I had just downloaded the BioJava jars manually so that's fixed the CookBook code. 
However, I've made some changes in order to align DNA sequences instead, and I am running into more errors. This code:

    public static void main(String[] args) {
        String[] seqs = {"GATTACATTT", "CGATTACATG", "ATGGATTACA"};
        List<DNASequence> lst = new ArrayList<DNASequence>();
        for (String seq : seqs) {
            lst.add(new DNASequence(seq));
        }
        Profile<DNASequence, NucleotideCompound> profile = Alignments.getMultipleSequenceAlignment(lst); //**
        System.out.println(profile);
        ConcurrencyTools.shutdown();
    }

gives:

    java.util.concurrent.ExecutionException: java.lang.NullPointerException
        at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
        at java.util.concurrent.FutureTask.get(Unknown Source)
        at org.biojava3.alignment.Alignments.getListFromFutures(Alignments.java:282)
        at org.biojava3.alignment.Alignments.runPairwiseScorers(Alignments.java:602)
        at org.biojava3.alignment.Alignments.getMultipleSequenceAlignment(Alignments.java:173)
        at CookbookMSA.main(CookbookMSA.java:49)

at the 'alignment' line (**). I'm not sure what the problem is here. I see that getMultipleSequenceAlignment() takes extra argument(s) in the Javadoc, but these weren't required in the example? Final question (hopefully!): once I have the alignments I require, I'd like to create a sequence logo. Is there a way of doing this in BioJava3? From a Google search I've seen references to DistributionTools.distOverAlignment() and similar, but I can't find anything like that in the new API. Thanks again, and sorry everyone for all the questions! Alastair On 6 April 2011 12:09, Scooter Willis wrote: > > You need to use the forester.jar file from the biojava3 code checkout. If you are using Maven this should be automatic. > > On Apr 6, 2011 5:07 AM, "Alastair Kilpatrick" wrote: > > Dear all, > I'm pretty new to BioJava so I may be missing something - I've checked > through the list archives without much luck. > I've been trying to do some multiple sequence alignments but I've been > running into strange errors.
To try and find where the problem is, > I've just copied the code straight from > http://biojava.org/wiki/BioJava:CookBook3:MSA and tried to run it, but > I'm still getting an error: > > Caused by: java.lang.ClassNotFoundException: > org.forester.phylogenyinference.DistanceMatrix > > It would seem that this is something to do with forester.jar, which I > had to download separately (from http://code.google.com/p/forester/) > as per http://biojava.org/wiki/BioJava:CookBook#biojava3-alignment - > when I look through the jar in Eclipse, there isn't anything named > 'phylogenyinference', although there is 'phylogeny', 'phylogeny.data', > 'phylogeny.factories' and 'phylogeny.iterators'. Is there something > I'm doing wrong, or is it the case that either one of BioJava or > forester has been updated and things have broken somewhere (I see the > CookBook page was updated in July 2010 but forester was updated just > last month)? Either way, any ideas would be much appreciated! > > Many thanks, > Alastair Kilpatrick > > PhD candidate, > School of Informatics, University of Edinburgh > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From paiualex12 at gmail.com Wed Apr 6 17:05:38 2011 From: paiualex12 at gmail.com (Alexandru Paiu) Date: Wed, 6 Apr 2011 20:05:38 +0300 Subject: [Biojava-l] Identified bug in BioJava(Paiu Alexandru) Message-ID: Hi to all. I've identified a bug in BioJava. Usually the first element in an array is at index 0 (pos 0). I've tried to learn more about Java and I've tried the next 2 instructions: ProteinSequence a=new ProteinSequence("A"); AminoAcidCompound b=a.getCompoundAt(0); This gives a null exception (array index out of bounds or something like that). When I put AminoAcidCompound b=a.getCompoundAt(1); it works and identifies the first amino acid as Alanine.
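What is reported here is the 1-based position convention that Scooter explains in his reply. The sketch below is plain Java on a `String` and is not BioJava code. `residueAt` and `region` are hypothetical helpers. They show the off-by-one translation from a biological position (1..length, inclusive ranges) to Java's 0-based, end-exclusive string indexing.

```java
/** Sketch: translate 1-based, inclusive biological positions to Java's 0-based indexing. */
final class OneBased {

    /** Residue at biological position 1..length. */
    static char residueAt(String sequence, int position) {
        if (position < 1 || position > sequence.length()) {
            throw new IndexOutOfBoundsException(
                    "position " + position + " outside 1.." + sequence.length());
        }
        return sequence.charAt(position - 1);
    }

    /** Inclusive 1-based range [start, end], e.g. residues 2..4. */
    static String region(String sequence, int start, int end) {
        if (start < 1 || end > sequence.length() || start > end) {
            throw new IndexOutOfBoundsException(start + ".." + end);
        }
        // substring's end index is exclusive, so the inclusive 1-based end maps to itself.
        return sequence.substring(start - 1, end);
    }

    public static void main(String[] args) {
        String seq = "MKTAYIAKQR";
        System.out.println(residueAt(seq, 1));  // M
        System.out.println(region(seq, 2, 4));  // KTA
    }
}
```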
That's all for today From willishf at ufl.edu Wed Apr 6 18:30:11 2011 From: willishf at ufl.edu (Scooter Willis) Date: Wed, 6 Apr 2011 14:30:11 -0400 Subject: [Biojava-l] Identified bug in BioJava(Paiu Alexandru) In-Reply-To: References: Message-ID: Alexandru, To make things easier given how biologists think about the first position (1) versus how computer scientists think (0), we opted to go with the biologists' view of the world. So use 1 for the first sequence position. Thanks Scooter On Wed, Apr 6, 2011 at 1:05 PM, Alexandru Paiu wrote: > Hi to all . > I've identified a bug in BioJava . > Usually the first element in an array is on index 0 (pos 0 ) . > I've tried to learn more about Java and I've tryed the next 2 instructions : > > ProteinSequence a=new ProteinSequence("A"); > AminoAcidCompound b=a.getCompoundAt(0); > > > This will give a null execption (out of array bounds or something like that > ) . > when I put AminoAcidCompound b=a.getCompoundAt(1); it works and identifies > the first > aminoacid as Alanine . > > That's all for today > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > From andreas at sdsc.edu Thu Apr 7 05:03:40 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 6 Apr 2011 22:03:40 -0700 Subject: [Biojava-l] Multiple sequence alignment in BioJava In-Reply-To: References: Message-ID: Hi Alastair, BioJava 1.x can do distributions; however, there is no counterpart for this in BioJava 3 yet. Andreas On Wed, Apr 6, 2011 at 9:40 AM, Alastair Kilpatrick wrote: > Thanks - I had just downloaded the BioJava jars manually so that's > fixed the CookBook code. However, I've made some changes in order to > align DNA sequences instead and am running into more errors - this > code: > > public static void main(String[] args) { > String[] seqs = {"GATTACATTT", "CGATTACATG", "ATGGATTACA"}; >
List<DNASequence> lst = new ArrayList<DNASequence>(); > for(String seq : seqs) { > lst.add(new DNASequence(seq)); > } > Profile<DNASequence, NucleotideCompound> profile = > Alignments.getMultipleSequenceAlignment(lst); //** > System.out.println(profile); > ConcurrencyTools.shutdown(); > } > > gives: > java.util.concurrent.ExecutionException: java.lang.NullPointerException > at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source) > at java.util.concurrent.FutureTask.get(Unknown Source) > at org.biojava3.alignment.Alignments.getListFromFutures(Alignments.java:282) > at org.biojava3.alignment.Alignments.runPairwiseScorers(Alignments.java:602) > at org.biojava3.alignment.Alignments.getMultipleSequenceAlignment(Alignments.java:173) > at CookbookMSA.main(CookbookMSA.java:49) > > at the 'alignment' line (**) - not sure what the problem is here, I > see that getMultipleSequenceAlignment() has an extra argument(s) in > the Javadoc but these weren't required in the example? > Final question (hopefully!) - once I have the alignments I require I'd > like to create a sequence logo - is there a way of doing this in > BioJava3? From a google search I've seen references to > DistributionTools.distOverAlignment() and similar, but can't find > anything like that in the new api. > > Thanks again, sorry everyone for all the questions! > > Alastair > > > On 6 April 2011 12:09, Scooter Willis wrote: >> >> You need to use the forester.jar file from the biojava3 code check out. If you are using maven this should be automatic. >> >> On Apr 6, 2011 5:07 AM, "Alastair Kilpatrick" wrote: >> >> Dear all, >> I'm pretty new to BioJava so I may be missing something - I've checked >> through the list archives without much luck. >> I've been trying to do some multiple sequence alignments but I've been >> running into strange errors.
>> To try and find where the problem is, >> I've just copied the code straight from >> http://biojava.org/wiki/BioJava:CookBook3:MSA and tried to run it, but >> I'm still getting an error: >> >> Caused by: java.lang.ClassNotFoundException: >> org.forester.phylogenyinference.DistanceMatrix >> >> It would seem that this is something to do with forester.jar, which I >> had to download separately (from http://code.google.com/p/forester/) >> as per http://biojava.org/wiki/BioJava:CookBook#biojava3-alignment - >> when I look through the jar in Eclipse, there isn't anything named >> 'phylogenyinference', although there is 'phylogeny', 'phylogeny.data', >> 'phylogeny.factories' and 'phylogeny.iterators'. Is there something >> I'm doing wrong, or is it the case that either one of BioJava or >> forester has been updated and things have broken somewhere (I see the >> CookBook page was updated in July 2010 but forester was updated just >> last month)? Either way, any ideas would be much appreciated! >> >> Many thanks, >> Alastair Kilpatrick >> >> PhD candidate, >> School of Informatics, University of Edinburgh >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From effat34 at gmail.com Thu Apr 7 11:26:43 2011 From: effat34 at gmail.com (effat farhana) Date: Thu, 7 Apr 2011 17:26:43 +0600 Subject: [Biojava-l] Amino acids physico-chemical properties calculation Message-ID: Hi, I'm a Computer Science and Engineering student. I heard about GSoC a few days ago and am very eager to participate in it. I'm quite familiar with C++ and Java multithreading. The idea of this project proposal seems quite interesting to me.
One of the methods to be implemented is to calculate the number of each different amino acid in a protein. Would you please explain how the input of this method will be given? Will the protein sequence be represented as a String of amino acids, so that I just have to count the different types of amino acid? Looking forward to your quick reply. Thanks in advance, farhana From p.v.troshin at dundee.ac.uk Thu Apr 7 21:47:20 2011 From: p.v.troshin at dundee.ac.uk (Peter Troshin) Date: Thu, 07 Apr 2011 22:47:20 +0100 Subject: [Biojava-l] Amino acids physico-chemical properties calculation In-Reply-To: References: Message-ID: <4D9E30E8.3090501@dundee.ac.uk> > >> Would you please explain how the input of this method will be > >> given? Just assume that this would be a String for now. For example "MFVAWLMLADAELGMGDTTAGEMAVQRGLALHPGHPEAVARLGR". Hope that helps. Regards, Peter On 07/04/2011 12:26, effat farhana wrote: > Hi, I'm a student of Computer Science and Engineering background. I > heard about GSoC a few days earlier and very much eager to > participate in it. I'm quite familiar with C++, Java multithreading. > The idea of this project proposal seems quite interesting to me. > > One of the methods to be implemented is to calculate the numbers of > different amino acids in protein. Would you please explain how the > input of this method will be given? Will the protein sequence be > represented by String as sequence of amino acids and I've just count > the different types of amino acid?
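Following Peter's suggestion to treat the input as a plain String, a minimal composition counter could look like the sketch below. `AminoAcidComposition` and `count` are hypothetical names, not part of BioJava. The sketch uses a `TreeMap` so that output is alphabetical, always reports the 20 standard residues (with 0 when absent), gives non-standard letters such as X their own entry, and skips whitespace so pasted multi-line sequences still work.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch: amino-acid composition of a one-letter protein string (hypothetical helper). */
final class AminoAcidComposition {
    /** The 20 standard one-letter codes; always reported, even with a count of 0. */
    private static final String STANDARD = "ACDEFGHIKLMNPQRSTVWY";

    static Map<Character, Integer> count(String protein) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (char aa : STANDARD.toCharArray()) {
            counts.put(aa, 0);
        }
        for (char raw : protein.toCharArray()) {
            if (Character.isWhitespace(raw)) {
                continue; // tolerate line breaks in pasted sequences
            }
            char aa = Character.toUpperCase(raw);
            if (aa < 'A' || aa > 'Z') {
                throw new IllegalArgumentException("Not a one-letter residue code: " + raw);
            }
            // Non-standard letters (X, B, Z, ...) get their own entry.
            counts.merge(aa, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Character, Integer> counts = count("MFVAWLMLADAELGMGDTTAGEMAVQRGLALHPGHPEAVARLGR");
        System.out.println("A=" + counts.get('A') + " C=" + counts.get('C')); // A=8 C=0
    }
}
```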
> > > Looking forward for your quick reply Thanks in advance > > farhana _______________________________________________ Biojava-l > mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From p.v.troshin at dundee.ac.uk Thu Apr 7 22:26:39 2011 From: p.v.troshin at dundee.ac.uk (Peter Troshin) Date: Thu, 07 Apr 2011 23:26:39 +0100 Subject: [Biojava-l] google summer of code proposal In-Reply-To: References: Message-ID: <4D9E3A1F.8050306@dundee.ac.uk> Dear Mihaly, I am sorry for the belated response. I'd suggest simplifying your plan a little. For example, you do not need to write what you are going to do on every day of GSoC; one-week precision will be fine. On the other hand, it would be good to include formulas in the plan, as well as to extend the original idea with something that is in line with the project and interests you. Did you have a go at the coding exercise? If not, I would recommend that you do. Finally, do not forget to submit your proposal on Melange. Regards, Peter On 06/04/2011 21:35, Mihaly Csokas wrote: > > Dear Peter, > > I am very interested in GSoC and I would like to apply for the BioJava > - Amino acids physico-chemical properties calculation project. > > I am sending you my proposal as an attachment. If you have a little > time, please read it through and send me your opinion. > > Thank you in advance! > > Yours sincerely: > > Mihaly Csokas > From andreas at sdsc.edu Fri Apr 8 04:45:41 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 7 Apr 2011 21:45:41 -0700 Subject: [Biojava-l] GSoC last day to submit proposals Message-ID: Hi, This is the final reminder that the deadline for submitting proposals for the Google Summer of Code is Friday April 8th, 19:00 UTC.
Andreas From kohchuanhock at gmail.com Fri Apr 8 17:46:21 2011 From: kohchuanhock at gmail.com (Chuan Hock Koh) Date: Sat, 9 Apr 2011 01:46:21 +0800 Subject: [Biojava-l] Submission of Short Coding Exercise Message-ID: Dear Dr Peter Troshin, I have completed the short coding exercise. As I was about to submit it, I realized that I am unsure about the behavior that runme.jar should exhibit. In Goal 2, the example command given is "java FindEnds inputFile.txt outputFile.txt". Am I right to say that runme.jar should replace FindEnds and run as follows? "java -jar runme.jar inputFile.txt outputFile.txt" Please advise. Thanks. Looking forward to your reply. Regards, Chuan Hock -- http://compbio.ddns.comp.nus.edu.sg/~ChuanHockKoh From p.v.troshin at dundee.ac.uk Fri Apr 8 21:43:29 2011 From: p.v.troshin at dundee.ac.uk (Peter Troshin) Date: Fri, 08 Apr 2011 22:43:29 +0100 Subject: [Biojava-l] Submission of Short Coding Exercise In-Reply-To: References: Message-ID: <4D9F8181.9040204@dundee.ac.uk> Hello Chuan, Yes, you are right. Peter On 08/04/2011 18:46, Chuan Hock Koh wrote: > Dear Dr Peter Troshin, > > I had completed the short coding exercise. As I was about to submit it, I > realize that I am unsure about the behavior that runme.jar should exhibit. > > In Goal 2, it was given that the example command is "java FindEnds > inputFile.txt outputFile.txt" > > Am I right to say that runme.jar should replace FindEnds and run as follows? > > "java -jar runme.jar inputFile.txt outputFile.txt" > > Please advice. Thanks. > > Looking forward to your reply. > > Regards, > Chuan Hock > From khalil.elmazouari at gmail.com Mon Apr 11 20:25:15 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Mon, 11 Apr 2011 22:25:15 +0200 Subject: [Biojava-l] RichSequenceIterator.hasNext from empty file return true!!! Message-ID: <63EB0592-5431-4AB1-A73A-744832E7B547@gmail.com> Hi, RichSequenceIterator.hasNext from empty file return true!!!
and throws an infinite BioException loop!!! Any explanation?? Thanks khalil test using the following code public static void main(String[] args) { String path = "emptyFile.txt"; BufferedReader br = null; try { br = new BufferedReader(new FileReader(path)); } catch (FileNotFoundException ex) { ex.printStackTrace(); } RichSequenceIterator seqs = RichSequence.IOTools.readFastaProtein(br, null); while (seqs.hasNext()) { try { RichSequence seq = seqs.nextRichSequence(); } catch (NoSuchElementException ex) { ex.printStackTrace(); } catch (BioException ex) { ex.printStackTrace(); } } ==== org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113) at com.kem.ae.core.Empty.main(Empty.java:51) Caused by: java.io.IOException: Premature stream end at org.biojavax.bio.seq.io.FastaFormat.readRichSequence(FastaFormat.java:178) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) ... 1 more org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113) at com.kem.ae.core.Empty.main(Empty.java:51) Caused by: java.io.IOException: Premature stream end at org.biojavax.bio.seq.io.FastaFormat.readRichSequence(FastaFormat.java:178) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) infinite loop.... From chapmanb at 50mail.com Tue Apr 12 12:36:32 2011 From: chapmanb at 50mail.com (Brad Chapman) Date: Tue, 12 Apr 2011 08:36:32 -0400 Subject: [Biojava-l] Bioinformatics Open Source Conference (BOSC 2011)--Abstracts due April 18th! Message-ID: <20110412123632.GE2105@kunkel> Only one week left to submit an abstract to BOSC 2011! We have two great keynote speakers lined up (Lawrence Hunter and Matt Wood) and session topics that include parallel and cloud-based approaches to bioinformatics, genome content management, and tools for next-generation sequencing. 
We'd love to hear about your Open Source bioinformatics project! The 12th Annual Bioinformatics Open Source Conference (BOSC 2011) An ISMB 2011 Special Interest Group (SIG) July 15-16, 2011, in Vienna, Austria http://www.open-bio.org/wiki/BOSC_2011 Important Dates: April 18, 2011: Deadline for submitting abstracts to BOSC 2011 May 9, 2011: Notifications of accepted abstracts emailed to corresponding authors July 13-14, 2011: Codefest 2011 programming session (see http://www.open-bio.org/wiki/Codefest_2011 for details) July 15-16, 2011: BOSC 2011 July 17-19, 2011: ISMB 2011 The Bioinformatics Open Source Conference (BOSC) is sponsored by the Open Bioinformatics Foundation (O|B|F), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development within the biological research community. To be considered for acceptance, software systems representing the central topic in a presentation submitted to BOSC must be licensed with a recognized Open Source License, and be freely available for download in source code form. We invite you to submit abstracts for talks and posters. Sessions include: - Approaches to parallel processing - Cloud-based approaches to improving software and data accessibility - The Semantic Web in open source bioinformatics - Data visualization - Tools for next-generation sequencing - Other Open Source software In addition to the above sessions, there will be a panel discussion about "Meeting the challenges of inter-institutional collaboration". We are also working to arrange a joint session with one of the other ISMB SIGs. Thanks to generous sponsorship from Eagle Genomics and an anonymous donor, we are pleased to announce a competition for three Student Travel Awards for BOSC 2011. Each winner will be awarded $250 to defray the costs of travel to BOSC 2011. All students whose abstracts are accepted for talks will be considered for this award. 
For instructions on submitting your abstract, please visit http://www.open-bio.org/wiki/BOSC_2011#Abstract_Submission_Information BOSC 2011 Organizing Committee: Nomi Harris and Peter Rice (co-chairs); Brad Chapman, Peter Cock, Erwin Frise, Darin London, Ron Taylor From gwaldon at geneinfinity.org Tue Apr 12 18:02:28 2011 From: gwaldon at geneinfinity.org (George Waldon) Date: Tue, 12 Apr 2011 13:02:28 -0500 Subject: [Biojava-l] RichSequenceIterator.hasNext from empty file return true!!! In-Reply-To: <63EB0592-5431-4AB1-A73A-744832E7B547@gmail.com> References: <63EB0592-5431-4AB1-A73A-744832E7B547@gmail.com> Message-ID: <20110412130228.31866c2hd15fztm8@gator1273.hostgator.com> Hi Khalil, In my hands I got "java.io.FileNotFoundException: emptyFile.txt (The system cannot find the file specified)", which is not exactly the same thing as an empty file. Anyway, I think you are right, and RichStreamReader should switch off the moreSequenceAvailable flag before throwing the exception. In your case, just break out of the loop when the exception is thrown and you should be safe. Thanks for reporting. George Quoting Khalil El Mazouari : > Hi, > > RichSequenceIterator.hasNext from empty file return true!!! and > throws an infinite BioException loop!!! > Any explanation??
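George's suggested fix is to clear the reader's "more sequences available" state before reporting the error. That follows a general iterator pattern: read ahead, so that `hasNext()` is true only when a record was actually found, and drop the look-ahead if reading fails. The sketch below is plain Java and is not taken from BioJava's `RichStreamReader`. `SafeRecordIterator` is hypothetical and treats each non-blank line as one record. With this design an empty input reports `hasNext() == false` immediately, and an I/O error ends iteration instead of producing an endless loop. The error stays available through `failure()`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Sketch: look-ahead record iterator whose hasNext() cannot get stuck at true. */
final class SafeRecordIterator implements Iterator<String> {
    private final BufferedReader reader;
    private String pending;       // next record, or null when exhausted
    private IOException failure;  // error that ended iteration early, if any

    SafeRecordIterator(BufferedReader reader) {
        this.reader = reader;
        advance();
    }

    /** Reads ahead to the next non-blank line. */
    private void advance() {
        try {
            String line;
            do {
                line = reader.readLine();
            } while (line != null && line.trim().isEmpty());
            pending = line;
        } catch (IOException e) {
            pending = null;  // clear the look-ahead so hasNext() becomes false
            failure = e;     // keep the error for the caller to inspect
        }
    }

    /** The I/O error that ended iteration early, or null. */
    IOException failure() {
        return failure;
    }

    @Override
    public boolean hasNext() {
        return pending != null;
    }

    @Override
    public String next() {
        if (pending == null) {
            throw new NoSuchElementException();
        }
        String current = pending;
        advance();
        return current;
    }

    public static void main(String[] args) {
        SafeRecordIterator empty = new SafeRecordIterator(new BufferedReader(new StringReader("")));
        System.out.println(empty.hasNext()); // false
    }
}
```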
Thanks > > khalil > > test using the following code > > public static void main(String[] args) { > String path = "emptyFile.txt"; > > BufferedReader br = null; > try { > br = new BufferedReader(new FileReader(path)); > } catch (FileNotFoundException ex) { > ex.printStackTrace(); > } > RichSequenceIterator seqs = > RichSequence.IOTools.readFastaProtein(br, null); > while (seqs.hasNext()) { > try { > RichSequence seq = seqs.nextRichSequence(); > } catch (NoSuchElementException ex) { > ex.printStackTrace(); > } catch (BioException ex) { > ex.printStackTrace(); > } > } > > ==== > > org.biojava.bio.BioException: Could not read sequence > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113) > at com.kem.ae.core.Empty.main(Empty.java:51) > Caused by: java.io.IOException: Premature stream end > at > org.biojavax.bio.seq.io.FastaFormat.readRichSequence(FastaFormat.java:178) > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) > ... 1 more > org.biojava.bio.BioException: Could not read sequence > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113) > at com.kem.ae.core.Empty.main(Empty.java:51) > Caused by: java.io.IOException: Premature stream end > at > org.biojavax.bio.seq.io.FastaFormat.readRichSequence(FastaFormat.java:178) > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) > > infinite loop.... > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From mandarijnopw8 at gmail.com Wed Apr 13 14:47:12 2011 From: mandarijnopw8 at gmail.com (Shamanou van Leeuwen) Date: Wed, 13 Apr 2011 16:47:12 +0200 Subject: [Biojava-l] GSOC Message-ID: <4DA5B770.7040809@gmail.com> Hi everyone, I am a student who would like to join GSoC with another programmer. But we are still in need of a mentor who can guide us.
Is there someone with BioJava experience who can help us during GSoC? Greetings, Shamanou van Leeuwen From rmb32 at cornell.edu Wed Apr 13 15:30:59 2011 From: rmb32 at cornell.edu (Robert Buels) Date: Wed, 13 Apr 2011 08:30:59 -0700 Subject: [Biojava-l] last call for Google Summer of Code mentors Message-ID: <4DA5C1B3.4010504@cornell.edu> Hi all, This is the last call for mentors for Google Summer of Code. We have a good crop of student proposals this year for doing work on OBF projects, and money from Google to fund them, but we need experienced Bio* developers to mentor them. If you'd like to see the student proposals, participate in their scoring, and possibly volunteer to mentor them (remotely of course) over the summer, do two things: 1.) Create an account on http://google-melange.com and send a request to be an admin from the OBF page on there, http://www.google-melange.com/gsoc/org/google/gsoc2011/obf 2.) Join the OBF GSoC mentors mailing list at http://lists.open-bio.org/mailman/listinfo/gsoc-mentors Even if you just want to see the student applications and help with scoring, but don't necessarily have time to mentor a student, your input in the scoring process is appreciated. :-) Rob ---- Robert Buels OBF GSoC 2011 Administrator From andreas at sdsc.edu Thu Apr 14 03:17:29 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 13 Apr 2011 20:17:29 -0700 Subject: [Biojava-l] GSOC In-Reply-To: <4DA5B770.7040809@gmail.com> References: <4DA5B770.7040809@gmail.com> Message-ID: Hi Shamanou, The deadline for submitting proposals was last Friday. Usually the funding is for one student only and not for pairs of students... Andreas On Wed, Apr 13, 2011 at 7:47 AM, Shamanou van Leeuwen wrote: > Hi everyone, > > i am student who would like to join GSOC with another programmer. > But we still are in need of a mentor who can guide us.
> > greetings, > Shamanou van Leeuwen > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From p.v.troshin at dundee.ac.uk Wed Apr 6 15:49:59 2011 From: p.v.troshin at dundee.ac.uk (Peter Troshin) Date: Wed, 06 Apr 2011 16:49:59 +0100 Subject: [Biojava-l] Fwd: Google Summer of Coding: Java project In-Reply-To: References: <4D99D6C2.9060107@dundee.ac.uk> Message-ID: <4D9C8BA7.4020806@dundee.ac.uk> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: molecularweight.gif Type: image/gif Size: 1390 bytes Desc: not available URL: From mostafa.shokrof at gmail.com Thu Apr 14 11:35:57 2011 From: mostafa.shokrof at gmail.com (Mostafa Shokrof) Date: Thu, 14 Apr 2011 13:35:57 +0200 Subject: [Biojava-l] Google summer code proposal Message-ID: On Thu, Apr 14, 2011 at 8:48 AM, Jason Stajich wrote: > Mostafa - > > I may have accidentally deleted your post to biojava-l about GSoC that was in > the held queue for the Mailman system. If you can resend it to the list one > more time I will make sure it goes through. > > > Name: Mostafa EL-Sayed Shokrof > > Email: mostafa.shokrof at gmail.com > > Physical Address: 7 el gomhoreya off tarh el bahr st - el shark -portsaid > -Egypt > > Telephone Number: 02-016755576 > > I am a third-year student in the Bioinformatics department of the Faculty of Computer Science, Ain Shams University, with an average grade of A. I am interested in > applying for the amino acid properties calculation project, which I believe > is a great opportunity to enhance my experience.
I chose the Open-Bio > organization because it offers interaction with real-world bioinformatics applications, > which would certainly help me a lot in my graduation project next semester. > I chose this project in particular for several reasons. First, I > believe in the project's importance, as it addresses core bioinformatics > problems and applies bioinformatics algorithms. Second, it will be > implemented in Java, which has two main advantages: it is object-oriented > and cross-platform. So I think the BioJava project is very useful. > > I also believe I am well suited for this project, as I passed both the > Algorithms and OOP courses with grade A, and I have a strong background in > biology, since I studied biology and bioinformatics at college. > > Not only did I gain programming experience by attending training courses > at CIT Global Mobi Dev, a mobile applications company, last summer, but I > also studied the following courses: > > 1.Structural Programming > > 2.Data Structure > > 3.OOP > > 4.Algorithms > > 5.File Organization > > 6.Software Engineering > > Courses being studied this semester are: > > 7.Numerical Analysis > > 8.System Analysis > > > > Concerning my biological background, I have studied: > > 1.Introduction to Biology > > 2.Introduction to Biochemistry > > 3.Introduction to Biophysics > > 4.Introduction to Molecular Cell Biology > > 5.Advanced Molecular Genetics > > 6.Introduction to Bioinformatics > > Courses being studied in this semester: > > 7.Biotechnology > > 8.Structural Bioinformatics > > > > Unfortunately the Google schedule does not suit me, since our exams are expected > to be delayed due to the current circumstances in Egypt, so I hope I will be > free at the beginning of July. > > After that, I will dedicate my summer to GSoC. I can work an average of 8 > hours a day, five days a week, for a total of 40 hours a week. The new semester > starts in mid-September,
so I will have plenty of time. Finally, I > would be really grateful if you gave me this cherished opportunity, which > would help me further in my career. > > > > > > I have attached a timeline for the project, a UML design, an abstract of how the > implementation will look, and the short coding exercise with the proposal. > > I will appreciate your feedback. > resources: https:// > docs.google.com/leaf?id=0B-lMAeKAGH9gYjFjNWFmZmItMGZkMC00NDIwLWFlODgtMDUzYzhlYWEwNzk0&hl=en: > > > my regards, > > Mostafa Shokrof > > Bioinformatics student > Ain Shams University > From erikclarke at gmail.com Fri Apr 15 02:41:48 2011 From: erikclarke at gmail.com (Erik C) Date: Thu, 14 Apr 2011 19:41:48 -0700 Subject: [Biojava-l] needleman-wunsch score problems Message-ID: Hi all, I'm having some trouble reconciling the scores from the NeedlemanWunsch sequence alignment object in BioJava with the scores I'm getting from the EMBL-EBI command-line tool 'needle'. Specifically, for the same sequences, matrix, and penalties, BioJava returns 274 (in one case), while 'needle' returns 163. Does anybody have any ideas as to why this might be happening? Is there a parameter or setting I'm missing? My implementation of the N-W code in BioJava is below:

    public long alignTwoSequences(ProteinSequence subject, ProteinSequence target) {
        SubstitutionMatrix<AminoAcidCompound> blosum62 = SubstitutionMatrixHelper.getBlosum62();
        GapPenalty penalties = new SimpleGapPenalty();
        penalties.setExtensionPenalty((short) .5);
        penalties.setOpenPenalty((short) 10);
        NeedlemanWunsch<ProteinSequence, AminoAcidCompound> nw =
                new NeedlemanWunsch<ProteinSequence, AminoAcidCompound>(subject, target, penalties, blosum62);
        return nw.getScore();
    }

Thanks, Erik From andreas at sdsc.edu Fri Apr 15 03:14:16 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 14 Apr 2011 20:14:16 -0700 Subject: [Biojava-l] needleman-wunsch score problems In-Reply-To: References: Message-ID: Hi Erik, Did you compare the alignments, i.e. which pairs of amino acids are getting aligned? There might be subtle differences.
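One concrete detail worth ruling out in the code above: in Java, `(short) .5` is a narrowing primitive conversion that truncates toward zero. The extension penalty actually passed is therefore 0, not 0.5, whereas EMBOSS `needle` defaults to a gap-open penalty of 10.0 and an extension penalty of 0.5. With free gap extension, long gaps cost only the opening penalty, which could plausibly raise the score. I have not verified that this alone explains 274 vs 163, though. Other settings, such as how end gaps are scored, may also differ between the two tools. The snippet below only demonstrates the cast. `ShortPenaltyCheck` is a hypothetical class name.

```java
/** Demonstrates Java's narrowing conversion from double to short (truncation toward zero). */
final class ShortPenaltyCheck {

    static short toShort(double penalty) {
        return (short) penalty; // 0.5 -> 0, 1.9 -> 1, 10.0 -> 10
    }

    public static void main(String[] args) {
        short extension = (short) .5;  // same expression as in the quoted code
        short open = (short) 10;
        System.out.println("open=" + open + ", extension=" + extension); // open=10, extension=0
    }
}
```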
Andreas On Thu, Apr 14, 2011 at 7:41 PM, Erik C wrote: > Hi all, > I'm having some trouble reconciling the scores from the NeedlemanWunsch > sequence alignment object in BioJava with the scores I'm getting from the > EMBL-EBI command-line tool 'needle'. Specifically, for the same sequences, > matrix, and penalties, BioJava returns 274 (in one case), while `needle' > returns 163. > Does anybody have any ideas as to why this might be happening? Is there a > parameter or setting I'm missing? My implementation of the n.w. code in > biojava is below: > > public long alignTwoSequences(ProteinSequence subject, > ProteinSequence target) { > > SubstitutionMatrix blosum62 = > SubstitutionMatrixHelper.getBlosum62(); > GapPenalty penalties = new SimpleGapPenalty(); > penalties.setExtensionPenalty((short) .5); > penalties.setOpenPenalty((short) 10); > NeedlemanWunsch nw = new > NeedlemanWunsch(subject, target, > penalties, blosum62); > return nw.getScore(); > > } > > Thanks, > Erik > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From jayunit100 at gmail.com Fri Apr 15 17:16:18 2011 From: jayunit100 at gmail.com (Jay Vyas) Date: Fri, 15 Apr 2011 13:16:18 -0400 Subject: [Biojava-l] Biojava-l Digest, Vol 99, Issue 12 In-Reply-To: References: Message-ID: I adopted a Java Needleman-Wunsch implementation for proteins and it works fine. If someone wants to integrate it or compare it with BioJava's implementation, I can provide it: just email me and I will send the source. Otherwise, I can try to commit it to a sandbox repo if one exists.
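A point worth checking in Erik's snippet above: Java's narrowing conversion from `double` to `short` truncates toward zero, so `(short) .5` evaluates to `0`. The extension penalty handed to `setExtensionPenalty` is therefore 0, not 0.5. Assuming Erik ran needle with its usual 10/0.5 open/extend penalties (the message does not list them), a zero extension penalty would make BioJava's score higher, which is the direction of the reported difference (274 vs 163). The minimal program below only demonstrates the cast; the helper `toShortPenalty` is a hypothetical name and nothing here calls BioJava.

```java
public class ShortCastDemo {
    /** Hypothetical helper mirroring the cast in Erik's code: double -> short truncates toward zero. */
    public static short toShortPenalty(double penalty) {
        return (short) penalty;
    }

    public static void main(String[] args) {
        System.out.println("(short) .5 -> " + toShortPenalty(.5)); // prints "(short) .5 -> 0"
        System.out.println("(short) 10 -> " + toShortPenalty(10)); // prints "(short) 10 -> 10"
    }
}
```

If the setter only accepts a `short`, as the casts in the snippet suggest, a fractional penalty such as 0.5 cannot be passed through it at all.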
From andreas at sdsc.edu Fri Apr 15 19:14:34 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Fri, 15 Apr 2011 12:14:34 -0700 Subject: [Biojava-l] Biojava-l Digest, Vol 99, Issue 12 In-Reply-To: <507BA137-A032-4D2E-AAF6-10898EB411B6@gmail.com> References: <507BA137-A032-4D2E-AAF6-10898EB411B6@gmail.com> Message-ID: Hi Jay, CCing the list again; somehow I dropped it off this thread. If I understand your example right, you want the alignment to span only the domain that is shared between the two proteins. This suggests you would use a local alignment, not penalizing end gaps. If you provide some example IDs we can be more specific. Andreas On Fri, Apr 15, 2011 at 11:32 AM, JAX wrote: > Thanks Andreas: Is there a way to globally align two sequences using Smith-Waterman's optimal local alignment, i.e. so as to get an ideal alignment of two multi-domain proteins (which only share one domain) that is gapped in such a way as to cover the whole protein length? > > Sent from my iPad > > On Apr 15, 2011, at 2:00 PM, Andreas Prlic wrote: > >> Hi Jay, >> >> thanks for the suggestion. There are already Needleman-Wunsch and >> Smith-Waterman implementations available as part of the alignment >> module. Have you seen those? >> >> http://biojava.org/wiki/BioJava:CookBook3:PSA >> >> Andreas >> >> On Fri, Apr 15, 2011 at 10:16 AM, Jay Vyas wrote: >>> I adopted a Java Needleman-Wunsch implementation for proteins and it works fine. If someone >>> wants to integrate it or compare it with BioJava's implementation, >>> I can provide it: just email me and I will send the source. >>> >>> Otherwise, I can try to commit it to a sandbox repo if one >>> exists.
>>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> > From jayunit100 at gmail.com Fri Apr 15 19:30:30 2011 From: jayunit100 at gmail.com (Jay Vyas) Date: Fri, 15 Apr 2011 15:30:30 -0400 Subject: [Biojava-l] Biojava-l Digest, Vol 99, Issue 12 In-Reply-To: References: <507BA137-A032-4D2E-AAF6-10898EB411B6@gmail.com> Message-ID: Consider Kalirin, a multidomain protein. http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?INPUT_TYPE=live&SEQUENCE=NP_003938.1 Now, let's say I aligned Kalirin with a single-domain member of the RHO-Gef family. Then only one part of the sequence would match. But what if I WANTED a global alignment, for programmatic purposes? Then I would want an alignment like this: ------------------------------------------------RHOGEFRHOGEFRHOG--FRHOGEFRHOGEFRHOGEF------ with lots of gaps at the beginning and lots of matches at the end, where the aligned lengths of the two proteins are exactly equal. Contrast this with a Smith-Waterman output, which would look like this: RHOGEFRHOGEFRHOG--FRHOGEFRHOGEFRHOGEF.... So is there a way to get the specificity of Smith-Waterman in a Needleman-Wunsch alignment? From andreas at sdsc.edu Fri Apr 15 21:49:36 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Fri, 15 Apr 2011 14:49:36 -0700 Subject: [Biojava-l] Biojava-l Digest, Vol 99, Issue 12 In-Reply-To: References: <507BA137-A032-4D2E-AAF6-10898EB411B6@gmail.com> Message-ID: It sounds like you are talking about doing a Needleman-Wunsch alignment while not penalizing end gaps. This should give you quite similar results to just doing a Smith-Waterman one. Is it the alignment display that you are concerned about? You could easily fill up the ends with unaligned regions for Smith-Waterman results. I don't find end gaps particularly informative.
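The idea discussed here, a Needleman-Wunsch alignment that does not penalize end gaps, can be sketched in plain Java as below. This is a self-contained illustration, not BioJava's aligner: the class name `FreeEndGapAligner`, the +1/-1 match/mismatch scores and the linear gap penalty of -2 are assumptions chosen for brevity (a real protein alignment would use a substitution matrix such as BLOSUM62 and affine gaps). Leading gaps cost nothing because row 0 and column 0 of the score matrix stay at 0. Trailing gaps cost nothing because the traceback starts from the best cell in the last row or column. The output keeps both full sequences padded with '-', which is the layout Jay describes.

```java
/** Illustrative semi-global (free end gap) Needleman-Wunsch with a linear gap penalty. */
public class FreeEndGapAligner {
    static final int MATCH = 1, MISMATCH = -1, GAP = -2; // assumed toy scores, not BLOSUM62

    static int score(char x, char y) {
        return x == y ? MATCH : MISMATCH;
    }

    /** Returns {alignedA, alignedB}; both cover the full input sequences, padded with '-'. */
    public static String[] align(String a, String b) {
        int n = a.length(), m = b.length();
        int[][] f = new int[n + 1][m + 1]; // row 0 and column 0 stay 0: leading gaps are free
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int diag = f[i - 1][j - 1] + score(a.charAt(i - 1), b.charAt(j - 1));
                f[i][j] = Math.max(diag, Math.max(f[i - 1][j] + GAP, f[i][j - 1] + GAP));
            }
        }
        // Trailing gaps are free: start the traceback from the best cell in the last row or column.
        int bi = n, bj = m;
        for (int j = 0; j <= m; j++) if (f[n][j] > f[bi][bj]) { bi = n; bj = j; }
        for (int i = 0; i <= n; i++) if (f[i][m] > f[bi][bj]) { bi = i; bj = m; }

        StringBuilder ra = new StringBuilder(), rb = new StringBuilder(); // built in reverse
        for (int i = n; i > bi; i--) { ra.append(a.charAt(i - 1)); rb.append('-'); } // unscored tail of a
        for (int j = m; j > bj; j--) { ra.append('-'); rb.append(b.charAt(j - 1)); } // unscored tail of b
        int i = bi, j = bj;
        while (i > 0 && j > 0) {
            if (f[i][j] == f[i - 1][j - 1] + score(a.charAt(i - 1), b.charAt(j - 1))) {
                ra.append(a.charAt(i - 1)); rb.append(b.charAt(j - 1)); i--; j--;
            } else if (f[i][j] == f[i - 1][j] + GAP) {
                ra.append(a.charAt(i - 1)); rb.append('-'); i--;
            } else {
                ra.append('-'); rb.append(b.charAt(j - 1)); j--;
            }
        }
        while (i > 0) { ra.append(a.charAt(i - 1)); rb.append('-'); i--; } // free leading gap in b
        while (j > 0) { ra.append('-'); rb.append(b.charAt(j - 1)); j--; } // free leading gap in a
        return new String[] { ra.reverse().toString(), rb.reverse().toString() };
    }

    public static void main(String[] args) {
        String[] r = align("XXXXRHOGEFYYYY", "RHOGEF");
        System.out.println(r[0]); // prints XXXXRHOGEFYYYY
        System.out.println(r[1]); // prints ----RHOGEF----
    }
}
```

In `main`, a short stand-in for a shared domain sits inside a longer sequence; the overhanging residues come out against gaps instead of being penalized, so the result has the full-length, gap-padded shape of a global alignment.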
Usually an alignment display will give you the aligned positions; then you can easily see (or script) whether there are end gaps. Hope that makes sense. Andreas On Fri, Apr 15, 2011 at 12:30 PM, Jay Vyas wrote: > Consider Kalirin, a multidomain protein. > http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?INPUT_TYPE=live&SEQUENCE=NP_003938.1 > Now, let's say I aligned Kalirin with a single-domain member of the RHO-Gef > family. Then only one part of the sequence would match. But what if > I WANTED a global alignment, for programmatic purposes? Then I would want > an alignment like this: > ------------------------------------------------RHOGEFRHOGEFRHOG--FRHOGEFRHOGEFRHOGEF------ > with lots of gaps at the beginning and lots of matches at the end, where > the aligned lengths of the two proteins are exactly equal. > Contrast this with a Smith-Waterman output, which would look like this: > RHOGEFRHOGEFRHOG--FRHOGEFRHOGEFRHOGEF.... > So is there a way to get the specificity of Smith-Waterman in a > Needleman-Wunsch alignment? From Wim.DeSmet at UGent.be Mon Apr 18 15:22:26 2011 From: Wim.DeSmet at UGent.be (Wim De Smet) Date: Mon, 18 Apr 2011 17:22:26 +0200 Subject: [Biojava-l] comparison of the pairwise aligner to emboss' needle Message-ID: <4DAC5732.5010507@UGent.be> Hi all, I've been trying to generate some global alignments with biojava and comparing them with what needle returns. Doing this, I can't seem to reproduce needle's alignment with biojava. The score returned from biojava seems to be worse than that from needle, so I'm not sure what's happening here. The sequences are AB004720 and Y17238 (I didn't attach a fasta file to avoid spamming people, let me know if you want one).
I align them with: GapPenalty penalty = new SimpleGapPenalty((short)-14, (short)-4); PairwiseSequenceAligner aligner = Alignments.getPairwiseAligner( new DNASequence(query, AmbiguityDNACompoundSet.getDNACompoundSet()), new DNASequence(target, AmbiguityDNACompoundSet.getDNACompoundSet()), PairwiseSequenceAlignerType.GLOBAL, penalty, SubstitutionMatrixHelper.getNuc4_4()); SequencePair alignment = aligner.getPair(); This gives me an alignment with only 23% similarity and a gap at the end. Varying the gap penalties can give me a gap in front too, but that's about it. When aligning in needle, I get a sequence with a higher score (6784 vs (-)5862) and 94% similarity (which seems closer to home). Needle I just run with defaults (so it uses EDNAFULL) and a go/ge of 14/4. Could this be a bug or am I misunderstanding some of the options? BTW, if I use a really large gapextend, say -4000, I also get a nullpointer exception. TIA, Wim De Smet -- Wim De Smet http://www.straininfo.net/ From andreas at sdsc.edu Mon Apr 18 18:34:57 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Mon, 18 Apr 2011 11:34:57 -0700 Subject: [Biojava-l] comparison of the pairwise aligner to emboss' needle In-Reply-To: <4DAC5732.5010507@UGent.be> References: <4DAC5732.5010507@UGent.be> Message-ID: Hi Wim, thanks for tracking this down. I agree, something does not look right here. I'll try to see what is going on... Andreas On Mon, Apr 18, 2011 at 8:22 AM, Wim De Smet wrote: > Hi all, > > I've been trying to generate some global alignments with biojava and > comparing them with what needle returns. Doing this, I can't seem to > reproduce needle's alignment with biojava. The score returned from biojava > seems to be worse than that from needle, so I'm not sure what's happening > here. > > The sequences are AB004720 and Y17238 (I didn't attach a fasta file to avoid > spamming people, let me know if you want one). 
I align them with: > GapPenalty penalty = new SimpleGapPenalty((short)-14, (short)-4); > PairwiseSequenceAligner aligner = > Alignments.getPairwiseAligner( > new DNASequence(query, AmbiguityDNACompoundSet.getDNACompoundSet()), > new DNASequence(target, AmbiguityDNACompoundSet.getDNACompoundSet()), > PairwiseSequenceAlignerType.GLOBAL, > penalty, SubstitutionMatrixHelper.getNuc4_4()); > SequencePair > alignment = aligner.getPair(); > > This gives me an alignment with only 23% similarity and a gap at the end. > Varying the gap penalties can give me a gap in front too, but that's about > it. When aligning in needle, I get a sequence with a higher score (6784 vs > (-)5862) and 94% similarity (which seems closer to home). Needle I just run > with defaults (so it uses EDNAFULL) and a go/ge of 14/4. > > Could this be a bug or am I misunderstanding some of the options? > > BTW, if I use a really large gapextend, say -4000, I also get a nullpointer > exception. > > TIA, > Wim De Smet > > -- > Wim De Smet > http://www.straininfo.net/ > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From andreas at sdsc.edu Wed Apr 20 17:33:03 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 20 Apr 2011 10:33:03 -0700 Subject: [Biojava-l] comparison of the pairwise aligner to emboss' needle In-Reply-To: <4DAC5732.5010507@UGent.be> References: <4DAC5732.5010507@UGent.be> Message-ID: Hi Wim, are you sure you are using the correct sequences in your test? When I run the code at the bottom of this email I am getting 95 and 97% sequence ID, which is similar to what you are expecting.
Andreas Here my code: (using latest code from SVN) package demo; import org.biojava3.alignment.Alignments; import org.biojava3.alignment.Alignments.PairwiseSequenceAlignerType; import org.biojava3.alignment.SimpleGapPenalty; import org.biojava3.alignment.SubstitutionMatrixHelper; import org.biojava3.alignment.template.GapPenalty; import org.biojava3.alignment.template.PairwiseSequenceAligner; import org.biojava3.alignment.template.SequencePair; import org.biojava3.core.sequence.DNASequence; import org.biojava3.core.sequence.compound.AmbiguityDNACompoundSet; import org.biojava3.core.sequence.compound.NucleotideCompound; public class TestDNANeedlemanWunsch { public static void main(String[] args){ String query = "AGGATGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGGTGAAGCCCAGCTTGCTGGGTGGATCA" + "GTGGCGAACGGGTGAGTAACACGTGAGCAACCTGCCCCTGACTCTGGGATAAGCGCTGGAAACGGTGTCT" + "AATACTGGATATGAGCTACCACCGCATGGTGAGTGGTTGGAAAGATTTTTCGGTTGGGGATGGGCTCGCG" + "GCCTATCAGCTTGTTGGTGAGGTAATGGCTCACCAAGGCGTCGACGGGTAGCCGGCCTGAGAGGGTGACC" + "GGCCACACTGGGACTGAGACACGGCCCAGACTCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGC" + "GGAAGCCTGATGCAGCAACGCCGCGTGAGGGACGACGGCTTCGGGTTGTAAACCTCTTTTAGCAGGGAAG" + "AAGCGAGAGTGACGGTACCTGCAGAAAAAGCGCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAG" + "GGCGCAAGCGTTATCCGGAATTATTGGGCGTAAAGAGCTCGTAGGCGGTTTGTCGCGTCTGCTGTGAAAA" + "CCCGAGGCTCAACCTNNGGGCTGCAGTGGGTACGGGCAGACTAGAGTGCGGTAGGGGAGATTGGAATTCC" + "TGGTGTAGCGGTGGAATGCGCAGATATCAGGAGGAACACCGATGGCGAAGGCAGATCTCTGGGCCGTAAC" + "TGACGCTGAGGAGCGAAAGGGTGGGGAGCAAACAGGCTTAGATACCCTGGTAGTCCACCCCGTAAACGTT" + "GGGAACTAGTTGTGGGGTCCTTTCCACGGATTCCGTGACGCACGTAACGCATTAAGTTCCCCGCCTGGGG" + "AGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGCGGAGCATGCGGATTA" + "AATCGATGCAACGCGAAGAACCTTACCAAGGCTTGACATACACGAGAACGCTGCAGAAATGTAGAACTCT" + "TTGGACACTCGTGAACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCG" + "CAACGAGCGCAACCCTCGTTCTATGTTGCCAGCACGTAATGGTGGGAACTCATGGGATACTGCCGGGGTC" + "AACTCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGTCTTGGGCTTCACGCATGCTACA" + 
"ATGGCCGGTACAAAGGGCTGCAATACCGTGAGGTGGAGCGAATCCCAAAAAGCCGGTCCCAGTTCGGATT" + "GAGGTCTGCAACTCGACCTCATGAAGTCGGAGTCGCTAGTAATCGCAGATCAGCAACGCTGCGGTGAATA" + "CGTTCCCGGGTCTTGTACACACCGCCCGTCAAGTCATGAAAGTCGGTAACACCTGAAGCCGGTGGCCTAA" + "CCCTTGTGGAGGGAGCCGGTAATTAAA"; String target = "CTGGCCGCCTGCTTAACACATCCAAGTCGAACGGTGAAGCCCCANCTTACTGGGTGGATCAGTGCCGAAC" + "GGGTGAGTAACACGTGAGCAACCTCCCCCTGACTCTGGGATAAGCGCTGGAANCGGTGTCTAATACTGGA" + "TATGAGCTACCACCGCATGGTGAGTGGTTGGAAAGATTTTTCGGTTGGGGATGGGCTCGCGCCCTATGAG" + "CTTGTTGGTGAGGTAATGGCTCACCAAGCCGTCGACGGGTAGCCGGCCTGAGAGGGTGACCGNCCACACT" + "GGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGGAAGCCT" + "GATTCANCAACCCCGCGTGAGGGACGACGGCCTTCGGGTTGTAAACCTCTTTTAGCAGGGAAGAAGCGAG" + "AGTGACGGTACCTGCAGAAAAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGCGCAA" + "GCGTTATCCGGAATTATTGGGCGTAAAGAGCTCGTAGGCGGTTTGTCGCGTCTGCTGTGAAAACCCGAGG" + "CTCAACCTCGGGCCTGCAGTGGGTACGGGCAGACTAGAGTGCGGTAGGGGAGATTGGAATTCCTGGTGTA" + "GCGGTGGAATGCGCAGATATCAGGAGGAACACCGATGGCGAAGGCAGATCTCTGGGCCGTAACTGACGCT" + "GAGGAGCGAAAGGGTGGGGAGCAAACAGGCTTAGATACCCTGGTAGTCCACCCCGTAAACGTTGGGAACT" + "AGTTGTGGGGTCCTTTCCACGGATTCCGTGACGCAGCTAACGCATTAAGTTCCCCGCCTGGGGAGTACGG" + "CCGCAAGGCTAAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGCGGAGCATGCGGATTAATTCGAT" + "GCAACGCGAAGAACCTTACCAAGGCTTGACATACACGAGAACGCTGCAGAAATGTAGAACTCTTTGGACA" + "CTCGTGAACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAG" + "CGCAACCCTCGTTCTATGTTGCCAGCACGTAATGGTGGGAACTCATGGGATACTGCCGGGGTCAACTCGG" + "AGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGTCTTGGGCTTCACGCATGCTACAATGGCCG" + "GTACAAAGGGCTGCAATACCGTGAGGTGGAGCGAATCCCAAAAAGCCGGTCCCAGTTCGGATTGAGGTCT" + "GCAACTCGACCTCATGAAGTCGGAGTCGCTAGTAATCGCAGATCAGCAACGCTGCGGTGAATACGTTCCC" + "GGGTCTTGTACACACCGCCCGTCAAGTCATGAAAGTCGGTAACACCTGAAGCCGGTGGCCCAACCCTTGT" + "GGAGGGAGCCGTCGAAGGTGGGATCGGTAATTAGGACTAAGTCGTAACAAGGTAGCCGTACC"; GapPenalty penalty = new SimpleGapPenalty((short)-14, (short)-4); PairwiseSequenceAligner aligner = Alignments.getPairwiseAligner( new DNASequence(query, 
AmbiguityDNACompoundSet.getDNACompoundSet()), new DNASequence(target, AmbiguityDNACompoundSet.getDNACompoundSet()), PairwiseSequenceAlignerType.GLOBAL, penalty, SubstitutionMatrixHelper.getNuc4_4()); SequencePair alignment = aligner.getPair(); System.out.println(alignment); int identical = alignment.getNumIdenticals(); System.out.println("Number of identical residues: " + identical); System.out.println("% identical query: " + identical / (float) query.length() ); System.out.println("% identical query: " + identical / (float) target.length() ); } } On Mon, Apr 18, 2011 at 8:22 AM, Wim De Smet wrote: > Hi all, > > I've been trying to generate some global alignments with biojava and > comparing them with what needle returns. Doing this, I can't seem to > reproduce needle's alignment with biojava. The score returned from biojava > seems to be worse than that from needle, so I'm not sure what's happening > here. > > The sequences are AB004720 and Y17238 (I didn't attach a fasta file to avoid > spamming people, let me know if you want one). I align them with: > GapPenalty penalty = new SimpleGapPenalty((short)-14, (short)-4); > PairwiseSequenceAligner aligner = > Alignments.getPairwiseAligner( > new DNASequence(query, AmbiguityDNACompoundSet.getDNACompoundSet()), > new DNASequence(target, AmbiguityDNACompoundSet.getDNACompoundSet()), > PairwiseSequenceAlignerType.GLOBAL, > penalty, SubstitutionMatrixHelper.getNuc4_4()); > SequencePair > alignment = aligner.getPair(); > > This gives me an alignment with only 23% similarity and a gap at the end. > Varying the gap penalties can give me a gap in front too, but that's about > it. When aligning in needle, I get a sequence with a higher score (6784 vs > (-)5862) and 94% similarity (which seems closer to home). Needle I just run > with defaults (so it uses EDNAFULL) and a go/ge of 14/4. > > Could this be a bug or am I misunderstanding some of the options? 
> > BTW, if I use a really large gapextend, say -4000, I also get a nullpointer > exception. > > TIA, > Wim De Smet > > -- > Wim De Smet > http://www.straininfo.net/ > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From jayunit100 at gmail.com Wed Apr 20 17:54:44 2011 From: jayunit100 at gmail.com (Jay Vyas) Date: Wed, 20 Apr 2011 13:54:44 -0400 Subject: [Biojava-l] biojava resolver issues in maven Message-ID: Hi guys, I've intermittently been getting some weird BioJava/Maven resolution errors. Any ideas on this? 4/20/11 1:52:13 PM EDT: [WARN] Failure to transfer org.biojava:biojava3-structure:3.0-alpha3-SNAPSHOT/maven-metadata.xml from http://download.java.net/maven/1 was cached in the local repository, resolution will not be reattempted until the update interval of maven-repository.dev.java.net has elapsed or updates are forced. From andreas at sdsc.edu Wed Apr 20 18:04:38 2011 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 20 Apr 2011 11:04:38 -0700 Subject: [Biojava-l] biojava resolver issues in maven In-Reply-To: References: Message-ID: BioJava jar files are hosted at our own repository, not the public Maven ones. It should be in your list of repositories; however, if you are setting up your own custom pom file, make sure to add:

...
<repositories>
  <repository>
    <id>biojava-maven-repo</id>
    <name>BioJava repository</name>
    <url>http://www.biojava.org/download/maven/</url>
    <releases><enabled>true</enabled></releases>
    <snapshots><enabled>true</enabled></snapshots>
  </repository>
</repositories>

Andreas On Wed, Apr 20, 2011 at 10:54 AM, Jay Vyas wrote: > Hi guys, I've intermittently been getting some weird biojava/maven > resolution errors. Any ideas on this?
> > 4/20/11 1:52:13 PM EDT: [WARN] Failure to transfer > org.biojava:biojava3-structure:3.0-alpha3-SNAPSHOT/maven-metadata.xml from > http://download.java.net/maven/1 was cached in the local repository, > resolution will not be reattempted until the update interval of > maven-repository.dev.java.net has elapsed or updates are forced. > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From frieman6 at gmail.com Thu Apr 21 06:44:02 2011 From: frieman6 at gmail.com (omer f) Date: Thu, 21 Apr 2011 09:44:02 +0300 Subject: [Biojava-l] Bio Java API - Amino Acids Data Message-ID: Hi All, I applied to the "Amino Acid" project. I found that the BioJava API has amino acid objects, but what I couldn't find is where the data (weight, charge...) for each amino acid is. My question: is there a basic database for amino acids, or should we get it from user input? Thank you, Omer. From Wim.DeSmet at UGent.be Thu Apr 21 07:54:56 2011 From: Wim.DeSmet at UGent.be (Wim De Smet) Date: Thu, 21 Apr 2011 09:54:56 +0200 Subject: [Biojava-l] comparison of the pairwise aligner to emboss' needle In-Reply-To: References: <4DAC5732.5010507@UGent.be> Message-ID: <4DAFE2D0.7090601@UGent.be> Hi Andreas, Thanks for having a look. I found the issue: if your sequence is lowercase, the alignment is different.
Try aligning with query.toLowerCase() and target.toLowerCase(), then the output is: Number of identical residues: 334 % identical query: 0.23405746 % identical query: 0.22845417 vs upper case Number of identical residues: 1394 % identical query: 0.9768746 % identical query: 0.95348835 So I just inserted a toUpperCase() and it works now. Regards Wim On 20-04-11 19:33, Andreas Prlic wrote: > Hi Wim, > > are you sure you are using the correct sequences in your test? When I > run the code at the bottom of this emails I am getting 95 and 97% > sequence ID, which is similar to what you are expecting. > > Andreas > > Here my code: (using latest code from SVN) > > package demo; > > import org.biojava3.alignment.Alignments; > import org.biojava3.alignment.Alignments.PairwiseSequenceAlignerType; > import org.biojava3.alignment.SimpleGapPenalty; > import org.biojava3.alignment.SubstitutionMatrixHelper; > import org.biojava3.alignment.template.GapPenalty; > import org.biojava3.alignment.template.PairwiseSequenceAligner; > import org.biojava3.alignment.template.SequencePair; > import org.biojava3.core.sequence.DNASequence; > import org.biojava3.core.sequence.compound.AmbiguityDNACompoundSet; > import org.biojava3.core.sequence.compound.NucleotideCompound; > > public class TestDNANeedlemanWunsch { > public static void main(String[] args){ > > String query = > "AGGATGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGGTGAAGCCCAGCTTGCTGGGTGGATCA" > + > "GTGGCGAACGGGTGAGTAACACGTGAGCAACCTGCCCCTGACTCTGGGATAAGCGCTGGAAACGGTGTCT" + > "AATACTGGATATGAGCTACCACCGCATGGTGAGTGGTTGGAAAGATTTTTCGGTTGGGGATGGGCTCGCG" + > "GCCTATCAGCTTGTTGGTGAGGTAATGGCTCACCAAGGCGTCGACGGGTAGCCGGCCTGAGAGGGTGACC" + > "GGCCACACTGGGACTGAGACACGGCCCAGACTCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGC" + > "GGAAGCCTGATGCAGCAACGCCGCGTGAGGGACGACGGCTTCGGGTTGTAAACCTCTTTTAGCAGGGAAG" + > "AAGCGAGAGTGACGGTACCTGCAGAAAAAGCGCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAG" + > "GGCGCAAGCGTTATCCGGAATTATTGGGCGTAAAGAGCTCGTAGGCGGTTTGTCGCGTCTGCTGTGAAAA" + > 
"CCCGAGGCTCAACCTNNGGGCTGCAGTGGGTACGGGCAGACTAGAGTGCGGTAGGGGAGATTGGAATTCC" + > "TGGTGTAGCGGTGGAATGCGCAGATATCAGGAGGAACACCGATGGCGAAGGCAGATCTCTGGGCCGTAAC" + > "TGACGCTGAGGAGCGAAAGGGTGGGGAGCAAACAGGCTTAGATACCCTGGTAGTCCACCCCGTAAACGTT" + > "GGGAACTAGTTGTGGGGTCCTTTCCACGGATTCCGTGACGCACGTAACGCATTAAGTTCCCCGCCTGGGG" + > "AGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGCGGAGCATGCGGATTA" + > "AATCGATGCAACGCGAAGAACCTTACCAAGGCTTGACATACACGAGAACGCTGCAGAAATGTAGAACTCT" + > "TTGGACACTCGTGAACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCG" + > "CAACGAGCGCAACCCTCGTTCTATGTTGCCAGCACGTAATGGTGGGAACTCATGGGATACTGCCGGGGTC" + > "AACTCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGTCTTGGGCTTCACGCATGCTACA" + > "ATGGCCGGTACAAAGGGCTGCAATACCGTGAGGTGGAGCGAATCCCAAAAAGCCGGTCCCAGTTCGGATT" + > "GAGGTCTGCAACTCGACCTCATGAAGTCGGAGTCGCTAGTAATCGCAGATCAGCAACGCTGCGGTGAATA" + > "CGTTCCCGGGTCTTGTACACACCGCCCGTCAAGTCATGAAAGTCGGTAACACCTGAAGCCGGTGGCCTAA" + > "CCCTTGTGGAGGGAGCCGGTAATTAAA"; > > String target = > "CTGGCCGCCTGCTTAACACATCCAAGTCGAACGGTGAAGCCCCANCTTACTGGGTGGATCAGTGCCGAAC" > + > "GGGTGAGTAACACGTGAGCAACCTCCCCCTGACTCTGGGATAAGCGCTGGAANCGGTGTCTAATACTGGA" + > "TATGAGCTACCACCGCATGGTGAGTGGTTGGAAAGATTTTTCGGTTGGGGATGGGCTCGCGCCCTATGAG" + > "CTTGTTGGTGAGGTAATGGCTCACCAAGCCGTCGACGGGTAGCCGGCCTGAGAGGGTGACCGNCCACACT" + > "GGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGGAAGCCT" + > "GATTCANCAACCCCGCGTGAGGGACGACGGCCTTCGGGTTGTAAACCTCTTTTAGCAGGGAAGAAGCGAG" + > "AGTGACGGTACCTGCAGAAAAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGCGCAA" + > "GCGTTATCCGGAATTATTGGGCGTAAAGAGCTCGTAGGCGGTTTGTCGCGTCTGCTGTGAAAACCCGAGG" + > "CTCAACCTCGGGCCTGCAGTGGGTACGGGCAGACTAGAGTGCGGTAGGGGAGATTGGAATTCCTGGTGTA" + > "GCGGTGGAATGCGCAGATATCAGGAGGAACACCGATGGCGAAGGCAGATCTCTGGGCCGTAACTGACGCT" + > "GAGGAGCGAAAGGGTGGGGAGCAAACAGGCTTAGATACCCTGGTAGTCCACCCCGTAAACGTTGGGAACT" + > "AGTTGTGGGGTCCTTTCCACGGATTCCGTGACGCAGCTAACGCATTAAGTTCCCCGCCTGGGGAGTACGG" + > "CCGCAAGGCTAAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGCGGAGCATGCGGATTAATTCGAT" + > 
"GCAACGCGAAGAACCTTACCAAGGCTTGACATACACGAGAACGCTGCAGAAATGTAGAACTCTTTGGACA" + > "CTCGTGAACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAG" + > "CGCAACCCTCGTTCTATGTTGCCAGCACGTAATGGTGGGAACTCATGGGATACTGCCGGGGTCAACTCGG" + > "AGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGTCTTGGGCTTCACGCATGCTACAATGGCCG" + > "GTACAAAGGGCTGCAATACCGTGAGGTGGAGCGAATCCCAAAAAGCCGGTCCCAGTTCGGATTGAGGTCT" + > "GCAACTCGACCTCATGAAGTCGGAGTCGCTAGTAATCGCAGATCAGCAACGCTGCGGTGAATACGTTCCC" + > "GGGTCTTGTACACACCGCCCGTCAAGTCATGAAAGTCGGTAACACCTGAAGCCGGTGGCCCAACCCTTGT" + > "GGAGGGAGCCGTCGAAGGTGGGATCGGTAATTAGGACTAAGTCGTAACAAGGTAGCCGTACC"; > > GapPenalty penalty = new SimpleGapPenalty((short)-14, (short)-4); > PairwiseSequenceAligner aligner = > Alignments.getPairwiseAligner( > new DNASequence(query, AmbiguityDNACompoundSet.getDNACompoundSet()), > new DNASequence(target, AmbiguityDNACompoundSet.getDNACompoundSet()), > PairwiseSequenceAlignerType.GLOBAL, > penalty, SubstitutionMatrixHelper.getNuc4_4()); > SequencePair > alignment = aligner.getPair(); > > System.out.println(alignment); > > int identical = alignment.getNumIdenticals(); > System.out.println("Number of identical residues: " + identical); > System.out.println("% identical query: " + identical / (float) > query.length() ); > System.out.println("% identical query: " + identical / (float) > target.length() ); > } > } > > > > > > > On Mon, Apr 18, 2011 at 8:22 AM, Wim De Smet wrote: >> Hi all, >> >> I've been trying to generate some global alignments with biojava and >> comparing them with what needle returns. Doing this, I can't seem to >> reproduce needle's alignment with biojava. The score returned from biojava >> seems to be worse than that from needle, so I'm not sure what's happening >> here. >> >> The sequences are AB004720 and Y17238 (I didn't attach a fasta file to avoid >> spamming people, let me know if you want one). 
I align them with: >> GapPenalty penalty = new SimpleGapPenalty((short)-14, (short)-4); >> PairwiseSequenceAligner aligner = >> Alignments.getPairwiseAligner( >> new DNASequence(query, AmbiguityDNACompoundSet.getDNACompoundSet()), >> new DNASequence(target, AmbiguityDNACompoundSet.getDNACompoundSet()), >> PairwiseSequenceAlignerType.GLOBAL, >> penalty, SubstitutionMatrixHelper.getNuc4_4()); >> SequencePair >> alignment = aligner.getPair(); >> >> This gives me an alignment with only 23% similarity and a gap at the end. >> Varying the gap penalties can give me a gap in front too, but that's about >> it. When aligning in needle, I get a sequence with a higher score (6784 vs >> (-)5862) and 94% similarity (which seems closer to home). Needle I just run >> with defaults (so it uses EDNAFULL) and a go/ge of 14/4. >> >> Could this be a bug or am I misunderstanding some of the options? >> >> BTW, if I use a really large gapextend, say -4000, I also get a nullpointer >> exception. >> >> TIA, >> Wim De Smet >> >> -- >> Wim De Smet >> http://www.straininfo.net/ >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > > > -- Wim De Smet http://www.straininfo.net/ From khalil.elmazouari at gmail.com Thu Apr 21 10:36:29 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Thu, 21 Apr 2011 12:36:29 +0200 Subject: [Biojava-l] Codon count Message-ID: <7348CEDD-3D91-4351-A24E-130B4D72DAE8@gmail.com> Hi, I am looking for a simple method or class to count the number of a specific AA codon on NT seq. Counting on both strands. Any suggestion is welcome. 
Regards, khalil From ayates at ebi.ac.uk Thu Apr 21 11:18:55 2011 From: ayates at ebi.ac.uk (Andy Yates) Date: Thu, 21 Apr 2011 12:18:55 +0100 Subject: [Biojava-l] Bio Java API - Amino Acids Data In-Reply-To: References: Message-ID: <4408CC30-6456-4269-B192-E9F66CEDAC3C@ebi.ac.uk> Hi Omer, If you are using BioJava3 then weight is available from the AminoAcidCompound object when taken from an AminoAcidCompoundSet. The other data points are not available from within BioJava. I know Andreas mentioned ages ago linking into one of the compound databases e.g. ChEBI but I don't think that's been done. Regards, Andy On 21 Apr 2011, at 07:44, omer f wrote: > Hi All, > > I Applied to the "Amino Acid" Project, > I found in the Bio Java API there is Amino Acids Objects, > But what i couldn't find - where there is Data (weight,Charge...) about each > Amino Acid, > My question, is there basic Data base for Amino Acids or we should get it > from user input, > > Thank you, > Omer. > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From ayates at ebi.ac.uk Thu Apr 21 11:23:47 2011 From: ayates at ebi.ac.uk (Andy Yates) Date: Thu, 21 Apr 2011 12:23:47 +0100 Subject: [Biojava-l] Codon count In-Reply-To: <7348CEDD-3D91-4351-A24E-130B4D72DAE8@gmail.com> References: <7348CEDD-3D91-4351-A24E-130B4D72DAE8@gmail.com> Message-ID: Hi Khalil, I'm not 100% sure what you want here. If you just want to know the potential number of codons on both strands of DNA then it would be (length / 3)*2. If what you are actually asking for is how many codons code for an amino acid then you would have to perform work similar to the transcription engine in BJ3. 
All codon tables are available from the IUPACParser class, and then it would be up to you to use a WindowedSequence over the top of your NT sequence to get the windows, or SequenceMixin.nonOverlappingKmers(), which shortcuts the creation of the WindowedSequence. Regards, Andy On 21 Apr 2011, at 11:36, Khalil El Mazouari wrote: > Hi, > > I am looking for a simple method or class to count the number of codons for a specific AA in an NT sequence, counting on both strands. > > Any suggestion is welcome. > > Regards, > > khalil > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/
From khalil.elmazouari at gmail.com Thu Apr 21 11:37:16 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Thu, 21 Apr 2011 13:37:16 +0200 Subject: [Biojava-l] Codon count In-Reply-To: References: <7348CEDD-3D91-4351-A24E-130B4D72DAE8@gmail.com> Message-ID: <1DECE13D-34A1-4EFC-ADED-320395E9320A@gmail.com> Thanks Andy, it's the second option I am looking for. Regards, khalil
From ayates at ebi.ac.uk Thu Apr 21 11:40:00 2011 From: ayates at ebi.ac.uk (Andy Yates) Date: Thu, 21 Apr 2011 12:40:00 +0100 Subject: [Biojava-l] Codon count In-Reply-To: <1DECE13D-34A1-4EFC-ADED-320395E9320A@gmail.com> References: <7348CEDD-3D91-4351-A24E-130B4D72DAE8@gmail.com> <1DECE13D-34A1-4EFC-ADED-320395E9320A@gmail.com> Message-ID: <6E526B83-5E96-4364-A248-9270FE139D7D@ebi.ac.uk> Hi Khalil, Then I think a windowed sequence is the only way to go. Actually, one particularly "interesting" idea has just sprung to mind: what if you translated the entire sequence in frame 1, forward and reverse? Then finding the number of valid codons is a case of counting the amino acids that are neither a stop nor an unknown amino acid. Andy
From khalil.elmazouari at gmail.com Thu Apr 21 11:54:23 2011 From: khalil.elmazouari at gmail.com (Khalil El Mazouari) Date: Thu, 21 Apr 2011 13:54:23 +0200 Subject: [Biojava-l] Codon count In-Reply-To: <6E526B83-5E96-4364-A248-9270FE139D7D@ebi.ac.uk> References: <7348CEDD-3D91-4351-A24E-130B4D72DAE8@gmail.com> <1DECE13D-34A1-4EFC-ADED-320395E9320A@gmail.com> <6E526B83-5E96-4364-A248-9270FE139D7D@ebi.ac.uk> Message-ID: <7DFE6B97-A465-4A1A-8EC6-A6BEC1EFCBF3@gmail.com> Hi Andy, I am actually counting codons via six-frame (6 ORF) translations. I am working on ~100,000 sequences per run, i.e. 600,000 ORFs to check, so performance is an issue for my job. I am just wondering whether counting codons directly on the NT sequence (both strands) would be faster than translation + AA counting.
Regards, khalil
From ayates at ebi.ac.uk Thu Apr 21 12:06:35 2011 From: ayates at ebi.ac.uk (Andy Yates) Date: Thu, 21 Apr 2011 13:06:35 +0100 Subject: [Biojava-l] Codon count In-Reply-To: <7DFE6B97-A465-4A1A-8EC6-A6BEC1EFCBF3@gmail.com> References: <7348CEDD-3D91-4351-A24E-130B4D72DAE8@gmail.com> <1DECE13D-34A1-4EFC-ADED-320395E9320A@gmail.com> <6E526B83-5E96-4364-A248-9270FE139D7D@ebi.ac.uk> <7DFE6B97-A465-4A1A-8EC6-A6BEC1EFCBF3@gmail.com> Message-ID: <6177ca65-9f8b-40bb-8856-03bf6ad62361@email.android.com> There will be a performance hit, but you'll be rewriting the translation code, so maybe the speed difference isn't worth the recoding task. Benchmark it before recoding. I can't remember the exact speed, but it isn't too slow. Andy
From flf.mib at gmail.com Thu Apr 21 21:18:01 2011 From: flf.mib at gmail.com (François Le Fèvre) Date: Thu, 21 Apr 2011 23:18:01 +0200 Subject: [Biojava-l] translation and leucine Message-ID: <4DB09F09.4060303@gmail.com> Dear all, I have a quick question about translation with BioJava 3. I would like the codon CTA to be translated as leucine. But some leucine codons also code for a start in the universal genetic code. Here I have built a very short example, given a short DNA sequence composed of a start codon and 6 leucine codons: TranscriptionEngine e = TranscriptionEngine.getDefault(); DNASequence dd = new DNASequence("ATGTTGTTACTTCTCCTACTG"); //returns MLLLLLL : OK DNASequence dd = new DNASequence("CTATTGTTACTTCTCCTACTG"); //returns MLLLLLL : KO, I would prefer LLLLLLL! DNASequence dd = new DNASequence("CTA"); //returns M : KO, I would prefer L! Could someone explain this feature to me? How does the default TranscriptionEngine work? How can I ask the TranscriptionEngine to give me the amino acid corresponding to CTA even when it is the first codon? Thanks a lot for your help!
Francois
From ayates at ebi.ac.uk Thu Apr 21 21:46:14 2011 From: ayates at ebi.ac.uk (Andy Yates) Date: Thu, 21 Apr 2011 22:46:14 +0100 Subject: [Biojava-l] translation and leucine In-Reply-To: <4DB09F09.4060303@gmail.com> References: <4DB09F09.4060303@gmail.com> Message-ID: <44E003C7-40C8-4595-BCAC-1613C296FFEC@ebi.ac.uk> Hi Francois, By default the engine will always return the first amino acid as an initiator Met if that amino acid could have been encoded by a start codon (a bit stupid at the moment, but it will be altered). If you do not want this behaviour, you'll have to create your own engine. You can do this using the following: TranscriptionEngine e = new TranscriptionEngine.Builder().initMet(false).build(); HTH, Andy
From flf.mib at gmail.com Sat Apr 23 06:12:53 2011 From: flf.mib at gmail.com (François Le Fèvre) Date: Sat, 23 Apr 2011 08:12:53 +0200 Subject: [Biojava-l] translation and leucine In-Reply-To: <44E003C7-40C8-4595-BCAC-1613C296FFEC@ebi.ac.uk> References: <4DB09F09.4060303@gmail.com> <44E003C7-40C8-4595-BCAC-1613C296FFEC@ebi.ac.uk> Message-ID: <4DB26DE5.8000805@gmail.com> Andy, OK, perfect, I did not know that! Thanks Francois
From rmb32 at cornell.edu Mon Apr 25 21:42:48 2011 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 25 Apr 2011 14:42:48 -0700 Subject: [Biojava-l] Announcing OBF Google Summer of Code Accepted Students Message-ID: <4DB5EAD8.1020905@cornell.edu> Hello all, I'm very pleased and excited to announce that the Open Bioinformatics Foundation has selected 6 very capable students to work on OBF projects this summer as part of the Google Summer of Code program. The accepted students, their projects, and their mentors (in alphabetical order):
Justinas Vygintas Daugmaudis
Michele dos Santos da Silva (2 students!)
  Mocapy++Biopython: from data to probabilistic models of biomolecules
  mentored by Thomas Hamelryck and Eric Talevich
Chuan Hock Koh
  BioJava - Amino acids physico-chemical properties calculation
  mentored by Peter Troshin, Andreas Prlic, and Jay Vyas
Michał Koziarski
  Representing bio-objects and related information with images (BioRuby)
  mentored by Raoul J.P. Bonnal and Francesco Strozzi
Sheena Scroggins
  Major BioPerl Reorganization
  mentored by Robert Buels and Chris Fields
Mikael Eric Trellet
  Interface analysis module for BioPython
  mentored by João Rodrigues and Eric Talevich
Once again this year, we received many great applications and ideas. However, funding and mentor resources are limited, and we were not able to accept as many as we would have liked.
Our deepest thanks to all the students who applied: we sincerely appreciate the time and effort you put into your applications, and hope you will still consider being a part of the OBF's open source projects, even without Google funding. I speak for myself and all of the mentors who read and scored applications when I say that we were truly honored by the number and quality of the applications we received. For the accepted students: congratulations! You have risen to the top of a very competitive application process. Now it's time to "put your money where your mouth is", as the saying goes. Let's get out there and write some great code this summer! Best regards, Rob ---- Robert Buels OBF GSoC 2011 Administrator
From pavlo.lutsik at googlemail.com Tue Apr 26 11:07:42 2011 From: pavlo.lutsik at googlemail.com (Pavlo Lutsik) Date: Tue, 26 Apr 2011 13:07:42 +0200 Subject: [Biojava-l] NCBIQBlastService Message-ID: Hi, I cannot get the subject class working in the Cookbook example. Even a simple rbw.printRemoteBlastInfo() throws java.lang.Exception: Impossible to get info from QBlast service at this time. Check your network connection. Any ideas? Best, Pavlo Lutsik
From p.v.troshin at dundee.ac.uk Wed Apr 27 13:49:54 2011 From: p.v.troshin at dundee.ac.uk (Peter Troshin) Date: Wed, 27 Apr 2011 14:49:54 +0100 Subject: [Biojava-l] amino acid physico-chemical properties calculation project In-Reply-To: References: Message-ID: <4DB81F02.2020504@dundee.ac.uk> Dear students, I would like to thank all of you for your interest in and enthusiasm for the amino acid physico-chemical properties calculation project for Google Summer of Code 2011. Unfortunately, only one student could be chosen, and I am sorry if this happens not to be you. This idea generated enormous interest: we received a total of 18 applications for it, many of them of good or very good quality. I worked with many of you during the application process, and was very impressed by the level of enthusiasm, energy, and capability I saw in the applications and our conversations on the mailing list. It wasn't easy to choose the best applicant, but we had to do it. A few general comments on the solutions for the short coding exercise:
1) Make sure you understand the task. About half of the applications failed to stick to the specification.
2) Only a few people used threads correctly. Among them, only one person used the java.util.concurrent package, which I expected everyone to use. I would recommend reading B. Goetz, "Java Concurrency in Practice", if you want to learn more about multithreading in Java.
3) Do not overcomplicate the solution; a good programmer does just what was asked.
4) You can check whether your solution works correctly by processing the following input: aaabbbccc bbcccalkfw aaaabb aaabbbcc caaabbbccc aaabbbccc aaaaabbbbb abbbbbccdddd sjdhfjksdhfk weiuriweiru ddddrepeatrepeat repeatrepeat and comparing the result to the output that should have been produced: aaab a c aaaa sjdhfjksdhfk dddd
Thanks to everyone again, and I wish you all the best of luck with whatever endeavour you take on. Regards, Peter On 26/04/2011 07:33, Alexandru Paiu wrote: > Hi Peter, > > I want to know what I should improve for the next GSoC. I want to > know what wasn't right in my application. > > In my opinion, I did a good job with the solutions for the > instability index and isoelectric point, which are the hardest methods. I even used Jay Vyas's suggestion with those hashtables.
> > Please, give me some advice. > > Best regards Paiu Alexandru
From p.v.troshin at dundee.ac.uk Wed Apr 27 14:05:42 2011 From: p.v.troshin at dundee.ac.uk (Peter Troshin) Date: Wed, 27 Apr 2011 15:05:42 +0100 Subject: [Biojava-l] Please give me an advice In-Reply-To: References: Message-ID: <4DB822B6.9030003@dundee.ac.uk> Dear Alex, There was nothing particularly wrong with your project plan; it just was not as good as the winning proposal. Your solution for the coding exercise did not run out of the box (java -jar runme.jar), was not correct, and could have been better engineered. I'd recommend contributing to an open source project of your choice before applying to GSoC next year. This way you will be in a much better position for the next GSoC. Regards, Peter
From flf.mib at gmail.com Wed Apr 27 19:54:28 2011 From: flf.mib at gmail.com (François Le Fèvre) Date: Wed, 27 Apr 2011 21:54:28 +0200 Subject: [Biojava-l] biojava3 and blast xml parser Message-ID: <4DB87474.1080307@gmail.com> Dear all, I have a small difficulty: is there still a BLAST XML parser in BioJava 3.0-SNAPSHOT? I am not able to find it... I have found http://www.biojava.org/docs/api1.8/org/biojava/bio/program/sax/blastxml/BlastXMLParser.html with a good tutorial here: http://biojava.org/wiki/BioJava:CookBook:Blast:Echo but it seems to be part of BioJava 1.8. Can you confirm? Thanks Francois
From willishf at ufl.edu Wed Apr 27 21:30:52 2011 From: willishf at ufl.edu (Scooter Willis) Date: Wed, 27 Apr 2011 17:30:52 -0400 Subject: [Biojava-l] biojava3 and blast xml parser In-Reply-To: <4DB87474.1080307@gmail.com> References: <4DB87474.1080307@gmail.com> Message-ID: We did not migrate the BLAST XML parser from 1.8 to 3.0. Typically I load the XML as a DOM object and use XPath to query for the desired results. I have some code in the biojava3-genome module (org.biojava3.genome.BlastXMLQuery) that could possibly provide a solution for you, depending on what you are trying to accomplish. Let me know if you have specific requirements that could help motivate formal BLAST XML -> Java objects support in BioJava 3. Thanks Scooter