From jbdundas at gmail.com Sun Nov 1 10:41:03 2009
From: jbdundas at gmail.com (jitesh dundas)
Date: Sun, 1 Nov 2009 21:11:03 +0530
Subject: [Biojava-l] Java Error:- XML Parsing Error: XML or text declaration not at start of entity
In-Reply-To: <326ea8620911010739t1e658509h7dd33ba8482312f8@mail.gmail.com>
References: <326ea8620911010739t1e658509h7dd33ba8482312f8@mail.gmail.com>
Message-ID: <326ea8620911010741q3880a13g53626d94d0d2abd2@mail.gmail.com>

Hi friends,

I am getting this error when doing a POST (using the code below) to this URL:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=cancer&reldate=10

I have written this code in a .jsp file. Later I will change it into a servlet.

Error:-
XML Parsing Error: XML or text declaration not at start of entity
Location:
http://localhost:8080/ProteomDb/ImportFromPubmed2.jsp?txtDbName=pubmed&txtTerm=cancer&txtreldate=10&comSDay=01&comSMonth=01&txtSYear=&comEDay=01&comEMonth=01&txtEYear=&txtURLString=http%3A%2F%2Feutils.ncbi.nlm.nih.gov%2Fentrez%2Feutils%2Fesearch.fcgi%3Fdb%3Dpubmed%26term%3Dcancer%26reldate%3D10&txtsubmit=Fetch+Data+From+NCBI
Line Number 11, Column 1: PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd">2034200
19877350 19877304 19877297
19877284 19877271 19877265
19877250 19877245 19877226
19877210 19877179 19877175
19877161 19877159 19877158
19877123 19877122 19877120
19877119 19877118
cancer
"neoplasms"[MeSH Terms] OR "neoplasms"[All Fields] OR "cancer"[All Fields]
"neoplasms"[MeSH Terms] MeSH Terms 2082133 Y
"neoplasms"[All Fields] All Fields 1634731 Y
OR "cancer"[All Fields] All Fields 902537 Y
OR GROUP
2009/10/22[EDAT] EDAT 0 Y
2009/11/01[EDAT] EDAT 0 Y RANGE AND
("neoplasms"[MeSH Terms] OR "neoplasms"[All Fields] OR "cancer"[All Fields]) AND 2009/10/22[EDAT] : 2009/11/01[EDAT]
^

As you can see, the XML output arrives fine, but the above error does not go away. The output of this program should be just the same as opening the above URL manually in the browser.
The browser is Mozilla Firefox.
Code:-

<%@ page language = "java" %>
<%@ page import = "java.sql.*" %>
<%@ page import = "java.util.*" %>
<%@ page import = "java.io.*" %>
<%@ page import="java.lang.*" %>
<%@ page import="java.net.*" %>
<%@ page import="java.nio.*" %>
<%@ page contentType="text/xml; charset=utf-8" pageEncoding="UTF-8" %>


<%

try
{
    //String str = "";
    //out.println("");

    Properties systemSettings = System.getProperties();
    systemSettings.put("http.proxyHost", "********");
    systemSettings.put("http.proxyPort", "******");
    systemSettings.put("sun.net.client.defaultConnectTimeout", "10000");
    systemSettings.put("sun.net.client.defaultReadTimeout", "10000");

    //out.println("Properties Set");
    Authenticator.setDefault(new Authenticator()
    {
        protected PasswordAuthentication getPasswordAuthentication()
        {
            return new PasswordAuthentication("**", "******".toCharArray()); // specify ur user name password of iitb login
        }
    });


    System.setProperties(systemSettings);
    //out.println("After Authentication & Properties Settings");

    //create xml file.
    //the input to google api
    //String textAreaContent = request.getParameter("text");
    String textAreaContent = "This si a tst";

    String str = "";

    //xml file generation ends here..
    //FetchDataFromNCBI_URLString.jsp
    String URLString = request.getParameter("txtURLString").trim();

    //URL url = new URL("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=BAA20519");
    URL url = new URL(URLString); //url string taken from user input.
    HttpURLConnection connection = null;

    connection = (HttpURLConnection) url.openConnection();
    System.out.println("After open connection");
    connection.setRequestMethod("POST");
    connection.setDoInput(true);
    connection.setDoOutput(true);

    connection.setUseCaches(false);
    connection.setAllowUserInteraction(false);
    //connection.setFollowRedirects(true);
    //connection.setInstanceFollowRedirects(true);
    //System.out.println("Before-------------------");
    connection.setRequestProperty ("Content-Type","text/xml; charset=\"utf-8\"");
    //System.out.println("After-------------------");

    //System.out.println(""+ connection.getOutputStream());

    //System.out.println("After dataoutputstream..Line No-65");

    //System.out.println("Response Code="+ connection.getResponseCode);

    OutputStreamWriter dosout = new OutputStreamWriter(connection.getOutputStream());
    //System.out.println("After dosout object..Line No-63");
    //dosout.write(str);
    dosout.close ();

    BufferedReader in = new BufferedReader( new InputStreamReader( connection.getInputStream()));

    String decodedString;
    String tempstr = "";


    while ((decodedString = in.readLine()) != null)
    {
        tempstr = tempstr + decodedString;
        //out.println(decodedString);
    }
    out.println(tempstr);
    in.close();
}
catch(Exception ex)
{
    out.println("Exception->"+ex);
    PrintWriter pw = response.getWriter();
    ex.printStackTrace(pw);
}


%>

Thanks in advance..
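
A possible explanation, offered as an assumption rather than a confirmed diagnosis: every line break that follows a <%@ page ... %> directive is template text that the JSP container writes to the response, so the relayed XML declaration no longer sits at the very start of the output. That would match the "Line Number 11, Column 1" in the Firefox report. On containers that support JSP 2.1, the page directive attribute trimDirectiveWhitespaces="true" is one way to suppress those line breaks. Another is to move the relay into a plain class that a servlet's doGet can call with response.getOutputStream() as the second argument. The sketch below shows the second route. The class name EsearchRelay and its two methods are hypothetical, and the sketch issues a GET and copies bytes instead of joining readLine() results, which drops the line breaks.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not part of BioJava or of the original JSP.
final class EsearchRelay {

    /** Copies every byte from in to out unchanged, so line breaks and encoding survive. */
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        out.flush();
    }

    /** Fetches urlString with a plain GET and relays the response body to out. */
    static void relay(String urlString, OutputStream out) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(urlString).openConnection();
        connection.setRequestMethod("GET");
        connection.setConnectTimeout(10000); // same 10 s limits as the JSP
        connection.setReadTimeout(10000);
        try (InputStream in = connection.getInputStream()) {
            copy(in, out);
        } finally {
            connection.disconnect();
        }
    }
}
```

Because nothing is written to the response before relay() runs, the first bytes the browser sees are the ones NCBI sent. The proxy and Authenticator settings from the JSP would still have to be applied before the call.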
Regards,
Jitesh Dundas

From andreas at sdsc.edu Sun Nov 1 11:06:29 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 1 Nov 2009 08:06:29 -0800
Subject: [Biojava-l] Java Error:- XML Parsing Error: XML or text declaration not at start of entity
In-Reply-To: <326ea8620911010741q3880a13g53626d94d0d2abd2@mail.gmail.com>
References: <326ea8620911010739t1e658509h7dd33ba8482312f8@mail.gmail.com>
    <326ea8620911010741q3880a13g53626d94d0d2abd2@mail.gmail.com>
Message-ID: <59a41c430911010806u62f45b90ic4a9563a27ee12e2@mail.gmail.com>

Hi Jitesh,

It is hard to read your code: the formatting is off, probably due to email, and there are many commented-out lines that don't seem to be used. Can you provide the stack trace, so we can see what part of BioJava is affected?

A good strategy for writing and debugging this is probably to simplify the problem into smaller steps. First download the files you want to parse and write the code to parse them from the local file. That will avoid any issues you might encounter with networking and server/client communication. Once the parsing is working you can take the next step and add the server communication...

Andreas

On Sun, Nov 1, 2009 at 7:41 AM, jitesh dundas wrote:

> Hi friends,
>
> I am getting this error on doing a post(using the code below) to this url->
>
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=cancer&reldate=10
>
> I have written this code in .jsp file. Later I will change it into servlet.
>
> Regards,
> JItesh Dundas
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From jbdundas at gmail.com Mon Nov 2 03:19:19 2009
From: jbdundas at gmail.com (jitesh dundas)
Date: Mon, 2 Nov 2009 13:49:19 +0530
Subject: [Biojava-l] Java Error:- XML Parsing Error: XML or text declaration not at start of entity
In-Reply-To: <59a41c430911010806u62f45b90ic4a9563a27ee12e2@mail.gmail.com>
References: <326ea8620911010739t1e658509h7dd33ba8482312f8@mail.gmail.com>
    <326ea8620911010741q3880a13g53626d94d0d2abd2@mail.gmail.com>
    <59a41c430911010806u62f45b90ic4a9563a27ee12e2@mail.gmail.com>
Message-ID: <326ea8620911020019w2b6a8307o5befcc5a4395299a@mail.gmail.com>

Dear Dr. Andreas Prlic,

Thank you for the advice. I will do that.

Regards,
Jitesh Dundas

On 11/1/09, Andreas Prlic wrote:
>
> Hi Jitesh,
>
> It is hard to read your code with all the formatting off probably due to
> email and many commented lines that don;t seem to get used. Can you provide
> the stacktrace, so we can see what part of biojava is affected?
>
> Probably a good strategy to write and debug this is to simply the problem
> into smaller steps. Try to first download the files you want to parse and
> write the code to parse them from the local file. That will avoid any
> issues you might encounter with networking and server/client communication.
> Once the parsing is working you could take it to the next step and add the
> server communication...
>
> Andreas
>
> On Sun, Nov 1, 2009 at 7:41 AM, jitesh dundas wrote:
>
>> Hi friends,
>>
>> I am getting this error on doing a post(using the code below) to this
>> url->
>>
>> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=cancer&reldate=10
>>
>> I have written this code in .jsp file. Later I will change it into
>> servlet.
>>
>> Regards,
>> JItesh Dundas
>>
>
>

From pingou at pingoured.fr Mon Nov 2 09:03:15 2009
From: pingou at pingoured.fr (Pierre-Yves)
Date: Mon, 02 Nov 2009 15:03:15 +0100
Subject: [Biojava-l] NCBI xml parser
Message-ID: <1257170595.29918.8.camel@localhost.localdomain>

Dear list,

I am trying to find my way around parsing NCBI BLAST XML.
I am using a small library which performs the BLAST online [1] and returns a FileReader of the XML.
I can convert the FileReader to a string and print it, and it seems fine.
(I used the default input shown on [1].)

So I am now trying to parse it automatically. I looked at [2] and [3] but I could not get them working. I then found this message from this mailing list [4] and so went on to use BlastXMLParserFacade.
It gives me an "org.xml.sax.SAXException: illegal frame number encountered. (0)".

So my question is: which method should I use?

Thanks in advance,

Best regards,

Pierre

[1] http://users.encs.concordia.ca/~f_kohant/ncbiblast/
[2] http://biojava.org/wiki/BioJava:CookBook:Blast:Echo
[3] http://biojava.org/wiki/BioJava:Tutorial:Blast-like_Parsing_Cook_Book
[4] http://osdir.com/ml/java.bio.general/2005-06/msg00018.html

From jogoodma at indiana.edu Mon Nov 2 09:45:09 2009
From: jogoodma at indiana.edu (Josh Goodman)
Date: Mon, 02 Nov 2009 09:45:09 -0500
Subject: [Biojava-l] NCBI xml parser
In-Reply-To: <1257170595.29918.8.camel@localhost.localdomain>
References: <1257170595.29918.8.camel@localhost.localdomain>
Message-ID: <4AEEF075.4020700@indiana.edu>

It looks like the new BLAST+ binary is using a default frame of 0 instead of the old default of 1 for blastp. Hence the illegal frame number (0) error.

Josh

Pierre-Yves wrote:
> Dear list,
>
> I am trying to find my way around parsing ncbi blast xml.
> I am using a small library which performs the blast online [1] and
> returns a FileReader of the xml.
> I can convert the FileReader to a string and print it, it seems fine.
> (I used the default input shown on [1]).
>
> So I am now trying to parse it automatically. I looked at [2] and [3]
> but I could not get them working. I then found this message from this
> mailing list [4] and thus went to use BlastXMLParserFacade.
> It returns me an "org.xml.sax.SAXException: illegal frame number
> encountered. (0)".
>
> So my question is then: which method should I use ?
>
> Thanks in advance,
>
> Best regards,
>
> Pierre
>
> [1] http://users.encs.concordia.ca/~f_kohant/ncbiblast/
> [2] http://biojava.org/wiki/BioJava:CookBook:Blast:Echo
> [3] http://biojava.org/wiki/BioJava:Tutorial:Blast-like_Parsing_Cook_Book
> [4] http://osdir.com/ml/java.bio.general/2005-06/msg00018.html

From pingou at pingoured.fr Mon Nov 2 11:17:16 2009
From: pingou at pingoured.fr (Pierre-Yves)
Date: Mon, 02 Nov 2009 17:17:16 +0100
Subject: [Biojava-l] NCBI xml parser
In-Reply-To: <4AEEF075.4020700@indiana.edu>
References: <1257170595.29918.8.camel@localhost.localdomain>
    <4AEEF075.4020700@indiana.edu>
Message-ID: <1257178636.29918.11.camel@localhost.localdomain>

On Mon, 2009-11-02 at 09:45 -0500, Josh Goodman wrote:
> It looks like the new BLAST+ binary is using a default frame of 0 instead of the old default of 1
> for blastp. Hence the illegal frame number (0) error.
>
> Josh

Thanks for the hint. I downloaded the biojava-1.7-src.jar to check the sources and correct the frame to 0 (I already saw the case to change).
However, without changing anything in the source, when I try to reproduce the error, I get a new one:
"org.xml.sax.SAXParseException: The markup declarations contained or pointed to by the document type declaration must be well-formed."

I understand the error; I am more surprised that the jar and the sources of release 1.7 give different errors.

Did I miss something?

Thanks,

Best regards,

Pierre

From holland at eaglegenomics.com Mon Nov 2 12:16:00 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 2 Nov 2009 17:16:00 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
Message-ID: <46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>

The graphs returned by the Nexus parser are instances that implement the org.jgrapht.UndirectedGraph interface. Undirected graphs have no root.

cheers,
Richard

On 30 Oct 2009, at 21:14, Tiago Antão wrote:

> Hi,
>
> I have been trying to use biojava to parse some trees on nexus files
> and I have a small doubt:
> If there is a rooted tree, how can one know what is the root vertex in
> the weighted graph (JGraphT)?
> I understand that there is no root if the tree is unrooted, but in
> case it is rooted, how to determine the vertex?
>
> Many thanks,
> Tiago
>
> --
> "The hottest places in hell are reserved for those who, in times of
> moral crisis, maintain a neutrality."
> - Dante

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From andreas at sdsc.edu Mon Nov 2 14:29:04 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 2 Nov 2009 11:29:04 -0800
Subject: [Biojava-l] NCBI xml parser
In-Reply-To: <1257178636.29918.11.camel@localhost.localdomain>
References: <1257170595.29918.8.camel@localhost.localdomain>
    <4AEEF075.4020700@indiana.edu>
    <1257178636.29918.11.camel@localhost.localdomain>
Message-ID: <59a41c430911021129i7d320b03xd44e99e2bc1baf2c@mail.gmail.com>

> I understand the error, I am more surprised by the fact that the jar
> and the sources of the release 1.7 are given a different errors.

That's surprising... I built the src-jar and the other jars at the same time, so the code should be identical... Are you sure you are doing exactly the same thing in both cases?

Andreas

From tiagoantao at gmail.com Mon Nov 2 14:36:31 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Mon, 2 Nov 2009 19:36:31 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
    <46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
Message-ID: <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>

2009/11/2 Richard Holland :
> The graphs returned by the Nexus parser are instances that implement the
> org.jgrapht.UndirectedGraph interface. Undirected graphs have no root.

Yes, that is a property of JGraphT. But it might not be true of the original nexus file/tree. So, if the tree is rooted, how can one know the root (without doing the parsing again ourselves to discover it)?
I note two things:
a) The root is obviously not one of the taxa, but an intermediate node.
b) Even if the tree is unrooted, it might be interesting to know the "root", for instance to draw the tree in the way that it was written in the file.

Tiago

PS - I also added one bug to Bugzilla related to the parser, but that is a different problem...

From pingou at pingoured.fr Mon Nov 2 14:50:25 2009
From: pingou at pingoured.fr (Pierre-Yves)
Date: Mon, 02 Nov 2009 20:50:25 +0100
Subject: [Biojava-l] NCBI xml parser
In-Reply-To: <59a41c430911021129i7d320b03xd44e99e2bc1baf2c@mail.gmail.com>
References: <1257170595.29918.8.camel@localhost.localdomain>
    <4AEEF075.4020700@indiana.edu>
    <1257178636.29918.11.camel@localhost.localdomain>
    <59a41c430911021129i7d320b03xd44e99e2bc1baf2c@mail.gmail.com>
Message-ID: <4AEF3801.10304@pingoured.fr>

On 11/02/2009 08:29 PM, Andreas Prlic wrote:
>>
>> I understand the error, I am more surprised by the fact that the jar
>> and the sources of the release 1.7 are given a different errors.
>>
> that's surprising... I built the src-jar and the other jars at the same time
> so the code should be identical... Are you sure you are doing exactly the
> same?

I can confirm this tomorrow, but AFAIR, before I left, I tried the same code using either the jar file or the project generated from the sources in NetBeans, and it gave me two different errors.
Best regards,

Pierre

From holland at eaglegenomics.com Mon Nov 2 17:14:58 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 2 Nov 2009 22:14:58 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
    <46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
    <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
Message-ID: <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>

The current parser that converts the original Newick tree string into a JGraphT does not take the root into account, and therefore it is not recorded anywhere in the JGraphT object. Someone would have to change the parser to be able to make it record the root node.

In the meantime, the JGraph library which is used for displaying JGraphT graphs in a visual form does include root-finding methods, so maybe you could investigate there to see if any of the existing functions might help?

cheers,
Richard

On 2 Nov 2009, at 19:36, Tiago Antão wrote:

> 2009/11/2 Richard Holland :
>> The graphs returned by the Nexus parser are instances that
>> implement the
>> org.jgrapht.UndirectedGraph interface. Undirected graphs have no
>> root.
>
> Yes, that is a property of the jgrapht. But it might not be the case
> of the original nexus file/tree. So, if the tree is rooted, how can
> one know the root (without doing the parsing again ourselves to
> discover it)? I note two things:
> a) The root is obviously not one taxa, but one intermediate node.
> b) Even if the tree is unrooted, it might be interesting to know the
> "root", for instance to draw the tree, in the way that is was written
> in the file.
>
> Tiago
> PS - I also added to bugzilla one but related to the parser, but that
> is different problem...

From tiagoantao at gmail.com Mon Nov 2 18:11:13 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Mon, 2 Nov 2009 23:11:13 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
    <46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
    <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
    <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
Message-ID: <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>

2009/11/2 Richard Holland :
> In the meantime, the JGraph library which is used for displaying JGraphT
> graphs in a visual form does include root-finding methods, so maybe you
> could investigate there to see if any of the existing functions might help?

Did that. None of them can help, as the graph is not directed (it would be trivial with a directed graph, of course).
In its current form, the nexus parser is of limited use for tree information:
1. For rooted trees it has a bug, as it doesn't say what the root is.
2. For unrooted trees, sometimes the "root" (what the user perceives as root) is interesting information.
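
As a stopgap for the two points above, the tree text itself can be inspected before it is handed to the parser. The sketch below is only an illustration under stated assumptions: the class NewickRootInfo is hypothetical and not part of BioJava; it relies on the [&R] / [&U] comments that some programs write in front of a tree in a NEXUS TREES block; and it reads a two-way split at the outermost parentheses as "written as rooted" and a three-way split as "written as unrooted", which is a common convention, not a rule. Quoted labels and bracketed comments containing commas or parentheses are not handled.

```java
import java.util.Locale;

// Hypothetical helper, not part of BioJava.
final class NewickRootInfo {

    /** True if the tree statement carries the [&R] (rooted) comment. */
    static boolean isMarkedRooted(String treeStatement) {
        return treeStatement.toUpperCase(Locale.ROOT).contains("[&R]");
    }

    /**
     * Counts the children of the outermost parenthesised group, i.e. the node
     * the file presents as the base of the tree. Returns 0 if there is no group.
     */
    static int topLevelChildren(String newick) {
        int depth = 0;
        int children = 0;
        for (int i = 0; i < newick.length(); i++) {
            char c = newick.charAt(i);
            if (c == '(') {
                depth++;
                if (depth == 1) {
                    children = 1; // the outermost group has at least one child
                }
            } else if (c == ')') {
                depth--;
            } else if (c == ',' && depth == 1) {
                children++; // each comma at depth 1 separates two children
            }
        }
        return children;
    }
}
```

For "((A,B),(C,D));" topLevelChildren gives 2 and for "(A,B,(C,D));" it gives 3. Mapping that basal node onto a vertex of the JGraphT object would still need support in the parser.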

Tiago

From holland at eaglegenomics.com Tue Nov 3 04:56:21 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 3 Nov 2009 09:56:21 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
    <46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
    <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
    <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
    <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
Message-ID: <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>

On 2 Nov 2009, at 23:11, Tiago Antão wrote:

> 2009/11/2 Richard Holland :
>> In the meantime, the JGraph library which is used for displaying
>> JGraphT
>> graphs in a visual form does include root-finding methods, so maybe
>> you
>> could investigate there to see if any of the existing functions
>> might help?
>
> Did that. None can help as the graph is not directed (it would be
> trivial with a directed graph ,of course).
> In the current form, the nexus parser is of limited use for tree
> information:
> 1. For rooted trees it has a bug has it doesn't say what is the root

The Newick strings used in the Nexus format are themselves undirected graphs. They don't specify which node is the root, which means it must be determined by computation after parsing the string. I'm unsure which algorithm to use for this. If there are people on this list who know the algorithm and have time to code it up, volunteers would be welcome.

> 2. For unrooted trees, sometimes the "root" (what the user perceives
> as root) is interesting information.

What the user perceives as root in an unrooted tree could be different for every user, so it would be hard to provide a standard function to read their mind!
However, if everyone can agree on a common way of determining the most likely root computationally, it would be interesting to add this as a feature, with the caveat that it is only a best-effort approximation, as the original tree is unrooted.

cheers,
Richard

From pingou at pingoured.fr Tue Nov 3 09:45:08 2009
From: pingou at pingoured.fr (Pierre-Yves)
Date: Tue, 03 Nov 2009 15:45:08 +0100
Subject: [Biojava-l] NCBI xml parser
In-Reply-To: <4AEF3801.10304@pingoured.fr>
References: <1257170595.29918.8.camel@localhost.localdomain>
    <4AEEF075.4020700@indiana.edu>
    <1257178636.29918.11.camel@localhost.localdomain>
    <59a41c430911021129i7d320b03xd44e99e2bc1baf2c@mail.gmail.com>
    <4AEF3801.10304@pingoured.fr>
Message-ID: <1257259508.26094.2.camel@localhost.localdomain>

On Mon, 2009-11-02 at 20:50 +0100, Pierre-Yves wrote:
> On 11/02/2009 08:29 PM, Andreas Prlic wrote:
> >>
> >> I understand the error, I am more surprised by the fact that the jar
> >> and the sources of the release 1.7 are given a different errors.
> >>
> > that's surprising... I built the src-jar and the other jars at the same time
> > so the code should be identical... Are you sure you are doing exactly the
> > same?
>
> I can confirm you this tomorrow but AFAIR before I left I tried the same
> code using or the jar file or the project generated from the sources in
> NetBeans and it gaves me two differents errors.

OK, so just for the record:
- If I use the .jar file I get an error (1)
- If I create a project in NetBeans using the source from BioJava I get a different error (2)
- If I add the sources from BioJava as dependencies I get the first error (1)

I thus went for the third solution and found my way around :-)

Thanks for the help.
Best regards, Pierre From andreas.prlic at gmail.com Tue Nov 3 09:56:06 2009 From: andreas.prlic at gmail.com (Andreas Prlic) Date: Tue, 3 Nov 2009 06:56:06 -0800 Subject: [Biojava-l] NCBI xml parser In-Reply-To: <1257259508.26094.2.camel@localhost.localdomain> References: <1257170595.29918.8.camel@localhost.localdomain> <4AEEF075.4020700@indiana.edu> <1257178636.29918.11.camel@localhost.localdomain> <59a41c430911021129i7d320b03xd44e99e2bc1baf2c@mail.gmail.com> <4AEF3801.10304@pingoured.fr> <1257259508.26094.2.camel@localhost.localdomain> Message-ID: <14C95451-D221-4CED-BE79-FB2EB805D264@gmail.com> So what you are saying is that you had a classpath problem and by configuring dependencies correctly the problem went away? Andreas On 3 Nov 2009, at 06:45, Pierre-Yves wrote: > On Mon, 2009-11-02 at 20:50 +0100, Pierre-Yves wrote: >> On 11/02/2009 08:29 PM, Andreas Prlic wrote: >>>> >>>> I understand the error, I am more surprised by the fact that the >>>> jar >>>> and the sources of the release 1.7 are given a different errors. >>>> >>>> >>> that's surprising... I built the src-jar and the other jars at the >>> same time >>> so the code should be identical... Are you sure you are doing >>> exactly the >>> same? >> >> I can confirm you this tomorrow but AFAIR before I left I tried the >> same >> code using or the jar file or the project generated from the >> sources in >> NetBeans and it gaves me two differents errors. > > Ok so just for the record: > - If I use the .jar file I get an error (1) > - If I create a project in NetBeans using the source from BioJava I > get > a different error (2) > - If I add as dependencies the sources from BioJava I get the first > error (1) > > I thus went for the third solution and found my way around :-) > > Thanks for the help. 
> > Best regards, > > Pierre > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From pingou at pingoured.fr Tue Nov 3 10:00:32 2009 From: pingou at pingoured.fr (Pierre-Yves) Date: Tue, 03 Nov 2009 16:00:32 +0100 Subject: [Biojava-l] NCBI xml parser In-Reply-To: <14C95451-D221-4CED-BE79-FB2EB805D264@gmail.com> References: <1257170595.29918.8.camel@localhost.localdomain> <4AEEF075.4020700@indiana.edu> <1257178636.29918.11.camel@localhost.localdomain> <59a41c430911021129i7d320b03xd44e99e2bc1baf2c@mail.gmail.com> <4AEF3801.10304@pingoured.fr> <1257259508.26094.2.camel@localhost.localdomain> <14C95451-D221-4CED-BE79-FB2EB805D264@gmail.com> Message-ID: <1257260432.26094.3.camel@localhost.localdomain> On Tue, 2009-11-03 at 06:56 -0800, Andreas Prlic wrote: > So what you are saying is that you had a classpath problem and by > configuring dependencies correctly the problem went away? In both case it was compiling, only the error at run time was different. Regards, Pierre From andreas.prlic at gmail.com Tue Nov 3 10:05:17 2009 From: andreas.prlic at gmail.com (Andreas Prlic) Date: Tue, 3 Nov 2009 07:05:17 -0800 Subject: [Biojava-l] NCBI xml parser In-Reply-To: <1257260432.26094.3.camel@localhost.localdomain> References: <1257170595.29918.8.camel@localhost.localdomain> <4AEEF075.4020700@indiana.edu> <1257178636.29918.11.camel@localhost.localdomain> <59a41c430911021129i7d320b03xd44e99e2bc1baf2c@mail.gmail.com> <4AEF3801.10304@pingoured.fr> <1257259508.26094.2.camel@localhost.localdomain> <14C95451-D221-4CED-BE79-FB2EB805D264@gmail.com> <1257260432.26094.3.camel@localhost.localdomain> Message-ID: <447A40F9-52A1-4B22-8D10-27D22F8381B9@gmail.com> Can you send me the code snipplet off list so I can take a look? 
Thanks, A On 3 Nov 2009, at 07:00, Pierre-Yves wrote: > On Tue, 2009-11-03 at 06:56 -0800, Andreas Prlic wrote: >> So what you are saying is that you had a classpath problem and by >> configuring dependencies correctly the problem went away? > > In both case it was compiling, only the error at run time was > different. > > Regards, > > Pierre > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From hlapp at gmx.net Tue Nov 3 11:53:23 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 3 Nov 2009 11:53:23 -0500 Subject: [Biojava-l] Rooted trees in nexus files In-Reply-To: <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com> References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com> <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com> <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com> <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com> <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com> Message-ID: The most common ways to root a tree is by mid-point rooting, or using an outgroup. The latter I suppose is equivalent as the user specifying a node as the root. -hilmar On Nov 3, 2009, at 4:56 AM, Richard Holland wrote: > > On 2 Nov 2009, at 23:11, Tiago Ant?o wrote: > >> 2009/11/2 Richard Holland : >>> In the meantime, the JGraph library which is used for displaying >>> JGraphT >>> graphs in a visual form does include root-finding methods, so >>> maybe you >>> could investigate there to see if any of the existing functions >>> might help? >> >> Did that. None can help as the graph is not directed (it would be >> trivial with a directed graph ,of course). >> In the current form, the nexus parser is of limited use for tree >> information: >> 1. 
For rooted trees it has a bug has it doesn't say what is the root > > The Newick strings used in the Nexus format are themselves > undirected graphs. They don't specify which node is the root, which > means it must be determined by computation after parsing the string. > I'm unsure of the algorithm to use to do this. If there are people > on this list who know the algorithm and have time to code it up, > volunteers would be welcome. > >> 2. For unrooted trees, sometimes the "root" (what the user perceives >> as root) is interesting information. > > What the user perceives as root in an unrooted tree could be > different for every user, so it would be hard to provide a standard > function to read their mind! However if everyone can come up with a > commonly agreed way of determining the most likely root > computationally, it would be interesting to add this as a feature, > with the caveat that it is only a best-effort approximation as the > original tree is unrooted. > > cheers, > Richard > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From thasso.griebel at uni-jena.de Tue Nov 3 12:58:14 2009 From: thasso.griebel at uni-jena.de (Thasso Griebel) Date: Tue, 3 Nov 2009 18:58:14 +0100 Subject: [Biojava-l] Rooted trees in nexus files In-Reply-To: <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com> References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com> <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com> 
<0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com> <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com> <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com> Message-ID: <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de> Hi, > On 2 Nov 2009, at 23:11, Tiago Ant?o wrote: > >> 2009/11/2 Richard Holland : >>> In the meantime, the JGraph library which is used for displaying >>> JGraphT >>> graphs in a visual form does include root-finding methods, so >>> maybe you >>> could investigate there to see if any of the existing functions >>> might help? >> >> Did that. None can help as the graph is not directed (it would be >> trivial with a directed graph ,of course). >> In the current form, the nexus parser is of limited use for tree >> information: >> 1. For rooted trees it has a bug has it doesn't say what is the root > > The Newick strings used in the Nexus format are themselves > undirected graphs. They don't specify which node is the root, which > means it must be determined by computation after parsing the string. > I'm unsure of the algorithm to use to do this. If there are people > on this list who know the algorithm and have time to code it up, > volunteers would be welcome. There is a way to uniquely get a root from a newick string. Usually a rooted newick is surrounded with brackets, which indicates the root as the highest node in the tree. For example: (A, (B,C)) describes a tree rooted between "A" and the clade (B,C), and with the surrounding brackets this is unique. In nexus the situation might be a bit different. nexus allows you to prefix the newick string with [&R] or [&U] to indicate rooted/unrooted trees. For example: tree treename = [&R] ((A,(B,C)),(D,E)); is a valid rooted nexus tree where the root is placed between the clades [A.B,C] and [D,E], although in this example the newick is surrounded with brackets and rooted uniquely by itself. >> 2. 
For unrooted trees, sometimes the "root" (what the user perceives >> as root) is interesting information. > > What the user perceives as root in an unrooted tree could be > different for every user, so it would be hard to provide a standard > function to read their mind! However if everyone can come up with a > commonly agreed way of determining the most likely root > computationally, it would be interesting to add this as a feature, > with the caveat that it is only a best-effort approximation as the > original tree is unrooted. BioNJ implements multiple methods to determine a root in a neighbor- joining tree. I can look it up, but I think the most common ways to compute the root are: try to place the root in the "middle" such that your tree is balanced and you have equal number of leaves to both sides of the tree. The other method I remember is based on the edge weights. Basically you find the longest path between two leaves and place the root in the middle of that path (based on the path length). I think the most common way though is to specify an outgroup node and place the root on the path between that outgroup and its successor. I am not sure if the outgroup can be described in nexus somehow. I would also suggest to generally parse trees as rooted trees (maybe jsut for th initial internal model). Creating an unrooted tree from a rooted one is easy, remove the root and forget about directions. The other way might be hard and ambiguous. cheers, Thasso -- Dipl. Inf. Thasso Griebel-------------------Lehrstuhl fuer Bioinformatik Office 3426--http://bio.informatik.uni-jena.de--Institut fuer Informatik Phone +49 (0)3641 9-46454-----------Friedrich-Schiller-Universitaet Jena Fax +49 (0)3641 9-46452----------Ernst-Abbe-Platz 2, 07743 Jena, Germany -- Dipl. Inf. 
Thasso Griebel-------------------Lehrstuhl fuer Bioinformatik Office 3426--http://bio.informatik.uni-jena.de--Institut fuer Informatik Phone +49 (0)3641 9-46454-----------Friedrich-Schiller-Universitaet Jena Fax +49 (0)3641 9-46452----------Ernst-Abbe-Platz 2, 07743 Jena, Germany From tiagoantao at gmail.com Tue Nov 3 13:16:43 2009 From: tiagoantao at gmail.com (Tiago Antão) Date: Tue, 3 Nov 2009 18:16:43 +0000 Subject: [Biojava-l] Rooted trees in nexus files In-Reply-To: <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de> References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com> <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com> <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com> <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com> <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com> <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de> Message-ID: <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com> 2009/11/3 Thasso Griebel : > There is a way to uniquely get a root from a newick string. Usually a > rooted newick is surrounded with brackets, which indicates the root as the > highest node in the tree. For example: > > (A, (B,C)) > Agreed, it is quite easy to get the root of the tree from the newick representation. But it should be done on parsing and returned in some way by the parsing system. If the user has to do it again, it means that the user has to parse the string again just to know the root node. > I would also suggest to generally parse trees as rooted trees (maybe just > for the initial internal model). Creating an unrooted tree from a rooted one > is easy, remove the root and forget about directions. The other way might be > hard and ambiguous. 100% agree. The newick _representation_ always has a root by virtue of the way it is written. Whether that root has meaning depends on the tree. Doing as you suggest seems the most reasonable idea.
I would add that even if it is an unrooted tree, the topology might be of interest. In my case I am doing a comparative visualizer and it might be nice for the user to be able to visualize the topology as specified. It has no biological meaning, but in practice, for many users, it helps. I note that PhyloXML (even by virtue of being an XML format) always represents the phylogenies as trees (not weighted DAGs). There is an attribute, rooted, which can be true or false. But, anyway. Even assuming a very conservative view on this, the current parser, for rooted trees, does not allow one to determine where the root is. I think that there would be a consensus that that is a bug? Tiago From holland at eaglegenomics.com Tue Nov 3 13:19:36 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 3 Nov 2009 18:19:36 +0000 Subject: [Biojava-l] Rooted trees in nexus files In-Reply-To: <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com> References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com> <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com> <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com> <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com> <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com> <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de> <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com> Message-ID: Agreed that there is a bug. Now all we need is someone to go in and fix it! :) cheers, Richard On 3 Nov 2009, at 18:16, Tiago Antão wrote: > 2009/11/3 Thasso Griebel : >> There is a way to uniquely get a root from a newick string. >> Usually a >> rooted newick is surrounded with brackets, which indicates the root >> as the >> highest node in the tree. For example: >> >> (A, (B,C)) >> > > Agree, it is quite easy to get the root of the tree from the newick > representation.
But it should be done on parsing and returned in some > way by the parsing system. If the user has to do it again, it means > that the user has to parse it again just to know the root node. > >> I would also suggest to generally parse trees as rooted trees >> (maybe jsut >> for th initial internal model). Creating an unrooted tree from a >> rooted one >> is easy, remove the root and forget about directions. The other way >> might be >> hard and ambiguous. > > 100% agree. > The newick _representation_ always has a root by virtue of the way it > is done. If that root has meaning or not depends. Doing as you suggest > seems the most reasonable idea. > I would add that even if it is an unrooted tree, the topology might be > of interest. In my case I am doing a comparative visualizer and it > might be nice for the user to be able to visualize the topology as > specified. It has no biological meaning, but in practice, for many > users, it helps. > I note that PhyloXML (even by virtue of being a XML format) always > represents the phylogenies as trees (not weigthed DAGs). There an > attribute rooted which can be true or false. > > But, anyway. Even assuming a very conservative view on this, the > current parser, for rooted trees, does not allow to determine where is > the root. I think that there would be a consensus that that is a bug? 
> > Tiago -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From tiagoantao at gmail.com Tue Nov 3 13:24:52 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Tue, 3 Nov 2009 18:24:52 +0000 Subject: [Biojava-l] Rooted trees in nexus files In-Reply-To: References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com> <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com> <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com> <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com> <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com> <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de> <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com> Message-ID: <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com> If somebody would provide the desired changes to the parser interface (wrt this bug and the other one reported previously), I might offer to to the grunt work. But somebody has to say which interface changes are desired. I remember which problems exist: 1. Lack of knowledge of root node 2. The p* stuff. Tiago 2009/11/3 Richard Holland : > Agreed that there is a bug. Now all we need is someone to go in and fix it! > :) > > cheers, > Richard > > On 3 Nov 2009, at 18:16, Tiago Ant?o wrote: > >> 2009/11/3 Thasso Griebel : >>> >>> There is a way to uniquely ?get a root from a newick string. Usually a >>> rooted newick is surrounded with brackets, which indicates the root as >>> the >>> highest node in the tree. For example: >>> >>> (A, (B,C)) >>> >> >> Agree, it is quite easy to get the root of the tree from the newick >> representation. But it should be done on parsing and returned in some >> way by the parsing system. If the user has to do it again, it means >> that the user has to parse it again just to know the root node. 
>> >>> I would also suggest to generally parse trees as rooted trees (maybe jsut >>> for th initial internal model). Creating an unrooted tree from a rooted >>> ?one >>> is easy, remove the root and forget about directions. The other way might >>> be >>> hard and ambiguous. >> >> 100% agree. >> The newick _representation_ always has a root by virtue of the way it >> is done. If that root has meaning or not depends. Doing as you suggest >> seems the most reasonable idea. >> I would add that even if it is an unrooted tree, the topology might be >> of interest. In my case I am doing a comparative visualizer and it >> might be nice for the user to be able to visualize the topology as >> specified. It has no biological meaning, but in practice, for many >> users, it helps. >> I note that PhyloXML (even by virtue of being a XML format) always >> represents the phylogenies as trees (not weigthed DAGs). There an >> attribute rooted which can be true or false. >> >> But, anyway. Even assuming a very conservative view on this, the >> current parser, for rooted trees, does not allow to determine where is >> the root. I think that there would be a consensus that that is a bug? >> >> Tiago > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > -- "The hottest places in hell are reserved for those who, in times of moral crisis, maintain a neutrality." 
- Dante From holland at eaglegenomics.com Tue Nov 3 13:46:05 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 3 Nov 2009 18:46:05 +0000 Subject: [Biojava-l] Rooted trees in nexus files In-Reply-To: <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com> References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com> <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com> <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com> <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com> <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com> <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de> <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com> <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com> Message-ID: <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com> > 1. Lack of knowledge of root node The Newick tree string is read as-is and is not parsed. It only gets parsed at the point of conversion to a Undirected or WeightedGraph inside the TreeBlocks.java source code (inside the two types of get-As- JGraphT methods). It's at this point the string is parsed and it's here that root note determination should take place. It's already known whether &R or &U have been specified here, which should help the code work out what to do. > 2. The p* stuff. Exactly the same part of the code as described above. Wherever it pushes values to the stack but prepends them with 'p' first, you'll need to change the 'p' to some instance variable and provide a getter/ setter to change it, with 'p' being the default setting. cheers, Richard > > Tiago > 2009/11/3 Richard Holland : >> Agreed that there is a bug. Now all we need is someone to go in and >> fix it! >> :) >> >> cheers, >> Richard >> >> On 3 Nov 2009, at 18:16, Tiago Ant?o wrote: >> >>> 2009/11/3 Thasso Griebel : >>>> >>>> There is a way to uniquely get a root from a newick string. 
>>>> Usually a >>>> rooted newick is surrounded with brackets, which indicates the >>>> root as >>>> the >>>> highest node in the tree. For example: >>>> >>>> (A, (B,C)) >>>> >>> >>> Agree, it is quite easy to get the root of the tree from the newick >>> representation. But it should be done on parsing and returned in >>> some >>> way by the parsing system. If the user has to do it again, it means >>> that the user has to parse it again just to know the root node. >>> >>>> I would also suggest to generally parse trees as rooted trees >>>> (maybe jsut >>>> for th initial internal model). Creating an unrooted tree from a >>>> rooted >>>> one >>>> is easy, remove the root and forget about directions. The other >>>> way might >>>> be >>>> hard and ambiguous. >>> >>> 100% agree. >>> The newick _representation_ always has a root by virtue of the way >>> it >>> is done. If that root has meaning or not depends. Doing as you >>> suggest >>> seems the most reasonable idea. >>> I would add that even if it is an unrooted tree, the topology >>> might be >>> of interest. In my case I am doing a comparative visualizer and it >>> might be nice for the user to be able to visualize the topology as >>> specified. It has no biological meaning, but in practice, for many >>> users, it helps. >>> I note that PhyloXML (even by virtue of being a XML format) always >>> represents the phylogenies as trees (not weigthed DAGs). There an >>> attribute rooted which can be true or false. >>> >>> But, anyway. Even assuming a very conservative view on this, the >>> current parser, for rooted trees, does not allow to determine >>> where is >>> the root. I think that there would be a consensus that that is a >>> bug? 
>>> >>> Tiago >> >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> >> > > > > -- > "The hottest places in hell are reserved for those who, in times of > moral crisis, maintain a neutrality." - Dante -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From tiagoantao at gmail.com Tue Nov 3 13:55:23 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Tue, 3 Nov 2009 18:55:23 +0000 Subject: [Biojava-l] Rooted trees in nexus files In-Reply-To: <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com> References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com> <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com> <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com> <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com> <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de> <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com> <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com> <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com> Message-ID: <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com> But the point is that the class interface changes to the outside user: 1. How does one report back the root to the user? 2. Regarding the prefix stuff, should the user be allowed to specify a preferred prefix? Both this things imply interface changes visible to users. If you still need volunteers to do the change, I can do it. But I need to know what changes to the user interface are to be done. For 1, maybe a method getRoot, returning a string with the name of the root node? For 2, maybe an extended version of the parse function with a suffix as input parameter? 
2009/11/3 Richard Holland : >> 1. Lack of knowledge of root node > > The Newick tree string is read as-is and is not parsed. It only gets parsed > at the point of conversion to a Undirected or WeightedGraph inside the > TreeBlocks.java source code (inside the two types of get-As-JGraphT > methods). It's at this point the string is parsed and it's here that root > note determination should take place. It's already known whether &R or &U > have been specified here, which should help the code work out what to do. > >> 2. The p* stuff. > > Exactly the same part of the code as described above. Wherever it pushes > values to the stack but prepends them with 'p' first, you'll need to change > the 'p' to some instance variable and provide a getter/setter to change it, > with 'p' being the default setting. > > cheers, > Richard > >> >> Tiago >> 2009/11/3 Richard Holland : >>> >>> Agreed that there is a bug. Now all we need is someone to go in and fix >>> it! >>> :) >>> >>> cheers, >>> Richard >>> >>> On 3 Nov 2009, at 18:16, Tiago Ant?o wrote: >>> >>>> 2009/11/3 Thasso Griebel : >>>>> >>>>> There is a way to uniquely ?get a root from a newick string. Usually a >>>>> rooted newick is surrounded with brackets, which indicates the root as >>>>> the >>>>> highest node in the tree. For example: >>>>> >>>>> (A, (B,C)) >>>>> >>>> >>>> Agree, it is quite easy to get the root of the tree from the newick >>>> representation. But it should be done on parsing and returned in some >>>> way by the parsing system. If the user has to do it again, it means >>>> that the user has to parse it again just to know the root node. >>>> >>>>> I would also suggest to generally parse trees as rooted trees (maybe >>>>> jsut >>>>> for th initial internal model). Creating an unrooted tree from a rooted >>>>> ?one >>>>> is easy, remove the root and forget about directions. The other way >>>>> might >>>>> be >>>>> hard and ambiguous. >>>> >>>> 100% agree. 
>>>> The newick _representation_ always has a root by virtue of the way it >>>> is done. If that root has meaning or not depends. Doing as you suggest >>>> seems the most reasonable idea. >>>> I would add that even if it is an unrooted tree, the topology might be >>>> of interest. In my case I am doing a comparative visualizer and it >>>> might be nice for the user to be able to visualize the topology as >>>> specified. It has no biological meaning, but in practice, for many >>>> users, it helps. >>>> I note that PhyloXML (even by virtue of being a XML format) always >>>> represents the phylogenies as trees (not weigthed DAGs). There an >>>> attribute rooted which can be true or false. >>>> >>>> But, anyway. Even assuming a very conservative view on this, the >>>> current parser, for rooted trees, does not allow to determine where is >>>> the root. I think that there would be a consensus that that is a bug? >>>> >>>> Tiago >>> >>> -- >>> Richard Holland, BSc MBCS >>> Operations and Delivery Director, Eagle Genomics Ltd >>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >>> http://www.eaglegenomics.com/ >>> >>> >> >> >> >> -- >> "The hottest places in hell are reserved for those who, in times of >> moral crisis, maintain a neutrality." - Dante > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > -- "The hottest places in hell are reserved for those who, in times of moral crisis, maintain a neutrality." 
- Dante

From peter.midford at gmail.com Tue Nov 3 14:28:14 2009
From: peter.midford at gmail.com (Peter Midford)
Date: Tue, 3 Nov 2009 14:28:14 -0500
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
    <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
    <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
    <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
    <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
    <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
    <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
    <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
    <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
    <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
Message-ID: <2E8B7EE9-2617-4096-B7AC-52A398D7E69F@gmail.com>

Tiago,

If you return a directed graph, the root will be a node with no incoming edges.

Peter

On Nov 3, 2009, at 13:55, Tiago Antão wrote:

> But the point is that the class interface changes to the outside user:
> 1. How does one report back the root to the user?
> 2. Regarding the prefix stuff, should the user be allowed to specify a preferred prefix?
>
> Both these things imply interface changes visible to users.
> If you still need volunteers to do the change, I can do it. But I need to know what changes to the user interface are to be done.
> For 1, maybe a method getRoot, returning a string with the name of the root node?
> For 2, maybe an extended version of the parse function with a prefix as input parameter?
>
> 2009/11/3 Richard Holland:
>>> 1. Lack of knowledge of root node
>>
>> The Newick tree string is read as-is and is not parsed. It only gets parsed at the point of conversion to an Undirected or WeightedGraph inside the TreeBlocks.java source code (inside the two types of get-As-JGraphT methods). It's at this point the string is parsed and it's here that root node determination should take place. It's already known whether &R or &U have been specified here, which should help the code work out what to do.
>>
>>> 2. The p* stuff.
>>
>> Exactly the same part of the code as described above. Wherever it pushes values to the stack but prepends them with 'p' first, you'll need to change the 'p' to some instance variable and provide a getter/setter to change it, with 'p' being the default setting.
>>
>> cheers,
>> Richard
>>
>>> Tiago
>>> 2009/11/3 Richard Holland:
>>>>
>>>> Agreed that there is a bug. Now all we need is someone to go in and fix it! :)
>>>>
>>>> cheers,
>>>> Richard
>>>>
>>>> On 3 Nov 2009, at 18:16, Tiago Antão wrote:
>>>>
>>>>> 2009/11/3 Thasso Griebel:
>>>>>>
>>>>>> There is a way to uniquely get a root from a newick string. Usually a rooted newick is surrounded with brackets, which indicates the root as the highest node in the tree. For example:
>>>>>>
>>>>>> (A, (B,C))
>>>>>>
>>>>>
>>>>> Agree, it is quite easy to get the root of the tree from the newick representation. But it should be done on parsing and returned in some way by the parsing system. If the user has to do it again, it means that the user has to parse it again just to know the root node.
>>>>>
>>>>>> I would also suggest to generally parse trees as rooted trees (maybe just for the initial internal model). Creating an unrooted tree from a rooted one is easy, remove the root and forget about directions. The other way might be hard and ambiguous.
>>>>>
>>>>> 100% agree.
>>>>> The newick _representation_ always has a root by virtue of the way it is done. Whether that root has meaning or not depends. Doing as you suggest seems the most reasonable idea.
>>>>> I would add that even if it is an unrooted tree, the topology might be of interest. In my case I am doing a comparative visualizer and it might be nice for the user to be able to visualize the topology as specified. It has no biological meaning, but in practice, for many users, it helps.
>>>>> I note that PhyloXML (even by virtue of being an XML format) always represents the phylogenies as trees (not weighted DAGs). There is an attribute rooted which can be true or false.
>>>>>
>>>>> But, anyway. Even assuming a very conservative view on this, the current parser, for rooted trees, does not allow one to determine where the root is. I think that there would be a consensus that that is a bug?
>>>>>
>>>>> Tiago
>>>>
>>>> --
>>>> Richard Holland, BSc MBCS
>>>> Operations and Delivery Director, Eagle Genomics Ltd
>>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>>>> http://www.eaglegenomics.com/
>>>
>>> --
>>> "The hottest places in hell are reserved for those who, in times of
>>> moral crisis, maintain a neutrality." - Dante
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

Peter E. Midford
Mesquite Developer
Peter.Midford at gmail.com

From holland at eaglegenomics.com Tue Nov 3 15:20:31 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 3 Nov 2009 20:20:31 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
    <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
    <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
    <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
    <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
    <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
    <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
    <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
    <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
    <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
Message-ID:

A getRoot() function sounds good. It would return the String label of the root node, the same label that identifies the corresponding vertex in the JGraphT model. An equivalent setRoot() would be nice.

The prefix for the parser is currently hardcoded as p. Two new methods, setDefaultPrefix and getDefaultPrefix, should be provided; the setter accepts a string and should check that the string is valid, i.e. all alphanumeric and with no spaces or other Newick-sensitive characters. The parser should be changed to use the output from getDefaultPrefix() instead of the hardcoded p. The default behaviour should be such that it behaves the same as at present unless the user explicitly says otherwise by calling the setDefaultPrefix() method.

Personally I would also alter the methods that return JGraphTs so that they return their Directed equivalents if possible. I believe that these can still be unrooted - you'd have to check the JGraphT documentation to make sure.

Richard.
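[Editor's sketch] The accessor methods proposed in this message could look roughly like the following. This is only an illustration under stated assumptions: the class name TreeLabelSettings, its fields, and the exact validation rule (ASCII letters and digits only) are invented for the example and are not the actual BioJava code; only the method names getRoot, setRoot, getDefaultPrefix and setDefaultPrefix come from the thread.

```java
/**
 * Hypothetical holder for the two settings discussed in the thread.
 * Not BioJava code; names and validation rule are assumptions.
 */
final class TreeLabelSettings {
    // "p" is the prefix the thread says is currently hardcoded, so it stays the default.
    private String defaultPrefix = "p";
    // Label of the root vertex; null until a parse (or the caller) sets it.
    private String root;

    String getDefaultPrefix() {
        return defaultPrefix;
    }

    /** Accepts only non-empty prefixes made of ASCII letters and digits (assumed rule). */
    void setDefaultPrefix(String prefix) {
        if (prefix == null || !prefix.matches("[A-Za-z0-9]+")) {
            throw new IllegalArgumentException("Invalid node prefix: " + prefix);
        }
        this.defaultPrefix = prefix;
    }

    /** Returns the label of the root node, or null if none is known. */
    String getRoot() {
        return root;
    }

    /** Only records the label; this sketch does not re-root any graph. */
    void setRoot(String rootLabel) {
        this.root = rootLabel;
    }
}
```

With this shape, code that never calls setDefaultPrefix() keeps seeing "p", which matches the requirement that the default behaviour stays as it is today.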
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From thasso.griebel at uni-jena.de Wed Nov 4 06:57:45 2009
From: thasso.griebel at uni-jena.de (Thasso Griebel)
Date: Wed, 4 Nov 2009 12:57:45 +0100
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To:
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
    <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
    <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
    <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
    <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
    <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
    <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
    <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
    <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
    <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
Message-ID: <6303BD72-3F47-4DB1-8526-0402FEB502EE@uni-jena.de>

Hi,

> A getRoot() function sounds good. It would return the String label
> of the root node, the same as which identifies the corresponding
> vertex in the JGraphT model. An equivalent setRoot() would be nice.
Though you have to keep in mind that switching the root to another node has certain implications for the tree structure, and this has to be taken into account when the newick string is parsed and the graph is created. You have to parse the graph from newick and then "reroot" the tree, as the root might not be equal to the one specified in the newick string.

> Personally I would also alter the methods that return JGraphTs so
> that they return their Directed equivalents if possible. I believe
> that these can still be unrooted - you'd have to check the JGraphT
> documentation to make sure.

You have to change that method signature if you want to use the same method. The only relationship between JGraphT's UndirectedGraph and its DirectedGraph counterpart is that they both extend the Graph interface, but a DirectedGraph is not an UndirectedGraph. Switching to DirectedGraph definitely breaks the current API! I don't know how you usually handle such situations in BioJava, but this clearly breaks compatibility. Maybe it would be better to introduce a new method that returns directed graphs?

cheers,
-thasso

--
Dipl. Inf. Thasso Griebel-------------------Lehrstuhl fuer Bioinformatik
Office 3426--http://bio.informatik.uni-jena.de--Institut fuer Informatik
Phone +49 (0)3641 9-46454-----------Friedrich-Schiller-Universitaet Jena
Fax +49 (0)3641 9-46452----------Ernst-Abbe-Platz 2, 07743 Jena, Germany

From tiagoantao at gmail.com Wed Nov 4 07:40:46 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 4 Nov 2009 12:40:46 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To:
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
    <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
    <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
    <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
    <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
    <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
    <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
    <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
Message-ID: <6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com>

2009/11/3 Richard Holland:
> The prefix for the parser currently is hardcoded as p. Two new methods - set
> and getDefaultPrefix which accept a string should be provided (it should
> check that the string is valid, i.e. all alphanumeric and with no spaces or
> other Newick-sensitive characters). The parser should be changed to use the
> output from getDefaultPrefix() instead of the hardcoded p. The default
> behaviour should be such that it behaves the same as at present unless the
> user explicitly says otherwise by calling the setDefaultPrefix() method.

This default behavior would still raise an exception with nodes called p*.
I would suggest a minor change: if there is a clash, the parser would try the next p* (or whatever defaultPrefix) ...

Example to make it clear: if there is a leaf called p2, internal nodes generated would be p1, p3, p4, ....

--
"The hottest places in hell are reserved for those who, in times of
moral crisis, maintain a neutrality." - Dante

From tiagoantao at gmail.com Wed Nov 4 07:44:21 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 4 Nov 2009 12:44:21 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6303BD72-3F47-4DB1-8526-0402FEB502EE@uni-jena.de>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
    <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
    <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
    <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
    <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
    <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
    <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
    <6303BD72-3F47-4DB1-8526-0402FEB502EE@uni-jena.de>
Message-ID: <6d941f120911040444y33da2642oe7104708a2d2a6cb@mail.gmail.com>

2009/11/4 Thasso Griebel:
>> Personally I would also alter the methods that return JGraphTs so that
>> they return their Directed equivalents if possible. I believe that these can
>> still be unrooted - you'd have to check the JGraphT documentation to make
>> sure.
>
> You have to change that method signature if you want to use the same method.
> The only relationship between JGraphTs UndirectedGraph and the DirectedGraph
> counterpart is that they both extend the Graph interface, but a
> DirectedGraph is not an UndirectedGraph. Switching to DirectedGraph
> definitely breaks the current API ! I don't know how you usually handle such
> situations in BioJava, but this clearly breaks compatibility. Maybe it would
> be better to introduce a new method that returns directed graphs ?

I also don't know how BioJava sorts out these kinds of issues. But my personal, outsider opinion would be in your direction, i.e.:
a. Not break the current API
b. Add a new method with a directed graph
c. (extra) Add a new method boolean isRooted() to check whether the tree is rooted or not...

Best
Tiago

From holland at eaglegenomics.com Wed Nov 4 07:46:01 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 4 Nov 2009 12:46:01 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6303BD72-3F47-4DB1-8526-0402FEB502EE@uni-jena.de>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
    <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
    <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
    <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
    <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
    <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
    <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
    <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
    <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
    <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
    <6303BD72-3F47-4DB1-8526-0402FEB502EE@uni-jena.de>
Message-ID:

> You have to change that method signature if you want to use the same
> method. The only relationship between JGraphTs UndirectedGraph and
> the DirectedGraph counterpart is that they both extend the Graph
> interface, but a DirectedGraph is not an UndirectedGraph. Switching
> to DirectedGraph definitely breaks the current API ! I don't know
> how you usually handle such situations in BioJava, but this clearly
> breaks compatibility. Maybe it would be better to introduce a new
> method that returns directed graphs ?

Whether or not to break the API depends on a few things. First, how old and well adopted is the code? Second, is the existing API illogical or just plain wrong? A balance between the two gives the confidence with which the API can be changed.
In this instance, the code is fairly new, not widely adopted, and the existing API is clearly wrong by forcing all JGraphT graphs to be undirected.

To keep everyone happy, I would introduce a new method with a new name that takes a boolean or enum option indicating what type of graph the user wants (undirected, directed, whatever). I would then deprecate the existing method and move its contents into the undirected part of the new method, and replace the old method contents with a call to the new method with the option set to undirected.

cheers,
Richard

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From holland at eaglegenomics.com Wed Nov 4 07:46:34 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 4 Nov 2009 12:46:34 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
    <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
    <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
    <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
    <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
    <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
    <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
    <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
    <6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com>
Message-ID: <6E4DA847-C5A6-47FA-ACA4-88C55DDD2CFC@eaglegenomics.com>

Sounds good.

On 4 Nov 2009, at 12:40, Tiago Antão wrote:

> This default behavior would still raise an exception with nodes called
> p* . I would suggest a minor change: If there is a clash, the parser
> would try the next p* (or whatever defaultPrefix) ...
>
> Example to make it clear: if there is a leaf called p2, internal nodes
> generated would be p1, p3, p4, ....

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From holland at eaglegenomics.com Wed Nov 4 07:51:37 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 4 Nov 2009 12:51:37 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6E4DA847-C5A6-47FA-ACA4-88C55DDD2CFC@eaglegenomics.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
    <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
    <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
    <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
    <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
    <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
    <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
    <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
    <6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com>
    <6E4DA847-C5A6-47FA-ACA4-88C55DDD2CFC@eaglegenomics.com>
Message-ID:

ah... except a problem! The parser does not know all names in the string in advance, so if it auto-assigns one that is then used later in the string, we have the same problem with name clashes as before.
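[Editor's sketch] One way to deal with this clash, shown here only as an illustration, is to collect the labels that already occur in the Newick string before any internal-node names are handed out, and then skip generated names that are taken. The class InternalNodeNamer and its label scanner are hypothetical and not BioJava code; the scanner is deliberately simplistic and assumes unquoted labels with no bracketed comments.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper, not BioJava code: generates internal-node names that avoid existing labels. */
final class InternalNodeNamer {
    private final String prefix;
    private final Set<String> taken;
    private int counter = 0;

    InternalNodeNamer(String prefix, String newick) {
        this.prefix = prefix;
        // Pre-scan: every label already present in the tree string is off limits.
        this.taken = scanLabels(newick);
    }

    /** Returns the next prefix+number name that is neither in the tree nor handed out before. */
    String next() {
        String candidate;
        do {
            counter++;
            candidate = prefix + counter;
        } while (!taken.add(candidate)); // add() returns false when the name is already taken
        return candidate;
    }

    /** Simplified scan: split on Newick punctuation and drop ":length" suffixes. */
    static Set<String> scanLabels(String newick) {
        Set<String> labels = new HashSet<>();
        for (String token : newick.split("[(),;]")) {
            String label = token.trim();
            int colon = label.indexOf(':');
            if (colon >= 0) {
                label = label.substring(0, colon).trim();
            }
            if (!label.isEmpty()) {
                labels.add(label);
            }
        }
        return labels;
    }
}
```

For the tree (A,(p2,C)); and prefix p, successive calls to next() give p1, p3, p4, which is the behaviour Tiago's example asks for. The cost is one extra pass over the string before the real parse.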
The names the parser assigns cannot totally avoid all clashes unless it has already parsed the string to find out what names were used in the string itself already. So some kind of pre-parse would be necessary.

On 4 Nov 2009, at 12:46, Richard Holland wrote:

> Sounds good.
>
> On 4 Nov 2009, at 12:40, Tiago Antão wrote:
>
>> 2009/11/3 Richard Holland:
>>> The prefix for the parser currently is hardcoded as p. Two new
>>> methods - set and getDefaultPrefix which accept a string should be
>>> provided (it should check that the string is valid, i.e. all
>>> alphanumeric and with no spaces or other Newick-sensitive
>>> characters). The parser should be changed to use the output from
>>> getDefaultPrefix() instead of the hardcoded p. The default
>>> behaviour should be such that it behaves the same as at present
>>> unless the user explicitly says otherwise by calling the
>>> setDefaultPrefix() method.
>>
>> This default behavior would still raise an exception with nodes
>> called p* . I would suggest a minor change: If there is a clash, the
>> parser would try the next p* (or whatever defaultPrefix) ...
>>
>> Example to make it clear: if there is a leaf called p2, internal
>> nodes generated would be p1, p3, p4, ....
>>
>> --
>> "The hottest places in hell are reserved for those who, in times of
>> moral crisis, maintain a neutrality." - Dante
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From tiagoantao at gmail.com Wed Nov 4 12:18:52 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 4 Nov 2009 17:18:52 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To:
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com> <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com> <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com> <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com> <6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com> <6E4DA847-C5A6-47FA-ACA4-88C55DDD2CFC@eaglegenomics.com>
Message-ID: <6d941f120911040918v663f6d01s8d8d14d0bda94fc0@mail.gmail.com>

Unless anyone with experience in biojava development wants to take on this, I would volunteer to do this. I ended up using the PhyloXML forester-atv parser (and moving to phyloxml instead of nexus), but as I reported this, I might as well sort it out...

2009/11/4 Richard Holland:
> ah... except a problem! The parser does not know all names in the string in
> advance, so if it auto-assigns one that is then used later in the string, we
> have the same problem with name clashes as before.
>
> The names the parser assigns cannot totally avoid all clashes unless it has
> already parsed the string to find out what names were used in the string
> itself already. So some kind of pre-parse would be necessary.

--
"The hottest places in hell are reserved for those who, in times of moral crisis, maintain a neutrality."
- Dante

From andreas at sdsc.edu Wed Nov 4 12:26:06 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 4 Nov 2009 09:26:06 -0800
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120911040918v663f6d01s8d8d14d0bda94fc0@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com> <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com> <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com> <6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com> <6E4DA847-C5A6-47FA-ACA4-88C55DDD2CFC@eaglegenomics.com> <6d941f120911040918v663f6d01s8d8d14d0bda94fc0@mail.gmail.com>
Message-ID: <59a41c430911040926h607b51e6ydd6d8145424d073b@mail.gmail.com>

excellent, thanks for taking this on!

Andreas

2009/11/4 Tiago Antão
> Unless anyone with experience in biojava development wants to take on
> this, I would volunteer to do this. I ended up using the PhyloXML
> forester-atv parser (and moving to phyloxml instead of nexus), but as
> I reported this, I might as well sort it out...
>
> --
> "The hottest places in hell are reserved for those who, in times of
> moral crisis, maintain a neutrality."
- Dante

From tiagoantao at gmail.com Fri Nov 6 06:30:00 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Fri, 6 Nov 2009 11:30:00 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <59a41c430911040926h607b51e6ydd6d8145424d073b@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com> <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com> <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com> <6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com> <6E4DA847-C5A6-47FA-ACA4-88C55DDD2CFC@eaglegenomics.com> <6d941f120911040918v663f6d01s8d8d14d0bda94fc0@mail.gmail.com> <59a41c430911040926h607b51e6ydd6d8145424d073b@mail.gmail.com>
Message-ID: <6d941f120911060330t236fc033x105c8d05749fad36@mail.gmail.com>

I've made a few changes to TreesBlock, namely implementing a version of what was discussed here:

1. I maintained getTreeAsJGraphT and getTreeAsWeightedJGraphT as they are in terms of interface
2. There is now a new method getTopNode, stating which node is on the "top". I use the name getTopNode and not getRootNode to avoid misleading users: only rooted trees have a root, but for the nexus type of representation all have a "top" (which in rooted trees is the root)
3. There are now setNodePrefix and getNodePrefix methods to change the prefix (which defaults to p, as before)

In my view these changes solve both problems: the issue with node names and the need to know the root/top of a nexus tree. It might not be the best solution, but it gets things on the right track without taking too much of my time. There are also no changes to the signatures of existing methods.

Now, there is still a problem:
    addTree(final String label, UndirectedGraph treegraph)
is highly dependent on the p* convention for internal nodes. Here I would be tempted to change the method signature to:
    addTree(final String label, UndirectedGraph treegraph, String topNode)

Interestingly there is no addTree with weighted graphs (for distances).

If nobody sees a problem with this, I will change addTree.

I will then attach a patch to the currently open bug (along with test cases). And it should be done.

2009/11/4 Andreas Prlic:
> excellent, thanks for taking this on!
> Andreas

--
"The hottest places in hell are reserved for those who, in times of moral crisis, maintain a neutrality."
- Dante

From holland at eaglegenomics.com Fri Nov 6 06:45:18 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 6 Nov 2009 11:45:18 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120911060330t236fc033x105c8d05749fad36@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com> <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com> <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com> <6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com> <6E4DA847-C5A6-47FA-ACA4-88C55DDD2CFC@eaglegenomics.com> <6d941f120911040918v663f6d01s8d8d14d0bda94fc0@mail.gmail.com> <59a41c430911040926h607b51e6ydd6d8145424d073b@mail.gmail.com> <6d941f120911060330t236fc033x105c8d05749fad36@mail.gmail.com>
Message-ID:

Sounds great. With regard to addTree you could add a new method with the signature that you propose, copy the existing method body into it and modify appropriately, then delete the existing method body and replace it with a call to the new one instead, with a default topNode value that corresponds to the assumptions that the existing method currently makes.

cheers,
Richard

On 6 Nov 2009, at 11:30, Tiago Antão wrote:

> I've made a few changes to TreesBlock, namely implementing a version
> of what was discussed here:
>
> 1. I maintained getTreeAsJGraphT and getTreeAsWeightedJGraphT as they
> are in terms of interface
> 2. There is now a new method getTopNode, stating which node is on the
> "top". I use the name getTopNode and not getRootNode to avoid
> misleading users: only rooted trees have a root, but for the nexus
> type of representation all have a "top" (which in rooted trees is the
> root)
> 3. There are now setNodePrefix and getNodePrefix methods to change
> the prefix (which defaults to p, as before)
>
> In my view these changes solve both problems: the issue with node
> names and the need to know the root/top of a nexus tree. It might not
> be the best solution, but it gets things on the right track without
> taking too much of my time. There are also no changes to the
> signatures of existing methods.
>
> Now, there is still a problem:
>     addTree(final String label, UndirectedGraph treegraph)
> is highly dependent on the p* convention for internal nodes.
> Here I would be tempted to change the method signature to:
>     addTree(final String label, UndirectedGraph treegraph, String topNode)
>
> Interestingly there is no addTree with weighted graphs (for
> distances).
>
> If nobody sees a problem with this, I will change addTree.
>
> I will then attach a patch to the currently open bug (along with test
> cases). And it should be done.

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From tiagoantao at gmail.com Fri Nov 6 08:26:58 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Fri, 6 Nov 2009 13:26:58 +0000
Subject: [Biojava-l] Newick/Nexus processing of non-binary trees
Message-ID: <6d941f120911060526r14d43c43ncb5541d89a8dcaa3@mail.gmail.com>

Hi,

Either I have been looking at the code for too long, or the current implementation only supports binary trees (i.e., nodes with exactly 2 children).

I have tested with:
    tree tree6 = (1,2,3);

And I get only 2 edges. The edge pointing to "1" gets lost. Inspecting the old code, this seems to be how it is implemented.
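For reference, a tree such as (1,2,3) has one internal node with three children, so reading it should yield three edges. A minimal recursive-descent sketch that accepts any number of children per node might look like the following. This is not the BioJava TreesBlock code: the class name NewickSketch, the method parseEdges and the naming of internal nodes (prefix plus a counter) are assumptions made purely for illustration, and the sketch ignores quoted labels, comments and clashes between generated and existing names.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative recursive-descent Newick reader (hypothetical, not BioJava code).
class NewickSketch {
    private final String text;
    private final String prefix;   // prefix for unnamed internal nodes (assumed, e.g. "p")
    private int pos = 0;
    private int counter = 0;
    private final List<String[]> edges = new ArrayList<>();

    private NewickSketch(String text, String prefix) {
        this.text = text;
        this.prefix = prefix;
    }

    // Returns {parent, child} pairs for a Newick string such as "(1,2,3);".
    static List<String[]> parseEdges(String newick, String prefix) {
        NewickSketch p = new NewickSketch(newick, prefix);
        p.parseNode();
        if (p.peek() != ';') {
            throw new IllegalArgumentException("Expected ';' at position " + p.pos);
        }
        return p.edges;
    }

    // node := '(' node (',' node)* ')' [label] [':' length]  |  label [':' length]
    private String parseNode() {
        String name;
        if (peek() == '(') {
            pos++;                                // consume '('
            List<String> children = new ArrayList<>();
            children.add(parseNode());
            while (peek() == ',') {               // any number of siblings
                pos++;
                children.add(parseNode());
            }
            if (peek() != ')') {
                throw new IllegalArgumentException("Expected ')' at position " + pos);
            }
            pos++;                                // consume ')'
            String label = readLabel();
            name = label.isEmpty() ? prefix + (++counter) : label;
            for (String child : children) {
                edges.add(new String[] { name, child });
            }
        } else {
            name = readLabel();
            if (name.isEmpty()) {
                throw new IllegalArgumentException("Expected a label at position " + pos);
            }
        }
        if (peek() == ':') {                      // skip an optional branch length
            pos++;
            readLabel();
        }
        return name;
    }

    // Skips whitespace and returns the next character, or '\0' at end of input.
    private char peek() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        return pos < text.length() ? text.charAt(pos) : '\0';
    }

    // Reads characters up to the next Newick delimiter.
    private String readLabel() {
        peek();
        int start = pos;
        while (pos < text.length() && "(),:;".indexOf(text.charAt(pos)) < 0) {
            pos++;
        }
        return text.substring(start, pos).trim();
    }

    public static void main(String[] args) {
        for (String[] e : parseEdges("(1,2,3);", "p")) {
            System.out.println(e[0] + " -> " + e[1]);
        }
    }
}
```

Because each call to parseNode collects its own list of children, no markers are needed to tell which entries belong to the node currently being processed; the call stack does that bookkeeping.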
If I am correct, this renders the whole tree parser somewhat useless in its current form, as most phylo trees are not binary only.

The other two bugs are now corrected, but this is much more serious, I think.

--
"The hottest places in hell are reserved for those who, in times of moral crisis, maintain a neutrality." - Dante

From holland at eaglegenomics.com Fri Nov 6 09:10:54 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 6 Nov 2009 14:10:54 +0000
Subject: [Biojava-l] Newick/Nexus processing of non-binary trees
In-Reply-To: <6d941f120911060526r14d43c43ncb5541d89a8dcaa3@mail.gmail.com>
References: <6d941f120911060526r14d43c43ncb5541d89a8dcaa3@mail.gmail.com>
Message-ID: <120EB3C1-0043-4E8C-8637-F16B3B56094B@eaglegenomics.com>

If that's true, it sounds like it's broken. Is the old code easily modified to suit arbitrary numbers of children?

On 6 Nov 2009, at 13:26, Tiago Antão wrote:

> Hi,
>
> Either I have been looking at the code for too long, or the current
> implementation only supports binary trees (i.e., nodes with exactly
> 2 children).
>
> I have tested with:
>     tree tree6 = (1,2,3);
>
> And I get only 2 edges. The edge pointing to "1" gets lost.
> Inspecting the old code, this seems to be how it is implemented.
>
> If I am correct, this renders the whole tree parser somewhat
> useless in its current form, as most phylo trees are not binary only.
>
> The other two bugs are now corrected, but this is much more serious,
> I think.

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From tiagoantao at gmail.com Fri Nov 6 09:40:00 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Fri, 6 Nov 2009 14:40:00 +0000
Subject: [Biojava-l] Newick/Nexus processing of non-binary trees
In-Reply-To: <120EB3C1-0043-4E8C-8637-F16B3B56094B@eaglegenomics.com>
References: <6d941f120911060526r14d43c43ncb5541d89a8dcaa3@mail.gmail.com> <120EB3C1-0043-4E8C-8637-F16B3B56094B@eaglegenomics.com>
Message-ID: <6d941f120911060640v29b52bedxd120e39d092a1e88@mail.gmail.com>

2009/11/6 Richard Holland:
> If that's true, it sounds like it's broken. Is the old code easily modified to
> suit arbitrary numbers of children?

I don't think so. It uses a stack-based solution, so it would not be possible to know whether a part of the stack belongs to the current node being processed or to something else in the tree. One could put markers on the stack or something, but it would become a bit convoluted. I would suppose a recursive implementation would be cleaner here.

My suggestion is for somebody else to verify my findings. I might be doing something stupidly wrong. Maybe things are correct. Just a simple tree like (1,2,3) (as long as it is not binary) should expose the problem.

From cmasak at gmail.com Fri Nov 6 11:25:57 2009
From: cmasak at gmail.com (Carl Mäsak)
Date: Fri, 6 Nov 2009 17:25:57 +0100
Subject: [Biojava-l] How do I read a FASTA file containing protein sequences in lowercase?
Message-ID: <16d769b70911060825v298529b5m805c1f7fd388549b@mail.gmail.com>

I'm using RichSequenceIterator to read FASTA files containing proteins.
Somehow it doesn't work when the protein sequences are in lowercase, which they sometimes are when downloaded from e.g. Uniprot. My code fails to recognize the following file as containing a protein sequence:

>OPSD_FELCA
mngtegpnfyvpfsnktgvvrspfeypqyylaepwqfsmlaaymfllivlgfpinfltlyvtvqhkklrtplnyilln
lavadlfmvfggftttlytslhgyfvfgptgcnlegffatlggeialwslvvlaieryvvvckpmsnfrfgenhaimgv
aftwvmalacaapplvgwsryipegmqcscgidyytlkpevnnesfviymfvvhftipmiviffcygqlvftvkeaaaq
qqesattqkaekevtrmviimviaflicwvpyasvafyifthqgsnfgpifmtlpaffaksssiynpviyimmnkqfrn
cmlttlccgknplgddeasttgsktetsqvapa

What am I missing? Here's the code I'm using to read in sequences:

    private List sequencesFromInputStream(InputStream stream) {

        BufferedInputStream bufferedStream = new BufferedInputStream(stream);
        Namespace ns = RichObjectFactory.getDefaultNamespace();
        RichSequenceIterator seqit = null;

        try {
            seqit = RichSequence.IOTools.readStream(bufferedStream, ns);
        } catch (IOException e) {
            logger.error("Couldn't read sequences from file", e);
            return Collections.emptyList();
        }

        List sequences = new ArrayList();
        try {
            while ( seqit.hasNext() ) {
                RichSequence rseq;
                rseq = seqit.nextRichSequence(); // *error occurs here*
                if (rseq == null)
                    continue;
                String alphabet = rseq.getAlphabet().getName();
                sequences.add(
                    "DNA".equals(alphabet) ? new BiojavaDNA(rseq)
                  : "RNA".equals(alphabet) ? new BiojavaRNA(rseq)
                  :                          new BiojavaProtein(rseq) );
            }
        } catch (NoSuchElementException e) {
            logger.error("Read past last sequence", e);
        } catch (BioException e) {
            logger.error(e); // *ends up here*
        }

        return sequences;
    }

Grateful for any pointers you might have.

Regards,
// Carl Mäsak

From cmasak at gmail.com Fri Nov 6 11:54:30 2009
From: cmasak at gmail.com (Carl Mäsak)
Date: Fri, 6 Nov 2009 17:54:30 +0100
Subject: [Biojava-l] How do I read a FASTA file containing protein sequences in lowercase?
In-Reply-To:
References: <16d769b70911060825v298529b5m805c1f7fd388549b@mail.gmail.com>
Message-ID: <16d769b70911060854o69705434x574ce2d1c7d85699@mail.gmail.com>

Richard (>), Carl (>>):
>> I'm using RichSequenceIterator to read FASTA files containing
>> proteins. Somehow it doesn't work when the protein sequences are in
>> lowercase, which they sometimes are when downloaded from e.g. Uniprot.
>> My code fails to recognize the following file as containing a protein
>> sequence:
>>
>> >OPSD_FELCA
>> mngtegpnfyvpfsnktgvvrspfeypqyylaepwqfsmlaaymfllivlgfpinfltlyvtvqhkklrtplnyilln
>> lavadlfmvfggftttlytslhgyfvfgptgcnlegffatlggeialwslvvlaieryvvvckpmsnfrfgenhaimgv
>> aftwvmalacaapplvgwsryipegmqcscgidyytlkpevnnesfviymfvvhftipmiviffcygqlvftvkeaaaq
>> qqesattqkaekevtrmviimviaflicwvpyasvafyifthqgsnfgpifmtlpaffaksssiynpviyimmnkqfrn
>> cmlttlccgknplgddeasttgsktetsqvapa
>>
>> What am I missing? Here's the code I'm using to read in sequences:
>>
>>     private List sequencesFromInputStream(InputStream stream) {
>>
>>         BufferedInputStream bufferedStream = new BufferedInputStream(stream);
>>         Namespace ns = RichObjectFactory.getDefaultNamespace();
>>         RichSequenceIterator seqit = null;
>>
>>         try {
>>             seqit = RichSequence.IOTools.readStream(bufferedStream, ns);
>>         } catch (IOException e) {
>>             logger.error("Couldn't read sequences from file", e);
>>             return Collections.emptyList();
>>         }
>>
>>         List sequences = new ArrayList();
>>         try {
>>             while ( seqit.hasNext() ) {
>>                 RichSequence rseq;
>>                 rseq = seqit.nextRichSequence(); // *error occurs here*
>>                 if (rseq == null)
>>                     continue;
>>                 String alphabet = rseq.getAlphabet().getName();
>>                 sequences.add(
>>                     "DNA".equals(alphabet) ? new BiojavaDNA(rseq)
>>                   : "RNA".equals(alphabet) ? new BiojavaRNA(rseq)
>>                   :                          new BiojavaProtein(rseq) );
>>             }
>>         } catch (NoSuchElementException e) {
>>             logger.error("Read past last sequence", e);
>>         } catch (BioException e) {
>>             logger.error(e); // *ends up here*
>>         }
>>
>>         return sequences;
>>     }
>>
>> Grateful for any pointers you might have.
>
> Could you post the output from the exception stack that it generates?

org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113) at net.bioclipse.biojava.business.BiojavaManager.sequencesFromInputStream(BiojavaManager.java:314) at net.bioclipse.biojava.business.BiojavaManager.sequencesFromFile(BiojavaManager.java:291) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at net.bioclipse.managers.business.AbstractManagerMethodDispatcher.doInvoke(AbstractManagerMethodDispatcher.java:243) at net.bioclipse.managers.business.JavaManagerMethodDispatcher.doInvokeInSameThread(JavaManagerMethodDispatcher.java:248) at net.bioclipse.managers.business.AbstractManagerMethodDispatcher.invoke(AbstractManagerMethodDispatcher.java:130) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at net.bioclipse.recording.WrapInProxyAdvice.invoke(WrapInProxyAdvice.java:22) at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.springframework.osgi.service.importer.internal.aop.ServiceInvoker.doInvoke(ServiceInvoker.java:59) at org.springframework.osgi.service.importer.internal.aop.ServiceInvoker.invoke(ServiceInvoker.java:67) at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.osgi.service.importer.internal.aop.ServiceTCCLInterceptor.invoke(ServiceTCCLInterceptor.java:34) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.osgi.service.importer.support.LocalBundleContextAdvice.invoke(LocalBundleContextAdvice.java:59) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.support.DelegatingIntroductionInterceptor.doProceed(DelegatingIntroductionInterceptor.java:131) at org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke(DelegatingIntroductionInterceptor.java:119) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) at $Proxy18.invoke(Unknown Source) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.framework.adapter.AfterReturningAdviceInterceptor.invoke(AfterReturningAdviceInterceptor.java:50) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) at $Proxy20.sequencesFromFile(Unknown Source) at net.bioclipse.biojava.ui.editors.Aligner.setInput(Aligner.java:152) at net.bioclipse.biojava.ui.editors.Aligner.init(Aligner.java:138) at org.eclipse.ui.part.MultiPageEditorPart.addPage(MultiPageEditorPart.java:238) at org.eclipse.ui.part.MultiPageEditorPart.addPage(MultiPageEditorPart.java:212) at net.bioclipse.biojava.ui.editors.SequenceEditor.createPages(SequenceEditor.java:47) at 
org.eclipse.ui.part.MultiPageEditorPart.createPartControl(MultiPageEditorPart.java:357) at org.eclipse.ui.internal.EditorReference.createPartHelper(EditorReference.java:662) at org.eclipse.ui.internal.EditorReference.createPart(EditorReference.java:462) at org.eclipse.ui.internal.WorkbenchPartReference.getPart(WorkbenchPartReference.java:595) at org.eclipse.ui.internal.PartPane.setVisible(PartPane.java:313) at org.eclipse.ui.internal.presentations.PresentablePart.setVisible(PresentablePart.java:180) at org.eclipse.ui.internal.presentations.util.PresentablePartFolder.select(PresentablePartFolder.java:270) at org.eclipse.ui.internal.presentations.util.LeftToRightTabOrder.select(LeftToRightTabOrder.java:65) at org.eclipse.ui.internal.presentations.util.TabbedStackPresentation.selectPart(TabbedStackPresentation.java:473) at org.eclipse.ui.internal.PartStack.refreshPresentationSelection(PartStack.java:1256) at org.eclipse.ui.internal.PartStack.setSelection(PartStack.java:1209) at org.eclipse.ui.internal.PartStack.showPart(PartStack.java:1608) at org.eclipse.ui.internal.PartStack.add(PartStack.java:499) at org.eclipse.ui.internal.EditorStack.add(EditorStack.java:103) at org.eclipse.ui.internal.PartStack.add(PartStack.java:485) at org.eclipse.ui.internal.EditorStack.add(EditorStack.java:112) at org.eclipse.ui.internal.EditorSashContainer.addEditor(EditorSashContainer.java:63) at org.eclipse.ui.internal.EditorAreaHelper.addToLayout(EditorAreaHelper.java:225) at org.eclipse.ui.internal.EditorAreaHelper.addEditor(EditorAreaHelper.java:213) at org.eclipse.ui.internal.EditorManager.createEditorTab(EditorManager.java:778) at org.eclipse.ui.internal.EditorManager.openEditorFromDescriptor(EditorManager.java:677) at org.eclipse.ui.internal.EditorManager.openEditor(EditorManager.java:638) at org.eclipse.ui.internal.WorkbenchPage.busyOpenEditorBatched(WorkbenchPage.java:2854) at org.eclipse.ui.internal.WorkbenchPage.busyOpenEditor(WorkbenchPage.java:2762) at 
org.eclipse.ui.internal.WorkbenchPage.access$11(WorkbenchPage.java:2754) at org.eclipse.ui.internal.WorkbenchPage$10.run(WorkbenchPage.java:2705) at org.eclipse.swt.custom.BusyIndicator.showWhile(BusyIndicator.java:70) at org.eclipse.ui.internal.WorkbenchPage.openEditor(WorkbenchPage.java:2701) at org.eclipse.ui.internal.WorkbenchPage.openEditor(WorkbenchPage.java:2685) at org.eclipse.ui.internal.WorkbenchPage.openEditor(WorkbenchPage.java:2676) at org.eclipse.ui.ide.IDE.openEditor(IDE.java:651) at org.eclipse.ui.ide.IDE.openEditor(IDE.java:610) at org.eclipse.ui.actions.OpenFileAction.openFile(OpenFileAction.java:99) at org.eclipse.ui.actions.OpenSystemEditorAction.run(OpenSystemEditorAction.java:99) at org.eclipse.ui.actions.RetargetAction.run(RetargetAction.java:221) at org.eclipse.ui.navigator.CommonNavigatorManager$3.open(CommonNavigatorManager.java:202) at org.eclipse.ui.OpenAndLinkWithEditorHelper$InternalListener.open(OpenAndLinkWithEditorHelper.java:48) at org.eclipse.jface.viewers.StructuredViewer$2.run(StructuredViewer.java:842) at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:42) at org.eclipse.core.runtime.Platform.run(Platform.java:888) at org.eclipse.ui.internal.JFaceUtil$1.run(JFaceUtil.java:48) at org.eclipse.jface.util.SafeRunnable.run(SafeRunnable.java:175) at org.eclipse.jface.viewers.StructuredViewer.fireOpen(StructuredViewer.java:840) at org.eclipse.jface.viewers.StructuredViewer.handleOpen(StructuredViewer.java:1101) at org.eclipse.ui.navigator.CommonViewer.handleOpen(CommonViewer.java:467) at org.eclipse.jface.viewers.StructuredViewer$6.handleOpen(StructuredViewer.java:1205) at org.eclipse.jface.util.OpenStrategy.fireOpenEvent(OpenStrategy.java:264) at org.eclipse.jface.util.OpenStrategy.access$2(OpenStrategy.java:258) at org.eclipse.jface.util.OpenStrategy$1.handleEvent(OpenStrategy.java:298) at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:84) at org.eclipse.swt.widgets.Display.sendEvent(Display.java:3543) at 
org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1250) at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1273) at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1258) at org.eclipse.swt.widgets.Widget.notifyListeners(Widget.java:1079) at org.eclipse.swt.widgets.Display.runDeferredEvents(Display.java:3441) at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3100) at org.eclipse.ui.internal.Workbench.runEventLoop(Workbench.java:2405) at org.eclipse.ui.internal.Workbench.runUI(Workbench.java:2369) at org.eclipse.ui.internal.Workbench.access$4(Workbench.java:2221) at org.eclipse.ui.internal.Workbench$5.run(Workbench.java:500) at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:332) at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:493) at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:149) at net.bioclipse.ui.Application.start(Application.java:36) at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:194) at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:110) at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:79) at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:368) at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:179) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:559) at org.eclipse.equinox.launcher.Main.basicRun(Main.java:514) at org.eclipse.equinox.launcher.Main.run(Main.java:1311) at org.eclipse.equinox.launcher.Main.main(Main.java:1287) Caused by: org.biojava.bio.seq.io.ParseException: A Exception Has 
Occurred During Parsing. Please submit the details that follow to biojava-l at biojava.org or post a bug report to http://bugzilla.open-bio.org/ Format_object=org.biojavax.bio.seq.io.FastaFormat Accession=OPSD_FELCA Id=null Comments=problem parsing symbols Parse_block=mngtegpnfyvpfsnktgvvrspfeypqyylaepwqfsmlaaymfllivlgfpinfltlyvtvqhkklrtplnyillnlavadlfmvfggftttlytslhgyfvfgptgcnlegffatlggeialwslvvlaieryvvvckpmsnfrfgenhaimgvaftwvmalacaapplvgwsryipegmqcscgidyytlkpevnnesfviymfvvhftipmiviffcygqlvftvkeaaaqqqesattqkaekevtrmviimviaflicwvpyasvafyifthqgsnfgpifmtlpaffaksssiynpviyimmnkqfrncmlttlccgknplgddeasttgsktetsqvapa Stack trace follows .... at org.biojavax.bio.seq.io.FastaFormat.readRichSequence(FastaFormat.java:244) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) ... 114 more Caused by: org.biojava.bio.symbol.IllegalSymbolException: This tokenization doesn't contain character: 'e' at org.biojava.bio.seq.io.CharacterTokenization.parseTokenChar(CharacterTokenization.java:175) at org.biojava.bio.seq.io.CharacterTokenization$TPStreamParser.characters(CharacterTokenization.java:246) at org.biojava.bio.symbol.SimpleSymbolList.(SimpleSymbolList.java:178) at org.biojavax.bio.seq.io.FastaFormat.readRichSequence(FastaFormat.java:237) ... 115 more // Carl From holland at eaglegenomics.com Fri Nov 6 12:15:28 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 6 Nov 2009 17:15:28 +0000 Subject: [Biojava-l] How do I read a FASTA file containing protein sequences in lowercase? In-Reply-To: <16d769b70911060854o69705434x574ce2d1c7d85699@mail.gmail.com> References: <16d769b70911060825v298529b5m805c1f7fd388549b@mail.gmail.com> <16d769b70911060854o69705434x574ce2d1c7d85699@mail.gmail.com> Message-ID: <179C4DF7-69AF-4E65-8E6A-F984DF7CCE69@eaglegenomics.com> Ah OK I see what's going on. 
The convenience method you're using, RichSequence.IOTools.readStream(), uses FastaFormat to try to guess the alphabet from the first line of the input sequence. FastaFormat does this by searching that line for symbols that cannot be DNA. The search is case-sensitive: protected static final Pattern aminoAcids = Pattern.compile(".*[FLIPQE].*"); FastaFormat needs patching to make this pattern case-insensitive. Even then, if none of the above symbols appears until the second or a later line, the guess will fail: the parser will assume DNA and give you the same error as before. When you know the alphabet in advance, it's best to avoid the guessing algorithm and instead use methods such as readFastaDNA that explicitly specify the alphabet you want to read. However, there is still one thing you definitely can't do: parse different types of sequence from the same input without adding code that detects which alphabet each individual sequence uses before handing it to the appropriate BioJava parser. Your code appears to be expecting mixed input, but this won't work unless all the sequences happen to use the same alphabet. cheers, Richard On 6 Nov 2009, at 16:54, Carl Mäsak wrote: > Richard (>), Carl (>>): >>> I'm using RichSequenceIterator to read FASTA files containing >>> proteins. Somehow it doesn't work when the protein sequences are in >>> lowercase, which they sometimes are when downloaded from e.g. >>> Uniprot.
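The guessing behaviour described in the reply above can be reproduced with java.util.regex alone. This is a minimal sketch, assuming the pattern is the one quoted in that reply; the class name AlphabetGuessSketch, the helper looksLikeProtein and the PATCHED_PATTERN variant are hypothetical illustrations, not BioJava code and not the actual patch.

```java
import java.util.regex.Pattern;

class AlphabetGuessSketch {
    // Pattern as quoted in the thread: only uppercase F, L, I, P, Q, E count as "not DNA".
    static final Pattern QUOTED_PATTERN = Pattern.compile(".*[FLIPQE].*");

    // Hypothetical patched variant: same character class, matched case-insensitively.
    static final Pattern PATCHED_PATTERN =
            Pattern.compile(".*[FLIPQE].*", Pattern.CASE_INSENSITIVE);

    // True if the given first sequence line contains a residue the pattern treats as protein-only.
    static boolean looksLikeProtein(Pattern pattern, String firstLine) {
        return pattern.matcher(firstLine).matches();
    }

    public static void main(String[] args) {
        // First sequence line of the OPSD_FELCA record from the thread.
        String first =
            "mngtegpnfyvpfsnktgvvrspfeypqyylaepwqfsmlaaymfllivlgfpinfltlyvtvqhkklrtplnyilln";

        // Case-sensitive search finds no uppercase residue, so the line is taken for DNA.
        System.out.println("quoted pattern:  " + looksLikeProtein(QUOTED_PATTERN, first));
        // Case-insensitive search sees 'e', 'f', 'p', ... and recognises protein.
        System.out.println("patched pattern: " + looksLikeProtein(PATCHED_PATTERN, first));
        // A line without any of f, l, i, p, q, e is still guessed as DNA either way.
        System.out.println("dna-like line:   " + looksLikeProtein(PATCHED_PATTERN, "acgtacgtnnacgt"));
    }
}
```

Even the patched pattern only inspects one line, so it remains a guess. When the alphabet is known in advance, the thread's advice is to skip the guess and call an alphabet-specific reader such as readFastaDNA; for protein input the analogous RichSequence.IOTools method should be readFastaProtein, but check the Javadoc of your BioJava version.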
>>> My code fails to recognize the following file as containing a >>> protein >>> sequence: >>> >>>> OPSD_FELCA >>> >>> >>> mngtegpnfyvpfsnktgvvrspfeypqyylaepwqfsmlaaymfllivlgfpinfltlyvtvqhkklrtplnyilln >>> >>> lavadlfmvfggftttlytslhgyfvfgptgcnlegffatlggeialwslvvlaieryvvvckpmsnfrfgenhaimgv >>> >>> aftwvmalacaapplvgwsryipegmqcscgidyytlkpevnnesfviymfvvhftipmiviffcygqlvftvkeaaaq >>> >>> qqesattqkaekevtrmviimviaflicwvpyasvafyifthqgsnfgpifmtlpaffaksssiynpviyimmnkqfrn >>> cmlttlccgknplgddeasttgsktetsqvapa >>> >>> What am I missing? Here's the code I'm using to read in sequences: >>> >>> private List sequencesFromInputStream(InputStream >>> stream) { >>> >>> BufferedInputStream bufferedStream = new >>> BufferedInputStream(stream); >>> Namespace ns = RichObjectFactory.getDefaultNamespace(); >>> RichSequenceIterator seqit = null; >>> >>> try { >>> seqit = RichSequence.IOTools.readStream(bufferedStream, >>> ns); >>> } catch (IOException e) { >>> logger.error("Couldn't read sequences from file", e); >>> return Collections.emptyList(); >>> } >>> >>> List sequences = new ArrayList(); >>> try { >>> while ( seqit.hasNext() ) { >>> RichSequence rseq; >>> rseq = seqit.nextRichSequence(); // *error occurs >>> here* >>> if (rseq == null) >>> continue; >>> String alphabet = rseq.getAlphabet().getName(); >>> sequences.add( >>> "DNA".equals(alphabet) ? new BiojavaDNA(rseq) >>> : "RNA".equals(alphabet) ? new BiojavaRNA(rseq) >>> : new BiojavaProtein >>> (rseq) ); >>> } >>> } catch (NoSuchElementException e) { >>> logger.error("Read past last sequence", e); >>> } catch (BioException e) { >>> logger.error(e); // *ends up here* >>> } >>> >>> return sequences; >>> } >>> >>> Grateful for any pointers you might have. >> >> Could you post the output from the exception stack that it generates? 
> > org.biojava.bio.BioException: Could not read sequence > at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence > (RichStreamReader.java:113) > at > net.bioclipse.biojava.business.BiojavaManager.sequencesFromInputStream > (BiojavaManager.java:314) > at net.bioclipse.biojava.business.BiojavaManager.sequencesFromFile > (BiojavaManager.java:291) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke > (NativeMethodAccessorImpl.java:39) > at sun.reflect.DelegatingMethodAccessorImpl.invoke > (DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > net.bioclipse.managers.business.AbstractManagerMethodDispatcher.doInvoke > (AbstractManagerMethodDispatcher.java:243) > at > net.bioclipse.managers.business.JavaManagerMethodDispatcher.doInvokeInSameThread > (JavaManagerMethodDispatcher.java:248) > at > net.bioclipse.managers.business.AbstractManagerMethodDispatcher.invoke > (AbstractManagerMethodDispatcher.java:130) > at > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed > (ReflectiveMethodInvocation.java:171) > at net.bioclipse.recording.WrapInProxyAdvice.invoke > (WrapInProxyAdvice.java:22) > at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke > (DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.springframework.osgi.service.importer.internal.aop.ServiceInvoker.doInvoke > (ServiceInvoker.java:59) > at > org.springframework.osgi.service.importer.internal.aop.ServiceInvoker.invoke > (ServiceInvoker.java:67) > at > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed > (ReflectiveMethodInvocation.java:171) > at > org.springframework.osgi.service.importer.internal.aop.ServiceTCCLInterceptor.invoke > (ServiceTCCLInterceptor.java:34) > at > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed > 
(ReflectiveMethodInvocation.java:171) > at > org.springframework.osgi.service.importer.support.LocalBundleContextAdvice.invoke > (LocalBundleContextAdvice.java:59) > at > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed > (ReflectiveMethodInvocation.java:171) > at > org.springframework.aop.support.DelegatingIntroductionInterceptor.doProceed > (DelegatingIntroductionInterceptor.java:131) > at > org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke > (DelegatingIntroductionInterceptor.java:119) > at > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed > (ReflectiveMethodInvocation.java:171) > at org.springframework.aop.framework.JdkDynamicAopProxy.invoke > (JdkDynamicAopProxy.java:204) > at $Proxy18.invoke(Unknown Source) > at > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed > (ReflectiveMethodInvocation.java:171) > at > org.springframework.aop.framework.adapter.AfterReturningAdviceInterceptor.invoke > (AfterReturningAdviceInterceptor.java:50) > at > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed > (ReflectiveMethodInvocation.java:171) > at org.springframework.aop.framework.JdkDynamicAopProxy.invoke > (JdkDynamicAopProxy.java:204) > at $Proxy20.sequencesFromFile(Unknown Source) > at net.bioclipse.biojava.ui.editors.Aligner.setInput(Aligner.java: > 152) > at net.bioclipse.biojava.ui.editors.Aligner.init(Aligner.java:138) > at org.eclipse.ui.part.MultiPageEditorPart.addPage > (MultiPageEditorPart.java:238) > at org.eclipse.ui.part.MultiPageEditorPart.addPage > (MultiPageEditorPart.java:212) > at net.bioclipse.biojava.ui.editors.SequenceEditor.createPages > (SequenceEditor.java:47) > at org.eclipse.ui.part.MultiPageEditorPart.createPartControl > (MultiPageEditorPart.java:357) > at org.eclipse.ui.internal.EditorReference.createPartHelper > (EditorReference.java:662) > at org.eclipse.ui.internal.EditorReference.createPart > (EditorReference.java:462) > at 
org.eclipse.ui.internal.WorkbenchPartReference.getPart > (WorkbenchPartReference.java:595) > at org.eclipse.ui.internal.PartPane.setVisible(PartPane.java:313) > at org.eclipse.ui.internal.presentations.PresentablePart.setVisible > (PresentablePart.java:180) > at > org.eclipse.ui.internal.presentations.util.PresentablePartFolder.select > (PresentablePartFolder.java:270) > at > org.eclipse.ui.internal.presentations.util.LeftToRightTabOrder.select > (LeftToRightTabOrder.java:65) > at > org.eclipse.ui.internal.presentations.util.TabbedStackPresentation.selectPart > (TabbedStackPresentation.java:473) > at org.eclipse.ui.internal.PartStack.refreshPresentationSelection > (PartStack.java:1256) > at org.eclipse.ui.internal.PartStack.setSelection(PartStack.java: > 1209) > at org.eclipse.ui.internal.PartStack.showPart(PartStack.java:1608) > at org.eclipse.ui.internal.PartStack.add(PartStack.java:499) > at org.eclipse.ui.internal.EditorStack.add(EditorStack.java:103) > at org.eclipse.ui.internal.PartStack.add(PartStack.java:485) > at org.eclipse.ui.internal.EditorStack.add(EditorStack.java:112) > at org.eclipse.ui.internal.EditorSashContainer.addEditor > (EditorSashContainer.java:63) > at org.eclipse.ui.internal.EditorAreaHelper.addToLayout > (EditorAreaHelper.java:225) > at org.eclipse.ui.internal.EditorAreaHelper.addEditor > (EditorAreaHelper.java:213) > at org.eclipse.ui.internal.EditorManager.createEditorTab > (EditorManager.java:778) > at org.eclipse.ui.internal.EditorManager.openEditorFromDescriptor > (EditorManager.java:677) > at org.eclipse.ui.internal.EditorManager.openEditor > (EditorManager.java:638) > at org.eclipse.ui.internal.WorkbenchPage.busyOpenEditorBatched > (WorkbenchPage.java:2854) > at org.eclipse.ui.internal.WorkbenchPage.busyOpenEditor > (WorkbenchPage.java:2762) > at org.eclipse.ui.internal.WorkbenchPage.access$11 > (WorkbenchPage.java:2754) > at org.eclipse.ui.internal.WorkbenchPage$10.run(WorkbenchPage.java: > 2705) > at 
org.eclipse.swt.custom.BusyIndicator.showWhile > (BusyIndicator.java:70) > at org.eclipse.ui.internal.WorkbenchPage.openEditor > (WorkbenchPage.java:2701) > at org.eclipse.ui.internal.WorkbenchPage.openEditor > (WorkbenchPage.java:2685) > at org.eclipse.ui.internal.WorkbenchPage.openEditor > (WorkbenchPage.java:2676) > at org.eclipse.ui.ide.IDE.openEditor(IDE.java:651) > at org.eclipse.ui.ide.IDE.openEditor(IDE.java:610) > at org.eclipse.ui.actions.OpenFileAction.openFile > (OpenFileAction.java:99) > at org.eclipse.ui.actions.OpenSystemEditorAction.run > (OpenSystemEditorAction.java:99) > at org.eclipse.ui.actions.RetargetAction.run(RetargetAction.java:221) > at org.eclipse.ui.navigator.CommonNavigatorManager$3.open > (CommonNavigatorManager.java:202) > at org.eclipse.ui.OpenAndLinkWithEditorHelper$InternalListener.open > (OpenAndLinkWithEditorHelper.java:48) > at org.eclipse.jface.viewers.StructuredViewer$2.run > (StructuredViewer.java:842) > at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:42) > at org.eclipse.core.runtime.Platform.run(Platform.java:888) > at org.eclipse.ui.internal.JFaceUtil$1.run(JFaceUtil.java:48) > at org.eclipse.jface.util.SafeRunnable.run(SafeRunnable.java:175) > at org.eclipse.jface.viewers.StructuredViewer.fireOpen > (StructuredViewer.java:840) > at org.eclipse.jface.viewers.StructuredViewer.handleOpen > (StructuredViewer.java:1101) > at org.eclipse.ui.navigator.CommonViewer.handleOpen > (CommonViewer.java:467) > at org.eclipse.jface.viewers.StructuredViewer$6.handleOpen > (StructuredViewer.java:1205) > at org.eclipse.jface.util.OpenStrategy.fireOpenEvent > (OpenStrategy.java:264) > at org.eclipse.jface.util.OpenStrategy.access$2(OpenStrategy.java: > 258) > at org.eclipse.jface.util.OpenStrategy$1.handleEvent > (OpenStrategy.java:298) > at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:84) > at org.eclipse.swt.widgets.Display.sendEvent(Display.java:3543) > at 
org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1250) > at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1273) > at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1258) > at org.eclipse.swt.widgets.Widget.notifyListeners(Widget.java:1079) > at org.eclipse.swt.widgets.Display.runDeferredEvents(Display.java: > 3441) > at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3100) > at org.eclipse.ui.internal.Workbench.runEventLoop(Workbench.java: > 2405) > at org.eclipse.ui.internal.Workbench.runUI(Workbench.java:2369) > at org.eclipse.ui.internal.Workbench.access$4(Workbench.java:2221) > at org.eclipse.ui.internal.Workbench$5.run(Workbench.java:500) > at org.eclipse.core.databinding.observable.Realm.runWithDefault > (Realm.java:332) > at org.eclipse.ui.internal.Workbench.createAndRunWorkbench > (Workbench.java:493) > at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java: > 149) > at net.bioclipse.ui.Application.start(Application.java:36) > at org.eclipse.equinox.internal.app.EclipseAppHandle.run > (EclipseAppHandle.java:194) > at > org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication > (EclipseAppLauncher.java:110) > at > org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start > (EclipseAppLauncher.java:79) > at org.eclipse.core.runtime.adaptor.EclipseStarter.run > (EclipseStarter.java:368) > at org.eclipse.core.runtime.adaptor.EclipseStarter.run > (EclipseStarter.java:179) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke > (NativeMethodAccessorImpl.java:39) > at sun.reflect.DelegatingMethodAccessorImpl.invoke > (DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:559) > at org.eclipse.equinox.launcher.Main.basicRun(Main.java:514) > at org.eclipse.equinox.launcher.Main.run(Main.java:1311) > at 
org.eclipse.equinox.launcher.Main.main(Main.java:1287) > Caused by: org.biojava.bio.seq.io.ParseException: > > A Exception Has Occurred During Parsing. > Please submit the details that follow to biojava-l at biojava.org or post > a bug report to http://bugzilla.open-bio.org/ > > Format_object=org.biojavax.bio.seq.io.FastaFormat > Accession=OPSD_FELCA > Id=null > Comments=problem parsing symbols > Parse_block > = > mngtegpnfyvpfsnktgvvrspfeypqyylaepwqfsmlaaymfllivlgfpinfltlyvtvqhkklrtplnyillnlavadlfmvfggftttlytslhgyfvfgptgcnlegffatlggeialwslvvlaieryvvvckpmsnfrfgenhaimgvaftwvmalacaapplvgwsryipegmqcscgidyytlkpevnnesfviymfvvhftipmiviffcygqlvftvkeaaaqqqesattqkaekevtrmviimviaflicwvpyasvafyifthqgsnfgpifmtlpaffaksssiynpviyimmnkqfrncmlttlccgknplgddeasttgsktetsqvapa > Stack trace follows .... > > > at org.biojavax.bio.seq.io.FastaFormat.readRichSequence > (FastaFormat.java:244) > at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence > (RichStreamReader.java:110) > ... 114 more > Caused by: org.biojava.bio.symbol.IllegalSymbolException: This > tokenization doesn't contain character: 'e' > at org.biojava.bio.seq.io.CharacterTokenization.parseTokenChar > (CharacterTokenization.java:175) > at org.biojava.bio.seq.io.CharacterTokenization > $TPStreamParser.characters(CharacterTokenization.java:246) > at org.biojava.bio.symbol.SimpleSymbolList. > (SimpleSymbolList.java:178) > at org.biojavax.bio.seq.io.FastaFormat.readRichSequence > (FastaFormat.java:237) > ... 115 more > > // Carl -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From holland at eaglegenomics.com Fri Nov 6 11:35:24 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 6 Nov 2009 16:35:24 +0000 Subject: [Biojava-l] How do I read a FASTA file containing protein sequences in lowercase? 
In-Reply-To: <16d769b70911060825v298529b5m805c1f7fd388549b@mail.gmail.com> References: <16d769b70911060825v298529b5m805c1f7fd388549b@mail.gmail.com> Message-ID: Could you post the output from the exception stack that it generates? thanks, Richard On 6 Nov 2009, at 16:25, Carl Mäsak wrote: > I'm using RichSequenceIterator to read FASTA files containing > proteins. Somehow it doesn't work when the protein sequences are in > lowercase, which they sometimes are when downloaded from e.g. Uniprot. > My code fails to recognize the following file as containing a protein > sequence: > >> OPSD_FELCA > mngtegpnfyvpfsnktgvvrspfeypqyylaepwqfsmlaaymfllivlgfpinfltlyvtvqhkklrtplnyilln > lavadlfmvfggftttlytslhgyfvfgptgcnlegffatlggeialwslvvlaieryvvvckpmsnfrfgenhaimgv > aftwvmalacaapplvgwsryipegmqcscgidyytlkpevnnesfviymfvvhftipmiviffcygqlvftvkeaaaq > qqesattqkaekevtrmviimviaflicwvpyasvafyifthqgsnfgpifmtlpaffaksssiynpviyimmnkqfrn > cmlttlccgknplgddeasttgsktetsqvapa > > What am I missing? Here's the code I'm using to read in sequences: > > private List sequencesFromInputStream(InputStream > stream) { > > BufferedInputStream bufferedStream = new BufferedInputStream > (stream); > Namespace ns = RichObjectFactory.getDefaultNamespace(); > RichSequenceIterator seqit = null; > > try { > seqit = RichSequence.IOTools.readStream(bufferedStream, > ns); > } catch (IOException e) { > logger.error("Couldn't read sequences from file", e); > return Collections.emptyList(); > } > > List sequences = new ArrayList(); > try { > while ( seqit.hasNext() ) { > RichSequence rseq; > rseq = seqit.nextRichSequence(); // *error occurs > here* > if (rseq == null) > continue; > String alphabet = rseq.getAlphabet().getName(); > sequences.add( > "DNA".equals(alphabet) ? new BiojavaDNA(rseq) > : "RNA".equals(alphabet) ?
new BiojavaRNA(rseq) > : new BiojavaProtein > (rseq) ); > } > } catch (NoSuchElementException e) { > logger.error("Read past last sequence", e); > } catch (BioException e) { > logger.error(e); // *ends up here* > } > > return sequences; > } > > Grateful for any pointers you might have. > > Regards, > // Carl Mäsak > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From andylu0320 at gmail.com Sat Nov 7 13:06:39 2009 From: andylu0320 at gmail.com (Andy Lu) Date: Sat, 7 Nov 2009 13:06:39 -0500 Subject: [Biojava-l] Bio Java installation inquiry Message-ID: <4a1a3f7d0911071006i45d25a71jfc4bfffcc46ae839@mail.gmail.com> Hi, I am able to get JMol to run on Eclipse, but I am having a lot of trouble getting biojava to run; I am not sure how to set up the class path, etc. I am new to using Eclipse and biojava. Are there step-by-step instructions available online? Any help would be greatly appreciated! From andreas at sdsc.edu Sat Nov 7 13:14:17 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Sat, 7 Nov 2009 10:14:17 -0800 Subject: [Biojava-l] Bio Java installation inquiry In-Reply-To: <4a1a3f7d0911071006i45d25a71jfc4bfffcc46ae839@mail.gmail.com> References: <4a1a3f7d0911071006i45d25a71jfc4bfffcc46ae839@mail.gmail.com> Message-ID: <59a41c430911071014t2aeb0646u7d1cc237e1a61180@mail.gmail.com> Hi Andy, The best thing is to download the jar files from http://biojava.org/wiki/BioJava:Download . Probably the easiest way to get started is to create a new project in Eclipse and right-click on the project -> Properties -> Java build path -> Libraries -> Add jars. Then your project will know where to find the dependencies and you can start writing your own code.
Andreas On Sat, Nov 7, 2009 at 10:06 AM, Andy Lu wrote: > Hi, I am able to get JMol to run on Eclipse, but I am having a lot of > trouble getting biojava to run, I am not sure how to set up all of the > class > path, etc. > I am new to using Eclipse and biojava. Is there a specific step by step > instruction online available? > > Any help would be greatly appreciated! > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From andreas at sdsc.edu Sat Nov 7 13:40:59 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Sat, 7 Nov 2009 10:40:59 -0800 Subject: [Biojava-l] Bio Java installation inquiry In-Reply-To: <4a1a3f7d0911071033j20dcd234rfd287998e7fab603@mail.gmail.com> References: <4a1a3f7d0911071006i45d25a71jfc4bfffcc46ae839@mail.gmail.com> <59a41c430911071014t2aeb0646u7d1cc237e1a61180@mail.gmail.com> <4a1a3f7d0911071033j20dcd234rfd287998e7fab603@mail.gmail.com> Message-ID: <59a41c430911071040i2b574d0ak2af98dbf22c1ab6a@mail.gmail.com> You don't need to have Jmol in the classpath for running biojava, but if you do, you can use the jmol/biojava interface contained in the protein structure modules. In that case JmolApplet.jar would be sufficient; you don't need to check out the Jmol source... Andreas On Sat, Nov 7, 2009 at 10:33 AM, Andy Lu wrote: > O I see, but don't I also need to have all of the JMol java files set up > first, or the BioJava jar file contains everything I need? > > > On Sat, Nov 7, 2009 at 1:14 PM, Andreas Prlic wrote: > >> Hi Andy, >> >> best thing is to download the jar files from >> http://biojava.org/wiki/BioJava:Download . >> >> Proably the easiest way to get started is to create a new project in >> eclipse and right click on the project-> Properties -> Java build path -> >> Libraries -> Add jars >> >> Then your project will know how where to find the dependencies and you >> can start writing your own code.
>> >> Andreas >> >> >> On Sat, Nov 7, 2009 at 10:06 AM, Andy Lu wrote: >> >>> Hi, I am able to get JMol to run on Eclipse, but I am having a lot of >>> trouble getting biojava to run, I am not sure how to set up all of the >>> class >>> path, etc. >>> I am new to using Eclipse and biojava. Is there a specific step by step >>> instruction online available? >>> >>> Any help would be greatly appreciated! >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >> >> > > > -- > Andy Lu > From andy.law at roslin.ed.ac.uk Sat Nov 7 15:21:59 2009 From: andy.law at roslin.ed.ac.uk (andy law (RI)) Date: Sat, 7 Nov 2009 20:21:59 +0000 Subject: [Biojava-l] Bio Java installation inquiry In-Reply-To: <59a41c430911071014t2aeb0646u7d1cc237e1a61180@mail.gmail.com> References: <4a1a3f7d0911071006i45d25a71jfc4bfffcc46ae839@mail.gmail.com>, <59a41c430911071014t2aeb0646u7d1cc237e1a61180@mail.gmail.com> Message-ID: <2FA0B8F4EACC05449112A4C02C6DACC00431C3FC17@ebrcexch1.ebrc.bbsrc.ac.uk> Andreas, When will the mavenised version of biojava be officially released? Later, Andy ________________________________________ From: biojava-l-bounces at lists.open-bio.org [biojava-l-bounces at lists.open-bio.org] On Behalf Of Andreas Prlic [andreas at sdsc.edu] Sent: 07 November 2009 18:14 To: Andy Lu Cc: biojava-l at biojava.org Subject: Re: [Biojava-l] Bio Java installation inquiry Hi Andy, best thing is to download the jar files from http://biojava.org/wiki/BioJava:Download . Proably the easiest way to get started is to create a new project in eclipse and right click on the project-> Properties -> Java build path -> Libraries -> Add jars Then your project will know how where to find the dependencies and you can start writing your own code. 
Andreas On Sat, Nov 7, 2009 at 10:06 AM, Andy Lu wrote: > Hi, I am able to get JMol to run on Eclipse, but I am having a lot of > trouble getting biojava to run, I am not sure how to set up all of the > class > path, etc. > I am new to using Eclipse and biojava. Is there a specific step by step > instruction online available? > > Any help would be greatly appreciated! > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-l
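One way to confirm that the build-path steps described in this thread worked is to ask the JVM whether a BioJava class is visible. This is a minimal sketch, assuming the BioJava 1.x jars, which contain org.biojavax.bio.seq.io.FastaFormat (the class named in the stack trace earlier in this archive); ClasspathCheckSketch and isOnClasspath are hypothetical names used only for illustration.

```java
class ClasspathCheckSketch {
    // Returns true if the named class can be located by this class's class loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheckSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class missing, or found but one of its own dependencies could not be linked.
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the stack trace in the FASTA thread; assumed to ship in the BioJava jar.
        String probe = "org.biojavax.bio.seq.io.FastaFormat";
        if (isOnClasspath(probe)) {
            System.out.println(probe + " found: the BioJava jar is on the classpath");
        } else {
            System.out.println(probe + " NOT found: check Properties -> Java build path -> Libraries");
        }
    }
}
```

Run it from inside the Eclipse project whose build path was edited; running it from a different project or from a plain command line without -cp would test a different classpath.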
From andreas at sdsc.edu Sun Nov 8 01:52:00 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Sat, 7 Nov 2009 22:52:00 -0800 Subject: [Biojava-l] Bio Java installation inquiry In-Reply-To: <2FA0B8F4EACC05449112A4C02C6DACC00431C3FC17@ebrcexch1.ebrc.bbsrc.ac.uk> References: <4a1a3f7d0911071006i45d25a71jfc4bfffcc46ae839@mail.gmail.com> <59a41c430911071014t2aeb0646u7d1cc237e1a61180@mail.gmail.com> <2FA0B8F4EACC05449112A4C02C6DACC00431C3FC17@ebrcexch1.ebrc.bbsrc.ac.uk> Message-ID: <59a41c430911072252j40e22912k85886a5d61427bba@mail.gmail.com> Hi Andy, At present the plan is to spend some more time working on the modules and then make a release (called 3.0) at some point shortly after the hackathon in Cambridge in January. Early adopters can already use the modules via SVN. Andreas On Sat, Nov 7, 2009 at 12:21 PM, andy law (RI) wrote: > Andreas, > > When will the mavenised version of biojava be officially released? > > Later, > > Andy > ________________________________________ > From: biojava-l-bounces at lists.open-bio.org [ > biojava-l-bounces at lists.open-bio.org] On Behalf Of Andreas Prlic [ > andreas at sdsc.edu] > Sent: 07 November 2009 18:14 > To: Andy Lu > Cc: biojava-l at biojava.org > Subject: Re: [Biojava-l] Bio Java installation inquiry > > Hi Andy, > > best thing is to download the jar files from > http://biojava.org/wiki/BioJava:Download . > > Proably the easiest way to get started is to create a new project in > eclipse > and right click on the project-> Properties -> Java build path -> Libraries > -> Add jars > > Then your project will know how where to find the dependencies and you can > start writing your own code. > > Andreas > > > On Sat, Nov 7, 2009 at 10:06 AM, Andy Lu wrote: > > > Hi, I am able to get JMol to run on Eclipse, but I am having a lot of > > trouble getting biojava to run, I am not sure how to set up all of the > > class > > path, etc. > > I am new to using Eclipse and biojava.
Is there a specific step by step > > instruction online available? > > > > Any help would be greatly appreciated! > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From jbdundas at gmail.com Tue Nov 10 09:23:10 2009 From: jbdundas at gmail.com (jitesh dundas) Date: Tue, 10 Nov 2009 19:53:10 +0530 Subject: [Biojava-l] Java Error:- XML Parsing Error: XML or text declaration not at start of entity In-Reply-To: <326ea8620911080222q225cb6a4m4957be8dd5c9f91f@mail.gmail.com> References: <326ea8620911010739t1e658509h7dd33ba8482312f8@mail.gmail.com> <326ea8620911010741q3880a13g53626d94d0d2abd2@mail.gmail.com> <59a41c430911010806u62f45b90ic4a9563a27ee12e2@mail.gmail.com> <326ea8620911080222q225cb6a4m4957be8dd5c9f91f@mail.gmail.com> Message-ID: <326ea8620911100623k4daa1222s60481a9f35777c31@mail.gmail.com> Dear Friends, Thank you for your help and advice. The code at the following URL is working fine: http://gist.github.com/229248 (this is my code, uploaded by a wise group member; many thanks to him for doing that). Hope this helps. Regards, Jitesh Dundas On Sun, Nov 8, 2009 at 3:52 PM, jitesh dundas wrote: > Dear Sir, > > My program is working fine and can send me an xml file with 20 > records. However, it does not allow me to send large amounts of > records. > > For e.g. if I enter "cancer" it will return only 20 records. > > Can you please tell me what I should do next to get all those records. > Thank you in advance > > Regards, > Jitesh Dundas > > On Sun, Nov 1, 2009 at 9:36 PM, Andreas Prlic wrote: > > > > Hi Jitesh, > > > > It is hard to read your code with all the formatting off probably due to > email and many commented lines that don;t seem to get used.
Can you provide > the stacktrace, so we can see what part of biojava is affected? > > > > Probably a good strategy to write and debug this is to simply the problem > into smaller steps. Try to first download the files you want to parse and > write the code to parse them from the local file. That will avoid any > issues you might encounter with networking and server/client communication. > Once the parsing is working you could take it to the next step and add the > server communication... > > > > Andreas > > > > > > > > > > On Sun, Nov 1, 2009 at 7:41 AM, jitesh dundas > wrote: > >> > >> Hi friends, > >> > >> I am getting this error on doing a post(using the code below) to this > url-> > >> > http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=cancer&reldate=10 > >> > >> I have written this code in .jsp file. Later I will change it into > servlet. > >> > >> Error:- > >> XML Parsing Error: XML or text declaration not at start of entity > >> Location: > >> > http://localhost:8080/ProteomDb/ImportFromPubmed2.jsp?txtDbName=pubmed&txtTerm=cancer&txtreldate=10&comSDay=01&comSMonth=01&txtSYear=&comEDay=01&comEMonth=01&txtEYear=&txtURLString=http%3A%2F%2Feutils.ncbi.nlm.nih.gov%2Fentrez%2Feutils%2Fesearch.fcgi%3Fdb%3Dpubmed%26term%3Dcancer%26reldate%3D10&txtsubmit=Fetch+Data+From+NCBI > >> Line Number 11, Column 1: >> PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN" " > >> http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd > ">2034200 > >> 19877350 19877304 19877297 > >> 19877284 19877271 19877265 > >> 19877250 19877245 19877226 > >> 19877210 19877179 19877175 > >> 19877161 19877159 19877158 > >> 19877123 19877122 19877120 > >> 19877119 19877118 > >> cancer > >> "neoplasms"[MeSH Terms] OR "neoplasms"[All Fields] OR "cancer"[All > >> Fields] > >> "neoplasms"[MeSH Terms] MeSH > >> Terms 2082133 Y > >> "neoplasms"[All Fields] > All > >> Fields 1634731 Y > >> OR "cancer"[All > Fields] > >> All Fields 902537 > Y > >> OR GROUP > >> 
2009/10/22[EDAT] EDAT 0 > >> Y > >> 2009/11/01[EDAT] EDAT 0 > >> Y RANGE AND > >> ("neoplasms"[MeSH Terms] OR > >> "neoplasms"[All Fields] OR "cancer"[All Fields]) AND 2009/10/22[EDAT] : > >> 2009/11/01[EDAT] > >> ^ > >> > >> As you can see, the XML output is coming fine but the above error does > not > >> go..The output via this program should be just like hitting manually the > >> above URL in the browser.. > >> The browser is Mozilla Firefox. > >> > >> Code:- > >> > >> <%@ page language = "java" %> > >> <%@ page import = "java.sql.*" %> > >> <%@ page import = "java.util.*" %> > >> <%@ page import = "java.io.*" %> > >> <%@ page import="java.lang.*" %> > >> <%@ page import="java.net.*" %> > >> <%@ page import="java.nio.*" %> > >> <%@ page contentType="text/xml; charset=utf-8" pageEncoding="UTF-8" %> > >> > >> > >> <% > >> > >> try > >> { > >> //String str = ""; > >> //out.println(""); > >> > >> Properties systemSettings = System.getProperties(); > >> systemSettings.put("http.proxyHost", "********"); > >> systemSettings.put("http.proxyPort", "******"); > >> systemSettings.put("sun.net.client.defaultConnectTimeout", "10000"); > >> systemSettings.put("sun.net.client.defaultReadTimeout", "10000"); > >> > >> //out.println("Properties Set"); > >> Authenticator.setDefault(new Authenticator() > >> { > >> protected PasswordAuthentication getPasswordAuthentication() > >> { > >> return new PasswordAuthentication("**", > >> "******".toCharArray()); // specify ur user name password of iitb login > >> } > >> }); > >> > >> > >> System.setProperties(systemSettings); > >> //out.println("After Authentication & Properties Settings"); > >> > >> //create xml file. > >> //the input to google api > >> //String textAreaContent = request.getParameter("text"); > >> String textAreaContent = "This si a tst"; > >> > >> String str = ""; > >> > >> //xml file generation ends here.. 
> >> //FetchDataFromNCBI_URLString.jsp > >> String URLString = request.getParameter("txtURLString").trim(); > >> > >> //URL url = new URL(" > >> > http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=BAA20519 > >> "); > >> URL url = new URL(URLString); //url string taken from user input. > >> HttpURLConnection connection = null; > >> > >> connection = (HttpURLConnection) url.openConnection(); > >> System.out.println("After open connection"); > >> connection.setRequestMethod("POST"); > >> connection.setDoInput(true); > >> connection.setDoOutput(true); > >> > >> connection.setUseCaches(false); > >> connection.setAllowUserInteraction(false); > >> //connection.setFollowRedirects(true); > >> //connection.setInstanceFollowRedirects(true); > >> //System.out.println("Before-------------------"); > >> connection.setRequestProperty ("Content-Type","text/xml; > >> charset=\"utf-8\""); > >> //System.out.println("After-------------------"); > >> > >> //System.out.println(""+ connection.getOutputStream()); > >> > >> //System.out.println("After dataoutputstream..Line No-65"); > >> > >> //System.out.println("Response Code="+ connection.getResponseCode); > >> > >> OutputStreamWriter dosout = new > >> OutputStreamWriter(connection.getOutputStream()); > >> //System.out.println("After dosout object..Line No-63"); > >> //dosout.write(str); > >> dosout.close (); > >> > >> BufferedReader in = new BufferedReader( new InputStreamReader( > >> connection.getInputStream())); > >> > >> String decodedString; > >> String tempstr = ""; > >> > >> > >> while ((decodedString = in.readLine()) != null) > >> { > >> tempstr = tempstr + decodedString; > >> //out.println(decodedString); > >> } > >> out.println(tempstr); > >> in.close(); > >> } > >> catch(Exception ex) > >> { > >> out.println("Exception->"+ex); > >> PrintWriter pw = response.getWriter(); > >> ex.printStackTrace(pw); > >> } > >> > >> > >> %> > >> > >> Thanks in advance.. 
> >>
> >> Regards,
> >> JItesh Dundas
> >>
> >> _______________________________________________
> >> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/biojava-l
> >
> >

From oliver.stolpe at fu-berlin.de Thu Nov 12 08:18:52 2009
From: oliver.stolpe at fu-berlin.de (Oliver Stolpe)
Date: Thu, 12 Nov 2009 14:18:52 +0100
Subject: [Biojava-l] SeqIOTools deprecated, looking for alternatives // RichSeq.IOTools
Message-ID: <4AFC0B3C.3010503@fu-berlin.de>

Hello *,

the cookbook uses in its examples the SeqIOTools-class for reading the files. But in the API it is marked as deprecated. Now I am looking for alternatives, so I searched the list and internet and found out that biojavax provides methods and classes for reading the files (RichSequence.IOTools).

For example, I try to read an EMBL-file:

--begin:code--

BufferedReader br = new BufferedReader(new FileReader(filename));
Namespace ns = RichObjectFactory.getDefaultNamespace();
RichSequenceIterator seqs = RichSequence.IOTools.readEMBLDNA(br, ns);

while (seqs.hasNext()) {
    RichSequence seq = seqs.nextRichSequence();
    System.out.println(seq.getName() + ":" + seq.getAnnotation().asMap());
}

--end:code--

But I always get this error message:

--begin:error--

org.biojava.bio.BioException: Could not read sequence
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
    at ReadGenbankFile.EMBL(ReadGenbankFile.java:42)
    at ReadGenbankFile.main(ReadGenbankFile.java:85)
Caused by: org.biojava.bio.seq.io.ParseException:

A Exception Has Occurred During Parsing.
Please submit the details that follow to biojava-l at biojava.org or post a bug report to http://bugzilla.open-bio.org/

Format_object=org.biojavax.bio.seq.io.EMBLFormat
Accession=null
Id=not set
Comments=
Parse_block=ID AJ243265_2; parent: AJ243265AC AJ243265;FT CDS join(<1082..1272,2484..2638,4926..>5041)
/codon_start=3
/gene="PGM1"
/product="phosphoglucomutase 1"
/function="carbohydrate metabolism"
/EC_number="5.4.2.2"
/db_xref="GOA:Q9H1D2"
/db_xref="HGNC:8905"
/db_xref="HSSP:3PMG"
/db_xref="InterPro:IPR016055"
/db_xref="UniProtKB/TrEMBL:Q9H1D2"
/protein_id="CAC19809.1"
/translation="VGPYVKKILCEELGAPANSAVNCVPLEDFGGHHPDPNLTYAADLV
ETMKSGEHDFGAAFDGDGDRNMILGKHGFFVNPSDSVAVIAANTFSIPYFQQTGVRGFA
RSMPTSGALDRVASATKIALYETPTGWKFFGNLMDASKLSLCGEESFGT"SQ Sequence 462 BP;
Stack trace follows ....


    at org.biojavax.bio.seq.io.EMBLFormat.readSection(EMBLFormat.java:775)
    at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:284)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
    ... 2 more
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -3
    at java.lang.String.substring(String.java:1949)
    at java.lang.String.substring(String.java:1916)
    at org.biojavax.bio.seq.io.EMBLFormat.readSection(EMBLFormat.java:761)
    ... 4 more

--end:error--

The file looks all ok I think and works well with the deprecated SeqIOTools:

--begin:embl-file--
ID AJ243265_2; parent: AJ243265
AC AJ243265;
FT CDS join(<1082..1272,2484..2638,4926..>5041)
FT /codon_start=3
FT /gene="PGM1"
FT /product="phosphoglucomutase 1"
FT /function="carbohydrate metabolism"
FT /EC_number="5.4.2.2"
FT /db_xref="GOA:Q9H1D2"
FT /db_xref="HGNC:8905"
FT /db_xref="HSSP:3PMG"
FT /db_xref="InterPro:IPR016055"
FT /db_xref="UniProtKB/TrEMBL:Q9H1D2"
FT /protein_id="CAC19809.1"
FT /translation="VGPYVKKILCEELGAPANSAVNCVPLEDFGGHHPDPNLTYAADLV
FT ETMKSGEHDFGAAFDGDGDRNMILGKHGFFVNPSDSVAVIAANTFSIPYFQQTGVRGFA
FT RSMPTSGALDRVASATKIALYETPTGWKFFGNLMDASKLSLCGEESFGT"
SQ Sequence 462 BP;
ttgtgggacc gtatgtaaag aagatcctct gtgaagaact cggtgcccct gcgaactcgg 60
cagttaactg cgttcctctg gaggactttg gaggccacca ccctgacccc aacctcacct 120
atgcagctga cctggtggag accatgaagt caggagagca tgattttggg gctgcctttg 180
atggagatgg ggatcgaaac atgattctgg gcaagcatgg gttctttgtg aacccttcag 240
actctgtggc tgtcattgct gccaacacct tcagcattcc gtatttccag cagactgggg 300
tccgcggttt tgcacggagc atgcccacga gtggtgctct ggaccgggtg gctagtgcta 360
caaagattgc tttgtatgag accccaactg gctggaagtt ttttgggaat ttgatggacg 420
cgagcaaact gtccctttgt ggggaggaga gcttcgggac cg 462
//
--end:embl-file--

The parser always crashes before reading the sequence (ttgt..., directly after the BP;).

Any suggestions how I get this work?
Or are there other alternatives for substituting the deprecated SeqIOTools-class?

Thanks in advance,

with best regards,

Oliver

From holland at eaglegenomics.com Fri Nov 13 06:21:47 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 13 Nov 2009 11:21:47 +0000
Subject: [Biojava-l] SeqIOTools deprecated, looking for alternatives // RichSeq.IOTools
In-Reply-To: <4AFC0B3C.3010503@fu-berlin.de>
References: <4AFC0B3C.3010503@fu-berlin.de>
Message-ID: <05574914-87FC-44BB-90F1-75C79670A8EC@eaglegenomics.com>

Hello,

The file you are parsing is not a valid EMBL format file.
The EMBL format is specified here:

http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4

and this is what the file should look like for your accession:

http://www.ebi.ac.uk/cgi-bin/emblfetch?style=html&id=AJ243265&Submit=Go

The most obvious problems in your file are the absence of the required 'XX' section delimiters, and an invalid ID line. There might be other problems too but I haven't checked the whole file, just the first few lines.

The deprecated SeqIOTools really didn't care if the file was valid or not, they basically just made a copy of all the lines in an internal token/value map. They made no attempt to parse or understand the data in each line.

The new RichSequence-based parsers actually attempt to enforce the file format definitions and break down and understand the contents of each line. This means that they will reject any file that does not strictly conform to the specified format.

cheers,
Richard

On 12 Nov 2009, at 13:18, Oliver Stolpe wrote:

> Hello *,
>
> the cookbook uses in its examples the SeqIOTools-class for reading the files. But in the API it is marked as deprecated. Now I am looking for alternatives, so I searched the list and internet and found out that biojavax provides methods and classes for reading the files (RichSequence.IOTools).
>
> For example, I try to read an EMBL-file:
>
> --begin:code--
>
> BufferedReader br = new BufferedReader(new FileReader(filename));
> Namespace ns = RichObjectFactory.getDefaultNamespace();
> RichSequenceIterator seqs = RichSequence.IOTools.readEMBLDNA(br, ns);
>
> while (seqs.hasNext()) {
>     RichSequence seq = seqs.nextRichSequence();
>     System.out.println(seq.getName() + ":" + seq.getAnnotation().asMap());
> }
>
> --end:code--
>
> But I always get this error message:
>
> --begin:error--
>
> org.biojava.bio.BioException: Could not read sequence
>     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
>     at ReadGenbankFile.EMBL(ReadGenbankFile.java:42)
>     at ReadGenbankFile.main(ReadGenbankFile.java:85)
> Caused by: org.biojava.bio.seq.io.ParseException:
>
> A Exception Has Occurred During Parsing.
> Please submit the details that follow to biojava-l at biojava.org or post a bug report to http://bugzilla.open-bio.org/
>
> Format_object=org.biojavax.bio.seq.io.EMBLFormat
> Accession=null
> Id=not set
> Comments=
> Parse_block=ID AJ243265_2; parent: AJ243265AC AJ243265;FT CDS join(<1082..1272,2484..2638,4926..>5041)
> /codon_start=3
> /gene="PGM1"
> /product="phosphoglucomutase 1"
> /function="carbohydrate metabolism"
> /EC_number="5.4.2.2"
> /db_xref="GOA:Q9H1D2"
> /db_xref="HGNC:8905"
> /db_xref="HSSP:3PMG"
> /db_xref="InterPro:IPR016055"
> /db_xref="UniProtKB/TrEMBL:Q9H1D2"
> /protein_id="CAC19809.1"
> /translation="VGPYVKKILCEELGAPANSAVNCVPLEDFGGHHPDPNLTYAADLV
> ETMKSGEHDFGAAFDGDGDRNMILGKHGFFVNPSDSVAVIAANTFSIPYFQQTGVRGFA
> RSMPTSGALDRVASATKIALYETPTGWKFFGNLMDASKLSLCGEESFGT"SQ Sequence 462 BP;
> Stack trace follows ....
>
>
>     at org.biojavax.bio.seq.io.EMBLFormat.readSection(EMBLFormat.java:775)
>     at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:284)
>     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
>     ... 2 more
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -3
>     at java.lang.String.substring(String.java:1949)
>     at java.lang.String.substring(String.java:1916)
>     at org.biojavax.bio.seq.io.EMBLFormat.readSection(EMBLFormat.java:761)
>     ... 4 more
>
> --end:error--
>
> The file looks all ok I think and works well with the deprecated SeqIOTools:
>
> --begin:embl-file--
> ID AJ243265_2; parent: AJ243265
> AC AJ243265;
> FT CDS join(<1082..1272,2484..2638,4926..>5041)
> FT /codon_start=3
> FT /gene="PGM1"
> FT /product="phosphoglucomutase 1"
> FT /function="carbohydrate metabolism"
> FT /EC_number="5.4.2.2"
> FT /db_xref="GOA:Q9H1D2"
> FT /db_xref="HGNC:8905"
> FT /db_xref="HSSP:3PMG"
> FT /db_xref="InterPro:IPR016055"
> FT /db_xref="UniProtKB/TrEMBL:Q9H1D2"
> FT /protein_id="CAC19809.1"
> FT /translation="VGPYVKKILCEELGAPANSAVNCVPLEDFGGHHPDPNLTYAADLV
> FT ETMKSGEHDFGAAFDGDGDRNMILGKHGFFVNPSDSVAVIAANTFSIPYFQQTGVRGFA
> FT RSMPTSGALDRVASATKIALYETPTGWKFFGNLMDASKLSLCGEESFGT"
> SQ Sequence 462 BP;
> ttgtgggacc gtatgtaaag aagatcctct gtgaagaact cggtgcccct gcgaactcgg 60
> cagttaactg cgttcctctg gaggactttg gaggccacca ccctgacccc aacctcacct 120
> atgcagctga cctggtggag accatgaagt caggagagca tgattttggg gctgcctttg 180
> atggagatgg ggatcgaaac atgattctgg gcaagcatgg gttctttgtg aacccttcag 240
> actctgtggc tgtcattgct gccaacacct tcagcattcc gtatttccag cagactgggg 300
> tccgcggttt tgcacggagc atgcccacga gtggtgctct ggaccgggtg gctagtgcta 360
> caaagattgc tttgtatgag accccaactg gctggaagtt ttttgggaat ttgatggacg 420
> cgagcaaact gtccctttgt ggggaggaga gcttcgggac cg 462
> //
> --end:embl-file--
>
> The parser always crashes before reading the sequence (ttgt..., directly after the BP;).
>
> Any suggestions how I get this work?
> Or are there other alternatives for substituting the deprecated SeqIOTools-class?
>
> Thanks in advance,
>
> with best regards,
>
> Oliver
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From tiagoantao at gmail.com Fri Nov 13 07:25:41 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Fri, 13 Nov 2009 12:25:41 +0000
Subject: [Biojava-l] Newick/Nexus processing of non-binary trees
In-Reply-To: <6d941f120911060640v29b52bedxd120e39d092a1e88@mail.gmail.com>
References: <6d941f120911060526r14d43c43ncb5541d89a8dcaa3@mail.gmail.com>
    <120EB3C1-0043-4E8C-8637-F16B3B56094B@eaglegenomics.com>
    <6d941f120911060640v29b52bedxd120e39d092a1e88@mail.gmail.com>
Message-ID: <6d941f120911130425h31840441he04ce4f5bd88092f@mail.gmail.com>

Hi,

> My suggestion: for somebody else to verify my findings. I might be
> doing something stupidly wrong. Maybe things are correct. Just a
> simple tree like (1,2,3) (as long as it is not binary) - should expose
> the problem.
>

As nobody has answered, here is my take:

1. The error reported probably exists.
2. Most probably nobody is using the parser (as it only supports binary trees).

In this light, changing the API should not be a problem at all.

I would not mind correcting the problem (I have already corrected the previous 2 in my local version).
I would suggest removing the call to the unweighted graph. Reasons:
1. A weighted version is enough. If branch lengths are not specified, then weights could be set to 0. Then there would not be a decrease in functionality.
2. Severely reducing the size of the code is important. Clearly the code is not much maintained (and I am not offering to maintain it in the long run, just putting it in good shape) and not much used.
Therefore a smaller, easier to manage code base makes even more sense.

If you accept a solution along these lines, I would correct all the bugs and also include test code (which is also missing).

--
"The hottest places in hell are reserved for those who, in times of
moral crisis, maintain a neutrality." - Dante

From holland at eaglegenomics.com Fri Nov 13 07:42:03 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 13 Nov 2009 12:42:03 +0000
Subject: [Biojava-l] Newick/Nexus processing of non-binary trees
In-Reply-To: <6d941f120911130425h31840441he04ce4f5bd88092f@mail.gmail.com>
References: <6d941f120911060526r14d43c43ncb5541d89a8dcaa3@mail.gmail.com>
    <120EB3C1-0043-4E8C-8637-F16B3B56094B@eaglegenomics.com>
    <6d941f120911060640v29b52bedxd120e39d092a1e88@mail.gmail.com>
    <6d941f120911130425h31840441he04ce4f5bd88092f@mail.gmail.com>
Message-ID: <6467088B-93FA-48D2-A7C4-27CD238CE1AE@eaglegenomics.com>

I'm all for that. The original code was developed by a Google Summer of Code student, who we haven't heard much from since. :(

cheers,
Richard

On 13 Nov 2009, at 12:25, Tiago Antão wrote:

> Hi,
>
>> My suggestion: for somebody else to verify my findings. I might be
>> doing something stupidly wrong. Maybe things are correct. Just a
>> simple tree like (1,2,3) (as long as it is not binary) - should expose
>> the problem.
>>
>
> As nobody has answered, here is my take:
>
> 1. The error reported probably exists.
> 2. Most probably nobody is using the parser (as it only supports binary trees).
>
> In this light, changing the API should not be a problem at all.
>
> I would not mind correcting the problem (I have already corrected the
> previous 2 in my local version).
> I would suggest removing the call to the unweighted graph. Reasons:
> 1. A weighted version is enough. If branch lengths are not specified,
> then weights could be set to 0. Then there would not be a decrease in
> functionality.
> 2. Severely reducing the size of the code is important.
> Clearly the
> code is not much maintained (and I am not offering to maintain it in
> the long run, just putting it in good shape) and not much used.
> Therefore a smaller, easier to manage code base makes even more
> sense.
>
> If you accept a solution along these lines, I would correct all the
> bugs and also include test code (which is also missing).
>
>
>
> --
> "The hottest places in hell are reserved for those who, in times of
> moral crisis, maintain a neutrality." - Dante

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From thasso.griebel at uni-jena.de Fri Nov 13 09:51:41 2009
From: thasso.griebel at uni-jena.de (Thasso Griebel)
Date: Fri, 13 Nov 2009 15:51:41 +0100
Subject: [Biojava-l] Newick/Nexus processing of non-binary trees
In-Reply-To: <6d941f120911130425h31840441he04ce4f5bd88092f@mail.gmail.com>
References: <6d941f120911060526r14d43c43ncb5541d89a8dcaa3@mail.gmail.com>
    <120EB3C1-0043-4E8C-8637-F16B3B56094B@eaglegenomics.com>
    <6d941f120911060640v29b52bedxd120e39d092a1e88@mail.gmail.com>
    <6d941f120911130425h31840441he04ce4f5bd88092f@mail.gmail.com>
Message-ID:

Hi,

> 1. A weighted version is enough. If branch lengths are not specified,
> then weights could be set to 0. Then there would not be a decrease in
> functionality.

Just my two cents, but I would go with a default weight of 1.0. If you read something unweighted you would ignore the edge weights anyway, but, for example, if you write something simple that computes path lengths, a default weight of 1.0 ensures that the method also works for "unweighted" trees, where the length of a path is defined as the number of edges you need to traverse to move from, say, A to B. I think the argument also holds for other algorithms used on trees and graphs.

Anyway, just my two cents.

-thasso

--
Dipl. Inf. Thasso Griebel-------------------Lehrstuhl fuer Bioinformatik
Office 3426--http://bio.informatik.uni-jena.de--Institut fuer Informatik
Phone +49 (0)3641 9-46454-----------Friedrich-Schiller-Universitaet Jena
Fax +49 (0)3641 9-46452----------Ernst-Abbe-Platz 2, 07743 Jena, Germany

From tiagoantao at gmail.com Fri Nov 13 09:54:08 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Fri, 13 Nov 2009 14:54:08 +0000
Subject: [Biojava-l] Newick/Nexus processing of non-binary trees
In-Reply-To:
References: <6d941f120911060526r14d43c43ncb5541d89a8dcaa3@mail.gmail.com>
    <120EB3C1-0043-4E8C-8637-F16B3B56094B@eaglegenomics.com>
    <6d941f120911060640v29b52bedxd120e39d092a1e88@mail.gmail.com>
    <6d941f120911130425h31840441he04ce4f5bd88092f@mail.gmail.com>
Message-ID: <6d941f120911130654n55153f50r477cd11c281bd9a1@mail.gmail.com>

2009/11/13 Thasso Griebel :
> just my two cents, but I would go with a default weight of 1.0. If you read
> something unweighted you would ignore the edge weights anyway, but, for
> example, if you write something simple that computes path lengths, a default
> weight of 1.0 ensures that the method also works for "unweighted" trees,
> where the length of a path is defined as the number of edges you need to
> traverse to move from, say, A to B. I think the argument also holds for other
> algorithms used on trees and graphs.

OK, I will do this.

From holland at eaglegenomics.com Fri Nov 13 10:04:27 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 13 Nov 2009 15:04:27 +0000
Subject: [Biojava-l] How do I read a FASTA file containing protein sequences in lowercase?
In-Reply-To: <16d769b70911090826j135f2ddar13e4fc861b78e4fc@mail.gmail.com>
References: <16d769b70911060825v298529b5m805c1f7fd388549b@mail.gmail.com>
    <16d769b70911060854o69705434x574ce2d1c7d85699@mail.gmail.com>
    <179C4DF7-69AF-4E65-8E6A-F984DF7CCE69@eaglegenomics.com>
    <16d769b70911090826j135f2ddar13e4fc861b78e4fc@mail.gmail.com>
Message-ID: <2180C289-D2F7-4910-8534-9A94B1003941@eaglegenomics.com>

I've applied the patch to the trunk of biojava-live. Thanks!

Richard

On 9 Nov 2009, at 16:26, Carl Mäsak wrote:

> Richard (>):
>> Ah OK I see what's going on.
>>
>> The convenience method you're using, RichSequence.IOTools.readStream(), uses
>> FastaFormat to try and guess the alphabet to use based on the first line of
>> the input sequence.
>>
>> In FastaFormat, it does this by searching for matching non-DNA symbols. The
>> search is case-sensitive:
>>
>> protected static final Pattern aminoAcids =
>>     Pattern.compile(".*[FLIPQE].*");
>>
>> FastaFormat needs patching to make this pattern non-case-sensitive.
>
> Patch attached.
>
> I also took the opportunity to remove the occurrences of .* in the
> Pattern above. Generally, one should be using Matcher.find() when one
> is interested in matching a part of a string. This is more efficient
> than using Matcher.matches() and surrounding the desired regular
> expression with .*, since the latter will cause a lot of unnecessary
> backtracking and make the search quadratic.
>
> This effect only shows up for very long strings, but long strings can
> and do happen in bioinformatics. The below measurements show the
> quadratic behaviour of the former approach.
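For readers following along, the two matching styles contrasted in the quoted message can be sketched roughly as below. This is only an illustration built from the pattern quoted above: the class and method names are hypothetical, the use of Pattern.CASE_INSENSITIVE is an assumption about how "non-case-sensitive" might be achieved, and none of it is taken from the actual patch or from FastaFormat itself.

```java
import java.util.regex.Pattern;

public class AminoAcidGuessSketch {

    // Style quoted in the thread: case-sensitive, wrapped in .* and used with matches().
    static final Pattern WITH_DOT_STAR = Pattern.compile(".*[FLIPQE].*");

    // Style described in the message: no surrounding .*, used with find().
    // CASE_INSENSITIVE is an assumption, not something confirmed by the thread.
    static final Pattern WITHOUT_DOT_STAR =
            Pattern.compile("[FLIPQE]", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: true if the line contains an upper-case non-DNA symbol.
    static boolean looksLikeProteinOld(String firstLine) {
        return WITH_DOT_STAR.matcher(firstLine).matches();
    }

    // Hypothetical helper: true if the line contains a non-DNA symbol in either case.
    static boolean looksLikeProteinNew(String firstLine) {
        return WITHOUT_DOT_STAR.matcher(firstLine).find();
    }

    public static void main(String[] args) {
        String[] samples = {"MKVLAAQE", "mkvlaaqe", "acgtacgt"};
        for (String s : samples) {
            System.out.println(s + " old=" + looksLikeProteinOld(s)
                    + " new=" + looksLikeProteinNew(s));
        }
    }
}
```

With these assumptions, the upper-case protein line is recognised by both helpers, the lower-case protein line only by the find()-based one, and the DNA line by neither.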
>
> $ for length in 100 1000 10000 100000 1000000; do (time java WithDotStar $length) 2>&1 | grep real; done
> real 0m0.371s
> real 0m0.367s
> real 0m0.577s
> real 0m2.735s
> real 0m25.275s
>
> $ for length in 100 1000 10000 100000 1000000; do (time java WithoutDotStar $length) 2>&1 | grep real; done
> real 0m0.309s
> real 0m0.361s
> real 0m0.468s
> real 0m1.184s
> real 0m9.703s
>
> Kindly,
> // Carl
>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From andylu0320 at gmail.com Sun Nov 15 17:07:41 2009
From: andylu0320 at gmail.com (Andy Lu)
Date: Sun, 15 Nov 2009 17:07:41 -0500
Subject: [Biojava-l] JMol I/O
Message-ID: <4a1a3f7d0911151407w2f0fe3bfyfdcf94a1dae48fe5@mail.gmail.com>

Hi, sorry to bother everyone again.
But I have a simple question. I am using the SimpleJMolExample.java provided on the website and it works. But for a pdb file containing about 20 atoms, all of the atoms show up on JMol for 1 second and then disappear. Is it because the color changes, or is there some atom size restriction? It works for files that contain a much larger number of atoms.
If I try to open a file manually from JMol through the open option, it shows up nicely. Is there a way that I can make the pdb file displayed on JMol through Biojava the same color/display as the one if I open it manually through JMol?
Any help would be greatly appreciated!
Thank you!

--
Andy Lu

From tiagoantao at gmail.com Sun Nov 15 18:19:46 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Sun, 15 Nov 2009 23:19:46 +0000
Subject: [Biojava-l] Newick parser
Message-ID: <6d941f120911151519s732144fu1150d672107fca1e@mail.gmail.com>

Hi,

I have made the changes as discussed, the code is attached to the bugzilla bug concerning part of the issues that were found.
A few notes:

1. There is a ParserException raised on TreeBlock.
Though there is a TreeBlockParser, most of the important parsing was (and still is!) being done in TreeBlock. I would imagine that this is not the best design, but I did not change it.
2. I made some test cases. Also included.
3. I don't mind producing some documentation, in case you accept the code.
4. I noticed a few more minor bugs (like eating spaces in the names of nodes). But they are really minor.
5. The API was changed, but I suppose not many people were parsing trees. If there were people parsing trees, most probably the bug on not being able to process trees that are not binary would have been detected, as it is pretty major.

Tiago

--
"Pessimism of the Intellect; Optimism of the Will" -Antonio Gramsci

From andreas at sdsc.edu Sun Nov 15 23:41:57 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 15 Nov 2009 20:41:57 -0800
Subject: [Biojava-l] JMol I/O
In-Reply-To: <4a1a3f7d0911151407w2f0fe3bfyfdcf94a1dae48fe5@mail.gmail.com>
References: <4a1a3f7d0911151407w2f0fe3bfyfdcf94a1dae48fe5@mail.gmail.com>
Message-ID: <59a41c430911152041w592e43c2w3048b916a1855b85@mail.gmail.com>

Hi Andy,

probably you are trying to visualize a small molecule in Jmol, but the visualization script you are sending only works if you have several C-alpha atoms available. Try something like "select * ; spacefill on;". Jmol has a powerful scripting language which is probably worth having a look at, if you want to work with it more closely.

Andreas

On Sun, Nov 15, 2009 at 2:07 PM, Andy Lu wrote:
> Hi, sorry to bother everyone again.
> But I have a simple question. I am using the SimpleJMolExample.java provided
> on the website and it works. But for a pdb file containing about 20 atoms,
> all of the atoms show up on JMol for 1 second and then disappear. Is it
> because the color changes, or is there some atom size restriction? It
> works for files that contain a much larger number of atoms.
> If I try to open a file manually from JMol through the open option, it shows
> up nicely.
> Is there a way that I can make the pdb file displayed on JMol
> through Biojava the same color/display as the one if I open it manually
> through JMol?
> Any help would be greatly appreciated!
> Thank you!
>
> --
> Andy Lu
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From holland at eaglegenomics.com Mon Nov 16 03:39:02 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 16 Nov 2009 08:39:02 +0000
Subject: [Biojava-l] Newick parser
In-Reply-To: <6d941f120911151519s732144fu1150d672107fca1e@mail.gmail.com>
References: <6d941f120911151519s732144fu1150d672107fca1e@mail.gmail.com>
Message-ID:

Patch applied to the trunk of biojava-live. Thanks for fixing it!

cheers,
Richard

On 15 Nov 2009, at 23:19, Tiago Antão wrote:

> Hi,
>
> I have made the changes as discussed, the code is attached to the
> bugzilla bug concerning part of the issues that were found.
> A few notes:
>
> 1. There is a ParserException raised on TreeBlock. Though there is a
> TreeBlockParser, most of the important parsing was (and still is!)
> being done in TreeBlock. I would imagine that this is not the best
> design, but I did not change it.
> 2. I made some test cases. Also included.
> 3. I don't mind producing some documentation, in case you accept the code.
> 4. I noticed a few more minor bugs (like eating spaces in the names of
> nodes). But they are really minor.
> 5. The API was changed, but I suppose not many people were parsing
> trees. If there were people parsing trees, most probably the bug on not
> being able to process trees that are not binary would have been
> detected, as it is pretty major.
>
> Tiago
>
> --
> "Pessimism of the Intellect; Optimism of the Will"
> -Antonio Gramsci
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From tiagoantao at gmail.com Mon Nov 16 07:35:11 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Mon, 16 Nov 2009 12:35:11 +0000
Subject: [Biojava-l] Newick parser
In-Reply-To:
References: <6d941f120911151519s732144fu1150d672107fca1e@mail.gmail.com>
Message-ID: <6d941f120911160435o4d232187mbca004fedd1e5624@mail.gmail.com>

I can easily solve 2679, as it is precisely in the file that I was changing. In case there is interest I'll just solve it.

2009/11/16 Richard Holland :
> Patch applied to the trunk of biojava-live. Thanks for fixing it!
>
> cheers,
> Richard
>
> On 15 Nov 2009, at 23:19, Tiago Antão wrote:
>
>> Hi,
>>
>> I have made the changes as discussed, the code is attached to the
>> bugzilla bug concerning part of the issues that were found.
>> A few notes:
>>
>> 1. There is a ParserException raised on TreeBlock. Though there is a
>> TreeBlockParser, most of the important parsing was (and still is!)
>> being done in TreeBlock. I would imagine that this is not the best
>> design, but I did not change it.
>> 2. I made some test cases. Also included.
>> 3. I don't mind producing some documentation, in case you accept the code.
>> 4. I noticed a few more minor bugs (like eating spaces in the names of
>> nodes). But they are really minor.
>> 5. The API was changed, but I suppose not many people were parsing
>> trees. If there were people parsing trees, most probably the bug on not
>> being able to process trees that are not binary would have been
>> detected, as it is pretty major.
>>
>> Tiago
>>
>> --
>> "Pessimism of the Intellect; Optimism of the Will"
-Antonio Gramsci >> >> _______________________________________________ >> Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > -- ?Pessimism of the Intellect; Optimism of the Will? -Antonio Gramsci From holland at eaglegenomics.com Mon Nov 16 07:41:50 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Mon, 16 Nov 2009 12:41:50 +0000 Subject: [Biojava-l] Newick parser In-Reply-To: <6d941f120911160435o4d232187mbca004fedd1e5624@mail.gmail.com> References: <6d941f120911151519s732144fu1150d672107fca1e@mail.gmail.com> <6d941f120911160435o4d232187mbca004fedd1e5624@mail.gmail.com> Message-ID: yes please! On 16 Nov 2009, at 12:35, Tiago Ant?o wrote: > I can just easily solve 2679, as it is precisely on the file that I > was changing. In case there is interest I'll just solve it. > > 2009/11/16 Richard Holland : >> Patch applied to the trunk of biojava-live. Thanks for fixing it! >> >> cheers, >> Richard >> >> On 15 Nov 2009, at 23:19, Tiago Ant?o wrote: >> >>> Hi, >>> >>> I have made the changes as discussed, the code is attached to the >>> bugzilla bug concerning part of the issues that were found. >>> A few notes: >>> >>> 1. There is a ParserException raised on TreeBlock. Tough there is a >>> TreeBlockParser, most of the important parsing was (and still is!) >>> being made on TreeBlock. I would imagine that this is not the best >>> design, but I did not change it. >>> 2. I made some test cases. Also included. >>> 3. I don't mind producing some documentation, in case you accept the code. >>> 4. I noticed a few minor bugs more (like eating spaces in the names of >>> nodes). But they are really minor. >>> 5. The API was changed, but I suppose not many people were parsing >>> trees. 
If there were people parsing trees most probably the bug on not >>> being able to process trees that are not binary would have been >>> detected as it is pretty major. >>> >>> Tiago >>> >>> -- >>> ?Pessimism of the Intellect; Optimism of the Will? -Antonio Gramsci >>> >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> >> > > > > -- > ?Pessimism of the Intellect; Optimism of the Will? -Antonio Gramsci -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From tiagoantao at gmail.com Mon Nov 16 12:52:56 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Mon, 16 Nov 2009 17:52:56 +0000 Subject: [Biojava-l] Newick parser In-Reply-To: References: <6d941f120911151519s732144fu1150d672107fca1e@mail.gmail.com> <6d941f120911160435o4d232187mbca004fedd1e5624@mail.gmail.com> Message-ID: <6d941f120911160952y36b26e40r4ffa5dd980e012fd@mail.gmail.com> I've submitted a patch to 2679. Please have a look and see if you like it. 2009/11/16 Richard Holland : > yes please! > > On 16 Nov 2009, at 12:35, Tiago Ant?o wrote: > >> I can just easily solve 2679, as it is precisely on the file that I >> was changing. In case there is interest I'll just solve it. >> >> 2009/11/16 Richard Holland : >>> Patch applied to the trunk of biojava-live. Thanks for fixing it! >>> >>> cheers, >>> Richard >>> >>> On 15 Nov 2009, at 23:19, Tiago Ant?o wrote: >>> >>>> Hi, >>>> >>>> I have made the changes as discussed, the code is attached to the >>>> bugzilla bug concerning part of the issues that were found. >>>> A few notes: >>>> >>>> 1. 
There is a ParserException raised on TreeBlock. Tough there is a >>>> TreeBlockParser, most of the important parsing was (and still is!) >>>> being made on TreeBlock. I would imagine that this is not the best >>>> design, but I did not change it. >>>> 2. I made some test cases. Also included. >>>> 3. I don't mind producing some documentation, in case you accept the code. >>>> 4. I noticed a few minor bugs more (like eating spaces in the names of >>>> nodes). But they are really minor. >>>> 5. The API was changed, but I suppose not many people were parsing >>>> trees. If there were people parsing trees most probably the bug on not >>>> being able to process trees that are not binary would have been >>>> detected as it is pretty major. >>>> >>>> Tiago >>>> >>>> -- >>>> ?Pessimism of the Intellect; Optimism of the Will? -Antonio Gramsci >>>> >>>> _______________________________________________ >>>> Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >>> -- >>> Richard Holland, BSc MBCS >>> Operations and Delivery Director, Eagle Genomics Ltd >>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >>> http://www.eaglegenomics.com/ >>> >>> >> >> >> >> -- >> ?Pessimism of the Intellect; Optimism of the Will? -Antonio Gramsci > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > -- ?Pessimism of the Intellect; Optimism of the Will? 
-Antonio Gramsci

From tiagoantao at gmail.com Tue Nov 17 14:57:50 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 17 Nov 2009 19:57:50 +0000
Subject: [Biojava-l] Fwd: Newick parser
In-Reply-To: <6d941f120911171154q31df1c32taff20f5b282867bc@mail.gmail.com>
References: <6d941f120911151519s732144fu1150d672107fca1e@mail.gmail.com> <59a41c430911171123j43806c25vda67e406aa2d3caa@mail.gmail.com> <6d941f120911171154q31df1c32taff20f5b282867bc@mail.gmail.com>
Message-ID: <6d941f120911171157x20daedcdif84496c363a5bfcd@mail.gmail.com>

Forwarding this to the users mailing list also, as there might be some
interest in the documentation.

---------- Forwarded message ----------
From: Tiago Antão
Date: 2009/11/17
Subject: Re: [Biojava-l] Newick parser
To: Richard Holland
Cc: Andreas Prlic , biojava-dev

As this was all fresh in my head, I wrote a small tutorial:
http://tiago.org/cc/2009/11/17/reading-newicknexus-phylogenetic-trees-with-biojava/

As I don't follow the biojava mailing list regularly (or bug reports),
if some bug arises in this code, feel free to send me an email to my
personal account:
If I have some time to spare, I will have a look at it.

Tiago

2009/11/17 Richard Holland :
> Sorry - forgot to change the filenames in the test (under the new
> modular system they're in a different place than in the non-modular
> codebase that Tiago was working from). Fixed and committed.
>
> On 17 Nov 2009, at 19:23, Andreas Prlic wrote:
>
>> Hi Richard,
>>
>> I just did an update of my checkout and it seems the -phylo unit tests
>> don't compile any more. Can you take a look?
>>
>> Thanks,
>> Andreas
>>
>> Test set: org.biojavax.bio.phylo.io.nexus.TreesBlockTest
>> -------------------------------------------------------------------------------
>> Tests run: 9, Failures: 0, Errors: 9, Skipped: 0, Time elapsed: 0.063 sec <<< FAILURE!
>> testSimple(org.biojavax.bio.phylo.io.nexus.TreesBlockTest)  Time elapsed: 0.021 sec  <<< ERROR!
>> java.lang.NullPointerException
>>       at org.biojavax.bio.phylo.io.nexus.TreesBlockTest.getTreeNode(TreesBlockTest.java:160)
>>       at org.biojavax.bio.phylo.io.nexus.TreesBlockTest.getTree(TreesBlockTest.java:175)
>>       at org.biojavax.bio.phylo.io.nexus.TreesBlockTest.doVertexCount(TreesBlockTest.java:139)
>>       at org.biojavax.bio.phylo.io.nexus.TreesBlockTest.testSimple(TreesBlockTest.java:63)
>>
>> testThreeOffspring(org.biojavax.bio.phylo.io.nexus.TreesBlockTest)  Time elapsed: 0.002 sec  <<< ERROR!
>> java.lang.NullPointerException
>>       at org.biojavax.bio.phylo.io.nexus.TreesBlockTest.getTreeNode(TreesBlockTest.java:160)
>>       at org.biojavax.bio.phylo.io.nexus.TreesBlockTest.getTree(TreesBlockTest.java:175)
>>       at org.biojavax.bio.phylo.io.nexus.TreesBlockTest.doVertexCount(TreesBlockTest.java:139)
>>       at org.biojavax.bio.phylo.io.nexus.TreesBlockTest.testThreeOffspring(TreesBlockTest.java:70
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/

--
"Pessimism of the Intellect; Optimism of the Will" -Antonio Gramsci

From mara.axiom at gmail.com Sat Nov 21 00:43:52 2009
From: mara.axiom at gmail.com (Mara Axiom)
Date: Sat, 21 Nov 2009 00:43:52 -0500
Subject: [Biojava-l] Algorithm to compare protein sequences
Message-ID: <6375ed360911202143sb6956e7n9788a3a603e69bd9@mail.gmail.com>

Hello all,

I am looking for an algorithm to compare protein sequences and output
the result in Newick format, for a project. I was told that I could not
use the UPGMA and Nearest Neighbor algorithms. I'm new to working with
phylogenetic data. Any help is appreciated.
Thanks,
Mara

From andreas.draeger at uni-tuebingen.de Sat Nov 21 03:35:19 2009
From: andreas.draeger at uni-tuebingen.de (=?ISO-8859-1?Q?Andreas_Dr=E4ger?=)
Date: Sat, 21 Nov 2009 09:35:19 +0100
Subject: [Biojava-l] Algorithm to compare protein sequences
In-Reply-To: <6375ed360911202143sb6956e7n9788a3a603e69bd9@mail.gmail.com>
References: <6375ed360911202143sb6956e7n9788a3a603e69bd9@mail.gmail.com>
Message-ID: <4B07A647.1080405@uni-tuebingen.de>

Hi Mara,

At the moment there are two alignment algorithms available:
Smith-Waterman for local and Needleman-Wunsch for global alignment. In
addition to that there is a package for hidden Markov models that is
also able to perform sequence alignments (see the BioJava cookbook for
examples). However, currently both approaches will write the alignment
in a form similar to the BLAST output and not in this Newick format (I am
actually not familiar with that). I hope that helps.

Cheers
Andreas

--
Dipl.-Bioinform. Andreas Dräger
Eberhard Karls University Tübingen
Center for Bioinformatics (ZBIT)
Sand 1
72076 Tübingen
Germany

Phone: +49-7071-29-70436
Fax: +49-7071-29-5091

From thasso.griebel at uni-jena.de Sat Nov 21 06:25:34 2009
From: thasso.griebel at uni-jena.de (Thasso Griebel)
Date: Sat, 21 Nov 2009 12:25:34 +0100
Subject: [Biojava-l] Algorithm to compare protein sequences
In-Reply-To: <4B07A647.1080405@uni-tuebingen.de>
References: <6375ed360911202143sb6956e7n9788a3a603e69bd9@mail.gmail.com> <4B07A647.1080405@uni-tuebingen.de>
Message-ID:

Hi,

if I get this one right you want to do three things.

1. create a multiple sequence alignment.
2. create a pairwise distance matrix from the alignment.
3. use a distance based tree construction method (Agglomerative
clustering (UPGMA, WPGMA, ...) or Neighbor Joining) to create a tree. The
tree can be printed as a Newick string.

I don't know if all of this is possible with biojava.
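Steps 2 and 3 of the list above can be sketched in plain Java, independent of BioJava. This is only an illustration under stated assumptions: the names `DistanceTreeSketch`, `pDistance`, `jcProtein`, `distanceMatrix` and `upgmaNewick` are made up for this example and are not BioJava classes or methods, the input sequences are assumed to be aligned already (step 1 is not covered), the correction is the Jukes-Cantor formula generalised to 20 amino-acid states, and the tree is built with plain UPGMA and printed as a Newick topology without branch lengths.

```java
// Hypothetical sketch, not part of BioJava: distance matrix plus UPGMA to Newick.
final class DistanceTreeSketch {

    /** Fraction of differing columns between two aligned sequences; gap columns are skipped. */
    static double pDistance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("sequences must be aligned to the same length");
        }
        int compared = 0;
        int different = 0;
        for (int i = 0; i < a.length(); i++) {
            char x = a.charAt(i);
            char y = b.charAt(i);
            if (x == '-' || y == '-') {
                continue;
            }
            compared++;
            if (x != y) {
                different++;
            }
        }
        return compared == 0 ? 0.0 : (double) different / compared;
    }

    /** Jukes-Cantor style correction for 20 states: d = -(19/20) * ln(1 - (20/19) * p). */
    static double jcProtein(double p) {
        double b = 19.0 / 20.0;
        if (p >= b) {
            return Double.POSITIVE_INFINITY; // saturated, correction undefined
        }
        return -b * Math.log(1.0 - p / b);
    }

    /** Symmetric matrix of corrected pairwise distances. */
    static double[][] distanceMatrix(String[] aligned) {
        int n = aligned.length;
        double[][] d = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                d[i][j] = d[j][i] = jcProtein(pDistance(aligned[i], aligned[j]));
            }
        }
        return d;
    }

    /** UPGMA clustering; returns the topology as a Newick string (no branch lengths). */
    static String upgmaNewick(String[] names, double[][] dist) {
        int n = names.length;
        String[] label = names.clone();
        int[] size = new int[n];
        boolean[] active = new boolean[n];
        double[][] d = new double[n][];
        for (int i = 0; i < n; i++) {
            d[i] = dist[i].clone();
            size[i] = 1;
            active[i] = true;
        }
        for (int remaining = n; remaining > 1; remaining--) {
            // Find the closest pair of active clusters.
            int bi = -1;
            int bj = -1;
            for (int i = 0; i < n; i++) {
                if (!active[i]) continue;
                for (int j = i + 1; j < n; j++) {
                    if (!active[j]) continue;
                    if (bi < 0 || d[i][j] < d[bi][bj]) {
                        bi = i;
                        bj = j;
                    }
                }
            }
            // Merge cluster bj into bi using size-weighted average distances.
            for (int k = 0; k < n; k++) {
                if (!active[k] || k == bi || k == bj) continue;
                double avg = (size[bi] * d[bi][k] + size[bj] * d[bj][k]) / (size[bi] + size[bj]);
                d[bi][k] = d[k][bi] = avg;
            }
            label[bi] = "(" + label[bi] + "," + label[bj] + ")";
            size[bi] += size[bj];
            active[bj] = false;
        }
        for (int i = 0; i < n; i++) {
            if (active[i]) return label[i] + ";";
        }
        throw new IllegalArgumentException("no sequences given");
    }

    public static void main(String[] args) {
        String[] names = {"a", "b", "c"};
        String[] aligned = {"ACDEFGHIKL", "ACDEFGHIKV", "AWWWWGHIKV"};
        System.out.println(upgmaNewick(names, distanceMatrix(aligned)));
    }
}
```

Swapping UPGMA for Neighbor Joining or another method would only change `upgmaNewick`; the distance matrix step stays the same.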
If not, I could at least provide code to create the pairwise distance
matrix (including JC and Kimura corrections) and for the clustering
algorithms. But I thought NJ and AgglomerativeClustering were already
implemented, though I couldn't find the classes in the 1.7 API?

If you don't need to do the computations programmatically, you can also try

http://bio.informatik.uni-jena.de/epos/

though with the currently released version you have to do the alignment
externally. The next release will also provide a way to do multiple
sequence alignments directly.

Another alternative is

http://gi.cebitec.uni-bielefeld.de/qalign

QAlign can be used to create the alignment (using clustalw, tcoffee or
dialign) and create an NJ or Agglomerative tree in one step. The nice
thing is that you can manipulate the alignment (i.e. insert gaps) and the
tree is updated continuously.

cheers,
thasso

--
Dipl. Inf. Thasso Griebel-------------------Lehrstuhl fuer Bioinformatik
Office 3426--http://bio.informatik.uni-jena.de--Institut fuer Informatik
Phone +49 (0)3641 9-46454-----------Friedrich-Schiller-Universitaet Jena
Fax +49 (0)3641 9-46452----------Ernst-Abbe-Platz 2, 07743 Jena, Germany

From jbdundas at gmail.com Sun Nov 8 05:22:59 2009
From: jbdundas at gmail.com (jitesh dundas)
Date: Sun, 08 Nov 2009 10:22:59 -0000
Subject: [Biojava-l] Java Error:- XML Parsing Error: XML or text declaration not at start of entity
In-Reply-To: <59a41c430911010806u62f45b90ic4a9563a27ee12e2@mail.gmail.com>
References: <326ea8620911010739t1e658509h7dd33ba8482312f8@mail.gmail.com> <326ea8620911010741q3880a13g53626d94d0d2abd2@mail.gmail.com> <59a41c430911010806u62f45b90ic4a9563a27ee12e2@mail.gmail.com>
Message-ID: <326ea8620911080222q225cb6a4m4957be8dd5c9f91f@mail.gmail.com>

Dear Sir,

My program is working fine and can send me an xml file with 20
records. However, it does not return larger numbers of records.

For example, if I enter "cancer" it will return only 20 records.

Can you please tell me what I should do next to get all those records?
Thank you in advance.

Regards,
Jitesh Dundas

On Sun, Nov 1, 2009 at 9:36 PM, Andreas Prlic wrote:
>
> Hi Jitesh,
>
> It is hard to read your code with all the formatting off, probably due
> to email, and many commented lines that don't seem to get used. Can you
> provide the stacktrace, so we can see what part of biojava is affected?
>
> Probably a good strategy to write and debug this is to simplify the
> problem into smaller steps. Try to first download the files you want to
> parse and write the code to parse them from the local file. That will
> avoid any issues you might encounter with networking and server/client
> communication. Once the parsing is working you could take it to the
> next step and add the server communication...
>
> Andreas

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ImportFromPubmed3.jsp
Type: application/octet-stream
Size: 2696 bytes
Desc: not available
URL:

From cmasak at gmail.com Mon Nov 9 11:26:00 2009
From: cmasak at gmail.com (=?ISO-8859-1?Q?Carl_M=E4sak?=)
Date: Mon, 9 Nov 2009 17:26:00 +0100
Subject: [Biojava-l] How do I read a FASTA file containing protein sequences in lowercase?
In-Reply-To: <179C4DF7-69AF-4E65-8E6A-F984DF7CCE69@eaglegenomics.com>
References: <16d769b70911060825v298529b5m805c1f7fd388549b@mail.gmail.com> <16d769b70911060854o69705434x574ce2d1c7d85699@mail.gmail.com> <179C4DF7-69AF-4E65-8E6A-F984DF7CCE69@eaglegenomics.com>
Message-ID: <16d769b70911090826j135f2ddar13e4fc861b78e4fc@mail.gmail.com>

Richard (>):
> Ah OK I see what's going on.
>
> The convenience method you're using, RichSequence.IOTools.readStream(), uses
> FastaFormat to try and guess the alphabet to use based on the first line of
> the input sequence.
>
> In FastaFormat, it does this by searching for matching non-DNA symbols. The
> search is case-sensitive:
>
>        protected static final Pattern aminoAcids =
> Pattern.compile(".*[FLIPQE].*");
>
> FastaFormat needs patching to make this pattern non-case-sensitive.

Patch attached.

I also took the opportunity to remove the occurrences of .* in the
Pattern above. Generally, one should be using Matcher.find() when one
is interested in matching a part of a string. This is more efficient
than using Matcher.matches() and surrounding the desired regular
expression with .*, since the latter will cause a lot of unnecessary
backtracking and make the search quadratic. This effect only shows up
for very long strings, but long strings can and do happen in
bioinformatics.

The below measurements show the quadratic behaviour of the .* approach.
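For readers without the scrubbed attachments, the two matching styles discussed above can be sketched as follows. This is a reconstruction for illustration only, not the attached patch or the actual FastaFormat code: the class name `AminoAcidGuessSketch` and both helper methods are hypothetical, and only the pattern `.*[FLIPQE].*` is taken from the quoted source.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the two styles; not the BioJava FastaFormat implementation.
final class AminoAcidGuessSketch {

    // Style quoted above: whole-string match, case-sensitive.
    static final Pattern WITH_DOT_STAR = Pattern.compile(".*[FLIPQE].*");

    // Alternative: look for a single residue letter anywhere, ignoring case.
    static final Pattern WITHOUT_DOT_STAR =
            Pattern.compile("[FLIPQE]", Pattern.CASE_INSENSITIVE);

    static boolean looksLikeProteinOld(String line) {
        return WITH_DOT_STAR.matcher(line).matches();
    }

    static boolean looksLikeProtein(String line) {
        return WITHOUT_DOT_STAR.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeProteinOld("MKVLAAQE")); // true
        System.out.println(looksLikeProteinOld("mkvlaaqe")); // false: lowercase is missed
        System.out.println(looksLikeProtein("mkvlaaqe"));    // true
        System.out.println(looksLikeProtein("acgtacgt"));    // false: DNA letters only
    }
}
```

With `find()` the matcher can stop at the first residue letter it sees, which is the behaviour the message argues for.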
$ for length in 100 1000 10000 100000 1000000; do (time java WithDotStar $length) 2>&1 | grep real; done
real 0m0.371s
real 0m0.367s
real 0m0.577s
real 0m2.735s
real 0m25.275s

$ for length in 100 1000 10000 100000 1000000; do (time java WithoutDotStar $length) 2>&1 | grep real; done
real 0m0.309s
real 0m0.361s
real 0m0.468s
real 0m1.184s
real 0m9.703s

Kindly,
// Carl
-------------- next part --------------
A non-text attachment was scrubbed...
Name: aminoAcids.patch
Type: application/octet-stream
Size: 1995 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: WithDotStar.java
Type: application/octet-stream
Size: 634 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: WithoutDotStar.java
Type: application/octet-stream
Size: 633 bytes
Desc: not available
URL:

From holland at eaglegenomics.com Mon Nov 23 14:08:11 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 23 Nov 2009 19:08:11 +0000
Subject: [Biojava-l] Java Error:- XML Parsing Error: XML or text declaration not at start of entity
In-Reply-To: <326ea8620911080222q225cb6a4m4957be8dd5c9f91f@mail.gmail.com>
References: <326ea8620911010739t1e658509h7dd33ba8482312f8@mail.gmail.com> <326ea8620911010741q3880a13g53626d94d0d2abd2@mail.gmail.com> <59a41c430911010806u62f45b90ic4a9563a27ee12e2@mail.gmail.com> <326ea8620911080222q225cb6a4m4957be8dd5c9f91f@mail.gmail.com>
Message-ID: <2582F218-3873-49FB-BFB2-6F72B2B4815C@eaglegenomics.com>

Your program takes an input 'txtURLString' - could you give an example
of the value that this usually contains? I suspect that this URL is
where your problem lies but without seeing an example value I couldn't
say for sure.

thanks,
Richard

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From rabee.a.aa at m.titech.ac.jp Tue Nov 24 05:14:30 2009
From: rabee.a.aa at m.titech.ac.jp (rabee.a.aa at m.titech.ac.jp)
Date: Tue, 24 Nov 2009 19:14:30 +0900
Subject: [Biojava-l] sequencing data analysis
Message-ID: <1259057670648712.31357@mail2.nap.gsic.titech.ac.jp>

Dear Biojava members,
I'm new to Biojava and I would like to use it for analysis of next
generation sequencing data.
May I ask you about the available packages for analysis of sequencing data?

Best Regards,
Rabe

From holland at eaglegenomics.com Tue Nov 24 05:33:43 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 24 Nov 2009 10:33:43 +0000
Subject: [Biojava-l] sequencing data analysis
In-Reply-To: <1259057670648712.31357@mail2.nap.gsic.titech.ac.jp>
References: <1259057670648712.31357@mail2.nap.gsic.titech.ac.jp>
Message-ID: <7ECB9ED5-F983-4E74-8AC1-C70C129EDA7E@eaglegenomics.com>

There's loads of things you can do. A good starting point is here:

http://biojava.org/wiki/BioJava:CookBook

cheers,
Richard

On 24 Nov 2009, at 10:14, wrote:

> Dear Biojava members,
> I'm new to Biojava and I would like to use it for analysis of next
> generation sequencing data.
> May I ask you about the available packages for analysis of sequencing data?
>
> Best Regards,
> Rabe
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From jbdundas at gmail.com Tue Nov 24 09:48:55 2009
From: jbdundas at gmail.com (jitesh dundas)
Date: Tue, 24 Nov 2009 20:18:55 +0530
Subject: [Biojava-l] Fwd: Java Error:- XML Parsing Error: XML or text declaration not at start of entity
In-Reply-To: <326ea8620911240647i686a3488sc9ad46cc314dfbd3@mail.gmail.com>
References: <326ea8620911010739t1e658509h7dd33ba8482312f8@mail.gmail.com> <326ea8620911010741q3880a13g53626d94d0d2abd2@mail.gmail.com> <59a41c430911010806u62f45b90ic4a9563a27ee12e2@mail.gmail.com> <326ea8620911080222q225cb6a4m4957be8dd5c9f91f@mail.gmail.com> <2582F218-3873-49FB-BFB2-6F72B2B4815C@eaglegenomics.com> <326ea8620911240647i686a3488sc9ad46cc314dfbd3@mail.gmail.com>
Message-ID: <326ea8620911240648w371c1c7fx7f495133753bbbe@mail.gmail.com>

Dear Sir/Madam,

FYI. Just trying to contribute to this mailing list and help.

Regards,
Jitesh Dundas

---------- Forwarded message ----------
From: jitesh dundas
Date: Nov 24, 2009 8:17 PM
Subject: Re: [Biojava-l] Java Error:- XML Parsing Error: XML or text declaration not at start of entity
To: Richard Holland

Dear Sir,

Thank you for your reply. I solved this problem by fetching the records in small sets, e.g. 20 records per page.

It works like pagination: for each new page, we need to hit the URL again.

My functionality is working fine. I will be happy to share my code with you (and anyone else) who needs it.

I simply fetch the data from the URL and write it to an XML file. Next, I read the XML file and show the records to the user in the web page.

I still need to know how to fetch records from the protein database. Two types of searches are needed, I suspect.

First we use the ESearch utility and then the EFetch utility to get the data for the specific protein.

I welcome any suggestions on this!

Thank you everyone for your help.

Regards,
Jitesh Dundas

On 11/24/09, Richard Holland wrote:
>
> Your program takes an input 'txtURLString' - could you give an example of
> the value that this usually contains? I suspect that this URL is where your
> problem lies but without seeing an example value I couldn't say for sure.
>
> thanks,
> Richard
>
> On 8 Nov 2009, at 10:22, jitesh dundas wrote:
>
> > Dear Sir,
> >
> > My program is working fine and can send me an xml file with 20
> > records. However, it does not allow me to send large amounts of
> > records.
> >
> > For e.g. if I enter "cancer" it will return only 20 records.
> >
> > Can you please tell me what I should do next to get all those records.
> > Thank you in advance
> >
> > Regards,
> > Jitesh Dundas
> >
> > On Sun, Nov 1, 2009 at 9:36 PM, Andreas Prlic wrote:
> >>
> >> Hi Jitesh,
> >>
> >> It is hard to read your code with all the formatting off probably due to email and many commented lines that don;t seem to get used. Can you provide the stacktrace, so we can see what part of biojava is affected?
> >>
> >> Probably a good strategy to write and debug this is to simply the problem into smaller steps. Try to first download the files you want to parse and write the code to parse them from the local file. That will avoid any issues you might encounter with networking and server/client communication. Once the parsing is working you could take it to the next step and add the server communication...
> >>
> >> Andreas
> >>
> >>
> >> On Sun, Nov 1, 2009 at 7:41 AM, jitesh dundas wrote:
> >>>
> >>> Hi friends,
> >>>
> >>> I am getting this error on doing a post(using the code below) to this url->
> >>> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=cancer&reldate=10
> >>>
> >>> I have written this code in .jsp file.
Later I will change it into > servlet. > >>> > >>> Error:- > >>> XML Parsing Error: XML or text declaration not at start of entity > >>> Location: > >>> > http://localhost:8080/ProteomDb/ImportFromPubmed2.jsp?txtDbName=pubmed&txtTerm=cancer&txtreldate=10&comSDay=01&comSMonth=01&txtSYear=&comEDay=01&comEMonth=01&txtEYear=&txtURLString=http%3A%2F%2Feutils.ncbi.nlm.nih.gov%2Fentrez%2Feutils%2Fesearch.fcgi%3Fdb%3Dpubmed%26term%3Dcancer%26reldate%3D10&txtsubmit=Fetch+Data+From+NCBI > >>> Line Number 11, Column 1: >>> PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN" " > >>> http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd > ">2034200 > >>> 19877350 19877304 19877297 > >>> 19877284 19877271 19877265 > >>> 19877250 19877245 19877226 > >>> 19877210 19877179 19877175 > >>> 19877161 19877159 19877158 > >>> 19877123 19877122 19877120 > >>> 19877119 19877118 > >>> cancer > >>> "neoplasms"[MeSH Terms] OR "neoplasms"[All Fields] OR "cancer"[All > >>> Fields] > >>> "neoplasms"[MeSH Terms] MeSH > >>> Terms 2082133 Y > >>> "neoplasms"[All > Fields] All > >>> Fields 1634731 Y > >>> OR "cancer"[All > Fields] > >>> All > Fields 902537 Y > >>> OR GROUP > >>> > 2009/10/22[EDAT] EDAT 0 > >>> Y > >>> > 2009/11/01[EDAT] EDAT 0 > >>> Y RANGE AND > >>> ("neoplasms"[MeSH Terms] OR > >>> "neoplasms"[All Fields] OR "cancer"[All Fields]) AND 2009/10/22[EDAT] : > >>> 2009/11/01[EDAT] > >>> ^ > >>> > >>> As you can see, the XML output is coming fine but the above error does > not > >>> go..The output via this program should be just like hitting manually > the > >>> above URL in the browser.. > >>> The browser is Mozilla Firefox. 
> >>>
> >>> Regards,
> >>> JItesh Dundas
> >>>
> >>> _______________________________________________
> >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/biojava-l
> >>
> >>
> > _______________________________________________
> > Biojava-l mailing list - Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>

From holland at eaglegenomics.com Tue Nov 24 09:51:49 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 24 Nov 2009 14:51:49 +0000
Subject: [Biojava-l] Fwd: Java Error:- XML Parsing Error: XML or text declaration not at start of entity
References: <326ea8620911240647i686a3488sc9ad46cc314dfbd3@mail.gmail.com>
Message-ID: <02966AA0-0DD3-4EF1-9D02-E86F593D16D8@eaglegenomics.com>

Jitesh - I forwarded your response to the list so that everyone can get the chance to reply.

cheers,
Richard

Begin forwarded message:

> From: jitesh dundas
> Date: 24 November 2009 14:47:00 GMT
> To: Richard Holland
> Subject: Re: [Biojava-l] Java Error:- XML Parsing Error: XML or text declaration not at start of entity
>
> Dear Sir,
>
> Thank you for your reply. I solved this problem by fetching the records in small sets, e.g. 20 records per page.
>
> It works like pagination: for each new page, we need to hit the URL again.
>
> My functionality is working fine. I will be happy to share my code with you (and anyone else) who needs it.
>
> I simply fetch the data from the URL and write it to an XML file. Next, I read the XML file and show the records to the user in the web page.
>
> I still need to know how to fetch records from the protein database. Two types of searches are needed, I suspect.
>
> First we use the ESearch utility and then the EFetch utility to get the data for the specific protein.
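
The two-step approach described above (ESearch to find IDs, then EFetch to retrieve the records), combined with fetching results one page at a time, might be sketched roughly as follows. This is only an illustration under stated assumptions: the class and method names (EutilsUrls, esearchUrl, esearchPages, efetchUrl) are hypothetical, and the retstart, retmax, rettype and retmode parameters come from NCBI's E-utilities documentation rather than from this thread. The sketch only builds the URLs; fetching and parsing are left out.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the thread): builds E-utilities URLs only.
class EutilsUrls {
    // Base URL as used in the original post.
    static final String BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/";

    // URL-encode one query value as UTF-8.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // ESearch URL for one page: retstart is the offset, retmax the page size.
    public static String esearchUrl(String db, String term, int retstart, int retmax) {
        return BASE + "esearch.fcgi?db=" + enc(db) + "&term=" + enc(term)
                + "&retstart=" + retstart + "&retmax=" + retmax;
    }

    // One ESearch URL per page, enough pages to cover `total` hits.
    public static List<String> esearchPages(String db, String term, int total, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<String> urls = new ArrayList<String>();
        for (int start = 0; start < total; start += pageSize) {
            urls.add(esearchUrl(db, term, start, pageSize));
        }
        return urls;
    }

    // EFetch URL for a batch of IDs returned by ESearch (comma-separated).
    public static String efetchUrl(String db, List<String> ids, String rettype, String retmode) {
        StringBuilder idList = new StringBuilder();
        for (String id : ids) {
            if (idList.length() > 0) {
                idList.append(',');
            }
            idList.append(enc(id));
        }
        return BASE + "efetch.fcgi?db=" + enc(db) + "&id=" + idList
                + "&rettype=" + enc(rettype) + "&retmode=" + enc(retmode);
    }
}
```

For example, esearchPages("pubmed", "cancer", 45, 20) yields three URLs with retstart 0, 20 and 40. In practice the total would presumably be read from the first ESearch response before the remaining pages are requested.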
> > I welcome any suggestions on this ! > > Thank you everyone for your help. > > Regards, > Jitesh Dundas > > On 11/24/09, Richard Holland wrote: > Your program takes an input 'txtURLString' - could you give an example of the value that this usually contains? I suspect that this URL is where your problem lies but without seeing an example value I couldn't say for sure. > > thanks, > Richard > > On 8 Nov 2009, at 10:22, jitesh dundas wrote: > > > Dear Sir, > > > > My program is working fine and can send me an xml file with 20 > > records. However, it does not allow me to send large amounts of > > records. > > > > For e.g. if I enter "cancer" it will return only 20 records. > > > > Can you please tell me what I should do next to get all those records. > > Thank you in advance > > > > Regards, > > Jitesh Dundas > > > > On Sun, Nov 1, 2009 at 9:36 PM, Andreas Prlic wrote: > >> > >> Hi Jitesh, > >> > >> It is hard to read your code with all the formatting off probably due to email and many commented lines that don;t seem to get used. Can you provide the stacktrace, so we can see what part of biojava is affected? > >> > >> Probably a good strategy to write and debug this is to simply the problem into smaller steps. Try to first download the files you want to parse and write the code to parse them from the local file. That will avoid any issues you might encounter with networking and server/client communication. Once the parsing is working you could take it to the next step and add the server communication... > >> > >> Andreas > >> > >> > >> > >> > >> On Sun, Nov 1, 2009 at 7:41 AM, jitesh dundas wrote: > >>> > >>> Hi friends, > >>> > >>> I am getting this error on doing a post(using the code below) to this url-> > >>> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=cancer&reldate=10 > >>> > >>> I have written this code in .jsp file. Later I will change it into servlet. 
> >>> > >>> Error:- > >>> XML Parsing Error: XML or text declaration not at start of entity > >>> Location: > >>> http://localhost:8080/ProteomDb/ImportFromPubmed2.jsp?txtDbName=pubmed&txtTerm=cancer&txtreldate=10&comSDay=01&comSMonth=01&txtSYear=&comEDay=01&comEMonth=01&txtEYear=&txtURLString=http%3A%2F%2Feutils.ncbi.nlm.nih.gov%2Fentrez%2Feutils%2Fesearch.fcgi%3Fdb%3Dpubmed%26term%3Dcancer%26reldate%3D10&txtsubmit=Fetch+Data+From+NCBI > >>> Line Number 11, Column 1: >>> PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN" " > >>> http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd">2034200 > >>> 19877350 19877304 19877297 > >>> 19877284 19877271 19877265 > >>> 19877250 19877245 19877226 > >>> 19877210 19877179 19877175 > >>> 19877161 19877159 19877158 > >>> 19877123 19877122 19877120 > >>> 19877119 19877118 > >>> cancer > >>> "neoplasms"[MeSH Terms] OR "neoplasms"[All Fields] OR "cancer"[All > >>> Fields] > >>> "neoplasms"[MeSH Terms] MeSH > >>> Terms 2082133 Y > >>> "neoplasms"[All Fields] All > >>> Fields 1634731 Y > >>> OR "cancer"[All Fields] > >>> All Fields 902537 Y > >>> OR GROUP > >>> 2009/10/22[EDAT] EDAT 0 > >>> Y > >>> 2009/11/01[EDAT] EDAT 0 > >>> Y RANGE AND > >>> ("neoplasms"[MeSH Terms] OR > >>> "neoplasms"[All Fields] OR "cancer"[All Fields]) AND 2009/10/22[EDAT] : > >>> 2009/11/01[EDAT] > >>> ^ > >>> > >>> As you can see, the XML output is coming fine but the above error does not > >>> go..The output via this program should be just like hitting manually the > >>> above URL in the browser.. > >>> The browser is Mozilla Firefox. 
> >>> > >>> Regards, > >>> JItesh Dundas > >>> > >>> _______________________________________________ > >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/biojava-l > >> > >> > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From holland at eaglegenomics.com Tue Nov 24 10:27:20 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 24 Nov 2009 15:27:20 +0000 Subject: [Biojava-l] Hackathon in January Message-ID: <1235107C-E1D2-4EEB-8A5F-2335751120EE@eaglegenomics.com> Hi all. To anyone planning on attending the BioJava hackathon in Cambridge (UK) in January, now would be a good time to sort out travel arrangements. If you're intending to come but haven't yet said so, please do let me know so that I can ensure we get a big enough room to work in! cheers, Richard -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From andreas at sdsc.edu Tue Nov 24 12:54:38 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Tue, 24 Nov 2009 09:54:38 -0800 Subject: [Biojava-l] Hackathon in January In-Reply-To: <1235107C-E1D2-4EEB-8A5F-2335751120EE@eaglegenomics.com> References: <1235107C-E1D2-4EEB-8A5F-2335751120EE@eaglegenomics.com> Message-ID: <59a41c430911240954g589496dbg60a829d711131fab@mail.gmail.com> * Is anybody interested in following the ongoings at the hackaton via an online-stream? 
- I received a request about this and am wondering if more people would be interested in this.

* Just to repeat the current status regarding the program: So far the plan is to continue working on the new modules. Ideally we will have a brand new biojava 3 ready soon after the hackaton. A more detailed program for the week will be sent out in January.

If anybody wants to propose feature requests, you can have a look at the current todo list for the modules: http://biojava.org/wiki/BioJava:Modules

Andreas

On Tue, Nov 24, 2009 at 7:27 AM, Richard Holland wrote:
> Hi all.
>
> To anyone planning on attending the BioJava hackathon in Cambridge (UK) in January, now would be a good time to sort out travel arrangements. If you're intending to come but haven't yet said so, please do let me know so that I can ensure we get a big enough room to work in!
>
> cheers,
> Richard
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From ayates at ebi.ac.uk Wed Nov 25 07:51:23 2009
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 25 Nov 2009 12:51:23 +0000
Subject: [Biojava-l] [Biojava-dev] Hackathon in January
In-Reply-To: <59a41c430911240954g589496dbg60a829d711131fab@mail.gmail.com>
References: <1235107C-E1D2-4EEB-8A5F-2335751120EE@eaglegenomics.com> <59a41c430911240954g589496dbg60a829d711131fab@mail.gmail.com>
Message-ID: <4DA69755-069B-4E70-BBE2-AEB408AE7E02@ebi.ac.uk>

By online stream do you mean Wave or Twitter or something else trendy? :)

Andy

On 24 Nov 2009, at 17:54, Andreas Prlic wrote:

> * Is anybody interested in following the ongoings at the hackaton via
> an online-stream? - I received a request about this and am wondering
> if more people would be interested in this.
> > * Just to repeat the current status regarding the program: So far the > plan is to continue working on the new modules. Ideally we will have a > brand new biojava 3 ready soon after the hackaton. A more detailed > program for the week will be sent out in January. > > If anybody wants to propose feature requests, you can have a look at > the current todo list for the modules: > http://biojava.org/wiki/BioJava:Modules > > Andreas > > > > On Tue, Nov 24, 2009 at 7:27 AM, Richard Holland > wrote: >> Hi all. >> >> To anyone planning on attending the BioJava hackathon in Cambridge >> (UK) in January, now would be a good time to sort out travel >> arrangements. If you're intending to come but haven't yet said so, >> please do let me know so that I can ensure we get a big enough room >> to work in! >> >> cheers, >> Richard >> >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From andreas at sdsc.edu Wed Nov 25 15:08:33 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 25 Nov 2009 12:08:33 -0800 Subject: [Biojava-l] [Biojava-dev] Hackathon in January In-Reply-To: <4DA69755-069B-4E70-BBE2-AEB408AE7E02@ebi.ac.uk> References: <1235107C-E1D2-4EEB-8A5F-2335751120EE@eaglegenomics.com> <59a41c430911240954g589496dbg60a829d711131fab@mail.gmail.com> <4DA69755-069B-4E70-BBE2-AEB408AE7E02@ebi.ac.uk> Message-ID: <59a41c430911251208u44bb1218l7cb0065657ed7227@mail.gmail.com> I was thinking about video... I would expect that some of the participants will do some sort of tweeting, blogging, etc. 
Andreas On Wed, Nov 25, 2009 at 4:51 AM, Andy Yates wrote: > By online stream do you mean Wave or Twitter or something else trendy? :) > > Andy > > On 24 Nov 2009, at 17:54, Andreas Prlic wrote: > >> * Is anybody interested in following the ongoings at the hackaton via >> an online-stream? - I received a request about this and am wondering >> if more people would be interested in this. >> >> * Just to repeat the current status regarding the program: So far the >> plan is to continue working on the new modules. Ideally we will have a >> brand new biojava 3 ready soon after the hackaton. A more detailed >> program for the week will be sent out in January. >> >> If anybody wants to propose feature requests, you can have a look at >> the current todo list for the modules: >> http://biojava.org/wiki/BioJava:Modules >> >> Andreas >> >> >> >> On Tue, Nov 24, 2009 at 7:27 AM, Richard Holland >> wrote: >>> >>> Hi all. >>> >>> To anyone planning on attending the BioJava hackathon in Cambridge (UK) >>> in January, now would be a good time to sort out travel arrangements. If >>> you're intending to come but haven't yet said so, please do let me know so >>> that I can ensure we get a big enough room to work in! 
>>>
>>> cheers,
>>> Richard
>>>
>>> --
>>> Richard Holland, BSc MBCS
>>> Operations and Delivery Director, Eagle Genomics Ltd
>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>>> http://www.eaglegenomics.com/
>>>
>>>
>>> _______________________________________________
>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>
>>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
>

From jw12 at sanger.ac.uk Thu Nov 26 09:57:35 2009
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Thu, 26 Nov 2009 14:57:35 +0000
Subject: [Biojava-l] DAS workshop 7th-9th April 2010
Message-ID:

We are considering running a Distributed Annotation System workshop here at the Sanger/EBI in the UK subject to decent demand.
The workshop will be held from Wednesday 7th-Friday 9th April 2010. If you would be interested in attending either to present or just take part then please email me jw12 at sanger.ac.uk

The format of the workshop is likely to be similar to last years (1st day for beginners, 2nd for both beginners and advanced users, 3rd day for advanced), information for which can be found here: http://www.dasregistry.org/course.jsp

If you would like to present then please send a short summary of what you would like to talk about.

Thanks

Jonathan.

Jonathan Warren
Senior Developer and DAS coordinator
jw12 at sanger.ac.uk

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
From mauricio at open-bio.org Thu Nov 26 16:45:43 2009 From: mauricio at open-bio.org (Mauricio Herrera Cuadra) Date: Thu, 26 Nov 2009 15:45:43 -0600 Subject: [Biojava-l] [DAS] DAS workshop 7th-9th April 2010 In-Reply-To: References: Message-ID: <4B0EF707.6080202@open-bio.org> Hi Jonathan, Any chance it can be webcasted? I'm sure it would attract a lot of remote attendees ;) Regards, Mauricio. Jonathan Warren wrote: > We are considering running a Distributed Annotation System workshop here > at the Sanger/EBI in the UK subject to decent demand. > The workshop will be held from Wednesday 7th-Friday 9th April 2010. If > you would be interested in attending either to present or just take part > then please email me jw12 at sanger.ac.uk > > The format of the workshop is likely to be similar to last years (1st > day for beginners, 2nd for both beginners and advanced users, 3rd day > for advanced), information for which can be found here: > http://www.dasregistry.org/course.jsp > > If you would like to present then please send a short summary of what > you would like to talk about. > > Thanks > > Jonathan. > > Jonathan Warren > Senior Developer and DAS coordinator > jw12 at sanger.ac.uk > > > > > > > > > From jbdundas at gmail.com Sun Nov 1 15:41:03 2009 From: jbdundas at gmail.com (jitesh dundas) Date: Sun, 1 Nov 2009 21:11:03 +0530 Subject: [Biojava-l] Java Error:- XML Parsing Error: XML or text declaration not at start of entity In-Reply-To: <326ea8620911010739t1e658509h7dd33ba8482312f8@mail.gmail.com> References: <326ea8620911010739t1e658509h7dd33ba8482312f8@mail.gmail.com> Message-ID: <326ea8620911010741q3880a13g53626d94d0d2abd2@mail.gmail.com> Hi friends, I am getting this error on doing a post(using the code below) to this url-> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=cancer&reldate=10 I have written this code in .jsp file. Later I will change it into servlet. 
Error:- XML Parsing Error: XML or text declaration not at start of entity Location: http://localhost:8080/ProteomDb/ImportFromPubmed2.jsp?txtDbName=pubmed&txtTerm=cancer&txtreldate=10&comSDay=01&comSMonth=01&txtSYear=&comEDay=01&comEMonth=01&txtEYear=&txtURLString=http%3A%2F%2Feutils.ncbi.nlm.nih.gov%2Fentrez%2Feutils%2Fesearch.fcgi%3Fdb%3Dpubmed%26term%3Dcancer%26reldate%3D10&txtsubmit=Fetch+Data+From+NCBI Line Number 11, Column 1:2034200 19877350 19877304 19877297 19877284 19877271 19877265 19877250 19877245 19877226 19877210 19877179 19877175 19877161 19877159 19877158 19877123 19877122 19877120 19877119 19877118 cancer "neoplasms"[MeSH Terms] OR "neoplasms"[All Fields] OR "cancer"[All Fields] "neoplasms"[MeSH Terms] MeSH Terms 2082133 Y "neoplasms"[All Fields] All Fields 1634731 Y OR "cancer"[All Fields] All Fields 902537 Y OR GROUP 2009/10/22[EDAT] EDAT 0 Y 2009/11/01[EDAT] EDAT 0 Y RANGE AND ("neoplasms"[MeSH Terms] OR "neoplasms"[All Fields] OR "cancer"[All Fields]) AND 2009/10/22[EDAT] : 2009/11/01[EDAT] ^ As you can see, the XML output is coming fine but the above error does not go..The output via this program should be just like hitting manually the above URL in the browser.. The browser is Mozilla Firefox. 
Code:- <%@ page language = "java" %> <%@ page import = "java.sql.*" %> <%@ page import = "java.util.*" %> <%@ page import = "java.io.*" %> <%@ page import="java.lang.*" %> <%@ page import="java.net.*" %> <%@ page import="java.nio.*" %> <%@ page contentType="text/xml; charset=utf-8" pageEncoding="UTF-8" %> <% try { //String str = ""; //out.println(""); Properties systemSettings = System.getProperties(); systemSettings.put("http.proxyHost", "********"); systemSettings.put("http.proxyPort", "******"); systemSettings.put("sun.net.client.defaultConnectTimeout", "10000"); systemSettings.put("sun.net.client.defaultReadTimeout", "10000"); //out.println("Properties Set"); Authenticator.setDefault(new Authenticator() { protected PasswordAuthentication getPasswordAuthentication() { return new PasswordAuthentication("**", "******".toCharArray()); // specify ur user name password of iitb login } }); System.setProperties(systemSettings); //out.println("After Authentication & Properties Settings"); //create xml file. //the input to google api //String textAreaContent = request.getParameter("text"); String textAreaContent = "This si a tst"; String str = ""; //xml file generation ends here.. //FetchDataFromNCBI_URLString.jsp String URLString = request.getParameter("txtURLString").trim(); //URL url = new URL(" http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=BAA20519 "); URL url = new URL(URLString); //url string taken from user input. 
HttpURLConnection connection = null; connection = (HttpURLConnection) url.openConnection(); System.out.println("After open connection"); connection.setRequestMethod("POST"); connection.setDoInput(true); connection.setDoOutput(true); connection.setUseCaches(false); connection.setAllowUserInteraction(false); //connection.setFollowRedirects(true); //connection.setInstanceFollowRedirects(true); //System.out.println("Before-------------------"); connection.setRequestProperty ("Content-Type","text/xml; charset=\"utf-8\""); //System.out.println("After-------------------"); //System.out.println(""+ connection.getOutputStream()); //System.out.println("After dataoutputstream..Line No-65"); //System.out.println("Response Code="+ connection.getResponseCode); OutputStreamWriter dosout = new OutputStreamWriter(connection.getOutputStream()); //System.out.println("After dosout object..Line No-63"); //dosout.write(str); dosout.close (); BufferedReader in = new BufferedReader( new InputStreamReader( connection.getInputStream())); String decodedString; String tempstr = ""; while ((decodedString = in.readLine()) != null) { tempstr = tempstr + decodedString; //out.println(decodedString); } out.println(tempstr); in.close(); } catch(Exception ex) { out.println("Exception->"+ex); PrintWriter pw = response.getWriter(); ex.printStackTrace(pw); } %> Thanks in advance.. 
Regards,
Jitesh Dundas

From andreas at sdsc.edu Sun Nov 1 16:06:29 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 1 Nov 2009 08:06:29 -0800
Subject: [Biojava-l] Java Error:- XML Parsing Error: XML or text declaration not at start of entity
In-Reply-To: <326ea8620911010741q3880a13g53626d94d0d2abd2@mail.gmail.com>
References: <326ea8620911010739t1e658509h7dd33ba8482312f8@mail.gmail.com> <326ea8620911010741q3880a13g53626d94d0d2abd2@mail.gmail.com>
Message-ID: <59a41c430911010806u62f45b90ic4a9563a27ee12e2@mail.gmail.com>

Hi Jitesh,

It is hard to read your code with the formatting broken (probably by email) and with many commented-out lines that don't seem to be used. Can you provide the stack trace, so we can see what part of BioJava is affected?

A good strategy for writing and debugging this is probably to simplify the problem into smaller steps. First try to download the files you want to parse, and write the code to parse them from the local file. That will avoid any issues you might encounter with networking and server/client communication. Once the parsing is working, you can take the next step and add the server communication...

Andreas

On Sun, Nov 1, 2009 at 7:41 AM, jitesh dundas wrote:
> Hi friends,
>
> I am getting this error on doing a post(using the code below) to this url->
>
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=cancer&reldate=10
>
> I have written this code in .jsp file. Later I will change it into servlet.
>
> Regards,
> JItesh Dundas
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From jbdundas at gmail.com Mon Nov 2 08:19:19 2009
From: jbdundas at gmail.com (jitesh dundas)
Date: Mon, 2 Nov 2009 13:49:19 +0530
Subject: [Biojava-l] Java Error:- XML Parsing Error: XML or text declaration not at start of entity
In-Reply-To: <59a41c430911010806u62f45b90ic4a9563a27ee12e2@mail.gmail.com>
References: <326ea8620911010739t1e658509h7dd33ba8482312f8@mail.gmail.com> <326ea8620911010741q3880a13g53626d94d0d2abd2@mail.gmail.com> <59a41c430911010806u62f45b90ic4a9563a27ee12e2@mail.gmail.com>
Message-ID: <326ea8620911020019w2b6a8307o5befcc5a4395299a@mail.gmail.com>

Dear Dr. Andreas Prlic,

Thank you for the advice. I will do that.

Regards,
Jitesh Dundas

On 11/1/09, Andreas Prlic wrote:
>
> Hi Jitesh,
>
> It is hard to read your code with all the formatting off probably due to
> email and many commented lines that don;t seem to get used. Can you provide
> the stacktrace, so we can see what part of biojava is affected?
>
> Probably a good strategy to write and debug this is to simply the problem
> into smaller steps. Try to first download the files you want to parse and
> write the code to parse them from the local file. That will avoid any
> issues you might encounter with networking and server/client communication.
> Once the parsing is working you could take it to the next step and add the
> server communication...
>
> Andreas
>
>
>
>
> On Sun, Nov 1, 2009 at 7:41 AM, jitesh dundas wrote:
>
>> Hi friends,
>>
>> I am getting this error on doing a post(using the code below) to this
>> url->
>>
>> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=cancer&reldate=10
>>
>> I have written this code in .jsp file. Later I will change it into
>> servlet.
>> >> Regards, >> JItesh Dundas >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > > From pingou at pingoured.fr Mon Nov 2 14:03:15 2009 From: pingou at pingoured.fr (Pierre-Yves) Date: Mon, 02 Nov 2009 15:03:15 +0100 Subject: [Biojava-l] NCBI xml parser Message-ID: <1257170595.29918.8.camel@localhost.localdomain> Dear list, I am trying to find my way around parsing ncbi blast xml. I am using a small library which performs the blast online [1] and returns a FileReader of the xml. I can convert the FileReader to a string and print it, it seems fine. (I used the default input shown on [1]). So I am now trying to parse it automatically. I looked at [2] and [3] but I could not get them working. I then found this message from this mailing list [4] and thus went to use BlastXMLParserFacade. It returns me an "org.xml.sax.SAXException: illegal frame number encountered. (0)". So my question is then: which method should I use ? Thanks in advance, Best regards, Pierre [1] http://users.encs.concordia.ca/~f_kohant/ncbiblast/ [2] http://biojava.org/wiki/BioJava:CookBook:Blast:Echo [3] http://biojava.org/wiki/BioJava:Tutorial:Blast-like_Parsing_Cook_Book [4] http://osdir.com/ml/java.bio.general/2005-06/msg00018.html From jogoodma at indiana.edu Mon Nov 2 14:45:09 2009 From: jogoodma at indiana.edu (Josh Goodman) Date: Mon, 02 Nov 2009 09:45:09 -0500 Subject: [Biojava-l] NCBI xml parser In-Reply-To: <1257170595.29918.8.camel@localhost.localdomain> References: <1257170595.29918.8.camel@localhost.localdomain> Message-ID: <4AEEF075.4020700@indiana.edu> It looks like the new BLAST+ binary is using a default frame of 0 instead of the old default of 1 for blastp. Hence the illegal frame number (0) error. Josh Pierre-Yves wrote: > Dear list, > > I am trying to find my way around parsing ncbi blast xml. 
> I am using a small library which performs the blast online [1] and > returns a FileReader of the xml. > I can convert the FileReader to a string and print it, it seems fine. > (I used the default input shown on [1]). > > So I am now trying to parse it automatically. I looked at [2] and [3] > but I could not get them working. I then found this message from this > mailing list [4] and thus went to use BlastXMLParserFacade. > It returns me an "org.xml.sax.SAXException: illegal frame number > encountered. (0)". > > So my question is then: which method should I use ? > > Thanks in advance, > > Best regards, > > Pierre > > > > [1] http://users.encs.concordia.ca/~f_kohant/ncbiblast/ > [2] http://biojava.org/wiki/BioJava:CookBook:Blast:Echo > [3] > http://biojava.org/wiki/BioJava:Tutorial:Blast-like_Parsing_Cook_Book > [4] http://osdir.com/ml/java.bio.general/2005-06/msg00018.html > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From pingou at pingoured.fr Mon Nov 2 16:17:16 2009 From: pingou at pingoured.fr (Pierre-Yves) Date: Mon, 02 Nov 2009 17:17:16 +0100 Subject: [Biojava-l] NCBI xml parser In-Reply-To: <4AEEF075.4020700@indiana.edu> References: <1257170595.29918.8.camel@localhost.localdomain> <4AEEF075.4020700@indiana.edu> Message-ID: <1257178636.29918.11.camel@localhost.localdomain> On Mon, 2009-11-02 at 09:45 -0500, Josh Goodman wrote: > It looks like the new BLAST+ binary is using a default frame of 0 instead of the old default of 1 > for blastp. Hence the illegal frame number (0) error. > > Josh Thanks for the hint. I downloaded the biojava-1.7-src.jar to check the sources and correct the frame to 0 (I already saw the case to change). 
However, without changing anything in the source, when I tried to reproduce the error I got a new one:
"org.xml.sax.SAXParseException: The markup declarations contained or pointed to by the document type declaration must be well-formed."

I understand the error; I am more surprised that the jar and the sources of release 1.7 give different errors.

Did I miss something?

Thanks,

Best regards,

Pierre

From holland at eaglegenomics.com Mon Nov 2 17:16:00 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 2 Nov 2009 17:16:00 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
Message-ID: <46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>

The graphs returned by the Nexus parser are instances that implement the org.jgrapht.UndirectedGraph interface. Undirected graphs have no root.

cheers,
Richard

On 30 Oct 2009, at 21:14, Tiago Antão wrote:

> Hi,
>
> I have been trying to use biojava to parse some trees on nexus files
> and I have a small doubt:
> If there is a rooted tree, how can one know what is the root vertex in
> the weighted graph (JGraphT)?
> I understand that there is no root if the tree is unrooted, but in
> case it is rooted, how to determine the vertex?
>
> Many thanks,
> Tiago
>
> --
> "The hottest places in hell are reserved for those who, in times of
> moral crisis, maintain a neutrality."
- Dante > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From andreas at sdsc.edu Mon Nov 2 19:29:04 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Mon, 2 Nov 2009 11:29:04 -0800 Subject: [Biojava-l] NCBI xml parser In-Reply-To: <1257178636.29918.11.camel@localhost.localdomain> References: <1257170595.29918.8.camel@localhost.localdomain> <4AEEF075.4020700@indiana.edu> <1257178636.29918.11.camel@localhost.localdomain> Message-ID: <59a41c430911021129i7d320b03xd44e99e2bc1baf2c@mail.gmail.com> > > > I understand the error, I am more surprised by the fact that the jar > and the sources of the release 1.7 are given a different errors. > > that's surprising... I built the src-jar and the other jars at the same time so the code should be identical... Are you sure you are doing exactly the same? Andreas From tiagoantao at gmail.com Mon Nov 2 19:36:31 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Mon, 2 Nov 2009 19:36:31 +0000 Subject: [Biojava-l] Rooted trees in nexus files In-Reply-To: <46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com> References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com> Message-ID: <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com> 2009/11/2 Richard Holland : > The graphs returned by the Nexus parser are instances that implement the > org.jgrapht.UndirectedGraph interface. Undirected graphs have no root. Yes, that is a property of the jgrapht. But it might not be the case of the original nexus file/tree. So, if the tree is rooted, how can one know the root (without doing the parsing again ourselves to discover it)? 
I note two things:
a) The root is obviously not a taxon, but an intermediate node.
b) Even if the tree is unrooted, it might be interesting to know the "root", for instance to draw the tree the way it was written in the file.

Tiago
PS - I also added a bug related to the parser to Bugzilla, but that is a different problem...

From pingou at pingoured.fr Mon Nov 2 19:50:25 2009
From: pingou at pingoured.fr (Pierre-Yves)
Date: Mon, 02 Nov 2009 20:50:25 +0100
Subject: [Biojava-l] NCBI xml parser
In-Reply-To: <59a41c430911021129i7d320b03xd44e99e2bc1baf2c@mail.gmail.com>
References: <1257170595.29918.8.camel@localhost.localdomain> <4AEEF075.4020700@indiana.edu> <1257178636.29918.11.camel@localhost.localdomain> <59a41c430911021129i7d320b03xd44e99e2bc1baf2c@mail.gmail.com>
Message-ID: <4AEF3801.10304@pingoured.fr>

On 11/02/2009 08:29 PM, Andreas Prlic wrote:
>>
>> I understand the error, I am more surprised by the fact that the jar
>> and the sources of the release 1.7 are given a different errors.
>>
>>
> that's surprising... I built the src-jar and the other jars at the same time
> so the code should be identical... Are you sure you are doing exactly the
> same?

I can confirm this tomorrow, but AFAIR, before I left I tried the same code using either the jar file or the project generated from the sources in NetBeans, and it gave me two different errors.

Best regards, Pierre From holland at eaglegenomics.com Mon Nov 2 22:14:58 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Mon, 2 Nov 2009 22:14:58 +0000 Subject: [Biojava-l] Rooted trees in nexus files In-Reply-To: <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com> References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com> <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com> Message-ID: <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com> The current parser that converts the original Newick tree string into a JGraphT does not take the root into account, and therefore it is not recorded anywhere in the JGraphT object. Someone would have to change the parser to be able to make it record the root node. In the meantime, the JGraph library which is used for displaying JGraphT graphs in a visual form does include root-finding methods, so maybe you could investigate there to see if any of the existing functions might help? cheers, Richard On 2 Nov 2009, at 19:36, Tiago Ant?o wrote: > 2009/11/2 Richard Holland : >> The graphs returned by the Nexus parser are instances that >> implement the >> org.jgrapht.UndirectedGraph interface. Undirected graphs have no >> root. > > > Yes, that is a property of the jgrapht. But it might not be the case > of the original nexus file/tree. So, if the tree is rooted, how can > one know the root (without doing the parsing again ourselves to > discover it)? I note two things: > a) The root is obviously not one taxa, but one intermediate node. > b) Even if the tree is unrooted, it might be interesting to know the > "root", for instance to draw the tree, in the way that is was written > in the file. > > Tiago > PS - I also added to bugzilla one but related to the parser, but that > is different problem... 
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From tiagoantao at gmail.com Mon Nov 2 23:11:13 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Mon, 2 Nov 2009 23:11:13 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com> <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com> <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
Message-ID: <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>

2009/11/2 Richard Holland:
> In the meantime, the JGraph library which is used for displaying JGraphT
> graphs in a visual form does include root-finding methods, so maybe you
> could investigate there to see if any of the existing functions might help?

Did that. None can help, as the graph is not directed (it would be trivial with a directed graph, of course).
In its current form, the Nexus parser is of limited use for tree information:
1. For rooted trees it has a bug, as it doesn't say what the root is.
2. For unrooted trees, sometimes the "root" (what the user perceives as root) is interesting information.

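Assuming the original Newick string is still at hand and was written in rooted form, the root can be read off without a full re-parse: it is the outermost pair of parentheses, and its top-level comma-separated parts are the subtrees hanging off the root. A minimal sketch in plain Java, independent of BioJava and JGraphT; the class and method names are hypothetical, and quoted labels or bracketed comments inside the string are not handled:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the BioJava Nexus parser.
final class NewickRoot {

    /** Splits the outermost group of a Newick string into the root's child subtrees. */
    static List<String> rootChildren(String newick) {
        String s = newick.trim();
        if (s.endsWith(";")) {
            s = s.substring(0, s.length() - 1).trim();
        }
        // Anything after the last ')' is the root's own label or branch length.
        int close = s.lastIndexOf(')');
        if (!s.startsWith("(") || close < 0) {
            throw new IllegalArgumentException("not a parenthesised Newick tree: " + newick);
        }
        String inner = s.substring(1, close);

        List<String> children = new ArrayList<String>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < inner.length(); i++) {
            char ch = inner.charAt(i);
            if (ch == '(') {
                depth++;
            } else if (ch == ')') {
                depth--;
            } else if (ch == ',' && depth == 0) {
                // A comma outside any nested group separates two children of the root.
                children.add(inner.substring(start, i).trim());
                start = i + 1;
            }
        }
        children.add(inner.substring(start).trim());
        return children;
    }
}
```

For `(A,(B,C));` this returns the two subtrees `A` and `(B,C)`. Matching those subtrees back to vertices of the parsed graph is left open here, since it depends on how the parser names internal nodes.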
Tiago From holland at eaglegenomics.com Tue Nov 3 09:56:21 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 3 Nov 2009 09:56:21 +0000 Subject: [Biojava-l] Rooted trees in nexus files In-Reply-To: <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com> References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com> <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com> <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com> <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com> Message-ID: <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com> On 2 Nov 2009, at 23:11, Tiago Ant?o wrote: > 2009/11/2 Richard Holland : >> In the meantime, the JGraph library which is used for displaying >> JGraphT >> graphs in a visual form does include root-finding methods, so maybe >> you >> could investigate there to see if any of the existing functions >> might help? > > Did that. None can help as the graph is not directed (it would be > trivial with a directed graph ,of course). > In the current form, the nexus parser is of limited use for tree > information: > 1. For rooted trees it has a bug has it doesn't say what is the root The Newick strings used in the Nexus format are themselves undirected graphs. They don't specify which node is the root, which means it must be determined by computation after parsing the string. I'm unsure of the algorithm to use to do this. If there are people on this list who know the algorithm and have time to code it up, volunteers would be welcome. > 2. For unrooted trees, sometimes the "root" (what the user perceives > as root) is interesting information. What the user perceives as root in an unrooted tree could be different for every user, so it would be hard to provide a standard function to read their mind! 
However if everyone can come up with a commonly agreed way of determining the most likely root computationally, it would be interesting to add this as a feature, with the caveat that it is only a best-effort approximation as the original tree is unrooted. cheers, Richard -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From pingou at pingoured.fr Tue Nov 3 14:45:08 2009 From: pingou at pingoured.fr (Pierre-Yves) Date: Tue, 03 Nov 2009 15:45:08 +0100 Subject: [Biojava-l] NCBI xml parser In-Reply-To: <4AEF3801.10304@pingoured.fr> References: <1257170595.29918.8.camel@localhost.localdomain> <4AEEF075.4020700@indiana.edu> <1257178636.29918.11.camel@localhost.localdomain> <59a41c430911021129i7d320b03xd44e99e2bc1baf2c@mail.gmail.com> <4AEF3801.10304@pingoured.fr> Message-ID: <1257259508.26094.2.camel@localhost.localdomain> On Mon, 2009-11-02 at 20:50 +0100, Pierre-Yves wrote: > On 11/02/2009 08:29 PM, Andreas Prlic wrote: > >> > >> I understand the error, I am more surprised by the fact that the jar > >> and the sources of the release 1.7 are given a different errors. > >> > >> > > that's surprising... I built the src-jar and the other jars at the same time > > so the code should be identical... Are you sure you are doing exactly the > > same? > > I can confirm you this tomorrow but AFAIR before I left I tried the same > code using or the jar file or the project generated from the sources in > NetBeans and it gaves me two differents errors. Ok so just for the record: - If I use the .jar file I get an error (1) - If I create a project in NetBeans using the source from BioJava I get a different error (2) - If I add as dependencies the sources from BioJava I get the first error (1) I thus went for the third solution and found my way around :-) Thanks for the help. 
Best regards, Pierre From andreas.prlic at gmail.com Tue Nov 3 14:56:06 2009 From: andreas.prlic at gmail.com (Andreas Prlic) Date: Tue, 3 Nov 2009 06:56:06 -0800 Subject: [Biojava-l] NCBI xml parser In-Reply-To: <1257259508.26094.2.camel@localhost.localdomain> References: <1257170595.29918.8.camel@localhost.localdomain> <4AEEF075.4020700@indiana.edu> <1257178636.29918.11.camel@localhost.localdomain> <59a41c430911021129i7d320b03xd44e99e2bc1baf2c@mail.gmail.com> <4AEF3801.10304@pingoured.fr> <1257259508.26094.2.camel@localhost.localdomain> Message-ID: <14C95451-D221-4CED-BE79-FB2EB805D264@gmail.com> So what you are saying is that you had a classpath problem and by configuring dependencies correctly the problem went away? Andreas On 3 Nov 2009, at 06:45, Pierre-Yves wrote: > On Mon, 2009-11-02 at 20:50 +0100, Pierre-Yves wrote: >> On 11/02/2009 08:29 PM, Andreas Prlic wrote: >>>> >>>> I understand the error, I am more surprised by the fact that the >>>> jar >>>> and the sources of the release 1.7 are given a different errors. >>>> >>>> >>> that's surprising... I built the src-jar and the other jars at the >>> same time >>> so the code should be identical... Are you sure you are doing >>> exactly the >>> same? >> >> I can confirm you this tomorrow but AFAIR before I left I tried the >> same >> code using or the jar file or the project generated from the >> sources in >> NetBeans and it gaves me two differents errors. > > Ok so just for the record: > - If I use the .jar file I get an error (1) > - If I create a project in NetBeans using the source from BioJava I > get > a different error (2) > - If I add as dependencies the sources from BioJava I get the first > error (1) > > I thus went for the third solution and found my way around :-) > > Thanks for the help. 
> > Best regards, > > Pierre > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From pingou at pingoured.fr Tue Nov 3 15:00:32 2009 From: pingou at pingoured.fr (Pierre-Yves) Date: Tue, 03 Nov 2009 16:00:32 +0100 Subject: [Biojava-l] NCBI xml parser In-Reply-To: <14C95451-D221-4CED-BE79-FB2EB805D264@gmail.com> References: <1257170595.29918.8.camel@localhost.localdomain> <4AEEF075.4020700@indiana.edu> <1257178636.29918.11.camel@localhost.localdomain> <59a41c430911021129i7d320b03xd44e99e2bc1baf2c@mail.gmail.com> <4AEF3801.10304@pingoured.fr> <1257259508.26094.2.camel@localhost.localdomain> <14C95451-D221-4CED-BE79-FB2EB805D264@gmail.com> Message-ID: <1257260432.26094.3.camel@localhost.localdomain> On Tue, 2009-11-03 at 06:56 -0800, Andreas Prlic wrote: > So what you are saying is that you had a classpath problem and by > configuring dependencies correctly the problem went away? In both case it was compiling, only the error at run time was different. Regards, Pierre From andreas.prlic at gmail.com Tue Nov 3 15:05:17 2009 From: andreas.prlic at gmail.com (Andreas Prlic) Date: Tue, 3 Nov 2009 07:05:17 -0800 Subject: [Biojava-l] NCBI xml parser In-Reply-To: <1257260432.26094.3.camel@localhost.localdomain> References: <1257170595.29918.8.camel@localhost.localdomain> <4AEEF075.4020700@indiana.edu> <1257178636.29918.11.camel@localhost.localdomain> <59a41c430911021129i7d320b03xd44e99e2bc1baf2c@mail.gmail.com> <4AEF3801.10304@pingoured.fr> <1257259508.26094.2.camel@localhost.localdomain> <14C95451-D221-4CED-BE79-FB2EB805D264@gmail.com> <1257260432.26094.3.camel@localhost.localdomain> Message-ID: <447A40F9-52A1-4B22-8D10-27D22F8381B9@gmail.com> Can you send me the code snipplet off list so I can take a look? 
Thanks, A On 3 Nov 2009, at 07:00, Pierre-Yves wrote: > On Tue, 2009-11-03 at 06:56 -0800, Andreas Prlic wrote: >> So what you are saying is that you had a classpath problem and by >> configuring dependencies correctly the problem went away? > > In both case it was compiling, only the error at run time was > different. > > Regards, > > Pierre > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From hlapp at gmx.net Tue Nov 3 16:53:23 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 3 Nov 2009 11:53:23 -0500 Subject: [Biojava-l] Rooted trees in nexus files In-Reply-To: <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com> References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com> <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com> <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com> <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com> <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com> Message-ID: The most common ways to root a tree is by mid-point rooting, or using an outgroup. The latter I suppose is equivalent as the user specifying a node as the root. -hilmar On Nov 3, 2009, at 4:56 AM, Richard Holland wrote: > > On 2 Nov 2009, at 23:11, Tiago Ant?o wrote: > >> 2009/11/2 Richard Holland : >>> In the meantime, the JGraph library which is used for displaying >>> JGraphT >>> graphs in a visual form does include root-finding methods, so >>> maybe you >>> could investigate there to see if any of the existing functions >>> might help? >> >> Did that. None can help as the graph is not directed (it would be >> trivial with a directed graph ,of course). >> In the current form, the nexus parser is of limited use for tree >> information: >> 1. 
For rooted trees it has a bug has it doesn't say what is the root > > The Newick strings used in the Nexus format are themselves > undirected graphs. They don't specify which node is the root, which > means it must be determined by computation after parsing the string. > I'm unsure of the algorithm to use to do this. If there are people > on this list who know the algorithm and have time to code it up, > volunteers would be welcome. > >> 2. For unrooted trees, sometimes the "root" (what the user perceives >> as root) is interesting information. > > What the user perceives as root in an unrooted tree could be > different for every user, so it would be hard to provide a standard > function to read their mind! However if everyone can come up with a > commonly agreed way of determining the most likely root > computationally, it would be interesting to add this as a feature, > with the caveat that it is only a best-effort approximation as the > original tree is unrooted. > > cheers, > Richard > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From thasso.griebel at uni-jena.de Tue Nov 3 17:58:14 2009 From: thasso.griebel at uni-jena.de (Thasso Griebel) Date: Tue, 3 Nov 2009 18:58:14 +0100 Subject: [Biojava-l] Rooted trees in nexus files In-Reply-To: <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com> References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com> <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com> 
<0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com> <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com> <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com> Message-ID: <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de> Hi, > On 2 Nov 2009, at 23:11, Tiago Ant?o wrote: > >> 2009/11/2 Richard Holland : >>> In the meantime, the JGraph library which is used for displaying >>> JGraphT >>> graphs in a visual form does include root-finding methods, so >>> maybe you >>> could investigate there to see if any of the existing functions >>> might help? >> >> Did that. None can help as the graph is not directed (it would be >> trivial with a directed graph ,of course). >> In the current form, the nexus parser is of limited use for tree >> information: >> 1. For rooted trees it has a bug has it doesn't say what is the root > > The Newick strings used in the Nexus format are themselves > undirected graphs. They don't specify which node is the root, which > means it must be determined by computation after parsing the string. > I'm unsure of the algorithm to use to do this. If there are people > on this list who know the algorithm and have time to code it up, > volunteers would be welcome. There is a way to uniquely get a root from a newick string. Usually a rooted newick is surrounded with brackets, which indicates the root as the highest node in the tree. For example: (A, (B,C)) describes a tree rooted between "A" and the clade (B,C), and with the surrounding brackets this is unique. In nexus the situation might be a bit different. nexus allows you to prefix the newick string with [&R] or [&U] to indicate rooted/unrooted trees. For example: tree treename = [&R] ((A,(B,C)),(D,E)); is a valid rooted nexus tree where the root is placed between the clades [A.B,C] and [D,E], although in this example the newick is surrounded with brackets and rooted uniquely by itself. >> 2. 
For unrooted trees, sometimes the "root" (what the user perceives >> as root) is interesting information. > > What the user perceives as root in an unrooted tree could be > different for every user, so it would be hard to provide a standard > function to read their mind! However if everyone can come up with a > commonly agreed way of determining the most likely root > computationally, it would be interesting to add this as a feature, > with the caveat that it is only a best-effort approximation as the > original tree is unrooted. BioNJ implements multiple methods to determine a root in a neighbor-joining tree. I can look it up, but I think the most common ways to compute the root are as follows. The first is to place the root in the "middle" such that the tree is balanced and has an equal number of leaves on both sides. The other method I remember is based on the edge weights: you find the longest path between two leaves and place the root in the middle of that path (based on the path length). I think the most common way, though, is to specify an outgroup node and place the root on the path between that outgroup and its successor. I am not sure if the outgroup can be described in nexus somehow. I would also suggest generally parsing trees as rooted trees (maybe just for the initial internal model). Creating an unrooted tree from a rooted one is easy: remove the root and forget about directions. The other way might be hard and ambiguous. cheers, Thasso -- Dipl. Inf. Thasso Griebel-------------------Lehrstuhl fuer Bioinformatik Office 3426--http://bio.informatik.uni-jena.de--Institut fuer Informatik Phone +49 (0)3641 9-46454-----------Friedrich-Schiller-Universitaet Jena Fax +49 (0)3641 9-46452----------Ernst-Abbe-Platz 2, 07743 Jena, Germany From tiagoantao at gmail.com Tue Nov 3 18:16:43 2009 From: tiagoantao at gmail.com (Tiago Antão) Date: Tue, 3 Nov 2009 18:16:43 +0000 Subject: [Biojava-l] Rooted trees in nexus files In-Reply-To: <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de> References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com> <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com> <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com> <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com> <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com> <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de> Message-ID: <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com> 2009/11/3 Thasso Griebel : > There is a way to uniquely get a root from a newick string. Usually a > rooted newick is surrounded with brackets, which indicates the root as the > highest node in the tree. For example: > > (A, (B,C)) > Agreed, it is quite easy to get the root of the tree from the newick representation. But it should be done on parsing and returned in some way by the parsing system. If the user has to do it again, it means that the user has to parse it again just to know the root node. > I would also suggest generally parsing trees as rooted trees (maybe just > for the initial internal model). Creating an unrooted tree from a rooted one > is easy: remove the root and forget about directions. The other way might be > hard and ambiguous. 100% agree. The newick _representation_ always has a root by virtue of the way it is done. Whether that root has meaning or not depends on the tree. Doing as you suggest seems the most reasonable idea.
I would add that even if it is an unrooted tree, the topology might be of interest. In my case I am doing a comparative visualizer and it might be nice for the user to be able to visualize the topology as specified. It has no biological meaning, but in practice, for many users, it helps. I note that PhyloXML (even by virtue of being an XML format) always represents the phylogenies as trees (not weighted DAGs). There is an attribute, rooted, which can be true or false. But, anyway. Even assuming a very conservative view on this, the current parser, for rooted trees, does not allow one to determine where the root is. I think that there would be a consensus that that is a bug? Tiago From holland at eaglegenomics.com Tue Nov 3 18:19:36 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 3 Nov 2009 18:19:36 +0000 Subject: [Biojava-l] Rooted trees in nexus files In-Reply-To: <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com> References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com> <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com> <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com> <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com> <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com> <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de> <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com> Message-ID: Agreed that there is a bug. Now all we need is someone to go in and fix it! :) cheers, Richard On 3 Nov 2009, at 18:16, Tiago Antão wrote: > 2009/11/3 Thasso Griebel : >> There is a way to uniquely get a root from a newick string. >> Usually a >> rooted newick is surrounded with brackets, which indicates the root >> as the >> highest node in the tree. For example: >> >> (A, (B,C)) >> > > Agree, it is quite easy to get the root of the tree from the newick > representation.
But it should be done on parsing and returned in some > way by the parsing system. If the user has to do it again, it means > that the user has to parse it again just to know the root node. > >> I would also suggest to generally parse trees as rooted trees >> (maybe jsut >> for th initial internal model). Creating an unrooted tree from a >> rooted one >> is easy, remove the root and forget about directions. The other way >> might be >> hard and ambiguous. > > 100% agree. > The newick _representation_ always has a root by virtue of the way it > is done. If that root has meaning or not depends. Doing as you suggest > seems the most reasonable idea. > I would add that even if it is an unrooted tree, the topology might be > of interest. In my case I am doing a comparative visualizer and it > might be nice for the user to be able to visualize the topology as > specified. It has no biological meaning, but in practice, for many > users, it helps. > I note that PhyloXML (even by virtue of being a XML format) always > represents the phylogenies as trees (not weigthed DAGs). There an > attribute rooted which can be true or false. > > But, anyway. Even assuming a very conservative view on this, the > current parser, for rooted trees, does not allow to determine where is > the root. I think that there would be a consensus that that is a bug? 
> > Tiago -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From tiagoantao at gmail.com Tue Nov 3 18:24:52 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Tue, 3 Nov 2009 18:24:52 +0000 Subject: [Biojava-l] Rooted trees in nexus files In-Reply-To: References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com> <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com> <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com> <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com> <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com> <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de> <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com> Message-ID: <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com> If somebody would provide the desired changes to the parser interface (wrt this bug and the other one reported previously), I might offer to to the grunt work. But somebody has to say which interface changes are desired. I remember which problems exist: 1. Lack of knowledge of root node 2. The p* stuff. Tiago 2009/11/3 Richard Holland : > Agreed that there is a bug. Now all we need is someone to go in and fix it! > :) > > cheers, > Richard > > On 3 Nov 2009, at 18:16, Tiago Ant?o wrote: > >> 2009/11/3 Thasso Griebel : >>> >>> There is a way to uniquely ?get a root from a newick string. Usually a >>> rooted newick is surrounded with brackets, which indicates the root as >>> the >>> highest node in the tree. For example: >>> >>> (A, (B,C)) >>> >> >> Agree, it is quite easy to get the root of the tree from the newick >> representation. But it should be done on parsing and returned in some >> way by the parsing system. If the user has to do it again, it means >> that the user has to parse it again just to know the root node. 
>> >>> I would also suggest to generally parse trees as rooted trees (maybe jsut >>> for th initial internal model). Creating an unrooted tree from a rooted >>> ?one >>> is easy, remove the root and forget about directions. The other way might >>> be >>> hard and ambiguous. >> >> 100% agree. >> The newick _representation_ always has a root by virtue of the way it >> is done. If that root has meaning or not depends. Doing as you suggest >> seems the most reasonable idea. >> I would add that even if it is an unrooted tree, the topology might be >> of interest. In my case I am doing a comparative visualizer and it >> might be nice for the user to be able to visualize the topology as >> specified. It has no biological meaning, but in practice, for many >> users, it helps. >> I note that PhyloXML (even by virtue of being a XML format) always >> represents the phylogenies as trees (not weigthed DAGs). There an >> attribute rooted which can be true or false. >> >> But, anyway. Even assuming a very conservative view on this, the >> current parser, for rooted trees, does not allow to determine where is >> the root. I think that there would be a consensus that that is a bug? >> >> Tiago > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > -- "The hottest places in hell are reserved for those who, in times of moral crisis, maintain a neutrality." 
- Dante From holland at eaglegenomics.com Tue Nov 3 18:46:05 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 3 Nov 2009 18:46:05 +0000 Subject: [Biojava-l] Rooted trees in nexus files In-Reply-To: <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com> References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com> <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com> <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com> <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com> <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com> <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de> <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com> <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com> Message-ID: <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com> > 1. Lack of knowledge of root node The Newick tree string is read as-is and is not parsed. It only gets parsed at the point of conversion to a Undirected or WeightedGraph inside the TreeBlocks.java source code (inside the two types of get-As- JGraphT methods). It's at this point the string is parsed and it's here that root note determination should take place. It's already known whether &R or &U have been specified here, which should help the code work out what to do. > 2. The p* stuff. Exactly the same part of the code as described above. Wherever it pushes values to the stack but prepends them with 'p' first, you'll need to change the 'p' to some instance variable and provide a getter/ setter to change it, with 'p' being the default setting. cheers, Richard > > Tiago > 2009/11/3 Richard Holland : >> Agreed that there is a bug. Now all we need is someone to go in and >> fix it! >> :) >> >> cheers, >> Richard >> >> On 3 Nov 2009, at 18:16, Tiago Ant?o wrote: >> >>> 2009/11/3 Thasso Griebel : >>>> >>>> There is a way to uniquely get a root from a newick string. 
>>>> Usually a >>>> rooted newick is surrounded with brackets, which indicates the >>>> root as >>>> the >>>> highest node in the tree. For example: >>>> >>>> (A, (B,C)) >>>> >>> >>> Agree, it is quite easy to get the root of the tree from the newick >>> representation. But it should be done on parsing and returned in >>> some >>> way by the parsing system. If the user has to do it again, it means >>> that the user has to parse it again just to know the root node. >>> >>>> I would also suggest to generally parse trees as rooted trees >>>> (maybe jsut >>>> for th initial internal model). Creating an unrooted tree from a >>>> rooted >>>> one >>>> is easy, remove the root and forget about directions. The other >>>> way might >>>> be >>>> hard and ambiguous. >>> >>> 100% agree. >>> The newick _representation_ always has a root by virtue of the way >>> it >>> is done. If that root has meaning or not depends. Doing as you >>> suggest >>> seems the most reasonable idea. >>> I would add that even if it is an unrooted tree, the topology >>> might be >>> of interest. In my case I am doing a comparative visualizer and it >>> might be nice for the user to be able to visualize the topology as >>> specified. It has no biological meaning, but in practice, for many >>> users, it helps. >>> I note that PhyloXML (even by virtue of being a XML format) always >>> represents the phylogenies as trees (not weigthed DAGs). There an >>> attribute rooted which can be true or false. >>> >>> But, anyway. Even assuming a very conservative view on this, the >>> current parser, for rooted trees, does not allow to determine >>> where is >>> the root. I think that there would be a consensus that that is a >>> bug? 
>>> >>> Tiago >> >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> >> > > > > -- > "The hottest places in hell are reserved for those who, in times of > moral crisis, maintain a neutrality." - Dante -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From tiagoantao at gmail.com Tue Nov 3 18:55:23 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Tue, 3 Nov 2009 18:55:23 +0000 Subject: [Biojava-l] Rooted trees in nexus files In-Reply-To: <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com> References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com> <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com> <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com> <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com> <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de> <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com> <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com> <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com> Message-ID: <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com> But the point is that the class interface changes to the outside user: 1. How does one report back the root to the user? 2. Regarding the prefix stuff, should the user be allowed to specify a preferred prefix? Both this things imply interface changes visible to users. If you still need volunteers to do the change, I can do it. But I need to know what changes to the user interface are to be done. For 1, maybe a method getRoot, returning a string with the name of the root node? For 2, maybe an extended version of the parse function with a suffix as input parameter? 
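As a rough illustration of the point made earlier in the thread, that the outermost brackets of a Newick string identify the root, the following Java sketch lists the subtrees hanging directly off that root. The class and method names (NewickRootSketch, rootChildren) are hypothetical and are not part of BioJava. The sketch assumes unquoted labels and that any [&R] or [&U] comment has already been stripped from the string.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not BioJava API: reads the root's children from a Newick string. */
public class NewickRootSketch {

    /**
     * Returns the top-level subtrees of a Newick string, i.e. the children
     * of the node represented by the outermost pair of brackets.
     */
    public static List<String> rootChildren(String newick) {
        String s = newick.trim();
        if (s.endsWith(";")) {
            s = s.substring(0, s.length() - 1).trim();
        }
        int close = s.lastIndexOf(')');
        if (!s.startsWith("(") || close < 0) {
            throw new IllegalArgumentException("Not a bracketed Newick string: " + newick);
        }
        List<String> children = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        // Walk the text between the outermost brackets and split on
        // commas that are not nested inside an inner clade.
        for (int i = 1; i < close; i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            }
            if (c == ',' && depth == 0) {
                children.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        children.add(current.toString().trim());
        return children;
    }
}
```

For the example quoted in this thread, (A, (B,C)), the root's children would be the leaf A and the clade (B,C). A real parser would also need to handle quoted labels and branch lengths on the root's children, which this sketch does not interpret.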
2009/11/3 Richard Holland : >> 1. Lack of knowledge of root node > > The Newick tree string is read as-is and is not parsed. It only gets parsed > at the point of conversion to a Undirected or WeightedGraph inside the > TreeBlocks.java source code (inside the two types of get-As-JGraphT > methods). It's at this point the string is parsed and it's here that root > note determination should take place. It's already known whether &R or &U > have been specified here, which should help the code work out what to do. > >> 2. The p* stuff. > > Exactly the same part of the code as described above. Wherever it pushes > values to the stack but prepends them with 'p' first, you'll need to change > the 'p' to some instance variable and provide a getter/setter to change it, > with 'p' being the default setting. > > cheers, > Richard > >> >> Tiago >> 2009/11/3 Richard Holland : >>> >>> Agreed that there is a bug. Now all we need is someone to go in and fix >>> it! >>> :) >>> >>> cheers, >>> Richard >>> >>> On 3 Nov 2009, at 18:16, Tiago Ant?o wrote: >>> >>>> 2009/11/3 Thasso Griebel : >>>>> >>>>> There is a way to uniquely ?get a root from a newick string. Usually a >>>>> rooted newick is surrounded with brackets, which indicates the root as >>>>> the >>>>> highest node in the tree. For example: >>>>> >>>>> (A, (B,C)) >>>>> >>>> >>>> Agree, it is quite easy to get the root of the tree from the newick >>>> representation. But it should be done on parsing and returned in some >>>> way by the parsing system. If the user has to do it again, it means >>>> that the user has to parse it again just to know the root node. >>>> >>>>> I would also suggest to generally parse trees as rooted trees (maybe >>>>> jsut >>>>> for th initial internal model). Creating an unrooted tree from a rooted >>>>> ?one >>>>> is easy, remove the root and forget about directions. The other way >>>>> might >>>>> be >>>>> hard and ambiguous. >>>> >>>> 100% agree. 
>>>> The newick _representation_ always has a root by virtue of the way it >>>> is done. If that root has meaning or not depends. Doing as you suggest >>>> seems the most reasonable idea. >>>> I would add that even if it is an unrooted tree, the topology might be >>>> of interest. In my case I am doing a comparative visualizer and it >>>> might be nice for the user to be able to visualize the topology as >>>> specified. It has no biological meaning, but in practice, for many >>>> users, it helps. >>>> I note that PhyloXML (even by virtue of being a XML format) always >>>> represents the phylogenies as trees (not weigthed DAGs). There an >>>> attribute rooted which can be true or false. >>>> >>>> But, anyway. Even assuming a very conservative view on this, the >>>> current parser, for rooted trees, does not allow to determine where is >>>> the root. I think that there would be a consensus that that is a bug? >>>> >>>> Tiago >>> >>> -- >>> Richard Holland, BSc MBCS >>> Operations and Delivery Director, Eagle Genomics Ltd >>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >>> http://www.eaglegenomics.com/ >>> >>> >> >> >> >> -- >> "The hottest places in hell are reserved for those who, in times of >> moral crisis, maintain a neutrality." - Dante > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > -- "The hottest places in hell are reserved for those who, in times of moral crisis, maintain a neutrality." 
- Dante From peter.midford at gmail.com Tue Nov 3 19:28:14 2009 From: peter.midford at gmail.com (Peter Midford) Date: Tue, 3 Nov 2009 14:28:14 -0500 Subject: [Biojava-l] Rooted trees in nexus files In-Reply-To: <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com> References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com> <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com> <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com> <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com> <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de> <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com> <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com> <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com> <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com> Message-ID: <2E8B7EE9-2617-4096-B7AC-52A398D7E69F@gmail.com> Tiago, If you return a directed graph, the root will be a node with no incoming edges. Peter On Nov 3, 2009, at 13:55, Tiago Ant?o wrote: > But the point is that the class interface changes to the outside user: > 1. How does one report back the root to the user? > 2. Regarding the prefix stuff, should the user be allowed to specify a > preferred prefix? > > Both this things imply interface changes visible to users. > If you still need volunteers to do the change, I can do it. But I need > to know what changes to the user interface are to be done. > For 1, maybe a method getRoot, returning a string with the name of the > root node? > For 2, maybe an extended version of the parse function with a suffix > as input parameter? > > 2009/11/3 Richard Holland : >>> 1. Lack of knowledge of root node >> >> The Newick tree string is read as-is and is not parsed. It only >> gets parsed >> at the point of conversion to a Undirected or WeightedGraph inside >> the >> TreeBlocks.java source code (inside the two types of get-As-JGraphT >> methods). 
It's at this point the string is parsed and it's here >> that root >> note determination should take place. It's already known whether &R >> or &U >> have been specified here, which should help the code work out what >> to do. >> >>> 2. The p* stuff. >> >> Exactly the same part of the code as described above. Wherever it >> pushes >> values to the stack but prepends them with 'p' first, you'll need >> to change >> the 'p' to some instance variable and provide a getter/setter to >> change it, >> with 'p' being the default setting. >> >> cheers, >> Richard >> >>> >>> Tiago >>> 2009/11/3 Richard Holland : >>>> >>>> Agreed that there is a bug. Now all we need is someone to go in >>>> and fix >>>> it! >>>> :) >>>> >>>> cheers, >>>> Richard >>>> >>>> On 3 Nov 2009, at 18:16, Tiago Ant?o wrote: >>>> >>>>> 2009/11/3 Thasso Griebel : >>>>>> >>>>>> There is a way to uniquely get a root from a newick string. >>>>>> Usually a >>>>>> rooted newick is surrounded with brackets, which indicates the >>>>>> root as >>>>>> the >>>>>> highest node in the tree. For example: >>>>>> >>>>>> (A, (B,C)) >>>>>> >>>>> >>>>> Agree, it is quite easy to get the root of the tree from the >>>>> newick >>>>> representation. But it should be done on parsing and returned in >>>>> some >>>>> way by the parsing system. If the user has to do it again, it >>>>> means >>>>> that the user has to parse it again just to know the root node. >>>>> >>>>>> I would also suggest to generally parse trees as rooted trees >>>>>> (maybe >>>>>> jsut >>>>>> for th initial internal model). Creating an unrooted tree from >>>>>> a rooted >>>>>> one >>>>>> is easy, remove the root and forget about directions. The other >>>>>> way >>>>>> might >>>>>> be >>>>>> hard and ambiguous. >>>>> >>>>> 100% agree. >>>>> The newick _representation_ always has a root by virtue of the >>>>> way it >>>>> is done. If that root has meaning or not depends. Doing as you >>>>> suggest >>>>> seems the most reasonable idea. 
>>>>> I would add that even if it is an unrooted tree, the topology >>>>> might be >>>>> of interest. In my case I am doing a comparative visualizer and it >>>>> might be nice for the user to be able to visualize the topology as >>>>> specified. It has no biological meaning, but in practice, for many >>>>> users, it helps. >>>>> I note that PhyloXML (even by virtue of being a XML format) always >>>>> represents the phylogenies as trees (not weigthed DAGs). There an >>>>> attribute rooted which can be true or false. >>>>> >>>>> But, anyway. Even assuming a very conservative view on this, the >>>>> current parser, for rooted trees, does not allow to determine >>>>> where is >>>>> the root. I think that there would be a consensus that that is a >>>>> bug? >>>>> >>>>> Tiago >>>> >>>> -- >>>> Richard Holland, BSc MBCS >>>> Operations and Delivery Director, Eagle Genomics Ltd >>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >>>> http://www.eaglegenomics.com/ >>>> >>>> >>> >>> >>> >>> -- >>> "The hottest places in hell are reserved for those who, in times of >>> moral crisis, maintain a neutrality." - Dante >> >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> >> > > > > -- > "The hottest places in hell are reserved for those who, in times of > moral crisis, maintain a neutrality." - Dante > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l Peter E. 
Midford Mesquite Developer Peter.Midford at gmail.com From holland at eaglegenomics.com Tue Nov 3 20:20:31 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 3 Nov 2009 20:20:31 +0000 Subject: [Biojava-l] Rooted trees in nexus files In-Reply-To: <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com> References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com> <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com> <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com> <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com> <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de> <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com> <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com> <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com> <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com> Message-ID: A getRoot() function sounds good. It would return the String label of the root node, the same as which identifies the corresponding vertex in the JGraphT model. An equivalent setRoot() would be nice. The prefix for the parser currently is hardcoded as p. Two new methods - set and getDefaultPrefix which accept a string should be provided (it should check that the string is valid, i.e. all alphanumeric and with no spaces or other Newick-sensitive characters). The parser should be changed to use the output from getDefaultPrefix() instead of the hardcoded p. The default behaviour should be such that it behaves the same as at present unless the user explicitly says otherwise by calling the setDefaultPrefix() method. Personally I would also alter the methods that return JGraphTs so that they return their Directed equivalents if possible. I believe that these can still be unrooted - you'd have to check the JGraphT documentation to make sure. Richard. 
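The accessors proposed above could look roughly like the following Java sketch. Only the method names getRoot, setRoot, getDefaultPrefix and setDefaultPrefix come from the proposal; the class name TreeSettingsSketch and the helper internalNodeLabel are hypothetical, and how this would be wired into the existing parser is left open. The validation rule (letters and digits only) is one possible reading of "all alphanumeric and with no spaces or other Newick-sensitive characters".

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of the proposed accessors; not the actual BioJava class. */
public class TreeSettingsSketch {

    // Assumption: a valid prefix is one or more ASCII letters or digits.
    private static final Pattern VALID_PREFIX = Pattern.compile("[A-Za-z0-9]+");

    // 'p' stays the default so behaviour is unchanged unless the user opts in.
    private String defaultPrefix = "p";

    // Label of the root vertex, or null when no root is known.
    private String root;

    public String getDefaultPrefix() {
        return defaultPrefix;
    }

    public void setDefaultPrefix(String prefix) {
        if (prefix == null || !VALID_PREFIX.matcher(prefix).matches()) {
            throw new IllegalArgumentException("Prefix must be alphanumeric: " + prefix);
        }
        this.defaultPrefix = prefix;
    }

    public String getRoot() {
        return root;
    }

    public void setRoot(String rootLabel) {
        this.root = rootLabel;
    }

    /** Builds a label for an unnamed internal node, e.g. "p3" with the default prefix. */
    public String internalNodeLabel(int index) {
        return getDefaultPrefix() + index;
    }
}
```

Keeping the prefix behind a getter means the parser can call getDefaultPrefix() wherever it currently prepends the hardcoded value, so existing callers would see no change.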
On 3 Nov 2009, at 18:55, Tiago Antão wrote: > But the point is that the class interface changes to the outside user: > 1. How does one report back the root to the user? > 2. Regarding the prefix stuff, should the user be allowed to specify a > preferred prefix? > > Both these things imply interface changes visible to users. > If you still need volunteers to do the change, I can do it. But I need > to know what changes to the user interface are to be done. > For 1, maybe a method getRoot, returning a string with the name of the > root node? > For 2, maybe an extended version of the parse function with a prefix > as input parameter? > > 2009/11/3 Richard Holland : >>> 1. Lack of knowledge of root node >> >> The Newick tree string is read as-is and is not parsed. It only >> gets parsed >> at the point of conversion to an Undirected or WeightedGraph inside >> the >> TreesBlock.java source code (inside the two types of get-As-JGraphT >> methods). It's at this point the string is parsed and it's here >> that root >> node determination should take place. It's already known whether &R >> or &U >> have been specified here, which should help the code work out what >> to do. >> >>> 2. The p* stuff. >> >> Exactly the same part of the code as described above. Wherever it >> pushes >> values to the stack but prepends them with 'p' first, you'll need >> to change >> the 'p' to some instance variable and provide a getter/setter to >> change it, >> with 'p' being the default setting. >> >> cheers, >> Richard >> >>> >>> Tiago >>> 2009/11/3 Richard Holland : >>>> >>>> Agreed that there is a bug. Now all we need is someone to go in >>>> and fix >>>> it! >>>> :) >>>> >>>> cheers, >>>> Richard >>>> >>>> On 3 Nov 2009, at 18:16, Tiago Antão wrote: >>>> >>>>> 2009/11/3 Thasso Griebel : >>>>>> >>>>>> There is a way to uniquely get a root from a newick string.
>>>>>> Usually a >>>>>> rooted newick is surrounded with brackets, which indicates the >>>>>> root as >>>>>> the >>>>>> highest node in the tree. For example: >>>>>> >>>>>> (A, (B,C)) >>>>>> >>>>> >>>>> Agree, it is quite easy to get the root of the tree from the >>>>> newick >>>>> representation. But it should be done on parsing and returned in >>>>> some >>>>> way by the parsing system. If the user has to do it again, it >>>>> means >>>>> that the user has to parse it again just to know the root node. >>>>> >>>>>> I would also suggest to generally parse trees as rooted trees >>>>>> (maybe >>>>>> just >>>>>> for the initial internal model). Creating an unrooted tree from >>>>>> a rooted >>>>>> one >>>>>> is easy, remove the root and forget about directions. The other >>>>>> way >>>>>> might >>>>>> be >>>>>> hard and ambiguous. >>>>> >>>>> 100% agree. >>>>> The newick _representation_ always has a root by virtue of the >>>>> way it >>>>> is done. If that root has meaning or not depends. Doing as you >>>>> suggest >>>>> seems the most reasonable idea. >>>>> I would add that even if it is an unrooted tree, the topology >>>>> might be >>>>> of interest. In my case I am doing a comparative visualizer and it >>>>> might be nice for the user to be able to visualize the topology as >>>>> specified. It has no biological meaning, but in practice, for many >>>>> users, it helps. >>>>> I note that PhyloXML (even by virtue of being an XML format) always >>>>> represents the phylogenies as trees (not weighted DAGs). There is an >>>>> attribute rooted which can be true or false. >>>>> >>>>> But, anyway. Even assuming a very conservative view on this, the >>>>> current parser, for rooted trees, does not allow one to determine >>>>> where >>>>> the root is. I think that there would be a consensus that that is a >>>>> bug?
>>>>> >>>>> Tiago >>>> >>>> -- >>>> Richard Holland, BSc MBCS >>>> Operations and Delivery Director, Eagle Genomics Ltd >>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >>>> http://www.eaglegenomics.com/ >>>> >>>> >>> >>> >>> >>> -- >>> "The hottest places in hell are reserved for those who, in times of >>> moral crisis, maintain a neutrality." - Dante >> >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> >> > > > > -- > "The hottest places in hell are reserved for those who, in times of > moral crisis, maintain a neutrality." - Dante -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From thasso.griebel at uni-jena.de Wed Nov 4 11:57:45 2009 From: thasso.griebel at uni-jena.de (Thasso Griebel) Date: Wed, 4 Nov 2009 12:57:45 +0100 Subject: [Biojava-l] Rooted trees in nexus files In-Reply-To: References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com> <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com> <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com> <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com> <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de> <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com> <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com> <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com> <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com> Message-ID: <6303BD72-3F47-4DB1-8526-0402FEB502EE@uni-jena.de> Hi, > A getRoot() function sounds good. It would return the String label > of the root node, the same as which identifies the corresponding > vertex in the JGraphT model. An equivalent setRoot() would be nice. 
Though you have to keep in mind that switching the root to another node has certain implications on the tree structure and this has to be taken into account when the newick string is parsed and the graph is created. You have to parse the graph from newick and then "reroot" the tree as the root might not be equal to the one specified in the newick string. > Personally I would also alter the methods that return JGraphTs so > that they return their Directed equivalents if possible. I believe > that these can still be unrooted - you'd have to check the JGraphT > documentation to make sure. You have to change that method signature if you want to use the same method. The only relationship between JGraphTs UndirectedGraph and the DirectedGraph counterpart is that they both extend the Graph interface, but a DirectedGraph is not an UndirectedGraph. Switching to DirectedGraph definitely breaks the current API ! I don't know how you usually handle such situations in BioJava, but this clearly breaks compatibility. Maybe it would be better to introduce a new method that returns directed graphs ? cheers, -thasso -- Dipl. Inf. Thasso Griebel-------------------Lehrstuhl fuer Bioinformatik Office 3426--http://bio.informatik.uni-jena.de--Institut fuer Informatik Phone +49 (0)3641 9-46454-----------Friedrich-Schiller-Universitaet Jena Fax +49 (0)3641 9-46452----------Ernst-Abbe-Platz 2, 07743 Jena, Germany From tiagoantao at gmail.com Wed Nov 4 12:40:46 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 4 Nov 2009 12:40:46 +0000 Subject: [Biojava-l] Rooted trees in nexus files In-Reply-To: References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com> <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com> <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de> <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com> <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com> <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com> <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com> Message-ID: <6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com> 2009/11/3 Richard Holland : > The prefix for the parser currently is hardcoded as p. Two new methods - set > and getDefaultPrefix which accept a string should be provided (it should > check that the string is valid, i.e. all alphanumeric and with no spaces or > other Newick-sensitive characters). The parser should be changed to use the > output from getDefaultPrefix() instead of the hardcoded p. The default > behaviour should be such that it behaves the same as at present unless the > user explicitly says otherwise by calling the setDefaultPrefix() method. This default behavior would still raise an exception with nodes called p* .
I would suggest a minor change: If there is a clash, the parser would try the next p* (or whatever defaultPrefix) ... Example to make it clear: if there is a leaf called p2, internal nodes generated would be p1, p3, p4, .... -- "The hottest places in hell are reserved for those who, in times of moral crisis, maintain a neutrality." - Dante From tiagoantao at gmail.com Wed Nov 4 12:44:21 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 4 Nov 2009 12:44:21 +0000 Subject: [Biojava-l] Rooted trees in nexus files In-Reply-To: <6303BD72-3F47-4DB1-8526-0402FEB502EE@uni-jena.de> References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com> <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de> <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com> <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com> <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com> <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com> <6303BD72-3F47-4DB1-8526-0402FEB502EE@uni-jena.de> Message-ID: <6d941f120911040444y33da2642oe7104708a2d2a6cb@mail.gmail.com> 2009/11/4 Thasso Griebel : >> Personally I would also alter the methods that return JGraphTs so that >> they return their Directed equivalents if possible. I believe that these can >> still be unrooted - you'd have to check the JGraphT documentation to make >> sure. > > You have to change that method signature if you want to use the same method. > The only relationship between JGraphTs UndirectedGraph and the DirectedGraph > counterpart is that they both extend the Graph interface, but a > DirectedGraph is not an UndirectedGraph. Switching to DirectedGraph > definitely breaks the current API ! I don't know how you usually handle such > situations in BioJava, but this clearly breaks compatibility. Maybe it would > be better to introduce a new method that returns directed graphs ? 
I also don't know how BioJava sorts these kinds of issues. But my personal, outsider, opinion would be in your direction, i.e.: a. Not break the current API b. Add a new method with a directed graph c. (extra) Add a new method boolean isRooted() to check whether the tree is rooted or not... Best Tiago From holland at eaglegenomics.com Wed Nov 4 12:46:01 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Wed, 4 Nov 2009 12:46:01 +0000 Subject: [Biojava-l] Rooted trees in nexus files In-Reply-To: <6303BD72-3F47-4DB1-8526-0402FEB502EE@uni-jena.de> References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com> <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com> <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com> <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com> <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de> <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com> <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com> <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com> <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com> <6303BD72-3F47-4DB1-8526-0402FEB502EE@uni-jena.de> Message-ID: > > You have to change that method signature if you want to use the same > method. The only relationship between JGraphT's UndirectedGraph and > the DirectedGraph counterpart is that they both extend the Graph > interface, but a DirectedGraph is not an UndirectedGraph. Switching > to DirectedGraph definitely breaks the current API ! I don't know > how you usually handle such situations in BioJava, but this clearly > breaks compatibility. Maybe it would be better to introduce a new > method that returns directed graphs ? Whether or not to break the API depends on a few things. First, how old and well adopted the code is. Second, whether the existing API is illogical or just plain wrong. A balance between the two gives the confidence with which the API can be changed.
In this instance, the code is fairly new, not widely adopted, and the existing API is clearly wrong by forcing all JGraphT graphs to be undirected. To keep everyone happy, I would introduce a new method with a new name that takes a boolean or enum option indicating what type of graph the user wants (undirected,directed,whatever). I would then deprecate the existing method and move its contents into the undirected part of the new method, and replace the old method contents with a call to the new method with the option set to undirected. cheers, Richard -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From holland at eaglegenomics.com Wed Nov 4 12:46:34 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Wed, 4 Nov 2009 12:46:34 +0000 Subject: [Biojava-l] Rooted trees in nexus files In-Reply-To: <6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com> References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com> <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com> <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de> <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com> <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com> <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com> <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com> <6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com> Message-ID: <6E4DA847-C5A6-47FA-ACA4-88C55DDD2CFC@eaglegenomics.com> Sounds good. On 4 Nov 2009, at 12:40, Tiago Antão wrote: > 2009/11/3 Richard Holland : >> The prefix for the parser currently is hardcoded as p. Two new >> methods - set >> and getDefaultPrefix which accept a string should be provided (it >> should >> check that the string is valid, i.e. all alphanumeric and with no >> spaces or >> other Newick-sensitive characters). The parser should be changed to >> use the >> output from getDefaultPrefix() instead of the hardcoded p.
The >> default >> behaviour should be such that it behaves the same as at present >> unless the >> user explicitly says otherwise by calling the setDefaultPrefix() >> method. > > This default behavior would still raise an exception with nodes called > p* . I would suggest a minor change: If there is a clash, the parser > would try the next p* (or whatever defaultPrefix) ... > > Example to make it clear: if there is a leaf called p2, internal nodes > generated would be p1, p3, p4, .... > > -- > "The hottest places in hell are reserved for those who, in times of > moral crisis, maintain a neutrality." - Dante -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From holland at eaglegenomics.com Wed Nov 4 12:51:37 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Wed, 4 Nov 2009 12:51:37 +0000 Subject: [Biojava-l] Rooted trees in nexus files In-Reply-To: <6E4DA847-C5A6-47FA-ACA4-88C55DDD2CFC@eaglegenomics.com> References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com> <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com> <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de> <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com> <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com> <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com> <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com> <6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com> <6E4DA847-C5A6-47FA-ACA4-88C55DDD2CFC@eaglegenomics.com> Message-ID: ah... except a problem! The parser does not know all names in the string in advance, so if it auto-assigns one that is then used later in the string, we have the same problem with name clashes as before. 
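One way around the clash described above is a cheap first pass over the Newick string that collects every label before any internal names are handed out. The following is a minimal sketch under stated assumptions: labels are unquoted, and the class InternalNameSketch with its methods collectLabels and freshNames are hypothetical names for illustration, not BioJava API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of BioJava.
class InternalNameSketch {
    // A token is any run of characters that is not Newick punctuation or whitespace.
    // Branch lengths and comment contents are collected too, which only makes
    // the clash check more conservative.
    private static final Pattern TOKEN = Pattern.compile("[^\\s(),:;\\[\\]]+");

    // Pre-parse: gather every token that could be a node label.
    static Set<String> collectLabels(String newick) {
        Set<String> labels = new HashSet<String>();
        Matcher m = TOKEN.matcher(newick);
        while (m.find()) {
            labels.add(m.group());
        }
        return labels;
    }

    // Produce `count` names prefix1, prefix2, ... skipping any label already in use.
    static List<String> freshNames(String newick, String prefix, int count) {
        Set<String> used = collectLabels(newick);
        List<String> names = new ArrayList<String>();
        for (int i = 1; names.size() < count; i++) {
            String candidate = prefix + i;
            if (!used.contains(candidate)) {
                names.add(candidate);
            }
        }
        return names;
    }
}
```

For the tree (A,(p2,C)); with prefix p, asking for three names yields p1, p3, p4, which matches the example given earlier in the thread.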
The names the parser assigns cannot totally avoid all clashes unless it has already parsed the string to find out what names were used in the string itself already. So some kind of pre-parse would be necessary. -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From tiagoantao at gmail.com Wed Nov 4 17:18:52 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 4 Nov 2009 17:18:52 +0000 Subject: [Biojava-l] Rooted trees in nexus files In-Reply-To: References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com> <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com> <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com> <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com> <6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com> <6E4DA847-C5A6-47FA-ACA4-88C55DDD2CFC@eaglegenomics.com> Message-ID: <6d941f120911040918v663f6d01s8d8d14d0bda94fc0@mail.gmail.com> Unless anyone with experience in biojava development wants to take on this, I would volunteer to do this. I ended up using the PhyloXML forester-atv parser (and moving to phyloxml instead of nexus), but as I reported this, I might as well sort it out... -- "The hottest places in hell are reserved for those who, in times of moral crisis, maintain a neutrality."
- Dante

From andreas at sdsc.edu Wed Nov 4 17:26:06 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 4 Nov 2009 09:26:06 -0800
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120911040918v663f6d01s8d8d14d0bda94fc0@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com> <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com> <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com> <6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com> <6E4DA847-C5A6-47FA-ACA4-88C55DDD2CFC@eaglegenomics.com> <6d941f120911040918v663f6d01s8d8d14d0bda94fc0@mail.gmail.com>
Message-ID: <59a41c430911040926h607b51e6ydd6d8145424d073b@mail.gmail.com>

excellent, thanks for taking this on!

Andreas

From tiagoantao at gmail.com Fri Nov 6 11:30:00 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Fri, 6 Nov 2009 11:30:00 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <59a41c430911040926h607b51e6ydd6d8145424d073b@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com> <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com> <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com> <6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com> <6E4DA847-C5A6-47FA-ACA4-88C55DDD2CFC@eaglegenomics.com> <6d941f120911040918v663f6d01s8d8d14d0bda94fc0@mail.gmail.com> <59a41c430911040926h607b51e6ydd6d8145424d073b@mail.gmail.com>
Message-ID: <6d941f120911060330t236fc033x105c8d05749fad36@mail.gmail.com>

I've made a few changes to TreesBlock, implementing a version of what was discussed here:

1. I kept getTreeAsJGraphT and getTreeAsWeightedJGraphT as they are in terms of interface.
2. There is now a new method, getTopNode, stating which node is at the "top". I use the name getTopNode rather than getRootNode to avoid misleading users: only rooted trees have a root, but in the nexus type of representation all trees have a "top" (which in rooted trees is the root).
3. There are now setNodePrefix and getNodePrefix methods to change the prefix (which defaults to p, as before).

In my view these changes solve both problems: the issue with node names and the need to know the root/top of a nexus tree. It might not be the best solution, but it gets things on the right track without taking too much of my time.
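For illustration, the prefix handling in point 3, combined with the clash-skipping idea from earlier in the thread (a leaf called p2 leads to internal nodes p1, p3, p4, ...), could look roughly like the sketch below. This is only a sketch under stated assumptions: NodeNamer and its next() method are hypothetical names, not part of TreesBlock, and it assumes every name already used in the tree string is known up front (the pre-parse mentioned earlier).

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of BioJava's TreesBlock.
class NodeNamer {
    private final Set<String> taken;   // names already present in the tree
    private String prefix = "p";       // default prefix, as before
    private int counter = 0;

    NodeNamer(Collection<String> existingNames) {
        this.taken = new HashSet<>(existingNames);
    }

    String getNodePrefix() {
        return prefix;
    }

    void setNodePrefix(String prefix) {
        // Only alphanumeric prefixes: no spaces or Newick-sensitive characters.
        if (prefix == null || !prefix.matches("[A-Za-z0-9]+")) {
            throw new IllegalArgumentException("Prefix must be alphanumeric: " + prefix);
        }
        this.prefix = prefix;
    }

    // Returns the next unused internal node name, skipping any clash.
    String next() {
        String name;
        do {
            name = prefix + (++counter);
        } while (!taken.add(name)); // add() returns false when the name exists
        return name;
    }

    public static void main(String[] args) {
        NodeNamer namer = new NodeNamer(Arrays.asList("p2", "A", "B"));
        // prints: p1 p3 p4
        System.out.println(namer.next() + " " + namer.next() + " " + namer.next());
    }
}
```

Keeping the counter shared across prefix changes is a design choice of this sketch; it simply guarantees that numbers are never reused.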
There are also no changes to the signatures of existing methods.

Now, there is still a problem:
addTree(final String label, UndirectedGraph treegraph)
is highly dependent on the p* convention for internal nodes. Here I would be tempted to change the method signature to:
addTree(final String label, UndirectedGraph treegraph, String topNode)

Interestingly, there is no addTree with weighted graphs (for distances).

If nobody sees a problem with this, I will change addTree.

I will then attach a patch to the currently open bug (along with test cases). And it should be done.

--
"The hottest places in hell are reserved for those who, in times of moral crisis, maintain a neutrality."
- Dante

From holland at eaglegenomics.com Fri Nov 6 11:45:18 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 6 Nov 2009 11:45:18 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120911060330t236fc033x105c8d05749fad36@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com> <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com> <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com> <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com> <6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com> <6E4DA847-C5A6-47FA-ACA4-88C55DDD2CFC@eaglegenomics.com> <6d941f120911040918v663f6d01s8d8d14d0bda94fc0@mail.gmail.com> <59a41c430911040926h607b51e6ydd6d8145424d073b@mail.gmail.com> <6d941f120911060330t236fc033x105c8d05749fad36@mail.gmail.com>
Message-ID:

Sounds great.

With regard to addTree, you could add a new method with the signature that you propose, copy the existing method body into it and modify it appropriately, then delete the existing method body and replace it with a call to the new method, passing a default topNode value that corresponds to the assumptions the existing method currently makes.

cheers,
Richard

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From tiagoantao at gmail.com Fri Nov 6 13:26:58 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Fri, 6 Nov 2009 13:26:58 +0000
Subject: [Biojava-l] Newick/Nexus processing of non-binary trees
Message-ID: <6d941f120911060526r14d43c43ncb5541d89a8dcaa3@mail.gmail.com>

Hi,

Either I have been looking at the code for too long, or it seems to me that the current implementation only supports binary trees (i.e., trees in which each internal node has 2 children).

I have tested with:
tree tree6 = (1,2,3);

And I get only 2 edges. The edge pointing to "1" gets lost. Inspecting the old code, this seems to be how it is implemented.
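To make the expectation concrete: a parser that handles nodes with more than two children should produce three edges for (1,2,3). A minimal recursive-descent sketch follows. It is not the BioJava parser; NewickSketch and Node are hypothetical names, and the sketch deliberately ignores branch lengths, quoted labels, comments and whitespace.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of recursive Newick parsing; not BioJava code.
class NewickSketch {

    /** A tree node that may have any number of children. */
    static final class Node {
        String label = "";
        final List<Node> children = new ArrayList<>();

        /** Number of parent-child edges in the subtree below this node. */
        int edgeCount() {
            int count = children.size();
            for (Node child : children) {
                count += child.edgeCount();
            }
            return count;
        }
    }

    private final String text;
    private int pos = 0;

    private NewickSketch(String text) {
        this.text = text;
    }

    static Node parse(String newick) {
        NewickSketch parser = new NewickSketch(newick.trim());
        Node root = parser.parseNode();
        parser.consumeIf(';'); // the trailing semicolon is optional here
        if (parser.pos != parser.text.length()) {
            throw new IllegalArgumentException("Unexpected text at position " + parser.pos);
        }
        return root;
    }

    // node := '(' node (',' node)* ')' label | label
    private Node parseNode() {
        Node node = new Node();
        if (consumeIf('(')) {
            do {
                node.children.add(parseNode()); // one recursive call per child
            } while (consumeIf(','));
            if (!consumeIf(')')) {
                throw new IllegalArgumentException("Expected ')' at position " + pos);
            }
        }
        node.label = readLabel();
        return node;
    }

    private boolean consumeIf(char expected) {
        if (pos < text.length() && text.charAt(pos) == expected) {
            pos++;
            return true;
        }
        return false;
    }

    private String readLabel() {
        int start = pos;
        while (pos < text.length() && "(),;".indexOf(text.charAt(pos)) < 0) {
            pos++;
        }
        return text.substring(start, pos);
    }

    public static void main(String[] args) {
        Node root = parse("(1,2,3);");
        // prints: 3
        System.out.println(root.edgeCount());
    }
}
```

Because each child is handled by its own recursive call, the call stack keeps track of which children belong to which node, so no markers are needed.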
If I am correct, this renders the whole tree parser somewhat useless in its current form, as most phylogenetic trees are not strictly binary.

The other two bugs are now corrected, but this is much more serious, methinks.

--
"The hottest places in hell are reserved for those who, in times of moral crisis, maintain a neutrality." - Dante

From holland at eaglegenomics.com Fri Nov 6 14:10:54 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 6 Nov 2009 14:10:54 +0000
Subject: [Biojava-l] Newick/Nexus processing of non-binary trees
In-Reply-To: <6d941f120911060526r14d43c43ncb5541d89a8dcaa3@mail.gmail.com>
References: <6d941f120911060526r14d43c43ncb5541d89a8dcaa3@mail.gmail.com>
Message-ID: <120EB3C1-0043-4E8C-8637-F16B3B56094B@eaglegenomics.com>

If that's true, sounds like it's broke. Is the old code easily modified to suit arbitrary numbers of children?

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From tiagoantao at gmail.com Fri Nov 6 14:40:00 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Fri, 6 Nov 2009 14:40:00 +0000
Subject: [Biojava-l] Newick/Nexus processing of non-binary trees
In-Reply-To: <120EB3C1-0043-4E8C-8637-F16B3B56094B@eaglegenomics.com>
References: <6d941f120911060526r14d43c43ncb5541d89a8dcaa3@mail.gmail.com> <120EB3C1-0043-4E8C-8637-F16B3B56094B@eaglegenomics.com>
Message-ID: <6d941f120911060640v29b52bedxd120e39d092a1e88@mail.gmail.com>

2009/11/6 Richard Holland:
> If that's true, sounds like it's broke. Is the old code easily modified to
> suit arbitrary numbers of children?

I don't think so. It uses a stack-based solution, so it would not be possible to know whether a part of the stack belongs to the current node being processed or to something else in the tree. One could put markers on the stack or something, but it would become a bit convoluted. I would suppose a recursive implementation would be cleaner here.

My suggestion is for somebody else to verify my findings. I might be doing something stupidly wrong. Maybe things are correct. Any simple tree like (1,2,3), as long as it is not binary, should expose the problem.

From cmasak at gmail.com Fri Nov 6 16:25:57 2009
From: cmasak at gmail.com (Carl Mäsak)
Date: Fri, 6 Nov 2009 17:25:57 +0100
Subject: [Biojava-l] How do I read a FASTA file containing protein sequences in lowercase?
Message-ID: <16d769b70911060825v298529b5m805c1f7fd388549b@mail.gmail.com>

I'm using RichSequenceIterator to read FASTA files containing proteins.
Somehow it doesn't work when the protein sequences are in lowercase, which they sometimes are when downloaded from e.g. Uniprot. My code fails to recognize the following file as containing a protein sequence:

>OPSD_FELCA
mngtegpnfyvpfsnktgvvrspfeypqyylaepwqfsmlaaymfllivlgfpinfltlyvtvqhkklrtplnyilln
lavadlfmvfggftttlytslhgyfvfgptgcnlegffatlggeialwslvvlaieryvvvckpmsnfrfgenhaimgv
aftwvmalacaapplvgwsryipegmqcscgidyytlkpevnnesfviymfvvhftipmiviffcygqlvftvkeaaaq
qqesattqkaekevtrmviimviaflicwvpyasvafyifthqgsnfgpifmtlpaffaksssiynpviyimmnkqfrn
cmlttlccgknplgddeasttgsktetsqvapa

What am I missing? Here's the code I'm using to read in sequences:

    private List sequencesFromInputStream(InputStream stream) {

        BufferedInputStream bufferedStream = new BufferedInputStream(stream);
        Namespace ns = RichObjectFactory.getDefaultNamespace();
        RichSequenceIterator seqit = null;

        try {
            seqit = RichSequence.IOTools.readStream(bufferedStream, ns);
        } catch (IOException e) {
            logger.error("Couldn't read sequences from file", e);
            return Collections.emptyList();
        }

        List sequences = new ArrayList();
        try {
            while ( seqit.hasNext() ) {
                RichSequence rseq;
                rseq = seqit.nextRichSequence(); // *error occurs here*
                if (rseq == null)
                    continue;
                String alphabet = rseq.getAlphabet().getName();
                sequences.add(
                    "DNA".equals(alphabet) ? new BiojavaDNA(rseq)
                  : "RNA".equals(alphabet) ? new BiojavaRNA(rseq)
                  :                          new BiojavaProtein(rseq) );
            }
        } catch (NoSuchElementException e) {
            logger.error("Read past last sequence", e);
        } catch (BioException e) {
            logger.error(e); // *ends up here*
        }

        return sequences;
    }

Grateful for any pointers you might have.

Regards,
// Carl Mäsak

From cmasak at gmail.com Fri Nov 6 16:54:30 2009
From: cmasak at gmail.com (Carl Mäsak)
Date: Fri, 6 Nov 2009 17:54:30 +0100
Subject: [Biojava-l] How do I read a FASTA file containing protein sequences in lowercase?
In-Reply-To:
References: <16d769b70911060825v298529b5m805c1f7fd388549b@mail.gmail.com>
Message-ID: <16d769b70911060854o69705434x574ce2d1c7d85699@mail.gmail.com>

Richard (>), Carl (>>):
>> Grateful for any pointers you might have.
>
> Could you post the output from the exception stack that it generates?

org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113) at net.bioclipse.biojava.business.BiojavaManager.sequencesFromInputStream(BiojavaManager.java:314) at net.bioclipse.biojava.business.BiojavaManager.sequencesFromFile(BiojavaManager.java:291) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at net.bioclipse.managers.business.AbstractManagerMethodDispatcher.doInvoke(AbstractManagerMethodDispatcher.java:243) at net.bioclipse.managers.business.JavaManagerMethodDispatcher.doInvokeInSameThread(JavaManagerMethodDispatcher.java:248) at net.bioclipse.managers.business.AbstractManagerMethodDispatcher.invoke(AbstractManagerMethodDispatcher.java:130) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at net.bioclipse.recording.WrapInProxyAdvice.invoke(WrapInProxyAdvice.java:22) at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.springframework.osgi.service.importer.internal.aop.ServiceInvoker.doInvoke(ServiceInvoker.java:59) at org.springframework.osgi.service.importer.internal.aop.ServiceInvoker.invoke(ServiceInvoker.java:67) at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.osgi.service.importer.internal.aop.ServiceTCCLInterceptor.invoke(ServiceTCCLInterceptor.java:34) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.osgi.service.importer.support.LocalBundleContextAdvice.invoke(LocalBundleContextAdvice.java:59) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.support.DelegatingIntroductionInterceptor.doProceed(DelegatingIntroductionInterceptor.java:131) at org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke(DelegatingIntroductionInterceptor.java:119) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) at $Proxy18.invoke(Unknown Source) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.framework.adapter.AfterReturningAdviceInterceptor.invoke(AfterReturningAdviceInterceptor.java:50) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) at $Proxy20.sequencesFromFile(Unknown Source) at net.bioclipse.biojava.ui.editors.Aligner.setInput(Aligner.java:152) at net.bioclipse.biojava.ui.editors.Aligner.init(Aligner.java:138) at org.eclipse.ui.part.MultiPageEditorPart.addPage(MultiPageEditorPart.java:238) at org.eclipse.ui.part.MultiPageEditorPart.addPage(MultiPageEditorPart.java:212) at net.bioclipse.biojava.ui.editors.SequenceEditor.createPages(SequenceEditor.java:47) at 
org.eclipse.ui.part.MultiPageEditorPart.createPartControl(MultiPageEditorPart.java:357) at org.eclipse.ui.internal.EditorReference.createPartHelper(EditorReference.java:662) at org.eclipse.ui.internal.EditorReference.createPart(EditorReference.java:462) at org.eclipse.ui.internal.WorkbenchPartReference.getPart(WorkbenchPartReference.java:595) at org.eclipse.ui.internal.PartPane.setVisible(PartPane.java:313) at org.eclipse.ui.internal.presentations.PresentablePart.setVisible(PresentablePart.java:180) at org.eclipse.ui.internal.presentations.util.PresentablePartFolder.select(PresentablePartFolder.java:270) at org.eclipse.ui.internal.presentations.util.LeftToRightTabOrder.select(LeftToRightTabOrder.java:65) at org.eclipse.ui.internal.presentations.util.TabbedStackPresentation.selectPart(TabbedStackPresentation.java:473) at org.eclipse.ui.internal.PartStack.refreshPresentationSelection(PartStack.java:1256) at org.eclipse.ui.internal.PartStack.setSelection(PartStack.java:1209) at org.eclipse.ui.internal.PartStack.showPart(PartStack.java:1608) at org.eclipse.ui.internal.PartStack.add(PartStack.java:499) at org.eclipse.ui.internal.EditorStack.add(EditorStack.java:103) at org.eclipse.ui.internal.PartStack.add(PartStack.java:485) at org.eclipse.ui.internal.EditorStack.add(EditorStack.java:112) at org.eclipse.ui.internal.EditorSashContainer.addEditor(EditorSashContainer.java:63) at org.eclipse.ui.internal.EditorAreaHelper.addToLayout(EditorAreaHelper.java:225) at org.eclipse.ui.internal.EditorAreaHelper.addEditor(EditorAreaHelper.java:213) at org.eclipse.ui.internal.EditorManager.createEditorTab(EditorManager.java:778) at org.eclipse.ui.internal.EditorManager.openEditorFromDescriptor(EditorManager.java:677) at org.eclipse.ui.internal.EditorManager.openEditor(EditorManager.java:638) at org.eclipse.ui.internal.WorkbenchPage.busyOpenEditorBatched(WorkbenchPage.java:2854) at org.eclipse.ui.internal.WorkbenchPage.busyOpenEditor(WorkbenchPage.java:2762) at 
org.eclipse.ui.internal.WorkbenchPage.access$11(WorkbenchPage.java:2754) at org.eclipse.ui.internal.WorkbenchPage$10.run(WorkbenchPage.java:2705) at org.eclipse.swt.custom.BusyIndicator.showWhile(BusyIndicator.java:70) at org.eclipse.ui.internal.WorkbenchPage.openEditor(WorkbenchPage.java:2701) at org.eclipse.ui.internal.WorkbenchPage.openEditor(WorkbenchPage.java:2685) at org.eclipse.ui.internal.WorkbenchPage.openEditor(WorkbenchPage.java:2676) at org.eclipse.ui.ide.IDE.openEditor(IDE.java:651) at org.eclipse.ui.ide.IDE.openEditor(IDE.java:610) at org.eclipse.ui.actions.OpenFileAction.openFile(OpenFileAction.java:99) at org.eclipse.ui.actions.OpenSystemEditorAction.run(OpenSystemEditorAction.java:99) at org.eclipse.ui.actions.RetargetAction.run(RetargetAction.java:221) at org.eclipse.ui.navigator.CommonNavigatorManager$3.open(CommonNavigatorManager.java:202) at org.eclipse.ui.OpenAndLinkWithEditorHelper$InternalListener.open(OpenAndLinkWithEditorHelper.java:48) at org.eclipse.jface.viewers.StructuredViewer$2.run(StructuredViewer.java:842) at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:42) at org.eclipse.core.runtime.Platform.run(Platform.java:888) at org.eclipse.ui.internal.JFaceUtil$1.run(JFaceUtil.java:48) at org.eclipse.jface.util.SafeRunnable.run(SafeRunnable.java:175) at org.eclipse.jface.viewers.StructuredViewer.fireOpen(StructuredViewer.java:840) at org.eclipse.jface.viewers.StructuredViewer.handleOpen(StructuredViewer.java:1101) at org.eclipse.ui.navigator.CommonViewer.handleOpen(CommonViewer.java:467) at org.eclipse.jface.viewers.StructuredViewer$6.handleOpen(StructuredViewer.java:1205) at org.eclipse.jface.util.OpenStrategy.fireOpenEvent(OpenStrategy.java:264) at org.eclipse.jface.util.OpenStrategy.access$2(OpenStrategy.java:258) at org.eclipse.jface.util.OpenStrategy$1.handleEvent(OpenStrategy.java:298) at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:84) at org.eclipse.swt.widgets.Display.sendEvent(Display.java:3543) at 
org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1250) at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1273) at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1258) at org.eclipse.swt.widgets.Widget.notifyListeners(Widget.java:1079) at org.eclipse.swt.widgets.Display.runDeferredEvents(Display.java:3441) at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3100) at org.eclipse.ui.internal.Workbench.runEventLoop(Workbench.java:2405) at org.eclipse.ui.internal.Workbench.runUI(Workbench.java:2369) at org.eclipse.ui.internal.Workbench.access$4(Workbench.java:2221) at org.eclipse.ui.internal.Workbench$5.run(Workbench.java:500) at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:332) at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:493) at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:149) at net.bioclipse.ui.Application.start(Application.java:36) at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:194) at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:110) at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:79) at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:368) at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:179) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:559) at org.eclipse.equinox.launcher.Main.basicRun(Main.java:514) at org.eclipse.equinox.launcher.Main.run(Main.java:1311) at org.eclipse.equinox.launcher.Main.main(Main.java:1287) Caused by: org.biojava.bio.seq.io.ParseException: A Exception Has 
Occurred During Parsing. Please submit the details that follow to biojava-l at biojava.org or post a bug report to http://bugzilla.open-bio.org/ Format_object=org.biojavax.bio.seq.io.FastaFormat Accession=OPSD_FELCA Id=null Comments=problem parsing symbols Parse_block=mngtegpnfyvpfsnktgvvrspfeypqyylaepwqfsmlaaymfllivlgfpinfltlyvtvqhkklrtplnyillnlavadlfmvfggftttlytslhgyfvfgptgcnlegffatlggeialwslvvlaieryvvvckpmsnfrfgenhaimgvaftwvmalacaapplvgwsryipegmqcscgidyytlkpevnnesfviymfvvhftipmiviffcygqlvftvkeaaaqqqesattqkaekevtrmviimviaflicwvpyasvafyifthqgsnfgpifmtlpaffaksssiynpviyimmnkqfrncmlttlccgknplgddeasttgsktetsqvapa Stack trace follows .... at org.biojavax.bio.seq.io.FastaFormat.readRichSequence(FastaFormat.java:244) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) ... 114 more Caused by: org.biojava.bio.symbol.IllegalSymbolException: This tokenization doesn't contain character: 'e' at org.biojava.bio.seq.io.CharacterTokenization.parseTokenChar(CharacterTokenization.java:175) at org.biojava.bio.seq.io.CharacterTokenization$TPStreamParser.characters(CharacterTokenization.java:246) at org.biojava.bio.symbol.SimpleSymbolList.(SimpleSymbolList.java:178) at org.biojavax.bio.seq.io.FastaFormat.readRichSequence(FastaFormat.java:237) ... 115 more // Carl From holland at eaglegenomics.com Fri Nov 6 17:15:28 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 6 Nov 2009 17:15:28 +0000 Subject: [Biojava-l] How do I read a FASTA file containing protein sequences in lowercase? In-Reply-To: <16d769b70911060854o69705434x574ce2d1c7d85699@mail.gmail.com> References: <16d769b70911060825v298529b5m805c1f7fd388549b@mail.gmail.com> <16d769b70911060854o69705434x574ce2d1c7d85699@mail.gmail.com> Message-ID: <179C4DF7-69AF-4E65-8E6A-F984DF7CCE69@eaglegenomics.com> Ah OK I see what's going on. 
The convenience method you're using, RichSequence.IOTools.readStream(), uses FastaFormat to try to guess the alphabet from the first line of the input sequence. FastaFormat does this by searching for matching non-DNA symbols. The search is case-sensitive:

    protected static final Pattern aminoAcids = Pattern.compile(".*[FLIPQE].*");

FastaFormat needs patching to make this pattern case-insensitive. Still, if none of the above symbols appears until the second or a subsequent line, the guessing will not work: it will assume the sequence is DNA and give you the same error as before.

When you know in advance which alphabet the sequence uses, it's best to avoid the guessing algorithms and instead use methods such as readFastaDNA that explicitly specify the alphabet you want to read.

However, there's still one thing that you definitely can't do, and that's parse different types of sequence from the same input without adding some code to detect which alphabet each individual sequence uses before parsing it with the appropriate BioJava parser. Your code appears to be expecting mixed input, but this won't work unless all the sequences happen to use the same alphabet.

cheers,
Richard

On 6 Nov 2009, at 16:54, Carl Mäsak wrote:

> Richard (>), Carl (>>):
>>> I'm using RichSequenceIterator to read FASTA files containing
>>> proteins. Somehow it doesn't work when the protein sequences are in
>>> lowercase, which they sometimes are when downloaded from e.g.
>>> Uniprot.
>>> My code fails to recognize the following file as containing a >>> protein >>> sequence: >>> >>>> OPSD_FELCA >>> >>> >>> mngtegpnfyvpfsnktgvvrspfeypqyylaepwqfsmlaaymfllivlgfpinfltlyvtvqhkklrtplnyilln >>> >>> lavadlfmvfggftttlytslhgyfvfgptgcnlegffatlggeialwslvvlaieryvvvckpmsnfrfgenhaimgv >>> >>> aftwvmalacaapplvgwsryipegmqcscgidyytlkpevnnesfviymfvvhftipmiviffcygqlvftvkeaaaq >>> >>> qqesattqkaekevtrmviimviaflicwvpyasvafyifthqgsnfgpifmtlpaffaksssiynpviyimmnkqfrn >>> cmlttlccgknplgddeasttgsktetsqvapa >>> >>> What am I missing? Here's the code I'm using to read in sequences: >>> >>> private List sequencesFromInputStream(InputStream >>> stream) { >>> >>> BufferedInputStream bufferedStream = new >>> BufferedInputStream(stream); >>> Namespace ns = RichObjectFactory.getDefaultNamespace(); >>> RichSequenceIterator seqit = null; >>> >>> try { >>> seqit = RichSequence.IOTools.readStream(bufferedStream, >>> ns); >>> } catch (IOException e) { >>> logger.error("Couldn't read sequences from file", e); >>> return Collections.emptyList(); >>> } >>> >>> List sequences = new ArrayList(); >>> try { >>> while ( seqit.hasNext() ) { >>> RichSequence rseq; >>> rseq = seqit.nextRichSequence(); // *error occurs >>> here* >>> if (rseq == null) >>> continue; >>> String alphabet = rseq.getAlphabet().getName(); >>> sequences.add( >>> "DNA".equals(alphabet) ? new BiojavaDNA(rseq) >>> : "RNA".equals(alphabet) ? new BiojavaRNA(rseq) >>> : new BiojavaProtein >>> (rseq) ); >>> } >>> } catch (NoSuchElementException e) { >>> logger.error("Read past last sequence", e); >>> } catch (BioException e) { >>> logger.error(e); // *ends up here* >>> } >>> >>> return sequences; >>> } >>> >>> Grateful for any pointers you might have. >> >> Could you post the output from the exception stack that it generates? 
> > org.biojava.bio.BioException: Could not read sequence > at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence (RichStreamReader.java:113) > at net.bioclipse.biojava.business.BiojavaManager.sequencesFromInputStream (BiojavaManager.java:314) > at net.bioclipse.biojava.business.BiojavaManager.sequencesFromFile (BiojavaManager.java:291) > [...] > at
org.eclipse.equinox.launcher.Main.main(Main.java:1287) > Caused by: org.biojava.bio.seq.io.ParseException: > > A Exception Has Occurred During Parsing. > Please submit the details that follow to biojava-l at biojava.org or post > a bug report to http://bugzilla.open-bio.org/ > > Format_object=org.biojavax.bio.seq.io.FastaFormat > Accession=OPSD_FELCA > Id=null > Comments=problem parsing symbols > Parse_block > = > mngtegpnfyvpfsnktgvvrspfeypqyylaepwqfsmlaaymfllivlgfpinfltlyvtvqhkklrtplnyillnlavadlfmvfggftttlytslhgyfvfgptgcnlegffatlggeialwslvvlaieryvvvckpmsnfrfgenhaimgvaftwvmalacaapplvgwsryipegmqcscgidyytlkpevnnesfviymfvvhftipmiviffcygqlvftvkeaaaqqqesattqkaekevtrmviimviaflicwvpyasvafyifthqgsnfgpifmtlpaffaksssiynpviyimmnkqfrncmlttlccgknplgddeasttgsktetsqvapa > Stack trace follows .... > > > at org.biojavax.bio.seq.io.FastaFormat.readRichSequence > (FastaFormat.java:244) > at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence > (RichStreamReader.java:110) > ... 114 more > Caused by: org.biojava.bio.symbol.IllegalSymbolException: This > tokenization doesn't contain character: 'e' > at org.biojava.bio.seq.io.CharacterTokenization.parseTokenChar > (CharacterTokenization.java:175) > at org.biojava.bio.seq.io.CharacterTokenization > $TPStreamParser.characters(CharacterTokenization.java:246) > at org.biojava.bio.symbol.SimpleSymbolList. > (SimpleSymbolList.java:178) > at org.biojavax.bio.seq.io.FastaFormat.readRichSequence > (FastaFormat.java:237) > ... 115 more > > // Carl -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From holland at eaglegenomics.com Fri Nov 6 16:35:24 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 6 Nov 2009 16:35:24 +0000 Subject: [Biojava-l] How do I read a FASTA file containing protein sequences in lowercase? 
In-Reply-To: <16d769b70911060825v298529b5m805c1f7fd388549b@mail.gmail.com> References: <16d769b70911060825v298529b5m805c1f7fd388549b@mail.gmail.com> Message-ID: Could you post the output from the exception stack that it generates? thanks, Richard On 6 Nov 2009, at 16:25, Carl M?sak wrote: > I'm using RichSequenceIterator to read FASTA files containing > proteins. Somehow it doesn't work when the protein sequences are in > lowercase, which they sometimes are when downloaded from e.g. Uniprot. > My code fails to recognize the following file as containing a protein > sequence: > >> OPSD_FELCA > mngtegpnfyvpfsnktgvvrspfeypqyylaepwqfsmlaaymfllivlgfpinfltlyvtvqhkklrtplnyilln > lavadlfmvfggftttlytslhgyfvfgptgcnlegffatlggeialwslvvlaieryvvvckpmsnfrfgenhaimgv > aftwvmalacaapplvgwsryipegmqcscgidyytlkpevnnesfviymfvvhftipmiviffcygqlvftvkeaaaq > qqesattqkaekevtrmviimviaflicwvpyasvafyifthqgsnfgpifmtlpaffaksssiynpviyimmnkqfrn > cmlttlccgknplgddeasttgsktetsqvapa > > What am I missing? Here's the code I'm using to read in sequences: > > private List sequencesFromInputStream(InputStream > stream) { > > BufferedInputStream bufferedStream = new BufferedInputStream > (stream); > Namespace ns = RichObjectFactory.getDefaultNamespace(); > RichSequenceIterator seqit = null; > > try { > seqit = RichSequence.IOTools.readStream(bufferedStream, > ns); > } catch (IOException e) { > logger.error("Couldn't read sequences from file", e); > return Collections.emptyList(); > } > > List sequences = new ArrayList(); > try { > while ( seqit.hasNext() ) { > RichSequence rseq; > rseq = seqit.nextRichSequence(); // *error occurs > here* > if (rseq == null) > continue; > String alphabet = rseq.getAlphabet().getName(); > sequences.add( > "DNA".equals(alphabet) ? new BiojavaDNA(rseq) > : "RNA".equals(alphabet) ? 
new BiojavaRNA(rseq)
> : new BiojavaProtein
> (rseq) );
> }
> } catch (NoSuchElementException e) {
> logger.error("Read past last sequence", e);
> } catch (BioException e) {
> logger.error(e); // *ends up here*
> }
>
> return sequences;
> }
>
> Grateful for any pointers you might have.
>
> Regards,
> // Carl Mäsak
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From andylu0320 at gmail.com Sat Nov 7 18:06:39 2009
From: andylu0320 at gmail.com (Andy Lu)
Date: Sat, 7 Nov 2009 13:06:39 -0500
Subject: [Biojava-l] Bio Java installation inquiry
Message-ID: <4a1a3f7d0911071006i45d25a71jfc4bfffcc46ae839@mail.gmail.com>

Hi, I am able to get JMol to run on Eclipse, but I am having a lot of trouble getting biojava to run; I am not sure how to set up the class path, etc.
I am new to using Eclipse and biojava. Are there step-by-step instructions available online?

Any help would be greatly appreciated!

From andreas at sdsc.edu Sat Nov 7 18:14:17 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sat, 7 Nov 2009 10:14:17 -0800
Subject: [Biojava-l] Bio Java installation inquiry
In-Reply-To: <4a1a3f7d0911071006i45d25a71jfc4bfffcc46ae839@mail.gmail.com>
References: <4a1a3f7d0911071006i45d25a71jfc4bfffcc46ae839@mail.gmail.com>
Message-ID: <59a41c430911071014t2aeb0646u7d1cc237e1a61180@mail.gmail.com>

Hi Andy,

The best thing is to download the jar files from http://biojava.org/wiki/BioJava:Download .

Probably the easiest way to get started is to create a new project in Eclipse and right-click on the project -> Properties -> Java build path -> Libraries -> Add JARs

Then your project will know where to find the dependencies and you can start writing your own code.
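
As a quick sanity check after adding the jars, a small class along the following lines can confirm that BioJava is visible on the project's classpath. This is only a sketch: the class name ClasspathCheck and the helper isOnClasspath are made up for illustration, and it assumes the BioJava 1.x core jar, which provides org.biojava.bio.seq.DNATools.

```java
public class ClasspathCheck {

    // Hypothetical helper: returns true if the named class can be located
    // by the class loader that loaded this class, without initializing it.
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: DNATools ships in the BioJava 1.x core jar.
        String probe = "org.biojava.bio.seq.DNATools";
        if (isOnClasspath(probe)) {
            System.out.println(probe + " found, BioJava is on the classpath");
        } else {
            System.out.println(probe + " not found, check the Java build path");
        }
    }
}
```

Run it as a Java application from within the Eclipse project; if it reports the class as missing, the Libraries tab of the Java build path is the first place to look.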
Andreas

On Sat, Nov 7, 2009 at 10:06 AM, Andy Lu wrote:
> Hi, I am able to get JMol to run on Eclipse, but I am having a lot of
> trouble getting biojava to run, I am not sure how to set up all of the
> class path, etc.
> I am new to using Eclipse and biojava. Is there a specific step by step
> instruction online available?
>
> Any help would be greatly appreciated!
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From andreas at sdsc.edu Sat Nov 7 18:40:59 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sat, 7 Nov 2009 10:40:59 -0800
Subject: [Biojava-l] Bio Java installation inquiry
In-Reply-To: <4a1a3f7d0911071033j20dcd234rfd287998e7fab603@mail.gmail.com>
References: <4a1a3f7d0911071006i45d25a71jfc4bfffcc46ae839@mail.gmail.com> <59a41c430911071014t2aeb0646u7d1cc237e1a61180@mail.gmail.com> <4a1a3f7d0911071033j20dcd234rfd287998e7fab603@mail.gmail.com>
Message-ID: <59a41c430911071040i2b574d0ak2af98dbf22c1ab6a@mail.gmail.com>

You don't need to have Jmol in the classpath to run biojava, but if you do, you can use the jmol/biojava interface contained in the protein structure modules. In that case JmolApplet.jar is sufficient; you don't need to check out the Jmol source.

Andreas

On Sat, Nov 7, 2009 at 10:33 AM, Andy Lu wrote:
> O I see, but don't I also need to have all of the JMol java files set up
> first, or the BioJava jar file contains everything I need?
>
>
> On Sat, Nov 7, 2009 at 1:14 PM, Andreas Prlic wrote:
>
>> Hi Andy,
>>
>> best thing is to download the jar files from
>> http://biojava.org/wiki/BioJava:Download .
>>
>> Probably the easiest way to get started is to create a new project in
>> eclipse and right click on the project-> Properties -> Java build path ->
>> Libraries -> Add jars
>>
>> Then your project will know where to find the dependencies and you
>> can start writing your own code.
>> >> Andreas >> >> >> On Sat, Nov 7, 2009 at 10:06 AM, Andy Lu wrote: >> >>> Hi, I am able to get JMol to run on Eclipse, but I am having a lot of >>> trouble getting biojava to run, I am not sure how to set up all of the >>> class >>> path, etc. >>> I am new to using Eclipse and biojava. Is there a specific step by step >>> instruction online available? >>> >>> Any help would be greatly appreciated! >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >> >> > > > -- > Andy Lu > From andy.law at roslin.ed.ac.uk Sat Nov 7 20:21:59 2009 From: andy.law at roslin.ed.ac.uk (andy law (RI)) Date: Sat, 7 Nov 2009 20:21:59 +0000 Subject: [Biojava-l] Bio Java installation inquiry In-Reply-To: <59a41c430911071014t2aeb0646u7d1cc237e1a61180@mail.gmail.com> References: <4a1a3f7d0911071006i45d25a71jfc4bfffcc46ae839@mail.gmail.com>, <59a41c430911071014t2aeb0646u7d1cc237e1a61180@mail.gmail.com> Message-ID: <2FA0B8F4EACC05449112A4C02C6DACC00431C3FC17@ebrcexch1.ebrc.bbsrc.ac.uk> Andreas, When will the mavenised version of biojava be officially released? Later, Andy ________________________________________ From: biojava-l-bounces at lists.open-bio.org [biojava-l-bounces at lists.open-bio.org] On Behalf Of Andreas Prlic [andreas at sdsc.edu] Sent: 07 November 2009 18:14 To: Andy Lu Cc: biojava-l at biojava.org Subject: Re: [Biojava-l] Bio Java installation inquiry Hi Andy, best thing is to download the jar files from http://biojava.org/wiki/BioJava:Download . Proably the easiest way to get started is to create a new project in eclipse and right click on the project-> Properties -> Java build path -> Libraries -> Add jars Then your project will know how where to find the dependencies and you can start writing your own code. 
Andreas On Sat, Nov 7, 2009 at 10:06 AM, Andy Lu wrote: > Hi, I am able to get JMol to run on Eclipse, but I am having a lot of > trouble getting biojava to run, I am not sure how to set up all of the > class > path, etc. > I am new to using Eclipse and biojava. Is there a specific step by step > instruction online available? > > Any help would be greatly appreciated! > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-l
From andreas at sdsc.edu Sun Nov 8 06:52:00 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sat, 7 Nov 2009 22:52:00 -0800
Subject: [Biojava-l] Bio Java installation inquiry
In-Reply-To: <2FA0B8F4EACC05449112A4C02C6DACC00431C3FC17@ebrcexch1.ebrc.bbsrc.ac.uk>
References: <4a1a3f7d0911071006i45d25a71jfc4bfffcc46ae839@mail.gmail.com> <59a41c430911071014t2aeb0646u7d1cc237e1a61180@mail.gmail.com> <2FA0B8F4EACC05449112A4C02C6DACC00431C3FC17@ebrcexch1.ebrc.bbsrc.ac.uk>
Message-ID: <59a41c430911072252j40e22912k85886a5d61427bba@mail.gmail.com>

Hi Andy,

At present the plan is to spend some more time working on the modules and then make a release (called 3.0) shortly after the hackathon in Cambridge in January. Early adopters can already use the modules via SVN.

Andreas

On Sat, Nov 7, 2009 at 12:21 PM, andy law (RI) wrote:
> Andreas,
>
> When will the mavenised version of biojava be officially released?
>
> Later,
>
> Andy
> ________________________________________
> From: biojava-l-bounces at lists.open-bio.org [biojava-l-bounces at lists.open-bio.org] On Behalf Of Andreas Prlic [andreas at sdsc.edu]
> Sent: 07 November 2009 18:14
> To: Andy Lu
> Cc: biojava-l at biojava.org
> Subject: Re: [Biojava-l] Bio Java installation inquiry
>
> Hi Andy,
>
> best thing is to download the jar files from
> http://biojava.org/wiki/BioJava:Download .
>
> Probably the easiest way to get started is to create a new project in
> eclipse and right click on the project-> Properties -> Java build path ->
> Libraries -> Add jars
>
> Then your project will know where to find the dependencies and you can
> start writing your own code.
>
> Andreas
>
>
> On Sat, Nov 7, 2009 at 10:06 AM, Andy Lu wrote:
>
> > Hi, I am able to get JMol to run on Eclipse, but I am having a lot of
> > trouble getting biojava to run, I am not sure how to set up all of the
> > class path, etc.
> > I am new to using Eclipse and biojava.
Is there a specific step by step > > instruction online available? > > > > Any help would be greatly appreciated! > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From jbdundas at gmail.com Tue Nov 10 14:23:10 2009 From: jbdundas at gmail.com (jitesh dundas) Date: Tue, 10 Nov 2009 19:53:10 +0530 Subject: [Biojava-l] Java Error:- XML Parsing Error: XML or text declaration not at start of entity In-Reply-To: <326ea8620911080222q225cb6a4m4957be8dd5c9f91f@mail.gmail.com> References: <326ea8620911010739t1e658509h7dd33ba8482312f8@mail.gmail.com> <326ea8620911010741q3880a13g53626d94d0d2abd2@mail.gmail.com> <59a41c430911010806u62f45b90ic4a9563a27ee12e2@mail.gmail.com> <326ea8620911080222q225cb6a4m4957be8dd5c9f91f@mail.gmail.com> Message-ID: <326ea8620911100623k4daa1222s60481a9f35777c31@mail.gmail.com> Dear Friends, Thank you for your help and advise. The code in the mentioned URL is working fine -> http://gist.github.com/229248 (this is my code that has been uploaded by a wise group member. Many thanks to him for doing that) Hope this helps.. Regards, JItesh Dundas On Sun, Nov 8, 2009 at 3:52 PM, jitesh dundas wrote: > Dear Sir, > > My program is working fine and can send me an xml file with 20 > records. However, it does not allow me to send large amounts of > records. > > For e.g. if I enter "cancer" it will return only 20 records. > > Can you please tell me what I should do next to get all those records. > Thank you in advance > > Regards, > Jitesh Dundas > > On Sun, Nov 1, 2009 at 9:36 PM, Andreas Prlic wrote: > > > > Hi Jitesh, > > > > It is hard to read your code with all the formatting off probably due to > email and many commented lines that don;t seem to get used. 
Can you provide > the stacktrace, so we can see what part of biojava is affected? > > > > Probably a good strategy to write and debug this is to simply the problem > into smaller steps. Try to first download the files you want to parse and > write the code to parse them from the local file. That will avoid any > issues you might encounter with networking and server/client communication. > Once the parsing is working you could take it to the next step and add the > server communication... > > > > Andreas > > > > > > > > > > On Sun, Nov 1, 2009 at 7:41 AM, jitesh dundas > wrote: > >> > >> Hi friends, > >> > >> I am getting this error on doing a post(using the code below) to this > url-> > >> > http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=cancer&reldate=10 > >> > >> I have written this code in .jsp file. Later I will change it into > servlet. > >> > >> Error:- > >> XML Parsing Error: XML or text declaration not at start of entity > >> Location: > >> > http://localhost:8080/ProteomDb/ImportFromPubmed2.jsp?txtDbName=pubmed&txtTerm=cancer&txtreldate=10&comSDay=01&comSMonth=01&txtSYear=&comEDay=01&comEMonth=01&txtEYear=&txtURLString=http%3A%2F%2Feutils.ncbi.nlm.nih.gov%2Fentrez%2Feutils%2Fesearch.fcgi%3Fdb%3Dpubmed%26term%3Dcancer%26reldate%3D10&txtsubmit=Fetch+Data+From+NCBI > >> Line Number 11, Column 1: >> PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN" " > >> http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd > ">2034200 > >> 19877350 19877304 19877297 > >> 19877284 19877271 19877265 > >> 19877250 19877245 19877226 > >> 19877210 19877179 19877175 > >> 19877161 19877159 19877158 > >> 19877123 19877122 19877120 > >> 19877119 19877118 > >> cancer > >> "neoplasms"[MeSH Terms] OR "neoplasms"[All Fields] OR "cancer"[All > >> Fields] > >> "neoplasms"[MeSH Terms] MeSH > >> Terms 2082133 Y > >> "neoplasms"[All Fields] > All > >> Fields 1634731 Y > >> OR "cancer"[All > Fields] > >> All Fields 902537 > Y > >> OR GROUP > >> 
2009/10/22[EDAT] EDAT 0 > >> Y > >> 2009/11/01[EDAT] EDAT 0 > >> Y RANGE AND > >> ("neoplasms"[MeSH Terms] OR > >> "neoplasms"[All Fields] OR "cancer"[All Fields]) AND 2009/10/22[EDAT] : > >> 2009/11/01[EDAT] > >> ^ > >> > >> As you can see, the XML output is coming fine but the above error does > not > >> go..The output via this program should be just like hitting manually the > >> above URL in the browser.. > >> The browser is Mozilla Firefox. > >> > >> Code:- > >> > >> <%@ page language = "java" %> > >> <%@ page import = "java.sql.*" %> > >> <%@ page import = "java.util.*" %> > >> <%@ page import = "java.io.*" %> > >> <%@ page import="java.lang.*" %> > >> <%@ page import="java.net.*" %> > >> <%@ page import="java.nio.*" %> > >> <%@ page contentType="text/xml; charset=utf-8" pageEncoding="UTF-8" %> > >> > >> > >> <% > >> > >> try > >> { > >> //String str = ""; > >> //out.println(""); > >> > >> Properties systemSettings = System.getProperties(); > >> systemSettings.put("http.proxyHost", "********"); > >> systemSettings.put("http.proxyPort", "******"); > >> systemSettings.put("sun.net.client.defaultConnectTimeout", "10000"); > >> systemSettings.put("sun.net.client.defaultReadTimeout", "10000"); > >> > >> //out.println("Properties Set"); > >> Authenticator.setDefault(new Authenticator() > >> { > >> protected PasswordAuthentication getPasswordAuthentication() > >> { > >> return new PasswordAuthentication("**", > >> "******".toCharArray()); // specify ur user name password of iitb login > >> } > >> }); > >> > >> > >> System.setProperties(systemSettings); > >> //out.println("After Authentication & Properties Settings"); > >> > >> //create xml file. > >> //the input to google api > >> //String textAreaContent = request.getParameter("text"); > >> String textAreaContent = "This si a tst"; > >> > >> String str = ""; > >> > >> //xml file generation ends here.. 
> >> //FetchDataFromNCBI_URLString.jsp > >> String URLString = request.getParameter("txtURLString").trim(); > >> > >> //URL url = new URL(" > >> > http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=BAA20519 > >> "); > >> URL url = new URL(URLString); //url string taken from user input. > >> HttpURLConnection connection = null; > >> > >> connection = (HttpURLConnection) url.openConnection(); > >> System.out.println("After open connection"); > >> connection.setRequestMethod("POST"); > >> connection.setDoInput(true); > >> connection.setDoOutput(true); > >> > >> connection.setUseCaches(false); > >> connection.setAllowUserInteraction(false); > >> //connection.setFollowRedirects(true); > >> //connection.setInstanceFollowRedirects(true); > >> //System.out.println("Before-------------------"); > >> connection.setRequestProperty ("Content-Type","text/xml; > >> charset=\"utf-8\""); > >> //System.out.println("After-------------------"); > >> > >> //System.out.println(""+ connection.getOutputStream()); > >> > >> //System.out.println("After dataoutputstream..Line No-65"); > >> > >> //System.out.println("Response Code="+ connection.getResponseCode); > >> > >> OutputStreamWriter dosout = new > >> OutputStreamWriter(connection.getOutputStream()); > >> //System.out.println("After dosout object..Line No-63"); > >> //dosout.write(str); > >> dosout.close (); > >> > >> BufferedReader in = new BufferedReader( new InputStreamReader( > >> connection.getInputStream())); > >> > >> String decodedString; > >> String tempstr = ""; > >> > >> > >> while ((decodedString = in.readLine()) != null) > >> { > >> tempstr = tempstr + decodedString; > >> //out.println(decodedString); > >> } > >> out.println(tempstr); > >> in.close(); > >> } > >> catch(Exception ex) > >> { > >> out.println("Exception->"+ex); > >> PrintWriter pw = response.getWriter(); > >> ex.printStackTrace(pw); > >> } > >> > >> > >> %> > >> > >> Thanks in advance.. 
> >> > >> Regards, > >> JItesh Dundas > >> > >> _______________________________________________ > >> Biojava-l mailing list - Biojava-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > > From oliver.stolpe at fu-berlin.de Thu Nov 12 13:18:52 2009 From: oliver.stolpe at fu-berlin.de (Oliver Stolpe) Date: Thu, 12 Nov 2009 14:18:52 +0100 Subject: [Biojava-l] SeqIOTools deprecated, looking for alternatives // RichSeq.IOTools Message-ID: <4AFC0B3C.3010503@fu-berlin.de> Hello *, the cookbook uses in its examples the SeqIOTools-class for reading the files. But in the API it is marked as deprecated. Now I am looking for alternatives, so I searched the list and internet and found out that biojavax provides methods and classes for reading the files (RichSequence.IOTools). For example, I try to read an EMBL-file: --begin:code-- BufferedReader br = new BufferedReader(new FileReader(filename)); Namespace ns = RichObjectFactory.getDefaultNamespace(); RichSequenceIterator seqs = RichSequence.IOTools.readEMBLDNA(br, ns); while (seqs.hasNext()) { RichSequence seq = seqs.nextRichSequence(); System.out.println(seq.getName() + ":" + seq.getAnnotation().asMap()); } --end:code-- But I always get this error message: --begin:error-- org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113) at ReadGenbankFile.EMBL(ReadGenbankFile.java:42) at ReadGenbankFile.main(ReadGenbankFile.java:85) Caused by: org.biojava.bio.seq.io.ParseException: A Exception Has Occurred During Parsing. 
Please submit the details that follow to biojava-l at biojava.org or post a bug report to http://bugzilla.open-bio.org/ Format_object=org.biojavax.bio.seq.io.EMBLFormat Accession=null Id=not set Comments= Parse_block=ID AJ243265_2; parent: AJ243265AC AJ243265;FT CDS join(<1082..1272,2484..2638,4926..>5041) /codon_start=3 /gene="PGM1" /product="phosphoglucomutase 1" /function="carbohydrate metabolism" /EC_number="5.4.2.2" /db_xref="GOA:Q9H1D2" /db_xref="HGNC:8905" /db_xref="HSSP:3PMG" /db_xref="InterPro:IPR016055" /db_xref="UniProtKB/TrEMBL:Q9H1D2" /protein_id="CAC19809.1" /translation="VGPYVKKILCEELGAPANSAVNCVPLEDFGGHHPDPNLTYAADLV ETMKSGEHDFGAAFDGDGDRNMILGKHGFFVNPSDSVAVIAANTFSIPYFQQTGVRGFA RSMPTSGALDRVASATKIALYETPTGWKFFGNLMDASKLSLCGEESFGT"SQ Sequence 462 BP; Stack trace follows .... at org.biojavax.bio.seq.io.EMBLFormat.readSection(EMBLFormat.java:775) at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:284) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) ... 2 more Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -3 at java.lang.String.substring(String.java:1949) at java.lang.String.substring(String.java:1916) at org.biojavax.bio.seq.io.EMBLFormat.readSection(EMBLFormat.java:761) ... 
4 more --end:error-- The file looks all ok I think and works well with the deprecated SeqIOTools: --begin:embl-file-- ID AJ243265_2; parent: AJ243265 AC AJ243265; FT CDS join(<1082..1272,2484..2638,4926..>5041) FT /codon_start=3 FT /gene="PGM1" FT /product="phosphoglucomutase 1" FT /function="carbohydrate metabolism" FT /EC_number="5.4.2.2" FT /db_xref="GOA:Q9H1D2" FT /db_xref="HGNC:8905" FT /db_xref="HSSP:3PMG" FT /db_xref="InterPro:IPR016055" FT /db_xref="UniProtKB/TrEMBL:Q9H1D2" FT /protein_id="CAC19809.1" FT /translation="VGPYVKKILCEELGAPANSAVNCVPLEDFGGHHPDPNLTYAADLV FT ETMKSGEHDFGAAFDGDGDRNMILGKHGFFVNPSDSVAVIAANTFSIPYFQQTGVRGFA FT RSMPTSGALDRVASATKIALYETPTGWKFFGNLMDASKLSLCGEESFGT" SQ Sequence 462 BP; ttgtgggacc gtatgtaaag aagatcctct gtgaagaact cggtgcccct gcgaactcgg 60 cagttaactg cgttcctctg gaggactttg gaggccacca ccctgacccc aacctcacct 120 atgcagctga cctggtggag accatgaagt caggagagca tgattttggg gctgcctttg 180 atggagatgg ggatcgaaac atgattctgg gcaagcatgg gttctttgtg aacccttcag 240 actctgtggc tgtcattgct gccaacacct tcagcattcc gtatttccag cagactgggg 300 tccgcggttt tgcacggagc atgcccacga gtggtgctct ggaccgggtg gctagtgcta 360 caaagattgc tttgtatgag accccaactg gctggaagtt ttttgggaat ttgatggacg 420 cgagcaaact gtccctttgt ggggaggaga gcttcgggac cg 462 // --end:embl-file-- The parser always crashes before reading the sequence (ttgt..., directly after the BP;). Any suggestions how I get this work? Or are there other alternatives for substituting the deprecated SeqIOTools-class? Thanks in advance, with best regards, Oliver From holland at eaglegenomics.com Fri Nov 13 11:21:47 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 13 Nov 2009 11:21:47 +0000 Subject: [Biojava-l] SeqIOTools deprecated, looking for alternatives // RichSeq.IOTools In-Reply-To: <4AFC0B3C.3010503@fu-berlin.de> References: <4AFC0B3C.3010503@fu-berlin.de> Message-ID: <05574914-87FC-44BB-90F1-75C79670A8EC@eaglegenomics.com> Hello, The file you are parsing is not a valid EMBL format file. 
The EMBL format is specified here: http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4 and this is what the file should look like for your accession: http://www.ebi.ac.uk/cgi-bin/emblfetch?style=html&id=AJ243265&Submit=Go The most obvious problems in your file are the absence of the required 'XX' section delimiters, and an invalid ID line. There might be other problems too but I haven't checked the whole file, just the first few lines. The deprecated SeqIOTools really didn't care if the file was valid or not, they basically just made a copy of all the lines in an internal token/value map. They made no attempt to parse or understand the data in each line. The new RichSequence-based parsers actually attempt to enforce the file format definitions and break down and understand the contents of each line. This means that they will reject any file that does not strictly conform to the specified format. cheers, Richard On 12 Nov 2009, at 13:18, Oliver Stolpe wrote: > Hello *, > > the cookbook uses in its examples the SeqIOTools-class for reading the files. But in the API it is marked as deprecated. Now I am looking for alternatives, so I searched the list and internet and found out that biojavax provides methods and classes for reading the files (RichSequence.IOTools). 
> > For example, I try to read an EMBL-file: > > --begin:code-- > > BufferedReader br = new BufferedReader(new FileReader(filename)); > Namespace ns = RichObjectFactory.getDefaultNamespace(); > RichSequenceIterator seqs = RichSequence.IOTools.readEMBLDNA(br, ns); > > while (seqs.hasNext()) { > RichSequence seq = seqs.nextRichSequence(); > System.out.println(seq.getName() + ":" + seq.getAnnotation().asMap()); > } > > --end:code-- > > But I always get this error message: > > --begin:error-- > > org.biojava.bio.BioException: Could not read sequence > at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113) > at ReadGenbankFile.EMBL(ReadGenbankFile.java:42) > at ReadGenbankFile.main(ReadGenbankFile.java:85) > Caused by: org.biojava.bio.seq.io.ParseException: > > A Exception Has Occurred During Parsing. > Please submit the details that follow to biojava-l at biojava.org or post a bug report to http://bugzilla.open-bio.org/ > > Format_object=org.biojavax.bio.seq.io.EMBLFormat > Accession=null > Id=not set > Comments= > Parse_block=ID AJ243265_2; parent: AJ243265AC AJ243265;FT CDS join(<1082..1272,2484..2638,4926..>5041) > /codon_start=3 > /gene="PGM1" > /product="phosphoglucomutase 1" > /function="carbohydrate metabolism" > /EC_number="5.4.2.2" > /db_xref="GOA:Q9H1D2" > /db_xref="HGNC:8905" > /db_xref="HSSP:3PMG" > /db_xref="InterPro:IPR016055" > /db_xref="UniProtKB/TrEMBL:Q9H1D2" > /protein_id="CAC19809.1" > /translation="VGPYVKKILCEELGAPANSAVNCVPLEDFGGHHPDPNLTYAADLV > ETMKSGEHDFGAAFDGDGDRNMILGKHGFFVNPSDSVAVIAANTFSIPYFQQTGVRGFA > RSMPTSGALDRVASATKIALYETPTGWKFFGNLMDASKLSLCGEESFGT"SQ Sequence 462 BP; > Stack trace follows .... > > > at org.biojavax.bio.seq.io.EMBLFormat.readSection(EMBLFormat.java:775) > at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:284) > at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) > ... 
2 more > Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -3 > at java.lang.String.substring(String.java:1949) > at java.lang.String.substring(String.java:1916) > at org.biojavax.bio.seq.io.EMBLFormat.readSection(EMBLFormat.java:761) > ... 4 more > > --end:error-- > > The file looks all ok I think and works well with the deprecated SeqIOTools: > > --begin:embl-file-- > ID AJ243265_2; parent: AJ243265 > AC AJ243265; > FT CDS join(<1082..1272,2484..2638,4926..>5041) > FT /codon_start=3 > FT /gene="PGM1" > FT /product="phosphoglucomutase 1" > FT /function="carbohydrate metabolism" > FT /EC_number="5.4.2.2" > FT /db_xref="GOA:Q9H1D2" > FT /db_xref="HGNC:8905" > FT /db_xref="HSSP:3PMG" > FT /db_xref="InterPro:IPR016055" > FT /db_xref="UniProtKB/TrEMBL:Q9H1D2" > FT /protein_id="CAC19809.1" > FT /translation="VGPYVKKILCEELGAPANSAVNCVPLEDFGGHHPDPNLTYAADLV > FT ETMKSGEHDFGAAFDGDGDRNMILGKHGFFVNPSDSVAVIAANTFSIPYFQQTGVRGFA > FT RSMPTSGALDRVASATKIALYETPTGWKFFGNLMDASKLSLCGEESFGT" > SQ Sequence 462 BP; > ttgtgggacc gtatgtaaag aagatcctct gtgaagaact cggtgcccct gcgaactcgg 60 > cagttaactg cgttcctctg gaggactttg gaggccacca ccctgacccc aacctcacct 120 > atgcagctga cctggtggag accatgaagt caggagagca tgattttggg gctgcctttg 180 > atggagatgg ggatcgaaac atgattctgg gcaagcatgg gttctttgtg aacccttcag 240 > actctgtggc tgtcattgct gccaacacct tcagcattcc gtatttccag cagactgggg 300 > tccgcggttt tgcacggagc atgcccacga gtggtgctct ggaccgggtg gctagtgcta 360 > caaagattgc tttgtatgag accccaactg gctggaagtt ttttgggaat ttgatggacg 420 > cgagcaaact gtccctttgt ggggaggaga gcttcgggac cg 462 > // > --end:embl-file-- > > The parser always crashes before reading the sequence (ttgt..., directly after the BP;). > > Any suggestions how I get this work? > Or are there other alternatives for substituting the deprecated SeqIOTools-class? 
> > Thanks in advance, > > with best regards, > > Oliver > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From tiagoantao at gmail.com Fri Nov 13 12:25:41 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Fri, 13 Nov 2009 12:25:41 +0000 Subject: [Biojava-l] Newick/Nexus processing of non-binary trees In-Reply-To: <6d941f120911060640v29b52bedxd120e39d092a1e88@mail.gmail.com> References: <6d941f120911060526r14d43c43ncb5541d89a8dcaa3@mail.gmail.com> <120EB3C1-0043-4E8C-8637-F16B3B56094B@eaglegenomics.com> <6d941f120911060640v29b52bedxd120e39d092a1e88@mail.gmail.com> Message-ID: <6d941f120911130425h31840441he04ce4f5bd88092f@mail.gmail.com> Hi, > My suggestion: for somebody else to verify my findings. I might be > doing something stupidly wrong. Maybe things are correct. Just a > simple tree like (1,2,3) (as long as it is not binary) - should expose > the problem. > Has nobody answered here is my take: 1. The error reported probably exists 2. Most probably nobody is using the parser (as it only supports binary trees). In this light, changing the API should not be a problem at all. I would not mind correcting the problem (I have already corrected the previous 2 ones in my local version). I would suggest removing the call to the unweighted graph. Reasons: 1. A weighted version is enough. If branch lengths are not specified, then weights could be set to 0. There there would not be a decrease in functionality. 2. Severely reducing the size of the code is important. Clearly the code is not much maintained (and I am not offering to maintain it in the long run, just putting it in good shape) and not much used. 
Therefore a smaller, easier-to-manage code base makes even more sense. If you accept a solution along these lines, I would correct all the bugs and also include test code (which is also missing). -- "The hottest places in hell are reserved for those who, in times of moral crisis, maintain a neutrality." - Dante From holland at eaglegenomics.com Fri Nov 13 12:42:03 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 13 Nov 2009 12:42:03 +0000 Subject: [Biojava-l] Newick/Nexus processing of non-binary trees In-Reply-To: <6d941f120911130425h31840441he04ce4f5bd88092f@mail.gmail.com> References: <6d941f120911060526r14d43c43ncb5541d89a8dcaa3@mail.gmail.com> <120EB3C1-0043-4E8C-8637-F16B3B56094B@eaglegenomics.com> <6d941f120911060640v29b52bedxd120e39d092a1e88@mail.gmail.com> <6d941f120911130425h31840441he04ce4f5bd88092f@mail.gmail.com> Message-ID: <6467088B-93FA-48D2-A7C4-27CD238CE1AE@eaglegenomics.com> I'm all for that. The original code was developed by a Google Summer of Code student, who we haven't heard much from since. :( cheers, Richard On 13 Nov 2009, at 12:25, Tiago Antão wrote: > Hi, > >> My suggestion: for somebody else to verify my findings. I might be >> doing something stupidly wrong. Maybe things are correct. Just a >> simple tree like (1,2,3) (as long as it is not binary) - should expose >> the problem. >> > > Has nobody answered here is my take: > > 1. The error reported probably exists > 2. Most probably nobody is using the parser (as it only supports binary trees). > > In this light, changing the API should not be a problem at all. > > I would not mind correcting the problem (I have already corrected the > previous 2 ones in my local version). > I would suggest removing the call to the unweighted graph. Reasons: > 1. A weighted version is enough. If branch lengths are not specified, > then weights could be set to 0. There there would not be a decrease in > functionality. > 2. Severely reducing the size of the code is important.
Clearly the > code is not much maintained (and I am not offering to maintain it in > the long run, just putting it in good shape) and not much used. > Therefore a smaller, more easy to manage code base makes even more > sense. > > If you accept a solution along these lines. I would correct all the > bugs and also include test code (which is also missing). > > > > -- > "The hottest places in hell are reserved for those who, in times of > moral crisis, maintain a neutrality." - Dante -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From thasso.griebel at uni-jena.de Fri Nov 13 14:51:41 2009 From: thasso.griebel at uni-jena.de (Thasso Griebel) Date: Fri, 13 Nov 2009 15:51:41 +0100 Subject: [Biojava-l] Newick/Nexus processing of non-binary trees In-Reply-To: <6d941f120911130425h31840441he04ce4f5bd88092f@mail.gmail.com> References: <6d941f120911060526r14d43c43ncb5541d89a8dcaa3@mail.gmail.com> <120EB3C1-0043-4E8C-8637-F16B3B56094B@eaglegenomics.com> <6d941f120911060640v29b52bedxd120e39d092a1e88@mail.gmail.com> <6d941f120911130425h31840441he04ce4f5bd88092f@mail.gmail.com> Message-ID: Hi, > 1. A weighted version is enough. If branch lengths are not specified, > then weights could be set to 0. There there would not be a decrease in > functionality. just my two cents, but I would go with a default weight of 1.0. If you read something unweighted you would ignore the edge weights anyways, but, for example, if you write something simple that computes path lengths, a default weight of 1.0 ensures that the method also works for "unweighted" trees, where the length of a path is defined as the number of edges you need to traverse to move from say A to B. I think the argument also hold for other algorithms used on trees and graphs. anyways, just my two cent. -thasso -- Dipl. Inf. 
Thasso Griebel-------------------Lehrstuhl fuer Bioinformatik Office 3426--http://bio.informatik.uni-jena.de--Institut fuer Informatik Phone +49 (0)3641 9-46454-----------Friedrich-Schiller-Universitaet Jena Fax +49 (0)3641 9-46452----------Ernst-Abbe-Platz 2, 07743 Jena, Germany From tiagoantao at gmail.com Fri Nov 13 14:54:08 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Fri, 13 Nov 2009 14:54:08 +0000 Subject: [Biojava-l] Newick/Nexus processing of non-binary trees In-Reply-To: References: <6d941f120911060526r14d43c43ncb5541d89a8dcaa3@mail.gmail.com> <120EB3C1-0043-4E8C-8637-F16B3B56094B@eaglegenomics.com> <6d941f120911060640v29b52bedxd120e39d092a1e88@mail.gmail.com> <6d941f120911130425h31840441he04ce4f5bd88092f@mail.gmail.com> Message-ID: <6d941f120911130654n55153f50r477cd11c281bd9a1@mail.gmail.com> 2009/11/13 Thasso Griebel : > just my two cents, but I would go with a default weight of 1.0. If you read > something unweighted you would ignore the edge weights anyways, but, for > example, if you write something simple that computes path lengths, a default > weight of 1.0 ensures that the method also works for "unweighted" trees, > where the length of a path is defined as the number of edges you need to > traverse to move from say A to B. I think the argument also hold for other > algorithms used on trees and graphs. OK, I will do this. From holland at eaglegenomics.com Fri Nov 13 15:04:27 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 13 Nov 2009 15:04:27 +0000 Subject: [Biojava-l] How do I read a FASTA file containing protein sequences in lowercase? 
In-Reply-To: <16d769b70911090826j135f2ddar13e4fc861b78e4fc@mail.gmail.com> References: <16d769b70911060825v298529b5m805c1f7fd388549b@mail.gmail.com> <16d769b70911060854o69705434x574ce2d1c7d85699@mail.gmail.com> <179C4DF7-69AF-4E65-8E6A-F984DF7CCE69@eaglegenomics.com> <16d769b70911090826j135f2ddar13e4fc861b78e4fc@mail.gmail.com> Message-ID: <2180C289-D2F7-4910-8534-9A94B1003941@eaglegenomics.com> I've applied the patch to the trunk of biojava-live. Thanks! Richard On 9 Nov 2009, at 16:26, Carl Mäsak wrote: > Richard (>): >> Ah OK I see what's going on. >> >> The convenience method you're using, RichSequence.IOTools.readStream(), uses >> FastaFormat to try and guess the alphabet to use based on the first line of >> the input sequence. >> >> In FastaFormat, it does this by searching for matching non-DNA symbols. The >> search is case-sensitive: >> >> protected static final Pattern aminoAcids = >> Pattern.compile(".*[FLIPQE].*"); >> >> FastaFormat needs patching to make this pattern non-case-sensitive. > > Patch attached. > > I also took the opportunity to remove the occurrences of .* in the > Pattern above. Generally, one should be using Matcher.find() when one > is interested in matching a part of a string. This is more efficient > than using Matcher.matches() and surrounding the desired regular > expression with .*, since the latter will cause a lot of unnecessary > backtracking and make the search quadratic. > > This effect only shows up for very long strings, but long strings can > and do happen in bioinformatics. The below measurements show the > quadratic behaviour of the former approach.
> > $ for length in 100 1000 10000 100000 1000000; do (time java > WithDotStar $length) 2>&1 | grep real; done > real 0m0.371s > real 0m0.367s > real 0m0.577s > real 0m2.735s > real 0m25.275s > > $ for length in 100 1000 10000 100000 1000000; do (time java > WithoutDotStar $length) 2>&1 | grep real; done > real 0m0.309s > real 0m0.361s > real 0m0.468s > real 0m1.184s > real 0m9.703s > > Kindly, > // Carl > -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From andylu0320 at gmail.com Sun Nov 15 22:07:41 2009 From: andylu0320 at gmail.com (Andy Lu) Date: Sun, 15 Nov 2009 17:07:41 -0500 Subject: [Biojava-l] JMol I/O Message-ID: <4a1a3f7d0911151407w2f0fe3bfyfdcf94a1dae48fe5@mail.gmail.com> Hi, sorry to bother everyone again. But I have a simple quesiton, I am using the SimpleJMolExample.java provided on the website and it works. But for a pdb file containing about 20 atoms, all of the atoms shows up on JMol for 1 second and then disappears, is it because the color changes or something or some atom size restriction? It works for files that contain much larger number of atoms. If I try to open a file manually from JMol through the open option, it shows up nicely. Is there a way that I can make the pdb file displayed on JMol through Biojava the same color/display as the one if I open it manually though JMol? Any help would be greatly appreciated! Thank you! -- Andy Lu From tiagoantao at gmail.com Sun Nov 15 23:19:46 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Sun, 15 Nov 2009 23:19:46 +0000 Subject: [Biojava-l] Newick parser Message-ID: <6d941f120911151519s732144fu1150d672107fca1e@mail.gmail.com> Hi, I have made the changes as discussed, the code is attached to the bugzilla bug concerning part of the issues that were found. A few notes: 1. There is a ParserException raised on TreeBlock. 
Though there is a TreeBlockParser, most of the important parsing was (and still is!) being made on TreeBlock. I would imagine that this is not the best design, but I did not change it. 2. I made some test cases. Also included. 3. I don't mind producing some documentation, in case you accept the code. 4. I noticed a few minor bugs more (like eating spaces in the names of nodes). But they are really minor. 5. The API was changed, but I suppose not many people were parsing trees. If there were people parsing trees, most probably the bug on not being able to process trees that are not binary would have been detected, as it is pretty major. Tiago -- “Pessimism of the Intellect; Optimism of the Will” -Antonio Gramsci From andreas at sdsc.edu Mon Nov 16 04:41:57 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Sun, 15 Nov 2009 20:41:57 -0800 Subject: [Biojava-l] JMol I/O In-Reply-To: <4a1a3f7d0911151407w2f0fe3bfyfdcf94a1dae48fe5@mail.gmail.com> References: <4a1a3f7d0911151407w2f0fe3bfyfdcf94a1dae48fe5@mail.gmail.com> Message-ID: <59a41c430911152041w592e43c2w3048b916a1855b85@mail.gmail.com> Hi Andy, probably you are trying to visualize a small molecule in Jmol, but the visualization script you are sending only works if you have several C-alpha atoms available. Try something like "select * ; spacefill on;". Jmol has a powerful scripting language which is probably worth having a look at, if you want to work with it more closely. Andreas On Sun, Nov 15, 2009 at 2:07 PM, Andy Lu wrote: > Hi, sorry to bother everyone again. > But I have a simple quesiton, I am using the SimpleJMolExample.java provided > on the website and it works. But for a pdb file containing about 20 atoms, > all of the atoms shows up on JMol for 1 second and then disappears, is it > because the color changes or something or some atom size restriction? It > works for files that contain much larger number of atoms. > If I try to open a file manually from JMol through the open option, it shows > up nicely.
Is there a way that I can make the pdb file displayed on JMol > through Biojava the same color/display as the one if I open it manually > though JMol? > Any help would be greatly appreciated! > Thank you! > > -- > Andy Lu > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From holland at eaglegenomics.com Mon Nov 16 08:39:02 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Mon, 16 Nov 2009 08:39:02 +0000 Subject: [Biojava-l] Newick parser In-Reply-To: <6d941f120911151519s732144fu1150d672107fca1e@mail.gmail.com> References: <6d941f120911151519s732144fu1150d672107fca1e@mail.gmail.com> Message-ID: Patch applied to the trunk of biojava-live. Thanks for fixing it! cheers, Richard On 15 Nov 2009, at 23:19, Tiago Antão wrote: > Hi, > > I have made the changes as discussed, the code is attached to the > bugzilla bug concerning part of the issues that were found. > A few notes: > > 1. There is a ParserException raised on TreeBlock. Though there is a > TreeBlockParser, most of the important parsing was (and still is!) > being made on TreeBlock. I would imagine that this is not the best > design, but I did not change it. > 2. I made some test cases. Also included. > 3. I don't mind producing some documentation, in case you accept the code. > 4. I noticed a few minor bugs more (like eating spaces in the names of > nodes). But they are really minor. > 5. The API was changed, but I suppose not many people were parsing > trees. If there were people parsing trees most probably the bug on not > being able to process trees that are not binary would have been > detected as it is pretty major. > > Tiago > > -- > “Pessimism of the Intellect; Optimism of the Will”
-Antonio Gramsci > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From tiagoantao at gmail.com Mon Nov 16 12:35:11 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Mon, 16 Nov 2009 12:35:11 +0000 Subject: [Biojava-l] Newick parser In-Reply-To: References: <6d941f120911151519s732144fu1150d672107fca1e@mail.gmail.com> Message-ID: <6d941f120911160435o4d232187mbca004fedd1e5624@mail.gmail.com> I can just easily solve 2679, as it is precisely on the file that I was changing. In case there is interest I'll just solve it. 2009/11/16 Richard Holland : > Patch applied to the trunk of biojava-live. Thanks for fixing it! > > cheers, > Richard > > On 15 Nov 2009, at 23:19, Tiago Antão wrote: > >> Hi, >> >> I have made the changes as discussed, the code is attached to the >> bugzilla bug concerning part of the issues that were found. >> A few notes: >> >> 1. There is a ParserException raised on TreeBlock. Though there is a >> TreeBlockParser, most of the important parsing was (and still is!) >> being made on TreeBlock. I would imagine that this is not the best >> design, but I did not change it. >> 2. I made some test cases. Also included. >> 3. I don't mind producing some documentation, in case you accept the code. >> 4. I noticed a few minor bugs more (like eating spaces in the names of >> nodes). But they are really minor. >> 5. The API was changed, but I suppose not many people were parsing >> trees. If there were people parsing trees most probably the bug on not >> being able to process trees that are not binary would have been >> detected as it is pretty major. >> >> Tiago >> >> -- >> “Pessimism of the Intellect; Optimism of the Will”
-Antonio Gramsci >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > -- “Pessimism of the Intellect; Optimism of the Will” -Antonio Gramsci From holland at eaglegenomics.com Mon Nov 16 12:41:50 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Mon, 16 Nov 2009 12:41:50 +0000 Subject: [Biojava-l] Newick parser In-Reply-To: <6d941f120911160435o4d232187mbca004fedd1e5624@mail.gmail.com> References: <6d941f120911151519s732144fu1150d672107fca1e@mail.gmail.com> <6d941f120911160435o4d232187mbca004fedd1e5624@mail.gmail.com> Message-ID: yes please! On 16 Nov 2009, at 12:35, Tiago Antão wrote: > I can just easily solve 2679, as it is precisely on the file that I > was changing. In case there is interest I'll just solve it. > > 2009/11/16 Richard Holland : >> Patch applied to the trunk of biojava-live. Thanks for fixing it! >> >> cheers, >> Richard >> >> On 15 Nov 2009, at 23:19, Tiago Antão wrote: >> >>> Hi, >>> >>> I have made the changes as discussed, the code is attached to the >>> bugzilla bug concerning part of the issues that were found. >>> A few notes: >>> >>> 1. There is a ParserException raised on TreeBlock. Though there is a >>> TreeBlockParser, most of the important parsing was (and still is!) >>> being made on TreeBlock. I would imagine that this is not the best >>> design, but I did not change it. >>> 2. I made some test cases. Also included. >>> 3. I don't mind producing some documentation, in case you accept the code. >>> 4. I noticed a few minor bugs more (like eating spaces in the names of >>> nodes). But they are really minor. >>> 5. The API was changed, but I suppose not many people were parsing >>> trees.
If there were people parsing trees most probably the bug on not >>> being able to process trees that are not binary would have been >>> detected as it is pretty major. >>> >>> Tiago >>> >>> -- >>> “Pessimism of the Intellect; Optimism of the Will” -Antonio Gramsci >>> >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> >> > > > > -- > “Pessimism of the Intellect; Optimism of the Will” -Antonio Gramsci -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From tiagoantao at gmail.com Mon Nov 16 17:52:56 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Mon, 16 Nov 2009 17:52:56 +0000 Subject: [Biojava-l] Newick parser In-Reply-To: References: <6d941f120911151519s732144fu1150d672107fca1e@mail.gmail.com> <6d941f120911160435o4d232187mbca004fedd1e5624@mail.gmail.com> Message-ID: <6d941f120911160952y36b26e40r4ffa5dd980e012fd@mail.gmail.com> I've submitted a patch to 2679. Please have a look and see if you like it. 2009/11/16 Richard Holland : > yes please! > > On 16 Nov 2009, at 12:35, Tiago Antão wrote: > >> I can just easily solve 2679, as it is precisely on the file that I >> was changing. In case there is interest I'll just solve it. >> >> 2009/11/16 Richard Holland : >>> Patch applied to the trunk of biojava-live. Thanks for fixing it! >>> >>> cheers, >>> Richard >>> >>> On 15 Nov 2009, at 23:19, Tiago Antão wrote: >>> >>>> Hi, >>>> >>>> I have made the changes as discussed, the code is attached to the >>>> bugzilla bug concerning part of the issues that were found. >>>> A few notes: >>>> >>>> 1.
There is a ParserException raised on TreeBlock. Though there is a >>>> TreeBlockParser, most of the important parsing was (and still is!) >>>> being made on TreeBlock. I would imagine that this is not the best >>>> design, but I did not change it. >>>> 2. I made some test cases. Also included. >>>> 3. I don't mind producing some documentation, in case you accept the code. >>>> 4. I noticed a few minor bugs more (like eating spaces in the names of >>>> nodes). But they are really minor. >>>> 5. The API was changed, but I suppose not many people were parsing >>>> trees. If there were people parsing trees most probably the bug on not >>>> being able to process trees that are not binary would have been >>>> detected as it is pretty major. >>>> >>>> Tiago >>>> >>>> -- >>>> “Pessimism of the Intellect; Optimism of the Will” -Antonio Gramsci >>>> >>>> _______________________________________________ >>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >>> -- >>> Richard Holland, BSc MBCS >>> Operations and Delivery Director, Eagle Genomics Ltd >>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >>> http://www.eaglegenomics.com/ >>> >>> >> >> >> >> -- >> “Pessimism of the Intellect; Optimism of the Will” -Antonio Gramsci > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > -- “Pessimism of the Intellect; Optimism of the Will”
-Antonio Gramsci

From tiagoantao at gmail.com Tue Nov 17 19:57:50 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 17 Nov 2009 19:57:50 +0000
Subject: [Biojava-l] Fwd: Newick parser
In-Reply-To: <6d941f120911171154q31df1c32taff20f5b282867bc@mail.gmail.com>
References: <6d941f120911151519s732144fu1150d672107fca1e@mail.gmail.com>
    <59a41c430911171123j43806c25vda67e406aa2d3caa@mail.gmail.com>
    <6d941f120911171154q31df1c32taff20f5b282867bc@mail.gmail.com>
Message-ID: <6d941f120911171157x20daedcdif84496c363a5bfcd@mail.gmail.com>

Forwarding this to the users mailing list also, as there might be some
interest in the documentation.

---------- Forwarded message ----------
From: Tiago Antão
Date: 2009/11/17
Subject: Re: [Biojava-l] Newick parser
To: Richard Holland
Cc: Andreas Prlic, biojava-dev

As this was all fresh in my head, I wrote a small tutorial:
http://tiago.org/cc/2009/11/17/reading-newicknexus-phylogenetic-trees-with-biojava/

As I don't follow the biojava mailing list regularly (or bug reports),
if some bug arises on this code, feel free to send me an email to my
personal account:
If I have some time to spare, I will have a look at it.

Tiago

2009/11/17 Richard Holland :
> Sorry - forgot to change the filenames in the test (under the new modular system they're in a different place than in the non-modular codebase that Tiago was working from). Fixed and committed.
>
> On 17 Nov 2009, at 19:23, Andreas Prlic wrote:
>
>> Hi Richard,
>>
>> I just did an update of my checkout and it seems the -phylo unit tests
>> don't compile any more. Can you take a look?
>>
>> Thanks,
>> Andreas
>>
>> Test set: org.biojavax.bio.phylo.io.nexus.TreesBlockTest
>> -------------------------------------------------------------------------------
>> Tests run: 9, Failures: 0, Errors: 9, Skipped: 0, Time elapsed: 0.063 sec <<< FAILURE!
>> testSimple(org.biojavax.bio.phylo.io.nexus.TreesBlockTest) Time elapsed: 0.021 sec <<< ERROR!
>> java.lang.NullPointerException
>>     at org.biojavax.bio.phylo.io.nexus.TreesBlockTest.getTreeNode(TreesBlockTest.java:160)
>>     at org.biojavax.bio.phylo.io.nexus.TreesBlockTest.getTree(TreesBlockTest.java:175)
>>     at org.biojavax.bio.phylo.io.nexus.TreesBlockTest.doVertexCount(TreesBlockTest.java:139)
>>     at org.biojavax.bio.phylo.io.nexus.TreesBlockTest.testSimple(TreesBlockTest.java:63)
>>
>> testThreeOffspring(org.biojavax.bio.phylo.io.nexus.TreesBlockTest) Time elapsed: 0.002 sec <<< ERROR!
>> java.lang.NullPointerException
>>     at org.biojavax.bio.phylo.io.nexus.TreesBlockTest.getTreeNode(TreesBlockTest.java:160)
>>     at org.biojavax.bio.phylo.io.nexus.TreesBlockTest.getTree(TreesBlockTest.java:175)
>>     at org.biojavax.bio.phylo.io.nexus.TreesBlockTest.doVertexCount(TreesBlockTest.java:139)
>>     at org.biojavax.bio.phylo.io.nexus.TreesBlockTest.testThreeOffspring(TreesBlockTest.java:70
>>
>> 2009/11/16 Richard Holland :
>>> Patch applied to the trunk of biojava-live. Thanks for fixing it!
>>>
>>> cheers,
>>> Richard
>>>
>>> On 15 Nov 2009, at 23:19, Tiago Antão wrote:
>>>
>>>> Hi,
>>>>
>>>> I have made the changes as discussed, the code is attached to the
>>>> bugzilla bug concerning part of the issues that were found.
>>>> A few notes:
>>>>
>>>> 1. There is a ParserException raised on TreeBlock. Though there is a
>>>> TreeBlockParser, most of the important parsing was (and still is!)
>>>> being made on TreeBlock. I would imagine that this is not the best
>>>> design, but I did not change it.
>>>> 2. I made some test cases. Also included.
>>>> 3. I don't mind producing some documentation, in case you accept the code.
>>>> 4. I noticed a few minor bugs more (like eating spaces in the names of
>>>> nodes). But they are really minor.
>>>> 5. The API was changed, but I suppose not many people were parsing
>>>> trees. If there were people parsing trees most probably the bug on not
>>>> being able to process trees that are not binary would have been
>>>> detected as it is pretty major.
>>>>
>>>> Tiago
>>>>
>>>> --
>>>> “Pessimism of the Intellect; Optimism of the Will” -Antonio Gramsci
>>>>
>>>> _______________________________________________
>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>
>>> --
>>> Richard Holland, BSc MBCS
>>> Operations and Delivery Director, Eagle Genomics Ltd
>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>>> http://www.eaglegenomics.com/
>>>
>>>
>>> _______________________________________________
>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>

--
“Pessimism of the Intellect; Optimism of the Will” -Antonio Gramsci

From mara.axiom at gmail.com Sat Nov 21 05:43:52 2009
From: mara.axiom at gmail.com (Mara Axiom)
Date: Sat, 21 Nov 2009 00:43:52 -0500
Subject: [Biojava-l] Algorithm to compare protein sequences
Message-ID: <6375ed360911202143sb6956e7n9788a3a603e69bd9@mail.gmail.com>

Hello all,

I am looking for an algorithm to compare protein sequences and output the
result in Newick format, for a project. I was told that I could not use the
UPGMA and Nearest Neighbor algorithms. I'm new to working with
phylogenetic data. Any help is appreciated.

Thanks,
Mara

From andreas.draeger at uni-tuebingen.de Sat Nov 21 08:35:19 2009
From: andreas.draeger at uni-tuebingen.de (Andreas Dräger)
Date: Sat, 21 Nov 2009 09:35:19 +0100
Subject: [Biojava-l] Algorithm to compare protein sequences
In-Reply-To: <6375ed360911202143sb6956e7n9788a3a603e69bd9@mail.gmail.com>
References: <6375ed360911202143sb6956e7n9788a3a603e69bd9@mail.gmail.com>
Message-ID: <4B07A647.1080405@uni-tuebingen.de>

Hi Mara,

At the moment there are two alignment algorithms available:
Smith-Waterman for local and Needleman-Wunsch for global alignment. In
addition to that there is a package for hidden Markov models that is
also able to perform sequence alignments (see the BioJava cookbook for
examples). However, currently both approaches will write the alignment
similar to the BLAST output and not in this Newick format (I am actually
not familiar with that). I hope that helps.

Cheers
Andreas

--
Dipl.-Bioinform. Andreas Dräger
Eberhard Karls University Tübingen
Center for Bioinformatics (ZBIT)
Sand 1
72076 Tübingen
Germany

Phone: +49-7071-29-70436
Fax: +49-7071-29-5091

From thasso.griebel at uni-jena.de Sat Nov 21 11:25:34 2009
From: thasso.griebel at uni-jena.de (Thasso Griebel)
Date: Sat, 21 Nov 2009 12:25:34 +0100
Subject: [Biojava-l] Algorithm to compare protein sequences
In-Reply-To: <4B07A647.1080405@uni-tuebingen.de>
References: <6375ed360911202143sb6956e7n9788a3a603e69bd9@mail.gmail.com>
    <4B07A647.1080405@uni-tuebingen.de>
Message-ID:

Hi,

if I get this one right you want to do three things.

1. create a multiple sequence alignment.
2. create a pairwise distance matrix from the alignment.
3. use a distance based tree construction method (agglomerative
clustering (UPGMA, WPGMA..) or Neighbor Joining) to create a tree.

The tree can be printed as a Newick string.

I don't know if all of this is possible with BioJava.
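
Step 2 of the list above can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not BioJava API: the class PairwiseDistance and its method names are hypothetical, the input is assumed to be an existing alignment (equal-length rows, '-' for gaps), and the correction is the Jukes-Cantor form generalised to an alphabet of k states (k = 20 for amino acids, k = 4 for nucleotides).

```java
/** Hypothetical helper (not part of BioJava): distances from an existing alignment. */
final class PairwiseDistance {

    /** Fraction of differing columns, skipping columns where either row has a gap. */
    static double pDistance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("aligned sequences must have equal length");
        }
        int compared = 0;
        int different = 0;
        for (int i = 0; i < a.length(); i++) {
            char x = Character.toUpperCase(a.charAt(i));
            char y = Character.toUpperCase(b.charAt(i));
            if (x == '-' || y == '-') {
                continue; // a gap column carries no substitution information
            }
            compared++;
            if (x != y) {
                different++;
            }
        }
        // No comparable columns at all: the distance is undefined.
        return compared == 0 ? Double.NaN : (double) different / compared;
    }

    /** Jukes-Cantor style correction for an alphabet with the given number of states. */
    static double jukesCantor(double p, int states) {
        double b = (states - 1) / (double) states;
        if (p >= b) {
            return Double.POSITIVE_INFINITY; // saturated: correction is undefined
        }
        return -b * Math.log(1.0 - p / b);
    }

    /** Symmetric matrix of corrected distances for all pairs of aligned rows. */
    static double[][] matrix(String[] aligned, int states) {
        int n = aligned.length;
        double[][] d = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double dist = jukesCantor(pDistance(aligned[i], aligned[j]), states);
                d[i][j] = dist;
                d[j][i] = dist;
            }
        }
        return d;
    }
}
```

A matrix like this is what a distance-based method such as UPGMA or Neighbor Joining takes as input; the tree building and the Newick output are not shown here.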
If BioJava cannot do all of this, I could at least provide code to create the
pairwise distance matrix (including JC and Kimura corrections) and for the
clustering algorithms. But I thought NJ and AgglomerativeClustering were
already implemented, though I couldn't find the classes in the 1.7 API.

If you don't need to do the computations programmatically, you can also try
http://bio.informatik.uni-jena.de/epos/
though with the currently released version you have to do the alignment
externally. The next release will also provide a way to do multiple sequence
alignments directly.

Another alternative is
http://gi.cebitec.uni-bielefeld.de/qalign
QAlign can be used to create the alignment (using clustalw, tcoffee or
dialign) and create an NJ or agglomerative tree in one step. A nice thing is
that you can manipulate the alignment (e.g. insert gaps) and the tree is
updated continuously.

cheers,
thasso

On Nov 21, 2009, at 09:35 , Andreas Dräger wrote:

> Hi Mara,
>
> At the moment there are two alignment algorithms available:
> Smith-Waterman for local and Needleman-Wunsch for global alignment. In
> addition to that there is a package for hidden Markov models that is
> also able to perform sequence alignments (see the BioJava cookbook for
> examples). However, currently both approaches will write the alignment
> similar to the BLAST output and not in this Newick format (I am actually
> not familiar with that). I hope that helps.
>
> Cheers
> Andreas
>
> --
> Dipl.-Bioinform. Andreas Dräger
> Eberhard Karls University Tübingen
> Center for Bioinformatics (ZBIT)
> Sand 1
> 72076 Tübingen
> Germany
>
> Phone: +49-7071-29-70436
> Fax: +49-7071-29-5091
>
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

--
Dipl. Inf.
Thasso Griebel-------------------Lehrstuhl fuer Bioinformatik
Office 3426--http://bio.informatik.uni-jena.de--Institut fuer Informatik
Phone +49 (0)3641 9-46454-----------Friedrich-Schiller-Universitaet Jena
Fax +49 (0)3641 9-46452----------Ernst-Abbe-Platz 2, 07743 Jena, Germany

From jbdundas at gmail.com Sun Nov 8 10:22:59 2009
From: jbdundas at gmail.com (jitesh dundas)
Date: Sun, 08 Nov 2009 10:22:59 -0000
Subject: [Biojava-l] Java Error:- XML Parsing Error: XML or text declaration not at start of entity
In-Reply-To: <59a41c430911010806u62f45b90ic4a9563a27ee12e2@mail.gmail.com>
References: <326ea8620911010739t1e658509h7dd33ba8482312f8@mail.gmail.com>
    <326ea8620911010741q3880a13g53626d94d0d2abd2@mail.gmail.com>
    <59a41c430911010806u62f45b90ic4a9563a27ee12e2@mail.gmail.com>
Message-ID: <326ea8620911080222q225cb6a4m4957be8dd5c9f91f@mail.gmail.com>

Dear Sir,

My program is working fine and returns an XML file with 20 records.
However, it does not let me retrieve larger numbers of records.

For example, if I enter "cancer" it returns only 20 records.

Can you please tell me what I should do next to get all those records?
Thank you in advance.

Regards,
Jitesh Dundas

On Sun, Nov 1, 2009 at 9:36 PM, Andreas Prlic wrote:
>
> Hi Jitesh,
>
> It is hard to read your code with all the formatting off, probably due to email, and with many commented lines that don't seem to get used. Can you provide the stacktrace, so we can see what part of biojava is affected?
>
> Probably a good strategy to write and debug this is to simplify the problem into smaller steps. Try to first download the files you want to parse and write the code to parse them from the local file. That will avoid any issues you might encounter with networking and server/client communication. Once the parsing is working you could take it to the next step and add the server communication...
>
> Andreas
>
>
>
>
> On Sun, Nov 1, 2009 at 7:41 AM, jitesh dundas wrote:
>>
>> Hi friends,
>>
>> I am getting this error on doing a post(using the code below) to this url->
>> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=cancer&reldate=10
>>
>> I have written this code in .jsp file. Later I will change it into servlet.
>>
>> Error:-
>> XML Parsing Error: XML or text declaration not at start of entity
>> Location:
>> http://localhost:8080/ProteomDb/ImportFromPubmed2.jsp?txtDbName=pubmed&txtTerm=cancer&txtreldate=10&comSDay=01&comSMonth=01&txtSYear=&comEDay=01&comEMonth=01&txtEYear=&txtURLString=http%3A%2F%2Feutils.ncbi.nlm.nih.gov%2Fentrez%2Feutils%2Fesearch.fcgi%3Fdb%3Dpubmed%26term%3Dcancer%26reldate%3D10&txtsubmit=Fetch+Data+From+NCBI
>> Line Number 11, Column 1:> PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN" "
>> http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd">2034200
>> 19877350 19877304 19877297
>> 19877284 19877271 19877265
>> 19877250 19877245 19877226
>> 19877210 19877179 19877175
>> 19877161 19877159 19877158
>> 19877123 19877122 19877120
>> 19877119 19877118
>> cancer
>> "neoplasms"[MeSH Terms] OR "neoplasms"[All Fields] OR "cancer"[All
>> Fields]
>> "neoplasms"[MeSH Terms] MeSH
>> Terms 2082133 Y
>> "neoplasms"[All Fields] All
>> Fields 1634731 Y
>> OR "cancer"[All Fields]
>> All Fields 902537 Y
>> OR GROUP
>> 2009/10/22[EDAT] EDAT 0
>> Y
>> 2009/11/01[EDAT] EDAT 0
>> Y RANGE AND
>> ("neoplasms"[MeSH Terms] OR
>> "neoplasms"[All Fields] OR "cancer"[All Fields]) AND 2009/10/22[EDAT] :
>> 2009/11/01[EDAT]
>> ^
>>
>> As you can see, the XML output is coming fine but the above error does not
>> go..The output via this program should be just like hitting manually the
>> above URL in the browser..
>> The browser is Mozilla Firefox.
>>
>> Code:-
>>
>> <%@ page language = "java" %>
>> <%@ page import = "java.sql.*" %>
>> <%@ page import = "java.util.*" %>
>> <%@ page import = "java.io.*" %>
>> <%@ page import="java.lang.*" %>
>> <%@ page import="java.net.*" %>
>> <%@ page import="java.nio.*" %>
>> <%@ page contentType="text/xml; charset=utf-8" pageEncoding="UTF-8" %>
>>
>>
>> <%
>>
>> try
>> {
>>     //String str = "";
>>     //out.println("");
>>
>>     Properties systemSettings = System.getProperties();
>>     systemSettings.put("http.proxyHost", "********");
>>     systemSettings.put("http.proxyPort", "******");
>>     systemSettings.put("sun.net.client.defaultConnectTimeout", "10000");
>>     systemSettings.put("sun.net.client.defaultReadTimeout", "10000");
>>
>>     //out.println("Properties Set");
>>     Authenticator.setDefault(new Authenticator()
>>     {
>>         protected PasswordAuthentication getPasswordAuthentication()
>>         {
>>             return new PasswordAuthentication("**", "******".toCharArray()); // specify ur user name password of iitb login
>>         }
>>     });
>>
>>
>>     System.setProperties(systemSettings);
>>     //out.println("After Authentication & Properties Settings");
>>
>>     //create xml file.
>>     //the input to google api
>>     //String textAreaContent = request.getParameter("text");
>>     String textAreaContent = "This si a tst";
>>
>>     String str = "";
>>
>>     //xml file generation ends here..
>>     //FetchDataFromNCBI_URLString.jsp
>>     String URLString = request.getParameter("txtURLString").trim();
>>
>>     //URL url = new URL("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=BAA20519");
>>     URL url = new URL(URLString); //url string taken from user input.
>>     HttpURLConnection connection = null;
>>
>>     connection = (HttpURLConnection) url.openConnection();
>>     System.out.println("After open connection");
>>     connection.setRequestMethod("POST");
>>     connection.setDoInput(true);
>>     connection.setDoOutput(true);
>>
>>     connection.setUseCaches(false);
>>     connection.setAllowUserInteraction(false);
>>     //connection.setFollowRedirects(true);
>>     //connection.setInstanceFollowRedirects(true);
>>     //System.out.println("Before-------------------");
>>     connection.setRequestProperty ("Content-Type","text/xml; charset=\"utf-8\"");
>>     //System.out.println("After-------------------");
>>
>>     //System.out.println(""+ connection.getOutputStream());
>>
>>     //System.out.println("After dataoutputstream..Line No-65");
>>
>>     //System.out.println("Response Code="+ connection.getResponseCode);
>>
>>     OutputStreamWriter dosout = new OutputStreamWriter(connection.getOutputStream());
>>     //System.out.println("After dosout object..Line No-63");
>>     //dosout.write(str);
>>     dosout.close ();
>>
>>     BufferedReader in = new BufferedReader( new InputStreamReader(connection.getInputStream()));
>>
>>     String decodedString;
>>     String tempstr = "";
>>
>>
>>     while ((decodedString = in.readLine()) != null)
>>     {
>>         tempstr = tempstr + decodedString;
>>         //out.println(decodedString);
>>     }
>>     out.println(tempstr);
>>     in.close();
>> }
>> catch(Exception ex)
>> {
>>     out.println("Exception->"+ex);
>>     PrintWriter pw = response.getWriter();
>>     ex.printStackTrace(pw);
>> }
>>
>>
>> %>
>>
>> Thanks in advance..
>>
>> Regards,
>> JItesh Dundas
>>
>> _______________________________________________
>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ImportFromPubmed3.jsp
Type: application/octet-stream
Size: 2696 bytes
Desc: not available
URL:

From cmasak at gmail.com Mon Nov 9 16:26:00 2009
From: cmasak at gmail.com (Carl Mäsak)
Date: Mon, 9 Nov 2009 17:26:00 +0100
Subject: [Biojava-l] How do I read a FASTA file containing protein sequences in lowercase?
In-Reply-To: <179C4DF7-69AF-4E65-8E6A-F984DF7CCE69@eaglegenomics.com>
References: <16d769b70911060825v298529b5m805c1f7fd388549b@mail.gmail.com>
    <16d769b70911060854o69705434x574ce2d1c7d85699@mail.gmail.com>
    <179C4DF7-69AF-4E65-8E6A-F984DF7CCE69@eaglegenomics.com>
Message-ID: <16d769b70911090826j135f2ddar13e4fc861b78e4fc@mail.gmail.com>

Richard (>):
> Ah OK I see what's going on.
>
> The convenience method you're using, RichSequence.IOTools.readStream(), uses
> FastaFormat to try and guess the alphabet to use based on the first line of
> the input sequence.
>
> In FastaFormat, it does this by searching for matching non-DNA symbols. The
> search is case-sensitive:
>
>        protected static final Pattern aminoAcids =
> Pattern.compile(".*[FLIPQE].*");
>
> FastaFormat needs patching to make this pattern non-case-sensitive.

Patch attached.

I also took the opportunity to remove the occurrences of .* in the
Pattern above. Generally, one should be using Matcher.find() when one
is interested in matching a part of a string. This is more efficient than
using Matcher.matches() and surrounding the desired regular expression
with .*, since the latter will cause a lot of unnecessary backtracking
and make the search quadratic. This effect only shows up for very long
strings, but long strings can and do happen in bioinformatics.

The measurements below show the quadratic behaviour of the .* approach.
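
Before the timings, a minimal sketch of the find()-based, case-insensitive check described above. It is not the attached patch (the attachment was scrubbed from the archive): the class AlphabetGuess and the method looksLikeProtein are hypothetical names, and only the character class [FLIPQE] is taken from the quoted FastaFormat code.

```java
import java.util.regex.Pattern;

/** Hypothetical stand-in for the alphabet guess; not the actual FastaFormat code. */
final class AlphabetGuess {

    // Same residues as the quoted pattern, but without the surrounding .*
    // and with case-insensitive matching so lowercase sequences are handled.
    private static final Pattern AMINO_ACIDS =
            Pattern.compile("[FLIPQE]", Pattern.CASE_INSENSITIVE);

    /** True if the text contains at least one of the non-DNA symbols. */
    static boolean looksLikeProtein(CharSequence firstLine) {
        // find() scans for the first occurrence anywhere in the input,
        // so the pattern needs no leading or trailing .*
        return AMINO_ACIDS.matcher(firstLine).find();
    }
}
```

Under these assumptions, looksLikeProtein("mkvlaaqe") is true and looksLikeProtein("acgtacgt") is false.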

$ for length in 100 1000 10000 100000 1000000; do (time java WithDotStar $length) 2>&1 | grep real; done
real 0m0.371s
real 0m0.367s
real 0m0.577s
real 0m2.735s
real 0m25.275s

$ for length in 100 1000 10000 100000 1000000; do (time java WithoutDotStar $length) 2>&1 | grep real; done
real 0m0.309s
real 0m0.361s
real 0m0.468s
real 0m1.184s
real 0m9.703s

Kindly,
// Carl
-------------- next part --------------
A non-text attachment was scrubbed...
Name: aminoAcids.patch
Type: application/octet-stream
Size: 1995 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: WithDotStar.java
Type: application/octet-stream
Size: 634 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: WithoutDotStar.java
Type: application/octet-stream
Size: 633 bytes
Desc: not available
URL:

From holland at eaglegenomics.com Mon Nov 23 19:08:11 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 23 Nov 2009 19:08:11 +0000
Subject: [Biojava-l] Java Error:- XML Parsing Error: XML or text declaration not at start of entity
In-Reply-To: <326ea8620911080222q225cb6a4m4957be8dd5c9f91f@mail.gmail.com>
References: <326ea8620911010739t1e658509h7dd33ba8482312f8@mail.gmail.com>
    <326ea8620911010741q3880a13g53626d94d0d2abd2@mail.gmail.com>
    <59a41c430911010806u62f45b90ic4a9563a27ee12e2@mail.gmail.com>
    <326ea8620911080222q225cb6a4m4957be8dd5c9f91f@mail.gmail.com>
Message-ID: <2582F218-3873-49FB-BFB2-6F72B2B4815C@eaglegenomics.com>

Your program takes an input 'txtURLString' - could you give an example of the value that this usually contains? I suspect that this URL is where your problem lies but without seeing an example value I couldn't say for sure.

thanks,
Richard

On 8 Nov 2009, at 10:22, jitesh dundas wrote:

> Dear Sir,
>
> My program is working fine and can send me an xml file with 20
> records. However, it does not allow me to send large amounts of
> records.
>>>
>>> Regards,
>>> Jitesh Dundas
>>>
>>> _______________________________________________
>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From rabee.a.aa at m.titech.ac.jp Tue Nov 24 10:14:30 2009
From: rabee.a.aa at m.titech.ac.jp (rabee.a.aa at m.titech.ac.jp)
Date: Tue, 24 Nov 2009 19:14:30 +0900
Subject: [Biojava-l] sequencing data analysis
Message-ID: <1259057670648712.31357@mail2.nap.gsic.titech.ac.jp>

Dear Biojava members,
I'm new to Biojava and I would like to use it for analysis of next generation sequencing data.
May I ask you about the available packages for analysis of sequencing data?

Best Regards,
Rabe

From holland at eaglegenomics.com Tue Nov 24 10:33:43 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 24 Nov 2009 10:33:43 +0000
Subject: [Biojava-l] sequencing data analysis
In-Reply-To: <1259057670648712.31357@mail2.nap.gsic.titech.ac.jp>
References: <1259057670648712.31357@mail2.nap.gsic.titech.ac.jp>
Message-ID: <7ECB9ED5-F983-4E74-8AC1-C70C129EDA7E@eaglegenomics.com>

There are loads of things you can do. A good starting point is here:

http://biojava.org/wiki/BioJava:CookBook

cheers,
Richard

On 24 Nov 2009, at 10:14, wrote:

> Dear Biojava members,
> I'm new to Biojava and I would like to use it for analysis of next generation sequencing data.
> May I ask you about the available packages for analysis of sequencing data?
>
> Best Regards,
> Rabe
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From jbdundas at gmail.com Tue Nov 24 14:48:55 2009
From: jbdundas at gmail.com (jitesh dundas)
Date: Tue, 24 Nov 2009 20:18:55 +0530
Subject: [Biojava-l] Fwd: Java Error:- XML Parsing Error: XML or text declaration not at start of entity
In-Reply-To: <326ea8620911240647i686a3488sc9ad46cc314dfbd3@mail.gmail.com>
References: <326ea8620911010739t1e658509h7dd33ba8482312f8@mail.gmail.com>
    <326ea8620911010741q3880a13g53626d94d0d2abd2@mail.gmail.com>
    <59a41c430911010806u62f45b90ic4a9563a27ee12e2@mail.gmail.com>
    <326ea8620911080222q225cb6a4m4957be8dd5c9f91f@mail.gmail.com>
    <2582F218-3873-49FB-BFB2-6F72B2B4815C@eaglegenomics.com>
    <326ea8620911240647i686a3488sc9ad46cc314dfbd3@mail.gmail.com>
Message-ID: <326ea8620911240648w371c1c7fx7f495133753bbbe@mail.gmail.com>

Dear Sir/Madam,

FYI. Just trying to contribute to this mailing list and help.

Regards,
Jitesh Dundas

---------- Forwarded message ----------
From: jitesh dundas
Date: Nov 24, 2009 8:17 PM
Subject: Re: [Biojava-l] Java Error:- XML Parsing Error: XML or text declaration not at start of entity
To: Richard Holland

Dear Sir,

Thank you for your reply. I figured this problem out by fetching the
records in small sets, e.g. 20 records per page. It is like a pagination
functionality: for each new page, we need to hit the URL again.

My functionality is working fine. I will be happy to share my code with
anyone who needs it.

I simply fetch data from the URL and write it to an XML file. Next I
read the XML file and show the records to the user in the web page.

Again, I need to know how to fetch records for the protein database. Two
types of searches are needed, I suspect.
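
The paging described above can be sketched like this. ESearch returns 20 IDs per request by default and accepts retstart and retmax parameters to select a window of the result list. The class ESearchPager, its method name and the page size are hypothetical, and the total is assumed to have been read from the Count element of a first response. The sketch only builds the request URLs and does no networking.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds one ESearch URL per page of results. */
final class ESearchPager {

    // Base URL as used in the thread.
    private static final String BASE =
            "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi";

    /**
     * @param db         database name, e.g. "pubmed" (assumed to need no escaping)
     * @param term       search term, URL-encoded here
     * @param totalCount total number of hits, assumed taken from a first response
     * @param pageSize   number of IDs to request per call (retmax)
     */
    static List<String> pageUrls(String db, String term, int totalCount, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        String encodedTerm;
        try {
            encodedTerm = URLEncoder.encode(term, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
        List<String> urls = new ArrayList<String>();
        // retstart is the zero-based offset of the first ID on each page.
        for (int start = 0; start < totalCount; start += pageSize) {
            urls.add(BASE + "?db=" + db + "&term=" + encodedTerm
                    + "&retstart=" + start + "&retmax=" + pageSize);
        }
        return urls;
    }
}
```

For 45 hits and a page size of 20 this yields three URLs, with retstart values 0, 20 and 40; each one can then be fetched in turn.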
For the protein records, we would first use the ESearch utility and then
the EFetch utility to get the data for the specific protein. I welcome any
suggestions on this!

Thank you everyone for your help.

Regards,
Jitesh Dundas

On 11/24/09, Richard Holland wrote:
>
> Your program takes an input 'txtURLString' - could you give an example of
> the value that this usually contains? I suspect that this URL is where your
> problem lies but without seeing an example value I couldn't say for sure.
>
> thanks,
> Richard
>
> On 8 Nov 2009, at 10:22, jitesh dundas wrote:
>
> > Dear Sir,
> >
> > My program is working fine and can send me an xml file with 20
> > records. However, it does not allow me to send large amounts of
> > records.
> >
> > For e.g. if I enter "cancer" it will return only 20 records.
> >
> > Can you please tell me what I should do next to get all those records.
> > Thank you in advance
> >
> > Regards,
> > Jitesh Dundas
> >
> > On Sun, Nov 1, 2009 at 9:36 PM, Andreas Prlic wrote:
> >>
> >> Hi Jitesh,
> >>
> >> It is hard to read your code with all the formatting off, probably due to email, and with many commented lines that don't seem to get used. Can you provide the stacktrace, so we can see what part of biojava is affected?
> >>
> >> Probably a good strategy to write and debug this is to simplify the problem into smaller steps. Try to first download the files you want to parse and write the code to parse them from the local file. That will avoid any issues you might encounter with networking and server/client communication. Once the parsing is working you could take it to the next step and add the server communication...
> >>
> >> Andreas
> >>
> >>
> >>
> >>
> >> On Sun, Nov 1, 2009 at 7:41 AM, jitesh dundas wrote:
> >>>
> >>> Hi friends,
> >>>
> >>> I am getting this error on doing a post(using the code below) to this url->
> >>> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=cancer&reldate=10
> >>>
> >>> I have written this code in .jsp file.
> >>>
> >>> Regards,
> >>> Jitesh Dundas
> >>>
> >>> _______________________________________________
> >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/biojava-l
> >>
> >>
> > _______________________________________________
> > Biojava-l mailing list - Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>

From holland at eaglegenomics.com Tue Nov 24 14:51:49 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 24 Nov 2009 14:51:49 +0000
Subject: [Biojava-l] Fwd: Java Error:- XML Parsing Error: XML or text declaration not at start of entity
References: <326ea8620911240647i686a3488sc9ad46cc314dfbd3@mail.gmail.com>
Message-ID: <02966AA0-0DD3-4EF1-9D02-E86F593D16D8@eaglegenomics.com>

Jitesh - I forwarded your response to the list so that everyone can get the chance to reply.

cheers,
Richard

Begin forwarded message:

> From: jitesh dundas
> Date: 24 November 2009 14:47:00 GMT
> To: Richard Holland
> Subject: Re: [Biojava-l] Java Error:- XML Parsing Error: XML or text declaration not at start of entity
>
> Dear Sir,
>
> Thank you for your reply. I figured this problem out by sending records in small sets, e.g. 20 records per page.
>
> It is like a pagination functionality. For each new page, we need to hit the URL.
>
> My functionality is working fine. I will be happy to share my code with you (and anyone else) who needs it.
>
> I simply fetch data from the URL and write it to an XML file. Next I just read the XML file and show the records in the web page to the user.
>
> Again, I need to know how to fetch records for the protein database. Two types of searches are needed, I suspect.
>
> First we use the ESearch utility and then the EFetch utility to get the data for the specific protein.
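A minimal sketch of the paging and ESearch-then-EFetch steps described above, written in Java to match the JSP code in this thread. It only builds the request URLs and makes no network calls. The class and method names (EutilsUrls, esearchPages, efetchUrl) are hypothetical and not from the thread. retstart, retmax, rettype and retmode are standard E-utilities parameters, and ESearch returns 20 UIDs by default, which would explain the 20-record limit mentioned earlier. The total hit count is assumed to be read from the Count value of the first ESearch response.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not from the thread): builds E-utilities URLs only, no network I/O. */
class EutilsUrls {

    // Base URL as it appears in the thread; adjust the scheme or host if NCBI requires it.
    static final String BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/";

    /** URL-encodes one query parameter value as UTF-8. */
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** ESearch URL for one page: retstart is the zero-based offset, retmax the page size. */
    static String esearchUrl(String db, String term, int retstart, int retmax) {
        return BASE + "esearch.fcgi?db=" + enc(db) + "&term=" + enc(term)
                + "&retstart=" + retstart + "&retmax=" + retmax;
    }

    /** One ESearch URL per page, covering `total` hits (the Count value ESearch reports). */
    static List<String> esearchPages(String db, String term, int total, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<String> urls = new ArrayList<String>();
        for (int start = 0; start < total; start += pageSize) {
            urls.add(esearchUrl(db, term, start, pageSize));
        }
        return urls;
    }

    /** EFetch URL for the IDs found by ESearch; rettype/retmode select the output format. */
    static String efetchUrl(String db, List<String> ids, String rettype, String retmode) {
        StringBuilder joined = new StringBuilder();
        for (String id : ids) {
            if (joined.length() > 0) {
                joined.append(',');
            }
            joined.append(enc(id));
        }
        return BASE + "efetch.fcgi?db=" + enc(db) + "&id=" + joined
                + "&rettype=" + enc(rettype) + "&retmode=" + enc(retmode);
    }

    public static void main(String[] args) {
        // 45 is a made-up hit count; in practice read it from the first ESearch response.
        for (String url : esearchPages("pubmed", "cancer", 45, 20)) {
            System.out.println(url);
        }
        // BAA20519 is the protein identifier mentioned in the thread.
        List<String> ids = new ArrayList<String>();
        ids.add("BAA20519");
        System.out.println(efetchUrl("protein", ids, "fasta", "text"));
    }
}
```

Each URL from esearchPages can be requested in turn (for example with HttpURLConnection, as in the original JSP), and the IDs parsed from each response can then be passed to efetchUrl.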
>
> I welcome any suggestions on this!
>
> Thank you everyone for your help.
>
> Regards,
> Jitesh Dundas
>
> On 11/24/09, Richard Holland wrote:
> Your program takes an input 'txtURLString' - could you give an example of the value that this usually contains? I suspect that this URL is where your problem lies, but without seeing an example value I couldn't say for sure.
>
> thanks,
> Richard
>
> On 8 Nov 2009, at 10:22, jitesh dundas wrote:
>
> > Dear Sir,
> >
> > My program is working fine and can send me an XML file with 20
> > records. However, it does not allow me to send large numbers of
> > records.
> >
> > For example, if I enter "cancer" it will return only 20 records.
> >
> > Can you please tell me what I should do next to get all those records?
> > Thank you in advance.
> >
> > Regards,
> > Jitesh Dundas
> >
> > On Sun, Nov 1, 2009 at 9:36 PM, Andreas Prlic wrote:
> >>
> >> Hi Jitesh,
> >>
> >> It is hard to read your code with all the formatting off, probably due to email, and with many commented lines that don't seem to get used. Can you provide the stacktrace, so we can see what part of biojava is affected?
> >>
> >> Probably a good strategy to write and debug this is to simplify the problem into smaller steps. Try to first download the files you want to parse and write the code to parse them from the local file. That will avoid any issues you might encounter with networking and server/client communication. Once the parsing is working you could take it to the next step and add the server communication...
> >>
> >> Andreas
> >>
> >> On Sun, Nov 1, 2009 at 7:41 AM, jitesh dundas wrote:
> >>>
> >>> Hi friends,
> >>>
> >>> I am getting this error on doing a post (using the code below) to this URL:
> >>> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=cancer&reldate=10
> >>>
> >>> I have written this code in .jsp file. Later I will change it into servlet.
> >>>
> >>> Regards,
> >>> Jitesh Dundas
> >>>
> >>> _______________________________________________
> >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/biojava-l
> >>
> >>
> > _______________________________________________
> > Biojava-l mailing list - Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From holland at eaglegenomics.com Tue Nov 24 15:27:20 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 24 Nov 2009 15:27:20 +0000
Subject: [Biojava-l] Hackathon in January
Message-ID: <1235107C-E1D2-4EEB-8A5F-2335751120EE@eaglegenomics.com>

Hi all.

To anyone planning on attending the BioJava hackathon in Cambridge (UK) in January, now would be a good time to sort out travel arrangements. If you're intending to come but haven't yet said so, please do let me know so that I can ensure we get a big enough room to work in!

cheers,
Richard

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From andreas at sdsc.edu Tue Nov 24 17:54:38 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 24 Nov 2009 09:54:38 -0800
Subject: [Biojava-l] Hackathon in January
In-Reply-To: <1235107C-E1D2-4EEB-8A5F-2335751120EE@eaglegenomics.com>
References: <1235107C-E1D2-4EEB-8A5F-2335751120EE@eaglegenomics.com>
Message-ID: <59a41c430911240954g589496dbg60a829d711131fab@mail.gmail.com>

* Is anybody interested in following the goings-on at the hackathon via an online-stream?
- I received a request about this and am wondering if more people would be interested in this.

* Just to repeat the current status regarding the program: So far the plan is to continue working on the new modules. Ideally we will have a brand new biojava 3 ready soon after the hackathon. A more detailed program for the week will be sent out in January.

If anybody wants to propose feature requests, you can have a look at the current todo list for the modules:
http://biojava.org/wiki/BioJava:Modules

Andreas

On Tue, Nov 24, 2009 at 7:27 AM, Richard Holland wrote:
> Hi all.
>
> To anyone planning on attending the BioJava hackathon in Cambridge (UK) in January, now would be a good time to sort out travel arrangements. If you're intending to come but haven't yet said so, please do let me know so that I can ensure we get a big enough room to work in!
>
> cheers,
> Richard
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From ayates at ebi.ac.uk Wed Nov 25 12:51:23 2009
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 25 Nov 2009 12:51:23 +0000
Subject: [Biojava-l] [Biojava-dev] Hackathon in January
In-Reply-To: <59a41c430911240954g589496dbg60a829d711131fab@mail.gmail.com>
References: <1235107C-E1D2-4EEB-8A5F-2335751120EE@eaglegenomics.com> <59a41c430911240954g589496dbg60a829d711131fab@mail.gmail.com>
Message-ID: <4DA69755-069B-4E70-BBE2-AEB408AE7E02@ebi.ac.uk>

By online stream do you mean Wave or Twitter or something else trendy? :)

Andy

On 24 Nov 2009, at 17:54, Andreas Prlic wrote:

> * Is anybody interested in following the goings-on at the hackathon via
> an online-stream? - I received a request about this and am wondering
> if more people would be interested in this.
>
> * Just to repeat the current status regarding the program: So far the
> plan is to continue working on the new modules. Ideally we will have a
> brand new biojava 3 ready soon after the hackathon. A more detailed
> program for the week will be sent out in January.
>
> If anybody wants to propose feature requests, you can have a look at
> the current todo list for the modules:
> http://biojava.org/wiki/BioJava:Modules
>
> Andreas
>
>
>
> On Tue, Nov 24, 2009 at 7:27 AM, Richard Holland
> wrote:
>> Hi all.
>>
>> To anyone planning on attending the BioJava hackathon in Cambridge
>> (UK) in January, now would be a good time to sort out travel
>> arrangements. If you're intending to come but haven't yet said so,
>> please do let me know so that I can ensure we get a big enough room
>> to work in!
>>
>> cheers,
>> Richard
>>
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>
>>
>> _______________________________________________
>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

From andreas at sdsc.edu Wed Nov 25 20:08:33 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 25 Nov 2009 12:08:33 -0800
Subject: [Biojava-l] [Biojava-dev] Hackathon in January
In-Reply-To: <4DA69755-069B-4E70-BBE2-AEB408AE7E02@ebi.ac.uk>
References: <1235107C-E1D2-4EEB-8A5F-2335751120EE@eaglegenomics.com> <59a41c430911240954g589496dbg60a829d711131fab@mail.gmail.com> <4DA69755-069B-4E70-BBE2-AEB408AE7E02@ebi.ac.uk>
Message-ID: <59a41c430911251208u44bb1218l7cb0065657ed7227@mail.gmail.com>

I was thinking about video... I would expect that some of the participants will do some sort of tweeting, blogging, etc.

Andreas

On Wed, Nov 25, 2009 at 4:51 AM, Andy Yates wrote:
> By online stream do you mean Wave or Twitter or something else trendy? :)
>
> Andy
>
> On 24 Nov 2009, at 17:54, Andreas Prlic wrote:
>
>> * Is anybody interested in following the goings-on at the hackathon via
>> an online-stream? - I received a request about this and am wondering
>> if more people would be interested in this.
>>
>> * Just to repeat the current status regarding the program: So far the
>> plan is to continue working on the new modules. Ideally we will have a
>> brand new biojava 3 ready soon after the hackathon. A more detailed
>> program for the week will be sent out in January.
>>
>> If anybody wants to propose feature requests, you can have a look at
>> the current todo list for the modules:
>> http://biojava.org/wiki/BioJava:Modules
>>
>> Andreas
>>
>>
>>
>> On Tue, Nov 24, 2009 at 7:27 AM, Richard Holland
>> wrote:
>>>
>>> Hi all.
>>>
>>> To anyone planning on attending the BioJava hackathon in Cambridge (UK)
>>> in January, now would be a good time to sort out travel arrangements. If
>>> you're intending to come but haven't yet said so, please do let me know so
>>> that I can ensure we get a big enough room to work in!
>>>
>>> cheers,
>>> Richard
>>>
>>> --
>>> Richard Holland, BSc MBCS
>>> Operations and Delivery Director, Eagle Genomics Ltd
>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>>> http://www.eaglegenomics.com/
>>>
>>>
>>> _______________________________________________
>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>
>>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
>

From jw12 at sanger.ac.uk Thu Nov 26 14:57:35 2009
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Thu, 26 Nov 2009 14:57:35 +0000
Subject: [Biojava-l] DAS workshop 7th-9th April 2010
Message-ID:

We are considering running a Distributed Annotation System workshop here at the Sanger/EBI in the UK, subject to decent demand.
The workshop will be held from Wednesday 7th to Friday 9th April 2010. If you would be interested in attending, either to present or just to take part, then please email me at jw12 at sanger.ac.uk

The format of the workshop is likely to be similar to last year's (1st day for beginners, 2nd for both beginners and advanced users, 3rd day for advanced), information for which can be found here:
http://www.dasregistry.org/course.jsp

If you would like to present then please send a short summary of what you would like to talk about.

Thanks

Jonathan.

Jonathan Warren
Senior Developer and DAS coordinator
jw12 at sanger.ac.uk

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

From mauricio at open-bio.org Thu Nov 26 21:45:43 2009
From: mauricio at open-bio.org (Mauricio Herrera Cuadra)
Date: Thu, 26 Nov 2009 15:45:43 -0600
Subject: [Biojava-l] [DAS] DAS workshop 7th-9th April 2010
In-Reply-To:
References:
Message-ID: <4B0EF707.6080202@open-bio.org>

Hi Jonathan,

Any chance it can be webcast? I'm sure it would attract a lot of remote attendees ;)

Regards,
Mauricio.

Jonathan Warren wrote:
> We are considering running a Distributed Annotation System workshop here
> at the Sanger/EBI in the UK, subject to decent demand.
> The workshop will be held from Wednesday 7th to Friday 9th April 2010. If
> you would be interested in attending, either to present or just to take part,
> then please email me at jw12 at sanger.ac.uk
>
> The format of the workshop is likely to be similar to last year's (1st
> day for beginners, 2nd for both beginners and advanced users, 3rd day
> for advanced), information for which can be found here:
> http://www.dasregistry.org/course.jsp
>
> If you would like to present then please send a short summary of what
> you would like to talk about.
>
> Thanks
>
> Jonathan.
>
> Jonathan Warren
> Senior Developer and DAS coordinator
> jw12 at sanger.ac.uk
>