From jbdundas at gmail.com Sun Nov 1 10:41:03 2009
From: jbdundas at gmail.com (jitesh dundas)
Date: Sun, 1 Nov 2009 21:11:03 +0530
Subject: [Biojava-l] Java Error:- XML Parsing Error: XML or text declaration
not at start of entity
In-Reply-To: <326ea8620911010739t1e658509h7dd33ba8482312f8@mail.gmail.com>
References: <326ea8620911010739t1e658509h7dd33ba8482312f8@mail.gmail.com>
Message-ID: <326ea8620911010741q3880a13g53626d94d0d2abd2@mail.gmail.com>
Hi friends,
I am getting the error shown below when doing a POST (using the code below) to this URL:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=cancer&reldate=10
I have written this code in a .jsp file. Later I will convert it into a servlet.
Error:-
XML Parsing Error: XML or text declaration not at start of entity
Location:
http://localhost:8080/ProteomDb/ImportFromPubmed2.jsp?txtDbName=pubmed&txtTerm=cancer&txtreldate=10&comSDay=01&comSMonth=01&txtSYear=&comEDay=01&comEMonth=01&txtEYear=&txtURLString=http%3A%2F%2Feutils.ncbi.nlm.nih.gov%2Fentrez%2Feutils%2Fesearch.fcgi%3Fdb%3Dpubmed%26term%3Dcancer%26reldate%3D10&txtsubmit=Fetch+Data+From+NCBI
Line Number 11, Column 1:2034200
19877350 19877304 19877297
19877284 19877271 19877265
19877250 19877245 19877226
19877210 19877179 19877175
19877161 19877159 19877158
19877123 19877122 19877120
19877119 19877118
cancer
"neoplasms"[MeSH Terms] OR "neoplasms"[All Fields] OR "cancer"[All
Fields]
"neoplasms"[MeSH Terms] MeSH
Terms 2082133 Y
"neoplasms"[All Fields] All
Fields 1634731 Y
OR "cancer"[All Fields]
All Fields 902537 Y
OR GROUP
2009/10/22[EDAT] EDAT 0
Y
2009/11/01[EDAT] EDAT 0
Y RANGE AND
("neoplasms"[MeSH Terms] OR
"neoplasms"[All Fields] OR "cancer"[All Fields]) AND 2009/10/22[EDAT] :
2009/11/01[EDAT]
^
As you can see, the XML output comes through fine, but the above error does
not go away. The output of this program should be the same as opening the
above URL manually in the browser.
The browser is Mozilla Firefox.
Code:-
<%@ page language = "java" %>
<%@ page import = "java.sql.*" %>
<%@ page import = "java.util.*" %>
<%@ page import = "java.io.*" %>
<%@ page import="java.lang.*" %>
<%@ page import="java.net.*" %>
<%@ page import="java.nio.*" %>
<%@ page contentType="text/xml; charset=utf-8" pageEncoding="UTF-8" %>
<%
try
{
//String str = "";
//out.println("");
Properties systemSettings = System.getProperties();
systemSettings.put("http.proxyHost", "********");
systemSettings.put("http.proxyPort", "******");
systemSettings.put("sun.net.client.defaultConnectTimeout", "10000");
systemSettings.put("sun.net.client.defaultReadTimeout", "10000");
//out.println("Properties Set");
Authenticator.setDefault(new Authenticator()
{
protected PasswordAuthentication getPasswordAuthentication()
{
return new PasswordAuthentication("**",
"******".toCharArray()); // specify ur user name password of iitb login
}
});
System.setProperties(systemSettings);
//out.println("After Authentication & Properties Settings");
//create xml file.
//the input to google api
//String textAreaContent = request.getParameter("text");
String textAreaContent = "This si a tst";
String str = "";
//xml file generation ends here..
//FetchDataFromNCBI_URLString.jsp
String URLString = request.getParameter("txtURLString").trim();
//URL url = new URL("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=BAA20519");
URL url = new URL(URLString); //url string taken from user input.
HttpURLConnection connection = null;
connection = (HttpURLConnection) url.openConnection();
System.out.println("After open connection");
connection.setRequestMethod("POST");
connection.setDoInput(true);
connection.setDoOutput(true);
connection.setUseCaches(false);
connection.setAllowUserInteraction(false);
//connection.setFollowRedirects(true);
//connection.setInstanceFollowRedirects(true);
//System.out.println("Before-------------------");
connection.setRequestProperty("Content-Type", "text/xml; charset=\"utf-8\"");
//System.out.println("After-------------------");
//System.out.println(""+ connection.getOutputStream());
//System.out.println("After dataoutputstream..Line No-65");
//System.out.println("Response Code="+ connection.getResponseCode);
OutputStreamWriter dosout = new
OutputStreamWriter(connection.getOutputStream());
//System.out.println("After dosout object..Line No-63");
//dosout.write(str);
dosout.close ();
BufferedReader in = new BufferedReader( new InputStreamReader(
connection.getInputStream()));
String decodedString;
String tempstr = "";
while ((decodedString = in.readLine()) != null)
{
tempstr = tempstr + decodedString;
//out.println(decodedString);
}
out.println(tempstr);
in.close();
}
catch(Exception ex)
{
out.println("Exception->"+ex);
PrintWriter pw = response.getWriter();
ex.printStackTrace(pw);
}
%>
Thanks in advance..
Regards,
JItesh Dundas
From andreas at sdsc.edu Sun Nov 1 11:06:29 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 1 Nov 2009 08:06:29 -0800
Subject: [Biojava-l] Java Error:- XML Parsing Error: XML or text
declaration not at start of entity
In-Reply-To: <326ea8620911010741q3880a13g53626d94d0d2abd2@mail.gmail.com>
References: <326ea8620911010739t1e658509h7dd33ba8482312f8@mail.gmail.com>
<326ea8620911010741q3880a13g53626d94d0d2abd2@mail.gmail.com>
Message-ID: <59a41c430911010806u62f45b90ic4a9563a27ee12e2@mail.gmail.com>
Hi Jitesh,
It is hard to read your code with all the formatting off, probably due to
email, and with many commented-out lines that don't seem to get used. Can you
provide the stack trace, so we can see what part of BioJava is affected?
A good strategy for writing and debugging this is probably to simplify the
problem into smaller steps. First download the files you want to parse and
write the code to parse them from a local file. That will avoid any
issues you might encounter with networking and server/client communication.
Once the parsing is working, you can take the next step and add the
server communication.
Andreas
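
The "parse a local copy first" step suggested above can be sketched with the
standard JAXP parser that ships with Java. This is only a minimal sketch: the
class name, the method names and the shortened SAMPLE document are
hypothetical and are not taken from the original JSP or from BioJava. The
second helper also shows that a standard parser rejects an XML declaration
that is preceded by even a single newline, which is the kind of situation the
Firefox message ("XML or text declaration not at start of entity", reported
at line 11) describes. The thread itself does not confirm that this is the
cause.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class ESearchLocalParse {

    // Hypothetical, shortened eSearch-style document (not real NCBI output).
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>"
        + "<eSearchResult><Count>2</Count><IdList>"
        + "<Id>19877350</Id><Id>19877304</Id>"
        + "</IdList></eSearchResult>";

    /** Parses the XML text and returns the text of every Id element. */
    static List<String> parseIds(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList ids = doc.getElementsByTagName("Id");
            List<String> result = new ArrayList<String>();
            for (int i = 0; i < ids.getLength(); i++) {
                result.add(ids.item(i).getTextContent());
            }
            return result;
        } catch (SAXException e) {
            // The parser reports documents that are not well-formed this way.
            throw new IllegalArgumentException("not well-formed: " + e.getMessage(), e);
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if the parser accepts the text as well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            parseIds(xml);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseIds(SAMPLE));
        System.out.println(isWellFormed(SAMPLE));
        // Same document, but with one newline before the XML declaration.
        System.out.println(isWellFormed("\n" + SAMPLE));
    }
}
```

If the leading blank lines do come from the page directives at the top of the
JSP (an assumption, not something established in this thread), the JSP 2.1
page attribute trimDirectiveWhitespaces="true" is one way to suppress them.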
On Sun, Nov 1, 2009 at 7:41 AM, jitesh dundas wrote:
> Hi friends,
>
> I am getting this error on doing a post(using the code below) to this url->
>
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=cancer&reldate=10
>
> I have written this code in .jsp file. Later I will change it into servlet.
>
> Error:-
> XML Parsing Error: XML or text declaration not at start of entity
> Location:
>
> http://localhost:8080/ProteomDb/ImportFromPubmed2.jsp?txtDbName=pubmed&txtTerm=cancer&txtreldate=10&comSDay=01&comSMonth=01&txtSYear=&comEDay=01&comEMonth=01&txtEYear=&txtURLString=http%3A%2F%2Feutils.ncbi.nlm.nih.gov%2Fentrez%2Feutils%2Fesearch.fcgi%3Fdb%3Dpubmed%26term%3Dcancer%26reldate%3D10&txtsubmit=Fetch+Data+From+NCBI
> Line Number 11, Column 1: PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN" "
> http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd
> ">2034200
> 19877350 19877304 19877297
> 19877284 19877271 19877265
> 19877250 19877245 19877226
> 19877210 19877179 19877175
> 19877161 19877159 19877158
> 19877123 19877122 19877120
> 19877119 19877118
> cancer
> "neoplasms"[MeSH Terms] OR "neoplasms"[All Fields] OR "cancer"[All
> Fields]
> "neoplasms"[MeSH Terms] MeSH
> Terms 2082133 Y
> "neoplasms"[All Fields]
> All
> Fields 1634731 Y
> OR "cancer"[All Fields]
> All Fields 902537 Y
> OR GROUP
> 2009/10/22[EDAT] EDAT 0
> Y
> 2009/11/01[EDAT] EDAT 0
> Y RANGE AND
> ("neoplasms"[MeSH Terms] OR
> "neoplasms"[All Fields] OR "cancer"[All Fields]) AND 2009/10/22[EDAT] :
> 2009/11/01[EDAT]
> ^
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
From jbdundas at gmail.com Mon Nov 2 03:19:19 2009
From: jbdundas at gmail.com (jitesh dundas)
Date: Mon, 2 Nov 2009 13:49:19 +0530
Subject: [Biojava-l] Java Error:- XML Parsing Error: XML or text
declaration not at start of entity
In-Reply-To: <59a41c430911010806u62f45b90ic4a9563a27ee12e2@mail.gmail.com>
References: <326ea8620911010739t1e658509h7dd33ba8482312f8@mail.gmail.com>
<326ea8620911010741q3880a13g53626d94d0d2abd2@mail.gmail.com>
<59a41c430911010806u62f45b90ic4a9563a27ee12e2@mail.gmail.com>
Message-ID: <326ea8620911020019w2b6a8307o5befcc5a4395299a@mail.gmail.com>
Dear Dr. Andreas Prlic,
Thank you for the advice. I will do that.
Regards,
Jitesh Dundas
From pingou at pingoured.fr Mon Nov 2 09:03:15 2009
From: pingou at pingoured.fr (Pierre-Yves)
Date: Mon, 02 Nov 2009 15:03:15 +0100
Subject: [Biojava-l] NCBI xml parser
Message-ID: <1257170595.29918.8.camel@localhost.localdomain>
Dear list,
I am trying to find my way around parsing NCBI BLAST XML.
I am using a small library which performs the BLAST search online [1] and
returns a FileReader of the XML.
I can convert the FileReader to a string and print it, and it seems fine.
(I used the default input shown on [1].)
So I am now trying to parse it automatically. I looked at [2] and [3]
but I could not get them working. I then found this message from this
mailing list [4] and so tried BlastXMLParserFacade.
It gives me an "org.xml.sax.SAXException: illegal frame number
encountered. (0)".
So my question is: which method should I use?
Thanks in advance,
Best regards,
Pierre
[1] http://users.encs.concordia.ca/~f_kohant/ncbiblast/
[2] http://biojava.org/wiki/BioJava:CookBook:Blast:Echo
[3]
http://biojava.org/wiki/BioJava:Tutorial:Blast-like_Parsing_Cook_Book
[4] http://osdir.com/ml/java.bio.general/2005-06/msg00018.html
From jogoodma at indiana.edu Mon Nov 2 09:45:09 2009
From: jogoodma at indiana.edu (Josh Goodman)
Date: Mon, 02 Nov 2009 09:45:09 -0500
Subject: [Biojava-l] NCBI xml parser
In-Reply-To: <1257170595.29918.8.camel@localhost.localdomain>
References: <1257170595.29918.8.camel@localhost.localdomain>
Message-ID: <4AEEF075.4020700@indiana.edu>
It looks like the new BLAST+ binary is using a default frame of 0 instead of the old default of 1
for blastp. Hence the illegal frame number (0) error.
Josh
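
To check which frame values a given BLAST XML report actually contains before
handing it to a stricter parser, something along these lines may help. It is
a minimal sketch using only the standard JAXP DOM parser, not BioJava; the
class name, method name and the tiny SAMPLE document are hypothetical, while
Hsp_query-frame is the element name used in NCBI BLAST XML output.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class BlastFrameScan {

    // Hypothetical, heavily shortened BLAST-style report with two HSPs.
    static final String SAMPLE =
        "<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>"
        + "<Hit><Hit_hsps>"
        + "<Hsp><Hsp_query-frame>0</Hsp_query-frame></Hsp>"
        + "<Hsp><Hsp_query-frame>0</Hsp_query-frame></Hsp>"
        + "</Hit_hsps></Hit>"
        + "</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>";

    /** Returns the distinct Hsp_query-frame values found in the report. */
    static Set<Integer> queryFrames(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList frames = doc.getElementsByTagName("Hsp_query-frame");
            Set<Integer> result = new TreeSet<Integer>();
            for (int i = 0; i < frames.getLength(); i++) {
                result.add(Integer.parseInt(frames.item(i).getTextContent().trim()));
            }
            return result;
        } catch (SAXException e) {
            throw new IllegalArgumentException("not well-formed: " + e.getMessage(), e);
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A set containing 0 would match the situation described above.
        System.out.println(queryFrames(SAMPLE));
    }
}
```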
From pingou at pingoured.fr Mon Nov 2 11:17:16 2009
From: pingou at pingoured.fr (Pierre-Yves)
Date: Mon, 02 Nov 2009 17:17:16 +0100
Subject: [Biojava-l] NCBI xml parser
In-Reply-To: <4AEEF075.4020700@indiana.edu>
References: <1257170595.29918.8.camel@localhost.localdomain>
<4AEEF075.4020700@indiana.edu>
Message-ID: <1257178636.29918.11.camel@localhost.localdomain>
On Mon, 2009-11-02 at 09:45 -0500, Josh Goodman wrote:
> It looks like the new BLAST+ binary is using a default frame of 0 instead of the old default of 1
> for blastp. Hence the illegal frame number (0) error.
>
> Josh
Thanks for the hint.
I downloaded biojava-1.7-src.jar to check the sources and correct the
frame handling to accept 0 (I have already found the case to change).
However, without changing anything in the source, when I try to
reproduce the error, I get a new one:
"org.xml.sax.SAXParseException: The markup declarations contained or
pointed to by the document type declaration must be well-formed."
I understand the error; I am more surprised by the fact that the jar
and the sources of release 1.7 give different errors.
Did I miss something?
Thanks,
Best regards,
Pierre
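
The thread does not establish what caused this second error. One situation in
which a JAXP parser complains about the markup declarations of a DOCTYPE is
when the external DTD it points to cannot be read as a valid DTD; whether
that applies here is an assumption. Under that assumption, a minimal sketch
of parsing a document while skipping the external DTD looks like this. The
class name, method name, SAMPLE document and the example.invalid system
identifier are hypothetical; the EntityResolver hook is part of the standard
SAX API.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class SkipExternalDtd {

    // Hypothetical document whose DOCTYPE points at an unreachable DTD.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>"
        + "<!DOCTYPE BlastOutput SYSTEM \"http://example.invalid/placeholder.dtd\">"
        + "<BlastOutput><BlastOutput_program>blastp</BlastOutput_program></BlastOutput>";

    /** Parses the text without fetching the external DTD; returns the root element name. */
    static String rootElementName(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Answer every external entity request with an empty document,
            // so the parser never tries to download the DTD.
            builder.setEntityResolver(new EntityResolver() {
                public InputSource resolveEntity(String publicId, String systemId) {
                    return new InputSource(new StringReader(""));
                }
            });
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getNodeName();
        } catch (SAXException e) {
            throw new IllegalArgumentException("not well-formed: " + e.getMessage(), e);
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootElementName(SAMPLE));
    }
}
```

The trade-off is that entities and default attribute values declared in the
skipped DTD are no longer available to the parser.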
From holland at eaglegenomics.com Mon Nov 2 12:16:00 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 2 Nov 2009 17:16:00 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
Message-ID: <46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
The graphs returned by the Nexus parser are instances that implement
the org.jgrapht.UndirectedGraph interface. Undirected graphs have no
root.
cheers,
Richard
On 30 Oct 2009, at 21:14, Tiago Antão wrote:
> Hi,
>
> I have been trying to use biojava to parse some trees on nexus files
> and I have a small doubt:
> If there is a rooted tree, how can one know what is the root vertex in
> the weighted graph (JGraphT)?
> I understand that there is no root if the tree is unrooted, but in
> case it is rooted, how to determine the vertex?
>
> Many thanks,
> Tiago
>
> --
> "The hottest places in hell are reserved for those who, in times of
> moral crisis, maintain a neutrality." - Dante
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From andreas at sdsc.edu Mon Nov 2 14:29:04 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 2 Nov 2009 11:29:04 -0800
Subject: [Biojava-l] NCBI xml parser
In-Reply-To: <1257178636.29918.11.camel@localhost.localdomain>
References: <1257170595.29918.8.camel@localhost.localdomain>
<4AEEF075.4020700@indiana.edu>
<1257178636.29918.11.camel@localhost.localdomain>
Message-ID: <59a41c430911021129i7d320b03xd44e99e2bc1baf2c@mail.gmail.com>
>
>
> I understand the error, I am more surprised by the fact that the jar
> and the sources of the release 1.7 are given a different errors.
>
>
That's surprising... I built the src-jar and the other jars at the same time,
so the code should be identical. Are you sure you are doing exactly the
same thing in both cases?
Andreas
From tiagoantao at gmail.com Mon Nov 2 14:36:31 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Mon, 2 Nov 2009 19:36:31 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
Message-ID: <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
2009/11/2 Richard Holland :
> The graphs returned by the Nexus parser are instances that implement the
> org.jgrapht.UndirectedGraph interface. Undirected graphs have no root.
Yes, that is a property of the JGraphT graph. But it might not be the case
for the original Nexus file/tree. So, if the tree is rooted, how can
one know the root (without parsing the file again ourselves to
discover it)? I note two things:
a) The root is obviously not a taxon, but an intermediate node.
b) Even if the tree is unrooted, it might be interesting to know the
"root", for instance to draw the tree in the way that it was written
in the file.
Tiago
PS - I also added a bug related to the parser to Bugzilla, but that
is a different problem...
From pingou at pingoured.fr Mon Nov 2 14:50:25 2009
From: pingou at pingoured.fr (Pierre-Yves)
Date: Mon, 02 Nov 2009 20:50:25 +0100
Subject: [Biojava-l] NCBI xml parser
In-Reply-To: <59a41c430911021129i7d320b03xd44e99e2bc1baf2c@mail.gmail.com>
References: <1257170595.29918.8.camel@localhost.localdomain>
<4AEEF075.4020700@indiana.edu>
<1257178636.29918.11.camel@localhost.localdomain>
<59a41c430911021129i7d320b03xd44e99e2bc1baf2c@mail.gmail.com>
Message-ID: <4AEF3801.10304@pingoured.fr>
On 11/02/2009 08:29 PM, Andreas Prlic wrote:
>>
>> I understand the error, I am more surprised by the fact that the jar
>> and the sources of the release 1.7 are given a different errors.
>>
>>
> that's surprising... I built the src-jar and the other jars at the same time
> so the code should be identical... Are you sure you are doing exactly the
> same?
I can confirm this tomorrow, but AFAIR, before I left I tried the same
code using either the jar file or the project generated from the sources in
NetBeans, and it gave me two different errors.
Best regards,
Pierre
From holland at eaglegenomics.com Mon Nov 2 17:14:58 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 2 Nov 2009 22:14:58 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
<6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
Message-ID: <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
The current parser that converts the original Newick tree string into
a JGraphT does not take the root into account, and therefore it is not
recorded anywhere in the JGraphT object. Someone would have to change
the parser to be able to make it record the root node.
In the meantime, the JGraph library which is used for displaying
JGraphT graphs in a visual form does include root-finding methods, so
maybe you could investigate there to see if any of the existing
functions might help?
cheers,
Richard
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From tiagoantao at gmail.com Mon Nov 2 18:11:13 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Mon, 2 Nov 2009 23:11:13 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
<6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
<0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
Message-ID: <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
2009/11/2 Richard Holland :
> In the meantime, the JGraph library which is used for displaying JGraphT
> graphs in a visual form does include root-finding methods, so maybe you
> could investigate there to see if any of the existing functions might help?
Did that. None of them can help, as the graph is not directed (it would be
trivial with a directed graph, of course).
In its current form, the Nexus parser is of limited use for tree information:
1. For rooted trees it has a bug, as it doesn't say what the root is.
2. For unrooted trees, sometimes the "root" (what the user perceives
as root) is interesting information.
Tiago
From holland at eaglegenomics.com Tue Nov 3 04:56:21 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 3 Nov 2009 09:56:21 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
<6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
<0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
Message-ID: <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
On 2 Nov 2009, at 23:11, Tiago Antão wrote:
> 2009/11/2 Richard Holland :
>> In the meantime, the JGraph library which is used for displaying
>> JGraphT
>> graphs in a visual form does include root-finding methods, so maybe
>> you
>> could investigate there to see if any of the existing functions
>> might help?
>
> Did that. None can help as the graph is not directed (it would be
> trivial with a directed graph ,of course).
> In the current form, the nexus parser is of limited use for tree
> information:
> 1. For rooted trees it has a bug has it doesn't say what is the root
The Newick strings used in the Nexus format are themselves undirected
graphs. They don't specify which node is the root, which means it must
be determined by computation after parsing the string. I'm unsure of
the algorithm to use to do this. If there are people on this list who
know the algorithm and have time to code it up, volunteers would be
welcome.
> 2. For unrooted trees, sometimes the "root" (what the user perceives
> as root) is interesting information.
What the user perceives as root in an unrooted tree could be different
for every user, so it would be hard to provide a standard function to
read their mind! However if everyone can come up with a commonly
agreed way of determining the most likely root computationally, it
would be interesting to add this as a feature, with the caveat that it
is only a best-effort approximation as the original tree is unrooted.
cheers,
Richard
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
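
As a starting point for the "root as it was written in the file" idea
discussed above, the outermost parenthesised group of a Newick string can be
split into its direct subtrees without building a graph at all. This is only
a sketch and not the BioJava Nexus parser: the class and method names are
hypothetical, and quoted labels and bracketed comments inside the string are
not handled. Whether the top-level node should be treated as a biological
root is a separate question that this code does not answer.

```java
import java.util.ArrayList;
import java.util.List;

final class NewickTopLevel {

    /** Returns the subtrees that sit directly inside the outermost parentheses. */
    static List<String> topLevelChildren(String newick) {
        String s = newick.trim();
        if (s.endsWith(";")) {
            s = s.substring(0, s.length() - 1).trim();
        }
        int close = s.lastIndexOf(')');
        if (!s.startsWith("(") || close < 0) {
            throw new IllegalArgumentException("no outer parentheses: " + newick);
        }
        // Text between the first '(' and the last ')'.
        String inner = s.substring(1, close);
        List<String> children = new ArrayList<String>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < inner.length(); i++) {
            char c = inner.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ',' && depth == 0) {
                // A comma at depth 0 separates two direct children.
                children.add(inner.substring(start, i).trim());
                start = i + 1;
            }
        }
        children.add(inner.substring(start).trim());
        return children;
    }

    public static void main(String[] args) {
        System.out.println(topLevelChildren("((A:0.1,B:0.2):0.3,(C,D));"));
        System.out.println(topLevelChildren("(A,B,(C,D));"));
    }
}
```

The number of direct children is sometimes used as a hint (two for a tree
written as rooted, three or more for one written as unrooted), but that is a
convention rather than something the format guarantees.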
From pingou at pingoured.fr Tue Nov 3 09:45:08 2009
From: pingou at pingoured.fr (Pierre-Yves)
Date: Tue, 03 Nov 2009 15:45:08 +0100
Subject: [Biojava-l] NCBI xml parser
In-Reply-To: <4AEF3801.10304@pingoured.fr>
References: <1257170595.29918.8.camel@localhost.localdomain>
<4AEEF075.4020700@indiana.edu>
<1257178636.29918.11.camel@localhost.localdomain>
<59a41c430911021129i7d320b03xd44e99e2bc1baf2c@mail.gmail.com>
<4AEF3801.10304@pingoured.fr>
Message-ID: <1257259508.26094.2.camel@localhost.localdomain>
On Mon, 2009-11-02 at 20:50 +0100, Pierre-Yves wrote:
> On 11/02/2009 08:29 PM, Andreas Prlic wrote:
> >>
> >> I understand the error, I am more surprised by the fact that the jar
> >> and the sources of the release 1.7 are given a different errors.
> >>
> >>
> > that's surprising... I built the src-jar and the other jars at the same time
> > so the code should be identical... Are you sure you are doing exactly the
> > same?
>
> I can confirm you this tomorrow but AFAIR before I left I tried the same
> code using or the jar file or the project generated from the sources in
> NetBeans and it gaves me two differents errors.
Ok so just for the record:
- If I use the .jar file I get an error (1)
- If I create a project in NetBeans using the source from BioJava I get
a different error (2)
- If I add as dependencies the sources from BioJava I get the first
error (1)
I thus went for the third solution and found my way around :-)
Thanks for the help.
Best regards,
Pierre
From andreas.prlic at gmail.com Tue Nov 3 09:56:06 2009
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Tue, 3 Nov 2009 06:56:06 -0800
Subject: [Biojava-l] NCBI xml parser
In-Reply-To: <1257259508.26094.2.camel@localhost.localdomain>
References: <1257170595.29918.8.camel@localhost.localdomain>
<4AEEF075.4020700@indiana.edu>
<1257178636.29918.11.camel@localhost.localdomain>
<59a41c430911021129i7d320b03xd44e99e2bc1baf2c@mail.gmail.com>
<4AEF3801.10304@pingoured.fr>
<1257259508.26094.2.camel@localhost.localdomain>
Message-ID: <14C95451-D221-4CED-BE79-FB2EB805D264@gmail.com>
So what you are saying is that you had a classpath problem and by
configuring dependencies correctly the problem went away?
Andreas
On 3 Nov 2009, at 06:45, Pierre-Yves wrote:
> On Mon, 2009-11-02 at 20:50 +0100, Pierre-Yves wrote:
>> On 11/02/2009 08:29 PM, Andreas Prlic wrote:
>>>>
>>>> I understand the error, I am more surprised by the fact that the
>>>> jar
>>>> and the sources of the release 1.7 are given a different errors.
>>>>
>>>>
>>> that's surprising... I built the src-jar and the other jars at the
>>> same time
>>> so the code should be identical... Are you sure you are doing
>>> exactly the
>>> same?
>>
>> I can confirm you this tomorrow but AFAIR before I left I tried the
>> same
>> code using or the jar file or the project generated from the
>> sources in
>> NetBeans and it gaves me two differents errors.
>
> Ok so just for the record:
> - If I use the .jar file I get an error (1)
> - If I create a project in NetBeans using the source from BioJava I
> get
> a different error (2)
> - If I add as dependencies the sources from BioJava I get the first
> error (1)
>
> I thus went for the third solution and found my way around :-)
>
> Thanks for the help.
>
> Best regards,
>
> Pierre
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
From pingou at pingoured.fr Tue Nov 3 10:00:32 2009
From: pingou at pingoured.fr (Pierre-Yves)
Date: Tue, 03 Nov 2009 16:00:32 +0100
Subject: [Biojava-l] NCBI xml parser
In-Reply-To: <14C95451-D221-4CED-BE79-FB2EB805D264@gmail.com>
References: <1257170595.29918.8.camel@localhost.localdomain>
<4AEEF075.4020700@indiana.edu>
<1257178636.29918.11.camel@localhost.localdomain>
<59a41c430911021129i7d320b03xd44e99e2bc1baf2c@mail.gmail.com>
<4AEF3801.10304@pingoured.fr>
<1257259508.26094.2.camel@localhost.localdomain>
<14C95451-D221-4CED-BE79-FB2EB805D264@gmail.com>
Message-ID: <1257260432.26094.3.camel@localhost.localdomain>
On Tue, 2009-11-03 at 06:56 -0800, Andreas Prlic wrote:
> So what you are saying is that you had a classpath problem and by
> configuring dependencies correctly the problem went away?
In both cases it compiled; only the error at run time was different.
Regards,
Pierre
From andreas.prlic at gmail.com Tue Nov 3 10:05:17 2009
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Tue, 3 Nov 2009 07:05:17 -0800
Subject: [Biojava-l] NCBI xml parser
In-Reply-To: <1257260432.26094.3.camel@localhost.localdomain>
References: <1257170595.29918.8.camel@localhost.localdomain>
<4AEEF075.4020700@indiana.edu>
<1257178636.29918.11.camel@localhost.localdomain>
<59a41c430911021129i7d320b03xd44e99e2bc1baf2c@mail.gmail.com>
<4AEF3801.10304@pingoured.fr>
<1257259508.26094.2.camel@localhost.localdomain>
<14C95451-D221-4CED-BE79-FB2EB805D264@gmail.com>
<1257260432.26094.3.camel@localhost.localdomain>
Message-ID: <447A40F9-52A1-4B22-8D10-27D22F8381B9@gmail.com>
Can you send me the code snippet off list so I can take a look? Thanks,
A
On 3 Nov 2009, at 07:00, Pierre-Yves wrote:
> On Tue, 2009-11-03 at 06:56 -0800, Andreas Prlic wrote:
>> So what you are saying is that you had a classpath problem and by
>> configuring dependencies correctly the problem went away?
>
> In both case it was compiling, only the error at run time was
> different.
>
> Regards,
>
> Pierre
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
From hlapp at gmx.net Tue Nov 3 11:53:23 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 3 Nov 2009 11:53:23 -0500
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
<6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
<0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
Message-ID:
The most common ways to root a tree are mid-point rooting and using
an outgroup. The latter, I suppose, is equivalent to the user specifying
a node as the root.
-hilmar
On Nov 3, 2009, at 4:56 AM, Richard Holland wrote:
>
> On 2 Nov 2009, at 23:11, Tiago Antão wrote:
>
>> 2009/11/2 Richard Holland :
>>> In the meantime, the JGraph library which is used for displaying
>>> JGraphT
>>> graphs in a visual form does include root-finding methods, so
>>> maybe you
>>> could investigate there to see if any of the existing functions
>>> might help?
>>
>> Did that. None can help as the graph is not directed (it would be
>> trivial with a directed graph ,of course).
>> In the current form, the nexus parser is of limited use for tree
>> information:
>> 1. For rooted trees it has a bug, as it doesn't say what the root is
>
> The Newick strings used in the Nexus format are themselves
> undirected graphs. They don't specify which node is the root, which
> means it must be determined by computation after parsing the string.
> I'm unsure of the algorithm to use to do this. If there are people
> on this list who know the algorithm and have time to code it up,
> volunteers would be welcome.
>
>> 2. For unrooted trees, sometimes the "root" (what the user perceives
>> as root) is interesting information.
>
> What the user perceives as root in an unrooted tree could be
> different for every user, so it would be hard to provide a standard
> function to read their mind! However if everyone can come up with a
> commonly agreed way of determining the most likely root
> computationally, it would be interesting to add this as a feature,
> with the caveat that it is only a best-effort approximation as the
> original tree is unrooted.
>
> cheers,
> Richard
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================
From thasso.griebel at uni-jena.de Tue Nov 3 12:58:14 2009
From: thasso.griebel at uni-jena.de (Thasso Griebel)
Date: Tue, 3 Nov 2009 18:58:14 +0100
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
<6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
<0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
Message-ID: <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
Hi,
> On 2 Nov 2009, at 23:11, Tiago Antão wrote:
>
>> 2009/11/2 Richard Holland :
>>> In the meantime, the JGraph library which is used for displaying
>>> JGraphT
>>> graphs in a visual form does include root-finding methods, so
>>> maybe you
>>> could investigate there to see if any of the existing functions
>>> might help?
>>
>> Did that. None can help as the graph is not directed (it would be
>> trivial with a directed graph ,of course).
>> In the current form, the nexus parser is of limited use for tree
>> information:
>> 1. For rooted trees it has a bug, as it doesn't say what the root is
>
> The Newick strings used in the Nexus format are themselves
> undirected graphs. They don't specify which node is the root, which
> means it must be determined by computation after parsing the string.
> I'm unsure of the algorithm to use to do this. If there are people
> on this list who know the algorithm and have time to code it up,
> volunteers would be welcome.
There is a way to uniquely get a root from a newick string. Usually a
rooted newick is surrounded with brackets, which indicates the root as
the highest node in the tree. For example:
(A, (B,C))
describes a tree rooted between "A" and the clade (B,C), and with the
surrounding brackets this is unique.
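
To make the point about the outermost brackets concrete, here is a minimal sketch of a recursive-descent Newick reader. It is not BioJava code: the class NewickRootSketch and its methods are hypothetical names made up for illustration, and it assumes a simplified Newick dialect with names only (no branch lengths, quoted labels, or comments). The node returned for the outermost pair of brackets is the root of the representation.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal Newick reader (hypothetical): names only, no branch lengths, quotes, or comments. */
final class NewickRootSketch {

    /** A tree node; leaves have a name and no children. */
    static final class Node {
        String name = "";
        final List<Node> children = new ArrayList<>();
    }

    private final String text;
    private int pos = 0;

    private NewickRootSketch(String text) {
        // Drop whitespace and the optional trailing semicolon.
        this.text = text.replaceAll("\\s+", "").replaceAll(";$", "");
    }

    /** Parses the string and returns the node for the outermost brackets. */
    static Node parse(String newick) {
        NewickRootSketch p = new NewickRootSketch(newick);
        Node root = p.readNode();
        if (p.pos != p.text.length()) {
            throw new IllegalArgumentException("Unexpected text at position " + p.pos);
        }
        return root;
    }

    private Node readNode() {
        Node node = new Node();
        if (pos < text.length() && text.charAt(pos) == '(') {
            pos++; // consume '('
            node.children.add(readNode());
            while (pos < text.length() && text.charAt(pos) == ',') {
                pos++; // consume ','
                node.children.add(readNode());
            }
            if (pos >= text.length() || text.charAt(pos) != ')') {
                throw new IllegalArgumentException("Missing ')' at position " + pos);
            }
            pos++; // consume ')'
        }
        // Optional label after a leaf or after a closing bracket.
        int start = pos;
        while (pos < text.length() && "(),".indexOf(text.charAt(pos)) < 0) {
            pos++;
        }
        node.name = text.substring(start, pos);
        return node;
    }

    /** Renders a node back to Newick, which makes the nesting visible. */
    static String describe(Node n) {
        if (n.children.isEmpty()) {
            return n.name;
        }
        List<String> parts = new ArrayList<>();
        for (Node c : n.children) {
            parts.add(describe(c));
        }
        return "(" + String.join(",", parts) + ")" + n.name;
    }

    public static void main(String[] args) {
        Node root = parse("(A, (B,C));");
        // The root is the node for the outermost brackets; its children are A and the (B,C) clade.
        for (Node child : root.children) {
            System.out.println(describe(child));
        }
    }
}
```

Because the parser already holds the outermost node when it finishes, returning it costs nothing extra, which is why reporting the root at parse time seems natural.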
In nexus the situation might be a bit different. nexus allows you to
prefix the newick string with [&R] or [&U] to indicate rooted/unrooted
trees. For example:
tree treename = [&R] ((A,(B,C)),(D,E));
is a valid rooted nexus tree where the root is placed between the
clades [A,B,C] and [D,E], although in this example the newick is
surrounded with brackets and rooted uniquely by itself.
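
As a rough illustration of reading the [&R] / [&U] flag, the sketch below splits a single nexus tree statement into its name, rooting flag, and newick string. The class and field names are hypothetical, not part of any existing parser, and the regular expression assumes the whole statement sits on one line with at most one such flag directly after the equals sign.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits "tree name = [&R] newick;" into its parts. */
final class NexusTreeStatementSketch {

    enum Rooting { ROOTED, UNROOTED, UNSPECIFIED }

    /** Plain value holder for the parsed pieces. */
    static final class TreeStatement {
        final String name;
        final Rooting rooting;
        final String newick;

        TreeStatement(String name, Rooting rooting, String newick) {
            this.name = name;
            this.rooting = rooting;
            this.newick = newick;
        }
    }

    // Groups: 1 = tree name, 2 = optional R/U flag, 3 = newick text without the final ';'.
    private static final Pattern TREE = Pattern.compile(
            "^\\s*tree\\s+(\\S+)\\s*=\\s*(?:\\[&([RU])\\])?\\s*(.*?)\\s*;?\\s*$",
            Pattern.CASE_INSENSITIVE);

    static TreeStatement parse(String line) {
        Matcher m = TREE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a tree statement: " + line);
        }
        String flag = m.group(2);
        Rooting rooting;
        if (flag == null) {
            rooting = Rooting.UNSPECIFIED;   // no [&R] or [&U] given
        } else if (flag.equalsIgnoreCase("R")) {
            rooting = Rooting.ROOTED;
        } else {
            rooting = Rooting.UNROOTED;
        }
        return new TreeStatement(m.group(1), rooting, m.group(3));
    }

    public static void main(String[] args) {
        TreeStatement t = parse("tree treename = [&R] ((A,(B,C)),(D,E));");
        System.out.println(t.name + " " + t.rooting + " " + t.newick);
    }
}
```

Keeping UNSPECIFIED as a separate value leaves the decision about files that carry no flag to the caller instead of guessing.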
>> 2. For unrooted trees, sometimes the "root" (what the user perceives
>> as root) is interesting information.
>
> What the user perceives as root in an unrooted tree could be
> different for every user, so it would be hard to provide a standard
> function to read their mind! However if everyone can come up with a
> commonly agreed way of determining the most likely root
> computationally, it would be interesting to add this as a feature,
> with the caveat that it is only a best-effort approximation as the
> original tree is unrooted.
BioNJ implements multiple methods to determine a root in a
neighbor-joining tree. I can look it up, but I think the most common
ways to compute the root are these. One is to place the root in the
"middle" such that the tree is balanced, with an equal number of
leaves on both sides. The other method I remember is based on the edge
weights: you find the longest path between two leaves and place the
root in the middle of that path (based on the path length).
I think the most common way though is to specify an outgroup node and
place the root on the path between that outgroup and its successor. I
am not sure if the outgroup can be described in nexus somehow.
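
The longest-path idea can be sketched as follows. This is only an illustration under stated assumptions, not BioNJ's or BioJava's implementation: the class MidpointRootSketch is a hypothetical name, the tree is held as a plain adjacency map with positive branch lengths, it must contain at least two nodes, and ties between equally long paths are broken arbitrarily.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of mid-point rooting on an undirected weighted tree. */
final class MidpointRootSketch {

    /** The midpoint lies on the edge (from, to), at distance 'offset' from 'from'. */
    static final class Midpoint {
        final String from;
        final String to;
        final double offset;

        Midpoint(String from, String to, double offset) {
            this.from = from;
            this.to = to;
            this.offset = offset;
        }
    }

    /** Undirected weighted tree: node -> (neighbour -> branch length). */
    private final Map<String, Map<String, Double>> adj = new LinkedHashMap<>();

    void addEdge(String a, String b, double length) {
        adj.computeIfAbsent(a, k -> new LinkedHashMap<>()).put(b, length);
        adj.computeIfAbsent(b, k -> new LinkedHashMap<>()).put(a, length);
    }

    /** Path lengths from start to every node; 'parent' records how each node was reached. */
    private Map<String, Double> distancesFrom(String start, Map<String, String> parent) {
        Map<String, Double> dist = new HashMap<>();
        Deque<String> stack = new ArrayDeque<>();
        dist.put(start, 0.0);
        stack.push(start);
        while (!stack.isEmpty()) {
            String cur = stack.pop();
            for (Map.Entry<String, Double> e : adj.get(cur).entrySet()) {
                if (!dist.containsKey(e.getKey())) {
                    dist.put(e.getKey(), dist.get(cur) + e.getValue());
                    parent.put(e.getKey(), cur);
                    stack.push(e.getKey());
                }
            }
        }
        return dist;
    }

    private static String farthest(Map<String, Double> dist) {
        String best = null;
        for (Map.Entry<String, Double> e : dist.entrySet()) {
            if (best == null || e.getValue() > dist.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }

    /** Finds the longest leaf-to-leaf path with two sweeps, then locates its middle. */
    Midpoint midpoint() {
        String any = adj.keySet().iterator().next();
        // Sweep 1: the node farthest from an arbitrary start is one end of the longest path.
        String u = farthest(distancesFrom(any, new HashMap<>()));
        // Sweep 2: the node farthest from u is the other end.
        Map<String, String> parent = new HashMap<>();
        Map<String, Double> distU = distancesFrom(u, parent);
        String v = farthest(distU);
        double half = distU.get(v) / 2.0;
        // Walk from v back towards u until the parent is at or before the halfway point.
        String cur = v;
        while (distU.get(parent.get(cur)) > half) {
            cur = parent.get(cur);
        }
        String p = parent.get(cur);
        return new Midpoint(p, cur, half - distU.get(p));
    }
}
```

A new root node would then be inserted on the reported edge at the reported offset; that insertion step is left out of the sketch.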
I would also suggest to generally parse trees as rooted trees (maybe
just for the initial internal model). Creating an unrooted tree from a
rooted one is easy: remove the root and forget about directions. The
other way might be hard and ambiguous.
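
A small sketch of the "easy" direction, assuming the rooted tree is available as a parent-to-children map. The class UnrootSketch and its edge naming are hypothetical choices for illustration. A root with exactly two children is dropped and its two children are joined directly; any other root is kept as an ordinary node.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: turn a rooted tree (parent -> children) into undirected edges. */
final class UnrootSketch {

    /** Returns undirected edges written as "a-b" with the endpoints in sorted order. */
    static Set<String> unroot(String root, Map<String, List<String>> children) {
        Set<String> edges = new TreeSet<>();
        List<String> rootKids = children.getOrDefault(root, Collections.<String>emptyList());
        // A root with two children carries no topology of its own once directions are dropped.
        boolean dropRoot = rootKids.size() == 2;
        for (Map.Entry<String, List<String>> e : children.entrySet()) {
            if (dropRoot && e.getKey().equals(root)) {
                continue; // skip the edges that touch the removed root
            }
            for (String child : e.getValue()) {
                edges.add(edge(e.getKey(), child));
            }
        }
        if (dropRoot) {
            // Join the two former children of the root directly.
            edges.add(edge(rootKids.get(0), rootKids.get(1)));
        }
        return edges;
    }

    private static String edge(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "-" + b : b + "-" + a;
    }
}
```

Nothing comparable exists for the reverse direction, since an unrooted tree does not say on which edge the root should go.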
cheers,
Thasso
--
Dipl. Inf. Thasso Griebel-------------------Lehrstuhl fuer Bioinformatik
Office 3426--http://bio.informatik.uni-jena.de--Institut fuer Informatik
Phone +49 (0)3641 9-46454-----------Friedrich-Schiller-Universitaet Jena
Fax +49 (0)3641 9-46452----------Ernst-Abbe-Platz 2, 07743 Jena, Germany
From tiagoantao at gmail.com Tue Nov 3 13:16:43 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 3 Nov 2009 18:16:43 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
<6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
<0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
<196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
Message-ID: <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
2009/11/3 Thasso Griebel :
> There is a way to uniquely get a root from a newick string. Usually a
> rooted newick is surrounded with brackets, which indicates the root as the
> highest node in the tree. For example:
>
> (A, (B,C))
>
Agree, it is quite easy to get the root of the tree from the newick
representation. But it should be done on parsing and returned in some
way by the parsing system. If the user has to do it again, it means
that the user has to parse it again just to know the root node.
> I would also suggest to generally parse trees as rooted trees (maybe just
> for the initial internal model). Creating an unrooted tree from a rooted one
> is easy, remove the root and forget about directions. The other way might be
> hard and ambiguous.
100% agree.
The newick _representation_ always has a root by virtue of the way it
is done. If that root has meaning or not depends. Doing as you suggest
seems the most reasonable idea.
I would add that even if it is an unrooted tree, the topology might be
of interest. In my case I am doing a comparative visualizer and it
might be nice for the user to be able to visualize the topology as
specified. It has no biological meaning, but in practice, for many
users, it helps.
I note that PhyloXML (even by virtue of being an XML format) always
represents the phylogenies as trees (not weighted DAGs). There is an
attribute, rooted, which can be true or false.
But, anyway. Even assuming a very conservative view on this, the
current parser, for rooted trees, does not allow one to determine where
the root is. I think that there would be a consensus that that is a bug?
Tiago
From holland at eaglegenomics.com Tue Nov 3 13:19:36 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 3 Nov 2009 18:19:36 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
<6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
<0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
<196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
<6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
Message-ID:
Agreed that there is a bug. Now all we need is someone to go in and
fix it! :)
cheers,
Richard
On 3 Nov 2009, at 18:16, Tiago Antão wrote:
> 2009/11/3 Thasso Griebel :
>> There is a way to uniquely get a root from a newick string.
>> Usually a
>> rooted newick is surrounded with brackets, which indicates the root
>> as the
>> highest node in the tree. For example:
>>
>> (A, (B,C))
>>
>
> Agree, it is quite easy to get the root of the tree from the newick
> representation. But it should be done on parsing and returned in some
> way by the parsing system. If the user has to do it again, it means
> that the user has to parse it again just to know the root node.
>
>> I would also suggest to generally parse trees as rooted trees
>> (maybe jsut
>> for th initial internal model). Creating an unrooted tree from a
>> rooted one
>> is easy, remove the root and forget about directions. The other way
>> might be
>> hard and ambiguous.
>
> 100% agree.
> The newick _representation_ always has a root by virtue of the way it
> is done. If that root has meaning or not depends. Doing as you suggest
> seems the most reasonable idea.
> I would add that even if it is an unrooted tree, the topology might be
> of interest. In my case I am doing a comparative visualizer and it
> might be nice for the user to be able to visualize the topology as
> specified. It has no biological meaning, but in practice, for many
> users, it helps.
> I note that PhyloXML (even by virtue of being a XML format) always
> represents the phylogenies as trees (not weigthed DAGs). There an
> attribute rooted which can be true or false.
>
> But, anyway. Even assuming a very conservative view on this, the
> current parser, for rooted trees, does not allow to determine where is
> the root. I think that there would be a consensus that that is a bug?
>
> Tiago
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From tiagoantao at gmail.com Tue Nov 3 13:24:52 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 3 Nov 2009 18:24:52 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To:
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
<6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
<0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
<196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
<6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
Message-ID: <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
If somebody would provide the desired changes to the parser interface
(wrt this bug and the other one reported previously), I might offer to
do the grunt work.
But somebody has to say which interface changes are desired.
As I remember, these are the problems that exist:
1. Lack of knowledge of root node
2. The p* stuff.
Tiago
2009/11/3 Richard Holland :
> Agreed that there is a bug. Now all we need is someone to go in and fix it!
> :)
>
> cheers,
> Richard
>
> On 3 Nov 2009, at 18:16, Tiago Antão wrote:
>
>> 2009/11/3 Thasso Griebel :
>>>
>>> There is a way to uniquely get a root from a newick string. Usually a
>>> rooted newick is surrounded with brackets, which indicates the root as
>>> the
>>> highest node in the tree. For example:
>>>
>>> (A, (B,C))
>>>
>>
>> Agree, it is quite easy to get the root of the tree from the newick
>> representation. But it should be done on parsing and returned in some
>> way by the parsing system. If the user has to do it again, it means
>> that the user has to parse it again just to know the root node.
>>
>>> I would also suggest to generally parse trees as rooted trees (maybe jsut
>>> for th initial internal model). Creating an unrooted tree from a rooted
>>> one
>>> is easy, remove the root and forget about directions. The other way might
>>> be
>>> hard and ambiguous.
>>
>> 100% agree.
>> The newick _representation_ always has a root by virtue of the way it
>> is done. If that root has meaning or not depends. Doing as you suggest
>> seems the most reasonable idea.
>> I would add that even if it is an unrooted tree, the topology might be
>> of interest. In my case I am doing a comparative visualizer and it
>> might be nice for the user to be able to visualize the topology as
>> specified. It has no biological meaning, but in practice, for many
>> users, it helps.
>> I note that PhyloXML (even by virtue of being a XML format) always
>> represents the phylogenies as trees (not weigthed DAGs). There an
>> attribute rooted which can be true or false.
>>
>> But, anyway. Even assuming a very conservative view on this, the
>> current parser, for rooted trees, does not allow to determine where is
>> the root. I think that there would be a consensus that that is a bug?
>>
>> Tiago
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>
--
"The hottest places in hell are reserved for those who, in times of
moral crisis, maintain a neutrality." - Dante
From holland at eaglegenomics.com Tue Nov 3 13:46:05 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 3 Nov 2009 18:46:05 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
<6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
<0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
<196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
<6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
<6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
Message-ID: <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
> 1. Lack of knowledge of root node
The Newick tree string is read as-is and is not parsed. It only gets
parsed at the point of conversion to an Undirected or WeightedGraph
inside the TreeBlocks.java source code (inside the two types of
get-As-JGraphT methods). It's at this point the string is parsed and
it's here that root node determination should take place. It's already
known whether &R or &U have been specified here, which should help the
code work out what to do.
> 2. The p* stuff.
Exactly the same part of the code as described above. Wherever it
pushes values to the stack but prepends them with 'p' first, you'll
need to change the 'p' to some instance variable and provide a getter/
setter to change it, with 'p' being the default setting.
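
The getter/setter idea might look roughly like the sketch below. It is not taken from TreeBlocks.java: the class InternalNodeNamerSketch, its method names, and the scheme of appending a running counter to the prefix are all assumptions made for illustration. Only the default prefix 'p' comes from the description above.

```java
import java.util.Objects;

/** Hypothetical sketch: a configurable prefix for generated internal node names. */
final class InternalNodeNamerSketch {

    private String prefix = "p"; // default setting, as described above
    private int counter = 0;     // assumed numbering scheme: prefix + running index

    String getPrefix() {
        return prefix;
    }

    void setPrefix(String prefix) {
        this.prefix = Objects.requireNonNull(prefix, "prefix must not be null");
    }

    /** Returns the next generated name, for example "p0" with the default prefix. */
    String nextName() {
        return prefix + (counter++);
    }

    public static void main(String[] args) {
        InternalNodeNamerSketch namer = new InternalNodeNamerSketch();
        System.out.println(namer.nextName());
        namer.setPrefix("node_");
        System.out.println(namer.nextName());
    }
}
```

Keeping 'p' as the default means existing callers that never touch the setter would see the same generated names as before.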
cheers,
Richard
>
> Tiago
> 2009/11/3 Richard Holland :
>> Agreed that there is a bug. Now all we need is someone to go in and
>> fix it!
>> :)
>>
>> cheers,
>> Richard
>>
>> On 3 Nov 2009, at 18:16, Tiago Antão wrote:
>>
>>> 2009/11/3 Thasso Griebel :
>>>>
>>>> There is a way to uniquely get a root from a newick string.
>>>> Usually a
>>>> rooted newick is surrounded with brackets, which indicates the
>>>> root as
>>>> the
>>>> highest node in the tree. For example:
>>>>
>>>> (A, (B,C))
>>>>
>>>
>>> Agree, it is quite easy to get the root of the tree from the newick
>>> representation. But it should be done on parsing and returned in
>>> some
>>> way by the parsing system. If the user has to do it again, it means
>>> that the user has to parse it again just to know the root node.
>>>
>>>> I would also suggest to generally parse trees as rooted trees
>>>> (maybe jsut
>>>> for th initial internal model). Creating an unrooted tree from a
>>>> rooted
>>>> one
>>>> is easy, remove the root and forget about directions. The other
>>>> way might
>>>> be
>>>> hard and ambiguous.
>>>
>>> 100% agree.
>>> The newick _representation_ always has a root by virtue of the way
>>> it
>>> is done. If that root has meaning or not depends. Doing as you
>>> suggest
>>> seems the most reasonable idea.
>>> I would add that even if it is an unrooted tree, the topology
>>> might be
>>> of interest. In my case I am doing a comparative visualizer and it
>>> might be nice for the user to be able to visualize the topology as
>>> specified. It has no biological meaning, but in practice, for many
>>> users, it helps.
>>> I note that PhyloXML (even by virtue of being a XML format) always
>>> represents the phylogenies as trees (not weigthed DAGs). There an
>>> attribute rooted which can be true or false.
>>>
>>> But, anyway. Even assuming a very conservative view on this, the
>>> current parser, for rooted trees, does not allow to determine
>>> where is
>>> the root. I think that there would be a consensus that that is a
>>> bug?
>>>
>>> Tiago
>>
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>
>>
>
>
>
> --
> "The hottest places in hell are reserved for those who, in times of
> moral crisis, maintain a neutrality." - Dante
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From tiagoantao at gmail.com Tue Nov 3 13:55:23 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 3 Nov 2009 18:55:23 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
<0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
<196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
<6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
<6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
<9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
Message-ID: <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
But the point is that the class interface changes for the outside user:
1. How does one report back the root to the user?
2. Regarding the prefix stuff, should the user be allowed to specify a
preferred prefix?
Both of these things imply interface changes visible to users.
If you still need volunteers to do the change, I can do it. But I need
to know what changes to the user interface are to be done.
For 1, maybe a method getRoot, returning a string with the name of the
root node?
For 2, maybe an extended version of the parse function with a prefix
as input parameter?
2009/11/3 Richard Holland :
>> 1. Lack of knowledge of root node
>
> The Newick tree string is read as-is and is not parsed. It only gets parsed
> at the point of conversion to a Undirected or WeightedGraph inside the
> TreeBlocks.java source code (inside the two types of get-As-JGraphT
> methods). It's at this point the string is parsed and it's here that root
> note determination should take place. It's already known whether &R or &U
> have been specified here, which should help the code work out what to do.
>
>> 2. The p* stuff.
>
> Exactly the same part of the code as described above. Wherever it pushes
> values to the stack but prepends them with 'p' first, you'll need to change
> the 'p' to some instance variable and provide a getter/setter to change it,
> with 'p' being the default setting.
>
> cheers,
> Richard
>
>>
>> Tiago
>> 2009/11/3 Richard Holland :
>>>
>>> Agreed that there is a bug. Now all we need is someone to go in and fix
>>> it!
>>> :)
>>>
>>> cheers,
>>> Richard
>>>
>>> On 3 Nov 2009, at 18:16, Tiago Antão wrote:
>>>
>>>> 2009/11/3 Thasso Griebel :
>>>>>
>>>>> There is a way to uniquely get a root from a newick string. Usually a
>>>>> rooted newick is surrounded with brackets, which indicates the root as
>>>>> the
>>>>> highest node in the tree. For example:
>>>>>
>>>>> (A, (B,C))
>>>>>
>>>>
>>>> Agree, it is quite easy to get the root of the tree from the newick
>>>> representation. But it should be done on parsing and returned in some
>>>> way by the parsing system. If the user has to do it again, it means
>>>> that the user has to parse it again just to know the root node.
>>>>
>>>>> I would also suggest to generally parse trees as rooted trees (maybe
>>>>> jsut
>>>>> for th initial internal model). Creating an unrooted tree from a rooted
>>>>> one
>>>>> is easy, remove the root and forget about directions. The other way
>>>>> might
>>>>> be
>>>>> hard and ambiguous.
>>>>
>>>> 100% agree.
>>>> The newick _representation_ always has a root by virtue of the way it
>>>> is done. If that root has meaning or not depends. Doing as you suggest
>>>> seems the most reasonable idea.
>>>> I would add that even if it is an unrooted tree, the topology might be
>>>> of interest. In my case I am doing a comparative visualizer and it
>>>> might be nice for the user to be able to visualize the topology as
>>>> specified. It has no biological meaning, but in practice, for many
>>>> users, it helps.
>>>> I note that PhyloXML (even by virtue of being a XML format) always
>>>> represents the phylogenies as trees (not weigthed DAGs). There an
>>>> attribute rooted which can be true or false.
>>>>
>>>> But, anyway. Even assuming a very conservative view on this, the
>>>> current parser, for rooted trees, does not allow to determine where is
>>>> the root. I think that there would be a consensus that that is a bug?
>>>>
>>>> Tiago
>>>
>>> --
>>> Richard Holland, BSc MBCS
>>> Operations and Delivery Director, Eagle Genomics Ltd
>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>>> http://www.eaglegenomics.com/
>>>
>>>
>>
>>
>>
>> --
>> "The hottest places in hell are reserved for those who, in times of
>> moral crisis, maintain a neutrality." - Dante
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>
--
"The hottest places in hell are reserved for those who, in times of
moral crisis, maintain a neutrality." - Dante
From peter.midford at gmail.com Tue Nov 3 14:28:14 2009
From: peter.midford at gmail.com (Peter Midford)
Date: Tue, 3 Nov 2009 14:28:14 -0500
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
<0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
<196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
<6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
<6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
<9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
<6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
Message-ID: <2E8B7EE9-2617-4096-B7AC-52A398D7E69F@gmail.com>
Tiago,
If you return a directed graph, the root will be a node
with no incoming edges.
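
A minimal sketch of that check, assuming the directed graph is held as a plain map from each node to its successors (the class DirectedRootSketch is a hypothetical name, and no particular graph library is implied). The root is whatever node never appears as the target of an edge.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: the root of a directed tree is the node with no incoming edges. */
final class DirectedRootSketch {

    /** 'successors' maps each node with outgoing edges to the nodes it points at. */
    static String findRoot(Map<String, List<String>> successors) {
        // Start with every node that has outgoing edges; nodes that appear only as
        // targets have an incoming edge by definition, so they cannot be the root.
        Set<String> candidates = new LinkedHashSet<>(successors.keySet());
        for (List<String> targets : successors.values()) {
            candidates.removeAll(targets);
        }
        if (candidates.size() != 1) {
            throw new IllegalStateException(
                    "Expected exactly one node without incoming edges, found " + candidates.size());
        }
        return candidates.iterator().next();
    }
}
```

With this representation the user would not need a separate getRoot method, at the price of one pass over the edges.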
Peter
On Nov 3, 2009, at 13:55, Tiago Antão wrote:
> But the point is that the class interface changes to the outside user:
> 1. How does one report back the root to the user?
> 2. Regarding the prefix stuff, should the user be allowed to specify a
> preferred prefix?
>
> Both this things imply interface changes visible to users.
> If you still need volunteers to do the change, I can do it. But I need
> to know what changes to the user interface are to be done.
> For 1, maybe a method getRoot, returning a string with the name of the
> root node?
> For 2, maybe an extended version of the parse function with a suffix
> as input parameter?
>
> 2009/11/3 Richard Holland :
>>> 1. Lack of knowledge of root node
>>
>> The Newick tree string is read as-is and is not parsed. It only gets
>> parsed at the point of conversion to an Undirected or WeightedGraph
>> inside the TreeBlocks.java source code (inside the two types of
>> get-As-JGraphT methods). It's at this point the string is parsed and
>> it's here that root node determination should take place. It's already
>> known whether &R or &U have been specified here, which should help the
>> code work out what to do.
>>
>>> 2. The p* stuff.
>>
>> Exactly the same part of the code as described above. Wherever it
>> pushes values to the stack but prepends them with 'p' first, you'll
>> need to change the 'p' to some instance variable and provide a
>> getter/setter to change it, with 'p' being the default setting.
>>
>> cheers,
>> Richard
>>
>>>
>>> Tiago
>>> 2009/11/3 Richard Holland :
>>>>
>>>> Agreed that there is a bug. Now all we need is someone to go in and
>>>> fix it! :)
>>>>
>>>> cheers,
>>>> Richard
>>>>
>>>> On 3 Nov 2009, at 18:16, Tiago Antão wrote:
>>>>
>>>>> 2009/11/3 Thasso Griebel :
>>>>>>
>>>>>> There is a way to uniquely get a root from a Newick string.
>>>>>> Usually a rooted Newick string is surrounded with brackets, which
>>>>>> indicates the root as the highest node in the tree. For example:
>>>>>>
>>>>>> (A, (B,C))
>>>>>>
>>>>>
>>>>> Agree, it is quite easy to get the root of the tree from the Newick
>>>>> representation. But it should be done on parsing and returned in
>>>>> some way by the parsing system. If the user has to do it again, it
>>>>> means that the user has to parse it again just to know the root node.
>>>>>
>>>>>> I would also suggest to generally parse trees as rooted trees
>>>>>> (maybe just for the initial internal model). Creating an unrooted
>>>>>> tree from a rooted one is easy: remove the root and forget about
>>>>>> directions. The other way might be hard and ambiguous.
>>>>>
>>>>> 100% agree.
>>>>> The Newick _representation_ always has a root by virtue of the way
>>>>> it is written. Whether that root has meaning or not depends. Doing
>>>>> as you suggest seems the most reasonable idea.
>>>>> I would add that even if it is an unrooted tree, the topology might
>>>>> be of interest. In my case I am doing a comparative visualizer and it
>>>>> might be nice for the user to be able to visualize the topology as
>>>>> specified. It has no biological meaning, but in practice, for many
>>>>> users, it helps.
>>>>> I note that PhyloXML (even by virtue of being an XML format) always
>>>>> represents the phylogenies as trees (not weighted DAGs). There is an
>>>>> attribute rooted which can be true or false.
>>>>>
>>>>> But, anyway. Even assuming a very conservative view on this, the
>>>>> current parser, for rooted trees, does not allow the user to
>>>>> determine where the root is. I think that there would be a consensus
>>>>> that that is a bug?
>>>>>
>>>>> Tiago
>>>>
>>>> --
>>>> Richard Holland, BSc MBCS
>>>> Operations and Delivery Director, Eagle Genomics Ltd
>>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>>>> http://www.eaglegenomics.com/
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> "The hottest places in hell are reserved for those who, in times of
>>> moral crisis, maintain a neutrality." - Dante
>>
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>
>>
>
>
>
> --
> "The hottest places in hell are reserved for those who, in times of
> moral crisis, maintain a neutrality." - Dante
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
Peter E. Midford
Mesquite Developer
Peter.Midford at gmail.com
From holland at eaglegenomics.com Tue Nov 3 15:20:31 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 3 Nov 2009 20:20:31 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
<0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
<196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
<6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
<6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
<9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
<6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
Message-ID:
A getRoot() function sounds good. It would return the String label of
the root node, the same label that identifies the corresponding vertex
in the JGraphT model. An equivalent setRoot() would be nice.
The prefix for the parser is currently hardcoded as p. Two new methods,
setDefaultPrefix() and getDefaultPrefix(), should be provided. The
setter accepts a string and should check that it is valid, i.e. all
alphanumeric and with no spaces or other Newick-sensitive characters.
The parser should be changed to use the output from getDefaultPrefix()
instead of the hardcoded p. The default behaviour should stay the same
as at present unless the user explicitly says otherwise by calling the
setDefaultPrefix() method.
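A rough sketch of what I mean for the prefix accessors follows. The class name PrefixSettings is only a placeholder (in practice the field and methods would live wherever the parsing happens), and the "letters and digits only" rule is my reading of "valid", not an existing BioJava check.

```java
/** Hypothetical holder for the internal-node prefix; not existing BioJava code. */
final class PrefixSettings {

    // "p" matches the currently hardcoded behaviour, so nothing changes
    // unless the user calls setDefaultPrefix().
    private String defaultPrefix = "p";

    String getDefaultPrefix() {
        return defaultPrefix;
    }

    void setDefaultPrefix(String prefix) {
        // Reject null, empty strings, whitespace and Newick-sensitive
        // characters such as ( ) , : ; by allowing letters and digits only.
        if (prefix == null || !prefix.matches("[A-Za-z0-9]+")) {
            throw new IllegalArgumentException(
                    "Prefix must be non-empty and alphanumeric: " + prefix);
        }
        this.defaultPrefix = prefix;
    }
}
```

The parser would then build internal node names from getDefaultPrefix() plus a counter instead of the literal p.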
Personally I would also alter the methods that return JGraphTs so that
they return their Directed equivalents if possible. I believe that
these can still be unrooted - you'd have to check the JGraphT
documentation to make sure.
Richard.
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From thasso.griebel at uni-jena.de Wed Nov 4 06:57:45 2009
From: thasso.griebel at uni-jena.de (Thasso Griebel)
Date: Wed, 4 Nov 2009 12:57:45 +0100
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To:
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
<0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
<196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
<6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
<6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
<9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
<6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
Message-ID: <6303BD72-3F47-4DB1-8526-0402FEB502EE@uni-jena.de>
Hi,
> A getRoot() function sounds good. It would return the String label
> of the root node, the same as which identifies the corresponding
> vertex in the JGraphT model. An equivalent setRoot() would be nice.
Though you have to keep in mind that switching the root to another
node has implications for the tree structure, and this has to be
taken into account when the Newick string is parsed and the graph is
created. You have to parse the graph from Newick and then "reroot" the
tree, as the new root might not be the one specified in the Newick
string.
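To illustrate what rerooting changes, here is a small self-contained sketch. It is plain Java, not JGraphT code. The class name Rerooter is hypothetical, and it assumes the tree is given as a symmetric neighbour map in which every vertex appears as a key. Orienting the same undirected tree away from a different root flips the parent/child relation along the path between the old and the new root.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not part of BioJava or JGraphT. */
final class Rerooter {

    /**
     * Orients an undirected tree away from the chosen root and returns a
     * child -> parent map. The root itself has no entry in the result.
     */
    static Map<String, String> parentsFrom(Map<String, List<String>> neighbours,
                                           String newRoot) {
        if (!neighbours.containsKey(newRoot)) {
            throw new IllegalArgumentException("Unknown vertex: " + newRoot);
        }
        Map<String, String> parent = new LinkedHashMap<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(newRoot);
        queue.add(newRoot);
        // Breadth-first walk: the vertex we arrive from becomes the parent.
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (String next : neighbours.get(current)) {
                if (seen.add(next)) {
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        return parent;
    }
}
```

For (A, (B,C)) with internal nodes p1 (outer) and p2 (inner), rooting at p1 makes p1 the parent of p2, while rooting at p2 makes p2 the parent of p1. So a setRoot() cannot just store a label; the directed edges have to be rebuilt as well.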
> Personally I would also alter the methods that return JGraphTs so
> that they return their Directed equivalents if possible. I believe
> that these can still be unrooted - you'd have to check the JGraphT
> documentation to make sure.
You have to change that method signature if you want to use the same
method. The only relationship between JGraphT's UndirectedGraph and its
DirectedGraph counterpart is that they both extend the Graph
interface, but a DirectedGraph is not an UndirectedGraph. Switching to
DirectedGraph definitely breaks the current API! I don't know how you
usually handle such situations in BioJava, but this clearly breaks
compatibility. Maybe it would be better to introduce a new method that
returns directed graphs?
cheers,
-thasso
--
Dipl. Inf. Thasso Griebel-------------------Lehrstuhl fuer Bioinformatik
Office 3426--http://bio.informatik.uni-jena.de--Institut fuer Informatik
Phone +49 (0)3641 9-46454-----------Friedrich-Schiller-Universitaet Jena
Fax +49 (0)3641 9-46452----------Ernst-Abbe-Platz 2, 07743 Jena, Germany
From tiagoantao at gmail.com Wed Nov 4 07:40:46 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 4 Nov 2009 12:40:46 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To:
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
<196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
<6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
<6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
<9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
<6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
Message-ID: <6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com>
2009/11/3 Richard Holland :
> The prefix for the parser currently is hardcoded as p. Two new methods - set
> and getDefaultPrefix which accept a string should be provided (it should
> check that the string is valid, i.e. all alphanumeric and with no spaces or
> other Newick-sensitive characters). The parser should be changed to use the
> output from getDefaultPrefix() instead of the hardcoded p. The default
> behaviour should be such that it behaves the same as at present unless the
> user explicitly says otherwise by calling the setDefaultPrefix() method.
This default behavior would still raise an exception with nodes called
p*. I would suggest a minor change: if there is a clash, the parser
would try the next p* (or whatever the default prefix is).
Example to make it clear: if there is a leaf called p2, the internal
nodes generated would be p1, p3, p4, ...
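A minimal sketch of that "try the next one" rule, assuming the set of labels already used in the tree is available when names are generated. The class InternalNodeNamer is a hypothetical name for illustration, not existing BioJava code.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical name generator that skips labels already used in the tree. */
final class InternalNodeNamer {

    private final String prefix;
    private final Set<String> taken;
    private int counter = 0;

    InternalNodeNamer(String prefix, Set<String> labelsInTree) {
        this.prefix = prefix;
        this.taken = new HashSet<>(labelsInTree); // copy, so the caller's set is untouched
    }

    /** Returns the next prefix+number name that does not clash with a known label. */
    String next() {
        String candidate;
        do {
            counter++;
            candidate = prefix + counter;
            // Set.add returns false when the name is already taken,
            // in which case the loop moves on to the next number.
        } while (!taken.add(candidate));
        return candidate;
    }
}
```

With leaves A, B and p2 and the prefix p, three calls to next() give p1, p3 and p4, which is the behaviour described in the example.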
--
"The hottest places in hell are reserved for those who, in times of
moral crisis, maintain a neutrality." - Dante
From tiagoantao at gmail.com Wed Nov 4 07:44:21 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 4 Nov 2009 12:44:21 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6303BD72-3F47-4DB1-8526-0402FEB502EE@uni-jena.de>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
<196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
<6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
<6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
<9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
<6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
<6303BD72-3F47-4DB1-8526-0402FEB502EE@uni-jena.de>
Message-ID: <6d941f120911040444y33da2642oe7104708a2d2a6cb@mail.gmail.com>
2009/11/4 Thasso Griebel :
>> Personally I would also alter the methods that return JGraphTs so that
>> they return their Directed equivalents if possible. I believe that these can
>> still be unrooted - you'd have to check the JGraphT documentation to make
>> sure.
>
> You have to change that method signature if you want to use the same method.
> The only relationship between JGraphTs UndirectedGraph and the DirectedGraph
> counterpart is that they both extend the Graph interface, but a
> DirectedGraph is not an UndirectedGraph. Switching to DirectedGraph
> definitely breaks the current API ! I don't know how you usually handle such
> situations in BioJava, but this clearly breaks compatibility. Maybe it would
> be better to introduce a new method that returns directed graphs ?
I also don't know how BioJava sorts out these kinds of issues. But my
personal, outsider opinion would be in your direction, i.e.:
a. Do not break the current API
b. Add a new method that returns a directed graph
c. (extra) Add a new method boolean isRooted() to check whether the
tree is rooted or not...
Best
Tiago
From holland at eaglegenomics.com Wed Nov 4 07:46:01 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 4 Nov 2009 12:46:01 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6303BD72-3F47-4DB1-8526-0402FEB502EE@uni-jena.de>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
<0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
<196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
<6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
<6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
<9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
<6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
<6303BD72-3F47-4DB1-8526-0402FEB502EE@uni-jena.de>
Message-ID:
>
> You have to change that method signature if you want to use the same
> method. The only relationship between JGraphTs UndirectedGraph and
> the DirectedGraph counterpart is that they both extend the Graph
> interface, but a DirectedGraph is not an UndirectedGraph. Switching
> to DirectedGraph definitely breaks the current API ! I don't know
> how you usually handle such situations in BioJava, but this clearly
> breaks compatibility. Maybe it would be better to introduce a new
> method that returns directed graphs ?
Whether or not to break the API depends on a few things. First, how
old and well adopted is the code? Second, is the existing API
illogical or just plain wrong? Weighing the two tells you how
confidently the API can be changed.
In this instance, the code is fairly new, not widely adopted, and the
existing API is clearly wrong by forcing all JGraphT graphs to be
undirected.
To keep everyone happy, I would introduce a new method with a new name
that takes a boolean or enum option indicating what type of graph the
user wants (undirected, directed, whatever). I would then deprecate the
existing method and move its contents into the undirected part of the
new method, and replace the old method contents with a call to the new
method with the option set to undirected.
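As a sketch of that deprecation pattern (all names here are placeholders, and SketchGraph merely stands in for the JGraphT graph types so the example compiles on its own):

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical option type; the real name would be decided during the change. */
enum GraphType { UNDIRECTED, DIRECTED }

/** Minimal stand-in for a JGraphT graph, only here to keep the sketch self-contained. */
final class SketchGraph {
    final GraphType type;
    final List<String[]> edges = new ArrayList<>();

    SketchGraph(GraphType type) {
        this.type = type;
    }
}

final class TreeAccessSketch {

    /** New method: the caller chooses which kind of graph comes back. */
    SketchGraph getTreeAsGraph(String treeLabel, GraphType type) {
        SketchGraph graph = new SketchGraph(type);
        // The existing Newick-to-graph logic would move here. A single
        // fixed edge is added only so the sketch has something to return.
        graph.edges.add(new String[] {"p1", "A"});
        return graph;
    }

    /** Old method, kept so existing callers still compile; it only delegates. */
    @Deprecated
    SketchGraph getTreeAsUndirectedGraph(String treeLabel) {
        return getTreeAsGraph(treeLabel, GraphType.UNDIRECTED);
    }
}
```

Existing callers keep their undirected result and only see a deprecation warning, while new code can ask for GraphType.DIRECTED.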
cheers,
Richard
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From holland at eaglegenomics.com Wed Nov 4 07:46:34 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 4 Nov 2009 12:46:34 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
<196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
<6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
<6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
<9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
<6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
<6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com>
Message-ID: <6E4DA847-C5A6-47FA-ACA4-88C55DDD2CFC@eaglegenomics.com>
Sounds good.
On 4 Nov 2009, at 12:40, Tiago Antão wrote:
> 2009/11/3 Richard Holland :
>> The prefix for the parser currently is hardcoded as p. Two new
>> methods - set
>> and getDefaultPrefix which accept a string should be provided (it
>> should
>> check that the string is valid, i.e. all alphanumeric and with no
>> spaces or
>> other Newick-sensitive characters). The parser should be changed to
>> use the
>> output from getDefaultPrefix() instead of the hardcoded p. The
>> default
>> behaviour should be such that it behaves the same as at present
>> unless the
>> user explicitly says otherwise by calling the setDefaultPrefix()
>> method.
>
> This default behavior would still raise an exception with nodes called
> p* . I would suggest a minor change: If there is a clash, the parser
> would try the next p* (or whatever defaultPrefix) ...
>
> Example to make it clear: if there is a leaf called p2, internal nodes
> generated would be p1, p3, p4, ....
>
> --
> "The hottest places in hell are reserved for those who, in times of
> moral crisis, maintain a neutrality." - Dante
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From holland at eaglegenomics.com Wed Nov 4 07:51:37 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 4 Nov 2009 12:51:37 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6E4DA847-C5A6-47FA-ACA4-88C55DDD2CFC@eaglegenomics.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
<196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
<6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
<6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
<9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
<6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
<6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com>
<6E4DA847-C5A6-47FA-ACA4-88C55DDD2CFC@eaglegenomics.com>
Message-ID:
Ah... except for one problem! The parser does not know all names in the
string in advance, so if it auto-assigns one that is then used later
in the string, we have the same problem with name clashes as before.
The names the parser assigns cannot totally avoid all clashes unless
it has already parsed the string to find out what names were used in
the string itself already. So some kind of pre-parse would be necessary.
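A minimal sketch of the pre-parse idea, not the actual BioJava parser:
scan the Newick string once for every label it uses, then hand out
generated names that skip any label already taken. The names NodeNamer,
collectLabels and internalNames are made up for illustration, and the
scan assumes unquoted labels, no bracketed comments, and no whitespace
between ':' and a branch length.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, for illustration only. */
final class NodeNamer {
    // A token is any run of characters that is not Newick punctuation or whitespace.
    private static final Pattern TOKEN = Pattern.compile("[^(),:;\\s]+");

    /** Pre-parse: collect every label in the string, ignoring branch lengths. */
    public static Set<String> collectLabels(String newick) {
        Set<String> labels = new HashSet<>();
        Matcher m = TOKEN.matcher(newick);
        while (m.find()) {
            // A token directly after ':' is a branch length, not a name.
            boolean isBranchLength = m.start() > 0 && newick.charAt(m.start() - 1) == ':';
            if (!isBranchLength) {
                labels.add(m.group());
            }
        }
        return labels;
    }

    /** Generate count names of the form prefix + n, skipping names already in use. */
    public static List<String> internalNames(String prefix, Set<String> used, int count) {
        List<String> names = new ArrayList<>();
        int n = 1;
        while (names.size() < count) {
            String candidate = prefix + n++;
            if (!used.contains(candidate)) {
                names.add(candidate);
            }
        }
        return names;
    }
}
```

With a leaf called p2 in the string, asking for three internal names
with prefix p gives p1, p3, p4, which is the behaviour suggested
earlier in the thread.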
On 4 Nov 2009, at 12:46, Richard Holland wrote:
> Sounds good.
>
> On 4 Nov 2009, at 12:40, Tiago Antão wrote:
>
>> 2009/11/3 Richard Holland :
>>> The prefix for the parser currently is hardcoded as p. Two new
>>> methods - set
>>> and getDefaultPrefix which accept a string should be provided (it
>>> should
>>> check that the string is valid, i.e. all alphanumeric and with no
>>> spaces or
>>> other Newick-sensitive characters). The parser should be changed
>>> to use the
>>> output from getDefaultPrefix() instead of the hardcoded p. The
>>> default
>>> behaviour should be such that it behaves the same as at present
>>> unless the
>>> user explicitly says otherwise by calling the setDefaultPrefix()
>>> method.
>>
>> This default behavior would still raise an exception with nodes
>> called
>> p* . I would suggest a minor change: If there is a clash, the parser
>> would try the next p* (or whatever defaultPrefix) ...
>>
>> Example to make it clear: if there is a leaf called p2, internal
>> nodes
>> generated would be p1, p3, p4, ....
>>
>> --
>> "The hottest places in hell are reserved for those who, in times of
>> moral crisis, maintain a neutrality." - Dante
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From tiagoantao at gmail.com Wed Nov 4 12:18:52 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Wed, 4 Nov 2009 17:18:52 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To:
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
<6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
<9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
<6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
<6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com>
<6E4DA847-C5A6-47FA-ACA4-88C55DDD2CFC@eaglegenomics.com>
Message-ID: <6d941f120911040918v663f6d01s8d8d14d0bda94fc0@mail.gmail.com>
Unless someone with experience in BioJava development wants to take
this on, I volunteer to do it. I ended up using the PhyloXML
forester-atv parser (and moving to PhyloXML instead of Nexus), but
since I reported the problem, I might as well sort it out...
2009/11/4 Richard Holland :
> ah... except a problem! The parser does not know all names in the string in
> advance, so if it auto-assigns one that is then used later in the string, we
> have the same problem with name clashes as before.
>
> The names the parser assigns cannot totally avoid all clashes unless it has
> already parsed the string to find out what names were used in the string
> itself already. So some kind of pre-parse would be necessary.
--
"The hottest places in hell are reserved for those who, in times of
moral crisis, maintain a neutrality." - Dante
From andreas at sdsc.edu Wed Nov 4 12:26:06 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 4 Nov 2009 09:26:06 -0800
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120911040918v663f6d01s8d8d14d0bda94fc0@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
<9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
<6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
<6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com>
<6E4DA847-C5A6-47FA-ACA4-88C55DDD2CFC@eaglegenomics.com>
<6d941f120911040918v663f6d01s8d8d14d0bda94fc0@mail.gmail.com>
Message-ID: <59a41c430911040926h607b51e6ydd6d8145424d073b@mail.gmail.com>
excellent, thanks for taking this on!
Andreas
2009/11/4 Tiago Antão
> Unless anyone with experience in biojava development wants to take on
> this, I would volunteer to do this. I ended up using the PhyloXML
> forester-atv parser (and moving to phyloxml instead of nexus), but as
> I reported this, I might as well sort it out...
From tiagoantao at gmail.com Fri Nov 6 06:30:00 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Fri, 6 Nov 2009 11:30:00 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <59a41c430911040926h607b51e6ydd6d8145424d073b@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
<9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
<6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
<6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com>
<6E4DA847-C5A6-47FA-ACA4-88C55DDD2CFC@eaglegenomics.com>
<6d941f120911040918v663f6d01s8d8d14d0bda94fc0@mail.gmail.com>
<59a41c430911040926h607b51e6ydd6d8145424d073b@mail.gmail.com>
Message-ID: <6d941f120911060330t236fc033x105c8d05749fad36@mail.gmail.com>
I've made a few changes to TreesBlock, implementing a version of what
was discussed here:
1. I kept getTreeAsJGraphT and getTreeAsWeightedJGraphT as they are in
terms of interface.
2. There is now a new method, getTopNode, which states which node is at
the "top". I use the name getTopNode rather than getRootNode to avoid
misleading users: only rooted trees have a root, but in the Nexus type
of representation every tree has a "top" (which in rooted trees is the
root).
3. There are now setNodePrefix and getNodePrefix methods for changing
the prefix (which defaults to p, as before).
In my view these changes solve both problems: the issue with node
names and the need to know the root/top of a Nexus tree. It might not
be the best solution, but it gets things on the right track without
taking too much of my time. There are also no changes to the
signatures of existing methods.
Now, there is still a problem:
addTree(final String label, UndirectedGraph treegraph)
is highly dependent on the p* convention for internal nodes.
Here I would be tempted to change the method signature to:
addTree(final String label, UndirectedGraph treegraph, String topNode)
Interestingly, there is no addTree for weighted graphs (for distances).
If nobody sees a problem with this, I will change addTree.
I will then attach a patch to the currently open bug (along with test
cases), and it should be done.
2009/11/4 Andreas Prlic :
> excellent, thanks for taking this on!
> Andreas
--
"The hottest places in hell are reserved for those who, in times of
moral crisis, maintain a neutrality." - Dante
From holland at eaglegenomics.com Fri Nov 6 06:45:18 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 6 Nov 2009 11:45:18 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120911060330t236fc033x105c8d05749fad36@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
<9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
<6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
<6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com>
<6E4DA847-C5A6-47FA-ACA4-88C55DDD2CFC@eaglegenomics.com>
<6d941f120911040918v663f6d01s8d8d14d0bda94fc0@mail.gmail.com>
<59a41c430911040926h607b51e6ydd6d8145424d073b@mail.gmail.com>
<6d941f120911060330t236fc033x105c8d05749fad36@mail.gmail.com>
Message-ID:
Sounds great.
With regard to addTree, you could add a new method with the signature
that you propose, copy the existing method body into it and modify it
appropriately, then replace the existing method body with a call to
the new method, passing a default topNode value that corresponds to
the assumptions that the existing method currently makes.
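The delegation described above might look roughly like this. It is a
sketch under stated assumptions, not the TreesBlock code: TreeStore is
a made-up class, a plain adjacency map stands in for the JGraphT graph
type, and ASSUMED_TOP_NODE ("p0") is a placeholder for whatever top
node the existing method really assumes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a trees container, for illustration only. */
final class TreeStore {
    /** Placeholder default; the real value must match what the old addTree assumes. */
    static final String ASSUMED_TOP_NODE = "p0";

    private final Map<String, Map<String, Set<String>>> trees = new HashMap<>();
    private final Map<String, String> topNodes = new HashMap<>();

    /** Old signature, kept so that existing callers compile unchanged. */
    public void addTree(String label, Map<String, Set<String>> treegraph) {
        addTree(label, treegraph, ASSUMED_TOP_NODE);
    }

    /** New signature: the caller states the top node explicitly. */
    public void addTree(String label, Map<String, Set<String>> treegraph, String topNode) {
        if (!treegraph.containsKey(topNode)) {
            throw new IllegalArgumentException("Top node not in tree: " + topNode);
        }
        trees.put(label, treegraph);
        topNodes.put(label, topNode);
    }

    /** Returns the top node recorded for the given tree label, or null if unknown. */
    public String getTopNode(String label) {
        return topNodes.get(label);
    }
}
```

All of the real logic lives in the three-argument method, so the old
signature keeps working without a second copy of the code to maintain.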
cheers,
Richard
On 6 Nov 2009, at 11:30, Tiago Antão wrote:
> I've done a few changes to TreesBlock, namely implementing a version
> of what was talked here:
>
> 1. I maintained getTreeAsJGraphT and getTreeAsWeightedJGraphT as they
> are in terms of interface
> 2. There is now a new method getTopNode, stating which node is on the
> "top". I use the name getTopNode and not getRootNode to avoid
> misleading users: only rooted trees have a root, but for the nexus
> type of representation all have a "top" (which in rooted trees is the
> root)
> 3. There exist now setNodePrefix and getNodePrefix to be able to
> change the prefix (which defaults to p, as before)
>
> In my view these changes solve both problems: The issue with node
> names and the need to know the root/top of a nexus tree. It might not
> be the best solution, but it gets things on the right track without
> taking too much of my time. There are also no changes to the
> signatures of existing methods
>
> Now, there is still a problem:
> addTree(final String label, UndirectedGraph
> treegraph)
> Is highly dependent on the p* convention for internal nodes.
> Here I would be tempted to change the method signature to:
> addTree(final String label, UndirectedGraph
> treegraph, String topNode)
>
> Interestingly there is no addTree with weighted graphs (for
> distances).
>
> If nobody sees a problem with this, I will change addTree.
>
> I will then attach a patch to the currently open bug (along with test
> cases). And it should be done.
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From tiagoantao at gmail.com Fri Nov 6 08:26:58 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Fri, 6 Nov 2009 13:26:58 +0000
Subject: [Biojava-l] Newick/Nexus processing of non-binary trees
Message-ID: <6d941f120911060526r14d43c43ncb5541d89a8dcaa3@mail.gmail.com>
Hi,
Either I have been looking at the code for too long, or the current
implementation only supports binary trees (i.e., trees in which every
internal node has exactly 2 children).
I have tested with:
tree tree6 = (1,2,3);
and I get only 2 edges. The edge pointing to "1" gets lost.
Inspecting the old code, this seems to be how it is implemented.
If I am correct, this renders the whole tree parser somewhat useless
in its current form, as most phylogenetic trees are not strictly
binary.
The other two bugs are now corrected, but this one is much more
serious, methinks.
--
"The hottest places in hell are reserved for those who, in times of
moral crisis, maintain a neutrality." - Dante
From holland at eaglegenomics.com Fri Nov 6 09:10:54 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 6 Nov 2009 14:10:54 +0000
Subject: [Biojava-l] Newick/Nexus processing of non-binary trees
In-Reply-To: <6d941f120911060526r14d43c43ncb5541d89a8dcaa3@mail.gmail.com>
References: <6d941f120911060526r14d43c43ncb5541d89a8dcaa3@mail.gmail.com>
Message-ID: <120EB3C1-0043-4E8C-8637-F16B3B56094B@eaglegenomics.com>
If that's true, sounds like it's broke. Is the old code easily
modified to suit arbitrary numbers of children?
On 6 Nov 2009, at 13:26, Tiago Antão wrote:
> Hi,
>
> Either I am looking for too much time to the code or it seems to me
> that the current implementation only supports binary trees (ie, trees
> with 2 children).
>
> I have tested with:
> tree tree6 = (1,2,3);
>
> And I get only 2 edges. The edge pointing to "1" gets lost.
> Inspecting the old code, this seems to be how it is implemented.
>
> In the case I am correct, this renders the whole tree parser somewhat
> useless in its current form, as most phylo trees are not binary only.
>
> The other two bugs are now corrected, but this is much more serious,
> me thinks.
>
> --
> "The hottest places in hell are reserved for those who, in times of
> moral crisis, maintain a neutrality." - Dante
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From tiagoantao at gmail.com Fri Nov 6 09:40:00 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Fri, 6 Nov 2009 14:40:00 +0000
Subject: [Biojava-l] Newick/Nexus processing of non-binary trees
In-Reply-To: <120EB3C1-0043-4E8C-8637-F16B3B56094B@eaglegenomics.com>
References: <6d941f120911060526r14d43c43ncb5541d89a8dcaa3@mail.gmail.com>
<120EB3C1-0043-4E8C-8637-F16B3B56094B@eaglegenomics.com>
Message-ID: <6d941f120911060640v29b52bedxd120e39d092a1e88@mail.gmail.com>
2009/11/6 Richard Holland :
> If that's true, sounds like it's broke. Is the old code easily modified to
> suit arbitrary numbers of children?
I don't think so. It uses a stack-based solution, so there is no way
to tell whether an entry on the stack belongs to the node currently
being processed or to some other part of the tree.
One could put markers on the stack or something similar, but it would
become a bit convoluted. I suppose a recursive implementation would be
cleaner here.
My suggestion is for somebody else to verify my findings. I might be
doing something stupidly wrong, and maybe things are correct. A simple
tree like (1,2,3) (anything that is not binary) should expose the
problem.
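To make the recursive idea concrete, here is a small recursive-descent
sketch that accepts any number of children per node. It is not the
BioJava implementation: RecursiveNewick and its members are invented
for illustration, unnamed nodes get prefix + counter names in the order
they are closed, and quoted labels, bracketed comments and clashes
between generated and existing names are all left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative recursive-descent Newick reader; names here are hypothetical. */
final class RecursiveNewick {
    private final String text;
    private final String prefix;
    private int pos = 0;
    private int nextId = 1;
    /** Each entry is {parent, child}. */
    public final List<String[]> edges = new ArrayList<>();
    /** Name of the top node, set once parsing has finished. */
    public String top;

    private RecursiveNewick(String text, String prefix) {
        // Dropping all whitespace keeps the sketch short; quoted labels are unsupported.
        this.text = text.replaceAll("\\s+", "");
        this.prefix = prefix;
    }

    public static RecursiveNewick parse(String newick, String prefix) {
        RecursiveNewick p = new RecursiveNewick(newick, prefix);
        p.top = p.parseNode();
        if (p.peek() == ';') p.pos++;   // the trailing ';' is optional here
        if (p.pos != p.text.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + p.pos);
        }
        return p;
    }

    /** Parses one node: an optional "(child,child,...)" group, a label, a branch length. */
    private String parseNode() {
        List<String> children = new ArrayList<>();
        if (peek() == '(') {
            do {
                pos++;                        // consume '(' or ','
                children.add(parseNode());    // recursion handles any depth
            } while (peek() == ',');          // and any number of siblings
            if (peek() != ')') {
                throw new IllegalArgumentException("Expected ')' at position " + pos);
            }
            pos++;
        }
        String label = readUntil("(),:;");
        if (peek() == ':') {                  // skip the branch length, if any
            pos++;
            readUntil("(),;");
        }
        String name = label.isEmpty() ? prefix + nextId++ : label;
        for (String child : children) {
            edges.add(new String[] {name, child});
        }
        return name;
    }

    private char peek() {
        return pos < text.length() ? text.charAt(pos) : '\0';
    }

    private String readUntil(String stops) {
        int start = pos;
        while (pos < text.length() && stops.indexOf(text.charAt(pos)) < 0) pos++;
        return text.substring(start, pos);
    }
}
```

Calling RecursiveNewick.parse("(1,2,3);", "p") yields three edges, all
from the generated top node p1, so the edge to "1" is kept. The call
stack does the bookkeeping that the markers would otherwise have to do.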
From cmasak at gmail.com Fri Nov 6 11:25:57 2009
From: cmasak at gmail.com (=?ISO-8859-1?Q?Carl_M=E4sak?=)
Date: Fri, 6 Nov 2009 17:25:57 +0100
Subject: [Biojava-l] How do I read a FASTA file containing protein sequences
in lowercase?
Message-ID: <16d769b70911060825v298529b5m805c1f7fd388549b@mail.gmail.com>
I'm using RichSequenceIterator to read FASTA files containing
proteins. Somehow it doesn't work when the protein sequences are in
lowercase, which they sometimes are when downloaded from e.g. Uniprot.
My code fails to recognize the following file as containing a protein
sequence:
>OPSD_FELCA
mngtegpnfyvpfsnktgvvrspfeypqyylaepwqfsmlaaymfllivlgfpinfltlyvtvqhkklrtplnyilln
lavadlfmvfggftttlytslhgyfvfgptgcnlegffatlggeialwslvvlaieryvvvckpmsnfrfgenhaimgv
aftwvmalacaapplvgwsryipegmqcscgidyytlkpevnnesfviymfvvhftipmiviffcygqlvftvkeaaaq
qqesattqkaekevtrmviimviaflicwvpyasvafyifthqgsnfgpifmtlpaffaksssiynpviyimmnkqfrn
cmlttlccgknplgddeasttgsktetsqvapa
What am I missing? Here's the code I'm using to read in sequences:
private List sequencesFromInputStream(InputStream stream) {
    BufferedInputStream bufferedStream = new BufferedInputStream(stream);
    Namespace ns = RichObjectFactory.getDefaultNamespace();
    RichSequenceIterator seqit = null;
    try {
        seqit = RichSequence.IOTools.readStream(bufferedStream, ns);
    } catch (IOException e) {
        logger.error("Couldn't read sequences from file", e);
        return Collections.emptyList();
    }
    List sequences = new ArrayList();
    try {
        while ( seqit.hasNext() ) {
            RichSequence rseq;
            rseq = seqit.nextRichSequence(); // *error occurs here*
            if (rseq == null)
                continue;
            String alphabet = rseq.getAlphabet().getName();
            sequences.add(
                "DNA".equals(alphabet) ? new BiojavaDNA(rseq)
                : "RNA".equals(alphabet) ? new BiojavaRNA(rseq)
                : new BiojavaProtein(rseq) );
        }
    } catch (NoSuchElementException e) {
        logger.error("Read past last sequence", e);
    } catch (BioException e) {
        logger.error(e); // *ends up here*
    }
    return sequences;
}
Grateful for any pointers you might have.
Regards,
// Carl Mäsak
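Nothing in this message establishes why the lowercase file fails. As a
workaround sketch, assuming case really is the only difference between
files that work and files that do not, the residue lines could be
upper-cased before the stream is handed to the reader. FastaUpperCaser
is a made-up helper and uses only the standard library; it buffers the
whole input in memory, so it is only suitable for modest file sizes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical helper, for illustration only. */
final class FastaUpperCaser {
    /** Upper-cases every line except FASTA header lines (those starting with '>'). */
    public static String upperCaseResidues(String fasta) {
        StringBuilder out = new StringBuilder();
        for (String line : fasta.split("\r?\n")) {
            out.append(line.startsWith(">") ? line : line.toUpperCase(Locale.ROOT));
            out.append('\n');
        }
        return out.toString();
    }

    /** Stream variant: reads everything, converts it, and returns a new in-memory stream. */
    public static InputStream upperCaseResidues(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        // ISO-8859-1 maps bytes to chars one-to-one, so header bytes survive the round trip.
        String fasta = new String(buffer.toByteArray(), StandardCharsets.ISO_8859_1);
        return new ByteArrayInputStream(
                upperCaseResidues(fasta).getBytes(StandardCharsets.ISO_8859_1));
    }
}
```

In the method above, the result of FastaUpperCaser.upperCaseResidues(stream)
would be wrapped in the BufferedInputStream in place of the raw stream.
If the upper-cased file then parses, case is confirmed as the trigger.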
From cmasak at gmail.com Fri Nov 6 11:54:30 2009
From: cmasak at gmail.com (=?ISO-8859-1?Q?Carl_M=E4sak?=)
Date: Fri, 6 Nov 2009 17:54:30 +0100
Subject: [Biojava-l] How do I read a FASTA file containing protein
sequences in lowercase?
In-Reply-To:
References: <16d769b70911060825v298529b5m805c1f7fd388549b@mail.gmail.com>
Message-ID: <16d769b70911060854o69705434x574ce2d1c7d85699@mail.gmail.com>
Richard (>), Carl (>>):
>> I'm using RichSequenceIterator to read FASTA files containing
>> proteins. Somehow it doesn't work when the protein sequences are in
>> lowercase, which they sometimes are when downloaded from e.g. Uniprot.
>> My code fails to recognize the following file as containing a protein
>> sequence:
>>
>> >OPSD_FELCA
>> mngtegpnfyvpfsnktgvvrspfeypqyylaepwqfsmlaaymfllivlgfpinfltlyvtvqhkklrtplnyilln
>> lavadlfmvfggftttlytslhgyfvfgptgcnlegffatlggeialwslvvlaieryvvvckpmsnfrfgenhaimgv
>> aftwvmalacaapplvgwsryipegmqcscgidyytlkpevnnesfviymfvvhftipmiviffcygqlvftvkeaaaq
>> qqesattqkaekevtrmviimviaflicwvpyasvafyifthqgsnfgpifmtlpaffaksssiynpviyimmnkqfrn
>> cmlttlccgknplgddeasttgsktetsqvapa
>>
>> What am I missing? Here's the code I'm using to read in sequences:
>>
>> private List sequencesFromInputStream(InputStream stream) {
>>
>> BufferedInputStream bufferedStream = new
>> BufferedInputStream(stream);
>> Namespace ns = RichObjectFactory.getDefaultNamespace();
>> RichSequenceIterator seqit = null;
>>
>> try {
>> seqit = RichSequence.IOTools.readStream(bufferedStream, ns);
>> } catch (IOException e) {
>> logger.error("Couldn't read sequences from file", e);
>> return Collections.emptyList();
>> }
>>
>> List sequences = new ArrayList();
>> try {
>> while ( seqit.hasNext() ) {
>> RichSequence rseq;
>> rseq = seqit.nextRichSequence(); // *error occurs here*
>> if (rseq == null)
>> continue;
>> String alphabet = rseq.getAlphabet().getName();
>> sequences.add(
>> "DNA".equals(alphabet) ? new BiojavaDNA(rseq)
>> : "RNA".equals(alphabet) ? new BiojavaRNA(rseq)
>> : new BiojavaProtein(rseq) );
>> }
>> } catch (NoSuchElementException e) {
>> logger.error("Read past last sequence", e);
>> } catch (BioException e) {
>> logger.error(e); // *ends up here*
>> }
>>
>> return sequences;
>> }
>>
>> Grateful for any pointers you might have.
>
> Could you post the output from the exception stack that it generates?
org.biojava.bio.BioException: Could not read sequence
at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
at net.bioclipse.biojava.business.BiojavaManager.sequencesFromInputStream(BiojavaManager.java:314)
at net.bioclipse.biojava.business.BiojavaManager.sequencesFromFile(BiojavaManager.java:291)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at net.bioclipse.managers.business.AbstractManagerMethodDispatcher.doInvoke(AbstractManagerMethodDispatcher.java:243)
at net.bioclipse.managers.business.JavaManagerMethodDispatcher.doInvokeInSameThread(JavaManagerMethodDispatcher.java:248)
at net.bioclipse.managers.business.AbstractManagerMethodDispatcher.invoke(AbstractManagerMethodDispatcher.java:130)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
at net.bioclipse.recording.WrapInProxyAdvice.invoke(WrapInProxyAdvice.java:22)
at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.osgi.service.importer.internal.aop.ServiceInvoker.doInvoke(ServiceInvoker.java:59)
at org.springframework.osgi.service.importer.internal.aop.ServiceInvoker.invoke(ServiceInvoker.java:67)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
at org.springframework.osgi.service.importer.internal.aop.ServiceTCCLInterceptor.invoke(ServiceTCCLInterceptor.java:34)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
at org.springframework.osgi.service.importer.support.LocalBundleContextAdvice.invoke(LocalBundleContextAdvice.java:59)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
at org.springframework.aop.support.DelegatingIntroductionInterceptor.doProceed(DelegatingIntroductionInterceptor.java:131)
at org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke(DelegatingIntroductionInterceptor.java:119)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
at $Proxy18.invoke(Unknown Source)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
at org.springframework.aop.framework.adapter.AfterReturningAdviceInterceptor.invoke(AfterReturningAdviceInterceptor.java:50)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
at $Proxy20.sequencesFromFile(Unknown Source)
at net.bioclipse.biojava.ui.editors.Aligner.setInput(Aligner.java:152)
at net.bioclipse.biojava.ui.editors.Aligner.init(Aligner.java:138)
at org.eclipse.ui.part.MultiPageEditorPart.addPage(MultiPageEditorPart.java:238)
at org.eclipse.ui.part.MultiPageEditorPart.addPage(MultiPageEditorPart.java:212)
at net.bioclipse.biojava.ui.editors.SequenceEditor.createPages(SequenceEditor.java:47)
at org.eclipse.ui.part.MultiPageEditorPart.createPartControl(MultiPageEditorPart.java:357)
at org.eclipse.ui.internal.EditorReference.createPartHelper(EditorReference.java:662)
at org.eclipse.ui.internal.EditorReference.createPart(EditorReference.java:462)
at org.eclipse.ui.internal.WorkbenchPartReference.getPart(WorkbenchPartReference.java:595)
at org.eclipse.ui.internal.PartPane.setVisible(PartPane.java:313)
at org.eclipse.ui.internal.presentations.PresentablePart.setVisible(PresentablePart.java:180)
at org.eclipse.ui.internal.presentations.util.PresentablePartFolder.select(PresentablePartFolder.java:270)
at org.eclipse.ui.internal.presentations.util.LeftToRightTabOrder.select(LeftToRightTabOrder.java:65)
at org.eclipse.ui.internal.presentations.util.TabbedStackPresentation.selectPart(TabbedStackPresentation.java:473)
at org.eclipse.ui.internal.PartStack.refreshPresentationSelection(PartStack.java:1256)
at org.eclipse.ui.internal.PartStack.setSelection(PartStack.java:1209)
at org.eclipse.ui.internal.PartStack.showPart(PartStack.java:1608)
at org.eclipse.ui.internal.PartStack.add(PartStack.java:499)
at org.eclipse.ui.internal.EditorStack.add(EditorStack.java:103)
at org.eclipse.ui.internal.PartStack.add(PartStack.java:485)
at org.eclipse.ui.internal.EditorStack.add(EditorStack.java:112)
at org.eclipse.ui.internal.EditorSashContainer.addEditor(EditorSashContainer.java:63)
at org.eclipse.ui.internal.EditorAreaHelper.addToLayout(EditorAreaHelper.java:225)
at org.eclipse.ui.internal.EditorAreaHelper.addEditor(EditorAreaHelper.java:213)
at org.eclipse.ui.internal.EditorManager.createEditorTab(EditorManager.java:778)
at org.eclipse.ui.internal.EditorManager.openEditorFromDescriptor(EditorManager.java:677)
at org.eclipse.ui.internal.EditorManager.openEditor(EditorManager.java:638)
at org.eclipse.ui.internal.WorkbenchPage.busyOpenEditorBatched(WorkbenchPage.java:2854)
at org.eclipse.ui.internal.WorkbenchPage.busyOpenEditor(WorkbenchPage.java:2762)
at org.eclipse.ui.internal.WorkbenchPage.access$11(WorkbenchPage.java:2754)
at org.eclipse.ui.internal.WorkbenchPage$10.run(WorkbenchPage.java:2705)
at org.eclipse.swt.custom.BusyIndicator.showWhile(BusyIndicator.java:70)
at org.eclipse.ui.internal.WorkbenchPage.openEditor(WorkbenchPage.java:2701)
at org.eclipse.ui.internal.WorkbenchPage.openEditor(WorkbenchPage.java:2685)
at org.eclipse.ui.internal.WorkbenchPage.openEditor(WorkbenchPage.java:2676)
at org.eclipse.ui.ide.IDE.openEditor(IDE.java:651)
at org.eclipse.ui.ide.IDE.openEditor(IDE.java:610)
at org.eclipse.ui.actions.OpenFileAction.openFile(OpenFileAction.java:99)
at org.eclipse.ui.actions.OpenSystemEditorAction.run(OpenSystemEditorAction.java:99)
at org.eclipse.ui.actions.RetargetAction.run(RetargetAction.java:221)
at org.eclipse.ui.navigator.CommonNavigatorManager$3.open(CommonNavigatorManager.java:202)
at org.eclipse.ui.OpenAndLinkWithEditorHelper$InternalListener.open(OpenAndLinkWithEditorHelper.java:48)
at org.eclipse.jface.viewers.StructuredViewer$2.run(StructuredViewer.java:842)
at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:42)
at org.eclipse.core.runtime.Platform.run(Platform.java:888)
at org.eclipse.ui.internal.JFaceUtil$1.run(JFaceUtil.java:48)
at org.eclipse.jface.util.SafeRunnable.run(SafeRunnable.java:175)
at org.eclipse.jface.viewers.StructuredViewer.fireOpen(StructuredViewer.java:840)
at org.eclipse.jface.viewers.StructuredViewer.handleOpen(StructuredViewer.java:1101)
at org.eclipse.ui.navigator.CommonViewer.handleOpen(CommonViewer.java:467)
at org.eclipse.jface.viewers.StructuredViewer$6.handleOpen(StructuredViewer.java:1205)
at org.eclipse.jface.util.OpenStrategy.fireOpenEvent(OpenStrategy.java:264)
at org.eclipse.jface.util.OpenStrategy.access$2(OpenStrategy.java:258)
at org.eclipse.jface.util.OpenStrategy$1.handleEvent(OpenStrategy.java:298)
at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:84)
at org.eclipse.swt.widgets.Display.sendEvent(Display.java:3543)
at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1250)
at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1273)
at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1258)
at org.eclipse.swt.widgets.Widget.notifyListeners(Widget.java:1079)
at org.eclipse.swt.widgets.Display.runDeferredEvents(Display.java:3441)
at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3100)
at org.eclipse.ui.internal.Workbench.runEventLoop(Workbench.java:2405)
at org.eclipse.ui.internal.Workbench.runUI(Workbench.java:2369)
at org.eclipse.ui.internal.Workbench.access$4(Workbench.java:2221)
at org.eclipse.ui.internal.Workbench$5.run(Workbench.java:500)
at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:332)
at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:493)
at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:149)
at net.bioclipse.ui.Application.start(Application.java:36)
at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:194)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:110)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:79)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:368)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:179)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:559)
at org.eclipse.equinox.launcher.Main.basicRun(Main.java:514)
at org.eclipse.equinox.launcher.Main.run(Main.java:1311)
at org.eclipse.equinox.launcher.Main.main(Main.java:1287)
Caused by: org.biojava.bio.seq.io.ParseException:
A Exception Has Occurred During Parsing.
Please submit the details that follow to biojava-l at biojava.org or post
a bug report to http://bugzilla.open-bio.org/
Format_object=org.biojavax.bio.seq.io.FastaFormat
Accession=OPSD_FELCA
Id=null
Comments=problem parsing symbols
Parse_block=mngtegpnfyvpfsnktgvvrspfeypqyylaepwqfsmlaaymfllivlgfpinfltlyvtvqhkklrtplnyillnlavadlfmvfggftttlytslhgyfvfgptgcnlegffatlggeialwslvvlaieryvvvckpmsnfrfgenhaimgvaftwvmalacaapplvgwsryipegmqcscgidyytlkpevnnesfviymfvvhftipmiviffcygqlvftvkeaaaqqqesattqkaekevtrmviimviaflicwvpyasvafyifthqgsnfgpifmtlpaffaksssiynpviyimmnkqfrncmlttlccgknplgddeasttgsktetsqvapa
Stack trace follows ....
at org.biojavax.bio.seq.io.FastaFormat.readRichSequence(FastaFormat.java:244)
at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
... 114 more
Caused by: org.biojava.bio.symbol.IllegalSymbolException: This tokenization doesn't contain character: 'e'
at org.biojava.bio.seq.io.CharacterTokenization.parseTokenChar(CharacterTokenization.java:175)
at org.biojava.bio.seq.io.CharacterTokenization$TPStreamParser.characters(CharacterTokenization.java:246)
at org.biojava.bio.symbol.SimpleSymbolList.<init>(SimpleSymbolList.java:178)
at org.biojavax.bio.seq.io.FastaFormat.readRichSequence(FastaFormat.java:237)
... 115 more
// Carl
From holland at eaglegenomics.com Fri Nov 6 12:15:28 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 6 Nov 2009 17:15:28 +0000
Subject: [Biojava-l] How do I read a FASTA file containing protein
sequences in lowercase?
In-Reply-To: <16d769b70911060854o69705434x574ce2d1c7d85699@mail.gmail.com>
References: <16d769b70911060825v298529b5m805c1f7fd388549b@mail.gmail.com>
<16d769b70911060854o69705434x574ce2d1c7d85699@mail.gmail.com>
Message-ID: <179C4DF7-69AF-4E65-8E6A-F984DF7CCE69@eaglegenomics.com>
Ah OK I see what's going on.
The convenience method you're using, RichSequence.IOTools.readStream(),
uses FastaFormat to try to guess the alphabet to use based on the
first line of the input sequence.
In FastaFormat, it does this by searching for matching non-DNA
symbols. The search is case-sensitive:
protected static final Pattern aminoAcids = Pattern.compile(".*[FLIPQE].*");
FastaFormat needs patching to make this pattern non-case-sensitive.
Still, if the sequence is such that any of the above symbols don't
appear until the second or subsequent lines, the guessing will not
work and it'll assume it's DNA, and give you the same error as before.
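To make the case-sensitivity point concrete, here is a minimal standalone sketch using only java.util.regex. The class name AlphabetGuessDemo and the helper looksLikeProtein are made up for illustration, and the CASE_INSENSITIVE variant is just one possible way to patch the pattern, not the actual FastaFormat fix.

```java
import java.util.regex.Pattern;

class AlphabetGuessDemo {
    // Same expression as the aminoAcids pattern quoted above (case-sensitive).
    static final Pattern CASE_SENSITIVE = Pattern.compile(".*[FLIPQE].*");

    // Hypothetical patched variant: identical expression, but ignoring case.
    static final Pattern CASE_INSENSITIVE =
            Pattern.compile(".*[FLIPQE].*", Pattern.CASE_INSENSITIVE);

    // True if the given first line of sequence contains a "non-DNA" symbol.
    static boolean looksLikeProtein(Pattern pattern, String firstLine) {
        return pattern.matcher(firstLine).matches();
    }

    public static void main(String[] args) {
        // First sequence line of the OPSD_FELCA record from this thread.
        String line = "mngtegpnfyvpfsnktgvvrspfeypqyylaepwqfsmlaaymfllivlgfpinfltlyvtvqhkklrtplnyilln";

        System.out.println(looksLikeProtein(CASE_SENSITIVE, line));             // false
        System.out.println(looksLikeProtein(CASE_INSENSITIVE, line));           // true
        System.out.println(looksLikeProtein(CASE_INSENSITIVE, "acgtacgtacgt")); // false
    }
}
```

Note that even the case-insensitive variant only looks at whatever line it is given, so a protein whose first line happens to contain none of F, L, I, P, Q or E would still be taken for DNA.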
In circumstances where you know in advance what alphabet the sequence
is in, it's best to avoid the guessing algorithms and instead use
methods such as readFastaDNA that explicitly specify the alphabet
you want to read.
However, there's still one thing that you definitely can't do, and
that's parse different types of sequence from the same input without
inserting some additional code to detect which alphabet each
individual sequence is using before parsing it with the appropriate
BioJava parser. Your code appears to be expecting mixed input, but this
won't work unless all the sequences happen to use the same alphabet.
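As a rough sketch of what such "additional code" could look like, the plain-Java example below splits FASTA text into records and guesses an alphabet for each record from its whole sequence, ignoring case. Everything here is an assumption made for illustration: the class FastaAlphabetSplitter, both method names, and the heuristic itself (any of F, L, I, P, Q, E means protein, otherwise a U means RNA, otherwise DNA). It is not part of BioJava, and the heuristic will misclassify unusual sequences.

```java
import java.util.ArrayList;
import java.util.List;

class FastaAlphabetSplitter {
    // Heuristic guess over the WHOLE sequence, case-insensitively.
    // Assumption: reuses the FLIPQE symbols from the FastaFormat pattern.
    static String guessAlphabet(String sequence) {
        String s = sequence.toUpperCase();
        boolean hasU = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if ("FLIPQE".indexOf(c) >= 0) {
                return "PROTEIN";
            }
            if (c == 'U') {
                hasU = true;
            }
        }
        return hasU ? "RNA" : "DNA";
    }

    // Returns one {header, guessedAlphabet} pair per FASTA record.
    static List<String[]> classify(String fasta) {
        List<String[]> result = new ArrayList<String[]>();
        String header = null;
        StringBuilder seq = new StringBuilder();
        for (String line : fasta.split("\\r?\\n")) {
            if (line.startsWith(">")) {
                if (header != null) {
                    result.add(new String[] { header, guessAlphabet(seq.toString()) });
                }
                header = line.substring(1).trim();
                seq.setLength(0);
            } else {
                seq.append(line.trim());
            }
        }
        if (header != null) {
            result.add(new String[] { header, guessAlphabet(seq.toString()) });
        }
        return result;
    }

    public static void main(String[] args) {
        String fasta = ">prot\nmgtatgcaktgv\nflipqe\n>dna\nacgtacgt\n>rna\nacguacgu\n";
        for (String[] record : classify(fasta)) {
            // Each record could now be handed to the parser for its alphabet.
            System.out.println(record[0] + " -> " + record[1]);
        }
    }
}
```

Once each record has been classified, it can be passed on its own to the reader that names the matching alphabet explicitly, instead of relying on the first-line guess.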
cheers,
Richard
On 6 Nov 2009, at 16:54, Carl Mäsak wrote:
> Richard (>), Carl (>>):
>>> I'm using RichSequenceIterator to read FASTA files containing
>>> proteins. Somehow it doesn't work when the protein sequences are in
>>> lowercase, which they sometimes are when downloaded from e.g.
>>> Uniprot.
>>> My code fails to recognize the following file as containing a
>>> protein
>>> sequence:
>>>
>>>> OPSD_FELCA
>>>
>>>
>>> mngtegpnfyvpfsnktgvvrspfeypqyylaepwqfsmlaaymfllivlgfpinfltlyvtvqhkklrtplnyilln
>>>
>>> lavadlfmvfggftttlytslhgyfvfgptgcnlegffatlggeialwslvvlaieryvvvckpmsnfrfgenhaimgv
>>>
>>> aftwvmalacaapplvgwsryipegmqcscgidyytlkpevnnesfviymfvvhftipmiviffcygqlvftvkeaaaq
>>>
>>> qqesattqkaekevtrmviimviaflicwvpyasvafyifthqgsnfgpifmtlpaffaksssiynpviyimmnkqfrn
>>> cmlttlccgknplgddeasttgsktetsqvapa
>>>
>>> What am I missing? Here's the code I'm using to read in sequences:
>>>
>>> private List sequencesFromInputStream(InputStream
>>> stream) {
>>>
>>> BufferedInputStream bufferedStream = new
>>> BufferedInputStream(stream);
>>> Namespace ns = RichObjectFactory.getDefaultNamespace();
>>> RichSequenceIterator seqit = null;
>>>
>>> try {
>>> seqit = RichSequence.IOTools.readStream(bufferedStream,
>>> ns);
>>> } catch (IOException e) {
>>> logger.error("Couldn't read sequences from file", e);
>>> return Collections.emptyList();
>>> }
>>>
>>> List sequences = new ArrayList();
>>> try {
>>> while ( seqit.hasNext() ) {
>>> RichSequence rseq;
>>> rseq = seqit.nextRichSequence(); // *error occurs
>>> here*
>>> if (rseq == null)
>>> continue;
>>> String alphabet = rseq.getAlphabet().getName();
>>> sequences.add(
>>> "DNA".equals(alphabet) ? new BiojavaDNA(rseq)
>>> : "RNA".equals(alphabet) ? new BiojavaRNA(rseq)
>>> : new BiojavaProtein
>>> (rseq) );
>>> }
>>> } catch (NoSuchElementException e) {
>>> logger.error("Read past last sequence", e);
>>> } catch (BioException e) {
>>> logger.error(e); // *ends up here*
>>> }
>>>
>>> return sequences;
>>> }
>>>
>>> Grateful for any pointers you might have.
>>
>> Could you post the output from the exception stack that it generates?
>
> org.biojava.bio.BioException: Could not read sequence
> at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence
> (RichStreamReader.java:113)
> at
> net.bioclipse.biojava.business.BiojavaManager.sequencesFromInputStream
> (BiojavaManager.java:314)
> at net.bioclipse.biojava.business.BiojavaManager.sequencesFromFile
> (BiojavaManager.java:291)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke
> (NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke
> (DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> net.bioclipse.managers.business.AbstractManagerMethodDispatcher.doInvoke
> (AbstractManagerMethodDispatcher.java:243)
> at
> net.bioclipse.managers.business.JavaManagerMethodDispatcher.doInvokeInSameThread
> (JavaManagerMethodDispatcher.java:248)
> at
> net.bioclipse.managers.business.AbstractManagerMethodDispatcher.invoke
> (AbstractManagerMethodDispatcher.java:130)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed
> (ReflectiveMethodInvocation.java:171)
> at net.bioclipse.recording.WrapInProxyAdvice.invoke
> (WrapInProxyAdvice.java:22)
> at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke
> (DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.springframework.osgi.service.importer.internal.aop.ServiceInvoker.doInvoke
> (ServiceInvoker.java:59)
> at
> org.springframework.osgi.service.importer.internal.aop.ServiceInvoker.invoke
> (ServiceInvoker.java:67)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed
> (ReflectiveMethodInvocation.java:171)
> at
> org.springframework.osgi.service.importer.internal.aop.ServiceTCCLInterceptor.invoke
> (ServiceTCCLInterceptor.java:34)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed
> (ReflectiveMethodInvocation.java:171)
> at
> org.springframework.osgi.service.importer.support.LocalBundleContextAdvice.invoke
> (LocalBundleContextAdvice.java:59)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed
> (ReflectiveMethodInvocation.java:171)
> at
> org.springframework.aop.support.DelegatingIntroductionInterceptor.doProceed
> (DelegatingIntroductionInterceptor.java:131)
> at
> org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke
> (DelegatingIntroductionInterceptor.java:119)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed
> (ReflectiveMethodInvocation.java:171)
> at org.springframework.aop.framework.JdkDynamicAopProxy.invoke
> (JdkDynamicAopProxy.java:204)
> at $Proxy18.invoke(Unknown Source)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed
> (ReflectiveMethodInvocation.java:171)
> at
> org.springframework.aop.framework.adapter.AfterReturningAdviceInterceptor.invoke
> (AfterReturningAdviceInterceptor.java:50)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed
> (ReflectiveMethodInvocation.java:171)
> at org.springframework.aop.framework.JdkDynamicAopProxy.invoke
> (JdkDynamicAopProxy.java:204)
> at $Proxy20.sequencesFromFile(Unknown Source)
> at net.bioclipse.biojava.ui.editors.Aligner.setInput(Aligner.java:
> 152)
> at net.bioclipse.biojava.ui.editors.Aligner.init(Aligner.java:138)
> at org.eclipse.ui.part.MultiPageEditorPart.addPage
> (MultiPageEditorPart.java:238)
> at org.eclipse.ui.part.MultiPageEditorPart.addPage
> (MultiPageEditorPart.java:212)
> at net.bioclipse.biojava.ui.editors.SequenceEditor.createPages
> (SequenceEditor.java:47)
> at org.eclipse.ui.part.MultiPageEditorPart.createPartControl
> (MultiPageEditorPart.java:357)
> at org.eclipse.ui.internal.EditorReference.createPartHelper
> (EditorReference.java:662)
> at org.eclipse.ui.internal.EditorReference.createPart
> (EditorReference.java:462)
> at org.eclipse.ui.internal.WorkbenchPartReference.getPart
> (WorkbenchPartReference.java:595)
> at org.eclipse.ui.internal.PartPane.setVisible(PartPane.java:313)
> at org.eclipse.ui.internal.presentations.PresentablePart.setVisible
> (PresentablePart.java:180)
> at
> org.eclipse.ui.internal.presentations.util.PresentablePartFolder.select
> (PresentablePartFolder.java:270)
> at
> org.eclipse.ui.internal.presentations.util.LeftToRightTabOrder.select
> (LeftToRightTabOrder.java:65)
> at
> org.eclipse.ui.internal.presentations.util.TabbedStackPresentation.selectPart
> (TabbedStackPresentation.java:473)
> at org.eclipse.ui.internal.PartStack.refreshPresentationSelection
> (PartStack.java:1256)
> at org.eclipse.ui.internal.PartStack.setSelection(PartStack.java:
> 1209)
> at org.eclipse.ui.internal.PartStack.showPart(PartStack.java:1608)
> at org.eclipse.ui.internal.PartStack.add(PartStack.java:499)
> at org.eclipse.ui.internal.EditorStack.add(EditorStack.java:103)
> at org.eclipse.ui.internal.PartStack.add(PartStack.java:485)
> at org.eclipse.ui.internal.EditorStack.add(EditorStack.java:112)
> at org.eclipse.ui.internal.EditorSashContainer.addEditor
> (EditorSashContainer.java:63)
> at org.eclipse.ui.internal.EditorAreaHelper.addToLayout
> (EditorAreaHelper.java:225)
> at org.eclipse.ui.internal.EditorAreaHelper.addEditor
> (EditorAreaHelper.java:213)
> at org.eclipse.ui.internal.EditorManager.createEditorTab
> (EditorManager.java:778)
> at org.eclipse.ui.internal.EditorManager.openEditorFromDescriptor
> (EditorManager.java:677)
> at org.eclipse.ui.internal.EditorManager.openEditor
> (EditorManager.java:638)
> at org.eclipse.ui.internal.WorkbenchPage.busyOpenEditorBatched
> (WorkbenchPage.java:2854)
> at org.eclipse.ui.internal.WorkbenchPage.busyOpenEditor
> (WorkbenchPage.java:2762)
> at org.eclipse.ui.internal.WorkbenchPage.access$11
> (WorkbenchPage.java:2754)
> at org.eclipse.ui.internal.WorkbenchPage$10.run(WorkbenchPage.java:
> 2705)
> at org.eclipse.swt.custom.BusyIndicator.showWhile
> (BusyIndicator.java:70)
> at org.eclipse.ui.internal.WorkbenchPage.openEditor
> (WorkbenchPage.java:2701)
> at org.eclipse.ui.internal.WorkbenchPage.openEditor
> (WorkbenchPage.java:2685)
> at org.eclipse.ui.internal.WorkbenchPage.openEditor
> (WorkbenchPage.java:2676)
> at org.eclipse.ui.ide.IDE.openEditor(IDE.java:651)
> at org.eclipse.ui.ide.IDE.openEditor(IDE.java:610)
> at org.eclipse.ui.actions.OpenFileAction.openFile
> (OpenFileAction.java:99)
> at org.eclipse.ui.actions.OpenSystemEditorAction.run
> (OpenSystemEditorAction.java:99)
> at org.eclipse.ui.actions.RetargetAction.run(RetargetAction.java:221)
> at org.eclipse.ui.navigator.CommonNavigatorManager$3.open
> (CommonNavigatorManager.java:202)
> at org.eclipse.ui.OpenAndLinkWithEditorHelper$InternalListener.open
> (OpenAndLinkWithEditorHelper.java:48)
> at org.eclipse.jface.viewers.StructuredViewer$2.run
> (StructuredViewer.java:842)
> at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:42)
> at org.eclipse.core.runtime.Platform.run(Platform.java:888)
> at org.eclipse.ui.internal.JFaceUtil$1.run(JFaceUtil.java:48)
> at org.eclipse.jface.util.SafeRunnable.run(SafeRunnable.java:175)
> at org.eclipse.jface.viewers.StructuredViewer.fireOpen
> (StructuredViewer.java:840)
> at org.eclipse.jface.viewers.StructuredViewer.handleOpen
> (StructuredViewer.java:1101)
> at org.eclipse.ui.navigator.CommonViewer.handleOpen
> (CommonViewer.java:467)
> at org.eclipse.jface.viewers.StructuredViewer$6.handleOpen
> (StructuredViewer.java:1205)
> at org.eclipse.jface.util.OpenStrategy.fireOpenEvent
> (OpenStrategy.java:264)
> at org.eclipse.jface.util.OpenStrategy.access$2(OpenStrategy.java:
> 258)
> at org.eclipse.jface.util.OpenStrategy$1.handleEvent
> (OpenStrategy.java:298)
> at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:84)
> at org.eclipse.swt.widgets.Display.sendEvent(Display.java:3543)
> at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1250)
> at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1273)
> at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1258)
> at org.eclipse.swt.widgets.Widget.notifyListeners(Widget.java:1079)
> at org.eclipse.swt.widgets.Display.runDeferredEvents(Display.java:
> 3441)
> at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3100)
> at org.eclipse.ui.internal.Workbench.runEventLoop(Workbench.java:
> 2405)
> at org.eclipse.ui.internal.Workbench.runUI(Workbench.java:2369)
> at org.eclipse.ui.internal.Workbench.access$4(Workbench.java:2221)
> at org.eclipse.ui.internal.Workbench$5.run(Workbench.java:500)
> at org.eclipse.core.databinding.observable.Realm.runWithDefault
> (Realm.java:332)
> at org.eclipse.ui.internal.Workbench.createAndRunWorkbench
> (Workbench.java:493)
> at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:
> 149)
> at net.bioclipse.ui.Application.start(Application.java:36)
> at org.eclipse.equinox.internal.app.EclipseAppHandle.run
> (EclipseAppHandle.java:194)
> at
> org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication
> (EclipseAppLauncher.java:110)
> at
> org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start
> (EclipseAppLauncher.java:79)
> at org.eclipse.core.runtime.adaptor.EclipseStarter.run
> (EclipseStarter.java:368)
> at org.eclipse.core.runtime.adaptor.EclipseStarter.run
> (EclipseStarter.java:179)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke
> (NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke
> (DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:559)
> at org.eclipse.equinox.launcher.Main.basicRun(Main.java:514)
> at org.eclipse.equinox.launcher.Main.run(Main.java:1311)
> at org.eclipse.equinox.launcher.Main.main(Main.java:1287)
> Caused by: org.biojava.bio.seq.io.ParseException:
>
> A Exception Has Occurred During Parsing.
> Please submit the details that follow to biojava-l at biojava.org or post
> a bug report to http://bugzilla.open-bio.org/
>
> Format_object=org.biojavax.bio.seq.io.FastaFormat
> Accession=OPSD_FELCA
> Id=null
> Comments=problem parsing symbols
> Parse_block=mngtegpnfyvpfsnktgvvrspfeypqyylaepwqfsmlaaymfllivlgfpinfltlyvtvqhkklrtplnyillnlavadlfmvfggftttlytslhgyfvfgptgcnlegffatlggeialwslvvlaieryvvvckpmsnfrfgenhaimgvaftwvmalacaapplvgwsryipegmqcscgidyytlkpevnnesfviymfvvhftipmiviffcygqlvftvkeaaaqqqesattqkaekevtrmviimviaflicwvpyasvafyifthqgsnfgpifmtlpaffaksssiynpviyimmnkqfrncmlttlccgknplgddeasttgsktetsqvapa
> Stack trace follows ....
>
>
> at org.biojavax.bio.seq.io.FastaFormat.readRichSequence
> (FastaFormat.java:244)
> at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence
> (RichStreamReader.java:110)
> ... 114 more
> Caused by: org.biojava.bio.symbol.IllegalSymbolException: This
> tokenization doesn't contain character: 'e'
> at org.biojava.bio.seq.io.CharacterTokenization.parseTokenChar
> (CharacterTokenization.java:175)
> at org.biojava.bio.seq.io.CharacterTokenization
> $TPStreamParser.characters(CharacterTokenization.java:246)
> at org.biojava.bio.symbol.SimpleSymbolList.<init>
> (SimpleSymbolList.java:178)
> at org.biojavax.bio.seq.io.FastaFormat.readRichSequence
> (FastaFormat.java:237)
> ... 115 more
>
> // Carl
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From holland at eaglegenomics.com Fri Nov 6 11:35:24 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 6 Nov 2009 16:35:24 +0000
Subject: [Biojava-l] How do I read a FASTA file containing protein
sequences in lowercase?
In-Reply-To: <16d769b70911060825v298529b5m805c1f7fd388549b@mail.gmail.com>
References: <16d769b70911060825v298529b5m805c1f7fd388549b@mail.gmail.com>
Message-ID:
Could you post the output from the exception stack that it generates?
thanks,
Richard
On 6 Nov 2009, at 16:25, Carl Mäsak wrote:
> I'm using RichSequenceIterator to read FASTA files containing
> proteins. Somehow it doesn't work when the protein sequences are in
> lowercase, which they sometimes are when downloaded from e.g. Uniprot.
> My code fails to recognize the following file as containing a protein
> sequence:
>
>> OPSD_FELCA
> mngtegpnfyvpfsnktgvvrspfeypqyylaepwqfsmlaaymfllivlgfpinfltlyvtvqhkklrtplnyilln
> lavadlfmvfggftttlytslhgyfvfgptgcnlegffatlggeialwslvvlaieryvvvckpmsnfrfgenhaimgv
> aftwvmalacaapplvgwsryipegmqcscgidyytlkpevnnesfviymfvvhftipmiviffcygqlvftvkeaaaq
> qqesattqkaekevtrmviimviaflicwvpyasvafyifthqgsnfgpifmtlpaffaksssiynpviyimmnkqfrn
> cmlttlccgknplgddeasttgsktetsqvapa
>
> What am I missing? Here's the code I'm using to read in sequences:
>
> private List sequencesFromInputStream(InputStream
> stream) {
>
> BufferedInputStream bufferedStream = new BufferedInputStream
> (stream);
> Namespace ns = RichObjectFactory.getDefaultNamespace();
> RichSequenceIterator seqit = null;
>
> try {
> seqit = RichSequence.IOTools.readStream(bufferedStream,
> ns);
> } catch (IOException e) {
> logger.error("Couldn't read sequences from file", e);
> return Collections.emptyList();
> }
>
> List sequences = new ArrayList();
> try {
> while ( seqit.hasNext() ) {
> RichSequence rseq;
> rseq = seqit.nextRichSequence(); // *error occurs
> here*
> if (rseq == null)
> continue;
> String alphabet = rseq.getAlphabet().getName();
> sequences.add(
> "DNA".equals(alphabet) ? new BiojavaDNA(rseq)
> : "RNA".equals(alphabet) ? new BiojavaRNA(rseq)
> : new BiojavaProtein
> (rseq) );
> }
> } catch (NoSuchElementException e) {
> logger.error("Read past last sequence", e);
> } catch (BioException e) {
> logger.error(e); // *ends up here*
> }
>
> return sequences;
> }
>
> Grateful for any pointers you might have.
>
> Regards,
> // Carl Mäsak
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From andylu0320 at gmail.com Sat Nov 7 13:06:39 2009
From: andylu0320 at gmail.com (Andy Lu)
Date: Sat, 7 Nov 2009 13:06:39 -0500
Subject: [Biojava-l] Bio Java installation inquiry
Message-ID: <4a1a3f7d0911071006i45d25a71jfc4bfffcc46ae839@mail.gmail.com>
Hi, I am able to get JMol to run in Eclipse, but I am having a lot of
trouble getting BioJava to run. I am not sure how to set up the class
path, etc.
I am new to using Eclipse and BioJava. Are step-by-step instructions
available online?
Any help would be greatly appreciated!
From andreas at sdsc.edu Sat Nov 7 13:14:17 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sat, 7 Nov 2009 10:14:17 -0800
Subject: [Biojava-l] Bio Java installation inquiry
In-Reply-To: <4a1a3f7d0911071006i45d25a71jfc4bfffcc46ae839@mail.gmail.com>
References: <4a1a3f7d0911071006i45d25a71jfc4bfffcc46ae839@mail.gmail.com>
Message-ID: <59a41c430911071014t2aeb0646u7d1cc237e1a61180@mail.gmail.com>
Hi Andy,
best thing is to download the jar files from
http://biojava.org/wiki/BioJava:Download .
Probably the easiest way to get started is to create a new project in Eclipse
and right click on the project -> Properties -> Java build path -> Libraries
-> Add jars
Then your project will know where to find the dependencies and you can
start writing your own code.
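One quick way to confirm that the build path is set up is a tiny probe class like the sketch below. The class name ClasspathCheck and the helper isOnClasspath are invented for this example; org.biojava.bio.seq.Sequence is used as the probe on the assumption that the BioJava 1.x core jar is the one that was added.

```java
class ClasspathCheck {
    // Returns true if the named class can be found on the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            // 'false' means: locate the class but do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // The class was found but one of its own dependencies is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed probe: a core interface shipped in the BioJava 1.x jar.
        String probe = "org.biojava.bio.seq.Sequence";
        if (isOnClasspath(probe)) {
            System.out.println("BioJava found on the build path.");
        } else {
            System.out.println("BioJava not found; check Java build path -> Libraries.");
        }
    }
}
```

Run it from within the Eclipse project (Run As -> Java Application). If it reports that BioJava is not found, the jar has not been added to the project's libraries yet.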
Andreas
On Sat, Nov 7, 2009 at 10:06 AM, Andy Lu wrote:
> Hi, I am able to get JMol to run on Eclipse, but I am having a lot of
> trouble getting biojava to run, I am not sure how to set up all of the
> class
> path, etc.
> I am new to using Eclipse and biojava. Is there a specific step by step
> instruction online available?
>
> Any help would be greatly appreciated!
From andreas at sdsc.edu Sat Nov 7 13:40:59 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sat, 7 Nov 2009 10:40:59 -0800
Subject: [Biojava-l] Bio Java installation inquiry
In-Reply-To: <4a1a3f7d0911071033j20dcd234rfd287998e7fab603@mail.gmail.com>
References: <4a1a3f7d0911071006i45d25a71jfc4bfffcc46ae839@mail.gmail.com>
<59a41c430911071014t2aeb0646u7d1cc237e1a61180@mail.gmail.com>
<4a1a3f7d0911071033j20dcd234rfd287998e7fab603@mail.gmail.com>
Message-ID: <59a41c430911071040i2b574d0ak2af98dbf22c1ab6a@mail.gmail.com>
You don't need to have Jmol in the classpath to run BioJava, but if you
do, you can use the Jmol/BioJava interface contained in the protein
structure modules.
In that case JmolApplet.jar is sufficient; you don't need to check out
the Jmol source.
Andreas
On Sat, Nov 7, 2009 at 10:33 AM, Andy Lu wrote:
> Oh, I see. But don't I also need to have all of the JMol Java files set up
> first, or does the BioJava jar file contain everything I need?
>
>
> On Sat, Nov 7, 2009 at 1:14 PM, Andreas Prlic wrote:
>
>> Hi Andy,
>>
>> best thing is to download the jar files from
>> http://biojava.org/wiki/BioJava:Download .
>>
>> Probably the easiest way to get started is to create a new project in
>> Eclipse and right click on the project -> Properties -> Java build path ->
>> Libraries -> Add jars
>>
>> Then your project will know where to find the dependencies and you
>> can start writing your own code.
>>
>> Andreas
>>
>>
>> On Sat, Nov 7, 2009 at 10:06 AM, Andy Lu wrote:
>>
>>> Hi, I am able to get JMol to run on Eclipse, but I am having a lot of
>>> trouble getting biojava to run, I am not sure how to set up all of the
>>> class
>>> path, etc.
>>> I am new to using Eclipse and biojava. Is there a specific step by step
>>> instruction online available?
>>>
>>> Any help would be greatly appreciated!
>>
>>
>
>
> --
> Andy Lu
>
From andy.law at roslin.ed.ac.uk Sat Nov 7 15:21:59 2009
From: andy.law at roslin.ed.ac.uk (andy law (RI))
Date: Sat, 7 Nov 2009 20:21:59 +0000
Subject: [Biojava-l] Bio Java installation inquiry
In-Reply-To: <59a41c430911071014t2aeb0646u7d1cc237e1a61180@mail.gmail.com>
References: <4a1a3f7d0911071006i45d25a71jfc4bfffcc46ae839@mail.gmail.com>,
<59a41c430911071014t2aeb0646u7d1cc237e1a61180@mail.gmail.com>
Message-ID: <2FA0B8F4EACC05449112A4C02C6DACC00431C3FC17@ebrcexch1.ebrc.bbsrc.ac.uk>
Andreas,
When will the mavenised version of biojava be officially released?
Later,
Andy
________________________________________
From: biojava-l-bounces at lists.open-bio.org [biojava-l-bounces at lists.open-bio.org] On Behalf Of Andreas Prlic [andreas at sdsc.edu]
Sent: 07 November 2009 18:14
To: Andy Lu
Cc: biojava-l at biojava.org
Subject: Re: [Biojava-l] Bio Java installation inquiry
Hi Andy,
best thing is to download the jar files from
http://biojava.org/wiki/BioJava:Download .
Probably the easiest way to get started is to create a new project in Eclipse
and right click on the project -> Properties -> Java build path -> Libraries
-> Add jars
Then your project will know where to find the dependencies and you can
start writing your own code.
Andreas
On Sat, Nov 7, 2009 at 10:06 AM, Andy Lu wrote:
> Hi, I am able to get JMol to run on Eclipse, but I am having a lot of
> trouble getting biojava to run, I am not sure how to set up all of the
> class
> path, etc.
> I am new to using Eclipse and biojava. Is there a specific step by step
> instruction online available?
>
> Any help would be greatly appreciated!
From andreas at sdsc.edu Sun Nov 8 01:52:00 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sat, 7 Nov 2009 22:52:00 -0800
Subject: [Biojava-l] Bio Java installation inquiry
In-Reply-To: <2FA0B8F4EACC05449112A4C02C6DACC00431C3FC17@ebrcexch1.ebrc.bbsrc.ac.uk>
References: <4a1a3f7d0911071006i45d25a71jfc4bfffcc46ae839@mail.gmail.com>
<59a41c430911071014t2aeb0646u7d1cc237e1a61180@mail.gmail.com>
<2FA0B8F4EACC05449112A4C02C6DACC00431C3FC17@ebrcexch1.ebrc.bbsrc.ac.uk>
Message-ID: <59a41c430911072252j40e22912k85886a5d61427bba@mail.gmail.com>
Hi Andy,
At present, the plan is to spend some more time working on the modules
and then make a release (called 3.0) at some point shortly after the
hackathon in Cambridge in January. Early adopters can already use the
modules via SVN.
Andreas
On Sat, Nov 7, 2009 at 12:21 PM, andy law (RI) wrote:
> Andreas,
>
> When will the mavenised version of biojava be officially released?
>
> Later,
>
> Andy
> ________________________________________
> From: biojava-l-bounces at lists.open-bio.org [
> biojava-l-bounces at lists.open-bio.org] On Behalf Of Andreas Prlic [
> andreas at sdsc.edu]
> Sent: 07 November 2009 18:14
> To: Andy Lu
> Cc: biojava-l at biojava.org
> Subject: Re: [Biojava-l] Bio Java installation inquiry
>
> Hi Andy,
>
> best thing is to download the jar files from
> http://biojava.org/wiki/BioJava:Download .
>
> Probably the easiest way to get started is to create a new project in
> eclipse
> and right click on the project-> Properties -> Java build path -> Libraries
> -> Add jars
>
> Then your project will know where to find the dependencies and you can
> start writing your own code.
>
> Andreas
>
>
> On Sat, Nov 7, 2009 at 10:06 AM, Andy Lu wrote:
>
> > Hi, I am able to get JMol to run on Eclipse, but I am having a lot of
> > trouble getting biojava to run, I am not sure how to set up all of the
> > class
> > path, etc.
> > I am new to using Eclipse and biojava. Is there a specific step by step
> > instruction online available?
> >
> > Any help would be greatly appreciated!
> > _______________________________________________
> > Biojava-l mailing list - Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
> >
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
From jbdundas at gmail.com Tue Nov 10 09:23:10 2009
From: jbdundas at gmail.com (jitesh dundas)
Date: Tue, 10 Nov 2009 19:53:10 +0530
Subject: [Biojava-l] Java Error:- XML Parsing Error: XML or text
declaration not at start of entity
In-Reply-To: <326ea8620911080222q225cb6a4m4957be8dd5c9f91f@mail.gmail.com>
References: <326ea8620911010739t1e658509h7dd33ba8482312f8@mail.gmail.com>
<326ea8620911010741q3880a13g53626d94d0d2abd2@mail.gmail.com>
<59a41c430911010806u62f45b90ic4a9563a27ee12e2@mail.gmail.com>
<326ea8620911080222q225cb6a4m4957be8dd5c9f91f@mail.gmail.com>
Message-ID: <326ea8620911100623k4daa1222s60481a9f35777c31@mail.gmail.com>
Dear Friends,
Thank you for your help and advice.
The code at the following URL is working fine ->
http://gist.github.com/229248 (this is my code, which was uploaded by a
wise group member. Many thanks to him for doing that)
Hope this helps..
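On the question quoted below (only 20 records come back for "cancer"): ESearch returns 20 IDs per request unless the retmax parameter is given, and retstart selects the offset of the first ID, so larger result sets can be fetched page by page. The sketch below only builds the request URLs; ESearchUrl and the page size of 500 are made-up illustrations, not code from the gist.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds paged ESearch URLs using the retstart/retmax parameters.
final class ESearchUrl {

    static final String BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi";

    static String build(String db, String term, int relDate, int retStart, int retMax) {
        return BASE
                + "?db=" + encode(db)
                + "&term=" + encode(term)
                + "&reldate=" + relDate
                + "&retstart=" + retStart   // index of the first ID to return (0-based)
                + "&retmax=" + retMax;      // number of IDs to return in this response
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        int pageSize = 500; // assumption: pick a page size that suits your application
        for (int page = 0; page < 3; page++) {
            System.out.println(build("pubmed", "cancer", 10, page * pageSize, pageSize));
        }
    }
}
```

The total number of hits is reported in the Count element of each response, which tells the caller when to stop paging.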
Regards,
JItesh Dundas
On Sun, Nov 8, 2009 at 3:52 PM, jitesh dundas wrote:
> Dear Sir,
>
> My program is working fine and can send me an xml file with 20
> records. However, it does not allow me to send large amounts of
> records.
>
> For e.g. if I enter "cancer" it will return only 20 records.
>
> Can you please tell me what I should do next to get all those records.
> Thank you in advance
>
> Regards,
> Jitesh Dundas
>
> On Sun, Nov 1, 2009 at 9:36 PM, Andreas Prlic wrote:
> >
> > Hi Jitesh,
> >
> > It is hard to read your code with all the formatting off (probably due to
> > email) and the many commented-out lines that don't seem to get used. Can you
> > provide the stack trace, so we can see what part of biojava is affected?
> >
> > Probably a good strategy to write and debug this is to simplify the problem
> > into smaller steps. Try to first download the files you want to parse and
> > write the code to parse them from the local file. That will avoid any
> > issues you might encounter with networking and server/client communication.
> > Once the parsing is working you could take it to the next step and add the
> > server communication...
> >
> > Andreas
> >
> >
> >
> >
> > On Sun, Nov 1, 2009 at 7:41 AM, jitesh dundas
> wrote:
> >>
> >> Hi friends,
> >>
> >> I am getting this error on doing a post(using the code below) to this
> url->
> >>
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=cancer&reldate=10
> >>
> >> I have written this code in .jsp file. Later I will change it into
> servlet.
> >>
> >> Error:-
> >> XML Parsing Error: XML or text declaration not at start of entity
> >> Location:
> >>
> http://localhost:8080/ProteomDb/ImportFromPubmed2.jsp?txtDbName=pubmed&txtTerm=cancer&txtreldate=10&comSDay=01&comSMonth=01&txtSYear=&comEDay=01&comEMonth=01&txtEYear=&txtURLString=http%3A%2F%2Feutils.ncbi.nlm.nih.gov%2Fentrez%2Feutils%2Fesearch.fcgi%3Fdb%3Dpubmed%26term%3Dcancer%26reldate%3D10&txtsubmit=Fetch+Data+From+NCBI
> >> Line Number 11, Column 1: >> PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN" "
> >> http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd
> ">2034200
> >> 19877350 19877304 19877297
> >> 19877284 19877271 19877265
> >> 19877250 19877245 19877226
> >> 19877210 19877179 19877175
> >> 19877161 19877159 19877158
> >> 19877123 19877122 19877120
> >> 19877119 19877118
> >> cancer
> >> "neoplasms"[MeSH Terms] OR "neoplasms"[All Fields] OR "cancer"[All
> >> Fields]
> >> "neoplasms"[MeSH Terms] MeSH
> >> Terms 2082133 Y
> >> "neoplasms"[All Fields]
> All
> >> Fields 1634731 Y
> >> OR "cancer"[All
> Fields]
> >> All Fields 902537
> Y
> >> OR GROUP
> >> 2009/10/22[EDAT] EDAT 0
> >> Y
> >> 2009/11/01[EDAT] EDAT 0
> >> Y RANGE AND
> >> ("neoplasms"[MeSH Terms] OR
> >> "neoplasms"[All Fields] OR "cancer"[All Fields]) AND 2009/10/22[EDAT] :
> >> 2009/11/01[EDAT]
> >> ^
> >>
> >> As you can see, the XML output is coming fine but the above error does
> not
> >> go..The output via this program should be just like hitting manually the
> >> above URL in the browser..
> >> The browser is Mozilla Firefox.
> >>
> >> Code:-
> >>
> >> <%@ page language = "java" %>
> >> <%@ page import = "java.sql.*" %>
> >> <%@ page import = "java.util.*" %>
> >> <%@ page import = "java.io.*" %>
> >> <%@ page import="java.lang.*" %>
> >> <%@ page import="java.net.*" %>
> >> <%@ page import="java.nio.*" %>
> >> <%@ page contentType="text/xml; charset=utf-8" pageEncoding="UTF-8" %>
> >>
> >>
> >> <%
> >>
> >> try
> >> {
> >> //String str = "";
> >> //out.println("");
> >>
> >> Properties systemSettings = System.getProperties();
> >> systemSettings.put("http.proxyHost", "********");
> >> systemSettings.put("http.proxyPort", "******");
> >> systemSettings.put("sun.net.client.defaultConnectTimeout", "10000");
> >> systemSettings.put("sun.net.client.defaultReadTimeout", "10000");
> >>
> >> //out.println("Properties Set");
> >> Authenticator.setDefault(new Authenticator()
> >> {
> >> protected PasswordAuthentication getPasswordAuthentication()
> >> {
> >> return new PasswordAuthentication("**",
> >> "******".toCharArray()); // specify ur user name password of iitb login
> >> }
> >> });
> >>
> >>
> >> System.setProperties(systemSettings);
> >> //out.println("After Authentication & Properties Settings");
> >>
> >> //create xml file.
> >> //the input to google api
> >> //String textAreaContent = request.getParameter("text");
> >> String textAreaContent = "This si a tst";
> >>
> >> String str = "";
> >>
> >> //xml file generation ends here..
> >> //FetchDataFromNCBI_URLString.jsp
> >> String URLString = request.getParameter("txtURLString").trim();
> >>
> >> //URL url = new URL("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=BAA20519");
> >> URL url = new URL(URLString); //url string taken from user input.
> >> HttpURLConnection connection = null;
> >>
> >> connection = (HttpURLConnection) url.openConnection();
> >> System.out.println("After open connection");
> >> connection.setRequestMethod("POST");
> >> connection.setDoInput(true);
> >> connection.setDoOutput(true);
> >>
> >> connection.setUseCaches(false);
> >> connection.setAllowUserInteraction(false);
> >> //connection.setFollowRedirects(true);
> >> //connection.setInstanceFollowRedirects(true);
> >> //System.out.println("Before-------------------");
> >> connection.setRequestProperty("Content-Type", "text/xml; charset=\"utf-8\"");
> >> //System.out.println("After-------------------");
> >>
> >> //System.out.println(""+ connection.getOutputStream());
> >>
> >> //System.out.println("After dataoutputstream..Line No-65");
> >>
> >> //System.out.println("Response Code="+ connection.getResponseCode);
> >>
> >> OutputStreamWriter dosout = new
> >> OutputStreamWriter(connection.getOutputStream());
> >> //System.out.println("After dosout object..Line No-63");
> >> //dosout.write(str);
> >> dosout.close ();
> >>
> >> BufferedReader in = new BufferedReader( new InputStreamReader(
> >> connection.getInputStream()));
> >>
> >> String decodedString;
> >> String tempstr = "";
> >>
> >>
> >> while ((decodedString = in.readLine()) != null)
> >> {
> >> tempstr = tempstr + decodedString;
> >> //out.println(decodedString);
> >> }
> >> out.println(tempstr);
> >> in.close();
> >> }
> >> catch(Exception ex)
> >> {
> >> out.println("Exception->"+ex);
> >> PrintWriter pw = response.getWriter();
> >> ex.printStackTrace(pw);
> >> }
> >>
> >>
> >> %>
> >>
> >> Thanks in advance..
> >>
> >> Regards,
> >> JItesh Dundas
> >>
> >> _______________________________________________
> >> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/biojava-l
> >
> >
>
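On the original "XML or text declaration not at start of entity" error quoted above: one likely cause (an assumption, the thread does not confirm it) is that the JSP writes a line break for every page directive before the fetched XML, so the <?xml ...?> declaration is no longer the first thing in the response, which would fit "Line Number 11, Column 1". A minimal sketch of cleaning the fetched text before writing it out follows; XmlProlog is a made-up helper name.

```java
// Hypothetical helper: makes sure nothing precedes the XML declaration in a fetched document.
final class XmlProlog {

    /** Drops a byte-order mark and any leading whitespace from the given text. */
    static String stripLeadingJunk(String xml) {
        int i = 0;
        while (i < xml.length()
                && (xml.charAt(i) == '\uFEFF' || Character.isWhitespace(xml.charAt(i)))) {
            i++;
        }
        return xml.substring(i);
    }

    public static void main(String[] args) {
        String fetched = "\n\n<?xml version=\"1.0\"?><eSearchResult/>";
        // Prints the document starting directly with "<?xml"
        System.out.println(stripLeadingJunk(fetched));
    }
}
```

This only cleans the fetched string. The page itself must also not emit anything first, for example by setting trimDirectiveWhitespaces="true" in the page directive (JSP 2.1 and later) or by moving the code into a servlet as planned.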
From oliver.stolpe at fu-berlin.de Thu Nov 12 08:18:52 2009
From: oliver.stolpe at fu-berlin.de (Oliver Stolpe)
Date: Thu, 12 Nov 2009 14:18:52 +0100
Subject: [Biojava-l] SeqIOTools deprecated,
looking for alternatives // RichSeq.IOTools
Message-ID: <4AFC0B3C.3010503@fu-berlin.de>
Hello *,
the cookbook uses in its examples the SeqIOTools-class for reading the
files. But in the API it is marked as deprecated. Now I am looking for
alternatives, so I searched the list and internet and found out that
biojavax provides methods and classes for reading the files
(RichSequence.IOTools).
For example, I try to read an EMBL-file:
--begin:code--
BufferedReader br = new BufferedReader(new FileReader(filename));
Namespace ns = RichObjectFactory.getDefaultNamespace();
RichSequenceIterator seqs = RichSequence.IOTools.readEMBLDNA(br, ns);
while (seqs.hasNext()) {
RichSequence seq = seqs.nextRichSequence();
System.out.println(seq.getName() + ":" + seq.getAnnotation().asMap());
}
--end:code--
But I always get this error message:
--begin:error--
org.biojava.bio.BioException: Could not read sequence
at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
at ReadGenbankFile.EMBL(ReadGenbankFile.java:42)
at ReadGenbankFile.main(ReadGenbankFile.java:85)
Caused by: org.biojava.bio.seq.io.ParseException:
A Exception Has Occurred During Parsing.
Please submit the details that follow to biojava-l at biojava.org or post a
bug report to http://bugzilla.open-bio.org/
Format_object=org.biojavax.bio.seq.io.EMBLFormat
Accession=null
Id=not set
Comments=
Parse_block=ID AJ243265_2; parent: AJ243265AC AJ243265;FT
CDS join(<1082..1272,2484..2638,4926..>5041)
/codon_start=3
/gene="PGM1"
/product="phosphoglucomutase 1"
/function="carbohydrate metabolism"
/EC_number="5.4.2.2"
/db_xref="GOA:Q9H1D2"
/db_xref="HGNC:8905"
/db_xref="HSSP:3PMG"
/db_xref="InterPro:IPR016055"
/db_xref="UniProtKB/TrEMBL:Q9H1D2"
/protein_id="CAC19809.1"
/translation="VGPYVKKILCEELGAPANSAVNCVPLEDFGGHHPDPNLTYAADLV
ETMKSGEHDFGAAFDGDGDRNMILGKHGFFVNPSDSVAVIAANTFSIPYFQQTGVRGFA
RSMPTSGALDRVASATKIALYETPTGWKFFGNLMDASKLSLCGEESFGT"SQ
Sequence 462 BP;
Stack trace follows ....
at org.biojavax.bio.seq.io.EMBLFormat.readSection(EMBLFormat.java:775)
at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:284)
at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
... 2 more
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -3
at java.lang.String.substring(String.java:1949)
at java.lang.String.substring(String.java:1916)
at org.biojavax.bio.seq.io.EMBLFormat.readSection(EMBLFormat.java:761)
... 4 more
--end:error--
The file looks all ok I think and works well with the deprecated SeqIOTools:
--begin:embl-file--
ID AJ243265_2; parent: AJ243265
AC AJ243265;
FT CDS join(<1082..1272,2484..2638,4926..>5041)
FT /codon_start=3
FT /gene="PGM1"
FT /product="phosphoglucomutase 1"
FT /function="carbohydrate metabolism"
FT /EC_number="5.4.2.2"
FT /db_xref="GOA:Q9H1D2"
FT /db_xref="HGNC:8905"
FT /db_xref="HSSP:3PMG"
FT /db_xref="InterPro:IPR016055"
FT /db_xref="UniProtKB/TrEMBL:Q9H1D2"
FT /protein_id="CAC19809.1"
FT /translation="VGPYVKKILCEELGAPANSAVNCVPLEDFGGHHPDPNLTYAADLV
FT ETMKSGEHDFGAAFDGDGDRNMILGKHGFFVNPSDSVAVIAANTFSIPYFQQTGVRGFA
FT RSMPTSGALDRVASATKIALYETPTGWKFFGNLMDASKLSLCGEESFGT"
SQ Sequence 462 BP;
ttgtgggacc gtatgtaaag aagatcctct gtgaagaact cggtgcccct gcgaactcgg 60
cagttaactg cgttcctctg gaggactttg gaggccacca ccctgacccc aacctcacct 120
atgcagctga cctggtggag accatgaagt caggagagca tgattttggg gctgcctttg 180
atggagatgg ggatcgaaac atgattctgg gcaagcatgg gttctttgtg aacccttcag 240
actctgtggc tgtcattgct gccaacacct tcagcattcc gtatttccag cagactgggg 300
tccgcggttt tgcacggagc atgcccacga gtggtgctct ggaccgggtg gctagtgcta 360
caaagattgc tttgtatgag accccaactg gctggaagtt ttttgggaat ttgatggacg 420
cgagcaaact gtccctttgt ggggaggaga gcttcgggac cg 462
//
--end:embl-file--
The parser always crashes before reading the sequence (ttgt..., directly
after the BP;).
Any suggestions on how I can get this to work?
Or are there other alternatives for substituting the deprecated
SeqIOTools-class?
Thanks in advance,
with best regards,
Oliver
From holland at eaglegenomics.com Fri Nov 13 06:21:47 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 13 Nov 2009 11:21:47 +0000
Subject: [Biojava-l] SeqIOTools deprecated,
looking for alternatives // RichSeq.IOTools
In-Reply-To: <4AFC0B3C.3010503@fu-berlin.de>
References: <4AFC0B3C.3010503@fu-berlin.de>
Message-ID: <05574914-87FC-44BB-90F1-75C79670A8EC@eaglegenomics.com>
Hello,
The file you are parsing is not a valid EMBL format file. The EMBL format is specified here:
http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4
and this is what the file should look like for your accession:
http://www.ebi.ac.uk/cgi-bin/emblfetch?style=html&id=AJ243265&Submit=Go
The most obvious problems in your file are the absence of the required 'XX' section delimiters, and an invalid ID line. There might be other problems too but I haven't checked the whole file, just the first few lines.
The deprecated SeqIOTools really didn't care if the file was valid or not, they basically just made a copy of all the lines in an internal token/value map. They made no attempt to parse or understand the data in each line. The new RichSequence-based parsers actually attempt to enforce the file format definitions and break down and understand the contents of each line. This means that they will reject any file that does not strictly conform to the specified format.
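The two problems named above (no 'XX' delimiters, invalid ID line) can be spotted before the file is handed to the parser. The following is a rough sketch only: EmblPrecheck is a made-up helper, it assumes the current ID line layout from the EMBL user manual (seven semicolon-separated fields, e.g. "ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP."), and it does not reproduce the checks that EMBLFormat itself performs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-check: flags the most obvious structural problems in an EMBL flat-file record.
final class EmblPrecheck {

    static List<String> problems(String emblText) {
        List<String> found = new ArrayList<String>();
        String[] lines = emblText.split("\r?\n");

        // The record must open with an ID line carrying seven semicolon-separated fields.
        String first = lines.length > 0 ? lines[0] : "";
        if (!first.startsWith("ID   ")) {
            found.add("first line is not an ID line");
        } else if (first.substring(5).split(";").length != 7) {
            found.add("ID line does not have 7 semicolon-separated fields");
        }

        boolean hasSpacer = false;
        boolean hasTerminator = false;
        for (String line : lines) {
            if (line.trim().equals("XX")) hasSpacer = true;      // section delimiter
            if (line.trim().equals("//")) hasTerminator = true;  // end-of-record marker
        }
        if (!hasSpacer) found.add("no XX section delimiters");
        if (!hasTerminator) found.add("no // record terminator");
        return found;
    }

    public static void main(String[] args) {
        String suspect = "ID   AJ243265_2; parent: AJ243265\nAC   AJ243265;\n//";
        for (String problem : problems(suspect)) {
            System.out.println(problem);
        }
    }
}
```

An empty list does not prove the file is valid; it only means these particular heuristics found nothing.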
cheers,
Richard
On 12 Nov 2009, at 13:18, Oliver Stolpe wrote:
> Hello *,
>
> the cookbook uses in its examples the SeqIOTools-class for reading the files. But in the API it is marked as deprecated. Now I am looking for alternatives, so I searched the list and internet and found out that biojavax provides methods and classes for reading the files (RichSequence.IOTools).
>
> For example, I try to read an EMBL-file:
>
> --begin:code--
>
> BufferedReader br = new BufferedReader(new FileReader(filename));
> Namespace ns = RichObjectFactory.getDefaultNamespace();
> RichSequenceIterator seqs = RichSequence.IOTools.readEMBLDNA(br, ns);
>
> while (seqs.hasNext()) {
> RichSequence seq = seqs.nextRichSequence();
> System.out.println(seq.getName() + ":" + seq.getAnnotation().asMap());
> }
>
> --end:code--
>
> But I always get this error message:
>
> --begin:error--
>
> org.biojava.bio.BioException: Could not read sequence
> at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
> at ReadGenbankFile.EMBL(ReadGenbankFile.java:42)
> at ReadGenbankFile.main(ReadGenbankFile.java:85)
> Caused by: org.biojava.bio.seq.io.ParseException:
>
> A Exception Has Occurred During Parsing.
> Please submit the details that follow to biojava-l at biojava.org or post a bug report to http://bugzilla.open-bio.org/
>
> Format_object=org.biojavax.bio.seq.io.EMBLFormat
> Accession=null
> Id=not set
> Comments=
> Parse_block=ID AJ243265_2; parent: AJ243265AC AJ243265;FT CDS join(<1082..1272,2484..2638,4926..>5041)
> /codon_start=3
> /gene="PGM1"
> /product="phosphoglucomutase 1"
> /function="carbohydrate metabolism"
> /EC_number="5.4.2.2"
> /db_xref="GOA:Q9H1D2"
> /db_xref="HGNC:8905"
> /db_xref="HSSP:3PMG"
> /db_xref="InterPro:IPR016055"
> /db_xref="UniProtKB/TrEMBL:Q9H1D2"
> /protein_id="CAC19809.1"
> /translation="VGPYVKKILCEELGAPANSAVNCVPLEDFGGHHPDPNLTYAADLV
> ETMKSGEHDFGAAFDGDGDRNMILGKHGFFVNPSDSVAVIAANTFSIPYFQQTGVRGFA
> RSMPTSGALDRVASATKIALYETPTGWKFFGNLMDASKLSLCGEESFGT"SQ Sequence 462 BP;
> Stack trace follows ....
>
>
> at org.biojavax.bio.seq.io.EMBLFormat.readSection(EMBLFormat.java:775)
> at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:284)
> at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
> ... 2 more
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -3
> at java.lang.String.substring(String.java:1949)
> at java.lang.String.substring(String.java:1916)
> at org.biojavax.bio.seq.io.EMBLFormat.readSection(EMBLFormat.java:761)
> ... 4 more
>
> --end:error--
>
> The file looks all ok I think and works well with the deprecated SeqIOTools:
>
> --begin:embl-file--
> ID AJ243265_2; parent: AJ243265
> AC AJ243265;
> FT CDS join(<1082..1272,2484..2638,4926..>5041)
> FT /codon_start=3
> FT /gene="PGM1"
> FT /product="phosphoglucomutase 1"
> FT /function="carbohydrate metabolism"
> FT /EC_number="5.4.2.2"
> FT /db_xref="GOA:Q9H1D2"
> FT /db_xref="HGNC:8905"
> FT /db_xref="HSSP:3PMG"
> FT /db_xref="InterPro:IPR016055"
> FT /db_xref="UniProtKB/TrEMBL:Q9H1D2"
> FT /protein_id="CAC19809.1"
> FT /translation="VGPYVKKILCEELGAPANSAVNCVPLEDFGGHHPDPNLTYAADLV
> FT ETMKSGEHDFGAAFDGDGDRNMILGKHGFFVNPSDSVAVIAANTFSIPYFQQTGVRGFA
> FT RSMPTSGALDRVASATKIALYETPTGWKFFGNLMDASKLSLCGEESFGT"
> SQ Sequence 462 BP;
> ttgtgggacc gtatgtaaag aagatcctct gtgaagaact cggtgcccct gcgaactcgg 60
> cagttaactg cgttcctctg gaggactttg gaggccacca ccctgacccc aacctcacct 120
> atgcagctga cctggtggag accatgaagt caggagagca tgattttggg gctgcctttg 180
> atggagatgg ggatcgaaac atgattctgg gcaagcatgg gttctttgtg aacccttcag 240
> actctgtggc tgtcattgct gccaacacct tcagcattcc gtatttccag cagactgggg 300
> tccgcggttt tgcacggagc atgcccacga gtggtgctct ggaccgggtg gctagtgcta 360
> caaagattgc tttgtatgag accccaactg gctggaagtt ttttgggaat ttgatggacg 420
> cgagcaaact gtccctttgt ggggaggaga gcttcgggac cg 462
> //
> --end:embl-file--
>
> The parser always crashes before reading the sequence (ttgt..., directly after the BP;).
>
> Any suggestions how I get this work?
> Or are there other alternatives for substituting the deprecated SeqIOTools-class?
>
> Thanks in advance,
>
> with best regards,
>
> Oliver
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From tiagoantao at gmail.com Fri Nov 13 07:25:41 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Fri, 13 Nov 2009 12:25:41 +0000
Subject: [Biojava-l] Newick/Nexus processing of non-binary trees
In-Reply-To: <6d941f120911060640v29b52bedxd120e39d092a1e88@mail.gmail.com>
References: <6d941f120911060526r14d43c43ncb5541d89a8dcaa3@mail.gmail.com>
<120EB3C1-0043-4E8C-8637-F16B3B56094B@eaglegenomics.com>
<6d941f120911060640v29b52bedxd120e39d092a1e88@mail.gmail.com>
Message-ID: <6d941f120911130425h31840441he04ce4f5bd88092f@mail.gmail.com>
Hi,
> My suggestion: for somebody else to verify my findings. I might be
> doing something stupidly wrong. Maybe things are correct. Just a
> simple tree like (1,2,3) (as long as it is not binary) - should expose
> the problem.
>
As nobody has answered, here is my take:
1. The error reported probably exists
2. Most probably nobody is using the parser (as it only supports binary trees).
In this light, changing the API should not be a problem at all.
I would not mind correcting the problem (I have already corrected the
previous 2 ones in my local version).
I would suggest removing the call to the unweighted graph. Reasons:
1. A weighted version is enough. If branch lengths are not specified,
then weights could be set to 0. There would then be no decrease in
functionality.
2. Severely reducing the size of the code is important. Clearly the
code is not much maintained (and I am not offering to maintain it in
the long run, just to put it in good shape) and not much used.
Therefore a smaller, easier-to-manage code base makes even more
sense.
If you accept a solution along these lines, I would correct all the
bugs and also include test code (which is also missing).
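To make the non-binary case concrete, here is a minimal, self-contained sketch of a recursive-descent reader for a subset of Newick (names and optional ":length" only, no quoting or comments). MiniNewick is a made-up class, not the BioJava parser and not the patch discussed here; it only shows that a tree like (1,2,3) needs nothing more than a child list per node.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini-parser for a Newick subset; handles any number of children per node.
final class MiniNewick {

    static final class Node {
        String name = "";
        double length = 1.0;  // assumption: default used when no ":length" is given
        final List<Node> children = new ArrayList<Node>();
    }

    private final String text;
    private int pos = 0;

    private MiniNewick(String text) {
        this.text = text;
    }

    static Node parse(String newick) {
        MiniNewick parser = new MiniNewick(newick.trim());
        Node root = parser.subtree();
        if (parser.peek() != ';') {
            throw new IllegalArgumentException("expected ';' at position " + parser.pos);
        }
        return root;
    }

    // subtree := [ '(' subtree { ',' subtree } ')' ] [ name ] [ ':' length ]
    private Node subtree() {
        Node node = new Node();
        if (peek() == '(') {
            do {
                pos++;  // skip '(' on the first pass, ',' on later passes
                node.children.add(subtree());
            } while (peek() == ',');
            if (peek() != ')') {
                throw new IllegalArgumentException("expected ')' at position " + pos);
            }
            pos++;
        }
        node.name = token();
        if (peek() == ':') {
            pos++;
            node.length = Double.parseDouble(token());
        }
        return node;
    }

    private char peek() {
        return pos < text.length() ? text.charAt(pos) : '\0';
    }

    // Reads up to the next structural character and returns the trimmed text.
    private String token() {
        int start = pos;
        while (pos < text.length() && "(),:;".indexOf(text.charAt(pos)) < 0) {
            pos++;
        }
        return text.substring(start, pos).trim();
    }

    public static void main(String[] args) {
        Node root = parse("(1,2,3);");
        System.out.println("children of root: " + root.children.size());
    }
}
```

Because children are kept in a list, binary and multifurcating nodes go through the same code path, which also makes the test cases easy to write.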
--
"The hottest places in hell are reserved for those who, in times of
moral crisis, maintain a neutrality." - Dante
From holland at eaglegenomics.com Fri Nov 13 07:42:03 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 13 Nov 2009 12:42:03 +0000
Subject: [Biojava-l] Newick/Nexus processing of non-binary trees
In-Reply-To: <6d941f120911130425h31840441he04ce4f5bd88092f@mail.gmail.com>
References: <6d941f120911060526r14d43c43ncb5541d89a8dcaa3@mail.gmail.com>
<120EB3C1-0043-4E8C-8637-F16B3B56094B@eaglegenomics.com>
<6d941f120911060640v29b52bedxd120e39d092a1e88@mail.gmail.com>
<6d941f120911130425h31840441he04ce4f5bd88092f@mail.gmail.com>
Message-ID: <6467088B-93FA-48D2-A7C4-27CD238CE1AE@eaglegenomics.com>
I'm all for that.
The original code was developed by a Google Summer of Code student, who we haven't heard much from since. :(
cheers,
Richard
On 13 Nov 2009, at 12:25, Tiago Antão wrote:
> Hi,
>
>> My suggestion: for somebody else to verify my findings. I might be
>> doing something stupidly wrong. Maybe things are correct. Just a
>> simple tree like (1,2,3) (as long as it is not binary) - should expose
>> the problem.
>>
>
> Has nobody answered here is my take:
>
> 1. The error reported probably exists
> 2. Most probably nobody is using the parser (as it only supports binary trees).
>
> In this light, changing the API should not be a problem at all.
>
> I would not mind correcting the problem (I have already corrected the
> previous 2 ones in my local version).
> I would suggest removing the call to the unweighted graph. Reasons:
> 1. A weighted version is enough. If branch lengths are not specified,
> then weights could be set to 0. There there would not be a decrease in
> functionality.
> 2. Severely reducing the size of the code is important. Clearly the
> code is not much maintained (and I am not offering to maintain it in
> the long run, just putting it in good shape) and not much used.
> Therefore a smaller, more easy to manage code base makes even more
> sense.
>
> If you accept a solution along these lines. I would correct all the
> bugs and also include test code (which is also missing).
>
>
>
> --
> "The hottest places in hell are reserved for those who, in times of
> moral crisis, maintain a neutrality." - Dante
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From thasso.griebel at uni-jena.de Fri Nov 13 09:51:41 2009
From: thasso.griebel at uni-jena.de (Thasso Griebel)
Date: Fri, 13 Nov 2009 15:51:41 +0100
Subject: [Biojava-l] Newick/Nexus processing of non-binary trees
In-Reply-To: <6d941f120911130425h31840441he04ce4f5bd88092f@mail.gmail.com>
References: <6d941f120911060526r14d43c43ncb5541d89a8dcaa3@mail.gmail.com>
<120EB3C1-0043-4E8C-8637-F16B3B56094B@eaglegenomics.com>
<6d941f120911060640v29b52bedxd120e39d092a1e88@mail.gmail.com>
<6d941f120911130425h31840441he04ce4f5bd88092f@mail.gmail.com>
Message-ID:
Hi,
> 1. A weighted version is enough. If branch lengths are not specified,
> then weights could be set to 0. There there would not be a decrease in
> functionality.
Just my two cents, but I would go with a default weight of 1.0. If you
read something unweighted you would ignore the edge weights anyway,
but, for example, if you write something simple that computes path
lengths, a default weight of 1.0 ensures that the method also works
for "unweighted" trees, where the length of a path is defined as the
number of edges you need to traverse to move from, say, A to B. I think
the argument also holds for other algorithms used on trees and graphs.
Anyway, just my two cents.
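A small illustration of the point, as a sketch only (DefaultWeightDemo and its map-based tree are made up for this example and have nothing to do with the BioJava classes): with a default of 1.0, the same path-length routine returns the edge count for an unweighted tree and the summed branch lengths for a weighted one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo: path length from a node up to the root with a default edge weight.
final class DefaultWeightDemo {

    static final double DEFAULT_WEIGHT = 1.0;

    /**
     * @param parent maps each non-root node to its parent
     * @param length maps a node to the length of the edge to its parent; a missing entry means "unspecified"
     */
    static double distanceToRoot(String node, Map<String, String> parent, Map<String, Double> length) {
        double total = 0.0;
        for (String current = node; parent.containsKey(current); current = parent.get(current)) {
            Double weight = length.get(current);
            total += (weight != null) ? weight : DEFAULT_WEIGHT;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, String> parent = new HashMap<String, String>();
        parent.put("A", "X");
        parent.put("X", "root");

        // No branch lengths given: the result is the number of edges between A and the root.
        System.out.println(distanceToRoot("A", parent, new HashMap<String, Double>()));
    }
}
```

With a default of 0 the unweighted call would return 0.0 for every node, so the routine would carry no information for trees without branch lengths.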
-thasso
--
Dipl. Inf. Thasso Griebel-------------------Lehrstuhl fuer Bioinformatik
Office 3426--http://bio.informatik.uni-jena.de--Institut fuer Informatik
Phone +49 (0)3641 9-46454-----------Friedrich-Schiller-Universitaet Jena
Fax +49 (0)3641 9-46452----------Ernst-Abbe-Platz 2, 07743 Jena, Germany
From tiagoantao at gmail.com Fri Nov 13 09:54:08 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Fri, 13 Nov 2009 14:54:08 +0000
Subject: [Biojava-l] Newick/Nexus processing of non-binary trees
In-Reply-To:
References: <6d941f120911060526r14d43c43ncb5541d89a8dcaa3@mail.gmail.com>
<120EB3C1-0043-4E8C-8637-F16B3B56094B@eaglegenomics.com>
<6d941f120911060640v29b52bedxd120e39d092a1e88@mail.gmail.com>
<6d941f120911130425h31840441he04ce4f5bd88092f@mail.gmail.com>
Message-ID: <6d941f120911130654n55153f50r477cd11c281bd9a1@mail.gmail.com>
2009/11/13 Thasso Griebel :
> just my two cents, but I would go with a default weight of 1.0. If you read
> something unweighted you would ignore the edge weights anyways, but, for
> example, if you write something simple that computes path lengths, a default
> weight of 1.0 ensures that the method also works for "unweighted" trees,
> where the length of a path is defined as the number of edges you need to
> traverse to move from say A to B. I think the argument also hold for other
> algorithms used on trees and graphs.
OK, I will do this.
From holland at eaglegenomics.com Fri Nov 13 10:04:27 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 13 Nov 2009 15:04:27 +0000
Subject: [Biojava-l] How do I read a FASTA file containing protein
sequences in lowercase?
In-Reply-To: <16d769b70911090826j135f2ddar13e4fc861b78e4fc@mail.gmail.com>
References: <16d769b70911060825v298529b5m805c1f7fd388549b@mail.gmail.com>
<16d769b70911060854o69705434x574ce2d1c7d85699@mail.gmail.com>
<179C4DF7-69AF-4E65-8E6A-F984DF7CCE69@eaglegenomics.com>
<16d769b70911090826j135f2ddar13e4fc861b78e4fc@mail.gmail.com>
Message-ID: <2180C289-D2F7-4910-8534-9A94B1003941@eaglegenomics.com>
I've applied the patch to the trunk of biojava-live. Thanks!
Richard
On 9 Nov 2009, at 16:26, Carl M?sak wrote:
> Richard (>):
>> Ah OK I see what's going on.
>>
>> The convenience method you're using, RichSequence.IOTools.readStream(), uses
>> FastaFormat to try and guess the alphabet to use based on the first line of
>> the input sequence.
>>
>> In FastaFormat, it does this by searching for matching non-DNA symbols. The
>> search is case-sensitive:
>>
>> protected static final Pattern aminoAcids =
>> Pattern.compile(".*[FLIPQE].*");
>>
>> FastaFormat needs patching to make this pattern non-case-sensitive.
>
> Patch attached.
>
> I also took the opportunity to remove the occurrences of .* in the
> Pattern above. Generally, one should be using Matcher.find() when one
> is interested in matching a part of a string. This is more efficient
> than using Matcher.matches() and surrounding the desired regular
> expression with .*, since the latter will cause a lot of unnecessary
> backtracking and make the search quadratic.
>
> This effect only shows up for very long strings, but long strings can
> and do happen in bioinformatics. The below measurements show the
> quadratic behaviour of the former approach.
>
> $ for length in 100 1000 10000 100000 1000000; do (time java WithDotStar $length) 2>&1 | grep real; done
> real 0m0.371s
> real 0m0.367s
> real 0m0.577s
> real 0m2.735s
> real 0m25.275s
>
> $ for length in 100 1000 10000 100000 1000000; do (time java WithoutDotStar $length) 2>&1 | grep real; done
> real 0m0.309s
> real 0m0.361s
> real 0m0.468s
> real 0m1.184s
> real 0m9.703s
>
> Kindly,
> // Carl
>
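For readers without the attachment, the change Carl describes can be sketched as follows. The first pattern is the one quoted above; the second is only an illustration of "case-insensitive, no surrounding .*, use find()", since the actual patch is not shown in this thread. AminoAcidGuess and its method names are made up.

```java
import java.util.regex.Pattern;

// Hypothetical comparison of the quoted pattern with a case-insensitive find()-based variant.
final class AminoAcidGuess {

    // Pattern quoted in the thread: case-sensitive, wrapped in .* so that matches() can be used.
    static final Pattern OLD = Pattern.compile(".*[FLIPQE].*");

    // Sketch of the patched idea: same character class, case-insensitive, used with find().
    static final Pattern PATCHED = Pattern.compile("[FLIPQE]", Pattern.CASE_INSENSITIVE);

    static boolean oldLooksLikeProtein(String firstLine) {
        return OLD.matcher(firstLine).matches();
    }

    static boolean patchedLooksLikeProtein(String firstLine) {
        return PATCHED.matcher(firstLine).find();
    }

    public static void main(String[] args) {
        String lowercaseProtein = "mkvlaa";
        System.out.println("old:     " + oldLooksLikeProtein(lowercaseProtein));
        System.out.println("patched: " + patchedLooksLikeProtein(lowercaseProtein));
    }
}
```

find() stops at the first matching character, so the regex engine never has to let a leading .* consume the input and then back off.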
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From andylu0320 at gmail.com Sun Nov 15 17:07:41 2009
From: andylu0320 at gmail.com (Andy Lu)
Date: Sun, 15 Nov 2009 17:07:41 -0500
Subject: [Biojava-l] JMol I/O
Message-ID: <4a1a3f7d0911151407w2f0fe3bfyfdcf94a1dae48fe5@mail.gmail.com>
Hi, sorry to bother everyone again.
I have a simple question. I am using the SimpleJMolExample.java provided
on the website and it works. But for a pdb file containing about 20 atoms,
all of the atoms show up in JMol for 1 second and then disappear. Is it
because the color changes, or is there some atom size restriction? It
works for files that contain a much larger number of atoms.
If I open the file manually in JMol through the open option, it shows
up nicely. Is there a way to make the pdb file displayed in JMol
through Biojava have the same color/display as when I open it manually
through JMol?
Any help would be greatly appreciated!
Thank you!
--
Andy Lu
From tiagoantao at gmail.com Sun Nov 15 18:19:46 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Sun, 15 Nov 2009 23:19:46 +0000
Subject: [Biojava-l] Newick parser
Message-ID: <6d941f120911151519s732144fu1150d672107fca1e@mail.gmail.com>
Hi,
I have made the changes as discussed, the code is attached to the
bugzilla bug concerning part of the issues that were found.
A few notes:
1. There is a ParserException raised on TreeBlock. Though there is a
TreeBlockParser, most of the important parsing was (and still is!)
being done in TreeBlock. I would imagine that this is not the best
design, but I did not change it.
2. I made some test cases. Also included.
3. I don't mind producing some documentation, in case you accept the code.
4. I noticed a few more minor bugs (like eating spaces in the names of
nodes). But they are really minor.
5. The API was changed, but I suppose not many people were parsing
trees. If there were, the bug of not being able to process non-binary
trees would most probably have been detected, as it is pretty major.
Tiago
--
"Pessimism of the Intellect; Optimism of the Will" -Antonio Gramsci
From andreas at sdsc.edu Sun Nov 15 23:41:57 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 15 Nov 2009 20:41:57 -0800
Subject: [Biojava-l] JMol I/O
In-Reply-To: <4a1a3f7d0911151407w2f0fe3bfyfdcf94a1dae48fe5@mail.gmail.com>
References: <4a1a3f7d0911151407w2f0fe3bfyfdcf94a1dae48fe5@mail.gmail.com>
Message-ID: <59a41c430911152041w592e43c2w3048b916a1855b85@mail.gmail.com>
Hi Andy,
probably you are trying to visualize a small molecule in Jmol, but the
visualization script you are sending only works if you have several
C-alpha atoms available. Try something like "select * ; spacefill
on;". Jmol has a powerful scripting language which is probably worth
having a look at, if you want to work with it more closely.
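
A rough sketch of the idea above, in plain Java without any Jmol or BioJava calls: count the C-alpha atoms in the PDB text and fall back to the "select * ; spacefill on;" script when there are too few. The class name, the method names, the minCalpha threshold and the backboneScript parameter (standing in for whatever script the example currently sends) are all made up for illustration.

```java
import java.util.List;

final class JmolScriptChooser {

    /** Counts ATOM records whose atom-name field (PDB columns 13-16) is "CA". */
    static int countCalphaAtoms(List<String> pdbLines) {
        int count = 0;
        for (String line : pdbLines) {
            // HETATM records are skipped on purpose, so a calcium ion
            // (also written "CA") is not mistaken for a C-alpha atom.
            if (line.length() >= 16
                    && line.startsWith("ATOM  ")
                    && line.substring(12, 16).trim().equals("CA")) {
                count++;
            }
        }
        return count;
    }

    /**
     * Returns backboneScript when the structure has at least minCalpha
     * C-alpha atoms, otherwise the spacefill script suggested in this thread.
     */
    static String chooseScript(List<String> pdbLines, int minCalpha,
                               String backboneScript) {
        if (countCalphaAtoms(pdbLines) >= minCalpha) {
            return backboneScript;
        }
        return "select * ; spacefill on;";
    }
}
```

The returned string would then be handed to the viewer in the same way the example already passes its script.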
Andreas
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
From holland at eaglegenomics.com Mon Nov 16 03:39:02 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 16 Nov 2009 08:39:02 +0000
Subject: [Biojava-l] Newick parser
In-Reply-To: <6d941f120911151519s732144fu1150d672107fca1e@mail.gmail.com>
References: <6d941f120911151519s732144fu1150d672107fca1e@mail.gmail.com>
Message-ID:
Patch applied to the trunk of biojava-live. Thanks for fixing it!
cheers,
Richard
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From tiagoantao at gmail.com Mon Nov 16 07:35:11 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Mon, 16 Nov 2009 12:35:11 +0000
Subject: [Biojava-l] Newick parser
In-Reply-To:
References: <6d941f120911151519s732144fu1150d672107fca1e@mail.gmail.com>
Message-ID: <6d941f120911160435o4d232187mbca004fedd1e5624@mail.gmail.com>
I can easily solve 2679 as well, as it is in precisely the file that I
was changing. If there is interest, I'll just solve it.
--
"Pessimism of the Intellect; Optimism of the Will" -Antonio Gramsci
From holland at eaglegenomics.com Mon Nov 16 07:41:50 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 16 Nov 2009 12:41:50 +0000
Subject: [Biojava-l] Newick parser
In-Reply-To: <6d941f120911160435o4d232187mbca004fedd1e5624@mail.gmail.com>
References: <6d941f120911151519s732144fu1150d672107fca1e@mail.gmail.com>
<6d941f120911160435o4d232187mbca004fedd1e5624@mail.gmail.com>
Message-ID:
yes please!
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From tiagoantao at gmail.com Mon Nov 16 12:52:56 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Mon, 16 Nov 2009 17:52:56 +0000
Subject: [Biojava-l] Newick parser
In-Reply-To:
References: <6d941f120911151519s732144fu1150d672107fca1e@mail.gmail.com>
<6d941f120911160435o4d232187mbca004fedd1e5624@mail.gmail.com>
Message-ID: <6d941f120911160952y36b26e40r4ffa5dd980e012fd@mail.gmail.com>
I've submitted a patch to 2679. Please have a look and see if you like it.
--
"Pessimism of the Intellect; Optimism of the Will" -Antonio Gramsci
From tiagoantao at gmail.com Tue Nov 17 14:57:50 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 17 Nov 2009 19:57:50 +0000
Subject: [Biojava-l] Fwd: Newick parser
In-Reply-To: <6d941f120911171154q31df1c32taff20f5b282867bc@mail.gmail.com>
References: <6d941f120911151519s732144fu1150d672107fca1e@mail.gmail.com>
<59a41c430911171123j43806c25vda67e406aa2d3caa@mail.gmail.com>
<6d941f120911171154q31df1c32taff20f5b282867bc@mail.gmail.com>
Message-ID: <6d941f120911171157x20daedcdif84496c363a5bfcd@mail.gmail.com>
Forwarding this to the users mailing list also, as there might be some
interest in the documentation.
---------- Forwarded message ----------
From: Tiago Antão
Date: 2009/11/17
Subject: Re: [Biojava-l] Newick parser
To: Richard Holland
Cc: Andreas Prlic , biojava-dev
As this was all fresh in my head, I wrote a small tutorial:
http://tiago.org/cc/2009/11/17/reading-newicknexus-phylogenetic-trees-with-biojava/
As I don't follow the biojava mailing list regularly (or bug reports),
if some bug arises on this code, feel free to send me an email to my
personal account: If I have some time to spare, I will have a look at
it.
Tiago
2009/11/17 Richard Holland :
> Sorry - forgot to change the filenames in the test (under the new modular system they're in a different place than in the non-modular codebase that Tiago was working from). Fixed and committed.
>
> On 17 Nov 2009, at 19:23, Andreas Prlic wrote:
>
>> Hi Richard,
>>
>> I just did an update of my checkout and it seems the -phylo unit tests
>> don't compile any more. Can you take a look?
>>
>> Thanks,
>> Andreas
>>
>> Test set: org.biojavax.bio.phylo.io.nexus.TreesBlockTest
>> -------------------------------------------------------------------------------
>> Tests run: 9, Failures: 0, Errors: 9, Skipped: 0, Time elapsed: 0.063
>> sec <<< FAILURE!
>> testSimple(org.biojavax.bio.phylo.io.nexus.TreesBlockTest)  Time elapsed: 0.021 sec  <<< ERROR!
>> java.lang.NullPointerException
>>       at org.biojavax.bio.phylo.io.nexus.TreesBlockTest.getTreeNode(TreesBlockTest.java:160)
>>       at org.biojavax.bio.phylo.io.nexus.TreesBlockTest.getTree(TreesBlockTest.java:175)
>>       at org.biojavax.bio.phylo.io.nexus.TreesBlockTest.doVertexCount(TreesBlockTest.java:139)
>>       at org.biojavax.bio.phylo.io.nexus.TreesBlockTest.testSimple(TreesBlockTest.java:63)
>>
>> testThreeOffspring(org.biojavax.bio.phylo.io.nexus.TreesBlockTest)  Time elapsed: 0.002 sec  <<< ERROR!
>> java.lang.NullPointerException
>>       at org.biojavax.bio.phylo.io.nexus.TreesBlockTest.getTreeNode(TreesBlockTest.java:160)
>>       at org.biojavax.bio.phylo.io.nexus.TreesBlockTest.getTree(TreesBlockTest.java:175)
>>       at org.biojavax.bio.phylo.io.nexus.TreesBlockTest.doVertexCount(TreesBlockTest.java:139)
>>       at org.biojavax.bio.phylo.io.nexus.TreesBlockTest.testThreeOffspring(TreesBlockTest.java:70
>>
--
"Pessimism of the Intellect; Optimism of the Will" -Antonio Gramsci
From mara.axiom at gmail.com Sat Nov 21 00:43:52 2009
From: mara.axiom at gmail.com (Mara Axiom)
Date: Sat, 21 Nov 2009 00:43:52 -0500
Subject: [Biojava-l] Algorithm to compare protein sequences
Message-ID: <6375ed360911202143sb6956e7n9788a3a603e69bd9@mail.gmail.com>
Hello all,
I am looking for an algorithm to compare protein sequences and output the
result in Newick format, for a project. I was told that I could not use
UPGMA and Nearest Neighbor algorithms. I'm new to working with phylogenetic
data. Any help is appreciated.
Thanks,
Mara
From andreas.draeger at uni-tuebingen.de Sat Nov 21 03:35:19 2009
From: andreas.draeger at uni-tuebingen.de (Andreas Dräger)
Date: Sat, 21 Nov 2009 09:35:19 +0100
Subject: [Biojava-l] Algorithm to compare protein sequences
In-Reply-To: <6375ed360911202143sb6956e7n9788a3a603e69bd9@mail.gmail.com>
References: <6375ed360911202143sb6956e7n9788a3a603e69bd9@mail.gmail.com>
Message-ID: <4B07A647.1080405@uni-tuebingen.de>
Hi Mara,
At the moment there are two alignment algorithms available:
Smith-Waterman for local and Needleman-Wunsch for global alignment. In
addition to that there is a package for hidden Markov models that is
also able to perform sequence alignments (see the BioJava cookbook for
examples). However, currently both approaches will write the alignment
similar to the BLAST output and not in this Newick format (I am actually
not familiar with that). I hope that helps.
Cheers
Andreas
--
Dipl.-Bioinform. Andreas Dräger
Eberhard Karls University Tübingen
Center for Bioinformatics (ZBIT)
Sand 1
72076 Tübingen
Germany
Phone: +49-7071-29-70436
Fax: +49-7071-29-5091
From thasso.griebel at uni-jena.de Sat Nov 21 06:25:34 2009
From: thasso.griebel at uni-jena.de (Thasso Griebel)
Date: Sat, 21 Nov 2009 12:25:34 +0100
Subject: [Biojava-l] Algorithm to compare protein sequences
In-Reply-To: <4B07A647.1080405@uni-tuebingen.de>
References: <6375ed360911202143sb6956e7n9788a3a603e69bd9@mail.gmail.com>
<4B07A647.1080405@uni-tuebingen.de>
Message-ID:
Hi,
if I understand correctly, you want to do three things:
1. Create a multiple sequence alignment.
2. Create a pairwise distance matrix from the alignment.
3. Use a distance-based tree construction method (agglomerative clustering (UPGMA, WPGMA, ...) or Neighbor Joining) to create a tree. The tree can be printed as a Newick string.
I don't know if all of this is possible with BioJava. If not, I could at least provide code to create the pairwise distance matrix (including JC and Kimura corrections) and for the clustering algorithms. I thought NJ and AgglomerativeClustering were already implemented, but I couldn't find the classes in the 1.7 API.
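
To make step 2 concrete, here is a minimal sketch in plain Java (no BioJava classes) that takes already aligned sequences, computes the fraction of differing ungapped columns (the p-distance) and applies a Jukes-Cantor style correction d = -b ln(1 - p/b) with b = 1 - 1/states (states = 20 for amino acids, 4 for nucleotides). The class and method names are invented for this example, and it is not the code offered above.

```java
final class PairwiseDistances {

    /** Fraction of differing columns, ignoring columns with a gap ('-'). */
    static double pDistance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("aligned sequences must have equal length");
        }
        int compared = 0;
        int different = 0;
        for (int i = 0; i < a.length(); i++) {
            char x = Character.toUpperCase(a.charAt(i));
            char y = Character.toUpperCase(b.charAt(i));
            if (x == '-' || y == '-') {
                continue; // skip gapped columns
            }
            compared++;
            if (x != y) {
                different++;
            }
        }
        if (compared == 0) {
            throw new IllegalArgumentException("no ungapped columns to compare");
        }
        return (double) different / compared;
    }

    /** Jukes-Cantor style correction for an alphabet with the given number of states. */
    static double jukesCantor(double p, int states) {
        if (p == 0.0) {
            return 0.0;
        }
        double b = 1.0 - 1.0 / states;
        if (p >= b) {
            return Double.POSITIVE_INFINITY; // saturated: correction undefined
        }
        return -b * Math.log(1.0 - p / b);
    }

    /** Symmetric matrix of corrected distances for all sequence pairs. */
    static double[][] distanceMatrix(String[] aligned, int states) {
        int n = aligned.length;
        double[][] d = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double v = jukesCantor(pDistance(aligned[i], aligned[j]), states);
                d[i][j] = v;
                d[j][i] = v;
            }
        }
        return d;
    }
}
```

The resulting matrix is the input a distance-based method such as the ones in step 3 would consume; the tree building itself is left out here.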
If you don't need to do the computations programmatically, you can also try
http://bio.informatik.uni-jena.de/epos/
though with the currently released version you have to do the alignment externally. The next release will also provide a way to do multiple sequence alignments directly.
Another alternative is
http://gi.cebitec.uni-bielefeld.de/qalign
QAlign can be used to create the alignment (using clustalw, tcoffee or dialign) and create an NJ or agglomerative tree in one step. A nice thing is that you can manipulate the alignment (i.e. insert gaps) and the tree is updated continuously.
cheers,
thasso
--
Dipl. Inf. Thasso Griebel-------------------Lehrstuhl fuer Bioinformatik
Office 3426--http://bio.informatik.uni-jena.de--Institut fuer Informatik
Phone +49 (0)3641 9-46454-----------Friedrich-Schiller-Universitaet Jena
Fax +49 (0)3641 9-46452----------Ernst-Abbe-Platz 2, 07743 Jena, Germany
From jbdundas at gmail.com Sun Nov 8 05:22:59 2009
From: jbdundas at gmail.com (jitesh dundas)
Date: Sun, 08 Nov 2009 10:22:59 -0000
Subject: [Biojava-l] Java Error:- XML Parsing Error: XML or text
declaration not at start of entity
In-Reply-To: <59a41c430911010806u62f45b90ic4a9563a27ee12e2@mail.gmail.com>
References: <326ea8620911010739t1e658509h7dd33ba8482312f8@mail.gmail.com>
<326ea8620911010741q3880a13g53626d94d0d2abd2@mail.gmail.com>
<59a41c430911010806u62f45b90ic4a9563a27ee12e2@mail.gmail.com>
Message-ID: <326ea8620911080222q225cb6a4m4957be8dd5c9f91f@mail.gmail.com>
Dear Sir,
My program is working fine and returns an XML file with 20
records. However, it does not let me retrieve larger numbers of
records.
For example, if I enter "cancer" it will return only 20 records.
Thank you in advance
Regards,
Jitesh Dundas
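
A possible direction, not confirmed anywhere in this thread: as far as I know, esearch returns only the first 20 IDs unless the retmax parameter is given, and retstart sets the offset of the first ID returned. The sketch below only builds such a URL around the address from the original post; the class and method names are made up, and the server-side limit on retmax should be checked in the E-utilities documentation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class EsearchPaging {

    // Base address taken from the original post.
    static final String ESEARCH =
            "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi";

    /** Builds an esearch URL that asks for retMax IDs starting at retStart. */
    static String buildUrl(String term, int relDate, int retStart, int retMax) {
        try {
            return ESEARCH
                    + "?db=pubmed"
                    + "&term=" + URLEncoder.encode(term, "UTF-8")
                    + "&reldate=" + relDate
                    + "&retstart=" + retStart
                    + "&retmax=" + retMax;
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }
}
```

Calling buildUrl("cancer", 10, 0, 100), then the same with retStart 100, 200 and so on, would page through the result until the total reported in the Count element of the response has been reached.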
On Sun, Nov 1, 2009 at 9:36 PM, Andreas Prlic wrote:
>
> Hi Jitesh,
>
> It is hard to read your code with all the formatting off, probably due to email, and with many commented lines that don't seem to get used. Can you provide the stacktrace, so we can see what part of biojava is affected?
>
> Probably a good strategy to write and debug this is to simplify the problem into smaller steps. Try to first download the files you want to parse and write the code to parse them from the local file. That will avoid any issues you might encounter with networking and server/client communication. Once the parsing is working you could take it to the next step and add the server communication...
>
> Andreas
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ImportFromPubmed3.jsp
Type: application/octet-stream
Size: 2696 bytes
Desc: not available
URL:
From cmasak at gmail.com Mon Nov 9 11:26:00 2009
From: cmasak at gmail.com (Carl Mäsak)
Date: Mon, 9 Nov 2009 17:26:00 +0100
Subject: [Biojava-l] How do I read a FASTA file containing protein
sequences in lowercase?
In-Reply-To: <179C4DF7-69AF-4E65-8E6A-F984DF7CCE69@eaglegenomics.com>
References: <16d769b70911060825v298529b5m805c1f7fd388549b@mail.gmail.com>
<16d769b70911060854o69705434x574ce2d1c7d85699@mail.gmail.com>
<179C4DF7-69AF-4E65-8E6A-F984DF7CCE69@eaglegenomics.com>
Message-ID: <16d769b70911090826j135f2ddar13e4fc861b78e4fc@mail.gmail.com>
Richard (>):
> Ah OK I see what's going on.
>
> The convenience method you're using, RichSequence.IOTools.readStream(), uses
> FastaFormat to try and guess the alphabet to use based on the first line of
> the input sequence.
>
> In FastaFormat, it does this by searching for matching non-DNA symbols. The
> search is case-sensitive:
>
>        protected static final Pattern aminoAcids = Pattern.compile(".*[FLIPQE].*");
>
> FastaFormat needs patching to make this pattern non-case-sensitive.
Patch attached.
I also took the opportunity to remove the occurrences of .* in the
Pattern above. Generally, one should be using Matcher.find() when one
is interested in matching a part of a string. This is more efficient
than using Matcher.matches() and surrounding the desired regular
expression with .*, since the latter will cause a lot of unnecessary
backtracking and make the search quadratic.
This effect only shows up for very long strings, but long strings can
and do happen in bioinformatics. The measurements below compare the
.* approach (WithDotStar) with Matcher.find() (WithoutDotStar).
$ for length in 100 1000 10000 100000 1000000; do (time java WithDotStar $length) 2>&1 | grep real; done
real 0m0.371s
real 0m0.367s
real 0m0.577s
real 0m2.735s
real 0m25.275s
$ for length in 100 1000 10000 100000 1000000; do (time java WithoutDotStar $length) 2>&1 | grep real; done
real 0m0.309s
real 0m0.361s
real 0m0.468s
real 0m1.184s
real 0m9.703s
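
For readers without the attachments, the change described above can be sketched roughly as follows. This is not the attached patch: the class name and the looksLikeProtein helper are invented for illustration, and only the [FLIPQE] character class comes from the FastaFormat snippet quoted above.

```java
import java.util.regex.Pattern;

final class AminoAcidGuess {

    // No surrounding .* and case-insensitive, so lowercase input is handled too.
    static final Pattern AMINO_ACIDS =
            Pattern.compile("[FLIPQE]", Pattern.CASE_INSENSITIVE);

    /** True if the line contains at least one symbol that is not a DNA letter. */
    static boolean looksLikeProtein(String line) {
        // find() stops at the first matching character; matches() would
        // require the pattern to cover the whole string.
        return AMINO_ACIDS.matcher(line).find();
    }
}
```

With this, looksLikeProtein("mkvlaa") is true because of the lowercase l, while a line made only of a, c, g, t and n is not flagged.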
Kindly,
// Carl
-------------- next part --------------
A non-text attachment was scrubbed...
Name: aminoAcids.patch
Type: application/octet-stream
Size: 1995 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: WithDotStar.java
Type: application/octet-stream
Size: 634 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: WithoutDotStar.java
Type: application/octet-stream
Size: 633 bytes
Desc: not available
URL:
From holland at eaglegenomics.com Mon Nov 23 14:08:11 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 23 Nov 2009 19:08:11 +0000
Subject: [Biojava-l] Java Error:- XML Parsing Error: XML or text
declaration not at start of entity
In-Reply-To: <326ea8620911080222q225cb6a4m4957be8dd5c9f91f@mail.gmail.com>
References: <326ea8620911010739t1e658509h7dd33ba8482312f8@mail.gmail.com>
<326ea8620911010741q3880a13g53626d94d0d2abd2@mail.gmail.com>
<59a41c430911010806u62f45b90ic4a9563a27ee12e2@mail.gmail.com>
<326ea8620911080222q225cb6a4m4957be8dd5c9f91f@mail.gmail.com>
Message-ID: <2582F218-3873-49FB-BFB2-6F72B2B4815C@eaglegenomics.com>
Your program takes an input 'txtURLString' - could you give an example of the value that this usually contains? I suspect that this URL is where your problem lies but without seeing an example value I couldn't say for sure.
thanks,
Richard
>> On Sun, Nov 1, 2009 at 7:41 AM, jitesh dundas wrote:
>>>
>>> Hi friends,
>>>
>>> I am getting this error on doing a post(using the code below) to this url->
>>> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=cancer&reldate=10
>>>
>>> I have written this code in .jsp file. Later I will change it into servlet.
>>>
>>> Error:-
>>> XML Parsing Error: XML or text declaration not at start of entity
>>> Location:
>>> http://localhost:8080/ProteomDb/ImportFromPubmed2.jsp?txtDbName=pubmed&txtTerm=cancer&txtreldate=10&comSDay=01&comSMonth=01&txtSYear=&comEDay=01&comEMonth=01&txtEYear=&txtURLString=http%3A%2F%2Feutils.ncbi.nlm.nih.gov%2Fentrez%2Feutils%2Fesearch.fcgi%3Fdb%3Dpubmed%26term%3Dcancer%26reldate%3D10&txtsubmit=Fetch+Data+From+NCBI
>>> Line Number 11, Column 1:>> PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN" "
>>> http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd">2034200
>>> 19877350 19877304 19877297
>>> 19877284 19877271 19877265
>>> 19877250 19877245 19877226
>>> 19877210 19877179 19877175
>>> 19877161 19877159 19877158
>>> 19877123 19877122 19877120
>>> 19877119 19877118
>>> cancer
>>> "neoplasms"[MeSH Terms] OR "neoplasms"[All Fields] OR "cancer"[All
>>> Fields]
>>> "neoplasms"[MeSH Terms] MeSH
>>> Terms 2082133 Y
>>> "neoplasms"[All Fields] All
>>> Fields 1634731 Y
>>> OR "cancer"[All Fields]
>>> All Fields 902537 Y
>>> OR GROUP
>>> 2009/10/22[EDAT] EDAT 0
>>> Y
>>> 2009/11/01[EDAT] EDAT 0
>>> Y RANGE AND
>>> ("neoplasms"[MeSH Terms] OR
>>> "neoplasms"[All Fields] OR "cancer"[All Fields]) AND 2009/10/22[EDAT] :
>>> 2009/11/01[EDAT]
>>> ^
>>>
>>> As you can see, the XML output is coming fine but the above error does not
>>> go..The output via this program should be just like hitting manually the
>>> above URL in the browser..
>>> The browser is Mozilla Firefox.
>>>
>>> Code:-
>>>
>>> <%@ page language = "java" %>
>>> <%@ page import = "java.sql.*" %>
>>> <%@ page import = "java.util.*" %>
>>> <%@ page import = "java.io.*" %>
>>> <%@ page import="java.lang.*" %>
>>> <%@ page import="java.net.*" %>
>>> <%@ page import="java.nio.*" %>
>>> <%@ page contentType="text/xml; charset=utf-8" pageEncoding="UTF-8" %>
>>>
>>>
>>> <%
>>>
>>> try
>>> {
>>> //String str = "";
>>> //out.println("");
>>>
>>> Properties systemSettings = System.getProperties();
>>> systemSettings.put("http.proxyHost", "********");
>>> systemSettings.put("http.proxyPort", "******");
>>> systemSettings.put("sun.net.client.defaultConnectTimeout", "10000");
>>> systemSettings.put("sun.net.client.defaultReadTimeout", "10000");
>>>
>>> //out.println("Properties Set");
>>> Authenticator.setDefault(new Authenticator()
>>> {
>>> protected PasswordAuthentication getPasswordAuthentication()
>>> {
>>> return new PasswordAuthentication("**",
>>> "******".toCharArray()); // specify your IITB login user name and password
>>> }
>>> });
>>>
>>>
>>> System.setProperties(systemSettings);
>>> //out.println("After Authentication & Properties Settings");
>>>
>>> //create xml file.
>>> //the input to google api
>>> //String textAreaContent = request.getParameter("text");
>>> String textAreaContent = "This si a tst";
>>>
>>> String str = "";
>>>
>>> //xml file generation ends here..
>>> //FetchDataFromNCBI_URLString.jsp
>>> String URLString = request.getParameter("txtURLString").trim();
>>>
>>> //URL url = new URL("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=BAA20519");
>>> URL url = new URL(URLString); //url string taken from user input.
>>> HttpURLConnection connection = null;
>>>
>>> connection = (HttpURLConnection) url.openConnection();
>>> System.out.println("After open connection");
>>> connection.setRequestMethod("POST");
>>> connection.setDoInput(true);
>>> connection.setDoOutput(true);
>>>
>>> connection.setUseCaches(false);
>>> connection.setAllowUserInteraction(false);
>>> //connection.setFollowRedirects(true);
>>> //connection.setInstanceFollowRedirects(true);
>>> //System.out.println("Before-------------------");
>>> connection.setRequestProperty("Content-Type", "text/xml; charset=\"utf-8\"");
>>> //System.out.println("After-------------------");
>>>
>>> //System.out.println(""+ connection.getOutputStream());
>>>
>>> //System.out.println("After dataoutputstream..Line No-65");
>>>
>>> //System.out.println("Response Code="+ connection.getResponseCode);
>>>
>>> OutputStreamWriter dosout = new
>>> OutputStreamWriter(connection.getOutputStream());
>>> //System.out.println("After dosout object..Line No-63");
>>> //dosout.write(str);
>>> dosout.close ();
>>>
>>> BufferedReader in = new BufferedReader( new InputStreamReader(
>>> connection.getInputStream()));
>>>
>>> String decodedString;
>>> String tempstr = "";
>>>
>>>
>>> while ((decodedString = in.readLine()) != null)
>>> {
>>> tempstr = tempstr + decodedString;
>>> //out.println(decodedString);
>>> }
>>> out.println(tempstr);
>>> in.close();
>>> }
>>> catch(Exception ex)
>>> {
>>> out.println("Exception->"+ex);
>>> PrintWriter pw = response.getWriter();
>>> ex.printStackTrace(pw);
>>> }
>>>
>>>
>>> %>
>>>
>>> Thanks in advance..
>>>
>>> Regards,
>>> JItesh Dundas
>>>
>>> _______________________________________________
>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>>
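
On the "XML or text declaration not at start of entity" error in the quoted report: Firefox reports the declaration at line 11, so something is written before the XML declaration. A likely cause (an assumption, since the served page source is not shown) is that each JSP directive line and blank line ahead of the scriptlet emits a newline. In JSP 2.1 and later, adding trimDirectiveWhitespaces="true" to a page directive is one way to suppress that. The sketch below uses hypothetical class and method names and the JDK's built-in parser to show the same failure mode, and that removing the leading whitespace avoids it:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical check, not part of the original JSP.
class XmlDeclarationCheck {
    // Returns true if the text parses as well-formed XML, false on a parse error.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?><eSearchResult><Count>1</Count></eSearchResult>";
        // Blank lines ahead of the declaration, as a JSP with several directives might emit.
        String padded = "\n\n\n" + xml;
        System.out.println("with leading newlines: " + parses(padded));
        System.out.println("after trim():          " + parses(padded.trim()));
    }
}
```

In the JSP itself the fix would have to happen before anything is written to the response, which is why the directive attribute (or moving the code to a servlet, as planned) is probably the simpler route.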
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From rabee.a.aa at m.titech.ac.jp Tue Nov 24 05:14:30 2009
From: rabee.a.aa at m.titech.ac.jp (rabee.a.aa at m.titech.ac.jp)
Date: Tue, 24 Nov 2009 19:14:30 +0900
Subject: [Biojava-l] sequencing data analysis
Message-ID: <1259057670648712.31357@mail2.nap.gsic.titech.ac.jp>
Dear Biojava members,
I'm new to BioJava and I would like to use it to analyse next-generation sequencing data.
May I ask which packages are available for analysing sequencing data?
Best Regards,
Rabe
From holland at eaglegenomics.com Tue Nov 24 05:33:43 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 24 Nov 2009 10:33:43 +0000
Subject: [Biojava-l] sequencing data analysis
In-Reply-To: <1259057670648712.31357@mail2.nap.gsic.titech.ac.jp>
References: <1259057670648712.31357@mail2.nap.gsic.titech.ac.jp>
Message-ID: <7ECB9ED5-F983-4E74-8AC1-C70C129EDA7E@eaglegenomics.com>
There are loads of things you can do. A good starting point is here:
http://biojava.org/wiki/BioJava:CookBook
cheers,
Richard
On 24 Nov 2009, at 10:14, wrote:
> Dear Biojava members,
> I'm new to BioJava and I would like to use it to analyse next-generation sequencing data.
> May I ask which packages are available for analysing sequencing data?
>
> Best Regards,
> Rabe
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From jbdundas at gmail.com Tue Nov 24 09:48:55 2009
From: jbdundas at gmail.com (jitesh dundas)
Date: Tue, 24 Nov 2009 20:18:55 +0530
Subject: [Biojava-l] Fwd: Java Error:- XML Parsing Error: XML or text
declaration not at start of entity
In-Reply-To: <326ea8620911240647i686a3488sc9ad46cc314dfbd3@mail.gmail.com>
References: <326ea8620911010739t1e658509h7dd33ba8482312f8@mail.gmail.com>
<326ea8620911010741q3880a13g53626d94d0d2abd2@mail.gmail.com>
<59a41c430911010806u62f45b90ic4a9563a27ee12e2@mail.gmail.com>
<326ea8620911080222q225cb6a4m4957be8dd5c9f91f@mail.gmail.com>
<2582F218-3873-49FB-BFB2-6F72B2B4815C@eaglegenomics.com>
<326ea8620911240647i686a3488sc9ad46cc314dfbd3@mail.gmail.com>
Message-ID: <326ea8620911240648w371c1c7fx7f495133753bbbe@mail.gmail.com>
Dear Sir/Madam,
FYI..
Just trying to contribute to this mailing list and help.
Regards,
Jitesh Dundas
---------- Forwarded message ----------
From: jitesh dundas
Date: Nov 24, 2009 8:17 PM
Subject: Re: [Biojava-l] Java Error:- XML Parsing Error: XML or text
declaration not at start of entity
To: Richard Holland
Dear Sir,
Thank you for your reply. I figured this problem out by fetching records in
small sets, e.g. 20 records per page.
It works like pagination: for each new page, we need to hit the
URL again.
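
A minimal sketch of that paging, for anyone following along: ESearch returns 20 IDs per request by default, and its retstart and retmax parameters select which slice of the result set comes back. The class and method names below are hypothetical, and URLEncoder.encode with a Charset argument assumes Java 10 or later:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that builds one ESearch URL per page of results.
class EsearchPaging {
    static final String ESEARCH =
            "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi";

    static List<String> pageUrls(String db, String term, int totalCount, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        String encodedTerm = URLEncoder.encode(term, StandardCharsets.UTF_8);
        List<String> urls = new ArrayList<>();
        // retstart is the zero-based offset of the first ID on each page.
        for (int start = 0; start < totalCount; start += pageSize) {
            urls.add(ESEARCH + "?db=" + db + "&term=" + encodedTerm
                    + "&retstart=" + start + "&retmax=" + pageSize);
        }
        return urls;
    }

    public static void main(String[] args) {
        // 45 matching records at 20 per page gives three requests.
        for (String url : pageUrls("pubmed", "cancer", 45, 20)) {
            System.out.println(url);
        }
    }
}
```

The total count would come from the Count element of the first ESearch response.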
My functionality is working fine. I will be happy to share my code with you
(and anyone else) who needs it.
I simply fetch data from the URL and write it to an XML file. Next I just read
the XML file and show the records in the web page to the user.
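
As a rough illustration of the save step (a sketch with hypothetical names, not the code offered above): copying the response as bytes keeps the document exactly as the server sent it, whereas the readLine() loop in the quoted JSP joins all lines together without separators.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: stores a response stream in a local XML file.
class SaveResponseToFile {
    // Copies the stream byte for byte and returns the number of bytes written.
    static long save(InputStream response, Path target) throws IOException {
        return Files.copy(response, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for connection.getInputStream(); the XML here is a made-up sample.
        String xml = "<?xml version=\"1.0\"?>\n<eSearchResult><Count>2</Count></eSearchResult>\n";
        Path file = Files.createTempFile("esearch", ".xml");
        save(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), file);
        System.out.print(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```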
Next, I need to know how to fetch records from the protein database. I suspect two
requests are needed.
First we use the ESearch utility and then the EFetch utility to get the data
for the specific protein.
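
A sketch of that two-step flow, under stated assumptions: the class and method names are hypothetical, the IDs in the sample are made up, and rettype=fasta with retmode=text is just one possible choice of EFetch output. Real ESearch responses carry a DOCTYPE, so the sketch switches off external DTD loading through a Xerces feature that the JDK's bundled parser accepts; other parsers may reject it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper: step 1 reads the IDs out of an ESearch result,
// step 2 builds the EFetch URL for those IDs.
class EsearchThenEfetch {
    static final String EFETCH =
            "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi";

    static List<String> extractIds(String esearchXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Assumption: Xerces-specific feature, avoids downloading the eSearch DTD.
        factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(esearchXml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("Id");
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            ids.add(nodes.item(i).getTextContent().trim());
        }
        return ids;
    }

    static String efetchUrl(String db, List<String> ids) {
        return EFETCH + "?db=" + db + "&id=" + String.join(",", ids)
                + "&rettype=fasta&retmode=text";
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample in the shape of an ESearch result.
        String sample = "<eSearchResult><Count>2</Count>"
                + "<IdList><Id>111</Id><Id>222</Id></IdList></eSearchResult>";
        System.out.println(efetchUrl("protein", extractIds(sample)));
    }
}
```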
I welcome any suggestions on this !
Thank you everyone for your help.
Regards,
Jitesh Dundas
From holland at eaglegenomics.com Tue Nov 24 09:51:49 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 24 Nov 2009 14:51:49 +0000
Subject: [Biojava-l] Fwd: Java Error:- XML Parsing Error: XML or text
declaration not at start of entity
References: <326ea8620911240647i686a3488sc9ad46cc314dfbd3@mail.gmail.com>
Message-ID: <02966AA0-0DD3-4EF1-9D02-E86F593D16D8@eaglegenomics.com>
Jitesh - I forwarded your response to the list so that everyone can get the chance to reply.
cheers,
Richard
Begin forwarded message:
> From: jitesh dundas
> Date: 24 November 2009 14:47:00 GMT
> To: Richard Holland
> Subject: Re: [Biojava-l] Java Error:- XML Parsing Error: XML or text declaration not at start of entity
>
> Dear Sir,
>
> Thank you for your reply. I figured this problem out by fetching records in small sets, e.g. 20 records per page.
>
> It works like pagination: for each new page, we need to hit the URL again.
>
> My functionality is working fine. I will be happy to share my code with you (and anyone else) who needs it.
>
> I simply fetch data from the URL and write it to an XML file. Next I just read the XML file and show the records in the web page to the user.
>
> Next, I need to know how to fetch records from the protein database. I suspect two requests are needed.
>
> First we use the ESearch utility and then the EFetch utility to get the data for the specific protein.
>
> I welcome any suggestions on this !
>
> Thank you everyone for your help.
>
> Regards,
> Jitesh Dundas
>
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From holland at eaglegenomics.com Tue Nov 24 10:27:20 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 24 Nov 2009 15:27:20 +0000
Subject: [Biojava-l] Hackathon in January
Message-ID: <1235107C-E1D2-4EEB-8A5F-2335751120EE@eaglegenomics.com>
Hi all.
To anyone planning on attending the BioJava hackathon in Cambridge (UK) in January, now would be a good time to sort out travel arrangements. If you're intending to come but haven't yet said so, please do let me know so that I can ensure we get a big enough room to work in!
cheers,
Richard
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From andreas at sdsc.edu Tue Nov 24 12:54:38 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 24 Nov 2009 09:54:38 -0800
Subject: [Biojava-l] Hackathon in January
In-Reply-To: <1235107C-E1D2-4EEB-8A5F-2335751120EE@eaglegenomics.com>
References: <1235107C-E1D2-4EEB-8A5F-2335751120EE@eaglegenomics.com>
Message-ID: <59a41c430911240954g589496dbg60a829d711131fab@mail.gmail.com>
* Is anybody interested in following the goings-on at the hackathon via
an online stream? I received a request about this and am wondering
if more people would be interested.
* Just to repeat the current status regarding the program: so far the
plan is to continue working on the new modules. Ideally we will have a
brand new BioJava 3 ready soon after the hackathon. A more detailed
program for the week will be sent out in January.
If anybody wants to propose feature requests, you can have a look at
the current todo list for the modules:
http://biojava.org/wiki/BioJava:Modules
Andreas
On Tue, Nov 24, 2009 at 7:27 AM, Richard Holland
wrote:
> Hi all.
>
> To anyone planning on attending the BioJava hackathon in Cambridge (UK) in January, now would be a good time to sort out travel arrangements. If you're intending to come but haven't yet said so, please do let me know so that I can ensure we get a big enough room to work in!
>
> cheers,
> Richard
From ayates at ebi.ac.uk Wed Nov 25 07:51:23 2009
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 25 Nov 2009 12:51:23 +0000
Subject: [Biojava-l] [Biojava-dev] Hackathon in January
In-Reply-To: <59a41c430911240954g589496dbg60a829d711131fab@mail.gmail.com>
References: <1235107C-E1D2-4EEB-8A5F-2335751120EE@eaglegenomics.com>
<59a41c430911240954g589496dbg60a829d711131fab@mail.gmail.com>
Message-ID: <4DA69755-069B-4E70-BBE2-AEB408AE7E02@ebi.ac.uk>
By online stream do you mean Wave or Twitter or something else
trendy? :)
Andy
On 24 Nov 2009, at 17:54, Andreas Prlic wrote:
> * Is anybody interested in following the goings-on at the hackathon via
> an online stream? I received a request about this and am wondering
> if more people would be interested.
>
> * Just to repeat the current status regarding the program: so far the
> plan is to continue working on the new modules. Ideally we will have a
> brand new BioJava 3 ready soon after the hackathon. A more detailed
> program for the week will be sent out in January.
>
> If anybody wants to propose feature requests, you can have a look at
> the current todo list for the modules:
> http://biojava.org/wiki/BioJava:Modules
>
> Andreas
From andreas at sdsc.edu Wed Nov 25 15:08:33 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 25 Nov 2009 12:08:33 -0800
Subject: [Biojava-l] [Biojava-dev] Hackathon in January
In-Reply-To: <4DA69755-069B-4E70-BBE2-AEB408AE7E02@ebi.ac.uk>
References: <1235107C-E1D2-4EEB-8A5F-2335751120EE@eaglegenomics.com>
<59a41c430911240954g589496dbg60a829d711131fab@mail.gmail.com>
<4DA69755-069B-4E70-BBE2-AEB408AE7E02@ebi.ac.uk>
Message-ID: <59a41c430911251208u44bb1218l7cb0065657ed7227@mail.gmail.com>
I was thinking about video... I would expect that some of the
participants will do some sort of tweeting, blogging, etc.
Andreas
On Wed, Nov 25, 2009 at 4:51 AM, Andy Yates wrote:
> By online stream do you mean Wave or Twitter or something else trendy? :)
>
> Andy
>
> On 24 Nov 2009, at 17:54, Andreas Prlic wrote:
>
>> * Is anybody interested in following the ongoings at the hackaton via
>> an online-stream? - I received a request about this and am wondering
>> if more people would be interested in this.
>>
>> * Just to repeat the current status regarding the program: So far the
>> plan is to continue working on the new modules. Ideally we will have a
>> brand new biojava 3 ready soon after the hackaton. A more detailed
>> program for the week will be sent out in January.
>>
>> If anybody wants to propose feature requests, you can have a look at
>> the current todo list for the modules:
>> http://biojava.org/wiki/BioJava:Modules
>>
>> Andreas
>>
>>
>>
>> On Tue, Nov 24, 2009 at 7:27 AM, Richard Holland
>> wrote:
>>>
>>> Hi all.
>>>
>>> To anyone planning on attending the BioJava hackathon in Cambridge (UK)
>>> in January, now would be a good time to sort out travel arrangements. If
>>> you're intending to come but haven't yet said so, please do let me know so
>>> that I can ensure we get a big enough room to work in!
>>>
>>> cheers,
>>> Richard
>>>
>>> --
>>> Richard Holland, BSc MBCS
>>> Operations and Delivery Director, Eagle Genomics Ltd
>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>>> http://www.eaglegenomics.com/
>>>
>>>
>>> _______________________________________________
>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>
>>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
>
From jw12 at sanger.ac.uk Thu Nov 26 09:57:35 2009
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Thu, 26 Nov 2009 14:57:35 +0000
Subject: [Biojava-l] DAS workshop 7th-9th April 2010
Message-ID:
We are considering running a Distributed Annotation System workshop
here at the Sanger/EBI in the UK subject to decent demand.
The workshop will be held from Wednesday 7th-Friday 9th April 2010. If
you would be interested in attending either to present or just take part
then please email me jw12 at sanger.ac.uk
The format of the workshop is likely to be similar to last year's (1st
day for beginners, 2nd for both beginners and advanced users, 3rd day
for advanced), information for which can be found here:
http://www.dasregistry.org/course.jsp
If you would like to present then please send a short summary of what
you would like to talk about.
Thanks
Jonathan.
Jonathan Warren
Senior Developer and DAS coordinator
jw12 at sanger.ac.uk
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
From mauricio at open-bio.org Thu Nov 26 16:45:43 2009
From: mauricio at open-bio.org (Mauricio Herrera Cuadra)
Date: Thu, 26 Nov 2009 15:45:43 -0600
Subject: [Biojava-l] [DAS] DAS workshop 7th-9th April 2010
In-Reply-To:
References:
Message-ID: <4B0EF707.6080202@open-bio.org>
Hi Jonathan,
Any chance it can be webcast? I'm sure it would attract a lot of
remote attendees ;)
Regards,
Mauricio.
Jonathan Warren wrote:
> We are considering running a Distributed Annotation System workshop here
> at the Sanger/EBI in the UK subject to decent demand.
> The workshop will be held from Wednesday 7th-Friday 9th April 2010. If
> you would be interested in attending either to present or just take part
> then please email me jw12 at sanger.ac.uk
>
> The format of the workshop is likely to be similar to last year's (1st
> day for beginners, 2nd for both beginners and advanced users, 3rd day
> for advanced), information for which can be found here:
> http://www.dasregistry.org/course.jsp
>
> If you would like to present then please send a short summary of what
> you would like to talk about.
>
> Thanks
>
> Jonathan.
>
> Jonathan Warren
> Senior Developer and DAS coordinator
> jw12 at sanger.ac.uk
>
From jbdundas at gmail.com Sun Nov 1 15:41:03 2009
From: jbdundas at gmail.com (jitesh dundas)
Date: Sun, 1 Nov 2009 21:11:03 +0530
Subject: [Biojava-l] Java Error:- XML Parsing Error: XML or text declaration
not at start of entity
In-Reply-To: <326ea8620911010739t1e658509h7dd33ba8482312f8@mail.gmail.com>
References: <326ea8620911010739t1e658509h7dd33ba8482312f8@mail.gmail.com>
Message-ID: <326ea8620911010741q3880a13g53626d94d0d2abd2@mail.gmail.com>
Hi friends,
I am getting this error when doing a POST (using the code below) to this URL:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=cancer&reldate=10
I have written this code in a .jsp file. Later I will change it into a servlet.
Error:-
XML Parsing Error: XML or text declaration not at start of entity
Location:
http://localhost:8080/ProteomDb/ImportFromPubmed2.jsp?txtDbName=pubmed&txtTerm=cancer&txtreldate=10&comSDay=01&comSMonth=01&txtSYear=&comEDay=01&comEMonth=01&txtEYear=&txtURLString=http%3A%2F%2Feutils.ncbi.nlm.nih.gov%2Fentrez%2Feutils%2Fesearch.fcgi%3Fdb%3Dpubmed%26term%3Dcancer%26reldate%3D10&txtsubmit=Fetch+Data+From+NCBI
Line Number 11, Column 1:2034200
19877350 19877304 19877297
19877284 19877271 19877265
19877250 19877245 19877226
19877210 19877179 19877175
19877161 19877159 19877158
19877123 19877122 19877120
19877119 19877118
cancer
"neoplasms"[MeSH Terms] OR "neoplasms"[All Fields] OR "cancer"[All
Fields]
"neoplasms"[MeSH Terms] MeSH
Terms 2082133 Y
"neoplasms"[All Fields] All
Fields 1634731 Y
OR "cancer"[All Fields]
All Fields 902537 Y
OR GROUP
2009/10/22[EDAT] EDAT 0
Y
2009/11/01[EDAT] EDAT 0
Y RANGE AND
("neoplasms"[MeSH Terms] OR
"neoplasms"[All Fields] OR "cancer"[All Fields]) AND 2009/10/22[EDAT] :
2009/11/01[EDAT]
^
As you can see, the XML output comes through fine, but the error above does
not go away. The output of this program should be the same as opening the
above URL manually in the browser.
The browser is Mozilla Firefox.
Code:-
<%@ page language = "java" %>
<%@ page import = "java.sql.*" %>
<%@ page import = "java.util.*" %>
<%@ page import = "java.io.*" %>
<%@ page import="java.lang.*" %>
<%@ page import="java.net.*" %>
<%@ page import="java.nio.*" %>
<%@ page contentType="text/xml; charset=utf-8" pageEncoding="UTF-8" %>
<%
try
{
//String str = "";
//out.println("");
Properties systemSettings = System.getProperties();
systemSettings.put("http.proxyHost", "********");
systemSettings.put("http.proxyPort", "******");
systemSettings.put("sun.net.client.defaultConnectTimeout", "10000");
systemSettings.put("sun.net.client.defaultReadTimeout", "10000");
//out.println("Properties Set");
Authenticator.setDefault(new Authenticator()
{
protected PasswordAuthentication getPasswordAuthentication()
{
return new PasswordAuthentication("**",
"******".toCharArray()); // specify your user name and password for the IITB login
}
});
System.setProperties(systemSettings);
//out.println("After Authentication & Properties Settings");
//create xml file.
//the input to google api
//String textAreaContent = request.getParameter("text");
String textAreaContent = "This si a tst";
String str = "";
//xml file generation ends here..
//FetchDataFromNCBI_URLString.jsp
String URLString = request.getParameter("txtURLString").trim();
//URL url = new URL("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=BAA20519");
URL url = new URL(URLString); //url string taken from user input.
HttpURLConnection connection = null;
connection = (HttpURLConnection) url.openConnection();
System.out.println("After open connection");
connection.setRequestMethod("POST");
connection.setDoInput(true);
connection.setDoOutput(true);
connection.setUseCaches(false);
connection.setAllowUserInteraction(false);
//connection.setFollowRedirects(true);
//connection.setInstanceFollowRedirects(true);
//System.out.println("Before-------------------");
connection.setRequestProperty ("Content-Type","text/xml; charset=\"utf-8\"");
//System.out.println("After-------------------");
//System.out.println(""+ connection.getOutputStream());
//System.out.println("After dataoutputstream..Line No-65");
//System.out.println("Response Code="+ connection.getResponseCode);
OutputStreamWriter dosout = new
OutputStreamWriter(connection.getOutputStream());
//System.out.println("After dosout object..Line No-63");
//dosout.write(str);
dosout.close ();
BufferedReader in = new BufferedReader( new InputStreamReader(
connection.getInputStream()));
String decodedString;
String tempstr = "";
while ((decodedString = in.readLine()) != null)
{
tempstr = tempstr + decodedString;
//out.println(decodedString);
}
out.println(tempstr);
in.close();
}
catch(Exception ex)
{
out.println("Exception->"+ex);
PrintWriter pw = response.getWriter();
ex.printStackTrace(pw);
}
%>
Thanks in advance..
Regards,
JItesh Dundas
From andreas at sdsc.edu Sun Nov 1 16:06:29 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 1 Nov 2009 08:06:29 -0800
Subject: [Biojava-l] Java Error:- XML Parsing Error: XML or text
declaration not at start of entity
In-Reply-To: <326ea8620911010741q3880a13g53626d94d0d2abd2@mail.gmail.com>
References: <326ea8620911010739t1e658509h7dd33ba8482312f8@mail.gmail.com>
<326ea8620911010741q3880a13g53626d94d0d2abd2@mail.gmail.com>
Message-ID: <59a41c430911010806u62f45b90ic4a9563a27ee12e2@mail.gmail.com>
Hi Jitesh,
It is hard to read your code with all the formatting off, probably due to
email, and with many commented-out lines that don't seem to get used. Can you
provide the stack trace, so we can see what part of biojava is affected?
Probably a good strategy to write and debug this is to simplify the problem
into smaller steps. Try to first download the files you want to parse and
write the code to parse them from the local file. That will avoid any
issues you might encounter with networking and server/client communication.
Once the parsing is working you could take it to the next step and add the
server communication...
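A minimal sketch of that first step, assuming the eSearch response has already been saved locally (it is inlined as a string here so the example is self-contained). The class name EsearchIdParser and the method extractIds are hypothetical, only standard JAXP calls are used, and the sample XML is a trimmed illustration of the IdList/Id structure rather than real server output:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class EsearchIdParser {

    /** Returns the text of every <Id> element found in an eSearch result document. */
    public static List<String> extractIds(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList ids = doc.getElementsByTagName("Id");
        List<String> result = new ArrayList<String>();
        for (int i = 0; i < ids.getLength(); i++) {
            result.add(ids.item(i).getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a file downloaded beforehand; note that the XML
        // declaration is the very first thing in the text.
        String saved = "<?xml version=\"1.0\"?>"
                + "<eSearchResult><IdList>"
                + "<Id>19877350</Id><Id>19877304</Id><Id>19877297</Id>"
                + "</IdList></eSearchResult>";
        System.out.println(extractIds(saved));
    }
}
```

Once this works on a saved file, the same method can be fed the text read from the network connection, which separates parsing problems from transport problems.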
Andreas
On Sun, Nov 1, 2009 at 7:41 AM, jitesh dundas wrote:
> Hi friends,
>
> I am getting this error on doing a post(using the code below) to this url->
>
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=cancer&reldate=10
>
> I have written this code in .jsp file. Later I will change it into servlet.
>
> Error:-
> XML Parsing Error: XML or text declaration not at start of entity
> Location:
>
> http://localhost:8080/ProteomDb/ImportFromPubmed2.jsp?txtDbName=pubmed&txtTerm=cancer&txtreldate=10&comSDay=01&comSMonth=01&txtSYear=&comEDay=01&comEMonth=01&txtEYear=&txtURLString=http%3A%2F%2Feutils.ncbi.nlm.nih.gov%2Fentrez%2Feutils%2Fesearch.fcgi%3Fdb%3Dpubmed%26term%3Dcancer%26reldate%3D10&txtsubmit=Fetch+Data+From+NCBI
> Line Number 11, Column 1: PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN" "
> http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd
> ">2034200
> 19877350 19877304 19877297
> 19877284 19877271 19877265
> 19877250 19877245 19877226
> 19877210 19877179 19877175
> 19877161 19877159 19877158
> 19877123 19877122 19877120
> 19877119 19877118
> cancer
> "neoplasms"[MeSH Terms] OR "neoplasms"[All Fields] OR "cancer"[All
> Fields]
> "neoplasms"[MeSH Terms] MeSH
> Terms 2082133 Y
> "neoplasms"[All Fields]
> All
> Fields 1634731 Y
> OR "cancer"[All Fields]
> All Fields 902537 Y
> OR GROUP
> 2009/10/22[EDAT] EDAT 0
> Y
> 2009/11/01[EDAT] EDAT 0
> Y RANGE AND
> ("neoplasms"[MeSH Terms] OR
> "neoplasms"[All Fields] OR "cancer"[All Fields]) AND 2009/10/22[EDAT] :
> 2009/11/01[EDAT]
> ^
>
> As you can see, the XML output is coming fine but the above error does not
> go..The output via this program should be just like hitting manually the
> above URL in the browser..
> The browser is Mozilla Firefox.
>
> Code:-
>
> <%@ page language = "java" %>
> <%@ page import = "java.sql.*" %>
> <%@ page import = "java.util.*" %>
> <%@ page import = "java.io.*" %>
> <%@ page import="java.lang.*" %>
> <%@ page import="java.net.*" %>
> <%@ page import="java.nio.*" %>
> <%@ page contentType="text/xml; charset=utf-8" pageEncoding="UTF-8" %>
>
>
> <%
>
> try
> {
> //String str = "";
> //out.println("");
>
> Properties systemSettings = System.getProperties();
> systemSettings.put("http.proxyHost", "********");
> systemSettings.put("http.proxyPort", "******");
> systemSettings.put("sun.net.client.defaultConnectTimeout", "10000");
> systemSettings.put("sun.net.client.defaultReadTimeout", "10000");
>
> //out.println("Properties Set");
> Authenticator.setDefault(new Authenticator()
> {
> protected PasswordAuthentication getPasswordAuthentication()
> {
> return new PasswordAuthentication("**",
> "******".toCharArray()); // specify ur user name password of iitb login
> }
> });
>
>
> System.setProperties(systemSettings);
> //out.println("After Authentication & Properties Settings");
>
> //create xml file.
> //the input to google api
> //String textAreaContent = request.getParameter("text");
> String textAreaContent = "This si a tst";
>
> String str = "";
>
> //xml file generation ends here..
> //FetchDataFromNCBI_URLString.jsp
> String URLString = request.getParameter("txtURLString").trim();
>
> //URL url = new URL("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=BAA20519");
> URL url = new URL(URLString); //url string taken from user input.
> HttpURLConnection connection = null;
>
> connection = (HttpURLConnection) url.openConnection();
> System.out.println("After open connection");
> connection.setRequestMethod("POST");
> connection.setDoInput(true);
> connection.setDoOutput(true);
>
> connection.setUseCaches(false);
> connection.setAllowUserInteraction(false);
> //connection.setFollowRedirects(true);
> //connection.setInstanceFollowRedirects(true);
> //System.out.println("Before-------------------");
> connection.setRequestProperty ("Content-Type","text/xml; charset=\"utf-8\"");
> //System.out.println("After-------------------");
>
> //System.out.println(""+ connection.getOutputStream());
>
> //System.out.println("After dataoutputstream..Line No-65");
>
> //System.out.println("Response Code="+ connection.getResponseCode);
>
> OutputStreamWriter dosout = new
> OutputStreamWriter(connection.getOutputStream());
> //System.out.println("After dosout object..Line No-63");
> //dosout.write(str);
> dosout.close ();
>
> BufferedReader in = new BufferedReader( new InputStreamReader(
> connection.getInputStream()));
>
> String decodedString;
> String tempstr = "";
>
>
> while ((decodedString = in.readLine()) != null)
> {
> tempstr = tempstr + decodedString;
> //out.println(decodedString);
> }
> out.println(tempstr);
> in.close();
> }
> catch(Exception ex)
> {
> out.println("Exception->"+ex);
> PrintWriter pw = response.getWriter();
> ex.printStackTrace(pw);
> }
>
>
> %>
>
> Thanks in advance..
>
> Regards,
> JItesh Dundas
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
From jbdundas at gmail.com Mon Nov 2 08:19:19 2009
From: jbdundas at gmail.com (jitesh dundas)
Date: Mon, 2 Nov 2009 13:49:19 +0530
Subject: [Biojava-l] Java Error:- XML Parsing Error: XML or text
declaration not at start of entity
In-Reply-To: <59a41c430911010806u62f45b90ic4a9563a27ee12e2@mail.gmail.com>
References: <326ea8620911010739t1e658509h7dd33ba8482312f8@mail.gmail.com>
<326ea8620911010741q3880a13g53626d94d0d2abd2@mail.gmail.com>
<59a41c430911010806u62f45b90ic4a9563a27ee12e2@mail.gmail.com>
Message-ID: <326ea8620911020019w2b6a8307o5befcc5a4395299a@mail.gmail.com>
Dear Dr. Andreas Prlic,
Thank you for the advice. I will do that.
Regards,
Jitesh Dundas
On 11/1/09, Andreas Prlic wrote:
>
> Hi Jitesh,
>
> It is hard to read your code with all the formatting off, probably due to
> email, and with many commented-out lines that don't seem to get used. Can you
> provide the stack trace, so we can see what part of biojava is affected?
>
> Probably a good strategy to write and debug this is to simplify the problem
> into smaller steps. Try to first download the files you want to parse and
> write the code to parse them from the local file. That will avoid any
> issues you might encounter with networking and server/client communication.
> Once the parsing is working you could take it to the next step and add the
> server communication...
>
> Andreas
>
>
>
>
> On Sun, Nov 1, 2009 at 7:41 AM, jitesh dundas wrote:
>
>> Hi friends,
>>
>> I am getting this error on doing a post(using the code below) to this
>> url->
>>
>> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=cancer&reldate=10
>>
>> I have written this code in .jsp file. Later I will change it into
>> servlet.
>>
>> Error:-
>> XML Parsing Error: XML or text declaration not at start of entity
>> Location:
>>
>> http://localhost:8080/ProteomDb/ImportFromPubmed2.jsp?txtDbName=pubmed&txtTerm=cancer&txtreldate=10&comSDay=01&comSMonth=01&txtSYear=&comEDay=01&comEMonth=01&txtEYear=&txtURLString=http%3A%2F%2Feutils.ncbi.nlm.nih.gov%2Fentrez%2Feutils%2Fesearch.fcgi%3Fdb%3Dpubmed%26term%3Dcancer%26reldate%3D10&txtsubmit=Fetch+Data+From+NCBI
>> Line Number 11, Column 1:> PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN" "
>> http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd
>> ">2034200
>> 19877350 19877304 19877297
>> 19877284 19877271 19877265
>> 19877250 19877245 19877226
>> 19877210 19877179 19877175
>> 19877161 19877159 19877158
>> 19877123 19877122 19877120
>> 19877119 19877118
>> cancer
>> "neoplasms"[MeSH Terms] OR "neoplasms"[All Fields] OR "cancer"[All
>> Fields]
>> "neoplasms"[MeSH Terms] MeSH
>> Terms 2082133 Y
>> "neoplasms"[All Fields]
>> All
>> Fields 1634731 Y
>> OR "cancer"[All Fields]
>> All Fields 902537 Y
>> OR GROUP
>> 2009/10/22[EDAT] EDAT 0
>> Y
>> 2009/11/01[EDAT] EDAT 0
>> Y RANGE AND
>> ("neoplasms"[MeSH Terms] OR
>> "neoplasms"[All Fields] OR "cancer"[All Fields]) AND 2009/10/22[EDAT] :
>> 2009/11/01[EDAT]
>> ^
>>
>> As you can see, the XML output is coming fine but the above error does not
>> go..The output via this program should be just like hitting manually the
>> above URL in the browser..
>> The browser is Mozilla Firefox.
>>
>> Code:-
>>
>> <%@ page language = "java" %>
>> <%@ page import = "java.sql.*" %>
>> <%@ page import = "java.util.*" %>
>> <%@ page import = "java.io.*" %>
>> <%@ page import="java.lang.*" %>
>> <%@ page import="java.net.*" %>
>> <%@ page import="java.nio.*" %>
>> <%@ page contentType="text/xml; charset=utf-8" pageEncoding="UTF-8" %>
>>
>>
>> <%
>>
>> try
>> {
>> //String str = "";
>> //out.println("");
>>
>> Properties systemSettings = System.getProperties();
>> systemSettings.put("http.proxyHost", "********");
>> systemSettings.put("http.proxyPort", "******");
>> systemSettings.put("sun.net.client.defaultConnectTimeout", "10000");
>> systemSettings.put("sun.net.client.defaultReadTimeout", "10000");
>>
>> //out.println("Properties Set");
>> Authenticator.setDefault(new Authenticator()
>> {
>> protected PasswordAuthentication getPasswordAuthentication()
>> {
>> return new PasswordAuthentication("**",
>> "******".toCharArray()); // specify ur user name password of iitb login
>> }
>> });
>>
>>
>> System.setProperties(systemSettings);
>> //out.println("After Authentication & Properties Settings");
>>
>> //create xml file.
>> //the input to google api
>> //String textAreaContent = request.getParameter("text");
>> String textAreaContent = "This si a tst";
>>
>> String str = "";
>>
>> //xml file generation ends here..
>> //FetchDataFromNCBI_URLString.jsp
>> String URLString = request.getParameter("txtURLString").trim();
>>
>> //URL url = new URL("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=BAA20519");
>> URL url = new URL(URLString); //url string taken from user input.
>> HttpURLConnection connection = null;
>>
>> connection = (HttpURLConnection) url.openConnection();
>> System.out.println("After open connection");
>> connection.setRequestMethod("POST");
>> connection.setDoInput(true);
>> connection.setDoOutput(true);
>>
>> connection.setUseCaches(false);
>> connection.setAllowUserInteraction(false);
>> //connection.setFollowRedirects(true);
>> //connection.setInstanceFollowRedirects(true);
>> //System.out.println("Before-------------------");
>> connection.setRequestProperty ("Content-Type","text/xml; charset=\"utf-8\"");
>> //System.out.println("After-------------------");
>>
>> //System.out.println(""+ connection.getOutputStream());
>>
>> //System.out.println("After dataoutputstream..Line No-65");
>>
>> //System.out.println("Response Code="+ connection.getResponseCode);
>>
>> OutputStreamWriter dosout = new
>> OutputStreamWriter(connection.getOutputStream());
>> //System.out.println("After dosout object..Line No-63");
>> //dosout.write(str);
>> dosout.close ();
>>
>> BufferedReader in = new BufferedReader( new InputStreamReader(
>> connection.getInputStream()));
>>
>> String decodedString;
>> String tempstr = "";
>>
>>
>> while ((decodedString = in.readLine()) != null)
>> {
>> tempstr = tempstr + decodedString;
>> //out.println(decodedString);
>> }
>> out.println(tempstr);
>> in.close();
>> }
>> catch(Exception ex)
>> {
>> out.println("Exception->"+ex);
>> PrintWriter pw = response.getWriter();
>> ex.printStackTrace(pw);
>> }
>>
>>
>> %>
>>
>> Thanks in advance..
>>
>> Regards,
>> JItesh Dundas
>>
>> _______________________________________________
>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>
>
From pingou at pingoured.fr Mon Nov 2 14:03:15 2009
From: pingou at pingoured.fr (Pierre-Yves)
Date: Mon, 02 Nov 2009 15:03:15 +0100
Subject: [Biojava-l] NCBI xml parser
Message-ID: <1257170595.29918.8.camel@localhost.localdomain>
Dear list,
I am trying to find my way around parsing ncbi blast xml.
I am using a small library which performs the blast online [1] and
returns a FileReader of the xml.
I can convert the FileReader to a string and print it, it seems fine.
(I used the default input shown on [1]).
So I am now trying to parse it automatically. I looked at [2] and [3]
but I could not get them working. I then found this message from this
mailing list [4] and thus went to use BlastXMLParserFacade.
It gives me an "org.xml.sax.SAXException: illegal frame number
encountered. (0)".
So my question is then: which method should I use?
Thanks in advance,
Best regards,
Pierre
[1] http://users.encs.concordia.ca/~f_kohant/ncbiblast/
[2] http://biojava.org/wiki/BioJava:CookBook:Blast:Echo
[3]
http://biojava.org/wiki/BioJava:Tutorial:Blast-like_Parsing_Cook_Book
[4] http://osdir.com/ml/java.bio.general/2005-06/msg00018.html
From jogoodma at indiana.edu Mon Nov 2 14:45:09 2009
From: jogoodma at indiana.edu (Josh Goodman)
Date: Mon, 02 Nov 2009 09:45:09 -0500
Subject: [Biojava-l] NCBI xml parser
In-Reply-To: <1257170595.29918.8.camel@localhost.localdomain>
References: <1257170595.29918.8.camel@localhost.localdomain>
Message-ID: <4AEEF075.4020700@indiana.edu>
It looks like the new BLAST+ binary is using a default frame of 0 instead of the old default of 1
for blastp. Hence the illegal frame number (0) error.
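One way to check this against the XML that the online BLAST returned, sketched with plain JAXP rather than any BioJava class. Hsp_query-frame and Hsp_hit-frame are the frame elements in NCBI's BLAST XML output; the class name BlastFrameCheck and the inlined sample document are made up for illustration:

```java
import java.io.StringReader;
import java.util.SortedSet;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class BlastFrameCheck {

    /** Collects the distinct integer values of every element with the given tag name. */
    public static SortedSet<Integer> frames(String xml, String tag) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName(tag);
        SortedSet<Integer> values = new TreeSet<Integer>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(Integer.valueOf(nodes.item(i).getTextContent().trim()));
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, heavily trimmed blastp output. It has no DOCTYPE line,
        // so the parser does not try to fetch a DTD.
        String sample = "<BlastOutput><Hsp>"
                + "<Hsp_query-frame>0</Hsp_query-frame>"
                + "<Hsp_hit-frame>0</Hsp_hit-frame>"
                + "</Hsp></BlastOutput>";
        System.out.println("query frames: " + frames(sample, "Hsp_query-frame"));
        System.out.println("hit frames: " + frames(sample, "Hsp_hit-frame"));
    }
}
```

If the only value reported for a blastp result is 0, that would be consistent with the changed default described above.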
Josh
Pierre-Yves wrote:
> Dear list,
>
> I am trying to find my way around parsing ncbi blast xml.
> I am using a small library which performs the blast online [1] and
> returns a FileReader of the xml.
> I can convert the FileReader to a string and print it, it seems fine.
> (I used the default input shown on [1]).
>
> So I am now trying to parse it automatically. I looked at [2] and [3]
> but I could not get them working. I then found this message from this
> mailing list [4] and thus went to use BlastXMLParserFacade.
> It returns me an "org.xml.sax.SAXException: illegal frame number
> encountered. (0)".
>
> So my question is then: which method should I use ?
>
> Thanks in advance,
>
> Best regards,
>
> Pierre
>
>
>
> [1] http://users.encs.concordia.ca/~f_kohant/ncbiblast/
> [2] http://biojava.org/wiki/BioJava:CookBook:Blast:Echo
> [3]
> http://biojava.org/wiki/BioJava:Tutorial:Blast-like_Parsing_Cook_Book
> [4] http://osdir.com/ml/java.bio.general/2005-06/msg00018.html
>
>
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
From pingou at pingoured.fr Mon Nov 2 16:17:16 2009
From: pingou at pingoured.fr (Pierre-Yves)
Date: Mon, 02 Nov 2009 17:17:16 +0100
Subject: [Biojava-l] NCBI xml parser
In-Reply-To: <4AEEF075.4020700@indiana.edu>
References: <1257170595.29918.8.camel@localhost.localdomain>
<4AEEF075.4020700@indiana.edu>
Message-ID: <1257178636.29918.11.camel@localhost.localdomain>
On Mon, 2009-11-02 at 09:45 -0500, Josh Goodman wrote:
> It looks like the new BLAST+ binary is using a default frame of 0 instead of the old default of 1
> for blastp. Hence the illegal frame number (0) error.
>
> Josh
Thanks for the hint.
I downloaded the biojava-1.7-src.jar to check the sources and correct
the frame to 0 (I already saw the case to change).
However, without changing anything in the source, when I try to
reproduce the error, I get a new one:
"org.xml.sax.SAXParseException: The markup declarations contained or
pointed to by the document type declaration must be well-formed."
I understand the error; I am more surprised by the fact that the jar
and the sources of release 1.7 give different errors.
Did I miss something?
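For what it is worth, that SAXParseException is typically raised while the parser reads the DTD named in the DOCTYPE line rather than the BLAST results themselves, for example when the DTD URL returns something that is not a DTD. A sketch of how to test that theory outside BioJava, assuming it is acceptable to skip the external DTD for the test: plain JAXP with an EntityResolver that supplies an empty DTD. The class name NoExternalDtd and the DTD URL in the sample are placeholders, not the real NCBI location:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class NoExternalDtd {

    /** Parses the XML while answering every external entity request with empty content. */
    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(new EntityResolver() {
            public InputSource resolveEntity(String publicId, String systemId) {
                // An empty source stands in for the DTD, so nothing is downloaded.
                return new InputSource(new StringReader(""));
            }
        });
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Placeholder system identifier; it is never actually contacted.
        String sample = "<!DOCTYPE BlastOutput SYSTEM \"http://example.invalid/placeholder.dtd\">"
                + "<BlastOutput><BlastOutput_program>blastp</BlastOutput_program></BlastOutput>";
        Document doc = parse(sample);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

If the same file parses cleanly this way, the difference between the two set-ups is more likely in how the DTD gets resolved than in the parser code itself.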
Thanks,
Best regards,
Pierre
From holland at eaglegenomics.com Mon Nov 2 17:16:00 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 2 Nov 2009 17:16:00 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
Message-ID: <46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
The graphs returned by the Nexus parser are instances that implement
the org.jgrapht.UndirectedGraph interface. Undirected graphs have no
root.
cheers,
Richard
On 30 Oct 2009, at 21:14, Tiago Antão wrote:
> Hi,
>
> I have been trying to use biojava to parse some trees on nexus files
> and I have a small doubt:
> If there is a rooted tree, how can one know what is the root vertex in
> the weighted graph (JGraphT)?
> I understand that there is no root if the tree is unrooted, but in
> case it is rooted, how to determine the vertex?
>
> Many thanks,
> Tiago
>
> --
> "The hottest places in hell are reserved for those who, in times of
> moral crisis, maintain a neutrality." - Dante
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From andreas at sdsc.edu Mon Nov 2 19:29:04 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 2 Nov 2009 11:29:04 -0800
Subject: [Biojava-l] NCBI xml parser
In-Reply-To: <1257178636.29918.11.camel@localhost.localdomain>
References: <1257170595.29918.8.camel@localhost.localdomain>
<4AEEF075.4020700@indiana.edu>
<1257178636.29918.11.camel@localhost.localdomain>
Message-ID: <59a41c430911021129i7d320b03xd44e99e2bc1baf2c@mail.gmail.com>
>
>
> I understand the error, I am more surprised by the fact that the jar
> and the sources of the release 1.7 are given a different errors.
>
>
That's surprising... I built the src-jar and the other jars at the same time
so the code should be identical... Are you sure you are doing exactly the
same?
Andreas
From tiagoantao at gmail.com Mon Nov 2 19:36:31 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Mon, 2 Nov 2009 19:36:31 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
Message-ID: <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
2009/11/2 Richard Holland :
> The graphs returned by the Nexus parser are instances that implement the
> org.jgrapht.UndirectedGraph interface. Undirected graphs have no root.
Yes, that is a property of the JGraphT graph. But it might not be true
of the original Nexus file/tree. So, if the tree is rooted, how can
one know the root (without doing the parsing again ourselves to
discover it)? I note two things:
a) The root is obviously not a taxon, but an intermediate node.
b) Even if the tree is unrooted, it might be interesting to know the
"root", for instance to draw the tree in the way that it was written
in the file.
Tiago
PS - I also added to bugzilla one bug related to the parser, but that
is a different problem...
From pingou at pingoured.fr Mon Nov 2 19:50:25 2009
From: pingou at pingoured.fr (Pierre-Yves)
Date: Mon, 02 Nov 2009 20:50:25 +0100
Subject: [Biojava-l] NCBI xml parser
In-Reply-To: <59a41c430911021129i7d320b03xd44e99e2bc1baf2c@mail.gmail.com>
References: <1257170595.29918.8.camel@localhost.localdomain>
<4AEEF075.4020700@indiana.edu>
<1257178636.29918.11.camel@localhost.localdomain>
<59a41c430911021129i7d320b03xd44e99e2bc1baf2c@mail.gmail.com>
Message-ID: <4AEF3801.10304@pingoured.fr>
On 11/02/2009 08:29 PM, Andreas Prlic wrote:
>>
>> I understand the error, I am more surprised by the fact that the jar
>> and the sources of the release 1.7 are given a different errors.
>>
>>
> that's surprising... I built the src-jar and the other jars at the same time
> so the code should be identical... Are you sure you are doing exactly the
> same?
I can confirm this tomorrow, but AFAIR, before I left I tried the same
code using either the jar file or the project generated from the sources
in NetBeans, and it gave me two different errors.
Best regards,
Pierre
From holland at eaglegenomics.com Mon Nov 2 22:14:58 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 2 Nov 2009 22:14:58 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
<6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
Message-ID: <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
The current parser that converts the original Newick tree string into
a JGraphT does not take the root into account, and therefore it is not
recorded anywhere in the JGraphT object. Someone would have to change
the parser to be able to make it record the root node.
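As a rough illustration of what recording the root could look like, here is a tiny stand-alone Newick reader that keeps the outermost group as the top node, i.e. the tree "as written in the file". It is not the BioJava parser and shares no code with it; the class NewickRootSketch and its Node type are hypothetical, and quoted labels and comments are not handled:

```java
import java.util.ArrayList;
import java.util.List;

public class NewickRootSketch {

    /** A tree node; the node returned by parse() is the outermost group as written. */
    public static class Node {
        public String name = "";
        public double length = 0.0;
        public final List<Node> children = new ArrayList<Node>();
    }

    private final String text;
    private int pos = 0;

    private NewickRootSketch(String text) {
        this.text = text;
    }

    /** Parses a string such as "((A:1,B:2)X:0.5,C:3)root;" and returns its top node. */
    public static Node parse(String newick) {
        NewickRootSketch p = new NewickRootSketch(newick.trim());
        Node top = p.readNode();
        if (p.peek() != ';') {
            throw new IllegalArgumentException("expected ';' at position " + p.pos);
        }
        return top;
    }

    private Node readNode() {
        Node node = new Node();
        if (peek() == '(') {
            pos++; // consume '('
            node.children.add(readNode());
            while (peek() == ',') {
                pos++; // consume ','
                node.children.add(readNode());
            }
            if (peek() != ')') {
                throw new IllegalArgumentException("expected ')' at position " + pos);
            }
            pos++; // consume ')'
        }
        node.name = readToken(); // may be empty for unlabelled internal nodes
        if (peek() == ':') {
            pos++; // consume ':'
            node.length = Double.parseDouble(readToken());
        }
        return node;
    }

    private char peek() {
        return pos < text.length() ? text.charAt(pos) : '\0';
    }

    private String readToken() {
        int start = pos;
        while (pos < text.length() && "(),:;".indexOf(text.charAt(pos)) < 0) {
            pos++;
        }
        return text.substring(start, pos).trim();
    }

    public static void main(String[] args) {
        Node top = parse("((A:1,B:2)X:0.5,C:3)root;");
        System.out.println(top.name + " has " + top.children.size() + " children");
    }
}
```

A parser that builds an undirected graph could keep a reference to the vertex created for this top node alongside the graph, leaving the graph type itself unchanged.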
In the meantime, the JGraph library which is used for displaying
JGraphT graphs in a visual form does include root-finding methods, so
maybe you could investigate there to see if any of the existing
functions might help?
cheers,
Richard
On 2 Nov 2009, at 19:36, Tiago Antão wrote:
> 2009/11/2 Richard Holland :
>> The graphs returned by the Nexus parser are instances that
>> implement the
>> org.jgrapht.UndirectedGraph interface. Undirected graphs have no
>> root.
>
>
> Yes, that is a property of the jgrapht. But it might not be the case
> of the original nexus file/tree. So, if the tree is rooted, how can
> one know the root (without doing the parsing again ourselves to
> discover it)? I note two things:
> a) The root is obviously not one taxa, but one intermediate node.
> b) Even if the tree is unrooted, it might be interesting to know the
> "root", for instance to draw the tree, in the way that is was written
> in the file.
>
> Tiago
> PS - I also added to bugzilla one but related to the parser, but that
> is different problem...
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From tiagoantao at gmail.com Mon Nov 2 23:11:13 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Mon, 2 Nov 2009 23:11:13 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
<6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
<0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
Message-ID: <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
2009/11/2 Richard Holland :
> In the meantime, the JGraph library which is used for displaying JGraphT
> graphs in a visual form does include root-finding methods, so maybe you
> could investigate there to see if any of the existing functions might help?
Did that. None can help, as the graph is not directed (it would be
trivial with a directed graph, of course).
In its current form, the Nexus parser is of limited use for tree information:
1. For rooted trees it has a bug, as it doesn't say what the root is.
2. For unrooted trees, sometimes the "root" (what the user perceives
as root) is interesting information.
Tiago
From holland at eaglegenomics.com Tue Nov 3 09:56:21 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 3 Nov 2009 09:56:21 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
<6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
<0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
Message-ID: <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
On 2 Nov 2009, at 23:11, Tiago Antão wrote:
> 2009/11/2 Richard Holland :
>> In the meantime, the JGraph library which is used for displaying
>> JGraphT
>> graphs in a visual form does include root-finding methods, so maybe
>> you
>> could investigate there to see if any of the existing functions
>> might help?
>
> Did that. None can help as the graph is not directed (it would be
> trivial with a directed graph ,of course).
> In the current form, the nexus parser is of limited use for tree
> information:
> 1. For rooted trees it has a bug has it doesn't say what is the root
The Newick strings used in the Nexus format are themselves undirected
graphs. They don't specify which node is the root, which means it must
be determined by computation after parsing the string. I'm unsure of
the algorithm to use to do this. If there are people on this list who
know the algorithm and have time to code it up, volunteers would be
welcome.
> 2. For unrooted trees, sometimes the "root" (what the user perceives
> as root) is interesting information.
What the user perceives as root in an unrooted tree could be different
for every user, so it would be hard to provide a standard function to
read their mind! However if everyone can come up with a commonly
agreed way of determining the most likely root computationally, it
would be interesting to add this as a feature, with the caveat that it
is only a best-effort approximation as the original tree is unrooted.
cheers,
Richard
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From pingou at pingoured.fr Tue Nov 3 14:45:08 2009
From: pingou at pingoured.fr (Pierre-Yves)
Date: Tue, 03 Nov 2009 15:45:08 +0100
Subject: [Biojava-l] NCBI xml parser
In-Reply-To: <4AEF3801.10304@pingoured.fr>
References: <1257170595.29918.8.camel@localhost.localdomain>
<4AEEF075.4020700@indiana.edu>
<1257178636.29918.11.camel@localhost.localdomain>
<59a41c430911021129i7d320b03xd44e99e2bc1baf2c@mail.gmail.com>
<4AEF3801.10304@pingoured.fr>
Message-ID: <1257259508.26094.2.camel@localhost.localdomain>
On Mon, 2009-11-02 at 20:50 +0100, Pierre-Yves wrote:
> On 11/02/2009 08:29 PM, Andreas Prlic wrote:
> >>
> >> I understand the error, I am more surprised by the fact that the jar
> >> and the sources of the release 1.7 are given a different errors.
> >>
> >>
> > that's surprising... I built the src-jar and the other jars at the same time
> > so the code should be identical... Are you sure you are doing exactly the
> > same?
>
> I can confirm you this tomorrow but AFAIR before I left I tried the same
> code using or the jar file or the project generated from the sources in
> NetBeans and it gaves me two differents errors.
Ok so just for the record:
- If I use the .jar file I get an error (1)
- If I create a project in NetBeans using the source from BioJava I get
a different error (2)
- If I add as dependencies the sources from BioJava I get the first
error (1)
I thus went for the third solution and found my way around :-)
Thanks for the help.
Best regards,
Pierre
From andreas.prlic at gmail.com Tue Nov 3 14:56:06 2009
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Tue, 3 Nov 2009 06:56:06 -0800
Subject: [Biojava-l] NCBI xml parser
In-Reply-To: <1257259508.26094.2.camel@localhost.localdomain>
References: <1257170595.29918.8.camel@localhost.localdomain>
<4AEEF075.4020700@indiana.edu>
<1257178636.29918.11.camel@localhost.localdomain>
<59a41c430911021129i7d320b03xd44e99e2bc1baf2c@mail.gmail.com>
<4AEF3801.10304@pingoured.fr>
<1257259508.26094.2.camel@localhost.localdomain>
Message-ID: <14C95451-D221-4CED-BE79-FB2EB805D264@gmail.com>
So what you are saying is that you had a classpath problem and by
configuring dependencies correctly the problem went away?
Andreas
On 3 Nov 2009, at 06:45, Pierre-Yves wrote:
> On Mon, 2009-11-02 at 20:50 +0100, Pierre-Yves wrote:
>> On 11/02/2009 08:29 PM, Andreas Prlic wrote:
>>>>
>>>> I understand the error, I am more surprised by the fact that the
>>>> jar
>>>> and the sources of the release 1.7 are given a different errors.
>>>>
>>>>
>>> that's surprising... I built the src-jar and the other jars at the
>>> same time
>>> so the code should be identical... Are you sure you are doing
>>> exactly the
>>> same?
>>
>> I can confirm you this tomorrow but AFAIR before I left I tried the
>> same
>> code using or the jar file or the project generated from the
>> sources in
>> NetBeans and it gaves me two differents errors.
>
> Ok so just for the record:
> - If I use the .jar file I get an error (1)
> - If I create a project in NetBeans using the source from BioJava I
> get
> a different error (2)
> - If I add as dependencies the sources from BioJava I get the first
> error (1)
>
> I thus went for the third solution and found my way around :-)
>
> Thanks for the help.
>
> Best regards,
>
> Pierre
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
From pingou at pingoured.fr Tue Nov 3 15:00:32 2009
From: pingou at pingoured.fr (Pierre-Yves)
Date: Tue, 03 Nov 2009 16:00:32 +0100
Subject: [Biojava-l] NCBI xml parser
In-Reply-To: <14C95451-D221-4CED-BE79-FB2EB805D264@gmail.com>
References: <1257170595.29918.8.camel@localhost.localdomain>
<4AEEF075.4020700@indiana.edu>
<1257178636.29918.11.camel@localhost.localdomain>
<59a41c430911021129i7d320b03xd44e99e2bc1baf2c@mail.gmail.com>
<4AEF3801.10304@pingoured.fr>
<1257259508.26094.2.camel@localhost.localdomain>
<14C95451-D221-4CED-BE79-FB2EB805D264@gmail.com>
Message-ID: <1257260432.26094.3.camel@localhost.localdomain>
On Tue, 2009-11-03 at 06:56 -0800, Andreas Prlic wrote:
> So what you are saying is that you had a classpath problem and by
> configuring dependencies correctly the problem went away?
In both cases it compiled; only the error at run time was different.
Regards,
Pierre
From andreas.prlic at gmail.com Tue Nov 3 15:05:17 2009
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Tue, 3 Nov 2009 07:05:17 -0800
Subject: [Biojava-l] NCBI xml parser
In-Reply-To: <1257260432.26094.3.camel@localhost.localdomain>
References: <1257170595.29918.8.camel@localhost.localdomain>
<4AEEF075.4020700@indiana.edu>
<1257178636.29918.11.camel@localhost.localdomain>
<59a41c430911021129i7d320b03xd44e99e2bc1baf2c@mail.gmail.com>
<4AEF3801.10304@pingoured.fr>
<1257259508.26094.2.camel@localhost.localdomain>
<14C95451-D221-4CED-BE79-FB2EB805D264@gmail.com>
<1257260432.26094.3.camel@localhost.localdomain>
Message-ID: <447A40F9-52A1-4B22-8D10-27D22F8381B9@gmail.com>
Can you send me the code snippet off list so I can take a look? Thanks,
A
On 3 Nov 2009, at 07:00, Pierre-Yves wrote:
> On Tue, 2009-11-03 at 06:56 -0800, Andreas Prlic wrote:
>> So what you are saying is that you had a classpath problem and by
>> configuring dependencies correctly the problem went away?
>
> In both case it was compiling, only the error at run time was
> different.
>
> Regards,
>
> Pierre
>
From hlapp at gmx.net Tue Nov 3 16:53:23 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 3 Nov 2009 11:53:23 -0500
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
<6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
<0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
Message-ID:
The most common ways to root a tree are mid-point rooting and using
an outgroup. The latter, I suppose, is equivalent to the user specifying
a node as the root.
-hilmar
On Nov 3, 2009, at 4:56 AM, Richard Holland wrote:
>
> On 2 Nov 2009, at 23:11, Tiago Antão wrote:
>
>> 2009/11/2 Richard Holland :
>>> In the meantime, the JGraph library which is used for displaying
>>> JGraphT
>>> graphs in a visual form does include root-finding methods, so
>>> maybe you
>>> could investigate there to see if any of the existing functions
>>> might help?
>>
>> Did that. None can help as the graph is not directed (it would be
>> trivial with a directed graph ,of course).
>> In the current form, the nexus parser is of limited use for tree
>> information:
>> 1. For rooted trees it has a bug has it doesn't say what is the root
>
> The Newick strings used in the Nexus format are themselves
> undirected graphs. They don't specify which node is the root, which
> means it must be determined by computation after parsing the string.
> I'm unsure of the algorithm to use to do this. If there are people
> on this list who know the algorithm and have time to code it up,
> volunteers would be welcome.
>
>> 2. For unrooted trees, sometimes the "root" (what the user perceives
>> as root) is interesting information.
>
> What the user perceives as root in an unrooted tree could be
> different for every user, so it would be hard to provide a standard
> function to read their mind! However if everyone can come up with a
> commonly agreed way of determining the most likely root
> computationally, it would be interesting to add this as a feature,
> with the caveat that it is only a best-effort approximation as the
> original tree is unrooted.
>
> cheers,
> Richard
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>
>
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================
From thasso.griebel at uni-jena.de Tue Nov 3 17:58:14 2009
From: thasso.griebel at uni-jena.de (Thasso Griebel)
Date: Tue, 3 Nov 2009 18:58:14 +0100
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
<6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
<0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
Message-ID: <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
Hi,
> On 2 Nov 2009, at 23:11, Tiago Antão wrote:
>
>> 2009/11/2 Richard Holland :
>>> In the meantime, the JGraph library which is used for displaying
>>> JGraphT
>>> graphs in a visual form does include root-finding methods, so
>>> maybe you
>>> could investigate there to see if any of the existing functions
>>> might help?
>>
>> Did that. None can help as the graph is not directed (it would be
>> trivial with a directed graph ,of course).
>> In the current form, the nexus parser is of limited use for tree
>> information:
>> 1. For rooted trees it has a bug has it doesn't say what is the root
>
> The Newick strings used in the Nexus format are themselves
> undirected graphs. They don't specify which node is the root, which
> means it must be determined by computation after parsing the string.
> I'm unsure of the algorithm to use to do this. If there are people
> on this list who know the algorithm and have time to code it up,
> volunteers would be welcome.
There is a way to uniquely get a root from a Newick string. Usually a
rooted Newick string is surrounded by brackets, which mark the root as
the highest node in the tree. For example:
(A, (B,C))
describes a tree rooted between "A" and the clade (B,C), and with the
surrounding brackets this is unique.
In Nexus the situation might be a bit different. Nexus allows you to
prefix the Newick string with [&R] or [&U] to indicate rooted/unrooted
trees. For example:
tree treename = [&R] ((A,(B,C)),(D,E));
is a valid rooted Nexus tree where the root is placed between the
clades [A,B,C] and [D,E], although in this example the Newick string is
surrounded by brackets and rooted uniquely by itself.
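As a minimal sketch of reading the [&R] / [&U] flag from a tree statement like
the one above: the class RootingFlag and its method of() are hypothetical names
made up for this illustration and are not part of BioJava. It only looks for
the flag and does not parse the tree itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not BioJava API): reads the [&R]/[&U] flag of a Nexus TREE statement. */
final class RootingFlag {
    enum Rooting { ROOTED, UNROOTED, UNSPECIFIED }

    // Matches "[&R]" or "[&U]" in either case, tolerating spaces inside the comment.
    private static final Pattern FLAG = Pattern.compile("\\[\\s*&\\s*([RrUu])\\s*\\]");

    static Rooting of(String treeStatement) {
        Matcher m = FLAG.matcher(treeStatement);
        if (!m.find()) {
            // No flag present: the file does not say whether the tree is rooted.
            return Rooting.UNSPECIFIED;
        }
        return m.group(1).equalsIgnoreCase("R") ? Rooting.ROOTED : Rooting.UNROOTED;
    }
}
```

Calling RootingFlag.of("tree treename = [&R] ((A,(B,C)),(D,E));") would report
a rooted tree; a statement without either flag is reported as unspecified
rather than guessed.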
>> 2. For unrooted trees, sometimes the "root" (what the user perceives
>> as root) is interesting information.
>
> What the user perceives as root in an unrooted tree could be
> different for every user, so it would be hard to provide a standard
> function to read their mind! However if everyone can come up with a
> commonly agreed way of determining the most likely root
> computationally, it would be interesting to add this as a feature,
> with the caveat that it is only a best-effort approximation as the
> original tree is unrooted.
BioNJ implements multiple methods to determine a root in a
neighbor-joining tree. I can look it up, but I think the most common ways
to compute the root are these. One is to place the root in the "middle",
such that the tree is balanced and has an equal number of leaves on both
sides. The other method I remember is based on the edge weights: you
find the longest path between two leaves and place the root in the
middle of that path (based on the path length).
I think the most common way, though, is to specify an outgroup node and
place the root on the path between that outgroup and its successor. I
am not sure whether the outgroup can be described in Nexus somehow.
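The edge-weight method (longest leaf-to-leaf path, root at its middle) could be
sketched roughly as follows. This is only an illustration under stated
assumptions: MidpointSketch, Placement, addEdge and midpoint are hypothetical
names, the tree is held in a plain adjacency map instead of a JGraphT graph,
and the tree is assumed to be connected with at least one branch of positive
length.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical midpoint-rooting sketch on an undirected, weighted tree. */
final class MidpointSketch {
    /** The root lies on the edge from-to, offset branch-length units away from "from". */
    static final class Placement {
        final String from;
        final String to;
        final double offset;
        Placement(String from, String to, double offset) {
            this.from = from;
            this.to = to;
            this.offset = offset;
        }
    }

    // node -> (neighbour -> branch length)
    private final Map<String, Map<String, Double>> adj = new HashMap<>();

    void addEdge(String a, String b, double length) {
        adj.computeIfAbsent(a, k -> new HashMap<>()).put(b, length);
        adj.computeIfAbsent(b, k -> new HashMap<>()).put(a, length);
    }

    /** Fills dist and parent for all nodes reachable from start; returns the farthest node. */
    private String farthest(String start, Map<String, Double> dist, Map<String, String> parent) {
        List<String> stack = new ArrayList<>();
        stack.add(start);
        dist.put(start, 0.0);
        String best = start;
        while (!stack.isEmpty()) {
            String u = stack.remove(stack.size() - 1);
            if (dist.get(u) > dist.get(best)) {
                best = u;
            }
            for (Map.Entry<String, Double> e : adj.get(u).entrySet()) {
                if (!dist.containsKey(e.getKey())) {
                    dist.put(e.getKey(), dist.get(u) + e.getValue());
                    parent.put(e.getKey(), u);
                    stack.add(e.getKey());
                }
            }
        }
        return best;
    }

    Placement midpoint() {
        // Two sweeps: the node farthest from any start is one end of the longest path,
        // and the node farthest from that end is the other end.
        String any = adj.keySet().iterator().next();
        String a = farthest(any, new HashMap<>(), new HashMap<>());
        Map<String, Double> dist = new HashMap<>();
        Map<String, String> parent = new HashMap<>();
        String b = farthest(a, dist, parent);
        double half = dist.get(b) / 2.0;
        // Walk from b back towards a until the halfway point is crossed.
        String v = b;
        while (dist.get(parent.get(v)) > half) {
            v = parent.get(v);
        }
        String u = parent.get(v);
        return new Placement(u, v, half - dist.get(u));
    }
}
```

Which end of the edge is reported as "from" depends on which end of the longest
path the first sweep happens to find, so callers should read the offset
relative to the returned "from" node.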
I would also suggest generally parsing trees as rooted trees (maybe
just for the initial internal model). Creating an unrooted tree from a
rooted one is easy: remove the root and forget about directions. The
other way might be hard and ambiguous.
cheers,
Thasso
--
Dipl. Inf. Thasso Griebel-------------------Lehrstuhl fuer Bioinformatik
Office 3426--http://bio.informatik.uni-jena.de--Institut fuer Informatik
Phone +49 (0)3641 9-46454-----------Friedrich-Schiller-Universitaet Jena
Fax +49 (0)3641 9-46452----------Ernst-Abbe-Platz 2, 07743 Jena, Germany
From tiagoantao at gmail.com Tue Nov 3 18:16:43 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 3 Nov 2009 18:16:43 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
<6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
<0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
<196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
Message-ID: <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
2009/11/3 Thasso Griebel :
> There is a way to uniquely get a root from a newick string. Usually a
> rooted newick is surrounded with brackets, which indicates the root as the
> highest node in the tree. For example:
>
> (A, (B,C))
>
Agree, it is quite easy to get the root of the tree from the newick
representation. But it should be done on parsing and returned in some
way by the parsing system. If the user has to do it again, it means
that the user has to parse it again just to know the root node.
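To illustrate what "done on parsing" could look like, here is a minimal
sketch of a Newick reader that simply returns the node for the outermost
brackets as the root. It is not BioJava's parser: NewickSketch, Node and
parseRoot are hypothetical names, and the sketch handles only topology, plain
labels and branch lengths (no quoted labels, no comments).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal Newick reader that keeps track of the root node. */
final class NewickSketch {
    static final class Node {
        String label = "";
        final List<Node> children = new ArrayList<>();
    }

    private final String text;
    private int pos = 0;

    private NewickSketch(String text) {
        this.text = text;
    }

    /** Parses e.g. "(A,(B,C));" and returns the node for the outermost brackets. */
    static Node parseRoot(String newick) {
        String s = newick.replaceAll("\\s+", "");
        if (s.endsWith(";")) {
            s = s.substring(0, s.length() - 1);
        }
        NewickSketch parser = new NewickSketch(s);
        Node root = parser.node();
        if (parser.pos != s.length()) {
            throw new IllegalArgumentException("Unexpected text at position " + parser.pos);
        }
        return root;
    }

    private Node node() {
        Node n = new Node();
        if (pos < text.length() && text.charAt(pos) == '(') {
            pos++; // consume '('
            n.children.add(node());
            while (pos < text.length() && text.charAt(pos) == ',') {
                pos++; // consume ','
                n.children.add(node());
            }
            if (pos >= text.length() || text.charAt(pos) != ')') {
                throw new IllegalArgumentException("Missing ')' at position " + pos);
            }
            pos++; // consume ')'
        }
        // Optional label (leaf name or internal node name).
        int start = pos;
        while (pos < text.length() && ",():;".indexOf(text.charAt(pos)) < 0) {
            pos++;
        }
        n.label = text.substring(start, pos);
        // Optional branch length such as ":0.1"; skipped in this sketch.
        if (pos < text.length() && text.charAt(pos) == ':') {
            pos++;
            while (pos < text.length() && ",();".indexOf(text.charAt(pos)) < 0) {
                pos++;
            }
        }
        return n;
    }
}
```

For "(A, (B,C));" the returned root has two children, the leaf A and an
unlabeled internal node holding B and C, so the caller never has to re-parse
the string to find the root.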
> I would also suggest to generally parse trees as rooted trees (maybe just
> for the initial internal model). Creating an unrooted tree from a rooted one
> is easy, remove the root and forget about directions. The other way might be
> hard and ambiguous.
100% agree.
The newick _representation_ always has a root by virtue of the way it
is done. If that root has meaning or not depends. Doing as you suggest
seems the most reasonable idea.
I would add that even if it is an unrooted tree, the topology might be
of interest. In my case I am doing a comparative visualizer and it
might be nice for the user to be able to visualize the topology as
specified. It has no biological meaning, but in practice, for many
users, it helps.
I note that PhyloXML (by virtue of being an XML format) always
represents phylogenies as trees (not weighted DAGs). There is an
attribute, rooted, which can be true or false.
But anyway, even taking a very conservative view on this, the
current parser does not make it possible to determine where the root
is for rooted trees. I think there would be a consensus that that is a bug?
Tiago
From holland at eaglegenomics.com Tue Nov 3 18:19:36 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 3 Nov 2009 18:19:36 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
<6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
<0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
<196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
<6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
Message-ID:
Agreed that there is a bug. Now all we need is someone to go in and
fix it! :)
cheers,
Richard
On 3 Nov 2009, at 18:16, Tiago Antão wrote:
> 2009/11/3 Thasso Griebel :
>> There is a way to uniquely get a root from a newick string.
>> Usually a
>> rooted newick is surrounded with brackets, which indicates the root
>> as the
>> highest node in the tree. For example:
>>
>> (A, (B,C))
>>
>
> Agree, it is quite easy to get the root of the tree from the newick
> representation. But it should be done on parsing and returned in some
> way by the parsing system. If the user has to do it again, it means
> that the user has to parse it again just to know the root node.
>
>> I would also suggest to generally parse trees as rooted trees
>> (maybe jsut
>> for th initial internal model). Creating an unrooted tree from a
>> rooted one
>> is easy, remove the root and forget about directions. The other way
>> might be
>> hard and ambiguous.
>
> 100% agree.
> The newick _representation_ always has a root by virtue of the way it
> is done. If that root has meaning or not depends. Doing as you suggest
> seems the most reasonable idea.
> I would add that even if it is an unrooted tree, the topology might be
> of interest. In my case I am doing a comparative visualizer and it
> might be nice for the user to be able to visualize the topology as
> specified. It has no biological meaning, but in practice, for many
> users, it helps.
> I note that PhyloXML (even by virtue of being a XML format) always
> represents the phylogenies as trees (not weigthed DAGs). There an
> attribute rooted which can be true or false.
>
> But, anyway. Even assuming a very conservative view on this, the
> current parser, for rooted trees, does not allow to determine where is
> the root. I think that there would be a consensus that that is a bug?
>
> Tiago
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From tiagoantao at gmail.com Tue Nov 3 18:24:52 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 3 Nov 2009 18:24:52 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To:
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
<6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
<0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
<196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
<6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
Message-ID: <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
If somebody would specify the desired changes to the parser interface
(wrt this bug and the other one reported previously), I might offer to
do the grunt work.
But somebody has to say which interface changes are desired.
As I recall, these problems exist:
1. Lack of knowledge of root node
2. The p* stuff.
Tiago
2009/11/3 Richard Holland :
> Agreed that there is a bug. Now all we need is someone to go in and fix it!
> :)
>
> cheers,
> Richard
>
> On 3 Nov 2009, at 18:16, Tiago Antão wrote:
>
>> 2009/11/3 Thasso Griebel :
>>>
>>> There is a way to uniquely get a root from a newick string. Usually a
>>> rooted newick is surrounded with brackets, which indicates the root as
>>> the
>>> highest node in the tree. For example:
>>>
>>> (A, (B,C))
>>>
>>
>> Agree, it is quite easy to get the root of the tree from the newick
>> representation. But it should be done on parsing and returned in some
>> way by the parsing system. If the user has to do it again, it means
>> that the user has to parse it again just to know the root node.
>>
>>> I would also suggest to generally parse trees as rooted trees (maybe jsut
>>> for th initial internal model). Creating an unrooted tree from a rooted
>>> one
>>> is easy, remove the root and forget about directions. The other way might
>>> be
>>> hard and ambiguous.
>>
>> 100% agree.
>> The newick _representation_ always has a root by virtue of the way it
>> is done. If that root has meaning or not depends. Doing as you suggest
>> seems the most reasonable idea.
>> I would add that even if it is an unrooted tree, the topology might be
>> of interest. In my case I am doing a comparative visualizer and it
>> might be nice for the user to be able to visualize the topology as
>> specified. It has no biological meaning, but in practice, for many
>> users, it helps.
>> I note that PhyloXML (even by virtue of being a XML format) always
>> represents the phylogenies as trees (not weigthed DAGs). There an
>> attribute rooted which can be true or false.
>>
>> But, anyway. Even assuming a very conservative view on this, the
>> current parser, for rooted trees, does not allow to determine where is
>> the root. I think that there would be a consensus that that is a bug?
>>
>> Tiago
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>
--
"The hottest places in hell are reserved for those who, in times of
moral crisis, maintain a neutrality." - Dante
From holland at eaglegenomics.com Tue Nov 3 18:46:05 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 3 Nov 2009 18:46:05 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<46746923-CB17-4338-AF1B-ED22FEBE104D@eaglegenomics.com>
<6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
<0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
<196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
<6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
<6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
Message-ID: <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
> 1. Lack of knowledge of root node
The Newick tree string is read as-is and is not parsed. It only gets
parsed at the point of conversion to an Undirected or WeightedGraph
inside the TreeBlocks.java source code (inside the two types of
get-As-JGraphT methods). It's at this point that the string is parsed, and
it's here that root node determination should take place. It's already
known whether &R or &U have been specified here, which should help the
code work out what to do.
> 2. The p* stuff.
Exactly the same part of the code as described above. Wherever it
pushes values to the stack but prepends them with 'p' first, you'll
need to change the 'p' to some instance variable and provide a
getter/setter to change it, with 'p' being the default setting.
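A rough sketch of that change, kept separate from the real source: the class
InternalNodeNamer and its methods are hypothetical names for illustration, and
the numbering scheme (prefix followed by a running counter) is an assumption,
not a description of what TreeBlocks.java actually does.

```java
/** Hypothetical sketch of a configurable name prefix for generated internal nodes. */
final class InternalNodeNamer {
    // 'p' stays the default so that existing behaviour is unchanged.
    private String prefix = "p";
    private int counter = 0;

    String getPrefix() {
        return prefix;
    }

    void setPrefix(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        this.prefix = prefix;
    }

    /** Returns the next generated name: the current prefix followed by a running number. */
    String next() {
        counter++;
        return prefix + counter;
    }
}
```

A user whose taxa are themselves named p1, p2, ... could then call
setPrefix("internal_") before converting the tree, which avoids name clashes
without changing the default for everyone else.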
cheers,
Richard
>
> Tiago
> 2009/11/3 Richard Holland :
>> Agreed that there is a bug. Now all we need is someone to go in and
>> fix it!
>> :)
>>
>> cheers,
>> Richard
>>
>> On 3 Nov 2009, at 18:16, Tiago Antão wrote:
>>
>>> 2009/11/3 Thasso Griebel :
>>>>
>>>> There is a way to uniquely get a root from a newick string.
>>>> Usually a
>>>> rooted newick is surrounded with brackets, which indicates the
>>>> root as
>>>> the
>>>> highest node in the tree. For example:
>>>>
>>>> (A, (B,C))
>>>>
>>>
>>> Agree, it is quite easy to get the root of the tree from the newick
>>> representation. But it should be done on parsing and returned in
>>> some
>>> way by the parsing system. If the user has to do it again, it means
>>> that the user has to parse it again just to know the root node.
>>>
>>>> I would also suggest to generally parse trees as rooted trees
>>>> (maybe jsut
>>>> for th initial internal model). Creating an unrooted tree from a
>>>> rooted
>>>> one
>>>> is easy, remove the root and forget about directions. The other
>>>> way might
>>>> be
>>>> hard and ambiguous.
>>>
>>> 100% agree.
>>> The newick _representation_ always has a root by virtue of the way
>>> it
>>> is done. If that root has meaning or not depends. Doing as you
>>> suggest
>>> seems the most reasonable idea.
>>> I would add that even if it is an unrooted tree, the topology
>>> might be
>>> of interest. In my case I am doing a comparative visualizer and it
>>> might be nice for the user to be able to visualize the topology as
>>> specified. It has no biological meaning, but in practice, for many
>>> users, it helps.
>>> I note that PhyloXML (even by virtue of being a XML format) always
>>> represents the phylogenies as trees (not weigthed DAGs). There an
>>> attribute rooted which can be true or false.
>>>
>>> But, anyway. Even assuming a very conservative view on this, the
>>> current parser, for rooted trees, does not allow to determine
>>> where is
>>> the root. I think that there would be a consensus that that is a
>>> bug?
>>>
>>> Tiago
>>
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>
>>
>
>
>
> --
> "The hottest places in hell are reserved for those who, in times of
> moral crisis, maintain a neutrality." - Dante
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From tiagoantao at gmail.com Tue Nov 3 18:55:23 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 3 Nov 2009 18:55:23 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
<0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
<196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
<6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
<6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
<9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
Message-ID: <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
But the point is that the class interface changes for the outside user:
1. How does one report the root back to the user?
2. Regarding the prefix stuff, should the user be allowed to specify a
preferred prefix?
Both of these things imply interface changes visible to users.
If you still need volunteers to make the change, I can do it. But I need
to know what changes to the user interface are to be made.
For 1, maybe a method getRoot, returning a string with the name of the
root node?
For 2, maybe an extended version of the parse function with a prefix
as an input parameter?
2009/11/3 Richard Holland :
>> 1. Lack of knowledge of root node
>
> The Newick tree string is read as-is and is not parsed. It only gets parsed
> at the point of conversion to a Undirected or WeightedGraph inside the
> TreeBlocks.java source code (inside the two types of get-As-JGraphT
> methods). It's at this point the string is parsed and it's here that root
> note determination should take place. It's already known whether &R or &U
> have been specified here, which should help the code work out what to do.
>
>> 2. The p* stuff.
>
> Exactly the same part of the code as described above. Wherever it pushes
> values to the stack but prepends them with 'p' first, you'll need to change
> the 'p' to some instance variable and provide a getter/setter to change it,
> with 'p' being the default setting.
>
> cheers,
> Richard
>
>>
>> Tiago
>> 2009/11/3 Richard Holland :
>>>
>>> Agreed that there is a bug. Now all we need is someone to go in and fix
>>> it!
>>> :)
>>>
>>> cheers,
>>> Richard
>>>
>>> On 3 Nov 2009, at 18:16, Tiago Antão wrote:
>>>
>>>> 2009/11/3 Thasso Griebel :
>>>>>
>>>>> There is a way to uniquely get a root from a newick string. Usually a
>>>>> rooted newick is surrounded with brackets, which indicates the root as
>>>>> the
>>>>> highest node in the tree. For example:
>>>>>
>>>>> (A, (B,C))
>>>>>
>>>>
>>>> Agree, it is quite easy to get the root of the tree from the newick
>>>> representation. But it should be done on parsing and returned in some
>>>> way by the parsing system. If the user has to do it again, it means
>>>> that the user has to parse it again just to know the root node.
>>>>
>>>>> I would also suggest to generally parse trees as rooted trees (maybe
>>>>> jsut
>>>>> for th initial internal model). Creating an unrooted tree from a rooted
>>>>> one
>>>>> is easy, remove the root and forget about directions. The other way
>>>>> might
>>>>> be
>>>>> hard and ambiguous.
>>>>
>>>> 100% agree.
>>>> The newick _representation_ always has a root by virtue of the way it
>>>> is done. If that root has meaning or not depends. Doing as you suggest
>>>> seems the most reasonable idea.
>>>> I would add that even if it is an unrooted tree, the topology might be
>>>> of interest. In my case I am doing a comparative visualizer and it
>>>> might be nice for the user to be able to visualize the topology as
>>>> specified. It has no biological meaning, but in practice, for many
>>>> users, it helps.
>>>> I note that PhyloXML (even by virtue of being a XML format) always
>>>> represents the phylogenies as trees (not weigthed DAGs). There an
>>>> attribute rooted which can be true or false.
>>>>
>>>> But, anyway. Even assuming a very conservative view on this, the
>>>> current parser, for rooted trees, does not allow to determine where is
>>>> the root. I think that there would be a consensus that that is a bug?
>>>>
>>>> Tiago
>>>
>>> --
>>> Richard Holland, BSc MBCS
>>> Operations and Delivery Director, Eagle Genomics Ltd
>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>>> http://www.eaglegenomics.com/
>>>
>>>
>>
>>
>>
>> --
>> "The hottest places in hell are reserved for those who, in times of
>> moral crisis, maintain a neutrality." - Dante
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>
--
"The hottest places in hell are reserved for those who, in times of
moral crisis, maintain a neutrality." - Dante
From peter.midford at gmail.com Tue Nov 3 19:28:14 2009
From: peter.midford at gmail.com (Peter Midford)
Date: Tue, 3 Nov 2009 14:28:14 -0500
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
<0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
<196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
<6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
<6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
<9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
<6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
Message-ID: <2E8B7EE9-2617-4096-B7AC-52A398D7E69F@gmail.com>
Tiago,
If you return a directed graph, the root will be a node
with no incoming edges.
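As a rough illustration only (plain Java collections, not the JGraphT API; the class name RootFinder and the internal labels p1 and p2 are made up for this sketch), picking out the nodes with no incoming edges could look like this:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class RootFinder {
    /**
     * Returns every vertex with no incoming edge. The directed graph is
     * given as a map from each parent to its list of children, so a root
     * is a key that never appears in any child list.
     */
    static Set<String> findRoots(Map<String, List<String>> children) {
        Set<String> roots = new TreeSet<>(children.keySet());
        for (List<String> kids : children.values()) {
            roots.removeAll(kids); // anything that is a child has an incoming edge
        }
        return roots;
    }

    public static void main(String[] args) {
        // (A,(B,C)) with hypothetical internal labels p1 (root) and p2
        Map<String, List<String>> tree = new HashMap<>();
        tree.put("p1", Arrays.asList("A", "p2"));
        tree.put("p2", Arrays.asList("B", "C"));
        System.out.println(findRoots(tree)); // prints [p1]
    }
}
```

For a properly rooted tree the returned set should contain exactly one label; more than one would mean the graph is a forest rather than a single tree.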
Peter
On Nov 3, 2009, at 13:55, Tiago Antão wrote:
> But the point is that the class interface changes to the outside user:
> 1. How does one report back the root to the user?
> 2. Regarding the prefix stuff, should the user be allowed to specify a
> preferred prefix?
>
> Both these things imply interface changes visible to users.
> If you still need volunteers to do the change, I can do it. But I need
> to know what changes to the user interface are to be done.
> For 1, maybe a method getRoot, returning a string with the name of the
> root node?
> For 2, maybe an extended version of the parse function with a suffix
> as input parameter?
Peter E. Midford
Mesquite Developer
Peter.Midford at gmail.com
From holland at eaglegenomics.com Tue Nov 3 20:20:31 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 3 Nov 2009 20:20:31 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
<0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
<196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
<6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
<6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
<9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
<6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
Message-ID:
A getRoot() function sounds good. It would return the String label of
the root node, i.e. the same label that identifies the corresponding
vertex in the JGraphT model. An equivalent setRoot() would be nice.
The prefix for the parser is currently hardcoded as 'p'. Two new
methods should be provided: setDefaultPrefix(), which accepts a string,
and getDefaultPrefix(), which returns it. The setter should check that
the string is valid, i.e. all alphanumeric with no spaces or other
Newick-sensitive characters. The parser should be changed to use the
output from getDefaultPrefix() instead of the hardcoded 'p'. By default
it should behave the same as at present unless the user explicitly says
otherwise by calling the setDefaultPrefix() method.
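A minimal sketch of that getter/setter pair, assuming a standalone holder class (PrefixSettings is a made-up name; in practice the field would presumably live wherever the parsing happens) and assuming "valid" means ASCII letters and digits only:

```java
import java.util.regex.Pattern;

class PrefixSettings {
    // Assumption: only ASCII letters and digits are allowed, which rules out
    // spaces and Newick-sensitive characters such as ( ) , : ; [ ]
    private static final Pattern VALID = Pattern.compile("[A-Za-z0-9]+");

    // Default matches the currently hardcoded value.
    private String defaultPrefix = "p";

    public String getDefaultPrefix() {
        return defaultPrefix;
    }

    public void setDefaultPrefix(String prefix) {
        if (prefix == null || !VALID.matcher(prefix).matches()) {
            throw new IllegalArgumentException("Invalid prefix: " + prefix);
        }
        this.defaultPrefix = prefix;
    }
}
```

Code that never calls setDefaultPrefix() keeps seeing "p", so existing behaviour is unchanged.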
Personally I would also alter the methods that return JGraphTs so that
they return their Directed equivalents if possible. I believe that
these can still be unrooted - you'd have to check the JGraphT
documentation to make sure.
Richard.
On 3 Nov 2009, at 18:55, Tiago Antão wrote:
> But the point is that the class interface changes to the outside user:
> 1. How does one report back the root to the user?
> 2. Regarding the prefix stuff, should the user be allowed to specify a
> preferred prefix?
>
> Both these things imply interface changes visible to users.
> If you still need volunteers to do the change, I can do it. But I need
> to know what changes to the user interface are to be done.
> For 1, maybe a method getRoot, returning a string with the name of the
> root node?
> For 2, maybe an extended version of the parse function with a suffix
> as input parameter?
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From thasso.griebel at uni-jena.de Wed Nov 4 11:57:45 2009
From: thasso.griebel at uni-jena.de (Thasso Griebel)
Date: Wed, 4 Nov 2009 12:57:45 +0100
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To:
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
<0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
<196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
<6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
<6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
<9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
<6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
Message-ID: <6303BD72-3F47-4DB1-8526-0402FEB502EE@uni-jena.de>
Hi,
> A getRoot() function sounds good. It would return the String label
> of the root node, the same as which identifies the corresponding
> vertex in the JGraphT model. An equivalent setRoot() would be nice.
Keep in mind, though, that switching the root to another node changes
the tree structure, and this has to be taken into account when the
newick string is parsed and the graph is created. You have to parse
the graph from newick first and then "reroot" the tree, as the
requested root might not be the one specified in the newick string.
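To make the rerooting step concrete, here is a small sketch in plain Java (not JGraphT; the class Reroot and its helper names are invented for the example). It takes the tree as an undirected adjacency map and re-derives the parent of every node by walking outwards from the new root, which in effect reverses the edges on the path between the old and the new root:

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class Reroot {
    /** Adds one undirected edge to the adjacency map. */
    static void addEdge(Map<String, Set<String>> adj, String a, String b) {
        adj.computeIfAbsent(a, k -> new LinkedHashSet<>()).add(b);
        adj.computeIfAbsent(b, k -> new LinkedHashSet<>()).add(a);
    }

    /**
     * Orients the tree away from newRoot using a breadth-first walk.
     * Returns a map child -> parent; newRoot itself has no entry.
     */
    static Map<String, String> parentsFrom(Map<String, Set<String>> adj, String newRoot) {
        Map<String, String> parent = new LinkedHashMap<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(newRoot);
        queue.add(newRoot);
        while (!queue.isEmpty()) {
            String v = queue.remove();
            for (String w : adj.getOrDefault(v, Collections.emptySet())) {
                if (seen.add(w)) {      // first visit: v is w's parent
                    parent.put(w, v);
                    queue.add(w);
                }
            }
        }
        return parent;
    }
}
```

For (A,(B,C)) with hypothetical internal labels p1 and p2, rerooting at p2 makes p2 the parent of p1, whereas in the newick string p1 was the parent of p2.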
> Personally I would also alter the methods that return JGraphTs so
> that they return their Directed equivalents if possible. I believe
> that these can still be unrooted - you'd have to check the JGraphT
> documentation to make sure.
You have to change the method signature if you want to keep using the
same method. The only relationship between JGraphT's UndirectedGraph
and its DirectedGraph counterpart is that they both extend the Graph
interface, but a DirectedGraph is not an UndirectedGraph. Switching to
DirectedGraph definitely breaks the current API! I don't know how you
usually handle such situations in BioJava, but this clearly breaks
compatibility. Maybe it would be better to introduce a new method that
returns directed graphs?
cheers,
-thasso
--
Dipl. Inf. Thasso Griebel-------------------Lehrstuhl fuer Bioinformatik
Office 3426--http://bio.informatik.uni-jena.de--Institut fuer Informatik
Phone +49 (0)3641 9-46454-----------Friedrich-Schiller-Universitaet Jena
Fax +49 (0)3641 9-46452----------Ernst-Abbe-Platz 2, 07743 Jena, Germany
From tiagoantao at gmail.com Wed Nov 4 12:40:46 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 4 Nov 2009 12:40:46 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To:
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
<196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
<6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
<6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
<9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
<6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
Message-ID: <6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com>
2009/11/3 Richard Holland :
> The prefix for the parser currently is hardcoded as p. Two new methods - set
> and getDefaultPrefix which accept a string should be provided (it should
> check that the string is valid, i.e. all alphanumeric and with no spaces or
> other Newick-sensitive characters). The parser should be changed to use the
> output from getDefaultPrefix() instead of the hardcoded p. The default
> behaviour should be such that it behaves the same as at present unless the
> user explicitly says otherwise by calling the setDefaultPrefix() method.
This default behavior would still raise an exception for trees that
already contain nodes called p*. I would suggest a minor change: if
there is a clash, the parser would try the next p* (or whatever the
defaultPrefix is).
Example to make it clear: if there is a leaf called p2, the internal
nodes generated would be p1, p3, p4, ...
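A quick sketch of that clash-skipping rule, assuming internal nodes are numbered from 1 and that the leaf labels are known before the internal names are generated (InternalNodeNamer is a made-up helper, not existing BioJava code):

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

class InternalNodeNamer {
    private final String prefix;
    private final Set<String> taken;   // labels already used in the tree
    private int counter = 0;

    InternalNodeNamer(String prefix, Collection<String> existingLabels) {
        this.prefix = prefix;
        this.taken = new HashSet<>(existingLabels);
    }

    /** Returns the next prefix+number label that is not already in use. */
    String next() {
        String candidate;
        do {
            counter++;
            candidate = prefix + counter;
        } while (!taken.add(candidate)); // add() is false when the label clashes
        return candidate;
    }
}
```

With prefix "p" and leaves A, B and p2, three calls to next() give p1, p3 and p4, as in the example above.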
--
"The hottest places in hell are reserved for those who, in times of
moral crisis, maintain a neutrality." - Dante
From tiagoantao at gmail.com Wed Nov 4 12:44:21 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 4 Nov 2009 12:44:21 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6303BD72-3F47-4DB1-8526-0402FEB502EE@uni-jena.de>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
<196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
<6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
<6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
<9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
<6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
<6303BD72-3F47-4DB1-8526-0402FEB502EE@uni-jena.de>
Message-ID: <6d941f120911040444y33da2642oe7104708a2d2a6cb@mail.gmail.com>
2009/11/4 Thasso Griebel :
>> Personally I would also alter the methods that return JGraphTs so that
>> they return their Directed equivalents if possible. I believe that these can
>> still be unrooted - you'd have to check the JGraphT documentation to make
>> sure.
>
> You have to change that method signature if you want to use the same method.
> The only relationship between JGraphT's UndirectedGraph and the DirectedGraph
> counterpart is that they both extend the Graph interface, but a
> DirectedGraph is not an UndirectedGraph. Switching to DirectedGraph
> definitely breaks the current API! I don't know how you usually handle such
> situations in BioJava, but this clearly breaks compatibility. Maybe it would
> be better to introduce a new method that returns directed graphs?
I also don't know how BioJava handles these kinds of issues. But my
personal, outsider opinion goes in your direction, i.e.:
a. Do not break the current API
b. Add a new method that returns a directed graph
c. (extra) Add a new method boolean isRooted() to check whether the
tree is rooted or not.
Best
Tiago
From holland at eaglegenomics.com Wed Nov 4 12:46:01 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 4 Nov 2009 12:46:01 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6303BD72-3F47-4DB1-8526-0402FEB502EE@uni-jena.de>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<6d941f120911021136k8218472hd9fb52efd4021baf@mail.gmail.com>
<0A136D45-0116-4F6A-A96B-FF9710FB018C@eaglegenomics.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
<196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
<6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
<6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
<9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
<6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
<6303BD72-3F47-4DB1-8526-0402FEB502EE@uni-jena.de>
Message-ID:
>
> You have to change that method signature if you want to use the same
> method. The only relationship between JGraphT's UndirectedGraph and
> the DirectedGraph counterpart is that they both extend the Graph
> interface, but a DirectedGraph is not an UndirectedGraph. Switching
> to DirectedGraph definitely breaks the current API! I don't know
> how you usually handle such situations in BioJava, but this clearly
> breaks compatibility. Maybe it would be better to introduce a new
> method that returns directed graphs?
Whether or not to break the API depends on a few things. First, how
old and widely adopted is the code? Second, is the existing API
illogical or just plain wrong? Weighing the two tells you how
confidently the API can be changed.
In this instance, the code is fairly new, not widely adopted, and the
existing API is clearly wrong by forcing all JGraphT graphs to be
undirected.
To keep everyone happy, I would introduce a new method with a new name
that takes a boolean or enum option indicating what type of graph the
user wants (undirected, directed, whatever). I would then deprecate the
existing method, move its contents into the undirected branch of the
new method, and replace the old method's contents with a call to the
new method with the option set to undirected.
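Roughly, the pattern would be as follows. This is a sketch only: the class, method and enum names are invented, and the String return values are placeholders standing in for the real JGraphT graph objects.

```java
class TreeGraphs {
    /** Hypothetical option telling the new method which graph to build. */
    enum GraphKind { UNDIRECTED, DIRECTED }

    /** New entry point: the caller chooses the kind of graph. */
    static String getTreeAsGraph(String newick, GraphKind kind) {
        switch (kind) {
            case DIRECTED:
                // new behaviour: edges point away from the root
                return "directed graph for " + newick;
            case UNDIRECTED:
            default:
                // body of the old method moves here unchanged
                return "undirected graph for " + newick;
        }
    }

    /**
     * Old entry point, kept so existing callers still compile.
     *
     * @deprecated use {@link #getTreeAsGraph(String, GraphKind)} instead.
     */
    @Deprecated
    static String getTreeAsUndirectedGraph(String newick) {
        return getTreeAsGraph(newick, GraphKind.UNDIRECTED);
    }
}
```

In the real code the new method would need a return type that covers both cases, for instance the common Graph interface mentioned earlier in the thread, since a DirectedGraph is not an UndirectedGraph.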
cheers,
Richard
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From holland at eaglegenomics.com Wed Nov 4 12:46:34 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 4 Nov 2009 12:46:34 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
<196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
<6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
<6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
<9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
<6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
<6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com>
Message-ID: <6E4DA847-C5A6-47FA-ACA4-88C55DDD2CFC@eaglegenomics.com>
Sounds good.
On 4 Nov 2009, at 12:40, Tiago Antão wrote:
> 2009/11/3 Richard Holland :
>> The prefix for the parser currently is hardcoded as p. Two new
>> methods - set and getDefaultPrefix which accept a string should be
>> provided (it should check that the string is valid, i.e. all
>> alphanumeric and with no spaces or other Newick-sensitive
>> characters). The parser should be changed to use the output from
>> getDefaultPrefix() instead of the hardcoded p. The default behaviour
>> should be such that it behaves the same as at present unless the
>> user explicitly says otherwise by calling the setDefaultPrefix()
>> method.
>
> This default behavior would still raise an exception with nodes called
> p* . I would suggest a minor change: If there is a clash, the parser
> would try the next p* (or whatever defaultPrefix) ...
>
> Example to make it clear: if there is a leaf called p2, internal nodes
> generated would be p1, p3, p4, ....
>
> --
> "The hottest places in hell are reserved for those who, in times of
> moral crisis, maintain a neutrality." - Dante
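The two ideas quoted above (a validated, configurable prefix and skipping over names that clash) could be sketched roughly as follows. This is only an illustration, not the BioJava parser code: the class NodeNamer and the methods registerName() and nextName() are hypothetical names, and the alphanumeric check is one assumed way to reject Newick-sensitive characters.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of BioJava: hands out internal node names.
final class NodeNamer {
    // Default matches the currently hardcoded prefix, so behaviour is unchanged
    // unless the caller sets a different one.
    private String defaultPrefix = "p";
    private final Set<String> used = new HashSet<String>();
    private int counter = 0;

    String getDefaultPrefix() {
        return defaultPrefix;
    }

    // Assumed validity rule: non-empty and purely alphanumeric, which excludes
    // spaces and Newick punctuation such as ( ) , : ;
    void setDefaultPrefix(String prefix) {
        if (prefix == null || !prefix.matches("[A-Za-z0-9]+")) {
            throw new IllegalArgumentException("Invalid prefix: " + prefix);
        }
        this.defaultPrefix = prefix;
    }

    // Record a name that already appears in the tree (e.g. a leaf label).
    void registerName(String name) {
        used.add(name);
    }

    // Return the next prefix+number name that is not already taken.
    String nextName() {
        String candidate;
        do {
            counter++;
            candidate = defaultPrefix + counter;
        } while (!used.add(candidate)); // add() is false when the name is taken
        return candidate;
    }
}
```

With a leaf called p2 registered first, successive calls to nextName() give p1, p3, p4, as in the example above.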
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From holland at eaglegenomics.com Wed Nov 4 12:51:37 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 4 Nov 2009 12:51:37 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6E4DA847-C5A6-47FA-ACA4-88C55DDD2CFC@eaglegenomics.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<6d941f120911021511o5bd3bd4bh51b3a8158d8b094b@mail.gmail.com>
<78371022-4851-45D1-81BE-72B10906F924@eaglegenomics.com>
<196A2F45-F93A-46A8-9952-D9C7D2F2828B@uni-jena.de>
<6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
<6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
<9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
<6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
<6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com>
<6E4DA847-C5A6-47FA-ACA4-88C55DDD2CFC@eaglegenomics.com>
Message-ID:
Ah... except there is a problem! The parser does not know all the names
in the string in advance, so if it auto-assigns a name that is then used
later in the string, we have the same name clash as before.
The names the parser assigns cannot avoid every clash unless it has
already scanned the string to find out which names are used in it. So
some kind of pre-parse would be necessary.
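A rough sketch of what such a pre-parse could look like is below. It is an assumption-laden illustration, not the BioJava implementation: NewickNamePreScan, collectLabels() and generateNames() are hypothetical names, and the scan ignores quoted labels and [comments], which a real Newick parser would have to handle.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of BioJava.
final class NewickNamePreScan {

    // First pass: collect every label in the Newick string.
    // Text after ':' is a branch length, so it is skipped.
    static Set<String> collectLabels(String newick) {
        Set<String> labels = new LinkedHashSet<String>();
        StringBuilder token = new StringBuilder();
        boolean inLength = false;
        for (int i = 0; i < newick.length(); i++) {
            char c = newick.charAt(i);
            if (c == '(' || c == ')' || c == ',' || c == ':' || c == ';') {
                String t = token.toString().trim();
                if (!inLength && t.length() > 0) {
                    labels.add(t);
                }
                token.setLength(0);
                inLength = (c == ':');
            } else {
                token.append(c);
            }
        }
        return labels;
    }

    // Second step: produce 'count' names of the form prefix+number that do
    // not clash with any label found by the first pass.
    static List<String> generateNames(String prefix, Set<String> taken, int count) {
        Set<String> used = new HashSet<String>(taken);
        List<String> names = new ArrayList<String>();
        int n = 0;
        while (names.size() < count) {
            n++;
            String candidate = prefix + n;
            if (used.add(candidate)) {
                names.add(candidate);
            }
        }
        return names;
    }
}
```

For the string "((p2:0.1,B:0.2):0.3,C:0.4);" the first pass finds p2, B and C, so asking for three generated names with prefix p yields p1, p3 and p4 regardless of where p2 appears in the string.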
On 4 Nov 2009, at 12:46, Richard Holland wrote:
> Sounds good.
>
> On 4 Nov 2009, at 12:40, Tiago Antão wrote:
>
>> 2009/11/3 Richard Holland :
>>> The prefix for the parser currently is hardcoded as p. Two new
>>> methods - set and getDefaultPrefix which accept a string should be
>>> provided (it should check that the string is valid, i.e. all
>>> alphanumeric and with no spaces or other Newick-sensitive
>>> characters). The parser should be changed to use the output from
>>> getDefaultPrefix() instead of the hardcoded p. The default
>>> behaviour should be such that it behaves the same as at present
>>> unless the user explicitly says otherwise by calling the
>>> setDefaultPrefix() method.
>>
>> This default behavior would still raise an exception with nodes
>> called p*. I would suggest a minor change: If there is a clash, the
>> parser would try the next p* (or whatever defaultPrefix) ...
>>
>> Example to make it clear: if there is a leaf called p2, internal
>> nodes generated would be p1, p3, p4, ....
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
From tiagoantao at gmail.com Wed Nov 4 17:18:52 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 4 Nov 2009 17:18:52 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To:
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<6d941f120911031016l2761d27j84526d837f2c1245@mail.gmail.com>
<6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
<9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
<6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
<6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com>
<6E4DA847-C5A6-47FA-ACA4-88C55DDD2CFC@eaglegenomics.com>
Message-ID: <6d941f120911040918v663f6d01s8d8d14d0bda94fc0@mail.gmail.com>
Unless someone with BioJava development experience wants to take this
on, I volunteer to do it. I ended up using the PhyloXML forester-atv
parser (and moving to PhyloXML instead of Nexus), but since I reported
this, I might as well sort it out...
2009/11/4 Richard Holland :
> ah... except a problem! The parser does not know all names in the string in
> advance, so if it auto-assigns one that is then used later in the string, we
> have the same problem with name clashes as before.
>
> The names the parser assigns cannot totally avoid all clashes unless it has
> already parsed the string to find out what names were used in the string
> itself already. So some kind of pre-parse would be necessary.
>
> On 4 Nov 2009, at 12:46, Richard Holland wrote:
>
>> Sounds good.
>>
>> On 4 Nov 2009, at 12:40, Tiago Antão wrote:
>>
>>> 2009/11/3 Richard Holland :
>>>>
>>>> The prefix for the parser currently is hardcoded as p. Two new
>>>> methods - set and getDefaultPrefix which accept a string should be
>>>> provided (it should check that the string is valid, i.e. all
>>>> alphanumeric and with no spaces or other Newick-sensitive
>>>> characters). The parser should be changed to use the output from
>>>> getDefaultPrefix() instead of the hardcoded p. The default
>>>> behaviour should be such that it behaves the same as at present
>>>> unless the user explicitly says otherwise by calling the
>>>> setDefaultPrefix() method.
>>>
>>> This default behavior would still raise an exception with nodes called
>>> p* . I would suggest a minor change: If there is a clash, the parser
>>> would try the next p* (or whatever defaultPrefix) ...
>>>
>>> Example to make it clear: if there is a leaf called p2, internal nodes
>>> generated would be p1, p3, p4, ....
--
"The hottest places in hell are reserved for those who, in times of
moral crisis, maintain a neutrality." - Dante
From andreas at sdsc.edu Wed Nov 4 17:26:06 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 4 Nov 2009 09:26:06 -0800
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <6d941f120911040918v663f6d01s8d8d14d0bda94fc0@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
<9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
<6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
<6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com>
<6E4DA847-C5A6-47FA-ACA4-88C55DDD2CFC@eaglegenomics.com>
<6d941f120911040918v663f6d01s8d8d14d0bda94fc0@mail.gmail.com>
Message-ID: <59a41c430911040926h607b51e6ydd6d8145424d073b@mail.gmail.com>
Excellent, thanks for taking this on!
Andreas
2009/11/4 Tiago Antão
> Unless anyone with experience in biojava development wants to take on
> this, I would volunteer to do this. I ended up using the PhyloXML
> forester-atv parser (and moving to phyloxml instead of nexus), but as
> I reported this, I might as well sort it out...
>
> 2009/11/4 Richard Holland :
> > ah... except a problem! The parser does not know all names in the
> > string in advance, so if it auto-assigns one that is then used later
> > in the string, we have the same problem with name clashes as before.
> >
> > The names the parser assigns cannot totally avoid all clashes unless
> > it has already parsed the string to find out what names were used in
> > the string itself already. So some kind of pre-parse would be
> > necessary.
> >
> > On 4 Nov 2009, at 12:46, Richard Holland wrote:
> >
> >> Sounds good.
> >>
> >> On 4 Nov 2009, at 12:40, Tiago Antão wrote:
> >>
> >>> 2009/11/3 Richard Holland :
> >>>>
> >>>> The prefix for the parser currently is hardcoded as p. Two new
> >>>> methods - set and getDefaultPrefix which accept a string should
> >>>> be provided (it should check that the string is valid, i.e. all
> >>>> alphanumeric and with no spaces or other Newick-sensitive
> >>>> characters). The parser should be changed to use the output from
> >>>> getDefaultPrefix() instead of the hardcoded p. The default
> >>>> behaviour should be such that it behaves the same as at present
> >>>> unless the user explicitly says otherwise by calling the
> >>>> setDefaultPrefix() method.
> >>>
> >>> This default behavior would still raise an exception with nodes called
> >>> p* . I would suggest a minor change: If there is a clash, the parser
> >>> would try the next p* (or whatever defaultPrefix) ...
> >>>
> >>> Example to make it clear: if there is a leaf called p2, internal nodes
> >>> generated would be p1, p3, p4, ....
From tiagoantao at gmail.com Fri Nov 6 11:30:00 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Fri, 6 Nov 2009 11:30:00 +0000
Subject: [Biojava-l] Rooted trees in nexus files
In-Reply-To: <59a41c430911040926h607b51e6ydd6d8145424d073b@mail.gmail.com>
References: <6d941f120910301414k29c45c9ep6fa2c05b43f21842@mail.gmail.com>
<6d941f120911031024h4d1c1295o79bb17372883d269@mail.gmail.com>
<9F521EA8-B1E1-419F-B430-85CFCFDF0751@eaglegenomics.com>
<6d941f120911031055h21c20b75kec38e0b55bd5921b@mail.gmail.com>
<6d941f120911040440h38aa3fb9m873e4504f2a966a2@mail.gmail.com>
<6E4DA847-C5A6-47FA-ACA4-88C55DDD2CFC@eaglegenomics.com>