From jbdundas at gmail.com Tue Dec 1 08:39:56 2009
From: jbdundas at gmail.com (jitesh dundas)
Date: Tue, 1 Dec 2009 19:09:56 +0530
Subject: [Biojava-l] How do I get set of protein interactors for a specific protein from my code
Message-ID: <326ea8620912010539l5c1f06acs87539939b40ca41a@mail.gmail.com>

Dear Sir/Madam,

Many thanks for your help in solving my previous problem (XML parsing error).

I am writing a program to fetch details regarding any protein from the NCBI
database. However, I do not know how to fetch the details of protein-protein
interactors for a specific protein.

For example, if I click on protein p53, it should give me a list of protein
interactors for p53, such as shown in
http://www.hprd.org/interactions?hprd_id=01859&isoform_id=01859_1&isoform_name=

Can someone please tell me how I can get this data from NCBI or any other
source? Which database should I consider, and which parameters are involved?
I am using Java as my language for doing so.

Thanks in advance.

Regards,
Jitesh Dundas

On Tue, Nov 24, 2009 at 8:21 PM, Richard Holland wrote:

> Jitesh - I forwarded your response to the list so that everyone can get the
> chance to reply.
>
> cheers,
> Richard
>
> Begin forwarded message:
>
> *From: *jitesh dundas
> *Date: *24 November 2009 14:47:00 GMT
> *To: *Richard Holland
> *Subject: **Re: [Biojava-l] Java Error:- XML Parsing Error: XML or text
> declaration not at start of entity*
>
> Dear Sir,
>
> Thank you for your reply. I figured this problem out by sending records in
> small sets, e.g. 20 records per page.
>
> It is like a pagination functionality. For each new page, we need to hit
> the URL.
>
> My functionality is working fine. I will be happy to share my code with you
> (and anyone) who needs it.
>
> I simply fetch data from the URL and write to an XML file. Next I just read
> the XML file and show them in the web page to the user.
>
> Again, I need to know how to fetch records for protein database. Two types
> of searches are needed I suspect.
>
> First we use the Esearch utility and then the Efetch utility to get the
> data of the specific protein.
>
> I welcome any suggestions on this!
>
> Thank you everyone for your help.
>
> Regards,
> Jitesh Dundas
>
> On 11/24/09, Richard Holland wrote:
>>
>> Your program takes an input 'txtURLString' - could you give an example of
>> the value that this usually contains? I suspect that this URL is where your
>> problem lies but without seeing an example value I couldn't say for sure.
>>
>> thanks,
>> Richard
>>
>> On 8 Nov 2009, at 10:22, jitesh dundas wrote:
>>
>> > Dear Sir,
>> >
>> > My program is working fine and can send me an xml file with 20
>> > records. However, it does not allow me to send large amounts of
>> > records.
>> >
>> > For e.g. if I enter "cancer" it will return only 20 records.
>> >
>> > Can you please tell me what I should do next to get all those records.
>> > Thank you in advance
>> >
>> > Regards,
>> > Jitesh Dundas
>> >
>> > On Sun, Nov 1, 2009 at 9:36 PM, Andreas Prlic wrote:
>> >>
>> >> Hi Jitesh,
>> >>
>> >> It is hard to read your code with all the formatting off, probably due
>> >> to email, and many commented lines that don't seem to get used. Can you
>> >> provide the stacktrace, so we can see what part of biojava is affected?
>> >>
>> >> Probably a good strategy to write and debug this is to simplify the
>> >> problem into smaller steps. Try to first download the files you want to
>> >> parse and write the code to parse them from the local file. That will avoid
>> >> any issues you might encounter with networking and server/client
>> >> communication. Once the parsing is working you could take it to the next
>> >> step and add the server communication...
>> >>
>> >> Andreas
>> >>
>> >>
>> >> On Sun, Nov 1, 2009 at 7:41 AM, jitesh dundas wrote:
>> >>>
>> >>> Hi friends,
>> >>>
>> >>> I am getting this error on doing a post(using the code below) to this url->
>> >>> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=cancer&reldate=10
>> >>>
>> >>> I have written this code in .jsp file. Later I will change it into servlet.
>> >>>
>> >>> Error:-
>> >>> XML Parsing Error: XML or text declaration not at start of entity
>> >>> Location:
>> >>> http://localhost:8080/ProteomDb/ImportFromPubmed2.jsp?txtDbName=pubmed&txtTerm=cancer&txtreldate=10&comSDay=01&comSMonth=01&txtSYear=&comEDay=01&comEMonth=01&txtEYear=&txtURLString=http%3A%2F%2Feutils.ncbi.nlm.nih.gov%2Fentrez%2Feutils%2Fesearch.fcgi%3Fdb%3Dpubmed%26term%3Dcancer%26reldate%3D10&txtsubmit=Fetch+Data+From+NCBI
>> >>> Line Number 11, Column 1:
>> >>> PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN"
>> >>> "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd">2034200
>> >>> 19877350 19877304 19877297
>> >>> 19877284 19877271 19877265
>> >>> 19877250 19877245 19877226
>> >>> 19877210 19877179 19877175
>> >>> 19877161 19877159 19877158
>> >>> 19877123 19877122 19877120
>> >>> 19877119 19877118
>> >>> cancer
>> >>> "neoplasms"[MeSH Terms] OR "neoplasms"[All Fields] OR "cancer"[All Fields]
>> >>> "neoplasms"[MeSH Terms] MeSH Terms 2082133 Y
>> >>> "neoplasms"[All Fields] All Fields 1634731 Y
>> >>> OR "cancer"[All Fields] All Fields 902537 Y
>> >>> OR GROUP
>> >>> 2009/10/22[EDAT] EDAT 0 Y
>> >>> 2009/11/01[EDAT] EDAT 0 Y RANGE AND
>> >>> ("neoplasms"[MeSH Terms] OR "neoplasms"[All Fields] OR "cancer"[All Fields]) AND 2009/10/22[EDAT] : 2009/11/01[EDAT]
>> >>> ^
>> >>>
>> >>> As you can see, the XML output is coming fine but the above error does not
>> >>> go..The output via this program should be just like hitting manually the
>> >>> above URL in the browser..
>> >>> The browser is Mozilla Firefox.
>> >>>
>> >>> Code:-
>> >>>
>> >>> <%@ page language = "java" %>
>> >>> <%@ page import = "java.sql.*" %>
>> >>> <%@ page import = "java.util.*" %>
>> >>> <%@ page import = "java.io.*" %>
>> >>> <%@ page import="java.lang.*" %>
>> >>> <%@ page import="java.net.*" %>
>> >>> <%@ page import="java.nio.*" %>
>> >>> <%@ page contentType="text/xml; charset=utf-8" pageEncoding="UTF-8" %>
>> >>>
>> >>>
>> >>> <%
>> >>>
>> >>> try
>> >>> {
>> >>>     //String str = "";
>> >>>     //out.println("");
>> >>>
>> >>>     Properties systemSettings = System.getProperties();
>> >>>     systemSettings.put("http.proxyHost", "********");
>> >>>     systemSettings.put("http.proxyPort", "******");
>> >>>     systemSettings.put("sun.net.client.defaultConnectTimeout", "10000");
>> >>>     systemSettings.put("sun.net.client.defaultReadTimeout", "10000");
>> >>>
>> >>>     //out.println("Properties Set");
>> >>>     Authenticator.setDefault(new Authenticator()
>> >>>     {
>> >>>         protected PasswordAuthentication getPasswordAuthentication()
>> >>>         {
>> >>>             return new PasswordAuthentication("**", "******".toCharArray()); // specify ur user name password of iitb login
>> >>>         }
>> >>>     });
>> >>>
>> >>>
>> >>>     System.setProperties(systemSettings);
>> >>>     //out.println("After Authentication & Properties Settings");
>> >>>
>> >>>     //create xml file.
>> >>>     //the input to google api
>> >>>     //String textAreaContent = request.getParameter("text");
>> >>>     String textAreaContent = "This si a tst";
>> >>>
>> >>>     String str = "";
>> >>>
>> >>>     //xml file generation ends here..
>> >>>     //FetchDataFromNCBI_URLString.jsp
>> >>>     String URLString = request.getParameter("txtURLString").trim();
>> >>>
>> >>>     //URL url = new URL("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=BAA20519");
>> >>>     URL url = new URL(URLString); //url string taken from user input.
>> >>>     HttpURLConnection connection = null;
>> >>>
>> >>>     connection = (HttpURLConnection) url.openConnection();
>> >>>     System.out.println("After open connection");
>> >>>     connection.setRequestMethod("POST");
>> >>>     connection.setDoInput(true);
>> >>>     connection.setDoOutput(true);
>> >>>
>> >>>     connection.setUseCaches(false);
>> >>>     connection.setAllowUserInteraction(false);
>> >>>     //connection.setFollowRedirects(true);
>> >>>     //connection.setInstanceFollowRedirects(true);
>> >>>     //System.out.println("Before-------------------");
>> >>>     connection.setRequestProperty ("Content-Type","text/xml; charset=\"utf-8\"");
>> >>>     //System.out.println("After-------------------");
>> >>>
>> >>>     //System.out.println(""+ connection.getOutputStream());
>> >>>
>> >>>     //System.out.println("After dataoutputstream..Line No-65");
>> >>>
>> >>>     //System.out.println("Response Code="+ connection.getResponseCode);
>> >>>
>> >>>     OutputStreamWriter dosout = new OutputStreamWriter(connection.getOutputStream());
>> >>>     //System.out.println("After dosout object..Line No-63");
>> >>>     //dosout.write(str);
>> >>>     dosout.close ();
>> >>>
>> >>>     BufferedReader in = new BufferedReader( new InputStreamReader(connection.getInputStream()));
>> >>>
>> >>>     String decodedString;
>> >>>     String tempstr = "";
>> >>>
>> >>>
>> >>>     while ((decodedString = in.readLine()) != null)
>> >>>     {
>> >>>         tempstr = tempstr + decodedString;
>> >>>         //out.println(decodedString);
>> >>>     }
>> >>>     out.println(tempstr);
>> >>>     in.close();
>> >>> }
>> >>> catch(Exception ex)
>> >>> {
>> >>>     out.println("Exception->"+ex);
>> >>>     PrintWriter pw = response.getWriter();
>> >>>     ex.printStackTrace(pw);
>> >>> }
>> >>>
>> >>>
>> >>> %>
>> >>>
>> >>> Thanks in advance..
>> >>>
>> >>> Regards,
>> >>> Jitesh Dundas
>> >>>
>> >>> _______________________________________________
>> >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> >>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>> >>
>> >>
>> > _______________________________________________
>> > Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>
>>
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>

From jbdundas at gmail.com Tue Dec 1 08:33:31 2009
From: jbdundas at gmail.com (jitesh dundas)
Date: Tue, 1 Dec 2009 19:03:31 +0530
Subject: [Biojava-l] NCBI xml parser
In-Reply-To: <1257170595.29918.8.camel@localhost.localdomain>
References: <1257170595.29918.8.camel@localhost.localdomain>
Message-ID: <326ea8620912010533i4e9524dcm30c74814d5e950c2@mail.gmail.com>

Dear Sir,

I am not using any of those methods. However, I have written some code to
parse through an XML file for pubmed data. I hope this helps you.

Regards,
Jitesh Dundas

On Mon, Nov 2, 2009 at 7:33 PM, Pierre-Yves wrote:

> Dear list,
>
> I am trying to find my way around parsing ncbi blast xml.
> I am using a small library which performs the blast online [1] and
> returns a FileReader of the xml.
> I can convert the FileReader to a string and print it, it seems fine.
> (I used the default input shown on [1]).
>
> So I am now trying to parse it automatically. I looked at [2] and [3]
> but I could not get them working. I then found this message from this
> mailing list [4] and thus went to use BlastXMLParserFacade.
> It returns me an "org.xml.sax.SAXException: illegal frame number
> encountered. (0)".
>
> So my question is then: which method should I use ?
>
> Thanks in advance,
>
> Best regards,
>
> Pierre
>
>
>
> [1] http://users.encs.concordia.ca/~f_kohant/ncbiblast/
> [2] http://biojava.org/wiki/BioJava:CookBook:Blast:Echo
> [3] http://biojava.org/wiki/BioJava:Tutorial:Blast-like_Parsing_Cook_Book
> [4] http://osdir.com/ml/java.bio.general/2005-06/msg00018.html
>
>
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: FetchDataFromNCBI.jsp
Type: application/octet-stream
Size: 3378 bytes
Desc: not available
URL:

From oliver.stolpe at fu-berlin.de Sun Dec 6 09:04:11 2009
From: oliver.stolpe at fu-berlin.de (Oliver Stolpe)
Date: Sun, 06 Dec 2009 15:04:11 +0100
Subject: [Biojava-l] Sequences as strings to RichSequence iterator
Message-ID: <4B1BB9DB.1070700@fu-berlin.de>

Hello *,

I have a set of sequences as strings in an array. I now want to turn them
into an iterator over RichSequences in FASTA format. I read the cookbook,
but I don't get it. I also looked up the examples in biojavax-doc. I tried a
lot, but I have no good starting point. No starting point at all. How does
the RichSequenceBuilder work? What about the FastaFormat thing?

I thought about putting the sequences in a FASTA file and then reading the
file. But there must be a much more straightforward way!
Thanks in advance for any hints,
Oliver

From holland at eaglegenomics.com Sun Dec 6 10:41:31 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Sun, 6 Dec 2009 15:41:31 +0000
Subject: [Biojava-l] Sequences as strings to RichSequence iterator
In-Reply-To: <4B1BB9DB.1070700@fu-berlin.de>
References: <4B1BB9DB.1070700@fu-berlin.de>
Message-ID: <6A410A55-1745-4189-A314-15DFC37360DF@eaglegenomics.com>

I'm not sure what you're trying to do here - are you trying to represent
your string array of sequences as a RichSequenceIterator, or are you trying
to convert them into FASTA? I'll answer both anyway...:

To convert your String[] of sequences into a RichSequenceIterator you need
to create a new class that implements the RichSequenceIterator interface.
You would probably write something like this (which I have not checked or
compiled - so if it has bugs, sorry!):

public class MyDNASeqIterator implements RichSequenceIterator {
    private final String[] sequences;
    private int counter; // not final: nextRichSequence() increments it

    public MyDNASeqIterator(String[] sequences) {
        this.sequences = sequences;
        this.counter = 0;
    }

    public boolean hasNext() {
        return this.counter <= this.sequences.length;
    }

    public Sequence nextSequence() { return nextRichSequence(); }

    public BioEntry nextBioEntry() { return nextRichSequence(); }

    public RichSequence nextRichSequence() {
        String seqName = "MySeq" + this.counter;
        return RichSequence.Tools.createRichSequence(seqName,
                this.sequences[this.counter++], DNATools.getDNA());
    }
}

You can then instantiate an object using MyDNASeqIterator's constructor to
give it your string array, and iterate over it to get corresponding
RichSequence instances.

To convert your sequences to FASTA, use the above iterator to generate
sequences to pass to FastaFormat in the same way that you would write a
normal FASTA file.

cheers,
Richard

On 6 Dec 2009, at 14:04, Oliver Stolpe wrote:

> Hello *,
>
> I have a set of sequences as strings in an array.
> I now want to turn them into an iterator over RichSequences in
> fasta-format. I read in the cookbook, but I dont get it. And looked up the
> examples in biojavax-doc. I tried much but I have no good starting point.
> No starting point at all. How do the RichSequenceBuilder work? What about
> the FastaFormat-thing?
> I thought about putting the sequences in a fast-file and then read the
> file. But there must be a much more straight-forward way!
>
> Thanks in advance for any hints,
> Oliver
>
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From koen.bruynseels at cropdesign.com Sun Dec 6 12:09:01 2009
From: koen.bruynseels at cropdesign.com (koen.bruynseels at cropdesign.com)
Date: Sun, 6 Dec 2009 18:09:01 +0100
Subject: [Biojava-l] Koen Bruynseels is out of the office.
Message-ID:

I will be out of the office starting 12/06/2009 and will not return until
12/09/2009.

I will respond to your message when I return.

From holland at eaglegenomics.com Sun Dec 6 12:52:32 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Sun, 6 Dec 2009 17:52:32 +0000
Subject: [Biojava-l] Sequences as strings to RichSequence iterator
In-Reply-To: <6A410A55-1745-4189-A314-15DFC37360DF@eaglegenomics.com>
References: <4B1BB9DB.1070700@fu-berlin.de> <6A410A55-1745-4189-A314-15DFC37360DF@eaglegenomics.com>
Message-ID: <67CC4477-F8DC-4C90-9ECF-D56789488229@eaglegenomics.com>

PS. Spot the deliberate mistake in the hasNext() function... that should be
<, not <=!

PPS. In your original email you stated you wanted to read your sequences as
Fasta. In Biojava, all sequences are RichSequences - they have no format
other than the object model of RichSequence itself.
Fasta only gets involved when you're reading from a Fasta file, or writing to
one. If you need to show the sequences as Fasta in your user interface, you
should consider using the FastaWriter writeSequence() method with the
PrintStream parameter and wiring a ByteArrayOutputStream into the PrintStream
so that you can get a String representation of a Fasta record.

On 6 Dec 2009, at 15:41, Richard Holland wrote:

> I'm not sure what you're trying to do here - are you trying to represent
> your string array of sequences as a RichSequenceIterator, or are you trying
> to convert them into FASTA? I'll answer both anyway...:
>
> To convert your String[] of sequences into a RichSequenceIterator you need
> to create a new class that implements the RichSequenceIterator interface.
> You would probably write something like this (which I have not checked or
> compiled - so if it has bugs, sorry!):
>
> public class MyDNASeqIterator implements RichSequenceIterator {
>     private final String[] sequences;
>     private int counter; // not final: nextRichSequence() increments it
>
>     public MyDNASeqIterator(String[] sequences) {
>         this.sequences = sequences;
>         this.counter = 0;
>     }
>
>     public boolean hasNext() {
>         return this.counter <= this.sequences.length;
>     }
>
>     public Sequence nextSequence() { return nextRichSequence(); }
>
>     public BioEntry nextBioEntry() { return nextRichSequence(); }
>
>     public RichSequence nextRichSequence() {
>         String seqName = "MySeq" + this.counter;
>         return RichSequence.Tools.createRichSequence(seqName,
>                 this.sequences[this.counter++], DNATools.getDNA());
>     }
> }
>
> You can then instantiate an object using MyDNASeqIterator's constructor to
> give it your string array, and iterate over it to get corresponding
> RichSequence instances.
>
> To convert your sequences to FASTA, use the above iterator to generate
> sequences to pass to FastaFormat in the same way that you would write a
> normal FASTA file.
>
> cheers,
> Richard
>
> On 6 Dec 2009, at 14:04, Oliver Stolpe wrote:
>
>> Hello *,
>>
>> I have a set of sequences as strings in an array.
>> I now want to turn them into an iterator over RichSequences in
>> fasta-format. I read in the cookbook, but I dont get it. And looked up the
>> examples in biojavax-doc. I tried much but I have no good starting point.
>> No starting point at all. How do the RichSequenceBuilder work? What about
>> the FastaFormat-thing?
>> I thought about putting the sequences in a fast-file and then read the
>> file. But there must be a much more straight-forward way!
>>
>> Thanks in advance for any hints,
>> Oliver
>>
>>
>> _______________________________________________
>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From martin.petr at matfyz.cz Mon Dec 7 14:54:04 2009
From: martin.petr at matfyz.cz (Martin Petr)
Date: Mon, 7 Dec 2009 20:54:04 +0100
Subject: [Biojava-l] A software project suggestion
Message-ID:

Hi everybody,

I'm a computer science student currently in the last year of bachelor
studies and I'm looking for an interesting software project for my
Java course. And since I also happen to study molecular biology (I'm
just in the first semester now, so there is a long way ahead of me)
and I'm very interested in bioinformatics, I decided to ask here for
suggestions.

Do you have any ideas for a possible BioJava related project? Do you
miss any functionality in BioJava that I could add?
I have to say that my knowledge of bioinformatics is very vague (although
I have quite a solid background in general computer science, at least) but
I guess that shouldn't be a big problem.

I'm not talking about anything PhD level-like, not even BSc level-like,
it may be just some "boring" technical stuff that needs to be done. I
would just prefer to help and do something more useful, which really
can't be said about a zillionth clone of an IRC bot or something like
that. :)

I take it as a good opportunity to learn something about BioJava
itself, since it very well may be my tool of choice when I finally get
a chance to get my hands dirty in some research! In fact, that's why I
got the idea to help BioJava in the first place.

Thanks in advance for any replies and suggestions. Have a nice day.

Martin Petr
Charles University in Prague
Czech Republic

From andreas at sdsc.edu Mon Dec 7 21:15:22 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 7 Dec 2009 18:15:22 -0800
Subject: [Biojava-l] A software project suggestion
In-Reply-To:
References:
Message-ID: <59a41c430912071815v4cd9ffe4m758b198182dc1316@mail.gmail.com>

Hi Martin,

Thanks for your interest in a project. Our current TODO list can be found here:
http://www.biojava.org/wiki/BioJava:Modules

The part of biojava that probably would benefit a lot from some work
is the Blast parser modules... E.g. we are missing a PSI blast parser
...

Andreas

On Mon, Dec 7, 2009 at 11:54 AM, Martin Petr wrote:
> Hi everybody,
>
> I'm a computer science student currently in the last year of bachelor
> studies and I'm looking for an interesting software project for my
> Java course. And since I also happen to study molecular biology (I'm
> just in the first semester now, so there is a long way ahead of me)
> and I'm very interested in bioinformatics, I decided to ask here for
> suggestions.
>
> Do you have any ideas for a possible BioJava related project? Do you
> miss any functionality in BioJava that I could add?
> I have to say that
> my knowledge of bioinformatics is very vague (although I have quite a
> solid background in general computer science, at least) but I guess
> that shouldn't be a big problem.
>
> I'm not talking here anything PhD level-like, not even BSc level-like,
> it may be just some "boring" technical stuff that needs to be done. I
> would just prefer to help and do something more useful, which really
> can't be said about a zilionth clone of IRC bot or something like
> that. :)
>
> I take it as a good opportunity to learn something about BioJava
> itself, since it very well may be my tool of choice when I finally get
> a chance to get my hands dirty in some research! In fact, that's why I
> got the idea to help BioJava in the first place.
>
> Thanks in advance for any replies and suggestions. Have a nice day.
>
> Martin Petr
> Charles University in Prague
> Czech Republic
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From markjschreiber at gmail.com Mon Dec 7 21:34:48 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 8 Dec 2009 10:34:48 +0800
Subject: [Biojava-l] A software project suggestion
In-Reply-To:
References:
Message-ID: <93b45ca50912071834g49673addgdc66345f79a80f25@mail.gmail.com>

Hi Martin -

Something that would be useful to have is a parser and object model for
Entrez Gene, mainly the XML format. This format is available by CGI-BIN,
SOAP and FTP and contains a great deal of useful information.

The project is possibly a bit more challenging than it first may seem as
the format is arcane to say the least. You will not get anything terribly
useful by autogenerating something with JAXB. Also the SOAP WSDL doesn't
work out of the box with JAX-WS, you can use AXIS but again the
autogenerated object binding to the XML is rubbish (due to the very
confusing XML structure).
If you made a parser for the SOAP service you would be best to go to a
lower level (such as SAAJ).

Anyhow the main idea is to get all of the tremendously useful information
extracted from the XML and into a friendly Java beans API.

- Mark

On Tue, Dec 8, 2009 at 3:54 AM, Martin Petr wrote:
>
> Hi everybody,
>
> I'm a computer science student currently in the last year of bachelor
> studies and I'm looking for an interesting software project for my
> Java course. And since I also happen to study molecular biology (I'm
> just in the first semester now, so there is a long way ahead of me)
> and I'm very interested in bioinformatics, I decided to ask here for
> suggestions.
>
> Do you have any ideas for a possible BioJava related project? Do you
> miss any functionality in BioJava that I could add? I have to say that
> my knowledge of bioinformatics is very vague (although I have quite a
> solid background in general computer science, at least) but I guess
> that shouldn't be a big problem.
>
> I'm not talking here anything PhD level-like, not even BSc level-like,
> it may be just some "boring" technical stuff that needs to be done. I
> would just prefer to help and do something more useful, which really
> can't be said about a zilionth clone of IRC bot or something like
> that. :)
>
> I take it as a good opportunity to learn something about BioJava
> itself, since it very well may be my tool of choice when I finally get
> a chance to get my hands dirty in some research! In fact, that's why I
> got the idea to help BioJava in the first place.
>
> Thanks in advance for any replies and suggestions. Have a nice day.
>
> Martin Petr
> Charles University in Prague
> Czech Republic
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

From andreas.draeger at uni-tuebingen.de Tue Dec 8 01:27:00 2009
From: andreas.draeger at uni-tuebingen.de (Andreas Draeger)
Date: Tue, 08 Dec 2009 07:27:00 +0100
Subject: [Biojava-l] A software project suggestion
In-Reply-To:
References:
Message-ID: <4B1DF1B4.9040103@uni-tuebingen.de>

Hi Martin Petr,

Something else that is important is to implement an interface data
structure that allows for more customizable Alignment outputs. At the
moment our two alignment algorithms (SmithWaterman and NeedlemanWunsch)
both print each pair-wise alignment in a Blast-like format. However, as
sometimes it would be nice to have some renderer that lays out the output,
the alignment should not be printed anymore but passed to the renderer,
which is an interface and can therefore be implemented in several ways.
Depending on the implementation, one would receive an HTML document or the
current Blast-like output or something different.

Could this be interesting to you?

Cheers
Andreas

--
Dipl.-Bioinform. Andreas Dräger
Eberhard Karls University Tübingen
Center for Bioinformatics (ZBIT)
Sand 1
72076 Tübingen
Germany

Phone: +49-7071-29-70436
Fax: +49-7071-29-5091

From niall at sgenomics.org Tue Dec 8 08:59:52 2009
From: niall at sgenomics.org (Niall Haslam)
Date: Tue, 08 Dec 2009 13:59:52 +0000
Subject: [Biojava-l] A software project suggestion
In-Reply-To: <59a41c430912071815v4cd9ffe4m758b198182dc1316@mail.gmail.com>
References: <59a41c430912071815v4cd9ffe4m758b198182dc1316@mail.gmail.com>
Message-ID:

Hi,

Could we vote for these? I'd like to echo Andreas' call for someone to work
on BLAST parsers. It's never gonna be sexy but it is something that needs
doing.

Niall.
On 8 Dec 2009, at 02:15, Andreas Prlic wrote:

> Hi Martin,
>
> Thanks for your interest in a project. Our current TODO list can be found here:
> http://www.biojava.org/wiki/BioJava:Modules
>
> The part of biojava that probably would benefit a lot from some work
> are the Blast parser modules... E.g. we are missing a PSI blast parser
> ...
>
> Andreas
>
>
>
> On Mon, Dec 7, 2009 at 11:54 AM, Martin Petr wrote:
>> Hi everybody,
>>
>> I'm a computer science student currently in the last year of bachelor
>> studies and I'm looking for an interesting software project for my
>> Java course. And since I also happen to study molecular biology (I'm
>> just in the first semester now, so there is a long way ahead of me)
>> and I'm very interested in bioinformatics, I decided to ask here for
>> suggestions.
>>
>> Do you have any ideas for a possible BioJava related project? Do you
>> miss any functionality in BioJava that I could add? I have to say that
>> my knowledge of bioinformatics is very vague (although I have quite a
>> solid background in general computer science, at least) but I guess
>> that shouldn't be a big problem.
>>
>> I'm not talking here anything PhD level-like, not even BSc level-like,
>> it may be just some "boring" technical stuff that needs to be done. I
>> would just prefer to help and do something more useful, which really
>> can't be said about a zilionth clone of IRC bot or something like
>> that. :)
>>
>> I take it as a good opportunity to learn something about BioJava
>> itself, since it very well may be my tool of choice when I finally get
>> a chance to get my hands dirty in some research! In fact, that's why I
>> got the idea to help BioJava in the first place.
>>
>> Thanks in advance for any replies and suggestions. Have a nice day.
>>
>> Martin Petr
>> Charles University in Prague
>> Czech Republic
>> _______________________________________________
>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

From holland at eaglegenomics.com Tue Dec 8 12:43:16 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 8 Dec 2009 17:43:16 +0000
Subject: [Biojava-l] A software project suggestion
In-Reply-To:
References: <59a41c430912071815v4cd9ffe4m758b198182dc1316@mail.gmail.com>
Message-ID:

My vote is with BLAST.

On 8 Dec 2009, at 13:59, Niall Haslam wrote:

> Hi,
>
> Could we vote for these? I'd like to echo Andreas' call for someone to
> work on BLAST parsers. Its never gonna be sexy but it is something that
> needs doing.
>
> Niall.
> On 8 Dec 2009, at 02:15, Andreas Prlic wrote:
>
>> Hi Martin,
>>
>> Thanks for your interest in a project. Our current TODO list can be found here:
>> http://www.biojava.org/wiki/BioJava:Modules
>>
>> The part of biojava that probably would benefit a lot from some work
>> are the Blast parser modules... E.g. we are missing a PSI blast parser
>> ...
>>
>> Andreas
>>
>>
>>
>> On Mon, Dec 7, 2009 at 11:54 AM, Martin Petr wrote:
>>> Hi everybody,
>>>
>>> I'm a computer science student currently in the last year of bachelor
>>> studies and I'm looking for an interesting software project for my
>>> Java course. And since I also happen to study molecular biology (I'm
>>> just in the first semester now, so there is a long way ahead of me)
>>> and I'm very interested in bioinformatics, I decided to ask here for
>>> suggestions.
>>>
>>> Do you have any ideas for a possible BioJava related project? Do you
>>> miss any functionality in BioJava that I could add?
>>> I have to say that
>>> my knowledge of bioinformatics is very vague (although I have quite a
>>> solid background in general computer science, at least) but I guess
>>> that shouldn't be a big problem.
>>>
>>> I'm not talking here anything PhD level-like, not even BSc level-like,
>>> it may be just some "boring" technical stuff that needs to be done. I
>>> would just prefer to help and do something more useful, which really
>>> can't be said about a zilionth clone of IRC bot or something like
>>> that. :)
>>>
>>> I take it as a good opportunity to learn something about BioJava
>>> itself, since it very well may be my tool of choice when I finally get
>>> a chance to get my hands dirty in some research! In fact, that's why I
>>> got the idea to help BioJava in the first place.
>>>
>>> Thanks in advance for any replies and suggestions. Have a nice day.
>>>
>>> Martin Petr
>>> Charles University in Prague
>>> Czech Republic
>>> _______________________________________________
>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>
>>
>> _______________________________________________
>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From holland at eaglegenomics.com Wed Dec 9 07:07:16 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 9 Dec 2009 12:07:16 +0000
Subject: [Biojava-l] Job vacancies
Message-ID:

Hi there,

Apologies for posting job ads to the lists but I think this one is relevant
to the members so please forgive me.
If you know of anyone that might be interested, could you forward them this job ad? Also if you have any local resources for posting job ads, it would be great if you could put the ad up there as well.

Sorry to intrude - I'll leave you in peace again now. :)

thanks!
Richard

1. Senior Bioinformatics Software Developer (Perl/Ensembl)
2. Senior Bioinformatics Software Developer (Java/Taverna)

Eagle Genomics Ltd., Cambridge, UK
http://www.eaglegenomics.com/

We are a young and exciting bioinformatics company looking to revolutionise the way in which industry and academia work together. We are based at the heart of Europe's largest biotech cluster in Cambridge, UK. As we expand our client base, we're looking to build a talented and committed team of experts.

We are currently looking for TWO top-class developers to work on a wide range of complex projects. Both these roles involve a large element of customer support and so you'll need to be friendly, communicative, and happy to work face-to-face with our customers on a daily basis. You will have had substantial prior experience working in a life science company or research institute.

For the Perl/Ensembl role, you will have had extensive experience as a Perl developer and ideally you will have written code or plugins for the Ensembl Genome Browser or used its APIs in your own programs.

For the Java/Taverna role you will be an expert Java developer with the ability to rapidly analyse and understand complex systems. Ideally you will have written code or plugins for the Taverna project or written code that makes use of it.

For both roles, in addition to your superb technical and customer service skills, you will also:

- be able to communicate clearly and proactively,
- have the ability to quickly translate scientific problems into real software solutions,
- be able to put technical concepts into simple language for end users to understand,
- be able to pick up new skills and techniques in record time,
- work well in a collaborative team environment,
- be creative, innovative, and forward-thinking.

You will also have had hands-on experience in at least two of the following:

- SQL query design,
- Open-source bioinformatics toolkits such as BioPerl, BioJava, BioSQL, etc.,
- Amazon EC2,
- Workflow/pipeline design,
- System and user documentation,
- Developing user training courses.

The successful candidate would be expected to work from our offices near Cambridge; however, remote working in the UK/Europe may be considered for exceptional candidates (depending on exact location).

We offer a competitive salary and a range of company benefits.

To apply, please send your CV and cover letter as PDF documents to jobs @eaglegenomics.com.

We are only able to offer positions to EEA citizens and permanent residents, or existing holders of UK Tier 1 migrant visas.

No agencies please.

Closing date: 8th JANUARY 2010.

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From ilhami.visne at gmail.com Thu Dec 10 19:46:26 2009
From: ilhami.visne at gmail.com (Ilhami Visne)
Date: Fri, 11 Dec 2009 01:46:26 +0100
Subject: [Biojava-l] A Exception Has Occurred During Parsing.
Message-ID:

Hi,

I'm following the suggestion "Please submit the details that follow to biojava-l at biojava.org or post a bug report to http://bugzilla.open-bio.org/".

The sequence of concerns is at
http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=39645757&rettype=gb&seq_start=1353&seq_stop=2128&strand=1

Format_object=org.biojavax.bio.seq.io.GenbankFormat
Accession=null
Id=null
Comments=Bad section
Parse_block=
Stack trace follows ....
    at org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:603)
    at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:278)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
    ... 8 more
Caused by: java.lang.NullPointerException
    at org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:593)
    ... 10 more
org.biojava.bio.BioException: IO failure whilst reading from Genbank

Any quick fix,patch?

thanks.
Ilhami Visne

From holland at eaglegenomics.com Fri Dec 11 04:59:34 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 11 Dec 2009 09:59:34 +0000
Subject: [Biojava-l] A Exception Has Occurred During Parsing.
In-Reply-To:
References:
Message-ID:

Hello. Could you also post the relevant parts of your code that you are running when this exception happens?

cheers,
Richard

On 11 Dec 2009, at 00:46, Ilhami Visne wrote:

> Hi,
>
> I'm following the suggestion "Please submit the details that follow to
> biojava-l at biojava.org or post a bug report to http://bugzilla.open-bio.org/"
> .
>
> The sequence of concerns is at
> http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=39645757&rettype=gb&seq_start=1353&seq_stop=2128&strand=1
>
> Format_object=org.biojavax.bio.seq.io.GenbankFormat
> Accession=null
> Id=null
> Comments=Bad section
> Parse_block=
> Stack trace follows ....
>
>
>     at org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:603)
>     at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:278)
>     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
>     ... 8 more
> Caused by: java.lang.NullPointerException
>     at org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:593)
>     ... 10 more
> org.biojava.bio.BioException: IO failure whilst reading from Genbank
>
> Any quick fix,patch?
>
> thanks.
> Ilhami Visne
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From oliver.stolpe at fu-berlin.de Fri Dec 11 05:44:26 2009
From: oliver.stolpe at fu-berlin.de (Oliver Stolpe)
Date: Fri, 11 Dec 2009 11:44:26 +0100
Subject: [Biojava-l] Sequences as strings to RichSequence iterator
In-Reply-To: <67CC4477-F8DC-4C90-9ECF-D56789488229@eaglegenomics.com>
References: <4B1BB9DB.1070700@fu-berlin.de> <6A410A55-1745-4189-A314-15DFC37360DF@eaglegenomics.com> <67CC4477-F8DC-4C90-9ECF-D56789488229@eaglegenomics.com>
Message-ID: <4B22228A.7010901@fu-berlin.de>

Dear Richard,

thank you for your answer. What you described in your PPS is exactly what I wanted, and it worked well with a ByteArrayOutputStream. Nevertheless, I don't use it; instead I build the string myself by concatenating the names and sequences using String s = ">" + name + "\r\n" + sequence + "\r\n". I know it's not the best way, but the FASTA formatting modified the input names so much that I would have had too much trouble getting the right information back out of them.

Best regards,
Oliver

Richard Holland wrote:
> PS. Spot the deliberate mistake in the hasNext() function... that should be <, not <=!
>
> PPS. In your original email you stated you wanted to read your sequences as Fasta. In Biojava, all sequences are RichSequences - they have no format other than the object model of RichSequence itself. Fasta only gets involved when you're reading from a Fasta file, or writing to one. If you need to show the sequences as Fasta in your user interface, you should consider using the FastaWriter writeSequence() method with the PrintStream parameter and wiring in a StringWriter to the PrintStream so that you can get a String representation of a Fasta record.
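The two points above (the corrected hasNext() bound and the hand-built FASTA string) can be sketched together in plain Java. This is only an illustration under stated assumptions: it uses java.util.Iterator over strings rather than the BioJava RichSequenceIterator interface, and the class name FastaRecordIterator and the helper toFasta are hypothetical. The "MySeq" + index naming simply follows the quoted example.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Illustrative only: iterates over raw sequence strings and yields FASTA text. */
final class FastaRecordIterator implements Iterator<String> {
    private final String[] sequences;
    private int counter = 0; // not final: it advances on every call to next()

    FastaRecordIterator(String[] sequences) {
        this.sequences = sequences.clone();
    }

    /** Builds one record by hand: a '>' header line followed by the sequence line. */
    static String toFasta(String name, String sequence) {
        return ">" + name + "\r\n" + sequence + "\r\n";
    }

    @Override
    public boolean hasNext() {
        // Strictly less than: an index equal to the length is already out of bounds.
        return counter < sequences.length;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String name = "MySeq" + counter;
        return toFasta(name, sequences[counter++]);
    }

    public static void main(String[] args) {
        Iterator<String> it = new FastaRecordIterator(new String[] {"ACGT", "TTGA"});
        while (it.hasNext()) {
            System.out.print(it.next());
        }
    }
}
```

Building the header by hand keeps the name exactly as supplied, which is the property Oliver was after; the trade-off is that no validation of the sequence symbols takes place.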
>
> On 6 Dec 2009, at 15:41, Richard Holland wrote:
>
>
>> I'm not sure what you're trying to do here - are you trying to represent your string array of sequences as a RichSequenceIterator, or are you trying to convert them into FASTA? I'll answer both anyway...:
>>
>> To convert your String[] of sequences into a RichSequenceIterator you need to create a new class that implements the RichSequenceIterator interface. You would probably write something like this (which I have not checked or compiled - so if it has bugs, sorry!):
>>
>> public class MyDNASeqIterator implements RichSequenceIterator {
>>     private final String[] sequences;
>>     private final int counter;
>>
>>     public MyDNASeqIterator(String[] sequences) { this.sequences = sequences; this.counter = 0; }
>>
>>     public hasNext() {
>>         return this.counter <= this.sequences.length;
>>     }
>>
>>     public Sequence nextSequence() { return nextRichSequence(); }
>>
>>     public BioEntry nextBioEntry() { return nextRichSequence(); }
>>
>>     public RichSequence nextRichSequence() {
>>         String seqName = "MySeq"+this.counter;
>>         return RichSequence.Tools.createRichSequence(seqName, this.sequences[this.counter++], DNATools.getDNA());
>>     }
>> }
>>
>> You can then instantiate an object using MyDNASeqIterator's constructor to give it your string array, and iterate over it to get corresponding RichSequence instances.
>>
>> To convert your sequences to FASTA, use the above iterator to generate sequences to pass to FastaFormat in the same way that you would write a normal FASTA file.
>>
>> cheers,
>> Richard
>>
>> On 6 Dec 2009, at 14:04, Oliver Stolpe wrote:
>>
>>
>>> Hello *,
>>>
>>> I have a set of sequences as strings in an array. I now want to turn them into an iterator over RichSequences in fasta-format. I read in the cookbook, but I dont get it. And looked up the examples in biojavax-doc. I tried much but I have no good starting point. No starting point at all. How do the RichSequenceBuilder work? What about the FastaFormat-thing?
>>> I thought about putting the sequences in a fast-file and then read the file. But there must be a much more straight-forward way!
>>>
>>> Thanks in advance for any hints,
>>> Oliver
>>>
>>>
>>> _______________________________________________
>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>
>>
>> _______________________________________________
>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>

With best regards,
Oliver

--
Semester representative, 5th semester, Bioinformatics

From ilhami.visne at gmail.com Fri Dec 11 06:36:46 2009
From: ilhami.visne at gmail.com (ilhami visne)
Date: Fri, 11 Dec 2009 12:36:46 +0100
Subject: [Biojava-l] A Exception Has Occurred During Parsing.
In-Reply-To:
References:
Message-ID: <4B222ECE.9020401@gmail.com>

I created a new class, NCBIGenbankSequenceFetcher.java, which extends GenbankRichSequenceDB and overrides "getAddress(String id)" to limit the sequence fetched for an id to a range (seq_start and seq_stop).

public class NCBIGenbankSequenceFetcher extends GenbankRichSequenceDB {

    private String seq_start;
    private String seq_stop;
    private String strand = "1"; // 1=plus, 2=minus

    public NCBIGenbankSequenceFetcher() {
    }

    public NCBIGenbankSequenceFetcher(String seq_start, String seq_stop) {
        this.seq_start = seq_start;
        this.seq_stop = seq_stop;
    }

    public NCBIGenbankSequenceFetcher(String seq_start, String seq_stop, String strand) {
        this.seq_start = seq_start;
        this.seq_stop = seq_stop;
        this.strand = strand;
    }

    // Builds the efetch URL; the range parameters are appended only when both ends are set.
    @Override
    protected URL getAddress(String id) throws MalformedURLException {
        FetchURL seqURL = new FetchURL("Genbank", "text");
        String baseurl = seqURL.getbaseURL();
        String db = seqURL.getDB();
        String url = baseurl + db + "&id=" + id + "&rettype=gb";
        if (seq_start != null && seq_stop != null) {
            url += "&seq_start=" + seq_start + "&seq_stop=" + seq_stop + "&strand=" + strand;
        }
        return new URL(url);
    }
}

In another class, I create an instance of this class and then call its "getRichSequence(id)" method. (Not the same, but similar.)

    for (String gi : ids) { // ids is a list
        seq = new NCBIGenbankSequenceFetcher(seq_start, seq_stop, strand).getRichSequence(gi);
    }

What I found later is that it throws the exception at random, not for any particular sequence. So my guess is an I/O error that arises while the data is being streamed from the server.

ilhami visne.

On 12/11/2009 10:59 AM, Richard Holland wrote:
> Hello. Could you also post the relevant parts of your code that you are running when this exception happens?
>
> cheers,
> Richard
>
> On 11 Dec 2009, at 00:46, Ilhami Visne wrote:
>
>
>> Hi,
>>
>> I'm following the suggestion "Please submit the details that follow to
>> biojava-l at biojava.org or post a bug report to http://bugzilla.open-bio.org/"
>> .
>> >> The sequence of concerns is at >> http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=39645757&rettype=gb&seq_start=1353&seq_stop=2128&strand=1 >> >> Format_object=org.biojavax.bio.seq.io.GenbankFormat >> Accession=null >> Id=null >> Comments=Bad section >> Parse_block= >> Stack trace follows .... >> >> >> at >> org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:603) >> at >> org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:278) >> at >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) >> ... 8 more >> Caused by: java.lang.NullPointerException >> at >> org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:593) >> ... 10 more >> org.biojava.bio.BioException: IO failure whilst reading from Genbank >> >> Any quick fix,patch? >> >> thanks. >> Ilhami Visne >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > > From holland at eaglegenomics.com Fri Dec 11 09:17:21 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 11 Dec 2009 14:17:21 +0000 Subject: [Biojava-l] A Exception Has Occurred During Parsing. In-Reply-To: <4B222ECE.9020401@gmail.com> References: <4B222ECE.9020401@gmail.com> Message-ID: <30CF8138-B7BD-41A3-B130-702BE553A06A@eaglegenomics.com> If the problem is random, it's almost certainly due to problems with the NCBI server feeding you data. There are restrictions on usage - e.g. NCBI only allows a certain number of requests - so you might be running into those. 
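One way to stay inside such limits is to space requests out and retry a failed fetch after a pause. The following is a minimal sketch in plain Java and assumes nothing about the BioJava API: the class name ThrottledRetry, the Callable passed in, and the gap and attempt values are all hypothetical, and a suitable request rate should be taken from NCBI's current usage guidelines rather than from this example.

```java
import java.util.concurrent.Callable;

/** Illustrative only: runs a task with a minimum gap between calls and bounded retries. */
final class ThrottledRetry {
    private final long minGapMillis;
    private final int maxAttempts;
    private long lastCallMillis = 0L;
    private boolean calledBefore = false;

    ThrottledRetry(long minGapMillis, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.minGapMillis = minGapMillis;
        this.maxAttempts = maxAttempts;
    }

    /** Sleeps until at least minGapMillis has passed since the previous call. */
    private void pause() throws InterruptedException {
        if (calledBefore) {
            long wait = minGapMillis - (System.currentTimeMillis() - lastCallMillis);
            if (wait > 0) {
                Thread.sleep(wait);
            }
        }
        calledBefore = true;
        lastCallMillis = System.currentTimeMillis();
    }

    /** Calls the task, retrying on any exception until maxAttempts is used up. */
    <T> T call(Callable<T> task) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            pause();
            try {
                return task.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again after the gap
            }
        }
        throw last; // non-null here because maxAttempts >= 1 and every attempt failed
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        ThrottledRetry retry = new ThrottledRetry(50L, 3);
        String result = retry.call(() -> {
            if (++calls[0] < 3) {
                throw new java.io.IOException("simulated server hiccup");
            }
            return "record-text";
        });
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In the loop posted earlier in this thread, each getRichSequence(gi) call could be wrapped in retry.call(...) on a single shared ThrottledRetry instance, so that the gap is enforced across all ids.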
cheers, Richard On 11 Dec 2009, at 11:36, ilhami visne wrote: > I created a new class NCBIGenbankSequenceFetcher.java which extends GenbankRichSequenceDB and overrid the "getAddress(String id)" to limit the sequence for an id (seq_start and seq_stop). > > public class NCBIGenbankSequenceFetcher extends GenbankRichSequenceDB{ > > private String seq_start; > private String seq_stop; > private String strand="1";//1=plus, 2=minus > > public NCBIGenbankSequenceFetcher() { > } > > public NCBIGenbankSequenceFetcher(String seq_start, String seq_stop) { > this.seq_start = seq_start; > this.seq_stop = seq_stop; > } > > public NCBIGenbankSequenceFetcher(String seq_start, String seq_stop,String strand) { > this.seq_start = seq_start; > this.seq_stop = seq_stop; > this.strand = strand; > } > > @Override > protected URL getAddress(String id) throws MalformedURLException { > FetchURL seqURL = new FetchURL("Genbank", "text"); > String baseurl = seqURL.getbaseURL(); > String db = seqURL.getDB(); > String url = baseurl+db+"&id="+id+"&rettype=gb"; > if(seq_start != null && seq_stop != null){ > url +="&seq_start="+seq_start+"&seq_stop="+seq_stop+"&strand="+strand; > } > return new URL(url); > } > } > > From an other class, i create an instance of this class and then call its "getRichSequence(id)" method. (Not the same, but similar) > > for(String gi:ids){ // ids is a list > seq = new NCBIGenbankSequenceFetcher(seq_start, seq_stop,strand).getRichSequence(gi); > } > > What i found later is that it randomly throws the exception, not by any particular sequence. So my guess an io error, which arises during the data streaming from server. > > ilhami visne. > > On 12/11/2009 10:59 AM, Richard Holland wrote: >> Hello. Could you also post the relevant parts of your code that you are running when this exception happens? 
>> >> cheers, >> Richard >> >> On 11 Dec 2009, at 00:46, Ilhami Visne wrote: >> >> >> >>> Hi, >>> >>> I'm following the suggestion "Please submit the details that follow to >>> >>> biojava-l at biojava.org or post a bug report to http://bugzilla.open-bio.org/ >>> " >>> . >>> >>> The sequence of concerns is at >>> >>> http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=39645757&rettype=gb&seq_start=1353&seq_stop=2128&strand=1 >>> >>> >>> Format_object=org.biojavax.bio.seq.io.GenbankFormat >>> Accession=null >>> Id=null >>> Comments=Bad section >>> Parse_block= >>> Stack trace follows .... >>> >>> >>> at >>> org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:603) >>> at >>> org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:278) >>> at >>> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) >>> ... 8 more >>> Caused by: java.lang.NullPointerException >>> at >>> org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:593) >>> ... 10 more >>> org.biojava.bio.BioException: IO failure whilst reading from Genbank >>> >>> Any quick fix,patch? >>> >>> thanks. >>> Ilhami Visne >>> _______________________________________________ >>> Biojava-l mailing list - >>> Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >>> >>> >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E: >> holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> >> >> >> >> > -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From mark.schreiber at novartis.com Sun Dec 13 21:38:04 2009 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Mon, 14 Dec 2009 10:38:04 +0800 Subject: [Biojava-l] A Exception Has Occurred During Parsing. 
In-Reply-To: <4B222ECE.9020401@gmail.com> Message-ID: Hi - Judging by the sporadic nature of the bug and the error report I think you are correct that there is an I/O problem. I think you are not getting back a Genbank format object but probably an error message from NCBI or a time out return or similar. It would be useful if the NCBIGenbankSequenceFetcher could be modified to optionally log the response to the URL. Best regards, - Mark biojava-l-bounces at lists.open-bio.org wrote on 12/11/2009 07:36:46 PM: > I created a new class NCBIGenbankSequenceFetcher.java which extends > GenbankRichSequenceDB and overrid the "getAddress(String id)" to limit > the sequence for an id (seq_start and seq_stop). > > public class NCBIGenbankSequenceFetcher extends GenbankRichSequenceDB{ > > private String seq_start; > private String seq_stop; > private String strand="1";//1=plus, 2=minus > > public NCBIGenbankSequenceFetcher() { > } > > public NCBIGenbankSequenceFetcher(String seq_start, String seq_stop) { > this.seq_start = seq_start; > this.seq_stop = seq_stop; > } > > public NCBIGenbankSequenceFetcher(String seq_start, String > seq_stop,String strand) { > this.seq_start = seq_start; > this.seq_stop = seq_stop; > this.strand = strand; > } > > @Override > protected URL getAddress(String id) throws MalformedURLException { > FetchURL seqURL = new FetchURL("Genbank", "text"); > String baseurl = seqURL.getbaseURL(); > String db = seqURL.getDB(); > String url = baseurl+db+"&id="+id+"&rettype=gb"; > if(seq_start != null && seq_stop != null){ > url > +="&seq_start="+seq_start+"&seq_stop="+seq_stop+"&strand="+strand; > } > return new URL(url); > } > } > > From an other class, i create an instance of this class and then call > its "getRichSequence(id)" method. 
(Not the same, but similar) > > for(String gi:ids){ // ids is a list > seq = new NCBIGenbankSequenceFetcher(seq_start, > seq_stop,strand).getRichSequence(gi); > } > > What i found later is that it randomly throws the exception, not by any > particular sequence. So my guess an io error, which arises during the > data streaming from server. > > ilhami visne. > > On 12/11/2009 10:59 AM, Richard Holland wrote: > > Hello. Could you also post the relevant parts of your code that > you are running when this exception happens? > > > > cheers, > > Richard > > > > On 11 Dec 2009, at 00:46, Ilhami Visne wrote: > > > > > >> Hi, > >> > >> I'm following the suggestion "Please submit the details that follow to > >> biojava-l at biojava.org or post a bug report to http://bugzilla.open-bio.org/ > " > >> . > >> > >> The sequence of concerns is at > >> http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi? > db=nucleotide&id=39645757&rettype=gb&seq_start=1353&seq_stop=2128&strand=1 > >> > >> Format_object=org.biojavax.bio.seq.io.GenbankFormat > >> Accession=null > >> Id=null > >> Comments=Bad section > >> Parse_block= > >> Stack trace follows .... > >> > >> > >> at > >> org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:603) > >> at > >> org.biojavax.bio.seq.io.GenbankFormat. > readRichSequence(GenbankFormat.java:278) > >> at > >> org.biojavax.bio.seq.io.RichStreamReader. > nextRichSequence(RichStreamReader.java:110) > >> ... 8 more > >> Caused by: java.lang.NullPointerException > >> at > >> org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:593) > >> ... 10 more > >> org.biojava.bio.BioException: IO failure whilst reading from Genbank > >> > >> Any quick fix,patch? > >> > >> thanks. 
> >> Ilhami Visne > >> _______________________________________________ > >> Biojava-l mailing list - Biojava-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/biojava-l > >> > > -- > > Richard Holland, BSc MBCS > > Operations and Delivery Director, Eagle Genomics Ltd > > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > > http://www.eaglegenomics.com/ > > > > > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l _________________________ CONFIDENTIALITY NOTICE The information contained in this e-mail message is intended only for the exclusive use of the individual or entity named above and may contain information that is privileged, confidential or exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, or the employee or agent responsible for delivery of the message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by e-mail and delete the material from any computer. Thank you. From sheoran143 at gmail.com Sun Dec 13 17:11:44 2009 From: sheoran143 at gmail.com (Deepak Sheoran) Date: Sun, 13 Dec 2009 16:11:44 -0600 Subject: [Biojava-l] A Exception Has Occurred During Parsing. (Ilhami Visne) In-Reply-To: References: Message-ID: <4B2566A0.9070909@gmail.com> Hi, I am attaching a quick fix to solve you problem with this record. Please have look on the attached .jpeg file to see the solution. Thanks Deepak Sheoran > Hi, > > I'm following the suggestion "Please submit the details that follow to > biojava-l at biojava.org or post a bug report to http://bugzilla.open-bio.org/" > . 
> > The sequence of concerns is at > http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=39645757&rettype=gb&seq_start=1353&seq_stop=2128&strand=1 > > Format_object=org.biojavax.bio.seq.io.GenbankFormat > Accession=null > Id=null > Comments=Bad section > Parse_block= > Stack trace follows .... > > > at > org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:603) > at > org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:278) > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) > ... 8 more > Caused by: java.lang.NullPointerException > at > org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:593) > ... 10 more > org.biojava.bio.BioException: IO failure whilst reading from Genbank > > Any quick fix,patch? > > thanks. > Ilhami Visne > > > > ------------------------------------------------------------------------ > > Subject: > Re: [Biojava-l] A Exception Has Occurred During Parsing. > From: > Richard Holland > Date: > Fri, 11 Dec 2009 09:59:34 +0000 > To: > Ilhami Visne > > To: > Ilhami Visne > CC: > biojava-l at lists.open-bio.org > > > Hello. Could you also post the relevant parts of your code that you are running when this exception happens? > > cheers, > Richard > > On 11 Dec 2009, at 00:46, Ilhami Visne wrote: > > >> Hi, >> >> I'm following the suggestion "Please submit the details that follow to >> biojava-l at biojava.org or post a bug report to http://bugzilla.open-bio.org/" >> . >> >> The sequence of concerns is at >> http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=39645757&rettype=gb&seq_start=1353&seq_stop=2128&strand=1 >> >> Format_object=org.biojavax.bio.seq.io.GenbankFormat >> Accession=null >> Id=null >> Comments=Bad section >> Parse_block= >> Stack trace follows .... 
>> >> >> at >> org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:603) >> at >> org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:278) >> at >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) >> ... 8 more >> Caused by: java.lang.NullPointerException >> at >> org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:593) >> ... 10 more >> org.biojava.bio.BioException: IO failure whilst reading from Genbank >> >> Any quick fix,patch? >> >> thanks. >> Ilhami Visne >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > > > > > ------------------------------------------------------------------------ > > Subject: > Re: [Biojava-l] Sequences as strings to RichSequence iterator > From: > Oliver Stolpe > Date: > Fri, 11 Dec 2009 11:44:26 +0100 > To: > Richard Holland > > To: > Richard Holland > CC: > biojava-l at biojava.org > > > Dear Richard, > > thank you for your answer. What you stated in your PSS is exactly what > I want. It worked well with the ByteArrayOutputStream. Nevertheless I > dont use it but write my own string by concatenating the names and > sequences using String s = ">" + name + "\r\n" + sequence + "\r\n". I > know its not the best way but this fasta format thing modified the > input names too much that I would have had to much trouble getting the > right information out of them. > > Best regards, > Oliver > > Richard Holland schrieb: >> PS. Spot the deliberate mistake in the hasNext() function... that >> should be <, not <=! >> >> PPS. In your original email you stated you wanted to read your >> sequences as Fasta. 
In Biojava, all sequences are RichSequences - >> they have no format other than the object model of RichSequence >> itself. Fasta only gets involved when you're reading from a Fasta >> file, or writing to one. If you need to show the sequences as Fasta >> in your user interface, you should consider using the FastaWriter >> writeSequence() method with the PrintStream parameter and wiring in a >> StringWriter to the PrintStream so that you can get a String >> representation of a Fasta record. >> >> On 6 Dec 2009, at 15:41, Richard Holland wrote: >> >> >>> I'm not sure what you're trying to do here - are you trying to >>> represent your string array of sequences as a RichSequenceIterator, >>> or are you trying to convert them into FASTA? I'll answer both >>> anyway...: >>> >>> To convert your String[] of sequences into a RichSequenceIterator >>> you need to create a new class that implements the >>> RichSequenceIterator interface. You would probably write something >>> like this (which I have not checked or compiled - so if it has bugs, >>> sorry!): >>> >>> public class MyDNASeqIterator implements RichSequenceIterator { >>> private final String[] sequences; >>> private final int counter; >>> >>> public MyDNASeqIterator(String[] sequences) { this.sequences = >>> sequences; this.counter = 0; } >>> >>> public hasNext() { return this.counter <= >>> this.sequences.length; >>> } >>> >>> public Sequence nextSequence() { return nextRichSequence(); } >>> >>> public BioEntry nextBioEntry() { return nextRichSequence(); } >>> >>> public RichSequence nextRichSequence() { >>> String seqName = "MySeq"+this.counter; >>> return RichSequence.Tools.createRichSequence(seqName, >>> this.sequences[this.counter++], DNATools.getDNA()); >>> } >>> } >>> >>> You can then instantiate an object using MyDNASeqIterator's >>> constructor to give it your string array, and iterate over it to get >>> corresponding RichSequence instances. 
>>> >>> To convert your sequences to FASTA, use the above iterator to >>> generate sequences to pass to FastaFormat in the same way that you >>> would write a normal FASTA file. >>> >>> cheers, >>> Richard >>> >>> On 6 Dec 2009, at 14:04, Oliver Stolpe wrote: >>> >>> >>>> Hello *, >>>> >>>> I have a set of sequences as strings in an array. I now want to >>>> turn them into an iterator over RichSequences in fasta-format. I >>>> read in the cookbook, but I dont get it. And looked up the examples >>>> in biojavax-doc. I tried much but I have no good starting point. No >>>> starting point at all. How do the RichSequenceBuilder work? What >>>> about the FastaFormat-thing? >>>> I thought about putting the sequences in a fast-file and then read >>>> the file. But there must be a much more straight-forward way! >>>> >>>> Thanks in advance for any hints, >>>> Oliver >>>> >>>> >>>> _______________________________________________ >>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>>> >>> -- >>> Richard Holland, BSc MBCS >>> Operations and Delivery Director, Eagle Genomics Ltd >>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >>> http://www.eaglegenomics.com/ >>> >>> >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >> >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> >> > > > Mit den besten Gr??en, > Oliver > > > ------------------------------------------------------------------------ > > Subject: > Re: [Biojava-l] A Exception Has Occurred During Parsing. 
> From: > ilhami visne > Date: > Fri, 11 Dec 2009 12:36:46 +0100 > To: > Richard Holland > > To: > Richard Holland > CC: > biojava-l at lists.open-bio.org > > > I created a new class NCBIGenbankSequenceFetcher.java which extends > GenbankRichSequenceDB and overrid the "getAddress(String id)" to > limit the sequence for an id (seq_start and seq_stop). > > public class NCBIGenbankSequenceFetcher extends GenbankRichSequenceDB{ > > private String seq_start; > private String seq_stop; > private String strand="1";//1=plus, 2=minus > > public NCBIGenbankSequenceFetcher() { > } > > public NCBIGenbankSequenceFetcher(String seq_start, String > seq_stop) { > this.seq_start = seq_start; > this.seq_stop = seq_stop; > } > > public NCBIGenbankSequenceFetcher(String seq_start, String > seq_stop,String strand) { > this.seq_start = seq_start; > this.seq_stop = seq_stop; > this.strand = strand; > } > > @Override > protected URL getAddress(String id) throws MalformedURLException { > FetchURL seqURL = new FetchURL("Genbank", "text"); > String baseurl = seqURL.getbaseURL(); > String db = seqURL.getDB(); > String url = baseurl+db+"&id="+id+"&rettype=gb"; > if(seq_start != null && seq_stop != null){ > url > +="&seq_start="+seq_start+"&seq_stop="+seq_stop+"&strand="+strand; > } > return new URL(url); > } > } > > From an other class, i create an instance of this class and then call > its "getRichSequence(id)" method. (Not the same, but similar) > > for(String gi:ids){ // ids is a list > seq = new NCBIGenbankSequenceFetcher(seq_start, > seq_stop,strand).getRichSequence(gi); > } > > What i found later is that it randomly throws the exception, not by > any particular sequence. So my guess an io error, which arises during > the data streaming from server. > > ilhami visne. > > On 12/11/2009 10:59 AM, Richard Holland wrote: >> Hello. Could you also post the relevant parts of your code that you >> are running when this exception happens? 
>> >> cheers, >> Richard >> >> On 11 Dec 2009, at 00:46, Ilhami Visne wrote: >> >> >>> Hi, >>> >>> I'm following the suggestion "Please submit the details that follow to >>> biojava-l at biojava.org or post a bug report to >>> http://bugzilla.open-bio.org/" >>> . >>> >>> The sequence of concerns is at >>> http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=39645757&rettype=gb&seq_start=1353&seq_stop=2128&strand=1 >>> >>> >>> Format_object=org.biojavax.bio.seq.io.GenbankFormat >>> Accession=null >>> Id=null >>> Comments=Bad section >>> Parse_block= >>> Stack trace follows .... >>> >>> >>> at >>> org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:603) >>> >>> at >>> org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:278) >>> >>> at >>> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) >>> >>> ... 8 more >>> Caused by: java.lang.NullPointerException >>> at >>> org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:593) >>> >>> ... 10 more >>> org.biojava.bio.BioException: IO failure whilst reading from Genbank >>> >>> Any quick fix,patch? >>> >>> thanks. >>> Ilhami Visne >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> >> >> > > > > ------------------------------------------------------------------------ > > Subject: > Re: [Biojava-l] A Exception Has Occurred During Parsing. > From: > Richard Holland > Date: > Fri, 11 Dec 2009 14:17:21 +0000 > To: > ilhami visne > > To: > ilhami visne > CC: > biojava-l at lists.open-bio.org > > > If the problem is random, it's almost certainly due to problems with the NCBI server feeding you data. 
There are restrictions on usage - e.g. NCBI only allows a certain number of requests - so you might be running into those. > > cheers, > Richard > > On 11 Dec 2009, at 11:36, ilhami visne wrote: > > >> I created a new class NCBIGenbankSequenceFetcher.java which extends GenbankRichSequenceDB and overrid the "getAddress(String id)" to limit the sequence for an id (seq_start and seq_stop). >> >> public class NCBIGenbankSequenceFetcher extends GenbankRichSequenceDB{ >> >> private String seq_start; >> private String seq_stop; >> private String strand="1";//1=plus, 2=minus >> >> public NCBIGenbankSequenceFetcher() { >> } >> >> public NCBIGenbankSequenceFetcher(String seq_start, String seq_stop) { >> this.seq_start = seq_start; >> this.seq_stop = seq_stop; >> } >> >> public NCBIGenbankSequenceFetcher(String seq_start, String seq_stop,String strand) { >> this.seq_start = seq_start; >> this.seq_stop = seq_stop; >> this.strand = strand; >> } >> >> @Override >> protected URL getAddress(String id) throws MalformedURLException { >> FetchURL seqURL = new FetchURL("Genbank", "text"); >> String baseurl = seqURL.getbaseURL(); >> String db = seqURL.getDB(); >> String url = baseurl+db+"&id="+id+"&rettype=gb"; >> if(seq_start != null && seq_stop != null){ >> url +="&seq_start="+seq_start+"&seq_stop="+seq_stop+"&strand="+strand; >> } >> return new URL(url); >> } >> } >> >> From an other class, i create an instance of this class and then call its "getRichSequence(id)" method. (Not the same, but similar) >> >> for(String gi:ids){ // ids is a list >> seq = new NCBIGenbankSequenceFetcher(seq_start, seq_stop,strand).getRichSequence(gi); >> } >> >> What i found later is that it randomly throws the exception, not by any particular sequence. So my guess an io error, which arises during the data streaming from server. >> >> ilhami visne. >> >> On 12/11/2009 10:59 AM, Richard Holland wrote: >> >>> Hello. 
Could you also post the relevant parts of your code that you are running when this exception happens? >>> >>> cheers, >>> Richard >>> >>> On 11 Dec 2009, at 00:46, Ilhami Visne wrote: >>> >>> >>> >>> >>>> Hi, >>>> >>>> I'm following the suggestion "Please submit the details that follow to >>>> >>>> biojava-l at biojava.org or post a bug report to http://bugzilla.open-bio.org/ >>>> " >>>> . >>>> >>>> The sequence of concerns is at >>>> >>>> http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=39645757&rettype=gb&seq_start=1353&seq_stop=2128&strand=1 >>>> >>>> >>>> Format_object=org.biojavax.bio.seq.io.GenbankFormat >>>> Accession=null >>>> Id=null >>>> Comments=Bad section >>>> Parse_block= >>>> Stack trace follows .... >>>> >>>> >>>> at >>>> org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:603) >>>> at >>>> org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:278) >>>> at >>>> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) >>>> ... 8 more >>>> Caused by: java.lang.NullPointerException >>>> at >>>> org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:593) >>>> ... 10 more >>>> org.biojava.bio.BioException: IO failure whilst reading from Genbank >>>> >>>> Any quick fix,patch? >>>> >>>> thanks. 
>>>> Ilhami Visne >>>> _______________________________________________ >>>> Biojava-l mailing list - >>>> Biojava-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>>> >>>> >>>> >>>> >>> -- >>> Richard Holland, BSc MBCS >>> Operations and Delivery Director, Eagle Genomics Ltd >>> T: +44 (0)1223 654481 ext 3 | E: >>> holland at eaglegenomics.com >>> http://www.eaglegenomics.com/ >>> >>> >>> >>> >>> >>> > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -------------- next part -------------- A non-text attachment was scrubbed... Name: LocationOfError.jpeg Type: image/jpeg Size: 359584 bytes Desc: not available URL: From holland at eaglegenomics.com Mon Dec 14 18:18:51 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Mon, 14 Dec 2009 23:18:51 +0000 Subject: [Biojava-l] A Exception Has Occurred During Parsing. (Ilhami Visne) In-Reply-To: <4B2566A0.9070909@gmail.com> References: <4B2566A0.9070909@gmail.com> Message-ID: <7DDF7F72-C576-4647-82F7-C60DCAA5CFC5@eaglegenomics.com> Thanks for noticing this - so the problem was not random then, but was predictable based on specific sequences! I have patched the current version of GenbankFormat in subversion trunk to behave better with blank lines in comment sections. Can you independently test it to see if it works for this sequence now? On 13 Dec 2009, at 22:11, Deepak Sheoran wrote: > Hi, > I am attaching a quick fix to solve your problem with this record. Please have a look at the attached .jpeg file to see the solution. 
> > Thanks > Deepak Sheoran > >> Hi, >> >> I'm following the suggestion "Please submit the details that follow to >> biojava-l at biojava.org or post a bug report to http://bugzilla.open-bio.org/" >> . >> >> The sequence of concerns is at >> http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=39645757&rettype=gb&seq_start=1353&seq_stop=2128&strand=1 >> >> Format_object=org.biojavax.bio.seq.io.GenbankFormat >> Accession=null >> Id=null >> Comments=Bad section >> Parse_block= >> Stack trace follows .... >> >> >> at >> org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:603) >> at >> org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:278) >> at >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) >> ... 8 more >> Caused by: java.lang.NullPointerException >> at >> org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:593) >> ... 10 more >> org.biojava.bio.BioException: IO failure whilst reading from Genbank >> >> Any quick fix,patch? >> >> thanks. >> Ilhami Visne >> >> >> ------------------------------------------------------------------------ >> >> Subject: >> Re: [Biojava-l] A Exception Has Occurred During Parsing. >> From: >> Richard Holland >> Date: >> Fri, 11 Dec 2009 09:59:34 +0000 >> To: >> Ilhami Visne >> >> To: >> Ilhami Visne >> CC: >> biojava-l at lists.open-bio.org >> >> >> Hello. Could you also post the relevant parts of your code that you are running when this exception happens? >> >> cheers, >> Richard >> >> On 11 Dec 2009, at 00:46, Ilhami Visne wrote: >> >> >>> Hi, >>> >>> I'm following the suggestion "Please submit the details that follow to >>> biojava-l at biojava.org or post a bug report to http://bugzilla.open-bio.org/" >>> . 
>>> >>> The sequence of concerns is at >>> http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=39645757&rettype=gb&seq_start=1353&seq_stop=2128&strand=1 >>> >>> Format_object=org.biojavax.bio.seq.io.GenbankFormat >>> Accession=null >>> Id=null >>> Comments=Bad section >>> Parse_block= >>> Stack trace follows .... >>> >>> >>> at >>> org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:603) >>> at >>> org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:278) >>> at >>> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) >>> ... 8 more >>> Caused by: java.lang.NullPointerException >>> at >>> org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:593) >>> ... 10 more >>> org.biojava.bio.BioException: IO failure whilst reading from Genbank >>> >>> Any quick fix,patch? >>> >>> thanks. >>> Ilhami Visne >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >> >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> >> >> >> >> ------------------------------------------------------------------------ >> >> Subject: >> Re: [Biojava-l] Sequences as strings to RichSequence iterator >> From: >> Oliver Stolpe >> Date: >> Fri, 11 Dec 2009 11:44:26 +0100 >> To: >> Richard Holland >> >> To: >> Richard Holland >> CC: >> biojava-l at biojava.org >> >> >> Dear Richard, >> >> thank you for your answer. What you stated in your PSS is exactly what I want. It worked well with the ByteArrayOutputStream. Nevertheless I dont use it but write my own string by concatenating the names and sequences using String s = ">" + name + "\r\n" + sequence + "\r\n". 
I know its not the best way but this fasta format thing modified the input names too much that I would have had to much trouble getting the right information out of them. >> >> Best regards, >> Oliver >> Richard Holland schrieb: >>> PS. Spot the deliberate mistake in the hasNext() function... that should be <, not <=! >>> >>> PPS. In your original email you stated you wanted to read your sequences as Fasta. In Biojava, all sequences are RichSequences - they have no format other than the object model of RichSequence itself. Fasta only gets involved when you're reading from a Fasta file, or writing to one. If you need to show the sequences as Fasta in your user interface, you should consider using the FastaWriter writeSequence() method with the PrintStream parameter and wiring in a StringWriter to the PrintStream so that you can get a String representation of a Fasta record. >>> >>> On 6 Dec 2009, at 15:41, Richard Holland wrote: >>> >>> >>>> I'm not sure what you're trying to do here - are you trying to represent your string array of sequences as a RichSequenceIterator, or are you trying to convert them into FASTA? I'll answer both anyway...: >>>> >>>> To convert your String[] of sequences into a RichSequenceIterator you need to create a new class that implements the RichSequenceIterator interface. 
You would probably write something like this (which I have not checked or compiled - so if it has bugs, sorry!): >>>> >>>> public class MyDNASeqIterator implements RichSequenceIterator { >>>> private final String[] sequences; >>>> private int counter; >>>> >>>> public MyDNASeqIterator(String[] sequences) { this.sequences = sequences; this.counter = 0; } >>>> >>>> public boolean hasNext() { return this.counter < this.sequences.length; >>>> } >>>> >>>> public Sequence nextSequence() throws BioException { return nextRichSequence(); } >>>> >>>> public BioEntry nextBioEntry() throws BioException { return nextRichSequence(); } >>>> >>>> public RichSequence nextRichSequence() throws BioException { >>>> String seqName = "MySeq"+this.counter; >>>> return RichSequence.Tools.createRichSequence(seqName, this.sequences[this.counter++], DNATools.getDNA()); >>>> } >>>> } >>>> >>>> You can then instantiate an object using MyDNASeqIterator's constructor to give it your string array, and iterate over it to get corresponding RichSequence instances. >>>> >>>> To convert your sequences to FASTA, use the above iterator to generate sequences to pass to FastaFormat in the same way that you would write a normal FASTA file. >>>> >>>> cheers, >>>> Richard >>>> >>>> On 6 Dec 2009, at 14:04, Oliver Stolpe wrote: >>>> >>>> >>>>> Hello *, >>>>> >>>>> I have a set of sequences as strings in an array. I now want to turn them into an iterator over RichSequences in FASTA format. I read the cookbook, but I don't get it. I also looked up the examples in biojavax-doc. I tried a lot but I have no good starting point. No starting point at all. How does the RichSequenceBuilder work? What about the FastaFormat thing? >>>>> I thought about putting the sequences in a FASTA file and then reading the file. But there must be a much more straightforward way! 
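For readers who only need FASTA text rather than RichSequence objects, here is a minimal plain-Java sketch of the same iteration pattern. It is not BioJava code: the class name FastaRecordIterator and the generated "MySeqN" names are made up for illustration, and it simply concatenates a ">" header line with each sequence, much as Oliver describes in his follow-up. Note the strict < bound in hasNext(), which is the point of Richard's PS.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical helper: walks a String[] and yields one FASTA record per entry. */
class FastaRecordIterator implements Iterator<String> {
    private final String[] sequences;
    private int counter = 0; // index of the next sequence to return

    FastaRecordIterator(String[] sequences) {
        this.sequences = sequences;
    }

    @Override
    public boolean hasNext() {
        // Strictly less than: index sequences.length is already out of bounds.
        return counter < sequences.length;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // Assumed naming scheme, mirroring the "MySeq" + counter idea in the thread.
        String name = "MySeq" + counter;
        return ">" + name + "\n" + sequences[counter++] + "\n";
    }
}
```

Because the header line is written by hand, the names stay exactly as given, which seems to be what Oliver wanted; the price is that nothing validates the sequence characters.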
>>>>> >>>>> Thanks in advance for any hints, >>>>> Oliver >>>>> >>>>> >>>>> _______________________________________________ >>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>>>> >>>> -- >>>> Richard Holland, BSc MBCS >>>> Operations and Delivery Director, Eagle Genomics Ltd >>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >>>> http://www.eaglegenomics.com/ >>>> >>>> >>>> _______________________________________________ >>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>>> >>> >>> -- >>> Richard Holland, BSc MBCS >>> Operations and Delivery Director, Eagle Genomics Ltd >>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >>> http://www.eaglegenomics.com/ >>> >>> >> >> >> With best regards, >> Oliver >> >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From huijieqiao at gmail.com Mon Dec 14 21:19:37 2009 From: huijieqiao at gmail.com (huijieqiao at gmail.com) Date: Tue, 15 Dec 2009 10:19:37 +0800 Subject: [Biojava-l] A question about NexusFile and NewickTreeString Message-ID: Hi all, I have a NexusFile like this: #NEXUS Begin TREES; tree test1 = (1,2); tree test2 = (B:6.0,(A:5.0,C:3.0,E:4.0) Ancestor1:5.0,D:11.0); End; I can traverse all the nodes in tree "test1" with getTreeAsWeightedJGraphT("test1") and vertexSet(). But getTreeAsWeightedJGraphT("test2") throws an exception: "org.biojava.bio.seq.io.ParseException: Expecting ), got Ancestor1". The method "getTree()" doesn't support traversing all the nodes. How can I deal with this? 
By the way, how can I extract a subset of a tree (a given named parent node, such as "Ancestor1")? From huijieqiao at gmail.com Mon Dec 14 21:31:54 2009 From: huijieqiao at gmail.com (huijieqiao at gmail.com) Date: Tue, 15 Dec 2009 10:31:54 +0800 Subject: [Biojava-l] just a test ^_^ Message-ID: From tiagoantao at gmail.com Mon Dec 14 21:40:42 2009 From: tiagoantao at gmail.com (Tiago Antão) Date: Tue, 15 Dec 2009 02:40:42 +0000 Subject: [Biojava-l] A question about NexusFile and NewickTreeString In-Reply-To: References: Message-ID: <6d941f120912141840n12578a44k34cf40b58b6328fd@mail.gmail.com> On Tue, Dec 15, 2009 at 2:19 AM, huijieqiao at gmail.com wrote: > Hi all, > > I have a NexusFile like this: > > #NEXUS > > Begin TREES; > tree test1 = (1,2); > tree test2 = (B:6.0,(A:5.0,C:3.0,E:4.0) > Ancestor1:5.0,D:11.0); > End; > How can I deal with this ? There is a comma missing after the ) closing E:4.0) i.e., > tree test2 = (B:6.0,(A:5.0,C:3.0,E:4.0), <-- NOTE COMMA > Ancestor1:5.0,D:11.0); -- “Pessimism of the Intellect; Optimism of the Will” 
-Antonio Gramsci From huijieqiao at gmail.com Mon Dec 14 21:54:51 2009 From: huijieqiao at gmail.com (huijieqiao at gmail.com) Date: Tue, 15 Dec 2009 10:54:51 +0800 Subject: [Biojava-l] A question about NexusFile and NewickTreeString In-Reply-To: <6d941f120912141840n12578a44k34cf40b58b6328fd@mail.gmail.com> References: <6d941f120912141840n12578a44k34cf40b58b6328fd@mail.gmail.com> Message-ID: Hi Tiago, I used the tree "(A,B,(C,D)E)F;", the third example on Wikipedia (http://en.wikipedia.org/wiki/Newick_format), and got the same error. The Wikipedia examples are:
(,,(,)); no nodes are named
(A,B,(C,D)); leaf nodes are named
(A,B,(C,D)E)F; all nodes are named
(:0.1,:0.2,(:0.3,:0.4):0.5); all but root node have a distance to parent
(:0.1,:0.2,(:0.3,:0.4):0.5):0.0; all have a distance to parent
(A:0.1,B:0.2,(C:0.3,D:0.4):0.5); distances and leaf names (popular)
(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F; distances and all names
((B:0.2,(C:0.3,D:0.4)E:0.5)F:0.1)A; a tree rooted on a leaf node (rare)
2009/12/15 Tiago Antão > On Tue, Dec 15, 2009 at 2:19 AM, huijieqiao at gmail.com > wrote: > > Hi all, > > > > I have a NexusFile like this: > > > > #NEXUS > > > > Begin TREES; > > tree test1 = (1,2); > > tree test2 = (B:6.0,(A:5.0,C:3.0,E:4.0) > > Ancestor1:5.0,D:11.0); > > End; > > How can I deal with this ? > > There is a comma missing after the ) closing E:4.0) ie, > > tree test2 = (B:6.0,(A:5.0,C:3.0,E:4.0), <-- NOTE COMMA > > Ancestor1:5.0,D:11.0); > > > -- > “Pessimism of the Intellect; Optimism of the Will” -Antonio Gramsci > From huijieqiao at gmail.com Tue Dec 15 00:04:09 2009 From: huijieqiao at gmail.com (huijieqiao at gmail.com) Date: Tue, 15 Dec 2009 13:04:09 +0800 Subject: [Biojava-l] A question about NexusFile and NewickTreeString In-Reply-To: <96DA979D-2FBF-492A-A579-26A11D52FE4E@drycafe.net> References: <6d941f120912141840n12578a44k34cf40b58b6328fd@mail.gmail.com> <96DA979D-2FBF-492A-A579-26A11D52FE4E@drycafe.net> Message-ID: Thanks for your reply. Yes. 
But how do I parse this kind of tree in BioJava? 2009/12/15 Hilmar Lapp > > On Dec 14, 2009, at 9:40 PM, Tiago Antão wrote: > > On Tue, Dec 15, 2009 at 2:19 AM, huijieqiao at gmail.com >> wrote: >> >>> Hi all, >>> >>> I have a NexusFile like this: >>> >>> #NEXUS >>> >>> Begin TREES; >>> tree test1 = (1,2); >>> tree test2 = (B:6.0,(A:5.0,C:3.0,E:4.0) >>> Ancestor1:5.0,D:11.0); >>> End; >>> How can I deal with this ? >>> >> >> There is a comma missing after the ) closing E:4.0) ie, >> >>> tree test2 = (B:6.0,(A:5.0,C:3.0,E:4.0), <-- NOTE COMMA >>> Ancestor1:5.0,D:11.0); >>> >> > Actually no. Ancestor1 is the node label of the ancestor node of A, C, and > E. Node labels follow after a parenthesis without a comma. > > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > From holland at eaglegenomics.com Tue Dec 15 04:21:52 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 15 Dec 2009 09:21:52 +0000 Subject: [Biojava-l] A question about NexusFile and NewickTreeString In-Reply-To: References: <6d941f120912141840n12578a44k34cf40b58b6328fd@mail.gmail.com> <96DA979D-2FBF-492A-A579-26A11D52FE4E@drycafe.net> Message-ID: <16A164A6-5483-46C7-93F2-5D390C948882@eaglegenomics.com> Hi there. I believe the code used to be able to parse this kind of tree, but Tiago recently rewrote it so I'm no longer certain. 
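To make Hilmar's point concrete, the following is a minimal sketch of reading a Newick string in which a label may follow a closing parenthesis and line breaks may appear between tokens. It is not BioJava's parser and not the patch discussed in this thread; the names NewickSketch, Node, parse and find are invented for illustration, and quoted labels and [comments] are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical recursive-descent reader for simple Newick strings. */
class NewickSketch {

    /** One tree node: optional label, optional branch length, zero or more children. */
    static final class Node {
        String label = "";
        double length = Double.NaN; // NaN means "no branch length given"
        final List<Node> children = new ArrayList<>();

        /** Depth-first search for the first node carrying the given label. */
        Node find(String wanted) {
            if (label.equals(wanted)) return this;
            for (Node child : children) {
                Node hit = child.find(wanted);
                if (hit != null) return hit;
            }
            return null;
        }
    }

    private final String text;
    private int pos = 0;

    private NewickSketch(String text) { this.text = text; }

    /** Parses a single Newick string terminated by ';'. */
    static Node parse(String newick) {
        NewickSketch p = new NewickSketch(newick);
        Node root = p.readNode();
        p.skipWhitespace();
        if (p.peek() != ';') {
            throw new IllegalArgumentException("Expected ';' at position " + p.pos);
        }
        return root;
    }

    /** node := [ '(' node { ',' node } ')' ] [ label ] [ ':' length ] */
    private Node readNode() {
        Node node = new Node();
        skipWhitespace();
        if (consumeIf('(')) {
            do {
                node.children.add(readNode());
                skipWhitespace();
            } while (consumeIf(','));
            if (!consumeIf(')')) {
                throw new IllegalArgumentException("Expected ')' at position " + pos);
            }
        }
        skipWhitespace();           // a newline before an inner label is fine
        node.label = readToken();   // empty string if the node is unnamed
        skipWhitespace();
        if (consumeIf(':')) {
            skipWhitespace();
            node.length = Double.parseDouble(readToken());
        }
        return node;
    }

    /** Reads characters up to the next structural character or whitespace. */
    private String readToken() {
        int start = pos;
        while (pos < text.length()
                && "(),:;".indexOf(text.charAt(pos)) < 0
                && !Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        return text.substring(start, pos);
    }

    private char peek() { return pos < text.length() ? text.charAt(pos) : '\0'; }

    private boolean consumeIf(char c) {
        if (peek() == c) { pos++; return true; }
        return false;
    }

    private void skipWhitespace() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
    }
}
```

With something like this, the subtree question from the original post reduces to a lookup: parse the tree, then call find("Ancestor1") on the root to get the node whose children are A, C and E.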
> > > 2009/12/15 Hilmar Lapp > >> >> On Dec 14, 2009, at 9:40 PM, Tiago Ant?o wrote: >> >> On Tue, Dec 15, 2009 at 2:19 AM, huijieqiao at gmail.com >>> wrote: >>> >>>> Hi all, >>>> >>>> I have a NexusFile like this: >>>> >>>> #NEXUS >>>> >>>> Begin TREES; >>>> tree test1 = (1,2); >>>> tree test2 = (B:6.0,(A:5.0,C:3.0,E:4.0) >>>> Ancestor1:5.0,D:11.0); >>>> End; >>>> How can I deal with this ? >>>> >>> >>> There is a comma missing after the ) closing E:4.0) ie, >>> >>>> tree test2 = (B:6.0,(A:5.0,C:3.0,E:4.0), <-- NOTE COMMA >>>> Ancestor1:5.0,D:11.0); >>>> >>> >> Actually no. Ancestor1 is the node label of the ancestor node of A, C, and >> E. Node labels follow after a parenthesis without a comma. >> >> -hilmar >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >> =========================================================== >> >> >> >> >> > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From tiagoantao at gmail.com Tue Dec 15 06:43:54 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Tue, 15 Dec 2009 11:43:54 +0000 Subject: [Biojava-l] A question about NexusFile and NewickTreeString In-Reply-To: <16A164A6-5483-46C7-93F2-5D390C948882@eaglegenomics.com> References: <6d941f120912141840n12578a44k34cf40b58b6328fd@mail.gmail.com> <96DA979D-2FBF-492A-A579-26A11D52FE4E@drycafe.net> <16A164A6-5483-46C7-93F2-5D390C948882@eaglegenomics.com> Message-ID: <6d941f120912150343p73727225m7c35e57e81fdda46@mail.gmail.com> 2009/12/15 Richard Holland : > Hi there. > > I believe the code used to be able to parse this kind of tree, but TIago recently rewrote it so I'm no longer certain. 
> > Tiago - your new code doesn't seem to be coping with the insertion of a newline at random points in the Tree string. I think you might need to modify your tokenize() method to handle this better? Could you also add a unit test using this particular tree. Yep, will take care of this over the weekend. Maybe before, but no promises. From tiagoantao at gmail.com Tue Dec 15 10:02:17 2009 From: tiagoantao at gmail.com (Tiago Antão) Date: Tue, 15 Dec 2009 15:02:17 +0000 Subject: [Biojava-l] A question about NexusFile and NewickTreeString In-Reply-To: <1263043C-26C6-4455-9D5B-E092EA389851@uni-jena.de> References: <6d941f120912141840n12578a44k34cf40b58b6328fd@mail.gmail.com> <96DA979D-2FBF-492A-A579-26A11D52FE4E@drycafe.net> <16A164A6-5483-46C7-93F2-5D390C948882@eaglegenomics.com> <6d941f120912150343p73727225m7c35e57e81fdda46@mail.gmail.com> <1263043C-26C6-4455-9D5B-E092EA389851@uni-jena.de> Message-ID: <6d941f120912150702xc9bb9ek43cfe237ba063fc6@mail.gmail.com> Hi Thasso, I think you are using an old version (even the current stable is "old" AFAIK)... I have changed TreeBlock quite a lot, as it was somewhat problematic. 2009/12/15 Thasso Griebel : > Hi, > > I wanted to take a look at the parser anyways, so I took the opportunity. > > As far as i see this, the newline is just a minor part of the problem. I > think the bigger issue here is parsing inner node labels. I attached a patch > that fixes the problem, at least for inner nodes with label and inner nodes > with label and weights. Wikipedia states that Newick allows leaves without > any labels, but in case of phylogenetic trees I think one can safely ignore > this, though the parser should maybe throw an exception. > > If you are interested I also updated the unit test. > > hope it helps, cheers, > -thasso > > On Dec 15, 2009, at 12:43 , Tiago Antão wrote: > >> 2009/12/15 Richard Holland : >>> >>> Hi there. 
>>>
>>> I believe the code used to be able to parse this kind of tree, but TIago
>>> recently rewrote it so I'm no longer certain.
>>>
>>> Tiago - your new code doesn't seem to be coping with the insertion of a
>>> newline at random points in the Tree string. I think you might need to
>>> modify your tokenize() method to handle this better? Could you also add a
>>> unit test using this particular tree.
>>
>>
>> Yep, will take care of this over the weekend. Maybe before, but no
>> promises.
>> _______________________________________________
>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
>
>
>
>

--
“Pessimism of the Intellect; Optimism of the Will”
-Antonio Gramsci

From huijieqiao at gmail.com Tue Dec 15 10:03:24 2009
From: huijieqiao at gmail.com (huijieqiao at gmail.com)
Date: Tue, 15 Dec 2009 23:03:24 +0800
Subject: [Biojava-l] A question about NexusFile and NewickTreeString
In-Reply-To: <16A164A6-5483-46C7-93F2-5D390C948882@eaglegenomics.com>
References: <6d941f120912141840n12578a44k34cf40b58b6328fd@mail.gmail.com>
    <96DA979D-2FBF-492A-A579-26A11D52FE4E@drycafe.net>
    <16A164A6-5483-46C7-93F2-5D390C948882@eaglegenomics.com>
Message-ID:

I noticed that there are quite a lot of differences between the current
version and the last version.

Also, if there is a "_" in a node's name, the Nexus parser removes the "_"
from the name and an error is thrown.

For example,

(a_1,a_2); is changed to (a 1,a 2);



2009/12/15 Richard Holland

> Hi there.
>
> I believe the code used to be able to parse this kind of tree, but TIago
> recently rewrote it so I'm no longer certain.
>
> Tiago - your new code doesn't seem to be coping with the insertion of a
> newline at random points in the Tree string. I think you might need to
> modify your tokenize() method to handle this better? Could you also add a
> unit test using this particular tree.
> > cheers, > Richard > > On 15 Dec 2009, at 05:04, huijieqiao at gmail.com wrote: > > > Thanks for your reply > > > > Yes. > > > > But how to parse this kind of tree in BIOJava? > > > > > > 2009/12/15 Hilmar Lapp > > > >> > >> On Dec 14, 2009, at 9:40 PM, Tiago Ant?o wrote: > >> > >> On Tue, Dec 15, 2009 at 2:19 AM, huijieqiao at gmail.com > >>> wrote: > >>> > >>>> Hi all, > >>>> > >>>> I have a NexusFile like this: > >>>> > >>>> #NEXUS > >>>> > >>>> Begin TREES; > >>>> tree test1 = (1,2); > >>>> tree test2 = (B:6.0,(A:5.0,C:3.0,E:4.0) > >>>> Ancestor1:5.0,D:11.0); > >>>> End; > >>>> How can I deal with this ? > >>>> > >>> > >>> There is a comma missing after the ) closing E:4.0) ie, > >>> > >>>> tree test2 = (B:6.0,(A:5.0,C:3.0,E:4.0), <-- NOTE COMMA > >>>> Ancestor1:5.0,D:11.0); > >>>> > >>> > >> Actually no. Ancestor1 is the node label of the ancestor node of A, C, > and > >> E. Node labels follow after a parenthesis without a comma. > >> > >> -hilmar > >> -- > >> =========================================================== > >> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > >> =========================================================== > >> > >> > >> > >> > >> > > > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > From tiagoantao at gmail.com Tue Dec 15 10:14:20 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Tue, 15 Dec 2009 15:14:20 +0000 Subject: [Biojava-l] A question about NexusFile and NewickTreeString In-Reply-To: References: <6d941f120912141840n12578a44k34cf40b58b6328fd@mail.gmail.com> <96DA979D-2FBF-492A-A579-26A11D52FE4E@drycafe.net> <16A164A6-5483-46C7-93F2-5D390C948882@eaglegenomics.com> Message-ID: 
<6d941f120912150714q2ec2805ieaccb5043bfae7d5@mail.gmail.com> Hi again, Just to make things clear, it would be nice if people stated if they are using the latest release version or the version in the subversion head. As they are totally different in tree processing as the SVN version is mostly a reimplementation (TreeBlock only). 2009/12/15 huijieqiao at gmail.com : > I noticed that there are quite a lot different between current version and > the last version. > > And if there is a "_" in the node's name, NexasParse will remove the "_" in > the node's name and there will be an error thrown > > for exmple > > (a_1,a_2); will be changed to (a 1,a 2); > > > > 2009/12/15 Richard Holland >> >> Hi there. >> >> I believe the code used to be able to parse this kind of tree, but TIago >> recently rewrote it so I'm no longer certain. >> >> Tiago - your new code doesn't seem to be coping with the insertion of a >> newline at random points in the Tree string. I think you might need to >> modify your tokenize() method to handle this better? Could you also add a >> unit test using this particular tree. >> >> cheers, >> Richard >> >> On 15 Dec 2009, at 05:04, huijieqiao at gmail.com wrote: >> >> > Thanks for your reply >> > >> > Yes. >> > >> > But how to parse this kind of tree in BIOJava? >> > >> > >> > 2009/12/15 Hilmar Lapp >> > >> >> >> >> On Dec 14, 2009, at 9:40 PM, Tiago Ant?o wrote: >> >> >> >> On Tue, Dec 15, 2009 at 2:19 AM, huijieqiao at gmail.com >> >>> wrote: >> >>> >> >>>> Hi all, >> >>>> >> >>>> I have a NexusFile like this: >> >>>> >> >>>> #NEXUS >> >>>> >> >>>> Begin TREES; >> >>>> ?tree test1 = (1,2); >> >>>> ?tree test2 = (B:6.0,(A:5.0,C:3.0,E:4.0) >> >>>> Ancestor1:5.0,D:11.0); >> >>>> End; >> >>>> How can I deal with this ? >> >>>> >> >>> >> >>> There is a comma missing after the ) closing E:4.0) ie, >> >>> >> >>>> tree test2 = (B:6.0,(A:5.0,C:3.0,E:4.0), <-- NOTE COMMA >> >>>> Ancestor1:5.0,D:11.0); >> >>>> >> >>> >> >> Actually no. 
Ancestor1 is the node label of the ancestor node of A, C, >> >> and >> >> E. Node labels follow after a parenthesis without a comma. >> >> >> >> ? ? ? -hilmar >> >> -- >> >> =========================================================== >> >> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >> >> =========================================================== >> >> >> >> >> >> >> >> >> >> >> > >> > _______________________________________________ >> > Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> > > -- ?Pessimism of the Intellect; Optimism of the Will? -Antonio Gramsci From thasso.griebel at uni-jena.de Tue Dec 15 12:34:51 2009 From: thasso.griebel at uni-jena.de (Thasso Griebel) Date: Tue, 15 Dec 2009 18:34:51 +0100 Subject: [Biojava-l] A question about NexusFile and NewickTreeString In-Reply-To: <6d941f120912150702xc9bb9ek43cfe237ba063fc6@mail.gmail.com> References: <6d941f120912141840n12578a44k34cf40b58b6328fd@mail.gmail.com> <96DA979D-2FBF-492A-A579-26A11D52FE4E@drycafe.net> <16A164A6-5483-46C7-93F2-5D390C948882@eaglegenomics.com> <6d941f120912150343p73727225m7c35e57e81fdda46@mail.gmail.com> <1263043C-26C6-4455-9D5B-E092EA389851@uni-jena.de> <6d941f120912150702xc9bb9ek43cfe237ba063fc6@mail.gmail.com> Message-ID: <8DC3916F-6AC5-4204-A5A4-0F5F6E58BF19@uni-jena.de> Hi, now I am confused :) I did my fixes on the SVN version from this repository: svn://code.open-bio.org/biojava/biojava-live/trunk The revision I was working on is 7269. This is the current HEAD including the NEW parser, isn't it ? Tiago, you are listed as author in the javadoc and the last commit comment on the TreesBlock states that it contains your changes...so I am pretty sure I did my changes on the right version ? 
cheers, -thasso On Dec 15, 2009, at 16:02 , Tiago Ant?o wrote: > Hi Thasso, > > I think you are using an old version (even the current stable is "old" > AFAIK)... I have changed TreeBlock quite a lot, as it was somewhat > problematic. > > 2009/12/15 Thasso Griebel : >> Hi, >> >> I wanted to take a look at the parser anyways, so I took the >> opportunity. >> >> As far as i see this, the newline is just a minor part of the >> problem. I >> think the bigger issue here is parsing inner node labels. I >> attached a patch >> that fixes the problem, at least for inner nodes with label and >> inner nodes >> with label and weights. Wikipedia states that Newick allows leaves >> without >> any labels, but in case of phylogenetic trees I think one can >> safely ignore >> this, though the parser should maybe throw an exception. >> >> If you are interested I also updated the unit test. >> >> hope it helps, cheers, >> -thasso >> >> On Dec 15, 2009, at 12:43 , Tiago Ant?o wrote: >> >>> 2009/12/15 Richard Holland : >>>> >>>> Hi there. >>>> >>>> I believe the code used to be able to parse this kind of tree, >>>> but TIago >>>> recently rewrote it so I'm no longer certain. >>>> >>>> Tiago - your new code doesn't seem to be coping with the >>>> insertion of a >>>> newline at random points in the Tree string. I think you might >>>> need to >>>> modify your tokenize() method to handle this better? Could you >>>> also add a >>>> unit test using this particular tree. >>> >>> >>> Yep, will take care of this over the weekend. Maybe before, but no >>> promises. >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> >> >> >> > > > > -- > ?Pessimism of the Intellect; Optimism of the Will? 
-Antonio Gramsci From andreas.prlic at gmail.com Tue Dec 15 13:46:33 2009 From: andreas.prlic at gmail.com (Andreas Prlic) Date: Tue, 15 Dec 2009 10:46:33 -0800 Subject: [Biojava-l] A question about NexusFile and NewickTreeString In-Reply-To: <8DC3916F-6AC5-4204-A5A4-0F5F6E58BF19@uni-jena.de> References: <6d941f120912141840n12578a44k34cf40b58b6328fd@mail.gmail.com> <96DA979D-2FBF-492A-A579-26A11D52FE4E@drycafe.net> <16A164A6-5483-46C7-93F2-5D390C948882@eaglegenomics.com> <6d941f120912150343p73727225m7c35e57e81fdda46@mail.gmail.com> <1263043C-26C6-4455-9D5B-E092EA389851@uni-jena.de> <6d941f120912150702xc9bb9ek43cfe237ba063fc6@mail.gmail.com> <8DC3916F-6AC5-4204-A5A4-0F5F6E58BF19@uni-jena.de> Message-ID: <59a41c430912151046u51181eaaq936946bccf0f6350@mail.gmail.com> Hi Thasso, just to clarify: you are correct, that is the latest SVN head. Andreas 2009/12/15 Thasso Griebel : > Hi, > > now I am confused :) > > I did my fixes on the SVN version from this repository: > > svn://code.open-bio.org/biojava/biojava-live/trunk > > The revision I was working on is 7269. This is the current HEAD including > the NEW parser, isn't it ? Tiago, you are listed as author in the javadoc > and the last commit comment on the TreesBlock ?states that it contains your > changes...so I am pretty sure I did my changes on the right version ? > > cheers, > -thasso > > On Dec 15, 2009, at 16:02 , Tiago Ant?o wrote: > >> Hi Thasso, >> >> I think you are using an old version (even the current stable is "old" >> AFAIK)... I have changed TreeBlock quite a lot, as it was somewhat >> problematic. >> >> 2009/12/15 Thasso Griebel : >>> >>> Hi, >>> >>> I wanted to take a look at the parser anyways, so I took the opportunity. >>> >>> As far as i see this, the newline is just a minor part of the problem. I >>> think the bigger issue here is parsing inner node labels. I attached a >>> patch >>> that fixes the problem, at least for inner nodes with label and inner >>> nodes >>> with label and weights. 
Wikipedia states that Newick allows leaves >>> without >>> any labels, but in case of phylogenetic trees I think one can safely >>> ignore >>> this, though the parser should maybe throw an exception. >>> >>> If you are interested I also updated the unit test. >>> >>> hope it helps, cheers, >>> -thasso >>> >>> On Dec 15, 2009, at 12:43 , Tiago Ant?o wrote: >>> >>>> 2009/12/15 Richard Holland : >>>>> >>>>> Hi there. >>>>> >>>>> I believe the code used to be able to parse this kind of tree, but >>>>> TIago >>>>> recently rewrote it so I'm no longer certain. >>>>> >>>>> Tiago - your new code doesn't seem to be coping with the insertion of a >>>>> newline at random points in the Tree string. I think you might need to >>>>> modify your tokenize() method to handle this better? Could you also add >>>>> a >>>>> unit test using this particular tree. >>>> >>>> >>>> Yep, will take care of this over the weekend. Maybe before, but no >>>> promises. >>>> _______________________________________________ >>>> Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >>> >>> >>> >>> >> >> >> >> -- >> ?Pessimism of the Intellect; Optimism of the Will? -Antonio Gramsci > > > _______________________________________________ > Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From chris at compbio.dundee.ac.uk Wed Dec 16 04:10:23 2009 From: chris at compbio.dundee.ac.uk (Chris Cole) Date: Wed, 16 Dec 2009 09:10:23 +0000 Subject: [Biojava-l] Reading Fasta file problem (new to biojava) Message-ID: <4B28A3FF.3070306@compbio.dundee.ac.uk> [ repost as the first attempt never got through ] Hi, I'm new to biojava and relatively new to Java, but not new to progamming in general (perl and C). 

As a quick exercise I wanted to be able to read in a Fasta file, but this
code here (Solution 2):
http://biojava.org/wiki/BioJava:Cookbook:SeqIO:ReadFasta

has deprecation warnings for SeqIOTools and fileToBioJava(). The code
currently works, but what's the non-deprecated way of going about it?

I can see from Javadoc that RichSequence.IOTools from org.biojavax.bio.seq
is now the recommended class, but how does one use it?

I've tried this, but readFastaProtein() requires a namespace, and I've no
idea what that means.
    BufferedReader br = new BufferedReader(new FileReader(args[0]));
    RichSequenceIterator iter = (RichSequenceIterator)RichSequence.IOTools.readFastaProtein(br);

All pointers appreciated.
Cheers,

Chris

From holland at eaglegenomics.com Wed Dec 16 06:22:31 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 16 Dec 2009 11:22:31 +0000
Subject: [Biojava-l] Reading Fasta file problem (new to biojava)
In-Reply-To: <4B28A3FF.3070306@compbio.dundee.ac.uk>
References: <4B28A3FF.3070306@compbio.dundee.ac.uk>
Message-ID: <76BC597D-0B44-47AA-84B6-F3BCFD05B0C9@eaglegenomics.com>

You're almost right already - the sample code would be this:

    BufferedReader br = new BufferedReader(new FileReader(args[0]));
    Namespace ns = RichObjectFactory.getDefaultNamespace();
    RichSequenceIterator iterator = RichSequence.IOTools.readFastaProtein(br,ns);

(No need to cast it - it's already a RichSequenceIterator).

The namespace is a hangover from the object model being based on BioSQL,
where all sequences belong to a namespace. If you don't care about
namespaces (which is highly likely if you're not persisting to BioSQL) then
a default one is provided to use as a placeholder, as shown above.

cheers,
Richard

On 16 Dec 2009, at 09:10, Chris Cole wrote:

> [ repost as the first attempt never got through ]
>
> Hi,
>
> I'm new to biojava and relatively new to Java, but not new to progamming in general (perl and C).
> > As a quick exercise I wanted to be able to read in a Fasta file, but this code here (Solution 2): > http://biojava.org/wiki/BioJava:Cookbook:SeqIO:ReadFasta > > Has deprecated warnings for SeqIOTools and fileToBioJava(). The code currently works, but what's the non-deprecated way of going about it? > > I can see from Javadoc that RichSequence.IOTools from org.biojavax.bio.seq is now the recommended class, but how does one implement it? > > I've tried this, but readFastaProtein() requires a namespace, which I've no idea what that means. > BufferedReader br = new BufferedReader(new FileReader(args[0])); > RichSequenceIterator iter = (RichSequenceIterator)RichSequence.IOTools.readFastaProtein(br); > > All pointers appreciated. > Cheers, > Chris > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From tiagoantao at gmail.com Wed Dec 16 06:26:28 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 16 Dec 2009 11:26:28 +0000 Subject: [Biojava-l] A question about NexusFile and NewickTreeString In-Reply-To: <59a41c430912151046u51181eaaq936946bccf0f6350@mail.gmail.com> References: <6d941f120912141840n12578a44k34cf40b58b6328fd@mail.gmail.com> <96DA979D-2FBF-492A-A579-26A11D52FE4E@drycafe.net> <16A164A6-5483-46C7-93F2-5D390C948882@eaglegenomics.com> <6d941f120912150343p73727225m7c35e57e81fdda46@mail.gmail.com> <1263043C-26C6-4455-9D5B-E092EA389851@uni-jena.de> <6d941f120912150702xc9bb9ek43cfe237ba063fc6@mail.gmail.com> <8DC3916F-6AC5-4204-A5A4-0F5F6E58BF19@uni-jena.de> <59a41c430912151046u51181eaaq936946bccf0f6350@mail.gmail.com> Message-ID: <6d941f120912160326i4cf85c45sb03707cb75173879@mail.gmail.com> Ok thanks for the clarifications. 
If no one else offers, I will sort this out before Monday (and add test
cases also).

2009/12/15 Andreas Prlic :
> Hi Thasso,
>
> just to clarify: you are correct, that is the latest SVN head.
>
> Andreas
>
>
>
>
> 2009/12/15 Thasso Griebel :
>> Hi,
>>
>> now I am confused :)
>>
>> I did my fixes on the SVN version from this repository:
>>
>> svn://code.open-bio.org/biojava/biojava-live/trunk
>>
>> The revision I was working on is 7269. This is the current HEAD including
>> the NEW parser, isn't it ? Tiago, you are listed as author in the javadoc
>> and the last commit comment on the TreesBlock states that it contains your
>> changes...so I am pretty sure I did my changes on the right version ?
>>
>> cheers,
>> -thasso
>>
>> On Dec 15, 2009, at 16:02 , Tiago Antão wrote:
>>
>>> Hi Thasso,
>>>
>>> I think you are using an old version (even the current stable is "old"
>>> AFAIK)... I have changed TreeBlock quite a lot, as it was somewhat
>>> problematic.
>>>
>>> 2009/12/15 Thasso Griebel :
>>>>
>>>> Hi,
>>>>
>>>> I wanted to take a look at the parser anyways, so I took the opportunity.
>>>>
>>>> As far as i see this, the newline is just a minor part of the problem. I
>>>> think the bigger issue here is parsing inner node labels. I attached a
>>>> patch
>>>> that fixes the problem, at least for inner nodes with label and inner
>>>> nodes
>>>> with label and weights. Wikipedia states that Newick allows leaves
>>>> without
>>>> any labels, but in case of phylogenetic trees I think one can safely
>>>> ignore
>>>> this, though the parser should maybe throw an exception.
>>>>
>>>> If you are interested I also updated the unit test.
>>>>
>>>> hope it helps, cheers,
>>>> -thasso
>>>>
>>>> On Dec 15, 2009, at 12:43 , Tiago Antão wrote:
>>>>
>>>>> 2009/12/15 Richard Holland :
>>>>>>
>>>>>> Hi there.
>>>>>>
>>>>>> I believe the code used to be able to parse this kind of tree, but
>>>>>> TIago
>>>>>> recently rewrote it so I'm no longer certain.
>>>>>> >>>>>> Tiago - your new code doesn't seem to be coping with the insertion of a >>>>>> newline at random points in the Tree string. I think you might need to >>>>>> modify your tokenize() method to handle this better? Could you also add >>>>>> a >>>>>> unit test using this particular tree. >>>>> >>>>> >>>>> Yep, will take care of this over the weekend. Maybe before, but no >>>>> promises. >>>>> _______________________________________________ >>>>> Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>>> >>>> >>>> >>>> >>>> >>> >>> >>> >>> -- >>> ?Pessimism of the Intellect; Optimism of the Will? -Antonio Gramsci >> >> >> _______________________________________________ >> Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > -- ?Pessimism of the Intellect; Optimism of the Will? -Antonio Gramsci From chris at compbio.dundee.ac.uk Wed Dec 16 10:00:47 2009 From: chris at compbio.dundee.ac.uk (Chris Cole) Date: Wed, 16 Dec 2009 15:00:47 +0000 Subject: [Biojava-l] Reading Fasta file problem (new to biojava) In-Reply-To: <76BC597D-0B44-47AA-84B6-F3BCFD05B0C9@eaglegenomics.com> References: <4B28A3FF.3070306@compbio.dundee.ac.uk> <76BC597D-0B44-47AA-84B6-F3BCFD05B0C9@eaglegenomics.com> Message-ID: <4B28F61F.8010104@compbio.dundee.ac.uk> On 16/12/09 11:22, Richard Holland wrote: > BufferedReader br = new BufferedReader(new FileReader(args[0])); > Namespace ns = RichObjectFactory.getDefaultNamespace(); > RichSequenceIterator iterator = RichSequence.IOTools.readFastaProtein(br,ns); Thanks, Richard. However, now I get an error: "BioException cannot be resolved to a type" Below is the code for the method. What am I missing? 

public void read(String filename) {
    try {
        System.out.println("Reading file: " + filename);
        BufferedReader br = new BufferedReader(new FileReader(filename));

        Namespace ns = RichObjectFactory.getDefaultNamespace();
        RichSequenceIterator iter = RichSequence.IOTools.readFastaProtein(br,ns);
    }
    catch (FileNotFoundException ex) {
        //can't find file specified by args[0]
        ex.printStackTrace();
    } catch (BioException ex) {
        //error parsing requested format
        ex.printStackTrace();
    }
}

From holland at eaglegenomics.com Wed Dec 16 11:38:15 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 16 Dec 2009 16:38:15 +0000
Subject: [Biojava-l] Reading Fasta file problem (new to biojava)
In-Reply-To: <4B28F61F.8010104@compbio.dundee.ac.uk>
References: <4B28A3FF.3070306@compbio.dundee.ac.uk>
    <76BC597D-0B44-47AA-84B6-F3BCFD05B0C9@eaglegenomics.com>
    <4B28F61F.8010104@compbio.dundee.ac.uk>
Message-ID:

If that's a compile-time error, it's probably because you haven't got an
'import' statement for BioException, and/or you're compiling against an
out-of-date version of BioJava (particularly if you're using Eclipse or some
other IDE, check that your source dependencies for BioJava are the same
version as the JAR dependencies).

If that doesn't solve it, could you post the full error stack/console
output?

cheers,
Richard

On 16 Dec 2009, at 15:00, Chris Cole wrote:

> On 16/12/09 11:22, Richard Holland wrote:
>> BufferedReader br = new BufferedReader(new FileReader(args[0]));
>> Namespace ns = RichObjectFactory.getDefaultNamespace();
>> RichSequenceIterator iterator = RichSequence.IOTools.readFastaProtein(br,ns);
>
> Thanks, Richard. However, now I get an error:
> "BioException cannot be resolved to a type"
>
> Below is the code for the method. What am I missing?
>
> public void read(String filename) {
>     try {
>         System.out.println("Reading file: " + filename);
>         BufferedReader br = new BufferedReader(new FileReader(filename));
>
>         Namespace ns = RichObjectFactory.getDefaultNamespace();
>         RichSequenceIterator iter = RichSequence.IOTools.readFastaProtein(br,ns);
>     }
>     catch (FileNotFoundException ex) {
>         //can't find file specified by args[0]
>         ex.printStackTrace();
>     } catch (BioException ex) {
>         //error parsing requested format
>         ex.printStackTrace();
>     }
> }

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From tiagoantao at gmail.com Fri Dec 18 07:04:54 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Fri, 18 Dec 2009 12:04:54 +0000
Subject: [Biojava-l] A question about NexusFile and NewickTreeString
In-Reply-To: <16A164A6-5483-46C7-93F2-5D390C948882@eaglegenomics.com>
References: <6d941f120912141840n12578a44k34cf40b58b6328fd@mail.gmail.com>
    <96DA979D-2FBF-492A-A579-26A11D52FE4E@drycafe.net>
    <16A164A6-5483-46C7-93F2-5D390C948882@eaglegenomics.com>
Message-ID: <6d941f120912180404h5aae7a93y3c3220d80b5fd3df@mail.gmail.com>

2009/12/15 Richard Holland :
> Tiago - your new code doesn't seem to be coping with the insertion of a newline at random points in the Tree string. I think you might need to modify your tokenize() method to handle this better? Could you also add a unit test using this particular tree.

Do you have any case that exposes this bug? I am trying with this:

#NEXUS

Begin TREES;
tree test1 =
(
1
,
2
)
;

End;

And it works out fine and dandy.
Maybe the bug that you are spotting is really not a newline problem
but an instance of the other bug (i.e. a node name that comes after a
newline and so looks like a newline bug)?
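
To make the two issues in this thread concrete, here is a minimal sketch of a Newick tokenizer. It is not the BioJava TreesBlock code; the class name NewickTokens and its tokenize() method are hypothetical and written only for illustration. It assumes unquoted labels and ignores Newick comments and underscore handling. Whitespace, including newlines, only separates tokens, and a name that directly follows ")" survives as its own token, so a parser can use it as the label of the inner node it has just closed.

```java
import java.util.ArrayList;
import java.util.List;

final class NewickTokens {

    // Characters that form a token on their own in a Newick string.
    private static final String PUNCTUATION = "(),:;";

    /** Splits a Newick tree string into punctuation and name/number tokens. */
    static List<String> tokenize(String tree) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char c : tree.toCharArray()) {
            boolean isPunct = PUNCTUATION.indexOf(c) >= 0;
            if (isPunct || Character.isWhitespace(c)) {
                // A label or branch length ends at punctuation or whitespace.
                if (word.length() > 0) {
                    tokens.add(word.toString());
                    word.setLength(0);
                }
                if (isPunct) {
                    tokens.add(String.valueOf(c));
                }
            } else {
                word.append(c);
            }
        }
        // Flush a trailing word in case the string does not end in ';'.
        if (word.length() > 0) {
            tokens.add(word.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The tree from the original question, with its embedded newline.
        String tree = "(B:6.0,(A:5.0,C:3.0,E:4.0)\nAncestor1:5.0,D:11.0);";
        System.out.println(tokenize(tree));
    }
}
```

With this approach the token after the inner ")" is Ancestor1, which a parser can attach to the node it has just closed instead of reporting a missing comma.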

From tiagoantao at gmail.com Fri Dec 18 07:23:28 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Fri, 18 Dec 2009 12:23:28 +0000
Subject: [Biojava-l] A question about NexusFile and NewickTreeString
In-Reply-To: <96DA979D-2FBF-492A-A579-26A11D52FE4E@drycafe.net>
References: <6d941f120912141840n12578a44k34cf40b58b6328fd@mail.gmail.com>
    <96DA979D-2FBF-492A-A579-26A11D52FE4E@drycafe.net>
Message-ID: <6d941f120912180423y3668bf4esf5bc2c40d3831afc@mail.gmail.com>

Hi,

2009/12/15 Hilmar Lapp :
> Actually no. Ancestor1 is the node label of the ancestor node of A, C, and
> E. Node labels follow after a parenthesis without a comma.


I think I've found the cause for this mistake: I was following
http://hydrodictyon.eeb.uconn.edu/eebedia/index.php/Phylogenetics:_NEXUS_Format
which happens not to discuss Newick trees in sufficient detail. I
should have used a more detailed spec.
I've found this:
http://en.wikipedia.org/wiki/Newick_format
Question:
I am assuming that the name string is a string in the usual sense, i.e.
something starting with an alphabetic (not numeric) character and ending at
the first whitespace.
Given your experience with the format (and considering that sometimes
de facto implementations are somewhat liberal), do you think this
notion of string is good enough?

If so, the change seems trivial and should be ready today.

Tiago
--
“Pessimism of the Intellect; Optimism of the Will”
-Antonio Gramsci

From holland at eaglegenomics.com Fri Dec 18 09:36:59 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 18 Dec 2009 14:36:59 +0000
Subject: [Biojava-l] A question about NexusFile and NewickTreeString
In-Reply-To: <6d941f120912180404h5aae7a93y3c3220d80b5fd3df@mail.gmail.com>
References: <6d941f120912141840n12578a44k34cf40b58b6328fd@mail.gmail.com>
    <96DA979D-2FBF-492A-A579-26A11D52FE4E@drycafe.net>
    <16A164A6-5483-46C7-93F2-5D390C948882@eaglegenomics.com>
    <6d941f120912180404h5aae7a93y3c3220d80b5fd3df@mail.gmail.com>
Message-ID: <850FC294-83E5-4B2E-A1AF-B45981BE3D10@eaglegenomics.com>

You could be right - it could be just the other problem.

cheers,
Richard

On 18 Dec 2009, at 12:04, Tiago Antão wrote:

> 2009/12/15 Richard Holland :
>> Tiago - your new code doesn't seem to be coping with the insertion of a newline at random points in the Tree string. I think you might need to modify your tokenize() method to handle this better? Could you also add a unit test using this particular tree.
>
> Do you have any case that exposes this bug? I am trying with this:
>
> #NEXUS
>
> Begin TREES;
> tree test1 =
> (
> 1
> ,
> 2
> )
> ;
>
> End;
>
> And it works out fine and dandy.
> Maybe the bug that you are spotting is really not a newline problem
> but an instance of the other bug (ie node name that happens after a
> newline and seems as a newline bug)?

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From hlapp at drycafe.net Fri Dec 18 10:13:35 2009
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Fri, 18 Dec 2009 10:13:35 -0500
Subject: [Biojava-l] A question about NexusFile and NewickTreeString
In-Reply-To: <6d941f120912180423y3668bf4esf5bc2c40d3831afc@mail.gmail.com>
References: <6d941f120912141840n12578a44k34cf40b58b6328fd@mail.gmail.com>
    <96DA979D-2FBF-492A-A579-26A11D52FE4E@drycafe.net>
    <6d941f120912180423y3668bf4esf5bc2c40d3831afc@mail.gmail.com>
Message-ID: <2802F43E-9471-4793-B50A-4A4205A4BEF0@drycafe.net>

Tiago - the authoritative page on Newick format is here:

http://evolution.genetics.washington.edu/phylip/newicktree.html

NHX (New Hampshire Extended) is here:

http://www.phylosoft.org/NHX/

    -hilmar

On Dec 18, 2009, at 7:23 AM, Tiago Antão wrote:

> Hi,
>
> 2009/12/15 Hilmar Lapp :
>> Actually no. Ancestor1 is the node label of the ancestor node of A,
>> C, and
>> E. Node labels follow after a parenthesis without a comma.
>
>
> I think I've found the cause for this mistake: I was following
> http://hydrodictyon.eeb.uconn.edu/eebedia/index.php/Phylogenetics:_NEXUS_Format
> which happens not to discuss newick trees in sufficient detail. I
> should have used a more detailed spec.
> I've found this:
> http://en.wikipedia.org/wiki/Newick_format
> Question:
> I am assuming that the name string is a string in the usual sense, ie,
> something starting with an alpha (not numeric) and ending in the first
> whitespace.
> Given your experience with the format (and considering that sometimes
> de facto implementations are somewhat liberal), do you think this
> notion of string is good enough?
>
> If so, the change seems trivial and should be ready today.
>
> Tiago
> --
> “Pessimism of the Intellect; Optimism of the Will”
-Antonio Gramsci

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From chris at compbio.dundee.ac.uk Fri Dec 18 10:53:25 2009
From: chris at compbio.dundee.ac.uk (Chris Cole)
Date: Fri, 18 Dec 2009 15:53:25 +0000
Subject: [Biojava-l] Error parsing ipi.HUMAN.fasta file
Message-ID: <4B2BA575.5010007@compbio.dundee.ac.uk>

I'm wanting to parse a fasta file obtained from IPI using the code at the
bottom of this message, but I get the following error:

org.biojava.bio.BioException: Could not read sequence
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
    at test.readFasta(test.java:39)
    at test.main(test.java:18)
Caused by: java.io.IOException: Mark invalid
    at java.io.BufferedReader.reset(BufferedReader.java:485)
    at org.biojavax.bio.seq.io.FastaFormat.readRichSequence(FastaFormat.java:202)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
    ... 2 more

Looking at the Fasta file itself and doing some tests, it seems to fail
consistently at one or two entries /preceding/ an entry with a very long
description line e.g.:

>IPI:IPI00021421.4|SWISS-PROT:Q9UMR5-1|TREMBL:B0S868|ENSEMBL:ENSP00000382748;ENSP00000382749;ENSP00000382750;ENSP00000387679;ENSP00000388341;ENSP00000388618;ENSP00000389930;ENSP00000392885;ENSP00000393009;ENSP00000395242;ENSP00000395562;ENSP00000397025;ENSP00000399879;ENSP00000403820;ENSP00000406496;ENSP00000406566;ENSP00000408703;ENSP00000411007;ENSP00000411625;ENSP00000412827|REFSEQ:NP_005146|VEGA:OTTHUMP00000014775;OTTHUMP00000014776;OTTHUMP00000014778;OTTHUMP00000175028;OTTHUMP00000175029;OTTHUMP00000175030;OTTHUMP00000193135;OTTHUMP00000193136;OTTHUMP00000193138;OTTHUMP00000193964;OTTHUMP00000193965;OTTHUMP00000193967;OTTHUMP00000194391;OTTHUMP00000194392;OTTHUMP00000194394 Tax_Id=9606 Gene_Symbol=PPT2 Isoform 1 of Lysosomal thioesterase PPT2
MLGLWGQRLPAAWVLLLLPFLPLLLLAAPAPHRASYKPVIVVHGLFDSSYSFRHLLEYIN
ETHPGTVVTVLDLFDGRESLRPLWEQVQGFREAVVPIMAKAPQGVHLICYSQGGLVCRAL
LSVMDDHNVDSFISLSSPQMGQYGDTDYLKWLFPTSMRSNLYRICYSPWGQEFSICNYWH
DPHHDDLYLNASSFLALINGERDHPNATVWRKNFLRVGHLVLIGGPDDGVITPWQSSFFG
FYDANETVLEMEEQLVYLRDSFGLKTLLARGAIVRCPMAGISHTAWHSNRTLYETCIEPW
LS

Deleting the large entries allows the code to continue until it reaches
another long description line. It also seems to be a feature of large
Fasta files as reading the above sequence alone or as part of a small file
is fine.

Is this a known problem or am I doing something wrong?

BTW I'm using biojava 1.7 and Java 1.6.0_17.

Any help would be most appreciated.
Cheers.
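
The "Mark invalid" message in the trace above is raised by java.io.BufferedReader.reset(). The following is a small sketch, independent of BioJava, of how that can happen: a mark is set with a small read-ahead limit, more characters than the limit are consumed, and the reader has to refill its buffer before reset() is called. The class name MarkResetDemo, the helper canReset(), and the tiny 16-character buffer are assumptions chosen only to make the effect easy to reproduce. The BufferedReader documentation only says that reset() "may fail" once the limit has been passed, so the failing case reflects common JDK behaviour rather than a guarantee.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class MarkResetDemo {

    /**
     * Marks the reader, consumes up to charsToRead characters, then tries to
     * reset. Returns true if reset() succeeded, false if an IOException
     * (typically "Mark invalid") was thrown.
     */
    static boolean canReset(String text, int readAheadLimit, int charsToRead) {
        // Hypothetical 16-char buffer so that the buffer must be refilled early.
        try (BufferedReader br = new BufferedReader(new StringReader(text), 16)) {
            br.mark(readAheadLimit);
            for (int i = 0; i < charsToRead; i++) {
                if (br.read() < 0) {
                    break; // end of input
                }
            }
            br.reset();
            return true;
        } catch (IOException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder(">");
        for (int i = 0; i < 200; i++) {
            sb.append('X'); // stand-in for a very long description line
        }
        String text = sb.toString();
        System.out.println("limit 8,   read 100: " + canReset(text, 8, 100));
        System.out.println("limit 500, read 100: " + canReset(text, 500, 100));
    }
}
```

One possible reading, offered as a guess rather than a diagnosis: since the mark is only dropped when the buffer is refilled after the limit has been passed, the failure could depend on where buffer boundaries fall in the file, which would fit the observation that the same entry parses fine on its own.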
code:
import java.io.*;

import org.biojava.bio.*;
import org.biojavax.*;
import org.biojavax.bio.seq.*;

public class test {
    private static PrintStream o = System.out;

    public static void main(String[] args) {
        // TODO Auto-generated method stub
        readFasta(args[0]);
    }

    public static void readFasta(String filename) {
        try {
            o.println("Reading file: " + filename);
            //prepare a BufferedReader for file io
            BufferedReader br = new BufferedReader(new FileReader(filename));

            // read Fasta file as BioJava RichSequence object
            Namespace ns = RichObjectFactory.getDefaultNamespace();
            RichSequenceIterator iter = RichSequence.IOTools.readFastaProtein(br,ns);

            int numProteins = 0;
            while(iter.hasNext()) {
                ++numProteins;

                // Retrieve sequence and description data
                RichSequence seq = iter.nextRichSequence();
                String ipi = seq.getName().substring(4,15);
                o.println(ipi);

            }
            o.println("Found " + numProteins + " in Fasta file");
        } catch (FileNotFoundException ex) {
            //can't find file specified by args[0]
            ex.printStackTrace();
        } catch (BioException ex) {
            //error parsing requested format
            ex.printStackTrace();
        }
    }

}

From holland at eaglegenomics.com Fri Dec 18 11:58:27 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 18 Dec 2009 16:58:27 +0000
Subject: [Biojava-l] Error parsing ipi.HUMAN.fasta file
In-Reply-To: <4B2BA575.5010007@compbio.dundee.ac.uk>
References: <4B2BA575.5010007@compbio.dundee.ac.uk>
Message-ID: <991F3D0D-0702-4EB7-9718-676986A1415E@eaglegenomics.com>

The FASTA parser has a buffer which it uses to read ahead to the next complete line, then back up before it actually parses it on the second pass (in order to allow it to do things like hasNext()). The exception shows that the size of that buffer is being exceeded, causing it to fail to back up again afterwards.

There are two cures - one is to rewrite the FASTA parser to buffer things in a different way. The other is to open up org/biojavax/bio/seq/io/FastaFormat.java in a text editor, search for the line where it sets the buffer (somewhere around line 202 according to the exception, in the readRichSequence() method - the command to look for is 'mark'), and increase the buffer size to something suitably large (it's currently set at 500 bytes). Then recompile BioJava and it should work.

cheers,
Richard

On 18 Dec 2009, at 15:53, Chris Cole wrote:

> I'm wanting to parse a fasta file obtained from IPI using the code at the bottom of this message, but I get the following error:
>
> org.biojava.bio.BioException: Could not read sequence
>     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
>     at test.readFasta(test.java:39)
>     at test.main(test.java:18)
> Caused by: java.io.IOException: Mark invalid
>     at java.io.BufferedReader.reset(BufferedReader.java:485)
>     at org.biojavax.bio.seq.io.FastaFormat.readRichSequence(FastaFormat.java:202)
>     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
> ...
2 more > > Looking at the Fasta file itself and doing some tests, it seems to fail consistently at one or two entries /preceding/ an entry with a very long description line e.g.: > >IPI:IPI00021421.4|SWISS-PROT:Q9UMR5-1|TREMBL:B0S868|ENSEMBL:ENSP00000382748;ENSP00000382749;ENSP00000382750;ENSP00000387679;ENSP00000388341;ENSP00000388618;ENSP00000389930;ENSP00000392885;ENSP00000393009;ENSP00000395242;ENSP00000395562;ENSP00000397025;ENSP00000399879;ENSP00000403820;ENSP00000406496;ENSP00000406566;ENSP00000408703;ENSP00000411007;ENSP00000411625;ENSP00000412827|REFSEQ:NP_005146|VEGA:OTTHUMP00000014775;OTTHUMP00000014776;OTTHUMP00000014778;OTTHUMP00000175028;OTTHUMP00000175029;OTTHUMP00000175030;OTTHUMP00000193135;OTTHUMP00000193136;OTTHUMP00000193138;OTTHUMP00000193964;OTTHUMP00000193965;OTTHUMP00000193967;OTTHUMP00000194391;OTTHUMP00000194392;OTTHUMP00000194394 Tax_Id=9606 Gene_Symbol=PPT2 Isoform 1 of Lysosomal thioesterase PPT2 > MLGLWGQRLPAAWVLLLLPFLPLLLLAAPAPHRASYKPVIVVHGLFDSSYSFRHLLEYIN > ETHPGTVVTVLDLFDGRESLRPLWEQVQGFREAVVPIMAKAPQGVHLICYSQGGLVCRAL > LSVMDDHNVDSFISLSSPQMGQYGDTDYLKWLFPTSMRSNLYRICYSPWGQEFSICNYWH > DPHHDDLYLNASSFLALINGERDHPNATVWRKNFLRVGHLVLIGGPDDGVITPWQSSFFG > FYDANETVLEMEEQLVYLRDSFGLKTLLARGAIVRCPMAGISHTAWHSNRTLYETCIEPW > LS > > Deleting the large entries allows the code to continue until it reaches another long description line. > > It also seems to be a feature of large Fasta files as reading the above sequence alone or as part of a small file is fine. > > Is this a known problem or am I doing something wrong? BTW I'm using biojava 1.7 and Java 1.6.0_17. > Any help would be most appreciated. > Cheers. 
> > code: > import java.io.*; > > import org.biojava.bio.*; > import org.biojavax.*; > import org.biojavax.bio.seq.*; > > public class test { > private static PrintStream o = System.out; > > public static void main(String[] args) { > // TODO Auto-generated method stub > readFasta(args[0]); > } > > public static void readFasta(String filename) { > try { > o.println("Reading file: " + filename); > //prepare a BufferedReader for file io > BufferedReader br = new BufferedReader(new FileReader(filename)); > > // read Fasta file as BioJava RichSequence object > Namespace ns = RichObjectFactory.getDefaultNamespace(); > RichSequenceIterator iter = RichSequence.IOTools.readFastaProtein(br,ns); > > int numProteins = 0; > while(iter.hasNext()) { > ++numProteins; > > // Retrieve sequence and description data > RichSequence seq = iter.nextRichSequence(); > String ipi = seq.getName().substring(4,15); > o.println(ipi); > > } > o.println("Found " + numProteins + " in Fasta file"); > } catch (FileNotFoundException ex) { > //can't find file specified by args[0] > ex.printStackTrace(); > } catch (BioException ex) { > //error parsing requested format > ex.printStackTrace(); > } > } > > } > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From jogoodma at indiana.edu Fri Dec 18 11:44:56 2009 From: jogoodma at indiana.edu (Josh Goodman) Date: Fri, 18 Dec 2009 11:44:56 -0500 Subject: [Biojava-l] Error parsing ipi.HUMAN.fasta file In-Reply-To: <4B2BA575.5010007@compbio.dundee.ac.uk> References: <4B2BA575.5010007@compbio.dundee.ac.uk> Message-ID: <4B2BB188.5050103@indiana.edu> Hi Chris, I've run into this problem before. 
See http://lists.open-bio.org/pipermail/biojava-l/2009-May/006834.html for details and some unofficial patches that fix the problem. Josh On 12/18/2009 10:53 AM, Chris Cole wrote: > I'm wanting to parse a fasta file obtained from IPI using the code at > the bottom of this message, but I get the following error: > > org.biojava.bio.BioException: Could not read sequence > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113) > > at test.readFasta(test.java:39) > at test.main(test.java:18) > Caused by: java.io.IOException: Mark invalid > at java.io.BufferedReader.reset(BufferedReader.java:485) > at > org.biojavax.bio.seq.io.FastaFormat.readRichSequence(FastaFormat.java:202) > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) > > ... 2 more > > Looking at the Fasta file itself and doing some tests, it seems to fail > consistently at one or two entries /preceding/ an entry with a very long > description line e.g.: >>IPI:IPI00021421.4|SWISS-PROT:Q9UMR5-1|TREMBL:B0S868|ENSEMBL:ENSP00000382748;ENSP00000382749;ENSP00000382750;ENSP00000387679;ENSP00000388341;ENSP00000388618;ENSP00000389930;ENSP00000392885;ENSP00000393009;ENSP00000395242;ENSP00000395562;ENSP00000397025;ENSP00000399879;ENSP00000403820;ENSP00000406496;ENSP00000406566;ENSP00000408703;ENSP00000411007;ENSP00000411625;ENSP00000412827|REFSEQ:NP_005146|VEGA:OTTHUMP00000014775;OTTHUMP00000014776;OTTHUMP00000014778;OTTHUMP00000175028;OTTHUMP00000175029;OTTHUMP00000175030;OTTHUMP00000193135;OTTHUMP00000193136;OTTHUMP00000193138;OTTHUMP00000193964;OTTHUMP00000193965;OTTHUMP00000193967;OTTHUMP00000194391;OTTHUMP00000194392;OTTHUMP00000194394 > Tax_Id=9606 Gene_Symbol=PPT2 Isoform 1 of Lysosomal thioesterase PPT2 > MLGLWGQRLPAAWVLLLLPFLPLLLLAAPAPHRASYKPVIVVHGLFDSSYSFRHLLEYIN > ETHPGTVVTVLDLFDGRESLRPLWEQVQGFREAVVPIMAKAPQGVHLICYSQGGLVCRAL > LSVMDDHNVDSFISLSSPQMGQYGDTDYLKWLFPTSMRSNLYRICYSPWGQEFSICNYWH > 
DPHHDDLYLNASSFLALINGERDHPNATVWRKNFLRVGHLVLIGGPDDGVITPWQSSFFG
> FYDANETVLEMEEQLVYLRDSFGLKTLLARGAIVRCPMAGISHTAWHSNRTLYETCIEPW
> LS
>
> Deleting the large entries allows the code to continue until it reaches
> another long description line.
>
> It also seems to be a feature of large Fasta files as reading the above
> sequence alone or as part of a small file is fine.
>
> Is this a known problem or am I doing something wrong? BTW I'm using
> biojava 1.7 and Java 1.6.0_17.
> Any help would be most appreciated.
> Cheers.
>
> code:
> import java.io.*;
>
> import org.biojava.bio.*;
> import org.biojavax.*;
> import org.biojavax.bio.seq.*;
>
> public class test {
>     private static PrintStream o = System.out;
>
>     public static void main(String[] args) {
>         // TODO Auto-generated method stub
>         readFasta(args[0]);
>     }
>
>     public static void readFasta(String filename) {
>         try {
>             o.println("Reading file: " + filename);
>             //prepare a BufferedReader for file io
>             BufferedReader br = new BufferedReader(new FileReader(filename));
>
>             // read Fasta file as BioJava RichSequence object
>             Namespace ns = RichObjectFactory.getDefaultNamespace();
>             RichSequenceIterator iter =
>                 RichSequence.IOTools.readFastaProtein(br,ns);
>
>             int numProteins = 0;
>             while(iter.hasNext()) {
>                 ++numProteins;
>
>                 // Retrieve sequence and description data
>                 RichSequence seq = iter.nextRichSequence();
>                 String ipi = seq.getName().substring(4,15);
>                 o.println(ipi);
>
>             }
>             o.println("Found " + numProteins + " in Fasta file");
>         } catch (FileNotFoundException ex) {
>             //can't find file specified by args[0]
>             ex.printStackTrace();
>         } catch (BioException ex) {
>             //error parsing requested format
>             ex.printStackTrace();
>         }
>     }
>
> }

From tiagoantao at gmail.com Fri Dec 18 18:14:32 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Fri, 18 Dec 2009 23:14:32 +0000
Subject: [Biojava-l] A question about NexusFile and NewickTreeString
In-Reply-To: <850FC294-83E5-4B2E-A1AF-B45981BE3D10@eaglegenomics.com>
References: <6d941f120912141840n12578a44k34cf40b58b6328fd@mail.gmail.com> <96DA979D-2FBF-492A-A579-26A11D52FE4E@drycafe.net> <16A164A6-5483-46C7-93F2-5D390C948882@eaglegenomics.com> <6d941f120912180404h5aae7a93y3c3220d80b5fd3df@mail.gmail.com> <850FC294-83E5-4B2E-A1AF-B45981BE3D10@eaglegenomics.com>
Message-ID: <6d941f120912181514q4f3239d1q2f4825e292850c4c@mail.gmail.com>

Just submitted a patch.

2009/12/18 Richard Holland:
> You could be right - it could be just the other problem.
>
> cheers,
> Richard
>
> On 18 Dec 2009, at 12:04, Tiago Antão wrote:
>
>> 2009/12/15 Richard Holland:
>>> Tiago - your new code doesn't seem to be coping with the insertion of a newline at random points in the Tree string. I think you might need to modify your tokenize() method to handle this better? Could you also add a unit test using this particular tree.
>>
>> Do you have any case that exposes this bug? I am trying with this:
>>
>> #NEXUS
>>
>> Begin TREES;
>>       tree test1 =
>> (
>> 1
>> ,
>> 2
>> )
>> ;
>>
>> End;
>>
>> And it works out fine and dandy.
>> Maybe the bug that you are spotting is really not a newline problem
>> but an instance of the other bug (ie node name that happens after a
>> newline and seems as a newline bug)?
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>

--
"Pessimism of the Intellect; Optimism of the Will" -Antonio Gramsci

From andylu0320 at gmail.com Sun Dec 20 22:50:46 2009
From: andylu0320 at gmail.com (Andy Lu)
Date: Sun, 20 Dec 2009 22:50:46 -0500
Subject: [Biojava-l] Simple Question
Message-ID: <4a1a3f7d0912201950kf124dc7k24a2d66e06aba305@mail.gmail.com>

Hi, I have a simple question.
I tried to add a button to the JMol program opened through biojava. I added a Button Panel and JMol Panel to the JFrame.
But the Button only shows up when my mouse is hovered over it.
Is there a way that I can make it visible all the time?
Thank you!

From andreas.prlic at gmail.com Sun Dec 20 23:14:16 2009
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Sun, 20 Dec 2009 20:14:16 -0800
Subject: [Biojava-l] Simple Question
In-Reply-To: <4a1a3f7d0912201950kf124dc7k24a2d66e06aba305@mail.gmail.com>
References: <4a1a3f7d0912201950kf124dc7k24a2d66e06aba305@mail.gmail.com>
Message-ID: <59a41c430912202014u4c14c125n2e7288af52bdd372@mail.gmail.com>

Hi Andy,

the joys of user interface development :-) Difficult to say what is wrong with your code. I assume you need to add the button to either the contentPane of the JFrame or to the verticalBox that is there. I find working with Boxes the most straightforward way to build up a user interface... No need to have too many panels most of the time...

Andreas

On Sun, Dec 20, 2009 at 7:50 PM, Andy Lu wrote:
> Hi, I have a simple question.
> I tried to add a button to the JMol program opened through biojava. I added
> a Button Panel and JMol Panel to the JFrame.
> But the Button only shows up when my mouse is hovered over it.
> Is there a way that I can make it visible all the time?
> Thank you!

From holland at eaglegenomics.com Mon Dec 21 10:19:48 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 21 Dec 2009 15:19:48 +0000
Subject: [Biojava-l] A question about NexusFile and NewickTreeString
In-Reply-To: <6d941f120912181514q4f3239d1q2f4825e292850c4c@mail.gmail.com>
References: <6d941f120912141840n12578a44k34cf40b58b6328fd@mail.gmail.com> <96DA979D-2FBF-492A-A579-26A11D52FE4E@drycafe.net> <16A164A6-5483-46C7-93F2-5D390C948882@eaglegenomics.com> <6d941f120912180404h5aae7a93y3c3220d80b5fd3df@mail.gmail.com> <850FC294-83E5-4B2E-A1AF-B45981BE3D10@eaglegenomics.com> <6d941f120912181514q4f3239d1q2f4825e292850c4c@mail.gmail.com>
Message-ID: <249E0981-3E75-486F-B4D5-7E90CAE7E871@eaglegenomics.com>

Patched.

On 18 Dec 2009, at 23:14, Tiago Antão wrote:

> Just submitted a patch.
>
> 2009/12/18 Richard Holland:
>> You could be right - it could be just the other problem.
>>
>> cheers,
>> Richard
>>
>> On 18 Dec 2009, at 12:04, Tiago Antão wrote:
>>
>>> 2009/12/15 Richard Holland:
>>>> Tiago - your new code doesn't seem to be coping with the insertion of a newline at random points in the Tree string. I think you might need to modify your tokenize() method to handle this better? Could you also add a unit test using this particular tree.
>>>
>>> Do you have any case that exposes this bug? I am trying with this:
>>>
>>> #NEXUS
>>>
>>> Begin TREES;
>>>       tree test1 =
>>> (
>>> 1
>>> ,
>>> 2
>>> )
>>> ;
>>>
>>> End;
>>>
>>> And it works out fine and dandy.
>>> Maybe the bug that you are spotting is really not a newline problem
>>> but an instance of the other bug (ie node name that happens after a
>>> newline and seems as a newline bug)?
>>
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>
>>
>
>
>
> --
> "Pessimism of the Intellect; Optimism of the Will" -Antonio Gramsci

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From tiagoantao at gmail.com Mon Dec 21 16:55:58 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Mon, 21 Dec 2009 21:55:58 +0000
Subject: [Biojava-l] A question about NexusFile and NewickTreeString
In-Reply-To: <249E0981-3E75-486F-B4D5-7E90CAE7E871@eaglegenomics.com>
References: <6d941f120912141840n12578a44k34cf40b58b6328fd@mail.gmail.com> <96DA979D-2FBF-492A-A579-26A11D52FE4E@drycafe.net> <16A164A6-5483-46C7-93F2-5D390C948882@eaglegenomics.com> <6d941f120912180404h5aae7a93y3c3220d80b5fd3df@mail.gmail.com> <850FC294-83E5-4B2E-A1AF-B45981BE3D10@eaglegenomics.com> <6d941f120912181514q4f3239d1q2f4825e292850c4c@mail.gmail.com> <249E0981-3E75-486F-B4D5-7E90CAE7E871@eaglegenomics.com>
Message-ID: <6d941f120912211355p23e21bcave2fd3ce50d5957c6@mail.gmail.com>

If anyone wants to test with real data, please go ahead. There is some free time, so if bugs are detected it is a good time to report them.

2009/12/21 Richard Holland:
> Patched.
>
> On 18 Dec 2009, at 23:14, Tiago Antão wrote:
>
>> Just submitted a patch.
>>
>> 2009/12/18 Richard Holland:
>>> You could be right - it could be just the other problem.
>>>
>>> cheers,
>>> Richard
>>>
>>> On 18 Dec 2009, at 12:04, Tiago Antão wrote:
>>>
>>>> 2009/12/15 Richard Holland:
>>>>> Tiago - your new code doesn't seem to be coping with the insertion of a newline at random points in the Tree string. I think you might need to modify your tokenize() method to handle this better? Could you also add a unit test using this particular tree.
>>>>
>>>> Do you have any case that exposes this bug? I am trying with this:
>>>>
>>>> #NEXUS
>>>>
>>>> Begin TREES;
>>>>       tree test1 =
>>>> (
>>>> 1
>>>> ,
>>>> 2
>>>> )
>>>> ;
>>>>
>>>> End;
>>>>
>>>> And it works out fine and dandy.
>>>> Maybe the bug that you are spotting is really not a newline problem
>>>> but an instance of the other bug (ie node name that happens after a
>>>> newline and seems as a newline bug)?
>>>
>>> --
>>> Richard Holland, BSc MBCS
>>> Operations and Delivery Director, Eagle Genomics Ltd
>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>>> http://www.eaglegenomics.com/
>>>
>>>
>>
>>
>>
>> --
>> "Pessimism of the Intellect; Optimism of the Will" -Antonio Gramsci
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>

--
"Pessimism of the Intellect; Optimism of the Will" -Antonio Gramsci

From andylu0320 at gmail.com Tue Dec 22 23:31:07 2009
From: andylu0320 at gmail.com (Andy Lu)
Date: Tue, 22 Dec 2009 23:31:07 -0500
Subject: [Biojava-l] Simple Question
Message-ID: <4a1a3f7d0912222031w5fa8d314ta9d66d9ef4b32500@mail.gmail.com>

My Java program incorporates the JMol viewer function, so basically my program opens a structure of a protein through biojava and Jmol.
But once I exported it to make a jar file through Eclipse, JMol doesn't display the structure anymore (JMol won't even open at all).
Any idea why that might be? Am I doing something wrong with the jar file?
Any help is greatly appreciated!
Thank you.
From andreas.prlic at gmail.com Mon Dec 28 10:41:00 2009
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Mon, 28 Dec 2009 07:41:00 -0800
Subject: [Biojava-l] Simple Question
In-Reply-To: <4a1a3f7d0912222031w5fa8d314ta9d66d9ef4b32500@mail.gmail.com>
References: <4a1a3f7d0912222031w5fa8d314ta9d66d9ef4b32500@mail.gmail.com>
Message-ID: <59a41c430912280741p62ff4badra3ca676bb799cfda@mail.gmail.com>

Hi Andy,

sorry for the slow reply, holiday season ... I assume there are some classpath errors and you need to make sure that all the required jars can be found. I have never tried to export jar files for biojava via Eclipse. The right approach to get jar files (you are on trunk?) is to do a mvn build. The new jar files will be in the target subdirectories of the various modules.

Andreas

On Tue, Dec 22, 2009 at 8:31 PM, Andy Lu wrote:
> My Java program incorporates the JMol viewer function, so basically my
> program opens a structure of a protein through biojava and Jmol.
> But once I exported it to make a jar file through eclipse, JMol doesn't
> display the structure anymore (JMol won't even open at all).
> Any idea why it might be? Am I doing something with the jar file?
> Any help is greatly appreciated!
> Thank you.

From jbdundas at gmail.com Fri Dec 18 12:00:16 2009
From: jbdundas at gmail.com (jitesh dundas)
Date: Fri, 18 Dec 2009 17:00:16 -0000
Subject: [Biojava-l] Error In Fetching Data from STRING EMBL File
Message-ID: <326ea8620912180900l23efefads4fdc4693f11cb61c@mail.gmail.com>

Dear Sir/Madam,

I am trying to fetch data from the EMBL STRING database (http://string.embl.de/) for a list of protein interactors. The file is attached to this email. On executing the code, it gives me a "version number" error. The same code works fine when I fetch data from NCBI databases.
So I would like to know what is going wrong here. I look forward to your reply. Please let me know if you need anything else from my side.

Regards,
Jitesh Dundas
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ImportGet.jsp
Type: application/octet-stream
Size: 2780 bytes
Desc: not available
URL:

From thasso.griebel at uni-jena.de Tue Dec 15 09:42:47 2009
From: thasso.griebel at uni-jena.de (Thasso Griebel)
Date: Tue, 15 Dec 2009 14:42:47 -0000
Subject: [Biojava-l] A question about NexusFile and NewickTreeString
In-Reply-To: <6d941f120912150343p73727225m7c35e57e81fdda46@mail.gmail.com>
References: <6d941f120912141840n12578a44k34cf40b58b6328fd@mail.gmail.com> <96DA979D-2FBF-492A-A579-26A11D52FE4E@drycafe.net> <16A164A6-5483-46C7-93F2-5D390C948882@eaglegenomics.com> <6d941f120912150343p73727225m7c35e57e81fdda46@mail.gmail.com>
Message-ID: <1263043C-26C6-4455-9D5B-E092EA389851@uni-jena.de>

Hi,

I wanted to take a look at the parser anyway, so I took the opportunity. As far as I can see, the newline is just a minor part of the problem. I think the bigger issue here is parsing inner node labels. I attached a patch that fixes the problem, at least for inner nodes with a label and inner nodes with a label and weights. Wikipedia states that Newick allows leaves without any labels, but in the case of phylogenetic trees I think one can safely ignore this, though the parser should maybe throw an exception. If you are interested, I also updated the unit test.

hope it helps,
cheers,
-thasso

On Dec 15, 2009, at 12:43 , Tiago Antão wrote:

> 2009/12/15 Richard Holland:
>> Hi there.
>>
>> I believe the code used to be able to parse this kind of tree, but
>> Tiago recently rewrote it so I'm no longer certain.
>>
>> Tiago - your new code doesn't seem to be coping with the insertion
>> of a newline at random points in the Tree string. I think you might
>> need to modify your tokenize() method to handle this better? Could
>> you also add a unit test using this particular tree.
>
>
> Yep, will take care of this over the weekend. Maybe before, but no
> promises.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: TreesBlock.patch
Type: application/octet-stream
Size: 969 bytes
Desc: not available
URL:
-------------- next part --------------