From KPetrov at ics.uci.edu Sat Nov 1 14:44:48 2003
From: KPetrov at ics.uci.edu (Kirill Petrov)
Date: Sat Nov 1 14:41:20 2003
Subject: [Biojava-l] hmm profile comparison
Message-ID: <1067715888.7337.15.camel@kirill.homedns.org>

Hello All,

is there a way to compare 2 existing hmm profiles using biojava api? Or
probably there is another type of profiling system that allows
comparisons of the profiles rather than sequences?

Kirill

From mark.schreiber at agresearch.co.nz Sat Nov 1 16:29:06 2003
From: mark.schreiber at agresearch.co.nz (Schreiber, Mark)
Date: Sat Nov 1 16:25:58 2003
Subject: [Biojava-l] hmm profile comparison
Message-ID:

If you want to see if two profiles have the same parameters you can get
the Distributions from each state and use the
DistributionTools.areEmissionSpectraEqual() method to tell you if they
are the same. You should also test that the Distribution that holds the
transitions is equal.

- Mark

-----Original Message-----
From: Kirill Petrov [mailto:KPetrov@ics.uci.edu]
Sent: Sun 2/11/2003 8:44 a.m.
To: biojava-l@biojava.org
Cc:
Subject: [Biojava-l] hmm profile comparison

Hello All,

is there a way to compare 2 existing hmm profiles using biojava api? Or
probably there is another type of profiling system that allows
comparisons of the profiles rather than sequences?

Kirill

_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities to
which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited.
If you have received this message in error, please notify the sender
immediately.
=======================================================================

From KPetrov at ics.uci.edu Sat Nov 1 20:12:40 2003
From: KPetrov at ics.uci.edu (Kirill Petrov)
Date: Sat Nov 1 20:09:10 2003
Subject: [Biojava-l] hmm profile comparison
In-Reply-To:
References:
Message-ID: <1067735560.7337.99.camel@kirill.homedns.org>

> is there a way to compare 2 existing hmm profiles using biojava api?
> Or probably there is another type of profiling system that allows
> comparisons of the profiles rather than sequences?

On Sun, 2003-11-02 at 02:29, Schreiber, Mark wrote:
> If you want to see if two profiles have the same parameters you can
> get the Distributions from each state and use the
> DistributionTools.areEmissionSpectraEqual() method to tell you if
> they are the same. You should also test that the Distribution that
> holds the transitions is equal.

As far as I understand, that would tell me whether two profiles are
equal or not. The problem, however, is measuring the distance between 2
profiles. Basically, I want to use the HMM to separate a group of
sequences into 2 distinct groups. Is that possible?

Kirill

From hr_malmi at hotmail.com Sun Nov 2 22:07:42 2003
From: hr_malmi at hotmail.com (harald malming)
Date: Sun Nov 2 22:04:28 2003
Subject: [Biojava-l] Dot states in SimpleMarkovModel
Message-ID:

hi there, can anyone tell me if it is possible to have more than one dot
state in a SimpleMarkovModel? As soon as I add a second dot state, the
"DP dp=DPFactory.DEFAULT.createDP(...)" call never completes.
I would really appreciate some help,
Harry

From matthew_pocock at yahoo.co.uk Mon Nov 3 08:08:23 2003
From: matthew_pocock at yahoo.co.uk (Matthew Pocock)
Date: Mon Nov 3 08:11:08 2003
Subject: [Biojava-l] Dot states in SimpleMarkovModel
In-Reply-To:
References:
Message-ID: <3FA65347.6020308@yahoo.co.uk>

Hi,

This is a known (and fixed) bug in the 1.3 release. Guys, could we get
that maintenance release out?

Matthew

harald malming wrote:
> hi there, can anyone tell me if it is possible to have more than one
> dot state in a simpleMarkovModel. As soon as I add a second dot state,
> the : "DP dp=DPFactory.DEFAULT.createDP(...)"; call never completes.
>
> I would really appreciate some help,
> Harry

From valentin_ruano at yahoo.es Mon Nov 3 14:55:34 2003
From: valentin_ruano at yahoo.es (Valentin Ruano)
Date: Mon Nov 3 14:52:19 2003
Subject: [Biojava-l] SequencePoster
Message-ID: <20031103195534.58456.qmail@web41902.mail.yahoo.com>

Hi everyone,

I plan to develop a small bioinformatics application using Java,
involving some Swing UI. Firstly I would like to be able to show a
single sequence and multi sequence alignments. Since the sequence could
be rather long, using a multiline display, such as SequencePoster, is
best.

The problem I am experiencing with SequencePoster is that apparently it
is not possible to control the number of columns it displays per line,
that is, the "desirable" line length.
Moreover, the line length seems to be set so as to minimise the number
of blank positions at the right end of the last line. I tried setting
the maximum size for the poster component and also for its container
panel, but that does not stop it from spanning beyond the left and right
frame limits when the line length is too big.

Finally I tried to use setLines(0) for automatic line number calculation
depending on the space available, as indicated in the JavaDoc API. But
it just returns an Out of Memory error. The same happens with negative
line numbers.

output:

No sequence
Fitting to sequence
Initial width: 0
alongDim (pixles needed for sequence only): 48.0
Fitting to sequence
Initial width: 0
alongDim (pixles needed for sequence only): 48.0
java.lang.OutOfMemoryError
Exception in thread "main"

-------------

About the alignment issue. There is this other JComponent for pairwise
alignment, so just two sequences. What about multi sequence alignments,
are there any plans for this?

thanks and regards,
Valentin.

From mark.schreiber at agresearch.co.nz Mon Nov 3 17:58:23 2003
From: mark.schreiber at agresearch.co.nz (Schreiber, Mark)
Date: Mon Nov 3 18:00:59 2003
Subject: [Biojava-l] Serialization of SimpleSequences
Message-ID:

This problem is now resolved in the biojava1.3 branch. Features are
serializing nicely and still behaving like features at the end of it
all.

I'm still working on a solution in biojava-live which involves delving
into the bowels of the ontology Term implementations to find out why
they won't deserialize. Some problem with a HashSet throwing a wobbly
when it calls hashCode() on Term$Impl and getting a null pointer
exception, which shouldn't be possible as far as I can tell.

Matthew, do you know what might be going on? Might need to write a
readObject method for Term$Impl to hold its hand, but hopefully not.
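
[Editorial sketch] The readObject idea mentioned above can be
illustrated in plain Java. This is only a minimal sketch under stated
assumptions, not BioJava code: TermSketch, its fields and roundTrip()
are hypothetical names, and the example does not reproduce the Term$Impl
failure. It shows how a private readObject method can rebuild derived
state (here a cached hash code) after default deserialization, so that
hashCode() behaves once the object has been read back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a serializable term; NOT the BioJava class.
public class TermSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String name;
    // Derived state: transient, so it is not written to the stream.
    private transient int cachedHash;

    public TermSketch(String name) {
        this.name = name;
        this.cachedHash = name.hashCode();
    }

    // Serialization hook: restore the normal fields, then rebuild the
    // transient state that default deserialization leaves at zero.
    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        cachedHash = name.hashCode();
    }

    @Override
    public int hashCode() {
        return cachedHash;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof TermSketch && ((TermSketch) o).name.equals(name);
    }

    // Serialize to a byte array and read the object back.
    public static TermSketch roundTrip(TermSketch term) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(term);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (TermSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        TermSketch copy = roundTrip(new TermSketch("gene"));
        System.out.println(copy.equals(new TermSketch("gene")));
    }
}
```

Without the readObject hook the copy would come back with cachedHash set
to 0, so hash-based collections would no longer find it.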
- Mark

-----Original Message-----
From: Schreiber, Mark
Sent: Tue 28/10/2003 3:40 p.m.
To: Vasa Curcin; biojava-l@biojava.org
Cc:
Subject: RE: [Biojava-l] Serialization of SimpleSequences

Hi -

I thought we had fixed that one, although it turns out the unit test was
a bit inadequate. Generally it's not a good idea to make an interface
extend Serializable, as there may be a perfectly valid implementation
that can't be serializable. Probably better to make as many of the
implementations as possible serializable. I'll have a look at this.

- Mark

-----Original Message-----
From: Vasa Curcin [mailto:vc100@doc.ic.ac.uk]
Sent: Tue 28/10/2003 2:06 p.m.
To: biojava-l@biojava.org
Cc:
Subject: [Biojava-l] Serialization of SimpleSequences

Hello,

While transferring some sequences via serialization, I noticed that all
my Features are getting lost. After some digging around, it seems as if
Java doesn't serialize the FeatureHolder inside the sequence (I was
working with SimpleSequence objects). Even though a
NotSerializableException is not thrown, the FeatureHolder is missing.

After making the FeatureHolder interface in Biojava extend Serializable,
the problem disappeared. How much will this change affect the rest of
the code - i.e. is there a good reason why FeatureHolders are not
serializable?

Cheers,
Vasa

From kalle.naslund at genpat.uu.se Wed Nov 5 04:52:38 2003
From: kalle.naslund at genpat.uu.se (Kalle Näslund)
Date: Wed Nov 5 04:49:32 2003
Subject: [Biojava-l] SequencePoster
In-Reply-To: <20031103195534.58456.qmail@web41902.mail.yahoo.com>
References: <20031103195534.58456.qmail@web41902.mail.yahoo.com>
Message-ID: <3FA8C866.5030609@genpat.uu.se>

Valentin Ruano wrote:

>< TEXT REMOVED BY KALLE >
>
>About the alignment issue. There is this other
>JComponent for pairwise alignment so just two
>sequences. What about multi sequence alignments, is
>there any plans for this?
>

Biojava already renders multisequence alignments, please look at
http://biojava.org/pipermail/biojava-l/2003-May/003801.html for a
description of how you can do it. Hopefully that mailing list entry will
at least get you started.
regards
Kalle

From mark.schreiber at agresearch.co.nz Wed Nov 5 18:05:47 2003
From: mark.schreiber at agresearch.co.nz (Schreiber, Mark)
Date: Wed Nov 5 18:02:29 2003
Subject: [Biojava-l] Nearing biojava 1.3.1 release
Message-ID:

Hi -

There have been a lot of calls for a biojava 1.3.1 maintenance release.
The good news is I'm just about ready. I just need to sort out some
merging of the DP code from the biojava-live version.

Before I put it out I would really like to get something resolved.
Currently DNATools.a() == RNATools.a(). This is due to the way that the
Symbols are declared in AlphabetManager.xml. I personally think this is
counterintuitive and wrong. It is very easily fixable (if it needs
fixing). Unless anybody disagrees I would like to make it so they are
not canonical. Note that this has implications for the NUCLEOTIDE
alphabet, which might mean such a move is not desirable, so please speak
now or forever hold your peace.

There have been a lot of bug fixes and minimal API breakage; there was a
little bit, although it is unlikely to be noticed by most people. I
don't think I will be putting out a version for Java1.3 either.

- Mark

From vc100 at doc.ic.ac.uk Thu Nov 6 00:53:12 2003
From: vc100 at doc.ic.ac.uk (Vasa Curcin)
Date: Thu Nov 6 00:49:14 2003
Subject: [Biojava-l] NCBI database
Message-ID: <3FA9E1C8.80607@doc.ic.ac.uk>

Hello,

It seems that the NCBI class is not working anymore. I am using it to
retrieve some annotated sequences, and since this morning it is
returning a MalformedURLException. It seems like the web interface to
NCBI changed. Does anyone know something about this?

Regards,
Vasa

From mark.schreiber at agresearch.co.nz Thu Nov 6 19:47:41 2003
From: mark.schreiber at agresearch.co.nz (Schreiber, Mark)
Date: Thu Nov 6 19:44:58 2003
Subject: [Biojava-l] NCBI database
Message-ID:

Hi -

Just noticed that the NCBI Entrez page has a new look and is at the URL
http://www.ncbi.nih.gov/gquery/gquery.fcgi

I'm not sure if this is the problem (possibly a new URL). I'll check it
out.

- Mark

> -----Original Message-----
> From: Vasa Curcin [mailto:vc100@doc.ic.ac.uk]
> Sent: Thursday, 6 November 2003 6:53 p.m.
> To: biojava-l@biojava.org
> Subject: [Biojava-l] NCBI database
>
>
> Hello,
>
> It seems that the NCBI class is not working anymore. I am using it to
> retrieve some annotated sequences, and since this morning it is
> returning a MalformedURLException. It seems like the web interface to
> NCBI changed. Anyone knows something about this?
>
> Regards,
> Vasa

From mark.schreiber at agresearch.co.nz Mon Nov 10 16:37:20 2003
From: mark.schreiber at agresearch.co.nz (Schreiber, Mark)
Date: Mon Nov 10 16:34:04 2003
Subject: [Biojava-l] Phrap ace format
Message-ID:

Hi -

Has anyone ever made a biojava parser for the phrap ace format?

- Mark

From dumontier at mshri.on.ca Mon Nov 10 22:39:07 2003
From: dumontier at mshri.on.ca (Marc Dumontier)
Date: Mon Nov 10 22:42:05 2003
Subject: [Biojava-l] BLAST through servlets
Message-ID: <490D0AFAF3D2D3119F6C00508B6FDF1501FA465A@ex.mshri.on.ca>

Hi,

I was wondering if anyone has implemented a blast interface using
servlets, and maybe applied a stylesheet to the XML output. If anyone
has already done this and can share some source code, that would be
greatly appreciated.
thanks,

Marc Dumontier
Bioinformatics Software Developer
Blueprint Initiative
Mount Sinai Hospital
http://www.blueprint.org
(416)586-8505 x6311

From verhoeff2 at gis.a-star.edu.sg Tue Nov 11 02:52:45 2003
From: verhoeff2 at gis.a-star.edu.sg (VERHOEF Frans)
Date: Tue Nov 11 08:17:17 2003
Subject: [Biojava-l] BLAST parsing explodes in size
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D560B041C@BIONIC.biopolis.one-north.com>

Hi,

I am having a problem parsing huge blast results. Basically I am parsing
the blast results pretty much the same way as in "Biojava in Anger",
with the only difference being that I use the setModeLazy() of the
BlastLikeSAXParser, since I am using NCBI Blast version 2.2.4 and that
version is not recognised by the parser yet. Besides that, the only
difference lies in the things I do with the data.

The problem is that when I parse a blast result that is a few hundred
MB, for example 300MB, the java application balloons up to around 1.6GB
of memory. Sometimes the application even crashes because I have only
got 2GB to play with.

Does anyone know what's causing this? Is it because I set the lazy mode?
Is there any way to work around it?

Kind regards,

Frans Verhoef
Bioinformatics Specialist
Genome Institute of Singapore
Genome, #02-01,
60 Biopolis Street,
Singapore 138672
Tel: +65 6478 8000
DID: +65 6478 8060
HP: +65 9848 4325
Email: verhoeff2@gis.a-star.edu.sg

From kdj at sanger.ac.uk Tue Nov 11 11:21:52 2003
From: kdj at sanger.ac.uk (Keith James)
Date: Tue Nov 11 11:21:53 2003
Subject: [Biojava-l] BLAST parsing explodes in size
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D560B041C@BIONIC.biopolis.one-north.com>
References: <6D9E9B9DF347EF4385F6271C64FB8D560B041C@BIONIC.biopolis.one-north.com>
Message-ID:

>>>>> "FV" == VERHOEF Frans writes:

    FV> Hi, I am having a problem parsing huge blast
    FV> results.
    FV> Basically I am parsing the blast results pretty much
    FV> the same way as in "Biojava in Anger", with the only difference
    FV> being that I use the setModeLazy() of the BlastLikeSAXParser,
    FV> since I am using NCBI Blast version 2.2.4 and that version is
    FV> not recognised by the parser yet.

Using blast 2.2.4 or 2.2.6 is safe in lazy mode - diffs show only minor
whitespace changes in the format.

    FV> Besides that the only difference lies in the things I do with
    FV> the data.

This is likely to be the cause of the problem. See below.

    FV> The problem is that when I parse a blast result that is a few
    FV> hundred MB, for example 300MB, the java application is
    FV> ballooning up to around 1.6GB of memory. Sometimes the
    FV> application even crashes because I only have got 2GB to play
    FV> with.

The parser uses an event-driven framework which is designed to handle
very big data - it will handle multi-GB reports. However, if you create
many fine-grained objects for every element of every report you will
quickly run out of resources.

    FV> Does anyone know what's causing this? Is it because I set the
    FV> lazy mode? Is there any way to work around it?

Either you need to think about which elements of the report you are
interested in and build a filter which captures those events, discarding
the rest (see the demos/ssbind package for an example by Matthew), or,
if you really need all those objects, you should look at allowing them
to be garbage-collected as soon as possible.

It is possible that there is a bug somewhere, but without seeing any
code it isn't possible to say much more. If you need more help, post a
short (working) piece of code illustrating the problem and we will do
our best.
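
[Editorial sketch] The event-filtering approach described above can be
sketched with the plain JDK SAX classes. This is an illustration under
stated assumptions, not the demos/ssbind code and not the BioJava
handler chain: the class name HitIdCollector, the "Hit" element and its
"id" attribute are hypothetical, and a real report would arrive through
BlastLikeSAXParser rather than from an XML string. The point is only
that the handler keeps the few values it cares about and lets every
other event pass without allocating objects for it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records one attribute per "Hit" element and
// ignores all other SAX events.
public class HitIdCollector extends DefaultHandler {
    private final List<String> ids = new ArrayList<String>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        // Only the events of interest create any lasting object.
        if ("Hit".equals(qName)) {
            ids.add(atts.getValue("id"));
        }
    }

    public List<String> getIds() {
        return ids;
    }

    // Parse an XML document held in a string and return the hit ids.
    public static List<String> collect(String xml) {
        try {
            HitIdCollector handler = new HitIdCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.getIds();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up document standing in for a stream of report events.
        String xml = "<Report><Hit id=\"a\"><HSP score=\"10\"/></Hit>"
                + "<Hit id=\"b\"/></Report>";
        System.out.println(collect(xml));
    }
}
```

Memory use then grows with the number of values kept, not with the size
of the report being streamed through the handler.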
hth

Keith

--

- Keith James Microarray Facility, Team 65 -
- The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK -

From ralf.sigmund at ipk-gatersleben.de Tue Nov 11 14:37:22 2003
From: ralf.sigmund at ipk-gatersleben.de (Ralf Sigmund)
Date: Tue Nov 11 14:33:53 2003
Subject: [Biojava-l] Job / Task Scheduler for Biojava (Webservice)
Message-ID: <000401c3a88b$3a139bf0$7a8a5ec2@PGRC22>

I have been investigating solutions to accurately describe and execute
bioinformatics analysis tasks.

I am interested in an analysis platform which offers the following
possibilities for protocol-based planning, execution and
result-reporting of multistep processes.

Example: find syntenic regions by comparison of ESTs & marker positions
from a species lacking a genomic map with the completed genome map of
another species.

This task will be achieved by a sequence of subtasks. Each subtask will
possess several degrees of freedom:
- filtering / choice of input data
- the sanitization of data
- the choice of the algorithm
- setting of multiple parameters, thresholds

A framework could support tasks like this in several ways:
--allow unambiguous definition of the steps in a storable format, which
  could be exchanged with other scientists and allow them to reproduce
  the experiment
--protocol of valuable execution parameters like the actually used
  dataset versions, start-, end-time points
--allow for annotation / documentation of intermediate steps and the
  presentation of these results in a repository in order to facilitate
  their reuse in additional in silico experiments (possibly done by
  different experimenters)
--allow for concurrent execution by several experimenters, optimizing
  the utilization of computing resources.
--allow for scheduled reiteration of experiments after source-database
  updates

Starting with L. Stein's commentary in Nature "Creating a Bioinformatics
Nation" and by reading the available material on the OmniGene Project,
one might have guessed that Java would be an ideal platform for a new
generation of data- and task-integrating middleware software.

However, the OmniGene effort has been transferred into the non-public
corporate space, and even before that there was no widespread adoption
of this platform (judged by the sourceforge traffic, the lack of
citations..)

Recently I discovered the BioPipe project and its accompanying
publication in Genome Research. The project is mature, tightly
integrated with Bioperl and almost completely fulfills the above stated
requirements.

However, BioPipe is based on Perl and now I wonder if Java would not be
more advantageous as a platform of this kind.

I will try to list the advantages of Java and Perl in this application
below and hope for your comments:

(1) Compared to Perl, Java has advanced object orientation support which
allows for more transparent and modular architectures. Development tools
like Eclipse/Omondo-UML even increase this advantage.

(2) Component Transaction Monitors like the application server JBoss
(j2ee, ejb) are an ideal platform for the management of multiple user /
multiple task scenarios. The j2ee technology is successfully used in
many similar applications in other industries. Advanced client
applications could really benefit from Object Remoting provided by the
J2ee platform.

(3) Based on my limited knowledge, the Java platform appears to have a
much tighter (more failsafe?) incorporation of XML (XML-Schema - class
binding with JAXB) and web service technologies (SOAP) (Apache
Tomcat/AXIS).

(4) There are several workflow design and management tools, even with
graphic editors. Integration of these j2ee based projects might allow
big advantages to this part.
I see 2 major disadvantages for Java:

(1) bioinformatics tools are typically command line tools. Perl on the
Unix platform is the best way to invoke such tools from a program.
Java's platform independence appears to be the source of its weakness in
this field.

(2) the bioperl project has a far bigger codebase, and more contributors
than any Java bioinformatics efforts like Biojava and Omnigene.

I wonder if Java will ever become a significant technology for public /
open source bioinformatics projects? It seems like the head start that
existing Perl-based projects now have outweighs any advantages the Java
technology offers.

Thanks for your comments on these ideas...

Regards
Ralf

---------------------------------
Dr. Ralf Sigmund
Institut für Pflanzengenetik und Kulturpflanzenforschung (IPK)
Corrensstraße 3
D-06466 Gatersleben
---------------------------------
Tel: +49/(0)39482/5-659
Fax: +49/(0)39482/5-595
mailto:ralf.sigmund@ipk-gatersleben.de

From mark.schreiber at agresearch.co.nz Tue Nov 11 16:46:00 2003
From: mark.schreiber at agresearch.co.nz (Schreiber, Mark)
Date: Tue Nov 11 16:42:43 2003
Subject: [Biojava-l] Job / Task Scheduler for Biojava (Webservice)
Message-ID:

Hi Ralf,

I think this is an interesting proposal. I definitely think if you want
to do this properly you would need to back it with J2EE technology and
blend in some biojava where appropriate. We have been doing quite a bit
of work recently to make biojava more able to play with j2ee, esp. on
the serialization side of things. These updates will be available in
biojava1.3.1 which will be out soon.

I've made some more comments below.

> Starting with L.Stein's commentary in Nature "Creating a
> Bioinformatics Nation" and by reading the available material
> on the OmniGene Project one might have guessed that Java
> would be an ideal Platform for a new generation of data and
> task integrating Middleware Software.
>

I think you're right.
You would need something like j2ee to make it bulletproof if you
envision multiple transactions with multiple clients, especially if any
of them have write access to your data. This is probably beyond Perl.
You could use .NET but then you are tied to one OS and you won't be able
to easily use bioperl or biojava.

> However the OmniGene effort has been transferred into the
> non/public corporate space and even before there was no
> widespread adoption of this platform (judged by the
> sourceforge traffic, the lack of citations..)
>

Are you sure it's no longer open source? I'm surprised.

> Recently I discovered the BioPipe project and its
> accompanying publication in Genome Research. The project is
> mature, tightly integrated with Bioperl and almost completely
> fulfills the above stated requirements.
>
> However BioPipe is based on Perl and now I wonder if Java
> would not be more advantageous as a platform of this kind.
>

BioPipe is a protocol definition. The core engine is written in Perl/
BioPerl. It may be possible to write a BioPipe engine in Java, although
I've thought about this and I wonder if the BioPipe schema may be a bit
Perl-centric. Even so, if you do make an enterprise bioinformatics
system based on Java then a worthy goal would be making a module that
can process and execute BioPipe protocols.

> I will try to list the advantages of JAVA and Perl in this
> application below and hope for your comments:
>
> (1)Compared to Perl Java has advanced Object Orientation
> support which allows for more transparent and modular
> architectures. Development tools like Eclipse/Omodo-UML even
> increase this advantage.

True, if you do it right.

> (2)Component Transaction Monitors like the Application Server JBOSS
> (j2ee,ejb) are an ideal platform for the Management of
> multiple user / multiple task scenarios. The j2ee-technology
> is successfully used in many similar applications in other
> industries.
> Advanced client applications could really benefit
> from Object Remoting provided by the J2ee Platform.

Very true. I think to do it any other way would be to reinvent the wheel
and cause several major headaches. This would be the strongest argument
for using Java.

> (3)Based
> on my limited knowledge the Java Platform appears to have a
> much tighter (more failsafe?) incorporation of XML
> (XML-Schema - class binding with JAXB) and Webservice
> Technologies (SOAP) (Apache Tomcat/AXIS).

Also true; unfortunately the code gets a bit bloated. Compare an Axis
SOAP application to a Perl or Python one. Fortunately a lot of this code
is boilerplate stuff that is easily autogenerated and doesn't need much
maintaining.

> (4) There are several workflow design and management tools
> even with graphic editors. Integration of this j2ee based
> projects might allow big advantages to this part.
>

I don't have much experience here so can't comment.

> I see 2 major disadvantages for Java:
> (1) bioinformatics tools are typically command line tools.
> The Perl on Unix platform is the best way to invoke such
> tools from a program. Java's platform independence appears to
> be the source for its weakness in this field.

True, but BioJava has introduced org.biojava.utils.ExecRunner classes to
execute other applications, which seem to perform very well. Currently
it's in biojava-live. I think it should be able to be transferred to
biojava 1.3.1 though.

> (2) the bioperl project has a far bigger codebase, and more
> contributors than any JAVA Bioinformatics efforts like
> Biojava and Omnigene.
>

True, biojava is growing though.

> I wonder if Java will ever become a significant technology
> for public / open source bioinformatics projects? It seems
> like the existing headstart perl based projects now have
> outweighs any advantages the Java Technology offers.
>

Who knows. Almost everyone who comes through a university computer or
bioinformatics program will be taught Java and possibly Perl.
Java is much more attractive for industry and there have been some
useful additions to biojava from industry sources. Perl has had the
advantage of being a text processing language that lends itself to
bioinformatics. I'm in awe of the people who use perl for large scale
projects. Seems like a nightmare to me.

- Mark

From zren at amylin.com Tue Nov 11 17:49:54 2003
From: zren at amylin.com (Ren, Zhen)
Date: Tue Nov 11 17:58:33 2003
Subject: [Biojava-l] building javadocs failed
Message-ID:

Hi,

I followed the instructions at http://cvs.biojava.org/ and successfully
downloaded code from the CVS repositories and later built the JAR files.
However, I failed to build javadocs by typing "ant javadocs" at the DOS
command prompt. Here is the error message:

C:\Program Files\biojava-live>ant javadocs
Buildfile: build.xml

BUILD FAILED
Target `javadocs' does not exist in this project.

Total time: 4 seconds

Thank you for your suggestion.
Zhen

From david.huen at ntlworld.com Tue Nov 11 18:18:26 2003
From: david.huen at ntlworld.com (David Huen)
Date: Tue Nov 11 18:14:56 2003
Subject: [Biojava-l] building javadocs failed
In-Reply-To:
References:
Message-ID: <200311112318.26134.david.huen@ntlworld.com>

On Tuesday 11 Nov 2003 10:49 pm, Ren, Zhen wrote:
> Hi,
>
> I followed the instruction at http://cvs.biojava.org/ and successfully
> downloaded code from the CVS repositories and later built the JAR files.
> However, I failed to build javadocs by typing "ant javadocs" at the DOS
> command prompt. Here is the error message:
>
> C:\Program Files\biojava-live>ant javadocs
> Buildfile: build.xml
>

I believe that 'ant javadocs-biojava' is what works now. There are
separate docs for various other items like grammars, etc.

Regards,
David

From zren at amylin.com Tue Nov 11 18:24:36 2003
From: zren at amylin.com (Ren, Zhen)
Date: Tue Nov 11 18:21:07 2003
Subject: [Biojava-l] building javadocs failed
Message-ID:

Sorry to bug you again. It still does not seem to be working. Error
message:

C:\Program Files\biojava-live>
C:\Program Files\biojava-live>ant javadocs-biojava
Buildfile: build.xml

init:
     [echo] JUnit present: true
     [echo] JUnit supported by Ant: true
     [echo] SableCC supported by Ant: true

prepare:

prepare-biojava:

prepare-taglets:

compile-taglets:
    [javac] Compiling 3 source files to C:\Program Files\biojava-live\ant-build\classes\taglets
    [javac] C:\Program Files\biojava-live\ant-build\src\taglets\Useage.java:81: cannot resolve symbol
    [javac] symbol  : method holder ()
    [javac] location: interface com.sun.javadoc.Tag
    [javac]     sb.append(((ClassDoc) tags[0].holder()).qualifiedTypeName());
    [javac]                                  ^
    [javac] 1 error

BUILD FAILED
file:C:/Program Files/biojava-live/build.xml:421: Compile failed; see the compiler error output for details.

Total time: 6 seconds

Thanks.
Zhen

From verhoeff2 at gis.a-star.edu.sg Wed Nov 12 04:37:22 2003
From: verhoeff2 at gis.a-star.edu.sg (VERHOEF Frans)
Date: Wed Nov 12 04:36:28 2003
Subject: [Biojava-l] BLAST parsing explodes in size
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D560B041F@BIONIC.biopolis.one-north.com>

Hi Keith,

Thanks for your response. I did paste the method that's doing the parsing somewhere below. I also ran just now this method trying to parse a blast output file with a size of approximately 350mb. The output generated is this:

Before parsing: 402280
After parsing: 1043162496

With the number indicating the memory size of java in bytes. That means that during the parsing (all biojava) the size explodes from a mere 402kb to 1gb. After that the size doesn't do much anymore.
For your information, I am using the following:
- NCBI Blast 2.2.4
- Java 1.4.2_01
- Linux
- Biojava from cvs, last updated at 21st of October

Hopefully you will now tell me I am doing something stupid ;-)

private void parseBlastOutput(File file) throws Exception{
    Runtime r = Runtime.getRuntime();
    System.out.println("Before parsing: " + (r.totalMemory()-r.freeMemory()));
    InputStream is = new FileInputStream(file);
    BlastLikeSAXParser parser = new BlastLikeSAXParser();
    parser.setModeLazy();
    SeqSimilarityAdapter adapter = new SeqSimilarityAdapter();
    parser.setContentHandler(adapter);
    List results = new ArrayList();
    SearchContentHandler builder = new BlastLikeSearchBuilder(results,
        new DummySequenceDB("queries"), new DummySequenceDBInstallation());
    adapter.setSearchContentHandler(builder);
    parser.parse(new InputSource(is));

    for (Iterator i = results.iterator(); i.hasNext(); ){
        System.out.println("Iterating: " + (r.totalMemory()-r.freeMemory()));
        SeqSimilaritySearchResult result = (SeqSimilaritySearchResult)i.next();

        org.biojava.bio.Annotation anno = result.getAnnotation();
        String queryID = (String)anno.getProperty("queryId");
        String database = this.parseNameFromDBPath((String)anno.getProperty("databaseId"));
        String lib = this.parseIDForLibrary(queryID);
        BlastSetting bsetting = null;
        if (lib!=null && database!=null) bsetting = adaptor.fetchSetting(lib, database);
        if (lib == null || database == null || bsetting == null){
            //means no blast setting can be found for this library and database
            System.out.println("HELP!!!!!");
            throw new Exception("Cannot find Blast Setting in database for library " + lib + " and blastdatabase " + database);
        }

        File outFile = new File(destDir, queryID + ".out");
        BufferedWriter out = new BufferedWriter(new FileWriter(outFile));
        out.write("queryID\tqueryStart\tqueryEnd\tdatabase\tsubjectID\tsubjectStart\tsubjectEnd\tscore\teValue\tDescription\n");
        List hits = result.getHits();
        //System.out.println("Start writing with " + hits.size() + " hits.");
        for (int j=0; j<hits.size(); j++){
            SeqSimilaritySearchHit hit = (SeqSimilaritySearchHit)hits.get(j);
            if (hit.getEValue() > bsetting.getMaxEValue()){
                break;
            }
            //System.out.println("HIT!!!");
            org.biojava.bio.Annotation hitAnno = hit.getAnnotation();
            String description = hitAnno.containsProperty("subjectDescription") ?
                (String)hitAnno.getProperty("subjectDescription") : "No Description";

            out.write(queryID + "\t");
            out.write(hit.getQueryStart() + "\t");
            out.write(hit.getQueryEnd() + "\t");
            out.write(database + "\t");
            out.write(hit.getSubjectID() + "\t");
            out.write(hit.getSubjectStart() + "\t");
            out.write(hit.getSubjectEnd() + "\t");
            out.write(hit.getScore() + "\t");
            out.write(hit.getEValue() + "\t");
            out.write(description + "\n");
            out.flush();
            hitAnno = null; description = null; hit = null;
            System.gc();
        }
        out.close();
        hits = null; out = null; outFile = null; bsetting = null; lib = null;
        database = null; queryID = null; anno = null; result = null;
        System.gc();
    }

    file.delete();
}

> -----Original Message-----
> From: Keith James [mailto:kdj@sanger.ac.uk]
> Sent: Wednesday, November 12, 2003 12:25 AM
> To: VERHOEF Frans
> Cc: biojava-l@biojava.org
> Subject: Re: [Biojava-l] BLAST parsing explodes in size
>
> >>>>> "FV" == VERHOEF Frans writes:
>
> FV> Hi, I am having a problem parsing huge blast
> FV> results. Basically I am parsing the blast results pretty much
> FV> the same way as in "Biojava in Anger", with as only difference
> FV> that I use the setModeLazy() of the BlastLikeSAXParser, since
> FV> I am using NCBI Blast version 2.2.4 and that version is not
> FV> recognised by the parser yet.
>
> Using blast 2.2.4 or 2.2.6 is safe in lazy mode - diffs show only
> minor whitespace changes in the format.
>
> FV> Besides that the only difference lays in the things I do with
> FV> the data.
>
> This is likely to be the cause of the problem. See below.
>
> FV> The problem is that when I parse a blast result that is a few
> FV> hundred MB, for example 300MB, the java application is
> FV> ballooning up to around 1.6GB of memory.
Sometimes the > FV> application even crashes because I only have got 2GB to play > FV> with. > > The parser uses an event driven framework which is designed to handle > very big data - it will handle multi-GB reports. However, if you > create many fine-grained objects for every element of every report you > will quickly run out of resources. > > FV> Does anyone know what's causing this? Is it because I set the > FV> lazy mode? Is there any way to work around it? > > Either you need to think about which elements of the report you are > interested in and build a filter which captures those events, > discarding the rest. See the demos/ssbind package for an example by > Matthew. Or if you really need all those objects then you should look > at allowing them to be garbage-collected as soon as possible. > > It is possible that there is a bug somewhere, but without any seeing > any code it isn't possible to say much more. If you need more help, > post a short (working) piece of code illustrating the problem and we > will do our best. > > hth > > Keith > > -- > > - Keith James Microarray Facility, Team 65 - > - The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK - From matthew_pocock at yahoo.co.uk Wed Nov 12 05:25:36 2003 From: matthew_pocock at yahoo.co.uk (Matthew Pocock) Date: Wed Nov 12 05:30:45 2003 Subject: [Biojava-l] BLAST parsing explodes in size In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D560B041F@BIONIC.biopolis.one-north.com> References: <6D9E9B9DF347EF4385F6271C64FB8D560B041F@BIONIC.biopolis.one-north.com> Message-ID: <3FB20AA0.80509@yahoo.co.uk> Morning, I think the problem is that you are populating the results List with /all/ of the blast data. This means that all the data from the complete report must be in memory in this List. A better approach is to write an object to replace builder in adapter.setSearchContentHandler(builder), which does all the processing as the data streams in from the parser. 
This will keep memory consumption down to the bare minimum. There is some code that does this sort of thing in demos/ssbind, and it may be worth scanning the code for BlastLikeSearchBuilder for ideas.

Best,

Matthew

From matthew_pocock at yahoo.co.uk Wed Nov 12 05:30:21 2003
From: matthew_pocock at yahoo.co.uk (Matthew Pocock)
Date: Wed Nov 12 05:35:21 2003
Subject: [Biojava-l] building javadocs failed
In-Reply-To:
References:
Message-ID: <3FB20BBD.3000901@yahoo.co.uk>

Hi,

I think this is my bad - an incompatibility between the taglets of java 1.4.2 and those before. You can safely nuke this class if it's causing problems.

Matthew

From kdj at sanger.ac.uk Wed Nov 12 05:40:26 2003
From: kdj at sanger.ac.uk (Keith James)
Date: Wed Nov 12 05:40:27 2003
Subject: [Biojava-l] BLAST parsing explodes in size
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D560B041F@BIONIC.biopolis.one-north.com>
References: <6D9E9B9DF347EF4385F6271C64FB8D560B041F@BIONIC.biopolis.one-north.com>
Message-ID:

>>>>> " " == VERHOEF Frans writes:

> Hi Keith, Thanks for your response. I did paste the method
> that's doing the parsing somewhere below. I also ran just now
> this method trying to parse a blast output file with a size of
> approximately 350mb. The output generated is this:

> Before parsing: 402280 After parsing: 1043162496

> With the number indicating the memory size of java in
> bytes. That means that during the parsing (all biojava) the
> size explodes from a mere 402kb to 1gb. After that the size
> doesn't do much anymore.

A report of 350mb is sufficient to generate a lot of objects if you fully represent all hits, HSPs, alignments and annotation. At the top of your method you create a list to contain all your results:

List results = new ArrayList();

and pass it to the builder. Although you make a couple of System.gc() calls further down they are not addressing the cause of the problem - this list is still in scope and objects within it cannot be garbage collected. As the BlastLikeSearchBuilder stores its results in a List in this way, it is not appropriate for your situation. This is the same as choosing whether to parse XML using SAX or DOM - only use DOM if you can afford to have the whole lot in memory at once.

The data you are saving in your output file are taken from a very small subset of the objects being created (so you are not using most of them).
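
The SAX-versus-DOM point can be sketched with nothing but the JDK's own SAX parser. This is only a minimal illustration and is not BioJava code: the <hit> element, its "id" and "evalue" attributes, and the class name StreamingHitFilter are all invented for the example. Each hit is written out (or dropped) as soon as its start tag arrives, so nothing accumulates in a List.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; the XML layout below is made up for illustration.
public class StreamingHitFilter {

    /**
     * Streams <hit> elements from 'in' to 'out' as tab-separated lines,
     * keeping only hits whose evalue is at or below maxEValue.
     * Returns the number of hits written.
     */
    public static int filter(InputSource in, final Writer out, final double maxEValue)
            throws Exception {
        final int[] written = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes atts) throws SAXException {
                if (!"hit".equals(qName)) {
                    return; // ignore every event we are not interested in
                }
                String id = atts.getValue("id");
                String eValue = atts.getValue("evalue");
                if (id == null || eValue == null) {
                    return; // incomplete hit; skip it
                }
                if (Double.parseDouble(eValue) > maxEValue) {
                    return; // filtered out; no object is retained
                }
                try {
                    // Write immediately instead of collecting results in memory.
                    out.write(id + "\t" + eValue + "\n");
                    written[0]++;
                } catch (IOException e) {
                    throw new SAXException(e);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        out.flush();
        return written[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<report>"
                + "<hit id=\"A\" evalue=\"1e-30\"/>"
                + "<hit id=\"B\" evalue=\"0.5\"/>"
                + "<hit id=\"C\" evalue=\"1e-5\"/>"
                + "</report>";
        StringWriter out = new StringWriter();
        int n = filter(new InputSource(new StringReader(xml)), out, 1e-3);
        System.out.print(out);
        System.out.println(n + " hits written");
    }
}
```

Memory use here depends on the size of one element rather than the size of the report, which is the property the event-driven approach is meant to give; a DOM-style parse of the same input would hold every hit at once.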
You need to extend the event-driven way of handling the data from the SAXContentHandler right through the SearchContentHandler and up to the point where you write to your file. Don't collect everything as objects before you write. There is a working example in demos/ssbind (ProcessBlastReport) of using this event and filtering approach. Keith -- - Keith James Microarray Facility, Team 65 - - The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK - From tmo at ebi.ac.uk Wed Nov 12 06:58:27 2003 From: tmo at ebi.ac.uk (Tom Oinn) Date: Wed Nov 12 06:50:01 2003 Subject: [Biojava-l] Job / Task Scheduler for Biojava (Webservice) References: Message-ID: <3FB22063.90601@ebi.ac.uk> Hi Ralf, Mark, all... Are you aware of our project, Taverna? We're working with various groups, mostly up at Newcastle and Manchester but also here at the EBI to provide workflow based technology for bioinformatics. Specifically, we have a system, Soaplab, that can wrap arbitrary command line tools as services (currently applies to all the EMBOSS tools but we can easily extend it), courtesy of Martin Senger's work here, and the Taverna project itself which allows users to create workflows out of both Soaplab's services and arbitrary SOAP based web services (we could add Corba, OGSA, RMI etc if needed). It's open source (LGPL) and on sourceforge (taverna.sf.net), and is in use 'in anger' in several complex bioinformatics analysis projects. May I humbly request that you take a look before writing something similar, and if possible join our development effort? Matthew - you're both on this list and working up at Newcastle, does this seem reasonable? I'll be up in a few weeks to talk to the biologists, perhaps we could get together over a drink or several and see how Taverna and Biojava could play together? 
Cheers, Tom From matthew_pocock at yahoo.co.uk Wed Nov 12 08:18:57 2003 From: matthew_pocock at yahoo.co.uk (Matthew Pocock) Date: Wed Nov 12 08:23:53 2003 Subject: [Biojava-l] Job / Task Scheduler for Biojava (Webservice) In-Reply-To: <3FB22063.90601@ebi.ac.uk> References: <3FB22063.90601@ebi.ac.uk> Message-ID: <3FB23341.3090203@yahoo.co.uk> Tom Oinn wrote: > Matthew - you're both on this list and working up at Newcastle, does > this seem reasonable? Yes. Very. > I'll be up in a few weeks to talk to the biologists, perhaps we could > get together over a drink or several and see how Taverna and Biojava > could play together? We should sort something out. On a related note, I'm currently writing AXIS web services for biojava sequence & feature objects, which should reduce the overhead of this kind of thing a bit. Matthew > > > Cheers, > > Tom > > _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l > From tmo at ebi.ac.uk Wed Nov 12 08:48:02 2003 From: tmo at ebi.ac.uk (Tom Oinn) Date: Wed Nov 12 08:39:30 2003 Subject: [Biojava-l] Job / Task Scheduler for Biojava (Webservice) References: <3FB22063.90601@ebi.ac.uk> <3FB23341.3090203@yahoo.co.uk> Message-ID: <3FB23A12.7050308@ebi.ac.uk> Matthew Pocock wrote: > Tom Oinn wrote: > >> Matthew - you're both on this list and working up at Newcastle, does >> this seem reasonable? > > Yes. Very. Let's hope the BBSRC agree :) >> I'll be up in a few weeks to talk to the biologists, perhaps we could >> get together over a drink or several and see how Taverna and Biojava >> could play together? > > > We should sort something out. On a related note, I'm currently writing > AXIS web services for biojava sequence & feature objects, which should > reduce the overhead of this kind of thing a bit. 
Fantastic, we're also very interested in service interfaces to the DAS systems (working with the EnsEMBL guys next door from time to time on that one). We have some constraints on what kinds of service we can consume, basically it boils down to 'don't use complex types in axis', but there are some exceptions (collection types are fine). I'm assuming you've followed my various rants on the axis user list as to exactly why, but we've fallen into the pattern of passing XML documents around as strings, so our toolkit doesn't need to know anything about the data at that level and yet we retain the structured information where possible. We believe there is no good reason why a web service toolkit should comprehend the structure of the sequence object, for example, flowing through it.

Biojava people - please download Taverna and have a play with it, the 'windows' build is not particularly well named, it's actually all java, you'll just need to have 'dot' from graphviz installed and on your path and everything will work. We'll be releasing beta7 for macosX and hopefully both redhat and debian as well, just to make things a little more convenient. We have a user mailing list which might be worth subscribing to if this is of any interest, links from taverna.sf.net

Cheers,

Tom

From verhoeff2 at gis.a-star.edu.sg Wed Nov 12 20:08:37 2003
From: verhoeff2 at gis.a-star.edu.sg (VERHOEF Frans)
Date: Wed Nov 12 20:07:33 2003
Subject: [Biojava-l] BLAST parsing explodes in size
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D560B0420@BIONIC.biopolis.one-north.com>

Hi,

Thanks for the suggestions. I am quite new to the world of Biojava and basically what I did was copy the example in Biojava in Anger and adapt it to my needs. It seems I now have to adapt it a little more.

One more question. If the blast output is already in XML, how would you go about it in Biojava?
Kind regards,

Frans

From verhoeff2 at gis.a-star.edu.sg Thu Nov 13 03:39:08 2003
From: verhoeff2 at gis.a-star.edu.sg (VERHOEF Frans)
Date: Thu Nov 13 03:38:04 2003
Subject: [Biojava-l] BLAST parsing explodes in size
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D560B0421@BIONIC.biopolis.one-north.com>

Thank you guys! I now have a great looking solution which is lean and fast. I am definitely a biojava fan.

Regards,
Frans

From vc100 at doc.ic.ac.uk Tue Nov 18 05:27:53 2003
From: vc100 at doc.ic.ac.uk (Vasa Curcin)
Date: Tue Nov 18 05:23:40 2003
Subject: [Biojava-l] Serialization of SequenceDB obtained from Swiss-Prot
Message-ID: <3FB9F429.8030404@doc.ic.ac.uk>

Hello,

There seems to be some problem with serializing SequenceDB objects obtained from SwissProtDatabase. The error is:

java.io.WriteAbortedException: writing aborted; java.io.NotSerializableException: org.biojava.bio.seq.io.SequenceBuilderBase$TemplateWithChildren
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1278)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:324)
at java.util.HashSet.readObject(HashSet.java:272)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.

I am using Biojava 1.30, with Mark's patches from a few weeks back. Does anyone have an idea?

Regards,
Vasa

From mark.schreiber at agresearch.co.nz Tue Nov 18 15:14:48 2003
From: mark.schreiber at agresearch.co.nz (Schreiber, Mark)
Date: Tue Nov 18 15:11:39 2003
Subject: [Biojava-l] Serialization of SequenceDB obtained from Swiss-Prot
Message-ID:

Hi -

I'm not sure serializing an entire SequenceDB is a good idea. However, can you tell me if the serialization is failing on the DB or on one of the sequences in it?

- Mark

From phirnee123 at yahoo.com Wed Nov 19 09:10:14 2003
From: phirnee123 at yahoo.com (bharani kumar)
Date: Wed Nov 19 09:06:27 2003
Subject: [Biojava-l] data cleansing scoring functions
Message-ID: <20031119141014.64425.qmail@web13405.mail.yahoo.com>

Hello everybody,

We are building protein docking software and I would like your suggestions. We take into account 20 scoring functions, such as hydrophobicity, and then combine all of them to get an optimised total, which we plot against the RMSD of the various conformations produced by orienting one protein over the other rotationally and translationally.

My question is: are all 20 of these scoring functions equally important? Certainly not. So the data has to be cleansed, and I hope we would finally be left with a limited number of scoring functions, say 12 or 13. What would be the best way to clean the data (the scoring functions)? One of my supervisors suggested that it could be done in MATLAB by applying PCA. I would appreciate your suggestions on this.

=====
***********************************************************************************
"The secret of success is to know something nobody else knows."
BHARANI KUMAR.P.S
CUBIC, UNIVERSITÄT ZU KÖLN,
Zülpicher Str. 47
50674 Köln
Germany
Fon +49 221 7212018, +49 176 21000597
phirnee123@yahoo.com
************************************************************

__________________________________
Do you Yahoo!? Protect your identity with Yahoo!
Mail AddressGuard http://antispam.yahoo.com/whatsnewfree

From vc100 at doc.ic.ac.uk Wed Nov 19 12:46:59 2003
From: vc100 at doc.ic.ac.uk (Vasa Curcin)
Date: Wed Nov 19 12:43:05 2003
Subject: [Biojava-l] Serialization of SequenceDB obtained from Swiss-Prot
References:
Message-ID: <3FBBAC93.4020004@doc.ic.ac.uk>

Hi,

I am still investigating when exactly the problem with writing out the object occurs, but this may be related. Here, I am returning matches from a SwissProt search from the server to the client. The object is a SequenceDB obtained from SwissProt and the entry has the following line:

FT INIT_MET 0 0 BY SIMILARITY.

This is the exact exception:

17:32:29,654 ERROR [STDERR] got data from http://us.expasy.org/cgi-bin/get-sprot-raw.pl?143B_MOUSE
17:32:30,605 ERROR [STDERR] java.lang.IllegalArgumentException: Location 0 is outside 1..245
at org.biojava.bio.seq.impl.SimpleFeature.<init>(SimpleFeature.java:306)
at sun.reflect.GeneratedConstructorAccessor85.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
at org.biojava.bio.seq.SimpleFeatureRealizer$TemplateImpl.realize(SimpleFeatureRealizer.java:138)
rethrown as org.biojava.bio.BioException: Couldn't realize feature
at org.biojava.bio.seq.SimpleFeatureRealizer$TemplateImpl.realize(SimpleFeatureRealizer.java:144)
at org.biojava.bio.seq.SimpleFeatureRealizer.realizeFeature(SimpleFeatureRealizer.java:94)
at org.biojava.bio.seq.impl.SimpleSequence.realizeFeature(SimpleSequence.java:198)
at org.biojava.bio.seq.impl.SimpleSequence.createFeature(SimpleSequence.java:204)
at org.biojava.bio.seq.io.SequenceBuilderBase.makeSequence(SequenceBuilderBase.java:168)
at org.biojava.bio.seq.io.SmartSequenceBuilder.makeSequence(SmartSequenceBuilder.java:87)
at org.biojava.bio.seq.io.SequenceBuilderFilter.makeSequence(SequenceBuilderFilter.java:98)
at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:101)
at org.biojava.bio.seq.db.SwissprotSequenceDB.getSequence(SwissprotSequenceDB.java:93)

Is this 0, 0 location common in Swiss-Prot entries? It seems the serialization is failing only on those entries which have this feature.

Regards,
Vasa

From matthew_pocock at yahoo.co.uk Wed Nov 19 13:30:39 2003
From: matthew_pocock at yahoo.co.uk (Matthew Pocock)
Date: Wed Nov 19 13:37:36 2003
Subject: [Biojava-l] Serialization of SequenceDB obtained from Swiss-Prot
In-Reply-To: <3FBBAC93.4020004@doc.ic.ac.uk>
References: <3FBBAC93.4020004@doc.ic.ac.uk>
Message-ID: <3FBBB6CF.7090101@yahoo.co.uk>

I think the sp parser needs to special-case this - the location is meant to have the semantics of being 'before' the sequence starts (I think). Could someone brave fix this special case in the SP parser? Perhaps a fuzzy range, with both < 1? Also the sp file writer will need modifying to round-trip this. Grr.

Matthew

Vasa Curcin wrote:

> Hi,
>
> I am still investigating when exactly the problem with writing out the
> object occurs, but this may be related. Here, I am returning matches
> from a SwissProt search from the server to the client. The object is a
> SequenceDB obtained from SwissProt and the entry has the following line:
>
> FT INIT_MET 0 0 BY SIMILARITY.
> > > This is the exact exception: > > 17:32:29,654 ERROR [STDERR] got data from > http://us.expasy.org/cgi-bin/get-sprot > -raw.pl?143B_MOUSE > 17:32:30,605 ERROR [STDERR] java.lang.IllegalArgumentException: > Location 0 is ou > tside 1..245 > at > org.biojava.bio.seq.impl.SimpleFeature.(SimpleFeature.java:306) > > at > sun.reflect.GeneratedConstructorAccessor85.newInstance(Unknown Source > ) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingC > onstructorAccessorImpl.java:27) > at java.lang.reflect.Constructor.newInstance(Constructor.java:274) > at > org.biojava.bio.seq.SimpleFeatureRealizer$TemplateImpl.realize(Simple > FeatureRealizer.java:138) > rethrown as org.biojava.bio.BioException: Couldn't realize feature > at > org.biojava.bio.seq.SimpleFeatureRealizer$TemplateImpl.realize(Simple > FeatureRealizer.java:144) > at > org.biojava.bio.seq.SimpleFeatureRealizer.realizeFeature(SimpleFeatur > eRealizer.java:94) > at > org.biojava.bio.seq.impl.SimpleSequence.realizeFeature(SimpleSequence > .java:198) > at > org.biojava.bio.seq.impl.SimpleSequence.createFeature(SimpleSequence. > java:204) > at > org.biojava.bio.seq.io.SequenceBuilderBase.makeSequence(SequenceBuild > erBase.java:168) > at > org.biojava.bio.seq.io.SmartSequenceBuilder.makeSequence(SmartSequenc > eBuilder.java:87) > at > org.biojava.bio.seq.io.SequenceBuilderFilter.makeSequence(SequenceBui > lderFilter.java:98) > at > org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:10 > 1) > at > org.biojava.bio.seq.db.SwissprotSequenceDB.getSequence(SwissprotSeque > nceDB.java:93) > > Is this 0, 0 location common in Swiss-Prot entries? It seems the > serialization is failing only on those entries which have this feature. > > Regards, > Vasa > > Schreiber, Mark wrote: > >> Hi - >> >> I'm not sure serializing an entire SequenceDB is a good idea, >> however, can you tell me if the serialization is failing on the DB or >> one of the sequences in it? 
>> >> - Mark >> >> >> >> >>> -----Original Message----- >>> From: Vasa Curcin [mailto:vc100@doc.ic.ac.uk] Sent: Tuesday, 18 >>> November 2003 11:28 p.m. >>> To: biojava-l@biojava.org >>> Subject: [Biojava-l] Serialization of SequenceDB obtained from >>> Swiss-Prot >>> >>> >>> Hello, >>> >>> There seems to be some problem with serializing SequenceDB objects >>> obtained from SwissProtDatabase. The error is: >>> >>> java.io.WriteAbortedException: writing aborted; java.io.NotSeria >>> lizableException: >>> org.biojava.bio.seq.io.SequenceBuilderBase$TemplateWithChildre >>> n >>> at >>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1278) >>> at >>> java.io.ObjectInputStream.readObject(ObjectInputStream.java:324) >>> at java.util.HashSet.readObject(HashSet.java:272) >>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>> at >>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl. >>> >>> I am using Biojava 1.30, with Mark's patches from a few weeks back. >>> Anyone has an idea? >>> >>> Regards, >>> Vasa >>> >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l@biojava.org >>> http://biojava.org/mailman/listinfo/biojava-l >>> >>> >> >> ======================================================================= >> Attention: The information contained in this message and/or attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or privileged >> material. Any review, retransmission, dissemination or other use of, or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by AgResearch >> Limited. If you have received this message in error, please notify the >> sender immediately. 
>> ======================================================================= >> >> > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l > From dumontier at mshri.on.ca Wed Nov 19 14:21:50 2003 From: dumontier at mshri.on.ca (Marc Dumontier) Date: Wed Nov 19 14:24:46 2003 Subject: [Biojava-l] blast2html Message-ID: <490D0AFAF3D2D3119F6C00508B6FDF1501FA4669@ex.mshri.on.ca> hi, I'm trying to modify the Blasr2HTML code in org.biojava.bio.program.blast2html to add some links to my blast output. In HTMLRenderer, I'm trying to add a link to each row in my summary. The variable oHitSummary.oHitId.id contains the accession..well something like (ref|NP_011554.1|) , I was wondering if the Blast2HTMLHandler saves GI information, since the link i need to create needs the GI as the argument. Thanks, Marc From bioinformatics4suman at yahoo.com Thu Nov 20 01:12:26 2003 From: bioinformatics4suman at yahoo.com (Suman Kanuganti) Date: Thu Nov 20 01:08:38 2003 Subject: [Biojava-l] MassCalc question. Message-ID: <20031120061226.55042.qmail@web60107.mail.yahoo.com> Ok; I am having problem with using MassCalc class. It always an IllegalSymbolException though the symbol list is correct. I have written this, SequenceIterator iter = MySeqTools.myReadFastaAA(args[0]); while(iter.hasNext()){ Sequence seq = iter.nextSequence(); SymbolList syml = (SymbolList)seq; System.out.println(syml.seqString()); MassCalc mCalc = new MassCalc(SymbolPropertyTable.MONO_MASS, true); double mass = mCalc.getMass(syml); System.out.println("Mass: "+seq.getName()+"\t"+mass); } while result in error Exception in thread "main" org.biojava.bio.symbol.IllegalSymbolException: The SymbolList was not using the protein alphabet Any one can help me with this, Thanks, Suman K ===== Suman K BioInformatics Associate, Genomics Research, Newton Lab, University of Missouri - Columbia. __________________________________ Do you Yahoo!? 
Free Pop-Up Blocker - Get it now http://companion.yahoo.com/ From rh4552000 at yahoo.co.uk Thu Nov 20 04:59:08 2003 From: rh4552000 at yahoo.co.uk (=?iso-8859-1?q?Rich=20Heath?=) Date: Thu Nov 20 04:55:26 2003 Subject: [Biojava-l] File formats Message-ID: <20031120095908.58849.qmail@web25207.mail.ukl.yahoo.com> Hi, I am a software developer based in the UK that has been asked about producing a piece of software that outputs data from the files in ABI sequencers in a more human readable format. I hope the org.biojava.bio.program.abi package will let me do this, but I have some concerns about the legal implications of using and contributing to this package. Does anyone know what the legal position is with regards reverse engineering the Applied Biosystems file format (and any other file formats come to that matter)? I would imagine this file format is the property of Applied Biosystems and they would not like me producing applications that read from it unless I provide them with a sizable licence fee (although I guess I am not reverse engineering it if I just use the above package, just if I contribute to it?). Many thanks in advance for your help, Rich ________________________________________________________________________ Want to chat instantly with your online friends? Get the FREE Yahoo! 
Messenger http://mail.messenger.yahoo.co.uk

From colin.hardman at cambridgeAntibody.com Thu Nov 20 06:49:31 2003
From: colin.hardman at cambridgeAntibody.com (Colin Hardman)
Date: Thu Nov 20 06:45:42 2003
Subject: [Biojava-l] blast2html
References: <490D0AFAF3D2D3119F6C00508B6FDF1501FA4669@ex.mshri.on.ca>
Message-ID: <3FBCAA47.4A6E06B4@cambridgeAntibody.com>

Marc,

As I remember it, the summary line from the blast output is split on white space, with the first token put into hitid. This makes its way into oHitSummary.oHitId.id in Blast2HTML.

If you want to change this then you need to implement your own SummaryLineHelperIF in org.biojava.bio.program.sax, but I don't think you will need to.

As the actual format of this line depends on how, and from what source, you built the blast indexes, the HTMLRenderer delegates the link generation to the DatabaseURLGenerator interface.

If your blast result has a summary line like the following

gi|4557284|ref|NM_000646.1|[4557284] some text description.....

(e.g. from http://www.ncbi.nlm.nih.gov/RefSeq/RSfaq.html) you are going to need to parse it in your own DatabaseURLGenerator. HTMLRenderer gets hold of these using a URLGeneratorFactory to get the list: the first returned in the list is used to create the link in the summary, and the others (if they exist) are added as extra links in the details.

For an example, look at NcbiDatabaseURLGenerator and DefaultURLGeneratorFactory in org.biojava.bio.program.blast2html.

If you write one to parse the above format it might be useful to add it to the repository, or even make it the default.

Hope that helps,
Colin Hardman

Marc Dumontier wrote:
> hi,
>
> I'm trying to modify the Blasr2HTML code in
> org.biojava.bio.program.blast2html to add some links to my blast output.
>
> In HTMLRenderer, I'm trying to add a link to each row in my summary.
> The variable oHitSummary.oHitId.id contains the accession..well something > like (ref|NP_011554.1|) , I was wondering if the Blast2HTMLHandler saves GI > information, since the link i need to create needs the GI as the argument. > > Thanks, > Marc > _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l From matthew_pocock at yahoo.co.uk Thu Nov 20 06:55:00 2003 From: matthew_pocock at yahoo.co.uk (Matthew Pocock) Date: Thu Nov 20 07:02:07 2003 Subject: [Biojava-l] File formats In-Reply-To: <20031120095908.58849.qmail@web25207.mail.ukl.yahoo.com> References: <20031120095908.58849.qmail@web25207.mail.ukl.yahoo.com> Message-ID: <3FBCAB94.1050506@yahoo.co.uk> Hi Rich, We should check this out. This is one of the bizar things about digital IP right now - the data in the abi file is obviosly yours, but potentially you are not alowed to access it in non-blessed ways because the encoding is proprietary. I have a feeling that we would have been in trouble if our code was based upon their serializer/deserializer code (which it is not) due to copyright issues. SW pattents don't work in the EU/UK (yet). Further than that I don't know. Oh, and IANAL. Matthew (goes to speak with someone who may know more) Rich Heath wrote: >Hi, > >I am a software developer based in the UK that has >been asked about producing a piece of software that >outputs data from the files in ABI sequencers in a >more human readable format. I hope the >org.biojava.bio.program.abi package will let me do >this, but I have some concerns about the legal >implications of using and contributing to this >package. > >Does anyone know what the legal position is with >regards reverse engineering the Applied Biosystems >file format (and any other file formats come to that >matter)? 
I would imagine this file format is the >property of Applied Biosystems and they would not like >me producing applications that read from it unless I >provide them with a sizable licence fee (although I >guess I am not reverse engineering it if I just use >the above package, just if I contribute to it?). > >Many thanks in advance for your help, > >Rich > > > >________________________________________________________________________ >Want to chat instantly with your online friends? Get the FREE Yahoo! >Messenger http://mail.messenger.yahoo.co.uk >_______________________________________________ >Biojava-l mailing list - Biojava-l@biojava.org >http://biojava.org/mailman/listinfo/biojava-l > > > From dumontier at mshri.on.ca Thu Nov 20 09:41:07 2003 From: dumontier at mshri.on.ca (Marc Dumontier) Date: Thu Nov 20 09:44:02 2003 Subject: [Biojava-l] blast2html Message-ID: <490D0AFAF3D2D3119F6C00508B6FDF1501FA466B@ex.mshri.on.ca> Hey, Thanks for your input I found the route of less resistance was to add the -I parameter when running blast, which will then include the gi from the original fasta. I then just parse out the GI to include in my link Marc -----Original Message----- From: Colin Hardman To: Marc Dumontier Cc: 'biojava-l@biojava.org' Sent: 11/20/03 6:49 AM Subject: Re: [Biojava-l] blast2html Marc, As I remember it the summary line from the blast output is split on white space with the first token put into hitid - this will make it's way into oHitSummary.oHitId.id in Blast2HTML If you want to change this then you need to implement your own SummaryLineHelperIF in org.biojava.bio.program.sax, but I don't think you will need to. As the actual format of this line depends on how, and from what source, you built the blast indexes the HTMLRenderer delegates the link generation to the DatabaseURLGenerator interface. If your blast result has a summary line line the following gi|4557284|ref|NM_000646.1|[4557284] some text description..... 
eg from http://www.ncbi.nlm.nih.gov/RefSeq/RSfaq.html you are going to need to parse it in your own DatabaseURLGenerator - HTMLRenderer gets hold of these using a URLGeneratorFactory to get the list - ths first returned in the list is used to create the link in the summary, the others ( if they exist ) are added as extra links in the details. for an example look at NcbiDatabaseURLGenerator & DefaultURLGeneratorFactory in org.biojava.bio.program.blast2html If you write one to parse the above format it might be useful to add it to the repository - even make it the default. Hope that helps, Colin Hardman Marc Dumontier wrote: > hi, > > I'm trying to modify the Blasr2HTML code in > org.biojava.bio.program.blast2html to add some links to my blast output. > > In HTMLRenderer, I'm trying to add a link to each row in my summary. > The variable oHitSummary.oHitId.id contains the accession..well something > like (ref|NP_011554.1|) , I was wondering if the Blast2HTMLHandler saves GI > information, since the link i need to create needs the GI as the argument. > > Thanks, > Marc > _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l From rhett-sutphin at uiowa.edu Thu Nov 20 10:05:46 2003 From: rhett-sutphin at uiowa.edu (Rhett Sutphin) Date: Thu Nov 20 09:58:19 2003 Subject: [Biojava-l] File formats In-Reply-To: <3FBCAB94.1050506@yahoo.co.uk> References: <20031120095908.58849.qmail@web25207.mail.ukl.yahoo.com> <3FBCAB94.1050506@yahoo.co.uk> Message-ID: <3FBCD84A.3080803@uiowa.edu> Hi Rich, Matthew, I am also not a lawyer. That in mind, here's my understanding of the topic: First, the org.biojava.bio.program.abi package is based on a paper by Clark Tibbetts (available online: http://www-2.cs.cmu.edu/afs/cs/project/genome/WWW/Papers/clark.html ). 
That paper was published in August 1995 and is a fairly thorough technical description of the ABI 377, including its means of operation, communication protocols, and (of course) data files. I am unaware of any legal action taken against him or Vanderbilt (his apparent employer at the time). And, of course, the paper remains available. Second, the Staden io_lib library can read ABI-formatted chromatograms. I am unaware of any legal action against its makers and it is currently available (and has been for a while). Third (and this is, again, just my understanding of current US law), reverse engineering for interoperability is legal. The only area where this is not true is if the material is (a) copyrighted and (b) protected by an "access control." If these conditions are met, then the material falls under that most unpleasant of IP laws, the DMCA. However, (a) whoever wants to read the ABI files with your software will probably own the copyright to them (if they are even copyrightable -- they might just be lists of facts and hence uncopyrightable in the US); and (b) I don't think a proprietary file format rises to the level of an "access control." An ABI file isn't encrypted -- you just have to know what offsets from which to read the bytes. Rhett Matthew Pocock wrote: > Hi Rich, > > We should check this out. This is one of the bizar things about digital > IP right now - the data in the abi file is obviosly yours, but > potentially you are not alowed to access it in non-blessed ways because > the encoding is proprietary. I have a feeling that we would have been in > trouble if our code was based upon their serializer/deserializer code > (which it is not) due to copyright issues. SW pattents don't work in the > EU/UK (yet). Further than that I don't know. Oh, and IANAL. 
> > Matthew > > (goes to speak with someone who may know more) > > Rich Heath wrote: > >> Hi, >> I am a software developer based in the UK that has >> been asked about producing a piece of software that >> outputs data from the files in ABI sequencers in a >> more human readable format. I hope the >> org.biojava.bio.program.abi package will let me do >> this, but I have some concerns about the legal >> implications of using and contributing to this >> package. >> Does anyone know what the legal position is with >> regards reverse engineering the Applied Biosystems >> file format (and any other file formats come to that >> matter)? I would imagine this file format is the >> property of Applied Biosystems and they would not like >> me producing applications that read from it unless I >> provide them with a sizable licence fee (although I >> guess I am not reverse engineering it if I just use >> the above package, just if I contribute to it?). >> Many thanks in advance for your help, >> Rich >> From dag at sonsorol.org Thu Nov 20 10:54:07 2003 From: dag at sonsorol.org (Chris Dagdigian) Date: Thu Nov 20 10:53:19 2003 Subject: [Biojava-l] Total OBF server shutdown Saturday November 22nd (all day EDT timezone) Message-ID: <3FBCE39F.6080309@sonsorol.org> Hi folks, Apologies for the massive cross-posting. Our CVS, mailing list and web servers are located in a Cambridge, MA USA datacenter belonging to Wyeth Resarch. Genetics Institute (which became part of Wyeth) has supported our signficant internet bandwidth and hosting needs for many years since the earliest versions of our open source efforts. Since I have to do this massive cross-post anyway I figured it was a good time to thank them again in public. The real reason for this message is to announce a 1-day period of significant server downtime. 
The office floor & datacenter in the building where our servers are hosted is going to have a planned electrical shutdown (including emergency and backup power circuits) from 10am - 6pm on Saturday November 22nd. I'll be manually bringing down our servers sometime before the 10am deadline. The time estimate is conservative. In the event that the facilty work takes less time than expected I'll probably take advantage of the window to perform some server upgrades and failed disk replacements. For any questions/concerns or if you notice a server or service that is still not available after the 22nd please contact me directly at 'chris@bioteam.net' or 1-617-877-5498. Regards, Chris From mark.schreiber at agresearch.co.nz Thu Nov 20 16:06:24 2003 From: mark.schreiber at agresearch.co.nz (Schreiber, Mark) Date: Thu Nov 20 16:02:46 2003 Subject: [Biojava-l] MassCalc question. Message-ID: Hi - This is a bug that has been fixed in biojava-live and the 1.3 branch on CVS. It will also be available in the biojava 1.3.1 release which will be out very soon. (as soon as I solve my ftp problems and put it up on the site!) - Mark > -----Original Message----- > From: Suman Kanuganti [mailto:bioinformatics4suman@yahoo.com] > Sent: Thursday, 20 November 2003 7:12 p.m. > To: biojava-l@biojava.org > Subject: [Biojava-l] MassCalc question. > > > Ok; I am having problem with using MassCalc class. It > always an IllegalSymbolException though the symbol > list is correct. 
> I have written this, > > SequenceIterator iter = > MySeqTools.myReadFastaAA(args[0]); > > while(iter.hasNext()){ > Sequence seq = iter.nextSequence(); > SymbolList syml = (SymbolList)seq; > System.out.println(syml.seqString()); > MassCalc mCalc = new > MassCalc(SymbolPropertyTable.MONO_MASS, true); > double mass = mCalc.getMass(syml); > System.out.println("Mass: > "+seq.getName()+"\t"+mass); > } > > > while result in error > > Exception in thread "main" > org.biojava.bio.symbol.IllegalSymbolException: The > SymbolList was not using the protein alphabet > > Any one can help me with this, > > Thanks, > Suman K > > ===== > Suman K > BioInformatics Associate, > Genomics Research, > Newton Lab, > University of Missouri - Columbia. > > __________________________________ > Do you Yahoo!? > Free Pop-Up Blocker - Get it now > http://companion.yahoo.com/ > _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l > ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
=======================================================================

From wux at mail.cbi.pku.edu.cn Thu Nov 20 19:49:14 2003
From: wux at mail.cbi.pku.edu.cn (wux@mail.cbi.pku.edu.cn)
Date: Thu Nov 20 19:49:27 2003
Subject: [Biojava-l] chinese version of biojava in anger
Message-ID: <200311210053.hAL0r4AY018996@mail.cbi.pku.edu.cn>

Dear all:

I have finished translating Biojava In Anger into Simplified Chinese. Here is the URL: http://wux.cbi.pku.edu.cn/PUMA/biojava/index-cn.html

Yours faithfully,
wux
wux@mail.cbi.pku.edu.cn
2003-11-21

*****************************************************
WuXin
Ph.D student of CBI (Center of Bioinformatics)
Peking University 100871 P.R.China
Email: wux@mail.cbi.pku.edu.cn
Tel: 010-62762409 (dorm) 010-62755206 (office)
Address: Building 47#2026 Peking University
*****************************************************

From verhoeff2 at gis.a-star.edu.sg Fri Nov 21 03:02:57 2003
From: verhoeff2 at gis.a-star.edu.sg (VERHOEF Frans)
Date: Fri Nov 21 03:01:42 2003
Subject: [Biojava-l] File formats
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D560B0426@BIONIC.biopolis.one-north.com>

Hi,

Just my take on it, but if developing software that reads ABI files were illegal, I think Microsoft would already have sued Sun Microsystems for StarOffice being able to read/write MS Office formats. So I do not think you have to worry about it.

Kind regards,
Frans

> -----Original Message-----
> From: biojava-l-bounces@portal.open-bio.org [mailto:biojava-l-bounces@portal.open-bio.org] On Behalf Of Rich Heath
> Sent: Thursday, November 20, 2003 5:59 PM
> To: biojava-l@biojava.org
> Subject: [Biojava-l] File formats
>
> Hi,
>
> I am a software developer based in the UK that has
> been asked about producing a piece of software that
> outputs data from the files in ABI sequencers in a
> more human readable format.
I hope the > org.biojava.bio.program.abi package will let me do > this, but I have some concerns about the legal > implications of using and contributing to this > package. > > Does anyone know what the legal position is with > regards reverse engineering the Applied Biosystems > file format (and any other file formats come to that > matter)? I would imagine this file format is the > property of Applied Biosystems and they would not like > me producing applications that read from it unless I > provide them with a sizable licence fee (although I > guess I am not reverse engineering it if I just use > the above package, just if I contribute to it?). > > Many thanks in advance for your help, > > Rich > > > > ________________________________________________________________________ > Want to chat instantly with your online friends? Get the FREE Yahoo! > Messenger http://mail.messenger.yahoo.co.uk > _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l From lmorris at ebi.ac.uk Mon Nov 24 09:48:43 2003 From: lmorris at ebi.ac.uk (Lorna Morris) Date: Mon Nov 24 09:55:15 2003 Subject: [Biojava-l] code changes to embl parser Message-ID: <3FC21A4B.6090709@ebi.ac.uk> I'm trying to submit some biojava files I've changed to the mailing list but I get this message: ----------- Your mail to 'Biojava-l' with the subject EmblFileFormer Is being held until the list moderator can review it for approval. The reason it is being held: Message has a suspicious header ------------ I'm sending them as 6 separate java attachments. Should I send the code changes in a different way to avoid getting this message? 
Thanks, Lorna From td2 at sanger.ac.uk Mon Nov 24 10:10:24 2003 From: td2 at sanger.ac.uk (Thomas Down) Date: Mon Nov 24 10:16:54 2003 Subject: [Biojava-l] code changes to embl parser In-Reply-To: <3FC21A4B.6090709@ebi.ac.uk> References: <3FC21A4B.6090709@ebi.ac.uk> Message-ID: <20031124151024.GA277532@jabba.sanger.ac.uk> On Mon, Nov 24, 2003 at 02:48:43PM +0000, Lorna Morris wrote: > I'm trying to submit some biojava files I've changed to the mailing list > but I get this message: > ----------- > > Your mail to 'Biojava-l' with the subject > > EmblFileFormer > > Is being held until the list moderator can review it for approval. > > The reason it is being held: > > Message has a suspicious header > > > ------------ > > I'm sending them as 6 separate java attachments. Should I send the code > changes in a different way to avoid getting this message? Thanks, Hi Lorna, I'm afraid that all messages with attachments are currently being held by the mailing list software -- it was introduced as a (rather draconian) anti spam/virus measure. We've now got some better filtering software on the mailing list, so I think the no-attachments rule should probably be removed. But in the mean time, the best solution would be either to put your changes on a website somewhere and post a link, or just include them in the body of a message. Sorry about that, Thomas. From lmorris at ebi.ac.uk Mon Nov 24 10:32:47 2003 From: lmorris at ebi.ac.uk (Lorna Morris) Date: Mon Nov 24 10:39:18 2003 Subject: [Biojava-l] EMBLFileFormer changes Message-ID: <3FC2249F.6010108@ebi.ac.uk> Hello I'm using biojava to parse an EMBL Flat file, modify it, and dump it out to file at the end. However when I used SeqIOTools.writeEmbl the file created, did not have correctly ordered and nested RN, RP, RX, RA, RT and RL lines. These lines should occur in repeated sets, one set for each reference in the flat file. I've modified some of the biojava classes and added 2 new classes to correct this. 
Everything works fine now. I've put the modified classes and new classes here:

www.ebi.ac.uk/~lmorris/bioJavaFiles

Files modified:
EmblLikeFormat
EmblFileFormer
SeqIOEventEmitter
GenEmblPropertyComparator

Files added:
ReferenceAnnotation.java
EmblReferenceComparator.java

If you need any more details on the changes I've made, let me know.

Thanks,
Lorna

From lmorris at ebi.ac.uk Wed Nov 19 11:09:41 2003
From: lmorris at ebi.ac.uk (Lorna Morris)
Date: Mon Nov 24 12:01:13 2003
Subject: [Biojava-l] EMBL Parser
Message-ID: <3FBB95C5.9000306@ebi.ac.uk>

Hello

I'm using biojava to parse an EMBL flat file, modify it, and dump it out to file at the end. However, when I used SeqIOTools.writeEmbl, the file created did not have correctly ordered and nested RN, RP, RX, RA, RT and RL lines. These lines should occur in repeated sets, one set for each reference in the flat file. I've modified some of the biojava classes and added 2 new classes to correct this. Everything works fine now. I'm attaching the classes to this mail.

Files modified:
EmblLikeFormat
EmblFileFormer
SeqIOEventEmitter
GenEmblPropertyComparator

Files added:
ReferenceAnnotation.java
EmblReferenceComparator.java

If you need any more details on the changes I've made, let me know.

Thanks,
Lorna

-------------- next part --------------
/*
 *                    BioJava development code
 *
 * This code may be freely distributed and modified under the
 * terms of the GNU Lesser General Public Licence.  This should
 * be distributed with the code.  If you do not have a copy,
 * see:
 *
 *      http://www.gnu.org/copyleft/lesser.html
 *
 * Copyright for this code is held jointly by the individual
 * authors.  These should be listed in @author doc comments.
 *
 * For more information on the BioJava project and its aims,
 * or to join the biojava-l mailing list, visit the home page
 * at:
 *
 *      http://www.biojava.org/
 *
 */

package org.biojava.bio.seq.io;

import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

import org.biojava.bio.seq.Feature;
import org.biojava.bio.seq.StrandedFeature;
import org.biojava.bio.symbol.Alphabet;
import org.biojava.bio.symbol.IllegalAlphabetException;
import org.biojava.bio.symbol.IllegalSymbolException;
import org.biojava.bio.symbol.Symbol;
import org.biojava.bio.taxa.EbiFormat;
import org.biojava.bio.taxa.Taxon;
import org.biojava.bio.BioException;

/**
 *

 * EmblFileFormer performs the detailed formatting of
 * EMBL entries for writing to a PrintStream. Currently
 * the formatting of the header is not correct. This really needs to
 * be addressed in the parser, which is merging fields which should
 * remain separate.
 *
 * The event generator used to feed events to this class should
 * enforce ordering of those events. This class will stream data
 * directly to the PrintStream.
 *
 * This implementation requires that all the symbols be added in
 * one block, as it does not buffer the tokenized symbols between
 * calls.
 *
 * @author Keith James
 * @author Len Trigg (Taxon output)
 * @since 1.2
 */
public class EmblFileFormer extends AbstractGenEmblFileFormer
    implements SeqFileFormer
{
    // Tags which are special cases, not having "XX" after them
    private static List NON_SEPARATED_TAGS = new ArrayList();

    static
    {
        NON_SEPARATED_TAGS.add(EmblLikeFormat.SOURCE_TAG);
        NON_SEPARATED_TAGS.add(EmblLikeFormat.REFERENCE_TAG);
        NON_SEPARATED_TAGS.add(EmblLikeFormat.COORDINATE_TAG);
        NON_SEPARATED_TAGS.add(EmblLikeFormat.REF_ACCESSION_TAG);
        NON_SEPARATED_TAGS.add(EmblLikeFormat.AUTHORS_TAG);
        NON_SEPARATED_TAGS.add(EmblLikeFormat.TITLE_TAG);
        NON_SEPARATED_TAGS.add(EmblLikeFormat.FEATURE_TAG);
        NON_SEPARATED_TAGS.add(EmblLikeFormat.JOURNAL_TAG);   // Lorna: added
        NON_SEPARATED_TAGS.add(EmblLikeFormat.SEPARATOR_TAG); // Lorna: added
    }

    // 19 spaces
    private static String FT_LEADER =
        EmblLikeFormat.FEATURE_TABLE_TAG + "                   ";

    // 3 spaces
    private static String SQ_LEADER = "   ";

    // 80 spaces
    private static String EMPTY_LINE =
        "                                        " +
        "                                        ";

    private PrintStream stream;

    private String idLine;
    private String accLine;

    /**
     * Creates a new EmblFileFormer using
     * System.out stream.
     */
    protected EmblFileFormer()
    {
        this(System.out);
    }

    /**
     * Creates a new EmblFileFormer using the specified
     * stream.
     *
     * @param stream a PrintStream.
     */
    protected EmblFileFormer(PrintStream stream)
    {
        super();
        this.stream = stream;
    }

    public PrintStream getPrintStream()
    {
        return stream;
    }

    public void setPrintStream(PrintStream stream)
    {
        this.stream = stream;
    }

    public void setName(String id) throws ParseException
    {
        idLine = id;
    }

    public void startSequence() throws ParseException
    {
        aCount = 0;
        cCount = 0;
        gCount = 0;
        tCount = 0;
        oCount = 0;
    }

    public void endSequence() throws ParseException
    {
        stream.println(EmblLikeFormat.END_SEQUENCE_TAG);
    }

    public void setURI(String uri) throws ParseException { }

    public void addSymbols(Alphabet alpha,
                           Symbol [] syms,
                           int start,
                           int length)
        throws IllegalAlphabetException
    {
        try
        {
            int end = start + length - 1;

            for (int i = start; i <= end; i++)
            {
                Symbol sym = syms[i];

                if (sym == a)
                    aCount++;
                else if (sym == c)
                    cCount++;
                else if (sym == g)
                    gCount++;
                else if (sym == t)
                    tCount++;
                else
                    oCount++;
            }

            StringBuffer sb = new StringBuffer(EmblLikeFormat.SEPARATOR_TAG);
            sb.append(nl);
            sb.append("SQ   Sequence ");
            sb.append(length + " BP; ");
            sb.append(aCount + " A; ");
            sb.append(cCount + " C; ");
            sb.append(gCount + " G; ");
            sb.append(tCount + " T; ");
            sb.append(oCount + " other;");

            // Print sequence summary header
            stream.println(sb);

            int fullLine = length / 60;
            int partLine = length % 60;

            int lineCount = fullLine;
            if (partLine > 0)
                lineCount++;

            int lineLens [] = new int [lineCount];

            // All lines are 60, except last (if present)
            Arrays.fill(lineLens, 60);

            if (partLine > 0)
                lineLens[lineCount - 1] = partLine;

            for (int i = 0; i < lineLens.length; i++)
            {
                // Prep the whitespace
                StringBuffer sq = new StringBuffer(EMPTY_LINE);

                // How long is this chunk?
                int len = lineLens[i];

                // Prepare a Symbol array same length as chunk
                Symbol [] sa = new Symbol [len];

                // Get symbols and format into blocks of tokens
                System.arraycopy(syms, start + (i * 60), sa, 0, len);

                sb = new StringBuffer();
                String blocks = (formatTokenBlock(sb, sa, 10,
                                                  alpha.getTokenization("token"))).toString();

                sq.replace(5, blocks.length() + 5, blocks);

                // Calculate the running residue count and add to the line
                String count = Integer.toString((i * 60) + len);
                sq.replace((80 - count.length()), 80, count);

                // Print formatted sequence line
                stream.println(sq);
            }
        }
        catch (BioException ex)
        {
            throw new IllegalAlphabetException(ex, "Alphabet not tokenizing");
        }
    }

    public void addSequenceProperty(Object key, Object value)
        throws ParseException
    {
        StringBuffer sb = new StringBuffer();

        // Ignore separators if they are sent to us. The parser should
        // be ignoring these really (lorna: I've changed this so they
        // are ignored in SeqIOEventEmitter)
        //if (key.equals(EmblLikeFormat.SEPARATOR_TAG))
        //return;

        String tag = key.toString();
        String leader = tag + SQ_LEADER;
        String line = "";
        int wrapWidth = 85 - leader.length();

        // Special case: accession number
        if (key.equals(EmblProcessor.PROPERTY_EMBL_ACCESSIONS))
        {
            accLine = buildPropertyLine((Collection) value, ";", true);
            return;
        }
        else if (key.equals(EmblLikeFormat.ACCESSION_TAG))
        {
            line = accLine;
        }
        else if (key.equals(OrganismParser.PROPERTY_ORGANISM))
        {
            Taxon taxon = (Taxon) value;
            addSequenceProperty(EmblLikeFormat.SOURCE_TAG, taxon);
            addSequenceProperty(EmblLikeFormat.ORGANISM_TAG, taxon.getParent());
            addSequenceProperty(EmblLikeFormat.ORGANISM_XREF_TAG, taxon);
            return;
        }

        if (value instanceof String)
        {
            line = (String) value;
        }
        else if (value instanceof Collection)
        {
            // Special case: date lines
            if (key.equals(EmblLikeFormat.DATE_TAG))
            {
                line = buildPropertyLine((Collection) value, nl + leader, false);
                wrapWidth = Integer.MAX_VALUE;
            }
            // lorna: added 21.08.03, DR lines are another special case.
            // Each one goes onto a separate line.
            else if (key.equals(EmblLikeFormat.DR_TAG))
            {
                line = buildPropertyLine((Collection) value, nl + leader, false);
                wrapWidth = Integer.MAX_VALUE;
            }
            else if (key.equals(EmblLikeFormat.AUTHORS_TAG))
            {
                line = buildPropertyLine((Collection) value, nl + leader, false); // lorna: add space here?
                wrapWidth = Integer.MAX_VALUE;
            }
            else if (key.equals(EmblLikeFormat.REF_ACCESSION_TAG))
            {
                line = buildPropertyLine((Collection) value, nl + leader, false);
                wrapWidth = Integer.MAX_VALUE;
            }
            else
            {
                line = buildPropertyLine((Collection) value, " ", false);
            }
        }
        else if (value instanceof Taxon)
        {
            if (key.equals(EmblLikeFormat.ORGANISM_TAG))
            {
                line = EbiFormat.getInstance().serialize((Taxon) value);
            }
            else if (key.equals(EmblLikeFormat.SOURCE_TAG))
            {
                line = EbiFormat.getInstance().serializeSource((Taxon) value);
            }
            else if (key.equals(EmblLikeFormat.ORGANISM_XREF_TAG))
            {
                line = EbiFormat.getInstance().serializeXRef((Taxon) value);
            }
        }

        if (line.length() == 0)
        {
            stream.println(tag);
        }
        else
        {
            sb = formatSequenceProperty(sb, line, leader, wrapWidth);
            stream.println(sb);
        }

        // Special case: those which don't get separated
        if (!
NON_SEPARATED_TAGS.contains(key)) stream.println(EmblLikeFormat.SEPARATOR_TAG); // Special case: feature header if (key.equals(EmblLikeFormat.FEATURE_TAG)) stream.println(EmblLikeFormat.FEATURE_TAG); } public void startFeature(Feature.Template templ) throws ParseException { int strand = 0; if (templ instanceof StrandedFeature.Template) strand = ((StrandedFeature.Template) templ).strand.getValue(); StringBuffer sb = new StringBuffer(FT_LEADER); sb = formatLocationBlock(sb, templ.location, strand, FT_LEADER, 80); sb.replace(5, 5 + templ.type.length(), templ.type); stream.println(sb); } public void endFeature() throws ParseException { } public void addFeatureProperty(Object key, Object value) { // Don't print internal data structures if (key.equals(Feature.PROPERTY_DATA_KEY)) return; StringBuffer fb; StringBuffer sb; // The value may be a collection if several qualifiers of the // same type are present in a feature if (value instanceof Collection) { for (Iterator vi = ((Collection) value).iterator(); vi.hasNext();) { fb = new StringBuffer(); sb = new StringBuffer(); fb = formatQualifierBlock(fb, formatQualifier(sb, key, vi.next()).substring(0), FT_LEADER, 80); stream.println(fb); } } else { fb = new StringBuffer(); sb = new StringBuffer(); fb = formatQualifierBlock(fb, formatQualifier(sb, key, value).substring(0), FT_LEADER, 80); stream.println(fb); } } private String buildPropertyLine(Collection property, String separator, boolean terminate) { StringBuffer sb = new StringBuffer(); for (Iterator pi = property.iterator(); pi.hasNext();) { sb.append(pi.next().toString()); sb.append(separator); } if (terminate) { return sb.substring(0); } else { return sb.substring(0, sb.length() - separator.length()); } } } -------------- next part -------------- /* * BioJava development code * * This code may be freely distributed and modified under the * terms of the GNU Lesser General Public Licence. This should * be distributed with the code. 
If you do not have a copy, * see: * * http://www.gnu.org/copyleft/lesser.html * * Copyright for this code is held jointly by the individual * authors. These should be listed in @author doc comments. * * For more information on the BioJava project and its aims, * or to join the biojava-l mailing list, visit the home page * at: * * http://www.biojava.org/ * */ package org.biojava.bio.seq.io; import java.io.BufferedReader; import java.io.IOException; import java.io.PrintStream; import java.io.Serializable; import java.util.Vector; import java.util.ArrayList; import org.biojava.bio.seq.Sequence; import org.biojava.bio.symbol.IllegalSymbolException; import org.biojava.utils.ParseErrorEvent; import org.biojava.utils.ParseErrorListener; import org.biojava.utils.ParseErrorSource; import org.biojava.utils.ChangeVetoException; /** *

* Format processor for handling EMBL records and similar files. This * takes a very simple approach: all `normal' attribute lines are * passed to the listener as a tag (first two characters) and a value * (the rest of the line from the 6th character onwards). Any data * between the special `SQ' line and the "//" entry terminator is * passed as a SymbolReader. *
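As a rough illustration of the tag/value split described in this comment, the standalone sketch below mirrors the same substring logic. The class and method names (`EmblLineSplitSketch`, `tagOf`, `valueOf`) are hypothetical and not part of BioJava, and the sketch assumes every attribute line has at least two characters.

```java
/** Hypothetical helper, not part of BioJava: splits one EMBL-style attribute line. */
final class EmblLineSplitSketch {

    /** Returns the two-character tag, e.g. "ID" or "XX". Assumes line.length() >= 2. */
    static String tagOf(String line) {
        return line.substring(0, 2);
    }

    /** Returns the text from the 6th character onwards, or null for short lines such as "XX". */
    static String valueOf(String line) {
        return line.length() > 5 ? line.substring(5) : null;
    }

    public static void main(String[] args) {
        String line = "RA   Smith J., Jones K.;";
        // Print the tag and the value separately
        System.out.println(tagOf(line) + " -> " + valueOf(line));
    }
}
```

A bare separator line such as `XX` carries no value, which is why the sketch returns null for it rather than an empty string.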

* *

* This low-level format processor should normally be used in * conjunction with one or more `filter' objects, such as * EmblProcessor. *

* *

* Many ideas borrowed from the old EmblFormat processor by Thomas * Down and Thad Welch. *

* * @author Thomas Down * @author Greg Cox * @author Keith James * @author Len Trigg * @since 1.1 */ public class EmblLikeFormat implements SequenceFormat, Serializable, ParseErrorSource, ParseErrorListener { public static final String DEFAULT = "EMBL"; protected static final String ID_TAG = "ID"; protected static final String SIZE_TAG = "SIZE"; protected static final String STRAND_NUMBER_TAG = "STRANDS"; protected static final String TYPE_TAG = "TYPE"; protected static final String CIRCULAR_TAG = "CIRCULAR"; protected static final String DIVISION_TAG = "DIVISION"; protected static final String DR_TAG = "DR"; //Lorna: new tag protected static final String ACCESSION_TAG = "AC"; protected static final String VERSION_TAG = "SV"; protected static final String DATE_TAG = "DT"; protected static final String DEFINITION_TAG = "DE"; protected static final String KEYWORDS_TAG = "KW"; protected static final String SOURCE_TAG = "OS"; protected static final String ORGANISM_TAG = "OC"; protected static final String ORGANISM_XREF_TAG = "OX"; protected static final String REFERENCE_TAG = "RN"; protected static final String COORDINATE_TAG = "RP"; protected static final String REF_ACCESSION_TAG = "RX"; protected static final String AUTHORS_TAG = "RA"; protected static final String TITLE_TAG = "RT"; protected static final String JOURNAL_TAG = "RL"; protected static final String COMMENT_TAG = "CC"; protected static final String FEATURE_TAG = "FH"; protected static final String SEPARATOR_TAG = "XX"; protected static final String FEATURE_TABLE_TAG = "FT"; protected static final String START_SEQUENCE_TAG = "SQ"; protected static final String END_SEQUENCE_TAG = "//"; private boolean elideSymbols = false; private Vector mListeners = new Vector(); /** *

Specifies whether the symbols (SQ) part of the entry should * be ignored. If this property is set to true, the * parser will never call addSymbols on the * SeqIOListener, but parsing will be faster if * you're only interested in header information.

* *

This property also allows the header to be parsed for files * which have invalid sequence data.

     */
    public void setElideSymbols(boolean b) {
        elideSymbols = b;
    }

    /**
     * Return a flag indicating if symbol data will be skipped
     * when parsing streams.
     */
    public boolean getElideSymbols() {
        return elideSymbols;
    }

    public boolean readSequence(BufferedReader reader,
                                SymbolTokenization symParser,
                                SeqIOListener listener)
        throws IllegalSymbolException, IOException, ParseException
    {
        EmblReferenceProperty reference = null; //lorna

        if (listener instanceof ParseErrorSource) {
            ((ParseErrorSource)(listener)).addParseErrorListener(this);
        }

        String line;
        StreamParser sparser = null;
        boolean hasMoreSequence = true;
        boolean hasInternalWhitespace = false;

        listener.startSequence();

        while ((line = reader.readLine()) != null) {
            if (line.startsWith(END_SEQUENCE_TAG)) {
                if (sparser != null) {
                    // End of symbol data
                    sparser.close();
                    sparser = null;
                }

                // Allows us to tolerate trailing whitespace without
                // thinking that there is another Sequence to follow
                while (true) {
                    reader.mark(1);
                    int c = reader.read();

                    if (c == -1) {
                        hasMoreSequence = false;
                        break;
                    }

                    if (Character.isWhitespace((char) c)) {
                        hasInternalWhitespace = true;
                        continue;
                    }

                    if (hasInternalWhitespace)
                        System.err.println("Warning: whitespace found between sequence entries");

                    reader.reset();
                    break;
                }

                listener.endSequence();
                return hasMoreSequence;
            }
            else if (line.startsWith(START_SEQUENCE_TAG)) {
                // Adding a null property to flush the last feature;
                // Needed for Swissprot files because there is no gap
                // between the feature table and the sequence data
                listener.addSequenceProperty(SEPARATOR_TAG, "");
                sparser = symParser.parseStream(listener);
            }
            else {
                if (sparser == null) {
                    // Normal attribute line
                    String tag = line.substring(0, 2);
                    String rest = null;
                    if (line.length() > 5) {
                        rest = line.substring(5);
                    }

                    //lorna added, tags read in order, when a complete set goes through,
                    //spit out a single annotation event
                    if (tag.equals(REFERENCE_TAG)) { //only 1 reference_tag!
                        // Create the annotation only when a reference block starts,
                        // rather than once for every attribute line
                        ReferenceAnnotation refAnnot = new ReferenceAnnotation();
                        try {
                            refAnnot.setProperty(tag, rest);
                            while (!(tag.equals(SEPARATOR_TAG))) {
                                // Normal attribute line
                                line = reader.readLine();
                                if (line == null) {
                                    // Guard against a stream that ends inside a reference block
                                    throw new IOException("Premature end of stream inside a reference block for EMBL");
                                }
                                tag = line.substring(0, 2);
                                if (line.length() > 5) {
                                    rest = line.substring(5);
                                } else {
                                    rest = null; //for XX lines
                                }

                                if (refAnnot.containsProperty(tag)) {
                                    Object property = refAnnot.getProperty(tag);
                                    ArrayList properties;
                                    if (property instanceof String) {
                                        properties = new ArrayList();
                                        properties.add(property);
                                        properties.add(rest);
                                        refAnnot.setProperty(tag, properties);
                                    }
                                    if (property instanceof ArrayList) {
                                        ((ArrayList)property).add(rest);
                                    }
                                } else {
                                    refAnnot.setProperty(tag, rest);
                                }
                            }
                            listener.addSequenceProperty(ReferenceAnnotation.class, refAnnot);
                        } catch (ChangeVetoException cve) {
                            cve.printStackTrace();
                        }
                    } // lorna, end
                    else { //lorna
                        listener.addSequenceProperty(tag, rest);
                    } //lorna
                }
                else {
                    // Sequence line
                    if (! elideSymbols)
                        processSequenceLine(line, sparser);
                }
            }
        }

        if (sparser != null)
            sparser.close();

        throw new IOException("Premature end of stream or missing end tag '//' for EMBL");
    }

    /**
     * Dispatch symbol data from SQ-block line of an EMBL-like file.
     */
    protected void processSequenceLine(String line, StreamParser parser)
        throws IllegalSymbolException, ParseException
    {
        char[] cline = line.toCharArray();
        int parseStart = 0;
        int parseEnd = 0;

        while (parseStart < cline.length) {
            while (parseStart < cline.length && cline[parseStart] == ' ')
                ++parseStart;
            if (parseStart >= cline.length)
                break;

            if (Character.isDigit(cline[parseStart]))
                return;

            parseEnd = parseStart + 1;
            while (parseEnd < cline.length && cline[parseEnd] != ' ') {
                if (cline[parseEnd] == '.' || cline[parseEnd] == '~') {
                    cline[parseEnd] = '-';
                }
                ++parseEnd;
            }

            // Got a segment of read sequence data
            parser.characters(cline, parseStart, parseEnd - parseStart);

            parseStart = parseEnd;
        }
    }

    public void writeSequence(Sequence seq, PrintStream os)
        throws IOException
    {
        writeSequence(seq, getDefaultFormat(), os);
    }

    /**
     * writeSequence writes a sequence to the specified
     * PrintStream, using the specified format.
     *
     * @param seq a Sequence to write out.
     * @param format a String indicating which sub-format
     * of those available from a particular
     * SequenceFormat implementation to use when
     * writing.
     * @param os a PrintStream object.
     *
     * @exception IOException if an error occurs.
     * @deprecated use writeSequence(Sequence seq, PrintStream os)
     */
    public void writeSequence(Sequence seq, String format, PrintStream os)
        throws IOException
    {
        SeqFileFormer former;

        if (format.equalsIgnoreCase("EMBL"))
            former = new EmblFileFormer();
        else if (format.equalsIgnoreCase("SWISSPROT"))
            former = new SwissprotFileFormer();
        else
            throw new IllegalArgumentException("Unknown format '" + format + "'");

        former.setPrintStream(os);

        SeqIOEventEmitter emitter =
            new SeqIOEventEmitter(GenEmblPropertyComparator.INSTANCE,
                                  GenEmblFeatureComparator.INSTANCE);
        emitter.getSeqIOEvents(seq, former);
    }

    /**
     * getDefaultFormat returns the String identifier for
     * the default format written by a SequenceFormat
     * implementation.
     *
     * @return a String.
     * @deprecated
     */
    public String getDefaultFormat() {
        return DEFAULT;
    }

    /**
     *

* This method determines the behaviour when a bad line is processed. * Some options are to log the error, throw an exception, ignore it * completely, or pass the event through. *

* *

* This method should be overridden when different behavior is desired. *

* * @param theEvent The event that contains the bad line and token. */ public void BadLineParsed(ParseErrorEvent theEvent) { notifyParseErrorEvent(theEvent); } /** * Adds a parse error listener to the list of listeners if it isn't already * included. * * @param theListener Listener to be added. */ public synchronized void addParseErrorListener(ParseErrorListener theListener) { if (mListeners.contains(theListener) == false) { mListeners.addElement(theListener); } } /** * Removes a parse error listener from the list of listeners if it is * included. * * @param theListener Listener to be removed. */ public synchronized void removeParseErrorListener(ParseErrorListener theListener) { if (mListeners.contains(theListener) == true) { mListeners.removeElement(theListener); } } // Protected methods /** * Passes the event on to all the listeners registered for ParseErrorEvents. * * @param theEvent The event to be handed to the listeners. */ protected void notifyParseErrorEvent(ParseErrorEvent theEvent) { Vector listeners; synchronized(this) { listeners = (Vector)mListeners.clone(); } for (int index = 0; index < listeners.size(); index++) { ParseErrorListener client = (ParseErrorListener)listeners.elementAt(index); client.BadLineParsed(theEvent); } } } -------------- next part -------------- /* * Created by IntelliJ IDEA. * User: lmorris * Date: Nov 14, 2003 * Time: 11:11:52 AM * To change template for new class use * Code Style | Class Templates options (Tools | IDE Options). 
*/ package org.biojava.bio.seq.io; import java.util.Comparator; import java.util.List; import java.util.ArrayList; public class EmblReferenceComparator implements Comparator { static final Comparator INSTANCE = new EmblReferenceComparator(); private List tagOrder; { tagOrder = new ArrayList(); tagOrder.add(EmblLikeFormat.REFERENCE_TAG); tagOrder.add(EmblLikeFormat.COORDINATE_TAG); tagOrder.add(EmblLikeFormat.REF_ACCESSION_TAG); tagOrder.add(EmblLikeFormat.AUTHORS_TAG); tagOrder.add(EmblLikeFormat.TITLE_TAG); tagOrder.add(EmblLikeFormat.JOURNAL_TAG); tagOrder.add(EmblLikeFormat.SEPARATOR_TAG); } public int compare(Object o1, Object o2) { int index1 = tagOrder.indexOf(o1); int index2 = tagOrder.indexOf(o2); return (index1 - index2); } } -------------- next part -------------- /* * BioJava development code * * This code may be freely distributed and modified under the * terms of the GNU Lesser General Public Licence. This should * be distributed with the code. If you do not have a copy, * see: * * http://www.gnu.org/copyleft/lesser.html * * Copyright for this code is held jointly by the individual * authors. These should be listed in @author doc comments. * * For more information on the BioJava project and its aims, * or to join the biojava-l mailing list, visit the home page * at: * * http://www.biojava.org/ * */ package org.biojava.bio.seq.io; import java.util.ArrayList; import java.util.Comparator; import java.util.List; /** *

GenEmblPropertyComparator compares Genbank/EMBL * file format tags by the order in which they should appear in their * respective formats.

* *

EMBL tags sort before Genbank tags. This is arbitrary. Given the * subtle differences in the values accompanying equivalent tags in * these formats the two sets shouldn't be mixed anyway.

* *

Any tags which belong to neither set sort before anything * else.
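The ordering rule in this comment can be sketched with a plain list lookup. This is a minimal standalone example, not the BioJava class itself: `TagOrderSketch`, `ORDER`, `BY_TAG_ORDER` and `sorted` are hypothetical names, and the tag list is only an illustrative subset. Because `List.indexOf` returns -1 for a tag that is not in the list, unknown tags end up ahead of every known one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of ordering tags by their position in a reference list. */
final class TagOrderSketch {

    // Illustrative subset of EMBL tags in output order (not the full BioJava list)
    static final List<String> ORDER =
        Arrays.asList("ID", "AC", "SV", "DT", "DE", "KW", "OS", "OC");

    static final Comparator<String> BY_TAG_ORDER = new Comparator<String>() {
        public int compare(String a, String b) {
            // indexOf yields -1 for unknown tags, so they sort before known ones
            return ORDER.indexOf(a) - ORDER.indexOf(b);
        }
    };

    /** Returns a sorted copy, leaving the input list untouched. */
    static List<String> sorted(List<String> tags) {
        List<String> copy = new ArrayList<String>(tags);
        Collections.sort(copy, BY_TAG_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sorted(Arrays.asList("DE", "ZZ", "ID", "AC")));
    }
}
```

Two unknown tags compare as equal under this scheme, so their relative order is whatever the stable sort preserves from the input.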

* * @author Keith James */ final class GenEmblPropertyComparator implements Comparator { static final Comparator INSTANCE = new GenEmblPropertyComparator(); private List tagOrder; private GenEmblPropertyComparator() { tagOrder = new ArrayList(); tagOrder.add(EmblLikeFormat.ID_TAG); tagOrder.add(EmblLikeFormat.ACCESSION_TAG); tagOrder.add(EmblLikeFormat.VERSION_TAG); tagOrder.add(EmblLikeFormat.DATE_TAG); tagOrder.add(EmblLikeFormat.DEFINITION_TAG); tagOrder.add(EmblLikeFormat.KEYWORDS_TAG); tagOrder.add(EmblLikeFormat.SOURCE_TAG); tagOrder.add(EmblLikeFormat.ORGANISM_TAG); /*tagOrder.add(EmblLikeFormat.REFERENCE_TAG); tagOrder.add(EmblLikeFormat.COORDINATE_TAG); tagOrder.add(EmblLikeFormat.REF_ACCESSION_TAG); tagOrder.add(EmblLikeFormat.AUTHORS_TAG); tagOrder.add(EmblLikeFormat.TITLE_TAG); tagOrder.add(EmblLikeFormat.JOURNAL_TAG);*/ tagOrder.add(ReferenceAnnotation.class); tagOrder.add(EmblLikeFormat.DR_TAG);//lorna:added 21.08.03 tagOrder.add(EmblLikeFormat.COORDINATE_TAG); tagOrder.add(EmblLikeFormat.REF_ACCESSION_TAG); tagOrder.add(EmblLikeFormat.AUTHORS_TAG); tagOrder.add(EmblLikeFormat.TITLE_TAG); tagOrder.add(EmblLikeFormat.JOURNAL_TAG); tagOrder.add(EmblLikeFormat.COMMENT_TAG); tagOrder.add(EmblLikeFormat.FEATURE_TAG); tagOrder.add(GenbankFormat.LOCUS_TAG); tagOrder.add(GenbankFormat.SIZE_TAG); tagOrder.add(GenbankFormat.STRAND_NUMBER_TAG); tagOrder.add(GenbankFormat.TYPE_TAG); tagOrder.add(GenbankFormat.CIRCULAR_TAG); tagOrder.add(GenbankFormat.DIVISION_TAG); tagOrder.add(GenbankFormat.DATE_TAG); tagOrder.add(GenbankFormat.DEFINITION_TAG); tagOrder.add(GenbankFormat.ACCESSION_TAG); tagOrder.add(GenbankFormat.VERSION_TAG); tagOrder.add(GenbankFormat.GI_TAG); tagOrder.add(GenbankFormat.KEYWORDS_TAG); tagOrder.add(GenbankFormat.SOURCE_TAG); tagOrder.add(GenbankFormat.ORGANISM_TAG); tagOrder.add(GenbankFormat.REFERENCE_TAG); tagOrder.add(GenbankFormat.AUTHORS_TAG); tagOrder.add(GenbankFormat.TITLE_TAG); tagOrder.add(GenbankFormat.JOURNAL_TAG); 
        tagOrder.add(GenbankFormat.COMMENT_TAG);
        tagOrder.add(GenbankFormat.FEATURE_TAG);
    }

    public int compare(Object o1, Object o2) {
        int index1 = tagOrder.indexOf(o1);
        int index2 = tagOrder.indexOf(o2);

        return (index1 - index2);
    }
}
-------------- next part --------------
/*
 * Created by IntelliJ IDEA.
 * User: lmorris
 * Date: Nov 14, 2003
 * Time: 11:45:41 AM
 * To change template for new class use
 * Code Style | Class Templates options (Tools | IDE Options).
 */
package org.biojava.bio.seq.io;

import org.biojava.bio.AbstractAnnotation;
import org.biojava.utils.ChangeVetoException;

import java.util.Map;
import java.util.HashMap;

public class ReferenceAnnotation extends AbstractAnnotation {

    /**
     * The properties map. This may be null if no property values have
     * yet been set.
     */
    private Map properties;

    public ReferenceAnnotation() {
        super();
        try {
            System.out.println("Calling refAnnot");
            this.setProperty(EmblLikeFormat.SEPARATOR_TAG, ""); //all references have an empty XX line
        } catch (ChangeVetoException e) {
            e.printStackTrace();
        }
    }

    protected Map getProperties() {
        if(!propertiesAllocated()) {
            properties = new HashMap();
        }
        return properties;
    }

    protected boolean propertiesAllocated() {
        return properties != null;
    }
}
-------------- next part --------------
/*
 * BioJava development code
 *
 * This code may be freely distributed and modified under the
 * terms of the GNU Lesser General Public Licence. This should
 * be distributed with the code. If you do not have a copy,
 * see:
 *
 * http://www.gnu.org/copyleft/lesser.html
 *
 * Copyright for this code is held jointly by the individual
 * authors. These should be listed in @author doc comments.
* * For more information on the BioJava project and its aims, * or to join the biojava-l mailing list, visit the home page * at: * * http://www.biojava.org/ * */ package org.biojava.bio.seq.io; import java.util.*; import org.biojava.bio.Annotation; import org.biojava.bio.BioError; import org.biojava.bio.seq.Feature; import org.biojava.bio.seq.FeatureHolder; import org.biojava.bio.seq.Sequence; import org.biojava.bio.symbol.IllegalAlphabetException; import org.biojava.bio.symbol.Symbol; /** * SeqIOEventEmitter is a utility class which scans a * Sequence object and sends events describing its * constituent data to a SeqIOListener. The listener * should be able to reconstruct the Sequence from these * events. * * @author Keith James * @since 1.2 */ class SeqIOEventEmitter { private static Symbol [] symProto = new Symbol [0]; private Comparator seqPropComparator; private Comparator refPropComparator; private Comparator featureComparator; SeqIOEventEmitter(Comparator seqPropComparator, Comparator featureComparator) { this.seqPropComparator = seqPropComparator; this.featureComparator = featureComparator; }; /** * getSeqIOEvents scans a Sequence * object and sends events describing its data to the * SeqIOListener. * * @param seq a Sequence. * @param listener a SeqIOListener. 
*/ void getSeqIOEvents(Sequence seq, SeqIOListener listener) { try { // Inform listener of sequence start listener.startSequence(); // Pass name to listener listener.setName(seq.getName()); // Pass URN to listener listener.setURI(seq.getURN()); // Pass sequence properties to listener Annotation a = seq.getAnnotation(); List sKeys = new ArrayList(a.keys()); Collections.sort(sKeys, seqPropComparator); for (Iterator ki = sKeys.iterator(); ki.hasNext();) { Object key = ki.next(); if ( key.equals(ReferenceAnnotation.class)) { ArrayList references = null; if (a.getProperty(key) instanceof ArrayList) { references = ((ArrayList)a.getProperty(key)); } if (references != null) { for ( int i = 0; i < references.size(); i++ ) { ReferenceAnnotation refAnnot = (ReferenceAnnotation)references.get(i); Map referenceLines = refAnnot.getProperties(); List refKeys = new ArrayList(referenceLines.keySet()); refPropComparator = EmblReferenceComparator.INSTANCE; Collections.sort(refKeys, refPropComparator); for (Iterator kit = refKeys.iterator(); kit.hasNext();) { Object refKey = kit.next(); //adds all the R* tags and final XX tag listener.addSequenceProperty(refKey, refAnnot.getProperty(refKey)); } } } } else { if (!(key.equals(EmblLikeFormat.SEPARATOR_TAG))) { //lorna: ignore XX listener.addSequenceProperty(key, a.getProperty(key)); } } } // Recurse through sub feature tree, flattening it for // EMBL List subs = getSubFeatures(seq); Collections.sort(subs, featureComparator); // Put the source features first for EMBL for (Iterator fi = subs.iterator(); fi.hasNext();) { // The template is required to call startFeature Feature.Template t = ((Feature) fi.next()).makeTemplate(); // Inform listener of feature start listener.startFeature(t); // Pass feature properties (i.e. 
                // qualifiers to listener)
                // FIXME: this will drop all non-comparable keys
                List fKeys = comparableList(t.annotation.keys());
                Collections.sort(fKeys);

                for (Iterator ki = fKeys.iterator(); ki.hasNext();) {
                    Object key = ki.next();
                    listener.addFeatureProperty(key, t.annotation.getProperty(key));
                }

                // Inform listener of feature end
                listener.endFeature();
            }

            // Add symbols
            listener.addSymbols(seq.getAlphabet(),
                                (Symbol []) seq.toList().toArray(symProto),
                                0,
                                seq.length());

            // Inform listener of sequence end
            listener.endSequence();
        } catch (IllegalAlphabetException iae) {
            // This should never happen as the alphabet is being used
            // by this Sequence instance
            throw new BioError("An internal error occurred processing symbols",iae);
        } catch (ParseException pe) {
            throw new BioError("An internal error occurred creating SeqIO events",pe);
        }
    }

    /**
     * getSubFeatures is a recursive method which returns
     * a list of all Features within a
     * FeatureHolder.
     *
     * @param fh a FeatureHolder.
     *
     * @return a List.
     */
    private static List getSubFeatures(FeatureHolder fh) {
        List subfeat = new ArrayList();

        for (Iterator fi = fh.features(); fi.hasNext();) {
            FeatureHolder sfh = (FeatureHolder) fi.next();

            subfeat.addAll((Collection) getSubFeatures(sfh));
            subfeat.add(sfh);
        }
        return subfeat;
    }

    private List comparableList(Collection coll) {
        ArrayList res = new ArrayList();
        for(Iterator i = coll.iterator(); i.hasNext(); ) {
            Object o = i.next();
            if(o instanceof Comparable) {
                res.add(o);
            }
        }
        return res;
    }
}

From lmorris at ebi.ac.uk Mon Nov 24 09:37:52 2003
From: lmorris at ebi.ac.uk (Lorna Morris)
Date: Mon Nov 24 12:01:15 2003
Subject: [Biojava-l] EmblFileFormer
Message-ID: <3FC217C0.9050900@ebi.ac.uk>

Hello,

I'm using biojava to parse an EMBL flat file, modify it, and dump it out to file at the end. However, when I used SeqIOTools.writeEmbl, the file created did not have correctly ordered and nested RN, RP, RX, RA, RT and RL lines.
These lines should occur in repeated sets, one set for each reference in the flat file.

I've modified some of the biojava classes and added 2 new classes to correct this. Everything works fine now. I'm attaching the classes to this mail.

Files modified:
EmblLikeFormat
EmblFileFormer
SeqIOEventEmitter
GenEmblPropertyComparator

Files added:
ReferenceAnnotation.java
EmblReferenceComparator.java

If you need any more details on the changes I've made, let me know.

Thanks,
Lorna

-------------- next part --------------
/*
 * BioJava development code
 *
 * This code may be freely distributed and modified under the
 * terms of the GNU Lesser General Public Licence. This should
 * be distributed with the code. If you do not have a copy,
 * see:
 *
 * http://www.gnu.org/copyleft/lesser.html
 *
 * Copyright for this code is held jointly by the individual
 * authors. These should be listed in @author doc comments.
 *
 * For more information on the BioJava project and its aims,
 * or to join the biojava-l mailing list, visit the home page
 * at:
 *
 * http://www.biojava.org/
 *
 */

package org.biojava.bio.seq.io;

import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

import org.biojava.bio.seq.Feature;
import org.biojava.bio.seq.StrandedFeature;
import org.biojava.bio.symbol.Alphabet;
import org.biojava.bio.symbol.IllegalAlphabetException;
import org.biojava.bio.symbol.IllegalSymbolException;
import org.biojava.bio.symbol.Symbol;
import org.biojava.bio.taxa.EbiFormat;
import org.biojava.bio.taxa.Taxon;
import org.biojava.bio.BioException;

/**
 *

EmblFileFormer performs the detailed formatting of * EMBL entries for writing to a PrintStream. Currently * the formatting of the header is not correct. This really needs to * be addressed in the parser which is merging fields which should * remain separate.

* *

The event generator used to feed events to this class should * enforce ordering of those events. This class will stream data * directly to the PrintStream.

* *

This implementation requires that all the symbols be added in * one block as it does not buffer the tokenized symbols between * calls.

* * @author Keith James * @author Len Trigg (Taxon output) * @since 1.2 */ public class EmblFileFormer extends AbstractGenEmblFileFormer implements SeqFileFormer { // Tags which are special cases, not having "XX" after them private static List NON_SEPARATED_TAGS = new ArrayList(); static { NON_SEPARATED_TAGS.add(EmblLikeFormat.SOURCE_TAG); NON_SEPARATED_TAGS.add(EmblLikeFormat.REFERENCE_TAG); NON_SEPARATED_TAGS.add(EmblLikeFormat.COORDINATE_TAG); NON_SEPARATED_TAGS.add(EmblLikeFormat.REF_ACCESSION_TAG); NON_SEPARATED_TAGS.add(EmblLikeFormat.AUTHORS_TAG); NON_SEPARATED_TAGS.add(EmblLikeFormat.TITLE_TAG); NON_SEPARATED_TAGS.add(EmblLikeFormat.FEATURE_TAG); NON_SEPARATED_TAGS.add(EmblLikeFormat.JOURNAL_TAG);//Lorna: added NON_SEPARATED_TAGS.add(EmblLikeFormat.SEPARATOR_TAG);//Lorna: added } // 19 spaces private static String FT_LEADER = EmblLikeFormat.FEATURE_TABLE_TAG + " "; // 3 spaces private static String SQ_LEADER = " "; // 80 spaces private static String EMPTY_LINE = " " + " "; private PrintStream stream; private String idLine; private String accLine; /** * Creates a new EmblFileFormer using * System.out stream. */ protected EmblFileFormer() { this(System.out); } /** * Creates a new EmblFileFormer using the specified * stream. * * @param stream a PrintStream. 
*/ protected EmblFileFormer(PrintStream stream) { super(); this.stream = stream; } public PrintStream getPrintStream() { return stream; } public void setPrintStream(PrintStream stream) { this.stream = stream; } public void setName(String id) throws ParseException { idLine = id; } public void startSequence() throws ParseException { aCount = 0; cCount = 0; gCount = 0; tCount = 0; oCount = 0; } public void endSequence() throws ParseException { stream.println(EmblLikeFormat.END_SEQUENCE_TAG); } public void setURI(String uri) throws ParseException { } public void addSymbols(Alphabet alpha, Symbol [] syms, int start, int length) throws IllegalAlphabetException { try { int end = start + length - 1; for (int i = start; i <= end; i++) { Symbol sym = syms[i]; if (sym == a) aCount++; else if (sym == c) cCount++; else if (sym == g) gCount++; else if (sym == t) tCount++; else oCount++; } StringBuffer sb = new StringBuffer(EmblLikeFormat.SEPARATOR_TAG); sb.append(nl); sb.append("SQ Sequence "); sb.append(length + " BP; "); sb.append(aCount + " A; "); sb.append(cCount + " C; "); sb.append(gCount + " G; "); sb.append(tCount + " T; "); sb.append(oCount + " other;"); // Print sequence summary header stream.println(sb); int fullLine = length / 60; int partLine = length % 60; int lineCount = fullLine; if (partLine > 0) lineCount++; int lineLens [] = new int [lineCount]; // All lines are 60, except last (if present) Arrays.fill(lineLens, 60); if (partLine > 0) lineLens[lineCount - 1] = partLine; for (int i = 0; i < lineLens.length; i++) { // Prep the whitespace StringBuffer sq = new StringBuffer(EMPTY_LINE); // How long is this chunk? 
int len = lineLens[i]; // Prepare a Symbol array same length as chunk Symbol [] sa = new Symbol [len]; // Get symbols and format into blocks of tokens System.arraycopy(syms, start + (i * 60), sa, 0, len); sb = new StringBuffer(); String blocks = (formatTokenBlock(sb, sa, 10, alpha.getTokenization("token"))).toString(); sq.replace(5, blocks.length() + 5, blocks); // Calculate the running residue count and add to the line String count = Integer.toString((i * 60) + len); sq.replace((80 - count.length()), 80, count); // Print formatted sequence line stream.println(sq); } } catch (BioException ex) { throw new IllegalAlphabetException(ex, "Alphabet not tokenizing"); } } public void addSequenceProperty(Object key, Object value) throws ParseException { StringBuffer sb = new StringBuffer(); // Ignore separators if they are sent to us. The parser should // be ignoring these really (lorna: I've changed this so they are ignored in SeqIOEventEmitter) //if (key.equals(EmblLikeFormat.SEPARATOR_TAG)) //return; String tag = key.toString(); String leader = tag + SQ_LEADER; String line = ""; int wrapWidth = 85 - leader.length(); // Special case: accession number if (key.equals(EmblProcessor.PROPERTY_EMBL_ACCESSIONS)) { accLine = buildPropertyLine((Collection) value, ";", true); return; } else if (key.equals(EmblLikeFormat.ACCESSION_TAG)) { line = accLine; } else if (key.equals(OrganismParser.PROPERTY_ORGANISM)) { Taxon taxon = (Taxon) value; addSequenceProperty(EmblLikeFormat.SOURCE_TAG, taxon); addSequenceProperty(EmblLikeFormat.ORGANISM_TAG, taxon.getParent()); addSequenceProperty(EmblLikeFormat.ORGANISM_XREF_TAG, taxon); return; } if (value instanceof String) { line = (String) value; } else if (value instanceof Collection) { // Special case: date lines if (key.equals(EmblLikeFormat.DATE_TAG)) { line = buildPropertyLine((Collection) value, nl + leader, false); wrapWidth = Integer.MAX_VALUE; } //lorna :added 21.08.03, DR lines are another special case. 
Each one goes onto a separate line. else if (key.equals(EmblLikeFormat.DR_TAG)) { line = buildPropertyLine((Collection) value, nl + leader, false); wrapWidth = Integer.MAX_VALUE; } else if (key.equals(EmblLikeFormat.AUTHORS_TAG)) { line = buildPropertyLine((Collection) value, nl + leader, false); //lorna: add space here? wrapWidth = Integer.MAX_VALUE; } else if (key.equals(EmblLikeFormat.REF_ACCESSION_TAG)) { line = buildPropertyLine((Collection) value, nl + leader, false); wrapWidth = Integer.MAX_VALUE; } else { line = buildPropertyLine((Collection) value, " ", false); } } else if (value instanceof Taxon) { if (key.equals(EmblLikeFormat.ORGANISM_TAG)) { line = EbiFormat.getInstance().serialize((Taxon) value); } else if (key.equals(EmblLikeFormat.SOURCE_TAG)) { line = EbiFormat.getInstance().serializeSource((Taxon) value); } else if (key.equals(EmblLikeFormat.ORGANISM_XREF_TAG)) { line = EbiFormat.getInstance().serializeXRef((Taxon) value); } } if (line.length() == 0) { stream.println(tag); } else { sb = formatSequenceProperty(sb, line, leader, wrapWidth); stream.println(sb); } // Special case: those which don't get separated if (! 
NON_SEPARATED_TAGS.contains(key)) stream.println(EmblLikeFormat.SEPARATOR_TAG); // Special case: feature header if (key.equals(EmblLikeFormat.FEATURE_TAG)) stream.println(EmblLikeFormat.FEATURE_TAG); } public void startFeature(Feature.Template templ) throws ParseException { int strand = 0; if (templ instanceof StrandedFeature.Template) strand = ((StrandedFeature.Template) templ).strand.getValue(); StringBuffer sb = new StringBuffer(FT_LEADER); sb = formatLocationBlock(sb, templ.location, strand, FT_LEADER, 80); sb.replace(5, 5 + templ.type.length(), templ.type); stream.println(sb); } public void endFeature() throws ParseException { } public void addFeatureProperty(Object key, Object value) { // Don't print internal data structures if (key.equals(Feature.PROPERTY_DATA_KEY)) return; StringBuffer fb; StringBuffer sb; // The value may be a collection if several qualifiers of the // same type are present in a feature if (value instanceof Collection) { for (Iterator vi = ((Collection) value).iterator(); vi.hasNext();) { fb = new StringBuffer(); sb = new StringBuffer(); fb = formatQualifierBlock(fb, formatQualifier(sb, key, vi.next()).substring(0), FT_LEADER, 80); stream.println(fb); } } else { fb = new StringBuffer(); sb = new StringBuffer(); fb = formatQualifierBlock(fb, formatQualifier(sb, key, value).substring(0), FT_LEADER, 80); stream.println(fb); } } private String buildPropertyLine(Collection property, String separator, boolean terminate) { StringBuffer sb = new StringBuffer(); for (Iterator pi = property.iterator(); pi.hasNext();) { sb.append(pi.next().toString()); sb.append(separator); } if (terminate) { return sb.substring(0); } else { return sb.substring(0, sb.length() - separator.length()); } } } -------------- next part -------------- /* * BioJava development code * * This code may be freely distributed and modified under the * terms of the GNU Lesser General Public Licence. This should * be distributed with the code. 
If you do not have a copy, * see: * * http://www.gnu.org/copyleft/lesser.html * * Copyright for this code is held jointly by the individual * authors. These should be listed in @author doc comments. * * For more information on the BioJava project and its aims, * or to join the biojava-l mailing list, visit the home page * at: * * http://www.biojava.org/ * */ package org.biojava.bio.seq.io; import java.io.BufferedReader; import java.io.IOException; import java.io.PrintStream; import java.io.Serializable; import java.util.Vector; import java.util.ArrayList; import org.biojava.bio.seq.Sequence; import org.biojava.bio.symbol.IllegalSymbolException; import org.biojava.utils.ParseErrorEvent; import org.biojava.utils.ParseErrorListener; import org.biojava.utils.ParseErrorSource; import org.biojava.utils.ChangeVetoException; /** *

* Format processor for handling EMBL records and similar files. This * takes a very simple approach: all `normal' attribute lines are * passed to the listener as a tag (first two characters) and a value * (the rest of the line from the 6th character onwards). Any data * between the special `SQ' line and the "//" entry terminator is * passed as a SymbolReader. *

* *

* This low-level format processor should normally be used in * conjunction with one or more `filter' objects, such as * EmblProcessor. *

* *

* Many ideas borrowed from the old EmblFormat processor by Thomas * Down and Thad Welch. *

* * @author Thomas Down * @author Greg Cox * @author Keith James * @author Len Trigg * @since 1.1 */ public class EmblLikeFormat implements SequenceFormat, Serializable, ParseErrorSource, ParseErrorListener { public static final String DEFAULT = "EMBL"; protected static final String ID_TAG = "ID"; protected static final String SIZE_TAG = "SIZE"; protected static final String STRAND_NUMBER_TAG = "STRANDS"; protected static final String TYPE_TAG = "TYPE"; protected static final String CIRCULAR_TAG = "CIRCULAR"; protected static final String DIVISION_TAG = "DIVISION"; protected static final String DR_TAG = "DR"; //Lorna: new tag protected static final String ACCESSION_TAG = "AC"; protected static final String VERSION_TAG = "SV"; protected static final String DATE_TAG = "DT"; protected static final String DEFINITION_TAG = "DE"; protected static final String KEYWORDS_TAG = "KW"; protected static final String SOURCE_TAG = "OS"; protected static final String ORGANISM_TAG = "OC"; protected static final String ORGANISM_XREF_TAG = "OX"; protected static final String REFERENCE_TAG = "RN"; protected static final String COORDINATE_TAG = "RP"; protected static final String REF_ACCESSION_TAG = "RX"; protected static final String AUTHORS_TAG = "RA"; protected static final String TITLE_TAG = "RT"; protected static final String JOURNAL_TAG = "RL"; protected static final String COMMENT_TAG = "CC"; protected static final String FEATURE_TAG = "FH"; protected static final String SEPARATOR_TAG = "XX"; protected static final String FEATURE_TABLE_TAG = "FT"; protected static final String START_SEQUENCE_TAG = "SQ"; protected static final String END_SEQUENCE_TAG = "//"; private boolean elideSymbols = false; private Vector mListeners = new Vector(); /** *

Specifies whether the symbols (SQ) part of the entry should * be ignored. If this property is set to true, the * parser will never call addSymbols on the * SeqIOListener, but parsing will be faster if * you're only interested in header information.

* *

This property also allows the header to be parsed for files * which have invalid sequence data.

*/ public void setElideSymbols(boolean b) { elideSymbols = b; } /** * Return a flag indicating if symbol data will be skipped * when parsing streams. */ public boolean getElideSymbols() { return elideSymbols; } public boolean readSequence(BufferedReader reader, SymbolTokenization symParser, SeqIOListener listener) throws IllegalSymbolException, IOException, ParseException { EmblReferenceProperty reference = null; //lorna if (listener instanceof ParseErrorSource) { ((ParseErrorSource)(listener)).addParseErrorListener(this); } String line; StreamParser sparser = null; boolean hasMoreSequence = true; boolean hasInternalWhitespace = false; listener.startSequence(); while ((line = reader.readLine()) != null) { if (line.startsWith(END_SEQUENCE_TAG)) { if (sparser != null) { // End of symbol data sparser.close(); sparser = null; } // Allows us to tolerate trailing whitespace without // thinking that there is another Sequence to follow while (true) { reader.mark(1); int c = reader.read(); if (c == -1) { hasMoreSequence = false; break; } if (Character.isWhitespace((char) c)) { hasInternalWhitespace = true; continue; } if (hasInternalWhitespace) System.err.println("Warning: whitespace found between sequence entries"); reader.reset(); break; } listener.endSequence(); return hasMoreSequence; } else if (line.startsWith(START_SEQUENCE_TAG)) { // Adding a null property to flush the last feature; // Needed for Swissprot files because there is no gap // between the feature table and the sequence data listener.addSequenceProperty(SEPARATOR_TAG, ""); sparser = symParser.parseStream(listener); } else { if (sparser == null) { // Normal attribute line String tag = line.substring(0, 2); String rest = null; if (line.length() > 5) { rest = line.substring(5); } //lorna added, tags read in order, when a complete set goes through, //spit out a single annotation event ReferenceAnnotation refAnnot = new ReferenceAnnotation(); if (tag.equals(REFERENCE_TAG)) { //only 1 reference_tag! 
try { refAnnot.setProperty(tag, rest); while (!(tag.equals(SEPARATOR_TAG))) { // Normal attribute line line = reader.readLine(); tag = line.substring(0, 2); if (line.length() > 5) { rest = line.substring(5); } else { rest = null;//for XX lines } if (refAnnot.containsProperty(tag)) { Object property = refAnnot.getProperty(tag); ArrayList properties; if (property instanceof String) { properties = new ArrayList(); properties.add(property); properties.add(rest); refAnnot.setProperty(tag, properties); } if (property instanceof ArrayList) { ((ArrayList)property).add(rest); } } else { refAnnot.setProperty(tag, rest); } } listener.addSequenceProperty(ReferenceAnnotation.class, refAnnot); } catch (ChangeVetoException cve) { cve.printStackTrace(); } } // lorna, end else { //lorna listener.addSequenceProperty(tag, rest); } //lorna } else { // Sequence line if (! elideSymbols) processSequenceLine(line, sparser); } } } if (sparser != null) sparser.close(); throw new IOException("Premature end of stream or missing end tag '//' for EMBL"); } /** * Dispatch symbol data from SQ-block line of an EMBL-like file. */ protected void processSequenceLine(String line, StreamParser parser) throws IllegalSymbolException, ParseException { char[] cline = line.toCharArray(); int parseStart = 0; int parseEnd = 0; while (parseStart < cline.length) { while (parseStart < cline.length && cline[parseStart] == ' ') ++parseStart; if (parseStart >= cline.length) break; if (Character.isDigit(cline[parseStart])) return; parseEnd = parseStart + 1; while (parseEnd < cline.length && cline[parseEnd] != ' ') { if (cline[parseEnd] == '.' 
|| cline[parseEnd] == '~') { cline[parseEnd] = '-'; } ++parseEnd; } // Got a segment of read sequence data parser.characters(cline, parseStart, parseEnd - parseStart); parseStart = parseEnd; } } public void writeSequence(Sequence seq, PrintStream os) throws IOException { writeSequence(seq, getDefaultFormat(), os); } /** * writeSequence writes a sequence to the specified * PrintStream, using the specified format. * * @param seq a Sequence to write out. * @param format a String indicating which sub-format * of those available from a particular * SequenceFormat implemention to use when * writing. * @param os a PrintStream object. * * @exception IOException if an error occurs. * @deprecated use writeSequence(Sequence seq, PrintStream os) */ public void writeSequence(Sequence seq, String format, PrintStream os) throws IOException { SeqFileFormer former; if (format.equalsIgnoreCase("EMBL")) former = new EmblFileFormer(); else if (format.equalsIgnoreCase("SWISSPROT")) former = new SwissprotFileFormer(); else throw new IllegalArgumentException("Unknown format '" + format + "'"); former.setPrintStream(os); SeqIOEventEmitter emitter = new SeqIOEventEmitter(GenEmblPropertyComparator.INSTANCE, GenEmblFeatureComparator.INSTANCE); emitter.getSeqIOEvents(seq, former); } /** * getDefaultFormat returns the String identifier for * the default format written by a SequenceFormat * implementation. * * @return a String. * @deprecated */ public String getDefaultFormat() { return DEFAULT; } /** *

* This method determines the behaviour when a bad line is processed. * Some options are to log the error, throw an exception, ignore it * completely, or pass the event through. *

* *

* This method should be overwritten when different behavior is desired. *

* * @param theEvent The event that contains the bad line and token. */ public void BadLineParsed(ParseErrorEvent theEvent) { notifyParseErrorEvent(theEvent); } /** * Adds a parse error listener to the list of listeners if it isn't already * included. * * @param theListener Listener to be added. */ public synchronized void addParseErrorListener(ParseErrorListener theListener) { if (mListeners.contains(theListener) == false) { mListeners.addElement(theListener); } } /** * Removes a parse error listener from the list of listeners if it is * included. * * @param theListener Listener to be removed. */ public synchronized void removeParseErrorListener(ParseErrorListener theListener) { if (mListeners.contains(theListener) == true) { mListeners.removeElement(theListener); } } // Protected methods /** * Passes the event on to all the listeners registered for ParseErrorEvents. * * @param theEvent The event to be handed to the listeners. */ protected void notifyParseErrorEvent(ParseErrorEvent theEvent) { Vector listeners; synchronized(this) { listeners = (Vector)mListeners.clone(); } for (int index = 0; index < listeners.size(); index++) { ParseErrorListener client = (ParseErrorListener)listeners.elementAt(index); client.BadLineParsed(theEvent); } } } -------------- next part -------------- /* * Created by IntelliJ IDEA. * User: lmorris * Date: Nov 14, 2003 * Time: 11:11:52 AM * To change template for new class use * Code Style | Class Templates options (Tools | IDE Options). 
 */
package org.biojava.bio.seq.io;

import java.util.Comparator;
import java.util.List;
import java.util.ArrayList;

public class EmblReferenceComparator implements Comparator {

    static final Comparator INSTANCE = new EmblReferenceComparator();

    private List tagOrder;

    {
        tagOrder = new ArrayList();
        tagOrder.add(EmblLikeFormat.REFERENCE_TAG);
        tagOrder.add(EmblLikeFormat.COORDINATE_TAG);
        tagOrder.add(EmblLikeFormat.REF_ACCESSION_TAG);
        tagOrder.add(EmblLikeFormat.AUTHORS_TAG);
        tagOrder.add(EmblLikeFormat.TITLE_TAG);
        tagOrder.add(EmblLikeFormat.JOURNAL_TAG);
        tagOrder.add(EmblLikeFormat.SEPARATOR_TAG);
    }

    public int compare(Object o1, Object o2) {
        int index1 = tagOrder.indexOf(o1);
        int index2 = tagOrder.indexOf(o2);

        return (index1 - index2);
    }
}
-------------- next part --------------
/*
 * BioJava development code
 *
 * This code may be freely distributed and modified under the
 * terms of the GNU Lesser General Public Licence. This should
 * be distributed with the code. If you do not have a copy,
 * see:
 *
 * http://www.gnu.org/copyleft/lesser.html
 *
 * Copyright for this code is held jointly by the individual
 * authors. These should be listed in @author doc comments.
 *
 * For more information on the BioJava project and its aims,
 * or to join the biojava-l mailing list, visit the home page
 * at:
 *
 * http://www.biojava.org/
 *
 */
package org.biojava.bio.seq.io;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 *

GenEmblPropertyComparator compares Genbank/EMBL * file format tags by the order in which they should appear in their * respective formats.

* *

EMBL tags sort before Genbank tags. This is arbitrary. Given the * subtle differences in the values accompanying equivalent tags in * these formats the two sets shouldn't be mixed anyway.

* *

Any tags which belong to neither set sort before anything * else.

* * @author Keith James */ final class GenEmblPropertyComparator implements Comparator { static final Comparator INSTANCE = new GenEmblPropertyComparator(); private List tagOrder; private GenEmblPropertyComparator() { tagOrder = new ArrayList(); tagOrder.add(EmblLikeFormat.ID_TAG); tagOrder.add(EmblLikeFormat.ACCESSION_TAG); tagOrder.add(EmblLikeFormat.VERSION_TAG); tagOrder.add(EmblLikeFormat.DATE_TAG); tagOrder.add(EmblLikeFormat.DEFINITION_TAG); tagOrder.add(EmblLikeFormat.KEYWORDS_TAG); tagOrder.add(EmblLikeFormat.SOURCE_TAG); tagOrder.add(EmblLikeFormat.ORGANISM_TAG); /*tagOrder.add(EmblLikeFormat.REFERENCE_TAG); tagOrder.add(EmblLikeFormat.COORDINATE_TAG); tagOrder.add(EmblLikeFormat.REF_ACCESSION_TAG); tagOrder.add(EmblLikeFormat.AUTHORS_TAG); tagOrder.add(EmblLikeFormat.TITLE_TAG); tagOrder.add(EmblLikeFormat.JOURNAL_TAG);*/ tagOrder.add(ReferenceAnnotation.class); tagOrder.add(EmblLikeFormat.DR_TAG);//lorna:added 21.08.03 tagOrder.add(EmblLikeFormat.COORDINATE_TAG); tagOrder.add(EmblLikeFormat.REF_ACCESSION_TAG); tagOrder.add(EmblLikeFormat.AUTHORS_TAG); tagOrder.add(EmblLikeFormat.TITLE_TAG); tagOrder.add(EmblLikeFormat.JOURNAL_TAG); tagOrder.add(EmblLikeFormat.COMMENT_TAG); tagOrder.add(EmblLikeFormat.FEATURE_TAG); tagOrder.add(GenbankFormat.LOCUS_TAG); tagOrder.add(GenbankFormat.SIZE_TAG); tagOrder.add(GenbankFormat.STRAND_NUMBER_TAG); tagOrder.add(GenbankFormat.TYPE_TAG); tagOrder.add(GenbankFormat.CIRCULAR_TAG); tagOrder.add(GenbankFormat.DIVISION_TAG); tagOrder.add(GenbankFormat.DATE_TAG); tagOrder.add(GenbankFormat.DEFINITION_TAG); tagOrder.add(GenbankFormat.ACCESSION_TAG); tagOrder.add(GenbankFormat.VERSION_TAG); tagOrder.add(GenbankFormat.GI_TAG); tagOrder.add(GenbankFormat.KEYWORDS_TAG); tagOrder.add(GenbankFormat.SOURCE_TAG); tagOrder.add(GenbankFormat.ORGANISM_TAG); tagOrder.add(GenbankFormat.REFERENCE_TAG); tagOrder.add(GenbankFormat.AUTHORS_TAG); tagOrder.add(GenbankFormat.TITLE_TAG); tagOrder.add(GenbankFormat.JOURNAL_TAG); 
        tagOrder.add(GenbankFormat.COMMENT_TAG);
        tagOrder.add(GenbankFormat.FEATURE_TAG);
    }

    public int compare(Object o1, Object o2) {
        int index1 = tagOrder.indexOf(o1);
        int index2 = tagOrder.indexOf(o2);

        return (index1 - index2);
    }
}
-------------- next part --------------
/*
 * Created by IntelliJ IDEA.
 * User: lmorris
 * Date: Nov 14, 2003
 * Time: 11:45:41 AM
 * To change template for new class use
 * Code Style | Class Templates options (Tools | IDE Options).
 */
package org.biojava.bio.seq.io;

import org.biojava.bio.AbstractAnnotation;
import org.biojava.utils.ChangeVetoException;

import java.util.Map;
import java.util.HashMap;

public class ReferenceAnnotation extends AbstractAnnotation {

    /**
     * The properties map. This may be null if no property values have
     * yet been set.
     */
    private Map properties;

    public ReferenceAnnotation() {
        super();
        try {
            System.out.println("Calling refAnnot");
            // all references have an empty XX line
            this.setProperty(EmblLikeFormat.SEPARATOR_TAG, "");
        } catch (ChangeVetoException e) {
            e.printStackTrace();
        }
    }

    protected Map getProperties() {
        if (!propertiesAllocated()) {
            properties = new HashMap();
        }
        return properties;
    }

    protected boolean propertiesAllocated() {
        return properties != null;
    }
}
-------------- next part --------------
/*
 * BioJava development code
 *
 * This code may be freely distributed and modified under the
 * terms of the GNU Lesser General Public Licence. This should
 * be distributed with the code. If you do not have a copy,
 * see:
 *
 * http://www.gnu.org/copyleft/lesser.html
 *
 * Copyright for this code is held jointly by the individual
 * authors. These should be listed in @author doc comments.
* * For more information on the BioJava project and its aims, * or to join the biojava-l mailing list, visit the home page * at: * * http://www.biojava.org/ * */ package org.biojava.bio.seq.io; import java.util.*; import org.biojava.bio.Annotation; import org.biojava.bio.BioError; import org.biojava.bio.seq.Feature; import org.biojava.bio.seq.FeatureHolder; import org.biojava.bio.seq.Sequence; import org.biojava.bio.symbol.IllegalAlphabetException; import org.biojava.bio.symbol.Symbol; /** * SeqIOEventEmitter is a utility class which scans a * Sequence object and sends events describing its * constituent data to a SeqIOListener. The listener * should be able to reconstruct the Sequence from these * events. * * @author Keith James * @since 1.2 */ class SeqIOEventEmitter { private static Symbol [] symProto = new Symbol [0]; private Comparator seqPropComparator; private Comparator refPropComparator; private Comparator featureComparator; SeqIOEventEmitter(Comparator seqPropComparator, Comparator featureComparator) { this.seqPropComparator = seqPropComparator; this.featureComparator = featureComparator; }; /** * getSeqIOEvents scans a Sequence * object and sends events describing its data to the * SeqIOListener. * * @param seq a Sequence. * @param listener a SeqIOListener. 
*/ void getSeqIOEvents(Sequence seq, SeqIOListener listener) { try { // Inform listener of sequence start listener.startSequence(); // Pass name to listener listener.setName(seq.getName()); // Pass URN to listener listener.setURI(seq.getURN()); // Pass sequence properties to listener Annotation a = seq.getAnnotation(); List sKeys = new ArrayList(a.keys()); Collections.sort(sKeys, seqPropComparator); for (Iterator ki = sKeys.iterator(); ki.hasNext();) { Object key = ki.next(); if ( key.equals(ReferenceAnnotation.class)) { ArrayList references = null; if (a.getProperty(key) instanceof ArrayList) { references = ((ArrayList)a.getProperty(key)); } if (references != null) { for ( int i = 0; i < references.size(); i++ ) { ReferenceAnnotation refAnnot = (ReferenceAnnotation)references.get(i); Map referenceLines = refAnnot.getProperties(); List refKeys = new ArrayList(referenceLines.keySet()); refPropComparator = EmblReferenceComparator.INSTANCE; Collections.sort(refKeys, refPropComparator); for (Iterator kit = refKeys.iterator(); kit.hasNext();) { Object refKey = kit.next(); //adds all the R* tags and final XX tag listener.addSequenceProperty(refKey, refAnnot.getProperty(refKey)); } } } } else { if (!(key.equals(EmblLikeFormat.SEPARATOR_TAG))) { //lorna: ignore XX listener.addSequenceProperty(key, a.getProperty(key)); } } } // Recurse through sub feature tree, flattening it for // EMBL List subs = getSubFeatures(seq); Collections.sort(subs, featureComparator); // Put the source features first for EMBL for (Iterator fi = subs.iterator(); fi.hasNext();) { // The template is required to call startFeature Feature.Template t = ((Feature) fi.next()).makeTemplate(); // Inform listener of feature start listener.startFeature(t); // Pass feature properties (i.e. 
qualifiers to // listener) // FIXME: this will drop all non-comparable keys List fKeys = comparableList(t.annotation.keys()); Collections.sort(fKeys); for (Iterator ki = fKeys.iterator(); ki.hasNext();) { Object key = ki.next(); listener.addFeatureProperty(key, t.annotation.getProperty(key)); } // Inform listener of feature end listener.endFeature(); } // Add symbols listener.addSymbols(seq.getAlphabet(), (Symbol []) seq.toList().toArray(symProto), 0, seq.length()); // Inform listener of sequence end listener.endSequence(); } catch (IllegalAlphabetException iae) { // This should never happen as the alphabet is being used // by this Sequence instance throw new BioError("An internal error occurred processing symbols",iae); } catch (ParseException pe) { throw new BioError("An internal error occurred creating SeqIO events",pe); } } /** * getSubFeatures is a recursive method which returns * a list of all Features within a * FeatureHolder. * * @param fh a FeatureHolder. * * @return a List. */ private static List getSubFeatures(FeatureHolder fh) { List subfeat = new ArrayList(); for (Iterator fi = fh.features(); fi.hasNext();) { FeatureHolder sfh = (FeatureHolder) fi.next(); subfeat.addAll((Collection) getSubFeatures(sfh)); subfeat.add(sfh); } return subfeat; } private List comparableList(Collection coll) { ArrayList res = new ArrayList(); for(Iterator i = coll.iterator(); i.hasNext(); ) { Object o = i.next(); if(o instanceof Comparable) { res.add(o); } } return res; } } From mark.schreiber at agresearch.co.nz Mon Nov 24 16:01:54 2003 From: mark.schreiber at agresearch.co.nz (Schreiber, Mark) Date: Mon Nov 24 16:08:36 2003 Subject: [Biojava-l] EmblFileFormer Message-ID: Hi Lorna, This is really good. Writing files correctly has been a weak point in biojava. I have committed your changes to the 1.3.1 branch and I will put them on the biojava-live branch shortly. If there are any volunteers the writing of GenPept and SwissProt files also sucks badly. 
International fame and adoration await the person who fixes them :)

- mark

> -----Original Message-----
> From: Lorna Morris [mailto:lmorris@ebi.ac.uk]
> Sent: Tuesday, 25 November 2003 3:38 a.m.
> To: biojava-l@biojava.org
> Subject: [Biojava-l] EmblFileFormer
>
> Hello
>
> I'm using biojava to parse an EMBL flat file, modify it, and dump it
> out to file at the end. However, when I used SeqIOTools.writeEmbl, the
> file created did not have correctly ordered and nested RN, RP, RX, RA,
> RT and RL lines. These lines should occur in repeated sets, one set
> for each reference in the flat file. I've modified some of the biojava
> classes and added 2 new classes to correct this. Everything works fine
> now. I'm attaching the classes to this mail.
>
> Files modified:
>
> EmblLikeFormat
> EmblFileFormer
> SeqIOEventEmitter
> GenEmblPropertyComparator
>
> Files added:
>
> ReferenceAnnotation.java
> EmblReferenceComparator.java
>
> If you need any more details on the changes I've made, let me know.
> Thanks,
>
> Lorna

From mark.schreiber at agresearch.co.nz Mon Nov 24 17:25:48 2003
From: mark.schreiber at agresearch.co.nz (Schreiber, Mark)
Date: Mon Nov 24 17:32:19 2003
Subject: [Biojava-l] BioJava in the news
Message-ID:

Hey - James Gosling knows we exist!
http://bio.oreilly.com/news/gosling.html

From ben at schoolid.com Tue Nov 25 03:42:35 2003
From: ben at schoolid.com (Ben Good)
Date: Tue Nov 25 03:49:09 2003
Subject: [Biojava-l] anger error
Message-ID: <511887DA-1F23-11D8-99C7-000393C45566@schoolid.com>

Hi,

I'm trying to implement this bit from biojava in anger ("How do I count
the residues in a sequence"):

    Count counts = new IndexedCount((FiniteAlphabet)seq.getAlphabet());
    //iterate through the Symbols in seq
    for (Iterator i = seq.iterator(); i.hasNext();){
      AtomicSymbol sym = (AtomicSymbol)i.next();
      counts.increaseCount(sym,1.0);
    }

It compiles but gives a class cast exception when I try to run it.
It won't accept (AtomicSymbol)i.next();

It seems that seq.iterator() returns an iterator over Symbols and not
AtomicSymbols?

Any ideas?

thanks
-Ben

From mark.schreiber at agresearch.co.nz Tue Nov 25 16:32:14 2003
From: mark.schreiber at agresearch.co.nz (Schreiber, Mark)
Date: Tue Nov 25 16:38:50 2003
Subject: [Biojava-l] anger error
Message-ID:

Hi Ben -

If your Sequence contains any ambiguous Symbols (e.g. for DNA n, y, r, w
etc.), they will not be AtomicSymbols; they will be BasisSymbols.
BasisSymbols are made up of one or more AtomicSymbols. If this is the
case you need to use Solution 2 from the same page
(http://www.biojava.org/docs/bj_in_anger/CountResidues.htm).
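
The distinction above can be illustrated with a minimal sketch in plain
Java. It deliberately does not use the BioJava API and is not the
cookbook's Solution 2; the class name ResidueCountSketch, its methods and
the '?' bucket are all hypothetical names chosen for this illustration.
The point is only that each residue is tested before it is counted, so an
ambiguity code never reaches code that assumes an unambiguous base.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical illustration; nothing here is part of the BioJava API.
 * It tallies DNA residues from a plain string and checks what kind of
 * character it has before counting it, instead of assuming that every
 * residue is one of the four unambiguous bases.
 */
final class ResidueCountSketch {

    /** The four unambiguous DNA bases. */
    private static final String UNAMBIGUOUS = "acgt";

    /** Assumed bucket under which every other character (n, y, r, w, ...) is tallied. */
    static final char AMBIGUOUS_KEY = '?';

    /** Returns true if the character is a, c, g or t (case-insensitive). */
    static boolean isUnambiguous(char base) {
        return UNAMBIGUOUS.indexOf(Character.toLowerCase(base)) >= 0;
    }

    /** Counts each unambiguous base under its own key and everything else under '?'. */
    static Map<Character, Integer> count(String dna) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (int i = 0; i < dna.length(); i++) {
            char base = Character.toLowerCase(dna.charAt(i));
            // Test first rather than assume: this is the plain-Java
            // analogue of checking the symbol type before casting it.
            char key = isUnambiguous(base) ? base : AMBIGUOUS_KEY;
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Four unambiguous bases followed by four ambiguity codes.
        System.out.println(count("acgtnnyr"));
    }
}
```

Whether ambiguous residues should be pooled like this, skipped, or shared
out among the bases they could stand for depends on what the counts are
used for; the sketch pools them only because that is the simplest choice.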
Actually, Solution 2 is the better of the two as it is more flexible.

If this still doesn't work, send me the sequence and I'll take a look.

- Mark

> -----Original Message-----
> From: Ben Good [mailto:ben@schoolid.com]
> Sent: Tuesday, 25 November 2003 9:43 p.m.
> To: biojava-l@biojava.org
> Subject: [Biojava-l] anger error
>
> Hi,
>
> Trying to implement this bit from biojava in anger ("How do I count
> the residues in a sequence").
>
> Count counts = new IndexedCount((FiniteAlphabet)seq.getAlphabet());
> //iterate through the Symbols in seq
> for (Iterator i = seq.iterator(); i.hasNext();){
>   AtomicSymbol sym = (AtomicSymbol)i.next();
>   counts.increaseCount(sym,1.0);
> }
>
> It compiles but gives a class cast exception when I try to run it.
> Won't accept (AtomicSymbol)i.next();
>
> It seems that seq.iterator() returns an iterator over Symbols and not
> AtomicSymbols?
>
> any ideas?
>
> thanks
> -Ben

From wux at mail.cbi.pku.edu.cn Tue Nov 25 20:42:25 2003
From: wux at mail.cbi.pku.edu.cn (wux@mail.cbi.pku.edu.cn)
Date: Tue Nov 25 20:52:49 2003
Subject: [Biojava-l] chinese version of biojava in anger's website
Message-ID: <200311260146.hAQ1k1AY002411@mail.cbi.pku.edu.cn>

Dear all:

The Chinese version of biojava in anger is located at
http://www.cbi.pku.edu.cn/chinese/documents/PUMA/biojava/index-cn.html

It can now be viewed from outside China. I hope you enjoy it.

PS: Mark, would you like to add a link in biojava in anger? Thanks.

Yours faithfully,
wux
wux@mail.cbi.pku.edu.cn
2003-11-26

*****************************************************
WuXin Ph.D student of CBI (Center of Bioinformatics)
Peking University 100871 P.R.China
Email: wux@mail.cbi.pku.edu.cn
Tel: 010-62762409 (dorm)
     010-62755206 (office)
Address: Building 47#2026 Peking University
*****************************************************

From mark.schreiber at agresearch.co.nz Tue Nov 25 20:52:24 2003
From: mark.schreiber at agresearch.co.nz (Schreiber, Mark)
Date: Tue Nov 25 20:59:10 2003
Subject: [Biojava-l] chinese version of biojava in anger's website
Message-ID:

Hi -

Thanks for this. I will put a link from biojava in anger as soon as I
sort out some file permissions problems I am having with the open-bio
server. Also, do you happen to know which font is required for viewing
Chinese characters so I can add this information too?

Thanks

Mark

> -----Original Message-----
> From: wux@mail.cbi.pku.edu.cn [mailto:wux@mail.cbi.pku.edu.cn]
> Sent: Wednesday, 26 November 2003 2:42 p.m.
> To: biojava-l@biojava.org > Subject: [Biojava-l] chinese version of biojava in anger's website > > > Dear all: > > The chinese version of biojava in anger is located at > http://www.cbi.pku.edu.cn/chinese/documents/PUMA/biojava/index-cn.html > > Now, it is ok to see it out of china. I hope you can enjoy it. > PS: Mark, would you like to add a link in biojava in anger? Thanks. > > > ¡¡¡¡ > > ¡¡¡¡¡¡¡¡¡¡¡¡ > Yours faithfully, > ¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡ wux > ¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡ wux@mail.cbi.pku.edu.cn > ¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡¡ 2003-11-26 > ***************************************************** > WuXin Ph.D student of CBI (Center of Bioinformatics) > Peking University 100871 P.R.China > Email: wux@mail.cbi.pku.edu.cn > Tel: 010-62762409 (dorm) > 010-62755206 (office) > Address: Building 47#2026 Peking University > ***************************************************** > > > _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l > ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
=======================================================================

From daviddebeule at pandora.be Wed Nov 26 16:00:48 2003
From: daviddebeule at pandora.be (david de beule)
Date: Wed Nov 26 18:42:57 2003
Subject: [Biojava-l] Strang change of Location class
Message-ID: <001c01c3b460$5f08d3e0$0100a8c0@davidpc>

Hi,

This piece of code:

//making a sequence
Alphabet dna = DNATools.getDNA();
SymbolTokenization dnaToke = dna.getTokenization("token");
SymbolList seq0 = new SimpleSymbolList(dnaToke, "ACTGGACCTAAGG");
Sequence sequence0 = new SimpleSequence(seq0, "test", "test", null);

//adding a feature with a between location
StrandedFeature.Template templ = new StrandedFeature.Template();
templ.annotation = Annotation.EMPTY_ANNOTATION;
templ.location = new BetweenLocation(new RangeLocation(7,8));
templ.source = "my feature";
templ.strand = StrandedFeature.POSITIVE;
templ.type = "interesting motif";
sequence0.createFeature(templ);

Iterator iter = sequence0.features();
while (iter.hasNext()) {
    Feature feature = (Feature)iter.next();
    Location location = feature.getLocation();
    System.out.println("orginal feature location: " + location.getClass());
}

//converting to a simplegappedsequence
SimpleGappedSequence _sequence = new SimpleGappedSequence(sequence0);
iter = _sequence.features();
while (iter.hasNext()) {
    Feature feature = (Feature)iter.next();
    Location location = feature.getLocation();
    System.out.println("new feature location: " + location.getClass());
}

Gives me the following output:

orginal feature location class org.biojava.bio.symbol.BetweenLocation
new feature location: class org.biojava.bio.symbol.RangeLocation

Why is the feature location changed from BetweenLocation to RangeLocation during the conversion ??

Any help would be appreciated,

David De Beule

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://portal.open-bio.org/pipermail/biojava-l/attachments/20031126/9f3d2d9f/attachment-0001.htm

From cox at mshri.on.ca Wed Nov 26 21:14:40 2003
From: cox at mshri.on.ca (Brian Cox)
Date: Wed Nov 26 18:43:21 2003
Subject: [Biojava-l] weightmatrix annotator
Message-ID: <009301c3b48c$39daaf40$61627026@rossdell>

Hello,

Does the current method allow, or is there a method that allows, multiple weight matrix annotations on the same sequence? I am currently annotating the sequence, pulling the annotation off into a list, then annotating with the next matrix, and so on. Is there a good way of iterating through all matrices and annotating the sequence without deleting the previous annotation? Perhaps it does this already and I did something wrong?

later,
Brian Cox
Samuel Lunenfeld Research Institute
Mount Sinai Hospital, Rm 884
Toronto, Ontario
Canada

416-586-8266

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://portal.open-bio.org/pipermail/biojava-l/attachments/20031126/9e23259d/attachment.htm

From wux at mail.cbi.pku.edu.cn Wed Nov 26 21:00:03 2003
From: wux at mail.cbi.pku.edu.cn (wux@mail.cbi.pku.edu.cn)
Date: Wed Nov 26 21:59:12 2003
Subject: [Biojava-l] How soon can we get a book of biojava?
Message-ID: <200311270203.hAR23PAY010939@mail.cbi.pku.edu.cn>

Dear all:

As Mark said, James Gosling knows BioJava exists. I found two books from O'Reilly: "Beginning Perl for Bioinformatics" and "Mastering Perl for Bioinformatics". I hope "Beginning Java for Bioinformatics" and "Mastering Java for Bioinformatics" become available as soon as possible. Has the BioJava team thought about it?

Yours faithfully,
wux
wux@mail.cbi.pku.edu.cn
2003-11-27

*****************************************************
WuXin Ph.D student of CBI (Center of Bioinformatics)
Peking University 100871 P.R.China
Email: wux@mail.cbi.pku.edu.cn
Tel: 010-62762409 (dorm)
     010-62755206 (office)
Address: Building 47#2026 Peking University
*****************************************************

From david.huen at ntlworld.com Thu Nov 27 02:33:31 2003
From: david.huen at ntlworld.com (David Huen)
Date: Thu Nov 27 02:40:00 2003
Subject: [Biojava-l] How soon can we get a book of biojava?
In-Reply-To: <200311270203.hAR23PAY010939@mail.cbi.pku.edu.cn>
References: <200311270203.hAR23PAY010939@mail.cbi.pku.edu.cn>
Message-ID: <200311270733.32777.david.huen@ntlworld.com>

On Thursday 27 Nov 2003 2:00 am, wux@mail.cbi.pku.edu.cn wrote:
> Dear all:
>
> As Mark said, James Gosling knows BioJava exists. I found two books from
> O'Reilly: "Beginning Perl for Bioinformatics" and "Mastering Perl for
> Bioinformatics". I hope "Beginning Java for Bioinformatics" and
> "Mastering Java for Bioinformatics" become available as soon as possible.
> Has the BioJava team thought about it?
>

I believe that Matthew Pocock (and someone else too?) was commissioned to write one and has been busy doing so.

Rgds,
David Huen

From matthew_pocock at yahoo.co.uk Thu Nov 27 10:32:42 2003
From: matthew_pocock at yahoo.co.uk (Matthew Pocock)
Date: Thu Nov 27 10:48:28 2003
Subject: [Biojava-l] Strang change of Location class
In-Reply-To: <001c01c3b460$5f08d3e0$0100a8c0@davidpc>
References: <001c01c3b460$5f08d3e0$0100a8c0@davidpc>
Message-ID: <3FC6191A.4040803@yahoo.co.uk>

Hehe. Subtle are the ways of wizards. Deep in the guts of BioJava there is some magic that makes locations behave reasonably, even if you do impolite things like ask them to project across gaps or into assemblies.
This was not taking into account the other magic that makes BetweenLocation and CircularLocation behave. The code in CVS should handle this now. Well spotted.

Matthew

> Gives me the following output:
>
> orginal feature location class org.biojava.bio.symbol.BetweenLocation
> new feature location: class org.biojava.bio.symbol.RangeLocation
>
> Why is the feature location changed from BetweenLocation to
> RangeLocation during the conversion ??
>
> Any help would be appreciated,
>
> David De Beule
>
>------------------------------------------------------------------------
>
>_______________________________________________
>Biojava-l mailing list - Biojava-l@biojava.org
>http://biojava.org/mailman/listinfo/biojava-l
>

From tvavouri at hotmail.com Thu Nov 27 11:14:47 2003
From: tvavouri at hotmail.com (Tanya Vavouri)
Date: Thu Nov 27 11:21:13 2003
Subject: [Biojava-l] weightmatrix annotator
Message-ID:

Hi Brian,

I have started using biojava to annotate sequences with weight matrices and have had the same problem. Basically, using the WeightMatrixAnnotator class I can annotate a sequence with multiple weight matrices, but the problem is that when I then look at the sequence features, I can't tell which feature corresponds to which weight matrix.

As a solution, I've modified the WeightMatrixAnnotator class so that the constructor can also take a String argument, which is the ID of my weight matrix, and then the class saves that string as the Feature.type (instead of "hit"). I also thought that it would be good for the WeightMatrixAnnotator to accept a database of weight matrices so that it can neatly annotate a sequence with many matrices.

Does anyone know if this can already be done with some other biojava classes or whether someone is already working on this? If not, would it be worth me sending to biojava some classes that I've written to deal with these tasks?
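
The idea described above (scan with several weight matrices and record each matrix's ID as the type of the hits it produces) can be sketched without BioJava. This is only an illustration under stated assumptions: `MatrixHit`, `MultiMatrixScanner`, `consensusMatrix`, and `annotate` are hypothetical names, not BioJava classes or methods, and the score here is a plain product of column probabilities rather than whatever WeightMatrixAnnotator computes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** One hit of a named matrix on a sequence; start is 1-based. Hypothetical type. */
final class MatrixHit {
    final String matrixId; // plays the role of the feature type, instead of "hit"
    final int start;
    final double score;

    MatrixHit(String matrixId, int start, double score) {
        this.matrixId = matrixId;
        this.start = start;
        this.score = score;
    }

    @Override
    public String toString() {
        return matrixId + "@" + start + " (score " + score + ")";
    }
}

/** Hypothetical scanner that keeps the hits of every matrix side by side. */
final class MultiMatrixScanner {
    private static final String BASES = "ACGT";

    /** Illustration helper: a matrix that strongly prefers the given consensus. */
    static double[][] consensusMatrix(String consensus) {
        double[][] m = new double[consensus.length()][BASES.length()];
        for (int col = 0; col < m.length; col++) {
            int preferred = BASES.indexOf(consensus.charAt(col));
            if (preferred < 0) {
                throw new IllegalArgumentException("Not a DNA base: " + consensus.charAt(col));
            }
            for (int b = 0; b < BASES.length(); b++) {
                m[col][b] = (b == preferred) ? 0.97 : 0.01;
            }
        }
        return m;
    }

    /** Product of the column probabilities for the window starting at offset (0-based). */
    static double score(double[][] matrix, String seq, int offset) {
        double p = 1.0;
        for (int col = 0; col < matrix.length; col++) {
            int base = BASES.indexOf(seq.charAt(offset + col));
            if (base < 0) {
                return 0.0; // unknown symbol: treat the window as a non-hit
            }
            p *= matrix[col][base];
        }
        return p;
    }

    /** Scans seq with every matrix; earlier hits are never removed by later matrices. */
    static List<MatrixHit> annotate(String seq, Map<String, double[][]> matrices, double threshold) {
        List<MatrixHit> hits = new ArrayList<>();
        for (Map.Entry<String, double[][]> entry : matrices.entrySet()) {
            double[][] matrix = entry.getValue();
            for (int off = 0; off + matrix.length <= seq.length(); off++) {
                double s = score(matrix, seq, off);
                if (s >= threshold) {
                    hits.add(new MatrixHit(entry.getKey(), off + 1, s));
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, double[][]> matrices = new LinkedHashMap<>();
        matrices.put("TATA", consensusMatrix("TATA"));
        matrices.put("GG", consensusMatrix("GG"));
        for (MatrixHit hit : annotate("ACTATAGGC", matrices, 0.5)) {
            System.out.println(hit);
        }
    }
}
```

Keying the matrices by ID in a map stands in for the "database of weight matrices" mentioned above; because every hit carries its matrix ID, the hits can later be filtered per matrix without re-scanning.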

Tanya Vavouri
Graduate Student
Comparative Genomics Group
MRC HGMP-RC
Hinxton
Cambridge CB10 1SB
UK

>From: "Brian Cox"
>To:
>Subject: [Biojava-l] weightmatrix annotator
>Date: Wed, 26 Nov 2003 18:14:40 -0800
>
>Hello,
>Does the current method or is there a method that lets multiple weight
>matrix annotations be on the same sequence. I currently am annotating the
>sequence then pulling the annotation off into a list then annotating with
>the next matrix etc., is there a good way of iterating through all
>matrices, annotating the sequence with out deleting the annotation previous
>annotation? Perhaps it does this already and I did something wrong?
>
>later,
>Brian Cox
>Samuel Lunenfeld Research Institute
>Mount Sinai Hospital, Rm 884
>Toronto, Ontario
>Canada
>
>416-586-8266
>_______________________________________________
>Biojava-l mailing list - Biojava-l@biojava.org
>http://biojava.org/mailman/listinfo/biojava-l

From verhoeff2 at gis.a-star.edu.sg Thu Nov 27 22:34:07 2003
From: verhoeff2 at gis.a-star.edu.sg (VERHOEF Frans)
Date: Fri Nov 28 12:25:51 2003
Subject: [Biojava-l] PhredFormat
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D560B0694@BIONIC.biopolis.one-north.com>

Skipped content of type multipart/alternative
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PhredFormat.java
Type: application/octet-stream
Size: 9750 bytes
Desc: PhredFormat.java
Url : http://portal.open-bio.org/pipermail/biojava-l/attachments/20031128/65d8a81b/PhredFormat-0001.obj

From mark.schreiber at agresearch.co.nz Sun Nov 30 17:19:12 2003
From: mark.schreiber at agresearch.co.nz (Schreiber, Mark)
Date: Sun Nov 30 17:25:47 2003
Subject: [Biojava-l] RE: [Biojava-dev] PhredFormat
Message-ID:

Hi Frans -

Thanks for these changes.
I have committed them to cvs and added "default" as a valid tokenization of IntegerAlphabet (as a synonym of "token").

- Mark

-----Original Message-----
From: VERHOEF Frans [mailto:verhoeff2@gis.a-star.edu.sg]
Sent: Friday, 28 November 2003 4:34 p.m.
To: biojava-dev@biojava.org; biojava-l@biojava.org
Subject: [Biojava-dev] PhredFormat

Hi,

I have fixed the little bugs in PhredFormat that have been bugging me for the last 2 days. I have attached my fixed version. Feel free to use it, change it or throw it away.

In short, what I have changed is this:

- PhredFormat implements ParseErrorSource and ParseErrorListener. This was not much of a job, as I basically copied it from FastaFormat.
- readSequenceData(BufferedReader br, SymbolTokenization parser, SeqIOListener listener) has changed. This method used to parse char arrays for short number strings and feed them to the StreamParser, which in turn would try to do the same. Because the whitespace was removed in the process, the end result was a String representing a humongous number that the parser then tried to convert to an integer. Now this method does not parse the char arrays, but just feeds whole chunks of char array to the StreamParser.

One new issue came up though, when I try to do the following:

StreamReader qualityIter = PhredTools.readPhredQuality(new BufferedReader(new FileReader(phredQualityFile)));
while (qualityIter.hasNext()) {
    Sequence seq = qualityIter.nextSequence();
    String str = seq.seqString();
}

The last line gave the following exception:

java.util.NoSuchElementException: default parser not supported by IntegerAlphabet yet
    at org.biojava.bio.symbol.IntegerAlphabet.getTokenization(IntegerAlphabet.java:216)
    at org.biojava.bio.symbol.AbstractSymbolList.seqString(AbstractSymbolList.java:101)
    at org.biojava.bio.seq.impl.SimpleSequence.seqString(SimpleSequence.java:108)
    at org.gis.server.pipeline.apps.SequenceInfoParser.parseResults(SequenceInfoParser.java:82)

What happens is that SimpleSequence calls the AbstractSymbolList.seqString() method.
This method in turn executes getAlphabet().getTokenization("default"), where getAlphabet() returns the IntegerAlphabet. But IntegerAlphabet throws the exception here, because it accepts only the name "token" and not the "default" that AbstractSymbolList passes.

I do have a simple workaround: make the method IntegerAlphabet.getTokenization(String name) accept both "default" and "token". But I am not sure I completely understand the philosophy behind the design...

Kind regards,

Frans Verhoef
Bioinformatics Specialist
Genome Institute of Singapore
Genome, #02-01, 60 Biopolis Street, Singapore 138672
Tel: +65 6478 8000
DID: +65 6478 8060
HP: +65 9848 4325
Email: verhoeff2@gis.a-star.edu.sg
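
The workaround discussed in this thread, letting "default" resolve to the same tokenization as "token", can be sketched in isolation. The sketch below is a minimal illustration and not the BioJava implementation: `TokenizationRegistry`, `register`, `alias`, and `integerAlphabetLike` are hypothetical names, and a plain `IntFunction<String>` stands in for SymbolTokenization.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

/** Hypothetical name-to-tokenization lookup with synonym support. */
final class TokenizationRegistry {
    private final Map<String, IntFunction<String>> byName = new HashMap<>();
    private final Map<String, String> aliases = new HashMap<>();

    void register(String name, IntFunction<String> tokenization) {
        byName.put(name, tokenization);
    }

    /** Makes alias resolve to an already registered target name. */
    void alias(String alias, String target) {
        if (!byName.containsKey(target)) {
            throw new IllegalArgumentException("Unknown target tokenization: " + target);
        }
        aliases.put(alias, target);
    }

    /** Resolves synonyms first, then fails loudly if the name is still unknown. */
    IntFunction<String> getTokenization(String name) {
        String canonical = aliases.getOrDefault(name, name);
        IntFunction<String> tokenization = byName.get(canonical);
        if (tokenization == null) {
            throw new NoSuchElementException("No tokenization named '" + name + "'");
        }
        return tokenization;
    }

    /** Mimics a caller that always asks for "default", as seqString() does in the thread. */
    String seqString(int[] symbols) {
        IntFunction<String> tokenization = getTokenization("default");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < symbols.length; i++) {
            if (i > 0) {
                sb.append(' '); // keep the separator so the numbers do not run together
            }
            sb.append(tokenization.apply(symbols[i]));
        }
        return sb.toString();
    }

    /** A registry that knows "token" and treats "default" as its synonym. */
    static TokenizationRegistry integerAlphabetLike() {
        TokenizationRegistry registry = new TokenizationRegistry();
        registry.register("token", i -> Integer.toString(i));
        registry.alias("default", "token");
        return registry;
    }

    public static void main(String[] args) {
        System.out.println(integerAlphabetLike().seqString(new int[] {40, 38, 12}));
    }
}
```

Resolving the synonym inside the lookup means callers that hard-code "default" keep working unchanged, which is the effect of the fix committed above.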