From heuermh at acm.org Wed Jul 1 12:56:36 2009 From: heuermh at acm.org (Michael Heuer) Date: Wed, 1 Jul 2009 12:56:36 -0400 (EDT) Subject: [Biojava-dev] Singletons are bad In-Reply-To: <93b45ca50906300133w58109024vb89c6970a8446fed@mail.gmail.com> Message-ID: Mark Schreiber wrote: > I came across this today which is an interesting article about how > singletons seem like a good idea but after a while you realise they get you > into serious trouble. After playing with BioJava for over 10 years I > completely concur. Singletons and fly-weight objects are (IMHO) the most > serious problem in the BioJava code base and as the article predicts the BJ > code base is completely infected with them. > > The article is here: > http://tech.puredanger.com/2007/07/03/pattern-hate-singleton/ > > > But I have copied the paragraph below as it seems to offer a way out without > completely breaking everything. This should be seriously considered for > future BJ releases. > > ... paste starts here > But I already have a bunch of singletons in my code! > ... I've had good luck using Google Guice in several for-work projects: > http://code.google.com/p/google-guice/ @Inject is the new new as they say. :) michael From sylvain.foisy at diploide.net Thu Jul 2 09:12:44 2009 From: sylvain.foisy at diploide.net (Sylvain Foisy) Date: Thu, 02 Jul 2009 09:12:44 -0400 Subject: [Biojava-dev] Preliminary QBlast support in biojava-live Message-ID: Hi all, I just put some material into a new package (org.biojavax.bio.alignment) for creating a remote service for alignment with its implementation for QBlast. The philosophy for using these is this: - Create an implementation of RemotePairwiseAlignmentService for a specific remote service; - Create an implementation of RemotePairwiseAlignementProperties to set parameters for alignment; - Use the sendAlignmentRequest() method with a sequence with the implemented RemotePairwiseAlignementProperties to submit the sequence for alignmnent. 
- Retrieve the results with an implementation of RemotePairwiseAlignmentOutputProperties which specifies the format of the output to get from the service.

This is done so that submission of sequence and retrieval of results can be dissociated. I think that I have addressed most of the points of a few weeks back. If not, let me know ;-)

I created a demo in the demos folder.

Best regards

Sylvain

===================================================================
Sylvain Foisy, Ph. D.
Consultant Bio-informatique / Bioinformatics
Diploide.net - TI pour la vie / IT for Life
Courriel: sylvain.foisy at diploide.net
Web: http://www.diploide.net
Tel: (514) 893-4363
===================================================================

From aradwen at gmail.com Thu Jul 2 09:28:00 2009
From: aradwen at gmail.com (Radwen ANIBA)
Date: Thu, 2 Jul 2009 15:28:00 +0200
Subject: [Biojava-dev] Parsing Interpro results
Message-ID:

Hello everyone,

I looked around in the BioJava documentation and on the internet but I didn't find how to parse InterProScan results (XML as well as tabular formats).
It is not hard to code it in Java, but I just wanted to know if this exists or not.

Regards
Rad

From hunter at ebi.ac.uk Thu Jul 2 10:56:28 2009
From: hunter at ebi.ac.uk (Sarah Hunter)
Date: Thu, 02 Jul 2009 15:56:28 +0100
Subject: [Biojava-dev] Parsing Interpro results
In-Reply-To: <12c279870907020749s1f6b9890ub7090b89c473340c@mail.gmail.com>
References: <12c279870907020749s1f6b9890ub7090b89c473340c@mail.gmail.com>
Message-ID: <4A4CCA9C.1080404@ebi.ac.uk>

Hi Radwen (and the rest of the biojava guys),

As far as I am aware, there isn't a BioJava parser for InterProScan results. However, we are undergoing a complete rewrite of InterPro and InterProScan at the moment and it is our intention to provide a Java API for accessing all of our data. If you wish to be involved in testing this API, please contact the InterPro team via the EBI's support pages (http://www.ebi.ac.uk/support/).

Many thanks for your interest.
Sarah Hunter --- Sarah Hunter InterPro Team Leader European Bioinformatics Institute Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD, UK ===================================== > From: Radwen ANIBA > Date: Thu, Jul 2, 2009 at 2:28 PM > Subject: [Biojava-dev] Parsing Interpro results > To: biojava-dev at lists.open-bio.org > > > Hello everyone, > > I looked around in Biojava doc and through internet but I did'nt found how > to parse Interproscan results (xml as well as tabular formats) > It is not hard to code it in Java, But I just wanted to know if this exists > or not. > > Regards > Rad > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > From fbristow at gmail.com Tue Jul 7 09:34:09 2009 From: fbristow at gmail.com (Franklin Bristow) Date: Tue, 7 Jul 2009 08:34:09 -0500 Subject: [Biojava-dev] Extended ABIF Parser and SCF Writer Message-ID: <50a7756d0907070634s31051bc0t3410ea2e33fa0686@mail.gmail.com> Hi everyone, Now that you're all back from ISMB (I hope you all had a good time!) I thought it would be a good time to bring this up. A while back I wrote to the list about an ABIF parser and SCF writer that I had written. I got some pointers on things to change and I've since made the suggested changes. Now I was wondering how I should go about getting these files into BioJava.... -- Franklin From andreas at sdsc.edu Tue Jul 7 12:51:05 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Tue, 7 Jul 2009 09:51:05 -0700 Subject: [Biojava-dev] Extended ABIF Parser and SCF Writer In-Reply-To: <50a7756d0907070634s31051bc0t3410ea2e33fa0686@mail.gmail.com> References: <50a7756d0907070634s31051bc0t3410ea2e33fa0686@mail.gmail.com> Message-ID: <59a41c430907070951r16e22bf3h7e83b16012cb2700@mail.gmail.com> Hi Franklin, The theme of the moment is modularization... I wonder if we should make a module for parsing the output of sequencers... 
This topic is also a bit related to the discussion we had around BOSC last week, how to contribute modules, and what is the role of a module maintainer. I will send out a more detailed summary on that a bit later. Andreas On Tue, Jul 7, 2009 at 6:34 AM, Franklin Bristow wrote: > Hi everyone, > Now that you're all back from ISMB (I hope you all had a good time!) I > thought it would be a good time to bring this up. > > A while back I wrote to the list about an ABIF parser and SCF writer that I > had written. I got some pointers on things to change and I've since made > the suggested changes. Now I was wondering how I should go about getting > these files into BioJava.... > > -- > Franklin > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From gmicha at gmail.com Tue Jul 7 13:08:07 2009 From: gmicha at gmail.com (Micha Sammeth) Date: Tue, 07 Jul 2009 19:08:07 +0200 Subject: [Biojava-dev] Extended ABIF Parser and SCF Writer In-Reply-To: <59a41c430907070951r16e22bf3h7e83b16012cb2700@mail.gmail.com> References: <50a7756d0907070634s31051bc0t3410ea2e33fa0686@mail.gmail.com> <59a41c430907070951r16e22bf3h7e83b16012cb2700@mail.gmail.com> Message-ID: <4A5380F7.6090602@gmail.com> Hi, the time I worked on SCF and ABI files is a bit ago, but lets see if I can contribute something here. Working since a year on NGS, I could imagine that readers for the standard-output of pipelines by Illumina & Co would also fit there. If there is a reader module, I would plead for a low-level interface for accessing sequences/qualities and re-usable data containers during I/O. But maybe that is rather an early stage to talk about that when not even the existence of the module is decided. cheers - micha. Andreas Prlic wrote: > Hi Franklin, > > The theme of the moment is modularization... I wonder if we should make a > module for parsing the output of sequencers... 
> > This topic is also a bit related to the discussion we had around BOSC last > week, how to contribute modules, and what is the role of a module > maintainer. I will send out a more detailed summary on that a bit later. > > Andreas > > > On Tue, Jul 7, 2009 at 6:34 AM, Franklin Bristow wrote: > >> Hi everyone, >> Now that you're all back from ISMB (I hope you all had a good time!) I >> thought it would be a good time to bring this up. >> >> A while back I wrote to the list about an ABIF parser and SCF writer that I >> had written. I got some pointers on things to change and I've since made >> the suggested changes. Now I was wondering how I should go about getting >> these files into BioJava.... >> >> -- >> Franklin >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From fbristow at gmail.com Tue Jul 7 13:29:56 2009 From: fbristow at gmail.com (Franklin Bristow) Date: Tue, 7 Jul 2009 12:29:56 -0500 Subject: [Biojava-dev] Extended ABIF Parser and SCF Writer In-Reply-To: <4A5380F7.6090602@gmail.com> References: <50a7756d0907070634s31051bc0t3410ea2e33fa0686@mail.gmail.com> <59a41c430907070951r16e22bf3h7e83b16012cb2700@mail.gmail.com> <4A5380F7.6090602@gmail.com> Message-ID: <50a7756d0907071029s45ee0983y85f2ee307765e65c@mail.gmail.com> Hello, I think I like the idea of having a module for the I/O of sequencers in general. I really only have familiarity with ABI sequencers (ie: 31xx and 37xx) and the data that they spit out, so I would be able to offer some help there. Needless to say, the documentation that ABI released regarding their binary format was much appreciated when I was going through the code. 
To Micha: when you talk about 'during I/O', do you mean having some kind of an event based parser? When I wrote my extended ABIF parser I modelled it after the perl module Bio::Trace::ABIF, so there are accessors for many of the tags that are defined in the ABI spec. On Tue, Jul 7, 2009 at 12:08 PM, Micha Sammeth wrote: > Hi, > > the time I worked on SCF and ABI files is a bit ago, but lets see if I can > contribute something here. Working since a year on NGS, I could imagine that > readers for the standard-output of pipelines by Illumina & Co would also fit > there. > > If there is a reader module, I would plead for a low-level interface for > accessing sequences/qualities and re-usable data containers during I/O. But > maybe that is rather an early stage to talk about that when not even the > existence of the module is decided. > > cheers - micha. > > > Andreas Prlic wrote: > >> Hi Franklin, >> >> The theme of the moment is modularization... I wonder if we should make a >> module for parsing the output of sequencers... >> >> This topic is also a bit related to the discussion we had around BOSC last >> week, how to contribute modules, and what is the role of a module >> maintainer. I will send out a more detailed summary on that a bit later. >> >> Andreas >> >> >> On Tue, Jul 7, 2009 at 6:34 AM, Franklin Bristow >> wrote: >> >> Hi everyone, >>> Now that you're all back from ISMB (I hope you all had a good time!) I >>> thought it would be a good time to bring this up. >>> >>> A while back I wrote to the list about an ABIF parser and SCF writer that >>> I >>> had written. I got some pointers on things to change and I've since made >>> the suggested changes. Now I was wondering how I should go about getting >>> these files into BioJava.... 
>>> >>> -- >>> Franklin >>> _______________________________________________ >>> biojava-dev mailing list >>> biojava-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >>> >>> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> > > > -- Franklin From sreekanth.m at ocimumbio.com Wed Jul 8 01:51:15 2009 From: sreekanth.m at ocimumbio.com (Sreekanth Mogullapally) Date: Wed, 8 Jul 2009 11:21:15 +0530 Subject: [Biojava-dev] Reg: Source of BioJava Message-ID: <2DDF09AFEB46E54894A3843CEF9CB3A4409F7CE20F@EXCHMB.ocimumbio.com> Dear all, Just now I started working with biojava to work in next generation sequencing. I got all the jar files to work with biojava, and i found many things which are very useful to me. I require sourde jar file of biojava. If anybody has it please send it to me. Thanks in advance. Thanks & Regards, Sreekanth.M From andreas at sdsc.edu Wed Jul 8 02:20:59 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Tue, 7 Jul 2009 23:20:59 -0700 Subject: [Biojava-dev] summary biojava user meeting Message-ID: <59a41c430907072320k3d5a4415u962d59a10d286beb@mail.gmail.com> Hi, Here a quick summary of the BioJava user meeting we had last week at the BOSC conference: The following people were present: Mattias Piipari Martijn Devisscher Frederik Decouttere Richard Holland Andreas Prlic The new modularized code base will allow for individual people to take over responsibility of some of the sub-modules as well as the contribution of new modules., which I both welcome greatly. As such it was great to have Mattias, Martijn and Frederik there and expressing their interest in this. Mattias is interested in contributing a new module related to machine learning. Martijn and Frederik are interested in providing a new GUI module (seqpad). 
Due to this our discussions were mainly related to how to organize the contribution of new modules and their maintenance:

* Before starting a new module the code should undergo public code review.
* New modules need documentation (wiki cookbook) and JUnit tests.
* A Module Maintainer (MM) is the main person responsible for everything related to the module.
* The MM coordinates patches and other user contributions for the module.
* The MM can write papers related to the code in the module without having to cite all of the other BioJava contributors.
* An MM volunteers to support the module for (at least) a year.
* All MMs will be listed by name on a wiki page in order to clarify responsibilities.

Andreas

From holland at eaglegenomics.com Wed Jul 8 12:41:54 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 08 Jul 2009 18:41:54 +0200
Subject: [Biojava-dev] Reg: Source of BioJava
In-Reply-To: <2DDF09AFEB46E54894A3843CEF9CB3A4409F7CE20F@EXCHMB.ocimumbio.com>
References: <2DDF09AFEB46E54894A3843CEF9CB3A4409F7CE20F@EXCHMB.ocimumbio.com>
Message-ID: <1247071314.3792.9.camel@buzzybee>

The source code can be obtained by following these instructions:

http://biojava.org/wiki/CVS_to_SVN_Migration

Richard.

On Wed, 2009-07-08 at 11:21 +0530, Sreekanth Mogullapally wrote:
> Dear all,
>
> Just now I started working with biojava to work in next generation sequencing.
> I got all the jar files to work with biojava, and I found many things which are very useful to me.
> I require the source jar file of biojava. If anybody has it please send it to me.
>
> Thanks in advance.
> > Thanks & Regards, > Sreekanth.M > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From florian.mittag at uni-tuebingen.de Thu Jul 9 11:16:12 2009 From: florian.mittag at uni-tuebingen.de (Florian Mittag) Date: Thu, 9 Jul 2009 17:16:12 +0200 Subject: [Biojava-dev] Problems in DB2 with VARCHAR, TEXT and CLOB using BioJava Message-ID: <200907091716.13639.florian.mittag@uni-tuebingen.de> Hi all! I'm posting this to both the BioSQL and the BioJava-dev mailinglist because the problem resides in both domains, I hope this is okay. We're working on getting BioJava to run with a DB2 Express-C backend for various reasons. We've encountered several problems during this task, but this one seems to have no real solution. When adapting the BioSQL schema to DB2, the official IBM conversion guide tells us to use the data type CLOB where MySQL uses TEXT. (Chapter 11 in ftp://ftp.software.ibm.com/software/data/db2/migration/mtk/mtk_2050.pdf) So far, no problem. But when we tried reading some genebank files with BioJava, the DB2 driver threw an exception: SQL0401N The data types of the operands for the operation "=" are not compatible. SQLSTATE=42818 SQLCODE=-401 Explanation: The class org.biojavax.bio.db.biosql.BioSQLRichObjectBuilder defines some Hibernate queries, of which one has the conditions: "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?" All three columns "authors", "location", and "title" are of type TEXT in MySQL and of type CLOB in DB2, so comparing them with "=" leads to the above error message. The way I see it, there are only two possible solutions to this problem: 1) Change the query to "from DocRef as cr where cr.authors LIKE '?' 
and cr.location LIKE '?' and cr.title LIKE '?'"

2) Change the data type to something comparable with "=", like VARCHAR.

Solution 1 is no real solution to me, because comparing values with "LIKE" is usually slow and it seems a bit odd to change a query that works with other databases just for DB2.

But taking a closer look, solution 2 has some problems, too:
Although VARCHARs in DB2 can theoretically have a length of 32767, in reality they are limited by the page size of the database, which can be 32K at maximum. Since this particular table "reference" has three columns of this type, the sum of their lengths must not exceed 32767, so they could only be something like VARCHAR(10000).

I have never encountered cases in which values come even close to the length of 10000, but you can never be sure. And that is why I post here. For me, the way to go is pretty clear, but we intend to be as compatible as possible with the original BioSQL.

Maybe you could give me some input on how to solve this problem with as few casualties as possible ;-)

Thanks,
Florian

From aradwen at gmail.com Fri Jul 10 10:45:35 2009
From: aradwen at gmail.com (Radwen ANIBA)
Date: Fri, 10 Jul 2009 16:45:35 +0200
Subject: [Biojava-dev] ExternalProcess class
Message-ID:

Hi everyone,

Does somebody have (as in the examples of the BioJava cookbook) a usage example of the ExternalProcess class?

Let's say we want to run a local clustalw program with it, is it possible?

Any example code?

Thank you
Radwen

From sylvain.foisy at diploide.net Fri Jul 10 12:43:09 2009
From: sylvain.foisy at diploide.net (Sylvain Foisy)
Date: Fri, 10 Jul 2009 12:43:09 -0400
Subject: [Biojava-dev] ExternalProcess class
In-Reply-To:
Message-ID:

Hi,

There is no such example to the best of my knowledge. If you do use this class, you are welcome to share your experience by contributing. As far as running something like clustalw, I don't see why you could not make use of this class.
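For illustration only, here is a minimal sketch of running a local clustalw binary from Java. It deliberately uses the JDK's own java.lang.ProcessBuilder rather than BioJava's ExternalProcess, whose exact API is not shown in this thread. The class name ClustalwRunner, the executable name "clustalw2", the file names, and the -INFILE=/-OUTFILE= options are all assumptions to adapt to your installation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of BioJava.
class ClustalwRunner {

    /** Builds the command line. The option names are assumptions about the local clustalw build. */
    static List<String> buildCommand(String executable, String inFile, String outFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add(executable);
        cmd.add("-INFILE=" + inFile);
        cmd.add("-OUTFILE=" + outFile);
        return cmd;
    }

    /** Runs the command, appends its merged stdout/stderr to 'output', and returns the exit code. */
    static int run(List<String> command, StringBuilder output)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // read stderr and stdout from a single stream
        Process process = pb.start();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            // Drain the output so the child process cannot block on a full pipe.
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Assumed executable and file names; replace with real paths.
        List<String> cmd = buildCommand("clustalw2", "input.fasta", "output.aln");
        StringBuilder log = new StringBuilder();
        int exitCode = run(cmd, log);
        System.out.println("clustalw exited with code " + exitCode);
        System.out.print(log);
    }
}
```

The alignment file written by clustalw would then be read back in a separate step; that part is left out here.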
You could actually build a wrapper class to execute clustalw and do something with its output. Best regards Sylvain On 10/07/09 12:00, "[NAME]" <[ADDRESS]> wrote: > Hi everyone, > > Do somebody have (as examples of biojava cookbook) a usage example of > ExternalProcess class ? > > Let's say we want to run a local clustalw program with it, is it possible ? > > Any example code ? > > Thank you =================================================================== Sylvain Foisy, Ph. D. Consultant Bio-informatique / Bioinformatics Diploide.net - TI pour la vie / IT for Life Courriel: sylvain.foisy at diploide.net Web: http://www.diploide.net Tel: (514) 893-4363 =================================================================== From hlapp at gmx.net Sat Jul 11 07:47:34 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 11 Jul 2009 07:47:34 -0400 Subject: [Biojava-dev] [BioSQL-l] Problems in DB2 with VARCHAR, TEXT and CLOB using BioJava In-Reply-To: <200907091716.13639.florian.mittag@uni-tuebingen.de> References: <200907091716.13639.florian.mittag@uni-tuebingen.de> Message-ID: <5614AEDA-3406-4844-8690-7653A2C4297C@gmx.net> Hi Florian: On Jul 9, 2009, at 11:16 AM, Florian Mittag wrote: > [...] > 2) Change the data type to something comparable with "=", like > VARCHAR. That's the way to go. The reason they are not VARCHAR in MySQL is because it is limited to 256 characters there. > [...] > Although VARCHARs in DB2 can have a length of theoretically 32767, > in reality > they are limited by the page size of the database, which can be 32K at > maximum. Since this particular table "reference" has three columns > of this > type, the sum of their lengths must not exceed 32767, so they could > only be > something like VARCHAR(10000). That sounds great though. You may have noticed that the columns are all of type VARCHAR in the Oracle version of the schema with the following widths: Title VARCHAR2(1000) Authors VARCHAR2(4000) Location VARCHAR2(512) That has always served me well. 
Feel free to use larger widths though if you think you need them. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From HWillis at scripps.edu Sat Jul 11 07:08:32 2009 From: HWillis at scripps.edu (Scooter Willis) Date: Sat, 11 Jul 2009 07:08:32 -0400 Subject: [Biojava-dev] ExternalProcess class In-Reply-To: References: , Message-ID: Biojava already has a Clustalw class that executes Clustalw as an external process if that is your larger goal. http://www.biojava.org/wiki/BioJava:Tutorial:MultiAlignClustalW Thanks Scooter ________________________________________ From: biojava-dev-bounces at lists.open-bio.org [biojava-dev-bounces at lists.open-bio.org] On Behalf Of Sylvain Foisy [sylvain.foisy at diploide.net] Sent: Friday, July 10, 2009 12:43 PM To: biojava-dev at lists.open-bio.org Subject: Re: [Biojava-dev] ExternalProcess class Hi, There is no such example to the best of my knowledge. If you do use this class, you are welcome to share your experience by contributing. As far as running something like clustalw, I don't see why you could not make use of this class. You could actually build a wrapper class to execute clustalw and do something with its output. Best regards Sylvain On 10/07/09 12:00, "[NAME]" <[ADDRESS]> wrote: > Hi everyone, > > Do somebody have (as examples of biojava cookbook) a usage example of > ExternalProcess class ? > > Let's say we want to run a local clustalw program with it, is it possible ? > > Any example code ? > > Thank you =================================================================== Sylvain Foisy, Ph. D. 
Consultant Bio-informatique / Bioinformatics Diploide.net - TI pour la vie / IT for Life Courriel: sylvain.foisy at diploide.net Web: http://www.diploide.net Tel: (514) 893-4363 =================================================================== _______________________________________________ biojava-dev mailing list biojava-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-dev From paolo.pavan at gmail.com Sun Jul 12 17:41:05 2009 From: paolo.pavan at gmail.com (Paolo Pavan) Date: Sun, 12 Jul 2009 23:41:05 +0200 Subject: [Biojava-dev] Fwd: Assembly data reading In-Reply-To: <56be91b60907090858t41f2c72cwf7db057e6390d6db@mail.gmail.com> References: <56be91b60907090858t41f2c72cwf7db057e6390d6db@mail.gmail.com> Message-ID: <56be91b60907121441h7602d917m496b675d0fa9fd68@mail.gmail.com> Hi, I would like to post again with some adjustments a question I put some times ago because maybe this is a more correct list, apologize for the repeating. Can someone kindly give me his advise? thank you in advance, Paolo ---------- Forwarded message ---------- From: Paolo Pavan Date: 2009/7/9 Subject: Assembly data reading To: Biojava-l at lists.open-bio.org Hi everybody, I'm almost new to this topic, I would like to know if there is something can help me to load in my java program data from a large 454 contig. I need to retain in memory and access data from the single reads forming the contig too. I suppose these informations are in a *.sff file, if it is not possible to load such file it should be ok to load a *.ace (phrap) data file that I have too. Many thanks for any suggestion you can give me! 
Greetings, Paolo From holland at eaglegenomics.com Mon Jul 13 01:20:35 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Mon, 13 Jul 2009 06:20:35 +0100 Subject: [Biojava-dev] Fwd: Assembly data reading In-Reply-To: <56be91b60907121441h7602d917m496b675d0fa9fd68@mail.gmail.com> References: <56be91b60907090858t41f2c72cwf7db057e6390d6db@mail.gmail.com> <56be91b60907121441h7602d917m496b675d0fa9fd68@mail.gmail.com> Message-ID: <1247462435.25217.7.camel@buzzybee> Nothing within BJ can parse the 454 .sff files directly. However I think there is a growing need for it so if anyone is willing to contribute code, it would be very welcome. There is also no .ace parser, although in 2007 someone volunteered to write one but nothing happened, and there was a previous post (many years ago!) from someone else who already had some working code but again nothing seems to have happened: http://portal.open-bio.org/pipermail/biojava-l/2001-June/001283.html http://lists.open-bio.org/pipermail/biojava-l/2007-July/005900.html So to start with, someone (perhaps yourself? that would be nice! :) ) needs to volunteer to write either a .ace or .sff parser, or both. The thing to bear in mind with 454 contigs as you rightly point out is the sheer size of the things. The requirement to keep them entirely in memory is likely to be unworkable as it would leave little room for anything else to run on your average machine. I would suggest either memory-mapping the file itself, or parsing and writing out a memory-mapped summary file containing the bits of data you're interested in. (Memory-mapping is where you keep an index in memory indicating where in the file each record is, so that when you need to access them you load them on-the-fly from the file and drop them out of memory again immediately after use. 
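As a rough sketch of the index idea just described (an in-memory table of record positions, with records read from disk only when asked for), something like the following could serve as a starting point. It uses java.io.RandomAccessFile rather than true OS-level memory mapping, and it assumes a toy FASTA-like text layout in which every record starts with a '>' line. Real .ace or .sff files are laid out differently, and the class name IndexedRecordFile is hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of BioJava.
class IndexedRecordFile implements AutoCloseable {
    private final RandomAccessFile file;
    // Record id -> {byte offset of the record, length in bytes}.
    private final Map<String, long[]> index = new LinkedHashMap<>();

    IndexedRecordFile(String path) throws IOException {
        file = new RandomAccessFile(path, "r");
        buildIndex();
    }

    /** Scans the file once and remembers where each record starts and how long it is. */
    private void buildIndex() throws IOException {
        String currentId = null;
        long recordStart = 0;
        long lineStart = 0;
        String line;
        file.seek(0);
        while ((line = file.readLine()) != null) {
            if (line.startsWith(">")) {
                if (currentId != null) {
                    index.put(currentId, new long[] {recordStart, lineStart - recordStart});
                }
                currentId = line.substring(1).trim();
                recordStart = lineStart;
            }
            lineStart = file.getFilePointer();
        }
        if (currentId != null) {
            index.put(currentId, new long[] {recordStart, file.length() - recordStart});
        }
    }

    /** Loads one record from disk on demand; returns null for an unknown id. */
    String load(String id) throws IOException {
        long[] entry = index.get(id);
        if (entry == null) {
            return null;
        }
        byte[] buffer = new byte[(int) entry[1]];
        file.seek(entry[0]);
        file.readFully(buffer);
        return new String(buffer, StandardCharsets.US_ASCII);
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Only the index stays in memory; each call to load() reads the bytes of a single record and nothing is retained afterwards. RandomAccessFile.readLine() is unbuffered, so building the index on a very large file would be slow; a buffered scan would be the obvious refinement.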
An accelerated form of this is to put the loaded records into some kind of LRU cache which holds only the most recently accessed records and then check that cache first to see if you've already loaded the record before accessing the file directly.) cheers, Richard On Sun, 2009-07-12 at 23:41 +0200, Paolo Pavan wrote: > Hi, > I would like to post again with some adjustments a question I put some > times ago because maybe this is a more correct list, apologize for the > repeating. > Can someone kindly give me his advise? > > thank you in advance, > Paolo > > > ---------- Forwarded message ---------- > From: Paolo Pavan > Date: 2009/7/9 > Subject: Assembly data reading > To: Biojava-l at lists.open-bio.org > > > Hi everybody, > I'm almost new to this topic, I would like to know if there is > something can help me to load in my java program data from a large 454 > contig. I need to retain in memory and access data from the single > reads forming the contig too. > I suppose these informations are in a *.sff file, if it is not > possible to load such file it should be ok to load a *.ace (phrap) > data file that I have too. > Many thanks for any suggestion you can give me! 
> > Greetings, > Paolo > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From markjschreiber at gmail.com Mon Jul 13 02:29:56 2009 From: markjschreiber at gmail.com (Mark Schreiber) Date: Mon, 13 Jul 2009 14:29:56 +0800 Subject: [Biojava-dev] Fwd: Assembly data reading In-Reply-To: <1247462435.25217.7.camel@buzzybee> References: <56be91b60907090858t41f2c72cwf7db057e6390d6db@mail.gmail.com> <56be91b60907121441h7602d917m496b675d0fa9fd68@mail.gmail.com> <1247462435.25217.7.camel@buzzybee> Message-ID: <93b45ca50907122329g65cef108h112399da01d6b322@mail.gmail.com> I would agree that there is a strong need for this kind of thing in biojava. As Richard says you probably can't fit it in memory so you may want to memory map it. There are classes in the javax.nio package that can help a lot with this. Also I have had some success with in-memory compression of large files using LZ compression. Essentially the memory representation of the file is LZ compressed and compression and decompression are handled on the fly. Again there are Java utility classes that can help. - Mark On Mon, Jul 13, 2009 at 1:20 PM, Richard Holland wrote: > Nothing within BJ can parse the 454 .sff files directly. However I think > there is a growing need for it so if anyone is willing to contribute > code, it would be very welcome. > > There is also no .ace parser, although in 2007 someone volunteered to > write one but nothing happened, and there was a previous post (many > years ago!) 
from someone else who already had some working code but > again nothing seems to have happened: > > http://portal.open-bio.org/pipermail/biojava-l/2001-June/001283.html > http://lists.open-bio.org/pipermail/biojava-l/2007-July/005900.html > > So to start with, someone (perhaps yourself? that would be nice! :) ) > needs to volunteer to write either a .ace or .sff parser, or both. > > The thing to bear in mind with 454 contigs as you rightly point out is > the sheer size of the things. The requirement to keep them entirely in > memory is likely to be unworkable as it would leave little room for > anything else to run on your average machine. I would suggest either > memory-mapping the file itself, or parsing and writing out a > memory-mapped summary file containing the bits of data you're interested > in. (Memory-mapping is where you keep an index in memory indicating > where in the file each record is, so that when you need to access them > you load them on-the-fly from the file and drop them out of memory again > immediately after use. An accelerated form of this is to put the loaded > records into some kind of LRU cache which holds only the most recently > accessed records and then check that cache first to see if you've > already loaded the record before accessing the file directly.) > > cheers, > Richard > > > On Sun, 2009-07-12 at 23:41 +0200, Paolo Pavan wrote: > > Hi, > > I would like to post again with some adjustments a question I put some > > times ago because maybe this is a more correct list, apologize for the > > repeating. > > Can someone kindly give me his advise? > > > > thank you in advance, > > Paolo > > > > > > ---------- Forwarded message ---------- > > From: Paolo Pavan > > Date: 2009/7/9 > > Subject: Assembly data reading > > To: Biojava-l at lists.open-bio.org > > > > > > Hi everybody, > > I'm almost new to this topic, I would like to know if there is > > something can help me to load in my java program data from a large 454 > > contig. 
I need to retain in memory and access data from the single > > reads forming the contig too. > > I suppose these informations are in a *.sff file, if it is not > > possible to load such file it should be ok to load a *.ace (phrap) > > data file that I have too. > > Many thanks for any suggestion you can give me! > > > > Greetings, > > Paolo > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From sreekanth.m at ocimumbio.com Mon Jul 13 07:28:52 2009 From: sreekanth.m at ocimumbio.com (Sreekanth Mogullapally) Date: Mon, 13 Jul 2009 16:58:52 +0530 Subject: [Biojava-dev] Reg: Quality values of ABI File Message-ID: <2DDF09AFEB46E54894A3843CEF9CB3A4409F82CA8E@EXCHMB.ocimumbio.com> Hi Everybody, I am working with "ABI" Files. I am able to get the pixel values for Chromatogram viewer from it, but I need the Quality values for each base. Please help me in this regard. Thanks in Advance Sreekanth.M From fbristow at gmail.com Mon Jul 13 09:15:40 2009 From: fbristow at gmail.com (Franklin Bristow) Date: Mon, 13 Jul 2009 08:15:40 -0500 Subject: [Biojava-dev] Reg: Quality values of ABI File In-Reply-To: <2DDF09AFEB46E54894A3843CEF9CB3A4409F82CA8E@EXCHMB.ocimumbio.com> References: <2DDF09AFEB46E54894A3843CEF9CB3A4409F82CA8E@EXCHMB.ocimumbio.com> Message-ID: <50a7756d0907130615i3d7cca5ar941f5d2d09657e5@mail.gmail.com> Hi Sreekanth, The quality values are stored under the PCON 1 and 2 tags. The information you need is in the offsetData field of the TaggedDataRecord. 
You can treat this byte array as an array of shorts containing the quality values for each base. Take a look at this PDF for more information about the different tags available to you: http://www.appliedbiosystems.com/support/software_community/ABIF_File_Format.pdf -- Franklin On Mon, Jul 13, 2009 at 6:28 AM, Sreekanth Mogullapally <sreekanth.m at ocimumbio.com> wrote: > Hi Everybody, > > I am working with "ABI" Files. I am able to get the pixel values for > Chromatogram viewer from it, > but I need the Quality values for each base. > Please help me in this regard. > > Thanks in Advance > Sreekanth.M From sylvain.foisy at diploide.net Mon Jul 13 09:16:34 2009 From: sylvain.foisy at diploide.net (Sylvain Foisy) Date: Mon, 13 Jul 2009 09:16:34 -0400 Subject: [Biojava-dev] ExternalProcess class In-Reply-To: Message-ID: Hi Scooter, Actually, this is not in BJ 1.7. This material was created by Dickson Guedes but never formalized into a class for BJ. Maybe now would be the time to do so? Best regards Sylvain On 11/07/09 07:08, "[NAME]" <[ADDRESS]> wrote: > Biojava already has a Clustalw class that executes Clustalw as an external > process if that is your larger goal. > > http://www.biojava.org/wiki/BioJava:Tutorial:MultiAlignClustalW > > Thanks > > Scooter > ________________________________________ > From: biojava-dev-bounces at lists.open-bio.org > [biojava-dev-bounces at lists.open-bio.org] On Behalf Of Sylvain Foisy > [sylvain.foisy at diploide.net] > Sent: Friday, July 10, 2009 12:43 PM > To: biojava-dev at lists.open-bio.org > Subject: Re: [Biojava-dev] ExternalProcess class > > Hi, > > There is no such example to the best of my knowledge. If you do use this > class, you are welcome to share your experience by contributing.
As far as > running something like clustalw, I don't see why you could not make use of > this class. You could actually build a wrapper class to execute clustalw and > do something with its output. > > Best regards > > Sylvain > > On 10/07/09 12:00, "[NAME]" <[ADDRESS]> wrote: > >> Hi everyone, >> >> Do somebody have (as examples of biojava cookbook) a usage example of >> ExternalProcess class ? >> >> Let's say we want to run a local clustalw program with it, is it possible ? >> >> Any example code ? >> >> Thank you > > > =================================================================== > > Sylvain Foisy, Ph. D. > Consultant Bio-informatique / Bioinformatics > Diploide.net - TI pour la vie / IT for Life > > Courriel: sylvain.foisy at diploide.net > Web: http://www.diploide.net > Tel: (514) 893-4363 > =================================================================== > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From markjschreiber at gmail.com Mon Jul 13 09:21:42 2009 From: markjschreiber at gmail.com (Mark Schreiber) Date: Mon, 13 Jul 2009 21:21:42 +0800 Subject: [Biojava-dev] Reg: Quality values of ABI File In-Reply-To: <2DDF09AFEB46E54894A3843CEF9CB3A4409F82CA8E@EXCHMB.ocimumbio.com> References: <2DDF09AFEB46E54894A3843CEF9CB3A4409F82CA8E@EXCHMB.ocimumbio.com> Message-ID: <93b45ca50907130621m6ebd37bn72aad9fafe195db7@mail.gmail.com> Hi - You would usually use a program like Phred/Phrap for this. There is a BioJava package for reading and processing the Phred output. - Mark On Mon, Jul 13, 2009 at 7:28 PM, Sreekanth Mogullapally < sreekanth.m at ocimumbio.com> wrote: > Hi Everybody, > > I am working with "ABI" Files. I am able to get the pixel values for > Chromatogram viewer from it, > but I need the Quality values for each base. > Please help me in this regard. 
> > Thanks in Advance > Sreekanth.M From sreekanth.m at ocimumbio.com Mon Jul 13 10:49:26 2009 From: sreekanth.m at ocimumbio.com (Sreekanth Mogullapally) Date: Mon, 13 Jul 2009 20:19:26 +0530 Subject: [Biojava-dev] Reg: Quality values of ABI File In-Reply-To: <50a7756d0907130615i3d7cca5ar941f5d2d09657e5@mail.gmail.com> References: <2DDF09AFEB46E54894A3843CEF9CB3A4409F82CA8E@EXCHMB.ocimumbio.com> <50a7756d0907130615i3d7cca5ar941f5d2d09657e5@mail.gmail.com> Message-ID: <2DDF09AFEB46E54894A3843CEF9CB3A4409F82CBC6@EXCHMB.ocimumbio.com> Dear Franklin, Thank you very much for your quick response. I am now able to get the quality values, which meets my requirement. Thanks & Regards, Sreekanth.M From: Franklin Bristow [mailto:fbristow at gmail.com] Sent: Monday, July 13, 2009 6:46 PM To: Sreekanth Mogullapally Cc: biojava-dev-request at lists.open-bio.org; biojava-dev at lists.open-bio.org; Madhu Mohan. Ganni; Kishore Dunga Subject: Re: [Biojava-dev] Reg: Quality values of ABI File Hi Sreekanth, The quality values are stored under the PCON 1 and 2 tags. The information you need is in the offsetData field of the TaggedDataRecord. You can treat this byte array as an array of shorts containing the quality values for each base. Take a look at this PDF for more information about the different tags available to you: http://www.appliedbiosystems.com/support/software_community/ABIF_File_Format.pdf -- Franklin On Mon, Jul 13, 2009 at 6:28 AM, Sreekanth Mogullapally wrote: Hi Everybody, I am working with "ABI" Files. I am able to get the pixel values for Chromatogram viewer from it, but I need the Quality values for each base. Please help me in this regard.
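The decoding step Franklin describes in this thread can be sketched in plain Java. This is only a sketch under stated assumptions, not BioJava API: the class `AbiQuality` and the helper `decodeQualities` are hypothetical names, and the code assumes you have already obtained the raw bytes of the PCON record (for example from the `offsetData` field mentioned in the thread). ABIF data is big-endian. The element width is passed in explicitly because the thread describes shorts, while a given file may store one byte per base, so it is worth checking the element size in the record's directory entry before decoding.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class AbiQuality {
    /**
     * Hypothetical helper: turn the raw bytes of a quality record into one
     * integer quality value per base.
     *
     * @param raw         raw record bytes (assumed already read from the file)
     * @param elementSize bytes per quality value: 1 or 2 (assumption, check your file)
     */
    static int[] decodeQualities(byte[] raw, int elementSize) {
        if (elementSize != 1 && elementSize != 2) {
            throw new IllegalArgumentException("elementSize must be 1 or 2");
        }
        // ABIF stores multi-byte values in big-endian order.
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.BIG_ENDIAN);
        int[] qualities = new int[raw.length / elementSize];
        for (int i = 0; i < qualities.length; i++) {
            // One unsigned byte per base, or one big-endian short per base.
            qualities[i] = (elementSize == 1) ? (buf.get() & 0xFF) : buf.getShort();
        }
        return qualities;
    }
}
```

Calling `AbiQuality.decodeQualities(bytes, 2)` follows the "array of shorts" reading; passing `1` treats each byte as one quality value.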
Thanks in Advance Sreekanth.M _______________________________________________ biojava-dev mailing list biojava-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-dev From holland at eaglegenomics.com Mon Jul 13 13:14:13 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Mon, 13 Jul 2009 18:14:13 +0100 Subject: [Biojava-dev] Hackathon Message-ID: <1247505253.27493.15.camel@buzzybee> Hi all. Andreas and I would like to organise a hackathon to get the modularisation and general improvement plans for BJ3 into action, and bring the project forward into the 21st century (only 10 years late!). At this time I'm trying to gather interest and gauge who might realistically be able to attend. We will attempt to site the hackathon at a location closest to the majority of attendees. To help me plan numbers and likely costs (for potential sponsors) could all those who are interested please answer the following questions for me: 1. Name, 2. Specialist interest within Biojava (e.g. proteomics, microarrays, sequencing, etc.), 3. Your physical location (country and and nearest major city - e.g. Cambridge, London, Newcastle, San Diego, Singapore, etc.), 4. Whether you think your employer would help pay your airfare and/or 1 week in a hotel to attend (and how far you think you could go on such funding), 5. Approximate availability for the next 12 months. To get the ball rolling, here's me: 1. Richard Holland, 2. Making the whole thing more consistent, efficient, and easier to use, 3. Southampton (UK), 4. Possibly but probably only within UK/Europe, 5. Only available in mid-Jan 2010, otherwise can't do anything until mid-March 2010 onwards. Looking forward to hearing your comments! Once I have a good idea of numbers and distribution, I can get some costs together to give you (and any potential sponsors) the best idea of what might be involved. 
cheers, Richard -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From HWillis at scripps.edu Mon Jul 13 16:12:16 2009 From: HWillis at scripps.edu (Scooter Willis) Date: Mon, 13 Jul 2009 16:12:16 -0400 Subject: [Biojava-dev] Hackathon In-Reply-To: <1247505253.27493.15.camel@buzzybee> Message-ID: Richard I would be up for a week away as a "vacation" to do some Java programming. So I am flexible on all elements of time and ability to travel. Bonus points for going some place where the weather is reasonable for the location since it appears we have a global option (January in the UK not my first choice/January in Colorado better choice). Probably wouldn't be a bad idea to try and bookend/overlap a bioinformatics related conference to help justify travel costs for those who need support from work. We should also consider online options for those who can't travel but can allocate the time. Thanks Scooter On 7/13/09 1:14 PM, "Richard Holland" wrote: Hi all. Andreas and I would like to organise a hackathon to get the modularisation and general improvement plans for BJ3 into action, and bring the project forward into the 21st century (only 10 years late!). At this time I'm trying to gather interest and gauge who might realistically be able to attend. We will attempt to site the hackathon at a location closest to the majority of attendees. To help me plan numbers and likely costs (for potential sponsors) could all those who are interested please answer the following questions for me: 1. Name, 2. Specialist interest within Biojava (e.g. proteomics, microarrays, sequencing, etc.), 3. Your physical location (country and and nearest major city - e.g. Cambridge, London, Newcastle, San Diego, Singapore, etc.), 4. 
Whether you think your employer would help pay your airfare and/or 1 week in a hotel to attend (and how far you think you could go on such funding), 5. Approximate availability for the next 12 months. To get the ball rolling, here's me: 1. Richard Holland, 2. Making the whole thing more consistent, efficient, and easier to use, 3. Southampton (UK), 4. Possibly but probably only within UK/Europe, 5. Only available in mid-Jan 2010, otherwise can't do anything until mid-March 2010 onwards. Looking forward to hearing your comments! Once I have a good idea of numbers and distribution, I can get some costs together to give you (and any potential sponsors) the best idea of what might be involved. cheers, Richard -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From paolo.pavan at gmail.com Tue Jul 14 12:08:11 2009 From: paolo.pavan at gmail.com (Paolo Pavan) Date: Tue, 14 Jul 2009 18:08:11 +0200 Subject: [Biojava-dev] Fwd: Assembly data reading In-Reply-To: <93b45ca50907122329g65cef108h112399da01d6b322@mail.gmail.com> References: <56be91b60907090858t41f2c72cwf7db057e6390d6db@mail.gmail.com> <56be91b60907121441h7602d917m496b675d0fa9fd68@mail.gmail.com> <1247462435.25217.7.camel@buzzybee> <93b45ca50907122329g65cef108h112399da01d6b322@mail.gmail.com> Message-ID: <56be91b60907140908t6d6aa906ifcd334afd9e44882@mail.gmail.com> Dear all, I took a day to do a quick search to try to get a clearer picture of the situation.
- I found the specification of the .sff file in the 454 instrument manual; it is fully described and seems to be enough to build a reader.
- However, on a more careful reading it seems that a *.sff file carries no information about the automatic contig assembly and only stores flowgram info, that is, "reads" (unlike a *.ace file).
- Two hidden binary files can be found in a 454 gsAssembler project folder: .ChordMatrixMetadata and .SeqCacheMetadata. They are not described in the manual, but the former seems to contain nucleotide data and the latter read names. They are big enough to contain such data; the problem is that we don't know how to parse them.
- It is necessary to decide on a "memory structure" in which to store the information read. I agree on the "memory mapping" solution, maybe implemented with a Map object that associates the name of each read with its location in the file.
- The parser class should then expose methods to:
  1) iterate through reads, but maybe this would be heavy and is avoidable
  2) access a read sequence by name
- If the parser should manage the assembled contigs too (this depends on what is explained in the third bullet point), it should expose methods to:
  1) iterate through contig names
  2) iterate through contig consensus sequences
  3) access consensus sequences by name (this is a sub-problem of point 2)
  4) access random aligned portions (I mean a "slice") of the assembly given start-end positions, returning an alignment object
- Any more suggestions?
I would be glad to be involved in the biojava community through this project and I could try, but first of all I want to say that I'm not a guru like most of the people here ( :-p ) and, to tell the truth, the job that my company requires of me is different, so if a workaround exists I should honestly choose it. So let me think a bit about starting such an adventure; if I can combine my job with contributing to the community's growth, I'll be happy to share my work! Any suggestion welcome.
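The Map-based index described above, combined with the LRU cache Richard suggested earlier in the thread, could be sketched roughly as follows. This is a minimal, hypothetical sketch in plain Java and not BioJava API: the class name `IndexedReadStore`, the one-record-per-line layout (`name<TAB>sequence`, ASCII), and the cache size are all assumptions made for illustration. A real .sff or .ace reader would build the index from that format's own record boundaries instead.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: index reads by name, load on demand, keep a small LRU cache. */
final class IndexedReadStore implements AutoCloseable {
    private final RandomAccessFile file;
    // read name -> {byte offset of the sequence, length in bytes}
    private final Map<String, long[]> index = new HashMap<>();
    private final Map<String, String> cache;

    IndexedReadStore(String path, final int cacheSize) throws IOException {
        file = new RandomAccessFile(path, "r");
        // An access-ordered LinkedHashMap evicts the least recently used entry.
        cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > cacheSize;
            }
        };
        // Single pass over the file: remember where each sequence lives, not the sequence.
        // (RandomAccessFile.readLine is unbuffered; acceptable for a sketch only.)
        long lineStart = 0;
        String line;
        while ((line = file.readLine()) != null) {
            int tab = line.indexOf('\t');
            if (tab > 0) {
                long seqOffset = lineStart + tab + 1;
                long seqLength = line.length() - tab - 1;
                index.put(line.substring(0, tab), new long[] {seqOffset, seqLength});
            }
            lineStart = file.getFilePointer();
        }
    }

    /** Names of all indexed reads; only the names are held in memory. */
    Set<String> readNames() {
        return index.keySet();
    }

    /** Returns the sequence for a read, or null if the name is unknown. */
    String getSequence(String name) throws IOException {
        String cached = cache.get(name);
        if (cached != null) {
            return cached; // served from the LRU cache, no file access
        }
        long[] location = index.get(name);
        if (location == null) {
            return null;
        }
        byte[] buffer = new byte[(int) location[1]];
        file.seek(location[0]);
        file.readFully(buffer);
        String sequence = new String(buffer, StandardCharsets.US_ASCII);
        cache.put(name, sequence);
        return sequence;
    }

    /** Number of sequences currently held in the cache. */
    int cachedCount() {
        return cache.size();
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Only names and offsets stay in memory, so the footprint grows with the number of reads rather than with their total length. The cache size trades memory for fewer file seeks.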
Bye bye, Paolo 2009/7/13 Mark Schreiber: > I would agree that there is a strong need for this kind of thing in biojava. > > As Richard says you probably can't fit it in memory so you may want to > memory map it. There are classes in the java.nio package that can help a > lot with this. > > Also I have had some success with in-memory compression of large files using > LZ compression. Essentially the memory representation of the file is LZ > compressed and compression and decompression are handled on the fly. Again > there are Java utility classes that can help. > > - Mark > > On Mon, Jul 13, 2009 at 1:20 PM, Richard Holland > wrote: >> >> Nothing within BJ can parse the 454 .sff files directly. However I think >> there is a growing need for it so if anyone is willing to contribute >> code, it would be very welcome. >> >> There is also no .ace parser, although in 2007 someone volunteered to >> write one but nothing happened, and there was a previous post (many >> years ago!) from someone else who already had some working code but >> again nothing seems to have happened: >> >> http://portal.open-bio.org/pipermail/biojava-l/2001-June/001283.html >> http://lists.open-bio.org/pipermail/biojava-l/2007-July/005900.html >> >> So to start with, someone (perhaps yourself? that would be nice! :) ) >> needs to volunteer to write either a .ace or .sff parser, or both. >> >> The thing to bear in mind with 454 contigs as you rightly point out is >> the sheer size of the things. The requirement to keep them entirely in >> memory is likely to be unworkable as it would leave little room for >> anything else to run on your average machine. I would suggest either >> memory-mapping the file itself, or parsing and writing out a >> memory-mapped summary file containing the bits of data you're interested >> in.
(Memory-mapping is where you keep an index in memory indicating >> where in the file each record is, so that when you need to access them >> you load them on-the-fly from the file and drop them out of memory again >> immediately after use. An accelerated form of this is to put the loaded >> records into some kind of LRU cache which holds only the most recently >> accessed records and then check that cache first to see if you've >> already loaded the record before accessing the file directly.) >> >> cheers, >> Richard >> >> >> On Sun, 2009-07-12 at 23:41 +0200, Paolo Pavan wrote: >> > Hi, >> > I would like to post again with some adjustments a question I put some >> > times ago because maybe this is a more correct list, apologize for the >> > repeating. >> > Can someone kindly give me his advise? >> > >> > thank you in advance, >> > Paolo >> > >> > >> > ---------- Forwarded message ---------- >> > From: Paolo Pavan >> > Date: 2009/7/9 >> > Subject: Assembly data reading >> > To: Biojava-l at lists.open-bio.org >> > >> > >> > Hi everybody, >> > I'm almost new to this topic, I would like to know if there is >> > something can help me to load in my java program data from a large 454 >> > contig. I need to retain in memory and access data from the single >> > reads forming the contig too. >> > I suppose these informations are in a *.sff file, if it is not >> > possible to load such file it should be ok to load a *.ace (phrap) >> > data file that I have too. >> > Many thanks for any suggestion you can give me! 
>> > >> > Greetings, >> > Paolo >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ From aradwen at gmail.com Wed Jul 15 07:10:03 2009 From: aradwen at gmail.com (Radwen ANIBA) Date: Wed, 15 Jul 2009 13:10:03 +0200 Subject: [Biojava-dev] PsiPred Message-ID: Hi mates, I was wondering whether Biojava can parse PsiPred output (protein secondary structure predictions), e.g. to report that a protein has a helix starting at ... and ending at ..., or a sheet starting at ... and ending at .... Is there any class or method written for that purpose? If so, I'm interested. Thank you From andreas at sdsc.edu Wed Jul 15 11:13:07 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 15 Jul 2009 08:13:07 -0700 Subject: [Biojava-dev] PsiPred In-Reply-To: References: Message-ID: <59a41c430907150813q7fd58a81uc2d4372f01d2e89a@mail.gmail.com> Hi Radwen, At present there is no parser for PsiPred. I would be happy about any contribution for that... Andreas On Wed, Jul 15, 2009 at 4:10 AM, Radwen ANIBA wrote: > Hi mates, > > I was wondering whether Biojava can parse PsiPred output (protein secondary > structure predictions), e.g. to report that a protein has a helix starting at ... > and ending at ..., or a sheet starting at ... and ending at .... Is there any class > or method written for that purpose? If so, I'm interested.
> > thank you > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From heuermh at acm.org Wed Jul 15 15:22:14 2009 From: heuermh at acm.org (Michael Heuer) Date: Wed, 15 Jul 2009 15:22:14 -0400 (EDT) Subject: [Biojava-dev] Hackathon In-Reply-To: <1247505253.27493.15.camel@buzzybee> Message-ID: Richard Holland wrote: > Andreas and I would like to organise a hackathon to get the > modularisation and general improvement plans for BJ3 into action, and > bring the project forward into the 21st century (only 10 years late!). > > At this time I'm trying to gather interest and gauge who might > realistically be able to attend. We will attempt to site the hackathon > at a location closest to the majority of attendees. > > To help me plan numbers and likely costs (for potential sponsors) could > all those who are interested please answer the following questions for > me: > > 1. Name, > 2. Specialist interest within Biojava (e.g. proteomics, microarrays, > sequencing, etc.), > 3. Your physical location (country and and nearest major city - e.g. > Cambridge, London, Newcastle, San Diego, Singapore, etc.), > 4. Whether you think your employer would help pay your airfare and/or 1 > week in a hotel to attend (and how far you think you could go on such > funding), > 5. Approximate availability for the next 12 months. A hackathon would be great. I'm closest to MSP airport. My employer would probably not cover air fare or hotel. I would then recommend choosing an interesting location so that it would be worth spending out-of-pocket to get there. I don't use biojava for my day job any more, so I'm most interested in helping with architecture and build issues. My day job is currently a lot of data viz, maybe better integration with viz tools like Cytoscape, Piccolo2D, prefuse, and Processing would be fun to work on. 
michael From sreekanth.m at ocimumbio.com Thu Jul 16 02:07:19 2009 From: sreekanth.m at ocimumbio.com (Sreekanth Mogullapally) Date: Thu, 16 Jul 2009 11:37:19 +0530 Subject: [Biojava-dev] Reg: SeqIOTools Class Message-ID: <2DDF09AFEB46E54894A3843CEF9CB3A4409F891673@EXCHMB.ocimumbio.com> Hi Everybody, I am newly working with biojava. While I am trying to read Fasta file using biojava with following code SequenceIterator stream = SeqIOTools.readFastaDNA(new BufferedReader(new FileReader(fileName))); for this I am importing following class import org.biojava.bio.seq.io.SeqIOTools; But it is showing a warning that "SeqIOTools" is deprecated. Is there any other class which satisfies all the functionality of "SeqIOTools" class. Please suggest me in this regard. Thanks in Advance Sreekanth. M From mark.schreiber at novartis.com Thu Jul 16 02:11:52 2009 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Thu, 16 Jul 2009 14:11:52 +0800 Subject: [Biojava-dev] Reg: SeqIOTools Class In-Reply-To: <2DDF09AFEB46E54894A3843CEF9CB3A4409F891673@EXCHMB.ocimumbio.com> Message-ID: Hi - The replacement for this class is RichSequence.IOTools - Mark biojava-dev-bounces at lists.open-bio.org wrote on 07/16/2009 02:07:19 PM: > > Hi Everybody, > > I am newly working with biojava. While I am trying to read Fasta > file using biojava with following code > > SequenceIterator stream = SeqIOTools.readFastaDNA(new > BufferedReader(new FileReader(fileName))); > > for this I am importing following class > import org.biojava.bio.seq.io.SeqIOTools; > But it is showing a warning that "SeqIOTools" is deprecated. > > Is there any other class which satisfies all the functionality of > "SeqIOTools" class. > > Please suggest me in this regard. > > Thanks in Advance > Sreekanth. 
M From sreekanth.m at ocimumbio.com Thu Jul 16 02:38:31 2009 From: sreekanth.m at ocimumbio.com (Sreekanth Mogullapally) Date: Thu, 16 Jul 2009 12:08:31 +0530 Subject: [Biojava-dev] Reg: Exporting into fasta format In-Reply-To: References: <2DDF09AFEB46E54894A3843CEF9CB3A4409F891673@EXCHMB.ocimumbio.com> Message-ID: <2DDF09AFEB46E54894A3843CEF9CB3A4409F8916A7@EXCHMB.ocimumbio.com> Hi Everybody, I need to Export the sequences into fasta format. Please suggest me how to export into fasta format. I have written my own code to export it, but I want to implement it using biojava. Thanks & Regards, Sreekanth. M From ayates at ebi.ac.uk Sun Jul 19 15:15:41 2009 From: ayates at ebi.ac.uk (Andy Yates) Date: Sun, 19 Jul 2009 20:15:41 +0100 Subject: [Biojava-dev] Hackathon In-Reply-To: <1247505253.27493.15.camel@buzzybee> References: <1247505253.27493.15.camel@buzzybee> Message-ID: <3735A549-2F00-418D-9194-638D7C923FC7@ebi.ac.uk> Okay another one for the ball: 1). Andy Yates 2). Erm well getting it right so I guess that lands me in testing/ integration/killing every singleton 3). Cambridge 4).
Currently in a Perl group so not a chance really 5). Quite flexible How does that sound? On 13 Jul 2009, at 18:14, Richard Holland wrote: > Hi all. > > Andreas and I would like to organise a hackathon to get the > modularisation and general improvement plans for BJ3 into action, and > bring the project forward into the 21st century (only 10 years late!). > > At this time I'm trying to gather interest and gauge who might > realistically be able to attend. We will attempt to site the hackathon > at a location closest to the majority of attendees. > > To help me plan numbers and likely costs (for potential sponsors) > could > all those who are interested please answer the following questions for > me: > > 1. Name, > 2. Specialist interest within Biojava (e.g. proteomics, microarrays, > sequencing, etc.), > 3. Your physical location (country and and nearest major city - e.g. > Cambridge, London, Newcastle, San Diego, Singapore, etc.), > 4. Whether you think your employer would help pay your airfare and/ > or 1 > week in a hotel to attend (and how far you think you could go on such > funding), > 5. Approximate availability for the next 12 months. > > To get the ball rolling, here's me: > > 1. Richard Holland, 2. Making the whole thing more consistent, > efficient, and easier to use, 3. Southampton (UK), 4. Possibly but > probably only within UK/Europe, 5. Only available in mid-Jan 2010, > otherwise can't do anything until mid-March 2010 onwards. > > Looking forward to hearing your comments! Once I have a good idea of > numbers and distribution, I can get some costs together to give you > (and > any potential sponsors) the best idea of what might be involved. 
> > cheers, > Richard > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ From andreas at sdsc.edu Mon Jul 20 16:11:52 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Mon, 20 Jul 2009 13:11:52 -0700 Subject: [Biojava-dev] Fwd: Assembly data reading In-Reply-To: <56be91b60907140908t6d6aa906ifcd334afd9e44882@mail.gmail.com> References: <56be91b60907090858t41f2c72cwf7db057e6390d6db@mail.gmail.com> <56be91b60907121441h7602d917m496b675d0fa9fd68@mail.gmail.com> <1247462435.25217.7.camel@buzzybee> <93b45ca50907122329g65cef108h112399da01d6b322@mail.gmail.com> <56be91b60907140908t6d6aa906ifcd334afd9e44882@mail.gmail.com> Message-ID: <59a41c430907201311o4d287651k3adc2069b2c95f61@mail.gmail.com> Hi Paolo, Not sure if you got a response to your mail off list. If there is sufficient interest from the people working on processing the output of the various sequencers, it would be great if those people would work together to get a new biojava module started. Most probably somebody needs to take initiative and lead the development, otherwise it won't happen. Cheers, Andreas On Tue, Jul 14, 2009 at 9:08 AM, Paolo Pavan wrote:
> Dear all, I took a day to do a quick search to try to get a clearer picture of the situation.
> - I found the specification of the .sff file in the 454 instrument manual; it is fully described and seems to be enough to build a reader.
> - However, on a more careful reading it seems that a *.sff file carries no information about the automatic contig assembly and only stores flowgram info, that is, "reads" (unlike a *.ace file).
> - Two hidden binary files can be found in a 454 gsAssembler project folder: .ChordMatrixMetadata and .SeqCacheMetadata. They are not described in the manual, but the former seems to contain nucleotide data and the latter read names. They are big enough to contain such data; the problem is that we don't know how to parse them.
> - It is necessary to decide on a "memory structure" in which to store the information read. I agree on the "memory mapping" solution, maybe implemented with a Map object that associates the name of each read with its location in the file.
> - The parser class should then expose methods to:
>   1) iterate through reads, but maybe this would be heavy and is avoidable
>   2) access a read sequence by name
> - If the parser should manage the assembled contigs too (this depends on what is explained in the third bullet point), it should expose methods to:
>   1) iterate through contig names
>   2) iterate through contig consensus sequences
>   3) access consensus sequences by name (this is a sub-problem of point 2)
>   4) access random aligned portions (I mean a "slice") of the assembly given start-end positions, returning an alignment object
> - Any more suggestions?
> I would be glad to be involved in the biojava community through this project and I could try, but first of all I want to say that I'm not a guru like most of the people here ( :-p ) and, to tell the truth, the job that my company requires of me is different, so if a workaround exists I should honestly choose it. So let me think a bit about starting such an adventure; if I can combine my job with contributing to the community's growth, I'll be happy to share my work! Any suggestion welcome.
> > Bye bye, > Paolo > > > 2009/7/13 Mark Schreiber: >> I would agree that there is a strong need for this kind of thing in biojava.
>> >> As Richard says you probably can't fit it in memory so you may want to >> memory map it. There are classes in the java.nio package that can help a >> lot with this. >> >> Also I have had some success with in-memory compression of large files using >> LZ compression. Essentially the memory representation of the file is LZ >> compressed and compression and decompression are handled on the fly. Again >> there are Java utility classes that can help. >> >> - Mark >> >> On Mon, Jul 13, 2009 at 1:20 PM, Richard Holland >> wrote: >>> >>> Nothing within BJ can parse the 454 .sff files directly. However I think >>> there is a growing need for it so if anyone is willing to contribute >>> code, it would be very welcome. >>> >>> There is also no .ace parser, although in 2007 someone volunteered to >>> write one but nothing happened, and there was a previous post (many >>> years ago!) from someone else who already had some working code but >>> again nothing seems to have happened: >>> >>> http://portal.open-bio.org/pipermail/biojava-l/2001-June/001283.html >>> http://lists.open-bio.org/pipermail/biojava-l/2007-July/005900.html >>> >>> So to start with, someone (perhaps yourself? that would be nice! :) ) >>> needs to volunteer to write either a .ace or .sff parser, or both. >>> >>> The thing to bear in mind with 454 contigs as you rightly point out is >>> the sheer size of the things. The requirement to keep them entirely in >>> memory is likely to be unworkable as it would leave little room for >>> anything else to run on your average machine. I would suggest either >>> memory-mapping the file itself, or parsing and writing out a >>> memory-mapped summary file containing the bits of data you're interested >>> in. (Memory-mapping is where you keep an index in memory indicating >>> where in the file each record is, so that when you need to access them >>> you load them on-the-fly from the file and drop them out of memory again >>> immediately after use.
An accelerated form of this is to put the loaded >>> records into some kind of LRU cache which holds only the most recently >>> accessed records and then check that cache first to see if you've >>> already loaded the record before accessing the file directly.) >>> >>> cheers, >>> Richard >>> >>> >>> On Sun, 2009-07-12 at 23:41 +0200, Paolo Pavan wrote: >>> > Hi, >>> > I would like to post again with some adjustments a question I put some >>> > times ago because maybe this is a more correct list, apologize for the >>> > repeating. >>> > Can someone kindly give me his advise? >>> > >>> > thank you in advance, >>> > Paolo >>> > >>> > >>> > ---------- Forwarded message ---------- >>> > From: Paolo Pavan >>> > Date: 2009/7/9 >>> > Subject: Assembly data reading >>> > To: Biojava-l at lists.open-bio.org >>> > >>> > >>> > Hi everybody, >>> > I'm almost new to this topic, I would like to know if there is >>> > something can help me to load in my java program data from a large 454 >>> > contig. I need to retain in memory and access data from the single >>> > reads forming the contig too. >>> > I suppose these informations are in a *.sff file, if it is not >>> > possible to load such file it should be ok to load a *.ace (phrap) >>> > data file that I have too. >>> > Many thanks for any suggestion you can give me! 
>>> > >>> > Greetings, >>> > Paolo >>> > _______________________________________________ >>> > biojava-dev mailing list >>> > biojava-dev at lists.open-bio.org >>> > http://lists.open-bio.org/mailman/listinfo/biojava-dev >>> -- >>> Richard Holland, BSc MBCS >>> Operations and Delivery Director, Eagle Genomics Ltd >>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >>> http://www.eaglegenomics.com/ >>> >>> _______________________________________________ >>> biojava-dev mailing list >>> biojava-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> >> > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From matias.piipari at gmail.com Tue Jul 21 18:07:47 2009 From: matias.piipari at gmail.com (Matias Piipari) Date: Tue, 21 Jul 2009 23:07:47 +0100 Subject: [Biojava-dev] Hackathon In-Reply-To: <3735A549-2F00-418D-9194-638D7C923FC7@ebi.ac.uk> References: <1247505253.27493.15.camel@buzzybee> <3735A549-2F00-418D-9194-638D7C923FC7@ebi.ac.uk> Message-ID: <15cdf3360907211507t6cb9b519y36a61197ae01378f@mail.gmail.com> 1. Matias Piipari 2. sequences + sequence motifs 3. Cambridge 4. Possible but not terribly likely. 5. Flexible On Sun, Jul 19, 2009 at 8:15 PM, Andy Yates wrote: > Okay another one for the ball: > > 1). Andy Yates > 2). Erm well getting it right so I guess that lands me in > testing/integration/killing every singleton > 3). Cambridge > 4). Currently in a Perl group so not a chance really > 5). Quite flexible > > How does that sound? > > > On 13 Jul 2009, at 18:14, Richard Holland wrote: > > Hi all. >> >> Andreas and I would like to organise a hackathon to get the >> modularisation and general improvement plans for BJ3 into action, and >> bring the project forward into the 21st century (only 10 years late!). 
>> >> At this time I'm trying to gather interest and gauge who might >> realistically be able to attend. We will attempt to site the hackathon >> at a location closest to the majority of attendees. >> >> To help me plan numbers and likely costs (for potential sponsors) could >> all those who are interested please answer the following questions for >> me: >> >> 1. Name, >> 2. Specialist interest within Biojava (e.g. proteomics, microarrays, >> sequencing, etc.), >> 3. Your physical location (country and nearest major city - e.g. >> Cambridge, London, Newcastle, San Diego, Singapore, etc.), >> 4. Whether you think your employer would help pay your airfare and/or 1 >> week in a hotel to attend (and how far you think you could go on such >> funding), >> 5. Approximate availability for the next 12 months. >> >> To get the ball rolling, here's me: >> >> 1. Richard Holland, 2. Making the whole thing more consistent, >> efficient, and easier to use, 3. Southampton (UK), 4. Possibly but >> probably only within UK/Europe, 5. Only available in mid-Jan 2010, >> otherwise can't do anything until mid-March 2010 onwards. >> >> Looking forward to hearing your comments! Once I have a good idea of >> numbers and distribution, I can get some costs together to give you (and >> any potential sponsors) the best idea of what might be involved.
>> >> cheers, >> Richard >> >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ From HWillis at scripps.edu Wed Jul 22 14:34:26 2009 From: HWillis at scripps.edu (Scooter Willis) Date: Wed, 22 Jul 2009 14:34:26 -0400 Subject: [Biojava-dev] Bug Tracking Message-ID: Do we have a formal defect/feature request tracking setup for Biojava? In case we don't have something formally set up, or if we do and are not using it, I wanted to suggest the following. I am getting ready to install the following jumpbox at work to do subversion/defect tracking/wiki/project management. http://www.jumpbox.com/app/redmine Everything is already installed in a vmware instance, and sitting through the demo it looked very promising. You create a project in redmine and that provisions the subversion repository. You can then report bugs and add feature requests against the project. You can assign defects to members of the project and get timelines and other project management related reports. Combined with a wiki for docs and demo code, it looks like an interesting combination. I have been very impressed with what jumpbox is doing in other bundled software components, so if they are providing redmine as a jumpbox it is probably very good. So far I have only played around with the demo instance at redmine.
Thanks Scooter From andreas at sdsc.edu Wed Jul 22 15:15:50 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 22 Jul 2009 12:15:50 -0700 Subject: [Biojava-dev] Bug Tracking In-Reply-To: References: Message-ID: <59a41c430907221215y10b95473y25fa9d94c82afc3f@mail.gmail.com> Hi Scooter, we have bugzilla running at: http://bugzilla.open-bio.org/ Andreas On Wed, Jul 22, 2009 at 11:34 AM, Scooter Willis wrote: > Do we have a formal defect/feature request tracking setup for Biojava? From holland at eaglegenomics.com Wed Jul 22 15:16:19 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Wed, 22 Jul 2009 20:16:19 +0100 Subject: [Biojava-dev] Bug Tracking In-Reply-To: References: Message-ID: <1248290179.28124.54.camel@buzzybee> Yup, we do.
It's here: http://bugzilla.open-bio.org/ cheers, Richard On Wed, 2009-07-22 at 14:34 -0400, Scooter Willis wrote: > Do we have a formal defect/feature request tracking setup for Biojava? -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From jw12 at sanger.ac.uk Fri Jul 24 05:12:21 2009 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Fri, 24 Jul 2009 10:12:21 +0100 Subject: [Biojava-dev] Hackathon In-Reply-To: <1247505253.27493.15.camel@buzzybee> References: <1247505253.27493.15.camel@buzzybee> Message-ID: Hi Richard and Andreas, you both know where I'm situated, but for the record: 1. Jonathan Warren 2. Any DAS related libraries, visualization 3. Cambridge 4.
Yes I think so, definitely if in UK. 5. Anytime.... although obviously I'm very busy, just not many definite commitments ;) There seem to be many DAS classes under the biojava-live site, but many of them are not used in any of the DAS related code I have, with the exception of Structure and Alignment classes that are used in some dazzle plugins. So it might be good to update some of these classes to newer, more relevant code. I also notice that Apollo was trying to get away from using Biojava code for its DAS 1.5 adapter. Maybe the new modular design of biojava will resolve issues that Apollo developers had? Anyway, I guess I'm asking Andreas or anyone else if they know the history of some of these classes, e.g. the org.biojava.bio.program.das package? On 13 Jul 2009, at 18:14, Richard Holland wrote: > 1. Name, > 2. Specialist interest within Biojava (e.g. proteomics, microarrays, > sequencing, etc.), > 3. Your physical location (country and nearest major city - e.g. > Cambridge, London, Newcastle, San Diego, Singapore, etc.), > 4. Whether you think your employer would help pay your airfare and/or 1 > week in a hotel to attend (and how far you think you could go on such > funding), > 5. Approximate availability for the next 12 months. Jonathan Warren Senior Developer and DAS coordinator jw12 at sanger.ac.uk -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
From julie at flymine.org Fri Jul 24 05:25:03 2009 From: julie at flymine.org (Julie Sullivan) Date: Fri, 24 Jul 2009 10:25:03 +0100 Subject: [Biojava-dev] Hackathon In-Reply-To: <15cdf3360907211507t6cb9b519y36a61197ae01378f@mail.gmail.com> References: <1247505253.27493.15.camel@buzzybee> <3735A549-2F00-418D-9194-638D7C923FC7@ebi.ac.uk> <15cdf3360907211507t6cb9b519y36a61197ae01378f@mail.gmail.com> Message-ID: <4A697DEF.2020304@flymine.org> 1. Julie Sullivan 2. InterMine uses BioJava to handle sequences and pdb data 3. Cambridge 4. No 5. Flexible From gmicha at gmail.com Fri Jul 24 06:38:04 2009 From: gmicha at gmail.com (Micha Sammeth) Date: Fri, 24 Jul 2009 12:38:04 +0200 Subject: [Biojava-dev] Hackathon In-Reply-To: <1247505253.27493.15.camel@buzzybee> References: <1247505253.27493.15.camel@buzzybee> Message-ID: <4A698F0C.9090402@gmail.com> 1) Michael Sammeth 2) sequencing, gene expression, splicing, alignment 3) Barcelona beach 4) continental maybe yes, states probably not 5) not available from mid Oct to end of Nov, the rest of the time probably just busy as usual Cheers, micha -- Dr. Michael Sammeth, GRIB http://www.sammeth.net Phone: +34-933-160-166 Fax: +34 933-969-983 Dr. Aiguader 88, 08003 Barcelona From andreas at sdsc.edu Fri Jul 24 11:23:42 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Fri, 24 Jul 2009 08:23:42 -0700 Subject: [Biojava-dev] Hackathon In-Reply-To: References: <1247505253.27493.15.camel@buzzybee> Message-ID: <59a41c430907240823u5dc152i9c6a5854adf9ba0e@mail.gmail.com> > There seem to be many DAS classes under the biojava-live site, but many of > them are not used in any of the DAS related code I have, with the exception > of Structure and Alignment classes that are used in some dazzle plugins. Many of the org.biojava.bio.program.das code is quite ancient and should be deprecated...
In a new biojava-das related module it would be nice to merge in one of the more modern DAS libraries (dasobert?) as a replacement... Andreas From holland at eaglegenomics.com Mon Jul 27 09:23:19 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Mon, 27 Jul 2009 14:23:19 +0100 Subject: [Biojava-dev] [Biojava-l] How to parse large Genbank files?
In-Reply-To: <200907271416.33485.florian.mittag@uni-tuebingen.de> References: <200907241929.08768.florian.mittag@uni-tuebingen.de> <93b45ca50907241920r60c28931p1b43bf6b6a101b46@mail.gmail.com> <200907271416.33485.florian.mittag@uni-tuebingen.de> Message-ID: <1248700999.2803.103.camel@buzzybee> > My question to this list again: > Is there a way to achieve my goal of parsing a 200MB Genbank file with the > current biojava version without code changes? Probably not. The internal requirement to convert everything into SymbolLists and back again really does get in the way. This is one of the main drivers behind BioJava3 - to refactor out unnecessary complexity, of which this is a prime example. The ideal solution would be to parse the file and keep the sequence as a string, only to be converted into Symbols when _absolutely necessary_ - otherwise to remain as a string (or even just as a pointer to a string stored on a disk-based temporary file repository somewhere, to save memory). Hibernate et al could then work directly with the string. cheers, Richard > > - Florian > > > > > On 25 Jul 2009, 1:33 AM, "Florian Mittag" > > wrote: > > > > Hi! > > > > I think this is a problem worthy of its own thread, so I'll start one: > > > > I want to store all human chromosomes in a BioSQL database after I have loaded > > the information from .gbk files. I get the files from NCBI with the following > > URIs, where the id ranges from nc_000001 to nc_000024 plus nc_001804: > > > > http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=nc_000023&rettype=gbwithparts&retmode=text > > > > I then try to parse the files as described in > > http://biojava.org/wiki/BioJava:BioJavaXDocs#Tools_for_reading.2Fwriting_files > > but it won't work. While there are no problems parsing 1804 and 24, chromosome > > 23 leads to an OutOfMemory exception although I gave it 2GB of heap space. > > > > Here is a stack trace (the line numbers might differ, because I already tried > > to improve GenbankFormat.java in memory efficiency): > > > > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space > > at org.biojava.bio.seq.io.ChunkedSymbolListFactory.addSymbols(ChunkedSymbolListFactory.java:222) > > at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.addSymbols(SimpleRichSequenceBuilder.java:256) > > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:535) > > at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) > > at org.prodge.sequence_viewer.db.UpdateDB_Main.updateChromosome(UpdateDB_Main.java:537) > > at org.prodge.sequence_viewer.db.UpdateDB_Main.newGenome(UpdateDB_Main.java:468) > > at org.prodge.sequence_viewer.db.UpdateDB_Main.main(UpdateDB_Main.java:164) > > > > The line in GenbankFormat.java is: > > > > rlistener.addSymbols( > > symParser.getAlphabet(), > > (Symbol[])(sl.toList().toArray(new Symbol[0])), > > 0, sl.length()); > > > > Sometimes it fails at the sl.toList().toArray()-part, sometimes it fails later > > inside the addSymbols method, but it always fails. > > > > How can this be? I mean, the file is only 190MB in size, so 2GB of memory > > should be more than enough. Browsing through the source code, I discovered > > what I think of as very inefficient handling of sequences: > > > > 1) the sequence string is read from file into a StringBuffer > > 2) it is converted to a string (with whitespaces removed) > > 3) a SimpleSymbolList is created out of the string > > 4) the SymbolList is converted to a List of Symbols > > 5) the List is converted to an array of Symbols > > 6) the array is passed to addSymbols > > 7) there it is added to a ChunkedSymbolListFactory > > 8) if at some point the sequence is requested, a SymbolList is created and > > then converted to a string.
> > > > You see, there is a lot of copying and converting, but in the end I have > > the same string I started with. Well, I would have the string if it ever reached > > the end, but it crashes before completing this process. > > > > > > Am I doing something wrong or is there great potential for improving the > > parsing of Genbank files? > > > > > > Regards, > > Florian > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From paolo.pavan at gmail.com Mon Jul 27 12:47:46 2009 From: paolo.pavan at gmail.com (Paolo Pavan) Date: Mon, 27 Jul 2009 18:47:46 +0200 Subject: [Biojava-dev] [Biojava-l] How to parse large Genbank files? In-Reply-To: <1248700999.2803.103.camel@buzzybee> References: <200907241929.08768.florian.mittag@uni-tuebingen.de> <93b45ca50907241920r60c28931p1b43bf6b6a101b46@mail.gmail.com> <200907271416.33485.florian.mittag@uni-tuebingen.de> <1248700999.2803.103.camel@buzzybee> Message-ID: <56be91b60907270947kf03afa3v7976c569d806ad16@mail.gmail.com> Calling a garbage collection between the steps doesn't help, does it? From markjschreiber at gmail.com Mon Jul 27 22:52:44 2009 From: markjschreiber at gmail.com (Mark Schreiber) Date: Tue, 28 Jul 2009 10:52:44 +0800 Subject: [Biojava-dev] [Biojava-l] How to parse large Genbank files? In-Reply-To: <56be91b60907270947kf03afa3v7976c569d806ad16@mail.gmail.com> References: <200907241929.08768.florian.mittag@uni-tuebingen.de> <93b45ca50907241920r60c28931p1b43bf6b6a101b46@mail.gmail.com> <200907271416.33485.florian.mittag@uni-tuebingen.de> <1248700999.2803.103.camel@buzzybee> <56be91b60907270947kf03afa3v7976c569d806ad16@mail.gmail.com> Message-ID: <93b45ca50907271952k689c2f27h78bf7b9cc47e7d45@mail.gmail.com> Dear Paolo - Calling the garbage collector is generally not required and often not recommended. Modern JVMs do a better job of this than programmers do. Also, a garbage collector cannot release memory allocated to objects that are still referenced. I suspect the problem here is that objects are being copied and references are being retained to the old copies. These old copies are not really required, so the references can be set to null, which will allow the GC to clean them up. Also, manually calling the GC is very aggressive and may force the JVM to dump all classes it is not currently using; when such a class is needed again the classloader will have to reload it, which can result in a performance hit.
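To make the point about retained copies concrete, here is a minimal sketch. It is not BioJava code: the class `ChunkedSequenceReader`, its `ChunkHandler` callback and the `stream` method are hypothetical names made up for illustration. Instead of building a full-size String, List and array in turn, it pushes the sequence through two small reusable buffers, so no whole-sequence intermediate copy is ever referenced and there is nothing left to set to null.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration only; not part of BioJava.
class ChunkedSequenceReader {

    /** Callback that receives each cleaned chunk; buf is reused between calls. */
    interface ChunkHandler {
        void handle(char[] buf, int length);
    }

    /**
     * Streams sequence text through two small reusable buffers, dropping
     * whitespace and the position digits found in Genbank ORIGIN lines.
     * Returns the number of sequence characters passed to the handler.
     */
    static long stream(Reader in, int chunkSize, ChunkHandler handler) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        char[] raw = new char[chunkSize];    // filled directly from the reader
        char[] clean = new char[chunkSize];  // same chunk with non-sequence chars removed
        long total = 0;
        try {
            int n;
            while ((n = in.read(raw, 0, raw.length)) != -1) {
                int m = 0;
                for (int i = 0; i < n; i++) {
                    char c = raw[i];
                    if (!Character.isWhitespace(c) && !Character.isDigit(c)) {
                        clean[m++] = c;
                    }
                }
                if (m > 0) {
                    // The handler must consume or copy the chunk before returning.
                    handler.handle(clean, m);
                    total += m;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

A caller would pass a handler that appends each chunk to whatever final store is wanted (a file, a database column, a packed array), so only that one copy of the sequence stays on the heap.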
- Mark On Tue, Jul 28, 2009 at 12:47 AM, Paolo Pavan wrote: > Calling a garbage collection among the steps doesn't bring to > anything, isn't it? > > 2009/7/27 Richard Holland : >> >>> My question to this list again: >>> Is there a way to achieve my goal of parsing a 200MB Genbank file with the >>> current biojava version without code changes? >> >> Probably not. The internal requirement to convert everything into >> SymbolLists and back again really does get in the way. This is one of >> the main drivers behind BioJava3 - to refactor out unnecessary >> complexity, of which this is a prime example. >> >> The ideal solution would be to parse the file and keep the sequence as a >> string, only to be converted into Symbols when _absolutely necessary_ - >> otherwise to remain as a string (or even just as a pointer to a string >> stored on a disk-based temporary file repository somewhere, to save >> memory). Hibernate et al could then work directly with the string. >> >> cheers, >> Richard >> >>> >>> - Florian >>> >>> >>> >>> > On 25 Jul 2009, 1:33 AM, "Florian Mittag" >>> > wrote: >>> > >>> > Hi! >>> > >>> > I think this is a problem worth of its own thread, so I'll start one: >>> > >>> > I want to store all human chromosomes in a BioSQL database after I loaded >>> > the >>> > information from .gbk files. The files I get from NCBI with the following >>> > URIs, where the id ranges from nc_000001 to nc_000024 plus nc_001804: >>> > >>> > http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=nc_0 >>> >00023&rettype=gbwithparts&retmode=text >>> > >>> > I then try to parse the files as described in >>> > http://biojava.org/wiki/BioJava:BioJavaXDocs#Tools_for_reading.2Fwriting_fi >>> >les but it wont work. While there are no problems parsing 1804 and 24, >>> > chromosome >>> > 23 leads to a OutOfMemory exception although I gave it 2GB of heap space. 
>>> > >>> > Here is a stack trace (the line numbers might differ, because I already >>> > tried >>> > to improve GenbankFormat.java in memory efficiency): >>> > >>> > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space >>> > ? ? ? ?at >>> > org.biojava.bio.seq.io.ChunkedSymbolListFactory.addSymbols(ChunkedSymbolLis >>> >tFactory.java:222) at >>> > org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.addSymbols(SimpleRichSequ >>> >enceBuilder.java:256) at >>> > org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:5 >>> >35) at >>> > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader. >>> >java:110) at >>> > org.prodge.sequence_viewer.db.UpdateDB_Main.updateChromosome(UpdateDB_Main. >>> >java:537) at >>> > org.prodge.sequence_viewer.db.UpdateDB_Main.newGenome(UpdateDB_Main.java:46 >>> >8) at >>> > org.prodge.sequence_viewer.db.UpdateDB_Main.main(UpdateDB_Main.java:164) >>> > >>> > The line in GenbankFormat.java is: >>> > >>> > rlistener.addSymbols( >>> > ? ? ? ?symParser.getAlphabet(), >>> > ? ? ? ?(Symbol[])(sl.toList().toArray(new Symbol[0])), >>> > ? ? ? ?0, sl.length()); >>> > >>> > Sometimes it fails at the sl.toList().toArray()-part, sometimes it fails >>> > later >>> > inside the addSymbols method, but it always fails. >>> > >>> > How can this be? I mean, the file is only 190MB in size, so 2GB of memory >>> > should be more than enough. 
Browsing through the source code, I discovered
>>> > what I think of as very inefficient handling of sequences:
>>> >
>>> > 1) the sequence string is read from file into a StringBuffer
>>> > 2) it is converted to a string (with whitespaces removed)
>>> > 3) a SimpleSymbolList is created out of the string
>>> > 4) the SymbolList is converted to a List of Symbols
>>> > 5) the List is converted to an array of Symbols
>>> > 6) the array is passed to addSymbols
>>> > 7) there it is added to a ChunkedSymbolListFactory
>>> > 8) if at some point the sequence is requested, a SymbolList is created and
>>> >    then converted to a string.
>>> >
>>> > You see, there is a lot of copying and converting, but in the end I have
>>> > the same string I started with. Well, I had the string, if it ever reached
>>> > the end, because it will crash before completing this process.
>>> >
>>> >
>>> > Am I doing something wrong or is there a great potential of improving
>>> > parsing of Genbank files?
>>> >
>>> >
>>> > Regards,
>>> >   Florian
>>> > _______________________________________________
>>> > Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>> > http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

From abhishek.vit at gmail.com Tue Jul 28 15:04:00 2009
From: abhishek.vit at gmail.com (Abhishek Pratap)
Date: Tue, 28 Jul 2009 15:04:00 -0400
Subject: [Biojava-dev] Hi.. 
Calling Java from Perl Message-ID: Hi Guys Before I ask the question, let me introduce myself. I am Abhishek, primarily a Bioinformatician, and this is my first mail here. I realized sooner than later that I have to use BioJava to make my life easier. :) So basically we have a lot of Perl code where we would like to plug in some BioJava code and some in-house written packages/classes. I am just wondering what is the best way to do so. Clearly I am not a Java guy so please excuse me in case I am asking something which is very basic. I found a couple of solutions after a few Google searches but I am not sure which is the most efficient one. Thanks, -Abhi From ayates at ebi.ac.uk Tue Jul 28 17:54:17 2009 From: ayates at ebi.ac.uk (Andy Yates) Date: Tue, 28 Jul 2009 22:54:17 +0100 Subject: [Biojava-dev] Hi.. Calling Java from Perl In-Reply-To: References: Message-ID: Hi Abhi, Well to answer your first question the only real way to do this is by shelling out to Java. Inter-process communication could then be dealt with by writing to temporary files or maybe communicating back over STDOUT. The question I would ask you though is what particular part of BioJava are you using? Is there any reason why another similarly named Bio project (shall not mention it here as I think people think I'm becoming weak when it comes to Perl) cannot be used? As always when programming, avoiding shelling out to another program is a good idea where possible; sometimes it cannot be avoided, say if you want to run clustalw, but shelling out just to delete a file is unnecessary. Andy On 28 Jul 2009, at 20:04, Abhishek Pratap wrote: > Hi Guys > > Before I ask the question, let me introduce myself. I am Abhishek, > primarily a Bioinformatician, and this is my first mail here. I > realized sooner than later that I have to use BioJava to make my life > easier. :) > > So basically we have a lot of Perl code where we would like to plug in > some BioJava code and some in-house written packages/classes.
I am just > wondering what is the best way to do so. Clearly I am not a java guy > so please excuse me in case I am asking something which is very basic. > I found couple of solutions after few googles but not sure which is > the efficient one. > > Thanks, > -Abhi > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From abhishek.vit at gmail.com Tue Jul 28 18:23:49 2009 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Tue, 28 Jul 2009 18:23:49 -0400 Subject: [Biojava-dev] Hi.. Calling Java from Perl In-Reply-To: References: Message-ID: Hi Andy Thanks for a quick reply. I think SHELLING out will be too process intensive as we expect thousands of call to same Java method. I also read about the Perl modules Java::Inline. Is that any good ? And to answer your second question I am basically using a inhouse method which in turns used a lot of BioJava classes for DNA manipulation. Thanks, -Abhi On Tue, Jul 28, 2009 at 5:54 PM, Andy Yates wrote: > Hi Abhi, > > Well to answer your first question the only real way to do this is by > shelling out to Java. Inter-process communication could then be dealt with > by writing to temporary files or maybe communicating back over STDOUT. > > The question I would ask you though is what particular part of BioJava are > you using? Is there any reason why another similarly named Bio project > (shall not mention it here as I think people think I'm becoming weak when it > comes to Perl) cannot be used? As always when programming avoiding shelling > out to another program if possible is always a good idea; sometimes it > cannot happen say if you want to run clustalw but say shelling out to delete > a file is unnecessary. > > Andy > > On 28 Jul 2009, at 20:04, Abhishek Pratap wrote: > >> Hi Guys >> >> Before I ask the question, let me introduce myself. 
I am Abhishek >> primarily a Bioinformatician and this is my first mail here. I >> realized sooner thn later that I have to use BioJava to make my life >> easier. :) >> >> So basically we have a lot of perl code where we would like to plugin >> some Biojava code and some inhouse written packages/classes. I am just >> wondering what is the best way to do so. Clearly I am not a java guy >> so please excuse me in case I am asking something which is very basic. >> I found couple of solutions after few googles but not sure which is >> the efficient one. >> >> Thanks, >> -Abhi >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > > From markjschreiber at gmail.com Tue Jul 28 19:24:30 2009 From: markjschreiber at gmail.com (Mark Schreiber) Date: Wed, 29 Jul 2009 07:24:30 +0800 Subject: [Biojava-dev] Hi.. Calling Java from Perl In-Reply-To: <93b45ca50907281619q4fc1572aoed7333a72534fa14@mail.gmail.com> References: <93b45ca50907281619q4fc1572aoed7333a72534fa14@mail.gmail.com> Message-ID: <93b45ca50907281624u1a8bb7cy81c50f1c90434f5@mail.gmail.com> Hi - You could try and use something like CORBA but that would be quite ugly. A nicer alternative would be to put the BioJava functionality in a web service and send sequences as FASTA or some custom format?? I think WS is considered the best way for Java and .NET to talk so probably it is for Perl too. - Mark On 29 Jul 2009, 6:24 AM, "Abhishek Pratap" wrote: Hi Andy Thanks for a quick reply. I think SHELLING out will be too process intensive as we expect thousands of call to same Java method. I also read about the Perl modules Java::Inline. Is that any good ? And to answer your second question I am basically using a inhouse method which in turns used a lot of BioJava classes for DNA manipulation. Thanks, -Abhi On Tue, Jul 28, 2009 at 5:54 PM, Andy Yates wrote: > Hi Abhi, > > Well to answer ... 
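A rough sketch of the web-service idea suggested above, for illustration only. It keeps one JVM running and answers plain HTTP POSTs, so a Perl caller would not pay the JVM start-up cost on every call. Several things here are assumptions that do not come from the thread: the class name RevCompService, the /revcomp path, port 8080, and the reverseComplement method, which is a plain-Java stand-in for the in-house BioJava-based method. It uses only the com.sun.net.httpserver package bundled with the JDK and needs Java 9 or later for readAllBytes.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

class RevCompService {

    // Hypothetical stand-in for the in-house method; a real service
    // would call the BioJava-based code here instead.
    static String reverseComplement(String dna) {
        StringBuilder out = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            switch (Character.toUpperCase(dna.charAt(i))) {
                case 'A': out.append('T'); break;
                case 'T': out.append('A'); break;
                case 'C': out.append('G'); break;
                case 'G': out.append('C'); break;
                default:  out.append('N'); break; // anything else becomes N
            }
        }
        return out.toString();
    }

    // Starts a local HTTP server; the request body is the raw sequence
    // and the response body is its reverse complement.
    static HttpServer start(int port) throws IOException {
        HttpServer server =
                HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/revcomp", exchange -> {
            byte[] body = exchange.getRequestBody().readAllBytes();
            String seq = new String(body, StandardCharsets.US_ASCII)
                    .replaceAll("\\s", ""); // drop line breaks and spaces
            byte[] reply = (reverseComplement(seq) + "\n")
                    .getBytes(StandardCharsets.US_ASCII);
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(reply);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(8080); // port chosen arbitrarily
        System.out.println("Listening on " + server.getAddress());
    }
}
```

From Perl, a single LWP::UserAgent object could POST the raw sequence to http://127.0.0.1:8080/revcomp and read the reply body. Whether that is fast enough for thousands of calls per run would need measuring; sending many sequences in one request is one way to cut the per-call overhead.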
From ayates at ebi.ac.uk Wed Jul 29 04:48:33 2009 From: ayates at ebi.ac.uk (Andy Yates) Date: Wed, 29 Jul 2009 09:48:33 +0100 Subject: [Biojava-dev] Hi.. Calling Java from Perl In-Reply-To: <93b45ca50907281624u1a8bb7cy81c50f1c90434f5@mail.gmail.com> References: <93b45ca50907281619q4fc1572aoed7333a72534fa14@mail.gmail.com> <93b45ca50907281624u1a8bb7cy81c50f1c90434f5@mail.gmail.com> Message-ID: <4A700CE1.3050901@ebi.ac.uk> Indeed I would agree with Mark here and go for web services as the desired solution. JAX-WS & CXF are both popular frameworks for doing Web Services and as far as I remember Spring had some very nice helper classes for quickly exposing any Java class as a web service. Then there are other remoting protocols such as Hessian, Burlap, Protocol Buffers or Thrift all of which are good in their own ways. However Web Services should be the quickest (re implementation) way to communicate with a persistent Java process. Personally I would stay away from Java::Inline. Andy Mark Schreiber wrote: > Hi - > > You could try and use something like CORBA but that would be quite ugly. > > A nicer alternative would be to put the BioJava functionality in a web > service and send sequences as FASTA or some custom format?? > > I think WS is considered the best way for Java and .NET to talk so probably > it is for Perl too. > > - Mark > > On 29 Jul 2009, 6:24 AM, "Abhishek Pratap" wrote: > > Hi Andy > > Thanks for a quick reply. I think SHELLING out will be too process > intensive as we expect thousands of call to same Java method. I also > read about the Perl modules Java::Inline. Is that any good ? > > And to answer your second question I am basically using a inhouse > method which in turns used a lot of BioJava classes for DNA > manipulation. > > Thanks, > -Abhi > > On Tue, Jul 28, 2009 at 5:54 PM, Andy Yates wrote: > Hi > Abhi, > > Well to answer ... 
> From abhishek.vit at gmail.com Wed Jul 29 12:04:04 2009 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Wed, 29 Jul 2009 12:04:04 -0400 Subject: [Biojava-dev] Hi.. Calling Java from Perl In-Reply-To: <4A700CE1.3050901@ebi.ac.uk> References: <93b45ca50907281619q4fc1572aoed7333a72534fa14@mail.gmail.com> <93b45ca50907281624u1a8bb7cy81c50f1c90434f5@mail.gmail.com> <4A700CE1.3050901@ebi.ac.uk> Message-ID: Thanks all. I think Java WS is a way out for me then. As you said it would be code agnostic and will help me in updating the core code later. Just a quick question: do you happen to know of any good tutorial on implementing a WS for a Java process? Thanks, -Abhi On Wed, Jul 29, 2009 at 4:48 AM, Andy Yates wrote: > Indeed I would agree with Mark here and go for web services as the > desired solution. JAX-WS & CXF are both popular frameworks for doing Web > Services and as far as I remember Spring had some very nice helper > classes for quickly exposing any Java class as a web service. Then there > are other remoting protocols such as Hessian, Burlap, Protocol Buffers > or Thrift all of which are good in their own ways. > > However Web Services should be the quickest (re implementation) way to > communicate with a persistent Java process. > > Personally I would stay away from Java::Inline. > > Andy > > Mark Schreiber wrote: >> Hi - >> >> You could try and use something like CORBA but that would be quite ugly. >> >> A nicer alternative would be to put the BioJava functionality in a web >> service and send sequences as FASTA or some custom format?? >> >> I think WS is considered the best way for Java and .NET to talk so probably >> it is for Perl too. >> >> - Mark >> >> On 29 Jul 2009, 6:24 AM, "Abhishek Pratap" wrote: >> >> Hi Andy >> >> Thanks for a quick reply. I think SHELLING out will be too process >> intensive as we expect thousands of call to same Java method. I also >> read about the Perl modules Java::Inline. Is that any good ?
>> >> And to answer your second question I am basically using a inhouse >> method which in turns used a lot of BioJava classes for DNA >> manipulation. >> >> Thanks, >> -Abhi >> >> On Tue, Jul 28, 2009 at 5:54 PM, Andy Yates wrote: > Hi >> Abhi, > > Well to answer ... >> > From ayates at ebi.ac.uk Wed Jul 29 12:21:12 2009 From: ayates at ebi.ac.uk (Andy Yates) Date: Wed, 29 Jul 2009 17:21:12 +0100 Subject: [Biojava-dev] Hi.. Calling Java from Perl In-Reply-To: References: <93b45ca50907281619q4fc1572aoed7333a72534fa14@mail.gmail.com> <93b45ca50907281624u1a8bb7cy81c50f1c90434f5@mail.gmail.com> <4A700CE1.3050901@ebi.ac.uk> Message-ID: <4A7076F8.2040308@ebi.ac.uk> Depends on what you're going to use but when I last did it I bought into the Spring way of things and found that the spring manual was very good. The WS bit is: http://static.springsource.org/spring/docs/3.0.x/spring-framework-reference/html/ch21s05.html It goes through doing it for JAX-WS & XFire. There's also a JAX-WS tutorial from: http://java.sun.com/javaee/5/docs/tutorial/doc/?wp405739&JAXWS.html#wp72279 To be honest though Google is your best friend here. Good luck, Andy Abhishek Pratap wrote: > Thanks all. I think Java WS is a way out for me then. As you said it > would be code agnostic and will help me in updating the core code > later. > > Just a quick question . Do you happen to know of any good tutorial to > implement a WS for a java process. > > Thanks, > -Abhi > > On Wed, Jul 29, 2009 at 4:48 AM, Andy Yates wrote: >> Indeed I would agree with Mark here and go for web services as the >> desired solution. JAX-WS & CXF are both popular frameworks for doing Web >> Services and as far as I remember Spring had some very nice helper >> classes for quickly exposing any Java class as a web service. Then there >> are other remoting protocols such as Hessian, Burlap, Protocol Buffers >> or Thrift all of which are good in their own ways. 
>> >> However Web Services should be the quickest (re implementation) way to >> communicate with a persistent Java process. >> >> Personally I would stay away from Java::Inline. >> >> Andy >> >> Mark Schreiber wrote: >>> Hi - >>> >>> You could try and use something like CORBA but that would be quite ugly. >>> >>> A nicer alternative would be to put the BioJava functionality in a web >>> service and send sequences as FASTA or some custom format?? >>> >>> I think WS is considered the best way for Java and .NET to talk so probably >>> it is for Perl too. >>> >>> - Mark >>> >>> On 29 Jul 2009, 6:24 AM, "Abhishek Pratap" wrote: >>> >>> Hi Andy >>> >>> Thanks for a quick reply. I think SHELLING out will be too process >>> intensive as we expect thousands of call to same Java method. I also >>> read about the Perl modules Java::Inline. Is that any good ? >>> >>> And to answer your second question I am basically using a inhouse >>> method which in turns used a lot of BioJava classes for DNA >>> manipulation. >>> >>> Thanks, >>> -Abhi >>> >>> On Tue, Jul 28, 2009 at 5:54 PM, Andy Yates wrote: > Hi >>> Abhi, > > Well to answer ... >>> From Russell.Smithies at agresearch.co.nz Wed Jul 29 16:25:05 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 30 Jul 2009 08:25:05 +1200 Subject: [Biojava-dev] Hi.. Calling Java from Perl In-Reply-To: References: Message-ID: <18DF7D20DFEC044098A1062202F5FFF32AAB5A4EC5@exchsth.agresearch.co.nz> You could always use BioPerl instead :-) http://www.bioperl.org/wiki/Main_Page Russell Smithies Bioinformatics Applications Developer T +64 3 489 9085 E russell.smithies at agresearch.co.nz Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T +64 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz > -----Original Message----- > From: biojava-dev-bounces at lists.open-bio.org [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of Abhishek Pratap > Sent: Wednesday, 29 July 2009 7:04 a.m.
> To: biojava-dev at lists.open-bio.org > Subject: [Biojava-dev] Hi.. Calling Java from Perl > > Hi Guys > > Before I ask the question, let me introduce myself. I am Abhishek > primarily a Bioinformatician and this is my first mail here. I > realized sooner thn later that I have to use BioJava to make my life > easier. :) > > So basically we have a lot of perl code where we would like to plugin > some Biojava code and some inhouse written packages/classes. I am just > wondering what is the best way to do so. Clearly I am not a java guy > so please excuse me in case I am asking something which is very basic. > I found couple of solutions after few googles but not sure which is > the efficient one. > > Thanks, > -Abhi > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From ayates at ebi.ac.uk Wed Jul 29 16:56:34 2009 From: ayates at ebi.ac.uk (ayates at ebi.ac.uk) Date: Wed, 29 Jul 2009 21:56:34 +0100 (BST) Subject: [Biojava-dev] Hi.. 
Calling Java from Perl In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32AAB5A4EC5@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF32AAB5A4EC5@exchsth.agresearch.co.nz> Message-ID: <35760.86.9.203.151.1248900994.squirrel@webmail.ebi.ac.uk> That was my original point however it sounds from the original poster that the system which is in Perl needs to call out to an already implemented system in BioJava. In a perfect world this mismatch would never happen but hey we all know it can :) Andy > You could always use BioPerl instead :-) > http://www.bioperl.org/wiki/Main_Page > > > > > Russell Smithies > > Bioinformatics Applications Developer > T +64 3 489 9085 > E russell.smithies at agresearch.co.nz > > Invermay Research Centre > Puddle Alley, > Mosgiel, > New Zealand > T +64 3 489 3809 > F +64 3 489 9174 > www.agresearch.co.nz > > > >> -----Original Message----- >> From: biojava-dev-bounces at lists.open-bio.org [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of Abhishek Pratap >> Sent: Wednesday, 29 July 2009 7:04 a.m. >> To: biojava-dev at lists.open-bio.org >> Subject: [Biojava-dev] Hi.. Calling Java from Perl >> >> Hi Guys >> >> Before I ask the question, let me introduce myself. I am Abhishek >> primarily a Bioinformatician and this is my first mail here. I >> realized sooner thn later that I have to use BioJava to make my life >> easier. :) >> >> So basically we have a lot of perl code where we would like to plugin >> some Biojava code and some inhouse written packages/classes. I am just >> wondering what is the best way to do so. Clearly I am not a java guy >> so please excuse me in case I am asking something which is very basic. >> I found couple of solutions after few googles but not sure which is >> the efficient one.
>> >> Thanks, >> -Abhi >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From abhishek.vit at gmail.com Wed Jul 29 17:06:54 2009 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Wed, 29 Jul 2009 17:06:54 -0400 Subject: [Biojava-dev] Hi.. Calling Java from Perl In-Reply-To: <35760.86.9.203.151.1248900994.squirrel@webmail.ebi.ac.uk> References: <18DF7D20DFEC044098A1062202F5FFF32AAB5A4EC5@exchsth.agresearch.co.nz> <35760.86.9.203.151.1248900994.squirrel@webmail.ebi.ac.uk> Message-ID: Yeah it is part of the development cycle. We need to integrate the code; some of it is in Perl and some in Java. From your suggestions I feel I clearly have two options. 1. Use a web service to talk between Java and Perl. This might not be very efficient as we expect to make thousands of calls per run. 2. Port the whole Java code to BioPerl. #2 is scary but I might just have to do it.
Thanks again to all of you, -Abhi On Wed, Jul 29, 2009 at 4:56 PM, wrote: > That was my original point however it sounds from the original poster that > the system which is in Perl needs to call out to an already implemented > system in BioJava. In a perfect world this mismatch would never happen but > hey we all know it can :) > > Andy > >> You could always use BioPerl instead :-) >> http://www.bioperl.org/wiki/Main_Page >> >> >> >> >> Russell Smithies >> >> Bioinformatics Applications Developer >> T +64 3 489 9085 >> E russell.smithies at agresearch.co.nz >> >> Invermay Research Centre >> Puddle Alley, >> Mosgiel, >> New Zealand >> T +64 3 489 3809 >> F +64 3 489 9174 >> www.agresearch.co.nz >> >> >> >>> -----Original Message----- >>> From: biojava-dev-bounces at lists.open-bio.org [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of Abhishek Pratap >>> Sent: Wednesday, 29 July 2009 7:04 a.m. >>> To: biojava-dev at lists.open-bio.org >>> Subject: [Biojava-dev] Hi.. Calling Java from Perl >>> >>> Hi Guys >>> >>> Before I ask the question, let me introduce myself. I am Abhishek >>> primarily a Bioinformatician and this is my first mail here. I >>> realized sooner thn later that I have to use BioJava to make my life >>> easier. :) >>> >>> So basically we have a lot of perl code where we would like to plugin >>> some Biojava code and some inhouse written packages/classes. I am just >>> wondering what is the best way to do so. Clearly I am not a java guy >>> so please excuse me in case I am asking something which is very basic. >>> I found couple of solutions after few googles but not sure which is >>> the efficient one.
>>> >>> Thanks, >>> -Abhi >>> _______________________________________________ >>> biojava-dev mailing list >>> biojava-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> ======================================================================= >> Attention: The information contained in this message and/or attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or privileged >> material. Any review, retransmission, dissemination or other use of, or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by AgResearch >> Limited. If you have received this message in error, please notify the >> sender immediately. >> ======================================================================= >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> > > > From niall at sgenomics.org Thu Jul 30 12:32:00 2009 From: niall at sgenomics.org (Niall Haslam) Date: Thu, 30 Jul 2009 18:32:00 +0200 Subject: [Biojava-dev] Webservices Message-ID: <200907301832.01103.niall@sgenomics.org> Hi, I know it was brought up in the users list a month or two ago. But I wanted to ask in the Dev list what the consensus is on creating a biojava module for webservices clients. I am interested and have a little code to contribute. I think it would consist of mainly example code in how to use the webservice. And critically would not incorporate the stub code generated by axis. I would also bump for axis2. I think this could have the benefit of making services more standards compliant. But we'll probably have to do it on a case by case basis. I'd also like to know if there are people who are interested in using or writing some of it as well. 
Thanks and looking forward to your input, Niall. From HWillis at scripps.edu Fri Jul 31 10:04:31 2009 From: HWillis at scripps.edu (Scooter Willis) Date: Fri, 31 Jul 2009 10:04:31 -0400 Subject: [Biojava-dev] Webservices In-Reply-To: <200907301832.01103.niall@sgenomics.org> Message-ID: Niall I have the web services biojava implementation on my list of things to do! I have an upcoming project that doing Blast through web services to external sources and internal sources will make things easier. I like what axis2 is doing on making it easy to publish web services but using Netbeans as an example it is fairly painless to create a web service. Since we are mainly focused on consuming web services it would be nice to use the built in support of Java 6 to keep the external library count as low as possible which also helps avoid conflicts when an external application is using a different version of the same external library. I think the main driving force as you mention is that much will depend on the provider of the web service as to what web services client library will be needed. Thanks Scooter On 7/30/09 12:32 PM, "Niall Haslam" wrote: Hi, I know it was brought up in the users list a month or two ago. But I wanted to ask in the Dev list what the consensus is on creating a biojava module for webservices clients. I am interested and have a little code to contribute. I think it would consist of mainly example code in how to use the webservice. And critically would not incorporate the stub code generated by axis. I would also bump for axis2. I think this could have the benefit of making services more standards compliant. But we'll probably have to do it on a case by case basis. I'd also like to know if there are people who are interested in using or writing some of it as well. Thanks and looking forward to your input, Niall. 
_______________________________________________ biojava-dev mailing list biojava-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-dev
From hunter at ebi.ac.uk Thu Jul 2 14:56:28 2009 From: hunter at ebi.ac.uk (Sarah Hunter) Date: Thu, 02 Jul 2009 15:56:28 +0100 Subject: [Biojava-dev] Parsing Interpro results In-Reply-To: <12c279870907020749s1f6b9890ub7090b89c473340c@mail.gmail.com> References: <12c279870907020749s1f6b9890ub7090b89c473340c@mail.gmail.com> Message-ID: <4A4CCA9C.1080404@ebi.ac.uk> Hi Radwen (and the rest of the biojava guys), As far as I am aware, there isn't a biojava parser for InterProScan results. However, we are undergoing a complete re-write of InterPro and InterProScan at the moment and it is our intention to provide a java API for accessing all of our data. If you wish to be involved in testing this API, please contact the InterPro team via the EBI's support pages (http://www.ebi.ac.uk/support/) Many thanks for your interest. Sarah Hunter --- Sarah Hunter InterPro Team Leader European Bioinformatics Institute Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD, UK ===================================== > From: Radwen ANIBA > Date: Thu, Jul 2, 2009 at 2:28 PM > Subject: [Biojava-dev] Parsing Interpro results > To: biojava-dev at lists.open-bio.org > > > Hello everyone, > > I looked around in Biojava doc and through internet but I did'nt found how > to parse Interproscan results (xml as well as tabular formats) > It is not hard to code it in Java, But I just wanted to know if this exists > or not. > > Regards > Rad > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > From fbristow at gmail.com Tue Jul 7 13:34:09 2009 From: fbristow at gmail.com (Franklin Bristow) Date: Tue, 7 Jul 2009 08:34:09 -0500 Subject: [Biojava-dev] Extended ABIF Parser and SCF Writer Message-ID: <50a7756d0907070634s31051bc0t3410ea2e33fa0686@mail.gmail.com> Hi everyone, Now that you're all back from ISMB (I hope you all had a good time!)
I thought it would be a good time to bring this up. A while back I wrote to the list about an ABIF parser and SCF writer that I had written. I got some pointers on things to change and I've since made the suggested changes. Now I was wondering how I should go about getting these files into BioJava.... -- Franklin From andreas at sdsc.edu Tue Jul 7 16:51:05 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Tue, 7 Jul 2009 09:51:05 -0700 Subject: [Biojava-dev] Extended ABIF Parser and SCF Writer In-Reply-To: <50a7756d0907070634s31051bc0t3410ea2e33fa0686@mail.gmail.com> References: <50a7756d0907070634s31051bc0t3410ea2e33fa0686@mail.gmail.com> Message-ID: <59a41c430907070951r16e22bf3h7e83b16012cb2700@mail.gmail.com> Hi Franklin, The theme of the moment is modularization... I wonder if we should make a module for parsing the output of sequencers... This topic is also a bit related to the discussion we had around BOSC last week, how to contribute modules, and what is the role of a module maintainer. I will send out a more detailed summary on that a bit later. Andreas On Tue, Jul 7, 2009 at 6:34 AM, Franklin Bristow wrote: > Hi everyone, > Now that you're all back from ISMB (I hope you all had a good time!) I > thought it would be a good time to bring this up. > > A while back I wrote to the list about an ABIF parser and SCF writer that I > had written. I got some pointers on things to change and I've since made > the suggested changes. Now I was wondering how I should go about getting > these files into BioJava.... 
> > -- > Franklin > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From gmicha at gmail.com Tue Jul 7 17:08:07 2009 From: gmicha at gmail.com (Micha Sammeth) Date: Tue, 07 Jul 2009 19:08:07 +0200 Subject: [Biojava-dev] Extended ABIF Parser and SCF Writer In-Reply-To: <59a41c430907070951r16e22bf3h7e83b16012cb2700@mail.gmail.com> References: <50a7756d0907070634s31051bc0t3410ea2e33fa0686@mail.gmail.com> <59a41c430907070951r16e22bf3h7e83b16012cb2700@mail.gmail.com> Message-ID: <4A5380F7.6090602@gmail.com> Hi, the time I worked on SCF and ABI files is a bit ago, but lets see if I can contribute something here. Working since a year on NGS, I could imagine that readers for the standard-output of pipelines by Illumina & Co would also fit there. If there is a reader module, I would plead for a low-level interface for accessing sequences/qualities and re-usable data containers during I/O. But maybe that is rather an early stage to talk about that when not even the existence of the module is decided. cheers - micha. Andreas Prlic wrote: > Hi Franklin, > > The theme of the moment is modularization... I wonder if we should make a > module for parsing the output of sequencers... > > This topic is also a bit related to the discussion we had around BOSC last > week, how to contribute modules, and what is the role of a module > maintainer. I will send out a more detailed summary on that a bit later. > > Andreas > > > On Tue, Jul 7, 2009 at 6:34 AM, Franklin Bristow wrote: > >> Hi everyone, >> Now that you're all back from ISMB (I hope you all had a good time!) I >> thought it would be a good time to bring this up. >> >> A while back I wrote to the list about an ABIF parser and SCF writer that I >> had written. I got some pointers on things to change and I've since made >> the suggested changes. 
Now I was wondering how I should go about getting >> these files into BioJava.... >> >> -- >> Franklin >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From fbristow at gmail.com Tue Jul 7 17:29:56 2009 From: fbristow at gmail.com (Franklin Bristow) Date: Tue, 7 Jul 2009 12:29:56 -0500 Subject: [Biojava-dev] Extended ABIF Parser and SCF Writer In-Reply-To: <4A5380F7.6090602@gmail.com> References: <50a7756d0907070634s31051bc0t3410ea2e33fa0686@mail.gmail.com> <59a41c430907070951r16e22bf3h7e83b16012cb2700@mail.gmail.com> <4A5380F7.6090602@gmail.com> Message-ID: <50a7756d0907071029s45ee0983y85f2ee307765e65c@mail.gmail.com> Hello, I think I like the idea of having a module for the I/O of sequencers in general. I really only have familiarity with ABI sequencers (ie: 31xx and 37xx) and the data that they spit out, so I would be able to offer some help there. Needless to say, the documentation that ABI released regarding their binary format was much appreciated when I was going through the code. To Micha: when you talk about 'during I/O', do you mean having some kind of an event based parser? When I wrote my extended ABIF parser I modelled it after the perl module Bio::Trace::ABIF, so there are accessors for many of the tags that are defined in the ABI spec. On Tue, Jul 7, 2009 at 12:08 PM, Micha Sammeth wrote: > Hi, > > the time I worked on SCF and ABI files is a bit ago, but lets see if I can > contribute something here. Working since a year on NGS, I could imagine that > readers for the standard-output of pipelines by Illumina & Co would also fit > there. 
> > If there is a reader module, I would plead for a low-level interface for > accessing sequences/qualities and re-usable data containers during I/O. But > maybe that is rather an early stage to talk about that when not even the > existence of the module is decided. > > cheers - micha. > > > Andreas Prlic wrote: > >> Hi Franklin, >> >> The theme of the moment is modularization... I wonder if we should make a >> module for parsing the output of sequencers... >> >> This topic is also a bit related to the discussion we had around BOSC last >> week, how to contribute modules, and what is the role of a module >> maintainer. I will send out a more detailed summary on that a bit later. >> >> Andreas >> >> >> On Tue, Jul 7, 2009 at 6:34 AM, Franklin Bristow >> wrote: >> >> Hi everyone, >>> Now that you're all back from ISMB (I hope you all had a good time!) I >>> thought it would be a good time to bring this up. >>> >>> A while back I wrote to the list about an ABIF parser and SCF writer that >>> I >>> had written. I got some pointers on things to change and I've since made >>> the suggested changes. Now I was wondering how I should go about getting >>> these files into BioJava.... >>> >>> -- >>> Franklin >>> _______________________________________________ >>> biojava-dev mailing list >>> biojava-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >>> >>> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> > > > -- Franklin From sreekanth.m at ocimumbio.com Wed Jul 8 05:51:15 2009 From: sreekanth.m at ocimumbio.com (Sreekanth Mogullapally) Date: Wed, 8 Jul 2009 11:21:15 +0530 Subject: [Biojava-dev] Reg: Source of BioJava Message-ID: <2DDF09AFEB46E54894A3843CEF9CB3A4409F7CE20F@EXCHMB.ocimumbio.com> Dear all, Just now I started working with biojava to work in next generation sequencing. 
I got all the jar files to work with biojava, and I found many things which are very useful to me. I require the source jar file of biojava. If anybody has it, please send it to me. Thanks in advance. Thanks & Regards, Sreekanth.M From andreas at sdsc.edu Wed Jul 8 06:20:59 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Tue, 7 Jul 2009 23:20:59 -0700 Subject: [Biojava-dev] summary biojava user meeting Message-ID: <59a41c430907072320k3d5a4415u962d59a10d286beb@mail.gmail.com> Hi, Here is a quick summary of the BioJava user meeting we had last week at the BOSC conference. The following people were present: Mattias Piipari, Martijn Devisscher, Frederik Decouttere, Richard Holland, Andreas Prlic. The new modularized code base will allow individual people to take over responsibility for some of the sub-modules, as well as to contribute new modules, both of which I greatly welcome. As such it was great to have Mattias, Martijn and Frederik there expressing their interest in this. Mattias is interested in contributing a new module related to machine learning. Martijn and Frederik are interested in providing a new GUI module (seqpad). Because of this, our discussions were mainly about how to organize the contribution of new modules and their maintenance:
* Before starting a new module, the code should undergo public code review.
* New modules need documentation (wiki cookbook) and JUnit tests.
* A Module Maintainer (MM) is the main person responsible for everything related to the module.
* The MM coordinates patches and other user contributions for the module.
* The MM can write papers related to the code in the module without having to cite all of the other BioJava contributors.
* An MM volunteers to support the module for (at least) a year.
* All MMs will be listed by name on a wiki page in order to clarify responsibilities.
Andreas From holland at eaglegenomics.com Wed Jul 8 16:41:54 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Wed, 08 Jul 2009 18:41:54 +0200 Subject: [Biojava-dev] Reg: Source of BioJava In-Reply-To: <2DDF09AFEB46E54894A3843CEF9CB3A4409F7CE20F@EXCHMB.ocimumbio.com> References: <2DDF09AFEB46E54894A3843CEF9CB3A4409F7CE20F@EXCHMB.ocimumbio.com> Message-ID: <1247071314.3792.9.camel@buzzybee> The source code can be obtained by following these instructions: http://biojava.org/wiki/CVS_to_SVN_Migration Richard. On Wed, 2009-07-08 at 11:21 +0530, Sreekanth Mogullapally wrote: > Dear all, > > Just now I started working with biojava to work in next generation sequencing. > I got all the jar files to work with biojava, and I found many things which are very useful to me. > I require the source jar file of biojava. If anybody has it, please send it to me. > > Thanks in advance. > > Thanks & Regards, > Sreekanth.M -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From florian.mittag at uni-tuebingen.de Thu Jul 9 15:16:12 2009 From: florian.mittag at uni-tuebingen.de (Florian Mittag) Date: Thu, 9 Jul 2009 17:16:12 +0200 Subject: [Biojava-dev] Problems in DB2 with VARCHAR, TEXT and CLOB using BioJava Message-ID: <200907091716.13639.florian.mittag@uni-tuebingen.de> Hi all! I'm posting this to both the BioSQL and the BioJava-dev mailing lists because the problem resides in both domains; I hope this is okay. We're working on getting BioJava to run with a DB2 Express-C backend for various reasons. We've encountered several problems during this task, but this one seems to have no real solution.
When adapting the BioSQL schema to DB2, the official IBM conversion guide tells us to use the data type CLOB where MySQL uses TEXT. (Chapter 11 in ftp://ftp.software.ibm.com/software/data/db2/migration/mtk/mtk_2050.pdf) So far, no problem. But when we tried reading some GenBank files with BioJava, the DB2 driver threw an exception:

SQL0401N The data types of the operands for the operation "=" are not compatible. SQLSTATE=42818 SQLCODE=-401

Explanation: The class org.biojavax.bio.db.biosql.BioSQLRichObjectBuilder defines some Hibernate queries, one of which has the conditions:

"from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?"

All three columns "authors", "location", and "title" are of type TEXT in MySQL and of type CLOB in DB2, so comparing them with "=" leads to the above error message. The way I see it, there are only two possible solutions to this problem:

1) Change the query to "from DocRef as cr where cr.authors LIKE ? and cr.location LIKE ? and cr.title LIKE ?"
2) Change the data type to something comparable with "=", like VARCHAR.

Solution 1 is no real solution to me, because comparing values with "LIKE" is usually slow, and it seems a bit odd to change a query that works with other databases just for DB2. But taking a closer look, solution 2 has some problems, too: Although VARCHARs in DB2 can theoretically have a length of 32767, in reality they are limited by the page size of the database, which can be 32K at maximum. Since this particular table "reference" has three columns of this type, the sum of their lengths must not exceed 32767, so they could only be something like VARCHAR(10000). I have never encountered cases in which values come even close to a length of 10000, but you can never be sure. And that is why I post here. For me, the way to go is pretty clear, but we intend to be as compatible as possible with the original BioSQL.
Maybe you could give me some input on how to solve this problem with as few casualties as possible ;-) Thanks, Florian From aradwen at gmail.com Fri Jul 10 14:45:35 2009 From: aradwen at gmail.com (Radwen ANIBA) Date: Fri, 10 Jul 2009 16:45:35 +0200 Subject: [Biojava-dev] ExternalProcess class Message-ID: Hi everyone, Does anybody have a usage example of the ExternalProcess class (like the examples in the BioJava cookbook)? Let's say we want to run a local clustalw program with it; is that possible? Any example code? Thank you Radwen From sylvain.foisy at diploide.net Fri Jul 10 16:43:09 2009 From: sylvain.foisy at diploide.net (Sylvain Foisy) Date: Fri, 10 Jul 2009 12:43:09 -0400 Subject: [Biojava-dev] ExternalProcess class In-Reply-To: Message-ID: Hi, There is no such example to the best of my knowledge. If you do use this class, you are welcome to share your experience by contributing. As far as running something like clustalw, I don't see why you could not make use of this class. You could actually build a wrapper class to execute clustalw and do something with its output. Best regards Sylvain On 10/07/09 12:00, "[NAME]" <[ADDRESS]> wrote: > Hi everyone, > > Does anybody have a usage example of the ExternalProcess class (like the > examples in the BioJava cookbook)? > > Let's say we want to run a local clustalw program with it; is that possible? > > Any example code? > > Thank you =================================================================== Sylvain Foisy, Ph. D.
Consultant Bio-informatique / Bioinformatics Diploide.net - TI pour la vie / IT for Life Courriel: sylvain.foisy at diploide.net Web: http://www.diploide.net Tel: (514) 893-4363 =================================================================== From hlapp at gmx.net Sat Jul 11 11:47:34 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 11 Jul 2009 07:47:34 -0400 Subject: [Biojava-dev] [BioSQL-l] Problems in DB2 with VARCHAR, TEXT and CLOB using BioJava In-Reply-To: <200907091716.13639.florian.mittag@uni-tuebingen.de> References: <200907091716.13639.florian.mittag@uni-tuebingen.de> Message-ID: <5614AEDA-3406-4844-8690-7653A2C4297C@gmx.net> Hi Florian: On Jul 9, 2009, at 11:16 AM, Florian Mittag wrote: > [...] > 2) Change the data type to something comparable with "=", like > VARCHAR. That's the way to go. The reason they are not VARCHAR in MySQL is that VARCHAR is limited to 255 characters there (in older MySQL versions). > [...] > Although VARCHARs in DB2 can have a length of theoretically 32767, > in reality > they are limited by the page size of the database, which can be 32K at > maximum. Since this particular table "reference" has three columns > of this > type, the sum of their lengths must not exceed 32767, so they could > only be > something like VARCHAR(10000). That sounds great though. You may have noticed that the columns are all of type VARCHAR in the Oracle version of the schema, with the following widths:
Title VARCHAR2(1000)
Authors VARCHAR2(4000)
Location VARCHAR2(512)
That has always served me well. Feel free to use larger widths though if you think you need them.
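[Editor's sketch] The widths above can be checked against the row budget Florian mentions with a few lines of Java. This is only a minimal illustration under stated assumptions: the class and method names (ReferenceWidthCheck, totalWidth, fits) are hypothetical and not part of BioJava or BioSQL, the 32767 limit is simply the figure quoted in this thread, and lengths are counted in characters, which may differ from the byte lengths a database actually enforces.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: not part of BioJava or BioSQL. */
final class ReferenceWidthCheck {

    // Row budget as quoted by Florian in this thread (assumption, not verified against DB2 docs).
    static final int ROW_BUDGET = 32767;

    // Column widths from the Oracle version of the schema, as listed by Hilmar.
    static Map<String, Integer> widths() {
        Map<String, Integer> w = new LinkedHashMap<>();
        w.put("title", 1000);
        w.put("authors", 4000);
        w.put("location", 512);
        return w;
    }

    // Sum of the three declared widths.
    static int totalWidth() {
        int sum = 0;
        for (int width : widths().values()) {
            sum += width;
        }
        return sum;
    }

    // True if the value fits the named column; counts characters, not bytes.
    static boolean fits(String column, String value) {
        Integer max = widths().get(column);
        return max != null && value != null && value.length() <= max;
    }

    public static void main(String[] args) {
        System.out.println("total width: " + totalWidth());
        System.out.println("within budget: " + (totalWidth() <= ROW_BUDGET));
        System.out.println("location fits: " + fits("location", "Nucleic Acids Res. 37:D211-D215"));
    }
}
```

A check like fits() could run before a reference is persisted, so that an over-long title or author list is reported by the application rather than rejected or truncated by the database.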
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From HWillis at scripps.edu Sat Jul 11 11:08:32 2009 From: HWillis at scripps.edu (Scooter Willis) Date: Sat, 11 Jul 2009 07:08:32 -0400 Subject: [Biojava-dev] ExternalProcess class In-Reply-To: References: , Message-ID: Biojava already has a Clustalw class that executes Clustalw as an external process if that is your larger goal. http://www.biojava.org/wiki/BioJava:Tutorial:MultiAlignClustalW Thanks Scooter ________________________________________ From: biojava-dev-bounces at lists.open-bio.org [biojava-dev-bounces at lists.open-bio.org] On Behalf Of Sylvain Foisy [sylvain.foisy at diploide.net] Sent: Friday, July 10, 2009 12:43 PM To: biojava-dev at lists.open-bio.org Subject: Re: [Biojava-dev] ExternalProcess class Hi, There is no such example to the best of my knowledge. If you do use this class, you are welcome to share your experience by contributing. As far as running something like clustalw, I don't see why you could not make use of this class. You could actually build a wrapper class to execute clustalw and do something with its output. Best regards Sylvain On 10/07/09 12:00, "[NAME]" <[ADDRESS]> wrote: > Hi everyone, > > Do somebody have (as examples of biojava cookbook) a usage example of > ExternalProcess class ? > > Let's say we want to run a local clustalw program with it, is it possible ? > > Any example code ? > > Thank you =================================================================== Sylvain Foisy, Ph. D. 
Consultant Bio-informatique / Bioinformatics Diploide.net - TI pour la vie / IT for Life Courriel: sylvain.foisy at diploide.net Web: http://www.diploide.net Tel: (514) 893-4363 =================================================================== From paolo.pavan at gmail.com Sun Jul 12 21:41:05 2009 From: paolo.pavan at gmail.com (Paolo Pavan) Date: Sun, 12 Jul 2009 23:41:05 +0200 Subject: [Biojava-dev] Fwd: Assembly data reading In-Reply-To: <56be91b60907090858t41f2c72cwf7db057e6390d6db@mail.gmail.com> References: <56be91b60907090858t41f2c72cwf7db057e6390d6db@mail.gmail.com> Message-ID: <56be91b60907121441h7602d917m496b675d0fa9fd68@mail.gmail.com> Hi, I would like to post again, with some adjustments, a question I asked some time ago, because this may be the more appropriate list; apologies for the repetition. Can someone kindly give me some advice? Thank you in advance, Paolo ---------- Forwarded message ---------- From: Paolo Pavan Date: 2009/7/9 Subject: Assembly data reading To: Biojava-l at lists.open-bio.org Hi everybody, I'm fairly new to this topic. I would like to know if there is something that can help me load data from a large 454 contig into my Java program. I also need to retain in memory, and access, the data from the single reads forming the contig. I suppose this information is in a *.sff file; if it is not possible to load such a file, it would be OK to load a *.ace (phrap) data file, which I also have. Many thanks for any suggestion you can give me!
Greetings, Paolo From holland at eaglegenomics.com Mon Jul 13 05:20:35 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Mon, 13 Jul 2009 06:20:35 +0100 Subject: [Biojava-dev] Fwd: Assembly data reading In-Reply-To: <56be91b60907121441h7602d917m496b675d0fa9fd68@mail.gmail.com> References: <56be91b60907090858t41f2c72cwf7db057e6390d6db@mail.gmail.com> <56be91b60907121441h7602d917m496b675d0fa9fd68@mail.gmail.com> Message-ID: <1247462435.25217.7.camel@buzzybee> Nothing within BJ can parse the 454 .sff files directly. However I think there is a growing need for it so if anyone is willing to contribute code, it would be very welcome. There is also no .ace parser, although in 2007 someone volunteered to write one but nothing happened, and there was a previous post (many years ago!) from someone else who already had some working code but again nothing seems to have happened: http://portal.open-bio.org/pipermail/biojava-l/2001-June/001283.html http://lists.open-bio.org/pipermail/biojava-l/2007-July/005900.html So to start with, someone (perhaps yourself? that would be nice! :) ) needs to volunteer to write either a .ace or .sff parser, or both. The thing to bear in mind with 454 contigs as you rightly point out is the sheer size of the things. The requirement to keep them entirely in memory is likely to be unworkable as it would leave little room for anything else to run on your average machine. I would suggest either memory-mapping the file itself, or parsing and writing out a memory-mapped summary file containing the bits of data you're interested in. (Memory-mapping is where you keep an index in memory indicating where in the file each record is, so that when you need to access them you load them on-the-fly from the file and drop them out of memory again immediately after use. 
An accelerated form of this is to put the loaded records into some kind of LRU cache which holds only the most recently accessed records and then check that cache first to see if you've already loaded the record before accessing the file directly.) cheers, Richard On Sun, 2009-07-12 at 23:41 +0200, Paolo Pavan wrote: > Hi, > I would like to post again with some adjustments a question I put some > times ago because maybe this is a more correct list, apologize for the > repeating. > Can someone kindly give me his advise? > > thank you in advance, > Paolo > > > ---------- Forwarded message ---------- > From: Paolo Pavan > Date: 2009/7/9 > Subject: Assembly data reading > To: Biojava-l at lists.open-bio.org > > > Hi everybody, > I'm almost new to this topic, I would like to know if there is > something can help me to load in my java program data from a large 454 > contig. I need to retain in memory and access data from the single > reads forming the contig too. > I suppose these informations are in a *.sff file, if it is not > possible to load such file it should be ok to load a *.ace (phrap) > data file that I have too. > Many thanks for any suggestion you can give me! 
> > Greetings, > Paolo -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From markjschreiber at gmail.com Mon Jul 13 06:29:56 2009 From: markjschreiber at gmail.com (Mark Schreiber) Date: Mon, 13 Jul 2009 14:29:56 +0800 Subject: [Biojava-dev] Fwd: Assembly data reading In-Reply-To: <1247462435.25217.7.camel@buzzybee> References: <56be91b60907090858t41f2c72cwf7db057e6390d6db@mail.gmail.com> <56be91b60907121441h7602d917m496b675d0fa9fd68@mail.gmail.com> <1247462435.25217.7.camel@buzzybee> Message-ID: <93b45ca50907122329g65cef108h112399da01d6b322@mail.gmail.com> I would agree that there is a strong need for this kind of thing in biojava. As Richard says, you probably can't fit it in memory, so you may want to memory map it. There are classes in the java.nio package that can help a lot with this. Also, I have had some success with in-memory compression of large files using LZ compression. Essentially the memory representation of the file is LZ compressed, and compression and decompression are handled on the fly. Again, there are Java utility classes that can help. - Mark On Mon, Jul 13, 2009 at 1:20 PM, Richard Holland wrote: > Nothing within BJ can parse the 454 .sff files directly. However I think > there is a growing need for it so if anyone is willing to contribute > code, it would be very welcome. > > There is also no .ace parser, although in 2007 someone volunteered to > write one but nothing happened, and there was a previous post (many > years ago!)
from someone else who already had some working code but > again nothing seems to have happened: > > http://portal.open-bio.org/pipermail/biojava-l/2001-June/001283.html > http://lists.open-bio.org/pipermail/biojava-l/2007-July/005900.html > > So to start with, someone (perhaps yourself? that would be nice! :) ) > needs to volunteer to write either a .ace or .sff parser, or both. > > The thing to bear in mind with 454 contigs as you rightly point out is > the sheer size of the things. The requirement to keep them entirely in > memory is likely to be unworkable as it would leave little room for > anything else to run on your average machine. I would suggest either > memory-mapping the file itself, or parsing and writing out a > memory-mapped summary file containing the bits of data you're interested > in. (Memory-mapping is where you keep an index in memory indicating > where in the file each record is, so that when you need to access them > you load them on-the-fly from the file and drop them out of memory again > immediately after use. An accelerated form of this is to put the loaded > records into some kind of LRU cache which holds only the most recently > accessed records and then check that cache first to see if you've > already loaded the record before accessing the file directly.) > > cheers, > Richard > > > On Sun, 2009-07-12 at 23:41 +0200, Paolo Pavan wrote: > > Hi, > > I would like to post again with some adjustments a question I put some > > times ago because maybe this is a more correct list, apologize for the > > repeating. > > Can someone kindly give me his advise? > > > > thank you in advance, > > Paolo > > > > > > ---------- Forwarded message ---------- > > From: Paolo Pavan > > Date: 2009/7/9 > > Subject: Assembly data reading > > To: Biojava-l at lists.open-bio.org > > > > > > Hi everybody, > > I'm almost new to this topic, I would like to know if there is > > something can help me to load in my java program data from a large 454 > > contig. 
I need to retain in memory and access data from the single > > reads forming the contig too. > > I suppose these informations are in a *.sff file, if it is not > > possible to load such file it should be ok to load a *.ace (phrap) > > data file that I have too. > > Many thanks for any suggestion you can give me! > > > > Greetings, > > Paolo > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From sreekanth.m at ocimumbio.com Mon Jul 13 11:28:52 2009 From: sreekanth.m at ocimumbio.com (Sreekanth Mogullapally) Date: Mon, 13 Jul 2009 16:58:52 +0530 Subject: [Biojava-dev] Reg: Quality values of ABI File Message-ID: <2DDF09AFEB46E54894A3843CEF9CB3A4409F82CA8E@EXCHMB.ocimumbio.com> Hi Everybody, I am working with "ABI" Files. I am able to get the pixel values for Chromatogram viewer from it, but I need the Quality values for each base. Please help me in this regard. Thanks in Advance Sreekanth.M From fbristow at gmail.com Mon Jul 13 13:15:40 2009 From: fbristow at gmail.com (Franklin Bristow) Date: Mon, 13 Jul 2009 08:15:40 -0500 Subject: [Biojava-dev] Reg: Quality values of ABI File In-Reply-To: <2DDF09AFEB46E54894A3843CEF9CB3A4409F82CA8E@EXCHMB.ocimumbio.com> References: <2DDF09AFEB46E54894A3843CEF9CB3A4409F82CA8E@EXCHMB.ocimumbio.com> Message-ID: <50a7756d0907130615i3d7cca5ar941f5d2d09657e5@mail.gmail.com> Hi Sreekanth, The quality values are stored under the PCON 1 and 2 tags. The information you need is in the offsetData field of the TaggedDataRecord. 
You can treat this byte array as an array of shorts containing the quality values for each base. Take a look at this PDF for more information about the different tags available to you: http://www.appliedbiosystems.com/support/software_community/ABIF_File_Format.pdf -- Franklin On Mon, Jul 13, 2009 at 6:28 AM, Sreekanth Mogullapally < sreekanth.m at ocimumbio.com> wrote: > Hi Everybody, > > I am working with "ABI" Files. I am able to get the pixel values for > Chromatogram viewer from it, > but I need the Quality values for each base. > Please help me in this regard. > > Thanks in Advance > Sreekanth.M From sylvain.foisy at diploide.net Mon Jul 13 13:16:34 2009 From: sylvain.foisy at diploide.net (Sylvain Foisy) Date: Mon, 13 Jul 2009 09:16:34 -0400 Subject: [Biojava-dev] ExternalProcess class In-Reply-To: Message-ID: Hi Scooter, Actually, this is not in BJ 1.7. This material was created by Dickson Guedes but never formalized into a class for BJ. Maybe now would be the time to do so? Best regards Sylvain On 11/07/09 07:08, "[NAME]" <[ADDRESS]> wrote: > Biojava already has a Clustalw class that executes Clustalw as an external > process if that is your larger goal. > > http://www.biojava.org/wiki/BioJava:Tutorial:MultiAlignClustalW > > Thanks > > Scooter > ________________________________________ > From: biojava-dev-bounces at lists.open-bio.org > [biojava-dev-bounces at lists.open-bio.org] On Behalf Of Sylvain Foisy > [sylvain.foisy at diploide.net] > Sent: Friday, July 10, 2009 12:43 PM > To: biojava-dev at lists.open-bio.org > Subject: Re: [Biojava-dev] ExternalProcess class > > Hi, > > There is no such example to the best of my knowledge. If you do use this > class, you are welcome to share your experience by contributing.
As far as > running something like clustalw, I don't see why you could not make use of > this class. You could actually build a wrapper class to execute clustalw and > do something with its output. > > Best regards > > Sylvain > > On 10/07/09 12:00, "[NAME]" <[ADDRESS]> wrote: > >> Hi everyone, >> >> Do somebody have (as examples of biojava cookbook) a usage example of >> ExternalProcess class ? >> >> Let's say we want to run a local clustalw program with it, is it possible ? >> >> Any example code ? >> >> Thank you > > > =================================================================== > > Sylvain Foisy, Ph. D. > Consultant Bio-informatique / Bioinformatics > Diploide.net - TI pour la vie / IT for Life > > Courriel: sylvain.foisy at diploide.net > Web: http://www.diploide.net > Tel: (514) 893-4363 > =================================================================== > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From markjschreiber at gmail.com Mon Jul 13 13:21:42 2009 From: markjschreiber at gmail.com (Mark Schreiber) Date: Mon, 13 Jul 2009 21:21:42 +0800 Subject: [Biojava-dev] Reg: Quality values of ABI File In-Reply-To: <2DDF09AFEB46E54894A3843CEF9CB3A4409F82CA8E@EXCHMB.ocimumbio.com> References: <2DDF09AFEB46E54894A3843CEF9CB3A4409F82CA8E@EXCHMB.ocimumbio.com> Message-ID: <93b45ca50907130621m6ebd37bn72aad9fafe195db7@mail.gmail.com> Hi - You would usually use a program like Phred/Phrap for this. There is a BioJava package for reading and processing the Phred output. - Mark On Mon, Jul 13, 2009 at 7:28 PM, Sreekanth Mogullapally < sreekanth.m at ocimumbio.com> wrote: > Hi Everybody, > > I am working with "ABI" Files. I am able to get the pixel values for > Chromatogram viewer from it, > but I need the Quality values for each base. > Please help me in this regard. 
> > Thanks in Advance > Sreekanth.M From sreekanth.m at ocimumbio.com Mon Jul 13 14:49:26 2009 From: sreekanth.m at ocimumbio.com (Sreekanth Mogullapally) Date: Mon, 13 Jul 2009 20:19:26 +0530 Subject: [Biojava-dev] Reg: Quality values of ABI File In-Reply-To: <50a7756d0907130615i3d7cca5ar941f5d2d09657e5@mail.gmail.com> References: <2DDF09AFEB46E54894A3843CEF9CB3A4409F82CA8E@EXCHMB.ocimumbio.com> <50a7756d0907130615i3d7cca5ar941f5d2d09657e5@mail.gmail.com> Message-ID: <2DDF09AFEB46E54894A3843CEF9CB3A4409F82CBC6@EXCHMB.ocimumbio.com> Dear Franklin, Thank you very much for your quick response. Now I am able to get the quality values, which meets my requirement. Thanks & Regards, Sreekanth.M From: Franklin Bristow [mailto:fbristow at gmail.com] Sent: Monday, July 13, 2009 6:46 PM To: Sreekanth Mogullapally Cc: biojava-dev-request at lists.open-bio.org; biojava-dev at lists.open-bio.org; Madhu Mohan. Ganni; Kishore Dunga Subject: Re: [Biojava-dev] Reg: Quality values of ABI File Hi Sreekanth, The quality values are stored under the PCON 1 and 2 tags. The information you need is in the offsetData field of the TaggedDataRecord. You can treat this byte array as an array of shorts containing the quality values for each base. Take a look at this PDF for more information about the different tags available to you: http://www.appliedbiosystems.com/support/software_community/ABIF_File_Format.pdf -- Franklin On Mon, Jul 13, 2009 at 6:28 AM, Sreekanth Mogullapally > wrote: Hi Everybody, I am working with "ABI" Files. I am able to get the pixel values for Chromatogram viewer from it, but I need the Quality values for each base. Please help me in this regard.
Thanks in Advance
Sreekanth.M

_______________________________________________
biojava-dev mailing list
biojava-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-dev

From holland at eaglegenomics.com Mon Jul 13 17:14:13 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 13 Jul 2009 18:14:13 +0100
Subject: [Biojava-dev] Hackathon
Message-ID: <1247505253.27493.15.camel@buzzybee>

Hi all.

Andreas and I would like to organise a hackathon to get the
modularisation and general improvement plans for BJ3 into action, and
bring the project forward into the 21st century (only 10 years late!).

At this time I'm trying to gather interest and gauge who might
realistically be able to attend. We will attempt to site the hackathon
at a location closest to the majority of attendees.

To help me plan numbers and likely costs (for potential sponsors) could
all those who are interested please answer the following questions for
me:

1. Name,
2. Specialist interest within Biojava (e.g. proteomics, microarrays,
sequencing, etc.),
3. Your physical location (country and nearest major city - e.g.
Cambridge, London, Newcastle, San Diego, Singapore, etc.),
4. Whether you think your employer would help pay your airfare and/or 1
week in a hotel to attend (and how far you think you could go on such
funding),
5. Approximate availability for the next 12 months.

To get the ball rolling, here's me:

1. Richard Holland, 2. Making the whole thing more consistent,
efficient, and easier to use, 3. Southampton (UK), 4. Possibly but
probably only within UK/Europe, 5. Only available in mid-Jan 2010,
otherwise can't do anything until mid-March 2010 onwards.

Looking forward to hearing your comments! Once I have a good idea of
numbers and distribution, I can get some costs together to give you (and
any potential sponsors) the best idea of what might be involved.
cheers, Richard -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From HWillis at scripps.edu Mon Jul 13 20:12:16 2009 From: HWillis at scripps.edu (Scooter Willis) Date: Mon, 13 Jul 2009 16:12:16 -0400 Subject: [Biojava-dev] Hackathon In-Reply-To: <1247505253.27493.15.camel@buzzybee> Message-ID: Richard I would be up for a week away as a "vacation" to do some Java programming. So I am flexible on all elements of time and ability to travel. Bonus points for going some place where the weather is reasonable for the location since it appears we have a global option (January in the UK not my first choice/January in Colorado better choice). Probably wouldn't be a bad idea to try and bookend/overlap a bioinformatics related conference to help justify travel costs for those who need support from work. We should also consider online options for those who can't travel but can allocate the time. Thanks Scooter On 7/13/09 1:14 PM, "Richard Holland" wrote: Hi all. Andreas and I would like to organise a hackathon to get the modularisation and general improvement plans for BJ3 into action, and bring the project forward into the 21st century (only 10 years late!). At this time I'm trying to gather interest and gauge who might realistically be able to attend. We will attempt to site the hackathon at a location closest to the majority of attendees. To help me plan numbers and likely costs (for potential sponsors) could all those who are interested please answer the following questions for me: 1. Name, 2. Specialist interest within Biojava (e.g. proteomics, microarrays, sequencing, etc.), 3. Your physical location (country and and nearest major city - e.g. Cambridge, London, Newcastle, San Diego, Singapore, etc.), 4. 
Whether you think your employer would help pay your airfare and/or 1
week in a hotel to attend (and how far you think you could go on such
funding),
5. Approximate availability for the next 12 months.

To get the ball rolling, here's me:

1. Richard Holland, 2. Making the whole thing more consistent,
efficient, and easier to use, 3. Southampton (UK), 4. Possibly but
probably only within UK/Europe, 5. Only available in mid-Jan 2010,
otherwise can't do anything until mid-March 2010 onwards.

Looking forward to hearing your comments! Once I have a good idea of
numbers and distribution, I can get some costs together to give you (and
any potential sponsors) the best idea of what might be involved.

cheers,
Richard

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

_______________________________________________
biojava-dev mailing list
biojava-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-dev

From paolo.pavan at gmail.com Tue Jul 14 16:08:11 2009
From: paolo.pavan at gmail.com (Paolo Pavan)
Date: Tue, 14 Jul 2009 18:08:11 +0200
Subject: [Biojava-dev] Fwd: Assembly data reading
In-Reply-To: <93b45ca50907122329g65cef108h112399da01d6b322@mail.gmail.com>
References: <56be91b60907090858t41f2c72cwf7db057e6390d6db@mail.gmail.com>
	<56be91b60907121441h7602d917m496b675d0fa9fd68@mail.gmail.com>
	<1247462435.25217.7.camel@buzzybee>
	<93b45ca50907122329g65cef108h112399da01d6b322@mail.gmail.com>
Message-ID: <56be91b60907140908t6d6aa906ifcd334afd9e44882@mail.gmail.com>

Dear all,
I took a day for a quick search to get a clearer picture of the
situation.

- I found the specification of the .sff file in the 454 instrument
  manual; it is fully described and seems to be enough to build a
  reader.
- However, on a more careful read it seems that a *.sff file carries no
  information about the automatic contig assembly and only stores
  flowgram info, that is, "reads" (unlike a *.ace file).
- Two hidden binary files can be found in a 454 gsAssembler project
  folder: .ChordMatrixMetadata and .SeqCacheMetadata. They are not
  described in the manual, but they seem to contain nucleotide data and
  read names respectively; they are big enough to contain that kind of
  data. The problem is that we don't know how to parse them.
- It is necessary to decide on a "memory structure" in which to store
  the information read. I agree with the "memory mapping" solution,
  maybe implemented with a Map object that associates the name of each
  read with its location in the file.
- The parser class should then expose methods to:
  1) iterate through reads, though this may be heavy and could perhaps
     be avoided
  2) access a read sequence by name
- If the parser should manage the assembled contigs too (this depends on
  what is explained in the third bullet point), it should expose methods
  to:
  1) iterate through contig names
  2) iterate through contig consensus sequences
  3) access consensus sequences by name (this is a sub-problem of point 2)
  4) access arbitrary aligned portions (I mean a "slice") of the
     assembly given start-end positions, returning an alignment object
- Any more suggestions?

I would be glad to be involved in the biojava community through this
project and I could try, but first of all I want to say that I'm not a
guru like most of the people here ( :-p ) and, to tell the truth, the
job my company requires of me is different; if a workaround exists, I
should honestly choose it.
So let me think a bit about starting such an adventure. If I can combine
my job with contributing to the community's growth, I'll be happy to
share my work! Any suggestion welcome.
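
The "Map from read name to file location" idea above, combined with the small LRU cache Richard describes in the message quoted further down, might look roughly like the sketch below. Everything in it is hypothetical: the ReadIndex class, its method names, and the assumption that each read is a contiguous run of ASCII bytes are illustrative only and say nothing about the real .sff or .ace layout. The data is taken from a java.nio ByteBuffer; for a real file this could be the MappedByteBuffer returned by FileChannel.map, bearing in mind that a single mapping is limited to 2 GB.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: read name -> location index with a small LRU cache in front of it. */
final class ReadIndex {

    /** Where one record lives in the underlying buffer. */
    private static final class Location {
        final int offset;
        final int length;
        Location(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    private final ByteBuffer data;
    private final Map<String, Location> index = new LinkedHashMap<>();
    private final Map<String, String> cache;

    ReadIndex(ByteBuffer data, final int cacheSize) {
        this.data = data;
        // accessOrder=true keeps the least recently used entry first,
        // and removeEldestEntry evicts it once the cache is over capacity.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > cacheSize;
            }
        };
    }

    /** Records where a read lives; a real parser would call this while scanning the file once. */
    void register(String name, int offset, int length) {
        index.put(name, new Location(offset, length));
    }

    /** Read names in the order they were registered. */
    Set<String> names() {
        return index.keySet();
    }

    /** Returns the read's sequence, loading it from the buffer only on a cache miss. */
    String sequence(String name) {
        String cached = cache.get(name);
        if (cached != null) {
            return cached;
        }
        Location location = index.get(name);
        if (location == null) {
            throw new IllegalArgumentException("Unknown read: " + name);
        }
        byte[] bytes = new byte[location.length];
        ByteBuffer view = data.duplicate(); // own position, shared content
        view.position(location.offset);
        view.get(bytes);
        String sequence = new String(bytes, StandardCharsets.US_ASCII);
        cache.put(name, sequence);
        return sequence;
    }

    /** True if the read is currently held in the cache (does not count as an access). */
    boolean isCached(String name) {
        return cache.containsKey(name);
    }
}
```

Only the index (names plus two integers per read) stays in memory permanently; sequences are loaded on demand and dropped again once the cache is full, which is the trade-off discussed in this thread.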
Bye bye,
Paolo


2009/7/13 Mark Schreiber:
> I would agree that there is a strong need for this kind of thing in biojava.
>
> As Richard says you probably can't fit it in memory so you may want to
> memory map it. There are classes in the java.nio package that can help a
> lot with this.
>
> Also I have had some success with in-memory compression of large files using
> LZ compression. Essentially the memory representation of the file is LZ
> compressed and compression and decompression are handled on the fly. Again
> there are Java utility classes that can help.
>
> - Mark
>
> On Mon, Jul 13, 2009 at 1:20 PM, Richard Holland wrote:
>>
>> Nothing within BJ can parse the 454 .sff files directly. However I think
>> there is a growing need for it so if anyone is willing to contribute
>> code, it would be very welcome.
>>
>> There is also no .ace parser, although in 2007 someone volunteered to
>> write one but nothing happened, and there was a previous post (many
>> years ago!) from someone else who already had some working code but
>> again nothing seems to have happened:
>>
>> http://portal.open-bio.org/pipermail/biojava-l/2001-June/001283.html
>> http://lists.open-bio.org/pipermail/biojava-l/2007-July/005900.html
>>
>> So to start with, someone (perhaps yourself? that would be nice! :) )
>> needs to volunteer to write either a .ace or .sff parser, or both.
>>
>> The thing to bear in mind with 454 contigs as you rightly point out is
>> the sheer size of the things. The requirement to keep them entirely in
>> memory is likely to be unworkable as it would leave little room for
>> anything else to run on your average machine. I would suggest either
>> memory-mapping the file itself, or parsing and writing out a
>> memory-mapped summary file containing the bits of data you're interested
>> in.
(Memory-mapping is where you keep an index in memory indicating >> where in the file each record is, so that when you need to access them >> you load them on-the-fly from the file and drop them out of memory again >> immediately after use. An accelerated form of this is to put the loaded >> records into some kind of LRU cache which holds only the most recently >> accessed records and then check that cache first to see if you've >> already loaded the record before accessing the file directly.) >> >> cheers, >> Richard >> >> >> On Sun, 2009-07-12 at 23:41 +0200, Paolo Pavan wrote: >> > Hi, >> > I would like to post again with some adjustments a question I put some >> > times ago because maybe this is a more correct list, apologize for the >> > repeating. >> > Can someone kindly give me his advise? >> > >> > thank you in advance, >> > Paolo >> > >> > >> > ---------- Forwarded message ---------- >> > From: Paolo Pavan >> > Date: 2009/7/9 >> > Subject: Assembly data reading >> > To: Biojava-l at lists.open-bio.org >> > >> > >> > Hi everybody, >> > I'm almost new to this topic, I would like to know if there is >> > something can help me to load in my java program data from a large 454 >> > contig. I need to retain in memory and access data from the single >> > reads forming the contig too. >> > I suppose these informations are in a *.sff file, if it is not >> > possible to load such file it should be ok to load a *.ace (phrap) >> > data file that I have too. >> > Many thanks for any suggestion you can give me! 
>> >
>> > Greetings,
>> > Paolo
>> > _______________________________________________
>> > biojava-dev mailing list
>> > biojava-dev at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/biojava-dev
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
>

From aradwen at gmail.com Wed Jul 15 11:10:03 2009
From: aradwen at gmail.com (Radwen ANIBA)
Date: Wed, 15 Jul 2009 13:10:03 +0200
Subject: [Biojava-dev] PsiPred
Message-ID:

Hi mates,

I was wondering whether Biojava can handle PsiPred output (protein secondary
structure) for parsing, e.g. reporting that a protein has a helix starting at ...
and ending at ..., and a sheet starting at ... and ending at ... Is there any
class or method that does this? If so, I'm interested.

thank you

From andreas at sdsc.edu Wed Jul 15 15:13:07 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 15 Jul 2009 08:13:07 -0700
Subject: [Biojava-dev] PsiPred
In-Reply-To:
References:
Message-ID: <59a41c430907150813q7fd58a81uc2d4372f01d2e89a@mail.gmail.com>

Hi Radwen,

At present there is no parser for PsiPred. I would be happy about any
contribution regarding that...

Andreas

On Wed, Jul 15, 2009 at 4:10 AM, Radwen ANIBA wrote:
> Hi mates,
>
> I was wondering whether Biojava can handle PsiPred output (protein secondary
> structure) for parsing, e.g. reporting that a protein has a helix starting at ...
> and ending at ..., and a sheet starting at ... and ending at ... Is there any
> class or method that does this? If so, I'm interested.
> > thank you > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From heuermh at acm.org Wed Jul 15 19:22:14 2009 From: heuermh at acm.org (Michael Heuer) Date: Wed, 15 Jul 2009 15:22:14 -0400 (EDT) Subject: [Biojava-dev] Hackathon In-Reply-To: <1247505253.27493.15.camel@buzzybee> Message-ID: Richard Holland wrote: > Andreas and I would like to organise a hackathon to get the > modularisation and general improvement plans for BJ3 into action, and > bring the project forward into the 21st century (only 10 years late!). > > At this time I'm trying to gather interest and gauge who might > realistically be able to attend. We will attempt to site the hackathon > at a location closest to the majority of attendees. > > To help me plan numbers and likely costs (for potential sponsors) could > all those who are interested please answer the following questions for > me: > > 1. Name, > 2. Specialist interest within Biojava (e.g. proteomics, microarrays, > sequencing, etc.), > 3. Your physical location (country and and nearest major city - e.g. > Cambridge, London, Newcastle, San Diego, Singapore, etc.), > 4. Whether you think your employer would help pay your airfare and/or 1 > week in a hotel to attend (and how far you think you could go on such > funding), > 5. Approximate availability for the next 12 months. A hackathon would be great. I'm closest to MSP airport. My employer would probably not cover air fare or hotel. I would then recommend choosing an interesting location so that it would be worth spending out-of-pocket to get there. I don't use biojava for my day job any more, so I'm most interested in helping with architecture and build issues. My day job is currently a lot of data viz, maybe better integration with viz tools like Cytoscape, Piccolo2D, prefuse, and Processing would be fun to work on. 
michael

From sreekanth.m at ocimumbio.com Thu Jul 16 06:07:19 2009
From: sreekanth.m at ocimumbio.com (Sreekanth Mogullapally)
Date: Thu, 16 Jul 2009 11:37:19 +0530
Subject: [Biojava-dev] Reg: SeqIOTools Class
Message-ID: <2DDF09AFEB46E54894A3843CEF9CB3A4409F891673@EXCHMB.ocimumbio.com>

Hi Everybody,

I am new to biojava. I am trying to read a Fasta file using biojava with the following code:

    SequenceIterator stream = SeqIOTools.readFastaDNA(
        new BufferedReader(new FileReader(fileName)));

For this I import the following class:

    import org.biojava.bio.seq.io.SeqIOTools;

However, the compiler warns that "SeqIOTools" is deprecated.

Is there another class that covers all the functionality of the "SeqIOTools" class?

Please advise.

Thanks in Advance
Sreekanth. M

From mark.schreiber at novartis.com Thu Jul 16 06:11:52 2009
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Thu, 16 Jul 2009 14:11:52 +0800
Subject: [Biojava-dev] Reg: SeqIOTools Class
In-Reply-To: <2DDF09AFEB46E54894A3843CEF9CB3A4409F891673@EXCHMB.ocimumbio.com>
Message-ID:

Hi -

The replacement for this class is RichSequence.IOTools

- Mark

biojava-dev-bounces at lists.open-bio.org wrote on 07/16/2009 02:07:19 PM:

>
> Hi Everybody,
>
> I am new to biojava. I am trying to read a Fasta file using biojava
> with the following code:
>
>     SequenceIterator stream = SeqIOTools.readFastaDNA(
>         new BufferedReader(new FileReader(fileName)));
>
> For this I import the following class:
>
>     import org.biojava.bio.seq.io.SeqIOTools;
>
> However, the compiler warns that "SeqIOTools" is deprecated.
>
> Is there another class that covers all the functionality of the
> "SeqIOTools" class?
>
> Please advise.
>
> Thanks in Advance
> Sreekanth.
M
>
>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

From sreekanth.m at ocimumbio.com Thu Jul 16 06:38:31 2009
From: sreekanth.m at ocimumbio.com (Sreekanth Mogullapally)
Date: Thu, 16 Jul 2009 12:08:31 +0530
Subject: [Biojava-dev] Reg: Exporting into fasta format
In-Reply-To:
References: <2DDF09AFEB46E54894A3843CEF9CB3A4409F891673@EXCHMB.ocimumbio.com>
Message-ID: <2DDF09AFEB46E54894A3843CEF9CB3A4409F8916A7@EXCHMB.ocimumbio.com>

Hi Everybody,

I need to export sequences into fasta format. Please suggest how to do this. I have written my own code to export them, but I would like to implement it using biojava.

Thanks & Regards,
Sreekanth. M

From ayates at ebi.ac.uk Sun Jul 19 19:15:41 2009
From: ayates at ebi.ac.uk (Andy Yates)
Date: Sun, 19 Jul 2009 20:15:41 +0100
Subject: [Biojava-dev] Hackathon
In-Reply-To: <1247505253.27493.15.camel@buzzybee>
References: <1247505253.27493.15.camel@buzzybee>
Message-ID: <3735A549-2F00-418D-9194-638D7C923FC7@ebi.ac.uk>

Okay another one for the ball:

1). Andy Yates
2). Erm well getting it right so I guess that lands me in testing/integration/killing every singleton
3). Cambridge
4).
Currently in a Perl group so not a chance really 5). Quite flexible How does that sound? On 13 Jul 2009, at 18:14, Richard Holland wrote: > Hi all. > > Andreas and I would like to organise a hackathon to get the > modularisation and general improvement plans for BJ3 into action, and > bring the project forward into the 21st century (only 10 years late!). > > At this time I'm trying to gather interest and gauge who might > realistically be able to attend. We will attempt to site the hackathon > at a location closest to the majority of attendees. > > To help me plan numbers and likely costs (for potential sponsors) > could > all those who are interested please answer the following questions for > me: > > 1. Name, > 2. Specialist interest within Biojava (e.g. proteomics, microarrays, > sequencing, etc.), > 3. Your physical location (country and and nearest major city - e.g. > Cambridge, London, Newcastle, San Diego, Singapore, etc.), > 4. Whether you think your employer would help pay your airfare and/ > or 1 > week in a hotel to attend (and how far you think you could go on such > funding), > 5. Approximate availability for the next 12 months. > > To get the ball rolling, here's me: > > 1. Richard Holland, 2. Making the whole thing more consistent, > efficient, and easier to use, 3. Southampton (UK), 4. Possibly but > probably only within UK/Europe, 5. Only available in mid-Jan 2010, > otherwise can't do anything until mid-March 2010 onwards. > > Looking forward to hearing your comments! Once I have a good idea of > numbers and distribution, I can get some costs together to give you > (and > any potential sponsors) the best idea of what might be involved. 
>
> cheers,
> Richard
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

From andreas at sdsc.edu Mon Jul 20 20:11:52 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 20 Jul 2009 13:11:52 -0700
Subject: [Biojava-dev] Fwd: Assembly data reading
In-Reply-To: <56be91b60907140908t6d6aa906ifcd334afd9e44882@mail.gmail.com>
References: <56be91b60907090858t41f2c72cwf7db057e6390d6db@mail.gmail.com>
	<56be91b60907121441h7602d917m496b675d0fa9fd68@mail.gmail.com>
	<1247462435.25217.7.camel@buzzybee>
	<93b45ca50907122329g65cef108h112399da01d6b322@mail.gmail.com>
	<56be91b60907140908t6d6aa906ifcd334afd9e44882@mail.gmail.com>
Message-ID: <59a41c430907201311o4d287651k3adc2069b2c95f61@mail.gmail.com>

Hi Paolo,

Not sure if you got a response to your mail off list. If there is
sufficient interest from the people working on processing the output
of the various sequencers, it would be great if those people would
work together to get a new biojava module started. Most probably
somebody needs to take initiative and lead the development, otherwise
it won't happen.

Cheers,
Andreas

On Tue, Jul 14, 2009 at 9:08 AM, Paolo Pavan wrote:
> Dear all,
> I took a day for a quick search to get a clearer picture of the
> situation.
>
> - I found the specification of the .sff file in the 454 instrument
>   manual; it is fully described and seems to be enough to build a
>   reader.
> - However, on a more careful read it seems that a *.sff file carries no
>   information about the automatic contig assembly and only stores
>   flowgram info, that is, "reads" (unlike a *.ace file).
> - Two hidden binary files can be found in a 454 gsAssembler project
>   folder: .ChordMatrixMetadata and .SeqCacheMetadata. They are not
>   described in the manual, but they seem to contain nucleotide data and
>   read names respectively; they are big enough to contain that kind of
>   data. The problem is that we don't know how to parse them.
> - It is necessary to decide on a "memory structure" in which to store
>   the information read. I agree with the "memory mapping" solution,
>   maybe implemented with a Map object that associates the name of each
>   read with its location in the file.
> - The parser class should then expose methods to:
>   1) iterate through reads, though this may be heavy and could perhaps
>      be avoided
>   2) access a read sequence by name
> - If the parser should manage the assembled contigs too (this depends on
>   what is explained in the third bullet point), it should expose methods
>   to:
>   1) iterate through contig names
>   2) iterate through contig consensus sequences
>   3) access consensus sequences by name (this is a sub-problem of point 2)
>   4) access arbitrary aligned portions (I mean a "slice") of the
>      assembly given start-end positions, returning an alignment object
> - Any more suggestions?
>
> I would be glad to be involved in the biojava community through this
> project and I could try, but first of all I want to say that I'm not a
> guru like most of the people here ( :-p ) and, to tell the truth, the
> job my company requires of me is different; if a workaround exists, I
> should honestly choose it.
> So let me think a bit about starting such an adventure. If I can combine
> my job with contributing to the community's growth, I'll be happy to
> share my work! Any suggestion welcome.
>
> Bye bye,
> Paolo
>
>
> 2009/7/13 Mark Schreiber:
>> I would agree that there is a strong need for this kind of thing in biojava.
>> >> As Richard says you probably can't fit it in memory so you may want to >> memory map it. There are classes in the javax.nio package that can help a >> lot with this. >> >> Also I have had some success with in-memory compression of large files using >> LZ compression. Essentially the memory representation of the file is LZ >> compressed and compression and decompression are handled on the fly. Again >> there are Java utility classes that can help. >> >> - Mark >> >> On Mon, Jul 13, 2009 at 1:20 PM, Richard Holland >> wrote: >>> >>> Nothing within BJ can parse the 454 .sff files directly. However I think >>> there is a growing need for it so if anyone is willing to contribute >>> code, it would be very welcome. >>> >>> There is also no .ace parser, although in 2007 someone volunteered to >>> write one but nothing happened, and there was a previous post (many >>> years ago!) from someone else who already had some working code but >>> again nothing seems to have happened: >>> >>> http://portal.open-bio.org/pipermail/biojava-l/2001-June/001283.html >>> http://lists.open-bio.org/pipermail/biojava-l/2007-July/005900.html >>> >>> So to start with, someone (perhaps yourself? that would be nice! :) ) >>> needs to volunteer to write either a .ace or .sff parser, or both. >>> >>> The thing to bear in mind with 454 contigs as you rightly point out is >>> the sheer size of the things. The requirement to keep them entirely in >>> memory is likely to be unworkable as it would leave little room for >>> anything else to run on your average machine. I would suggest either >>> memory-mapping the file itself, or parsing and writing out a >>> memory-mapped summary file containing the bits of data you're interested >>> in. (Memory-mapping is where you keep an index in memory indicating >>> where in the file each record is, so that when you need to access them >>> you load them on-the-fly from the file and drop them out of memory again >>> immediately after use. 
An accelerated form of this is to put the loaded >>> records into some kind of LRU cache which holds only the most recently >>> accessed records and then check that cache first to see if you've >>> already loaded the record before accessing the file directly.) >>> >>> cheers, >>> Richard >>> >>> >>> On Sun, 2009-07-12 at 23:41 +0200, Paolo Pavan wrote: >>> > Hi, >>> > I would like to post again with some adjustments a question I put some >>> > times ago because maybe this is a more correct list, apologize for the >>> > repeating. >>> > Can someone kindly give me his advise? >>> > >>> > thank you in advance, >>> > Paolo >>> > >>> > >>> > ---------- Forwarded message ---------- >>> > From: Paolo Pavan >>> > Date: 2009/7/9 >>> > Subject: Assembly data reading >>> > To: Biojava-l at lists.open-bio.org >>> > >>> > >>> > Hi everybody, >>> > I'm almost new to this topic, I would like to know if there is >>> > something can help me to load in my java program data from a large 454 >>> > contig. I need to retain in memory and access data from the single >>> > reads forming the contig too. >>> > I suppose these informations are in a *.sff file, if it is not >>> > possible to load such file it should be ok to load a *.ace (phrap) >>> > data file that I have too. >>> > Many thanks for any suggestion you can give me! 
>>> > >>> > Greetings, >>> > Paolo >>> > _______________________________________________ >>> > biojava-dev mailing list >>> > biojava-dev at lists.open-bio.org >>> > http://lists.open-bio.org/mailman/listinfo/biojava-dev >>> -- >>> Richard Holland, BSc MBCS >>> Operations and Delivery Director, Eagle Genomics Ltd >>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >>> http://www.eaglegenomics.com/ >>> >>> _______________________________________________ >>> biojava-dev mailing list >>> biojava-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> >> > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From matias.piipari at gmail.com Tue Jul 21 22:07:47 2009 From: matias.piipari at gmail.com (Matias Piipari) Date: Tue, 21 Jul 2009 23:07:47 +0100 Subject: [Biojava-dev] Hackathon In-Reply-To: <3735A549-2F00-418D-9194-638D7C923FC7@ebi.ac.uk> References: <1247505253.27493.15.camel@buzzybee> <3735A549-2F00-418D-9194-638D7C923FC7@ebi.ac.uk> Message-ID: <15cdf3360907211507t6cb9b519y36a61197ae01378f@mail.gmail.com> 1. Matias Piipari 2. sequences + sequence motifs 3. Cambridge 4. Possible but not terribly likely. 5. Flexible On Sun, Jul 19, 2009 at 8:15 PM, Andy Yates wrote: > Okay another one for the ball: > > 1). Andy Yates > 2). Erm well getting it right so I guess that lands me in > testing/integration/killing every singleton > 3). Cambridge > 4). Currently in a Perl group so not a chance really > 5). Quite flexible > > How does that sound? > > > On 13 Jul 2009, at 18:14, Richard Holland wrote: > > Hi all. >> >> Andreas and I would like to organise a hackathon to get the >> modularisation and general improvement plans for BJ3 into action, and >> bring the project forward into the 21st century (only 10 years late!). 
>> >> At this time I'm trying to gather interest and gauge who might >> realistically be able to attend. We will attempt to site the hackathon >> at a location closest to the majority of attendees. >> >> To help me plan numbers and likely costs (for potential sponsors) could >> all those who are interested please answer the following questions for >> me: >> >> 1. Name, >> 2. Specialist interest within Biojava (e.g. proteomics, microarrays, >> sequencing, etc.), >> 3. Your physical location (country and and nearest major city - e.g. >> Cambridge, London, Newcastle, San Diego, Singapore, etc.), >> 4. Whether you think your employer would help pay your airfare and/or 1 >> week in a hotel to attend (and how far you think you could go on such >> funding), >> 5. Approximate availability for the next 12 months. >> >> To get the ball rolling, here's me: >> >> 1. Richard Holland, 2. Making the whole thing more consistent, >> efficient, and easier to use, 3. Southampton (UK), 4. Possibly but >> probably only within UK/Europe, 5. Only available in mid-Jan 2010, >> otherwise can't do anything until mid-March 2010 onwards. >> >> Looking forward to hearing your comments! Once I have a good idea of >> numbers and distribution, I can get some costs together to give you (and >> any potential sponsors) the best idea of what might be involved. 
>> >> cheers, >> Richard >> >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From HWillis at scripps.edu Wed Jul 22 18:34:26 2009 From: HWillis at scripps.edu (Scooter Willis) Date: Wed, 22 Jul 2009 14:34:26 -0400 Subject: [Biojava-dev] Bug Tracking Message-ID: Do we have a formal defect/feature request tracking setup for Biojava? In case we don't have something formally setup or if we do and are not using it I wanted to suggest the following. I am getting ready to install the following jumpbox at work to do subversion/defect tracking/wiki/project management. http://www.jumpbox.com/app/redmine Everything is already installed in a vmware instance and sitting through the demo it looked very promising. You create a project in redmine and that provisions the subversion repository. You can then report bugs, add feature requests against the project. You can assign defects to members of the project and get timelines and other project management related reports. Combine this with a wiki for doing docs and demo code it looks like an interesting combination. I have been very impressed with what jumpbox is doing in other bundled software components so if they are providing redmine as a jumpbox it is probably very good. So far I have only played around with the demo instance at redmine. 

Thanks

Scooter

From andreas at sdsc.edu Wed Jul 22 19:15:50 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 22 Jul 2009 12:15:50 -0700
Subject: [Biojava-dev] Bug Tracking
Message-ID: <59a41c430907221215y10b95473y25fa9d94c82afc3f@mail.gmail.com>

Hi Scooter,

we have bugzilla running at:
http://bugzilla.open-bio.org/

Andreas

From holland at eaglegenomics.com Wed Jul 22 19:16:19 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 22 Jul 2009 20:16:19 +0100
Subject: [Biojava-dev] Bug Tracking
Message-ID: <1248290179.28124.54.camel@buzzybee>

Yup, we do.
It's here:

http://bugzilla.open-bio.org/

cheers,
Richard

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From jw12 at sanger.ac.uk Fri Jul 24 09:12:21 2009
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Fri, 24 Jul 2009 10:12:21 +0100
Subject: [Biojava-dev] Hackathon
In-Reply-To: <1247505253.27493.15.camel@buzzybee>
References: <1247505253.27493.15.camel@buzzybee>

Hi Richard and Andreas, you both know where I'm situated, but for the record:

1. Jonathan Warren
2. Any DAS related libraries, visualization
3. Cambridge
4. Yes I think so, definitely if in UK.
5. Anytime.... although obviously I'm very busy, just not many definite commitments ;)

There seem to be many DAS classes under the biojava-live site, but many of them are not used in any of the DAS related code I have, with the exception of Structure and Alignment classes that are used in some dazzle plugins. So it might be good to update some of these classes to new more relevant code. I also notice that Apollo was trying to get away from using Biojava code for its DAS 1.5 adapter. Maybe the new modular design of biojava will resolve issues that Apollo developers had?

Anyway, I guess I'm asking Andreas or anyone else if they know the history of some of these classes, e.g. the org.biojava.bio.program.das package?

On 13 Jul 2009, at 18:14, Richard Holland wrote:

> 1. Name,
> 2. Specialist interest within Biojava (e.g. proteomics, microarrays,
> sequencing, etc.),
> 3. Your physical location (country and nearest major city - e.g.
> Cambridge, London, Newcastle, San Diego, Singapore, etc.),
> 4. Whether you think your employer would help pay your airfare and/or 1
> week in a hotel to attend (and how far you think you could go on such
> funding),
> 5. Approximate availability for the next 12 months.

Jonathan Warren
Senior Developer and DAS coordinator
jw12 at sanger.ac.uk

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

From julie at flymine.org Fri Jul 24 09:25:03 2009
From: julie at flymine.org (Julie Sullivan)
Date: Fri, 24 Jul 2009 10:25:03 +0100
Subject: [Biojava-dev] Hackathon
In-Reply-To: <15cdf3360907211507t6cb9b519y36a61197ae01378f@mail.gmail.com>
References: <1247505253.27493.15.camel@buzzybee>
    <3735A549-2F00-418D-9194-638D7C923FC7@ebi.ac.uk>
    <15cdf3360907211507t6cb9b519y36a61197ae01378f@mail.gmail.com>
Message-ID: <4A697DEF.2020304@flymine.org>

1. Julie Sullivan
2. InterMine uses BioJava to handle sequences and pdb data
3. Cambridge
4. No
5. Flexible

From gmicha at gmail.com Fri Jul 24 10:38:04 2009
From: gmicha at gmail.com (Micha Sammeth)
Date: Fri, 24 Jul 2009 12:38:04 +0200
Subject: [Biojava-dev] Hackathon
In-Reply-To: <1247505253.27493.15.camel@buzzybee>
References: <1247505253.27493.15.camel@buzzybee>
Message-ID: <4A698F0C.9090402@gmail.com>

1) Michael Sammeth
2) sequencing, gene expression, splicing, alignment
3) Barcelona beach
4) continental maybe yes, states probably not
5) not available from mid Oct to end of Nov, the rest of the time probably just busy as usual

Cheers,
micha

--
Dr. Michael Sammeth
http://www.sammeth.net
GRIB
Phone: +34-933-160-166
Fax: +34 933-969-983
Dr. Aiguader 88, 08003 Barcelona

From andreas at sdsc.edu Fri Jul 24 15:23:42 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 24 Jul 2009 08:23:42 -0700
Subject: [Biojava-dev] Hackathon
References: <1247505253.27493.15.camel@buzzybee>
Message-ID: <59a41c430907240823u5dc152i9c6a5854adf9ba0e@mail.gmail.com>

> There seem to be many DAS classes under the biojava-live site, but many of
> them are not used in any of the DAS related code I have, with the exception
> of Structure and Alignment classes that are used in some dazzle plugins.

Much of the org.biojava.bio.program.das code is quite ancient and should be deprecated...
In a new biojava-das related module it would be nice to merge in one of the more modern DAS libraries (dasobert?) as a replacement...

Andreas

From holland at eaglegenomics.com Mon Jul 27 13:23:19 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 27 Jul 2009 14:23:19 +0100
Subject: [Biojava-dev] [Biojava-l] How to parse large Genbank files?
In-Reply-To: <200907271416.33485.florian.mittag@uni-tuebingen.de>
References: <200907241929.08768.florian.mittag@uni-tuebingen.de>
    <93b45ca50907241920r60c28931p1b43bf6b6a101b46@mail.gmail.com>
    <200907271416.33485.florian.mittag@uni-tuebingen.de>
Message-ID: <1248700999.2803.103.camel@buzzybee>

> My question to this list again:
> Is there a way to achieve my goal of parsing a 200MB Genbank file with the
> current biojava version without code changes?

Probably not. The internal requirement to convert everything into SymbolLists and back again really does get in the way. This is one of the main drivers behind BioJava3 - to refactor out unnecessary complexity, of which this is a prime example.

The ideal solution would be to parse the file and keep the sequence as a string, only to be converted into Symbols when _absolutely necessary_ - otherwise to remain as a string (or even just as a pointer to a string stored on a disk-based temporary file repository somewhere, to save memory). Hibernate et al could then work directly with the string.

cheers,
Richard

>
> - Florian
>
> > On 25 Jul 2009, 1:33 AM, "Florian Mittag" wrote:
> >
> > Hi!
> >
> > I think this is a problem worth of its own thread, so I'll start one:
> >
> > I want to store all human chromosomes in a BioSQL database after I loaded
> > the information from .gbk files. The files I get from NCBI with the
> > following URIs, where the id ranges from nc_000001 to nc_000024 plus
> > nc_001804:
> >
> > http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=nc_000023&rettype=gbwithparts&retmode=text
> >
> > I then try to parse the files as described in
> > http://biojava.org/wiki/BioJava:BioJavaXDocs#Tools_for_reading.2Fwriting_files
> > but it wont work. While there are no problems parsing 1804 and 24,
> > chromosome 23 leads to a OutOfMemory exception although I gave it 2GB of
> > heap space.
> >
> > Here is a stack trace (the line numbers might differ, because I already
> > tried to improve GenbankFormat.java in memory efficiency):
> >
> > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> >     at org.biojava.bio.seq.io.ChunkedSymbolListFactory.addSymbols(ChunkedSymbolListFactory.java:222)
> >     at org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.addSymbols(SimpleRichSequenceBuilder.java:256)
> >     at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:535)
> >     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
> >     at org.prodge.sequence_viewer.db.UpdateDB_Main.updateChromosome(UpdateDB_Main.java:537)
> >     at org.prodge.sequence_viewer.db.UpdateDB_Main.newGenome(UpdateDB_Main.java:468)
> >     at org.prodge.sequence_viewer.db.UpdateDB_Main.main(UpdateDB_Main.java:164)
> >
> > The line in GenbankFormat.java is:
> >
> > rlistener.addSymbols(
> >     symParser.getAlphabet(),
> >     (Symbol[])(sl.toList().toArray(new Symbol[0])),
> >     0, sl.length());
> >
> > Sometimes it fails at the sl.toList().toArray()-part, sometimes it fails
> > later inside the addSymbols method, but it always fails.
> >
> > How can this be? I mean, the file is only 190MB in size, so 2GB of memory
> > should be more than enough. Browsing through the source code, I discovered
> > what I think of as very inefficient handling of sequences:
> >
> > 1) the sequence string is read from file into a StringBuffer
> > 2) it is converted to a string (with whitespaces removed)
> > 3) a SimpleSymbolList is created out of the string
> > 4) the SymbolList is converted to a List of Symbols
> > 5) the List is converted to an array of Symbols
> > 6) the array is passed to addSymbols
> > 7) there it is added to a ChunkedSymbolListFactory
> > 8) if at some point the sequence is requested, a SymbolList is created and
> > then converted to a string.
> >
> > You see, there is a lot of copying and converting, but in the end I have
> > the same string I started with. Well, I had the string, if it ever reached
> > the end, because it will crash before completing this process.
> >
> > Am I doing something wrong or is there a great potential of improving
> > parsing of Genbank files?
> >
> > Regards,
> > Florian

From paolo.pavan at gmail.com Mon Jul 27 16:47:46 2009
From: paolo.pavan at gmail.com (Paolo Pavan)
Date: Mon, 27 Jul 2009 18:47:46 +0200
Subject: [Biojava-dev] [Biojava-l] How to parse large Genbank files?
In-Reply-To: <1248700999.2803.103.camel@buzzybee>
References: <200907241929.08768.florian.mittag@uni-tuebingen.de>
    <93b45ca50907241920r60c28931p1b43bf6b6a101b46@mail.gmail.com>
    <200907271416.33485.florian.mittag@uni-tuebingen.de>
    <1248700999.2803.103.camel@buzzybee>
Message-ID: <56be91b60907270947kf03afa3v7976c569d806ad16@mail.gmail.com>

Calling the garbage collector between the steps wouldn't achieve anything, would it?

From markjschreiber at gmail.com Tue Jul 28 02:52:44 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 28 Jul 2009 10:52:44 +0800
Subject: [Biojava-dev] [Biojava-l] How to parse large Genbank files?
In-Reply-To: <56be91b60907270947kf03afa3v7976c569d806ad16@mail.gmail.com>
References: <200907241929.08768.florian.mittag@uni-tuebingen.de>
    <93b45ca50907241920r60c28931p1b43bf6b6a101b46@mail.gmail.com>
    <200907271416.33485.florian.mittag@uni-tuebingen.de>
    <1248700999.2803.103.camel@buzzybee>
    <56be91b60907270947kf03afa3v7976c569d806ad16@mail.gmail.com>
Message-ID: <93b45ca50907271952k689c2f27h78bf7b9cc47e7d45@mail.gmail.com>

Dear Paolo -

Calling the garbage collector is generally not required and often not recommended. Modern JVMs do a better job of this than programmers do. Also, a garbage collector cannot release memory allocated to objects that are still referenced.

I suspect the problem here is that objects are being copied and references are being retained to the old copies. These old copies are not really required, so the references can be set to null, which will allow the GC to clean them up.

Also, manually calling the GC is very aggressive and forces the JVM to dump all classes it is not currently using; when such a class is needed again, the classloader has to reload it, which can result in a performance hit.
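
A rough sketch of the point about retained copies, in plain Java with no BioJava classes: instead of building whole-sequence intermediates and nulling each one out, the ORIGIN block of a GenBank record can be cleaned one line at a time and handed to a caller-supplied sink, so only one short chunk is alive at any moment. The names OriginStreamer and streamOrigin are made up for illustration and are not part of BioJava, and the parsing covers only the ORIGIN block.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;

public class OriginStreamer {

    /**
     * Reads a GenBank record and passes the residues of its ORIGIN block
     * to the sink, one cleaned line at a time. Returns the residue count.
     * Hypothetical helper, not a BioJava API.
     */
    public static long streamOrigin(Reader in, Consumer<String> sink) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        boolean inOrigin = false;
        long total = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.startsWith("ORIGIN")) { inOrigin = true; continue; }
            if (line.startsWith("//")) { inOrigin = false; continue; }
            if (!inOrigin) continue;
            // Keep letters only: drops the position numbers and the spacing.
            StringBuilder chunk = new StringBuilder(line.length());
            for (int i = 0; i < line.length(); i++) {
                char c = line.charAt(i);
                if (Character.isLetter(c)) chunk.append(c);
            }
            if (chunk.length() > 0) {
                sink.accept(chunk.toString());
                total += chunk.length();
            }
            // chunk goes out of scope here, so it can be collected at once.
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        String record = "LOCUS       TEST\nORIGIN\n"
                + "        1 acgtacgtac gtacgtacgt\n       21 acgt\n//\n";
        StringBuilder seq = new StringBuilder();
        long n = streamOrigin(new StringReader(record), seq::append);
        System.out.println(n + " residues: " + seq);
    }
}
```

The sink here just appends to a StringBuilder, but it could equally write to a temporary file or a database column, which is closer to the disk-backed string idea raised earlier in the thread.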
- Mark On Tue, Jul 28, 2009 at 12:47 AM, Paolo Pavan wrote: > Calling a garbage collection among the steps doesn't bring to > anything, isn't it? > > 2009/7/27 Richard Holland : >> >>> My question to this list again: >>> Is there a way to achieve my goal of parsing a 200MB Genbank file with the >>> current biojava version without code changes? >> >> Probably not. The internal requirement to convert everything into >> SymbolLists and back again really does get in the way. This is one of >> the main drivers behind BioJava3 - to refactor out unnecessary >> complexity, of which this is a prime example. >> >> The ideal solution would be to parse the file and keep the sequence as a >> string, only to be converted into Symbols when _absolutely necessary_ - >> otherwise to remain as a string (or even just as a pointer to a string >> stored on a disk-based temporary file repository somewhere, to save >> memory). Hibernate et al could then work directly with the string. >> >> cheers, >> Richard >> >>> >>> - Florian >>> >>> >>> >>> > On 25 Jul 2009, 1:33 AM, "Florian Mittag" >>> > wrote: >>> > >>> > Hi! >>> > >>> > I think this is a problem worth of its own thread, so I'll start one: >>> > >>> > I want to store all human chromosomes in a BioSQL database after I loaded >>> > the >>> > information from .gbk files. The files I get from NCBI with the following >>> > URIs, where the id ranges from nc_000001 to nc_000024 plus nc_001804: >>> > >>> > http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=nc_0 >>> >00023&rettype=gbwithparts&retmode=text >>> > >>> > I then try to parse the files as described in >>> > http://biojava.org/wiki/BioJava:BioJavaXDocs#Tools_for_reading.2Fwriting_fi >>> >les but it wont work. While there are no problems parsing 1804 and 24, >>> > chromosome >>> > 23 leads to a OutOfMemory exception although I gave it 2GB of heap space. 
>>> > >>> > Here is a stack trace (the line numbers might differ, because I already >>> > tried >>> > to improve GenbankFormat.java in memory efficiency): >>> > >>> > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space >>> > ? ? ? ?at >>> > org.biojava.bio.seq.io.ChunkedSymbolListFactory.addSymbols(ChunkedSymbolLis >>> >tFactory.java:222) at >>> > org.biojavax.bio.seq.io.SimpleRichSequenceBuilder.addSymbols(SimpleRichSequ >>> >enceBuilder.java:256) at >>> > org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:5 >>> >35) at >>> > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader. >>> >java:110) at >>> > org.prodge.sequence_viewer.db.UpdateDB_Main.updateChromosome(UpdateDB_Main. >>> >java:537) at >>> > org.prodge.sequence_viewer.db.UpdateDB_Main.newGenome(UpdateDB_Main.java:46 >>> >8) at >>> > org.prodge.sequence_viewer.db.UpdateDB_Main.main(UpdateDB_Main.java:164) >>> > >>> > The line in GenbankFormat.java is: >>> > >>> > rlistener.addSymbols( >>> > ? ? ? ?symParser.getAlphabet(), >>> > ? ? ? ?(Symbol[])(sl.toList().toArray(new Symbol[0])), >>> > ? ? ? ?0, sl.length()); >>> > >>> > Sometimes it fails at the sl.toList().toArray()-part, sometimes it fails >>> > later >>> > inside the addSymbols method, but it always fails. >>> > >>> > How can this be? I mean, the file is only 190MB in size, so 2GB of memory >>> > should be more than enough. 
Browsing through the source code, I discovered >>> > what I think of as very inefficient handling of sequences: >>> > >>> > 1) the sequence string is read from file into a StringBuffer >>> > 2) it is converted to a string (with whitespaces removed) >>> > 3) a SimpleSymbolList is created out of the string >>> > 4) the SymbolList is converted to a List of Symbols >>> > 5) the List is converted to an array of Symbols >>> > 6) the array is passed to addSymbols >>> > 7) there it is added to a ChunkedSymbolListFactory >>> > 8) if at some point the sequence is requested, a SymbolList is created and >>> > then converted to a string. >>> > >>> > You see, there is a lot of copying and converting, but in the end I have >>> > the same string I started with. Well, I had the string, if it ever reached >>> > the end, because it will crash before completing this process. >>> > >>> > >>> > Am I doing something wrong or is there a great potential of improving >>> > parsing >>> > of Genbank files? >>> > >>> > >>> > Regards, >>> > ? Florian >>> > _______________________________________________ >>> > Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org >>> > http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From abhishek.vit at gmail.com Tue Jul 28 19:04:00 2009 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Tue, 28 Jul 2009 15:04:00 -0400 Subject: [Biojava-dev] Hi.. 
Calling Java from Perl
Message-ID:

Hi Guys

Before I ask the question, let me introduce myself. I am Abhishek, primarily a bioinformatician, and this is my first mail here. I realized sooner rather than later that I have to use BioJava to make my life easier. :)

So basically we have a lot of Perl code where we would like to plug in some BioJava code and some in-house written packages/classes. I am just wondering what is the best way to do so. Clearly I am not a Java guy, so please excuse me in case I am asking something which is very basic. I found a couple of solutions after a few Google searches but am not sure which is the efficient one.

Thanks,
-Abhi

From ayates at ebi.ac.uk Tue Jul 28 21:54:17 2009
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 28 Jul 2009 22:54:17 +0100
Subject: [Biojava-dev] Hi.. Calling Java from Perl
In-Reply-To:
References:
Message-ID:

Hi Abhi,

Well, to answer your first question, the only real way to do this is by shelling out to Java. Inter-process communication could then be dealt with by writing to temporary files or maybe communicating back over STDOUT.

The question I would ask you, though, is what particular part of BioJava are you using? Is there any reason why another similarly named Bio project (shall not mention it here as I think people think I'm becoming weak when it comes to Perl) cannot be used? As always when programming, avoiding shelling out to another program is a good idea where possible; sometimes it cannot be avoided, say if you want to run clustalw, but shelling out to delete a file is unnecessary.

Andy

From abhishek.vit at gmail.com Tue Jul 28 22:23:49 2009
From: abhishek.vit at gmail.com (Abhishek Pratap)
Date: Tue, 28 Jul 2009 18:23:49 -0400
Subject: [Biojava-dev] Hi.. Calling Java from Perl
In-Reply-To:
References:
Message-ID:

Hi Andy

Thanks for a quick reply. I think shelling out will be too process-intensive, as we expect thousands of calls to the same Java method. I also read about the Perl module Inline::Java. Is that any good?

And to answer your second question, I am basically using an in-house method which in turn uses a lot of BioJava classes for DNA manipulation.

Thanks,
-Abhi

From markjschreiber at gmail.com Tue Jul 28 23:24:30 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 29 Jul 2009 07:24:30 +0800
Subject: [Biojava-dev] Hi.. Calling Java from Perl
In-Reply-To: <93b45ca50907281619q4fc1572aoed7333a72534fa14@mail.gmail.com>
References: <93b45ca50907281619q4fc1572aoed7333a72534fa14@mail.gmail.com>
Message-ID: <93b45ca50907281624u1a8bb7cy81c50f1c90434f5@mail.gmail.com>

Hi -

You could try and use something like CORBA but that would be quite ugly.

A nicer alternative would be to put the BioJava functionality in a web service and send sequences as FASTA or some custom format?

I think WS is considered the best way for Java and .NET to talk so probably it is for Perl too.

- Mark
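A rough sketch of the web-service idea above, for illustration only: the JDK's built-in com.sun.net.httpserver.HttpServer accepts a single FASTA record by HTTP POST and returns the result as plain text. The class name RevCompService, the /revcomp path, port 8080, and the reverseComplement() helper are all assumptions made for this example. reverseComplement() merely stands in for whatever in-house or BioJava method would really be called, and plain HTTP is used here rather than a SOAP stack.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

class RevCompService {

    /** Hypothetical stand-in for the real in-house/BioJava call. */
    static String reverseComplement(String dna) {
        StringBuilder out = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            switch (Character.toUpperCase(dna.charAt(i))) {
                case 'A': out.append('T'); break;
                case 'T': out.append('A'); break;
                case 'C': out.append('G'); break;
                case 'G': out.append('C'); break;
                default:  out.append('N'); break; // anything else becomes N
            }
        }
        return out.toString();
    }

    /** Drops FASTA header lines (starting with '>') and joins the rest.
     *  Assumes the request carries exactly one record. */
    static String sequenceFromFasta(String fasta) {
        StringBuilder seq = new StringBuilder();
        for (String raw : fasta.split("\\r?\\n")) {
            String line = raw.trim();
            if (!line.isEmpty() && !line.startsWith(">")) {
                seq.append(line);
            }
        }
        return seq.toString();
    }

    /** Reads a stream to the end; written out by hand to stay on older JDKs. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Port and path are arbitrary choices for this sketch.
        HttpServer server = HttpServer.create(new InetSocketAddress("localhost", 8080), 0);
        server.createContext("/revcomp", exchange -> {
            String fasta = new String(readAll(exchange.getRequestBody()), StandardCharsets.UTF_8);
            String result = reverseComplement(sequenceFromFasta(fasta)) + "\n";
            byte[] reply = result.getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(reply);
            }
        });
        server.start(); // the JVM now stays up and answers requests until killed
    }
}
```

Because the JVM stays up between requests, its start-up cost is paid once rather than on each of the thousands of calls. On the Perl side, any HTTP client (LWP::UserAgent, for example) could POST the FASTA text to http://localhost:8080/revcomp.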
From ayates at ebi.ac.uk Wed Jul 29 08:48:33 2009
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 29 Jul 2009 09:48:33 +0100
Subject: [Biojava-dev] Hi.. Calling Java from Perl
In-Reply-To: <93b45ca50907281624u1a8bb7cy81c50f1c90434f5@mail.gmail.com>
References: <93b45ca50907281619q4fc1572aoed7333a72534fa14@mail.gmail.com> <93b45ca50907281624u1a8bb7cy81c50f1c90434f5@mail.gmail.com>
Message-ID: <4A700CE1.3050901@ebi.ac.uk>

Indeed, I would agree with Mark here and go for web services as the desired solution. JAX-WS & CXF are both popular frameworks for doing web services and, as far as I remember, Spring had some very nice helper classes for quickly exposing any Java class as a web service. Then there are other remoting protocols such as Hessian, Burlap, Protocol Buffers or Thrift, all of which are good in their own ways.

However, web services should be the quickest way (in terms of implementation) to communicate with a persistent Java process.

Personally I would stay away from Inline::Java.

Andy
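Two ideas from this thread, a persistent Java process and replies sent back over STDOUT, can also be combined without any web-service framework. The sketch below is only that, a sketch under stated assumptions: the class name SequenceWorker, the one-sequence-per-line protocol, and the gcCount() helper are invented for illustration, with gcCount() standing in for the real in-house method. It is not something proposed by the posters.

```java
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

class SequenceWorker {

    /** Hypothetical stand-in for the real method: counts G and C bases. */
    static int gcCount(String dna) {
        int count = 0;
        for (int i = 0; i < dna.length(); i++) {
            char c = Character.toUpperCase(dna.charAt(i));
            if (c == 'G' || c == 'C') {
                count++;
            }
        }
        return count;
    }

    /** Reads one sequence per line and writes one reply line per request. */
    static void serve(Reader in, Writer out) {
        Scanner requests = new Scanner(in);
        PrintWriter replies = new PrintWriter(out);
        while (requests.hasNextLine()) {
            String sequence = requests.nextLine().trim();
            replies.print(gcCount(sequence));
            replies.print('\n');
            replies.flush(); // flush each reply so the caller is never left waiting
        }
    }

    public static void main(String[] args) {
        // Runs until the caller closes the worker's standard input.
        serve(new InputStreamReader(System.in, StandardCharsets.UTF_8),
              new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
    }
}
```

A Perl script could start this worker once (the core IPC::Open2 module is one way to hold both pipes open), then write a line and read a line for each call, so the JVM start-up cost is paid a single time rather than per call.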
From abhishek.vit at gmail.com Wed Jul 29 16:04:04 2009
From: abhishek.vit at gmail.com (Abhishek Pratap)
Date: Wed, 29 Jul 2009 12:04:04 -0400
Subject: [Biojava-dev] Hi.. Calling Java from Perl
In-Reply-To: <4A700CE1.3050901@ebi.ac.uk>
References: <93b45ca50907281619q4fc1572aoed7333a72534fa14@mail.gmail.com> <93b45ca50907281624u1a8bb7cy81c50f1c90434f5@mail.gmail.com> <4A700CE1.3050901@ebi.ac.uk>
Message-ID:

Thanks all. I think a Java WS is the way out for me then. As you said, it would be code agnostic and will help me in updating the core code later.

Just a quick question: do you happen to know of any good tutorial on implementing a WS for a Java process?

Thanks,
-Abhi

From ayates at ebi.ac.uk Wed Jul 29 16:21:12 2009
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 29 Jul 2009 17:21:12 +0100
Subject: [Biojava-dev] Hi.. Calling Java from Perl
In-Reply-To:
References: <93b45ca50907281619q4fc1572aoed7333a72534fa14@mail.gmail.com> <93b45ca50907281624u1a8bb7cy81c50f1c90434f5@mail.gmail.com> <4A700CE1.3050901@ebi.ac.uk>
Message-ID: <4A7076F8.2040308@ebi.ac.uk>

Depends on what you're going to use, but when I last did it I bought into the Spring way of things and found that the Spring manual was very good. The WS bit is:

http://static.springsource.org/spring/docs/3.0.x/spring-framework-reference/html/ch21s05.html

It goes through doing it for JAX-WS & XFire. There's also a JAX-WS tutorial from:

http://java.sun.com/javaee/5/docs/tutorial/doc/?wp405739&JAXWS.html#wp72279

To be honest though, Google is your best friend here.

Good luck,

Andy

From Russell.Smithies at agresearch.co.nz Wed Jul 29 20:25:05 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 30 Jul 2009 08:25:05 +1200
Subject: [Biojava-dev] Hi.. Calling Java from Perl
In-Reply-To:
References:
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32AAB5A4EC5@exchsth.agresearch.co.nz>

You could always use BioPerl instead :-)
http://www.bioperl.org/wiki/Main_Page

Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

=======================================================================
Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately.
=======================================================================

From ayates at ebi.ac.uk Wed Jul 29 20:56:34 2009
From: ayates at ebi.ac.uk (ayates at ebi.ac.uk)
Date: Wed, 29 Jul 2009 21:56:34 +0100 (BST)
Subject: [Biojava-dev] Hi.. Calling Java from Perl
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32AAB5A4EC5@exchsth.agresearch.co.nz>
References: <18DF7D20DFEC044098A1062202F5FFF32AAB5A4EC5@exchsth.agresearch.co.nz>
Message-ID: <35760.86.9.203.151.1248900994.squirrel@webmail.ebi.ac.uk>

That was my original point; however, it sounds from the original poster as though the system which is in Perl needs to call out to an already implemented system in BioJava. In a perfect world this mismatch would never happen but hey, we all know it can :)

Andy

From abhishek.vit at gmail.com Wed Jul 29 21:06:54 2009
From: abhishek.vit at gmail.com (Abhishek Pratap)
Date: Wed, 29 Jul 2009 17:06:54 -0400
Subject: [Biojava-dev] Hi.. Calling Java from Perl
In-Reply-To: <35760.86.9.203.151.1248900994.squirrel@webmail.ebi.ac.uk>
References: <18DF7D20DFEC044098A1062202F5FFF32AAB5A4EC5@exchsth.agresearch.co.nz> <35760.86.9.203.151.1248900994.squirrel@webmail.ebi.ac.uk>
Message-ID:

Yeah, it is part of the development cycle. We need to integrate the code; some part of it is in Perl and some in Java.

From your suggestions I feel I clearly have two options.

1. Use a web service to talk between Java and Perl. This might not be very efficient as we expect to make thousands of calls per run.
2. Port the whole Java code to BioPerl.

#2 is scary but I might just have to do it.
Thanks again to all of you,
-Abhi

From niall at sgenomics.org Thu Jul 30 16:32:00 2009
From: niall at sgenomics.org (Niall Haslam)
Date: Thu, 30 Jul 2009 18:32:00 +0200
Subject: [Biojava-dev] Webservices
Message-ID: <200907301832.01103.niall@sgenomics.org>

Hi,

I know it was brought up on the users list a month or two ago, but I wanted to ask on the dev list what the consensus is on creating a BioJava module for web service clients. I am interested and have a little code to contribute.

I think it would consist mainly of example code showing how to use each web service and, critically, would not incorporate the stub code generated by Axis. I would also vote for Axis2. I think this could have the benefit of making services more standards compliant, but we'll probably have to do it on a case-by-case basis.

I'd also like to know if there are people who are interested in using or writing some of it as well.

Thanks and looking forward to your input,

Niall.

From HWillis at scripps.edu Fri Jul 31 14:04:31 2009
From: HWillis at scripps.edu (Scooter Willis)
Date: Fri, 31 Jul 2009 10:04:31 -0400
Subject: [Biojava-dev] Webservices
In-Reply-To: <200907301832.01103.niall@sgenomics.org>
Message-ID:

Niall

I have the web services BioJava implementation on my list of things to do! I have an upcoming project in which doing BLAST through web services, against both external and internal sources, will make things easier. I like what Axis2 is doing to make it easy to publish web services, but, taking NetBeans as an example, it is already fairly painless to create a web service. Since we are mainly focused on consuming web services, it would be nice to use the built-in support in Java 6 to keep the external library count as low as possible, which also helps avoid conflicts when an external application is using a different version of the same external library. I think the main driving force, as you mention, is that much will depend on the provider of the web service as to what web services client library will be needed.

Thanks

Scooter
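To make the "keep the external library count low" point concrete, here is a minimal sketch of a client that talks to an HTTP service using only classes shipped with the JDK. It is an illustration under stated assumptions, not a proposed BioJava API: the class name SimpleServiceClient and the parameter names used in the usage note are hypothetical, and it does a plain form-encoded POST rather than SOAP, so it would only suit services that offer such an interface. The code deliberately avoids newer language features so that it would also compile on Java 6.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.Map;

class SimpleServiceClient {

    /** Builds an application/x-www-form-urlencoded body from the parameters. */
    static String encodeForm(Map<String, String> params) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // Every JVM is required to support UTF-8.
            throw new IllegalStateException(ex);
        }
        return body.toString();
    }

    /** POSTs the parameters to the endpoint and returns the response body. */
    static String post(String endpoint, Map<String, String> params) throws IOException {
        byte[] body = encodeForm(params).getBytes("UTF-8");
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        OutputStream os = conn.getOutputStream();
        try {
            os.write(body);
        } finally {
            os.close();
        }
        InputStream is = conn.getInputStream();
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = is.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return buf.toString("UTF-8");
        } finally {
            is.close();
            conn.disconnect();
        }
    }
}
```

A caller would fill a LinkedHashMap with whatever parameters the chosen service documents (for instance a hypothetical "program" and "sequence" pair) and pass it to post() along with the service URL. As noted in the thread, which client library is actually needed will depend on the provider.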