From daniel.quest at gmail.com Tue Oct 2 23:31:46 2012
From: daniel.quest at gmail.com (Daniel Quest)
Date: Tue, 2 Oct 2012 22:31:46 -0500
Subject: [Biojava-l] maven + genbank parser
Message-ID:

Is there an example project that uses maven to access the genbank parser?
I am confused if biojava3 has genbank support. If I want to use an
earlier version along the lines of the cookbook does it exist in a maven
repo?

Thanks
Dan

From andreas at sdsc.edu Wed Oct 3 01:09:28 2012
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 2 Oct 2012 22:09:28 -0700
Subject: [Biojava-l] maven + genbank parser
In-Reply-To:
References:
Message-ID:

Hi Daniel,

both the 1.8 as well as the 3.0 series are available via maven builds.
You have to configure the biojava specific repo

...
<repository>
  <id>biojava-maven-repo</id>
  <name>BioJava repository</name>
  <url>http://www.biojava.org/download/maven/</url>
</repository>

Andreas

On Tue, Oct 2, 2012 at 8:31 PM, Daniel Quest wrote:
> Is there an example project that uses maven to access the genbank parser?
> I am confused if biojava3 has genbank support. If I want to use an
> earlier version along the lines of the cookbook does it exist in a maven
> repo?
>
> Thanks
> Dan
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

From andreas at sdsc.edu Mon Oct 8 18:43:43 2012
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 8 Oct 2012 15:43:43 -0700
Subject: [Biojava-l] InstabilityIndex
In-Reply-To: <1349723178.65392.YahooMailNeo@web190604.mail.sg3.yahoo.com>
References: <1349723178.65392.YahooMailNeo@web190604.mail.sg3.yahoo.com>
Message-ID:

Hi Subrata,

Please don't mail me directly, but send your questions to the list.
Chances are that somebody there can help you best.

Ah Fu, do you have any thoughts on this?

Thanks,

Andreas

On Mon, Oct 8, 2012 at 12:06 PM, subrata sinha wrote:
> Dear Sir,
>
> I was testing the getInstabilityIndex() method of
> org.biojava3.aaproperties.PeptideProperties class.
>
> I am trying to find InstabilityIndex for very short segment of peptide, as
> short as two. But I am getting some surprising result in negative.Then i
> found that the method perhaps giving -ve result for ambiguous characters. So
> how to handle a situation if my protein sequence contains ambiguous
> characters
>
> Input Sequence    Instability Index
>
> GTDG              -13.725
> VDVR              -30.075
>
> How i analyse the above situation.
>
> Kindly help me.
>
>
> With Regards
>
> Subrata Sinha
> Assistant Professor
> Centre for Bioinformatics Studies
> Dibrugarh University

From darnells at dnastar.com Tue Oct 9 13:32:36 2012
From: darnells at dnastar.com (Steve Darnell)
Date: Tue, 9 Oct 2012 17:32:36 +0000
Subject: [Biojava-l] InstabilityIndex
In-Reply-To:
References: <1349723178.65392.YahooMailNeo@web190604.mail.sg3.yahoo.com>
Message-ID:

@Subrata, The class does not handle characters other than the standard 20 amino acids. We substitute BJOUXZ with DLKCAE and live with the approximation.

@Ah Fu, In the past we have used average values for B (D or N), J (L or I), and Z (E or Q), and a dummy substitution for X (G or A). Our U and O substitutions just map to the closest natural amino acid. It would be nice if real values existed for selenocysteine and pyrrolysine, but I haven't a clue if they do.

@Andreas, I would suggest that a general policy that all protein sequence analyses support the 6 non-standard letter would be a good thing. A nice ideal, but who has the time?
:)

Regards,
Steve

From hlapp at drycafe.net Wed Oct 10 16:31:04 2012
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Wed, 10 Oct 2012 16:31:04 -0400
Subject: [Biojava-l] Fwd: [Announce] Call for Proposals for Doc Sprint Summit v2.0
References:
Message-ID:

If you've been a Google Summer of Code mentor this year or last year, you will have already seen this. I wanted to make sure everybody is aware, and this may provide the opportunity for the kind of concerted effort that could finally get a BioPerl, Biopython, Bioruby, or Biojava (or a combined??) off the ground.

	-hilmar

Begin forwarded message:

From: Carol Smith
Subject: [GSoC Mentors] [Announce] Call for Proposals for Doc Sprint Summit v2.0
Date: October 10, 2012 2:44:50 PM EDT
To: Google Summer of Code Mentors List
Cc: adam at flossmanuals.net

Dear GSoC mentors and org admins,

Google Summer of Code in collaboration with Aspiration and FLOSS Manuals is hosting a "Doc Sprint Camp" at Google's Mountain View headquarters (California) Dec 3 - 7, 2012.

The 2012 Doc Camp will feature:
1) An unconference on free software documentation topics - facilitated by Aspiration
2) 2-5 Book Sprints to produce books on free software - facilitated by FLOSS Manuals

Building on the success of the 2011 GSoC Doc Camp we are proud to bring you the 2012 GSoC Doc Camp. Like the previous event the 2012 GSoC Doc Camp is a place for free software communities to meet, create a book for their project, attract new people to their efforts, and share their documentation experiences. The camp aims to improve free documentation materials and skills in free software projects and individuals and help form the identity of the emergent free documentation sector.

Individuals and projects can apply. Food and accommodation for all individuals will be provided and travel support (full or partial) can also be applied for.

Be a part of this exciting event: propose a Book Sprint on your favorite free software or come and help others write a book on their favorite project. Guaranteed to be a lot of fun, productive, and a fantastic place to advance your documentation efforts and experiences.

For more information or to register to take part, please see https://sites.google.com/site/docsprintsummitv2/. Please note proposals are due by October 26, so get yours in ASAP!

Cheers,
Carol Smith, Allen Gunn, Adam Hyde

--
You received this message because you are subscribed to the Google Groups "Google Summer of Code Mentors List" group.
To post to this group, send email to google-summer-of-code-mentors-list at googlegroups.com.
To unsubscribe from this group, send email to google-summer-of-code-mentors-list+unsubscribe at googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-summer-of-code-mentors-list?hl=en.

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
===========================================================

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From andreas at sdsc.edu Wed Oct 10 22:17:45 2012
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 10 Oct 2012 19:17:45 -0700
Subject: [Biojava-l] InstabilityIndex
In-Reply-To:
References: <1349723178.65392.YahooMailNeo@web190604.mail.sg3.yahoo.com>
Message-ID:

Thanks, good comments, Steve.

I agree we should make it a policy from now on to support 6
non-standard letters for anything protein related... About your time
comment - that's part of the idea of moving the development to git..
Ideally, it should be easier for everybody who is concerned to patch
and share patches...

Andreas

From andreas.prlic at gmail.com Thu Oct 11 14:32:30 2012
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Thu, 11 Oct 2012 11:32:30 -0700
Subject: [Biojava-l] Biojava Dependency on Forester
In-Reply-To:
References:
Message-ID:

Hi Terry,

Biojava depends on forester version 0.955. There are no plans to get
rid of this dependency, as far as I know. However we can try to
upgrade to a newer version if that helps.

If you are working in a Maven environment and you pull in BioJava that
way, you can add an exclusion to your config. Something like the XML
below. This forces your project to ignore the older forester library
configured in biojava. Is this a suitable workaround for your problem?

Andreas

<dependency>
  <groupId>org.biojava</groupId>
  <artifactId>biojava3-phylo</artifactId>
  <version>3.0.4</version>
  <exclusions>
    <exclusion>
      <groupId>org</groupId>
      <artifactId>forester</artifactId>
    </exclusion>
  </exclusions>
</dependency>

On Thu, Oct 11, 2012 at 11:17 AM, Terry Casstevens wrote:
> Dear Andreas,
>
> I am the lead developer of the software Tassel.
> http://www.maizegenetics.net/tassel
>
> We currently use Biojava 3.0. And we are
> wanting to use the latest release of Forester.
> Since Biojava has a dependency on an older
> release of Forester, we are running into conflicts.
> Can you help explain Biojava's dependency on
> Forester?
>
> What version of Forester does Biojava 3.0 require?
> It looks like version 0.955
>
> What version of Forester does Biojava 3.0.4 require?
>
> Does any Biojava jar files include Forester classes?
> Or just references?
>
> Are there plans to remove Biojava's dependency
> on Forester?
>
>
> Thank you,
>
> Terry Casstevens

From andreas.prlic at gmail.com Thu Oct 11 14:50:10 2012
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Thu, 11 Oct 2012 11:50:10 -0700
Subject: [Biojava-l] Biojava Dependency on Forester
In-Reply-To:
References:
Message-ID:

"exclusion" means only that the (old) forester.jar that biojava
depends on, will not be included. Instead you could configure your
newer forester dependency in its own section. That one will get used
instead.

A

On Thu, Oct 11, 2012 at 11:43 AM, Terry Casstevens wrote:
> Hi Andreas,
>
> Thank you for the quick response!
>
> When you say "exclusion", sounds like the
> parts of biojava that uses forester would be excluded?
> I'm not sure, but I think our code uses some of
> the code that would be excluded.
>
> As you probably already know, the latest
> release of Forester is not backwardly
> compatible with Forester version 0.955.
>
> Thank you,
>
> Terry

From HWillis at scripps.edu Thu Oct 11 14:46:24 2012
From: HWillis at scripps.edu (Scooter Willis)
Date: Thu, 11 Oct 2012 14:46:24 -0400
Subject: [Biojava-l] Biojava Dependency on Forester
Message-ID: <9CA63734-A43D-4BAE-A744-FF292D1EF972@scripps.edu>

If forester has a maven repository we can unhook the local dependency. We use forester for NJ and should only be needed in one model. Let me know the issue/conflict and I can see what I can do to clean up.

Thanks

Scooter

From HWillis at scripps.edu Thu Oct 11 15:50:26 2012
From: HWillis at scripps.edu (Scooter Willis)
Date: Thu, 11 Oct 2012 15:50:26 -0400
Subject: [Biojava-l] Biojava Dependency on Forester
Message-ID: <47E61FED-17CF-40E9-AE76-E44C11BDBB27@scripps.edu>

Andreas

Looks like the alignment code is using the distance matrix from forester and that has changed. Any chance the developer who did the MSA code could get this working with the latest forester code. It is probably a refactoring problem.

Scooter

----- Reply message -----
From: "Terry Casstevens"
To: "Scooter Willis"
Cc: "Andreas Prlic", "Peter Bradbury", "Jeff Glaubitz", "Ed Buckler", "biojava-l at biojava.org"
Subject: [Biojava-l] Biojava Dependency on Forester
Date: Thu, Oct 11, 2012 2:56 pm

Hi Scooter, Andreas,

Thank you again for the responses.

This is one problem we are seeing. org/forester/phylogenyinference
does not exist in Forester version 1.005.

Exception in thread "main" java.lang.NoClassDefFoundError: org/forester/phylogenyinference/DistanceMatrix
        at org.biojava3.alignment.Alignments.getMultipleSequenceAlignment(Alignments.java:176)
        at net.maizegenetics.gbs.maps.TagsAtLocus.getVariableSites(TagsAtLocus.java:306)
        at net.maizegenetics.gbs.maps.TagsAtLocus.getSNPCallsQuant(TagsAtLocus.java:183)
        at net.maizegenetics.gbs.pipeline.TagsToSNPByAlignmentPlugin.addSitesToMutableAlignment(TagsToSNPByAlignmentPlugin.java:417)
        at net.maizegenetics.gbs.pipeline.TagsToSNPByAlignmentPlugin.runTagsToSNPByAlignment(TagsToSNPByAlignmentPlugin.java:347)
        at net.maizegenetics.gbs.pipeline.TagsToSNPByAlignmentPlugin.performFunction(TagsToSNPByAlignmentPlugin.java:107)
        at net.maizegenetics.gbs.pipeline.TerryPipelines.runTagsToSNPByAlignmentPlugin(TerryPipelines.java:36)
        at net.maizegenetics.gbs.pipeline.TerryPipelines.main(TerryPipelines.java:41)
Caused by: java.lang.ClassNotFoundException: org.forester.phylogenyinference.DistanceMatrix
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:356)

Thank you,

Terry

From kohchuanhock at gmail.com Sun Oct 14 00:54:44 2012
From: kohchuanhock at gmail.com (Chuan Hock Koh)
Date: Sun, 14 Oct 2012 13:54:44 +0900
Subject: [Biojava-l] InstabilityIndex
In-Reply-To:
References: <1349723178.65392.YahooMailNeo@web190604.mail.sg3.yahoo.com>
Message-ID:

Hi Andreas and Steve,

Sorry for the slow response. Currently, I am in the midst of moving from Singapore to Japan. Busy with settling into the new job, apartment hunting etc..

So, what is the conclusion for the problem? Do let me know what you guys like to be done, as clear as possible :) I will code them whenever I can find time which I believe I should have some at the end of this month.

Thanks,
Ah Fu

--
http://compbio.ddns.comp.nus.edu.sg/~ChuanHockKoh

From asidhu at biomap.org Wed Oct 17 01:28:06 2012
From: asidhu at biomap.org (Amandeep Sidhu)
Date: Wed, 17 Oct 2012 13:28:06 +0800
Subject: [Biojava-l] International Symposium on Biomedical Data Infrastructure (BDI 2013)
Message-ID:

International Symposium on Biomedical Data Infrastructure (BDI 2013)
30 - 31 January 2013
Kuala Lumpur, Malaysia
http://umconference.um.edu.my/BDI2013

Proceedings to be published by Springer

Due to the emerging demands of huge amounts of biomedical data, new and improved data management capabilities are required for supporting a wide range of applications. Current Biomedical Databases are independently administered in geographically distinct locations, lending them almost ideally to adoption of intelligent data management approaches. As a result next generation of information infrastructure and data integration capabilities are needed to ensure increasing infrastructure agility required for high-throughput biomedical research. The workshop will focus on research issues, problems and opportunities in Biomedical Data Infrastructure.

Topics of Interest are:
* Big Biomedical Data and its Management
* Biomedical Data integration and Interoperability
* Next Generation Sequencing Data
* Biomedical Image Analysis
* Medical Informatics and Translational Bioinformatics
* Biomedical Ontologies
* Semantic Web Tools and Techniques for Biomedicine
* Web 2.0 and Web 3.0 Applications in Biomedicine
* Novel architectural models for HPC and cloud computing in Biomedicine
* New parallel / concurrent programming models for High Performance Biomedical Applications in Cloud
* Biomedical Data Cloud
* Interoperability between different Utility Computing Platforms used for Biomedicine
* Performance monitoring for biomedical applications in HPC and Cloud
* Biomedical Infrastructure as a Service
* Biomedical Platforms as a Service
* Biomedical Software as a Service
* Scientific workflows in bioinformatics and biomedicine
* Data Mining in Biomedicine
* Computational Systems Biology

Submission Guidelines:
We welcome original submissions that have not been published and that are not under review by another conference or journal. Papers should not exceed 15 pages excluding references in Springer format. Paper should be submitted through Easy Chair Online Submission System following instructions on the website (http://umconference.um.edu.my/BDI2013). All submissions will be evaluated on their originality, technical soundness, significance, presentation, and interest to the symposium attendees. Submission implies the willingness of at least one of the authors to register and present the work associated with the paper submitted. All submitted papers will be reviewed by symposium's technical program committee. All accepted papers of registered authors will be included in the proceedings published by Springer. All accepted papers will be required to submit a Springer Copyright Form.
Important Dates: Paper submission: 20 November 2012 Notifications sent to authors: 10 December 2012 Camera-ready papers due: 24 December 2012 Registration due: 10 January 2013 Conference: 30 - 31 January 2013 Organizing Chairs: Dr. Amandeep S. Sidhu (Curtin Sarawak Malaysia, Malaysia) Dr. Sarinder Kaur (University of Malaya, Malaysia) Steering Committee: Dr. Dickson Lukose (MIMOS, Malaysia) Dr. Kanagasabai Rajaraman (Institute for Infocomm Research, Singapore) Prof. Dr. Meena Kishore Sakharkar (University of Tsukuba, Japan) Prof. Dr. Jake Chen (Indiana University-Purdue University Indianapolis, USA) Prof. Dr. Xiaohua Tony Hu (Drexel University, USA) Prof. Dr. Jason Tsong-Li Wang (New Jersey Institute of Technology, USA) Prof. Dr. Carolyn McGregor (Health Informatics Research, Canada) Please contact BDI 2013 Secretariat through email bdi at biomap.org for any queries. From chapman at cs.wisc.edu Wed Oct 17 21:08:58 2012 From: chapman at cs.wisc.edu (Mark Chapman) Date: Wed, 17 Oct 2012 20:08:58 -0500 Subject: [Biojava-l] Biojava Dependency on Forester In-Reply-To: <47E61FED-17CF-40E9-AE76-E44C11BDBB27@scripps.edu> References: <47E61FED-17CF-40E9-AE76-E44C11BDBB27@scripps.edu> Message-ID: <507F56AA.3010001@cs.wisc.edu> biojava3-alignment and biojava3-phylo have both been updated to use the latest forester release: 1.005. The jar file is in our maven repository, and changes are committed to the SVN and git repositories. Enjoy! Mark On 10/11/2012 02:50 PM, Scooter Willis wrote: > Andreas > > Looks like the alignment code is using the distance matrix from forester and that has changed. Any chance the developer who did the MSA code could get this working with the latest forester code. It is probably a refactoring problem. 
> > Scooter > > ----- Reply message ----- > From: "Terry Casstevens" > To: "Scooter Willis" > Cc: "Andreas Prlic" , "Peter Bradbury" , "Jeff Glaubitz" , "Ed Buckler" , "biojava-l at biojava.org" > Subject: [Biojava-l] Biojava Dependency on Forester > Date: Thu, Oct 11, 2012 2:56 pm > > > > Hi Scooter, Andreas, > > Thank you again for the responses. > > This is one problem we are seeing. org/forester/phylogenyinference > does not exist in Forester version 1.005. > > Exception in thread "main" java.lang.NoClassDefFoundError: > org/forester/phylogenyinference/DistanceMatrix > at org.biojava3.alignment.Alignments.getMultipleSequenceAlignment(Alignments.java:176) > at net.maizegenetics.gbs.maps.TagsAtLocus.getVariableSites(TagsAtLocus.java:306) > at net.maizegenetics.gbs.maps.TagsAtLocus.getSNPCallsQuant(TagsAtLocus.java:183) > at net.maizegenetics.gbs.pipeline.TagsToSNPByAlignmentPlugin.addSitesToMutableAlignment(TagsToSNPByAlignmentPlugin.java:417) > at net.maizegenetics.gbs.pipeline.TagsToSNPByAlignmentPlugin.runTagsToSNPByAlignment(TagsToSNPByAlignmentPlugin.java:347) > at net.maizegenetics.gbs.pipeline.TagsToSNPByAlignmentPlugin.performFunction(TagsToSNPByAlignmentPlugin.java:107) > at net.maizegenetics.gbs.pipeline.TerryPipelines.runTagsToSNPByAlignmentPlugin(TerryPipelines.java:36) > at net.maizegenetics.gbs.pipeline.TerryPipelines.main(TerryPipelines.java:41) > Caused by: java.lang.ClassNotFoundException: > org.forester.phylogenyinference.DistanceMatrix > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:423) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:356) > > > Thank you, > > Terry > > > On Thu, Oct 11, 2012 at 2:46 PM, Scooter Willis 
wrote: >> If forester has maven repository we can unhook the local depedency. We use >> forester for NJ and should only be a needed in one model. Let me know the >> issue/conflict and I can see what I can do to clean up. >> >> Thanks >> >> Scooter >> >> >> ----- Reply message ----- >> From: "Andreas Prlic" >> To: "Terry Casstevens" >> Cc: "Peter Bradbury" , "Jeff Glaubitz" >> , "Ed Buckler" , >> "biojava-l at biojava.org" >> Subject: [Biojava-l] Biojava Dependency on Forester >> Date: Thu, Oct 11, 2012 2:33 pm >> >> >> >> Hi Terry, >> >> Biojava depends on forester version 0.955. There are no plans to get >> rid of this dependency, as far as I know. However we can try to >> upgrade to a newer version if that helps. >> >> If you are working in a Maven environment and you pull in BioJava that >> way, you can add an exclusion to your config. Something like the XML >> below. This forces your project to ignore the older forester library >> configured in biojava. Is this a suitable workaround for your problem? >> >> Andreas >> >> >> >> org.biojava >> biojava3-phylo >> 3.0.4 >> >> >> org >> forester >> >> >> >> >> >> >> >> On Thu, Oct 11, 2012 at 11:17 AM, Terry Casstevens >> wrote: >>> Dear Andreas, >>> >>> I am the lead developer of the software Tassel. >>> http://www.maizegenetics.net/tassel >>> >>> We currently use Biojava 3.0. And we are >>> wanting to use the latest release of Forester. >>> Since Biojava has a dependency on an older >>> release of Forester, we are running into conflicts. >>> Can you help explain Biojava's dependency on >>> Forester? >>> >>> What version of Forester does Biojava 3.0 require? >>> It looks like version 0.955 >>> >>> What version of Forester does Biojava 3.0.4 require? >>> >>> Does any Biojava jar files include Forester classes? >>> Or just references? >>> >>> Are there plans to remove Biojava's dependency >>> on Forester? 
>>>
>>> Thank you,
>>>
>>> Terry Casstevens
>> _______________________________________________
>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
From andreas at sdsc.edu Wed Oct 17 21:18:31 2012
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 17 Oct 2012 18:18:31 -0700
Subject: [Biojava-l] Biojava Dependency on Forester
In-Reply-To: <507F56AA.3010001@cs.wisc.edu>
References: <47E61FED-17CF-40E9-AE76-E44C11BDBB27@scripps.edu> <507F56AA.3010001@cs.wisc.edu>
Message-ID:

Awesome, Mark, Thanks for fixing this!

Andreas

On Wed, Oct 17, 2012 at 6:08 PM, Mark Chapman wrote:
> biojava3-alignment and biojava3-phylo have both been updated to use the
> latest forester release: 1.005. The jar file is in our maven repository,
> and changes are committed to the SVN and git repositories.
>
> Enjoy!
> Mark
>
> On 10/11/2012 02:50 PM, Scooter Willis wrote:
>>
>> Andreas
>>
>> Looks like the alignment code is using the distance matrix from forester
>> and that has changed. Any chance the developer who did the MSA code could
>> get this working with the latest forester code. It is probably a refactoring
>> problem.
>>
>> Scooter

From HWillis at scripps.edu Wed Oct 17 21:49:24 2012
From: HWillis at scripps.edu (Scooter Willis)
Date: Wed, 17 Oct 2012 21:49:24 -0400
Subject: [Biojava-l] Biojava Dependency on Forester
In-Reply-To:
Message-ID:

Thanks Mark! Is Forester on maven to avoid external jar file dependency?

On 10/17/12 9:18 PM, "Andreas Prlic" wrote:
>Awesome, Mark, Thanks for fixing this!
>
>Andreas

From pwrose at ucsd.edu Thu Oct 18 19:24:43 2012
From: pwrose at ucsd.edu (Peter Rose)
Date: Thu, 18 Oct 2012 23:24:43 +0000
Subject: [Biojava-l] [Job] Lead Web Architect - RCSB PDB
Message-ID:

Become part of the RCSB Protein Data Bank team. We have an opening for an experienced Lead Web Architect. A detailed job description and online application form can be found at:

http://jobs.ucsd.edu/bulletin/job.aspx?cat=information&sortby=post&jobnum_in=64091

Qualifications:

* MS Degree in Computer Science or comparable combination of education and experience with considerable focus in Java EE software development.
* Established demonstrated work experience in the role of an architect and developer on medium to large size database-driven web applications using Java EE technology and standards.
* Advanced experience developing the presentation layer of a dynamic, database-driven web application using HTML, CSS, JavaScript, JavaScript Toolkits, Ajax, JSP, XML, Java. Experience resolving browser and cross-platform compatibility issues. Advanced experience with Struts2, Tiles, jQuery.
* Advanced experience with database design, Structured Query Language and RDBMS's such as MySQL.
Expertise in web application server administration and configuration such as Tomcat.
* Established expertise in software life cycle methodologies. Experience with build tools such as Maven and Ant, and continuous integration systems such as Cruise Control. Experience with project tracking tools such as Jira.

________________________________________________
Peter Rose, Ph.D.
Scientific Lead
RCSB Protein Data Bank (http://www.rcsb.org)
San Diego Supercomputer Center (SDSC) and
Skaggs School of Pharmacy and Pharmaceutical Sciences
University of California San Diego

From hlapp at drycafe.net Mon Oct 22 10:52:24 2012
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Mon, 22 Oct 2012 10:52:24 -0400
Subject: [Biojava-l] regex performance in Java
Message-ID: <1B62BC3E-B005-4484-AE66-0B8F407E4756@drycafe.net>

I know that this is really a Java language topic, but since parsing biological data formats is so rife with regular expression applications, I'm curious what experience the Biojava people have with the use of regular expressions in Java.

They (at least as in java.util.regex) have been reported to me as performing much slower (by several orders of magnitude) than the regex implementation in Perl, and some simple benchmarking tests seem to bear that out. Even after scrutinizing the benchmark and finding nothing obvious, I'm still skeptical as to why this would be the case - naively I would have assumed that the underlying runtime library is implemented in C in both cases. But perhaps this is not true?

Any experience people have had here speed-wise (or tricks or things not to do for Java regexes) would be appreciated.
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From andreas at sdsc.edu Mon Oct 22 16:42:00 2012 From: andreas at sdsc.edu (Andreas Prlic) Date: Mon, 22 Oct 2012 13:42:00 -0700 Subject: [Biojava-l] regex performance in Java In-Reply-To: <1B62BC3E-B005-4484-AE66-0B8F407E4756@drycafe.net> References: <1B62BC3E-B005-4484-AE66-0B8F407E4756@drycafe.net> Message-ID: Hi Hilmar, I can't say much about performance of java regular expressions, but isn't it hard to write efficient regular expressions in any language? There are great tools in the Java world for parsing of XML/JSON and other standard file types that help avoiding them. I am not sure if this is a general rule for the wider Java community, but from my perspective, the use of regular expressions in Java is only limited and used if nothing else works... Not sure if anybody else has a different experience? Andreas On Mon, Oct 22, 2012 at 7:52 AM, Hilmar Lapp wrote: > I know that this is really Java language topic, but since parsing biological data formats is to rife with regular expression applications, I'm curious what the experience is among the Biojava people with the use of regular expressions in Java. > > They (at least as in java.util.regex) have been reported to me as performing much slower (by several orders of magnitude) than the regex implementation in Perl, and some simple benchmarking tests seem to bear that out. Even after scrutinizing the benchmark and finding nothing obvious, I'm still skeptical as to why this would be the case - naively I would have assumed that the underlying runtime library is implemented in C in both cases. But perhaps this is not true? > > Any experience people have made here speed-wise (or tricks or things not to do for Java regex's) would be appreciated. 
> > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From to.petr at gmail.com Mon Oct 22 16:48:02 2012 From: to.petr at gmail.com (P. Troshin) Date: Mon, 22 Oct 2012 21:48:02 +0100 Subject: [Biojava-l] Fwd: regex performance in Java In-Reply-To: References: <1B62BC3E-B005-4484-AE66-0B8F407E4756@drycafe.net> Message-ID: Sorry, I should have written to the list. Also just want to say that I agree with Andreas, in Java we use regexp if everything else fails (:-)) Regards, P. ---------- Forwarded message ---------- From: P. Troshin Date: 22 October 2012 21:44 Subject: Re: [Biojava-l] regex performance in Java To: Hilmar Lapp Hi Hilmar, I think this is one of the myths, I do not think there is a difference. It might have been true long ago, but I do not think this is still the case. Last time we compared Perl, Python and Java performance the former was the last with a large margin :-). However, I never had to make a direct comparison of regexp. Google for "perl vs java regexp speed comparison" brings a few links. I had a quick look at one result only (http://onlyjob.blogspot.co.uk/2011/03/perl5-python-ruby-php-c-c-lua-tcl.html), it claimed that Perl regexp is faster than Java. Unfortunately the author of the test clearly lacked understanding of Java and as a result the test compared the performance of String concatenation (which is notoriously bad in Java, as Strings are immutable) rather than the regexp performance itself. I guess this is an easy mistake to make though. Hence the advice - if you are doing a lot of String permutations use the StringBuilder class, not the String itself. 
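To make the StringBuilder advice concrete, here is a minimal sketch. The class and method names (ConcatDemo, joinWithConcat, joinWithBuilder) and the input data are made up for illustration and are not taken from the benchmark being discussed. It only contrasts repeated String concatenation with StringBuilder appends; it measures nothing about regular expressions.

```java
import java.util.Arrays;

// Hypothetical demo class; names are illustrative, not from the thread.
class ConcatDemo {

    // Naive version: every += creates a new String and copies the old contents,
    // so the total work grows quadratically with the number of parts.
    static String joinWithConcat(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // StringBuilder version: appends go into one growable buffer,
    // and a single String is created at the end.
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = new String[20000];
        Arrays.fill(parts, "ACGT");

        long t0 = System.nanoTime();
        String a = joinWithConcat(parts);
        long t1 = System.nanoTime();
        String b = joinWithBuilder(parts);
        long t2 = System.nanoTime();

        // Both approaches must produce the same text; only the cost differs.
        System.out.println("same result: " + a.equals(b));
        System.out.println("concat  (ms): " + (t1 - t0) / 1000000);
        System.out.println("builder (ms): " + (t2 - t1) / 1000000);
    }
}
```

The timings printed by main depend on the JVM and the machine and are only a rough indication; a single cold run like this is not a careful benchmark.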
If you have a Java implementation which is lacking I am sure people on this list will have no problem optimizing it!

Regards,
Peter

On 22 October 2012 15:52, Hilmar Lapp wrote:
> I know that this is really Java language topic, but since parsing biological data formats is to rife with regular expression applications, I'm curious what the experience is among the Biojava people with the use of regular expressions in Java.
>
> They (at least as in java.util.regex) have been reported to me as performing much slower (by several orders of magnitude) than the regex implementation in Perl, and some simple benchmarking tests seem to bear that out. Even after scrutinizing the benchmark and finding nothing obvious, I'm still skeptical as to why this would be the case - naively I would have assumed that the underlying runtime library is implemented in C in both cases. But perhaps this is not true?
>
> Any experience people have made here speed-wise (or tricks or things not to do for Java regex's) would be appreciated.
>
> -hilmar
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

From tiagoantao at gmail.com Mon Oct 22 16:53:56 2012
From: tiagoantao at gmail.com (Tiago Antão)
Date: Mon, 22 Oct 2012 21:53:56 +0100
Subject: [Biojava-l] regex performance in Java
In-Reply-To:
References: <1B62BC3E-B005-4484-AE66-0B8F407E4756@drycafe.net>
Message-ID:

On Mon, Oct 22, 2012 at 9:42 PM, Andreas Prlic wrote:
> and used if nothing else works... Not sure if anybody else has a
> different experience?

I might be beating a dead horse here, but I agree.
I would say that from an idiomatic perspective Perl uses a lot of regex programming (Ruby also?), which is less common in most other languages (Java and Python are my work case). Regexes exist but are not the first option. That being said, there is a very cool JVM language which has regexes as first class objects: Clojure. But even in that case, I do not see lots of idiomatic use of regexes.

Tiago

From daniel.quest at gmail.com Tue Oct 23 00:51:37 2012
From: daniel.quest at gmail.com (daniel.quest at gmail.com)
Date: Mon, 22 Oct 2012 23:51:37 -0500
Subject: [Biojava-l] regex performance in Java
In-Reply-To:
References: <1B62BC3E-B005-4484-AE66-0B8F407E4756@drycafe.net>
Message-ID:

Wow! This could open a huge flame war. Let me just make a couple of quick points about performance.

Perl is implemented in C/C++ and is interpreted, while Java runs bytecode on top of the JVM. The vendors of JVMs probably write the bytecode instruction set in C/assembly. Java itself, at least at this point, is most likely written in Java. The speed of Java is greatly influenced by the underlying JVM and how well the JVM instruction set maps to the hardware. The algorithm being implemented and the version of Java also have a great impact on performance.

Conventional wisdom is that Fortran is the best performing language in widespread use, with interpreted languages such as Python, Ruby, and Perl being 3-8 times slower. This website shows Java having about a ten percent overhead relative to C: http://shootout.alioth.debian.org/

I have personally noticed superior performance of Perl's regex parsing capabilities over Python. I have never noticed a difference between Perl and Java that was so extreme that I would choose to implement something in Perl over Java in a production setting.
Java is a language with such deep library support that it makes almost every language look like a second-class citizen in comparison (notable exceptions: C, C++, and JavaScript).

Something else interesting: http://swtch.com/~rsc/regexp/regexp1.html

Finally, be very cautious of benchmarks. It is very very hard to do benchmarking well.

Dan

Sent from my iPhone

On Oct 22, 2012, at 3:53 PM, Tiago Antão wrote:
> On Mon, Oct 22, 2012 at 9:42 PM, Andreas Prlic wrote:
>> and used if nothing else works... Not sure if anybody else has a
>> different experience?
>
> I might be beating a dead horse here, but I agree. I would say that
> from an idiomatic perspective Perl uses a lot of regex programming
> (Ruby also?), which is less common in most other languages (Java and
> Python are my work case). Regexes exist but are not the first option.
> That being said, there is a very cool JVM language which has regexes
> as first class objects: Clojure. But even in that case, I do not see
> lots of idiomatic use of regexes.
>
> Tiago

From khalil.elmazouari at gmail.com Tue Oct 23 12:48:32 2012
From: khalil.elmazouari at gmail.com (Khalil El Mazouari)
Date: Tue, 23 Oct 2012 18:48:32 +0200
Subject: [Biojava-l] Biojava-l Digest, Vol 117, Issue 10
In-Reply-To:
References:
Message-ID: <99D2DB4A-DD8B-4473-BE04-2834ECF5457E@gmail.com>

Hi Hilmar,

I have used regexes a lot in Perl and Java. I was also skeptical about regexes in Java when I started using them. From my own experience, I can tell you the following:

* It's MUCH easier to write regexes in Perl than in Java.
* Java regexes require more optimisation: a working regex and an optimal regex are two different things in Java.
* Patterns must be compiled first. So, if you iterate through a large number of strings you want to match, compile your pattern outside the loop.
* If you use regexes in a large iteration, avoid the methods of java.lang.String that take a regex (String.replaceFirst, String.replaceAll, String.matches, ...): your pattern will be compiled each time.
* Avoid applying a regex to a large string. If possible, try to limit the matching to the places where the pattern is; methods like indexOf, lastIndexOf, split ... from java.lang.String are very useful in this regard.
* It's easier to get the matching group in Java than in Perl.
* Test first with an editor like RegExhibit or your IDE's regex plugin.
* Finally, I recommend the book Java Regular Expressions by Mehran Habibi (http://www.amazon.com/Java-Regular-Expressions-Taming-java-util-regex/dp/1590591070).

If your regexes are well optimised, you will not notice any difference between Perl and Java.

If you need to use regexes in a complex algorithm or software in combination with Java/BioJava, don't hesitate: Java regexes are excellent. If you just need a regex in a small script, go for Perl.

Best

khalil

-----
Confidentiality Notice: This e-mail and any files transmitted with it are private and confidential and are solely for the use of the addressee. It may contain material which is legally privileged.
If you are not the addressee or the person responsible for delivering to the addressee, please notify that you have received this e-mail in error and that any use of it is strictly prohibited. It would be helpful if you could notify the author by replying to it.

From hlapp at drycafe.net Wed Oct 24 12:47:27 2012
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Wed, 24 Oct 2012 12:47:27 -0400
Subject: [Biojava-l] regex performance in Java
In-Reply-To: <874nllo4fx.fsf@newcastle.ac.uk>
References: <1B62BC3E-B005-4484-AE66-0B8F407E4756@drycafe.net> <874nllo4fx.fsf@newcastle.ac.uk>
Message-ID: <4F9982AE-FA76-41BF-BF1B-58B7889F03AB@drycafe.net>

Hi everyone,

Thanks for all your responses. Indeed I know that the Java regex API isn't an enjoyable one to program with, and if the underlying task were about writing something from scratch, I'd be all for avoiding regexes too if the same thing could be achieved by string comparison.

However, and of course I failed to say that initially, the task from which this query originates is about converting a Perl script to Java (not because Perl is somehow bad, but because those Perl scripts have proven to be an obstacle to easy cross-platform installation of the - mostly Java - software they are a part of). That doesn't mean one couldn't in the process also rewrite the code that uses regular expressions into code that doesn't, but I also think it wise not to introduce multiple variables as a source of error at once.

Some of the responses would be best answered by looking at the expressions and the code that uses them, so here are the two "benchmark" scripts.

Java: https://gist.github.com/3940931
Perl: https://gist.github.com/3940780

I'm also copying Dongye Meng here, who is a CS student at UNC working with us on the project - if anyone has further wisdom to share about how to reduce the performance gap between the two versions, he'd surely appreciate it.
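One tip raised earlier in this thread, compiling a pattern once instead of calling String.matches inside a loop, can be sketched roughly as below. This is not the code from the gists; the class name, the pattern and the sample identifiers are invented for illustration. Only java.util.regex.Pattern, java.util.regex.Matcher and String.matches are real JDK APIs here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the pattern and identifiers are illustrative only.
class PrecompiledPatternDemo {

    private static final String REGEX = "^([A-Z]{2})_(\\d+)$";

    // Compiled once and reused. Pattern objects are immutable and may be shared.
    private static final Pattern ACCESSION = Pattern.compile(REGEX);

    // Style to avoid in hot loops: String.matches compiles the regex on every call.
    static int countWithStringMatches(String[] ids) {
        int hits = 0;
        for (String id : ids) {
            if (id.matches(REGEX)) {
                hits++;
            }
        }
        return hits;
    }

    // Suggested style: reuse the compiled Pattern, and reset one Matcher per input.
    static int countWithPrecompiled(String[] ids) {
        int hits = 0;
        Matcher m = ACCESSION.matcher("");
        for (String id : ids) {
            if (m.reset(id).matches()) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] ids = {"NM_000546", "XP_12", "bad id", "nm_1"};
        System.out.println("String.matches hits: " + countWithStringMatches(ids));
        System.out.println("precompiled hits:    " + countWithPrecompiled(ids));
    }
}
```

Both methods return the same count; the difference is only in how often the regex is compiled. The Matcher is kept local because, unlike Pattern, a Matcher is not safe to share between threads.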
-hilmar

On Oct 23, 2012, at 6:42 AM, Phillip Lord wrote:

> Hilmar Lapp writes:
>> They (at least as in java.util.regex) have been reported to me as
>> performing much slower (by several orders of magnitude) than the regex
>> implementation in Perl, and some simple benchmarking tests seem to
>> bear that out. Even after scrutinizing the benchmark and finding
>> nothing obvious, I'm still skeptical as to why this would be the case
>> - naively I would have assumed that the underlying runtime library is
>> implemented in C in both cases. But perhaps this is not true?
>
> Well, the difference is that Perl is perl, while Java is not; it all
> depends on the JVM, and libraries also. A quick shuftie at
> the source for the open-jdk libraries suggests that the regexp searching
> is done in Java -- it's not just a drop through to C. Always the problem
> with performance optimisation on Java -- you are only optimising for one
> situation. It might be interesting to see how much variation there is
> between JVMs.
>
> Like others, I would only use regexp as a last resort in Java anyway;
> compared to Perl, writing the code is painful. Still, I guess that you
> know this!
>
> Phil

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From to.petr at gmail.com Wed Oct 24 13:59:19 2012
From: to.petr at gmail.com (P. Troshin)
Date: Wed, 24 Oct 2012 18:59:19 +0100
Subject: [Biojava-l] regex performance in Java
In-Reply-To: <4F9982AE-FA76-41BF-BF1B-58B7889F03AB@drycafe.net>
References: <1B62BC3E-B005-4484-AE66-0B8F407E4756@drycafe.net> <874nllo4fx.fsf@newcastle.ac.uk> <4F9982AE-FA76-41BF-BF1B-58B7889F03AB@drycafe.net>
Message-ID:

Hi Hilmar,

You have not mentioned the version of the JVM you are using, but it appears that there is a massive difference in timing on my machine. Here is the timing of the run on Windows 7 (pro 64 bit) with Oracle JVM (64 bit) v.
1.7.0_02.

# of Iteration: 1 Time: 1.711E-6 seconds
# of Iteration: 10 Time: 1.711E-6 seconds
# of Iteration: 100 Time: 2.567E-6 seconds
# of Iteration: 1000 Time: 1.2403E-5 seconds
# of Iteration: 10000 Time: 1.44143E-4 seconds
# of Iteration: 100000 Time: 0.001369138 seconds

I have not changed the code at all. I have a 3-year-old laptop with an Intel Core Duo P8600, 2.4 GHz CPU. So nothing special.

I cannot tell whether this is slow or not as you did not publish the timings for Perl. Could you please do so? It looks to me that you might just need to update/replace your JVM. I will be happy to look at the code in a bit more detail if this result is still slower than Perl.

Thanks,
Peter

On 24 October 2012 17:47, Hilmar Lapp wrote:
> Hi everyone,
>
> Thanks for all your responses. Indeed I know that the Java regex API isn't an enjoyable one to program with, and if the underlying task were about writing something from scratch, I'd be all for avoiding regex's too if the same thing could be achieved by string comparison.
>
> However, and of course I failed to say that initially, the task from which this query is originating is about converting a Perl script to Java (not because Perl is somehow bad, but because those Perl scripts have shown to be an obstacle to easy cross-platform installation of the - mostly Java - software they are a part of). That doesn't mean one couldn't in the course also rewrite the code that uses regular expressions to one that doesn't, but I also think it wise not to introduce multiple variables as a source of error at once.
>
> Some of the responses would be best answered by looking at the expressions and the code that uses them, so here are the two "benchmark" scripts.
> > Java: https://gist.github.com/3940931 > Perl: https://gist.github.com/3940780 > > I'm also copying Dongye Meng here, who is a CS student at UNC working with us on the project - if anyone has further wisdom to share about how to reduce the performance gap between the two versions, he'd surely appreciate. > > -hilmar > > On Oct 23, 2012, at 6:42 AM, Phillip Lord wrote: > >> Hilmar Lapp writes: >>> They (at least as in java.util.regex) have been reported to me as >>> performing much slower (by several orders of magnitude) than the regex >>> implementation in Perl, and some simple benchmarking tests seem to >>> bear that out. Even after scrutinizing the benchmark and finding >>> nothing obvious, I'm still skeptical as to why this would be the case >>> - naively I would have assumed that the underlying runtime library is >>> implemented in C in both cases. But perhaps this is not true? >> >> >> Well, the difference is that Perl is perl, while Java is not; it all >> depends on the JVM, and libraries also. A quick shuftie at >> the source for the open-jdk libraries suggests that the regexp searching >> is done in Java -- it's not just a drop through to C. Always the problem >> with performance optimisation on Java -- you are only optimising for one >> situation. It might be interesting to see how much variation there is >> between JVMs. >> >> Like others, I would only use regexp as a last resort in Java anyway; >> compared to Perl, writing the code is painful. Still, I guess that you >> know this! >> >> Phil > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From to.petr at gmail.com Wed Oct 24 14:10:38 2012 From: to.petr at gmail.com (P. 
Troshin) Date: Wed, 24 Oct 2012 19:10:38 +0100 Subject: [Biojava-l] regex performance in Java In-Reply-To: <4F9982AE-FA76-41BF-BF1B-58B7889F03AB@drycafe.net> References: <1B62BC3E-B005-4484-AE66-0B8F407E4756@drycafe.net> <874nllo4fx.fsf@newcastle.ac.uk> <4F9982AE-FA76-41BF-BF1B-58B7889F03AB@drycafe.net> Message-ID: Hi Hilmar, Hmm, it looks like I spoke too soon; the previous run was doing nothing as all of the cases were commented out. I can now see that the results of my runs are not massively different from that of yours. It would help if you could encourage your student to write a few unit tests so that we know what you are trying to achieve and to simplify the testing. Just a thought Thanks, Peter On 24 October 2012 17:47, Hilmar Lapp wrote: > Hi everyone, > > Thanks for all your responses. Indeed I know that the Java regex API isn't an enjoyable one to program with, and if the underlying task were about writing something from scratch, I'd be all for avoiding regex's too if the same thing could be achieved by string comparison. > > However, and of course I failed to say that initially, the task from which this query is originating is about converting a Perl script to Java (not because Perl is somehow bad, but because those Perl scripts have shown to be an obstacle to easy cross-platform installation of the - mostly Java - software they are a part of). That doesn't mean one couldn't in the course also rewrite the code that uses regular expressions to one that doesn't, but I also think it wise not to introduce multiple variables as a source of error at once. > > Some of the responses would be best answered by looking at the expressions and the code that uses them, so here are the two "benchmark" scripts. 
> > Java: https://gist.github.com/3940931 > Perl: https://gist.github.com/3940780 > > I'm also copying Dongye Meng here, who is a CS student at UNC working with us on the project - if anyone has further wisdom to share about how to reduce the performance gap between the two versions, he'd surely appreciate. > > -hilmar > > On Oct 23, 2012, at 6:42 AM, Phillip Lord wrote: > >> Hilmar Lapp writes: >>> They (at least as in java.util.regex) have been reported to me as >>> performing much slower (by several orders of magnitude) than the regex >>> implementation in Perl, and some simple benchmarking tests seem to >>> bear that out. Even after scrutinizing the benchmark and finding >>> nothing obvious, I'm still skeptical as to why this would be the case >>> - naively I would have assumed that the underlying runtime library is >>> implemented in C in both cases. But perhaps this is not true? >> >> >> Well, the difference is that Perl is perl, while Java is not; it all >> depends on the JVM, and libraries also. A quick shuftie at >> the source for the open-jdk libraries suggests that the regexp searching >> is done in Java -- it's not just a drop through to C. Always the problem >> with performance optimisation on Java -- you are only optimising for one >> situation. It might be interesting to see how much variation there is >> between JVMs. >> >> Like others, I would only use regexp as a last resort in Java anyway; >> compared to Perl, writing the code is painful. Still, I guess that you >> know this! >> >> Phil > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From to.petr at gmail.com Wed Oct 24 14:30:16 2012 From: to.petr at gmail.com (P. 
Troshin)
Date: Wed, 24 Oct 2012 19:30:16 +0100
Subject: [Biojava-l] regex performance in Java
In-Reply-To:
References: <1B62BC3E-B005-4484-AE66-0B8F407E4756@drycafe.net> <874nllo4fx.fsf@newcastle.ac.uk> <4F9982AE-FA76-41BF-BF1B-58B7889F03AB@drycafe.net>
Message-ID:

Hi Hilmar,

Having looked at the test in a bit more detail, I can see what you are trying to test, but is there a real-life problem behind this?
What this test is doing is a lot of searches on very short strings. Is this what your real-life application does? I am asking because if your real-life application uses regexp to look into long strings, the performance might be totally different.
What is your aim? 3 seconds for 500K searches does not seem particularly slow to me.

Thanks
Peter

On 24 October 2012 19:10, P. Troshin wrote:
> Hi Hilmar,
>
> Hmm, it looks like I spoke too soon; the previous run was doing
> nothing as all of the cases were commented out.
> I can now see that the results of my runs are not massively different
> from that of yours.
> It would help if you could encourage your student to write a few unit
> tests so that we know what you are trying to achieve and to simplify
> the testing.
>
> Just a thought
>
> Thanks,
> Peter
>
>
>
> On 24 October 2012 17:47, Hilmar Lapp wrote:
>> Hi everyone,
>>
>> Thanks for all your responses. Indeed I know that the Java regex API isn't an enjoyable one to program with, and if the underlying task were about writing something from scratch, I'd be all for avoiding regex's too if the same thing could be achieved by string comparison.
>>
>> However, and of course I failed to say that initially, the task from which this query is originating is about converting a Perl script to Java (not because Perl is somehow bad, but because those Perl scripts have shown to be an obstacle to easy cross-platform installation of the - mostly Java - software they are a part of).
That doesn't mean one couldn't in the course also rewrite the code that uses regular expressions to one that doesn't, but I also think it wise not to introduce multiple variables as a source of error at once. >> >> Some of the responses would be best answered by looking at the expressions and the code that uses them, so here are the two "benchmark" scripts. >> >> Java: https://gist.github.com/3940931 >> Perl: https://gist.github.com/3940780 >> >> I'm also copying Dongye Meng here, who is a CS student at UNC working with us on the project - if anyone has further wisdom to share about how to reduce the performance gap between the two versions, he'd surely appreciate. >> >> -hilmar >> >> On Oct 23, 2012, at 6:42 AM, Phillip Lord wrote: >> >>> Hilmar Lapp writes: >>>> They (at least as in java.util.regex) have been reported to me as >>>> performing much slower (by several orders of magnitude) than the regex >>>> implementation in Perl, and some simple benchmarking tests seem to >>>> bear that out. Even after scrutinizing the benchmark and finding >>>> nothing obvious, I'm still skeptical as to why this would be the case >>>> - naively I would have assumed that the underlying runtime library is >>>> implemented in C in both cases. But perhaps this is not true? >>> >>> >>> Well, the difference is that Perl is perl, while Java is not; it all >>> depends on the JVM, and libraries also. A quick shuftie at >>> the source for the open-jdk libraries suggests that the regexp searching >>> is done in Java -- it's not just a drop through to C. Always the problem >>> with performance optimisation on Java -- you are only optimising for one >>> situation. It might be interesting to see how much variation there is >>> between JVMs. >>> >>> Like others, I would only use regexp as a last resort in Java anyway; >>> compared to Perl, writing the code is painful. Still, I guess that you >>> know this! 
>>> >>> Phil >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >> =========================================================== >> >> >> >> >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l From hlapp at drycafe.net Wed Oct 24 23:45:52 2012 From: hlapp at drycafe.net (Hilmar Lapp) Date: Wed, 24 Oct 2012 23:45:52 -0400 Subject: [Biojava-l] regex performance in Java In-Reply-To: References: <1B62BC3E-B005-4484-AE66-0B8F407E4756@drycafe.net> <874nllo4fx.fsf@newcastle.ac.uk> <4F9982AE-FA76-41BF-BF1B-58B7889F03AB@drycafe.net> Message-ID: <34DFDF13-5014-4383-A3D3-BAF793AC2E23@drycafe.net> The code is a very small snippet from a natural language processing software aimed at extracting structured phenotype descriptions from un- or semistructured free text. Apparently the code as is (in Perl) makes a lot of regular expression matches, and so if the speed difference for them between Perl and Java is significant, in theory this might become a problem. Though whether it will or will not amount to a bottleneck indeed remains to be seen, as the code is also doing other things that are potentially expensive, and possibly more so than the regex matching. So the exercise here is merely to see whether there is a notable performance difference in regex pattern evaluation that can't simply be attributed to programming mistakes (and apparently there is). -hilmar On Oct 24, 2012, at 2:30 PM, P. Troshin wrote: > Hi Hilmar, > > Looked at the test in a bit more details, I can see what you are > trying to test but is there a real life problem behind this? > What this test is doing is a lot of searches on very short strings. Is > this what your real life application does? 
I am asking because if your > real life application uses regexp to look into long string, the > performance might be totally different. > What is your aim - 3 seconds for 500K searches do not seem > particularly slow to me. > > Thanks > Peter > > > On 24 October 2012 19:10, P. Troshin wrote: >> Hi Hilmar, >> >> Hmm, it looks like I spoke too soon; the previous run was doing >> nothing as all of the cases were commented out. >> I can now see that the results of my runs are not massively different >> from that of yours. >> It would help if you could encourage your student to write a few unit >> tests so that we know what you are trying to achieve and to simplify >> the testing. >> >> Just a thought >> >> Thanks, >> Peter >> >> >> >> On 24 October 2012 17:47, Hilmar Lapp wrote: >>> Hi everyone, >>> >>> Thanks for all your responses. Indeed I know that the Java regex API isn't an enjoyable one to program with, and if the underlying task were about writing something from scratch, I'd be all for avoiding regex's too if the same thing could be achieved by string comparison. >>> >>> However, and of course I failed to say that initially, the task from which this query is originating is about converting a Perl script to Java (not because Perl is somehow bad, but because those Perl scripts have shown to be an obstacle to easy cross-platform installation of the - mostly Java - software they are a part of). That doesn't mean one couldn't in the course also rewrite the code that uses regular expressions to one that doesn't, but I also think it wise not to introduce multiple variables as a source of error at once. >>> >>> Some of the responses would be best answered by looking at the expressions and the code that uses them, so here are the two "benchmark" scripts. 
>>> >>> Java: https://gist.github.com/3940931 >>> Perl: https://gist.github.com/3940780 >>> >>> I'm also copying Dongye Meng here, who is a CS student at UNC working with us on the project - if anyone has further wisdom to share about how to reduce the performance gap between the two versions, he'd surely appreciate. >>> >>> -hilmar >>> >>> On Oct 23, 2012, at 6:42 AM, Phillip Lord wrote: >>> >>>> Hilmar Lapp writes: >>>>> They (at least as in java.util.regex) have been reported to me as >>>>> performing much slower (by several orders of magnitude) than the regex >>>>> implementation in Perl, and some simple benchmarking tests seem to >>>>> bear that out. Even after scrutinizing the benchmark and finding >>>>> nothing obvious, I'm still skeptical as to why this would be the case >>>>> - naively I would have assumed that the underlying runtime library is >>>>> implemented in C in both cases. But perhaps this is not true? >>>> >>>> >>>> Well, the difference is that Perl is perl, while Java is not; it all >>>> depends on the JVM, and libraries also. A quick shuftie at >>>> the source for the open-jdk libraries suggests that the regexp searching >>>> is done in Java -- it's not just a drop through to C. Always the problem >>>> with performance optimisation on Java -- you are only optimising for one >>>> situation. It might be interesting to see how much variation there is >>>> between JVMs. >>>> >>>> Like others, I would only use regexp as a last resort in Java anyway; >>>> compared to Perl, writing the code is painful. Still, I guess that you >>>> know this! 
>>>> >>>> Phil >>> >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >>> =========================================================== >>> >>> >>> >>> >>> >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From daniel.quest at gmail.com Wed Oct 24 23:51:53 2012 From: daniel.quest at gmail.com (daniel.quest at gmail.com) Date: Wed, 24 Oct 2012 22:51:53 -0500 Subject: [Biojava-l] regex performance in Java In-Reply-To: <34DFDF13-5014-4383-A3D3-BAF793AC2E23@drycafe.net> References: <1B62BC3E-B005-4484-AE66-0B8F407E4756@drycafe.net> <874nllo4fx.fsf@newcastle.ac.uk> <4F9982AE-FA76-41BF-BF1B-58B7889F03AB@drycafe.net> <34DFDF13-5014-4383-A3D3-BAF793AC2E23@drycafe.net> Message-ID: Have you ever used uima? Same software used on the IBM Watson project. Very very powerful. http://uima.apache.org/ Dan Sent from my iPhone On Oct 24, 2012, at 10:45 PM, Hilmar Lapp wrote: > The code is a very small snippet from a natural language processing software aimed at extracting structured phenotype descriptions from un- or semistructured free text. Apparently the code as is (in Perl) makes a lot of regular expression matches, and so if the speed difference for them between Perl and Java is significant, in theory this might become a problem. Though whether it will or will not amount to a bottleneck indeed remains to be seen, as the code is also doing other things that are potentially expensive, and possibly more so than the regex matching. 
> > So the exercise here is merely to see whether there is a notable performance difference in regex pattern evaluation that can't simply be attributed to programming mistakes (and apparently there is). > > -hilmar > > On Oct 24, 2012, at 2:30 PM, P. Troshin wrote: > >> Hi Hilmar, >> >> Looked at the test in a bit more details, I can see what you are >> trying to test but is there a real life problem behind this? >> What this test is doing is a lot of searches on very short strings. Is >> this what your real life application does? I am asking because if your >> real life application uses regexp to look into long string, the >> performance might be totally different. >> What is your aim - 3 seconds for 500K searches do not seem >> particularly slow to me. >> >> Thanks >> Peter >> >> >> On 24 October 2012 19:10, P. Troshin wrote: >>> Hi Hilmar, >>> >>> Hmm, it looks like I spoke too soon; the previous run was doing >>> nothing as all of the cases were commented out. >>> I can now see that the results of my runs are not massively different >>> from that of yours. >>> It would help if you could encourage your student to write a few unit >>> tests so that we know what you are trying to achieve and to simplify >>> the testing. >>> >>> Just a thought >>> >>> Thanks, >>> Peter >>> >>> >>> >>> On 24 October 2012 17:47, Hilmar Lapp wrote: >>>> Hi everyone, >>>> >>>> Thanks for all your responses. Indeed I know that the Java regex API isn't an enjoyable one to program with, and if the underlying task were about writing something from scratch, I'd be all for avoiding regex's too if the same thing could be achieved by string comparison. >>>> >>>> However, and of course I failed to say that initially, the task from which this query is originating is about converting a Perl script to Java (not because Perl is somehow bad, but because those Perl scripts have shown to be an obstacle to easy cross-platform installation of the - mostly Java - software they are a part of). 
That doesn't mean one couldn't in the course also rewrite the code that uses regular expressions to one that doesn't, but I also think it wise not to introduce multiple variables as a source of error at once. >>>> >>>> Some of the responses would be best answered by looking at the expressions and the code that uses them, so here are the two "benchmark" scripts. >>>> >>>> Java: https://gist.github.com/3940931 >>>> Perl: https://gist.github.com/3940780 >>>> >>>> I'm also copying Dongye Meng here, who is a CS student at UNC working with us on the project - if anyone has further wisdom to share about how to reduce the performance gap between the two versions, he'd surely appreciate. >>>> >>>> -hilmar >>>> >>>> On Oct 23, 2012, at 6:42 AM, Phillip Lord wrote: >>>> >>>>> Hilmar Lapp writes: >>>>>> They (at least as in java.util.regex) have been reported to me as >>>>>> performing much slower (by several orders of magnitude) than the regex >>>>>> implementation in Perl, and some simple benchmarking tests seem to >>>>>> bear that out. Even after scrutinizing the benchmark and finding >>>>>> nothing obvious, I'm still skeptical as to why this would be the case >>>>>> - naively I would have assumed that the underlying runtime library is >>>>>> implemented in C in both cases. But perhaps this is not true? >>>>> >>>>> >>>>> Well, the difference is that Perl is perl, while Java is not; it all >>>>> depends on the JVM, and libraries also. A quick shuftie at >>>>> the source for the open-jdk libraries suggests that the regexp searching >>>>> is done in Java -- it's not just a drop through to C. Always the problem >>>>> with performance optimisation on Java -- you are only optimising for one >>>>> situation. It might be interesting to see how much variation there is >>>>> between JVMs. >>>>> >>>>> Like others, I would only use regexp as a last resort in Java anyway; >>>>> compared to Perl, writing the code is painful. Still, I guess that you >>>>> know this! 
>>>>> >>>>> Phil >>>> >>>> -- >>>> =========================================================== >>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >>>> =========================================================== >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From andreas at sdsc.edu Wed Oct 31 16:48:07 2012 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 31 Oct 2012 13:48:07 -0700 Subject: [Biojava-l] open-bio servers moving Message-ID: Hi, The portal.open-bio.org server that is hosting the biojava wiki site is currently down, since it is being moved to a new location on the Amazon Cloud. It may take a few days until everything has been set up properly and the wiki will be back. If there is anybody on the biojava side who wants to join the open bioinformatics foundation's sysadmin team and help out with projects like this one, this would be a good moment to volunteer... Andreas From daniel.quest at gmail.com Wed Oct 3 03:31:46 2012 From: daniel.quest at gmail.com (Daniel Quest) Date: Tue, 2 Oct 2012 22:31:46 -0500 Subject: [Biojava-l] maven + genbank parser Message-ID: Is there an example project that uses maven to access the genbank parser? I am confused if biojava3 has genbank support. If I want to use an earlier version along the lines of the cookbook does it exist in a maven repo? 
Thanks
Dan

From andreas at sdsc.edu Wed Oct 3 05:09:28 2012
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 2 Oct 2012 22:09:28 -0700
Subject: [Biojava-l] maven + genbank parser
In-Reply-To:
References:
Message-ID:

Hi Daniel,

both the 1.8 as well as the 3.0 series are available via maven builds. You have to configure the biojava specific repo ...

<repository>
    <id>biojava-maven-repo</id>
    <name>BioJava repository</name>
    <url>http://www.biojava.org/download/maven/</url>
</repository>

Andreas

On Tue, Oct 2, 2012 at 8:31 PM, Daniel Quest wrote:
> Is there an example project that uses maven to access the genbank parser?
> I am confused if biojava3 has genbank support. If I want to use an
> earlier version along the lines of the cookbook does it exist in a maven
> repo?
>
> Thanks
> Dan
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

From andreas at sdsc.edu Mon Oct 8 22:43:43 2012
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 8 Oct 2012 15:43:43 -0700
Subject: [Biojava-l] InstabilityIndex
In-Reply-To: <1349723178.65392.YahooMailNeo@web190604.mail.sg3.yahoo.com>
References: <1349723178.65392.YahooMailNeo@web190604.mail.sg3.yahoo.com>
Message-ID:

Hi Subrata,

Please don't mail me directly, but send your questions to the list. Chances are that somebody there can help you best.

Ah Fu, do you have any thoughts on this?

Thanks,
Andreas

On Mon, Oct 8, 2012 at 12:06 PM, subrata sinha wrote:
> Dear Sir,
>
> I was testing the getInstabilityIndex() method of
> org.biojava3.aaproperties.PeptideProperties class.
>
> I am trying to find InstabilityIndex for very short segment of peptide, as
> short as two. But I am getting some surprising result in negative.Then i
> found that the method perhaps giving -ve result for ambiguous characters.
So > how to handle a situation if my protein sequence contains ambiguous > characters > > Input Sequence Instability Index > > GTDG -13.725 > VDVR -30.075 > > How i analyse the above situation. > > Kindly help me. > > > With Regards > > Subrata Sinha > Assistant Professor > Centre for Bioinformatics Studies > Dibrugarh University From darnells at dnastar.com Tue Oct 9 17:32:36 2012 From: darnells at dnastar.com (Steve Darnell) Date: Tue, 9 Oct 2012 17:32:36 +0000 Subject: [Biojava-l] InstabilityIndex In-Reply-To: References: <1349723178.65392.YahooMailNeo@web190604.mail.sg3.yahoo.com> Message-ID: @Subrata, The class does not handle characters other than the standard 20 amino acids. We substitute BJOUXZ with DLKCAE and live with the approximation. @Ah Fu, In the past we have used average values for B (D or N), J (L or I), and Z (E or Q), and a dummy substitution for X (G or A). Our U and O substitutions just map to the closest natural amino acid. It would be nice if real values existed for selenocysteine and pyrrolysine, but I haven't a clue if they do. @Andreas, I would suggest that a general policy that all protein sequence analyses support the 6 non-standard letter would be a good thing. A nice ideal, but who has the time? :) Regards, Steve -----Original Message----- From: biojava-l-bounces at lists.open-bio.org [mailto:biojava-l-bounces at lists.open-bio.org] On Behalf Of Andreas Prlic Sent: Monday, October 08, 2012 5:44 PM To: subrata sinha Cc: Chuan Hock Koh; biojava-l at biojava.org Subject: Re: [Biojava-l] InstabilityIndex Hi Subrata, Please don't mail me directly, but send your questions to the list. Chances are that somebody there can help you best. Ah Fu, do you have any thoughts on this? Thanks, Andreas On Mon, Oct 8, 2012 at 12:06 PM, subrata sinha wrote: > Dear Sir, > > I was testing the getInstabilityIndex() method of > org.biojava3.aaproperties.PeptideProperties class. 
> > I am trying to find InstabilityIndex for very short segment of > peptide, as short as two. But I am getting some surprising result in > negative.Then i found that the method perhaps giving -ve result for > ambiguous characters. So how to handle a situation if my protein > sequence contains ambiguous characters > > Input Sequence Instability Index > > GTDG -13.725 > VDVR -30.075 > > How i analyse the above situation. > > Kindly help me. > > > With Regards > > Subrata Sinha > Assistant Professor > Centre for Bioinformatics Studies > Dibrugarh University _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-l From hlapp at drycafe.net Wed Oct 10 20:31:04 2012 From: hlapp at drycafe.net (Hilmar Lapp) Date: Wed, 10 Oct 2012 16:31:04 -0400 Subject: [Biojava-l] Fwd: [Announce] Call for Proposals for Doc Sprint Summit v2.0 References: Message-ID: If you've been a Google Summer of Code mentor this year or last year, you will have already seen this. I wanted to make sure everybody is aware, and this may provide the opportunity for the kind of concerted effort that could finally get a BioPerl, Biopython, Bioruby, or Biojava (or a combined??) off the ground. -hilmar Begin forwarded message: From: Carol Smith Subject: [GSoC Mentors] [Announce] Call for Proposals for Doc Sprint Summit v2.0 Date: October 10, 2012 2:44:50 PM EDT To: Google Summer of Code Mentors List Cc: adam at flossmanuals.net Dear GSoC mentors and org admins, Google Summer of Code in collaboration with Aspiration and FLOSS Manuals is hosting a "Doc Sprint Camp" at Google's Mountain View headquarters (California) Dec 3 - 7, 2012. 
The 2012 Doc Camp will feature:

1) An unconference on free software documentation topics - facilitated by Aspiration
2) 2-5 Book Sprints to produce books on free softwares - facilitated by FLOSS Manuals

Building on the success of the 2011 GSoC Doc Camp we are proud to bring you the 2012 GSoC Doc Camp. Like the previous event the 2012 GSoC Doc Camp is a place for free software communities to meet, create a book for their project, attract new people to their efforts, and share their documentation experiences. The camp aims to improve free documentation materials and skills in free software projects and individuals and help form the identity of the emergent free documentation sector.

Individuals and projects can apply. Food and accommodation for all individuals will be provided and travel support (full or partial) can also be applied for.

Be a part of this exciting event - propose a Book Sprint on your favorite free software or come and help others write a book on their favorite project. Guaranteed to be a lot of fun, productive, and a fantastic place to advance your documentation efforts and experiences.

For more information or to register to take part, please see https://sites.google.com/site/docsprintsummitv2/. Please note proposals are due by October 26, so get yours in ASAP!

Cheers,
Carol Smith, Allen Gunn, Adam Hyde

--
You received this message because you are subscribed to the Google Groups "Google Summer of Code Mentors List" group.
To post to this group, send email to google-summer-of-code-mentors-list at googlegroups.com.
To unsubscribe from this group, send email to google-summer-of-code-mentors-list+unsubscribe at googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-summer-of-code-mentors-list?hl=en.
-- =========================================================== : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : =========================================================== -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From andreas at sdsc.edu Thu Oct 11 02:17:45 2012 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 10 Oct 2012 19:17:45 -0700 Subject: [Biojava-l] InstabilityIndex In-Reply-To: References: <1349723178.65392.YahooMailNeo@web190604.mail.sg3.yahoo.com> Message-ID: Thanks, good comments, Steve. I agree we should make it a policy from now on to support 6 non-standard letters for anything protein related... About your time comment - that's part of the idea of moving the development to git.. Ideally, it should be easier for everybody who is concerned to patch and share patches... Andreas On Tue, Oct 9, 2012 at 10:32 AM, Steve Darnell wrote: > @Subrata, The class does not handle characters other than the standard 20 amino acids. We substitute BJOUXZ with DLKCAE and live with the approximation. > > @Ah Fu, In the past we have used average values for B (D or N), J (L or I), and Z (E or Q), and a dummy substitution for X (G or A). Our U and O substitutions just map to the closest natural amino acid. It would be nice if real values existed for selenocysteine and pyrrolysine, but I haven't a clue if they do. > > @Andreas, I would suggest that a general policy that all protein sequence analyses support the 6 non-standard letter would be a good thing. A nice ideal, but who has the time? 
:) > > Regards, > Steve > > -----Original Message----- > From: biojava-l-bounces at lists.open-bio.org [mailto:biojava-l-bounces at lists.open-bio.org] On Behalf Of Andreas Prlic > Sent: Monday, October 08, 2012 5:44 PM > To: subrata sinha > Cc: Chuan Hock Koh; biojava-l at biojava.org > Subject: Re: [Biojava-l] InstabilityIndex > > Hi Subrata, > > Please don't mail me directly, but send your questions to the list. > Chances are that somebody there can help you best. > > Ah Fu, do you have any thoughts on this? > > Thanks, > > Andreas > > > > > On Mon, Oct 8, 2012 at 12:06 PM, subrata sinha wrote: >> Dear Sir, >> >> I was testing the getInstabilityIndex() method of >> org.biojava3.aaproperties.PeptideProperties class. >> >> I am trying to find InstabilityIndex for very short segment of >> peptide, as short as two. But I am getting some surprising result in >> negative.Then i found that the method perhaps giving -ve result for >> ambiguous characters. So how to handle a situation if my protein >> sequence contains ambiguous characters >> >> Input Sequence Instability Index >> >> GTDG -13.725 >> VDVR -30.075 >> >> How i analyse the above situation. >> >> Kindly help me. >> >> >> With Regards >> >> Subrata Sinha >> Assistant Professor >> Centre for Bioinformatics Studies >> Dibrugarh University > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-l From andreas.prlic at gmail.com Thu Oct 11 18:32:30 2012 From: andreas.prlic at gmail.com (Andreas Prlic) Date: Thu, 11 Oct 2012 11:32:30 -0700 Subject: [Biojava-l] Biojava Dependency on Forester In-Reply-To: References: Message-ID: Hi Terry, Biojava depends on forester version 0.955. There are no plans to get rid of this dependency, as far as I know. However we can try to upgrade to a newer version if that helps. 
If you are working in a Maven environment and you pull in BioJava that way, you can add an exclusion to your config. Something like the XML below. This forces your project to ignore the older forester library configured in biojava. Is this a suitable workaround for your problem? Andreas
<dependency>
  <groupId>org.biojava</groupId>
  <artifactId>biojava3-phylo</artifactId>
  <version>3.0.4</version>
  <exclusions>
    <exclusion>
      <groupId>org</groupId>
      <artifactId>forester</artifactId>
    </exclusion>
  </exclusions>
</dependency>
On Thu, Oct 11, 2012 at 11:17 AM, Terry Casstevens wrote: > Dear Andreas, > > I am the lead developer of the software Tassel. > http://www.maizegenetics.net/tassel > > We currently use Biojava 3.0. And we are > wanting to use the latest release of Forester. > Since Biojava has a dependency on an older > release of Forester, we are running into conflicts. > Can you help explain Biojava's dependency on > Forester? > > What version of Forester does Biojava 3.0 require? > It looks like version 0.955 > > What version of Forester does Biojava 3.0.4 require? > > Does any Biojava jar files include Forester classes? > Or just references? > > Are there plans to remove Biojava's dependency > on Forester? > > > Thank you, > > Terry Casstevens From andreas.prlic at gmail.com Thu Oct 11 18:50:10 2012 From: andreas.prlic at gmail.com (Andreas Prlic) Date: Thu, 11 Oct 2012 11:50:10 -0700 Subject: [Biojava-l] Biojava Dependency on Forester In-Reply-To: References: Message-ID: "exclusion" means only that the (old) forester.jar that biojava depends on, will not be included. Instead you could configure your newer forester dependency in its own <dependency> section. That one will get used instead. A On Thu, Oct 11, 2012 at 11:43 AM, Terry Casstevens wrote: > Hi Andreas, > > Thank you for the quick response! > > When you say "exclusion", sounds like the > parts of biojava that uses forester would be excluded? > I'm not sure, but I think our code uses some of > the code that would be excluded. > > As you probably already know, the latest > release of Forester is not backwardly > compatible with Forester version 0.955.
> > Thank you, > > Terry > > > On Thu, Oct 11, 2012 at 2:32 PM, Andreas Prlic wrote: >> Hi Terry, >> >> Biojava depends on forester version 0.955. There are no plans to get >> rid of this dependency, as far as I know. However we can try to >> upgrade to a newer version if that helps. >> >> If you are working in a Maven environment and you pull in BioJava that >> way, you can add an exclusion to your config. Something like the XML >> below. This forces your project to ignore the older forester library >> configured in biojava. Is this a suitable workaround for your problem? >> >> Andreas >> >> >> >> org.biojava >> biojava3-phylo >> 3.0.4 >> >> >> org >> forester >> >> >> >> >> >> >> >> On Thu, Oct 11, 2012 at 11:17 AM, Terry Casstevens wrote: >>> Dear Andreas, >>> >>> I am the lead developer of the software Tassel. >>> http://www.maizegenetics.net/tassel >>> >>> We currently use Biojava 3.0. And we are >>> wanting to use the latest release of Forester. >>> Since Biojava has a dependency on an older >>> release of Forester, we are running into conflicts. >>> Can you help explain Biojava's dependency on >>> Forester? >>> >>> What version of Forester does Biojava 3.0 require? >>> It looks like version 0.955 >>> >>> What version of Forester does Biojava 3.0.4 require? >>> >>> Does any Biojava jar files include Forester classes? >>> Or just references? >>> >>> Are there plans to remove Biojava's dependency >>> on Forester? >>> >>> >>> Thank you, >>> >>> Terry Casstevens From HWillis at scripps.edu Thu Oct 11 18:46:24 2012 From: HWillis at scripps.edu (Scooter Willis) Date: Thu, 11 Oct 2012 14:46:24 -0400 Subject: [Biojava-l] Biojava Dependency on Forester Message-ID: <9CA63734-A43D-4BAE-A744-FF292D1EF972@scripps.edu> If forester has a maven repository we can unhook the local dependency. We use forester for NJ, and it should only be needed in one module. Let me know the issue/conflict and I can see what I can do to clean up.
Thanks Scooter ----- Reply message ----- From: "Andreas Prlic" To: "Terry Casstevens" Cc: "Peter Bradbury" , "Jeff Glaubitz" , "Ed Buckler" , "biojava-l at biojava.org" Subject: [Biojava-l] Biojava Dependency on Forester Date: Thu, Oct 11, 2012 2:33 pm Hi Terry, Biojava depends on forester version 0.955. There are no plans to get rid of this dependency, as far as I know. However we can try to upgrade to a newer version if that helps. If you are working in a Maven environment and you pull in BioJava that way, you can add an exclusion to your config. Something like the XML below. This forces your project to ignore the older forester library configured in biojava. Is this a suitable workaround for your problem? Andreas org.biojava biojava3-phylo 3.0.4 org forester On Thu, Oct 11, 2012 at 11:17 AM, Terry Casstevens wrote: > Dear Andreas, > > I am the lead developer of the software Tassel. > http://www.maizegenetics.net/tassel > > We currently use Biojava 3.0. And we are > wanting to use the latest release of Forester. > Since Biojava has a dependency on an older > release of Forester, we are running into conflicts. > Can you help explain Biojava's dependency on > Forester? > > What version of Forester does Biojava 3.0 require? > It looks like version 0.955 > > What version of Forester does Biojava 3.0.4 require? > > Does any Biojava jar files include Forester classes? > Or just references? > > Are there plans to remove Biojava's dependency > on Forester? 
> > > Thank you, > > Terry Casstevens From HWillis at scripps.edu Thu Oct 11 19:50:26 2012 From: HWillis at scripps.edu (Scooter Willis) Date: Thu, 11 Oct 2012 15:50:26 -0400 Subject: [Biojava-l] Biojava Dependency on Forester Message-ID: <47E61FED-17CF-40E9-AE76-E44C11BDBB27@scripps.edu> Andreas Looks like the alignment code is using the distance matrix from forester and that has changed. Any chance the developer who did the MSA code could get this working with the latest forester code. It is probably a refactoring problem. Scooter ----- Reply message ----- From: "Terry Casstevens" To: "Scooter Willis" Cc: "Andreas Prlic" , "Peter Bradbury" , "Jeff Glaubitz" , "Ed Buckler" , "biojava-l at biojava.org" Subject: [Biojava-l] Biojava Dependency on Forester Date: Thu, Oct 11, 2012 2:56 pm Hi Scooter, Andreas, Thank you again for the responses. This is one problem we are seeing. org/forester/phylogenyinference does not exist in Forester version 1.005.
Exception in thread "main" java.lang.NoClassDefFoundError: org/forester/phylogenyinference/DistanceMatrix at org.biojava3.alignment.Alignments.getMultipleSequenceAlignment(Alignments.java:176) at net.maizegenetics.gbs.maps.TagsAtLocus.getVariableSites(TagsAtLocus.java:306) at net.maizegenetics.gbs.maps.TagsAtLocus.getSNPCallsQuant(TagsAtLocus.java:183) at net.maizegenetics.gbs.pipeline.TagsToSNPByAlignmentPlugin.addSitesToMutableAlignment(TagsToSNPByAlignmentPlugin.java:417) at net.maizegenetics.gbs.pipeline.TagsToSNPByAlignmentPlugin.runTagsToSNPByAlignment(TagsToSNPByAlignmentPlugin.java:347) at net.maizegenetics.gbs.pipeline.TagsToSNPByAlignmentPlugin.performFunction(TagsToSNPByAlignmentPlugin.java:107) at net.maizegenetics.gbs.pipeline.TerryPipelines.runTagsToSNPByAlignmentPlugin(TerryPipelines.java:36) at net.maizegenetics.gbs.pipeline.TerryPipelines.main(TerryPipelines.java:41) Caused by: java.lang.ClassNotFoundException: org.forester.phylogenyinference.DistanceMatrix at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:423) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:356) Thank you, Terry On Thu, Oct 11, 2012 at 2:46 PM, Scooter Willis wrote: > If forester has maven repository we can unhook the local depedency. We use > forester for NJ and should only be a needed in one model. Let me know the > issue/conflict and I can see what I can do to clean up. 
> > Thanks > > Scooter > > > ----- Reply message ----- > From: "Andreas Prlic" > To: "Terry Casstevens" > Cc: "Peter Bradbury" , "Jeff Glaubitz" > , "Ed Buckler" , > "biojava-l at biojava.org" > Subject: [Biojava-l] Biojava Dependency on Forester > Date: Thu, Oct 11, 2012 2:33 pm > > > > Hi Terry, > > Biojava depends on forester version 0.955. There are no plans to get > rid of this dependency, as far as I know. However we can try to > upgrade to a newer version if that helps. > > If you are working in a Maven environment and you pull in BioJava that > way, you can add an exclusion to your config. Something like the XML > below. This forces your project to ignore the older forester library > configured in biojava. Is this a suitable workaround for your problem? > > Andreas > > > > org.biojava > biojava3-phylo > 3.0.4 > > > org > forester > > > > > > > > On Thu, Oct 11, 2012 at 11:17 AM, Terry Casstevens > wrote: >> Dear Andreas, >> >> I am the lead developer of the software Tassel. >> http://www.maizegenetics.net/tassel >> >> We currently use Biojava 3.0. And we are >> wanting to use the latest release of Forester. >> Since Biojava has a dependency on an older >> release of Forester, we are running into conflicts. >> Can you help explain Biojava's dependency on >> Forester? >> >> What version of Forester does Biojava 3.0 require? >> It looks like version 0.955 >> >> What version of Forester does Biojava 3.0.4 require? >> >> Does any Biojava jar files include Forester classes? >> Or just references? >> >> Are there plans to remove Biojava's dependency >> on Forester? 
>> >> >> Thank you, >> >> Terry Casstevens > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From kohchuanhock at gmail.com Sun Oct 14 04:54:44 2012 From: kohchuanhock at gmail.com (Chuan Hock Koh) Date: Sun, 14 Oct 2012 13:54:44 +0900 Subject: [Biojava-l] InstabilityIndex In-Reply-To: References: <1349723178.65392.YahooMailNeo@web190604.mail.sg3.yahoo.com> Message-ID: Hi Andreas and Steve, Sorry for the slow response. Currently, I am in the midst of moving from Singapore to Japan. Busy with settling into the new job, apartment hunting etc.. So, what is the conclusion for the problem? Do let me know what you guys like to be done, as clear as possible :) I will code them whenever I can find time which I believe I should have some at the end of this month. Thanks, Ah Fu On Thu, Oct 11, 2012 at 11:17 AM, Andreas Prlic wrote: > Thanks, good comments, Steve. > > I agree we should make it a policy from now on to support 6 > non-standard letters for anything protein related... About your time > comment - that's part of the idea of moving the development to git.. > Ideally, it should be easier for everybody who is concerned to patch > and share patches... > > Andreas > > > On Tue, Oct 9, 2012 at 10:32 AM, Steve Darnell > wrote: > > @Subrata, The class does not handle characters other than the standard > 20 amino acids. We substitute BJOUXZ with DLKCAE and live with the > approximation. > > > > @Ah Fu, In the past we have used average values for B (D or N), J (L or > I), and Z (E or Q), and a dummy substitution for X (G or A). Our U and O > substitutions just map to the closest natural amino acid. It would be nice > if real values existed for selenocysteine and pyrrolysine, but I haven't a > clue if they do. > > > > @Andreas, I would suggest that a general policy that all protein > sequence analyses support the 6 non-standard letter would be a good thing. 
> A nice ideal, but who has the time? :) > > > > Regards, > > Steve > > > > -----Original Message----- > > From: biojava-l-bounces at lists.open-bio.org [mailto: > biojava-l-bounces at lists.open-bio.org] On Behalf Of Andreas Prlic > > Sent: Monday, October 08, 2012 5:44 PM > > To: subrata sinha > > Cc: Chuan Hock Koh; biojava-l at biojava.org > > Subject: Re: [Biojava-l] InstabilityIndex > > > > Hi Subrata, > > > > Please don't mail me directly, but send your questions to the list. > > Chances are that somebody there can help you best. > > > > Ah Fu, do you have any thoughts on this? > > > > Thanks, > > > > Andreas > > > > > > > > > > On Mon, Oct 8, 2012 at 12:06 PM, subrata sinha < > subratasinha2006 at yahoo.co.in> wrote: > >> Dear Sir, > >> > >> I was testing the getInstabilityIndex() method of > >> org.biojava3.aaproperties.PeptideProperties class. > >> > >> I am trying to find InstabilityIndex for very short segment of > >> peptide, as short as two. But I am getting some surprising result in > >> negative.Then i found that the method perhaps giving -ve result for > >> ambiguous characters. So how to handle a situation if my protein > >> sequence contains ambiguous characters > >> > >> Input Sequence Instability Index > >> > >> GTDG -13.725 > >> VDVR -30.075 > >> > >> How i analyse the above situation. > >> > >> Kindly help me. 
> >> > >> > >> With Regards > >> > >> Subrata Sinha > >> Assistant Professor > >> Centre for Bioinformatics Studies > >> Dibrugarh University > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- http://compbio.ddns.comp.nus.edu.sg/~ChuanHockKoh From asidhu at biomap.org Wed Oct 17 05:28:06 2012 From: asidhu at biomap.org (Amandeep Sidhu) Date: Wed, 17 Oct 2012 13:28:06 +0800 Subject: [Biojava-l] International Symposium on Biomedical Data Infrastructure (BDI 2013) Message-ID: International Symposium on Biomedical Data Infrastructure (BDI 2013) 30 - 31 January 2013 Kuala Lumpur, Malaysia http://umconference.um.edu.my/BDI2013 Proceedings to be published by Springer Due to the emerging demands of huge amounts of biomedical data, new and improved data management capabilities are required for supporting a wide range of applications. Current Biomedical Databases are independently administered in geographically distinct locations, lending them almost ideally to adoption of intelligent data management approaches. As a result next generation of information infrastructure and data integration capabilities are needed to ensure increasing infrastructure agility required for high-throughput biomedical research. The workshop will focus on research issues, problems and opportunities in Biomedical Data Infrastructure. 
Topics of Interest are: * Big Biomedical Data and its Management * Biomedical Data integration and Interoperability * Next Generation Sequencing Data * Biomedical Image Analysis * Medical Informatics and Translational Bioinformatics * Biomedical Ontologies * Semantic Web Tools and Techniques for Biomedicine * Web 2.0 and Web 3.0 Applications in Biomedicine * Novel architectural models for HPC and cloud computing in Biomedicine * New parallel / concurrent programming models for High Performance Biomedical Applications in Cloud * Biomedical Data Cloud * Interoperability between different Utility Computing Platforms used for Biomedicine * Performance monitoring for biomedical applications in HPC and Cloud * Biomedical Infrastructure as a Service * Biomedical Platforms as a Service * Biomedical Software as a Service * Scientific workflows in bioinformatics and biomedicine * Data Mining in Biomedicine * Computational Systems Biology Submission Guidelines: We welcome original submissions that have not been published and that are not under review by another conference or journal. Papers should not exceed 15 pages excluding references in Springer format. Paper should be submitted through Easy Chair Online Submission System following instructions on the website (http://umconference.um.edu.my/BDI2013). All submissions will be evaluated on their originality, technical soundness, significance, presentation, and interest to the symposium attendees. Submission implies the willingness of at least one of the authors to register and present the work associated with the paper submitted. All submitted papers will be reviewed by symposium's technical program committee. All accepted papers of registered authors will be included in the proceedings published by Springer. All accepted papers will be required to submit a Springer Copyright Form. 
Important Dates: Paper submission: 20 November 2012 Notifications sent to authors: 10 December 2012 Camera-ready papers due: 24 December 2012 Registration due: 10 January 2013 Conference: 30 - 31 January 2013 Organizing Chairs: Dr. Amandeep S. Sidhu (Curtin Sarawak Malaysia, Malaysia) Dr. Sarinder Kaur (University of Malaya, Malaysia) Steering Committee: Dr. Dickson Lukose (MIMOS, Malaysia) Dr. Kanagasabai Rajaraman (Institute for Infocomm Research, Singapore) Prof. Dr. Meena Kishore Sakharkar (University of Tsukuba, Japan) Prof. Dr. Jake Chen (Indiana University-Purdue University Indianapolis, USA) Prof. Dr. Xiaohua Tony Hu (Drexel University, USA) Prof. Dr. Jason Tsong-Li Wang (New Jersey Institute of Technology, USA) Prof. Dr. Carolyn McGregor (Health Informatics Research, Canada) Please contact BDI 2013 Secretariat through email bdi at biomap.org for any queries. From chapman at cs.wisc.edu Thu Oct 18 01:08:58 2012 From: chapman at cs.wisc.edu (Mark Chapman) Date: Wed, 17 Oct 2012 20:08:58 -0500 Subject: [Biojava-l] Biojava Dependency on Forester In-Reply-To: <47E61FED-17CF-40E9-AE76-E44C11BDBB27@scripps.edu> References: <47E61FED-17CF-40E9-AE76-E44C11BDBB27@scripps.edu> Message-ID: <507F56AA.3010001@cs.wisc.edu> biojava3-alignment and biojava3-phylo have both been updated to use the latest forester release: 1.005. The jar file is in our maven repository, and changes are committed to the SVN and git repositories. Enjoy! Mark On 10/11/2012 02:50 PM, Scooter Willis wrote: > Andreas > > Looks like the alignment code is using the distance matrix from forester and that has changed. Any chance the developer who did the MSA code could get this working with the latest forester code. It is probably a refactoring problem. 
> > Scooter > > ----- Reply message ----- > From: "Terry Casstevens" > To: "Scooter Willis" > Cc: "Andreas Prlic" , "Peter Bradbury" , "Jeff Glaubitz" , "Ed Buckler" , "biojava-l at biojava.org" > Subject: [Biojava-l] Biojava Dependency on Forester > Date: Thu, Oct 11, 2012 2:56 pm > > > > Hi Scooter, Andreas, > > Thank you again for the responses. > > This is one problem we are seeing. org/forester/phylogenyinference > does not exist in Forester version 1.005. > > Exception in thread "main" java.lang.NoClassDefFoundError: > org/forester/phylogenyinference/DistanceMatrix > at org.biojava3.alignment.Alignments.getMultipleSequenceAlignment(Alignments.java:176) > at net.maizegenetics.gbs.maps.TagsAtLocus.getVariableSites(TagsAtLocus.java:306) > at net.maizegenetics.gbs.maps.TagsAtLocus.getSNPCallsQuant(TagsAtLocus.java:183) > at net.maizegenetics.gbs.pipeline.TagsToSNPByAlignmentPlugin.addSitesToMutableAlignment(TagsToSNPByAlignmentPlugin.java:417) > at net.maizegenetics.gbs.pipeline.TagsToSNPByAlignmentPlugin.runTagsToSNPByAlignment(TagsToSNPByAlignmentPlugin.java:347) > at net.maizegenetics.gbs.pipeline.TagsToSNPByAlignmentPlugin.performFunction(TagsToSNPByAlignmentPlugin.java:107) > at net.maizegenetics.gbs.pipeline.TerryPipelines.runTagsToSNPByAlignmentPlugin(TerryPipelines.java:36) > at net.maizegenetics.gbs.pipeline.TerryPipelines.main(TerryPipelines.java:41) > Caused by: java.lang.ClassNotFoundException: > org.forester.phylogenyinference.DistanceMatrix > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:423) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:356) > > > Thank you, > > Terry > > > On Thu, Oct 11, 2012 at 2:46 PM, Scooter Willis 
wrote: >> If forester has maven repository we can unhook the local depedency. We use >> forester for NJ and should only be a needed in one model. Let me know the >> issue/conflict and I can see what I can do to clean up. >> >> Thanks >> >> Scooter >> >> >> ----- Reply message ----- >> From: "Andreas Prlic" >> To: "Terry Casstevens" >> Cc: "Peter Bradbury" , "Jeff Glaubitz" >> , "Ed Buckler" , >> "biojava-l at biojava.org" >> Subject: [Biojava-l] Biojava Dependency on Forester >> Date: Thu, Oct 11, 2012 2:33 pm >> >> >> >> Hi Terry, >> >> Biojava depends on forester version 0.955. There are no plans to get >> rid of this dependency, as far as I know. However we can try to >> upgrade to a newer version if that helps. >> >> If you are working in a Maven environment and you pull in BioJava that >> way, you can add an exclusion to your config. Something like the XML >> below. This forces your project to ignore the older forester library >> configured in biojava. Is this a suitable workaround for your problem? >> >> Andreas >> >> >> >> org.biojava >> biojava3-phylo >> 3.0.4 >> >> >> org >> forester >> >> >> >> >> >> >> >> On Thu, Oct 11, 2012 at 11:17 AM, Terry Casstevens >> wrote: >>> Dear Andreas, >>> >>> I am the lead developer of the software Tassel. >>> http://www.maizegenetics.net/tassel >>> >>> We currently use Biojava 3.0. And we are >>> wanting to use the latest release of Forester. >>> Since Biojava has a dependency on an older >>> release of Forester, we are running into conflicts. >>> Can you help explain Biojava's dependency on >>> Forester? >>> >>> What version of Forester does Biojava 3.0 require? >>> It looks like version 0.955 >>> >>> What version of Forester does Biojava 3.0.4 require? >>> >>> Does any Biojava jar files include Forester classes? >>> Or just references? >>> >>> Are there plans to remove Biojava's dependency >>> on Forester? 
>>> >>> >>> Thank you, >>> >>> Terry Casstevens >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From andreas at sdsc.edu Thu Oct 18 01:18:31 2012 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 17 Oct 2012 18:18:31 -0700 Subject: [Biojava-l] Biojava Dependency on Forester In-Reply-To: <507F56AA.3010001@cs.wisc.edu> References: <47E61FED-17CF-40E9-AE76-E44C11BDBB27@scripps.edu> <507F56AA.3010001@cs.wisc.edu> Message-ID: Awesome, Mark, Thanks for fixing this! Andreas On Wed, Oct 17, 2012 at 6:08 PM, Mark Chapman wrote: > biojava3-alignment and biojava3-phylo have both been updated to use the > latest forester release: 1.005. The jar file is in our maven repository, > and changes are committed to the SVN and git repositories. > > Enjoy! > Mark > > > > On 10/11/2012 02:50 PM, Scooter Willis wrote: >> >> Andreas >> >> Looks like the alignment code is using the distance matrix from forester >> and that has changed. Any chance the developer who did the MSA code could >> get this working with the latest forester code. It is probably a refactoring >> problem. >> >> Scooter >> >> ----- Reply message ----- >> From: "Terry Casstevens" >> To: "Scooter Willis" >> Cc: "Andreas Prlic" , "Peter Bradbury" >> , "Jeff Glaubitz" , "Ed Buckler" >> , "biojava-l at biojava.org" >> Subject: [Biojava-l] Biojava Dependency on Forester >> Date: Thu, Oct 11, 2012 2:56 pm >> >> >> >> Hi Scooter, Andreas, >> >> Thank you again for the responses. >> >> This is one problem we are seeing. org/forester/phylogenyinference >> does not exist in Forester version 1.005. 
>> >> Exception in thread "main" java.lang.NoClassDefFoundError: >> org/forester/phylogenyinference/DistanceMatrix >> at >> org.biojava3.alignment.Alignments.getMultipleSequenceAlignment(Alignments.java:176) >> at >> net.maizegenetics.gbs.maps.TagsAtLocus.getVariableSites(TagsAtLocus.java:306) >> at >> net.maizegenetics.gbs.maps.TagsAtLocus.getSNPCallsQuant(TagsAtLocus.java:183) >> at >> net.maizegenetics.gbs.pipeline.TagsToSNPByAlignmentPlugin.addSitesToMutableAlignment(TagsToSNPByAlignmentPlugin.java:417) >> at >> net.maizegenetics.gbs.pipeline.TagsToSNPByAlignmentPlugin.runTagsToSNPByAlignment(TagsToSNPByAlignmentPlugin.java:347) >> at >> net.maizegenetics.gbs.pipeline.TagsToSNPByAlignmentPlugin.performFunction(TagsToSNPByAlignmentPlugin.java:107) >> at >> net.maizegenetics.gbs.pipeline.TerryPipelines.runTagsToSNPByAlignmentPlugin(TerryPipelines.java:36) >> at >> net.maizegenetics.gbs.pipeline.TerryPipelines.main(TerryPipelines.java:41) >> Caused by: java.lang.ClassNotFoundException: >> org.forester.phylogenyinference.DistanceMatrix >> at java.net.URLClassLoader$1.run(URLClassLoader.java:366) >> at java.net.URLClassLoader$1.run(URLClassLoader.java:355) >> at java.security.AccessController.doPrivileged(Native Method) >> at java.net.URLClassLoader.findClass(URLClassLoader.java:354) >> at java.lang.ClassLoader.loadClass(ClassLoader.java:423) >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) >> at java.lang.ClassLoader.loadClass(ClassLoader.java:356) >> >> >> Thank you, >> >> Terry >> >> >> On Thu, Oct 11, 2012 at 2:46 PM, Scooter Willis >> wrote: >>> >>> If forester has maven repository we can unhook the local depedency. We >>> use >>> forester for NJ and should only be a needed in one model. Let me know the >>> issue/conflict and I can see what I can do to clean up. 
>>> >>> Thanks >>> >>> Scooter >>> >>> >>> ----- Reply message ----- >>> From: "Andreas Prlic" >>> To: "Terry Casstevens" >>> Cc: "Peter Bradbury" , "Jeff Glaubitz" >>> , "Ed Buckler" , >>> "biojava-l at biojava.org" >>> Subject: [Biojava-l] Biojava Dependency on Forester >>> Date: Thu, Oct 11, 2012 2:33 pm >>> >>> >>> >>> Hi Terry, >>> >>> Biojava depends on forester version 0.955. There are no plans to get >>> rid of this dependency, as far as I know. However we can try to >>> upgrade to a newer version if that helps. >>> >>> If you are working in a Maven environment and you pull in BioJava that >>> way, you can add an exclusion to your config. Something like the XML >>> below. This forces your project to ignore the older forester library >>> configured in biojava. Is this a suitable workaround for your problem? >>> >>> Andreas >>> >>> >>> >>> org.biojava >>> biojava3-phylo >>> 3.0.4 >>> >>> >>> org >>> >>> forester >>> >>> >>> >>> >>> >>> >>> >>> On Thu, Oct 11, 2012 at 11:17 AM, Terry Casstevens >>> wrote: >>>> >>>> Dear Andreas, >>>> >>>> I am the lead developer of the software Tassel. >>>> http://www.maizegenetics.net/tassel >>>> >>>> We currently use Biojava 3.0. And we are >>>> wanting to use the latest release of Forester. >>>> Since Biojava has a dependency on an older >>>> release of Forester, we are running into conflicts. >>>> Can you help explain Biojava's dependency on >>>> Forester? >>>> >>>> What version of Forester does Biojava 3.0 require? >>>> It looks like version 0.955 >>>> >>>> What version of Forester does Biojava 3.0.4 require? >>>> >>>> Does any Biojava jar files include Forester classes? >>>> Or just references? >>>> >>>> Are there plans to remove Biojava's dependency >>>> on Forester? 
>>>> >>>> >>>> Thank you, >>>> >>>> Terry Casstevens >>> >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From HWillis at scripps.edu Thu Oct 18 01:49:24 2012 From: HWillis at scripps.edu (Scooter Willis) Date: Wed, 17 Oct 2012 21:49:24 -0400 Subject: [Biojava-l] Biojava Dependency on Forester In-Reply-To: Message-ID: Thanks Mark! Is Forester on maven to avoid external jar file dependency? On 10/17/12 9:18 PM, "Andreas Prlic" wrote: >Awesome, Mark, Thanks for fixing this! > >Andreas > >On Wed, Oct 17, 2012 at 6:08 PM, Mark Chapman wrote: >> biojava3-alignment and biojava3-phylo have both been updated to use the >> latest forester release: 1.005. The jar file is in our maven >>repository, >> and changes are committed to the SVN and git repositories. >> >> Enjoy! >> Mark >> >> >> >> On 10/11/2012 02:50 PM, Scooter Willis wrote: >>> >>> Andreas >>> >>> Looks like the alignment code is using the distance matrix from >>>forester >>> and that has changed. Any chance the developer who did the MSA code >>>could >>> get this working with the latest forester code. It is probably a >>>refactoring >>> problem. >>> >>> Scooter >>> >>> ----- Reply message ----- >>> From: "Terry Casstevens" >>> To: "Scooter Willis" >>> Cc: "Andreas Prlic" , "Peter Bradbury" >>> , "Jeff Glaubitz" , "Ed Buckler" >>> , "biojava-l at biojava.org" >>> Subject: [Biojava-l] Biojava Dependency on Forester >>> Date: Thu, Oct 11, 2012 2:56 pm >>> >>> >>> >>> Hi Scooter, Andreas, >>> >>> Thank you again for the responses. 
>>> >>> This is one problem we are seeing. org/forester/phylogenyinference >>> does not exist in Forester version 1.005. >>> >>> Exception in thread "main" java.lang.NoClassDefFoundError: >>> org/forester/phylogenyinference/DistanceMatrix >>> at >>> >>>org.biojava3.alignment.Alignments.getMultipleSequenceAlignment(Alignment >>>s.java:176) >>> at >>> >>>net.maizegenetics.gbs.maps.TagsAtLocus.getVariableSites(TagsAtLocus.java >>>:306) >>> at >>> >>>net.maizegenetics.gbs.maps.TagsAtLocus.getSNPCallsQuant(TagsAtLocus.java >>>:183) >>> at >>> >>>net.maizegenetics.gbs.pipeline.TagsToSNPByAlignmentPlugin.addSitesToMuta >>>bleAlignment(TagsToSNPByAlignmentPlugin.java:417) >>> at >>> >>>net.maizegenetics.gbs.pipeline.TagsToSNPByAlignmentPlugin.runTagsToSNPBy >>>Alignment(TagsToSNPByAlignmentPlugin.java:347) >>> at >>> >>>net.maizegenetics.gbs.pipeline.TagsToSNPByAlignmentPlugin.performFunctio >>>n(TagsToSNPByAlignmentPlugin.java:107) >>> at >>> >>>net.maizegenetics.gbs.pipeline.TerryPipelines.runTagsToSNPByAlignmentPlu >>>gin(TerryPipelines.java:36) >>> at >>> >>>net.maizegenetics.gbs.pipeline.TerryPipelines.main(TerryPipelines.java:4 >>>1) >>> Caused by: java.lang.ClassNotFoundException: >>> org.forester.phylogenyinference.DistanceMatrix >>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366) >>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355) >>> at java.security.AccessController.doPrivileged(Native Method) >>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354) >>> at java.lang.ClassLoader.loadClass(ClassLoader.java:423) >>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) >>> at java.lang.ClassLoader.loadClass(ClassLoader.java:356) >>> >>> >>> Thank you, >>> >>> Terry >>> >>> >>> On Thu, Oct 11, 2012 at 2:46 PM, Scooter Willis >>> wrote: >>>> >>>> If forester has maven repository we can unhook the local depedency. We >>>> use >>>> forester for NJ and should only be a needed in one model. 
Let me know >>>>the >>>> issue/conflict and I can see what I can do to clean up. >>>> >>>> Thanks >>>> >>>> Scooter >>>> >>>> >>>> ----- Reply message ----- >>>> From: "Andreas Prlic" >>>> To: "Terry Casstevens" >>>> Cc: "Peter Bradbury" , "Jeff Glaubitz" >>>> , "Ed Buckler" , >>>> "biojava-l at biojava.org" >>>> Subject: [Biojava-l] Biojava Dependency on Forester >>>> Date: Thu, Oct 11, 2012 2:33 pm >>>> >>>> >>>> >>>> Hi Terry, >>>> >>>> Biojava depends on forester version 0.955. There are no plans to get >>>> rid of this dependency, as far as I know. However we can try to >>>> upgrade to a newer version if that helps. >>>> >>>> If you are working in a Maven environment and you pull in BioJava that >>>> way, you can add an exclusion to your config. Something like the XML >>>> below. This forces your project to ignore the older forester library >>>> configured in biojava. Is this a suitable workaround for your problem? >>>> >>>> Andreas >>>> >>>> >>>> >>>> org.biojava >>>> biojava3-phylo >>>> 3.0.4 >>>> >>>> >>>> org >>>> >>>> forester >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Thu, Oct 11, 2012 at 11:17 AM, Terry Casstevens >>>> wrote: >>>>> >>>>> Dear Andreas, >>>>> >>>>> I am the lead developer of the software Tassel. >>>>> http://www.maizegenetics.net/tassel >>>>> >>>>> We currently use Biojava 3.0. And we are >>>>> wanting to use the latest release of Forester. >>>>> Since Biojava has a dependency on an older >>>>> release of Forester, we are running into conflicts. >>>>> Can you help explain Biojava's dependency on >>>>> Forester? >>>>> >>>>> What version of Forester does Biojava 3.0 require? >>>>> It looks like version 0.955 >>>>> >>>>> What version of Forester does Biojava 3.0.4 require? >>>>> >>>>> Does any Biojava jar files include Forester classes? >>>>> Or just references? >>>>> >>>>> Are there plans to remove Biojava's dependency >>>>> on Forester? 
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Terry Casstevens

From pwrose at ucsd.edu Thu Oct 18 23:24:43 2012
From: pwrose at ucsd.edu (Peter Rose)
Date: Thu, 18 Oct 2012 23:24:43 +0000
Subject: [Biojava-l] [Job] Lead Web Architect - RCSB PDB
Message-ID:

Become part of the RCSB Protein Data Bank team. We have an opening for an experienced Lead Web Architect. A detailed job description and online application form can be found at:

http://jobs.ucsd.edu/bulletin/job.aspx?cat=information&sortby=post&jobnum_in=64091

Qualifications:

* MS Degree in Computer Science or comparable combination of education and experience with considerable focus in Java EE software development.
* Established demonstrated work experience in the role of an architect and developer on medium to large size database-driven web applications using Java EE technology and standards.
* Advanced experience developing the presentation layer of a dynamic, database-driven web application using HTML, CSS, JavaScript, JavaScript Toolkits, Ajax, JSP, XML, Java. Experience resolving browser and cross-platform compatibility issues. Advanced experience with Struts2, Tiles, jQuery.
* Advanced experience with database design, Structured Query Language and RDBMS's such as MySQL.
Expertise in web application server administration and configuration such as Tomcat.
* Established expertise in software life cycle methodologies. Experience with build tools such as Maven and Ant, and continuous integration systems such as Cruise Control. Experience with project tracking tools such as Jira.

________________________________________________
Peter Rose, Ph.D.
Scientific Lead
RCSB Protein Data Bank (http://www.rcsb.org)
San Diego Supercomputer Center (SDSC) and
Skaggs School of Pharmacy and Pharmaceutical Sciences
University of California San Diego

From hlapp at drycafe.net Mon Oct 22 14:52:24 2012
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Mon, 22 Oct 2012 10:52:24 -0400
Subject: [Biojava-l] regex performance in Java
Message-ID: <1B62BC3E-B005-4484-AE66-0B8F407E4756@drycafe.net>

I know that this is really a Java language topic, but since parsing biological data formats is so rife with regular expression applications, I'm curious what the experience is among the Biojava people with the use of regular expressions in Java.

They (at least as in java.util.regex) have been reported to me as performing much slower (by several orders of magnitude) than the regex implementation in Perl, and some simple benchmarking tests seem to bear that out. Even after scrutinizing the benchmark and finding nothing obvious, I'm still skeptical as to why this would be the case - naively I would have assumed that the underlying runtime library is implemented in C in both cases. But perhaps this is not true?

Any experience people have had here speed-wise (or tricks or things not to do for Java regex's) would be appreciated.
-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From andreas at sdsc.edu Mon Oct 22 20:42:00 2012
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 22 Oct 2012 13:42:00 -0700
Subject: [Biojava-l] regex performance in Java
In-Reply-To: <1B62BC3E-B005-4484-AE66-0B8F407E4756@drycafe.net>
References: <1B62BC3E-B005-4484-AE66-0B8F407E4756@drycafe.net>
Message-ID:

Hi Hilmar,

I can't say much about the performance of Java regular expressions, but isn't it hard to write efficient regular expressions in any language? There are great tools in the Java world for parsing XML/JSON and other standard file types that help to avoid them. I am not sure if this is a general rule for the wider Java community, but from my perspective, the use of regular expressions in Java is limited, and they are used only if nothing else works... Not sure if anybody else has a different experience?

Andreas
From to.petr at gmail.com Mon Oct 22 20:48:02 2012
From: to.petr at gmail.com (P. Troshin)
Date: Mon, 22 Oct 2012 21:48:02 +0100
Subject: [Biojava-l] Fwd: regex performance in Java
In-Reply-To:
References: <1B62BC3E-B005-4484-AE66-0B8F407E4756@drycafe.net>
Message-ID:

Sorry, I should have written to the list. Also just want to say that I agree with Andreas, in Java we use regexp if everything else fails (:-))

Regards,
P.

---------- Forwarded message ----------
From: P. Troshin
Date: 22 October 2012 21:44
Subject: Re: [Biojava-l] regex performance in Java
To: Hilmar Lapp

Hi Hilmar,

I think this is one of the myths, I do not think there is a difference. It might have been true long ago, but I do not think this is still the case. Last time we compared Perl, Python and Java performance the former was the last with a large margin :-). However, I never had to make a direct comparison of regexp.

Google for "perl vs java regexp speed comparison" brings a few links. I had a quick look at one result only (http://onlyjob.blogspot.co.uk/2011/03/perl5-python-ruby-php-c-c-lua-tcl.html), it claimed that Perl regexp is faster than Java. Unfortunately the author of the test clearly lacked understanding of Java and as a result the test compared the performance of String concatenation (which is notoriously bad in Java, as Strings are immutable) rather than the regexp performance itself. I guess this is an easy mistake to make though.

Hence the advice - if you are doing a lot of String permutations use the StringBuilder class, not the String itself.
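A minimal sketch of the concatenation point, for readers who want to see the two styles side by side. The class name, method names, and the "ACGT" payload are invented for illustration and do not come from the benchmark discussed in this thread. Repeated `+=` on a String copies the whole string on every iteration, whereas a StringBuilder appends into one growable buffer.

```java
// Hypothetical example class; the names and the "ACGT" payload are assumptions.
class ConcatSketch {

    // Repeated concatenation: every += allocates a new String and copies
    // everything built so far, so the loop does quadratic work overall.
    static String withConcat(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += "ACGT";
        }
        return s;
    }

    // Same result with a single growable buffer, pre-sized to the final length.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder(n * 4);
        for (int i = 0; i < n; i++) {
            sb.append("ACGT");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int n = 5000;
        long t0 = System.nanoTime();
        String a = withConcat(n);
        long t1 = System.nanoTime();
        String b = withBuilder(n);
        long t2 = System.nanoTime();
        // Timings from a single cold run are only a rough indication.
        System.out.println("same result: " + a.equals(b));
        System.out.println("concat  ns: " + (t1 - t0));
        System.out.println("builder ns: " + (t2 - t1));
    }
}
```

A benchmark that builds its input or output with the first style inside the timed region is largely measuring string copying, not the regex engine.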
If you have a Java implementation which is lacking I am sure people on this list will have no problem optimizing it!

Regards,
Peter

From tiagoantao at gmail.com Mon Oct 22 20:53:56 2012
From: tiagoantao at gmail.com (Tiago Antão)
Date: Mon, 22 Oct 2012 21:53:56 +0100
Subject: [Biojava-l] regex performance in Java
In-Reply-To:
References: <1B62BC3E-B005-4484-AE66-0B8F407E4756@drycafe.net>
Message-ID:

On Mon, Oct 22, 2012 at 9:42 PM, Andreas Prlic wrote:
> and used if nothing else works... Not sure if anybody else has a
> different experience?

I might be beating a dead horse here, but I agree.
I would say that from an idiomatic perspective Perl uses a lot of regex programming (Ruby also?), which is less common in most other languages (Java and Python are what I use at work). Regexes exist but are not the first option. That being said, there is a very cool JVM language which has regexes as first class objects: Clojure. But even in that case, I do not see lots of idiomatic use of regexes.

Tiago

From daniel.quest at gmail.com Tue Oct 23 04:51:37 2012
From: daniel.quest at gmail.com (daniel.quest at gmail.com)
Date: Mon, 22 Oct 2012 23:51:37 -0500
Subject: [Biojava-l] regex performance in Java
In-Reply-To:
References: <1B62BC3E-B005-4484-AE66-0B8F407E4756@drycafe.net>
Message-ID:

Wow! This could open a huge flame war. Let me just make a couple of quick points about performance.

Perl is implemented in C/C++ and is interpreted, while Java runs bytecode on top of the JVM. The vendors of JVMs probably write the bytecode instruction set in C/assembly. Java itself, at least at this point, is most likely written in Java. The speed of Java is greatly influenced by the underlying JVM and how well the JVM instruction set maps to the hardware. The algorithm being implemented and the version of Java also have a great impact on performance.

Conventional wisdom is that Fortran is the best performing language in widespread use, with interpreted languages such as Python, Ruby, and Perl being 3-8 times slower. This website shows Java having about a ten percent overhead relative to C: http://shootout.alioth.debian.org/

I have personally noticed superior performance of Perl's regex parsing capabilities over Python. I have never noticed a difference between Perl and Java that was so extreme that I would choose to implement something in Perl over Java in a production setting.
Java is a language with such deep library support that it makes most every language look like a second class citizen in comparison (notable exceptions: C, C++, and JavaScript).

Something else interesting: http://swtch.com/~rsc/regexp/regexp1.html

Finally, be very cautious of benchmarks. It is very very hard to do benchmarking well.

Dan

From khalil.elmazouari at gmail.com Tue Oct 23 16:48:32 2012
From: khalil.elmazouari at gmail.com (Khalil El Mazouari)
Date: Tue, 23 Oct 2012 18:48:32 +0200
Subject: [Biojava-l] Biojava-l Digest, Vol 117, Issue 10
In-Reply-To:
References:
Message-ID: <99D2DB4A-DD8B-4473-BE04-2834ECF5457E@gmail.com>

Hi Hilmar,

I used regexes a lot in Perl and Java... I was also skeptical about regexes in Java when I started using them. From my own experience, I can tell you the following:

* It's MUCH easier to write a regex in Perl than in Java.
* Java regexes require more optimisation: a working regex and an optimal regex are two different things in Java.
* Patterns must be compiled first.
So, if you iterate through a large number of strings you want to match, compile your pattern outside the loop.
* If you use regexes in a large iteration, avoid the methods of java.lang.String that take a regex (String.replaceFirst, String.replaceAll, String.matches, ...): your pattern will be compiled each time.
* Avoid applying a regex to a large string. If possible, try to limit the matching to the places where the pattern is; methods like indexOf, lastIndexOf, split ... from java.lang.String are very useful in this regard.
* It's easier to get the matching group in Java than in Perl.
* Test first with an editor like RegExhibit or your IDE's regex plugin.
* Finally, I recommend the Java Regular Expressions book by Mehran Habibi (http://www.amazon.com/Java-Regular-Expressions-Taming-java-util-regex/dp/1590591070).

If your regexes are well optimised, you will not notice any difference between Perl and Java.

If you need to use regexes in a complex algorithm or in software in combination with Java/BioJava, don't hesitate: Java regexes are excellent. If you just need a regex in a small script, go for Perl.

Best

khalil

On 22 Oct 2012, at 18:00, biojava-l-request at lists.open-bio.org wrote:

> Today's Topics:
>
> 1.
regex performance in Java (Hilmar Lapp)

-----
Confidentiality Notice: This e-mail and any files transmitted with it are private and confidential and are solely for the use of the addressee. It may contain material which is legally privileged.
If you are not the addressee or the person responsible for delivering to the addressee, please notify that you have received this e-mail in error and that any use of it is strictly prohibited. It would be helpful if you could notify the author by replying to it.

From hlapp at drycafe.net Wed Oct 24 16:47:27 2012
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Wed, 24 Oct 2012 12:47:27 -0400
Subject: [Biojava-l] regex performance in Java
In-Reply-To: <874nllo4fx.fsf@newcastle.ac.uk>
References: <1B62BC3E-B005-4484-AE66-0B8F407E4756@drycafe.net> <874nllo4fx.fsf@newcastle.ac.uk>
Message-ID: <4F9982AE-FA76-41BF-BF1B-58B7889F03AB@drycafe.net>

Hi everyone,

Thanks for all your responses. Indeed I know that the Java regex API isn't an enjoyable one to program with, and if the underlying task were about writing something from scratch, I'd be all for avoiding regex's too if the same thing could be achieved by string comparison.

However, and of course I failed to say that initially, the task from which this query is originating is about converting a Perl script to Java (not because Perl is somehow bad, but because those Perl scripts have proven to be an obstacle to easy cross-platform installation of the - mostly Java - software they are a part of). That doesn't mean one couldn't in the course of this also rewrite the code that uses regular expressions into code that doesn't, but I also think it wise not to introduce multiple variables as a source of error at once.

Some of the responses would be best answered by looking at the expressions and the code that uses them, so here are the two "benchmark" scripts.

Java: https://gist.github.com/3940931
Perl: https://gist.github.com/3940780

I'm also copying Dongye Meng here, who is a CS student at UNC working with us on the project - if anyone has further wisdom to share about how to reduce the performance gap between the two versions, he'd surely appreciate it.
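Two of the suggestions made earlier in this thread, compiling a pattern once outside the loop and avoiding the regex-based convenience methods of java.lang.String inside hot loops, can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions: the class name, the accession-like expression, and the sample tokens are invented for the example and are not taken from the benchmark gists above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; not part of BioJava or of the benchmark gists.
class PrecompiledPatternSketch {

    // Compiled once. Pattern objects are immutable and can be shared.
    // The accession-like expression is an assumption made for illustration.
    private static final Pattern ACCESSION = Pattern.compile("([A-Z]{2})_(\\d+)");

    // Variant to avoid in a hot loop: String.matches compiles the
    // expression again on every call.
    static int countWithStringMatches(String[] tokens) {
        int hits = 0;
        for (String token : tokens) {
            if (token.matches("([A-Z]{2})_(\\d+)")) {
                hits++;
            }
        }
        return hits;
    }

    // Variant suggested in the thread: one Pattern, one Matcher reset per token.
    static int countWithPrecompiledPattern(String[] tokens) {
        int hits = 0;
        Matcher matcher = ACCESSION.matcher("");
        for (String token : tokens) {
            // Cheap pre-filter with indexOf before running the regex at all.
            if (token.indexOf('_') < 0) {
                continue;
            }
            if (matcher.reset(token).matches()) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] tokens = {"NM_000546", "NP_12345", "bad", "XX_"};
        System.out.println(countWithStringMatches(tokens));
        System.out.println(countWithPrecompiledPattern(tokens));
    }
}
```

Both methods count the same matches; only the amount of repeated compilation differs. A Matcher is not thread-safe, so in concurrent code each thread would need its own Matcher while still sharing the single Pattern.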
-hilmar

On Oct 23, 2012, at 6:42 AM, Phillip Lord wrote:

> Hilmar Lapp writes:
>> They (at least as in java.util.regex) have been reported to me as
>> performing much slower (by several orders of magnitude) than the regex
>> implementation in Perl, and some simple benchmarking tests seem to
>> bear that out. Even after scrutinizing the benchmark and finding
>> nothing obvious, I'm still skeptical as to why this would be the case
>> - naively I would have assumed that the underlying runtime library is
>> implemented in C in both cases. But perhaps this is not true?
>
> Well, the difference is that Perl is perl, while Java is not; it all
> depends on the JVM, and libraries also. A quick shuftie at
> the source for the open-jdk libraries suggests that the regexp searching
> is done in Java -- it's not just a drop through to C. Always the problem
> with performance optimisation on Java -- you are only optimising for one
> situation. It might be interesting to see how much variation there is
> between JVMs.
>
> Like others, I would only use regexp as a last resort in Java anyway;
> compared to Perl, writing the code is painful. Still, I guess that you
> know this!
>
> Phil

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From to.petr at gmail.com Wed Oct 24 17:59:19 2012
From: to.petr at gmail.com (P. Troshin)
Date: Wed, 24 Oct 2012 18:59:19 +0100
Subject: [Biojava-l] regex performance in Java
In-Reply-To: <4F9982AE-FA76-41BF-BF1B-58B7889F03AB@drycafe.net>
References: <1B62BC3E-B005-4484-AE66-0B8F407E4756@drycafe.net> <874nllo4fx.fsf@newcastle.ac.uk> <4F9982AE-FA76-41BF-BF1B-58B7889F03AB@drycafe.net>
Message-ID:

Hi Hilmar,

You have not mentioned the version of JVM you are using, but it appears that there is a massive difference in timing on my machine. Here is the timing of the run on Windows 7 (pro 64 bit) with Oracle JVM (64 bit) v.
1.7.0_02.

# of Iteration: 1        Time: 1.711E-6 seconds
# of Iteration: 10       Time: 1.711E-6 seconds
# of Iteration: 100      Time: 2.567E-6 seconds
# of Iteration: 1000     Time: 1.2403E-5 seconds
# of Iteration: 10000    Time: 1.44143E-4 seconds
# of Iteration: 100000   Time: 0.001369138 seconds

I have not changed the code at all. I have a 3 year old laptop with an Intel Core Duo P8600, 2.4 GHz CPU. So nothing special.

I cannot tell whether this is slow or not as you did not publish the timings for Perl. Could you please do so? It looks to me that you might just need to update/replace your JVM. I will be happy to look at the code in a bit more detail if this result is still slower than Perl.

Thanks,
Peter
From to.petr at gmail.com Wed Oct 24 18:10:38 2012
From: to.petr at gmail.com (P.
Troshin)
Date: Wed, 24 Oct 2012 19:10:38 +0100
Subject: [Biojava-l] regex performance in Java
In-Reply-To: <4F9982AE-FA76-41BF-BF1B-58B7889F03AB@drycafe.net>
References: <1B62BC3E-B005-4484-AE66-0B8F407E4756@drycafe.net> <874nllo4fx.fsf@newcastle.ac.uk> <4F9982AE-FA76-41BF-BF1B-58B7889F03AB@drycafe.net>
Message-ID:

Hi Hilmar,

Hmm, it looks like I spoke too soon; the previous run was doing nothing as all of the cases were commented out. I can now see that the results of my runs are not massively different from yours.

It would help if you could encourage your student to write a few unit tests so that we know what you are trying to achieve and to simplify the testing.

Just a thought

Thanks,
Peter
From to.petr at gmail.com Wed Oct 24 18:30:16 2012
From: to.petr at gmail.com (P.
Troshin)
Date: Wed, 24 Oct 2012 19:30:16 +0100
Subject: [Biojava-l] regex performance in Java
In-Reply-To:
References: <1B62BC3E-B005-4484-AE66-0B8F407E4756@drycafe.net> <874nllo4fx.fsf@newcastle.ac.uk> <4F9982AE-FA76-41BF-BF1B-58B7889F03AB@drycafe.net>
Message-ID:

Hi Hilmar,

I looked at the test in a bit more detail. I can see what you are trying to test, but is there a real life problem behind this? What this test is doing is a lot of searches on very short strings. Is this what your real life application does? I am asking because if your real life application uses regexp to look into long strings, the performance might be totally different.

What is your aim? 3 seconds for 500K searches does not seem particularly slow to me.

Thanks
Peter
From hlapp at drycafe.net Thu Oct 25 03:45:52 2012
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Wed, 24 Oct 2012 23:45:52 -0400
Subject: [Biojava-l] regex performance in Java
In-Reply-To:
References: <1B62BC3E-B005-4484-AE66-0B8F407E4756@drycafe.net> <874nllo4fx.fsf@newcastle.ac.uk> <4F9982AE-FA76-41BF-BF1B-58B7889F03AB@drycafe.net>
Message-ID: <34DFDF13-5014-4383-A3D3-BAF793AC2E23@drycafe.net>

The code is a very small snippet from natural language processing software aimed at extracting structured phenotype descriptions from un- or semistructured free text. Apparently the code as is (in Perl) makes a lot of regular expression matches, and so if the speed difference for them between Perl and Java is significant, in theory this might become a problem.

Though whether it will or will not amount to a bottleneck indeed remains to be seen, as the code is also doing other things that are potentially expensive, and possibly more so than the regex matching. So the exercise here is merely to see whether there is a notable performance difference in regex pattern evaluation that can't simply be attributed to programming mistakes (and apparently there is).

-hilmar
I am asking because if your
> real life application uses regexp to look into long string, the
> performance might be totally different.
> What is your aim - 3 seconds for 500K searches do not seem
> particularly slow to me.
>
> Thanks
> Peter
>
> On 24 October 2012 19:10, P. Troshin wrote:
>> Hi Hilmar,
>>
>> Hmm, it looks like I spoke too soon; the previous run was doing
>> nothing as all of the cases were commented out.
>> I can now see that the results of my runs are not massively different
>> from that of yours.
>> It would help if you could encourage your student to write a few unit
>> tests so that we know what you are trying to achieve and to simplify
>> the testing.
>>
>> Just a thought
>>
>> Thanks,
>> Peter

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From daniel.quest at gmail.com Thu Oct 25 03:51:53 2012
From: daniel.quest at gmail.com (daniel.quest at gmail.com)
Date: Wed, 24 Oct 2012 22:51:53 -0500
Subject: [Biojava-l] regex performance in Java
In-Reply-To: <34DFDF13-5014-4383-A3D3-BAF793AC2E23@drycafe.net>
References: <1B62BC3E-B005-4484-AE66-0B8F407E4756@drycafe.net> <874nllo4fx.fsf@newcastle.ac.uk> <4F9982AE-FA76-41BF-BF1B-58B7889F03AB@drycafe.net> <34DFDF13-5014-4383-A3D3-BAF793AC2E23@drycafe.net>
Message-ID:

Have you ever used uima? Same software used on the IBM Watson project. Very very powerful.
http://uima.apache.org/

Dan

Sent from my iPhone

On Oct 24, 2012, at 10:45 PM, Hilmar Lapp wrote:

> The code is a very small snippet from a natural language processing software aimed at extracting structured phenotype descriptions from un- or semistructured free text. Apparently the code as is (in Perl) makes a lot of regular expression matches, and so if the speed difference for them between Perl and Java is significant, in theory this might become a problem. Though whether it will or will not amount to a bottleneck indeed remains to be seen, as the code is also doing other things that are potentially expensive, and possibly more so than the regex matching.
>
> So the exercise here is merely to see whether there is a notable performance difference in regex pattern evaluation that can't simply be attributed to programming mistakes (and apparently there is).
>
> -hilmar

From andreas at sdsc.edu Wed Oct 31 20:48:07 2012
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 31 Oct 2012 13:48:07 -0700
Subject: [Biojava-l] open-bio servers moving
Message-ID:

Hi,

The portal.open-bio.org server that is hosting the biojava wiki site is currently down, since it is being moved to a new location on the Amazon Cloud. It may take a few days until everything has been set up properly and the wiki will be back.

If there is anybody on the biojava side who wants to join the open bioinformatics foundation's sysadmin team and help out with projects like this one, this would be a good moment to volunteer...

Andreas