From james at carmanconsulting.com Thu Jul 3 11:06:40 2008
From: james at carmanconsulting.com (James Carman)
Date: Thu, 3 Jul 2008 11:06:40 -0400
Subject: [Biojava-l] ParseException: Could not understand position: bond(39, 96
Message-ID:

I'm trying to parse the file:

ftp://ftp.ncbi.nih.gov/refseq/release/vertebrate_mammalian/vertebrate_mammalian12.protein.gpff.gz

using:

RichSequence.IOTools.readGenbankProtein()

and I keep getting this error (the date column is from my build server
which runs this "loader", sorry):

[10:51:36]: org.biojava.bio.BioException: Could not read sequence
[10:51:36]: at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
[10:51:36]: at com.pg.iip.loader.pubrec.RefSeqLoader.loadPublicRecords(RefSeqLoader.java:106)
[10:51:36]: at com.pg.iip.loader.pubrec.PublicRecordLoader.doLoad(PublicRecordLoader.java:248)
[10:51:36]: at com.pg.iip.loader.AbstractLoader.execute(AbstractLoader.java:56)
[10:51:36]: at com.pg.iip.loader.LoaderUtils.executeLoader(LoaderUtils.java:20)
[10:51:36]: at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[10:51:36]: at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[10:51:36]: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[10:51:36]: at java.lang.reflect.Method.invoke(Method.java:585)
[10:51:36]: at com.pg.iip.loader.plugin.RunLoaderMojo.invoke(RunLoaderMojo.java:95)
[10:51:36]: at com.pg.iip.loader.plugin.RunLoaderMojo.execute(RunLoaderMojo.java:142)
[10:51:36]: at org.apache.maven.plugin.DefaultPluginManager.executeMojo(DefaultPluginManager.java:447)
[10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:539)
[10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeStandaloneGoal(DefaultLifecycleExecutor.java:493)
[10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoal(DefaultLifecycleExecutor.java:463)
[10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalAndHandleFailures(DefaultLifecycleExecutor.java:311)
[10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeTaskSegments(DefaultLifecycleExecutor.java:278)
[10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.execute(DefaultLifecycleExecutor.java:143)
[10:51:36]: at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:333)
[10:51:36]: at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:126)
[10:51:36]: at org.apache.maven.cli.MavenCli.main(MavenCli.java:282)
[10:51:36]: at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[10:51:36]: at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[10:51:36]: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[10:51:36]: at java.lang.reflect.Method.invoke(Method.java:585)
[10:51:36]: at org.codehaus.classworlds.Launcher.launchEnhanced(Launcher.java:315)
[10:51:36]: at org.codehaus.classworlds.Launcher.launch(Launcher.java:255)
[10:51:36]: at org.codehaus.classworlds.Launcher.mainWithExitCode(Launcher.java:430)
[10:51:36]: at org.codehaus.classworlds.Launcher.main(Launcher.java:375)
[10:51:36]: Caused by: org.biojava.bio.seq.io.ParseException: Could not understand position: bond(39,96
[10:51:36]: at org.biojavax.bio.seq.io.GenbankLocationParser.parsePosition(GenbankLocationParser.java:285)
[10:51:36]: at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocString(GenbankLocationParser.java:271)
[10:51:36]: at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocation(GenbankLocationParser.java:131)
[10:51:36]: at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:490)
[10:51:36]: at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
[10:51:36]: ... 28 more

Does the parser not understand "Bond" features?
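For illustration only: the location string named in the exception has the form bond(x,y), with the closing parenthesis apparently already stripped by the time the message is built. The following is a minimal sketch of recognising such a string, assuming that form. BondLocation is a hypothetical helper; it is not part of BioJava and is not the patch attached to the bug report.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of BioJava.
class BondLocation {
    // Accepts "bond(39,96)" and also the form without the closing
    // parenthesis, as it appears in the error message above.
    private static final Pattern BOND =
            Pattern.compile("bond\\((\\d+),\\s*(\\d+)\\)?");

    final int first;
    final int second;

    private BondLocation(int first, int second) {
        this.first = first;
        this.second = second;
    }

    // Returns the two bonded positions, or empty if the string
    // is not a bond location of the assumed form.
    static Optional<BondLocation> parse(String location) {
        Matcher m = BOND.matcher(location.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new BondLocation(
                Integer.parseInt(m.group(1)),
                Integer.parseInt(m.group(2))));
    }

    public static void main(String[] args) {
        System.out.println(parse("bond(39,96)").isPresent());
        System.out.println(parse("join(1..5)").isPresent());
    }
}
```

A caller could use a check like this either to skip such features or to record the two positions; which of the two is appropriate depends on what the application needs from the record.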
From dicknetherlands at gmail.com Thu Jul 3 11:17:11 2008
From: dicknetherlands at gmail.com (Richard Holland)
Date: Thu, 3 Jul 2008 16:17:11 +0100
Subject: [Biojava-l] ParseException: Could not understand position: bond(39, 96
In-Reply-To:
References:
Message-ID:

Apparently not. I don't think they're part of the formal Genbank
specification, or at least not the one that was current at the time
the parser was written (in 2004). If they were, then we must have
missed them out by accident. Sorry! Could you raise a bug report via
BugZilla on the BioJava website? Someone will look into it as soon
as they get a chance.

cheers,
Richard

2008/7/3 James Carman :
> Does the parser not understand "Bond" features?
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From james at carmanconsulting.com Thu Jul 3 11:19:32 2008
From: james at carmanconsulting.com (James Carman)
Date: Thu, 3 Jul 2008 11:19:32 -0400
Subject: [Biojava-l] ParseException: Could not understand position: bond(39, 96
In-Reply-To:
References:
Message-ID:

Ok, great! I just wanted to make sure I wasn't doing something
stupid! :) I'll file the BugZilla issue now (and download the source
so that I can hopefully provide a patch).

From james at carmanconsulting.com Thu Jul 3 14:52:52 2008
From: james at carmanconsulting.com (James Carman)
Date: Thu, 3 Jul 2008 14:52:52 -0400
Subject: [Biojava-l] ParseException: Could not understand position: bond(39, 96
In-Reply-To:
References:
Message-ID:

Richard,

I filed the BugZilla issue:

http://bugzilla.open-bio.org/show_bug.cgi?id=2536

I also attached a patch that I believe fixes the issue (it includes a
test case). I hope that helps!

James

From james at carmanconsulting.com Thu Jul 3 19:07:51 2008
From: james at carmanconsulting.com (James Carman)
Date: Thu, 3 Jul 2008 19:07:51 -0400
Subject: [Biojava-l] ParseException: Could not understand position: bond(39, 96
In-Reply-To:
References:
Message-ID:

I added a new patch that actually fixes the problem (you really should
halt your build when a test case fails by the way :). Basically, it
just skips over "Bond" features.

From james at carmanconsulting.com Sat Jul 5 07:46:41 2008
From: james at carmanconsulting.com (James Carman)
Date: Sat, 5 Jul 2008 07:46:41 -0400
Subject: [Biojava-l] Maven2...
Message-ID:

Would the biojava project be interested in being "mavenized"? I'd be
willing to help get you guys set up if you'd like. Also, it'd be nice
to have biojava in the main maven repository.

From dicknetherlands at gmail.com Sat Jul 5 08:09:51 2008
From: dicknetherlands at gmail.com (Richard Holland)
Date: Sat, 5 Jul 2008 13:09:51 +0100
Subject: [Biojava-l] Maven2...
In-Reply-To:
References:
Message-ID:

Hello. BioJava 3 will make use of Maven. It's currently undergoing
some use-case development to work out what to work on first, but we
have a shell of a maven project already in our subversion hierarchy
(under the biojava3 branch of the biojava-live project) and will set
it up in the main maven repository when it's ready for release.

Thanks for the offer though. If you're keen, you could go ahead and
maven-ize the existing BioJava JAR files (version 1.6)? But, you would
need to preserve the existing Ant config as well so that existing
users are not affected.

cheers,
Richard

From ayates at ebi.ac.uk Mon Jul 7 04:35:34 2008
From: ayates at ebi.ac.uk (Andy Yates)
Date: Mon, 07 Jul 2008 09:35:34 +0100
Subject: [Biojava-l] Maven2...
In-Reply-To:
References:
Message-ID: <4871D556.8020307@ebi.ac.uk>

In my experience, Mavenizing an existing build system is never a good
idea. What would probably be of more use to people is a generated POM,
with the biojava files uploaded to a maven repository (or hosted on our
website). That would keep people who use dependency management systems
happy (I think buildr, raven & the like can use the same systems as
Maven2), and it means we don't have to go through the heartache of
reconfiguring Maven or our codebase to be friendly to one or the other.

Andy

From martin.jones at ed.ac.uk Thu Jul 10 06:13:28 2008
From: martin.jones at ed.ac.uk (Martin Jones)
Date: Thu, 10 Jul 2008 11:13:28 +0100
Subject: [Biojava-l] RichSequenceIterator.nextSequence does not move to next sequence when an exception is thrown
Message-ID:

Hi,

I have a file containing GenBank records, and I want to process them thus:

RichSequenceIterator seqs = RichSequence.IOTools.readGenbankDNA(myReader, null);
while (seqs.hasNext()) {
    RichSequence seq = seqs.nextRichSequence();
    // processing code
}

however, some records cannot be parsed by biojava... this is to be expected
as I'm processing half a million records - some are bound to be wonky. So I
use a try-catch to skip over troublesome records:

RichSequenceIterator seqs = RichSequence.IOTools.readGenbankDNA(myReader, null);
while (seqs.hasNext()) {
    try {
        RichSequence seq = seqs.nextRichSequence();
        // processing code
    } catch (BioException e) {
        System.out.println("record could not be parsed!");
    }
}

However, it seems that the position in the input file is not changed if an
exception is thrown during parsing. If I run the above code on a file
containing a single un-parseable record, it gets stuck in a non-terminating
loop - i.e. each time seqs.nextRichSequence() is called, an exception is
thrown, but seqs.hasNext() still returns true. Is there a correct way to
deal with this?
I could split up my input file into multiple records and do something like: ArrayList records = splitGenBankFileIntoRecords(); for (String singleRecord : records){ BufferedReader singleRecordReader = new BufferedReader(new StringReader(singleRecord)); RichSequenceIterator seqs = RichSequence.IOTools.readGenbankDNA(singleRecordReader, null); try{ RichSequence seq = seqs.nextRichSequence(); // processing code } catch (BioException e){ System.out.println("record count not be parsed!"); } } but this seems inefficient, as I have to instantiate a new StringReader, BufferedReader and RichSequenceIterator for every record (half a milion cycles of object creation/destruction!) Any ideas? -- ------------------------ Martin Jones School of Biological Sciences, Ashworth Laboratories, King's Buildings Edinburgh, EH9 3JT, UK From dicknetherlands at gmail.com Thu Jul 10 06:21:30 2008 From: dicknetherlands at gmail.com (Richard Holland) Date: Thu, 10 Jul 2008 11:21:30 +0100 Subject: [Biojava-l] RichSequenceIterator.nextSequence does not move to next sequence when an exception is thrown In-Reply-To: References: Message-ID: Hello. You appear to have hit a bit of a limitation with the system. The sequence iterator doesn't know how to skip over bad records (in fact, the parsers themselves do not - they just give up at the first sign of a failed line). I'll have to have a think about how to fix this, as it's not immediately obvious (although it definitely needs to be done). cheers, Richard 2008/7/10 Martin Jones : > Hi, > > I have a file containing GenBank records, and I want to process them thus: > > RichSequenceIterator seqs = RichSequence.IOTools.readGenbankDNA(myReader, > null); > while (seqs.hasNext()) { > RichSequence seq = seqs.nextRichSequence(); > // processing code > } > > however, some records cannot be parsed by biojava... this is to be expected > as I'm processing half a million records - some are bound to be wonky. 
So I > use a try-catch to skip over troublesome records: > > > RichSequenceIterator seqs = RichSequence.IOTools.readGenbankDNA(myReader, > null); > while (seqs.hasNext()) { > try{ > RichSequence seq = seqs.nextRichSequence(); > // processing code > } catch (BioException e){ > System.out.println("record count not be parsed!"); > } > } > > However, it seems that the position in the input file is not changed if an > exception is thrown during parsing. If I run the above code on a file > containing a single un-parseable record, it gets stuck in a non-terminating > loop - i.e. each time seqs.nextRichSequence() is called, an exception is > thrown, but seqs.hasNext() still returns true. Is there a correct way to > deal with this? I could split up my input file into multiple records and do > something like: > > ArrayList records = splitGenBankFileIntoRecords(); > for (String singleRecord : records){ > BufferedReader singleRecordReader = new BufferedReader(new > StringReader(singleRecord)); > RichSequenceIterator seqs = > RichSequence.IOTools.readGenbankDNA(singleRecordReader, null); > try{ > RichSequence seq = seqs.nextRichSequence(); > // processing code > } catch (BioException e){ > System.out.println("record count not be parsed!"); > } > > } > > but this seems inefficient, as I have to instantiate a new StringReader, > BufferedReader and RichSequenceIterator for every record (half a milion > cycles of object creation/destruction!) > > Any ideas? 

From james at carmanconsulting.com Thu Jul 10 07:30:32 2008
From: james at carmanconsulting.com (James Carman)
Date: Thu, 10 Jul 2008 07:30:32 -0400
Subject: [Biojava-l] RichSequenceIterator.nextSequence does not move to next sequence when an exception is thrown
In-Reply-To:
References:
Message-ID:

Ooooh. That's nasty. I just re-wrote one of our "loaders" because it was doing exactly that, breaking the file up into records and then using the parser to parse each one individually. I guess that's why they were doing that. I'll have to back out my changes. Good to know! Perhaps they should have put in a comment?! :)

From anisahghoorah at hotmail.com Thu Jul 17 06:06:37 2008
From: anisahghoorah at hotmail.com (Anisah Ghoorah)
Date: Thu, 17 Jul 2008 11:06:37 +0100
Subject: [Biojava-l] Nexus file parser
In-Reply-To:
References:
Message-ID:

Hi,

I would like to parse a nexus file and get the alignment from the DATA block. I'm not sure how the NexusFileListener works. Is there any code available that illustrates how to parse a nexus file.

Many thanks,
Anisah

From anisahghoorah at hotmail.com Thu Jul 17 06:09:14 2008
From: anisahghoorah at hotmail.com (Anisah Ghoorah)
Date: Thu, 17 Jul 2008 11:09:14 +0100
Subject: [Biojava-l] nexus file parser
In-Reply-To:
References:
Message-ID:

Hi,

I would like to parse a nexus file and get the alignment from the DATA block. I'm not sure how the NexusFileListener works. Is there any code available that illustrates how to parse a nexus file.

Many thanks,
Anisah

From dicknetherlands at gmail.com Thu Jul 17 07:21:45 2008
From: dicknetherlands at gmail.com (Richard Holland)
Date: Thu, 17 Jul 2008 12:21:45 +0100
Subject: [Biojava-l] nexus file parser
In-Reply-To:
References:
Message-ID:

Hello.
If you pass an instance of NexusFileBuilder to the NexusFileFormat parse methods, it will construct a NexusFile instance in memory which you can get by calling getNexusFile() after parsing has finished. You can then iterate over the blocks of the NexusFile by using the blockIterator() method. Each block returned is a class that implements the NexusObject interface. You can find out which type of block it is using instanceof, and thus find the DataBlock instance. You can then cast to DataBlock (which extends CharactersBlock) and use the methods from that to explore the alignment.

cheers,
Richard

From markjschreiber at gmail.com Thu Jul 17 08:33:11 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 17 Jul 2008 20:33:11 +0800
Subject: [Biojava-l] [Biojava-dev] [Fwd: large genbank data]
In-Reply-To: <483E0CA2.4010906@asti.dost.gov.ph>
References: <483E0CA2.4010906@asti.dost.gov.ph>
Message-ID: <93b45ca50807170533k24af6231s89b257bce5c740ad@mail.gmail.com>

Hi -

Is the code throwing an exception or running out of memory?

Can you send an example program and the problem you encounter to the list?

- Mark

On Thu, May 29, 2008 at 9:53 AM, Rey Vincent Babilonia wrote:
> -------- Original Message --------
> Subject: large genbank data
> Date: Wed, 28 May 2008 18:02:48 +0800
> From: Rey Vincent Babilonia
> To: biojava-l at biojava.org
>
> hi,
>
> anybody tried uploading a large genbank data (e.g.
> ftp://bio-mirror.net/biomirror/genbank/gbbct1.seq.gz) to biosql?
> load_seqdatabase.pl of bioperl can do this. i'm switching to biojava and
> it can't read the sequence (maybe because it has 30000+ sequences).
>
> thanks.
>
> --
> /**
>  * @author Rey Vincent P. Babilonia
>  * @number +63 2 426 9760 local 1302
>  * @pgp 0x383454CF pgp.mit.edu
>  * @project Philippine Bioinformatics Solutions
>  * @program Philippine e-Science Grid
>  * @division Research and Development Division
>  * @agency Advanced Science and Technology Institute
>  * @url http://www.psigrid.gov.ph
>  */

From markjschreiber at gmail.com Thu Jul 17 08:40:31 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 17 Jul 2008 20:40:31 +0800
Subject: [Biojava-l] problems installing biojava on Windows XP professional
In-Reply-To: <406135.22320.qm@web94608.mail.in2.yahoo.com>
References: <406135.22320.qm@web94608.mail.in2.yahoo.com>
Message-ID: <93b45ca50807170540u2cc9a797mb4572fe5cb54599d@mail.gmail.com>

Hi -

First off, depending on the version of biojava you downloaded you may need Java 5 (JDK 1.5) or later.

Secondly, you need to add JAR files to the CLASSPATH variable, not the PATH variable. PATH is where Windows searches for executables.

- Mark

On Tue, Apr 29, 2008 at 1:22 PM, arunabha banerjee wrote:
> Hello,
>
> I am new to using biojava. I am trying to install biojava on a PC running
> Windows XP professional. I am using Java 2 SDK version 1.4.2. I have
> downloaded the files in the "binaries" directory in the download area of the
> biojava server to the directory "C:\biojava" on my computer. I have added the
> string
>
> "C:\biojava;C:\biojava\biojava.jar;C:\biojava\xerces.jar;C:\biojava\bytecode.jar;"
>
> to my PATH variable. When I try to compile one of the simple demo files,
> like AlphabetExample.java, I get error messages saying that the packages
> "org.biojava.bio.symbol.*" and "org.biojava.bio.seq.*" can't be found. Is
> there something else I have to do to get the biojava files installed
> correctly?
>
> Thanks -
> Arunabha Banerjee

From markjschreiber at gmail.com Thu Jul 17 08:44:08 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 17 Jul 2008 20:44:08 +0800
Subject: [Biojava-l] Problem parsing biojava xml file
In-Reply-To: <4846DBFE.1060105@mpi-cbg.de>
References: <4846DBFE.1060105@mpi-cbg.de>
Message-ID: <93b45ca50807170544i7b7f52cfwbbbde0c844053f78@mail.gmail.com>

Hi -

In the past I have seen this when there are invisible metacharacters in the stream or file before the XML proper starts. This can happen with language variants of Unicode. Try trimming the String before parsing.

- Mark

On Thu, Jun 5, 2008 at 2:16 AM, benn wrote:
> Hello,
>
> Sorry to pepper the board with questions! I am working on BLAST
> parsing and have the standard output for BLAST working fine with JUnit
> tests. So I am attempting to recreate this for files in XML format coming
> from blast (blastp), however I have the problem that I get a SAXException
> that content is not allowed before prolog. I thought I could have some
> invisible characters which are causing it to throw a wobbly but I cannot see
> any. Has anyone else come across the problem? For completeness I have
> attached the blast file and the code to parse is below:
>
>     private List parseBlast(String filename)
>             throws IOException, SAXException, BioException {
>
>         InputStream is = new FileInputStream(
>                 "src/test/resources/blast/standardoutput.blastp");
>
>         BlastXMLParserFacade parser = new BlastXMLParserFacade();
>         SeqSimilarityAdapter adapter = new SeqSimilarityAdapter();
>         parser.setContentHandler(adapter);
>         List results = new ArrayList();
>
>         SearchContentHandler builder = new BlastLikeSearchBuilder(results,
>                 new DummySequenceDB("queries"),
>                 new DummySequenceDBInstallation());
>
>         adapter.setSearchContentHandler(builder);
>
>         parser.parse(new InputSource(is));
>         return results;
>     }
>
> Cheers,
>
> Neil

From markjschreiber at gmail.com Thu Jul 17 08:50:15 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 17 Jul 2008 20:50:15 +0800
Subject: [Biojava-l] Important notice about email handling on BioJava lists
Message-ID: <93b45ca50807170550y722ebdc2qd4a1bb36b3b32206@mail.gmail.com>

Hi -

A lot of old emails just got posted to the list. This usually happens because messages that contain attachments or HTML get blocked by our aggressive spam filter. When our overworked admins get around to confirming they are not spam they eventually get through but probably too late to be of much help to you.

Therefore... For prompt service when asking for help:

1) USE ONLY TEXT FORMAT EMAIL (NO HTML)
2) DON'T ADD ATTACHMENTS. If you want to post code just copy it in the body of the email.

Although this might be a bit draconian we used to get badly spammed on the list so this is one of the easiest ways around it.

Thanks,
- Mark

From ap3 at sanger.ac.uk Thu Jul 17 08:49:41 2008
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Thu, 17 Jul 2008 13:49:41 +0100
Subject: [Biojava-l] biojava mailing lists
Message-ID: <66377475-9986-4824-820F-A36F4AC979D9@sanger.ac.uk>

Hi,

You might have noticed a number of emails getting through to the mailing lists today with big delay. This happens if you post to the mailing list without being subscribed to it. In order to avoid spam both lists only accept postings from list members. Anybody can become a list member, so please subscribe before you post. If you send without being subscribed your mail will get stuck in the moderation loop, which can cause several weeks of delay (no fun to read through all that spam).

Andreas

-----------------------------------------------------------------------
Andreas Prlic
Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891
-----------------------------------------------------------------------

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

From dicknetherlands at gmail.com Thu Jul 17 15:14:39 2008
From: dicknetherlands at gmail.com (Richard Holland)
Date: Thu, 17 Jul 2008 20:14:39 +0100
Subject: [Biojava-l] Problem while parsing GenBank-like files and persiting them using Hibernate
In-Reply-To: <480844DB.6070808@uni-tuebingen.de>
References: <480844DB.6070808@uni-tuebingen.de>
Message-ID:

I can't remember if I answered something like this before or not... anyhow here goes just in case!

> 1. Is there a way to read in files downloaded from Ensembl using only the
> designated BioJavaX classes?

You could use the original ones and do some plain-text parsing of your own on the 'unrich' data.
The 'rich' parsers adhere strictly to the official format, which does not include the Ensembl extensions (exon etc.). Therefore any attempt to 'enrich' the data will attempt to force it into the standard format, which as you see causes non-standard bits either to get skipped or converted into some kind of catch-all data type (such as 'any').

> 2. How can I extend the terms so that not only "SOME X-specific terms" are
> included, but some more? And how do I tell the parser to use and apply these
> terms? Or more generally, can I somehow read in an ontology (for instance
> the GO), persist it in BioSQL and make use of the terms contained therein?

It's a bit hard. I could have made this code easier to extend I think - wasn't planning on non-standard versions when I wrote it! Essentially the way to do this is to locate the appropriate XYZFormat.Terms class in an IDE such as Eclipse or NetBeans, then find a term similar to the one you want to use (in your case, you want to add 'exon' so find something similar in the GenbankFormat.Terms class), highlight it and do a 'find all usages'. That'll pretty quickly point you to the parts of the code which use the term. Add your new term to the XYZFormat.Terms class, then insert extra code in all the parts that 'find all usages' highlighted.

> 3. How can I persist a sequence from Ensembl within a BioSQL database using
> Hibernate even though they use different accession numbers?

Find the regex and modify it to accept Ensembl-style accessions. Then, use 'find all usages' on the regex to find the place that uses it and modify those accordingly to pick up the correct groups from the regex and assign them to the data model, particularly if you reordered brackets etc. and therefore renumbered the groups in the regex.
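
To illustrate the point about group numbering, here is a minimal Java sketch. The two patterns and the class name AccessionRegexSketch are made up for this example and are not the expressions used inside BioJava; the Ensembl-style form assumed here (ENS, optional letters, 11 digits) is likewise only an illustration. The widened pattern puts its alternatives inside non-capturing groups, so group 1 is still the accession and group 2 is still the version, and code that reads those groups should not need renumbering.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example: these patterns are NOT taken from the BioJava source.
class AccessionRegexSketch {

    // Stand-in for a strict accession pattern:
    // group 1 = accession, group 2 = optional version number.
    static final Pattern STRICT =
            Pattern.compile("^([A-Z]{1,2}_?\\d{5,9})(?:\\.(\\d+))?$");

    // Widened to also accept an assumed Ensembl-style accession.
    // The alternatives sit in non-capturing groups (?: ... ), so the
    // capturing groups keep the same numbers as in STRICT.
    static final Pattern WIDENED =
            Pattern.compile("^((?:[A-Z]{1,2}_?\\d{5,9})|(?:ENS[A-Z]*\\d{11}))(?:\\.(\\d+))?$");

    // Returns {accession, version}; version is null when absent.
    // Returns null when the input does not match the pattern at all.
    static String[] parse(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] inputs = { "AE005174", "NM_000546.5", "ENSG00000139618.2" };
        for (String input : inputs) {
            System.out.println(input
                    + " strict=" + Arrays.toString(parse(STRICT, input))
                    + " widened=" + Arrays.toString(parse(WIDENED, input)));
        }
    }
}
```

If the real expression has to be restructured so that the group numbers do change, every place that 'find all usages' turns up needs the new numbers, which is the extra work described above.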

cheers,
Richard

From dicknetherlands at gmail.com Thu Jul 17 15:15:04 2008
From: dicknetherlands at gmail.com (Richard Holland)
Date: Thu, 17 Jul 2008 20:15:04 +0100
Subject: [Biojava-l] Constructing Backbone of Protein
In-Reply-To: <165609.65933.qm@web51412.mail.re2.yahoo.com>
References: <165609.65933.qm@web51412.mail.re2.yahoo.com>
Message-ID:

Not sure. Andreas Prlic should know. Andreas....?

2008/5/13 Armita Sheari :
> Hi everyone,
>
> I need to write a program that can construct the backbone of the protein
> from its sequence and the relevant phi and psi angles. I want to know if
> there is a class or method that can help me to calculate the coordinates
> from phi and psi angles!
>
> thanks,
> ArmitaSh

From rvincent at asti.dost.gov.ph Thu Jul 17 21:59:47 2008
From: rvincent at asti.dost.gov.ph (Rey Vincent Babilonia)
Date: Fri, 18 Jul 2008 09:59:47 +0800
Subject: [Biojava-l] [Biojava-dev] [Fwd: large genbank data]
In-Reply-To: <93b45ca50807170533k24af6231s89b257bce5c740ad@mail.gmail.com>
References: <483E0CA2.4010906@asti.dost.gov.ph> <93b45ca50807170533k24af6231s89b257bce5c740ad@mail.gmail.com>
Message-ID: <487FF913.6090504@asti.dost.gov.ph>

Hi Mark,

At first it throws an out of memory exception. My workaround is to subdivide the sequence file into individual GenBank files.

The error now is that if a GenBank sequence has an 'empty alphabet', it does not get loaded to BioSQL. My workaround is to check if sequence.getAlphabet().getName() is DNA.

Thanks.

--
/**
 * @author Rey Vincent P. Babilonia
 * @number +63 2 426 9760 local 1302
 * @pgp 0x383454CF pgp.mit.edu
 * @project Philippine Bioinformatics Solutions
 * @program Philippine e-Science Grid
 * @division Research and Development Division
 * @agency Advanced Science and Technology Institute
 * @url http://www.psigrid.gov.ph
 */

From rvincent at asti.dost.gov.ph Fri Jul 18 04:12:15 2008
From: rvincent at asti.dost.gov.ph (Rey Vincent Babilonia)
Date: Fri, 18 Jul 2008 16:12:15 +0800
Subject: [Biojava-l] [Biojava-dev] [Fwd: large genbank data]
In-Reply-To: <487FF913.6090504@asti.dost.gov.ph>
References: <483E0CA2.4010906@asti.dost.gov.ph> <93b45ca50807170533k24af6231s89b257bce5c740ad@mail.gmail.com> <487FF913.6090504@asti.dost.gov.ph>
Message-ID: <4880505F.7010308@asti.dost.gov.ph>

Hi Mark,

What is the maximum sequence length that a RichSequence can handle?

    java -Xms1024m -Xmx1256m -jar loader.jar
    .
    16:09:00,173 INFO Loader:296 - D:\AE005174.gbk is readable.
    16:09:06,704 INFO Loader:326 - Loading sequence AE005174 with identifier 56384585, length 5528445 and alphabet DNA...
    org.hibernate.PropertyAccessException: Exception occurred inside getter of org.biojavax.bio.seq.SimpleRichSequence.sequenceLength

From dicknetherlands at gmail.com Fri Jul 18 04:47:08 2008
From: dicknetherlands at gmail.com (Richard Holland)
Date: Fri, 18 Jul 2008 09:47:08 +0100
Subject: [Biojava-l] [Biojava-dev] [Fwd: large genbank data]
In-Reply-To: <4880505F.7010308@asti.dost.gov.ph>
References: <483E0CA2.4010906@asti.dost.gov.ph> <93b45ca50807170533k24af6231s89b257bce5c740ad@mail.gmail.com> <487FF913.6090504@asti.dost.gov.ph> <4880505F.7010308@asti.dost.gov.ph>
Message-ID:

In order to persist to BioSQL, BioJava has to convert the symbol list into a string so that it can pass it to JDBC via Hibernate. Therefore the maximum length of a sequence you wish to persist to BioSQL is the maximum length of a string in Java, which is 65536 (2^16) if you are working in a UTF-8 environment.

From james at carmanconsulting.com Fri Jul 18 06:45:50 2008
From: james at carmanconsulting.com (James Carman)
Date: Fri, 18 Jul 2008 06:45:50 -0400
Subject: [Biojava-l] [Biojava-dev] [Fwd: large genbank data]
In-Reply-To:
References: <483E0CA2.4010906@asti.dost.gov.ph> <93b45ca50807170533k24af6231s89b257bce5c740ad@mail.gmail.com> <487FF913.6090504@asti.dost.gov.ph> <4880505F.7010308@asti.dost.gov.ph>
Message-ID:

That is a limitation for string literals, not any string. Correct?

From markjschreiber at gmail.com Fri Jul 18 09:17:28 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 18 Jul 2008 21:17:28 +0800
Subject: [Biojava-l] [Biojava-dev] [Fwd: large genbank data]
In-Reply-To:
References: <483E0CA2.4010906@asti.dost.gov.ph> <93b45ca50807170533k24af6231s89b257bce5c740ad@mail.gmail.com> <487FF913.6090504@asti.dost.gov.ph> <4880505F.7010308@asti.dost.gov.ph>
Message-ID: <93b45ca50807180617x7328c2b6r265939c89afd5f7a@mail.gmail.com>

Was looking on the internet ... So the Java spec says nothing about an upper limit however the Sun JDK implements String as a char[] (behind the scenes). Therefore I think that on the Sun JDK with the right amount of RAM you could go to 2^32 (except for string literals as mentioned above) which is 4,294,967,296 characters. So a string of a sequence should be able to get to about 4 billion bases. Of course if you don't assign enough memory to the JVM ( -Xmx4G) you won't be able to get close.
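
A small Java check of the limits discussed in this thread, offered as a sketch and not as a statement about BioJava or Hibernate internals. It relies only on the fact that String.length() returns an int, which caps any String at Integer.MAX_VALUE (2^31 - 1, roughly 2.1 billion) characters rather than 2^32; a real JVM may stop slightly short of that, and heap size is usually the tighter constraint. The class and method names are made up for illustration.

```java
// Hypothetical helper class, written only to illustrate the arithmetic.
class StringLimitSketch {

    // String.length() is an int, so this is the hard ceiling on String size.
    static final long MAX_STRING_CHARS = Integer.MAX_VALUE; // 2^31 - 1

    // True if a sequence with this many bases could, in principle,
    // be held in a single String (ignoring available heap).
    static boolean fitsInString(long bases) {
        return bases >= 0 && bases <= MAX_STRING_CHARS;
    }

    // Builds a String of n characters at runtime and returns its length,
    // showing that the 65,535-byte class-file limit on string literals
    // does not apply to strings created while the program runs.
    static int runtimeStringLength(int n) {
        return new String(new char[n]).length();
    }

    public static void main(String[] args) {
        System.out.println("runtime string length: " + runtimeStringLength(100000));
        System.out.println("5,528,445 bases fit:   " + fitsInString(5528445L));
        System.out.println("2^32 bases fit:        " + fitsInString(4294967296L));
    }
}
```

On this arithmetic alone the 5,528,445-base sequence from the log sits far below the String ceiling; the sketch says nothing about what actually triggered the PropertyAccessException.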
And even if you can assign that much, that doesn't account for all the
other Java overhead and all the stuff Hibernate is doing with proxy classes
etc. Also, BioSQL usually defines the sequence as a CLOB, so depending on
your DB implementation there may be a limit on that. On a 32-bit machine
4 GB is all a single process can address, so you would have issues trying
to do anything bigger.

Anyhow, I know I have stored human chromosome 1 (roughly 250 million bases)
in memory.

- Mark

On Fri, Jul 18, 2008 at 6:45 PM, James Carman wrote:
> That is a limitation for string literals, not any string. Correct?
>
> On Fri, Jul 18, 2008 at 4:47 AM, Richard Holland wrote:
>> In order to persist to BioSQL, BioJava has to convert the symbol list
>> into a string so that it can pass it to JDBC via Hibernate. Therefore
>> the maximum length of a sequence you wish to persist to BioSQL is the
>> maximum length of a string in Java, which is 65536 (2^16) if you are
>> working in a UTF-8 environment.
>>
>> 2008/7/18 Rey Vincent Babilonia :
>>> Hi Mark,
>>>
>>> What is the maximum sequence length that a RichSequence can handle?
>>>
>>> java -Xms1024m -Xmx1256m -jar loader.jar
>>> .
>>> 16:09:00,173 INFO Loader:296 - D:\AE005174.gbk is readable.
>>> 16:09:06,704 INFO Loader:326 - Loading sequence AE005174 with identifier
>>> 56384585, length 5528445 and alphabet DNA...
>>> org.hibernate.PropertyAccessException: Exception occurred inside getter of
>>> org.biojavax.bio.seq.SimpleRichSequence.sequenceLength

From ap3 at sanger.ac.uk Fri Jul 18 10:05:20 2008
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Fri, 18 Jul 2008 15:05:20 +0100
Subject: [Biojava-l] Constructing Backbone of Protein
In-Reply-To:
References: <165609.65933.qm@web51412.mail.re2.yahoo.com>
Message-ID: <1EB8C2A5-9387-47A8-8088-C5E492FD3FC0@sanger.ac.uk>

Hi Richard,

This email actually managed to find its way through to the list back in
May...
http://www.biojava.org/pipermail/biojava-l/2008-May/006211.html

Andreas

On 17 Jul 2008, at 20:15, Richard Holland wrote:
> Not sure. Andreas Prlic should know. Andreas....?
>
> 2008/5/13 Armita Sheari :
>> Hi everyone,
>>
>> I need to write a program that can construct the backbone of the protein
>> from its sequence and the relevant phi and psi angles.
>> I want to know if there is a class or method that can help me to
>> calculate the coordinates from phi and psi angles!
>>
>> thanks,
>> ArmitaSh

-----------------------------------------------------------------------
Andreas Prlic
Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891
-----------------------------------------------------------------------

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

From super_fx at msn.com Fri Jul 18 10:22:33 2008
From: super_fx at msn.com (Mohammed AlQuraishi)
Date: Fri, 18 Jul 2008 07:22:33 -0700
Subject: [Biojava-l] Constructing Backbone of Protein
In-Reply-To: <1EB8C2A5-9387-47A8-8088-C5E492FD3FC0@sanger.ac.uk>
References: <165609.65933.qm@web51412.mail.re2.yahoo.com>
 <1EB8C2A5-9387-47A8-8088-C5E492FD3FC0@sanger.ac.uk>
Message-ID:

In general it's not possible to accurately reconstruct a protein's backbone
strictly from phi/psi angles; you'd also need the bond lengths and bond
angles (especially important) to have an accurate reconstruction. It is,
however, possible to get an approximate reconstruction, particularly for
short protein fragments, if you use "standard" values for bond lengths and
angles, such as the ones here:

http://scripts.iucr.org/cgi-bin/paper?li0061

I don't know if biojava has any methods specific for this purpose, but the
link below describes how to reconstruct the coordinates if you have the
dihedral angles (and bond lengths and angles), and it doesn't require more
functionality than simple 3D transforms:

https://lists.sdsc.edu/pipermail/pdb-l/2002-December/000326.html

Hope this helps,
Mohammed

---
Mohammed AlQuraishi
McAdams and Shapiro Labs
Stanford University
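
The approach Mohammed describes can be sketched in a few lines of plain
Java. This is only an illustration, not a BioJava API: the class
BackboneSketch and all of its methods are hypothetical names, the bond
lengths and angles are rounded values assumed for the example (substitute
the "standard" values from the reference he links), and every peptide bond
is assumed to be planar and trans (omega = 180 degrees). Each new atom is
placed from the three atoms before it plus a bond length, a bond angle and
a torsion.

```java
public class BackboneSketch {
    // Assumed, rounded geometry (angstroms and degrees); not taken from the thread.
    static final double LEN_N_CA = 1.46, LEN_CA_C = 1.52, LEN_C_N = 1.33;
    static final double ANG_N_CA_C = 111.0, ANG_CA_C_N = 116.0, ANG_C_N_CA = 122.0;
    static final double OMEGA = 180.0; // assumed planar trans peptide bond

    static double[] sub(double[] a, double[] b) {
        return new double[] {a[0] - b[0], a[1] - b[1], a[2] - b[2]};
    }

    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    static double[] cross(double[] a, double[] b) {
        return new double[] {a[1] * b[2] - a[2] * b[1],
                             a[2] * b[0] - a[0] * b[2],
                             a[0] * b[1] - a[1] * b[0]};
    }

    static double[] unit(double[] a) {
        double n = Math.sqrt(dot(a, a));
        return new double[] {a[0] / n, a[1] / n, a[2] / n};
    }

    /** Places atom D from A, B, C, the C-D length, the B-C-D angle and the A-B-C-D torsion. */
    static double[] place(double[] a, double[] b, double[] c,
                          double len, double angleDeg, double torsionDeg) {
        double angle = Math.toRadians(angleDeg), torsion = Math.toRadians(torsionDeg);
        double[] bc = unit(sub(c, b));           // axis of rotation
        double[] n = unit(cross(sub(b, a), bc)); // normal of the A-B-C plane
        double[] m = cross(n, bc);               // completes the local frame
        double x = -len * Math.cos(angle);
        double y = len * Math.sin(angle) * Math.cos(torsion);
        double z = len * Math.sin(angle) * Math.sin(torsion);
        double[] d = new double[3];
        for (int i = 0; i < 3; i++) {
            d[i] = c[i] + x * bc[i] + y * m[i] + z * n[i];
        }
        return d;
    }

    /** Signed torsion A-B-C-D in degrees, used here only to check the result. */
    static double dihedral(double[] a, double[] b, double[] c, double[] d) {
        double[] b1 = sub(b, a), b2 = sub(c, b), b3 = sub(d, c);
        double[] n2 = cross(b2, b3);
        double y = Math.sqrt(dot(b2, b2)) * dot(b1, n2);
        double x = dot(cross(b1, b2), n2);
        return Math.toDegrees(Math.atan2(y, x));
    }

    /** Returns N, CA, C coordinates per residue; phi[0] and the last psi are not used. */
    static double[][] build(double[] phi, double[] psi) {
        int n = phi.length;
        double[][] xyz = new double[3 * n][];
        if (n == 0) {
            return xyz;
        }
        // First residue: N at the origin, CA on the x axis, C in the xy plane.
        double t = Math.toRadians(180.0 - ANG_N_CA_C);
        xyz[0] = new double[] {0, 0, 0};
        xyz[1] = new double[] {LEN_N_CA, 0, 0};
        xyz[2] = new double[] {LEN_N_CA + LEN_CA_C * Math.cos(t), LEN_CA_C * Math.sin(t), 0};
        for (int i = 1; i < n; i++) {
            int k = 3 * i;
            xyz[k] = place(xyz[k - 3], xyz[k - 2], xyz[k - 1], LEN_C_N, ANG_CA_C_N, psi[i - 1]);
            xyz[k + 1] = place(xyz[k - 2], xyz[k - 1], xyz[k], LEN_N_CA, ANG_C_N_CA, OMEGA);
            xyz[k + 2] = place(xyz[k - 1], xyz[k], xyz[k + 1], LEN_CA_C, ANG_N_CA_C, phi[i]);
        }
        return xyz;
    }

    public static void main(String[] args) {
        double[] phi = {-57, -57, -57, -57}; // helix-like angles, chosen for illustration
        double[] psi = {-47, -47, -47, -47};
        String[] names = {"N", "CA", "C"};
        double[][] xyz = build(phi, psi);
        for (int i = 0; i < xyz.length; i++) {
            System.out.printf("%-2s %d %8.3f %8.3f %8.3f%n",
                    names[i % 3], i / 3 + 1, xyz[i][0], xyz[i][1], xyz[i][2]);
        }
    }
}
```

Because the geometry is idealised, errors accumulate along the chain, which
is why this kind of reconstruction is only approximate and works best for
short fragments.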

From koen.bruynseels at cropdesign.com Fri Jul 18 10:48:23 2008
From: koen.bruynseels at cropdesign.com (koen.bruynseels at cropdesign.com)
Date: Fri, 18 Jul 2008 16:48:23 +0200
Subject: [Biojava-l] Koen Bruynseels is out of the office.
Message-ID:

I will be out of the office starting 18/07/2008 and will not return until
28/07/2008.

I will respond to your message when I return.

From dicknetherlands at gmail.com Fri Jul 18 11:44:49 2008
From: dicknetherlands at gmail.com (Richard Holland)
Date: Fri, 18 Jul 2008 16:44:49 +0100
Subject: [Biojava-l] [Biojava-dev] [Fwd: large genbank data]
In-Reply-To: <93b45ca50807180617x7328c2b6r265939c89afd5f7a@mail.gmail.com>
References: <483E0CA2.4010906@asti.dost.gov.ph>
 <93b45ca50807170533k24af6231s89b257bce5c740ad@mail.gmail.com>
 <487FF913.6090504@asti.dost.gov.ph>
 <4880505F.7010308@asti.dost.gov.ph>
 <93b45ca50807180617x7328c2b6r265939c89afd5f7a@mail.gmail.com>
Message-ID:

Hmm in that case it must be something else.

Your original mail only posted the first couple of lines of the stack
trace. Could you post the whole thing so we can take a closer look?
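
On the string-length question in this thread: String.length() returns an
int, so a String built at run time can hold up to 2^31 - 1 characters if
the heap allows, while the far smaller limit of 65,535 bytes applies only
to string constants stored in a class file. The sketch below is a minimal
check of that point, with the hypothetical names StringLengthSketch and
repeat; it builds a string as long as the AE005174 sequence from the log
(5,528,445 characters). It says nothing about Hibernate, JDBC or the CLOB
column, which may impose limits of their own.

```java
import java.util.Arrays;

public class StringLengthSketch {

    /** Returns a String made of n copies of the given character. */
    static String repeat(char base, int n) {
        char[] buf = new char[n];
        Arrays.fill(buf, base);
        return new String(buf);
    }

    public static void main(String[] args) {
        // Length taken from the AE005174 log line quoted in this thread.
        String seq = repeat('a', 5528445);
        System.out.println("runtime string length: " + seq.length());
        // length() returns an int, so this is the ceiling for any String.
        System.out.println("largest possible length: " + Integer.MAX_VALUE);
    }
}
```

If this runs without error under the same -Xmx setting as the loader, the
string conversion itself is unlikely to be what fails.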

From rvincent at asti.dost.gov.ph Sun Jul 20 22:35:04 2008
From: rvincent at asti.dost.gov.ph (Rey Vincent Babilonia)
Date: Mon, 21 Jul 2008 10:35:04 +0800
Subject: [Biojava-l] [Biojava-dev] [Fwd: large genbank data]
In-Reply-To:
References: <483E0CA2.4010906@asti.dost.gov.ph>
 <93b45ca50807170533k24af6231s89b257bce5c740ad@mail.gmail.com>
 <487FF913.6090504@asti.dost.gov.ph>
 <4880505F.7010308@asti.dost.gov.ph>
 <93b45ca50807180617x7328c2b6r265939c89afd5f7a@mail.gmail.com>
Message-ID: <4883F5D8.5030908@asti.dost.gov.ph>

Dear all,

Here's the complete stack trace:

10:26:14,796 INFO Loader:296 - D:\AE000521.gbk is readable.
10:26:16,046 INFO Loader:340 - Alphabet of AE000521 is Empty Alphabet. Skipping...
10:26:16,250 INFO Loader:296 - D:\AE004438.gbk is readable.
10:26:20,750 FATAL Loader:334 - Sequence AE004438 already exists.
10:26:20,921 INFO Loader:296 - D:\AE005174.gbk is readable.
10:26:28,328 INFO Loader:326 - Loading sequence AE005174 with identifier 56384585, length 5528445 and alphabet DNA...
org.hibernate.PropertyAccessException: Exception occurred inside getter of org.biojavax.bio.seq.SimpleRichSequence.sequenceLength
	at org.hibernate.property.BasicPropertyAccessor$BasicGetter.get(BasicPropertyAccessor.java:148)
	at org.hibernate.tuple.entity.AbstractEntityTuplizer.getPropertyValues(AbstractEntityTuplizer.java:256)
	at org.hibernate.tuple.entity.PojoEntityTuplizer.getPropertyValues(PojoEntityTuplizer.java:209)
	at org.hibernate.persister.entity.AbstractEntityPersister.getPropertyValues(AbstractEntityPersister.java:3581)
	at org.hibernate.event.def.DefaultMergeEventListener.copyValues(DefaultMergeEventListener.java:377)
	at org.hibernate.event.def.DefaultMergeEventListener.entityIsTransient(DefaultMergeEventListener.java:179)
	at org.hibernate.event.def.DefaultMergeEventListener.onMerge(DefaultMergeEventListener.java:123)
	at org.hibernate.event.def.DefaultMergeEventListener.onMerge(DefaultMergeEventListener.java:53)
	at org.hibernate.impl.SessionImpl.fireMerge(SessionImpl.java:677)
	at org.hibernate.impl.SessionImpl.merge(SessionImpl.java:661)
	at ph.gov.dost.asti.genbankers.Loader.load(Loader.java:328)
	at ph.gov.dost.asti.genbankers.Loader.<init>(Loader.java:137)
	at ph.gov.dost.asti.genbankers.Loader.main(Loader.java:416)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.lang.reflect.Method.invoke(Unknown Source)
	at org.hibernate.property.BasicPropertyAccessor$BasicGetter.get(BasicPropertyAccessor.java:145)
	... 12 more
Caused by: java.lang.NullPointerException
	at org.biojavax.bio.seq.SimpleRichSequence.length(SimpleRichSequence.java:91)
	at org.biojavax.bio.seq.SimpleRichSequence.getSequenceLength(SimpleRichSequence.java:97)
	... 17 more
10:26:28,937 ERROR AbstractBatcher:51 - Exception executing batch:
org.hibernate.StaleStateException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1
	at org.hibernate.jdbc.Expectations$BasicExpectation.checkBatched(Expectations.java:61)
	at org.hibernate.jdbc.Expectations$BasicExpectation.verifyOutcome(Expectations.java:46)
	at org.hibernate.jdbc.BatchingBatcher.checkRowCounts(BatchingBatcher.java:68)
	at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:48)
	at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:246)
	at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:266)
	at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:168)
	at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:298)
	at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27)
	at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1000)
	at ph.gov.dost.asti.genbankers.Loader.load(Loader.java:351)
	at ph.gov.dost.asti.genbankers.Loader.<init>(Loader.java:137)
	at ph.gov.dost.asti.genbankers.Loader.main(Loader.java:416)
10:26:28,937 ERROR AbstractFlushingEventListener:301 - Could not synchronize database state with session
org.hibernate.StaleStateException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1
	at org.hibernate.jdbc.Expectations$BasicExpectation.checkBatched(Expectations.java:61)
	at org.hibernate.jdbc.Expectations$BasicExpectation.verifyOutcome(Expectations.java:46)
	at org.hibernate.jdbc.BatchingBatcher.checkRowCounts(BatchingBatcher.java:68)
	at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:48)
	at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:246)
	at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:266)
	at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:168)
	at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:298)
	at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27)
	at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1000)
	at ph.gov.dost.asti.genbankers.Loader.load(Loader.java:351)
	at ph.gov.dost.asti.genbankers.Loader.<init>(Loader.java:137)
	at ph.gov.dost.asti.genbankers.Loader.main(Loader.java:416)
Exception in thread "main" org.hibernate.StaleStateException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1
	at org.hibernate.jdbc.Expectations$BasicExpectation.checkBatched(Expectations.java:61)
	at org.hibernate.jdbc.Expectations$BasicExpectation.verifyOutcome(Expectations.java:46)
	at org.hibernate.jdbc.BatchingBatcher.checkRowCounts(BatchingBatcher.java:68)
	at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:48)
	at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:246)
	at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:266)
	at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:168)
	at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:298)
	at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27)
	at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1000)
	at ph.gov.dost.asti.genbankers.Loader.load(Loader.java:351)
	at ph.gov.dost.asti.genbankers.Loader.<init>(Loader.java:137)
	at ph.gov.dost.asti.genbankers.Loader.main(Loader.java:416)

--
/**
 * @author Rey Vincent P. Babilonia
 * @number +63 2 426 9760 local 1302
 * @pgp 0x383454CF pgp.mit.edu
 * @project Philippine Bioinformatics Solutions
 * @program Philippine e-Science Grid
 * @division Research and Development Division
 * @agency Advanced Science and Technology Institute
 * @url http://www.psigrid.gov.ph
 */

From holland at eaglegenomics.com Mon Jul 21 05:28:46 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 21 Jul 2008 10:28:46 +0100
Subject: [Biojava-l] BioJava3 Use Cases
Message-ID:

Hi guys,

I'd like to repeat an earlier request for use cases to guide the new
BioJava 3 development work.
We have a wiki page for this but it hasn't seen many updates:

http://biojava.org/wiki/BioJava_3_Use_Cases

Could anyone who has a task which BioJava cannot currently achieve, or
does not achieve correctly, please add that task to this wiki page, so
that we can try and implement it in the new code. A template for a use
case has been provided on that same wiki page which you should follow
when submitting your own suggestions.

Basically the rule is that saying something like 'I want microarray
support' isn't likely to get much of a response, but asking for a
specific function, e.g. 'I want to be able to parse MAGE files' or 'I
want to use XYZ technique to analyse my own chip designs', will get you
a lot further.

I'm setting a cut-off date for the initial list of use-cases at August
1st. Whatever's on the page at that point will be considered for
implementation in the first phase of development over the next 6
months, along with updates or transfers of functionality from the
existing code base where appropriate. Anything that gets added to the
list after that date will only get implemented in the second later
phase, date indeterminate as yet, unless whoever submits the use case
also chooses to submit their own code to solve it!

cheers,
Richard

--
Richard Holland
Bioinformatics Software Developer
Eagle Genomics
http://www.eaglegenomics.com/

From charles at imbusch.net Wed Jul 23 05:40:30 2008
From: charles at imbusch.net (Charles Imbusch)
Date: Wed, 23 Jul 2008 11:40:30 +0200
Subject: [Biojava-l] parsing BLAST result
Message-ID: <4886FC8E.4070400@imbusch.net>

Hello,

for a project I have to parse Blast output files.
To do this I used the code provided on this page:

http://biojava.org/wiki/BioJava:CookBook:Blast:Parser

I'm interested in the start and stop positions of the subject I align
with, so I adjusted the code a bit so that it looks like:

    //list the hits
    for (Iterator k = result.getHits().iterator(); k.hasNext(); ) {
        SeqSimilaritySearchHit hit = (SeqSimilaritySearchHit)k.next();
        System.out.print("\tmatch: "+hit.getSubjectID());
        System.out.print("\tSubSeqStart: "+hit.getSubjectStart());
        System.out.print("\tSubSeqStop: "+hit.getSubjectEnd());
        System.out.println("\te score: "+hit.getEValue());
    }

I execute "java BlastParserOriginal S2431-F.fasta.txt" and have a look
at the best hit:

...
match: 48_scaffold.txt SubSeqStart: 3320 SubSeqStop: 2952643 e score: 0.0
...

The subject id is correct but the numbers are just nonsense. It should
be 610956 for the start and 610367 for the end position.

This doesn't happen with all Blast result files, only with some. Is
there a solution for that? How do you parse the Blast files?

I just uploaded the Blast output to http://charles.imbusch.net/tmp/

Any answer is appreciated.

Cheers,
Charles

From holland at eaglegenomics.com Wed Jul 23 14:20:53 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 23 Jul 2008 19:20:53 +0100
Subject: [Biojava-l] parsing BLAST result
In-Reply-To: <4886FC8E.4070400@imbusch.net>
References: <4886FC8E.4070400@imbusch.net>
Message-ID:

Your hits consist of numerous sub-hits, which means that the hits
themselves don't contain meaningful data.
You can get the sub-hits by doing this:

// existing code to list the hits
for (Iterator k = result.getHits().iterator(); k.hasNext(); ) {
    SeqSimilaritySearchHit hit = (SeqSimilaritySearchHit)k.next();
    System.out.print("\tmatch: "+hit.getSubjectID());
    System.out.print("\tSubSeqStart: "+hit.getSubjectStart());
    System.out.print("\tSubSeqStop: "+hit.getSubjectEnd());
    System.out.println("\te score: "+hit.getEValue());

    // new code to get the subhits
    System.out.println("\t\t Subhits:");
    for (Iterator j = hit.getSubHits().iterator(); j.hasNext(); ) {
        SeqSimilaritySearchSubHit subhit = (SeqSimilaritySearchSubHit)j.next();
        System.out.print("\t\tSubSeqStart: "+subhit.getSubjectStart());
        System.out.print("\t\tSubSeqStop: "+subhit.getSubjectEnd());
        System.out.println("\t\te score: "+subhit.getEValue());
    }
}

cheers,
Richard

--
Richard Holland
Bioinformatics Software Developer
Eagle Genomics
http://www.eaglegenomics.com/

From charles at imbusch.net Wed Jul 23 19:22:52 2008
From: charles at imbusch.net (Charles Imbusch)
Date: Thu, 24 Jul 2008 01:22:52 +0200
Subject: [Biojava-l] parsing BLAST result
In-Reply-To:
References: <4886FC8E.4070400@imbusch.net>
Message-ID: <4887BD4C.8090509@imbusch.net>

Thanks for that information. That did the job!

Cheers,
Charles

From peter.robinson at t-online.de Sat Jul 26 06:41:49 2008
From: peter.robinson at t-online.de (Peter Robinson)
Date: Sat, 26 Jul 2008 12:41:49 +0200
Subject: [Biojava-l] Installation woes
Message-ID: <488AFF6D.1000505@t-online.de>

Hi Biojava,

I am entirely
new to Biojava and have limited Java experience (C is more my thing), and so this is almost certainly a dumb question, but I cannot seem to find an answer in the online docs. I am running debian 4 linux and have:

java version "1.6.0_06"
Java(TM) SE Runtime Environment (build 1.6.0_06-b02)
Java HotSpot(TM) Server VM (build 10.0-b22, mixed mode)

I have downloaded the biojava code, unpacked it, and set the CLASSPATH in bashrc :

BIOJAVA_BASE=/home/peter/bin/biojava/biojava-live_1.6
export CLASSPATH=${BIOJAVA_BASE}/biojava.jar
export CLASSPATH=${CLASSPATH}:${BIOJAVA_BASE}/commons-cli.jar
export CLASSPATH=${CLASSPATH}:${BIOJAVA_BASE}/commons-collections-2.1.jar
export CLASSPATH=${CLASSPATH}:${BIOJAVA_BASE}/bytecode.jar
export CLASSPATH=${CLASSPATH}:${BIOJAVA_BASE}/commons-dbcp-1.1.jar
export CLASSPATH=${CLASSPATH}:${BIOJAVA_BASE}/commons-pool-1.1.jar
export CLASSPATH=${CLASSPATH}:.

This also goes through without error from the command line. However, when I try to compile one of the test programs as instructed on the page:
http://biojava.org/wiki/BioJava:GetStarted

peter at peter:~/bin/biojava/biojava-live_1.6/demos$ javac seq/TestEmbl.java

I get a bunch of errors, apparently javac cannot find the imports it needs. (see bottom of this mail).

I would greatly appreciate any tips how to get started here!
Thanks, Peter

peter at peter:~/bin/biojava/biojava-live_1.6/demos$ javac seq/TestEmbl.java
seq/TestEmbl.java:25: package org.biojavax does not exist
import org.biojavax.Namespace;
^
seq/TestEmbl.java:26: package org.biojavax does not exist
import org.biojavax.RichObjectFactory;
^
seq/TestEmbl.java:27: package org.biojavax.bio.seq does not exist
import org.biojavax.bio.seq.RichSequence;
^
seq/TestEmbl.java:28: package org.biojavax.bio.seq does not exist
import org.biojavax.bio.seq.RichSequenceIterator;
^
seq/TestEmbl.java:48: cannot find symbol
symbol : class Namespace
location: class seq.TestEmbl
Namespace ns = RichObjectFactory.getDefaultNamespace();
^
seq/TestEmbl.java:48: cannot find symbol
symbol : variable RichObjectFactory
location: class seq.TestEmbl
Namespace ns = RichObjectFactory.getDefaultNamespace();
^
seq/TestEmbl.java:50: cannot find symbol
symbol : class RichSequenceIterator
location: class seq.TestEmbl
RichSequenceIterator seqI =
^
seq/TestEmbl.java:51: package RichSequence does not exist
RichSequence.IOTools.readEMBLDNA(br, ns);
^
seq/TestEmbl.java:54: cannot find symbol
symbol : class RichSequence
location: class seq.TestEmbl
RichSequence seq = seqI.nextRichSequence();
^
seq/TestEmbl.java:57: package RichSequence does not exist
RichSequence.IOTools.writeEMBL(System.out, seq, ns);
^
10 errors
peter at peter:~/bin/biojava/biojava-live_1.6/demos$ java -version
java version "1.6.0_06"
Java(TM) SE Runtime Environment (build 1.6.0_06-b02)
Java HotSpot(TM) Server VM (build 10.0-b22, mixed mode)

From james at carmanconsulting.com Sat Jul 26 08:40:55 2008
From: james at carmanconsulting.com (James Carman)
Date: Sat, 26 Jul 2008 08:40:55 -0400
Subject: [Biojava-l] Installation woes
In-Reply-To: <488AFF6D.1000505@t-online.de>
References: <488AFF6D.1000505@t-online.de>
Message-ID:

Try export CLASSPATH=$CLASSPATH:...

Basically, remove the "squiggly braces"

From peter.robinson at t-online.de Sun Jul 27 03:23:13 2008
From: peter.robinson at t-online.de (Peter Robinson)
Date: Sun, 27 Jul 2008 09:23:13 +0200
Subject: [Biojava-l] Installation woes
In-Reply-To:
References: <488AFF6D.1000505@t-online.de>
Message-ID: <488C2261.9000303@t-online.de>

Hi,
Thanks.
I think that squiggly braces are OK for the shell, but in any case, I removed them from .bashrc, which now goes as follows:

export CLASSPATH=/home/peter/bin/biojava/biojava-live_1.6/biojava.jar
export CLASSPATH=$CLASSPATH:/home/peter/bin/biojava/biojava-live_1.6/commons-dbcp-1.1.jar
export CLASSPATH=$CLASSPATH:/home/peter/bin/biojava/biojava-live_1.6/commons-cli.jar
export CLASSPATH=$CLASSPATH:/home/peter/bin/biojava/biojava-live_1.6/bytecode.jar
export CLASSPATH=$CLASSPATH:/home/peter/bin/biojava/biojava-live_1.6/commons-collections-2.1.jar
export CLASSPATH=$CLASSPATH:/home/peter/bin/biojava/biojava-live_1.6/commons-pool-1.1.jar
export CLASSPATH=$CLASSPATH:.

**************The class path variable seems to be set OK

peter at peter:~/bin/biojava/biojava-live_1.6/demos$ echo $CLASSPATH
/home/peter/bin/biojava/biojava-live_1.6/biojava.jar:/home/peter/bin/biojava/biojava-live_1.6/commons-dbcp-1.1.jar:/home/peter/bin/biojava/biojava-live_1.6/commons-cli.jar:/home/peter/bin/biojava/biojava-live_1.6/bytecode.jar:/home/peter/bin/biojava/biojava-live_1.6/commons-collections-2.1.jar:/home/peter/bin/biojava/biojava-live_1.6/commons-pool-1.1.jar:.
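
Comparing a long CLASSPATH against a directory listing by eye is easy to get wrong, since javac does not complain about classpath entries that point at nothing. The following is only a minimal sketch, using nothing beyond the JDK, that prints every CLASSPATH entry with no matching file or directory. The names ClasspathCheck and missingEntries are hypothetical and chosen for illustration; they are not part of BioJava.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative helper (not part of BioJava): finds classpath entries that do not exist on disk. */
class ClasspathCheck {

    /**
     * Splits a classpath string on the platform path separator and returns
     * the entries for which no file or directory exists.
     * Note: wildcard entries such as "lib/*" are not expanded here and
     * would be reported as missing.
     */
    static List<String> missingEntries(String classpath) {
        List<String> missing = new ArrayList<String>();
        if (classpath == null || classpath.isEmpty()) {
            return missing;
        }
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            // An empty entry (e.g. from "a::b") stands for the current directory, so skip it.
            if (entry.isEmpty()) {
                continue;
            }
            if (!new File(entry).exists()) {
                missing.add(entry);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Reads the same environment variable that javac falls back to when no -classpath is given.
        String classpath = System.getenv("CLASSPATH");
        for (String entry : missingEntries(classpath)) {
            System.out.println("missing: " + entry);
        }
    }
}
```

Compiled and run in the same shell as the failing javac command, a check like this should flag any entry whose jar name differs from the files that were actually unpacked.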
************ The paths appear to be correct:
peter at peter:~/bin/biojava/biojava-live_1.6/demos$ ls /home/peter/bin/biojava/biojava-live_1.6/*.jar
/home/peter/bin/biojava/biojava-live_1.6/apps-live.jar
/home/peter/bin/biojava/biojava-live_1.6/commons-dbcp-1.1.jar
/home/peter/bin/biojava/biojava-live_1.6/biojava-live.jar
/home/peter/bin/biojava/biojava-live_1.6/commons-pool-1.1.jar
/home/peter/bin/biojava/biojava-live_1.6/bytecode.jar
/home/peter/bin/biojava/biojava-live_1.6/demos-live.jar
/home/peter/bin/biojava/biojava-live_1.6/commons-cli.jar
/home/peter/bin/biojava/biojava-live_1.6/jgrapht-jdk1.5.jar
/home/peter/bin/biojava/biojava-live_1.6/commons-collections-2.1.jar
/home/peter/bin/biojava/biojava-live_1.6/junit-4.4.jar

***********But again, I cannot compile any of the demo programs

peter at peter:~/bin/biojava/biojava-live_1.6/demos$ javac seq/TestEmbl.java
seq/TestEmbl.java:25: package org.biojavax does not exist
import org.biojavax.Namespace;
^
seq/TestEmbl.java:26: package org.biojavax does not exist
import org.biojavax.RichObjectFactory;
^
seq/TestEmbl.java:27: package org.biojavax.bio.seq does not exist
import org.biojavax.bio.seq.RichSequence;
^
seq/TestEmbl.java:28: package org.biojavax.bio.seq does not exist
import org.biojavax.bio.seq.RichSequenceIterator;
^
seq/TestEmbl.java:48: cannot find symbol
symbol : class Namespace
location: class seq.TestEmbl
Namespace ns = RichObjectFactory.getDefaultNamespace();
^
seq/TestEmbl.java:48: cannot find symbol
symbol : variable RichObjectFactory
location: class seq.TestEmbl
Namespace ns = RichObjectFactory.getDefaultNamespace();
^
seq/TestEmbl.java:50: cannot find symbol
symbol : class RichSequenceIterator
location: class seq.TestEmbl
RichSequenceIterator seqI =
^
seq/TestEmbl.java:51: package RichSequence does not exist
RichSequence.IOTools.readEMBLDNA(br, ns);
^
seq/TestEmbl.java:54: cannot find symbol
symbol : class RichSequence
location: class seq.TestEmbl
RichSequence seq = seqI.nextRichSequence();
^
seq/TestEmbl.java:57: package RichSequence does not exist
RichSequence.IOTools.writeEMBL(System.out, seq, ns);
^
10 errors
peter at peter:~/bin/biojava/biojava-live_1.6/demos$

Richard Holland wrote:
> Hello.
>
> Before typing the javac instruction, type the following to check what
> your classpath actually contains:
>
> echo $CLASSPATH
>
> If this doesn't immediately 'look right' (i.e. it has curly braces or
> variable names embedded in it, or doesn't match where you think the
> files are), then this'll be where the problem is.
>
> If you can't see any obvious problems with it, then post it as a reply
> to this message and we can take a closer look.
>
> cheers,
> Richard

From peter.robinson at t-online.de Sun Jul 27 03:46:56 2008
From: peter.robinson at t-online.de (Peter Robinson)
Date: Sun, 27 Jul 2008 09:46:56 +0200
Subject: [Biojava-l] Installation woes [SOLVED]
In-Reply-To:
References: <488AFF6D.1000505@t-online.de>
Message-ID: <488C27F0.6070705@t-online.de>

I found the problem. The file biojava-live.jar needs to be added to the CLASSPATH

This means that the page http://biojava.org/wiki/BioJava:GetStarted needs to be corrected!

cheers, Peter

From james at carmanconsulting.com Sun Jul 27 07:43:49 2008
From: james at carmanconsulting.com (James Carman)
Date: Sun, 27 Jul 2008 07:43:49 -0400
Subject: [Biojava-l] Installation woes
In-Reply-To: <488C2261.9000303@t-online.de>
References: <488AFF6D.1000505@t-online.de> <488C2261.9000303@t-online.de>
Message-ID:

This is exactly why BioJava needs to be
Mavenized!

On Sun, Jul 27, 2008 at 3:23 AM, Peter Robinson wrote:
>>>> Thanks, Peter >>>> >>>> >>>> >>>> >>>> peter at peter:~/bin/biojava/biojava-live_1.6/demos$ javac >>>> seq/TestEmbl.java >>>> seq/TestEmbl.java:25: package org.biojavax does not exist >>>> import org.biojavax.Namespace; >>>> ^ >>>> seq/TestEmbl.java:26: package org.biojavax does not exist >>>> import org.biojavax.RichObjectFactory; >>>> ^ >>>> seq/TestEmbl.java:27: package org.biojavax.bio.seq does not exist >>>> import org.biojavax.bio.seq.RichSequence; >>>> ^ >>>> seq/TestEmbl.java:28: package org.biojavax.bio.seq does not exist >>>> import org.biojavax.bio.seq.RichSequenceIterator; >>>> ^ >>>> seq/TestEmbl.java:48: cannot find symbol >>>> symbol : class Namespace >>>> location: class seq.TestEmbl >>>> Namespace ns = RichObjectFactory.getDefaultNamespace(); >>>> ^ >>>> seq/TestEmbl.java:48: cannot find symbol >>>> symbol : variable RichObjectFactory >>>> location: class seq.TestEmbl >>>> Namespace ns = RichObjectFactory.getDefaultNamespace(); >>>> ^ >>>> seq/TestEmbl.java:50: cannot find symbol >>>> symbol : class RichSequenceIterator >>>> location: class seq.TestEmbl >>>> RichSequenceIterator seqI = >>>> ^ >>>> seq/TestEmbl.java:51: package RichSequence does not exist >>>> RichSequence.IOTools.readEMBLDNA(br, ns); >>>> ^ >>>> seq/TestEmbl.java:54: cannot find symbol >>>> symbol : class RichSequence >>>> location: class seq.TestEmbl >>>> RichSequence seq = seqI.nextRichSequence(); >>>> ^ >>>> seq/TestEmbl.java:57: package RichSequence does not exist >>>> RichSequence.IOTools.writeEMBL(System.out, seq, ns); >>>> ^ >>>> 10 errors >>>> peter at peter:~/bin/biojava/biojava-live_1.6/demos$ java -version >>>> java version "1.6.0_06" >>>> Java(TM) SE Runtime Environment (build 1.6.0_06-b02) >>>> Java HotSpot(TM) Server VM (build 10.0-b22, mixed mode) >>>> peter at peter:~/bin/biojava/biojava-live_1.6/demos$ javac >>>> seq/TestEmbl.java >>>> seq/TestEmbl.java:25: package org.biojavax does not exist >>>> import org.biojavax.Namespace; >>>> ^ >>>> 
seq/TestEmbl.java:26: package org.biojavax does not exist >>>> import org.biojavax.RichObjectFactory; >>>> ^ >>>> seq/TestEmbl.java:27: package org.biojavax.bio.seq does not exist >>>> import org.biojavax.bio.seq.RichSequence; >>>> ^ >>>> seq/TestEmbl.java:28: package org.biojavax.bio.seq does not exist >>>> import org.biojavax.bio.seq.RichSequenceIterator; >>>> ^ >>>> seq/TestEmbl.java:48: cannot find symbol >>>> symbol : class Namespace >>>> location: class seq.TestEmbl >>>> Namespace ns = RichObjectFactory.getDefaultNamespace(); >>>> ^ >>>> seq/TestEmbl.java:48: cannot find symbol >>>> symbol : variable RichObjectFactory >>>> location: class seq.TestEmbl >>>> Namespace ns = RichObjectFactory.getDefaultNamespace(); >>>> ^ >>>> seq/TestEmbl.java:50: cannot find symbol >>>> symbol : class RichSequenceIterator >>>> location: class seq.TestEmbl >>>> RichSequenceIterator seqI = >>>> ^ >>>> seq/TestEmbl.java:51: package RichSequence does not exist >>>> RichSequence.IOTools.readEMBLDNA(br, ns); >>>> ^ >>>> seq/TestEmbl.java:54: cannot find symbol >>>> symbol : class RichSequence >>>> location: class seq.TestEmbl >>>> RichSequence seq = seqI.nextRichSequence(); >>>> ^ >>>> seq/TestEmbl.java:57: package RichSequence does not exist >>>> RichSequence.IOTools.writeEMBL(System.out, seq, ns); >>>> ^ >>>> 10 errors >>>> peter at peter:~/bin/biojava/biojava-live_1.6/demos$ >>>> >>>> _______________________________________________ >>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>>> >>>> >>> >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >>> >> >> >> >> > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From andreas.prlic at gmail.com Sun Jul 27 09:06:18 2008 From: 
andreas.prlic at gmail.com (Andreas Prlic)
Date: Sun, 27 Jul 2008 06:06:18 -0700
Subject: [Biojava-l] build.xml (was: Installation woes [SOLVED])
Message-ID: <59a41c430807270606o135cc94ai26a9ede906e4967@mail.gmail.com>

Hi,

I built the release with the default biojava ant build file ("ant dist"), which contains the line i.e. it should be changed there...

In general about our build.xml: This file contains many tasks. I believe it might make sense to split it into smaller files, e.g.

* a build.xml that contains the core tasks to build biojava from svn, and
* a build-release.xml that contains the tasks that are release related...

Andreas

On 27 Jul 2008, at 00:46, Peter Robinson wrote:

Richard Holland wrote:

I found the problem. The file biojava-live.jar needs to be added to the CLASSPATH.
This means that the page http://biojava.org/wiki/BioJava:GetStarted needs to be corrected!

From peter.robinson at t-online.de Sun Jul 27 11:57:28 2008
From: peter.robinson at t-online.de (Peter Robinson)
Date: Sun, 27 Jul 2008 17:57:28 +0200
Subject: [Biojava-l] Short names for Amino acid symbols
Message-ID: <488C9AE8.9080305@t-online.de>

Hi,

thanks to all on the list who helped me get started with Biojava, and by the way, the online documents are quite helpful! I am trying to develop some code to look for signs of positive selection in human sequences by making multiple alignments of protein sequences and mapping the nucleotide sequences onto this alignment and checking synonymous and nonsynonymous nucleotide substitutions in several species (etc).

A few small questions:

1) I have written a class to encapsulate all I need from a given Genbank mRNA sequence: the entire mRNA, the CDS and the corresponding protein sequence.
I have some methods such as the following:

    private void setCDSSequence() {
        Feature CDS = getCDSFeature(this.completeSequence);
        Location loc = CDS.getLocation();
        // -3 to remove the stop codon
        SymbolList symL = this.completeSequence.subList(loc.getMin(), loc.getMax() - 3);
        this.CDS = symL;
    }

Question: Why is there (seemingly) no way in Biojava to create a Sequence object instead of a SymbolList object? Or did I miss something?

2) I would then like to print out the protein alignment to check for correctness, and it seems there is no way of getting from a symbol to the one-letter amino acid code. That is,

    proteinAlignment.get(j).symbolAt(k).getName()

will return "Ala" instead of "A" etc. Is there a good way of getting the short symbols?

Thanks, Peter

From community at struck.lu Mon Jul 28 05:25:41 2008
From: community at struck.lu (community at struck.lu)
Date: Mon, 28 Jul 2008 11:25:41 +0200
Subject: [Biojava-l] Short names for Amino acid symbols
In-Reply-To: <488C9AE8.9080305@t-online.de>
References: <488C9AE8.9080305@t-online.de>
Message-ID:

Peter Robinson <peter.robinson at t-online.de> wrote:
> 2) I would then like to print out the protein alignment to check for
> correctness, and it seems there is no way of getting from a symbol to
> the one-letter amino acid code. That is,
>
> proteinAlignment.get(j).symbolAt(k).getName()
>
> will return "Ala" instead of "A" etc. Is there a good way of getting the
> short symbols?
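For illustration only, here is a minimal sketch of the three-letter to one-letter mapping being asked about, written in plain Java with no BioJava dependency. The class AminoAcidCodes and its methods are hypothetical names invented for this sketch, not BioJava API, and the choice of 'X' for unrecognised names is an assumption. It covers only the 20 standard residues and says nothing about how BioJava itself handles ambiguity symbols or gaps.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (not part of BioJava): three-letter residue names to one-letter codes. */
class AminoAcidCodes {
    // Standard three-letter / one-letter pairs for the 20 common amino acids.
    private static final String[][] PAIRS = {
        {"Ala", "A"}, {"Arg", "R"}, {"Asn", "N"}, {"Asp", "D"}, {"Cys", "C"},
        {"Gln", "Q"}, {"Glu", "E"}, {"Gly", "G"}, {"His", "H"}, {"Ile", "I"},
        {"Leu", "L"}, {"Lys", "K"}, {"Met", "M"}, {"Phe", "F"}, {"Pro", "P"},
        {"Ser", "S"}, {"Thr", "T"}, {"Trp", "W"}, {"Tyr", "Y"}, {"Val", "V"}
    };

    // Lookup table keyed by the upper-cased three-letter name.
    private static final Map<String, Character> ONE_LETTER = new HashMap<>();
    static {
        for (String[] pair : PAIRS) {
            ONE_LETTER.put(pair[0].toUpperCase(Locale.ROOT), pair[1].charAt(0));
        }
    }

    /** Returns the one-letter code, or 'X' (an assumption of this sketch) for unknown names. */
    static char toOneLetter(String threeLetterName) {
        Character code = ONE_LETTER.get(threeLetterName.trim().toUpperCase(Locale.ROOT));
        return code == null ? 'X' : code;
    }

    /** Converts a list of three-letter names into a string of one-letter codes. */
    static String toOneLetterString(List<String> names) {
        StringBuilder sb = new StringBuilder(names.size());
        for (String name : names) {
            sb.append(toOneLetter(name));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints MAG
        System.out.println(toOneLetterString(Arrays.asList("Met", "Ala", "Gly")));
    }
}
```

The lookup is case-insensitive so that names such as "Ala" and "ALA" are treated alike. In real code the names would come from calls like symbolAt(k).getName(), as in the question.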
This small tutorial might help you out:
http://biojava.org/wiki/BioJava:Cookbook:Translation:OneLetterAmbi

Daniel

From james at carmanconsulting.com Thu Jul 3 15:06:40 2008
From: james at carmanconsulting.com (James Carman)
Date: Thu, 3 Jul 2008 11:06:40 -0400
Subject: [Biojava-l] ParseException: Could not understand position: bond(39, 96
Message-ID:

I'm trying to parse the file:

ftp://ftp.ncbi.nih.gov/refseq/release/vertebrate_mammalian/vertebrate_mammalian12.protein.gpff.gz

using:

RichSequence.IOTools.readGenbankProtein()

and I keep getting this error (the date column is from my build server which runs this "loader", sorry):

[10:51:36]: org.biojava.bio.BioException: Could not read sequence
[10:51:36]: at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
[10:51:36]: at com.pg.iip.loader.pubrec.RefSeqLoader.loadPublicRecords(RefSeqLoader.java:106)
[10:51:36]: at com.pg.iip.loader.pubrec.PublicRecordLoader.doLoad(PublicRecordLoader.java:248)
[10:51:36]: at com.pg.iip.loader.AbstractLoader.execute(AbstractLoader.java:56)
[10:51:36]: at com.pg.iip.loader.LoaderUtils.executeLoader(LoaderUtils.java:20)
[10:51:36]: at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[10:51:36]: at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[10:51:36]: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[10:51:36]: at java.lang.reflect.Method.invoke(Method.java:585)
[10:51:36]: at com.pg.iip.loader.plugin.RunLoaderMojo.invoke(RunLoaderMojo.java:95)
[10:51:36]: at com.pg.iip.loader.plugin.RunLoaderMojo.execute(RunLoaderMojo.java:142)
[10:51:36]: at org.apache.maven.plugin.DefaultPluginManager.executeMojo(DefaultPluginManager.java:447)
[10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:539)
[10:51:36]: at
org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeStandaloneGoal(DefaultLifecycleExecutor.java:493) [10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoal(DefaultLifecycleExecutor.java:463) [10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalAndHandleFailures(DefaultLifecycleExecutor.java:311) [10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeTaskSegments(DefaultLifecycleExecutor.java:278) [10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.execute(DefaultLifecycleExecutor.java:143) [10:51:36]: at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:333) [10:51:36]: at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:126) [10:51:36]: at org.apache.maven.cli.MavenCli.main(MavenCli.java:282) [10:51:36]: at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [10:51:36]: at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [10:51:36]: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [10:51:36]: at java.lang.reflect.Method.invoke(Method.java:585) [10:51:36]: at org.codehaus.classworlds.Launcher.launchEnhanced(Launcher.java:315) [10:51:36]: at org.codehaus.classworlds.Launcher.launch(Launcher.java:255) [10:51:36]: at org.codehaus.classworlds.Launcher.mainWithExitCode(Launcher.java:430) [10:51:36]: at org.codehaus.classworlds.Launcher.main(Launcher.java:375) [10:51:36]: Caused by: org.biojava.bio.seq.io.ParseException: Could not understand position: bond(39,96 [10:51:36]: at org.biojavax.bio.seq.io.GenbankLocationParser.parsePosition(GenbankLocationParser.java:285) [10:51:36]: at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocString(GenbankLocationParser.java:271) [10:51:36]: at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocation(GenbankLocationParser.java:131) [10:51:36]: at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:490) [10:51:36]: at 
org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
[10:51:36]: ... 28 more

Does the parser not understand "Bond" features?

From dicknetherlands at gmail.com Thu Jul 3 15:17:11 2008
From: dicknetherlands at gmail.com (Richard Holland)
Date: Thu, 3 Jul 2008 16:17:11 +0100
Subject: [Biojava-l] ParseException: Could not understand position: bond(39, 96
In-Reply-To:
References:
Message-ID:

Apparently not. I don't think they're part of the formal Genbank specification, or at least not the one that was current at the time the parser was written (in 2004). If they were, then we must have missed them out by accident. Sorry! Could you raise a bug report via BugZilla on the BioJava website? Someone will look into it as soon as they get a chance.

cheers,
Richard

2008/7/3 James Carman :
> I'm trying to parse the file:
>
> ftp://ftp.ncbi.nih.gov/refseq/release/vertebrate_mammalian/vertebrate_mammalian12.protein.gpff.gz
>
> using:
>
> RichSequence.IOTools.readGenbankProtein()
>
> and I keep getting this error (the date column is from my build server
> which runs this "loader", sorry):
>
> [10:51:36]: org.biojava.bio.BioException: Could not read sequence
> [10:51:36]: at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
> [10:51:36]: at com.pg.iip.loader.pubrec.RefSeqLoader.loadPublicRecords(RefSeqLoader.java:106)
> [10:51:36]: at com.pg.iip.loader.pubrec.PublicRecordLoader.doLoad(PublicRecordLoader.java:248)
> [10:51:36]: at com.pg.iip.loader.AbstractLoader.execute(AbstractLoader.java:56)
> [10:51:36]: at com.pg.iip.loader.LoaderUtils.executeLoader(LoaderUtils.java:20)
> [10:51:36]: at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [10:51:36]: at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> [10:51:36]: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> [10:51:36]: at java.lang.reflect.Method.invoke(Method.java:585)
>
[10:51:36]: at com.pg.iip.loader.plugin.RunLoaderMojo.invoke(RunLoaderMojo.java:95) > [10:51:36]: at com.pg.iip.loader.plugin.RunLoaderMojo.execute(RunLoaderMojo.java:142) > [10:51:36]: at org.apache.maven.plugin.DefaultPluginManager.executeMojo(DefaultPluginManager.java:447) > [10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:539) > [10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeStandaloneGoal(DefaultLifecycleExecutor.java:493) > [10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoal(DefaultLifecycleExecutor.java:463) > [10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalAndHandleFailures(DefaultLifecycleExecutor.java:311) > [10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeTaskSegments(DefaultLifecycleExecutor.java:278) > [10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.execute(DefaultLifecycleExecutor.java:143) > [10:51:36]: at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:333) > [10:51:36]: at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:126) > [10:51:36]: at org.apache.maven.cli.MavenCli.main(MavenCli.java:282) > [10:51:36]: at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > [10:51:36]: at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > [10:51:36]: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > [10:51:36]: at java.lang.reflect.Method.invoke(Method.java:585) > [10:51:36]: at org.codehaus.classworlds.Launcher.launchEnhanced(Launcher.java:315) > [10:51:36]: at org.codehaus.classworlds.Launcher.launch(Launcher.java:255) > [10:51:36]: at org.codehaus.classworlds.Launcher.mainWithExitCode(Launcher.java:430) > [10:51:36]: at org.codehaus.classworlds.Launcher.main(Launcher.java:375) > [10:51:36]: Caused by: org.biojava.bio.seq.io.ParseException: Could > not understand position: 
bond(39,96 > [10:51:36]: at org.biojavax.bio.seq.io.GenbankLocationParser.parsePosition(GenbankLocationParser.java:285) > [10:51:36]: at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocString(GenbankLocationParser.java:271) > [10:51:36]: at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocation(GenbankLocationParser.java:131) > [10:51:36]: at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:490) > [10:51:36]: at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) > [10:51:36]: ... 28 more > > Does the parser not understand "Bond" features? > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From james at carmanconsulting.com Thu Jul 3 15:19:32 2008 From: james at carmanconsulting.com (James Carman) Date: Thu, 3 Jul 2008 11:19:32 -0400 Subject: [Biojava-l] ParseException: Could not understand position: bond(39, 96 In-Reply-To: References: Message-ID: Ok, great! I just wanted to make sure I wasn't doing something stupid! :) I'll file the BugZilla issue now (and download the source so that I can hopefully provide a patch). On Thu, Jul 3, 2008 at 11:17 AM, Richard Holland wrote: > Apparently not. I don't think they're part of the formal Genbank > specification, or at least not the one that was current at the time > the parser was written (in 2004). If they were, then we must have > missed them out by accident. Sorry! Could you raise a bug report via > BugZilla onthe BioJava website and someone will look into it as soon > as they get a chance. 
> > cheers, > Richard > > 2008/7/3 James Carman : >> I'm trying to parse the file: >> >> ftp://ftp.ncbi.nih.gov/refseq/release/vertebrate_mammalian/vertebrate_mammalian12.protein.gpff.gz >> >> using: >> >> RichSequence.IOTools.readGenbankProtein() >> >> and I keep getting this error (the date column is from my build server >> which runs this "loader", sorry): >> >> [10:51:36]: org.biojava.bio.BioException: Could not read sequence >> [10:51:36]: at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113) >> [10:51:36]: at com.pg.iip.loader.pubrec.RefSeqLoader.loadPublicRecords(RefSeqLoader.java:106) >> [10:51:36]: at com.pg.iip.loader.pubrec.PublicRecordLoader.doLoad(PublicRecordLoader.java:248) >> [10:51:36]: at com.pg.iip.loader.AbstractLoader.execute(AbstractLoader.java:56) >> [10:51:36]: at com.pg.iip.loader.LoaderUtils.executeLoader(LoaderUtils.java:20) >> [10:51:36]: at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> [10:51:36]: at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) >> [10:51:36]: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >> [10:51:36]: at java.lang.reflect.Method.invoke(Method.java:585) >> [10:51:36]: at com.pg.iip.loader.plugin.RunLoaderMojo.invoke(RunLoaderMojo.java:95) >> [10:51:36]: at com.pg.iip.loader.plugin.RunLoaderMojo.execute(RunLoaderMojo.java:142) >> [10:51:36]: at org.apache.maven.plugin.DefaultPluginManager.executeMojo(DefaultPluginManager.java:447) >> [10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:539) >> [10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeStandaloneGoal(DefaultLifecycleExecutor.java:493) >> [10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoal(DefaultLifecycleExecutor.java:463) >> [10:51:36]: at 
org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalAndHandleFailures(DefaultLifecycleExecutor.java:311) >> [10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeTaskSegments(DefaultLifecycleExecutor.java:278) >> [10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.execute(DefaultLifecycleExecutor.java:143) >> [10:51:36]: at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:333) >> [10:51:36]: at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:126) >> [10:51:36]: at org.apache.maven.cli.MavenCli.main(MavenCli.java:282) >> [10:51:36]: at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> [10:51:36]: at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) >> [10:51:36]: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >> [10:51:36]: at java.lang.reflect.Method.invoke(Method.java:585) >> [10:51:36]: at org.codehaus.classworlds.Launcher.launchEnhanced(Launcher.java:315) >> [10:51:36]: at org.codehaus.classworlds.Launcher.launch(Launcher.java:255) >> [10:51:36]: at org.codehaus.classworlds.Launcher.mainWithExitCode(Launcher.java:430) >> [10:51:36]: at org.codehaus.classworlds.Launcher.main(Launcher.java:375) >> [10:51:36]: Caused by: org.biojava.bio.seq.io.ParseException: Could >> not understand position: bond(39,96 >> [10:51:36]: at org.biojavax.bio.seq.io.GenbankLocationParser.parsePosition(GenbankLocationParser.java:285) >> [10:51:36]: at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocString(GenbankLocationParser.java:271) >> [10:51:36]: at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocation(GenbankLocationParser.java:131) >> [10:51:36]: at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:490) >> [10:51:36]: at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) >> [10:51:36]: ... 28 more >> >> Does the parser not understand "Bond" features? 
>> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > From james at carmanconsulting.com Thu Jul 3 18:52:52 2008 From: james at carmanconsulting.com (James Carman) Date: Thu, 3 Jul 2008 14:52:52 -0400 Subject: [Biojava-l] ParseException: Could not understand position: bond(39, 96 In-Reply-To: References: Message-ID: Richard, I filed the BugZilla issue: http://bugzilla.open-bio.org/show_bug.cgi?id=2536 I also attached a patch that I believe fixes the issue (it includes a test case). I hope that helps! James On Thu, Jul 3, 2008 at 11:19 AM, James Carman wrote: > Ok, great! I just wanted to make sure I wasn't doing something > stupid! :) I'll file the BugZilla issue now (and download the source > so that I can hopefully provide a patch). > > On Thu, Jul 3, 2008 at 11:17 AM, Richard Holland > wrote: >> Apparently not. I don't think they're part of the formal Genbank >> specification, or at least not the one that was current at the time >> the parser was written (in 2004). If they were, then we must have >> missed them out by accident. Sorry! Could you raise a bug report via >> BugZilla onthe BioJava website and someone will look into it as soon >> as they get a chance. 
>> >> cheers, >> Richard >> >> 2008/7/3 James Carman : >>> I'm trying to parse the file: >>> >>> ftp://ftp.ncbi.nih.gov/refseq/release/vertebrate_mammalian/vertebrate_mammalian12.protein.gpff.gz >>> >>> using: >>> >>> RichSequence.IOTools.readGenbankProtein() >>> >>> and I keep getting this error (the date column is from my build server >>> which runs this "loader", sorry): >>> >>> [10:51:36]: org.biojava.bio.BioException: Could not read sequence >>> [10:51:36]: at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113) >>> [10:51:36]: at com.pg.iip.loader.pubrec.RefSeqLoader.loadPublicRecords(RefSeqLoader.java:106) >>> [10:51:36]: at com.pg.iip.loader.pubrec.PublicRecordLoader.doLoad(PublicRecordLoader.java:248) >>> [10:51:36]: at com.pg.iip.loader.AbstractLoader.execute(AbstractLoader.java:56) >>> [10:51:36]: at com.pg.iip.loader.LoaderUtils.executeLoader(LoaderUtils.java:20) >>> [10:51:36]: at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>> [10:51:36]: at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) >>> [10:51:36]: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >>> [10:51:36]: at java.lang.reflect.Method.invoke(Method.java:585) >>> [10:51:36]: at com.pg.iip.loader.plugin.RunLoaderMojo.invoke(RunLoaderMojo.java:95) >>> [10:51:36]: at com.pg.iip.loader.plugin.RunLoaderMojo.execute(RunLoaderMojo.java:142) >>> [10:51:36]: at org.apache.maven.plugin.DefaultPluginManager.executeMojo(DefaultPluginManager.java:447) >>> [10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:539) >>> [10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeStandaloneGoal(DefaultLifecycleExecutor.java:493) >>> [10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoal(DefaultLifecycleExecutor.java:463) >>> [10:51:36]: at 
org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalAndHandleFailures(DefaultLifecycleExecutor.java:311) >>> [10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeTaskSegments(DefaultLifecycleExecutor.java:278) >>> [10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.execute(DefaultLifecycleExecutor.java:143) >>> [10:51:36]: at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:333) >>> [10:51:36]: at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:126) >>> [10:51:36]: at org.apache.maven.cli.MavenCli.main(MavenCli.java:282) >>> [10:51:36]: at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>> [10:51:36]: at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) >>> [10:51:36]: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >>> [10:51:36]: at java.lang.reflect.Method.invoke(Method.java:585) >>> [10:51:36]: at org.codehaus.classworlds.Launcher.launchEnhanced(Launcher.java:315) >>> [10:51:36]: at org.codehaus.classworlds.Launcher.launch(Launcher.java:255) >>> [10:51:36]: at org.codehaus.classworlds.Launcher.mainWithExitCode(Launcher.java:430) >>> [10:51:36]: at org.codehaus.classworlds.Launcher.main(Launcher.java:375) >>> [10:51:36]: Caused by: org.biojava.bio.seq.io.ParseException: Could >>> not understand position: bond(39,96 >>> [10:51:36]: at org.biojavax.bio.seq.io.GenbankLocationParser.parsePosition(GenbankLocationParser.java:285) >>> [10:51:36]: at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocString(GenbankLocationParser.java:271) >>> [10:51:36]: at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocation(GenbankLocationParser.java:131) >>> [10:51:36]: at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:490) >>> [10:51:36]: at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) >>> [10:51:36]: ... 
28 more >>> >>> Does the parser not understand "Bond" features? >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >> > From james at carmanconsulting.com Thu Jul 3 23:07:51 2008 From: james at carmanconsulting.com (James Carman) Date: Thu, 3 Jul 2008 19:07:51 -0400 Subject: [Biojava-l] ParseException: Could not understand position: bond(39, 96 In-Reply-To: References: Message-ID: I added a new patch that actually fixes the problem (you really should halt your build when a test case fails by the way :). Basically, it just skips over "Bond" features. On Thu, Jul 3, 2008 at 2:52 PM, James Carman wrote: > Richard, > > I filed the BugZilla issue: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2536 > > I also attached a patch that I believe fixes the issue (it includes a > test case). I hope that helps! > > James > > On Thu, Jul 3, 2008 at 11:19 AM, James Carman > wrote: >> Ok, great! I just wanted to make sure I wasn't doing something >> stupid! :) I'll file the BugZilla issue now (and download the source >> so that I can hopefully provide a patch). >> >> On Thu, Jul 3, 2008 at 11:17 AM, Richard Holland >> wrote: >>> Apparently not. I don't think they're part of the formal Genbank >>> specification, or at least not the one that was current at the time >>> the parser was written (in 2004). If they were, then we must have >>> missed them out by accident. Sorry! Could you raise a bug report via >>> BugZilla onthe BioJava website and someone will look into it as soon >>> as they get a chance. 
>>> >>> cheers, >>> Richard >>> >>> 2008/7/3 James Carman : >>>> I'm trying to parse the file: >>>> >>>> ftp://ftp.ncbi.nih.gov/refseq/release/vertebrate_mammalian/vertebrate_mammalian12.protein.gpff.gz >>>> >>>> using: >>>> >>>> RichSequence.IOTools.readGenbankProtein() >>>> >>>> and I keep getting this error (the date column is from my build server >>>> which runs this "loader", sorry): >>>> >>>> [10:51:36]: org.biojava.bio.BioException: Could not read sequence >>>> [10:51:36]: at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113) >>>> [10:51:36]: at com.pg.iip.loader.pubrec.RefSeqLoader.loadPublicRecords(RefSeqLoader.java:106) >>>> [10:51:36]: at com.pg.iip.loader.pubrec.PublicRecordLoader.doLoad(PublicRecordLoader.java:248) >>>> [10:51:36]: at com.pg.iip.loader.AbstractLoader.execute(AbstractLoader.java:56) >>>> [10:51:36]: at com.pg.iip.loader.LoaderUtils.executeLoader(LoaderUtils.java:20) >>>> [10:51:36]: at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>>> [10:51:36]: at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) >>>> [10:51:36]: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >>>> [10:51:36]: at java.lang.reflect.Method.invoke(Method.java:585) >>>> [10:51:36]: at com.pg.iip.loader.plugin.RunLoaderMojo.invoke(RunLoaderMojo.java:95) >>>> [10:51:36]: at com.pg.iip.loader.plugin.RunLoaderMojo.execute(RunLoaderMojo.java:142) >>>> [10:51:36]: at org.apache.maven.plugin.DefaultPluginManager.executeMojo(DefaultPluginManager.java:447) >>>> [10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:539) >>>> [10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeStandaloneGoal(DefaultLifecycleExecutor.java:493) >>>> [10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoal(DefaultLifecycleExecutor.java:463) >>>> [10:51:36]: at 
org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalAndHandleFailures(DefaultLifecycleExecutor.java:311) >>>> [10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeTaskSegments(DefaultLifecycleExecutor.java:278) >>>> [10:51:36]: at org.apache.maven.lifecycle.DefaultLifecycleExecutor.execute(DefaultLifecycleExecutor.java:143) >>>> [10:51:36]: at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:333) >>>> [10:51:36]: at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:126) >>>> [10:51:36]: at org.apache.maven.cli.MavenCli.main(MavenCli.java:282) >>>> [10:51:36]: at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>>> [10:51:36]: at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) >>>> [10:51:36]: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >>>> [10:51:36]: at java.lang.reflect.Method.invoke(Method.java:585) >>>> [10:51:36]: at org.codehaus.classworlds.Launcher.launchEnhanced(Launcher.java:315) >>>> [10:51:36]: at org.codehaus.classworlds.Launcher.launch(Launcher.java:255) >>>> [10:51:36]: at org.codehaus.classworlds.Launcher.mainWithExitCode(Launcher.java:430) >>>> [10:51:36]: at org.codehaus.classworlds.Launcher.main(Launcher.java:375) >>>> [10:51:36]: Caused by: org.biojava.bio.seq.io.ParseException: Could >>>> not understand position: bond(39,96 >>>> [10:51:36]: at org.biojavax.bio.seq.io.GenbankLocationParser.parsePosition(GenbankLocationParser.java:285) >>>> [10:51:36]: at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocString(GenbankLocationParser.java:271) >>>> [10:51:36]: at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocation(GenbankLocationParser.java:131) >>>> [10:51:36]: at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:490) >>>> [10:51:36]: at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) >>>> [10:51:36]: ... 
28 more >>>> >>>> Does the parser not understand "Bond" features? >>>> _______________________________________________ >>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>>> >>> >> > From james at carmanconsulting.com Sat Jul 5 11:46:41 2008 From: james at carmanconsulting.com (James Carman) Date: Sat, 5 Jul 2008 07:46:41 -0400 Subject: [Biojava-l] Maven2... Message-ID: Would the biojava project be interested in being "mavenized"? I'd be willing to help get you guys set up if you'd like. Also, it'd be nice to have biojava in the main maven repository. From dicknetherlands at gmail.com Sat Jul 5 12:09:51 2008 From: dicknetherlands at gmail.com (Richard Holland) Date: Sat, 5 Jul 2008 13:09:51 +0100 Subject: [Biojava-l] Maven2... In-Reply-To: References: Message-ID: Hello. BioJava 3 will make use of Maven. It's currently undergoing some use-case development to work out what to work on first, but we have a shell of a maven project already in our subversion hierarchy (under the biojava3 branch of the biojava-live project) and will set it up in the main maven repository when it's ready for release. Thanks for the offer though. If you're keen, you could go ahead and maven-ize the existing BioJava JAR files (version 1.6)? But, you would need to preserve the existing Ant config as well so that existing users are not affected. cheers, Richard 2008/7/5 James Carman : > Would the biojava project be interested in being "mavenized"? I'd be > willing to help get you guys set up if you'd like. Also, it'd be nice > to have biojava in the main maven repository. > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From ayates at ebi.ac.uk Mon Jul 7 08:35:34 2008 From: ayates at ebi.ac.uk (Andy Yates) Date: Mon, 07 Jul 2008 09:35:34 +0100 Subject: [Biojava-l] Maven2... 
In-Reply-To:
References:
Message-ID: <4871D556.8020307@ebi.ac.uk>

In my experience, Mavenizing an existing build system is never a good
idea. What would probably be of more use to people is generating a POM
and uploading the BioJava files to a Maven repository (or hosting one on
our website). That would keep people happy who are using dependency
management systems (I think buildr, raven and the like can use the same
repositories as Maven2), and it means we don't have to go through the
heartache of reconfiguring Maven or our codebase to make one friendly to
the other.

Andy

Richard Holland wrote:
> Hello. BioJava 3 will make use of Maven. It's currently undergoing
> some use-case development to work out what to work on first, but we
> have a shell of a maven project already in our subversion hierarchy
> (under the biojava3 branch of the biojava-live project) and will set
> it up in the main maven repository when it's ready for release.
>
> Thanks for the offer though. If you're keen, you could go ahead and
> maven-ize the existing BioJava JAR files (version 1.6)? But, you would
> need to preserve the existing Ant config as well so that existing
> users are not affected.
>
> cheers,
> Richard
>
> 2008/7/5 James Carman :
>> Would the biojava project be interested in being "mavenized"? I'd be
>> willing to help get you guys set up if you'd like. Also, it'd be nice
>> to have biojava in the main maven repository.
>> _______________________________________________
>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

From martin.jones at ed.ac.uk Thu Jul 10 10:13:28 2008
From: martin.jones at ed.ac.uk (Martin Jones)
Date: Thu, 10 Jul 2008 11:13:28 +0100
Subject: [Biojava-l] RichSequenceIterator.nextSequence does not move to next sequence when an exception is thrown
Message-ID:

Hi,

I have a file containing GenBank records, and I want to process them thus:

    RichSequenceIterator seqs =
            RichSequence.IOTools.readGenbankDNA(myReader, null);
    while (seqs.hasNext()) {
        RichSequence seq = seqs.nextRichSequence();
        // processing code
    }

However, some records cannot be parsed by BioJava. This is to be expected,
as I'm processing half a million records and some are bound to be wonky. So
I use a try-catch to skip over troublesome records:

    RichSequenceIterator seqs =
            RichSequence.IOTools.readGenbankDNA(myReader, null);
    while (seqs.hasNext()) {
        try {
            RichSequence seq = seqs.nextRichSequence();
            // processing code
        } catch (BioException e) {
            System.out.println("record could not be parsed!");
        }
    }

However, it seems that the position in the input file is not changed if an
exception is thrown during parsing. If I run the above code on a file
containing a single un-parseable record, it gets stuck in a non-terminating
loop, i.e. each time seqs.nextRichSequence() is called, an exception is
thrown, but seqs.hasNext() still returns true. Is there a correct way to
deal with this?
I could split up my input file into multiple records and do something like:

    ArrayList<String> records = splitGenBankFileIntoRecords();
    for (String singleRecord : records) {
        BufferedReader singleRecordReader =
                new BufferedReader(new StringReader(singleRecord));
        RichSequenceIterator seqs =
                RichSequence.IOTools.readGenbankDNA(singleRecordReader, null);
        try {
            RichSequence seq = seqs.nextRichSequence();
            // processing code
        } catch (BioException e) {
            System.out.println("record could not be parsed!");
        }
    }

but this seems inefficient, as I have to instantiate a new StringReader,
BufferedReader and RichSequenceIterator for every record (half a million
cycles of object creation/destruction!)

Any ideas?

--
------------------------

Martin Jones
School of Biological Sciences,
Ashworth Laboratories, King's Buildings
Edinburgh, EH9 3JT, UK

From dicknetherlands at gmail.com Thu Jul 10 10:21:30 2008
From: dicknetherlands at gmail.com (Richard Holland)
Date: Thu, 10 Jul 2008 11:21:30 +0100
Subject: [Biojava-l] RichSequenceIterator.nextSequence does not move to next sequence when an exception is thrown
In-Reply-To:
References:
Message-ID:

Hello. You appear to have hit a bit of a limitation with the system.
The sequence iterator doesn't know how to skip over bad records (in
fact, the parsers themselves do not - they just give up at the first
sign of a failed line). I'll have to have a think about how to fix
this, as it's not immediately obvious (although it definitely needs to
be done).

cheers,
Richard

2008/7/10 Martin Jones :
> Hi,
>
> I have a file containing GenBank records, and I want to process them thus:
>
> RichSequenceIterator seqs = RichSequence.IOTools.readGenbankDNA(myReader,
> null);
> while (seqs.hasNext()) {
> RichSequence seq = seqs.nextRichSequence();
> // processing code
> }
>
> however, some records cannot be parsed by biojava... this is to be expected
> as I'm processing half a million records - some are bound to be wonky.
So I > use a try-catch to skip over troublesome records: > > > RichSequenceIterator seqs = RichSequence.IOTools.readGenbankDNA(myReader, > null); > while (seqs.hasNext()) { > try{ > RichSequence seq = seqs.nextRichSequence(); > // processing code > } catch (BioException e){ > System.out.println("record count not be parsed!"); > } > } > > However, it seems that the position in the input file is not changed if an > exception is thrown during parsing. If I run the above code on a file > containing a single un-parseable record, it gets stuck in a non-terminating > loop - i.e. each time seqs.nextRichSequence() is called, an exception is > thrown, but seqs.hasNext() still returns true. Is there a correct way to > deal with this? I could split up my input file into multiple records and do > something like: > > ArrayList records = splitGenBankFileIntoRecords(); > for (String singleRecord : records){ > BufferedReader singleRecordReader = new BufferedReader(new > StringReader(singleRecord)); > RichSequenceIterator seqs = > RichSequence.IOTools.readGenbankDNA(singleRecordReader, null); > try{ > RichSequence seq = seqs.nextRichSequence(); > // processing code > } catch (BioException e){ > System.out.println("record count not be parsed!"); > } > > } > > but this seems inefficient, as I have to instantiate a new StringReader, > BufferedReader and RichSequenceIterator for every record (half a milion > cycles of object creation/destruction!) > > Any ideas? 
> > > > -- > ------------------------ > > Martin Jones > School of Biological Sciences, > Ashworth Laboratories, King's Buildings > Edinburgh, EH9 3JT, UK > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From james at carmanconsulting.com Thu Jul 10 11:30:32 2008 From: james at carmanconsulting.com (James Carman) Date: Thu, 10 Jul 2008 07:30:32 -0400 Subject: [Biojava-l] RichSequenceIterator.nextSequence does not move to next sequence when an exception is thrown In-Reply-To: References: Message-ID: Ooooh. That's nasty. I just re-wrote one of our "loaders" because it was doing exactly that, breaking the file up into records and then using the parser to parse each one individually. I guess that's why they were doing that. I'll have to back out my changes. Good to know! Perhaps they should have put in a comment?! :) On Thu, Jul 10, 2008 at 6:21 AM, Richard Holland wrote: > Hello. You appear to have hit a bit of a limitation with the system. > The sequence iterator doesn't know how to skip over bad records (in > fact, the parsers themselves do not - they just give up at the first > sign of a failed line). I'll have to have a think about how to fix > this, as it's not immediately obvious (although it definitely needs to > be done). > > cheers, > Richard > > 2008/7/10 Martin Jones : >> Hi, >> >> I have a file containing GenBank records, and I want to process them thus: >> >> RichSequenceIterator seqs = RichSequence.IOTools.readGenbankDNA(myReader, >> null); >> while (seqs.hasNext()) { >> RichSequence seq = seqs.nextRichSequence(); >> // processing code >> } >> >> however, some records cannot be parsed by biojava... this is to be expected >> as I'm processing half a million records - some are bound to be wonky. 
So I >> use a try-catch to skip over troublesome records: >> >> >> RichSequenceIterator seqs = RichSequence.IOTools.readGenbankDNA(myReader, >> null); >> while (seqs.hasNext()) { >> try{ >> RichSequence seq = seqs.nextRichSequence(); >> // processing code >> } catch (BioException e){ >> System.out.println("record count not be parsed!"); >> } >> } >> >> However, it seems that the position in the input file is not changed if an >> exception is thrown during parsing. If I run the above code on a file >> containing a single un-parseable record, it gets stuck in a non-terminating >> loop - i.e. each time seqs.nextRichSequence() is called, an exception is >> thrown, but seqs.hasNext() still returns true. Is there a correct way to >> deal with this? I could split up my input file into multiple records and do >> something like: >> >> ArrayList records = splitGenBankFileIntoRecords(); >> for (String singleRecord : records){ >> BufferedReader singleRecordReader = new BufferedReader(new >> StringReader(singleRecord)); >> RichSequenceIterator seqs = >> RichSequence.IOTools.readGenbankDNA(singleRecordReader, null); >> try{ >> RichSequence seq = seqs.nextRichSequence(); >> // processing code >> } catch (BioException e){ >> System.out.println("record count not be parsed!"); >> } >> >> } >> >> but this seems inefficient, as I have to instantiate a new StringReader, >> BufferedReader and RichSequenceIterator for every record (half a milion >> cycles of object creation/destruction!) >> >> Any ideas? 
>>
>>
>>
>> --
>> ------------------------
>>
>> Martin Jones
>> School of Biological Sciences,
>> Ashworth Laboratories, King's Buildings
>> Edinburgh, EH9 3JT, UK
>> _______________________________________________
>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From anisahghoorah at hotmail.com Thu Jul 17 10:06:37 2008
From: anisahghoorah at hotmail.com (Anisah Ghoorah)
Date: Thu, 17 Jul 2008 11:06:37 +0100
Subject: [Biojava-l] Nexus file parser
In-Reply-To:
References:
Message-ID:

Hi,

I would like to parse a nexus file and get the alignment from the DATA
block. I'm not sure how the NexusFileListener works. Is there any code
available that illustrates how to parse a nexus file.

Many thanks,
Anisah

From anisahghoorah at hotmail.com Thu Jul 17 10:09:14 2008
From: anisahghoorah at hotmail.com (Anisah Ghoorah)
Date: Thu, 17 Jul 2008 11:09:14 +0100
Subject: [Biojava-l] nexus file parser
In-Reply-To:
References:
Message-ID:

Hi,

I would like to parse a nexus file and get the alignment from the DATA
block. I'm not sure how the NexusFileListener works. Is there any code
available that illustrates how to parse a nexus file.

Many thanks,
Anisah

From dicknetherlands at gmail.com Thu Jul 17 11:21:45 2008
From: dicknetherlands at gmail.com (Richard Holland)
Date: Thu, 17 Jul 2008 12:21:45 +0100
Subject: [Biojava-l] nexus file parser
In-Reply-To:
References:
Message-ID:

Hello.
If you pass an instance of NexusFileBuilder to the NexusFileFormat parse methods, it will construct a NexusFile instance in memory which you can get by calling getNexusFile() after parsing has finished. You can then iterate over the blocks of the NexusFile by using the blockIterator() method. Each block returned is a class that implements the NexusObject interface. You can find out which type of block it is using instanceof, and thus find the DataBlock instance. You can then cast to DataBlock (which extends CharactersBlock) and use the methods from that to explore the alignment. cheers, Richard 2008/7/17 Anisah Ghoorah : > > > > Hi, > > I would like to parse a nexus file and get the alignment > from the DATA block. I'm not sure how the NexusFileListener works. Is > there any code available that illustrates how to parse a nexus file. > > Many thanks, > Anisah > _________________________________________________________________ > The John Lewis Clearance - save up to 50% with FREE delivery > http://clk.atdmt.com/UKM/go/101719806/direct/01/ > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From markjschreiber at gmail.com Thu Jul 17 12:33:11 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Thu, 17 Jul 2008 20:33:11 +0800 Subject: [Biojava-l] [Biojava-dev] [Fwd: large genbank data] In-Reply-To: <483E0CA2.4010906@asti.dost.gov.ph> References: <483E0CA2.4010906@asti.dost.gov.ph> Message-ID: <93b45ca50807170533k24af6231s89b257bce5c740ad@mail.gmail.com> Hi - Is the code throwing an exception or running out of memory?? Can you send an example program and the problem you encounter to the list. 
- Mark On Thu, May 29, 2008 at 9:53 AM, Rey Vincent Babilonia wrote: > > > -------- Original Message -------- > Subject: large genbank data > Date: Wed, 28 May 2008 18:02:48 +0800 > From: Rey Vincent Babilonia > To: biojava-l at biojava.org > > hi, > > anybody tried uploading a large genbank data (e.g. > ftp://bio-mirror.net/biomirror/genbank/gbbct1.seq.gz) to biosql? > load_seqdatabase.pl of bioperl can do this. i'm switching to biojava and > it can't read the sequence (maybe because it has 30000+ sequences). > > thanks. > > -- > /** > * @author Rey Vincent P. Babilonia > * @number +63 2 426 9760 local 1302 > * @pgp 0x383454CF pgp.mit.edu > * @project Philippine Bioinformatics Solutions > * @program Philippine e-Science Grid > * @division Research and Development Division > * @agency Advanced Science and Technology Institute > * @url http://www.psigrid.gov.ph > */ > > > -- > /** > * @author Rey Vincent P. Babilonia > * @number +63 2 426 9760 local 1302 > * @pgp 0x383454CF pgp.mit.edu > * @project Philippine Bioinformatics Solutions > * @program Philippine e-Science Grid > * @division Research and Development Division > * @agency Advanced Science and Technology Institute > * @url http://www.psigrid.gov.ph > */ > > No virus found in this outgoing message. > Checked by AVG. 
> Version: 8.0.100 / Virus Database: 269.24.2/1471 - Release Date: 5/28/2008 5:33 PM > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From markjschreiber at gmail.com Thu Jul 17 12:40:31 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Thu, 17 Jul 2008 20:40:31 +0800 Subject: [Biojava-l] problems installing biojava on Windows XP professional In-Reply-To: <406135.22320.qm@web94608.mail.in2.yahoo.com> References: <406135.22320.qm@web94608.mail.in2.yahoo.com> Message-ID: <93b45ca50807170540u2cc9a797mb4572fe5cb54599d@mail.gmail.com> Hi - First off, depending on the version of biojava you downloaded you may need Java 5 (JDK 1.5) or later. Secondly, you need to add JAR files to the CLASSPATH variable not the PATH variable. PATH is where windows searches for executables. - Mark On Tue, Apr 29, 2008 at 1:22 PM, arunabha banerjee wrote: > Hello, > > > > I am new to using biojava. I am trying to install biojava on a PC running > > Windows XP professional. I am using Java 2 SDK version 1.4.2. I have > > downloaded the files in the "binaries" directory in the download area of the > > biojava server to the directory "C:\biojava" on my computer. I have added > the > > string > > > > > > "C:\biojava;C:\biojava\biojava.jar;C:\biojava\xerces.jar;C:\biojava\bytecode.jar;" > > > > > > to my PATH variable. When I try to compile one of the simple demo files, > > like AlphabetExample.java, I get error messages saying that the packages > > "org.biojava.bio.symbol.*" and "org.biojava.bio.seq.*" can't be found. Is > > there something else I have to do to get the biojava files installed > correctly? > > > > Thanks - > > Arunabha Banerjee > > ________________________________ > Explore your hobbies and interests. Click here to begin. 
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
>

From markjschreiber at gmail.com Thu Jul 17 12:44:08 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 17 Jul 2008 20:44:08 +0800
Subject: [Biojava-l] Problem parsing biojava xml file
In-Reply-To: <4846DBFE.1060105@mpi-cbg.de>
References: <4846DBFE.1060105@mpi-cbg.de>
Message-ID: <93b45ca50807170544i7b7f52cfwbbbde0c844053f78@mail.gmail.com>

Hi -

In the past I have seen this when there are invisible metacharacters
in the stream or file before the XML proper starts. This can happen
with language variants of Unicode. Try trimming the String before
parsing.

- Mark

On Thu, Jun 5, 2008 at 2:16 AM, benn wrote:
> Hello,
>
> Sorry to pepper the board with questions! I am working on BLAST
> parsing and have the standard output for BLAST working fine with JUnit
> tests. So I am attempting to recreate this for files in XML format coming
> from blast (blastp), however I have the problem that I get a SAXException
> that content is not allowed before prolog. I thought I could have some
> invisible characters which are causing it to throw a wobbly but I cannot see
> any. Has anyone else come across the problem.
For completeness I have
> attached the BLAST file and the code to parse is below:
>
>
> private List parseBlast(String filename)
>         throws IOException, SAXException, BioException {
>
>     InputStream is = new FileInputStream(
>             "src/test/resources/blast/standardoutput.blastp");
>
>     BlastXMLParserFacade parser = new BlastXMLParserFacade();
>     SeqSimilarityAdapter adapter = new SeqSimilarityAdapter();
>     parser.setContentHandler(adapter);
>     List results = new ArrayList();
>
>     SearchContentHandler builder = new BlastLikeSearchBuilder(results,
>             new DummySequenceDB("queries"),
>             new DummySequenceDBInstallation());
>
>     adapter.setSearchContentHandler(builder);
>
>     parser.parse(new InputSource(is));
>     return results;
> }
>
>
> Cheers,
>
> Neil
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
>

From markjschreiber at gmail.com Thu Jul 17 12:50:15 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 17 Jul 2008 20:50:15 +0800
Subject: [Biojava-l] Important notice about email handling on BioJava lists
Message-ID: <93b45ca50807170550y722ebdc2qd4a1bb36b3b32206@mail.gmail.com>

Hi -

A lot of old emails just got posted to the list. This usually happens
because messages that contain attachments or HTML get blocked by our
aggressive spam filter. When our overworked admins get around to
confirming they are not spam they eventually get through, but probably
too late to be of much help to you.

Therefore... For prompt service when asking for help:

1) USE ONLY TEXT FORMAT EMAIL (NO HTML)
2) DON'T ADD ATTACHMENTS. If you want to post code just copy it in the
   body of the email.

Although this might be a bit draconian, we used to get badly spammed on
the list so this is one of the easiest ways around it.
Thanks, - Mark From ap3 at sanger.ac.uk Thu Jul 17 12:49:41 2008 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Thu, 17 Jul 2008 13:49:41 +0100 Subject: [Biojava-l] biojava mailing lists Message-ID: <66377475-9986-4824-820F-A36F4AC979D9@sanger.ac.uk> Hi, You might have noticed a number of emails getting through to the mailing lists today with big delay. This happens if you post to the mailing list, without being subscribed to it. In order to avoid spam both lists only accept postings from list members. Anybody can become a list member, so please subscribe before you post. If you send without being subscribed your mail will get stuck in the moderation loop, which can cause several weeks of delay (no fun to read through all that spam). Andreas ----------------------------------------------------------------------- Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK +44 (0) 1223 49 6891 ----------------------------------------------------------------------- -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From dicknetherlands at gmail.com Thu Jul 17 19:14:39 2008 From: dicknetherlands at gmail.com (Richard Holland) Date: Thu, 17 Jul 2008 20:14:39 +0100 Subject: [Biojava-l] Problem while parsing GenBank-like files and persiting them using Hibernate In-Reply-To: <480844DB.6070808@uni-tuebingen.de> References: <480844DB.6070808@uni-tuebingen.de> Message-ID: I can't remember if I answered something like this before or not... anyhow here goes just in case! > 1. Is there a way to read in files downloaded from Ensembl using only the > designated BioJavaX classes? You could use the original ones and do some plain-text parsing of your own on the 'unrich' data. 
The 'rich' parsers adhere strictly to the official format, which does not include the Ensembl extensions (exon etc.). Therefore any attempt to 'enrich' the data will attempt to force it into the standard format, which as you see causes non-standard bits either to get skipped or converted into some kind of catch-all data type (such as 'any'). > 2. How can I extend the terms so that not only "SOME X-specific terms" are > included, but some more? And how do I tell the parser to use and apply these > terms? Or more generally, can I somehow read in an ontology (for instance > the GO), persist it in BioSQL and make use of the terms contained therein? It's a bit hard. I could have made this code easier to extend I think - wasn't planning on non-standard versions when I wrote it! Essentially the way to do this is to locate the appropriate XYZFormat.Terms class in an IDE such as Eclipse or NetBeans, then find a term similar to the one you want to use (in your case, you want to add 'exon' so find something similar in the GenbankFormat.Terms class), highlight it and do a 'find all usages'. That'll pretty quickly point you to the parts of the code which use the term. Add your new term to the XYZFormat.Terms class, then insert extra code in all the parts that 'find all usages' highlighted. > 3. How can I persist a sequence from Ensembl within a BioSQL database using > Hibernate even though they use different accession numbers? Find the regex and modify it to accept Ensembl-style accessions. Then, use 'find all usages' on the regex to find the place that uses it and modify those accordingly to pick up the correct groups from the regex and assign them to the data model, particularly if you reordered brackets etc. and therefore renumbered the groups in the regex. 
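
To illustrate the point about renumbered groups, here is a minimal sketch.
Everything in it is an assumption made for the example: the pattern is made
up and is not the regex in the BioJava source, and AccessionSketch and
parse() are hypothetical names. It only shows that once an Ensembl-style
alternative is added in front, the version moves from group 2 to group 3,
so every place that reads the groups has to be updated to match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: NOT the pattern used inside BioJava's GenbankFormat.
class AccessionSketch {

    // Before: group 1 = GenBank-style accession, group 2 = optional version.
    static final Pattern GENBANK_ONLY =
            Pattern.compile("([A-Z]+_?\\d+)(?:\\.(\\d+))?");

    // After: an Ensembl-style alternative is added in front, so the groups
    // are renumbered: 1 = Ensembl ID, 2 = GenBank accession, 3 = version.
    static final Pattern EXTENDED =
            Pattern.compile("(?:(ENS[A-Z]*\\d{11})|([A-Z]+_?\\d+))(?:\\.(\\d+))?");

    // Returns { accession, version }; version is null when absent.
    static String[] parse(String id) {
        Matcher m = EXTENDED.matcher(id);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognised accession: " + id);
        }
        // Exactly one of groups 1 and 2 is non-null after a successful match.
        String accession = m.group(1) != null ? m.group(1) : m.group(2);
        // This read was group(2) with GENBANK_ONLY; it must now be group(3).
        String version = m.group(3);
        return new String[] { accession, version };
    }

    public static void main(String[] args) {
        for (String id : new String[] { "AE005174.2", "NM_000546", "ENSG00000139618" }) {
            String[] parts = parse(id);
            System.out.println(id + " -> accession=" + parts[0] + ", version=" + parts[1]);
        }
    }
}
```

If the code that consumes the match had been left reading group(2) for the
version, it would silently pick up the accession instead, which is why
'find all usages' on the regex is worth the effort.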
cheers, Richard From dicknetherlands at gmail.com Thu Jul 17 19:15:04 2008 From: dicknetherlands at gmail.com (Richard Holland) Date: Thu, 17 Jul 2008 20:15:04 +0100 Subject: [Biojava-l] Constructing Backbone of Protein In-Reply-To: <165609.65933.qm@web51412.mail.re2.yahoo.com> References: <165609.65933.qm@web51412.mail.re2.yahoo.com> Message-ID: Not sure. Andreas Prlic should know. Andreas....? 2008/5/13 Armita Sheari : > Hi everyone, > > I need to write a program that can construct the backbone of the protein > from its sequence and the relevant phi and psi angles. I want to know if > there is a class or method that can help me to calculate the coordinates > form phi and psi angles! > > thanks, > ArmitaSh > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > From rvincent at asti.dost.gov.ph Fri Jul 18 01:59:47 2008 From: rvincent at asti.dost.gov.ph (Rey Vincent Babilonia) Date: Fri, 18 Jul 2008 09:59:47 +0800 Subject: [Biojava-l] [Biojava-dev] [Fwd: large genbank data] In-Reply-To: <93b45ca50807170533k24af6231s89b257bce5c740ad@mail.gmail.com> References: <483E0CA2.4010906@asti.dost.gov.ph> <93b45ca50807170533k24af6231s89b257bce5c740ad@mail.gmail.com> Message-ID: <487FF913.6090504@asti.dost.gov.ph> Hi Mark, At first it throws an out of memory exception. My workaround is to subdivide the sequence file into individual GenBank files. The error now is that if a GenBank sequence has an 'empty alphabet', it does not get loaded to BioSQL. My workaround is to check if sequence.getAlphabet().getName() is DNA. Thanks. Mark Schreiber wrote: > Hi - > > Is the code throwing an exception or running out of memory?? > > Can you send an example program and the problem you encounter to the list. 
> - Mark > > On Thu, May 29, 2008 at 9:53 AM, Rey Vincent Babilonia > wrote: >> >> -------- Original Message -------- >> Subject: large genbank data >> Date: Wed, 28 May 2008 18:02:48 +0800 >> From: Rey Vincent Babilonia >> To: biojava-l at biojava.org >> >> hi, >> >> anybody tried uploading a large genbank data (e.g. >> ftp://bio-mirror.net/biomirror/genbank/gbbct1.seq.gz) to biosql? >> load_seqdatabase.pl of bioperl can do this. i'm switching to biojava and >> it can't read the sequence (maybe because it has 30000+ sequences). >> >> thanks. >> >> -- >> /** >> * @author Rey Vincent P. Babilonia >> * @number +63 2 426 9760 local 1302 >> * @pgp 0x383454CF pgp.mit.edu >> * @project Philippine Bioinformatics Solutions >> * @program Philippine e-Science Grid >> * @division Research and Development Division >> * @agency Advanced Science and Technology Institute >> * @url http://www.psigrid.gov.ph >> */ >> >> >> -- >> /** >> * @author Rey Vincent P. Babilonia >> * @number +63 2 426 9760 local 1302 >> * @pgp 0x383454CF pgp.mit.edu >> * @project Philippine Bioinformatics Solutions >> * @program Philippine e-Science Grid >> * @division Research and Development Division >> * @agency Advanced Science and Technology Institute >> * @url http://www.psigrid.gov.ph >> */ >> >> No virus found in this outgoing message. >> Checked by AVG. >> Version: 8.0.100 / Virus Database: 269.24.2/1471 - Release Date: 5/28/2008 5:33 PM >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> > -- /** * @author Rey Vincent P. 
Babilonia
 * @number +63 2 426 9760 local 1302
 * @pgp 0x383454CF pgp.mit.edu
 * @project Philippine Bioinformatics Solutions
 * @program Philippine e-Science Grid
 * @division Research and Development Division
 * @agency Advanced Science and Technology Institute
 * @url http://www.psigrid.gov.ph
 */

From rvincent at asti.dost.gov.ph Fri Jul 18 08:12:15 2008
From: rvincent at asti.dost.gov.ph (Rey Vincent Babilonia)
Date: Fri, 18 Jul 2008 16:12:15 +0800
Subject: [Biojava-l] [Biojava-dev] [Fwd: large genbank data]
In-Reply-To: <487FF913.6090504@asti.dost.gov.ph>
References: <483E0CA2.4010906@asti.dost.gov.ph>
 <93b45ca50807170533k24af6231s89b257bce5c740ad@mail.gmail.com>
 <487FF913.6090504@asti.dost.gov.ph>
Message-ID: <4880505F.7010308@asti.dost.gov.ph>

Hi Mark,

What is the maximum sequence length that a RichSequence can handle?

java -Xms1024m -Xmx1256m -jar loader.jar
.
16:09:00,173 INFO Loader:296 - D:\AE005174.gbk is readable.
16:09:06,704 INFO Loader:326 - Loading sequence AE005174 with identifier 56384585, length 5528445 and alphabet DNA...
org.hibernate.PropertyAccessException: Exception occurred inside getter of org.biojavax.bio.seq.SimpleRichSequence.sequenceLength

Rey Vincent Babilonia wrote:
> Hi Mark,
>
> At first it throws an out of memory exception. My workaround is to
> subdivide the sequence file into individual GenBank files.
>
> The error now is that if a GenBank sequence has an 'empty alphabet', it
> does not get loaded to BioSQL. My workaround is to check if
> sequence.getAlphabet().getName() is DNA.
>
> Thanks.

-- 
/**
 * @author Rey Vincent P. Babilonia
 * @number +63 2 426 9760 local 1302
 * @pgp 0x383454CF pgp.mit.edu
 * @project Philippine Bioinformatics Solutions
 * @program Philippine e-Science Grid
 * @division Research and Development Division
 * @agency Advanced Science and Technology Institute
 * @url http://www.psigrid.gov.ph
 */

From dicknetherlands at gmail.com Fri Jul 18 08:47:08 2008
From: dicknetherlands at gmail.com (Richard Holland)
Date: Fri, 18 Jul 2008 09:47:08 +0100
Subject: [Biojava-l] [Biojava-dev] [Fwd: large genbank data]
In-Reply-To: <4880505F.7010308@asti.dost.gov.ph>
References: <483E0CA2.4010906@asti.dost.gov.ph>
 <93b45ca50807170533k24af6231s89b257bce5c740ad@mail.gmail.com>
 <487FF913.6090504@asti.dost.gov.ph>
 <4880505F.7010308@asti.dost.gov.ph>
Message-ID:

In order to persist to BioSQL, BioJava has to convert the symbol list
into a string so that it can pass it to JDBC via Hibernate. Therefore
the maximum length of a sequence you wish to persist to BioSQL is the
maximum length of a string in Java, which is 65536 (2^16) if you are
working in a UTF-8 environment.

From james at carmanconsulting.com Fri Jul 18 10:45:50 2008
From: james at carmanconsulting.com (James Carman)
Date: Fri, 18 Jul 2008 06:45:50 -0400
Subject: [Biojava-l] [Biojava-dev] [Fwd: large genbank data]
In-Reply-To:
References: <483E0CA2.4010906@asti.dost.gov.ph>
 <93b45ca50807170533k24af6231s89b257bce5c740ad@mail.gmail.com>
 <487FF913.6090504@asti.dost.gov.ph>
 <4880505F.7010308@asti.dost.gov.ph>
Message-ID:

That is a limitation for string literals, not any string. Correct?

From markjschreiber at gmail.com Fri Jul 18 13:17:28 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 18 Jul 2008 21:17:28 +0800
Subject: [Biojava-l] [Biojava-dev] [Fwd: large genbank data]
In-Reply-To:
References: <483E0CA2.4010906@asti.dost.gov.ph>
 <93b45ca50807170533k24af6231s89b257bce5c740ad@mail.gmail.com>
 <487FF913.6090504@asti.dost.gov.ph>
 <4880505F.7010308@asti.dost.gov.ph>
Message-ID: <93b45ca50807180617x7328c2b6r265939c89afd5f7a@mail.gmail.com>

Was looking on the internet ...

So the Java spec says nothing about an upper limit however the sun JDK
implements String as a char[] (behind the scenes). Therefore I think
that on the Sun JDK with the right amount of RAM you could go to 2^32
(except for string literals as mentioned above) which is 4,294,967,296
characters. So a string of a sequence should be able to get to about 4
billion bases.

Of course if you don't assign enough memory to the JVM ( -Xmx4G) you
won't be able to get close.
Of course even if you can assign that much
that doesn't account for all the other Java overhead and all the stuff
Hibernate is doing with proxy classes etc. Also BioSQL usually
defines sequence as a CLOB so depending on your DB implementation
there may be a limit on that. On a 32 bit machine 4GB is all you can
get per CPU so you would have issues trying to do anything bigger.

Anyhow I know I have stored human chromosome 1 (approx 1 billion bases)
in memory.

- Mark

From ap3 at sanger.ac.uk Fri Jul 18 14:05:20 2008
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Fri, 18 Jul 2008 15:05:20 +0100
Subject: [Biojava-l] Constructing Backbone of Protein
In-Reply-To:
References: <165609.65933.qm@web51412.mail.re2.yahoo.com>
Message-ID: <1EB8C2A5-9387-47A8-8088-C5E492FD3FC0@sanger.ac.uk>

Hi Richard,

This email actually managed to find its way through to the list back
in May...

http://www.biojava.org/pipermail/biojava-l/2008-May/006211.html

Andreas

On 17 Jul 2008, at 20:15, Richard Holland wrote:

> Not sure. Andreas Prlic should know. Andreas....?
>
> 2008/5/13 Armita Sheari :
>> Hi everyone,
>>
>> I need to write a program that can construct the backbone of the
>> protein from its sequence and the relevant phi and psi angles. I
>> want to know if there is a class or method that can help me to
>> calculate the coordinates from phi and psi angles!
>>
>> thanks,
>> ArmitaSh

-----------------------------------------------------------------------
Andreas Prlic
Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891
-----------------------------------------------------------------------

-- 
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

From super_fx at msn.com Fri Jul 18 14:22:33 2008
From: super_fx at msn.com (Mohammed AlQuraishi)
Date: Fri, 18 Jul 2008 07:22:33 -0700
Subject: [Biojava-l] Constructing Backbone of Protein
In-Reply-To: <1EB8C2A5-9387-47A8-8088-C5E492FD3FC0@sanger.ac.uk>
References: <165609.65933.qm@web51412.mail.re2.yahoo.com>
 <1EB8C2A5-9387-47A8-8088-C5E492FD3FC0@sanger.ac.uk>
Message-ID:

In general it's not possible to accurately reconstruct a protein's
backbone strictly from phi/psi angles; you'd need the bond lengths and
bond angles (especially important) to have an accurate reconstruction.
It is however possible to get an approximate reconstruction,
particularly for short protein fragments, if you use "standard" values
for bond lengths and angles, such as the ones here:

http://scripts.iucr.org/cgi-bin/paper?li0061

I don't know if biojava has any methods specific for this purpose, but
the link below contains a description of how to reconstruct the
coordinates if you have the dihedral angles (and bond lengths and
angles) that doesn't require more functionality than simple 3D
transforms:

https://lists.sdsc.edu/pipermail/pdb-l/2002-December/000326.html

Hope this helps,
Mohammed

---
Mohammed AlQuraishi
McAdams and Shapiro Labs
Stanford University

From koen.bruynseels at cropdesign.com Fri Jul 18 14:48:23 2008
From: koen.bruynseels at cropdesign.com (koen.bruynseels at cropdesign.com)
Date: Fri, 18 Jul 2008 16:48:23 +0200
Subject: [Biojava-l] Koen Bruynseels is out of the office.
Message-ID:

I will be out of the office starting 18/07/2008 and will not return
until 28/07/2008.

I will respond to your message when I return.

From dicknetherlands at gmail.com Fri Jul 18 15:44:49 2008
From: dicknetherlands at gmail.com (Richard Holland)
Date: Fri, 18 Jul 2008 16:44:49 +0100
Subject: [Biojava-l] [Biojava-dev] [Fwd: large genbank data]
In-Reply-To: <93b45ca50807180617x7328c2b6r265939c89afd5f7a@mail.gmail.com>
References: <483E0CA2.4010906@asti.dost.gov.ph>
 <93b45ca50807170533k24af6231s89b257bce5c740ad@mail.gmail.com>
 <487FF913.6090504@asti.dost.gov.ph>
 <4880505F.7010308@asti.dost.gov.ph>
 <93b45ca50807180617x7328c2b6r265939c89afd5f7a@mail.gmail.com>
Message-ID:

Hmm in that case it must be something else.

Your original mail only posted the first couple of lines of the stack
trace. Could you post the whole thing so we can take a closer look?
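
A note on the string-length figures traded in this thread: `String.length()` returns an `int` and Java arrays are indexed by `int`, so a single run-time `String` can hold at most `Integer.MAX_VALUE` (2^31 - 1, roughly 2.1 billion) characters, heap permitting. The 65535 figure is the limit on the encoded size of a string constant in a class file and does not apply to strings built at run time. The sketch below is plain Java, not BioJava code; the class name `StringLimitDemo` and the helper `buildSequence` are made up for illustration. It builds a string as long as the AE005174 sequence from the log above to show that this length is unremarkable.

```java
// Illustrative only: StringLimitDemo and buildSequence are hypothetical names,
// not part of BioJava or the loader discussed in this thread.
final class StringLimitDemo {
    // Length of AE005174 as reported in the loader log.
    public static final int SEQ_LENGTH = 5528445;

    // Build a run-time string of the requested length from a repeating DNA motif.
    public static String buildSequence(int length) {
        String motif = "acgt";
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(motif.charAt(i % motif.length()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String seq = buildSequence(SEQ_LENGTH);
        System.out.println("length = " + seq.length());
        System.out.println("longer than 65536: " + (seq.length() > 65536));
        System.out.println("int ceiling = " + Integer.MAX_VALUE);
    }
}
```

At 5,528,445 characters the sequence is far below the `int` ceiling, so string length by itself is unlikely to be what stops this record from loading.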

From rvincent at asti.dost.gov.ph Mon Jul 21 02:35:04 2008
From: rvincent at asti.dost.gov.ph (Rey Vincent Babilonia)
Date: Mon, 21 Jul 2008 10:35:04
+0800 Subject: [Biojava-l] [Biojava-dev] [Fwd: large genbank data] In-Reply-To: References: <483E0CA2.4010906@asti.dost.gov.ph> <93b45ca50807170533k24af6231s89b257bce5c740ad@mail.gmail.com> <487FF913.6090504@asti.dost.gov.ph> <4880505F.7010308@asti.dost.gov.ph> <93b45ca50807180617x7328c2b6r265939c89afd5f7a@mail.gmail.com> Message-ID: <4883F5D8.5030908@asti.dost.gov.ph> Dear all, Here's the complete stack trace: 10:26:14,796 INFO Loader:296 - D:\AE000521.gbk is readable. 10:26:16,046 INFO Loader:340 - Alphabet of AE000521 is Empty Alphabet. Skipping... 10:26:16,250 INFO Loader:296 - D:\AE004438.gbk is readable. 10:26:20,750 FATAL Loader:334 - Sequence AE004438 already exists. 10:26:20,921 INFO Loader:296 - D:\AE005174.gbk is readable. 10:26:28,328 INFO Loader:326 - Loading sequence AE005174 with identifier 56384585, length 5528445 and alphabet DNA... org.hibernate.PropertyAccessException: Exception occurred inside getter of org.biojavax.bio.seq.SimpleRichSequence.sequenceLength at org.hibernate.property.BasicPropertyAccessor$BasicGetter.get(BasicPropertyAccessor.java:148) at org.hibernate.tuple.entity.AbstractEntityTuplizer.getPropertyValues(AbstractEntityTuplizer.java:256) at org.hibernate.tuple.entity.PojoEntityTuplizer.getPropertyValues(PojoEntityTuplizer.java:209) at org.hibernate.persister.entity.AbstractEntityPersister.getPropertyValues(AbstractEntityPersister.java:3581) at org.hibernate.event.def.DefaultMergeEventListener.copyValues(DefaultMergeEventListener.java:377) at org.hibernate.event.def.DefaultMergeEventListener.entityIsTransient(DefaultMergeEventListener.java:179) at org.hibernate.event.def.DefaultMergeEventListener.onMerge(DefaultMergeEventListener.java:123) at org.hibernate.event.def.DefaultMergeEventListener.onMerge(DefaultMergeEventListener.java:53) at org.hibernate.impl.SessionImpl.fireMerge(SessionImpl.java:677) at org.hibernate.impl.SessionImpl.merge(SessionImpl.java:661) at ph.gov.dost.asti.genbankers.Loader.load(Loader.java:328) at 
ph.gov.dost.asti.genbankers.Loader.<init>(Loader.java:137)
	at ph.gov.dost.asti.genbankers.Loader.main(Loader.java:416)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.lang.reflect.Method.invoke(Unknown Source)
	at org.hibernate.property.BasicPropertyAccessor$BasicGetter.get(BasicPropertyAccessor.java:145)
	... 12 more
Caused by: java.lang.NullPointerException
	at org.biojavax.bio.seq.SimpleRichSequence.length(SimpleRichSequence.java:91)
	at org.biojavax.bio.seq.SimpleRichSequence.getSequenceLength(SimpleRichSequence.java:97)
	... 17 more
10:26:28,937 ERROR AbstractBatcher:51 - Exception executing batch:
org.hibernate.StaleStateException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1
	at org.hibernate.jdbc.Expectations$BasicExpectation.checkBatched(Expectations.java:61)
	at org.hibernate.jdbc.Expectations$BasicExpectation.verifyOutcome(Expectations.java:46)
	at org.hibernate.jdbc.BatchingBatcher.checkRowCounts(BatchingBatcher.java:68)
	at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:48)
	at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:246)
	at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:266)
	at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:168)
	at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:298)
	at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27)
	at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1000)
	at ph.gov.dost.asti.genbankers.Loader.load(Loader.java:351)
	at ph.gov.dost.asti.genbankers.Loader.<init>(Loader.java:137)
	at ph.gov.dost.asti.genbankers.Loader.main(Loader.java:416)
10:26:28,937 ERROR AbstractFlushingEventListener:301 - Could not synchronize database state with session
org.hibernate.StaleStateException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1
	at org.hibernate.jdbc.Expectations$BasicExpectation.checkBatched(Expectations.java:61)
	at org.hibernate.jdbc.Expectations$BasicExpectation.verifyOutcome(Expectations.java:46)
	at org.hibernate.jdbc.BatchingBatcher.checkRowCounts(BatchingBatcher.java:68)
	at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:48)
	at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:246)
	at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:266)
	at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:168)
	at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:298)
	at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27)
	at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1000)
	at ph.gov.dost.asti.genbankers.Loader.load(Loader.java:351)
	at ph.gov.dost.asti.genbankers.Loader.<init>(Loader.java:137)
	at ph.gov.dost.asti.genbankers.Loader.main(Loader.java:416)
Exception in thread "main" org.hibernate.StaleStateException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1
	at org.hibernate.jdbc.Expectations$BasicExpectation.checkBatched(Expectations.java:61)
	at org.hibernate.jdbc.Expectations$BasicExpectation.verifyOutcome(Expectations.java:46)
	at org.hibernate.jdbc.BatchingBatcher.checkRowCounts(BatchingBatcher.java:68)
	at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:48)
	at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:246)
	at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:266)
	at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:168)
	at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:298)
	at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27)
	at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1000)
	at ph.gov.dost.asti.genbankers.Loader.load(Loader.java:351)
	at ph.gov.dost.asti.genbankers.Loader.<init>(Loader.java:137)
	at ph.gov.dost.asti.genbankers.Loader.main(Loader.java:416)

Richard Holland wrote:
> Hmm in that case it must be something else.
>
> Your original mail only posted the first couple of lines of the stack
> trace. Could you post the whole thing so we can take a closer look?
>
> 2008/7/18 Mark Schreiber :
>> Was looking on the internet ...
>>
>> So the Java spec says nothing about an upper limit however the sun JDK
>> implements String as a char[] (behind the scenes). Therefore I think
>> that on the Sun JDK with the right amount of RAM you could go to 2^32
>> (except for string literals as mentioned above) which is 4,294,967,296
>> characters. So a string of a sequence should be able to get to about 4
>> billion bases.
>>
>> Of course if you don't assign enough memory to the JVM ( -Xmx4G) you
>> won't be able to get close. Of course even if you can assign that much
>> that doesn't account for all the other Java overhead and all the stuff
>> Hibernate is doing with proxy classes etc. Also BioSQL usually
>> defines sequence as a CLOB so depending on your DB implementation
>> there may be a limit on that. On a 32 bit machine 4GB is all you can
>> get per CPU so you would have issues trying to do anything bigger.
>>
>> Anyhow I know I have stored human chromosome 1 (approx 1 billion bases
>> in memory).
>>
>> - Mark
>>
>> On Fri, Jul 18, 2008 at 6:45 PM, James Carman
>> wrote:
>>> That is a limitation for string literals, not any string. Correct?
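
The string-length question raised in this exchange can be checked directly. The sketch below is only an illustration and is not part of BioJava; the class name StringLengthLimits and the helper repeatBase are made up for this example. It builds, at run time, a string as long as the AE005174 sequence from the log (5528445 characters). As far as the class-file format goes, the 65535-byte ceiling applies to string constants compiled into a class, not to strings built while the program runs. Because String.length() returns an int, the hard upper bound is Integer.MAX_VALUE (2^31 - 1) rather than 2^32, and in practice the heap size is reached well before that.

```java
// Illustration only: hypothetical class, not a BioJava API.
final class StringLengthLimits {

    // Builds a string of n 'A' characters at run time (no string literal involved).
    static String repeatBase(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append('A');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same length as the AE005174 record mentioned in the log output.
        String seq = repeatBase(5528445);
        System.out.println(seq.length());       // prints 5528445
        // String.length() is an int, so this is the theoretical ceiling.
        System.out.println(Integer.MAX_VALUE);  // prints 2147483647
    }
}
```

A run-time string of 5.5 million characters is therefore unremarkable, which fits the later suggestion in the thread that the failure lies somewhere other than the string length.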
>>>
>>> On Fri, Jul 18, 2008 at 4:47 AM, Richard Holland
>>> wrote:
>>>> In order to persist to BioSQL, BioJava has to convert the symbol list
>>>> into a string so that it can pass it to JDBC via Hibernate. Therefore
>>>> the maximum length of a sequence you wish to persist to BioSQL is the
>>>> maximum length of a string in Java, which is 65536 (2^16) if you are
>>>> working in a UTF-8 environment.
>>>>
>>>> 2008/7/18 Rey Vincent Babilonia :
>>>>> Hi Mark,
>>>>>
>>>>> What is the maximum sequence length that a RichSequence can handle?
>>>>>
>>>>> java -Xms1024m -Xmx1256m -jar loader.jar
>>>>> .
>>>>> 16:09:00,173 INFO Loader:296 - D:\AE005174.gbk is readable.
>>>>> 16:09:06,704 INFO Loader:326 - Loading sequence AE005174 with identifier
>>>>> 56384585, length 5528445 and alphabet DNA...
>>>>> org.hibernate.PropertyAccessException: Exception occurred inside getter of
>>>>> org.biojavax.bio.seq.SimpleRichSequence.sequenceLength
>>>>>
>>>>> Rey Vincent Babilonia wrote:
>>>>>> Hi Mark,
>>>>>>
>>>>>> At first it throws an out of memory exception. My workaround is to
>>>>>> subdivide the sequence file into individual GenBank files.
>>>>>>
>>>>>> The error now is that if a GenBank sequence has an 'empty alphabet', it
>>>>>> does not get loaded to BioSQL. My workaround is to check if
>>>>>> sequence.getAlphabet().getName() is DNA.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Mark Schreiber wrote:
>>>>>>> Hi -
>>>>>>>
>>>>>>> Is the code throwing an exception or running out of memory??
>>>>>>>
>>>>>>> Can you send an example program and the problem you encounter to the
>>>>>>> list.
>>>>>>> - Mark
>>>>>>>
>>>>>>> On Thu, May 29, 2008 at 9:53 AM, Rey Vincent Babilonia
>>>>>>> wrote:
>>>>>>>> -------- Original Message --------
>>>>>>>> Subject: large genbank data
>>>>>>>> Date: Wed, 28 May 2008 18:02:48 +0800
>>>>>>>> From: Rey Vincent Babilonia
>>>>>>>> To: biojava-l at biojava.org
>>>>>>>>
>>>>>>>> hi,
>>>>>>>>
>>>>>>>> anybody tried uploading a large genbank data (e.g.
>>>>>>>> ftp://bio-mirror.net/biomirror/genbank/gbbct1.seq.gz) to biosql?
>>>>>>>> load_seqdatabase.pl of bioperl can do this. i'm switching to biojava and
>>>>>>>> it can't read the sequence (maybe because it has 30000+ sequences).
>>>>>>>>
>>>>>>>> thanks.
>>>>>>>>
>>>>>>>> --
>>>>>>>> /**
>>>>>>>> * @author Rey Vincent P. Babilonia
>>>>>>>> * @number +63 2 426 9760 local 1302
>>>>>>>> * @pgp 0x383454CF pgp.mit.edu
>>>>>>>> * @project Philippine Bioinformatics Solutions
>>>>>>>> * @program Philippine e-Science Grid
>>>>>>>> * @division Research and Development Division
>>>>>>>> * @agency Advanced Science and Technology Institute
>>>>>>>> * @url http://www.psigrid.gov.ph
>>>>>>>> */
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> biojava-dev mailing list
>>>>>>>> biojava-dev at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>>>>>
>>>> _______________________________________________
>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l

--
/**
 * @author Rey Vincent P. Babilonia
 * @number +63 2 426 9760 local 1302
 * @pgp 0x383454CF pgp.mit.edu
 * @project Philippine Bioinformatics Solutions
 * @program Philippine e-Science Grid
 * @division Research and Development Division
 * @agency Advanced Science and Technology Institute
 * @url http://www.psigrid.gov.ph
 */

From holland at eaglegenomics.com Mon Jul 21 09:28:46 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 21 Jul 2008 10:28:46 +0100
Subject: [Biojava-l] BioJava3 Use Cases
Message-ID:

Hi guys,

I'd like to repeat an earlier request for use cases to guide the new BioJava 3 development work.
We have a wiki page for this but it hasn't seen many updates:

http://biojava.org/wiki/BioJava_3_Use_Cases

Could anyone who has a task which BioJava cannot currently achieve, or does not achieve correctly, please add that task to this wiki page, so that we can try to implement it in the new code. A template for a use case has been provided on that same wiki page, which you should follow when submitting your own suggestions.

Basically the rule is that saying something like 'I want microarray support' isn't likely to get much of a response, but asking for a specific function, e.g. 'I want to be able to parse MAGE files' or 'I want to use XYZ technique to analyse my own chip designs', will get you a lot further.

I'm setting a cut-off date for the initial list of use cases at August 1st. Whatever's on the page at that point will be considered for implementation in the first phase of development over the next 6 months, along with updates or transfers of functionality from the existing code base where appropriate. Anything that gets added to the list after that date will only get implemented in the second, later phase (date indeterminate as yet), unless whoever submits the use case also chooses to submit their own code to solve it!

cheers,
Richard

--
Richard Holland
Bioinformatics Software Developer
Eagle Genomics
http://www.eaglegenomics.com/

From charles at imbusch.net Wed Jul 23 09:40:30 2008
From: charles at imbusch.net (Charles Imbusch)
Date: Wed, 23 Jul 2008 11:40:30 +0200
Subject: [Biojava-l] parsing BLAST result
Message-ID: <4886FC8E.4070400@imbusch.net>

Hello,

for a project I have to parse Blast output files. To do this I used the code provided on this page:

http://biojava.org/wiki/BioJava:CookBook:Blast:Parser

I'm interested in the start and stop positions of the subject I align with, so I adjusted the code a bit so that it looks like:

//list the hits
for (Iterator k = result.getHits().iterator(); k.hasNext(); ) {
    SeqSimilaritySearchHit hit = (SeqSimilaritySearchHit)k.next();
    System.out.print("\tmatch: "+hit.getSubjectID());
    System.out.print("\tSubSeqStart: "+hit.getSubjectStart());
    System.out.print("\tSubSeqStop: "+hit.getSubjectEnd());
    System.out.println("\te score: "+hit.getEValue());
}

I execute "java BlastParserOriginal S2431-F.fasta.txt" and have a look at the best hit:

...
match: 48_scaffold.txt    SubSeqStart: 3320    SubSeqStop: 2952643    e score: 0.0
...

The subject id is correct but the numbers are just nonsense. It should be 610956 for the start and 610367 for the end position.

This doesn't happen with all Blast result files but with some. Is there a solution for that? How do you parse the Blast files?

I just uploaded the Blast output to http://charles.imbusch.net/tmp/

Any answer is appreciated.

Cheers,
Charles

From holland at eaglegenomics.com Wed Jul 23 18:20:53 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 23 Jul 2008 19:20:53 +0100
Subject: [Biojava-l] parsing BLAST result
In-Reply-To: <4886FC8E.4070400@imbusch.net>
References: <4886FC8E.4070400@imbusch.net>
Message-ID:

Your hits consist of numerous sub-hits, which means that the hits themselves don't contain meaningful data.
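
One plausible reading of the numbers in the question, offered as an assumption rather than a statement about BioJava internals, is that the hit-level start and end describe an envelope stretched over all of the sub-hits. The sketch below shows how such an envelope can look like nonsense even when one sub-hit carries exactly the expected coordinates. SubHitRange and HitEnvelope are hypothetical stand-ins, not BioJava classes, and two of the three ranges are invented so that the envelope comes out as the 3320 and 2952643 reported above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one sub-hit (HSP); this is not a BioJava class.
final class SubHitRange {
    final int start;
    final int end;

    SubHitRange(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

// Illustration only: computes the span covered by a set of sub-hits.
final class HitEnvelope {

    // Smallest subject coordinate touched by any sub-hit, ignoring strand.
    // Returns Integer.MAX_VALUE for an empty list.
    static int envelopeStart(List<SubHitRange> subHits) {
        int min = Integer.MAX_VALUE;
        for (SubHitRange s : subHits) {
            min = Math.min(min, Math.min(s.start, s.end));
        }
        return min;
    }

    // Largest subject coordinate touched by any sub-hit, ignoring strand.
    // Returns Integer.MIN_VALUE for an empty list.
    static int envelopeEnd(List<SubHitRange> subHits) {
        int max = Integer.MIN_VALUE;
        for (SubHitRange s : subHits) {
            max = Math.max(max, Math.max(s.start, s.end));
        }
        return max;
    }

    public static void main(String[] args) {
        List<SubHitRange> subHits = new ArrayList<SubHitRange>();
        subHits.add(new SubHitRange(610956, 610367));   // the range Charles expected (reverse strand)
        subHits.add(new SubHitRange(3320, 3350));       // invented weak sub-hit
        subHits.add(new SubHitRange(2952600, 2952643)); // invented weak sub-hit
        System.out.println("envelope: " + envelopeStart(subHits) + " to " + envelopeEnd(subHits));
        // prints "envelope: 3320 to 2952643"
    }
}
```

Under that assumption, the per-sub-hit coordinates are the ones worth reporting, and the hit-level pair is only a bounding range.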
You can get the sub-hits by doing this:

// existing code to list the hits
for (Iterator k = result.getHits().iterator(); k.hasNext(); ) {
    SeqSimilaritySearchHit hit = (SeqSimilaritySearchHit)k.next();
    System.out.print("\tmatch: "+hit.getSubjectID());
    System.out.print("\tSubSeqStart: "+hit.getSubjectStart());
    System.out.print("\tSubSeqStop: "+hit.getSubjectEnd());
    System.out.println("\te score: "+hit.getEValue());

    // new code to get the subhits
    System.out.println("\t\t Subhits:");
    for (Iterator j = hit.getSubHits().iterator(); j.hasNext(); ) {
        SeqSimilaritySearchSubHit subhit = (SeqSimilaritySearchSubHit)j.next();
        System.out.print("\t\tSubSeqStart: "+subhit.getSubjectStart());
        System.out.print("\t\tSubSeqStop: "+subhit.getSubjectEnd());
        System.out.println("\t\te score: "+subhit.getEValue());
    }
}

cheers,
Richard

--
Richard Holland
Bioinformatics Software Developer
Eagle Genomics
http://www.eaglegenomics.com/

From charles at imbusch.net Wed Jul 23 23:22:52 2008
From: charles at imbusch.net (Charles Imbusch)
Date: Thu, 24 Jul 2008 01:22:52 +0200
Subject: [Biojava-l] parsing BLAST result
In-Reply-To:
References: <4886FC8E.4070400@imbusch.net>
Message-ID: <4887BD4C.8090509@imbusch.net>

Thanks for that information. That did the job!

Cheers,
Charles

From peter.robinson at t-online.de Sat Jul 26 10:41:49 2008
From: peter.robinson at t-online.de (Peter Robinson)
Date: Sat, 26 Jul 2008 12:41:49 +0200
Subject: [Biojava-l] Installation woes
Message-ID: <488AFF6D.1000505@t-online.de>

Hi Biojava,

I am entirely
new to Biojava and have limited Java experience (C is more my thing), and so this is almost certainly a dumb question, but I cannot seem to find an answer in the online docs. I am running debian 4 linux and have:

java version "1.6.0_06"
Java(TM) SE Runtime Environment (build 1.6.0_06-b02)
Java HotSpot(TM) Server VM (build 10.0-b22, mixed mode)

I have downloaded the biojava code, unpacked it, and set the CLASSPATH in bashrc:

BIOJAVA_BASE=/home/peter/bin/biojava/biojava-live_1.6
export CLASSPATH=${BIOJAVA_BASE}/biojava.jar
export CLASSPATH=${CLASSPATH}:${BIOJAVA_BASE}/commons-cli.jar
export CLASSPATH=${CLASSPATH}:${BIOJAVA_BASE}/commons-collections-2.1.jar
export CLASSPATH=${CLASSPATH}:${BIOJAVA_BASE}/bytecode.jar
export CLASSPATH=${CLASSPATH}:${BIOJAVA_BASE}/commons-dbcp-1.1.jar
export CLASSPATH=${CLASSPATH}:${BIOJAVA_BASE}/commons-pool-1.1.jar
export CLASSPATH=${CLASSPATH}:.

This also goes through without error from the command line. However, when I try to compile one of the test programs as instructed on the page http://biojava.org/wiki/BioJava:GetStarted

peter at peter:~/bin/biojava/biojava-live_1.6/demos$ javac seq/TestEmbl.java

I get a bunch of errors; apparently javac cannot find the imports it needs (see bottom of this mail).

I would greatly appreciate any tips on how to get started here!
Thanks, Peter

peter at peter:~/bin/biojava/biojava-live_1.6/demos$ javac seq/TestEmbl.java
seq/TestEmbl.java:25: package org.biojavax does not exist
import org.biojavax.Namespace;
    ^
seq/TestEmbl.java:26: package org.biojavax does not exist
import org.biojavax.RichObjectFactory;
    ^
seq/TestEmbl.java:27: package org.biojavax.bio.seq does not exist
import org.biojavax.bio.seq.RichSequence;
    ^
seq/TestEmbl.java:28: package org.biojavax.bio.seq does not exist
import org.biojavax.bio.seq.RichSequenceIterator;
    ^
seq/TestEmbl.java:48: cannot find symbol
symbol  : class Namespace
location: class seq.TestEmbl
    Namespace ns = RichObjectFactory.getDefaultNamespace();
    ^
seq/TestEmbl.java:48: cannot find symbol
symbol  : variable RichObjectFactory
location: class seq.TestEmbl
    Namespace ns = RichObjectFactory.getDefaultNamespace();
    ^
seq/TestEmbl.java:50: cannot find symbol
symbol  : class RichSequenceIterator
location: class seq.TestEmbl
    RichSequenceIterator seqI =
    ^
seq/TestEmbl.java:51: package RichSequence does not exist
    RichSequence.IOTools.readEMBLDNA(br, ns);
    ^
seq/TestEmbl.java:54: cannot find symbol
symbol  : class RichSequence
location: class seq.TestEmbl
    RichSequence seq = seqI.nextRichSequence();
    ^
seq/TestEmbl.java:57: package RichSequence does not exist
    RichSequence.IOTools.writeEMBL(System.out, seq, ns);
    ^
10 errors

From james at carmanconsulting.com Sat Jul 26 12:40:55 2008
From: james at carmanconsulting.com (James Carman)
Date: Sat, 26 Jul 2008 08:40:55 -0400
Subject: [Biojava-l] Installation woes
In-Reply-To: <488AFF6D.1000505@t-online.de>
References: <488AFF6D.1000505@t-online.de>
Message-ID:

Try export CLASSPATH=$CLASSPATH:...

Basically, remove the "squiggly braces"

From peter.robinson at t-online.de Sun Jul 27 07:23:13 2008
From: peter.robinson at t-online.de (Peter Robinson)
Date: Sun, 27 Jul 2008 09:23:13 +0200
Subject: [Biojava-l] Installation woes
In-Reply-To:
References: <488AFF6D.1000505@t-online.de>
Message-ID: <488C2261.9000303@t-online.de>

Hi,

Thanks.
I think that squiggly braces are OK for the shell, but in any case, I removed them from .bashrc, which now goes as follows:

export CLASSPATH=/home/peter/bin/biojava/biojava-live_1.6/biojava.jar
export CLASSPATH=$CLASSPATH:/home/peter/bin/biojava/biojava-live_1.6/commons-dbcp-1.1.jar
export CLASSPATH=$CLASSPATH:/home/peter/bin/biojava/biojava-live_1.6/commons-cli.jar
export CLASSPATH=$CLASSPATH:/home/peter/bin/biojava/biojava-live_1.6/bytecode.jar
export CLASSPATH=$CLASSPATH:/home/peter/bin/biojava/biojava-live_1.6/commons-collections-2.1.jar
export CLASSPATH=$CLASSPATH:/home/peter/bin/biojava/biojava-live_1.6/commons-pool-1.1.jar
export CLASSPATH=$CLASSPATH:.

**************The class path variable seems to be set OK

peter at peter:~/bin/biojava/biojava-live_1.6/demos$ echo $CLASSPATH
/home/peter/bin/biojava/biojava-live_1.6/biojava.jar:/home/peter/bin/biojava/biojava-live_1.6/commons-dbcp-1.1.jar:/home/peter/bin/biojava/biojava-live_1.6/commons-cli.jar:/home/peter/bin/biojava/biojava-live_1.6/bytecode.jar:/home/peter/bin/biojava/biojava-live_1.6/commons-collections-2.1.jar:/home/peter/bin/biojava/biojava-live_1.6/commons-pool-1.1.jar:.
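
A classpath that looks right when echoed can still name files that are not on disk, and as far as I know javac does not stop on an entry that does not exist; it simply finds no classes there. The following is a minimal sketch of an existence check for each entry. It is not a BioJava utility: the class name ClasspathCheck and the method missingEntries are made up for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Illustration only: hypothetical helper, not part of BioJava or the JDK tools.
final class ClasspathCheck {

    // Returns every classpath entry that does not exist on disk.
    // Entries are split on the platform separator (":" on Linux, ";" on Windows).
    static List<String> missingEntries(String classpath) {
        List<String> missing = new ArrayList<String>();
        for (String entry : classpath.split(File.pathSeparator)) {
            if (entry.isEmpty()) {
                continue; // skip empty segments such as a trailing separator
            }
            if (!new File(entry).exists()) {
                missing.add(entry);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String cp = System.getenv("CLASSPATH");
        if (cp == null) {
            System.out.println("CLASSPATH is not set");
            return;
        }
        for (String entry : missingEntries(cp)) {
            System.out.println("missing: " + entry);
        }
    }
}
```

Compiled and run in the same shell as the failing javac command, this lists any entry whose file name does not match what is actually in the installation directory.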
************ The paths appear to be correct: peter at peter:~/bin/biojava/biojava-live_1.6/demos$ ls /home/peter/bin/biojava/biojava-live_1.6/*.jar /home/peter/bin/biojava/biojava-live_1.6/apps-live.jar /home/peter/bin/biojava/biojava-live_1.6/commons-dbcp-1.1.jar /home/peter/bin/biojava/biojava-live_1.6/biojava-live.jar /home/peter/bin/biojava/biojava-live_1.6/commons-pool-1.1.jar /home/peter/bin/biojava/biojava-live_1.6/bytecode.jar /home/peter/bin/biojava/biojava-live_1.6/demos-live.jar /home/peter/bin/biojava/biojava-live_1.6/commons-cli.jar /home/peter/bin/biojava/biojava-live_1.6/jgrapht-jdk1.5.jar /home/peter/bin/biojava/biojava-live_1.6/commons-collections-2.1.jar /home/peter/bin/biojava/biojava-live_1.6/junit-4.4.jar ***********But again, I cannot compile any of the demo programs peter at peter:~/bin/biojava/biojava-live_1.6/demos$ javac seq/TestEmbl.java seq/TestEmbl.java:25: package org.biojavax does not exist import org.biojavax.Namespace; ^ seq/TestEmbl.java:26: package org.biojavax does not exist import org.biojavax.RichObjectFactory; ^ seq/TestEmbl.java:27: package org.biojavax.bio.seq does not exist import org.biojavax.bio.seq.RichSequence; ^ seq/TestEmbl.java:28: package org.biojavax.bio.seq does not exist import org.biojavax.bio.seq.RichSequenceIterator; ^ seq/TestEmbl.java:48: cannot find symbol symbol : class Namespace location: class seq.TestEmbl Namespace ns = RichObjectFactory.getDefaultNamespace(); ^ seq/TestEmbl.java:48: cannot find symbol symbol : variable RichObjectFactory location: class seq.TestEmbl Namespace ns = RichObjectFactory.getDefaultNamespace(); ^ seq/TestEmbl.java:50: cannot find symbol symbol : class RichSequenceIterator location: class seq.TestEmbl RichSequenceIterator seqI = ^ seq/TestEmbl.java:51: package RichSequence does not exist RichSequence.IOTools.readEMBLDNA(br, ns); ^ seq/TestEmbl.java:54: cannot find symbol symbol : class RichSequence location: class seq.TestEmbl RichSequence seq = seqI.nextRichSequence(); ^ 
seq/TestEmbl.java:57: package RichSequence does not exist RichSequence.IOTools.writeEMBL(System.out, seq, ns); ^ 10 errors peter at peter:~/bin/biojava/biojava-live_1.6/demos$ Richard Holland wrote: > Hello. > > Before typing the javac instruction, type the following to check what > your classpath actually contains: > > echo $CLASSPATH > > If this doesn't immediately 'look right' (i.e. it has curly braces or > variable names embedded in it, or doesn't match where you think the > files are), then this'll be where the problem is. > > If you can't see any obvious problems with it, then post it as a reply > to this message and we can take a closer look. > > cheers, > Richard > > > 2008/7/26 James Carman : > >> Try export CLASSPATH=$CLASSPATH:... >> >> Basically, remove the "squiggly braces" >> >> >> On Sat, Jul 26, 2008 at 6:41 AM, Peter Robinson >> wrote: >> >>> Hi Biojava, >>> >>> I am entirely new to Biojava and have limited Java experience (C is more my >>> thing), and so this is almost certainly a dumb question, but I cannot seem >>> to find an answer in the online docs. I am running debian 4 linux and have: >>> >>> java version "1.6.0_06" >>> Java(TM) SE Runtime Environment (build 1.6.0_06-b02) >>> Java HotSpot(TM) Server VM (build 10.0-b22, mixed mode) >>> >>> >>> >>> I have downloaded the biojava code, unpacked it, and set the CLASSPATH in >>> bashrc : >>> >>> BIOJAVA_BASE=/home/peter/bin/biojava/biojava-live_1.6 >>> export CLASSPATH=${BIOJAVA_BASE}/biojava.jar >>> export CLASSPATH=${CLASSPATH}:${BIOJAVA_BASE}/commons-cli.jar >>> export CLASSPATH=${CLASSPATH}:${BIOJAVA_BASE}/commons-collections-2.1.jar >>> export CLASSPATH=${CLASSPATH}:${BIOJAVA_BASE}/bytecode.jar >>> export CLASSPATH=${CLASSPATH}:${BIOJAVA_BASE}/commons-dbcp-1.1.jar >>> export CLASSPATH=${CLASSPATH}:${BIOJAVA_BASE}/commons-pool-1.1.jar >>> export CLASSPATH=${CLASSPATH}:. >>> >>> >>> This also goes through without error from the command line. 
However, when I >>> try to compile one of the test programs as instructed on the page: >>> http://biojava.org/wiki/BioJava:GetStarted >>> >>> peter at peter:~/bin/biojava/biojava-live_1.6/demos$ javac seq/TestEmbl.java >>> >>> >>> I get a bunch of errors, apparently javac cannot find the imports it needs. >>> (see bottom of this mail). >>> >>> I would greatly appreciate any tips how to get started here! >>> Thanks, Peter >>> >>> >>> >>> >>> peter at peter:~/bin/biojava/biojava-live_1.6/demos$ javac seq/TestEmbl.java >>> seq/TestEmbl.java:25: package org.biojavax does not exist >>> import org.biojavax.Namespace; >>> ^ >>> seq/TestEmbl.java:26: package org.biojavax does not exist >>> import org.biojavax.RichObjectFactory; >>> ^ >>> seq/TestEmbl.java:27: package org.biojavax.bio.seq does not exist >>> import org.biojavax.bio.seq.RichSequence; >>> ^ >>> seq/TestEmbl.java:28: package org.biojavax.bio.seq does not exist >>> import org.biojavax.bio.seq.RichSequenceIterator; >>> ^ >>> seq/TestEmbl.java:48: cannot find symbol >>> symbol : class Namespace >>> location: class seq.TestEmbl >>> Namespace ns = RichObjectFactory.getDefaultNamespace(); >>> ^ >>> seq/TestEmbl.java:48: cannot find symbol >>> symbol : variable RichObjectFactory >>> location: class seq.TestEmbl >>> Namespace ns = RichObjectFactory.getDefaultNamespace(); >>> ^ >>> seq/TestEmbl.java:50: cannot find symbol >>> symbol : class RichSequenceIterator >>> location: class seq.TestEmbl >>> RichSequenceIterator seqI = >>> ^ >>> seq/TestEmbl.java:51: package RichSequence does not exist >>> RichSequence.IOTools.readEMBLDNA(br, ns); >>> ^ >>> seq/TestEmbl.java:54: cannot find symbol >>> symbol : class RichSequence >>> location: class seq.TestEmbl >>> RichSequence seq = seqI.nextRichSequence(); >>> ^ >>> seq/TestEmbl.java:57: package RichSequence does not exist >>> RichSequence.IOTools.writeEMBL(System.out, seq, ns); >>> ^ >>> 10 errors >>> peter at peter:~/bin/biojava/biojava-live_1.6/demos$ java -version >>> java 
version "1.6.0_06" >>> Java(TM) SE Runtime Environment (build 1.6.0_06-b02) >>> Java HotSpot(TM) Server VM (build 10.0-b22, mixed mode) >>> peter at peter:~/bin/biojava/biojava-live_1.6/demos$ javac seq/TestEmbl.java >>> seq/TestEmbl.java:25: package org.biojavax does not exist >>> import org.biojavax.Namespace; >>> ^ >>> seq/TestEmbl.java:26: package org.biojavax does not exist >>> import org.biojavax.RichObjectFactory; >>> ^ >>> seq/TestEmbl.java:27: package org.biojavax.bio.seq does not exist >>> import org.biojavax.bio.seq.RichSequence; >>> ^ >>> seq/TestEmbl.java:28: package org.biojavax.bio.seq does not exist >>> import org.biojavax.bio.seq.RichSequenceIterator; >>> ^ >>> seq/TestEmbl.java:48: cannot find symbol >>> symbol : class Namespace >>> location: class seq.TestEmbl >>> Namespace ns = RichObjectFactory.getDefaultNamespace(); >>> ^ >>> seq/TestEmbl.java:48: cannot find symbol >>> symbol : variable RichObjectFactory >>> location: class seq.TestEmbl >>> Namespace ns = RichObjectFactory.getDefaultNamespace(); >>> ^ >>> seq/TestEmbl.java:50: cannot find symbol >>> symbol : class RichSequenceIterator >>> location: class seq.TestEmbl >>> RichSequenceIterator seqI = >>> ^ >>> seq/TestEmbl.java:51: package RichSequence does not exist >>> RichSequence.IOTools.readEMBLDNA(br, ns); >>> ^ >>> seq/TestEmbl.java:54: cannot find symbol >>> symbol : class RichSequence >>> location: class seq.TestEmbl >>> RichSequence seq = seqI.nextRichSequence(); >>> ^ >>> seq/TestEmbl.java:57: package RichSequence does not exist >>> RichSequence.IOTools.writeEMBL(System.out, seq, ns); >>> ^ >>> 10 errors >>> peter at peter:~/bin/biojava/biojava-live_1.6/demos$ >>> >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >>> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> 
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>

From peter.robinson at t-online.de Sun Jul 27 07:46:56 2008
From: peter.robinson at t-online.de (Peter Robinson)
Date: Sun, 27 Jul 2008 09:46:56 +0200
Subject: [Biojava-l] Installation woes [SOLVED]
In-Reply-To:
References: <488AFF6D.1000505@t-online.de>
Message-ID: <488C27F0.6070705@t-online.de>

Richard Holland wrote:

I found the problem. The file biojava-live.jar needs to be added to the
CLASSPATH.

This means that the page
http://biojava.org/wiki/BioJava:GetStarted
needs to be corrected!

cheers,
Peter

> Hello.
>
> Before typing the javac instruction, type the following to check what
> your classpath actually contains:
>
> echo $CLASSPATH
>
> If this doesn't immediately 'look right' (i.e. it has curly braces or
> variable names embedded in it, or doesn't match where you think the
> files are), then this'll be where the problem is.
>
> If you can't see any obvious problems with it, then post it as a reply
> to this message and we can take a closer look.
>
> cheers,
> Richard
>
>
> 2008/7/26 James Carman :
>
>> Try export CLASSPATH=$CLASSPATH:...
>>
>> Basically, remove the "squiggly braces"
>>
>>
>> On Sat, Jul 26, 2008 at 6:41 AM, Peter Robinson
>> wrote:
>>
>>> Hi Biojava,
>>>
>>> I am entirely new to Biojava and have limited Java experience (C is more my
>>> thing), and so this is almost certainly a dumb question, but I cannot seem
>>> to find an answer in the online docs. I am running debian 4 linux and have:
>>>
>>> java version "1.6.0_06"
>>> Java(TM) SE Runtime Environment (build 1.6.0_06-b02)
>>> Java HotSpot(TM) Server VM (build 10.0-b22, mixed mode)
>>>
>>> I have downloaded the biojava code, unpacked it, and set the CLASSPATH in
>>> bashrc :
>>>
>>> BIOJAVA_BASE=/home/peter/bin/biojava/biojava-live_1.6
>>> export CLASSPATH=${BIOJAVA_BASE}/biojava.jar
>>> export CLASSPATH=${CLASSPATH}:${BIOJAVA_BASE}/commons-cli.jar
>>> export CLASSPATH=${CLASSPATH}:${BIOJAVA_BASE}/commons-collections-2.1.jar
>>> export CLASSPATH=${CLASSPATH}:${BIOJAVA_BASE}/bytecode.jar
>>> export CLASSPATH=${CLASSPATH}:${BIOJAVA_BASE}/commons-dbcp-1.1.jar
>>> export CLASSPATH=${CLASSPATH}:${BIOJAVA_BASE}/commons-pool-1.1.jar
>>> export CLASSPATH=${CLASSPATH}:.
>>>
>>> This also goes through without error from the command line. However, when I
>>> try to compile one of the test programs as instructed on the page:
>>> http://biojava.org/wiki/BioJava:GetStarted
>>>
>>> peter at peter:~/bin/biojava/biojava-live_1.6/demos$ javac seq/TestEmbl.java
>>>
>>> I get a bunch of errors, apparently javac cannot find the imports it needs.
>>> (see bottom of this mail).
>>>
>>> I would greatly appreciate any tips how to get started here!
>>> Thanks, Peter
>>>
>>> peter at peter:~/bin/biojava/biojava-live_1.6/demos$ javac seq/TestEmbl.java
>>> seq/TestEmbl.java:25: package org.biojavax does not exist
>>> import org.biojavax.Namespace;
>>>                    ^
>>> seq/TestEmbl.java:26: package org.biojavax does not exist
>>> import org.biojavax.RichObjectFactory;
>>>                    ^
>>> seq/TestEmbl.java:27: package org.biojavax.bio.seq does not exist
>>> import org.biojavax.bio.seq.RichSequence;
>>>                            ^
>>> seq/TestEmbl.java:28: package org.biojavax.bio.seq does not exist
>>> import org.biojavax.bio.seq.RichSequenceIterator;
>>>                            ^
>>> seq/TestEmbl.java:48: cannot find symbol
>>> symbol  : class Namespace
>>> location: class seq.TestEmbl
>>> Namespace ns = RichObjectFactory.getDefaultNamespace();
>>> ^
>>> seq/TestEmbl.java:48: cannot find symbol
>>> symbol  : variable RichObjectFactory
>>> location: class seq.TestEmbl
>>> Namespace ns = RichObjectFactory.getDefaultNamespace();
>>>                ^
>>> seq/TestEmbl.java:50: cannot find symbol
>>> symbol  : class RichSequenceIterator
>>> location: class seq.TestEmbl
>>> RichSequenceIterator seqI =
>>> ^
>>> seq/TestEmbl.java:51: package RichSequence does not exist
>>> RichSequence.IOTools.readEMBLDNA(br, ns);
>>>             ^
>>> seq/TestEmbl.java:54: cannot find symbol
>>> symbol  : class RichSequence
>>> location: class seq.TestEmbl
>>> RichSequence seq = seqI.nextRichSequence();
>>> ^
>>> seq/TestEmbl.java:57: package RichSequence does not exist
>>> RichSequence.IOTools.writeEMBL(System.out, seq, ns);
>>>             ^
>>> 10 errors
>>> peter at peter:~/bin/biojava/biojava-live_1.6/demos$ java -version
>>> java version "1.6.0_06"
>>> Java(TM) SE Runtime Environment (build 1.6.0_06-b02)
>>> Java HotSpot(TM) Server VM (build 10.0-b22, mixed mode)
>>> peter at peter:~/bin/biojava/biojava-live_1.6/demos$

From james at carmanconsulting.com Sun Jul 27 11:43:49 2008
From: james at carmanconsulting.com (James Carman)
Date: Sun, 27 Jul 2008 07:43:49 -0400
Subject: [Biojava-l] Installation woes
In-Reply-To: <488C2261.9000303@t-online.de>
References: <488AFF6D.1000505@t-online.de> <488C2261.9000303@t-online.de>
Message-ID:

This is exactly why BioJava needs to be
Mavenized!

On Sun, Jul 27, 2008 at 3:23 AM, Peter Robinson wrote:
> Hi,
> Thanks. I think that squiggly braces are OK for the shell, but in any case,
> I removed them from .bashrc, which now goes as follows:
>
> export CLASSPATH=/home/peter/bin/biojava/biojava-live_1.6/biojava.jar
> export CLASSPATH=$CLASSPATH:/home/peter/bin/biojava/biojava-live_1.6/commons-dbcp-1.1.jar
> export CLASSPATH=$CLASSPATH:/home/peter/bin/biojava/biojava-live_1.6/commons-cli.jar
> export CLASSPATH=$CLASSPATH:/home/peter/bin/biojava/biojava-live_1.6/bytecode.jar
> export CLASSPATH=$CLASSPATH:/home/peter/bin/biojava/biojava-live_1.6/commons-collections-2.1.jar
> export CLASSPATH=$CLASSPATH:/home/peter/bin/biojava/biojava-live_1.6/commons-pool-1.1.jar
> export CLASSPATH=$CLASSPATH:.
>
> ************** The class path variable seems to be set OK
>
> peter at peter:~/bin/biojava/biojava-live_1.6/demos$ echo $CLASSPATH
> /home/peter/bin/biojava/biojava-live_1.6/biojava.jar:/home/peter/bin/biojava/biojava-live_1.6/commons-dbcp-1.1.jar:/home/peter/bin/biojava/biojava-live_1.6/commons-cli.jar:/home/peter/bin/biojava/biojava-live_1.6/bytecode.jar:/home/peter/bin/biojava/biojava-live_1.6/commons-collections-2.1.jar:/home/peter/bin/biojava/biojava-live_1.6/commons-pool-1.1.jar:.
>
> ************ The paths appear to be correct:
> peter at peter:~/bin/biojava/biojava-live_1.6/demos$ ls /home/peter/bin/biojava/biojava-live_1.6/*.jar
> /home/peter/bin/biojava/biojava-live_1.6/apps-live.jar
> /home/peter/bin/biojava/biojava-live_1.6/biojava-live.jar
> /home/peter/bin/biojava/biojava-live_1.6/bytecode.jar
> /home/peter/bin/biojava/biojava-live_1.6/commons-cli.jar
> /home/peter/bin/biojava/biojava-live_1.6/commons-collections-2.1.jar
> /home/peter/bin/biojava/biojava-live_1.6/commons-dbcp-1.1.jar
> /home/peter/bin/biojava/biojava-live_1.6/commons-pool-1.1.jar
> /home/peter/bin/biojava/biojava-live_1.6/demos-live.jar
> /home/peter/bin/biojava/biojava-live_1.6/jgrapht-jdk1.5.jar
> /home/peter/bin/biojava/biojava-live_1.6/junit-4.4.jar
>
> *********** But again, I cannot compile any of the demo programs
>
> peter at peter:~/bin/biojava/biojava-live_1.6/demos$ javac seq/TestEmbl.java
> seq/TestEmbl.java:25: package org.biojavax does not exist
> import org.biojavax.Namespace;
>                    ^
> seq/TestEmbl.java:26: package org.biojavax does not exist
> import org.biojavax.RichObjectFactory;
>                    ^
> seq/TestEmbl.java:27: package org.biojavax.bio.seq does not exist
> import org.biojavax.bio.seq.RichSequence;
>                            ^
> seq/TestEmbl.java:28: package org.biojavax.bio.seq does not exist
> import org.biojavax.bio.seq.RichSequenceIterator;
>                            ^
> seq/TestEmbl.java:48: cannot find symbol
> symbol  : class Namespace
> location: class seq.TestEmbl
> Namespace ns = RichObjectFactory.getDefaultNamespace();
> ^
> seq/TestEmbl.java:48: cannot find symbol
> symbol  : variable RichObjectFactory
> location: class seq.TestEmbl
> Namespace ns = RichObjectFactory.getDefaultNamespace();
>                ^
> seq/TestEmbl.java:50: cannot find symbol
> symbol  : class RichSequenceIterator
> location: class seq.TestEmbl
> RichSequenceIterator seqI =
> ^
> seq/TestEmbl.java:51: package RichSequence does not exist
> RichSequence.IOTools.readEMBLDNA(br, ns);
>             ^
> seq/TestEmbl.java:54: cannot find symbol
> symbol  : class RichSequence
> location: class seq.TestEmbl
> RichSequence seq = seqI.nextRichSequence();
> ^
> seq/TestEmbl.java:57: package RichSequence does not exist
> RichSequence.IOTools.writeEMBL(System.out, seq, ns);
>             ^
> 10 errors
> peter at peter:~/bin/biojava/biojava-live_1.6/demos$
>
>
> Richard Holland wrote:
>>
>> Hello.
>>
>> Before typing the javac instruction, type the following to check what
>> your classpath actually contains:
>>
>> echo $CLASSPATH
>>
>> If this doesn't immediately 'look right' (i.e. it has curly braces or
>> variable names embedded in it, or doesn't match where you think the
>> files are), then this'll be where the problem is.
>>
>> If you can't see any obvious problems with it, then post it as a reply
>> to this message and we can take a closer look.
>>
>> cheers,
>> Richard
>>
>>
>> 2008/7/26 James Carman :
>>
>>> Try export CLASSPATH=$CLASSPATH:...
>>>
>>> Basically, remove the "squiggly braces"

From andreas.prlic at gmail.com Sun Jul 27 13:06:18 2008
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Sun, 27 Jul 2008 06:06:18 -0700
Subject: [Biojava-l] build.xml (was: Installation woes [SOLVED])
Message-ID: <59a41c430807270606o135cc94ai26a9ede906e4967@mail.gmail.com>

Hi,

I built the release with the default biojava ant build file.
"ant dist" , which contains the line

i.e. it should be changed there...

In general about our build.xml: This file contains many tasks. I
believe it might make sense to split it into smaller files e.g.

* a build.xml that contains the core tasks to build biojava from svn and a
* build-release.xml that contains the tasks that are release related...

Andreas

On 27 Jul 2008, at 00:46, Peter Robinson wrote:

Richard Holland wrote:

I found the problem. The file biojava-live.jar needs to be added to the
CLASSPATH
This means that the page
http://biojava.org/wiki/BioJava:GetStarted
needs to be corrected!

From peter.robinson at t-online.de Sun Jul 27 15:57:28 2008
From: peter.robinson at t-online.de (Peter Robinson)
Date: Sun, 27 Jul 2008 17:57:28 +0200
Subject: [Biojava-l] Short names for Amino acid symbols
Message-ID: <488C9AE8.9080305@t-online.de>

Hi,

thanks to all on the list who helped me get started with Biojava, and by
the way, the online documents are quite helpful!

I am trying to develop some code to look for signs of positive selection
in human sequences by making multiple alignments of protein sequences and
mapping the nucleotide sequences onto this alignment and checking
synonymous and nonsynonymous nucleotide substitutions in several species
(etc). A few small questions:

1) I have written a class to encapsulate all I need from a given Genbank
mRNA sequence: the entire mRNA, the CDS and the corresponding protein sequence.
I have some methods such as the following:

    private void setCDSSequence() {
        Feature CDS = getCDSFeature(this.completeSequence);
        Location loc = CDS.getLocation();
        // -3 to remove stop codon
        SymbolList symL = this.completeSequence.subList(loc.getMin(), loc.getMax() - 3);
        this.CDS = symL;
    }

Question: Why is there (seemingly) no way in Biojava to create a Sequence
object instead of a SymbolList object? Or did I miss something?

2) I would then like to print out the protein alignment to check for
correctness, and it seems there is no way of getting from a symbol to
the one-letter amino acid code. That is,

    proteinAlignment.get(j).symbolAt(k).getName()

will return "Ala" instead of "A" etc. Is there a good way of getting the
short symbols?

Thanks,
Peter

From community at struck.lu Mon Jul 28 09:25:41 2008
From: community at struck.lu (community at struck.lu)
Date: Mon, 28 Jul 2008 11:25:41 +0200
Subject: [Biojava-l] Short names for Amino acid symbols
In-Reply-To: <488C9AE8.9080305@t-online.de>
References: <488C9AE8.9080305@t-online.de>
Message-ID:

Peter Robinson <peter.robinson at t-online.de> wrote:
> 2) I would then like to print out the protein alignment to check for
> correctness, and it seems there is no way of getting from a symbol to
> the one-letter amino acid code. That is,
>
> proteinAlignment.get(j).symbolAt(k).getName()
>
> will return "Ala" instead of "A" etc. Is there a good way of getting the
> short symbols?

This small tutorial might help you out:
http://biojava.org/wiki/BioJava:Cookbook:Translation:OneLetterAmbi

Daniel
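
For a quick visual check of an alignment, the three-letter names returned by getName() can also be mapped to one-letter codes with a plain lookup table. What follows is only a minimal sketch in plain Java: it does not use BioJava at all, the class name OneLetterCodes and the method toOneLetter are made up for illustration, and it assumes the names are the usual three-letter abbreviations ("Ala", "Gly", ...) of the 20 standard amino acids.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of BioJava: maps three-letter residue
// names to one-letter codes for the 20 standard amino acids only.
final class OneLetterCodes {
    private static final Map<String, Character> CODES = new HashMap<String, Character>();

    static {
        String[][] table = {
            {"ALA", "A"}, {"ARG", "R"}, {"ASN", "N"}, {"ASP", "D"}, {"CYS", "C"},
            {"GLN", "Q"}, {"GLU", "E"}, {"GLY", "G"}, {"HIS", "H"}, {"ILE", "I"},
            {"LEU", "L"}, {"LYS", "K"}, {"MET", "M"}, {"PHE", "F"}, {"PRO", "P"},
            {"SER", "S"}, {"THR", "T"}, {"TRP", "W"}, {"TYR", "Y"}, {"VAL", "V"}
        };
        for (String[] row : table) {
            CODES.put(row[0], row[1].charAt(0));
        }
    }

    // Case-insensitive lookup, so "Ala", "ALA" and "ala" all give 'A'.
    static char toOneLetter(String threeLetterName) {
        Character code = CODES.get(threeLetterName.toUpperCase(Locale.ROOT));
        if (code == null) {
            // Gaps, ambiguity symbols and non-standard residues are not handled here.
            throw new IllegalArgumentException("Unknown residue name: " + threeLetterName);
        }
        return code;
    }

    public static void main(String[] args) {
        String[] residues = {"Met", "Ala", "Trp"};
        StringBuilder sb = new StringBuilder();
        for (String r : residues) {
            sb.append(toOneLetter(r));
        }
        System.out.println(sb); // prints MAW
    }
}
```

Inside a BioJava program the library's own tokenization, as described on the cookbook page linked above, is probably the better fit, since a hand-written table like this one knows nothing about gaps or ambiguity symbols.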