From aumanga at biggjapan.com Wed Mar 4 05:13:35 2009
From: aumanga at biggjapan.com (Ashika Umanga Umagiliya)
Date: Wed, 04 Mar 2009 19:13:35 +0900
Subject: [Biojava-l] PDB parsing: Calculate absolute atom numbers of a chain.
Message-ID: <49AE544F.9080502@biggjapan.com>

Greetings all,

From a database, I retrieve a PDB ID, a chain ID and epitope positions. These epitope positions are numbered relative to their chain.
I want to select these epitope positions (residues) in Jmol, but the Jmol select query supports only absolute residue numbers.

Ex:
CHAIN L : residues 10,12,13,15 (relative to chain L)
But in the PDB file (in Jmol) the values are 210,212,213,215 - these are what I want to calculate.

Is there any way to calculate this?

Thanks in advance,
Umanga

From andreas.prlic at gmail.com Wed Mar 4 10:38:26 2009
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Wed, 4 Mar 2009 07:38:26 -0800
Subject: [Biojava-l] PDB parsing: Calculate absolute atom numbers of a chain.
In-Reply-To: <49AE544F.9080502@biggjapan.com>
References: <49AE544F.9080502@biggjapan.com>
Message-ID: <59a41c430903040738o43b5417as688f8f18b7d14f24@mail.gmail.com>

Hi Umanga,

You will need to use the getPDBCode() method to obtain the PDB residue number, and then use that number together with the chain ID to select the residue in Jmol.

Andreas

2009/3/4 Ashika Umanga Umagiliya :
> Greetings all,
>
> From a database, I retrieve a PDB ID, a chain ID and epitope positions. These
> epitope positions are numbered relative to their chain.
> I want to select these epitope positions (residues) in Jmol, but the Jmol
> select query supports only absolute residue numbers.
>
> Ex:
> CHAIN L : residues 10,12,13,15 (relative to chain L)
> But in the PDB file (in Jmol) the values are 210,212,213,215 - these are what I
> want to calculate.
>
> Is there any way to calculate this?
>
> Thanks in advance,
> Umanga
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From benmpe at pml.ac.uk Wed Mar 4 10:36:21 2009
From: benmpe at pml.ac.uk (Ben Temperton)
Date: Wed, 4 Mar 2009 15:36:21 -0000
Subject: [Biojava-l] Parsing out identities from a blast file
Message-ID: <7546D505C3AF304280B10C3D8B8DFD3902F221D6@burgh.npm.ac.uk>

Hi there,

Does anyone know which methods I need to invoke to pull out the identity for a particular hit when parsing BLAST output?

I can get the score, the start and end points etc. using:

SeqSimilaritySearchHit hit = (SeqSimilaritySearchHit)obj;
LOGGER.debug("match:" + hit.getSubjectID() + ", score:" + hit.getScore() + ", e-value:" + hit.getEValue());

But I can't seem to find any method to pull out the identity of the hit, which I would have thought would be in the SeqSimilaritySearchSubHit interface.

Many thanks,

Ben

--------------------------------------------------------------------------------
Plymouth Marine Laboratory

Registered Office:
Prospect Place
The Hoe
Plymouth PL1 3DH

Website: www.pml.ac.uk
Registered Charity No. 1091222
PML is a company limited by guarantee
registered in England & Wales
company number 4178503

PML is a member of the Plymouth Marine Sciences Partnership
Website: www.pmsp.org.uk
--------------------------------------------------------------------------------
This e-mail, its content and any file attachments are confidential.

If you have received this e-mail in error please do not copy, disclose it to any third party or use the contents or attachments in any way.
Please notify the sender by replying to this e-mail or e-mail forinfo at pml.ac.uk and then delete the email without making any copies or using it in any other way.

The content of this message may contain personal views which are not the views of Plymouth Marine Laboratory unless specifically stated.

You are reminded that e-mail communications are not secure and may contain viruses. Plymouth Marine Laboratory accepts no liability for any loss or damage which may be caused by viruses.
--------------------------------------------------------------------------------

From mark.schreiber at novartis.com Wed Mar 4 21:53:47 2009
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Thu, 5 Mar 2009 10:53:47 +0800
Subject: [Biojava-l] Parsing out identities from a blast file
In-Reply-To: <7546D505C3AF304280B10C3D8B8DFD3902F221D6@burgh.npm.ac.uk>
Message-ID:

Hi -

You need to use the SeqSimilaritySearchHit, which is the parent of the SeqSimilaritySearchSubHit. It has a method called getSubjectID() which gives you the ID of the hit subject. All subhits with the same parent search hit will have the same subject ID, so it makes more sense to store the information in this class.

- Mark

biojava-l-bounces at lists.open-bio.org wrote on 03/04/2009 11:36:21 PM:

> Hi there,
>
> Does anyone know which methods I need to invoke to pull out the identity
> for a particular hit when parsing BLAST output?
>
> I can get the score, the start and end points etc. using:
> SeqSimilaritySearchHit hit = (SeqSimilaritySearchHit)obj;
> LOGGER.debug("match:" + hit.getSubjectID() + ", score:"
> + hit.getScore() + ", e-value:" + hit.getEValue());
>
> But I can't seem to find any method to pull out the identity of the hit,
> which I would have thought would be in the SeqSimilaritySearchSubHit
> interface.
>
> Many thanks,
>
> Ben

_________________________

CONFIDENTIALITY NOTICE

The information contained in this e-mail message is intended only for the exclusive use of the individual or entity named above and may contain information that is privileged, confidential or exempt from disclosure under applicable law.
If the reader of this message is not the intended recipient, or the employee or agent responsible for delivery of the message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by e-mail and delete the material from any computer. Thank you.

From bioinfosej17 at gmail.com Thu Mar 5 04:53:47 2009
From: bioinfosej17 at gmail.com (Sej Modha)
Date: Thu, 5 Mar 2009 15:23:47 +0530
Subject: [Biojava-l] Biojava-l Digest, Vol 74, Issue 1
In-Reply-To:
References:
Message-ID: <4844b4790903050153u213599dax32d7ee46ab1d97b2@mail.gmail.com>

I am working with BioJava. I do not understand what arg[0] should be when running BlastParser, or whether this program itself provides a BLAST facility as a utility. If it does, please send me an example program with full details.

From marcel.huntemann at gmail.com Fri Mar 6 19:55:35 2009
From: marcel.huntemann at gmail.com (Marcel Huntemann)
Date: Fri, 06 Mar 2009 16:55:35 -0800
Subject: [Biojava-l] Stop condition for blast parser
Message-ID: <49B1C607.6050803@Gmail.com>

Hi!

I followed the example at http://biojava.org/wiki/BioJava:CookBook:Blast:Echo to create my own BLAST parser. My problem is that I am often interested only in the hits of a certain query. Is there a way to tell the BLAST parser, via my customized SearchContentHandler, to stop parsing once it has found this queryId? Often the wanted queryId is near the beginning of the BLAST file, and I don't want the parser to run over the really, really long rest of the file, even if it doesn't do much during that. I just want the parser to stop then. Can I implement that somehow in my customized SearchContentHandler, or do I have to change the BioJava source code somewhere for that?

Thanks,
Marcel

From aumanga at biggjapan.com Mon Mar 9 01:08:42 2009
From: aumanga at biggjapan.com (Ashika Umanga Umagiliya)
Date: Mon, 09 Mar 2009 14:08:42 +0900
Subject: [Biojava-l] Reading meta data in PDB file?
Message-ID: <49B4A45A.3000102@biggjapan.com>

Greetings all,

I want to read the following information. I noticed that most of it is stored under the REMARK tag in PDB files. Can I read it using the BioJava PDB parser?

Thanks in advance,
Umanga.

------------------------------
Unit Cell: a, b, c, alpha, beta, gamma values (I assume they are stored in the CRYST1 tag?)
Molecular Description Asymmetric Unit: Polymer, Molecule, Chains
Classification
Source: Polymer, Scientific Name
Ligand Chemical Component: Identifier, Name, Formula
Diffraction Detector: Detector, Type, Collection Data
Diffraction Radiation: Monochromator, Diffraction Protocol, Wavelength, Wavelength List

From marcel.huntemann at gmail.com Mon Mar 9 15:00:36 2009
From: marcel.huntemann at gmail.com (Marcel Huntemann)
Date: Mon, 09 Mar 2009 12:00:36 -0700
Subject: [Biojava-l] Stop condition for blast parser
In-Reply-To: <93b45ca50903062057x680cdf0fu39d981938478b547@mail.gmail.com>
References: <49B1C607.6050803@Gmail.com> <93b45ca50903061758t38dfde52p65d4ad7f5ddf678a@mail.gmail.com> <49B1F1A2.3040403@Gmail.com> <93b45ca50903062054k2ec842a9ua890f592286cd3d3@mail.gmail.com> <93b45ca50903062057x680cdf0fu39d981938478b547@mail.gmail.com>
Message-ID: <49B56754.6080100@Gmail.com>

Hi Mark!

Mark Schreiber wrote:
> You could just customize BlastEcho to pass on the events of interest,
> ignore those that are not interesting.

That's what I am doing right now. But I don't know how to tell my customized BlastEcho to stop when a certain condition is met during a particular event call. What's the command for stopping there?

> It could also exit if a certain
> event occurs.

How?

> Remember it cost almost nothing to read the file so you
> save time by only sending interesting events for parsing.

Hmm, I am not sure it is really almost nothing when I have about 90,000 contigs that were blasted against a database of maybe 3,000,000 genes. The BLAST output that I am parsing is about 13 GB, and in every cycle I am looking for the results of one particular contig out of these 90,000. So I have definitely seen the time add up when each of these 90,000 cycles runs over the whole file, even though the contig I am looking for was already at the beginning of the file.

Cheers,
Marcel

> On 7 Mar 2009, 12:01 PM, "Marcel Huntemann" wrote:
>
> But where? I can't do it in my customized handler, can I?
>
> Mark Schreiber wrote:
> > Because the blast parser uses event based parsing you should be able to c...


From mark.schreiber at novartis.com Mon Mar 9 22:36:50 2009
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 10 Mar 2009 10:36:50 +0800
Subject: [Biojava-l] Stop condition for blast parser
In-Reply-To: <49B56754.6080100@Gmail.com>
Message-ID:

Hi -

There are many ways to stop the parsing, but it really depends on how you have set the program up. Notably, there is no way for the BLAST parsing system of BioJava to shut itself down, but control probably shouldn't happen at that level.

A crude but effective procedure is to write out the results when you find the hit of interest and then simply call System.exit(0).

Another approach would be to spawn Tasks to parse each record and then have them signal to the main thread when they are complete so that it can shut them down. If you are using a Java version earlier than 1.5 then you would need to do this with Threads. From Java 1.5 onwards you can use the concurrent packages, which are much nicer to deal with.

One thing I don't understand is why you don't blast each contig separately; in that case the results would only contain your hit of interest. That means 90K separate blasts, but there are versions of BLAST that run on clusters and the database (3 million genes) is not huge, so it should be an embarrassingly parallel problem?

- Mark

biojava-l-bounces at lists.open-bio.org wrote on 03/10/2009 03:00:36 AM:

> Hi Mark!
>
> Mark Schreiber wrote:
> > You could just customize BlastEcho to pass on the events of interest,
> > ignore those that are not interesting.
> That's what I am doing right now. But I don't know how to tell my
> customized BlastEcho to stop when a certain condition is met during a
> particular event call. What's the command for stopping there?
>
> > It could also exit if a certain
> > event occurs.
> How?
>
> > Remember it cost almost nothing to read the file so you
> > save time by only sending interesting events for parsing.
> Hmm, I am not sure it is really almost nothing when I have about 90,000
> contigs that were blasted against a database of maybe 3,000,000
> genes. The BLAST output that I am parsing is about 13 GB, and in every
> cycle I am looking for the results of one particular contig out of these
> 90,000. So I have definitely seen the time add up
> when each of these 90,000 cycles runs over the whole file,
> even though the contig I am looking for was already at the beginning of the file.
>
> Cheers,
> Marcel

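One way to get the early stop discussed in this thread without calling System.exit() is to throw a dedicated exception from the handler once the wanted query has been processed, and to catch it around the parse call. The sketch below shows this with the plain JDK SAX API on a made-up XML format, not with BioJava itself; the element names (`query`, `hit`), the class names and `parseUntil` are all hypothetical. BioJava's BLAST parser is event based in the same spirit, but whether an exception thrown from a custom SearchContentHandler reaches the caller unchanged is an assumption that would need checking against the BioJava source.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EarlyStopSketch {

    /** Hypothetical marker exception: it signals "done", not a parse error. */
    public static class StopParsing extends SAXException {
        public StopParsing() { super("wanted query processed, stopping early"); }
    }

    /** Collects the hits of one query and aborts the parse when that query ends. */
    public static class OneQueryHandler extends DefaultHandler {
        private final String wantedId;
        private boolean inWanted = false;
        public final List<String> hits = new ArrayList<String>();
        public int queriesSeen = 0;

        public OneQueryHandler(String wantedId) { this.wantedId = wantedId; }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("query".equals(qName)) {
                queriesSeen++;
                inWanted = wantedId.equals(atts.getValue("id"));
            } else if ("hit".equals(qName) && inWanted) {
                hits.add(atts.getValue("subject"));
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            // Once the wanted query is closed, nothing later in the input matters.
            if ("query".equals(qName) && inWanted) {
                throw new StopParsing();
            }
        }
    }

    /** Parses until the wanted query has been read, or to the end if it is absent. */
    public static OneQueryHandler parseUntil(String xml, String wantedId) {
        OneQueryHandler handler = new OneQueryHandler(wantedId);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsing done) {
            // Expected early exit: the rest of the input was never read.
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        String xml = "<results>"
                + "<query id='contig1'><hit subject='geneA'/></query>"
                + "<query id='contig2'><hit subject='geneB'/><hit subject='geneC'/></query>"
                + "<query id='contig3'><hit subject='geneD'/></query>"
                + "</results>";
        OneQueryHandler h = parseUntil(xml, "contig2");
        System.out.println(h.hits + " after " + h.queriesSeen + " queries");
    }
}
```

With BioJava the same idea would presumably need an unchecked exception if the SearchContentHandler callbacks declare no checked ones; treat that as untested. For a 13 GB file searched 90,000 times, a single pass that indexes all queries may still be the better design than any early stop.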
From hlapp at gmx.net Mon Mar 9 23:36:07 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 9 Mar 2009 23:36:07 -0400
Subject: [Biojava-l] Google Summer of Code: Call for Bio* Volunteers
In-Reply-To: <1F570555-12DF-42DF-8D0E-95AAE298D76A@gmx.net>
References: <1F570555-12DF-42DF-8D0E-95AAE298D76A@gmx.net>
Message-ID:

You may recall my message to the developer lists of several O|B|F projects in February about the idea of O|B|F applying to Google Summer of Code as a mentoring organization [1]. I felt that the response to this was very positive and encouraging.

Although late (sorry, been swamped too much), I've now put up the skeleton of an ideas page at

http://open-bio.org/wiki/Google_Summer_Code_2009

I basically modeled (in fact, largely copied) this page after the NESCent Phyloinformatics Summer of Code ideas pages, which I think worked pretty well. We can completely rework this, though - any feedback and suggestions are very much welcome.

In the meantime, I need all developers to double check the information under 'Contact'. Would the open-bio-l mailing list indeed reach the prospective mentors and other devs? Will you be fine with students asking for feedback on their applications on the developers (i.e., this) list? Is there a blessed IRC channel where at least some of the prospective mentors hang out, so that students can ask questions during the time they apply?

I also need the reference information for all projects that will participate with at least one project idea (I would hope that that's all projects) to be added in the 'Open-Bio projects involved' section.

***** Most important of all, if you can volunteer to mentor a project, please post a project idea to the page in the respective section, using the idea template that's there already (copy, paste, and edit). *****

The deadline for organization applications is Friday this week, Mar 13, which is very soon.
The ideas page is a major factor and component in how Google scores new mentoring organizations - the more we can show the resourcefulness and diversity of our member projects, the more competitive I think we'll be. So all those who responded earlier with ideas or willingness to help out as primary or secondary mentors, I need you to think about and put up your idea(s) now.

Cheers,

-hilmar

[1] http://tinyurl.com/ck7tqe

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From tallpaulinjax at yahoo.com Wed Mar 11 09:06:02 2009
From: tallpaulinjax at yahoo.com (Paul B)
Date: Wed, 11 Mar 2009 06:06:02 -0700 (PDT)
Subject: [Biojava-l] Out of heap space during structure parsing.
Message-ID: <343262.18611.qm@web30701.mail.mud.yahoo.com>

Sorry, I sent this earlier and then I read that attaching a file can cause spam problems. So here is my netbeans.conf file inline, showing 1024 MB of RAM to be used for heap size:

# ${HOME} will be replaced by JVM user.home system property
# netbeans_default_userdir="${HOME}/.netbeans/6.5"
# Options used by NetBeans launcher by default, can be overridden by explicit
# command line switches:
netbeans_default_options="-J-Dorg.glassfish.v3.installRoot=\"C:\Program Files\glassfish-v3-prelude\" -J-Dcom.sun.aas.installRoot=\"C:\Program Files\glassfish-v2ur2\" -J-client -J-Xverify:none -J-Xss2m -J-Xms32m -J-XX:PermSize=32m -J-XX:MaxPermSize=200m -J-Dapple.laf.useScreenMenuBar=true -J-Dsun.java2d.noddraw=true"
# Note that a default -Xmx is selected for you automatically.
# You can find this value in var/log/messages.log file in your userdir.
# The automatically selected value can be overridden by specifying -J-Xmx here
# or on the command line.
# command line switches
netbeans_default_options="-J-Xms32m -J-Xmx1024m -J-XX:PermSize=32m -J-XX:MaxPermSize=96m -J-Xverify:none -J-Dapple.laf.useScreenMenuBar=true -J-XX:+UseConcMarkSweepGC -J-XX:+CMSClassUnloadingEnabled -J-XX:+CMSPermGenSweepingEnabled -J-XX:+UseParNewGC"
# If you specify the heap size (-Xmx) explicitely, you may also want to enable
# Concurrent Mark & Sweep garbage collector. In such case add the following
# options to the netbeans_default_options:
# -J-XX:+UseConcMarkSweepGC -J-XX:+CMSClassUnloadingEnabled -J-XX:+CMSPermGenSweepingEnabled
# -J-XX:+UseParNewGC
# (see http://wiki.netbeans.org/wiki/view/FaqGCPauses)
# Default location of JDK, can be overridden by using --jdkhome :
netbeans_jdkhome="C:\Program Files\Java\jdk1.6.0_06"
# Additional module clusters, using ${path.separator} (';' on Windows or ':' on Unix):
#netbeans_extraclusters="/absolute/path/to/cluster1:/absolute/path/to/cluster2"
# If you have some problems with detect of proxy settings, you may want to enable
# detect the proxy settings provided by JDK5 or higher.
# In such case add -J-Djava.net.useSystemProxies=true to the netbeans_default_options.

--- On Wed, 3/11/09, Paul B wrote:

From: Paul B
Subject: Out of heap space during structure parsing.
To: biojava-l at biojava.org
Date: Wednesday, March 11, 2009, 8:51 AM

Hi,

I am using BioJava 1.6.1 to parse PDB files. My machine has 2GB of RAM. I am using NetBeans 6.5 as my development environment with Java 1.6. My user-specific netbeans.conf file is attached, with a heap space of 1GB. The relevant BioJava code is below:

    try {
      pdbreader = new PDBFileReader();
      pdbreader.setPath(localFilePath);
      pdbreader.setAutoFetch(true);
      struc = pdbreader.getStructureById(pdbCode);
    ...

Using this code, I had successfully parsed smaller PDB files like 2BEG and 1Q80. Then I tried to parse a slightly larger file 1FFK and received this message on the 'struc =' line:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at org.biojava.bio.alignment.NeedlemanWunsch.pairwiseAlignment(NeedlemanWunsch.java:411)
        at org.biojava.bio.alignment.NeedlemanWunsch.getAlignment(NeedlemanWunsch.java:315)
        at org.biojava.bio.structure.io.SeqRes2AtomAligner.align(SeqRes2AtomAligner.java:220)
        at org.biojava.bio.structure.io.SeqRes2AtomAligner.align(SeqRes2AtomAligner.java:140)
        at org.biojava.bio.structure.io.PDBFileParser.triggerEndFileChecks(PDBFileParser.java:2249)
        at org.biojava.bio.structure.io.PDBFileParser.parsePDBFile(PDBFileParser.java:2155)
        at org.biojava.bio.structure.io.PDBFileParser.parsePDBFile(PDBFileParser.java:2013)
        at org.biojava.bio.structure.io.PDBFileReader.getStructureById(PDBFileReader.java:439)
        at biojavatest.PdbDemo.grabPdbFileStruc(PdbDemo.java:105)
        at biojavatest.PdbDemo.runTest(PdbDemo.java:67)
        at biojavatest.PdbDemo.main(PdbDemo.java:58)

Any suggestions? Is the problem specific to some deviation in 1FFK, or in BioJava's parser implementation?

By the way, I am using BioJava simply as a parser, and I am then dumping the data into class objects of my own design and persisting them to a SQL Server database. As such, I don't need all the ATOM information held in memory. Perhaps there is a way to lazy load that information upon request?

Is there a development version of BioJava that's downloadable and offers a more memory efficient way of grabbing data?

Thanks,

Paul

From tallpaulinjax at yahoo.com Wed Mar 11 09:17:15 2009
From: tallpaulinjax at yahoo.com (Paul B)
Date: Wed, 11 Mar 2009 06:17:15 -0700 (PDT)
Subject: [Biojava-l] Out of heap space during structure parsing.
Message-ID: <338940.12456.qm@web30705.mail.mud.yahoo.com>

I believe I have just answered my own question. Since I am just using BioJava as a parser (right now), I added the following line:

    try {
      pdbreader = new PDBFileReader();
      pdbreader.setPath(localFilePath);
      pdbreader.setAlignSeqRes(false); // added this line
      pdbreader.setAutoFetch(true);
      struc = pdbreader.getStructureById(pdbCode);

But I wonder how people handle this problem if they require sequence alignment?

Paul

From andreas.prlic at gmail.com Wed Mar 11 09:37:02 2009
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Wed, 11 Mar 2009 13:37:02 +0000
Subject: [Biojava-l] Out of heap space during structure parsing.
In-Reply-To: <343262.18611.qm@web30701.mail.mud.yahoo.com>
References: <343262.18611.qm@web30701.mail.mud.yahoo.com>
Message-ID: <59a41c430903110637sf56a0aaoa23b5203d2e28adf@mail.gmail.com>

Hi Paul,

If you want, you can turn off the SEQRES to ATOM alignment part and reduce memory requirements a bit. You can do this with

PDBFileParser.setAlignSeqRes(false);
...

Andreas

Is the problem specific to some deviation in 1FFK, or in BioJava's parser implementation? > > By the way, I am using BioJava simply as a parser, and I am then dumping the data into class objects of my own design and persisting them to a SQL Server database. As such, I don't need all the ATOM information held in memory. Perhaps there is a way to lazy load that information upon request? > > Is there a development version of BioJava that's downloadable and offers a more memory efficient way of grabbing data? > Thanks, > > Paul > _______________________________________________ > Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From tallpaulinjax at yahoo.com Wed Mar 11 11:15:10 2009 From: tallpaulinjax at yahoo.com (Paul B) Date: Wed, 11 Mar 2009 08:15:10 -0700 (PDT) Subject: [Biojava-l] Out of heap space during structure parsing. Message-ID: <371079.36785.qm@web30702.mail.mud.yahoo.com> Hi Scooter, ? That seemed to fix the problem when I added back in the pdbreader.setAutoFetch(true), at least on this file! Thanks for that info, I am somewhat of a novice in Java and Netbeans. The full argument I added to 'run' was: -Xms32m -Xmx1024m -XX:PermSize=32m -XX:MaxPermSize=96m -Xverify:none -Dapple.laf.useScreenMenuBar=true -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled -XX:+UseParNewGC ? For others: I had to remove all the '-J' prefixes from each argument. ? Andreas, thanks for your input as well! ? Paul --- On Wed, 3/11/09, Homer Willis wrote: From: Homer Willis Subject: RE: [Biojava-l] Out of heap space during structure parsing. To: "Paul B" , biojava-l at biojava.org@yahoo.com Date: Wednesday, March 11, 2009, 10:07 AM Paul The netbeans.conf file is for setting the jvm values for Netbeans which is different from running an application. 
If you right click on the project->properties->run you will see an option for setting the VM options and this is where you can add settings to add more memory when you run the application.

Scooter Willis

-----Original Message-----
From: Paul B [mailto:tallpaulinjax at yahoo.com]
Sent: Wednesday, March 11, 2009 9:06 AM
To: biojava-l at biojava.org
Subject: Re: [Biojava-l] Out of heap space during structure parsing.

Sorry, I sent this earlier and then I read that attaching a file can cause spam problems. So here is my netbeans.conf file inline, showing 1024 MB of RAM to be used for heap size:

# ${HOME} will be replaced by JVM user.home system property
# netbeans_default_userdir="${HOME}/.netbeans/6.5"
# Options used by NetBeans launcher by default, can be overridden by explicit
# command line switches:
netbeans_default_options="-J-Dorg.glassfish.v3.installRoot=\"C:\Program Files\glassfish-v3-prelude\" -J-Dcom.sun.aas.installRoot=\"C:\Program Files\glassfish-v2ur2\" -J-client -J-Xverify:none -J-Xss2m -J-Xms32m -J-XX:PermSize=32m -J-XX:MaxPermSize=200m -J-Dapple.laf.useScreenMenuBar=true -J-Dsun.java2d.noddraw=true"
# Note that a default -Xmx is selected for you automatically.
# You can find this value in var/log/messages.log file in your userdir.
# The automatically selected value can be overridden by specifying -J-Xmx here
# or on the command line.
# command line switches
netbeans_default_options="-J-Xms32m -J-Xmx1024m -J-XX:PermSize=32m -J-XX:MaxPermSize=96m -J-Xverify:none -J-Dapple.laf.useScreenMenuBar=true -J-XX:+UseConcMarkSweepGC -J-XX:+CMSClassUnloadingEnabled -J-XX:+CMSPermGenSweepingEnabled -J-XX:+UseParNewGC"
# If you specify the heap size (-Xmx) explicitely, you may also want to enable
# Concurrent Mark & Sweep garbage collector. In such case add the following
# options to the netbeans_default_options:
# -J-XX:+UseConcMarkSweepGC -J-XX:+CMSClassUnloadingEnabled -J-XX:+CMSPermGenSweepingEnabled
# -J-XX:+UseParNewGC
# (see http://wiki.netbeans.org/wiki/view/FaqGCPauses)
# Default location of JDK, can be overridden by using --jdkhome :
netbeans_jdkhome="C:\Program Files\Java\jdk1.6.0_06"
# Additional module clusters, using ${path.separator} (';' on Windows or ':' on Unix):
#netbeans_extraclusters="/absolute/path/to/cluster1:/absolute/path/to/cluster2"
# If you have some problems with detect of proxy settings, you may want to enable
# detect the proxy settings provided by JDK5 or higher.
# In such case add -J-Djava.net.useSystemProxies=true to the netbeans_default_options.

--- On Wed, 3/11/09, Paul B wrote:

From: Paul B
Subject: Out of heap space during structure parsing.
To: biojava-l at biojava.org
Date: Wednesday, March 11, 2009, 8:51 AM

Hi,

I am using BioJava 1.6.1 to parse PDB files. My machine has 2GB of RAM. I am using Netbeans 6.5 as my development environment with Java 1.6. My user-specific netbeans.conf file is attached, with a heap space of 1GB. The relevant BioJava code is below:

    try {
      pdbreader = new PDBFileReader();
      pdbreader.setPath(localFilePath);
      pdbreader.setAutoFetch(true);
      struc = pdbreader.getStructureById(pdbCode);
    ...

Using this code, I had successfully parsed smaller PDB files like 2BEG and 1Q80. Then I tried to parse a slightly larger file 1FFK and received this message on the 'struc =' line:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at org.biojava.bio.alignment.NeedlemanWunsch.pairwiseAlignment(NeedlemanWunsch.java:411)
        at org.biojava.bio.alignment.NeedlemanWunsch.getAlignment(NeedlemanWunsch.java:315)
        at org.biojava.bio.structure.io.SeqRes2AtomAligner.align(SeqRes2AtomAligner.java:220)
        at org.biojava.bio.structure.io.SeqRes2AtomAligner.align(SeqRes2AtomAligner.java:140)
        at org.biojava.bio.structure.io.PDBFileParser.triggerEndFileChecks(PDBFileParser.java:2249)
        at org.biojava.bio.structure.io.PDBFileParser.parsePDBFile(PDBFileParser.java:2155)
        at org.biojava.bio.structure.io.PDBFileParser.parsePDBFile(PDBFileParser.java:2013)
        at org.biojava.bio.structure.io.PDBFileReader.getStructureById(PDBFileReader.java:439)
        at biojavatest.PdbDemo.grabPdbFileStruc(PdbDemo.java:105)
        at biojavatest.PdbDemo.runTest(PdbDemo.java:67)
        at biojavatest.PdbDemo.main(PdbDemo.java:58)

Any suggestions? Is the problem specific to some deviation in 1FFK, or in BioJava's parser implementation?

By the way, I am using BioJava simply as a parser, and I am then dumping the data into class objects of my own design and persisting them to a SQL Server database. As such, I don't need all the ATOM information held in memory. Perhaps there is a way to lazy load that information upon request?

Is there a development version of BioJava that's downloadable and offers a more memory efficient way of grabbing data?
Thanks,

Paul
_______________________________________________
Biojava-l mailing list - Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l

From aumanga at biggjapan.com Wed Mar 11 21:15:05 2009
From: aumanga at biggjapan.com (Ashika Umanga Umagiliya)
Date: Thu, 12 Mar 2009 10:15:05 +0900
Subject: [Biojava-l] Reading meta data in PDB file?
In-Reply-To: <49B4A45A.3000102@biggjapan.com>
References: <49B4A45A.3000102@biggjapan.com>
Message-ID: <49B86219.1060501@biggjapan.com>

I assume there is no way to read this information and I have to modify the PDB parser?

Ashika Umanga Umagiliya wrote:
> Greetings all,
> I want to read the following information. I noticed that most of them are
> stored under the REMARK tag in PDB. Can I read them using the BioJava PDB Parser?
>
> Thanks in Advance,
> Umanga.
> ------------------------------
>
> Unit Cell:
> a,b,c,alpha,beta,gamma values (I assume they are stored in the CRYST1 tag?)
>
> Molecular Description Asymmetric Unit:
> Polymer, Molecule, Chains
>
> Classification
>
> Source:
> Polymer, Scientific Name
>
> Ligand Chemical Component:
> Identifier, Name, Formula
>
> Diffraction Detector:
> Detector, Type, Collection Data
>
> Diffraction Radiation:
> Monochromator, Diffraction Protocol, Wavelength, Wavelength List
>

--
Ashika Umanga Umagiliya
BiGG
Ando Bldg. 8F, 3-6-9 Kita-Shinagawa, Shinagawa-ku, Tokyo 140-0001
TEL:03-6679-8763
FAX:03-6679-8764

From andreas.prlic at gmail.com Wed Mar 11 21:23:19 2009
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Thu, 12 Mar 2009 01:23:19 +0000
Subject: [Biojava-l] Reading meta data in PDB file?
In-Reply-To: <49B86219.1060501@biggjapan.com>
References: <49B4A45A.3000102@biggjapan.com> <49B86219.1060501@biggjapan.com>
Message-ID: <59a41c430903111823u1d3626acv69aa287decba649e@mail.gmail.com>

Hi Ashika,

Actually, some of the data is available via the Compound class, but for the parts of your query that are not there you would need to add additional methods to the parser. If you have a patch, it would be great if you could post it to the list...

Andreas

2009/3/12 Ashika Umanga Umagiliya :
> I assume there is no way to read this information and I have to modify the
> PDB parser?
>
>
> Ashika Umanga Umagiliya wrote:
>>
>> Greetings all,
>> I want to read the following information. I noticed that most of them are
>> stored under the REMARK tag in PDB. Can I read them using the BioJava PDB Parser?
>>
>> Thanks in Advance,
>> Umanga.
>> ------------------------------
>>
>> Unit Cell:
>> a,b,c,alpha,beta,gamma values (I assume they are stored in the CRYST1 tag?)
>>
>> Molecular Description Asymmetric Unit:
>> Polymer, Molecule, Chains
>>
>> Classification
>>
>> Source:
>> Polymer, Scientific Name
>>
>> Ligand Chemical Component:
>> Identifier, Name, Formula
>>
>> Diffraction Detector:
>> Detector, Type, Collection Data
>>
>> Diffraction Radiation:
>> Monochromator, Diffraction Protocol, Wavelength, Wavelength List
>>
>
>
> --
> Ashika Umanga Umagiliya
> BiGG
> Ando Bldg. 8F, 3-6-9 Kita-Shinagawa, Shinagawa-ku, Tokyo 140-0001
> TEL:03-6679-8763
> FAX:03-6679-8764
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From marcel.huntemann at gmail.com Wed Mar 11 23:00:38 2009
From: marcel.huntemann at gmail.com (Marcel Huntemann)
Date: Wed, 11 Mar 2009 20:00:38 -0700
Subject: [Biojava-l] Stop condition for blast parser
In-Reply-To:
References:
Message-ID: <49B87AD6.7010005@Gmail.com>

Hi Mark!

The blast etc. is parallelized. The contigs are split into groups of 1000, and I also modified my program so that it now works with all those separate files. Nevertheless, I also have a program that works on the concatenated blast output. The parser with my customized handler always looks for the results of a certain contig, then compares these results to something else, does some other stuff in between to calculate some statistics, and then creates a new parser to get the results for the next contig. So a System.exit() is not an option, since it would stop my whole program (in which I am using the parser). I also don't want to start working with threads here. I was just hoping that there would be a way to tell the handler that, when a certain condition is met, it should give the parser a signal to stop parsing (and maybe even to reset itself to the first line). But I guess there's no way to do it in the customized handler...
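
[Editor's note] For plain SAX parsers, a commonly used way to get this kind of stop signal is to have the handler throw a dedicated SAXException subclass and catch it around the parse call. The sketch below shows that pattern with the JDK's own SAX parser on a tiny in-memory document. It is only an illustration under stated assumptions: the element names `query` and `hit`, and the names EarlyStopSax, StopParsing, OneQueryHandler and scan, are all hypothetical, and whether BioJava's BlastLikeSAXParser passes such an exception through unchanged is an assumption that was not checked in this thread.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class EarlyStopSax {

    /** Hypothetical marker exception: used only to unwind out of the parser, not a real error. */
    static final class StopParsing extends SAXException {
        StopParsing() {
            super("stop requested by handler");
        }
    }

    /** Collects the hit ids of one query and aborts as soon as that query element closes. */
    static final class OneQueryHandler extends DefaultHandler {
        private final String wanted;
        private boolean inWanted = false;
        final List<String> hits = new ArrayList<String>();
        int queriesSeen = 0;

        OneQueryHandler(String wanted) {
            this.wanted = wanted;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals("query")) {
                queriesSeen++;
                inWanted = wanted.equals(atts.getValue("id"));
            } else if (inWanted && qName.equals("hit")) {
                hits.add(atts.getValue("id"));
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (inWanted && qName.equals("query")) {
                // Everything we wanted has been seen: signal the caller to stop.
                throw new StopParsing();
            }
        }
    }

    /** Parses the document and returns the handler; the rest of the input is skipped after a match. */
    static OneQueryHandler scan(String xml, String wantedQuery) throws Exception {
        OneQueryHandler handler = new OneQueryHandler(wantedQuery);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsing expected) {
            // Normal early exit; any other SAXException still propagates to the caller.
        }
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<results>"
                + "<query id='c1'><hit id='g7'/></query>"
                + "<query id='c2'><hit id='g3'/><hit id='g9'/></query>"
                + "<query id='c3'><hit id='g1'/></query>"
                + "</results>";
        OneQueryHandler h = scan(xml, "c2");
        // prints: [g3, g9] after 2 queries
        System.out.println(h.hits + " after " + h.queriesSeen + " queries");
    }
}
```

The program keeps running after the early exit, so a new parser can be created for the next contig. This does not "rewind" the input; a fresh parse still starts at the top of the file.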
Thanks, Marcel mark.schreiber at novartis.com wrote: > > Hi - > > There are many ways to stop the parsing but it really depends on how you > have set the program up. Notably there is no way for the Blast parsing > system of BioJava to shut itself down but control probably shouldn't > happen at that level. > > A crude but effective procedure is to write out the results when you > find the hit of interest and then simply call System.exit() > > Another approach would be to spawn Tasks to parse each record and then > have them signal to the main thread when they are complete to shut them > down. If you are using Java 1.5 or earlier then you would need to do > this with Threads. If you have a later version you can use the > concurrent packages which are much nicer to deal with. > > One thing I don't understand is why you don't blast each contig > separately, in that case the results would only contain your hit of > interest. That means 90K separate blasts but there are versions of > blast that run on clusters and the database (3 million genes) is not > huge so it should be an embarrassingly parallel problem? > > - Mark > > biojava-l-bounces at lists.open-bio.org wrote on 03/10/2009 03:00:36 AM: > >> Hi Mark! >> >> Mark Schreiber wrote: >> > You could just customize BlastEcho to pass on the events of interest, >> > ignore those that are not interesting. >> That's what I am doing right now. But I don't know, how to tell my >> customized BlastEcho to stop, when a certain condition is met during a >> paricular event call. What's the command for stopping there? >> >> > It could also exit if a certain >> > event occurs. >> How? >> >> > Remember it cost almost nothing to read the file so you >> > save time by only sending interesting events for parsing. >> Hmm, I am not sure, if it's really almost nothing, when I've about 90,000 >> contigs that were blasted against a database with about maybe 3,000,000 >> genes. 
The blast output that I am parsing is about 13Gig big and every
>> cycle I am looking for the results of one particular contig of these
>> 90,000 contigs. So I definitely experienced that the time sums up a lot,
>> when it's running in each of these 90,000 cycles over the whole file,
>> although the contig I am looking for was already at the beginning of the file.
>>
>>
>> Cheers,
>> Marcel

From mark.schreiber at novartis.com Wed Mar 11 23:49:54 2009
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Thu, 12 Mar 2009 11:49:54 +0800
Subject: [Biojava-l] Stop condition for blast parser
In-Reply-To: <49B87AD6.7010005@Gmail.com>
Message-ID:

Hi Marcel -

One possible solution would be to customise the handler and the parser so they can talk to each other and the handler can make callbacks to the parser.

However, there is a fundamental problem with the BlastLikeSAXParser. Because it is a SAX parser, it is not at all suited to bouncing around the file it is parsing: SAX parsing is event based. Therefore I think you need a different paradigm. If you have lots of memory you could go with something that is more like a DOM parser and reads the whole file into memory (or uses Java NIO to pretend to) and use something like XQuery to find what you want. If you are using BLAST XML output you could also build an object tree with JAXB and navigate that.

You can also combine SAX and DOM to read memory-sized chunks in one go, but this can be clunky.

Note, I am assuming you will use BLAST XML. If you are not, I would strongly encourage it for the task you describe. It will also make your parsers much more robust to BLAST version changes.

Sorry the standard BioJava model can't really help here, but please consider posting your solution or adding it as a recipe in the cookbook, as others are sure to have similar problems soon.

- Mark

biojava-l-bounces at lists.open-bio.org wrote on 03/12/2009 11:00:38 AM:

> Hi Mark!
>
> The blast etc. is parallelized.
The contigs are split into groups of 1000 > and I also modified my program in the way that it works now with all those > separate files. But nevertheless I also have a program that works on the > concatenated blast output. The parser with my customized handler is always > looking for the results of a certain contig and then compares these > results to something else and also does some other stuff in-between to > calculate some statistics and then creates a new parser again to get the > results for the next contig. So a System.exit() is not an option, since it > would stop my whole program (in which I am using the parser). I also don't > wanna start working with threads here. I was just hoping that there would > be a way to tell the handler that, when a certain condition is met, it > should give the parser a signal to stop parsing (and maybe even to reset > itself to the first line). But I guess there's no way to do it in the > customized handler... > > Thanks, > Marcel > > > mark.schreiber at novartis.com wrote: > > > > Hi - > > > > There are many ways to stop the parsing but it really depends on how you > > have set the program up. Notably there is no way for the Blast parsing > > system of BioJava to shut itself down but control probably shouldn't > > happen at that level. > > > > A crude but effective procedure is to write out the results when you > > find the hit of interest and then simply call System.exit() > > > > Another approach would be to spawn Tasks to parse each record and then > > have them signal to the main thread when they are complete to shut them > > down. If you are using Java 1.5 or earlier then you would need to do > > this with Threads. If you have a later version you can use the > > concurrent packages which are much nicer to deal with. > > > > One thing I don't understand is why you don't blast each contig > > separately, in that case the results would only contain your hit of > > interest. 
That means 90K separate blasts but there are versions of > > blast that run on clusters and the database (3 million genes) is not > > huge so it should be an embarrassingly parallel problem? > > > > - Mark > > > > biojava-l-bounces at lists.open-bio.org wrote on 03/10/2009 03:00:36 AM: > > > >> Hi Mark! > >> > >> Mark Schreiber wrote: > >> > You could just customize BlastEcho to pass on the events of interest, > >> > ignore those that are not interesting. > >> That's what I am doing right now. But I don't know, how to tell my > >> customized BlastEcho to stop, when a certain condition is met during a > >> paricular event call. What's the command for stopping there? > >> > >> > It could also exit if a certain > >> > event occurs. > >> How? > >> > >> > Remember it cost almost nothing to read the file so you > >> > save time by only sending interesting events for parsing. > >> Hmm, I am not sure, if it's really almost nothing, when I've about 90,000 > >> contigs that were blasted against a database with about maybe 3,000,000 > >> genes. The blast output that I am parsing is about 13Gig big and every > >> cycle I am looking for the results of one particular contig of these > >> 90,000 contigs. So I definitely experienced that the time sums up a lot, > >> when it's running in each of these 90,000 cycles over the whole file, > >> although the contig I am looking for was already at the beginning > > ofthe file. > >> > >> > >> Cheers, > >> Marcel > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l _________________________ CONFIDENTIALITY NOTICE The information contained in this e-mail message is intended only for the exclusive use of the individual or entity named above and may contain information that is privileged, confidential or exempt from disclosure under applicable law. 
If the reader of this message is not the intended recipient, or the employee or agent responsible for delivery of the message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by e-mail and delete the material from any computer. Thank you. From marcel.huntemann at gmail.com Thu Mar 12 11:40:21 2009 From: marcel.huntemann at gmail.com (Marcel Huntemann) Date: Thu, 12 Mar 2009 08:40:21 -0700 Subject: [Biojava-l] Stop condition for blast parser In-Reply-To: References: Message-ID: <49B92CE5.8060402@Gmail.com> OK, thanks heaps 4 your help, Mark! mark.schreiber at novartis.com wrote: > > Hi Marcel - > > One possible solution would be to customise the handler and the parser > so they can talk to each other and the handler can make call backs to > the parser. > > However, there is a fundamental problem with the BlastLikeSAXParser. > Because it is a SAX parser it is not at all suited to bouncing around > the file it is parsing because SAX parsing is event based. Therefore I > think you need a different paradigm. If you have lots of memory you > could go with something that is more like a DOM parser and reads the > whole file into memory (or uses java nio to pretend to) and use > something like XQuery to find what you want. If you are using BLAST XML > output you could also build an object tree with JAXB and navigate that. > > You can also combine SAX and DOM to read memory sized chunks in one go > but this can be clunky. > > Note, I am assuming you will use BLAST XML. If you are not I would > strongly encourage it for the task you describe. It will also make you > parsers much more robust to BLAST version changes. 
> > Sorry the standard BioJava model can't really help here but please > consider posting you're solution or adding it as a recipe in the > cookbook as others are sure to have similar problems soon. > > - Mark > > biojava-l-bounces at lists.open-bio.org wrote on 03/12/2009 11:00:38 AM: > >> Hi Mark! >> >> The blast etc. is parallelized. The contigs are split into groups of 1000 >> and I also modified my program in the way that it works now with all those >> separate files. But nevertheless I also have a program that works on the >> concatenated blast output. The parser with my customized handler is always >> looking for the results of a certain contig and then compares these >> results to something else and also does some other stuff in-between to >> calculate some statistics and then creates a new parser again to get the >> results for the next contig. So a System.exit() is not an option, since it >> would stop my whole program (in which I am using the parser). I also don't >> wanna start working with threads here. I was just hoping that there would >> be a way to tell the handler that, when a certain condition is met, it >> should give the parser a signal to stop parsing (and maybe even to reset >> itself to the first line). But I guess there's no way to do it in the >> customized handler... >> >> Thanks, >> Marcel >> >> >> mark.schreiber at novartis.com wrote: >> > >> > Hi - >> > >> > There are many ways to stop the parsing but it really depends on how you >> > have set the program up. Notably there is no way for the Blast parsing >> > system of BioJava to shut itself down but control probably shouldn't >> > happen at that level. >> > >> > A crude but effective procedure is to write out the results when you >> > find the hit of interest and then simply call System.exit() >> > >> > Another approach would be to spawn Tasks to parse each record and then >> > have them signal to the main thread when they are complete to shut them >> > down. 
If you are using Java 1.5 or earlier then you would need to do >> > this with Threads. If you have a later version you can use the >> > concurrent packages which are much nicer to deal with. >> > >> > One thing I don't understand is why you don't blast each contig >> > separately, in that case the results would only contain your hit of >> > interest. That means 90K separate blasts but there are versions of >> > blast that run on clusters and the database (3 million genes) is not >> > huge so it should be an embarrassingly parallel problem? >> > >> > - Mark >> > >> > biojava-l-bounces at lists.open-bio.org wrote on 03/10/2009 03:00:36 AM: >> > >> >> Hi Mark! >> >> >> >> Mark Schreiber wrote: >> >> > You could just customize BlastEcho to pass on the events of interest, >> >> > ignore those that are not interesting. >> >> That's what I am doing right now. But I don't know, how to tell my >> >> customized BlastEcho to stop, when a certain condition is met during a >> >> paricular event call. What's the command for stopping there? >> >> >> >> > It could also exit if a certain >> >> > event occurs. >> >> How? >> >> >> >> > Remember it cost almost nothing to read the file so you >> >> > save time by only sending interesting events for parsing. >> >> Hmm, I am not sure, if it's really almost nothing, when I've about > 90,000 >> >> contigs that were blasted against a database with about maybe 3,000,000 >> >> genes. The blast output that I am parsing is about 13Gig big and every >> >> cycle I am looking for the results of one particular contig of these >> >> 90,000 contigs. So I definitely experienced that the time sums up a > lot, >> >> when it's running in each of these 90,000 cycles over the whole file, >> >> although the contig I am looking for was already at the beginning >> > ofthe file. 
>> >> >> >> >> >> Cheers, >> >> Marcel >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l > > _________________________ > > CONFIDENTIALITY NOTICE > > The information contained in this e-mail message is intended only for > the exclusive use of the individual or entity named above and may > contain information that is privileged, confidential or exempt from > disclosure under applicable law. If the reader of this message is not > the intended recipient, or the employee or agent responsible for > delivery of the message to the intended recipient, you are hereby > notified that any dissemination, distribution or copying of this > communication is strictly prohibited. If you have received this > communication in error, please notify the sender immediately by e-mail > and delete the material from any computer. Thank you. From pwrose at ucsd.edu Thu Mar 12 12:52:29 2009 From: pwrose at ucsd.edu (Peter Rose) Date: Thu, 12 Mar 2009 09:52:29 -0700 Subject: [Biojava-l] Java Developer and Scientific Software Developer Jobs at Protein Data Bank, UCSD Message-ID: <000001c9a332$eeb3c1a0$cc1b44e0$@edu> The PDB has openings for Java Developers and Scientific Software Developers at the University of California San Diego. http://www.pdb.org/pdb/static.do?p=general_information/about_pdb/contact/job _listings.html -Peter Rose From andreas.prlic at gmail.com Fri Mar 13 10:48:25 2009 From: andreas.prlic at gmail.com (Andreas Prlic) Date: Fri, 13 Mar 2009 07:48:25 -0700 Subject: [Biojava-l] Reading meta data in PDB file? In-Reply-To: References: <49B4A45A.3000102@biggjapan.com> <49B86219.1060501@biggjapan.com> <59a41c430903111823u1d3626acv69aa287decba649e@mail.gmail.com> Message-ID: <59a41c430903130748l29708cao1bbbe6289377fe5c@mail.gmail.com> if you mean residues that have been mentioned in the REMARK section, then those won't get parsed at the present. 
but you can have a look at the following page to see how it deals with SEQRES and ATOM records:

http://biojava.org/wiki/BioJava:CookBook:PDB:seqres

Andreas

On Thu, Mar 12, 2009 at 8:50 AM, Anant Jain wrote:
> Greetings,
>
> I also want to retrieve the positions of missing residues, but I cannot find
> any method for this. Should I try a regex in Java for this problem?
>
>
> On 3/12/09, Andreas Prlic wrote:
>>
>> Hi Ashika,
>>
>> actually some of the data is available via the Compound class, but for
>> the other parts of your query that is not there you would need to add
>> additional methods to the parser. if you have a patch, would be great
>> if you could post it to the list...
>>
>> Andreas
>>
>>
>>
>> 2009/3/12 Ashika Umanga Umagiliya :
>> > I assume there is no way to read this information and I have to modify
>> > the PDB parser?
>> >
>> >
>> > Ashika Umanga Umagiliya wrote:
>> >>
>> >> Greetings all,
>> >> I want to read the following information. I noticed that most of them are
>> >> stored under the REMARK tag in PDB. Can I read them using the BioJava PDB
>> >> Parser?
>> >>
>> >> Thanks in Advance,
>> >> Umanga.
>> >> ------------------------------
>> >>
>> >> Unit Cell:
>> >> a,b,c,alpha,beta,gamma values (I assume they are stored in the CRYST1 tag?)
>> >>
>> >> Molecular Description Asymmetric Unit:
>> >> Polymer, Molecule, Chains
>> >>
>> >> Classification
>> >>
>> >> Source:
>> >> Polymer, Scientific Name
>> >>
>> >> Ligand Chemical Component:
>> >> Identifier, Name, Formula
>> >>
>> >> Diffraction Detector:
>> >> Detector, Type, Collection Data
>> >>
>> >> Diffraction Radiation:
>> >> Monochromator, Diffraction Protocol, Wavelength, Wavelength List
>> >>
>> >
>> >
>> > --
>> > Ashika Umanga Umagiliya
>> > BiGG
>> > Ando Bldg. 8F, 3-6-9 Kita-Shinagawa, Shinagawa-ku, Tokyo 140-0001
>> > TEL:03-6679-8763
>> > FAX:03-6679-8764
>> >
>> > _______________________________________________
>> > Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/biojava-l
>> >
>>
>> _______________________________________________
>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>
>

From hlapp at gmx.net Sat Mar 14 18:59:44 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 14 Mar 2009 18:59:44 -0400
Subject: [Biojava-l] Google Summer of Code: application submitted, action needed
In-Reply-To:
References: <1F570555-12DF-42DF-8D0E-95AAE298D76A@gmx.net>
Message-ID: <85E9A7D7-97E5-4D46-BA8C-C37E557BEBF3@gmx.net>

Hi all,

I submitted the application yesterday for O|B|F to participate in the 2009 Google Summer of Code as a mentoring organization. The application is at

http://docs.google.com/Doc?id=dhs98hzv_7zn8bxqjm

and is also linked to from the ideas page at

http://open-bio.org/wiki/Google_Summer_of_Code_2009

Now keep your fingers crossed, Google is slated to announce acceptances on March 18.
This is the last cross-project message re: Summer of Code that addresses mentors and our projects; future messages that I'll post across projects will be primarily for students such as announcing whether we are accepted or not and issuing calls for application. **What we need most and right now is action from our projects' developers and from possible mentors.** Google admins will start reviewing organization applications on Monday. The ideas page has 6 project ideas right now - though the ideas are good ones, the quantity won't be particularly impressive to Google. Therefore, if you have an idea for a summer project for a student please use the C& template (it is commented out now but you'll see it when you pull the Ideas section into the editor) and put it up there ASAP. If you're not sure yet who'll mentor, put tentative names there. We don't need a full commitment from mentors until the student application period starts (March 23). Next, for all projects, the leads and/or volunteers should check the reference information for their project: http://open-bio.org/wiki/Google_Summer_of_Code_2009#Open-Bio_projects_involved I just culled these links from the various project websites - it'd be much appreciated if going forward everyone can lend a hand in this. Please review what's there and add or fix as you see fit. *These links must be correct and complete - otherwise potential students may not find you.* Finally, all prospective mentors, primary or secondary, committed or not, and anyone else who would like to volunteer to help out, should subscribe themselves ASAP to the mailing list for communicating GSoC- related administrivia: http://lists.open-bio.org/mailman/listinfo/gsoc I will *not* cross-post all administrative announcements or requests for information, and so you *will* miss information if you don't subscribe yourself there. (Note: students will be subscribed there only *after* acceptance). 
Those who are considering to mentor, primary or helping out, please also add yourselves to the Mentors section on the Ideas page (and check your link if you're already there): http://open-bio.org/wiki/Google_Summer_of_Code_2009#Mentors Cheers everyone, and fingers crossed! -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Wed Mar 18 14:45:50 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 18 Mar 2009 14:45:50 -0400 Subject: [Biojava-l] OBF application for Summer of Code has been rejected Message-ID: <44D1FAFD-B5D7-418B-9FDA-6945219A5481@gmx.net> I hope to find out later why, but our Google Summer of Code application as an umbrella org has been rejected. However, NESCent has been accepted. If you can give your project idea a phylogenetics/phyloinformatics focus, go and put it up on the NESCent ideas page at http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009 Do so pretty much **now** - we will start broadcasting and reaching out to students tonight and tomorrow. If someone comes to the site and they don't see a Bio* project that they would have been interested in, they may not check back for updates. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From Stephan.Neumann at gmx.de Fri Mar 20 07:32:35 2009 From: Stephan.Neumann at gmx.de (Stephan Neumann) Date: Fri, 20 Mar 2009 12:32:35 +0100 Subject: [Biojava-l] Problem with SingleDP.viterbi(..) Message-ID: <49C37ED3.9020606@gmx.de> Hello, we have a question concerning the Viterbi algorithm in BioJava: We are building Profile HMMs and performing BaumWelch algorithm to improve them. Finally we are performing the Viterbi algorithm on the resulting dynamic programming matrix (SingleDP). 
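
[Editor's note] As background for the message above, here is a minimal sketch of Viterbi decoding in log space for a small HMM. It is not BioJava's SingleDP implementation; the class ViterbiSketch, its Path type and all parameters are hypothetical names chosen for illustration. The `odds` flag switches between two common ways of scoring an emission: the raw emission probability, or the emission probability divided by a background (null model) probability for the same symbol. That these two variants correspond to any particular BioJava option is an assumption, not something confirmed in this thread.

```java
final class ViterbiSketch {

    /** Best state path and its log score (hypothetical result type for this sketch). */
    static final class Path {
        final int[] states;
        final double logScore;

        Path(int[] states, double logScore) {
            this.states = states;
            this.logScore = logScore;
        }
    }

    /** Log emission score: plain probability, or odds against the null model. */
    private static double emissionScore(double emit, double nullProb, boolean odds) {
        return Math.log(odds ? emit / nullProb : emit);
    }

    /**
     * init[s]: start probability of state s; trans[p][s]: transition p to s;
     * emit[s][x]: probability that state s emits symbol x; nullModel[x]: background
     * probability of symbol x; obs: observed symbol indices (must be non-empty).
     */
    static Path viterbi(double[] init, double[][] trans, double[][] emit,
                        double[] nullModel, int[] obs, boolean odds) {
        int n = obs.length;
        int k = init.length;
        double[][] v = new double[n][k];   // v[t][s]: best log score of a path ending in s at t
        int[][] back = new int[n][k];      // back[t][s]: predecessor of s on that best path

        for (int s = 0; s < k; s++) {
            v[0][s] = Math.log(init[s]) + emissionScore(emit[s][obs[0]], nullModel[obs[0]], odds);
        }
        for (int t = 1; t < n; t++) {
            for (int s = 0; s < k; s++) {
                int best = 0;
                double bestVal = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < k; p++) {
                    double cand = v[t - 1][p] + Math.log(trans[p][s]);
                    if (cand > bestVal) {
                        bestVal = cand;
                        best = p;
                    }
                }
                v[t][s] = bestVal + emissionScore(emit[s][obs[t]], nullModel[obs[t]], odds);
                back[t][s] = best;
            }
        }

        // Pick the best final state, then follow the back pointers.
        int last = 0;
        for (int s = 1; s < k; s++) {
            if (v[n - 1][s] > v[n - 1][last]) {
                last = s;
            }
        }
        int[] states = new int[n];
        states[n - 1] = last;
        for (int t = n - 1; t > 0; t--) {
            states[t - 1] = back[t][states[t]];
        }
        return new Path(states, v[n - 1][last]);
    }

    public static void main(String[] args) {
        double[] init = {0.5, 0.5};
        double[][] trans = {{0.9, 0.1}, {0.1, 0.9}};
        double[][] emit = {{0.9, 0.1}, {0.1, 0.9}};
        double[] nullModel = {0.5, 0.5};
        int[] obs = {0, 0, 1, 1};

        Path prob = viterbi(init, trans, emit, nullModel, obs, false);
        Path odds = viterbi(init, trans, emit, nullModel, obs, true);
        System.out.println(java.util.Arrays.toString(prob.states) + " log P = " + prob.logScore);
        System.out.println(java.util.Arrays.toString(odds.states) + " log odds = " + odds.logScore);
    }
}
```

In this sketch the null model is shared by all states, so the odds variant adds the same per-symbol constant to every path and returns the same state path; only the reported log score differs.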
Now we are wondering what the difference is between the ScoreTypes (ODDS, PROBABILITY, NULL_MODEL) of SingleDP.viterbi and how we can interpret the resulting StatePaths. Furthermore, what does StatePath.getScore mean? Best regards, Stephan.
From jp at javaclass.co.uk Tue Mar 24 08:51:49 2009 From: jp at javaclass.co.uk (JP) Date: Tue, 24 Mar 2009 12:51:49 +0000 Subject: [Biojava-l] fjoin algorithm implementation in Java (submission to BioJava) Message-ID: <4adc29060903240551l749650e3l2254356c126d9bdc@mail.gmail.com>
Hi there at BioJava, I have implemented an algorithm in Java for efficient computation of feature overlap (fjoin algorithm: described in http://www.liebertonline.com/doi/abs/10.1089/cmb.2006.13.1457 and the actual paper here http://www.liebertonline.com/doi/pdf/10.1089/cmb.2006.13.1457) as part of an MSc project in Bioinformatics at Imperial College (London). I have looked at biojava to see if an implementation of this algorithm already exists and it doesn't (a simple search for "fjoin" returns no results), so after I implemented it I thought I'd submit it and maybe spare someone else a few hours of development (or, in the true open source spirit, someone might find a bug or improve this). Is this possible, and how do I go about moving this forward? (I did have a read through the website, and could not find anything related to submission.) Any pointers would be greatly appreciated. Keep up the Good Work, Jean-Paul Ebejer, Malta.
From charles at imbusch.net Tue Mar 24 14:23:15 2009 From: charles at imbusch.net (Charles Imbusch) Date: Tue, 24 Mar 2009 19:23:15 +0100 Subject: [Biojava-l] NoClassDefFoundError Message-ID: <49C92513.3040902@imbusch.net>
Hello, I'm using Netbeans and Biojava on my Laptop. Netbeans creates a jar file for my project and I can successfully execute it. Now I would like to execute the java code on another machine. I installed Biojava on that machine as well and copied the jar file to it.
But I get an error message like this:
charles at nougat:~$ java -jar MPI.jar
Exception in thread "main" java.lang.NoClassDefFoundError: org/biojava/bio/BioException
I checked my installation (on nougat) and tried to compile one of the example files provided in the demo folder of the Biojava distribution. The compilation worked out. Now I'm really wondering what can cause that error. Any answer is appreciated. Cheers, Charles
From ahmed.elmasri at gmail.com Tue Mar 24 13:58:20 2009 From: ahmed.elmasri at gmail.com (Hamed, Ahmed A.) Date: Tue, 24 Mar 2009 13:58:20 -0400 Subject: [Biojava-l] NoClassDefFoundError In-Reply-To: <49C92513.3040902@imbusch.net> References: <49C92513.3040902@imbusch.net> Message-ID: <5cdd31570903241058s5d02c3f5k7d3db253bb70c3f6@mail.gmail.com>
Hi Charles, This is a common Java error when you don't have all the required jars on your class-path. The problem with Netbeans or Eclipse is that they set the class-path internally, but once you need to run the program from another machine you have to create and manage your own class-path. In Eclipse (not sure about NetBeans) you can export your project as a JAR and specify a MANIFEST.MF file. Make sure that this file exists, that the project libraries required on your laptop are migrated to the new machine, and that the paths within the MANIFEST.MF file point correctly to them. Here is what my MANIFEST.MF file looked like when I migrated a biojava project to another machine:
Manifest-Version: 1.0
Class-Path: /home/ahamed/I529Fall2009/HW1/bytecode.jar /home/ahamed/I529Fall2009/HW1/biojava.jar
Main-Class: mlbio.genes.dna.CodonTableCalculator
Hope that fixes your problem. Best wishes, Ahmed
On Tue, Mar 24, 2009 at 2:23 PM, Charles Imbusch wrote: > Hello, > > I'm using Netbeans and Biojava on my Laptop. Netbeans creates a jar file > for my project and I can successfully execute it. > > Now I would like to execute the java code on another machine.
I installed > Biojava on > that machine as well and copied the jar file to it. > > But I get an error message like this: > > charles at nougat:~$ java -jar MPI.jar > Exception in thread "main" java.lang.NoClassDefFoundError: > org/biojava/bio/BioException > > I checked my installation (on nougat ) and tried to compile one of the > example files provided in the > demo folder of the Biojava distribution. The compilation worked out. > > No I'm really wondering what can cause that error. > > Any answer is appreciated. > > Cheers, > Charles > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- Ahmed Abdeen Hamed Scientific Informatics Project Leader Marine Biological Laboratory Woods Hole, MA -- Ph.D. student, Complex Systems School of Informatics, Indiana University From holland at eaglegenomics.com Tue Mar 24 15:12:33 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 24 Mar 2009 19:12:33 +0000 Subject: [Biojava-l] NoClassDefFoundError In-Reply-To: <49C92513.3040902@imbusch.net> References: <49C92513.3040902@imbusch.net> Message-ID: <49C930A1.9000202@eaglegenomics.com> Depends what you mean by 'installed biojava'. Have you tried running this command: java -jar MPI.jar -cp /path/to/biojava.jar If that works, then the problem will be that your biojava.jar is not installed in the correct location for java to pick it up system-wide. Compiling the demos does not rely on having jars installed as it picks everything up from inside the biojava source distribution. cheers, Richard Charles Imbusch wrote: > Hello, > > I'm using Netbeans and Biojava on my Laptop. Netbeans creates a jar file > for my project and I can successfully execute it. > > Now I would like to execute the java code on another machine. I > installed Biojava on > that machine as well and copied the jar file to it. 
> > But I get an error message like this: > > charles at nougat:~$ java -jar MPI.jar > Exception in thread "main" java.lang.NoClassDefFoundError: > org/biojava/bio/BioException > > I checked my installation (on nougat ) and tried to compile one of the > example files provided in the > demo folder of the Biojava distribution. The compilation worked out. > > No I'm really wondering what can cause that error. > > Any answer is appreciated. > > Cheers, > Charles > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From ahmed.elmasri at gmail.com Tue Mar 24 16:53:24 2009 From: ahmed.elmasri at gmail.com (Hamed, Ahmed A.) Date: Tue, 24 Mar 2009 16:53:24 -0400 Subject: [Biojava-l] NoClassDefFoundError In-Reply-To: <49C930A1.9000202@eaglegenomics.com> References: <49C92513.3040902@imbusch.net> <49C930A1.9000202@eaglegenomics.com> Message-ID: <5cdd31570903241353g3101620cjb9544c30c6954c06@mail.gmail.com> Dear Richard, I know you are answering Charles's question below. However, I thought it might be worthwhile mentioning that Java ignores the classpath argument if you are running a jar i.e. using "-jar" option. Best wishes, Ahmed On Tue, Mar 24, 2009 at 3:12 PM, Richard Holland wrote: > Depends what you mean by 'installed biojava'. > > Have you tried running this command: > > java -jar MPI.jar -cp /path/to/biojava.jar > > If that works, then the problem will be that your biojava.jar is not > installed in the correct location for java to pick it up system-wide. > > Compiling the demos does not rely on having jars installed as it picks > everything up from inside the biojava source distribution. > > cheers, > Richard > > Charles Imbusch wrote: > > Hello, > > > > I'm using Netbeans and Biojava on my Laptop. 
Netbeans creates a jar file > > for my project and I can successfully execute it. > > > > Now I would like to execute the java code on another machine. I > > installed Biojava on > > that machine as well and copied the jar file to it. > > > > But I get an error message like this: > > > > charles at nougat:~$ java -jar MPI.jar > > Exception in thread "main" java.lang.NoClassDefFoundError: > > org/biojava/bio/BioException > > > > I checked my installation (on nougat ) and tried to compile one of the > > example files provided in the > > demo folder of the Biojava distribution. The compilation worked out. > > > > No I'm really wondering what can cause that error. > > > > Any answer is appreciated. > > > > Cheers, > > Charles > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > -- > Richard Holland, BSc MBCS > Finance Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- Ahmed Abdeen Hamed Scientific Informatics Project Leader Marine Biological Laboratory Woods Hole, MA -- Ph.D. student, Complex Systems School of Informatics, Indiana University From mark.schreiber at novartis.com Wed Mar 25 00:03:31 2009 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Wed, 25 Mar 2009 12:03:31 +0800 Subject: [Biojava-l] fjoin algorithm implementation in Java (submission to BioJava) In-Reply-To: <4adc29060903240551l749650e3l2254356c126d9bdc@mail.gmail.com> Message-ID: Hi - If you wish to contribute the code you could give it to someone with a development account (Andreas Prilc might be the best option). 
Alternatively if you think you will be regularly contributing you could apply for a development account. Because fjoin works on features and locations and as BioJava already has a detailed location model it would be good if your code makes use of the BioJava location/ feature API (at least in the public interface). - Mark biojava-l-bounces at lists.open-bio.org wrote on 03/24/2009 08:51:49 PM: > Hi there at BioJava, > > I have implemented an algorithm in Java for efficient computation of feature > overlap (fjoin algorithm: described in > http://www.liebertonline.com/doi/abs/10.1089/cmb.2006.13.1457 and the actual > paper here http://www.liebertonline.com/doi/pdf/10.1089/cmb.2006.13.1457) as > part of an MSc project in Bioinformatics at Imperial College (London). > > I have looked at biojava to see if the implementation of this algorithm > already exists and it didn't (a simple search for "fjoin" returns no > results) so after I implemented it I thought I'd submit it and maybe spare a > few hours development to someone else (or in the true open source spirit > someone might find a bug or improve this). > > Is this possible and how do I go about moving this forward (I did have a > read through the website, and could find anythign related to submission). > > Any pointers would be greatly appreciated. > > Keep up the Good Work, > Jean-Paul Ebejer, Malta. > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l _________________________ CONFIDENTIALITY NOTICE The information contained in this e-mail message is intended only for the exclusive use of the individual or entity named above and may contain information that is privileged, confidential or exempt from disclosure under applicable law. 
If the reader of this message is not the intended recipient, or the employee or agent responsible for delivery of the message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by e-mail and delete the material from any computer. Thank you. From holland at eaglegenomics.com Wed Mar 25 04:44:50 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Wed, 25 Mar 2009 08:44:50 +0000 Subject: [Biojava-l] NoClassDefFoundError In-Reply-To: <5cdd31570903241353g3101620cjb9544c30c6954c06@mail.gmail.com> References: <49C92513.3040902@imbusch.net> <49C930A1.9000202@eaglegenomics.com> <5cdd31570903241353g3101620cjb9544c30c6954c06@mail.gmail.com> Message-ID: <49C9EF02.6020607@eaglegenomics.com> Good point... *doh* sometimes we all forget things! :) Hamed, Ahmed A. wrote: > Dear Richard, > I know you are answering Charles's question below. However, I thought it > might be worthwhile mentioning that Java ignores the classpath argument > if you are running a jar i.e. using "-jar" option. > Best wishes, > Ahmed > > On Tue, Mar 24, 2009 at 3:12 PM, Richard Holland > > wrote: > > Depends what you mean by 'installed biojava'. > > Have you tried running this command: > > java -jar MPI.jar -cp /path/to/biojava.jar > > If that works, then the problem will be that your biojava.jar is not > installed in the correct location for java to pick it up system-wide. > > Compiling the demos does not rely on having jars installed as it picks > everything up from inside the biojava source distribution. > > cheers, > Richard > > Charles Imbusch wrote: > > Hello, > > > > I'm using Netbeans and Biojava on my Laptop. Netbeans creates a > jar file > > for my project and I can successfully execute it. > > > > Now I would like to execute the java code on another machine. 
I > > installed Biojava on > > that machine as well and copied the jar file to it. > > > > But I get an error message like this: > > > > charles at nougat:~$ java -jar MPI.jar > > Exception in thread "main" java.lang.NoClassDefFoundError: > > org/biojava/bio/BioException > > > > I checked my installation (on nougat ) and tried to compile one of the > > example files provided in the > > demo folder of the Biojava distribution. The compilation worked out. > > > > No I'm really wondering what can cause that error. > > > > Any answer is appreciated. > > > > Cheers, > > Charles > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > -- > Richard Holland, BSc MBCS > Finance Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > > http://www.eaglegenomics.com/ > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > > -- > Ahmed Abdeen Hamed > Scientific Informatics Project Leader > Marine Biological Laboratory Woods Hole, MA > -- > Ph.D. student, Complex Systems > School of Informatics, Indiana University > > > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From nir at rosettadesigngroup.com Wed Mar 25 12:18:24 2009 From: nir at rosettadesigngroup.com (Nir London) Date: Wed, 25 Mar 2009 18:18:24 +0200 Subject: [Biojava-l] Rosetta Academic Training Webinar Message-ID: <88F0F36A-FC4D-4A9C-AC31-5B883C3F92CB@rosettadesigngroup.com> The Rosetta Design Group is proud to present the first webinar in the Rosetta Academic Workshop Series. For the first webinar, we have selected to focus on Protein-Protein Docking based on the answers to the interest poll. 
We hope this will be the first in a line of helpful and inspiring webinars to kick off our Rosetta Academic Workshop Series.
What: Protein-Protein Docking
When: May 4th 2009, 0800-1000 AM EST
Where: Your office!
More details and registration: http://rosettadesigngroup.com/RDGLS/index.php?sid=54479&lang=en
Please note: This is not a promotional webinar. Rosetta is open-source and freeware for academic and non-profit organizations and can be downloaded from University of Washington's TechTransfer Digital Ventures. The majority of the webinar is concerned with Rosetta 2.3.0. Rosetta 3.0 is still a beta version. Hope to see you there, Nir London. Rosetta Design Group | http://rosettadesigngroup.com/
From charles at imbusch.net Fri Mar 27 15:33:39 2009 From: charles at imbusch.net (Charles Imbusch) Date: Fri, 27 Mar 2009 20:33:39 +0100 Subject: [Biojava-l] NoClassDefFoundError In-Reply-To: <49C921C6.7010202@umn.edu> References: <49C92513.3040902@imbusch.net> <49C921C6.7010202@umn.edu> Message-ID: <49CD2A13.4010101@imbusch.net>
Hello, thanks everybody for the replies! In fact I did not copy the dist directory to the other machine, so the libraries needed were not found. Now I'm almost there; when I want to start the program, I get this message:
charles at nougat:~/dist$ java -jar MPI.jar
Exception in thread "main" java.lang.UnsupportedClassVersionError: Bad version number in .class file [...]
Google has a lot of information on this. Probably the java version on the machine is too old. I think I can handle this. Cheers, Charles
From andreas at sdsc.edu Fri Mar 27 16:09:53 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Fri, 27 Mar 2009 13:09:53 -0700 Subject: [Biojava-l] biojava @ ISMB and BOSC Message-ID: <59a41c430903271309i5bbc8d3cm27204f55eca2b45f@mail.gmail.com>
Hi I would like to organize a BioJava user meeting around the upcoming ISMB conference in July. Anybody interested?
Related to this I would also like to submit an abstract for the BOSC - satellite meeting there. I started a wiki page in preparation for this: http://biojava.org/wiki/BOSC2009_Presentation Andreas From SMarkel at accelrys.com Fri Mar 27 16:38:03 2009 From: SMarkel at accelrys.com (Scott Markel) Date: Fri, 27 Mar 2009 16:38:03 -0400 Subject: [Biojava-l] biojava @ ISMB and BOSC In-Reply-To: <59a41c430903271309i5bbc8d3cm27204f55eca2b45f@mail.gmail.com> References: <59a41c430903271309i5bbc8d3cm27204f55eca2b45f@mail.gmail.com> Message-ID: <1F1240778FB0AF46B4E5A72C44D2C74729A92D92@exch1-hi.accelrys.net> Andreas, I plan on being at BOSC and ISMB. I would attend a BioJava user meeting, depending, of course, on what else is scheduled at the same time. Scott Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys (SciTegic R&D) mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com http://www.linkedin.com/in/smarkel Vice President, Board of Directors: International Society for Computational Biology Co-chair: ISCB Publications Committee Associate Editor: PLoS Computational Biology Editorial Board: Briefings in Bioinformatics > -----Original Message----- > From: biojava-l-bounces at lists.open-bio.org [mailto:biojava-l- > bounces at lists.open-bio.org] On Behalf Of Andreas Prlic > Sent: Friday, 27 March 2009 1:10 PM > To: biojava-l at biojava.org > Subject: [Biojava-l] biojava @ ISMB and BOSC > > Hi > > I would like to organize a BioJava user meeting around the upcoming > ISMB conference in July. Anybody interested? > > Related to this I would also like to submit an abstract for the BOSC - > satellite meeting there. 
I started a wiki page in preparation for > this: > > http://biojava.org/wiki/BOSC2009_Presentation > > Andreas > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From holland at eaglegenomics.com Sun Mar 29 16:16:25 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Sun, 29 Mar 2009 21:16:25 +0100 Subject: [Biojava-l] NoClassDefFoundError In-Reply-To: <49CD2A13.4010101@imbusch.net> References: <49C92513.3040902@imbusch.net> <49C921C6.7010202@umn.edu> <49CD2A13.4010101@imbusch.net> Message-ID: <49CFD719.3020008@eaglegenomics.com> It means that the version of the Java compiler (or the compatibility flag used on the compiler) that was used to compile the BioJava JARs you are using is newer than the version of Java available on the machine you are trying to run the program on! You need to either upgrade Java on the target machine, or compile BioJava from source on the target machine. Charles Imbusch wrote: > Hello, > > thanks everybody for the replies! In fact I did not copy the dist > directory to the other machine, > so the libraries needed were not found. > > Now, I'm almost there, when I want to start the program, I get this > message: > > charles at nougat:~/dist$ java -jar MPI.jar > Exception in thread "main" java.lang.UnsupportedClassVersionError: Bad > version number in .class file > [...] > > Google has a a lot of information to this. Probably the java version on > the machine is to old. > I think I can handle this. 
> > Cheers, > Charles > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From holland at eaglegenomics.com Sun Mar 29 16:16:55 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Sun, 29 Mar 2009 21:16:55 +0100 Subject: [Biojava-l] biojava @ ISMB and BOSC In-Reply-To: <1F1240778FB0AF46B4E5A72C44D2C74729A92D92@exch1-hi.accelrys.net> References: <59a41c430903271309i5bbc8d3cm27204f55eca2b45f@mail.gmail.com> <1F1240778FB0AF46B4E5A72C44D2C74729A92D92@exch1-hi.accelrys.net> Message-ID: <49CFD737.3000102@eaglegenomics.com> I will be there too, for BOSC only, so will be happy to attend a meeting. Scott Markel wrote: > Andreas, > > I plan on being at BOSC and ISMB. I would attend a BioJava user > meeting, depending, of course, on what else is scheduled at the > same time. > > Scott > > Scott Markel, Ph.D. 
> Principal Bioinformatics Architect email: smarkel at accelrys.com > Accelrys (SciTegic R&D) mobile: +1 858 205 3653 > 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > San Diego, CA 92121 fax: +1 858 799 5222 > USA web: http://www.accelrys.com > > http://www.linkedin.com/in/smarkel > Vice President, Board of Directors: > International Society for Computational Biology > Co-chair: ISCB Publications Committee > Associate Editor: PLoS Computational Biology > Editorial Board: Briefings in Bioinformatics > > >> -----Original Message----- >> From: biojava-l-bounces at lists.open-bio.org [mailto:biojava-l- >> bounces at lists.open-bio.org] On Behalf Of Andreas Prlic >> Sent: Friday, 27 March 2009 1:10 PM >> To: biojava-l at biojava.org >> Subject: [Biojava-l] biojava @ ISMB and BOSC >> >> Hi >> >> I would like to organize a BioJava user meeting around the upcoming >> ISMB conference in July. Anybody interested? >> >> Related to this I would also like to submit an abstract for the BOSC - >> satellite meeting there. I started a wiki page in preparation for >> this: >> >> http://biojava.org/wiki/BOSC2009_Presentation >> >> Andreas >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/
From tallpaulinjax at yahoo.com Wed Mar 11 08:58:27 2009 From: tallpaulinjax at yahoo.com (Paul B) Date: Wed, 11 Mar 2009 12:58:27 -0000 Subject: [Biojava-l] Out of heap space during structure parsing. Message-ID: <585690.93512.qm@web30705.mail.mud.yahoo.com>
Hi, I am using BioJava 1.6.1 to parse PDB files. My machine has 2GB of RAM.
I am using Netbeans 6.5 as my development environment with Java 1.6. My user-specific netbeans.conf file is attached, with a heap space of 1GB. The relevant BioJava code is below:
    try {
      pdbreader = new PDBFileReader();
      pdbreader.setPath(localFilePath);
      pdbreader.setAutoFetch(true);
      struc = pdbreader.getStructureById(pdbCode);
    ...
Using this code, I had successfully parsed smaller PDB files like 2BEG and 1Q80. Then I tried to parse a slightly larger file 1FFK and received this message on the 'struc =' line:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at org.biojava.bio.alignment.NeedlemanWunsch.pairwiseAlignment(NeedlemanWunsch.java:411)
        at org.biojava.bio.alignment.NeedlemanWunsch.getAlignment(NeedlemanWunsch.java:315)
        at org.biojava.bio.structure.io.SeqRes2AtomAligner.align(SeqRes2AtomAligner.java:220)
        at org.biojava.bio.structure.io.SeqRes2AtomAligner.align(SeqRes2AtomAligner.java:140)
        at org.biojava.bio.structure.io.PDBFileParser.triggerEndFileChecks(PDBFileParser.java:2249)
        at org.biojava.bio.structure.io.PDBFileParser.parsePDBFile(PDBFileParser.java:2155)
        at org.biojava.bio.structure.io.PDBFileParser.parsePDBFile(PDBFileParser.java:2013)
        at org.biojava.bio.structure.io.PDBFileReader.getStructureById(PDBFileReader.java:439)
        at biojavatest.PdbDemo.grabPdbFileStruc(PdbDemo.java:105)
        at biojavatest.PdbDemo.runTest(PdbDemo.java:67)
        at biojavatest.PdbDemo.main(PdbDemo.java:58)
Any suggestions? Is the problem specific to some deviation in 1FFK, or in BioJava's parser implementation?
By the way, I am using BioJava simply as a parser, and I am then dumping the data into class objects of my own design and persisting them to a SQL Server database. As such, I don't need all the ATOM information held in memory. Perhaps there is a way to lazy load that information upon request?
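On the idea of not holding every ATOM record in memory: one option that does not rely on any BioJava API is a single streaming pass over the PDB text, handing each ATOM/HETATM record to a callback and keeping nothing else. The following is only a minimal sketch in plain Java. The class StreamingAtomScan, the AtomHandler interface and the sampleLine helper are made-up names for this illustration, not part of BioJava, and the substring offsets assume the standard fixed-width ATOM record columns of the PDB format. It ignores everything else a full parser does (SEQRES alignment, models, alternate locations). Note also that the trace shows the heap running out inside the SEQRES-to-ATOM alignment (SeqRes2AtomAligner calling NeedlemanWunsch), so atom storage may not be the main cost for 1FFK.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Locale;

class StreamingAtomScan {

    /** Hypothetical callback: receives one ATOM/HETATM record at a time. */
    interface AtomHandler {
        void atom(char chainId, String resName, String resSeq, String atomName,
                  double x, double y, double z);
    }

    /**
     * Reads PDB-format text line by line and passes each atom to the handler.
     * Nothing is retained here, so memory use does not grow with file size.
     * Returns the number of atom records seen.
     */
    static int scan(Reader in, AtomHandler handler) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            boolean isAtom = line.startsWith("ATOM  ") || line.startsWith("HETATM");
            if (!isAtom || line.length() < 54) {
                continue; // other record types, or a truncated coordinate line
            }
            // Fixed PDB columns (1-based): atom name 13-16, residue name 18-20,
            // chain 22, residue number plus insertion code 23-27,
            // x 31-38, y 39-46, z 47-54.
            String atomName = line.substring(12, 16).trim();
            String resName = line.substring(17, 20).trim();
            char chainId = line.charAt(21);
            String resSeq = line.substring(22, 27).trim();
            double x = Double.parseDouble(line.substring(30, 38).trim());
            double y = Double.parseDouble(line.substring(38, 46).trim());
            double z = Double.parseDouble(line.substring(46, 54).trim());
            handler.atom(chainId, resName, resSeq, atomName, x, y, z);
            count++;
        }
        return count;
    }

    /** Builds one correctly aligned ATOM line; used only for the demo input. */
    static String sampleLine(int serial, String atomName, String resName,
                             char chainId, int resSeq, double x, double y, double z) {
        return String.format(Locale.ROOT,
                "ATOM  %5d %-4s %3s %c%4d    %8.3f%8.3f%8.3f",
                serial, atomName, resName, chainId, resSeq, x, y, z);
    }

    public static void main(String[] args) throws IOException {
        String pdbText = sampleLine(1, " N", "MET", 'L', 210, 27.340, 24.430, 2.614) + "\n"
                + "REMARK   1 lines that are not atoms are skipped\n"
                + sampleLine(2, " CA", "MET", 'L', 210, 26.266, 25.413, 2.842) + "\n";
        int n = scan(new StringReader(pdbText), new AtomHandler() {
            public void atom(char chainId, String resName, String resSeq,
                             String atomName, double x, double y, double z) {
                // A real handler might write the atom to the database and forget it.
                System.out.println(chainId + " " + resName + " " + resSeq + " " + atomName);
            }
        });
        System.out.println(n + " atom records");
    }
}
```

For a real file, a FileReader (or a GZIP stream wrapped in a reader) could be passed to scan in place of the StringReader; the handler then decides what, if anything, to keep.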
Is there a development version of BioJava that's downloadable and offers a more memory efficient way of grabbing data? Thanks, Paul
-------------- next part -------------- A non-text attachment was scrubbed... Name: netbeans.conf Type: application/octet-stream Size: 1965 bytes Desc: not available URL:
From tallpaulinjax at yahoo.com Mon Mar 23 09:13:12 2009 From: tallpaulinjax at yahoo.com (tallpaulinjax at yahoo.com) Date: Mon, 23 Mar 2009 13:13:12 -0000 Subject: [Biojava-l] For BioJava List: Possible solution to Hibernate and slow Atom storage. Message-ID: <395990.73913.qm@web30705.mail.mud.yahoo.com>
Hi Andreas, I saw this post on slow Atom storage using BioJava and Hibernate: http://www.biojava.org/wiki/BioJava:CookBook:PDB:hibernate I may have an acceptable work-around. I had the same problem, but have 'flattened' the object structure I am using with Hibernate and got a 20x to 25x performance improvement. The problem was the number of objects being held in memory as the file was parsed, similar to how BioJava can run out of heap space. In my design, a PdbMeta record can have hundreds or more of ModelChainResidue objects, and each ModelChainResidue object could have dozens of AtomNorm records. Just to load a 300kb PDB file into BioJava then into my database could take 25 minutes! And I have 4,000 of these files to load!
So what I did is explained in this post: http://forum.hibernate.org/viewtopic.php?p=2409385#2409385
Basically, per PdbMeta row I only kept around the currently needed ModelChainResidue object and AtomNorm object, and garbage collected any others.
This meant I couldn't use one big session/transaction and had to split this up into separate transactions, but I gained a 25x improvement in load times into my database (Hibernate doesn't support nested transactions, and I couldn't see a way within a transaction to remove an object without subsequently deleting it from the database). My plan is to include a Boolean 'useFastLoad' parameter to the method calls which will turn this feature on and off as needed. With 4,000 PDB files to load, each one taking a minute or so on average to download, parse into BioJava, and then dump to my database (on my laptop for testing, moving to my server soon), 1 minute per file will still take almost 3 days.
Perhaps BioJava could use the same strategy with Hibernate?
Paul
PS: Here is some background on what I am using BioJava for:
I am using BioJava to parse PDB files, then converting from BioJava's object structure to one more specific to my needs. I am working with two chemists at The University Of North Florida on this project, which supports my Masters in C.S. thesis. I have attached a preliminary schema (hopefully it won't inflame the SPAM filter :-) ). I just added the PeriodicTable table last night and have to adjust the AtomNorm and AtomDenorm tables accordingly. Basically, the schema is as follows:
1. We imported a high-level list of over 4,000 "representative sample" PDB files into the RepresentativeSample table. This will be used as part of the basis to start filling the PdbMeta, ModelChainResidue, and AtomNorm tables.
2. There may be more of these representative sample lists in the future, so each batch of imports has an entry in RepresentativeSampleMeta.
3. Each PdbMeta entry is unique by PDB Code, DepositionDate, and ModificationDate.
4. A PdbMeta entry can have 0 or more child ModelChainResidue records (usually hundreds).
5. A ModelChainResidue record can have dozens of child AtomNorm records.
6. For data mining purposes and join improvements, pertinent info from PdbMeta, ModelChainResidue, and AtomNorm is dumped into AtomDenorm.
7. The "Lkp" tables are merely static 'helper' tables whose number of records and field entries are expected to remain static.
8. The Error table is where any errors found in the data are dumped by the Java program, by table name and then primary key within that table.
9. MethylDonatedHydrogen table: one of the key areas of interest for the UNF Chemists, including Dr. Robert Vergenz.
(BTW, I can't figure out how to get the RFactor out of BioJava, and apparently BioJava is removing 'Unknown amino acids' before I have a chance to parse them and add them to my Error table as well... solutions to both those problems? Does BioJava somewhere have a "hasErrors" field based on parsing?)
-------------- next part -------------- A non-text attachment was scrubbed... Name: OverviewSchema.pdf Type: application/pdf Size: 89251 bytes Desc: not available URL:
From aumanga at biggjapan.com Wed Mar 4 10:13:35 2009 From: aumanga at biggjapan.com (Ashika Umanga Umagiliya) Date: Wed, 04 Mar 2009 19:13:35 +0900 Subject: [Biojava-l] PDB parsing: Calculate absolute atom numbers of a chain. Message-ID: <49AE544F.9080502@biggjapan.com>
Greetings all, From a database, I retrieve the PDB ID, chain ID and epitope positions. These epitope positions are calculated relative to their chain. I want to select these epitope positions (residues) in Jmol, but the Jmol select query supports only absolute residue numbers. Ex: CHAIN L : residues 10,12,13,15 (relative to chain L) But in the PDB file (in Jmol) the values are 210,212,213,215 - these are what I want to calculate. Any way to calculate this? Thanks in advance, Umanga -- ??? ???? ?????
???????????????????BiGG) ?140-0001 ?????????3-6-9 ??????8F TEL:03-6679-8763 FAX:03-6679-8764 From andreas.prlic at gmail.com Wed Mar 4 15:38:26 2009 From: andreas.prlic at gmail.com (Andreas Prlic) Date: Wed, 4 Mar 2009 07:38:26 -0800 Subject: [Biojava-l] PDB parsing: Calculate absolute atom numbers of a chain. In-Reply-To: <49AE544F.9080502@biggjapan.com> References: <49AE544F.9080502@biggjapan.com> Message-ID: <59a41c430903040738o43b5417as688f8f18b7d14f24@mail.gmail.com> Hi Umanga, You will need to use the getPDBCode() method to obtain the PDB residue number as well as the chain ID to select the residue in Jmol. Andreas 2009/3/4 Ashika Umanga Umagiliya : > Greetings all, > > From a database; I retrieve PDB-id ,chainID and epitope positions.These > epitope positions are calculated relative to its Chain. > I want to select these epitope positions (residues) in JMol.But in jmol the > select query support only absolute residues numbers. > > Ex: > CHAIN L : residues 10,12,13,15 (relative to chain L) > But in PDB file (in JMOL) the values are 210,212,213,215 - this are what I > want to calculate? > > Anyway to calculate this? > > Thanks in advance, > Umanga > > -- > $B%"%7%+(B $B%&%^%s%,(B $B%&%^%.%j%d(B > $B-j9q:]%P%$%*%$%s%U%)%^%F%#%/%98&5f=j!J(BBiGG) > $B")(B140-0001 > $BEl5~ETIJ at n6hKLIJ@n(B3-6-9 $B%"%s%I%&%S%k(B8F > TEL:03-6679-8763 > FAX:03-6679-8764 > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From benmpe at pml.ac.uk Wed Mar 4 15:36:21 2009 From: benmpe at pml.ac.uk (Ben Temperton) Date: Wed, 4 Mar 2009 15:36:21 -0000 Subject: [Biojava-l] Parsing out identities from a blast file Message-ID: <7546D505C3AF304280B10C3D8B8DFD3902F221D6@burgh.npm.ac.uk> Hi there, Does anyone know what methods I need to invoke to pull out the identity for a particular hit when parsing a blast output? 
I can get the score, the start & end points etc using: SeqSimilaritySearchHit hit = (SeqSimilaritySearchHit)obj; LOGGER.debug("match:" + hit.getSubjectID() + ", score:" + hit.getScore() + ", e-value:" + hit.getEValue()); But I can't seem to find any method to pull out the identity of the hit, which I would have thought would be in the SeqSimilaritySearchSubHit interface. Many thanks, Ben -------------------------------------------------------------------------------- Plymouth Marine Laboratory Registered Office: Prospect Place The Hoe Plymouth PL1 3DH Website: www.pml.ac.uk Registered Charity No. 1091222 PML is a company limited by guarantee registered in England & Wales company number 4178503 PML is a member of the Plymouth Marine Sciences Partnership Website: www.pmsp.org.uk -------------------------------------------------------------------------------- This e-mail, its content and any file attachments are confidential. If you have received this e-mail in error please do not copy, disclose it to any third party or use the contents or attachments in any way. Please notify the sender by replying to this e-mail or e-mail forinfo at pml.ac.uk and then delete the email without making any copies or using it in any other way. The content of this message may contain personal views which are not the views of Plymouth Marine Laboratory unless specifically stated. You are reminded that e-mail communications are not secure and may contain viruses. Plymouth Marine Laboratory accepts no liability for any loss or damage which may be caused by viruses. 
From mark.schreiber at novartis.com Thu Mar 5 02:53:47 2009
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Thu, 5 Mar 2009 10:53:47 +0800
Subject: [Biojava-l] Parsing out identities from a blast file
In-Reply-To: <7546D505C3AF304280B10C3D8B8DFD3902F221D6@burgh.npm.ac.uk>
Message-ID:

Hi -

You need to use the SeqSimilaritySearchHit, which is the parent of the SeqSimilaritySearchSubHit. It has a method called getSubjectID() which gives you the ID of the hit subject. All subhits with the same parent search hit will have the same subject ID, so it makes more sense to store the information in this class.

- Mark

biojava-l-bounces at lists.open-bio.org wrote on 03/04/2009 11:36:21 PM:

> Hi there,
>
> Does anyone know what methods I need to invoke to pull out the identity
> for a particular hit when parsing a blast output?
>
> I can get the score, the start & end points etc using:
> SeqSimilaritySearchHit hit = (SeqSimilaritySearchHit)obj;
> LOGGER.debug("match:" + hit.getSubjectID() + ", score:"
> + hit.getScore() + ", e-value:" + hit.getEValue());
>
> But I can't seem to find any method to pull out the identity of the hit,
> which I would have thought would be in the SeqSimilaritySearchSubHit
> interface.
>
> Many thanks,
>
> Ben

_________________________

CONFIDENTIALITY NOTICE

The information contained in this e-mail message is intended only for the exclusive use of the individual or entity named above and may contain information that is privileged, confidential or exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, or the employee or agent responsible for delivery of the message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by e-mail and delete the material from any computer. Thank you.

From bioinfosej17 at gmail.com Thu Mar 5 09:53:47 2009
From: bioinfosej17 at gmail.com (Sej Modha)
Date: Thu, 5 Mar 2009 15:23:47 +0530
Subject: [Biojava-l] Biojava-l Digest, Vol 74, Issue 1
In-Reply-To:
References:
Message-ID: <4844b4790903050153u213599dax32d7ee46ab1d97b2@mail.gmail.com>

I'm working on biojava.... I'm not getting what is the arg[0] we need to pass in BlastParser ...
if this particular program is providing the blast facility like utility or not. If yes then send me one example program with each and every detail. Please.

From marcel.huntemann at gmail.com Sat Mar 7 00:55:35 2009
From: marcel.huntemann at gmail.com (Marcel Huntemann)
Date: Fri, 06 Mar 2009 16:55:35 -0800
Subject: [Biojava-l] Stop condition for blast parser
Message-ID: <49B1C607.6050803@Gmail.com>

Hi!

I followed the example in http://biojava.org/wiki/BioJava:CookBook:Blast:Echo to create my own blast parser. My problem is that I am often only interested in the hits of a certain query. Is there a way to tell the blast parser via my customized SearchContentHandler to stop parsing after it found this certain queryId? Often the wanted queryId will be at the beginning of the blast file, and I don't want the parser to run over the really, really long rest of the file, even if it doesn't do much during that. I just want the parser to stop then. Can I implement that somehow in my customized SearchContentHandler, or do I have to change the source code of biojava somewhere for that?

Thanks,
Marcel

From aumanga at biggjapan.com Mon Mar 9 05:08:42 2009
From: aumanga at biggjapan.com (Ashika Umanga Umagiliya)
Date: Mon, 09 Mar 2009 14:08:42 +0900
Subject: [Biojava-l] Reading meta data in PDB file?
Message-ID: <49B4A45A.3000102@biggjapan.com>

Greetings all,

I want to read the following information. I noticed that most of it is stored under the REMARK tag in PDB. Can I read it using the BioJava PDB parser?

Thanks in advance,
Umanga.

------------------------------
Unit Cell: a, b, c, alpha, beta, gamma values (I assume they are stored in the CRYST1 tag?)
Molecular Description Asymmetric Unit: Polymer, Molecule, Chains
Classification Source: Polymer, Scientific Name
Ligand Chemical Component: Identifier, Name, Formula
Diffraction Detector: Detector, Type, Collection Data
Diffraction Radiation: Monochromator, Diffraction Protocol, Wavelength, Wavelength List

From marcel.huntemann at gmail.com Mon Mar 9 19:00:36 2009
From: marcel.huntemann at gmail.com (Marcel Huntemann)
Date: Mon, 09 Mar 2009 12:00:36 -0700
Subject: [Biojava-l] Stop condition for blast parser
In-Reply-To: <93b45ca50903062057x680cdf0fu39d981938478b547@mail.gmail.com>
References: <49B1C607.6050803@Gmail.com> <93b45ca50903061758t38dfde52p65d4ad7f5ddf678a@mail.gmail.com> <49B1F1A2.3040403@Gmail.com> <93b45ca50903062054k2ec842a9ua890f592286cd3d3@mail.gmail.com> <93b45ca50903062057x680cdf0fu39d981938478b547@mail.gmail.com>
Message-ID: <49B56754.6080100@Gmail.com>

Hi Mark!

Mark Schreiber wrote:
> You could just customize BlastEcho to pass on the events of interest,
> ignore those that are not interesting.

That's what I am doing right now. But I don't know how to tell my customized BlastEcho to stop when a certain condition is met during a particular event call. What's the command for stopping there?

> It could also exit if a certain
> event occurs.

How?

> Remember it cost almost nothing to read the file so you
> save time by only sending interesting events for parsing.

Hmm, I am not sure it's really almost nothing when I have about 90,000 contigs that were blasted against a database with maybe 3,000,000 genes. The blast output that I am parsing is about 13 GB in size, and in every cycle I am looking for the results of one particular contig of these 90,000 contigs. So I definitely experienced that the time adds up a lot when each of these 90,000 cycles runs over the whole file, although the contig I am looking for was already at the beginning of the file.

Cheers,
Marcel

> On 7 Mar 2009, 12:01 PM, "Marcel Huntemann" wrote:
>
> But where? I can't do it in my customized handler, can I?
>
> Mark Schreiber wrote:
> > Because the blast parser uses event based parsing you should be able to c...

From mark.schreiber at novartis.com Tue Mar 10 02:36:50 2009
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 10 Mar 2009 10:36:50 +0800
Subject: [Biojava-l] Stop condition for blast parser
In-Reply-To: <49B56754.6080100@Gmail.com>
Message-ID:

Hi -

There are many ways to stop the parsing, but it really depends on how you have set the program up. Notably, there is no way for the Blast parsing system of BioJava to shut itself down, but control probably shouldn't happen at that level.

A crude but effective procedure is to write out the results when you find the hit of interest and then simply call System.exit().

Another approach would be to spawn Tasks to parse each record and then have them signal to the main thread when they are complete to shut them down. If you are using Java 1.4 or earlier then you would need to do this with Threads. If you have Java 5 or later you can use the java.util.concurrent packages, which are much nicer to deal with.

One thing I don't understand is why you don't blast each contig separately; in that case the results would only contain your hit of interest. That means 90K separate blasts, but there are versions of blast that run on clusters and the database (3 million genes) is not huge, so it should be an embarrassingly parallel problem?

- Mark
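
Both suggestions in this thread (System.exit() or worker tasks) sit outside the handler itself. A third option that is common with event-based (SAX-style) parsers is to throw a dedicated exception from the handler once the wanted query has been processed, and to catch it around the parse call. The sketch below only illustrates that pattern with the JDK's own SAX parser and a made-up XML layout. EarlyStopDemo, StopParsing, OneQueryHandler and the query/hit elements are hypothetical names, not BioJava classes or real BLAST output, and whether BioJava's SearchContentHandler adapters pass such an exception through unchanged is an assumption that would need checking.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EarlyStopDemo {

    /** Hypothetical marker exception meaning "found what I need, stop parsing". */
    static class StopParsing extends SAXException {
        StopParsing() {
            super("stop requested by handler");
        }
    }

    /** Collects the hits of one query id and aborts the parse when that query block ends. */
    static class OneQueryHandler extends DefaultHandler {
        private final String wantedId;
        private boolean inWanted = false;
        final List<String> hits = new ArrayList<String>();
        int queriesSeen = 0;

        OneQueryHandler(String wantedId) {
            this.wantedId = wantedId;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("query".equals(qName)) {
                queriesSeen++;
                inWanted = wantedId.equals(atts.getValue("id"));
            } else if ("hit".equals(qName) && inWanted) {
                hits.add(atts.getValue("subject"));
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            // The wanted query block is complete: abort instead of reading the rest of the input.
            if ("query".equals(qName) && inWanted) {
                throw new StopParsing();
            }
        }
    }

    /** Parses the (toy) XML and returns the handler with whatever it collected. */
    static OneQueryHandler parse(String xml, String wantedId) throws Exception {
        OneQueryHandler handler = new OneQueryHandler(wantedId);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsing stop) {
            // Expected: the handler asked to stop early. Other SAXExceptions still propagate.
        }
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<results>"
                + "<query id=\"contig1\"><hit subject=\"geneA\"/><hit subject=\"geneB\"/></query>"
                + "<query id=\"contig2\"><hit subject=\"geneC\"/></query>"
                + "</results>";
        OneQueryHandler h = parse(xml, "contig1");
        System.out.println(h.hits + " after " + h.queriesSeen + " query block(s)");
    }
}
```

Catching only the marker exception keeps genuine parse errors visible, and unlike System.exit() the surrounding program keeps running, which matters if the parse is one step in a longer loop.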
From hlapp at gmx.net Tue Mar 10 03:36:07 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 9 Mar 2009 23:36:07 -0400
Subject: [Biojava-l] Google Summer of Code: Call for Bio* Volunteers
In-Reply-To: <1F570555-12DF-42DF-8D0E-95AAE298D76A@gmx.net>
References: <1F570555-12DF-42DF-8D0E-95AAE298D76A@gmx.net>
Message-ID:

You may recall my message to the developer lists of several O|B|F projects in February about the idea of O|B|F applying to Google Summer of Code as a mentoring organization [1]. I felt that the response to this was very positive and encouraging. Although late (sorry, been swamped too much), I've now put up the skeleton of an ideas page at http://open-bio.org/wiki/Google_Summer_Code_2009

I basically modeled (in fact, largely copied) this page after the NESCent Phyloinformatics Summer of Code ideas pages, which I think worked pretty well. We can completely rework this, though - any feedback and suggestions are very much welcome.

In the meantime, I need all developers to double check the information under 'Contact'. Would the open-bio-l mailing list indeed reach the prospective mentors and other devs? Will you be fine with students asking for feedback on their applications on the developers (i.e., this) list? Is there a blessed IRC channel where at least some of the prospective mentors hang out, for students to ask questions during the time they apply?

I also need space for the reference information for all projects that will participate with at least one project idea (I would hope that that's all projects) to be added in the 'Open-Bio projects involved' section.

***** Most important of all, if you can volunteer to mentor a project, please post a project idea to the page in the respective section, using the idea template that's there already (copy, paste, and edit). *****

The deadline for organization applications is Friday this week, Mar 13, which is very soon.
The ideas page is a major factor and component in how Google scores new mentoring organizations - the more we can show the resourcefulness and diversity of our member projects, the more competitive I think we'll be. So all those who responded with ideas or willingness to help out as primary or secondary mentors earlier, I need you to think about and put up your idea(s) now.

Cheers,

-hilmar

[1] http://tinyurl.com/ck7tqe

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From tallpaulinjax at yahoo.com Wed Mar 11 13:06:02 2009
From: tallpaulinjax at yahoo.com (Paul B)
Date: Wed, 11 Mar 2009 06:06:02 -0700 (PDT)
Subject: [Biojava-l] Out of heap space during structure parsing.
Message-ID: <343262.18611.qm@web30701.mail.mud.yahoo.com>

Sorry, I sent this earlier and then I read that attaching a file can cause spam problems. So here is my netbeans.conf file inline, showing 1024 MB of RAM to be used for heap size:

# ${HOME} will be replaced by JVM user.home system property
# netbeans_default_userdir="${HOME}/.netbeans/6.5"
# Options used by NetBeans launcher by default, can be overridden by explicit
# command line switches:
netbeans_default_options="-J-Dorg.glassfish.v3.installRoot=\"C:\Program Files\glassfish-v3-prelude\" -J-Dcom.sun.aas.installRoot=\"C:\Program Files\glassfish-v2ur2\" -J-client -J-Xverify:none -J-Xss2m -J-Xms32m -J-XX:PermSize=32m -J-XX:MaxPermSize=200m -J-Dapple.laf.useScreenMenuBar=true -J-Dsun.java2d.noddraw=true"
# Note that a default -Xmx is selected for you automatically.
# You can find this value in var/log/messages.log file in your userdir.
# The automatically selected value can be overridden by specifying -J-Xmx here
# or on the command line.
# command line switches
netbeans_default_options="-J-Xms32m -J-Xmx1024m -J-XX:PermSize=32m -J-XX:MaxPermSize=96m -J-Xverify:none -J-Dapple.laf.useScreenMenuBar=true -J-XX:+UseConcMarkSweepGC -J-XX:+CMSClassUnloadingEnabled -J-XX:+CMSPermGenSweepingEnabled -J-XX:+UseParNewGC"
# If you specify the heap size (-Xmx) explicitely, you may also want to enable
# Concurrent Mark & Sweep garbage collector. In such case add the following
# options to the netbeans_default_options:
# -J-XX:+UseConcMarkSweepGC -J-XX:+CMSClassUnloadingEnabled -J-XX:+CMSPermGenSweepingEnabled
# -J-XX:+UseParNewGC
# (see http://wiki.netbeans.org/wiki/view/FaqGCPauses)
# Default location of JDK, can be overridden by using --jdkhome :
netbeans_jdkhome="C:\Program Files\Java\jdk1.6.0_06"
# Additional module clusters, using ${path.separator} (';' on Windows or ':' on Unix):
#netbeans_extraclusters="/absolute/path/to/cluster1:/absolute/path/to/cluster2"
# If you have some problems with detect of proxy settings, you may want to enable
# detect the proxy settings provided by JDK5 or higher.
# In such case add -J-Djava.net.useSystemProxies=true to the netbeans_default_options.

--- On Wed, 3/11/09, Paul B wrote:

From: Paul B
Subject: Out of heap space during structure parsing.
To: biojava-l at biojava.org
Date: Wednesday, March 11, 2009, 8:51 AM

Hi,

I am using BioJava 1.6.1 to parse PDB files. My machine has 2GB of RAM. I am using Netbeans 6.5 as my development environment with Java 1.6. My user-specific netbeans.conf file is attached, with a heap space of 1GB. The relevant BioJava code is below:

    try {
      pdbreader = new PDBFileReader();
      pdbreader.setPath(localFilePath);
      pdbreader.setAutoFetch(true);
      struc = pdbreader.getStructureById(pdbCode);
    ...

Using this code, I had successfully parsed smaller PDB files like 2BEG and 1Q80. Then I tried to parse a slightly larger file 1FFK and received this message on the 'struc =' line:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at org.biojava.bio.alignment.NeedlemanWunsch.pairwiseAlignment(NeedlemanWunsch.java:411)
        at org.biojava.bio.alignment.NeedlemanWunsch.getAlignment(NeedlemanWunsch.java:315)
        at org.biojava.bio.structure.io.SeqRes2AtomAligner.align(SeqRes2AtomAligner.java:220)
        at org.biojava.bio.structure.io.SeqRes2AtomAligner.align(SeqRes2AtomAligner.java:140)
        at org.biojava.bio.structure.io.PDBFileParser.triggerEndFileChecks(PDBFileParser.java:2249)
        at org.biojava.bio.structure.io.PDBFileParser.parsePDBFile(PDBFileParser.java:2155)
        at org.biojava.bio.structure.io.PDBFileParser.parsePDBFile(PDBFileParser.java:2013)
        at org.biojava.bio.structure.io.PDBFileReader.getStructureById(PDBFileReader.java:439)
        at biojavatest.PdbDemo.grabPdbFileStruc(PdbDemo.java:105)
        at biojavatest.PdbDemo.runTest(PdbDemo.java:67)
        at biojavatest.PdbDemo.main(PdbDemo.java:58)

Any suggestions? Is the problem specific to some deviation in 1FFK, or in BioJava's parser implementation?

By the way, I am using BioJava simply as a parser, and I am then dumping the data into class objects of my own design and persisting them to a SQL Server database. As such, I don't need all the ATOM information held in memory. Perhaps there is a way to lazy load that information upon request?

Is there a development version of BioJava that's downloadable and offers a more memory efficient way of grabbing data?

Thanks,

Paul

From tallpaulinjax at yahoo.com Wed Mar 11 13:17:15 2009
From: tallpaulinjax at yahoo.com (Paul B)
Date: Wed, 11 Mar 2009 06:17:15 -0700 (PDT)
Subject: [Biojava-l] Out of heap space during structure parsing.
Message-ID: <338940.12456.qm@web30705.mail.mud.yahoo.com>

I believe I have just answered my own question. Since I am just using BioJava as a parser (right now), I added the following line:

    try {
      pdbreader = new PDBFileReader();
      pdbreader.setPath(localFilePath);
      pdbreader.setAlignSeqRes(false); // added this line
      pdbreader.setAutoFetch(true);
      struc = pdbreader.getStructureById(pdbCode);

But I wonder how people handle this problem if they require sequence alignment?

Paul

From andreas.prlic at gmail.com Wed Mar 11 13:37:02 2009
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Wed, 11 Mar 2009 13:37:02 +0000
Subject: [Biojava-l] Out of heap space during structure parsing.
In-Reply-To: <343262.18611.qm@web30701.mail.mud.yahoo.com>
References: <343262.18611.qm@web30701.mail.mud.yahoo.com>
Message-ID: <59a41c430903110637sf56a0aaoa23b5203d2e28adf@mail.gmail.com>

Hi Paul,

If you want you can turn off the SEQRES to ATOM alignment part and reduce memory requirements a bit. You can do this with PDBFileParser.setAlignSeqRes(false); ...

Andreas

From tallpaulinjax at yahoo.com Wed Mar 11 15:15:10 2009
From: tallpaulinjax at yahoo.com (Paul B)
Date: Wed, 11 Mar 2009 08:15:10 -0700 (PDT)
Subject: [Biojava-l] Out of heap space during structure parsing.
Message-ID: <371079.36785.qm@web30702.mail.mud.yahoo.com>

Hi Scooter,

That seemed to fix the problem when I added back in the pdbreader.setAutoFetch(true), at least on this file! Thanks for that info, I am somewhat of a novice in Java and Netbeans. The full argument I added to 'run' was:

-Xms32m -Xmx1024m -XX:PermSize=32m -XX:MaxPermSize=96m -Xverify:none -Dapple.laf.useScreenMenuBar=true -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled -XX:+UseParNewGC

For others: I had to remove all the '-J' prefixes from each argument.

Andreas, thanks for your input as well!

Paul

--- On Wed, 3/11/09, Homer Willis wrote:

From: Homer Willis
Subject: RE: [Biojava-l] Out of heap space during structure parsing.
To: "Paul B" , biojava-l at biojava.org@yahoo.com
Date: Wednesday, March 11, 2009, 10:07 AM

Paul

The netbeans.conf file is for setting the jvm values for Netbeans which is different from running an application.
If you right click on the project->properties->run you will see an option for setting the VM options and this is where you can add settings to add more memory when you run the application.

Scooter Willis

-----Original Message-----
From: Paul B [mailto:tallpaulinjax at yahoo.com]
Sent: Wednesday, March 11, 2009 9:06 AM
To: biojava-l at biojava.org
Subject: Re: [Biojava-l] Out of heap space during structure parsing.

Sorry, I sent this earlier and then I read that attaching a file can cause spam problems. So here is my netbeans.conf file inline showing 1024 MB of RAM to be used for heap size:

# ${HOME} will be replaced by JVM user.home system property
#
netbeans_default_userdir="${HOME}/.netbeans/6.5"
# Options used by NetBeans launcher by default, can be overridden by explicit
# command line switches:
netbeans_default_options="-J-Dorg.glassfish.v3.installRoot=\"C:\Program Files\glassfish-v3-prelude\" -J-Dcom.sun.aas.installRoot=\"C:\Program Files\glassfish-v2ur2\" -J-client -J-Xverify:none -J-Xss2m -J-Xms32m -J-XX:PermSize=32m -J-XX:MaxPermSize=200m -J-Dapple.laf.useScreenMenuBar=true -J-Dsun.java2d.noddraw=true"
# Note that a default -Xmx is selected for you automatically.
# You can find this value in var/log/messages.log file in your userdir.
# The automatically selected value can be overridden by specifying -J-Xmx here
# or on the command line.
# command line switches
netbeans_default_options="-J-Xms32m -J-Xmx1024m -J-XX:PermSize=32m -J-XX:MaxPermSize=96m -J-Xverify:none -J-Dapple.laf.useScreenMenuBar=true -J-XX:+UseConcMarkSweepGC -J-XX:+CMSClassUnloadingEnabled -J-XX:+CMSPermGenSweepingEnabled -J-XX:+UseParNewGC"
# If you specify the heap size (-Xmx) explicitely, you may also want to enable
# Concurrent Mark & Sweep garbage collector. In such case add the following
# options to the netbeans_default_options:
# -J-XX:+UseConcMarkSweepGC -J-XX:+CMSClassUnloadingEnabled -J-XX:+CMSPermGenSweepingEnabled
# -J-XX:+UseParNewGC
# (see http://wiki.netbeans.org/wiki/view/FaqGCPauses)
# Default location of JDK, can be overridden by using --jdkhome :
netbeans_jdkhome="C:\Program Files\Java\jdk1.6.0_06"
# Additional module clusters, using ${path.separator} (';' on Windows or ':' on Unix):
#netbeans_extraclusters="/absolute/path/to/cluster1:/absolute/path/to/cluster2"
# If you have some problems with detect of proxy settings, you may want to enable
# detect the proxy settings provided by JDK5 or higher.
# In such case add -J-Djava.net.useSystemProxies=true to the netbeans_default_options.

--- On Wed, 3/11/09, Paul B wrote:

From: Paul B
Subject: Out of heap space during structure parsing.
To: biojava-l at biojava.org
Date: Wednesday, March 11, 2009, 8:51 AM

Hi,

I am using BioJava 1.6.1 to parse PDB files. My machine has 2GB of RAM. I am using Netbeans 6.5 as my development environment with Java 1.6. My user-specific netbeans.conf file is attached, with a heap space of 1GB. The relevant BioJava code is below:

    try {
      pdbreader = new PDBFileReader();
      pdbreader.setPath(localFilePath);
      pdbreader.setAutoFetch(true);
      struc = pdbreader.getStructureById(pdbCode);
    ...

Using this code, I had successfully parsed smaller PDB files like 2BEG and 1Q80. Then I tried to parse a slightly larger file 1FFK and received this message on the 'struc =' line:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at org.biojava.bio.alignment.NeedlemanWunsch.pairwiseAlignment(NeedlemanWunsch.java:411)
        at org.biojava.bio.alignment.NeedlemanWunsch.getAlignment(NeedlemanWunsch.java:315)
        at org.biojava.bio.structure.io.SeqRes2AtomAligner.align(SeqRes2AtomAligner.java:220)
        at org.biojava.bio.structure.io.SeqRes2AtomAligner.align(SeqRes2AtomAligner.java:140)
        at org.biojava.bio.structure.io.PDBFileParser.triggerEndFileChecks(PDBFileParser.java:2249)
        at org.biojava.bio.structure.io.PDBFileParser.parsePDBFile(PDBFileParser.java:2155)
        at org.biojava.bio.structure.io.PDBFileParser.parsePDBFile(PDBFileParser.java:2013)
        at org.biojava.bio.structure.io.PDBFileReader.getStructureById(PDBFileReader.java:439)
        at biojavatest.PdbDemo.grabPdbFileStruc(PdbDemo.java:105)
        at biojavatest.PdbDemo.runTest(PdbDemo.java:67)
        at biojavatest.PdbDemo.main(PdbDemo.java:58)

Any suggestions? Is the problem specific to some deviation in 1FFK, or in BioJava's parser implementation?

By the way, I am using BioJava simply as a parser, and I am then dumping the data into class objects of my own design and persisting them to a SQL Server database. As such, I don't need all the ATOM information held in memory. Perhaps there is a way to lazy load that information upon request?

Is there a development version of BioJava that's downloadable and offers a more memory efficient way of grabbing data?

Thanks,

Paul

From aumanga at biggjapan.com Thu Mar 12 01:15:05 2009
From: aumanga at biggjapan.com (Ashika Umanga Umagiliya)
Date: Thu, 12 Mar 2009 10:15:05 +0900
Subject: [Biojava-l] Reading meta data in PDB file?
In-Reply-To: <49B4A45A.3000102@biggjapan.com>
References: <49B4A45A.3000102@biggjapan.com>
Message-ID: <49B86219.1060501@biggjapan.com>

I assume there is no way to read this information and I have to modify the PDB parser?

Ashika Umanga Umagiliya wrote:
> Greetings all,
> I want to read the following information. I noticed that most of it is
> stored under the REMARK tag in PDB. Can I read it using the BioJava PDB parser?
>
> Thanks in Advance,
> Umanga.
> ------------------------------
>
> Unit Cell:
> a,b,c,alpha,beta,gamma values (I assume they are stored in the CRYST1 tag?)
>
> Molecular Description Asymmetric Unit:
> Polymer , Molecule , Chains
>
> Classification
>
> Source:
> Polymer,Scientific Name
>
> Ligand Chemical Component :
> Identifier , Name , Formula
>
> Diffraction Detector:
> Detector,Type,Collection Data
>
> Diffraction Radiation:
> Monochromator,Diffraction Protocol,Wavelength,Wavelength List
>

--
BiGG
TEL:03-6679-8763
FAX:03-6679-8764

From andreas.prlic at gmail.com Thu Mar 12 01:23:19 2009
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Thu, 12 Mar 2009 01:23:19 +0000
Subject: [Biojava-l] Reading meta data in PDB file?
In-Reply-To: <49B86219.1060501@biggjapan.com>
References: <49B4A45A.3000102@biggjapan.com> <49B86219.1060501@biggjapan.com>
Message-ID: <59a41c430903111823u1d3626acv69aa287decba649e@mail.gmail.com>

Hi Ashika,

actually some of the data is available via the Compound class, but for the other parts of your query that are not there you would need to add additional methods to the parser. If you have a patch, it would be great if you could post it to the list...

Andreas

From marcel.huntemann at gmail.com Thu Mar 12 03:00:38 2009
From: marcel.huntemann at gmail.com (Marcel Huntemann)
Date: Wed, 11 Mar 2009 20:00:38 -0700
Subject: [Biojava-l] Stop condition for blast parser
In-Reply-To:
References:
Message-ID: <49B87AD6.7010005@Gmail.com>

Hi Mark!

The blast etc. is parallelized. The contigs are split into groups of 1000 and I also modified my program in the way that it works now with all those separate files. But nevertheless I also have a program that works on the concatenated blast output. The parser with my customized handler is always looking for the results of a certain contig and then compares these results to something else and also does some other stuff in-between to calculate some statistics and then creates a new parser again to get the results for the next contig. So a System.exit() is not an option, since it would stop my whole program (in which I am using the parser). I also don't wanna start working with threads here. I was just hoping that there would be a way to tell the handler that, when a certain condition is met, it should give the parser a signal to stop parsing (and maybe even to reset itself to the first line). But I guess there's no way to do it in the customized handler...
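A minimal sketch of the "signal from the handler" idea, added here as an illustration and using only the JDK's standard SAX API rather than BioJava's BlastLikeSAXParser: the handler throws a dedicated SAXException subclass once the record of interest has been read completely, and the caller treats that exception as a normal stop. The element names (results, record, query, hit) and the class names are hypothetical, and whether BioJava's own parser passes such an exception through unchanged is an assumption that would need checking.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EarlyStopSax {

    // Toy input with hypothetical element names; real BLAST XML uses different ones.
    static final String SAMPLE =
        "<results>"
        + "<record><query>contig1</query><hit>geneA</hit><hit>geneB</hit></record>"
        + "<record><query>contig2</query><hit>geneC</hit></record>"
        + "</results>";

    /** Thrown by the handler only as a signal that it has seen enough. */
    static final class StopParsing extends SAXException {
        StopParsing() {
            super("wanted record complete");
        }
    }

    /** Collects the hits of one query and aborts the parse when that record ends. */
    static final class QueryHandler extends DefaultHandler {
        final List<String> hits = new ArrayList<String>();
        int elementsSeen = 0;
        private final String wanted;
        private final StringBuilder text = new StringBuilder();
        private boolean inWanted = false;

        QueryHandler(String wanted) {
            this.wanted = wanted;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elementsSeen++;
            text.setLength(0); // start collecting text for the new element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            String value = text.toString().trim();
            if (qName.equals("query")) {
                inWanted = value.equals(wanted);
            } else if (qName.equals("hit") && inWanted) {
                hits.add(value);
            } else if (qName.equals("record") && inWanted) {
                throw new StopParsing(); // the stop signal: nothing after this is read
            }
        }
    }

    /** Parses xml until the record for queryId is complete, then stops. */
    static QueryHandler find(String xml, String queryId) throws Exception {
        QueryHandler handler = new QueryHandler(queryId);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsing stop) {
            // Expected: the handler asked for the parse to end early.
        }
        return handler;
    }

    public static void main(String[] args) throws Exception {
        QueryHandler h = find(SAMPLE, "contig1");
        System.out.println(h.hits + " after " + h.elementsSeen + " elements");
    }
}
```

Because the program keeps running after the catch block, this avoids System.exit() and threads; a fresh parser and handler can be created for the next contig. It does not help with jumping back to the start of a large file, which still means re-reading it.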
Thanks,
Marcel

mark.schreiber at novartis.com wrote:
>
> Hi -
>
> There are many ways to stop the parsing but it really depends on how you
> have set the program up. Notably there is no way for the Blast parsing
> system of BioJava to shut itself down but control probably shouldn't
> happen at that level.
>
> A crude but effective procedure is to write out the results when you
> find the hit of interest and then simply call System.exit()
>
> Another approach would be to spawn Tasks to parse each record and then
> have them signal to the main thread when they are complete to shut them
> down. If you are using Java 1.4 or earlier then you would need to do
> this with Threads. If you have a later version you can use the
> concurrent packages which are much nicer to deal with.
>
> One thing I don't understand is why you don't blast each contig
> separately, in that case the results would only contain your hit of
> interest. That means 90K separate blasts but there are versions of
> blast that run on clusters and the database (3 million genes) is not
> huge so it should be an embarrassingly parallel problem?
>
> - Mark
>
> biojava-l-bounces at lists.open-bio.org wrote on 03/10/2009 03:00:36 AM:
>
>> Hi Mark!
>>
>> Mark Schreiber wrote:
>> > You could just customize BlastEcho to pass on the events of interest,
>> > ignore those that are not interesting.
>> That's what I am doing right now. But I don't know, how to tell my
>> customized BlastEcho to stop, when a certain condition is met during a
>> particular event call. What's the command for stopping there?
>>
>> > It could also exit if a certain
>> > event occurs.
>> How?
>>
>> > Remember it cost almost nothing to read the file so you
>> > save time by only sending interesting events for parsing.
>> Hmm, I am not sure, if it's really almost nothing, when I've about 90,000
>> contigs that were blasted against a database with about maybe 3,000,000
>> genes. The blast output that I am parsing is about 13Gig big and every
>> cycle I am looking for the results of one particular contig of these
>> 90,000 contigs. So I definitely experienced that the time sums up a lot,
>> when it's running in each of these 90,000 cycles over the whole file,
>> although the contig I am looking for was already at the beginning of the file.
>>
>> Cheers,
>> Marcel

From mark.schreiber at novartis.com Thu Mar 12 03:49:54 2009
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Thu, 12 Mar 2009 11:49:54 +0800
Subject: [Biojava-l] Stop condition for blast parser
In-Reply-To: <49B87AD6.7010005@Gmail.com>
Message-ID:

Hi Marcel -

One possible solution would be to customise the handler and the parser so they can talk to each other and the handler can make call backs to the parser.

However, there is a fundamental problem with the BlastLikeSAXParser. Because it is a SAX parser it is not at all suited to bouncing around the file it is parsing because SAX parsing is event based. Therefore I think you need a different paradigm. If you have lots of memory you could go with something that is more like a DOM parser and reads the whole file into memory (or uses Java NIO to pretend to) and use something like XQuery to find what you want. If you are using BLAST XML output you could also build an object tree with JAXB and navigate that.

You can also combine SAX and DOM to read memory sized chunks in one go but this can be clunky.

Note, I am assuming you will use BLAST XML. If you are not I would strongly encourage it for the task you describe. It will also make your parsers much more robust to BLAST version changes.

Sorry the standard BioJava model can't really help here but please consider posting your solution or adding it as a recipe in the cookbook as others are sure to have similar problems soon.

- Mark

_________________________

CONFIDENTIALITY NOTICE

The information contained in this e-mail message is intended only for the exclusive use of the individual or entity named above and may contain information that is privileged, confidential or exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, or the employee or agent responsible for delivery of the message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by e-mail and delete the material from any computer. Thank you.

From marcel.huntemann at gmail.com Thu Mar 12 15:40:21 2009
From: marcel.huntemann at gmail.com (Marcel Huntemann)
Date: Thu, 12 Mar 2009 08:40:21 -0700
Subject: [Biojava-l] Stop condition for blast parser
In-Reply-To:
References:
Message-ID: <49B92CE5.8060402@Gmail.com>

OK, thanks heaps 4 your help, Mark!

From pwrose at ucsd.edu Thu Mar 12 16:52:29 2009
From: pwrose at ucsd.edu (Peter Rose)
Date: Thu, 12 Mar 2009 09:52:29 -0700
Subject: [Biojava-l] Java Developer and Scientific Software Developer Jobs at Protein Data Bank, UCSD
Message-ID: <000001c9a332$eeb3c1a0$cc1b44e0$@edu>

The PDB has openings for Java Developers and Scientific Software Developers at the University of California San Diego.

http://www.pdb.org/pdb/static.do?p=general_information/about_pdb/contact/job_listings.html

-Peter Rose

From andreas.prlic at gmail.com Fri Mar 13 14:48:25 2009
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Fri, 13 Mar 2009 07:48:25 -0700
Subject: [Biojava-l] Reading meta data in PDB file?
In-Reply-To:
References: <49B4A45A.3000102@biggjapan.com> <49B86219.1060501@biggjapan.com> <59a41c430903111823u1d3626acv69aa287decba649e@mail.gmail.com>
Message-ID: <59a41c430903130748l29708cao1bbbe6289377fe5c@mail.gmail.com>

if you mean residues that have been mentioned in the REMARK section, then those won't get parsed at present.
but you can have a look at the following page to see how it deals with SEQRES and ATOM records:

http://biojava.org/wiki/BioJava:CookBook:PDB:seqres

Andreas

On Thu, Mar 12, 2009 at 8:50 AM, Anant Jain wrote:
> Greetings,
>
> I also want to retrieve the positions of missing residues, but I can't find
> any method for it. Should I try a regex in Java for this problem?

From hlapp at gmx.net Sat Mar 14 22:59:44 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 14 Mar 2009 18:59:44 -0400
Subject: [Biojava-l] Google Summer of Code: application submitted, action needed
In-Reply-To:
References: <1F570555-12DF-42DF-8D0E-95AAE298D76A@gmx.net>
Message-ID: <85E9A7D7-97E5-4D46-BA8C-C37E557BEBF3@gmx.net>

Hi all,

I submitted the application yesterday for O|B|F participating in the 2009 Google Summer of Code as a mentoring organization. The application is at

http://docs.google.com/Doc?id=dhs98hzv_7zn8bxqjm

and is also linked to from the ideas page at

http://open-bio.org/wiki/Google_Summer_of_Code_2009

Now keep your fingers crossed, Google is slated to announce acceptances on March 18.

This is the last cross-project message re: Summer of Code that addresses mentors and our projects; future messages that I'll post across projects will be primarily for students such as announcing whether we are accepted or not and issuing calls for application.

**What we need most and right now is action from our projects' developers and from possible mentors.**

Google admins will start reviewing organization applications on Monday. The ideas page has 6 project ideas right now - though the ideas are good ones, the quantity won't be particularly impressive to Google. Therefore, if you have an idea for a summer project for a student please use the C& template (it is commented out now but you'll see it when you pull the Ideas section into the editor) and put it up there ASAP. If you're not sure yet who'll mentor, put tentative names there. We don't need a full commitment from mentors until the student application period starts (March 23).

Next, for all projects, the leads and/or volunteers should check the reference information for their project:

http://open-bio.org/wiki/Google_Summer_of_Code_2009#Open-Bio_projects_involved

I just culled these links from the various project websites - it'd be much appreciated if going forward everyone can lend a hand in this. Please review what's there and add or fix as you see fit. *These links must be correct and complete - otherwise potential students may not find you.*

Finally, all prospective mentors, primary or secondary, committed or not, and anyone else who would like to volunteer to help out, should subscribe themselves ASAP to the mailing list for communicating GSoC-related administrivia:

http://lists.open-bio.org/mailman/listinfo/gsoc

I will *not* cross-post all administrative announcements or requests for information, and so you *will* miss information if you don't subscribe yourself there. (Note: students will be subscribed there only *after* acceptance).

Those who are considering to mentor, primary or helping out, please also add yourselves to the Mentors section on the Ideas page (and check your link if you're already there):

http://open-bio.org/wiki/Google_Summer_of_Code_2009#Mentors

Cheers everyone, and fingers crossed!

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Wed Mar 18 18:45:50 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 18 Mar 2009 14:45:50 -0400
Subject: [Biojava-l] OBF application for Summer of Code has been rejected
Message-ID: <44D1FAFD-B5D7-418B-9FDA-6945219A5481@gmx.net>

I hope to find out later why, but our Google Summer of Code application as an umbrella org has been rejected.

However, NESCent has been accepted. If you can give your project idea a phylogenetics/phyloinformatics focus, go and put it up on the NESCent ideas page at

http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009

Do so pretty much **now** - we will start broadcasting and reaching out to students tonight and tomorrow. If someone comes to the site and they don't see a Bio* project that they would have been interested in, they may not check back for updates.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From Stephan.Neumann at gmx.de Fri Mar 20 11:32:35 2009
From: Stephan.Neumann at gmx.de (Stephan Neumann)
Date: Fri, 20 Mar 2009 12:32:35 +0100
Subject: [Biojava-l] Problem with SingleDP.viterbi(..)
Message-ID: <49C37ED3.9020606@gmx.de>

Hello,

we have a question concerning the Viterbi algorithm in BioJava: We are building Profile HMMs and performing BaumWelch algorithm to improve them. Finally we are performing the Viterbi algorithm on the resulting dynamic programming matrix (SingleDP).
Now, we are wondering what the difference is between the different ScoreTypes (ODDS, PROBABILITY, NULL_MODEL) of SingleDP.viterbi and how we can interpret the resulting StatePaths. Furthermore, what does StatePath.getScore mean?

Best regards,
Stephan.

From jp at javaclass.co.uk Tue Mar 24 12:51:49 2009
From: jp at javaclass.co.uk (JP)
Date: Tue, 24 Mar 2009 12:51:49 +0000
Subject: [Biojava-l] fjoin algorithm implementation in Java (submission to BioJava)
Message-ID: <4adc29060903240551l749650e3l2254356c126d9bdc@mail.gmail.com>

Hi there at BioJava,

I have implemented an algorithm in Java for efficient computation of feature overlap (fjoin algorithm: described in http://www.liebertonline.com/doi/abs/10.1089/cmb.2006.13.1457 and the actual paper here http://www.liebertonline.com/doi/pdf/10.1089/cmb.2006.13.1457) as part of an MSc project in Bioinformatics at Imperial College (London).

I have looked at biojava to see if an implementation of this algorithm already exists and it didn't (a simple search for "fjoin" returns no results), so after I implemented it I thought I'd submit it and maybe spare a few hours of development to someone else (or, in the true open source spirit, someone might find a bug or improve this).

Is this possible and how do I go about moving this forward? (I did have a read through the website, and couldn't find anything related to submission.)

Any pointers would be greatly appreciated.

Keep up the Good Work,

Jean-Paul Ebejer, Malta.

From charles at imbusch.net Tue Mar 24 18:23:15 2009
From: charles at imbusch.net (Charles Imbusch)
Date: Tue, 24 Mar 2009 19:23:15 +0100
Subject: [Biojava-l] NoClassDefFoundError
Message-ID: <49C92513.3040902@imbusch.net>

Hello,

I'm using Netbeans and Biojava on my Laptop. Netbeans creates a jar file for my project and I can successfully execute it.

Now I would like to execute the java code on another machine. I installed Biojava on that machine as well and copied the jar file to it.
But I get an error message like this:

charles at nougat:~$ java -jar MPI.jar
Exception in thread "main" java.lang.NoClassDefFoundError: org/biojava/bio/BioException

I checked my installation (on nougat) and tried to compile one of the example files provided in the demo folder of the Biojava distribution. The compilation worked. Now I'm really wondering what can cause that error. Any answer is appreciated. Cheers, Charles From ahmed.elmasri at gmail.com Tue Mar 24 17:58:20 2009 From: ahmed.elmasri at gmail.com (Hamed, Ahmed A.) Date: Tue, 24 Mar 2009 13:58:20 -0400 Subject: [Biojava-l] NoClassDefFoundError In-Reply-To: <49C92513.3040902@imbusch.net> References: <49C92513.3040902@imbusch.net> Message-ID: <5cdd31570903241058s5d02c3f5k7d3db253bb70c3f6@mail.gmail.com> Hi Charles, This is a common Java error when not all the required jars are on your class-path. The problem with Netbeans or Eclipse is that they set the class-path internally, but once you need to run the program on another machine you have to create and manage your own class-path. In Eclipse (not sure about NetBeans) you can export your project as a JAR and specify a MANIFEST.MF file. Make sure that this file exists, that the project libraries required on your laptop are copied to the new machine, and that the paths within the MANIFEST.MF file point to them correctly. Here is what my MANIFEST.MF file looked like when I migrated a BioJava project to another machine:

Manifest-Version: 1.0
Class-Path: /home/ahamed/I529Fall2009/HW1/bytecode.jar /home/ahamed/I529Fall2009/HW1/biojava.jar
Main-Class: mlbio.genes.dna.CodonTableCalculator

Hope that fixes your problem. Best wishes, Ahmed On Tue, Mar 24, 2009 at 2:23 PM, Charles Imbusch wrote: > Hello, > > I'm using Netbeans and Biojava on my Laptop. Netbeans creates a jar file > for my project and I can successfully execute it. > > Now I would like to execute the java code on another machine.
I installed > Biojava on > that machine as well and copied the jar file to it. > > But I get an error message like this: > > charles at nougat:~$ java -jar MPI.jar > Exception in thread "main" java.lang.NoClassDefFoundError: > org/biojava/bio/BioException > > I checked my installation (on nougat ) and tried to compile one of the > example files provided in the > demo folder of the Biojava distribution. The compilation worked out. > > No I'm really wondering what can cause that error. > > Any answer is appreciated. > > Cheers, > Charles > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- Ahmed Abdeen Hamed Scientific Informatics Project Leader Marine Biological Laboratory Woods Hole, MA -- Ph.D. student, Complex Systems School of Informatics, Indiana University From holland at eaglegenomics.com Tue Mar 24 19:12:33 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 24 Mar 2009 19:12:33 +0000 Subject: [Biojava-l] NoClassDefFoundError In-Reply-To: <49C92513.3040902@imbusch.net> References: <49C92513.3040902@imbusch.net> Message-ID: <49C930A1.9000202@eaglegenomics.com> Depends what you mean by 'installed biojava'. Have you tried running this command: java -jar MPI.jar -cp /path/to/biojava.jar If that works, then the problem will be that your biojava.jar is not installed in the correct location for java to pick it up system-wide. Compiling the demos does not rely on having jars installed as it picks everything up from inside the biojava source distribution. cheers, Richard Charles Imbusch wrote: > Hello, > > I'm using Netbeans and Biojava on my Laptop. Netbeans creates a jar file > for my project and I can successfully execute it. > > Now I would like to execute the java code on another machine. I > installed Biojava on > that machine as well and copied the jar file to it. 
> > But I get an error message like this: > > charles at nougat:~$ java -jar MPI.jar > Exception in thread "main" java.lang.NoClassDefFoundError: > org/biojava/bio/BioException > > I checked my installation (on nougat ) and tried to compile one of the > example files provided in the > demo folder of the Biojava distribution. The compilation worked out. > > No I'm really wondering what can cause that error. > > Any answer is appreciated. > > Cheers, > Charles > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From ahmed.elmasri at gmail.com Tue Mar 24 20:53:24 2009 From: ahmed.elmasri at gmail.com (Hamed, Ahmed A.) Date: Tue, 24 Mar 2009 16:53:24 -0400 Subject: [Biojava-l] NoClassDefFoundError In-Reply-To: <49C930A1.9000202@eaglegenomics.com> References: <49C92513.3040902@imbusch.net> <49C930A1.9000202@eaglegenomics.com> Message-ID: <5cdd31570903241353g3101620cjb9544c30c6954c06@mail.gmail.com> Dear Richard, I know you are answering Charles's question below. However, I thought it might be worthwhile mentioning that Java ignores the classpath argument if you are running a jar i.e. using "-jar" option. Best wishes, Ahmed On Tue, Mar 24, 2009 at 3:12 PM, Richard Holland wrote: > Depends what you mean by 'installed biojava'. > > Have you tried running this command: > > java -jar MPI.jar -cp /path/to/biojava.jar > > If that works, then the problem will be that your biojava.jar is not > installed in the correct location for java to pick it up system-wide. > > Compiling the demos does not rely on having jars installed as it picks > everything up from inside the biojava source distribution. > > cheers, > Richard > > Charles Imbusch wrote: > > Hello, > > > > I'm using Netbeans and Biojava on my Laptop. 
Netbeans creates a jar file > > for my project and I can successfully execute it. > > > > Now I would like to execute the java code on another machine. I > > installed Biojava on > > that machine as well and copied the jar file to it. > > > > But I get an error message like this: > > > > charles at nougat:~$ java -jar MPI.jar > > Exception in thread "main" java.lang.NoClassDefFoundError: > > org/biojava/bio/BioException > > > > I checked my installation (on nougat ) and tried to compile one of the > > example files provided in the > > demo folder of the Biojava distribution. The compilation worked out. > > > > No I'm really wondering what can cause that error. > > > > Any answer is appreciated. > > > > Cheers, > > Charles > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > -- > Richard Holland, BSc MBCS > Finance Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- Ahmed Abdeen Hamed Scientific Informatics Project Leader Marine Biological Laboratory Woods Hole, MA -- Ph.D. student, Complex Systems School of Informatics, Indiana University From mark.schreiber at novartis.com Wed Mar 25 04:03:31 2009 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Wed, 25 Mar 2009 12:03:31 +0800 Subject: [Biojava-l] fjoin algorithm implementation in Java (submission to BioJava) In-Reply-To: <4adc29060903240551l749650e3l2254356c126d9bdc@mail.gmail.com> Message-ID: Hi - If you wish to contribute the code you could give it to someone with a development account (Andreas Prlic might be the best option).
Alternatively if you think you will be regularly contributing you could apply for a development account. Because fjoin works on features and locations and as BioJava already has a detailed location model it would be good if your code makes use of the BioJava location/ feature API (at least in the public interface). - Mark biojava-l-bounces at lists.open-bio.org wrote on 03/24/2009 08:51:49 PM: > Hi there at BioJava, > > I have implemented an algorithm in Java for efficient computation of feature > overlap (fjoin algorithm: described in > http://www.liebertonline.com/doi/abs/10.1089/cmb.2006.13.1457 and the actual > paper here http://www.liebertonline.com/doi/pdf/10.1089/cmb.2006.13.1457) as > part of an MSc project in Bioinformatics at Imperial College (London). > > I have looked at biojava to see if the implementation of this algorithm > already exists and it didn't (a simple search for "fjoin" returns no > results) so after I implemented it I thought I'd submit it and maybe spare a > few hours development to someone else (or in the true open source spirit > someone might find a bug or improve this). > > Is this possible and how do I go about moving this forward (I did have a > read through the website, and could find anythign related to submission). > > Any pointers would be greatly appreciated. > > Keep up the Good Work, > Jean-Paul Ebejer, Malta. > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l _________________________ CONFIDENTIALITY NOTICE The information contained in this e-mail message is intended only for the exclusive use of the individual or entity named above and may contain information that is privileged, confidential or exempt from disclosure under applicable law. 
If the reader of this message is not the intended recipient, or the employee or agent responsible for delivery of the message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by e-mail and delete the material from any computer. Thank you. From holland at eaglegenomics.com Wed Mar 25 08:44:50 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Wed, 25 Mar 2009 08:44:50 +0000 Subject: [Biojava-l] NoClassDefFoundError In-Reply-To: <5cdd31570903241353g3101620cjb9544c30c6954c06@mail.gmail.com> References: <49C92513.3040902@imbusch.net> <49C930A1.9000202@eaglegenomics.com> <5cdd31570903241353g3101620cjb9544c30c6954c06@mail.gmail.com> Message-ID: <49C9EF02.6020607@eaglegenomics.com> Good point... *doh* sometimes we all forget things! :) Hamed, Ahmed A. wrote: > Dear Richard, > I know you are answering Charles's question below. However, I thought it > might be worthwhile mentioning that Java ignores the classpath argument > if you are running a jar i.e. using "-jar" option. > Best wishes, > Ahmed > > On Tue, Mar 24, 2009 at 3:12 PM, Richard Holland > > wrote: > > Depends what you mean by 'installed biojava'. > > Have you tried running this command: > > java -jar MPI.jar -cp /path/to/biojava.jar > > If that works, then the problem will be that your biojava.jar is not > installed in the correct location for java to pick it up system-wide. > > Compiling the demos does not rely on having jars installed as it picks > everything up from inside the biojava source distribution. > > cheers, > Richard > > Charles Imbusch wrote: > > Hello, > > > > I'm using Netbeans and Biojava on my Laptop. Netbeans creates a > jar file > > for my project and I can successfully execute it. > > > > Now I would like to execute the java code on another machine. 
I > > installed Biojava on > > that machine as well and copied the jar file to it. > > > > But I get an error message like this: > > > > charles at nougat:~$ java -jar MPI.jar > > Exception in thread "main" java.lang.NoClassDefFoundError: > > org/biojava/bio/BioException > > > > I checked my installation (on nougat ) and tried to compile one of the > > example files provided in the > > demo folder of the Biojava distribution. The compilation worked out. > > > > No I'm really wondering what can cause that error. > > > > Any answer is appreciated. > > > > Cheers, > > Charles > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > -- > Richard Holland, BSc MBCS > Finance Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > > http://www.eaglegenomics.com/ > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > > -- > Ahmed Abdeen Hamed > Scientific Informatics Project Leader > Marine Biological Laboratory Woods Hole, MA > -- > Ph.D. student, Complex Systems > School of Informatics, Indiana University > > > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From nir at rosettadesigngroup.com Wed Mar 25 16:18:24 2009 From: nir at rosettadesigngroup.com (Nir London) Date: Wed, 25 Mar 2009 18:18:24 +0200 Subject: [Biojava-l] Rosetta Academic Training Webinar Message-ID: <88F0F36A-FC4D-4A9C-AC31-5B883C3F92CB@rosettadesigngroup.com> The Rosetta Design Group is proud to present the first webinar in the Rosetta Academic Workshop Series. For the first webinar, we have selected to focus on Protein-Protein Docking based on the answers to the interest poll. 
We hope this will be the first in a line of helpful and inspiring webinars to kick off our Rosetta Academic Workshop Series. What: Protein-Protein Docking When: May 4th 2009, 0800-1000 AM EST Where: Your office! More details and registration: http://rosettadesigngroup.com/RDGLS/index.php?sid=54479&lang=en Please note: This is not a promotional webinar. Rosetta is open-source and freeware for academic and non-profit organizations and can be downloaded from University of Washington's TechTransfer Digital Ventures. The majority of the webinar is concerned with Rosetta 2.3.0. Rosetta 3.0 is still a beta version. Hope to see you there, Nir London. Rosetta Design Group | http://rosettadesigngroup.com/ From charles at imbusch.net Fri Mar 27 19:33:39 2009 From: charles at imbusch.net (Charles Imbusch) Date: Fri, 27 Mar 2009 20:33:39 +0100 Subject: [Biojava-l] NoClassDefFoundError In-Reply-To: <49C921C6.7010202@umn.edu> References: <49C92513.3040902@imbusch.net> <49C921C6.7010202@umn.edu> Message-ID: <49CD2A13.4010101@imbusch.net> Hello, thanks everybody for the replies! In fact I did not copy the dist directory to the other machine, so the libraries needed were not found. Now I'm almost there. When I start the program, I get this message: charles at nougat:~/dist$ java -jar MPI.jar Exception in thread "main" java.lang.UnsupportedClassVersionError: Bad version number in .class file [...] Google has a lot of information on this. Probably the Java version on the machine is too old. I think I can handle this. Cheers, Charles From andreas at sdsc.edu Fri Mar 27 20:09:53 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Fri, 27 Mar 2009 13:09:53 -0700 Subject: [Biojava-l] biojava @ ISMB and BOSC Message-ID: <59a41c430903271309i5bbc8d3cm27204f55eca2b45f@mail.gmail.com> Hi I would like to organize a BioJava user meeting around the upcoming ISMB conference in July. Anybody interested?
Related to this I would also like to submit an abstract for the BOSC - satellite meeting there. I started a wiki page in preparation for this: http://biojava.org/wiki/BOSC2009_Presentation Andreas From SMarkel at accelrys.com Fri Mar 27 20:38:03 2009 From: SMarkel at accelrys.com (Scott Markel) Date: Fri, 27 Mar 2009 16:38:03 -0400 Subject: [Biojava-l] biojava @ ISMB and BOSC In-Reply-To: <59a41c430903271309i5bbc8d3cm27204f55eca2b45f@mail.gmail.com> References: <59a41c430903271309i5bbc8d3cm27204f55eca2b45f@mail.gmail.com> Message-ID: <1F1240778FB0AF46B4E5A72C44D2C74729A92D92@exch1-hi.accelrys.net> Andreas, I plan on being at BOSC and ISMB. I would attend a BioJava user meeting, depending, of course, on what else is scheduled at the same time. Scott Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys (SciTegic R&D) mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com http://www.linkedin.com/in/smarkel Vice President, Board of Directors: International Society for Computational Biology Co-chair: ISCB Publications Committee Associate Editor: PLoS Computational Biology Editorial Board: Briefings in Bioinformatics > -----Original Message----- > From: biojava-l-bounces at lists.open-bio.org [mailto:biojava-l- > bounces at lists.open-bio.org] On Behalf Of Andreas Prlic > Sent: Friday, 27 March 2009 1:10 PM > To: biojava-l at biojava.org > Subject: [Biojava-l] biojava @ ISMB and BOSC > > Hi > > I would like to organize a BioJava user meeting around the upcoming > ISMB conference in July. Anybody interested? > > Related to this I would also like to submit an abstract for the BOSC - > satellite meeting there. 
I started a wiki page in preparation for > this: > > http://biojava.org/wiki/BOSC2009_Presentation > > Andreas > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From holland at eaglegenomics.com Sun Mar 29 20:16:25 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Sun, 29 Mar 2009 21:16:25 +0100 Subject: [Biojava-l] NoClassDefFoundError In-Reply-To: <49CD2A13.4010101@imbusch.net> References: <49C92513.3040902@imbusch.net> <49C921C6.7010202@umn.edu> <49CD2A13.4010101@imbusch.net> Message-ID: <49CFD719.3020008@eaglegenomics.com> It means that the version of the Java compiler (or the compatibility flag used on the compiler) that was used to compile the BioJava JARs you are using is newer than the version of Java available on the machine you are trying to run the program on! You need to either upgrade Java on the target machine, or compile BioJava from source on the target machine. Charles Imbusch wrote: > Hello, > > thanks everybody for the replies! In fact I did not copy the dist > directory to the other machine, > so the libraries needed were not found. > > Now, I'm almost there, when I want to start the program, I get this > message: > > charles at nougat:~/dist$ java -jar MPI.jar > Exception in thread "main" java.lang.UnsupportedClassVersionError: Bad > version number in .class file > [...] > > Google has a a lot of information to this. Probably the java version on > the machine is to old. > I think I can handle this. 
> > Cheers, > Charles > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From holland at eaglegenomics.com Sun Mar 29 20:16:55 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Sun, 29 Mar 2009 21:16:55 +0100 Subject: [Biojava-l] biojava @ ISMB and BOSC In-Reply-To: <1F1240778FB0AF46B4E5A72C44D2C74729A92D92@exch1-hi.accelrys.net> References: <59a41c430903271309i5bbc8d3cm27204f55eca2b45f@mail.gmail.com> <1F1240778FB0AF46B4E5A72C44D2C74729A92D92@exch1-hi.accelrys.net> Message-ID: <49CFD737.3000102@eaglegenomics.com> I will be there too, for BOSC only, so will be happy to attend a meeting. Scott Markel wrote: > Andreas, > > I plan on being at BOSC and ISMB. I would attend a BioJava user > meeting, depending, of course, on what else is scheduled at the > same time. > > Scott > > Scott Markel, Ph.D. 
> Principal Bioinformatics Architect email: smarkel at accelrys.com > Accelrys (SciTegic R&D) mobile: +1 858 205 3653 > 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 > San Diego, CA 92121 fax: +1 858 799 5222 > USA web: http://www.accelrys.com > > http://www.linkedin.com/in/smarkel > Vice President, Board of Directors: > International Society for Computational Biology > Co-chair: ISCB Publications Committee > Associate Editor: PLoS Computational Biology > Editorial Board: Briefings in Bioinformatics > > >> -----Original Message----- >> From: biojava-l-bounces at lists.open-bio.org [mailto:biojava-l- >> bounces at lists.open-bio.org] On Behalf Of Andreas Prlic >> Sent: Friday, 27 March 2009 1:10 PM >> To: biojava-l at biojava.org >> Subject: [Biojava-l] biojava @ ISMB and BOSC >> >> Hi >> >> I would like to organize a BioJava user meeting around the upcoming >> ISMB conference in July. Anybody interested? >> >> Related to this I would also like to submit an abstract for the BOSC - >> satellite meeting there. I started a wiki page in preparation for >> this: >> >> http://biojava.org/wiki/BOSC2009_Presentation >> >> Andreas >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From tallpaulinjax at yahoo.com Wed Mar 11 12:58:27 2009 From: tallpaulinjax at yahoo.com (Paul B) Date: Wed, 11 Mar 2009 12:58:27 -0000 Subject: [Biojava-l] Out of heap space during structure parsing. Message-ID: <585690.93512.qm@web30705.mail.mud.yahoo.com> Hi, I am using BioJava 1.6.1 to parse PDB files. My machine has 2GB of RAM.
I am using Netbeans 6.5 as my development environment with Java 1.6. My user-specific netbeans.conf file is attached, with a heap space of 1GB. The relevant BioJava code is below:

    try {
      pdbreader = new PDBFileReader();
      pdbreader.setPath(localFilePath);
      pdbreader.setAutoFetch(true);
      struc = pdbreader.getStructureById(pdbCode);
    ...

Using this code, I had successfully parsed smaller PDB files like 2BEG and 1Q80. Then I tried to parse a slightly larger file 1FFK and received this message on the 'struc =' line:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at org.biojava.bio.alignment.NeedlemanWunsch.pairwiseAlignment(NeedlemanWunsch.java:411)
        at org.biojava.bio.alignment.NeedlemanWunsch.getAlignment(NeedlemanWunsch.java:315)
        at org.biojava.bio.structure.io.SeqRes2AtomAligner.align(SeqRes2AtomAligner.java:220)
        at org.biojava.bio.structure.io.SeqRes2AtomAligner.align(SeqRes2AtomAligner.java:140)
        at org.biojava.bio.structure.io.PDBFileParser.triggerEndFileChecks(PDBFileParser.java:2249)
        at org.biojava.bio.structure.io.PDBFileParser.parsePDBFile(PDBFileParser.java:2155)
        at org.biojava.bio.structure.io.PDBFileParser.parsePDBFile(PDBFileParser.java:2013)
        at org.biojava.bio.structure.io.PDBFileReader.getStructureById(PDBFileReader.java:439)
        at biojavatest.PdbDemo.grabPdbFileStruc(PdbDemo.java:105)
        at biojavatest.PdbDemo.runTest(PdbDemo.java:67)
        at biojavatest.PdbDemo.main(PdbDemo.java:58)

Any suggestions? Is the problem specific to some deviation in 1FFK, or in BioJava's parser implementation?

By the way, I am using BioJava simply as a parser, and I am then dumping the data into class objects of my own design and persisting them to a SQL Server database. As such, I don't need all the ATOM information held in memory. Perhaps there is a way to lazy load that information upon request?
Is there a development version of BioJava that's downloadable and offers a more memory efficient way of grabbing data?

Thanks,

Paul

-------------- next part -------------- A non-text attachment was scrubbed... Name: netbeans.conf Type: application/octet-stream Size: 1965 bytes Desc: not available URL:

From tallpaulinjax at yahoo.com Mon Mar 23 13:13:12 2009 From: tallpaulinjax at yahoo.com (tallpaulinjax at yahoo.com) Date: Mon, 23 Mar 2009 13:13:12 -0000 Subject: [Biojava-l] For BioJava List: Possible solution to Hibernate and slow Atom storage. Message-ID: <395990.73913.qm@web30705.mail.mud.yahoo.com>

Hi Andreas,

I saw this post on slow Atom storage using BioJava and Hibernate: http://www.biojava.org/wiki/BioJava:CookBook:PDB:hibernate
I may have an acceptable work-around. I had the same problem, but have 'flattened' the object structure I am using with Hibernate and got a 20x to 25x performance improvement. The problem was the number of objects being held in memory as the file was parsed, similar to how BioJava can run out of heap space. In my design, a PdbMeta record can have hundreds or more of ModelChainResidue objects, and each ModelChainResidue object could have dozens of AtomNorm records. Just to load a 300kb PDB file into BioJava then into my database could take 25 minutes! And I have 4,000 of these files to load!

So what I did is explained in this post: http://forum.hibernate.org/viewtopic.php?p=2409385#2409385

Basically, per PdbMeta row I only kept around the currently needed ModelChainResidue object and AtomNorm object, and garbage collected any others.
This meant I couldn't use one big session/transaction and had to split the work into separate transactions, but loading into my database became 25x faster (Hibernate doesn't support nested transactions, and I couldn't see a way within a transaction to remove an object without subsequently deleting it from the database). My plan is to include a Boolean 'useFastLoad' parameter in the method calls, which will turn this feature on and off as needed. With 4,000 PDB files to load, each one taking a minute or so on average to download, parse into BioJava, and then dump to my database (on my laptop for testing, moving to my server soon), 1 minute per file will still take almost 3 days.

Perhaps BioJava could use the same strategy with Hibernate?

Paul

PS: Here is some background on what I am using BioJava for:

I am using BioJava to parse PDB files, then converting from BioJava's object structure to one more specific to my needs. I am working with two chemists at The University Of North Florida on this project, which supports my Masters in C.S. thesis. I have attached a preliminary schema (hopefully it won't inflame the SPAM filter :-) ). I just added the PeriodicTable table last night and have to adjust the AtomNorm and AtomDenorm tables accordingly. Basically, the schema is as follows:

1. We imported a high-level list of over 4,000 "representative sample" PDB files into the RepresentativeSample table. This will be used as part of the basis to start filling the PdbMeta, ModelChainResidue, and AtomNorm tables.
2. There may be more of these representative sample lists in the future, so each batch of imports has an entry in RepresentativeSampleMeta.
3. Each PdbMeta entry is unique by PDB Code, DepositionDate, and ModificationDate.
4. A PdbMeta entry can have 0 or more child ModelChainResidue records (usually hundreds).
5. A ModelChainResidue record can have dozens of child AtomNorm records.
6.
For data mining purposes and join improvements, pertinent info from PdbMeta, ModelChainResidue, and AtomNorm are dumped into AtomDenorm.
7. The "Lkp" tables are merely static 'helper' tables whose number of records and field entries are expected to remain static.
8. The Error table is where any errors found in the data are dumped by the Java program, by table name and then primary key within that table.
9. MethylDonatedHydrogen table: one of the key areas of interest for the UNF Chemists, including Dr. Robert Vergenz.

(BTW, I can't figure out how to get the RFactor out of BioJava, and apparently BioJava is removing 'Unknown amino acids' before I have a chance to parse them and add them to my Error table as well... solutions to both those problems? Does BioJava somewhere have a "hasErrors" field based on parsing?)

-------------- next part -------------- A non-text attachment was scrubbed... Name: OverviewSchema.pdf Type: application/pdf Size: 89251 bytes Desc: not available URL:
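
The fast-load strategy described in the message above (hold only the records currently needed, and write them out in several small units of work rather than one large transaction) can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not BioJava or Hibernate code: BatchedLoader, its batch size and the sink callback are hypothetical names introduced here. In a Hibernate setting the sink would presumably be where a transaction is committed and the session cleared, but how the original code is organised is not shown in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not part of BioJava or Hibernate): buffers records
 * and hands them to a sink in fixed-size batches, then drops its own
 * references so the records can be garbage collected.
 */
class BatchedLoader<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> buffer = new ArrayList<>();
    private int batchesWritten = 0;

    BatchedLoader(int batchSize, Consumer<List<T>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Adds one record and flushes automatically once the batch is full. */
    void add(T record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Passes the buffered records to the sink and forgets them. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // the sink receives its own copy
        buffer.clear();                       // release references held here
        batchesWritten++;
    }

    int pending() {
        return buffer.size();
    }

    int batchesWritten() {
        return batchesWritten;
    }

    public static void main(String[] args) {
        List<Integer> sizes = new ArrayList<>();
        // Assumed stand-in for "commit one small transaction": record the batch size.
        BatchedLoader<String> loader =
                new BatchedLoader<>(3, batch -> sizes.add(batch.size()));
        for (int i = 1; i <= 7; i++) {
            loader.add("atom-" + i);
        }
        loader.flush(); // write the final, partially filled batch
        System.out.println(sizes); // prints [3, 3, 1]
    }
}
```

The batch size trades memory for the number of round trips: a smaller batch keeps fewer objects alive at once, at the cost of more separate commits.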