From mark.schreiber at novartis.com Thu Dec 1 00:34:34 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Thu Dec 1 00:32:20 2005
Subject: [Biojava-l] BaumWelchTrainer Broken??!!! (please help)
Message-ID:

As a possible workaround until this issue is resolved, the BaumWelchSampler can be substituted for the BaumWelchTrainer. Although they are not technically equivalent, they are similar.

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax +65 6722 2910

From escobarebio at yahoo.com Thu Dec 1 01:53:34 2005
From: escobarebio at yahoo.com (D.Enrique ESCOBAR ESPINOZA)
Date: Thu Dec 1 01:58:00 2005
Subject: [Biojava-l] cvs download
Message-ID: <20051201065335.14439.qmail@web30504.mail.mud.yahoo.com>

I put the files:
* bytecode-0.92.jar
* commons-cli.jar
* commons-collections-2.1.jar
* commons-dbcp-1.1.jar
* commons-pool-1.1.jar
in my biojava directory: C:\biojava

I set my classpath:
set CLASSPATH C:\biojava\biojava.jar;C:\biojava\bytecode-0.92.jar;
C:\biojava\commons-cli.jar;C:\biojava\commons-collections-2.1.jar;
C:\biojava\commons-dbcp-1.1.jar; C:\biojava\commons-pool-1.1.jar;.

I've done:
  C:
  cd biojava/
  cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/bioperl login
(when prompted, the password is 'cvs')
  cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/bioperl checkout bioperl-live

I checked the new folders created: a folder named bioperl-live/ has been created.

Then I did:
$ cvs update
cvs [update aborted]: C:/MYCVSROOT/CVSROOT: No such file or directory

WHAT IS MY CVSROOT DIRECTORY SUPPOSED TO BE?
HOW AM I SUPPOSED TO SET MY CLASSPATH?
WHAT DO I DO FOR OTHER MODULES LIKE BIOJAVA-LIMS?

thanks
--------------------------------------------------
D.Enrique ESCOBAR ESPINOZA (B.Sc.)
http://adn.bioinfo.uqam.ca/~escd07097301/
http://spaces.msn.com/members/escobarebio/
ICQ#: 201778618
-------------------------------------------------
Tel: (514) 523-8398
Montreal QC Canada

__________________________________
Yahoo! Mail - PC Magazine Editors' Choice 2005
http://mail.yahoo.com

From mark.schreiber at novartis.com Thu Dec 1 02:23:22 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Thu Dec 1 02:21:10 2005
Subject: [Biojava-l] cvs download
Message-ID:

Hello -

To get biojava, I would suggest you do this:

cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/biojava checkout biojava-live

To get biojava-lims (although I don't think this project is still active):

cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/biojava checkout biojava-lims

To update, navigate to wherever you checked out biojava-live and do this:

cvs update -Pd

The -P will remove any empty directories in the CVS tree (there are a few, so this is highly recommended). The -d will create any directories that exist in the repository but are missing from your working copy. Because you run the update from inside the checked-out directory, CVS should read the repository location from the CVS/Root file there, so you do not need to set CVSROOT yourself.

To set your classpath, see http://www.biojava.org/started.html, although it looks like you have already done that successfully.

- Mark

_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l

From ap3 at sanger.ac.uk Thu Dec 1 03:54:41 2005
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Thu Dec 1 03:51:27 2005
Subject: [Biojava-l] modify structure
In-Reply-To:
References:
Message-ID: <1706112d2cfe772f3501821995576ead@sanger.ac.uk>

Hi Tamas,

it is possible to access the contents of a structure and change/add/drop groups and atoms as you wish.

When talking about introducing "point mutations", I assume you want to re-label a residue's main-chain atoms and drop the side-chain atoms, but keep the Cb one? This would take only a few lines to implement.

Cheers,
Andreas

On 30 Nov 2005, at 16:21, Tamas Horvath wrote:

> Is there any way to modify a protein structure by modifying the
> contents of the Structure object? In short, I have a Structure object,
> parsed from a pdb file, and I want to introduce point mutations to it,
> and save the modified structure to a pdb file for further analysis...
> (I intend to use gromacs for instance, if it matters)...

-----------------------------------------------------------------------
Andreas Prlic Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891

From christoph.gille at charite.de Thu Dec 1 04:59:35 2005
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Thu Dec 1 04:57:24 2005
Subject: [Biojava-l] 1.4 vs 1.5
Message-ID: <61212.84.190.34.173.1133431175.squirrel@webmail.charite.de>

Recently we had a discussion about whether Biojava could use the new features of Java 1.5. Since I have just moved two larger applications (633 Java files) to Java 1.5, I would like to share my experiences with you.

Applying the new features to the source code took me two days. This was worth doing because I identified two bugs thanks to the generics and annotations of Java 1.5.

1. GENERICS: I added types to all collections, e.g. public List getProteinsV() { ... } was turned into public List getProteinsV() { ... }. I found one bug where I had added an object of the wrong type!

2. ANNOTATIONS: I preceded all methods that override a method of the parent class with the annotation @Override. Indeed, I found a hidden bug where I had mistyped the name of a method! Instead of overriding a method, I had unintentionally created a new one. This kind of bug goes unnoticed in a Java 1.4 environment.

3. LOOPS: I achieved more compact source code by using foreach loops. The code is more readable now. In 1.4 the head of a loop sometimes requires 3 lines of Java, which are now condensed into a single line.

RETROWEAVER: A sound argument against 1.5 was the broken compatibility with application servers still running 1.4 and with old Macintosh OS X versions. I used Retroweaver to convert the class files after compilation into the 1.4 class format.
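The three Java 1.5 features listed above can be seen together in one small class. This is a minimal sketch built around a hypothetical Protein type (standing in for an application's own classes), not code from the applications described: it shows a typed List return value, an @Override annotation that the compiler checks, and a foreach loop.

```java
import java.util.ArrayList;
import java.util.List;

public class Java5FeaturesDemo {

    // Hypothetical domain class used only for this illustration.
    static class Protein {
        private final String name;

        Protein(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }

        // ANNOTATIONS: if this were misspelled (e.g. "toStrng"), javac 1.5
        // would report an error instead of silently adding a new method.
        @Override
        public String toString() {
            return "Protein(" + name + ")";
        }
    }

    // GENERICS: the element type is part of the signature, so adding
    // anything other than a Protein to this list fails at compile time.
    static List<Protein> getProteins() {
        List<Protein> proteins = new ArrayList<Protein>();
        proteins.add(new Protein("p53"));
        proteins.add(new Protein("BRCA1"));
        return proteins;
    }

    // LOOPS: a one-line foreach head replaces the explicit Iterator,
    // hasNext() call and cast that 1.4 code needs.
    static String joinNames(List<Protein> proteins) {
        StringBuilder sb = new StringBuilder();
        for (Protein p : proteins) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(p.getName());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Protein> proteins = getProteins();
        System.out.println(joinNames(proteins)); // prints p53,BRCA1
        System.out.println(proteins.get(0));     // prints Protein(p53)
    }
}
```

In the 1.4 equivalent, getProteins() would return a raw List, and the loop would need an Iterator plus a cast to Protein on every element; a wrong element type would only surface as a ClassCastException at runtime.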
As a result, the program works on a 1.4 virtual machine as well as on a 1.5 machine. Fortunately, I did not find any problem related to the code conversion by Retroweaver.

PERFORMANCE: The foreach loops are slightly slower. The autoboxing feature is dangerous in terms of performance because expensive object creation is hidden. For example, the compiler silently converts "10" into an Integer (via Integer.valueOf(10)) for method parameters that require "Integer" rather than "int"; outside the small range of cached values, this creates a new object each time. Therefore, I do not like autoboxing. I did not try it, but StringBuilder, the alternative to StringBuffer, is said to be faster because thread safety is omitted; it still lacks standard string operations found in other languages.

DISADVANTAGES:
1. Jikes cannot be used any more. Jikes compiles faster than javac and has better error reports.
2. The make script takes longer because Retroweaver must be run.
3. Some additional class files shipped with Retroweaver are required at runtime and make the binary larger by 60 KB. Well, that is not really significant.

CONCLUSIONS: I would highly recommend migrating Biojava to 1.5. I hope this helps to make a decision.

Christoph

From hotafin at gmail.com Thu Dec 1 07:26:30 2005
From: hotafin at gmail.com (Tamas Horvath)
Date: Thu Dec 1 07:24:18 2005
Subject: [Biojava-l] modify structure
In-Reply-To: <1706112d2cfe772f3501821995576ead@sanger.ac.uk>
References: <1706112d2cfe772f3501821995576ead@sanger.ac.uk>
Message-ID:

Exactly... I want to keep the backbone and replace the whole side chain as necessary... I know this should be relatively easy, but I'm a bit lost in the documentation...

From k.parveen at gmail.com Thu Dec 1 08:56:10 2005
From: k.parveen at gmail.com (Parveen k)
Date: Thu Dec 1 09:00:25 2005
Subject: [Biojava-l] help on blast
Message-ID: <1373ba70512010556u2ffd4f75l20242b4ff071de0@mail.gmail.com>

Thanks for all your ideas. They were really useful.

Parveen

Date: Wed, 30 Nov 2005 09:42:51 -0800 (PST)
From: "W. Eric Trull"
Subject: [Biojava-l] help on blast
To: biojava-l@biojava.org
Cc: fpepin@cs.mcgill.ca, k.parveen@gmail.com
Message-ID: <20051130174251.95303.qmail@web81405.mail.mud.yahoo.com>
Content-Type: text/plain; charset=iso-8859-1

I have the same situation where I work, except I have a Swing client instead of an applet. I decided to use NCBI's BLAST implementation (http://www.ncbi.nlm.nih.gov/BLAST/download.shtml), invoked as a command through org.biojava.utils.ExecRunner. I then wrapped the whole thing in a Web Service, which is easier and more flexible than using RMI IMHO. NCBI's BLAST toolkit also contains the executable for building the BLAST database from a FASTA sequence file (formatdb.exe).

Be sure to set the BLAST output option to XML (-m 7) and use an org.biojava.bio.program.sax.blastxml.BlastXMLParserFacade to parse the output.
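As a dependency-free alternative to the BlastXMLParserFacade route recommended here, the XML written with -m 7 can also be read with the JDK's own DOM parser. The following is a minimal sketch, not BioJava code: it assumes the NCBI BLAST XML element names Hit, Hit_def and Hsp_evalue, reports only the first HSP of each hit, and uses a small hand-written sample instead of real BLAST output.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class BlastXmlHits {

    // Hand-written sample shaped like BLAST XML output (element names assumed).
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE BlastOutput PUBLIC \"-//NCBI//NCBI BlastOutput/EN\" \"NCBI_BlastOutput.dtd\">\n"
        + "<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>"
        + "<Hit><Hit_def>seq1 test</Hit_def><Hit_hsps><Hsp>"
        + "<Hsp_evalue>1e-10</Hsp_evalue></Hsp></Hit_hsps></Hit>"
        + "<Hit><Hit_def>seq2</Hit_def><Hit_hsps><Hsp>"
        + "<Hsp_evalue>0.5</Hsp_evalue></Hsp></Hit_hsps></Hit>"
        + "</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>";

    /** Returns "description TAB e-value" for the first HSP of every Hit element. */
    public static List<String> hitSummaries(String blastXml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Resolve the external NCBI DTD to an empty document so parsing works offline.
            builder.setEntityResolver(new EntityResolver() {
                public InputSource resolveEntity(String publicId, String systemId) {
                    return new InputSource(new StringReader(""));
                }
            });
            Document doc = builder.parse(new InputSource(new StringReader(blastXml)));

            List<String> result = new ArrayList<String>();
            NodeList hits = doc.getElementsByTagName("Hit");
            for (int i = 0; i < hits.getLength(); i++) {
                Element hit = (Element) hits.item(i);
                result.add(firstText(hit, "Hit_def") + "\t" + firstText(hit, "Hsp_evalue"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse BLAST XML", e);
        }
    }

    // Text of the first descendant element with the given tag, or "" if absent.
    private static String firstText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        for (String line : hitSummaries(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

For a real run, the input would be the contents of the file BLAST writes when -m 7 is set. The BioJava parser gives richer result objects; this sketch only shows that the XML output is platform-independent and easy to read.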
I had trouble using the default output, as it is different under Windows and *nix. Look at the BioJava in Anger example of parsing BLAST output if you need help here.

The one twist here is that you are constrained by the applet security model which, I believe by default, will not allow you to go to a different server for a Web Service unless you sign the applet. Something for you to dig into if you decide to use a Web Service. The rest of my comments assume that you are going to go down the Web Service path.

For creation of the Web Service I'm using webMethods GLUE, but that requires a $$ license. I've used Apache's Axis/Tomcat to build web services before, and it is pretty easy to use. Building a web service future-proofs you, IMO, against any changes the powers that be may decide on for the client side (i.e. "Now we want a .NET application", etc.).

If you want a quick prototype, look at IBM's Web Services for Life Sciences (http://www.alphaworks.ibm.com/tech/ws4LS). They have a BLAST web service that is downloadable and configurable to run in a local environment. However, their services are a bit dated (February 7, 2003).

One last thought. I'm working under the constraint that I cannot send my query sequence outside my local network. If you DO NOT have this restriction and are just querying public databases, both NCBI and the PDB have web services. The PDB provides a SOAP-over-HTTP web service (WSDL at http://pdbbeta.rcsb.org/pdbws/rcsbWebService?wsdl), which is currently in BETA but will go into production on January 1, 2006. Point Axis at this WSDL to generate client-side code and then look for the blastQuery() methods. NCBI's web service does not use SOAP, but provides an HTTP interface. See http://www.ncbi.nlm.nih.gov/BLAST/developer.shtml for documentation and a Perl example.

Good luck!

-Eric Trull

--- Francois Pepin fpepin at cs.mcgill.ca wrote:

> Hi Parveen,
>
> This might not be as easy as you might like.
>
> The applet runs on the client, so you need the applet to communicate
> remotely with the server to send the sequence. Then the easiest way would
> be for the server to call blast on the command line with the sequence
> (which is pretty easy), parse the result and send it back to the client
> applet.
>
> I think RMI could do this, but I've never had to play with it.
>
> Does anyone have a better way to do this?
>
> Francois
>
> On Wed, 2005-11-30 at 16:04 +0530, Parveen k wrote:
> > Hi
> > I'm pretty new to bioinformatics. I have to incorporate BLAST in my
> > applet, so that when the client enters a sequence, it performs the
> > BLAST search against the database we have and returns the result. Can
> > anyone guide me in this regard?
> >
> > --
> > Regards
> > Parveen K
> >
> > YOU MAY SAY I AM A DREAMER, BUT I AM NOT THE ONLY ONE.
> > I HOPE SOMEDAY YOU WILL JOIN US, AND THE WORLD WILL FOLLOW US.
> > - JOHN LENNON

Thanks.
-W. Eric Trull

--
Regards
Parveen K

YOU MAY SAY I AM A DREAMER, BUT I AM NOT THE ONLY ONE.
I HOPE SOMEDAY YOU WILL JOIN US, AND THE WORLD WILL FOLLOW US.
- JOHN LENNON

From dreher at mpiib-berlin.mpg.de Thu Dec 1 12:11:48 2005
From: dreher at mpiib-berlin.mpg.de (Felix Dreher)
Date: Thu Dec 1 12:10:53 2005
Subject: [Biojava-l] Problem with downloading Genbank-sequence
Message-ID: <438F2ED4.8080105@mpiib-berlin.mpg.de>

Hi,

the problem is the security policy of the container I use for my web application. In this case it's the 'Sun Java System Application Server Platform Edition 8.1'. As Thomas Down suggested, the server prohibits the creation of ClassLoaders, which BioJava needs. So I tried to customise the server configuration file 'server.policy' by adding a new line. Here is the code fragment:

grant {
    permission java.lang.RuntimePermission "loadLibrary.*";
    ...
    ...
    //new line:
    permission java.lang.RuntimePermission "createClassLoader";
};

Since some ClassLoader now seems to have permission, I think this was the right starting point, and the exception thrown has also changed. It's the following:

org.biojava.bio.BioError: Unable to initialize DNATools
org.biojava.bio.seq.DNATools.<clinit>(DNATools.java:119)
org.biojava.bio.seq.db.GenbankSequenceDB.getAlphabet(GenbankSequenceDB.java:66)
org.biojava.bio.seq.db.GenbankSequenceDB.getSequence(GenbankSequenceDB.java:121)
rnai.GenbankDownload.loadGenBankSequence(GenbankDownload.java:23)
rnai.seq_input2.prerender(seq_input2.java:296)
com.sun.web.ui.appbase.faces.ViewHandlerImpl.prerender(ViewHandlerImpl.java:788)
com.sun.web.ui.appbase.faces.ViewHandlerImpl.renderView(ViewHandlerImpl.java:282)
com.sun.faces.lifecycle.RenderResponsePhase.execute(RenderResponsePhase.java:87)
com.sun.faces.lifecycle.LifecycleImpl.phase(LifecycleImpl.java:221)
com.sun.faces.lifecycle.LifecycleImpl.render(LifecycleImpl.java:117)
javax.faces.webapp.FacesServlet.service(FacesServlet.java:198)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
java.lang.reflect.Method.invoke(Method.java:585)
org.apache.catalina.security.SecurityUtil$1.run(SecurityUtil.java:249)
java.security.AccessController.doPrivileged(Native Method)
javax.security.auth.Subject.doAsPrivileged(Subject.java:517)
org.apache.catalina.security.SecurityUtil.execute(SecurityUtil.java:282)
org.apache.catalina.security.SecurityUtil.doAsPrivilege(SecurityUtil.java:165)
java.security.AccessController.doPrivileged(Native Method)
com.sun.web.ui.util.UploadFilter.doFilter(UploadFilter.java:179)

---

DNATools.java calls the following line in AlphabetManager.java:

InputStream alphabetStream =
    ClassTools.getClassLoader(AlphabetManager.class).getResourceAsStream("org/biojava/bio/symbol/AlphabetManager.xml");

So I suppose that the change in the server configuration file is not 'global enough' to affect all custom ClassLoader calls. Maybe someone has experienced something similar or knows something about this specific server?

Thanks,
Felix

--
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité Mitte
Department of Immunology
Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany
Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426

From toddri at eden.rutgers.edu Thu Dec 1 17:06:39 2005
From: toddri at eden.rutgers.edu (Todd Riley)
Date: Thu Dec 1 17:26:41 2005
Subject: [Biojava-l] Looking for a Fisher(like) Kernel
In-Reply-To: <43825416.1040909@eden.rutgers.edu>
References: <43825416.1040909@eden.rutgers.edu>
Message-ID: <438F73EF.8020408@eden.rutgers.edu>

Hello,

I have good news! I have fixed the bug in the BaumWelchTrainer class (hopefully the source in CVS will be updated soon).

Now that I am able to train my Profile HMM, I would like to feed my HMM into a Fisher Kernel to perform SVM training in order to find the proper scoring threshold for proper classification (i.e., use SVM classification to set a barrier for my HMM log-odds scores).

Has anyone implemented a Fisher Kernel (or one like it) for the BioJava SVM classes? Any information here would be greatly appreciated.

Thanks,
Todd

From hotafin at gmail.com Thu Dec 1 20:30:14 2005
From: hotafin at gmail.com (Tamas Horvath)
Date: Thu Dec 1 21:56:21 2005
Subject: [Biojava-l] modify structure
In-Reply-To:
References: <1706112d2cfe772f3501821995576ead@sanger.ac.uk>
Message-ID:

If I've got a Group, which is an amino acid, and I want to shift it by a 3D vector (or 3 2D vectors), how may I do it? Similarly, if I want to rotate the same structure, how may I do it? If you could just show me a very short code sample, I'd really appreciate it! Thanks!
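Independently of the BioJava API, the arithmetic behind such a shift and rotation can be sketched in plain Java. This is a minimal sketch that assumes each atom's coordinates are held in a double[3] (a hypothetical representation, not BioJava's Atom class). A rigid-body rotation needs an orthonormal matrix, so the sketch builds one for a rotation about the z axis and includes a check that rejects arbitrary matrices such as the placeholder filled with 0.1 to 0.9 in Andreas's reply.

```java
public class RigidTransform {

    /** Adds the vector (dx, dy, dz) to every coordinate triple, in place. */
    static void shift(double[][] coords, double[] vector) {
        for (double[] xyz : coords) {
            for (int k = 0; k < 3; k++) {
                xyz[k] += vector[k];
            }
        }
    }

    /** Rotation by angle theta (radians) about the z axis. */
    static double[][] rotationZ(double theta) {
        double c = Math.cos(theta);
        double s = Math.sin(theta);
        return new double[][] {
            { c, -s, 0 },
            { s,  c, 0 },
            { 0,  0, 1 }
        };
    }

    /** Replaces every coordinate x by m * x (matrix times column vector), in place. */
    static void rotate(double[][] coords, double[][] m) {
        for (double[] xyz : coords) {
            double[] r = new double[3];
            for (int i = 0; i < 3; i++) {
                r[i] = m[i][0] * xyz[0] + m[i][1] * xyz[1] + m[i][2] * xyz[2];
            }
            System.arraycopy(r, 0, xyz, 0, 3);
        }
    }

    /** True if m * m^T equals the identity within tol (a rotation or reflection). */
    static boolean isOrthonormal(double[][] m, double tol) {
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                double dot = 0.0;
                for (int k = 0; k < 3; k++) {
                    dot += m[i][k] * m[j][k];
                }
                double expected = (i == j) ? 1.0 : 0.0;
                if (Math.abs(dot - expected) > tol) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[][] atoms = { { 1.0, 0.0, 0.0 }, { 0.0, 2.0, 0.0 } };
        shift(atoms, new double[] { 2.0, 0.2, 12.3 });
        rotate(atoms, rotationZ(Math.PI / 2)); // 90 degrees about z
        for (double[] a : atoms) {
            System.out.printf("%.3f %.3f %.3f%n", a[0], a[1], a[2]);
        }
        System.out.println(isOrthonormal(rotationZ(0.3), 1e-9)); // true
    }
}
```

Applying the shift before or after the rotation gives different results, so the order has to match what the structure comparison or modelling step expects.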
From escobarebio at yahoo.com Fri Dec 2 00:53:56 2005
From: escobarebio at yahoo.com (D.Enrique ESCOBAR ESPINOZA)
Date: Fri Dec 2 00:58:18 2005
Subject: [Biojava-l] cvs downlod install
Message-ID: <20051202055356.89235.qmail@web30507.mail.mud.yahoo.com>

I put the files:
* bytecode-0.92.jar
* commons-cli.jar
* commons-collections-2.1.jar
* commons-dbcp-1.1.jar
* commons-pool-1.1.jar
in my biojava directory: C:\biojava

I set my classpath:
set CLASSPATH C:\biojava\biojava.jar;
C:\biojava\bytecode-0.92.jar;
C:\biojava\commons-cli.jar;
C:\biojava\commons-collections-2.1.jar;
C:\biojava\commons-dbcp-1.1.jar;
C:\biojava\commons-pool-1.1.jar;.

With the cvs download, all biojava-live files were put into C:\biojava\biojava-live\, so am I supposed to move these files up to the C:\biojava\ directory?

**

When I use, in the folder C:\biojava\biojava-live\ (Windows):
  cd demos
  javac seq\TestEmbl.java
I obtain:
$ javac seq\TestEmbl.java
error: cannot read: seqTestEmbl.java
1 error

When I run:
  java seq.TestEmbl seq\AL121903.embl
I get:
$ java seq.TestEmbl seq\AL121903.embl
Exception in thread "main" java.lang.NoClassDefFoundError: seq/TestEmbl

--------------------------------------------------
D.Enrique ESCOBAR ESPINOZA (B.Sc.)

From hollandr at gis.a-star.edu.sg Fri Dec 2 01:24:27 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Fri Dec 2 01:22:50 2005
Subject: [Biojava-l] cvs downlod install
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5602894D67@BIONIC.biopolis.one-north.com>

Your problem lies here:

> when i use in the folder C:\biojava\biojava-live\:
> (windows)
> cd demos
> javac seq\TestEmbl.java
> i obtain
> $ javac seq\TestEmbl.java
> error: cannot read: seqTestEmbl.java
> 1 error

Note that the Java file has not been compiled. Hence when you later try to run the compiled class, it's not there, so you get a NoClassDefFoundError.

I suspect that either you accidentally left out the backslash (\) between "seq" and "TestEmbl.java" when you typed the javac command, or that your shell is misinterpreting the backslash (the $ prompt suggests a Unix-style shell such as Cygwin, which treats an unquoted backslash as an escape character). Try replacing it with a double backslash (\\) or a forward slash (/).

cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199
---------------------------------------------
This email is confidential and may be privileged.
If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you.
---------------------------------------------

From ap3 at sanger.ac.uk Fri Dec 2 05:17:49 2005
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Fri Dec 2 05:14:22 2005
Subject: [Biojava-l] modify structure
In-Reply-To:
References: <1706112d2cfe772f3501821995576ead@sanger.ac.uk>
Message-ID:

Hi Tamas!

> If I've got a Group, which is an amino acid, and I want to shift it
> by a 3D vector (or 3 2D vectors), how may I do it?

There is the org.biojava.bio.structure.Calc class, which lets you do calculations with the structure. E.g. to shift a structure, do:

    double x = 2.0;
    double y = 0.2;
    double z = 12.3;

    Atom vector = new AtomImpl();
    vector.setX(x);
    vector.setY(y);
    vector.setZ(z);

    // shift the structure.
    Calc.shift(structure, vector);

> Similarly, if I want to rotate the same structure, how may I do it?

    double[][] matrix = new double[3][3];

    matrix[0][0] = 0.1;
    matrix[0][1] = 0.2;
    matrix[0][2] = 0.3;
    matrix[1][0] = 0.4;
    matrix[1][1] = 0.5;
    matrix[1][2] = 0.6;
    matrix[2][0] = 0.7;
    matrix[2][1] = 0.8;
    matrix[2][2] = 0.9;

    Calc.rotate(structure, matrix);

And here is an example regarding your questions from yesterday about how to do mutations. Most of the code actually deals with finding the right chain and residue. I will add the "Mutator" class to CVS, so in future doing mutations will be a two-liner...

Cheers,
Andreas

/*
 *                    BioJava development code
 *
 * This code may be freely distributed and modified under the
 * terms of the GNU Lesser General Public Licence.  This should
 * be distributed with the code.  If you do not have a copy,
 * see:
 *
 *      http://www.gnu.org/copyleft/lesser.html
 *
 * Copyright for this code is held jointly by the individual
 * authors.  These should be listed in @author doc comments.
 *
 * For more information on the BioJava project and its aims,
 * or to join the biojava-l mailing list, visit the home page
 * at:
 *
 *      http://www.biojava.org/
 *
 * Created on Nov 30, 2005
 *
 */

import java.io.FileOutputStream;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.biojava.bio.structure.AminoAcid;
import org.biojava.bio.structure.AminoAcidImpl;
import org.biojava.bio.structure.Atom;
import org.biojava.bio.structure.AtomIterator;
import org.biojava.bio.structure.Chain;
import org.biojava.bio.structure.ChainImpl;
import org.biojava.bio.structure.Group;
import org.biojava.bio.structure.Structure;
import org.biojava.bio.structure.StructureImpl;
import org.biojava.bio.structure.io.PDBFileReader;
import org.biojava.bio.structure.io.PDBParseException;

public class structureTest {

    public structureTest() {
        super();
    }

    public static void main(String[] args) {
        String filename   = "/Users/ap3/WORK/PDB/5pti.pdb";
        String outputfile = "/Users/ap3/WORK/PDB/mutated.pdb";

        PDBFileReader pdbreader = new PDBFileReader();

        try {
            Structure struc = pdbreader.getStructure(filename);
            System.out.println(struc);

            String chainId   = " ";
            String pdbResnum = "2";
            String newType   = "ARG";

            // mutate the original structure and create a new one.
            Mutator m = new Mutator();
            Structure newstruc = m.mutate(struc, chainId, pdbResnum, newType);

            FileOutputStream out = new FileOutputStream(outputfile);
            PrintStream p = new PrintStream(out);

            p.println(newstruc.toPDB());

            p.close();

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

class Mutator {
    List supportedAtoms;

    public Mutator() {
        supportedAtoms = new ArrayList();
        supportedAtoms.add("N");
        supportedAtoms.add("CA");
        supportedAtoms.add("C");
        supportedAtoms.add("O");
        supportedAtoms.add("CB");
    }

    /** creates a new structure which is identical to the original one.
     * Only one amino acid will be different.
     *
     * @param struc
     * @param chainId
     * @param pdbResnum
     * @param newType
     * @return
     * @throws PDBParseException
     */
    public Structure mutate(Structure struc, String chainId, String pdbResnum, String newType)
        throws PDBParseException {

        // create a container for the new structure
        Structure newstruc = new StructureImpl();

        // first we need to find our corresponding chain

        // get the chains for model nr. 0
        // if structure is xray there will be only one "model".
        List chains = struc.getChains(0);

        // iterate over all chains.
        Iterator iter = chains.iterator();
        while (iter.hasNext()) {
            Chain c = (Chain) iter.next();
            if (c.getName().equals(chainId)) {
                // here is our chain!

                Chain newchain = new ChainImpl();
                newchain.setName(c.getName());

                List groups = c.getGroups();

                // now iterate over all groups in this chain
                // in order to find the amino acid that has this pdbResnum.

                Iterator giter = groups.iterator();
                while (giter.hasNext()) {
                    Group g = (Group) giter.next();
                    String rnum = g.getPDBCode();

                    // we only mutate amino acids
                    // and ignore hetatoms and nucleotides in this case
                    if (rnum.equals(pdbResnum) && (g.getType().equals("amino"))) {

                        // create the mutated amino acid and add it to our new chain
                        AminoAcid newgroup = mutateResidue((AminoAcid) g, newType);
                        newchain.addGroup(newgroup);
                    }
                    else {
                        // add the group to the new chain unmodified.
                        newchain.addGroup(g);
                    }
                }

                // add the newly constructed chain to the structure;
                newstruc.addChain(newchain);
            } else {
                // this chain is not requested, add it to the new structure unmodified.
                newstruc.addChain(c);
            }
        }
        return newstruc;
    }

    /** create a new residue which is of the new type.
     * Only the atoms N, Ca, C, O, Cb will be considered.
     * prolines are not mutated...
     * @param oldAmino
     * @param newType
     * @return
     */
    public AminoAcid mutateResidue(AminoAcid oldAmino, String newType)
        throws PDBParseException {

        AminoAcid newgroup = new AminoAcidImpl();

        newgroup.setPDBCode(oldAmino.getPDBCode());
        newgroup.setPDBName(newType);

        AtomIterator aiter = new AtomIterator(oldAmino);
        while (aiter.hasNext()) {
            Atom a = (Atom) aiter.next();
            if (supportedAtoms.contains(a.getName())) {
                newgroup.addAtom(a);
            }
        }

        return newgroup;
    }
}

From matthew.pocock at ncl.ac.uk Fri Dec 2 06:11:15 2005
From: matthew.pocock at ncl.ac.uk (Matthew Pocock)
Date: Fri Dec 2 06:28:51 2005
Subject: [Biojava-l] Re: Looking for a Fisher(like) Kernel
In-Reply-To: <438F73EF.8020408@eden.rutgers.edu>
References: <43825416.1040909@eden.rutgers.edu> <438F73EF.8020408@eden.rutgers.edu>
Message-ID: <200512021111.16812.matthew.pocock@ncl.ac.uk>

On Thursday 01 December 2005 22:06, Todd Riley wrote:
> Hello,
>
> I have good news! I have fixed the bug in the BaumWelchTrainer class
> (hopefully the source in CVS will be updated soon).

Yay! What was it?

>
> Now that I am able to train my Profile HMM, I would like to feed my HMM
> into a Fisher Kernel to perform SVM training in order to find the proper
> scoring threshold for proper classification (ie - use SVM classification
> to set a barrier for my HMM log-odds scores).

Sounds like a plan...

>
> Has anyone implemented a Fisher Kernel (or one like it) for the BioJava
> SVM classes? Any information here would be greatly appreciated.
>

I have not heard of one. However, I think you should be able to calculate the needed numbers using code nearly identical to that in the BaumWelchTrainer. In fact, I have a sneaking suspicion that the ModelTrainer parameters after 1 cycle of training (before updating the model!) are the raw numbers that the SVM Fisher kernel requires.
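The quantity Matthew alludes to can be made concrete. In the Fisher kernel of Jaakkola and Haussler, the score component for an emission parameter theta(x,s) is xi(x,s)/theta(x,s) - xi(s), where xi(x,s) is the expected number of times state s emits symbol x for a given sequence (the kind of expected counts a Baum-Welch E-step accumulates) and xi(s) is the expected number of visits to s. The sketch below is plain Java with hypothetical names and toy numbers, not part of the BioJava SVM API: it takes the expected counts as given (how to pull them out of ModelTrainer is left open), covers emission parameters only, and approximates the Fisher information matrix by the identity so the kernel is a plain dot product.

```java
public class FisherScore {

    /**
     * Fisher score over emission parameters, flattened state by state.
     * theta[s][x] = emission probability of symbol x in state s;
     * xi[s][x]    = expected count of x emitted from s for one sequence.
     */
    static double[] score(double[][] theta, double[][] xi) {
        int n = 0;
        for (double[] row : theta) {
            n += row.length;
        }
        double[] u = new double[n];
        int k = 0;
        for (int s = 0; s < theta.length; s++) {
            // xi(s): expected number of visits to emitting state s
            double occupancy = 0.0;
            for (double c : xi[s]) {
                occupancy += c;
            }
            for (int x = 0; x < theta[s].length; x++) {
                u[k++] = xi[s][x] / theta[s][x] - occupancy;
            }
        }
        return u;
    }

    /** Kernel with the Fisher information approximated by the identity matrix. */
    static double kernel(double[] uX, double[] uY) {
        if (uX.length != uY.length) {
            throw new IllegalArgumentException("score vectors differ in length");
        }
        double sum = 0.0;
        for (int i = 0; i < uX.length; i++) {
            sum += uX[i] * uY[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // One emitting state over a 2-letter alphabet (toy numbers).
        double[][] theta = { { 0.5, 0.5 } };
        double[][] xiX = { { 2.0, 1.0 } };  // hypothetical expected counts, sequence X
        double[][] xiY = { { 1.0, 1.0 } };  // hypothetical expected counts, sequence Y
        double[] uX = score(theta, xiX);    // {1.0, -1.0}
        double[] uY = score(theta, xiY);    // {0.0, 0.0}
        System.out.println(kernel(uX, uX)); // 2.0
        System.out.println(kernel(uX, uY)); // 0.0
    }
}
```

A sequence whose expected counts are exactly proportional to the model's emission probabilities (sequence Y here) gets a zero score vector, which is the sense in which the Fisher score measures how a sequence would pull the model's parameters.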
> Thanks, > Todd Matthew From dreher at mpiib-berlin.mpg.de Fri Dec 2 07:48:00 2005 From: dreher at mpiib-berlin.mpg.de (Felix Dreher) Date: Fri Dec 2 07:47:12 2005 Subject: [Biojava-l] Problem with downloading Genbank-sequence In-Reply-To: <438F2ED4.8080105@mpiib-berlin.mpg.de> References: <438F2ED4.8080105@mpiib-berlin.mpg.de> Message-ID: <43904280.80300@mpiib-berlin.mpg.de> Hello, the exception I posted in the last mail had nothing to do with the application server I use. It was an IDE specific bug: 'Java Studio Creator Early Access 2' failed to execute the build.xml-file properly when creating biojava.jar. This was solved by using Netbeans to build the jar-file again and deleting and re-adding it in StudioCreator. Greetings, Felix Felix Dreher wrote: > Hi, > > the problem is the security-policy of the container I use for my > web-application. In this case it's the 'Sun Java System Application > Server Platform Edition 8.1'. As Thomas Down suggested, the Server > prohibits the creation of ClassLoaders, however they are needed by > BioJava. > So I tried to customise the Server-configuration 'server.policy'-file > by adding a new line. Here is the code fraction: > > > grant { > permission java.lang.RuntimePermission "loadLibrary.*"; > ... > ... > //new line: > permission java.lang.RuntimePermission "createClassLoader"; > }; > > > As some ClassLoader seems to have permission now, I think this was the > right starting point - and also the exception thrown changed. 
It's the > following: > > org.biojava.bio.BioError: Unable to initialize DNATools > org.biojava.bio.seq.DNATools.<clinit>(DNATools.java:119) > org.biojava.bio.seq.db.GenbankSequenceDB.getAlphabet(GenbankSequenceDB.java:66) > org.biojava.bio.seq.db.GenbankSequenceDB.getSequence(GenbankSequenceDB.java:121) > rnai.GenbankDownload.loadGenBankSequence(GenbankDownload.java:23) > rnai.seq_input2.prerender(seq_input2.java:296) > com.sun.web.ui.appbase.faces.ViewHandlerImpl.prerender(ViewHandlerImpl.java:788) > com.sun.web.ui.appbase.faces.ViewHandlerImpl.renderView(ViewHandlerImpl.java:282) > com.sun.faces.lifecycle.RenderResponsePhase.execute(RenderResponsePhase.java:87) > com.sun.faces.lifecycle.LifecycleImpl.phase(LifecycleImpl.java:221) > com.sun.faces.lifecycle.LifecycleImpl.render(LifecycleImpl.java:117) > javax.faces.webapp.FacesServlet.service(FacesServlet.java:198) > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > java.lang.reflect.Method.invoke(Method.java:585) > org.apache.catalina.security.SecurityUtil$1.run(SecurityUtil.java:249) > java.security.AccessController.doPrivileged(Native Method) > javax.security.auth.Subject.doAsPrivileged(Subject.java:517) > org.apache.catalina.security.SecurityUtil.execute(SecurityUtil.java:282) > org.apache.catalina.security.SecurityUtil.doAsPrivilege(SecurityUtil.java:165) > java.security.AccessController.doPrivileged(Native Method) > com.sun.web.ui.util.UploadFilter.doFilter(UploadFilter.java:179) > > --- > > DNATools.java calls the following line in AlphabetManager.java: > > InputStream alphabetStream = > ClassTools.getClassLoader(AlphabetManager.class).getResourceAsStream("org/biojava/bio/symbol/AlphabetManager.xml"); > > > So I suppose that the change in the Server-Configuration-file is not > 'globally enough' to affect all custom ClassLoader-calls.
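Felix's trace points at two separate things worth checking when DNATools fails to initialize inside a container: whether the code may create class loaders at all, and whether AlphabetManager.xml is actually visible to the class loader that loaded BioJava (his eventual fix, rebuilding a broken biojava.jar, points to the second). A minimal diagnostic sketch using only standard JDK calls; the class name is hypothetical, and since the SecurityManager API is deprecated on recent JDKs, the permission check simply reports true when no security manager is installed.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical diagnostic helper, not part of BioJava. */
class ClassLoaderDiagnostics {

    /** True if the active policy allows creating class loaders, or if no SecurityManager is installed. */
    @SuppressWarnings("removal")
    static boolean canCreateClassLoader() {
        SecurityManager sm = System.getSecurityManager();
        if (sm == null) {
            return true; // no security manager: nothing is restricted
        }
        try {
            sm.checkPermission(new RuntimePermission("createClassLoader"));
            return true;
        } catch (SecurityException e) {
            return false;
        }
    }

    /** True if the class loader that loaded {@code anchor} can open the given resource path. */
    static boolean resourceAvailable(Class<?> anchor, String path) {
        ClassLoader cl = anchor.getClassLoader();
        if (cl == null) {
            // anchor was loaded by the bootstrap loader
            cl = ClassLoader.getSystemClassLoader();
        }
        try (InputStream in = cl.getResourceAsStream(path)) {
            return in != null;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("createClassLoader allowed: " + canCreateClassLoader());
        System.out.println("AlphabetManager.xml visible: "
            + resourceAvailable(ClassLoaderDiagnostics.class, "org/biojava/bio/symbol/AlphabetManager.xml"));
    }
}
```

Inside the web application, passing `AlphabetManager.class` as the anchor instead would test the loader that BioJava itself uses.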
> Maybe someone has experienced something similar or knows something > about this specific Server? > > Thanks, > Felix > > > >-- >Felix Dreher >Max-Planck-Institute for Infection Biology >Campus Charité Mitte >Department of Immunology >Mailing address: Schumannstraße 21/22 >Visitors: Virchowweg 12 >10117 Berlin >Germany >Tel.: +49 (0)30 28460-254 / -494 >Mobile: +49 (0)163 7542426 > > From hotafin at gmail.com Fri Dec 2 08:02:38 2005 From: hotafin at gmail.com (Tamas Horvath) Date: Fri Dec 2 08:00:19 2005 Subject: [Biojava-l] modify structure In-Reply-To: References: <1706112d2cfe772f3501821995576ead@sanger.ac.uk> Message-ID: Thanks for the code! I've noticed the methods in Calc, but my main question is the following. Let's say I've got a primitive library of amino acids. They are stored as Groups, and they have all the atoms. When I'm mutating the chain, I want to keep the backbone atoms in place, so as far as your mutate method goes it's OK. But now I want to replace the side chain. In order to do that, I'd shift and rotate the desired AA into place (the CBs would be identical and the other backbone atoms as close as possible), and then copy the side-chain atoms to the mutated AA... (I hope that's clear.) So do I have to wrap my Group objects in a Chain/Structure object in order to shift and rotate them? I don't really get how the rotation is supposed to work... what exactly is the matrix it asks for? On 12/2/05, Andreas Prlic wrote: > Hi Tamas! > > > If I've got a Group, which is an amino acid, and I want to shift it > > by a 3D vector (or 3 2D vectors), how may I do it? > > There is the org.biojava.bio.structure.Calc class that allows to do > calculations with the structure. > > e.g. to shift a structure do: > > double x = 2.0; > double y = 0.2; > double z = 12.3; > > Atom vector = new AtomImpl(); > vector.setX(x); > vector.setY(y); > vector.setZ(z); > > // shift the structure. > Calc.shift(structure,vector); > > > Similarly, if i want to rotate the same structure, how may I do it? > > double[][] matrix = new double[3][3]; > > matrix[0][0] = 0.1; > matrix[0][1] = 0.2; > matrix[0][2] = 0.3; > matrix[1][0] = 0.4; > matrix[1][1] = 0.5; > matrix[1][2] = 0.6; > matrix[2][0] = 0.7; > matrix[2][1] = 0.8; > matrix[2][2] = 0.9; > > Calc.rotate(structure,matrix); From hotafin at gmail.com Fri Dec 2 08:23:04 2005 From: hotafin at gmail.com (Tamas Horvath) Date: Fri Dec 2 08:20:47 2005 Subject: [Biojava-l] modify structure In-Reply-To: References: <1706112d2cfe772f3501821995576ead@sanger.ac.uk> Message-ID: Just to be more clear: I store the amino acids as a HashMap<String, Group> where 1 Group contains 1 AA, and the String is the name of the AA. (I prefer the 1-letter code there.) To shift the desired AA into place, I'd match the CB atoms. Then rotate the AA so that the CA-CB line matches. Then rotate the AA so that the N-CA-CB plane matches (in this case it's equal to the N-CA line match). After this, the AA would be roughly in place. After this there should maybe be a collision test, but I think that part can be handled by the GROMACS package, which I'd then run anyway to see how the mutation affects the structure. From hotafin at gmail.com Fri Dec 2 10:09:25 2005 From: hotafin at gmail.com (Tamas Horvath) Date: Fri Dec 2 10:08:35 2005 Subject: [Biojava-l] cvs ant build failure Message-ID: ant package-biojava Buildfile: build.xml init: [echo] Building biojava-live [echo] Java Home:/home/hota/programs/java/jdk1.5.0/jre [echo] JUnit present: ${junit.present} [echo] JUnit supported by Ant: true [echo] HSQLDB driver present: ${sqlDriver.hsqldb} prepare: prepare-biojava: compile-biojava: [javac] Compiling 93 source files to /data3/installs/biojava-live/ant-build/classes/biojava [javac] /data3/installs/biojava-live/src/org/biojavax/EmptyRichAnnotation.java:42: org.biojavax.EmptyRichAnnotation is not abstract and does not override abstract method getProperties(java.lang.Object) in org.biojavax.RichAnnotation
[javac] public class EmptyRichAnnotation extends Unchangeable implements RichAnnotation, Serializable { [javac] ^ [javac] Note: * uses or overrides a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. [javac] 1 error BUILD FAILED /data3/installs/biojava-live/build.xml:267: Compile failed; see the compiler error output for details. From erik.sjolund at gmail.com Fri Dec 2 08:20:53 2005 From: erik.sjolund at gmail.com (Erik Sjölund) Date: Fri Dec 2 14:56:07 2005 Subject: [Biojava-l] abi2xml a new parser of abi trace files Message-ID: BioJava contains a class to parse ABI trace files: http://www.biojava.org/docs/api14/org/biojava/bio/program/abi/ABITrace.html So you might be interested to know that a new command-line utility has been released, http://abi2xml.sourceforge.net, that converts ABI trace files to XML files. This bioinformatics utility is written in C++ and released under the GPL license. A Java programmer could first convert the ABI files to XML files and then access the information through a DOM interface or through XPath. That Java programmer probably has nothing to gain by doing this compared to using the ABITrace class, but I thought it was worth mentioning the possibility. cheers, Erik Sjölund From fpepin at cs.mcgill.ca Fri Dec 2 15:40:57 2005 From: fpepin at cs.mcgill.ca (Francois Pepin) Date: Fri Dec 2 15:38:55 2005 Subject: [Biojava-l] cvs ant build failure In-Reply-To: References: Message-ID: <1133556057.16992.36.camel@elm.mcb.mcgill.ca> Hi Tamas, I can compile fine from CVS right now. There are two reasons why it might not work for you: 1- you might not be up to date; cvs update should fix that. 2- you have a previous build in there (otherwise it would say compiling 1096 source files instead of 93). You probably want to do an 'ant clean' and try again.
Francois From hotafin at gmail.com Sun Dec 4 16:06:53 2005 From: hotafin at gmail.com (Tamas Horvath) Date: Sun Dec 4 16:31:37 2005 Subject: [Biojava-l] 3d structure rotation code Message-ID: I'd like to show you the following 2 functions that may be valuable in the Calc class:

/** Returns a rotated Structure object (the rotation is around the origin).
 *
 * @param ostructure Structure -- the structure to be rotated
 * @param from Atom -- the reference Atom's original coordinates
 * @param to Atom -- the reference Atom's desired coordinates
 * @return Structure -- null if there was an error
 */
public static Structure rotate3D(Structure ostructure, Atom from, Atom to)
    throws StructureException {

    // calculate the angle of rotation
    final double angle = radangle(from, to);
    if (angle == 0 || angle == Math.PI) {
        throw new StructureException("The rotation angle is 0 or 180 degrees!");
    }

    // calculate the unit normal vector of the (origin, from, to) plane,
    // which will serve as an arbitrary axis for the rotation
    Atom axisvector = vectorProduct(from, to);
    axisvector = unitVector(axisvector);

    // calculate the trigonometric values
    final double c = Math.cos(angle);
    final double s = Math.sin(angle);
    final double t = 1 - c;
    final double x = axisvector.getX();
    final double y = axisvector.getY();
    final double z = axisvector.getZ();

    // and now the matrix (axis-angle / Rodrigues form)
    double[][] rotationmatrix = new double[3][3];
    rotationmatrix[0][0] = t*x*x + c;   rotationmatrix[0][1] = t*x*y + s*z; rotationmatrix[0][2] = t*x*z - s*y;
    rotationmatrix[1][0] = t*x*y - s*z; rotationmatrix[1][1] = t*y*y + c;   rotationmatrix[1][2] = t*y*z + s*x;
    rotationmatrix[2][0] = t*x*z + s*y; rotationmatrix[2][1] = t*y*z - s*x; rotationmatrix[2][2] = t*z*z + c;

    // and now the rotation
    Structure nstructure = (Structure) ostructure.clone();
    try {
        rotate(nstructure, rotationmatrix);
    } catch (StructureException e) {
        System.out.println(e);
        nstructure = null;
    }
    return nstructure;
}

/** Calculates the angle a-origin-b in radians.
 *
 * @param a Atom
 * @param b Atom
 * @return double
 */
public static double radangle(Atom a, Atom b) {
    final double skalar = skalarProduct(a, b);
    // clamp to [-1, 1] so rounding errors cannot make Math.acos return NaN
    final double cos = Math.max(-1.0, Math.min(1.0, skalar / (amount(a) * amount(b))));
    return Math.acos(cos);
}

From hollandr at gis.a-star.edu.sg Sun Dec 4 20:42:55 2005 From: hollandr at gis.a-star.edu.sg (Richard HOLLAND) Date: Sun Dec 4 20:41:11 2005 Subject: [Biojava-l] cvs ant build failure Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5602894DF0@BIONIC.biopolis.one-north.com> I just checked out the most recent version and found a bug in EmptyRichAnnotation, just as your compiler output indicates. I fixed it. But... it still won't compile, now for a different reason. It seems that Andreas' check-in of his structure classes over the weekend was missing the Matrix and SingularValueDecomposition classes. Andreas, can you fix this please? cheers, Richard Richard Holland Bioinformatics Specialist GIS extension 8199 --------------------------------------------- This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you.
--------------------------------------------- From mark.schreiber at novartis.com Sun Dec 4 20:52:12 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Sun Dec 4 20:49:52 2005 Subject: [Biojava-l] cvs ant build failure Message-ID: Just a reminder to people with CVS accounts (including myself who is sometimes guilty of this): The minimum requirement of CVS is that it will build at all times (using JDK1.4.2). The desirable requirement is that it will build and pass all unit tests. This is not a strict requirement for the live distribution but it is good to think about what you may have done to break the unit tests. - Mark From mark.schreiber at novartis.com Mon Dec 5 01:31:48 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Mon Dec 5 01:29:27 2005 Subject: [Biojava-l] BaumWelchTrainer Broken??!!! (please help) Message-ID: Fixes for this bug suggested by Todd Riley and Thomas Down are now in CVS. I have tried a few examples and it seems to work well.
- Mark

From ap3 at sanger.ac.uk Mon Dec 5 04:31:53 2005
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Mon Dec 5 04:28:22 2005
Subject: [Biojava-l] cvs ant build failure
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5602894DF0@BIONIC.biopolis.one-north.com>
References: <6D9E9B9DF347EF4385F6271C64FB8D5602894DF0@BIONIC.biopolis.one-north.com>
Message-ID: <5ab80aa187d2967d8abab0c13a239f47@sanger.ac.uk>

Hi Richard,

> But... it still won't compile, but now for a different reason. It seems
> that Andreas' check-in of his structure classes over the weekend was
> missing the Matrix and SingularValueDecomposition classes. Andreas, can
> you fix this please?

They were all checked in at the same time yesterday evening, in a new directory. Did you do a

    cvs update -dP

? -d is for getting new directories and -P for pruning empty ones.

Cheers,
Andreas

-----------------------------------------------------------------------
Andreas Prlic      Wellcome Trust Sanger Institute
                   Hinxton, Cambridge CB10 1SA, UK
                   +44 (0) 1223 49 6891

From hollandr at gis.a-star.edu.sg Mon Dec 5 04:37:00 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Mon Dec 5 04:35:16 2005
Subject: [Biojava-l] cvs ant build failure
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5602894E48@BIONIC.biopolis.one-north.com>

*doh!* All working. Thanks, Andreas.

Richard Holland
Bioinformatics Specialist
GIS extension 8199
---------------------------------------------
This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you.
---------------------------------------------

From hotafin at gmail.com Mon Dec 5 09:13:59 2005
From: hotafin at gmail.com (Tamas Horvath)
Date: Mon Dec 5 09:18:52 2005
Subject: [Biojava-l] parsePDB
Message-ID: 

I have some very plain PDB files (coordinates only), and if I try to parse them, it throws:

java.lang.StringIndexOutOfBoundsException: String index out of range: 6
    at java.lang.String.substring(String.java:1765)
    at org.biojava.bio.structure.io.PDBFileParser.parsePDBFile(PDBFileParser.java:764)
    at org.biojava.bio.structure.io.PDBFileParser.parsePDBFile(PDBFileParser.java:720)

What do I need in the PDB file to be able to parse it?

From ap3 at sanger.ac.uk Mon Dec 5 09:36:00 2005
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Mon Dec 5 09:32:51 2005
Subject: [Biojava-l] 3d structure rotation code
In-Reply-To: 
References: <1706112d2cfe772f3501821995576ead@sanger.ac.uk>
Message-ID: 

Hi!

Regarding the question Tamas posted last week about creating artificial side chains for amino acids:

To superimpose two (or more) residues/atoms, one needs to do a singular value decomposition, which gives the required rotation matrix and shift vector. The recent BioJava 1.4 release could not do this, but after a little research, and using available open-source code as a template, BioJava CVS can do this now!
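To make "rotation matrix and shift vector" concrete (Tamas asks further down what exactly the matrix is), here is a minimal plain-Java sketch that does not depend on BioJava or Jama. It assumes the convention x' = R*x + t with x as a column vector; BioJava's Calc.rotate and Calc.shift may store or apply the matrix differently (for example transposed), so treat this as an illustration of the idea, not of that API. It does not compute the SVD; it only applies a given rotation and shift to a rigid group of coordinates and measures the fit by RMSD. RigidTransformSketch and its methods are hypothetical names.

```java
/**
 * Hypothetical helper, not a BioJava class: applies a 3x3 rotation matrix and
 * a shift vector to xyz coordinates, and measures agreement by RMSD.
 * Assumed convention: x' = R * x + t, with x treated as a column vector.
 */
public class RigidTransformSketch {

    /** Returns R * p + t for a single point p = {x, y, z}. */
    public static double[] transform(double[][] rot, double[] shift, double[] p) {
        double[] out = new double[3];
        for (int i = 0; i < 3; i++) {
            out[i] = rot[i][0] * p[0] + rot[i][1] * p[1] + rot[i][2] * p[2] + shift[i];
        }
        return out;
    }

    /** Applies the same rotation and shift to every point of a rigid group. */
    public static double[][] transformAll(double[][] rot, double[] shift, double[][] points) {
        double[][] out = new double[points.length][];
        for (int k = 0; k < points.length; k++) {
            out[k] = transform(rot, shift, points[k]);
        }
        return out;
    }

    /** Root-mean-square deviation between two equally sized point sets. */
    public static double rmsd(double[][] a, double[][] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("point sets must be non-empty and of equal size");
        }
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
            for (int i = 0; i < 3; i++) {
                double d = a[k][i] - b[k][i];
                sum += d * d;
            }
        }
        return Math.sqrt(sum / a.length);
    }

    public static void main(String[] args) {
        // 90-degree rotation about the z axis, followed by a shift of (1, 2, 3)
        double[][] rotZ90 = { {0, -1, 0}, {1, 0, 0}, {0, 0, 1} };
        double[] shift = {1, 2, 3};
        double[][] group = { {1, 0, 0}, {0, 1, 0}, {0, 0, 1} };

        double[][] moved = transformAll(rotZ90, shift, group);
        for (double[] p : moved) {
            System.out.println(p[0] + " " + p[1] + " " + p[2]);
        }
        System.out.println("RMSD of moved group to itself: " + rmsd(moved, moved));
    }
}
```

Because the same R and t are applied to every atom, the group moves rigidly and all internal distances are preserved. That is why a superposition can be fitted on a few atoms (N, CA and CB in Andreas's code) and then applied to a whole side chain.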
A "screenshot" of two superimposed residues is available at:
http://www.sanger.ac.uk/Users/ap3/rotation_example.html

This is achieved by using the Jama library, which has been released into the public domain by the US government (i.e. do whatever you want with it). It is located at http://math.nist.gov/javanumerics/jama/

I added the few files needed from this package to the BioJava CVS repository under org.biojava.bio.structure.jama. I thought that BioJava should not have yet another .jar dependency, so including the code is better.

There is now also a class called SVDSuperimposer. It is heavily inspired by some code available from our friends at Biopython... :-)

Thanks also to Peter Lackner for providing an example of how to calculate "virtual" CB atoms.

Now Tamas, back to your problem. I think you want to do something like the code below:

Regards,
Andreas

    // imports assumed by this snippet
    import java.io.FileOutputStream;
    import java.io.PrintStream;
    import org.biojava.bio.structure.*;   // Structure, Group, AminoAcid, Atom, Calc, Chain, ChainImpl, StructureImpl, SVDSuperimposer
    import org.biojava.bio.structure.io.PDBFileReader;
    import org.biojava.bio.structure.jama.Matrix;

    try {
        // get two amino acids from somewhere
        String filename = "/Users/ap3/WORK/PDB/5pti.pdb";

        PDBFileReader pdbreader = new PDBFileReader();
        Structure struc = pdbreader.getStructure(filename);

        Group g1 = (Group) struc.getChain(0).getGroup(56);
        Group g2 = (Group) struc.getChain(0).getGroup(21);

        // glycine has no CB atom, so add a "virtual" one
        if (g1.getPDBName().equals("GLY")) {
            if (g1 instanceof AminoAcid) {
                Atom cb = Calc.createVirtualCBAtom((AminoAcid) g1);
                g1.addAtom(cb);
            }
        }
        if (g2.getPDBName().equals("GLY")) {
            if (g2 instanceof AminoAcid) {
                Atom cb = Calc.createVirtualCBAtom((AminoAcid) g2);
                g2.addAtom(cb);
            }
        }

        System.out.println(g1);
        System.out.println(g2);

        // convert the Groups to Atom arrays
        Atom[] atoms1 = new Atom[3];
        Atom[] atoms2 = new Atom[3];

        atoms1[0] = g1.getAtom("N");
        atoms1[1] = g1.getAtom("CA");
        atoms1[2] = g1.getAtom("CB");

        atoms2[0] = g2.getAtom("N");
        atoms2[1] = g2.getAtom("CA");
        atoms2[2] = g2.getAtom("CB");

        // and do the SVD ...
        SVDSuperimposer svds = new SVDSuperimposer(atoms1, atoms2);

        // the rotation matrix to be applied to group2
        Matrix rotMatrix = svds.getRotation();

        // and the vector to shift group2
        Atom tranMatrix = svds.getTranslation();

        // now we have all the info to perform the rotations ...

        // clone group2 - we want to preserve the original coords for the output later.
        Group newGroup = (Group) g2.clone();

        // and rotate it
        Calc.rotate(newGroup, rotMatrix);

        // shift the group ...
        Calc.shift(newGroup, tranMatrix);

        // that's it!

        // now we finish up with doing some output:
        // write to a file to view in a viewer
        String outputfile = "/Users/ap3/WORK/PDB/rotated.pdb";
        FileOutputStream out = new FileOutputStream(outputfile);
        PrintStream p = new PrintStream(out);

        // create a new structure that contains the data to be written to the file.
        Structure newstruc = new StructureImpl();

        // add the group1
        Chain c1 = new ChainImpl();
        c1.setName("A");
        c1.addGroup(g1);
        newstruc.addChain(c1);

        // add the now correctly positioned group2
        Chain c2 = new ChainImpl();
        c2.setName("B");
        c2.addGroup(newGroup);
        newstruc.addChain(c2);

        // show where the group was originally ...
        Chain c3 = new ChainImpl();
        c3.setName("C");
        //c3.addGroup(g1);
        c3.addGroup(g2);
        newstruc.addChain(c3);

        p.println(newstruc.toPDB());
        p.close();

        System.out.println("wrote to file " + outputfile);

    } catch (Exception e) {
        e.printStackTrace();
    }

On 2 Dec 2005, at 13:02, Tamas Horvath wrote:

> Thanks for the code! I've noticed the methods in Calc, but my main
> question is the following. Let's say I've got a primitive library of
> amino acids. They are stored as groups, and they have all the atoms. When I'm
> mutating the chain, I want to keep the backbone atoms in place, so as
> far as your mutate method goes it's OK. But now I want to replace the
> sidechain.
> In order to do that, I'd shift and rotate the desired AA in
> place (CBs would be identical and the other backbone atoms as close as
> possible), and then copy the sidechain atoms to the mutated AA... (I
> hope that's clear)
>
> So do I have to wrap my Group objects in a Chain/Structure object in
> order to shift and rotate them?
>
> I don't really get how the rotation is supposed to work... what exactly
> is the matrix it asks for?

-----------------------------------------------------------------------
Andreas Prlic      Wellcome Trust Sanger Institute
                   Hinxton, Cambridge CB10 1SA, UK
                   +44 (0) 1223 49 6891

From ap3 at sanger.ac.uk Mon Dec 5 09:40:57 2005
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Mon Dec 5 09:37:16 2005
Subject: [Biojava-l] parsePDB
In-Reply-To: 
References: 
Message-ID: <2ec64bcf7c06324f76606abd8b887255@sanger.ac.uk>

Can you send me one of your files off-list? The parser could parse all of the PDB about one year ago...

And.

-----------------------------------------------------------------------
Andreas Prlic      Wellcome Trust Sanger Institute
                   Hinxton, Cambridge CB10 1SA, UK
                   +44 (0) 1223 49 6891

From mes5k at cs.virginia.edu Mon Dec 5 22:00:49 2005
From: mes5k at cs.virginia.edu (Michael E. Smoot)
Date: Mon Dec 5 22:24:35 2005
Subject: [Biojava-l] Hit_def from blast xml output?
Message-ID: 

Hi,

Can anyone tell me how I might get the value of the Hit_def tag from BLAST XML output? I'm following the cookbook protocol for parsing and extracting results (http://www.biojava.org/docs/bj_in_anger/BlastParser.htm). I see a way to get the subject (hit) id, but not the description.

thanks,
Mike

From mark.schreiber at novartis.com Tue Dec 6 02:44:50 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Tue Dec 6 02:42:59 2005
Subject: [Biojava-l] Hit_def from blast xml output?
Message-ID: 

You may need to customize your BLAST listeners. If you run the blast echo example in BioJava in Anger, you will find out which event type that information is contained in. You can then listen for that event type.

http://www.biojava.org/docs/bj_in_anger/blastecho.htm

- Mark

_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l

From hotafin at gmail.com Thu Dec 8 09:16:47 2005
From: hotafin at gmail.com (Tamas Horvath)
Date: Thu Dec 8 09:14:40 2005
Subject: [Biojava-l] gromacs shell script
Message-ID: 

I know this is not strictly BioJava, but here's my problem:
I create a shell script file that would run a GROMACS MD simulation.
I generate the necessary input and config files.
I can make the generated shell script runnable.
I cannot actually run the script from the Java application.
The script works fine from the shell...
The returned exitValue is 255.
Can anyone tell me what I can do?

From hotafin at gmail.com Thu Dec 8 10:00:49 2005
From: hotafin at gmail.com (Tamas Horvath)
Date: Thu Dec 8 09:58:33 2005
Subject: [Biojava-l] gromacs shell script
In-Reply-To: 
References: 
Message-ID: 

I tried the ProcessTools method and nothing happens there either...

From hotafin at gmail.com Thu Dec 8 09:59:06 2005
From: hotafin at gmail.com (Tamas Horvath)
Date: Thu Dec 8 10:03:36 2005
Subject: [Biojava-l] gromacs shell script
In-Reply-To: 
References: 
Message-ID: 

On 12/8/05, Tamas Horvath wrote:
>
>     Runtime rtime = Runtime.getRuntime();
>     Process child = rtime.exec("/bin/sh");
>
>     BufferedWriter outCommand = new BufferedWriter(new
>             OutputStreamWriter(child.getOutputStream()));
>
>     outCommand.write("cd " + workhome + "; chmod +x run.bat; exit\n");
>     outCommand.flush();
>
>     child.waitFor();
>     child.destroy();

This runs well; the script gets the executable flag.

>     rtime = Runtime.getRuntime();
>     child = rtime.exec(workhome + "run.bat");
>
>     BufferedReader input = new BufferedReader(new
>             InputStreamReader(child.getInputStream()));
>     BufferedReader inerr = new BufferedReader(new
>             InputStreamReader(child.getErrorStream()));
>
>     String line = ""; String lerr = "";
>     while ((line = input.readLine()) != null || (lerr = inerr.readLine()) != null) {
>         if (line != null && !line.equals("")) {
>             System.out.println(line);
>             lerr = inerr.readLine();
>         }
>         if (lerr != null && !lerr.equals("")) System.out.println(lerr);
>     }
>
>     child.waitFor();
>     System.err.println("EV:" + child.exitValue());
>     child.destroy();

Here I only get the exit value, and nothing else.

From td2 at sanger.ac.uk Thu Dec 8 09:46:08 2005
From: td2 at sanger.ac.uk (Thomas Down)
Date: Thu Dec 8 11:49:49 2005
Subject: [Biojava-l] gromacs shell script
In-Reply-To: 
References: 
Message-ID: 

Could you give a few more details about the code you're using to run the shell script from Java? Running external processes using Java's Runtime.exec method isn't totally trivial -- you usually need to start some extra threads to handle the child process' input and output.
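Thomas's point about "extra threads" can be made concrete with the JDK alone. The sketch below is a hedged illustration using java.lang.ProcessBuilder (available since Java 5), not BioJava's ProcessTools; the class ChildProcessRunner, its Result type and the demo command are hypothetical. Each of the child's output streams is read on its own thread, so a child that writes a lot to one stream cannot block on a full pipe while the parent is reading the other, and both reader threads are joined before the output is returned.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

/**
 * Sketch: run a child process and drain its stdout and stderr on separate
 * threads so that neither pipe can fill up and stall the child.
 */
public class ChildProcessRunner {

    /** Exit code plus everything the child wrote to stdout and stderr. */
    public static final class Result {
        public final int exitCode;
        public final String stdout;
        public final String stderr;

        Result(int exitCode, String stdout, String stderr) {
            this.exitCode = exitCode;
            this.stdout = stdout;
            this.stderr = stderr;
        }
    }

    /** Starts a thread that copies every line of {@code in} into {@code sink}. */
    private static Thread drain(final InputStream in, final StringBuffer sink) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                BufferedReader reader = new BufferedReader(new InputStreamReader(in));
                try {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        sink.append(line).append('\n');
                    }
                } catch (IOException e) {
                    sink.append("[error reading stream: ").append(e.getMessage()).append("]\n");
                } finally {
                    try { reader.close(); } catch (IOException ignored) { }
                }
            }
        });
        t.start();
        return t;
    }

    /** Runs {@code command} directly (no shell unless you name one) and waits for it. */
    public static Result run(String[] command) throws IOException, InterruptedException {
        Process child = new ProcessBuilder(command).start();
        child.getOutputStream().close();            // we send no standard input
        StringBuffer out = new StringBuffer();      // StringBuffer is thread-safe
        StringBuffer err = new StringBuffer();
        Thread outThread = drain(child.getInputStream(), out);
        Thread errThread = drain(child.getErrorStream(), err);
        int exit = child.waitFor();
        outThread.join();                           // make sure all output was collected
        errThread.join();
        return new Result(exit, out.toString(), err.toString());
    }

    public static void main(String[] args) throws Exception {
        // Demo command: prints one line to each stream and exits with status 3.
        Result r = run(new String[] {"/bin/sh", "-c", "echo hello; echo oops 1>&2; exit 3"});
        System.out.println("exit code: " + r.exitCode);
        System.out.print("stdout: " + r.stdout);
        System.out.print("stderr: " + r.stderr);
    }
}
```

For the script in this thread, the command array might be something like {"/bin/sh", workhome + "run.bat"} (workhome as in Tamas's code). Naming /bin/sh explicitly also means the script need not be executable. Whether any of this explains the exit value of 255 is not established here; the point is that whatever the script prints to stderr is no longer lost.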
I presume your script is actually printing some kind of error message to standard error (or maybe standard out, if it's badly behaved), but these may be getting lost.

BioJava has some convenience methods that (usually) allow you to run child processes without writing your own multithreaded code. A simple usage, which echoes the child's errors and output to the console, would be something like:

    ProcessTools.exec(
        new String[] {"/path/to/my/script", "-someArgument"},
        null,  // no standard input
        new OutputStreamWriter(System.out),
        new OutputStreamWriter(System.err)
    );

You probably don't want to do this in production code, but for development and debugging it's quite useful. For production use, you'd normally use StringWriters to capture the child process' output.

Thomas.

From ilhami.visne at gmail.com Sun Dec 11 16:57:01 2005
From: ilhami.visne at gmail.com (Ilhami Visne)
Date: Sun Dec 11 17:19:24 2005
Subject: [Biojava-l] Restriction mapping for the whole chromosome sequence
Message-ID: 

Hi,

I want to do restriction mapping for a whole chromosome sequence, e.g. chr1 (~240 MB). It works perfectly for some enzymes, like MsiI, but for another enzyme (MseI) I get an OutOfMemoryError. Why? What is the difference?

Thanks in advance

From mark.schreiber at novartis.com Sun Dec 11 20:15:55 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Sun Dec 11 20:13:22 2005
Subject: [Biojava-l] Restriction mapping for the whole chromosome sequence
Message-ID: 

Have you tried setting the -Xmx option of your JVM?

_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l

From wetrull at yahoo.com Mon Dec 12 19:22:48 2005
From: wetrull at yahoo.com (W. Eric Trull)
Date: Mon Dec 12 19:26:50 2005
Subject: [Biojava-l] SAXException with BLAST errors
Message-ID: <20051213002248.79592.qmail@web81412.mail.mud.yahoo.com>

Hello all,

Some of you may remember that I've been creating a Java application to front a BLAST web service. Everything is working great, except that a user found a random sequence that causes problems (gotta love those users). I'm using the BlastXMLParserFacade to parse NCBI BLAST (2.2.12) XML output. I think I have two problems: one is an NCBI BLAST problem and the other is with BioJava's BlastXMLParserFacade. Any help/advice would be appreciated, especially if I have to explain the problem to NCBI - biology is not my strong suit.

Here is the relevant BioJava stack trace:

org.xml.sax.SAXException: is non-compliant.
    at org.biojava.bio.program.sax.blastxml.HspHandler.endElementHandler(HspHandler.java:362)
    at org.biojava.bio.program.sax.blastxml.StAXFeatureHandler.endElement(StAXFeatureHandler.java:235)
    at org.biojava.utils.stax.SAX2StAXAdaptor.endElement(SAX2StAXAdaptor.java:153)
    at org.apache.xerces.parsers.SAXParser.endElement(SAXParser.java:1403)
    at org.apache.xerces.validators.common.XMLValidator.callEndElement(XMLValidator.java:1456)
    at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java:1260)
    at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:381)
    at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:1081)
    at org.biojava.bio.program.sax.blastxml.BlastXMLParserFacade.parse(BlastXMLParserFacade.java:180)

Here is STDERR from NCBI BLAST on Sun Solaris:

[blastall] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
[blastall] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
[blastall] ERROR: [065.106] : /var/tmp/blast39961.tmpOutput
BlastOutput.iterations.E.hits.E.hsps.E.
Invalid value(s) [-3] in VisibleString [?????????????????----------???????????????????????????????????????????? ...]
Here is what I get from NCBI BLAST on Windows XP:

[NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
[NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
[NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(280) >= len(256)
[NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(313) >= len(256)

Here is how I started BLAST:

/home/etrull/developer/blast-sparc64-solaris-2.2.12/bin/blastall -p blastp -d /home/etrull/developer/blast/current/pdb -i /var/tmp/fasta39960.tmp -m 7 -o /var/tmp/blast39961.tmp -b 0

Here is my input sequence:

MLPRETDEEP EEPGRRGSFV EMVDNLRGKS GQGYYVEMTV GSPPQTLNIL VDTGSSNFAV
GAAPHPFLHR YYQRQLSSTY RDLRKGVYVP YTQGAWAGEL GTDLVSIPHG PNVTVRANIA
AITESDKFFI NGSNWEGILG LAYAEIARPD DSLEPFFDSL VKQTHVPNLF SLQLCGAGFP
LNQSEVLASV GGSMIIGGID HSLYTGSLWY TPIRREWYYE VIIVRVEING QDLKMDCKEY
NYDKSIVDSG TTNLRLPKKV FEAAVKSIKA ASSTEKFPDG FWLGEQLVCW QAGTTPWNIF
PVISLYLMGE VTNQSFRITI LPQQYLRPVE DVATSQDDCY KFAISQSSTG TVMGAVIMEG
FYVVFDRARK RIGFAVSACH VHDEFRTAAV EGPFVTLDME DCGYN

Here is the regular BLAST output for pdb|1ML5|E. It seems odd to me that the identities and positives are both zero - why is this even showing up as a similar sequence?

>pdb|1ML5|E 30S Ribosomal Protein S2
          Length = 256

 Score = 28.1 bits (61), Expect = 5.8
 Identities = 0/71 (0%), Positives = 0/71 (0%), Gaps = 10/71 (14%)

Query: 99  ELGTDLVSIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFD 158

Sbjct: 264 ---------- 313

Query: 159 SLVKQTHVPNL 169

Sbjct: 314             324

Here is the XML BLAST output for pdb|1ML5|E. Notice the second has a bunch of "#" signs. Is this valid in BioJava?

146 pdb|1ML5|E 30S Ribosomal Protein S2 1ML5_E 256 1 28.1054 61 5.76848 99 169 264 324 1 1 10 71
ELGTDLVSIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFDSLVKQTHVPNL
#################----------############################################

Thanks.
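Eric's follow-up of 14 December traces these SeqPortNew errors to the FASTA file used to build the BLAST database: PDB entry 1ML5 has both a chain 'e' and a chain 'E', and upper-casing the chain letter gave two entries with the same defline but different sequences. A pre-check like the hedged sketch below could be run before formatdb. It assumes the identifier is the first whitespace-delimited token after '>'; the class DeflineCollisionCheck and the example records in main are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Sketch: report FASTA identifiers that differ only in letter case, because
 * they would become exact duplicates if the identifiers were upper-cased.
 */
public class DeflineCollisionCheck {

    /** Groups identifiers (first token after '>') that collide case-insensitively. */
    public static List<List<String>> findCaseCollisions(Reader fasta) throws IOException {
        Map<String, List<String>> byKey = new LinkedHashMap<String, List<String>>();
        BufferedReader reader = new BufferedReader(fasta);
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.startsWith(">")) {
                continue; // sequence line, not a defline
            }
            String id = line.substring(1).trim().split("\\s+")[0];
            String key = id.toUpperCase(Locale.ROOT);
            List<String> ids = byKey.get(key);
            if (ids == null) {
                ids = new ArrayList<String>();
                byKey.put(key, ids);
            }
            ids.add(id);
        }
        List<List<String>> collisions = new ArrayList<List<String>>();
        for (List<String> ids : byKey.values()) {
            if (ids.size() > 1) {
                collisions.add(ids);
            }
        }
        return collisions;
    }

    public static void main(String[] args) throws IOException {
        // Made-up records: only the two 1ML5 chains differ just by case.
        String fasta = ">pdb|1ML5|e chain e\nMKV\n>pdb|1ML5|E chain E\nGSA\n>pdb|1ABC|A\nPPL\n";
        System.out.println(findCaseCollisions(new StringReader(fasta)));
    }
}
```

Any group it reports needs distinct identifiers (for example, by keeping the chain letter's original case) before the file is handed to formatdb.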
-Eric Trull

From mark.schreiber at novartis.com Mon Dec 12 20:37:59 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Mon Dec 12 20:35:44 2005
Subject: [Biojava-l] SAXException with BLAST errors
Message-ID: 

Not exactly sure what the problem is here, but it looks like your input is not in FASTA format, so that might be causing a problem??

_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l

From wetrull at yahoo.com Mon Dec 12 20:42:30 2005
From: wetrull at yahoo.com (W. Eric Trull)
Date: Mon Dec 12 20:46:32 2005
Subject: [Biojava-l] SAXException with BLAST errors
In-Reply-To: 
Message-ID: <20051213014230.61941.qmail@web81405.mail.mud.yahoo.com>

No, I use BioJava to write the user's query sequence as a FASTA file before feeding it to BLAST. I just copied a differently formatted sequence into my post.

Thanks.

-Eric Trull

--- mark.schreiber@novartis.com wrote:

> Not exactly sure what the problem is here but it looks like your input is
> not in FASTA format so that might be causing a problem??

Thanks.
-W. Eric Trull

From wetrull at yahoo.com Wed Dec 14 11:58:28 2005
From: wetrull at yahoo.com (W. Eric Trull)
Date: Wed Dec 14 12:02:27 2005
Subject: [Biojava-l] SAXException with BLAST errors
In-Reply-To: 
Message-ID: <20051214165828.34425.qmail@web81402.mail.mud.yahoo.com>

Thanks for the suggestion, Mark. I emailed NCBI and the gist of the reply was:

    These SeqPortNew errors usually indicate a problem in the formatting
    process; the #'s are certainly not normal. Is this the only database
    entry that generates errors?

So I dug a little deeper on 1ML5 and discovered that it has a chain 'e' and a chain 'E'. When I created my FASTA file to feed to formatdb, I made the deflines of the form pdb||, but in uppercase. So I had two entries with the same defline but different sequences. I think this is my problem and am working on fixing it now.

Thanks.

-Eric Trull

--- mark.schreiber@novartis.com wrote:

> I would send NCBI your test sequence, the blast output and the version of
> BLAST and ask them if this is "normal". I have found them to be very
> responsive in the past. If it is normal then we need to fix biojava to
> cope.
>
> - Mark
>
> "W. Eric Trull"
> 12/13/2005 09:42 AM
>
> To: Mark Schreiber/GP/Novartis@PH
> cc: biojava-l@biojava.org, biojava-l-bounces@portal.open-bio.org
> Subject: Re: [Biojava-l] SAXException with BLAST errors
>
> No, I use BioJava to write the user's query sequence as a fasta file
> before
> feeding it to BLAST.
I just copied a differently formatted sequence into > my > post. > > Thanks. > > -Eric Trull > > --- mark.schreiber@novartis.com wrote: > > > Not exactly sure what the problem is here but it looks like your input > is > > not in FASTA format so that might be causing a problem?? > > > > > > > > > > > > "W. Eric Trull" > > Sent by: biojava-l-bounces@portal.open-bio.org > > 12/13/2005 08:22 AM > > > > > > To: biojava-l@biojava.org > > cc: (bcc: Mark Schreiber/GP/Novartis) > > Subject: [Biojava-l] SAXException with BLAST errors > > > > > > Hello all, > > > > Some of you may remember that I've been creating a Java application to > > front > > a BLAST web service. Everything is working great except some user found > > > the > > random sequence that causes problems (gotta love those users). I'm > using > > the > > BlastXMLParserFacade to parse NCBI BLAST (2.2.12) XML output. I think I > > > have > > two problems; one is a NCBI BLAST problem and the other is with > BioJava's > > BlastXMLParserFacade. Any help/advice would be appreciated, especially > if > > I > > have to explain the problem to NCBI - biology is not my strong suit. > > > > Here is the relevant BioJava stack trace: > > > > org.xml.sax.SAXException: is non-compliant. 
> > at > > > org.biojava.bio.program.sax.blastxml.HspHandler.endElementHandler(HspHandler.java:362) > > at > > > org.biojava.bio.program.sax.blastxml.StAXFeatureHandler.endElement(StAXFeatureHandler.java:235) > > at > > > org.biojava.utils.stax.SAX2StAXAdaptor.endElement(SAX2StAXAdaptor.java:153) > > at > > org.apache.xerces.parsers.SAXParser.endElement(SAXParser.java:1403) > > at > > > org.apache.xerces.validators.common.XMLValidator.callEndElement(XMLValidator.java:1456) > > at > > > org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java:1260) > > at > > > org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:381) > > at > > org.apache.xerces.framework.XMLParser.parse(XMLParser.java:1081) > > at > > > org.biojava.bio.program.sax.blastxml.BlastXMLParserFacade.parse(BlastXMLParserFacade.java:180) > > > > Here is STDERR from NCBI BLAST on Sun Solaris: > > > > [blastall] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) > > > >= > > len(256) > > [blastall] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) > > > >= > > len(256) > > [blastall] ERROR: [065.106] : /var/tmp/blast39961.tmpOutput > > BlastOutput.iterations.E.hits.E.hsps.E. > > Invalid value(s) [-3] in VisibleString > > [?????????????????----------???????????????????????????????????????????? > > > ...] 
> > > > Here is what I get from NCBI BLAST on Windows XP: > > > > [NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E > > start(263) > > >= > > len(256) > > [NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E > > start(263) > > >= > > len(256) > > [NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E > > start(280) > > >= > > len(256) > > [NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E > > start(313) > > >= > > len(256) > > > > Here is how I started BLAST: > > > > /home/etrull/developer/blast-sparc64-solaris-2.2.12/bin/blastall -p > blastp > > -d > > /home/etrull/developer/blast/current/pdb -i /var/tmp/fasta39960.tmp -m 7 > > > -o > > /var/tmp/blast39961.tmp -b 0 > > > > Here is my input sequence: > > > > MLPRETDEEP EEPGRRGSFV EMVDNLRGKS GQGYYVEMTV GSPPQTLNIL VDTGSSNFAV > > GAAPHPFLHR > > YYQRQLSSTY RDLRKGVYVP YTQGAWAGEL GTDLVSIPHG PNVTVRANIA AITESDKFFI > > NGSNWEGILG > > LAYAEIARPD DSLEPFFDSL VKQTHVPNLF SLQLCGAGFP LNQSEVLASV GGSMIIGGID > > HSLYTGSLWY > > TPIRREWYYE VIIVRVEING QDLKMDCKEY NYDKSIVDSG TTNLRLPKKV FEAAVKSIKA > > ASSTEKFPDG > > FWLGEQLVCW QAGTTPWNIF PVISLYLMGE VTNQSFRITI LPQQYLRPVE DVATSQDDCY > > KFAISQSSTG > > TVMGAVIMEG FYVVFDRARK RIGFAVSACH VHDEFRTAAV EGPFVTLDME > > DCGYN > > > > Here is the regular BLAST output for pdb|1ML5|E. It seems odd to me > that > > the > > identities and positives are both zero - why is this even showing up as > a > > similar sequence? > > > > >pdb|1ML5|E 30S Ribosomal Protein S2 > > Length = 256 > > > > Score = 28.1 bits (61), Expect = 5.8 > > Identities = 0/71 (0%), Positives = 0/71 (0%), Gaps = 10/71 (14%) > > > > Query: 99 ELGTDLVSIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFD > > 158 > > > > Sbjct: 264 ---------- 313 > > > > Query: 159 SLVKQTHVPNL 169 > > > > Sbjct: 314 324 > > > > > > Here is the XML BLAST output for pdb|1ML5|E. Notice the second > > > has a bunch of "#" signs. Is this valid in BioJava? 
> > > > > > 146 > > pdb|1ML5|E > > 30S Ribosomal Protein S2 > > 1ML5_E > > 256 > > > > > > 1 > > 28.1054 > > 61 > > 5.76848 > > 99 > === message truncated === From christoph.gille at charite.de Thu Dec 15 05:22:29 2005 From: christoph.gille at charite.de (Dr. Christoph Gille) Date: Thu Dec 15 05:29:14 2005 Subject: [Biojava-l] BLAST (ncbi-blast, wu-blast, web-blast) Message-ID: <65052.84.190.28.222.1134642149.squirrel@webmail.charite.de> In recent weeks we have had many discussions about BLAST, which shows that BLAST is of great interest. To my knowledge, there is no class in BioJava for invoking BLAST searches yet. Therefore I would like to discuss the new BLAST API with you. It is a Java wrapper for local NCBI BLAST, local WU-BLAST, and Web BLAST at http://www.ebi.ac.uk. Please have a look at the API and tell me your opinion. Have I missed something? Are the method names OK? http://www.charite.de/bioinf/strap/Scripting.html#SequenceBlaster http://www.charite.de/bioinf/strap/biojavaInAnger_SequenceBlaster.html Please send your suggestions. Here is a short description: Implementations of SequenceBlaster produce XML output which can be parsed with org.biojava.bio.program.sax.BlastLikeSAXParser. There is also a simple non-BioJava, DOM-based parser, which is currently used only to produce human-readable output. The wrapper provides a cache so that one and the same BLAST search is not run twice. If a re-run is intended, however, the BLAST result must be removed from the cache before invoking compute(). The NCBI and WU-BLAST implementations read the shell variables BLASTDB and WUBLASTDB, respectively, which point to the directories where the databases are located. This works even with Java 1.4, where the getenv() method is deprecated and unusable. The Java wrapper can thus determine which databases are available. The API is thread-safe, so you can perform the computation outside the event-dispatching thread.
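Christoph's description of the wrapper's cache (a repeated search is answered from the cache, and forcing a re-run means removing the cached result first) together with the thread-safety claim suggests a simple structure. Below is a minimal plain-Java sketch of such a cache under those assumptions; it is not the SequenceBlaster code, and `BlastKey`, `BlastCache` and the `runner` function that would actually launch blastall are hypothetical names.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical key for one BLAST run: same program, database and query give the same result. */
final class BlastKey {
    final String program;   // e.g. "blastp"
    final String database;  // e.g. "pdb"
    final String query;     // residues with whitespace removed, upper-cased

    BlastKey(String program, String database, String query) {
        this.program = program;
        this.database = database;
        // "MLPRETDEEP EEPGRR..." and "mlpretdeepeepgrr..." should hit the same cache entry
        this.query = query.replaceAll("\\s+", "").toUpperCase();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof BlastKey)) return false;
        BlastKey k = (BlastKey) o;
        return program.equals(k.program) && database.equals(k.database) && query.equals(k.query);
    }

    @Override
    public int hashCode() {
        return Objects.hash(program, database, query);
    }
}

/** Thread-safe memo of BLAST output; the runner is invoked only on a cache miss. */
final class BlastCache {
    private final Map<BlastKey, String> results = new ConcurrentHashMap<>();
    private final Function<BlastKey, String> runner; // stand-in for launching blastall -m 7

    BlastCache(Function<BlastKey, String> runner) {
        this.runner = runner;
    }

    /** Returns the cached XML, running the search at most once per key, even under concurrent calls. */
    String compute(BlastKey key) {
        return results.computeIfAbsent(key, runner);
    }

    /** Drops a cached result so that the next compute() re-runs the search. */
    void invalidate(BlastKey key) {
        results.remove(key);
    }
}
```

`ConcurrentHashMap.computeIfAbsent` applies the function at most once per key, so two threads asking for the same search share a single run. The trade-off is that some other updates to the map may block while a long search is in progress; a map of `FutureTask` objects avoids that at the cost of a little more code.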
Christoph From ilueny at yahoo.com.br Thu Dec 15 15:08:40 2005 From: ilueny at yahoo.com.br (Ilueny Santos) Date: Thu Dec 15 15:12:33 2005 Subject: [Biojava-l] somebody can help me? Message-ID: <20051215200840.95165.qmail@web53901.mail.yahoo.com> Hello, I am Brazilian, my name is Ilueny Santos, and I am new to the world of bioinformatics. I am writing my undergraduate thesis in this area, and its subject is "Bayesian Classifiers and Their Applications in the Recognition of Prokaryotic Promoters". I program in Java and I am trying to use BioJava to locate the -10 box and -35 box regions in DNA. Can somebody help me? Please forgive me, my English is not very good. Thanks in advance. From m.fortner at sbcglobal.net Thu Dec 15 16:55:11 2005 From: m.fortner at sbcglobal.net (Mark Fortner) Date: Thu Dec 15 16:59:13 2005 Subject: [Biojava-l] Sequence Iteration in BioJava(x) Message-ID: <43A1E63F.2060302@sbcglobal.net> I'm looking for the best way to iterate through all nmers within a given sequence. For example, given a sequence that looks like this: ACTGACTGACTG If I extract all trimers from this I should get: ACT CTG TGA GAC ACT CTG TGA GAC ACT CTG Is there an existing class that will allow me to iterate through a sequence this way, or am I on my own? Regards, Mark Fortner From smh1008 at cam.ac.uk Thu Dec 15 18:34:02 2005 From: smh1008 at cam.ac.uk (David Huen) Date: Thu Dec 15 18:53:50 2005 Subject: [Biojava-l] Sequence Iteration in BioJava(x) In-Reply-To: <43A1E63F.2060302@sbcglobal.net> References: <43A1E63F.2060302@sbcglobal.net> Message-ID: On Dec 15 2005, Mark Fortner wrote: I think what you want is the SymbolListViews.orderNSymbolList method. It will take a SymbolList and turn it into another where it is viewed in another compound alphabet of the required order. >I'm looking for the best way to iterate through all >nmers within a given sequence.
For example, given a >sequence that looks like this: > >ACTGACTGACTG > >If I extract all trimers from this I should get: > >ACT >CTG >TGA >GAC >ACT >CTG >TGA >GAC >ACT >CTG > >Is there an existing class that will allow me to >iterate through a sequence this way, or am I on my >own? > From mark.schreiber at novartis.com Thu Dec 15 20:06:01 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Thu Dec 15 20:03:20 2005 Subject: [Biojava-l] somebody can help me? Message-ID: Hello - If you want to make a Bayesian classifier you would most likely use the org.biojava.bio.dist package to calculate distributions of nucleotide frequency. Hope this helps, - Mark Ilueny Santos Sent by: biojava-l-bounces@portal.open-bio.org 12/16/2005 04:08 AM To: biojava-l@biojava.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-l] somebody can help me? Hello, I am Brazilian, my name is Ilueny Santos, and I am new to the world of bioinformatics. I am writing my undergraduate thesis in this area, and its subject is "Bayesian Classifiers and Their Applications in the Recognition of Prokaryotic Promoters". I program in Java and I am trying to use BioJava to locate the -10 box and -35 box regions in DNA. Can somebody help me? Please forgive me, my English is not very good. Thanks in advance. _______________________________________________ Biojava-l mailing list - Biojava-l@biojava.org http://biojava.org/mailman/listinfo/biojava-l From hollandr at gis.a-star.edu.sg Thu Dec 15 20:43:52 2005 From: hollandr at gis.a-star.edu.sg (Richard HOLLAND) Date: Thu Dec 15 20:53:54 2005 Subject: [Biojava-l] Sequence Iteration in BioJava(x) Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D560289534C@BIONIC.biopolis.one-north.com> orderNSymbolList splits the sequence into non-overlapping chunks.
What is required here are chunks that each start one base further on than the previous chunk. The simplest way would be this:

    import org.biojava.bio.symbol.SymbolList;

    SymbolList mySeq; // this is your sequence from somewhere else
    for (int i = 1; i <= mySeq.length() - 2; i++) {
        // coords are inclusive, so i to i+2 = 3 bases
        SymbolList trimer = mySeq.subSeq(i, i + 2);
        // do something with your trimer here
    }

Note that the index starts at 1 and the last window, subSeq(length()-2, length()), ends at and includes position length(), as symbols in a SymbolList are 1-indexed, not 0-indexed. cheers, Richard Richard Holland Bioinformatics Specialist GIS extension 8199 --------------------------------------------- This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you. --------------------------------------------- > -----Original Message----- > From: biojava-l-bounces@portal.open-bio.org > [mailto:biojava-l-bounces@portal.open-bio.org] On Behalf Of David Huen > Sent: Friday, December 16, 2005 7:34 AM > To: m.fortner@sbcglobal.net > Cc: biojava-list > Subject: Re: [Biojava-l] Sequence Iteration in BioJava(x) > > > On Dec 15 2005, Mark Fortner wrote: > I think what you want is the SymbolListViews.orderNSymbolList method. > > It will take a SymbolList and turn it into another where it > is viewed in > another compound alphabet of the required order. > > > >I'm looking for the best way to iterate through all > >nmers within a given sequence. For example, given a > >sequence that looks like this: > > > >ACTGACTGACTG > > > >If I extract all trimers from this I should get: > > > >ACT > >CTG > >TGA > >GAC > >ACT > >CTG > >TGA > >GAC > >ACT > >CTG > > > >Is there an existing class that will allow me to > >iterate through a sequence this way, or am I on my > >own?
> > > > _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l > From mark.schreiber at novartis.com Thu Dec 15 21:33:48 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Thu Dec 15 21:33:26 2005 Subject: [Biojava-l] Sequence Iteration in BioJava(x) Message-ID: Actually, orderNSymbolList gives overlapping N-mers; windowedSymbolList gives non-overlapping N-mers. Given the sequence actcgcatgcgatcgcag, orderNSymbolList (with an order of 4) would give actc, ctcg, tcgc, etc., whereas windowedSymbolList with a window size of 4 would give actc, gcat, gcga, etc. Eventually windowedSymbolList would actually throw an exception, because the length of the sequence above is not evenly divisible by 4 (seq.length() % 4 != 0). - Mark Mark Schreiber Research Investigator (Bioinformatics) Novartis Institute for Tropical Diseases (NITD) 10 Biopolis Road #05-01 Chromos Singapore 138670 www.nitd.novartis.com phone +65 6722 2973 fax +65 6722 2910 "Richard HOLLAND" Sent by: biojava-l-bounces@portal.open-bio.org 12/16/2005 09:43 AM To: "David Huen" , cc: biojava-list , (bcc: Mark Schreiber/GP/Novartis) Subject: RE: [Biojava-l] Sequence Iteration in BioJava(x) orderNSymbolList splits the sequence into non-overlapping chunks. What is required here is chunks that are only one base different (further on) than the previous chunk. The simplest way would be this: SymbolList mySeq; // this is your sequence from somewhere else for (int i = 1 ; i <= mySeq.length()-2; i++) { SymbolList trimer = mySeq.subSeq(i,i+2); // coords are inclusive so i to i+2 = 3 bases // do something with your trimer here } Note that the index starts at 1 and goes right up to and including length(), as symbols in a SymbolList are 1-indexed, not 0-indexed. cheers, Richard Richard Holland Bioinformatics Specialist GIS extension 8199 --------------------------------------------- This email is confidential and may be privileged.
If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you. --------------------------------------------- > -----Original Message----- > From: biojava-l-bounces@portal.open-bio.org > [mailto:biojava-l-bounces@portal.open-bio.org] On Behalf Of David Huen > Sent: Friday, December 16, 2005 7:34 AM > To: m.fortner@sbcglobal.net > Cc: biojava-list > Subject: Re: [Biojava-l] Sequence Iteration in BioJava(x) > > > On Dec 15 2005, Mark Fortner wrote: > I think what you want is the SymbolListViews.orderNSymbolList method. > > It will take a SymbolList and turn it into another where it > is viewed in > another compound alphabet of the required order. > > > >I'm looking for the best way to iterate through all > >nmers within a given sequence. For example, given a > >sequence that looks like this: > > > >ACTGACTGACTG > > > >If I extract all trimers from this I should get: > > > >ACT > >CTG > >TGA > >GAC > >ACT > >CTG > >TGA > >GAC > >ACT > >CTG > > > >Is there an existing class that will allow me to > >iterate through a sequence this way, or am I on my > >own? > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l > _______________________________________________ Biojava-l mailing list - Biojava-l@biojava.org http://biojava.org/mailman/listinfo/biojava-l From m.fortner at sbcglobal.net Thu Dec 15 21:36:11 2005 From: m.fortner at sbcglobal.net (Mark Fortner) Date: Thu Dec 15 21:40:13 2005 Subject: [Biojava-l] Sequence Iteration in BioJava(x) In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D560289534C@BIONIC.biopolis.one-north.com> References: <6D9E9B9DF347EF4385F6271C64FB8D560289534C@BIONIC.biopolis.one-north.com> Message-ID: <43A2281B.7010609@sbcglobal.net> Richard, Thanks for the example. 
Your approach is very similar to a non-BioJava approach that I had worked out earlier. I was wondering if the BioJava(x) API provides any performance benefit over simply running a window along a character stream? The work that we're doing involves iterating through the human genome, (and in a number of cases, metagenomic sequences) and we're trying to squeeze as much performance out of it as possible while minimizing the memory footprint. Thanks, Mark Richard HOLLAND wrote: >orderNSymbolList splits the sequence into non-overlapping chunks. What >is required here is chunks that are only one base different (further on) >than the previous chunk. > >The simplest way would be this: > > SymbolList mySeq; // this is your sequence from somewhere else > for (int i = 1 ; i <= mySeq.length()-2; i++) { > SymbolList trimer = mySeq.subSeq(i,i+2); // coords are >inclusive so i to i+2 = 3 bases > // do something with your trimer here > } > >Note that the index starts at 1 and goes right up to and including >length(), as symbols in a SymbolList are 1-indexed, not 0-indexed. > >cheers, >Richard > >Richard Holland >Bioinformatics Specialist >GIS extension 8199 >--------------------------------------------- >This email is confidential and may be privileged. If you are not the >intended recipient, please delete it and notify us immediately. Please >do not copy or use it for any purpose, or disclose its content to any >other person. Thank you. >--------------------------------------------- > > > > >>-----Original Message----- >>From: biojava-l-bounces@portal.open-bio.org >>[mailto:biojava-l-bounces@portal.open-bio.org] On Behalf Of David Huen >>Sent: Friday, December 16, 2005 7:34 AM >>To: m.fortner@sbcglobal.net >>Cc: biojava-list >>Subject: Re: [Biojava-l] Sequence Iteration in BioJava(x) >> >> >>On Dec 15 2005, Mark Fortner wrote: >>I think what you want is the SymbolListViews.orderNSymbolList method. 
>> >>It will take a SymbolList and turn it into another where it >>is viewed in >>another compound alphabet of the required order. >> >> >> >> >>>I'm looking for the best way to iterate through all >>>nmers within a given sequence. For example, given a >>>sequence that looks like this: >>> >>>ACTGACTGACTG >>> >>>If I extract all trimers from this I should get: >>> >>>ACT >>>CTG >>>TGA >>>GAC >>>ACT >>>CTG >>>TGA >>>GAC >>>ACT >>>CTG >>> >>>Is there an existing class that will allow me to >>>iterate through a sequence this way, or am I on my >>>own? >>> >>> >>> >>_______________________________________________ >>Biojava-l mailing list - Biojava-l@biojava.org >>http://biojava.org/mailman/listinfo/biojava-l >> >> >> > > > From hollandr at gis.a-star.edu.sg Thu Dec 15 21:57:15 2005 From: hollandr at gis.a-star.edu.sg (Richard HOLLAND) Date: Thu Dec 15 21:55:13 2005 Subject: [Biojava-l] Sequence Iteration in BioJava(x) Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5602895362@BIONIC.biopolis.one-north.com> Mark's comments earlier make my sample code redundant. I had the two different window thingies confused. See his post for more details! cheers, Richard Richard Holland Bioinformatics Specialist GIS extension 8199 --------------------------------------------- This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you. --------------------------------------------- > -----Original Message----- > From: biojava-l-bounces@portal.open-bio.org > [mailto:biojava-l-bounces@portal.open-bio.org] On Behalf Of > Mark Fortner > Sent: Friday, December 16, 2005 10:36 AM > To: biojava-list > Subject: Re: [Biojava-l] Sequence Iteration in BioJava(x) > > > Richard, > Thanks for the example. Your approach is very similar to a > non-BioJava > approach that I had worked out earlier. 
I was wondering if the > BioJava(x) API provides any performance benefit over simply running a > window along a character stream? > > The work that we're doing involves iterating through the > human genome, > (and in a number of cases, metagenomic sequences) and we're trying to > squeeze as much performance out of it as possible while > minimizing the > memory footprint. > > Thanks, > > Mark > > Richard HOLLAND wrote: > > >orderNSymbolList splits the sequence into non-overlapping > chunks. What > >is required here is chunks that are only one base different > (further on) > >than the previous chunk. > > > >The simplest way would be this: > > > > SymbolList mySeq; // this is your sequence from somewhere else > > for (int i = 1 ; i <= mySeq.length()-2; i++) { > > SymbolList trimer = mySeq.subSeq(i,i+2); // coords are > >inclusive so i to i+2 = 3 bases > > // do something with your trimer here > > } > > > >Note that the index starts at 1 and goes right up to and including > >length(), as symbols in a SymbolList are 1-indexed, not 0-indexed. > > > >cheers, > >Richard > > > >Richard Holland > >Bioinformatics Specialist > >GIS extension 8199 > >--------------------------------------------- > >This email is confidential and may be privileged. If you are not the > >intended recipient, please delete it and notify us > immediately. Please > >do not copy or use it for any purpose, or disclose its content to any > >other person. Thank you. > >--------------------------------------------- > > > > > > > > > >>-----Original Message----- > >>From: biojava-l-bounces@portal.open-bio.org > >>[mailto:biojava-l-bounces@portal.open-bio.org] On Behalf Of > David Huen > >>Sent: Friday, December 16, 2005 7:34 AM > >>To: m.fortner@sbcglobal.net > >>Cc: biojava-list > >>Subject: Re: [Biojava-l] Sequence Iteration in BioJava(x) > >> > >> > >>On Dec 15 2005, Mark Fortner wrote: > >>I think what you want is the > SymbolListViews.orderNSymbolList method. 
> >> > >>It will take a SymbolList and turn it into another where it > >>is viewed in > >>another compound alphabet of the required order. > >> > >> > >> > >> > >>>I'm looking for the best way to iterate through all > >>>nmers within a given sequence. For example, given a > >>>sequence that looks like this: > >>> > >>>ACTGACTGACTG > >>> > >>>If I extract all trimers from this I should get: > >>> > >>>ACT > >>>CTG > >>>TGA > >>>GAC > >>>ACT > >>>CTG > >>>TGA > >>>GAC > >>>ACT > >>>CTG > >>> > >>>Is there an existing class that will allow me to > >>>iterate through a sequence this way, or am I on my > >>>own? > >>> > >>> > >>> > >>_______________________________________________ > >>Biojava-l mailing list - Biojava-l@biojava.org > >>http://biojava.org/mailman/listinfo/biojava-l > >> > >> > >> > > > > > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l > From mark.schreiber at novartis.com Thu Dec 15 22:45:21 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Thu Dec 15 22:43:44 2005 Subject: [Biojava-l] Sequence Iteration in BioJava(x) Message-ID: There is probably not any performance benefit except in the case of very large sequences, which are often compressed behind the scenes by BioJava. The benefits may come from ease of use and object orientation. For example, there is probably already a parser to read in and validate your sequence, and the windowing or n-mer code is already figured out and has been used by lots of people, so it's been "stress tested". Also, the objects themselves have a lot of functionality built in that a character stream does not. The downside of using objects is that they take up memory and there is a certain amount of overhead in their construction. To help overcome this, SymbolLists are actually lists of references to Symbols, not lists of Symbols themselves. This makes them much smaller (although not as small as char[] arrays).
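For comparison with the reference-based SymbolList representation just described, here is a plain-Java sketch of the lower-level route raised in this thread: running a window along the sequence one base at a time (the overlapping trimers of the original question) while packing each k-mer into 2 bits per base inside a `long`. It does not use BioJava; `KmerScanner` and its methods are hypothetical names, and treating any non-ACGT character as a window reset is an assumption.

```java
import java.util.function.LongConsumer;

/** Hypothetical helper: overlapping k-mers packed 2 bits per base (A=0, C=1, G=2, T=3). */
final class KmerScanner {
    /** 2-bit code for a nucleotide, or -1 for anything else (N, gaps, ...). */
    static int code(char c) {
        switch (Character.toUpperCase(c)) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default:  return -1;
        }
    }

    /**
     * Slides a window of width k (1..31) along seq one base at a time and passes each
     * k-mer's packed code to sink. Windows containing a non-ACGT character are skipped.
     */
    static void scan(CharSequence seq, int k, LongConsumer sink) {
        if (k < 1 || k > 31) throw new IllegalArgumentException("k must be in 1..31");
        long mask = (1L << (2 * k)) - 1; // keeps the low 2k bits, i.e. the last k bases
        long kmer = 0;
        int run = 0;                      // length of the current run of ACGT bases
        for (int i = 0; i < seq.length(); i++) {
            int c = code(seq.charAt(i));
            if (c < 0) {                  // restart the window after an N or similar
                run = 0;
                kmer = 0;
                continue;
            }
            kmer = ((kmer << 2) | c) & mask; // shift in the new base, drop the oldest
            if (++run >= k) sink.accept(kmer);
        }
    }

    /** Unpacks a code back into bases, e.g. for printing. */
    static String decode(long kmer, int k) {
        char[] bases = new char[k];
        for (int i = k - 1; i >= 0; i--) {
            bases[i] = "ACGT".charAt((int) (kmer & 3));
            kmer >>>= 2;
        }
        return new String(bases);
    }
}
```

Scanning `ACTGACTGACTG` with k = 3 reports the ten overlapping trimers listed in the original question (ACT, CTG, TGA, GAC, ...). The callback avoids allocating one object per k-mer, which matters at genome scale; whether it actually outperforms SymbolList-based code on a given data set is something to measure rather than assume.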
If you want superfast performance then you should bit encode the data and operate over it with memory pointers as in C or machine code. You should be aware though that any intensive loop like the ones that would be used to carry out this operation in biojava will almost certainly be detected and compiled into native code by the Java Runtime on the fly. This might make it hard to say if the java code would be much slower than the C code. - Mark Mark Fortner Sent by: biojava-l-bounces@portal.open-bio.org 12/16/2005 10:36 AM Please respond to m.fortner To: biojava-list cc: (bcc: Mark Schreiber/GP/Novartis) Subject: Re: [Biojava-l] Sequence Iteration in BioJava(x) Richard, Thanks for the example. Your approach is very similar to a non-BioJava approach that I had worked out earlier. I was wondering if the BioJava(x) API provides any performance benefit over simply running a window along a character stream? The work that we're doing involves iterating through the human genome, (and in a number of cases, metagenomic sequences) and we're trying to squeeze as much performance out of it as possible while minimizing the memory footprint. Thanks, Mark Richard HOLLAND wrote: >orderNSymbolList splits the sequence into non-overlapping chunks. What >is required here is chunks that are only one base different (further on) >than the previous chunk. > >The simplest way would be this: > > SymbolList mySeq; // this is your sequence from somewhere else > for (int i = 1 ; i <= mySeq.length()-2; i++) { > SymbolList trimer = mySeq.subSeq(i,i+2); // coords are >inclusive so i to i+2 = 3 bases > // do something with your trimer here > } > >Note that the index starts at 1 and goes right up to and including >length(), as symbols in a SymbolList are 1-indexed, not 0-indexed. > >cheers, >Richard > >Richard Holland >Bioinformatics Specialist >GIS extension 8199 >--------------------------------------------- >This email is confidential and may be privileged. 
If you are not the >intended recipient, please delete it and notify us immediately. Please >do not copy or use it for any purpose, or disclose its content to any >other person. Thank you. >--------------------------------------------- > > > > >>-----Original Message----- >>From: biojava-l-bounces@portal.open-bio.org >>[mailto:biojava-l-bounces@portal.open-bio.org] On Behalf Of David Huen >>Sent: Friday, December 16, 2005 7:34 AM >>To: m.fortner@sbcglobal.net >>Cc: biojava-list >>Subject: Re: [Biojava-l] Sequence Iteration in BioJava(x) >> >> >>On Dec 15 2005, Mark Fortner wrote: >>I think what you want is the SymbolListViews.orderNSymbolList method. >> >>It will take a SymbolList and turn it into another where it >>is viewed in >>another compound alphabet of the required order. >> >> >> >> >>>I'm looking for the best way to iterate through all >>>nmers within a given sequence. For example, given a >>>sequence that looks like this: >>> >>>ACTGACTGACTG >>> >>>If I extract all trimers from this I should get: >>> >>>ACT >>>CTG >>>TGA >>>GAC >>>ACT >>>CTG >>>TGA >>>GAC >>>ACT >>>CTG >>> >>>Is there an existing class that will allow me to >>>iterate through a sequence this way, or am I on my >>>own? >>> >>> >>> >>_______________________________________________ >>Biojava-l mailing list - Biojava-l@biojava.org >>http://biojava.org/mailman/listinfo/biojava-l >> >> >> > > > _______________________________________________ Biojava-l mailing list - Biojava-l@biojava.org http://biojava.org/mailman/listinfo/biojava-l From m.fortner at sbcglobal.net Thu Dec 15 23:09:42 2005 From: m.fortner at sbcglobal.net (Mark Fortner) Date: Thu Dec 15 23:13:48 2005 Subject: [Biojava-l] Sequence Iteration in BioJava(x) In-Reply-To: References: Message-ID: <43A23E06.8040202@sbcglobal.net> Mark, Thanks for the info. This is sort of a test project for us. We have a few classes and data structures in C++ that handle operations like sequence io and packing, and are fairly fast. 
However, we've also come to the realization that we've spent a lot of time dealing with cross-platform and compiler-related problems, and if Java can give us comparable performance then we might switch to it. If nothing else, the opportunity costs would be lower, since we could write and test more code in the same amount of time. The tools are a good deal better for Java development than for C++. We're at the point where we can either continue to invest time in our library or rewrite what we have using BioJava and other libraries. I've written a lot of Java code over the past 10 years and suggested that we try Java, using both the standard javac compiler and gcj, to see if we can get C++-like performance. Thanks for your help, Mark mark.schreiber@novartis.com wrote: >There is probably not any performance benefit except in the case of very >large sequences which are often compressed behind the scenes by biojava. > >The benefits may come from ease of use and object orientation. > >eg, There is probably already a parser to read in and validate your >sequence, The windowing or nMer stuff is already figured out and has been >used by lots of people so it's been "stress tested". Also the objects >themselves have a lot of functionality built in that a character stream >does not. The downside of using objects is they take up memory and there >is a certain amount of overhead in their construction. To help overcome >this SymbolLists are actually lists of references to Symbols not lists of >Symbols themselves. This makes them much smaller (although not as small as >char[]'s). > >If you want superfast performance then you should bit encode the data and >operate over it with memory pointers as in C or machine code. You should >be aware though that any intensive loop like the ones that would be used >to carry out this operation in biojava will almost certainly be detected >and compiled into native code by the Java Runtime on the fly.
This might >make it hard to say if the java code would be much slower than the C code. > >- Mark > > > > > >Mark Fortner >Sent by: biojava-l-bounces@portal.open-bio.org >12/16/2005 10:36 AM >Please respond to m.fortner > > > To: biojava-list > cc: (bcc: Mark Schreiber/GP/Novartis) > Subject: Re: [Biojava-l] Sequence Iteration in BioJava(x) > > >Richard, >Thanks for the example. Your approach is very similar to a non-BioJava >approach that I had worked out earlier. I was wondering if the >BioJava(x) API provides any performance benefit over simply running a >window along a character stream? > >The work that we're doing involves iterating through the human genome, >(and in a number of cases, metagenomic sequences) and we're trying to >squeeze as much performance out of it as possible while minimizing the >memory footprint. > >Thanks, > >Mark > >Richard HOLLAND wrote: > > > >>orderNSymbolList splits the sequence into non-overlapping chunks. What >>is required here is chunks that are only one base different (further on) >>than the previous chunk. >> >>The simplest way would be this: >> >> SymbolList mySeq; // this is your sequence from somewhere >> >> >else > > >> for (int i = 1 ; i <= mySeq.length()-2; i++) { >> SymbolList trimer = mySeq.subSeq(i,i+2); >> >> >// coords are > > >>inclusive so i to i+2 = 3 bases >> // do something with your trimer here >> } >> >>Note that the index starts at 1 and goes right up to and including >>length(), as symbols in a SymbolList are 1-indexed, not 0-indexed. >> >>cheers, >>Richard >> >>Richard Holland >>Bioinformatics Specialist >>GIS extension 8199 >>--------------------------------------------- >>This email is confidential and may be privileged. If you are not the >>intended recipient, please delete it and notify us immediately. Please >>do not copy or use it for any purpose, or disclose its content to any >>other person. Thank you. 
>>--------------------------------------------- >> >> >> >> >> >> >>>-----Original Message----- >>>From: biojava-l-bounces@portal.open-bio.org >>>[mailto:biojava-l-bounces@portal.open-bio.org] On Behalf Of David Huen >>>Sent: Friday, December 16, 2005 7:34 AM >>>To: m.fortner@sbcglobal.net >>>Cc: biojava-list >>>Subject: Re: [Biojava-l] Sequence Iteration in BioJava(x) >>> >>> >>>On Dec 15 2005, Mark Fortner wrote: >>>I think what you want is the SymbolListViews.orderNSymbolList method. >>> >>>It will take a SymbolList and turn it into another where it >>>is viewed in >>>another compound alphabet of the required order. >>> >>> >>> >>> >>> >>> >>>>I'm looking for the best way to iterate through all >>>>nmers within a given sequence. For example, given a >>>>sequence that looks like this: >>>> >>>>ACTGACTGACTG >>>> >>>>If I extract all trimers from this I should get: >>>> >>>>ACT >>>>CTG >>>>TGA >>>>GAC >>>>ACT >>>>CTG >>>>TGA >>>>GAC >>>>ACT >>>>CTG >>>> >>>>Is there an existing class that will allow me to >>>>iterate through a sequence this way, or am I on my >>>>own? >>>> >>>> >>>> >>>> >>>> >>>_______________________________________________ >>>Biojava-l mailing list - Biojava-l@biojava.org >>>http://biojava.org/mailman/listinfo/biojava-l >>> >>> >>> >>> >>> >> >> >> > >_______________________________________________ >Biojava-l mailing list - Biojava-l@biojava.org >http://biojava.org/mailman/listinfo/biojava-l > > > > > > From mark.schreiber at novartis.com Fri Dec 16 00:37:30 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Fri Dec 16 00:35:16 2005 Subject: [Biojava-l] Sequence Iteration in BioJava(x) Message-ID: You should also be aware that the biojavax sequence i/o is actually about 3 times slower than the biojava sequence i/o (for GenBank; I haven't tested other formats). This is because it does a much better job of parsing the relevant details into a more structured object hierarchy.
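As a rough illustration of why parsing less is faster, here is a plain-Java sketch that is not the BioJavaX i/o pipeline; `MinimalGenbankReader` and `NameAndSequence` are hypothetical names. It keeps only the LOCUS name and the ORIGIN sequence of each GenBank record and lets FEATURES, REFERENCE, COMMENT and other lines fall through without creating any objects. It assumes well-formed records that each end with a `//` line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

/** A name/sequence pair; everything else in the record is discarded. */
final class NameAndSequence {
    final String name;
    final String sequence;

    NameAndSequence(String name, String sequence) {
        this.name = name;
        this.sequence = sequence;
    }
}

/** Hypothetical reader that keeps only the LOCUS name and the ORIGIN sequence. */
final class MinimalGenbankReader {
    static List<NameAndSequence> read(Reader in) throws IOException {
        List<NameAndSequence> records = new ArrayList<>();
        BufferedReader r = new BufferedReader(in);
        String name = null;
        StringBuilder seq = null; // non-null only while inside the ORIGIN block
        String line;
        while ((line = r.readLine()) != null) {
            if (line.startsWith("LOCUS")) {
                String[] tokens = line.trim().split("\\s+");
                name = tokens.length > 1 ? tokens[1] : "";
            } else if (line.startsWith("ORIGIN")) {
                seq = new StringBuilder();
            } else if (line.startsWith("//")) {
                if (name != null && seq != null) {
                    records.add(new NameAndSequence(name, seq.toString()));
                }
                name = null;
                seq = null;
            } else if (seq != null) {
                // sequence lines look like "        1 gatcctccat atacaacggt ..."
                for (int i = 0; i < line.length(); i++) {
                    char c = line.charAt(i);
                    if (Character.isLetter(c)) seq.append(c); // skip positions and spaces
                }
            }
            // FEATURES, REFERENCE, COMMENT, ... lines are ignored without building objects
        }
        return records;
    }
}
```

Because nothing is built for features or annotations, the work per record is roughly a single pass over its text. The equivalent BioJavaX approach is to configure its i/o pipeline to ignore the unwanted details.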
Having said that, it is possible to set up the i/o pipeline so that it ignores details that are not of interest to you. If you only want the sequence name and the sequence data from Genbank (and not all the features, annotations and comments), then parsing is on average about 10x faster (based on about 4000 eukaryote records). Details on how to do this can be found in the biojavax docbook in CVS. - Mark Mark Fortner Sent by: biojava-l-bounces@portal.open-bio.org 12/16/2005 12:09 PM Please respond to m.fortner To: biojava-list cc: (bcc: Mark Schreiber/GP/Novartis) Subject: Re: [Biojava-l] Sequence Iteration in BioJava(x) Mark, Thanks for the info. This is sort of a test project for us. We have a few classes and data structures in C++ that handle operations like sequence io and packing, and are fairly fast. However, we've also come to the realization that we've spent a lot of time dealing with cross-platform and compiler-related problems, and if Java can give us comparable performance then we might switch to it. If nothing else, the opportunity costs would be lower, since we could write and test more code in the same amount of time. The tools are a good deal better for Java development than for C++. We're at the point where we can either continue to invest time in our library or rewrite what we have using BioJava and other libraries. I've written a lot of Java code over the past 10 years and suggested that we try Java, using both the standard javac compiler and gcj, to see if we can get C++-like performance. Thanks for your help, Mark mark.schreiber@novartis.com wrote: >There is probably not any performance benefit except in the case of very >large sequences which are often compressed behind the scenes by biojava. > >The benefits may come from ease of use and object orientation. > >eg, there is probably already a parser to read in and validate your >sequence. The windowing or nMer stuff is already figured out and has been >used by lots of people so it's been "stress tested".
Also the objects >themselves have a lot of functionality built in that a character stream >does not. The downside of using objects is they take up memory and there >is a certain amount of overhead in their construction. To help overcome >this, SymbolLists are actually lists of references to Symbols, not lists of >Symbols themselves. This makes them much smaller (although not as small as >char[]'s). > >If you want superfast performance then you should bit encode the data and >operate over it with memory pointers as in C or machine code. You should >be aware though that any intensive loop like the ones that would be used >to carry out this operation in biojava will almost certainly be detected >and compiled into native code by the Java Runtime on the fly. This might >make it hard to say if the java code would be much slower than the C code. > >- Mark > > > > > >Mark Fortner >Sent by: biojava-l-bounces@portal.open-bio.org >12/16/2005 10:36 AM >Please respond to m.fortner > > > To: biojava-list > cc: (bcc: Mark Schreiber/GP/Novartis) > Subject: Re: [Biojava-l] Sequence Iteration in BioJava(x) > > >Richard, >Thanks for the example. Your approach is very similar to a non-BioJava >approach that I had worked out earlier. I was wondering if the >BioJava(x) API provides any performance benefit over simply running a >window along a character stream? > >The work that we're doing involves iterating through the human genome, >(and in a number of cases, metagenomic sequences) and we're trying to >squeeze as much performance out of it as possible while minimizing the >memory footprint. > >Thanks, > >Mark > >Richard HOLLAND wrote: > > > >>orderNSymbolList splits the sequence into non-overlapping chunks. What >>is required here is chunks that are only one base different (further on) >>than the previous chunk.
>> >>The simplest way would be this: >> >> SymbolList mySeq; // this is your sequence from somewhere else >> >> for (int i = 1 ; i <= mySeq.length()-2; i++) { >> SymbolList trimer = mySeq.subSeq(i,i+2); // coords are inclusive so i to i+2 = 3 bases >> // do something with your trimer here >> } >> >>Note that the index starts at 1 and goes right up to and including >>length(), as symbols in a SymbolList are 1-indexed, not 0-indexed. >> >>cheers, >>Richard >> >>Richard Holland >>Bioinformatics Specialist >>GIS extension 8199 >>--------------------------------------------- >>This email is confidential and may be privileged. If you are not the >>intended recipient, please delete it and notify us immediately. Please >>do not copy or use it for any purpose, or disclose its content to any >>other person. Thank you. >>--------------------------------------------- >> >> >> >> >> >> >>>-----Original Message----- >>>From: biojava-l-bounces@portal.open-bio.org >>>[mailto:biojava-l-bounces@portal.open-bio.org] On Behalf Of David Huen >>>Sent: Friday, December 16, 2005 7:34 AM >>>To: m.fortner@sbcglobal.net >>>Cc: biojava-list >>>Subject: Re: [Biojava-l] Sequence Iteration in BioJava(x) >>> >>> >>>On Dec 15 2005, Mark Fortner wrote: >>>I think what you want is the SymbolListViews.orderNSymbolList method. >>> >>>It will take a SymbolList and turn it into another where it >>>is viewed in >>>another compound alphabet of the required order. >>> >>> >>> >>> >>> >>> >>>>I'm looking for the best way to iterate through all >>>>nmers within a given sequence. For example, given a >>>>sequence that looks like this: >>>> >>>>ACTGACTGACTG >>>> >>>>If I extract all trimers from this I should get: >>>> >>>>ACT >>>>CTG >>>>TGA >>>>GAC >>>>ACT >>>>CTG >>>>TGA >>>>GAC >>>>ACT >>>>CTG >>>> >>>>Is there an existing class that will allow me to >>>>iterate through a sequence this way, or am I on my >>>>own?
>>>> >>>> >>>> >>>> >>>> >>>_______________________________________________ >>>Biojava-l mailing list - Biojava-l@biojava.org >>>http://biojava.org/mailman/listinfo/biojava-l >>> >>> >>> >>> >>> >> >> >> > >_______________________________________________ >Biojava-l mailing list - Biojava-l@biojava.org >http://biojava.org/mailman/listinfo/biojava-l > > > > > > _______________________________________________ Biojava-l mailing list - Biojava-l@biojava.org http://biojava.org/mailman/listinfo/biojava-l From smh1008 at cam.ac.uk Fri Dec 16 04:25:21 2005 From: smh1008 at cam.ac.uk (David Huen) Date: Fri Dec 16 04:39:15 2005 Subject: [Biojava-l] Sequence Iteration in BioJava(x) In-Reply-To: <43A2281B.7010609@sbcglobal.net> References: <6D9E9B9DF347EF4385F6271C64FB8D560289534C@BIONIC.biopolis.one-north.com> <43A2281B.7010609@sbcglobal.net> Message-ID: On Dec 16 2005, Mark Fortner wrote: >Richard, >Thanks for the example. Your approach is very similar to a non-BioJava >approach that I had worked out earlier. I was wondering if the >BioJava(x) API provides any performance benefit over simply running a >window along a character stream? > >The work that we're doing involves iterating through the human genome, >(and in a number of cases, metagenomic sequences) and we're trying to >squeeze as much performance out of it as possible while minimizing the >memory footprint. > The only case where I have encountered horrible performance out of using BJ for this kind of activity is where the order is large (say >10). I think it is killing the Alphabet code somewhere to represent the required alphabet. If that is the kind of case you want to deal with, I would believe the SSAHA code in BJ may be adapted to your purposes but this comment does not arise from direct personal experience. 
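Mark Schreiber's suggestion to bit encode the data and David's pointer to the SSAHA code rest on the same general idea: with a four-letter alphabet each base fits in 2 bits, so a whole k-mer fits in one long, and the window slides forward by shifting in the next base and masking off the oldest one. The sketch below is plain Java written for illustration, not BioJava's SSAHA implementation; the class and method names (PackedKmers, packedKmers, unpack) are hypothetical, and it assumes an unambiguous A/C/G/T sequence (no N or other IUPAC codes) and k no larger than 31.

```java
public class PackedKmers {
    // Hypothetical 2-bit code for each base: A=0, C=1, G=2, T=3.
    static int code(char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("Not an unambiguous base: " + base);
        }
    }

    // Slide a window of width k along seq and return every k-mer packed into a long.
    // Each step shifts the previous word left by 2 bits, ORs in the new base,
    // and masks the result so only the last k bases (2k bits) are kept.
    static long[] packedKmers(String seq, int k) {
        if (k < 1 || k > 31) throw new IllegalArgumentException("k must be in 1..31");
        int n = seq.length() - k + 1;
        if (n <= 0) return new long[0];
        long mask = (1L << (2 * k)) - 1;
        long[] out = new long[n];
        long word = 0;
        for (int i = 0; i < seq.length(); i++) {
            word = ((word << 2) | code(seq.charAt(i))) & mask;
            if (i >= k - 1) out[i - k + 1] = word; // window ending at i is complete
        }
        return out;
    }

    // Decode a packed k-mer back to letters (only needed for display).
    static String unpack(long word, int k) {
        char[] bases = {'A', 'C', 'G', 'T'};
        char[] s = new char[k];
        for (int i = k - 1; i >= 0; i--) {
            s[i] = bases[(int) (word & 3)];
            word >>>= 2;
        }
        return new String(s);
    }

    public static void main(String[] args) {
        for (long w : packedKmers("ACTGACTGACTG", 3)) {
            System.out.println(unpack(w, 3));
        }
    }
}
```

Running main prints the ten trimers listed in the original question (ACT, CTG, TGA, GAC, ACT, CTG, TGA, GAC, ACT, CTG). Because each packed k-mer is a plain long, comparisons are single integer comparisons, and for small k the value can be used directly as an index into a counts array of size 4^k.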
Regards, David From ilueny at yahoo.com.br Fri Dec 16 07:35:59 2005 From: ilueny at yahoo.com.br (Ilueny Santos) Date: Fri Dec 16 07:39:53 2005 Subject: [Biojava-l] Locating promoter regions in sequence of DNA with Biojava Message-ID: <20051216123559.69612.qmail@web53908.mail.yahoo.com> Hello all, First I would like to thank everyone, especially Mark and Gregory, for answering. To explain my question in more detail: I am trying to locate specific regions in a DNA sequence (the -10 box and the -35 box). These regions are short stretches of 6 base pairs (bp), named for lying roughly 10 bp and 35 bp, respectively, upstream of +1 (ATG), and their presence in a sequence strongly indicates a promoter. The problem is that they are not fixed. The -10 box, for example, is usually TATAAT, but it can also appear as TATAAG or TATTAT, and its position relative to the start codon (+1 ATG) also varies. So my question is: can I use BioJava to build an algorithm that locates such variable regions (variable in both sequence and position) in DNA sequences? Thanks again to everyone who can help. PS: Gregory, I am already studying regular expressions. Mark, the Bayesian classifier is already done, but following your tip I will also study the org.biojava.dist package, since it may be useful in some way. Thanks.
From matthew.pocock at ncl.ac.uk Fri Dec 16 09:17:15 2005 From: matthew.pocock at ncl.ac.uk (Matthew Pocock) Date: Fri Dec 16 09:24:09 2005 Subject: [Biojava-l] Sequence Iteration in BioJava(x) In-Reply-To: References: <6D9E9B9DF347EF4385F6271C64FB8D560289534C@BIONIC.biopolis.one-north.com> <43A2281B.7010609@sbcglobal.net> Message-ID: <200512161417.16174.matthew.pocock@ncl.ac.uk> On Friday 16 December 2005 09:25, David Huen wrote: > > If that is the kind of case you want to deal with, I would believe the > SSAHA code in BJ may be adapted to your purposes but this comment does not > arise from direct personal experience. The biojava SSAHA code is likely to be quite efficient for this kind of sliding-window application. I think it can be attached directly to the sequence IO events, and encodes the DNA n-mers directly as bits in an integer datatype. All operations are done by integer comparison, logical operations and shifts. Even though SSAHA itself is probably not what you want, nearly all the building blocks should be there in that module. Matthew > > Regards, > David > _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l From mark.schreiber at novartis.com Sun Dec 18 20:12:49 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Sun Dec 18 20:10:15 2005 Subject: [Biojava-l] Locating promoter regions in sequence of DNA with Biojava Message-ID: Hello - There are many approaches you can use to try and find a promoter with variable degrees of success. There is extensive literature on this. You could make profile HMMs and train them with real examples. You could also use a Gibbs Sampler. There are examples of both in the biojava in anger pages http://www.biojava.org/docs/bj_in_anger/ Other approaches would be programs like MEME or the technique called nested MICA developed by Thomas Down of biojava fame which seems to be very good. 
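As a much simpler baseline than the profile HMM, Gibbs sampler, MEME or nested MICA approaches above, a candidate box can be found by sliding a 6-base window over the region of interest and accepting any window within a fixed number of mismatches of the consensus. This handles both kinds of variation in the question: the mismatch threshold covers variants such as TATAAG or TATTAT, and the from/to bounds restrict where the box may sit. The sketch below is plain Java, not a BioJava API; the names (BoxScan, scan, mismatches), the TATAAT consensus, the one-mismatch threshold and the example sequence are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class BoxScan {
    // Hamming distance between two equal-length strings, ignoring case.
    static int mismatches(String a, String b) {
        int d = 0;
        for (int i = 0; i < a.length(); i++) {
            if (Character.toUpperCase(a.charAt(i)) != Character.toUpperCase(b.charAt(i))) {
                d++;
            }
        }
        return d;
    }

    // Return the 0-based start positions in [from, to) of every window that is
    // within maxMismatches of the consensus. Narrowing from/to limits the search
    // to the stretch where the box is expected (e.g. upstream of a start site).
    static List<Integer> scan(String seq, String consensus, int maxMismatches, int from, int to) {
        List<Integer> hits = new ArrayList<>();
        int k = consensus.length();
        int start = Math.max(0, from);
        int end = Math.min(to, seq.length() - k + 1); // last start that still fits a full window
        for (int i = start; i < end; i++) {
            if (mismatches(seq.substring(i, i + k), consensus) <= maxMismatches) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String seq = "CCTATAAGCCTATTATGG"; // made-up example sequence
        // TATAAG (at 2) and TATTAT (at 10) are each one mismatch from TATAAT.
        System.out.println(scan(seq, "TATAAT", 1, 0, seq.length())); // [2, 10]
        // Restricting the start positions to 0..4 keeps only the first hit.
        System.out.println(scan(seq, "TATAAT", 1, 0, 5)); // [2]
    }
}
```

Plain mismatch counting weights every position equally; a position weight matrix, or the trained models mentioned above, can instead reflect how strongly each position of the box is conserved.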
- Mark Ilueny Santos Sent by: biojava-l-bounces@portal.open-bio.org 12/16/2005 04:35 PM To: biojava-l@biojava.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-l] Locating promoter regions in sequence of DNA with Biojava Hello all, First I would like to thank everyone, especially Mark and Gregory, for answering. To explain my question in more detail: I am trying to locate specific regions in a DNA sequence (the -10 box and the -35 box). These regions are short stretches of 6 base pairs (bp), named for lying roughly 10 bp and 35 bp, respectively, upstream of +1 (ATG), and their presence in a sequence strongly indicates a promoter. The problem is that they are not fixed. The -10 box, for example, is usually TATAAT, but it can also appear as TATAAG or TATTAT, and its position relative to the start codon (+1 ATG) also varies. So my question is: can I use BioJava to build an algorithm that locates such variable regions (variable in both sequence and position) in DNA sequences? Thanks again to everyone who can help. PS: Gregory, I am already studying regular expressions. Mark, the Bayesian classifier is already done, but following your tip I will also study the org.biojava.dist package, since it may be useful in some way. Thanks. _______________________________________________ Biojava-l mailing list - Biojava-l@biojava.org http://biojava.org/mailman/listinfo/biojava-l From ftdgc1 at uaf.edu Wed Dec 21 03:16:27 2005 From: ftdgc1 at uaf.edu (Dan Cardin) Date: Wed Dec 21 03:34:33 2005 Subject: [Biojava-l] Gapping Sequence problems Message-ID: <61871.66.230.82.213.1135152987.squirrel@ftdgc1.email.uaf.edu> Hello all, I am hung up on a SimpleGappedSymbolList problem.
I want to add gaps to, and remove gaps from, DNA sequences that are loaded from a file that already contains gaps. I just load the sequences into an instance of type Sequence. Here is a snippet:

private void finalizeAddGapEdit(){
    SimpleGappedSymbolList list = new SimpleGappedSymbolList(node.getSequence());
    try {
        list.addGapsInSource(startX+1,counter);
        Sequence newSequence = DNATools.createDNASequence(list.seqString(),node.getSequence().getName());
        node.setSequence(newSequence);
        gvc.repaint();
    } catch (IllegalSymbolException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}

My code to draw out the gapped symbols looks like this:

public void paint(Graphics g)
{
    double leftX;
    Symbol gap;
    double scale_factor;
    boolean inGapState;
    max = 0;
    for(int i=0;i<seq.length;i++){
        gap = seq[i].getAlphabet().getGapSymbol();
        leftX = 0;
        scale_factor = (double) getWidth()/seq[i].length();
        inGapState = true;

        for(int j=1;j<=seq[i].length();j++){
            if(!inGapState && seq[i].symbolAt(j) == gap){
                g.drawLine((int) (leftX*scale_factor), i*pixels_bw_lines, (int) ((j-1)*scale_factor), i*pixels_bw_lines);
                inGapState = true;
            }
            else if(inGapState && seq[i].symbolAt(j) != gap){
                leftX = j-1;
                inGapState = false;
            }
        }
        //draw the last line
        g.drawLine((int) (leftX*scale_factor), i*pixels_bw_lines, (int) (seq[i].length()*scale_factor), i*pixels_bw_lines);

        if(seq[i].length() > max)
            max = seq[i].length();
    }
}

My sequences load from file and display correctly, but when I add gaps they don't show up. I am confused because I believe that the gap symbols used in the underlying sequences are the same. The gaps are added, and I know this from printing out the string of the sequence. Does anyone know how to fix this issue or have any suggestions on a better approach? -dc From hollandr at gis.a-star.edu.sg Wed Dec 21 04:22:09 2005 From: hollandr at gis.a-star.edu.sg (Richard HOLLAND) Date: Wed Dec 21 04:20:05 2005 Subject: [Biojava-l] Gapping Sequence problems Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5602895532@BIONIC.biopolis.one-north.com> I think you could try swapping the use of == for equals() when testing for equivalence to the gap symbol. It _should_ be the same literal object, but maybe not. equals() will work in both cases but == will not. cheers, Richard Richard Holland Bioinformatics Specialist GIS extension 8199 --------------------------------------------- This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you.
--------------------------------------------- > -----Original Message----- > From: biojava-l-bounces@portal.open-bio.org > [mailto:biojava-l-bounces@portal.open-bio.org] On Behalf Of Dan Cardin > Sent: Wednesday, December 21, 2005 4:16 PM > To: biojava-l@biojava.org > Subject: [Biojava-l] Gapping Sequence problems > > > Hello all, I am hung up on SimpleGappedSymbolList problem. I > want to add > gaps to DNA sequences that are loaded in from file that > contain gaps and > remove the gaps. I just load the sequences into an instance of type > Sequence. Here is snippet , > > private void finalizeAddGapEdit(){ > SimpleGappedSymbolList list = new > SimpleGappedSymbolList(node.getSequence()); > > try { > list.addGapsInSource(startX+1,counter); > > Sequence newSequence = > DNATools.createDNASequence(list.seqString(),node.getSequence() > .getName()); > > node.setSequence(newSequence); > > gvc.repaint(); > > } catch (IllegalSymbolException e) { > // TODO Auto-generated catch block > e.printStackTrace(); > } > > } > > My code to draw out the gapped symbols looks like this > > public void paint(Graphics g) > { > double leftX; > Symbol gap; > double scale_factor; > boolean inGapState; > max = 0; > for(int i=0;i > gap = seq[i].getAlphabet().getGapSymbol(); > leftX = 0; > scale_factor = (double) getWidth()/seq[i].length(); > inGapState = true; > > for(int j=1;j<=seq[i].length();j++){ > > if(!inGapState && seq[i].symbolAt(j) == gap){ > g.drawLine((int) (leftX*scale_factor), i*pixels_bw_lines, (int) > ((j-1)*scale_factor) , i*pixels_bw_lines); > inGapState = true; > } > else if(inGapState && seq[i].symbolAt(j) != gap){ > leftX = j-1; > inGapState = false; > } > } > //draw the last line > g.drawLine((int) (leftX*scale_factor), i*pixels_bw_lines, (int) > (seq[i].length()*scale_factor) , i*pixels_bw_lines); > > if(seq[i].length() > max) > max = seq[i].length(); > } > } > > My sequences load from file and display correctly , but when > I add gaps > they don't show up. 
I am confused because I believe that the > gap symbols > used in the underlying sequences are the same. The gaps are > added and I > know this from printing out the string of the sequence. Does > anyone know > how to fix this issue or have any suggestions on a better approach? > > -dc > _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l > From mark.schreiber at novartis.com Wed Dec 21 08:57:16 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Wed Dec 21 08:54:26 2005 Subject: [Biojava-l] Gapping Sequence problems Message-ID: You could try if(!inGapState && (seq[i].symbolAt(j) == gap || seq[i].symbolAt(j) == AlphabetManager.getGapSymbol())) - Mark "Dan Cardin" Sent by: biojava-l-bounces@portal.open-bio.org 12/21/2005 04:16 PM To: biojava-l@biojava.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-l] Gapping Sequence problems Hello all, I am hung up on SimpleGappedSymbolList problem. I want to add gaps to DNA sequences that are loaded in from file that contain gaps and remove the gaps. I just load the sequences into an instance of type Sequence. Here is snippet , private void finalizeAddGapEdit(){ SimpleGappedSymbolList list = new SimpleGappedSymbolList(node.getSequence()); try { list.addGapsInSource(startX+1,counter); Sequence newSequence = DNATools.createDNASequence(list.seqString(),node.getSequence().getName()); node.setSequence(newSequence); gvc.repaint(); } catch (IllegalSymbolException e) { // TODO Auto-generated catch block e.printStackTrace(); } } My code to draw out the gapped symbols looks like this public void paint(Graphics g) { double leftX; Symbol gap; double scale_factor; boolean inGapState; max = 0; for(int i=0;i max) max = seq[i].length(); } } My sequences load from file and display correctly , but when I add gaps they don't show up. 
I am confused because I believe that the gap symbols used in the underlying sequences are the same. The gaps are added and I know this from printing out the string of the sequence. Does anyone know how to fix this issue or have any suggestions on a better approach? -dc _______________________________________________ Biojava-l mailing list - Biojava-l@biojava.org http://biojava.org/mailman/listinfo/biojava-l From Russell.Smithies at agresearch.co.nz Wed Dec 21 14:35:23 2005 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed Dec 21 14:48:02 2005 Subject: [Biojava-l] [OT] Bioinformatician job vacancy Message-ID: Hi All, I hope you don't mind me posting this here but our company has a vacancy for a bioinformatician and I thought there might be someone here who would like to work in beautiful New Zealand. Russell Smithies Bioinformatics Software Developer Invermay Research Centre Puddle Alley, Mosgiel, New Zealand www.agresearch.co.nz =================================================== As part of AgResearch's company strategy we are continuing to grow our business in the area of bioinformatics. This capability is essential for our science discovery. In this position you will be part of a national team of 26 bioinformaticians, mathematical biologists and statisticians and be based at our Grasslands campus at Palmerston North. This is a permanent position. You will be an advocate for bioinformatics within AgResearch; you will work collaboratively on projects and will provide bioinformatics training and advice to science staff working in the biotechnology area. 
We are seeking a person who has: * An excellent tertiary qualification in molecular biology or genetics * Experience with the use of bioinformatics applications * Knowledge of life sciences databases and the internet * Well developed IT technical skills and web based technologies * Experience in a training environment * Excellent writing, speaking and interpersonal skills * Familiarity with Perl, Java or Unix Scripting If you possess the above skills, we would like to hear from you. To find out more about this position please contact Anette Becher by email anette.becher@agresearch.co.nz or alternatively phone (03) 489 9028 (after 16th January 2006). For a job description and application form please contact Linda Murray, Phone (03) 489 9011 or email linda.murray@agresearch.co.nz (after 16 January 2006). Alternatively the job description and application form can be found at http://www.agresearch.co.nz/recruitment For general information on AgResearch please visit our website at www.agresearch.co.nz Applications close 30th January 2006 and should be sent to Linda Murray at the following address or by email - Linda Murray AgResearch Invermay Agricultural Centre Private Bag 50034 Mosgiel, Dunedin NEW ZEALAND Email: linda.murray@agresearch.co.nz ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
======================================================================= From prem at bch.umontreal.ca Wed Dec 14 16:53:53 2005 From: prem at bch.umontreal.ca (Premkumar Natarajan) Date: Thu Dec 22 11:17:01 2005 Subject: [Biojava-l] Is there any Parser for "rnamotif" output? In-Reply-To: <200512141705.jBEH5A8U019796@portal.open-bio.org> References: <200512141705.jBEH5A8U019796@portal.open-bio.org> Message-ID: <43A09471.2000608@bch.umontreal.ca> Hi: I would like to know if there is any generic parser for rnamotif output. Even a wrapper script that can convert rnamotif output to XML would be great. Reason: for one of my projects I need to integrate more than one tool, and "rnamotif" is one of them. I'm thinking of using XML as the output format to communicate between the various programs. Thank you. Prem prem _AT_ umontreal.ca