From mark.schreiber at novartis.com  Thu Dec  1 00:34:34 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Thu Dec  1 00:32:20 2005
Subject: [Biojava-l] BaumWelchTrainer Broken??!!! (please help)
Message-ID: <OFCCF93121.B35F46A7-ON482570CA.001E8517-482570CA.001EA136@EU.novartis.net>

As a possible workaround until this issue is resolved, the 
BaumWelchSampler can be substituted for the BaumWelchTrainer.

The two are not technically equivalent, but they are similar.

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910

From escobarebio at yahoo.com  Thu Dec  1 01:53:34 2005
From: escobarebio at yahoo.com (D.Enrique ESCOBAR ESPINOZA)
Date: Thu Dec  1 01:58:00 2005
Subject: [Biojava-l] cvs download
Message-ID: <20051201065335.14439.qmail@web30504.mail.mud.yahoo.com>

I put the files:
    * bytecode-0.92.jar
    * commons-cli.jar
    * commons-collections-2.1.jar
    * commons-dbcp-1.1.jar
    * commons-pool-1.1.jar
in my biojava directory: C:\biojava
I set my classpath:
set CLASSPATH C:\biojava\biojava.jar;C:\biojava\bytecode-0.92.jar;
C:\biojava\commons-cli.jar;C:\biojava\commons-collections-2.1.jar;
C:\biojava\commons-dbcp-1.1.jar;
C:\biojava\commons-pool-1.1.jar;.
Then I ran:
> C:
> cd biojava/
>cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/bioperl login
(when prompted, the password is 'cvs')
> cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/bioperl checkout bioperl-live
I checked the new folders created:
a folder named bioperl-live/ has been created.
Then I ran:
>$ cvs update
cvs [update aborted]: C:/MYCVSROOT/CVSROOT: No such file or directory
What is my CVSROOT directory supposed to be?
How am I supposed to set my CLASSPATH?
What do I do for other modules like biojava-lims?
Thanks

--------------------------------------------------
D.Enrique ESCOBAR ESPINOZA (B.Sc.) 
http://adn.bioinfo.uqam.ca/~escd07097301/
http://spaces.msn.com/members/escobarebio/
ICQ#: 201778618
-------------------------------------------------
Tel:  (514) 523-8398
Montreal QC Canada


__________________________________ 
Yahoo! Mail - PC Magazine Editors' Choice 2005 
http://mail.yahoo.com
From mark.schreiber at novartis.com  Thu Dec  1 02:23:22 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Thu Dec  1 02:21:10 2005
Subject: [Biojava-l] cvs download
Message-ID: <OF269DA076.85149F0D-ON482570CA.00282F4F-482570CA.00289772@EU.novartis.net>

Hello -

To get biojava I would suggest the following:

cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/biojava checkout biojava-live

To get biojava-lims (although I don't think this project is still active):

cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/biojava checkout biojava-lims

To update, navigate to wherever you checked out biojava-live and do this:

cvs update -Pd

The -P prunes any empty directories from your working copy (there are a few 
in the CVS tree, so this is highly recommended).
The -d creates any directories that exist in the repository but are missing 
from your working copy.
Inside a checked-out directory cvs reads the repository location from the 
CVS/Root file, so you do not need to set CVSROOT yourself.

To set your class path see http://www.biojava.org/started.html although it 
looks like you have done that successfully.

- Mark

_______________________________________________
Biojava-l mailing list  -  Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l


From ap3 at sanger.ac.uk  Thu Dec  1 03:54:41 2005
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Thu Dec  1 03:51:27 2005
Subject: [Biojava-l] modify structure
In-Reply-To: <c343d7080511300821y7258c6bcs86a1deb8c6affa56@mail.gmail.com>
References: <c343d7080511300821y7258c6bcs86a1deb8c6affa56@mail.gmail.com>
Message-ID: <1706112d2cfe772f3501821995576ead@sanger.ac.uk>

Hi Tamas,

  it is possible to access the content of a structure and 
change/add/drop groups  and atoms as you wish.

  When talking about introducing "point mutations" I assume you want to 
re-label a residue's main chain
  atoms and drop the side chain atoms, but keep the Cb one? this would 
take only a few lines to implement.

Cheers,
Andreas

On 30 Nov 2005, at 16:21, Tamas Horvath wrote:

> Is there any way to modify a protein structure by modifying the 
> contents ofthe Structure object?In short, I have a Structure object, 
> parsed from a pdb file, and I want tointroduce point mutations to it, 
> and save the modified structure to a pdbfile for further analysis... 
> (I intend to use gromacs for instance if itmatters)...
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
>
>
-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                               Hinxton, Cambridge CB10 1SA, UK
			 +44 (0) 1223 49 6891

From christoph.gille at charite.de  Thu Dec  1 04:59:35 2005
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Thu Dec  1 04:57:24 2005
Subject: [Biojava-l] 1.4 vs 1.5
Message-ID: <61212.84.190.34.173.1133431175.squirrel@webmail.charite.de>


Recently we had a discussion whether Biojava could use the novel
features of Java1.5. Since I just have moved two larger applications
(633 java files) to Java1.5 I would like to share my experiences with
you.

Applying the new features to the source code took me two days: This
was worth doing because I identified two bugs thanks to the Generics
and Annotations of Java 1.5.

1. GENERICS: I added types to all collections. For example,
public List getProteinsV() { ... } was turned into
public List<Protein> getProteinsV() { ... }
I found one bug where I had added the wrong Object type!

2. ANNOTATIONS: I preceded all methods that override a method of the
parent class with the annotation @Override.  Indeed I found a hidden
bug where I had mistyped the name of a method!  Instead of overriding
a method I had invented a new one, which was not intended.  This kind
of bug remains unnoticed in a Java1.4 environment.

3. LOOPS: I achieved more compact source code by using foreach
loops. The code is more readable now. In 1.4 the head of a loop
sometimes requires 3 java lines, which is now condensed to one single
line.
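
The three changes above can be sketched in a few lines. This is only an illustration: Protein, ProteinStore and describe() are hypothetical stand-ins, not code from the applications described or from BioJava.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical domain class, used only for illustration.
class Protein {
    private final String name;

    Protein(String name) {
        this.name = name;
    }

    // With @Override the compiler rejects a mistyped name such as tostring().
    @Override
    public String toString() {
        return name;
    }
}

// Hypothetical container showing a typed collection and a foreach loop.
class ProteinStore {
    private final List<Protein> proteins = new ArrayList<Protein>();

    void add(Protein p) {
        proteins.add(p); // adding a String here would no longer compile
    }

    // Typed return value: callers need no cast.
    List<Protein> getProteinsV() {
        return proteins;
    }

    // foreach loop: the Iterator and the cast disappear from the loop head.
    String describe() {
        StringBuilder sb = new StringBuilder();
        for (Protein p : getProteinsV()) {
            sb.append(p).append(';');
        }
        return sb.toString();
    }
}
```

In the 1.4 style the same loop needs an explicit Iterator, a hasNext() test and a cast on every next() call.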


RETROWEAVER

A sound argument against 1.5 was the broken compatibility to
application servers still working with 1.4 and old Macintosh
OSX. I used Retroweaver to convert the class files after compilation
into 1.4 class format. As a result the program works on a 1.4 virtual
machine as well as on a 1.5 machine. Fortunately, I did not find any
problem related to the code conversion by Retroweaver.


PERFORMANCE:

The foreach loops are slightly slower.  The autoboxing feature is
dangerous in terms of performance because expensive object creation
is hidden.
For example the compiler would conveniently replace "10" by new
Integer(10) for method parameters that require "Integer" and not
"int". Therefore, I do not like autoboxing.
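
A small sketch of where the hidden conversions occur. BoxingDemo and its methods are hypothetical names. One detail worth knowing: javac compiles a boxing conversion to a call such as Integer.valueOf(10) rather than new Integer(10), and valueOf reuses cached objects for values from -128 to 127, so the allocation cost shows up mainly for larger values and inside loops.

```java
// Hypothetical demo class; not part of BioJava.
class BoxingDemo {
    // Boxed accumulator: every += unboxes, adds, and boxes the result again.
    static long sumBoxed(int n) {
        Long total = 0L;
        for (int i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    // Primitive accumulator: no wrapper objects are involved.
    static long sumPrimitive(int n) {
        long total = 0L;
        for (int i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    // The return statement is compiled as Integer.valueOf(value).
    static Integer box(int value) {
        return value;
    }
}
```

Both sum methods return the same result; only the boxed version may create a wrapper object per iteration.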

I did not try it, but StringBuilder, the new alternative to StringBuffer,
is said to be faster because it omits thread safety. It still lacks
standard String operations known from other languages.


DISADVANTAGES:

1. Jikes cannot be used any more. Jikes compiles faster than javac
and reports errors better.

2. The make script takes longer because Retroweaver must be run.

3. Some additional class files shipped with Retroweaver are required
at runtime and make the binary larger by 60 kbytes. Well, that is not
really significant.

Conclusions: I would highly recommend migrating Biojava to 1.5.

I hope this helps to make a decision.

Christoph


From hotafin at gmail.com  Thu Dec  1 07:26:30 2005
From: hotafin at gmail.com (Tamas Horvath)
Date: Thu Dec  1 07:24:18 2005
Subject: [Biojava-l] modify structure
In-Reply-To: <1706112d2cfe772f3501821995576ead@sanger.ac.uk>
References: <c343d7080511300821y7258c6bcs86a1deb8c6affa56@mail.gmail.com>
	<1706112d2cfe772f3501821995576ead@sanger.ac.uk>
Message-ID: <c343d7080512010426k7d398d1cwd082003e115e9108@mail.gmail.com>

Exactly... I want to keep a backbone and replace the whole sidechain as
necessary... I know this should be relatively easy, but I'm a bit lost in
the documentation...
From k.parveen at gmail.com  Thu Dec  1 08:56:10 2005
From: k.parveen at gmail.com (Parveen k)
Date: Thu Dec  1 09:00:25 2005
Subject: [Biojava-l] help on blast
Message-ID: <1373ba70512010556u2ffd4f75l20242b4ff071de0@mail.gmail.com>

Thanks for all your ideas. It was really useful .
Parveen

Date: Wed, 30 Nov 2005 09:42:51 -0800 (PST)
From: "W. Eric Trull" <wetrull@yahoo.com>
Subject: [Biojava-l] help on blast
To: biojava-l@biojava.org
Cc: fpepin@cs.mcgill.ca, k.parveen@gmail.com
Message-ID: < 20051130174251.95303.qmail@web81405.mail.mud.yahoo.com>
Content-Type: text/plain; charset=iso-8859-1

I have the same situation where I work, except I have a Swing client instead
of an applet.

I decided to use NCBI's BLAST implementation
(http://www.ncbi.nlm.nih.gov/BLAST/download.shtml) invoked using a command
to org.biojava.utils.ExecRunner.  I then wrapped the whole thing in a Web
Service, which is easier and more flexible than using RMI IMHO.  NCBI's
BLAST toolkit also contains the executable for building the BLAST database
from a FASTA sequence file (formatdb.exe).

Be sure to set the BLAST output option to XML (-m 7) and use a
org.biojava.bio.program.sax.blastxml.BlastXMLParserFacade to parse the
output.  I had trouble using the default output as it is different under
Windows and *nix.  Look at the BioJava in Anger example of parsing BLAST
output if you need help here.
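
A minimal sketch of the command-line step, under several assumptions: it uses java.lang.ProcessBuilder rather than org.biojava.utils.ExecRunner (whose API is not shown in this thread), it assumes the legacy blastall program is on the PATH, and BlastCommand with all file and database names is hypothetical. The flags are blastall's -p (program), -d (database), -i (query file), -o (output file) and -m 7 (XML output).

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of BioJava.
class BlastCommand {
    // Builds the argument list for a blastall run that writes XML output.
    static List<String> build(String program, String database,
                              String queryFile, String outputFile) {
        List<String> cmd = new ArrayList<String>();
        cmd.add("blastall");
        cmd.add("-p");
        cmd.add(program);
        cmd.add("-d");
        cmd.add(database);
        cmd.add("-i");
        cmd.add(queryFile);
        cmd.add("-o");
        cmd.add(outputFile);
        cmd.add("-m");
        cmd.add("7"); // XML output, suitable for an XML-based parser
        return cmd;
    }

    // Runs the command, discards its console output, returns the exit code.
    static int run(List<String> cmd) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.redirectErrorStream(true);
        Process process = pb.start();
        InputStream in = process.getInputStream();
        byte[] buffer = new byte[4096];
        while (in.read(buffer) != -1) {
            // drain the stream so the child process cannot block on a full pipe
        }
        in.close();
        return process.waitFor();
    }
}
```

The XML file named by -o can then be handed to the parser mentioned above.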

The one twist here is that you are constrained by the applet security model
which, I believe by default, will not allow you to go to a different server
for a Web Service unless you sign the applet.  Something for you to dig into
if you decide to use a Web Service.  The rest of my comments assume that
you are going to go down the Web Service path.

For creation of the Web Service I'm using webMethods GLUE, but that requires
a $$ license.  I've used Apache's Axis/Tomcat to build web services before
and it is pretty easy to use.  Building a web service future proofs, IMO,
any changes the powers that be may decide on for the client side (i.e. "Now
we want a .NET application", etc.).

If you want a quick prototype, look at IBM's Web Services for Life Sciences
(http://www.alphaworks.ibm.com/tech/ws4LS).  They have a BLAST web service
that is downloadable and configurable to run in a local environment.
However their services are a bit dated (February 7, 2003).

One last thought.  I'm working under the constraint that I cannot send my
query sequence outside my local network.  If you DO NOT have this
restriction and are just querying public databases, both the NCBI and PDB
have web services.  The PDB provides a SOAP over HTTP web service (WSDL at
http://pdbbeta.rcsb.org/pdbws/rcsbWebService?wsdl) which is currently BETA
but will go production January 1, 2006.  Point Axis at this WSDL to generate
client side code and then look for the blastQuery() methods.  The NCBI's web
service does not use SOAP, but provides an HTTP interface.  See
http://www.ncbi.nlm.nih.gov/BLAST/developer.shtml for documentation and a
Perl example.

Good luck!

-Eric Trull

--- Francois Pepin fpepin at cs.mcgill.ca wrote:

> Hi Parveen,
>
> This might not be as easy as you might like.
>
> The applet runs on the client, so you need the applet to communicate
> remotely to the server to send the sequence. Then the easiest way would
> be for the server to call blast on the command-line with the sequence
> (which is pretty easy), parse the result and send it back to the client
> applet.
>
> I think RMI could do this, but I've never had to play with it.
>
> Anyone has a better way to do this?
>
> Francois
>
> On Wed, 2005-11-30 at 16:04 +0530, Parveen k wrote:
> > Hi
> >    I'm pretty new to bioinformatics. I have to incorporate BLAST in my
> > applet, so that when the client enters the sequence, it performs the
> > BLAST search against the database we have and returns the result. Can
> > anyone guide me in this regard?
> >
> > --
> > Regards
> > Parveen K
> >
> > YOU MAY SAY I AM A DREAMER, BUT  I AM NOT THE ONLY ONE.
> > I HOPE SOMEDAY YOU WILL JOIN US, AND THE WORLD WILL FOLLOW US.
> >                                   - JOHN LENNON
> >
> > _______________________________________________
> > Biojava-l mailing list  -  Biojava-l at biojava.org
> > http://biojava.org/mailman/listinfo/biojava-l
> >
>
>
>
>


Thanks.

-W. Eric Trull


--
Regards
Parveen K
YOU MAY SAY I AM A DREAMER, BUT  I AM NOT THE ONLY ONE.
I HOPE SOMEDAY YOU WILL JOIN US, AND THE WORLD WILL FOLLOW US.
                                  - JOHN LENNON

From dreher at mpiib-berlin.mpg.de  Thu Dec  1 12:11:48 2005
From: dreher at mpiib-berlin.mpg.de (Felix Dreher)
Date: Thu Dec  1 12:10:53 2005
Subject: [Biojava-l] Problem with downloading Genbank-sequence
Message-ID: <438F2ED4.8080105@mpiib-berlin.mpg.de>

Hi,

the problem is the security-policy of the container I use for my 
web-application. In this case it's the 'Sun Java System Application 
Server Platform Edition 8.1'. As Thomas Down suggested, the Server 
prohibits the creation of ClassLoaders, however they are needed by BioJava.
So I tried to customise the server's 'server.policy' configuration file by 
adding a new line. Here is the relevant fragment:


grant {
    permission java.lang.RuntimePermission  "loadLibrary.*";
       ...
       ...  
    //new line:
    permission java.lang.RuntimePermission  "createClassLoader";
};


As some ClassLoader seems to have permission now, I think this was the 
right starting point - and also the exception thrown changed. It's the 
following:

    org.biojava.bio.BioError: Unable to initialize DNATools
    org.biojava.bio.seq.DNATools.(DNATools.java:119)
    org.biojava.bio.seq.db.GenbankSequenceDB.getAlphabet(GenbankSequenceDB.java:66)
    org.biojava.bio.seq.db.GenbankSequenceDB.getSequence(GenbankSequenceDB.java:121)
    rnai.GenbankDownload.loadGenBankSequence(GenbankDownload.java:23)
    rnai.seq_input2.prerender(seq_input2.java:296)
    com.sun.web.ui.appbase.faces.ViewHandlerImpl.prerender(ViewHandlerImpl.java:788)
    com.sun.web.ui.appbase.faces.ViewHandlerImpl.renderView(ViewHandlerImpl.java:282)
    com.sun.faces.lifecycle.RenderResponsePhase.execute(RenderResponsePhase.java:87)
    com.sun.faces.lifecycle.LifecycleImpl.phase(LifecycleImpl.java:221)
    com.sun.faces.lifecycle.LifecycleImpl.render(LifecycleImpl.java:117)
    javax.faces.webapp.FacesServlet.service(FacesServlet.java:198)
    sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    java.lang.reflect.Method.invoke(Method.java:585)
    org.apache.catalina.security.SecurityUtil$1.run(SecurityUtil.java:249)
    java.security.AccessController.doPrivileged(Native Method)
    javax.security.auth.Subject.doAsPrivileged(Subject.java:517)
    org.apache.catalina.security.SecurityUtil.execute(SecurityUtil.java:282)
    org.apache.catalina.security.SecurityUtil.doAsPrivilege(SecurityUtil.java:165)
    java.security.AccessController.doPrivileged(Native Method)
    com.sun.web.ui.util.UploadFilter.doFilter(UploadFilter.java:179)

---

DNATools.java calls the following line in AlphabetManager.java:

    InputStream alphabetStream =
    ClassTools.getClassLoader(AlphabetManager.class).getResourceAsStream("org/biojava/bio/symbol/AlphabetManager.xml");


So I suppose that the change in the Server-Configuration-file is not 
'globally enough' to affect all custom ClassLoader-calls.
Maybe someone has experienced something similar or knows something about 
this specific Server?
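
One way to narrow this down is to separate "resource not found" (a null stream) from "permission denied" (a SecurityException). The sketch below is a hypothetical diagnostic, not BioJava code: ResourceProbe is an invented name, and it uses plain Class.getClassLoader() instead of BioJava's ClassTools, so it only approximates the call quoted above.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical diagnostic helper; not part of BioJava.
class ResourceProbe {
    // Reports whether a resource can be opened through the class loader
    // of the given class: "found", "missing", or the reason it failed.
    static String probe(Class<?> anchor, String resourcePath) {
        try {
            ClassLoader loader = anchor.getClassLoader();
            if (loader == null) {
                // Classes loaded by the bootstrap loader report null here.
                loader = ClassLoader.getSystemClassLoader();
            }
            InputStream in = loader.getResourceAsStream(resourcePath);
            if (in == null) {
                return "missing";
            }
            in.close();
            return "found";
        } catch (SecurityException e) {
            // Under a security manager the message usually names the permission.
            return "denied: " + e.getMessage();
        } catch (IOException e) {
            return "unreadable: " + e.getMessage();
        }
    }
}
```

Called from inside the web application as ResourceProbe.probe(AlphabetManager.class, "org/biojava/bio/symbol/AlphabetManager.xml"), a "denied" result would point at the policy file, while "missing" would point at packaging or the class path.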

Thanks,
Felix


-- 
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité Mitte
Department of Immunology
Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany
Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426

From toddri at eden.rutgers.edu  Thu Dec  1 17:06:39 2005
From: toddri at eden.rutgers.edu (Todd Riley)
Date: Thu Dec  1 17:26:41 2005
Subject: [Biojava-l] Looking for a Fisher(like) Kernel
In-Reply-To: <43825416.1040909@eden.rutgers.edu>
References: <OF82B46BF9.50800E2E-ON482570C0.002A5349-482570C0.002A70EE@EU.novartis.net>
	<43825416.1040909@eden.rutgers.edu>
Message-ID: <438F73EF.8020408@eden.rutgers.edu>

Hello,

I have good news! I have fixed the bug in the BaumWelchTrainer class 
(hopefully the source in CVS will be updated soon).

Now that I am able to train my Profile HMM, I would like to feed my HMM 
into a Fisher Kernel to perform SVM training in order to find the proper 
scoring threshold for proper classification (ie - use SVM classification 
to set a barrier for my HMM log-odds scores).

Has anyone implemented a Fisher Kernel (or one like it) for the BioJava 
SVM classes?  Any information here would be greatly appreciated.

Thanks,
Todd

From hotafin at gmail.com  Thu Dec  1 20:30:14 2005
From: hotafin at gmail.com (Tamas Horvath)
Date: Thu Dec  1 21:56:21 2005
Subject: [Biojava-l] modify structure
In-Reply-To: <c343d7080512010426k7d398d1cwd082003e115e9108@mail.gmail.com>
References: <c343d7080511300821y7258c6bcs86a1deb8c6affa56@mail.gmail.com>
	<1706112d2cfe772f3501821995576ead@sanger.ac.uk>
	<c343d7080512010426k7d398d1cwd082003e115e9108@mail.gmail.com>
Message-ID: <c343d7080512011730g7efde2e6l3499d12f4f7d3fc3@mail.gmail.com>

If I've got a Group, which is an amino acid, and I want to shift it by a 3D
vector (or 3 2D vectors), how may I do it?
Similarly, if I want to rotate the same structure, how may I do it?
If you can just show me a very short sample code, I'd really appreciate it!
Thanks!
From escobarebio at yahoo.com  Fri Dec  2 00:53:56 2005
From: escobarebio at yahoo.com (D.Enrique ESCOBAR ESPINOZA)
Date: Fri Dec  2 00:58:18 2005
Subject: [Biojava-l] cvs downlod install
Message-ID: <20051202055356.89235.qmail@web30507.mail.mud.yahoo.com>

I put the files:
    * bytecode-0.92.jar
    * commons-cli.jar
    * commons-collections-2.1.jar
    * commons-dbcp-1.1.jar
    * commons-pool-1.1.jar
in my biojava directory: C:\biojava
I set my classpath:
set CLASSPATH C:\biojava\biojava.jar;
C:\biojava\bytecode-0.92.jar;
C:\biojava\commons-cli.jar;
C:\biojava\commons-collections-2.1.jar;
C:\biojava\commons-dbcp-1.1.jar;
C:\biojava\commons-pool-1.1.jar;.
With the cvs download, all biojava-live files were put into
C:\biojava\biojava-live\
so am I supposed to move these files up to the
C:\biojava\
directory?
**
When I run this in the folder C:\biojava\biojava-live\ (Windows):
  cd demos
  javac seq\TestEmbl.java
I get:
$ javac seq\TestEmbl.java
error: cannot read: seqTestEmbl.java
1 error
and when I run
  java seq.TestEmbl seq\AL121903.embl
I get:
$ java seq.TestEmbl seq\AL121903.embl
Exception in thread "main" java.lang.NoClassDefFoundError:
seq/TestEmbl

--------------------------------------------------
D.Enrique ESCOBAR ESPINOZA (B.Sc.) 
http://adn.bioinfo.uqam.ca/~escd07097301/
http://spaces.msn.com/members/escobarebio/
ICQ#: 201778618
-------------------------------------------------
Tel:  (514) 523-8398
Montreal QC Canada


__________________________________ 
Start your day with Yahoo! - Make it your home page! 
http://www.yahoo.com/r/hs
From hollandr at gis.a-star.edu.sg  Fri Dec  2 01:24:27 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Fri Dec  2 01:22:50 2005
Subject: [Biojava-l] cvs downlod install
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5602894D67@BIONIC.biopolis.one-north.com>

Your problem lies here:

> when i use in the folder C:\biojava\biojava-live\:
> (windows)
>   cd demos
>   javac seq\TestEmbl.java
> i obtain
> $ javac seq\TestEmbl.java
> error: cannot read: seqTestEmbl.java
> 1 error

Note that the java file has not been compiled. Hence when you later try
to run the compiled class, it's not there, so you get a
NoClassDefFoundError.

I suspect that either you accidentally left out the backslash (\)
between "seq" and "TestEmbl.java" when you typed the javac command, or
that Windows is misinterpreting the backslash. Try replacing it with a
double backslash (\\) or a forward slash (/).

cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199
---------------------------------------------
This email is confidential and may be privileged. If you are not the
intended recipient, please delete it and notify us immediately. Please
do not copy or use it for any purpose, or disclose its content to any
other person. Thank you.
---------------------------------------------



From ap3 at sanger.ac.uk  Fri Dec  2 05:17:49 2005
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Fri Dec  2 05:14:22 2005
Subject: [Biojava-l] modify structure
In-Reply-To: <c343d7080512011730g7efde2e6l3499d12f4f7d3fc3@mail.gmail.com>
References: <c343d7080511300821y7258c6bcs86a1deb8c6affa56@mail.gmail.com>
	<1706112d2cfe772f3501821995576ead@sanger.ac.uk>
	<c343d7080512010426k7d398d1cwd082003e115e9108@mail.gmail.com>
	<c343d7080512011730g7efde2e6l3499d12f4f7d3fc3@mail.gmail.com>
Message-ID: <d0e2fd03a50d90d7c7611b17f5f183f4@sanger.ac.uk>

Hi Tamas!


> If I've got a Group , which is an amino acid, and I want to shift it 
> by a 3D vector (or 3 2D vectors), how may I do it?

The org.biojava.bio.structure.Calc class lets you do calculations on 
the structure.

For example, to shift a structure:


                 double x = 2.0;
                 double y = 0.2;
                 double z = 12.3;

                 Atom vector = new AtomImpl();
                 vector.setX(x);
                 vector.setY(y);
                 vector.setZ(z);

                 // shift the structure.
                 Calc.shift(structure,vector);


>  Similarly, if i want to rotate the same structure, how may I do it?

                 double[][] matrix = new double[3][3];

                 matrix[0][0] = 0.1;
                 matrix[0][1] = 0.2;
                 matrix[0][2] = 0.3;
                 matrix[1][0] = 0.4;
                 matrix[1][1] = 0.5;
                 matrix[1][2] = 0.6;
                 matrix[2][0] = 0.7;
                 matrix[2][1] = 0.8;
                 matrix[2][2] = 0.9;

                 Calc.rotate(structure,matrix);
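
Note that the values 0.1 to 0.9 above only show how the array is filled; a real rotation needs an orthonormal matrix. The following is a minimal sketch of building one for a rotation about the z axis and applying it to a single point. It is plain Java with hypothetical names (RotationSketch, aboutZ, apply) and assumes the column-vector convention (result = matrix * point); which convention Calc.rotate expects should be checked against its documentation.

```java
// Hypothetical helper; not part of BioJava.
class RotationSketch {
    // Rotation by angleRad (radians) about the z axis, right-handed,
    // for use as matrix * columnVector.
    static double[][] aboutZ(double angleRad) {
        double c = Math.cos(angleRad);
        double s = Math.sin(angleRad);
        return new double[][] {
            { c,  -s,  0.0 },
            { s,   c,  0.0 },
            { 0.0, 0.0, 1.0 }
        };
    }

    // Returns matrix * point as a new array; the input point is not modified.
    static double[] apply(double[][] m, double[] p) {
        double[] out = new double[3];
        for (int row = 0; row < 3; row++) {
            out[row] = m[row][0] * p[0] + m[row][1] * p[1] + m[row][2] * p[2];
        }
        return out;
    }
}
```

Under that convention, a quarter turn about z maps the point (1, 0, 0) onto the y axis, up to floating point rounding.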


And here is an example for your question from yesterday about how to 
do mutations. Most of the code actually deals with finding the right 
chain and residue.
I will add the "Mutator" class to cvs, so in future doing mutations 
will be a two-liner...

Cheers,
Andreas


/*
  *                  BioJava development code
  *
  * This code may be freely distributed and modified under the
  * terms of the GNU Lesser General Public Licence.  This should
  * be distributed with the code.  If you do not have a copy,
  * see:
  *
  *      http://www.gnu.org/copyleft/lesser.html
  *
  * Copyright for this code is held jointly by the individual
  * authors.  These should be listed in @author doc comments.
  *
  * For more information on the BioJava project and its aims,
  * or to join the biojava-l mailing list, visit the home page
  * at:
  *
  *      http://www.biojava.org/
  *
  * Created on Nov 30, 2005
  *
  */

import java.io.FileOutputStream;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.biojava.bio.structure.AminoAcid;
import org.biojava.bio.structure.AminoAcidImpl;
import org.biojava.bio.structure.Atom;
import org.biojava.bio.structure.AtomIterator;
import org.biojava.bio.structure.Chain;
import org.biojava.bio.structure.ChainImpl;
import org.biojava.bio.structure.Group;
import org.biojava.bio.structure.Structure;
import org.biojava.bio.structure.StructureImpl;
import org.biojava.bio.structure.io.PDBFileReader;
import org.biojava.bio.structure.io.PDBParseException;


public class structureTest {

     public structureTest() {
         super();

     }

     public static void main (String[] args){
         String filename   =  "/Users/ap3/WORK/PDB/5pti.pdb" ;
         String outputfile =  "/Users/ap3/WORK/PDB/mutated.pdb" ;

         PDBFileReader pdbreader = new PDBFileReader();

         try{
                 Structure struc = pdbreader.getStructure(filename);
                 System.out.println(struc);


                 String chainId = " ";
                 String pdbResnum = "2";
                 String newType = "ARG";

                 // mutate the original structure and create a new one.
                 Mutator m = new Mutator();
                 Structure newstruc =
                     m.mutate(struc,chainId,pdbResnum,newType);

                 FileOutputStream out= new FileOutputStream(outputfile);
                 PrintStream p =  new PrintStream( out );

                 p.println (newstruc.toPDB());

                 p.close();


         } catch (Exception e) {
             e.printStackTrace();
         }
     }
}

class Mutator{
     List supportedAtoms;

     public Mutator(){
         supportedAtoms = new ArrayList();
         supportedAtoms.add("N");
         supportedAtoms.add("CA");
         supportedAtoms.add("C");
         supportedAtoms.add("O");
         supportedAtoms.add("CB");
     }

     /** creates a new structure which is identical with the original one.
      * only one amino acid will be different.
      *
      * @param struc
      * @param chainId
      * @param pdbResnum
      * @param newType
      * @return
      * @throws PDBParseException
      */
     public Structure  mutate(Structure struc, String chainId,
                              String pdbResnum, String newType)
     throws PDBParseException{


         // create a  container for the new structure
         Structure newstruc = new StructureImpl();

         // first we need to find our corresponding chain

         // get the chains for model nr. 0
         // if structure is xray there will be only one "model".
         List chains = struc.getChains(0);

         // iterate over all chains.
         Iterator iter = chains.iterator();
         while (iter.hasNext()){
             Chain c = (Chain)iter.next();
             if (c.getName().equals(chainId)) {
                 // here is our chain!

                 Chain newchain = new ChainImpl();
                 newchain.setName(c.getName());

                  List groups = c.getGroups();

                 // now iterate over all groups in this chain
                 // in order to find the amino acid that has this pdbResnum.

                 Iterator giter = groups.iterator();
                 while (giter.hasNext()){
                     Group g = (Group) giter.next();
                     String rnum = g.getPDBCode();

                     // we only mutate amino acids
                     // and ignore hetatoms and nucleotides in this case
                     if (rnum.equals(pdbResnum) &&
                         (g.getType().equals("amino"))){

                         // create the mutated amino acid and add it to
                         // our new chain
                         AminoAcid newgroup =
                             mutateResidue((AminoAcid)g,newType);
                         newchain.addGroup(newgroup);
                     }
                     else {
                         // add the group  to the new chain unmodified.
                         newchain.addGroup(g);
                     }
                 }

                 // add the newly constructed chain to the structure;
                 newstruc.addChain(newchain);
             } else {
                 // this chain is not requested, add it to the new
                 // structure unmodified.
                 newstruc.addChain(c);
             }

         }
         return newstruc;
     }

     /** create a new residue which is of the new type.
      * Only the atoms N, Ca, C, O, Cb will be considered.
      * prolines are not mutated...
      * @param oldAmino
      * @param newType
      * @return
      */
     public AminoAcid mutateResidue(AminoAcid oldAmino, String newType)
     throws PDBParseException {

         AminoAcid newgroup = new AminoAcidImpl();

         newgroup.setPDBCode(oldAmino.getPDBCode());
         newgroup.setPDBName(newType);


         AtomIterator aiter =new AtomIterator(oldAmino);
         while (aiter.hasNext()){
             Atom a = (Atom)aiter.next();
             if ( supportedAtoms.contains(a.getName())){
                 newgroup.addAtom(a);
             }
         }

         return newgroup;

     }

}

-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                               Hinxton, Cambridge CB10 1SA, UK
			 +44 (0) 1223 49 6891

From matthew.pocock at ncl.ac.uk  Fri Dec  2 06:11:15 2005
From: matthew.pocock at ncl.ac.uk (Matthew Pocock)
Date: Fri Dec  2 06:28:51 2005
Subject: [Biojava-l] Re: Looking for a Fisher(like) Kernel
In-Reply-To: <438F73EF.8020408@eden.rutgers.edu>
References: <OF82B46BF9.50800E2E-ON482570C0.002A5349-482570C0.002A70EE@EU.novartis.net>
	<43825416.1040909@eden.rutgers.edu>
	<438F73EF.8020408@eden.rutgers.edu>
Message-ID: <200512021111.16812.matthew.pocock@ncl.ac.uk>

On Thursday 01 December 2005 22:06, Todd Riley wrote:
> Hello,
>
> I have good news! I have fixed the bug in the BaumWelchTrainer class
> (hopefully the source in CVS will be updated soon).

Yay! What was it?

>
> Now that I am able to train my Profile HMM, I would like to feed my HMM
> into a Fisher Kernel to perform SVM training in order to find the proper
> scoring threshold for proper classification (ie - use SVM classification
> to set a barrier for my HMM log-odds scores).

Sounds like a plan...

>
> Has anyone implemented a Fisher Kernel (or one like it) for the BioJava
> SVM classes?  Any information here would be greatly appreciated.
>

I have not heard of one. However, I think you should be able to calculate the 
needed numbers using code nearly identical to that in the BaumWelchTrainer. 
In fact, I have a sneaking suspicion that the ModelTrainer parameters after 1 
cycle of training (before updating the model!) are the raw numbers that the 
SVM Fisher kernel requires.
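
A minimal sketch of what that could look like, assuming the expected
emission counts for one sequence have already been collected (for example
from a forward-backward pass). The class FisherScoreSketch and its array
layout are made up for illustration and are not part of BioJava. It uses the
form usually quoted for HMM emission parameters (expected count divided by
the emission probability, minus the state's total expected count) and a
plain dot product as the kernel, which ignores the Fisher information matrix:

```java
/** Hypothetical helper for illustration only; not a BioJava class. */
final class FisherScoreSketch {

    /**
     * @param expectedCounts expectedCounts[s][a] = expected number of times
     *                       state s emits symbol a for one sequence
     * @param emissionProbs  emissionProbs[s][a] = current emission
     *                       probability of symbol a in state s (must be > 0)
     * @return flattened score vector with one entry per (state, symbol) pair
     */
    static double[] fisherScore(double[][] expectedCounts, double[][] emissionProbs) {
        int states = expectedCounts.length;
        int symbols = expectedCounts[0].length;
        double[] score = new double[states * symbols];
        for (int s = 0; s < states; s++) {
            // total expected number of visits to state s
            double stateTotal = 0.0;
            for (int a = 0; a < symbols; a++) {
                stateTotal += expectedCounts[s][a];
            }
            for (int a = 0; a < symbols; a++) {
                score[s * symbols + a] =
                    expectedCounts[s][a] / emissionProbs[s][a] - stateTotal;
            }
        }
        return score;
    }

    /** Dot product of two score vectors (identity approximation of the kernel). */
    static double kernel(double[] u, double[] v) {
        double sum = 0.0;
        for (int i = 0; i < u.length; i++) {
            sum += u[i] * v[i];
        }
        return sum;
    }
}
```

The score vectors of two sequences would then be handed to the SVM through
whatever kernel interface it expects.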

> Thanks,
> Todd

Matthew
From dreher at mpiib-berlin.mpg.de  Fri Dec  2 07:48:00 2005
From: dreher at mpiib-berlin.mpg.de (Felix Dreher)
Date: Fri Dec  2 07:47:12 2005
Subject: [Biojava-l] Problem with downloading Genbank-sequence
In-Reply-To: <438F2ED4.8080105@mpiib-berlin.mpg.de>
References: <438F2ED4.8080105@mpiib-berlin.mpg.de>
Message-ID: <43904280.80300@mpiib-berlin.mpg.de>

Hello,

the exception I posted in the last mail had nothing to do with the 
application server I use. It was an IDE specific bug: 'Java Studio 
Creator Early Access 2' failed to execute the build.xml-file properly 
when creating biojava.jar.
This was solved by using Netbeans to build the jar-file again and 
deleting and re-adding it in StudioCreator.
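
For anyone who runs into something similar, a small check like the one below
can show whether a resource such as AlphabetManager.xml is visible to the
class loader at all. It is only a sketch: ResourceCheck is a made-up helper,
not part of BioJava, and the resource path is the one quoted from
AlphabetManager.java further down.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical diagnostic helper; not a BioJava class. */
final class ResourceCheck {

    /** Returns true if the resource can be opened through this class's class loader. */
    static boolean isOnClasspath(String resourcePath) {
        ClassLoader loader = ResourceCheck.class.getClassLoader();
        if (loader == null) {
            // classes loaded by the bootstrap loader report null here
            loader = ClassLoader.getSystemClassLoader();
        }
        try (InputStream in = loader.getResourceAsStream(resourcePath)) {
            return in != null;
        } catch (IOException e) {
            // only close() can fail here; treat that as "not usable"
            return false;
        }
    }

    public static void main(String[] args) {
        String path = "org/biojava/bio/symbol/AlphabetManager.xml";
        System.out.println(path + " found: " + isOnClasspath(path));
    }
}
```

If this reports that the resource is not found while biojava.jar is on the
classpath, the jar itself is a likely suspect.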

Greetings,
Felix


Felix Dreher wrote:

> Hi,
>
> the problem is the security-policy of the container I use for my 
> web-application. In this case it's the 'Sun Java System Application 
> Server Platform Edition 8.1'. As Thomas Down suggested, the Server 
> prohibits the creation of ClassLoaders, however they are needed by 
> BioJava.
> So I tried to customise the Server-configuration 'server.policy'-file 
> by adding a new line. Here is the code fraction:
>
>
> grant {
>     permission java.lang.RuntimePermission  "loadLibrary.*";
>        ...
>        ...  
>     //new line:
>     permission java.lang.RuntimePermission  "createClassLoader";
> };
>
>
> As some ClassLoader seems to have permission now, I think this was the 
> right starting point - and also the exception thrown changed. It's the 
> following:
>
>     org.biojava.bio.BioError: Unable to initialize DNATools
>     org.biojava.bio.seq.DNATools.<clinit>(DNATools.java:119)
>     org.biojava.bio.seq.db.GenbankSequenceDB.getAlphabet(GenbankSequenceDB.java:66)
>     org.biojava.bio.seq.db.GenbankSequenceDB.getSequence(GenbankSequenceDB.java:121)
>     rnai.GenbankDownload.loadGenBankSequence(GenbankDownload.java:23)
>     rnai.seq_input2.prerender(seq_input2.java:296)
>     com.sun.web.ui.appbase.faces.ViewHandlerImpl.prerender(ViewHandlerImpl.java:788)
>     com.sun.web.ui.appbase.faces.ViewHandlerImpl.renderView(ViewHandlerImpl.java:282)
>     com.sun.faces.lifecycle.RenderResponsePhase.execute(RenderResponsePhase.java:87)
>     com.sun.faces.lifecycle.LifecycleImpl.phase(LifecycleImpl.java:221)
>     com.sun.faces.lifecycle.LifecycleImpl.render(LifecycleImpl.java:117)
>     javax.faces.webapp.FacesServlet.service(FacesServlet.java:198)
>     sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>
>     sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     java.lang.reflect.Method.invoke(Method.java:585)
>     org.apache.catalina.security.SecurityUtil$1.run(SecurityUtil.java:249)
>     java.security.AccessController.doPrivileged(Native Method)
>     javax.security.auth.Subject.doAsPrivileged(Subject.java:517)
>     org.apache.catalina.security.SecurityUtil.execute(SecurityUtil.java:282)
>
>     org.apache.catalina.security.SecurityUtil.doAsPrivilege(SecurityUtil.java:165)
>     java.security.AccessController.doPrivileged(Native Method)
>     com.sun.web.ui.util.UploadFilter.doFilter(UploadFilter.java:179)
>
> ---
>
> DNATools.java calls the following line in AlphabetManager.java:
>
>     InputStream alphabetStream =
>     ClassTools.getClassLoader(AlphabetManager.class).getResourceAsStream("org/biojava/bio/symbol/AlphabetManager.xml");
>
>
> So I suppose that the change in the Server-Configuration-file is not 
> 'globally enough' to affect all custom ClassLoader-calls.
> Maybe someone has experienced something similar or knows something 
> about this specific Server?
>
> Thanks,
> Felix
>
>
>
>-- 
>Felix Dreher
>Max-Planck-Institute for Infection Biology
>Campus Charité Mitte
>Department of Immunology
>Mailing address: Schumannstraße 21/22
>Visitors: Virchowweg 12
>10117 Berlin
>Germany
>Tel.: +49 (0)30 28460-254 / -494
>Mobile: +49 (0)163 7542426
>  
>


-- 
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité Mitte
Department of Immunology
Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany
Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426

From hotafin at gmail.com  Fri Dec  2 08:02:38 2005
From: hotafin at gmail.com (Tamas Horvath)
Date: Fri Dec  2 08:00:19 2005
Subject: [Biojava-l] modify structure
In-Reply-To: <d0e2fd03a50d90d7c7611b17f5f183f4@sanger.ac.uk>
References: <c343d7080511300821y7258c6bcs86a1deb8c6affa56@mail.gmail.com>
	<1706112d2cfe772f3501821995576ead@sanger.ac.uk>
	<c343d7080512010426k7d398d1cwd082003e115e9108@mail.gmail.com>
	<c343d7080512011730g7efde2e6l3499d12f4f7d3fc3@mail.gmail.com>
	<d0e2fd03a50d90d7c7611b17f5f183f4@sanger.ac.uk>
Message-ID: <c343d7080512020502lc87c8epc39781e78816e2b7@mail.gmail.com>

Thanks for the code! I've noticed the methods in Calc, but my main
question is the following. Let's say I've got a primitive library of
amino acids. They are stored as Groups, and they have all their atoms.
When I'm mutating the chain, I want to keep the backbone atoms in place,
so as far as your mutate method goes, it's OK. But now I want to replace
the sidechain. In order to do that, I'd shift and rotate the desired AA
into place (the Cbs would be identical and the other backbone atoms as
close as possible), and then copy the sidechain atoms to the mutated
AA... (I hope that's clear)

So do I have to wrap my Group objects in a Chain/Structure object in
order to shift and rotate them?

I don't really get how the rotation is supposed to work... what exactly
is the matrix it asks for?

On 12/2/05, Andreas Prlic <ap3@sanger.ac.uk> wrote:
>
> Hi Tamas!
>
> > If I've got a Group , which is an amino acid, and I want to shift it
> > by a 3D vector (or 3 2D vectors), how may I do it?
>
> There is the org.biojava.bio.structure.Calc class that allows to do
> calculations with the structure.
>
> e.g. to shift a structure do:
>
>                  double x = 2.0;
>                  double y = 0.2;
>                  double z = 12.3;
>
>                  Atom vector = new AtomImpl();
>                  vector.setX(x);
>                  vector.setY(y);
>                  vector.setZ(z);
>
>                  // shift the structure.
>                  Calc.shift(structure,vector);
>
> >  Similarly, if i want to rotate the same structure, how may I do it?
>
>                  double[][] matrix = new double[3][3];
>
>                  matrix[0][0] = 0.1;
>                  matrix[0][1] = 0.2;
>                  matrix[0][2] = 0.3;
>                  matrix[1][0] = 0.4;
>                  matrix[1][1] = 0.5;
>                  matrix[1][2] = 0.6;
>                  matrix[2][0] = 0.7;
>                  matrix[2][1] = 0.8;
>                  matrix[2][2] = 0.9;
>
>                  Calc.rotate(structure,matrix);
>
> And here is an example regarding your questions from yesterday,
> how to do mutations. most of the code actually deals with finding the
> right  chain and residue.
> I will add the "mutator" class to cvs,  so in future doing mutations
> will be a two liner...
>
> Cheers,
> Andreas

From hotafin at gmail.com  Fri Dec  2 08:23:04 2005
From: hotafin at gmail.com (Tamas Horvath)
Date: Fri Dec  2 08:20:47 2005
Subject: [Biojava-l] modify structure
In-Reply-To: <c343d7080512020502lc87c8epc39781e78816e2b7@mail.gmail.com>
References: <c343d7080511300821y7258c6bcs86a1deb8c6affa56@mail.gmail.com>
	<1706112d2cfe772f3501821995576ead@sanger.ac.uk>
	<c343d7080512010426k7d398d1cwd082003e115e9108@mail.gmail.com>
	<c343d7080512011730g7efde2e6l3499d12f4f7d3fc3@mail.gmail.com>
	<d0e2fd03a50d90d7c7611b17f5f183f4@sanger.ac.uk>
	<c343d7080512020502lc87c8epc39781e78816e2b7@mail.gmail.com>
Message-ID: <c343d7080512020523o1f1cf32fva78fc917efa6c341@mail.gmail.com>

Just to be more clear:
I store the amino acids as a HashMap<String,Group>, where one Group
contains one AA and the String is the name of the AA. (I prefer the
one-letter code there.)

To shift the desired AA into place, I'd match the Cb atoms.
Then rotate the AA so that the Ca-Cb lines match.
Then rotate the AA so that the N-Ca-Cb planes match (in this case that's
equal to matching the N-Ca lines).

After this, the AA would be roughly in place.

After this there should probably be a collision test, but I think that
part can be handled by the GROMACS package, which I'd run afterwards
anyway to see how the mutation affects the structure.
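
A rough sketch of the "rotate so that the two lines match" step, using plain
double[3] arrays instead of BioJava Atom objects. AlignSketch and its methods
are made-up names, and the matrix is written for the column-vector
convention (result = M * v); whether Calc.rotate expects this convention or
its transpose is not assumed here. The rotation axis is the normal of the
plane spanned by the two directions, and the angle is the angle between them:

```java
/** Hypothetical helper for illustration only; not a BioJava class. */
final class AlignSketch {

    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    static double[] cross(double[] a, double[] b) {
        return new double[] {
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]
        };
    }

    static double norm(double[] a) {
        return Math.sqrt(dot(a, a));
    }

    /** Axis-angle rotation matrix that turns direction 'from' onto direction 'to'. */
    static double[][] rotationOnto(double[] from, double[] to) {
        double[] axis = cross(from, to);
        double axisLen = norm(axis);
        if (axisLen < 1e-12) {
            throw new IllegalArgumentException(
                "directions are parallel, anti-parallel or zero; axis undefined");
        }
        double x = axis[0] / axisLen;
        double y = axis[1] / axisLen;
        double z = axis[2] / axisLen;
        double lengths = norm(from) * norm(to);
        double c = dot(from, to) / lengths; // cosine of the rotation angle
        double s = axisLen / lengths;       // sine of the rotation angle
        double t = 1.0 - c;
        return new double[][] {
            { t * x * x + c,     t * x * y - s * z, t * x * z + s * y },
            { t * x * y + s * z, t * y * y + c,     t * y * z - s * x },
            { t * x * z - s * y, t * y * z + s * x, t * z * z + c     }
        };
    }

    /** Applies the matrix to a vector (result = m * v). */
    static double[] apply(double[][] m, double[] v) {
        return new double[] { dot(m[0], v), dot(m[1], v), dot(m[2], v) };
    }
}
```

In the procedure above, 'from' would be the Ca-Cb direction of the library
AA and 'to' the Ca-Cb direction of the residue in the chain, after both have
been shifted so that the Cb atoms coincide.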
From hotafin at gmail.com  Fri Dec  2 10:09:25 2005
From: hotafin at gmail.com (Tamas Horvath)
Date: Fri Dec  2 10:08:35 2005
Subject: [Biojava-l] cvs ant build failure
Message-ID: <c343d7080512020709h8f60cbfx15a8fe06733f5fb9@mail.gmail.com>

ant package-biojava
Buildfile: build.xml

init:
     [echo] Building biojava-live
     [echo] Java Home:/home/hota/programs/java/jdk1.5.0/jre
     [echo] JUnit present:                   ${junit.present}
     [echo] JUnit supported by Ant:          true
     [echo] HSQLDB driver present:           ${sqlDriver.hsqldb}

prepare:

prepare-biojava:

compile-biojava:
    [javac] Compiling 93 source files to /data3/installs/biojava-live/ant-build/classes/biojava
    [javac] /data3/installs/biojava-live/src/org/biojavax/EmptyRichAnnotation.java:42: org.biojavax.EmptyRichAnnotation is not abstract and does not override abstract method getProperties(java.lang.Object) in org.biojavax.RichAnnotation
    [javac] public class EmptyRichAnnotation extends Unchangeable implements RichAnnotation, Serializable {
    [javac]        ^
    [javac] Note: * uses or overrides a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] 1 error

BUILD FAILED
/data3/installs/biojava-live/build.xml:267: Compile failed; see the compiler error output for details.

From erik.sjolund at gmail.com  Fri Dec  2 08:20:53 2005
From: erik.sjolund at gmail.com (Erik Sjölund)
Date: Fri Dec  2 14:56:07 2005
Subject: [Biojava-l] abi2xml a new parser of abi trace files
Message-ID: <ddaa1890512020520p79e2ed23of8659ee059256027@mail.gmail.com>

Biojava contains a class to parse abi trace files:

http://www.biojava.org/docs/api14/org/biojava/bio/program/abi/ABITrace.html

So you might be interested to know that a new command line utility has
been released

http://abi2xml.sourceforge.net

that converts abi trace files to xml files. This bioinformatics
utility is written in C++ and released under the GPL license. A java
programmer could first convert the abi files to xml files and then
access the information over a DOM interface  or over XPATH. Probably
that java programmer has nothing to gain doing this compared to using
the ABITrace class, but I thought it was worth mentioning the
possibility.
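
A minimal sketch of the XPath route, using only the javax.xml.xpath classes
that ship with Java 5 and later. The class TraceXmlSketch and the element
names in the sample document are invented for illustration; they are not
the actual abi2xml output format, so the expression would have to be
adapted to the real files.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical example class; not part of BioJava or abi2xml. */
final class TraceXmlSketch {

    /** Evaluates an XPath expression against an XML string and returns the text result. */
    static String evaluate(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Invented sample document, NOT the real abi2xml schema.
        String xml = "<trace><sequence>ACGT</sequence></trace>";
        System.out.println(evaluate(xml, "/trace/sequence"));
    }
}
```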

cheers,
Erik Sjölund

From fpepin at cs.mcgill.ca  Fri Dec  2 15:40:57 2005
From: fpepin at cs.mcgill.ca (Francois Pepin)
Date: Fri Dec  2 15:38:55 2005
Subject: [Biojava-l] cvs ant build failure
In-Reply-To: <c343d7080512020709h8f60cbfx15a8fe06733f5fb9@mail.gmail.com>
References: <c343d7080512020709h8f60cbfx15a8fe06733f5fb9@mail.gmail.com>
Message-ID: <1133556057.16992.36.camel@elm.mcb.mcgill.ca>

Hi Tamas,

I can compile fine from the CVS right now.

Two reasons why it might not work for you:

1- you might not be up to date; 'cvs update' should fix that.

2- you have a previous build in there (otherwise it would say compiling
1096 source files instead of 93). You probably want to do an 'ant clean'
and try again.

Francois

On Fri, 2005-12-02 at 15:09 +0000, Tamas Horvath wrote:
> ant package-biojava
> Buildfile: build.xml
>
> init:
>      [echo] Building biojava-live
>      [echo] Java Home:/home/hota/programs/java/jdk1.5.0/jre
>      [echo] JUnit present:                   ${junit.present}
>      [echo] JUnit supported by Ant:          true
>      [echo] HSQLDB driver present:           ${sqlDriver.hsqldb}
>
> prepare:
>
> prepare-biojava:
>
> compile-biojava:
>     [javac] Compiling 93 source files to /data3/installs/biojava-live/ant-build/classes/biojava
>     [javac] /data3/installs/biojava-live/src/org/biojavax/EmptyRichAnnotation.java:42: org.biojavax.EmptyRichAnnotation is not abstract and does not override abstract method getProperties(java.lang.Object) in org.biojavax.RichAnnotation
>     [javac] public class EmptyRichAnnotation extends Unchangeable implements RichAnnotation, Serializable {
>     [javac]        ^
>     [javac] Note: * uses or overrides a deprecated API.
>     [javac] Note: Recompile with -Xlint:deprecation for details.
>     [javac] 1 error
>
> BUILD FAILED
> /data3/installs/biojava-live/build.xml:267: Compile failed; see the compiler error output for details.
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
> 

From hotafin at gmail.com  Sun Dec  4 16:06:53 2005
From: hotafin at gmail.com (Tamas Horvath)
Date: Sun Dec  4 16:31:37 2005
Subject: [Biojava-l] 3d structure rotation code
Message-ID: <c343d7080512041306g50c2f293tbb89e85c6046ccb0@mail.gmail.com>

I'd like to show you the following two functions that may be valuable in the
Calc class:

   /**
    * Returns a rotated Structure object (the rotation is around the origin).
    *
    * @param ostructure Structure -- the structure to be rotated
    * @param from Atom            -- the reference Atom's original coordinates
    * @param to Atom              -- the reference Atom's desired coordinates
    * @return Structure           -- null if there was an error
    */
   public static Structure rotate3D(Structure ostructure, Atom from, Atom to)
           throws StructureException {
       Structure nstructure = new StructureImpl();

       // calculate the angle of rotation
       final double angle = radangle(from, to);
       if (angle == 0 || angle == Math.PI) {
           throw new StructureException("The rotation angle is 0 or 180 degrees!");
       }

       // calculate the unit normal vector of the (origin, from, to) plane,
       // which will serve as an arbitrary axis for the rotation
       Atom axisvector = vectorProduct(from, to);
       axisvector = unitVector(axisvector);

       // calculate the trigonometric values
       final double c = Math.cos(angle);
       final double s = Math.sin(angle);
       final double t = 1 - Math.cos(angle);

       final double x = axisvector.getX();
       final double y = axisvector.getY();
       final double z = axisvector.getZ();

       // and now the matrix
       double[][] rotationmatrix = new double[3][3];
       rotationmatrix[0][0] = t*x*x+c;   rotationmatrix[0][1] = t*x*y+s*z; rotationmatrix[0][2] = t*x*z-s*y;
       rotationmatrix[1][0] = t*x*y-s*z; rotationmatrix[1][1] = t*y*y+c;   rotationmatrix[1][2] = t*y*z+s*x;
       rotationmatrix[2][0] = t*x*z+s*y; rotationmatrix[2][1] = t*y*z-s*x; rotationmatrix[2][2] = t*z*z+c;

       // and now the rotation
       nstructure = (Structure) ostructure.clone();
       try {
           rotate(nstructure, rotationmatrix);
       }
       catch (StructureException e) {
           System.out.println(e);
           nstructure = null;
       }

       return nstructure;
   }

   /**
    * Calculates the (a, origin, b) angle in radians.
    *
    * @param a Atom
    * @param b Atom
    * @return double
    */
   public static double radangle(Atom a, Atom b) {
       final double skalar = skalarProduct(a, b);
       final double radangle = Math.acos(skalar / (amount(a) * amount(b)));

       return radangle;
   }
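
For anyone who wants to check the matrix layout without the BioJava classes,
here is a minimal sketch on plain double arrays. The names AxisAngleSketch,
rotationMatrix and apply are hypothetical and not part of BioJava. It assumes
the row-vector convention v' = v * M, which is the layout the matrix in
rotate3D appears to follow; whether Calc.rotate multiplies the same way is an
assumption here.

```java
final class AxisAngleSketch {

    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    static double[] cross(double[] a, double[] b) {
        return new double[] {
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]
        };
    }

    static double norm(double[] a) {
        return Math.sqrt(dot(a, a));
    }

    /** Matrix M such that the row vector (from * M) points along to. */
    static double[][] rotationMatrix(double[] from, double[] to) {
        double[] axis = cross(from, to);
        double len = norm(axis);
        if (len < 1e-12) {
            // 0 or 180 degrees (or a zero vector): the axis is undefined
            throw new IllegalArgumentException("from and to are collinear");
        }
        double x = axis[0] / len, y = axis[1] / len, z = axis[2] / len;

        // clamp to guard acos against rounding slightly outside [-1, 1]
        double cosAngle = dot(from, to) / (norm(from) * norm(to));
        double angle = Math.acos(Math.max(-1.0, Math.min(1.0, cosAngle)));
        double c = Math.cos(angle), s = Math.sin(angle), t = 1 - c;

        return new double[][] {
            { t * x * x + c,     t * x * y + s * z, t * x * z - s * y },
            { t * x * y - s * z, t * y * y + c,     t * y * z + s * x },
            { t * x * z + s * y, t * y * z - s * x, t * z * z + c     }
        };
    }

    /** Row vector times matrix: result[j] = sum over i of v[i] * m[i][j]. */
    static double[] apply(double[] v, double[][] m) {
        double[] r = new double[3];
        for (int j = 0; j < 3; j++) {
            r[j] = v[0] * m[0][j] + v[1] * m[1][j] + v[2] * m[2][j];
        }
        return r;
    }

    public static void main(String[] args) {
        double[] from = { 1, 0, 0 };
        double[] to = { 0, 1, 0 };
        double[] r = apply(from, rotationMatrix(from, to));
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```

Rotating `from` with the resulting matrix should give a vector parallel to
`to`, which makes a convenient sanity check for each matrix entry.
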
From hollandr at gis.a-star.edu.sg  Sun Dec  4 20:42:55 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Sun Dec  4 20:41:11 2005
Subject: [Biojava-l] cvs ant build failure
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5602894DF0@BIONIC.biopolis.one-north.com>

I just checked out the most recent version and found a bug in
EmptyRichAnnotation just as your compiler output indicates. I fixed it. 

But... it still won't compile, but now for a different reason. It seems
that Andreas' check-in of his structure classes over the weekend was
missing the Matrix and SingularValueDecomposition classes. Andreas can
you fix this please?

cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199
---------------------------------------------
This email is confidential and may be privileged. If you are not the
intended recipient, please delete it and notify us immediately. Please
do not copy or use it for any purpose, or disclose its content to any
other person. Thank you.
---------------------------------------------


> -----Original Message-----
> From: biojava-l-bounces@portal.open-bio.org 
> [mailto:biojava-l-bounces@portal.open-bio.org] On Behalf Of 
> Tamas Horvath
> Sent: Friday, December 02, 2005 11:09 PM
> To: biojava-l@biojava.org
> Subject: [Biojava-l] cvs ant build failure
> 
> 
> ant package-biojava
> Buildfile: build.xml
> init:
>     [echo] Building biojava-live
>     [echo] Java Home:/home/hota/programs/java/jdk1.5.0/jre
>     [echo] JUnit present:                   ${junit.present}
>     [echo] JUnit supported by Ant:          true
>     [echo] HSQLDB driver present:           ${sqlDriver.hsqldb}
> prepare:
> prepare-biojava:
> compile-biojava:
>     [javac] Compiling 93 source files to /data3/installs/biojava-live/ant-build/classes/biojava
>     [javac] /data3/installs/biojava-live/src/org/biojavax/EmptyRichAnnotation.java:42: org.biojavax.EmptyRichAnnotation is not abstract and does not override abstract method getProperties(java.lang.Object) in org.biojavax.RichAnnotation
>     [javac] public class EmptyRichAnnotation extends Unchangeable implements RichAnnotation, Serializable {
>     [javac]        ^
>     [javac] Note: * uses or overrides a deprecated API.
>     [javac] Note: Recompile with -Xlint:deprecation for details.
>     [javac] 1 error
> BUILD FAILED
> /data3/installs/biojava-live/build.xml:267: Compile failed; see the compiler error output for details.
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
> 

From mark.schreiber at novartis.com  Sun Dec  4 20:52:12 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Sun Dec  4 20:49:52 2005
Subject: [Biojava-l] cvs ant build failure
Message-ID: <OF27138484.364F895F-ON482570CE.0009FB32-482570CE.000A4619@EU.novartis.net>

Just a reminder to people with CVS accounts (including myself who is 
sometimes guilty of this):

The minimum requirement of CVS is that it will build at all times (using 
JDK1.4.2).
The desirable requirement is that it will build and pass all unit tests. 
This is not a strict requirement for the live distribution but it is good 
to think about what you may have done to break the unit tests.

- Mark


"Richard HOLLAND" <hollandr@gis.a-star.edu.sg>
Sent by: biojava-l-bounces@portal.open-bio.org
12/05/2005 09:42 AM

 
        To:     "Tamas Horvath" <hotafin@gmail.com>
        cc:     biojava-l@biojava.org, (bcc: Mark Schreiber/GP/Novartis)
        Subject:        RE: [Biojava-l] cvs ant build failure




From mark.schreiber at novartis.com  Mon Dec  5 01:31:48 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Mon Dec  5 01:29:27 2005
Subject: [Biojava-l] BaumWelchTrainer Broken??!!!  (please help)
Message-ID: <OFE4106D02.97B9441A-ON482570CE.0023B193-482570CE.0023DEDF@EU.novartis.net>

Fixes for this bug suggested by Todd Riley and Thomas Down are now in CVS. 
I have tried a few examples and it seems to work well.

- Mark

From ap3 at sanger.ac.uk  Mon Dec  5 04:31:53 2005
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Mon Dec  5 04:28:22 2005
Subject: [Biojava-l] cvs ant build failure
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5602894DF0@BIONIC.biopolis.one-north.com>
References: <6D9E9B9DF347EF4385F6271C64FB8D5602894DF0@BIONIC.biopolis.one-north.com>
Message-ID: <5ab80aa187d2967d8abab0c13a239f47@sanger.ac.uk>

Hi Richard,

> But... it still won't compile, but now for a different reason. It seems
> that Andreas' check-in of his structure classes over the weekend was
> missing the Matrix and SingularValueDecomposition classes. Andreas can
> you fix this please?

They were all checked in at the same time yesterday evening in a new 
directory.

Did you do a

cvs update -dP

?

-d is for getting new directories and -P for pruning empty ones.

Cheers,
Andreas


-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                               Hinxton, Cambridge CB10 1SA, UK
			 +44 (0) 1223 49 6891

From hollandr at gis.a-star.edu.sg  Mon Dec  5 04:37:00 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Mon Dec  5 04:35:16 2005
Subject: [Biojava-l] cvs ant build failure
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5602894E48@BIONIC.biopolis.one-north.com>

*doh!*

All working. Thanks, Andreas.


Richard Holland
Bioinformatics Specialist
GIS extension 8199


> -----Original Message-----
> From: Andreas Prlic [mailto:ap3@sanger.ac.uk] 
> Sent: Monday, December 05, 2005 5:32 PM
> To: Richard HOLLAND
> Cc: biojava-l@biojava.org
> Subject: Re: [Biojava-l] cvs ant build failure
> 
> 
> Hi Richard,
> 
> > But... it still won't compile, but now for a different 
> reason. It seems
> > that Andreas' check-in of his structure classes over the weekend was
> > missing the Matrix and SingularValueDecomposition classes. 
> Andreas can
> > you fix this please?
> 
> They were all checked in at the same time yesterday evening in a new 
> directory.
> 
> Did you do a
> 
> cvs update -dP
> 
> ?
> 
> -d is for getting new directories and p for purging old ones.
> 
> Cheers,
> Andreas
> 
> 
> --------------------------------------------------------------
> ---------
> 
> Andreas Prlic      Wellcome Trust Sanger Institute
>                                Hinxton, Cambridge CB10 1SA, UK
> 			 +44 (0) 1223 49 6891
> 
> 

From hotafin at gmail.com  Mon Dec  5 09:13:59 2005
From: hotafin at gmail.com (Tamas Horvath)
Date: Mon Dec  5 09:18:52 2005
Subject: [Biojava-l] parsePDB
Message-ID: <c343d7080512050613n362fb249s316cc54bd8dbe60a@mail.gmail.com>

I have some very plain PDB files (coordinates only), and if I try to parse
them, it throws:

java.lang.StringIndexOutOfBoundsException: String index out of range: 6
    at java.lang.String.substring(String.java:1765)
    at org.biojava.bio.structure.io.PDBFileParser.parsePDBFile(PDBFileParser.java:764)
    at org.biojava.bio.structure.io.PDBFileParser.parsePDBFile(PDBFileParser.java:720)

What do I need in the PDB file to be able to parse it?
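
One possible cause, which this thread does not confirm, is that the parser
takes a fixed-width substring of every line, so a line shorter than six
characters (a bare "END" or "TER", say) runs past the end of the string. Under
that assumption, a workaround sketch is to pad each line to the 80 columns of
the PDB format before handing the text to the parser. PdbPadSketch and its
methods are hypothetical helpers, not BioJava classes.

```java
final class PdbPadSketch {

    /** PDB records are fixed-width, 80 columns per line. */
    static final int PDB_WIDTH = 80;

    /** Right-pads one line with spaces up to PDB_WIDTH; longer lines are kept as-is. */
    static String padLine(String line) {
        StringBuilder sb = new StringBuilder(line);
        while (sb.length() < PDB_WIDTH) {
            sb.append(' ');
        }
        return sb.toString();
    }

    /** Pads every non-blank line of a PDB text and drops blank lines. */
    static String padAll(String pdbText) {
        StringBuilder out = new StringBuilder();
        String[] lines = pdbText.split("\r?\n");
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].trim().length() == 0) {
                continue; // blank lines carry no record name at all
            }
            out.append(padLine(lines[i])).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String padded = padAll("ATOM      1  N   GLY A   1\n\nEND\n");
        System.out.print(padded);
    }
}
```

If the padded text parses while the original does not, short lines were the
problem; if it still fails, the cause lies elsewhere.
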
From ap3 at sanger.ac.uk  Mon Dec  5 09:36:00 2005
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Mon Dec  5 09:32:51 2005
Subject: [Biojava-l] 3d structure rotation code
In-Reply-To: <c343d7080512020502lc87c8epc39781e78816e2b7@mail.gmail.com>
References: <c343d7080511300821y7258c6bcs86a1deb8c6affa56@mail.gmail.com>
	<1706112d2cfe772f3501821995576ead@sanger.ac.uk>
	<c343d7080512010426k7d398d1cwd082003e115e9108@mail.gmail.com>
	<c343d7080512011730g7efde2e6l3499d12f4f7d3fc3@mail.gmail.com>
	<d0e2fd03a50d90d7c7611b17f5f183f4@sanger.ac.uk>
	<c343d7080512020502lc87c8epc39781e78816e2b7@mail.gmail.com>
Message-ID: <e799609ff1a5b1d27adac5077e3c6901@sanger.ac.uk>

Hi!

Regarding the question posted by Tamas for creating artificial 
side-chains for amino acids last week:

To superimpose two (or more) residues/atoms, one needs to do a singular 
value decomposition, which
gives the  required rotation matrix and shift vector.  The recent 
biojava 1.4 release could not do this,
but after doing a little bit of research and using available open 
source as template,  biojava - cvs can do this now!

A "screenshot"  of two superimposed residues is available at:

http://www.sanger.ac.uk/Users/ap3/rotation_example.html


This is achieved by using the Jama library, which is in the public domain 
as a U.S. government work (i.e. do whatever you want).
  It is located at http://math.nist.gov/javanumerics/jama/

I added the few files from this package to the biojava cvs repository 
under org.biojava.bio.structure.jama.
  I thought that biojava should not have yet another .jar dependency, so 
including the code is better.

There is now also a class called SVDSuperimposer. It is heavily 
inspired by some code available from our friends
at Biopython.... :-) Thanks also to Peter Lackner for providing an 
example for how to calculate "virtual"
  CB atoms.


Now Tamas: back to your problem. I think you want to do something like 
the code below:

Regards,
Andreas


  try{

             // get two amino acids from somewhere
             String filename   =  "/Users/ap3/WORK/PDB/5pti.pdb" ;

             PDBFileReader pdbreader = new PDBFileReader();
             Structure struc = pdbreader.getStructure(filename);
             Group g1 = (Group)struc.getChain(0).getGroup(56);
             Group g2 = (Group)struc.getChain(0).getGroup(21);

             if ( g1.getPDBName().equals("GLY")){
                 if ( g1 instanceof AminoAcid){
                     Atom cb = Calc.createVirtualCBAtom((AminoAcid)g1);
                     g1.addAtom(cb);
                 }
             }

             if ( g2.getPDBName().equals("GLY")){
                 if ( g2 instanceof AminoAcid){
                     Atom cb = Calc.createVirtualCBAtom((AminoAcid)g2);
                     g2.addAtom(cb);
                 }
             }


             System.out.println(g1);
             System.out.println(g2);

             // convert the Groups to Atom arrays
             Atom[] atoms1 = new Atom[3];
             Atom[] atoms2 = new Atom[3];

             atoms1[0] = g1.getAtom("N");
             atoms1[1] = g1.getAtom("CA");
             atoms1[2] = g1.getAtom("CB");


             atoms2[0] = g2.getAtom("N");
             atoms2[1] = g2.getAtom("CA");
             atoms2[2] = g2.getAtom("CB");


             // and do the SVD ...
             SVDSuperimposer svds = new SVDSuperimposer(atoms1,atoms2);

             // the rotation matrix to be applied to group2
             Matrix rotMatrix = svds.getRotation();

             // and the vector to shift group2
             Atom tranMatrix = svds.getTranslation();


             // now we have all the info to perform the rotations ...

             // clone group2 - we want to preserve the original coords
             // for the output later.
             Group newGroup = (Group)g2.clone();


             // and rotate it
             Calc.rotate(newGroup,rotMatrix);

             //    shift the group ...
             Calc.shift(newGroup,tranMatrix);

             // that's it!


             ///
             // now we finish up with doing some output:
             // write to a file to view in a viewer
             String outputfile = "/Users/ap3/WORK/PDB/rotated.pdb";

             FileOutputStream out= new FileOutputStream(outputfile);
             PrintStream p =  new PrintStream( out );

             // create a new structure that contains the data to be
             // written to the file.
             Structure newstruc = new StructureImpl();

             // add the group1
             Chain c1 = new ChainImpl();
             c1.setName("A");
             c1.addGroup(g1);
             newstruc.addChain(c1);

             // add the now correctly positioned group2
             Chain c2 = new ChainImpl();
             c2.setName("B");
             c2.addGroup(newGroup);
             newstruc.addChain(c2);


             // show where the group was originally ...
             Chain c3 = new ChainImpl();
             c3.setName("C");
             //c3.addGroup(g1);
             c3.addGroup(g2);

             newstruc.addChain(c3);
             p.println(newstruc.toPDB());

             p.close();

             System.out.println("wrote to file " + outputfile);

         } catch (Exception e){
             e.printStackTrace();
         }


On 2 Dec 2005, at 13:02, Tamas Horvath wrote:

> Thanks for the codes! I've noticed the methods in Calc, but my main 
> question is the following. Let's say I've got a primitive library of 
> AminoAcides. They stored as a group, they have all the atoms. When I'm 
> mutating the chain, I want to keep the backbone atoms in place, so as 
> far as your mutate method goes it's ok. But now I want to replace the 
> sidechain. In order to do that, I'd shift and rotate the desired AA in 
> place (Cbs would be identical and the other backbone atoms as close as 
> possible), and then copy the sidechain atoms to the mutated AA... (I 
> hope that's clear)
>
> So do I have to wrap my Group objects to a Chain/Structure object in 
> order to shift and rotate them?
>
> I don't really get how the rotation is supposed to work... what is 
> exactly the matrix it asks for?
-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                               Hinxton, Cambridge CB10 1SA, UK
			 +44 (0) 1223 49 6891

From ap3 at sanger.ac.uk  Mon Dec  5 09:40:57 2005
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Mon Dec  5 09:37:16 2005
Subject: [Biojava-l] parsePDB
In-Reply-To: <c343d7080512050613n362fb249s316cc54bd8dbe60a@mail.gmail.com>
References: <c343d7080512050613n362fb249s316cc54bd8dbe60a@mail.gmail.com>
Message-ID: <2ec64bcf7c06324f76606abd8b887255@sanger.ac.uk>

can you send me one of your files off list?

the parser could parse all of PDB about one year ago ...

And.		


On 5 Dec 2005, at 14:13, Tamas Horvath wrote:

> I have some very plane pdb files (Coordinates only), and if I try to  
> parsethem, it throws:
> java.lang.StringIndexOutOfBoundsException: String index out of range:  
> 6    at java.lang.String.substring(String.java:1765)    at  
> org.biojava.bio.structure.io.PDBFileParser.parsePDBFile(PDBFileParser.j 
> ava:764)    at  
> org.biojava.bio.structure.io.PDBFileParser.parsePDBFile(PDBFileParser.j 
> ava:720)
>
> What do I need in the pdb file to be able to parse it?
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
>
>
-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                               Hinxton, Cambridge CB10 1SA, UK
			 +44 (0) 1223 49 6891

From mes5k at cs.virginia.edu  Mon Dec  5 22:00:49 2005
From: mes5k at cs.virginia.edu (Michael E. Smoot)
Date: Mon Dec  5 22:24:35 2005
Subject: [Biojava-l] Hit_def from blast xml output?
Message-ID: <Pine.SOC.4.64.0512052156380.8461@mamba.cs.Virginia.EDU>


Hi,

Can anyone tell me how I might get the value of the Hit_def tag from
blast xml output?  I'm following the cookbook protocol for parsing and 
extracting results 
(http://www.biojava.org/docs/bj_in_anger/BlastParser.htm).  I see a way to 
get the subject (hit) id, but not the description.


thanks,
Mike

From mark.schreiber at novartis.com  Tue Dec  6 02:44:50 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Tue Dec  6 02:42:59 2005
Subject: [Biojava-l] Hit_def from blast xml output?
Message-ID: <OFD7DB048B.86D45508-ON482570CF.002A3491-482570CF.002A8EEF@EU.novartis.net>

You may need to customize your blast listeners. If you run the blast echo 
example in biojava in anger you will find out what event type that 
information is contained in. You can then listen for that event type.

http://www.biojava.org/docs/bj_in_anger/blastecho.htm
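
If customizing the listeners turns out to be more than is needed, a different
route is to read the Hit_def elements directly. The sketch below is not the
BioJava API; it uses only the SAX parser that ships with the JDK, and
HitDefSketch is a hypothetical name. Real BLAST XML files begin with a DOCTYPE
pointing at an NCBI DTD, which the parser may try to load; the sketch does not
deal with that.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class HitDefSketch extends DefaultHandler {

    private final List<String> hitDefs = new ArrayList<String>();
    private StringBuilder text; // non-null only while inside a Hit_def element

    public void startElement(String uri, String local, String qName, Attributes atts) {
        if ("Hit_def".equals(qName)) {
            text = new StringBuilder();
        }
    }

    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one text node in several chunks, so append
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    public void endElement(String uri, String local, String qName) {
        if ("Hit_def".equals(qName)) {
            hitDefs.add(text.toString());
            text = null;
        }
    }

    /** Returns the text of every Hit_def element, in document order. */
    static List<String> extract(Reader xml) {
        try {
            HitDefSketch handler = new HitDefSketch();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(xml), handler);
            return handler.hitDefs;
        } catch (Exception e) {
            throw new RuntimeException("could not parse BLAST XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<BlastOutput><Hit><Hit_id>id1</Hit_id>"
                + "<Hit_def>example description</Hit_def></Hit></BlastOutput>";
        System.out.println(extract(new StringReader(xml)));
    }
}
```
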

- Mark


"Michael E. Smoot" <mes5k@cs.virginia.edu>
Sent by: biojava-l-bounces@portal.open-bio.org
12/06/2005 11:00 AM

 
        To:     biojava-l@biojava.org
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [Biojava-l] Hit_def from blast xml output?




From hotafin at gmail.com  Thu Dec  8 09:16:47 2005
From: hotafin at gmail.com (Tamas Horvath)
Date: Thu Dec  8 09:14:40 2005
Subject: [Biojava-l] gromacs shell script
Message-ID: <c343d7080512080616k467e20a9o83454a19da5bd062@mail.gmail.com>

I know this is not strictly BioJava, but here's my problem:
I create a shell script file that would run a GROMACS MD simulation.
I generate the necessary input and config files.
I can make the generated shell script runnable.
I cannot actually run the script from the Java application.
The script works fine from the shell...
The returned exitValue is 255.
Can anyone tell me what I can do?
From hotafin at gmail.com  Thu Dec  8 10:00:49 2005
From: hotafin at gmail.com (Tamas Horvath)
Date: Thu Dec  8 09:58:33 2005
Subject: [Biojava-l] gromacs shell script
In-Reply-To: <C6BB1558-3034-47B8-BB23-0A45CEC3EB58@sanger.ac.uk>
References: <c343d7080512080616k467e20a9o83454a19da5bd062@mail.gmail.com>
	<C6BB1558-3034-47B8-BB23-0A45CEC3EB58@sanger.ac.uk>
Message-ID: <c343d7080512080700h29f35172kf950bc131ba751c1@mail.gmail.com>

I tried the ProcessTools method and nothing happens there either...
On 12/8/05, Thomas Down <td2@sanger.ac.uk> wrote:
>
> On 8 Dec 2005, at 14:16, Tamas Horvath wrote:
>
> > I know this is not strictly BioJava, but here's my problem:
> > I create a shell script file that would run a GROMACS MD
> > simulation. I generate the necessary input and config files
> > I can make the generated shell script runnable
> > I cannot actually run the script from the Java application.
> > The script works fine from shell...
> > The returned exitValue is 255.
> > Can anyone tell, what may I do?
>
> Could you give a few more details about the code you're using to run
> the shell script from Java?  Running external processes using Java's
> Runtime.exec method isn't totally trivial -- you usually need to
> start some extra threads to handle the child process' input and output.
>
> I presume your script is actually printing some kind of error message
> to standard error (or maybe standard out, if it's badly behaved), but
> this may be getting lost.
>
> BioJava has some convenience methods that (usually) allow you to run
> child processes without writing your own multithreaded code.  A
> simple usage, that echoes the child's errors and outputs to the
> console, would be something like:
>
>           ProcessTools.exec(
>                            new String[] {"/path/to/my/script", "-someArgument"},
>                            null,           // no standard input
>                            new OutputStreamWriter(System.out),
>                            new OutputStreamWriter(System.err)
>           );
>
> You probably don't want to do this in production code, but for
> development and debugging it's quite useful.  For production use,
> you'd normally use StringWriters to capture the child process' output.
>
>              Thomas.
>
From hotafin at gmail.com  Thu Dec  8 09:59:06 2005
From: hotafin at gmail.com (Tamas Horvath)
Date: Thu Dec  8 10:03:36 2005
Subject: [Biojava-l] gromacs shell script
In-Reply-To: <C6BB1558-3034-47B8-BB23-0A45CEC3EB58@sanger.ac.uk>
References: <c343d7080512080616k467e20a9o83454a19da5bd062@mail.gmail.com>
	<C6BB1558-3034-47B8-BB23-0A45CEC3EB58@sanger.ac.uk>
Message-ID: <c343d7080512080659l77d5d80cj824ad08bf4c681c8@mail.gmail.com>

On 12/8/05, Tamas Horvath <hotafin@gmail.com> wrote:
>
>             Runtime rtime = Runtime.getRuntime();
>             Process child = rtime.exec("/bin/sh");
>
>             BufferedWriter outCommand = new BufferedWriter(
>                     new OutputStreamWriter(child.getOutputStream()));
>
>             outCommand.write("cd " + workhome + "; chmod +x run.bat; exit\n");
>             outCommand.flush();
>
>             child.waitFor();
>             child.destroy();


This runs well; the script gets the executable flag.

>             rtime = Runtime.getRuntime();
>             child = rtime.exec(workhome + "run.bat");
>
>             BufferedReader input = new BufferedReader(
>                     new InputStreamReader(child.getInputStream()));
>             BufferedReader inerr = new BufferedReader(
>                     new InputStreamReader(child.getErrorStream()));
>
>             String line = "";
>             String lerr = "";
>             while ((line = input.readLine()) != null
>                     || (lerr = inerr.readLine()) != null) {
>                 if (line != null && !line.equals("")) {
>                     System.out.println(line);
>                     lerr = inerr.readLine();
>                 }
>                 if (lerr != null && !lerr.equals("")) System.out.println(lerr);
>             }
>
>             child.waitFor();
>             System.err.println("EV:" + child.exitValue());
>             child.destroy();


Here I only get the exit value, and nothing else.


From td2 at sanger.ac.uk  Thu Dec  8 09:46:08 2005
From: td2 at sanger.ac.uk (Thomas Down)
Date: Thu Dec  8 11:49:49 2005
Subject: [Biojava-l] gromacs shell script
In-Reply-To: <c343d7080512080616k467e20a9o83454a19da5bd062@mail.gmail.com>
References: <c343d7080512080616k467e20a9o83454a19da5bd062@mail.gmail.com>
Message-ID: <C6BB1558-3034-47B8-BB23-0A45CEC3EB58@sanger.ac.uk>


On 8 Dec 2005, at 14:16, Tamas Horvath wrote:

> I know this is not strictly BioJava, but here's my problem:
> I create a shell script file that would run a GROMACS MD  
> simulationI generate the necessary input and config files
> I can make the generated shell script runnable
> I cannot actually run the script from the Java application.
> The script works fine from shell...
> The returned exitValue is 255.
> Can anyone tell, what may I do?

Could you give a few more details about the code you're using to run  
the shell script from Java?  Running external processes using Java's  
Runtime.exec method isn't totally trivial -- you usually need to  
start some extra threads to handle the child process' input and output.

I presume your script is actually printing some kind of error message  
to standard error (or maybe standard out, if it's badly behaved), but  
this may be getting lost.

BioJava has some convenience methods that (usually) allow you to run  
child processes without writing your own multithreaded code.  A  
simple usage, that echoes the child's errors and outputs to the  
console, would be something like:

          ProcessTools.exec(
                           new String[] {"/path/to/my/script", "-someArgument"},
                           null,           // no standard input
                           new OutputStreamWriter(System.out),
                           new OutputStreamWriter(System.err)
          );

You probably don't want to do this in production code, but for  
development and debugging it's quite useful.  For production use,  
you'd normally use StringWriters to capture the child process' output.
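
For reference, the "extra threads" approach mentioned above can be sketched
with the JDK alone. This is an illustration rather than what ProcessTools does
internally; ExecSketch and Gobbler are hypothetical names, and the usage in
main assumes a Unix-like system with /bin/sh.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

final class ExecSketch {

    /** Reads one of the child's output streams to the end on its own thread. */
    static final class Gobbler extends Thread {
        private final InputStream in;
        private final StringBuffer buf = new StringBuffer();

        Gobbler(InputStream in) {
            this.in = in;
        }

        public void run() {
            try {
                BufferedReader r = new BufferedReader(new InputStreamReader(in));
                String line;
                while ((line = r.readLine()) != null) {
                    buf.append(line).append('\n');
                }
            } catch (IOException e) {
                buf.append("[read failed: ").append(e).append("]\n");
            }
        }

        String text() {
            return buf.toString();
        }
    }

    /** Runs cmd and returns { exit code, standard output, standard error }. */
    static String[] run(String[] cmd) {
        try {
            Process child = Runtime.getRuntime().exec(cmd);
            child.getOutputStream().close(); // the child gets no standard input

            // Drain both streams concurrently so neither pipe can fill up
            // and block the child while we wait for it.
            Gobbler out = new Gobbler(child.getInputStream());
            Gobbler err = new Gobbler(child.getErrorStream());
            out.start();
            err.start();

            int exit = child.waitFor();
            out.join();
            err.join();
            return new String[] { String.valueOf(exit), out.text(), err.text() };
        } catch (Exception e) {
            throw new RuntimeException("could not run child process", e);
        }
    }

    public static void main(String[] args) {
        String[] r = run(new String[] { "/bin/sh", "-c", "echo hello; echo oops 1>&2; exit 3" });
        System.out.println("exit=" + r[0]);
        System.out.print("stdout: " + r[1]);
        System.out.print("stderr: " + r[2]);
    }
}
```

Reading the two streams on separate threads avoids the situation where a
single loop blocks on one stream while the child is writing to the other.
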

             Thomas.
From ilhami.visne at gmail.com  Sun Dec 11 16:57:01 2005
From: ilhami.visne at gmail.com (Ilhami Visne)
Date: Sun Dec 11 17:19:24 2005
Subject: [Biojava-l] Restriction mapping for the whole chromosome sequence
Message-ID: <ce6b4d120512111357n50b4a82w4a1eb0ae3f3b53e2@mail.gmail.com>

Hi,

I want to do restriction mapping for a whole chromosome sequence, e.g.
chr1, ~240 MB. It works perfectly for some enzymes, like MsiI, but for another
enzyme (MseI) I get an OutOfMemoryError. Why? What is the difference?

Thanks in advance

From mark.schreiber at novartis.com  Sun Dec 11 20:15:55 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Sun Dec 11 20:13:22 2005
Subject: [Biojava-l] Restriction mapping for the whole chromosome sequence
Message-ID: <OFF8D4B43F.D62B0462-ON482570D5.0006E89C-482570D5.0006F35A@EU.novartis.net>

Have you tried setting the -Xmx option of your JVM?
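
To confirm that the option is actually taking effect, a small check like the
one below can help. HeapCheck is a hypothetical class name; the idea is to
start it with, for example, "java -Xmx1024m HeapCheck" and compare the printed
value with what was requested.

```java
final class HeapCheck {

    /** Maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("max heap: " + maxHeapMiB() + " MiB");
    }
}
```

The reported figure can be slightly below the requested size, depending on
the JVM.
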


_______________________________________________
Biojava-l mailing list  -  Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l


From wetrull at yahoo.com  Mon Dec 12 19:22:48 2005
From: wetrull at yahoo.com (W. Eric Trull)
Date: Mon Dec 12 19:26:50 2005
Subject: [Biojava-l] SAXException with BLAST errors
Message-ID: <20051213002248.79592.qmail@web81412.mail.mud.yahoo.com>

Hello all,

Some of you may remember that I've been creating a Java application to front
a BLAST web service.  Everything is working great except some user found the
random sequence that causes problems (gotta love those users).  I'm using the
BlastXMLParserFacade to parse NCBI BLAST (2.2.12) XML output.  I think I have
two problems; one is an NCBI BLAST problem and the other is with BioJava's
BlastXMLParserFacade.  Any help/advice would be appreciated, especially if I
have to explain the problem to NCBI - biology is not my strong suit.

Here is the relevant BioJava stack trace:

org.xml.sax.SAXException: <Hsp> is non-compliant.
	at org.biojava.bio.program.sax.blastxml.HspHandler.endElementHandler(HspHandler.java:362)
	at org.biojava.bio.program.sax.blastxml.StAXFeatureHandler.endElement(StAXFeatureHandler.java:235)
	at org.biojava.utils.stax.SAX2StAXAdaptor.endElement(SAX2StAXAdaptor.java:153)
	at org.apache.xerces.parsers.SAXParser.endElement(SAXParser.java:1403)
	at org.apache.xerces.validators.common.XMLValidator.callEndElement(XMLValidator.java:1456)
	at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java:1260)
	at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:381)
	at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:1081)
	at org.biojava.bio.program.sax.blastxml.BlastXMLParserFacade.parse(BlastXMLParserFacade.java:180)

Here is STDERR from NCBI BLAST on Sun Solaris:

[blastall] ERROR: ncbiapi [000.000]  : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
[blastall] ERROR: ncbiapi [000.000]  : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
[blastall] ERROR:  [065.106]  : /var/tmp/blast39961.tmpOutput
BlastOutput.iterations.E.hits.E.hsps.E.<hseq>
Invalid value(s) [-3] in VisibleString
[?????????????????----------???????????????????????????????????????????? ...]

Here is what I get from NCBI BLAST on Windows XP:

[NULL_Caption] ERROR: ncbiapi [000.000]  : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
[NULL_Caption] ERROR: ncbiapi [000.000]  : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
[NULL_Caption] ERROR: ncbiapi [000.000]  : SeqPortNew: pdb|1ML5|E start(280) >= len(256)
[NULL_Caption] ERROR: ncbiapi [000.000]  : SeqPortNew: pdb|1ML5|E start(313) >= len(256)

Here is how I started BLAST:

/home/etrull/developer/blast-sparc64-solaris-2.2.12/bin/blastall -p blastp -d
/home/etrull/developer/blast/current/pdb -i /var/tmp/fasta39960.tmp -m 7 -o
/var/tmp/blast39961.tmp -b 0

Here is my input sequence:

MLPRETDEEP EEPGRRGSFV EMVDNLRGKS GQGYYVEMTV GSPPQTLNIL VDTGSSNFAV GAAPHPFLHR
YYQRQLSSTY RDLRKGVYVP YTQGAWAGEL GTDLVSIPHG PNVTVRANIA AITESDKFFI NGSNWEGILG
LAYAEIARPD DSLEPFFDSL VKQTHVPNLF SLQLCGAGFP LNQSEVLASV GGSMIIGGID HSLYTGSLWY
TPIRREWYYE VIIVRVEING QDLKMDCKEY NYDKSIVDSG TTNLRLPKKV FEAAVKSIKA ASSTEKFPDG
FWLGEQLVCW QAGTTPWNIF PVISLYLMGE VTNQSFRITI LPQQYLRPVE DVATSQDDCY KFAISQSSTG
TVMGAVIMEG FYVVFDRARK RIGFAVSACH VHDEFRTAAV EGPFVTLDME
DCGYN

Here is the regular BLAST output for pdb|1ML5|E.  It seems odd to me that the
identities and positives are both zero - why is this even showing up as a
similar sequence?

>pdb|1ML5|E 30S Ribosomal Protein S2
          Length = 256

 Score = 28.1 bits (61), Expect = 5.8
 Identities = 0/71 (0%), Positives = 0/71 (0%), Gaps = 10/71 (14%)

Query: 99  ELGTDLVSIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFD 158

Sbjct: 264 ----------                                                   313

Query: 159 SLVKQTHVPNL 169

Sbjct: 314             324


Here is the XML BLAST output for pdb|1ML5|E.  Notice the second <Hsp_hseq>
has a bunch of "#" signs.  Is this valid in BioJava?

        <Hit>
          <Hit_num>146</Hit_num>
          <Hit_id>pdb|1ML5|E</Hit_id>
          <Hit_def>30S Ribosomal Protein S2</Hit_def>
          <Hit_accession>1ML5_E</Hit_accession>
          <Hit_len>256</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_num>1</Hsp_num>
              <Hsp_bit-score>28.1054</Hsp_bit-score>
              <Hsp_score>61</Hsp_score>
              <Hsp_evalue>5.76848</Hsp_evalue>
              <Hsp_query-from>99</Hsp_query-from>
              <Hsp_query-to>169</Hsp_query-to>
              <Hsp_hit-from>264</Hsp_hit-from>
              <Hsp_hit-to>324</Hsp_hit-to>
              <Hsp_query-frame>1</Hsp_query-frame>
              <Hsp_hit-frame>1</Hsp_hit-frame>
              <Hsp_gaps>10</Hsp_gaps>
              <Hsp_align-len>71</Hsp_align-len>
              <Hsp_qseq>ELGTDLVSIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFDSLVKQTHVPNL</Hsp_qseq>
              <Hsp_hseq>#################----------############################################</Hsp_hseq>
              <Hsp_midline>                                                  
                    </Hsp_midline>
            </Hsp>
          </Hit_hsps>
        </Hit>

Thanks.

-Eric Trull
From mark.schreiber at novartis.com  Mon Dec 12 20:37:59 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Mon Dec 12 20:35:44 2005
Subject: [Biojava-l] SAXException with BLAST errors
Message-ID: <OF72C3350A.8939A34E-ON482570D6.0008E814-482570D6.0008F8C7@EU.novartis.net>

Not exactly sure what the problem is here but it looks like your input is 
not in FASTA format so that might be causing a problem??


From wetrull at yahoo.com  Mon Dec 12 20:42:30 2005
From: wetrull at yahoo.com (W. Eric Trull)
Date: Mon Dec 12 20:46:32 2005
Subject: [Biojava-l] SAXException with BLAST errors
In-Reply-To: <OF72C3350A.8939A34E-ON482570D6.0008E814-482570D6.0008F8C7@EU.novartis.net>
Message-ID: <20051213014230.61941.qmail@web81405.mail.mud.yahoo.com>

No, I use BioJava to write the user's query sequence as a fasta file before
feeding it to BLAST.  I just copied a differently formatted sequence into my
post.

Thanks.

-Eric Trull

From wetrull at yahoo.com  Wed Dec 14 11:58:28 2005
From: wetrull at yahoo.com (W. Eric Trull)
Date: Wed Dec 14 12:02:27 2005
Subject: [Biojava-l] SAXException with BLAST errors
In-Reply-To: <OF43217AF9.FB3D82A5-ON482570D6.0009AA45-482570D6.0009C81A@EU.novartis.net>
Message-ID: <20051214165828.34425.qmail@web81402.mail.mud.yahoo.com>

Thanks for the suggestion, Mark.  I emailed NCBI and the gist of the reply
was:

These SeqPortNew errors usually indicate a problem in the formatting process;
the #'s are certainly not normal. Is this the only database entry that
generates errors?

So I dug a little deeper on 1ML5 to discover that it has a chain 'e' and a
chain 'E'.  When I created my FASTA file to feed to formatdb I made the
deflines of the form pdb|<id>|<chain>, but in uppercase.  So I had two
entries with the same defline but different sequences.  I think this is my
problem and am working on fixing it now.
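
A check along these lines would probably have caught the collision before formatdb was run. It is only a sketch: the class and method names are invented for illustration, and it assumes deflines are built as pdb|<id>|<chain> with the id and chain uppercased, as described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of BioJava or the NCBI tools.
final class DeflineCheck {

    // Builds a defline of the form pdb|<id>|<chain> with id and chain uppercased.
    static String defline(String pdbId, String chain) {
        return "pdb|" + pdbId.toUpperCase(Locale.ROOT) + "|" + chain.toUpperCase(Locale.ROOT);
    }

    // Returns every defline that is produced by more than one { id, chain } pair.
    static List<String> duplicates(String[][] idChainPairs) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        List<String> dups = new ArrayList<String>();
        for (String[] pair : idChainPairs) {
            String key = defline(pair[0], pair[1]);
            Integer seen = counts.get(key);
            counts.put(key, seen == null ? 1 : seen + 1);
            if (seen != null && seen == 1) {
                dups.add(key); // report each colliding defline only once
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        // Chains 'e' and 'E' of 1ML5 collapse to the same defline.
        String[][] chains = { { "1ML5", "e" }, { "1ML5", "E" }, { "1ML5", "A" } };
        System.out.println(duplicates(chains)); // prints [pdb|1ML5|E]
    }
}
```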

Thanks.

-Eric Trull

--- mark.schreiber@novartis.com wrote:

> I would send NCBI your test sequence, the blast output and the version of 
> BLAST and ask them if this is "normal". I have found them to be very 
> responsive in the past. If it is normal then we need to fix biojava to 
> cope.
> 
> - Mark

From christoph.gille at charite.de  Thu Dec 15 05:22:29 2005
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Thu Dec 15 05:29:14 2005
Subject: [Biojava-l] BLAST (ncbi-blast, wu-blast, web-blast)
Message-ID: <65052.84.190.28.222.1134642149.squirrel@webmail.charite.de>

Over the last few weeks we have had many discussions about BLAST, which
shows that there is a lot of interest in it. To my knowledge, there is
not yet a class in BioJava for invoking BLAST searches.

Therefore I would like to discuss a new BLAST API with you.
It is a Java wrapper for local NCBI BLAST, local WU-BLAST,
and Web BLAST at http://www.ebi.ac.uk .

Please have a look at the API and tell me your opinion.

Have I missed something? Are the method names OK?

http://www.charite.de/bioinf/strap/Scripting.html#SequenceBlaster

http://www.charite.de/bioinf/strap/biojavaInAnger_SequenceBlaster.html

Please send your suggestions.

Here is a short description:
Implementations of SequenceBlaster produce XML output which can be
parsed with org.biojava.bio.program.sax.BlastLikeSAXParser.  There is
also a simple non-BioJava DOM-based parser, which is currently used only
to produce human-readable output.

The wrapper provides a cache so that one and the same BLAST search is
not run twice.  If a repeat run is intended, however, the BLAST result
must be removed from the cache before invoking compute().
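
The caching behaviour described above can be pictured roughly as follows. This is not the SequenceBlaster API: every class and method name in the sketch is invented, the BLAST run is a stand-in that just returns a string, and the sketch itself is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a result cache in front of an expensive search.
final class CachedBlastSketch {

    private final Map<String, String> cache = new HashMap<String, String>();
    private int runs = 0; // how often the (pretend) BLAST was really run

    // Stand-in for an expensive BLAST run that returns XML.
    private String runBlast(String database, String sequence) {
        runs++;
        return "<BlastOutput db=\"" + database + "\" run=\"" + runs + "\"/>";
    }

    private static String key(String database, String sequence) {
        return database + "\n" + sequence;
    }

    // Returns the cached result if exactly this search was done before.
    String compute(String database, String sequence) {
        String k = key(database, sequence);
        String xml = cache.get(k);
        if (xml == null) {
            xml = runBlast(database, sequence);
            cache.put(k, xml);
        }
        return xml;
    }

    // Forces the next compute() for this search to run BLAST again.
    void removeFromCache(String database, String sequence) {
        cache.remove(key(database, sequence));
    }

    int runCount() {
        return runs;
    }

    public static void main(String[] args) {
        CachedBlastSketch blaster = new CachedBlastSketch();
        blaster.compute("pdb", "MLPRETDEEP");
        blaster.compute("pdb", "MLPRETDEEP");           // served from the cache
        blaster.removeFromCache("pdb", "MLPRETDEEP");
        blaster.compute("pdb", "MLPRETDEEP");           // really runs again
        System.out.println("BLAST runs: " + blaster.runCount()); // prints "BLAST runs: 2"
    }
}
```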

The implementations for NCBI BLAST and WU-BLAST evaluate the shell
variables BLASTDB and WUBLASTDB, respectively, which point to the
directories where the databases are located. This works even on Java 1.4,
where the method getenv() is not usable. The Java wrapper can thus
determine which databases are available.

The API is thread-safe. You can perform the computation outside the
event dispatching thread.

Christoph


From ilueny at yahoo.com.br  Thu Dec 15 15:08:40 2005
From: ilueny at yahoo.com.br (Ilueny Santos)
Date: Thu Dec 15 15:12:33 2005
Subject: [Biojava-l] somebody can help me?
Message-ID: <20051215200840.95165.qmail@web53901.mail.yahoo.com>

 Hello,

 I am Brazilian; my name is Ilueny Santos and I am new to the world of bioinformatics.
 I am writing my final course project in this area, and its subject is:  "Bayesian Classifiers and Their Applications in the Recognition of Prokaryotic Promoters".
 I program in Java and I am trying to use BioJava to locate the -10 and -35 box regions in DNA.

 Can somebody help me?

 Sorry, my English is not very good.

 Thank you in advance.

		
From m.fortner at sbcglobal.net  Thu Dec 15 16:55:11 2005
From: m.fortner at sbcglobal.net (Mark Fortner)
Date: Thu Dec 15 16:59:13 2005
Subject: [Biojava-l] Sequence Iteration in BioJava(x)
Message-ID: <43A1E63F.2060302@sbcglobal.net>

I'm looking for the best way to iterate through all
nmers within a given sequence.  For example, given a
sequence that looks like this:

ACTGACTGACTG

If I extract all trimers from this I should get:

ACT
CTG
TGA
GAC
ACT
CTG
TGA
GAC
ACT
CTG

Is there an existing class that will allow me to
iterate through a sequence this way, or am I on my
own?

Regards,

Mark Fortner

From smh1008 at cam.ac.uk  Thu Dec 15 18:34:02 2005
From: smh1008 at cam.ac.uk (David Huen)
Date: Thu Dec 15 18:53:50 2005
Subject: [Biojava-l] Sequence Iteration in BioJava(x)
In-Reply-To: <43A1E63F.2060302@sbcglobal.net>
References: <43A1E63F.2060302@sbcglobal.net>
Message-ID: <Prayer.1.0.16.0512152334020.17861@hermes-1.csi.cam.ac.uk>

On Dec 15 2005, Mark Fortner wrote:
I think what you want is the SymbolListViews.orderNSymbolList method.

It will take a SymbolList and turn it into another where it is viewed in 
another compound alphabet of the required order.


>I'm looking for the best way to iterate through all
>nmers within a given sequence.  For example, given a
>sequence that looks like this:
>
>ACTGACTGACTG
>
>If I extract all trimers from this I should get:
>
>ACT
>CTG
>TGA
>GAC
>ACT
>CTG
>TGA
>GAC
>ACT
>CTG
>
>Is there an existing class that will allow me to
>iterate through a sequence this way, or am I on my
>own?
>

From mark.schreiber at novartis.com  Thu Dec 15 20:06:01 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Thu Dec 15 20:03:20 2005
Subject: [Biojava-l] somebody can help me?
Message-ID: <OF3220A0A1.B71982E2-ON482570D9.0005EE13-482570D9.00060B7E@EU.novartis.net>

Hello -

If you want to make a Bayesian classifier you would most likely use the 
org.biojava.bio.dist package to calculate distributions of nucleotide 
frequency.
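
To make the idea concrete, here is a plain-Java sketch of a position-specific nucleotide distribution used as a simple log-odds scorer. It does not use the BioJava distribution classes; the class name, the pseudocount of 1, the uniform background and the training sites (made up to resemble the -10 consensus TATAAT) are all assumptions for illustration.

```java
// Plain-Java illustration; all names are invented. Assumes sites contain
// only the letters A, C, G and T and all have the same length.
final class BoxScorer {

    private static final String BASES = "ACGT";
    private final double[][] freq; // freq[position][base index]

    // Estimates per-position base frequencies from aligned example sites,
    // adding a pseudocount of 1 per base so that no probability is zero.
    BoxScorer(String[] alignedSites) {
        int width = alignedSites[0].length();
        freq = new double[width][4];
        for (int pos = 0; pos < width; pos++) {
            for (int b = 0; b < 4; b++) {
                freq[pos][b] = 1.0;
            }
            for (String site : alignedSites) {
                int b = BASES.indexOf(Character.toUpperCase(site.charAt(pos)));
                freq[pos][b] += 1.0;
            }
            for (int b = 0; b < 4; b++) {
                freq[pos][b] /= (alignedSites.length + 4);
            }
        }
    }

    // Log-odds of the candidate under the site model versus a uniform
    // background (0.25 per base); higher means more box-like.
    double logOdds(String candidate) {
        double score = 0.0;
        for (int pos = 0; pos < freq.length; pos++) {
            int b = BASES.indexOf(Character.toUpperCase(candidate.charAt(pos)));
            score += Math.log(freq[pos][b] / 0.25);
        }
        return score;
    }

    public static void main(String[] args) {
        BoxScorer scorer = new BoxScorer(new String[] { "TATAAT", "TATGAT", "TACAAT", "TATACT" });
        System.out.println("TATAAT: " + scorer.logOdds("TATAAT"));
        System.out.println("GCGCGC: " + scorer.logOdds("GCGCGC"));
    }
}
```

Sliding such a scorer along a sequence and keeping windows above a threshold is one simple way to propose candidate boxes; a real classifier would also need a realistic background model and properly curated training sites.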

Hope this helps,

- Mark




From hollandr at gis.a-star.edu.sg  Thu Dec 15 20:43:52 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Thu Dec 15 20:53:54 2005
Subject: [Biojava-l] Sequence Iteration in BioJava(x)
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D560289534C@BIONIC.biopolis.one-north.com>

orderNSymbolList splits the sequence into non-overlapping chunks. What
is required here is chunks that are only one base different (further on)
than the previous chunk.

The simplest way would be this:

	SymbolList mySeq; // this is your sequence from somewhere else
	for (int i = 1; i <= mySeq.length() - 2; i++) {
		// coords are inclusive, so i to i+2 = 3 bases
		SymbolList trimer = mySeq.subList(i, i + 2);
		// do something with your trimer here
	}

Note that the index starts at 1 and the last trimer ends at length(),
because symbols in a SymbolList are 1-indexed, not 0-indexed.
	
cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199
---------------------------------------------
This email is confidential and may be privileged. If you are not the
intended recipient, please delete it and notify us immediately. Please
do not copy or use it for any purpose, or disclose its content to any
other person. Thank you.
---------------------------------------------


> -----Original Message-----
> From: biojava-l-bounces@portal.open-bio.org 
> [mailto:biojava-l-bounces@portal.open-bio.org] On Behalf Of David Huen
> Sent: Friday, December 16, 2005 7:34 AM
> To: m.fortner@sbcglobal.net
> Cc: biojava-list
> Subject: Re: [Biojava-l] Sequence Iteration in BioJava(x)
> 
> 
> On Dec 15 2005, Mark Fortner wrote:
> I think what you want is the SymbolListViews.orderNSymbolList method.
> 
> It will take a SymbolList and turn it into another where it 
> is viewed in 
> another compound alphabet of the required order.
> 
> 
> >I'm looking for the best way to iterate through all
> >nmers within a given sequence.  For example, given a
> >sequence that looks like this:
> >
> >ACTGACTGACTG
> >
> >If I extract all trimers from this I should get:
> >
> >ACT
> >CTG
> >TGA
> >GAC
> >ACT
> >CTG
> >TGA
> >GAC
> >ACT
> >CTG
> >
> >Is there an existing class that will allow me to
> >iterate through a sequence this way, or am I on my
> >own?
> >
> 

From mark.schreiber at novartis.com  Thu Dec 15 21:33:48 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Thu Dec 15 21:33:26 2005
Subject: [Biojava-l] Sequence Iteration in BioJava(x)
Message-ID: <OF9C02C81B.48FF89FE-ON482570D9.000DB386-482570D9.000E14DC@EU.novartis.net>

Actually, orderNSymbolList gives overlapping n-mers. windowedSymbolList 
gives non-overlapping n-mers.

Given the sequence 

actcgcatgcgatcgcag


orderNSymbolList (with order of 4) would give

actc, ctcg, tcgc etc

windowedSymbolList with an order of 4 would give

actc, gcat, gcga, etc

Eventually the windowedSymbolList would actually throw an exception because 
the sequence above is not evenly divisible by 4 (seq.length() % 4 != 0).
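
For anyone who wants to see the difference without BioJava on the classpath, here is a minimal sketch in plain Java on Strings. The class and method names (NmerViews, overlapping, windowed) are made up for illustration and are not the BioJava API; the exception for an uneven length only mimics the behaviour described above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: plain Strings, not BioJava SymbolLists.
final class NmerViews {

    // Overlapping n-mers: the window advances one base at a time.
    static List<String> overlapping(String seq, int n) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i + n <= seq.length(); i++) {
            out.add(seq.substring(i, i + n));
        }
        return out;
    }

    // Non-overlapping n-mers: the window advances n bases at a time.
    // Assumption: lengths that are not a multiple of n are rejected,
    // to mimic the exception described in this post.
    static List<String> windowed(String seq, int n) {
        if (seq.length() % n != 0) {
            throw new IllegalArgumentException(
                "length " + seq.length() + " is not divisible by " + n);
        }
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < seq.length(); i += n) {
            out.add(seq.substring(i, i + n));
        }
        return out;
    }
}
```

With the 18-base sequence above and n = 4, overlapping() returns 15 tetramers starting actc, ctcg, tcgc, while windowed() throws because 18 % 4 is 2.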

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910

From m.fortner at sbcglobal.net  Thu Dec 15 21:36:11 2005
From: m.fortner at sbcglobal.net (Mark Fortner)
Date: Thu Dec 15 21:40:13 2005
Subject: [Biojava-l] Sequence Iteration in BioJava(x)
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D560289534C@BIONIC.biopolis.one-north.com>
References: <6D9E9B9DF347EF4385F6271C64FB8D560289534C@BIONIC.biopolis.one-north.com>
Message-ID: <43A2281B.7010609@sbcglobal.net>

Richard,
Thanks for the example.  Your approach is very similar to a non-BioJava 
approach that I had worked out earlier.  I was wondering if the 
BioJava(x) API provides any performance benefit over simply running a 
window along a character stream? 

The work that we're doing involves iterating through the human genome, 
(and in a number of cases, metagenomic sequences) and we're trying to 
squeeze as much performance out of it as possible while minimizing the 
memory footprint.

Thanks,

Mark


From hollandr at gis.a-star.edu.sg  Thu Dec 15 21:57:15 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Thu Dec 15 21:55:13 2005
Subject: [Biojava-l] Sequence Iteration in BioJava(x)
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5602895362@BIONIC.biopolis.one-north.com>

Mark's comments earlier make my sample code redundant. I had the two
different window thingies confused.

See his post for more details!

cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199



From mark.schreiber at novartis.com  Thu Dec 15 22:45:21 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Thu Dec 15 22:43:44 2005
Subject: [Biojava-l] Sequence Iteration in BioJava(x)
Message-ID: <OF7023038A.6B171AA3-ON482570D9.0013CB6E-482570D9.0014A1BB@EU.novartis.net>

There is probably not any performance benefit, except in the case of very 
large sequences, which are often compressed behind the scenes by biojava.

The benefits may come from ease of use and object orientation.

E.g., there is probably already a parser to read in and validate your 
sequence. The windowing or n-mer code is already written and has been 
used by lots of people, so it has been "stress tested". Also, the objects 
themselves have a lot of functionality built in that a character stream 
does not. The downside of using objects is that they take up memory and 
there is a certain amount of overhead in their construction. To help 
overcome this, SymbolLists are actually lists of references to Symbols, 
not lists of Symbols themselves. This makes them much smaller (although 
not as small as char[]s).

If you want superfast performance then you should bit encode the data and 
operate over it with memory pointers as in C or machine code. You should 
be aware, though, that any intensive loop like the ones that would be used 
to carry out this operation in biojava will almost certainly be detected 
and compiled into native code by the Java Runtime on the fly. This might 
make it hard to say whether the Java code would be much slower than the C 
code.
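
To make the "bit encode the data" idea concrete, here is a small sketch in plain Java that packs unambiguous DNA into 2 bits per base. The class name PackedDna and the A=0, C=1, G=2, T=3 code assignment are my own choices for illustration; this is not how biojava's own packed lists are implemented, and it does not handle N or other ambiguity codes.

```java
// Illustration only: 2 bits per base, 32 bases per long.
final class PackedDna {
    private final long[] words;
    private final int length;

    PackedDna(String seq) {
        length = seq.length();
        words = new long[(length + 31) / 32];
        for (int i = 0; i < length; i++) {
            long code = encode(seq.charAt(i));
            // word index is i / 32, bit offset is 2 * (i % 32)
            words[i >>> 5] |= code << ((i & 31) << 1);
        }
    }

    // Assumed mapping: A=0, C=1, G=2, T=3 (case-insensitive).
    private static long encode(char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': return 0L;
            case 'C': return 1L;
            case 'G': return 2L;
            case 'T': return 3L;
            default:
                throw new IllegalArgumentException("not A/C/G/T: " + base);
        }
    }

    int length() { return length; }

    int wordCount() { return words.length; }

    // 0-based access, unlike the 1-based SymbolList coordinates.
    char baseAt(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        int code = (int) ((words[i >>> 5] >>> ((i & 31) << 1)) & 3L);
        return "ACGT".charAt(code);
    }
}
```

At 2 bits per base, roughly 3 billion bases come to about 750 MB, against about 6 GB for Java chars at 2 bytes each.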

- Mark




From m.fortner at sbcglobal.net  Thu Dec 15 23:09:42 2005
From: m.fortner at sbcglobal.net (Mark Fortner)
Date: Thu Dec 15 23:13:48 2005
Subject: [Biojava-l] Sequence Iteration in BioJava(x)
In-Reply-To: <OF7023038A.6B171AA3-ON482570D9.0013CB6E-482570D9.0014A1BB@EU.novartis.net>
References: <OF7023038A.6B171AA3-ON482570D9.0013CB6E-482570D9.0014A1BB@EU.novartis.net>
Message-ID: <43A23E06.8040202@sbcglobal.net>

Mark,
Thanks for the info.  This is sort of a test project for us.  We have a 
few classes and data structures in C++ that handle operations like 
sequence io and packing, and are fairly fast.  However, we've also come 
to the realization that we've spent a lot of time dealing with 
cross-platform and compiler-related problems, and if Java can give us 
comparable performance then we might switch to it.  If nothing else, the 
opportunity costs would be lower, since we could write and test more 
code, in the same amount of time.  The tools are good-deal better for 
Java development than C++.

We're at the point where we can either continue to invest time in our 
library or rewrite what we have using BioJava and other libraries.  I've 
written a lot of Java-code over the past 10 years and suggested that we 
try Java both using the standard javac compiler and gcj to see if we can 
get C++ like performance.

Thanks for your help,

Mark


From mark.schreiber at novartis.com  Fri Dec 16 00:37:30 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Fri Dec 16 00:35:16 2005
Subject: [Biojava-l] Sequence Iteration in BioJava(x)
Message-ID: <OFFD597048.712CA745-ON482570D9.001E85C4-482570D9.001EE65B@EU.novartis.net>

You should also be aware that the biojavax sequence i/o is actually about 
3 times slower than the biojava sequence i/o (for Genbank; I haven't tested 
others). This is because it does a much better job of parsing the relevant 
details into a more structured object hierarchy.

Having said that, it is possible to set the i/o pipeline up so that it 
ignores details that are not of interest to you. If you only want the 
sequence name and the sequence data from Genbank (and not all the features, 
annotations and comments) then parsing is on average about 10x faster 
(based on about 4000 eukaryote records). Details on how to do this can be 
found in the biojavax docbook in CVS.

- Mark




From smh1008 at cam.ac.uk  Fri Dec 16 04:25:21 2005
From: smh1008 at cam.ac.uk (David Huen)
Date: Fri Dec 16 04:39:15 2005
Subject: [Biojava-l] Sequence Iteration in BioJava(x)
In-Reply-To: <43A2281B.7010609@sbcglobal.net>
References: <6D9E9B9DF347EF4385F6271C64FB8D560289534C@BIONIC.biopolis.one-north.com>
	<43A2281B.7010609@sbcglobal.net>
Message-ID: <Prayer.1.0.16.0512160925210.9408@hermes-1.csi.cam.ac.uk>

On Dec 16 2005, Mark Fortner wrote:

>Richard,
>Thanks for the example.  Your approach is very similar to a non-BioJava 
>approach that I had worked out earlier.  I was wondering if the 
>BioJava(x) API provides any performance benefit over simply running a 
>window along a character stream? 
>
>The work that we're doing involves iterating through the human genome, 
>(and in a number of cases, metagenomic sequences) and we're trying to 
>squeeze as much performance out of it as possible while minimizing the 
>memory footprint.
>
The only case where I have encountered horrible performance out of using BJ 
for this kind of activity is where the order is large (say >10). I think it 
is killing the Alphabet code somewhere to represent the required alphabet.

If that is the kind of case you want to deal with, I would believe the 
SSAHA code in BJ may be adapted to your purposes but this comment does not 
arise from direct personal experience.

Regards,
David

From ilueny at yahoo.com.br  Fri Dec 16 07:35:59 2005
From: ilueny at yahoo.com.br (Ilueny Santos)
Date: Fri Dec 16 07:39:53 2005
Subject: [Biojava-l] Locating promoter regions in sequence of DNA with
	Biojava
Message-ID: <20051216123559.69612.qmail@web53908.mail.yahoo.com>

Hello all,

First, I would like to thank everyone, especially Mark and Gregory, for
answering.

To explain my question in more detail:
I was trying to locate specific regions in a DNA sequence (the -10 box and
the -35 box).  These regions are short stretches of 6 base pairs (bp), so
called because they generally lie about 10 bp and 35 bp, respectively,
upstream of +1 (ATG), and their presence in a sequence strongly indicates
the existence of a promoter.

The problem is that they are not fixed. For example, the -10 box normally
appears as TATAAT, but it can vary both in form (e.g. TATAAG or TATTAT)
and in its position relative to the start codon (+1 ATG).

Given the above, I ask:
can I use BioJava to write an algorithm capable of locating such variable
regions (variable both in form and in position) in DNA sequences?
 
 
 I am thankful all one more time that will be able to help. 
 
 PS.:  Gregory favours, already I am studying Regular Expressions and...,  Mark, the bayesiano classifier already is fact, but, followed its tip, 
 I go to also study the package org.biojava.dist because it can be useful of some form, thanks.
 

From matthew.pocock at ncl.ac.uk  Fri Dec 16 09:17:15 2005
From: matthew.pocock at ncl.ac.uk (Matthew Pocock)
Date: Fri Dec 16 09:24:09 2005
Subject: [Biojava-l] Sequence Iteration in BioJava(x)
In-Reply-To: <Prayer.1.0.16.0512160925210.9408@hermes-1.csi.cam.ac.uk>
References: <6D9E9B9DF347EF4385F6271C64FB8D560289534C@BIONIC.biopolis.one-north.com>
	<43A2281B.7010609@sbcglobal.net>
	<Prayer.1.0.16.0512160925210.9408@hermes-1.csi.cam.ac.uk>
Message-ID: <200512161417.16174.matthew.pocock@ncl.ac.uk>

On Friday 16 December 2005 09:25, David Huen wrote:
>
> If that is the kind of case you want to deal with, I would believe the
> SSAHA code in BJ may be adapted to your purposes but this comment does not
> arise from direct personal experience.

The biojava SSAHA code is likely to be quite efficient for this kind of 
sliding-window application. I think it can be attached directly to the 
sequence IO events, and encodes the DNA n-mers directly as bits in an integer 
datatype. All operations are done by integer comparison, logical operations 
and shifts. Even though SSAHA itself is probably not what you want, nearly 
all the building blocks should be there in that module.
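
The bit-packing idea described above can be sketched in plain Java. This
is only an illustration of the general technique (2 bits per base, with
the window advanced by a shift and a mask), not the BioJava SSAHA code
itself. The class name KmerWindow, the method names and the A/C/G/T to
0..3 mapping are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of BioJava: packs DNA k-mers into a long. */
final class KmerWindow {

    /** Maps A, C, G, T (either case) to 0..3; returns -1 for anything else. */
    static int code(char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default:  return -1;
        }
    }

    /**
     * Returns the packed value of every k-mer in seq, in order, skipping
     * windows that contain a non-ACGT character. k must be in 1..31 so
     * that 2*k bits fit in a long.
     */
    static List<Long> encodeAll(CharSequence seq, int k) {
        if (k < 1 || k > 31) {
            throw new IllegalArgumentException("k must be in 1..31");
        }
        long mask = (1L << (2 * k)) - 1; // keeps only the last k bases
        long word = 0;
        int valid = 0; // consecutive valid bases ending at position i
        List<Long> out = new ArrayList<>();
        for (int i = 0; i < seq.length(); i++) {
            int c = code(seq.charAt(i));
            if (c < 0) {          // ambiguous base: restart the window
                valid = 0;
                word = 0;
                continue;
            }
            word = ((word << 2) | c) & mask; // slide the window by one base
            if (++valid >= k) {
                out.add(word);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(encodeAll("ACGTNAC", 2)); // prints [1, 6, 11, 1]
    }
}
```

Each step costs one shift, one OR and one AND regardless of k, and no
per-window objects are created apart from the boxed results collected
here for demonstration.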

Matthew

>
> Regards,
> David
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
From mark.schreiber at novartis.com  Sun Dec 18 20:12:49 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Sun Dec 18 20:10:15 2005
Subject: [Biojava-l] Locating promoter regions in sequence of DNA
	with	Biojava
Message-ID: <OF26C42942.9F4FA84A-ON482570DC.0005A6CB-482570DC.0006AAE1@EU.novartis.net>

Hello -

There are many approaches you can use to try and find a promoter with 
variable degrees of success. There is extensive literature on this. You 
could make profile HMMs and train them with real examples. You could also 
use a Gibbs Sampler. There are examples of both in the biojava in anger 
pages http://www.biojava.org/docs/bj_in_anger/

Other approaches would be programs like MEME or the technique called 
nested MICA developed by Thomas Down of biojava fame which seems to be 
very good.
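
As a much simpler baseline than any of the approaches above, a scan for
windows that differ from a consensus hexamer by at most a given number of
mismatches could be sketched as follows. This is plain Java rather than
BioJava API; the class ConsensusScan, its method names and the test
sequence are made up for illustration. The consensus TATAAT and the
variants TATAAG and TATTAT are the ones quoted in the question.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of BioJava: finds near-matches to a motif. */
final class ConsensusScan {

    /** Counts positions where the window of seq at offset differs from motif. */
    static int mismatches(String seq, int offset, String motif) {
        int n = 0;
        for (int i = 0; i < motif.length(); i++) {
            if (seq.charAt(offset + i) != motif.charAt(i)) {
                n++;
            }
        }
        return n;
    }

    /** Returns 0-based start offsets of windows within maxMismatches of motif. */
    static List<Integer> find(String seq, String motif, int maxMismatches) {
        List<Integer> hits = new ArrayList<>();
        for (int start = 0; start + motif.length() <= seq.length(); start++) {
            if (mismatches(seq, start, motif) <= maxMismatches) {
                hits.add(start);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Invented sequence containing TATAAG at offset 3 and TATTAT at 12.
        String seq = "GGCTATAAGCCGTATTATGG";
        System.out.println(find(seq, "TATAAT", 1)); // prints [3, 12]
    }
}
```

A scan like this handles variation in form but says nothing about how
likely each hit is to be a real promoter element, which is where the
trained models mentioned above come in.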

- Mark




From ftdgc1 at uaf.edu  Wed Dec 21 03:16:27 2005
From: ftdgc1 at uaf.edu (Dan Cardin)
Date: Wed Dec 21 03:34:33 2005
Subject: [Biojava-l] Gapping Sequence problems
Message-ID: <61871.66.230.82.213.1135152987.squirrel@ftdgc1.email.uaf.edu>

Hello all, I am stuck on a SimpleGappedSymbolList problem. I want to add
gaps to, and remove gaps from, DNA sequences that are loaded from a file
and may already contain gaps. I just load the sequences into instances of
type Sequence. Here is a snippet:

private void finalizeAddGapEdit() {
   SimpleGappedSymbolList list =
         new SimpleGappedSymbolList(node.getSequence());

   try {
      list.addGapsInSource(startX + 1, counter);

      Sequence newSequence = DNATools.createDNASequence(
            list.seqString(), node.getSequence().getName());

      node.setSequence(newSequence);

      gvc.repaint();
   } catch (IllegalSymbolException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
   }
}

My code to draw out the gapped symbols looks like this:

public void paint(Graphics g) {
   double leftX;
   Symbol gap;
   double scale_factor;
   boolean inGapState;
   max = 0;
   for (int i = 0; i < seq.length; i++) {
      gap = seq[i].getAlphabet().getGapSymbol();
      leftX = 0;
      scale_factor = (double) getWidth() / seq[i].length();
      inGapState = true;

      for (int j = 1; j <= seq[i].length(); j++) {
         if (!inGapState && seq[i].symbolAt(j) == gap) {
            g.drawLine((int) (leftX * scale_factor), i * pixels_bw_lines,
                  (int) ((j - 1) * scale_factor), i * pixels_bw_lines);
            inGapState = true;
         } else if (inGapState && seq[i].symbolAt(j) != gap) {
            leftX = j - 1;
            inGapState = false;
         }
      }
      // draw the last line
      g.drawLine((int) (leftX * scale_factor), i * pixels_bw_lines,
            (int) (seq[i].length() * scale_factor), i * pixels_bw_lines);

      if (seq[i].length() > max)
         max = seq[i].length();
   }
}

My sequences load from file and display correctly, but when I add gaps
they don't show up. I am confused because I believe that the gap symbols
used in the underlying sequences are the same. I know the gaps are being
added because they appear when I print out the string of the sequence.
Does anyone know how to fix this issue, or have any suggestions for a
better approach?

-dc
From hollandr at gis.a-star.edu.sg  Wed Dec 21 04:22:09 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Wed Dec 21 04:20:05 2005
Subject: [Biojava-l] Gapping Sequence problems
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5602895532@BIONIC.biopolis.one-north.com>

I think you could try swapping the use of == for equals() when testing
for equivalence to the gap symbol. It _should_ be the same literal
object, but maybe not. equals() will work in both cases but == will not.
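
The difference between the two comparisons can be seen in a small
standalone sketch. The Sym class below is a hypothetical stand-in, not
the BioJava Symbol interface, and it deliberately overrides equals().
Note that equals() only behaves differently from == when the class
overrides it; the default Object.equals() is an identity test too.

```java
import java.util.Objects;

/** Hypothetical stand-in for a symbol type; NOT the BioJava Symbol interface. */
final class Sym {
    private final String name;

    Sym(String name) {
        this.name = name;
    }

    /** Two Sym objects are equal when their names match. */
    @Override
    public boolean equals(Object o) {
        return o instanceof Sym && ((Sym) o).name.equals(name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name);
    }
}

final class GapCompareDemo {
    public static void main(String[] args) {
        Sym gapA = new Sym("gap");
        Sym gapB = new Sym("gap"); // same meaning, different object

        System.out.println(gapA == gapB);      // prints false (identity)
        System.out.println(gapA.equals(gapB)); // prints true (value equality)
        System.out.println(gapA == gapA);      // prints true
    }
}
```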

cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199



From mark.schreiber at novartis.com  Wed Dec 21 08:57:16 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed Dec 21 08:54:26 2005
Subject: [Biojava-l] Gapping Sequence problems
Message-ID: <OFBCC014D2.6832E6D1-ON482570DE.004C6C65-482570DE.004CA7C3@EU.novartis.net>

You could try 

if (!inGapState && (seq[i].symbolAt(j) == gap
      || seq[i].symbolAt(j) == AlphabetManager.getGapSymbol()))
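
Since the gaps do show up in the printed sequence string, another way to
check the drawing logic independently of symbol identity is to compute
the ungapped runs from that string. The sketch below is plain Java with
invented names (GapRuns, ungappedRuns) and assumes the gap is printed as
'-'. It emits a final run only when the sequence does not end in a gap,
whereas the posted paint() draws its last line unconditionally.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of BioJava: finds ungapped runs in a string. */
final class GapRuns {

    /** Returns {start, end} pairs (0-based, end exclusive) of non-gap runs. */
    static List<int[]> ungappedRuns(String gapped, char gapChar) {
        List<int[]> runs = new ArrayList<>();
        int runStart = -1; // -1 means we are currently inside a gap
        for (int i = 0; i < gapped.length(); i++) {
            boolean isGap = gapped.charAt(i) == gapChar;
            if (!isGap && runStart < 0) {
                runStart = i;                       // a run begins here
            } else if (isGap && runStart >= 0) {
                runs.add(new int[] {runStart, i});  // a run ended before i
                runStart = -1;
            }
        }
        if (runStart >= 0) {                        // close a trailing run
            runs.add(new int[] {runStart, gapped.length()});
        }
        return runs;
    }

    public static void main(String[] args) {
        // Prints 0..2, 4..6 and 7..8, one per line.
        for (int[] r : ungappedRuns("AC--GT-A--", '-')) {
            System.out.println(r[0] + ".." + r[1]);
        }
    }
}
```

Each pair could then be scaled and passed to drawLine() in place of the
leftX and j-1 bookkeeping.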

- Mark


From Russell.Smithies at agresearch.co.nz  Wed Dec 21 14:35:23 2005
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed Dec 21 14:48:02 2005
Subject: [Biojava-l] [OT] Bioinformatician job vacancy
Message-ID: <D5DBA313349A4B458528BE63B387F36C017CC3DF@imail.agresearch.co.nz>

Hi All,

I hope you don't mind me posting this here but our company has a vacancy
for a bioinformatician and I thought there might be someone here who
would like to work in beautiful New Zealand.

Russell Smithies

Bioinformatics Software Developer
Invermay  Research Centre
Puddle Alley,
Mosgiel,
New Zealand
www.agresearch.co.nz


===================================================
As part of AgResearch's company strategy we are continuing to grow our
business in the area of bioinformatics.  This capability is essential
for our science discovery.

In this position you will be part of a national team of 26
bioinformaticians, mathematical biologists and statisticians and be
based at our Grasslands campus at Palmerston North.  This is a permanent
position.

You will be an advocate for bioinformatics within AgResearch; you will
work collaboratively on projects and will provide bioinformatics
training and advice to science staff working in the biotechnology area.

We are seeking a person who has:
*	An excellent tertiary qualification in molecular biology or
genetics
*	Experience with the use of bioinformatics applications
*	Knowledge of life sciences databases and the internet
*	Well developed IT technical skills and web based technologies
*	Experience in a training environment
*	Excellent writing, speaking and interpersonal skills
*	Familiarity with Perl, Java or Unix Scripting

If you possess the above skills, we would like to hear from you.

To find out more about this position please contact Anette Becher by
email anette.becher@agresearch.co.nz or alternatively phone (03) 489
9028 (after 16th January 2006).

For a job description and application form please contact Linda Murray,
Phone (03) 489 9011 or email linda.murray@agresearch.co.nz (after 16
January 2006).  Alternatively the job description and application form
can be found at http://www.agresearch.co.nz/recruitment 

For general information on AgResearch please visit our website at
www.agresearch.co.nz

Applications close 30th January 2006 and should be sent to Linda Murray
at the following address or by email -

Linda Murray
AgResearch
Invermay Agricultural Centre
Private Bag 50034
Mosgiel, Dunedin
NEW ZEALAND
Email:  linda.murray@agresearch.co.nz  

From prem at bch.umontreal.ca  Wed Dec 14 16:53:53 2005
From: prem at bch.umontreal.ca (Premkumar Natarajan)
Date: Thu Dec 22 11:17:01 2005
Subject: [Biojava-l] Is there any Parser for "rnamotif" output?
In-Reply-To: <200512141705.jBEH5A8U019796@portal.open-bio.org>
References: <200512141705.jBEH5A8U019796@portal.open-bio.org>
Message-ID: <43A09471.2000608@bch.umontreal.ca>

Hi:


I would like to know if there is any generic parser for rnamotif 
output. Even a wrapper script that can convert rnamotif output to XML 
would be great.

Reason:
For one of my projects I need to integrate more than one tool, and 
"rnamotif" is one of them. I'm thinking of using an XML output format to 
communicate between the various programs.


Thank you.
Prem
prem _AT_ umontreal.ca