From jolyon.holdstock at ogt.co.uk  Tue Jan  3 09:21:46 2006
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Tue Jan  3 11:17:52 2006
Subject: [Biojava-l] Embl parser problem
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3F83C2C3@EUCLID.internal.ogtip.com>

Hi,

I have an application using BioJava1.4pre1.4 that loads an embl or
genbank file.

If I load an embl file via the genbank option a BioException error is
thrown.

But if I load a genbank file via the embl option no BioException is
thrown and the sequence is created although it is not correct e.g.
sequence.length() returns 0

An example of code using the sequence file from the BioJava demos

String fileName =
    "C:/Downloads/Java/BioJava/BioJava-1.4pre1/biojava-1.4pre1/demos/seq/AL121903.genbank";

try {
  seq = SeqIOTools.readEmbl(
      new BufferedReader(new FileReader(fileName))).nextSequence();
  System.out.println("URN: " + seq.getURN());
  System.out.println("Length: " + seq.length());
}
catch (BioException BIOE) {
  System.out.println("BioException " + BIOE);
}

The output is:

URN: sequence/embl:SION
Length: 0

If I use the matching embl sequence from the demos the output is:

URN: sequence/embl:AL121903
Length: 80600

I've used BioJava1.4 with the same outcome. Should I be parsing the file
an alternative way?

Thanks,

Jolyon

 
From mark.schreiber at novartis.com  Tue Jan  3 20:09:49 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Tue Jan  3 20:13:15 2006
Subject: [Biojava-l] Embl parser problem
Message-ID: <OF6B69E1EC.015CCF90-ON482570EC.00061DE3-482570EC.0006648F@EU.novartis.net>

Hi -

A BioException would be expected when parsing an embl file via the genbank 
option. It is surprising you don't get one when parsing a genbank file via 
the embl option, although it clearly has not worked properly.

You should only ever parse a file with the appropriate read method.
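
One way to follow that advice when the file type is not known in advance is to
look at the first non-blank line before choosing a read method. The sketch
below is plain Java and does not use BioJava; the class name FlatFileSniffer
and the sample header lines in main are made up for illustration. It relies
only on EMBL entries opening with an "ID" line and GenBank entries opening
with a "LOCUS" line, and it assumes a reasonably modern JDK.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical helper, not part of BioJava: guesses a flat-file format from its first non-blank line. */
class FlatFileSniffer {

    /** Returns "embl", "genbank" or "unknown". */
    static String detect(Reader in) throws IOException {
        BufferedReader br = new BufferedReader(in);
        String line;
        while ((line = br.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // skip leading blank lines
            }
            if (line.startsWith("ID ")) {
                return "embl"; // EMBL entries open with an ID line
            }
            if (line.startsWith("LOCUS")) {
                return "genbank"; // GenBank entries open with a LOCUS line
            }
            return "unknown"; // first real line matches neither format
        }
        return "unknown"; // empty input
    }

    public static void main(String[] args) throws IOException {
        // Illustrative header lines only, not copied from the demo files.
        System.out.println(detect(new StringReader("ID   AL121903   standard; DNA; HUM; 80600 BP.\n")));
        System.out.println(detect(new StringReader("LOCUS       AL121903   80600 bp    DNA\n")));
    }
}
```

The reader passed to detect is consumed, so the file would need to be reopened
before handing it to the matching read method.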

Please note that if you have access to CVS you could download the 
development version of the new parsers (biojavax) which do a much better 
job.

- Mark

_______________________________________________
Biojava-l mailing list  -  Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l


From jolyon.holdstock at ogt.co.uk  Wed Jan  4 05:54:25 2006
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Wed Jan  4 05:51:07 2006
Subject: [Biojava-l] Embl parser problem[Scanned]
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3F8A1F08@EUCLID.internal.ogtip.com>

Thanks for the help.
I have downloaded the dev version and tried to build it.
I have no experience with Ant (I'm running v1.6.1) and the build fails.
The output from this is:

Buildfile: build.xml

init:
     [echo] Building biojava-live
     [echo] Java Home:                       c:\j2sdk1.4.2_04\jre
     [echo] JUnit present:                   ${junit.present}
     [echo] JUnit supported by Ant:          true
     [echo] HSQLDB driver present:           ${sqlDriver.hsqldb}

prepare:

prepare-biojava:

compile-biojava:
    [javac] Compiling 1279 source files to C:\Downloads\Java\BioJava\biojava-live\ant-build\classes\biojava
    [javac] C:\Downloads\Java\BioJava\biojava-live\src\org\biojava\bio\seq\impl\RevCompSequence.java:47: reference to ProjectedFeatureHolder is ambiguous, both class org.biojava.bio.seq.projection.ProjectedFeatureHolder in org.biojava.bio.seq.projection and class org.biojava.bio.seq.ProjectedFeatureHolder in org.biojava.bio.seq match
    [javac]     private ProjectedFeatureHolder pfh;
    [javac]             ^
    [javac] C:\Downloads\Java\BioJava\biojava-live\src\org\biojava\bio\seq\impl\RevCompSequence.java:65: reference to ProjectedFeatureHolder is ambiguous, both class org.biojava.bio.seq.projection.ProjectedFeatureHolder in org.biojava.bio.seq.projection and class org.biojava.bio.seq.ProjectedFeatureHolder in org.biojava.bio.seq match
    [javac]         pfh = new ProjectedFeatureHolder(new TranslateFlipContext(this,seq,seq.length()+1,true));
    [javac]                   ^
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -deprecation for details.
    [javac] 2 errors
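
For reference, javac reports "reference to X is ambiguous" when two on-demand
(wildcard) imports each supply a class with the same simple name. The sketch
below reproduces the situation with JDK classes only (java.util.Date and
java.sql.Date); it is not taken from the BioJava sources, and the class name
AmbiguousImportDemo is made up. It shows that a single-type import takes
precedence over the wildcard imports. Whether RevCompSequence.java really uses
wildcard imports of both packages is an assumption based on the error text.

```java
import java.util.*;    // on-demand import: offers java.util.Date
import java.sql.*;     // on-demand import: offers java.sql.Date as well
import java.util.Date; // single-type import: takes precedence over both on-demand imports

class AmbiguousImportDemo {

    static Date epoch() {
        // Without the single-type import above, javac rejects "Date" here with
        // "reference to Date is ambiguous".
        return new Date(0L);
    }

    public static void main(String[] args) {
        System.out.println(epoch().getClass().getName()); // prints java.util.Date
    }
}
```

By analogy, an explicit import of
org.biojava.bio.seq.projection.ProjectedFeatureHolder in the affected file
might silence the error, although it would not explain where the second class
comes from.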


This email has been scanned by Oxford Gene Technology Group of Companies
Security Systems.


From alexg at compugen.co.il  Wed Jan  4 13:06:41 2006
From: alexg at compugen.co.il (Alex Golubev)
Date: Wed Jan  4 13:14:58 2006
Subject: [Biojava-l] amino acid to nucleic acid alignment
Message-ID: <5F0B7D17FC20E7489AAEE5043D6390EC03288A@cmail.il.cgen.biz>

Hi,

I'm trying to align amino acids to nucleic acids. I'm using gapped sequences both for the protein and for the DNA. I have several problems and I would very much appreciate it if someone could help.
1. How can I parse DNA nucleic acids and get codons? I would like to start with DNA that looks like this "ATGTAT" and get a protein that looks like this "MY". I'm using "Alphabet alpha = DNATools.getCodonAlphabet();" but I can't find a tokenization to parse the DNA string (does this make any sense?).
2. My other problem is that there are frame shifts and my gapped DNA actually looks like this "AT-G-TAT". Is there any way to get/translate locations from the codon symbol list to/from the DNA symbol list?

I would appreciate any clue as to whether all of this makes any sense.

Thanks,
Alex Golubev.

From smh1008 at cam.ac.uk  Wed Jan  4 15:52:19 2006
From: smh1008 at cam.ac.uk (David Huen)
Date: Wed Jan  4 16:07:53 2006
Subject: [Biojava-l] amino acid to nucleic acid alignment
In-Reply-To: <5F0B7D17FC20E7489AAEE5043D6390EC03288A@cmail.il.cgen.biz>
References: <5F0B7D17FC20E7489AAEE5043D6390EC03288A@cmail.il.cgen.biz>
Message-ID: <Prayer.1.0.16.0601042052190.7906@hermes-1.csi.cam.ac.uk>

On Jan 4 2006, Alex Golubev wrote:

>Hi,
>
> I'm trying to align amino acids to nucleic acids. I'm using gapped 
> sequences both for the protein and for the DNA. I have several problems 
> and I would very appreciate if someone could help. 1. How can I parse DNA 
> nucleic acids and get codons. I would like to start with DNA that look 
> like this "ATGTAT" and get a protein that look like this "MY". I'm using 
> "Alphabet alpha = DNATools.getCodonAlphabet();" but I can't find 
> tokenization to parse the DNA string (does this make any sense?). 

You can convert a SymbolList in the DNA alphabet into the equivalent symbol 
list in the codon alphabet (DNAxDNAxDNA) by using 
SymbolListViews.orderNSymbolList(...).
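
To make the intended result concrete, here is a plain-Java sketch of the
"ATGTAT" to "MY" example from the question. It does not use the BioJava API;
the class name CodonSketch is made up and the codon table is a four-entry
excerpt of the standard genetic code, just enough for the example. The sketch
assumes non-overlapping codons starting at the first base; whether a given
BioJava view yields overlapping or non-overlapping triplets should be checked
against the Javadoc for SymbolListViews.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CodonSketch {

    // Deliberately tiny excerpt of the standard genetic code; a full table has 64 entries.
    private static final Map<String, Character> CODE = new HashMap<String, Character>();
    static {
        CODE.put("ATG", 'M');
        CODE.put("TAT", 'Y');
        CODE.put("TAC", 'Y');
        CODE.put("TAA", '*');
    }

    /** Splits ungapped DNA into non-overlapping triplets; a trailing partial codon is dropped. */
    static List<String> codons(String dna) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i + 3 <= dna.length(); i += 3) {
            out.add(dna.substring(i, i + 3).toUpperCase());
        }
        return out;
    }

    /** Translates codon by codon; codons missing from the excerpt table become 'X'. */
    static String translate(String dna) {
        StringBuilder sb = new StringBuilder();
        for (String codon : codons(dna)) {
            Character aa = CODE.get(codon);
            sb.append(aa == null ? 'X' : aa.charValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(codons("ATGTAT"));    // prints [ATG, TAT]
        System.out.println(translate("ATGTAT")); // prints MY
    }
}
```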


> 2. My 
> other problem is that there are frame shifts and my gapped DNA look 
> actually like this "AT-G-TAT". Is there any way to get/translate 
> locations from the codon symbols list to/from the DNA symbols list?
>
Ouch.  What do you really want to do here?
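
If the aim is only to map between positions in the gapped DNA and codons of
the ungapped DNA, one possible reading of the question can be sketched in
plain Java as below. This is an illustration under that assumption, not a
BioJava facility: the class name GapMapSketch is made up, '-' is taken as the
only gap character, and positions are 0-based (BioJava symbol lists count
from 1).

```java
import java.util.Arrays;

class GapMapSketch {

    /**
     * Returns an array whose i-th entry is the 0-based column of the gapped
     * string that holds the i-th ungapped residue.
     */
    static int[] ungappedToGapped(String gapped) {
        int residues = 0;
        for (int i = 0; i < gapped.length(); i++) {
            if (gapped.charAt(i) != '-') {
                residues++;
            }
        }
        int[] map = new int[residues];
        int next = 0;
        for (int col = 0; col < gapped.length(); col++) {
            if (gapped.charAt(col) != '-') {
                map[next++] = col;
            }
        }
        return map;
    }

    /** Gapped columns covered by codon number codonIndex (0-based) of the ungapped sequence. */
    static int[] codonColumns(String gapped, int codonIndex) {
        int[] map = ungappedToGapped(gapped);
        int start = codonIndex * 3;
        // Throws ArrayIndexOutOfBoundsException if the codon is incomplete or out of range.
        return new int[] { map[start], map[start + 1], map[start + 2] };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(codonColumns("AT-G-TAT", 0))); // prints [0, 1, 3]
        System.out.println(Arrays.toString(codonColumns("AT-G-TAT", 1))); // prints [5, 6, 7]
    }
}
```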

Regards,
David Huen

From mark.schreiber at novartis.com  Wed Jan  4 20:22:21 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed Jan  4 20:19:06 2006
Subject: [Biojava-l] Embl parser problem[Scanned]
Message-ID: <OF994ECF81.64D8D832-ON482570ED.0007216C-482570ED.00078A68@EU.novartis.net>

Hi -

When you do the CVS update or checkout make sure you use the -Pd options.

The -P option prunes empty directories (old stuff not included in 
biojava-live anymore). It seems that you have got both an old copy and a 
new copy of the projected feature holder.

The -d option creates directories that are new in the repository (new 
packages since your last update).

Use both: try doing a cvs update -Pd and then running ant.

- Mark

From jolyon.holdstock at ogt.co.uk  Thu Jan  5 05:07:45 2006
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Thu Jan  5 05:41:12 2006
Subject: [Biojava-l] Embl parser problem[Scanned]
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3F8A1FEC@EUCLID.internal.ogtip.com>

Hi 

I ran cvs update -Pd and then repeated the Ant command.

I can see it has updated as I'm trying to compile an extra source file

[javac] Compiling 1280 source files

But the build fails with the same error.

Is there a workaround I could use?

Thanks

Jolyon



From mark.schreiber at novartis.com  Thu Jan  5 20:10:44 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Thu Jan  5 20:07:35 2006
Subject: [Biojava-l] Embl parser problem[Scanned]
Message-ID: <OF0E17204F.8F15E2D9-ON482570EE.00064D2B-482570EE.00067A28@EU.novartis.net>

There should only be one copy of the ProjectedFeatureHolder 
(org.biojava.bio.seq.projection.ProjectedFeatureHolder).
Try deleting your biojava-live directory and doing a fresh checkout, making 
sure you use the -Pd options during the checkout.
- Mark

From jolyon.holdstock at ogt.co.uk  Fri Jan  6 04:56:27 2006
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Fri Jan  6 04:53:01 2006
Subject: [Biojava-l] Embl parser problem[Scanned]
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3F8A20FF@EUCLID.internal.ogtip.com>

Hi Mark,

Thanks for your help.

I have deleted the original download and repeated the cvs checkout with
the command

cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/biojava checkout -P biojava-live

I couldn't use -Pd with the checkout command (I'm using cvs 1.11.17).

I repeated the build and got the same error.

I checked the download and there is only one copy of the
ProjectedFeatureHolder, in org.biojava.bio.seq.projection where it should
be, so I'm not sure why Ant believes there is a second one in
org.biojava.bio.seq.
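
One hypothesis worth testing (it is only a guess from the error text) is that
an older BioJava jar is visible on the classpath used for the build, so the
compiler finds a second ProjectedFeatureHolder there rather than in the
checked-out sources. The sketch below is plain Java with a made-up class name,
WhereIsClass; it reports where a named class is loaded from, and would need to
be run with the same classpath as the build to say anything useful. It assumes
a modern JDK.

```java
import java.security.CodeSource;

class WhereIsClass {

    /** Returns where the named class was loaded from, or a short note if that cannot be determined. */
    static String locate(String className) {
        try {
            // Load without initializing, so static initializers of the target class do not run.
            Class<?> c = Class.forName(className, false, WhereIsClass.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "(no code source, probably a JDK class)"
                               : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return "(not found)";
        } catch (LinkageError e) {
            return "(found but could not be linked: " + e + ")";
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.biojava.bio.seq.ProjectedFeatureHolder";
        System.out.println(name + " -> " + locate(name));
    }
}
```

If the old class name resolves to a jar, removing that jar from the classpath
would be the thing to try; if it prints "(not found)", the hypothesis is wrong
for that classpath.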


Jolyon

-----Original Message-----
From: mark.schreiber@novartis.com [mailto:mark.schreiber@novartis.com] 
Sent: 06 January 2006 01:11
To: Jolyon Holdstock
Cc: biojava-l@biojava.org; biojava-l-bounces@portal.open-bio.org
Subject: RE: [Biojava-l] Embl parser problem[Scanned]

There should only be one copy of the ProjectedFeatureHolder 
(org.biojava.bio.seq.projection.ProjectedFeatureHolder),
Try deleting your biojava-live directory and doing a fresh checkout,
make 
sure you use the -Pd options during the checkout.
- Mark


"Jolyon Holdstock" <jolyon.holdstock@ogt.co.uk>
Sent by: biojava-l-bounces@portal.open-bio.org
01/05/2006 06:07 PM

 
        To:     Mark Schreiber/GP/Novartis@PH
        cc:     biojava-l-bounces@portal.open-bio.org,
biojava-l@biojava.org
        Subject:        RE: [Biojava-l] Embl parser problem[Scanned]


Hi 

I ran cvs update -Pd and then repeated the Ant command.

I can see it has updated as I'm trying to compile an extra source file

[javac] Compiling 1280 source files

But the build fails with the same error.

Is there a work around I could use?

Thanks

Jolyon


-----Original Message-----
From: mark.schreiber@novartis.com [mailto:mark.schreiber@novartis.com] 
Sent: 05 January 2006 01:22
To: Jolyon Holdstock
Cc: biojava-l@biojava.org; biojava-l-bounces@portal.open-bio.org
Subject: RE: [Biojava-l] Embl parser problem[Scanned]

Hi -

When you do the CVS update or checkout make sure you use the -Pd
options.

The -d option prunes empty directories (old stuff not included in 
biojava-live anymore). It seems that you have got both an old copy and a

new copy of the projected feature holder.

The -P option pulls new directories (new packages since your last
update).

Or maybe i've got them mixed up, anyhow use both. Try doing a CVS upate 
-Pd and then running ant.

- Mark


"Jolyon Holdstock" <jolyon.holdstock@ogt.co.uk>
Sent by: biojava-l-bounces@portal.open-bio.org
01/04/2006 06:54 PM

 
        To:     <biojava-l@biojava.org>,
<biojava-l-bounces@portal.open-bio.org>
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        RE: [Biojava-l] Embl parser problem[Scanned]


Thanks for the help.
I have downloaded the dev version and tried to build it.
I have no experience with Ant (I'm running v1.6.1) and the build fails.
The output from this is:

Buildfile: build.xml

init:
     [echo] Building biojava-live
     [echo] Java Home:                       c:\j2sdk1.4.2_04\jre
     [echo] JUnit present:                   ${junit.present}
     [echo] JUnit supported by Ant:          true
     [echo] HSQLDB driver present:           ${sqlDriver.hsqldb}

prepare:

prepare-biojava:

compile-biojava:
    [javac] Compiling 1279 source files to C:\Downloads\Java\BioJava\biojava-live\ant-build\classes\biojava
    [javac] C:\Downloads\Java\BioJava\biojava-live\src\org\biojava\bio\seq\impl\RevCompSequence.java:47: reference to ProjectedFeatureHolder is ambiguous, both class org.biojava.bio.seq.projection.ProjectedFeatureHolder in org.biojava.bio.seq.projection and class org.biojava.bio.seq.ProjectedFeatureHolder in org.biojava.bio.seq match
    [javac]     private ProjectedFeatureHolder pfh;
    [javac]             ^
    [javac] C:\Downloads\Java\BioJava\biojava-live\src\org\biojava\bio\seq\impl\RevCompSequence.java:65: reference to ProjectedFeatureHolder is ambiguous, both class org.biojava.bio.seq.projection.ProjectedFeatureHolder in org.biojava.bio.seq.projection and class org.biojava.bio.seq.ProjectedFeatureHolder in org.biojava.bio.seq match
    [javac]         pfh = new ProjectedFeatureHolder(new TranslateFlipContext(this,seq,seq.length()+1,true));
    [javac]                   ^
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -deprecation for details.
    [javac] 2 errors


 
_______________________________________________
Biojava-l mailing list  -  Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l


This email has been scanned by Oxford Gene Technology Group of Companies
Security Systems.




From td2 at sanger.ac.uk  Fri Jan  6 04:34:05 2006
From: td2 at sanger.ac.uk (Thomas Down)
Date: Fri Jan  6 05:07:49 2006
Subject: [Biojava-l] Embl parser problem[Scanned]
In-Reply-To: <588D0DD225D05746B5D8CAE1BE971F3F8A1FEC@EUCLID.internal.ogtip.com>
References: <588D0DD225D05746B5D8CAE1BE971F3F8A1FEC@EUCLID.internal.ogtip.com>
Message-ID: <88BF1DBE-FB06-46F0-84A9-5751FF12307D@sanger.ac.uk>


On 5 Jan 2006, at 10:07, Jolyon Holdstock wrote:

> Hi
>
> I ran cvs update -Pd and then repeated the Ant command.
>
> I can see it has updated as I'm trying to compile an extra source file
>
> [javac] Compiling 1280 source files
>
> But the build fails with the same error.
>
> Is there a work around I could use?

I'm wondering if you might have an old version of BioJava lying  
around on your CLASSPATH or in a JDK extensions directory?  There's  
only one copy of ProjectedFeatureHolder in the source tree but long  
ago in a galaxy far, far away it used to live in bio.seq rather than  
bio.seq.projection.  I suspect you have a copy that pre-dates this move.
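
One way to check for a stale copy is to ask the JVM where it loads each of the two classes named in the javac error from. The following is only a sketch: the class name ClassLocator and its method are made up for illustration, and it has to be run with the same CLASSPATH (and the same JDK) that Ant uses for the answer to mean anything.

```java
import java.security.CodeSource;

class ClassLocator {
    /** Returns the location a class is loaded from, or a short note if that is unknown. */
    static String whereIs(String className) {
        try {
            // 'false' = do not run static initialisers; we only want the location.
            Class<?> c = Class.forName(className, false, ClassLocator.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes that belong to the JDK itself may report no code source.
            return src == null ? "(JDK / bootstrap class path)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return "(not on the class path)";
        }
    }

    public static void main(String[] args) {
        // The two candidates from the "reference is ambiguous" error message.
        System.out.println(whereIs("org.biojava.bio.seq.ProjectedFeatureHolder"));
        System.out.println(whereIs("org.biojava.bio.seq.projection.ProjectedFeatureHolder"));
    }
}
```

If the first line prints a jar or directory rather than "(not on the class path)", that location is probably the old copy.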

Alternatively, you could just update the import statements to import  
individual classes:

           import org.biojava.bio.seq.projection.ProjectedFeatureHolder;

instead of

           import org.biojava.bio.seq.projection.*;
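
The same effect can be seen with JDK classes only. This is a stand-in sketch, not BioJava code: java.util.Date and java.sql.Date play the part of the two ProjectedFeatureHolder classes, and ImportDemo is a made-up name. An explicit single-class import takes precedence over the wildcard imports, so the simple name is no longer ambiguous.

```java
import java.sql.*;
import java.util.*;
import java.util.Date; // explicit import: wins over both wildcard imports above

class ImportDemo {
    static Date epoch() {
        // Without the explicit import, javac rejects "Date" here as ambiguous.
        return new Date(0L);
    }

    public static void main(String[] args) {
        System.out.println(epoch().getClass().getName()); // prints java.util.Date
    }
}
```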

Thomas.
From christoph.gille at charite.de  Fri Jan  6 15:37:46 2006
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Fri Jan  6 15:41:43 2006
Subject: [Biojava-l] tiny problem with converting java 1.5 to 1.4
Message-ID: <51334.192.168.220.203.1136579866.squirrel@webmail.charite.de>

I recently pointed out that BioJava could move to Java 1.5 without
breaking compatibility, since the new tool Retroweaver allows Java 1.5
programs to run on older JREs.

I started to use enums in my program and did not encounter any
problems related to retroweaving.

However, there is one nasty problem which shows up only at runtime.

This method exists in Java 1.5 but not in 1.4:
StringBuffer#insert(int, CharSequence)

This method exists in both Java 1.4 and 1.5:
StringBuffer#insert(int, Object)

After compiling with javac 1.5 and retroweaving, one gets a
NoSuchMethodError at runtime because #insert(int, CharSequence)
does not exist in the 1.4 runtime library.

The workaround is simple: cast the StringBuffer argument to Object so
that the method #insert(int, Object) is chosen instead of
#insert(int, CharSequence).
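
A minimal sketch of that workaround, using only java.lang (the class and method names InsertDemo and insertViaObject are made up for illustration):

```java
class InsertDemo {
    static String insertViaObject(StringBuffer target, int offset, StringBuffer piece) {
        // The cast makes javac bind to insert(int, Object), which exists in 1.4 and 1.5.
        // Without it, a 1.5 compiler prefers insert(int, CharSequence), which is
        // missing from the 1.4 runtime library.
        return target.insert(offset, (Object) piece).toString();
    }

    public static void main(String[] args) {
        System.out.println(insertViaObject(new StringBuffer("ac"), 1, new StringBuffer("b"))); // prints abc
    }
}
```

Passing piece.toString() instead of the cast should work as well, since insert(int, String) also exists in both versions.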

I have already told the author of Retroweaver.
Otherwise Retroweaver works very well.


From wendy.wong at gmail.com  Tue Jan 10 16:00:30 2006
From: wendy.wong at gmail.com (wendy wong)
Date: Tue Jan 10 18:48:25 2006
Subject: [Biojava-l] Generalized HMM in biojava?
Message-ID: <e554425b0601101300h508bc2f9q5606df9bf203b7ea@mail.gmail.com>

Hi,

I was wondering if it is possible to use the biojava library to
construct a generalized HMM?

thanks,
Wendy

From mark.schreiber at novartis.com  Tue Jan 10 22:39:48 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Tue Jan 10 22:36:37 2006
Subject: [Biojava-l] Generalized HMM in biojava?
Message-ID: <OF687C2C2C.0E90936C-ON482570F3.00140C4A-482570F3.00142001@EU.novartis.net>

Depending on what you mean by generalized....

You can create lots of custom HMM architectures using the DP packages of 
biojava.

- Mark




From wendy.wong at gmail.com  Wed Jan 11 04:37:34 2006
From: wendy.wong at gmail.com (wendy wong)
Date: Wed Jan 11 04:58:59 2006
Subject: [Biojava-l] Generalized HMM in biojava?
In-Reply-To: <OF687C2C2C.0E90936C-ON482570F3.00140C4A-482570F3.00142001@EU.novartis.net>
References: <OF687C2C2C.0E90936C-ON482570F3.00140C4A-482570F3.00142001@EU.novartis.net>
Message-ID: <e554425b0601110137v37d0bb17n92b618c919c02979@mail.gmail.com>

What I mean by a generalized HMM is that each state emits a sequence of
symbols (of fixed length, though), which doesn't seem very
straightforward in BioJava?

thanks,
wendy


From koeberle at mpiib-berlin.mpg.de  Wed Jan 11 05:45:17 2006
From: koeberle at mpiib-berlin.mpg.de (Christian Köberle)
Date: Wed Jan 11 05:48:28 2006
Subject: [Biojava-l] Sort Features
Message-ID: <43C4E1BD.3060602@mpiib-berlin.mpg.de>

Hi,

Is there a way to get the Features from a FeatureHolder sorted by Location?

thanks,
Christian

-- 
Christian Köberle

Max Planck Institute for Infection Biology
Department: Immunology
Schumannstr. 21/22
10117 Berlin

Tel: +49 30 28 460 562
e-mail: koeberle@mpiib-berlin.mpg.de

From td2 at sanger.ac.uk  Wed Jan 11 06:08:11 2006
From: td2 at sanger.ac.uk (Thomas Down)
Date: Wed Jan 11 06:04:36 2006
Subject: [Biojava-l] Sort Features
In-Reply-To: <43C4E1BD.3060602@mpiib-berlin.mpg.de>
References: <43C4E1BD.3060602@mpiib-berlin.mpg.de>
Message-ID: <7EF216E9-51B0-4E9A-89C6-6291736C8193@sanger.ac.uk>


On 11 Jan 2006, at 10:45, Christian Köberle wrote:

> Hi,
>
> Is there a way to get the Features from a FeatureHolder sorted by Location?

You can't guarantee a specific iteration order from a FeatureHolder (unless  
you write your own implementation).  You can, however, dump some  
features into a List or Set then sort them there.

           FeatureHolder fh = ...;
           List l = new ArrayList();
           for (Iterator i = fh.features(); i.hasNext(); ) {
               l.add(i.next());
           }
           Collections.sort(l, Feature.byLocationOrder);
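
For anyone who wants to try the copy-then-sort pattern without BioJava on the classpath, here is a self-contained sketch. Span is a hypothetical stand-in for a feature with a location (it is not a BioJava type), and the ordering (start coordinate, then end coordinate) is an assumption made for this example, not a statement about how Feature.byLocationOrder breaks ties.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class SortByLocationDemo {
    /** Hypothetical stand-in for a feature with a location. */
    static final class Span {
        final String name;
        final int min;
        final int max;

        Span(String name, int min, int max) {
            this.name = name;
            this.min = min;
            this.max = max;
        }
    }

    /** Orders spans by start coordinate, then by end coordinate. */
    static final Comparator<Span> BY_LOCATION = new Comparator<Span>() {
        public int compare(Span a, Span b) {
            if (a.min != b.min) return a.min < b.min ? -1 : 1;
            if (a.max != b.max) return a.max < b.max ? -1 : 1;
            return 0;
        }
    };

    /** Copies the spans into a new list, sorts the copy and returns the names in order. */
    static List<String> sortedNames(List<Span> spans) {
        List<Span> copy = new ArrayList<Span>(spans);
        Collections.sort(copy, BY_LOCATION);
        List<String> names = new ArrayList<String>();
        for (Span s : copy) {
            names.add(s.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Span> spans = new ArrayList<Span>();
        spans.add(new Span("exon2", 500, 700));
        spans.add(new Span("exon1", 100, 200));
        spans.add(new Span("gene", 100, 900));
        System.out.println(sortedNames(spans)); // prints [exon1, gene, exon2]
    }
}
```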


Thomas.
From matthew.pocock at ncl.ac.uk  Wed Jan 11 06:27:00 2006
From: matthew.pocock at ncl.ac.uk (Matthew Pocock)
Date: Wed Jan 11 06:39:44 2006
Subject: [Biojava-l] Generalized HMM in biojava?
In-Reply-To: <e554425b0601110137v37d0bb17n92b618c919c02979@mail.gmail.com>
References: <OF687C2C2C.0E90936C-ON482570F3.00140C4A-482570F3.00142001@EU.novartis.net>
	<e554425b0601110137v37d0bb17n92b618c919c02979@mail.gmail.com>
Message-ID: <200601111127.00973.matthew.pocock@ncl.ac.uk>

If each state emits a fixed number of symbols then you can just do an HMM 
where the emissions are over alpha^length. If you want the symbols to overlap 
then use an order-n distribution. 
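
To make the two views concrete, here is a plain-Java sketch on strings. It uses nothing from BioJava, the class and method names are made up, and reading "order-n" as overlapping windows that step one symbol at a time is my interpretation of the remark above.

```java
import java.util.ArrayList;
import java.util.List;

class TupleViews {
    /** Non-overlapping k-tuples: each tuple is one emission from the alpha^k alphabet. */
    static List<String> tuples(String seq, int k) {
        if (k <= 0 || seq.length() % k != 0) {
            throw new IllegalArgumentException("k must be positive and divide the sequence length");
        }
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < seq.length(); i += k) {
            out.add(seq.substring(i, i + k));
        }
        return out;
    }

    /** Overlapping windows of width k, stepping one symbol at a time. */
    static List<String> windows(String seq, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        List<String> out = new ArrayList<String>();
        for (int i = 0; i + k <= seq.length(); i++) {
            out.add(seq.substring(i, i + k));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tuples("acgtacgt", 4)); // prints [acgt, acgt]
        System.out.println(windows("acgta", 4));   // prints [acgt, cgta]
    }
}
```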

Matthew

From srouane at hotmail.com  Wed Jan 11 08:36:40 2006
From: srouane at hotmail.com (Simon Rouane)
Date: Wed Jan 11 08:50:17 2006
Subject: [Biojava-l] getting involved
In-Reply-To: <43C4E1BD.3060602@mpiib-berlin.mpg.de>
Message-ID: <BAY24-F21B5A24E89B6C78591BBECBE240@phx.gbl>

I'm a commercial Java developer who's worked on a fair few systems 
integration, LIMS and Datamart implementations in the past and I'd love to 
get involved in this project.

Can anyone give me some hints as to what the first steps are?

Thanks,

Simon Rouane.


From wendy.wong at gmail.com  Wed Jan 11 11:03:11 2006
From: wendy.wong at gmail.com (wendy wong)
Date: Wed Jan 11 11:06:59 2006
Subject: [Biojava-l] Generalized HMM in biojava?
In-Reply-To: <200601111127.00973.matthew.pocock@ncl.ac.uk>
References: <OF687C2C2C.0E90936C-ON482570F3.00140C4A-482570F3.00142001@EU.novartis.net>
	<e554425b0601110137v37d0bb17n92b618c919c02979@mail.gmail.com>
	<200601111127.00973.matthew.pocock@ncl.ac.uk>
Message-ID: <e554425b0601110803m123e3310g124dd0a70f911896@mail.gmail.com>

Thanks!
Now I have two questions about the SimpleEmissionState class:

1. advance: I am not entirely sure what it does. So if my state emits
4 symbols at a time do I set it to {4}?

2. Each of my sites can emit up to more than 100 alphabets and if 
each state emits 4 symbols at a time the number of alphabet for each
state is 100^4. I am a bit concerned about setting up the
distributions (too much memory consumption?). Is there a function that
I can overload so that the probability of each emission alphabet can
be calculated on the run?

Thanks for your help!

wendy



From mark.schreiber at novartis.com  Wed Jan 11 21:46:42 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed Jan 11 21:43:14 2006
Subject: [Biojava-l] getting involved
Message-ID: <OF04507EC9.5740FF51-ON482570F4.000F10A2-482570F4.000F4353@EU.novartis.net>

It really comes down to what you want to do.

Right now we need people to stress test the new biojavax packages 
available in CVS. Some more Unit tests for biojavax would also be great. 
Especially ones that test for cases identified in stress testing.

If you have other ideas that would also be cool.

- Mark


From srouane at hotmail.com  Thu Jan 12 05:05:17 2006
From: srouane at hotmail.com (Simon Rouane)
Date: Thu Jan 12 05:18:49 2006
Subject: [Biojava-l] getting involved
In-Reply-To: <OF04507EC9.5740FF51-ON482570F4.000F10A2-482570F4.000F4353@EU.novartis.net>
Message-ID: <BAY24-F2620AFD31ED51CB7F9EC24BE270@phx.gbl>

Thanks for everyone's comments. I'll do a bit more reading and then get back 
to you...

Is your testing done using JUnit?

Simon.


From hotafin at gmail.com  Thu Jan 12 07:52:05 2006
From: hotafin at gmail.com (Tamas Horvath)
Date: Thu Jan 12 07:48:31 2006
Subject: [Biojava-l] Re: strange pdb
In-Reply-To: <c343d7080601120417s47fda656oa84d7c160c2c8e80@mail.gmail.com>
References: <c343d7080601120417s47fda656oa84d7c160c2c8e80@mail.gmail.com>
Message-ID: <c343d7080601120452k3aac6c69s1ca8e67bda63c545@mail.gmail.com>

(wow... stupid linewrap...)

It seems to me that we need a variance tag for the Group or Atom object,
as a beginning. The altLoc supposedly means a variation of the atom's
position, but it seems to me that it makes more sense to treat the
alternates as alternative groups, since in the cases I've seen so far
these altLocs really refer to alternative side-chain conformations. In
this case there would be a TYR A and a TYR B conformation.
From matthew.pocock at ncl.ac.uk  Thu Jan 12 08:27:10 2006
From: matthew.pocock at ncl.ac.uk (Matthew Pocock)
Date: Thu Jan 12 08:42:45 2006
Subject: [Biojava-l] Generalized HMM in biojava?
In-Reply-To: <e554425b0601110803m123e3310g124dd0a70f911896@mail.gmail.com>
References: <OF687C2C2C.0E90936C-ON482570F3.00140C4A-482570F3.00142001@EU.novartis.net>
	<200601111127.00973.matthew.pocock@ncl.ac.uk>
	<e554425b0601110803m123e3310g124dd0a70f911896@mail.gmail.com>
Message-ID: <200601121327.11083.matthew.pocock@ncl.ac.uk>

On Wednesday 11 January 2006 16:03, wendy wong wrote:
> Thanks!
> Now I have two questions about the SimpleEmissionState class:
>
> 1. advance: I am not entirely sure what it does. So if my state emits
> 4 symbols at a time do I set it to {4}?

If you are emitting 4 symbols at a time, then you should probably think of the 
sequence as being a string of 4-tuples. In this case, the advance would be 
{1}, as you emit a single 4-tuple each time.

>
> 2. Each of my sites can emit up to more than 100 alphabets 

I think we are using different words here. Do you mean 100 alphabets, or 
alphabets containing 100 symbols?

> and if 
> each state emits 4 symbols at a time the number of alphabet for each
> state is 100^4. I am a bit concerned about setting up the
> distributions (too much memory consumption?).

Well, there's no way around this. If you really want to estimate a full 
discrete distribution over 4-tuples over 100 symbols, then you will have 
100^4 parameters to estimate.
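
To put a number on that, here is a small back-of-the-envelope sketch. The names are made up, and storing one double per entry in a dense table is an assumption for the estimate, not a description of how BioJava stores distributions.

```java
class EmissionTableSize {
    /** Entries in a full discrete distribution over tuples: alphabetSize^tupleLength. */
    static long fullTableEntries(int alphabetSize, int tupleLength) {
        long n = 1L;
        for (int i = 0; i < tupleLength; i++) {
            n = Math.multiplyExact(n, (long) alphabetSize); // throws on overflow
        }
        return n;
    }

    /** Rough size of such a table if every entry is one 8-byte double. */
    static long bytesAsDoubles(long entries) {
        return Math.multiplyExact(entries, 8L);
    }

    public static void main(String[] args) {
        long entries = fullTableEntries(100, 4);
        // 100^4 = 100000000 entries, i.e. 800000000 bytes per state under this assumption.
        System.out.println(entries + " entries, " + bytesAsDoubles(entries) + " bytes per state");
    }
}
```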

The alternative is to estimate a much smaller number of variables which when 
combined together (e.g. by multiplying them) calculate the full set of 
parameters. With a little thinking, you can rig the distribution trainer to 
route the counts back from the 100^4 possible outcomes to the underlying 
parameters.
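
One possible factorization, sketched in plain Java: treat the positions of the tuple as independent, keep one small table per position, and multiply. The independence assumption is mine and may well be wrong for the real model; FactorizedTupleModel and its methods are hypothetical names and not BioJava's Distribution or DistributionTrainer API. Symbols are represented by their index in the alphabet.

```java
class FactorizedTupleModel {
    private final double[][] counts; // [position in tuple][symbol index]
    private final int alphabetSize;

    FactorizedTupleModel(int tupleLength, int alphabetSize) {
        this.counts = new double[tupleLength][alphabetSize];
        this.alphabetSize = alphabetSize;
    }

    /** Routes one observed tuple (with a weight) back to the per-position parameters. */
    void addCount(int[] tuple, double weight) {
        for (int pos = 0; pos < tuple.length; pos++) {
            counts[pos][tuple[pos]] += weight;
        }
    }

    /** P(tuple) as the product of the per-position relative frequencies. */
    double probability(int[] tuple) {
        double p = 1.0;
        for (int pos = 0; pos < tuple.length; pos++) {
            double total = 0.0;
            for (double c : counts[pos]) {
                total += c;
            }
            if (total == 0.0) {
                return 0.0; // nothing observed yet at this position
            }
            p *= counts[pos][tuple[pos]] / total;
        }
        return p;
    }

    /** Number of stored parameters: tupleLength * alphabetSize. */
    int parameterCount() {
        return counts.length * alphabetSize;
    }

    public static void main(String[] args) {
        // 4 positions over 100 symbols: 400 parameters instead of 100^4.
        System.out.println(new FactorizedTupleModel(4, 100).parameterCount()); // prints 400
    }
}
```

The probability of any of the 100^4 tuples is then computed on demand rather than stored.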

It would probably help to have a better idea what it is you are attempting to 
model.

> Is there a function that 
> I can overload so that the probability of each emission alphabet can
> be calculated on the run?

It's not the alphabet that will kill you, but the number of parameters you are 
estimating. Indeed, BioJava should be able to handle alphabets with more than 
2^32 symbols quite happily. There's an implementation of cross-product 
alphabet designed especially for this case.
From ap3 at sanger.ac.uk  Thu Jan 12 08:41:31 2006
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Thu Jan 12 09:05:45 2006
Subject: [Biojava-l] Re: strange pdb
In-Reply-To: <c343d7080601120452k3aac6c69s1ca8e67bda63c545@mail.gmail.com>
References: <c343d7080601120417s47fda656oa84d7c160c2c8e80@mail.gmail.com>
	<c343d7080601120452k3aac6c69s1ca8e67bda63c545@mail.gmail.com>
Message-ID: <334130934e4489534c81036f79f2d6d3@sanger.ac.uk>

Hi Tamas,

The Atoms have an altLoc field; only the PDB parser did not capture 
that information.
I committed a fix to CVS.

Cheers,
Andreas


-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                               Hinxton, Cambridge CB10 1SA, UK
			 +44 (0) 1223 49 6891

From hotafin at gmail.com  Thu Jan 12 07:17:31 2006
From: hotafin at gmail.com (Tamas Horvath)
Date: Thu Jan 12 13:50:03 2006
Subject: [Biojava-l] strange pdb
Message-ID: <c343d7080601120417s47fda656oa84d7c160c2c8e80@mail.gmail.com>

Hi!

I've just stumbled upon a strange PDB parsing phenomenon. Look at the
following PDB file:

ATOM      1  N   GLU   326      14.783  14.947 -11.793  1.00 46.17           N
ATOM      2  CA  GLU   326      15.471  16.220 -11.447  1.00 39.29           C
ATOM      3  C   GLU   326      14.978  16.646 -10.075  1.00 37.04           C
ATOM      4  O   GLU   326      13.774  16.707  -9.841  1.00 37.72           O
ATOM      5  CB  GLU   326      15.133  17.290 -12.489  1.00 45.78           C
ATOM      6  CG  GLU   326      16.102  18.482 -12.553  1.00 71.24           C
ATOM      7  CD  GLU   326      15.940  19.327 -13.826  1.00 93.39           C
ATOM      8  OE1 GLU   326      14.901  19.198 -14.512  1.00101.02           O
ATOM      9  OE2 GLU   326      16.857  20.119 -14.144  1.00 84.50           O
ATOM     10  N   TYR   327      15.913  16.885  -9.163  1.00 33.93           N
ATOM     11  CA  TYR   327      15.604  17.298  -7.797  1.00 23.92           C
ATOM     12  C   TYR   327      15.865  18.786  -7.632  1.00 24.48           C
ATOM     13  O   TYR   327      16.797  19.328  -8.230  1.00 31.71           O
ATOM     14  CB ATYR   327      16.402  16.443  -6.818  0.50 29.56           C
ATOM     15  CB BTYR   327      16.528  16.583  -6.799  0.50 30.30           C
ATOM     16  CG ATYR   327      16.280  14.990  -7.206  0.50 45.39           C
ATOM     17  CG BTYR   327      15.997  15.310  -6.184  0.50 31.62           C
ATOM     18  CD1ATYR   327      16.886  14.518  -8.371  0.50 44.19           C
ATOM     19  CD1BTYR   327      14.840  15.316  -5.413  0.50 41.31           C
ATOM     20  CD2ATYR   327      15.466  14.119  -6.496  0.50 38.02           C
ATOM     21  CD2BTYR   327      16.667  14.101  -6.351  0.50 54.42           C
ATOM     22  CE1ATYR   327      16.676  13.240  -8.828  0.50 27.11           C
ATOM     23  CE1BTYR   327      14.361  14.153  -4.823  0.50 24.22           C
ATOM     24  CE2ATYR   327      15.256  12.830  -6.944  0.50 27.50           C
ATOM     25  CE2BTYR   327      16.196  12.934  -5.764  0.50 45.82           C
ATOM     26  CZ ATYR   327      15.866  12.400  -8.119  0.50 24.52           C
ATOM     27  CZ BTYR   327      15.041  12.970  -5.001  0.50 38.12           C
ATOM     28  OH ATYR   327      15.666  11.127  -8.607  0.50 51.23           O
ATOM     29  OH BTYR   327      14.567  11.824  -4.411  0.50 40.14           O
ATOM     30  N   PHE   328      15.050  19.446  -6.825  1.00 20.97           N
ATOM     31  CA  PHE   328      15.212  20.876  -6.587  1.00 20.04           C
ATOM     32  C   PHE   328      15.213  21.072  -5.098  1.00 28.28           C
ATOM     33  O   PHE   328      14.775  20.197  -4.363  1.00 24.43           O
ATOM     34  CB  PHE   328      14.061  21.656  -7.209  1.00 22.08           C
ATOM     35  CG  PHE   328      13.906  21.406  -8.670  1.00 31.12           C
ATOM     36  CD1 PHE   328      13.164  20.320  -9.124  1.00 23.58           C
ATOM     37  CD2 PHE   328      14.547  22.217  -9.594  1.00 47.00           C
ATOM     38  CE1 PHE   328      13.064  20.044 -10.465  1.00 30.40           C
ATOM     39  CE2 PHE   328      14.452  21.948 -10.954  1.00 44.64           C
ATOM     40  CZ  PHE   328      13.706  20.852 -11.386  1.00 33.12           C

As the PDB parser goes through these records, it simply cuts off the A/B
variant markers of that TYR and parses the atoms as identically named atoms
of the same amino acid. This is really not a desired thing to do. In the PDB
format description, this field is:
17             Character       altLoc        Alternate location indicator.

Maybe the simplest way to deal with it is to let the user choose
which variant should be used...
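
To illustrate the suggestion above, here is a minimal sketch in plain Java of reading the alternate location indicator from column 17 of an ATOM record and keeping only one variant. It is not BioJava code: the class AltLocDemo and its methods are hypothetical names made up for this example, and it assumes fixed-width records laid out as in the file above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of BioJava.
class AltLocDemo {

    // PDB columns are 1-based: atom name 13-16, altLoc 17, residue name 18-20.
    static String atomName(String record) {
        return record.substring(12, 16).trim();
    }

    static char altLoc(String record) {
        return record.charAt(16);
    }

    static String resName(String record) {
        return record.substring(17, 20);
    }

    // Keep ATOM records that have no altLoc (blank) or that match the wanted variant.
    static List<String> select(List<String> records, char wanted) {
        List<String> kept = new ArrayList<>();
        for (String r : records) {
            if (!r.startsWith("ATOM")) {
                continue;
            }
            char alt = altLoc(r);
            if (alt == ' ' || alt == wanted) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        records.add("ATOM     13  O   TYR   327      16.797  19.328  -8.230  1.00 31.71");
        records.add("ATOM     14  CB ATYR   327      16.402  16.443  -6.818  0.50 29.56");
        records.add("ATOM     15  CB BTYR   327      16.528  16.583  -6.799  0.50 30.30");
        // The user picks variant 'A'; the 'B' copy of CB is dropped.
        for (String r : select(records, 'A')) {
            System.out.println(atomName(r) + " [" + altLoc(r) + "] " + resName(r));
        }
    }
}
```

Treating a blank altLoc as "always keep" means atoms without alternates survive whichever variant the user picks.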
From franckv at ebi.ac.uk  Mon Jan 16 10:59:42 2006
From: franckv at ebi.ac.uk (Franck)
Date: Mon Jan 16 16:24:52 2006
Subject: [Biojava-l] Re: Multiple questions (mark.schreiber@novartis.com)
Message-ID: <43CBC2EE.9090504@ebi.ac.uk>

Hi,

Sorry for the late response!
As for point 2) (Is there a wrapper for SequenceIO.fileToBiojava(..)):
for one of my projects I've written a factory class which returns a 
Sequence object from a URI or a string. The formats handled are EMBL, 
GenBank and SwissProt.
The project is still in progress and not fully tested, but so far this code 
works with my sequences.
If it can help someone...

Franck

p.s. You can find the java file attached.
-------------- next part --------------
package uk.ac.ebi.ftv;

import java.io.*;
import java.net.URL;
import java.net.MalformedURLException;
import java.util.regex.Pattern;

import org.biojava.bio.seq.Sequence;
import org.biojava.bio.seq.SequenceIterator;
import org.biojava.bio.seq.io.SeqIOTools;
import org.biojava.bio.seq.io.SequenceBuilder;
import org.biojava.bio.BioException;

/**
 * Project FTV : Feature Table Viewer
 * F. Valentin - Jul 2005
 * Copyright (c) European Bioinformatics Institute 2005
 * <p/>
 * $Header$
 * Version : $Name$
 * <p/>
 * <p/>
 * $Log$
 */
public abstract class SequenceFactory {

	/* ----------------------- Class variables    --------------------------- */

	// According to the documentation the first line of EMBL and SwissProt files are
	// defined as following :
	// EMBL := ID \s+ <entryname> \s+ <dataclass>; \s+ [circular] \s+ <molecule>; \s+
	//                <division>; \s+ <seqlength> \s+ BP.
	// <entryname> := \p{Alpha} \w+
	// <dataclass> := standard
	// <molecule>  := .+  (should be the same as the value in the mol_type qualifier).
	// <division>  := (PHG)|(CON)|... (see EMBL documentation)
	// <seqlength> := \d+
	// ------------------------------------------------------------------------------
	// SwissProt := ID \s+ <entryname> \s+ <dataclass>; \s+ <type>; <length> \s+ AA.
	// <entryname> := \w{1,12}
	// <dataclass> := (STANDARD) | (PRELIMINARY)
	// <type>      := PRT
	// <length>    := \d+
	// ------------------------------------------------------------------------------
	// GenBank := LOCUS \s{7} <locusname> \s <length> \s bp \s <strandtype><molecule>
	//            \s{2} <type_adn> \s <division> \s <date>
	// <locusname>  := \w ( (\w(?<=\w)) | (\s(?=\s)) ){11}
	// <length>     := \s ( (\s(?<=\s)) | (\d (?=\d) ){4} \d
	// <strandtype> := \s{3} ([sdm]s-)
	// <molecule>   := (NA\s) | ( (DNA) | (tRNA) | (rRNA) | (mRNA) | (uRNA) | (snRNA) | (snoRNA)
	// <type_adn>   := (circular) | (linear \s \s)
	// <division>   := \w{3}
	// <date>       := // date format dd-MMM-yyyy
	// ------------------------------------------------------------------------------
	// DDBJ := the format seems to be the same as Genbank.
	// TODO need to be confirmed.
	//
	// We don't strictly follow these definitions. The important point here is to
	// be able to distinguish the different formats. However, if new formats are
	// added it's important to adapt the tests to keep the choice deterministic !

	private static Pattern EMBL_PATTERN      = Pattern.compile("\\AID.+BP\\.\\s*$",     Pattern.MULTILINE);
	private static Pattern GENBANK_PATTERN   = Pattern.compile("\\ALOCUS.+\\d{4}\\s*$", Pattern.MULTILINE);
	private static Pattern SWISSPROT_PATTERN = Pattern.compile("\\AID.+AA\\.\\s*$",     Pattern.MULTILINE);

	/* ------------------------- Class methods    --------------------------- */

	/**
	 * Create the biojava object Sequence according to the first line of the string.
	 * @param st A string representing the sequence.
	 * @return the sequence object.
	 */
	private static Sequence createSequenceFromString(String st) throws FtvUserException {
		SequenceIterator iterator;
		BufferedReader   br = new BufferedReader(new StringReader(st));
		Sequence         sequence;

		// If EMBL format
		if (EMBL_PATTERN.matcher(st).find()) {
			iterator = SeqIOTools.readEmbl(br);
		}
		// Genbank/DDBJ format
		else if (GENBANK_PATTERN.matcher(st).find()) {
			iterator = SeqIOTools.readGenbank(br);
		}
		// SwissProt format
		else if (SWISSPROT_PATTERN.matcher(st).find()) {
			iterator = SeqIOTools.readSwissprot(br);
		}
		else {
			throw new FtvUserException(FtvUtil.MSG_SEQ_FORMAT_UNKNOWN);
		}

		// We read only the first sequence from the iterator (we use an iterator here because
		// it's simpler than creating the Sequence object directly; see StreamReader's
		// implementation for what would have to be done).
		try {
			return iterator.nextSequence();
		} catch (BioException e) {
			System.out.println("-------------------------");
			// Print the trace; the original call to getStackTrace() discarded its result.
			e.printStackTrace();
			System.out.println("-------------------------");
			throw new FtvUserException("BioException : " + e.getMessage());
		}
	}

	/**
	 * Create a Sequence object according to the sort of string given as a parameter :<br>
	 * The string can be :<br>
	 *    - the sequence itself.<br>
	 *    - an URI to the sequence.<br>
	 *        eg. http://www.ebi.ac.uk/cgibin/dbfetch?db=EMBL&id=j00021&forma=embl&style=raw<br>
	 *            ftp://www.asite.fr/sequence.embl
	 * @param st string that represents a sequence.
	 * @return the sequence object.
	 */
	public static Sequence createSequence(String st) throws FtvUserException, IOException {
		StringBuffer   sb_sequence = new StringBuffer();
		String         st_sequence;
		BufferedReader in       = null;
		URL            url      = null;
		String         seq_line = null ;

		// If the URL has no protocol defined, this is the sequence itself.
		// (See http://www.ietf.org/rfc/rfc2396.txt chap 3.1)
		if (! st.matches("\\A\\w*(\\w|\\d|\\+|-|\\.):.+$")) {
			st_sequence = new String(st);
		}
		else {
			try {
				url = new URL(st);
				in  = new BufferedReader(new InputStreamReader(url.openStream()));

				while ((seq_line = in.readLine()) != null) {
					sb_sequence.append(seq_line).append("\n");
				}
				in.close();
				st_sequence = new String(sb_sequence);

			} catch (MalformedURLException e) {
				throw new FtvUserException(FtvUtil.MSG_PROTOCOL_UNKNOWN);
			} catch (FileNotFoundException e)  {
				throw new FtvUserException(FtvUtil.MSG_FILE_NOT_FOUND);
			} catch (IOException e) {
				throw e;  // Propagate any other I/O failure to the caller.
			}
		}
		return createSequenceFromString(st_sequence);
	}
}
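
As a quick way to see how the three header patterns above tell the formats apart, here is a minimal standalone sketch that uses only java.util.regex. The class name FormatSniffer and the sample header lines are made up for illustration; the regular expressions are copied from SequenceFactory. Checking the format first is one way to avoid handing a GenBank file to readEmbl.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the three patterns come from SequenceFactory.
class FormatSniffer {

    private static final Pattern EMBL =
            Pattern.compile("\\AID.+BP\\.\\s*$", Pattern.MULTILINE);
    private static final Pattern GENBANK =
            Pattern.compile("\\ALOCUS.+\\d{4}\\s*$", Pattern.MULTILINE);
    private static final Pattern SWISSPROT =
            Pattern.compile("\\AID.+AA\\.\\s*$", Pattern.MULTILINE);

    // Returns a label for the first matching format, or "UNKNOWN".
    static String detect(String text) {
        if (EMBL.matcher(text).find()) {
            return "EMBL";
        }
        if (GENBANK.matcher(text).find()) {
            return "GENBANK";
        }
        if (SWISSPROT.matcher(text).find()) {
            return "SWISSPROT";
        }
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        // Illustrative first lines, not taken from real database entries.
        String embl = "ID   AL121903   standard; DNA; HUM; 80600 BP.\nXX\n";
        String genbank = "LOCUS       AL121903   80600 bp    DNA     linear   PRI 03-JAN-2006\n";
        String swissprot = "ID   CYC_BOVIN      STANDARD;      PRT;   104 AA.\n";
        System.out.println(detect(embl));
        System.out.println(detect(genbank));
        System.out.println(detect(swissprot));
        System.out.println(detect(">seq1\nACGT\n"));
    }
}
```

Because \A anchors at the start of the input and "." does not cross line breaks, only the first line of the text decides the result.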
From koeberle at mpiib-berlin.mpg.de  Thu Jan 19 12:16:46 2006
From: koeberle at mpiib-berlin.mpg.de (Christian Köberle)
Date: Thu Jan 19 12:19:40 2006
Subject: [Biojava-l] Parse XML BLAST 
Message-ID: <43CFC97E.9070308@mpiib-berlin.mpg.de>

Hi,

is it possible to get the information from the BLAST XML tag <Hit_def>
with BioJava?

I use the example from BioJava In Anger to parse a BLAST report,
with BlastXMLParserFacade as the parser.
To get the definition of the target gene, I take the 
SeqSimilaritySearchHit object, parse the result of getSubjectID() and 
download the sequence from NCBI. But this is very slow.

for (Iterator k = result.getHits().iterator(); k.hasNext(); ) {
          SeqSimilaritySearchHit hit = (SeqSimilaritySearchHit)k.next();
          String name = hit.getSubjectID().split("\\|")[3];
          Sequence seq = db.getSequence(name);
          System.out.print("\t" + 
seq.getAnnotation().getProperty("DEFINITION"));
}

Is there a better way to get the information?

thanks,
Christian

-- 
Christian Köberle

Max Planck Institute for Infection Biology
Department: Immunology
Schumannstr. 21/22
10117 Berlin

Tel: +49 30 28 460 562
e-mail: koeberle@mpiib-berlin.mpg.de

From mark.schreiber at novartis.com  Thu Jan 19 22:37:23 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Thu Jan 19 22:41:01 2006
Subject: [Biojava-l] Parse XML BLAST
Message-ID: <OF0DF1D139.A14507E4-ON482570FC.0013DB11-482570FC.0013E724@EU.novartis.net>

Which example are you using?

The BlastEcho might be faster.

- Mark
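
If the only goal is the text of each <Hit_def> element, one fallback that avoids the download from NCBI is to read the BLAST XML directly with the JDK's own XML parser. The sketch below is not the BioJava API and is not what BlastEcho does; the class HitDefDemo is a hypothetical name and the XML fragment is a made-up, heavily trimmed sample. A real BLAST report also carries a DOCTYPE pointing at the NCBI DTD, which the parser may try to fetch.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; uses only JDK XML classes.
class HitDefDemo {

    // Collects the text content of every <Hit_def> element in document order.
    static List<String> hitDefs(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("Hit_def");
            List<String> defs = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                defs.add(nodes.item(i).getTextContent());
            }
            return defs;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse BLAST XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample; a real report contains many more elements.
        String xml =
                "<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>"
              + "<Hit><Hit_id>gi|1|gb|X00001.1|</Hit_id>"
              + "<Hit_def>Example protein one</Hit_def></Hit>"
              + "<Hit><Hit_id>gi|2|gb|X00002.1|</Hit_id>"
              + "<Hit_def>Example protein two</Hit_def></Hit>"
              + "</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>";
        for (String def : hitDefs(xml)) {
            System.out.println(def);
        }
    }
}
```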



From dreher at mpiib-berlin.mpg.de  Fri Jan 20 09:45:27 2006
From: dreher at mpiib-berlin.mpg.de (Felix Dreher)
Date: Fri Jan 20 09:49:44 2006
Subject: [Biojava-l] BioSQL cvs versions
Message-ID: <43D0F787.5020705@mpiib-berlin.mpg.de>

Hello,
when I try to add a sequence to a BioSQL-DB, the following exception is 
thrown:

Exception Details: org.postgresql.util.PSQLException
  ERROR: column "seqfeature_key_id" of relation "seqfeature" does not exist

org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1512)
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1297)
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:188)
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:430)
org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:346)
org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:250)
org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:205)
org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:205)
org.biojava.bio.seq.db.biosql.FeaturesSQL.persistFeature(FeaturesSQL.java:804)
org.biojava.bio.seq.db.biosql.FeaturesSQL.persistFeatures(FeaturesSQL.java:760)
org.biojava.bio.seq.db.biosql.FeaturesSQL.persistFeatures(FeaturesSQL.java:729)
org.biojava.bio.seq.db.biosql.BioSQLSequenceDB._addSequence(BioSQLSequenceDB.java:481)
org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.addSequence(BioSQLSequenceDB.java:374)
.
.
.

Apparently the BioJava and BioSQL versions don't really match.
I use the following cvs-version of the corresponding class: 
/BioSQLSequenceDB.java/1.70/Fri Jun 10 07:48:11 2005//
Further I use the latest cvs-version of the BioSQL-script 
'biosqldb-pg.sql' (it's from June 2005).
Are there any suggestions how this could be solved?

Thank you,
Felix


-- 
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité Mitte
Department of Immunology
Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany
Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426

From jdiminic at gmail.com  Fri Jan 20 14:17:46 2006
From: jdiminic at gmail.com (Janko Diminic)
Date: Fri Jan 20 15:51:23 2006
Subject: [Biojava-l] BioSQL cvs versions
In-Reply-To: <43D0F787.5020705@mpiib-berlin.mpg.de>
References: <43D0F787.5020705@mpiib-berlin.mpg.de>
Message-ID: <43cbb78e0601201117r41222872u@mail.gmail.com>

Do you create database schema with <property
name="hbm2ddl.auto">create</property>?

Check if seqfeature_key_id exists.


2006/1/20, Felix Dreher <dreher@mpiib-berlin.mpg.de>:
> Hello,
> when I try to add a sequence to a BioSQL-DB, the following exception is
> thrown:
>
> *Exception Details: * org.postgresql.util.PSQLException
>   ERROR: column "seqfeature_key_id" of relation "seqfeature" does not exist
>
> |org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1512)
> org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1297)
> org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:188)
> org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:430)
> org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:346)
> org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:250)
> org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:205)
> org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:205)
> org.biojava.bio.seq.db.biosql.FeaturesSQL.persistFeature(FeaturesSQL.java:804)
> org.biojava.bio.seq.db.biosql.FeaturesSQL.persistFeatures(FeaturesSQL.java:760)
> org.biojava.bio.seq.db.biosql.FeaturesSQL.persistFeatures(FeaturesSQL.java:729)
> org.biojava.bio.seq.db.biosql.BioSQLSequenceDB._addSequence(BioSQLSequenceDB.java:481)
> org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.addSequence(BioSQLSequenceDB.java:374)
> .
> .
> .
>
> |
> apparently the BioJava- and BioSQL-version don't really match.
> I use the following cvs-version of the corresponding class:
> /BioSQLSequenceDB.java/1.70/Fri Jun 10 07:48:11 2005//
> Further I use the latest cvs-version of the BioSQL-script
> 'biosqldb-pg.sql' (it's from June 2005).
> Are there any suggestions how this could be solved?
>
> Thank you,
> Felix
>
>
>
>
>
>
> --
> Felix Dreher
> Max-Planck-Institute for Infection Biology
> Campus Charité Mitte
> Department of Immunology
> Mailing address: Schumannstraße 21/22
> Visitors: Virchowweg 12
> 10117 Berlin
> Germany
> Tel.: +49 (0)30 28460-254 / -494
> Mobile: +49 (0)163 7542426
>
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
>


--
Janko Diminic

From wendy.wong at gmail.com  Fri Jan 20 17:11:58 2006
From: wendy.wong at gmail.com (wendy wong)
Date: Sat Jan 21 07:19:38 2006
Subject: [Biojava-l] Generalized HMM in biojava?
In-Reply-To: <200601121327.11083.matthew.pocock@ncl.ac.uk>
References: <OF687C2C2C.0E90936C-ON482570F3.00140C4A-482570F3.00142001@EU.novartis.net>
	<200601111127.00973.matthew.pocock@ncl.ac.uk>
	<e554425b0601110803m123e3310g124dd0a70f911896@mail.gmail.com>
	<200601121327.11083.matthew.pocock@ncl.ac.uk>
Message-ID: <e554425b0601201411p68c9bad7t32f6874961620c00@mail.gmail.com>

Thanks for your help!

> It's not the alphabet that will kill you, but the number of parameters you are
> estimating. Indeed, BioJava should be able to handle alphabets with more than
> 2^32 symbols quite happily. There's an implementation of cross-product
> alphabet designed especially for this case.

What I am trying to do is develop a phylogenetic HMM. Say there
are 3 sequences in the alignment; that means each site consists of 3
symbols, and if it is a generalized HMM, each state has several sites,
say 7. I wrote a test program to see if it works. When the number
of sites in the state is 5, it worked (I just want to see if I can
factorize a symbol in the state alphabet), but when the number of sites in
the state is 7, I get java.lang.ArrayIndexOutOfBoundsException (code
attached).

Is it because I was not using the alphabet efficiently?

Again, thanks very much for helping!

Wendy

public static void main(String[] args) throws MarshalException,
		ValidationException, IOException {

	Alphabet sequenceAlphabet = DNATools.getDNA();
	Set alphabetSet = AlphabetManager.getAllSymbols((FiniteAlphabet) sequenceAlphabet);

	int no_sequences = 3;
	List siteAlphabetList = Collections.nCopies(no_sequences, sequenceAlphabet);
	Alphabet siteAlphabet = AlphabetManager.getCrossProductAlphabet(siteAlphabetList);
	int length = 7;
	List stateAlphabetList = Collections.nCopies(length, siteAlphabet);
	Alphabet stateAlphabet = AlphabetManager.getCrossProductAlphabet(stateAlphabetList);

	AlphabetIndex alphabetIndex = AlphabetManager.getAlphabetIndex((FiniteAlphabet) stateAlphabet);
	AtomicSymbol sym = (AtomicSymbol) alphabetIndex.symbolForIndex(3);
	List symList = sym.getSymbols();
	log.info("sym (index=3)  is " + sym);
	log.info("sym is composed of:");
	Iterator symIter = symList.iterator();
	while (symIter.hasNext()) {
		log.info(symIter.next());
	}
}

From mark.schreiber at novartis.com  Sun Jan 22 20:17:16 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Sun Jan 22 20:13:42 2006
Subject: [Biojava-l] BioSQL cvs versions
Message-ID: <OFF75A9CBE.6D006CC2-ON482570FF.0006A71E-482570FF.0007134F@EU.novartis.net>

Dear Felix,

We have found a number of deficiencies in biojava's support of biosql. 
Therefore we have moved to a new model using hibernate to overcome several 
problems. This will be officially released in biojava1.5. In the meantime 
you can download the development version from CVS.

Having said that, the best supported database versions in biojava 1.4 are 
Oracle and MySQL. These have received the most testing and support. If you 
have a chance (and cannot use Hibernate) I would suggest using one of 
those. Although someone may offer a bug fix for this problem we do not 
plan to support the old biojava/biosql mappings after 1.5 is released. 
They have been deprecated in the CVS. The official way to interact with 
biosql will be via Hibernate.

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910




From matthew.pocock at ncl.ac.uk  Mon Jan 23 06:32:21 2006
From: matthew.pocock at ncl.ac.uk (Matthew Pocock)
Date: Mon Jan 23 06:46:20 2006
Subject: [Biojava-l] Generalized HMM in biojava?
In-Reply-To: <e554425b0601201411p68c9bad7t32f6874961620c00@mail.gmail.com>
References: <OF687C2C2C.0E90936C-ON482570F3.00140C4A-482570F3.00142001@EU.novartis.net>
	<200601121327.11083.matthew.pocock@ncl.ac.uk>
	<e554425b0601201411p68c9bad7t32f6874961620c00@mail.gmail.com>
Message-ID: <200601231132.21336.matthew.pocock@ncl.ac.uk>

On Friday 20 January 2006 22:11, wendy wong wrote:
> what I am trying to do is to develop a phylogenetic HMM. so say there
> are 3 sequences, in the alignment, that means each site consists of 3
> symbols, and if it is a generalized HMM, each state has several sites,
> say 7.

OK - so you have a single HMM that emits whole columns of an alignment? 
Usually, to align three sequences, you would use a 3-head HMM where each head 
emits one of the sequences.

> I wrote a testing program to see if it works. when the length 
> of sites in the state = 5 it worked. (I just want to see if I can
> factorize a symbol in the state alphabet. but when number of sites in
> the state = 7, I get  java.lang.ArrayIndexOutOfBoundsException.  (code
> attached)
>
> Is it because i was not using the alphabet efficiently?

You shouldn't be getting exceptions. This is almost certainly a bug. Could you 
send the stack-trace?

Matthew

>
> again, thanks very much for helping!
>
> Wendy
>
> public static void main(String[] args) throws MarshalException,
> ValidationException, IOException {
>
> 		Alphabet sequenceAlphabet = DNATools.getDNA();
> 		Set alphabetSet = AlphabetManager.getAllSymbols((FiniteAlphabet)
> sequenceAlphabet);
>
> 	    	int no_sequences = 3;
> 		List siteAlphabetList = Collections.nCopies(no_sequences,
> sequenceAlphabet); Alphabet siteAlphabet =
> AlphabetManager.getCrossProductAlphabet(siteAlphabetList);
> 	    int length = 7;
> 	    List staeAlphabetList = Collections.nCopies(length, siteAlphabet);
> 	    Alphabet stateAlphabet =
> AlphabetManager.getCrossProductAlphabet(staeAlphabetList);
>
> 	    AlphabetIndex alphabetIndex =
> AlphabetManager.getAlphabetIndex((FiniteAlphabet) stateAlphabet);
> 	AtomicSymbol sym = (AtomicSymbol) alphabetIndex.symbolForIndex(3);
> 	    List symList = sym.getSymbols();
> 	    log.info("sym (index=3)  is " + sym);
> 	    log.info("sym is composed of:");
> 	    Iterator symIter = symList.iterator();
> 	    while (symIter.hasNext()) {
> 	    		log.info(symIter.next());
> 	    }
> }
From wendy.wong at gmail.com  Mon Jan 23 06:43:43 2006
From: wendy.wong at gmail.com (wendy wong)
Date: Mon Jan 23 06:46:28 2006
Subject: [Biojava-l] Generalized HMM in biojava?
In-Reply-To: <200601231132.21336.matthew.pocock@ncl.ac.uk>
References: <OF687C2C2C.0E90936C-ON482570F3.00140C4A-482570F3.00142001@EU.novartis.net>
	<200601121327.11083.matthew.pocock@ncl.ac.uk>
	<e554425b0601201411p68c9bad7t32f6874961620c00@mail.gmail.com>
	<200601231132.21336.matthew.pocock@ncl.ac.uk>
Message-ID: <e554425b0601230343x13f4502dy370a369048676cc8@mail.gmail.com>

> OK - so you have a single HMM that emits whole columns of an alignment?
> Usually to a lign three sequences, you would use a 3-head HMM where each head
> emits one of the sequences.

I am not sure it would work with a 3-head HMM, because here the
sequences are related to each other by the phylogenetic tree. So if
the sequence order is the same, the column ACC would have a different
likelihood than CCA.

> You shouldn't be getting exceptions. This is almost certainly a bug. Could you
> send the stack-trace?

sure, here it is:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
	at org.biojava.bio.symbol.LinearAlphabetIndex.buildIndex(LinearAlphabetIndex.java:108)
	at org.biojava.bio.symbol.LinearAlphabetIndex.<init>(LinearAlphabetIndex.java:66)
	at org.biojava.bio.symbol.AlphabetManager.getAlphabetIndex(AlphabetManager.java:1796)
	at edu.cornell.bscb.evopromoter.TestingFunctions.main(TestingFunctions.java:61)

I think I don't need the full alphabet of getDNA(), which has 16
symbols. I reduced it to 5 (A, T, C, G, N), so I can have a state that
contains more sites...

thanks,
wendy

> > again, thanks very much for helping!
> >
> > Wendy
> >
> > public static void main(String[] args) throws MarshalException,
> > ValidationException, IOException {
> >
> >               Alphabet sequenceAlphabet = DNATools.getDNA();
> >               Set alphabetSet = AlphabetManager.getAllSymbols((FiniteAlphabet)
> > sequenceAlphabet);
> >
> >               int no_sequences = 3;
> >               List siteAlphabetList = Collections.nCopies(no_sequences,
> > sequenceAlphabet); Alphabet siteAlphabet =
> > AlphabetManager.getCrossProductAlphabet(siteAlphabetList);
> >           int length = 7;
> >           List staeAlphabetList = Collections.nCopies(length, siteAlphabet);
> >           Alphabet stateAlphabet =
> > AlphabetManager.getCrossProductAlphabet(staeAlphabetList);
> >
> >           AlphabetIndex alphabetIndex =
> > AlphabetManager.getAlphabetIndex((FiniteAlphabet) stateAlphabet);
> >       AtomicSymbol sym = (AtomicSymbol) alphabetIndex.symbolForIndex(3);
> >           List symList = sym.getSymbols();
> >           log.info("sym (index=3)  is " + sym);
> >           log.info("sym is composed of:");
> >           Iterator symIter = symList.iterator();
> >           while (symIter.hasNext()) {
> >                       log.info(symIter.next());
> >           }
> > }
>

From matthew.pocock at ncl.ac.uk  Mon Jan 23 06:58:41 2006
From: matthew.pocock at ncl.ac.uk (Matthew Pocock)
Date: Mon Jan 23 07:13:38 2006
Subject: [Biojava-l] Generalized HMM in biojava?
In-Reply-To: <e554425b0601230343x13f4502dy370a369048676cc8@mail.gmail.com>
References: <OF687C2C2C.0E90936C-ON482570F3.00140C4A-482570F3.00142001@EU.novartis.net>
	<200601231132.21336.matthew.pocock@ncl.ac.uk>
	<e554425b0601230343x13f4502dy370a369048676cc8@mail.gmail.com>
Message-ID: <200601231158.42259.matthew.pocock@ncl.ac.uk>

On Monday 23 January 2006 11:43, wendy wong wrote:
> > OK - so you have a single HMM that emits whole columns of an alignment?
> > Usually to a lign three sequences, you would use a 3-head HMM where each
> > head emits one of the sequences.
>
> I am not sure if it would work with a 3 head HMM, as in here the
> sequences are related to each other by the phylogenetic tree. so if
> the sequences order is the same, the column ACC would have a different
> likelihood than CCA.

So you already have the alignment from a phylogenetic program and you are 
using biojava to compute some other statistic over it?

>
> > You shouldn't be getting exceptions. This is almost certainly a bug.
> > Could you send the stack-trace?
>
> sure, here it is:

Thanks. I am not around until the end of the week. Could somebody take a 
look at this?

> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
> 	at org.biojava.bio.symbol.LinearAlphabetIndex.buildIndex(LinearAlphabetIndex.java:108)
> 	at org.biojava.bio.symbol.LinearAlphabetIndex.<init>(LinearAlphabetIndex.java:66)
> 	at org.biojava.bio.symbol.AlphabetManager.getAlphabetIndex(AlphabetManager.java:1796)
> 	at edu.cornell.bscb.evopromoter.TestingFunctions.main(TestingFunctions.java:61)
>
> I think I don't need the full alphabet of getDNA(), which has 16
> symbols. I reduced it to 5 (A,T, C, G, N), so I can have a state that
> contains more sites...

While this is a good idea, it actually will be counter-productive in BioJava. 
The DNA alphabet only has 4 'real' symbols - the nucleotides. The other 
symbols (n included) are 'virtual' symbols constructed from sets of the 
'real' symbols. By introducing 'N' as a 1st class symbol, you have actually 
grown the problem from being exp(4,n) to exp(5,n) which is probably not what 
you wanted :-)

>
> thanks,
> wendy

Matthew
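
To put numbers on the growth described above, here is a small worked calculation in plain Java. It assumes 4 (or 5) atomic symbols per sequence, 3 sequences per site and n sites per state, as in Wendy's test program; the class name AlphabetSizeDemo is made up for this example. The last step is only a guess at the cause of the exception, not something confirmed in this thread: if the size of the cross-product alphabet is held in a 32-bit int somewhere, 4^21 = 2^42 truncates to 0, which would be consistent with "ArrayIndexOutOfBoundsException: 0".

```java
import java.math.BigInteger;

// Hypothetical demo class; the arithmetic is exact, the int-overflow link is a guess.
class AlphabetSizeDemo {

    // Number of atomic symbols in a state alphabet: (symbols^sequences)^sites.
    static BigInteger stateAlphabetSize(int symbols, int sequences, int sites) {
        return BigInteger.valueOf(symbols).pow(sequences).pow(sites);
    }

    public static void main(String[] args) {
        BigInteger five = stateAlphabetSize(4, 3, 5);   // 4^15 = 2^30
        BigInteger seven = stateAlphabetSize(4, 3, 7);  // 4^21 = 2^42
        BigInteger withN = stateAlphabetSize(5, 3, 7);  // 5^21

        System.out.println("5 sites, 4 symbols: " + five);
        System.out.println("7 sites, 4 symbols: " + seven);
        System.out.println("7 sites, 5 symbols: " + withN);
        System.out.println("Integer.MAX_VALUE:  " + Integer.MAX_VALUE);

        // intValue() keeps only the low 32 bits, like an overflowing int would.
        System.out.println("5 sites as int: " + five.intValue());
        System.out.println("7 sites as int: " + seven.intValue());
    }
}
```

With 5 sites the size still fits in an int (1073741824), while with 7 sites it is 4398046511104, far beyond Integer.MAX_VALUE. Adding N as a fifth real symbol raises the 7-site figure to 476837158203125.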
From mitchellw at gis.a-star.edu.sg  Tue Jan 24 07:12:59 2006
From: mitchellw at gis.a-star.edu.sg (Wayne Mitchell)
Date: Tue Jan 24 07:17:02 2006
Subject: [Biojava-l] Bioinformatics Programmer Position in Singapore
Message-ID: <BFFC3ACB.7805%mitchellw@gis.a-star.edu.sg>

The Research Computing Group at Genome Institute Singapore is recruiting a
bioinformatics programmer to work closely with institute Scientists
to architect and implement informatics solutions to genomic biology
problems. Current projects include sequence, proteomics, SNP and micro array
analysis pipelines, db implementations, and user interface design.

Candidate should have:
-- Demonstrated ability to translate real world problems  into actionable
    software solutions
-- Outgoing, client-centric personality able to manage relationships with
    scientist clients. Gifted introverts will not thrive in this position.
-- Experience in a complex, networked UNIX environment
-- Strong programming skillset, ideally in a bioscience setting
-- Team software development track record, preferably enterprise Java
-- DB and Data Warehouse design skills
-- Bioinformatics/ Biology domain expertise (academic degree or work
   experience) strongly preferred.


Minimum Education/ experience:
Bachelor's in Computer Science with 2+ years general programming experience,
or, 
1+ years bioinformatics programming experience;
or:
BS in bioscience, chemistry, physics, math or engineering with 4+ years
programming experience, or
2+ years bioinformatics programming experience.

CV to mitchellw@gis.a-star.edu.sg

. . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Dr. Wayne Mitchell, Ph.D.
Senior Scientist, 
Genome Institute of Singapore
+65 6478 8177 (vox)

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    All rivers flow into the sea
    because it is lower than they are.
    Humility gives it its power.

                                   Dao De Jing
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .



   The ocean of learning is unbound

                                   Zhuang Zhou
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    This email is confidential and may be privileged.
    If you are not the intended recipient: please
    delete it and notify us immediately; please do
    not copy or use it for any purpose, or disclose its
    contents to other persons. Thank you.


From guedes at unisul.br  Thu Jan 26 14:51:36 2006
From: guedes at unisul.br (Dickson S. Guedes)
Date: Thu Jan 26 15:00:56 2006
Subject: [Biojava-l] Alignment with GAPs
Message-ID: <43D92848.3060408@unisul.br>

Hello All,

Is it possible to align two sequences when one of them already contains
gaps?

I'm doing some tests with DP and the Viterbi algorithm, but without success.

Where can I learn about this?

Thank you people,

[]s
--
:: Dickson S. Guedes (guedes at unisul dot br)
::
:: UNISUL - Universidade do Sul de Santa Catarina
:: ATI - Assessoria de Tecnologia da Informação
:: (0xx48) 621-3200 - http://www.unisul.br
--
"There are 10 types of people in the world: those who understand
  binary, and those who don't"
From matthew.pocock at ncl.ac.uk  Thu Jan 26 15:33:45 2006
From: matthew.pocock at ncl.ac.uk (Matthew Pocock)
Date: Thu Jan 26 15:40:22 2006
Subject: [Biojava-l] Alignment with GAPs
In-Reply-To: <43D92848.3060408@unisul.br>
References: <43D92848.3060408@unisul.br>
Message-ID: <200601262033.45779.matthew.pocock@ncl.ac.uk>

If I understand you correctly, one of the two sequences you are aligning 
contains a gap before you align them? Or do you want to produce a pair-wise 
alignment from two un-aligned sequences and introduce gaps?

If it is the former, you want a state that emits a gap in one sequence and a 
symbol in the other, and also advances {1,1}. I think that is easy enough to 
set up, but can't remember the exact code. If the worst comes to the worst, 
you can construct the distribution over {gap,Protein} using the classes 
in .dist and then set up a SimpleState, providing the advance and alphabet in 
the constructor.
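
As a rough illustration of the first case, here is a minimal plain-Java
sketch. It deliberately does not use the BioJava DP classes; it swaps in
a simple Needleman-Wunsch style global alignment. A gap character that
is already present in an input is treated as an ordinary symbol, so the
diagonal move consumes one position from each input, which is the
analogue of a state that advances {1,1} and emits a gap opposite a
residue. The class name PreGappedAlign, the '-' gap character and all
scores are assumptions made for this example.

```java
public class PreGappedAlign {
    static final char GAP = '-';  // assumed gap character in the inputs
    static final int INDEL = -2;  // assumed cost of introducing a new gap

    /** Score for a column that consumes one position from each input. */
    static int pairScore(char a, char b) {
        if (a == GAP || b == GAP) {
            return -1;            // pre-existing gap opposite a residue (assumed)
        }
        return a == b ? 2 : -1;   // match / mismatch (assumed)
    }

    /** Global alignment of s and t; returns the two aligned rows. */
    static String[] align(String s, String t) {
        int n = s.length();
        int m = t.length();
        int[][] score = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) score[i][0] = i * INDEL;
        for (int j = 1; j <= m; j++) score[0][j] = j * INDEL;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                // Diagonal move: advance both inputs, even if one holds a gap.
                int diag = score[i - 1][j - 1]
                        + pairScore(s.charAt(i - 1), t.charAt(j - 1));
                int up = score[i - 1][j] + INDEL;    // new gap in t
                int left = score[i][j - 1] + INDEL;  // new gap in s
                score[i][j] = Math.max(diag, Math.max(up, left));
            }
        }
        // Trace back from the bottom-right corner, preferring the diagonal.
        StringBuilder a = new StringBuilder();
        StringBuilder b = new StringBuilder();
        int i = n;
        int j = m;
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0 && score[i][j] == score[i - 1][j - 1]
                    + pairScore(s.charAt(i - 1), t.charAt(j - 1))) {
                a.append(s.charAt(--i));
                b.append(t.charAt(--j));
            } else if (i > 0 && score[i][j] == score[i - 1][j] + INDEL) {
                a.append(s.charAt(--i));
                b.append(GAP);
            } else {
                a.append(GAP);
                b.append(t.charAt(--j));
            }
        }
        return new String[] { a.reverse().toString(), b.reverse().toString() };
    }
}
```

With these scores, PreGappedAlign.align("A-C", "AGC") keeps the existing
gap in place and returns the rows "A-C" and "AGC". In an HMM setting the
same idea would be expressed as an emission distribution rather than a
fixed score table.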

Matthew

On Thursday 26 January 2006 19:51, Dickson S. Guedes wrote:
> Hello All,
>
> Is it possible to align two sequences when one of them already contains
> gaps?
>
> I'm doing some tests with DP and the Viterbi algorithm, but without
> success.
>
> Where can I learn about this?
>
> Thank you people,
>
> []s
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l

From guedes at unisul.br  Thu Jan 26 15:49:14 2006
From: guedes at unisul.br (Dickson S. Guedes)
Date: Thu Jan 26 15:47:29 2006
Subject: [Biojava-l] Alignment with GAPs
In-Reply-To: <200601262033.45779.matthew.pocock@ncl.ac.uk>
References: <43D92848.3060408@unisul.br>
	<200601262033.45779.matthew.pocock@ncl.ac.uk>
Message-ID: <43D935CA.2030708@unisul.br>

Thanks Matthew,

To produce a pair-wise alignment from two un-aligned sequences and 
introduce gaps I have used the sample at "BioJava In Anger" and it runs 
successfully.

Now I need an alignment of two sequences where one of them already has 
gaps before I align.

Am I right that the DP class doesn't accept a GappedSequence?

[]s
Guedes

Matthew Pocock wrote:
> If I understand you correctly, one of the two sequences you are aligning 
> contains a gap before you align them? Or do you want to produce a pair-wise 
> alignment from two un-aligned sequences and introduce gaps?
> 
> If it is the former, you want a state that emits a gap in one sequence and a 
> symbol in the other, and also advances {1,1}. I think that is easy enough to 
> set up, but can't remember the exact code. If the worst comes to the worst, 
> you can construct the distribution over {gap,Protein} using the classes 
> in .dist and then set up a SimpleState, providing the advance and alphabet in 
> the constructor.
> 
> Matthew
-- 
--
:: Dickson S. Guedes (guedes at unisul dot br)
::
:: UNISUL - Universidade do Sul de Santa Catarina
:: ATI - Assessoria de Tecnologia da Informação
:: (0xx48) 621-3200 - http://www.unisul.br
--
"There are 10 types of people in the world: those who understand
  binary, and those who don't"
From mark.schreiber at novartis.com  Thu Jan 26 21:05:20 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Thu Jan 26 21:01:40 2006
Subject: [Biojava-l] Alignment with GAPs
Message-ID: <OF82FB27D4.B14F37E3-ON48257103.000B5120-48257103.000B79A5@EU.novartis.net>

Hi -

I think the DP class should accept a GappedSequence. To get the result you 
want you will probably need to have at least one match state that can emit 
gaps. I'm curious to know why you would want to do that kind of alignment 
though?

- Mark


"Dickson S. Guedes" <guedes@unisul.br>
Sent by: biojava-l-bounces@portal.open-bio.org
01/27/2006 04:49 AM

 
        To:     Matthew Pocock <matthew.pocock@ncl.ac.uk>, Biojava-l@biojava.org
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        Re: [Biojava-l] Alignment with GAPs


Thanks Matthew,

To produce a pair-wise alignment from two un-aligned sequences and 
introduce gaps I have used the sample at "BioJava In Anger" and it runs 
successfully.

Now I need an alignment of two sequences where one of them already has 
gaps before I align.

Am I right that the DP class doesn't accept a GappedSequence?

[]s
Guedes

Matthew Pocock wrote:
> If I understand you correctly, one of the two sequences you are aligning
> contains a gap before you align them? Or do you want to produce a pair-wise
> alignment from two un-aligned sequences and introduce gaps?
>
> If it is the former, you want a state that emits a gap in one sequence and a
> symbol in the other, and also advances {1,1}. I think that is easy enough to
> set up, but can't remember the exact code. If the worst comes to the worst,
> you can construct the distribution over {gap,Protein} using the classes
> in .dist and then set up a SimpleState, providing the advance and alphabet in
> the constructor.
>
> Matthew


From guedes at unisul.br  Fri Jan 27 11:09:07 2006
From: guedes at unisul.br (Dickson S. Guedes)
Date: Fri Jan 27 11:07:26 2006
Subject: [Biojava-l] Alignment with GAPs
In-Reply-To: <OF82FB27D4.B14F37E3-ON48257103.000B5120-48257103.000B79A5@EU.novartis.net>
References: <OF82FB27D4.B14F37E3-ON48257103.000B5120-48257103.000B79A5@EU.novartis.net>
Message-ID: <43DA45A3.4070407@unisul.br>

Hi Mark,

OK. I'll test it, thanks.

Curious? :) ... I'm testing some things about progressive alignment, 
because I haven't found how to do Multiple Sequence Alignments (MSA) 
using only BioJava. Am I wrong?

I ran some tests with strap but it's not what I need. Do you have any 
suggestions about MSA with BioJava?

Thanks all!

mark.schreiber@novartis.com wrote:
> Hi -
> 
> I think the DP class should accept a GappedSequence. To get the result you 
> want you will probably need to have at least one match state that can emit 
> gaps. I'm curious to know why you would want to do that kind of alignment 
> though?
> 
> - Mark
From toddri at eden.rutgers.edu  Tue Jan 31 16:33:13 2006
From: toddri at eden.rutgers.edu (Todd Riley)
Date: Tue Jan 31 16:53:44 2006
Subject: [Biojava-l] Help needed to add "Number of Bits" vertical and column
 number labeling to DistributionLogos
In-Reply-To: <43D935CA.2030708@unisul.br>
References: <43D92848.3060408@unisul.br>	<200601262033.45779.matthew.pocock@ncl.ac.uk>
	<43D935CA.2030708@unisul.br>
Message-ID: <43DFD799.7050803@eden.rutgers.edu>

Hello,

I would like to add a "2 Bits" vertical label (with a bracket) and 
column numbering to my DistributionLogos.  I have seen both in some 
graphics, but haven't been able to find the code in the demos or on the web.

Thanks,
Todd

From mark.schreiber at novartis.com  Tue Jan 31 20:08:41 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Tue Jan 31 20:04:56 2006
Subject: [Biojava-l] Alignment with GAPs
Message-ID: <OF7C21BE6C.E3764341-ON48257108.00063A65-48257108.000649F9@EU.novartis.net>

There is no MSA in BioJava. Dedicated tools such as CLUSTALW or T-Coffee 
are probably a much better choice.


"Dickson S. Guedes" <guedes@unisul.br>
01/28/2006 12:09 AM

 
        To:     Mark Schreiber/GP/Novartis@PH
        cc:     Biojava-l@biojava.org, Matthew Pocock <matthew.pocock@ncl.ac.uk>
        Subject:        Re: [Biojava-l] Alignment with GAPs


Hi Mark,

OK. I'll test it, thanks.

Curious? :) ... I'm testing some things about progressive alignment, 
because I haven't found how to do Multiple Sequence Alignments (MSA) 
using only BioJava. Am I wrong?

I ran some tests with strap but it's not what I need. Do you have any 
suggestions about MSA with BioJava?

Thanks all!

mark.schreiber@novartis.com wrote:
> Hi -
>
> I think the DP class should accept a GappedSequence. To get the result you
> want you will probably need to have at least one match state that can emit
> gaps. I'm curious to know why you would want to do that kind of alignment
> though?
>
> - Mark


From mark.schreiber at novartis.com  Tue Jan 31 20:12:36 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Tue Jan 31 20:08:45 2006
Subject: [Biojava-l] Help needed to add "Number of Bits" vertical and
	column number labeling to DistributionLogos
Message-ID: <OF51AB00D8.8DA75282-ON48257108.00065759-48257108.0006A5D4@EU.novartis.net>

Hi Todd,

The DistributionLogo class is not the best way to draw large logos with 
additional features such as labels.

A better way is to make a custom component, copy the drawing code from 
DistributionLogo and incorporate your own code for labels etc. This way 
you can also draw several positions in the logo into one component. 
Better still, make the code draw directly to a Graphics2D object, so 
that it can paint to a component or to a BufferedImage.
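
To make the Graphics2D idea concrete, here is a minimal, self-contained
sketch that uses only java.awt and no BioJava classes. It is not the
DistributionLogo drawing code: for brevity it stacks coloured bars where
a real logo would draw scaled letters. It does show the axis bracket, a
rotated "bits" label and column numbers, all painted by one method that
takes a Graphics2D. The class name LabelledLogo, the layout constants
and the colours are assumptions made for this example.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;

public class LabelledLogo {
    // Assumed colours, one per symbol index (reused cyclically).
    static final Color[] COLORS = { Color.GREEN, Color.BLUE, Color.ORANGE, Color.RED };

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** Information content of one column: log2(symbols) minus Shannon entropy. */
    static double informationBits(double[] p) {
        double entropy = 0.0;
        for (double x : p) {
            if (x > 0.0) entropy -= x * log2(x);
        }
        return log2(p.length) - entropy;
    }

    /** Paints all columns, the axis bracket and the labels onto g. */
    static void paint(Graphics2D g, double[][] columns, int width, int height) {
        int left = 40, top = 5, bottom = 20;   // assumed margins in pixels
        int plotH = height - top - bottom;
        int base = top + plotH;                // y coordinate of the baseline
        int colW = (width - left) / columns.length;
        double maxBits = log2(columns[0].length);

        g.setColor(Color.WHITE);
        g.fillRect(0, 0, width, height);
        g.setColor(Color.BLACK);

        // Vertical bracket spanning 0 .. maxBits.
        g.drawLine(left - 5, top, left - 5, base);
        g.drawLine(left - 5, top, left, top);
        g.drawLine(left - 5, base, left, base);

        // Rotated axis label, e.g. "2 bits" for a four-symbol alphabet.
        String label = Math.round(maxBits) + " bits";
        int labelW = g.getFontMetrics().stringWidth(label);
        AffineTransform saved = g.getTransform();
        g.rotate(-Math.PI / 2, 20, top + plotH / 2.0);
        g.drawString(label, 20 - labelW / 2, top + plotH / 2);
        g.setTransform(saved);

        for (int c = 0; c < columns.length; c++) {
            int x = left + c * colW;
            double bits = informationBits(columns[c]);
            int y = base;
            for (int s = 0; s < columns[c].length; s++) {
                // Bar height is the symbol's share of the column's information.
                int h = (int) Math.round(columns[c][s] * bits / maxBits * plotH);
                g.setColor(COLORS[s % COLORS.length]);
                g.fillRect(x + 1, y - h, colW - 2, h);
                y -= h;
            }
            g.setColor(Color.BLACK);
            g.drawString(String.valueOf(c + 1), x + colW / 2 - 3, height - 5);
        }
    }

    /** Renders off-screen; the same paint method could be called from a component. */
    static BufferedImage render(double[][] columns, int width, int height) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        try {
            paint(g, columns, width, height);
        } finally {
            g.dispose();
        }
        return img;
    }
}
```

Because paint only needs a Graphics2D, a Swing component could call it
from paintComponent after casting its Graphics argument, while render
gives an image that can be written to a file.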

- Mark


Todd Riley <toddri@eden.rutgers.edu>
Sent by: biojava-l-bounces@portal.open-bio.org
02/01/2006 05:33 AM

 
        To:     Biojava-l@biojava.org
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [Biojava-l] Help needed to add "Number of Bits" vertical and column number 
labeling to DistributionLogos


Hello,

I would like to add a "2 Bits" vertical label (with a bracket) and 
column numbering to my DistributionLogos.  I have seen both in some 
graphics, but haven't been able to find the code in the demos or on the 
web.

Thanks,
Todd



From toddri at eden.rutgers.edu  Tue Jan 31 21:57:44 2006
From: toddri at eden.rutgers.edu (Todd Riley)
Date: Wed Feb  1 22:06:22 2006
Subject: [Biojava-l] Help needed to add "Number of Bits" vertical and
	column number labeling to DistributionLogos
In-Reply-To: <OF51AB00D8.8DA75282-ON48257108.00065759-48257108.0006A5D4@EU.novartis.net>
References: <OF51AB00D8.8DA75282-ON48257108.00065759-48257108.0006A5D4@EU.novartis.net>
Message-ID: <43E023A8.7080201@eden.rutgers.edu>

An HTML attachment was scrubbed...
URL: http://portal.open-bio.org/pipermail/biojava-l/attachments/20060131/0814f9cb/attachment-0001.htm
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/jpeg
Size: 35167 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biojava-l/attachments/20060131/0814f9cb/attachment-0001.jpg
-------------- next part --------------
A non-text attachment was scrubbed...
Name: C:\DOCUME~1\Todd\LOCALS~1\Temp\msohtml1\01\clip_image002.jpg
Type: image/jpeg
Size: 17019 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biojava-l/attachments/20060131/0814f9cb/clip_image002-0001.jpg