From heatkent at gmail.com  Wed Mar  1 16:15:20 2006
From: heatkent at gmail.com (Heather Kent)
Date: Thu Mar  2 01:55:49 2006
Subject: [Biojava-l] problems with SubIntegerAlphabet
Message-ID: <de8b3c810603011315s21a0c52cwe539dfa38a1ea79f@mail.gmail.com>

Hi,

I'm currently having a problem with the IntegerAlphabet.SubIntegerAlphabet
class. When I call the seqString() method from AbstractSymbolList I get a
NoSuchElementException, "parser not supported by Integer Alphabet yet",
thrown from the getTokenization method of my SubIntegerAlphabet class.

The call from seqString to getTokenization passes "default" as the name.
The getTokenization method of the IntegerAlphabet class accepts both
"token" and "default", but the SubIntegerAlphabet class accepts only
"token".

Can anyone help me find a way around this when I'm working with
SubIntegerAlphabets?

Thanks,

Heather

From mark.schreiber at novartis.com  Thu Mar  2 04:09:08 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Thu Mar  2 04:04:33 2006
Subject: [Biojava-l] problems with SubIntegerAlphabet
Message-ID: <OF99054EBB.58D8276E-ON48257125.0031F283-48257125.0032469C@EU.novartis.net>

Hi -

The code in the seqString() method gets the "default" tokenizer from
the parent Alphabet (SubIntegerAlphabet in this case) and asks it to
tokenize the SymbolList. It is a bug that SubIntegerAlphabet doesn't have
a "default". However, if you use the code from SimpleSymbolList's
seqString() method as an example, you can do the equivalent operation
"manually" as a workaround, using "token" instead of "default".
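
A minimal sketch of the fallback idea, written in plain Java so that it runs
without BioJava on the classpath. The NamedTokenizers interface, the
SUB_INTEGER_LIKE instance and the space-separated output are hypothetical
stand-ins invented for illustration. They only mimic the reported behaviour
(the name "token" works, the name "default" throws) and are not BioJava
classes.

```java
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/** Sketch only: these types are hypothetical stand-ins, not BioJava classes. */
class TokenizationFallbackSketch {

    /** Stand-in for an alphabet that hands out tokenizers by name. */
    interface NamedTokenizers {
        /** Returns a list-to-string converter, or throws NoSuchElementException. */
        Function<List<Integer>, String> getTokenization(String name);
    }

    /** Mimics the reported behaviour: only the name "token" is accepted. */
    static final NamedTokenizers SUB_INTEGER_LIKE = name -> {
        if (!"token".equals(name)) {
            throw new NoSuchElementException("parser not supported: " + name);
        }
        return symbols -> {
            // Assumed output format: integers separated by single spaces.
            StringBuilder sb = new StringBuilder();
            for (int s : symbols) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(s);
            }
            return sb.toString();
        };
    };

    /** Asks for "default" first and falls back to "token" if that name is rejected. */
    static String seqStringWithFallback(NamedTokenizers alphabet, List<Integer> symbols) {
        Function<List<Integer>, String> tokenizer;
        try {
            tokenizer = alphabet.getTokenization("default");
        } catch (NoSuchElementException e) {
            tokenizer = alphabet.getTokenization("token");
        }
        return tokenizer.apply(symbols);
    }
}
```

With BioJava itself, the same idea would be to ask the SymbolList's alphabet
for getTokenization("token") and have the result tokenize the list, which is
what seqString() does with "default".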

Let me know if you have problems...

I will also fix this bug in CVS shortly.

- Mark

From mark.schreiber at novartis.com  Thu Mar  2 06:48:05 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Thu Mar  2 06:43:54 2006
Subject: [Biojava-l] problems with SubIntegerAlphabet
Message-ID: <OF3E02B6CC.BE3E716D-ON48257125.0040BD75-48257125.0040D3F8@EU.novartis.net>

Fixed in CVS now

- Mark

From admin at unleashedinformatics.com  Thu Mar  2 18:14:22 2006
From: admin at unleashedinformatics.com (Unleashed Informatics Administration)
Date: Thu Mar  2 18:09:28 2006
Subject: [Biojava-l] SeqHound User Support
Message-ID: <44077C4E.9010500@unleashedinformatics.com>

As announced on 2 March 2006, SeqHound has been replaced by "DogBox
Online".

 From 3 April 2006, users of the SeqHound API will be required to
provide their e-mail address when beginning a block of SeqHound (now
DogBox Online) calls.

Users who fail to provide a valid address will not have access to the
API, and will not have access to user support.

The following FAQ has been posted to the DogBox Online website.

FAQ:

Q. What is DogBox Online?

DogBox Online is a powerful integrated data service for the life science
community, and represents the new name for the SeqHound service offered
by The Blueprint Initiative. The new service is located at:

http://dogboxonline.unleashedinformatics.com

Q. What happened to SeqHound?

The SeqHound service is being phased out. Please change to DogBox Online.

Q. How will DogBox Online differ?

DogBox Online includes several new
features previously available only to DogBox customers.

Q. Will my use of the SeqHound API be affected?

Yes. On 3 April 2006, Unleashed Informatics will require SeqHound users
to submit a valid e-mail address as part of the initial SHoundInit call.
If you were not using a SHoundInit call or similar in your code, ensure
that you now begin your scripts with this call.

Q. Can you give me an example?

Using the Perl API, you need to begin your series of SeqHound queries
like so:

> use SeqHound;
> SHoundInit('Program Name');

Replace the 'Program Name' text with your valid e-mail address:

> use SeqHound;
> SHoundInit('joe.bloggs@blogme.com');

To avoid disappointment, we recommend you change your scripts now to
employ the new URL, http://dogboxonline.unleashedinformatics.com.

Q. What happens if I don't provide a valid email address?

Your use of the API will be blocked until you do so.

Q. Why is this change happening?

1. To obtain feedback from users regarding API use and improvement.

2. To notify users of future developments and new features.

Q. Where do I sign up?

https://secure.unleashedinformatics.com/index.php?pg=support.register

Thank you for your co-operation.


From admin at unleashedinformatics.com  Thu Mar  2 18:09:05 2006
From: admin at unleashedinformatics.com (Unleashed Informatics Administration)
Date: Thu Mar  2 20:04:09 2006
Subject: [Biojava-l] Unleashed Informatics Supports DogBox Online Community
Message-ID: <44077B11.7090608@unleashedinformatics.com>

In December 2005, Unleashed Informatics acquired commercial rights to 
Blueprint Initiative intellectual property from Mount Sinai Hospital.

Spun-off from The Blueprint Initiative public research program at 
Toronto's Mount Sinai Hospital, Unleashed Informatics provides 
integrated hardware and software products designed to harness the power 
of increasingly complex scientific data.

On 22 February, Unleashed Informatics released DogBox Online as an open 
access product to the life science community.

DogBox Online is an integrated, online data retrieval service and 
represents the new, re-named SeqHound service previously offered by the 
Blueprint Initiative.

The new service is located at 
http://dogboxonline.unleashedinformatics.com, and requires a free 
Unleashed Informatics account for unrestricted access.

The DogBox Online registration process will help Unleashed Informatics 
better understand the resource user base, and ultimately help us improve 
our open access offerings in line with the needs of the life sciences 
community.

Importantly, the collection of such user feedback is essential for the 
preparation of planned public good research grant applications aimed at 
funding the ongoing provision of open source and freely available 
bioinformatics resources.

Unleashed Informatics is making a concerted effort to develop, maintain 
and improve open access resources for global researchers. The release 
this past week of the freely accessible DogBox Online reaffirms the 
company's commitment to open access resources.

Specific support documentation for the new DogBox Online service can be 
found in the Help section.

From wendy.wong at gmail.com  Fri Mar  3 09:28:18 2006
From: wendy.wong at gmail.com (wendy wong)
Date: Fri Mar  3 22:35:43 2006
Subject: [Biojava-l] odds ratio
Message-ID: <e554425b0603030628i1440c9b1j8a25912da6b09a4d@mail.gmail.com>

Hi,

I am trying to set up an HMM with a few states. I have a background
state with a background distribution. For the rest of the states, I set
up the distribution and then use setNullModel to set the null
distribution to the background distribution. Am I doing the right
thing?

The reason I am asking is that when I tried using the forward-backward
algorithm, the scores of the state that I am interested in are greater
than 2000 at each site. I would expect some sites to have a number
less than 1 to indicate that they are more likely to be in the null
distribution, or am I doing something totally wrong here?
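
For reference, a small sketch of the odds ratio this expectation is based
on: the probability of a symbol under the state's distribution divided by
its probability under the null distribution. The class and method names are
made up for illustration, and whether forwardsBackwards reports per-site
ratios in exactly this form is an assumption that this thread does not
settle.

```java
/** Illustration only; not part of BioJava. */
class OddsRatioSketch {

    /** Odds ratio for one symbol: P(symbol | state) / P(symbol | null model). */
    static double odds(double pState, double pNull) {
        if (pNull <= 0.0) {
            throw new IllegalArgumentException("null-model probability must be positive");
        }
        return pState / pNull;
    }

    /** True when the state explains the symbol better than the null model does. */
    static boolean favoursState(double pState, double pNull) {
        return odds(pState, pNull) > 1.0;
    }
}
```

For example, odds(0.5, 0.25) is 2.0, which favours the state, while
odds(0.1, 0.25) is below 1, which favours the null model.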

thanks,
wendy

From mark.schreiber at novartis.com  Sun Mar  5 20:24:00 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Sun Mar  5 20:19:05 2006
Subject: [Biojava-l] odds ratio
Message-ID: <OF310E3CED.B81B2EF3-ON48257129.00076B7B-48257129.0007B0D7@EU.novartis.net>

>Hi,
>
>I am trying to set up an HMM with a few states. I have a background
>state with a background distribution. For the rest of the states, I set
>up the distribution and then use setNullModel to set the null
>distribution to the background distribution. Am I doing the right
>thing?

Sounds correct.

>The reason I am asking is that when I tried using the forward-backward
>algorithm, the scores of the state that I am interested in are greater
>than 2000 at each site. I would expect some sites to have a number
>less than 1 to indicate that they are more likely to be in the null
>distribution, or am I doing something totally wrong here?

If you used an odds scoring function then you have done things correctly.
Sounds weird.

- Mark

From wendy.wong at gmail.com  Mon Mar  6 06:16:46 2006
From: wendy.wong at gmail.com (wendy wong)
Date: Mon Mar  6 06:19:04 2006
Subject: [Biojava-l] odds ratio
In-Reply-To: <OF310E3CED.B81B2EF3-ON48257129.00076B7B-48257129.0007B0D7@EU.novartis.net>
References: <OF310E3CED.B81B2EF3-ON48257129.00076B7B-48257129.0007B0D7@EU.novartis.net>
Message-ID: <e554425b0603060316j715cc757o5d17ac482a801ff7@mail.gmail.com>

Hi, here is what I have done:

First, I set up the distribution for a state, then call
siteDistribution.setNullModel(nullDistribution);
to set the null distribution.

Then I use this function to get the odds ratio of my C3 state (the one I
am interested in). Maybe I can try getting the odds ratio for the
background state and see what numbers I get?

public static void getC3PosteriorOdds(MarkovModel model, SymbolList obsSymList)
        throws IllegalArgumentException, BioException {
    DP dp = DPFactory.DEFAULT.createDP(model);
    SymbolList[] obsArray = { obsSymList };
    DPMatrix dpMatrix = dp.forwardsBackwards(obsArray, ScoreType.ODDS);
    State[] states = dp.getStates();

    // Find the index of the C.3 state among the states before the dot states.
    int c3Index = -1;
    for (int s = 0; s < dp.getDotStatesIndex(); s++) {
        if (states[s].getName().equalsIgnoreCase("C.3")) {
            c3Index = s;
            break;
        }
    }
    // Fail clearly instead of indexing the matrix with -1.
    if (c3Index < 0) {
        throw new BioException("No state named C.3 in the model");
    }

    // Log the matrix cell for the C.3 state at each index (label is 1-based).
    for (int i = 0; i < obsSymList.length(); i++) {
        int[] cell = { c3Index, i };
        log.debug((i + 1) + " " + dpMatrix.getCell(cell));
    }
}


thanks,
wendy


From jolyon.holdstock at ogt.co.uk  Wed Mar  8 05:47:07 2006
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Wed Mar  8 07:11:05 2006
Subject: [Biojava-l] BiojavaX EmblFormat
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3FA7AEC7@EUCLID.internal.ogtip.com>

Hi,

 
I am using the new format parsers in BioJavaX. GenbankFormat is great,
but I am having some trouble with the EMBLFormat class. I have
downloaded a sequence file (ID: U00096) from the EBI in EMBL format, but I
don't believe it is being parsed properly.

 
My code is as follows:

String fileName = "path to file";
try {
  RichSequenceIterator rsi = RichSequence.IOTools.readEMBLDNA(
      new BufferedReader(new FileReader(fileName)), null);
  while (rsi.hasNext()) {
    RichSequence seq = rsi.nextRichSequence();
    System.out.println(seq.getURN());
    System.out.println(seq.length());
    System.out.println(seq.getAccession());
  }
}
catch (IOException IOE) {
  System.out.println("BioJava IOException " + IOE);
}
catch (BioException BIOE) {
  System.out.println("BioJavaX BioException " + BIOE);
  BIOE.printStackTrace();
}

 
The BioJava parser will read it:

seq = SeqIOTools.readEmbl(
    new BufferedReader(new FileReader(fileName))).nextSequence(); // works

 
I checked the web CVS and the EMBLFormat class is 3 months old, so I am
using the most recent version.

I have pasted a snippet of the sequence file that reproduces the problems
below.

 
The errors are:

The ID line isn't parsed because 'genomic' is present; deleting it
removes the problem.

org.biojava.bio.BioException: Could not read sequence
Caused by: org.biojava.bio.seq.io.ParseException:
    Bad ID line found: U00096     standard; circular genomic DNA; PRO; 4639675 BP.

ID   U00096     standard; circular genomic DNA; PRO; 4639675 BP. //fails
ID   U00096     standard; circular DNA; PRO; 4639675 BP. //works
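
One way a parser could accept both forms of the ID line is to treat the
molecule type as free text up to the next semicolon. The sketch below is an
assumption about how that might look, using only java.util.regex. It is not
the EMBLFormat implementation, and the class name, the order of the returned
fields and the "linear" default are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch only: a tolerant ID-line parser, not the BioJavaX implementation. */
class EmblIdLineSketch {

    // Groups: 1 entry name, 2 data class, 3 optional "circular",
    // 4 molecule type (may contain spaces), 5 division, 6 length in base pairs.
    private static final Pattern ID_LINE = Pattern.compile(
        "^ID\\s+(\\S+)\\s+(\\S+);\\s+(circular\\s+)?([^;]+);\\s+(\\S+);\\s+(\\d+)\\s+BP\\.\\s*$");

    /** Returns {name, dataClass, topology, molecule, division, length}. */
    static String[] parse(String line) {
        Matcher m = ID_LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Bad ID line found: " + line);
        }
        // Assumption: a line without "circular" describes a linear molecule.
        String topology = (m.group(3) != null) ? "circular" : "linear";
        return new String[] {
            m.group(1), m.group(2), topology, m.group(4).trim(), m.group(5), m.group(6)
        };
    }
}
```

Both ID lines quoted above match this pattern; the molecule field comes back
as "genomic DNA" for the first and "DNA" for the second.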

 
There is a problem with the RX tag, which fails with this output:

org.biojava.bio.BioException: Could not read sequence
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
      at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:352)

Replacing

RX   DOI; 10.1126/science.277.5331.1453.

with

XX   RX   DOI; 10.1126/science.277.5331.1453.

removes the error.
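
The stack trace does not show why index 1 is missing, so the cause is not
established here. As a hedged illustration only, the sketch below shows a
defensive way to split the text after the RX tag into a database name and an
identifier while keeping the dots inside a DOI. The class and method names
are hypothetical and this is not the code in EMBLFormat.java.

```java
/** Sketch only: defensive parsing of an RX cross-reference value. */
class EmblRxLineSketch {

    /** Takes the text after the "RX" tag and returns {database, identifier}. */
    static String[] parse(String value) {
        int semi = value.indexOf(';');
        if (semi < 0) {
            throw new IllegalArgumentException("Bad RX line found: " + value);
        }
        String database = value.substring(0, semi).trim();
        String id = value.substring(semi + 1).trim();
        // Drop only the single terminating full stop; dots inside a DOI are kept.
        if (id.endsWith(".")) {
            id = id.substring(0, id.length() - 1);
        }
        return new String[] { database, id };
    }
}
```

Called with "DOI; 10.1126/science.277.5331.1453." this returns the pair
"DOI" and "10.1126/science.277.5331.1453".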

 
There is an error parsing the authors:

org.biojava.bio.BioException: Could not read sequence
Caused by: java.lang.IllegalArgumentException: Authors string cannot be null
      at org.biojavax.DocRefAuthor$Tools.parseAuthorString(DocRefAuthor.java:75)
      at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:395)
      at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:100)

 
I am looking at the code to see where the problems are, but suspect
that it may be beyond me.

So if anybody has some experience of this I would welcome their input.

Thanks,

Jolyon

This is a snippet of the sequence file that reproduces the errors for me.

 
ID   U00096     standard; circular genomic DNA; PRO; 4639675 BP.
XX
AC   U00096; AE000111-AE000510;
XX
SV   U00096.2
XX
DT   23-FEB-2006 (Rel. 86, Created)
DT   06-MAR-2006 (Rel. 87, Last updated, Version 3)
XX
DE   Escherichia coli K-12 MG1655, complete genome.
XX
KW   .
XX
OS   Escherichia coli K12
OC   Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;
OC   Enterobacteriaceae; Escherichia.
XX
RN   [1]
RP   1-4639675
RX   DOI; 10.1126/science.277.5331.1453.
RX   PUBMED; 9278503.
RA   Blattner F.R., Plunkett G., Bloch C.A., Perna N.T., Burland V., Riley M.,
RA   Collado-Vides J., Glasner J.D., Rode C.K., Mayhew G.F., Gregor J.,
RA   Davis N.W., Kirkpatrick H.A., Goeden M.A., Rose D.J., Mau B., Shao Y.;
RT   "The complete genome sequence of Escherichia coli K-12";
RL   Science 277(5331):1453-1474(1997).
XX
RN   [2]
RP   1-4639675
RX   DOI; 10.1093/nar/gkj150.
RX   PUBMED; 16397293.
RA   Riley M., Abe T., Arnaud M.B., Berlyn M.K., Blattner F.R., Chaudhuri R.R.,
RA   Glasner J.D., Horiuchi T., Keseler I.M., Kosuge T., Mori H., Perna N.T.,
RA   Plunkett G. III, Rudd K.E., Serres M.H., Thomas G.H., Thomson N.R.,
RA   Wishart D., Wanner B.L.;
RT   "Escherichia coli K-12: a cooperatively developed annotation
RT   snapshot--2005";
RL   (er) Nucleic Acids Res. 34 (1), 1-9 (2006)
XX
RN   [3]
RC   Woods Hole, Mass., on 14-18 November 2003 (sequence corrections)
RP   1-4639675
RA   Arnaud M., Berlyn M.K.B., Blattner F.R., Galperin M.Y., Glasner J.D.,
RA   Horiuchi T., Kosuge T., Mori H., Perna N.T., Plunkett G. III, Riley M.,
RA   Rudd K.E., Serres M.H., Thomas G.H., Wanner B.L.;
RT   "Workshop on Annotation of Escherichia coli K-12";
RL   Unpublished.
XX
RN   [4]
RC   ASAP download 10 June 2004 (annotation updates)
RP   1-4639675
RA   Glasner J.D., Perna N.T., Plunkett G. III, Anderson B.D., Bockhorst J.,
RA   Hu J.C., Riley M., Rudd K.E., Serres M.H.;
RT   "ASAP: Escherichia coli K-12 strain MG1655 version m56";
RL   Unpublished.
XX
RN   [5]
RC   GenBank accessions AG613214 to AG613378 (sequence corrections)
RP   1-4639675
RA   Hayashi K., Morooka N., Mori H., Horiuchi T.;
RT   "A more accurate sequence comparison between genomes of Escherichia coli
RT   K12 W3110 and MG1655 strains";
RL   Unpublished.
XX
RN   [6]
RC   GenBank accession AY605712 (sequence corrections)
RP   1-4639675
RA   Perna N.T.;
RT   "Escherichia coli K-12 MG1655 yqiK-rfaE intergenic region, genomic sequence
RT   correction";
RL   Unpublished.
XX
RN   [7]
RP   1-4639675
RA   Rudd K.E.;
RT   "A manual approach to accurate translation start site annotation: an E.
RT   coli K-12 case study";
RL   Unpublished.
XX
RN   [8]
RP   1-4639675
RA   Blattner F.R., Plunkett G. III.;
RT   ;
RL   Submitted (16-JAN-1997) to the EMBL/GenBank/DDBJ databases.
RL   Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison,
RL   WI 53706-1580, USA
XX
RN   [9]
RP   1-4639675
RA   Blattner F.R., Plunkett G. III.;
RT   ;
RL   Submitted (02-SEP-1997) to the EMBL/GenBank/DDBJ databases.
RL   Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison,
RL   WI 53706-1580, USA
XX
RN   [10]
RP   1-4639675
RA   Plunkett G. III.;
RT   ;
RL   Submitted (13-OCT-1998) to the EMBL/GenBank/DDBJ databases.
RL   Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison,
RL   WI 53706-1580, USA
XX
RN   [11]
RC   Sequence update by submitter
RP   1-4639675
RA   Plunkett G. III.;
RT   ;
RL   Submitted (10-JUN-2004) to the EMBL/GenBank/DDBJ databases.
RL   Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison,
RL   WI 53706-1580, USA
XX
RN   [12]
RC   Protein updates by submitter
RP   1-4639675
RA   Plunkett G. III.;
RT   ;
RL   Submitted (07-FEB-2006) to the EMBL/GenBank/DDBJ databases.
RL   Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison,
RL   WI 53706-1580, USA
XX
DR   EMBL-TPA; BR000242.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..4639675
FT                   /organism="Escherichia coli K12"
FT                   /strain="K-12"
FT                   /sub_strain="MG1655"
FT                   /mol_type="genomic DNA"
FT                   /db_xref="taxon:83333"
FT   gene            190..255
FT                   /gene="thrL"
FT                   /locus_tag="b0001"
FT                   /note="synonyms: ECK0001, JW4367"
FT   CDS             190..255
FT                   /codon_start=1
FT                   /transl_table=11
FT                   /gene="thrL"
FT                   /locus_tag="b0001"
FT                   /product="thr operon leader peptide"
FT                   /function="1.5.1.8 metabolism; building block biosynthesis;
FT                   amino acids; threonine"
FT                   /function="leader; Amino acid biosynthesis: Threonine"
FT                   /note="go_process: threonine biosynthesis [goid 0009088]"
FT                   /protein_id="AAC73112.1"
FT                   /translation="MKRISTTITTTITITTGNGAG"
FT   gene            337..2799
FT                   /gene="thrA"
FT                   /locus_tag="b0002"
FT                   /note="synonyms: Hs, thrD, ECK0002, JW0001"
FT   CDS             337..2799
FT                   /codon_start=1
FT                   /transl_table=11
FT                   /gene="thrA"
FT                   /locus_tag="b0002"
FT                   /product="fused aspartokinase I and homoserine
FT                   dehydrogenase I"
FT                   /function="1.5.1.21 metabolism; building block
FT                   biosynthesis; amino acids; homoserine"
FT                   /function="1.5.1.8 metabolism; building block biosynthesis;
FT                   amino acids; threonine"
FT                   /function="7.1 location of gene products; cytoplasm"
FT                   /function="enzyme; Amino acid biosynthesis: Threonine"
FT                   /EC_number="1.1.1.3"
FT                   /EC_number="2.7.2.4"
FT                   /note="bifunctional: aspartokinase I (N-terminal);
FT                   homoserine dehydrogenase I (C-terminal); go_component:
FT                   cytoplasm [goid 0005737]; go_process: threonine
FT                   biosynthesis [goid 0009088]; go_process: homoserine
FT                   biosynthesis [goid 0009090]"
FT                   /protein_id="AAC73113.1"
FT                   /translation="MRVLKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITN
FT                   HLVAMIEKTISGQDALPNISDAERIFAELLTGLAAAQPGFPLAQLKTFVDQEFAQIKHV
FT                   LHGISLLGQCPDSINAALICRGEKMSIAIMAGVLEARGHNVTVIDPVEKLLAVGHYLES
FT                   TVDIAESTRRIAASRIPADHMVLMAGFTAGNEKGELVVLGRNGSDYSAAVLAACLRADC
FT                   CEIWTDVDGVYTCDPRQVPDARLLKSMSYQEAMELSYFGAKVLHPRTITPIAQFQIPCL
FT                   IKNTGNPQAPGTLIGASRDEDELPVKGISNLNNMAMFSVSGPGMKGMVGMAARVFAAMS
FT                   RARISVVLITQSSSEYSISFCVPQSDCVRAERAMQEEFYLELKEGLLEPLAVTERLAII
FT                   SVVGDGMRTLRGISAKFFAALARANINIVAIAQGSSERSISVVVNNDDATTGVRVTHQM
FT                   LFNTDQVIEVFVIGVGGVGGALLEQLKRQQSWLKNKHIDLRVCGVANSKALLTNVHGLN
FT                   LENWQEELAQAKEPFNLGRLIRLVKEYHLLNPVIVDCTSSQAVADQYADFLREGFHVVT
FT                   PNKKANTSSMDYYHQLRYAAEKSRRKFLYDTNVGAGLPVIENLQNLLNAGDELMKFSGI
FT                   LSGSLSYIFGKLDEGMSFSEATTLAREMGYTEPDPRDDLSGMDVARKLLILARETGREL
FT                   ELADIEIEPVLPAEFNAEGDVAAFMANLSQLDDLFAARVAKARDEGKVLRYVGNIDEDG
FT                   VCRVKIAEVDGNDPLFKVKNGENALAFYSHYYQPLPLVLRGYGAGNDVTAAGVFADLLR
FT                   TLSWKLGV"
FT   gene            2801..3733
FT                   /gene="thrB"
FT                   /locus_tag="b0003"
FT                   /note="synonyms: ECK0003, JW0002"
FT   CDS             2801..3733
FT                   /codon_start=1
FT                   /transl_table=11
FT                   /gene="thrB"
FT                   /locus_tag="b0003"
FT                   /product="homoserine kinase"
FT                   /function="1.5.1.8 metabolism; building block biosynthesis;
FT                   amino acids; threonine"
FT                   /function="7.1 location of gene products; cytoplasm"
FT                   /function="enzyme; Amino acid biosynthesis: Threonine"
FT                   /EC_number="2.7.1.39"
FT                   /note="go_component: cytoplasm [goid 0005737]; go_process:
FT                   threonine biosynthesis [goid 0009088]"
FT                   /protein_id="AAC73114.1"
FT                   /translation="MVKVYAPASSANMSVGFDVLGAAVTPVDGALLGDVVTVEAAETFS
FT                   LNNLGRFADKLPSEPRENIVYQCWERFCQELGKQIPVAMTLEKNMPIGSGLGSSACSVV
FT                   AALMAMNEHCGKPLNDTRLLALMGELEGRISGSIHYDNVAPCFLGGMQLMIEENDIISQ
FT                   QVPGFDEWLWVLAYPGIKVSTAEARAILPAQYRRQDCIAHGRHLAGFIHACYSRQPELA
FT                   AKLMKDVIAEPYRERLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDKPETAQRVAD
FT                   WLGKNYLQNQEGFVHICRLDTAGARVLEN"
FT   gene            3734..5020
FT                   /gene="thrC"
FT                   /locus_tag="b0004"
FT                   /note="synonyms: ECK0004, JW0003"
FT   CDS             3734..5020
FT                   /codon_start=1
FT                   /transl_table=11
FT                   /gene="thrC"
FT                   /locus_tag="b0004"
FT                   /product="threonine synthase"
FT                   /function="1.5.1.8 metabolism; building block biosynthesis;
FT                   amino acids; threonine"
FT                   /function="7.1 location of gene products; cytoplasm"
FT                   /function="enzyme; Amino acid biosynthesis: Threonine"
FT                   /EC_number="4.2.3.1"
FT                   /note="go_component: cytoplasm [goid 0005737]; go_process:
FT                   threonine biosynthesis [goid 0009088]"
FT                   /protein_id="AAC73115.1"
FT                   /translation="MKLYNLKDHNEQVSFAQAVTQGLGKNQGLFFPHDLPEFSLTEIDE
FT                   MLKLDFVTRSAKILSAFIGDEIPQEILEERVRAAFAFPAPVANVESDVGCLELFHGPTL
FT                   AFKDFGGRFMAQMLTHIAGDKPVTILTATSGDTGAAVAHAFYGLPNVKVVILYPRGKIS
FT                   PLQEKLFCTLGGNIETVAIDGDFDACQALVKQAFDDEELKVALGLNSANSINISRLLAQ
FT                   ICYYFEAVAQLPQETRNQLVVSVPSGNFGDLTAGLLAKSLGLPVKRFIAATNVNDTVPR
FT                   FLHDGQWSPKATQATLSNAMDVSQPNNWPRVEELFRRKIWQLKELGYAAVDDETTQQTM
FT                   RELKELGYTSEPHAAVAYRALRDQLNPGEYGLFLGTAHPAKFKESVEAILGETLDLPKE
FT                   LAERADLPLLSHNLPADFAALRKLMMNHQ"
XX
SQ   Sequence 4639675 BP; 1142228 A; 1179554 C; 1176923 G; 1140970 T; 0 other;

     agcttttcat tctgactgca acgggcaata tgtctctgtg tggattaaaa aaagagtgtc
60

     tgatagcagc ttctgaactg gttacctgcc gtgagtaaat taaaatttta ttgacttagg
120

     tcactaaata ctttaaccaa tataggcata gcgcacagac agataaaaat tacagagtac
180

     acaacatcca tgaaacgcat tagcaccacc attaccacca ccatcaccat taccacaggt
240

     aacggtgcgg gctgacgcgt acaggaaaca cagaaaaaag cccgcacctg acagtgcggg
300

     cttttttttt cgaccaaagg taacgaggta acaaccatgc gagtgttgaa gttcggcggt
360

     acatcagtgg caaatgcaga acgttttctg cgtgttgccg atattctgga aagcaatgcc
420

     aggcaggggc aggtggccac cgtcctctct gcccccgcca aaatcaccaa ccacctggtg
480

     gcgatgattg aaaaaaccat tagcggccag gatgctttac ccaatatcag cgatgccgaa
540

     cgtatttttg ccgaactttt gacgggactc gccgccgccc agccggggtt cccgctggcg
600

     caattgaaaa ctttcgtcga tcaggaattt gcccaaataa aacatgtcct gcatggcatt
660

     agtttgttgg ggcagtgccc ggatagcatc aacgctgcgc tgatttgccg tggcgagaaa
720

     atgtcgatcg ccattatggc cggcgtatta gaagcgcgcg gtcacaacgt tactgttatc
780

     gatccggtcg aaaaactgct ggcagtgggg cattacctcg aatctaccgt cgatattgct
840

     gagtccaccc gccgtattgc ggcaagccgc attccggctg atcacatggt gctgatggca
900

     ggtttcaccg ccggtaatga aaaaggcgaa ctggtggtgc ttggacgcaa cggttccgac
960

     tactctgctg cggtgctggc tgcctgttta cgcgccgatt gttgcgagat ttggacggac
1020

     gttgacgggg tctatacctg cgacccgcgt caggtgcccg atgcgaggtt gttgaagtcg
1080

     atgtcctacc aggaagcgat ggagctttcc tacttcggcg ctaaagttct tcacccccgc
1140

     accattaccc ccatcgccca gttccagatc ccttgcctga ttaaaaatac cggaaatcct
1200

     caagcaccag gtacgctcat tggtgccagc cgtgatgaag acgaattacc ggtcaagggc
1260

     atttccaatc tgaataacat ggcaatgttc agcgtttctg gtccggggat gaaagggatg
1320

     gtcggcatgg cggcgcgcgt ctttgcagcg atgtcacgcg cccgtatttc cgtggtgctg
1380

     attacgcaat catcttccga atacagcatc agtttctgcg ttccacaaag cgactgtgtg
1440

     cgagctgaac gggcaatgca ggaagagttc tacctggaac tgaaagaagg cttactggag
1500

     ccgctggcag tgacggaacg gctggccatt atctcggtgg taggtgatgg tatgcgcacc
1560

     ttgcgtggga tctcggcgaa attctttgcc gcactggccc gcgccaatat caacattgtc
1620

     gccattgctc agggatcttc tgaacgctca atctctgtcg tggtaaataa cgatgatgcg
1680

     accactggcg tgcgcgttac tcatcagatg ctgttcaata ccgatcaggt tatcgaagtg
1740

     tttgtgattg gcgtcggtgg cgttggcggt gcgctgctgg agcaactgaa gcgtcagcaa
1800

     agctggctga agaataaaca tatcgactta cgtgtctgcg gtgttgccaa ctcgaaggct
1860

     ctgctcacca atgtacatgg ccttaatctg gaaaactggc aggaagaact ggcgcaagcc
1920

     aaagagccgt ttaatctcgg gcgcttaatt cgcctcgtga aagaatatca tctgctgaac
1980

     ccggtcattg ttgactgcac ttccagccag gcagtggcgg atcaatatgc cgacttcctg
2040

     cgcgaaggtt tccacgttgt cacgccgaac aaaaaggcca acacctcgtc gatggattac
2100

     taccatcagt tgcgttatgc ggcggaaaaa tcgcggcgta aattcctcta tgacaccaac
2160

     gttggggctg gattaccggt tattgagaac ctgcaaaatc tgctcaatgc aggtgatgaa
2220

     ttgatgaagt tctccggcat tctttctggt tcgctttctt atatcttcgg caagttagac
2280

     gaaggcatga gtttctccga ggcgaccacg ctggcgcggg aaatgggtta taccgaaccg
2340

     gacccgcgag atgatctttc tggtatggat gtggcgcgta aactattgat tctcgctcgt
2400

     gaaacgggac gtgaactgga gctggcggat attgaaattg aacctgtgct gcccgcagag
2460

     tttaacgccg agggtgatgt tgccgctttt atggcgaatc tgtcacaact cgacgatctc
2520

     tttgccgcgc gcgtggcgaa ggcccgtgat gaaggaaaag ttttgcgcta tgttggcaat
2580

     attgatgaag atggcgtctg ccgcgtgaag attgccgaag tggatggtaa tgatccgctg
2640

     ttcaaagtga aaaatggcga aaacgccctg gccttctata gccactatta tcagccgctg
2700

     ccgttggtac tgcgcggata tggtgcgggc aatgacgtta cagctgccgg tgtctttgct
2760

     gatctgctac gtaccctctc atggaagtta ggagtctgac atggttaaag tttatgcccc
2820

     ggcttccagt gccaatatga gcgtcgggtt tgatgtgctc ggggcggcgg tgacacctgt
2880

     tgatggtgca ttgctcggag atgtagtcac ggttgaggcg gcagagacat tcagtctcaa
2940

     caacctcgga cgctttgccg ataagctgcc gtcagaacca cgggaaaata tcgtttatca
3000

     gtgctgggag cgtttttgcc aggaactggg taagcaaatt ccagtggcga tgaccctgga
3060

     aaagaatatg ccgatcggtt cgggcttagg ctccagtgcc tgttcggtgg tcgcggcgct
3120

     gatggcgatg aatgaacact gcggcaagcc gcttaatgac actcgtttgc tggctttgat
3180

     gggcgagctg gaaggccgta tctccggcag cattcattac gacaacgtgg caccgtgttt
3240

     tctcggtggt atgcagttga tgatcgaaga aaacgacatc atcagccagc aagtgccagg
3300

     gtttgatgag tggctgtggg tgctggcgta tccggggatt aaagtctcga cggcagaagc
3360

     cagggctatt ttaccggcgc agtatcgccg ccaggattgc attgcgcacg ggcgacatct
3420

     ggcaggcttc attcacgcct gctattcccg tcagcctgag cttgccgcga agctgatgaa
3480

     agatgttatc gctgaaccct accgtgaacg gttactgcca ggcttccggc aggcgcggca
3540

     ggcggtcgcg gaaatcggcg cggtagcgag cggtatctcc ggctccggcc cgaccttgtt
3600

     cgctctgtgt gacaagccgg aaaccgccca gcgcgttgcc gactggttgg gtaagaacta
3660

     cctgcaaaat caggaaggtt ttgttcatat ttgccggctg gatacggcgg gcgcacgagt
3720

     actggaaaac taaatgaaac tctacaatct gaaagatcac aacgagcagg tcagctttgc
3780

     gcaagccgta acccaggggt tgggcaaaaa tcaggggctg ttttttccgc acgacctgcc
3840

     ggaattcagc ctgactgaaa ttgatgagat gctgaagctg gattttgtca cccgcagtgc
3900

     gaagatcctc tcggcgttta ttggtgatga aatcccacag gaaatcctgg aagagcgcgt
3960

     gcgcgcggcg tttgccttcc cggctccggt cgccaatgtt gaaagcgatg tcggttgtct
4020

     ggaattgttc cacgggccaa cgctggcatt taaagatttc ggcggtcgct ttatggcaca
4080

     aatgctgacc catattgcgg gtgataagcc agtgaccatt ctgaccgcga cctccggtga
4140

     taccggagcg gcagtggctc atgctttcta cggtttaccg aatgtgaaag tggttatcct
4200

     ctatccacga ggcaaaatca gtccactgca agaaaaactg ttctgtacat tgggcggcaa
4260

     tatcgaaact gttgccatcg acggcgattt cgatgcctgt caggcgctgg tgaagcaggc
4320

     gtttgatgat gaagaactga aagtggcgct agggttaaac tcggctaact cgattaacat
4380

     cagccgtttg ctggcgcaga tttgctacta ctttgaagct gttgcgcagc tgccgcagga
4440

     gacgcgcaac cagctggttg tctcggtgcc aagcggaaac ttcggcgatt tgacggcggg
4500

     tctgctggcg aagtcactcg gtctgccggt gaaacgtttt attgctgcga ccaacgtgaa
4560

     cgataccgtg ccacgtttcc tgcacgacgg tcagtggtca cccaaagcga ctcaggcgac
4620

     gttatccaac gcgatggacg tgagtcagcc gaacaactgg ccgcgtgtgg aagagttgtt
4680

     ccgccgcaaa atctggcaac tgaaagagct gggttatgca gccgtggatg atgaaaccac
4740

     gcaacagaca atgcgtgagt taaaagaact gggctacact tcggagccgc acgctgccgt
4800

     agcttatcgt gcgctgcgtg atcagttgaa tccaggcgaa tatggcttgt tcctcggcac
4860

     cgcgcatccg gcgaaattta aagagagcgt ggaagcgatt ctcggtgaaa cgttggatct
4920

     gccaaaagag ctggcagaac gtgctgattt acccttgctt tcacataatc tgcccgccga
4980

     ttttgctgcg ttgcgtaaat tgatgatgaa tcatcagtaa aatctattca ttatctcaat
5040

     caggccgggt ttgcttttat gcagcccggc ttttttatga agaaattatg gagaaaaatg
5100

     acagggaaaa aggagaaatt ctcaataaat gcggtaactt agagattagg attgcggaga
5160

     ataacaaccg ccgttctcat cgagtaatct ccggatatcg acccataacg ggcaatgata
5220

     aaaggagtaa cctgtgaaaa agatgcaatc tatcgtactc gcactttccc tggttctggt
5280

     cgctcccatg gcagcacagg ctgcggaaat tacgttagtc ccgtcagtaa aattacagat
5340

     aggcgatcgt gataatcgtg gctattactg ggatggaggt cactggcgcg accacggctg
5400

     gtggaaacaa cattatgaat ggcgaggcaa tcgctggcac ctacacggac cgccgccacc
5460

     gccgcgccac cataagaaag ctcctcatga tcatcacggc ggtcatggtc caggcaaaca
5520

     tcaccgctaa atgacaaatg ccgggtaaca atccggcatt cagcgcctga tgcgacgctg
5580

     gcgcgtctta tcaggcctac gttaattctg caatatattg aatctgcatg cttttgtagg
5640

     caggataagg cgttcacgcc gcatccggca ttgactgcaa acttaacgct gctcgtagcg
5700

     tttaaacacc agttcgccat tgctggagga atcttcatca aagaagtaac cttcgctatt
5760

     aaaaccagtc agttgctctg gtttggtcag ccgattttca ataatgaaac gactcatcag
5820

     accgcgtgct ttcttagcgt agaagctgat gatcttaaat ttgccgttct tctcatcgag
5880

     gaacaccggc ttgataatct cggcattcaa tttcttcggc ttcaccgatt taaaatactc
5940

     atctgacgcc agattaatca ccacattatc gccttgtgct gcgagcgcct cgttcagctt
6000

     gttggtgatg atatctcccc agaattgata cagatctttc cctcgggcat tctcaagacg
6060

     gatccccatt tccagacgat aaggctgcat taaatcgagc gggcggagta cgccatacaa
6120

     gccggaaagc attcgcaaat gctgttgggc aaaatcgaaa tcgtcttcgc tgaaggtttc
6180

     ggcctgcaag ccggtgtaga catcaccttt aaacgccaga atcgcctggc gggcattcgc
6240

     cggcgtgaaa tctggctgcc agtcatgaaa gcgagcggcg ttgatacccg ccagtttgtc
6300

     gctgatgcgc atcagcgtgc taatctgcgg aggcgtcagt ttccgcgcct catggatcaa
6360

     ctgctgggaa ttgtctaaca gctccggcag cgtatagcgc gtggtggtca acgggctttg
6420

     gtaatcaagc gttttcgcag gtgaaataag aatcagcata tccagtcctt gcaggaaatt
6480

     tatgccgact ttagcaaaaa atgagaatga gttgatcgat agttgtgatt actcctgcga
6540

     aacatcatcc cacgcgtccg gagaaagctg gcgaccgata tccggataac gcaatggatc
6600

     aaacaccggg cgcacgccga gtttacgctg gcgtagataa tcactggcaa tggtatgaac
6660

     cacaggcgag agcagtaaaa tggcggtcaa attggtaata gccatgcagg ccattatgat
6720

     atctgccagt tgccacatca gcggaaggct tagcaaggtg ccgccgatga ccgttgcgaa
6780

     ggtgcagatc cgcaaacacc agatcgcttt agggttgttc aggcgtaaaa agaagagatt
6840

     gttttcggca taaatgtagt tggcaacgat ggagctgaag gcaaacagaa taaccacaag
6900

     ggtaacaaac tcagcacccc aggaacccat tagcacccgc atcgccttct ggataagctg
6960

     aataccttcc agcggcatgt aggttgtgcc gttacccgcc agtaatatca gcatggcgct
7020

     tgccgtacag atgaccaggg tgtcgataaa aatgccaatc atctggacaa tcccttgcgc
7080

     tgccggatgc ggaggccagg acgccgctgc cgctgccgcg tttggcgtcg aacccattcc
7140

     cgcctcattg gaaaacatac tgcgctgaaa accgttagta atcgcctggc ttaaggtata
7200

     tcccgccgcg ccgcctgccg cttcctgcca gccaaaagca ctctcaaaaa tagaccaaat
7260

     gacgtgggga agttgcccga tattcattac gcaaattacc aggctggtca gtacccagat
7320

     tatcgccatc aacgggacaa agccctgcat gagccgggcg acgccatgaa gaccgcgagt
7380

     gattgccagc agagtaaaga cagcgagaat aatgcctgtc accagcgggg gaaaatcaaa
7440

     agaaaaactc agggcgcggg caacggcgtt cgcttgaact ccgctgaaaa ttatgccata
7500

     ggcgatgagc aaaaagacgg cgaacagaac gcccatccag cgcatcccca gcccgcgcgc
7560

     catataccat gccggtccgc cacgaaactg cccattgacg tcacgttctt tataaagttg
7620

     tgccagagaa cattcggcaa acgaggtcgc catgccgata aacgcggcaa cccacatcca
7680

     aaagacggct ccaggtccac cggcggtaat agccagcgca acgccggcca ggttgccgct
7740

     acccacgcgc gccgcaagac tggtacacaa tgactgaaat gaggttaaac cgcctggctg
7800

     tggatgaatg ctatttttaa gacttttgcc aaactggcgg atgtagcgaa actgcacaaa
7860

     tccggtgcga aaagtgaacc aacaacctgc gccgaagagc aggtaaatca ttaccgatcc
7920

     ccaaaggacg ctgttaatga aggagaaaaa atctggcatg catatccctc ttattgccgg
7980

     tcgcgatgac tttcctgtgt aaacgttacc aattgtttaa gaagtatata cgctacgagg
8040

     tacttgataa cttctgcgta gcatacatga ggttttgtat aaaaatggcg ggcgatatca
8100

     acgcagtgtc agaaatccga aacagtctcg cctggcgata accgtcttgt cggcggttgc
8160

     gctgacgttg cgtcgtgata tcatcagggc agaccggtta catcccccta acaagctgtt
8220

     taaagagaaa tactatcatg acggacaaat tgacctccct tcgtcagtac accaccgtag
8280

     tggccgacac tggggacatc gcggcaatga agctgtatca accgcaggat gccacaacca
8340

     acccttctct cattcttaac gcagcgcaga ttccggaata ccgtaagttg attgatgatg
8400

     ctgtcgcctg ggcgaaacag cagagcaacg atcgcgcgca gcagatcgtg gacgcgaccg
8460

     acaaactggc agtaaatatt ggtctggaaa tcctgaaact ggttccgggc cgtatctcaa
8520

     ctgaagttga tgcgcgtctt tcctatgaca ccgaagcgtc aattgcgaaa gcaaaacgcc
8580

     tgatcaaact ctacaacgat gctggtatta gcaacgatcg tattctgatc aaactggctt
8640

     ctacctggca gggtatccgt gctgcagaac agctggaaaa agaaggcatc aactgtaacc
8700

     tgaccctgct gttctccttc gctcaggctc gtgcttgtgc ggaagcgggc gtgttcctga
8760

     tctcgccgtt tgttggccgt attcttgact ggtacaaagc gaataccgat aagaaagagt
8820

     acgctccggc agaagatccg ggcgtggttt ctgtatctga aatctaccag tactacaaag
8880

     agcacggtta tgaaaccgtg gttatgggcg caagcttccg taacatcggc gaaattctgg
8940

     aactggcagg ctgcgaccgt ctgaccatcg caccggcact gctgaaagag ctggcggaga
9000

//

 
Jolyon Holdstock Ph.D.

Senior Computational Biologist,

Oxford Gene Technology (Ops) Ltd.

Begbroke Business and Science Park

Sandy Lane, Yarnton

Oxford, OX5 1PF

 
Tel: 01865 309699

Fax: 01865 842116

 

 
From dreher at mpiib-berlin.mpg.de  Wed Mar  8 13:08:50 2006
From: dreher at mpiib-berlin.mpg.de (Felix Dreher)
Date: Wed Mar  8 13:11:54 2006
Subject: [Biojava-l] Problem: Hibernate - RichSequence Annotation
Message-ID: <440F1DB2.903@mpiib-berlin.mpg.de>

Hello all,

in my last post I described a problem with primary keys. When I tried to
save a RichSequence with annotations to a PostgreSQL/BioSQL database
using Hibernate, the following exception (among others) was thrown:
--- org.postgresql.util.PSQLException: ERROR: relation
"ontology_ontology_id_seq" does not exist ---
This could be solved by changing the <generator> tag in the ontology.hbm.xml
from
            <generator class="identity"/>

to
            <generator class="sequence">
                <param name="sequence">ontology_pk_seq</param>
            </generator>

(and similarly in the term.hbm.xml file).

I'm not sure whether this is specific to my project or a general
problem.
Anyway, this works fine now, but another problem came up:

I want to enrich a Sequence that was downloaded from Genbank and (by 
enriching) save all the annotations in the RichSequence object.

Sequence seq = new GenbankSequenceDB().getSequence("NM_008160");
RichSequence s = RichSequence.Tools.enrich(seq);
tdb.addSequence(s);

(where tdb is a convenience wrapper for storing and retrieving sequences
from the BioSQL DB; it already works with non-enriched sequences).

 From the debugging info I got, this works at the object level, but when 
I try to save the sequence to the DB, the following exception is thrown:


2006-03-08 18:35:00,642 ERROR [httpWorkerThread-28080-9]
 calling method: org.hibernate.util.JDBCExceptionReporter.logExceptions(JDBCExceptionReporter.java:72)
 *ERROR: duplicate key violates unique constraint "seqfeature_bioentry_id_key"*

2006-03-08 18:35:00,643 ERROR [httpWorkerThread-28080-9]
 calling method: org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:299)
 *Could not synchronize database state with session*
org.hibernate.exception.ConstraintViolationException: Could not execute JDBC batch update
        at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:69)
        at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:43)
        at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:202)
        at org.hibernate.jdbc.AbstractBatcher.prepareStatement(AbstractBatcher.java:91)
        at org.hibernate.jdbc.AbstractBatcher.prepareStatement(AbstractBatcher.java:86)
        at org.hibernate.jdbc.AbstractBatcher.prepareBatchStatement(AbstractBatcher.java:171)
        at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2048)
        at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2427)
        at org.hibernate.action.EntityInsertAction.execute(EntityInsertAction.java:51)
        at org.hibernate.engine.ActionQueue.execute(ActionQueue.java:243)
        at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:227)
        at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:140)
        at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:296)
        at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27)
        at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1009)
        at org.hibernate.impl.SessionImpl.managedFlush(SessionImpl.java:356)
        at org.hibernate.transaction.JDBCTransaction.commit(JDBCTransaction.java:106)
        at rnaiprediction.sequence.db.SequenceDB.addSequence(SequenceDB.java:67)
        at rnaiprediction.Queue.prerender(Queue.java:374)
......

*Caused by: java.sql.BatchUpdateException: Batch entry 0 insert into seqfeature (bioentry_id, source_term_id, type_term_id, display_name, rank, seqfeature_id) values (126, 269, 269, NULL, 0, 83) was aborted.  Call getNextException to see the cause.*
        at org.postgresql.jdbc2.AbstractJdbc2Statement$BatchResultHandler.handleError(AbstractJdbc2Statement.java:2497)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1298)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:347)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeBatch(AbstractJdbc2Statement.java:2559)
        at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:58)
        at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:195)
        ... 57 more
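
One possible reading of the failing insert, sketched under assumptions: in the stock BioSQL schema the unique constraint on seqfeature is believed to cover (bioentry_id, type_term_id, source_term_id, rank), and "seqfeature_bioentry_id_key" is assumed here to be that constraint. This is worth checking against the actual schema. If it holds, two features of the same bioentry with identical type and source terms and the same rank would collide, which would match an insert carrying rank 0. The class and method names below are made up for illustration; the code only simulates such a constraint in memory and does not touch BioJava, Hibernate, or a database.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical in-memory model of a four-column unique key on seqfeature.
// The column list is an assumption about the BioSQL schema, not a quotation of it.
class SeqFeatureKeySketch {

    static final class Key {
        final int bioentryId;
        final int typeTermId;
        final int sourceTermId;
        final int rank;

        Key(int bioentryId, int typeTermId, int sourceTermId, int rank) {
            this.bioentryId = bioentryId;
            this.typeTermId = typeTermId;
            this.sourceTermId = sourceTermId;
            this.rank = rank;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key k = (Key) o;
            return bioentryId == k.bioentryId && typeTermId == k.typeTermId
                    && sourceTermId == k.sourceTermId && rank == k.rank;
        }

        @Override
        public int hashCode() {
            return Objects.hash(bioentryId, typeTermId, sourceTermId, rank);
        }
    }

    // Counts how many features a unique constraint on the key would reject
    // when every feature shares the bioentry and terms and only rank varies.
    static int countRejected(int bioentryId, int typeTermId, int sourceTermId, int[] ranks) {
        Set<Key> seen = new HashSet<>();
        int rejected = 0;
        for (int rank : ranks) {
            if (!seen.add(new Key(bioentryId, typeTermId, sourceTermId, rank))) {
                rejected++;
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // Three features that all carry rank 0: the second and third collide.
        System.out.println(countRejected(126, 269, 269, new int[] {0, 0, 0}));
        // The same features with distinct ranks: no collision.
        System.out.println(countRejected(126, 269, 269, new int[] {0, 1, 2}));
    }
}
```

If the assumption is right, inspecting the rank of each feature on the enriched sequence before saving should show whether several features share the same value.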


Any suggestions would be highly appreciated!

Regards,
Felix


-- 
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité Mitte
Department of Immunology
Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany
Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426

From mark.schreiber at novartis.com  Wed Mar  8 20:02:09 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed Mar  8 19:57:12 2006
Subject: [Biojava-l] BiojavaX EmblFormat
Message-ID: <OFE0127171.ED2CB90B-ON4825712C.000590AE-4825712C.0005B129@EU.novartis.net>

The biojavax parser uses regular expressions to parse these lines. I will 
need to check what needs changing in these regex's to allow parsing of 
these files.

Thanks for your testing!

- Mark


"Jolyon Holdstock" <jolyon.holdstock@ogt.co.uk>
Sent by: biojava-l-bounces@portal.open-bio.org
03/08/2006 06:47 PM

 
        To:     <biojava-l@biojava.org>
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [Biojava-l] BiojavaX EmblFormat


Hi,

 
I am using the new format parsers in BioJavaX. GenbankFormat is great,
but I am having some trouble with the EMBLFormat class. I have
downloaded a sequence file (ID:U00096) from the EBI in EMBL format but I
don't believe it is parsing properly.

 
My code is as follows:

String fileName = "path to file";

try {
  RichSequenceIterator rsi = RichSequence.IOTools.readEMBLDNA(
      new BufferedReader(new FileReader(fileName)), null);
  while (rsi.hasNext()) {
    RichSequence seq = rsi.nextRichSequence();
    System.out.println(seq.getURN());
    System.out.println(seq.length());
    System.out.println(seq.getAccession());
  }
}
catch (IOException IOE) {
  System.out.println("BioJava IOException " + IOE);
}
catch (BioException BIOE) {
  System.out.println("BioJavaX BioException " + BIOE);
  BIOE.printStackTrace();
}

 
The BioJava parser will read it.

seq = SeqIOTools.readEmbl(new BufferedReader(new FileReader(fileName))).nextSequence(); //works

 
I checked the web CVS and the EMBLFormat class is 3 months old so I am
using the most recent version.

I have pasted below a snippet of the sequence file that reproduces the
problems.

 
The errors are:

 
The ID line isn't parsed when 'genomic' is present; deleting it
removes the problem.

 
org.biojava.bio.BioException: Could not read sequence

Caused by: org.biojava.bio.seq.io.ParseException: 

    Bad ID line found: U00096     standard; circular genomic DNA; PRO; 4639675 BP.

ID   U00096     standard; circular genomic DNA; PRO; 4639675 BP. //fails

ID   U00096     standard; circular DNA; PRO; 4639675 BP. //works
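
The behaviour above is what one would expect if the ID-line expression allowed only a single word for the molecule type. The following is a minimal sketch of that idea using java.util.regex; both patterns are hypothetical and written for illustration only, they are not the expressions used in EMBLFormat. They are applied to the text after the "ID   " tag, as shown in the error message.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical patterns for illustration; not taken from EMBLFormat.
class EmblIdLineSketch {

    // Strict: the molecule type must be one word, e.g. "DNA".
    static final Pattern STRICT = Pattern.compile(
            "^(\\S+)\\s+standard;\\s+(circular\\s+)?(\\S+);\\s+(\\S+);\\s+(\\d+)\\s+BP\\.$");

    // Relaxed: the molecule type may contain spaces, e.g. "genomic DNA".
    static final Pattern RELAXED = Pattern.compile(
            "^(\\S+)\\s+standard;\\s+(circular\\s+)?([^;]+);\\s+(\\S+);\\s+(\\d+)\\s+BP\\.$");

    // Returns the molecule type (group 3), or null when the line does not match.
    static String moleculeType(Pattern pattern, String idLine) {
        Matcher m = pattern.matcher(idLine);
        return m.matches() ? m.group(3).trim() : null;
    }

    public static void main(String[] args) {
        String plain = "U00096     standard; circular DNA; PRO; 4639675 BP.";
        String genomic = "U00096     standard; circular genomic DNA; PRO; 4639675 BP.";
        System.out.println(moleculeType(STRICT, plain));    // strict pattern, one-word type
        System.out.println(moleculeType(STRICT, genomic));  // strict pattern, two-word type
        System.out.println(moleculeType(RELAXED, genomic)); // relaxed pattern, two-word type
    }
}
```

The only difference between the two patterns is the third group: `\S+` stops at the first space, while `[^;]+` reads up to the next semicolon.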

 
There is a problem with the RX tag which fails with output:

 
org.biojava.bio.BioException: Could not read sequence

Caused by: java.lang.ArrayIndexOutOfBoundsException: 1

      at
org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:352)

 
Replacing the line
RX   DOI; 10.1126/science.277.5331.1453.

with the following removes the error:

XX   RX   DOI; 10.1126/science.277.5331.1453.
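
An ArrayIndexOutOfBoundsException with index 1 usually means that a line was split and the second element was read without checking that it exists. What EMBLFormat actually does at line 352 is not shown here, so the following is only a generic defensive sketch of reading an RX value; the class, method, and field names are hypothetical.

```java
import java.util.Optional;

// Illustrative only: these names are hypothetical and do not mirror EMBLFormat.
class RxLineSketch {

    // The two fields of an RX value such as "DOI; 10.1126/science.277.5331.1453."
    static final class CrossRef {
        final String database;
        final String identifier;

        CrossRef(String database, String identifier) {
            this.database = database;
            this.identifier = identifier;
        }
    }

    // Parses the text after the "RX   " tag. Returns an empty Optional instead
    // of throwing when the value does not have the "<database>; <id>." shape.
    static Optional<CrossRef> parse(String rxValue) {
        // A limit of 2 keeps any later ';' inside the identifier.
        String[] parts = rxValue.split(";", 2);
        if (parts.length < 2) {
            return Optional.empty(); // reading parts[1] here would throw
        }
        String database = parts[0].trim();
        String identifier = parts[1].trim();
        // Drop only the single terminating period; dots inside a DOI stay.
        if (identifier.endsWith(".")) {
            identifier = identifier.substring(0, identifier.length() - 1);
        }
        if (database.isEmpty() || identifier.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new CrossRef(database, identifier));
    }

    public static void main(String[] args) {
        CrossRef doi = parse("DOI; 10.1126/science.277.5331.1453.").get();
        System.out.println(doi.database + " -> " + doi.identifier);
        CrossRef pubmed = parse("PUBMED; 9278503.").get();
        System.out.println(pubmed.database + " -> " + pubmed.identifier);
        System.out.println(parse("DOI").isPresent());
    }
}
```

Note that a DOI contains periods and slashes of its own, so any approach that splits on "." or treats the identifier as a plain number would need the same kind of guard.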

 
There is an error with parsing the authors

 
org.biojava.bio.BioException: Could not read sequence

Caused by: java.lang.IllegalArgumentException: Authors string cannot be
null

      at
org.biojavax.DocRefAuthor$Tools.parseAuthorString(DocRefAuthor.java:75)

      at
org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:395)

      at
org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:100)

 
I am looking at the code trying to see where the problems are but
suspect that it may be beyond me.

So if anybody has some experience of this I would welcome their input.

 
Thanks,

 
Jolyon

 
This is a snippet of the sequence file that reproduces the errors for me.

 
ID   U00096     standard; circular genomic DNA; PRO; 4639675 BP.

XX

AC   U00096; AE000111-AE000510;

XX

SV   U00096.2

XX

DT   23-FEB-2006 (Rel. 86, Created)

DT   06-MAR-2006 (Rel. 87, Last updated, Version 3)

XX

DE   Escherichia coli K-12 MG1655, complete genome.

XX

KW   .

XX

OS   Escherichia coli K12

OC   Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;

OC   Enterobacteriaceae; Escherichia.

XX

RN   [1]

RP   1-4639675

RX   DOI; 10.1126/science.277.5331.1453.

RX   PUBMED; 9278503.

RA   Blattner F.R., Plunkett G., Bloch C.A., Perna N.T., Burland V., Riley M.,

RA   Collado-Vides J., Glasner J.D., Rode C.K., Mayhew G.F., Gregor J.,

RA   Davis N.W., Kirkpatrick H.A., Goeden M.A., Rose D.J., Mau B., Shao Y.;

RT   "The complete genome sequence of Escherichia coli K-12";

RL   Science 277(5331):1453-1474(1997).

XX

RN   [2]

RP   1-4639675

RX   DOI; 10.1093/nar/gkj150.

RX   PUBMED; 16397293.

RA   Riley M., Abe T., Arnaud M.B., Berlyn M.K., Blattner F.R., Chaudhuri R.R.,

RA   Glasner J.D., Horiuchi T., Keseler I.M., Kosuge T., Mori H., Perna N.T.,

RA   Plunkett G. III, Rudd K.E., Serres M.H., Thomas G.H., Thomson N.R.,

RA   Wishart D., Wanner B.L.;

RT   "Escherichia coli K-12: a cooperatively developed annotation

RT   snapshot--2005";

RL   (er) Nucleic Acids Res. 34 (1), 1-9 (2006)

XX

RN   [3]

RC   Woods Hole, Mass., on 14-18 November 2003 (sequence corrections)

RP   1-4639675

RA   Arnaud M., Berlyn M.K.B., Blattner F.R., Galperin M.Y., Glasner J.D.,

RA   Horiuchi T., Kosuge T., Mori H., Perna N.T., Plunkett G. III, Riley M.,

RA   Rudd K.E., Serres M.H., Thomas G.H., Wanner B.L.;

RT   "Workshop on Annotation of Escherichia coli K-12";

RL   Unpublished.

XX

RN   [4]

RC   ASAP download 10 June 2004 (annotation updates)

RP   1-4639675

RA   Glasner J.D., Perna N.T., Plunkett G. III, Anderson B.D., Bockhorst J.,

RA   Hu J.C., Riley M., Rudd K.E., Serres M.H.;

RT   "ASAP: Escherichia coli K-12 strain MG1655 version m56";

RL   Unpublished.

XX

RN   [5]

RC   GenBank accessions AG613214 to AG613378 (sequence corrections)

RP   1-4639675

RA   Hayashi K., Morooka N., Mori H., Horiuchi T.;

RT   "A more accurate sequence comparison between genomes of Escherichia coli

RT   K12 W3110 and MG1655 strains";

RL   Unpublished.

XX

RN   [6]

RC   GenBank accession AY605712 (sequence corrections)

RP   1-4639675

RA   Perna N.T.;

RT   "Escherichia coli K-12 MG1655 yqiK-rfaE intergenic region, genomic sequence

RT   correction";

RL   Unpublished.

XX

RN   [7]

RP   1-4639675

RA   Rudd K.E.;

RT   "A manual approach to accurate translation start site annotation: an E.

RT   coli K-12 case study";

RL   Unpublished.

XX

RN   [8]

RP   1-4639675

RA   Blattner F.R., Plunkett G. III.;

RT   ;

RL   Submitted (16-JAN-1997) to the EMBL/GenBank/DDBJ databases.

RL   Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison,

RL   WI 53706-1580, USA

XX

RN   [9]

RP   1-4639675

RA   Blattner F.R., Plunkett G. III.;

RT   ;

RL   Submitted (02-SEP-1997) to the EMBL/GenBank/DDBJ databases.

RL   Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison,

RL   WI 53706-1580, USA

XX

RN   [10]

RP   1-4639675

RA   Plunkett G. III.;

RT   ;

RL   Submitted (13-OCT-1998) to the EMBL/GenBank/DDBJ databases.

RL   Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison,

RL   WI 53706-1580, USA

XX

RN   [11]

RC   Sequence update by submitter

RP   1-4639675

RA   Plunkett G. III.;

RT   ;

RL   Submitted (10-JUN-2004) to the EMBL/GenBank/DDBJ databases.

RL   Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison,

RL   WI 53706-1580, USA

XX

RN   [12]

RC   Protein updates by submitter

RP   1-4639675

RA   Plunkett G. III.;

RT   ;

RL   Submitted (07-FEB-2006) to the EMBL/GenBank/DDBJ databases.

RL   Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison,

RL   WI 53706-1580, USA

XX

DR   EMBL-TPA; BR000242.

XX

FH   Key             Location/Qualifiers

FH

FT   source          1..4639675

FT                   /organism="Escherichia coli K12"

FT                   /strain="K-12"

FT                   /sub_strain="MG1655"

FT                   /mol_type="genomic DNA"

FT                   /db_xref="taxon:83333"

FT   gene            190..255

FT                   /gene="thrL"

FT                   /locus_tag="b0001"

FT                   /note="synonyms: ECK0001, JW4367"

FT   CDS             190..255

FT                   /codon_start=1

FT                   /transl_table=11

FT                   /gene="thrL"

FT                   /locus_tag="b0001"

FT                   /product="thr operon leader peptide"

FT                   /function="1.5.1.8 metabolism; building block biosynthesis;

FT                   amino acids; threonine"

FT                   /function="leader; Amino acid biosynthesis: Threonine"

FT                   /note="go_process: threonine biosynthesis [goid 0009088]"

FT                   /protein_id="AAC73112.1"

FT                   /translation="MKRISTTITTTITITTGNGAG"

FT   gene            337..2799

FT                   /gene="thrA"

FT                   /locus_tag="b0002"

FT                   /note="synonyms: Hs, thrD, ECK0002, JW0001"

FT   CDS             337..2799

FT                   /codon_start=1

FT                   /transl_table=11

FT                   /gene="thrA"

FT                   /locus_tag="b0002"

FT                   /product="fused aspartokinase I and homoserine

FT                   dehydrogenase I"

FT                   /function="1.5.1.21 metabolism; building block

FT                   biosynthesis; amino acids; homoserine"

FT                   /function="1.5.1.8 metabolism; building block
biosynthesis;

FT                   amino acids; threonine"

FT                   /function="7.1 location of gene products;
cytoplasm"

FT                   /function="enzyme; Amino acid biosynthesis:
Threonine"

FT                   /EC_number="1.1.1.3"

FT                   /EC_number="2.7.2.4"

FT                   /note="bifunctional: aspartokinase I (N-terminal);

FT                   homoserine dehydrogenase I (C-terminal);
go_component:

FT                   cytoplasm [goid 0005737]; go_process: threonine

FT                   biosynthesis [goid 0009088]; go_process: homoserine

FT                   biosynthesis [goid 0009090]"

FT                   /protein_id="AAC73113.1"

FT
/translation="MRVLKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITN

FT
HLVAMIEKTISGQDALPNISDAERIFAELLTGLAAAQPGFPLAQLKTFVDQEFAQIKHV

FT
LHGISLLGQCPDSINAALICRGEKMSIAIMAGVLEARGHNVTVIDPVEKLLAVGHYLES

FT
TVDIAESTRRIAASRIPADHMVLMAGFTAGNEKGELVVLGRNGSDYSAAVLAACLRADC

FT
CEIWTDVDGVYTCDPRQVPDARLLKSMSYQEAMELSYFGAKVLHPRTITPIAQFQIPCL

FT
IKNTGNPQAPGTLIGASRDEDELPVKGISNLNNMAMFSVSGPGMKGMVGMAARVFAAMS

FT
RARISVVLITQSSSEYSISFCVPQSDCVRAERAMQEEFYLELKEGLLEPLAVTERLAII

FT
SVVGDGMRTLRGISAKFFAALARANINIVAIAQGSSERSISVVVNNDDATTGVRVTHQM

FT
LFNTDQVIEVFVIGVGGVGGALLEQLKRQQSWLKNKHIDLRVCGVANSKALLTNVHGLN

FT
LENWQEELAQAKEPFNLGRLIRLVKEYHLLNPVIVDCTSSQAVADQYADFLREGFHVVT

FT
PNKKANTSSMDYYHQLRYAAEKSRRKFLYDTNVGAGLPVIENLQNLLNAGDELMKFSGI

FT
LSGSLSYIFGKLDEGMSFSEATTLAREMGYTEPDPRDDLSGMDVARKLLILARETGREL

FT
ELADIEIEPVLPAEFNAEGDVAAFMANLSQLDDLFAARVAKARDEGKVLRYVGNIDEDG

FT
VCRVKIAEVDGNDPLFKVKNGENALAFYSHYYQPLPLVLRGYGAGNDVTAAGVFADLLR

FT                   TLSWKLGV"

FT   gene            2801..3733

FT                   /gene="thrB"

FT                   /locus_tag="b0003"

FT                   /note="synonyms: ECK0003, JW0002"

FT   CDS             2801..3733

FT                   /codon_start=1

FT                   /transl_table=11

FT                   /gene="thrB"

FT                   /locus_tag="b0003"

FT                   /product="homoserine kinase"

FT                   /function="1.5.1.8 metabolism; building block
biosynthesis;

FT                   amino acids; threonine"

FT                   /function="7.1 location of gene products;
cytoplasm"

FT                   /function="enzyme; Amino acid biosynthesis:
Threonine"

FT                   /EC_number="2.7.1.39"

FT                   /note="go_component: cytoplasm [goid 0005737];
go_process:

FT                   threonine biosynthesis [goid 0009088]"

FT                   /protein_id="AAC73114.1"

FT
/translation="MVKVYAPASSANMSVGFDVLGAAVTPVDGALLGDVVTVEAAETFS

FT
LNNLGRFADKLPSEPRENIVYQCWERFCQELGKQIPVAMTLEKNMPIGSGLGSSACSVV

FT
AALMAMNEHCGKPLNDTRLLALMGELEGRISGSIHYDNVAPCFLGGMQLMIEENDIISQ

FT
QVPGFDEWLWVLAYPGIKVSTAEARAILPAQYRRQDCIAHGRHLAGFIHACYSRQPELA

FT
AKLMKDVIAEPYRERLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDKPETAQRVAD

FT                   WLGKNYLQNQEGFVHICRLDTAGARVLEN"

FT   gene            3734..5020

FT                   /gene="thrC"

FT                   /locus_tag="b0004"

FT                   /note="synonyms: ECK0004, JW0003"

FT   CDS             3734..5020

FT                   /codon_start=1

FT                   /transl_table=11

FT                   /gene="thrC"

FT                   /locus_tag="b0004"

FT                   /product="threonine synthase"

FT                   /function="1.5.1.8 metabolism; building block
biosynthesis;

FT                   amino acids; threonine"

FT                   /function="7.1 location of gene products;
cytoplasm"

FT                   /function="enzyme; Amino acid biosynthesis:
Threonine"

FT                   /EC_number="4.2.3.1"

FT                   /note="go_component: cytoplasm [goid 0005737];
go_process:

FT                   threonine biosynthesis [goid 0009088]"

FT                   /protein_id="AAC73115.1"

FT
/translation="MKLYNLKDHNEQVSFAQAVTQGLGKNQGLFFPHDLPEFSLTEIDE

FT
MLKLDFVTRSAKILSAFIGDEIPQEILEERVRAAFAFPAPVANVESDVGCLELFHGPTL

FT
AFKDFGGRFMAQMLTHIAGDKPVTILTATSGDTGAAVAHAFYGLPNVKVVILYPRGKIS

FT
PLQEKLFCTLGGNIETVAIDGDFDACQALVKQAFDDEELKVALGLNSANSINISRLLAQ

FT
ICYYFEAVAQLPQETRNQLVVSVPSGNFGDLTAGLLAKSLGLPVKRFIAATNVNDTVPR

FT
FLHDGQWSPKATQATLSNAMDVSQPNNWPRVEELFRRKIWQLKELGYAAVDDETTQQTM

FT
RELKELGYTSEPHAAVAYRALRDQLNPGEYGLFLGTAHPAKFKESVEAILGETLDLPKE

FT                   LAERADLPLLSHNLPADFAALRKLMMNHQ"

XX

SQ   Sequence 4639675 BP; 1142228 A; 1179554 C; 1176923 G; 1140970 T; 0 other;

     agcttttcat tctgactgca acgggcaata tgtctctgtg tggattaaaa aaagagtgtc
60

     tgatagcagc ttctgaactg gttacctgcc gtgagtaaat taaaatttta ttgacttagg
120

     tcactaaata ctttaaccaa tataggcata gcgcacagac agataaaaat tacagagtac
180

     acaacatcca tgaaacgcat tagcaccacc attaccacca ccatcaccat taccacaggt
240

     aacggtgcgg gctgacgcgt acaggaaaca cagaaaaaag cccgcacctg acagtgcggg
300

     cttttttttt cgaccaaagg taacgaggta acaaccatgc gagtgttgaa gttcggcggt
360

     acatcagtgg caaatgcaga acgttttctg cgtgttgccg atattctgga aagcaatgcc
420

     aggcaggggc aggtggccac cgtcctctct gcccccgcca aaatcaccaa ccacctggtg
480

     gcgatgattg aaaaaaccat tagcggccag gatgctttac ccaatatcag cgatgccgaa
540

     cgtatttttg ccgaactttt gacgggactc gccgccgccc agccggggtt cccgctggcg
600

     caattgaaaa ctttcgtcga tcaggaattt gcccaaataa aacatgtcct gcatggcatt
660

     agtttgttgg ggcagtgccc ggatagcatc aacgctgcgc tgatttgccg tggcgagaaa
720

     atgtcgatcg ccattatggc cggcgtatta gaagcgcgcg gtcacaacgt tactgttatc
780

     gatccggtcg aaaaactgct ggcagtgggg cattacctcg aatctaccgt cgatattgct
840

     gagtccaccc gccgtattgc ggcaagccgc attccggctg atcacatggt gctgatggca
900

     ggtttcaccg ccggtaatga aaaaggcgaa ctggtggtgc ttggacgcaa cggttccgac
960

     tactctgctg cggtgctggc tgcctgttta cgcgccgatt gttgcgagat ttggacggac
1020

     gttgacgggg tctatacctg cgacccgcgt caggtgcccg atgcgaggtt gttgaagtcg
1080

     atgtcctacc aggaagcgat ggagctttcc tacttcggcg ctaaagttct tcacccccgc
1140

     accattaccc ccatcgccca gttccagatc ccttgcctga ttaaaaatac cggaaatcct
1200

     caagcaccag gtacgctcat tggtgccagc cgtgatgaag acgaattacc ggtcaagggc
1260

     atttccaatc tgaataacat ggcaatgttc agcgtttctg gtccggggat gaaagggatg
1320

     gtcggcatgg cggcgcgcgt ctttgcagcg atgtcacgcg cccgtatttc cgtggtgctg
1380

     attacgcaat catcttccga atacagcatc agtttctgcg ttccacaaag cgactgtgtg
1440

     cgagctgaac gggcaatgca ggaagagttc tacctggaac tgaaagaagg cttactggag
1500

     ccgctggcag tgacggaacg gctggccatt atctcggtgg taggtgatgg tatgcgcacc
1560

     ttgcgtggga tctcggcgaa attctttgcc gcactggccc gcgccaatat caacattgtc
1620

     gccattgctc agggatcttc tgaacgctca atctctgtcg tggtaaataa cgatgatgcg
1680

     accactggcg tgcgcgttac tcatcagatg ctgttcaata ccgatcaggt tatcgaagtg
1740

     tttgtgattg gcgtcggtgg cgttggcggt gcgctgctgg agcaactgaa gcgtcagcaa
1800

     agctggctga agaataaaca tatcgactta cgtgtctgcg gtgttgccaa ctcgaaggct
1860

     ctgctcacca atgtacatgg ccttaatctg gaaaactggc aggaagaact ggcgcaagcc
1920

     aaagagccgt ttaatctcgg gcgcttaatt cgcctcgtga aagaatatca tctgctgaac
1980

     ccggtcattg ttgactgcac ttccagccag gcagtggcgg atcaatatgc cgacttcctg
2040

     cgcgaaggtt tccacgttgt cacgccgaac aaaaaggcca acacctcgtc gatggattac
2100

     taccatcagt tgcgttatgc ggcggaaaaa tcgcggcgta aattcctcta tgacaccaac
2160

     gttggggctg gattaccggt tattgagaac ctgcaaaatc tgctcaatgc aggtgatgaa
2220

     ttgatgaagt tctccggcat tctttctggt tcgctttctt atatcttcgg caagttagac
2280

     gaaggcatga gtttctccga ggcgaccacg ctggcgcggg aaatgggtta taccgaaccg
2340

     gacccgcgag atgatctttc tggtatggat gtggcgcgta aactattgat tctcgctcgt
2400

     gaaacgggac gtgaactgga gctggcggat attgaaattg aacctgtgct gcccgcagag
2460

     tttaacgccg agggtgatgt tgccgctttt atggcgaatc tgtcacaact cgacgatctc
2520

     tttgccgcgc gcgtggcgaa ggcccgtgat gaaggaaaag ttttgcgcta tgttggcaat
2580

     attgatgaag atggcgtctg ccgcgtgaag attgccgaag tggatggtaa tgatccgctg
2640

     ttcaaagtga aaaatggcga aaacgccctg gccttctata gccactatta tcagccgctg
2700

     ccgttggtac tgcgcggata tggtgcgggc aatgacgtta cagctgccgg tgtctttgct
2760

     gatctgctac gtaccctctc atggaagtta ggagtctgac atggttaaag tttatgcccc
2820

     ggcttccagt gccaatatga gcgtcgggtt tgatgtgctc ggggcggcgg tgacacctgt
2880

     tgatggtgca ttgctcggag atgtagtcac ggttgaggcg gcagagacat tcagtctcaa
2940

     caacctcgga cgctttgccg ataagctgcc gtcagaacca cgggaaaata tcgtttatca
3000

     gtgctgggag cgtttttgcc aggaactggg taagcaaatt ccagtggcga tgaccctgga
3060

     aaagaatatg ccgatcggtt cgggcttagg ctccagtgcc tgttcggtgg tcgcggcgct
3120

     gatggcgatg aatgaacact gcggcaagcc gcttaatgac actcgtttgc tggctttgat
3180

     gggcgagctg gaaggccgta tctccggcag cattcattac gacaacgtgg caccgtgttt
3240

     tctcggtggt atgcagttga tgatcgaaga aaacgacatc atcagccagc aagtgccagg
3300

     gtttgatgag tggctgtggg tgctggcgta tccggggatt aaagtctcga cggcagaagc
3360

     cagggctatt ttaccggcgc agtatcgccg ccaggattgc attgcgcacg ggcgacatct
3420

     ggcaggcttc attcacgcct gctattcccg tcagcctgag cttgccgcga agctgatgaa
3480

     agatgttatc gctgaaccct accgtgaacg gttactgcca ggcttccggc aggcgcggca
3540

     ggcggtcgcg gaaatcggcg cggtagcgag cggtatctcc ggctccggcc cgaccttgtt
3600

     cgctctgtgt gacaagccgg aaaccgccca gcgcgttgcc gactggttgg gtaagaacta
3660

     cctgcaaaat caggaaggtt ttgttcatat ttgccggctg gatacggcgg gcgcacgagt
3720

     actggaaaac taaatgaaac tctacaatct gaaagatcac aacgagcagg tcagctttgc
3780

     gcaagccgta acccaggggt tgggcaaaaa tcaggggctg ttttttccgc acgacctgcc
3840

     ggaattcagc ctgactgaaa ttgatgagat gctgaagctg gattttgtca cccgcagtgc
3900

     gaagatcctc tcggcgttta ttggtgatga aatcccacag gaaatcctgg aagagcgcgt
3960

     gcgcgcggcg tttgccttcc cggctccggt cgccaatgtt gaaagcgatg tcggttgtct
4020

     ggaattgttc cacgggccaa cgctggcatt taaagatttc ggcggtcgct ttatggcaca
4080

     aatgctgacc catattgcgg gtgataagcc agtgaccatt ctgaccgcga cctccggtga
4140

     taccggagcg gcagtggctc atgctttcta cggtttaccg aatgtgaaag tggttatcct
4200

     ctatccacga ggcaaaatca gtccactgca agaaaaactg ttctgtacat tgggcggcaa
4260

     tatcgaaact gttgccatcg acggcgattt cgatgcctgt caggcgctgg tgaagcaggc
4320

     gtttgatgat gaagaactga aagtggcgct agggttaaac tcggctaact cgattaacat
4380

     cagccgtttg ctggcgcaga tttgctacta ctttgaagct gttgcgcagc tgccgcagga
4440

     gacgcgcaac cagctggttg tctcggtgcc aagcggaaac ttcggcgatt tgacggcggg
4500

     tctgctggcg aagtcactcg gtctgccggt gaaacgtttt attgctgcga ccaacgtgaa
4560

     cgataccgtg ccacgtttcc tgcacgacgg tcagtggtca cccaaagcga ctcaggcgac
4620

     gttatccaac gcgatggacg tgagtcagcc gaacaactgg ccgcgtgtgg aagagttgtt
4680

     ccgccgcaaa atctggcaac tgaaagagct gggttatgca gccgtggatg atgaaaccac
4740

     gcaacagaca atgcgtgagt taaaagaact gggctacact tcggagccgc acgctgccgt
4800

     agcttatcgt gcgctgcgtg atcagttgaa tccaggcgaa tatggcttgt tcctcggcac
4860

     cgcgcatccg gcgaaattta aagagagcgt ggaagcgatt ctcggtgaaa cgttggatct
4920

     gccaaaagag ctggcagaac gtgctgattt acccttgctt tcacataatc tgcccgccga
4980

     ttttgctgcg ttgcgtaaat tgatgatgaa tcatcagtaa aatctattca ttatctcaat
5040

     caggccgggt ttgcttttat gcagcccggc ttttttatga agaaattatg gagaaaaatg
5100

     acagggaaaa aggagaaatt ctcaataaat gcggtaactt agagattagg attgcggaga
5160

     ataacaaccg ccgttctcat cgagtaatct ccggatatcg acccataacg ggcaatgata
5220

     aaaggagtaa cctgtgaaaa agatgcaatc tatcgtactc gcactttccc tggttctggt
5280

     cgctcccatg gcagcacagg ctgcggaaat tacgttagtc ccgtcagtaa aattacagat
5340

     aggcgatcgt gataatcgtg gctattactg ggatggaggt cactggcgcg accacggctg
5400

     gtggaaacaa cattatgaat ggcgaggcaa tcgctggcac ctacacggac cgccgccacc
5460

     gccgcgccac cataagaaag ctcctcatga tcatcacggc ggtcatggtc caggcaaaca
5520

     tcaccgctaa atgacaaatg ccgggtaaca atccggcatt cagcgcctga tgcgacgctg
5580

     gcgcgtctta tcaggcctac gttaattctg caatatattg aatctgcatg cttttgtagg
5640

     caggataagg cgttcacgcc gcatccggca ttgactgcaa acttaacgct gctcgtagcg
5700

     tttaaacacc agttcgccat tgctggagga atcttcatca aagaagtaac cttcgctatt
5760

     aaaaccagtc agttgctctg gtttggtcag ccgattttca ataatgaaac gactcatcag
5820

     accgcgtgct ttcttagcgt agaagctgat gatcttaaat ttgccgttct tctcatcgag
5880

     gaacaccggc ttgataatct cggcattcaa tttcttcggc ttcaccgatt taaaatactc
5940

     atctgacgcc agattaatca ccacattatc gccttgtgct gcgagcgcct cgttcagctt
6000

     gttggtgatg atatctcccc agaattgata cagatctttc cctcgggcat tctcaagacg
6060

     gatccccatt tccagacgat aaggctgcat taaatcgagc gggcggagta cgccatacaa
6120

     gccggaaagc attcgcaaat gctgttgggc aaaatcgaaa tcgtcttcgc tgaaggtttc
6180

     ggcctgcaag ccggtgtaga catcaccttt aaacgccaga atcgcctggc gggcattcgc
6240

     cggcgtgaaa tctggctgcc agtcatgaaa gcgagcggcg ttgatacccg ccagtttgtc
6300

     gctgatgcgc atcagcgtgc taatctgcgg aggcgtcagt ttccgcgcct catggatcaa
6360

     ctgctgggaa ttgtctaaca gctccggcag cgtatagcgc gtggtggtca acgggctttg
6420

     gtaatcaagc gttttcgcag gtgaaataag aatcagcata tccagtcctt gcaggaaatt
6480

     tatgccgact ttagcaaaaa atgagaatga gttgatcgat agttgtgatt actcctgcga
6540

     aacatcatcc cacgcgtccg gagaaagctg gcgaccgata tccggataac gcaatggatc
6600

     aaacaccggg cgcacgccga gtttacgctg gcgtagataa tcactggcaa tggtatgaac
6660

     cacaggcgag agcagtaaaa tggcggtcaa attggtaata gccatgcagg ccattatgat
6720

     atctgccagt tgccacatca gcggaaggct tagcaaggtg ccgccgatga ccgttgcgaa
6780

     ggtgcagatc cgcaaacacc agatcgcttt agggttgttc aggcgtaaaa agaagagatt
6840

     gttttcggca taaatgtagt tggcaacgat ggagctgaag gcaaacagaa taaccacaag
6900

     ggtaacaaac tcagcacccc aggaacccat tagcacccgc atcgccttct ggataagctg
6960

     aataccttcc agcggcatgt aggttgtgcc gttacccgcc agtaatatca gcatggcgct
7020

     tgccgtacag atgaccaggg tgtcgataaa aatgccaatc atctggacaa tcccttgcgc
7080

     tgccggatgc ggaggccagg acgccgctgc cgctgccgcg tttggcgtcg aacccattcc
7140

     cgcctcattg gaaaacatac tgcgctgaaa accgttagta atcgcctggc ttaaggtata
7200

     tcccgccgcg ccgcctgccg cttcctgcca gccaaaagca ctctcaaaaa tagaccaaat
7260

     gacgtgggga agttgcccga tattcattac gcaaattacc aggctggtca gtacccagat
7320

     tatcgccatc aacgggacaa agccctgcat gagccgggcg acgccatgaa gaccgcgagt
7380

     gattgccagc agagtaaaga cagcgagaat aatgcctgtc accagcgggg gaaaatcaaa
7440

     agaaaaactc agggcgcggg caacggcgtt cgcttgaact ccgctgaaaa ttatgccata
7500

     ggcgatgagc aaaaagacgg cgaacagaac gcccatccag cgcatcccca gcccgcgcgc
7560

     catataccat gccggtccgc cacgaaactg cccattgacg tcacgttctt tataaagttg
7620

     tgccagagaa cattcggcaa acgaggtcgc catgccgata aacgcggcaa cccacatcca
7680

     aaagacggct ccaggtccac cggcggtaat agccagcgca acgccggcca ggttgccgct
7740

     acccacgcgc gccgcaagac tggtacacaa tgactgaaat gaggttaaac cgcctggctg
7800

     tggatgaatg ctatttttaa gacttttgcc aaactggcgg atgtagcgaa actgcacaaa
7860

     tccggtgcga aaagtgaacc aacaacctgc gccgaagagc aggtaaatca ttaccgatcc
7920

     ccaaaggacg ctgttaatga aggagaaaaa atctggcatg catatccctc ttattgccgg
7980

     tcgcgatgac tttcctgtgt aaacgttacc aattgtttaa gaagtatata cgctacgagg
8040

     tacttgataa cttctgcgta gcatacatga ggttttgtat aaaaatggcg ggcgatatca
8100

     acgcagtgtc agaaatccga aacagtctcg cctggcgata accgtcttgt cggcggttgc
8160

     gctgacgttg cgtcgtgata tcatcagggc agaccggtta catcccccta acaagctgtt
8220

     taaagagaaa tactatcatg acggacaaat tgacctccct tcgtcagtac accaccgtag
8280

     tggccgacac tggggacatc gcggcaatga agctgtatca accgcaggat gccacaacca
8340

     acccttctct cattcttaac gcagcgcaga ttccggaata ccgtaagttg attgatgatg
8400

     ctgtcgcctg ggcgaaacag cagagcaacg atcgcgcgca gcagatcgtg gacgcgaccg
8460

     acaaactggc agtaaatatt ggtctggaaa tcctgaaact ggttccgggc cgtatctcaa
8520

     ctgaagttga tgcgcgtctt tcctatgaca ccgaagcgtc aattgcgaaa gcaaaacgcc
8580

     tgatcaaact ctacaacgat gctggtatta gcaacgatcg tattctgatc aaactggctt
8640

     ctacctggca gggtatccgt gctgcagaac agctggaaaa agaaggcatc aactgtaacc
8700

     tgaccctgct gttctccttc gctcaggctc gtgcttgtgc ggaagcgggc gtgttcctga
8760

     tctcgccgtt tgttggccgt attcttgact ggtacaaagc gaataccgat aagaaagagt
8820

     acgctccggc agaagatccg ggcgtggttt ctgtatctga aatctaccag tactacaaag
8880

     agcacggtta tgaaaccgtg gttatgggcg caagcttccg taacatcggc gaaattctgg
8940

     aactggcagg ctgcgaccgt ctgaccatcg caccggcact gctgaaagag ctggcggaga
9000

//

 
Jolyon Holdstock Ph.D.
Senior Computational Biologist,
Oxford Gene Technology (Ops) Ltd.
Begbroke Business and Science Park
Sandy Lane, Yarnton
Oxford, OX5 1PF

Tel: 01865 309699
Fax: 01865 842116

 
Confidentiality Notice:

The contents of this email from the Oxford Gene Technology Group of
Companies are confidential and intended solely for the person to whom it
is addressed. It may contain privileged and confidential information. If
you are not the intended recipient you must not read, copy, distribute,
discuss or take any action in reliance on it.

 
_______________________________________________
Biojava-l mailing list  -  Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l


From mark.schreiber at novartis.com  Wed Mar  8 21:18:42 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed Mar  8 21:13:44 2006
Subject: [Biojava-l] Problem: Hibernate - RichSequence Annotation
Message-ID: <OF7F997996.426B09C6-ON4825712C.00092887-4825712C.000CB319@EU.novartis.net>

Hi Felix -

There are some mapping differences between PostgreSQL, MySQL and Oracle, 
which mostly seem to center on how they generate primary keys. I think you 
have solved this with your changes to the hbm.xml files. I will commit 
these to CVS.

The second problem you describe might be caused by the enrich process. 
Richard has created a biojavax equivalent of GenbankSequenceDB 
(RichGenbankSequenceDB I think) which will mean you can avoid using the 
enrich method. This may solve the problem.

The problem might lie with the primary key of some seqfeature, possibly 
as a result of the enrich() method.

*ERROR: duplicate key violates unique constraint 
"seqfeature_bioentry_id_key"*

It may also be because of a problem in the postgres mapping of features 
(although if it only happens with enrich()ed sequences then probably not).
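
To illustrate how that constraint can fire, here is a minimal plain-Java 
sketch. It assumes the stock BioSQL schema, where seqfeature has a unique 
key on (bioentry_id, type_term_id, source_term_id, rank), which is what 
PostgreSQL would name "seqfeature_bioentry_id_key". The class 
SeqFeatureKeyCheck is hypothetical and only mimics the database check; it 
is not BioJava or Hibernate code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the database's unique-key check (assumed BioSQL key). */
final class SeqFeatureKeyCheck {
    // Each stored key is (bioentry_id, type_term_id, source_term_id, rank).
    private final Set<List<Integer>> seen = new HashSet<>();

    /** Returns true if the row is accepted, false if it would violate the unique key. */
    boolean insert(int bioentryId, int typeTermId, int sourceTermId, int rank) {
        return seen.add(Arrays.asList(bioentryId, typeTermId, sourceTermId, rank));
    }

    public static void main(String[] args) {
        SeqFeatureKeyCheck table = new SeqFeatureKeyCheck();
        // Values taken from the failing insert in the log: bioentry 126, terms 269/269, rank 0.
        System.out.println(table.insert(126, 269, 269, 0)); // first feature is accepted
        System.out.println(table.insert(126, 269, 269, 0)); // same type, source and rank: rejected
        System.out.println(table.insert(126, 269, 269, 1)); // a different rank is accepted
    }
}
```

If this assumption holds, two features with the same type and source on one 
bioentry would need different rank values to be stored side by side.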

It could also be some old entries in your database from previous testing 
that may need cleaning out (although if the hibernate mapping is correct 
this is not likely).
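
The BatchUpdateException in your log says "Call getNextException to see the 
cause", so it may help to print the whole chain. Below is a minimal sketch 
using only java.sql; the class name SqlCauseChain is made up for this 
example, and it assumes you can get hold of the underlying 
java.sql.SQLException in your own catch block.

```java
import java.sql.BatchUpdateException;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists every message in a chained SQLException. */
final class SqlCauseChain {
    /** Collects the message of e and of each exception linked via getNextException(). */
    static List<String> messages(SQLException e) {
        List<String> out = new ArrayList<>();
        for (SQLException cur = e; cur != null; cur = cur.getNextException()) {
            out.add(cur.getMessage());
        }
        return out;
    }

    public static void main(String[] args) {
        // Simulated failure; a real chain would be built by the JDBC driver.
        SQLException batch = new BatchUpdateException("Batch entry 0 was aborted", new int[0]);
        batch.setNextException(new SQLException("duplicate key violates unique constraint"));
        for (String message : messages(batch)) {
            System.out.println(message);
        }
    }
}
```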

- Mark


Felix Dreher <dreher@mpiib-berlin.mpg.de>
Sent by: biojava-l-bounces@portal.open-bio.org
03/09/2006 02:08 AM

 
        To:     biojava-l@biojava.org
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [Biojava-l] Problem: Hibernate - RichSequence Annotation


Hello all,

in my last post I described a problem with primary keys. When I tried to 
save a RichSequence with annotations in a PostgreSQL/BioSQL-Database 
using Hibernate,
among others the exception
--- org.postgresql.util.PSQLException: ERROR: relation 
"ontology_ontology_id_seq" does not exist ---
was thrown.
This could be solved by changing the <generator> tag in the 
ontology.hbm.xml
from
            <generator class="identity"/>

to
            <generator class="sequence">
                <param name="sequence">ontology_pk_seq</param>
            </generator>

(and similarly in the term.hbm.xml file).

I'm not sure if this is specific for my project or if it's a general 
problem.
Anyway, this works fine now, however another problem came up:

I want to enrich a Sequence that was downloaded from Genbank and (by 
enriching) save all the annotations in the RichSequence object.

Sequence seq = new GenbankSequenceDB().getSequence("NM_008160");
RichSequence s = RichSequence.Tools.enrich(seq);
tdb.addSequence(s);

(where tdb is a convenience wrapper for storing and retrieving sequences 
from the BioSQL-DB, it works with non-enriched sequences).

 From the debugging info I got, this works at the object level, but when 
I try to save the sequence to the DB, the following exception is thrown:


2006-03-08 18:35:00,642 ERROR [httpWorkerThread-28080-9]
 calling method: org.hibernate.util.JDBCExceptionReporter.logExceptions(JDBCExceptionReporter.java:72)
 *ERROR: duplicate key violates unique constraint "seqfeature_bioentry_id_key"*

2006-03-08 18:35:00,643 ERROR [httpWorkerThread-28080-9]
 calling method: org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:299)
 *Could not synchronize database state with session*
org.hibernate.exception.ConstraintViolationException: Could not execute JDBC batch update
        at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:69)
        at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:43)
        at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:202)
        at org.hibernate.jdbc.AbstractBatcher.prepareStatement(AbstractBatcher.java:91)
        at org.hibernate.jdbc.AbstractBatcher.prepareStatement(AbstractBatcher.java:86)
        at org.hibernate.jdbc.AbstractBatcher.prepareBatchStatement(AbstractBatcher.java:171)
        at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2048)
        at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2427)
        at org.hibernate.action.EntityInsertAction.execute(EntityInsertAction.java:51)
        at org.hibernate.engine.ActionQueue.execute(ActionQueue.java:243)
        at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:227)
        at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:140)
        at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:296)
        at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27)
        at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1009)
        at org.hibernate.impl.SessionImpl.managedFlush(SessionImpl.java:356)
        at org.hibernate.transaction.JDBCTransaction.commit(JDBCTransaction.java:106)
        at rnaiprediction.sequence.db.SequenceDB.addSequence(SequenceDB.java:67)
        at rnaiprediction.Queue.prerender(Queue.java:374)
......
 
*Caused by: java.sql.BatchUpdateException: Batch entry 0 insert into seqfeature (bioentry_id, source_term_id, type_term_id, display_name, rank, seqfeature_id) values (126, 269, 269, NULL, 0, 83) was aborted. Call getNextException to see the cause.*
        at org.postgresql.jdbc2.AbstractJdbc2Statement$BatchResultHandler.handleError(AbstractJdbc2Statement.java:2497)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1298)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:347)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeBatch(AbstractJdbc2Statement.java:2559)
        at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:58)
        at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:195)
        ... 57 more


Any suggestions would be highly appreciated!

Regards,
Felix


-- 
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité Mitte
Department of Immunology
Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany
Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426


From anderson.moura at telemar-rj.com.br  Thu Mar  9 11:59:37 2006
From: anderson.moura at telemar-rj.com.br (Anderson Moura da Silva)
Date: Thu Mar  9 12:06:21 2006
Subject: [Biojava-l] Alignmente algorithms implemented by BioJava
Message-ID: <3C39C09ED334F243838953854BE43FB6022BDD91@MAILBX02.telemar.corp.net>

Hi,
 
Can somebody tell me which algorithms BioJava uses for local alignments and multiple alignments?
I have been searching the documentation but have not found them.
 
Where could I find this? I'm working on a Java environment for analysis and alignment of sequences!
 
Thanks,
Anderson Moura - Brazil


This message, including its attachments, may contain privileged and/or confidential information and may not be retransmitted without the sender's authorization. If you are not the recipient or a person authorized to receive it, be advised that its use, disclosure, copying or archiving is prohibited. Therefore, if you received this message in error, please let us know by replying immediately to this e-mail and then delete it.


From sylvain.foisy at bioneq.qc.ca  Thu Mar  9 12:45:45 2006
From: sylvain.foisy at bioneq.qc.ca (Sylvain Foisy)
Date: Thu Mar  9 13:19:11 2006
Subject: [Biojava-l] Alignmente algorithms implemented by BioJava
In-Reply-To: <3C39C09ED334F243838953854BE43FB6022BDD91@MAILBX02.telemar.corp.net>
Message-ID: <C035D3F9.8E64%sylvain.foisy@bioneq.qc.ca>

Hi there,

The CVS code in BioJava contains the classes needed to perform both
Smith-Waterman and Needleman-Wunsch pairwise alignments. Just have a look at
the Javadocs for instructions. The preferred approach around here is to use
HMMs to perform the alignments; for this, look at the Cookbook section
of BioJava's new website for ways to do that
(http://biojava.open-bio.org).
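
In case it helps to see what Smith-Waterman actually computes, here is a
minimal plain-Java sketch of the local alignment score with a linear gap
penalty. This is not the BioJava implementation; the class name
SmithWatermanSketch and the scoring parameters are assumptions made for
illustration only.

```java
/** Hypothetical sketch of Smith-Waterman scoring; not BioJava's own classes. */
final class SmithWatermanSketch {
    /**
     * Returns the best local alignment score of a and b.
     * match is added for equal characters, mismatch for unequal ones,
     * and gap for each inserted gap (mismatch and gap are normally negative).
     */
    static int score(String a, String b, int match, int mismatch, int gap) {
        int[][] h = new int[a.length() + 1][b.length() + 1]; // row 0 and column 0 stay 0
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int diag = h[i - 1][j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch);
                int up = h[i - 1][j] + gap;
                int left = h[i][j - 1] + gap;
                // Clamping at 0 is what makes the alignment local rather than global.
                h[i][j] = Math.max(0, Math.max(diag, Math.max(up, left)));
                best = Math.max(best, h[i][j]);
            }
        }
        return best;
    }
}
```

For example, SmithWatermanSketch.score("ACGT", "TTACGTTT", 2, -1, -2) scores
the shared ACGT stretch and ignores the flanking T's.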

As far as MSA is concerned, I don't think there is anything in BJ that does
that. You could call clustal from within your program and use BJ's MSA
parsing classes to do your stuff.
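
Calling clustal from Java could look roughly like the sketch below. It
assumes a ClustalW executable is installed and accepts the -INFILE, -ALIGN
and -OUTFILE options; the executable name, the file names and the class
ClustalRunner are placeholders, not part of BioJava.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical wrapper that runs an external ClustalW binary. */
final class ClustalRunner {
    /** Builds the command line; exe is whatever your ClustalW binary is called. */
    static List<String> command(String exe, String inFile, String outFile) {
        return Arrays.asList(exe, "-INFILE=" + inFile, "-ALIGN", "-OUTFILE=" + outFile);
    }

    /** Runs the aligner and returns its exit code (0 normally means success). */
    static int run(String exe, String inFile, String outFile)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(exe, inFile, outFile))
                .inheritIO() // show the aligner's own output on our console
                .start();
        return process.waitFor();
    }
}
```

The resulting alignment file can then be handed to whichever parser you use
for MSA files.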

Hope this helps

Sylvain


==================================================================
Sylvain Foisy, Ph. D.
Directeur - operations / Project Manager
BioneQ - Reseau quebecois de bio-informatique
U. de Montreal / Genome-Quebec

Adresse postale:

Departement de biochimie
Pavillon principal
2900, boul. Édouard-Montpetit
Montréal (Québec) H3T 1J4

Tel: (514) 343-6111 x.2545
Fax: (514) 343-7759
Courriel: sylvain.foisy@bioneq.qc.ca
==================================================================


From anderson.moura at telemar-rj.com.br  Thu Mar  9 13:17:59 2006
From: anderson.moura at telemar-rj.com.br (Anderson Moura da Silva)
Date: Thu Mar  9 13:22:46 2006
Subject: RES: [Biojava-l] Alignmente algorithms implemented by BioJava
Message-ID: <3C39C09ED334F243838953854BE43FB6022BE082@MAILBX02.telemar.corp.net>

Thanks, these 3 are exactly what I was looking for!!

I'm very new to bioinformatics, so may I ask whether there are other algorithms that are widely used? I only know these 3 (SW, NW and HMM)!!

Thanks
Anderson Moura - Brazil


From guedes at unisul.br  Thu Mar  9 14:33:59 2006
From: guedes at unisul.br (Dickson S. Guedes)
Date: Thu Mar  9 15:37:43 2006
Subject: [Biojava-l] Alignmente algorithms implemented by BioJava
In-Reply-To: <3C39C09ED334F243838953854BE43FB6022BDD91@MAILBX02.telemar.corp.net>
References: <3C39C09ED334F243838953854BE43FB6022BDD91@MAILBX02.telemar.corp.net>
Message-ID: <44108327.9040009@unisul.br>

Anderson Moura da Silva wrote:
> Hi,
>  
> Can somebody tell me which algorithms BioJava uses for local alignments and multiple alignments?
> I have been searching the documentation but have not found them.
>  
> Where could I find this? I'm working on a Java environment for analysis and alignment of sequences!
>  
> Thanks,
> Anderson Moura - Brazil

Sorry all, but I'll reply to this message in my native language.


Hello Anderson,

BioJava does not implement a class for multiple alignments, but you can 
do pairwise alignment using dynamic programming (DP); there is an example 
in the CookBook at the link below:

- http://biojava.open-bio.org/wiki/BioJava:CookBook:DP:PairWise

You can also do multiple alignments using the STRAP package; there is an 
example at:

- http://www.charite.de/bioinf/strap/biojavaInAnger_SequenceAligner.html
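
To give an idea of what pairwise dynamic programming does, here is a minimal 
plain-Java sketch of the Needleman-Wunsch global alignment score. Note that 
this is not the HMM-based approach from the CookBook page; the class name 
NeedlemanWunschSketch and the scoring values are assumptions chosen only 
for illustration.

```java
/** Hypothetical sketch of Needleman-Wunsch scoring; not BioJava's DP package. */
final class NeedlemanWunschSketch {
    /** Returns the best global alignment score of a and b with a linear gap penalty. */
    static int score(String a, String b, int match, int mismatch, int gap) {
        int n = a.length();
        int m = b.length();
        int[][] f = new int[n + 1][m + 1];
        // Aligning a prefix against nothing costs one gap per character.
        for (int i = 1; i <= n; i++) f[i][0] = i * gap;
        for (int j = 1; j <= m; j++) f[0][j] = j * gap;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int diag = f[i - 1][j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch);
                int up = f[i - 1][j] + gap;   // gap in b
                int left = f[i][j - 1] + gap; // gap in a
                f[i][j] = Math.max(diag, Math.max(up, left));
            }
        }
        return f[n][m]; // the score of aligning both full sequences
    }
}
```

For instance, NeedlemanWunschSketch.score("ACGT", "AGT", 1, -1, -1) aligns 
three matching letters and pays for one gap.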

Regards,
-- 
Dickson S. Guedes
/*
  * UNISUL - Universidade do Sul de Santa Catarina
  * ATI - Assessoria de Tecnologia da Informação
  * (0xx48) 621-3200 - http://www.unisul.br
  *
  *    "Quis custodiet ipsos custodes?"
  */
From dreher at mpiib-berlin.mpg.de  Fri Mar 10 10:37:21 2006
From: dreher at mpiib-berlin.mpg.de (Felix Dreher)
Date: Fri Mar 10 10:33:42 2006
Subject: [Biojava-l] Problem: Hibernate - RichSequence Annotation
In-Reply-To: <OF7F997996.426B09C6-ON4825712C.00092887-4825712C.000CB319@EU.novartis.net>
References: <OF7F997996.426B09C6-ON4825712C.00092887-4825712C.000CB319@EU.novartis.net>
Message-ID: <44119D31.6010703@mpiib-berlin.mpg.de>

Hi Mark,

I changed my code and now use the new 
GenbankRichSequenceDB.getRichSequence(String id) method, and this solved 
my problem.
However I had to change some of the hbm.xml files again. I will commit 
these to CVS.
Thanks again for your help.

Regards,
Felix


-- 
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité Mitte
Department of Immunology
Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany
Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426

From emy_66 at hotmail.com  Mon Mar 13 00:39:41 2006
From: emy_66 at hotmail.com (Emily Wong)
Date: Mon, 13 Mar 2006 05:39:41 +0000
Subject: [Biojava-l] Blast parser for ncbi blast version 2.2.13
Message-ID: <BAY103-F3466856600F5B0BB2AE6EEE1E00@phx.gbl>

Hi,

Is there a parser that handles NCBI BLAST version 2.2.13 (the one on their 
website)? I am trying to parse its output with the code here: 
http://www.biojava.org/docs/bj_in_anger/BlastParser.htm . If I switch the 
parser from strict to lazy mode I get this exception:
Exception in thread "main" java.lang.NullPointerException
	at org.biojava.bio.program.sax.BlastSAXParser.interpret(BlastSAXParser.java:215)
	at org.biojava.bio.program.sax.BlastSAXParser.parse(BlastSAXParser.java:164)
	at org.biojava.bio.program.sax.BlastLikeSAXParser.onNewDataSet(BlastLikeSAXParser.java:311)
	at org.biojava.bio.program.sax.BlastLikeSAXParser.interpret(BlastLikeSAXParser.java:274)
	at org.biojava.bio.program.sax.BlastLikeSAXParser.parse(BlastLikeSAXParser.java:160)
	at BlastParser.main(BlastParser.java:46)

Thanks,

Emily


From mark.schreiber at novartis.com  Mon Mar 13 20:07:39 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 14 Mar 2006 09:07:39 +0800
Subject: [Biojava-l] Blast parser for ncbi blast version 2.2.13
Message-ID: <OF775F959B.A4A8F50B-ON48257131.00061EEB-48257131.0006322C@EU.novartis.net>

Possibly some variation in this output is causing the problem.

Can you post some blast output that replicates this error?

Thanks

- Mark


From christoph.gille at charite.de  Tue Mar 14 02:47:10 2006
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Tue, 14 Mar 2006 08:47:10 +0100 (CET)
Subject: [Biojava-l] alignment algor in Biojava
In-Reply-To: <mailman.1077.1142298882.2006.biojava-l@lists.open-bio.org>
References: <mailman.1077.1142298882.2006.biojava-l@lists.open-bio.org>
Message-ID: <64617.84.190.58.246.1142322430.squirrel@webmail.charite.de>

> Hi,
>
> Can somebody tell me which algorithms BioJava uses for local alignments
> and multiple alignments? I have been searching the documentation but have
> not found them.

at the bottom of the page
http://www.biojava.org/docs/bj_in_anger/index.htm

http://www.charite.de/bioinf/strap/Scripting.html#SequenceAligner
Cheers

Christoph


From koeberle at mpiib-berlin.mpg.de  Tue Mar 14 07:28:37 2006
From: koeberle at mpiib-berlin.mpg.de (Christian Köberle)
Date: Tue, 14 Mar 2006 13:28:37 +0100
Subject: [Biojava-l] Feature + BioJAVA-X + BioSQL ?
Message-ID: <4416B6F5.1050905@mpiib-berlin.mpg.de>

Hi,
I am trying to write a Sequence object into a BioSQL DB using the classes of 
BioJAVA-X.
This works well. But if I try to save a Sequence object with two (or 
more) Features that have the same type and the same source, 
writing to the DB fails.
Is it wrong to have more than one Feature with the same type and 
source on one Sequence, or is this a bug in BioJAVA / BioJAVA-X or BioSQL?

Thanks,
Christian

The error message:
org.hibernate.StaleStateException: Batch update returned unexpected row count from update: 0 actual row count: 0 expected: 1
        at org.hibernate.jdbc.BatchingBatcher.checkRowCount(BatchingBatcher.java:93)
        at org.hibernate.jdbc.BatchingBatcher.checkRowCounts(BatchingBatcher.java:79)
        at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:58)
        at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:195)
        at org.hibernate.jdbc.AbstractBatcher.prepareStatement(AbstractBatcher.java:91)
        at org.hibernate.jdbc.AbstractBatcher.prepareStatement(AbstractBatcher.java:86)
        at org.hibernate.jdbc.AbstractBatcher.prepareBatchStatement(AbstractBatcher.java:171)
        at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2048)
        at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2427)
        at org.hibernate.action.EntityInsertAction.execute(EntityInsertAction.java:51)
        at org.hibernate.engine.ActionQueue.execute(ActionQueue.java:243)
        at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:227)
        at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:140)
        at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:296)
        at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27)
        at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1009)
        at org.hibernate.impl.SessionImpl.managedFlush(SessionImpl.java:356)
        at org.hibernate.transaction.JDBCTransaction.commit(JDBCTransaction.java:106)

-- 
Christian Köberle

Max Planck Institute for Infection Biology
Department: Immunology
Schumannstr. 21/22
10117 Berlin

Tel: +49 30 28 460 562
e-mail: koeberle at mpiib-berlin.mpg.de


From koeberle at mpiib-berlin.mpg.de  Tue Mar 14 09:06:43 2006
From: koeberle at mpiib-berlin.mpg.de (=?ISO-8859-1?Q?Christian_K=F6berle?=)
Date: Tue, 14 Mar 2006 15:06:43 +0100
Subject: [Biojava-l] BioJAVA-X + BioSQL + no update
Message-ID: <4416CDF3.4000407@mpiib-berlin.mpg.de>

Hi,
I have the following problem.
I put a RichSequence object into a BioSQL database, using the new classes 
from BioJAVA-X.
Later I retrieve this Sequence object from the BioSQL database (also with 
BioJAVA-X), create new Feature objects and Note objects, and add them to 
the Sequence object.
With BioJAVA 1.4 all Features and Annotations are written into the BioSQL 
database. With BioJAVA-X there are no changes in the database.
Does BioJAVA-X include a method to update the BioSQL database, or how 
else can I write the changes to the database?

Thanks,
Christian

-- 
Christian Köberle

Max Planck Institute for Infection Biology
Department: Immunology
Schumannstr. 21/22
10117 Berlin

Tel: +49 30 28 460 562
e-mail: koeberle at mpiib-berlin.mpg.de


From bubba.puryear at gmail.com  Mon Mar 13 12:27:11 2006
From: bubba.puryear at gmail.com (Bubba Puryear)
Date: Mon, 13 Mar 2006 12:27:11 -0500
Subject: [Biojava-l] biojavax GenbankFormat and legacy genbank records
Message-ID: <d2f7533b0603130927g26e08bc6o975cac05a9df1f72@mail.gmail.com>

 Hello,

  I work on a webapp for a biotech company that uses biojava to parse
plasmid and feature maps (genbank flatfile format)  and we store them in a
local database. I've wanted to update the version of biojava we use because
the current CVS parser handles features that cross the origin on plasmid
maps much better than the parser in 1.4.

  However, we have a lot of data in various databases that have genbank
records formatted in some of the older incarnations of the GenBank flat
file format. In particular, some feature maps don't have ACCESSION fields,
and/or are missing modification dates and genbank divisions on the LOCUS
line. When I try to parse one of those maps with biojavax, I get parse errors.

  Should there perhaps be a LegacyGenbankFormat or should the GenbankFormat
class be made more tolerant? I know NCBI made several changes to their
flatfile format in part  because writing parsers for the older specs was
tricky. So I'm not sure which direction the bio* folks would like to go with
this.

  I've attached a small example map that causes parse problems. The data in
the map is completely bogus, but the structure was taken from a real map
file I have to deal with.

  The following code snippet illustrates my problems:

        BufferedReader br = new BufferedReader(new StringReader(genbankContent));
        try {
            RichSequenceIterator sequences = IOTools.readGenbankDNA(br, null);
            if (sequences.hasNext()) {
                this.sequence = sequences.nextRichSequence();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }


  where genbankContent is a String containing the contents of the attached
file.

Thanks much,
Bubba Puryear
-------------- next part --------------
A non-text attachment was scrubbed...
Name: foo.gb
Type: chemical/seq-na-genbank
Size: 1091 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/biojava-l/attachments/20060313/b56af1d0/attachment.bin 

From mira.edelstein at gmx.net  Tue Mar 14 17:30:01 2006
From: mira.edelstein at gmx.net (Mira)
Date: Tue, 14 Mar 2006 23:30:01 +0100
Subject: [Biojava-l] (no subject)
Message-ID: <001501c647b6$d5954f70$9b7ba8c0@mecom>

please take me from the mailing list

thanks mira

From mark.schreiber at novartis.com  Wed Mar 15 01:42:59 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Wed, 15 Mar 2006 14:42:59 +0800
Subject: [Biojava-l] Feature + BioJAVA-X + BioSQL ?
Message-ID: <OFF9C6B6C2.B599605D-ON48257132.0023FE17-48257132.0024E58C@EU.novartis.net>

This could be a bug; this is bleeding-edge development code.

Are you using the most up to date CVS code? Also which database are you 
using?

As a suggestion RichFeatures with the same Type, Source and Parent 
sequence can only be distinguished by rank (In BioSQL and BioJavaX). Can 
you persist them to the DB if you give one a different rank?
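
A minimal sketch of the rank suggestion above, in plain Java with no BioJava 
dependency. FeatureStub and RankAssigner are hypothetical names used only for 
illustration; the assumption is that features on one sequence which share 
type and source need distinct ranks to be stored as separate rows. The ranks 
computed here would then have to be applied to the real RichFeature objects 
before saving.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a feature: only the fields that matter here.
final class FeatureStub {
    final String type;
    final String source;
    int rank; // stays 0 until assignRanks has run

    FeatureStub(String type, String source) {
        this.type = type;
        this.source = source;
    }
}

final class RankAssigner {
    // Gives each feature a rank that is unique among the features with the
    // same (type, source) pair, counting up from 1 in list order.
    static void assignRanks(List<FeatureStub> features) {
        Map<String, Integer> lastRank = new HashMap<>();
        for (FeatureStub f : features) {
            String key = f.type + "\t" + f.source; // tab keeps the two parts apart
            f.rank = lastRank.merge(key, 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        List<FeatureStub> features = new ArrayList<>();
        features.add(new FeatureStub("CDS", "GenBank"));
        features.add(new FeatureStub("CDS", "GenBank"));
        features.add(new FeatureStub("gene", "GenBank"));
        assignRanks(features);
        for (FeatureStub f : features) {
            System.out.println(f.type + " / " + f.source + " -> rank " + f.rank);
        }
    }
}
```

Counting per (type, source) pair rather than globally keeps ranks small and 
leaves features of other types untouched.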

- Mark


Christian Köberle <koeberle at mpiib-berlin.mpg.de>
Sent by: biojava-l-bounces at lists.open-bio.org
03/14/2006 08:28 PM

 
        To:     bio java mailing list <biojava-l at biojava.org>
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [Biojava-l] Feature + BioJAVA-X + BioSQL ?



_______________________________________________
Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l


From mark.schreiber at novartis.com  Wed Mar 15 02:02:02 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Wed, 15 Mar 2006 15:02:02 +0800
Subject: [Biojava-l] BioJAVA-X + BioSQL + no update
Message-ID: <OF1DCB13C9.C7738C71-ON48257132.0024F81A-48257132.0026A3EF@EU.novartis.net>

With BioJavaX, if you want any changes to a RichSequence object to persist 
to the database you need to "save or update" it with Hibernate. 


SessionFactory sessionFactory = new Configuration().configure().buildSessionFactory();
Session session = sessionFactory.openSession();
RichObjectFactory.connectToBioSQL(session);

RichSequence rs = ...;                // some sequence you've made or modified
session.saveOrUpdate("Sequence",rs);  // persist the sequence

***
Another way is to do everything inside a transaction (this example is from 
the BioJavaX docbook in CVS)

SessionFactory sessionFactory = new Configuration().configure().buildSessionFactory();
Session session = sessionFactory.openSession();
RichObjectFactory.connectToBioSQL(session);

Transaction tx = session.beginTransaction();
try {

    // print out all the namespaces in the database

    Query q = session.createQuery("from Namespace");
    List namespaces = q.list();               // retrieve all the namespaces from the db
    for (Iterator i = namespaces.iterator(); i.hasNext(); ) {
        Namespace ns = (Namespace)i.next();
        System.out.println(ns.getName());     // print out the name of the namespace

        // print out all the sequences in the namespace
        Query sq = session.createQuery("from BioEntry where namespace= :nsp");
        // set the named parameter "nsp" to ns
        sq.setParameter("nsp",ns);
        List sequences = sq.list();

        for (Iterator j = sequences.iterator(); j.hasNext(); ) {
            BioEntry be = (BioEntry)j.next();        // RichSequences are BioEntrys too
            System.out.println("   "+be.getName());  // print out the name of the sequence

            // if the sequence is called bloggs, change its description to XYZ

            if (be.getName().equals("bloggs")) {
                be.setDescription("XYZ");
            }
        }

    }

    // commit and tidy up
    tx.commit();
    System.out.println("Changes committed.");

    // all sequences called bloggs now have a description of "XYZ" in the database

} catch (Exception e) {
    tx.rollback();
    System.out.println("Changes rolled back.");
    e.printStackTrace();
}

session.close();


Christian Köberle <koeberle at mpiib-berlin.mpg.de>
Sent by: biojava-l-bounces at lists.open-bio.org
03/14/2006 10:06 PM

 
        To:     bio java mailing list <biojava-l at biojava.org>
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [Biojava-l] BioJAVA-X + BioSQL + no update




From mark.schreiber at novartis.com  Wed Mar 15 02:11:55 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Wed, 15 Mar 2006 15:11:55 +0800
Subject: [Biojava-l] biojavax GenbankFormat and legacy genbank records
Message-ID: <OFCB3369C9.A20B0E95-ON48257132.002727B7-48257132.00278B93@EU.novartis.net>

Hi -

I'm happy for the regexps in GenbankFormat and EMBLFormat etc to be 
relaxed a little as long as the parsing of fully valid genbank files 
doesn't suffer. If someone wants to test this thoroughly it would be a 
great benefit to the whole community.

In some cases it may not be possible. For example if a feature doesn't 
have sufficient information to build a proper RichFeature object I don't 
think we should allow the file.
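
As a rough illustration of what "relaxed a little" could mean for the LOCUS 
line, here is a sketch of a pattern in which the trailing fields are 
optional. This is not the pattern GenbankFormat uses; LocusInfo, 
LenientLocusParser and the regular expression itself are assumptions made up 
for this example, and a real change would need testing against many valid 
files.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder for the fields pulled out of a LOCUS line.
final class LocusInfo {
    final String name;
    final int length;
    final String division; // null when the line omits it
    final String date;     // null when the line omits it

    LocusInfo(String name, int length, String division, String date) {
        this.name = name;
        this.length = length;
        this.division = division;
        this.date = date;
    }
}

final class LenientLocusParser {
    // Name, length and unit are required. Molecule type, topology,
    // division and date are each optional, so older records still match.
    private static final Pattern LOCUS = Pattern.compile(
            "^LOCUS\\s+(\\S+)\\s+(\\d+)\\s+(?:bp|aa)"
            + "(?:\\s+\\S*(?:DNA|RNA))?"
            + "(?:\\s+(?:linear|circular))?"
            + "(?:\\s+([A-Z]{3}))?"
            + "(?:\\s+(\\d{1,2}-[A-Z]{3}-\\d{4}))?"
            + "\\s*$");

    // Returns the parsed fields, or empty if the line is not a usable LOCUS line.
    static Optional<LocusInfo> parse(String line) {
        Matcher m = LOCUS.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new LocusInfo(
                m.group(1), Integer.parseInt(m.group(2)), m.group(3), m.group(4)));
    }

    public static void main(String[] args) {
        String legacy = "LOCUS       pFOO        1091 bp    DNA     circular";
        parse(legacy).ifPresent(info ->
                System.out.println(info.name + " " + info.length + " division=" + info.division));
    }
}
```

A line with no name or length still fails to match, which fits the point 
above that records lacking essential information should be rejected rather 
than guessed at.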

It might be good to make a collection in CVS of example files that are 
known to have broken the parser in the past (the files folder in the test 
suite would be an ideal place).

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910


"Bubba Puryear" <bubba.puryear at gmail.com>
Sent by: biojava-l-bounces at lists.open-bio.org
03/14/2006 01:27 AM

 
        To:     biojava-l at lists.open-bio.org
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [Biojava-l] biojavax GenbankFormat and legacy genbank records



[ Attachment ''FOO.GB'' removed by Mark Schreiber ]


From koeberle at mpiib-berlin.mpg.de  Thu Mar 16 05:03:26 2006
From: koeberle at mpiib-berlin.mpg.de (=?ISO-8859-1?Q?Christian_K=F6berle?=)
Date: Thu, 16 Mar 2006 11:03:26 +0100
Subject: [Biojava-l] BioJAVA-X + BioSQL + no update
In-Reply-To: <OF1DCB13C9.C7738C71-ON48257132.0024F81A-48257132.0026A3EF@EU.novartis.net>
References: <OF1DCB13C9.C7738C71-ON48257132.0024F81A-48257132.0026A3EF@EU.novartis.net>
Message-ID: <441937EE.6000204@mpiib-berlin.mpg.de>

Hi  Mark,

it works but the code has to look like that:
...
        session.getTransaction().begin();
        session.saveOrUpdate("Sequence",seq);       
        session.getTransaction().commit();
it also works with:
       session.update("Sequence",seq); 
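
The begin / save / commit sequence above can be wrapped once so that a 
failure always rolls back. The sketch below shows only the pattern and does 
not use Hibernate: SimpleTx, TxTemplate and RecordingTx are hypothetical 
stand-ins. With Hibernate, the transaction would come from 
session.beginTransaction() and the work would be the 
session.saveOrUpdate("Sequence", seq) call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal transaction interface, standing in for a real one.
interface SimpleTx {
    void commit();
    void rollback();
}

final class TxTemplate {
    // Runs the work and commits. If the work or the commit throws,
    // rolls back and rethrows the original exception.
    static void inTransaction(SimpleTx tx, Runnable work) {
        try {
            work.run();
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        }
    }
}

// Fake transaction that only records which calls were made.
final class RecordingTx implements SimpleTx {
    final List<String> events = new ArrayList<>();

    @Override
    public void commit() {
        events.add("commit");
    }

    @Override
    public void rollback() {
        events.add("rollback");
    }

    public static void main(String[] args) {
        RecordingTx tx = new RecordingTx();
        TxTemplate.inTransaction(tx, () -> System.out.println("saving sequence"));
        System.out.println(tx.events);
    }
}
```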

Thanks,
Christian



-- 
Christian Köberle

Max Planck Institute for Infection Biology
Department: Immunology
Schumannstr. 21/22
10117 Berlin

Tel: +49 30 28 460 562
e-mail: koeberle at mpiib-berlin.mpg.de


From mark.schreiber at novartis.com  Thu Mar 16 21:50:34 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 17 Mar 2006 10:50:34 +0800
Subject: [Biojava-l] ProfileHMM Serialization Problem
Message-ID: <OF6364F467.02ADA9D1-ON48257134.000F8AD9-48257134.000F9E14@EU.novartis.net>

He did fix a number of problems, although possibly not all.

Which version are you using?

Can you send a stack trace?

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910


Todd Riley <toddri at eden.rutgers.edu>
03/17/2006 10:33 AM

 
        To:     Mark Schreiber/GP/Novartis at PH
        cc:     biojava-l-bounces at portal.open-bio.org, biojava-l at biojava.org
        Subject:        ProfileHMM Serialization Problem




From mark.schreiber at novartis.com  Thu Mar 16 21:52:52 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 17 Mar 2006 10:52:52 +0800
Subject: [Biojava-l] Away
Message-ID: <OFE32BF527.53630C7A-ON48257134.000FA01A-48257134.000FD39F@EU.novartis.net>

Hello -

I'm going to be travelling a lot in the next 5 weeks and may only have 
patchy access to email and no access to CVS or my development machines. 
Therefore I won't be able to offer much in the way of technical support. 

Hopefully Richard and Michael will be able to deal with any major issues 
that crop up.

Best regards,

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910


From toddri at eden.rutgers.edu  Thu Mar 16 21:33:05 2006
From: toddri at eden.rutgers.edu (Todd Riley)
Date: Thu, 16 Mar 2006 21:33:05 -0500
Subject: [Biojava-l] ProfileHMM Serialization Problem
In-Reply-To: <OFE0127171.ED2CB90B-ON4825712C.000590AE-4825712C.0005B129@EU.novartis.net>
References: <OFE0127171.ED2CB90B-ON4825712C.000590AE-4825712C.0005B129@EU.novartis.net>
Message-ID: <441A1FE1.9000508@eden.rutgers.edu>

  Hello all,

I am having a problem with serialized ProfileHMM objects.  I can read in 
one serialized ProfileHMM object, but never more than one (I can't even 
read in the same serialized object again.)  It appears that the problem 
lies with the AlphabetManager. Maybe a clash with alphabet names and/or 
indexes?  I looked in the archives and found the problem seemed to exist 
back in Oct of 2002.  Has this ever been addressed?  Any help here would 
be greatly appreciated.
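
For background, one general way that registry-managed (flyweight) objects 
and Java serialization can clash is that reading an object back creates a 
second copy next to the registered one. The sketch below shows the standard 
readResolve remedy in plain Java. It is not a diagnosis of the BioJava 
behaviour described here; Flyweight and FlyweightDemo are hypothetical 
names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// One shared instance per name, looked up through a registry.
final class Flyweight implements Serializable {
    private static final long serialVersionUID = 1L;
    private static final Map<String, Flyweight> REGISTRY = new ConcurrentHashMap<>();

    private final String name;

    private Flyweight(String name) {
        this.name = name;
    }

    static Flyweight forName(String name) {
        return REGISTRY.computeIfAbsent(name, Flyweight::new);
    }

    String getName() {
        return name;
    }

    // Called by Java serialization after reading: swap the fresh copy
    // for the registered instance so identity is preserved.
    private Object readResolve() {
        return forName(name);
    }
}

final class FlyweightDemo {
    // Serializes an object to memory and reads it straight back.
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Flyweight dna = Flyweight.forName("DNA");
        System.out.println(roundTrip(dna) == dna);
    }
}
```

Without the readResolve method, each read would yield a new object that is 
not the registered instance, so identity comparisons against the registry 
would fail.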

Thanks,
Todd


  RE: [Biojava-l] Re: Biojava-l digest, Vol 1 #776 - 5 msgs

Schreiber, Mark
Tue, 08 Oct 2002 13:11:33 -0700

Yup,

It needs fixing, serialization and BioJava just don't seem to play that
well :(

The question is what kind of API. The attractive part about
serialization is that when it works you get back what you started with.
You can also do RMI. The downside of the XML model is you don't get back
what you had before, you get back a MarkovModel, all of your custom
designed methods etc are lost. 

Two ways I can see to get around this. One: write a wrapper class that
makes your custom model and the thing returned by the XMLMarkovModel
look the same (look like the same interface generally). The other option
is to mimic something like JAXB (not JAXB though as it won't cope well
with BioJava flyweight symbols and alphabets). Somewhere the class name
is stored in the XML and through the wonders of introspection things are
returned to how they were. This generally requires the class to be
designed as a valid bean, or at least point to a nice FactoryClass or
something.
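
To make the second option concrete, here is a minimal sketch of restoring an 
object from a stored class name by reflection, in plain Java. 
RestorableModel, CustomHmm and ModelRestorer are hypothetical names; the 
assumption is that the class name has already been read out of the XML and 
that the class has a no-argument constructor, as a bean would.

```java
import java.lang.reflect.Constructor;

// Hypothetical common interface for models that can be restored by name.
interface RestorableModel {
    String describe();
}

// Hypothetical custom model with a no-argument constructor.
final class CustomHmm implements RestorableModel {
    CustomHmm() {
    }

    @Override
    public String describe() {
        return "custom HMM";
    }
}

final class ModelRestorer {
    // Loads the named class, checks that it is a RestorableModel and
    // creates an instance through its no-argument constructor.
    static RestorableModel restore(String className) throws ReflectiveOperationException {
        Class<? extends RestorableModel> cls =
                Class.forName(className).asSubclass(RestorableModel.class);
        Constructor<? extends RestorableModel> ctor = cls.getDeclaredConstructor();
        return ctor.newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // In a real format the name would come from the stored XML.
        String storedName = CustomHmm.class.getName();
        System.out.println(restore(storedName).describe());
    }
}
```

The restored object has the original class again, so custom methods are 
available after a cast; its state would still have to be filled in from the 
rest of the stored data.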

Ultimately this would be good for all of BioJava. I know people hate the
idea of another XML format but I think that there really isn't one that
represents what we are trying to do here. You could also write XSLT to
transform into XML flavours that aren't as interested in gory details
such as classnames etc which are needed for serialization.

Just my $0.02

- Mark


> -----Original Message-----
> From: Matthew Pocock [mailto:[EMAIL PROTECTED] <mailto:%5BEMAIL%20PROTECTED%5D>] 
> Sent: Wednesday, 9 October 2002 7:08 a.m.
> To: Lachlan Coin; [EMAIL PROTECTED]
> Subject: Re: [Biojava-l] Re: Biojava-l digest, Vol 1 #776 - 5 msgs
> 
> 
> Hi,
> 
> HMM serialization (or persistance) seems to be an
> ongoing problem for people. We (OK - I) wrote this
> code a long time ago, back in the dark ages when I
> didn't know much about programming. Does anyone want
> to fix this mess once and for all, and write a HMM
> persistance API? It sounds like that would be a realy
> helpfull thing to have.
> 
> Matthew
> 
>  --- Lachlan Coin <[EMAIL PROTECTED]> wrote: > Hi
> > 
> > Having made a mistake in serialising HMMs before -
> > are you writing your
> > serialised object at several points in the code?
> > Unless you write all of
> > the models at the same point, they will not work
> > when you read them back
> > in.
> > 
> > Cheers,
> > 
> > Lachlan
> > 
> > >
> > > Message: 1
> > > Subject: RE: [Biojava-l] Create DP object from
> > profileHMM class file
> > > Date: Tue, 8 Oct 2002 08:53:41 +1300
> > > From: "Schreiber, Mark"
> > <[EMAIL PROTECTED]>
> > > To: "Tisanai" <[EMAIL PROTECTED]>,
> > <[EMAIL PROTECTED]>
> > >
> > > Hi -
> > >
> > > The error is coming from the 64th line of your
> > program (at
> > > T_Zscore.main(T_Zscore.java:64))
> > >
> > > I can see two places that the error might be
> > coming from but I need to
> > > know which line is the 64th line of the program.
> > >
> > > Is it: ProfileHMM model = (ProfileHMM)
> > ois_md.readObject();
> > >
> > > Or is it: dp[i] =
> > DPFactory.DEFAULT.createDP(model);
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Tisanai
> > [mailto:[EMAIL PROTECTED]]
> > > > Sent: Tuesday, 8 October 2002 2:40 a.m.
> > > > To: [EMAIL PROTECTED]
> > > > Subject: [Biojava-l] Create DP object from
> > profileHMM class file
> > > >
> > > >
> > > > Hi
> > > >
> > > >    With this code I would like to create DP objects from several
> > > > phmm files.
> > > >
> > > >        for(int i=0;i<md_out_lst.align.length;i++){
> > > >         String model_out_name = md_out_lst.align[i];
> > > >         File md_file = new File(model_out_name);
> > > >
> > > >         FileInputStream fis_md = new FileInputStream(md_file);
> > > >         ObjectInputStream ois_md = new ObjectInputStream(fis_md);
> > > >         ProfileHMM model = (ProfileHMM) ois_md.readObject();
> > > >         ois_md.close();
> > > >         dp[i] = DPFactory.DEFAULT.createDP(model);
> > > >        }
> > > >
> > > >    I found that it always gets stuck at the second file (i=1). If
> > > > there is only one file in my list this code works fine, but if there
> > > > is more than one file in the list, then when it tries to create the
> > > > second DP object (dp[1]) this kind of error is shown:
> > > >
> > > >     org.biojava.bio.BioError: State d-15 is known in states but is not listed in the transFrom table
> > > >         at org.biojava.bio.dp.SimpleMarkovModel.transitionsFrom(SimpleMarkovModel.java:227)
> > > >         at org.biojava.bio.dp.DP$HMMOrderByTransition.transitionsTo(DP.java:599)
> > > >         at org.biojava.bio.dp.DP$HMMOrderByTransition.compare(DP.java:586)
> > > >         at org.biojava.bio.dp.DP.stateList(DP.java:123)
> > > >         at org.biojava.bio.dp.DP.update(DP.java:353)
> > > >         at org.biojava.bio.dp.onehead.SingleDP.update(SingleDP.java:49)
> > > >         at org.biojava.bio.dp.DP.<init>(DP.java:377)
> > > >         at org.biojava.bio.dp.onehead.SingleDP.<init>(SingleDP.java:41)
> > > >         at org.biojava.bio.dp.DPFactory$DefaultFactory.createDP(DPFactory.java:53)
> > > >         at T_Zscore.main(T_Zscore.java:64)
> > > >
> > > >     How can I fix my code?
> > > >
> > > > Thank
> > > > Tisanai
> > > >
> > > > _______________________________________________
> > > > Biojava-l mailing list  -  [EMAIL PROTECTED] 
> > > > http://biojava.org/mailman/listinfo/biojava-l
> > > >
> > >
> >
> ==============================================================
> =========
> > > Attention: The information contained in this
> > message and/or attachments
> > > from AgResearch Limited is intended only for the
> > persons or entities
> > > to which it is addressed and may contain
> > confidential and/or privileged
> > > material. Any review, retransmission,
> > dissemination or other use of, or
> > > taking of any action in reliance upon, this
> > information by persons or
> > > entities other than the intended recipients is
> > prohibited by AgResearch
> > > Limited. If you have received this message in
> > error, please notify the
> > > sender immediately.
> > >
> >
> ==============================================================
> =========
> > 
> 
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================
_______________________________________________
Biojava-l mailing list  -  [EMAIL PROTECTED]
http://biojava.org/mailman/listinfo/biojava-l


From er.sukhdeepsingh at gmail.com  Fri Mar 17 06:21:16 2006
From: er.sukhdeepsingh at gmail.com (Sukhdeep Singh)
Date: Fri, 17 Mar 2006 16:51:16 +0530
Subject: [Biojava-l] need help
Message-ID: <40fbb41e0603170321p572b04cdj20d8e84ae5fb3977@mail.gmail.com>

Hello guys,
I am Sukhdeep Singh, a 2nd-year student at Ambala College of Engineering &
Applied Research.

  I am very dedicated to bioinformatics and want to do something
great in it.
  I have also done basic and advanced courses in bioinformatics during my
15-day winter vacation.
I have learned the functions of some software such as RASMOL, SWISSPDB,
CN3D (V3.1), CLUSTAL-X and HYPERCAM (V7.5 student evaluation version).
  I am very dedicated to it because I have a good knowledge of
computers, having used them for about 4 years, but only a moderate
knowledge of biology.
I am also familiar with databases like KEGG, NCBI, PUBMED, ENTREZ etc.
So I would like you to help me by pointing me to any tutorial for
BioJava or BioPerl, or any institute offering training in bioinformatics or
any related subject, for about 45 days in
July-August.

So please, friends, just help me out with this.

Reply to me at er.sukhdeepsingh at gmail.com

SUKHDEEP SINGH


From dag at sonsorol.org  Tue Mar 21 12:55:11 2006
From: dag at sonsorol.org (Chris Dagdigian)
Date: Tue, 21 Mar 2006 12:55:11 -0500
Subject: [Biojava-l] Important OBF update for biojava developers and users
Message-ID: <EA431336-80D3-4430-8690-73A33AAC3A55@sonsorol.org>


Executive summary:

biojava.org new DNS is propagating as I write this email. Eventually  
everyone should see the new wiki-based site running on the new OBF  
server hardware.  Read on for more info on some other upcoming  
changes...


Hi biojava people,

Sorry for the interruption but I've got some important site and  
server news. People will also see multiple copies of this note as I  
slowly transition sites over one at a time.

We are in the midst of moving all of our websites, mailing lists,  
developers and sourcecode repositories onto more modern hardware  
located in a 2nd Boston area datacenter facility.

The transition is important for a couple of reasons - the most urgent  
being that we are going to lose internet connectivity in our current  
hosting facility on March 27th 2006.  That datacenter belongs to  
Wyeth Research in Cambridge, Massachusetts.  Wyeth Research &  
Genetics Institute have been long time significant supporters &  
hosting providers for OBF servers and projects -- we owe them a great  
deal of gratitude and public acknowledgment for hosting our servers  
over many years. Speaking as a hardware geek I can tell you that the  
many years of high-bandwidth, trouble free hosting have been  
invaluable for our efforts and projects.   Sadly, it is no longer  
possible for them to host our servers as they need to begin making  
some network and WAN circuit changes that will no longer support  
direct internet facing servers (such as ours) in Cambridge.

The other major reason for the transition is our need to relocate  
onto hardware that can better be remotely managed (as our volunteer  
administrators are scattered all over the globe).

My employer, BioTeam Inc. has donated new server hardware and is also  
providing the hosting facilities in a Tier 1 Boston area colocation  
facility.
Infrastructure geeks can see pictures of the colocation  cage and the  
new OBF servers online at this URL:
http://bioteam.net/gallery/bioteamBDC  -- those servers also host  
EMBOSS FTP/CVS and mailing lists.


Current status of the migration:

  - All 57 mailing lists have been moved over to the new hardware  
(you may have noticed "lists.open-bio.org" showing up in your list  
messages)

  - The new anonymous sourcecode server is running at
http://code.open-bio.org. "cvs.biojava.org" is already pointing at it.

  - Your website (biojava.org) was moved to the new hardware (and new  
Wiki site!) about an hour ago

  - Developers with CVS accounts have *NOT* been migrated yet

Basically we are trying to relocate everything but the developers  
over the next few days so we can spend the weekend on the developer  
and CVS transition.


If DNS has not propagated yet, point your browser at
http://biojava.open-bio.org -- that is the new site your group has been
building. What is happening now is DNS pointers for biojava.org and  
www.biojava.org are slowly changing over to point at the wiki and the  
new hardware. Eventually you'll see the same site regardless of which  
URL you use.


  For biojava users
  --------------------------
Please keep an eye on your website and mailing lists and let  
support at open-bio.org know if there are any problems with the  
transition. In particular your new wiki site contains embedded links  
to some parts of the 'old' static website.  I caught the obvious ones
(biojava.org/downloads/ and biojava.org/docs/) but I may have
missed some.  Please let me know about any broken links.

Also someone may want to clean up the biojava logo image now in the  
wiki to make the white background transparent.


  For developers and leaders
  ---------------------------------------------
Whoever will be updating the static parts of the website (/downloads/
and /docs/) in the future will need login access to our new central
webserver machine. Please contact support at open-bio.org to request a
user account for biojava website maintenance.

For people with CVS commit/write access
---------------------------------------------------------
Also note that when we finally do transition over to the new  
developer machine (where the real sourcecode lives), ALL developers  
will need to email support at open-bio.org to request a password reset.  
Although we can transition usernames, settings and home directories  
over from the old to the new machine we can not transition over  
existing passwords as they are stored in incompatible hashed formats.  
All developers are going to need new passwords for the new developer  
machine.  We will likely make the developer machine swap this weekend.


Reporting Problems / Help & Assistance
------------------------------------------------------
The transition will be complicated, we need your help to spot  
problems and glitches! The OBF has a new helpdesk ticketing system  
set up at "support at open-bio.org" so that all OBF admins can read and  
respond to issues and problems. Most troubles should be reported to  
that address. For urgent problems, especially during this transition  
period,  feel free to contact me directly (dag at sonsorol.org) (ichat/ 
aol/aim screen name:  bioteamdag).


Regards,
Chris Dagdigian
open-bio.org


From toddri at eden.rutgers.edu  Thu Mar 23 16:59:23 2006
From: toddri at eden.rutgers.edu (Todd Riley)
Date: Thu, 23 Mar 2006 16:59:23 -0500
Subject: [Biojava-l] HMM's - Attempting some fancy stuff
In-Reply-To: <44119D31.6010703@mpiib-berlin.mpg.de>
References: <OF7F997996.426B09C6-ON4825712C.00092887-4825712C.000CB319@EU.novartis.net>
	<44119D31.6010703@mpiib-berlin.mpg.de>
Message-ID: <44231A3B.6030902@eden.rutgers.edu>

Hello,

After successfully implementing some TFBS search models using the 
ProfileHMM and DP classes, I am ready to attempt some fancier stuff that 
is going to require some serious coding.  Before I begin, I thought that 
I might field some questions to the BioJava users/programmers that have 
some experience and/or interest in the BioJava HMM classes.  I want to 
be sure to implement features in a fashion that will maximize usability 
in the simplest way....

Questions:

1. Many of the TFBS sites that I am modeling are palindromic or 
repetitive.  I wish to associate transition and emission distributions 
(as prior knowledge) during training in order to enforce a palindromic 
and/or repetitive pattern and thus also greatly reduce the parameter space.

Example: A p53 TFBS is palindromic and repetitive.  A 20 column Profile 
HMM can be greatly reduced to an HMM with the match-state topology of 
1 2 3 4 5 C(5) C(4) C(3) C(2) C(1) 1 2 3 4 5 C(5) C(4) C(3) C(2) C(1), 
where C() means DNA complement.  Notice that with this model, I now have 
only 5 match-state emissions as opposed to 20 to train.  (C(n) is a 
complement view over distribution n).  There are also far fewer 
transition distributions to train if I impose that the transitions from 
a->b are the same as b->a or C(b)->C(a), but in the opposite direction.

I wish to implement this in a fashion that does not require any changes 
to the current Viterbi, forward, Baum Welch, etc, algorithms, or the DP 
class.

I have already started writing classes that provide a view (or 
complement view) over an existing distribution.  My plan is to use these 
views as a means to correlate emission and transition distributions from 
and between different columns in the Profile HMM.

Has anyone ever tried this or thought of trying this?  Any ideas about 
how to implement this could be very useful.

2.  I wish to use more complicated background models than just a 0-th 
order background distribution.  I would like to use a Dirichlet mixture 
and/or higher order Markov models.  Has anyone looked into this?  Any 
ideas as to how to implement this in the current release?
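
The view idea from question 1 can be sketched in plain Java as below. This
is only an illustration under stated assumptions: Base, Emission,
TableEmission, ComplementView and PalindromeTopology are hypothetical names
and do not use the BioJava Distribution interface. It shows just the
sharing of emission parameters, not the tying of transitions or the
pooling of counts during training.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

enum Base {
    A, C, G, T;

    Base complement() {
        switch (this) {
            case A: return T;
            case C: return G;
            case G: return C;
            default: return A;
        }
    }
}

// Hypothetical minimal emission interface; not the BioJava Distribution API.
interface Emission {
    double weight(Base b);
}

// A trainable emission backed by its own table of weights (uniform at start).
class TableEmission implements Emission {
    private final Map<Base, Double> weights = new EnumMap<>(Base.class);

    TableEmission() {
        for (Base b : Base.values()) weights.put(b, 0.25);
    }

    void setWeight(Base b, double w) { weights.put(b, w); }

    public double weight(Base b) { return weights.get(b); }
}

// Read-only complement view: weight(b) is the parent's weight of complement(b).
class ComplementView implements Emission {
    private final Emission parent;

    ComplementView(Emission parent) { this.parent = parent; }

    public double weight(Base b) { return parent.weight(b.complement()); }
}

class PalindromeTopology {
    // Builds columns 1..n followed by C(n)..C(1), repeated 'repeats' times.
    static List<Emission> columns(List<TableEmission> free, int repeats) {
        List<Emission> cols = new ArrayList<>();
        for (int r = 0; r < repeats; r++) {
            cols.addAll(free);
            for (int i = free.size() - 1; i >= 0; i--) {
                cols.add(new ComplementView(free.get(i)));
            }
        }
        return cols;
    }
}
```

With 5 free emissions and 2 repeats this gives the 20 columns of the p53
example, and a change to free emission 1 is visible in all four columns
that refer to it, because the views hold no parameters of their own.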

-Todd


From toddri at eden.rutgers.edu  Thu Mar 23 18:04:45 2006
From: toddri at eden.rutgers.edu (Todd Riley)
Date: Thu, 23 Mar 2006 18:04:45 -0500
Subject: [Biojava-l] HMM's - Attempting some fancy stuff
In-Reply-To: <1143153837.13405.184.camel@elm.mcb.mcgill.ca>
References: <OF7F997996.426B09C6-ON4825712C.00092887-4825712C.000CB319@EU.novartis.net>	
	<44119D31.6010703@mpiib-berlin.mpg.de>
	<44231A3B.6030902@eden.rutgers.edu>
	<1143153837.13405.184.camel@elm.mcb.mcgill.ca>
Message-ID: <4423298D.8000901@eden.rutgers.edu>

Yes, I agree that the palindromes are not always identical.  However, 
often my unaligned training data is not complete enough to train the 
model well without some simplification.  So far, I have been using 
Cross-validation, sensitivity, and specificity to determine the 
effectiveness of this simplification approach.

-Todd

Francois Pepin wrote:

>>1. Many of the TFBS sites that I am modeling are palindromic or 
>>repetitive.  I wish to associate transition and emission distributions 
>>(as prior knowledge) during training in order to enforce a palindromic 
>>and/or repetitive pattern and thus also greatly reduce the parameter space.
>>    
>>
>
>Just as a note, we haven't found this to be ideal if you have
>sufficient training data. It is often the case that one of the
>palindromes is more conserved than the other, and you would be treating
>them the same way.
>
>Of course, it depends how much of an in-depth study you'll want to be
>doing.
>
>Francois
>
>  
>


From mark.schreiber at novartis.com  Thu Mar 23 21:28:04 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 24 Mar 2006 10:28:04 +0800
Subject: [Biojava-l] HMM's - Attempting some fancy stuff
Message-ID: <OF8375920E.72210EFF-ON4825713B.000CD1C1-4825713B.000D8EBC@EU.novartis.net>

I think you could do a palindrome as a push-down automaton or similar. 
Alternatively you could do something like a HMM with emission duration as 
in Borodovsky's GeneMarkHMM programs but that would require a lot of new 
code for the DP library (good to have though).

To use a Dirichlet mixture as your background you could calculate one and 
give it to a Distribution, although it might be best to implement the 
Distribution interface with a class that generates one for you. To go to 
higher order models you just need a higher order alphabet 
(http://biojava.org/wiki/BioJava:Cookbook:Alphabets:CrossProduct) and 
possibly an OrderNDistribution for background and emission 
(http://biojava.org/wiki/BioJava:CookBook:Distribution:Custom)
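
As a rough picture of what a first-order background means, here is a small
sketch in plain Java. It does not use the BioJava classes mentioned above;
FirstOrderBackground is a hypothetical name, and the add-one pseudocount is
an assumption made only to keep the example short. It estimates
P(next | previous) from dinucleotide counts, which is the same conditioning
that a cross-product alphabet lets you express.

```java
import java.util.Arrays;

class FirstOrderBackground {
    private static final String BASES = "acgt";

    // counts[prev][next], started at 1.0 as a simple pseudocount
    private final double[][] counts = new double[4][4];

    FirstOrderBackground() {
        for (double[] row : counts) Arrays.fill(row, 1.0);
    }

    // Count every adjacent pair of recognised bases in the sequence.
    void train(String seq) {
        String s = seq.toLowerCase();
        for (int i = 1; i < s.length(); i++) {
            int prev = BASES.indexOf(s.charAt(i - 1));
            int next = BASES.indexOf(s.charAt(i));
            if (prev >= 0 && next >= 0) counts[prev][next] += 1.0;
        }
    }

    // P(next | prev): the row for 'prev', normalised to sum to one.
    double probability(char prev, char next) {
        int p = BASES.indexOf(Character.toLowerCase(prev));
        int n = BASES.indexOf(Character.toLowerCase(next));
        if (p < 0 || n < 0) {
            throw new IllegalArgumentException("not a DNA base: " + prev + " or " + next);
        }
        double total = 0.0;
        for (double c : counts[p]) total += c;
        return counts[p][n] / total;
    }
}
```

A 0-th order background has one distribution over four symbols; this has
four of them, one per preceding base, so it needs correspondingly more
training data.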

- Mark


_______________________________________________
Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l


From jolyon.holdstock at ogt.co.uk  Fri Mar 24 06:26:44 2006
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Fri, 24 Mar 2006 11:26:44 -0000
Subject: [Biojava-l] RichSequence annotations...
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3FAFC0B7@EUCLID.internal.ogtip.com>

Hi,

 
I use the following code to extract all the genes from a sequence file; 

I load the sequence then filter out only CDS features; iterating through
these lets me get the gene annotation for the feature

 
//=============================================================================

Sequence seq = null;

String fileName =
"C:/Scripts/Java/BioJava/BioJavaX/biojava-live/demos/seq/AL121903.embl";

try {
  seq = SeqIOTools.readEmbl(new BufferedReader(new FileReader(fileName))).nextSequence();
}
catch (IOException IOE) {
  System.out.println("IOException " + IOE);
}
catch (BioException BIOE) {
  System.out.println("BioException " + BIOE);
}

//Create a feature filter for CDS features only
FeatureFilter ff = new FeatureFilter.ByType("CDS");

//Get the filtered Features
FeatureHolder fh = seq.filter(ff);

//Iterate over the Features in fh
for (Iterator i = fh.features(); i.hasNext(); ) {
  Feature f = (Feature)i.next();
  Annotation annotation = f.getAnnotation();
  Object key = "gene";
  hash.put(annotation.getProperty(key), f);
}

//=============================================================================

 
I am now using the new BioJavaX classes, which I cannot get to work. Does
anyone have any pointers for this?

I use the sequence data so have to use a RichSequence rather than a
BioEntry

 
//=============================================================================

RichSequence richSeq = null;

String fileName =
"C:/Scripts/Java/BioJava/BioJavaX/biojava-live/demos/seq/AL121903.embl";

try {
  richSeq = RichSequence.IOTools.readEMBLDNA(new BufferedReader(new FileReader(fileName)), null).nextRichSequence();
}
catch (IOException IOE) {
  System.out.println("IOException " + IOE);
}
catch (BioException BIOE) {
  System.out.println("BioException " + BIOE);
}

//Create a feature filter for CDS features only
FeatureFilter ff = new FeatureFilter.ByType("CDS");

//Get the filtered Features
FeatureHolder fh = richSeq.filter(ff);

//Iterate through the features
for (Iterator i = fh.features(); i.hasNext(); ) {
  RichFeature rf = (RichFeature) i.next();
  System.out.println("RichFeature: " + rf.toString());
  RichAnnotation ra = (RichAnnotation) rf.getAnnotation();
  System.out.println("RichAnnotation: " + ra.toString());
}

//=============================================================================

 
The output  shows that CDS features have been filtered successfully and
that the gene name is in the annotation

 
RichFeature: (#1)
lcl:HSDJ155G6/AL121903.13:CDS,EMBL(biojavax:join:[<5642..5793,10804..109
76,12496..12656,14136..14266])

RichAnnotation: [(#2) biojavax:clone_lib: RPCI-1"

14403..14532,16852..16987,17821..17959,18068..18122,

19456..19570,23623..23753,25885..26053,29102..29240,

32621..32738,33595..33771],[(#3) biojavax:codon_start: 1],[(#4)
biojavax:evidence: NOT_EXPERIMENTAL],[(#5) biojavax:note: match:
proteins: Tr:Q9Y6D5 Tr:O46382 Tr:Q9Y6D6],[(#6) biojavax:gene:
dJ155G6.1],[(#7) biojavax:product: dJ155G6.1 (brefeldin A-inhibited
guanine

nucleotide-exchange protein 2)],[(#8) biojavax:protein_id: CAB86643.1]

 
If I add the following then I can see what keys are in the annotation

//=============================================================================

Set keySet = ra.keys();

for (Iterator it = keySet.iterator(); it.hasNext(); ) {
  String key = it.next().toString();
  System.out.println("Key: " + key);
}

//=============================================================================

 
The output shows that there is a gene

 
Key: biojavax:clone_lib

Key: biojavax:codon_start

Key: biojavax:evidence

Key: biojavax:gene

Key: biojavax:note

Key: biojavax:product

Key: biojavax:protein_id

 
My understanding is that I need to use a ComparableTerm to access the
value but when I create it I get a NoSuchElementException error

 
ComparableTerm gene =
RichObjectFactory.getDefaultOntology().getOrCreateTerm("gene");

System.out.println("Gene: " + ra.getProperty(gene));

 
java.util.NoSuchElementException: No such property: biojavax:gene, rank 0

 
cheers,

 
Jolyon

 
Jolyon Holdstock Ph.D.

Senior Computational Biologist,

Oxford Gene Technology (Ops) Ltd.

Begbroke Business and Science Park

Sandy Lane, Yarnton

Oxford, OX5 1PF

 
Tel: 01865 309699

Fax: 01865 842116

 
Confidentiality Notice:

The contents of this email from the Oxford Gene Technology Group of
Companies are confidential and intended solely for the person to whom it
is addressed. It may contain privileged and confidential information. If
you are not the intended recipient you must not read, copy, distribute,
discuss or take any action in reliance on it.

 
From richard.holland at ebi.ac.uk  Fri Mar 24 08:16:49 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 24 Mar 2006 13:16:49 +0000
Subject: [Biojava-l] RichSequence annotations...
In-Reply-To: <588D0DD225D05746B5D8CAE1BE971F3FAFC0B7@EUCLID.internal.ogtip.com>
References: <588D0DD225D05746B5D8CAE1BE971F3FAFC0B7@EUCLID.internal.ogtip.com>
Message-ID: <1143206209.3899.84.camel@texas.ebi.ac.uk>

The terms are ranked in RichAnnotations. getProperty(term) searches for
a Note with that term and a rank of zero.

If you don't know the ranks, you need to use the 

    public Note[] getProperties(Object key);

method on the RichAnnotation object instead. This will return a list of
all matching Note objects with the given term regardless of rank.
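
To make the difference between the two lookups concrete, here is a toy
model in plain Java. It is not the BioJavaX implementation: RankedNote and
RankedAnnotation are hypothetical stand-ins, terms are plain strings rather
than ComparableTerm objects, and the rank of 6 in the usage below is an
arbitrary illustrative value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Toy stand-in for a ranked annotation note; not org.biojavax.Note.
class RankedNote {
    final String term;
    final String value;
    final int rank;

    RankedNote(String term, String value, int rank) {
        this.term = term;
        this.value = value;
        this.rank = rank;
    }
}

// Toy stand-in for a rank-aware annotation; not org.biojavax.RichAnnotation.
class RankedAnnotation {
    private final List<RankedNote> notes = new ArrayList<>();

    void add(String term, String value, int rank) {
        notes.add(new RankedNote(term, value, rank));
    }

    // Mirrors the behaviour described above: only a note with rank 0 matches.
    String getProperty(String term) {
        for (RankedNote n : notes) {
            if (n.rank == 0 && n.term.equals(term)) return n.value;
        }
        throw new NoSuchElementException("No such property: " + term + ", rank 0");
    }

    // Returns every note with the given term, whatever its rank.
    List<RankedNote> getProperties(String term) {
        List<RankedNote> matches = new ArrayList<>();
        for (RankedNote n : notes) {
            if (n.term.equals(term)) matches.add(n);
        }
        return matches;
    }
}
```

If the gene note is stored with a non-zero rank, getProperty("gene") throws
while getProperties("gene") still finds it. In the real code the equivalent
step would presumably be to call ra.getProperties(gene) and read the value
from each returned Note.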

cheers,
Richard

-- 
Richard Holland
European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-494416
---------------


From dag at sonsorol.org  Sat Mar 25 18:50:57 2006
From: dag at sonsorol.org (Chris Dagdigian)
Date: Sat, 25 Mar 2006 18:50:57 -0500
Subject: [Biojava-l] Important news for developers on open-bio machines
Message-ID: <1BB8AE37-91CA-45C7-AA81-A12826D5F422@sonsorol.org>


Hi, apologies for the massive cross-post. I'll keep it short!

This message is a last-ditch attempt to contact people with developer  
accounts on pub.open-bio.org who may have not received the individual  
mails we've been sending via the obf-developers at lists.open-bio.org  
mailing list. We suspect that there are a number of devs out there  
for whom we don't have up to date email addresses.

All open-bio services have been migrated to new hardware and a new  
datacenter. Part of this migration process involved moving all  
developer accounts and all source-code repositories to a new server.  
The developer migration was completed a few minutes ago. An  
unavoidable side effect of the move is that all developers are now  
locked out of their accounts until they contact us for a password reset.

If you are a developer and this news comes as a surprise to you, it  
means we don't have your contact info. Your best way to get up to  
speed on the history and technical details behind the migration is to  
point your browser here:

http://lists.open-bio.org/mailman/private/obf-developers/2006-March/thread.html

... and read the various messages we've posted this month. Included  
in the first message is the information on how to request an account  
reset.


Regards,
Chris Dagdigian
open-bio.org


From duze at gmx.de  Tue Mar 28 01:44:38 2006
From: duze at gmx.de ("Andreas Dräger")
Date: Tue, 28 Mar 2006 08:44:38 +0200 (MEST)
Subject: [Biojava-l] (no subject)
Message-ID: <2493.1143528278@www086.gmx.net>

Hi,

I just tried the GA example from the BioJava Cookbook.
To do so, I included all sources from the biojava-live
directory from CVS. The following line seems to cause
problems:

       genAlg.run(new DemoStopping());

Running it produces the following error message:
gen,average_fitness,best_fitness
0,49.98,67.0
Exception in thread "main" java.lang.Error: Unresolved compilation problem:
   Syntax error on token "assert", assert expected

   at org.biojava.utils.ChangeSupport.firePreChangeEvent(ChangeSupport.java:280)
   at org.biojava.bio.symbol.SimpleSymbolList.edit(SimpleSymbolList.java:339)
   at org.biojavax.ga.functions.SimpleCrossOverFunction.performCrossOver(SimpleCrossOverFunction.java:80)
   at org.biojavax.ga.impl.SimpleGeneticAlgorithm.run(SimpleGeneticAlgorithm.java:108)
   at GADemo.main(GADemo.java:91)

I do not know how to proceed, so I am posting this message to the list.

Sincerely,
Andreas Dräger


From richard.holland at ebi.ac.uk  Tue Mar 28 02:42:33 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Tue, 28 Mar 2006 08:42:33 +0100
Subject: [Biojava-l] (no subject)
In-Reply-To: <2493.1143528278@www086.gmx.net>
References: <2493.1143528278@www086.gmx.net>
Message-ID: <1143531753.3898.45.camel@texas.ebi.ac.uk>

Hi Andreas.

This sounds like a compiler version or flags problem. 

Could you check that you are running javac from a Java 1.4 or later
installation?

Also, see
http://java.sun.com/j2se/1.4.2/docs/guide/lang/assert.html#compatibility

(The Ant script uses the flag '-source 1.4' for everything).

Then try doing an 'ant clean' before your 'ant package-biojava' to make
sure everything gets recompiled.
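
As a rough illustration of the point above: `assert` only became a keyword in Java 1.4, so any class that uses it has to be compiled at source level 1.4 or later. The class below is a hypothetical stand-alone example (`AssertDemo` and `mean` are made-up names, not BioJava code) that contains the same kind of statement the compiler was rejecting.

```java
// AssertDemo.java: stand-alone illustration, not part of BioJava.
public class AssertDemo {

    // Returns the arithmetic mean; the assert documents the precondition.
    static double mean(double[] values) {
        // 'assert' is a keyword from Java 1.4 onwards. At older source
        // levels it is an ordinary identifier and this line does not parse.
        assert values.length > 0 : "values must not be empty";
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        System.out.println(mean(new double[] {49.98, 67.0}));
    }
}
```

On a 1.4-era toolchain this would be compiled with `javac -source 1.4 AssertDemo.java`. Assertions are only checked at run time when the JVM is started with `-ea`.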

cheers,
Richard 

-- 
Richard Holland
European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-494416
---------------


From andreas.draeger at clever-telefonieren.de  Tue Mar 28 03:29:32 2006
From: andreas.draeger at clever-telefonieren.de (=?ISO-8859-1?Q?Andreas_Dr=E4ger?=)
Date: Tue, 28 Mar 2006 10:29:32 +0200
Subject: [Biojava-l] GA-Package
Message-ID: <4428F3EC.9050507@clever-telefonieren.de>

Thanks, 

Now it works fine!

Cheers,
Andreas


-- 
==================================
Andreas Dräger
PhD student
Eberhard Karls University Tübingen
Center for Bioinformatics (ZBIT)
Phone: +49-7071-29-70436
==================================


From andreas.draeger at clever-telefonieren.de  Tue Mar 28 03:34:20 2006
From: andreas.draeger at clever-telefonieren.de (=?ISO-8859-1?Q?Andreas_Dr=E4ger?=)
Date: Tue, 28 Mar 2006 10:34:20 +0200
Subject: [Biojava-l] GA-Package
Message-ID: <4428F50C.4070104@clever-telefonieren.de>

Thanks,

Now it works fine!

Cheers,
Andreas

-- 
==================================
Andreas Dräger
PhD student
Eberhard Karls University Tübingen
Center for Bioinformatics (ZBIT)
Phone: +49-7071-29-70436
==================================


From wendy.wong at gmail.com  Thu Mar 30 10:41:47 2006
From: wendy.wong at gmail.com (wendy wong)
Date: Thu, 30 Mar 2006 16:41:47 +0100
Subject: [Biojava-l] unsupervised training of transition weights
Message-ID: <e554425b0603300741j755ecd33m36f04275a3811f86@mail.gmail.com>

Hi,

I am trying to train my HMM using unsupervised training (I don't need
to train the emission probabilities). I was wondering how I can do so
in BioJava. Do I have to implement the TransitionTrainer interface?

My second question is:
I implemented getWeightImpl in my custom distribution to set up my
emission states and it works fine. But is it possible to get the
program to call it only when a certain symbol occurs in the observed
sequence (instead of everything being precalculated)? I also found that
(although I might be wrong) the weights are calculated twice: once when
the distribution is created, and again when I call viterbi, which calls
getWeightImpl again. I am not sure what I did wrong here :(

any input would be very much appreciated!

thank you!

wendy


From td2 at sanger.ac.uk  Fri Mar 31 05:58:38 2006
From: td2 at sanger.ac.uk (Thomas Down)
Date: Fri, 31 Mar 2006 11:58:38 +0100
Subject: [Biojava-l] unsupervised training of transition weights
In-Reply-To: <e554425b0603300741j755ecd33m36f04275a3811f86@mail.gmail.com>
References: <e554425b0603300741j755ecd33m36f04275a3811f86@mail.gmail.com>
Message-ID: <5D3C5E2A-25E7-4516-B0E8-D1F57EAAFE1A@sanger.ac.uk>


On 30 Mar 2006, at 16:41, wendy wong wrote:

> Hi,
>
> I am trying to train my HMM using unsupervised training (I don't need
> to train the emission probabilities). I was wondering how I can do so
> in biojava. do I have to implement the TransitionTrainer interface?

The easiest way to do this is to use UntrainableDistributions for all  
the transition-sets that you don't want to be trained:

         http://www.biojava.org/docs/api14/org/biojava/bio/dist/UntrainableDistribution.html

If UntrainableDistribution doesn't fit your requirements, the  
alternative is to create your own Distribution implementation with a  
registerTrainer method that creates a "dummy" (i.e. doesn't do  
anything) DistributionTrainer.  UntrainableDistribution is just a  
subclass of SimpleDistribution which replaces the registerTrainer  
method with a non-functional version.
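
The "dummy trainer" idea described above can be sketched in plain Java. This is only an illustration of the pattern under stated assumptions: `SketchDistribution`, `FrozenDistribution` and `Trainer` are hypothetical stand-ins invented for this example, and their method signatures are not those of BioJava's Distribution or DistributionTrainer.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone sketch; none of these types are BioJava classes.
public class FrozenDistributionSketch {

    /** Hypothetical trainer: collects counts, then re-estimates weights. */
    interface Trainer {
        void addCount(String symbol, double count);
        void train();
    }

    /** Hypothetical trainable distribution over string symbols. */
    static class SketchDistribution {
        protected final Map<String, Double> weights = new HashMap<String, Double>();

        void setWeight(String symbol, double weight) {
            weights.put(symbol, weight);
        }

        double getWeight(String symbol) {
            Double w = weights.get(symbol);
            return w == null ? 0.0 : w;
        }

        /** Returns a trainer that replaces the weights with normalised counts. */
        Trainer registerTrainer() {
            final Map<String, Double> counts = new HashMap<String, Double>();
            return new Trainer() {
                public void addCount(String symbol, double count) {
                    Double old = counts.get(symbol);
                    counts.put(symbol, (old == null ? 0.0 : old) + count);
                }

                public void train() {
                    double total = 0.0;
                    for (double c : counts.values()) {
                        total += c;
                    }
                    if (total <= 0.0) {
                        return; // nothing observed, keep the old weights
                    }
                    weights.clear();
                    for (Map.Entry<String, Double> e : counts.entrySet()) {
                        weights.put(e.getKey(), e.getValue() / total);
                    }
                }
            };
        }
    }

    /** Same distribution, but its trainer ignores every count it is given. */
    static class FrozenDistribution extends SketchDistribution {
        @Override
        Trainer registerTrainer() {
            return new Trainer() {
                public void addCount(String symbol, double count) { }
                public void train() { }
            };
        }
    }

    public static void main(String[] args) {
        SketchDistribution frozen = new FrozenDistribution();
        frozen.setWeight("a", 0.5);
        frozen.setWeight("b", 0.5);

        Trainer trainer = frozen.registerTrainer();
        trainer.addCount("a", 10.0); // would move all weight to "a" if trainable
        trainer.train();

        System.out.println(frozen.getWeight("a")); // still 0.5
    }
}
```

The training loop does not need to know which distributions are frozen: it registers a trainer for each one and feeds it counts, and the frozen ones simply discard them.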

> my second question is:
> I implemnted getWeightImpl in my custom distribution to set up my
> emission states and it works fine. but is it possible to get the
> program to access it only when there's certain symbol in the observed
> sequence, (instead of precalculated)? and I also found that (although
> I might be wrong) the weights are calculated twice, once was when the
> distribution was created, and then when I call viterbi it calls
> getWeightImpl again. I am not sure what I did wrong here :(

The DP code does some caching of probabilities, I don't think there's  
any way to turn this off without modifying the DP implementations.

           Thomas.

From matthew.pocock at ncl.ac.uk  Fri Mar 31 12:05:25 2006
From: matthew.pocock at ncl.ac.uk (Matthew Pocock)
Date: Fri, 31 Mar 2006 18:05:25 +0100
Subject: [Biojava-l] unsupervised training of transition weights
In-Reply-To: <5D3C5E2A-25E7-4516-B0E8-D1F57EAAFE1A@sanger.ac.uk>
References: <e554425b0603300741j755ecd33m36f04275a3811f86@mail.gmail.com>
	<5D3C5E2A-25E7-4516-B0E8-D1F57EAAFE1A@sanger.ac.uk>
Message-ID: <200603311805.25861.matthew.pocock@ncl.ac.uk>

> The DP code does some caching of probabilities, I don't think there's
> any way to turn this off without modifying the DP implementations.
>
>            Thomas.

My recollection is that if you did turn this off, the algorithm would run 
very, very much more slowly. Internally to the DP objects, the distribution 
probabilities (in fact, they aren't even probabilities by this stage) are 
stored in a data-structure optimized for the type of lookups performed during 
the dynamic programming recursions.
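
A minimal sketch of why such a cache helps, and why every symbol's weight is requested up front regardless of what the observed sequence contains. Everything here is an assumption made for illustration: `WeightSource`, `CountingSource`, `buildLogTable` and `logScore` are hypothetical names, and the use of log scores is a guess at what "not even probabilities" means, not a description of BioJava's actual DP internals.

```java
// Stand-alone sketch; these types are hypothetical, not BioJava's DP classes.
public class EmissionCacheSketch {

    /** Stand-in for a custom distribution whose weights are costly to compute. */
    interface WeightSource {
        double getWeight(int symbolIndex);
    }

    /** Counts its own calls so the effect of the cache is visible. */
    static class CountingSource implements WeightSource {
        int calls = 0;

        public double getWeight(int symbolIndex) {
            calls++;
            return (symbolIndex + 1) / 10.0; // arbitrary weights 0.1 .. 0.4
        }
    }

    /** Queries every symbol once and stores log scores in a flat array. */
    static double[] buildLogTable(WeightSource source, int alphabetSize) {
        double[] table = new double[alphabetSize];
        for (int s = 0; s < alphabetSize; s++) {
            table[s] = Math.log(source.getWeight(s));
        }
        return table;
    }

    /** Inner-loop lookup: array reads only, the source is never called here. */
    static double logScore(double[] table, int[] observed) {
        double total = 0.0;
        for (int i = 0; i < observed.length; i++) {
            total += table[observed[i]];
        }
        return total;
    }

    public static void main(String[] args) {
        CountingSource source = new CountingSource();
        double[] table = buildLogTable(source, 4);
        int[] observed = {0, 0, 3, 3, 3}; // only two distinct symbols occur
        double score = logScore(table, observed);
        System.out.println("calls=" + source.calls + " logScore=" + score);
    }
}
```

In this sketch the source is called four times while the table is built, including for symbols that never appear in the observed sequence, and not at all during scoring. Computing weights lazily per observed symbol would save the unused calls but put a method call back into the innermost loop.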

Matthew

From emy_66 at hotmail.com  Mon Mar 13 05:39:41 2006
From: emy_66 at hotmail.com (Emily Wong)
Date: Mon, 13 Mar 2006 05:39:41 +0000
Subject: [Biojava-l] Blast parser for ncbi blast version 2.2.13
Message-ID: <BAY103-F3466856600F5B0BB2AE6EEE1E00@phx.gbl>

Hi,

Is there a parser that handles NCBI BLAST version 2.2.13 (the version on their 
website)? I am trying to parse the output with the code here: 
http://www.biojava.org/docs/bj_in_anger/BlastParser.htm . If I switch the 
parser from strict to lazy mode I get this exception:
Exception in thread "main" java.lang.NullPointerException
	at org.biojava.bio.program.sax.BlastSAXParser.interpret(BlastSAXParser.java:215)
	at org.biojava.bio.program.sax.BlastSAXParser.parse(BlastSAXParser.java:164)
	at org.biojava.bio.program.sax.BlastLikeSAXParser.onNewDataSet(BlastLikeSAXParser.java:311)
	at org.biojava.bio.program.sax.BlastLikeSAXParser.interpret(BlastLikeSAXParser.java:274)
	at org.biojava.bio.program.sax.BlastLikeSAXParser.parse(BlastLikeSAXParser.java:160)
	at BlastParser.main(BlastParser.java:46)

Thanks,

Emily


From mark.schreiber at novartis.com  Tue Mar 14 01:07:39 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 14 Mar 2006 09:07:39 +0800
Subject: [Biojava-l] Blast parser for ncbi blast version 2.2.13
Message-ID: <OF775F959B.A4A8F50B-ON48257131.00061EEB-48257131.0006322C@EU.novartis.net>

Possibly some variation in this output is causing the problem

Can you post some blast output that replicates this error?

Thanks

- Mark


From emy_66 at hotmail.com  Tue Mar 14 02:30:55 2006
From: emy_66 at hotmail.com (Emily Wong)
Date: Tue, 14 Mar 2006 02:30:55 +0000
Subject: [Biojava-l] Blast parser for ncbi blast version 2.2.13
In-Reply-To: <OF775F959B.A4A8F50B-ON48257131.00061EEB-48257131.0006322C@EU.novartis.net>
Message-ID: <BAY103-F29C6A9C2A6B80459A34B89E1E10@phx.gbl>

Hi,

Here is a truncated example of the blast output that causes the biojava 
blast parser to print out error messages.

Thanks,
Emily


BLASTN 2.2.13 [Nov-27-2005]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman
(1997), "Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs", Nucleic Acids Res. 25:3389-3402.

RID: 1142225124-9513-115032994966.BLASTQ1


Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
           3,777,692 sequences; 16,788,289,139 total letters
Query=  gi|27477090|ref|NM_000589.2| Homo sapiens interleukin 4 (IL4), 
transcript
variant 1, mRNA
Length=921


                                                                   Score     
E
Sequences producing significant alignments:                        (Bits)  
Value

gi|27477090|ref|NM_000589.2|  Homo sapiens interleukin 4 (IL4), t  1649    
0.0
gi|45708995|gb|BC067514.1|  Homo sapiens interleukin 4, transc...  1045    
0.0
gi|186334|gb|M13982.1|HUMIL4  Human interleukin 4 (IL-4) mRNA, co  1039    
0.0
gi|45709847|gb|BC067515.1|  Homo sapiens interleukin 4, transc...  1037    
0.0
gi|47123366|gb|BC070123.1|  Homo sapiens interleukin 4, transc...  1023    
0.0
gi|42490870|gb|BC066277.1|  Homo sapiens interleukin 4, transc...  1023    
0.0
gi|42490899|gb|BC066278.1|  Homo sapiens cDNA clone IMAGE:6971781  1021    
0.0
gi|27477091|ref|NM_172348.1|  Homo sapiens interleukin 4 (IL4), t  1003    
0.0
gi|2811101|gb|AC004039.1|  Homo sapiens chromosome 5, P1 clone...  1001    
0.0
gi|14600279|gb|AF395008.1|AF395008  Homo sapiens interleukin 4 (I  1001    
0.0
gi|1930572|gb|L81582.1|HSL81582  Homo sapiens (subclone 9_c5 f...  1001    
0.0
gi|33831|emb|X06750.1|HSIL45  Human interleukin 4 gene 5'-region   1001    
0.0
gi|186336|gb|M23442.1|HUMIL4A  Human interleukin 4 (IL-4) gene, c  1001    
0.0
gi|39980474|gb|AY480012.1|  Pan troglodytes interleukin 13 (IL...   977    
0.0
gi|37703741|gb|AY339646.1|  Gorilla gorilla interleukin 4 (IL4...   977    
0.0
gi|37703740|gb|AY339645.1|  Pan paniscus interleukin 4 (IL4) g...   977    
0.0
gi|37703739|gb|AY339644.1|  Pan troglodytes interleukin 4 (IL4...   977    
0.0
gi|62990254|gb|AC157216.2|  Pan troglodytes BAC clone CH251-66...   977    
0.0
gi|37703742|gb|AY339647.1|  Pongo pygmaeus interleukin 4 (IL4)...   969    
0.0
gi|61358435|gb|AY888625.1|  Synthetic construct Homo sapiens c...   916    
0.0
gi|61368530|gb|AY891269.1|  Synthetic construct Homo sapiens c...   912    
0.0
gi|60828588|gb|AY893811.1|  Synthetic construct Homo sapiens c...   912    
0.0
gi|60816922|gb|AY893365.1|  Synthetic construct Homo sapiens c...   912    
0.0
gi|58743320|ref|NM_001011714.1|  Pan troglodytes interleukin 4 (I   908    
0.0
gi|22858883|gb|AY130260.1|  Pan troglodytes interleukin-4 (IL-4)    908    
0.0
gi|37703743|gb|AY339648.1|  Papio papio interleukin 4 (IL4) ge...   904    
0.0
gi|1841299|dbj|AB000515.1|  Macaca fascicularis mRNA for IL-4 pre   797    
0.0
gi|514383|gb|L26027.1|MACIN4A  Macaca mulatta interleukin 4 mRNA,   797    
0.0
gi|644793|gb|U19838.1|CTU19838  Cercocebus torquatus interleukin-   789    
0.0
gi|29569760|gb|AY234221.1|  Papio anubis interleukin-4 precursor    779    
0.0
gi|74136370|ref|NM_001032904.1|  Macaca mulatta interleukin-4 (LO   773    
0.0
gi|37014164|gb|AY376144.1|  Macaca mulatta interleukin-4 mRNA, co   773    
0.0
gi|20452370|gb|AF465829.1|  Homo sapiens interleukin 4-like (IL4)   684    
0.0
gi|40804376|gb|AY486435.1|  Cercocebus torquatus atys interleu...   642    
0.0
gi|40804375|gb|AY486434.1|  Macaca mulatta interleukin 4 (IL-4) g   634    
1e-178
gi|555892|gb|U14131.1|BTIL4S1  Bos taurus interleukin 4 (IL4) gen   561    
1e-156
gi|673418|emb|X81851.1|HSIL4SV  H. sapiens IL-4 gene splice varia   559    
6e-156
gi|58736974|dbj|AB102862.1|  Homo sapiens IL4 mRNA for interle...   559    
6e-156
gi|22858887|gb|AY130262.1|  Macaca fascicularis interleukin-4 ...   559    
6e-156
gi|19918909|gb|AY083267.1|  Macaca mulatta IL-4 gene, promoter re   553    
3e-154
gi|58743332|ref|NM_001008993.1|  Pan troglodytes interleukin 4 (I   551    
1e-153
gi|22858885|gb|AY130261.1|  Pan troglodytes interleukin-4 delt...   551    
1e-153
gi|2905623|gb|AF043336.1|AF043336  Homo sapiens interleukin 4 del   551    
1e-153
gi|19918910|gb|AY083268.1|  Macaca radiata IL-4 gene, promoter re   545    
8e-152
gi|4102679|gb|AF014509.1|  Aotus nancymaae interleukin-4 (IL-4) m   545    
8e-152
gi|6648935|gb|AF097321.1|  Aotus lemurinus interleukin-4 (IL-4) m   545    
8e-152
gi|6648933|gb|AF097320.1|  Aotus nigriceps interleukin-4 (IL-4) m   545    
8e-152
gi|4102669|gb|AF014504.1|  Aotus vociferans interleukin-4 (IL-4)    537    
2e-149
gi|25991896|gb|AF457197.1|  Macaca mulatta interleukin 4 mRNA, pa   502    
1e-138
gi|8575472|gb|AF235038.1|AF235038  Homo sapiens interleukin 4 (IL   486    
7e-134
gi|18654097|gb|L81736.1|HUM11DC9S  Homo sapiens (subclone 5_f7 fr   355    
2e-94
gi|18654096|gb|L81735.1|HUM11DC92S  Homo sapiens (subclone 8_e7 f   355    
2e-94
gi|1930575|gb|L81579.1|HSL81579  Homo sapiens (subclone 5_f7 f...   355    
2e-94
gi|1930574|gb|L81580.1|HSL81580  Homo sapiens (subclone 8_e7 f...   355    
2e-94
gi|34419654|gb|AC107611.6|  Rattus norvegicus 10 BAC CH230-195...   278    
3e-71
gi|56481|emb|X53087.1|RNIL4E12  R.norvegicus gene for interleukin   270    
7e-69
gi|545218|gb|S69238.1|  IL4 {promoter} [mice, BALB/c, Genomic, 82   222    
2e-54
gi|52678|emb|X05064.1|MMIL4G12  Mouse interleukin 4 gene exons 1    222    
2e-54
gi|57157560|dbj|AB174764.1|  Mus musculus molossinus IL-4 gene...   220    
6e-54
gi|3687208|gb|AC005742.1|  Mus musculus chromosome 11, BAC clo...   214    
4e-52
gi|11038606|gb|AC084392.1|  Mus musculus BAC clone GS1-182G5 from   214    
4e-52
gi|545217|gb|S69237.1|  IL4 {promoter} [mice, C57BL/6, Genomic, 8   214    
4e-52
gi|21211994|emb|AL645741.15|  Mouse DNA sequence from clone RP...   214    
4e-52
gi|27960676|gb|AF463769.1|  Mus musculus strain C57BL/6J cytok...   214    
4e-52
gi|1930579|gb|L81578.1|HSL81578  Homo sapiens (subclone 2_b2 f...   204    
4e-49
gi|4996849|dbj|AB020732.1|  Tursiops truncatus mRNA for interleuk   192    
1e-45
gi|163212|gb|M84745.1|BOVIL4XX  Bovine interleukin 4 (IL4) gene,    180    
5e-42
gi|20530678|gb|AF493991.1|  Sus scrofa interleukin-4 precursor (I   141    
5e-30
gi|1997|emb|X68330.1|SSILK4  S.scrofa mRNA for interleukin-4        141    
5e-30
gi|55742621|ref|NM_214123.1|  Sus scrofa interleukin 4 (IL4), mRN   141    
5e-30
gi|1730275|gb|U34273.1|CHU34273  Capra hircus interleukin-4 mRNA,   141    
5e-30
gi|294220|gb|L12991.1|PIGIL4A  Pig interleukin 4 mRNA, complete c   133    
1e-27
gi|29603606|dbj|AB107648.1|  Lama glama IL-4 mRNA for interleukin   131    
4e-27
gi|32186777|gb|AY294020.1|  Sus scrofa interleukin 4 mRNA, comple   125    
3e-25
gi|57527819|ref|NM_001009313.2|  Ovis aries interleukin 4 (IL4),    125    
3e-25
gi|165950|gb|M96845.1|SHPIL4A  Sheep interleukin 4 mRNA, complete   125    
3e-25
gi|84794457|dbj|AB246673.1|  Camelus bactrianus IL-4 mRNA for int   123    
1e-24
gi|163891|gb|L07081.1|CEUINTERLU  Cervus elaphus (clone SH3) inte   119    
2e-23
gi|164233|gb|L06010.1|HRSIL4X  Horse interleukin 4 mRNA, partial   97.6    
6e-17
gi|50096517|gb|AY648947.1|  Capra hircus interleukin-4 mRNA, comp  87.7    
6e-14
gi|31416286|gb|AY293620.1|  Bubalus bubalis interleukin 4 (IL-4)   83.8    
9e-13
gi|84875352|dbj|AB246356.1|  Bubalus bubalis x Bubalus caraban...  83.8    
9e-13
gi|84871721|dbj|AB246355.1|  Bubalus bubalis IL-4 mRNA for interl  83.8    
9e-13
gi|84871699|dbj|AB246275.1|  Bubalus carabanensis IL-4 mRNA for i  83.8    
9e-13
gi|31343261|ref|NM_173921.2|  Bos taurus interleukin 4 (IL4), mRN  83.8    
9e-13
gi|163210|gb|M77120.1|BOVIL4X  Bovine interleukin 4 (IL4) mRNA, c  83.8    
9e-13
gi|46310147|gb|AY575607.1|  Ovis aries interleukin 4 (IL-4) mR...  81.8    
4e-12
gi|46310145|gb|AY575606.1|  Ovis aries interleukin 4 (IL-4) mR...  81.8    
4e-12
gi|21217734|gb|AY096800.1|  Ovis aries interleukin-4 precursor, m  81.8    
4e-12
gi|2654199|gb|AF035404.1|  Equus caballus interleukin-4 precur...  81.8    
4e-12
gi|1277|emb|Z11897.1|OAIL4MRNA  O.aries IL-4 mRNA for interleukin  81.8    
4e-12
gi|5732983|gb|AF172168.1|AF172168  Ovis aries interleukin 4 mRNA,  81.8    
4e-12
gi|10716183|gb|AF305617.1|AF305617  Equus caballus interleukin 4   81.8    
4e-12
gi|60687486|gb|AY939910.1|  Boselaphus tragocamelus interleukin-4  81.8    
4e-12
gi|50978885|ref|NM_001003159.1|  Canis familiaris interleukin 4 (  75.8    
2e-10
gi|7330263|gb|AF239917.1|AF239917  Canis familiaris interleukin-4  75.8    
2e-10
gi|13346438|gb|AF346295.1|AF346295  Phocoena phocoena interleukin  75.8    
2e-10
gi|6007792|gb|AF187322.1|AF187322  Canis familiaris interleukin-4  75.8    
2e-10
gi|4185290|gb|AF083270.1|AF083270  Canis familiaris interleuki...  75.8    
2e-10
gi|14029512|gb|AF333965.1|  Marmota monax interleukin-4 mRNA, par  67.9    
5e-08

ALIGNMENTS
>gi|27477090|ref|NM_000589.2| Homo sapiens interleukin 4 (IL4), transcript 
>variant 1, mRNA
Length=921

Score = 1649 bits (832),  Expect = 0.0
Identities = 832/832 (100%), Gaps = 0/832 (0%)
Strand=Plus/Plus

Query  1    TTCTATGCAAAGCAAAAAGCCAGCAGCAGCCCCAAGCTGATAAGATTAATCTAAAGAGCA  60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  1    TTCTATGCAAAGCAAAAAGCCAGCAGCAGCCCCAAGCTGATAAGATTAATCTAAAGAGCA  60

Query  61   AATTATGGTGTAATTTCCTATGCTGAAACTTTGTAGTTAATTTTTTAAAAAGGTTTCATT  
120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  61   AATTATGGTGTAATTTCCTATGCTGAAACTTTGTAGTTAATTTTTTAAAAAGGTTTCATT  
120

Query  121  TTCCTATTGGTCTGATTTCACAGGAACATTTTACCTGTTTGTGAGGCATTTTTTCTCCTG  
180
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  121  TTCCTATTGGTCTGATTTCACAGGAACATTTTACCTGTTTGTGAGGCATTTTTTCTCCTG  
180

Query  181  GAAGAGAGGTGCTGATTGGCCCCAAGTGACTGACAATCTGGTGTAACGAAAATTTCCAAT  
240
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  181  GAAGAGAGGTGCTGATTGGCCCCAAGTGACTGACAATCTGGTGTAACGAAAATTTCCAAT  
240

Query  241  GTAAACTCATTTTCCCTCGGTTTCAGCAATTTTAAATCTATATATAGAGATATCTTTGTC  
300
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  241  GTAAACTCATTTTCCCTCGGTTTCAGCAATTTTAAATCTATATATAGAGATATCTTTGTC  
300

Query  301  AGCATTGCATCGTTAGCTTCTCCTGATAAACTAATTGCCTCACATTGTCACTGCAAATCG  
360
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  301  AGCATTGCATCGTTAGCTTCTCCTGATAAACTAATTGCCTCACATTGTCACTGCAAATCG  
360

Query  361  ACACCTATTAATGGGTCTCACCTCCCAACTGCTTCCCCCTCTGTTCTTCCTGCTAGCATG  
420
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  361  ACACCTATTAATGGGTCTCACCTCCCAACTGCTTCCCCCTCTGTTCTTCCTGCTAGCATG  
420

Query  421  TGCCGGCAACTTTGTCCACGGACACAAGTGCGATATCACCTTACAGGAGATCATCAAAAC  
480
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  421  TGCCGGCAACTTTGTCCACGGACACAAGTGCGATATCACCTTACAGGAGATCATCAAAAC  
480

Query  481  TTTGAACAGCCTCACAGAGCAGAAGACTCTGTGCACCGAGTTGACCGTAACAGACATCTT  
540
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  481  TTTGAACAGCCTCACAGAGCAGAAGACTCTGTGCACCGAGTTGACCGTAACAGACATCTT  
540

Query  541  TGCTGCCTCCAAGAACACAACTGAGAAGGAAACCTTCTGCAGGGCTGCGACTGTGCTCCG  
600
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  541  TGCTGCCTCCAAGAACACAACTGAGAAGGAAACCTTCTGCAGGGCTGCGACTGTGCTCCG  
600

Query  601  GCAGTTCTACAGCCACCATGAGAAGGACACTCGCTGCCTGGGTGCGACTGCACAGCAGTT  
660
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  601  GCAGTTCTACAGCCACCATGAGAAGGACACTCGCTGCCTGGGTGCGACTGCACAGCAGTT  
660

Query  661  CCACAGGCACAAGCAGCTGATCCGATTCCTGAAACGGCTCGACAGGAACCTCTGGGGCCT  
720
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  661  CCACAGGCACAAGCAGCTGATCCGATTCCTGAAACGGCTCGACAGGAACCTCTGGGGCCT  
720

Query  721  GGCGGGCTTGAATTCCTGTCCTGTGAAGGAAGCCAACCAGAGTACGTTGGAAAACTTCTT  
780
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  721  GGCGGGCTTGAATTCCTGTCCTGTGAAGGAAGCCAACCAGAGTACGTTGGAAAACTTCTT  
780

Query  781  GGAAAGGCTAAAGACGATCATGAGAGAGAAATATTCAAAGTGTTCGAGCTGA  832
            ||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  781  GGAAAGGCTAAAGACGATCATGAGAGAGAAATATTCAAAGTGTTCGAGCTGA  832


>gi|45708995|gb|BC067514.1| Homo sapiens interleukin 4, transcript variant 
>1, mRNA (cDNA
clone MGC:79403 IMAGE:6971780), complete cds
Length=528

Score = 1045 bits (527),  Expect = 0.0
Identities = 527/527 (100%), Gaps = 0/527 (0%)
Strand=Plus/Plus

Query  306  TGCATCGTTAGCTTCTCCTGATAAACTAATTGCCTCACATTGTCACTGCAAATCGACACC  
365
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  1    TGCATCGTTAGCTTCTCCTGATAAACTAATTGCCTCACATTGTCACTGCAAATCGACACC  60

Query  366  TATTAATGGGTCTCACCTCCCAACTGCTTCCCCCTCTGTTCTTCCTGCTAGCATGTGCCG  
425
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  61   TATTAATGGGTCTCACCTCCCAACTGCTTCCCCCTCTGTTCTTCCTGCTAGCATGTGCCG  
120

Query  426  GCAACTTTGTCCACGGACACAAGTGCGATATCACCTTACAGGAGATCATCAAAACTTTGA  
485
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  121  GCAACTTTGTCCACGGACACAAGTGCGATATCACCTTACAGGAGATCATCAAAACTTTGA  
180

Query  486  ACAGCCTCACAGAGCAGAAGACTCTGTGCACCGAGTTGACCGTAACAGACATCTTTGCTG  
545
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  181  ACAGCCTCACAGAGCAGAAGACTCTGTGCACCGAGTTGACCGTAACAGACATCTTTGCTG  
240

Query  546  CCTCCAAGAACACAACTGAGAAGGAAACCTTCTGCAGGGCTGCGACTGTGCTCCGGCAGT  
605
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  241  CCTCCAAGAACACAACTGAGAAGGAAACCTTCTGCAGGGCTGCGACTGTGCTCCGGCAGT  
300

Query  606  TCTACAGCCACCATGAGAAGGACACTCGCTGCCTGGGTGCGACTGCACAGCAGTTCCACA  
665
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  301  TCTACAGCCACCATGAGAAGGACACTCGCTGCCTGGGTGCGACTGCACAGCAGTTCCACA  
360

Query  666  GGCACAAGCAGCTGATCCGATTCCTGAAACGGCTCGACAGGAACCTCTGGGGCCTGGCGG  
725
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  361  GGCACAAGCAGCTGATCCGATTCCTGAAACGGCTCGACAGGAACCTCTGGGGCCTGGCGG  
420

Query  726  GCTTGAATTCCTGTCCTGTGAAGGAAGCCAACCAGAGTACGTTGGAAAACTTCTTGGAAA  
785
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  421  GCTTGAATTCCTGTCCTGTGAAGGAAGCCAACCAGAGTACGTTGGAAAACTTCTTGGAAA  
480

Query  786  GGCTAAAGACGATCATGAGAGAGAAATATTCAAAGTGTTCGAGCTGA  832
            |||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  481  GGCTAAAGACGATCATGAGAGAGAAATATTCAAAGTGTTCGAGCTGA  527


>gi|186334|gb|M13982.1|HUMIL4 Human interleukin 4 (IL-4) mRNA, complete cds
Length=614

Score = 1039 bits (524),  Expect = 0.0
Identities = 524/524 (100%), Gaps = 0/524 (0%)
Strand=Plus/Plus

Query  309  ATCGTTAGCTTCTCCTGATAAACTAATTGCCTCACATTGTCACTGCAAATCGACACCTAT  
368
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  2    ATCGTTAGCTTCTCCTGATAAACTAATTGCCTCACATTGTCACTGCAAATCGACACCTAT  61

Query  369  TAATGGGTCTCACCTCCCAACTGCTTCCCCCTCTGTTCTTCCTGCTAGCATGTGCCGGCA  
428
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  62   TAATGGGTCTCACCTCCCAACTGCTTCCCCCTCTGTTCTTCCTGCTAGCATGTGCCGGCA  
121

Query  429  ACTTTGTCCACGGACACAAGTGCGATATCACCTTACAGGAGATCATCAAAACTTTGAACA  
488
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  122  ACTTTGTCCACGGACACAAGTGCGATATCACCTTACAGGAGATCATCAAAACTTTGAACA  
181

Query  489  GCCTCACAGAGCAGAAGACTCTGTGCACCGAGTTGACCGTAACAGACATCTTTGCTGCCT  
548
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  182  GCCTCACAGAGCAGAAGACTCTGTGCACCGAGTTGACCGTAACAGACATCTTTGCTGCCT  
241

Query  549  CCAAGAACACAACTGAGAAGGAAACCTTCTGCAGGGCTGCGACTGTGCTCCGGCAGTTCT  
608
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  242  CCAAGAACACAACTGAGAAGGAAACCTTCTGCAGGGCTGCGACTGTGCTCCGGCAGTTCT  
301

Query  609  ACAGCCACCATGAGAAGGACACTCGCTGCCTGGGTGCGACTGCACAGCAGTTCCACAGGC  
668
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  302  ACAGCCACCATGAGAAGGACACTCGCTGCCTGGGTGCGACTGCACAGCAGTTCCACAGGC  
361

Query  669  ACAAGCAGCTGATCCGATTCCTGAAACGGCTCGACAGGAACCTCTGGGGCCTGGCGGGCT  
728
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  362  ACAAGCAGCTGATCCGATTCCTGAAACGGCTCGACAGGAACCTCTGGGGCCTGGCGGGCT  
421

Query  729  TGAATTCCTGTCCTGTGAAGGAAGCCAACCAGAGTACGTTGGAAAACTTCTTGGAAAGGC  
788
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  422  TGAATTCCTGTCCTGTGAAGGAAGCCAACCAGAGTACGTTGGAAAACTTCTTGGAAAGGC  
481

Query  789  TAAAGACGATCATGAGAGAGAAATATTCAAAGTGTTCGAGCTGA  832
            ||||||||||||||||||||||||||||||||||||||||||||
Sbjct  482  TAAAGACGATCATGAGAGAGAAATATTCAAAGTGTTCGAGCTGA  525


From christoph.gille at charite.de  Tue Mar 14 07:47:10 2006
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Tue, 14 Mar 2006 08:47:10 +0100 (CET)
Subject: [Biojava-l] alignment algor in Biojava
In-Reply-To: <mailman.1077.1142298882.2006.biojava-l@lists.open-bio.org>
References: <mailman.1077.1142298882.2006.biojava-l@lists.open-bio.org>
Message-ID: <64617.84.190.58.246.1142322430.squirrel@webmail.charite.de>

> Hi,
>
> Can somebody tell me what algorithms BioJava uses to make local alignments
> and multiple alignments? I have searched the documentation but have
> not found it.

See the bottom of this page:
http://www.biojava.org/docs/bj_in_anger/index.htm

See also:
http://www.charite.de/bioinf/strap/Scripting.html#SequenceAligner

Cheers

Christoph


From koeberle at mpiib-berlin.mpg.de  Tue Mar 14 12:28:37 2006
From: koeberle at mpiib-berlin.mpg.de (=?ISO-8859-1?Q?Christian_K=F6berle?=)
Date: Tue, 14 Mar 2006 13:28:37 +0100
Subject: [Biojava-l] Feature + BioJAVA-X + BioSQL ?
Message-ID: <4416B6F5.1050905@mpiib-berlin.mpg.de>

Hi,
I am trying to write a Sequence object into a BioSQL database using the 
BioJAVA-X classes.
This works well. But if I try to save a Sequence object with two (or 
more) Features that have the same type and the same source, 
writing to the database fails.
Is it wrong to have more than one Feature with the same type and 
source on one Sequence? Or is this a bug in BioJAVA / BioJAVA-X or BioSQL?

Thanks,
Christian

The error message:
org.hibernate.StaleStateException: Batch update returned unexpected row count from update: 0 actual row count: 0 expected: 1
        at org.hibernate.jdbc.BatchingBatcher.checkRowCount(BatchingBatcher.java:93)
        at org.hibernate.jdbc.BatchingBatcher.checkRowCounts(BatchingBatcher.java:79)
        at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:58)
        at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:195)
        at org.hibernate.jdbc.AbstractBatcher.prepareStatement(AbstractBatcher.java:91)
        at org.hibernate.jdbc.AbstractBatcher.prepareStatement(AbstractBatcher.java:86)
        at org.hibernate.jdbc.AbstractBatcher.prepareBatchStatement(AbstractBatcher.java:171)
        at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2048)
        at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2427)
        at org.hibernate.action.EntityInsertAction.execute(EntityInsertAction.java:51)
        at org.hibernate.engine.ActionQueue.execute(ActionQueue.java:243)
        at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:227)
        at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:140)
        at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:296)
        at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27)
        at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1009)
        at org.hibernate.impl.SessionImpl.managedFlush(SessionImpl.java:356)
        at org.hibernate.transaction.JDBCTransaction.commit(JDBCTransaction.java:106)

-- 
Christian Köberle

Max Planck Institute for Infection Biology
Department: Immunology
Schumannstr. 21/22
10117 Berlin

Tel: +49 30 28 460 562
e-mail: koeberle at mpiib-berlin.mpg.de


From koeberle at mpiib-berlin.mpg.de  Tue Mar 14 14:06:43 2006
From: koeberle at mpiib-berlin.mpg.de (=?ISO-8859-1?Q?Christian_K=F6berle?=)
Date: Tue, 14 Mar 2006 15:06:43 +0100
Subject: [Biojava-l] BioJAVA-X + BioSQL + no update
Message-ID: <4416CDF3.4000407@mpiib-berlin.mpg.de>

Hi,
I have the following problem.
I put a RichSequence object into a BioSQL DB, using the new classes from
BioJAVA-X.
Later I retrieve this Sequence object from the BioSQL DB (also with
BioJAVA-X), create new Feature objects and Note objects, and add
them to the Sequence object.
With BioJAVA 1.4, all Features and Annotations are written into
the BioSQL DB. With BioJAVA-X, nothing changes in the DB.
Does BioJAVA-X include a method to update the BioSQL DB, or how else can I
write the changes to the DB?

Thanks,
Christian

-- 
Christian Köberle

Max Planck Institute for Infection Biology
Department: Immunology
Schumannstr. 21/22
10117 Berlin

Tel: +49 30 28 460 562
e-mail: koeberle at mpiib-berlin.mpg.de


From bubba.puryear at gmail.com  Mon Mar 13 17:27:11 2006
From: bubba.puryear at gmail.com (Bubba Puryear)
Date: Mon, 13 Mar 2006 12:27:11 -0500
Subject: [Biojava-l] biojavax GenbankFormat and legacy genbank records
Message-ID: <d2f7533b0603130927g26e08bc6o975cac05a9df1f72@mail.gmail.com>

 Hello,

  I work on a webapp for a biotech company that uses biojava to parse
plasmid and feature maps (genbank flatfile format)  and we store them in a
local database. I've wanted to update the version of biojava we use because
the current CVS parser handles features that cross the origin on plasmid
maps much better than the parser in 1.4.

  However, we have a lot of data in various databases that have genbank
records formatted in some of the older incarnations of the GFF. In
particular, some feature maps don't have ACCESSION fields, and/or are
missing modification dates and genbank divisions on the LOCUS line. When I
try to parse one of those maps with biojavax, I get parse errors.

  Should there perhaps be a LegacyGenbankFormat or should the GenbankFormat
class be made more tolerant? I know NCBI made several changes to their
flatfile format in part  because writing parsers for the older specs was
tricky. So I'm not sure which direction the bio* folks would like to go with
this.

  I've attached a small example map that causes parse problems. The data in
the map is completely bogus, but the structure was taken from a real map
file I have to deal with.

  The following code snippet illustrates my problems:

        BufferedReader br = new BufferedReader(new StringReader(genbankContent));
        try {
            RichSequenceIterator sequences = IOTools.readGenbankDNA(br, null);
            if (sequences.hasNext()) {
                this.sequence = sequences.nextRichSequence();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }


  where genbankContent is a String containing the contents of the attached
file.

Thanks much,
Bubba Puryear
-------------- next part --------------
A non-text attachment was scrubbed...
Name: foo.gb
Type: chemical/seq-na-genbank
Size: 1091 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biojava-l/attachments/20060313/b56af1d0/attachment-0002.bin>

From mira.edelstein at gmx.net  Tue Mar 14 22:30:01 2006
From: mira.edelstein at gmx.net (Mira)
Date: Tue, 14 Mar 2006 23:30:01 +0100
Subject: [Biojava-l] (no subject)
Message-ID: <001501c647b6$d5954f70$9b7ba8c0@mecom>

please take me from the mailing list

thanks mira


From mark.schreiber at novartis.com  Wed Mar 15 06:42:59 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Wed, 15 Mar 2006 14:42:59 +0800
Subject: [Biojava-l] Feature + BioJAVA-X + BioSQL ?
Message-ID: <OFF9C6B6C2.B599605D-ON48257132.0023FE17-48257132.0024E58C@EU.novartis.net>

This could be a bug; this is bleeding-edge development code.

Are you using the most up to date CVS code? Also which database are you 
using?

As a suggestion: RichFeatures with the same type, source and parent 
sequence can only be distinguished by rank (in BioSQL and BioJavaX). Can 
you persist them to the DB if you give one a different rank?
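
The rank suggestion can be pictured with a minimal, self-contained sketch. It
does not use the BioJavaX or BioSQL APIs: FeatureKey and FeatureTable are
hypothetical stand-ins, and the only assumption is the one stated above, that
features sharing a type, source and parent sequence are told apart solely by
rank.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for the identity of a stored feature:
// parent sequence, type, source and rank together form the key.
final class FeatureKey {
    final String parent;
    final String type;
    final String source;
    final int rank;

    FeatureKey(String parent, String type, String source, int rank) {
        this.parent = parent;
        this.type = type;
        this.source = source;
        this.rank = rank;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FeatureKey)) return false;
        FeatureKey k = (FeatureKey) o;
        return rank == k.rank && parent.equals(k.parent)
                && type.equals(k.type) && source.equals(k.source);
    }

    @Override
    public int hashCode() {
        return Objects.hash(parent, type, source, rank);
    }
}

// Hypothetical stand-in for a table with a uniqueness constraint on the key.
final class FeatureTable {
    private final Set<FeatureKey> rows = new HashSet<>();

    // Returns true if the row was stored, false if the key already exists.
    boolean insert(FeatureKey key) {
        return rows.add(key);
    }

    public static void main(String[] args) {
        FeatureTable table = new FeatureTable();
        // Two features with identical type, source, parent and rank collide.
        System.out.println(table.insert(new FeatureKey("seq1", "CDS", "GenBank", 0)));
        System.out.println(table.insert(new FeatureKey("seq1", "CDS", "GenBank", 0)));
        // Giving the second feature a different rank makes it distinct.
        System.out.println(table.insert(new FeatureKey("seq1", "CDS", "GenBank", 1)));
    }
}
```

In this toy model the second insert is rejected and the third succeeds, which
is the behaviour the rank question is meant to probe in the real database.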

- Mark



From mark.schreiber at novartis.com  Wed Mar 15 07:02:02 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Wed, 15 Mar 2006 15:02:02 +0800
Subject: [Biojava-l] BioJAVA-X + BioSQL + no update
Message-ID: <OF1DCB13C9.C7738C71-ON48257132.0024F81A-48257132.0026A3EF@EU.novartis.net>

With BioJavaX, if you want any changes to a RichSequence object to persist 
to the database, you need to "save or update" it with Hibernate. 


SessionFactory sessionFactory = new Configuration().configure().buildSessionFactory();
Session session = sessionFactory.openSession();
RichObjectFactory.connectToBioSQL(session);

RichSequence rs = ...;                // some sequence you've made or modified
session.saveOrUpdate("Sequence",rs);  // persist the sequence

***
Another way is to do everything inside a transaction (this example is from 
the BioJavaX docbook in CVS)

SessionFactory sessionFactory = new Configuration().configure().buildSessionFactory();
Session session = sessionFactory.openSession();
RichObjectFactory.connectToBioSQL(session);

Transaction tx = session.beginTransaction();
try {

    // print out all the namespaces in the database

    Query q = session.createQuery("from Namespace");
    List namespaces = q.list();               // retrieve all the namespaces from the db
    for (Iterator i = namespaces.iterator(); i.hasNext(); ) {
        Namespace ns = (Namespace)i.next();
        System.out.println(ns.getName());     // print out the name of the namespace

        // print out all the sequences in the namespace
        Query sq = session.createQuery("from BioEntry where namespace= :nsp");
        // set the named parameter "nsp" to ns
        sq.setParameter("nsp",ns);
        List sequences = sq.list();

        for (Iterator j = sequences.iterator(); j.hasNext(); ) {
            BioEntry be = (BioEntry)j.next();        // RichSequences are BioEntrys too
            System.out.println("   "+be.getName());  // print out the name of the sequence

            // if the sequence is called bloggs, change its description to XYZ

            if (be.getName().equals("bloggs")) {
                be.setDescription("XYZ");
            }
        }

    }

    // commit and tidy up
    tx.commit();
    System.out.println("Changes committed.");

    // all sequences called bloggs now have a description of "XYZ" in the database

} catch (Exception e) {
    tx.rollback();
    System.out.println("Changes rolled back.");
    e.printStackTrace();
}

session.close();




From mark.schreiber at novartis.com  Wed Mar 15 07:11:55 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Wed, 15 Mar 2006 15:11:55 +0800
Subject: [Biojava-l] biojavax GenbankFormat and legacy genbank records
Message-ID: <OFCB3369C9.A20B0E95-ON48257132.002727B7-48257132.00278B93@EU.novartis.net>

Hi -

I'm happy for the regexps in GenbankFormat and EMBLFormat etc to be 
relaxed a little as long as the parsing of fully valid genbank files 
doesn't suffer. If someone wants to test this thoroughly it would be a 
great benefit to the whole community.

In some cases it may not be possible. For example if a feature doesn't 
have sufficient information to build a proper RichFeature object I don't 
think we should allow the file.

It might be good to make a collection in CVS of example files that are 
known to have broken the parser in the past (the files folder in the test 
suite would be an ideal place).
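
To make the idea of relaxing the patterns concrete, here is a small sketch of
a LOCUS-line pattern in which the division and modification date are optional.
It is an illustration only: the pattern, the class name TolerantLocus and the
sample lines are assumptions, not the regexps actually used by GenbankFormat.
Lines that lack the required name and length are still rejected, in keeping
with the point that a file without enough information should not be accepted.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TolerantLocus {
    // Illustrative pattern, not the one in GenbankFormat.
    // Name and length are required; everything after "bp" is optional.
    static final Pattern LOCUS = Pattern.compile(
        "^LOCUS\\s+(\\S+)\\s+(\\d+)\\s+bp"
        + "(?:\\s+(\\S*[DR]NA))?"                // molecule type, e.g. DNA, mRNA
        + "(?:\\s+(linear|circular))?"           // topology
        + "(?:\\s+([A-Z]{3}))?"                  // division, e.g. PLN
        + "(?:\\s+(\\d{2}-[A-Z]{3}-\\d{4}))?"    // modification date
        + "\\s*$");

    // Returns {name, length, division, date}; absent optional fields are null.
    static String[] parse(String line) {
        Matcher m = LOCUS.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a usable LOCUS line: " + line);
        }
        return new String[] { m.group(1), m.group(2), m.group(5), m.group(6) };
    }

    public static void main(String[] args) {
        // A fully specified line and a legacy-style line without division and date.
        String full = "LOCUS       SCU49845     5028 bp    DNA     linear   PLN 21-JUN-1999";
        String legacy = "LOCUS       pFOO         1200 bp    DNA     circular";
        System.out.println(Arrays.toString(parse(full)));
        System.out.println(Arrays.toString(parse(legacy)));
    }
}
```

A pattern like this would need to be run against a large set of fully valid
records as well as the known-bad ones before it could be trusted.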

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910



From koeberle at mpiib-berlin.mpg.de  Thu Mar 16 10:03:26 2006
From: koeberle at mpiib-berlin.mpg.de (=?ISO-8859-1?Q?Christian_K=F6berle?=)
Date: Thu, 16 Mar 2006 11:03:26 +0100
Subject: [Biojava-l] BioJAVA-X + BioSQL + no update
In-Reply-To: <OF1DCB13C9.C7738C71-ON48257132.0024F81A-48257132.0026A3EF@EU.novartis.net>
References: <OF1DCB13C9.C7738C71-ON48257132.0024F81A-48257132.0026A3EF@EU.novartis.net>
Message-ID: <441937EE.6000204@mpiib-berlin.mpg.de>

Hi  Mark,

It works, but the code has to look like this:
...
        session.getTransaction().begin();
        session.saveOrUpdate("Sequence",seq);
        session.getTransaction().commit();
It also works with:
        session.update("Sequence",seq);

Thanks,
Christian



-- 
Christian Köberle

Max Planck Institute for Infection Biology
Department: Immunology
Schumannstr. 21/22
10117 Berlin

Tel: +49 30 28 460 562
e-mail: koeberle at mpiib-berlin.mpg.de


From mark.schreiber at novartis.com  Fri Mar 17 02:50:34 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 17 Mar 2006 10:50:34 +0800
Subject: [Biojava-l] ProfileHMM Serialization Problem
Message-ID: <OF6364F467.02ADA9D1-ON48257134.000F8AD9-48257134.000F9E14@EU.novartis.net>

He did fix a number of problems, although possibly not all.

Which version are you using?

Can you send a stack trace?

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910


Todd Riley <toddri at eden.rutgers.edu>
03/17/2006 10:33 AM

 
        To:     Mark Schreiber/GP/Novartis at PH
        cc:     biojava-l-bounces at portal.open-bio.org, biojava-l at biojava.org
        Subject:        ProfileHMM Serialization Problem


Hello all,

I am having a problem with serialized ProfileHMM objects.  I can read in 
one serialized ProfileHMM object, but never more than one (I can't even 
read in the same serialized object again.)  It appears that the problem 
lies with the AlphabetManager. Maybe a clash with alphabet names and/or 
indexes?  I looked in the archives and found the problem seemed to exist 
back in Oct of 2002.  Has this ever been addressed?  Any help here would 
be greatly appreciated.

Thanks,
Todd

RE: [Biojava-l] Re: Biojava-l digest, Vol 1 #776 - 5 msgs

Schreiber, Mark
Tue, 08 Oct 2002 13:11:33 -0700

Yup,

It needs fixing, serialization and BioJava just don't seem to play that
well :(

The question is what kind of API. The attractive part about
serialization is that when it works you get back what you started with.
You can also do RMI. The downside of the XML model is you don't get back
what you had before, you get back a MarkovModel, all of your custom
designed methods etc are lost. 

Two ways I can see to get around this. One: write a wrapper class that
makes your custom model and the thing returned by the XMLMarkovModel
look the same (look like the same interface generally). The other option
is to mimic something like JAXB (not JAXB though as it won't cope well
with BioJava flyweight symbols and alphabets). Somewhere the class name
is stored in the XML and through the wonders of introspection things are
returned to how they were. This generally requires the class to be
designed as a valid bean, or at least point to a nice FactoryClass or
something.

Ultimately this would be good for all of BioJava. I know people hate the
idea of another XML format but I think that there really isn't one that
represents what we are trying to do here. You could also write XSLT to
transform into XML flavours that aren't as interested in gory details
such as classnames etc which are needed for serialization.

Just my $0.02

- Mark


> -----Original Message-----
> From: Matthew Pocock [mailto:[EMAIL PROTECTED]] 
> Sent: Wednesday, 9 October 2002 7:08 a.m.
> To: Lachlan Coin; [EMAIL PROTECTED]
> Subject: Re: [Biojava-l] Re: Biojava-l digest, Vol 1 #776 - 5 msgs
> 
> 
> Hi,
> 
> HMM serialization (or persistence) seems to be an
> ongoing problem for people. We (OK - I) wrote this
> code a long time ago, back in the dark ages when I
> didn't know much about programming. Does anyone want
> to fix this mess once and for all, and write an HMM
> persistence API? It sounds like that would be a really
> helpful thing to have.
> 
> Matthew
> 
>  --- Lachlan Coin <[EMAIL PROTECTED]> wrote: > Hi
> > 
> > Having made a mistake in serialising HMMs before -
> > are you writing your
> > serialised object at several points in the code?
> > Unless you write all of
> > the models at the same point, they will not work
> > when you read them back
> > in.
> > 
> > Cheers,
> > 
> > Lachlan
> > 
> > >
> > > Message: 1
> > > Subject: RE: [Biojava-l] Create DP object from profileHMM class file
> > > Date: Tue, 8 Oct 2002 08:53:41 +1300
> > > From: "Schreiber, Mark" <[EMAIL PROTECTED]>
> > > To: "Tisanai" <[EMAIL PROTECTED]>, <[EMAIL PROTECTED]>
> > >
> > > Hi -
> > >
> > > The error is coming from the 64th line of your program (at
> > > T_Zscore.main(T_Zscore.java:64))
> > >
> > > I can see two places that the error might be coming from but I need to
> > > know which line is the 64th line of the program.
> > >
> > > Is it: ProfileHMM model = (ProfileHMM) ois_md.readObject();
> > >
> > > Or is it: dp[i] = DPFactory.DEFAULT.createDP(model);
> > >
> > >
> > > > -----Original Message-----
> > > > From: Tisanai [mailto:[EMAIL PROTECTED]]
> > > > Sent: Tuesday, 8 October 2002 2:40 a.m.
> > > > To: [EMAIL PROTECTED]
> > > > Subject: [Biojava-l] Create DP object from profileHMM class file
> > > >
> > > >
> > > > Hi
> > > >
> > > >    With this code I would like to create DP objects from several
> > > > phmm files.
> > > >
> > > >        for(int i=0;i<md_out_lst.align.length;i++){
> > > >         String model_out_name = md_out_lst.align[i];
> > > >         File md_file = new File(model_out_name);
> > > >
> > > >         FileInputStream fis_md = new FileInputStream(md_file);
> > > >         ObjectInputStream ois_md = new ObjectInputStream(fis_md);
> > > >         ProfileHMM model = (ProfileHMM) ois_md.readObject();
> > > >         ois_md.close();
> > > >         dp[i] = DPFactory.DEFAULT.createDP(model);
> > > >        }
> > > >
> > > >    I found that it always gets stuck at the second file (i=2). If
> > > > there is only one file in my list this code works fine. But if
> > > > there is more than one file in the list, then when it tries to
> > > > create the second dp object (dp[1]) this kind of error is shown:
> > > >
> > > >         org.biojava.bio.BioError: State d-15 is known in states  but is not listed in the transFrom table
> > > >         at org.biojava.bio.dp.SimpleMarkovModel.transitionsFrom(SimpleMarkovModel.java:227)
> > > >         at org.biojava.bio.dp.DP$HMMOrderByTransition.transitionsTo(DP.java:599)
> > > >         at org.biojava.bio.dp.DP$HMMOrderByTransition.compare(DP.java:586)
> > > >         at org.biojava.bio.dp.DP.stateList(DP.java:123)
> > > >         at org.biojava.bio.dp.DP.update(DP.java:353)
> > > >         at org.biojava.bio.dp.onehead.SingleDP.update(SingleDP.java:49)
> > > >         at org.biojava.bio.dp.DP.<init>(DP.java:377)
> > > >         at org.biojava.bio.dp.onehead.SingleDP.<init>(SingleDP.java:41)
> > > >         at org.biojava.bio.dp.DPFactory$DefaultFactory.createDP(DPFactory.java:53)
> > > >         at T_Zscore.main(T_Zscore.java:64)
> > > >
> > > >     How can I fix my code?
> > > >
> > > > Thank
> > > > Tisanai
> > > >
> > > > _______________________________________________
> > > > Biojava-l mailing list  -  [EMAIL PROTECTED] 
> > > > http://biojava.org/mailman/listinfo/biojava-l
> > > >
> > >
> >
> ==============================================================
> =========
> > > Attention: The information contained in this
> > message and/or attachments
> > > from AgResearch Limited is intended only for the
> > persons or entities
> > > to which it is addressed and may contain
> > confidential and/or privileged
> > > material. Any review, retransmission,
> > dissemination or other use of, or
> > > taking of any action in reliance upon, this
> > information by persons or
> > > entities other than the intended recipients is
> > prohibited by AgResearch
> > > Limited. If you have received this message in
> > error, please notify the
> > > sender immediately.
> > >
> >
> ==============================================================
> =========
> > 
> > _______________________________________________
> > Biojava-l mailing list  -  [EMAIL PROTECTED] 
> > http://biojava.org/mailman/listinfo/biojava-l
> 
> __________________________________________________
> Do You Yahoo!?
> Everything you'll ever need on one web page
> from News and Sport to Email and Music Charts 
> http://uk.my.yahoo.com >
_______________________________________________
> Biojava-l mailing list  -  [EMAIL PROTECTED] 
> http://biojava.org/mailman/listinfo/biojava-l
> 
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================
_______________________________________________
Biojava-l mailing list  -  [EMAIL PROTECTED]
http://biojava.org/mailman/listinfo/biojava-l
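
Lachlan's advice in the quoted 2002 thread, to write all of the models at the
same point, is consistent with a general property of Java serialization:
objects written to one ObjectOutputStream keep their shared references when
read back, while objects written to separate streams each come back with
their own copy of anything they shared. The sketch below shows only that
general behaviour. Alpha and Model are hypothetical stand-ins, and whether
this is the actual cause of the ProfileHMM problem is an assumption, not
something confirmed in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SharedStreamDemo {
    // Hypothetical stand-in for a shared alphabet-like object.
    static final class Alpha implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Alpha(String name) { this.name = name; }
    }

    // Hypothetical stand-in for a model that references the shared object.
    static final class Model implements Serializable {
        private static final long serialVersionUID = 1L;
        final Alpha alphabet;
        Model(Alpha alphabet) { this.alphabet = alphabet; }
    }

    // Writes all objects to ONE stream, then reads them all back from one stream.
    static Object[] roundTrip(Object... objects) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                for (Object o : objects) out.writeObject(o);
            }
            Object[] result = new Object[objects.length];
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                for (int i = 0; i < result.length; i++) result[i] = in.readObject();
            }
            return result;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Alpha shared = new Alpha("DNA");
        Model a = new Model(shared);
        Model b = new Model(shared);

        // Both models in one stream: the shared reference survives.
        Object[] together = roundTrip(a, b);
        boolean oneStream = ((Model) together[0]).alphabet == ((Model) together[1]).alphabet;

        // Each model in its own stream: each gets a separate copy.
        Model a2 = (Model) roundTrip(a)[0];
        Model b2 = (Model) roundTrip(b)[0];
        boolean twoStreams = a2.alphabet == b2.alphabet;

        System.out.println("one stream shares alphabet: " + oneStream);
        System.out.println("two streams share alphabet: " + twoStreams);
    }
}
```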



From mark.schreiber at novartis.com  Fri Mar 17 02:52:52 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 17 Mar 2006 10:52:52 +0800
Subject: [Biojava-l] Away
Message-ID: <OFE32BF527.53630C7A-ON48257134.000FA01A-48257134.000FD39F@EU.novartis.net>

Hello -

I'm going to be travelling a lot in the next 5 weeks and may only have 
patchy access to email and no access to CVS or my development machines. 
Therefore I won't be able to offer much in the way of technical support. 

Hopefully Richard and Michael will be able to deal with any major issues 
that crop up.

Best regards,

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910


From toddri at eden.rutgers.edu  Fri Mar 17 02:33:05 2006
From: toddri at eden.rutgers.edu (Todd Riley)
Date: Thu, 16 Mar 2006 21:33:05 -0500
Subject: [Biojava-l] ProfileHMM Serialization Problem
In-Reply-To: <OFE0127171.ED2CB90B-ON4825712C.000590AE-4825712C.0005B129@EU.novartis.net>
References: <OFE0127171.ED2CB90B-ON4825712C.000590AE-4825712C.0005B129@EU.novartis.net>
Message-ID: <441A1FE1.9000508@eden.rutgers.edu>

  Hello all,

I am having a problem with serialized ProfileHMM objects.  I can read in 
one serialized ProfileHMM object, but never more than one (I can't even 
read in the same serialized object again.)  It appears that the problem 
lies with the AlphabetManager. Maybe a clash with alphabet names and/or 
indexes?  I looked in the archives and found the problem seemed to exist 
back in Oct of 2002.  Has this ever been addressed?  Any help here would 
be greatly appreciated.

Thanks,
Todd


  RE: [Biojava-l] Re: Biojava-l digest, Vol 1 #776 - 5 msgs

Schreiber, Mark
Tue, 08 Oct 2002 13:11:33 -0700

Yup,

It needs fixing, serialization and BioJava just don't seem to play that
well :(

The question is what kind of API. The attractive part about
serialization is that when it works you get back what you started with.
You can also do RMI. The downside of the XML model is you don't get back
what you had before, you get back a MarkovModel, all of your custom
designed methods etc are lost. 

Two ways I can see to get around this. One: write a wrapper class that
makes your custom model and the thing returned by the XMLMarkovModel
look the same (look like the same interface generally). The other option
is to mimic something like JAXB (not JAXB though as it won't cope well
with BioJava flyweight symbols and alphabets). Somewhere the class name
is stored in the XML and through the wonders of introspection things are
returned to how they were. This generally requires the class to be
designed as a valid bean, or at least point to a nice FactoryClass or
something.

Ultimately this would be good for all of BioJava. I know people hate the
idea of another XML format but I think that there really isn't one that
represents what we are trying to do here. You could also write XSLT to
transform into XML flavours that aren't as interested in gory details
such as classnames etc which are needed for serialization.

Just my $0.02

- Mark


> -----Original Message-----
> From: Matthew Pocock [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, 9 October 2002 7:08 a.m.
> To: Lachlan Coin; [EMAIL PROTECTED]
> Subject: Re: [Biojava-l] Re: Biojava-l digest, Vol 1 #776 - 5 msgs
> 
> 
> Hi,
> 
> HMM serialization (or persistence) seems to be an
> ongoing problem for people. We (OK - I) wrote this
> code a long time ago, back in the dark ages when I
> didn't know much about programming. Does anyone want
> to fix this mess once and for all, and write an HMM
> persistence API? It sounds like that would be a really
> helpful thing to have.
> 
> Matthew
> 
>  --- Lachlan Coin <[EMAIL PROTECTED]> wrote:
> > Hi
> > 
> > Having made a mistake in serialising HMMs before - are you writing your
> > serialised object at several points in the code? Unless you write all of
> > the models at the same point, they will not work when you read them back
> > in.
> > 
> > Cheers,
> > 
> > Lachlan
> > 
> > >
> > > Message: 1
> > > Subject: RE: [Biojava-l] Create DP object from profileHMM class file
> > > Date: Tue, 8 Oct 2002 08:53:41 +1300
> > > From: "Schreiber, Mark" <[EMAIL PROTECTED]>
> > > To: "Tisanai" <[EMAIL PROTECTED]>, <[EMAIL PROTECTED]>
> > >
> > > Hi -
> > >
> > > The error is coming from the 64th line of your program (at
> > > T_Zscore.main(T_Zscore.java:64))
> > >
> > > I can see two places that the error might be coming from but I need to
> > > know which line is the 64th line of the program.
> > >
> > > Is it: ProfileHMM model = (ProfileHMM) ois_md.readObject();
> > >
> > > Or is it: dp[i] = DPFactory.DEFAULT.createDP(model);
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Tisanai [mailto:[EMAIL PROTECTED]]
> > > > Sent: Tuesday, 8 October 2002 2:40 a.m.
> > > > To: [EMAIL PROTECTED]
> > > > Subject: [Biojava-l] Create DP object from profileHMM class file
> > > >
> > > >
> > > > Hi
> > > >
> > > >    With this code I would like to create DP objects from several
> > > > phmm files.
> > > >
> > > >        for(int i=0;i<md_out_lst.align.length;i++){
> > > >         String model_out_name = md_out_lst.align[i];
> > > >         File md_file = new File(model_out_name);
> > > >
> > > >         FileInputStream fis_md = new FileInputStream(md_file);
> > > >         ObjectInputStream ois_md = new ObjectInputStream(fis_md);
> > > >         ProfileHMM model = (ProfileHMM) ois_md.readObject();
> > > >         ois_md.close();
> > > >         dp[i] = DPFactory.DEFAULT.createDP(model);
> > > >        }
> > > >
> > > >    I found that it always gets stuck at the second file (i=1). If there
> > > > is only one file in my list this code works fine. But if there is more
> > > > than one file in the list, then when it tries to create the second DP
> > > > object (dp[1]) this kind of error is shown:
> > > >
> > > >     org.biojava.bio.BioError: State d-15 is known in states but is not listed in the transFrom table
> > > >         at org.biojava.bio.dp.SimpleMarkovModel.transitionsFrom(SimpleMarkovModel.java:227)
> > > >         at org.biojava.bio.dp.DP$HMMOrderByTransition.transitionsTo(DP.java:599)
> > > >         at org.biojava.bio.dp.DP$HMMOrderByTransition.compare(DP.java:586)
> > > >         at org.biojava.bio.dp.DP.stateList(DP.java:123)
> > > >         at org.biojava.bio.dp.DP.update(DP.java:353)
> > > >         at org.biojava.bio.dp.onehead.SingleDP.update(SingleDP.java:49)
> > > >         at org.biojava.bio.dp.DP.<init>(DP.java:377)
> > > >         at org.biojava.bio.dp.onehead.SingleDP.<init>(SingleDP.java:41)
> > > >         at org.biojava.bio.dp.DPFactory$DefaultFactory.createDP(DPFactory.java:53)
> > > >         at T_Zscore.main(T_Zscore.java:64)
> > > >
> > > >     How can I fix my code?
> > > >
> > > > Thank
> > > > Tisanai
> > > >
> > > > _______________________________________________
> > > > Biojava-l mailing list  -  [EMAIL PROTECTED] 
> > > > http://biojava.org/mailman/listinfo/biojava-l
> > > >
> > >
> >
> ==============================================================
> =========
> > > Attention: The information contained in this
> > message and/or attachments
> > > from AgResearch Limited is intended only for the
> > persons or entities
> > > to which it is addressed and may contain
> > confidential and/or privileged
> > > material. Any review, retransmission,
> > dissemination or other use of, or
> > > taking of any action in reliance upon, this
> > information by persons or
> > > entities other than the intended recipients is
> > prohibited by AgResearch
> > > Limited. If you have received this message in
> > error, please notify the
> > > sender immediately.
> > >
> >
> ==============================================================
> =========

[Biojava-l] Re: Biojava-l digest, Vol 1 #776 - 5 msgs 
<http://www.mail-archive.com/biojava-l at biojava.org/msg02132.html> 
Lachlan Coin
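
Lachlan's hint in the 2002 thread above (write all of the models at the same point) can be illustrated with plain java.io serialization. The following is only a sketch under stated assumptions: StandInModel is a hypothetical placeholder for a serializable model such as ProfileHMM, and the class and method names are invented for illustration. Whether this approach avoids the AlphabetManager problem described above is not established anywhere in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a serializable model (NOT a BioJava class).
final class StandInModel implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;

    StandInModel(String name) {
        this.name = name;
    }
}

final class SingleStreamRoundTrip {
    // Write every model through ONE ObjectOutputStream, preceded by a count.
    static byte[] writeAll(List<StandInModel> models) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(models.size());
            for (StandInModel m : models) {
                out.writeObject(m);
            }
        }
        return bytes.toByteArray();
    }

    // Read the count, then read the models back from ONE ObjectInputStream.
    static List<StandInModel> readAll(byte[] data)
            throws IOException, ClassNotFoundException {
        List<StandInModel> models = new ArrayList<>();
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                models.add((StandInModel) in.readObject());
            }
        }
        return models;
    }
}
```

Objects written through a single stream share that stream's handle table, so an object referenced by several models is written once and restored as one instance. That sharing is lost when each model goes to its own file.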


From er.sukhdeepsingh at gmail.com  Fri Mar 17 11:21:16 2006
From: er.sukhdeepsingh at gmail.com (Sukhdeep Singh)
Date: Fri, 17 Mar 2006 16:51:16 +0530
Subject: [Biojava-l] need help
Message-ID: <40fbb41e0603170321p572b04cdj20d8e84ae5fb3977@mail.gmail.com>

Hello everyone,

I am Sukhdeep Singh, a second-year student at Ambala College of Engineering &
Applied Research.

I am very dedicated to bioinformatics and want to do something great in it.
I completed basic and advanced courses in bioinformatics during my 15-day
winter vacation, and I have learned to use software such as RASMOL, SWISSPDB,
CN3D (V3.1), CLUSTAL-X and HYPERCAM (V7.5 student evaluation version).
I have a good knowledge of computers, having used them for about 4 years, but
only a moderate knowledge of biology.
I am also familiar with databases such as KEGG, NCBI, PUBMED and ENTREZ.

Could you point me to any tutorial for BioJava or BioPerl, or to any institute
offering training in bioinformatics or a related subject for about 45 days
in July-August?

Please help me out with this.

Reply to me at er.sukhdeepsingh at gmail.com

SUKHDEEP SINGH


From dag at sonsorol.org  Tue Mar 21 17:55:11 2006
From: dag at sonsorol.org (Chris Dagdigian)
Date: Tue, 21 Mar 2006 12:55:11 -0500
Subject: [Biojava-l] Important OBF update for biojava developers and users
Message-ID: <EA431336-80D3-4430-8690-73A33AAC3A55@sonsorol.org>


Executive summary:

biojava.org new DNS is propagating as I write this email. Eventually  
everyone should see the new wiki-based site running on the new OBF  
server hardware.  Read on for more info on some other upcoming  
changes...


Hi biojava people,

Sorry for the interruption but I've got some important site and  
server news. People will also see multiple copies of this note as I  
slowly transition sites over one at a time.

We are in the midst of moving all of our websites, mailing lists,  
developers and sourcecode repositories onto more modern hardware  
located in a 2nd Boston area datacenter facility.

The transition is important for a couple of reasons - the most urgent  
being that we are going to lose internet connectivity in our current  
hosting facility on March 27th 2006.  That datacenter belongs to  
Wyeth Research in Cambridge, Massachusetts.  Wyeth Research &  
Genetics Institute have been long time significant supporters &  
hosting providers for OBF servers and projects -- we owe them a great  
deal of gratitude and public acknowledgment for hosting our servers  
over many years. Speaking as a hardware geek I can tell you that the  
many years of high-bandwidth, trouble free hosting have been  
invaluable for our efforts and projects.   Sadly, it is no longer  
possible for them to host our servers as they need to begin making  
some network and WAN circuit changes that will no longer support  
direct internet facing servers (such as ours) in Cambridge.

The other major reason for the transition is our need to relocate  
onto hardware that can better be remotely managed (as our volunteer  
administrators are scattered all over the globe).

My employer, BioTeam Inc. has donated new server hardware and is also  
providing the hosting facilities in a Tier 1 Boston area colocation  
facility.
Infrastructure geeks can see pictures of the colocation  cage and the  
new OBF servers online at this URL:
http://bioteam.net/gallery/bioteamBDC  -- those servers also host  
EMBOSS FTP/CVS and mailing lists.


Current status of the migration:

  - All 57 mailing lists have been moved over to the new hardware  
(you may have noticed "lists.open-bio.org" showing up in your list  
messages)

  - The new anonymous sourcecode server is running at
http://code.open-bio.org. "cvs.biojava.org" is already pointing at it.

  - Your website (biojava.org) was moved to the new hardware (and new  
Wiki site!) about an hour ago

  - Developers with CVS accounts have *NOT* been migrated yet

Basically we are trying to relocate everything but the developers  
over the next few days so we can spend the weekend on the developer  
and CVS transition.


If DNS has not propagated yet, point your browser at
http://biojava.open-bio.org -- that is the new site your group has been
building. What is happening now is DNS pointers for biojava.org and  
www.biojava.org are slowly changing over to point at the wiki and the  
new hardware. Eventually you'll see the same site regardless of which  
URL you use.


  For biojava users
  --------------------------
Please keep an eye on your website and mailing lists and let  
support at open-bio.org know if there are any problems with the  
transition. In particular your new wiki site contains embedded links  
to some parts of the 'old' static website.  I caught the obvious ones
(biojava.org/downloads/ and biojava.org/docs/) but I may have
missed some.  Please let me know about any broken links.

Also someone may want to clean up the biojava logo image now in the  
wiki to make the white background transparent.


  For developers and leaders
  ---------------------------------------------
Whoever will be updating the static parts of the website (/downloads/
and /docs/) in the future will need login access to our new central
webserver machine. Please contact support at open-bio.org to request a
user account for biojava website maintenance.

For people with CVS commit/write access
---------------------------------------------------------
Also note that when we finally do transition over to the new  
developer machine (where the real sourcecode lives), ALL developers  
will need to email support at open-bio.org to request a password reset.  
Although we can transition usernames, settings and home directories  
over from the old to the new machine we can not transition over  
existing passwords as they are stored in incompatible hashed formats.  
All developers are going to need new passwords for the new developer  
machine.  We will likely make the developer machine swap this weekend.


Reporting Problems / Help & Assistance
------------------------------------------------------
The transition will be complicated, we need your help to spot  
problems and glitches! The OBF has a new helpdesk ticketing system  
set up at "support at open-bio.org" so that all OBF admins can read and  
respond to issues and problems. Most troubles should be reported to  
that address. For urgent problems, especially during this transition  
period, feel free to contact me directly (dag at sonsorol.org)
(ichat/aol/aim screen name: bioteamdag).


Regards,
Chris Dagdigian
open-bio.org


From toddri at eden.rutgers.edu  Thu Mar 23 21:59:23 2006
From: toddri at eden.rutgers.edu (Todd Riley)
Date: Thu, 23 Mar 2006 16:59:23 -0500
Subject: [Biojava-l] HMM's - Attempting some fancy stuff
In-Reply-To: <44119D31.6010703@mpiib-berlin.mpg.de>
References: <OF7F997996.426B09C6-ON4825712C.00092887-4825712C.000CB319@EU.novartis.net>
	<44119D31.6010703@mpiib-berlin.mpg.de>
Message-ID: <44231A3B.6030902@eden.rutgers.edu>

Hello,

After successfully implementing some TFBS search models using the 
ProfileHMM and DP classes, I am ready to attempt some fancier stuff that 
is going to require some serious coding.  Before I begin, I thought that 
I might field some questions to the BioJava users/programmers that have 
some experience and/or interest in the BioJava HMM classes.  I want to 
be sure to implement features in a fashion that will maximize usability 
in the simplest way....

Questions:

1. Many of the TFBS sites that I am modeling are palindromic or 
repetitive.  I wish to associate transition and emission distributions 
(as prior knowledge) during training in order to enforce a palindromic 
and/or repetitive pattern and thus also greatly reduce the parameter space.

Example: A p53 TFBS is palindromic and repetitive.  A 20 column Profile 
HMM can be greatly reduced to an HMM with the match-state topology of 
1 2 3 4 5 C(5) C(4) C(3) C(2) C(1) 1 2 3 4 5 C(5) C(4) C(3) C(2) C(1), 
where C() means DNA complement.  Notice that with this model, I now have 
only 5 match-state emissions as opposed to 20 to train.  (C(n) is a 
complement view over distribution n).  There are also far fewer 
transition distributions to train if I impose that the transitions from 
a->b are the same as b->a or C(b)->C(a), but in the opposite direction.

I wish to implement this in a fashion that does not require any changes 
to the current Viterbi, forward, Baum Welch, etc, algorithms, or the DP 
class.

I have already started writing classes that provide a view (or 
complement view) over an existing distribution.  My plan is to use these 
views as a means to correlate emission and transition distributions from 
and between different columns in the Profile HMM.

Has anyone ever tried this or thought of trying this?  Any ideas about 
how to implement this could be very useful.
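
As a rough illustration of the complement-view idea in question 1, here is a minimal sketch in plain Java. It does not use the BioJava Distribution interface. Base, Emission, TrainableEmission and ComplementView are hypothetical names invented for this sketch, and no normalization of weights is enforced.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical DNA alphabet with a complement operation.
enum Base {
    A, C, G, T;

    Base complement() {
        switch (this) {
            case A: return T;
            case C: return G;
            case G: return C;
            default: return A; // T pairs with A
        }
    }
}

// Minimal read-only emission interface (a stand-in, not BioJava's Distribution).
interface Emission {
    double weight(Base b);
}

// The only object that holds trainable parameters.
final class TrainableEmission implements Emission {
    private final Map<Base, Double> weights = new EnumMap<>(Base.class);

    TrainableEmission() {
        for (Base b : Base.values()) {
            weights.put(b, 0.25); // start uniform
        }
    }

    public double weight(Base b) {
        return weights.get(b);
    }

    void setWeight(Base b, double w) {
        weights.put(b, w);
    }
}

// Read-only view: the weight of a base is the source's weight of its complement.
final class ComplementView implements Emission {
    private final Emission source;

    ComplementView(Emission source) {
        this.source = source;
    }

    public double weight(Base b) {
        return source.weight(b.complement());
    }
}
```

Because the view stores no weights of its own, retraining column n immediately changes C(n) as well. That is what ties the parameters together, so the 20-column example would need only 5 trainable emissions.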

2.  I wish to use more complicated background models than just a 0-th 
order background distribution.  I would like to use a Dirichlet mixture 
and/or higher order Markov models.  Has anyone looked into this?  Any 
ideas as to how to implement this in the current release?

-Todd


From toddri at eden.rutgers.edu  Thu Mar 23 23:04:45 2006
From: toddri at eden.rutgers.edu (Todd Riley)
Date: Thu, 23 Mar 2006 18:04:45 -0500
Subject: [Biojava-l] HMM's - Attempting some fancy stuff
In-Reply-To: <1143153837.13405.184.camel@elm.mcb.mcgill.ca>
References: <OF7F997996.426B09C6-ON4825712C.00092887-4825712C.000CB319@EU.novartis.net>	
	<44119D31.6010703@mpiib-berlin.mpg.de>
	<44231A3B.6030902@eden.rutgers.edu>
	<1143153837.13405.184.camel@elm.mcb.mcgill.ca>
Message-ID: <4423298D.8000901@eden.rutgers.edu>

Yes, I agree that the palindromes are not always identical.  However, 
often my unaligned training data is not complete enough to train the 
model well without some simplification.  So far, I have been using 
Cross-validation, sensitivity, and specificity to determine the 
effectiveness of this simplification approach.

-Todd

Francois Pepin wrote:

>>1. Many of the TFBS sites that I am modeling are palindromic or 
>>repetitive.  I wish to associate transition and emission distributions 
>>(as prior knowledge) during training in order to enforce a palindromic 
>>and/or repetitive pattern and thus also greatly reduce the parameter space.
>>    
>>
>
>Just as a note, we haven't found this to be ideal, if you have
>sufficient training data. It is often the case that one of the
>palindromes is more conserved than the other, and you would be treating
>them the same way.
>
>Of course, it depends how much of an in-depth study you'll want to be
>doing.
>
>Francois
>
>  
>


From mark.schreiber at novartis.com  Fri Mar 24 02:28:04 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 24 Mar 2006 10:28:04 +0800
Subject: [Biojava-l] HMM's - Attempting some fancy stuff
Message-ID: <OF8375920E.72210EFF-ON4825713B.000CD1C1-4825713B.000D8EBC@EU.novartis.net>

I think you could do a palindrome as a push-down automaton or similar. 
Alternatively you could do something like a HMM with emission duration as 
in Borodovsky's GeneMarkHMM programs but that would require a lot of new 
code for the DP library (good to have though).

To use a Dirichlet mixture as your background you could calculate one and
give it to a Distribution, although it might be best to implement the
Distribution interface with a class that generates one for you. To go to
higher order models you just need a higher order alphabet
(http://biojava.org/wiki/BioJava:Cookbook:Alphabets:CrossProduct) and
possibly use an OrderNDistribution for background and emission
(http://biojava.org/wiki/BioJava:CookBook:Distribution:Custom)
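
To make the idea of a higher order background concrete, here is a minimal first-order sketch in plain Java. It deliberately avoids the BioJava API: FirstOrderBackground and its methods are hypothetical names, and a simple pseudocount is assumed for smoothing. It estimates P(current base | previous base) from training sequences, which is the kind of conditional table an order-1 distribution would hold.

```java
// Hypothetical first-order Markov background over DNA (not a BioJava class).
final class FirstOrderBackground {
    private static final String BASES = "acgt";
    // counts[prev][cur] = pseudocount + number of observed prev->cur pairs
    private final double[][] counts = new double[4][4];

    FirstOrderBackground(double pseudocount) {
        for (double[] row : counts) {
            java.util.Arrays.fill(row, pseudocount);
        }
    }

    // Count every adjacent pair of unambiguous bases in the sequence.
    void train(String sequence) {
        String s = sequence.toLowerCase();
        for (int i = 1; i < s.length(); i++) {
            int prev = BASES.indexOf(s.charAt(i - 1));
            int cur = BASES.indexOf(s.charAt(i));
            if (prev >= 0 && cur >= 0) {
                counts[prev][cur] += 1.0;
            }
        }
    }

    // Conditional probability P(cur | prev), normalized over the row for prev.
    double probability(char prev, char cur) {
        int p = BASES.indexOf(Character.toLowerCase(prev));
        int c = BASES.indexOf(Character.toLowerCase(cur));
        if (p < 0 || c < 0) {
            throw new IllegalArgumentException("Not an unambiguous DNA base");
        }
        double rowSum = 0.0;
        for (double v : counts[p]) {
            rowSum += v;
        }
        return counts[p][c] / rowSum;
    }
}
```

With a pseudocount of 1.0 and the single training sequence "aaaa", the row for 'a' holds counts 4, 1, 1, 1, so P(a | a) is 4/7 and the untouched row for 'g' stays uniform at 0.25.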

- Mark




From jolyon.holdstock at ogt.co.uk  Fri Mar 24 11:26:44 2006
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Fri, 24 Mar 2006 11:26:44 -0000
Subject: [Biojava-l] RichSequence annotations...
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3FAFC0B7@EUCLID.internal.ogtip.com>

Hi,

 
I use the following code to extract all the genes from a sequence file; 

I load the sequence then filter out only CDS features; iterating through
these lets me get the gene annotation for the feature

 
//======================================================================
=========

Sequence seq;

String fileName =
"C:/Scripts/Java/BioJava/BioJavaX/biojava-live/demos/seq/AL121903.embl";

try {

  seq = SeqIOTools.readEmbl(new BufferedReader(new
FileReader(fileName))).nextSequence();

}

catch (IOException IOE) {

  System.out.println("IOException " + IOE);

}

catch (BioException BIOE) {

  System.out.println("BioException " + BIOE);

}

    
//Create a feature filter for CDS features only

FeatureFilter ff = new FeatureFilter.ByType("CDS");

 
//Get the filtered Features

FeatureHolder fh = seq.filter(ff);

 
//Iterate over the Features in fh

for (Iterator i = fh.features(); i.hasNext(); ) {

  Feature f = (Feature)i.next();

  Annotation annotation = f.getAnnotation();

  Object key = "gene";

  hash.put(annotation.getProperty(key), f);

}

//======================================================================
=========

 
I am now using the new BioJavaX classes, which I cannot get to work. Does
anyone have any pointers for this?

I use the sequence data so have to use a RichSequence rather than a
BioEntry

 
//======================================================================
=========

RichSequence richSeq;

String fileName =
"C:/Scripts/Java/BioJava/BioJavaX/biojava-live/demos/seq/AL121903.embl";

  try {

    richSeq = RichSequence.IOTools.readEMBLDNA(new BufferedReader(new
FileReader(fileName)), null).nextRichSequence();

  }

  catch (IOException IOE) {

    System.out.println("IOException " + IOE);

  }

  catch (BioException BIOE) {

    System.out.println("BioException " + BIOE);

}

 
//Create a feature filter for CDS features only

FeatureFilter ff = new FeatureFilter.ByType("CDS");

 
//Get the filtered Features

FeatureHolder fh = richSeq.filter(ff);

 
//Iterate through the features

for (Iterator i = fh.features(); i.hasNext(); ) {

  RichFeature rf = (RichFeature) i.next();

  System.out.println("RichFeature: " + rf.toString());

  RichAnnotation ra = (RichAnnotation) rf.getAnnotation();

  System.out.println("RichAnnotation: " + ra.toString());

}

//======================================================================
=========

 
The output  shows that CDS features have been filtered successfully and
that the gene name is in the annotation

 
RichFeature: (#1)
lcl:HSDJ155G6/AL121903.13:CDS,EMBL(biojavax:join:[<5642..5793,10804..109
76,12496..12656,14136..14266])

RichAnnotation: [(#2) biojavax:clone_lib: RPCI-1"

14403..14532,16852..16987,17821..17959,18068..18122,

19456..19570,23623..23753,25885..26053,29102..29240,

32621..32738,33595..33771],[(#3) biojavax:codon_start: 1],[(#4)
biojavax:evidence: NOT_EXPERIMENTAL],[(#5) biojavax:note: match:
proteins: Tr:Q9Y6D5 Tr:O46382 Tr:Q9Y6D6],[(#6) biojavax:gene:
dJ155G6.1],[(#7) biojavax:product: dJ155G6.1 (brefeldin A-inhibited
guanine

nucleotide-exchange protein 2)],[(#8) biojavax:protein_id: CAB86643.1]

 
If I add the following then I can see what keys are in the annotation

//======================================================================
=========

Set keySet = ra.keys();

for (Iterator it = keySet.iterator(); it.hasNext(); ) {

  String key = it.next().toString();

  System.out.println("Key: " + key);

}

//======================================================================
=========

 
The output shows that there is a gene

 
Key: biojavax:clone_lib

Key: biojavax:codon_start

Key: biojavax:evidence

Key: biojavax:gene

Key: biojavax:note

Key: biojavax:product

Key: biojavax:protein_id

 
My understanding is that I need to use a ComparableTerm to access the
value but when I create it I get a NoSuchElementException error

 
ComparableTerm gene =
RichObjectFactory.getDefaultOntology().getOrCreateTerm("gene");

System.out.println("Gene: " + ra.getProperty(gene));

 
java.util.NoSuchElementException: No such property: biojavax:gene, rank
0

 
cheers,

 
Jolyon

 
Jolyon Holdstock Ph.D.

Senior Computational Biologist,

Oxford Gene Technology (Ops) Ltd.

Begbroke Business and Science Park

Sandy Lane, Yarnton

Oxford, OX5 1PF

 
Tel: 01865 309699

Fax: 01865 842116

 
Confidentiality Notice:

The contents of this email from the Oxford Gene Technology Group of
Companies are confidential and intended solely for the person to whom it
is addressed. It may contain privileged and confidential information. If
you are not the intended recipient you must not read, copy, distribute,
discuss or take any action in reliance on it.

 
From richard.holland at ebi.ac.uk  Fri Mar 24 13:16:49 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Fri, 24 Mar 2006 13:16:49 +0000
Subject: [Biojava-l] RichSequence annotations...
In-Reply-To: <588D0DD225D05746B5D8CAE1BE971F3FAFC0B7@EUCLID.internal.ogtip.com>
References: <588D0DD225D05746B5D8CAE1BE971F3FAFC0B7@EUCLID.internal.ogtip.com>
Message-ID: <1143206209.3899.84.camel@texas.ebi.ac.uk>

The terms are ranked in RichAnnotations. getProperty(term) searches for
a Note with that term and a rank of zero.

If you don't know the ranks, you need to use the 

    public Note[] getProperties(Object key);

method on the RichAnnotation object instead. This will return a list of
all matching Note objects with the given term regardless of rank.
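
The difference between the two lookups can be sketched with a small self-contained model. This is not the BioJavaX implementation: SketchNote and SketchAnnotation are hypothetical stand-ins that only mimic the behaviour described above. The sketch also assumes that the "(#6)" printed before biojavax:gene in the original output is that note's rank.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a ranked note (term, value, rank).
final class SketchNote {
    final String term;
    final String value;
    final int rank;

    SketchNote(String term, String value, int rank) {
        this.term = term;
        this.value = value;
        this.rank = rank;
    }
}

// Hypothetical stand-in for an annotation holding ranked notes.
final class SketchAnnotation {
    private final List<SketchNote> notes = new ArrayList<>();

    void addNote(SketchNote note) {
        notes.add(note);
    }

    // Mimics the described getProperty: only a note with rank 0 matches.
    String getProperty(String term) {
        for (SketchNote n : notes) {
            if (n.term.equals(term) && n.rank == 0) {
                return n.value;
            }
        }
        throw new NoSuchElementException(
            "No such property: " + term + ", rank 0");
    }

    // Mimics the described getProperties: every note with the term, any rank.
    SketchNote[] getProperties(String term) {
        List<SketchNote> hits = new ArrayList<>();
        for (SketchNote n : notes) {
            if (n.term.equals(term)) {
                hits.add(n);
            }
        }
        return hits.toArray(new SketchNote[0]);
    }
}
```

In this model a gene note stored with rank 6 is invisible to getProperty but is returned by getProperties. With the real API the equivalent step would be to call ra.getProperties(gene) and loop over the returned Note[].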

cheers,
Richard

-- 
Richard Holland
European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-494416
---------------


From dag at sonsorol.org  Sat Mar 25 23:50:57 2006
From: dag at sonsorol.org (Chris Dagdigian)
Date: Sat, 25 Mar 2006 18:50:57 -0500
Subject: [Biojava-l] Important news for developers on open-bio machines
Message-ID: <1BB8AE37-91CA-45C7-AA81-A12826D5F422@sonsorol.org>


Hi, apologies for the massive cross-post. I'll keep it short!

This message is a last-ditch attempt to contact people with developer  
accounts on pub.open-bio.org who may have not received the individual  
mails we've been sending via the obf-developers at lists.open-bio.org  
mailing list. We suspect that there are a number of devs out there  
for whom we don't have up to date email addresses.

All open-bio services have been migrated to new hardware and a new  
datacenter. Part of this migration process involved moving all  
developer accounts and all source-code repositories to a new server.  
The developer migration was completed a few minutes ago. An  
unavoidable side effect of the move is that all developers are now  
locked out of their accounts until they contact us for a password reset.

If you are a developer and this news comes as a surprise to you, it  
means we don't have your contact info. Your best way to get up to  
speed on the history and technical details behind the migration is to  
point your browser here:

http://lists.open-bio.org/mailman/private/obf-developers/2006-March/thread.html

... and read the various messages we've posted this month. Included  
in the first message is the information on how to request an account  
reset.


Regards,
Chris Dagdigian
open-bio.org


From duze at gmx.de  Tue Mar 28 06:44:38 2006
From: duze at gmx.de ("Andreas Dräger")
Date: Tue, 28 Mar 2006 08:44:38 +0200 (MEST)
Subject: [Biojava-l] (no subject)
Message-ID: <2493.1143528278@www086.gmx.net>

Hi,

I just tried the GA-Example from the BioJava Cookbook.
Therefore I included all sources from the biojava-live
directory from CVS. The following line seems to cause
problems:

       genAlg.run(new DemoStopping());

After execution one receives the following (error) message:
gen,average_fitness,best_fitness
0,49.98,67.0
Exception in thread "main" java.lang.Error: Unresolved compilation problem:
   Syntax error on token "assert", assert expected

   at org.biojava.utils.ChangeSupport.firePreChangeEvent(ChangeSupport.java:280)
   at org.biojava.bio.symbol.SimpleSymbolList.edit(SimpleSymbolList.java:339)
   at org.biojavax.ga.functions.SimpleCrossOverFunction.performCrossOver(SimpleCrossOverFunction.java:80)
   at org.biojavax.ga.impl.SimpleGeneticAlgorithm.run(SimpleGeneticAlgorithm.java:108)
   at GADemo.main(GADemo.java:91)

I do not know how to proceed, so I am posting this message to the list.

Sincerely,
Andreas Dräger

-- 
Bis zu 70% Ihrer Onlinekosten sparen: GMX SmartSurfer!
Kostenlos downloaden: http://www.gmx.net/de/go/smartsurfer


From richard.holland at ebi.ac.uk  Tue Mar 28 07:42:33 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Tue, 28 Mar 2006 08:42:33 +0100
Subject: [Biojava-l] (no subject)
In-Reply-To: <2493.1143528278@www086.gmx.net>
References: <2493.1143528278@www086.gmx.net>
Message-ID: <1143531753.3898.45.camel@texas.ebi.ac.uk>

Hi Andreas.

This sounds like a compiler version or flags problem. 

Could you check that you are running javac from a Java 1.4 or later
installation?

Also, see
http://java.sun.com/j2se/1.4.2/docs/guide/lang/assert.html#compatibility

(The Ant script uses the flag '-source 1.4' for everything).

Then try doing an 'ant clean' before your 'ant package-biojava' to make
sure everything gets recompiled.
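
A small self-contained check can help confirm which Java installation is
actually being picked up and whether the 'assert' keyword compiles at all.
This is only a sketch: the class name AssertCheck and its methods are made
up for illustration and are not part of BioJava. On a 1.4 JDK it would need
to be compiled with 'javac -source 1.4'; later compilers accept 'assert'
by default.

```java
public class AssertCheck {

    /** Returns true only when the JVM was started with assertions enabled (-ea). */
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment inside the assert runs only if assertions are switched on.
        assert enabled = true;
        return enabled;
    }

    /** Reports the running Java version and the assertion status. */
    static String describe() {
        return "java.version=" + System.getProperty("java.version")
                + " assertions=" + assertionsEnabled();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If this file fails with a syntax error on "assert", the compiler in use (for
example the one configured in an IDE) is most likely running at a source
level below 1.4.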

cheers,
Richard 

On Tue, 2006-03-28 at 08:44 +0200, "Andreas Dräger" wrote:
> Hi,
> 
> I just tried the GA-Example from the BioJava Cookbook.
> Therefore I included all sources from the biojava-live
> directory from CVS. The following line seems to cause
> problems:
> 
>        genAlg.run(new DemoStopping());
> 
> After execution one receives the following (error) message:
> gen,average_fitness,best_fitness
> 0,49.98,67.0
> Exception in thread "main" java.lang.Error: Unresolved compilation problem:
>    Syntax error on token "assert", assert expected
> 
>    at org.biojava.utils.ChangeSupport.firePreChangeEvent(ChangeSupport.java:280)
>    at org.biojava.bio.symbol.SimpleSymbolList.edit(SimpleSymbolList.java:339)
>    at org.biojavax.ga.functions.SimpleCrossOverFunction.performCrossOver(SimpleCrossOverFunction.java:80)
>    at org.biojavax.ga.impl.SimpleGeneticAlgorithm.run(SimpleGeneticAlgorithm.java:108)
>    at GADemo.main(GADemo.java:91)
> 
> I do not know, how to proceed, so I post this message to you.
> 
> Sincerely,
> Andreas Dräger
> 
-- 
Richard Holland
European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-494416
---------------


From andreas.draeger at clever-telefonieren.de  Tue Mar 28 08:29:32 2006
From: andreas.draeger at clever-telefonieren.de (Andreas Dräger)
Date: Tue, 28 Mar 2006 10:29:32 +0200
Subject: [Biojava-l] GA-Package
Message-ID: <4428F3EC.9050507@clever-telefonieren.de>

Thanks, 

Now it works fine!

Cheers,
Andreas


Richard Holland wrote:

Hi Andreas.

This sounds like a compiler version or flags problem. 

Could you check that you are running javac from a Java 1.4 or later
installation?

Also, see
http://java.sun.com/j2se/1.4.2/docs/guide/lang/assert.html#compatibility

(The Ant script uses the flag '-source 1.4' for everything).

Then try doing an 'ant clean' before your 'ant package-biojava' to make
sure everything gets recompiled.

cheers,
Richard 

-- 
==================================
Andreas Dräger
PhD student
Eberhard Karls University Tübingen
Center for Bioinformatics (ZBIT)
Phone: +49-7071-29-70436
==================================


From wendy.wong at gmail.com  Thu Mar 30 15:41:47 2006
From: wendy.wong at gmail.com (wendy wong)
Date: Thu, 30 Mar 2006 16:41:47 +0100
Subject: [Biojava-l] unsupervised training of transition weights
Message-ID: <e554425b0603300741j755ecd33m36f04275a3811f86@mail.gmail.com>

Hi,

I am trying to train my HMM using unsupervised training (I don't need
to train the emission probabilities). I was wondering how I can do so
in BioJava. Do I have to implement the TransitionTrainer interface?

My second question is:
I implemented getWeightImpl in my custom distribution to set up my
emission states and it works fine. But is it possible to get the
program to call it only when a certain symbol occurs in the observed
sequence (instead of precalculating the weights)? I also found
(although I might be wrong) that the weights are calculated twice:
once when the distribution is created, and again when I call viterbi,
which calls getWeightImpl a second time. I am not sure what I did
wrong here :(

any input would be very much appreciated!

thank you!

wendy


From td2 at sanger.ac.uk  Fri Mar 31 10:58:38 2006
From: td2 at sanger.ac.uk (Thomas Down)
Date: Fri, 31 Mar 2006 11:58:38 +0100
Subject: [Biojava-l] unsupervised training of transition weights
In-Reply-To: <e554425b0603300741j755ecd33m36f04275a3811f86@mail.gmail.com>
References: <e554425b0603300741j755ecd33m36f04275a3811f86@mail.gmail.com>
Message-ID: <5D3C5E2A-25E7-4516-B0E8-D1F57EAAFE1A@sanger.ac.uk>


On 30 Mar 2006, at 16:41, wendy wong wrote:

> Hi,
>
> I am trying to train my HMM using unsupervised training (I don't need
> to train the emission probabilities). I was wondering how I can do so
> in biojava. do I have to implement the TransitionTrainer interface?

The easiest way to do this is to use UntrainableDistributions for all  
the transition-sets that you don't want to be trained:

         http://www.biojava.org/docs/api14/org/biojava/bio/dist/UntrainableDistribution.html

If UntrainableDistribution doesn't fit your requirements, the  
alternative is to create your own Distribution implementation with a  
registerTrainer method that creates a "dummy" (i.e. doesn't do  
anything) DistributionTrainer.  UntrainableDistribution is just a  
subclass of SimpleDistribution which replaces the registerTrainer  
method with a non-functional version.
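
The "dummy trainer" idea can be sketched without BioJava on the classpath.
Everything below is a hypothetical stand-in written for illustration: the
Trainer interface and the TrainableDist and FrozenDist classes are NOT the
BioJava API, they only mimic the pattern of overriding registerTrainer so
that training leaves the weights alone.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class DummyTrainerSketch {

    /** Hypothetical trainer: collects counts, then re-estimates weights. */
    interface Trainer {
        void addCount(String symbol, double count);
        void train();
    }

    /** Hypothetical distribution whose trainer re-estimates weights from counts. */
    static class TrainableDist {
        final Map<String, Double> weights = new LinkedHashMap<String, Double>();

        Trainer registerTrainer() {
            final Map<String, Double> counts = new LinkedHashMap<String, Double>();
            return new Trainer() {
                public void addCount(String symbol, double count) {
                    Double old = counts.get(symbol);
                    counts.put(symbol, (old == null ? 0.0 : old) + count);
                }
                public void train() {
                    double total = 0.0;
                    for (double c : counts.values()) total += c;
                    if (total == 0.0) return; // nothing observed, keep old weights
                    for (String s : weights.keySet()) {
                        Double c = counts.get(s);
                        weights.put(s, (c == null ? 0.0 : c) / total);
                    }
                }
            };
        }
    }

    /** Same distribution, but its trainer ignores counts, so weights never change. */
    static class FrozenDist extends TrainableDist {
        @Override
        Trainer registerTrainer() {
            return new Trainer() {
                public void addCount(String symbol, double count) { /* ignored */ }
                public void train() { /* weights stay as they are */ }
            };
        }
    }

    /** Starts from 0.5/0.5, feeds counts a=3 and b=1, returns the weights after training. */
    static double[] demo(TrainableDist d) {
        d.weights.put("a", 0.5);
        d.weights.put("b", 0.5);
        Trainer t = d.registerTrainer();
        t.addCount("a", 3.0);
        t.addCount("b", 1.0);
        t.train();
        return new double[] { d.weights.get("a"), d.weights.get("b") };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo(new TrainableDist()))); // [0.75, 0.25]
        System.out.println(Arrays.toString(demo(new FrozenDist())));    // [0.5, 0.5]
    }
}
```

In this sketch the training loop never needs to know which distributions are
fixed; it asks each one for a trainer and the frozen ones simply hand back a
trainer that does nothing.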

> my second question is:
> I implemented getWeightImpl in my custom distribution to set up my
> emission states and it works fine. but is it possible to get the
> program to access it only when there's certain symbol in the observed
> sequence, (instead of precalculated)? and I also found that (although
> I might be wrong) the weights are calculated twice, once was when the
> distribution was created, and then when I call viterbi it calls
> getWeightImpl again. I am not sure what I did wrong here :(

The DP code does some caching of probabilities; I don't think there's  
any way to turn this off without modifying the DP implementations.

           Thomas.


From matthew.pocock at ncl.ac.uk  Fri Mar 31 17:05:25 2006
From: matthew.pocock at ncl.ac.uk (Matthew Pocock)
Date: Fri, 31 Mar 2006 18:05:25 +0100
Subject: [Biojava-l] unsupervised training of transition weights
In-Reply-To: <5D3C5E2A-25E7-4516-B0E8-D1F57EAAFE1A@sanger.ac.uk>
References: <e554425b0603300741j755ecd33m36f04275a3811f86@mail.gmail.com>
	<5D3C5E2A-25E7-4516-B0E8-D1F57EAAFE1A@sanger.ac.uk>
Message-ID: <200603311805.25861.matthew.pocock@ncl.ac.uk>

> The DP code does some caching of probabilities, I don't think there's
> any way to turn this off without modifying the DP implementations.
>
>            Thomas.

My recollection is that if you did turn this off, the algorithm would run 
very much more slowly. Internally to the DP objects, the distribution 
probabilities (in fact, they aren't even probabilities by this stage) are 
stored in a data structure optimized for the type of lookups performed during 
the dynamic programming recursions.
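
To illustrate the general idea only (this is not the actual BioJava data
structure), here is a sketch of such a cache. The WeightSource interface,
the flat array layout and the use of log-weights are all assumptions made
for the example. The weight callback is invoked once per symbol when the
table is built, and scoring a sequence afterwards is plain array indexing.

```java
public class ScoreCacheSketch {

    /** Hypothetical callback standing in for a distribution's weight lookup. */
    interface WeightSource {
        double weight(int symbolIndex);
    }

    /** Calls the source once per symbol and stores the log-weights in a flat array. */
    static double[] buildTable(WeightSource src, int alphabetSize) {
        double[] table = new double[alphabetSize];
        for (int i = 0; i < alphabetSize; i++) {
            table[i] = Math.log(src.weight(i));
        }
        return table;
    }

    /** Sums cached log-weights over an observed sequence; the source is never called. */
    static double score(double[] table, int[] observed) {
        double sum = 0.0;
        for (int sym : observed) {
            sum += table[sym];
        }
        return sum;
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        WeightSource uniform = new WeightSource() {
            public double weight(int i) {
                calls[0]++;       // count how often the "distribution" is consulted
                return 0.25;
            }
        };
        double[] table = buildTable(uniform, 4);
        double s = score(table, new int[] {0, 1, 1, 3, 2, 0});
        System.out.println("weight() calls: " + calls[0]);
        System.out.println("log score: " + s);
    }
}
```

With a layout like this, the weight method is called for every symbol up
front regardless of which symbols appear in the observed sequence, which
would be consistent with the behaviour described in the question.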

Matthew