From dreher at mpiib-berlin.mpg.de  Wed Feb  1 09:57:03 2006
From: dreher at mpiib-berlin.mpg.de (Felix Dreher)
Date: Wed Feb  1 10:02:30 2006
Subject: [Biojava-l] BioSQL cvs versions
In-Reply-To: <OFF75A9CBE.6D006CC2-ON482570FF.0006A71E-482570FF.0007134F@EU.novartis.net>
References: <OFF75A9CBE.6D006CC2-ON482570FF.0006A71E-482570FF.0007134F@EU.novartis.net>
Message-ID: <43E0CC3F.4010402@mpiib-berlin.mpg.de>

Hello Mark,

Thank you very much for your fast reply. In the meantime I was busy 
installing and configuring a new version of 'Sun Java Studio Creator', 
which I currently use for development.
As you suggested, I would like to start using Hibernate. Is there any 
documentation right now about the Hibernate-BioSQL interaction?

Thank you,
Felix


mark.schreiber@novartis.com wrote:

>Dear Felix,
>
>We have found a number of deficiencies in biojava's support of biosql. 
>Therefore we have moved to a new model using hibernate to overcome several 
>problems. This will be officially released in biojava1.5. In the meantime 
>you can download the development version from CVS.
>
>Having said that, the best supported database versions in biojava 1.4 are 
>Oracle and MySQL. These have received the most testing and support. If you 
>have a chance (and cannot use Hibernate) I would suggest using one of 
>those. Although someone may offer a bug fix for this problem we do not 
>plan to support the old biojava/biosql mappings after 1.5 is released. 
>They have been deprecated in the CVS. The official way to interact with 
>biosql will be via Hibernate.
>
>- Mark
>
>Mark Schreiber
>Research Investigator (Bioinformatics)
>
>Novartis Institute for Tropical Diseases (NITD)
>10 Biopolis Road
>#05-01 Chromos
>Singapore 138670
>www.nitd.novartis.com
>
>phone +65 6722 2973
>fax  +65 6722 2910
>
>
>
>
>
>Felix Dreher <dreher@mpiib-berlin.mpg.de>
>Sent by: biojava-l-bounces@portal.open-bio.org
>01/20/2006 10:45 PM
>
> 
>        To:     biojava-l@biojava.org
>        cc:     (bcc: Mark Schreiber/GP/Novartis)
>        Subject:        [Biojava-l] BioSQL cvs versions
>
>
>Hello,
>when I try to add a sequence to a BioSQL-DB, the following exception is 
>thrown:
>
>Exception Details: org.postgresql.util.PSQLException
>  ERROR: column "seqfeature_key_id" of relation "seqfeature" does not exist
>
>org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1512)
>org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1297)
>org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:188)
>org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:430)
>org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:346)
>org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:250)
>org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:205)
>org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:205)
>org.biojava.bio.seq.db.biosql.FeaturesSQL.persistFeature(FeaturesSQL.java:804)
>org.biojava.bio.seq.db.biosql.FeaturesSQL.persistFeatures(FeaturesSQL.java:760)
>org.biojava.bio.seq.db.biosql.FeaturesSQL.persistFeatures(FeaturesSQL.java:729)
>org.biojava.bio.seq.db.biosql.BioSQLSequenceDB._addSequence(BioSQLSequenceDB.java:481)
>org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.addSequence(BioSQLSequenceDB.java:374)
>...
>
>apparently the BioJava- and BioSQL-version don't really match.
>I use the following cvs-version of the corresponding class: 
>/BioSQLSequenceDB.java/1.70/Fri Jun 10 07:48:11 2005//
>Further I use the latest cvs-version of the BioSQL-script 
>'biosqldb-pg.sql' (it's from June 2005).
>Are there any suggestions how this could be solved?
>
>Thank you,
>Felix


-- 
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité Mitte
Department of Immunology
Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany
Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426

From mark.schreiber at novartis.com  Wed Feb  1 20:02:17 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed Feb  1 19:58:24 2006
Subject: [Biojava-l] BioSQL cvs versions
Message-ID: <OFB717DFAC.A79DA625-ON48257109.00059E6A-48257109.0005B3C6@EU.novartis.net>

Hello Felix -

The best document is the BioJavaX docbook in the docs/ folder of the CVS 
distribution of biojava.

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910



From mark.schreiber at novartis.com  Wed Feb  1 22:19:02 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed Feb  1 22:15:11 2006
Subject: [Biojava-l] Help needed to add "Number of Bits" vertical
	and	column number labeling to DistributionLogos
Message-ID: <OFC56DB367.9E342063-ON48257109.001215A8-48257109.001238FE@EU.novartis.net>

>Actually, I am doing all of those things (Graphics2D object, 
>BufferedImage, etc.).  I would like to get the code that draws vertical 
>and horizontal labels and such.  I have seen the results in Samiul Hasan's 
>thesis paper.  Ah! I forgot that I can use modern technology as a visual 
>aid.....
>
>I would like to be able to do this (copied from Samiul Hasan's thesis 
>paper)..........

take a look at http://www.sanger.ac.uk/Info/theses/

- Mark


From heatkent at gmail.com  Thu Feb  2 01:41:56 2006
From: heatkent at gmail.com (Heather Kent)
Date: Thu Feb  2 04:18:07 2006
Subject: [Biojava-l] concatenating chromatograms
Message-ID: <de8b3c810602012241s4f57e336ia75008f2a7e4f959@mail.gmail.com>

I would like to write a small application that would concatenate ABI or SCF
chromatograms and write out a new chromatogram file.
Has anyone done something similar to this or seen any code that would be
helpful for me? I am new to programming
and have been looking through the BioJava API.

From mark.schreiber at novartis.com  Thu Feb  2 04:51:02 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Thu Feb  2 04:49:26 2006
Subject: [Biojava-l] concatenating chromatograms
Message-ID: <OFF97D6F29.2D6C65D7-ON48257109.00347DEB-48257109.00361C60@EU.novartis.net>

Hi Heather,

If you look at the API docs (http://www.biojava.org/docs/api14/index.html) 
under "Chromatogram and trace file support", you will find the classes that 
biojava provides for working with traces.

The best package to use is org.biojava.bio.chromatogram.

One possible way would be to do something like this...

Chromatogram c1 = ChromatogramFactory.create(file1);
Chromatogram c2 = ChromatogramFactory.create(file2);
SimpleChromatogram mergeChrom = new SimpleChromatogram();

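// Repeat all steps below for each DNA base, e.g. replace DNATools.a() with
// DNATools.c(), DNATools.g() and DNATools.t()
int[] a1 = c1.getTrace(DNATools.a());
int[] a2 = c2.getTrace(DNATools.a());
int[] merged = new int[a1.length + a2.length];

// copy a1 and then a2 into merged
System.arraycopy(a1, 0, merged, 0, a1.length);
System.arraycopy(a2, 0, merged, a1.length, a2.length);

// now set the DNATools.a() trace for mergeChrom; going by the API docs the
// last argument is the maximum trace value, and Integer.MIN_VALUE asks
// biojava to calculate it
mergeChrom.setTraceValues(DNATools.a(), merged, Integer.MIN_VALUE);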

Hope this works!

- Mark




From russ at kepler-eng.com  Thu Feb  2 09:23:30 2006
From: russ at kepler-eng.com (Russ Kepler)
Date: Thu Feb  2 09:48:34 2006
Subject: [Biojava-l] concatenating chromatograms
In-Reply-To: <de8b3c810602012241s4f57e336ia75008f2a7e4f959@mail.gmail.com>
References: <de8b3c810602012241s4f57e336ia75008f2a7e4f959@mail.gmail.com>
Message-ID: <200602020723.30627.russ@kepler-eng.com>

On Wednesday 01 February 2006 11:41 pm, Heather Kent wrote:
> I would like to write a small application that would concatenate abi or scf
> chromatograms and write out a new chromatogram file..
>  has anyone done something similar to this or seen any code that would be
> helpful for me, i am new at programming
> and have been looking through the Biojava API

I'm familiar with the ABI trace code and what you want to do would not be 
difficult, but the result may not work the way that you want it to.  A 
basecaller will likely be fooled in the transition between the traces and 
miscall or call no peaks for some time unless you match the local frequencies 
of each trace around the transition, and tagging the start of one run to the 
end of the other is a pretty good way to not do that.

If you're not going to run things through a basecaller all you really need to 
do is to catenate the trace and basecalls arrays and sequences.  These are 
all exposed through get methods.  If the data is coming from a newish AB instrument you 
may want to add code to handle the Q values from the KB caller and catenate 
those arrays as well.
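
The catenation described above can be sketched in plain Java. This is only an illustration under stated assumptions: it works on bare int[] arrays rather than on any biojava class, and the names TraceConcat, concat and concatOffsets are made up for the example. The one non-obvious step is that the base call positions of the second read have to be shifted by the number of samples in the first trace, otherwise they would point into the wrong part of the merged trace.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of biojava: joins raw trace data from two reads. */
final class TraceConcat {

    /** Returns a new array holding all samples of first followed by all samples of second. */
    static int[] concat(int[] first, int[] second) {
        int[] merged = Arrays.copyOf(first, first.length + second.length);
        System.arraycopy(second, 0, merged, first.length, second.length);
        return merged;
    }

    /**
     * Joins base call sample positions. Positions from the second read are
     * shifted by the number of samples in the first trace.
     */
    static int[] concatOffsets(int[] firstOffsets, int[] secondOffsets, int firstTraceLength) {
        int[] merged = Arrays.copyOf(firstOffsets, firstOffsets.length + secondOffsets.length);
        for (int i = 0; i < secondOffsets.length; i++) {
            merged[firstOffsets.length + i] = secondOffsets[i] + firstTraceLength;
        }
        return merged;
    }

    public static void main(String[] args) {
        // Toy data: a 4-sample trace followed by a 3-sample trace.
        int[] trace = concat(new int[] {0, 5, 9, 4}, new int[] {1, 7, 2});
        int[] calls = concatOffsets(new int[] {2}, new int[] {1}, 4);
        System.out.println(Arrays.toString(trace));
        System.out.println(Arrays.toString(calls));
    }
}
```

The same concat call would be repeated for each of the four base traces, and again for any per-base quality arrays, so that all arrays stay the same length.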

Writing the new file would be a new capability, but the existing reader should 
show you the way to do it.
From ady at sanger.ac.uk  Thu Feb  2 11:23:54 2006
From: ady at sanger.ac.uk (Andy Yates)
Date: Thu Feb  2 12:09:53 2006
Subject: [Biojava-l] concatenating chromatograms
In-Reply-To: <200602020723.30627.russ@kepler-eng.com>
References: <de8b3c810602012241s4f57e336ia75008f2a7e4f959@mail.gmail.com>
	<200602020723.30627.russ@kepler-eng.com>
Message-ID: <43E2321A.2090804@sanger.ac.uk>

Throwing my opinion into the ring on this I've got to agree with Russ 
here. I would think that SCF is a more sensible format for this kind of 
procedure but there is the added bonus that the SCF parser does not 
encode delta-delta values which the SCF specification is completely 
dependent on.

SCF does have the advantage that nothing "really" assumes anything about 
them so you can fiddle about with the chromatogram and so long as the 
things you create in the output Chromatogram are normalised with respect 
to the cuts then everything should be hunky dory.

If you're doing this for space concerns can I suggest passing the SCF 
files through a compression filter. You get the best results with a 
BZIP2 compression algorithm (the format was developed for bzip 
compression) but GZIP works really well and is the choice of compression 
format here at the Sanger Centre.
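
For the compression route, a minimal Java sketch might look like the following. It uses GZIP only because java.util.zip ships with the JDK; BZIP2 would need a third-party library. The class name TraceGzip and the zero-filled byte array standing in for real SCF data are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: gzip round trip for the bytes of a trace file. */
final class TraceGzip {

    /** Compresses the given bytes into GZIP format. */
    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(raw);
        }
        return buffer.toByteArray();
    }

    /** Restores the original bytes from GZIP data. */
    static byte[] decompress(byte[] packed) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = new byte[10000]; // stand-in for the bytes of a real SCF file
        byte[] packed = compress(raw);
        System.out.println(raw.length + " -> " + packed.length + " bytes");
    }
}
```

Real trace data will not shrink as much as this toy input, so the achievable ratio has to be measured on actual files.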

Hope that helps,

Andy Yates
~~~~~~~~~~~~~~~
Senior Computer Biologist,
Cancer Genome Project.

Wellcome Trust Sanger Institute,
Hinxton, Cambridge

From mark.schreiber at novartis.com  Thu Feb  2 21:37:36 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Thu Feb  2 21:33:41 2006
Subject: [Biojava-l] biojava wikimedia based home page
Message-ID: <OFC9854694.CA6EF6C3-ON4825710A.0008C916-4825710A.000E6DFA@EU.novartis.net>

Hi all -

The OBF is moving several of its projects' homepages to MediaWiki-based 
systems. There is a possibility that biojava will move to use this system 
too. I think this is a great way to establish a community based biojava 
web presence. The current home page suffers from the problem that only a 
few people can access and update it which creates a large burden and means 
it sometimes gets out of date. More hands will make things easier. The new 
look bioperl page is a great example of what can be done 
(www.bioperl.org).

Before it is ready for prime-time some work needs to be done to copy 
content over from the current biojava site. We would like to ask for any 
volunteers who have some experience with MediaWiki who could help out.

Please reply to me or to the list.

Any help would be greatly appreciated.

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910

From sylvain.foisy at bioneq.qc.ca  Fri Feb  3 10:43:41 2006
From: sylvain.foisy at bioneq.qc.ca (sylvain.foisy@bioneq.qc.ca)
Date: Fri Feb  3 13:01:31 2006
Subject: [Biojava-l] biojava wikimedia based home page
In-Reply-To: <OFC9854694.CA6EF6C3-ON4825710A.0008C916-4825710A.000E6DFA@EU.novartis.net>
References: <OFC9854694.CA6EF6C3-ON4825710A.0008C916-4825710A.000E6DFA@EU.novartis.net>
Message-ID: <22065.132.204.82.34.1138981421.squirrel@mail.bioneq.qc.ca>

Hi Mark

Marvelous idea ;-) We (the Quebec Bioinformatics Network) are in the
process of moving our "Bioinformatics KnowledgeBase"
(http://apps.bioneq.qc.ca/twiki/bin/view/Knowledgebase/WebHome) from its
current TWiki format toward MediaWiki. We are offering to put this
(ongoing) experience to use on the migration of the Biojava website.

Who should we contact on the OBF side to initiate the project? Chris D.?

Shameless plug: if anyone wants to contribute to our Bioinformatics
KnowledgeBase, please feel free to do so!! This is all done in the Wiki
spirit. Don't get scared by the French side of things, we'll take care of
it ;-)

> The OBF is moving several of its projects' homepages to MediaWiki-based
> systems. There is a possibility that biojava will move to use this system
> too. I think this is a great way to establish a community based biojava
> web presence. The current home page suffers from the problem that only a
> few people can access and update it which creates a large burden and means
> it sometimes gets out of date. More hands will make things easier. The new
> look bioperl page is a great example of what can be done
> (www.bioperl.org).
>
> Before it is ready for prime-time some work needs to be done to copy
> content over from the current biojava site. We would like to ask for any
> volunteers who have some experience with MediaWiki who could help out.


From guedes at unisul.br  Fri Feb  3 12:55:15 2006
From: guedes at unisul.br (Dickson S. Guedes)
Date: Fri Feb  3 14:02:44 2006
Subject: [Biojava-l] Re: [Biojava-dev] biojava wikimedia based home page
In-Reply-To: <OFC9854694.CA6EF6C3-ON4825710A.0008C916-4825710A.000E6DFA@EU.novartis.net>
References: <OFC9854694.CA6EF6C3-ON4825710A.0008C916-4825710A.000E6DFA@EU.novartis.net>
Message-ID: <43E39903.7050300@unisul.br>

Hi Mark,

I think that's very good, and I think I can help with it. That way I'll 
learn more too. :)

I don't know much about MediaWiki, but what I do know makes me think 
that it isn't too hard. It's only another way to write 
hypertext. ;)

[]s
Guedes



-- 
--
:: Dickson S. Guedes (guedes at unisul dot br)
::
:: UNISUL - Universidade do Sul de Santa Catarina
:: ATI - Assessoria de Tecnologia da Informação
:: (0xx48) 621-3200 - http://www.unisul.br
--
"There are 10 kinds of people in the world: those who understand
  binary, and those who don't"

From e.willighagen at science.ru.nl  Sat Feb  4 04:10:10 2006
From: e.willighagen at science.ru.nl (Egon Willighagen)
Date: Sat Feb  4 05:28:01 2006
Subject: [Biojava-l] biojava wikimedia based home page
In-Reply-To: <OFC9854694.CA6EF6C3-ON4825710A.0008C916-4825710A.000E6DFA@EU.novartis.net>
References: <OFC9854694.CA6EF6C3-ON4825710A.0008C916-4825710A.000E6DFA@EU.novartis.net>
Message-ID: <200602041010.11238.e.willighagen@science.ru.nl>

On Friday 03 February 2006 03:37, mark.schreiber@novartis.com wrote:
> The OBF is moving several of its projects' homepages to MediaWiki-based
> systems. There is a possibility that biojava will move to use this system
> too. I think this is a great way to establish a community based biojava
> web presence. The current home page suffers from the problem that only a
> few people can access and update it which creates a large burden and means
> it sometimes gets out of date. More hands will make things easier. The new
> look bioperl page is a great example of what can be done
> (www.bioperl.org).

I have had good experiences with wikis in the past, for example for Jmol. I would 
like to point out that Jmol fits perfectly into wiki systems, or at least 
4 of them:

http://wiki.jmol.org/JmolProcessor

It's a great way to enhance wikis with live protein structures.

Egon

-- 
e.willighagen@science.ru.nl
PhD student on Molecular Representation in Chemometrics
Radboud University Nijmegen
Blog: http://chem-bla-ics.blogspot.com/
http://www.cac.science.ru.nl/people/egonw/
GPG: 1024D/D6336BA6
From shameer at ncbs.res.in  Sun Feb  5 05:27:15 2006
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Sun Feb  5 06:19:12 2006
Subject: [Biojava-l] from C alpha trace to full co-ordinates
In-Reply-To: <OFF97D6F29.2D6C65D7-ON48257109.00347DEB-48257109.00361C60@EU.novartis.net>
References: <OFF97D6F29.2D6C65D7-ON48257109.00347DEB-48257109.00361C60@EU.novartis.net>
Message-ID: <36486.192.168.1.176.1139135235.squirrel@192.168.1.176>


Dear All,

Is anyone aware of a Perl script / Java code / class that can be used to
construct the full atomic coordinates of a protein from a given C(alpha) trace
and optimize the side chain geometry?

I tried the original program MaxSprout from the Holm group, but it is not
giving me proper results (I am getting errors like segmentation fault -
backbonchain failed etc.).

Since I need to use it as part of a web server, I would appreciate it if
anyone could let me know about a Perl script for the same.

Thanks and cheers in advance,
-- 
Mr. Shameer Khadar (JRF)
Dr. R. Sowdhamini's Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India
T - 91-080-23636420-32 EXT 4241
F - 91-080-23636662/23636675
W - http://www.ncbs.res.in
--------------------------------------------------
"Refrain from illusions, insist on work and not words,
 patiently seek divine and scientific truth."


From shameer at ncbs.res.in  Mon Feb  6 03:27:50 2006
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Mon Feb  6 04:01:49 2006
Subject: [Biojava-l] Need a  slogan for OBF 
In-Reply-To: <47205.192.168.1.176.1139048133.squirrel@192.168.1.176>
References: <001001c62793$bef08f70$93656785@zhur>
	<2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com>
	<47205.192.168.1.176.1139048133.squirrel@192.168.1.176>
Message-ID: <2888.192.168.4.38.1139214470.squirrel@192.168.4.38>

Dear All,

As we are moving to the all-new-look wiki-style web, why don't we think
about a unique logo + slogan that can express our spirit and excitement?

For example, we can have a logo with O|B|F, its full form and the slogan.
If anybody is interested, I would be happy to design logos once we are
done with the logo.

I have a couple of suggestions. I hope all OBF members can send much more
powerful slogans than mine.

'Let's Code for Life'
'Let's Decode Life'
'Let's Recode Life'
'Code your Life '

Happy O|B|!!!
-- 
Mr. Shameer Khadar (JRF)
Dr. R. Sowdhamini's Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India
T - 91-080-23636420-32 EXT 4241
F - 91-080-23636662/23636675
W - http://www.ncbs.res.in
--------------------------------------------------
"Refrain from illusions, insist on work and not words,
 patiently seek divine and scientific truth."
MM


From tjaart at tuks.co.za  Mon Feb  6 12:09:57 2006
From: tjaart at tuks.co.za (Tjaart de Beer)
Date: Mon Feb  6 12:19:34 2006
Subject: [Biojava-l] Newbie struggling with secondary structure
Message-ID: <43E782E5.6090502@tuks.co.za>

Hi

I am new to Biojava (and Java). I want to get the secondary structure 
assignment (located in the PDB file) for a PDB file. I have looked at 
the AminoAcid interfaces and classes but can't get the stuff to work (I 
have gotten some of the Structure interfaces to work...). Does someone 
maybe have an example of extracting assigned secondary structure from a 
PDB file?

An alternative is simply to write a class which reads all the lines 
starting with HELIX or SHEET and somehow parse them into meaningful results.

Any suggestions would be greatly appreciated!

-- 
Tjaart de Beer
The software required "Windows XP or better" ... so I installed Linux
From ap3 at sanger.ac.uk  Tue Feb  7 16:44:33 2006
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Tue Feb  7 17:02:53 2006
Subject: [Biojava-l] Newbie struggling with secondary structure
In-Reply-To: <43E782E5.6090502@tuks.co.za>
References: <43E782E5.6090502@tuks.co.za>
Message-ID: <e94e02f3e762024de2b1e272898c591d@sanger.ac.uk>

Hi Tjaart,

The PDB parser currently does not parse all of the header - the HELIX 
and SHEET lines containing the author's secondary
structure assignments are currently ignored.

> An alternative is simply to write a class which reads all the lines 
> starting with HELIX or SHEET

for the moment this might be a solution

> and somehow parse them into meaningful results.

what is meaningful? :-) - in case you want to add the data  to the 
AminoAcid objects,
you might want to have a look at the PDB file parser and write a patch 
that reads the lines and stores the
data in the amino acids. let me know if you need help.
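
As a starting point for the stand-alone route, here is a minimal Java sketch that pulls the chain ID and residue number range out of HELIX and SHEET lines. It relies on the fixed column positions of the PDB file format and ignores insertion codes, helix classes and strand sense. SecStrucRecords and Segment are hypothetical names and not part of biojava; wiring the result into the AminoAcid objects would still need the parser patch mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of biojava: reads HELIX and SHEET records. */
final class SecStrucRecords {

    /** One author-assigned segment: record type, chain and residue number range. */
    static final class Segment {
        final String type;  // "HELIX" or "SHEET"
        final char chain;
        final int start;
        final int end;

        Segment(String type, char chain, int start, int end) {
            this.type = type;
            this.chain = chain;
            this.start = start;
            this.end = end;
        }
    }

    /** Parses the fixed-column HELIX and SHEET lines and skips everything else. */
    static List<Segment> parse(List<String> pdbLines) {
        List<Segment> segments = new ArrayList<Segment>();
        for (String line : pdbLines) {
            if (line.length() < 37) {
                continue; // too short to hold the end residue number
            }
            if (line.startsWith("HELIX ")) {
                // chain ID in column 20, residue numbers in columns 22-25 and 34-37
                segments.add(new Segment("HELIX", line.charAt(19),
                        number(line, 21, 25), number(line, 33, 37)));
            } else if (line.startsWith("SHEET ")) {
                // chain ID in column 22, residue numbers in columns 23-26 and 34-37
                segments.add(new Segment("SHEET", line.charAt(21),
                        number(line, 22, 26), number(line, 33, 37)));
            }
        }
        return segments;
    }

    /** Reads a right-justified integer from the given zero-based column range. */
    private static int number(String line, int from, int to) {
        return Integer.parseInt(line.substring(from, to).trim());
    }
}
```

Each SHEET line describes a single strand, so a sheet with five strands yields five segments.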

Cheers,
Andreas

-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                               Hinxton, Cambridge CB10 1SA, UK
			 +44 (0) 1223 49 6891

From mark.schreiber at novartis.com  Tue Feb  7 21:15:46 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Tue Feb  7 21:11:42 2006
Subject: [Biojava-l] BioJava News site
Message-ID: <OF365347AB.C7F637CF-ON4825710F.000C3087-4825710F.000C6E48@EU.novartis.net>

Dear subscribers -

BioJava has a new news and blog site based on WordPress. It can be found 
at http://biojava.open-bio.org/news/

I have copied some of the more recent news items over and added a few new 
ones. Feel free to subscribe and/or contribute. All major biojava 
announcements will be posted and archived here in future.

Thanks,

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910

From mark.schreiber at novartis.com  Tue Feb  7 22:06:03 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Tue Feb  7 22:02:00 2006
Subject: [Biojava-l] BioJava wiki
Message-ID: <OF5756BB1E.3E920D60-ON4825710F.00105490-4825710F.001108D4@EU.novartis.net>

Dear BioJava users.

The biojava wiki website has been up for a few days at 
http://biojava.open-bio.org/wiki/Main_Page. Thanks to the amazing efforts 
of a few early volunteers almost all of the content of the old site has 
been transferred to the new page. 

I think that the best part is that now it is wiki based it will be much 
easier for the biojava community to contribute by adding new material 
updating old material and fixing mistakes as they find them. This should 
make the site much more up to date and informative than it has been in the 
past.

We are currently lacking a logo. We have a few suggestions at 
http://biojava.open-bio.org/wiki/BioJava:Logo. If you are an artistic type 
we would love to see your contributions. If you more of a critic add your 
comments about what you like and don't like. After a couple of weeks we 
will decide (somehow) on something official. As I'm based in Singapore I 
cannot guarantee the selection process will be entirely transparent : )

Your contributions are welcome.

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910

From mheusel at gmail.com  Wed Feb  8 02:06:46 2006
From: mheusel at gmail.com (Martin Heusel)
Date: Wed Feb  8 03:53:23 2006
Subject: [Biojava-l] Newbie struggling with secondary structure
In-Reply-To: <43E782E5.6090502@tuks.co.za>
References: <43E782E5.6090502@tuks.co.za>
Message-ID: <6127fc200602072306m7820bd12m@mail.gmail.com>

Hi Tjaart,
you can use DSSP to determine the secondary structure from a PDB.
http://swift.cmbi.ru.nl/gv/dssp/
or maybe better use Christoph's SecondaryStructure_Predictor with biojava
http://www.charite.de/bioinf/strap/biojavaInAnger_SecondaryStructure_Predictor.html
bye
Martin
From hotafin at gmail.com  Wed Feb  8 08:33:24 2006
From: hotafin at gmail.com (Tamas Horvath)
Date: Wed Feb  8 09:23:24 2006
Subject: [Biojava-l] Re: structureNMRImpl
In-Reply-To: <c343d7080602080528s6655e325sbfdcb757bbaf3e8d@mail.gmail.com>
References: <c343d7080602080528s6655e325sbfdcb757bbaf3e8d@mail.gmail.com>
Message-ID: <c343d7080602080533s4b299ad5ie0e0936e3d0d1e06@mail.gmail.com>

I forgot to mention, that I use a not yet published modified version of
StructureImpl, which can parse an ArrayList<String>. It has a new
constructor for it, and the old BufferedReader method was changed so it
generates the ArrayList for the parser method...

On 2/8/06, Tamas Horvath <hotafin@gmail.com> wrote:
>
> Hi!
> The structureNMRImpl class, I've been working on is finally working. It is
> far from ready to be incorporated into BioJava just yet.
> It's a bit messy, lacks documentation, there are some naming convention
> issues with it, nevertheless I'd be happy to hear suggestions about it.
> In theory in the future it should use the same Structure interface as
> StructureImpl. At least that's my aim.
> Anyway... Tell me what u think!
>
>
From hotafin at gmail.com  Wed Feb  8 10:53:12 2006
From: hotafin at gmail.com (Tamas Horvath)
Date: Wed Feb  8 11:49:01 2006
Subject: [Biojava-l] warnings, errors, comments
Message-ID: <c343d7080602080753h5a02f636n57595dbf83266ef9@mail.gmail.com>

> What do u think about making a givewarning flag for PDBFileParser?

that would be nice - I would suggest to do it using the java logging
tool - actually that applies to all of biojava and should be suggested to
the list - I find java.util.logging very helpful.

> By default it would be true, but parsing could be invoked so that it
> would give no warnings or comments.

this can be done by setting log levels:
Level.SEVERE, Level.WARNING, Level.INFO, Level.FINEST, etc.
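
A minimal sketch of that idea, assuming a parser that reports problems
through java.util.logging. The class ParserLoggingSketch, the logger name
"sketch.PDBFileParser" and the parseLine method are made up for illustration
and are not part of BioJava:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class ParserLoggingSketch {
    // Hypothetical logger name; BioJava does not necessarily define one like it.
    static final Logger LOG = Logger.getLogger("sketch.PDBFileParser");

    /** Stand-in for a parsing step that reports a recoverable problem. */
    static void parseLine(String line) {
        if (line.length() < 6) {
            LOG.warning("short record skipped: '" + line + "'");
        } else {
            LOG.fine("parsed record type " + line.substring(0, 6).trim());
        }
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.WARNING);  // warnings pass the logger's level check
        parseLine("ATOM");
        LOG.setLevel(Level.SEVERE);   // caller asks for no warnings or comments
        parseLine("ATOM");            // the warning is now filtered out
    }
}
```

With this approach no extra flag is needed on the parser: the caller decides
how much it wants to hear by setting the level on the logger.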
From joel at macresearcher.com  Thu Feb  9 21:15:50 2006
From: joel at macresearcher.com (Joel Dudley)
Date: Thu Feb  9 22:02:57 2006
Subject: [Biojava-l] MacResearch announces iPod giveaway contest
Message-ID: <A17B746A-30CE-4D24-8BAC-9E3620E76563@macresearcher.com>

Help MacResearch.org expand its Script Repository and you could win a  
black 2GB iPod Nano. Eligible contestants must submit a research- 
oriented script that can run natively (no emulators) on Mac OS X 10.3  
or higher without modification before the contest end date. Scripts  
for all scientific domains are welcome including scripts written for  
High Performance Computing (grid, cluster, etc) setup and management.  
If your script does not meet the aforementioned criteria then you  
will not be eligible to win the iPod Nano. Winners will be chosen by  
random drawing. The contest begins 2/8/2006 and ends 2/28/2006.  The  
ultimate goal of this contest, and the script repository in general,  
is to create a valuable community resource that can be used to  
benefit endeavors in research and education. Please don't be shy  
about your coding style or lack of documentation. Your script will  
make someone's life easier. MacResearch.org is the premier, non- 
profit community for scientists using Mac OS X and related hardware  
in their research. To learn more about MacResearch.org and the  
MacResearch.org Script Repository visit http://www.macresearch.org  
and http://www.macresearch.org/script_repository.


For official contest rules see http://www.macresearch.org/ipod_contest
From toddri at eden.rutgers.edu  Thu Feb  9 22:42:44 2006
From: toddri at eden.rutgers.edu (Todd Riley)
Date: Thu Feb  9 22:57:41 2006
Subject: [Biojava-l] Is FullHmmerProfileHMM (or FlatModel) Broken?
Message-ID: <43EC0BB4.9010105@eden.rutgers.edu>

Hello again,

I have attempted to move up from the ProfileHMM class to the HMMER 
classes (FullHmmerProfileHMM and HmmerProfileHMM).  However, I 
immediately get an error when attempting to create the DP matrix passing 
in the FullHmmerProfileHMM object.

java.lang.ClassCastException: org.biojava.bio.dp.SimpleDotState
   at org.biojava.bio.dp.FlatModel.<init>(FlatModel.java:185)
   at org.biojava.bio.dp.DP.flatView(DP.java:169)
   at 
org.biojava.bio.dp.DPFactory$DefaultFactory.createDP(DPFactory.java:52)
   at BioJavaHMM.trainHmm(BioJavaHMM.java:840)

Here is the code from FlatModel.java in the ModelInState section:

         if(t instanceof DotState) {
           DotStateWrapper dsw = new DotStateWrapper(t);
           addAState(dsw);
           inModel.put(t, flatM);
           toM.put(t, dsw);
           toM.put(((Wrapper) t).getWrapped(), dsw);   
<-------------line 185!!!!!!!!!
           //System.out.println("Added wrapped dot state " + 
dsw.getName());
         } else if(t instanceof EmissionState) {

This code is a bit confusing, but t appears to be of type 
SimpleDotState, which I do not believe can be cast to type Wrapper.  
Also, should both lines 184 and 185 be executed?
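
To make the question concrete, here is a minimal sketch of a guarded cast.
It uses stand-in types rather than the real org.biojava.bio.dp classes:
State, DotState, Wrapper, SimpleDot, WrappedDot and register are all
hypothetical names, and whether skipping the second put is the right fix
for FlatModel is an open question.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for the roles seen in the stack trace (assumptions).
interface State { String getName(); }
interface DotState extends State { }
interface Wrapper { State getWrapped(); }

/** A dot state that wraps nothing, like the SimpleDotState in the trace. */
final class SimpleDot implements DotState {
    private final String name;
    SimpleDot(String name) { this.name = name; }
    public String getName() { return name; }
}

/** A dot state that does wrap another state. */
final class WrappedDot implements DotState, Wrapper {
    private final State wrapped;
    WrappedDot(State wrapped) { this.wrapped = wrapped; }
    public String getName() { return "wrapped-" + wrapped.getName(); }
    public State getWrapped() { return wrapped; }
}

final class GuardedCastSketch {
    /** Maps t, and the state it wraps if there is one, to the same target. */
    static Map<State, String> register(State t, String target) {
        Map<State, String> toM = new HashMap<>();
        toM.put(t, target);
        if (t instanceof Wrapper) {   // only cast when the cast is legal
            toM.put(((Wrapper) t).getWrapped(), target);
        }
        return toM;
    }
}
```

An unguarded ((Wrapper) t) on a SimpleDot would throw ClassCastException,
which matches the kind of failure reported at line 185.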

Also, I found this in the source code as well:

     //
     // FIXME -- Matthew broked this...          <--------line 243!!!!!!!!!
     //

Does this mean that some functionality of FlatModel.java is broken?  
Should the ModelInState (and thus the Hmmer classes) be avoided?

Any help would be greatly appreciated,
Todd

From toddri at eden.rutgers.edu  Thu Feb  9 23:02:21 2006
From: toddri at eden.rutgers.edu (Todd Riley)
Date: Thu Feb  9 22:58:03 2006
Subject: [Biojava-l] Looking for a RandomAccessFile-like class for sequences
Message-ID: <43EC104D.4030202@eden.rutgers.edu>

Hello,

I am looking for a RandomAccessFile-like class that can read small, 
arbitrary chunks of a very large (like 250K of human DNA) fasta file.  
(UCSC chromosomal fasta files contain just 1 sequence for the whole 
chromosome).  I was hoping that someone may have already written a class 
that will take in an alphabet and a range (maybe in the form of a 
RangeLocation object) and will return the sequence in that range from 
the file.  I would hate to spend time re-inventing a wheel that may 
already exist.

Thanks,
Todd


From mark.schreiber at novartis.com  Fri Feb 10 01:48:03 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Fri Feb 10 01:43:56 2006
Subject: [Biojava-l] Looking for a RandomAccessFile-like class for
	sequences
Message-ID: <OFA578A410.5EB8C0EF-ON48257111.0025490D-48257111.00255BE2@EU.novartis.net>

Hello Todd,

This sounds a bit like what the biojava BioIndex code does. You may be 
able to use that.
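
If a plain-Java fallback is acceptable, the idea can be sketched with
java.io.RandomAccessFile directly. This is not the BioIndex API. It assumes
a FASTA file holding a single record, single-byte characters, and sequence
lines of equal width; FastaRangeSketch and fetch are hypothetical names.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

final class FastaRangeSketch {
    /**
     * Returns residues start..end (1-based, inclusive) from a FASTA file that
     * holds one record whose sequence lines all share the same width.
     */
    static String fetch(String path, long start, long end) throws IOException {
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("bad range " + start + ".." + end);
        }
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.readLine();                         // skip the ">..." header line
            long dataStart = raf.getFilePointer();  // offset of the first residue
            String firstLine = raf.readLine();
            if (firstLine == null || firstLine.isEmpty()) {
                throw new IOException("no sequence data in " + path);
            }
            int width = firstLine.length();                     // residues per line
            long lineBytes = raf.getFilePointer() - dataStart;  // width plus line terminator

            long i0 = start - 1;                                // 0-based residue index
            raf.seek(dataStart + (i0 / width) * lineBytes + (i0 % width));

            long wanted = end - start + 1;
            StringBuilder sb = new StringBuilder();
            int b;
            while (sb.length() < wanted && (b = raf.read()) != -1) {
                if (b != '\n' && b != '\r') {                   // drop line terminators
                    sb.append((char) b);
                }
            }
            return sb.toString();
        }
    }
}
```

The sketch reads byte by byte for clarity; a real implementation would read
into a buffer, and would turn the returned String into a SymbolList for the
chosen alphabet.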

- Mark



From dreher at mpiib-berlin.mpg.de  Fri Feb 10 12:55:28 2006
From: dreher at mpiib-berlin.mpg.de (Felix Dreher)
Date: Fri Feb 10 13:01:18 2006
Subject: [Biojava-l] BioJavaX-Hibernate: Namespace problem
Message-ID: <43ECD390.1010405@mpiib-berlin.mpg.de>

Hello,
I tried to create different virtual BioSQL-databases for the storage of 
different types of sequences. For testing purposes, I created and saved 
a new Namespace called 'mRNA'. I didn't find out though, how to save a 
newly created sequence inside this namespace.
I tried the following code block:

    Namespace nsp = new SimpleNamespace("mRNA");
    session.saveOrUpdate("Namespace", nsp);
    RichSequenceDB db = new BioSQLRichSequenceDB("mRNA", session);
    RichSequence seq = 
RichSequence.Tools.enrich(DNATools.createDNASequence("gattacagattaca","test"));
    db.addRichSequence(seq);
    tx.commit();

The Namespace and the sequence are actually being saved in the database, 
but the sequence is saved in the default namespace 'lcl' and not in the 
new namespace 'mRNA'.
Can someone tell me what I'm missing here?

Thanks in advance,
Felix


-- 
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité Mitte
Department of Immunology
Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany
Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426

From tjaart at tuks.co.za  Sun Feb 12 11:51:57 2006
From: tjaart at tuks.co.za (Tjaart de Beer)
Date: Sun Feb 12 11:44:36 2006
Subject: [Biojava-l] Newbie struggling with secondary structure
In-Reply-To: <6127fc200602072306m7820bd12m@mail.gmail.com>
References: <43E782E5.6090502@tuks.co.za>
	<6127fc200602072306m7820bd12m@mail.gmail.com>
Message-ID: <43EF67AD.3030307@tuks.co.za>

Hi

Thanks for all the suggestions. Currently I just want to extract the 
secondary structure as specified in the PDB file. I am having trouble 
understanding how to utilize the AminoAcid class (after having looked at 
the source...). Does anyone have an example of extracting the secondary 
structure from a PDB file using the AminoAcid class in Biojava? Or any 
example using the AminoAcid class to extract info from a PDB file?

Any help would be greatly appreciated!

Martin Heusel wrote:
> Hi Tjaart,
> you can use DSSP to determine the secondary structure from a PDB.
> http://swift.cmbi.ru.nl/gv/dssp/
> or maybe better use Christoph's SecondaryStructure_Predictor with biojava
> http://www.charite.de/bioinf/strap/biojavaInAnger_SecondaryStructure_Predictor.html
> bye
> Martin
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
> 

-- 
Tjaart de Beer


---------
The software required "Windows XP or better" ... so I installed Linux
From mark.schreiber at novartis.com  Sun Feb 12 20:15:44 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Sun Feb 12 20:11:34 2006
Subject: [Biojava-l] BioJavaX-Hibernate: Namespace problem
Message-ID: <OF7803F1BD.124A3CB6-ON48257114.00064DA0-48257114.0006EF20@EU.novartis.net>

Hello -

When you make a Sequence with DNATools it is not Rich and therefore has no 
namespace. When you enrich it biojava will give it the default namespace 
'lcl' or local. Thus when you add it to the DB you get it added under the 
lcl namespace.

I would make a new SimpleRichSequence instead. Then you can specify its 
namespace.
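
The difference can be shown with a toy model. It does not use the real
BioJavaX classes: NamespaceSketch, PlainSeq, RichSeq, enrich and create are
hypothetical stand-ins that only mimic the behaviour described above.

```java
final class NamespaceSketch {
    static final String DEFAULT_NAMESPACE = "lcl";

    /** A plain sequence: name and residues only, no namespace. */
    static final class PlainSeq {
        final String name;
        final String residues;
        PlainSeq(String name, String residues) {
            this.name = name;
            this.residues = residues;
        }
    }

    /** A rich sequence: always belongs to a namespace. */
    static final class RichSeq {
        final String namespace;
        final String name;
        final String residues;
        RichSeq(String namespace, String name, String residues) {
            this.namespace = namespace;
            this.name = name;
            this.residues = residues;
        }
    }

    /** Mimics enrichment: the input has no namespace, so the default is used. */
    static RichSeq enrich(PlainSeq s) {
        return new RichSeq(DEFAULT_NAMESPACE, s.name, s.residues);
    }

    /** Mimics building the rich sequence directly with an explicit namespace. */
    static RichSeq create(String namespace, String name, String residues) {
        return new RichSeq(namespace, name, residues);
    }
}
```

In this model enrich(...) can only ever produce "lcl", whereas create("mRNA",
...) carries the namespace chosen by the caller.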

- Mark



From mark.schreiber at novartis.com  Mon Feb 13 03:34:06 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Mon Feb 13 03:29:50 2006
Subject: [Biojava-l] easy createRichSequence methods
Message-ID: <OFA1326973.CBA84880-ON48257114.002ED09A-48257114.002F1152@EU.novartis.net>

Hi -

To make it easier to create a RichSequence I have added several overloaded 
createRichSequence(...) methods to RichSequence.Tools.

These are similar to the createSequence methods found in DNATools and 
RNATools and optionally allow you to specify the namespace as either a 
String or a Namespace object.

Now available in CVS.

- Mark
From martin.eklund at farmbio.uu.se  Mon Feb 13 11:06:38 2006
From: martin.eklund at farmbio.uu.se (Martin Eklund)
Date: Mon Feb 13 11:34:44 2006
Subject: [Biojava-l] Persist SingleDP object
Message-ID: <1139846798.8057.47.camel@pele>

Hi,

I'm wondering if there is some way of persisting SingleDP objects? As I
see it, serialization requires quite a lot of rewriting...or? Is there
another way?

Thank you!

Martin.

-- 
========================================
Martin Eklund
PhD Student
Department of Pharmaceutical Biosciences
Uppsala University, Sweden
Ph: +46-18-4714281
========================================

From mthomasc at vub.ac.be  Mon Feb 13 15:36:59 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Mon Feb 13 15:54:06 2006
Subject: [Biojava-l] Genbank  parser error [biojavax]
Message-ID: <43F0EDEB.1010801@vub.ac.be>

Hello,

I have tried biojavax today with a view to use the Genbank file parser.

My test file is a Genbank formatted file which has been produced by 
Ensembl export system.

The head of the file is as follows:

LOCUS       6 489671 bp DNA HTG 13-FEB-2006
DEFINITION  Mus musculus chromosome 6 NCBIM34 partial sequence
            52296503..52786173 reannotated via EnsEMBL
ACCESSION   chromosome:NCBIM34:6:52296503:52786173:1
VERSION     chromosome:NCBIM34:6:52296503:52786173:1

I used the code provided in biojavax docbook to parse this file.
I get the following error :

Exception in thread "main" org.biojava.bio.BioException: Could not read 
sequence
    at 
org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
    at 
org.embnet.be.biojavax.tryout.GenbankParseTest.main(GenbankParseTest.java:31)
Caused by: org.biojava.bio.seq.io.ParseException: Bad locus line found: 
6 489671 bp DNA HTG 13-FEB-2006
    at 
org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:229)
    at 
org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
    ... 1 more

I had a look at GenbankFormat.java, and I guess the problem comes from 
the regular expression, which does not recognize this LOCUS line as a 
standard Genbank LOCUS tag.

Am I wrong? Has the biojavax Genbank parser been tested on Ensembl 
exported files?

Morgane.

-- 
*************************************
Morgane THOMAS-CHOLLIER, PHD Student 

Vrije Universiteit Brussels (VUB)    
Laboratory of Cell Genetics          
Pleinlaan 2                          
1050 Brussels                        
Belgium                              


From hotafin at gmail.com  Mon Feb 13 13:05:38 2006
From: hotafin at gmail.com (Tamas Horvath)
Date: Mon Feb 13 16:44:25 2006
Subject: [Biojava-l] Newbie struggling with secondary structure
In-Reply-To: <c343d7080602130151y7a997008n90da3377f2109da3@mail.gmail.com>
References: <43E782E5.6090502@tuks.co.za>
	<6127fc200602072306m7820bd12m@mail.gmail.com>
	<43EF67AD.3030307@tuks.co.za>
	<c343d7080602120921r4c3883adj456dce4d7b202126@mail.gmail.com>
	<43F04B82.7060409@tuks.co.za>
	<c343d7080602130151y7a997008n90da3377f2109da3@mail.gmail.com>
Message-ID: <c343d7080602131005q5700afcfi772342341cb2468a@mail.gmail.com>

Hi!
I have the needed modifications of PDBFileParser to read secondary structure
data.
The code is not final, there may be some changes before it is added to cvs.
(And I need Andreas to be around for that... )
But if you need the file right now, I can send it to you, and you can
recompile your biojava. Alternatively I can send u my compiled biojava.jar
as well. It's not the most current cvs, but not too late either...

On 2/13/06, Tamas Horvath <hotafin@gmail.com> wrote:
>
> You are right.... the secondary structure is not yet parsed. But it's
> quite easy, so if Andreas is around today, we may add the needed code...
>
> On 2/13/06, Tjaart de Beer <tjaart@tuks.co.za> wrote:
> >
> > Thanks for help! But I still have problems. Here is my code, please see
> > if you can find anything wrong....
> >
> >
> > import org.biojava.bio.structure.*;
> > import org.biojava.bio.structure.io.*;
> > .
> > .
> > .
> > PDBFileReader file = new PDBFileReader();
> > Structure structure = file.getStructure (filename); //I used 1eye.pdb
> > Chain chain = structure.getChain(0); //Only get 1st chain
> > ArrayList s = chain.getGroups("amino"); //Get amino acids
> > AminoAcidImpl a  = (AminoAcidImpl)s.get(28); //Make object of element 28
> >
> > Map secStruc = a.getSecStruc(); -> This returns an empty map...
> > ???
> >
> >
> > My chain variable contains the specified chain and I can get a specific
> > value for element 28. When I check the class of an element in the
> > ArrayList it says "class org.biojava.bio.structure.AminoAcidImpl". Thus
> > I now want to use the getSecStruc method on that element. Typing
> > "System.out.println(s.get(28).getSecStruc())" does not give me anything
> > but an empty array.
> >
> > Any help would be appreciated...
> >
> >
> > Tamas Horvath wrote:
> > > It's very easy really. Especially if you use cvs BioJava.
> > >
> > > You parse a PDB file using PDBFileParser.
> > > That gives you a structure object.
> > > Then you need to iterate through all models (if there are more than 1)
> > > and all chains in them.
> > > Once you have a chain object, you can iterate through it.
> > > for example you can say findChain("A")
> > > that gives you the "A" chain.
> > > then you can say getGroups("amino").
> > > That gives you a list of the aminoacids.
> > > And every aminoacid object has a secondary structure attribute.
> > >
> > > On 2/12/06, *Tjaart de Beer* <tjaart@tuks.co.za> wrote:
> > >
> > >     Hi
> > >
> > >     Thanks for all the suggestions. Currently I just want to extract the
> > >     secondary structure as specified in the PDB file. I am having trouble
> > >     understanding how to utilize the AminoAcid class (after having
> > >     looked at the source...). Does anyone have an example of extracting
> > >     the secondary structure from a PDB file using the AminoAcid class in
> > >     Biojava? Or any example using the AminoAcid class to extract info
> > >     from a PDB file?
> > >
> > >     Any help would be greatly appreciated!
> > >
> > >     Martin Heusel wrote:
> > >      > Hi Tjaart,
> > >      > you can use DSSP to determine the secondary structure from a PDB.
> > >      > http://swift.cmbi.ru.nl/gv/dssp/
> > >      > or maybe better use Christoph's SecondaryStructure_Predictor with
> > >      > biojava
> > >      > http://www.charite.de/bioinf/strap/biojavaInAnger_SecondaryStructure_Predictor.html
> > >      > bye
> > >      > Martin
> > >      > _______________________________________________
> > >      > Biojava-l mailing list  -  Biojava-l@biojava.org
> > >      > http://biojava.org/mailman/listinfo/biojava-l
> > >      >
> > >
> > >     --
> > >     Tjaart de Beer
> > >
> > >
> > >     ---------
> > >     The software required "Windows XP or better" ... so I installed Linux
> > >     _______________________________________________
> > >     Biojava-l mailing list  -  Biojava-l@biojava.org
> > >     http://biojava.org/mailman/listinfo/biojava-l
> > >
> > >
> >
> > --
> > Tjaart de Beer
> > Bioinformatics and Computational Biology Unit
> > Department Biochemistry
> > FABI Square/Bioinformatics building
> > Faculty of Natural Sciences
> > University of Pretoria
> > Lynwood rd
> > Pretoria
> > South Africa
> > 0001
> >
> > Tel:    +27 12 420 5802
> > Cell:   +27 83 504 7914
> > Fax:    +27 12 420 5800
> > Email:  tjaart@tuks.co.za
> >         tdebeer@gmail.com
> >
> > ---------
> > The software required "Windows XP or better" ... so I installed Linux
> >
>
>
From mark.schreiber at novartis.com  Mon Feb 13 20:11:07 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Mon Feb 13 20:06:53 2006
Subject: [Biojava-l] Genbank  parser error [biojavax]
Message-ID: <OFF30A8464.F9BA76EB-ON48257115.00063DA0-48257115.00068323@EU.novartis.net>

Hi Morgane -

I have to say that doesn't look much like Genbank : )

The biojavax parsers are possibly a bit brittle due to their use of regexps 
to recognize key elements. It should be fixable, I think the problem is 
that the parser expects a word after LOCUS not a number. This may not be 
the only problem though. Could you post the entire file? Or if it is large 
then a representative file of smaller size.
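
To illustrate the kind of brittleness I mean, here is a small sketch with 
java.util.regex. Neither pattern is the actual regexp in GenbankFormat.java; 
STRICT, LENIENT and LocusLineSketch are made-up names, and STRICT only 
models a parser that insists on a name starting with a letter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LocusLineSketch {
    // Hypothetical strict pattern: the locus name must start with a letter.
    static final Pattern STRICT =
        Pattern.compile("^LOCUS\\s+([A-Za-z]\\S*)\\s+(\\d+)\\s+bp\\b.*$");

    // More lenient: any run of non-whitespace is accepted as the name.
    static final Pattern LENIENT =
        Pattern.compile("^LOCUS\\s+(\\S+)\\s+(\\d+)\\s+bp\\b.*$");

    /** Returns {name, length} if the line matches the pattern, else null. */
    static String[] parse(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String line = "LOCUS       6 489671 bp DNA HTG 13-FEB-2006";
        System.out.println(parse(STRICT, line) == null);   // name "6" is rejected
        System.out.println(parse(LENIENT, line)[0]);       // name "6" is accepted
    }
}
```

Relaxing the name group is only a guess at what is needed; other lines in 
the Ensembl export may trip the parser as well.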

- Mark



From mark.schreiber at novartis.com  Mon Feb 13 20:45:11 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Mon Feb 13 20:40:56 2006
Subject: [Biojava-l] Persist SingleDP object
Message-ID: <OF825B3A47.D61D26DB-ON48257115.000984E4-48257115.0009A171@EU.novartis.net>

Hi Martin -

You could try the XmlMarkovModel class. It has readModel and writeModel to 
write markov models as XML. I have used this successfully for models in 
the past.

- Mark



From crackeur at comcast.net  Mon Feb 13 20:47:42 2006
From: crackeur at comcast.net (Jimmy Zhang)
Date: Mon Feb 13 20:53:07 2006
Subject: [Biojava-l] [ANN] VTD-XML Version 1.5 Released
References: <OFF30A8464.F9BA76EB-ON48257115.00063DA0-48257115.00068323@EU.novartis.net>
Message-ID: <006501c63108$a6979850$0d02a8c0@ximpleware>

[ANN] VTD-XML Version 1.5 Released

Eight years after the invention of XML, DOM and SAX, 
despite their respective issues, are still the mainstays 
of application developers.  
 
So is it the end of road for XML parsing innovation? 
 
The VTD-XML project team think not. We are proud to 
announce the availability of both C and Java version 
1.5 of VTD-XML, the next generation open-source XML 
parser that goes beyond DOM and SAX in terms of 
performance, memory usage and ease of use. 
 
The technical highlights of VTD-XML are: 

* Performance: the world's fastest XML parser,
  between 5x~10x faster than DOM
* Memory Usage: 3x to 5x less than DOM, 1.3x~1.5x
  XML document size
* Random access with built-in XPath support
* A simple and intuitive API 

Other advanced features include:
* Buffer reuse
* Large document support (2GByte)
* Incremental update
* Hardware acceleration
* Native XML indexing.

For demos, latest benchmarks, related articles and software 
downloads, please visit http://vtd-xml.sf.net. Also let us 
know your thoughts and suggestions and help us improve 
VTD-XML.


From mark.schreiber at novartis.com  Tue Feb 14 04:45:07 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Tue Feb 14 04:41:10 2006
Subject: [Biojava-l] Contributers
Message-ID: <OFAFF1E278.04B90F48-ON48257115.003475D0-48257115.003591F4@EU.novartis.net>

Hi all -

As you will know we are moving biojava to a wiki based format. On the 
BioJava community page I am trying to extend the list of people who 
contributed to the project in some way. Currently the list is very 
incomplete.

If you have made any kind of contribution in the past please add yourself 
to the list. It's simple enough to do. Follow these instructions.

1. Go to http://biojava.open-bio.org/wiki/BioJava:Community_Portal
2. Click the edit button next to the contributors heading (you might be 
prompted to log in, just give yourself a username and password and hey 
presto you're editing)
3. Add a link for yourself, e.g. * [[Joe Bloggs|Joe Bloggs]], and save the 
page.
4. Joe Bloggs will now appear as a red link on the page. Click the link 
and start adding information about yourself and what you did (do) with 
biojava.
5. Add [[Category:People]] to the bottom of the page
6. Save the page.
7. By default you will have a [[User:MyUserName]] page. In this you should 
put #REDIRECT [[Joe Bloggs]]

Why should you do this?

1. Because you can.
2. It's really the only way people who contribute to biojava get any 
credit at all for their contributions to humanity.
3. It would be really great to keep some kind of record of who contributed 
what and how many people have contributed to biojava. This will be 
essential if we ever publish.

Don't make me look through the @author tags!

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910

From mthomasc at vub.ac.be  Tue Feb 14 08:33:02 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Tue Feb 14 08:28:58 2006
Subject: [Biojava-l] Genbank  parser error [biojavax]
In-Reply-To: <OFF30A8464.F9BA76EB-ON48257115.00063DA0-48257115.00068323@EU.novartis.net>
References: <OFF30A8464.F9BA76EB-ON48257115.00063DA0-48257115.00068323@EU.novartis.net>
Message-ID: <43F1DC0E.7050809@vub.ac.be>

Hello Mark,

My file is indeed too large to be posted.
So I have exported a smaller sequence from Ensembl that I tested with 
the parser. The behavior is the same.
You will find this "Genbank" formatted file enclosed below.

Thanks for your help,

Morgane.

LOCUS       6 3498 bp DNA HTG 14-FEB-2006
DEFINITION  Mus musculus chromosome 6 NCBIM34 partial sequence
            52305503..52309000 reannotated via EnsEMBL
ACCESSION   chromosome:NCBIM34:6:52305503:52309000:1
VERSION     chromosome:NCBIM34:6:52305503:52309000:1
KEYWORDS    .
SOURCE      House mouse
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
            Sciurognathi; Muridae; Murinae; Mus.
COMMENT     This sequence was annotated by the Ensembl system. Please visit the
            Ensembl web site, http://www.ensembl.org/ for more information.
COMMENT     All feature locations are relative to the first (5') base of the
            sequence in this file.  The sequence presented is always the
            forward strand of the assembly. Features that lie outside of the
            sequence contained in this file have clonal location coordinates in
            the format: .:..
COMMENT     The /gene indicates a unique id for a gene,
            /note="transcript_id=..." a unique id for a transcript, /protein_id
            a unique id for a peptide and note="exon_id=..." a unique id for an
            exon. These ids are maintained wherever possible between versions.
COMMENT     All the exons and transcripts in Ensembl are confirmed by
            similarity to either protein or cDNA sequences.
FEATURES             Location/Qualifiers
     source          1..3498
                     /organism="Mus musculus"
                     /db_xref="taxon:10090"
     gene            complement(506..2826)
                     /gene=ENSMUSG00000014704
     mRNA            join(complement(2261..2826),complement(506..1620))
                     /gene="ENSMUSG00000014704"
                     /note="transcript_id=ENSMUST00000014848"
     CDS             join(complement(2261..2639),complement(881..1620))
                     /gene="ENSMUSG00000014704"
                     /protein_id="ENSMUSP00000014848"
                     /note="transcript_id=ENSMUST00000014848"
                     /db_xref="MarkerSymbol:Hoxa2"
                     /db_xref="Uniprot/SWISSPROT:HXA2_MOUSE"
                     /db_xref="RefSeq_peptide:NP_034581.1"
                     /db_xref="RefSeq_dna:NM_010451.1"
                     /db_xref="Uniprot/SPTREMBL:Q3UYP9_MOUSE"
                     /db_xref="Uniprot/SPTREMBL:Q920T7_MOUSE"
                     /db_xref="Uniprot/SPTREMBL:Q920T9_MOUSE"
                     /db_xref="Uniprot/SPTREMBL:Q920U0_MOUSE"
                     /db_xref="Uniprot/SPTREMBL:Q920U1_MOUSE"
                     /db_xref="Uniprot/SPTREMBL:Q920U2_MOUSE"
                     /db_xref="Uniprot/SPTREMBL:Q920U3_MOUSE"
                     /db_xref="Uniprot/SPTREMBL:Q920U4_MOUSE"
                     /db_xref="Uniprot/SPTREMBL:Q920U5_MOUSE"
                     /db_xref="EntrezGene:15399"
                     /db_xref="AgilentProbe:A_51_P501803"
                     /db_xref="EMBL:AB039184"
                     /db_xref="EMBL:AB039185"
                     /db_xref="EMBL:AB039186"
                     /db_xref="EMBL:AB039187"
                     /db_xref="EMBL:AB039188"
                     /db_xref="EMBL:AB039189"
                     /db_xref="EMBL:AB039190"
                     /db_xref="EMBL:AB039191"
                     /db_xref="EMBL:AB039192"
                     /db_xref="EMBL:AK134501"
                     /db_xref="EMBL:M87801"
                     /db_xref="EMBL:M93148"
                     /db_xref="EMBL:M93292"
                     /db_xref="EMBL:M95599"
                     /db_xref="GO:GO:0003700"
                     /db_xref="GO:GO:0005634"
                     /db_xref="GO:GO:0006355"
                     /db_xref="GO:GO:0007275"
                     /db_xref="IPI:IPI00132242.1"
                     /db_xref="UniGene:Mm.131"
                     /db_xref="protein_id:AAA37827.1"
                     /db_xref="protein_id:AAA37834.1"
                     /db_xref="protein_id:AAA37835.1"
                     /db_xref="protein_id:AAA37836.1"
                     /db_xref="protein_id:BAB68708.1"
                     /db_xref="protein_id:BAB68709.1"
                     /db_xref="protein_id:BAB68710.1"
                     /db_xref="protein_id:BAB68711.1"
                     /db_xref="protein_id:BAB68712.1"
                     /db_xref="protein_id:BAB68713.1"
                     /db_xref="protein_id:BAB68714.1"
                     /db_xref="protein_id:BAB68715.1"
                     /db_xref="protein_id:BAB68716.1"
                     /db_xref="protein_id:BAE22163.1"
                     /db_xref="AFFY_MG_U74Av2:102643_at"
                     /db_xref="AFFY_MG_U74Cv2:171063_at"
                     /db_xref="AFFY_Mouse430A_2:1419602_at"
                     /db_xref="AFFY_Mouse430_2:1419602_at"
                     /translation="MNYEFEREIGFINSQPSLAECLTSFPPVADTFQSSSIKTSTLSH
                     STLIPPPFEQTIPSLNPGSHPRHGAGVGGRPKSSPAGSRGSPVPAGALQPPEYPWMKE
                     KKAAKKTALPPAAASTGPACLGHKESLEIADGSGGGSRRLRTAYTNTQLLELEKEFHF
                     NKYLCRPRRVEIAALLDLTERQVKVWFQNRRMKHKRQTQCKENQNSEGKFKNLEDSDK
                     VEEDEEEKSLFEQALSVSGALLEREGYTFQQNALSQQQAPNGHNGDSQTFPVSPLTSN
                     EKNLKHFQHQSPTVPNCLSTMGQNCGAGLNNDSPEAIEVPSLQDFNVFSTDSCLQLSD
                     ALSPSLPGSLDSPVDISADSFDFFTDTLTTIDLQHLNY"
     exon            complement(506..1620)
                     /note="exon_id=ENSMUSE00000387033"
     exon            complement(2261..2826)
                     /note="exon_id=ENSMUSE00000193269"
BASE COUNT  938 a 815 c 882 g 863 t
ORIGIN
        1 AGGAAGAGTT GGAACGTAGA TGTTTGAAAC AAATGTGTAT AAATAAATGA ATTTTTGATA
       61 ACTCCGTTAT TGACCTAGAA ACTAGCAGCT TGGTAAGGGA ACTCCATTCC ACTCCACTCG
      121 TCCTAGAACT GGAAGTTTTT GTAGGCACTT TTCCTCTCCA CACTCAAAAG CTTGGGCTAG
      181 GGCCAACTCA GGCTGCCCAA GCCCATTTCT ATTACTAATG TAACTCTATG GCCTGAGTCT
      241 CAACACTGAA AACCAAATTC ATTCCCTTAG GGGGGAAAAA TCCAAAAAAA AAAAAAAAAA
      301 AAGTCTTGCC AGAAGCCCTA GCACTTTCTG GTTTTCTTCT TTGTTGCTGT TTGTTGCAGG
      361 CTTTGAACAT GCCACCCTAA TAAAATATAT TAAGATTGAA AAGTAAATTG TGACCAGACT
      421 TTTATTTACC ATGTTAGACT AAAAGAAGTA TAAGAAATCA GTATGAGTCT TGAGAAAGAG
      481 GGGAAGAAAA AAATAAGAAA GCTACTTATA GCAAAGGAGA ATTTATTCTA CCAAAAATAC
      541 GCATGACAAT GCATTCTAAT GTGGTACAAA AATAAACAGA AAGTGACAAG ACAATTTATG
      601 GTCACTTTCT TGCAGGCCTC CTGTTTTGTT TTTCAGGAAA ATCACATAGA AGCTTGTTGG
      661 GTTCTGTGTA AAAACCACTT AGAACGCCAA CATAATTTGC AAGAGATGGC TTTAAAACTG
      721 TGTCAGGGGA GAACATTAAA CGGAAAGTCC TCAACATTTG AGAGAGTAGG GGTAGATCAA
      781 GAAGAAACTA AAACGAAAAT CAACTCCCAG AATAAAAGAA GGCAAAGCCA CCTGGTCAAA
      841 GGCGTTTTGT TTTGTGAAGC TTTGTTTTGC TTTAATGTTC TTAGTAATTC AGATGCTGTA
      901 GGTCGATTGT GGTGAGTGTG TCTGTAAAAA AGTCAAAGCT GTCAGCTGAG ATATCTACAG
      961 GACTGTCCAG GGAGCCAGGC AAGCTGGGCG ACAGTGCATC TGAAAGCTGC AGGCAGGAAT
     1021 CTGTGGAGAA AACATTGAAG TCCTGCAAAG AGGGGACCTC GATGGCCTCG GGACTGTCAT
     1081 TGTTTAGGCC AGCTCCACAG TTCTGGCCCA TTGTTGACAA GCAGTTAGGA ACAGTGGGTG
     1141 ACTGGTGCTG AAAATGTTTC AAATTTTTCT CATTGCTGGT TAAAGGCGAA ACTGGGAAAG
     1201 TTTGGGAGTC GCCATTGTGT CCATTGGGAG CCTGCTGTTG AGAGAGCGCA TTTTGCTGAA
     1261 AAGTGTACCC TTCCCTCTCC AGAAGGGCCC CGGAGACACT GAGGGCTTGC TCAAAGAGTG
     1321 ACTTCTCTTC CTCGTCTTCC TCCACTTTGT CCGAGTCCTC CAGGTTTTTA AATTTCCCTT
     1381 CGCTGTTTTG GTTCTCCTTG CACTGGGTTT GCCTCTTATG CTTCATTCTC CGGTTCTGAA
     1441 ACCACACTTT CACTTGTCTC TCGGTCAAAT CCAGCAGCGC GGCGATTTCC ACCCTGCGGG
     1501 GTCTGCAAAG GTACTTGTTG AAATGAAATT CCTTTTCCAG CTCCAAAAGC TGAGTGTTGG
     1561 TGTACGCGGT TCTCAGACGC CTGGATCCCC CGCCGCTGCC ATCAGCTATT TCCAGGGATT
     1621 CTGCAGAAAG GGAAACCAAC AAGAGACACA CATACAGTTG AAGGTGGAAG GGTCCGAGCA
     1681 GGGTTATTCC ATTGGAGCAT AAATACAGCA GAAAAGATCA ACTGCAACAA AATGGCCGCC
     1741 CCTGGATGCA GTGCAGCTAT TGTGCTGCCC TTCCTGGGAG CCCAGCCCGG GGAAGCCCAG
     1801 TCTCTTCCAC CTCCATCAAA TTCCTGCCTG TGGCTTCCCC CAACCTCTTC ATCCGGGAGC
     1861 AAACTTTATA TTAGCTACAA CACAATTTAT AATTAATGCA TCAGCTGCTT AGCTGAGCAA
     1921 GAGCGGTCTA TCACTCTTCA TTACTGTCAA AAAGCCAAAC TCTAGGACAA CTAGACAAGA
     1981 GGAGGTCAGT TCCAACTCAA ATAAATCATC CTACATTACA CAAGTTAGGG AAAGTGCCCC
     2041 CCCTCCTCAA AATATATATG TCTCATTGTG GGACTCGGGA TCTATTTTCC CCTCCACCAA
     2101 ACCCACTCCT GAGACCACAG GGGCATGAGA CCCGCCACCA GGCATCTCTC TCTCTCCCCC
     2161 TTCCCTCGAA GCTCATGGTC CCCTCCCCCA CAACCGCTCC TAGGGAAGCC CGGAGGGGGA
     2221 CAAGGGTCCC CGAGACCTGG GGCCAAGTCT CCGGACTGAC CTTTGTGGCC GAGGCAGGCA
     2281 GGGCCCGTGG AGGCGGCGGC GGGCGGCAGC GCGGTTTTCT TGGCCGCCTT CTTCTCCTTC
     2341 ATCCAGGGAT ACTCAGGCGG CTGCAGGGCG CCGGCAGGCA CCGGGCTGCC GCGACTGCCC
     2401 GCGGGGCTCG ACTTGGGGCG GCCGCCAACG CCAGCGCCGT GGCGAGGGTG ACTGCCCGGG
     2461 TTCAGGCTGG GAATGGTCTG CTCAAAAGGA GGAGGAATCA GTGTCGAGTG TGAAAGCGTC
     2521 GAGGTCTTGA TTGATGAACT TTGAAATGTA TCAGCGACAG GGGGAAAAGA TGTCAGGCAC
     2581 TCAGCGAGCG ACGGCTGGCT ATTGATAAAA CCAATCTCTC GCTCAAATTC GTAATTCATG
     2641 GCCTTCTCCT TGGAGCCCCC TCGGAGGAAA AGTTCCCTCT TTTGGAGGGG CTTTGGGGGG
     2701 GCAAGGCCCA GGAAAAAGGC GAGCGCGAAG GAAAAAAAAA TCTATCATAG AAGATCGCTG
     2761 CTGGGGTGTT TTTTTTCTAA TTCACTGATT ACAGCCGTAT GGGGACCGCG CTACTATTAA
     2821 ACTATTGAAT TCATGGAGAC AAGGTTGAAA TTGGACCGAA TTGGCTGTCA CATGATTGCT
     2881 TCTGCCCAAT GACAATTTGG GCTTTAATCA AAAGAAGCCA CTGTCTGTTT GATTGATCCA
     2941 AAAAAGTCAG AAAGGAACGC CTCATTGGGG GCCATCGAGG CTTTATTTAC ACTTTTTTTC
     3001 AGGGCAAAAA TACATATATG TGGGTGTGGA TGGCAATGCC CCGGGAGTGC GTGGGGGGCG
     3061 AGAGTGCCTG TTTGCCTCCT GATCTGCAAG GATCTAGTGT GCTCCCTGGA GTGTGTGTGT
     3121 GAGTGTGTGC GTGTGAGCCC TGCTGCCGTC CCGCCAGTGG CTGCCCTCTG CCTCCCCCGC
     3181 ACACTCCGCG CATTGTTTGG GACTGTCGGG AAGACGCCTC GCACCTCACA AATCATTTAA
     3241 GCACCTCAGC CTGACGCCTG CAGTCATTAA CAAAGTAATC CATTAATCTT CAAAGTTTTG
     3301 ACACCCCAGG GCCCTGCATC TCAGCCACAT AAGTTCTGCT AAGGCAAGAG AAAGGAGCAG
     3361 AGTGGGAGAG AGAGAGGAGA GAGGGAGAGA GGGAGAGAGG GAGAGAGAGA GAGAGAGAGA
     3421 GAGAGAGAGA GAGAGAGAGA GAGAGAATGA ATATTGGGGT TCACCTTTCC TCTTCCTCCT
     3481 CTTTTTCCAA AATCAGTT
//


mark.schreiber@novartis.com wrote:

>Hi Morgane -
>
>I have to say that doesn't look much like Genbank : )
>
>The biojavax parsers are possibly a bit brittle due to their use of regexps 
>to recognize key elements. It should be fixable; I think the problem is 
>that the parser expects a word after LOCUS, not a number. This may not be 
>the only problem though. Could you post the entire file? Or, if it is large, 
>a representative file of smaller size.
>
>- Mark
>
>
>
>
>
>Morgane THOMAS-CHOLLIER <mthomasc@vub.ac.be>
>Sent by: biojava-l-bounces@portal.open-bio.org
>02/14/2006 04:36 AM
>
> 
>        To:     biojava-l@biojava.org
>        cc:     (bcc: Mark Schreiber/GP/Novartis)
>        Subject:        [Biojava-l] Genbank  parser error [biojavax]
>
>
>Hello,
>
>I tried biojavax today with a view to using the Genbank file parser.
>
>My test file is a Genbank-formatted file produced by the 
>Ensembl export system.
>
>The head of the file is as follows:
>
>LOCUS       6 489671 bp DNA HTG 13-FEB-2006
>DEFINITION  Mus musculus chromosome 6 NCBIM34 partial sequence
>            52296503..52786173 reannotated via EnsEMBL
>ACCESSION   chromosome:NCBIM34:6:52296503:52786173:1
>VERSION     chromosome:NCBIM34:6:52296503:52786173:1
>
>I used the code provided in biojavax docbook to parse this file.
>I get the following error :
>
>Exception in thread "main" org.biojava.bio.BioException: Could not read sequence
>    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
>    at org.embnet.be.biojavax.tryout.GenbankParseTest.main(GenbankParseTest.java:31)
>Caused by: org.biojava.bio.seq.io.ParseException: Bad locus line found: 6 489671 bp DNA HTG 13-FEB-2006
>    at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:229)
>    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
>    ... 1 more
>
>I had a look at GenbankFormat.java, and I guess the problem comes from 
>the regular expression, which does not recognize this LOCUS line as a 
>standard Genbank LOCUS line.
>
>Am I wrong? Has the biojavax Genbank parser been tested on files 
>exported from Ensembl?
>
>Morgane.
>
>  
>

-- 
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student (mthomasc@vub.ac.be)

Vrije Universiteit Brussels (VUB)    
Laboratory of Cell Genetics          
Pleinlaan 2                          
1050 Brussels                        
Belgium                              

Tel : +32 2 629 15 22                		     
**********************************************************

From mthomasc at vub.ac.be  Wed Feb 15 03:56:53 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Wed Feb 15 03:52:50 2006
Subject: [Biojava-l] Genbank  parser error [biojavax]
In-Reply-To: <43F1DC0E.7050809@vub.ac.be>
References: <OFF30A8464.F9BA76EB-ON48257115.00063DA0-48257115.00068323@EU.novartis.net>
	<43F1DC0E.7050809@vub.ac.be>
Message-ID: <43F2ECD5.1070605@vub.ac.be>

Hello again,

I have continued using the Genbank parser, this time with Genbank 
files from NCBI :)

I really appreciate the documentation example that converts a 
Genbank file into an EMBL file; it is very easy to use.

I do have a question about the ORGANISM and SOURCE tags. 
The documentation states clearly that the parser ignores them, 
but I do not understand why.
When I parsed the Genbank files for accession numbers AC147788 and 
DQ158013, I could get the taxon ID for both, but I could not get the 
common name of the organism or use getNameHierarchy().

Is there a way to get the common name of the organism without making a 
remote call to NCBI with the taxon ID?
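
One possible workaround, given here only as a sketch and not as anything the biojavax parser offers, is to read the SOURCE and ORGANISM lines straight from the file header with plain Java. The class SourceOrganismSketch and its methods are hypothetical names made up for this illustration, and the code assumes the header layout seen in this thread (SOURCE on its own line, ORGANISM followed by indented lineage lines).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, NOT part of BioJava: reads the organism names
 * from the header text of a Genbank record without any remote call.
 */
class SourceOrganismSketch {

    /** Returns the text after the given keyword (e.g. "SOURCE"), or null if absent. */
    static String fieldValue(String header, String keyword) {
        for (String line : header.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.startsWith(keyword + " ")) {
                return trimmed.substring(keyword.length()).trim();
            }
        }
        return null;
    }

    /** Collects the semicolon-separated lineage from the indented lines after ORGANISM. */
    static List<String> lineage(String header) {
        List<String> taxa = new ArrayList<>();
        boolean inOrganism = false;
        for (String line : header.split("\n")) {
            if (line.trim().startsWith("ORGANISM ")) {
                inOrganism = true;          // lineage starts on the next line
                continue;
            }
            if (!inOrganism) {
                continue;
            }
            if (!line.startsWith(" ")) {
                break;                      // next top-level keyword, e.g. COMMENT
            }
            for (String taxon : line.trim().split(";")) {
                String name = taxon.trim();
                if (name.endsWith(".")) {   // the last taxon ends with a full stop
                    name = name.substring(0, name.length() - 1);
                }
                if (!name.isEmpty()) {
                    taxa.add(name);
                }
            }
        }
        return taxa;
    }

    public static void main(String[] args) {
        String header =
            "SOURCE      House mouse\n" +
            "  ORGANISM  Mus musculus\n" +
            "            Eukaryota; Metazoa; Chordata;\n" +
            "            Muridae; Murinae; Mus.\n" +
            "COMMENT     example\n";
        System.out.println(fieldValue(header, "SOURCE"));   // common name
        System.out.println(fieldValue(header, "ORGANISM")); // scientific name
        System.out.println(lineage(header));
    }
}
```

This only works when the file actually carries a SOURCE line with a common name; it does not replace a proper taxonomy lookup.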

Thanks for your help,

Morgane.


-- 
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student (mthomasc@vub.ac.be)
Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium

From mark.schreiber at novartis.com  Wed Feb 15 04:00:44 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed Feb 15 03:56:32 2006
Subject: [Biojava-l] Genbank  parser error [biojavax]
Message-ID: <OF9029C41A.DCC8FB0D-ON48257116.0031629F-48257116.003181C8@EU.novartis.net>

Hi Morgane -

It turned out to be a problem with a greedy regexp used to parse the 
LOCUS line. This is now fixed in CVS. Let me know if anything else goes wrong.
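
For anyone wondering how a greedy regexp can misread a line like this, here is a minimal sketch. The two patterns and the class name LocusRegexDemo are made up for illustration; they are not the ones in GenbankFormat.java, and the actual change in CVS may look quite different.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical demo: these patterns are illustrative only and are NOT the
 * ones used in org.biojavax.bio.seq.io.GenbankFormat.
 */
class LocusRegexDemo {
    // Greedy: ".*" grabs as much as it can and gives back only what it must,
    // so the capture group is left with just the final digit before " bp".
    private static final Pattern GREEDY = Pattern.compile("^.*(\\d+)\\s+bp");

    // Reluctant: ".*?" grabs as little as it can, so the capture group
    // receives the whole run of digits before " bp".
    private static final Pattern RELUCTANT = Pattern.compile("^.*?(\\d+)\\s+bp");

    static String greedyLength(String locus) {
        return firstGroup(GREEDY, locus);
    }

    static String reluctantLength(String locus) {
        return firstGroup(RELUCTANT, locus);
    }

    // Returns capture group 1 of the first match, or null if nothing matches.
    private static String firstGroup(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // LOCUS line from the Ensembl export, with the LOCUS keyword stripped
        String locus = "6 489671 bp DNA HTG 13-FEB-2006";
        System.out.println("greedy:    " + greedyLength(locus));    // prints 1
        System.out.println("reluctant: " + reluctantLength(locus)); // prints 489671
    }
}
```

Both patterns match the line, which is what makes this kind of bug easy to miss: only the captured value differs.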

- Mark


Morgane THOMAS-CHOLLIER <mthomasc@vub.ac.be>
Sent by: biojava-l-bounces@portal.open-bio.org
02/14/2006 09:33 PM

 
        To:     biojava-l@biojava.org
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        Re: [Biojava-l] Genbank  parser error [biojavax]


Hello Mark,

My file is indeed too large to post.
So I exported a smaller sequence from Ensembl and tested it with 
the parser. The behavior is the same.
You will find this "Genbank"-formatted file below.

Thanks for your help,

Morgane.

LOCUS       6 3498 bp DNA HTG 14-FEB-2006
DEFINITION  Mus musculus chromosome 6 NCBIM34 partial sequence
            52305503..52309000 reannotated via EnsEMBL
ACCESSION   chromosome:NCBIM34:6:52305503:52309000:1
VERSION     chromosome:NCBIM34:6:52305503:52309000:1
KEYWORDS    .
SOURCE      House mouse
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
            Sciurognathi; Muridae; Murinae; Mus.
COMMENT     This sequence was annotated by the Ensembl system. Please visit the
            Ensembl web site, http://www.ensembl.org/ for more information.
COMMENT     All feature locations are relative to the first (5') base of the
            sequence in this file.  The sequence presented is always the
            forward strand of the assembly. Features that lie outside of the
            sequence contained in this file have clonal location coordinates in
            the format: .:..
COMMENT     The /gene indicates a unique id for a gene,
            /note="transcript_id=..." a unique id for a transcript, /protein_id
            a unique id for a peptide and note="exon_id=..." a unique id for an
            exon. These ids are maintained wherever possible between versions.
COMMENT     All the exons and transcripts in Ensembl are confirmed by
            similarity to either protein or cDNA sequences.
FEATURES             Location/Qualifiers
     source          1..3498
                     /organism="Mus musculus"
                     /db_xref="taxon:10090"
     gene            complement(506..2826)
                     /gene=ENSMUSG00000014704
     mRNA            join(complement(2261..2826),complement(506..1620))
                     /gene="ENSMUSG00000014704"
                     /note="transcript_id=ENSMUST00000014848"
     CDS             join(complement(2261..2639),complement(881..1620))
                     /gene="ENSMUSG00000014704"
                     /protein_id="ENSMUSP00000014848"
                     /note="transcript_id=ENSMUST00000014848"
                     /db_xref="MarkerSymbol:Hoxa2"
                     /db_xref="Uniprot/SWISSPROT:HXA2_MOUSE"
                     /db_xref="RefSeq_peptide:NP_034581.1"
                     /db_xref="RefSeq_dna:NM_010451.1"
                     /db_xref="Uniprot/SPTREMBL:Q3UYP9_MOUSE"
                     /db_xref="Uniprot/SPTREMBL:Q920T7_MOUSE"
                     /db_xref="Uniprot/SPTREMBL:Q920T9_MOUSE"
                     /db_xref="Uniprot/SPTREMBL:Q920U0_MOUSE"
                     /db_xref="Uniprot/SPTREMBL:Q920U1_MOUSE"
                     /db_xref="Uniprot/SPTREMBL:Q920U2_MOUSE"
                     /db_xref="Uniprot/SPTREMBL:Q920U3_MOUSE"
                     /db_xref="Uniprot/SPTREMBL:Q920U4_MOUSE"
                     /db_xref="Uniprot/SPTREMBL:Q920U5_MOUSE"
                     /db_xref="EntrezGene:15399"
                     /db_xref="AgilentProbe:A_51_P501803"
                     /db_xref="EMBL:AB039184"
                     /db_xref="EMBL:AB039185"
                     /db_xref="EMBL:AB039186"
                     /db_xref="EMBL:AB039187"
                     /db_xref="EMBL:AB039188"
                     /db_xref="EMBL:AB039189"
                     /db_xref="EMBL:AB039190"
                     /db_xref="EMBL:AB039191"
                     /db_xref="EMBL:AB039192"
                     /db_xref="EMBL:AK134501"
                     /db_xref="EMBL:M87801"
                     /db_xref="EMBL:M93148"
                     /db_xref="EMBL:M93292"
                     /db_xref="EMBL:M95599"
                     /db_xref="GO:GO:0003700"
                     /db_xref="GO:GO:0005634"
                     /db_xref="GO:GO:0006355"
                     /db_xref="GO:GO:0007275"
                     /db_xref="IPI:IPI00132242.1"
                     /db_xref="UniGene:Mm.131"
                     /db_xref="protein_id:AAA37827.1"
                     /db_xref="protein_id:AAA37834.1"
                     /db_xref="protein_id:AAA37835.1"
                     /db_xref="protein_id:AAA37836.1"
                     /db_xref="protein_id:BAB68708.1"
                     /db_xref="protein_id:BAB68709.1"
                     /db_xref="protein_id:BAB68710.1"
                     /db_xref="protein_id:BAB68711.1"
                     /db_xref="protein_id:BAB68712.1"
                     /db_xref="protein_id:BAB68713.1"
                     /db_xref="protein_id:BAB68714.1"
                     /db_xref="protein_id:BAB68715.1"
                     /db_xref="protein_id:BAB68716.1"
                     /db_xref="protein_id:BAE22163.1"
                     /db_xref="AFFY_MG_U74Av2:102643_at"
                     /db_xref="AFFY_MG_U74Cv2:171063_at"
                     /db_xref="AFFY_Mouse430A_2:1419602_at"
                     /db_xref="AFFY_Mouse430_2:1419602_at"
                     /translation="MNYEFEREIGFINSQPSLAECLTSFPPVADTFQSSSIKTSTLSH
                     STLIPPPFEQTIPSLNPGSHPRHGAGVGGRPKSSPAGSRGSPVPAGALQPPEYPWMKE
                     KKAAKKTALPPAAASTGPACLGHKESLEIADGSGGGSRRLRTAYTNTQLLELEKEFHF
                     NKYLCRPRRVEIAALLDLTERQVKVWFQNRRMKHKRQTQCKENQNSEGKFKNLEDSDK
                     VEEDEEEKSLFEQALSVSGALLEREGYTFQQNALSQQQAPNGHNGDSQTFPVSPLTSN
                     EKNLKHFQHQSPTVPNCLSTMGQNCGAGLNNDSPEAIEVPSLQDFNVFSTDSCLQLSD
                     ALSPSLPGSLDSPVDISADSFDFFTDTLTTIDLQHLNY"
     exon            complement(506..1620)
                     /note="exon_id=ENSMUSE00000387033"
     exon            complement(2261..2826)
                     /note="exon_id=ENSMUSE00000193269"
BASE COUNT  938 a 815 c 882 g 863 t
ORIGIN
        1 AGGAAGAGTT GGAACGTAGA TGTTTGAAAC AAATGTGTAT AAATAAATGA ATTTTTGATA
       61 ACTCCGTTAT TGACCTAGAA ACTAGCAGCT TGGTAAGGGA ACTCCATTCC ACTCCACTCG
      121 TCCTAGAACT GGAAGTTTTT GTAGGCACTT TTCCTCTCCA CACTCAAAAG CTTGGGCTAG
      181 GGCCAACTCA GGCTGCCCAA GCCCATTTCT ATTACTAATG TAACTCTATG GCCTGAGTCT
      241 CAACACTGAA AACCAAATTC ATTCCCTTAG GGGGGAAAAA TCCAAAAAAA AAAAAAAAAA
      301 AAGTCTTGCC AGAAGCCCTA GCACTTTCTG GTTTTCTTCT TTGTTGCTGT TTGTTGCAGG
      361 CTTTGAACAT GCCACCCTAA TAAAATATAT TAAGATTGAA AAGTAAATTG TGACCAGACT
      421 TTTATTTACC ATGTTAGACT AAAAGAAGTA TAAGAAATCA GTATGAGTCT TGAGAAAGAG
      481 GGGAAGAAAA AAATAAGAAA GCTACTTATA GCAAAGGAGA ATTTATTCTA CCAAAAATAC
      541 GCATGACAAT GCATTCTAAT GTGGTACAAA AATAAACAGA AAGTGACAAG ACAATTTATG
      601 GTCACTTTCT TGCAGGCCTC CTGTTTTGTT TTTCAGGAAA ATCACATAGA AGCTTGTTGG
      661 GTTCTGTGTA AAAACCACTT AGAACGCCAA CATAATTTGC AAGAGATGGC TTTAAAACTG
      721 TGTCAGGGGA GAACATTAAA CGGAAAGTCC TCAACATTTG AGAGAGTAGG GGTAGATCAA
      781 GAAGAAACTA AAACGAAAAT CAACTCCCAG AATAAAAGAA GGCAAAGCCA CCTGGTCAAA
      841 GGCGTTTTGT TTTGTGAAGC TTTGTTTTGC TTTAATGTTC TTAGTAATTC AGATGCTGTA
      901 GGTCGATTGT GGTGAGTGTG TCTGTAAAAA AGTCAAAGCT GTCAGCTGAG ATATCTACAG
      961 GACTGTCCAG GGAGCCAGGC AAGCTGGGCG ACAGTGCATC TGAAAGCTGC AGGCAGGAAT
     1021 CTGTGGAGAA AACATTGAAG TCCTGCAAAG AGGGGACCTC GATGGCCTCG GGACTGTCAT
     1081 TGTTTAGGCC AGCTCCACAG TTCTGGCCCA TTGTTGACAA GCAGTTAGGA ACAGTGGGTG
     1141 ACTGGTGCTG AAAATGTTTC AAATTTTTCT CATTGCTGGT TAAAGGCGAA ACTGGGAAAG
     1201 TTTGGGAGTC GCCATTGTGT CCATTGGGAG CCTGCTGTTG AGAGAGCGCA TTTTGCTGAA
     1261 AAGTGTACCC TTCCCTCTCC AGAAGGGCCC CGGAGACACT GAGGGCTTGC TCAAAGAGTG
     1321 ACTTCTCTTC CTCGTCTTCC TCCACTTTGT CCGAGTCCTC CAGGTTTTTA AATTTCCCTT
     1381 CGCTGTTTTG GTTCTCCTTG CACTGGGTTT GCCTCTTATG CTTCATTCTC CGGTTCTGAA
     1441 ACCACACTTT CACTTGTCTC TCGGTCAAAT CCAGCAGCGC GGCGATTTCC ACCCTGCGGG
     1501 GTCTGCAAAG GTACTTGTTG AAATGAAATT CCTTTTCCAG CTCCAAAAGC TGAGTGTTGG
     1561 TGTACGCGGT TCTCAGACGC CTGGATCCCC CGCCGCTGCC ATCAGCTATT TCCAGGGATT
     1621 CTGCAGAAAG GGAAACCAAC AAGAGACACA CATACAGTTG AAGGTGGAAG GGTCCGAGCA
     1681 GGGTTATTCC ATTGGAGCAT AAATACAGCA GAAAAGATCA ACTGCAACAA AATGGCCGCC
     1741 CCTGGATGCA GTGCAGCTAT TGTGCTGCCC TTCCTGGGAG CCCAGCCCGG GGAAGCCCAG
     1801 TCTCTTCCAC CTCCATCAAA TTCCTGCCTG TGGCTTCCCC CAACCTCTTC 
ATCCGGGAGC
     1861 AAACTTTATA TTAGCTACAA CACAATTTAT AATTAATGCA TCAGCTGCTT 
AGCTGAGCAA
     1921 GAGCGGTCTA TCACTCTTCA TTACTGTCAA AAAGCCAAAC TCTAGGACAA 
CTAGACAAGA
     1981 GGAGGTCAGT TCCAACTCAA ATAAATCATC CTACATTACA CAAGTTAGGG 
AAAGTGCCCC
     2041 CCCTCCTCAA AATATATATG TCTCATTGTG GGACTCGGGA TCTATTTTCC 
CCTCCACCAA
     2101 ACCCACTCCT GAGACCACAG GGGCATGAGA CCCGCCACCA GGCATCTCTC 
TCTCTCCCCC
     2161 TTCCCTCGAA GCTCATGGTC CCCTCCCCCA CAACCGCTCC TAGGGAAGCC 
CGGAGGGGGA
     2221 CAAGGGTCCC CGAGACCTGG GGCCAAGTCT CCGGACTGAC CTTTGTGGCC 
GAGGCAGGCA
     2281 GGGCCCGTGG AGGCGGCGGC GGGCGGCAGC GCGGTTTTCT TGGCCGCCTT 
CTTCTCCTTC
     2341 ATCCAGGGAT ACTCAGGCGG CTGCAGGGCG CCGGCAGGCA CCGGGCTGCC 
GCGACTGCCC
     2401 GCGGGGCTCG ACTTGGGGCG GCCGCCAACG CCAGCGCCGT GGCGAGGGTG 
ACTGCCCGGG
     2461 TTCAGGCTGG GAATGGTCTG CTCAAAAGGA GGAGGAATCA GTGTCGAGTG 
TGAAAGCGTC
     2521 GAGGTCTTGA TTGATGAACT TTGAAATGTA TCAGCGACAG GGGGAAAAGA 
TGTCAGGCAC
     2581 TCAGCGAGCG ACGGCTGGCT ATTGATAAAA CCAATCTCTC GCTCAAATTC 
GTAATTCATG
     2641 GCCTTCTCCT TGGAGCCCCC TCGGAGGAAA AGTTCCCTCT TTTGGAGGGG 
CTTTGGGGGG
     2701 GCAAGGCCCA GGAAAAAGGC GAGCGCGAAG GAAAAAAAAA TCTATCATAG 
AAGATCGCTG
     2761 CTGGGGTGTT TTTTTTCTAA TTCACTGATT ACAGCCGTAT GGGGACCGCG 
CTACTATTAA
     2821 ACTATTGAAT TCATGGAGAC AAGGTTGAAA TTGGACCGAA TTGGCTGTCA 
CATGATTGCT
     2881 TCTGCCCAAT GACAATTTGG GCTTTAATCA AAAGAAGCCA CTGTCTGTTT 
GATTGATCCA
     2941 AAAAAGTCAG AAAGGAACGC CTCATTGGGG GCCATCGAGG CTTTATTTAC 
ACTTTTTTTC
     3001 AGGGCAAAAA TACATATATG TGGGTGTGGA TGGCAATGCC CCGGGAGTGC 
GTGGGGGGCG
     3061 AGAGTGCCTG TTTGCCTCCT GATCTGCAAG GATCTAGTGT GCTCCCTGGA 
GTGTGTGTGT
     3121 GAGTGTGTGC GTGTGAGCCC TGCTGCCGTC CCGCCAGTGG CTGCCCTCTG 
CCTCCCCCGC
     3181 ACACTCCGCG CATTGTTTGG GACTGTCGGG AAGACGCCTC GCACCTCACA 
AATCATTTAA
     3241 GCACCTCAGC CTGACGCCTG CAGTCATTAA CAAAGTAATC CATTAATCTT 
CAAAGTTTTG
     3301 ACACCCCAGG GCCCTGCATC TCAGCCACAT AAGTTCTGCT AAGGCAAGAG 
AAAGGAGCAG
     3361 AGTGGGAGAG AGAGAGGAGA GAGGGAGAGA GGGAGAGAGG GAGAGAGAGA 
GAGAGAGAGA
     3421 GAGAGAGAGA GAGAGAGAGA GAGAGAATGA ATATTGGGGT TCACCTTTCC 
TCTTCCTCCT
     3481 CTTTTTCCAA AATCAGTT
//


mark.schreiber@novartis.com wrote:

>Hi Morgane -
>
>I have to say that doesn't look much like Genbank : )
>
>The biojavax parsers are possibly a bit brittle due to their use of regexps
>to recognize key elements. It should be fixable; I think the problem is
>that the parser expects a word after LOCUS, not a number. This may not be
>the only problem though. Could you post the entire file? Or, if it is large,
>a representative file of smaller size.
>
>- Mark
>
>
>
>
>
>Morgane THOMAS-CHOLLIER <mthomasc@vub.ac.be>
>Sent by: biojava-l-bounces@portal.open-bio.org
>02/14/2006 04:36 AM
>
> 
>        To:     biojava-l@biojava.org
>        cc:     (bcc: Mark Schreiber/GP/Novartis)
>        Subject:        [Biojava-l] Genbank  parser error [biojavax]
>
>
>Hello,
>
>I have tried biojavax today with a view to using the Genbank file parser.
>
>My test file is a Genbank-formatted file produced by the Ensembl export
>system.
>
>The head of the file is as follows:
>
>LOCUS       6 489671 bp DNA HTG 13-FEB-2006
>DEFINITION  Mus musculus chromosome 6 NCBIM34 partial sequence
>            52296503..52786173 reannotated via EnsEMBL
>ACCESSION   chromosome:NCBIM34:6:52296503:52786173:1
>VERSION     chromosome:NCBIM34:6:52296503:52786173:1
>
>I used the code provided in the biojavax docbook to parse this file.
>I get the following error:
>
>Exception in thread "main" org.biojava.bio.BioException: Could not read sequence
>    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
>    at org.embnet.be.biojavax.tryout.GenbankParseTest.main(GenbankParseTest.java:31)
>Caused by: org.biojava.bio.seq.io.ParseException: Bad locus line found: 6 489671 bp DNA HTG 13-FEB-2006
>    at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:229)
>    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
>    ... 1 more
>
>I had a look at GenbankFormat.java, and I guess the problem comes from
>the regular expression, which does not recognize this LOCUS line as a
>standard Genbank LOCUS tag.
>
>Am I wrong? Has the biojavax Genbank parser been tested on
>Ensembl-exported files?
>
>Morgane.
>
> 
>
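
For illustration only, here is a minimal sketch of a LOCUS-line pattern that
accepts a numeric locus name such as the "6" in the Ensembl export quoted
above. The class name LocusLineSketch and the pattern itself are hypothetical;
they are not taken from GenbankFormat.java, and the assumed field order (name,
length, "bp", molecule type, optional topology, division, date) is only what
the lines in this thread show.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of biojavax.
final class LocusLineSketch {

    // The locus name is matched with \S+ so that both "SCU49845"-style words
    // and bare numbers like "6" are accepted. Topology is optional.
    private static final Pattern LOCUS = Pattern.compile(
        "^LOCUS\\s+(\\S+)\\s+(\\d+)\\s+bp\\s+(\\S+)\\s+"
        + "(?:(linear|circular)\\s+)?(\\S+)\\s+(\\S+)\\s*$");

    /** Returns {name, length, molecule type, division, date}. */
    static String[] parse(String line) {
        Matcher m = LOCUS.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Bad locus line found: " + line);
        }
        return new String[] {
            m.group(1), m.group(2), m.group(3), m.group(5), m.group(6)
        };
    }

    public static void main(String[] args) {
        String[] fields = parse("LOCUS       6 489671 bp DNA HTG 13-FEB-2006");
        System.out.println(String.join(" | ", fields));
    }
}
```

Because every field is delimited by whitespace and matched with \S+ or \d+,
no group can run past its own column, which is one simple way to avoid the
kind of over-matching a greedy ".*" style group can cause.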

-- 
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student (mthomasc@vub.ac.be)

Vrije Universiteit Brussels (VUB) 
Laboratory of Cell Genetics 
Pleinlaan 2 
1050 Brussels 
Belgium 

Tel : +32 2 629 15 22 
**********************************************************
Stop Using Internet Explorer, choose FIREFOX !
http://emmanuel.clement.free.fr/navigateurs/comparatif.htm

_______________________________________________
Biojava-l mailing list  -  Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l


From mthomasc at vub.ac.be  Wed Feb 15 05:04:22 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Wed Feb 15 05:00:07 2006
Subject: [Biojava-l] Genbank  parser error [biojavax]
In-Reply-To: <OF9029C41A.DCC8FB0D-ON48257116.0031629F-48257116.003181C8@EU.novartis.net>
References: <OF9029C41A.DCC8FB0D-ON48257116.0031629F-48257116.003181C8@EU.novartis.net>
Message-ID: <43F2FCA6.4040905@vub.ac.be>

Hi Mark,

I have downloaded the fixed version and tested it with my large file. 
Works great.

Thank you very much,

Morgane.

mark.schreiber@novartis.com wrote:

>Hi Morgane -
>
>Turned out to be a problem with a greedy regexp parsing the LOCUS tag. 
>This is fixed in CVS. Let me know if something else is a problem.
>
>- Mark
>
>
>
>
>
>Morgane THOMAS-CHOLLIER <mthomasc@vub.ac.be>
>Sent by: biojava-l-bounces@portal.open-bio.org
>02/14/2006 09:33 PM
>
> 
>        To:     biojava-l@biojava.org
>        cc:     (bcc: Mark Schreiber/GP/Novartis)
>        Subject:        Re: [Biojava-l] Genbank  parser error [biojavax]
>
>
>Hello Mark,
>
>My file is indeed too large to be posted.
>So I have exported a smaller sequence from Ensembl that I tested with 
>the parser. The behavior is the same.
>You will find this "Genbank"-formatted file below.
>
>Thanks for your help,
>
>Morgane.
>
>LOCUS       6 3498 bp DNA HTG 14-FEB-2006
>DEFINITION  Mus musculus chromosome 6 NCBIM34 partial sequence
>            52305503..52309000 reannotated via EnsEMBL
>ACCESSION   chromosome:NCBIM34:6:52305503:52309000:1
>VERSION     chromosome:NCBIM34:6:52305503:52309000:1
>KEYWORDS    .
>SOURCE      House mouse
>  ORGANISM  Mus musculus
>            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
>            Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
>            Sciurognathi; Muridae; Murinae; Mus.
>COMMENT     This sequence was annotated by the Ensembl system. Please visit the
>            Ensembl web site, http://www.ensembl.org/ for more information.
>COMMENT     All feature locations are relative to the first (5') base of the
>            sequence in this file.  The sequence presented is always the
>            forward strand of the assembly. Features that lie outside of the
>            sequence contained in this file have clonal location coordinates in
>            the format: .:..
>COMMENT     The /gene indicates a unique id for a gene,
>            /note="transcript_id=..." a unique id for a transcript, /protein_id
>            a unique id for a peptide and note="exon_id=..." a unique id for an
>            exon. These ids are maintained wherever possible between versions.
>COMMENT     All the exons and transcripts in Ensembl are confirmed by
>            similarity to either protein or cDNA sequences.
>FEATURES             Location/Qualifiers
>     source          1..3498
>                     /organism="Mus musculus"
>                     /db_xref="taxon:10090"
>     gene            complement(506..2826)
>                     /gene=ENSMUSG00000014704
>     mRNA            join(complement(2261..2826),complement(506..1620))
>                     /gene="ENSMUSG00000014704"
>                     /note="transcript_id=ENSMUST00000014848"
>     CDS             join(complement(2261..2639),complement(881..1620))
>                     /gene="ENSMUSG00000014704"
>                     /protein_id="ENSMUSP00000014848"
>                     /note="transcript_id=ENSMUST00000014848"
>                     /db_xref="MarkerSymbol:Hoxa2"
>                     /db_xref="Uniprot/SWISSPROT:HXA2_MOUSE"
>                     /db_xref="RefSeq_peptide:NP_034581.1"
>                     /db_xref="RefSeq_dna:NM_010451.1"
>                     /db_xref="Uniprot/SPTREMBL:Q3UYP9_MOUSE"
>                     /db_xref="Uniprot/SPTREMBL:Q920T7_MOUSE"
>                     /db_xref="Uniprot/SPTREMBL:Q920T9_MOUSE"
>                     /db_xref="Uniprot/SPTREMBL:Q920U0_MOUSE"
>                     /db_xref="Uniprot/SPTREMBL:Q920U1_MOUSE"
>                     /db_xref="Uniprot/SPTREMBL:Q920U2_MOUSE"
>                     /db_xref="Uniprot/SPTREMBL:Q920U3_MOUSE"
>                     /db_xref="Uniprot/SPTREMBL:Q920U4_MOUSE"
>                     /db_xref="Uniprot/SPTREMBL:Q920U5_MOUSE"
>                     /db_xref="EntrezGene:15399"
>                     /db_xref="AgilentProbe:A_51_P501803"
>                     /db_xref="EMBL:AB039184"
>                     /db_xref="EMBL:AB039185"
>                     /db_xref="EMBL:AB039186"
>                     /db_xref="EMBL:AB039187"
>                     /db_xref="EMBL:AB039188"
>                     /db_xref="EMBL:AB039189"
>                     /db_xref="EMBL:AB039190"
>                     /db_xref="EMBL:AB039191"
>                     /db_xref="EMBL:AB039192"
>                     /db_xref="EMBL:AK134501"
>                     /db_xref="EMBL:M87801"
>                     /db_xref="EMBL:M93148"
>                     /db_xref="EMBL:M93292"
>                     /db_xref="EMBL:M95599"
>                     /db_xref="GO:GO:0003700"
>                     /db_xref="GO:GO:0005634"
>                     /db_xref="GO:GO:0006355"
>                     /db_xref="GO:GO:0007275"
>                     /db_xref="IPI:IPI00132242.1"
>                     /db_xref="UniGene:Mm.131"
>                     /db_xref="protein_id:AAA37827.1"
>                     /db_xref="protein_id:AAA37834.1"
>                     /db_xref="protein_id:AAA37835.1"
>                     /db_xref="protein_id:AAA37836.1"
>                     /db_xref="protein_id:BAB68708.1"
>                     /db_xref="protein_id:BAB68709.1"
>                     /db_xref="protein_id:BAB68710.1"
>                     /db_xref="protein_id:BAB68711.1"
>                     /db_xref="protein_id:BAB68712.1"
>                     /db_xref="protein_id:BAB68713.1"
>                     /db_xref="protein_id:BAB68714.1"
>                     /db_xref="protein_id:BAB68715.1"
>                     /db_xref="protein_id:BAB68716.1"
>                     /db_xref="protein_id:BAE22163.1"
>                     /db_xref="AFFY_MG_U74Av2:102643_at"
>                     /db_xref="AFFY_MG_U74Cv2:171063_at"
>                     /db_xref="AFFY_Mouse430A_2:1419602_at"
>                     /db_xref="AFFY_Mouse430_2:1419602_at"
>                     /translation="MNYEFEREIGFINSQPSLAECLTSFPPVADTFQSSSIKTSTLSH
>                     STLIPPPFEQTIPSLNPGSHPRHGAGVGGRPKSSPAGSRGSPVPAGALQPPEYPWMKE
>                     KKAAKKTALPPAAASTGPACLGHKESLEIADGSGGGSRRLRTAYTNTQLLELEKEFHF
>                     NKYLCRPRRVEIAALLDLTERQVKVWFQNRRMKHKRQTQCKENQNSEGKFKNLEDSDK
>                     VEEDEEEKSLFEQALSVSGALLEREGYTFQQNALSQQQAPNGHNGDSQTFPVSPLTSN
>                     EKNLKHFQHQSPTVPNCLSTMGQNCGAGLNNDSPEAIEVPSLQDFNVFSTDSCLQLSD
>                     ALSPSLPGSLDSPVDISADSFDFFTDTLTTIDLQHLNY"
>     exon            complement(506..1620)
>                     /note="exon_id=ENSMUSE00000387033"
>     exon            complement(2261..2826)
>                     /note="exon_id=ENSMUSE00000193269"
>BASE COUNT  938 a 815 c 882 g 863 t
>ORIGIN
>        1 AGGAAGAGTT GGAACGTAGA TGTTTGAAAC AAATGTGTAT AAATAAATGA 
>ATTTTTGATA
>       61 ACTCCGTTAT TGACCTAGAA ACTAGCAGCT TGGTAAGGGA ACTCCATTCC 
>ACTCCACTCG
>      121 TCCTAGAACT GGAAGTTTTT GTAGGCACTT TTCCTCTCCA CACTCAAAAG 
>CTTGGGCTAG
>      181 GGCCAACTCA GGCTGCCCAA GCCCATTTCT ATTACTAATG TAACTCTATG 
>GCCTGAGTCT
>      241 CAACACTGAA AACCAAATTC ATTCCCTTAG GGGGGAAAAA TCCAAAAAAA 
>AAAAAAAAAA
>      301 AAGTCTTGCC AGAAGCCCTA GCACTTTCTG GTTTTCTTCT TTGTTGCTGT 
>TTGTTGCAGG
>      361 CTTTGAACAT GCCACCCTAA TAAAATATAT TAAGATTGAA AAGTAAATTG 
>TGACCAGACT
>      421 TTTATTTACC ATGTTAGACT AAAAGAAGTA TAAGAAATCA GTATGAGTCT 
>TGAGAAAGAG
>      481 GGGAAGAAAA AAATAAGAAA GCTACTTATA GCAAAGGAGA ATTTATTCTA 
>CCAAAAATAC
>      541 GCATGACAAT GCATTCTAAT GTGGTACAAA AATAAACAGA AAGTGACAAG 
>ACAATTTATG
>      601 GTCACTTTCT TGCAGGCCTC CTGTTTTGTT TTTCAGGAAA ATCACATAGA 
>AGCTTGTTGG
>      661 GTTCTGTGTA AAAACCACTT AGAACGCCAA CATAATTTGC AAGAGATGGC 
>TTTAAAACTG
>      721 TGTCAGGGGA GAACATTAAA CGGAAAGTCC TCAACATTTG AGAGAGTAGG 
>GGTAGATCAA
>      781 GAAGAAACTA AAACGAAAAT CAACTCCCAG AATAAAAGAA GGCAAAGCCA 
>CCTGGTCAAA
>      841 GGCGTTTTGT TTTGTGAAGC TTTGTTTTGC TTTAATGTTC TTAGTAATTC 
>AGATGCTGTA
>      901 GGTCGATTGT GGTGAGTGTG TCTGTAAAAA AGTCAAAGCT GTCAGCTGAG 
>ATATCTACAG
>      961 GACTGTCCAG GGAGCCAGGC AAGCTGGGCG ACAGTGCATC TGAAAGCTGC 
>AGGCAGGAAT
>     1021 CTGTGGAGAA AACATTGAAG TCCTGCAAAG AGGGGACCTC GATGGCCTCG 
>GGACTGTCAT
>     1081 TGTTTAGGCC AGCTCCACAG TTCTGGCCCA TTGTTGACAA GCAGTTAGGA 
>ACAGTGGGTG
>     1141 ACTGGTGCTG AAAATGTTTC AAATTTTTCT CATTGCTGGT TAAAGGCGAA 
>ACTGGGAAAG
>     1201 TTTGGGAGTC GCCATTGTGT CCATTGGGAG CCTGCTGTTG AGAGAGCGCA 
>TTTTGCTGAA
>     1261 AAGTGTACCC TTCCCTCTCC AGAAGGGCCC CGGAGACACT GAGGGCTTGC 
>TCAAAGAGTG
>     1321 ACTTCTCTTC CTCGTCTTCC TCCACTTTGT CCGAGTCCTC CAGGTTTTTA 
>AATTTCCCTT
>     1381 CGCTGTTTTG GTTCTCCTTG CACTGGGTTT GCCTCTTATG CTTCATTCTC 
>CGGTTCTGAA
>     1441 ACCACACTTT CACTTGTCTC TCGGTCAAAT CCAGCAGCGC GGCGATTTCC 
>ACCCTGCGGG
>     1501 GTCTGCAAAG GTACTTGTTG AAATGAAATT CCTTTTCCAG CTCCAAAAGC 
>TGAGTGTTGG
>     1561 TGTACGCGGT TCTCAGACGC CTGGATCCCC CGCCGCTGCC ATCAGCTATT 
>TCCAGGGATT
>     1621 CTGCAGAAAG GGAAACCAAC AAGAGACACA CATACAGTTG AAGGTGGAAG 
>GGTCCGAGCA
>     1681 GGGTTATTCC ATTGGAGCAT AAATACAGCA GAAAAGATCA ACTGCAACAA 
>AATGGCCGCC
>     1741 CCTGGATGCA GTGCAGCTAT TGTGCTGCCC TTCCTGGGAG CCCAGCCCGG 
>GGAAGCCCAG
>     1801 TCTCTTCCAC CTCCATCAAA TTCCTGCCTG TGGCTTCCCC CAACCTCTTC 
>ATCCGGGAGC
>     1861 AAACTTTATA TTAGCTACAA CACAATTTAT AATTAATGCA TCAGCTGCTT 
>AGCTGAGCAA
>     1921 GAGCGGTCTA TCACTCTTCA TTACTGTCAA AAAGCCAAAC TCTAGGACAA 
>CTAGACAAGA
>     1981 GGAGGTCAGT TCCAACTCAA ATAAATCATC CTACATTACA CAAGTTAGGG 
>AAAGTGCCCC
>     2041 CCCTCCTCAA AATATATATG TCTCATTGTG GGACTCGGGA TCTATTTTCC 
>CCTCCACCAA
>     2101 ACCCACTCCT GAGACCACAG GGGCATGAGA CCCGCCACCA GGCATCTCTC 
>TCTCTCCCCC
>     2161 TTCCCTCGAA GCTCATGGTC CCCTCCCCCA CAACCGCTCC TAGGGAAGCC 
>CGGAGGGGGA
>     2221 CAAGGGTCCC CGAGACCTGG GGCCAAGTCT CCGGACTGAC CTTTGTGGCC 
>GAGGCAGGCA
>     2281 GGGCCCGTGG AGGCGGCGGC GGGCGGCAGC GCGGTTTTCT TGGCCGCCTT 
>CTTCTCCTTC
>     2341 ATCCAGGGAT ACTCAGGCGG CTGCAGGGCG CCGGCAGGCA CCGGGCTGCC 
>GCGACTGCCC
>     2401 GCGGGGCTCG ACTTGGGGCG GCCGCCAACG CCAGCGCCGT GGCGAGGGTG 
>ACTGCCCGGG
>     2461 TTCAGGCTGG GAATGGTCTG CTCAAAAGGA GGAGGAATCA GTGTCGAGTG 
>TGAAAGCGTC
>     2521 GAGGTCTTGA TTGATGAACT TTGAAATGTA TCAGCGACAG GGGGAAAAGA 
>TGTCAGGCAC
>     2581 TCAGCGAGCG ACGGCTGGCT ATTGATAAAA CCAATCTCTC GCTCAAATTC 
>GTAATTCATG
>     2641 GCCTTCTCCT TGGAGCCCCC TCGGAGGAAA AGTTCCCTCT TTTGGAGGGG 
>CTTTGGGGGG
>     2701 GCAAGGCCCA GGAAAAAGGC GAGCGCGAAG GAAAAAAAAA TCTATCATAG 
>AAGATCGCTG
>     2761 CTGGGGTGTT TTTTTTCTAA TTCACTGATT ACAGCCGTAT GGGGACCGCG 
>CTACTATTAA
>     2821 ACTATTGAAT TCATGGAGAC AAGGTTGAAA TTGGACCGAA TTGGCTGTCA 
>CATGATTGCT
>     2881 TCTGCCCAAT GACAATTTGG GCTTTAATCA AAAGAAGCCA CTGTCTGTTT 
>GATTGATCCA
>     2941 AAAAAGTCAG AAAGGAACGC CTCATTGGGG GCCATCGAGG CTTTATTTAC 
>ACTTTTTTTC
>     3001 AGGGCAAAAA TACATATATG TGGGTGTGGA TGGCAATGCC CCGGGAGTGC 
>GTGGGGGGCG
>     3061 AGAGTGCCTG TTTGCCTCCT GATCTGCAAG GATCTAGTGT GCTCCCTGGA 
>GTGTGTGTGT
>     3121 GAGTGTGTGC GTGTGAGCCC TGCTGCCGTC CCGCCAGTGG CTGCCCTCTG 
>CCTCCCCCGC
>     3181 ACACTCCGCG CATTGTTTGG GACTGTCGGG AAGACGCCTC GCACCTCACA 
>AATCATTTAA
>     3241 GCACCTCAGC CTGACGCCTG CAGTCATTAA CAAAGTAATC CATTAATCTT 
>CAAAGTTTTG
>     3301 ACACCCCAGG GCCCTGCATC TCAGCCACAT AAGTTCTGCT AAGGCAAGAG 
>AAAGGAGCAG
>     3361 AGTGGGAGAG AGAGAGGAGA GAGGGAGAGA GGGAGAGAGG GAGAGAGAGA 
>GAGAGAGAGA
>     3421 GAGAGAGAGA GAGAGAGAGA GAGAGAATGA ATATTGGGGT TCACCTTTCC 
>TCTTCCTCCT
>     3481 CTTTTTCCAA AATCAGTT
>//
>

-- 
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student (mthomasc@vub.ac.be)

Vrije Universiteit Brussels (VUB)    
Laboratory of Cell Genetics          
Pleinlaan 2                          
1050 Brussels                        
Belgium                              

Tel : +32 2 629 15 22                		     
**********************************************************
Stop Using Internet Explorer, choose FIREFOX !
http://emmanuel.clement.free.fr/navigateurs/comparatif.htm

From mark.schreiber at novartis.com  Wed Feb 15 07:20:13 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed Feb 15 07:16:10 2006
Subject: [Biojava-l] Genbank  parser error [biojavax]
Message-ID: <OFB42E1AA9.098C0A0B-ON48257116.0043AAC6-48257116.0043C53F@EU.novartis.net>

I think these properties should be going to the (Rich)Annotation bundle.

- Mark


Morgane THOMAS-CHOLLIER <mthomasc@vub.ac.be>
Sent by: biojava-l-bounces@portal.open-bio.org
02/15/2006 04:56 PM

 
        To:     biojava-l@biojava.org
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        Re: [Biojava-l] Genbank  parser error [biojavax]


Hello again,

I have continued using the Genbank parser, but this time with Genbank 
files coming from NCBI :)

I really appreciate the example from the documentation that converts a 
Genbank file into an EMBL file. I have to say, it is really easy to use.

I nevertheless have a question concerning the Organism and Source tags. 
The documentation makes clear that the parser ignores them, 
but I do not really understand why.
When I parsed the Genbank files for accession numbers AC147788 and 
DQ158013, I was unable to get the common name of the organism or use 
getNameHierarchy(), but I could get the taxon ID for both.

Is there a way to get the common name of the organism without making a 
remote call to NCBI with the taxon ID?

Thanks for your help,

Morgane.
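
As a stopgap that does not rely on the parser at all, the SOURCE and ORGANISM
header lines can be read directly from the file text. The sketch below is
plain Java, not biojavax API; the class GenbankSourceSketch and its fields are
hypothetical names, and it assumes the layout seen in the file quoted below
(a SOURCE line, an indented ORGANISM line, then indented lineage lines
separated by semicolons and ending with a period).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of biojavax.
final class GenbankSourceSketch {

    /** Values read from the SOURCE / ORGANISM header lines. */
    static final class Organism {
        String source = "";          // text after SOURCE, e.g. a common name
        String scientificName = "";  // text after ORGANISM
        final List<String> lineage = new ArrayList<>();
    }

    static Organism parse(String header) {
        Organism org = new Organism();
        boolean inLineage = false;
        for (String line : header.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (line.startsWith("SOURCE")) {
                org.source = line.substring("SOURCE".length()).trim();
                inLineage = false;
            } else if (trimmed.startsWith("ORGANISM")) {
                org.scientificName = trimmed.substring("ORGANISM".length()).trim();
                inLineage = true;
            } else if (inLineage && line.startsWith(" ")) {
                // Indented continuation lines carry the lineage.
                for (String taxon : trimmed.split(";")) {
                    String name = taxon.trim();
                    if (name.endsWith(".")) {
                        name = name.substring(0, name.length() - 1);
                    }
                    if (!name.isEmpty()) {
                        org.lineage.add(name);
                    }
                }
            } else {
                // Any other keyword line ends the lineage block.
                inLineage = false;
            }
        }
        return org;
    }
}
```

Note that the content of the SOURCE line varies between producers, so treating
it as the common name is an assumption that holds for the Ensembl export shown
here and may not hold elsewhere.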

Morgane THOMAS-CHOLLIER wrote:

> Hello Mark,
>
> My file is indeed too large to be posted.
> So I have exported a smaller sequence from Ensembl that I tested with 
> the parser. The behavior is the same.
> You will find below this "Genbank" formatted file enclosed.
>
> Thanks for your help,
>
> Morgane.
>
> LOCUS       6 3498 bp DNA HTG 14-FEB-2006
> DEFINITION  Mus musculus chromosome 6 NCBIM34 partial sequence
>            52305503..52309000 reannotated via EnsEMBL
> ACCESSION   chromosome:NCBIM34:6:52305503:52309000:1
> VERSION     chromosome:NCBIM34:6:52305503:52309000:1
> KEYWORDS    .
> SOURCE      House mouse
>  ORGANISM  Mus musculus
>            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
>            Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
>            Sciurognathi; Muridae; Murinae; Mus.
> COMMENT     This sequence was annotated by the Ensembl system. Please visit the
>            Ensembl web site, http://www.ensembl.org/ for more information.
> COMMENT     All feature locations are relative to the first (5') base of the
>            sequence in this file.  The sequence presented is always the
>            forward strand of the assembly. Features that lie outside of the
>            sequence contained in this file have clonal location coordinates in
>            the format: .:..
> COMMENT     The /gene indicates a unique id for a gene,
>            /note="transcript_id=..." a unique id for a transcript, /protein_id
>            a unique id for a peptide and note="exon_id=..." a unique id for an
>            exon. These ids are maintained wherever possible between versions.
> COMMENT     All the exons and transcripts in Ensembl are confirmed by
>            similarity to either protein or cDNA sequences.
> FEATURES             Location/Qualifiers
>     source          1..3498
>                     /organism="Mus musculus"
>                     /db_xref="taxon:10090"
>     gene            complement(506..2826)
>                     /gene=ENSMUSG00000014704
>     mRNA            join(complement(2261..2826),complement(506..1620))
>                     /gene="ENSMUSG00000014704"
>                     /note="transcript_id=ENSMUST00000014848"
>     CDS             join(complement(2261..2639),complement(881..1620))
>                     /gene="ENSMUSG00000014704"
>                     /protein_id="ENSMUSP00000014848"
>                     /note="transcript_id=ENSMUST00000014848"
>                     /db_xref="MarkerSymbol:Hoxa2"
>                     /db_xref="Uniprot/SWISSPROT:HXA2_MOUSE"
>                     /db_xref="RefSeq_peptide:NP_034581.1"
>                     /db_xref="RefSeq_dna:NM_010451.1"
>                     /db_xref="Uniprot/SPTREMBL:Q3UYP9_MOUSE"
>                     /db_xref="Uniprot/SPTREMBL:Q920T7_MOUSE"
>                     /db_xref="Uniprot/SPTREMBL:Q920T9_MOUSE"
>                     /db_xref="Uniprot/SPTREMBL:Q920U0_MOUSE"
>                     /db_xref="Uniprot/SPTREMBL:Q920U1_MOUSE"
>                     /db_xref="Uniprot/SPTREMBL:Q920U2_MOUSE"
>                     /db_xref="Uniprot/SPTREMBL:Q920U3_MOUSE"
>                     /db_xref="Uniprot/SPTREMBL:Q920U4_MOUSE"
>                     /db_xref="Uniprot/SPTREMBL:Q920U5_MOUSE"
>                     /db_xref="EntrezGene:15399"
>                     /db_xref="AgilentProbe:A_51_P501803"
>                     /db_xref="EMBL:AB039184"
>                     /db_xref="EMBL:AB039185"
>                     /db_xref="EMBL:AB039186"
>                     /db_xref="EMBL:AB039187"
>                     /db_xref="EMBL:AB039188"
>                     /db_xref="EMBL:AB039189"
>                     /db_xref="EMBL:AB039190"
>                     /db_xref="EMBL:AB039191"
>                     /db_xref="EMBL:AB039192"
>                     /db_xref="EMBL:AK134501"
>                     /db_xref="EMBL:M87801"
>                     /db_xref="EMBL:M93148"
>                     /db_xref="EMBL:M93292"
>                     /db_xref="EMBL:M95599"
>                     /db_xref="GO:GO:0003700"
>                     /db_xref="GO:GO:0005634"
>                     /db_xref="GO:GO:0006355"
>                     /db_xref="GO:GO:0007275"
>                     /db_xref="IPI:IPI00132242.1"
>                     /db_xref="UniGene:Mm.131"
>                     /db_xref="protein_id:AAA37827.1"
>                     /db_xref="protein_id:AAA37834.1"
>                     /db_xref="protein_id:AAA37835.1"
>                     /db_xref="protein_id:AAA37836.1"
>                     /db_xref="protein_id:BAB68708.1"
>                     /db_xref="protein_id:BAB68709.1"
>                     /db_xref="protein_id:BAB68710.1"
>                     /db_xref="protein_id:BAB68711.1"
>                     /db_xref="protein_id:BAB68712.1"
>                     /db_xref="protein_id:BAB68713.1"
>                     /db_xref="protein_id:BAB68714.1"
>                     /db_xref="protein_id:BAB68715.1"
>                     /db_xref="protein_id:BAB68716.1"
>                     /db_xref="protein_id:BAE22163.1"
>                     /db_xref="AFFY_MG_U74Av2:102643_at"
>                     /db_xref="AFFY_MG_U74Cv2:171063_at"
>                     /db_xref="AFFY_Mouse430A_2:1419602_at"
>                     /db_xref="AFFY_Mouse430_2:1419602_at"
>                     /translation="MNYEFEREIGFINSQPSLAECLTSFPPVADTFQSSSIKTSTLSH
>                     STLIPPPFEQTIPSLNPGSHPRHGAGVGGRPKSSPAGSRGSPVPAGALQPPEYPWMKE
>                     KKAAKKTALPPAAASTGPACLGHKESLEIADGSGGGSRRLRTAYTNTQLLELEKEFHF
>                     NKYLCRPRRVEIAALLDLTERQVKVWFQNRRMKHKRQTQCKENQNSEGKFKNLEDSDK
>                     VEEDEEEKSLFEQALSVSGALLEREGYTFQQNALSQQQAPNGHNGDSQTFPVSPLTSN
>                     EKNLKHFQHQSPTVPNCLSTMGQNCGAGLNNDSPEAIEVPSLQDFNVFSTDSCLQLSD
>                     ALSPSLPGSLDSPVDISADSFDFFTDTLTTIDLQHLNY"
>     exon            complement(506..1620)
>                     /note="exon_id=ENSMUSE00000387033"
>     exon            complement(2261..2826)
>                     /note="exon_id=ENSMUSE00000193269"
> BASE COUNT  938 a 815 c 882 g 863 t
> ORIGIN
>        1 AGGAAGAGTT GGAACGTAGA TGTTTGAAAC AAATGTGTAT AAATAAATGA 
> ATTTTTGATA
>       61 ACTCCGTTAT TGACCTAGAA ACTAGCAGCT TGGTAAGGGA ACTCCATTCC 
> ACTCCACTCG
>      121 TCCTAGAACT GGAAGTTTTT GTAGGCACTT TTCCTCTCCA CACTCAAAAG 
> CTTGGGCTAG
>      181 GGCCAACTCA GGCTGCCCAA GCCCATTTCT ATTACTAATG TAACTCTATG 
> GCCTGAGTCT
>      241 CAACACTGAA AACCAAATTC ATTCCCTTAG GGGGGAAAAA TCCAAAAAAA 
> AAAAAAAAAA
>      301 AAGTCTTGCC AGAAGCCCTA GCACTTTCTG GTTTTCTTCT TTGTTGCTGT 
> TTGTTGCAGG
>      361 CTTTGAACAT GCCACCCTAA TAAAATATAT TAAGATTGAA AAGTAAATTG 
> TGACCAGACT
>      421 TTTATTTACC ATGTTAGACT AAAAGAAGTA TAAGAAATCA GTATGAGTCT 
> TGAGAAAGAG
>      481 GGGAAGAAAA AAATAAGAAA GCTACTTATA GCAAAGGAGA ATTTATTCTA 
> CCAAAAATAC
>      541 GCATGACAAT GCATTCTAAT GTGGTACAAA AATAAACAGA AAGTGACAAG 
> ACAATTTATG
>      601 GTCACTTTCT TGCAGGCCTC CTGTTTTGTT TTTCAGGAAA ATCACATAGA 
> AGCTTGTTGG
>      661 GTTCTGTGTA AAAACCACTT AGAACGCCAA CATAATTTGC AAGAGATGGC 
> TTTAAAACTG
>      721 TGTCAGGGGA GAACATTAAA CGGAAAGTCC TCAACATTTG AGAGAGTAGG 
> GGTAGATCAA
>      781 GAAGAAACTA AAACGAAAAT CAACTCCCAG AATAAAAGAA GGCAAAGCCA 
> CCTGGTCAAA
>      841 GGCGTTTTGT TTTGTGAAGC TTTGTTTTGC TTTAATGTTC TTAGTAATTC 
> AGATGCTGTA
>      901 GGTCGATTGT GGTGAGTGTG TCTGTAAAAA AGTCAAAGCT GTCAGCTGAG 
> ATATCTACAG
>      961 GACTGTCCAG GGAGCCAGGC AAGCTGGGCG ACAGTGCATC TGAAAGCTGC 
> AGGCAGGAAT
>     1021 CTGTGGAGAA AACATTGAAG TCCTGCAAAG AGGGGACCTC GATGGCCTCG 
> GGACTGTCAT
>     1081 TGTTTAGGCC AGCTCCACAG TTCTGGCCCA TTGTTGACAA GCAGTTAGGA 
> ACAGTGGGTG
>     1141 ACTGGTGCTG AAAATGTTTC AAATTTTTCT CATTGCTGGT TAAAGGCGAA 
> ACTGGGAAAG
>     1201 TTTGGGAGTC GCCATTGTGT CCATTGGGAG CCTGCTGTTG AGAGAGCGCA 
> TTTTGCTGAA
>     1261 AAGTGTACCC TTCCCTCTCC AGAAGGGCCC CGGAGACACT GAGGGCTTGC 
> TCAAAGAGTG
>     1321 ACTTCTCTTC CTCGTCTTCC TCCACTTTGT CCGAGTCCTC CAGGTTTTTA 
> AATTTCCCTT
>     1381 CGCTGTTTTG GTTCTCCTTG CACTGGGTTT GCCTCTTATG CTTCATTCTC 
> CGGTTCTGAA
>     1441 ACCACACTTT CACTTGTCTC TCGGTCAAAT CCAGCAGCGC GGCGATTTCC 
> ACCCTGCGGG
>     1501 GTCTGCAAAG GTACTTGTTG AAATGAAATT CCTTTTCCAG CTCCAAAAGC 
> TGAGTGTTGG
>     1561 TGTACGCGGT TCTCAGACGC CTGGATCCCC CGCCGCTGCC ATCAGCTATT 
> TCCAGGGATT
>     1621 CTGCAGAAAG GGAAACCAAC AAGAGACACA CATACAGTTG AAGGTGGAAG 
> GGTCCGAGCA
>     1681 GGGTTATTCC ATTGGAGCAT AAATACAGCA GAAAAGATCA ACTGCAACAA 
> AATGGCCGCC
>     1741 CCTGGATGCA GTGCAGCTAT TGTGCTGCCC TTCCTGGGAG CCCAGCCCGG 
> GGAAGCCCAG
>     1801 TCTCTTCCAC CTCCATCAAA TTCCTGCCTG TGGCTTCCCC CAACCTCTTC 
> ATCCGGGAGC
>     1861 AAACTTTATA TTAGCTACAA CACAATTTAT AATTAATGCA TCAGCTGCTT 
> AGCTGAGCAA
>     1921 GAGCGGTCTA TCACTCTTCA TTACTGTCAA AAAGCCAAAC TCTAGGACAA 
> CTAGACAAGA
>     1981 GGAGGTCAGT TCCAACTCAA ATAAATCATC CTACATTACA CAAGTTAGGG 
> AAAGTGCCCC
>     2041 CCCTCCTCAA AATATATATG TCTCATTGTG GGACTCGGGA TCTATTTTCC 
> CCTCCACCAA
>     2101 ACCCACTCCT GAGACCACAG GGGCATGAGA CCCGCCACCA GGCATCTCTC 
> TCTCTCCCCC
>     2161 TTCCCTCGAA GCTCATGGTC CCCTCCCCCA CAACCGCTCC TAGGGAAGCC 
> CGGAGGGGGA
>     2221 CAAGGGTCCC CGAGACCTGG GGCCAAGTCT CCGGACTGAC CTTTGTGGCC 
> GAGGCAGGCA
>     2281 GGGCCCGTGG AGGCGGCGGC GGGCGGCAGC GCGGTTTTCT TGGCCGCCTT 
> CTTCTCCTTC
>     2341 ATCCAGGGAT ACTCAGGCGG CTGCAGGGCG CCGGCAGGCA CCGGGCTGCC 
> GCGACTGCCC
>     2401 GCGGGGCTCG ACTTGGGGCG GCCGCCAACG CCAGCGCCGT GGCGAGGGTG 
> ACTGCCCGGG
>     2461 TTCAGGCTGG GAATGGTCTG CTCAAAAGGA GGAGGAATCA GTGTCGAGTG 
> TGAAAGCGTC
>     2521 GAGGTCTTGA TTGATGAACT TTGAAATGTA TCAGCGACAG GGGGAAAAGA 
> TGTCAGGCAC
>     2581 TCAGCGAGCG ACGGCTGGCT ATTGATAAAA CCAATCTCTC GCTCAAATTC 
> GTAATTCATG
>     2641 GCCTTCTCCT TGGAGCCCCC TCGGAGGAAA AGTTCCCTCT TTTGGAGGGG 
> CTTTGGGGGG
>     2701 GCAAGGCCCA GGAAAAAGGC GAGCGCGAAG GAAAAAAAAA TCTATCATAG 
> AAGATCGCTG
>     2761 CTGGGGTGTT TTTTTTCTAA TTCACTGATT ACAGCCGTAT GGGGACCGCG 
> CTACTATTAA
>     2821 ACTATTGAAT TCATGGAGAC AAGGTTGAAA TTGGACCGAA TTGGCTGTCA 
> CATGATTGCT
>     2881 TCTGCCCAAT GACAATTTGG GCTTTAATCA AAAGAAGCCA CTGTCTGTTT 
> GATTGATCCA
>     2941 AAAAAGTCAG AAAGGAACGC CTCATTGGGG GCCATCGAGG CTTTATTTAC 
> ACTTTTTTTC
>     3001 AGGGCAAAAA TACATATATG TGGGTGTGGA TGGCAATGCC CCGGGAGTGC 
> GTGGGGGGCG
>     3061 AGAGTGCCTG TTTGCCTCCT GATCTGCAAG GATCTAGTGT GCTCCCTGGA 
> GTGTGTGTGT
>     3121 GAGTGTGTGC GTGTGAGCCC TGCTGCCGTC CCGCCAGTGG CTGCCCTCTG 
> CCTCCCCCGC
>     3181 ACACTCCGCG CATTGTTTGG GACTGTCGGG AAGACGCCTC GCACCTCACA 
> AATCATTTAA
>     3241 GCACCTCAGC CTGACGCCTG CAGTCATTAA CAAAGTAATC CATTAATCTT 
> CAAAGTTTTG
>     3301 ACACCCCAGG GCCCTGCATC TCAGCCACAT AAGTTCTGCT AAGGCAAGAG 
> AAAGGAGCAG
>     3361 AGTGGGAGAG AGAGAGGAGA GAGGGAGAGA GGGAGAGAGG GAGAGAGAGA 
> GAGAGAGAGA
>     3421 GAGAGAGAGA GAGAGAGAGA GAGAGAATGA ATATTGGGGT TCACCTTTCC 
> TCTTCCTCCT
>     3481 CTTTTTCCAA AATCAGTT
> //
>
>
>
>
> mark.schreiber@novartis.com wrote:
>
>> Hi Morgane -
>>
>> I have to say that doesn't look much like Genbank : )
>>
>> The biojavax parsers are possibly a bit brittle due to their use of 
>> regexps to recognize key elements. It should be fixable; I think the 
>> problem is that the parser expects a word after LOCUS, not a number. 
>> This may not be the only problem though. Could you post the entire 
>> file? Or, if it is large, a representative file of smaller size.
>>
>> - Mark
>>
>>
>>
>>
>>
>> Morgane THOMAS-CHOLLIER <mthomasc@vub.ac.be>
>> Sent by: biojava-l-bounces@portal.open-bio.org
>> 02/14/2006 04:36 AM
>>
>>
>>        To:     biojava-l@biojava.org
>>        cc:     (bcc: Mark Schreiber/GP/Novartis)
>>        Subject:        [Biojava-l] Genbank  parser error [biojavax]
>>
>>
>> Hello,
>>
>> I have tried biojavax today with a view to using the Genbank file parser.
>>
>> My test file is a Genbank formatted file which has been produced by 
>> the Ensembl export system.
>>
>> The head of the file is as follows:
>>
>> LOCUS       6 489671 bp DNA HTG 13-FEB-2006
>> DEFINITION  Mus musculus chromosome 6 NCBIM34 partial sequence
>>            52296503..52786173 reannotated via EnsEMBL
>> ACCESSION   chromosome:NCBIM34:6:52296503:52786173:1
>> VERSION     chromosome:NCBIM34:6:52296503:52786173:1
>>
>> I used the code provided in biojavax docbook to parse this file.
>> I get the following error :
>>
>> Exception in thread "main" org.biojava.bio.BioException: Could not read sequence
>>    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
>>    at org.embnet.be.biojavax.tryout.GenbankParseTest.main(GenbankParseTest.java:31)
>> Caused by: org.biojava.bio.seq.io.ParseException: Bad locus line found: 6 489671 bp DNA HTG 13-FEB-2006
>>    at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:229)
>>    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
>>    ... 1 more
>>
>> I had a look at GenbankFormat.java, and I guess the problem comes 
>> from the regular expression, which does not recognize this LOCUS line 
>> as a standard Genbank LOCUS line.
>>
>> Am I wrong? Has the biojavax Genbank parser been tested on Ensembl 
>> exported files?
>>
>> Morgane.
>>
>> 
>>
>
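
For what it is worth, the failure mode discussed in the quoted messages 
(a regexp that expects a word after LOCUS, not a number) can be sketched 
with plain java.util.regex. The two patterns and the class name below are 
hypothetical illustrations only; they are not the expression actually used 
in GenbankFormat.java, which may fail for other reasons as well.

```java
import java.util.regex.Pattern;

class LocusLineSketch {
    // Hypothetical strict pattern: the locus name must start with a letter.
    static final Pattern STRICT = Pattern.compile(
        "^([A-Za-z]\\S*)\\s+(\\d+)\\s+bp\\s+(\\S+)\\s+(\\S+)\\s+(\\d{2}-[A-Z]{3}-\\d{4})$");

    // Hypothetical lenient pattern: any run of non-whitespace is a valid name.
    static final Pattern LENIENT = Pattern.compile(
        "^(\\S+)\\s+(\\d+)\\s+bp\\s+(\\S+)\\s+(\\S+)\\s+(\\d{2}-[A-Z]{3}-\\d{4})$");

    // True if the text following the LOCUS tag matches the given pattern.
    static boolean accepts(Pattern pattern, String locusValue) {
        return pattern.matcher(locusValue.trim()).matches();
    }

    public static void main(String[] args) {
        // The LOCUS value from the Ensembl export, as echoed in the ParseException.
        String ensembl = "6 489671 bp DNA HTG 13-FEB-2006";
        System.out.println("strict:  " + accepts(STRICT, ensembl));
        System.out.println("lenient: " + accepts(LENIENT, ensembl));
    }
}
```

With the Ensembl line the strict pattern does not match because the name 
"6" starts with a digit, while the lenient one does.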

-- 
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student (mthomasc@vub.ac.be)
Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium

_______________________________________________
Biojava-l mailing list  -  Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l


From dreher at mpiib-berlin.mpg.de  Wed Feb 15 09:49:32 2006
From: dreher at mpiib-berlin.mpg.de (Felix Dreher)
Date: Wed Feb 15 09:48:49 2006
Subject: [Biojava-l] Problem: BioSQL-cvs and/or RichSequence-Annotation
Message-ID: <43F33F7C.2070407@mpiib-berlin.mpg.de>

Hello,

I have a question regarding the BioSQL-schema-scripts.
The tutorial on installing BioSQL 
(http://www.biojava.org/tutorials/biosql.html) says that three scripts 
are required:

biosqldb-pg.sql
biosql-accelerators-pg.sql
biosqldb-assembly-pg.sql

However, the 'assembly'-script can not be found on the CVS-server. 
Instead there is another script called 'biosqldb-views-pg.sql'.
So I would like to know which scripts should be used.


Furthermore, I have a problem with adding an annotation (or a 
feature) to a RichSequence.
It seems to be a problem with Hibernate and/or the BioSQL schemas. I 
use BioJava-live (CVS) from 2 weeks ago and the latest CVS BioSQL scripts.

When I try the following code, the following exceptions are thrown 
(during the execution of line 2).

1        RichSequence seq =  (SimpleRichSequence) RichSequence.Tools.enrich(DNATools.createDNASequence("gattacagattaca","urn:local:seq"));
2        ComparableTerm ct = RichObjectFactory.getDefaultOntology().getOrCreateTerm("projectname");
3        seq.getAnnotation().setProperty(ct, "project_25");


Exception in thread "main" java.lang.RuntimeException: Error while 
trying to call new class 
org.biojavax.ontology.SimpleComparableOntology(class java.lang.String)
        at 
org.biojavax.bio.db.HibernateRichObjectBuilder.buildObject(HibernateRichObjectBuilder.java:154)
        at 
org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java:97)
        at 
org.biojavax.RichObjectFactory.getDefaultOntology(RichObjectFactory.java:178)
        at hibernatetest.Main.main(Main.java:246)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at 
org.biojavax.bio.db.HibernateRichObjectBuilder.buildObject(HibernateRichObjectBuilder.java:138)
        ... 3 more
Caused by: org.hibernate.exception.SQLGrammarException: could not 
insert: [Ontology]
        at 
org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:65)
        at 
org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:43)
        at 
org.hibernate.id.AbstractPostInsertGenerator.getGenerated(AbstractPostInsertGenerator.java:56)
        at 
org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:1994)
        at 
org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2405)
        at 
org.hibernate.action.EntityIdentityInsertAction.execute(EntityIdentityInsertAction.java:37)
        at org.hibernate.engine.ActionQueue.execute(ActionQueue.java:243)
        at 
org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:269)
        at 
org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:167)
        at 
org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:101)
        at 
org.hibernate.event.def.DefaultPersistEventListener.entityIsTransient(DefaultPersistEventListener.java:131)
        at 
org.hibernate.event.def.DefaultPersistEventListener.onPersist(DefaultPersistEventListener.java:87)
        at 
org.hibernate.event.def.DefaultPersistEventListener.onPersist(DefaultPersistEventListener.java:38)
        at org.hibernate.impl.SessionImpl.firePersist(SessionImpl.java:642)
        at org.hibernate.impl.SessionImpl.persist(SessionImpl.java:616)
        ... 8 more
Caused by: org.postgresql.util.PSQLException: ERROR: relation 
"ontology_ontology_id_seq" does not exist
        at 
org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1512)
        at 
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1297)
        at 
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:188)
        at 
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:430)
        at 
org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:346)
        at 
org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:250)
        at 
org.hibernate.id.AbstractPostInsertGenerator.getGenerated(AbstractPostInsertGenerator.java:42)
        ... 20 more


Thanks in advance!

Greetings,
Felix


-- 
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité Mitte
Department of Immunology
Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany
Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426

From mark.schreiber at novartis.com  Wed Feb 15 21:44:31 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed Feb 15 21:40:33 2006
Subject: [Biojava-l] Problem: BioSQL-cvs and/or RichSequence-Annotation
Message-ID: <OF13635A2C.F4C8008F-ON48257117.000EA542-48257117.000F1026@EU.novartis.net>

Wow, that tutorial is out of date!

The assembly sql is not required any longer. It was specifically put in by 
David Huen (I think) to allow him to store assembly data in biosql. Can 
anyone comment on the need for the accelerators?

As for your second point, I would discourage the use of the enrich method 
whenever possible. It does the best it can but cannot work miracles. If 
you get a new download from CVS, RichSequence.Tools has several 
createRichSequence methods to avoid the use of this 'anti-pattern'.

RichSequence seq = 
(SimpleRichSequence)RichSequence.Tools.enrich(DNATools.createDNASequence("gattacagattaca","urn:local:seq"));

As an aside, there is no need to cast the return of enrich if you are 
assigning it to a RichSequence reference.
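
To illustrate the point about the cast without pulling in BioJava, here is 
a minimal sketch using hypothetical stand-in types (Rich, SimpleRich, 
OtherRich and this enrich method are made up for the example and are not 
the BioJava classes). A factory that is declared to return the interface 
can be assigned to an interface variable directly; a downcast only adds a 
way to fail if the factory ever returns a different implementation.

```java
class CastSketch {
    // Hypothetical stand-ins, not the BioJava types.
    interface Rich { String name(); }

    static class SimpleRich implements Rich {
        private final String name;
        SimpleRich(String name) { this.name = name; }
        public String name() { return name; }
    }

    static class OtherRich implements Rich {
        private final String name;
        OtherRich(String name) { this.name = name; }
        public String name() { return name; }
    }

    // Declared to return the interface; the concrete class is an implementation detail.
    static Rich enrich(String name, boolean simple) {
        return simple ? new SimpleRich(name) : new OtherRich(name);
    }

    // Reports whether downcasting to SimpleRich throws for the given object.
    static boolean downcastFails(Rich rich) {
        try {
            SimpleRich ignored = (SimpleRich) rich;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Rich seq = enrich("testseq", false);   // no cast needed
        System.out.println(seq.name());
        System.out.println("downcast fails: " + downcastFails(seq));
    }
}
```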

Hope this helps,

- Mark


Felix Dreher <dreher@mpiib-berlin.mpg.de>
Sent by: biojava-l-bounces@portal.open-bio.org
02/15/2006 10:49 PM

 
        To:     biojava-l@biojava.org
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [Biojava-l] Problem: BioSQL-cvs and/or RichSequence-Annotation


Hello,

I have a question regarding the BioSQL-schema-scripts.
The tutorial on installing BioSQL 
(http://www.biojava.org/tutorials/biosql.html) says that three scripts 
are required:

biosqldb-pg.sql
biosql-accelerators-pg.sql
biosqldb-assembly-pg.sql

However, the 'assembly'-script can not be found on the CVS-server. 
Instead there is another script called 'biosqldb-views-pg.sql'.
So I would like to know which scripts should be used.


Furthermore I have a problem with adding an annotation (or also a 
feature) to a RichSequence.
As it seems to be a problem with Hibernate and/or the BioSQL-schemas: I 
use BioJava-live (CVS) from 2 weeks ago and the latest CVS-BioSQL-scripts.

When I try the following code, the following Exceptions are thrown 
(while the execution of line 2).

1        RichSequence seq =  (SimpleRichSequence) 
RichSequence.Tools.enrich(DNATools.createDNASequence("gattacagattaca","urn:local:seq"));
2        ComparableTerm ct = 
RichObjectFactory.getDefaultOntology().getOrCreateTerm("projectname");
3        seq.getAnnotation().setProperty(ct, "project_25");


Exception in thread "main" java.lang.RuntimeException: Error while 
trying to call new class 
org.biojavax.ontology.SimpleComparableOntology(class java.lang.String)
        at 
org.biojavax.bio.db.HibernateRichObjectBuilder.buildObject(HibernateRichObjectBuilder.java:154)
        at 
org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java:97)
        at 
org.biojavax.RichObjectFactory.getDefaultOntology(RichObjectFactory.java:178)
        at hibernatetest.Main.main(Main.java:246)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at 
org.biojavax.bio.db.HibernateRichObjectBuilder.buildObject(HibernateRichObjectBuilder.java:138)
        ... 3 more
Caused by: org.hibernate.exception.SQLGrammarException: could not 
insert: [Ontology]
        at 
org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:65)
        at 
org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:43)
        at 
org.hibernate.id.AbstractPostInsertGenerator.getGenerated(AbstractPostInsertGenerator.java:56)
        at 
org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:1994)
        at 
org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2405)
        at 
org.hibernate.action.EntityIdentityInsertAction.execute(EntityIdentityInsertAction.java:37)
        at org.hibernate.engine.ActionQueue.execute(ActionQueue.java:243)
        at 
org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:269)
        at 
org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:167)
        at 
org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:101)
        at 
org.hibernate.event.def.DefaultPersistEventListener.entityIsTransient(DefaultPersistEventListener.java:131)
        at 
org.hibernate.event.def.DefaultPersistEventListener.onPersist(DefaultPersistEventListener.java:87)
        at 
org.hibernate.event.def.DefaultPersistEventListener.onPersist(DefaultPersistEventListener.java:38)
        at 
org.hibernate.impl.SessionImpl.firePersist(SessionImpl.java:642)
        at org.hibernate.impl.SessionImpl.persist(SessionImpl.java:616)
        ... 8 more
Caused by: org.postgresql.util.PSQLException: ERROR: relation 
"ontology_ontology_id_seq" does not exist
        at 
org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1512)
        at 
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1297)
        at 
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:188)
        at 
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:430)
        at 
org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:346)
        at 
org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:250)
        at 
org.hibernate.id.AbstractPostInsertGenerator.getGenerated(AbstractPostInsertGenerator.java:42)
        ... 20 more


Thanks in advance!

Greetings,
Felix


-- 
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité Mitte
Department of Immunology
Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany
Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426


From dreher at mpiib-berlin.mpg.de  Thu Feb 16 07:25:53 2006
From: dreher at mpiib-berlin.mpg.de (Felix Dreher)
Date: Thu Feb 16 07:25:23 2006
Subject: [Biojava-l] Problem: BioSQL-cvs and/or RichSequence-Annotation
Message-ID: <43F46F51.5030002@mpiib-berlin.mpg.de>

Hello Mark,

thank you for the information.
If I got it right, when using BioJavaX, the only BioSQL-script that is 
really needed for PostgreSQL is 'biosqldb-pg.sql' (plus possibly 
'biosql-accelerators-pg.sql').
I tried the example again (with and without the accelerator-script) with 
the new CVS-RichSequence.Tools method (see below), but still the same 
exceptions are thrown:

org.postgresql.util.PSQLException: ERROR: relation 
"ontology_ontology_id_seq" does not exist

org.hibernate.exception.SQLGrammarException: could not insert: [Ontology]

Exception in thread "main" java.lang.RuntimeException: Error while 
trying to call new class
   org.biojavax.ontology.SimpleComparableOntology(class java.lang.String)


I'm wondering if something with the Hibernate-Configuration is wrong, 
because in the log-file I found two suspicious entries:


2006-02-16 12:35:12,676  INFO [main]
calling method:
org.hibernate.transaction.TransactionManagerLookupFactory.getTransactionManagerLookup(TransactionManagerLookupFactory.java:33)
No TransactionManagerLookup configured (in JTA environment, use of 
read-write or transactional second-level cache is not recommended)

2006-02-16 12:35:12,754  WARN [main]
calling method:
net.sf.ehcache.config.Configurator.configure(Configurator.java:126)
No configuration found. Configuring ehcache from ehcache-failsafe.xml 
found in the classpath: 
jar:file:/home/dreher/Java/hibernate-3.1/lib/ehcache-1.1.jar!/ehcache-failsafe.xml


Since I ran out of ideas, I hope maybe someone has a hint where I could 
search further.

Thanks in advance,
Felix


P.S.: Here's the code-example:


public class HibernateTest {

    private static final Logger logger =
            PredictionLogger.getLogger(HibernateTest.class);

    public static void main(String[] args) {

        SessionFactory hibernateFactory =
                new Configuration().configure().buildSessionFactory();
        Session session = hibernateFactory.openSession();
        RichObjectFactory.connectToBioSQL(session);
        Transaction tx = session.beginTransaction();

        try {
            // create a RichSequence
            FiniteAlphabet dna =
                    (FiniteAlphabet) AlphabetManager.alphabetForName("DNA");
            RichSequence seq = RichSequence.Tools.createRichSequence(
                    "targets", "testseq", "acgcttcatctgc", dna);

            // add an Annotation to that Sequence
            ComparableTerm ct = RichObjectFactory.getDefaultOntology()
                    .getOrCreateTerm("projectname");
            seq.getAnnotation().setProperty(ct, "bklf25");

            tx.commit();
            System.out.println("Annotation added.");
        }
        catch (Exception ex) {
            tx.rollback();
            System.out.println("Transaction Error.");
            logger.error("Changes rolled back.", ex);
        }
        finally {
            session.close();
        }
    }
}


mark.schreiber@novartis.com wrote:

 >Wow, that tutorial is out of date!
 >
 >The assembly sql is not required any longer. It was specifically put 
in by
 >David Huen (I think) to allow him to store assembly data in biosql. Can
 >anyone comment on the need for the accelerators?
 >
 >As for you second point I would discourage the use of the enrich method
 >whenever possible. It does the best it can but cannot work miracles. If
 >you get a new download of CVS RichSequence.Tools has several
 >createRichSequence methods to avoid the use of this 'anti-pattern'.
 >
 >RichSequence seq =
 >(SimpleRichSequence)RichSequence.Tools.enrich(DNATools.createDNASequence("gattacagattaca","urn:local:seq"));
 >
 >As an aside there is no need to cast the return of enrich if you are
 >assining it to a RichSequence pointer.
 >
 >Hope this helps,
 >
 >- Mark
 >
 >
 >
 >
 >
 >Felix Dreher <dreher@mpiib-berlin.mpg.de>
 >Sent by: biojava-l-bounces@portal.open-bio.org
 >02/15/2006 10:49 PM
 >
 >
 >        To:     biojava-l@biojava.org
 >        cc:     (bcc: Mark Schreiber/GP/Novartis)
 >        Subject:        [Biojava-l] Problem: BioSQL-cvs and/or
RichSequence-Annotation
 >
 >
 >Hello,
 >
 >I have a question regarding the BioSQL-schema-scripts.
 >The tutorial on installing BioSQL
 >(http://www.biojava.org/tutorials/biosql.html) says that three scripts
 >are required:
 >
 >biosqldb-pg.sql
 >biosql-accelerators-pg.sql
 >biosqldb-assembly-pg.sql
 >
 >However, the 'assembly'-script can not be found on the CVS-server.
 >Instead there is another script called 'biosqldb-views-pg.sql'.
 >So I would like to know which scripts should be used.
 >
 >
 >Furthermore I have a problem with adding an annotation (or also a
 >feature) to a RichSequence.
 >As it seems to be a problem with Hibernate and/or the BioSQL-schemas: I
 >use BioJava-live (CVS) from 2 weeks ago and the latest CVS-BioSQL-scripts.
 >
 >When I try the following code, the following Exceptions are thrown
 >(while the execution of line 2).
 >
 >1        RichSequence seq =  (SimpleRichSequence)
 >RichSequence.Tools.enrich(DNATools.createDNASequence("gattacagattaca","urn:local:seq"));
 >2        ComparableTerm ct =
 >RichObjectFactory.getDefaultOntology().getOrCreateTerm("projectname");
 >3        seq.getAnnotation().setProperty(ct, "project_25");
 >
 >
 >
 >
 >Exception in thread "main" java.lang.RuntimeException: Error while
 >trying to call new class
 >org.biojavax.ontology.SimpleComparableOntology(class java.lang.String)
 >        at
 >org.biojavax.bio.db.HibernateRichObjectBuilder.buildObject(HibernateRichObjectBuilder.java:154)
 >        at
 >org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java:97)
 >        at
 >org.biojavax.RichObjectFactory.getDefaultOntology(RichObjectFactory.java:178)
 >        at hibernatetest.Main.main(Main.java:246)
 >Caused by: java.lang.reflect.InvocationTargetException
 >        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 >        at
 >sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 >        at
 >sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 >        at java.lang.reflect.Method.invoke(Method.java:585)
 >        at
 >org.biojavax.bio.db.HibernateRichObjectBuilder.buildObject(HibernateRichObjectBuilder.java:138)
 >        ... 3 more
 >Caused by: org.hibernate.exception.SQLGrammarException: could not
 >insert: [Ontology]
 >        at
 >org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:65)
 >        at
 >org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:43)
 >        at
 >org.hibernate.id.AbstractPostInsertGenerator.getGenerated(AbstractPostInsertGenerator.java:56)
 >        at
 >org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:1994)
 >        at
 >org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2405)
 >        at
 >org.hibernate.action.EntityIdentityInsertAction.execute(EntityIdentityInsertAction.java:37)
 >        at org.hibernate.engine.ActionQueue.execute(ActionQueue.java:243)
 >        at
 >org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:269)
 >        at
 >org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:167)
 >        at
 >org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:101)
 >        at
 >org.hibernate.event.def.DefaultPersistEventListener.entityIsTransient(DefaultPersistEventListener.java:131)
 >        at
 >org.hibernate.event.def.DefaultPersistEventListener.onPersist(DefaultPersistEventListener.java:87)
 >        at
 >org.hibernate.event.def.DefaultPersistEventListener.onPersist(DefaultPersistEventListener.java:38)
 >        at
 >org.hibernate.impl.SessionImpl.firePersist(SessionImpl.java:642)
 >        at org.hibernate.impl.SessionImpl.persist(SessionImpl.java:616)
 >        ... 8 more
 >Caused by: org.postgresql.util.PSQLException: ERROR: relation
 >"ontology_ontology_id_seq" does not exist
 >        at
 >org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1512)
 >        at
 >org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1297)
 >        at
 >org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:188)
 >        at
 >org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:430)
 >        at
 >org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:346)
 >        at
 >org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:250)
 >        at
 >org.hibernate.id.AbstractPostInsertGenerator.getGenerated(AbstractPostInsertGenerator.java:42)
 >        ... 20 more
 >
 >
 >
 >Thanks in advance!
 >
 >Greetings,
 >Felix
 >
 >
 >
 >
 >


-- 
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité Mitte
Department of Immunology
Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany
Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426

From mark.schreiber at novartis.com  Thu Feb 16 07:45:04 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Thu Feb 16 07:40:46 2006
Subject: [Biojava-l] Problem: BioSQL-cvs and/or RichSequence-Annotation
Message-ID: <OFF3C6FB3F.9CCAE308-ON48257117.00458CB5-48257117.00460B37@EU.novartis.net>

Looking further at the exception trace from your previous email, it seems 
like an error somewhere in the Hibernate binding in one of the hbm.xml 
config files.

Specifically, 

org.postgresql.util.PSQLException: ERROR: relation 
"ontology_ontology_id_seq" does not exist

The log entries mean you have not configured a JTA transaction manager or 
cache. Not critical but recommended for any serious application.
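
On the missing relation: the name has the shape PostgreSQL gives by 
default to the sequence behind a SERIAL column, <table>_<column>_seq. The 
trace fails in AbstractPostInsertGenerator.getGenerated right after an 
EntityIdentityInsertAction, i.e. while reading back the key of the row 
just inserted. My guess, and it is only an assumption since I have not 
checked the schema script, is that the mapping asks for that default-named 
sequence while the BioSQL script creates its sequence under another name. 
A small sketch of how the name is put together (class and method names 
are hypothetical, and the query shape is assumed rather than taken from 
the Hibernate source):

```java
class SerialSequenceNameSketch {
    // PostgreSQL's default name for the sequence backing a SERIAL column.
    static String defaultSerialSequence(String table, String column) {
        return table + "_" + column + "_seq";
    }

    // Assumed shape of the statement used to read back the generated key.
    static String currvalQuery(String sequence) {
        return "select currval('" + sequence + "')";
    }

    public static void main(String[] args) {
        String sequence = defaultSerialSequence("ontology", "ontology_id");
        System.out.println(sequence);
        System.out.println(currvalQuery(sequence));
    }
}
```

If that guess is right, comparing this name with the sequence names that 
biosqldb-pg.sql actually creates should show the mismatch.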

- Mark


Felix Dreher <dreher@mpiib-berlin.mpg.de>
02/16/2006 08:25 PM

 
        To:     Mark Schreiber/GP/Novartis@PH, biojava-l@biojava.org
        cc: 
        Subject:        Re: [Biojava-l] Problem: BioSQL-cvs and/or RichSequence-Annotation


Hello Mark,

thank you for the information.
If I got it right, when using BioJavaX, the only BioSQL-script that is 
really needed for PostgreSQL is 'biosqldb-pg.sql' (plus possibly 
'biosql-accelerators-pg.sql').
I tried the example again (with and without the accelerator-script) with 
the new CVS-RichSequence.Tools method (see below), but still the same 
exceptions are thrown:

org.postgresql.util.PSQLException: ERROR: relation 
"ontology_ontology_id_seq" does not exist

org.hibernate.exception.SQLGrammarException: could not insert: [Ontology]

Exception in thread "main" java.lang.RuntimeException: Error while 
trying to call new class
   org.biojavax.ontology.SimpleComparableOntology(class java.lang.String)


I'm wondering if something with the Hibernate-Configuration is wrong, 
because in the log-file I found two suspicious entries:


2006-02-16 12:35:12,676  INFO [main]
calling method:
org.hibernate.transaction.TransactionManagerLookupFactory.getTransactionManagerLookup(TransactionManagerLookupFactory.java:33)
No TransactionManagerLookup configured (in JTA environment, use of 
read-write or transactional second-level cache is not recommended)

2006-02-16 12:35:12,754  WARN [main]
calling method:
net.sf.ehcache.config.Configurator.configure(Configurator.java:126)
No configuration found. Configuring ehcache from ehcache-failsafe.xml 
found in the classpath: 
jar:file:/home/dreher/Java/hibernate-3.1/lib/ehcache-1.1.jar!/ehcache-failsafe.xml


Since I ran out of ideas, I hope maybe someone has a hint where I could 
search further.

Thanks in advance,
Felix


P.S.: Here's the code-example:


public class HibernateTest {

    static private final Logger logger =
    PredictionLogger.getLogger(HibernateTest.class);

    public static void main(String[] args) {

        SessionFactory hibernateFactory  = new
        Configuration().configure().buildSessionFactory();
        Session session = hibernateFactory.openSession();
        RichObjectFactory.connectToBioSQL(session);
        Transaction tx = session.beginTransaction();
 
        try {
            //create a RichSequence
            FiniteAlphabet dna = (FiniteAlphabet)
            AlphabetManager.alphabetForName("DNA");
            RichSequence seq =
            RichSequence.Tools.createRichSequence("targets", "testseq", 
"acgcttcatctgc", dna);
 
            //add an Annotation to that Sequence
            ComparableTerm ct = 
RichObjectFactory.getDefaultOntology().getOrCreateTerm("projectname");
            seq.getAnnotation().setProperty(ct, "bklf25");
 
            tx.commit();
            System.out.println("Annotation added.");
        }
        catch (Exception ex) {
            tx.rollback();
            System.out.println("Transaction Error.");
            logger.error("Changes rolled back.", ex);
        }
        finally {
            session.close();
        }
    }
}


mark.schreiber@novartis.com wrote:

 >Wow, that tutorial is out of date!
 >
 >The assembly sql is not required any longer. It was specifically put 
in by
 >David Huen (I think) to allow him to store assembly data in biosql. Can
 >anyone comment on the need for the accelerators?
 >
 >As for you second point I would discourage the use of the enrich method
 >whenever possible. It does the best it can but cannot work miracles. If
 >you get a new download of CVS RichSequence.Tools has several
 >createRichSequence methods to avoid the use of this 'anti-pattern'.
 >
 >RichSequence seq =
 
>(SimpleRichSequence)RichSequence.Tools.enrich(DNATools.createDNASequence("gattacagattaca","urn:local:seq"));
 >
 >As an aside there is no need to cast the return of enrich if you are
 >assining it to a RichSequence pointer.
 >
 >Hope this helps,
 >
 >- Mark
 >
 >
 >
 >
 >
 >Felix Dreher <dreher@mpiib-berlin.mpg.de>
 >Sent by: biojava-l-bounces@portal.open-bio.org
 >02/15/2006 10:49 PM
 >
 >
 >        To:     biojava-l@biojava.org
 >        cc:     (bcc: Mark Schreiber/GP/Novartis)
 >        Subject:        [Biojava-l] Problem: BioSQL-cvs and/or
RichSequence-Annotation
 >
 >
 >Hello,
 >
 >I have a question regarding the BioSQL-schema-scripts.
 >The tutorial on installing BioSQL
 >(http://www.biojava.org/tutorials/biosql.html) says that three scripts
 >are required:
 >
 >biosqldb-pg.sql
 >biosql-accelerators-pg.sql
 >biosqldb-assembly-pg.sql
 >
 >However, the 'assembly'-script can not be found on the CVS-server.
 >Instead there is another script called 'biosqldb-views-pg.sql'.
 >So I would like to know which scripts should be used.
 >
 >
 >Furthermore I have a problem with adding an annotation (or also a
 >feature) to a RichSequence.
 >As it seems to be a problem with Hibernate and/or the BioSQL-schemas: I
 >use BioJava-live (CVS) from 2 weeks ago and the latest 
CVS-BioSQL-scripts.
 >
 >When I try the following code, the following Exceptions are thrown
 >(while the execution of line 2).
 >
 >1        RichSequence seq =  (SimpleRichSequence)
 
>RichSequence.Tools.enrich(DNATools.createDNASequence("gattacagattaca","urn:local:seq"));
 >2        ComparableTerm ct =
 >RichObjectFactory.getDefaultOntology().getOrCreateTerm("projectname");
 >3        seq.getAnnotation().setProperty(ct, "project_25");
 >
 >
 >
 >
 >Exception in thread "main" java.lang.RuntimeException: Error while trying to call new class org.biojavax.ontology.SimpleComparableOntology(class java.lang.String)
 >        at org.biojavax.bio.db.HibernateRichObjectBuilder.buildObject(HibernateRichObjectBuilder.java:154)
 >        at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java:97)
 >        at org.biojavax.RichObjectFactory.getDefaultOntology(RichObjectFactory.java:178)
 >        at hibernatetest.Main.main(Main.java:246)
 >Caused by: java.lang.reflect.InvocationTargetException
 >        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 >        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 >        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 >        at java.lang.reflect.Method.invoke(Method.java:585)
 >        at org.biojavax.bio.db.HibernateRichObjectBuilder.buildObject(HibernateRichObjectBuilder.java:138)
 >        ... 3 more
 >Caused by: org.hibernate.exception.SQLGrammarException: could not insert: [Ontology]
 >        at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:65)
 >        at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:43)
 >        at org.hibernate.id.AbstractPostInsertGenerator.getGenerated(AbstractPostInsertGenerator.java:56)
 >        at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:1994)
 >        at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2405)
 >        at org.hibernate.action.EntityIdentityInsertAction.execute(EntityIdentityInsertAction.java:37)
 >        at org.hibernate.engine.ActionQueue.execute(ActionQueue.java:243)
 >        at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:269)
 >        at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:167)
 >        at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:101)
 >        at org.hibernate.event.def.DefaultPersistEventListener.entityIsTransient(DefaultPersistEventListener.java:131)
 >        at org.hibernate.event.def.DefaultPersistEventListener.onPersist(DefaultPersistEventListener.java:87)
 >        at org.hibernate.event.def.DefaultPersistEventListener.onPersist(DefaultPersistEventListener.java:38)
 >        at org.hibernate.impl.SessionImpl.firePersist(SessionImpl.java:642)
 >        at org.hibernate.impl.SessionImpl.persist(SessionImpl.java:616)
 >        ... 8 more
 >Caused by: org.postgresql.util.PSQLException: ERROR: relation "ontology_ontology_id_seq" does not exist
 >        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1512)
 >        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1297)
 >        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:188)
 >        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:430)
 >        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:346)
 >        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:250)
 >        at org.hibernate.id.AbstractPostInsertGenerator.getGenerated(AbstractPostInsertGenerator.java:42)
 >        ... 20 more
 >
 >
 >
 >Thanks in advance!
 >
 >Greetings,
 >Felix
 >
 >
 >
 >
 >


-- 
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité Mitte
Department of Immunology
Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany
Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426


From mthomasc at vub.ac.be  Fri Feb 17 05:16:05 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Fri Feb 17 05:37:21 2006
Subject: [Biojava-l] Genbank  parser error [biojavax]
In-Reply-To: <OFB42E1AA9.098C0A0B-ON48257116.0043AAC6-48257116.0043C53F@EU.novartis.net>
References: <OFB42E1AA9.098C0A0B-ON48257116.0043AAC6-48257116.0043C53F@EU.novartis.net>
Message-ID: <43F5A265.7000605@vub.ac.be>

Hello Mark,

Thank you very much for your quick reply.

However, I could not find out how to get the organism information via 
the (Rich)Annotation.
Would it be possible for you to post a piece of code showing how I could 
retrieve the common name of the organism?

Sorry for insisting, but I really need this parser for my work, and I 
also really need to retrieve the organism info from the file :)

Thank you for your help,

Morgane.


mark.schreiber@novartis.com wrote:

>I think these properties should be going to the (Rich)Annotation bundle.
>
>- Mark
>
>
>
>
>
>Morgane THOMAS-CHOLLIER <mthomasc@vub.ac.be>
>Sent by: biojava-l-bounces@portal.open-bio.org
>02/15/2006 04:56 PM
>
> 
>        To:     biojava-l@biojava.org
>        cc:     (bcc: Mark Schreiber/GP/Novartis)
>        Subject:        Re: [Biojava-l] Genbank  parser error [biojavax]
>
>
>Hello again,
>
>I have continued using the Genbank parser, but this time with Genbank 
>files coming from NCBI :)
>
>I really appreciate the example from the documentation that converts a 
>Genbank file into an EMBL file. I have to say, it is really easy to use.
>
>I nevertheless have a question concerning the Organism and Source tags. 
>Indeed, it is clear in the documentation that they are ignored by the 
>parser.
>But I do not really understand why.
>When I used the Genbank file of the accession numbers : AC147788 and 
>DQ158013, I was unable to get the common name of the organism or use 
>getNameHierarchy(), but I can get the taxon ID for both.
>
>Is there a way to get the common name of the organism, without using a 
>remote call to the NCBI with the taxonID ?
>
>Thanks for your help,
>
>Morgane.
>
>Morgane THOMAS-CHOLLIER wrote:
>
>  
>
>>Hello Mark,
>>
>>My file is indeed too large to be posted.
>>So I have exported a smaller sequence from Ensembl that I tested with 
>>the parser. The behavior is the same.
>>You will find below this "Genbank" formatted file enclosed.
>>
>>Thanks for your help,
>>
>>Morgane.
>>
>>LOCUS       6 3498 bp DNA HTG 14-FEB-2006
>>DEFINITION  Mus musculus chromosome 6 NCBIM34 partial sequence
>>           52305503..52309000 reannotated via EnsEMBL
>>ACCESSION   chromosome:NCBIM34:6:52305503:52309000:1
>>VERSION     chromosome:NCBIM34:6:52305503:52309000:1
>>KEYWORDS    .
>>SOURCE      House mouse
>> ORGANISM  Mus musculus
>>           Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
>>           Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
>>           Sciurognathi; Muridae; Murinae; Mus.
>>COMMENT     This sequence was annotated by the Ensembl system. Please visit the
>>           Ensembl web site, http://www.ensembl.org/ for more information.
>>COMMENT     All feature locations are relative to the first (5') base of the
>>           sequence in this file.  The sequence presented is always the
>>           forward strand of the assembly. Features that lie outside of the
>>           sequence contained in this file have clonal location coordinates in
>>           the format: .:..
>>COMMENT     The /gene indicates a unique id for a gene,
>>           /note="transcript_id=..." a unique id for a transcript, /protein_id
>>           a unique id for a peptide and note="exon_id=..." a unique id for an
>>           exon. These ids are maintained wherever possible between versions.
>>COMMENT     All the exons and transcripts in Ensembl are confirmed by
>>           similarity to either protein or cDNA sequences.
>>FEATURES             Location/Qualifiers
>>    source          1..3498
>>                    /organism="Mus musculus"
>>                    /db_xref="taxon:10090"
>>    gene            complement(506..2826)
>>                    /gene=ENSMUSG00000014704
>>    mRNA            join(complement(2261..2826),complement(506..1620))
>>                    /gene="ENSMUSG00000014704"
>>                    /note="transcript_id=ENSMUST00000014848"
>>    CDS             join(complement(2261..2639),complement(881..1620))
>>                    /gene="ENSMUSG00000014704"
>>                    /protein_id="ENSMUSP00000014848"
>>                    /note="transcript_id=ENSMUST00000014848"
>>                    /db_xref="MarkerSymbol:Hoxa2"
>>                    /db_xref="Uniprot/SWISSPROT:HXA2_MOUSE"
>>                    /db_xref="RefSeq_peptide:NP_034581.1"
>>                    /db_xref="RefSeq_dna:NM_010451.1"
>>                    /db_xref="Uniprot/SPTREMBL:Q3UYP9_MOUSE"
>>                    /db_xref="Uniprot/SPTREMBL:Q920T7_MOUSE"
>>                    /db_xref="Uniprot/SPTREMBL:Q920T9_MOUSE"
>>                    /db_xref="Uniprot/SPTREMBL:Q920U0_MOUSE"
>>                    /db_xref="Uniprot/SPTREMBL:Q920U1_MOUSE"
>>                    /db_xref="Uniprot/SPTREMBL:Q920U2_MOUSE"
>>                    /db_xref="Uniprot/SPTREMBL:Q920U3_MOUSE"
>>                    /db_xref="Uniprot/SPTREMBL:Q920U4_MOUSE"
>>                    /db_xref="Uniprot/SPTREMBL:Q920U5_MOUSE"
>>                    /db_xref="EntrezGene:15399"
>>                    /db_xref="AgilentProbe:A_51_P501803"
>>                    /db_xref="EMBL:AB039184"
>>                    /db_xref="EMBL:AB039185"
>>                    /db_xref="EMBL:AB039186"
>>                    /db_xref="EMBL:AB039187"
>>                    /db_xref="EMBL:AB039188"
>>                    /db_xref="EMBL:AB039189"
>>                    /db_xref="EMBL:AB039190"
>>                    /db_xref="EMBL:AB039191"
>>                    /db_xref="EMBL:AB039192"
>>                    /db_xref="EMBL:AK134501"
>>                    /db_xref="EMBL:M87801"
>>                    /db_xref="EMBL:M93148"
>>                    /db_xref="EMBL:M93292"
>>                    /db_xref="EMBL:M95599"
>>                    /db_xref="GO:GO:0003700"
>>                    /db_xref="GO:GO:0005634"
>>                    /db_xref="GO:GO:0006355"
>>                    /db_xref="GO:GO:0007275"
>>                    /db_xref="IPI:IPI00132242.1"
>>                    /db_xref="UniGene:Mm.131"
>>                    /db_xref="protein_id:AAA37827.1"
>>                    /db_xref="protein_id:AAA37834.1"
>>                    /db_xref="protein_id:AAA37835.1"
>>                    /db_xref="protein_id:AAA37836.1"
>>                    /db_xref="protein_id:BAB68708.1"
>>                    /db_xref="protein_id:BAB68709.1"
>>                    /db_xref="protein_id:BAB68710.1"
>>                    /db_xref="protein_id:BAB68711.1"
>>                    /db_xref="protein_id:BAB68712.1"
>>                    /db_xref="protein_id:BAB68713.1"
>>                    /db_xref="protein_id:BAB68714.1"
>>                    /db_xref="protein_id:BAB68715.1"
>>                    /db_xref="protein_id:BAB68716.1"
>>                    /db_xref="protein_id:BAE22163.1"
>>                    /db_xref="AFFY_MG_U74Av2:102643_at"
>>                    /db_xref="AFFY_MG_U74Cv2:171063_at"
>>                    /db_xref="AFFY_Mouse430A_2:1419602_at"
>>                    /db_xref="AFFY_Mouse430_2:1419602_at"
>>                    /translation="MNYEFEREIGFINSQPSLAECLTSFPPVADTFQSSSIKTSTLSH
>>                    STLIPPPFEQTIPSLNPGSHPRHGAGVGGRPKSSPAGSRGSPVPAGALQPPEYPWMKE
>>                    KKAAKKTALPPAAASTGPACLGHKESLEIADGSGGGSRRLRTAYTNTQLLELEKEFHF
>>                    NKYLCRPRRVEIAALLDLTERQVKVWFQNRRMKHKRQTQCKENQNSEGKFKNLEDSDK
>>                    VEEDEEEKSLFEQALSVSGALLEREGYTFQQNALSQQQAPNGHNGDSQTFPVSPLTSN
>>                    EKNLKHFQHQSPTVPNCLSTMGQNCGAGLNNDSPEAIEVPSLQDFNVFSTDSCLQLSD
>>                    ALSPSLPGSLDSPVDISADSFDFFTDTLTTIDLQHLNY"
>>    exon            complement(506..1620)
>>                    /note="exon_id=ENSMUSE00000387033"
>>    exon            complement(2261..2826)
>>                    /note="exon_id=ENSMUSE00000193269"
>>BASE COUNT  938 a 815 c 882 g 863 t
>>ORIGIN
>>       1 AGGAAGAGTT GGAACGTAGA TGTTTGAAAC AAATGTGTAT AAATAAATGA ATTTTTGATA
>>      61 ACTCCGTTAT TGACCTAGAA ACTAGCAGCT TGGTAAGGGA ACTCCATTCC ACTCCACTCG
>>     121 TCCTAGAACT GGAAGTTTTT GTAGGCACTT TTCCTCTCCA CACTCAAAAG CTTGGGCTAG
>>     181 GGCCAACTCA GGCTGCCCAA GCCCATTTCT ATTACTAATG TAACTCTATG GCCTGAGTCT
>>     241 CAACACTGAA AACCAAATTC ATTCCCTTAG GGGGGAAAAA TCCAAAAAAA AAAAAAAAAA
>>     301 AAGTCTTGCC AGAAGCCCTA GCACTTTCTG GTTTTCTTCT TTGTTGCTGT TTGTTGCAGG
>>     361 CTTTGAACAT GCCACCCTAA TAAAATATAT TAAGATTGAA AAGTAAATTG TGACCAGACT
>>     421 TTTATTTACC ATGTTAGACT AAAAGAAGTA TAAGAAATCA GTATGAGTCT TGAGAAAGAG
>>     481 GGGAAGAAAA AAATAAGAAA GCTACTTATA GCAAAGGAGA ATTTATTCTA CCAAAAATAC
>>     541 GCATGACAAT GCATTCTAAT GTGGTACAAA AATAAACAGA AAGTGACAAG ACAATTTATG
>>     601 GTCACTTTCT TGCAGGCCTC CTGTTTTGTT TTTCAGGAAA ATCACATAGA AGCTTGTTGG
>>     661 GTTCTGTGTA AAAACCACTT AGAACGCCAA CATAATTTGC AAGAGATGGC TTTAAAACTG
>>     721 TGTCAGGGGA GAACATTAAA CGGAAAGTCC TCAACATTTG AGAGAGTAGG GGTAGATCAA
>>     781 GAAGAAACTA AAACGAAAAT CAACTCCCAG AATAAAAGAA GGCAAAGCCA CCTGGTCAAA
>>     841 GGCGTTTTGT TTTGTGAAGC TTTGTTTTGC TTTAATGTTC TTAGTAATTC AGATGCTGTA
>>     901 GGTCGATTGT GGTGAGTGTG TCTGTAAAAA AGTCAAAGCT GTCAGCTGAG ATATCTACAG
>>     961 GACTGTCCAG GGAGCCAGGC AAGCTGGGCG ACAGTGCATC TGAAAGCTGC AGGCAGGAAT
>>    1021 CTGTGGAGAA AACATTGAAG TCCTGCAAAG AGGGGACCTC GATGGCCTCG GGACTGTCAT
>>    1081 TGTTTAGGCC AGCTCCACAG TTCTGGCCCA TTGTTGACAA GCAGTTAGGA ACAGTGGGTG
>>    1141 ACTGGTGCTG AAAATGTTTC AAATTTTTCT CATTGCTGGT TAAAGGCGAA ACTGGGAAAG
>>    1201 TTTGGGAGTC GCCATTGTGT CCATTGGGAG CCTGCTGTTG AGAGAGCGCA TTTTGCTGAA
>>    1261 AAGTGTACCC TTCCCTCTCC AGAAGGGCCC CGGAGACACT GAGGGCTTGC TCAAAGAGTG
>>    1321 ACTTCTCTTC CTCGTCTTCC TCCACTTTGT CCGAGTCCTC CAGGTTTTTA AATTTCCCTT
>>    1381 CGCTGTTTTG GTTCTCCTTG CACTGGGTTT GCCTCTTATG CTTCATTCTC CGGTTCTGAA
>>    1441 ACCACACTTT CACTTGTCTC TCGGTCAAAT CCAGCAGCGC GGCGATTTCC ACCCTGCGGG
>>    1501 GTCTGCAAAG GTACTTGTTG AAATGAAATT CCTTTTCCAG CTCCAAAAGC TGAGTGTTGG
>>    1561 TGTACGCGGT TCTCAGACGC CTGGATCCCC CGCCGCTGCC ATCAGCTATT TCCAGGGATT
>>    1621 CTGCAGAAAG GGAAACCAAC AAGAGACACA CATACAGTTG AAGGTGGAAG GGTCCGAGCA
>>    1681 GGGTTATTCC ATTGGAGCAT AAATACAGCA GAAAAGATCA ACTGCAACAA AATGGCCGCC
>>    1741 CCTGGATGCA GTGCAGCTAT TGTGCTGCCC TTCCTGGGAG CCCAGCCCGG GGAAGCCCAG
>>    1801 TCTCTTCCAC CTCCATCAAA TTCCTGCCTG TGGCTTCCCC CAACCTCTTC ATCCGGGAGC
>>    1861 AAACTTTATA TTAGCTACAA CACAATTTAT AATTAATGCA TCAGCTGCTT AGCTGAGCAA
>>    1921 GAGCGGTCTA TCACTCTTCA TTACTGTCAA AAAGCCAAAC TCTAGGACAA CTAGACAAGA
>>    1981 GGAGGTCAGT TCCAACTCAA ATAAATCATC CTACATTACA CAAGTTAGGG AAAGTGCCCC
>>    2041 CCCTCCTCAA AATATATATG TCTCATTGTG GGACTCGGGA TCTATTTTCC CCTCCACCAA
>>    2101 ACCCACTCCT GAGACCACAG GGGCATGAGA CCCGCCACCA GGCATCTCTC TCTCTCCCCC
>>    2161 TTCCCTCGAA GCTCATGGTC CCCTCCCCCA CAACCGCTCC TAGGGAAGCC CGGAGGGGGA
>>    2221 CAAGGGTCCC CGAGACCTGG GGCCAAGTCT CCGGACTGAC CTTTGTGGCC GAGGCAGGCA
>>    2281 GGGCCCGTGG AGGCGGCGGC GGGCGGCAGC GCGGTTTTCT TGGCCGCCTT CTTCTCCTTC
>>    2341 ATCCAGGGAT ACTCAGGCGG CTGCAGGGCG CCGGCAGGCA CCGGGCTGCC GCGACTGCCC
>>    2401 GCGGGGCTCG ACTTGGGGCG GCCGCCAACG CCAGCGCCGT GGCGAGGGTG ACTGCCCGGG
>>    2461 TTCAGGCTGG GAATGGTCTG CTCAAAAGGA GGAGGAATCA GTGTCGAGTG TGAAAGCGTC
>>    2521 GAGGTCTTGA TTGATGAACT TTGAAATGTA TCAGCGACAG GGGGAAAAGA TGTCAGGCAC
>>    2581 TCAGCGAGCG ACGGCTGGCT ATTGATAAAA CCAATCTCTC GCTCAAATTC GTAATTCATG
>>    2641 GCCTTCTCCT TGGAGCCCCC TCGGAGGAAA AGTTCCCTCT TTTGGAGGGG CTTTGGGGGG
>>    2701 GCAAGGCCCA GGAAAAAGGC GAGCGCGAAG GAAAAAAAAA TCTATCATAG AAGATCGCTG
>>    2761 CTGGGGTGTT TTTTTTCTAA TTCACTGATT ACAGCCGTAT GGGGACCGCG CTACTATTAA
>>    2821 ACTATTGAAT TCATGGAGAC AAGGTTGAAA TTGGACCGAA TTGGCTGTCA CATGATTGCT
>>    2881 TCTGCCCAAT GACAATTTGG GCTTTAATCA AAAGAAGCCA CTGTCTGTTT GATTGATCCA
>>    2941 AAAAAGTCAG AAAGGAACGC CTCATTGGGG GCCATCGAGG CTTTATTTAC ACTTTTTTTC
>>    3001 AGGGCAAAAA TACATATATG TGGGTGTGGA TGGCAATGCC CCGGGAGTGC GTGGGGGGCG
>>    3061 AGAGTGCCTG TTTGCCTCCT GATCTGCAAG GATCTAGTGT GCTCCCTGGA GTGTGTGTGT
>>    3121 GAGTGTGTGC GTGTGAGCCC TGCTGCCGTC CCGCCAGTGG CTGCCCTCTG CCTCCCCCGC
>>    3181 ACACTCCGCG CATTGTTTGG GACTGTCGGG AAGACGCCTC GCACCTCACA AATCATTTAA
>>    3241 GCACCTCAGC CTGACGCCTG CAGTCATTAA CAAAGTAATC CATTAATCTT CAAAGTTTTG
>>    3301 ACACCCCAGG GCCCTGCATC TCAGCCACAT AAGTTCTGCT AAGGCAAGAG AAAGGAGCAG
>>    3361 AGTGGGAGAG AGAGAGGAGA GAGGGAGAGA GGGAGAGAGG GAGAGAGAGA GAGAGAGAGA
>>    3421 GAGAGAGAGA GAGAGAGAGA GAGAGAATGA ATATTGGGGT TCACCTTTCC TCTTCCTCCT
>>    3481 CTTTTTCCAA AATCAGTT
>>//
>>
>>
>>
>>
>>mark.schreiber@novartis.com wrote:
>>
>>    
>>
>>>Hi Morgane -
>>>
>>>I have to say that doesn't look much like Genbank : )
>>>
>>>The biojavax parsers are possibly a bit brittle due to their use of 
>>>regexps to recognize key elements. It should be fixable; I think the 
>>>problem is that the parser expects a word after LOCUS, not a number. 
>>>This may not be the only problem though. Could you post the entire 
>>>file? Or, if it is large, a representative file of smaller size.
>>>
>>>- Mark
>>>
>>>
>>>
>>>
>>>
>>>Morgane THOMAS-CHOLLIER <mthomasc@vub.ac.be>
>>>Sent by: biojava-l-bounces@portal.open-bio.org
>>>02/14/2006 04:36 AM
>>>
>>>
>>>       To:     biojava-l@biojava.org
>>>       cc:     (bcc: Mark Schreiber/GP/Novartis)
>>>       Subject:        [Biojava-l] Genbank  parser error [biojavax]
>>>
>>>
>>>Hello,
>>>
>>>I tried biojavax today with a view to using the Genbank file parser.
>>>
>>>My test file is a Genbank formatted file which was produced by 
>>>the Ensembl export system.
>>>
>>>The head of the file is as follows:
>>>
>>>LOCUS       6 489671 bp DNA HTG 13-FEB-2006
>>>DEFINITION  Mus musculus chromosome 6 NCBIM34 partial sequence
>>>           52296503..52786173 reannotated via EnsEMBL
>>>ACCESSION   chromosome:NCBIM34:6:52296503:52786173:1
>>>VERSION     chromosome:NCBIM34:6:52296503:52786173:1
>>>
>>>I used the code provided in biojavax docbook to parse this file.
>>>I get the following error :
>>>
>>>Exception in thread "main" org.biojava.bio.BioException: Could not read sequence
>>>   at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
>>>   at org.embnet.be.biojavax.tryout.GenbankParseTest.main(GenbankParseTest.java:31)
>>>Caused by: org.biojava.bio.seq.io.ParseException: Bad locus line found: 6 489671 bp DNA HTG 13-FEB-2006
>>>   at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:229)
>>>   at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
>>>   ... 1 more
>>>
>>>I had a look at GenbankFormat.java, and I guess the problem comes 
>>>from the regular expression, which does not recognize this LOCUS line as a 
>>>standard Genbank LOCUS tag.
>>>
>>>Am I wrong? Has the biojavax Genbank parser been tested on Ensembl 
>>>exported files?
>>>
>>>Morgane.
>>>
>>>
>>>
>>>      
>>>
>
>  
>
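
On the LOCUS-line failure quoted above, a minimal sketch of a more tolerant match may help illustrate the point about brittle regexps. It is not the pattern used in GenbankFormat.java (which is not shown in this thread); the class name LocusLineSketch, the method parse and the regular expression itself are assumptions made for illustration only. The pattern accepts any non-blank token as the sequence name (so "6" is fine) and treats the topology field (linear/circular) as optional, since the Ensembl export above omits it.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of BioJava.
final class LocusLineSketch {

    // Groups: 1 name, 2 length, 3 molecule type, 4 optional topology,
    // 5 division, 6 date (DD-MON-YYYY).
    private static final Pattern LOCUS = Pattern.compile(
            "^LOCUS\\s+(\\S+)\\s+(\\d+)\\s+bp\\s+(\\S+)\\s+"
            + "(?:(linear|circular)\\s+)?(\\S+)\\s+(\\d{2}-[A-Z]{3}-\\d{4})\\s*$");

    /** Returns the six fields of a LOCUS line, or null if the line does not match. */
    static String[] parse(String line) {
        Matcher m = LOCUS.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[6];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1); // null when the optional topology is absent
        }
        return fields;
    }

    public static void main(String[] args) {
        // LOCUS line taken from the Ensembl export quoted in this thread.
        System.out.println(Arrays.toString(
                parse("LOCUS       6 3498 bp DNA HTG 14-FEB-2006")));
    }
}
```

Returning null for a non-matching line keeps the sketch short; a real parser would more likely throw an exception carrying the offending line, as the stack trace above suggests BioJava does.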

-- 
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student (mthomasc@vub.ac.be)

Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium

Tel : +32 2 629 15 22
**********************************************************
Stop Using Internet Explorer, choose FIREFOX !

From mark.schreiber at novartis.com  Sun Feb 19 21:39:52 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Sun Feb 19 21:35:27 2006
Subject: [Biojava-l] RichAnnotation
Message-ID: <OFA78FF2F6.A346FFBE-ON4825711B.000E1DE2-4825711B.000EA2F2@EU.novartis.net>

Hello -

We recently had some questions on the list about getting information out 
of a RichAnnotation. This would be one way to do it ...


RichAnnotation theAnnotation = (RichAnnotation) seq.getAnnotation();
Iterator notesIterator = theAnnotation.getNoteSet().iterator();
while (notesIterator.hasNext()) {
    // Print a blank line, then the next note from the set.
    System.out.println();
    Note note = (Note) notesIterator.next();
    System.out.println(note);
}

All notes have a Term and a value. The value is a String and the Term is 
an ontology term.

Term term = note.getTerm();
String value = note.getValue();

The term has a name, which is also a String:

String name = term.getName();

So to get the name, value pair of a note you would do this:


String name = note.getTerm().getName();
String value = note.getValue();

Which is pretty much what the Note toString() method does.
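
To pick out the values stored under one particular term name, the same loop can be turned into a small filter. The sketch below is self-contained so that it runs without BioJava on the classpath: the nested Term and Note interfaces are stand-ins that model only the getName(), getTerm() and getValue() accessors used above, and the class name NoteLookupSketch, the helpers valuesFor and note, and the term names in main are all hypothetical. With the real classes you would pass the contents of theAnnotation.getNoteSet() instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; the nested interfaces are stand-ins, not the BioJava types.
final class NoteLookupSketch {

    /** Stand-in for a note's ontology term; only getName() is modelled. */
    interface Term {
        String getName();
    }

    /** Stand-in for a note; only getTerm() and getValue() are modelled. */
    interface Note {
        Term getTerm();
        String getValue();
    }

    /** Collects the values of all notes whose term name equals termName, in iteration order. */
    static List<String> valuesFor(Iterable<? extends Note> notes, String termName) {
        List<String> values = new ArrayList<String>();
        for (Note note : notes) {
            if (termName.equals(note.getTerm().getName())) {
                values.add(note.getValue());
            }
        }
        return values;
    }

    /** Builds an in-memory note for the demonstration. */
    static Note note(final String name, final String value) {
        final Term term = new Term() {
            public String getName() { return name; }
        };
        return new Note() {
            public Term getTerm() { return term; }
            public String getValue() { return value; }
        };
    }

    public static void main(String[] args) {
        List<Note> notes = Arrays.asList(
                note("example_term", "first value"),
                note("other_term", "ignored"),
                note("example_term", "second value"));
        // prints [first value, second value]
        System.out.println(valuesFor(notes, "example_term"));
    }
}
```

Which term name carries a given piece of information (the organism, for instance) depends on the parser, so printing all notes first, as in the loop above, is the easiest way to find out.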

- Mark


Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910

From mark.schreiber at novartis.com  Sun Feb 19 22:24:03 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Sun Feb 19 22:20:33 2006
Subject: [Biojava-l] release plan
Message-ID: <OF57A1C9F1.34715D8F-ON4825711B.0012777F-4825711B.0012AED5@EU.novartis.net>

Hello -

There have been a few questions about when we are planning for a biojava 
1.5 release. I have posted a release plan to the website 
http://biojava.open-bio.org/wiki/BioJava:1.5ReleasePlan

Please feel free to comment and modify. As always volunteers are required 
to help move this forward. Let me know if you can help at all with any of 
the tasks.

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910

From mark.schreiber at novartis.com  Sun Feb 19 22:36:54 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Sun Feb 19 22:32:27 2006
Subject: [Biojava-l] last call for logos
Message-ID: <OF2BC5FC82.C1AC95FD-ON4825711B.0013AE8C-4825711B.0013DBC5@EU.novartis.net>

Hello -

Last chance to post an entry for the biojava logo 
(http://biojava.open-bio.org/wiki/BioJava:Logo). 

If you like one of the ones you see but think it could be improved, why not 
modify it? This is open-source after all.

"Voting" will start soon :)

- Mark
From dreher at mpiib-berlin.mpg.de  Mon Feb 20 12:34:22 2006
From: dreher at mpiib-berlin.mpg.de (Felix Dreher)
Date: Mon Feb 20 12:33:44 2006
Subject: [Biojava-l] BioSQLRichSequenceDB: initialisation error
Message-ID: <43F9FD9E.1050408@mpiib-berlin.mpg.de>

Hello,

in the constructor of BioSQLRichSequenceDB the following line is called:
    this.addCriteria = criteria.getMethod("add", new Class[]{Class.class});
where criteria is an instance of org.hibernate.Criteria

The problem is, when I try to initialise a BioSQLRichSequenceDB,
a "NoSuchMethodException" is thrown at this line.
I searched for the getMethod method and in fact it is not present in 
org.hibernate.Criteria.
So does anyone know if this is an error in the BioJava-class or in the 
Hibernate-class?
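
A note on the reflection call, as a minimal sketch: getMethod is declared on java.lang.Class, so it would not be listed among the methods of org.hibernate.Criteria itself; for the quoted line to compile, criteria is presumably a Class object rather than a Criteria instance. Class.getMethod throws NoSuchMethodException when no public method has both the given name and exactly the given parameter types, so looking up "add" with Class.class fails if add is declared with a different parameter type (in the Hibernate 3 API, as far as I know, Criteria.add takes a Criterion). The example uses java.util.List purely as a stand-in so that it runs without Hibernate; the names ReflectionLookupSketch and hasPublicMethod are hypothetical.

```java
import java.lang.reflect.Method;
import java.util.List;

// Hypothetical helper illustrating Class.getMethod lookups.
final class ReflectionLookupSketch {

    /** Returns true if owner has a public method with this name and exactly these parameter types. */
    static boolean hasPublicMethod(Class<?> owner, String name, Class<?>... parameterTypes) {
        try {
            Method m = owner.getMethod(name, parameterTypes);
            return m != null;
        } catch (NoSuchMethodException e) {
            // No public method matches the name and the exact parameter types.
            return false;
        }
    }

    public static void main(String[] args) {
        // List declares add(E), whose erasure is add(Object), so this lookup succeeds.
        System.out.println(hasPublicMethod(List.class, "add", Object.class)); // prints true
        // List has no add(Class); parameter types must match exactly, so this fails.
        System.out.println(hasPublicMethod(List.class, "add", Class.class));  // prints false
    }
}
```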

Greetings,
Felix


-- 
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité Mitte
Department of Immunology
Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany
Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426

From mark.schreiber at novartis.com  Wed Feb 22 07:45:21 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed Feb 22 07:40:52 2006
Subject: [Biojava-l] BioSQLRichSequenceDB: initialisation error
Message-ID: <OF3D8689F1.A62DCF41-ON4825711D.0045CA09-4825711D.004612C4@EU.novartis.net>

I think the method is add(Class class) but I will look into it shortly.

Please be aware that this is a very new, untested and experimental class. 
Unfortunately the author is in the process of moving countries, so we may 
not be able to support you on this immediately.

Apologies for the problem.

- Mark


Felix Dreher <dreher@mpiib-berlin.mpg.de>
Sent by: biojava-l-bounces@portal.open-bio.org
02/21/2006 01:34 AM

 
        To:     biojava-l@biojava.org
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [Biojava-l] BioSQLRichSequenceDB: initialisation error


Hello,

in the constructor of BioSQLRichSequenceDB the following line is called:
    this.addCriteria = criteria.getMethod("add", new 
Class[]{Class.class});
where criteria is an instance of org.hibernate.Criteria

The problem is, when I try to initialise a BioSQLRichSequenceDB,
a "NoSuchMethodException" is thrown at this line.
I searched for the getMethod- method and in fact it is not present in 
org.hibernate.Criteria.
So does anyone know if this is an error in the BioJava-class or in the 
Hibernate-class?

Greetings,
Felix


-- 
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité Mitte
Department of Immunology
Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany
Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426

_______________________________________________
Biojava-l mailing list  -  Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l


From dreher at mpiib-berlin.mpg.de  Fri Feb 24 07:53:45 2006
From: dreher at mpiib-berlin.mpg.de (Felix Dreher)
Date: Fri Feb 24 07:49:43 2006
Subject: [Biojava-l] BioJavaX docbook minor corrections
Message-ID: <43FF01D9.8040107@mpiib-berlin.mpg.de>

Hello,

just two suggestions/corrections about the BioJavaX-docbook.
In the section "Configuring your application to use Hibernate and BioSQL", complete example,
I found two errors (or at least these parts don't work in my test-app).


1)
// print out all the sequences in the namespace
        Query sq = session.createQuery("from BioEntry where namespace=?",ns);

--> should probably be:
       Query sq = session.createQuery("from BioEntry where namespace=:nsp");
       sq.setParameter("nsp",ns);


2)
// if the sequence is called bloggs, change its version to 99
       be.setVersion(99);

--> can't use the method setVersion(int) --> but e.g. setDescription("XYZ");


Regards,
Felix


-- 
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité Mitte
Department of Immunology
Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany
Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426

From mark.schreiber at novartis.com  Mon Feb 27 03:18:10 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Mon Feb 27 03:13:30 2006
Subject: [Biojava-l] BioSQLRichSequenceDB: initialisation error
Message-ID: <OF80764A35.B2BAD0D0-ON48257122.002D9006-48257122.002D9BDD@EU.novartis.net>

Hi -

This should be resolved in CVS now. Let me know if it doesn't work.

Best regards,

- Mark


Felix Dreher <dreher@mpiib-berlin.mpg.de>
Sent by: biojava-l-bounces@portal.open-bio.org
02/21/2006 01:34 AM

 
        To:     biojava-l@biojava.org
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [Biojava-l] BioSQLRichSequenceDB: initialisation error


Hello,

in the constructor of BioSQLRichSequenceDB the following line is called:
    this.addCriteria = criteria.getMethod("add", new 
Class[]{Class.class});
where criteria is an instance of org.hibernate.Criteria

The problem is, when I try to initialise a BioSQLRichSequenceDB,
a "NoSuchMethodException" is thrown at this line.
I searched for the getMethod- method and in fact it is not present in 
org.hibernate.Criteria.
So does anyone know if this is an error in the BioJava-class or in the 
Hibernate-class?

Greetings,
Felix


-- 
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité Mitte
Department of Immunology
Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany
Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426



From mark.schreiber at novartis.com  Tue Feb 28 01:04:18 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Tue Feb 28 00:59:40 2006
Subject: [Biojava-l] BioJavaX docbook minor corrections
Message-ID: <OF1EF25B00.62AF7371-ON48257123.00212CBA-48257123.00215A9A@EU.novartis.net>

Thanks for pointing these out. Corrected now in CVS.

The use of named parameters (namespace=:nsp) is preferred (and the previous 
syntax was incorrect). As you point out, setVersion is a private method so 
you cannot access it (even in Hibernate, which does some pretty odd things), 
so I changed this to your setDescription suggestion.

- Mark


Felix Dreher <dreher@mpiib-berlin.mpg.de>
Sent by: biojava-l-bounces@portal.open-bio.org
02/24/2006 08:53 PM

 
        To:     biojava-l@biojava.org
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [Biojava-l] BioJavaX docbook minor corrections


Hello,

just two suggestions/corrections about the BioJavaX-docbook.
In the section "Configuring your application to use Hibernate and BioSQL", 
complete example,
I found two errors (or at least these parts don't work in my test-app).


1)
// print out all the sequences in the namespace
        Query sq = session.createQuery("from BioEntry where 
namespace=?",ns);

--> should probably be:
       Query sq = session.createQuery("from BioEntry where 
namespace=:nsp");
       sq.setParameter("nsp",ns);


2)
// if the sequence is called bloggs, change its version to 99
       be.setVersion(99);

--> can't use the method setVersion(int) --> but e.g. 
setDescription("XYZ");


Regards,
Felix


-- 
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité Mitte
Department of Immunology
Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany
Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426



From shameer at ncbs.res.in  Mon Feb  6 05:03:10 2006
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Wed Mar  8 09:45:10 2006
Subject: [Biojava-l] OBF - logo+slogan sample 
In-Reply-To: <2888.192.168.4.38.1139214470.squirrel@192.168.4.38>
References: <001001c62793$bef08f70$93656785@zhur>
	<2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com>
	<47205.192.168.1.176.1139048133.squirrel@192.168.1.176>
	<2888.192.168.4.38.1139214470.squirrel@192.168.4.38>
Message-ID: <33354.192.168.1.1.1139218154.squirrel@192.168.1.1>

Dear All,

I have made one; have a look at it.
Please check the attachment.
S K

> Dear All,
>
> As we are moving to the all-new wiki-style web site, why don't we think
> about a unique logo + slogan that can express our spirit and excitement?
>
> For example, we could have a logo with O|B|F, its full form, and the slogan.
> If anybody is interested, I would be happy to design logos once we are
> done with the logo.
>
> I have a couple of suggestions. I hope all OBF members can send much more
> powerful slogans than mine:
>
> 'Let's Code for Life'
> 'Let's Decode Life'
> 'Let's Recode Life'
> 'Code your Life '
>
> Happy O|B|!!!
> --
> Mr. Shameer Khadar (JRF)
> Dr. R. Sowdhamini's Lab (# 25) The Computational Biology Group
> National Centre for Biological Sciences (TIFR)
> UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India
> T - 91-080-23636420-32 EXT 4241
> F - 91-080-23636662/23636675
> W - http://www.ncbs.res.in
> --------------------------------------------------
> "Refrain from illusions, insist on work and not words,
>  patiently seek divine and scientific truth."
> MM
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: obf-logo.gif
Type: image/gif
Size: 5370 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biojava-l/attachments/20060206/6e9b166d/obf-logo.gif
From hotafin at gmail.com  Wed Feb  8 08:50:25 2006
From: hotafin at gmail.com (Tamas Horvath)
Date: Wed Mar  8 09:45:13 2006
Subject: [Biojava-l] structureNMRImpl
Message-ID: <c343d7080602080528s6655e325sbfdcb757bbaf3e8d@mail.gmail.com>

Skipped content of type multipart/alternative
-------------- next part --------------
A non-text attachment was scrubbed...
Name: structureNMRImpl.java
Type: application/octet-stream
Size: 10699 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biojava-l/attachments/20060208/3f62c456/structureNMRImpl.obj