[Biojava-l] [Biojava-dev] [Fwd: large genbank data]
Rey Vincent Babilonia
rvincent at asti.dost.gov.ph
Mon Jul 21 02:35:04 UTC 2008
Dear all,
Here's the complete stack trace:
10:26:14,796 INFO Loader:296 - D:\AE000521.gbk is readable.
10:26:16,046 INFO Loader:340 - Alphabet of AE000521 is Empty Alphabet. Skipping...
10:26:16,250 INFO Loader:296 - D:\AE004438.gbk is readable.
10:26:20,750 FATAL Loader:334 - Sequence AE004438 already exists.
10:26:20,921 INFO Loader:296 - D:\AE005174.gbk is readable.
10:26:28,328 INFO Loader:326 - Loading sequence AE005174 with identifier 56384585, length 5528445 and alphabet DNA...
org.hibernate.PropertyAccessException: Exception occurred inside getter of org.biojavax.bio.seq.SimpleRichSequence.sequenceLength
	at org.hibernate.property.BasicPropertyAccessor$BasicGetter.get(BasicPropertyAccessor.java:148)
	at org.hibernate.tuple.entity.AbstractEntityTuplizer.getPropertyValues(AbstractEntityTuplizer.java:256)
	at org.hibernate.tuple.entity.PojoEntityTuplizer.getPropertyValues(PojoEntityTuplizer.java:209)
	at org.hibernate.persister.entity.AbstractEntityPersister.getPropertyValues(AbstractEntityPersister.java:3581)
	at org.hibernate.event.def.DefaultMergeEventListener.copyValues(DefaultMergeEventListener.java:377)
	at org.hibernate.event.def.DefaultMergeEventListener.entityIsTransient(DefaultMergeEventListener.java:179)
	at org.hibernate.event.def.DefaultMergeEventListener.onMerge(DefaultMergeEventListener.java:123)
	at org.hibernate.event.def.DefaultMergeEventListener.onMerge(DefaultMergeEventListener.java:53)
	at org.hibernate.impl.SessionImpl.fireMerge(SessionImpl.java:677)
	at org.hibernate.impl.SessionImpl.merge(SessionImpl.java:661)
	at ph.gov.dost.asti.genbankers.Loader.load(Loader.java:328)
	at ph.gov.dost.asti.genbankers.Loader.<init>(Loader.java:137)
	at ph.gov.dost.asti.genbankers.Loader.main(Loader.java:416)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.lang.reflect.Method.invoke(Unknown Source)
	at org.hibernate.property.BasicPropertyAccessor$BasicGetter.get(BasicPropertyAccessor.java:145)
	... 12 more
Caused by: java.lang.NullPointerException
	at org.biojavax.bio.seq.SimpleRichSequence.length(SimpleRichSequence.java:91)
	at org.biojavax.bio.seq.SimpleRichSequence.getSequenceLength(SimpleRichSequence.java:97)
	... 17 more
10:26:28,937 ERROR AbstractBatcher:51 - Exception executing batch:
org.hibernate.StaleStateException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1
	at org.hibernate.jdbc.Expectations$BasicExpectation.checkBatched(Expectations.java:61)
	at org.hibernate.jdbc.Expectations$BasicExpectation.verifyOutcome(Expectations.java:46)
	at org.hibernate.jdbc.BatchingBatcher.checkRowCounts(BatchingBatcher.java:68)
	at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:48)
	at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:246)
	at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:266)
	at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:168)
	at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:298)
	at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27)
	at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1000)
	at ph.gov.dost.asti.genbankers.Loader.load(Loader.java:351)
	at ph.gov.dost.asti.genbankers.Loader.<init>(Loader.java:137)
	at ph.gov.dost.asti.genbankers.Loader.main(Loader.java:416)
10:26:28,937 ERROR AbstractFlushingEventListener:301 - Could not synchronize database state with session
org.hibernate.StaleStateException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1
	at org.hibernate.jdbc.Expectations$BasicExpectation.checkBatched(Expectations.java:61)
	at org.hibernate.jdbc.Expectations$BasicExpectation.verifyOutcome(Expectations.java:46)
	at org.hibernate.jdbc.BatchingBatcher.checkRowCounts(BatchingBatcher.java:68)
	at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:48)
	at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:246)
	at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:266)
	at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:168)
	at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:298)
	at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27)
	at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1000)
	at ph.gov.dost.asti.genbankers.Loader.load(Loader.java:351)
	at ph.gov.dost.asti.genbankers.Loader.<init>(Loader.java:137)
	at ph.gov.dost.asti.genbankers.Loader.main(Loader.java:416)
Exception in thread "main" org.hibernate.StaleStateException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1
	at org.hibernate.jdbc.Expectations$BasicExpectation.checkBatched(Expectations.java:61)
	at org.hibernate.jdbc.Expectations$BasicExpectation.verifyOutcome(Expectations.java:46)
	at org.hibernate.jdbc.BatchingBatcher.checkRowCounts(BatchingBatcher.java:68)
	at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:48)
	at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:246)
	at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:266)
	at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:168)
	at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:298)
	at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27)
	at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1000)
	at ph.gov.dost.asti.genbankers.Loader.load(Loader.java:351)
	at ph.gov.dost.asti.genbankers.Loader.<init>(Loader.java:137)
	at ph.gov.dost.asti.genbankers.Loader.main(Loader.java:416)
Richard Holland wrote:
> Hmm, in that case it must be something else.
>
> Your original mail only posted the first couple of lines of the stack
> trace. Could you post the whole thing so we can take a closer look?
>
> 2008/7/18 Mark Schreiber <markjschreiber at gmail.com>:
>> Was looking on the internet ...
>>
>> So the Java spec doesn't state an explicit upper limit; however, the Sun JDK
>> implements String as a char[] (behind the scenes), and String.length() and
>> array indices are both ints. Therefore I think that on the Sun JDK with the
>> right amount of RAM you could go to 2^31 - 1 (except for string literals as
>> mentioned above), which is 2,147,483,647 characters. So a string of a
>> sequence should be able to get to about 2 billion bases.
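To make that bound concrete: `String.length()` returns an `int`, so no Java `String` can exceed `Integer.MAX_VALUE` (2^31 - 1) characters, and a given VM may cap arrays slightly lower. The sketch below checks whether a sequence of a given length can fit in one `String` and estimates the backing `char[]` at 2 bytes per character, as in the Sun JDK of this period (newer JDKs with compact strings may store ASCII-only text at 1 byte per character). The class `SequenceStringLimits` and its methods are hypothetical names for illustration.

```java
/** Hypothetical helper: can a sequence of n bases be held in a single java.lang.String? */
public final class SequenceStringLimits {

    /** Hard upper bound: String.length() is an int, so at most 2^31 - 1 chars. */
    public static final long MAX_STRING_CHARS = Integer.MAX_VALUE;

    private SequenceStringLimits() {}

    /** True if a sequence of the given length could, in principle, be one String. */
    public static boolean fitsInString(long bases) {
        return bases >= 0 && bases <= MAX_STRING_CHARS;
    }

    /** Rough size of the backing char[] (2 bytes per UTF-16 char), ignoring object headers. */
    public static long charArrayBytes(long bases) {
        return Math.multiplyExact(bases, 2L);
    }

    public static void main(String[] args) {
        long ae005174 = 5_528_445L; // length reported by the loader log for AE005174
        System.out.println(fitsInString(ae005174));       // true
        System.out.println(charArrayBytes(ae005174));     // 11056890
        System.out.println(fitsInString(4_294_967_296L)); // false: 2^32 is outside the int range
    }
}
```

At roughly 11 MB for the `char[]` alone, AE005174 is nowhere near the `String` limit, which fits Richard's suspicion that the failure lies elsewhere.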
>>
>> Of course, if you don't assign enough memory to the JVM (-Xmx4G) you
>> won't be able to get close. Even if you can assign that much, that
>> doesn't account for all the other Java overhead and all the stuff
>> Hibernate is doing with proxy classes etc. Also, BioSQL usually
>> defines the sequence as a CLOB, so depending on your DB implementation
>> there may be a limit on that. On a 32-bit machine 4GB is all you can
>> address per process, so you would have issues trying to do anything bigger.
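One way to act on this before handing a record to Hibernate is to compare the JVM's heap ceiling against a rough estimate of what the sequence will cost. `Runtime.getRuntime().maxMemory()` is a standard call that reflects the `-Xmx` setting (it returns `Long.MAX_VALUE` if there is no limit). The `OVERHEAD_FACTOR` of 4 is an assumption, not a measured BioJava or Hibernate figure: it guesses at how many 2-byte-per-char copies of the sequence might be alive at once while it is converted and bound as a JDBC parameter. `HeapCheck` is a hypothetical name.

```java
/** Hypothetical pre-flight check: is there plausibly enough heap to persist a sequence? */
public final class HeapCheck {

    /** Assumed number of simultaneous 2-byte-per-char copies (a guess; tune empirically). */
    public static final int OVERHEAD_FACTOR = 4;

    private HeapCheck() {}

    /** Estimated bytes needed for a sequence of the given length under the assumption above. */
    public static long estimatedBytes(long bases) {
        return Math.multiplyExact(Math.multiplyExact(bases, 2L), (long) OVERHEAD_FACTOR);
    }

    /** True if the estimate fits within the given heap ceiling. */
    public static boolean likelyFits(long bases, long maxHeapBytes) {
        return estimatedBytes(bases) <= maxHeapBytes;
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory(); // roughly the -Xmx setting
        long bases = 5_528_445L;
        System.out.printf("max heap = %d MB, estimate = %d MB, likely fits = %b%n",
                maxHeap / (1024 * 1024), estimatedBytes(bases) / (1024 * 1024),
                likelyFits(bases, maxHeap));
    }
}
```

This is only a heuristic: it ignores memory already in use, so a `true` result does not guarantee the load will succeed.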
>>
>> Anyhow, I know I have stored human chromosome 1 (approx. 250 million
>> bases) in memory.
>>
>>
>>
>> - Mark
>>
>> On Fri, Jul 18, 2008 at 6:45 PM, James Carman
>> <james at carmanconsulting.com> wrote:
>>> That is a limitation for string literals, not any string. Correct?
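James's distinction can be checked directly. The 65,535-byte cap comes from the class-file format, where a `CONSTANT_Utf8` entry stores its length in 16 bits, so it only constrains string constants written in source code. A string built at run time is limited by the heap and by the `int` length. The class name `LongRuntimeString` below is hypothetical.

```java
/** Builds a String far longer than the 65,535-byte class-file limit on string constants. */
public final class LongRuntimeString {

    private LongRuntimeString() {}

    /** Returns a DNA-like string of the requested length, assembled at run time. */
    public static String repeatBases(int length) {
        StringBuilder sb = new StringBuilder(length);
        String bases = "ACGT";
        for (int i = 0; i < length; i++) {
            sb.append(bases.charAt(i % bases.length()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String seq = repeatBases(1_000_000);
        System.out.println(seq.length());        // 1000000
        System.out.println(seq.substring(0, 8)); // ACGTACGT
    }
}
```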
>>>
>>> On Fri, Jul 18, 2008 at 4:47 AM, Richard Holland
>>> <dicknetherlands at gmail.com> wrote:
>>>> In order to persist to BioSQL, BioJava has to convert the symbol list
>>>> into a string so that it can pass it to JDBC via Hibernate. Therefore
>>>> the maximum length of a sequence you wish to persist to BioSQL is the
>>>> maximum length of a string in Java, which is 65536 (2^16) if you are
>>>> working in a UTF-8 environment.
>>>>
>>>> 2008/7/18 Rey Vincent Babilonia <rvincent at asti.dost.gov.ph>:
>>>>> Hi Mark,
>>>>>
>>>>> What is the maximum sequence length that a RichSequence can handle?
>>>>>
>>>>> java -Xms1024m -Xmx1256m -jar loader.jar
>>>>> .
>>>>> 16:09:00,173 INFO Loader:296 - D:\AE005174.gbk is readable.
>>>>> 16:09:06,704 INFO Loader:326 - Loading sequence AE005174 with identifier
>>>>> 56384585, length 5528445 and alphabet DNA...
>>>>> org.hibernate.PropertyAccessException: Exception occurred inside getter of
>>>>> org.biojavax.bio.seq.SimpleRichSequence.sequenceLength
>>>>>
>>>>> Rey Vincent Babilonia wrote:
>>>>>> Hi Mark,
>>>>>>
>>>>>> At first it threw an OutOfMemoryError. My workaround is to
>>>>>> split the sequence file into individual GenBank files.
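Splitting a multi-record GenBank flat file needs little parsing: each entry starts with a `LOCUS` line and ends with a line containing only `//`. The sketch below, using a hypothetical `GenBankFileSplitter` class, reads the input line by line (so memory stays bounded by one record) and writes each record to `<locus name>.gbk`. It assumes the second whitespace-separated token on the `LOCUS` line is the name, and it discards any text outside a record, such as a release header.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical splitter: writes each GenBank record to its own <LOCUS name>.gbk file. */
public final class GenBankFileSplitter {

    private GenBankFileSplitter() {}

    /** Streams the input line by line; returns the files written, in input order. */
    public static List<Path> splitToFiles(Path input, Path outDir) throws IOException {
        List<Path> written = new ArrayList<>();
        List<String> record = new ArrayList<>();
        String locus = null;
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.US_ASCII)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("LOCUS")) {
                    record.clear();                   // discard an unterminated previous record
                    String[] fields = line.trim().split("\\s+");
                    locus = fields.length > 1 ? fields[1] : "record" + written.size();
                }
                if (locus == null) {
                    continue;                         // not inside a record (e.g. release header)
                }
                record.add(line);
                if (line.trim().equals("//")) {       // "//" terminates a GenBank record
                    Path out = outDir.resolve(locus + ".gbk");
                    Files.write(out, record, StandardCharsets.US_ASCII);
                    written.add(out);
                    record.clear();
                    locus = null;
                }
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("gbsplit");
        Path in = dir.resolve("multi.gbk");
        Files.write(in, Arrays.asList("LOCUS       AE000521", "//", "LOCUS       AE005174", "//"),
                StandardCharsets.US_ASCII);
        for (Path p : splitToFiles(in, dir)) {
            System.out.println(p.getFileName()); // AE000521.gbk, then AE005174.gbk
        }
    }
}
```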
>>>>>>
>>>>>> The problem now is that if a GenBank sequence has an 'empty alphabet', it
>>>>>> does not get loaded into BioSQL. My workaround is to check whether
>>>>>> sequence.getAlphabet().getName() equals "DNA".
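In BioJava code the value to test would come from `sequence.getAlphabet().getName()`. To keep the sketch runnable without BioJava on the classpath, it takes alphabet names as plain strings, keyed by accession; `AlphabetFilter` and its methods are hypothetical. Writing the comparison as `"DNA".equals(name)` keeps it safe if a name is ever `null`. The example input reuses the two alphabet names that appear in the loader log.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the "only load DNA sequences" check. */
public final class AlphabetFilter {

    private AlphabetFilter() {}

    /** Null-safe: true only when the alphabet name is exactly "DNA". */
    public static boolean isDna(String alphabetName) {
        return "DNA".equals(alphabetName);
    }

    /** Returns the accessions whose alphabet name passes the DNA check, in input order. */
    public static List<String> loadable(Map<String, String> alphabetByAccession) {
        List<String> keep = new ArrayList<>();
        for (Map.Entry<String, String> e : alphabetByAccession.entrySet()) {
            if (isDna(e.getValue())) {
                keep.add(e.getKey());
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        Map<String, String> seen = new LinkedHashMap<>();
        seen.put("AE000521", "Empty Alphabet"); // skipped in the loader log
        seen.put("AE005174", "DNA");
        System.out.println(loadable(seen));     // [AE005174]
    }
}
```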
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Mark Schreiber wrote:
>>>>>>> Hi -
>>>>>>>
>>>>>>> Is the code throwing an exception or running out of memory??
>>>>>>>
>>>>>>> Can you send an example program and the problem you encounter to the
>>>>>>> list.
>>>>>>> - Mark
>>>>>>>
>>>>>>> On Thu, May 29, 2008 at 9:53 AM, Rey Vincent Babilonia
>>>>>>> <rvincent at asti.dost.gov.ph> wrote:
>>>>>>>> -------- Original Message --------
>>>>>>>> Subject: large genbank data
>>>>>>>> Date: Wed, 28 May 2008 18:02:48 +0800
>>>>>>>> From: Rey Vincent Babilonia <rvincent at asti.dost.gov.ph>
>>>>>>>> To: biojava-l at biojava.org
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Has anybody tried uploading large GenBank data (e.g.
>>>>>>>> ftp://bio-mirror.net/biomirror/genbank/gbbct1.seq.gz) to BioSQL?
>>>>>>>> BioPerl's load_seqdatabase.pl can do this. I'm switching to BioJava and
>>>>>>>> it can't read the file (maybe because it has 30,000+ sequences).
>>>>>>>>
>>>>>>>> thanks.
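For a compressed multi-record release file such as `gbbct1.seq.gz`, one approach that keeps memory bounded is to decompress on the fly with `java.util.zip.GZIPInputStream` and hand records to a callback one at a time instead of materialising the whole file. This sketch, with a hypothetical `GzRecordStream` class, only delimits records on the `//` terminator line; a real loader would parse and persist each record inside the callback. GenBank division files begin with a short release header before the first `LOCUS` line, and this sketch does not strip it, so that text arrives attached to the first record; an unterminated final record is ignored.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical streaming reader: hands each GenBank record to a callback, one at a time. */
public final class GzRecordStream {

    private GzRecordStream() {}

    /** Decompresses on the fly and returns the number of records passed to the handler. */
    public static int forEachRecord(InputStream gz, Consumer<String> handler) throws IOException {
        int count = 0;
        StringBuilder current = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(gz), StandardCharsets.US_ASCII))) {
            String line;
            while ((line = reader.readLine()) != null) {
                current.append(line).append('\n');
                if (line.trim().equals("//")) {   // "//" terminates a GenBank record
                    handler.accept(current.toString());
                    current.setLength(0);         // only one record is buffered at a time
                    count++;
                }
            }
        }
        return count;
    }

    /** Demo helper: gzip a string in memory. */
    public static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (OutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.US_ASCII));
        } catch (IOException e) {
            throw new UncheckedIOException(e);    // not expected for in-memory streams
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = gzip("LOCUS       A1\n//\nLOCUS       B2\n//\nLOCUS       C3\n//\n");
        int n = forEachRecord(new ByteArrayInputStream(data),
                record -> System.out.println(record.substring(0, record.indexOf('\n'))));
        System.out.println(n + " records");
    }
}
```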
>>>>>>>>
>>>>>>>> --
>>>>>>>> /**
>>>>>>>> * @author Rey Vincent P. Babilonia
>>>>>>>> * @number +63 2 426 9760 local 1302
>>>>>>>> * @pgp 0x383454CF <at> pgp.mit.edu
>>>>>>>> * @project Philippine Bioinformatics Solutions
>>>>>>>> * @program Philippine e-Science Grid
>>>>>>>> * @division Research and Development Division
>>>>>>>> * @agency Advanced Science and Technology Institute
>>>>>>>> * @url http://www.psigrid.gov.ph
>>>>>>>> */
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> biojava-dev mailing list
>>>>>>>> biojava-dev at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>>>>>
>>>> _______________________________________________
>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l