[Biojava-dev] [BioJava - Bug #3345] (Closed) Static object cache in SimpleRichObjectBuilder causing memory leak
George Waldon
gwaldon at geneinfinity.org
Thu Apr 26 22:03:42 UTC 2012
Hi Tjeerd,
You can certainly reopen a bug.
I am not the expert here, but looking at the code, RichObjectFactory
uses an LRU cache as a temporary cache for objects that are reused
often, while SimpleRichObjectBuilder caches everything. I doubt you can
remove the latter.
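To make the difference between the two caching styles concrete, here is a minimal sketch of a bounded LRU cache in plain Java. It is not the BioJava implementation; the class name LruCache and the size limit are invented for illustration. A plain HashMap, by contrast, keeps every entry until someone removes it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class, not part of BioJava.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: entries are ordered from least to most recently used
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Consulted after each put(); returning true evicts the least recently used entry
        return size() > maxEntries;
    }
}
```

With a limit of 2, putting "a" and "b", reading "a", and then putting "c" evicts "b", so memory use stays bounded no matter how many distinct keys pass through.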
Maybe you can write your own RichObjectBuilder that caches only the
things you are interested in (for comparison, say). I bet you generate
a lot of DocRef objects that do not need to be cached. Then use
RichObjectFactory.setRichObjectBuilder to install your partial builder
before you parse. Just an idea.
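A rough sketch of what such a partial builder might look like follows. Everything here is an assumption except the buildObject(Class, List) signature shown in the bug report: the class name SelectiveRichObjectBuilder is made up, the RichObjectBuilder interface is redeclared locally only so the example compiles on its own, and the constructor lookup is deliberately naive. I have not checked whether other BioJava code relies on object identity for the classes left uncached.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Stand-in for BioJava's RichObjectBuilder interface, redeclared here so the
// sketch compiles on its own; real code would implement the library's interface.
interface RichObjectBuilder {
    Object buildObject(Class clazz, List paramsList);
}

// Hypothetical builder that caches only the classes it is told to cache.
class SelectiveRichObjectBuilder implements RichObjectBuilder {
    private final Set<Class<?>> cachedClasses = new HashSet<>();
    private final Map<Class<?>, Map<List<Object>, Object>> cache = new HashMap<>();

    SelectiveRichObjectBuilder(Class<?>... classesToCache) {
        for (Class<?> c : classesToCache) cachedClasses.add(c);
    }

    @Override
    public Object buildObject(Class clazz, List paramsList) {
        List<Object> params = new ArrayList<Object>(paramsList);
        if (!cachedClasses.contains(clazz)) {
            // Never stored, so the object can be garbage collected after use
            return instantiate(clazz, params);
        }
        Map<List<Object>, Object> perClass =
                cache.computeIfAbsent(clazz, k -> new HashMap<>());
        Object existing = perClass.get(params);
        if (existing != null) return existing;
        Object created = instantiate(clazz, params);
        perClass.put(params, created);
        return created;
    }

    // Naive lookup: first public constructor whose parameter types accept the
    // given arguments (no handling of null or primitive parameters).
    private static Object instantiate(Class<?> clazz, List<Object> params) {
        for (Constructor<?> ctor : clazz.getConstructors()) {
            Class<?>[] types = ctor.getParameterTypes();
            if (types.length != params.size()) continue;
            boolean matches = true;
            for (int i = 0; i < types.length; i++) {
                if (!types[i].isInstance(params.get(i))) { matches = false; break; }
            }
            if (!matches) continue;
            try {
                return ctor.newInstance(params.toArray());
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("Could not build " + clazz, e);
            }
        }
        throw new IllegalArgumentException("No matching constructor for " + clazz);
    }
}
```

An instance of such a builder would be passed to RichObjectFactory.setRichObjectBuilder before parsing, with the constructor arguments listing only the classes worth keeping around.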
Hope this helps,
George
Quoting Tjeerd Boerman <twboerman at gmail.com>:
> Wow. I'm quite embarrassed, but the issue is actually not fixed in
> 1.8.2. The RichObjectFactory class uses an LRU cache, while the
> SimpleRichObjectBuilder class uses the cache described in my earlier
> post, and I mixed up the two. Sorry about that, I must have been
> staring at this code for too long today.
>
> It turns out the RichObjectFactory class is the only user of
> SimpleRichObjectBuilder, and has its own LRU cache. So the cache in
> SimpleRichObjectBuilder can probably be removed altogether. If the
> experts agree I would be happy to write a patch for this.
>
> Can the old bug report be reopened, or should I file a new bug
> report? Or should I submit a possible patch through this mailing list?
>
> Regards,
> Tjeerd
>
> On 4/26/2012 6:01 PM, redmine at redmine.open-bio.org wrote:
>> Issue #3345 has been updated by Andreas Prlic.
>>
>> * Status changed from New to Closed
>> * % Done changed from 0 to 100
>>
>> already fixed in 1.8.2
>>
>> ------------------------------------------------------------------------
>>
>>
>> Bug #3345: Static object cache in SimpleRichObjectBuilder causing
>> memory leak <https://redmine.open-bio.org/issues/3345>
>>
>> * Author: Tjeerd Boerman
>> * Status: Closed
>> * Priority: Normal
>> * Assignee: biojava-dev list
>> * Category: bio
>> * Target version: BioJava 1.8 - legacy
>> * URL:
>>
>> I encountered a memory problem when parsing many GenBank files with
>> BioJava 1.8.1. The parsed files were protein GPFF (GenPept Flat
>> File format) files from the latest RefSeq release. The application
>> tried to parse millions of protein sequences from these files, but
>> an OutOfMemoryError would always occur after some time. The
>> used heap space would gradually increase from a couple hundred
>> megabytes to over 1.5 GB, until the heap could grow no further.
>> Upon inspection I discovered a HashMap in SimpleRichObjectBuilder was
>> the culprit:
>>
>> public class SimpleRichObjectBuilder implements RichObjectBuilder {
>>
>>     private static Map objects = new HashMap();
>>
>>     public Object buildObject(Class clazz, List paramsList) {
>>         ...
>>
>>         // return the constructed object from the hashmap if there already
>>         if (contents.containsKey(ourParamsList)) return contents.get(ourParamsList);
>>
>>         ...
>>
>>         // Instantiate it with the parameters
>>         Object o = c.newInstance(ourParamsList.toArray());
>>
>>         // store it for later in the singleton map
>>         contents.put(ourParamsList, o);
>>
>>         ...
>>     }
>> }
>>
>> It seems the *objects* Map in SimpleRichObjectBuilder is used as
>> a static cache for objects, but when many different objects are
>> created this cache grows out of control. I am unsure if this is a
>> 'true' bug, but for my application it was a definite problem. My
>> fix was to simply comment out the *contents.put()* statement, but
>> I'm sure there is a better way to resolve this, perhaps by making
>> the use of the cache optional through a configuration option.
>>
--------------------------------
George Waldon