[Biojava-dev] [BioJava - Bug #3345] (Closed) Static object cache in SimpleRichObjectBuilder causing memory leak
Tjeerd Boerman
twboerman at gmail.com
Fri May 11 13:42:19 UTC 2012
Any thoughts on this from the Biojava authors? I strongly believe that
the default behavior of BioJava should not be "unbounded memory usage
unless you implement your own RichObjectBuilder".
- Tjeerd
On 4/27/2012 3:45 PM, George Waldon wrote:
> You can use RichObjectFactory.setRichObjectBuilder(RichObjectBuilder
> b) to enable you own builder.
>
> - George
>
> Quoting Tjeerd Boerman <twboerman at gmail.com>:
>
>> Hi George,
>>
>> A very good point. While the specification of the RichObjectBuilder
>> interface does not *enforce* that implementations persist the
>> objects, it seems to be the intended use. Both implementations
>> (BioSQLRichObjectBuilder and SimpleRichObjectBuilder) persist all
>> created objects. Creating my own class would work for me, but this
>> should be fixed in BioJava codebase as well, correct? Creating
>> another RichObjectBuilder implementation that works in finite memory
>> would work, but it would have to be 'plugged in' by editing the
>> RichObjectFactory or enabled through a configuration option - both
>> seem undesirable. Any ideas?
>>
>> Tjeerd
>>
>> On 4/27/2012 6:44 AM, George Waldon wrote:
>>> Hi Tjeerd,
>>>
>>> The use of a singleton map for rich objects with a Hibernate
>>> database is described in the biojavax documentation; see also the
>>> class BioSQLRichObjectBuilder.
>>>
>>> You can implement your solution simply by creating your own
>>> implementation of RichObjectBuilder as I mentioned before. Maybe you
>>> could store the cache values as WeakReference and recreated objects
>>> only when needed. Or you could filter the list of parameters, e.g.
>>> in the case of SimpleDocRef return the same singleton representing
>>> an empty DocRef if you are not interested in those, etc.
>>>
>>> - George
>>>
>>> Quoting Tjeerd Boerman <twboerman at gmail.com>:
>>>
>>>> Hey,
>>>>
>>>> I cannot reopen the bug myself, but I'm sure Andreas or another
>>>> manager of the bugtracker can.
>>>>
>>>> For my purposes the memory usage could be kept in check by
>>>> disabling the cache in SimpleRichObjectBuilder. It's hard to say
>>>> what the performance hit is, but it's not been noticeable in my
>>>> application. A permanent solution needs some more thought.
>>>>
>>>> I double-checked, and all calls to SimpleObjectBuilder.getObject()
>>>> go through RichObjectFactory.getObject() - it effectively delegates
>>>> the call to the SimpleObjectBuilder. The SimpleObjectBuilder class
>>>> keeps a cache of unrestricted size, with one object for every
>>>> unique combination of class and parameters. The RichObjectFactory
>>>> class also keeps a cache of objects, but this cache is restricted
>>>> to containing instances for at most 20 different object classes at
>>>> any given time.
>>>> So we can think of this as a level 1 cache in RichObjectFactory,
>>>> and a level 2 cache in SimpleObjectBuilder. The level 1 cache may
>>>> contain at most 20 entries, and the level 2 cache may contain at
>>>> most the amount of instantiable classes in BioJava, which will be a
>>>> couple hundred *maximum*. I don't think this difference justifies
>>>> the use of two caches.
>>>>
>>>> Possible fix:
>>>> Unless I missed something, removing the cache from
>>>> SimpleObjectBuilder and increasing the size of the LRU cache in
>>>> RichObjectFactory to compensate seems like a good option. This
>>>> would solve the unbounded growth of the level 2 cache by
>>>> eliminating it.
>>>>
>>>>
>>>>
>>>> There is another issue with the level 1 cache that could cause
>>>> unbounded growth. Look at this piece of code in
>>>> RichObjectFactory.getObject():
>>>>
>>>> Map m = (Map)cache.get(applicationClass);
>>>> if (!m.containsKey(paramsList)) {
>>>> m.put(paramsList,builder.buildObject(applicationClass,
>>>> paramsList));
>>>> }
>>>>
>>>> For every unique combination of parameters, the object is built and
>>>> stored in the Map associated with that class. This Map is not a LRU
>>>> implementation, so creating 10,000,000 instances of a class (i.e.
>>>> SimpleDocRef) with different parameters would cause every single
>>>> instance to be saved in the cache. Such a situation would again
>>>> cause memory problems.
>>>>
>>>> Possible fix:
>>>> Well that's an easy one! Replace this simple Map cache with a LRU
>>>> implementation, as proposed before.
>>>>
>>>> Regards,
>>>> Tjeerd
>>>>
>>>>
>>>> On 4/27/2012 12:03 AM, George Waldon wrote:
>>>>> Hi Tjeerd,
>>>>>
>>>>> You can certainly reopen a bug.
>>>>>
>>>>> I am not the expert here, but looking at the code,
>>>>> RichObjectFactory uses a LRU cache as a temporary cache for
>>>>> objects that are reused often while SimpleRichObjectBuilder caches
>>>>> everything. I doubt you can remove this last one.
>>>>>
>>>>> Maybe you can write your own RichObjectBuilder that would cache
>>>>> only things you are interested in (to compare?). I bet you
>>>>> generate a lot of DocRef that do not need to be cached; then use
>>>>> RichObjectFactory.setRichObjectBuilder to set your partial builder
>>>>> before you parse. Just an idea.
>>>>>
>>>>> Hope this helps,
>>>>> George
>>>>>
>>>>> Quoting Tjeerd Boerman <twboerman at gmail.com>:
>>>>>
>>>>>> Wow. I'm quite emberassed, but the issue is actually not fixed in
>>>>>> 1.8.2. The RichObjectFactory class uses an LRU cache, while the
>>>>>> SimpleRichObjectBuilder class uses the cache described in my
>>>>>> earlier post, and I mixed up the two. Sorry about that, I must
>>>>>> have been staring at this code for too long today.
>>>>>>
>>>>>> It turns out the RichObjectFactory class is the only user of
>>>>>> SimpleRichObjectBuilder, and has its own LRU cache. So the cache
>>>>>> in SimpleRichObjectBuilder can probably be removed altogether. If
>>>>>> the experts agree I would be happy to write a patch for this.
>>>>>>
>>>>>> Can the old bug report be reopened, or should I file a new bug
>>>>>> report? Or should I submit a possible patch through this mailing
>>>>>> list?
>>>>>>
>>>>>> Regards,
>>>>>> Tjeerd
>>>>>>
>>>>>> On 4/26/2012 6:01 PM, redmine at redmine.open-bio.org wrote:
>>>>>>> Issue #3345 has been updated by Andreas Prlic.
>>>>>>>
>>>>>>> * Status changed from New to Closed
>>>>>>> * % Done changed from 0 to 100
>>>>>>>
>>>>>>> already fixed in 1.8.2
>>>>>>>
>>>>>>> ------------------------------------------------------------------------
>>>>>>> Bug #3345: Static object cache in SimpleRichObjectBuilder causing
>>>>>>> memory leak <https://redmine.open-bio.org/issues/3345>
>>>>>>>
>>>>>>> * Author: Tjeerd Boerman
>>>>>>> * Status: Closed
>>>>>>> * Priority: Normal
>>>>>>> * Assignee: biojava-dev list
>>>>>>> * Category: bio
>>>>>>> * Target version: BioJava 1.8 - legacy
>>>>>>> * URL:
>>>>>>>
>>>>>>> I encountered a memory problem when parsing many Genbank files
>>>>>>> with the Biojava 1.8.1. The parsed files were protein GPFF
>>>>>>> (GenPept Flat File format) files from the latest RefSeq release.
>>>>>>> The application tried to parse millions of protein sequences
>>>>>>> from these files, but an OutOfMemoryException would always occur
>>>>>>> after some time. The used heap space would gradually increase
>>>>>>> from a couple hundred megabytes to over 1.5 GB, until the heap
>>>>>>> could grow no further. Upon inspection I discovered a HashMap in
>>>>>>> RichSequenceBuilder was the culprit:
>>>>>>>
>>>>>>> public class SimpleRichObjectBuilder implements RichObjectBuilder {
>>>>>>>
>>>>>>> private static Map objects = new HashMap();
>>>>>>>
>>>>>>> public Object buildObject(Class clazz, List paramsList) {
>>>>>>> ...
>>>>>>>
>>>>>>> // return the constructed object from the hashmap if there
>>>>>>> already
>>>>>>> if (contents.containsKey(ourParamsList)) return
>>>>>>> contents.get(ourParamsList);
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>> // Instantiate it with the parameters
>>>>>>> Object o = c.newInstance(ourParamsList.toArray());
>>>>>>>
>>>>>>> // store it for later in the singleton map
>>>>>>> contents.put(ourParamsList, o);
>>>>>>>
>>>>>>> ...
>>>>>>> }
>>>>>>> }
>>>>>>>
>>>>>>> It seems the *objects* Map in SimpleRichSequenceBuilder is used
>>>>>>> as a static cache for objects, but when many different objects
>>>>>>> are created this cache grows out of control. I am unsure if this
>>>>>>> is a 'true' bug, but for my application it was a definite
>>>>>>> problem. My fix was to simply comment out the *contents.put()*
>>>>>>> statement, but I'm sure there is a better way to resolve this -
>>>>>>> perhaps by making the use of the cache optional through a
>>>>>>> configuration option.
>>>>>>>
>>>>>>> ------------------------------------------------------------------------
>>>>>>> You have received this notification because you have either
>>>>>>> subscribed to it, or are involved in it.
>>>>>>> To change your notification preferences, please click here and
>>>>>>> login: http://redmine.open-bio.org
>>>>>>>
>>>>>> _______________________________________________
>>>>>> biojava-dev mailing list
>>>>>> biojava-dev at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --------------------------------
>>>>> George Waldon
>>>>>
>>>>
>>>
>>>
>>>
>>
>
More information about the biojava-dev
mailing list