[Biojava-dev] [BioJava - Bug #3345] (Closed) Static object cache in SimpleRichObjectBuilder causing memory leak
George Waldon
gwaldon at geneinfinity.org
Fri May 11 15:34:17 UTC 2012
Hi Tjeerd,
Bj1x was designed and written by Richard Holland; that would be the
person to ask directly I guess:-).
It is not desirable to change behavior retroactively. Are you still
facing memory issues? Could you be more explicit on your problem?
There are several ways we could adjust the current code without
breaking the original design; e.g. use of service providers or
implement a getRichObjectBuilder method to swap the builder for a
particular job. Just thoughts.
- George
Quoting Tjeerd Boerman <twboerman at gmail.com>:
> Any thoughts on this from the Biojava authors? I strongly believe
> that the default behavior of BioJava should not be "unbounded memory
> usage unless you implement your own RichObjectBuilder".
>
> - Tjeerd
>
> On 4/27/2012 3:45 PM, George Waldon wrote:
>> You can use
>> RichObjectFactory.setRichObjectBuilder(RichObjectBuilder b) to
>> enable you own builder.
>>
>> - George
>>
>> Quoting Tjeerd Boerman <twboerman at gmail.com>:
>>
>>> Hi George,
>>>
>>> A very good point. While the specification of the
>>> RichObjectBuilder interface does not *enforce* that
>>> implementations persist the objects, it seems to be the intended
>>> use. Both implementations (BioSQLRichObjectBuilder and
>>> SimpleRichObjectBuilder) persist all created objects. Creating my
>>> own class would work for me, but this should be fixed in BioJava
>>> codebase as well, correct? Creating another RichObjectBuilder
>>> implementation that works in finite memory would work, but it
>>> would have to be 'plugged in' by editing the RichObjectFactory or
>>> enabled through a configuration option - both seem undesirable.
>>> Any ideas?
>>>
>>> Tjeerd
>>>
>>> On 4/27/2012 6:44 AM, George Waldon wrote:
>>>> Hi Tjeerd,
>>>>
>>>> The use of a singleton map for rich objects with a Hibernate
>>>> database is described in the biojavax documentation; see also the
>>>> class BioSQLRichObjectBuilder.
>>>>
>>>> You can implement your solution simply by creating your own
>>>> implementation of RichObjectBuilder as I mentioned before. Maybe
>>>> you could store the cache values as WeakReference and recreated
>>>> objects only when needed. Or you could filter the list of
>>>> parameters, e.g. in the case of SimpleDocRef return the same
>>>> singleton representing an empty DocRef if you are not interested
>>>> in those, etc.
>>>>
>>>> - George
>>>>
>>>> Quoting Tjeerd Boerman <twboerman at gmail.com>:
>>>>
>>>>> Hey,
>>>>>
>>>>> I cannot reopen the bug myself, but I'm sure Andreas or another
>>>>> manager of the bugtracker can.
>>>>>
>>>>> For my purposes the memory usage could be kept in check by
>>>>> disabling the cache in SimpleRichObjectBuilder. It's hard to say
>>>>> what the performance hit is, but it's not been noticeable in my
>>>>> application. A permanent solution needs some more thought.
>>>>>
>>>>> I double-checked, and all calls to
>>>>> SimpleObjectBuilder.getObject() go through
>>>>> RichObjectFactory.getObject() - it effectively delegates the
>>>>> call to the SimpleObjectBuilder. The SimpleObjectBuilder class
>>>>> keeps a cache of unrestricted size, with one object for every
>>>>> unique combination of class and parameters. The
>>>>> RichObjectFactory class also keeps a cache of objects, but this
>>>>> cache is restricted to containing instances for at most 20
>>>>> different object classes at any given time.
>>>>> So we can think of this as a level 1 cache in RichObjectFactory,
>>>>> and a level 2 cache in SimpleObjectBuilder. The level 1 cache
>>>>> may contain at most 20 entries, and the level 2 cache may
>>>>> contain at most the amount of instantiable classes in BioJava,
>>>>> which will be a couple hundred *maximum*. I don't think this
>>>>> difference justifies the use of two caches.
>>>>>
>>>>> Possible fix:
>>>>> Unless I missed something, removing the cache from
>>>>> SimpleObjectBuilder and increasing the size of the LRU cache in
>>>>> RichObjectFactory to compensate seems like a good option. This
>>>>> would solve the unbounded growth of the level 2 cache by
>>>>> eliminating it.
>>>>>
>>>>>
>>>>>
>>>>> There is another issue with the level 1 cache that could cause
>>>>> unbounded growth. Look at this piece of code in
>>>>> RichObjectFactory.getObject():
>>>>>
>>>>> Map m = (Map)cache.get(applicationClass);
>>>>> if (!m.containsKey(paramsList)) {
>>>>> m.put(paramsList,builder.buildObject(applicationClass, paramsList));
>>>>> }
>>>>>
>>>>> For every unique combination of parameters, the object is built
>>>>> and stored in the Map associated with that class. This Map is
>>>>> not a LRU implementation, so creating 10,000,000 instances of a
>>>>> class (i.e. SimpleDocRef) with different parameters would cause
>>>>> every single instance to be saved in the cache. Such a situation
>>>>> would again cause memory problems.
>>>>>
>>>>> Possible fix:
>>>>> Well that's an easy one! Replace this simple Map cache with a
>>>>> LRU implementation, as proposed before.
>>>>>
>>>>> Regards,
>>>>> Tjeerd
>>>>>
>>>>>
>>>>> On 4/27/2012 12:03 AM, George Waldon wrote:
>>>>>> Hi Tjeerd,
>>>>>>
>>>>>> You can certainly reopen a bug.
>>>>>>
>>>>>> I am not the expert here, but looking at the code,
>>>>>> RichObjectFactory uses a LRU cache as a temporary cache for
>>>>>> objects that are reused often while SimpleRichObjectBuilder
>>>>>> caches everything. I doubt you can remove this last one.
>>>>>>
>>>>>> Maybe you can write your own RichObjectBuilder that would cache
>>>>>> only things you are interested in (to compare?). I bet you
>>>>>> generate a lot of DocRef that do not need to be cached; then
>>>>>> use RichObjectFactory.setRichObjectBuilder to set your partial
>>>>>> builder before you parse. Just an idea.
>>>>>>
>>>>>> Hope this helps,
>>>>>> George
>>>>>>
>>>>>> Quoting Tjeerd Boerman <twboerman at gmail.com>:
>>>>>>
>>>>>>> Wow. I'm quite emberassed, but the issue is actually not fixed
>>>>>>> in 1.8.2. The RichObjectFactory class uses an LRU cache, while
>>>>>>> the SimpleRichObjectBuilder class uses the cache described in
>>>>>>> my earlier post, and I mixed up the two. Sorry about that, I
>>>>>>> must have been staring at this code for too long today.
>>>>>>>
>>>>>>> It turns out the RichObjectFactory class is the only user of
>>>>>>> SimpleRichObjectBuilder, and has its own LRU cache. So the
>>>>>>> cache in SimpleRichObjectBuilder can probably be removed
>>>>>>> altogether. If the experts agree I would be happy to write a
>>>>>>> patch for this.
>>>>>>>
>>>>>>> Can the old bug report be reopened, or should I file a new bug
>>>>>>> report? Or should I submit a possible patch through this
>>>>>>> mailing list?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Tjeerd
>>>>>>>
>>>>>>> On 4/26/2012 6:01 PM, redmine at redmine.open-bio.org wrote:
>>>>>>>> Issue #3345 has been updated by Andreas Prlic.
>>>>>>>>
>>>>>>>> * Status changed from New to Closed
>>>>>>>> * % Done changed from 0 to 100
>>>>>>>>
>>>>>>>> already fixed in 1.8.2
>>>>>>>>
>>>>>>>> ------------------------------------------------------------------------ Bug #3345: Static object cache in SimpleRichObjectBuilder
>>>>>>>> causing
>>>>>>>> memory leak <https://redmine.open-bio.org/issues/3345>
>>>>>>>>
>>>>>>>> * Author: Tjeerd Boerman
>>>>>>>> * Status: Closed
>>>>>>>> * Priority: Normal
>>>>>>>> * Assignee: biojava-dev list
>>>>>>>> * Category: bio
>>>>>>>> * Target version: BioJava 1.8 - legacy
>>>>>>>> * URL:
>>>>>>>>
>>>>>>>> I encountered a memory problem when parsing many Genbank
>>>>>>>> files with the Biojava 1.8.1. The parsed files were protein
>>>>>>>> GPFF (GenPept Flat File format) files from the latest RefSeq
>>>>>>>> release. The application tried to parse millions of protein
>>>>>>>> sequences from these files, but an OutOfMemoryException would
>>>>>>>> always occur after some time. The used heap space would
>>>>>>>> gradually increase from a couple hundred megabytes to over
>>>>>>>> 1.5 GB, until the heap could grow no further. Upon inspection
>>>>>>>> I discovered a HashMap in RichSequenceBuilder was the culprit:
>>>>>>>>
>>>>>>>> public class SimpleRichObjectBuilder implements RichObjectBuilder {
>>>>>>>>
>>>>>>>> private static Map objects = new HashMap();
>>>>>>>>
>>>>>>>> public Object buildObject(Class clazz, List paramsList) {
>>>>>>>> ...
>>>>>>>>
>>>>>>>> // return the constructed object from the hashmap if
>>>>>>>> there already
>>>>>>>> if (contents.containsKey(ourParamsList)) return
>>>>>>>> contents.get(ourParamsList);
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>> // Instantiate it with the parameters
>>>>>>>> Object o = c.newInstance(ourParamsList.toArray());
>>>>>>>>
>>>>>>>> // store it for later in the singleton map
>>>>>>>> contents.put(ourParamsList, o);
>>>>>>>>
>>>>>>>> ...
>>>>>>>> }
>>>>>>>> }
>>>>>>>>
>>>>>>>> It seems the *objects* Map in SimpleRichSequenceBuilder is
>>>>>>>> used as a static cache for objects, but when many different
>>>>>>>> objects are created this cache grows out of control. I am
>>>>>>>> unsure if this is a 'true' bug, but for my application it was
>>>>>>>> a definite problem. My fix was to simply comment out the
>>>>>>>> *contents.put()* statement, but I'm sure there is a better
>>>>>>>> way to resolve this - perhaps by making the use of the cache
>>>>>>>> optional through a configuration option.
>>>>>>>>
>>>>>>>> ------------------------------------------------------------------------ You have received this notification because you have either subscribed to it, or are involved in
>>>>>>>> it.
>>>>>>>> To change your notification preferences, please click here
>>>>>>>> and login: http://redmine.open-bio.org
>>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> biojava-dev mailing list
>>>>>>> biojava-dev at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --------------------------------
>>>>>> George Waldon
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>
More information about the biojava-dev
mailing list