[Biojava-dev] [BioJava - Bug #3345] (Closed) Static object cache in SimpleRichObjectBuilder causing memory leak
George Waldon
gwaldon at geneinfinity.org
Fri Apr 27 04:44:56 UTC 2012
Hi Tjeerd,
The use of a singleton map for rich objects with a Hibernate database
is described in the biojavax documentation; see also the class
BioSQLRichObjectBuilder.
You can implement your solution simply by creating your own
implementation of RichObjectBuilder as I mentioned before. Maybe you
could store the cache values as WeakReference and recreated objects
only when needed. Or you could filter the list of parameters, e.g. in
the case of SimpleDocRef return the same singleton representing an
empty DocRef if you are not interested in those, etc.
- George
Quoting Tjeerd Boerman <twboerman at gmail.com>:
> Hey,
>
> I cannot reopen the bug myself, but I'm sure Andreas or another
> manager of the bugtracker can.
>
> For my purposes the memory usage could be kept in check by disabling
> the cache in SimpleRichObjectBuilder. It's hard to say what the
> performance hit is, but it's not been noticeable in my application.
> A permanent solution needs some more thought.
>
> I double-checked, and all calls to SimpleObjectBuilder.getObject()
> go through RichObjectFactory.getObject() - it effectively delegates
> the call to the SimpleObjectBuilder. The SimpleObjectBuilder class
> keeps a cache of unrestricted size, with one object for every unique
> combination of class and parameters. The RichObjectFactory class
> also keeps a cache of objects, but this cache is restricted to
> containing instances for at most 20 different object classes at any
> given time.
> So we can think of this as a level 1 cache in RichObjectFactory, and
> a level 2 cache in SimpleObjectBuilder. The level 1 cache may
> contain at most 20 entries, and the level 2 cache may contain at
> most the amount of instantiable classes in BioJava, which will be a
> couple hundred *maximum*. I don't think this difference justifies
> the use of two caches.
>
> Possible fix:
> Unless I missed something, removing the cache from
> SimpleObjectBuilder and increasing the size of the LRU cache in
> RichObjectFactory to compensate seems like a good option. This would
> solve the unbounded growth of the level 2 cache by eliminating it.
>
>
>
> There is another issue with the level 1 cache that could cause
> unbounded growth. Look at this piece of code in
> RichObjectFactory.getObject():
>
> Map m = (Map)cache.get(applicationClass);
> if (!m.containsKey(paramsList)) {
> m.put(paramsList,builder.buildObject(applicationClass, paramsList));
> }
>
> For every unique combination of parameters, the object is built and
> stored in the Map associated with that class. This Map is not a LRU
> implementation, so creating 10,000,000 instances of a class (i.e.
> SimpleDocRef) with different parameters would cause every single
> instance to be saved in the cache. Such a situation would again
> cause memory problems.
>
> Possible fix:
> Well that's an easy one! Replace this simple Map cache with a LRU
> implementation, as proposed before.
>
> Regards,
> Tjeerd
>
>
> On 4/27/2012 12:03 AM, George Waldon wrote:
>> Hi Tjeerd,
>>
>> You can certainly reopen a bug.
>>
>> I am not the expert here, but looking at the code,
>> RichObjectFactory uses a LRU cache as a temporary cache for objects
>> that are reused often while SimpleRichObjectBuilder caches
>> everything. I doubt you can remove this last one.
>>
>> Maybe you can write your own RichObjectBuilder that would cache
>> only things you are interested in (to compare?). I bet you generate
>> a lot of DocRef that do not need to be cached; then use
>> RichObjectFactory.setRichObjectBuilder to set your partial builder
>> before you parse. Just an idea.
>>
>> Hope this helps,
>> George
>>
>> Quoting Tjeerd Boerman <twboerman at gmail.com>:
>>
>>> Wow. I'm quite emberassed, but the issue is actually not fixed in
>>> 1.8.2. The RichObjectFactory class uses an LRU cache, while the
>>> SimpleRichObjectBuilder class uses the cache described in my
>>> earlier post, and I mixed up the two. Sorry about that, I must
>>> have been staring at this code for too long today.
>>>
>>> It turns out the RichObjectFactory class is the only user of
>>> SimpleRichObjectBuilder, and has its own LRU cache. So the cache
>>> in SimpleRichObjectBuilder can probably be removed altogether. If
>>> the experts agree I would be happy to write a patch for this.
>>>
>>> Can the old bug report be reopened, or should I file a new bug
>>> report? Or should I submit a possible patch through this mailing
>>> list?
>>>
>>> Regards,
>>> Tjeerd
>>>
>>> On 4/26/2012 6:01 PM, redmine at redmine.open-bio.org wrote:
>>>> Issue #3345 has been updated by Andreas Prlic.
>>>>
>>>> * Status changed from New to Closed
>>>> * % Done changed from 0 to 100
>>>>
>>>> already fixed in 1.8.2
>>>>
>>>> ------------------------------------------------------------------------
>>>> Bug #3345: Static object cache in SimpleRichObjectBuilder causing
>>>> memory leak <https://redmine.open-bio.org/issues/3345>
>>>>
>>>> * Author: Tjeerd Boerman
>>>> * Status: Closed
>>>> * Priority: Normal
>>>> * Assignee: biojava-dev list
>>>> * Category: bio
>>>> * Target version: BioJava 1.8 - legacy
>>>> * URL:
>>>>
>>>> I encountered a memory problem when parsing many Genbank files
>>>> with the Biojava 1.8.1. The parsed files were protein GPFF
>>>> (GenPept Flat File format) files from the latest RefSeq release.
>>>> The application tried to parse millions of protein sequences from
>>>> these files, but an OutOfMemoryException would always occur after
>>>> some time. The used heap space would gradually increase from a
>>>> couple hundred megabytes to over 1.5 GB, until the heap could
>>>> grow no further. Upon inspection I discovered a HashMap in
>>>> RichSequenceBuilder was the culprit:
>>>>
>>>> public class SimpleRichObjectBuilder implements RichObjectBuilder {
>>>>
>>>> private static Map objects = new HashMap();
>>>>
>>>> public Object buildObject(Class clazz, List paramsList) {
>>>> ...
>>>>
>>>> // return the constructed object from the hashmap if there already
>>>> if (contents.containsKey(ourParamsList)) return
>>>> contents.get(ourParamsList);
>>>>
>>>> ...
>>>>
>>>> // Instantiate it with the parameters
>>>> Object o = c.newInstance(ourParamsList.toArray());
>>>>
>>>> // store it for later in the singleton map
>>>> contents.put(ourParamsList, o);
>>>>
>>>> ...
>>>> }
>>>> }
>>>>
>>>> It seems the *objects* Map in SimpleRichSequenceBuilder is used
>>>> as a static cache for objects, but when many different objects
>>>> are created this cache grows out of control. I am unsure if this
>>>> is a 'true' bug, but for my application it was a definite
>>>> problem. My fix was to simply comment out the *contents.put()*
>>>> statement, but I'm sure there is a better way to resolve this -
>>>> perhaps by making the use of the cache optional through a
>>>> configuration option.
>>>>
>>>> ------------------------------------------------------------------------
>>>> You have received this notification because you have either
>>>> subscribed to it, or are involved in it.
>>>> To change your notification preferences, please click here and
>>>> login: http://redmine.open-bio.org
>>>>
>>> _______________________________________________
>>> biojava-dev mailing list
>>> biojava-dev at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>
>>
>>
>>
>> --------------------------------
>> George Waldon
>>
>
More information about the biojava-dev
mailing list