2020.10.09

Guava Cache

こんにちは、次世代システム研究室のN.M.です。

Caches

We cache a key and it’s value, these together make up a cache entry. The key is the input to a computation and the value is the result of the computation. It may be more efficient to store a hash of the key, more on this later.

You should consider caching a computation when the following apply:
  • We want to trade memory for computation cost (time and other resources such as CPU or IO)
  • The computation is costly
  • The computation runs more than once with the same input
Cost reduction is a function of number of cache hits multiplied by the cost of the computation minus the total memory consumed by the cache.

The weights of these factors will be different per application.

Memory consumed is the number of entries in the cache multiplied by storage required for a cache entry.

Caches do comparison of keys so we should have fast comparison logic for our keys, often hashed keys are used for this purpose.

Guava Cache Framework

First an example.

Let’s imagine that we have a costly computation which retrieves articles from some external datasource given search input (keywords and categories).
List<ArticleSummary> getArticleSummaries(SearchInput searchInput) {
   // Some expensive operation to retrieve the list of article summaries
   List<ArticleSummary> articleSummaries = ...
   return articleSummaries;
}
This expensive computation may query an external datasource and take a while so we don’t want to repeat this computation given the same input. A good time to use a cache! So let’s go ahead and build our cache.
LoadingCache<SearchInput, List> articleCache = CacheBuilder.newBuilder()
            .maximumSize(1000)
            .expireAfterWrite(10, TimeUnit.MINUTES)
            .removalListener(removalListener)
            .build(new CacheLoader<SearchInput, List>() {
                @Override
                public List load(SearchInput searchInput) throws Exception {
                    return getArticleSummaries(searchInput);
                }
            });
Let’s look at each of the components above and learn about some of the important classes of the Guava Cache library.

LoadingCache<K,V>

  • Interface to caching functionality
  • LoadingCache implementations are required to be thread safe
When you instantiate an instance of LoadingCache, you specify the classes for the key and value. In the example above the key is a SearchInput object and the value is a list of ArticleSummary objects.
@AllArgsConstructor
class SearchInput {
    final List<String> searchKeys;
    final List<String> categories;
}
@AllArgsConstructor
class ArticleSummary {
    final Url url;
    final String title;
    final String author;
}
Here we use the Lombok annotation AllArgsConstructor to generate a constructor that sets the values of our two fields searchKeys and categories.

So we can easily see by the LoadingCache(K, V) definition the types of the keys and values.

CacheBuilder<K,V>

Use this class to build and configure instances of LoadingCache.

In our example we instantiate and build a LoadingCache with a maximum size of 1000 entries.

We then tell the cache to expire entries 10 minutes after writing into the cache.

Lastly we register a listener object named removalListener to be notified whenever an entry is removed from the cache.

A simple RemovalListener definition is given below:
RemovalListener<SearchInput, List> removalListener = new RemovalListener<>() {
        @Override
        public void onRemoval(RemovalNotification<SearchInput, List> removalNotification) {
            removalNotification.getValue().forEach(articleSummary -> log.debug("removed article: {}", articleSummary.title));
        }
    };
This simple RemovalListener does nothing but produce some debug logs, but you may want more complex logic here if you are caching resources that need to be cleaned up, for example database connections that need to be closed when removed.

CacheLoader<K,V>

This class defines the logic to actually compute the value for the given key, this logic is defined inside the load(K key) method. This class is passed as a parameter to the CacheBuilder.build() method. The example CacheLoader.load(K) calls the costly getArticleSummaries(searchInput) method, which takes time to get the results. This is the reason for using a cache.
new CacheLoader<SearchInput, List>() {
                @Override
                public List load(SearchInput searchInput) throws Exception {
                    return getArticleSummaries(searchInput);
                }
            }
 

Behavior

When we make a query via the LoadingCache.get(K) method one of two things can happen:
  • Cache Hit
    • The value of K matches an existing key, the corresponding value is returned from the cache
  • Cache Miss
    • There is no match found for K
    • CacheLoader.load(K) method is called and runs in separate thread.
    • If other calls for same key occur during the load they will wait for result of the running load method.
When are entries removed?
At first it might seem that cache entries would be removed as soon as they become eligible for removal. Such as when they expire (if using expireAfterWrite or expireAfterRead) or the number of entries exceeds the maximum size (if maximumSize is set). But actually cache entries that are eligible for removal are removed during any unrelated cache write, or if cache writes happen infrequently during any unrelated cache read. The reason for this is, for realtime removal of eligible entries, Guava Cache would need a background process running continually. This process could possibly conflict with concurrent LoadingCache.get(key) calls, causing waits and degradation of cache performance. So instead of using a background process to clean up the cache, Guava does cache cleanup around cache writes or cache reads.

If you need to enforce a cache cleanup and are willing to risk some performance degradation to your cache clients you may use one of the LoadingCache.invalidate methods shown, see javadocs for more info.
  • void invalidate(Object key) – removes one entry matching key
  • void invalidateAll(Iterable<?> keys) – removes all entries matching the elements of keys
  • void invalidateAll() – removes all entries

How does your cache perform?

If you want to see some cache metrics use the LoadingCache.stats() method. This method returns a CacheStats object that contains statistics such as:
  • hit count – the number of times Cache lookup methods have returned a cached value
  • hit rate – the ratio of cache requests which were hits
  • miss count – the number of times Cache lookup methods have returned an uncached (newly loaded) value, or null
  • miss rate – the ratio of cache requests which were misses
  • load count – the total number of times that Cache lookup methods attempted to load new values
  • load success count – the number of times Cache lookup methods have successfully loaded a new value
  • load exception count – the number of times Cache lookup methods threw an exception while loading a new value
  • and more
Each of these counts starts from zero at cache startup.

Optimizing our Cache

Observant readers will have seen that we have a potentially costly operation caused by the structure of our cache. The SearchInput key contains two lists, one for searchKeys and categories. Since the cache checks for existing keys during a call to LoadingCache.get(key), or any of the other key access operations, keys within the cache are compared to the key parameter for equality.

Admittedly, in our example it is unlikely that these lists would be long, but let’s imagine that our SearchInput example can contain longs lists of search keys. In this case we should optimize the equals method in our key class.
@EqualsAndHashCode
class SearchInput {
    @EqualsAndHashCode.Exclude
    final List<String> searchKeys;
    @EqualsAndHashCode.Exclude
    final List<String> categories;
    @EqualsAndHashCode.Include
    final int searchKeysHash;
    @EqualsAndHashCode.Include
    final int categoriesHash;

    @VisibleForTesting
    TargetDomainsBlockDomains(List<String> searchKeys, List<String> categories) {
        this.searchKeys = searchKeys;
        this.categories = categories;
        this.searchKeysHash = searchKeys.hashCode();
        this.categoriesHash = categories.hashCode();
    }
}
We added two new fields searchKeysHash and categoriesHash. These int fields are hashes, computed from searchKeys, categories inside of the constructor.
The Lombok annotation EqualsAndHashCode, is used to generate a custom equals method: the important part is that we only check the two new int fields for equality.
In this way we speed up equality checks for the SearchInput key with long list fields.

Summary

Guava Cache’s Cache API gives a reliable and easy to use framework to speed up our applications whenever we do costly and repetitive computation. The following points were discussed:
  • cache loading due to a cache miss
  • when entries get removed
  • how to check your cache performance
  • complex keys and how they can slow your cache and a work-around
次世代システム研究室では、グループ全体のインテグレーションを支援してくれるアーキテクトを募集しています。インフラ設計、構築経験者の方、次世代システム研究室にご興味を持って頂ける方がいらっしゃいましたら、ぜひ募集職種一覧からご応募をお願いします。

 

皆さんのご応募をお待ちしています。

  • Twitter
  • Facebook
  • はてなブックマークに追加

グループ研究開発本部の最新情報をTwitterで配信中です。ぜひフォローください。

 
  • AI研究開発室
  • 大阪研究開発グループ