How does it work?

BlazingCache is a fast distributed cache.

We use a cache because we want to access data very quickly, without reading from a database or performing other expensive operations. This poses a problem: you can read data from your cache, but you need to invalidate it when the value stored in the cache is no longer valid, i.e. when it becomes "stale".

Let's use a simple database example: if you perform the same query many times, why not cache the result and keep it in a local variable? For instance, if you repeatedly issue the "SELECT * FROM FOO WHERE ID=1" query, you can save the result in a cache under the key "1". Then, when you update the row with ID=1, you need to invalidate the cache entry, telling the system that the value stored in the cache is no longer valid.
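
The example above can be sketched in a few lines of Java. This is only an illustration of the caching and invalidation idea, not BlazingCache code: the FooRepository class, its in-memory "database" map and the read counter are all hypothetical names introduced here.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: FooRepository and its in-memory "database" are hypothetical
// stand-ins, not part of BlazingCache. Not thread-safe, for brevity.
class FooRepository {
    // Stands in for the FOO table: ID -> row content.
    private final Map<String, String> database = new HashMap<>();
    // Local cache: the key is the row ID, the value is the cached query result.
    private final Map<String, String> cache = new HashMap<>();
    // Counts how many times the "expensive" query was really executed.
    private int databaseReads = 0;

    FooRepository() {
        database.put("1", "first version");
    }

    // Simulates "SELECT * FROM FOO WHERE ID=?" with a cache in front of it.
    String findById(String id) {
        String cached = cache.get(id);
        if (cached != null) {
            return cached; // cache hit: no database access
        }
        databaseReads++;
        String row = database.get(id);
        if (row != null) {
            cache.put(id, row);
        }
        return row;
    }

    // Updates the row and invalidates the cached value, which is now stale.
    void update(String id, String newValue) {
        database.put(id, newValue);
        cache.remove(id);
    }

    int databaseReads() {
        return databaseReads;
    }

    public static void main(String[] args) {
        FooRepository repo = new FooRepository();
        repo.findById("1");
        repo.findById("1");
        System.out.println("database reads after two lookups: " + repo.databaseReads());
        repo.update("1", "second version");
        System.out.println("value after update: " + repo.findById("1"));
    }
}
```

Without the cache.remove(id) call in update, the second lookup after an update would keep returning the old row.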

What does distributed mean?

You can think of a cache as a (Java) Map, for instance a ConcurrentHashMap. If you use a plain ConcurrentHashMap, however, you cannot safely run your code in more than one JVM: any invalidation needs to be propagated to the other JVMs, otherwise their caches would still contain stale data after the first invalidation.

BlazingCache simply coordinates the invalidation of cache entries across JVMs so as to ensure overall data consistency, and also adds some minor functionality such as managing data expiration and memory limits.

Why not use another distributed cache implementation?

With BlazingCache you get "only" a distributed cache. Distributed cache implementations from vendors usually offer plenty of functions (e.g. computing grid components) but, if all you need is a cache which "works well", BlazingCache may be the answer to your needs.

How does BlazingCache work in a distributed environment?

In local mode BlazingCache is little more than a ConcurrentHashMap that provides a few extra cache features, such as entry expiration and the ability to set an upper bound on the memory used by the cache itself.
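
To make the two extra features concrete, here is a minimal sketch of a local cache with entry expiration and a memory bound. It is not the BlazingCache implementation: for simplicity it uses a synchronized LinkedHashMap instead of a ConcurrentHashMap, it measures memory as the length of the stored byte arrays, and it evicts the oldest inserted entries first. The class name LocalCacheSketch, the eviction policy and the injected clock are all assumptions made for this example.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical sketch, not BlazingCache code.
class LocalCacheSketch {
    private static final class Entry {
        final byte[] data;
        final long expiresAtMillis;

        Entry(byte[] data, long expiresAtMillis) {
            this.data = data;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    // Insertion-ordered, so iteration starts from the oldest entry.
    private final Map<String, Entry> entries = new LinkedHashMap<>();
    private final long maxBytes;
    private final LongSupplier clock; // returns "now" in milliseconds
    private long usedBytes = 0;

    LocalCacheSketch(long maxBytes, LongSupplier clock) {
        this.maxBytes = maxBytes;
        this.clock = clock;
    }

    synchronized void put(String key, byte[] data, long ttlMillis) {
        remove(key); // drop any previous value so usedBytes stays accurate
        entries.put(key, new Entry(data, clock.getAsLong() + ttlMillis));
        usedBytes += data.length;
        // Assumed policy: evict the oldest entries until we are within the bound.
        Iterator<Entry> oldestFirst = entries.values().iterator();
        while (usedBytes > maxBytes && oldestFirst.hasNext()) {
            usedBytes -= oldestFirst.next().data.length;
            oldestFirst.remove();
        }
    }

    synchronized byte[] get(String key) {
        Entry entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            remove(key); // expired entries are dropped lazily, on access
            return null;
        }
        return entry.data;
    }

    synchronized void remove(String key) {
        Entry removed = entries.remove(key);
        if (removed != null) {
            usedBytes -= removed.data.length;
        }
    }

    synchronized long usedBytes() {
        return usedBytes;
    }

    public static void main(String[] args) {
        LocalCacheSketch cache = new LocalCacheSketch(1024, System::currentTimeMillis);
        cache.put("1", "row one".getBytes(), 60_000);
        System.out.println("bytes in use: " + cache.usedBytes());
    }
}
```

Passing the clock as a parameter is only a convenience for testing expiration without sleeping.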

In distributed mode it works much as it does in local mode, with the additional guarantee of the following properties:

If a client invalidates a key or performs a "put", the call blocks until all other clients holding the same entry perform the same invalidation.
When a client performs a "put", the server is notified of the corresponding event and, in turn, broadcasts the new value to every other client known to have the same entry in memory.
Conflicts and locks are resolved by the system automatically: any "problem" results in the invalidation of the entry which is in "doubt".

The invalidation works in a very simple manner: when a client invalidates an entry, the entry is first invalidated locally; afterwards, the client sends an invalidate message to the cache server, which routes it to every other client known to have the same entry in memory.
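
The invalidation flow just described can be sketched as an in-process simulation. This is not the BlazingCache API or its network protocol: the Server and Client classes and their method names are hypothetical, and the calls here are plain synchronous method calls, whereas the real system exchanges asynchronous messages over the network.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-process simulation, not BlazingCache code.
class InvalidationSketch {

    // The coordinator only tracks which clients hold which keys; it stores no values.
    static class Server {
        private final Map<String, Set<Client>> holders = new ConcurrentHashMap<>();

        void registerEntry(String key, Client client) {
            holders.computeIfAbsent(key, k -> ConcurrentHashMap.newKeySet()).add(client);
        }

        // Routes the invalidation to every other client known to hold the key.
        void invalidate(String key, Client source) {
            Set<Client> clients = holders.remove(key);
            if (clients == null) {
                return;
            }
            for (Client client : clients) {
                if (client != source) {
                    client.removeLocal(key);
                }
            }
        }
    }

    static class Client {
        private final Map<String, String> local = new ConcurrentHashMap<>();
        private final Server server;

        Client(Server server) {
            this.server = server;
        }

        // Keeps a value in local memory and tells the server this client holds the key.
        void cacheLocally(String key, String value) {
            local.put(key, value);
            server.registerEntry(key, this);
        }

        String get(String key) {
            return local.get(key);
        }

        // Step 1: invalidate locally. Step 2: ask the server to notify the other holders.
        void invalidate(String key) {
            local.remove(key);
            server.invalidate(key, this);
        }

        // Called by the server when another client invalidated the key.
        void removeLocal(String key) {
            local.remove(key);
        }
    }

    public static void main(String[] args) {
        Server server = new Server();
        Client a = new Client(server);
        Client b = new Client(server);
        a.cacheLocally("1", "row");
        b.cacheLocally("1", "row");
        a.invalidate("1");
        System.out.println("b still holds the entry: " + (b.get("1") != null));
    }
}
```

Because the server knows which clients hold each key, clients that never loaded the entry receive no message at all.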

Every operation is implemented using asynchronous protocols and the majority of the functions are handled client-side. This means that the cache scales very well and there is no need to use a thread for each client or pending operation.

Since the cache server is only a coordinator of clients, it does not keep a copy of any value. In distributed mode, to reduce the penalty of cache misses, clients can "fetch" data from other connected clients when a cache miss occurs: this helps to reduce the load on the database by reading data from peers instead of handling a full cache miss.
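
A minimal sketch of the "fetch" idea follows, again as an in-process simulation with hypothetical class and method names rather than the BlazingCache API. On a local miss the client first asks the server, which routes the request to a peer holding the key; only if no peer has the value does the client go to the database, represented here by a loader function supplied by the caller.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical in-process simulation, not BlazingCache code.
class FetchSketch {

    static class Server {
        // Key -> clients known to hold it. No values are stored on the server.
        private final Map<String, Set<Client>> holders = new ConcurrentHashMap<>();

        void registerEntry(String key, Client client) {
            holders.computeIfAbsent(key, k -> ConcurrentHashMap.newKeySet()).add(client);
        }

        // Routes a fetch request to some other client that holds the key.
        Optional<String> fetch(String key, Client requester) {
            for (Client peer : holders.getOrDefault(key, Collections.emptySet())) {
                if (peer != requester) {
                    String value = peer.peekLocal(key);
                    if (value != null) {
                        return Optional.of(value);
                    }
                }
            }
            return Optional.empty();
        }
    }

    static class Client {
        private final Map<String, String> local = new ConcurrentHashMap<>();
        private final Server server;
        private int databaseLoads = 0;

        Client(Server server) {
            this.server = server;
        }

        String get(String key, Function<String, String> databaseLoader) {
            String value = local.get(key);
            if (value != null) {
                return value; // local hit
            }
            value = server.fetch(key, this).orElse(null); // try the peers first
            if (value == null) {
                databaseLoads++;
                value = databaseLoader.apply(key); // last resort: the database
            }
            if (value != null) {
                local.put(key, value);
                server.registerEntry(key, this);
            }
            return value;
        }

        String peekLocal(String key) {
            return local.get(key);
        }

        int databaseLoads() {
            return databaseLoads;
        }
    }

    public static void main(String[] args) {
        Server server = new Server();
        Client a = new Client(server);
        Client b = new Client(server);
        Function<String, String> loader = id -> "row " + id;
        a.get("1", loader);
        b.get("1", loader);
        System.out.println("database loads by b: " + b.databaseLoads());
    }
}
```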

What about network problems? High Availability?

The BlazingCache server is very fast, but it makes a very important assumption about clients:

When a client is disconnected, its cache is empty.

The same assumption is made by the client:

If no connection to the server is active, my local cache is empty.

This fail-fast approach handles slow clients and network problems very well, and simplifies the handling of client state from the server's point of view.

Furthermore, from the client's perspective, this means that if there is no connection to the server the cache is considered empty and "invalidations" and "puts" cannot be completed; such operations block until a configured timeout is reached.
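
The client-side behaviour can be sketched as follows. This is a simplified model under stated assumptions, not BlazingCache code: the class FailFastClientSketch and its methods are hypothetical, the timeout is passed on each call instead of being read from configuration, a timed-out operation simply returns false, and the messages a real client would exchange with the server are left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not BlazingCache code. All state is guarded by "this".
class FailFastClientSketch {
    private final Map<String, String> local = new HashMap<>();
    private boolean connected = false;

    synchronized void onConnected() {
        connected = true;
        notifyAll(); // wake up operations blocked in awaitConnection
    }

    // Losing the connection empties the cache: nothing stale can be served later.
    synchronized void onDisconnected() {
        connected = false;
        local.clear();
    }

    // While disconnected the cache is considered empty.
    synchronized String get(String key) {
        return connected ? local.get(key) : null;
    }

    // Returns false if no connection became available before the timeout.
    synchronized boolean put(String key, String value, long timeoutMillis) {
        if (!awaitConnection(timeoutMillis)) {
            return false;
        }
        local.put(key, value);
        return true;
    }

    synchronized boolean invalidate(String key, long timeoutMillis) {
        if (!awaitConnection(timeoutMillis)) {
            return false;
        }
        local.remove(key);
        return true;
    }

    // Blocks until connected or until the timeout elapses. Caller must hold the monitor.
    private boolean awaitConnection(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!connected) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            try {
                TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        FailFastClientSketch client = new FailFastClientSketch();
        client.onConnected();
        client.put("1", "row", 100);
        client.onDisconnected();
        System.out.println("value while disconnected: " + client.get("1"));
        System.out.println("invalidate completed: " + client.invalidate("1", 100));
    }
}
```

Clearing the local map on disconnection is what lets the server forget about a client as soon as its connection drops.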

You can start more than one server at the same time, but only one will serve client connections; the others will work in backup mode and, if the current leader fails, one of them will take over leadership. This also means that there is little reason to run more than two or three cache servers in production.