How do I write lock-free non-blocking code and why should I care?

What are the benefits of lock free non-blocking code? Traditional Java programming uses mutex locks to give mutually exclusive access to resources. How can immutables and snapshots be used to write code without these locks. What libraries exist that provide algorithms that we can use out-of-the-box to make our job easier? What are the consequences of continuing to write blocking code and how will our apps fail?

Lock-free code will not make threads and processes in your application wait for one another to do their jobs, under most circumstances, which makes it highly suitable for low latency mission critical applications that cannot freeze. They use predictable immutable snapshots of objects, so that the computation on them could not be held back by any other thread, or action the application takes. Instead once these snapshots are created, each thread can pretend its the only one working on the data. Much of the success of lock free code depends on the non-trivial process of creating these snapshots efficiently. There exists a number of libraries both in Scala (immutable collections, TrieMap) and in Java (Guava immutables), that will let you get started fast writing non-blocking code.

A helpful analogy

To understand what lock free code is, think of an intersection with a traffic light in the middle. Everyone has to wait for the light to turn green to proceed. For everyone else the light will be read, to give us mutually exclusive access to the intersection. This works to prevent accidents, but it also means most cars will be waiting at any given time.

Now think of a roundabout, which allows multiple cars moving around in the intersection at any given time. Less cars have to wait, so everyone gets home for dinner faster. But sometimes you still have to wait when you are trying to get on it and a car is right in front of you. So you wait less, but still wait a little.

But if you are really lucky, you have a freeway between your office and your house, because freeways when they work well, don’t make you stop at all. You are not blocked, and you don’t have to ask for permission from anyone to proceed. Of course, this will be the fastest way for you to get home. Code is no different!

Scala makes non-blocking easy

Scala’s programming model has some built in patterns that will allow us to work with immutable data-structures and still change data. These patterns are based on making efficient copies, a staple of functional programming, and swapping them out using volatile references to these copies. When writing Scala code, we are encouraged to use these immutable data structures. If we want a Map, we are told to use an immutable.Map. If we want a list, we better pick an immutable.List, or we won’t make it trough code review at any respectable developer shop. This is because the side effect of using these immutable data structures is non-blocking code, which runs faster and is more reliable. Read on to see how.

Using your own volatile references to an immutable collection

The example below creates a list of integers 1, 2, 3, modifies it by adding a 0 to the front of the list, in a completely thread-safe manner, without using synchronization or other mutex locks:

//Thread 1: 
@volatile var list = List(1, 2, 3)

//Thread 2: 
list = 0 +: list

This implies that when you code this way, you are still responsible for coming up with what kinds of snapshots you create, which is sometimes very hard. Another thing to keep in mind is that we still have to do explicit concurrency management. You still have to create the new snapshot by hand and then store it back to a variable and the variable must be volatile. That’s two many things to keep in mind that can go wrong.

In this very simple example of updating a list, a var pointing to an immutable list may be okay. But let’s consider a more complex locking problem that cannot be easily solved by using the above pattern.

A classic concurrency problem goes like this: We have two accounts, one belongs to Joe, the other belongs to Jack. Write code that debits one account and credits the other in a way that both accounts always remain correct.

The first thing that we realize is that for this problem we need a map. We have to look up each account by the name “Joe” or “Jack”, modify it, then store back. Maps are good at finding things by name so lets see what options we have if we want to write lock free code using maps.

ConcurrentHashMap

The technique shown above is the basis of working with only immutables to create all kinds of data structures that store changing data. Some variant of this is used in Java’s concurrent HashMap, which is lock free for gets.

static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        volatile V val;
        volatile Node<K,V> next;
   [...]
}
transient volatile Node<K,V>[] table;

Unfortunately, the way concurrent HashMap is implemented is not completely lock free. For mutations it does lock, although it only locks the pieces of the data that will change. For instance it locks a collision list by locking the first node, instead of locking the full hash table, when it needs to change that list. This takes us closer to our “dream” of lock free computing, it does not fully solve the problem, though.

Volatiles are often combined with cleverly chosen snapshots, which are immutable objects. These snapshots must contain the data that we want to change atomically, for the pattern to work. In this case the collision list was the unit which we wanted to change atomically. Lets see if there is a better data structure that can do this without any locking.

Using TriMap

There is a class in the Scala concurrent collection library that will hide all the concurrency management complexity from us just like Java’s concurrent HashMap does, but will do it in a completely lock-free way. It hides this complexity from you and manages the immutable snapshots so that you end up with very fast collections. This class is called TrieMap.

TriMaps “a concurrent thread-safe lock-free implementation of a hash array mapped trie”, according to the documentation. You get HashMap performance in a concurrent environment, which won’t block! TrieMaps are constructed using a combination of hashing (such as in hash tables), trees, and immutable snapshots to create a very clever algorithm based on CAS operations that is both very fast and will not block the execution of any thread.

So we are getting closer to our goal, although we haven’t quite reached Nirvana quite yet because we not only have the problem of maintaining the account map in a lock free way, but also have the issue that a debit from one account must be done atomically with a credit to the other account.

Imagine if we tried to transfer $100 from Joe to Jack and concurrently from another thread, $200 from Jack to Joe, where both of them had a $1000 account balance. First we would retrieve from Joe’s account, get the balance, decrement it by $100, and write it back. Then we would get Jack’s account, get the balance, increment it by $100, then write it back. We would do the same in reverse for the other transaction for $200.

But what if the first transaction read Joe’s balance as $1000, subtracted $100 from it making it $900, while the second transaction already incremented Joe’s balance to $1200 and wrote it back because it was running faster. So writing back $900 is wrong. It should be writing back $1100, because Joe’s balance was increased in the mean time externally. We stole $200 from poor Joe! This is the read-modify-write problem. For this to work correctly, we would still need to synchronize the debit and credit to make them atomic. Unless of course we use an actor.

Using an actor

Actors are a different kind of solution to this problem. They have exclusive access to their arbitrarily complex state. They sequence everything and push the updates through an event queue, so that only one thread is actually at work at any given time. This means that there is no need for you to worry about concurrency issues, since it all becomes single threaded programming within the actor. So inside the actor you might as well use mutable maps or list buffers, or anything else that you find easiest to work with. The “outside world” can never get directly to the actor state, so you’re isolated from any external interference. The actor updates its own state in response to immutable events it receives.

In the actor model, the actor is said to have a mailbox. Each actor processes its mailbox one message at a time sequentially. It’s only possible to talk to actors through immutable events. In response to each message the actor will change its state. Here again we have the requirement that we cannot block an actor on any condition for the model to work well. If we blocked inside an actor, we would effectively shut that actor down which could lead to dead locks amongst other things. Actors work in hierarchies and have a concept of supervision. When an actor crashes, it’s supervising actor can restart it or take some other action to heal the system.The actor model was initially popularized by Erlang but it is best implemented on the JVM by Akka, which works better when used from Scala, but can also be used from Java.

Event Sourcing

Actor’s are superstars in maintaining state in a concurrent environment, certainly when we consider the alternatives. If that was all they did, it would already be well worth it to base highly concurrent systems on the actor model. They have one other remarkable quality though: They can retroactively fix their state!

Think of how actors maintain their state: They get all state change requests from events. Sticking with the example above, when maintaining financial accounts, we would receive transactions which would be added to the account and the new account balance would be calculated. Now imagine that the account we are maintaining is a credit card account and a transaction we applied last week was reported fraudulent. We would have to go back and fix the account by removing all the fraudulent transactions we can find. Imagine how hard this task would be in complex financial systems.

There is an ingenious solution to this problem. Since all actor state changes are based on events, we can simply persist their state in a way that we write the events to persistent storage, not the actual current run-time state of the actor. Then when we restart the actor, its state would be rebuilt by replaying the events that lead to the last current state. Now fixing a problem becomes very easy because we just have to delete the fraudulent credit card charge events, replay the state from persistent storage, and everything would automatically fall in place without any further work.

This method is called Event Sourcing and makes retroactively fixing state changes easy. The actor model directly supports this technique and it is considered to be one of the additional benefits of using actors in industries which require fixing errors, such as the financial industry

Conclusion

It’s safe to say that the above examples show a trend towards tools which can maintain state more efficiently. ConcurrentHashMap was an improvement over Java’s old Hashtable, because it removed locking for reads and reduced locking for writes. Scala’s TrieMaps raised the bar by also removing all locking for writes, but still left the door open for other race conditions. Actors help us safeguard our state even more by making one thread doing all mutation. Event Sourcing even allow us to fix state updates retroactively. The result is faster and more reliable code that runs better in highly concurrent and multi-processing environments.