The Java Virtual Machine (JVM) offers several techniques for working safely with data in multi-threaded applications: we can declare a variable volatile, we can synchronize, we can explicitly define and use a lock from java.util.concurrent.locks, or we can use one of the classes from the java.util.concurrent.atomic package. Volatile variables are often thought of as a more efficient alternative to synchronization, but is declaring a variable volatile always enough to make it safely usable from multiple threads?

To make a variable safely usable from multiple threads in Java, we must consider two potential problems: memory consistency issues and thread interference. The former refers to the visibility of one thread's state, its data, from other threads, while the latter has to do with race conditions, when instructions in multiple threads are carried out in an unpredictable order. The volatile keyword fixes memory consistency issues, but it will not help with race conditions. In those cases we must synchronize or use an explicit lock to make the code thread-safe. In some well-defined situations we may be able to use one of the atomic classes, which take care of both problems at the same time and do so very efficiently.

The “visibility” problem

Imagine you have two threads and the scheduler decides to execute them on different cores in the CPU. One of them will create a variable and initialize it. The other will print it.

Thread 1: myInt = 1

Thread 2: println(myInt)

Now remember that a couple of lines of Java code turn into many bytecode instructions for the Java Virtual Machine. Bytecode is Java's assembly code, and it is in turn translated into even more machine instructions. These instructions need temporary data, memory references, and other state to do their work, and this data may be held in the registers or cache of the core they run on. Since the two threads run on different cores, there is no guarantee that one thread will see the other's work unless we explicitly ask for it.

The “race condition” problem

Besides the memory consistency issue explained above, there is also the issue of thread interference. It is best explained by the following example:

Thread 1: @volatile var myInt = 0

Thread 2: println(s"T2=$myInt"); myInt += 1

Thread 3: println(s"T3=$myInt"); myInt += 1

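In the above example, the statements in threads 2 and 3 are nondeterministic, because we don't know which thread prints and increments first. Depending on the order in which the threads run, we may get T2=0 and T3=1 as output, or T2=1 and T3=0, or 0 and 0 if both print statements happen before either increment. Worse, because myInt += 1 is a read, an addition, and a write, both threads can read the same old value and one of the increments can be lost, even though myInt is volatile. This is not good!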
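The same interference can be reproduced in Java. The following is a minimal sketch, not a benchmark: the class name VolatileRaceDemo and the loop size are made up for illustration. It lets two threads increment a volatile counter many times so that lost updates become likely.

```java
// Sketch: two threads increment a shared volatile counter without any locking.
class VolatileRaceDemo {
    // volatile makes every write visible to other threads, but "counter++"
    // is still three separate steps: read, add one, write back.
    static volatile int counter = 0;

    // Runs two threads that each increment the counter the given number of times.
    static int runOnce(int incrementsPerThread) throws InterruptedException {
        counter = 0;
        Runnable work = () -> {
            for (int i = 0; i < incrementsPerThread; i++) {
                counter++; // not atomic: both threads may read the same old value
            }
        };
        Thread t2 = new Thread(work, "T2");
        Thread t3 = new Thread(work, "T3");
        t2.start();
        t3.start();
        t2.join();
        t3.join();
        return counter;
    }

    public static void main(String[] args) throws InterruptedException {
        int expected = 2 * 100_000;
        int actual = runOnce(100_000);
        // actual is frequently smaller than expected because increments get lost.
        System.out.println("expected=" + expected + " actual=" + actual);
    }
}
```

The result varies from run to run; the only thing that is certain is that it can never exceed the expected total.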

The 4 ways of fixing these problems

Synchronization and memory barriers

In the memory consistency issue, threads get to see what other threads are doing when the results held in core-local caches and registers are made visible in main memory. The processor supports this at a low level with memory barriers. A read barrier makes sure that reads after the barrier observe the writes that were published before it, so whatever happened before is actually reflected in the state of the memory that the reader sees. Similarly, a write barrier makes sure that writes before the barrier are published before any write that follows it. In Java terms, these barriers create a happens-before relationship: when a thread leaves a synchronized block, everything it wrote is visible to the next thread that enters a synchronized block on the same monitor.

Locks

We can also use locks from java.util.concurrent.locks to create these memory barriers. The difference is that an explicit lock gives us more control. For instance, we can have multiple wait sets, each corresponding to a condition, which means we can block on multiple conditions. We can also have locks that are not fully nested within each other. Synchronization, which under the hood also uses a lock (the object's monitor), requires that the first lock we acquire be released last. Explicit locks do not have this restriction. Synchronization can be thought of as a special case of using locks, one which requires the locks to be nested.
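The idea of one lock with several wait sets can be sketched as a small bounded buffer. This is a hypothetical example (the BoundedBuffer class and its method names are not from any library); it uses ReentrantLock and two Condition objects so that producers and consumers wait separately.

```java
import java.util.ArrayDeque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical bounded buffer: one explicit lock, two separate wait sets.
class BoundedBuffer {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();  // producers wait here
    private final Condition notEmpty = lock.newCondition(); // consumers wait here
    private final ArrayDeque<Integer> items = new ArrayDeque<>();
    private final int capacity;

    BoundedBuffer(int capacity) { this.capacity = capacity; }

    void put(int value) throws InterruptedException {
        lock.lock();
        try {
            while (items.size() == capacity) notFull.await(); // releases the lock while waiting
            items.addLast(value);
            notEmpty.signal(); // wake one waiting consumer only
        } finally {
            lock.unlock();
        }
    }

    int take() throws InterruptedException {
        lock.lock();
        try {
            while (items.isEmpty()) notEmpty.await();
            int value = items.removeFirst();
            notFull.signal(); // wake one waiting producer only
            return value;
        } finally {
            lock.unlock();
        }
    }

    // Pushes 1..n through a buffer of capacity 2 from a producer thread and sums them here.
    static int sumThroughBuffer(int n) throws InterruptedException {
        BoundedBuffer buffer = new BoundedBuffer(2);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) buffer.put(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int sum = 0;
        for (int i = 0; i < n; i++) sum += buffer.take();
        producer.join();
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("sum=" + sumThroughBuffer(5)); // prints sum=15
    }
}
```

With synchronized, wait and notify, both kinds of waiters would share the single wait set of the monitor, so a wake-up could reach the wrong kind of thread.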

These memory barriers work by forcing all relevant results into main memory, but they are costly. For one thing, the CPU must sacrifice some parallelism to do this, because it is bringing results back from the parallel cores. It is also moving data around, which takes time. Wouldn't it be great if there was a better way?

Java Volatiles and Scala Volatiles

In some cases there is a better way: using volatile variables. You can create volatile primitive variables, but you can also create volatile references, which opens up a world of possibilities for writing lock-free code.

In Java you use the volatile keyword, while in Scala you must use the @volatile annotation. They both do the same thing.

Remember that declaring a variable and initializing it can involve multiple steps in the processor, even though it may be only one line in your source code. If one thread writes a variable and another prints it without volatile or synchronization, the printing thread may see a stale value, or may never see the update at all. For a non-volatile long or double, the Java Language Specification even allows the write to be split into two 32-bit halves, so a reader could see a half-written value, and a reference published this way may point to an object whose fields are not yet fully initialized. (A write to an int is always atomic, so an int can be stale but not half-written.) When you use volatile, just like with synchronization, you are guaranteed that other threads will see the variable in a consistent state. Because volatile also guarantees happens-before ordering, if thread 1 writes the variable and thread 2 then reads it, thread 2 will see the fully written value, together with everything thread 1 wrote before it. Again, without concurrency management there would be no such guarantee, because the compiler or the processor may execute things in a different order if it "thinks" that would be more efficient. So this happens-before ordering is important.
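The happens-before guarantee described above can be illustrated with a small sketch. The names VolatileFlagDemo, payload and ready are hypothetical: one thread writes a plain field and then a volatile flag, and the other spins on the flag before reading the field.

```java
// Sketch: a volatile flag publishes a plain field written before it.
class VolatileFlagDemo {
    static int payload = 0;                // plain field, written before the flag
    static volatile boolean ready = false; // the volatile write publishes the payload

    static int publishAndRead() throws InterruptedException {
        payload = 0;
        ready = false;
        int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!ready) {
                Thread.onSpinWait(); // busy-wait until the volatile flag flips
            }
            // The volatile read of ready happens after the volatile write,
            // so the earlier write to payload is guaranteed to be visible here.
            seen[0] = payload;
        });
        reader.start();
        payload = 42;  // ordinary write
        ready = true;  // volatile write: everything written before it is published
        reader.join();
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(publishAndRead()); // prints 42
    }
}
```

If ready were not volatile, the reader would be allowed to spin forever or to observe the old payload.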

Atomics

When we use a class from the java.util.concurrent.atomic package, we no longer have to use either volatile or synchronization ourselves. The atomic classes fix both types of concurrency problems: the visibility of thread data from other threads, and race conditions. The main drawback of this technique is that it can only be used in a small number of well-defined situations, because all interactions with our shared variables have to be expressed through the APIs of the atomic classes we use. But if we manage to do this, our code will be fast and non-blocking.

Atomic operations are defined for the read-modify-write cycle of primitive data types such as integers and longs, for object references, for arrays of these types, and for accumulators. They take advantage of the direct support that modern processors provide for these operations, such as compare-and-swap instructions. This means that if we can express our algorithm in terms of updates to the types for which atomic updates are defined, we can make our code run very fast while still maintaining the high level of protection that synchronization would provide.
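As an illustration, here is a minimal sketch of a shared counter built on AtomicInteger and on the LongAdder accumulator. The class name AtomicCounterDemo and the thread counts are assumptions made for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;

// Sketch: counting from several threads without volatile, synchronized, or locks.
class AtomicCounterDemo {
    // Returns { AtomicInteger total, LongAdder total } after all threads finish.
    static long[] countWithAtomics(int threads, int perThread) throws InterruptedException {
        AtomicInteger atomic = new AtomicInteger();
        LongAdder adder = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    atomic.incrementAndGet(); // atomic read-modify-write
                    adder.increment();        // accumulator designed for heavy contention
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        return new long[] { atomic.get(), adder.sum() };
    }

    public static void main(String[] args) throws InterruptedException {
        long[] totals = countWithAtomics(4, 100_000);
        System.out.println(totals[0] + " " + totals[1]); // prints 400000 400000
    }
}
```

Unlike the plain volatile counter, no increment is lost here, because each update is a single indivisible operation.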

Examples for each kind of solution

Example 1: When volatile will work by itself

Volatile is enough when you update a variable in a way that does not depend on its current value. For instance, your variable only contains a timestamp which is periodically updated from multiple threads, while another thread periodically reports the current timestamp by logging or printing it.
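A minimal sketch of this timestamp scenario might look as follows. The LastUpdated class and its methods are hypothetical names chosen for the example.

```java
// Sketch: a timestamp that is overwritten blindly, so volatile alone is sufficient.
class LastUpdated {
    // volatile long: reads and writes are atomic and visible to all threads.
    private volatile long lastUpdatedMillis = 0L;

    // The new value does not depend on the old one, so there is nothing to race on.
    void touch() { lastUpdatedMillis = System.currentTimeMillis(); }

    long get() { return lastUpdatedMillis; }

    // Two writer threads update the stamp; the caller reads it afterwards.
    static long demo() throws InterruptedException {
        LastUpdated stamp = new LastUpdated();
        Runnable writer = () -> {
            for (int i = 0; i < 1_000; i++) stamp.touch();
        };
        Thread w1 = new Thread(writer);
        Thread w2 = new Thread(writer);
        w1.start();
        w2.start();
        w1.join();
        w2.join();
        return stamp.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("last update at " + demo());
    }
}
```

Whichever writer runs last wins, which is exactly the behavior wanted for a "most recent update" stamp.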

Example 2: When volatile isn’t enough and we must synchronize

The classic example of an update that depends on the variable's current value is the counter. Updating a counter variable requires a CPU core to read the old value, modify it, and write the new value back. When two or more cores do this in parallel, they may interfere with each other and produce a different result each time. Synchronization prevents this problem by using a lock, which makes certain that only one thread at a time acts on the variable during the read-modify-write cycle.
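The counter can be sketched with synchronized methods as follows. SynchronizedCounter is a hypothetical class name; the thread and iteration counts are arbitrary.

```java
// Sketch: a counter whose read-modify-write cycle is protected by the object's monitor.
class SynchronizedCounter {
    private int count = 0; // no volatile needed: the monitor also provides visibility

    synchronized void increment() { count++; } // only one thread at a time gets in here

    synchronized int get() { return count; }

    static int countWithThreads(int threads, int perThread) throws InterruptedException {
        SynchronizedCounter counter = new SynchronizedCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) counter.increment();
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countWithThreads(4, 100_000)); // prints 400000
    }
}
```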

Example 3: When we must use an explicit lock

Sometimes the strict nesting that synchronization requires is not possible because of the data structure we are using. Think of how you would lock a stream. You may have to lock one frame and the next, then release the first lock and lock the frame after the second one, kind of like moving a lock window over the frames of the stream. This technique is often called hand-over-hand locking, and it can only be done with explicitly created locks.
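The moving lock window can be sketched on a sorted linked list instead of a stream. Everything here (HandOverHandList, Node, insert, snapshot) is a hypothetical illustration, and the list supports insertion only, which keeps the sketch short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: a sorted, insert-only linked list traversed with a sliding window of locks.
class HandOverHandList {
    private static final class Node {
        final int value;
        Node next; // only read or written while this node's lock is held
        final ReentrantLock lock = new ReentrantLock();
        Node(int value, Node next) { this.value = value; this.next = next; }
    }

    private final Node head = new Node(Integer.MIN_VALUE, null); // sentinel, never removed

    void insert(int value) {
        Node prev = head;
        prev.lock.lock();
        try {
            Node curr = prev.next;
            while (curr != null && curr.value < value) {
                curr.lock.lock();   // take the next lock first...
                prev.lock.unlock(); // ...then release the one behind us (not nested)
                prev = curr;
                curr = prev.next;
            }
            prev.next = new Node(value, curr); // safe: we hold prev's lock
        } finally {
            prev.lock.unlock();
        }
    }

    List<Integer> snapshot() {
        List<Integer> out = new ArrayList<>();
        Node prev = head;
        prev.lock.lock();
        try {
            Node curr = prev.next;
            while (curr != null) {
                curr.lock.lock();
                prev.lock.unlock();
                out.add(curr.value);
                prev = curr;
                curr = prev.next;
            }
        } finally {
            prev.lock.unlock();
        }
        return out;
    }

    // One thread inserts the even numbers below n, another the odd ones.
    static List<Integer> buildConcurrently(int n) throws InterruptedException {
        HandOverHandList list = new HandOverHandList();
        Thread evens = new Thread(() -> { for (int i = 0; i < n; i += 2) list.insert(i); });
        Thread odds = new Thread(() -> { for (int i = 1; i < n; i += 2) list.insert(i); });
        evens.start();
        odds.start();
        evens.join();
        odds.join();
        return list.snapshot();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(buildConcurrently(10)); // prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

The first lock acquired is released while a later one is still held, which is exactly the ordering that nested synchronized blocks cannot express. Because every thread acquires locks in list order, the threads cannot deadlock on each other.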

When you want to lock based on different sets of conditions, you also need an explicit lock. For instance, you may have a write lock and read locks for an object, and release them alternately, letting a bit of writing and then a bit of reading occur. This could not be done with synchronization, because all threads waiting for exclusive access to the object are in the same wait set.
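One possible reading of the read and write lock idea is sketched below with ReentrantReadWriteLock. The ReadWritePair class and its invariant are made up for illustration: readers may run in parallel with each other, but never while a writer is halfway through an update.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: two fields kept consistent with a read lock and a write lock.
class ReadWritePair {
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    private int value = 0;
    private int doubled = 0; // invariant: doubled == 2 * value

    void set(int v) {
        rw.writeLock().lock(); // exclusive: no readers or other writers
        try {
            value = v;
            doubled = 2 * v;
        } finally {
            rw.writeLock().unlock();
        }
    }

    boolean isConsistent() {
        rw.readLock().lock(); // shared: many readers may hold it at once
        try {
            return doubled == 2 * value;
        } finally {
            rw.readLock().unlock();
        }
    }

    // One writer and two readers run concurrently; returns false if any reader
    // ever observed a half-finished update.
    static boolean stressTest(int writes) throws InterruptedException {
        ReadWritePair pair = new ReadWritePair();
        AtomicBoolean alwaysConsistent = new AtomicBoolean(true);
        Thread writer = new Thread(() -> {
            for (int i = 1; i <= writes; i++) pair.set(i);
        });
        Runnable check = () -> {
            for (int i = 0; i < writes; i++) {
                if (!pair.isConsistent()) alwaysConsistent.set(false);
            }
        };
        Thread r1 = new Thread(check);
        Thread r2 = new Thread(check);
        writer.start();
        r1.start();
        r2.start();
        writer.join();
        r1.join();
        r2.join();
        return alwaysConsistent.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(stressTest(100_000)); // prints true
    }
}
```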

Example 4: When using one of the atomic classes will work

If your update falls into the category of reading a simple value, comparing it to an expected value, and updating it if and only if the two are equal, then you can use the atomic classes, which will carry out these tasks with the same concurrency protections as synchronization would, but without a lock, and therefore very fast. By not locking, this method also reduces the chance of introducing deadlocks, livelocks, and other lock-related side effects.
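The compare-and-update pattern can be sketched with AtomicInteger.compareAndSet. The CasMax class below is a hypothetical example that tracks the largest value offered by any thread, retrying whenever another thread changed the value in between.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: a lock-free "maximum so far" tracker built on compare-and-set.
class CasMax {
    private final AtomicInteger max = new AtomicInteger(Integer.MIN_VALUE);

    void offer(int candidate) {
        while (true) {
            int current = max.get();           // read
            if (candidate <= current) return;  // nothing to update
            // Write only if max still equals the value we read; otherwise retry.
            if (max.compareAndSet(current, candidate)) return;
        }
    }

    int get() { return max.get(); }

    // Each thread offers its own range of values; returns the final maximum.
    static int maxOfConcurrentOffers(int threads, int perThread) throws InterruptedException {
        CasMax tracker = new CasMax();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) tracker.offer(base + i);
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        return tracker.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(maxOfConcurrentOffers(4, 1_000)); // prints 3999
    }
}
```

No thread ever blocks here: a thread that loses the race simply reads the new value and tries again.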