Bad Things Happen – Recognizing Common Pitfalls in Java Concurrency Programming

The rules in concurrency programming are different. Single-threaded code is easily understood as deterministic. However, those deterministic rules change when multiple threads read and write to the same memory space. It has been my experience that the rules for writing deterministic, thread-safe code are often misunderstood. Bad things that we never expect to happen, happen.

Concurrency programming is mostly about ensuring that your system remains deterministic when multiple threads read and write to the same memory space.

Introductory Concurrency Problem

Let’s take one of the simplest concurrency problems and describe what can go wrong.

Let’s say that there exists an Account object in memory with a balance of $500.00, and two threads are reading and writing to it. Thread 1 wants to add $250.00 to the balance. Thread 2 wants to withdraw $300.00 from the balance. If the code is not thread-safe, then it is quite possible for an execution of this scenario to yield a balance of $200.00.

"account

The following interleaving analysis depicts how this scenario can happen.

[Figure: interleaving analysis of the Account scenario]
Bad Things Happen – Account

An interesting exercise is to analyze what happens behind the scenes. To the untrained eye, the changeBalance method only has one statement in it. What can possibly go wrong when two different threads call changeBalance on the same account object? Let’s disassemble the class file with javap.
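The original listing and javap output are not reproduced here, but a minimal sketch of such an Account class might look like this (the class, field, and method names are assumptions based on the scenario above):

    public class Account {

        private double balance;

        public Account(double balance) {
            this.balance = balance;
        }

        // One Java statement, but NOT atomic: it compiles to an
        // unsynchronized read-modify-write of the balance field.
        public void changeBalance(double amount) {
            balance += amount;
        }

        public double getBalance() {
            return balance;
        }
    }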

In Java bytecode, that single statement becomes seven instructions: the field is read, the amount is added, and the result is written back (plus the surrounding loads and return). It requires even more instructions when translated to machine code (assembly)! Another thread can interleave between any of those instructions.

Good Things Happen

Multi-threaded concurrent programs provide the following benefits.

Exploit multiple processors – Properly designed multithreaded programs can improve throughput by utilizing available processor resources more effectively (Java Concurrency in Practice, p. 3).

Simplicity of modeling – Using a thread to handle one type of task can be simpler to write, less error-prone, and easier to test (ibid., p. 3).

Simplified handling of asynchronous events – A server application that accepts socket connections from multiple remote clients may be easier to develop when each connection is allocated its own thread and allowed to use synchronous I/O (ibid., p. 4).

More responsive user interfaces – Long-running tasks executed in a separate thread keep the UI responsive (ibid., p. 5).

Bad Things Happen

However, multi-threaded concurrent programs put the following at risk.

Safety – “Nothing bad ever happens” (ibid., p. 8).

Liveness – “Something good eventually happens,” as opposed to being stuck in a state where the program is permanently unable to make forward progress (ibid., p. 8).

Performance – “Something good happens quickly” (ibid., p. 8).

Ubiquity – Threads are everywhere, often buried deep within a framework and hidden from the programmer! (ibid., p. 9)

Bad Thread Safety

If multiple threads access the same mutable state variable without appropriate synchronization, your program is broken (ibid., p. 16). It may no longer execute in the deterministic manner that you expect.

Without appropriate synchronization:

  • Changes made by one thread might not be visible to another thread;
  • There might be a race condition;
  • A variable might be published unsafely; or
  • A class might have been improperly constructed.

Appropriate synchronization provides deterministic rules for sharing memory.

Bad Things Happen – No Visibility

In this scenario, two bad things happen. First, since there are no visibility guarantees on the state variable ready, the reader thread might loop forever. Second, the application is non-deterministic: it might print 0 or it might print 42.
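The original listing is not shown here; a sketch along the lines of the classic NoVisibility example from Java Concurrency in Practice illustrates the point:

    public class NoVisibility {

        private static boolean ready;
        private static int number;

        private static class ReaderThread extends Thread {
            @Override
            public void run() {
                while (!ready) {
                    Thread.yield();         // may spin forever: the write to ready may never become visible
                }
                System.out.println(number); // may print 0 instead of 42
            }
        }

        public static void main(String[] args) {
            new ReaderThread().start();
            number = 42;
            ready = true;                   // no synchronization, so no happens-before edge with the reader
        }
    }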

Bad Things Happen – Unsafe Publication

In this scenario, two bad things happen if multiple threads share an UnsafePublication reference. First, UnsafePublication.holder might be null, and its assignment might never be visible to another thread. Second, the holder might have stale values, since n will be initialized with the default value zero, and then later it will be assigned the value from the constructor. This change in value can happen in the middle of the n != n comparison!

While it may seem that the values set in a constructor are the first values ever written to those fields, and therefore that there are no “older” values to see as stale, the default values for all non-final fields are written before the constructor runs. It is therefore possible to see the default value of a field as a stale value.
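The original listing is not shown here; a sketch modeled on the classic unsafe-publication example from Java Concurrency in Practice (the names UnsafePublication, Holder, holder, and n are assumptions):

    public class UnsafePublication {

        public static Holder holder;       // published without synchronization

        public static void initialize() {
            holder = new Holder(42);       // unsafe publication
        }
    }

    class Holder {

        private int n;                     // not final, so a stale default value may be visible

        public Holder(int n) {
            this.n = n;
        }

        public void assertSanity() {
            if (n != n) {                  // two reads of n can observe different values
                throw new AssertionError("This statement is false.");
            }
        }
    }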

Bad Things Happen – Sleep and Pray

In this scenario, bad things happen because there is no guarantee that the thread responsible for calling the stepOne method has had a chance to run and actually call that method. Ultimately, the operating system decides which threads execute and when. This is probably the most common concurrency error I find when I review code. I call it the “Sleep and Pray Anti-Pattern”.
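The original listing is not shown here; a minimal sketch of the anti-pattern might look like this (the class and field names are assumptions; only stepOne comes from the scenario):

    public class SleepAndPray {

        private int value;                        // shared mutable state, unsynchronized

        public void stepOne() {
            value = 42;
        }

        public static void main(String[] args) throws InterruptedException {
            SleepAndPray s = new SleepAndPray();
            new Thread(s::stepOne).start();       // the OS decides when (or whether) this runs first
            Thread.sleep(100);                    // "sleep and pray" that stepOne already ran
            System.out.println(s.value);          // may print 0; the write may not even be visible
        }
    }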

Thread.sleep(n) does not resolve race conditions. Anytime you see this, the application or code is in fact broken. A correct solution would instead use a signal, latch, cyclic barrier, or other similar solution.

Bad Things Happen – Unsafe Encapsulation

In this scenario, bad things happen if two threads share the same reference to a wallet. In between method calls, the other thread can acquire the lock and change the value at an inopportune moment. The correct solution would hold the lock across the whole compound action using a synchronized block.
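The original listing is not shown here; a minimal sketch of the problem and of the synchronized-block fix might look like this (the Wallet class and its methods are assumptions):

    public class Wallet {

        private int balance;

        public synchronized int getBalance() { return balance; }
        public synchronized void setBalance(int balance) { this.balance = balance; }

        // Broken client code: each call is individually synchronized, but the
        // check-then-act sequence is not atomic; another thread can change the
        // balance between the two calls.
        public static void spendBroken(Wallet wallet, int amount) {
            if (wallet.getBalance() >= amount) {
                wallet.setBalance(wallet.getBalance() - amount);
            }
        }

        // Correct: hold the wallet's lock across the whole compound action.
        public static void spendSafely(Wallet wallet, int amount) {
            synchronized (wallet) {
                if (wallet.getBalance() >= amount) {
                    wallet.setBalance(wallet.getBalance() - amount);
                }
            }
        }
    }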

Bad Things Happen – Bad Latch

In this scenario, three bad things happen. First, there are no visibility guarantees on the latch assignment. Second, there is no guarantee that stepOne will be called before stepTwo. Third, Thread.sleep(1) does nothing to fix the race condition. This scenario is actually the “Sleep and Pray Anti-Pattern” in disguise; even if you remove the sleep calls, the visibility concerns and race conditions remain. I have seen this type of mistake quite often. In fact, any time a CountDownLatch is not final and is used in a multi-threaded scenario (its intended use), there is a very high likelihood of this same error.
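The original listing is not shown here; a minimal sketch of the scenario might look like this (the class name BadLatch is an assumption; stepOne and stepTwo come from the scenario):

    import java.util.concurrent.CountDownLatch;

    public class BadLatch {

        private CountDownLatch latch;      // not final: the assignment has no visibility guarantee

        public void stepOne() throws InterruptedException {
            Thread.sleep(1);               // sleep-and-pray: does not order stepOne before stepTwo
            latch = new CountDownLatch(1);
            latch.countDown();
        }

        public void stepTwo() throws InterruptedException {
            Thread.sleep(1);
            latch.await();                 // may see null (NullPointerException) or an unsafely published latch
        }
    }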

Bad Things Happen – Different Lock

In this scenario, bad things happen because the counter is protected by two different monitors. Proper concurrent access requires that every thread synchronize on the same monitor.
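The original listing is not shown here; a minimal sketch of the mistake might look like this (names are assumptions):

    public class DifferentLock {

        private final Object lockA = new Object();
        private final Object lockB = new Object();
        private int counter;

        public void increment() {
            synchronized (lockA) {         // writes are guarded by lockA...
                counter++;
            }
        }

        public int current() {
            synchronized (lockB) {         // ...but reads are guarded by lockB: no mutual exclusion, no visibility guarantee
                return counter;
            }
        }
    }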

Bad Things Happen – Thread Safety Analysis


It is the developer’s responsibility to determine which threads exist within an application, and what mutable state variables are shared between them.

This requires understanding how the framework that you are using uses threads.

Thread Safety Tactics

Favor statelessness – Stateless objects are always thread-safe. Certain types of objects should never have state, for example servlets, services, or RESTful resources.

Favor immutability – Immutable objects are always thread-safe (a minimal sketch follows this list). An object is immutable if:

  • Its state cannot be modified after construction;
  • All its fields are final; and
  • It is properly constructed (the this reference does not escape during construction).

Encapsulate synchronization – Thread-safe classes encapsulate any needed synchronization so that clients need not provide their own.

Thread confinement – Don’t share memory between threads.
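As a minimal sketch of the immutability tactic above (the class is hypothetical), an immutable value object looks like this:

    public final class Point {

        private final int x;               // all fields are final
        private final int y;

        public Point(int x, int y) {       // 'this' does not escape during construction
            this.x = x;
            this.y = y;
        }

        public int getX() { return x; }
        public int getY() { return y; }
    }

Instances of such a class can be freely shared between threads without any synchronization.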

Bad Thread Liveness

Thread-safety is about sharing memory. Thread liveness is about the health of a thread. The following bad things happen to threads.

Deadlock – Java applications do not recover from deadlock (ibid., p. 205).

Lock-ordering deadlocks – If threads acquire multiple locks in different orders, deadlock can result (ibid., p. 206).

Deadly embrace – When thread A holds lock L and tries to acquire lock M, but at the same time thread B holds M and tries to acquire L, both threads will wait forever (ibid., p. 205).

Resource deadlocks – When thread A holds a connection to database D1 while waiting for a connection to D2, and thread B holds a connection to D2 while waiting for a connection to D1 (ibid., p. 215).

Starvation – Occurs when a thread is perpetually denied access to resources that it needs in order to make progress (ibid., p. 218).

Livelock – Occurs when a thread, though not blocked, cannot make progress because it keeps retrying an operation that will always fail.

Deadlock

[Figure: interleaving analysis of a deadly embrace]

Java applications do not recover from deadlock. Above is an interleaving analysis of a scenario that leads to a deadly embrace.

Thread 1 holds lock S while trying to acquire lock Y, while Thread 2 holds lock Y while trying to acquire lock S.

Bad Things Happen – Lock Ordering Deadlock

In this scenario, bad things happen when two or more threads share a reference to a LockOrderingDeadLock instance and call leftRight() and rightLeft() at the same time. At the risk of repeating myself, this is a lock-ordering deadlock.
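The original listing is not shown here; a sketch modeled on the classic left-right deadlock example from Java Concurrency in Practice (the lock fields and method bodies are assumptions):

    public class LockOrderingDeadLock {

        private final Object left = new Object();
        private final Object right = new Object();

        public void leftRight() {
            synchronized (left) {
                synchronized (right) {
                    doSomething();
                }
            }
        }

        public void rightLeft() {
            synchronized (right) {         // opposite lock order: deadlock if interleaved with leftRight()
                synchronized (left) {
                    doSomethingElse();
                }
            }
        }

        private void doSomething() { }
        private void doSomethingElse() { }
    }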

Bad Things Happen – Dynamic Lock Ordering Deadlock

In this scenario, bad things happen if two threads call transferMoney at the same time, one transferring X to Y, and the other doing the opposite. This is a dynamic lock-ordering deadlock.
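The original listing is not shown here; a sketch modeled on the transferMoney example from Java Concurrency in Practice (the nested Account class is an assumption for illustration):

    public class DynamicLockOrderingDeadLock {

        static class Account {
            private int balance;
            Account(int balance)    { this.balance = balance; }
            int getBalance()        { return balance; }
            void debit(int amount)  { balance -= amount; }
            void credit(int amount) { balance += amount; }
        }

        // The lock order depends on the argument order, so transferMoney(x, y, ...)
        // and transferMoney(y, x, ...) running concurrently can deadlock.
        public void transferMoney(Account from, Account to, int amount) {
            synchronized (from) {
                synchronized (to) {
                    if (from.getBalance() < amount) {
                        throw new IllegalArgumentException("insufficient funds");
                    }
                    from.debit(amount);
                    to.credit(amount);
                }
            }
        }
    }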

Thread Liveness Tactics

Lock ordering – Acquire locks in a fixed global order (ibid., p. 206). If you must acquire multiple locks, lock ordering must be part of your design (ibid., p. 215). Use System.identityHashCode(Object) to induce a lock ordering when the locking order is dynamic (ibid., p. 208); see the sketch after this list.

Non-cyclical graphs – Think of threads as the nodes of a directed graph whose edges represent the relation “Thread A is waiting for a resource held by Thread B.” If this graph is cyclical, there is a deadlock (ibid., pp. 205-206).

Open calls – Strive for open calls (avoid synchronized methods; instead use compact synchronized blocks) (ibid., pp. 211-213).

Timed locks – Use the timed tryLock feature of the explicit Lock classes instead of intrinsic locking (ibid., pp. 215-216).

Tooling – Thread dump analysis, through jstack or other means, helps identify which locks and threads interact with each other (ibid., p. 216).
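A minimal sketch of the induced lock-ordering tactic referenced above (the Account type and doTransfer helper are assumptions, as in the earlier transferMoney sketch):

    public class InducedLockOrdering {

        private static final Object tieLock = new Object();

        public void transferMoney(Account from, Account to, int amount) {
            int fromHash = System.identityHashCode(from);
            int toHash = System.identityHashCode(to);

            if (fromHash < toHash) {
                synchronized (from) { synchronized (to) { doTransfer(from, to, amount); } }
            } else if (fromHash > toHash) {
                synchronized (to) { synchronized (from) { doTransfer(from, to, amount); } }
            } else {
                // Rare hash collision: take a global tie-breaking lock first to impose an order.
                synchronized (tieLock) {
                    synchronized (from) { synchronized (to) { doTransfer(from, to, amount); } }
                }
            }
        }

        private void doTransfer(Account from, Account to, int amount) {
            from.debit(amount);
            to.credit(amount);
        }
    }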

Bad Thread Performance

Multiple threads introduce performance costs in the form of coordination between threads (locking, signaling, memory synchronization), increased context switching, and thread creation, teardown, and scheduling overhead (ibid., p. 221).

Context switching – A context switch occurs when the OS preempts one thread so that another can use the CPU (ibid., p. 229). This hurts performance.

Memory synchronization – The performance cost of synchronization shows up at the CPU level, in how the processor must manage its caches. Most of this cost comes from contended synchronization, which requires OS involvement (ibid., pp. 230-231).

Blocking – When locking is contended, the losing thread(s) must block (ibid., p. 232).

Bad Things Happen – Wide Lock

In this scenario, bad things happen in relation to performance. The method userLocationMatches holds the lock for the entire operation, when only the access to the attributes field needs synchronization. Even better, this code should just use a ConcurrentHashMap.
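The original listing is not shown here; a sketch modeled on the JCiP AttributeStore example contrasts the wide lock with a narrower one (names are assumptions; the two methods are alternative versions shown side by side):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.regex.Pattern;

    public class AttributeStore {

        private final Map<String, String> attributes = new HashMap<>();

        // Wide lock: the lock is held for the entire operation, including the regex match.
        public synchronized boolean userLocationMatchesWide(String name, String regexp) {
            String key = "users." + name + ".location";
            String location = attributes.get(key);
            return location != null && Pattern.matches(regexp, location);
        }

        // Narrower lock: only the map access is guarded; the regex runs outside the lock.
        public boolean userLocationMatchesNarrow(String name, String regexp) {
            String key = "users." + name + ".location";
            String location;
            synchronized (attributes) {
                location = attributes.get(key);
            }
            return location != null && Pattern.matches(regexp, location);
        }
    }

A ConcurrentHashMap would remove the explicit locking entirely.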

Thread Performance Tactics

Reduce lock contention.

  1. Reduce the duration for which the lock is held;
  2. Reduce the frequency with which locks are requested; or
  3. Replace exclusive locks with coordination mechanisms that permit greater concurrency (ibid., p. 233).

Reduce the scope of a lock: get in and get out quickly (ibid., p. 233).

Consider splitting one lock into multiple locks, or using lock striping. ConcurrentHashMap uses lock striping with 16 locks, protecting bucket N with lock N mod 16 (ibid., p. 237).

Forgo the use of exclusive locks in favor of more concurrency-friendly means of managing shared state, such as concurrent collections, read-write locks, immutable objects, and atomic variables (ibid., p. 239).

Monitor CPU utilization. If CPUs are not fully utilized there are several likely causes:

  1. Insufficient load;
  2. I/O-bound;
  3. Externally bound; or
  4. Lock contention.

Bad Memory Visibility

Perhaps the least understood concept in concurrency programming is memory visibility.

When reads and writes occur on different threads without synchronization, there is no guarantee that the reading thread will see a value written by another thread on a timely basis, or even at all (ibid., p. 33).

Locking is not just about mutual exclusion; it is also about memory visibility. To ensure all threads see the most up-to-date values of shared mutable variables, the reading and writing threads must synchronize on a common lock (ibid., p. 37).

Thinking Concurrency

In many respects, concurrency is about correctly ensuring that you can share data with different threads.

It follows this theme:

Sharing Signals Work

Everything that the JVM and the Java language have to offer fits within this theme of concurrency programming.

Sharing: synchronized, ReentrantLock, ReentrantReadWriteLock, Atomic{Type...}, Concurrent Collections, BlockingQueue, Exchanger, final, Immutable Object, ThreadLocal, Pipe

Signals: interrupt(), wait(...), notify(), notifyAll(), CountDownLatch, CyclicBarrier, Semaphore, Condition

Work: Runnable, Thread, Callable, Future, FutureTask, Executor, CompletionService

Sharing

Sharing refers to safely sharing mutable state (memory) between threads.


Sharing Tactics

Favor statelessness, favor immutability, and encapsulate synchronization, in that order.

A mutable variable used by only one thread is also safe.

Address visibility concerns when a mutable variable is shared.

Volatile only addresses visibility. Atomic classes address visibility and certain compound actions.

Understand that only a lock or synchronized block can safely guard multiple variables or provide mutually exclusive memory access.

When you need to provide a sequence of data between threads use a BlockingQueue or Pipe.
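A minimal sketch of a producer and a consumer coordinated by a BlockingQueue (the class name and sizes are assumptions):

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class ProducerConsumer {

        public static void main(String[] args) {
            BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(10);

            Thread producer = new Thread(() -> {
                try {
                    for (int i = 0; i < 100; i++) {
                        queue.put(i);                      // blocks when the queue is full
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            Thread consumer = new Thread(() -> {
                try {
                    for (int i = 0; i < 100; i++) {
                        System.out.println(queue.take());  // blocks when the queue is empty
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            producer.start();
            consumer.start();
        }
    }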

Signals

Signals refers to communicating and coordinating the acquisition and relinquishment of a lock.


Signaling Tactics

A signal (sometimes called a notification) is the only thread-safe technique that can safely coordinate access to a state-dependent class.

If you need to implement a state-dependent class, one whose methods must block if a state-based precondition does not hold, the best strategy is usually to build upon an existing library such as Semaphore, BlockingQueue, or CountDownLatch.

However, sometimes existing library classes do not provide a sufficient foundation; in these cases you must implement your signaling using conditions and signals (notifications).
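A minimal sketch of a state-dependent class built directly on an intrinsic condition queue with wait and notifyAll (the class and method names are assumptions):

    public class OneShotGate {

        private boolean open;                  // the state-based precondition

        public synchronized void open() {
            open = true;
            notifyAll();                       // signal every waiting thread that the state changed
        }

        public synchronized void await() throws InterruptedException {
            while (!open) {                    // always wait in a loop, re-checking the condition
                wait();
            }
        }
    }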

Sleep and Pray Anti-Pattern

Thread.sleep(n) does not resolve race conditions.

Anytime you see this, the application or code is in fact broken.

I call this “sleep and pray”.

A correct solution would instead use a signal, latch, cyclic barrier, or other similar solution.
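A minimal sketch of such a fix using a CountDownLatch (the class and method names are assumptions):

    import java.util.concurrent.CountDownLatch;

    public class StepCoordinator {

        private final CountDownLatch stepOneDone = new CountDownLatch(1);
        private int value;

        public void stepOne() {
            value = 42;
            stepOneDone.countDown();           // signal that step one has completed
        }

        public int stepTwo() throws InterruptedException {
            stepOneDone.await();               // block until step one has actually happened
            return value;                      // safe: the latch provides the happens-before edge
        }
    }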

Work


Work refers to the execution of code within the context of a thread.

Although most of the complexity within concurrency programming concerns “sharing,” it is also important to manage the number of threads used by an application.

The JVM provides a variety of means for managing threads.

Work Tactics

Understand which threads exist in your application, and which mutable state is shared between them.

When creating threads, manage them; do not launch them into “space”. The number of threads should be bounded by an executor, as in the sketch below.
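A minimal sketch of bounding work with an executor rather than launching raw threads (the pool size and task are assumptions):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class BoundedWork {

        public static void main(String[] args) {
            ExecutorService pool = Executors.newFixedThreadPool(4);   // at most four worker threads
            for (int i = 0; i < 100; i++) {
                final int task = i;
                pool.submit(() -> System.out.println("task " + task));
            }
            pool.shutdown();   // accept no new tasks; let submitted tasks run to completion
        }
    }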

Amdahl’s Law bounds the benefit of introducing new threads.

speedup <= 1 / (F + (1 - F) / N)

F is the fraction of the calculation that must be performed serially, and N is the number of processors. As N approaches infinity, the maximum speedup converges to 1 / F.
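For example, if half of the work must be performed serially (F = 0.5), then even with unlimited processors the maximum speedup is 1 / 0.5 = 2.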

References

Most of the material in this presentation comes from my source references.

My most important sources were experience, JavaDocs, the Java Language Specification, and the book Java Concurrency in Practice by Brian Goetz.

  1. Java Concurrency Idioms by Alex Miller at Terracotta.
  2. Java Concurrency in Practice by Brian Goetz.
  3. Java Platform SE 6 API.
  4. The Java Language Specification.
  5. Java Lesson: Concurrency.
  6. Javamex: Synchronization and Concurrency.