Friday, June 12, 2009

Current Project:: Wait-Free Cache Allocator (pt 2)

So this post is a little... no, wait, a lot more directed at programmers. If you're not a programmer... you might want to stop now 'cause the next bit gets fairly scary.

Traditional multithreaded programming uses mutual exclusion (a mutex) or critical sections to guarantee that only one thread is using a piece of memory at a time. A mutex works by saying "I want to be the only one using this piece of memory" and waiting until the thread is the exclusive owner. When done, the thread releases the mutex, allowing other threads to do something with that memory.
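To make that concrete, here's a minimal sketch using the Win32 critical section API (the names are mine, just for illustration):

```cpp
#include <windows.h>

CRITICAL_SECTION gLock;     // protects gSharedCounter
long gSharedCounter = 0;

// Call once at startup, before any thread touches the lock.
void InitLock() { InitializeCriticalSection(&gLock); }

// Only one thread at a time gets past EnterCriticalSection;
// everyone else is put to sleep until the owner leaves.
void IncrementShared()
{
    EnterCriticalSection(&gLock);
    ++gSharedCounter;              // safe: we're the exclusive owner
    LeaveCriticalSection(&gLock);
}
```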

Another way to do this is through something called a "spin lock." A mutex uses extremely efficient "no CPU" waits where the entire thread is put to sleep until the mutex can be obtained. Spin locks, on the other hand, burn CPU time by constantly checking a variable until it says the lock is free.
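Here's roughly what a spin lock looks like. This is an illustrative sketch; it uses InterlockedExchange so the test-and-set itself is safe:

```cpp
#include <windows.h>

volatile LONG gSpinFlag = 0;   // 0 = unlocked, 1 = locked

// InterlockedExchange atomically stores 1 and returns the old
// value. If the old value was already 1, someone else holds the
// lock, so the thread just keeps rechecking -- burning CPU.
void SpinLock()
{
    while (InterlockedExchange(&gSpinFlag, 1) != 0)
    {
        // busy-wait: the thread never sleeps here
    }
}

void SpinUnlock()
{
    InterlockedExchange(&gSpinFlag, 0);   // release the flag
}
```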

Lock-free algorithms typically use atomic operations as a replacement for mutexes. An atomic operation lets a program modify a single piece of memory with the guarantee that the change happens all at once; no other thread can see it half-finished. Under Windows, one of the functions that does this is InterlockedCompareExchange. An atomic operation can fail if some other thread changed the data first, meaning the value used during the compare-and-exchange is stale, so some sort of wait (a spin lock, for example) is used to pass the time until the data can be changed.
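As a sketch, here's the classic compare-and-swap retry loop built on InterlockedCompareExchange (a toy increment, not anything from the cache allocator):

```cpp
#include <windows.h>

// InterlockedCompareExchange(dest, exchange, comparand) swaps in
// 'exchange' only if *dest still equals 'comparand', and returns
// the value that was actually there before the call.
LONG LockFreeIncrement(volatile LONG* value)
{
    for (;;)
    {
        LONG oldValue = *value;          // snapshot the current value
        LONG newValue = oldValue + 1;    // compute the update locally
        if (InterlockedCompareExchange(value, newValue, oldValue) == oldValue)
            return newValue;             // our update landed
        // Another thread changed *value first -- our snapshot is
        // stale, so loop and try again.
    }
}
```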

Wait-free algorithms are lock-free algorithms, but they're written in such a way that if a wait is ever needed, they can do something else instead. Essentially, they never sit around and wait. Instead, they perform some other operation, then come back and try again later. These types of algorithms are extremely complicated to set up: you've got to guarantee that you always have something else to do just in case an atomic operation can't happen. In short, it's loosely organized chaos.
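To illustrate the mindset, here's a toy sketch; DoOtherWork is a hypothetical placeholder, not part of my allocator:

```cpp
#include <windows.h>

volatile LONG gValue = 0;

// Hypothetical stand-in for "something else to do" -- in a real
// system this might pop a task off a work queue.
void DoOtherWork()
{
    /* make progress on some other pending task */
}

// If the compare-exchange fails because another thread got there
// first, don't spin -- go do useful work, then come back with
// fresh data and try again.
void AddWithoutIdling(LONG amount)
{
    for (;;)
    {
        LONG oldValue = gValue;
        if (InterlockedCompareExchange(&gValue, oldValue + amount, oldValue) == oldValue)
            return;            // our update landed
        DoOtherWork();         // never sit idle between attempts
    }
}
```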

Another problem with wait-free algorithms is that they're rarely efficient, either to develop or to execute. So much time and effort goes into making the algorithm "truly" wait-free that it gets very complicated. Not to mention that it takes an incredible amount of time to write one of these correctly.

The cache algorithm I created walks a fine line between wait-free and lock-free. It is a lock-free algorithm in that it guarantees that some progress can be made even if all other threads are suspended (don't ask me, this is someone else's definition). It is sort of a wait-free algorithm in that the steps used in the cache are broken up to be parallelized as much as possible. It is not, however, a true wait-free algorithm, because it uses spin locks to set data and to wait when no work is available. The reason I did it this way is performance, both in execution and in development. The algorithm is compact and very efficient, and it causes very few hits on the spin locks.

In the next post, I'll go back to the box analogy and explain how atomic operations can be used in a cached memory allocation scheme.
