Originally published on Cohost under the title “yeah i got nerd-sniped” as a response to someone who posted under the “#c” tag with a simple spinlock implementation.
i love talking about mutexes so now i’m going to make that everyone else’s problem
ok so. indeed, that code is a relatively straightforward implementation of mutexes. i could digress into stuff about volatile and memory orderings, but instead, there is something more important we need to talk about before considering if this mutex is “good”:
What Is A Mutex?
By its name, a mutex is something that provides Mutual Exclusion of a shared resource for concurrent tasks. In plain language:
Only one task can access the thing at a time
Programmers call it “acquiring” or “locking” the mutex when you attempt to access that shared resource, and “releasing” or “unlocking” the mutex when you have ownership of the resource and are giving it up for others to use.
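(for context, the kind of simple spinlock i’m talking about looks roughly like this. this is not the exact code from the original post, just my own sketch using stdatomic’s atomic_flag:)

#include <stdatomic.h>

typedef struct {
    atomic_flag held; // must be initialized with ATOMIC_FLAG_INIT before use
} spinlock_t;

void spin_lock(spinlock_t *s) {
    // Keep trying to flip the flag from clear to set; whoever flips it owns the lock
    while (atomic_flag_test_and_set_explicit(&s->held, memory_order_acquire)) {}
}

void spin_unlock(spinlock_t *s) {
    // Clearing the flag hands the resource back to whoever grabs it next
    atomic_flag_clear_explicit(&s->held, memory_order_release);
}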
Besides mutual exclusion, there’s another property (not in the name) that’s required for something to be a proper mutex:
Progress
No matter how many tasks are trying to acquire the mutex, if the resource isn’t owned, someone will be able to get it
It’s hard to mess this one up; in fact I’d say most buggy mutex implementations skew the other way by not correctly providing mutual exclusion. To give a concrete example of how progress could be violated, consider the following implementation: (for some reason i couldn’t put it inside a <details>, sorry)
#include <stdatomic.h>

// queue_t, push_back, pop_front, deschedule, and reschedule are assumed helpers
typedef struct {
    queue_t q;        // queue of pids waiting for the mutex
    atomic_int owner;
} mutex_t;

static const int EMPTY = 0;

void lock(mutex_t *m, int pid) {
    int expected = EMPTY;
    if (!atomic_compare_exchange_strong(&m->owner, &expected, pid)) {
        push_back(&m->q, pid);
        deschedule(); // Pauses execution until reschedule(pid) is called
    }
    m->owner = pid;
}

void unlock(mutex_t *m) {
    int pid = pop_front(&m->q);
    if (pid != EMPTY) {
        reschedule(pid);
    } else {
        atomic_store(&m->owner, EMPTY);
    }
}
Between the atomic_compare_exchange_strong line and the push_back line, the current owner could release the mutex without noticing that we were trying to put ourselves on the queue. Then nobody will ever call reschedule for us, so we stay descheduled forever even though the mutex is free. This violates progress because a single task trying to acquire the lock was not able to.
There are certainly many other bugs, such as being able to race on m->q if it is not atomic. That is the thing about bugs, more pop up than u intend :)
Progress and mutual exclusion are basically required for a correct mutex, but this last one is just nice to have:
Bounded Waiting
If a task tries to acquire the mutex, it will eventually be the owner of the resource
This can be strengthened further by putting bounds on how long it will take (say, a bound proportional to the number of competing tasks), or weakened to deal with probabilistic algorithms. In any case, it’s common for implementations attempting to go for bounded waiting to do so at the cost of progress or mutual exclusion (by accident, as was the case in the example). So it is refreshing to see a simple spinlock that won’t even try to deal with it.
However! Bounded waiting is still a very important property, and any non-specialized implementation should have it. Consider the following code:
void mainloop(void) {
    while (!should_stop) {
        lock(&m);
        process(&shared_resource);
        unlock(&m);
    }
}
To be clear, this is a terrible, effectively single-threaded design. Still, it’s not all that uncommon in real designs for threads to relatively quickly acquire and release the same mutex over and over, which can starve other threads if there isn’t bounded waiting on the mutex.
Can We Achieve Bounded Waiting With Only <stdatomic.h>?
My claim is yes! Here is an example of such a mutex, known as a ticket lock:
#include <stdatomic.h>

typedef struct {
    atomic_int next_ticket;
    atomic_int now_serving;
} mutex_t;

void lock(mutex_t *m) {
    // Get our ticket
    int ticket = atomic_fetch_add_explicit(&m->next_ticket, 1, memory_order_relaxed);
    // Spin until we are being served
    while (atomic_load_explicit(&m->now_serving, memory_order_relaxed) != ticket) {}
    atomic_thread_fence(memory_order_acquire);
}

void unlock(mutex_t *m) {
    // Let the next ticket be served
    atomic_fetch_add_explicit(&m->now_serving, 1, memory_order_release);
}
How it works is fairly simple, and pretty much entirely explained in the comments already. It’s inspired by how you’d wait in line at a deli or pharmacy.
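One detail the snippet leaves implicit is initialization: the lock only works if next_ticket and now_serving start out equal. Here’s a rough usage sketch; the pthread scaffolding and names like worker are mine, and it assumes the mutex_t/lock/unlock above are in scope:

#include <pthread.h>
#include <stdio.h>

static mutex_t m = { 0, 0 };   // tickets start equal, so the lock starts unlocked
static long counter = 0;       // the shared resource being protected

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        lock(&m);
        counter++;             // only one thread at a time gets here
        unlock(&m);
    }
    return NULL;
}

int main(void) {
    pthread_t threads[4];
    for (int i = 0; i < 4; i++) pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(&threads[i], NULL);
    printf("%ld\n", counter);  // expect 400000 if mutual exclusion held
    return 0;
}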
Some great things about this lock:
- just one atomic operation to lock and unlock
- small (8 bytes)
Some not-great things about this lock:
- spin waiting :(
- technically, ABA, but with 32-bit tickets this shouldn’t be an issue in practice
- doesn’t detect misuse
Most of these deficits could be solved by better design, but i am too lazy rn :P
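ok fine, the spin-waiting one at least has a cheap partial mitigation: yield the CPU while we wait instead of burning it. just a sketch (lock_politely is a name i made up, and sched_yield is POSIX, not part of <stdatomic.h>):

#include <sched.h>
#include <stdatomic.h>

void lock_politely(mutex_t *m) {
    int ticket = atomic_fetch_add_explicit(&m->next_ticket, 1, memory_order_relaxed);
    while (atomic_load_explicit(&m->now_serving, memory_order_relaxed) != ticket) {
        sched_yield();  // let someone else (hopefully the current owner) run
    }
    atomic_thread_fence(memory_order_acquire);
}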
Now, you may still have some questions like “wait why didn’t you use volatile?” and “what’s with that memory order stuff?”, but this post is a little long as it stands, so I figure I’ll leave the discussion about How Your Compiler And CPU Lie To You for a future post. Or just read Tower Of Weakenings because it answers or has links to answers for those questions and more.
And by the way, it’s entirely possible I missed something and this code has a bug I am unaware of, so please do ask questions if you think something is up!

edit: I did get it wrong in multiple ways the first time. After talking with a few friends I am now more confident it is correct :) will explain in the later post
edit 2: I got it wrong again! Definitely making a post because explaining how I got things wrong is quite a journey.