Originally published on Cohost under the title “yeah i got nerd-sniped” as a response to someone who posted under the “#c” tag with a simple spinlock implementation.
i love talking about mutexes so now i’m going to make that everyone else’s problem
ok so. indeed, that code is a relatively straightforward implementation of mutexes. i could digress into stuff about volatile and memory orderings, but instead, there is something more important we need to talk about before considering if this mutex is “good”:
What Is A Mutex?
By its name, a mutex is something that provides Mutual Exclusion of a shared resource for concurrent tasks. In plain language:
Only one task can access the thing at a time
Programmers call it “acquiring” or “locking” the mutex when you attempt to access that shared resource, and “releasing” or “unlocking” the mutex when you have ownership of the resource and are giving it up for others to use.
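(for context, the kind of simple spinlock i’m talking about looks roughly like this. this is not the exact code from the original post, just my own sketch using stdatomic’s atomic_flag:)

#include <stdatomic.h>

typedef struct {
    atomic_flag held; // must be initialized with ATOMIC_FLAG_INIT before use
} spinlock_t;

void spin_lock(spinlock_t *s) {
    // Keep trying to flip the flag from clear to set; whoever flips it owns the lock
    while (atomic_flag_test_and_set_explicit(&s->held, memory_order_acquire)) {}
}

void spin_unlock(spinlock_t *s) {
    // Clearing the flag hands the resource back to whoever grabs it next
    atomic_flag_clear_explicit(&s->held, memory_order_release);
}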
Besides mutual exclusion, there’s another property (not in the name) that’s required for something to be a proper mutex:
Progress
No matter how many tasks are trying to acquire the mutex, if the resource isn’t owned, someone will be able to get it
It’s hard to mess this one up; in fact I’d say most buggy mutex implementations skew the other way by not correctly providing mutual exclusion. To give a concrete example of how progress could be violated, consider the following implementation: (for some reason i couldn’t put it inside a <details>, sorry)
#include <stdatomic.h>

// queue_t, push_back, pop_front, deschedule, and reschedule are assumed helpers
typedef struct {
    queue_t q;        // queue of pids waiting for the mutex
    atomic_int owner;
} mutex_t;

static const int EMPTY = 0;

void lock(mutex_t *m, int pid) {
    int expected = EMPTY;
    if (!atomic_compare_exchange_strong(&m->owner, &expected, pid)) {
        push_back(&m->q, pid);
        deschedule(); // Pauses execution until reschedule(pid) is called
    }
    m->owner = pid;
}

void unlock(mutex_t *m) {
    int pid = pop_front(&m->q);
    if (pid != EMPTY) {
        reschedule(pid);
    } else {
        atomic_store(&m->owner, EMPTY);
    }
}
Between the atomic_compare_exchange_strong line and the push_back line, the current owner could release the mutex without noticing that we were trying to put ourselves on the queue. Then nobody will ever call reschedule for us, so we stay descheduled forever even though the mutex is free. This violates progress because a single task trying to acquire the lock was not able to.
There are certainly many other bugs, such as being able to race on m->q if it is not atomic. That is the thing about bugs, more pop up than u intend :)
Progress and mutual exclusion are basically required for a correct mutex, but this last one is just nice to have:
Bounded Waiting
If a task tries to acquire the mutex, it will eventually be the owner of the resource
This can be strengthened further by putting bounds on how long it will take (say, a bound proportional to the number of competing tasks), or weakened to deal with probabilistic algorithms. In any case, it’s common for implementations attempting to go for bounded waiting to do so at the cost of progress or mutual exclusion (by accident, as was the case in the example). So it is refreshing to see a simple spinlock that won’t even try to deal with it.
However! Bounded waiting is still a very important property, and any non-specialized implementation should have it. Consider the following code:
void mainloop(void) {
    while (!should_stop) {
        lock(&m);
        process(&shared_resource);
        unlock(&m);
    }
}
To be clear, this is a terrible, effectively single-threaded design. Still, it’s not all that uncommon in real designs for threads to relatively quickly acquire and release the same mutex over and over, which can starve other threads if there isn’t bounded waiting on the mutex.
Can We Achieve Bounded Waiting With Only <stdatomic.h>?
My claim is yes! Here is an example of such a mutex, known as a ticket lock:
#include <stdatomic.h>

typedef struct {
    atomic_int next_ticket;
    atomic_int now_serving;
} mutex_t;

void lock(mutex_t *m) {
    // Get our ticket
    int ticket = atomic_fetch_add_explicit(&m->next_ticket, 1, memory_order_relaxed);
    // Spin until we are being served
    while (atomic_load_explicit(&m->now_serving, memory_order_relaxed) != ticket) {}
    atomic_thread_fence(memory_order_acquire);
}

void unlock(mutex_t *m) {
    // Let the next ticket be served
    atomic_fetch_add_explicit(&m->now_serving, 1, memory_order_release);
}
How it works is fairly simple, and pretty much entirely explained in the comments already. It’s inspired by how you’d wait in line at a deli or pharmacy.
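One detail the snippet leaves implicit is initialization: the lock only works if next_ticket and now_serving start out equal. Here’s a rough usage sketch; the pthread scaffolding and names like worker are mine, and it assumes the mutex_t/lock/unlock above are in scope:

#include <pthread.h>
#include <stdio.h>

static mutex_t m = { 0, 0 };   // tickets start equal, so the lock starts unlocked
static long counter = 0;       // the shared resource being protected

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        lock(&m);
        counter++;             // only one thread at a time gets here
        unlock(&m);
    }
    return NULL;
}

int main(void) {
    pthread_t threads[4];
    for (int i = 0; i < 4; i++) pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(&threads[i], NULL);
    printf("%ld\n", counter);  // expect 400000 if mutual exclusion held
    return 0;
}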
Some great things about this lock:
- just one atomic operation to lock and unlock
- small (8 bytes)
Some not-great things about this lock:
- spin waiting :(
- technically, ABA, but with 32-bit tickets this shouldn’t be an issue in practice
- doesn’t detect misuse
Most of these deficits could be solved by better design, but i am too lazy rn :P
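ok fine, the spin-waiting one at least has a cheap partial mitigation: yield the CPU while we wait instead of burning it. just a sketch (lock_politely is a name i made up, and sched_yield is POSIX, not part of <stdatomic.h>):

#include <sched.h>
#include <stdatomic.h>

void lock_politely(mutex_t *m) {
    int ticket = atomic_fetch_add_explicit(&m->next_ticket, 1, memory_order_relaxed);
    while (atomic_load_explicit(&m->now_serving, memory_order_relaxed) != ticket) {
        sched_yield();  // let someone else (hopefully the current owner) run
    }
    atomic_thread_fence(memory_order_acquire);
}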
Now, you may still have some questions like “wait why didn’t you use volatile?” and “what’s with that memory order stuff?”, but this post is a little long as it stands, so I figure I’ll leave the discussion about How Your Compiler And CPU Lie To You for a future post. Or just read Tower Of Weakenings because it answers or has links to answers for those questions and more.
And by the way, it’s entirely possible I missed something and this code has a bug I am unaware of, so please do ask questions if you think something is up!

edit: I did get it wrong in multiple ways the first time. After talking with a few friends I am now more confident it is correct :) will explain in the later post
edit 2: I got it wrong again! Definitely making a post because explaining how I got things wrong is quite a journey.