Preemption in Go: an introduction

Posted on Jun 9, 2024

In my post about the sync.Pool internals, I briefly touched on preemption, but I didn’t explain what it really is about. Let’s fix that.

If you are in France, like me, you just completed your income tax declaration. If the go runtime is the tax administration, cpu-time is money, and preemption would be a letter of reminder. Or a bailiff banging on your door if you’ve been naughty.

What is preemption?

Preemption is one of the ways the go runtime ensures fairness. It distributes the time between all goroutines, so that no goroutine can take all the cpu resources and starve the others. Not only other goroutines could starve, but scheduling could also be delayed, so this could derail into serious problems.

Preemption as we know it today was first introduced in go 1.2, back in 2013. Before that, the go runtime was not preemptive, meaning that it would not enforce time distribution fairness. The whole stuff would rely on the goroutines to play by the rules. It was a time where we could trust each others, we didn’t need to lock our doors, and children could go out all day without parents knowing what they were up to. Ah, the good old days… Where was I again? Oh yes, the go runtime.

So a goroutine could suck up all the cpu-time, starve the others, and delay new ones, until it explicitly gave up control, such as by calling runtime.Gosched() or during I/O operations.

And here’s preemption coming to the rescue. It occurs at so-called “safe-points”, which are places in the code where we know for sure that we can take over control flow from a goroutine without getting into trouble. What I mean by that is that we don’t want to preempt goroutines anytime, but only at places where it is easy to stop them - think about it - what if I suddenly stopped you while taking a dump? It would have been easier to have stopped you before, or after, but not in the middle of it. Not sure about the analogy, but you got it.

The first step towards preemption is using blocked safe-points. These safe-points happens either during descheduling, or when a goroutine is blocked on synchronization primitives, or on a system call. The goroutine is just hanging there doing nothing, so of course it can be preempted. But this is not enough! What if a goroutine does not stop processing stuff, such as crunching numbers?

Synchronous preemption

Synchronous preemption happens when a goroutine checks for a preemption request: the go runtime communicates its intent to preempt a goroutine, the goroutine “sees” the request and bails out. But there’s something interesting in how this is implemented.

As you may already know, a function begins with a prologue, and ends with an epilogue. These two parts are added by the compiler to do various stuff, preparing and finalizing the function call. In the prologue, there’s a stack bound check, which is a complicated way of saying that we check that there’s enough space on the stack for this call.

Let’s take a look at this engineering masterpiece:

func main() {
	if sum(10, 10) != 20 {
		panic("earth must be flat")
	}
}

func sum(a, b int) int {
	var s int

	if a > 0 {
		s++
		a--
	}
	if b > 0 {
		s++
		b--
	}

	if s == 0 {
		return 0
	}

	return s + sum(a, b)
}

That’s how the cool kids do it. Now, let’s have a look at what the compiler spits out when confronted with this level of quality, using go tool compile -S:

00000 CMPQ    SP, 16(R14)
00000 TEXT    main.sum(SB), ABIInternal, $32-16
00004 PCDATA  $0, $-2
00004 JLS     89
00006 PCDATA  $0, $-1
00006 PUSHQ   BP
00007 MOVQ    SP, BP
00010 SUBQ    $24, SP
00014 FUNCDATA        $0, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
00014 FUNCDATA        $1, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
00014 FUNCDATA        $5, main.sum.arginfo1(SB)
00014 FUNCDATA        $6, main.sum.argliveinfo(SB)
00014 PCDATA  $3, $1
00014 LEAQ    -1(AX), CX
00018 LEAQ    -1(BX), DX
00022 TESTQ   AX, AX
00025 SETGT   SIB
00029 CMOVQGT CX, AX
00033 MOVBLZX SIB, CX
00037 LEAQ    1(CX), SI
00041 TESTQ   BX, BX
00044 CMOVQGT SI, CX
00048 CMOVQGT DX, BX
00052 TESTQ   CX, CX
00055 JNE     65
00057 XORL    AX, AX
00059 ADDQ    $24, SP
00063 POPQ    BP
00064 RET
00065 MOVQ    CX, main.s+16(SP)
00070 PCDATA  $1, $0
00070 CALL    main.sum(SB)
00075 MOVQ    main.s+16(SP), CX
00080 ADDQ    CX, AX
00083 ADDQ    $24, SP
00087 POPQ    BP
00088 RET
00089 NOP
00089 PCDATA  $1, $-1
00089 PCDATA  $0, $-2
00089 MOVQ    AX, 8(SP)
00094 MOVQ    BX, 16(SP)
00099 CALL    runtime.morestack_noctxt(SB)
00104 PCDATA  $0, $-1
00104 MOVQ    8(SP), AX
00109 MOVQ    16(SP), BX
00114 JMP     0

The first instruction CMPQ SP, 16(R14) compares the stack pointer with the stack guard. Then at 00004, there’s a conditional jump JLS 89 which says that we should jump at 89 if the stack pointer was greater than or equal to the stack guard, which would indicate that there’s not enough space in the stack for this call. Then, between 00089 and 00099, we save AX and BX onto the stack so that we do not lose them later on, and call runtime.morestack_noctxt.

This runtime.morestack_noctxt is more or less the same as runtime.morestackwhich is implemented using assembly, and in the end, it calls runtime.newstack - just trust me on this.

runtime.newstack’s job is to allocate a bigger stack, and relocate the old one onto it. Let’s have a quick look at it:

func newstack() {
    [...]
    stackguard0 := atomic.Loaduintptr(&gp.stackguard0)
    preempt := stackguard0 == stackPreempt
	if preempt {
		if !canPreemptM(thisg.m) {
			// Let the goroutine keep running for now.
			// gp->preempt is set, so it will be preempted next time.
			gp.stackguard0 = gp.stack.lo + stackGuard
			gogo(&gp.sched) // never return
		}
	}

    [...]

    if preempt {
		[...]
		// Act like goroutine called runtime.Gosched.
		gopreempt_m(gp) // never return
	}

    [...]
}

And here it is: newstack contains code related to preemption. It checks if the stack guard was equal to stackPreempt, which is 0xfffffade, in which case it fixes the stack guard and preempt right now, calling gopreempt_m, which will act as if the goroutine itself called runtime.Gosched().

This is how the runtime forces preemption at the next function prologue: it poisons the stackguard of the goroutine it wants to preempt, and waits for the next function prologue to fail, as SP is always greater than 0xfffffade. Using that trick, we ensure that the goroutine is at a safe-point, in a function prologue. It also does not cost a dime: should the runtime not have preempted the goroutine, the stack bound check would have happened anyway.

But what if the sum function was crunching number in a tight loop, maybe with a couple inlined functions and other optimizations? There would not be any function prologue to check against the stack guard. The stack bound check technique is a letter of reminder - now we need to call the bailiff.

Asynchronous preemption

Asynchronous preemption is a more recent addition to the go runtime. It was introduced in go 1.14, in 2020, to fix tight loop preemption, which was getting worse with compiler optimizations getting better. It works by sending a SIGURG signal to the thread where the goroutine is running:

// preemptM sends a preemption request to mp. This request may be
// handled asynchronously and may be coalesced with other requests to
// the M. When the request is received, if the running G or P are
// marked for preemption and the goroutine is at an asynchronous
// safe-point, it will preempt the goroutine. It always atomically
// increments mp.preemptGen after handling a preemption request.
func preemptM(mp *m) {
	// On Darwin, don't try to preempt threads during exec.
	// Issue #41702.
	if GOOS == "darwin" || GOOS == "ios" {
		execLock.rlock()
	}

	if mp.signalPending.CompareAndSwap(0, 1) {
		if GOOS == "darwin" || GOOS == "ios" {
			pendingPreemptSignals.Add(1)
		}

		// If multiple threads are preempting the same M, it may send many
		// signals to the same M such that it hardly make progress, causing
		// live-lock problem. Apparently this could happen on darwin. See
		// issue #37741.
		// Only send a signal if there isn't already one pending.
		signalM(mp, sigPreempt) // sigPreempt is SIGURG
	}

	if GOOS == "darwin" || GOOS == "ios" {
		execLock.runlock()
	}
}

On the other side, in the thread, a signal handler listens for it:

func sighandler(sig uint32, info *siginfo, ctxt unsafe.Pointer, gp *g) {
	[...]

	if sig == sigPreempt && debug.asyncpreemptoff == 0 && !delayedSignal {
		// Might be a preemption signal.
		doSigPreempt(gp, c)
		// Even if this was definitely a preemption signal, it
		// may have been coalesced with another signal, so we
		// still let it through to the application.
	}

    [...]
}

// doSigPreempt handles a preemption signal on gp.
func doSigPreempt(gp *g, ctxt *sigctxt) {
	// Check if this G wants to be preempted and is safe to
	// preempt.
	if wantAsyncPreempt(gp) {
		if ok, newpc := isAsyncSafePoint(gp, ctxt.sigpc(), ctxt.sigsp(), ctxt.siglr()); ok {
			// Adjust the PC and inject a call to asyncPreempt.
			ctxt.pushCall(abi.FuncPCABI0(asyncPreempt), newpc)
		}
	}

	// Acknowledge the preemption.
	gp.m.preemptGen.Add(1)
	gp.m.signalPending.Store(0)

	if GOOS == "darwin" || GOOS == "ios" {
		pendingPreemptSignals.Add(-1)
	}
}

Here, doSigPreempt checks if we are at an asynchronous safe-point, and forces the goroutine to get into a call to asyncPreempt. Here is asyncPreempt:

// asyncPreempt saves all user registers and calls asyncPreempt2.
//
// When stack scanning encounters an asyncPreempt frame, it scans that
// frame and its parent frame conservatively.
//
// asyncPreempt is implemented in assembly.
func asyncPreempt()

//go:nosplit
func asyncPreempt2() {
	gp := getg()
	gp.asyncSafePoint = true
	if gp.preemptStop {
		mcall(preemptPark)
	} else {
		mcall(gopreempt_m)
	}
	gp.asyncSafePoint = false
}

As said in the comments, asyncPreempt saves all registers to keep the context, and gets into asyncPreempt2, which in turn goes into gopreempt_m, that will act as if the goroutine just called runtime.Gosched().

If we simplify things a little, we can say that when the runtime requests synchronous preemption and poisons the stack guard, it also prepares for asynchronous preemption. The goroutine has around 10 milliseconds to get preempted, or the SIGURG signal will be sent to its thread.

I hope this cleared things up a little. Now go have fun!