Bare Metal Rust 2: Retarget your compiler so interrupts are not evil
Want to build your own kernel in Rust? See the Bare Metal Rust page for more resources and more posts in this series. There’s just a few more posts to go until we have keyboard I/O!
Hacking on kernels in Rust is a lot of fun, but it can also result in massive frustration when QEMU starts rebooting continuously because of a triple fault. One good way to minimize frustration is to wander on over to the ever-helpful OSDev wiki. It’s sort of like having an experienced kernel developer on hand to give grouchy but sanity-saving advice.
The OSDev Beginner Mistakes page, in particular, has saved me a couple times already. But there’s one bit of advice that I want to focus on today, which I’ve marked in boldface below:
Beginners often ask “What is the easiest way to do X?” rather than “What is the best, proper, correct way to do X?”. This is dangerous as the newcomer doesn’t invest time into understanding the superior way to implement something, but instead picks a conceptually simpler method copied from a tutorial. Indeed, the simpler route is often too simple and ends up causing more problems in the long run, because the beginner is ignorant of the superior alternative and doesn’t know when it is better to switch. What’s so bad about taking the hard route instead?
Common examples include being too lazy to use a Cross-Compiler, developing in Real Mode instead of Protected Mode or Long Mode, relying on BIOS calls rather than writing real hardware drivers, using flat binaries instead of ELF, and so on. Experienced developers use the superior alternatives for a reason…
So what does that mean, "being too lazy to use a cross-compiler"? It means cheating: using our regular rustc setup to build ordinary user-space code, and then trying to run it in kernel space. This will actually work, at least for a while. But eventually, we may find ourselves engaged in multiweek debugging nightmares.
So today, I'm going to talk about the sanity-saving difference between --target x86_64-unknown-linux-gnu and --target x86_64-unknown-none-gnu, and how to get your Rust compiler ready for the kernel world.
Problem 1: The “red zone”, or how to keep interrupts from corrupting your stack
An x86_64 processor has a stack pointer register, rsp. This points to a stack which grows downwards in memory, and which is used to store function arguments, return addresses and local variables. (Here's a great introduction to the x86 stack.)
Normally, all data for the current function is stored at or above the address pointed to by rsp. Everything below rsp in memory is leftover garbage that can be safely overwritten at any moment. This is important because, at any moment, the CPU might receive a hardware interrupt, causing it to pause your currently running function and push a whole bunch of new data onto the stack. As long as all your data is at or above rsp, you're safe.
But hey, updating rsp to keep track of where we're storing data wastes precious processor time, and we can't have that. So the x86_64 calling convention (ABI) allows applications to use up to 128 bytes of scratch space below rsp without telling anybody about it. This 128-byte space is called the "red zone." But unfortunately, the CPU doesn't know about the red zone and will happily clobber it when handling an interrupt.
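To make this concrete, here's a hedged sketch of the kind of code a compiler might emit for a leaf function when the red zone is enabled. The label and offsets are invented for illustration:
;; Hypothetical leaf-function code emitted with the red zone enabled.
;; The value is spilled below rsp without adjusting rsp. If a hardware
;; interrupt fires in between, the CPU pushes its own interrupt frame
;; at rsp, destroying the spilled value.
leaf_function:
mov [rsp - 8], rdi    ; spill an argument into the red zone
;; ... other work that needs the old value of rdi later ...
mov rax, [rsp - 8]    ; reload it (garbage if an interrupt hit!)
ret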
How bad can it get? Philipp Oppermann pointed out this debugging horror story:
It has costed me 6 days to debug and fix this, but it was really worth the effort.
It’s interesting that no OSDev resources discussed this important topic before. From further inspection, there seems to be a high rate of hobby x86-64 kernels that get their leaf functions stacks silently overriden in case of an interrupt triggered in the right place.
Now to the story: somehow the montonic PIT interrupts that get triggered every 1 millisecond badly corrupted my kernel state. At first, I thought the handler code might have corrupted the kernel stack, but minimizing it only to acknowledging the local APIC… led to the same buggy behaviour.
It was weird. Once I enable interrupts and program the PIT to fire at a high rate, things go insane: random failed assert()s and page-fault exceptions get triggered all over the place.
Yeah, I think I'd just as soon skip that particular experience. Happily, we can convince rustc not to use the red zone by passing -C no-redzone, and this will do the right thing for a single Rust library.
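For example, to compile one crate by hand with the red zone disabled (a hedged sketch; adjust the crate path to match your project):
rustc --crate-type lib -C no-redzone=yes --emit asm -O src/lib.rs -o lib.s
Inspecting the emitted assembly in lib.s is a good way to convince yourself that nothing is being stored below rsp anymore.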
But what if we want to use cargo? We really want to pass -C no-redzone to every Rust crate our compiler builds, because we don't want little bits of unsafe code sneaking into our kernel when we're not looking.
Problem 2: How to keep interrupts from corrupting your floating point registers
There’s a similar problem with floating point registers. But first, let’s explain what happens when the kernel enters an interrupt routine. Here’s a basic keyboard interrupt handler routine:
extern rust_keyboard_interrupt_handler
section .text
bits 64
keyboard_interrupt_handler:
;; Save registers which are normally supposed to
;; be saved by the caller. I _think_ this list
;; is correct, but don't quote me on that. I'm
;; probably forgetting something vital.
push rax
push rcx
push rdx
push r8
push r9
push r10
push r11
push rdi
push rsi
;; Call a Rust function.
call rust_keyboard_interrupt_handler
;; Pop the registers we saved.
pop rsi
pop rdi
pop r11
pop r10
pop r9
pop r8
pop rdx
pop rcx
pop rax
;; Pop CPU interrupt state
iretq
When the CPU enters our interrupt routine, most of the CPU registers still contain whatever data they did before. When we’re done handling the interrupt, that data needs to still be there, or else the code that we interrupted will get really confused when the contents of a register change without warning.
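On the Rust side, the function our assembly calls might look something like this. This is a hedged sketch: all that matters is that the #[no_mangle] name matches the extern declaration in the assembly and that we use the C calling convention; the body is a placeholder.
#[no_mangle]
pub extern "C" fn rust_keyboard_interrupt_handler() {
    // Read the scancode from the keyboard controller and acknowledge
    // the interrupt here. (Placeholder body; the real logic comes
    // later in this series.)
}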
We can save registers by pushing them onto the stack. But which registers do we need to save? Well, we need to go read the ABI manual again and figure out which registers rust_keyboard_interrupt_handler is allowed to clobber, and which it must preserve. We quickly discover that, yes, Rust code is allowed to clobber the floating point registers.
Can we save the floating point state in our interrupt? Well, the CPU provides the FXSAVE and FXRSTOR instructions:
Reloads the x87 FPU, MMX technology, XMM, and MXCSR registers from the 512-byte memory image specified in the source operand. This data should have been written to memory previously using the FXSAVE instruction, and the first byte of the data should be located on a 16-byte boundary. The FXSAVE table shows the layout of the state information in memory and describes the fields in the memory image for the FXRSTOR and FXSAVE instructions.
512 bytes? OK, that's huge. And it's going to take a relatively long time to dump and reload all that. So on the x86 and x86_64 architectures, the traditional solution is to simply avoid using floating point registers in kernel space, which is what Linux does. This keeps interrupt routines and syscalls nice and fast, without paying the full overhead of a context switch (which requires saving even more registers).
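For completeness, here's roughly what paying that cost would look like (a hedged sketch; FXSAVE needs a 512-byte buffer aligned on a 16-byte boundary, and we assume rsp is already 16-byte aligned at this point):
;; Hedged sketch: saving and restoring the full FPU/SSE state
;; around a call that might clobber it.
sub rsp, 512          ; reserve a 16-byte-aligned, 512-byte buffer
fxsave [rsp]          ; dump x87/MMX/XMM/MXCSR state to the buffer
call rust_keyboard_interrupt_handler
fxrstor [rsp]         ; reload the saved state
add rsp, 512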
But there’s a sneaky complication here: The MMX and XMM registers are
technically part of the floating point state, but rustc
feels free to use
them to optimize non-floating-point code. So once again, we need to
customize rustc
’s code generation. Gerd Zellweger proposes
passing the following flags to rustc
:
-C target-feature=-mmx,-sse,-sse2,-sse3,-ssse3,-sse4.1,-sse4.2,-3dnow,-3dnowa,-avx,-avx2
Again, we ideally want a solution that works with cargo, too, so that code using these registers doesn't turn up in a crate.
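Until we have that cargo-friendly solution, a one-off rustc invocation combining both fixes would look something like this (hedged; again, adjust the crate path to your project):
rustc --crate-type lib -C no-redzone \
    -C target-feature=-mmx,-sse,-sse2,-sse3,-ssse3,-sse4.1,-sse4.2,-3dnow,-3dnowa,-avx,-avx2 \
    src/lib.rs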
Solution: Creating a target file
When we want to compile code on one operating system or processor, and run it on another, we need to know about compiler "target triples". These are strings like x86_64-linux-gnu or thumbv7em-none-eabi, where the first part (x86_64 or thumbv7em) is the CPU architecture, the second part (linux or none) is the operating system, and the third part (gnu or eabi) is the ABI or calling convention. We can also insert an extra field to specify an OS vendor, writing x86_64-unknown-linux-gnu.
Our problem up until now is that we've been compiling code for x86_64-unknown-linux-gnu and trying to run it without Linux. What we actually want is x86_64-unknown-none-gnu, where none means "no operating system". And we want to customize it a bit.
Thanks to some great work by Corey Richardson, we can define our custom target without modifying the compiler. We just need to create a file named x86_64-unknown-none-gnu.json and leave it in our project's root directory.
But what do we put in that file? Well, that's a bit mysterious. I recommend reading target/mod.rs and paying particular attention to the Target and TargetOptions structs. Then read through linux_base.rs and x86_64_unknown_linux_gnu.rs to get some sensible defaults. Then check your guesses against rust-barebones-kernel and Redox, and hope for the best. Based on all of these, here's my best guess, which includes both disable-redzone and Gerd Zellweger's FPU-related flags:
{
"llvm-target": "x86_64-unknown-none-gnu",
"target-endian": "little",
"target-pointer-width": "64",
"os": "none",
"arch": "x86_64",
"data-layout": "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128",
"pre-link-args": [ "-m64" ],
"cpu": "x86-64",
"features": "-mmx,-sse,-sse2,-sse3,-ssse3,-sse4.1,-sse4.2,-3dnow,-3dnowa,-avx,-avx2",
"disable-redzone": true,
"eliminate-frame-pointer": true,
"linker-is-gnu": true,
"no-compiler-rt": true,
"archive-format": "gnu"
}
But ultimately, the contents of this file are up to you: What CPU features do you want Rust to use in your kernel?
To build using our new target, we can run rustc --target x86_64-unknown-none-gnu or cargo build --target x86_64-unknown-none-gnu. Both of these will look for an x86_64-unknown-none-gnu.json file in our current directory and use it.
Rebuilding libcore
Now that we have our new x86_64-unknown-none-gnu target, we can use it to rebuild Rust's libcore (the bare metal runtime). We can check out the Rust source tree and run rustc like this:
git clone https://github.com/rust-lang/rust.git
mkdir -p build
rustc --target x86_64-unknown-none-gnu -Z no-landing-pads \
--out-dir build/ \
rust/src/libcore/lib.rs
(The -Z no-landing-pads flag is to disable unwinding until we're ready to support it. See Philipp Oppermann's Setup Rust post for details.)
But when we run the command above, it fails with the following error:
LLVM ERROR: SSE register return with SSE disabled
You can find the gory details in the Rust bug tracker, but basically, as eternaleye explains:
The x86_64 ABI states that it is mandatory for floating-point return values to be passed in SSE registers. On one level, this makes an enormous amount of sense: the 387 FPU being a weird 80-bit thing causes real issues, and every x86_64 CPU possesses SSE2.
So our problem is that somewhere in libcore, somebody is trying to use floating point numbers, and there's no way to validly compile them without using those obnoxious SSE2 registers we just worked so hard to disable.
Now, it's not clear to me that floating point should be a mandatory part of libcore. It makes it harder to write kernels or Linux kernel modules in Rust. And of course, there are lots of embedded processors without FPU support, some of which would otherwise be fine targets for embedded Rust. "No floating point" might be a special case, but it's a relatively common and well-defined one.
Happily, there’s a simple solution. The rust-barebones-kernel
by John Hodge includes a patch which adds a disable_float
feature
to libcore
, making all the important uses of f32
and f64
conditional. For example:
+#[cfg(not(disable_float))]
clone_impl! { f32 }
+#[cfg(not(disable_float))]
clone_impl! { f64 }
I’ve proposed including some version of this patch in libcore
so that people will be able to skip this step in the future when working in
environments without floating point.
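The same cfg pattern works in your own crates: anything that needs f32 or f64 can be gated the same way. Here's a hedged sketch (the function is invented for illustration; it compiles only when --cfg disable_float is not passed):
// Hypothetical example: gate float-dependent helpers on the same flag.
#[cfg(not(disable_float))]
pub fn celsius_to_fahrenheit(c: f32) -> f32 {
    c * 9.0 / 5.0 + 32.0
}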
Once we apply this patch to libcore, we can now build it as follows:
rustc --target x86_64-unknown-none-gnu -Z no-landing-pads \
--cfg disable_float \
--out-dir build/ \
rust/src/libcore/lib.rs
So what happens if we try to build a crate using cargo build --target x86_64-unknown-none-gnu? We get:
error: can't find crate for `core` [E0463]
The final fix required to get libcore working is to place the compiled libcore.rlib somewhere that cargo will find it. Assuming you're using a nightly Rust build installed using multirust, you can write:
rustc --target x86_64-unknown-none-gnu -Z no-landing-pads \
--cfg disable_float \
--out-dir ~/.multirust/toolchains/nightly/lib/rustlib/x86_64-unknown-none-gnu/lib \
rust/src/libcore/lib.rs
At this point, you can use cargo almost normally, as long as you only pull in crates that use core and not std. (It's possible to get collections working without too much trouble, which I'll talk about in a later post.)
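For example, a minimal core-only library crate looks something like this (a hedged sketch; depending on your nightly, you may also need #![feature(no_std)] at the top):
#![no_std]

// Uses only core: no allocation, no std.
pub fn checksum(bytes: &[u8]) -> u8 {
    bytes.iter().fold(0u8, |acc, &b| acc.wrapping_add(b))
}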
Next steps
Now that we have CPU I/O ports and a correctly configured target, we’re ready to set up interrupts. And once we have interrupts, we can implement a keyboard driver and start doing I/O!
If you have any questions (or corrections!), I’ll be following the discussion over on /r/rust.