Want to build your own kernel in Rust? See the Bare Metal Rust page for more resources and more posts in this series. There are just a few more posts to go until we have keyboard I/O!

Hacking on kernels in Rust is a lot of fun, but it can also result in massive frustration when QEMU starts rebooting continuously because of a triple fault. One good way to minimize frustration is to wander on over to the ever-helpful OSDev wiki. It's sort of like having an experienced kernel developer on hand to give grouchy but sanity-saving advice.

The OSDev Beginner Mistakes page, in particular, has saved me a couple of times already. But there's one bit of advice that I want to focus on today: the warning about being too lazy to use a cross-compiler, in the quote below.

Beginners often ask "What is the easiest way to do X?" rather than "What is the best, proper, correct way to do X?". This is dangerous as the newcomer doesn't invest time into understanding the superior way to implement something, but instead picks a conceptually simpler method copied from a tutorial. Indeed, the simpler route is often too simple and ends up causing more problems in the long run, because the beginner is ignorant of the superior alternative and doesn't know when it is better to switch. What's so bad about taking the hard route instead?

Common examples include being too lazy to use a Cross-Compiler, developing in Real Mode instead of Protected Mode or Long Mode, relying on BIOS calls rather than writing real hardware drivers, using flat binaries instead of ELF, and so on. Experienced developers use the superior alternatives for a reason…

So what does that mean, "being too lazy to use a cross-compiler"? It means cheating: using our regular rustc setup, which builds ordinary user-space code, and then trying to run the result in kernel space. This will actually work, at least for a while. But eventually, we may find ourselves engaged in multi-week debugging nightmares.

So today, I'm going to talk about the sanity-saving difference between --target x86_64-unknown-linux-gnu and --target x86_64-unknown-none-gnu, and how to get your Rust compiler ready for the kernel world.

Problem 1: The "red zone", or how to keep interrupts from corrupting your stack

An x86_64 processor has a stack pointer register rsp. This points to a stack which grows downwards in memory, and which is used to store function arguments, return addresses and local variables. (Here's a great introduction to the x86 stack.)

Normally, all data for the current function is stored at or above the address pointed to by rsp. Everything below rsp in memory is leftover garbage that can be safely overwritten at any moment. This is important because, at any moment, the CPU might receive a hardware interrupt, causing it to pause your currently running function and push a whole bunch of new data on the stack. As long as all your data is at or above rsp, you're safe.

But hey, updating rsp to keep track of where we're storing data wastes precious processor time, and we can't have that. So the x86_64 calling convention (ABI) allows applications to use up to 128 bytes of scratch space below rsp without telling anybody about it. This 128-byte space is called the "red zone." But unfortunately, the CPU doesn't know about the red zone and will happily clobber it when handling an interrupt.
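To make the red zone problem concrete, here's a toy model in Rust (not real stack manipulation, just a Vec standing in for memory) in which a leaf function parks a local below rsp and then an interrupt arrives. ToyStack and red_zone_demo are hypothetical names for this illustration, and the values the "CPU" pushes are placeholders.

```rust
/// A toy model of a downward-growing stack, for illustration only: each
/// slot in `mem` stands for one 8-byte word, and `rsp` is an index into it.
pub struct ToyStack {
    pub mem: Vec<u64>,
    pub rsp: usize,
}

impl ToyStack {
    pub fn new(words: usize) -> Self {
        ToyStack { mem: vec![0; words], rsp: words }
    }

    /// Like the `push` instruction: move `rsp` down one word, then store.
    pub fn push(&mut self, value: u64) {
        self.rsp -= 1;
        self.mem[self.rsp] = value;
    }

    /// A leaf function keeping a local in the red zone: it writes 8 bytes
    /// below `rsp` but never moves `rsp`. Returns the slot it used.
    pub fn leaf_store_in_red_zone(&mut self, value: u64) -> usize {
        let slot = self.rsp - 1;
        self.mem[slot] = value;
        slot
    }

    /// The CPU taking an interrupt: it pushes SS, RSP, RFLAGS, CS and RIP
    /// (placeholder values here) onto the current stack. This model skips
    /// the CPU's 16-byte alignment step, which can shift the frame down by
    /// 8 bytes; the 40-byte frame still lands inside the 128-byte red zone.
    pub fn hardware_interrupt(&mut self) {
        for word in [0x10, 0x7ff0, 0x202, 0x08, 0x1234] {
            self.push(word);
        }
    }
}

/// Runs the scenario and returns what the leaf function's local holds
/// after the interrupt has been delivered.
pub fn red_zone_demo() -> u64 {
    let mut stack = ToyStack::new(32);
    stack.push(1); // caller data, safely at or above rsp
    let slot = stack.leaf_store_in_red_zone(42);
    stack.hardware_interrupt();
    stack.mem[slot]
}
```

Here red_zone_demo() returns 0x10, the placeholder SS value, rather than 42: the interrupt frame landed right on top of the leaf function's local.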

How bad can it get? Philipp Oppermann pointed out this debugging horror story:

It has costed me 6 days to debug and fix this, but it was really worth the effort.

It's interesting that no OSDev resources discussed this important topic before. From further inspection, there seems to be a high rate of hobby x86-64 kernels that get their leaf functions stacks silently overriden in case of an interrupt triggered in the right place.

Now to the story: somehow the montonic PIT interrupts that get triggered every 1 millisecond badly corrupted my kernel state. At first, I thought the handler code might have corrupted the kernel stack, but minimizing it only to acknowledging the local APIC… led to the same buggy behaviour.

It was weird. Once I enable interrupts and program the PIT to fire at a high rate, things go insane: random failed assert()s and page-fault exceptions get triggered all over the place.

Yeah, I think I'd just as soon skip that particular experience. Happily, we can convince rustc to not use the red zone by passing -C no-redzone, and this will do the right thing for a single crate.

But what if we want to use cargo? We really want to pass -C no-redzone to every Rust crate our compiler builds, because we really don't want little bits of red-zone-using code sneaking into our kernel when we're not looking.

Problem 2: How to keep interrupts from corrupting your floating point registers

There's a similar problem with floating point registers. But first, let's explain what happens when the kernel enters an interrupt routine. Here's a basic keyboard interrupt handler routine:

extern rust_keyboard_interrupt_handler
global keyboard_interrupt_handler

section .text
bits 64

keyboard_interrupt_handler:
        ;; Save registers which are normally supposed to
        ;; be saved by the caller.  I _think_ this list
        ;; is correct, but don't quote me on that.  I'm
        ;; probably forgetting something vital.
        push rax
        push rcx
        push rdx
        push r8
        push r9
        push r10
        push r11
        push rdi
        push rsi

        ;; Call a Rust function.
        call rust_keyboard_interrupt_handler

        ;; Pop the registers we saved.
        pop rsi
        pop rdi
        pop r11
        pop r10
        pop r9
        pop r8
        pop rdx
        pop rcx
        pop rax

        ;; Pop CPU interrupt state
        iretq

When the CPU enters our interrupt routine, most of the CPU registers still contain whatever data they did before. When we're done handling the interrupt, that data needs to still be there, or else the code that we interrupted will get really confused when the contents of a register change without warning.

We can save registers by pushing them to the stack. But which registers do we need to save? Well, we need to go read the ABI manual again and figure out which registers rust_keyboard_interrupt_handler is allowed to clobber, and which it must preserve. We quickly discover that, yes, Rust code is allowed to clobber the floating point registers.
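On the Rust side, the function the stub calls needs an unmangled symbol name and the C calling convention, since that's the ABI the register list above was chosen for. Here's a minimal sketch; the interrupt counter and keyboard_interrupt_count are hypothetical additions, just so the function has something observable to do.

```rust
use core::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical counter so this sketch has an observable side effect.
static KEYBOARD_INTERRUPTS: AtomicUsize = AtomicUsize::new(0);

/// Called by the assembly stub. `#[no_mangle]` keeps the symbol name
/// `rust_keyboard_interrupt_handler` so the stub's `call` can find it, and
/// `extern "C"` makes it follow the platform C ABI (System V on x86_64),
/// which decides which registers the stub has to save for us.
#[no_mangle]
pub extern "C" fn rust_keyboard_interrupt_handler() {
    KEYBOARD_INTERRUPTS.fetch_add(1, Ordering::SeqCst);
    // A real handler would read the scancode from I/O port 0x60 and
    // acknowledge the interrupt controller; both need port I/O and are
    // left out of this sketch.
}

/// How many times the handler has run so far.
pub fn keyboard_interrupt_count() -> usize {
    KEYBOARD_INTERRUPTS.load(Ordering::SeqCst)
}
```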

Can we save the floating point state in our interrupt? Well, the CPU provides the FXSAVE and FXRSTOR instructions:

Reloads the x87 FPU, MMX technology, XMM, and MXCSR registers from the 512-byte memory image specified in the source operand. This data should have been written to memory previously using the FXSAVE instruction, and the first byte of the data should be located on a 16-byte boundary. The FXSAVE table shows the layout of the state information in memory and describes the fields in the memory image for the FXRSTOR and FXSAVE instructions.

512 bytes? OK, that's huge. And it's going to take a relatively long time to dump and reload all that. So on the x86 and x86_64 architecture, the traditional solution is to simply avoid using floating point registers in kernel space, which is what Linux does. This keeps interrupt routines and syscalls nice and fast, without paying the full overhead of a context switch (which requires saving more registers).
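To get a feel for the cost, here's a sketch of the buffer FXSAVE needs, with Rust checking its size and alignment at compile time. FxSaveArea and GPR_SAVE_BYTES are my own hypothetical names; the 512-byte size and 16-byte alignment come from the Intel description quoted above.

```rust
use core::mem::{align_of, size_of};

/// Buffer for the FXSAVE/FXRSTOR memory image: 512 bytes that must start
/// on a 16-byte boundary, which `align(16)` guarantees.
#[repr(C, align(16))]
pub struct FxSaveArea {
    bytes: [u8; 512],
}

impl FxSaveArea {
    pub const fn new() -> Self {
        FxSaveArea { bytes: [0; 512] }
    }

    /// The pointer you would hand to FXSAVE/FXRSTOR (for example via
    /// `core::arch::x86_64::_fxsave`); not called in this sketch.
    pub fn as_mut_ptr(&mut self) -> *mut u8 {
        self.bytes.as_mut_ptr()
    }
}

/// Bytes the assembly stub pushes for its nine general-purpose registers.
pub const GPR_SAVE_BYTES: usize = 9 * 8;

// Compile-time checks that the layout matches what the instructions need.
const _: () = assert!(size_of::<FxSaveArea>() == 512);
const _: () = assert!(align_of::<FxSaveArea>() == 16);
```

That's about seven times the 72 bytes our stub pushes for the nine general-purpose registers, paid on every single interrupt.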

But there's a sneaky complication here: The MMX and XMM registers are technically part of the floating point state, but rustc feels free to use them to optimize non-floating-point code. So once again, we need to customize rustc's code generation. Gerd Zellweger proposes passing the following flags to rustc:

-C target-feature=-mmx,-sse,-sse2,-sse3,-ssse3,-sse4.1,-sse4.2,-3dnow,-3dnowa,-avx,-avx2

Again, we ideally want a solution that works with cargo, too, so that code using these registers doesn't turn up in a crate.
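Whatever mechanism we use to pass these flags, it can't hurt to have a crate double-check that they actually took effect. Here's a sketch, assuming a hypothetical --cfg kernel flag that you pass only for kernel builds; cfg!(target_feature = "...") reflects the codegen features rustc was configured with, so disabling SSE turns it off.

```rust
// Hypothetical guard: if a build with `--cfg kernel` still has SSE enabled
// (i.e. the target flags did not take effect), refuse to compile instead of
// silently emitting SSE instructions into kernel code.
#[cfg(all(kernel, target_arch = "x86_64", target_feature = "sse"))]
compile_error!("kernel code must be built with SSE disabled; check your --target");

/// Records, at compile time, whether this crate was built with these
/// SIMD code-generation features enabled.
pub const SIMD_FEATURES: [(&str, bool); 3] = [
    ("sse", cfg!(target_feature = "sse")),
    ("sse2", cfg!(target_feature = "sse2")),
    ("avx", cfg!(target_feature = "avx")),
];
```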

Solution: Creating a target file

When we want to compile code on one operating system or processor, and run it on another, we need to know about compiler "target triples". These are strings like x86_64-linux-gnu or thumbv7em-none-eabi, where the first part (x86_64 or thumbv7em) is the CPU architecture, the second part (linux or none) is the operating system, and the third part (gnu or eabi) is the ABI or calling convention. We can also insert an extra field naming the vendor (often just unknown), writing x86_64-unknown-linux-gnu.
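Here's a rough sketch of that splitting rule. parse_triple and TargetTriple are hypothetical; real-world triples are less regular than this (x86_64-apple-darwin has no ABI field, for instance), so treat it as an illustration of the naming scheme rather than a general parser.

```rust
/// The pieces of a target triple: `arch-os-abi`, or `arch-vendor-os-abi`
/// when the optional vendor field is present.
#[derive(Debug, PartialEq)]
pub struct TargetTriple<'a> {
    pub arch: &'a str,
    pub vendor: Option<&'a str>,
    pub os: &'a str,
    pub abi: &'a str,
}

/// Splits a triple on '-' and handles only the three- and four-part
/// forms described in the text; anything else yields `None`.
pub fn parse_triple(triple: &str) -> Option<TargetTriple<'_>> {
    let parts: Vec<&str> = triple.split('-').collect();
    match parts[..] {
        [arch, os, abi] => Some(TargetTriple { arch, vendor: None, os, abi }),
        [arch, vendor, os, abi] => Some(TargetTriple { arch, vendor: Some(vendor), os, abi }),
        _ => None,
    }
}
```

With this, parse_triple("x86_64-unknown-none-gnu") gives an arch of x86_64, a vendor of unknown, an OS of none and an ABI of gnu.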

Now, our problem up until now is that we've been compiling code for x86_64-unknown-linux-gnu and trying to run it without Linux. What we actually want is x86_64-unknown-none-gnu, where none means "no operating system". And we want to customize it a bit.

Thanks to some great work by Corey Richardson, we can define our custom target without modifying the compiler. We just need to create a file named x86_64-unknown-none-gnu.json and leave it in our project's root directory.

But what do we put in that file? Well, that's a bit mysterious. I recommend reading target/mod.rs and paying particular attention to the Target and TargetOptions structs. Then read through linux_base.rs and x86_64_unknown_linux_gnu.rs to get some sensible defaults. Then check your guesses against rust-barebones-kernel and Redox, and hope for the best. Based on all of these, here's my best guess, which includes both disable-redzone and Gerd Zellweger's FPU-related flags:

{
    "llvm-target": "x86_64-unknown-none-gnu",
    "target-endian": "little",
    "target-pointer-width": "64",
    "os": "none",
    "arch": "x86_64",
    "data-layout": "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128",
    "pre-link-args": [ "-m64" ],
    "cpu": "x86-64",
    "features": "-mmx,-sse,-sse2,-sse3,-ssse3,-sse4.1,-sse4.2,-3dnow,-3dnowa,-avx,-avx2",
    "disable-redzone": true,
    "eliminate-frame-pointer": true,
    "linker-is-gnu": true,
    "no-compiler-rt": true,
    "archive-format": "gnu"
}

But ultimately, the contents of this file are up to you: What CPU features do you want Rust to use in your kernel?
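If you want to sanity-check a features string like the one above, it's just a comma-separated list where a leading + enables an LLVM feature and a leading - disables it. A small hypothetical helper makes it easy to confirm that nothing is left switched on:

```rust
/// Splits an LLVM-style feature string ("-mmx,-sse,+foo") into
/// (enabled, name) pairs. Entries without a '+' or '-' prefix are
/// skipped (a choice made for this sketch).
pub fn parse_features(features: &str) -> Vec<(bool, &str)> {
    features
        .split(',')
        .map(str::trim)
        .filter(|f| !f.is_empty())
        .filter_map(|f| {
            if let Some(name) = f.strip_prefix('+') {
                Some((true, name))
            } else if let Some(name) = f.strip_prefix('-') {
                Some((false, name))
            } else {
                None
            }
        })
        .collect()
}

/// Returns the names of the features that the string explicitly enables.
pub fn enabled_features(features: &str) -> Vec<&str> {
    parse_features(features)
        .into_iter()
        .filter(|&(on, _)| on)
        .map(|(_, name)| name)
        .collect()
}
```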

To build using our new target, we can run rustc --target x86_64-unknown-none-gnu or cargo build --target x86_64-unknown-none-gnu. Both of these will look for an x86_64-unknown-none-gnu.json file in our current directory and use it.

Rebuilding libcore

Now that we have our new x86_64-unknown-none-gnu target, we can use it to rebuild Rust's libcore (the minimal core library that Rust code can use on bare metal). We can check out the Rust source tree and run rustc like this:

git clone https://github.com/rust-lang/rust.git
mkdir -p build
rustc --target x86_64-unknown-none-gnu -Z no-landing-pads \
    --out-dir build/ \
    rust/src/libcore/lib.rs

(The -Z no-landing-pads flag is to disable unwinding until we're ready to support it. See Philipp Oppermann's Setup Rust post for details.)

But when we run the command above, it fails with the following error:

LLVM ERROR: SSE register return with SSE disabled

You can find the gory details in the Rust bug tracker, but basically, as eternaleye explains:

The x86_64 ABI states that it is mandatory for floating-point return values to be passed in SSE registers. On one level, this makes an enormous amount of sense: the 387 FPU being a weird 80-bit thing causes real issues, and every x86_64 CPU possesses SSE2.

So our problem is that somewhere in libcore, somebody is trying to use floating point numbers, and there's no way to validly compile them without using those obnoxious SSE2 registers we just worked so hard to disable.

Now, it's not clear to me that floating point should be a mandatory part of libcore. It makes it harder to write kernels or Linux kernel modules in Rust. And of course, there are lots of embedded processors without FPU support, some of which would otherwise be fine targets for embedded Rust. "No floating point" might be a special case, but it's a relatively common and well-defined one.

Happily, there's a simple solution. The rust-barebones-kernel by John Hodge includes a patch which adds a disable_float feature to libcore, making all the important uses of f32 and f64 conditional. For example:

+#[cfg(not(disable_float))]
 clone_impl! { f32 }
+#[cfg(not(disable_float))]
 clone_impl! { f64 }
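The same pattern works in your own crates. Here's a sketch with a made-up Zero trait and zero_impl! macro standing in for libcore's Clone and clone_impl!; without --cfg disable_float the f32 and f64 impls are compiled, and with it they disappear along with any floating point code they would generate.

```rust
/// Hypothetical trait standing in for the libcore traits the patch touches.
pub trait Zero {
    fn zero() -> Self;
}

/// Generates one `Zero` impl, in the style of libcore's `clone_impl!`.
macro_rules! zero_impl {
    ($t:ty, $v:expr) => {
        impl Zero for $t {
            fn zero() -> Self {
                $v
            }
        }
    };
}

zero_impl! { u32, 0 }
zero_impl! { i64, 0 }

// Only compiled when `--cfg disable_float` is NOT passed to rustc.
#[cfg(not(disable_float))]
zero_impl! { f32, 0.0 }
#[cfg(not(disable_float))]
zero_impl! { f64, 0.0 }
```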

I've proposed including some version of this patch in libcore so that people will be able to skip this step in the future when working in environments without floating point.

Once we apply this patch to libcore, we can now build it as follows:

rustc --target x86_64-unknown-none-gnu -Z no-landing-pads \
    --cfg disable_float \
    --out-dir build/ \
    rust/src/libcore/lib.rs

So what happens if we try to build a crate using cargo build --target x86_64-unknown-none-gnu? We get:

error: can't find crate for `core` [E0463]

The final fix required to get libcore working is to place the compiled libcore.rlib somewhere that cargo will find it. Assuming you're using a nightly Rust build installed using multirust, you can write:

rustc --target x86_64-unknown-none-gnu -Z no-landing-pads \
    --cfg disable_float \
    --out-dir ~/.multirust/toolchains/nightly/lib/rustlib/x86_64-unknown-none-gnu/lib \
    rust/src/libcore/lib.rs

At this point, you can use cargo almost normally, as long as you only pull in crates that use core and not std. (It's possible to get collections working without too much trouble, which I'll talk about in a later post.)
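As an example of what "core only" looks like in practice, here's a sketch of a fixed-size text buffer that implements core::fmt::Write, so write! works with no heap and no std. FixedBuf is a hypothetical name and the 64-byte capacity is arbitrary; a kernel might later copy these bytes to the screen or a serial port.

```rust
use core::fmt::{self, Write};

/// A heap-free text buffer usable from `core`-only code.
pub struct FixedBuf {
    buf: [u8; 64],
    len: usize,
}

impl FixedBuf {
    pub const fn new() -> Self {
        FixedBuf { buf: [0; 64], len: 0 }
    }

    pub fn as_str(&self) -> &str {
        // Only whole `&str`s are ever copied in, so this is valid UTF-8.
        core::str::from_utf8(&self.buf[..self.len]).unwrap_or("")
    }
}

impl Write for FixedBuf {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let bytes = s.as_bytes();
        let end = self.len + bytes.len();
        if end > self.buf.len() {
            // Out of space: report an error instead of truncating.
            return Err(fmt::Error);
        }
        self.buf[self.len..end].copy_from_slice(bytes);
        self.len = end;
        Ok(())
    }
}
```

Usage is the familiar write!(buf, "irq {} fired", 1), which leaves "irq 1 fired" in the buffer; in a #![no_std] crate nothing else changes.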

Next steps

Now that we have CPU I/O ports and a correctly configured target, we're ready to set up interrupts. And once we have interrupts, we can implement a keyboard driver and start doing I/O!

If you have any questions (or corrections!), I'll be following the discussion over on /r/rust.
