NixOS has great out-of-the-box support for ARM64v8 systems, but that comes with a catch: you have to use the prebuilt images to install the system, which are (obviously) not customizable and come without OpenSSH enabled by default. Unfortunately, this means you have to attach a display to the Raspberry Pi to complete an installation – not ideal! This article is the story of my journey to build a custom NixOS image for my Raspberry Pi, with all the pitfalls and errors I had to solve to eventually reach the objective.

NOTE: if you just want to have a working image quickly, then head over to this GitHub repo and follow the instructions. If you’re already running on NixOS, then check out Building on NixOS, or if you want to avoid Docker you might want to just jump to The VM approach: using Vagrant. Finally, if you feel brave and want to get this done in absolutely the quickest way possible, check out how to build an SD image natively on EC2 in 5 minutes.


An introduction to Nix

The world of system administration (a term now falling into disuse) has been transformed over the years. The rapid rise of the “cattle, not pets” mantra has permanently changed the way people deploy their applications, and for good reason.

Nowadays, in such a stateless and containerized world, one technology is increasingly gaining traction: Nix. The promise is simple: take a package manager, add a functional and declarative programming language to define packages on top, and season with an isolated and sandboxed build process. Voilà! You now have a way to build reproducible, stable packages that are easy to roll back.

Now take the same recipe and extend the concept to an entire Linux distribution: that’s exactly what NixOS aims to do. You can have a configuration as minimal as:

{
  boot.loader.grub.device = "/dev/sda";
  fileSystems."/".device = "/dev/sda1";
  services.sshd.enable = true;
}

(taken from https://nixos.org/nixos/about.html)

And you’re just a command away from having a fully functional system with OpenSSH already set up.

NixOS on a Raspberry Pi

Given the obvious advantages of all of this, it seems natural to extend the concept even to a platform like the Raspberry Pi. A practical example of why this could be particularly useful is given by the fragility of SD cards: in the event of a catastrophic filesystem failure due to an SD card committing seppuku, having a .nix file which holds everything needed to get my Pi from zero to ready sounds quite amazing.

Fortunately, it looks like the people and contributors behind NixOS thought the same and have provided first-class support for AArch64 (ARM64v8), with the Raspberry Pi use case in mind. They also provide ready-to-boot SD card images on their build system. If you just want to try NixOS on your AArch64 system, take the most recent successful build of the current stable release of NixOS (look here for the stable at the time of writing, 20.03) and flash it on your SD card. It’s that easy! Keep in mind, however, that you have to attach an HDMI display and a keyboard to your RPi.

Note that at the time of writing, according to the unofficial wiki, NixOS has first-class support only for the Raspberry Pi 3, but there’s plenty of activity regarding the Raspberry Pi 4, and configs which seem to work for other users.

However, I want my SD image to be as close to the final configuration as possible – at the very minimum, I want OpenSSH already set up with my SSH key so I don’t have to painstakingly connect my RPi to a display. But there’s a catch: images for foreign architectures can only be built by a system which understands that architecture.

This leaves three ways to build a custom AArch64 image with your own configuration. In ascending order of complexity:

  • Use a remote builder, such as an actual Raspberry Pi with Nix on it, or ask for access to the community aarch64 builder.
  • Just build it on the Raspberry Pi (i.e. without a remote build). Be sure to use a fair amount of swap if you do that, as the image creation process is memory hungry.
  • Emulate AArch64 and comfortably build the image on my PC, which is x86_64.

Clearly, the only possible option for me was to go with the easiest (read: most complicated) way of the three: building an SD image from x86_64, which requires emulation. Buckle up, this is going to be quite a ride!

Emulating AArch64: QEMU and binfmt_misc

Fortunately, the wiki has plenty of good pointers, which I used as the basis for my endeavor.

There are two ways to do this:

  • booting an emulated AArch64 system (system emulation), so basically an emulated VM. This is fine but on the heavier side of things, as you need to allocate a fixed amount of RAM to the guest system and need a boot image.
  • using QEMU in user emulation mode, which allows it to execute foreign binaries on the fly, without requiring a running guest system.

At this point, I decided to go with user emulation, but one more trick is needed to make the whole thing work: binfmt_misc. binfmt_misc is an awesome capability of the Linux kernel which lets it recognize foreign executable formats and delegate their execution to any userspace program. When you try to execute something the kernel does not know how to handle, it checks whether the file matches any registered binfmt_misc handler via its “magic” (typically a byte pattern at a fixed offset in the file); if it does, it calls the specified executable (the emulator, in our case) with the original command line. Pretty cool! The kernel uses a very similar mechanism to handle the shebang line at the top of your scripts. Pair this with QEMU, an incredibly extensive emulator, and you can run AArch64 binaries on your box! Keep in mind, however, that since this emulates the actual architecture, it will be quite slow.
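To make the “magic” idea concrete, here is a small bash sketch that performs the same masked comparison the kernel does for a handler registered at offset 0: for every byte, the bits selected by the mask must be equal in the file and in the magic. The magic/mask pair is, to the best of my knowledge, the one QEMU's qemu-binfmt-conf.sh registers for aarch64 (it matches 64-bit little-endian ELF files whose e_machine is 0xb7, i.e. AArch64); double-check it against your QEMU version. The function name matches_aarch64 is made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: mimic binfmt_misc's magic/mask check for the aarch64 handler.
# Magic and mask are written as hex strings (20 bytes each); assumed to
# mirror the values used by QEMU's qemu-binfmt-conf.sh.
AARCH64_MAGIC="7f454c460201010000000000000000000200b700"
AARCH64_MASK="ffffffffffffff00fffffffffffffffffeffffff"

# matches_aarch64 FILE: exit 0 if the first 20 bytes of FILE match the
# magic under the mask (hypothetical helper, offset 0 like the kernel entry).
matches_aarch64() {
  local header
  # Dump the first 20 bytes as one continuous lowercase hex string.
  header=$(od -An -tx1 -N20 -v "$1" | tr -d ' \n')
  [ "${#header}" -eq 40 ] || return 1   # file too short to match
  local i file_byte magic_byte mask_byte
  for ((i = 0; i < 40; i += 2)); do
    file_byte=$((16#${header:i:2}))
    magic_byte=$((16#${AARCH64_MAGIC:i:2}))
    mask_byte=$((16#${AARCH64_MASK:i:2}))
    # A byte matches when no bit selected by the mask differs.
    if (( (file_byte ^ magic_byte) & mask_byte )); then
      return 1
    fi
  done
  return 0
}
```

For example, matches_aarch64 on an AArch64 binary should succeed, while an x86_64 binary (e_machine 0x3e) or a shell script will not match. The mask byte 0xfe at offset 16 is what lets both ET_EXEC (2) and ET_DYN (3, e.g. PIE) executables through.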

Building on NixOS using nixos-generators

Thanks to the useful insights and tips of @makefoo, @Atemu12 and @sohalt on Reddit and lobste.rs, I am now aware of a quicker way to get this done if you’re already running NixOS on another system.

NixOS changes quickly, so refer to the unofficial wiki for up-to-date instructions.

If you’re on x86_64, enable emulation by adding the following to your /etc/nixos/configuration.nix:

boot.binfmt.emulatedSystems = [ "aarch64-linux" ];

Running nixos-rebuild switch will, in one shot, set up QEMU and register it with binfmt_misc. The easiest way to build an image is to use nixos-generators, a powerful set of utilities which takes care of the whole build in one command.

First, follow the instructions in Nix packages and image configuration to clone a local checkout of nixpkgs and set up a basic sd-image.nix.

Make sure the checkout of nixpkgs is available at $HOME/nixpkgs and the configuration at $HOME/sd-image.nix, then install nixos-generators:

nix-env -f https://github.com/nix-community/nixos-generators/archive/master.tar.gz -i

You should now have nixos-generate in your $PATH. Time to actually build the image:

nixos-generate -f sd-aarch64-installer --system aarch64-linux -c $HOME/sd-image.nix -I nixpkgs=$HOME/nixpkgs

If everything goes well, an image file will be produced and its path printed on screen!

Refer to the documentation of nixos-generators for more info, or go to Flashing.

The VM approach: using Vagrant

This details my journey (including errors that I had to face) to build an image on a normal, non-NixOS VM.

My first approach was to use Vagrant to spin up a VM.

vagrant init generic/debian9

Make sure to start the VM with enough RAM – the image building step of the process takes more than 4 GiB of RAM, but if you have enough swap it’s alright. If you’re using VirtualBox as the backend:

patch Vagrantfile <<EOF
52c52
<   # config.vm.provider "virtualbox" do |vb|
---
>   config.vm.provider "virtualbox" do |vb|
57,58c57,59
<   #   vb.memory = "1024"
<   # end
---
>     vb.memory = "4096"
>     vb.cpus = 4
>   end
EOF
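Since swap can make up for a smaller VM, it may be worth checking RAM and swap together inside the VM before kicking off hours of emulated building. Here is a minimal bash sketch, assuming a Linux guest with /proc/meminfo; the helper name enough_memory is hypothetical, and the 8 GiB threshold in the usage below is only my guess based on the cptofs peak described later.

```bash
#!/usr/bin/env bash
# Sketch: succeed only if MemTotal + SwapTotal is at least MIN_GIB GiB.
# MEMINFO defaults to /proc/meminfo; passing another file eases testing.
# enough_memory MIN_GIB [MEMINFO]  (hypothetical helper)
enough_memory() {
  local min_gib=$1 meminfo=${2:-/proc/meminfo}
  local total_kib
  # /proc/meminfo reports both values in kB (actually KiB).
  total_kib=$(awk '/^(MemTotal|SwapTotal):/ { sum += $2 } END { print sum + 0 }' "$meminfo")
  (( total_kib >= min_gib * 1024 * 1024 ))
}
```

Usage: `enough_memory 8 || echo "consider adding swap before building"`.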

Setup

After I got into the VM with vagrant ssh, I set up QEMU:

sudo apt update
sudo apt install -y qemu-user-aarch64 binfmt-support qemu-user-static

Then I checked that QEMU was correctly registered as a binfmt_misc handler:

$ sudo update-binfmts --display
[...]
qemu ... (enabled):
[...]

At this point, I set up Nix:

# installs Nix (run this as a normal user)
sh <(curl -L https://nixos.org/nix/install) --no-daemon
# loads nix into the env without reopening the shell
. $HOME/.nix-profile/etc/profile.d/nix.sh

Nix packages and image configuration

It is now time to choose what revision of nixpkgs to use. In this case, I want to build the latest stable release of NixOS (20.03 at the time of writing), so I cloned the release-20.03 branch of nixpkgs:

git clone --depth=1 -b release-20.03 https://github.com/NixOS/nixpkgs

To avoid running into weird “cannot allocate memory” errors, check whether QEMU has been updated to version 5 or whether PR #82718 by @misuzu has been merged. If neither is the case, run:

cd nixpkgs
curl -L "https://github.com/NixOS/nixpkgs/pull/82718.patch" | git am
cd -

This will apply the patch of @misuzu on top of the cloned checkout of nixpkgs. So far so good! I then set up a basic configuration:

# quote 'EOF' so the shell doesn't run the backticks in the Nix comments below
cat > $HOME/sd-image.nix <<'EOF'
{ lib, ... }: {
  imports = [
    <nixpkgs/nixos/modules/installer/cd-dvd/sd-image-aarch64.nix>
  ];
  # The installer starts with a "nixos" user to allow installation, so add the SSH key to
  # that user. Note that the key is, at the time of writing, put in `/etc/ssh/authorized_keys.d`
  users.extraUsers.nixos.openssh.authorizedKeys.keys = [
     "ssh-ed25519 ..."
  ];
  # bzip2 compression takes loads of time with emulation, skip it.
  sdImage.compressImage = false;
  # OpenSSH is forced to have an empty `wantedBy` on the installer system[1], which prevents it
  # from being started. Override it with the normal value.
  # [1] https://github.com/NixOS/nixpkgs/blob/9e5aa25/nixos/modules/profiles/installation-device.nix#L76
  systemd.services.sshd.wantedBy = lib.mkOverride 40 [ "multi-user.target" ];
  # Enable OpenSSH out of the box.
  services.sshd.enable = true;
}
EOF

Building, failing and building again

Everything is now ready to go! Time to kick off the build:

cd $HOME/nixpkgs/nixos
nix-build -A config.system.build.sdImage \
  --option system aarch64-linux \
  --option sandbox false \
  -I nixos-config=$HOME/sd-image.nix \
  -I nixpkgs=$HOME/nixpkgs \
  default.nix

This took quite a while, but eventually it reached the stage where it started to create the .img file. cptofs (the utility used to copy the system files into the image being built) is an incredible memory hog: it was the most memory-intensive process of the whole build, in my case peaking at about 8 GiB of RAM. Then, suddenly, it started spitting out loads of this:

error while reading directory /nix/store/[...]: Cannot allocate memory

Ouch, that doesn’t look good. I monitored RAM usage and the system was definitely not out of memory (nor swap), so something weird was going on. However, the build did not error out, it produced an image anyway.

Some Googling revealed very few results, except for some poor souls on the IRC channels #nixos/#nixos-aarch64 who had the same issues, reported being unable to boot the resulting images, and found no solution [1] [2] [3]. Other sources say that the build works anyway, but I didn’t feel comfortable booting an (apparently) half-baked image. Not wanting to give up, I intensified my Googling and found similar issues reported for other software running under emulation. Specifically, Debian maintainers found that allocations made with brk(2) by PIE-compiled binaries were failing randomly. The issue, originally reported in 2018, was fixed in January 2020. Still not sure whether this was the fix for what was going on, and using a distro not exactly known for its up-to-date software, I decided to build the latest stable QEMU from source. At the time of writing, v5.0.0 had just come out!

Note: it is very much possible that a distribution with more up-to-date packages won’t need this. Attempt a normal build first!

Note: Debian unstable already bundles QEMU 5.0 – it’s perfectly sufficient to use the official package if available.

Update: this can also be fixed by applying the patch described in the Nix packages and image configuration section.
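Before building QEMU from source, it may be worth checking which version is already installed. A minimal bash sketch, assuming the Debian binary name qemu-aarch64-static and a banner of the form `qemu-aarch64 version 4.2.0 (...)`; both helpers are hypothetical, and this only covers the QEMU half of the decision (not whether the nixpkgs patch has been merged):

```bash
#!/usr/bin/env bash
# qemu_major VERSION_LINE: print the major version found after "version ".
qemu_major() {
  printf '%s\n' "$1" | sed -n 's/.*version \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# needs_newer_qemu [QEMU_BINARY]: exit 0 if QEMU is older than 5.x.
# The default binary name is an assumption (Debian's qemu-user-static).
needs_newer_qemu() {
  local bin=${1:-qemu-aarch64-static} major
  # A missing binary or unexpected output yields no version, which is
  # treated as "needs a newer QEMU" to stay on the safe side.
  major=$(qemu_major "$("$bin" --version 2>/dev/null | head -n 1)")
  [ -z "$major" ] || (( major < 5 ))
}
```

Usage: `needs_newer_qemu && echo "build QEMU 5.0 or apply the nixpkgs patch"`.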

# remove system QEMU
sudo apt remove qemu-system-aarch64 qemu-user-static
# install deps first (git is needed for the clone below)
sudo apt install git libglib2.0-dev libfdt-dev libpixman-1-dev zlib1g-dev
# clone qemu
git clone --depth=1 -b v5.0.0 https://git.qemu.org/git/qemu.git && cd qemu
# configure minimally and build
./configure --enable-linux-user --target-list=aarch64-linux-user --disable-bsd-user \
    --disable-system --disable-vnc --disable-curses --disable-sdl --disable-vde \
    --disable-kvm --static --disable-tools --cpu=x86_64
make -j$(( $(nproc --all) + 1 ))

A while later, I had a working QEMU binary in ./aarch64-linux-user/qemu-aarch64. However, removing the Debian package had also removed its binfmt_misc registration. Fortunately, QEMU comes with a script to set up binfmt_misc out of the box. I altered it slightly to only register the signature for aarch64 binaries:

patch scripts/qemu-binfmt-conf.sh <<'EOF'
4,7c4
< qemu_target_list="i386 i486 alpha arm armeb sparc sparc32plus sparc64 \
< ppc ppc64 ppc64le m68k mips mipsel mipsn32 mipsn32el mips64 mips64el \
< sh4 sh4eb s390x aarch64 aarch64_be hppa riscv32 riscv64 xtensa xtensaeb \
< microblaze microblazeel or1k x86_64"
---
> qemu_target_list="aarch64"
EOF

sudo ./scripts/qemu-binfmt-conf.sh --qemu-path $(pwd)/aarch64-linux-user

Note that this won’t persist after a reboot and isn’t the “standard way” of doing it, but it’s more than sufficient for a quick build.
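If you do want the registration to survive a reboot, one common route on systemd-based distributions is a file in /etc/binfmt.d, which systemd-binfmt feeds to the kernel at boot. Such a file contains lines in the same `:name:type:offset:magic:mask:interpreter:flags` format accepted by /proc/sys/fs/binfmt_misc/register. The bash sketch below only builds that line; the entry name qemu-aarch64, the helper name binfmt_line and the interpreter path in the usage are assumptions, and the magic/mask are the aarch64 values I believe qemu-binfmt-conf.sh uses.

```bash
#!/usr/bin/env bash
# binfmt_line INTERPRETER [FLAGS]: print a binfmt_misc registration line
# for aarch64 ELF binaries (hypothetical helper).
binfmt_line() {
  local interpreter=$1 flags=${2:-}
  # Single quotes keep the \x escapes literal; the kernel decodes them.
  local magic='\x7fELF\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\xb7\x00'
  local mask='\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff'
  # Type M = match by magic; the empty offset field means offset 0.
  printf ':qemu-aarch64:M::%s:%s:%s:%s\n' "$magic" "$mask" "$interpreter" "$flags"
}
```

Usage, after copying the binary somewhere stable: `binfmt_line /usr/local/bin/qemu-aarch64 F | sudo tee /etc/binfmt.d/qemu-aarch64.conf` followed by `sudo systemctl restart systemd-binfmt`.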

And voilà! Time to attempt a new build:

cd $HOME/nixpkgs/nixos
nix-build -A config.system.build.sdImage \
  --option system aarch64-linux \
  --option sandbox false \
  -I nixos-config=$HOME/sd-image.nix \
  -I nixpkgs=$HOME/nixpkgs \
  default.nix

This time, no allocation failures! Woo-hoo, thanks QEMU contributors! Unfortunately, my celebrations were quickly stopped by what I saw next:

building '/nix/store/q4kcsy4f1jcxxa2kc6x02rjhg8z1911y-ext4-fs.img.zst.drv'...
[...]
copying store paths to image...
copying files to image...
e2fsck 1.45.5 (07-Jan-2020)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
NIXOS_SD: 93738/177408 files (0.1% non-contiguous), 581365/708701 blocks
Resizing to minimum allowed size
resize2fs 1.45.5 (07-Jan-2020)
Please run 'e2fsck -f temp.img' first.

builder for '/nix/store/q4kcsy4f1jcxxa2kc6x02rjhg8z1911y-ext4-fs.img.zst.drv' failed with exit code 1
cannot build derivation '/nix/store/ari7yzjkhif5d7q256dy5rrdfkjhqb8f-nixos-sd-image-20.09pre221814.10100a97c89-aarch64-linux.img.drv': 1 dependencies couldn't be built
error: build of '/nix/store/ari7yzjkhif5d7q256dy5rrdfkjhqb8f-nixos-sd-image-20.09pre221814.10100a97c89-aarch64-linux.img.drv' failed

This proved quite difficult to debug and solve, and I separated the investigation for this issue in another post: “Why doesn’t resize2fs resize my filesystem?”.

The gist of it is that I had to patch nixpkgs to sort it out; check this PR to see whether it has been merged, in which case you don’t need to take care of it anymore. Otherwise, applying the patch to the local checkout is pretty easy:

cd $HOME/nixpkgs
curl -L "https://github.com/NixOS/nixpkgs/pull/86366.patch" | git am

After applying the patch and rebuilding, the build finally finished, leaving me with a fancy .img file:

$ ls result/sd-image/
nixos-sd-image-20.03pre-git-aarch64-linux.img

Flashing

After getting the image file back to a machine with an SD card reader (a MacBook), I looked up the device name of my SD card (/dev/disk2, retrievable via “Disk Utility”) and flashed it through its raw device node, /dev/rdisk2:

sudo gdd if=nixos*.img of=/dev/rdisk2 bs=64K status=progress

Note: this permanently destroys data on the target device. Use with care.

Note: gdd comes from the Homebrew package coreutils and is an up-to-date GNU dd, which supports the status= parameter. The built-in dd works fine as well, as long as you drop status=progress if your version doesn’t support it.
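The macOS steps can be wrapped in a couple of guarded helpers. This is a minimal bash sketch, not a tested tool: the helper names are hypothetical, it assumes the whole-disk identifier shown by `diskutil list` (e.g. disk2), and it uses the built-in BSD dd with `bs=1m`. Writing to /dev/rdiskN (the raw node) rather than /dev/diskN is what the command above does, and it is commonly much faster.

```bash
#!/usr/bin/env bash
# raw_device DEVICE: map /dev/diskN (or diskN) to /dev/rdiskN; refuse
# partitions (diskNsM) and anything that isn't a whole-disk identifier.
raw_device() {
  local dev=${1#/dev/}
  if [[ $dev =~ ^disk[0-9]+$ ]]; then
    printf '/dev/r%s\n' "$dev"
  else
    echo "not a whole-disk identifier: $1" >&2
    return 1
  fi
}

# flash_image IMAGE DEVICE: DESTROYS all data on DEVICE (macOS only).
flash_image() {
  local image=$1 raw
  raw=$(raw_device "$2") || return 1
  # The card's volumes must be unmounted before writing the raw device.
  diskutil unmountDisk "$2" || return 1
  sudo dd if="$image" of="$raw" bs=1m
  sync
}
```

Usage: `flash_image nixos-sd-image-20.03pre-git-aarch64-linux.img /dev/disk2`.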

After plopping the SD card back into my Raspberry Pi, I connected it to the network and powered it up. Crossing all the fingers that I have, I waited for it to come online, and moment of truth time…

$ ssh nixos@10.0.0.x
Enter passphrase for key '<key>':
[nixos@nixos:~]$ uname -a
Linux nixos 5.4.33 #1-NixOS SMP Fri Apr 17 08:50:26 UTC 2020 aarch64 GNU/Linux

Eureka! It’s alive, and it was born with my SSH public key! It took quite some effort, but it was a success.

One step forward: the Docker approach

I could have stopped here, but it wasn’t fun enough! Since I know I won’t be the only one who wants to do this, I want to at least make it as easy as possible for the next person. Thus, I decided to create a Dockerfile which, along with some docker-compose magic, makes it possible to build a NixOS SD image with one command in about 15-20 minutes.

It’s available on GitHub, and the documentation should be easy to follow. You can stop reading this post now if you just need to build NixOS; keep reading for some background.

Background

Originally, I was planning to use multiarch/qemu-user-static from Docker Hub to set up QEMU and binfmt_misc for free. Unfortunately, it has not been updated to QEMU 5.0 yet, which according to my testing is the only version that does not throw random memory errors when copying the files to the filesystem. I looked into contributing to their repo, but they depend on a package supplied by Fedora, which has only been updated to the latest release candidate of QEMU 5.0 rather than the final version, which came out a few days ago.

To solve this, I built my own Dockerfile (with blackjack and hookers) which downloads a statically compiled usermode QEMU 5.0 exclusively for AArch64 from the official Debian repositories. After verifying its integrity, it also downloads the official script to enable binfmt_misc from the main repository of QEMU and, via the privileged Docker flag, enables it on the host system.

I used a few tricks to make the thing really painless:

  • I used the fix-binary (F) flag when registering the QEMU binary via binfmt_misc. Normally the kernel looks up the “interpreter” (the emulator, in this case) lazily, opening it by path on every invocation; with this flag, the kernel opens the QEMU binary once at registration time and keeps it open. This means the executable can be deleted afterwards without any issues, which is exactly what happens since the container is short-lived. It also means the emulator works inside other containers whose filesystems don’t contain QEMU at all. Pretty amazing!
  • Since containers don’t wait for each other to finish, I compiled a very small AArch64 binary which simply calls printf. The builder runs it to wait until QEMU has been correctly set up before starting the build.
  • I wanted the whole process to be as secure as possible, so the containers that interact with the host kernel (using the privileged flag) are only in charge of the task of setting up QEMU: the actual build is done on another container which does not require special privileges.
  • The previous point, however, raised another roadblock: containers are started in parallel and execution is (rightfully) not sequential. I wanted to clear the binfmt_misc handler once the build was done, to get rid of the only residue effectively left on the host system. To do that, there is a final container (using the same image as the one that sets up QEMU) which waits on the local network to be notified by the main container when the build is over. I achieved that by simply listening on a TCP port with nc (which comes out of the box on Alpine, thanks busybox!) in the cleanup container, which patiently waits until the builder sends a message to that port.
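The readiness wait and the cleanup described above can be sketched in bash roughly as follows. This is an illustration under assumptions, not the code from the repository: the helper names, the probe path and the entry name qemu-aarch64 are hypothetical. Writing -1 to a binfmt_misc entry removes it (writing 0 would only disable it).

```bash
#!/usr/bin/env bash
# wait_for_emulation PROBE [TIMEOUT_SECONDS]: run PROBE (e.g. a tiny
# AArch64 binary) once per second until it succeeds, i.e. until the
# binfmt_misc handler is in place. Returns 1 on timeout.
wait_for_emulation() {
  local probe=$1 timeout=${2:-60} waited=0
  until "$probe" >/dev/null 2>&1; do
    if (( waited >= timeout )); then
      echo "emulation not ready after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# unregister_binfmt [ENTRY_FILE]: remove a binfmt_misc entry if present.
unregister_binfmt() {
  local entry=${1:-/proc/sys/fs/binfmt_misc/qemu-aarch64}
  [ -e "$entry" ] || return 0
  echo -1 > "$entry"
}
```

In the cleanup container, something like `nc -l -p 9999 && unregister_binfmt` would block until the builder runs `echo done | nc cleanup 9999` (port and hostname are made up; check the option syntax of your busybox nc).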

With this, I had all the necessary ingredients to make this painless for everyone else. Check it out on GitHub if you haven’t already.

Pushing it to the limit: native AArch64 build on EC2 in 5 minutes

Not too long ago, Amazon added native AArch64 machines to EC2, which are perfect for our use case. The cost is already pretty low, but you can go even lower with Spot instances!

Initially, I just spun up an EC2 instance, cloned the repository mentioned in the previous section and let it run. As a sneak peek, here is exactly how long it took to build the entire image:

$ time sudo docker-compose up
...
build-nixos_1  | /nix/store/56khbas8w2y9xv5m6lihpmadw73nfvkd-nixos-sd-image-20.03pre-git-aarch64-linux.img
nixos-docker-sd-image-builder_build-nixos_1 exited with code 0

real    5m16.864s
user    0m1.519s
sys     0m0.194s

Did I already mention that this includes the time to build the Docker image too? That’s pretty good!

However, I decided to go one step further and create a Packer configuration which automatically creates an EC2 Spot instance and builds an SD image of NixOS on it. It works almost magically: all it takes is cloning nixos-docker-sd-image-builder and running the following:

cd packer
packer build build.pkr.hcl

Check out the documentation if you want to learn more!

Conclusion

Whoa, this was one hell of a journey! Weird, hard-to-debug problems and the urge to always go one step further than before certainly made it very interesting.

I hope this was informative – feel free to send me an e-mail if you have any questions or comments!

Appendix: Changelog

I intend to update this article if anything changes or if I get suggestions. Here’s a changelog:

2020-05-10

  • replaced Terraform with Packer for the EC2 magic

2020-05-09

  • added section to build with nixos-generators
  • added a mention of a patch which gets rid of cptofs and makes it possible to avoid using QEMU 5
  • use -I nixpkgs= instead of setting NIX_HOME
  • slight cleanup/reorganization of Nix packages and image configuration

Thanks to @sohalt, @makefoo and @Atemu12!