These past few weeks have been busy, technical and otherwise. But here's what I've been thinking about.
Our latest release, Beta 11, contains a lot of new features and set a new record for the number of release candidate builds required to get it stable (11, to be precise).
But I'm going to focus on one specific problem: a four-car pileup of complex codebases, starting with building static binaries in Rust.
A static binary has no dynamically linked libraries from the local machine. This means that while it depends on features of the operating system to run, it carries all of the code it needs with it.
This is a key distinction: a static binary for Linux is still useless on Windows, but a dynamically linked binary is tied to a specific version of whatever software it is linked against.
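One way to see this distinction on an actual file: a dynamically linked ELF executable carries a `PT_INTERP` program header naming the loader that must exist on the machine it runs on, while a static one does not. The sketch below is a minimal example, assuming an ordinary well-formed ELF file, that scans the program headers for that entry in either word size and byte order. The function name `has_interpreter` is ours, not part of any library.

```rust
/// PT_INTERP program header type, from the ELF specification.
const PT_INTERP: u32 = 3;

/// Returns Some(true) if the ELF image requests a dynamic loader (has a
/// PT_INTERP program header), Some(false) if it does not, and None if the
/// bytes are not a parseable ELF file.
fn has_interpreter(data: &[u8]) -> Option<bool> {
    if data.len() < 16 || !data.starts_with(b"\x7fELF") {
        return None;
    }
    // EI_CLASS: 1 = 32-bit, 2 = 64-bit. EI_DATA: 1 = little-endian, 2 = big-endian.
    let is_64 = match data[4] { 1 => false, 2 => true, _ => return None };
    let big_endian = match data[5] { 1 => false, 2 => true, _ => return None };

    // Read an unsigned integer of `len` bytes at `off`, in the file's byte order.
    let read = |off: usize, len: usize| -> Option<u64> {
        let bytes = data.get(off..off.checked_add(len)?)?;
        let mut v: u64 = 0;
        if big_endian {
            for &b in bytes { v = (v << 8) | b as u64; }
        } else {
            for &b in bytes.iter().rev() { v = (v << 8) | b as u64; }
        }
        Some(v)
    };

    // e_phoff, e_phentsize and e_phnum sit at different offsets in ELF32 and ELF64.
    let (phoff, phentsize, phnum) = if is_64 {
        (read(32, 8)?, read(54, 2)?, read(56, 2)?)
    } else {
        (read(28, 4)?, read(42, 2)?, read(44, 2)?)
    };

    for i in 0..phnum {
        // Out-of-range offsets make `read` return None rather than panic.
        let entry = phoff.checked_add(i.checked_mul(phentsize)?)? as usize;
        // p_type is the first 4-byte field of every program header.
        if read(entry, 4)? as u32 == PT_INTERP {
            return Some(true);
        }
    }
    Some(false)
}

fn main() {
    // Path to inspect, e.g. one of the rita binaries in the build staging dir.
    let path = std::env::args().nth(1).unwrap_or_else(|| "/bin/sh".to_string());
    let data = std::fs::read(&path).unwrap_or_default();
    match has_interpreter(&data) {
        Some(true) => println!("{path}: has PT_INTERP, needs a dynamic loader"),
        Some(false) => println!("{path}: no PT_INTERP, no dynamic loader required"),
        None => println!("{path}: not a readable ELF file"),
    }
}
```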
In our case, the problem sits at the intersection of Rust, MIPS, and soft floating point.
❯ find build/ -name rita | grep "sbin" | grep "staging_dir" | xargs file
build/staging_dir/target-arm_cortex-a9+vfpv3_musl_eabi/root-mvebu/usr/sbin/rita: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, no section header
build/staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-ipq40xx/usr/sbin/rita: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, no section header
build/staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-brcm2708/usr/sbin/rita: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, no section header
build/staging_dir/target-mips_24kc_musl/root-ath79/usr/sbin/rita: ELF 32-bit MSB pie executable, MIPS, MIPS32 rel2 version 1 (SYSV), dynamically linked, interpreter /lib/ld-musl-mips-sf.so.1, no section header
build/staging_dir/target-mips_24kc_musl/root-ar71xx/usr/sbin/rita: ELF 32-bit MSB pie executable, MIPS, MIPS32 rel2 version 1 (SYSV), dynamically linked, interpreter /lib/ld-musl-mips-sf.so.1, no section header
build/staging_dir/target-mipsel_24kc_musl/root-ramips/usr/sbin/rita: ELF 32-bit LSB pie executable, MIPS, MIPS32 rel2 version 1 (SYSV), dynamically linked, interpreter /lib/ld-musl-mipsel-sf.so.1, no section header
build/staging_dir/target-x86_64_musl/root-x86/usr/sbin/rita: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, no section header
build/staging_dir/target-i386_pentium4_musl/root-x86/usr/sbin/rita: ELF 32-bit LSB executable, Intel 80386, version 1 (GNU/Linux), statically linked, no section header
Notice how the MIPS binaries are the only ones that are not statically linked.
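To make that check repeatable across builds, something like the following Rust sketch could read `file` output like the listing above and report every binary that is dynamically linked, along with the interpreter it expects. It assumes the one-line-per-file `path: description` format shown above; the function name `dynamic_binaries` and the `check_static` program name are hypothetical.

```rust
use std::io::{self, Read};

/// Scan `file` output (one `path: description` line per file) and return the
/// path and requested interpreter of every dynamically linked binary.
fn dynamic_binaries(output: &str) -> Vec<(String, Option<String>)> {
    output
        .lines()
        .filter_map(|line| {
            // `file` separates the path from its description with ": ".
            let (path, desc) = line.split_once(": ")?;
            if !desc.contains("dynamically linked") {
                return None;
            }
            // The loader shows up as an "interpreter <path>" field.
            let interpreter = desc
                .split(", ")
                .find_map(|field| field.strip_prefix("interpreter "))
                .map(str::to_string);
            Some((path.to_string(), interpreter))
        })
        .collect()
}

fn main() {
    // Hypothetical usage: find build/ -name rita | xargs file | check_static
    let mut input = String::new();
    io::stdin().read_to_string(&mut input).expect("failed to read stdin");

    let dynamic = dynamic_binaries(&input);
    for (path, interpreter) in &dynamic {
        println!("{path}: needs {}", interpreter.as_deref().unwrap_or("an unknown interpreter"));
    }
    // Exit non-zero so a build script could treat any dynamic binary as a failure.
    if !dynamic.is_empty() {
        std::process::exit(1);
    }
}
```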
Part of the problem here is that Rust doesn't have a 'build a static binary' flag. Go is famously statically built by default, so it almost always produces portable binaries. Rust gives you the option of building static binaries, but exactly how that option works, and what else affects it, isn't clear beyond one rather cryptic book page noting that you should use the MUSL libc implementation.
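For reference, the closest knob we know of is the `crt-static` target feature: rustc accepts `-C target-feature=+crt-static` (for example via `RUSTFLAGS`, alongside a musl target such as `--target mips-unknown-linux-musl`), and code can test the result at compile time with `cfg!`. The sketch below only reports how the running binary was compiled; it does not by itself make the MIPS build static, and the helper names are ours.

```rust
/// Describe linkage based on whether the C runtime was linked statically.
fn describe_linkage(crt_static: bool) -> &'static str {
    if crt_static {
        "statically linked C runtime (crt-static enabled)"
    } else {
        "dynamically linked C runtime (crt-static disabled)"
    }
}

/// `cfg!(target_feature = "crt-static")` is evaluated at compile time, so it
/// reflects the target and RUSTFLAGS this binary was built with.
fn built_crt_static() -> bool {
    cfg!(target_feature = "crt-static")
}

fn main() {
    // e.g. RUSTFLAGS="-C target-feature=+crt-static" cargo build --target <musl target>
    println!("{}", describe_linkage(built_crt_static()));
}
```

If a build reports crt-static as enabled but `file` still shows an interpreter, that would point at something further down the toolchain overriding the request.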
Using MUSL is not a guarantee; it's a polite request. For whatever reason, the Rust soft-float flag does not make its way to LLVM for MIPS targets, and we end up with:
build/staging_dir/target-mips_24kc_musl/root-ath79/usr/sbin/rita: ELF 32-bit MSB pie executable, MIPS, MIPS32 rel2 version 1 (SYSV), dynamically linked, interpreter /lib/ld-musl-mips-sf.so.1, no section header
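The `-sf` in that interpreter path is the soft-float marker. As we understand musl's naming convention for 32-bit MIPS, the loader is `/lib/ld-musl-mips`, plus `el` for little-endian, plus `-sf` for soft float, then `.so.1`. That is why each of these binaries only runs on a device that ships exactly that loader. A small sketch with a hypothetical helper name, covering only the cases in the `file` output:

```rust
/// Build the musl dynamic loader path a 32-bit MIPS binary would request.
/// Only the endianness and soft-float suffixes are modelled; other MIPS
/// variants (r6, 64-bit, NaN2008) are deliberately left out of this sketch.
fn musl_mips_loader(little_endian: bool, soft_float: bool) -> String {
    let endian = if little_endian { "el" } else { "" };
    let float = if soft_float { "-sf" } else { "" };
    format!("/lib/ld-musl-mips{endian}{float}.so.1")
}

fn main() {
    // The two MIPS interpreters that appear in the `file` output.
    println!("{}", musl_mips_loader(false, true));
    println!("{}", musl_mips_loader(true, true));
}
```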
Between Beta 8 and Beta 11, the OpenWRT version was updated, and the calling environment, specifically this floating point library, is not backwards compatible.
Binaries built against the old library work when run against the new library. But binaries built against the new library (Beta 11) don't run against the old library (Beta 8).
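One simplified way to model that asymmetry, assuming the newer library only added interfaces (the real change in the floating point library may well be different): a binary loads only if every symbol it needs is provided. All symbol names here are hypothetical placeholders.

```rust
use std::collections::HashSet;

/// In this model, a binary loads only if the library provides every symbol it needs.
fn can_run<'a>(required: &HashSet<&'a str>, provided: &HashSet<&'a str>) -> bool {
    required.is_subset(provided)
}

fn main() {
    // Hypothetical symbol sets: the new library keeps every old symbol and adds
    // one more, which binaries built against it then start to depend on.
    let old_lib = HashSet::from(["sym_a", "sym_b"]);
    let new_lib = HashSet::from(["sym_a", "sym_b", "sym_c"]);
    let built_against_old = HashSet::from(["sym_a"]);
    let built_against_new = HashSet::from(["sym_a", "sym_c"]);

    // Prints true: old binaries keep working on the new library.
    println!("old binary, new library: {}", can_run(&built_against_old, &new_lib));
    // Prints false: new binaries need a symbol the old library lacks.
    println!("new binary, old library: {}", can_run(&built_against_new, &old_lib));
}
```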
For this update, we can hand-build binaries against an older libc and ship those as special versions of Beta 11.
But how do we want to deal with this going forward? We do want to upgrade our MIPS build toolchain someday.
We could contribute to upstream Rust and see what it takes to get a fully static binary, or we could move in the direction of updating more packages on the local devices. After all, if we ship a new libc and kernel update, things should work just fine.
Cheaper MIPS routers typically use small, cheap NOR flash chips, sized 8MB, 16MB, or 32MB.
To fit in that space, OpenWRT stores the base system in SquashFS, which throws out everything, including the kitchen sink, to make files smaller on disk. But a SquashFS image can never be modified after the initial flash; later changes go into a separate writable JFFS2 partition.
This is where the second option runs into trouble. If we try to ship a kernel and libc update, we're going to double the storage used by both items: we will have the old versions in SquashFS, which we can't erase without reflashing the device (risky to do remotely), and the new versions in JFFS2.
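To make that storage cost concrete, here is a toy model of OpenWRT's writable JFFS2 overlay sitting on top of the read-only SquashFS image. It is our simplification and ignores compression and filesystem metadata: reads prefer the overlay, but a replaced file still occupies space in the lower layer. The file name and sizes are made up for illustration.

```rust
use std::collections::HashMap;

/// Toy overlay: a read-only lower layer (SquashFS) and a writable upper layer (JFFS2).
struct Overlay {
    lower: HashMap<String, u64>, // path -> size in bytes, fixed at flash time
    upper: HashMap<String, u64>, // path -> size in bytes, written at runtime
}

impl Overlay {
    /// Writes always land in the upper layer; the lower copy cannot be removed.
    fn write(&mut self, path: &str, size: u64) {
        self.upper.insert(path.to_string(), size);
    }

    /// The size a reader sees: the upper layer shadows the lower one.
    fn visible_size(&self, path: &str) -> Option<u64> {
        self.upper.get(path).or_else(|| self.lower.get(path)).copied()
    }

    /// Physical flash consumed by both layers together.
    fn flash_used(&self) -> u64 {
        self.lower.values().sum::<u64>() + self.upper.values().sum::<u64>()
    }
}

fn main() {
    // Hypothetical file and sizes, not measurements from real devices.
    let mut fs = Overlay {
        lower: HashMap::from([("/lib/libc.so".to_string(), 600_000)]),
        upper: HashMap::new(),
    };
    fs.write("/lib/libc.so", 610_000);
    println!("visible: {:?}", fs.visible_size("/lib/libc.so")); // Some(610000)
    println!("flash used: {}", fs.flash_used()); // 1210000
}
```

Only the new copy is ever used, yet flash holds both.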
At this point I've written a lot of words on this subject, but I haven't really made an engineering decision on which solution to pursue. This problem is nuanced and spans several domains, which is part of what makes it interesting. The easiest thing to do is figure out how to build static binaries, but remote kernel updates on these devices aren't a task that can be ignored forever.
In other development news, I've made a few safety and performance patches to the main Rita codebase; the rest of my time has gone into developing the nascent network operator tools codebase, the product of which I hope to reveal in a month or two.
Author: Justin Kilpatrick <email@example.com>
Date: Wed Mar 4 15:21:32 2020 -0500

Use liboping for ping actions in althea_kernel_interface

This change pulls in the C library oping and a Rust crate providing bindings. This lets us perform ping operations without shelling out and, most importantly, control the ping timeout with a resolution finer than one second.

The one-second timeout minimum is a key reason why the exits endpoint can run slowly, as it serially pings all the exits and the inside exit tunnel IP for the registered exit. That's at most (Num Exits + 1) * Timeout seconds of waiting time.

It's actually pretty rare that all exits are inaccessible AND we have routes to them in Babel. But sadly, this rare case occurs most often when people are looking at the dashboard! Strange connection troubles, bloat, etc. So it has an outsized impact.

This change uses our newfound control over the timeout to reduce the exit ping timeout to 200ms, which is pretty conservative and helps with the maximum wait time.

Perf Check:
previous common case: 1s
previous worst case: 10s
new common case: 900ms
new worst case: 1.6s

So ultimately: +1 for less shelling out, -1 for more C code, +1 for more performance.
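The bound in that commit message is simple arithmetic, and a small sketch (helper name ours, exit count hypothetical) makes the effect of shrinking the timeout easy to check. It computes only the serial upper bound, so it won't line up exactly with the measured Perf Check numbers.

```rust
use std::time::Duration;

/// Upper bound on waiting time when every exit, plus the registered exit's
/// tunnel IP, is pinged serially and each ping runs to its full timeout.
fn worst_case_wait(num_exits: u32, timeout: Duration) -> Duration {
    timeout * (num_exits + 1)
}

fn main() {
    let exits = 4; // hypothetical number of exits
    println!("1s timeout:    {:?}", worst_case_wait(exits, Duration::from_secs(1)));
    println!("200ms timeout: {:?}", worst_case_wait(exits, Duration::from_millis(200)));
}
```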
Author: Justin Kilpatrick <firstname.lastname@example.org>
Date: Tue Mar 3 17:18:41 2020 -0500

Forbidden settings merge values

We're now working to give network operators more direct access to the oracle features, so it's important that it be a safe environment. While it's impossible for us to read the eth_private_key remotely from any router, the settings merge JSON does provide a chance to overwrite it. This really isn't acceptable from a safety perspective.

This patch implements a small but effective blacklist of settings values where local router state is stored in such a way that remote updates can't possibly be anything but destructive.
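A blacklist like this can be sketched as a filter applied before the merge. The sketch assumes settings are flattened into string keys in a `HashMap`; Rita's real settings structure, its merge function, and every key other than `eth_private_key` are hypothetical here.

```rust
use std::collections::HashMap;

/// Keys holding local-only router state that a remote merge must never touch.
/// Only `eth_private_key` comes from the commit message; the flat layout is assumed.
const FORBIDDEN_KEYS: &[&str] = &["eth_private_key"];

/// Merge `update` into `settings`, skipping forbidden keys.
/// Returns the rejected keys, sorted, so the caller can log them.
fn merge_settings(
    settings: &mut HashMap<String, String>,
    update: HashMap<String, String>,
) -> Vec<String> {
    let mut rejected = Vec::new();
    for (key, value) in update {
        if FORBIDDEN_KEYS.contains(&key.as_str()) {
            rejected.push(key);
        } else {
            settings.insert(key, value);
        }
    }
    rejected.sort();
    rejected
}

fn main() {
    // "exit_timeout_ms" is a hypothetical, harmless setting.
    let mut settings = HashMap::from([
        ("eth_private_key".to_string(), "local-secret".to_string()),
        ("exit_timeout_ms".to_string(), "1000".to_string()),
    ]);
    let update = HashMap::from([
        ("eth_private_key".to_string(), "remote-value".to_string()),
        ("exit_timeout_ms".to_string(), "200".to_string()),
    ]);
    let rejected = merge_settings(&mut settings, update);
    println!("rejected: {:?}", rejected);
    println!("key preserved: {}", settings["eth_private_key"] == "local-secret");
}
```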