Originally published on Cohost. I recall this one getting a fair amount of attention, which felt good because I put a fair amount of time into it.
Recently, a friend shared this gist about how Amazon’s internal buildsystem works, and wow did that unlock some Opinions for me about software packaging.
Supposedly, Amazon’s buildsystem “Brazil” works a little bit like Nix/NixPkgs1, in that it has entirely reproducible builds based on package declarations of nearly every package in existence. But instead of just being able to choose a single snapshot of package versions2, you can also pin to semver of a set of packages, whose mutual compatibility is automatically enforced via unit tests when creating a new immutable build, so you always have the latest updates. This is amazing. Also like Nix, you can have:
- Two versions of a package installed on your system, since different environments just choose which version they want
- Local overrides of packages for development/debugging environments
- Binary releases since everything is reproducible
And it all “just works”! My friend who currently works there says he loves it and building software has never been easier. Which is wild, given that:
The main build driver is a mess ‘o Perl scripts that generates Makefiles. The build system is bootstrapped by a minimal Perl script which assumes only base Perl deps and GCC are available, and downloads all other dependencies.
…but hey, if it works, it works!
Most Software Isn’t Like This
First, some terminology, so that it’s unambiguous what I’m talking about:
- Package: The atomic unit of software. Libraries, applications, etc. Each package has an:
- Interface version: Some identifier for other software to know a package supports certain capabilities. Hopefully, specified in some semver-compatible way, although in practice lol lmao. Adding extra debug logging or fixing security bugs that no legitimate consumer would be affected by won’t change an interface version.
- Build version: Some identifier that can be 1-to-1 correlated with differences in the produced binary3 for a package. This lets you distinguish between “the library i built with extra logging and a patch for the vuln” and “the library without those things”. Also accounts for differences in build versions of dependencies!
- Dependency: a package that another package requires to build and/or run. Often specified with interface versions, but can also be specified against a build version.
- Version Set: A collection of build versions of packages known to work well together. It’s important that build versions work well together and not interface versions, because it is very possible to mess up semver.
- Environment: For a specific package you want to use, the set of all other packages that are capable of affecting it.
As far as I am aware, there are two common methods for distributing software packages and creating environments to run them in. These aren’t the only methods in existence, and not everything can be categorized exactly, but they are archetypical.
Share Everything
- There is one central version set, containing every package, often with testing that they all work together.
- Only one build version per package can be installed at any given time. If you want to have different build versions at the same time, you need to create different packages or create package aliases.
This is a classic model for software environments. Arch Linux, RHEL, pip, npm, Homebrew, Forge, if you can point at it and say “that’s a Package Manager”, it likely uses this model. There are differences in how often upgrades are released, how exactly semver pinning works, other jobs the package manager takes care of, but all of the examples I listed share these key traits4.
Now, frankly, I think this model sucks big booty balls, pardon my language. Only being able to have one (1) version of a package officially installed is extremely limiting. Your package needs a build version of a dependency not in the central version set? Your options are to:
- Rename the dependency and install it globally.
- “Install” the dependency outside the control of the package manager.
- Give Up
Option 1 is dumb because suddenly you are specifying interface/build versions as part of the package name, when it is literally the package manager’s job to distinguish between these. Option 2 is dumb because suddenly you’re rolling your own package manager with CMakeLists.txt
and shell scripts when you have a perfectly good package manager right there5. Option 3 is dumb because Winners Never Quit 😤
Sometimes, it’s OK for packages to just live in their own little island of dependencies! Not everything needs to be global! It’s frankly unacceptable that local installs are so poorly supported! However, it’s possible to go too far the other way. Instead of sharing everything, you can instead:
Share Nothing
If a package has dependencies, it will need to include its own way of putting them in its environment. There are ways for packages installed separately to be part of the same environment, but absent a package manager, these methods are rickety and best6 and infuriatingly buggy at worst. Curiously enough, consumer-grade operating systems like Windows and MacOS have this as their default approach. Even more curiously, a semi-recent push towards containerization technologies like Docker, snap, flatpak, and others have been pushing Linux software to be also distributed using this model. Why is this?
I hypothesize this model’s popularity comes from the fact it is more likely to produce consistently working software. Linux distributions have been plagued by “works on my machine” or “works on my distro” problems for, like, ever. Share Everything, when you’re trying to build packages outside the global version set, against different global version sets from different distros, or even the same distro as it evolves over time, is just begging to have frustrating build problems. This is why language-specific package managers with virtual environments feel like such an upgrade, this is why Docker is so popular. Your global environment has ghosts in it, invisible dependencies that haunt your build process, so isolating everything to trap the ghosts has its benefits for reproducibility.
Now, to be clear, the Share Nothing approach is not without its drawbacks. Requiring packages to bundle all their dependencies/exist within an isolated Share Everything fiefdom leads to a ton of bloat. I don’t particularly want 5 copies of Tensorflow and PyTorch on my machine, but since I also don’t want to try to shove all my one-off AI projects into a global Python environment, this is what I get stuck with.
What The Approaches Are Missing
Let’s lay out our Ideal Build System Requirements again:
- Reproducible builds: if a remote system can build it, your local system can too
- Local overrides: not only can you build the package locally, you can swap it out for anything expecting that package
- Remotely hosted binary releases: because you deserve to not have your fans take off and your disk explode every time you want to install some software
- No global version set: you can have more than one version (major, minor, patch all included) of a package installed on your system, and things just don’t care, since they use the one they were reproducibly built for
- Semver and hash pinning: to enable dependency sharing if supported, but also for exact reproducibility if needed
hm. Neither approach we’ve covered gets this right, or even comes close to doing all these7! There are fundamental flaws in the way software packages are currently distributed that prevents these goals from being achievable.
Boiling The Ocean
Now that we have a better understanding of what Share Everything and Share Nothing lack, we can appreciate why Amazon’s Brazil is such an achievement. Not only does it let you isolate packages to only see their dependencies so everything is reproducible, it also lets packages share dependencies that provide the same interface version! Awesome! But what does it take to accomplish this, exactly?
Technical Challenges
Not going to go too in-depth here (this post is long enough as it is LOL), but somewhat unsurprisingly, there aren’t any compelling technical reasons we can’t have this. Environment isolation at all levels has gotten quite good on the major OSes; why isn’t that enough?
Social Challenges
It’s not that the technology doesn’t exist, it’s just that, no one has cared enough to put it into practice. “Why should I change how I build software”, say the developers, the distro creators, “it works well enough for my use cases!”.
Personally, I have built many pieces of software in environments just a teensy bit different from where they were supposed to be built, and have seen how bad it is. Every package is different, with its own magic incantations of scripts and command line flags and environment variables and build directories to get them working. As one commenter on the Brazil gist puts it:
One major issue, in my experience, is the Brazil packaging concept doesn’t have critical mass or would require boiling the ocean to get there. At Amazon, there’s one package manager - Brazil. You want a Gem, NPM package, *.so, or JAR dependency? You add it to Brazil. Sometimes you go through great pain to add it to Brazil (especially when importing a new build system). But once the package is there, everything just works.
Only nerds with obscene amounts of time on their hands have the ability to create an ecosystem. Package maintainers for Gentoo, NixPkgs, Guix, AUR, each take up their own divine hammers and bend the universe of software to their will. Everything “just works” when you’re in the system, but for those of us unlucky enough to be developing software outside of it, we’re living in an unending nightmare, where nothing works and everything is hard and no one has enough spare for a 15th standard8 because we’re just trying to build our silly little packages gosh darn it.
What Amazon Does
In a nutshell, they throw money at the problem. Money for compute, to build every single transitive dependent when a package is published to make sure the listed interface version is semver-compliant. Money for storage, to host the entire history of software (source and binaries) so no deploys can possibly fail by missing old build versions. And most importantly, money for developers to port all software they want to use to this buildsystem.
This approach really only works for companies like Amazon, since the reward for them is in fact worth the cost. But what about the rest of us?
Can We Have This Too?
I don’t know, honestly. In my uninformed opinion, I’d probably say that NixPkgs and Guix are the closest things to what I’ve laid out, hitting all my Ideal Build System Requirements minus ones that are impossible thanks to No Money (semver pinning, total history9). I have not used either extensively enough to judge the UX fully10, though I will say I have heard terrible things about NixPkgs11 and nothing about Guix, neither of which is a good sign.
As an individual, it is not worth it for me to boil the ocean. I’ve gotten used to living in the nightmare, holding my Windows development environment together with duct tape & superglue, so I don’t think I’ll be making the switch any time soon. However, it takes a community to boil the ocean, so maybe now that my Arch install has collapsed in on itself, my next Linux install will be a reproducible one. I can only hope others feel the same.
Footnotes
-
For a great explanation of the differences between these, see this excellent blog post by @fullmoon ↩
-
I don’t actually know Nix that well, this was just my impression based on limited use + doc reading, please correct me on this point if I’m wrong! ↩
-
“What about source-only languages??” the source is a form of binary :) ↩
-
I think? Reasonable people could disagree on some aspects, maybe i am generalizing too hard but u tell me ↩
-
Now to be fair, a lot of language-specific package managers do support having “local installs” of packages that read from a specific directory instead of from a repository. This is more a dig at the worst languages for packaging, C and C++, but still applies for how you get other languages’ custom packages into local directories in the first place. ↩
-
On Windows, you can put your .dll or .exe into the global
PATH
for other packages to access it12, but this has the drawback of “oh shoot I’m using the wrong version of python.exe again, better reorder myPATH
”, so usually it’s just best practice to put all the .dlls a package will need in the same folder. if you forget this best practice, your app runs with elevated privileges, and thePATH
can somehow contain a user-writable folder, oops! DLL hijacking. I think MacOS works similarly, but with better sandboxing in certain areas (so package files are only accessible by the package itself really, i think it is still susceptible forms of dll hijacking). for docker, u share dependencies through exposing network ports or shared folders, which, lol lmao. ↩ -
Honestly? I believe that Cargo, the build system for Rust packages, comes the closest! Unfortunately the constraints of the language make binary releases nearly impossible (generics), and semver a thing only for source distributions. Plus, unless you’re a madwoman, Cargo isn’t really suited for non-Rust packages. ↩
-
It’s also a hurdle that proprietary software won’t adopt a model like this unless it is the standard, so anything that tries will have to bend over backwards for proprietary software, even more than for weird build scripts. But I digress, since this is ALREADY TOO LONG ↩
-
And when I did give it a little whirl, it promptly ate up all 72GB of my free disk space, which is maybe on me, but also, that’s a lot of disk space!! so i haven’t used it since. ↩
-
https://blog.wesleyac.com/posts/the-curse-of-nixos , among others ↩
-
Another thing you can do is register a different global environment variable just for your package, and then anyone looking for your package can simply read from that environment variable. For Docker,
s/environment variable/network port/g
. Like I said, rickety at best, infuriatingly buggy at worst :P ↩