PolyWolf's Blog

Software Packaging Is A Nightmare

Published: 9/24/2023, 7:23:42 PM

Originally published on Cohost. I recall this one getting a fair amount of attention, which felt good because I put a fair amount of time into it.


Recently, a friend shared this gist about how Amazon’s internal buildsystem works, and wow did that unlock some Opinions for me about software packaging.

Supposedly, Amazon’s buildsystem “Brazil” works a little bit like Nix/NixPkgs[1], in that it has entirely reproducible builds based on package declarations of nearly every package in existence. But instead of only being able to choose a single snapshot of package versions[2], you can also pin to semver ranges within a set of packages, whose mutual compatibility is automatically enforced via unit tests when a new immutable build is created, so you always have the latest updates. This is amazing. Also like Nix, you can have:

And it all “just works”! My friend who currently works there says he loves it and building software has never been easier. Which is wild, given that:

The main build driver is a mess ‘o Perl scripts that generates Makefiles. The build system is bootstrapped by a minimal Perl script which assumes only base Perl deps and GCC are available, and downloads all other dependencies.

…but hey, if it works, it works!
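The gist doesn’t spell out what Brazil’s package declarations actually look like, so here’s a rough Python sketch of the version-set idea as I understand it; every package name, version number, and function below is invented for illustration. A version set is an immutable snapshot of which build versions exist, and a consumer pins an interface (semver-ish) rather than an exact build:

```python
# Hypothetical sketch of version sets + semver pinning (not Brazil's real
# format; names and numbers are invented for illustration).

VERSION_SET = {
    # package -> build versions available in this immutable snapshot
    "libfoo": ["1.4.2", "1.5.0", "2.0.1"],
    "libbar": ["0.9.3", "1.0.0"],
}

def same_interface(version: str, major: int) -> bool:
    """Crude 'compatible interface' check: same major version."""
    return int(version.split(".")[0]) == major

def resolve(package: str, interface_major: int) -> str:
    """Pick the newest build in the set that still matches the pinned interface."""
    candidates = [v for v in VERSION_SET[package] if same_interface(v, interface_major)]
    if not candidates:
        raise LookupError(f"no build of {package} matches interface {interface_major}.x")
    return max(candidates, key=lambda v: tuple(map(int, v.split("."))))

# A consumer that pins "libfoo 1.x" keeps picking up newer 1.x builds as the
# version set moves forward, without ever being handed the breaking 2.0.1.
print(resolve("libfoo", 1))  # -> "1.5.0"
```

The part that (reportedly) makes this more than Nix-with-extra-steps is that the compatibility claim actually gets checked: publishing a new build runs the tests of everything downstream before the version set moves.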

Most Software Isn’t Like This

First, some terminology, so that it’s unambiguous what I’m talking about:

As far as I am aware, there are two common methods for distributing software packages and creating environments to run them in. These aren’t the only methods in existence, and not everything can be categorized exactly, but they are archetypical.

Share Everything

This is a classic model for software environments. Arch Linux, RHEL, pip, npm, Homebrew, Forge: if you can point at it and say “that’s a Package Manager”, it likely uses this model. There are differences in how often upgrades are released, how exactly semver pinning works, and what other jobs the package manager takes care of, but all of the examples I listed share these key traits[4].

Now, frankly, I think this model sucks big booty balls, pardon my language. Only being able to have one (1) version of a package officially installed is extremely limiting. Your package needs a build version of a dependency not in the central version set? Your options are to:

  1. Rename the dependency and install it globally.
  2. “Install” the dependency outside the control of the package manager.
  3. Give Up

Option 1 is dumb because suddenly you are specifying interface/build versions as part of the package name, when it is literally the package manager’s job to distinguish between these. Option 2 is dumb because suddenly you’re rolling your own package manager with CMakeLists.txt and shell scripts when you have a perfectly good package manager right there[5]. Option 3 is dumb because Winners Never Quit 😤
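To make the conflict concrete, here’s a tiny hypothetical example of the underlying problem (package names invented): two things installed into the same shared environment want different major versions of one dependency, and the one-version-per-name rule means there is no version the package manager can pick.

```python
# Hypothetical: two consumers in one shared environment disagree about which
# major version of "libfoo" they need, and the environment has exactly one
# slot for "libfoo".

requirements = {
    "app-alpha": {"libfoo": 2},  # wants the libfoo 2.x interface
    "app-beta":  {"libfoo": 1},  # still stuck on the 1.x interface
}

wanted_majors = {reqs["libfoo"] for reqs in requirements.values()}
if len(wanted_majors) > 1:
    print(f"unsatisfiable: one slot for libfoo, consumers want majors {sorted(wanted_majors)}")
```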

Sometimes, it’s OK for packages to just live in their own little island of dependencies! Not everything needs to be global! It’s frankly unacceptable that local installs are so poorly supported! However, it’s possible to go too far the other way. Instead of sharing everything, you can instead:

Share Nothing

If a package has dependencies, it will need to include its own way of putting them in its environment. There are ways for packages installed separately to be part of the same environment, but absent a package manager, these methods are rickety at best[6] and infuriatingly buggy at worst. Curiously enough, consumer-grade operating systems like Windows and MacOS have this as their default approach. Even more curiously, a semi-recent push towards containerization technologies like Docker, snap, and flatpak has led Linux software to also be distributed using this model. Why is this?

I hypothesize this model’s popularity comes from the fact that it is more likely to produce consistently working software. Linux distributions have been plagued by “works on my machine” or “works on my distro” problems for, like, ever. Share Everything, when you’re trying to build packages outside the global version set, against different global version sets from different distros, or even against the same distro as it evolves over time, is just begging to cause frustrating build problems. This is why language-specific package managers with virtual environments feel like such an upgrade, and this is why Docker is so popular. Your global environment has ghosts in it, invisible dependencies that haunt your build process, so isolating everything to trap the ghosts has its benefits for reproducibility.
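A concrete (made-up) example of the kind of ghost I mean: nothing in the project declares this dependency, it just happens to be installed globally on my machine because some other project dragged it in.

```python
# ghost.py - a "works on my machine" special.
# This import succeeds only because some other project happened to install
# PyYAML into the shared global site-packages; nothing here declares it.
# On a clean machine (or after a distro upgrade), it dies with ImportError.
import yaml

print(yaml.safe_load("works: on my machine"))
```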

Now, to be clear, the Share Nothing approach is not without its drawbacks. Requiring packages to bundle all their dependencies/exist within an isolated Share Everything fiefdom leads to a ton of bloat. I don’t particularly want 5 copies of Tensorflow and PyTorch on my machine, but since I also don’t want to try to shove all my one-off AI projects into a global Python environment, this is what I get stuck with.

What The Approaches Are Missing

Let’s lay out our Ideal Build System Requirements again:

hm. Neither approach we’ve covered gets this right, or even comes close to doing all of these[7]! There are fundamental flaws in the way software packages are currently distributed that prevent these goals from being achievable.

Boiling The Ocean

Now that we have a better understanding of what Share Everything and Share Nothing lack, we can appreciate why Amazon’s Brazil is such an achievement. Not only does it let you isolate packages to only see their dependencies so everything is reproducible, it also lets packages share dependencies that provide the same interface version! Awesome! But what does it take to accomplish this, exactly?
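I have no idea how Brazil actually lays environments out on disk, but the idea can be sketched (in very hand-wavy Python, with invented names and paths) as keying installed dependencies by name plus interface version: consumers that agree on an interface share one instance, consumers that don’t get their own, and nothing outside a package’s declared dependencies is visible to it.

```python
# Hand-wavy sketch of "isolated, but shared where the interface matches".
# Not Brazil's real layout; names and paths are invented.

store: dict[tuple[str, int], str] = {}  # (name, interface major) -> install path

def provide(name: str, interface_major: int) -> str:
    """Return the install path for this interface, building it only once."""
    key = (name, interface_major)
    if key not in store:
        store[key] = f"/store/{name}-{interface_major}.x"  # pretend we built it
    return store[key]

env_a = [provide("openssl", 3), provide("zlib", 1)]
env_b = [provide("openssl", 3)]  # shares the exact same openssl 3.x instance as env_a
env_c = [provide("openssl", 1)]  # older interface gets its own separate instance

print(env_a, env_b, env_c, sep="\n")
print("instances on disk:", len(store))  # 3, not 4: the openssl 3.x copy is shared
```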

Technical Challenges

Not going to go too in-depth here (this post is long enough as it is LOL), but somewhat unsurprisingly, there aren’t any compelling technical reasons we can’t have this. Environment isolation at all levels has gotten quite good on the major OSes; why isn’t that enough?

Social Challenges

It’s not that the technology doesn’t exist, it’s just that no one has cared enough to put it into practice. “Why should I change how I build software?” say the developers and the distro creators. “It works well enough for my use cases!”

Personally, I have built many pieces of software in environments just a teensy bit different from where they were supposed to be built, and I have seen how bad it is. Every package is different, with its own magic incantations of scripts and command line flags and environment variables and build directories to get it working. As one commenter on the Brazil gist puts it:

One major issue, in my experience, is the Brazil packaging concept doesn’t have critical mass or would require boiling the ocean to get there. At Amazon, there’s one package manager - Brazil. You want a Gem, NPM package, *.so, or JAR dependency? You add it to Brazil. Sometimes you go through great pain to add it to Brazil (especially when importing a new build system). But once the package is there, everything just works.

Only nerds with obscene amounts of time on their hands have the ability to create an ecosystem. Package maintainers for Gentoo, NixPkgs, Guix, and AUR each take up their own divine hammers and bend the universe of software to their will. Everything “just works” when you’re in the system, but for those of us unlucky enough to be developing software outside of it, we’re living in an unending nightmare, where nothing works and everything is hard and no one has enough spare time for a 15th standard[8] because we’re just trying to build our silly little packages gosh darn it.

What Amazon Does

In a nutshell, they throw money at the problem. Money for compute, to build every single transitive dependent when a package is published to make sure the listed interface version is semver-compliant. Money for storage, to host the entire history of software (source and binaries) so no deploys can possibly fail by missing old build versions. And most importantly, money for developers to port all software they want to use to this buildsystem.
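The “build every single transitive dependent” part is, at its core, just a walk over the reverse-dependency graph; here’s a minimal sketch of that idea (the graph and package names are made up, and Amazon’s actual tooling is surely much fancier than this):

```python
# Minimal sketch of "rebuild everything downstream when a package publishes":
# a breadth-first walk over the reverse-dependency graph. (Graph and names
# are invented for illustration.)
from collections import deque

# package -> packages that depend on it directly
reverse_deps = {
    "libcrypto":  ["libssl", "libcurl"],
    "libssl":     ["libcurl", "my-service"],
    "libcurl":    ["my-service"],
    "my-service": [],
}

def transitive_dependents(changed: str) -> list[str]:
    """Everything that must be rebuilt (and re-tested) after `changed` publishes."""
    seen, queue, order = set(), deque([changed]), []
    while queue:
        pkg = queue.popleft()
        for dependent in reverse_deps.get(pkg, []):
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order

print(transitive_dependents("libcrypto"))  # -> ['libssl', 'libcurl', 'my-service']
```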

This approach really only works for companies like Amazon, since the reward for them is in fact worth the cost. But what about the rest of us?

Can We Have This Too?

I don’t know, honestly. In my uninformed opinion, I’d probably say that NixPkgs and Guix are the closest things to what I’ve laid out, hitting all my Ideal Build System Requirements minus the ones that are impossible thanks to No Money (semver pinning, total history[9]). I have not used either extensively enough to judge the UX fully[10], though I will say I have heard terrible things about NixPkgs[11] and nothing about Guix, neither of which is a good sign.

As an individual, it is not worth it for me to boil the ocean. I’ve gotten used to living in the nightmare, holding my Windows development environment together with duct tape & superglue, so I don’t think I’ll be making the switch any time soon. However, it takes a community to boil the ocean, so maybe now that my Arch install has collapsed in on itself, my next Linux install will be a reproducible one. I can only hope others feel the same.

Footnotes

  1. For a great explanation of the differences between these, see this excellent blog post by @fullmoon

  2. I don’t actually know Nix that well, this was just my impression based on limited use + doc reading, please correct me on this point if I’m wrong!

  3. “What about source-only languages??” the source is a form of binary :)

  4. I think? Reasonable people could disagree on some aspects, maybe i am generalizing too hard but u tell me

  5. Now to be fair, a lot of language-specific package managers do support having “local installs” of packages that read from a specific directory instead of from a repository. This is more a dig at the worst languages for packaging, C and C++, but still applies for how you get other languages’ custom packages into local directories in the first place.

  6. On Windows, you can put your .dll or .exe into the global PATH for other packages to access it[12], but this has the drawback of “oh shoot I’m using the wrong version of python.exe again, better reorder my PATH”, so usually it’s just best practice to put all the .dlls a package will need in the same folder. If you forget this best practice, your app runs with elevated privileges, and the PATH somehow contains a user-writable folder: oops! DLL hijacking. I think MacOS works similarly, but with better sandboxing in certain areas (so package files are really only accessible by the package itself, though I think it is still susceptible to forms of DLL hijacking). For Docker, you share dependencies through exposing network ports or shared folders, which, lol lmao.

  7. Honestly? I believe that Cargo, the build system for Rust packages, comes the closest! Unfortunately, the constraints of the language make binary releases nearly impossible (generics), and make semver a thing only for source distributions. Plus, unless you’re a madwoman, Cargo isn’t really suited for non-Rust packages.

  8. It’s also a hurdle that proprietary software won’t adopt a model like this unless it is the standard, so anything that tries will have to bend over backwards for proprietary software, even more than for weird build scripts. But I digress, since this is ALREADY TOO LONG

  9. https://hachyderm.io/@fasterthanlime/111103916027416609

  10. And when I did give it a little whirl, it promptly ate up all 72GB of my free disk space, which is maybe on me, but also, that’s a lot of disk space!! so i haven’t used it since.

  11. https://blog.wesleyac.com/posts/the-curse-of-nixos , among others

  12. Another thing you can do is register a different global environment variable just for your package, and then anyone looking for your package can simply read from that environment variable. For Docker, s/environment variable/network port/g. Like I said, rickety at best, infuriatingly buggy at worst :P

#cohost #programming #writing