Installing Packages in Linux



In my day job(s) and personal projects I often find myself having to set up environments. Environment problems are the worst problems. They're frustrating. They're inherently platform dependent. Sometimes, they require you to know and understand things about the underlying platform that you really didn't care to know. C'est la vie.

A recurring theme is the need to install packages on a variety of operating systems. This might be for infrastructure work, setting up static servers, configuring Jenkins nodes, creating crash environments, or test environments. It's common enough that I've had to scrounge around my poor memory on more than one occasion trying to recall how I solved something once upon a time - where "once upon a time" is more like every other month.

When I first started looking at this totally solved problem I had a paradigm where I maintained an installation script for each type of OS I might encounter.

A first pass

The scripts would be pretty boilerplate and generally have the same packages, shebang, and flow in each file. For the sake of brevity I'll confine myself to rpm- and deb-based systems. Those files might look something like:

#!/bin/bash
sudo yum install -y package-a \
    package-b \
    package-c \
    package-d

#!/bin/bash
sudo apt-get install -y package-a \
    package-b \
    package-c \
    package-d

The only actual difference here, as you might tell, is the OS-specific command used to install packages on a given system.

First consolidation

The next evolution of my thought was that it'd be nice to have some extra information and a more structured way to see what is (and isn't) getting installed. So we turn to arrays and loops:

#!/bin/bash
set -euxo pipefail

declare -a packages
packages=( package-a package-b package-c package-d )

for i in "${packages[@]}"; do
    # rpm -q exits non-zero for a missing package; "|| true" keeps set -e from aborting
    installed=$(rpm -q "$i" || true)

    if [ "$installed" == "package $i is not installed" ]; then
        sudo yum install -y "$i"
    fi
done

#!/bin/bash
set -euxo pipefail

declare -a packages
packages=( package-a package-b package-c package-d )

for i in "${packages[@]}"; do
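    # grep is case-sensitive, so match "Status"; "|| true" covers packages dpkg has never seen
    installed=$(dpkg -s "$i" 2>/dev/null | grep '^Status' || true)

    if [ "$installed" != "Status: install ok installed" ]; then
        sudo apt-get install -y "$i"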
    fi
done

The consolidation here is really not a consolidation so much as a reorganization. It's just a different way of adding structure to the install script. Instead of trusting the package installation programs to give us sane, easy-to-parse information, we've taken the responsibility into our own hands: we deliberately check whether each package is already installed and install it only if it isn't. I also prefer the array of packages to the multi-line list with trailing backslashes, though that's largely a matter of preference.
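As a rough sketch of the same check, here is one way to separate "which packages are missing" from "install them", relying on exit statuses rather than on matching the exact message text. The helper names (is_installed_rpm, is_installed_deb, filter_missing) are hypothetical and not part of the scripts above; the sketch only assumes that rpm -q exits non-zero for a missing package and that dpkg-query -W can print the Status field.

```bash
#!/bin/bash

# Hypothetical helper: succeeds if the rpm database knows the package.
is_installed_rpm() {
    rpm -q "$1" >/dev/null 2>&1
}

# Hypothetical helper: succeeds only if dpkg reports the package as fully installed.
is_installed_deb() {
    local status
    status=$(dpkg-query -W -f='${Status}' "$1" 2>/dev/null) || return 1
    [ "$status" = "install ok installed" ]
}

# Print, one per line, every package for which the given check fails.
# Usage: filter_missing <check-function> <package>...
filter_missing() {
    local check=$1
    shift
    local pkg
    for pkg in "$@"; do
        if ! "$check" "$pkg"; then
            printf '%s\n' "$pkg"
        fi
    done
}
```

On an rpm-based system, calling filter_missing is_installed_rpm "${packages[@]}" would list only the packages that still need a yum install, which also makes for a cheap dry run before anything touches sudo.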

Second consolidation

Since bash is ubiquitous, I'll assume it's available on our target systems. Well, that, and I've manufactured a contrived example. That means we can consolidate to something like:

#!/bin/bash
set -euxo pipefail

declare -a packages
packages=( package-a package-b package-c package-d )

if hash rpm 2>/dev/null; then
    # This is an RPM based system
    for i in "${packages[@]}"; do
        # rpm -q exits non-zero for a missing package; "|| true" keeps set -e from aborting
        installed=$(rpm -q "$i" || true)

        if [ "$installed" == "package $i is not installed" ]; then
            sudo yum install -y "$i"
        fi
    done
else
    # We assume this is a deb based system
    for i in "${packages[@]}"; do
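        # grep is case-sensitive, so match "Status"; "|| true" covers packages dpkg has never seen
        installed=$(dpkg -s "$i" 2>/dev/null | grep '^Status' || true)

        if [ "$installed" != "Status: install ok installed" ]; then
            sudo apt-get install -y "$i"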
        fi
    done
fi

Now we only have to maintain a single script for two broad categories of operating systems. When doesn't this work, you ask? It doesn't work well when you have a common package that is named two different things between the rpm and deb systems.
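One possible way around the naming problem, sketched here purely as an illustration, is to translate a generic name into the family-specific one before installing. The resolve_package function and the generic keys "apache" and "python-dev" are made up for this example; the mappings themselves (httpd vs. apache2, python3-devel vs. python3-dev) reflect how those packages are commonly named on rpm and deb systems.

```bash
#!/bin/bash

# Hypothetical helper: map a generic package name to the name used by a
# package family ("rpm" or "deb"). Names without a special case pass through.
resolve_package() {
    local family=$1
    local generic=$2
    case "$family:$generic" in
        rpm:apache)      echo "httpd" ;;
        deb:apache)      echo "apache2" ;;
        rpm:python-dev)  echo "python3-devel" ;;
        deb:python-dev)  echo "python3-dev" ;;
        rpm:*|deb:*)     echo "$generic" ;;
        *)
            echo "unknown package family: $family" >&2
            return 1
            ;;
    esac
}
```

Inside the loops above, the install line could then become something like sudo yum install -y "$(resolve_package rpm "$i")". A lookup like this stays readable for a handful of exceptions; a long list would probably deserve its own data file.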

Performance Tuning and Improvements

Before we continue, a brief anecdote about first-pass solutions and improvements:

A principal engineer at my first engineering job once told me to implement things like they're going into production the first time. I took that advice to heart and added an apparent rule in my life: implement the straightforward, simple, and working thing first.

Simple and straightforward things are generally good for production. They're easy to maintain, easy to understand, and easy to fix if they break. I've expanded their advice to most things I work on.

The primary benefit of this approach, for me personally (read: selfishly), is that it helps me understand when I'm implementing or trying to learn something new. The second benefit, if you're in a work environment, is that you (probably) get an acceptable deliverable quickly. The catch is that we seldom have time to go back and improve on things, hence the concept of implementing something like it's going into production the first time.

In any case, enough rambling about manifestos and personal credos.

I'm going to defer the actual experimentation to its own post. It seems like a good opportunity to talk about how we can measure and benchmark such things - and I haven't posted in a while, so getting something out is probably a good idea anyway. There are some good improvements that we'll make on this script in my next post on this topic.

To be continued...