You often hear the phrase “it works on my machine” when a piece of software fails to run. The phrase is a symptom of a deeper problem rooted in the nature of software systems: they are inherently dependent on the environment in which they execute, called the runtime. The runtime is the set of resources available to the software system; a resource could be anything from main memory to a file in a specific location.
Problems arise when software systems make assumptions about the runtime environment. For example, a program might open the file /home/jiraaya/a.txt directly; it assumes that the file a.txt (which is part of the runtime) is present in /home/jiraaya. Now when L runs the program on his machine, the file might instead be in /home/L, which would cause the program to fail.
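As a minimal sketch of the problem (in Python; the function name is my own, only the path comes from the example above), the offending program might look like:

```python
def read_greeting():
    # Hard-coded path: this assumes a.txt lives in /home/jiraaya,
    # which is only true on the original author's machine.
    with open("/home/jiraaya/a.txt") as f:
        return f.read()
```

On any machine without that exact directory layout, the call fails with a FileNotFoundError.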
The traditional solution to this problem is not to assume the location of the file, but to let L specify it as a parameter. The code would then read the path from a variable, say path_to_file, and L would set it to /home/L/a.txt.
In very large programs, the number of such parameters can easily exceed what a single programmer can keep track of. There are multiple ways to solve this problem. A relatively recent approach sidesteps it in two steps.
- Use virtualization to set up similar machines and networks for everyone. [Standardization]
- Use tools like Puppet or Ansible to install the packages you need on top of the virtual machines. [Provisioning]
Now you might ask: why not create a new virtual machine image with all the software you want already installed? You could, except you end up creating multiple images, each with a slightly different configuration. For example, on your developer machines the user that runs your program might be named dev, but on your QA machines it might be qa. Virtual machine images consume a lot of disk space, and it is wasteful to create one for each variation.
Next you might ask: why not use the shell, which has been there all along, to do the same things Puppet and Ansible do? You can, but Puppet and Ansible have already solved a lot of problems you would otherwise have to write code for yourself. Which brings me to the lessons I learned doing configuration as code myself.
- Use declarative programming – it reduces redundant checks
- Make your scripts idempotent; that is, they can be safely rerun multiple times without side effects. Some guidelines for this:
- Don’t try to install packages that are already installed
- Don’t add content to a config file (like iptables rules) if equivalent content is already there
- The file system is shared memory; use it carefully. Create a unique folder under /tmp for each run of the installation script and place all the temporary files it creates inside that folder.
- Don’t hide ugliness behind automation. If installing your program requires ugly steps like copying files from one location to another, formatting them, and so on, try to solve that problem first.
- Use system packages like rpm and deb whenever possible. You get dependency resolution, update protocols, and more for free
- Don’t compile from source using tools like Puppet; it is better to create a package and distribute it through local repositories
- Test your scripts using tools like Vagrant, not on the actual machine.
- Do not install packages from untrusted sources; if you are using a mirror, use checksums.
- Keep module configuration separate from node configuration.
In short, standardization of the runtime and provisioning will save you lots of time chasing down hard-to-debug errors, and as such has a direct impact on the success of the development effort.