Welcome Developers


My name is Luis Atencio. Currently, I am Lead Software Developer at Citrix Systems. Programming languages, design patterns, and techniques are my passion. If you are interested in programming or just technology in general, please follow my blog!

Sunday, February 24, 2013

Notes on Continuous Delivery - Configuration Management

Overview

This post continues the Continuous Delivery series that began with my previous post, Notes on Continuous Integration. This time we look at the first and most important step: Configuration Management. In the words of the authors (see Resources below):
Configuration Management refers to the process by which all artifacts ... and the relationships between them, are stored, retrieved, uniquely identified, and modified.
Configuration Management involves four principles:

1. Keep everything in version control

Source control is just for source code; version control is for any type of artifact your project has: source code, tests, database scripts, wireframes, high definition mock-ups, build scripts, documentation, libraries, configuration files, release plans, requirements documents, architecture diagrams, virtual machine configuration, virtual machine images and so on. I challenge you to make this change in your geeky vocabulary, if you haven't already.

A little ambitious, I agree. The metadata stored within your version control system lets you access every version of every file you have ever stored, and it facilitates collaboration among teams distributed in space and time.

The level of sophistication in your continuous delivery process will depend on how mature your configuration management strategy is. If storing absolutely everything is not feasible, or requires too much work and money, start with a few artifacts (in addition to source code, of course) and you will see the improvements in your process almost instantly.

Promote the idea of checking in code frequently, with useful commit messages. Make use of the typical "-m" switch in your version control command line tool; almost all of them support it.

Version control gives you the freedom to delete files; worst case scenario, you can always retrieve them easily. It's like refactoring code with an endless "Undo" history.

2. Manage dependencies

Dependencies constitute any external libraries, components, and modules that your application uses. For instance:
  • JAR files - Java or JVM language
  • PEAR, PECL modules or PHAR files - PHP
  • DLLs - .NET
  • Ruby Gems - Ruby
  • Bindings - Python
  • NPM - Node.js
Also under this category would be any extensions or libraries your operating system is configured with.

Build tools make it easier (not easy) to manage dependencies. I have had success with tools such as Maven, Ant, and Phing. Others, such as Gradle and Ivy, have gained a lot of traction lately. I would recommend a tool like Maven because it makes it possible to recreate environments on different machines and to manage your dependencies in a centralized fashion by setting up a package repository.

3. Manage software configuration

Perhaps the most important principle of all, configuration management requires that you have a strategy for automating the injection of configuration properties into your software. This is critical if you are planning to support different environments: QA, Staging, Preview, and Production.

Configuration information can be injected:
  • At build time or packaging time: use Maven to read property files and inject configuration information into them. This usually takes the form of property files with name-value pair records. These are common in Java and PHP; YAML files are common in the Ruby and Python worlds. XML is a good contender here too, since it is supported on all platforms.
  • At startup time: usually done via environment variables or command line arguments.
  • At runtime: say you store configuration information in the database or in an external system. Then your application can fetch configuration information from the web or via scripts and apply them. For this, some bootstrapping information is always necessary such as: database connections and external URIs. 
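As a rough illustration of how the first two mechanisms can be layered, here is a minimal Python sketch. It assumes a packaged property file produced at build time, then lets environment variables and command line arguments override it at startup. The `APP_` prefix, the `--set` flag, and the function names are all hypothetical choices made for this example, not part of any particular tool. Runtime fetching from a database or external system is left out.

```python
import argparse
import os


def parse_properties(text):
    """Parse simple name=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def load_config(packaged_text, environ=None, argv=None):
    """Resolve configuration: packaged file < environment < command line."""
    environ = os.environ if environ is None else environ
    config = parse_properties(packaged_text)

    # Startup time: APP_DB_HOST in the environment overrides the key db.host.
    # Only keys already present in the packaged file can be overridden here.
    for key in list(config):
        env_name = "APP_" + key.upper().replace(".", "_")
        if env_name in environ:
            config[key] = environ[env_name]

    # Startup time: repeated "--set key=value" arguments win over everything.
    parser = argparse.ArgumentParser()
    parser.add_argument("--set", dest="overrides", action="append",
                        default=[], metavar="KEY=VALUE")
    args = parser.parse_args([] if argv is None else argv)
    for item in args.overrides:
        key, _, value = item.partition("=")
        config[key.strip()] = value.strip()
    return config
```

In a real entry point you would likely call `load_config(text, argv=sys.argv[1:])`. Passing `environ` and `argv` explicitly keeps the function easy to test without touching the real process environment.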
Whichever mechanism you use, make sure everything is version controlled. One immediate question is, how can we store sensitive information like passwords? Here is one way to do that:


Another is to store passwords together with the code so that they get compiled and obfuscated. This is not good practice, especially in interpreted languages.

I recommend using secure certificates, SSL keys, or encrypted information at the very least.

The simplest approach to configuration management that I have implemented successfully is build time injection of property files, as mentioned above. These vary depending on the following:
  1. The application
  2. The version of the application
  3. The environment (QA, Production, UAT, Preview, Staging, etc)
With Maven, you can easily inject configuration information into the application before packaging it into a deployable artifact. This is a very nice and simple approach because all of the configuration is stored in version control and uses the file system, which is widely supported in all platforms.
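To make the idea of build time injection concrete, here is a small Python sketch of the general technique: a property file template containing `${...}` placeholders is rendered with values that depend on the application, its version, and the target environment. This is a stand-in for what a build tool does, not Maven itself. The host names, the `ENVIRONMENTS` table, and the function names are made up for illustration.

```python
import re

# Matches placeholders such as ${db.host} (dots allowed in the key).
PLACEHOLDER = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")

# Template kept in version control next to the source code.
TEMPLATE = """\
# Rendered at build time; do not edit the packaged copy by hand.
app.name=${app.name}
app.version=${app.version}
db.url=jdbc:postgresql://${db.host}:5432/${app.name}
"""

# Hypothetical per-environment values, also kept in version control.
ENVIRONMENTS = {
    "qa": {"db.host": "qa-db.example.internal"},
    "production": {"db.host": "prod-db.example.internal"},
}


def render_properties(template, values):
    """Replace every ${key} in the template; fail if a key has no value."""
    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError("no value for placeholder: " + key)
        return values[key]
    return PLACEHOLDER.sub(substitute, template)


def build_properties(app_name, app_version, environment):
    """Combine application, version, and environment values into one file."""
    values = {"app.name": app_name, "app.version": app_version}
    values.update(ENVIRONMENTS[environment])
    return render_properties(TEMPLATE, values)
```

Calling `build_properties("orders", "1.4.0", "qa")` returns the text of the property file to package for QA. Failing loudly on a missing placeholder value is a deliberate choice here, so that a half-configured artifact never gets packaged.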

Disclaimer: If you are actually developing Java Applets (I don't know why someone would still want to do that...), then access to the file system might not be an option. In that case, storing your configuration in an external system that can be fetched via RESTful calls is a good solution. All of the principles mentioned thus far still apply.

Something to keep in mind: consider configuration management early in the development lifecycle. Create patterns for your organization so that every application does it the same way: convention over configuration. Often, it is an afterthought and teams tend to develop their own ad-hoc configuration management strategy. This makes it really hard for your Ops and DevOps teams to automate, wastes time reinventing the wheel, and makes it very likely that you will repeat mistakes someone else has already made.

4. Manage environments

The configuration of the environment is as important as that of the application. Your application might need specific system level configuration such as number of file handles, memory limits, firewall or networking, connection pools, etc. This is very important to get right, and can vary from one application to the other.

The same principles from above apply here too. Do not reinvent the wheel, do not implement ad-hoc solutions, and keep everything in version control. Obviously you cannot check your OS into version control, but its configuration and the scripts that set it up can be.

I would say that without a doubt, managing environments is probably one of the hardest things to automate. The end goal is to be able to recreate full environment baselines at the touch of a button, including different versions in time. In order for this to work, you absolutely must create a strong synergy with a very agile IT environment, which is not necessarily the case in most organizations and can be very costly. Different departments can have very different philosophies when it comes to managing their environments -- these silos must be broken.

As a result, many organizations resort to virtual environments powered by Citrix or VMware, or to cloud environments such as App Engine, Amazon EC2, Rackspace, Heroku, Azure, etc. You need to be able to fully control the environments you deploy to. As I said before, start by automating as much as you can; small steps in this direction will reap lots of benefits down the line.

I don't have any experience with environment management, but I've heard good things about systems like Puppet. 

Conclusion

Configuration management sits at the core of continuous delivery.

Store all application and infrastructure information so that you can recreate environments: configuration, database, DNS zone files, firewall configuration, patches, libraries installed, extensions, etc. The automation process (as we will see in later posts) depends on every artifact your application needs being accessible on demand.

In the projects I have worked on, I always promote checking in to version control frequently, as well as its counterpart, updating from version control frequently. It helps a lot with resolving conflicts and avoiding tedious merges.

One caveat of continuous delivery is that it goes against branching: branching is antithetical to continuous integration. If you are using Git or Mercurial, where branching and merging are the norm, establish a commit and push policy that works for your team. You want a stable, up-to-date main trunk line from which you can deploy your code.


One piece of advice: meet with your Ops team to determine how to properly implement configuration management for your applications. This is an aspect of the system everyone should be aware of.

If you will be using key-value pairs for your configuration options, use descriptive names for your property keys. The name should express very clearly and concisely what the configuration is for. Use plenty of comments in your property files to add more description if necessary. Have a look at a PHP installation's php.ini file for a good example. Do not hard code property keys everywhere you need to access them; wrap them in some sort of ConfigurationService that makes this access simple and testable.
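One possible shape for such a ConfigurationService, sketched in Python under my own assumptions: the property keys and method names below are invented for the example. The point is only that the key strings live in one place and callers use typed, named accessors instead.

```python
class ConfigurationService:
    """Single place that knows the property keys; callers use named methods."""

    # Descriptive keys, defined once (hypothetical names for illustration).
    DB_CONNECTION_TIMEOUT_SECONDS = "db.connection.timeout.seconds"
    FEATURE_SIGNUP_ENABLED = "feature.signup.enabled"

    def __init__(self, properties):
        # Copy the mapping so later changes by the caller do not leak in.
        self._properties = dict(properties)

    def _require(self, key):
        """Return the raw string value or fail with a readable message."""
        try:
            return self._properties[key]
        except KeyError:
            raise KeyError("missing configuration property: " + key) from None

    def db_connection_timeout_seconds(self):
        return int(self._require(self.DB_CONNECTION_TIMEOUT_SECONDS))

    def signup_enabled(self):
        value = self._require(self.FEATURE_SIGNUP_ENABLED)
        return value.strip().lower() == "true"
```

Because the service takes a plain mapping in its constructor, a unit test can build one from a small dictionary without reading any file, which is what makes the access testable.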

Finally, it is typical for web systems to expose the application's configuration in some sort of management console for super user admins to change. While this sounds like a good idea, and up until this point I thought it was, it's not. Unless you can write that change back into version control, changing system configuration at runtime is not a good idea. In addition, a simple change to a property file can potentially break or degrade the entire system. Therefore, configuration changes should follow the same mechanisms that are in place for source code changes.

Stay tuned for more on this topic in the posts to come!

Resources

  1. Humble, Jez and Farley, David. Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation. Addison-Wesley. 2011

7 comments:

  1. Packaging config into the deployable makes your deployable non-portable. See http://goo.gl/QXxPR for an alternative. Everything else is a nice approach. Even Puppet can gather its data from SVN.

    Replies
    1. Do you mean non-portable to other environments such as QA, Staging, Production, etc? So, using template configuration pattern, you can template the configuration for each specific environment. If you are building QA for instance, you will be able to inject all of the QA specific parameters at build time and package the app.

    2. I don't think that is a good idea if you are using a Maven artifact approach and Nexus (or similar). In a Maven environment it is an absolute no-go to produce artifacts that are identical (from a Maven and thus Nexus point of view) but may contain different configuration and thus different behavior.

      You will have to produce either different artifacts (different versions) or (mis)use classifiers.

    3. You raise a valid point; however, under CD, the idea is to use Configuration Management so that one artifact is all you need to run in all environments. Make your application able to adapt to all configurations. You can do this by creating web hooks so that you can configure the application remotely and create some sort of defaults that can allow the application to run minimally to be able to respond to configuration actions. Ultimately, using Maven or not, the goal is not to recompile. In this manner, what QA is testing is "exactly" what gets deployed.

      Really good comment, thanks for sharing!
