Continuing the series on Continuous Delivery that follows from my last post, in this post I will talk about the importance of using a build tool to give structure to your build process and workflow.
Application systems become more complex as several factors grow: the size of the code base, the size of the development team, the number of third-party artifacts, and the number of external systems to integrate with, such as web services, databases, queues, and caches.
Building the artifacts of such an application is a multi-step task that quickly becomes a real headache to carry out manually. Automating your build process with scripts or tools is essential for medium and large projects. Developers, QA, and Operations (and the emerging DevOps) teams must work in unison to accomplish this: Continuous Delivery is the responsibility of the whole team.
Build tools typically follow a common paradigm: they carry out ordered tasks that belong to different phases or goals. These tasks can be anything from checking out source code, compiling, running static code analysis, and executing tests to running deployment scripts.
As the image above shows, your build script is a pipeline: a sequence of actions that must execute predictably and in the right order, and that together form your application's backbone. A deployment pipeline has different stages, which might run scripts to check out code, compile, test, deploy, run tests, perform code analysis, version artifacts, and so on. Performing these steps manually is error-prone and inefficient.
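The defining property of such a pipeline is that stages run in a fixed order and a failure stops everything downstream. The following Java sketch models only that behavior. `Pipeline`, `PipelineResult`, and the stage names are hypothetical, and each stage is reduced to a function that reports success or failure.

```java
import java.util.*;
import java.util.function.BooleanSupplier;

// Outcome of a pipeline run: the stages that executed and whether all of them passed.
record PipelineResult(List<String> executedStages, boolean succeeded) {}

// Hypothetical pipeline, not a CI tool's API: stages run in the order they
// were added, and the first failing stage prevents the remaining ones from running.
class Pipeline {
    private final Map<String, BooleanSupplier> stages = new LinkedHashMap<>();

    Pipeline stage(String name, BooleanSupplier action) {
        stages.put(name, action);
        return this;
    }

    PipelineResult run() {
        List<String> executed = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> stage : stages.entrySet()) {
            executed.add(stage.getKey());
            if (!stage.getValue().getAsBoolean()) {
                return new PipelineResult(executed, false); // fail fast
            }
        }
        return new PipelineResult(executed, true);
    }
}
```

For example, if the `test` stage fails in a checkout, compile, test, deploy pipeline, the result lists only checkout, compile, and test, and `deploy` never runs.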
In the Java world there are many tools you can use: Ant, Maven, or Gradle, to name a few. In the PHP world it's pretty much Phing's game.
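Ant and Gradle, like Make and Rake, model a build as named tasks (targets, in Ant) that declare which other tasks must run first. The Java sketch below shows how such a tool can turn those declarations into an execution order using a depth-first topological sort. `TaskGraph` and its methods are hypothetical, not the API of any of these tools.

```java
import java.util.*;

// Hypothetical task graph illustrating how Make-style tools order tasks;
// not the API of Ant, Gradle, Rake, or Maven.
class TaskGraph {
    private final Map<String, List<String>> dependsOn = new LinkedHashMap<>();

    // Register a task together with the tasks that must run before it.
    void task(String name, String... prerequisites) {
        dependsOn.put(name, List.of(prerequisites));
    }

    // Return the tasks to execute for a target, prerequisites first, each exactly once.
    List<String> executionOrder(String target) {
        List<String> order = new ArrayList<>();
        visit(target, new HashSet<>(), new HashSet<>(), order);
        return order;
    }

    private void visit(String name, Set<String> done, Set<String> inProgress, List<String> order) {
        if (done.contains(name)) return;
        if (!dependsOn.containsKey(name)) throw new IllegalArgumentException("Unknown task: " + name);
        // Meeting a task that is still being visited means the declarations form a cycle.
        if (!inProgress.add(name)) throw new IllegalStateException("Cycle at task: " + name);
        for (String prerequisite : dependsOn.get(name)) {
            visit(prerequisite, done, inProgress, order);
        }
        inProgress.remove(name);
        done.add(name);
        order.add(name);
    }
}
```

If `deploy` depends on `test` and `analyze`, both of which depend on `compile`, which depends on `checkout`, then `executionOrder("deploy")` yields checkout, compile, test, analyze, deploy, running `compile` only once.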
In this post I will focus on Maven, since it is the tool I have the most experience with and have used successfully on many different types of projects. Maven provides a rich, declarative, extensible, XML-driven domain model for building applications. With its convention-over-configuration paradigm and its plugin ecosystem, Maven can accomplish almost any build task.
Maven can be downloaded here: http://maven.apache.org/download.cgi
Maven is written in Java, primarily for Java projects, and its power lies in its dependency management. Large Java deployments are typically multi-module, and a tool that manages all of your third-party and in-house modules, along with their transitive dependencies, can save you a lot of pain. Transitive dependencies are easy to understand: if project A depends on artifact B, and B depends on artifact C, then by transitivity project A also depends on C. Maven handles all of this for you.
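The transitive rule above amounts to collecting everything reachable in the dependency graph. This Java sketch uses a hypothetical `DependencyResolver` class and walks the graph breadth-first. It deliberately leaves out what Maven also handles, such as versions, scopes, exclusions, and conflict mediation.

```java
import java.util.*;

// Hypothetical resolver: computes the transitive closure of direct dependencies.
// Versions, scopes, exclusions, and conflict mediation are intentionally omitted.
class DependencyResolver {
    private final Map<String, List<String>> direct = new HashMap<>();

    void declare(String artifact, String... dependencies) {
        direct.put(artifact, List.of(dependencies));
    }

    // Breadth-first walk collecting every artifact reachable from root.
    Set<String> transitiveDependencies(String root) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(direct.getOrDefault(root, List.of()));
        while (!queue.isEmpty()) {
            String next = queue.poll();
            if (seen.add(next)) { // visit each artifact only once, even in a diamond
                queue.addAll(direct.getOrDefault(next, List.of()));
            }
        }
        return seen;
    }
}
```

Declaring A depends on B and B depends on C, `transitiveDependencies("A")` returns B and C, which is exactly the "A also depends on C" rule.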
Every tool has its downsides, and Maven has its share of them. Although it does not happen often, the biggest downside is that Maven may update its core plugins without warning if you do not pin their versions in your POM, which makes your builds somewhat unpredictable. Because Maven's core is deliberately small, it relies on plugins to become a full-fledged build system, and any of these plugins might be updated on the fly. Another downside is that Maven's build language is an external DSL written in XML, the POM (Project Object Model) file, so to extend it you must write custom plugins. The good news is that Maven's plugin ecosystem is enormous: there are plugins for every task you will perform on a typical Java enterprise project, and most vendors and tool providers offer a Maven plugin that integrates their products with your build.
On the other hand, Rake, the Ruby counterpart to Make, solves this problem by providing a native Ruby API. Ruby is a very expressive language, well suited to writing internal DSLs (I discuss internal DSLs in a separate post). Instead of writing XML elements, your build script becomes Ruby code, which gives you the full power of a general-purpose programming environment: a debugger, code completion, refactoring and modularization, class augmentation, and so on.

A word of advice: Maven uses a feature called SNAPSHOTs. Snapshots are similar to Composer's "-dev" versions for PHP applications. For a SNAPSHOT dependency, Maven periodically checks the remote repository (by default once a day) for a newer build of the same version you are using, and downloads it if one is available. I recommend using snapshots only during development and only for in-house artifacts. For third-party artifacts you want to download a given version only once, so avoid SNAPSHOT versions for these: third-party vendors can change the contents of a SNAPSHOT artifact at any time, which makes your build very unpredictable. Maven provides configuration to avoid snapshot downloads from third-party repositories.
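In practice this restriction belongs in Maven configuration, for example by disabling snapshots on a third-party repository definition. The Java sketch below only illustrates the rule itself. The `ArtifactCoordinate` record and `SnapshotPolicy` class are hypothetical, and treating any groupId under an assumed prefix such as `com.example` as "in-house" is an illustrative convention. The only Maven fact it relies on is that snapshot versions end in `-SNAPSHOT`.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical Maven coordinates, reduced to the fields this check needs.
record ArtifactCoordinate(String groupId, String artifactId, String version) {
    // Maven marks snapshot versions with the "-SNAPSHOT" suffix.
    boolean isSnapshot() {
        return version.endsWith("-SNAPSHOT");
    }
}

// Flags third-party SNAPSHOT dependencies; "in-house" is an assumed groupId-prefix convention.
class SnapshotPolicy {
    private final List<String> inHouseGroupPrefixes;

    SnapshotPolicy(String... inHouseGroupPrefixes) {
        this.inHouseGroupPrefixes = List.of(inHouseGroupPrefixes);
    }

    // A groupId is in-house if it equals a prefix or sits beneath it (prefix + ".").
    boolean isInHouse(ArtifactCoordinate c) {
        return inHouseGroupPrefixes.stream()
                .anyMatch(p -> c.groupId().equals(p) || c.groupId().startsWith(p + "."));
    }

    // The dependencies the advice above warns against: SNAPSHOTs not built in-house.
    List<ArtifactCoordinate> violations(List<ArtifactCoordinate> dependencies) {
        return dependencies.stream()
                .filter(ArtifactCoordinate::isSnapshot)
                .filter(c -> !isInHouse(c))
                .collect(Collectors.toList());
    }
}
```

With `new SnapshotPolicy("com.example")`, an in-house `1.2-SNAPSHOT` passes, a released third-party `3.1` passes, and only a third-party `2.0-SNAPSHOT` is reported.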
Maven will be used to drive the assembly of your application artifact, whether it is an EAR, WAR, or JAR file. Maven begins executing at the commit stage in the diagram above. Once the artifact has been built, Maven deploys it to an artifact repository for later consumption. The following is a high-level diagram illustrating all of the phases of the lifecycle; you can attach plugins to perform tasks at every step.
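The lifecycle idea can be modeled in a few lines: phases have a fixed order, plugin goals are bound to phases, and invoking a phase runs every earlier phase first. This Java sketch uses only a subset of Maven's default lifecycle phases. `Lifecycle` and `Phase` are hypothetical types, while goal names such as `compiler:compile`, `surefire:test`, and `jar:jar` are among the goals Maven binds by default for JAR packaging.

```java
import java.util.*;

// Simplified subset of Maven's default lifecycle phases, in execution order.
enum Phase { VALIDATE, COMPILE, TEST, PACKAGE, VERIFY, INSTALL, DEPLOY }

// Hypothetical model: plugin goals are plain strings attached to phases.
class Lifecycle {
    private final Map<Phase, List<String>> bindings = new EnumMap<>(Phase.class);

    // Attach a plugin goal (e.g. "surefire:test") to a phase.
    void bind(Phase phase, String goal) {
        bindings.computeIfAbsent(phase, p -> new ArrayList<>()).add(goal);
    }

    // Invoking a phase runs the goals of every earlier phase first, in order.
    List<String> goalsUpTo(Phase target) {
        List<String> goals = new ArrayList<>();
        for (Phase phase : Phase.values()) {
            goals.addAll(bindings.getOrDefault(phase, List.of()));
            if (phase == target) break;
        }
        return goals;
    }
}
```

With those goals bound, `goalsUpTo(Phase.PACKAGE)` returns compiler:compile, surefire:test, jar:jar, a simplified picture of what `mvn package` does.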
Principles and Practices
Below I discuss some important principles and practices to follow when creating your build scripts. Maven makes this process straightforward and consistent. For more information on creating a deployment pipeline, see my post Deployment Pipeline; it will become important for architecting your scripts.
Create a Script for Each Stage
One way of organizing your builds is to write a script for each stage of your deployment process. This keeps your scripts clean and focused on the tasks of each stage. If you need to share information among the scripts, Maven lets you define a parent script (a parent POM) that your individual scripts inherit from. In Maven, the idea of writing scripts translates into using, and when needed writing, plugins. The plugin ecosystem is huge, and you can perform tasks such as compiling code, running tests, assembling code, copying resources, creating manifest files, versioning, source code checkout, code minification, and so on.
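Parent POM inheritance works roughly like layered configuration: a child inherits its parent's settings and overrides only what it needs. The Java sketch below models just that merging rule for properties. `BuildModel` and the property names are hypothetical, and the sketch ignores everything else Maven merges, such as dependencies and plugin configuration. To inspect the real merged result for a project, Maven's help plugin offers `mvn help:effective-pom`.

```java
import java.util.*;

// Hypothetical parent/child build descriptor: a child inherits every property
// from its parent chain and may override individual values.
class BuildModel {
    private final BuildModel parent; // null for the root (parent) model
    private final Map<String, String> properties = new HashMap<>();

    BuildModel(BuildModel parent) {
        this.parent = parent;
    }

    BuildModel set(String key, String value) {
        properties.put(key, value);
        return this;
    }

    // Merge from the root ancestor down, so the nearest definition wins.
    Map<String, String> effectiveProperties() {
        Map<String, String> merged = parent == null
                ? new HashMap<>()
                : new HashMap<>(parent.effectiveProperties());
        merged.putAll(properties);
        return merged;
    }
}
```

A parent that sets `java.version=17` and `encoding=UTF-8` and a child that sets only `java.version=21` yield an effective child configuration of 21 and UTF-8, while the parent itself is unchanged.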
Use the Same Script to Deploy to Every Environment
The scripts used to deploy to development machines should be exactly the same as those used for QA, staging, and production environments. This ensures that your build process is tested thoroughly at every step. To achieve this, externalize (extract) environment-specific configuration from the scripts so that every environment is configured in the same way; both the scripts and the configuration belong in source control. Maven can accomplish this with the Templates and Filters pattern.
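The Templates and Filters pattern keeps one template in source control plus one set of values per environment, and substitutes the values at build time. The following Java sketch uses the `${name}` placeholder syntax that Maven resource filtering also uses; `TemplateFilter` and the property names are hypothetical. This sketch throws on a missing value rather than leaving the placeholder in the output, so a misconfigured environment fails the build early.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical filter: replaces ${key} placeholders in a template with the
// values defined for one environment (dev, QA, staging, production, ...).
class TemplateFilter {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    static String apply(String template, Properties environment) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String key = m.group(1);
            String value = environment.getProperty(key);
            if (value == null) {
                throw new IllegalArgumentException("No value for placeholder: " + key);
            }
            // quoteReplacement keeps '$' or '\' in values from being treated as regex syntax.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The same template, for example `db.url=${db.url}`, is filtered with a development properties file on a developer machine and a production properties file in production; only the values differ, never the script.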
Often, the setup of a developer's machine is nowhere near that of a production environment. For instance, you might have queueing systems, messaging systems, email servers, or databases that are configured very differently in production than locally. In these cases, look for simplified versions of these dependencies: research tools such as in-memory databases, in-memory queues, and mock e-mail servers. This investment is well worth your time. If your application depends on components built in-house, it is essential that all build environments have access to them. With Maven, you can set up an external Nexus server. Nexus is an artifact repository in charge of storing artifacts developed in-house. It can also act as a package proxy for external repositories, so that you can read artifacts (JAR files) from open source communities such as Apache, Spring, and Google Code, to name a few. This makes all components accessible to all build environments.
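The two roles of a repository manager described above, hosting in-house artifacts and proxying public repositories, can be sketched as a lookup with a cache. `RepositoryManager` and its methods are hypothetical and are not Nexus's API; the upstream repository is modeled as a plain function from coordinate to bytes. The point of the sketch is that each external artifact is fetched from upstream once and then served locally to every build environment.

```java
import java.util.*;
import java.util.function.Function;

// Hypothetical repository manager: in-house artifacts live in a hosted store;
// anything else is fetched once from an upstream repository and cached.
class RepositoryManager {
    private final Map<String, byte[]> hosted = new HashMap<>();
    private final Map<String, byte[]> proxyCache = new HashMap<>();
    private final Function<String, byte[]> upstream; // returns null if upstream lacks it
    private int upstreamRequests = 0;

    RepositoryManager(Function<String, byte[]> upstream) {
        this.upstream = upstream;
    }

    // Publish an artifact built in-house.
    void publish(String coordinate, byte[] artifact) {
        hosted.put(coordinate, artifact);
    }

    Optional<byte[]> resolve(String coordinate) {
        if (hosted.containsKey(coordinate)) {
            return Optional.of(hosted.get(coordinate));
        }
        // computeIfAbsent only calls upstream on a cache miss; null results are not cached.
        byte[] cached = proxyCache.computeIfAbsent(coordinate, c -> {
            upstreamRequests++;
            return upstream.apply(c);
        });
        return Optional.ofNullable(cached);
    }

    int upstreamRequests() {
        return upstreamRequests;
    }
}
```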
Whenever possible, use packaging systems for any artifacts related to the operating system and your application. I highly recommend that all your environments be Unix- or Linux-based. The package manager depends on the platform you are using: Debian and Ubuntu use Debian packages (managed with dpkg and APT), while Red Hat, CentOS, and Fedora use RPM packages, typically managed with Yum.
Language platforms also have package managers; every major language has one: PHP has PEAR or Composer, Java uses Maven, Python uses pip, Perl has CPAN, Ruby uses RubyGems, and so on. I recommend that your application be treated as a package as well: as part of your build process, deploy the different versions of your application artifacts into the artifact repository. Package managers make your deployment process much easier to script, because every installation becomes a set of commands against your package manager tool. If you require special installations of commercial software for which no packages are available, you will have to manage this exception as a manual step.
Make your builds idempotent
This is a nice way to say: ensure that your build process leaves the build environment in exactly the state it was in before the build. This is especially true for test artifacts like databases. If you need to set up databases to perform integration tests for your application, always make sure to remove any test artifacts created as part of the build. I've seen this go wrong many times: developers write unit tests that write data to databases (a terrible idea) and fail to clean up that data properly, which can cause subsequent builds to fail.
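One way to make a test leave shared state as it found it is to snapshot the state beforehand and restore it in a `finally` block, which runs whether the test passes or throws. Against a real database the equivalent is usually rolling back a transaction or deleting the rows the test created; this Java sketch uses a hypothetical in-memory `FakeTable` so that it stays self-contained.

```java
import java.util.*;

// Hypothetical stand-in for a database table shared by successive builds.
class FakeTable {
    final Map<Integer, String> rows = new HashMap<>();
}

class IdempotentTest {
    // Copy the table, run the test, and restore the copy in a finally block,
    // so the table ends in the same state whether the test passes or fails.
    static void runIsolated(FakeTable table, Runnable test) {
        Map<Integer, String> before = new HashMap<>(table.rows);
        try {
            test.run();
        } finally {
            table.rows.clear();
            table.rows.putAll(before);
        }
    }
}
```

Because the restore happens in `finally`, even a test that inserts a row, deletes an existing one, and then fails leaves the table exactly as the next build expects to find it.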
Script your deployments
The best way to script your deployment process is to use a Continuous Integration (CI) tool. There are many players in this field, including Jenkins, Hudson, and Bamboo. Some are better than others, but all of these tools provide an interface to set up different build plans composed of multiple stages. Depending on your deployment pipeline and the needs of your system, script each stage accordingly. These CI tools have plugin architectures that let you run tools for code analysis and unit testing, among others.
At work we have had a really positive experience using Bamboo. Bamboo is a commercial product, but Jenkins is free to use. If for some reason you don't have access to these tools, at the very least you should write your own scripts that perform this orchestration for you.
In later posts I will provide more details about our implemented deployment pipeline, scripts, and our use of Bamboo CI.
- Humble, Jez, and David Farley. Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation. Addison-Wesley. 2011