What is Canary Testing?
Canary testing is a method for testing new functionality or features in production with reduced risk and minimal impact on users. The new feature, or new version of the product, is first released to only a small group of users. Limiting exposure this way lets teams validate the changes without affecting the experience of the user base at large.
The terms canary release and canary deployment are closely related to canary testing. They refer to the deployment pattern in which a new feature is first rolled out to only a small subset of users. In this context, canary testing is the practice of using a canary deployment to test how that subset of users receives the new feature.
Origin of the Term Canary
The term canary testing comes from coal mining. Canaries were once taken into coal mines to detect dangerous levels of odorless toxic gases such as carbon monoxide. The gas would kill the canary before it could kill the miners, so as long as the canary was alive and singing, the miners knew they could keep working. If the canary died, it was a signal to evacuate the mine immediately.
Just as the canary gave coal miners an early warning to get out before the gas reached them, the small group of users who receive a new feature first gives an early warning of any glitches before the feature is released to everyone. In both cases, neither the canary nor the users are aware that they are being used for testing. The analogy fits, hence the name canary testing.
A canary test is performed after the product has been tested in a sandbox environment. The canary process is usually automated, so if there is any indication of errors, the release can be quickly rolled back.
How does Canary Testing work?
The basic premise of canary testing is that new features are deployed to only a certain percentage of users. A canary test can be carried out with different deployment strategies, such as blue-green deployment, or by using feature flags.
In a blue-green deployment, the production environment is split into two parts, a blue environment and a green environment, which are kept as identical as possible. The current version of the code runs in one environment, and the new version is deployed to the other, currently idle, environment. A traffic router (for example, a load balancer) then splits traffic at the server level, sending a certain percentage of requests to the new version. For canary testing with blue-green deployment, only a small subset of user traffic is moved to the new environment at first, allowing the team to check for bugs before moving more traffic.
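To make the traffic split concrete, here is a minimal Python sketch of a weighted router choosing between the blue and green environments. It assumes a simple per-request random split; the function name `route_request` and the 5% share are illustrative only, and in practice this weighting is usually configured in a load balancer or service mesh rather than written by hand.

```python
import random
from collections import Counter


def route_request(canary_percent: float, rng: random.Random) -> str:
    """Send roughly canary_percent% of requests to green (new), the rest to blue (current)."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    # rng.random() is uniform on [0.0, 1.0), so this is True with probability canary_percent / 100
    return "green" if rng.random() * 100 < canary_percent else "blue"


# Simulate 10,000 requests with 5% of traffic going to the new (green) version.
rng = random.Random(42)
counts = Counter(route_request(5, rng) for _ in range(10_000))
print(counts)  # roughly 5% "green", the rest "blue"
```

Because each request is routed independently here, the same user could hit both versions. Real setups often add sticky routing, for example by hashing a user ID, so each user sees a consistent version during the test.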
Another way to perform canary testing is with feature flags, discussed in the next section. Feature flags give the team more granular control over the rollout.
With this deployment approach, teams can monitor, analyze, and often prevent downtime, revenue impact, bugs, and errors, and can also gauge how customers feel about the new feature. It is a quick way to learn how users receive the new feature and then form an informed rollout strategy for that version of the code.
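One way teams automate this analysis is to compare the canary's metrics against the baseline (current) version and decide whether to promote or roll back. The sketch below compares error rates only; the `Metrics` class, the `canary_verdict` function, and all thresholds are hypothetical, and a real canary analysis would typically look at several metrics, such as latency and business KPIs.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_verdict(baseline: Metrics, canary: Metrics,
                   max_relative_increase: float = 0.5,
                   abs_tolerance: float = 0.001,
                   min_requests: int = 1000) -> str:
    """Return 'promote', 'rollback', or 'wait' by comparing canary and baseline error rates.

    Thresholds are illustrative assumptions, not recommended values.
    """
    if canary.requests < min_requests:
        return "wait"  # not enough canary traffic yet to judge
    # Allow a relative increase over the baseline plus a small absolute tolerance,
    # so a baseline with zero errors does not fail the canary on a single error.
    allowed = baseline.error_rate * (1 + max_relative_increase) + abs_tolerance
    return "rollback" if canary.error_rate > allowed else "promote"


baseline = Metrics(requests=100_000, errors=200)   # 0.2% error rate
print(canary_verdict(baseline, Metrics(5_000, 10)))  # promote
print(canary_verdict(baseline, Metrics(5_000, 50)))  # rollback
print(canary_verdict(baseline, Metrics(500, 50)))    # wait
```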
Testing in production matters because, in most cases, the development environment differs considerably from the production environment. Running initial tests in production can surface small bugs or issues before most of your users encounter them.
In canary testing, once the new version has been validated with a small group of users, it is rolled out to more users in phases. This greatly reduces the risk of deploying new features and gives the team a chance to fix problems that could become critical if left unnoticed.
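The phased rollout can be sketched as a loop that raises the canary's traffic share stage by stage and rolls back on the first failed health check. The stage percentages and the `set_traffic` and `is_healthy` callbacks are assumptions for illustration; in a real pipeline, `set_traffic` would update the router or flag, and `is_healthy` would wait for a bake period and query monitoring.

```python
from typing import Callable, Sequence


def phased_rollout(stages: Sequence[int],
                   set_traffic: Callable[[int], None],
                   is_healthy: Callable[[int], bool]) -> int:
    """Raise traffic to the new version stage by stage.

    Rolls back to 0% on the first failed health check.
    Returns the final percentage of traffic on the new version.
    """
    for percent in stages:
        set_traffic(percent)
        if not is_healthy(percent):
            set_traffic(0)  # roll back: all traffic returns to the old version
            return 0
    return stages[-1] if stages else 0


# Example: the (simulated) health check starts failing once 50% of traffic is on the new version.
history: list[int] = []
final = phased_rollout([1, 5, 25, 50, 100], history.append, lambda p: p < 50)
print(final, history)  # 0 [1, 5, 25, 50, 0]
```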
Running Canary Test using Feature Flags
Feature flags offer a different way to run canary tests. A feature flag (or feature toggle) is a software development technique that lets you turn a feature on or off at runtime without deploying new code.
Unlike blue-green deployment, feature flags do not require multiple production environments. They suit canary testing because they separate deploying code from enabling a feature: the code ships to production with the feature switched off, and the team can then turn the feature on or off remotely for a certain percentage of users without affecting the original version.
For a canary test with feature flags, teams can roll out a new feature to as few as 1% of users and validate it against all the key metrics. If the team finds any errors, the feature can simply be disabled with the flag.
Once the new feature has been validated against the key metrics, it can be rolled out in phases to the rest of the user base, up to 100%.
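A percentage-based flag is commonly implemented by hashing each user ID into a stable bucket, so the same user always gets the same answer and raising the percentage only adds users. The minimal Python sketch below assumes an in-memory flag; the `PercentageFlag` class and the `new-checkout` flag name are hypothetical and not the API of any particular feature-flag product.

```python
import hashlib


class PercentageFlag:
    """Hypothetical in-memory flag: on for a stable percentage of users, with a kill switch."""

    def __init__(self, name: str, rollout_percent: float = 0.0):
        self.name = name
        self.rollout_percent = rollout_percent
        self.killed = False

    def _bucket(self, user_id: str) -> int:
        # Hash flag name + user ID so each user lands in the same bucket (0..9999) every time.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % 10_000

    def is_enabled(self, user_id: str) -> bool:
        if self.killed:
            return False
        # rollout_percent=1 enables buckets 0..99, i.e. about 1% of users.
        return self._bucket(user_id) < self.rollout_percent * 100

    def kill(self) -> None:
        """Disable the feature for everyone without a redeploy."""
        self.killed = True


flag = PercentageFlag("new-checkout", rollout_percent=1)
users = [f"user-{i}" for i in range(100_000)]
print(sum(flag.is_enabled(u) for u in users))  # about 1% of users
flag.kill()
print(sum(flag.is_enabled(u) for u in users))  # 0
```

Hashing the flag name together with the user ID keeps different flags from always selecting the same 1% of users.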
Canary Testing and Continuous Delivery
Continuous delivery is a software development practice that lets teams deploy code changes to the production environment easily and frequently. As the name suggests, the previous version of the code is upgraded continuously: teams work on changes in short cycles, so new code can be tested and deployed often. It is a common way to deliver improved software to end users quickly.
Canary testing fits well with the principles of continuous delivery, since canary testing focuses on delivering new features to end users in a controlled, efficient way. Used together, continuous delivery and canary testing give development teams a smooth way to deliver new functionality to end users at scale, incrementally, and with minimal risk.