Do I need Zero Downtime deployment?
Achieving Zero Downtime deployment is a must for business-critical and mission-critical software systems. Imagine your bank or hospital system going down for a regular update or maintenance. Not a great experience!
An important KPI in modern systems is Uptime. It is often tied to SLAs and to legal and financial terms and conditions. Even for non-critical SaaS applications, an outage can frustrate customers and motivate them to consider moving to a competitor. And if an outage affects an e-commerce service, it results in lost sales that can be measured in direct monetary value.
To achieve Zero Downtime, we need to keep the service running at all times, even while deploying a new version of that same service! That may sound complicated (or even insane to some), but it has never been easier: cloud technologies let you provision new infrastructure on demand and redirect traffic to it through automation.
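To make the Uptime KPI concrete, here is a minimal sketch that converts an uptime target into a downtime budget. The function name, the 30-day period, and the percentages are illustrative assumptions, not figures from any particular SLA.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: int = 30) -> float:
    """Return how many minutes of downtime an uptime target leaves in a period."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


# Illustrative targets only; real SLAs define their own numbers and periods.
for target in (99.0, 99.9, 99.99):
    budget = allowed_downtime_minutes(target)
    print(f"{target}% uptime leaves {budget:.1f} minutes of downtime per 30 days")
```

Under these assumptions, a 99.9% target leaves roughly 43 minutes per 30 days, which a single maintenance window could easily use up.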
Infrastructure as code
The key to a successful shift from traditional Ops to a DevOps culture and practice is to script and automate everything, including infrastructure. There are three main activities performed on infrastructure: provision, configure, and monitor. We need to perform all three with minimal (zero, actually) human interaction.
Representing the infrastructure as code has many benefits:
- Faster deployment and recovery. Automation is the key to speed and efficiency.
- Eliminate human errors and risks. You know that you can always deploy your system in exactly the same way it was done before, with zero deviation from the designed process and procedure.
- Version tracking. You can see how the infrastructure changed and evolved over time.
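The idea can be sketched in a few lines: the desired infrastructure lives in version control as data, and a script works out what has to change. This is a toy model, not a real provisioning tool; `ServerSpec`, `plan`, and the server and image names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ServerSpec:
    """Desired state of one server (hypothetical schema kept in version control)."""
    name: str
    image: str
    size: str


def plan(desired: List[ServerSpec], actual: Dict[str, ServerSpec]) -> List[str]:
    """Compare the desired state with what is running and list the actions needed."""
    wanted = {spec.name: spec for spec in desired}
    actions = []
    for name, spec in wanted.items():
        if name not in actual:
            actions.append(f"provision {name} from {spec.image} ({spec.size})")
        elif actual[name] != spec:
            actions.append(f"replace {name} with {spec.image} ({spec.size})")
    for name in actual:
        if name not in wanted:
            actions.append(f"destroy {name}")
    return actions


desired = [ServerSpec("web-1", "shop:1.1", "small"),
           ServerSpec("web-2", "shop:1.1", "small")]
actual = {"web-1": ServerSpec("web-1", "shop:1.0", "small")}

for action in plan(desired, actual):
    print(action)
```

Running the plan twice against an up-to-date environment yields no actions, which is what makes the process repeatable with no human interaction.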
Immutable infrastructure
Immutable, as defined in the dictionary, means “Unchanging over time or unable to be changed”.
In our context, once we provision, configure, and deploy a version of a service to a specific piece of infrastructure (be it a VM or PaaS), it becomes unchangeable. That piece of infrastructure should never be touched again: absolutely no patching, upgrades, or updates.
The benefits of Immutable Infrastructure are:
- Going back to an old version is easy. Just deploy an old image or version to a new server.
- Enforcing infrastructure-as-code. Every change must be a script in the deployment pipeline, which ensures that a server can be replaced at any time. No more snowflake environments.
- Makes it easy and quick to deploy new environments. You can create a new production-like setup at any time, for example for testing or investigating bugs.
So what do we do when we need to perform regular maintenance and updates to new versions?
You provision, configure, and deploy a completely new instance in a new piece of infrastructure (VM or PaaS) and replace the old one with the new one after performing some tests to ensure the new instance is working fine and has no issues or defects.
So what happens to the old version?
It gets de-provisioned and destroyed after the full roll-out of the new version to the new instances. All you need to decide is when you’d like the old instances to be decommissioned.
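The replace-instead-of-patch flow can be sketched as follows. This is an in-memory model under stated assumptions: `Fleet`, `Instance`, and the `smoke_test` callback are hypothetical names, and a real pipeline would call a cloud API where this code only creates objects.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Instance:
    """An immutable deployment: once created, its version never changes."""
    instance_id: str
    version: str


class Fleet:
    def __init__(self) -> None:
        self._counter = 0
        self.live: Optional[Instance] = None
        self.retired: List[Instance] = []  # old instances awaiting destruction

    def provision(self, version: str) -> Instance:
        # Always a brand-new instance; existing ones are never modified.
        self._counter += 1
        return Instance(f"vm-{self._counter}", version)

    def roll_out(self, version: str, smoke_test: Callable[[Instance], bool]) -> bool:
        candidate = self.provision(version)
        if not smoke_test(candidate):
            return False  # candidate is discarded, live instance is untouched
        if self.live is not None:
            self.retired.append(self.live)
        self.live = candidate
        return True


fleet = Fleet()
fleet.roll_out("1.0", smoke_test=lambda instance: True)
fleet.roll_out("1.1", smoke_test=lambda instance: True)
print(fleet.live, fleet.retired)
```

Going back to an old version is the same operation: `fleet.roll_out("1.0", ...)` provisions a fresh instance from the old version rather than reviving or patching anything.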
What if there is a database change between versions?
This is the tricky part, and you need to carefully consider your database change policy. If there are breaking changes in the database schema, you may have to take some downtime to complete the upgrade, test it, and publish the new version.
Keeping the database updates backward compatible (for example, adding new columns rather than renaming or dropping existing ones) means you can perform Zero Downtime deployments without worrying about a rollback or keeping two different schema versions synchronised or mirrored.
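As an illustration of a backward-compatible change, the sketch below uses an in-memory SQLite database and an additive migration (a new nullable column). The `customers` table and the two insert functions are hypothetical stand-ins for the old and new versions of a service sharing one database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def old_version_insert(conn: sqlite3.Connection, name: str) -> None:
    # The old service version only knows about the name column.
    conn.execute("INSERT INTO customers (name) VALUES (?)", (name,))


def new_version_insert(conn: sqlite3.Connection, name: str, email: str) -> None:
    # The new service version also writes the new column.
    conn.execute("INSERT INTO customers (name, email) VALUES (?, ?)", (name, email))


old_version_insert(conn, "Ada")

# Additive migration: a nullable column does not break existing statements.
conn.execute("ALTER TABLE customers ADD COLUMN email TEXT")

# Both versions can now run side by side against the same schema.
old_version_insert(conn, "Grace")
new_version_insert(conn, "Linus", "linus@example.com")

rows = conn.execute("SELECT name, email FROM customers ORDER BY id").fetchall()
print(rows)
```

A rename or a dropped column would break `old_version_insert` the moment the migration ran, which is exactly the situation that forces downtime.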
But what about that switch? Surely that must involve downtime?
Not necessarily. But automation is key here. A pattern called blue-green deployment describes how to switch between different versions of a service or application in real time.
Take the infrastructure-as-code methodology down to the network level: we can use automation and scripting to configure network switches and load balancers to redirect traffic.
The key to achieving this is to have two identical production environments ready to receive requests at the same time. One of them is the current/active instance (Green) and the other is the new replacement (Blue). Redirecting traffic between the environments then becomes a matter of flicking a switch at the network level, so that requests land at one endpoint instead of the other. The change takes effect almost instantly. Users won’t notice anything, unless the new version introduces changes that are visible to them.
You can even start by redirecting only part of the traffic, say 10%, to the new instance. This unlocks some cool scenarios like A/B testing a new feature or comparing performance with the current version.
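Here is a minimal sketch of the switch itself, modelled in-process. `BlueGreenRouter` and the two handler functions are hypothetical; in practice the "router" is a load balancer or DNS entry that an automation script reconfigures, but the logic is the same: check the new environment, then flip one pointer.

```python
import threading
from typing import Callable, Dict

Handler = Callable[[str], str]


class BlueGreenRouter:
    """Sends every request to whichever environment is currently active."""

    def __init__(self, environments: Dict[str, Handler], active: str) -> None:
        if active not in environments:
            raise ValueError(f"unknown environment: {active}")
        self._environments = dict(environments)
        self._active = active
        self._lock = threading.Lock()

    @property
    def active(self) -> str:
        with self._lock:
            return self._active

    def handle(self, request: str) -> str:
        with self._lock:
            handler = self._environments[self._active]
        return handler(request)

    def switch_to(self, name: str, health_check: Callable[[Handler], bool]) -> bool:
        candidate = self._environments[name]  # raises KeyError if unknown
        if not health_check(candidate):
            return False  # keep serving from the current environment
        with self._lock:
            self._active = name  # the switch is a single assignment
        return True


def green_v1(request: str) -> str:
    return f"v1 served {request}"


def blue_v2(request: str) -> str:
    return f"v2 served {request}"


router = BlueGreenRouter({"green": green_v1, "blue": blue_v2}, active="green")
print(router.handle("/home"))
router.switch_to("blue", health_check=lambda h: h("/health").startswith("v2"))
print(router.handle("/home"))
```

Because both environments stay up, rolling back is just another `switch_to("green", ...)` call.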
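One way to split traffic, sketched below, is to hash a stable user identifier into a bucket from 0 to 99 and send the lowest buckets to the new version. This is an assumption about how the split might be done, not a description of any specific load balancer; `bucket`, `choose_version`, and the user IDs are hypothetical.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def choose_version(user_id: str, blue_percent: int) -> str:
    """Send roughly blue_percent of users to blue, the rest to green."""
    return "blue" if bucket(user_id) < blue_percent else "green"


users = [f"user-{n}" for n in range(10_000)]
blue_users = sum(1 for user in users if choose_version(user, 10) == "blue")
print(f"{blue_users} of {len(users)} users are routed to blue")
```

Hashing rather than picking at random keeps each user on the same version across requests, which matters for A/B tests. Raising `blue_percent` step by step to 100 completes the roll-out.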
High Availability (HA)
Zero Downtime deployment using blue-green deployment requires an HA mechanism, which can also be used to improve availability and scalability.
Distributing the workload across multiple servers/endpoints using load balancing is a great way to ensure uptime and mitigate outages. You could balance workloads at OSI Layer 7 (application) via HTTP/S, at Layer 4 (transport) via TCP or UDP, or via DNS. Azure has services that cover each of these load balancing technologies: Traffic Manager (DNS), Load Balancer (L4), and Application Gateway (L7).
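To show the basic idea behind load balancing, independent of any Azure service, here is a small round-robin sketch that skips endpoints marked as unhealthy. `RoundRobinBalancer` and the endpoint names are hypothetical, and real load balancers add health probes, timeouts, and connection handling on top.

```python
from typing import Iterable, List, Set


class RoundRobinBalancer:
    """Hands out endpoints in turn, skipping any that are marked down."""

    def __init__(self, endpoints: Iterable[str]) -> None:
        self._endpoints: List[str] = list(endpoints)
        self._healthy: Set[str] = set(self._endpoints)
        self._index = 0

    def mark_down(self, endpoint: str) -> None:
        self._healthy.discard(endpoint)

    def mark_up(self, endpoint: str) -> None:
        if endpoint in self._endpoints:
            self._healthy.add(endpoint)

    def next_endpoint(self) -> str:
        # Try each endpoint at most once per call.
        for _ in range(len(self._endpoints)):
            endpoint = self._endpoints[self._index]
            self._index = (self._index + 1) % len(self._endpoints)
            if endpoint in self._healthy:
                return endpoint
        raise RuntimeError("no healthy endpoints available")


balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
balancer.mark_down("web-2")  # e.g. a failed health probe
print([balancer.next_endpoint() for _ in range(4)])
```

As long as at least one endpoint is healthy, requests keep being served, which is what lets individual instances be replaced without an outage.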