By Christoph Goldenstern, Kepner-Tregoe
The key factor that can make or break a company in today’s competitive business environment is the availability of the services it provides to its customers. Outages happen, and however brief they are, they matter. When the revenue of a business relies mostly on “being online and available”, even a minute may be too much. When Facebook, Twitter, Salesforce.com or Amazon is down, it is international news.
Our systems must be online, 24 hours a day, 7 days a week, without a break. Is 100 percent availability even possible? Can we insist on it in our service level agreements (SLA) with suppliers?
One-hundred-percent uptime is difficult to provide because of simple, everyday IT management issues. There are always upgrades to install, system overloads, hardware and software issues, crashes and other common mishaps that will almost always make for at least some downtime, unless we have 100% redundancy.
That said, can we insist on 100-percent availability? The answer is a firm…maybe!
Total availability is possible, but at a cost. Is the cost of 100-percent uptime worthwhile and what does it actually mean to you? Every SLA with a cloud provider will include a list of exceptions, or time not included in its uptime statistics. These may be agreed maintenance windows and force majeure (also known as acts of God). There may be a clause that states the provider has a minimal window of time to recover from an outage – in a “high-availability” SLA this is likely to be a matter of minutes. If it achieves this goal, then the outage will not be counted as downtime.
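The effect of these exceptions on the reported figure can be illustrated with a little arithmetic. The following is a minimal sketch, not any provider's actual method: the `Outage` class, the `availability` function, the five-minute recovery window and the sample outages are all hypothetical, chosen only to show how excluded time changes the number.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    minutes: float               # how long the service was unavailable
    planned: bool = False        # True for an agreed maintenance window
    force_majeure: bool = False  # True for an excluded "act of God" event


def availability(period_minutes, outages, recovery_window_minutes=0.0,
                 apply_exclusions=True):
    """Return availability as a percentage of the measurement period.

    With apply_exclusions=True, planned maintenance, force majeure and
    outages resolved within the recovery window are not counted as
    downtime (an assumed reading of a typical SLA exception list).
    """
    counted = 0.0
    for outage in outages:
        if apply_exclusions:
            if outage.planned or outage.force_majeure:
                continue
            if outage.minutes <= recovery_window_minutes:
                continue
        counted += outage.minutes
    return 100.0 * (period_minutes - counted) / period_minutes


# Hypothetical 30-day month with three interruptions.
period = 30 * 24 * 60
outages = [
    Outage(minutes=120, planned=True),  # agreed maintenance window
    Outage(minutes=4),                  # recovered within the window
    Outage(minutes=45),                 # counted as downtime
]

experienced = availability(period, outages, apply_exclusions=False)
reported = availability(period, outages, recovery_window_minutes=5)

print(f"Experienced by end users: {experienced:.2f}%")  # 99.61%
print(f"Reported under the SLA:   {reported:.2f}%")     # 99.90%
```

In this made-up month, end users saw 169 minutes of interruption, but only 45 of those minutes count against the SLA.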
This means, in reality, 100 percent will actually be somewhat less than “100 percent.” To your end users, however, it should, for all intents and purposes, appear to deliver constant availability. The key to maximizing the customer experience is how your cloud provider responds to planned and unplanned outages, taking advantage of cloud architecture and on-demand concepts, in effect streaming IT services to reach availability goals.
The nature of the cloud allows for multiple redundancies, with the capability to transfer delivery of services to alternative hardware virtually seamlessly, which means your end customers do not perceive any interruption to their services.
The level of competition in the cloud marketplace gives cloud providers a very strong incentive to deliver a high level of service quality.
Cloud providers promising 100-percent uptime are actually promising your site/servers will be online constantly. They will be trying their hardest to achieve this goal, but if they are unable to do so, they are agreeing to be accountable and to compensate you for any downtime.
Any reasonable person knows there is no 100-percent uptime. The secret to understanding what it actually means is in your SLA. Cloud providers will have limits on what is acceptable and they will pay for the uptime they can’t deliver. It is extremely important you read the terms of the SLA and understand the exceptions, and what compensation the provider is offering. Some SLAs look good on the surface, but may actually be so full of exceptions that the percentage of availability offered is meaningless. You must read and understand what is being offered.
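To put an availability percentage from an SLA into perspective, it can help to convert it into the downtime it permits. The sketch below is plain arithmetic; the function name `allowed_downtime_minutes` and the sample percentages are illustrative assumptions, not figures from any particular SLA, and the result applies only to time the SLA actually counts.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent,
                             period_minutes=MINUTES_PER_YEAR):
    """Minutes of counted downtime an availability percentage permits."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return period_minutes * (100 - availability_percent) / 100


# Illustrative availability levels only.
for pct in (99.0, 99.9, 99.99, 100.0):
    minutes = allowed_downtime_minutes(pct)
    print(f"{pct}% availability allows {minutes:.1f} minutes per year")
```

A figure such as 99.9 percent still leaves room for several hours of counted downtime per year, before any exceptions are applied.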
A true 100-percent availability SLA actually means your provider will do everything in its power to make sure your systems are constantly online and your customers do not experience any outages. The provider is promising to be held accountable – and that accountability is the key.
Before services go down, put a plan in place to get them back online.
Kepner-Tregoe has been the industry leader in problem-solving and service-excellence processes for more than 60 years. The experts at KT have helped companies raise their level of incident- and problem-management performance through tools, training and consulting – leading to highly effective service-management teams ready to respond to your company’s most critical issues.