Racksquared Updates and Insights

IBM iSeries Disaster Recovery: Lessons learned from a hardware failure

Written by Carl Miller | Aug 20, 2020 5:58:00 PM

More than 150,000 organizations worldwide rely on the IBM Power Systems platform and the IBM iSeries operating system to keep their core business up and running. This includes business applications like enterprise resource planning (ERP) software, banking applications, customer relationship management (CRM) software and health information systems. Because these applications are business-critical, companies rely on the iSeries for its reliability, scalability and stability. But what happens when the unthinkable occurs and one of these systems fails?

Putting Backup and Disaster Recovery to the Test

Several years ago, I was working with a company that had a double hard drive failure on their iSeries, requiring a complete reload of the server. The good news: they had solid backup procedures in place and no data was lost. The bad news: it took over 48 hours to restore the data and recover the server. During this time, their warehouses were down, which delayed customer shipments and resulted in a poor customer experience.

Would you give this result a passing grade? While there was no data loss, extended system downtime can hardly be considered acceptable, and they knew they had to do better.

How Tape Backups Are Holding You Back

At the root of this extended recovery time is the fact that they, like most companies we talk to, were using tape to back up these systems. The problem with this approach is that tape backups are slow, sometimes taking as long as twelve hours. When a backup takes too long, it can run into production hours, which means it has to be cut short so the systems can be returned to work. This simply doesn’t work and leads to business disruption.

If the backup times are not enough of a concern, take note of recovery times from tape. In my experience, recovery time can be two to four times longer than your backup process, preventing you from meeting internal service level agreements (SLAs) for recovery time objectives (RTO), which was the case for this company.

Bottom line, your tape-based approach may prevent you from recovering systems in a timely manner.
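To put rough numbers on this, here is a minimal Python sketch of the rule of thumb above. It assumes recovery takes two to four times as long as the backup (my experience, not a guarantee) and compares the result against a recovery time objective. The function names are made up for illustration, and the 60-minute RTO is simply an example target.

```python
from datetime import timedelta


def estimate_restore_window(backup_duration, low_factor=2.0, high_factor=4.0):
    """Return (best_case, worst_case) restore durations.

    The 2x to 4x multipliers are the rule of thumb from this article,
    not a measured property of any particular tape drive or system.
    """
    if backup_duration <= timedelta(0):
        raise ValueError("backup_duration must be positive")
    return backup_duration * low_factor, backup_duration * high_factor


def meets_rto(restore_duration, rto):
    """True when the estimated restore fits inside the RTO."""
    return restore_duration <= rto


# Example inputs: a twelve-hour tape backup and a 60-minute RTO (assumed target).
backup = timedelta(hours=12)
rto = timedelta(minutes=60)

best, worst = estimate_restore_window(backup)
print(f"Estimated restore: {best.total_seconds() / 3600:.0f} "
      f"to {worst.total_seconds() / 3600:.0f} hours")
print("Meets RTO in the best case:", meets_rto(best, rto))
```

With a twelve-hour backup, the estimate works out to 24 to 48 hours of recovery, in line with the 48-plus hours this company experienced. Plug in your own backup duration and RTO to see how far apart they are.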

Virtual Tape Library: A Step in the Right Direction

While we were able to recover everything for them, a postmortem of the outage made it clear that they needed to make changes to reduce backup and recovery times and meet internal SLAs. They set their recovery time objective (RTO) to 60 minutes and began reviewing their processes and the technologies they were using. After reviewing several solutions and vendors, they chose a Cybernetics Virtual Tape Library (VTL) solution. The VTL provided a faster way to back up their entire system and, in the event of a disaster, a faster way to recover it. Additionally, the solution could replicate their data to a secondary site where they had another VTL and, if desired, store data to tape for long-term retention.

The VTL dramatically improved their situation, but they would still fall short of their 60-minute RTO. More needed to be done.

Meeting Recovery Time Objectives (RTO) with High Availability

As with most companies, downtime comes with a cost, and having several manufacturing and distribution centers sitting idle was not a prospect they wanted to consider. But the problem was about more than backup and recovery processes. They needed a high availability infrastructure that would allow for a quick failover. To achieve this, we deployed additional hardware to a secondary data center and implemented Rocket iCluster software, which enabled us to replicate their server to that secondary data center and infrastructure. In the event of a disaster, this software allows them to switch over to the backup server in less than 15 minutes, meeting the company’s recovery time objective.

How Confident are You in Your Backup and Recovery Processes?

Your company runs on IT, and if IT is not working, neither is your business. If the above story sounds like your current situation, let’s talk about how Racksquared can help. We took what we learned from this company and created a solution that we sell to companies around the world. We host, manage and monitor iSeries solutions for our clients, helping to make sure they can meet their RTO should a disaster occur. Further, our solutions can help you break out of the capital expense model and smooth expenses into an operating expense model with consistent, predictable monthly outlays. Think about it: no more hardware upgrades and no more expensive maintenance contracts.

If this sounds interesting to you, please check out our iSeries Hosting solutions as well as our iSeries Management services. Or, if you would like to talk to an iSeries subject matter expert, give us a call at 614-437-4902.