Let’s not kid ourselves: the public sector at large is a sitting duck for cyberattacks and, more often than not, outages aren’t a question of “if”, but “when”.

That might sound alarmist, but the stats tell their own story: 97% of unplanned outages last an average of seven hours, 94% of ransomware attacks now target backups, and only 31% of organisations are confident in their disaster recovery (DR) plans.

If you think your organisation is the exception, you’re probably deluding yourself.

RTO, RPO, and the Fantasy of “Zero Downtime”

Recovery Time Objective (RTO) and Recovery Point Objective (RPO) are the only metrics that matter when the proverbial hits the fan. RTO is how long you can afford to be offline; RPO is how much data you can afford to lose.

The lower your targets, the more you’ll pay – so set them based on a business impact analysis, not wishful thinking or vendor PowerPoint slides. And no, you can’t just slap “zero downtime” on a requirements doc and expect it to happen.

Bucket your workloads: T1 for the crown jewels, T2 for important-but-not-critical, T3 for the stuff nobody will miss. Over-engineering for everything is a fast track to budget hell.
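To make the bucketing concrete, here’s a minimal Python sketch. The RTO/RPO thresholds and workload names are invented for illustration – in practice they come out of your business impact analysis, not this snippet.

```python
# Hypothetical tiering sketch: thresholds are illustrative, not prescriptive.
# Real RTO/RPO targets come from a business impact analysis.

def classify_workload(rto_hours: float, rpo_hours: float) -> str:
    """Bucket a workload into T1/T2/T3 from its tolerable downtime and data loss."""
    if rto_hours <= 1 or rpo_hours <= 0.25:
        return "T1"  # crown jewels: near-zero tolerance
    if rto_hours <= 24 or rpo_hours <= 4:
        return "T2"  # important but not critical
    return "T3"      # nobody will miss it for a while

# Example workloads (names and numbers are made up):
workloads = {
    "payments-api":  (0.5, 0.1),   # (RTO hours, RPO hours)
    "reporting":     (12, 4),
    "intranet-wiki": (72, 24),
}

for name, (rto, rpo) in workloads.items():
    print(f"{name}: {classify_workload(rto, rpo)}")
```

The point of putting it in code: the tiering rule is explicit, reviewable, and cheap to re-run when targets change – unlike a spreadsheet nobody opens.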

The Shared Responsibility Model: Stop Blaming the Cloud

Here’s a reality check: your cloud provider is responsible for the infrastructure, but your data resilience is on you. If you botch your backups or misconfigure your recovery, don’t expect AWS or Azure to swoop in and save you.

Multi-AZ might sound fancy, but it won’t save you from yourself if you haven’t locked down your data, implemented immutable backups, or tested your recovery runbooks. Accountability isn’t optional – if you can’t prove your plan works, you don’t have a plan.
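“Immutable backups” isn’t hand-waving – on AWS it typically means S3 Object Lock. Here’s a rough sketch of what a default WORM retention policy looks like; the bucket name and 30-day retention are assumptions, and the actual API call is left commented because it needs real credentials and a bucket created with Object Lock enabled.

```python
# Sketch: default Object Lock retention for an S3 backup bucket.
# The 30-day figure is illustrative; pick retention from your own requirements.

def immutable_backup_policy(retention_days: int = 30) -> dict:
    """Build an S3 Object Lock configuration enforcing WORM retention."""
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {
            "DefaultRetention": {
                "Mode": "COMPLIANCE",  # nobody, not even root, can shorten it
                "Days": retention_days,
            }
        },
    }

# With real credentials, applying it looks roughly like this:
# import boto3
# s3 = boto3.client("s3")
# s3.put_object_lock_configuration(
#     Bucket="my-backup-bucket",  # hypothetical bucket name
#     ObjectLockConfiguration=immutable_backup_policy(30),
# )
```

COMPLIANCE mode is the one that matters for ransomware: unlike GOVERNANCE mode, there is no privileged override, which is precisely why attackers who can delete your backups can’t delete these.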

Recovery Strategies: Horses for Courses

There’s no one-size-fits-all. Backup & Restore is cheap but slow. Pilot Light keeps a minimal environment ready for critical systems. Warm Standby is a scaled-down live environment, faster but pricier. Active/Active is the gold standard – continuous replication, near-zero downtime, and a bill to match.

Don’t waste money on Active/Active for systems nobody cares about. Mix and match strategies by workload and component. For example, your database might need Warm Standby, but the front end can sit on Backup & Restore. It’s about business impact, not technical vanity.
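The mix-and-match logic above can be sketched as a simple lookup from a component’s RTO target to a strategy. The thresholds here are invented for the example; yours come from business impact, not vendor slides.

```python
# Illustrative mapping from a component's RTO target to a DR strategy.
# Threshold hours are assumptions made up for this sketch.

STRATEGIES = [
    (0.1, "Active/Active"),     # minutes of downtime tolerated
    (1.0, "Warm Standby"),
    (8.0, "Pilot Light"),
    (float("inf"), "Backup & Restore"),
]

def pick_strategy(rto_hours: float) -> str:
    """Cheapest strategy that still meets the RTO target."""
    for threshold, strategy in STRATEGIES:
        if rto_hours <= threshold:
            return strategy
    return "Backup & Restore"

# Mix and match per component, as the text suggests:
system = {"database": 0.5, "front-end": 24}
plan = {component: pick_strategy(rto) for component, rto in system.items()}
print(plan)  # database lands on Warm Standby, front-end on Backup & Restore
```

Note the design choice: the list is ordered cheapest-tolerable-first, so each component gets the least expensive strategy that still meets its target – the opposite of gold-plating everything.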

Testing: The Bit Everyone Ignores (Until It’s Too Late)

An untested backup is a fantasy, not a recovery plan. A sensible cadence:

  • Full DR tests annually.
  • Tabletop exercises quarterly.
  • Component restores monthly.
  • Backup integrity checks daily.

If you’re not testing, you’re just hoping – and hope is not a strategy. Document your actual RTO/RPO versus your targets, and treat every test as a chance to find what’s broken before reality does it for you.
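The daily backup check is the easiest to automate. Here’s a minimal sketch using checksums – the idea being that you record a digest at backup time and verify it later. File paths and the surrounding scheduling are left out; this only shows the verification itself.

```python
# Sketch of a daily check: does the backup still match its recorded checksum?
# Where the expected digest is stored (database, manifest file) is up to you.

import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream the file through SHA-256 so large backups don't exhaust memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_backup(path: Path, expected_digest: str) -> bool:
    """True only if the backup exists and is byte-for-byte intact."""
    return path.exists() and sha256_of(path) == expected_digest
```

A failing check should page someone, not write a log line nobody reads – a silently corrupt backup is exactly the scenario the stats at the top of this piece are describing.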

Pitfalls: The Greatest Hits of Failure

  • “Set and forget” DR plans that gather dust.
  • Incomplete documentation – if it’s not written down, it doesn’t exist.
  • Single points of failure (one backup location? Really?).
  • Ignoring dependencies – everything is connected, and you’ll find out the hard way.
  • Staff who don’t know the runbooks.
  • Penny-pinching on DR, only to pay in reputation and fines later.

The Five-Step Framework (Because You Need a Process)

  1. Assess: Identify what matters.
  2. Design: Set realistic RTO/RPO and pick the right strategies.
  3. Implement: Deploy, automate, and lock it down.
  4. Test: Relentlessly.
  5. Maintain: Update, train, and improve – because threats evolve and so should you.

Bottom line: Resilience isn’t optional. Set clear targets, test relentlessly, and stop pretending the cloud will save you from your own negligence. If you can’t prove your recovery works, it doesn’t.