Multi-Region Disaster Recovery with AWS Backup

I spent some time exploring how AWS Backup could serve as the foundation for a multi-region disaster recovery strategy. The scenario: critical production data needs to survive even if an entire AWS region goes down or an account gets compromised.

This is what the design looks like when you combine cross-region copies, cross-account isolation, and logically air-gapped vaults.

 Primary Account
 ┌─────────────────────────────────────────────────────┐
 │                                                     │
 │  Region A (primary)           Region B (secondary)  │
 │  ┌───────────────┐           ┌───────────────┐      │
 │  │  AWS Backup   │ cross-    │    Backup     │      │
 │  │    Vault      │ region    │    Vault      │      │
 │  │  (per resource│─────────> │  (per resource│      │
 │  │   type)       │           │   type)       │      │
 │  └───────────────┘           └───────────────┘      │
 │         │                                           │
 └─────────┼───────────────────────────────────────────┘
           │ cross-account copy
           v
 Isolated Backup Account
 ┌─────────────────────────────────┐
 │  Region A                       │
 │  ┌───────────────┐              │
 │  │  LAG Vault    │              │
 │  │  (air-gapped) │  shared      │
 │  │               │  via RAM     │
 │  │               │<─ ─ ─ ─ ─ ─ ─  source account
 │  │               │  restore        can restore
 │  └───────────────┘              │
 └─────────────────────────────────┘

The idea

Every supported resource type gets its own vault in the primary account: one vault per type rather than a single shared vault. This keeps each vault's access policy and KMS key isolated from the others.
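To make the one-vault-per-type layout concrete, here is a minimal sketch that builds the keyword arguments for boto3's `create_backup_vault` call, one request per resource type. The resource type list, the `primary-<type>` naming scheme, and the key ARNs are assumptions for illustration, not part of the original design.

```python
# Hypothetical resource types and naming scheme, for illustration only.
RESOURCE_TYPES = ["s3", "dynamodb", "aurora", "efs"]


def vault_request(resource_type: str, kms_key_arn: str) -> dict:
    """Build keyword arguments for backup.create_backup_vault()."""
    return {
        "BackupVaultName": f"primary-{resource_type}",
        # Each vault gets its own key, so a key policy mistake stays contained.
        "EncryptionKeyArn": kms_key_arn,
        "BackupVaultTags": {"resource-type": resource_type},
    }


def all_vault_requests(key_arns: dict) -> list:
    """One request per resource type; key_arns maps resource type -> KMS key ARN."""
    return [vault_request(rt, key_arns[rt]) for rt in RESOURCE_TYPES]
```

Each dict could then be passed as `client.create_backup_vault(**request)`, or translated into whatever infrastructure-as-code tool manages the account.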

Two copy targets protect each resource:

The cross-region copy sends recovery points to a secondary region within the same account. If the primary region goes down, the secondary has recent backups ready to restore.

The cross-account copy sends recovery points to a completely separate AWS account. The destination isn't a regular vault. It's a logically air-gapped vault (LAG vault). If the primary account is breached, the backups in the isolated account remain untouched.

What makes a LAG vault different

A LAG vault is designed specifically for isolation. AWS manages the encryption keys (you can't bring your own), and the vault enforces a minimum retention period that nobody can override or shorten. Not even an account admin.

Even if someone gains full access to the account, they can't delete or tamper with the backups before the retention expires. It's the closest thing to a physical air gap without actually disconnecting anything.
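As a rough sketch of how the retention window is declared, the snippet below builds the arguments for boto3's `create_logically_air_gapped_backup_vault` and checks whether a backup rule's retention fits inside the vault's window. The vault name and day counts are placeholders, and the local check only mirrors the idea that a recovery point's retention has to fall between the vault's minimum and maximum.

```python
import uuid


def lag_vault_request(name: str, min_days: int, max_days: int) -> dict:
    """Build keyword arguments for backup.create_logically_air_gapped_backup_vault()."""
    if not 1 <= min_days <= max_days:
        raise ValueError("expected 1 <= min_days <= max_days")
    return {
        "BackupVaultName": name,
        "MinRetentionDays": min_days,
        "MaxRetentionDays": max_days,
        # Idempotency token so a retried call does not create a second vault.
        "CreatorRequestId": str(uuid.uuid4()),
    }


def retention_allowed(delete_after_days: int, min_days: int, max_days: int) -> bool:
    """True if a copy's retention falls inside the vault's retention window."""
    return min_days <= delete_after_days <= max_days
```

Running `retention_allowed` against every copy action before deploying a plan is a cheap way to catch a lifecycle that the vault would reject.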

The LAG vault lives in the isolated account, but the source account still needs to restore from it during a disaster. That's where Resource Access Manager (RAM) comes in. The isolated account shares the LAG vault via RAM with the source account. Once the share is accepted, the source account can see and restore from the recovery points. Without the share, the source account would have no way to see or restore those recovery points.
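The share itself is a single RAM call made from the isolated account. The sketch below builds the arguments for boto3's `ram.create_resource_share`; the share name, account IDs, region, and vault name are hypothetical.

```python
def vault_arn(region: str, account_id: str, vault_name: str) -> str:
    """ARN of a backup vault, in the format AWS Backup uses."""
    return f"arn:aws:backup:{region}:{account_id}:backup-vault:{vault_name}"


def ram_share_request(lag_vault_arn: str, source_account_id: str,
                      same_organization: bool = True) -> dict:
    """Build keyword arguments for ram.create_resource_share()."""
    return {
        "name": "lag-vault-restore-access",  # hypothetical share name
        "resourceArns": [lag_vault_arn],
        "principals": [source_account_id],
        # Only needed when the source account is outside the AWS Organization.
        "allowExternalPrincipals": not same_organization,
    }


share = ram_share_request(
    vault_arn("eu-west-1", "222222222222", "isolated-lag"),
    source_account_id="111111111111",
)
```

The source account then accepts the invitation (unless sharing within the organization is enabled, in which case acceptance is automatic) before the recovery points become visible to it.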

The two-plan tradeoff

Ideally, each resource would have a single backup plan with two copy actions: one for cross-region, one for cross-account. That would be cleaner: no duplicate recovery points and lower storage cost.

But AWS Backup applies all copy actions to every resource that enters the plan. There's no way to conditionally skip a copy action based on a resource tag. Tag-based filtering only works at the selection level (which resources enter the plan), not at the copy action level.

This is a problem because some resources don't need the cross-region copy. DynamoDB Global Tables and Aurora Global Database already replicate natively. Cross-region backups for those are redundant. But they still need the cross-account copy to the LAG vault for ransomware protection.

With a single plan, it's all or nothing. So the design splits into two plans: one for cross-region, one for cross-account. Each plan has its own selection with different tag-based exclusions. The downside is duplicate in-region recovery points, because each plan takes its own backup into the primary vault before copying it. But it's the only way to get per-resource granularity.

This is a known AWS Backup limitation. Conditional copy actions based on resource tags aren't supported today.
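Here is a sketch of the split, written as the `BackupPlan` documents that boto3's `create_backup_plan` accepts. The plan names, vault names, ARNs, schedule, and 35-day retention are placeholders. The point is the shape: each plan carries exactly one copy action, and both write their in-region backup to the same primary vault, which is where the duplicate recovery points come from.

```python
def single_copy_plan(name: str, source_vault: str, destination_vault_arn: str,
                     retention_days: int = 35) -> dict:
    """Build a BackupPlan document with one rule and exactly one copy action."""
    return {
        "BackupPlanName": name,
        "Rules": [{
            "RuleName": f"{name}-daily",
            "TargetBackupVaultName": source_vault,
            "ScheduleExpression": "cron(0 3 * * ? *)",  # daily at 03:00 UTC
            "Lifecycle": {"DeleteAfterDays": retention_days},
            "CopyActions": [{
                "DestinationBackupVaultArn": destination_vault_arn,
                "Lifecycle": {"DeleteAfterDays": retention_days},
            }],
        }],
    }


# Hypothetical accounts: 111111111111 is primary, 222222222222 is isolated.
cross_region_plan = single_copy_plan(
    "s3-cross-region", "primary-s3",
    "arn:aws:backup:eu-central-1:111111111111:backup-vault:secondary-s3",
)
cross_account_plan = single_copy_plan(
    "s3-cross-account", "primary-s3",
    "arn:aws:backup:eu-west-1:222222222222:backup-vault:isolated-lag",
)
```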

Opt-out with tags

The backup plans use tag-based selection with an opt-out model. All resources are included by default.

backup/exclude-all: "true" removes a resource from all backup plans entirely.

backup/exclude-cross-region: "true" skips only the cross-region copy. In-region and cross-account backups still run. Useful for resources with native replication.
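The two tags map onto the two plans roughly as sketched below. `selection` builds a `BackupSelection` document for boto3's `create_backup_selection` using `StringNotEquals` conditions, and `plans_for` is a local model of the intended outcome, not an AWS call. The plan labels and role ARN are hypothetical, and the model assumes a resource without the tag passes the `StringNotEquals` condition, which is worth verifying against a real selection before relying on it.

```python
EXCLUDE_ALL = "backup/exclude-all"
EXCLUDE_CROSS_REGION = "backup/exclude-cross-region"

# Which opt-out tags remove a resource from which plan.
PLAN_EXCLUSIONS = {
    "cross-region": [EXCLUDE_ALL, EXCLUDE_CROSS_REGION],
    "cross-account": [EXCLUDE_ALL],
}


def selection(plan: str, role_arn: str) -> dict:
    """Build a BackupSelection document: everything, minus opted-out resources."""
    return {
        "SelectionName": f"{plan}-opt-out",
        "IamRoleArn": role_arn,
        "Resources": ["*"],
        "Conditions": {
            "StringNotEquals": [
                {"ConditionKey": f"aws:ResourceTag/{key}", "ConditionValue": "true"}
                for key in PLAN_EXCLUSIONS[plan]
            ]
        },
    }


def plans_for(tags: dict) -> list:
    """Local model: the plans a resource with these tags would enter."""
    return [
        plan for plan, keys in PLAN_EXCLUSIONS.items()
        if all(tags.get(key) != "true" for key in keys)
    ]
```

Under that assumption, an untagged resource enters both plans, a resource tagged `backup/exclude-cross-region: "true"` enters only the cross-account plan, and `backup/exclude-all: "true"` keeps it out of both.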

Services without native support

OpenSearch and MemoryDB aren't supported by AWS Backup natively. The workaround: CronJobs that generate snapshots and store them in S3 buckets. Once the snapshots land in S3, they're automatically picked up by the standard S3 backup plans and get the same cross-region and cross-account protection as everything else.
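The scheduled job itself can stay small. As a partial sketch, these helpers produce a timestamped snapshot name and the OpenSearch snapshot API path the job would `PUT` to. The naming scheme and repository name are assumptions, and the S3 snapshot repository is presumed to be registered already.

```python
from datetime import datetime, timezone


def snapshot_name(service: str, when: datetime) -> str:
    """Lowercase, sortable snapshot name such as 'opensearch-2024-01-02-0304'."""
    return f"{service}-{when:%Y-%m-%d-%H%M}"


def opensearch_snapshot_path(repository: str, name: str) -> str:
    """Path for the snapshot API call: PUT /_snapshot/<repository>/<snapshot>."""
    return f"/_snapshot/{repository}/{name}"


now = datetime.now(timezone.utc)
path = opensearch_snapshot_path("s3-backups", snapshot_name("opensearch", now))
```

The job sends a signed `PUT` to that path on the domain endpoint. The MemoryDB side would follow the same naming and export its snapshot to the bucket.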

Takeaways

LAG vaults are worth the tradeoff. You lose KMS key control, but you gain tamper-proof retention.

One vault per resource type keeps things clean. Shared vaults get messy at scale.

Opt-out is safer than opt-in. People forget to opt in, and a forgotten opt-in means no backup at all. A forgotten opt-out only means one backup more than you needed.

Continuous backups for S3 save real money on cross-account copies of large datasets. PITR is more useful than daily snapshots.

Test restores regularly. A backup that can't be restored is not a backup.