How Monzo can tolerate a complete outage of their platform while still giving their customers access to their money.
Monzo’s customers reasonably expect to be able to spend on their card, make bank transfers, and see that their money is safe 24/7. Their lives don’t have downtime for maintenance, so neither should we. Monzo take reliability seriously, so to add another layer of defence against potential incidents, we’ve built a completely separate backup bank called Monzo Stand-in.
Monzo Stand-in is designed as an always-on disaster recovery system: it is kept up to date in near real time, constantly tested in production, and can be put to use in a matter of seconds should we have a critical incident. It runs completely independently of Monzo’s main platform in AWS, with entirely new systems and infrastructure running in Google Cloud.
Here we talk through the design of our system, the kinds of decisions and trade-offs we made to achieve it in just 10 months, and how we worked with others across Monzo to deliver it.
Key takeaways
- Principles for running a large programme of work and designing systems that touch every part of an organisation.
- Insight into a proven, pragmatic system design for a Disaster Recovery system that has continuous testing and immediate enabling built in, including highly technical detail in particular areas.
- Interesting anecdotes about some of the hurdles we faced, and the trade-offs and decisions we made along the way, that helped us to deliver in just 10 months.