There’s a point in many senior engineers’ careers where they have a great idea for a big architectural change or technical investment that would be critical to their organization’s success.
They write up an RFC, send it out for review, get some vaguely approving comments…and then nothing gets added to the roadmap. This lack of feedback and results is frustrating. The organization is wasting time avoiding an important problem and the senior engineer doesn’t get to show the multi-team, multi-year impact they need to get promoted to staff engineer!
This is all part of the learning curve for operating at the next level. The processes a senior engineer has learned for writing team-level RFCs won’t cut it when trying to get bigger changes implemented, and those bigger changes tend to run into some common challenges:
- Management stakeholders: Engineering managers and directors are critical to getting the change implemented. Persuading them to allocate staffing to an expensive technical investment is more difficult than convincing other engineers that a change is safe.
- Technical stakeholders: Lead engineers on other teams need to sponsor or sign off on the new architectural direction, but they know more about the affected systems than the senior engineer does and are nervous that big changes will disrupt their teams.
- System complexity: Managing all the moving pieces of a multi-team or multi-year change can lead to complicated designs that are hard to explain and review.
- Planning timeline: Planning and review for big changes routinely take months, and implementation can last years. The planning involves a lot of nebulous alignment work, and it can be hard to tell the difference between planning that’s on track and a totally stalled project.
I’ve worked on a few multi-year refactors and helped several engineers plan their own initiatives. I can’t help with the technical specifics for your organization’s problems (‘it depends’) but I want to share the planning template I’ve used for making sure these initiatives stay on track.
Milestone 0: The current system’s trade-offs are documented
The first issue that these large plans run into is that the leadership reviewers have been working far from the code and may have outdated, imprecise, or inconsistent mental models of the system (and they’re too busy to do the legwork to investigate themselves).
Without that shared mental model, reviewers often spend more time trying to understand how the system works than choosing the right path forward.
To accelerate those discussions, you’ll need to be able to quickly and concisely onboard collaborators to your model of the system. A few ways to get there include:
- Update system documentation with overviews that highlight the architectural patterns, trade-offs, and connections to other systems
- Backfill a history of architectural decisions for the system (especially implicit or unintentional ones)
- Run a book club or paper-reading group about industry-standard patterns and vocabulary for the problem space
Because this pre-work (‘milestone 0’) may turn up new information that changes your mind about how to improve the system, I recommend approaching it as volunteer work or a skill development project that doesn’t carry the weight of being committed to solving a particular problem.
This is also a good checkpoint to see if the problem is still worth fixing. There might be other problems that are more important to work on, but your reference documentation will be valuable for onboarding engineers, so your efforts haven’t been wasted.
Milestone 1: The problem statement is approved by management
Multi-team tech changes require collaboration with engineers to understand the impact on other systems, explore potential solutions, and ultimately work on the implementation.
Convincing engineers to take time away from their regular work to help you on the design and implementation is often a bigger challenge than the tech design itself.
To get that help, it’s best to target a specific person in leadership as your champion (such as a director or VP) and write a problem statement that describes why it’s worth spending valuable engineering hours on solving this problem.
Here’s a quick template for writing problem statements that I like to use:
Ideally, <sentence describing vision>
In reality, <sentence about current pains>
Consequences:
- <list of impacts to the business>
Proposal:
Have <working group> evaluate and design changes to address <pains above>, reviewed by <stakeholders recommended by champion>. We’ll ensure that the changes don’t harm <most important qualities to maintain>.
Other considerations:
<short summary of related/out-of-scope problems, prior work, or open questions>
I recommend keeping this problem statement to one or two pages. Any longer and it’s a sign you haven’t prioritized the most critical parts of the problem (making it hard to design a solution); any shorter and it’s a sign you haven’t done enough investigation or talked to enough stakeholders to understand the risks (making it hard to get approval later).
Documenting quality attributes (e.g. performance or uptime goals) and estimates for business impact (e.g. engineering hours or money saved) as explicit non-functional requirements can also unblock decisions later about what’s ‘good enough’ for technical trade-offs and how much staffing the implementation deserves.
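If you write several of these, it can help to keep the template's fields as structured data so that no section gets skipped. The following is only a minimal sketch of that idea, not part of the template itself: the `ProblemStatement` class, its field names, and the example values are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProblemStatement:
    """Hypothetical container for the template's fields (names are illustrative)."""
    vision: str
    current_pains: str
    consequences: List[str]
    working_group: str
    stakeholders: List[str]
    qualities_to_maintain: List[str]
    other_considerations: str = ""

    def render(self) -> str:
        """Fill in the problem statement template as plain text."""
        lines = [
            f"Ideally, {self.vision}",
            f"In reality, {self.current_pains}",
            "Consequences:",
        ]
        lines += [f"- {impact}" for impact in self.consequences]
        lines += [
            "Proposal:",
            f"Have {self.working_group} evaluate and design changes to address "
            f"the pains above, reviewed by {', '.join(self.stakeholders)}. "
            "We'll ensure that the changes don't harm "
            f"{', '.join(self.qualities_to_maintain)}.",
        ]
        # The last section is optional, so only emit it when it has content.
        if self.other_considerations:
            lines += ["Other considerations:", self.other_considerations]
        return "\n".join(lines)


# Invented example values, purely for illustration.
statement = ProblemStatement(
    vision="any team can ship a change to checkout without coordinating a release.",
    current_pains="every checkout change waits on a shared weekly release train.",
    consequences=[
        "Fixes for customer-facing bugs take up to a week to reach production",
        "Engineers from several teams spend time each week on release coordination",
    ],
    working_group="a working group of engineers from the affected teams",
    stakeholders=["the leads recommended by the champion"],
    qualities_to_maintain=["checkout uptime", "payment correctness"],
    other_considerations="Mobile release cadence is out of scope.",
)
text = statement.render()
print(text)
```

Making the fields required means an empty section fails loudly when the object is constructed, which is a cheap check that you have talked to enough people to fill in every part.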
Milestone 2: The proposal is ready for discussion
After a few weeks or months of exploration, you should be ready to write up a draft plan with your team’s RFC template (or this template from Squarespace if you don’t have one). However, communicating a big plan is often as challenging as drafting it. Include too many details and you’ll end up with an overwhelming document that is practically impossible to write and review.
To avoid overwhelming authors and reviewers, I recommend extracting a ‘high-level RFC’ that focuses on the overall decisions that need group approval:
- Key architectural trade-offs that change the qualities of the system
- Milestones for value delivery, decision points, and estimated costs
- Breakdown of the overall problem into scopes of team-level RFCs to be written later
The deeper analysis and exploration needed to create the plan can be moved to an appendix or supporting docs. Those details are very important to some of the stakeholders, but not the whole group.
Leaving room for future engineers to plan key components also enables them to improve on the implementation and gives them opportunities to show their own RFC-writing impact!
Of course, no design is complete until it has incorporated review feedback.
Milestone 3: The plan is approved in a group review meeting
When a plan has many technical and people dependencies, it must be reviewed and approved by all the groups responsible for implementing it.
The most common pitfall I see with these reviews is that senior engineers are used to team-level RFCs, which usually operate under a do-no-harm policy: ‘as long as you’ve sufficiently addressed the risks, you’re approved for implementation’. For larger changes, addressing the risks isn’t enough to secure buy-in, because the people reviewing the plan are also the people who will have to staff and implement it.
To ensure you have everyone’s support, it’s helpful to spend time on a consensus-building practice called nemawashi, a process of seeking approval from each significant person on a proposed project before committing to a group decision.
To do this, schedule 1:1 meetings with the leads of the affected systems and teams to talk through the proposal, get their impression of the quality of the plan, and hear what they think is important in the design (especially spicy opinions that they don’t feel comfortable sharing on a public doc). Reconcile the RFC with their feedback. Repeat.
Once you’ve secured all the decision-makers’ support in private, you can schedule a short approval meeting where you ask the attendees to show that support on record. A clear way to do this is by going around the room and asking everyone to give a thumbs-up if they think this is the best plan forward and that you should commit to it.
…and with all that taken care of, you have an approved plan. Congratulations, that multi-team, multi-year implementation can finally begin!