What can we learn from the big tech companies?
The ‘build vs buy’ decision is a rich topic. At its heart, it’s about how to use developer attention: do I spend my team’s time and attention to build a thing, or do I spend money for a vendor’s time and attention? At many companies, developer attention is the scarcest resource and bottleneck to delivering product features. Any time you make more productive use of developer attention, you’ve effectively multiplied the velocity of your development team.
In this article, I’ll present two guiding principles for the ‘build vs buy’ calculus, and explore case studies of outstanding companies and cautionary tales.
Advice from Tim Cook and Jeff Bezos
Let’s see what wisdom we can glean from tech executives who’ve built successful businesses with their dev teams.
Tim Cook’s doctrine for Apple shows when to build:
‘We believe that we need to own and control the primary technologies behind the products we make, and participate only in markets where we can make a significant contribution.
We believe in saying no to thousands of projects so that we can really focus on the few that are truly important and meaningful to us.’
Is it a ‘primary technology’ (i.e. competitive advantage) for your company’s products? And have you been extra rigorous and discerning that this is a valuable, market-changing idea? If so, build it.
As for when to buy, Jeff Bezos captures it in his 2006 mission statement for AWS, ‘We Build Muck, So You Don’t Have To’:
‘Aspiring web entrepreneurs face another battle, namely, doing the undifferentiated yet all-important “heavy lifting” needed to create and operate a web application. …
We are able to share our world-class muck with other developers, … giving them the ability to compete based on the quality of their ideas rather than on their ability to create their own muck.’
Is it undifferentiated heavy lifting, i.e. something you’re not doing in a unique or especially valuable way? If so, buy it.
Know your competitive advantages
Let’s clarify what Tim Cook calls ‘primary technologies’. These are not every critical component of your products, but rather the key differentiators or competitive advantages.
For example, every web app relies on web server software to parse and route requests. But web servers are ‘undifferentiated heavy lifting’ that’s best left to Apache, Nginx, or managed API gateways. The key differentiators for web apps are the interactions and insights they enable.
Every ecommerce store needs to process payments. But processing payments is also ‘undifferentiated heavy lifting’ best left to Stripe, PayPal, and similar. A key differentiator for ecommerce stores is leveraging the brand relationship and trust with the customer.
Even Apple outsources critical components; Apple designs their famous chips, while a vendor, TSMC, fabricates them.
Navigating early adoption
A particularly innovative company may find itself so early in the technology adoption lifecycle that there’s not yet a mature market solution for the ‘undifferentiated heavy lifting’ it needs. The company must either build it or wait until a vendor emerges.
Etsy, a decade ago, was such an example. Erik Kastner created Etsy’s Deployinator, one of the first one-click deploy tools. Today, mature deploy tooling abounds, with virtually every cloud host, code repository, and CI tool offering easy deployment integrations. But ten years ago, the only option was to build it yourself.
It was similar for Etsy’s metrics pipeline. Inspired by Flickr, Etsy invented StatsD before there was an easy way to collect application metrics. But today, robust metric pipelines and easy integrations abound from Datadog, Honeycomb, AWS CloudWatch, Prometheus, etc.
A critical skill for tech leaders is to recognize when they need to ‘kill their darlings’: replace their once innovative in-house components when they’re surpassed by new vendors and services.
Reluctance to kill your darlings results in Not Invented Here syndrome and an ossified tech stack.
Netflix: effective early adoption
Netflix famously moved much of their server infrastructure to AWS in 2010. Instead of building more data centers, Netflix decided:
‘We want our engineers to focus as much of their time as possible on product innovation for the Netflix customer experience; that is what differentiates us from our competitors.’
So while they outsourced much of their data center operations, they had to invent new ways of building, testing, and deploying software in the public cloud, because the market didn’t address those needs at the time.
To that end, Netflix created Chaos Monkey, which kickstarted the field of chaos engineering and ensured Netflix services were resilient to novel infrastructure failure modes.
To productively orchestrate their complex autoscaling infrastructure, they created the Asgard deployment tool. Later, they killed their darling as Spinnaker matured.
Today, vendors like Gremlin enable chaos engineering and resiliency testing with less effort. And developers have many flexible, ready-made tools, such as Terraform, CloudFormation/CDK, and Kubernetes, for deploying complex cloud services.
Netflix stands out as an exemplar for embracing innovative vendors, building only what they need to, and evolving as better tools emerge.
Compaq: a cautionary tale of outsourcing too much
In contrast, Ben Thompson explains how computer maker Compaq destroyed itself by outsourcing too much for the sake of higher profit margins. In 1994, Compaq was the largest PC manufacturer in the world, designing and manufacturing low-cost and low-margin PCs. Much of their success came from cleverly reverse-engineering IBM’s BIOS software. They realized they could achieve higher profit per PC by outsourcing their manufacturing. By similar reasoning, they outsourced the design of their PCs, and much of the support and software integration customers wanted.
Soon, customers struggled to see what value proposition Compaq offered. The company shrank, merged with HP, and eventually became a write-off. By failing to control any technology that could be their key differentiator, Compaq let its value slip away. Such a conspicuous decline of a PC competitor likely inspired Apple’s focus on control and tightly integrated product ecosystems.
Flow chart
Shout out to my visual thinkers. Here’s a graphic to drive the guiding principles home.
Keep in mind Cook’s heuristic to say ‘no’ to building things far more often than you say ‘yes’.
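For readers who prefer code to pictures, the guiding principles can also be sketched as a small function. This is my own minimal reading of the principles stated in this article, not a transcription of the graphic. The `Component` class, its field names, and the example inputs are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """A candidate thing to build or buy. All fields are illustrative assumptions."""
    name: str
    is_key_differentiator: bool      # a 'primary technology' for your products?
    significant_contribution: bool   # can you meaningfully improve on the state of the art?
    mature_vendor_exists: bool       # does the market already solve this well?


def decide(c: Component) -> str:
    # Build only when it is a differentiator AND you can make a significant contribution.
    if c.is_key_differentiator and c.significant_contribution:
        return "build"
    # Otherwise it is undifferentiated heavy lifting: buy it if someone sells it.
    if c.mature_vendor_exists:
        return "buy"
    # Early adopters may find no vendor yet: build it yourself or wait for one to emerge.
    return "build or wait"


# Hypothetical inputs, based on the examples discussed earlier in the article.
payments = Component("payment processing", False, False, True)
early_deploy_tool = Component("one-click deploys, a decade ago", False, False, False)

print(decide(payments))
print(decide(early_deploy_tool))
```

Note how narrow the "build" branch is: both conditions must hold, which matches the advice to say ‘no’ far more often than ‘yes’.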
The wrong way to decide: Total Cost of Ownership (TCO)
Often people approach the ‘build vs buy’ decision from a cost perspective: tally up how much it costs to build it in-house, get some quotes from vendors, and pick the cheapest one that meets your criteria.
While using a ‘cost of ownership’ calculus is logically coherent, I find it quite difficult in practice. How do you value the quality of a component several years in the future? A vendor is likely to add and improve features over time. Would your organization do the same for an in-house service, or is it wishful thinking?
It may be tempting to add complexity to your TCO model to more fully capture other costs, like ongoing development and maintenance, depreciation, opportunity cost, and so on. But calculating TCO for a large software decision involves enough hand-wavy estimates to fit whatever conclusion stakeholders may wish. I’ve found that making a detailed TCO model takes much more effort, and doesn’t clarify the decision as well as the Tim Cook and Jeff Bezos doctrines above.
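To make the hand-waviness concrete, here is a toy TCO comparison. Every number in it is invented for illustration, and the cost formulas are deliberately simplistic assumptions of mine, not a recommended model. The point is only that two defensible-sounding sets of build estimates land on opposite sides of the same vendor quote.

```python
def build_tco(dev_months: int, monthly_dev_cost: int,
              annual_maintenance_pct: int, years: int) -> int:
    """Toy in-house cost: initial build plus yearly maintenance as a % of the build."""
    initial = dev_months * monthly_dev_cost
    maintenance = initial * annual_maintenance_pct * years // 100
    return initial + maintenance


def buy_tco(annual_fee: int, integration_cost: int, years: int) -> int:
    """Toy vendor cost: one-time integration plus a flat annual fee."""
    return integration_cost + annual_fee * years


YEARS = 5  # hypothetical planning horizon

# Hypothetical vendor quote.
vendor = buy_tco(annual_fee=60_000, integration_cost=20_000, years=YEARS)

# Two hypothetical estimates for the same in-house project.
optimistic = build_tco(dev_months=6, monthly_dev_cost=15_000,
                       annual_maintenance_pct=15, years=YEARS)
pessimistic = build_tco(dev_months=12, monthly_dev_cost=15_000,
                        annual_maintenance_pct=30, years=YEARS)

for label, cost in [("vendor", vendor),
                    ("build (optimistic)", optimistic),
                    ("build (pessimistic)", pessimistic)]:
    print(f"{label:>20}: {cost:,}")
```

Doubling the schedule and the maintenance rate, both common in practice, flips the answer from ‘build’ to ‘buy’, and neither estimate is obviously wrong up front.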
Conclusion
I encourage you to consider ‘build vs buy’ primarily through the lens of whether the opportunity merits a long-term strategic investment of your team’s attention, and less through the lens of short-term financial cost. Build if there’s an opportunity to make a significant improvement on the state of the art and create a competitive advantage for your organization. Buy it otherwise. And be ready to discard your competitive advantages of yesteryear as better alternatives emerge.