Planning next moves: Improving performance when half your stack is someone else's problem

Learn how to measure latency, set realistic goals, and improve performance even when critical parts of your system are out of your control.

Speakers: Maude Lemaire

June 03, 2026

Most of my career has been spent making one big backend stay up and go fast. My new job description is harder: keep a product fast when half the latency budget lives inside somebody else’s GPU cluster, every model provider degrades differently, and the honest answer to “what is our time to first token?” is “which of the five numbers do you want?”

So, what does building a reliable and performant product on top of notoriously unreliable and under-performant AI models look like at Cursor? We’ll walk through the time-to-first-token pipeline from a user’s keystroke through client, network, agent server, inference proxy, and model provider, including the (sometimes comical) ways each layer lies to you about where time went. We’ll learn why you can’t skimp on the basics (good observability), how to set aggressive-but-achievable goals, and how sometimes just a handful of relatively simple changes can make all the difference.

If you work on a product that sits downstream of dependencies you don’t own, providers you can’t fire mid-request, and users who think your product is slow when really it’s the weather, this one’s for you.
