Finding the 80/20: Lessons from delivering our first LLM feature

How to choose what actually matters when shipping your first LLM feature, balancing evaluation, trust, and delivery without overengineering.

Speakers: Thordis Thorsteins

June 02, 2026

Shipping your first LLM feature requires more than code. This talk explains how we navigated new strategic demands, from opt-in stances to evaluation workflows, to find a pragmatic path to production.

Even for mature engineering organisations, the first LLM-powered feature introduces a paradigm shift. The actual coding often feels like the easy part. The real challenge lies in a daunting list of new “supporting” activities: defining evaluation frameworks, versioning prompts, deciding on a customer opt-in or opt-out stance, and planning communication strategies. With all of these competing for attention, how do you decide where to focus your limited energy?

In this session, I will share the “battle scars” from our journey to production. I’ll break down our approach to the 80/20 rule: identifying the activities that required deep investment, those we found a “light” way to handle, and those we intentionally parked for later.

We will explore:

The Effort Flip: How we restructured our workflow to move from “getting it to work” to “ensuring it’s good enough,” reallocating engineering time from writing code to context engineering and evaluation.
Strategic Stances: How we tackled non-code needs like opt-in/opt-out policies, customer FAQs, and internal upskilling.
Triage in Action: What we over-invested in, what I wish we’d done sooner (like early internal documentation on feature capabilities) and what we successfully deferred (like automation of evals).

This isn’t a prescriptive guide, but a look behind the curtain at how we defined our own “production-ready” standard. You will leave with a framework for auditing your own AI roadmap to ensure you’re focused on the activities that truly drive reliability and trust.

Key takeaways

Evaluation First: Robust evaluation matters more for reliable AI systems than perfecting the initial code.
Prioritising based on a North Star: We treated security and trust as non-negotiables, and they determined what we prioritised, including OWASP Top 10 reviews and data privacy decisions.
Design the First Feature to Scale: The initial rollout can establish reusable patterns that accelerate every AI feature that follows.
Manual Before Automated: Starting with human-in-the-loop evaluation saved time and helped us understand failure modes before investing in automated testing.
