
Your team’s AI prompts are code. Treat them like it

AI prompts are now part of the technical surface area of engineering managers.
March 24, 2026

Estimated reading time: 5 minutes

Key takeaways:

  • Your AI isn’t broken, but your process probably is. Treat prompts like shared engineering assets, not personal hacks.
  • Structure beats guesswork: separate context, instructions, constraints, and format.
  • Manage AI prompts like code. Version them, review changes, and track metrics.

It’s time to treat AI prompts as code. Most engineering leaders have seen this scenario play out: an AI-generated Request for Comments (RFC) lands in the review queue looking polished on the surface, but falls apart under scrutiny.

Risk assessments are missing. Key decisions are buried three paragraphs deep. Entire sections read like marketing copy rather than technical documentation.

Most teams blame the model. The model isn’t the problem. The way engineers interact with it is. Without shared practices, every engineer develops their own prompting approach in isolation. Output quality varies wildly. Reviewers can’t predict what they’ll receive, and review cycles expand as engineers spend more time fixing AI drafts than they would have spent writing from scratch.

Most teams are stuck somewhere between casual AI usage and systematic prompt engineering. Getting them unstuck is increasingly an engineering leadership problem.

From Stack Overflow patterns to production systems

Engineering teams initially approached Large Language Models (LLMs) the way they once used Stack Overflow: quick searches, copy-paste solutions, and minimal adaptation. This pattern works for isolated problems, but fails at scale.

The symptoms are predictable. Without a shared approach, prompts become personal habits rather than team resources. One engineer’s RFC prompt produces verbose documents missing critical sections. Another generates terse outlines lacking necessary detail. Knowledge doesn’t compound. Lessons learned by one team member stay locked in their individual workflow.

The teams breaking this cycle treat AI prompts as first-class technical artifacts. They define clear interfaces, maintain version control, implement quality checks, and make outcomes measurable. This shift from ad-hoc usage to systematic engineering is what separates teams that successfully scale AI assistance from those that abandon it in frustration.

Structuring prompts as engineering artifacts

In 2024, Schulhoff et al. published The Prompt Report, a systematic survey that cataloged 58 distinct prompting techniques for LLMs. The paper’s core conclusion: prompt engineering has matured from ad-hoc experimentation into a structured discipline with identifiable, repeatable patterns.

For engineering teams, the practical takeaway is simple. Model providers like Anthropic and OpenAI agree on the distinct components of a good prompt: separate your context (audience, domain, systems), instructions (the specific task), constraints (length, scope, style), and output format (structure for validation). Cramming all of this into a single, unstructured paragraph is the most common failure mode – and the easiest to fix.

A prompt like “write an RFC about adding Redis caching to our Application Programming Interface (API) gateway to fix the latency issues we’re seeing, and make it professional” leaves too many decisions to the model. The model doesn’t know who will read this, what level of detail matters, or what format the team expects. It will fill in those gaps with defaults, and those defaults rarely match what your team actually needs.

Compare this with a structured version: You are a senior staff engineer writing an RFC for engineering leadership and development operations.

  • System: API Gateway (Node.js/Express).
  • Problem: P99 latency spikes to 2s under load; database is the bottleneck.

Write an RFC that includes:

  • 2–3 implementation options with trade-offs.
  • Risk assessment for each option.
  • Recommended approach with justification.
  • Length: 800–1200 words.
  • Tone: technical, direct, no marketing language.
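The four components can be kept separate in code as well as in prose. Below is a minimal sketch of a reusable template; the class and field names (`PromptTemplate`, `render`, and the example values) are illustrative, not part of any particular tool:

```python
from dataclasses import dataclass

@dataclass
class PromptTemplate:
    """A prompt split into four components: context, instructions, constraints, format."""
    context: str        # audience, domain, systems
    instructions: str   # the specific task
    constraints: str    # length, scope, style
    output_format: str  # structure a reviewer can validate

    def render(self) -> str:
        # Assemble sections in a fixed order so every engineer's
        # prompt has the same shape.
        return "\n\n".join([
            f"Context:\n{self.context}",
            f"Instructions:\n{self.instructions}",
            f"Constraints:\n{self.constraints}",
            f"Output format:\n{self.output_format}",
        ])

rfc_prompt = PromptTemplate(
    context=("You are a senior staff engineer writing an RFC for engineering "
             "leadership. System: API Gateway (Node.js/Express). "
             "Problem: P99 latency spikes to 2s under load."),
    instructions="Write an RFC proposing Redis caching, with 2-3 implementation options.",
    constraints="800-1200 words. Technical, direct tone; no marketing language.",
    output_format="Sections: Problem, Options, Risk Assessment, Recommendation.",
)
print(rfc_prompt.render())
```

Because the components are explicit fields, a reviewer can diff a change to `constraints` without re-reading the whole prompt.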

The structured version specifies role, audience, constraints, and expected output format. Every engineer on the team generates comparable quality on the first attempt.

Two additional techniques are worth building into your team’s toolkit. Few-shot prompting provides examples of desired outputs rather than descriptions of them.

Instead of explaining what a good problem statement looks like, show one:

  • Example: RFC opening.
  • Problem: Current synchronous processing causes 3-second P95 latency during peak hours, resulting in a 12% timeout rate for mobile clients.

Now write a problem statement for: [your actual problem]
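A few-shot prompt is mechanically just examples prepended to the task. A small sketch, with the function name and example list as assumptions of this illustration:

```python
# One strong example often beats a paragraph describing a good output.
EXAMPLES = [
    "Current synchronous processing causes 3-second P95 latency during peak "
    "hours, resulting in a 12% timeout rate for mobile clients.",
]

def few_shot_prompt(task: str, examples: list[str]) -> str:
    # Show the model what "good" looks like, then hand it the real task.
    shots = "\n\n".join(f"Example problem statement:\n{e}" for e in examples)
    return f"{shots}\n\nNow write a problem statement for:\n{task}"

print(few_shot_prompt("nightly batch jobs overrunning into business hours", EXAMPLES))
```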

Chain-of-thought prompting guides the model through explicit reasoning steps, which is particularly valuable for complex technical decisions like architecture trade-offs or incident postmortems:

Analyze this architecture decision step-by-step:

  1. List current system constraints.
  2. Identify must-have requirements.
  3. Evaluate each option against requirements.
  4. Recommend based on trade-offs.
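The same steps can be templated so the reasoning scaffold stays consistent across the team. A minimal sketch (function and variable names are hypothetical):

```python
# Reasoning steps the model is asked to walk through, in order.
COT_STEPS = [
    "List current system constraints.",
    "Identify must-have requirements.",
    "Evaluate each option against the requirements.",
    "Recommend based on trade-offs.",
]

def chain_of_thought_prompt(decision: str, steps: list[str]) -> str:
    # Number the steps so the model's answer can be checked section by section.
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(steps, 1))
    return (f"Analyze this architecture decision step-by-step:\n{numbered}\n\n"
            f"Decision under review:\n{decision}")
```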

Structure gets you consistency. Layer in few-shot and chain-of-thought, and you go from a vague request to a repeatable process.

Version control and measurement

Once teams have structured prompts, they face the drift problem. Well-intentioned “improvements” gradually degrade output quality. An engineer tweaks a postmortem prompt for “thoroughness,” and suddenly it produces 3,000-word documents rather than actionable summaries. Without version history, there’s no way to pinpoint what changed or roll it back.

The fix is the same one we already use for any shared technical resource. Store prompts in a dedicated directory or use tools like PromptLayer that bring Git-style versioning to prompt management.

Review changes through pull requests. Treat prompt modifications with the same scrutiny you’d give a configuration change to a production system.
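In its simplest form, this is a versioned directory in the repo, with changes landing through pull requests so `git log` becomes the audit trail. A sketch of a loader, assuming a hypothetical `prompts/<name>/v<N>.md` layout:

```python
from pathlib import Path

def load_prompt(name: str, root: str = "prompts") -> str:
    """Load the latest version of a prompt from prompts/<name>/v<N>.md."""
    # Sort numerically so v10 lands after v2.
    versions = sorted(Path(root, name).glob("v*.md"),
                      key=lambda p: int(p.stem[1:]))
    if not versions:
        raise FileNotFoundError(f"no versions found for prompt {name!r}")
    return versions[-1].read_text()
```

Pinning callers to `load_prompt("rfc")` means a bad revision can be rolled back by reverting one commit, exactly as with any other shared configuration.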

The tooling is finally catching up. We now have validation frameworks like Guardrails AI and evaluation platforms like TruLens, signaling that prompt engineering is moving from experimental to standard practice.

Measurement provides the feedback loop. Three metrics are worth tracking:

  1. Acceptance rate: the percentage of outputs used without significant revision – below 70% suggests prompt problems.
  2. Edit distance: how much content changes before use.
  3. Compliance rate: whether outputs meet security or formatting requirements. 
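The first two metrics are cheap to compute if you keep the AI draft alongside the version that shipped. A sketch using the standard library's `difflib`; the 20% revision threshold is an assumption for illustration, not a standard:

```python
import difflib

def edit_distance_ratio(draft: str, final: str) -> float:
    """Fraction of the AI draft that changed before use (0.0 = used verbatim)."""
    return 1.0 - difflib.SequenceMatcher(None, draft, final).ratio()

def acceptance_rate(ratios: list[float], threshold: float = 0.2) -> float:
    """Share of outputs used without significant revision."""
    accepted = sum(1 for r in ratios if r <= threshold)
    return accepted / len(ratios)

# Hypothetical month of RFC drafts: 3 of 4 under the 20% threshold.
ratios = [0.05, 0.10, 0.50, 0.15]
rate = acceptance_rate(ratios)  # -> 0.75
```

A 0.75 acceptance rate is above the 70% line the article suggests as a floor; a declining trend, not any single number, is the signal to review recent prompt changes.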

When acceptance rates decline, investigate recent prompt changes. Rising edit distance indicates drift from team standards. With these metrics, AI assistance stops being a black box and starts being something you can actually manage.

The cost of skipping this discipline can be severe. In October 2025, Deloitte Australia agreed to partially refund the Australian government after an AU$440,000 report was found to contain fabricated court quotes and references to nonexistent academic papers.

The report had been drafted using GPT-4o, and the errors were caught not by Deloitte’s own review process but by an external researcher, Dr. Christopher Rudge at the University of Sydney. The issue wasn’t the model – it was the absence of any structured validation process around AI-generated output.


Where to start when AI prompts are code

Pick your highest-friction workflow, the one where AI-generated drafts are already creating review bottlenecks or inconsistent outputs. For most teams, that’s RFCs or incident postmortems.

Build structured prompts for that single use case, measure acceptance rate and edit distance, and refine based on what the data tells you before expanding to other workflows. Build from success, rather than mandate.

Just as we expect engineers to write clean code and clear documentation, we should expect them to craft effective prompts. Think of it less as an AI adoption project and more as an extension of the engineering standards your team already maintains.

If you manage engineers, prompts are now part of your technical surface area.