While the MIT study may be overblown, it’s important to remember the fundamentals when rolling out any new technology.
In September, a group at MIT released a report which found that 95% of corporate AI pilots fail. It was an eye-catching finding that immediately went viral. But the data behind the headline is scant.
As Wharton professor Kevin Werbach points out, the 95% figure is presented in one sentence, but the authors offer no detail on how they came up with it. Moreover, the report alludes to “52 interviews and hundreds of data points,” but discloses little about the demographics of the pool or how the data was collected.
But if the majority of Gen AI projects do fail, what patterns can we identify that give them a chance to succeed?
Recent DORA research – involving more than 100 hours of qualitative interviews and surveys with nearly 5,000 technology professionals globally – concluded that generative AI is an amplifier, for good or bad.
“It magnifies the strengths of high-performing organizations and the dysfunctions of struggling ones,” the report states. The greatest returns on AI investment come not from the tools themselves, but from “a strategic focus on the underlying organizational system: the quality of the internal platform, the clarity of workflows, and the alignment of teams. Without this foundation, AI creates localized pockets of productivity that are often lost to downstream chaos.”
Before you even start a rollout, “you need to decide what you are going to measure: it shouldn’t be adoption because that isn’t the same as impact. You need to be very clear about the impact you are expecting to get out of something,” Google Cloud’s DORA team lead Nathen Harvey told LeadDev.
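Harvey’s distinction between adoption and impact can be made concrete with a small sketch. The metric names and numbers below are illustrative assumptions, not figures from any study:

```python
# Hypothetical rollout metrics: adoption (who uses the tool) versus
# impact (what actually changed). All names and numbers are invented.

def adoption_rate(active_users: int, licensed_users: int) -> float:
    """Share of licensed staff actually using the tool."""
    return active_users / licensed_users

def impact_delta(baseline_hours: float, current_hours: float) -> float:
    """Change in weekly hours spent on the targeted task (negative = saved)."""
    return current_hours - baseline_hours

# High adoption can coexist with near-zero impact, which is Harvey's point.
print(f"Adoption: {adoption_rate(620, 1000):.0%}")
print(f"Impact:   {impact_delta(6.0, 4.5):+.1f} h/week on the targeted task")
```

Tracking both numbers separately makes it visible when a tool is widely used but not actually moving the outcome you set out to change.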
Communication and training
These lessons seem to apply equally beyond software development. Nicoletta Curtis, who recently retired as Head of Infrastructure for IT security and service delivery at Unum UK, led a successful large pilot of Microsoft 365 Copilot in an unnamed financial services institution.
That company has around 1,000 staff, and the project Curtis ran laid the groundwork for a phased rollout to all of them, from actuaries and data analysts to staff in the call centers – all driven by the CIO with executive buy-in.
A key factor in their success was communication. “Before I launched the rollout, they’d had several public town hall meetings where they talked about AI, showing how the tools could take away some of the boring stuff, rather than being a big, scary thing that’s going to take over jobs,” Curtis said.
When she joined, she performed a lot of stakeholder management with heads of departments and team leaders, to make sure they understood their staff’s concerns.
As part of the successful pilot the firm invested in training, with an initial three hours of remote lessons for all staff conducted by Advania, an external training provider and Microsoft partner. The initial training was in three sessions covering:
- What are your pain points at work?
- General training – how Copilot works, what makes a good prompt, etc.
- How to help solve the pain points discussed earlier.
This was followed by a further two hours of in-person ‘prompt-o-thons’, to help staff understand what was possible, focused on real-life work problems and writing good prompts to get the results you need.
During the training process staff members discussed use cases. Some were basic, such as helping managers to write objectives, taking notes from meetings, and summarizing actions. Curtis told LeadDev that they also “had a lot of staff for whom English wasn’t their first language, who found it helpful to iterate through business proposals to improve their writing and get their point across.”
Clear policies
The DORA report emphasizes the importance of having clearly defined AI policies, and Curtis echoed this. Her client was already developing some as “they were already using AI in a number of areas such as pricing and data,” she said.
As Deloitte recently found to its cost, you need to make sure staff understand that these tools are slot machines, not databases, and can confidently fabricate information.
Deloitte has rapidly adopted GenAI tools for staff in an attempt to boost productivity, whilst also warning them to do their own due diligence and quality assurance. Now Deloitte Australia has agreed to partially refund the Australian government for a $440,000 AUD (about $290,000 USD) report on welfare compliance that contained multiple AI-generated errors, including fabricated academic references, non-existent research papers, and a fake quote from a federal court judge.
For Curtis’ team, it was emphasized that “staff remain accountable for the decisions they make, particularly when talking about clients or claimants,” she said.
In addition, working with the operational resilience and risk teams, a policy was drafted that will become part of the annual training and is compulsory for all staff. It covers data security as well as appropriate use. Curtis’ client monitors prompts for workplace appropriateness, which has necessitated expanding HR policies and ensuring staff are available to do the monitoring.
Theory of constraints
From an organizational point of view, DORA’s Harvey emphasized that having a healthy data ecosystem that your AI has access to is really important. “We want to make sure we have the right permissions set,” Harvey said. “At the same time I worry that some organizational leaders might use this as an excuse not to try new things. But you can constrain what data we give your AI access to, along with ‘observability’ so we can detect when we’ve crossed a line.”
Curtis found that when preparing for the rollout the biggest impact was on the security team, who had to carry out a large amount of work on data permissions upfront. “Even with the basics, such as OneDrive and SharePoint, we found documents that were overshared or with open permissions,” she said. While such information would always have been discoverable, Copilot adds to the risk by making accidental discovery more likely.
Addressing this required extensive liaising with heads of departments, to “get their acceptance that there might be some disruption while security locked down permissions.” It also involved the risk and data protection teams.
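The kind of audit Curtis describes can be sketched in a few lines. The file records, group names, and the idea of a flat permissions export are all assumptions here; in practice the sharing data would come from a Microsoft admin report or the Graph API:

```python
# Illustrative pre-rollout permissions audit over a flat export of file
# sharing records. Paths and group names are hypothetical examples.

files = [
    {"path": "/finance/q3-forecast.xlsx", "shared_with": ["finance-team"]},
    {"path": "/hr/salaries.xlsx",         "shared_with": ["Everyone"]},
    {"path": "/claims/case-1042.docx",    "shared_with": ["claims-team", "Everyone"]},
]

# Groups that make a file effectively public inside the organization.
OPEN_GROUPS = {"Everyone", "All Staff"}

overshared = [f["path"] for f in files
              if OPEN_GROUPS & set(f["shared_with"])]

for path in overshared:
    print(f"Overshared – lock down before enabling Copilot: {path}")
```

The point of running something like this before the rollout is that Copilot surfaces whatever a user technically has access to, so open permissions become findable rather than merely theoretical.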
Make hidden investments
As the rollout continued, some more advanced use cases emerged, such as building agents to look at particular types of quote, searching policy documents, or comparing contracts in procurement.
When adding these agentic AIs into core workflows, Curtis emphasized, operational resilience becomes more important. While LLMs are deterministic in their underlying mathematical model, in practice their output is often non-deterministic, thanks to temperature-based sampling and system-level variations.
In other words, they will behave unpredictably: think of Air Canada, which was ordered to compensate a passenger who received incorrect refund information from its chatbot, or the National Eating Disorders Association (NEDA) chatbot, Tessa, which made potentially dangerous suggestions related to eating disorders. “In financial services, if the AI agent malfunctions, your staff need to be able to run the process manually,” Curtis emphasized.
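A toy sampler makes the non-determinism concrete: identical inputs produce identical output under greedy decoding (temperature 0), but varying output once temperature-based sampling is on. The tokens and logit values below are invented for illustration:

```python
# Toy next-token sampler: at temperature 0 the same logits always yield
# the same token; at higher temperature the choice varies run to run.
import math
import random

def sample(logits: dict, temperature: float, rng: random.Random) -> str:
    if temperature == 0:                  # greedy decoding: always the top token
        return max(logits, key=logits.get)
    weights = {t: math.exp(l / temperature) for t, l in logits.items()}
    r = rng.random() * sum(weights.values())
    for token, w in weights.items():      # weighted draw over tokens
        r -= w
        if r <= 0:
            return token
    return token

logits = {"approve": 2.0, "deny": 1.5, "escalate": 1.4}
greedy  = {sample(logits, 0,   random.Random(i)) for i in range(20)}
sampled = {sample(logits, 1.5, random.Random(i)) for i in range(20)}
print("greedy: ", greedy)    # a single token every time
print("sampled:", sampled)   # multiple different tokens across runs
```

Real deployments add further variation on top of this (batching effects, model updates, floating-point differences), which is why a manual fallback process matters.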
Once a tool like Copilot is well integrated, getting rid of it isn’t easy. Curtis’ team ran before and after surveys, asking questions such as, ‘How much of your time is spent on boring repetitive work?’ and ‘Would you miss Copilot if it was taken away?’ “Some people told us that if we took it away, they’d quit and look for a different job,” she said.
The case for starting small
Harvey suggests that there are two ways to adopt generative AI. “The first is to find people in your organization who are excited about using this new tool and give them the time and space to go play around: we learn and get better through play.”
A second method is to start from a value stream mapping exercise. “Use the value stream map to identify the waste in the system, and figure out if you can eliminate the waste altogether without introducing any new technology, and then look at the ways this new technology can help.”
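As a minimal sketch of that exercise, a value stream map can be reduced to active work time versus waiting time per step, with the waiting often dominating. The steps and durations below are invented for illustration:

```python
# Toy value stream map: compare hands-on work time with waiting time
# between steps, and surface the waste before reaching for new tooling.

steps = [
    # (step, work_hours, wait_hours_before_next_step) – invented numbers
    ("write code",  6,  2),
    ("code review", 1, 20),   # e.g. a day waiting for a reviewer
    ("QA",          3, 12),
    ("deploy",      1,  4),
]

work = sum(w for _, w, _ in steps)
wait = sum(q for _, _, q in steps)
lead_time = work + wait

print(f"Lead time {lead_time}h: {work}h working, {wait}h waiting "
      f"({wait / lead_time:.0%} of lead time is waiting)")
```

In a map like this, the biggest waste is queueing between steps, which no code-generation tool addresses; that is the kind of finding that should shape where AI is applied.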
But the most important thing is to work in small batches. “The role of engineering is to take a large problem and break it down into smaller problems which you then solve. From an organizational perspective if you are trying to roll out a transformation you take that transformation, break it down into smaller problems, and you start solving those smaller problems.”
Rigorous code review
Steve Curtis is a senior engineer at ClearBank, a young clearing bank that has had some early success applying AI.
“We’ve been able to reduce costs in fraud detection and financial crime even as the volume of transactions grows. This is through trend analysis, spotting recurring bottlenecks, and working more proactively to put the right controls and best practices in place,” ClearBank’s Curtis told LeadDev.
The firm also measures the value of the time Copilot saves across the entire business, which it calls capacity release. “I’ve been told that after five months of having Copilot available to all colleagues, we estimate a capacity release of £750,000. This is increasing exponentially, so we’re on track to hit over £2 million in our first year.”
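A quick back-of-envelope check of those figures shows why the growth claim matters: a flat run rate from the first five months would fall short of the £2 million target.

```python
# Sanity check on the quoted capacity-release figures: £750k over 5 months
# is £150k/month; at a flat rate, year one totals £1.8m, so passing £2m
# requires the monthly figure to keep rising in the remaining months.
five_month_total = 750_000
monthly_rate = five_month_total / 5              # run rate so far
flat_year = monthly_rate * 12                    # year one if growth stopped
needed_remaining = 2_000_000 - five_month_total  # to earn over months 6-12

print(f"Flat-rate year one: £{flat_year:,.0f}")
print(f"Average needed in months 6-12 to pass £2m: "
      f"£{needed_remaining / 7:,.0f}/month")
```

That gap between £150k/month so far and roughly £179k/month needed for the rest of the year is consistent with the firm’s claim of accelerating adoption.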
His director of engineering is keen to push adoption of AI-assisted development. “About six months ago he tasked us with building a Product Backlog Item (PBI), using AI prompts only in Visual Studio in agent mode, to see how we got on.” What his team found, however, was that “it can write a lot of code but it’s not always good enough.”
Curtis has noted a generational split at Clearbank. “Staff in their 20s are much more likely to use the AI, whereas the older staff are much less likely to use it, and when we do it’s very specific, for tasks like generating unit tests, or converting a JSON structure into an anonymous type.
“It definitely can save you some time, but of course typing has never been the hard bit of programming, and I’m not sure it saves as much time as CTOs imagine,” he said. “I’ve certainly watched a young programmer repeatedly trying to craft a prompt to get the result they wanted and thought ‘I could have just written the code by now!’”
By routinely using mob and pair programming, and pairing developers of different ages, team members are able to learn from each other. ClearBank has also run hackathons and its own internal conference, with guest speakers and members of the tech team giving talks as part of training.
Final thoughts
ClearBank’s experience with AI-assisted development so far suggests these tools offer genuine but modest productivity gains, rather than the transformative revolution some might expect.
While GitHub Copilot excels at specific tasks like converting data structures and generating unit tests, it hasn’t fundamentally changed the nature of software development, where thinking and design remain far more challenging than typing code.
In this new landscape, the key skill may not be using AI more, but knowing when and how to use it appropriately.
More broadly, the success of generative AI rollouts depends far less on the technology itself than on the foundational work that organizations are willing to undertake. The best approach combines thoughtful change management, clear communication, comprehensive training, robust policies, and ongoing operational oversight.
Organizations that recognize AI as an amplifier of their existing strengths and weaknesses, and invest accordingly in their people and processes, are positioning themselves to realize genuine, sustained value.
