From safe experiments to scaled impact: bridging the AI delivery gap

Kevin O'Sullivan

3 weeks ago

Across public services, AI activity is everywhere. Pilots are running, prototypes are being built and teams are exploring what’s possible. Importantly, this isn’t about replacing people; it’s about supporting them: helping teams work more efficiently, make better decisions and focus on higher-value work.

But very little of it is making the leap into production. The reason isn’t that the tools aren’t good enough; it’s that the conditions surrounding them aren’t quite right.

The organisations that are doing this effectively aren’t just experimenting more; they’re designing their experimentation so that it can scale. They’re operating with one eye on the now, and one on the future.

The comfort, and risk, of the experimentation phase

Experimentation is something all organisations need to do more of in order to progress. Without it, innovation won’t happen, services won’t improve, and employees and citizens won’t have better jobs or lives.

And when done well, experimentation creates a safe space that allows teams to:

Move quickly without heavy governance overhead.
Test ideas before committing investment.
Learn and build skills, capability and confidence in emerging technologies like LLMs and AI Agents.
Reimagine how work gets done.

But that same safety can become a trap if there isn’t a clear purpose or the right conditions surrounding it.

We’ve seen teams stuck in environments that are:

Technically disconnected from the core system estate.
Loosely governed, with no clear path to approval.
Optimised for speed, not sustainability.
Not clearly focused on the business value.

The result? Promising pilots that can’t be deployed into operational services.

Designing experiments that are built to scale

If scaling is the goal, meaning moving experiments into live operational processes, it needs to shape decisions from the start. This doesn’t mean slowing down; it means being more intentional about how you move fast.

We find that three shifts make the difference:

1. Start with a real opportunity

Don’t just ask “Where can we try AI?”

Ask:

Where would success require integration into real workflows?
What would “live” actually look like?
What is the outcome we’re trying to achieve?
What are the high-value, low-complexity use cases?
Does the available data support the use case?
Does the solution require AI, or are traditional approaches more appropriate?

If you can’t describe the production and wider business context, you’re not testing the right thing.

2. Build with real constraints in mind

In the public sector, experiments need to reflect the environments they are intended to operate in. This means designing with real operational constraints in mind from the start, rather than treating them as considerations to address later.

That includes:

Data sensitivity and access controls
Governance and auditability
Existing architecture and integration layers
Trustworthiness of the service, with appropriate safeguards for citizens

The most valuable experiments aren’t the most impressive; they’re the ones that can realistically transition into live services.

3. Measure what matters at scale

Many pilots prove something works, but far fewer prove it’s worth scaling.

Shift your metrics from:

“Did it run?” to “Did it deliver measurable value?”
“Is it interesting?” to “Is it viable at volume?”
“How do we get it live?” to “How do we continuously improve it?”

Creating a pathway out of experimentation

Moving into production isn’t a single step but a joined-up model that brings together business, technology and governance functions.

Some of the key things to consider include:

Does your organisation have enterprise agility?
How do you build the case for investment?
Does your workforce have the literacy, culture and expertise to adopt AI at scale?

Cross-functional ownership

Scaling AI isn’t just a technical exercise. This is something we talk about over and over again, but for good reason.

The biggest failure mode for AI is lack of adoption. To get the best outcomes, AI requires product-led, multidisciplinary teams that bring together:

Business subject matter experts
Project delivery
Data and architecture
Risk and governance
Operational owners

If these groups only engage at the end, scaling will stall. Cross-functional ownership helps business stakeholders take responsibility, builds user trust, and integrates AI into daily operations instead of isolating it in labs.

AI succeeds when it is owned by the business, not just built by technologists.

A platform mindset

Rather than rebuilding from scratch each time, we find that leading organisations create repeatable foundations:

Shared tooling and accessible, dedicated environments
Reusable components
Standardised approaches to integration and monitoring

This is what turns isolated wins into scalable capability.

One eye on the now, one on the future

This balance is the hard part. Move too fast and you create risk you can’t manage, but move too cautiously and you never realise value.

The answer isn’t choosing one or the other. It’s finding the point where progress and control work together.

This looks like:

Creating safe spaces for experimentation
Designing those spaces with production in mind
Choosing the right moments to experiment, based on the maturity and reliability of the technology
Building the muscle to scale what works

Not every new AI capability is ready for real use. Some will still be evolving, unproven, or driven by hype rather than value. The challenge is knowing when a technology is mature enough to test with intent, and when to wait.

AI success doesn’t come from a single tool, pilot or agent demonstrator; it comes from the ability to learn quickly, test responsibly, and scale with confidence.

Taking the next step

Moving beyond pilots and experimentation isn’t a simple task, especially when organisations bring their own nuances and ways of working. And for those in the public sector, turning pilots into real, operational impact is harder than ever.

We explored this further during our masterclass, From Pilots to Production – Scaling AI Safely in Public Services, at the FutureScot Public Sector conference on the 21st May.

We covered:

The importance of data in successful AI implementation.
Why traditional delivery approaches struggle with emerging AI technologies.
The principles of effective AI experimentation: clear use cases, rapid iteration, and measurable outcomes.
How to safely test AI within public sector constraints, including governance and data sensitivity.
How to make good technology choices.
The technical steps required to move from prototype to production, including integration, scaling and operational considerations.

If you want to find out more about what we discussed and how it could apply to your organisation, drop me an email at gary.craven@soprasteria.com and we can explore your next steps.

Contributing authors: Neil Anderson, Data AI Practice, Chief Technology Officer, Sopra Steria
Gary Craven, Head of AI Strategy and Transformation, Sopra Steria