When human oversight isn’t enough: Governing AI in the real world

Human-in-the-loop AI sounds reassuring. It suggests that automation is not being left alone, that professional judgement remains in control, and that risk can be managed by asking people to check AI outputs before using them.

In high-stakes environments, that reassurance can be dangerously incomplete. Human oversight matters, but it is not the same as effective governance. An AI tool approved for one purpose can be used for another, even with guidance and policy in place to prevent this. A system introduced to reduce administrative burden can start to influence clinical reasoning. A disclaimer can be visible, a human can remain involved, and controls can be in force, yet real harm can still occur.

The question is not simply whether a human is in the loop. The question is whether the whole loop has been designed, tested, monitored, and assured for the conditions in which people work.

[Image: NHS hospital setting with a clinician using an administrative tool to manage a busy workload]

A realistic example: administrative support in a busy hospital

Imagine an NHS hospital introducing an internal AI assistant to help clinical teams with administrative work.

The problem is real: staff are stretched, spending precious time drafting letters, searching for local policies, preparing discharge information, and completing forms. While necessary, these tasks reduce the time available for patients.

The AI assistant is approved for a narrow and sensible purpose. It can interpret patient records, draft letters, retrieve internal guidance, and complete hospital documentation. It is explicitly not approved to provide clinical advice or diagnoses.

The governance arrangements appear reasonable:

Clinicians must review outputs before use.
The interface states: “Not for clinical decision-making.”
Usage is logged.
Concerns can be escalated.

Initially, the deployment looks successful. Letters are drafted faster. Staff find policies more easily. Documentation is completed with less friction. The hospital sees productivity gains and the tool is widely adopted.

Those benefits are tangible: used well, AI can reduce administrative burden and release clinical time. But this is exactly where governance needs to become more active, not less.

The creep towards misuse

Over time, use of the assistant begins to change.

Clinicians still ask administrative questions:

“When was this patient first admitted?”
“Draft a discharge letter.”
“Find the follow-up policy.”

But under pressure, they also begin asking clinical-adjacent questions:

“Does this presentation need urgent escalation?”
“What red flags should I consider?”
“What does the guidance suggest for these symptoms?”

The assistant continues to answer confidently. Sometimes it retrieves approved local guidance or gives general information, but it occasionally misses context, ignores uncertainty, and presents misinformation with confidence.

No single interaction obviously breaks the rules: the warning is visible, clinicians still review responses, and the tool is still labelled as administrative support.

Even though the AI caveats outputs with “results should be verified with a qualified clinician”, in practice, the system is now being trusted for clinical advice.

The harm

A clinician is preparing a discharge summary at the end of a pressured evening shift. A patient was admitted with abdominal pain and nausea. Their general examination has been stable, the pain is resolving with simple medication, and blood tests are being processed. The team is considering discharge with GP follow-up.

The clinician asks the AI assistant to “prepare a discharge summary, if appropriate, based on these assessment notes”. The assistant produces a clear, well-structured discharge letter. It describes the case as “improving abdominal pain, suitable for self-management, with GP follow-up and safety-netting if blood results are satisfactory.”

Based on the information provided, the assistant responds that routine follow-up appears appropriate, while advising the user to check local guidance and use clinical judgement. It does not highlight a documented nursing concern that the patient “does not look right”, or the hospital’s policy not to discharge patients awaiting blood results.

The clinician reviews the output. The response is plausible. It matches the direction of the working plan. The disclaimer is visible, but the line between clinical and administrative support is blurred. The discharge summary is finalised, and the patient leaves.

Hours later the patient is readmitted as an emergency with sepsis. It becomes evident they should not have been discharged before blood tests were processed. The patient suffers avoidable harm. The family raises a complaint. The incident investigation identifies premature discharge, over-reliance on AI-generated summaries, and a failure to notice that the tool had moved beyond its approved administrative use.

The organisation’s first defence is familiar: the clinician was responsible, the tool carried a warning, and the output was meant to be checked. The clinician absorbs all liability.

Why human-in-the-loop failed

The problem was not that there was no human in the loop.

The problem was that the human was placed downstream of a system that had already shaped the task. The assistant selected information, framed the case, suggested a level of urgency, and made one course of action feel reasonable. The clinician was then expected to identify what was missing while tired, interrupted, and working under pressure.

That is not a robust control, but a fragile assumption. Once the process was automated, the clinician’s scrutiny of outputs became weaker.

A warning label did not prevent clinical use. A review requirement did not ensure meaningful challenge. Logging did not detect misuse early enough. The original approval did not reflect how the tool would be used in context.

This highlights the governance gap: human-in-the-loop deployment often assumes that people can reliably catch system errors after the fact, then blames individuals when a system failure slips through. In high-stakes environments, that assumption is unsafe unless the organisation has also assured the wider system around the human.

What whole-loop assurance requires

A stronger approach would not reject the AI assistant. The productivity benefits may be valuable, but the system must be governed for foreseeable use, not just intended use.

That means asking harder questions before and during deployment:

What clinical-adjacent uses are predictable?
How should the assistant respond when users ask for advice outside its approval?
Can it refuse, redirect, or require escalation?
Are answers grounded only in approved local guidance?
Can users see exactly where key statements come from?
Are logs proactively monitored for drift and user bias, not just reviewed after incidents?
What thresholds trigger intervention, retraining, redesign, or suspension?
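
Two of these questions, redirecting out-of-scope requests and monitoring logs against intervention thresholds, can be made concrete with a short sketch. The Python below is illustrative only, not a description of any real hospital system. The keyword list, the `route` function, the `DriftMonitor` class, and the threshold values are all assumptions made for this example. A real deployment would need a clinically validated way of recognising out-of-scope requests, and thresholds agreed through the organisation's own safety case.

```python
from collections import deque

# Hypothetical markers of clinical-adjacent requests (an assumption for
# illustration; keyword matching alone would not be adequate in practice).
CLINICAL_MARKERS = ("escalation", "red flag", "symptom", "diagnos")


def is_clinical_adjacent(prompt: str) -> bool:
    """Return True if the prompt contains any hypothetical clinical marker."""
    text = prompt.lower()
    return any(marker in text for marker in CLINICAL_MARKERS)


def route(prompt: str) -> str:
    """Redirect clinical-adjacent prompts instead of answering them."""
    return "redirect" if is_clinical_adjacent(prompt) else "answer"


class DriftMonitor:
    """Track the share of clinical-adjacent prompts over a rolling window."""

    def __init__(self, window: int = 200,
                 review_threshold: float = 0.05,
                 suspend_threshold: float = 0.20) -> None:
        # Illustrative thresholds only; real values need governance sign-off.
        self.review_threshold = review_threshold
        self.suspend_threshold = suspend_threshold
        self._recent = deque(maxlen=window)

    def record(self, prompt: str) -> str:
        """Log one prompt and return 'ok', 'review', or 'suspend'."""
        self._recent.append(is_clinical_adjacent(prompt))
        rate = sum(self._recent) / len(self._recent)
        if rate >= self.suspend_threshold:
            return "suspend"
        if rate >= self.review_threshold:
            return "review"
        return "ok"
```

In this sketch, a prompt such as “Draft a discharge letter.” is answered, while “What red flags should I consider?” is redirected, and every prompt feeds the monitor whichever way it is routed. The point of the design is that drift becomes visible as a rate that triggers review or suspension before an incident, instead of being discovered in the logs afterwards.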

Whole-loop assurance looks at the entire system: the user, the data, the workflow, the interface, the pressures, and the decisions that may be influenced. This allows an overall argument to be made for the safety of the system-in-use. It recognises that AI risk does not live only inside the model; it emerges from how the system is used in practice.

Find out more

For AI used in high-stakes environments, organisations need to define intended use clearly, anticipate foreseeable misuse or drift, design proportionate controls, monitor real-world adoption, and build the evidence needed to support safe scaling and continuous operation.

Synoptix brings a strong systems thinking heritage to this challenge, built on 15 years supporting some of the UK’s most complex Defence programmes. By assessing how AI performs in real operational conditions and how people actually use it, we surface vulnerabilities that are often missed, including degraded performance and unexpected emergent behaviours, to build a clear, defensible case for trust in operation.

To go beyond tick-box compliance to evidence-based, uncertainty-informed, intentional design, sign up to our upcoming webinar or find out more about our work on our website.
