When AI agents are almost right, things can go very wrong

Author: Lukas Madl

Founder & CEO of innovethic

AI agents don’t need to be malicious to cause harm—just slightly misaligned.
A recent incident at Meta shows how quickly things can escalate:

An AI agent suggested a solution to an engineering problem.
The advice was implemented.
Result: sensitive company and user data became visible to unauthorized employees—for about two hours.
No external breach—but a major internal security alert.
Cause: a well-intentioned AI system without sufficient oversight and context awareness.
 
What this tells us is simple but critical:
 
Autonomy in AI is not only about performance. It’s about responsibility.
When AI agents influence decisions, trigger workflows, or guide actions, the key questions are no longer purely technical:
 
What should the system be allowed to do?
Where is meaningful human oversight required?
Which values are at risk if the system gives a confident—but wrong—recommendation?
 
Three guardrails for Responsible AI in practice
 
1. Meaningful Human Control must be real
AI agents can act autonomously—but humans must remain truly in command.
That means: understanding, supervising, intervening, and overriding—not only in theory, but in practice.

2. More autonomy → more accountability
As AI systems gain agency, responsibility must become clearer—not blurrier.
There can be no accountability gaps. Responsibility always stays with humans and organizations.

3. “Do no harm” in deeply entangled systems
AI agents are embedded in workflows, decisions, and organizations.
This amplifies risks—making proactive and continuous harm prevention essential.
 
Curious how this can be applied in practice?

Design For Trust From Day One

Align your technology with values, governance, and societal expectations