Build Systems Worth Trusting | Columbia Engineering

AI agents need the right guardrails to put humans at ease.

By Eugene Wu

Imagine a new technology that eliminates office drudgery by making business operations faster and more efficient, but at a cost: it’s prone to mistakes, hard to trust, and hungry for energy.

This isn’t just the story of AI in 2025. It was also the story of relational databases in the 1980s.

From a systems perspective, the challenges are similar — and so are the solutions.

Back then, business software was expensive and brittle, with no reliable infrastructure for managing data. The lack of trust in the underlying data systems, which we call a “trust wall,” meant developers had to rewrite vast swaths of application logic any time they wanted to optimize the data layout, slightly change the data model, or simply add a new feature. Relational databases made it reasonable to trust these digital systems: by separating how data is described from how it is physically stored, they let developers change storage layouts without rewriting application logic, and transactions kept data consistent even when an operation failed partway through. Once that infrastructure existed, entire industries scaled. That shift underpins nearly every enterprise system in use today.

AI agents now face a similar “trust wall.” Many organizations let them read data or draft documents but hesitate to let them take actions such as submitting a form or updating a record. That caution is reasonable: while agents are powerful, they’re also unpredictable, prone to hallucination, and tough to monitor in real time. Even a single mistake could be catastrophic, because the systems agents operate within weren’t built with those shortcomings in mind.

So what does it take to make agents trustworthy? We need systems that let agents plan, simulate, and act in controlled environments designed to exploit their strengths while mitigating their weaknesses:

Sandboxed environments where agents can explore hundreds of actions before committing to one.
Forkable infrastructure that isolates mistakes, preventing them from cascading through the system.
Built-in safeguards at the data layer that block unverified actions from affecting live operations.
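
The three components above can be pictured with a small sketch. It assumes a SQLite database stands in for live operations and that candidate agent actions arrive as Python functions; `fork`, `try_then_commit`, and the `check` callback are hypothetical names chosen for illustration, not an API from this article or the DAPLab. Each candidate runs on a throwaway copy of the data (the sandbox and fork), and only an action that passes a verification check is replayed on the live database inside a transaction (the data-layer safeguard).

```python
import sqlite3
from typing import Callable, List, Optional

# Hypothetical types for illustration: an "action" mutates a database
# connection, and a "check" returns True if the resulting state is acceptable.
Action = Callable[[sqlite3.Connection], None]
Check = Callable[[sqlite3.Connection], bool]


def fork(live: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the live database into an isolated, throwaway in-memory sandbox."""
    sandbox = sqlite3.connect(":memory:")
    live.backup(sandbox)
    return sandbox


def try_then_commit(
    live: sqlite3.Connection, candidates: List[Action], check: Check
) -> Optional[int]:
    """Simulate each candidate on its own fork and apply the first one that
    passes `check` to the live database. Returns its index, or None."""
    for i, action in enumerate(candidates):
        sandbox = fork(live)
        try:
            action(sandbox)
            ok = check(sandbox)
        except sqlite3.Error:
            ok = False  # a failing action only damages the throwaway fork
        finally:
            sandbox.close()  # uncommitted sandbox changes are discarded
        if ok:
            # Hypothetical data-layer safeguard: replay the verified action on
            # live data inside a transaction and re-check before committing.
            with live:  # commits on success, rolls back if an exception escapes
                action(live)
                if not check(live):
                    raise RuntimeError("live state failed the check; rolled back")
            return i
    return None


# Usage: two candidate actions against a toy accounts table.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
live.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
live.commit()


def overdraw(conn: sqlite3.Connection) -> None:
    conn.execute("UPDATE accounts SET balance = balance - 500 WHERE name = 'alice'")


def transfer_30(conn: sqlite3.Connection) -> None:
    conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'alice'")
    conn.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'bob'")


def no_overdraft(conn: sqlite3.Connection) -> bool:
    low, total = conn.execute("SELECT MIN(balance), SUM(balance) FROM accounts").fetchone()
    return low >= 0 and total == 150


chosen = try_then_commit(live, [overdraw, transfer_30], no_overdraft)
print(chosen)  # → 1
print(live.execute("SELECT * FROM accounts ORDER BY name").fetchall())
# → [('alice', 70), ('bob', 80)]
```

The overdraft is caught on its fork and never touches live data; only the balanced transfer is committed. Copying the whole database for every candidate keeps the sketch simple but would scale poorly; a production system would more plausibly use copy-on-write snapshots or branching storage. The re-check inside the live transaction guards against live data having changed since the fork was taken.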

Once that kind of infrastructure exists, agents will move beyond offering suggestions and start taking action. These components will also fundamentally shape how agents and models are designed: which assumptions are safe to build into models, how to optimize for efficiency and reliability, and how to structure the ways agents interact. Beyond the models themselves, this infrastructure will inform the design of user experiences that give people the right level of visibility, control, and trust in the agents acting on their behalf.

To unlock the next wave of automation, we don’t just need better agents. We need to build systems that make trust possible.

If you’d like to learn more about partnering with Columbia researchers working at the forefront of applied research in AI, visit the DAPLab website or contact co-director Eugene Wu.

Eugene Wu

Associate professor of computer science at Columbia Engineering, co-director of the DAPLab, member of Columbia’s Data Science Institute (DSI), and co-chair of DSI’s Data, Media, and Society Center