Research

Making AI Agents Reliable, Despite Their Mistakes

A new collaboration aims to chart a path to building genuinely trustworthy AI agents.

April 03, 2025
Grant Currin

More than 170 leaders from industry and academia gathered at Columbia Business School on March 12 to discuss the significant opportunities — and challenges — of weaving AI agents into the operational fabric of organizations in business and government. 

The closed-door workshop, “AI Agents for Work,” was hosted by Columbia University’s new Data-Agents-Process Lab (DAPLab), which convenes researchers from across the University and industry partners to share research and experience on the major hurdles in designing AI agent automation that operates reliably, safely, and efficiently within enterprises.

“Organizations rely on applications, data systems, and operating systems that weren’t designed to handle situations when an AI agent makes the wrong decision,” says Eugene Wu, co-director of DAPLab and associate professor of computer science at Columbia Engineering. “Rather than working to improve individual models or build scaffolding around them, we’re collaborating with industry partners to develop full-stack approaches to making agents safe, reliable, and trustworthy.” 

Eugene Wu, associate professor of computer science and co-director of DAPLab. Credit: David Dini/Columbia Engineering

AI agents have the potential to perform tasks without oversight or intervention from a human, yet today’s LLMs and agents regularly hallucinate in unpredictable ways. A major theme throughout the workshop was the critical — if often overlooked — challenge of ensuring reliability. This is crucial because agentic systems act in the world, potentially performing vital and irreversible functions.

The workshop included speakers representing Columbia Engineering, Columbia Business School, and the Data Science Institute, as well as industry partners such as Celonis, Intellect, OpenAI, Amazon, Google, Databricks, IBM, and Microsoft.

A huge opportunity space

Shih-Fu Chang, dean of Columbia Engineering, welcomed a crowded room of attendees by marveling at the rapid pace of progress in AI development over the last several years.

“In the 31 years that I’ve been working in this area, I never imagined the extremely fast rate of improvement that we have seen recently,” said Chang, who is the Morris A. and Alma Schapiro Professor of Engineering and a professor of electrical engineering and computer science. “If we are going to make further progress in this rapidly moving area, collaboration between industry and university is key.”

Throughout the morning, attendees heard speakers illuminate AI agents from a wide-ranging set of perspectives. OpenAI’s Jason Wei delivered a keynote address on the state of LLM development and future directions for AI research, several technologists and business experts probed emerging topics in a panel on the development of AI agents, and Columbia Engineering faculty shared research updates in a series of lightning talks.

Throughout the workshop, speakers continually emphasized the importance of collaboration across industries and disciplines.

“Many of the faculty involved in DAPLab are from computer science, but our collaborations with partners ranging from business researchers and industry to cognitive science and the humanities are vital to our mission,” said Lydia Chilton, assistant professor of computer science at Columbia Engineering. “This is a huge opportunity space, and no single discipline or company is going to dominate it.” 

Pooling our insights

The second half of the workshop centered on these collaborations between academic researchers and industry partners, with a panel discussion and lightning talks reporting back from real-world attempts to deploy AI agents. 

Michael Morris, Chavkin-Chang Professor of Leadership, welcomed the attendees to Columbia Business School by observing that academia and industry each account for roughly half of the research that happens in the United States.

“We really ought to be pooling our insights,” he said. “Business schools can play a natural role in facilitating industry-academia cooperation — I’m hoping the DAPLab will be one of many ways we do this.” 

In her keynote address, cognitive scientist Danielle Perszyk described how her team at Amazon AGI leverages insights from human intelligence to refine its efforts to build reliable AI systems. 

“It’s one thing for an agent, whether a human or an AI, to be able to do the same thing — like click on an icon or type in a text field — in the same environment every time,” Perszyk said. “But the digital world is ever-evolving, and we can’t assume that it’s going to stay the same even for short periods of time.” 

She described how research into the evolution of human intelligence frames the questions necessary to develop agents that are robust in fast-changing environments. 

In closing the day, Chilton described the journey to building and deploying these technologies as “a business problem, a sociology problem, and a cognitive science problem that draws from history, science, language arts, visual arts, and interaction.” 

For her, AI agents won’t just automate tasks — they will open entirely new sources of value, the same way databases, the internet, and the cloud have.

“Every faculty member involved in DAPLab has deep collaborations throughout the university,” she said. “We also have a strong track record of delivering results for industry problems through sponsored research.”

The DAPLab is currently identifying partners to work together on a shared research agenda to develop next-generation systems that support reliable, safe, and efficient agent automation. To learn more, visit DAPLab’s website or contact co-directors Eugene Wu or Zhou Yu.


Lead Photo Caption: Zhou Yu, associate professor of computer science and co-director of DAPLab

Lead Photo Credit: David Dini/Columbia Engineering