What We Miss When We Talk About AI
Most conversations about AI focus on data and models—but the real advances often come from the systems that run them.
When people talk about artificial intelligence (AI), the headlines almost always zero in on data, training tricks, or the latest buzzy model. But at the Lecture Series in AI held Nov. 21 at Columbia’s Morningside campus, Jingren Zhou MS’01, PhD’04 made a compelling case for something less glamorous, and far more essential. AI systems, from infrastructure to execution platforms, are doing as much heavy lifting as the algorithms themselves. Without them, “none of the breakthroughs we celebrate would actually work.”
Zhou, the chief technology officer of Alibaba Cloud, explained that the real magic comes from the massive, complicated systems that make these models possible in the first place. He began his talk with a simple idea: Today’s AI boom didn’t happen because of one big invention. It happened because three things came together at the right time—tons of data, extremely powerful computers, and a clever model design called “transformers.” A transformer is a type of computer program that learns patterns, like words in a sentence, by paying attention to how each piece relates to the others, making it the backbone of today’s powerful AI systems. But to make all that work, companies have to build enormous supercomputers and storage systems behind the scenes.
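For readers who want to see the "paying attention" idea in concrete terms, here is a minimal sketch of the attention step in plain Python. It is an illustration under simplifying assumptions, not how Qwen or any production model is implemented: real transformers first pass each token through learned projections to get separate query, key, and value vectors, while this toy version reuses the same made-up two-number vectors for all three.

```python
import math

def softmax(scores):
    # Turn raw scores into positive weights that sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    # Scaled dot-product attention: each query scores every key,
    # then takes a weighted average of the matching values.
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
    return outputs

# Three toy "word" vectors (invented numbers, for illustration only).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = attention(tokens, tokens, tokens)
```

Each row of `mixed` is a blend of all three input vectors, weighted by how strongly that word relates to the others.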
Training Models at an Unprecedented Scale
Zhou led the development of Alibaba Cloud’s open-source AI models, Qwen and Wan. Training models like these takes months, uses tens of thousands of graphics processors, and requires carefully managing trillions of pieces of data. To support this, his team has built a massive, highly specialized infrastructure that includes racks of liquid-cooled processors, lightning-fast networking, and storage systems designed to juggle millions of tiny files without slowing down. Alibaba can build systems at this scale because it operates one of the largest cloud infrastructures in China, giving its engineers the hardware capacity, data pipelines, and on-the-ground experience needed to design AI supercomputers from the ground up.
But Zhou made something else clear: once a model is trained, the work is only half done.
The real challenge is serving the model, getting it to respond quickly and reliably when millions (or billions) of people use it each day, he said. Unlike a search engine that displays results instantly, an AI model generates its responses one word at a time, which is much harder to accelerate. And because people expect fast, high-quality answers, companies must be very strategic about how they schedule and balance their workload.
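The one-word-at-a-time pattern can be sketched in a few lines of Python. This is a toy under stated assumptions: `toy_next_token` is a hypothetical stand-in for a full model, here just a lookup table, and the words in it are invented. The point is the loop itself. Each new word requires another call to the model, and no call can start until the previous word is known, which is why generation is hard to speed up.

```python
def toy_next_token(context):
    # Hypothetical stand-in for a real model: picks the next word
    # by looking only at the most recent word.
    table = {"<start>": "systems", "systems": "matter",
             "matter": "too", "too": "<end>"}
    return table.get(context[-1], "<end>")

def generate(next_token, max_tokens=10):
    # Autoregressive loop: every step depends on the words produced so far.
    context = ["<start>"]
    steps = 0
    while steps < max_tokens:
        token = next_token(context)
        steps += 1
        if token == "<end>":
            break
        context.append(token)
    return context[1:], steps

words, model_calls = generate(toy_next_token)
print(words, model_calls)
```

In this toy run, a three-word answer takes four calls to the model, one per word plus one to decide to stop.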
There’s also the issue of versions. New model versions don’t replace old ones overnight; they coexist. That means feedback and data come from multiple versions simultaneously, and engineers need systems that keep each version’s information separate so the newest model isn’t trained on outdated or confusing results.
Another challenge is how many ways a model can be set up once it’s running. Every choice—how much to compress the model, how many machines to use, how long responses should be—changes how fast or efficient the system is. Big tech companies now use automated tools to try out thousands of configurations behind the scenes, allowing users to enjoy smooth and fast experiences.
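A rough sketch of what such an automated search might look like is below. Everything specific in it is an assumption made for illustration: the three settings, their candidate values, and especially the `estimate` function, which uses an invented formula in place of the real measurements a production tool would collect by running each configuration.

```python
from itertools import product

def estimate(bits, machines, max_tokens):
    # Invented cost model: longer responses and higher precision are
    # slower, more machines are faster but cost more.
    latency_ms = max_tokens * bits * 5 / machines
    cost = machines
    return latency_ms, cost

def sweep(latency_budget_ms):
    # Try every combination and keep the cheapest one that is fast enough.
    best = None
    best_key = None
    for bits, machines, max_tokens in product([4, 8, 16], [1, 2, 4, 8], [256, 512]):
        latency_ms, cost = estimate(bits, machines, max_tokens)
        if latency_ms > latency_budget_ms:
            continue
        key = (cost, latency_ms)  # prefer lower cost, then lower latency
        if best_key is None or key < best_key:
            best_key = key
            best = {"bits": bits, "machines": machines,
                    "max_tokens": max_tokens, "latency_ms": latency_ms}
    return best

best = sweep(latency_budget_ms=3000)
print(best)
```

Even this toy has 24 combinations to check. Real deployments have many more settings, which is why the search is automated rather than done by hand.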
Why Smarter Systems Will Shape AI’s Future
And that’s just for one model, he noted. The largest AI companies run hundreds or thousands of models across hundreds of thousands of computers worldwide. Said Zhou, “Keeping everything balanced and running efficiently is one of the hardest engineering problems in the field.”
The talk concluded with an important message: The future of AI will not be shaped by smarter algorithms. It will be shaped by smarter systems—the massive, invisible machinery that keeps these models trained, updated, and available to everyone.
“It is not just about the models, but the thousands of engineers solving enormous behind-the-scenes problems every single day,” said Zhou.
Lead Photo Caption: Jingren Zhou MS’01, PhD’04, CTO at Alibaba Cloud, returns to campus as a guest speaker in our Lecture Series in AI
Lead Photo Credit: Chris Taggart