David Blei’s Big Data Innovations Win Him a Guggenheim Fellowship
David Blei, center, and his team of students and postdocs tailor machine learning and topic modeling for a wide variety of research areas, including image analysis and patient monitoring.
Analyzing today’s massive data sets, from health care data to user data, is rarely as simple as finding a standard statistical method and putting it to work. Smart analysis requires tailored methods that can consider the data in context and continuously learn from its patterns, says Columbia Statistics and Computer Science Professor David Blei. Blei’s innovations in big data analysis have been drawing attention for several years and have just won him a prestigious Guggenheim Fellowship.
“Massive collections of observational data are everywhere—in government, industry, science. Practitioners need to be able to quickly explore, visualize, and understand them,” Blei said. “To do that, we need statistical and machine learning tools.”
Blei has been tailoring machine learning and topic modeling for a wide variety of research areas, including image analysis and patient monitoring. His research in topic modeling led to the Latent Dirichlet Allocation (LDA) model, a probabilistic model that identifies semantic themes across billions of documents from statistical patterns in how words are used. In health care, Blei’s machine learning methods for analyzing DNA data sets could eventually help identify disease-carrying mutations, and his work on patient care could help predict patient survival.
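To make the idea of a topic model concrete, here is a minimal sketch of the generative story that LDA assumes: each document mixes a few topics, and each topic is a probability distribution over words. Fitting LDA to real text runs this story in reverse, inferring the topics from the documents. The vocabulary, the two topics, and all function names below are made up for illustration and do not come from Blei’s software.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Draw an index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(topics, alpha, n_words, rng):
    """Generate one document under LDA's assumed generative process.

    topics: one word distribution per topic (each list sums to 1)
    alpha:  Dirichlet concentration parameters, one per topic
    """
    theta = sample_dirichlet(alpha, rng)        # this document's topic proportions
    words = []
    for _ in range(n_words):
        z = sample_categorical(theta, rng)      # choose a topic for this word
        w = sample_categorical(topics[z], rng)  # choose a word from that topic
        words.append(w)
    return theta, words

# Hypothetical six-word vocabulary and two hand-written topics.
VOCAB = ["gene", "dna", "mutation", "patient", "hospital", "record"]
TOPICS = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # a "genetics" topic
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],  # a "clinical" topic
]

rng = random.Random(0)
theta, word_ids = generate_document(TOPICS, alpha=[0.5, 0.5], n_words=8, rng=rng)
print([VOCAB[i] for i in word_ids])
```

A small `alpha` pushes each document toward one dominant topic, which is why generated documents tend to read as mostly “genetics” or mostly “clinical” rather than an even blend.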
For DNA analysis, Blei and a team from Columbia and Princeton recently designed an algorithm that can scan enormous quantities of genetic data from across populations and could eventually characterize the structure of world-scale human populations. The algorithm, called TeraStructure, updates its model as it analyzes data, constantly learning as it churns through new information. The design expands on the widely used Structure algorithm, a Bayesian model, using Blei’s earlier work with stochastic variational inference.
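The idea of a model that updates itself as it works through data can be illustrated with a toy example. The sketch below is not TeraStructure; it only shows the kind of decreasing-step-size update that stochastic methods rely on, where each small batch of data nudges a running estimate. The data are synthetic, and the function names, step-size constants, and the “true frequency” of 0.3 are assumptions chosen for illustration.

```python
import random

def step_size(t, tau=1.0, kappa=0.7):
    """Decreasing step size rho_t = (t + tau) ** (-kappa), with kappa in (0.5, 1]."""
    return (t + tau) ** (-kappa)

def streaming_estimate(batches, initial=0.5):
    """Blend each mini-batch's estimate into a running global estimate."""
    estimate = initial
    for t, batch in enumerate(batches):
        batch_estimate = sum(batch) / len(batch)  # noisy estimate from this batch alone
        rho = step_size(t)
        # Move part of the way toward the new batch; later batches move it less.
        estimate = (1.0 - rho) * estimate + rho * batch_estimate
    return estimate

# Synthetic stream: 200 batches of 50 binary observations each.
rng = random.Random(0)
TRUE_FREQ = 0.3
batches = [
    [1 if rng.random() < TRUE_FREQ else 0 for _ in range(50)]
    for _ in range(200)
]
print(round(streaming_estimate(batches), 2))
```

Because each update touches only one batch, the full data set never has to be held in memory at once, which is the property that makes this style of algorithm attractive for very large collections.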
Blei is also adapting stochastic variational inference to expand patient survival analysis, parsing thousands of electronic health records at NewYork-Presbyterian Hospital in collaboration with researchers at Columbia’s College of Physicians and Surgeons. The results could help predict the course of patients’ diseases and improve patient monitoring.
Students are integral to all of those projects and several others that Blei has underway. “My students and postdocs are involved in every aspect of my work,” he said. “I am very lucky to have such a friendly, collaborative, and productive lab.”
Blei received the Presidential Early Career Award for Scientists and Engineers (PECASE), the U.S. government’s highest honor for early-career scientists and engineers, in 2011. He plans to use his new Guggenheim Fellowship to develop new methods for probabilistic modeling and to share them widely with scientists and scholars through free software and a book. The John Simon Guggenheim Memorial Foundation chooses scholars and artists for the fellowship based on their achievement and promise. The fellowship supports recipients for six months to a year so they can “work with as much creative freedom as possible” on whatever they wish.