Machine Learning Infers Genetic Networks

The research of Associate Professor Chris H. Wiggins, an applied mathematician in the Department of Applied Physics and Applied Mathematics, is the subject of an article in the April issue of Scientific American. The article, "At the Edge of Life's Code," focuses on Wiggins's use of machine learning to develop models that can predict how all of an organism's genes behave under different conditions and explain why a certain collection of proteins will give rise to a cell's response or ‘phenotype,’ either healthy or diseased.
When Wiggins encountered data on messenger RNA (mRNA) expressed by more than 6,200 genes of active yeast cells, he realized that biology poses new mathematical modeling challenges unlike those of traditional biomathematics. The newly developed ability to sequence genomes, along with DNA microarray data and other novel high-throughput technologies, suggests the possibility of inferring the causal interactions among genes, says Wiggins. This, in turn, requires identifying the short sequence elements to which proteins bind and thereby regulate gene expression.
“We have developed mathematical approaches for the inference, organization, and analysis of biological networks,” he says, “and among the mathematical approaches we use are the large deviation / large margin-based classifiers of machine learning, as well as statistical inference, network analysis and graph theory, and information theory.” Wiggins and Christina Leslie, then an adjunct research scientist with the Center for Computational Learning in the Department of Computer Science, built a type of algorithm called a classifier, which sifts through data and learns the decision rules that govern gene expression. From these rules, predictions about gene activity can be made. The algorithm, dubbed MEDUSA (motif element discrimination using sequence agglomeration), scans every possible pairing between a set of DNA promoter sequences (motifs) and regulators. MEDUSA then finds the pairs that fit together best, revealing basic design principles behind gene regulatory networks. The analysis allows researchers to determine which pairs are essential to certain activities and under what conditions the behavior of the genes will change, which in turn determines the health of the cell.
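The idea of scanning every motif-regulator pairing can be illustrated with a toy sketch in Python. This is not the MEDUSA code: the function name, the data layout, the sequences, and the simple agreement score below are all assumptions made for illustration. Each pair is treated as a candidate rule ("when this motif is in a gene's promoter, the gene follows this regulator"), and the pair whose rule agrees most often with the observed expression labels is kept.

```python
from itertools import product


def best_motif_regulator_pair(promoters, regulator_states, labels, motifs):
    """Score every (motif, regulator) pair and return the best pair and its score.

    Hypothetical data layout, assumed for this sketch:
      promoters:        gene -> promoter DNA string
      regulator_states: condition -> {regulator: +1 (up) or -1 (down)}
      labels:           (gene, condition) -> +1 (gene up) or -1 (gene down)
      motifs:           short DNA strings to test
    """
    regulators = sorted({r for states in regulator_states.values() for r in states})
    best_pair, best_score = None, float("-inf")
    for motif, regulator in product(motifs, regulators):
        score = 0
        for (gene, condition), label in labels.items():
            # The rule only applies to genes whose promoter contains the motif.
            if motif in promoters[gene]:
                # +1 when the gene moves with the regulator, -1 when it moves against it.
                score += label * regulator_states[condition][regulator]
        if score > best_score:
            best_pair, best_score = (motif, regulator), score
    return best_pair, best_score


# Made-up example data: three genes, two conditions, two regulators.
promoters = {"geneA": "TTGACGTCAT", "geneB": "CCCCAAAATT", "geneC": "GGTGACGTAA"}
regulator_states = {
    "heat": {"reg1": 1, "reg2": -1},
    "cold": {"reg1": -1, "reg2": 1},
}
labels = {
    ("geneA", "heat"): 1, ("geneA", "cold"): -1,
    ("geneB", "heat"): -1, ("geneB", "cold"): 1,
    ("geneC", "heat"): 1, ("geneC", "cold"): -1,
}

pair, score = best_motif_regulator_pair(
    promoters, regulator_states, labels, motifs=["GACGT", "AAAA"]
)
print(pair, score)
```

In this made-up data set the two genes carrying the motif GACGT rise and fall with the regulator reg1, so that pair receives the highest score. A real classifier would combine many such rules and weight them; the sketch only shows the exhaustive pairing step.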
Wiggins is also affiliated with Columbia's Center for Computational Biology and Bioinformatics and is one of the co-PIs in the NIH-funded MAGNet (Multiscale Analysis of Genomic and Cellular Networks) National Center for Biomedical Computation. In addition, he is one of the co-PIs in the NIH-funded Nanotechnology Center for Mechanics in Regenerative Medicine, in which he hopes to develop machine learning approaches to answer questions in cellular biology. “By developing 'data-driven' approaches to microscopy (building on experience with microarray data) and thus to cellular biology (building on experience with molecular biology), we hope to forge a much-needed bridge between the 'new biomathematics,' using bioinformatics and statistical learning theory, and the more traditional biomathematics of microscopic and mechanistic modeling,” says Wiggins.
Wiggins graduated from Columbia with a major in physics and obtained his Ph.D. in theoretical physics from Princeton University. Prior to joining the SEAS faculty, he was a Courant instructor at NYU and held visiting positions at Institut Curie (Paris), the Hahn-Meitner Institut (Berlin), and the Kavli Institute for Theoretical Physics (Santa Barbara).


