Big Mechanism Seeks the “Whys” Hidden in Big Data
February 20, 2014
Program aims to leapfrog state-of-the-art big data analytics by developing automated technologies to help clarify the causes and effects that drive complicated systems
During the 1854 cholera epidemic in London, Dr. John Snow plotted cholera deaths on a map, and in the corner of a particularly hard-hit quadrangle of buildings was a water pump. Snow's maps, a 19th-century version of big data, suggested an association between cholera and the pump. However, the germ theory of disease had not yet been established, and it took human ingenuity to realize that the pump was a causal mechanism of disease transmission.
More than a century and a half later, big data is vastly larger, but human ingenuity is still required to leap from associations to causal mechanisms. DARPA's new Big Mechanism program aims to change that.
“Having big data about complicated economic, biological, neural and climate systems isn't the same as understanding the dense webs of causes and effects—what we call the big mechanisms—in these systems,” said Paul Cohen, DARPA program manager. “Unfortunately, most of what we know about big mechanisms is contained in enormous, fragmentary and sometimes contradictory literatures and databases, so no single human can understand a really complicated system in its entirety. Computers must help us.”
The first challenge the Big Mechanism program intends to address is cancer pathways, the molecular interactions that cause cells to become and remain cancerous. The program has three primary technical areas. First, computers should read abstracts and papers in cancer biology to extract fragments of cancer pathways. Next, they should assemble these fragments into complete pathways of unprecedented scale and accuracy, and should figure out how pathways interact. Finally, computers should determine the causes and effects that might be manipulated, perhaps even to prevent or control cancer.
None of this is simple, but cancer biology is a logical place to start, and not only because of its obvious importance. “The language of molecular biology and the cancer literature emphasizes mechanisms,” Cohen said. “Papers describe how proteins affect the expression of other proteins, and how these effects have biological consequences. Computers should be able to identify causes and effects in cancer biology papers more easily than in, say, the literatures of sociology or economics.”
Assembling big mechanisms after reading about small fragments of pathways might be an even greater challenge. Inconsistent naming, experimental variability, the many kinds of cancer, and the changes cancer cells undergo as they progress through different stages make it extremely hard to assemble a causal model of even one cancer, in one species, from fragmentary results. But as a model emerges, the Big Mechanism enterprise would, in theory, get easier.
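To make the assembly step more concrete, the sketch below shows one toy way extracted fragments might be merged into a single signed causal graph and then queried for upstream causes. It is an illustration under stated assumptions, not DARPA's method or any program deliverable: the "reading" step is simulated by a hard-coded list, and the protein names (`ProteinA` to `ProteinD`), the `ALIASES` table and the helper functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fragments "read" from papers: (cause, effect, sign), where
# sign is +1 for "increases/activates" and -1 for "decreases/inhibits".
FRAGMENTS = [
    ("ProteinA", "ProteinB", +1),
    ("protein-a", "ProteinC", -1),  # same entity reported under another name
    ("ProteinB", "ProteinD", +1),
    ("ProteinC", "ProteinD", +1),
]

# Hypothetical alias table used to reconcile inconsistent naming.
ALIASES = {"protein-a": "ProteinA"}


def normalize(name: str) -> str:
    """Map a reported name to its canonical identifier (unchanged if unknown)."""
    return ALIASES.get(name, name)


def assemble(fragments):
    """Merge fragments into one signed graph: cause -> {effect: sign}.

    A later fragment for the same edge simply overwrites an earlier one.
    """
    graph = defaultdict(dict)
    for cause, effect, sign in fragments:
        graph[normalize(cause)][normalize(effect)] = sign
    return graph


def upstream_causes(graph, target):
    """Return every node that has a directed causal path to `target`."""
    # Build reverse edges, then walk backwards from the target.
    parents = defaultdict(set)
    for cause, effects in graph.items():
        for effect in effects:
            parents[effect].add(cause)
    seen, stack = set(), [target]
    while stack:
        node = stack.pop()
        for parent in parents[node]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    graph = assemble(FRAGMENTS)
    print(sorted(upstream_causes(graph, "ProteinD")))
```

Run as a script, this prints `['ProteinA', 'ProteinB', 'ProteinC']`: the alias table lets the two reports about `ProteinA` land on one node, so both of its downstream paths are found. A real system would also have to weigh contradictory fragments and keep the evidence behind each edge, which this toy discards by overwriting.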
“The beautiful thing about causal models is that they make predictions, so we can go back to our big data and see whether we’re (retrospectively) right,” Cohen said. “And we can propose new experiments, suggest interventions and advance our knowledge more rapidly.”
Ultimately, the Big Mechanism program might herald new ways to understand complicated systems. Today’s researchers read deeply but struggle to keep up with relentless streams of relevant publications. To stay current, a researcher must specialize, becoming expert in a small part of something much larger. The vision for the Big Mechanism program is fundamentally different: every publication would immediately become part of a public, computer-maintained causal model of a complicated system (a big mechanism), and every aspect of a big mechanism would be tied to the data that supports or contradicts it.
“Causal models are needed to predict how systems will respond to interventions—how a patient or an economy will respond to a drug or a new tax—and to understand why systems behave as they do,” Cohen said. “By emphasizing causal models and explanation, Big Mechanism may be the future of science.”
The Broad Agency Announcement (BAA) for Big Mechanism is available at http://go.usa.gov/BRNw. DARPA is accepting proposals for the program until March 18, 2014 at 12 p.m. ET. For more information, please email [email protected]