Buried in Data and Starving for Information

Timothy Gardner

Since the time of Newton and Galileo, the tools for capturing and communicating science have remained conceptually unchanged—in essence, they consist of observations on paper (or electronic variants), followed by a “letter” to the community to report your findings. Sure, we have more sophisticated instrumentation, more complex workflows, and better electronic publishing. But the basic practices and paradigms remain the same.

These age-old tools are wholly inadequate for the complexity of today’s scientific challenges. Consequently, the scientific community is facing a well-recognized reproducibility crisis, analogous to the one that led the automobile industry to adopt lean manufacturing and total quality procedures from the 1960s through the 1980s.[1]

If modern software engineering worked like science does today, programmers would not share open-source code with a few clicks of a mouse. They would take notes and observations of their work, and then publish long-form articles about their software. Their colleagues would read those papers months or years later and attempt to reproduce the software based on the article. Clearly, that’s not a good approach, but that is how science works today.

The Global Biological Standards Institute estimated that 50% of published academic research was flawed or incorrect, suggesting that $30B per year of R&D spending in the United States is lost on unreliable or unrepeatable scientific research.[2] But that number vastly underestimates the magnitude of the problem. Globally, more than $400B per year is spent on process-based academic and industrial R&D. If we assume a 25% error rate globally (just half of the reported rate), we lose more than $100B per year on flawed R&D.[3]
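The arithmetic behind that global estimate is simple enough to check by hand, but a short sketch makes the assumptions explicit. The figures are the ones quoted above; the variable and function names are illustrative only.

```python
# Back-of-the-envelope estimate of annual R&D spending lost to flawed research.
# All figures come from the text above; names are illustrative assumptions.

REPORTED_FLAW_RATE = 0.50        # flaw rate reported for published academic research [2]
GLOBAL_RD_SPEND_USD = 400e9      # lower bound on global process-based R&D spending [3]

# Conservative assumption from the text: half of the reported rate.
ASSUMED_GLOBAL_FLAW_RATE = REPORTED_FLAW_RATE / 2


def annual_loss(spend_usd: float, flaw_rate: float) -> float:
    """Spending lost per year if a fraction `flaw_rate` of the work is unusable."""
    return spend_usd * flaw_rate


global_loss = annual_loss(GLOBAL_RD_SPEND_USD, ASSUMED_GLOBAL_FLAW_RATE)
print(f"Estimated global loss: ${global_loss / 1e9:.0f}B per year")
```

Because the $400B figure is a lower bound, the result is a lower bound as well. Using the full reported 50% rate instead would double it.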

For the folks at Riffyn, this problem was a call to action. We’d struggled with it our entire scientific lives starting in the 1990s, and in many situations we conquered it. Along the way, we also discovered something kind of obvious, but also perhaps controversial—the experiment, which of course is the fundamental unit of science, is generally misunderstood and misused.

An experiment is thought of as a tool for testing a hypothesis.[4] But most of the time an experiment is really a tool for gathering data. It is a measurement device like a caliper, a spectrometer, or any other measurement instrument. Its job is to contribute new measurement data of the highest possible quality and greatest permanence to a growing record of evidence that others can build upon with trust and confidence.

Data that goes into a database should obey what I call the CAP principle. It should be complete, it should be accurate, and it should be permanent, so you never have to do it again. Otherwise, there’s no progress.
— Sydney Brenner, Nobel Laureate [5]

When you conceptualize an experiment as a measurement device, something fundamental happens to your thinking. Instead of asking “is my hypothesis true?”, you ask “is my measurement a precise, accurate and reproducible assessment?” You begin to treat an experiment as a “thing” that can be seen, known, and improved upon, just like bits of code shared among computer programs, or like high-quality automobile parts in a supply chain. You can apply the vast body of knowledge and methods developed for measuring, monitoring and improving quality in manufacturing, software and other industries.[6,7]
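To make the “measurement device” question concrete, here is a minimal sketch of how precision and accuracy might be quantified. It assumes you have replicate readings of a reference standard whose true value is known. The function name and the data are hypothetical, and this is not a description of any particular tool’s method.

```python
import statistics


def measurement_quality(replicates, reference_value):
    """Summarize replicate readings of a standard with a known reference value.

    bias       -> accuracy: how far the average reading sits from the truth
    std_dev    -> precision: how much repeated readings scatter
    cv_percent -> precision relative to the size of the reading
    """
    mean = statistics.mean(replicates)
    std_dev = statistics.stdev(replicates)  # sample standard deviation
    return {
        "bias": mean - reference_value,
        "std_dev": std_dev,
        "cv_percent": 100 * std_dev / mean,
    }


# Hypothetical data: five replicate readings of a standard whose true value is 10.0
replicates = [10.2, 10.1, 10.3, 10.2, 10.2]
quality = measurement_quality(replicates, reference_value=10.0)

for name, value in quality.items():
    print(f"{name}: {value:.3f}")
```

In this made-up example the readings agree closely with one another but sit above the reference value, so the measurement is precise but not accurate. Repeating the same summary across days, operators, or labs would be one way to probe reproducibility.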

When you treat an experiment as a thing, you create a supply chain of scientific methods and experimental data whose final product is knowledge of unassailable quality. When you pass around methods and data like that, a whole explosion of computationally enabled “goodness” will rain down upon science. And the pace of discovery will leap.[8,9]

Application of quality methods drives dramatic increases in productivity in both manufacturing (left) and R&D (right) processes.

To help achieve this vision, Riffyn spent years creating Riffyn Nexus—a cloud-based software toolset built to elevate the scientific experiment to a thing you can see, a thing you can improve, a thing you can revere and share—just as you might do for a new device or piece of software. Riffyn treats scientific experiments as tangible objects, and provides the means to design, execute, and share them as visual, computable data sets for interactive analysis and machine learning. With this new approach we’re aiming to help scientists deliver reusable research—research we can trust and build on efficiently, like high-quality parts in a global supply chain of methods and data.

To learn more about the roots of Riffyn and how it brings data together, take a look at this slide presentation, which takes you on a visual journey through the concepts laid out here.

Buried in data and starving for information from Tim Gardner

This article also appears on LinkedIn: https://www.linkedin.com/pulse/buried-data-starving-information-timothy-gardner


[1] Gardner, TS. (2014 Aug) A Swan in the making. Retrieved from http://science.sciencemag.org/content/345/6199/855.full

[2] Freedman, LP, et al. (2015 Jun) The Economics of Reproducibility in Preclinical Research. Retrieved from http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002165

[3] Grueber, M & Studt, T. (2012 Dec) 2013 R&D Magazine Global Funding Forecast. Retrieved from https://www.rdmag.com/digital-editions/2012/12/2013-rd-magazine-global-funding-forecast

[4] Cambridge Dictionary (2017 Oct) Definition of “Experiment.” Retrieved from http://dictionary.cambridge.org/us/dictionary/english/experiment

[5] Duncan, DE. (2004 Apr) Discover Dialogue: Sydney Brenner. Retrieved from http://discovermagazine.com/2004/apr/discover-dialogue/

[6] Montgomery, DC. (2012) Introduction to Statistical Quality Control, 7th Edition. John Wiley & Sons. ISBN: 9781118146811

[7] Montgomery, DC & Runger, GC. (2013) Applied Statistics and Probability for Engineers, 6th Edition. John Wiley & Sons. ISBN: 9781118539712

[8] Gardner, TS. (2013) Synthetic Biology: from hype to impact. Trends in Biotech. 31(3): 123-125. http://dx.doi.org/10.1016/j.tibtech.2013.01.018

[9] Cusumano, MA. (1985) The Japanese Automobile Industry: Technology and Management at Nissan and Toyota. Harvard University Press.