
The importance of controlled ontology


Map of semantic and pragmatic components of ontology (figure adapted from the version appearing with this article on differencebetween.net).

By: Embriette Hyde and Loren Perelman

We often hear from our customers that Riffyn Nexus has improved communication within teams and between collaborating teams. This may come as a surprise. One would expect a process data system like Riffyn Nexus to help design and execute workflows or format data for statistical analysis. But communication? That isn’t what typically springs to mind.

Yet communication breakdowns are a major contributor to slow scientific progress. And it’s not just that Sally was rude to Harry or took too long to answer an email. Miscommunication can occur, for example, when two people use different words in their laboratory notebooks to describe the same thing. Then, when someone analyzes the data at the end of the experiment, more time may go into resolving the discrepancy and formatting the data than into the analysis itself. We shouldn’t need a Rosetta Stone just to decipher what each person did in the laboratory.

Imagine how many headaches could be avoided if everyone worked from a standardized vocabulary for naming and contextualizing the things they work with. Such a solution exists. It’s called ontology.

More than just a name

In its most fundamental sense, ontology is a branch of metaphysics concerned with the nature and relations of being. But ontologies aren’t restricted to metaphysics. There are many examples in biology and the life sciences (you’ve probably heard of, and perhaps even used, the Gene Ontology). In fact, ontologies are necessary for science: without them, you cannot assess your results in the context of other people’s results, nor can you share your knowledge with the wider scientific community in a way that makes sense.

At face value, an ontology is generally thought of as a system for naming things, but it is much more than that. Ontologies define objects’ purposes and their relationships to other objects. Naming things formalizes semantics; with ontologies, we must also consider the structural component, meaning the aspects of the named thing that make it meaningful. In other words, ontologies bring context to the objects we use in our daily work as scientists.

Consider something as routine as collecting fermentation titer data. What do you call that property: “titer” or “concentration”? What does the titer or concentration apply to: the broth, or the fermentor itself? What is it ultimately a measure of? And does titer mean the same thing regardless of the project it is applied to?

As you can see, choosing what to call a thing means answering many questions: not just what that thing is, but also what other things are like it. An ontology is what standardizes these names, characteristics, and relationships.
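To make the titer question concrete, here is a minimal Python sketch of a controlled vocabulary. It assumes a simple design in which each canonical term records what it applies to, what it measures, and its unit, and every notebook synonym resolves to that one term. The names `Term`, `SYNONYMS`, and `resolve`, and the particular definition of titer used here, are hypothetical and chosen only for illustration; this is not how Riffyn Nexus stores its ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """One canonical entry in a hypothetical controlled vocabulary."""
    name: str        # the agreed-upon label
    applies_to: str  # the resource the property describes
    measures: str    # what the value ultimately quantifies
    unit: str        # the unit every lab records it in


# Illustrative definition only: "broth" and "g/L" are assumptions.
PRODUCT_TITER = Term(
    name="product titer",
    applies_to="fermentation broth",
    measures="mass of target product per volume of broth",
    unit="g/L",
)

# Different people's notebook labels all map to the same canonical term.
SYNONYMS = {
    "titer": PRODUCT_TITER,
    "product titer": PRODUCT_TITER,
    "product concentration": PRODUCT_TITER,
}


def resolve(label: str) -> Term:
    """Map a free-text label to its canonical term, or fail loudly."""
    key = label.strip().lower()
    if key not in SYNONYMS:
        raise KeyError(f"{label!r} is not in the controlled vocabulary")
    return SYNONYMS[key]


print(resolve("Titer") is resolve("Product concentration"))  # True
print(resolve("titer").applies_to)  # fermentation broth
```

Rejecting unknown labels, rather than silently accepting them, is the design choice that keeps a second spelling from creeping into the data.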

Ontologies for science

Scientists can and do use a variety of ontologies every day, and these ontologies differ in character. Some are more customizable than others, they work at different levels of granularity, and different niche areas may require different ontologies. But all ontologies share a hierarchical structure that yields a complete, contextualized picture of the object you’re interested in, including that object’s relationships with other objects. Ontologies can also be combined when necessary. This is critical: concatenating simple terms and combining ontologies is what leads to specificity.
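The claim that hierarchy supplies context, and that combining terms from separate ontologies yields specificity, can be sketched in a few lines of Python. The two miniature ontologies below (a chemical one and a property one, each stored as a child-to-parent map) and the helpers `known_terms`, `lineage`, and `compose` are hypothetical examples, not excerpts from any published ontology.

```python
from __future__ import annotations

# Two tiny, hypothetical ontologies, each stored as a child -> parent map.
CHEMICALS = {
    "glucose": "monosaccharide",
    "monosaccharide": "carbohydrate",
    "carbohydrate": "chemical entity",
}
PROPERTIES = {
    "mass concentration": "concentration",
    "concentration": "quantity",
}


def known_terms(parents: dict[str, str]) -> set[str]:
    """Every term that appears in an ontology, as a child or as a parent."""
    return set(parents) | set(parents.values())


def lineage(term: str, parents: dict[str, str]) -> list[str]:
    """Walk up from a term to the root; the chain is the term's context."""
    chain = [term]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    return chain


def compose(chemical: str, prop: str, unit: str) -> str:
    """Concatenate simple terms from two ontologies into one specific concept.

    The unit is passed through unchecked in this sketch.
    """
    if chemical not in known_terms(CHEMICALS):
        raise KeyError(f"unknown chemical term: {chemical!r}")
    if prop not in known_terms(PROPERTIES):
        raise KeyError(f"unknown property term: {prop!r}")
    return f"{chemical} {prop} [{unit}]"


print(lineage("glucose", CHEMICALS))
# ['glucose', 'monosaccharide', 'carbohydrate', 'chemical entity']
print(compose("glucose", "mass concentration", "g/L"))
# glucose mass concentration [g/L]
```

Neither "glucose" nor "mass concentration" is very informative alone; the composed term plus each term's lineage is what pins down exactly what was measured.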

Riffyn took all of these considerations into account when creating its ontology. The result is a global ontology that enables users within or even between organizations to communicate using a shared language and structure. This facilitates repeatability, faster data analysis, and ultimately faster R&D cycles by simplifying language and eliminating the pain points caused by miscommunication.

So, how exactly does it work?

The Riffyn Nexus Ontology

The Riffyn Nexus Ontology is fairly simple by most measures, and intentionally so. We chose simplicity because it makes the ontology easier to use while still delivering most of the language-standardization and communication benefits of more complex ontologies.

In Riffyn Nexus, the ontology defines a hierarchy of three fundamental concepts: resources, properties, and units. Resources are the physical objects and data objects manipulated or transformed in a process (i.e., things you have or know, and things you make or learn). Properties are attributes of a resource in the context of the step where the resource is used or produced. Units are the dimensions of numeric properties. These three concepts are layered into a four-level hierarchy: resource > component resource > property > unit. (Components are themselves resources that act as an additional layer of specification for a parent resource.)
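One way to picture the four-level hierarchy is as nested records. The Python sketch below is a hypothetical model: the class names `Resource` and `Property` and the `paths` method are assumptions made for illustration, not Riffyn Nexus’s data model. It only shows how a resource, a component resource, a property, and a unit nest, and how every recorded value can be addressed by its full path.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Property:
    name: str
    unit: Optional[str] = None  # only numeric properties carry a unit


@dataclass
class Resource:
    name: str
    kind: str  # "physical" (things you have or make) or "data" (things you know or learn)
    components: list[Resource] = field(default_factory=list)
    properties: list[Property] = field(default_factory=list)

    def paths(self) -> Iterator[tuple[str, Optional[str], str, Optional[str]]]:
        """Yield (resource, component, property, unit) for every property."""
        for prop in self.properties:
            yield (self.name, None, prop.name, prop.unit)
        for comp in self.components:
            for prop in comp.properties:
                yield (self.name, comp.name, prop.name, prop.unit)


# Hypothetical example: a broth with one component resource.
broth = Resource(
    "fermentation broth",
    "physical",
    components=[Resource("glucose", "physical", properties=[Property("concentration", "g/L")])],
    properties=[Property("volume", "L"), Property("appearance")],
)

for path in broth.paths():
    print(path)
# ('fermentation broth', None, 'volume', 'L')
# ('fermentation broth', None, 'appearance', None)
# ('fermentation broth', 'glucose', 'concentration', 'g/L')
```

The non-numeric "appearance" property has no unit, matching the rule that units belong only to numeric properties.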

These hierarchical relationships are used within scientific processes to describe the inputs and outputs of each transformational step. A transformation could be a chemical reaction, mixing, a measurement, fluid separation, and so on. This approach has two benefits. First, the controlled vocabulary ensures that the materials and data flowing through lab and manufacturing processes are defined consistently, even across processes run in different labs and locations. For example, glucose concentration is the same thing, with the same units, in any lab or location. Second, the process itself adds contextual information that defines the purpose of a material. For example, glucose concentration in a bioreactor process serves quite a different purpose than the glucose concentration of a blood sample in an animal pharmacodynamics testing process.
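Both benefits can be illustrated together: the vocabulary term stays fixed, while the process step it appears in supplies its purpose. In this hypothetical Python sketch, a `Step` lists the terms it consumes and produces, and `contexts` reports every place a given term is used. The class and function names, as well as the process and step names, are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A controlled-vocabulary term: the same definition in every lab."""
    resource: str
    prop: str
    unit: str


@dataclass(frozen=True)
class Step:
    """One transformational step (reaction, mixing, measurement, ...)."""
    process: str
    name: str
    inputs: tuple[Term, ...] = ()
    outputs: tuple[Term, ...] = ()


def contexts(term: Term, steps: list[Step]) -> list[tuple[str, str, str]]:
    """Return (process, step, role) for every step that uses the term."""
    found = []
    for step in steps:
        if term in step.inputs:
            found.append((step.process, step.name, "input"))
        if term in step.outputs:
            found.append((step.process, step.name, "output"))
    return found


GLUCOSE_CONC = Term("glucose", "concentration", "g/L")

# Hypothetical steps from two unrelated processes.
steps = [
    Step("bioreactor run", "sample broth", outputs=(GLUCOSE_CONC,)),
    Step("bioreactor run", "adjust feed rate", inputs=(GLUCOSE_CONC,)),
    Step("animal PD study", "assay blood sample", outputs=(GLUCOSE_CONC,)),
]

for ctx in contexts(GLUCOSE_CONC, steps):
    print(ctx)
```

Because the term itself is identical everywhere, only the surrounding step distinguishes a bioreactor measurement from a pharmacodynamics readout.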

These benefits have a significant practical impact on scientific research. They allow collaborating researchers to better understand and compare their workflows and processes. Most importantly, they allow collaborators to share and combine data easily. For example, two researchers might each run a screening process that generates cell culture samples as an output. They might name those samples many different things, but because both sets were classified as “cell culture” resources, both researchers know the samples share a type and can compare them in data analysis or send them together for common downstream processing.
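The cell culture example can be sketched in Python. Each sample keeps whatever label its researcher chose, but it also carries a resource type from the shared vocabulary, so grouping by that type gathers comparable samples regardless of naming. The record layout, the `group_by_type` helper, and the labels are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample records from two researchers with different naming habits.
samples = [
    {"researcher": "A", "label": "CC-plate1-A03", "resource_type": "cell culture"},
    {"researcher": "B", "label": "screen_wk2_s17", "resource_type": "cell culture"},
    {"researcher": "B", "label": "media_lot_42", "resource_type": "growth medium"},
]


def group_by_type(records: list[dict]) -> dict[str, list[str]]:
    """Collect sample labels under the shared resource type they were classified as."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["resource_type"]].append(rec["label"])
    return dict(groups)


print(group_by_type(samples)["cell culture"])
# ['CC-plate1-A03', 'screen_wk2_s17']
```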

This may seem almost boringly obvious, but it’s actually quite significant. It allows scientists to normalize data across labs, geographies, changing processes, and time. A scientist could, for example, easily identify the globally optimal cell line from many weeks of screening experiments, while avoiding the data integration headaches, errors, and artifacts that lead to false positives and wasted time.
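Pooling results across labs and weeks, then picking a winner, might look like the following Python sketch. It assumes every titer result was recorded against one shared term; only records with that exact term are pooled, and a record for a different measurement is deliberately included to show how blind pooling would crown a false winner. The term string, cell line names, and values are invented for illustration, not real screening data.

```python
from collections import defaultdict
from statistics import mean

TITER = "product titer [g/L]"  # hypothetical shared term

# Invented screening results from two labs over several weeks.
records = [
    {"week": 1, "lab": "Site 1", "cell_line": "CL-01", "term": TITER, "value": 2.4},
    {"week": 1, "lab": "Site 2", "cell_line": "CL-02", "term": TITER, "value": 3.1},
    {"week": 3, "lab": "Site 1", "cell_line": "CL-02", "term": TITER, "value": 2.9},
    {"week": 4, "lab": "Site 2", "cell_line": "CL-04", "term": "OD600", "value": 9.9},
    {"week": 5, "lab": "Site 2", "cell_line": "CL-03", "term": TITER, "value": 2.7},
    {"week": 5, "lab": "Site 1", "cell_line": "CL-01", "term": TITER, "value": 2.6},
]


def best_cell_line(records, term):
    """Average every result recorded under `term` per cell line; return the top one."""
    by_line = defaultdict(list)
    for rec in records:
        if rec["term"] == term:  # pool only values that share the exact same definition
            by_line[rec["cell_line"]].append(rec["value"])
    means = {line: mean(vals) for line, vals in by_line.items()}
    return max(means, key=means.get), means


best, means = best_cell_line(records, TITER)
print(best)  # CL-02
```

Had the OD600 record been pooled with the titers, CL-04 would have looked best: exactly the kind of artifact a shared term prevents without any hand-written reconciliation rules.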

Are you interested in learning more about how Riffyn Nexus can make communication bottlenecks a thing of the past in your R&D workflows? Send us a note at hello@riffyn.com.


Embriette Hyde

Embriette is an academic turned science writer with a passion for spreading responsible science. She holds a PhD in microbiome research from Baylor College of Medicine. After a four-year postdoc, during which she managed the world's largest citizen science research project (the American Gut Project), Embriette became a full-time science writer and research consultant. You can find her work at riffyn.com, synbiobeta.com, and her personal webpage: drhydenotjekyll.com.