R&D Data as a Competitive Advantage


The pharmaceutical industry has been obsessed with creating a sustainable competitive advantage through more strategic use of its crown jewels: its data.

GSK has invested tens of millions into incorporating AI in their drug development process. Merck has partnered with Accenture to develop a Research Life Science Cloud to enable their clients to get the most from their R&D data. BASF has announced aggressive plans for a digital transformation within its research organization.

So how will these organizations translate their investments into gains in R&D productivity and better outcomes for patients and consumers?


How Does Google Do It?

Let’s think about how Google uses its hoards of data to illustrate a point. Over the last 20 years, Google has amassed huge volumes of user data, primarily search queries. By applying various algorithms to that data, it has been able to provide increasingly meaningful search results. Since Google Search has become the benchmark for the entire industry, these improved results are the standard that any competing search service must meet.

To meet this standard, any new competitor search engine would need a comparable volume of user data as a starting point. This would seem to be a bridge too far for a few people in a garage or a team of graduate students to cross. Google’s dataset, then, has made its search product pretty unassailable in the marketplace and serves as a significant sustainable competitive advantage.

So how might this type of strategy apply to a drug maker?  


The Drug Maker’s Advantage

One potentially interesting idea is a deep-learning model that can project the values of the critical scale-up variables or risk points for prospective genes, proteins, or cell lines. It would be incredibly valuable, for example, to predict from the genetic sequence of a biologic drug the setpoints of critical operating parameters, the media components, or the parent cell line traits for its manufacturing process. Even ballpark estimates for most variables would greatly streamline the overall drug development process. Such a model is the holy grail of bioprocess research, but it’s not an unthinkable goal for an ambitious company.

The current state of the art for bioprocess development, though, still involves a battery of development exercises in cell line development, fermentation scale-up experimentation, and purification methods. Each of these exercises is, at its core, a trade-off calculation among various factors (perhaps thousands of them).

For example, fermentation media has a drastic impact on the quality, cost, and efficiency of a fermentation process. If critical components are too expensive and drive up your production costs, you might not be able to proceed to commercial-scale production. A different cell line, perhaps engineered to eliminate a by-product to improve media effectiveness, might require only cheap, available components and still produce at high yields. Currently, it can take months or years of labor- and data-intensive cell line screening and genetic engineering to identify the right formulation. A data model developed from a database containing many years of process data might provide you with this insight within seconds.

A dataset of thousands or millions of fermentation iterations across a wide variety of strains, vectors, fermentation environments, and purification methods can serve as a valuable training set against which to test new genetic entities. Such a dataset would be very difficult and expensive to create, but it could potentially mitigate much of the technical risk inherent to scaling production for new products.
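To make the idea concrete, here is a deliberately tiny sketch of "asking the historical data" about a new candidate. It is not a deep-learning model; it swaps in a simple nearest-neighbor average so the example stays self-contained. The feature names, the values in HISTORICAL_RUNS, and the predict_titer function are all hypothetical and made up for illustration.

```python
import math

# Hypothetical historical runs, values invented for illustration only:
# (temperature_C, pH, glucose_g_per_L) -> titer_g_per_L
HISTORICAL_RUNS = [
    ((30.0, 6.8, 20.0), 1.2),
    ((32.0, 7.0, 25.0), 1.8),
    ((34.0, 7.0, 30.0), 2.1),
    ((37.0, 7.2, 30.0), 1.5),
    ((37.0, 6.5, 40.0), 0.9),
]


def _feature_scales(runs):
    """Return (min, range) per feature so no single unit dominates the distance."""
    columns = list(zip(*(features for features, _ in runs)))
    return [(min(col), (max(col) - min(col)) or 1.0) for col in columns]


def predict_titer(candidate, runs=HISTORICAL_RUNS, k=3):
    """Estimate titer as the mean titer of the k most similar historical runs."""
    scales = _feature_scales(runs)

    def normalize(features):
        return [(x - low) / span for x, (low, span) in zip(features, scales)]

    target = normalize(candidate)
    # Rank historical runs by Euclidean distance to the candidate conditions.
    ranked = sorted(runs, key=lambda run: math.dist(normalize(run[0]), target))
    nearest = ranked[:k]
    return sum(titer for _, titer in nearest) / len(nearest)


# Ballpark estimate for untested conditions, based on the 3 closest past runs.
estimate = predict_titer((33.0, 7.0, 28.0))
```

With five runs this is a toy. The point is that the quality of any such estimate depends on how many comparable, consistently recorded runs sit behind it.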


OK, so how do I do that?

In order to use your data as a weapon, you have to invest in making it structured and accessible to your entire organization. A database alone is woefully insufficient. Datasets have to be meticulously created, annotated, and linked. Even more challenging, all of your data must be linked across all of the experimental teams within your organization. Each team must be able to input its data into this dataset in the same format and structure as all the other labs. This might be the hardest step of all: getting humans to change their behavior.
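As a rough sketch of what "the same format and structure" can mean in practice, every team could record measurements against one shared record type and a common sample identifier, which is what makes linking possible later. The Measurement fields, team names, and link_by_sample helper below are hypothetical, not a prescribed schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One shared record shape that every team fills in (hypothetical fields)."""
    sample_id: str   # common identifier that links data across teams
    team: str
    parameter: str
    value: float
    unit: str


def link_by_sample(measurements):
    """Group measurements from all teams under the sample they describe."""
    linked = {}
    for m in measurements:
        key = f"{m.team}.{m.parameter}"
        linked.setdefault(m.sample_id, {})[key] = (m.value, m.unit)
    return linked


# Illustrative records from two teams describing the same sample.
records = [
    Measurement("S-001", "cell_line", "doubling_time", 22.0, "h"),
    Measurement("S-001", "fermentation", "titer", 2.1, "g/L"),
    Measurement("S-002", "fermentation", "titer", 1.4, "g/L"),
]
linked = link_by_sample(records)
```

The code is trivial on purpose. The hard part is the agreement behind it: every lab using the same identifiers, parameter names, and units.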

More practically, for this giant dataset to be useful, it must be readily available to update and interrogate. It can’t be buried in directory structures in a file system, or in data tables in a database in a data center, never to be seen again.

There are certainly ways to accomplish this, but they are challenging. There are many components to it, and they’re not designed for scientific organizations that are constantly changing.


The place to start

Riffyn was founded to address these data access, structure, and interpretation challenges within large R&D organizations. We offer three core capabilities that give R&D organizations this type of visibility into, and control over, their data.

First, all of your process designs and parameters are saved, versioned, and developed collaboratively across global teams. Riffyn can then collect, link, structure, and automatically add context to the datasets collected by running those processes. All experimental data is always associated with the processes and the parameters that generated it, so you can establish cause and effect. Then your data is automatically transformed into structured, flattened statistical data frames for machine learning.
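For readers unfamiliar with the term, the sketch below shows generically what "flattening" process-linked data into a data frame involves: each measurement becomes one row that carries the process version and parameters that produced it. This is an illustration only, not Riffyn's implementation or API; the record layout, field names, and flatten_runs function are assumptions.

```python
# Hypothetical nested record: one run with its parameters and measurements.
EXAMPLE_RUNS = [
    {
        "run_id": "R-001",
        "process_version": 3,
        "parameters": {"temperature_C": 34.0, "pH": 7.0},
        "measurements": [
            {"time_h": 24, "titer_g_per_L": 0.8},
            {"time_h": 48, "titer_g_per_L": 2.1},
        ],
    },
]


def flatten_runs(runs):
    """Return one flat row per measurement, repeating the run's context on each row."""
    rows = []
    for run in runs:
        context = {
            "run_id": run["run_id"],
            "process_version": run["process_version"],
        }
        # Prefix parameters so they cannot collide with measurement columns.
        context.update(
            {f"param_{name}": value for name, value in run["parameters"].items()}
        )
        for measurement in run["measurements"]:
            rows.append({**context, **measurement})
    return rows


rows = flatten_runs(EXAMPLE_RUNS)
```

Each row in rows now holds both the outcome and the conditions that generated it, which is the shape most statistical and machine learning tools expect.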

Second, process and experimental designs can change and evolve along with your evolving processes while preserving their data structure. This means you can always keep stacking data set upon data set for large-scale aggregate analysis.

Third, all of your experimental data is instantly available from a web browser or any API-enabled analytical software application. You can easily import your statistical data frames into JMP, Tableau, Spotfire, Python, etc. Everyone in your organization has access to the same datasets and data analyses at all times.

With these core features, Riffyn removes the barriers to implementing a data digitization and AI strategy. Your R&D data can be queried and mined for insights rather than remaining buried within notebooks (electronic or paper).

If you’d like some advice about how to use your data to give your company a competitive advantage, drop us a line at advice@riffyn.com.

Douglas Williams