Managing Big Data in Healthcare

July 6, 2016
Life sciences companies have too much information—manually collected, logged and stored to adhere to the highest quality standards. Digital analytics can funnel just the right information for risk management.

The big problem with Big Data is that there is just too much of it, especially in the life sciences industry, where information is coming from all different directions, including R&D, manufacturing, clinical trials and even patient care.

To complicate matters more, the U.S. Food and Drug Administration (FDA) and the European Medicines Agency (EMA) are imposing new kinds of pressure in the form of good manufacturing practices that turn into regulations. For example, the latest guidance for the pharmaceutical industry, called continued process verification (CPV), requires the collection and analysis of end-to-end production and process data to ensure product outputs stay within predetermined quality limits.

Basically, the guidance issued by the FDA in 2011 and adopted in 2014 by the EMA as a “guideline for process validation” asks that pharmaceutical manufacturers provide ongoing verification of the performance of their production processes by ensuring a constant state of control throughout the manufacturing lifecycle. If the process is not in control, the guidance requires that corrective or preventive actions be taken.

While the concept is sound, it is problematic for pharmaceutical companies for two reasons: First, every process and methodology is documented and validated to meet regulatory requirements, and any change requires revalidation, an onerous task. Second, the life sciences industry is still very dependent on paper. To date, the use of statistical analysis on Big Data has been relegated to R&D and the drug approval process; analytics have not been applied in manufacturing.

“We have people in labs using our software to monitor analytical processes, but when we talk about the plant floor they just look in the air and shrug,” says Louis Halvorsen, chief technology officer of Northwest Analytics. But continued process verification is leading a push to use statistical process control to monitor manufacturing processes, he says.
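
To make that concrete, the sketch below shows roughly what statistical process control looks like on a series of batch results: control limits are estimated from historical in-control data, and any new batch that falls outside them is flagged for investigation. The values, the three-sigma rule and the data layout are illustrative assumptions, not Northwest Analytics' actual implementation.

```python
# Minimal sketch of statistical process control (SPC) for CPV-style monitoring.
# The measurements, limits and three-sigma rule are illustrative assumptions,
# not any vendor's actual implementation.
from statistics import mean, stdev

def control_limits(historical):
    """Estimate the center line and +/-3-sigma control limits from in-control history."""
    center = mean(historical)
    sigma = stdev(historical)
    return center - 3 * sigma, center, center + 3 * sigma

def flag_out_of_control(measurements, lcl, ucl):
    """Return the batches whose measured value falls outside the control limits."""
    return [(batch, value) for batch, value in measurements
            if value < lcl or value > ucl]

# Hypothetical assay results (percent of label claim) from validated batches.
history = [99.8, 100.1, 99.9, 100.3, 99.7, 100.0, 100.2, 99.6, 100.1, 99.9]
lcl, center, ucl = control_limits(history)

# New production batches to verify on an ongoing basis.
new_batches = [("B-101", 100.0), ("B-102", 99.5), ("B-103", 101.4)]
for batch, value in flag_out_of_control(new_batches, lcl, ucl):
    print(f"{batch} out of control at {value} (limits {lcl:.2f} to {ucl:.2f}): investigate")
```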

It’s all about making regulators happy. Companies that comply with CPV are seen in a favorable light by inspectors, which makes audits much easier. Still, compliance with CPV and data integrity have posed new challenges to the pharmaceutical industry. Just ask Donal Coakley, an associate scientist at Gilead Sciences in Cork, Ireland. The good news is that the process is now fairly painless, because Gilead Sciences uses Northwest Analytics software to perform CPV and show variations in processes. One example of how the software helps is in meeting the CPV guidance for predicting product shelf life. “The FDA says that if a company follows the guidance, it will be a short, friendly audit,” Halvorsen says. “But if your statistician has their own idea of predicting shelf life, [the FDA] will bring in their own experts to do an audit.”

Beyond that, applying Big Data analytics to the plant floor opens the door to more opportunities. Indeed, Big Data is quickly becoming a big deal in the life sciences industry because it can help improve quality, integrate IT and manufacturing operations, enable better forecasts, and even bridge the divide between R&D, manufacturing, clinical trials, patient health and the watchful eye of the FDA.

“There is a movement to expand the set of data that life sciences companies are looking at, not only to bring product to market, but to ensure it is safe and effective and potentially identify new opportunities,” says Matt Gross, director of the health and life sciences global practice at SAS Institute.

But there’s still a big obstacle to overcome: Where to start?

Big Data launch
Gilead Sciences is one of the pioneering companies using Big Data analytics in manufacturing. A few years ago, the company replaced its Microsoft Access database and Excel spreadsheets—which were used to manually track data—with Northwest Analytics statistical software, and integrated it with its laboratory information management systems (LIMS) and other applications.

But Gilead, like many other companies moving from manual data collection to automated analysis, struggled to start the deployment because of the sheer volume of data available. “The challenge we had is that there is a lot of data from different sources,” Coakley says. “And more data isn’t necessarily better.”

To start sifting out less important data, each department was first asked what it wanted to trend. Then a risk analysis was performed with manufacturing parameters scored based on the impact a variation would have on a product, the probability that variations would occur, and the ability to detect a meaningful variation at a meaningful control point. From there, the company applied the Northwest Analytics software only on the parameters and performance indicators that were both critical and showed variation.
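
A simplified sketch of that kind of risk ranking is shown below: each parameter gets an impact, probability and detectability score, the three are combined into a single risk number, and only parameters above a cutoff go onto the routine trending list. The 1-to-5 scales, the multiplication and the threshold are assumptions for illustration; the article does not spell out Gilead's exact scoring model.

```python
# Illustrative FMEA-style risk ranking for deciding which parameters to trend.
# The 1-5 scales, the multiplicative score and the threshold are assumptions.
parameters = [
    # name,                  impact, probability, detectability (higher = harder to detect)
    ("granulation moisture",      4,           3,             2),
    ("tablet hardness",           5,           2,             2),
    ("coating spray rate",        2,           2,             1),
    ("compression force",         3,           4,             3),
]

THRESHOLD = 20  # hypothetical cutoff separating "trend routinely" from "monitor only"

def risk_score(impact, probability, detectability):
    """Combine the three scores into a single risk priority number."""
    return impact * probability * detectability

critical = [(name, risk_score(i, p, d)) for name, i, p, d in parameters
            if risk_score(i, p, d) >= THRESHOLD]

for name, score in sorted(critical, key=lambda item: -item[1]):
    print(f"Trend {name}: risk score {score}")
```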

“We collect so much data from our process that it is not possible to analyze it all,” Coakley says. “Before CPV, we only looked at the data when we knew there was a problem. With CPV, we are alerted to adverse trends before they become problems.”

To identify adverse trends, Gilead uses an enterprise manufacturing intelligence (EMI) dashboard that trends the data and flags anything going out of trend with green, yellow or red color coding.
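
In code, that traffic-light logic can be as simple as the hedged sketch below: a point beyond a control limit turns red, while a run of consecutive points drifting to one side of the center line turns yellow before any limit is breached. The run length and limit values are assumptions, not the actual rules in Gilead's dashboard.

```python
# Hypothetical traffic-light logic for an EMI-style trending dashboard:
# red when a point breaches a control limit, yellow when a run of consecutive
# points drifts to one side of the center line (an adverse trend caught before
# it becomes a problem), green otherwise. The run length of 6 is an assumption.
def flag(series, center, lcl, ucl, run_length=6):
    latest = series[-1]
    if latest < lcl or latest > ucl:
        return "red"
    recent = series[-run_length:]
    if len(recent) == run_length and (all(v > center for v in recent)
                                      or all(v < center for v in recent)):
        return "yellow"
    return "green"

# Example: a slow upward drift that has not yet breached the control limits.
history = [99.9, 100.1, 100.2, 100.25, 100.3, 100.4, 100.5]
print(flag(history, center=100.0, lcl=99.0, ucl=101.0))  # -> "yellow"
```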

SAS’s Gross agrees that it is the quality of the data, not the quantity, that opens the door to new opportunities. In operations, it is all about risk-based monitoring and bringing data streams together to look for patterns.

Visualization is also very important. “You have to [present] the information in a way that you can see trends, outliers and patterns, and drill down to figure out what the question is you want to ask,” Gross says. “The goal is to reduce the time it takes to get the right question to the CEO.”

Of course, all of this gets a bit more difficult to do when you factor in the streaming data of the Industrial Internet of Things (IIoT).

Analytics at the edge
When it comes to quality control, companies must be able to capture data coming out of the production process, and also trace any problems back to a machine or even a specific tool on a machine. Until the addition of IIoT, the Big Data discussion circled around pulling data into a giant database and using distributed computing environments like Hadoop. But there has since been a shift, especially for machine builders, to include analytics at the lower levels and only push a small, relevant data set to the enterprise or cloud for further trending and analysis.
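
The pattern is straightforward to sketch in a vendor-neutral way: aggregate the raw readings at the machine and forward only a compact summary, plus any values that breach a limit, to the upper tier. The field names, the sample data and the forwarding rule below are illustrative assumptions rather than any particular product's behavior.

```python
# Generic edge-analytics pattern: summarize raw readings locally and forward
# only a small, relevant payload to the enterprise/cloud tier. The field names
# and the "forward exceedances only" rule are illustrative assumptions.
from statistics import mean

def summarize_for_upstream(readings, limit):
    """Reduce a batch of raw sensor readings to a compact summary plus exceptions."""
    exceedances = [r for r in readings if r["value"] > limit]
    return {
        "sensor": readings[0]["sensor"],
        "count": len(readings),
        "mean": round(mean(r["value"] for r in readings), 3),
        "max": max(r["value"] for r in readings),
        "exceedances": exceedances,   # only the problem points travel upstream
    }

# Hypothetical high-rate temperature samples from a filling line.
raw = [{"sensor": "filler-3-temp", "t": i, "value": 21.0 + 0.001 * i} for i in range(1000)]
payload = summarize_for_upstream(raw, limit=30.0)
print(f"{payload['count']} raw readings reduced to one {len(str(payload))}-character summary")
```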

Processing errors in machines can cause variations in production, but there is not always enough information about what’s happening with the machine. Beckhoff Automation set out to solve that problem with its TwinCAT Analytics tool—announced late last year and generally available by the end of this year—which stores all process-relevant data locally in the controller, pushing some data to a cloud-based server. It provides a complete temporal image of the process and the production data for a comprehensive condition analysis of the machine’s functions.

Inspection imagery of syringes and vials, captured with high-speed cameras and transferred to a PC, is displayed via custom HMI software developed by Particle Inspection Technologies.

Particle Inspection Technologies (PI-Tech), which custom designs vision software used to inspect pharmaceutical systems for manufacturing defects, is using Beckhoff technology to compare a product test image with a template image to discover differences that could indicate a defect. Using high-speed cameras and transferring data to a PC, PI-Tech can analyze scratches, cracks and dents in any container type, such as the vials and flanges used in parenteral packaging applications.
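
Conceptually, that comparison boils down to subtracting a known-good reference image from the test image and deciding whether too many pixels differ, as in the hedged NumPy sketch below. The thresholds and the in-memory test images are stand-ins; PI-Tech's actual vision algorithms are proprietary and certainly more sophisticated.

```python
# Simplified template-comparison defect check: subtract a known-good reference
# image from the test image and reject if too many pixels differ. The thresholds
# are illustrative assumptions, not PI-Tech's actual vision algorithm.
import numpy as np

def inspect(test_img, template_img, pixel_threshold=30, max_bad_pixels=50):
    """Return (passed, number of differing pixels) for a grayscale image pair."""
    diff = np.abs(test_img.astype(np.int16) - template_img.astype(np.int16))
    bad_pixels = int(np.count_nonzero(diff > pixel_threshold))
    return bad_pixels <= max_bad_pixels, bad_pixels

# Hypothetical 100x100 grayscale images; the "test" image has a scratch-like defect.
template = np.full((100, 100), 128, dtype=np.uint8)
test = template.copy()
test[40:42, 10:90] = 220   # simulated scratch

passed, bad = inspect(test, template)
print("PASS" if passed else "REJECT", f"({bad} differing pixels)")
```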

“It’s up to me to collect the data and analyze it in real time to basically get a rejection or a pass,” says Jerry Wierciszewski, an engineer for PI-Tech. “I’m also sending data for real-time collection to a database so that it can be retrieved down the road by scientists.”

The ability to transfer the right information and interface with other analytics programs (which TwinCAT Analytics and another new tool, the TwinCAT IoT suite, will do) is important to the big picture: sharing, and pairing, the data with other parts of the enterprise and beyond.

Ultimately, all data relates back to the patient. That means understanding what’s happening as a whole from R&D to manufacturing to clinical trials and drug approvals to hospitals, patients and even insurance.

But first, says Mike Flannagan, vice president of data and analytics at Cisco, “we need quick insight into a problematic trend.” IoT is changing business models in life sciences, among other industries, he adds. “We want to push [analytics] as close to the manufacturing point as possible to take evasive action. If it looks like we’re about to create bad pills, we want to stop it, which is why we are distributing processing all the way to the edge.”

Cisco’s role is to create a network infrastructure that enables integration of distributed data. A few years ago, Cisco acquired Truviso, which provides real-time streaming network data analysis. Last year, the network company acquired ParStream, which has an analytics database designed for large amounts of IoT data.

“ParStream can run on a lightweight server and manage multiple terabytes of data easily, so you can take a small footprint database and distribute it to the edge of the network where sensor data is generated,” Flannagan says. By storing the data locally and treating it as one big virtual database, life sciences companies have easy access to process data. “If you get audited, you can query records from dozens of databases across manufacturing facilities…pulling data together quickly.”
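
The federated-query idea Flannagan describes can be illustrated with nothing more than SQLite: run the same query against several small site databases and merge the rows, as in the sketch below. The file names, schema and audit query are invented for illustration and have nothing to do with ParStream's actual interface.

```python
# Generic illustration of querying several small edge databases as one virtual
# data set. The file names, schema and query are assumptions; this is not the
# ParStream API, just the federated-query idea behind it.
import sqlite3

EDGE_DBS = ["plant_a.db", "plant_b.db", "plant_c.db"]  # hypothetical edge sites

def seed(path, rows):
    """Create a tiny sample table so the example is self-contained."""
    with sqlite3.connect(path) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS batch_results "
                     "(batch_id TEXT, parameter TEXT, value REAL, in_spec INTEGER)")
        conn.execute("DELETE FROM batch_results")
        conn.executemany("INSERT INTO batch_results VALUES (?, ?, ?, ?)", rows)

seed("plant_a.db", [("A-01", "assay", 99.8, 1), ("A-02", "assay", 97.1, 0)])
seed("plant_b.db", [("B-11", "hardness", 8.2, 1)])
seed("plant_c.db", [("C-07", "moisture", 3.9, 0)])

def query_all(sql, params=()):
    """Run the same query against every edge database and merge the rows."""
    rows = []
    for path in EDGE_DBS:
        with sqlite3.connect(path) as conn:
            rows.extend((path, *row) for row in conn.execute(sql, params))
    return rows

# Example audit query: pull every out-of-spec batch record across all sites.
for site, batch_id, parameter, value in query_all(
        "SELECT batch_id, parameter, value FROM batch_results WHERE in_spec = 0"):
    print(site, batch_id, parameter, value)
```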

Now, let’s say that the drug is in clinical trials. What if you could make better clinical decisions before and after a product launch to bring safer products to market faster? As computing power gets faster, storage gets cheaper and advanced algorithms become more accessible, new tools are emerging that improve clinical trial monitoring and even factor in real-world evidence of patient treatment.

Welcome to the real world
Pharmaceutical companies have a difficult job managing and understanding the unstructured and complex data types coming from image files, video and instruments. In some cases, that data is generated not from manufacturing operations, but from clinical trials and real-world patient data, covering the drug's commercialization, the path of treatment and processed insurance claims, among other things.

Enter Big Data solutions supplier Saama Technologies, which recently released Fluid Analytics for Life Sciences, including a framework and pre-built analytic modules for specific use cases.

In the traditional world of manufacturing medicine, drugs are created, clinical trials are conducted, and then the drugs go through the FDA approval process, which can take years. Once a drug is released, its real-world effects might be different from what was seen in the lab, but there has been no way to get that feedback to the scientists. Saama's Fluid Analytics pulls the actual data about a drug coming from patient experiences and insurance claims.

Fluid Analytics has connectors to different data sources that enable users to act on exceptions to business rules. Advanced analytics methodologies include risk assessments of clinical trials and comparative drug effectiveness, while a dashboard can provide a view into drug treatments, including a timeline. Though this may seem far outside the realm of Big Data in manufacturing, ultimately everything will be connected.
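
At its core, acting on exceptions to business rules is rule-based filtering of incoming records; the minimal sketch below shows the shape of it. The rules, field names and sample records are invented for illustration and are not Saama's implementation.

```python
# Minimal sketch of acting on exceptions to business rules: each rule is a
# predicate plus a message, and records that trip a rule surface for follow-up.
# The rules and field names are invented for illustration, not Saama's logic.
RULES = [
    ("adverse event not reported within 24h",
     lambda r: r["event"] == "adverse" and r["report_lag_hours"] > 24),
    ("enrollment below plan at site",
     lambda r: r["enrolled"] < 0.8 * r["planned"]),
]

def exceptions(records):
    """Yield (record, reason) for every record that violates a business rule."""
    for record in records:
        for reason, predicate in RULES:
            if predicate(record):
                yield record, reason

trial_feed = [
    {"site": "S-12", "event": "adverse", "report_lag_hours": 30, "enrolled": 40, "planned": 45},
    {"site": "S-07", "event": "none", "report_lag_hours": 0, "enrolled": 20, "planned": 30},
]
for record, reason in exceptions(trial_feed):
    print(record["site"], "->", reason)
```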

“We’ll be looking at data coming in from clinical trials, looking at safety and efficacy, and bringing real-world data from healthcare and wearables into the analysis of what to look for in research,” Gross says.

More importantly, the big lesson in Big Data is that you need to filter the right information. As Gilead Sciences’ Coakley astutely notes again and again, “More data isn’t necessarily better, unless it’s relevant to the process.”

About the Author

Stephanie Neil | Editor-in-Chief, OEM Magazine

Stephanie Neil has been reporting on business and technology for over 25 years and was named Editor-in-Chief of OEM magazine in 2018. She began her journalism career as a beat reporter for eWeek, a technology newspaper, later joining Managing Automation, a monthly B2B manufacturing magazine, as senior editor. During that time, Neil was also a correspondent for The Boston Globe, covering local news. She joined PMMI Media Group in 2015 as a senior editor for Automation World and continues to write for both AW and OEM, covering manufacturing news, technology trends, and workforce issues.
