key: cord-032403-9c1xeqg1
authors: Sokolov, Michael
title: Decision Making and Risk Management in Biopharmaceutical Engineering—Opportunities in the Age of Covid-19 and Digitalization
date: 2020-09-08
journal: Ind Eng Chem Res
DOI: 10.1021/acs.iecr.0c02994
doc_id: 32403
cord_uid: 9c1xeqg1

In 2020, the Covid-19 pandemic resulted in a worldwide challenge without an evident solution. Many individuals and authorities involved came to appreciate the value of available data and established expertise for making decisions under time pressure. This omnipresent example is used to illustrate the decision-making procedure in biopharmaceutical manufacturing. This commentary addresses important challenges and opportunities to support risk management in biomanufacturing through a process-centered digitalization approach combining two vital worlds: formalized engineering fundamentals and data empowerment through customized machine learning. With many enabling technologies already available and first success stories reported, how and when the huge potential of the discussed technologies will be broadly and systematically realized will depend on the interaction of different groups of stakeholders.

In 2020, the world was shocked by the Covid-19 pandemic. In many countries, a large number of restrictive measures were introduced in order to reduce the rate and extent of the outbreak. The pandemic spread in a characteristic sequence from Asia to Europe and then to the rest of the world. The introduced country-specific interventions vary strongly in their severity and timeline across different, even neighboring, countries. During such a "lockdown" phase, 1 which in this case usually lasted more than two months, a major goal is to monitor the situation and to collect sufficient data. 2 In the era of digitalization and globalization, several organizations have been able to track trends based on a daily updated display of available worldwide information. 3, 4 This has been essential to plan next steps while reducing risks for the health system, such as operative or capacity bottlenecks. Despite different governmental strategies to address the problem, all of them rely strongly on the concept of social distancing, the adherence to which is difficult to monitor and control across society. Such limited control is often present when trying to solve a problem that is relatively new and subject to many potentially influential factors, some of which are difficult to quantify or to predict. During the Covid-19 crisis, it has become broadly evident how essential the availability of data is for decision making on complex problems, and how unstable such decision processes can become when the data are biased by uncertainty and a lack of prior expertise. 5

Although with a different complexity and effect on society, the biopharmaceutical industry faces an analogous uncertainty-driven environment on a daily basis in its workstream. 6 The biopharmaceutical sector is a strongly growing branch of the pharma industry 7 with prominent blockbuster therapeutic protein products such as Humira (adalimumab) and Rituxan (rituximab). 8 As one of its principal unit operations, this industry uses a biotechnological process based on a living organism to produce highly specific drugs targeting, for example, cancer, autoimmune, and orphan diseases. These bioprocesses are complicated to control and require many cycles of usually quite long experimental investigations.
Hence, this industry is driven by two opposing objectives: ensuring high drug quality and safety for patients, while competitively reducing time to market as well as process development and manufacturing costs. 9 Hundreds of potentially influential factors in the production process can be taken into account, and many tens of them are broadly monitored and controlled. 10 The main engineering challenges 9,11−13 are to (1) robustly control the behavior of the living organism involved in the process, (2) efficiently align the often heterogeneous data generated across different process units and scales, (3) include all available prior know-how and experience in the decision process, (4) reduce human errors and introduced inconsistencies, and (5) enable an automated and adaptive procedure to assess the critical process characteristics.

This commentary takes the Covid-19 pandemic as an illustrative example of decision making under uncertainty based on a daily increasing amount of available data and know-how. This example is used throughout the commentary to portray the decision-making challenges in the biopharmaceutical industry, with the key goal of reflecting on the potential of different digital data- and knowledge-driven solutions to support mastering the path toward the standards of Industry 4.0.

Complex problems can be solved efficiently through the support of relevant data and/or through sufficient experience in dealing with similar problems. In both cases, it is essential to evaluate how close prior data and knowledge are to the problem to be solved and how trustworthy they are. In the Covid-19 pandemic, strong biases are introduced into the data by, among other factors, the long incubation time of the virus (resulting in a delayed symptomatic response and delayed appearance in the databases), inconsistencies in fatality definitions, and incomplete testing across the population. 14, 15 This uncertainty is coupled with a lack of governmental and societal experience in dealing with pandemics of such magnitude. 14 Also, the contributions from the scientific community are still very heterogeneous in focus and suggestions. 16 Therefore, data is used as a relative trending method, while experience is gradually building up within and across countries.

In biopharma, the situation is usually much better, as there is less uncertainty in the acquired data and prior experience from developing or operating similar biopharmaceutical processes is available. Nonetheless, the level of uncertainty can be expected to be considerably larger than in the closely related small-molecule pharma and general chemistry sectors. Figure 1 presents the landscape of different data (in red) and expertise (in yellow) available in bioprocessing, together with the time needed to generate them and a tendency of their importance for decision making. Within the very heterogeneous field of data sources in bioprocessing, one important group is the information defined or available before the start of the process (or of a certain unit operation), such as the designed set points (the experimental design in development or the process design space in manufacturing), from here on referred to as Z variables. These variables are essential to define an optimal and robust operation strategy for the process, whereby metadata (e.g., information on operators, devices, and site specifications) and raw material information are often taken into account significantly less than the other variables.
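To make the notion of Z variables concrete, the following minimal sketch (in Python) shows one way such pre-process information could be collected in a single flat record alongside the designed set points; all field names and values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, asdict

@dataclass
class RunDesign:
    """Hypothetical Z-variable record for one bioreactor run:
    designed set points plus metadata and raw-material attributes."""
    run_id: str
    ph_setpoint: float       # pH set point
    temperature_c: float     # cultivation temperature [degC]
    feed_rate_ml_h: float    # feed-rate set point [mL/h]
    operator: str            # metadata: who ran the batch
    device_id: str           # metadata: bioreactor/skid identifier
    media_lot: str           # raw-material information: media lot number

design = RunDesign("R-017", 7.0, 36.5, 2.4, "op-A", "BR-03", "LOT-2020-14")
z_features = asdict(design)  # flat record, ready to join with X and Y data
print(z_features)
```

Keeping metadata and raw-material fields in the same record as the set points is what later allows their (often neglected) influence to be tested in a model at all.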
The variables labeled with a preceding X are dynamic process measurements obtained through different sensors (online) or through offline analytics. These are essential to monitor the process and provide a basis for control. Unlike variable-specific sensors such as pH probes, information from spectroscopic instruments (e.g., Raman or NIR) has to be linked to the variable of interest through a regression method, which requires additional calibration work. In particular, the profiles of selected characteristic variables for each unit operation are always considered in the decision process. The inclusion of all other sensor data depends on its importance for process control and on its direct availability from the historian, that is, on the possibility to directly align all available dynamic data sources, which often is not the case.

Finally, the variables quantifying the product quality are labeled with a Y symbol. These are essential to characterize the process outcome and to interconnect different unit operations, linking, for example, the impurities produced upstream in the bioreactor to the purification procedure in the subsequent downstream operations. These variables often require a complex analytical procedure lasting many hours to days. Eventually, after many days to a few weeks, the first data sets can be obtained for a unit operation; within months, information on several development cycles as well as on several unit operations can be available. After many months to several years, a complete development activity toward the manufacturing scale can be concluded.

In development and manufacturing, many professionals are involved who had to go through a long learning procedure of hands-on experience in the lab, working on multiple scales and on the production of different molecules, possibly utilizing different biological organisms, operation strategies, and devices. Such expertise in the complex underlying processes is built over several years. Similarly, the modeling experts capable of formalizing certain mechanistic process behavior and/or statistically deriving process interrelationships (chemometrics) require training of many months to a few years, and the generation of a new predictive model can itself take several months. All such expertise is usually tied to individuals and is not available in a generalized format to other team members. Because of significant time pressure in development and risk mitigation pressure in manufacturing, decisions are often made on an ad hoc basis in expert meetings where all readily available data, analysis results, and experience sources are taken into account, without ensuring consideration of all information hidden in the databases or of the potential of predictive models that are not automatically retrained or connected.

Hybrid modeling pursues the goal of synergistically combining available data and know-how, as highlighted in the graphical abstract figure. Thereby, the know-how is provided as the fundamental backbone, based on formalizing central process characteristics (e.g., mass balances) and interrelationships (e.g., characteristic ratios or dimensionless numbers) in broadly valid model equations. As explained in detail by von Stosch et al. 17 and Narayanan et al., 18 the available data is then used to fine-tune the model parameters to the considered use case and to flexibly adapt to different scenarios.
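As a minimal illustration of this structure, the sketch below couples a mechanistic mass-balance backbone (biomass and substrate balances) with a data-driven kinetic term. The Monod-like ground truth, the synthetic training data, and all parameter values are hypothetical stand-ins for what would normally be learned from historical runs.

```python
import numpy as np
from scipy.integrate import solve_ivp
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

# Data-driven part: the specific growth rate mu(S) is learned from (noisy)
# observations. Here the training pairs come from a synthetic Monod-like
# ground truth; in practice they would come from historical process data.
S_train = rng.uniform(0.1, 10.0, size=(200, 1))
mu_train = 0.5 * S_train[:, 0] / (1.2 + S_train[:, 0]) \
    + rng.normal(scale=0.01, size=200)

mu_model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000,
                        random_state=1)
mu_model.fit(S_train, mu_train)

# Mechanistic backbone: mass balances for biomass X and substrate S,
# with the unknown kinetics supplied by the learned model.
def hybrid_rhs(t, y, yield_coeff=0.4):
    X, S = y
    mu = mu_model.predict(np.array([[max(S, 0.0)]]))[0]  # learned kinetics
    dX = mu * X                    # biomass balance
    dS = -(mu / yield_coeff) * X   # substrate balance
    return [dX, dS]

# Simulate the dynamic profiles (X variables) from initial conditions (Z).
sol = solve_ivp(hybrid_rhs, (0.0, 48.0), [0.1, 9.0])
print("final biomass:", sol.y[0, -1])
```

The mass balances constrain the model to physically plausible trajectories even where data is scarce, while the learned kinetic term absorbs what is not mechanistically understood.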
Figure 2 compares the concept of hybrid modeling to the two standard modeling approaches, 19 namely, the purely data-driven (statistical) and the purely knowledge-driven (mechanistic) approach. From the perspective of a process expert assessing the process behavior, one can rely either on statistical methods, which require a lot of data to support decision making, or on deterministic methods, which can only be formalized if a large part of the behavior is well understood. The former approach is strictly limited by the amount of available data, owing to the large labor cost of each data point produced combined with the complexity of the data. 12, 20 The latter is limited by the generally available understanding of the complex unit operations as well as by the availability of an expert for each of them. 11, 12 Therefore, a solution that combines the formalization of the central know-how with flexible learning of the unknown remainder from the available data removes the need for either large data sets or the continuous involvement of a process expert in order to reliably derive important decision support. From the perspective of the decision-making stakeholders, such an algorithmic solution provides a trustworthy decision basis with less effort, that is, less time and labor to conduct experiments to create the central know-how and less expert labor time to correctly structure that information for decision making.

Figure 3 supports the explanation of the central role of hybrid modeling in improving trustworthiness and decision support in process design and failure detection. Standard design-of-experiments methods 21,22 try to explain the product information (Y) based on the designed process factors (Z) with a so-called "black-box" approach, that is, without specifically integrating mechanistic process information or the dynamic process information (X). Simplistically taking the Covid-19 example, this could be interpreted as trying to understand how certain imposed restrictions affect the final lethality of the period under these restrictions without considering any of the trackable societal behavior in the meantime. The X information therefore bears a central possibility to better understand characteristic dynamic scenarios and patterns, which might result in a different final outcome Y. Hybrid modeling enables a simulation, based on the integrated know-how, of how a process could evolve for given initial conditions Z. In a second step, this predicted evolution is linked, based on a historical model, 23 to the final product quality Y. Hence, while retaining the general goal of finding the optimal conditions Z to reach the target outcome Y_optimal, this procedure adds considerable certainty about the final outcome compared to the black-box approach, based on a knowledge-supported, projected architecture of possible bridges between process start and end. In the absence of long-standing experience in the problem field, such dynamic progression can also be simulated based on simple yet effective dynamic interrelationships or based on stochastic approaches, both of which are also being utilized in modeling the Covid-19 pandemic. 1, 24, 25 In the past few years, hybrid modeling has become increasingly popular in the bioprocessing domain, leading it to be considered a main new direction. 26
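The two-step Z → X → Y idea discussed above for Figure 3 can be sketched as follows: a simulator (in practice, a hybrid model such as the one sketched earlier) predicts dynamic profiles X from a candidate design Z, a regression model trained on historical runs links those profiles to the final quality Y, and a scan over candidate designs then suggests a promising Z. All functions, data, and coefficients below are synthetic placeholders, not a validated workflow.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)

# Step 1 stand-in: a simulator mapping design conditions Z = (feed,
# temperature shift) to a dynamic profile X(t).
def simulate_profiles(z, t=np.linspace(0, 48, 25)):
    feed, temp_shift = z
    return 10 * (1 - np.exp(-feed * t / 48)) + temp_shift * 0.1 * t

# "Historical" runs: designs Z, their profiles X, and the measured final
# product quality Y (all synthetic here).
Z_hist = rng.uniform([0.5, -2.0], [3.0, 2.0], size=(40, 2))
X_hist = np.array([simulate_profiles(z) for z in Z_hist])
y_hist = (X_hist[:, -1] * 0.3 - np.abs(Z_hist[:, 1]) * 0.5
          + rng.normal(0, 0.1, 40))

# Step 2: historical model linking dynamic profiles to final quality Y.
quality_model = Ridge(alpha=1.0).fit(X_hist, y_hist)

# Design-space scan: simulate candidate designs, rank by predicted quality.
candidates = rng.uniform([0.5, -2.0], [3.0, 2.0], size=(200, 2))
scores = quality_model.predict(
    np.array([simulate_profiles(z) for z in candidates]))
best = candidates[np.argmax(scores)]
print("suggested design Z:", best, "predicted Y:", scores.max())
```

Compared to a black-box Z → Y regression, the intermediate X profiles make each prediction inspectable: an implausible simulated trajectory flags a candidate design before any experiment is run.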
Hybrid modeling, in the context of therapeutic protein manufacturing models, has demonstrated its enabling potential in applications such as monitoring and forecasting, 27,28 control, 29,30 and optimization, 31,32 and also in downstream processing. 33

While real-world problems seldom exist in isolation, heterogeneity is often a governing factor in finding a solution. This means that despite the availability of some prior data and/or know-how, their alignment is complex due to structural or phenomenological differences. The characteristic sequence of the Covid-19 pandemic spread provides countries affected at later stages with the possibility to learn from the data of the previously affected ones. Such learning is obviously limited, as countries strongly vary in the organization and capacity of their health systems, population size and density, geographical location, etc. Nonetheless, certain effects such as characteristic symptoms, contagiousness, risk groups, and lethality could be identified even with limited or no own data available. Human beings possess a powerful cognitive ability for such knowledge transfer, while traditional optimization solvers usually lack such an ability in their search strategy. 34 In machine learning, such a concept is described as transfer learning, 35 where one "generic" part of the model, usually the first layers of a convolutional neural net, is learned from generally available data, and the data of a specific system is then used to fine-tune the model to that specific use case. 36

[Figure caption: Enabling possibility of hybrid modeling to learn process dynamics and support forecasting of final product quality. Such a two-step procedure enables complete simulation of process and product quality based on different process designs and optimization of the design space to reach optimal product quality.]

In many engineering sectors, including biopharmaceutical process development and operations, teams are exposed to new entities, such as new cell lines to be used in the bioprocess or new column materials for purification, or to abnormal effects, such as unusual levels or profiles of characteristic process variables in the manufacturing plant. Often, such situations must be managed under time pressure, and decisions are made on the readily available data and the best educated guesses of experts. It is an infrequent practice to rigorously include similar data and know-how from previous activities directly in the decision process, due to often severe levels of heterogeneity, which can result from (partially) different utilized devices, scales, and materials as well as from differently structured or quantified data. 37 A smart digital solution enabling automatic leverage of available prior information from heterogeneous sources by reliably deducing the transferable know-how could enable a tremendous breakthrough in supporting complex decision making in biopharma manufacturing. 38 The general structure of hybrid models is quite attractive for applying such a transfer learning concept in small-data environments, where the mechanistic backbone accounts for the major generic effects while the machine learning part enables fine-tuning based on the limited available data, for instance to a specific molecule. In process development, such an approach capitalizes strongly on all available information from previous development activities.
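The fine-tuning mechanics described above can be illustrated with a minimal PyTorch sketch, using a small fully connected network as a stand-in for the convolutional architectures cited above: generic layers are pretrained on plentiful data, frozen, and only the head is retrained on a small data set from the "new" system. All layer sizes, data, and the synthetic shift of the new system are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Generic layers capture behavior shared across systems;
# the head is fine-tuned to a specific, data-poor use case.
model = nn.Sequential(
    nn.Linear(5, 32), nn.ReLU(),   # generic feature layers
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1),              # use-case-specific head
)

def fit(model, X, y, params, epochs=200, lr=1e-2):
    opt = torch.optim.Adam(params, lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    return loss.item()

# Step 1: pretrain on plentiful "generic" data (synthetic here).
X_big = torch.randn(1000, 5)
y_big = X_big.sum(dim=1, keepdim=True)
fit(model, X_big, y_big, model.parameters())

# Step 2: freeze the generic layers, then fine-tune only the head on the
# small data set of the new molecule/system.
for layer in model[:4]:
    for p in layer.parameters():
        p.requires_grad = False

X_small = torch.randn(20, 5)
y_small = X_small.sum(dim=1, keepdim=True) + 0.5   # shifted "new" system
head_params = [p for p in model.parameters() if p.requires_grad]
print("fine-tune loss:", fit(model, X_small, y_small, head_params))
```

In a hybrid model, the mechanistic backbone plays the role of the frozen generic layers, so only the small data-driven remainder has to be relearned for each new molecule.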
Such an approach could support not only pharma companies but particularly also contract manufacturing organizations (CMOs), whose business model scales even more with delivering on time and which are exposed to a large level of diversity. Of course, a beneficial implementation of broadly applicable transfer learning must go through a rigorous digitalization and integration of all data archives, 39 which requires tremendous preparation and investment. 40 Other industries, such as finance, have demonstrated the impact and potential of such a digital transformation. 41 Although hybrid modeling has not been reported in direct connection with transfer learning in biomanufacturing, several transfer use cases based on data-driven techniques have already been conducted to further explore and adapt model-supported transfer learning in biopharma. Examples include extrapolation from low- to high-performing conditions with hybrid models, 18 as well as cross-scale prediction and cross-molecule prediction with multivariate 42−44 and with adaptive machine learning techniques. 45

It goes without saying that, in a situation such as the Covid-19 pandemic, a trustworthy forecast of the near future would be priceless. Thereby, one has to note that it is not only the knowledge of the future evolution that counts but also the underlying understanding of its relation to the introduced regulations. This understanding is vital to make a solid decision among many potential alternatives. Hence, taking the definitions in Figure 3, at each point in time one would like to understand how changes in Z affect the process outcome Y, and which combination of Z is optimal to reach the desired outcome Y_optimal. While this theoretically represents a classical optimization problem, in the process development lab and on the manufacturing floor the different teams require a practically relevant representation of such a solution, connected as much as possible to their daily workstream and mindset as well as to the associated decision-making process. The possibility to simulate different future scenarios and compare the results must therefore be presented in a visually comprehensible, tangible, manageable, and transferable form.

Figure 4 presents the added value generated by different levels of technological complexity and integration of digital solutions in biopharmaceutical manufacturing. As highlighted in Figure 4, even further value can be created not only if the potential of hybrid modeling and transfer learning is accessible through a practically designed digital twin but also if such a digital twin is directly connected to the process and becomes an active stakeholder in the decision-making process. 46 Being set up in real-time connection to all data-generating devices, to all accessible and continuously learning models, and to the process control layer, such a digital twin can not only provide predictive-model-based real-time alerts but also automatically take actions based on optimization across different scenarios. 47 While at the process development level such digital-twin-based controls could be used for efficient process design, at the manufacturing level predictive quality and predictive manufacturing are likely to be the central applications. It is important to highlight that such digital twins should be realized across all interconnected unit operations, to enable communication, scheduling, and optimization across the entire plant. 48
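The control-loop character of such a connected digital twin can be sketched as follows. Here read_latest_measurements, predict_final_titer, and adjust_feed_rate are hypothetical stubs standing in for the historian feed, a (hybrid) forecasting model, and the process control layer; the target band and all numbers are arbitrary illustration values.

```python
import random

random.seed(7)

# Stub: in reality, values streamed from online sensors via the historian.
def read_latest_measurements(t):
    return {"time_h": t, "vcd": 10 + 0.5 * t + random.gauss(0, 0.3)}

# Stub for a forecasting model trained on past runs: extrapolates the
# observed trend to a predicted final titer.
def predict_final_titer(history):
    slope = (history[-1]["vcd"] - history[0]["vcd"]) / max(len(history) - 1, 1)
    return 2.0 + 0.4 * slope * 48

# Stub for the process control layer.
def adjust_feed_rate(delta):
    print(f"  -> control action: change feed rate by {delta:+.2f} mL/h")

TITER_TARGET = 6.0
history = []
for t in range(12):                            # hourly supervision loop
    history.append(read_latest_measurements(t))
    forecast = predict_final_titer(history)
    if forecast < TITER_TARGET * 0.9:          # predictive-quality alert
        print(f"t={t} h: forecast {forecast:.2f} g/L below target band")
        adjust_feed_rate(+0.1)                 # automated corrective action
```

The essential point is the closed loop: measurements update the forecast at every cycle, and the twin acts (or alerts) before the final quality measurement exists.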
Another very important application in bioprocessing is the smart operation of parallel high-throughput experimental systems. Here, such digital twins can efficiently learn across all ongoing operations and reduce, in real time, the redundant information, while consistently redesigning the experiments to provide further knowledge. 49, 50 In such cases, experimental systems and digital twins must actively collaborate on simultaneously improving process understanding as well as the process itself.

AND REGULATED INDUSTRIES

If human health or even survival is affected by a decision process, the decisions must be accurately documented, validated, and surveilled. In biopharmaceutical manufacturing, health authorities impose stringent regulations on the process design to ensure consistent product quality. Thereby, regulations such as the quality guidelines of the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use 51 also actively incentivize the utilization of model-based solutions to support understanding and operation of the complex processes. The stabilization of decisions enabled by smart digital solutions, through robust learning from previous know-how and data, should be positively embraced by drug producers as well as health authorities. However, in manufacturing operations that are based on decisions either actively introduced or supported by such models, a detailed assessment of these smart digital solutions is required. This will inevitably result in a critical confrontation of smart manufacturing procedures and smart humans. 52 It can be expected that a growing number of companies will increasingly utilize advanced predictive solutions besides the commonly utilized static multivariate techniques, which, given their linear nature, are much simpler to validate for Good Manufacturing Practice (GMP) use. This experience will very likely provide more clarity on the limits of the complexity that can be introduced into such smart digital solutions while ensuring transparency and traceability for health authorities, but also on the general filing procedure for a highly interconnected, digital-twin-supervised manufacturing facility.

The author is cofounder and COO of DataHow AG, an internationally active spin-off company from ETH Zurich. The company specializes in process digitalization, data analytics, and modeling, with a particular focus on the biopharmaceutical domain. He also holds a lecturer position in statistics for chemical engineers at ETH and continues collaborating on academic projects in his field of expertise.

This invited contribution is part of the I&EC Research special issue for the 2020 Class of Influential Researchers. The author expresses deep gratitude to his colleagues and collaborators for jointly developing the vision on the potential of digitalization in bioengineering expressed in this commentary.

REFERENCES

The Cost of Staying Open: Voluntary Social Distancing and Lockdowns in the US. SSRN Electronic Journal
How Will Country-Based Mitigation Measures Influence the Course of the COVID-19 Epidemic?
(4) WHO. Coronavirus Disease
A Fiasco in the Making? As the Coronavirus Pandemic Takes Hold, We Are Making Decisions without Reliable Data. GV Wire
Fed-Batch and Perfusion Culture Processes: Economic, Environmental, and Operational Feasibility under Uncertainty
The Market of Biopharmaceutical Medicines: A Snapshot of a Diverse Industrial Landscape
Biopharmaceutical Benchmarks
Evolving Trends in MAb Production Processes
Production of Protein Therapeutics in the Quality by Design (QbD) Paradigm. Top
Engineering Challenges in Therapeutic Protein Product and Process Design
Big Data in Biopharmaceutical Process Development: Vice or Virtue?
Workflow for Criticality Assessment Applied in Biopharmaceutical Process Validation Stage 1
Covid-19 - Navigating the Uncharted
COVID-19 in Italy: Momentous Decisions and Many Uncertainties. The Lancet Global Health
The Scientific Literature on Coronaviruses, COVID-19 and Its Associated Safety-Related Research Dimensions: A Scientometric Analysis and Scoping Review
Hybrid Semi-Parametric Modeling in Process Systems Engineering: Past, Present and Future
A New Generation of Predictive Models: The Added Value of Hybrid Models for Manufacturing Processes of Therapeutic Proteins
Model-Based Methods in the Biopharmaceutical Process Lifecycle
Identification of Manipulated Variables for a Glycosylation Control Strategy
Application of Quality by Design to the Characterization of the Cell Culture Process of an Fc-Fusion Protein
Enhanced Process Understanding and Multivariate Prediction of the Relationship between Cell Culture Process and Monoclonal Antibody Quality
Insights into the Dynamics and Control of COVID-19 Infection Rates
Early Dynamics of Transmission and Control of COVID-19: A Mathematical Modelling Study
Hybrid Modeling for Quality by Design and PAT-Benefits and Challenges of Applications in Biopharmaceutical Industry
Hybrid Modeling as a QbD/PAT Tool in Process Development: An Industrial E. Coli Case Study
Hybrid-EKF: Hybrid Model Coupled with Extended Kalman Filter for Real-time Monitoring and Control of Mammalian Cell Culture
Quality by Control: Towards Model Predictive Control of Mammalian Cell Culture Bioprocesses
A General Hybrid Semi-Parametric Process Control Framework
Hybrid Metabolic Flux Analysis/Data-Driven Modelling of Bioprocesses
Systematic Interpolation Method Predicts Protein Chromatographic Elution from Batch Isotherm Data without a Detailed Mechanistic Isotherm Model
Insights on Transfer Optimization: Because Experience Is the Best Teacher
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
Role of Knowledge Management in Development and Lifecycle Management of Biopharmaceuticals
Machine Learning: Overview of the Recent Progresses and Implications for the Process Systems Engineering Field
Provable Data Integrity in the Pharmaceutical Industry Based on Version Control Systems and the Blockchain
Organizational Transformation for Sustainable Development: A Case Study, Management of Permanent Change
Digital Finance and FinTech: Current Research and Future Research Directions
Cross-Scale Predictive Modeling of CHO Cell Culture Growth and Metabolites Using Raman Spectroscopy and Multivariate Analysis
Line and Real-Time Prediction of Recombinant Antibody Titer by in Situ Raman Spectroscopy
Sequential Multivariate Cell Culture Modeling at Multiple Scales Supports Systematic Shaping of a Monoclonal Antibody Toward a Quality Target
A Machine-Learning Approach to Calibrate Generic Raman Models for Real-Time Monitoring of Cell Culture Processes
Accelerating Biologics Manufacturing by Modeling or: Is Approval under the QbD and PAT Approaches Demanded by Authorities Acceptable without a Digital-Twin? Processes
Processwide Control and Automation of an Integrated Continuous Manufacturing Platform for Antibodies
Integrated Optimization of Upstream and Downstream Processing in Biopharmaceutical Manufacturing under Uncertainty: A Chance Constrained Programming Approach
Online Optimal Experimental Re-Design in Robotic Parallel Fed-Batch Cultivation Facilities
Monitoring Parallel Robotic Cultivations with Online Multivariate Analysis
Commentary: The Smart Human in Smart Manufacturing