key: cord-1046011-2ezzrbmj authors: Kang, SungKu; Jin, Ran; Deng, Xinwei; Kenett, Ron S. title: Challenges of modeling and analysis in cybermanufacturing: a review from a machine learning and computation perspective date: 2021-08-04 journal: J Intell Manuf DOI: 10.1007/s10845-021-01817-9 sha: c18d6221a51e21138dbfa5fc4d459fdb84170778 doc_id: 1046011 cord_uid: 2ezzrbmj

In Industry 4.0, smart manufacturing is facing its next stage, cybermanufacturing, founded upon advanced communication, computation, and control infrastructure. Cybermanufacturing will unleash the potential of multi-modal manufacturing data and provide a new perspective, called computation service, as a part of service-oriented architecture (SOA), where on-demand computation requests throughout manufacturing operations are seamlessly satisfied by data analytics and machine learning. However, the complexity of information technology infrastructure leads to fundamental challenges in modeling and analysis under cybermanufacturing, ranging from information-poor datasets to a lack of reproducibility of analytical studies. Nevertheless, existing reviews have focused on the overall architecture of cybermanufacturing/SOA or its technical components (e.g., communication protocols), rather than the potential bottleneck of computation service with respect to modeling and analysis. In this paper, we review the fundamental challenges with respect to modeling and analysis in cybermanufacturing. Then, we introduce the existing efforts in computation pipeline recommendation, which aims at identifying an optimal sequence of method options for data analytics/machine learning without time-consuming trial-and-error. We envision computation pipeline recommendation as a promising research field to address the fundamental challenges in cybermanufacturing. We also expect that computation pipeline recommendation can be a driving force toward flexible and resilient manufacturing operations in the post-COVID-19 industry.

In the era of Industry 4.0, advanced communication, computation, and control infrastructures are integrated into cyber-physical systems (CPS) incorporating a network of multiple manufacturing systems. This integration gives rise to the next paradigm of smart manufacturing, called cybermanufacturing. Conventional smart manufacturing has relied on traditional data-driven decision-making methods (e.g., statistical process control) to improve the performance of individual manufacturing systems. On the other hand, the next stage of smart manufacturing, cybermanufacturing, is founded upon a CPS that couples a physical entity (e.g., physical equipment with sensor systems) with its corresponding cyber entity (e.g., a computer simulation model of the physical system, also referred to as a digital twin (Kendrik et al. 2020)). Thus, the concept of digital twins can be closely integrated at all scales and levels (i.e., from the individual equipment level to systems and networks of equipment) in the context of manufacturing operations, although the application of the general concept of the digital twin is not restricted to cybermanufacturing. In cybermanufacturing, the behavior of physical entities can be sensed, predicted, and controlled in a timely manner via their cyber entities. Cybermanufacturing is thus the next stage of smart manufacturing, and a significant leap from conventional smart manufacturing.
Specifically, cybermanufacturing allows recording and registering all manufacturing events to improve the availability of manufacturing data, and provides ubiquitous computation resources and autonomous robotics to improve manufacturing efficiency and flexibility. With state-of-the-art machine learning methods, cybermanufacturing is expected to unleash the power of multi-modal manufacturing data, thus enhancing manufacturing efficiency, product quality, flexibility, and domain knowledge discovery while reducing cost. For example, case studies report on applications of cybermanufacturing that innovate important sectors of manufacturing, including manufacturing logistics, anomaly detection (Stojanovic et al. 2016; Pagani et al. 2020; Yan et al. 2020), and quality improvement (Oliff and Liu 2017). In addition, not only large manufacturing corporations like Stanley Black & Decker (Cisco 2019) and Siemens (Siemens 2019), but also small/medium-sized companies can benefit from cybermanufacturing. For example, a recent study has demonstrated the feasibility of adopting cybermanufacturing in small/medium-sized spring manufacturing companies by utilizing affordable sensor systems (Kuo et al. 2017). In other words, the transition into cybermanufacturing is not only a matter of maintaining competitiveness, but is also becoming essential for survival in Industry 4.0. In the meantime, disruptive events, such as the COVID-19 pandemic, are calling for a manufacturing industry that is more resilient to unexpected disruptions of facilities, supply chains, and human resources (Loayza and Pennings 2020; Ivanov and Dolgui 2020). This accelerates the adoption of data-driven decision-making methods in manufacturing, so that the majority of manufacturing decisions in production, supply chain, quality engineering, and reliability engineering can be made automatically. This will allow the workforce to focus more on insight-intensive and situation-dependent tasks, such as business planning and new process/product development, rather than basic and repetitive tasks (Chen and Jin 2017). In this direction, the concept of computation services in manufacturing has been proposed to describe a situation where manufacturing data is automatically collected and processed in ubiquitous computation units, such as Cloud and Fog nodes (Zhang et al. 2019). The goal is to provide real-time or online computation results and meet on-demand computation requests from manufacturing processes, systems, and users for decision-making. Thus, only minimal labor-intensive training and tuning effort should be required to use data analytics and machine learning methods effectively in such a computation service.

In this paper, we focus on the "modeling and analysis" type of problems in manufacturing (Van Luttervelt et al. 1998; Shi 2006), which has served as the foundation for process monitoring, root-cause diagnosis, and control (Monostori and Prohaszka 1993; Duffie and Malmberg 1987; Prabhu and Duffie 1995). Specifically, we review the modeling and analysis methods in smart manufacturing and discuss the fundamental challenges of using those methods in cybermanufacturing. Other important research topics, such as sensor systems, computation infrastructure, production planning, and control/robotics, will not be discussed in this paper. For more detailed information about the advancement of sensor and robotics technologies and human-machine interaction, the readers are referred to recent reviews (Schütze et al. 2018; Bahrin et al. 2016; Kendrik et al. 2020; Cadavid et al. 2020; Derigent et al. 2020; Ansari et al. 2018; Paelke et al. 2020).
To streamline computation service in cybermanufacturing, it is necessary to identify a proper configuration of data analytics and/or machine learning methods that provides the best computation performance for a contextualized computation task, such as variation analysis and anomaly detection. Here, one effective configuration is called a "computation pipeline", which is a sequence of method options involving data collection, data preprocessing, data analytics methods, etc. (Sparks et al. 2017). Adopting the concept of a computation pipeline comprehensively shows how the data are sourced and processed to generate computation outcomes. In cybermanufacturing, efficient identification of a proper computation pipeline for a given contextualized computation task ensures the resiliency of cybermanufacturing by improving the flexibility of the configuration of machine learning methods. In this paper, we explore recent advancements in selecting computation pipelines, focusing especially on computation pipeline recommendation.

The remainder of this work is organized as follows. Section 2 provides an overview of smart manufacturing with respect to modeling and analysis. Section 3 focuses on a systematic review of the challenges of modeling and analysis in cybermanufacturing. Then, we review the computation services and computation pipelines as a promising research area in cybermanufacturing in Sect. 4. Lastly, we summarize the contribution of this paper with some discussions in Sect. 5. We envision that active investigation of computation pipeline selection/recommendation will catalyze the adoption of cybermanufacturing by providing rapid and effective decision-making in the post-COVID-19 industry.

The third industrial revolution involved the use of simple embedded systems, such as sensors and programmable logic controllers, to achieve an unprecedented level of automation in manufacturing. With the extensive use of embedded systems in the manufacturing industry, the third industrial revolution significantly improved the throughput, efficiency, and product quality of the entire manufacturing industry, while considerably reducing human labor. This innovation opened the era of "smart manufacturing", which is capable of utilizing sensor data to enable data-driven decision-making (e.g., variation analysis and anomaly detection) and then precisely controlling the equipment with numerical controllers (Kenett et al. 2019). However, smart manufacturing has experienced a stagnant period due to its fundamental limitations, which calls for the transition to its next stage: cybermanufacturing. In the rest of this section, we revisit the progression of modeling and analysis under smart manufacturing in Sect. 2.1. Then, we review the limitations of conventional smart manufacturing, and the advancements of enabling technologies that stimulate the transition into cybermanufacturing, in Sect. 2.2.

[Fig. 1: The progression of modeling and analysis methods in the manufacturing industry]

In the manufacturing industry, modeling and analysis involve quantifying the relationship between process variables and manufacturing performance variables (e.g., product quality), as well as the inter-relationships among process variables, for accurate prediction of the results of manufacturing operations (Van Luttervelt et al. 1998).
Modeling and analysis play a central role in manufacturing, providing the foundation for process monitoring, root-cause diagnosis, and control (Monostori and Prohaszka 1993; Duffie and Malmberg 1987; Prabhu and Duffie 1995). The modeling and analysis methods in manufacturing have evolved in accordance with the advancement of domain knowledge and technologies. Figure 1 shows the progression of modeling and analysis methods in the manufacturing industry. Motivated by manufacturing modeling needs, first-principles modeling was developed first for modeling and analysis, based on physical principles from thermodynamics, fluid mechanics, etc. This approach involves deriving exact solutions of ordinary/partial differential equations (ODEs/PDEs) (e.g., the Navier-Stokes equations) or solving an approximation of the ODEs/PDEs via numerical methods (e.g., finite element analysis, FEA) for efficiency (Meguid et al. 1999; Roberts et al. 2009; Wen et al. 2018). However, first-principles modeling is limited when there is a significant gap between the assumptions of the physical principles (e.g., ideal boundary conditions) and the actual manufacturing conditions. Therefore, the statistical design of experiments (DOE) became an active research area for modeling and analysis in manufacturing. Originating from the design and analysis of agricultural experiments (Fisher 1919), DOE has been widely applied to manufacturing applications and computer simulations (Box et al. 2005; Wu and Hamada 2011; Santner et al. 2003; Kenett et al. 2013). The design and analysis of physical experiments provide valuable information about the manufacturing process under controllable settings, while the design and analysis of computer experiments can provide information on the process when physical experiments are too costly or even impossible to conduct. Founded upon DOE, computer model calibration has been advanced (Kennedy and O'Hagan 2001; Oakley and O'Hagan 2002; Higdon et al. 2008), such that first-principles models can be calibrated with observational process data to provide more accurate results (a minimal illustration is sketched below). In addition, DOE has also contributed to modeling and analysis via (1) variable screening, (2) treatment comparison, (3) uncertainty quantification, (4) system optimization, and (5) system robustness (Wu and Hamada 2011).

In the last few decades, the paradigm of smart manufacturing has significantly advanced sensing technologies, which allows the incorporation of in situ process variables in modeling and analysis. This advancement has enabled the compensation of modeling and analysis methods with in situ process data to improve product quality and efficiency, as well as the automation of manufacturing processes. An example is the compensation of machine tool positioning errors based on in situ sensor data from multiple thermal sensors (Yang et al. 1999; Yang and Ni 2005). In the meantime, the advancement of sensing technology also provides a data-rich environment, where a large amount of data can be collected from heterogeneous manufacturing systems in multiple modalities. Specifically, the manufacturing industry is known to generate more data than any other industry, so efficiently utilizing these data has become one of the most important success factors in the manufacturing industry (Hartmann et al. 2015). There are two promising directions for utilizing manufacturing data: data fusion and machine learning methods.
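As a minimal illustration of the computer model calibration idea mentioned above, the following sketch (an assumed toy example in Python, not taken from the cited studies) calibrates an unknown coefficient of a simple physics-inspired model against noisy observational data by least squares:

```python
# A minimal, hypothetical sketch of computer model calibration: a simple
# "first-principles" model with an unknown parameter is calibrated against
# observational process data by least squares. The model and data are toy
# assumptions for illustration only.
import numpy as np
from scipy.optimize import least_squares

def simulator(x, theta):
    """Toy physics-based model: predicted thermal drift as a function of
    spindle temperature x and an unknown expansion coefficient theta."""
    return theta * x + 0.5 * np.sqrt(x)

# Synthetic "observational" data standing in for in situ measurements.
rng = np.random.default_rng(0)
x_obs = np.linspace(20.0, 60.0, 30)                       # temperature readings
y_obs = simulator(x_obs, theta=0.12) + rng.normal(0, 0.05, x_obs.size)

def residuals(theta):
    # Mismatch between the simulator prediction and the observed drift.
    return simulator(x_obs, theta[0]) - y_obs

fit = least_squares(residuals, x0=[0.05])
print(f"calibrated coefficient: {fit.x[0]:.3f}")          # close to the true 0.12
```

In the spirit of Kennedy and O'Hagan (2001), a full Bayesian calibration would additionally model the discrepancy between simulator and reality and quantify parameter uncertainty, which this sketch omits.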
Data fusion refers to methods of integrating different models and data sources (e.g., a first-principles model and a data-driven model) (Zhao et al. 2011) or different types of data sets (e.g., observational data and DOE data) (Dalla Valle and Kenett 2018). In the manufacturing industry, data fusion has demonstrated significant advantages in modeling and analysis by utilizing data with different sources, formats, or distributions (Zhao et al. 2011; Jin and Deng 2015; Deng and Jin 2015; Sun et al. 2017). On the other hand, machine learning refers to methods of building a mathematical model based on data, such that the model can make predictions or decisions without being explicitly programmed to do so. Given abundant data, machine learning methods have shown superior performance in modeling complex manufacturing processes in the past few years. Recently, with the advancement of computation infrastructure and the increased availability of manufacturing data, machine learning and artificial intelligence (AI) algorithms, such as deep learning, are actively adopted in the manufacturing industry (Wang et al. 2018a). The most notable feature of deep learning is that its multiple hidden layers allow it to learn important features directly from complex manufacturing data (e.g., image or temporal data), without expert domain knowledge. For example, the convolutional neural network (CNN) and its variants, originally designed for image analysis, are well suited for the analysis of image data from manufacturing processes. The CNN has been widely adopted for product quality inspection and fault assessment in manufacturing processes (Park et al. 2016; Weimer et al. 2016) (a minimal sketch is given below). Another example is the recurrent neural network (RNN); the RNN and its variants, originally designed for the analysis of temporal behavior, are suitable for predictive analysis based on historical records. Therefore, RNNs have been widely adopted for prognosis in manufacturing processes (Malhi et al. 2011; Zhao et al. 2018). In the meantime, a known limitation of such AI algorithms is that they are mostly black-box models in nature with highly nonlinear model structures, and thus not interpretable to humans (Carvalho et al. 2019). Recent trends in the machine learning community investigate model interpretability to improve black-box models, such as the works on explainable artificial intelligence (XAI) (Arrieta et al. 2020) and knowledge-based artificial neural networks (KBANN) (Coatanéa et al. 2018). Such advancements are promoting a trend to develop machine learning methods for modeling and analysis in manufacturing (Dagli 2012). In this context, data fusion and machine learning methods are expected to play crucial roles in cybermanufacturing.

Despite decades of success of smart manufacturing, conventional smart manufacturing has the following fundamental limitations. First, the scalability of data fusion and machine learning methods is limited, as the manufacturing systems and cyber resources (e.g., computation resources, data storage) are not connected to each other. In addition, the manufacturing systems are not operated in a synergistic manner, nor do they jointly use the data collected from different manufacturing systems. Second, conventional smart manufacturing cannot fully take advantage of state-of-the-art machine learning models nor use the potential of multi-modal manufacturing data, as the computation capability of simple embedded systems is limited.
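To make the CNN-based inspection idea above concrete, the following is a minimal, hypothetical PyTorch sketch of a small CNN that classifies grayscale surface images as defective or non-defective; the architecture, image size, and class labels are illustrative assumptions, not models reported in the cited works:

```python
# Hypothetical illustration: a small CNN for classifying surface images as
# "defect" vs. "no defect". Layer sizes and input resolution are assumptions
# made only for this sketch.
import torch
import torch.nn as nn

class DefectCNN(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        # Two convolution/pooling stages extract spatial features from the image.
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # A small classifier head maps the pooled features to class scores.
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):
        x = self.features(x)          # (batch, 32, 16, 16) for 64x64 inputs
        x = torch.flatten(x, 1)
        return self.classifier(x)

if __name__ == "__main__":
    model = DefectCNN()
    batch = torch.randn(8, 1, 64, 64)  # stand-in for 64x64 grayscale inspection images
    scores = model(batch)
    print(scores.shape)                # torch.Size([8, 2])
```

In practice such a network would be trained on labeled inspection images; the random tensor here only verifies the tensor shapes.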
While utilizing external computation resources (e.g., cloud computing services) can be an option to alleviate the limited computation capability of embedded systems, it is not a dependable solution on its own, since its reliability is highly affected by network conditions. This limits the efficiency and reliability of modeling and analysis under conventional smart manufacturing, especially when the given contextualized computation task is computationally demanding and/or requires responsive and reliable decision-making. Third, the flexibility (i.e., the ability to be re-configured) of the entire system is limited, as conventional smart manufacturing systems typically assume a largely fixed operational configuration (e.g., production plan, facility layout, etc.). This incurs extensive cost (i.e., additional human labor, downtime, and considerable lead time to stabilize the new configuration) when an adjustment or variant management is required due to disruptive events. In this context, there has been a pressing need to innovate smart manufacturing via enhanced communication, computation, and control infrastructure.

In recent years, CPS has been actively adopted in the manufacturing industry to innovate important sectors of manufacturing (e.g., manufacturing logistics, anomaly detection (Stojanovic et al. 2016; Pagani et al. 2020; Yan et al. 2020), and quality improvement (Oliff and Liu 2017)), aiming at addressing the aforementioned limitations (Tao et al. 2019). By incorporating CPS, cybermanufacturing is expected to enhance intelligent production planning, predictive maintenance, flexible/autonomous manufacturing processes, and human-machine integration (Jeschke et al. 2017). There are three enabling technologies for cybermanufacturing that form the infrastructure to support the future generation of machine learning and computation in manufacturing (Gorecky et al. 2014; Xiong et al. 2015; Tao et al. 2019; Jaloudi 2019).

Communication Infrastructure There have been two important advancements in communication infrastructure. First, the advancement of wireless communication technologies, such as 5G and Wi-Fi 6, enables seamless connection among manufacturing systems, which allows efficient data exchange. Second, standardized industrial communication protocols, such as OPC UA, support interoperable machine-to-machine communication (Profanter et al. 2019). Coupled with the Time-Sensitive Networking (TSN) standard, these protocols will allow deterministic real-time machine-to-machine communication with low latency, which is one of the most important features of cybermanufacturing (Arestova et al. 2021).

Computation Infrastructure There have been two important advancements in computation infrastructure. First, the capabilities of computation resources (e.g., graphics processing units (GPU) and tensor processing units (TPU)) have been significantly improved to afford computation-intensive tasks (e.g., analyzing video streams for inspection). Second, combined with the communication infrastructure, such computation resources become ubiquitous, and are thus accessible via the network as Cloud nodes (high performance, but high latency due to the remote connection) as well as Edge or Fog nodes (lower performance, but low latency due to proximity).

Control Infrastructure There have been two important advancements in control infrastructure. First, the advancement of autonomous robotic systems (e.g., autonomous mobile robots (AMR) (Fragapane et al. 2020) and automated guided vehicles (AGV) (Mehami et al. 2018)) enhances the flexibility of production to respond to dynamic markets. Second, the advancement of human-machine integration technologies (e.g., exoskeletons (Huysamen et al. 2018) and augmented reality (Paelke 2014)) allows efficient collaboration between humans and machines to achieve high productivity as well as responsive decision-making.
Cybermanufacturing with advanced communication, computation, and control infrastructures is expected to enhance product quality/reliability, product realization efficiency, personalization, and domain knowledge discovery. However, cybermanufacturing faces several challenges of modeling and analysis with respect to machine learning and computation. Figure 2 shows an overview of the challenges. In this section, we review five important challenges of modeling and analysis in cybermanufacturing: information-poor datasets, computation complexity, incomplete engineering knowledge, a lack of systematic understanding of the borders among different machine learning methods, and a lack of reproducibility.

In this section, we make the distinction between data quality and information quality. Data quality is an intrinsic feature of the collected data. Information quality reflects the analysis and the management of its outcomes, conditioned on the goals of the analysis. Data quality has been recognized as a serious challenge by statisticians and data analysts, since almost all data require some pre-processing (e.g., filtering or cleaning) before they can be useful for a certain type of modeling and analysis via the transition to information (Kenett and Shmueli 2016). In that transition, many additional dimensions need to be considered in order to ensure information quality. To evaluate data quality, several assessment frameworks have been proposed, e.g., Wang et al. (1995). To ensure information quality, a framework based on four components and eight dimensions, called InfoQ, has been proposed in Kenett and Shmueli (2016). The application of InfoQ in the context of Industry 4.0 has also been reviewed. In the InfoQ framework, information quality is defined as the utility, U, of applying a method of analysis, f, to a data set, X, conditioned on the analysis goal, g, i.e., InfoQ(U, f, X, g) = U(f(X|g)) (a toy illustration is sketched below). To assess InfoQ, the following eight dimensions are considered: (1) Data Resolution, (2) Data Structure, (3) Data Integration, (4) Temporal Relevance, (5) Chronology of Data and Goal, (6) Generalizability, (7) Operationalization, and (8) Communication. Specifically, each of the eight dimensions is quantified with respect to each of the components U, f, X, and g by answering a set of relevant questions, and the components are then weighted to derive the InfoQ value. For more details on the calculation of InfoQ in a practical setting, the readers are referred to the literature (Reis and Kenett 2018) and to JMP (Sall et al. 2017), a suite of computer programs for statistical analysis that provides an InfoQ add-on (Kenett and Shmueli 2016) to calculate InfoQ. These frameworks provide important criteria to evaluate data and information quality, so that one can quantitatively and/or qualitatively assess whether the data and information quality conform to a specific contextualized computation task. In the era of Industry 4.0, the data quality issue is much more demanding than ever before, since modern sensing systems are capable of collecting an enormous amount of data over a long period of time from distributed heterogeneous manufacturing systems (Cai and Zhu 2015; Luthra and Mangla 2018).
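The following toy Python sketch illustrates the formula above: the same data set X yields different information quality under different analysis methods f, once the goal g and the utility U are fixed. The specific utility choice (cross-validated R²) and the synthetic dataset are our own assumptions for illustration, not part of the InfoQ framework:

```python
# Toy, hedged illustration of InfoQ(U, f, X, g) = U(f(X|g)): two analysis
# methods f applied to the same data X are scored by an assumed utility U
# (mean cross-validated R^2) for the goal g of predicting a quality response.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a manufacturing dataset X with quality response y.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

def info_q(f, X, y):
    """U(f(X|g)) with g = 'predict the quality response y' and U = mean CV R^2."""
    return cross_val_score(f, X, y, cv=5, scoring="r2").mean()

for f in (LinearRegression(), RandomForestRegressor(random_state=0)):
    print(type(f).__name__, round(info_q(f, X, y), 3))
```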
While modern sensing systems can collect a massive amount of data for machine learning, the data may also include a lot of poorly integrated or redundant information, which results in so-called "information-poor" datasets. Information-poor datasets lead to the following limitations in cybermanufacturing (Cai and Zhu 2015; Li et al. 2016; Dai et al. 2020; Ismail et al. 2019). First, machine learning methods may be misled by the redundant information (i.e., a low signal-to-noise ratio), which significantly limits the effectiveness and efficiency of manufacturing modeling and analysis. Moreover, due to the tremendous size of the datasets, it is difficult to assess data quality in a reasonable amount of time. Second, since cybermanufacturing involves the collection of data from heterogeneous manufacturing systems, it brings a wide variety of data types and heterogeneous data structures. This leads to unbalanced data, which makes it difficult to integrate the data for efficient manufacturing modeling and analysis. Lastly, in the era of Industry 4.0, a single type of data is often involved in multiple contextualized computation tasks. For example, data related to equipment (e.g., equipment vibration) may be used for fault diagnosis, preventive maintenance scheduling, and quality prediction at the same time. However, as information quality depends on the objective of a contextualized computation task, a single dataset may or may not conform to the information quality standards required by the multiple contextualized computation tasks. This makes it challenging to ensure information quality from a single dataset.

Cybermanufacturing focuses on the convergence of physical entities (e.g., manufacturing equipment) and cyber entities (e.g., simulated equipment), such that the dynamic changes of the physical entities can be predicted and analyzed through the corresponding cyber entities (Qi and Tao 2018). Therefore, existing efforts have aimed at achieving realistic cyber entities via accurate computer simulation models and machine learning methods (Qi and Tao 2018; Störmer et al. 2018). An example, in the context of decision-tree predictive analytics, is the work of Ben-Gal et al. (2014). In that paper, the authors presented the dual information distance (DID) method, which selects features by considering both their immediate contributions to the classification and their future potential effects. Specifically, the DID method constructs classification trees by finding the shortest paths over a graph of partitions that are defined by the selected features. The authors demonstrated that the method takes into account both the orthogonality between the selected partitions and the reduction of uncertainty on the class partition, such that it outperforms popular classifiers in terms of average depth and classification accuracy. However, computation complexity is becoming one of the most challenging issues in realizing realistic cyber entities. First, high-fidelity computer simulations, which enable accurate prediction of the behavior of physical entities via the cyber entities, are typically too time-consuming. This issue becomes more challenging when the manufacturing process involves computationally intensive multiphysics computer simulations (Dbouk 2017). For example, recent work indicates that a single run of a computer simulation for the fused deposition modeling process takes more than 20 hours (Li et al. 2018), which could result in a significant delay in updating the cyber entities.
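One common mitigation, shown here as a hedged sketch with assumed toy functions (the paper does not prescribe this particular implementation), is to train a cheap surrogate such as a Gaussian process on a small number of expensive simulation runs, so that the cyber entity can be queried almost instantaneously:

```python
# Hedged sketch: a Gaussian process surrogate trained on a handful of runs of
# an expensive simulator, so predictions can be obtained in milliseconds
# instead of hours. The simulator below is a stand-in function; in practice it
# would be, e.g., a multiphysics FEA code.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def expensive_simulation(x):
    """Placeholder for a long-running computer simulation."""
    return np.sin(3.0 * x) + 0.3 * x

# A small design of simulation runs (each assumed to take hours in reality).
x_train = np.linspace(0.0, 2.0, 8).reshape(-1, 1)
y_train = expensive_simulation(x_train).ravel()

gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
gp.fit(x_train, y_train)

# Cheap surrogate predictions (with uncertainty) at new process settings.
x_new = np.array([[0.35], [1.7]])
mean, std = gp.predict(x_new, return_std=True)
print(mean, std)
```

This is in line with the surrogate-model tradeoff discussed later in connection with Kang et al. (2021), where the computational cost of designs is balanced against surrogate accuracy.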
A second source of computation complexity arises because the advancement of sensing and communication systems enables the collection of enormous amounts of data from distributed manufacturing systems, so the machine learning methods used to build the cyber entities require significantly higher computation cost and/or communication bandwidth across the cybermanufacturing infrastructure. The issue of computation complexity leads to the following limitations in cybermanufacturing (Singh et al. 2018; Modoni et al. 2019; Bevilacqua et al. 2020; Rasheed et al. 2020). First, the computation complexity involved in modeling the complex behaviors of heterogeneous manufacturing systems and their interactions delays manufacturing modeling and analysis for personalized demands, which limits the timeliness of personalization. This is an important issue, since cybermanufacturing aims at achieving efficient personalization of products by utilizing the connected heterogeneous manufacturing systems. Second, computational complexity also affects the ability to ensure data quality (Kenett and Shmueli 2016). Insufficient storage or computation power to handle data with adequate resolution can negatively impact the delivery of outputs from analytical work to the right persons, in the right way, at the right time, thereby reducing information quality.

Manufacturing processes involve complex physical mechanisms. Therefore, the underlying engineering knowledge, such as cause-effect relationships, first-principles models, computer simulation models (e.g., finite element analysis, FEA), and design rules, may be incomplete. Here, incomplete knowledge can be due to (1) incomplete understanding of the underlying physical mechanism (e.g., first-principles model, material properties), (2) incomplete information about model parameters, and (3) stochastic behavior or uncertainties associated with the system or numerical algorithms. To address the issue, computer model calibration (Kennedy and O'Hagan 2001; Higdon et al. 2008, 2013; Wong et al. 2017) has been continuously studied to compensate for the incomplete knowledge with observational data, and design for variation (Reinman et al. 2012) has been studied to reduce the variation of products under incomplete knowledge and uncertainties. However, in the era of Industry 4.0, it is becoming more challenging to address the issue for the following reasons. First, while the core of cybermanufacturing is to take advantage of multi-modal manufacturing data, the data can mislead decision-making processes if incomplete knowledge (e.g., an invalid assumption in modeling a manufacturing process) is involved in the interpretation of the data. Second, since existing efforts typically assume a single or only a few manufacturing systems (Feng et al. 2017), existing computer experimental design, modeling, and calibration are not easily scalable to scenarios where many heterogeneous manufacturing systems are connected in cybermanufacturing. In these cases, the dependability and effectiveness of cybermanufacturing may be questioned, especially when a delicate and complex situation requires decision-making in a real-time manner (Broy et al. 2012). These issues are even more common in the newly introduced additive manufacturing processes (Babu and Goodridge 2015; Yang et al. 2017; Dilberoglu et al. 2017; Jared et al. 2017; Li et al. 2018; Mahmoudi et al. 2018; Sabbaghi et al. 2018; Kenett et al. 2019). In summary, incomplete engineering knowledge leads to limitations in cybermanufacturing.
First, the manufacturing design process becomes inefficient when the underlying engineering knowledge is incomplete. Specifically, it has been estimated that 75% of the cost involved in product development is committed early in the engineering design process, when the knowledge of the product is unclear and/or incomplete (Chandrasegaran et al. 2013). Second, when the computer simulation or data-driven models for a manufacturing process are limited in scope, the models cannot provide adequate predictions for prognostics and health management in cybermanufacturing (Weiss et al. 2015). This can result in inefficient planning, maintenance, and logistics due to inaccurate prediction of equipment status (Davis et al. 2012; Edgar and Pistikopoulos 2018).

Cybermanufacturing focuses on personalization and customized production, which will generate a wide variety of heterogeneous data (Thoben et al. 2017). In the meantime, the adequacy of a machine learning method for such heterogeneous data may differ significantly due to the underlying statistical characteristics (e.g., the distribution of the data) and/or the contextualized computation tasks (e.g., fault diagnosis or quality control for a specific manufacturing process). Here, we refer to this varying adequacy as the "border" of the machine learning methods. Thus, it is important to match a specific dataset/contextualized computation task with a proper machine learning method within its border to ensure the efficiency and effectiveness of manufacturing modeling and analysis. In current practice, the typical paradigm for identifying which machine learning method to use is often heuristic, based on domain knowledge of a specific contextualized computation task and/or a data scientist's personal experience in data analysis. Clearly, such a heuristic approach could require a large amount of trial and error to identify an efficient and effective machine learning method for a given contextualized computation task. This calls for a systematic methodology to understand the borders among different machine learning methods, especially in the field of manufacturing modeling and analysis.

There are several challenges in Industry 4.0 due to the lack of systematic understanding of the borders among different machine learning methods. We list some of them below. First, considering the heterogeneous manufacturing systems connected in cybermanufacturing, it will require considerable lead time to identify a proper machine learning method for each manufacturing system and computation task. For example, for a thermal spray coating process using heterogeneous spray guns, it is reported that a linear regression model that worked well for one spray gun was not applicable to the other spray guns, due to the violation of the assumption that samples come from the same underlying distribution. Second, it is known that manufacturing processes and systems are likely to be dynamic in their model relationships, due to a number of factors related to raw materials, equipment status, and the environment. For example, it is reported that the model parameters for a crystal growth process should be adjusted based on the degradation level of the equipment. However, most machine learning methods cannot generate dynamic models. Therefore, it would be beneficial to efficiently match the optimal machine learning method with the degradation levels. Third, cybermanufacturing often requires different levels of accuracy from machine learning methods, with consideration of computational and utility costs.
A lack of understanding of the borders among machine learning methods could increase the effort needed to select methods that are not only adequate, but also reliable and responsive. As shown in Kang et al. (2021), a tradeoff between the computational cost of designs and the accuracy of surrogate models could facilitate the identification of the feasible design region, which is crucial for the timeliness of personalized product realization in Industry 4.0. Meanwhile, the time latency of the machine learning training process, and unreliable computation due to computation node failures or loss of communication to the Cloud, will prohibit the use of advanced but computation-intensive algorithms.

When machine learning methods are employed, different researchers or practitioners tend to choose different configurations (e.g., the splitting of the samples for training and testing), even when they analyze the same dataset (Botvinik-Nezer et al. 2020). This flexibility leads to difficulties in the reproducibility of machine learning methods, and needs to be accounted for and controlled in manufacturing modeling and analysis. However, even though there have been consistent efforts to address the issue in science (Kenett and Rubinstein 2017; Botvinik-Nezer et al. 2020), the manufacturing industry is less concerned with the issue of reproducibility of analytical studies (Kenett 2020). In other words, companies tend to overlook evaluating, with adequate statistical criteria, the reproducibility of the experimental work designed to improve processes and products. In the meantime, fierce competition in the era of Industry 4.0 allows only short-term opportunities to try out new products and/or new process setups, which calls for ensuring the reproducibility of machine learning methods in a contextualized computation task. A lack of reproducibility leads to the following challenges in cybermanufacturing. First, in the manufacturing industry, a lack of reproducibility of machine learning methods can result in misleading decision-making, which is very costly and time-consuming. For example, it is reported that around 50% of the costs incurred in new product development tend to be spent on unnecessary design iterations (Schütze and Störmer 2012), which could be avoided by accurate and reliable predictions. Second, since cybermanufacturing involves efficient utilization of the heterogeneous manufacturing systems connected to the cybermanufacturing network (Lee et al. 2015; Jeschke et al. 2017; Wang et al. 2020a), reproducibility should be ensured such that consistent product quality can be achieved across the cybermanufacturing network. Lastly, a lack of reproducibility of machine learning methods can result in increased product variation, which can deteriorate customer satisfaction (Luo et al. 2005; Dharmesti and Nugroho 2013). This is an important issue, since improved customer satisfaction is one of the most important goals to be achieved in the context of Industry 4.0 (Bortolini et al. 2017; de Man and Strandhagen 2017; Bär et al. 2018).

In the post-COVID-19 industry, it is very important to automate the identification of an optimal configuration of machine learning methods and to match the methods to the requirements of contextualized computation tasks (e.g., variation analysis and anomaly detection), including accuracy, responsiveness, reliability, interpretability, etc. (Wang et al. 2020b). In the literature, this matching problem has been formulated as a computation pipeline recommendation problem. In the rest of this section, Sect. 4.1 reviews the general concept of computation pipelines for machine learning, and Sect. 4.2 reviews the state of the art in computation pipeline recommendation.
The concept of computation pipelines for machine learning was suggested by the software engineering community to systematically organize a sequence of method options, including data collection, data preprocessing, data filtering, feature selection (optional), data-fusion/machine learning methods, computation, and post-processing. For example, Scikit-learn, a machine learning library for Python, provides a computation pipeline to assemble several steps that can be cross-validated together while setting different parameters (Pedregosa et al. 2011). Similarly, Google® Tensorflow (Abadi et al. 2016) and Pytorch (Paszke et al. 2019), which are widely used deep learning platforms, suggested the idea of a computational graph to organize the computation pipeline for deep learning. These computation pipelines enhance the readiness and traceability of machine learning and deep learning method options. In the manufacturing industry, most relevant works involving computation pipelines have focused on constructing an autonomous framework to tune a specific computation pipeline or only a limited number of method options. Examples of such works include the application of computation pipelines to preventive maintenance operations (O'Donovan et al. 2015), fault prognostics (Kozjek et al. 2017), and production planning (Wang et al. 2018b). While the aforementioned works are applicable to a specific contextualized computation task, they may not be adequate for different tasks when modeling assumptions are violated (e.g., the underlying distribution of the data is different). In other words, to ensure the effectiveness of manufacturing modeling and analysis, one should efficiently switch to a proper computation pipeline, from a number of alternatives, that fits well with the scenario. However, the current practice relies on trial and error guided by domain knowledge and experience, which is too time-consuming for identifying the optimal method options for a proper computation pipeline.

Computation pipeline recommendation is a relatively new research field aiming at identifying a proper computation pipeline from a number of alternative configurations. Figure 3 shows an example of computation pipeline recommendation. Given different contextualized computation tasks, computation pipeline recommendation efficiently explores the available computation pipelines (shown as the connections between the method options in Fig. 3) and provides the optimal pipeline for each of the contextualized computation tasks. This example illustrates providing different optimal computation pipelines for scenario 1 (marked as the red computation pipeline in Fig. 3) and scenario 2 (marked as the blue computation pipeline in Fig. 3). Sparks et al. (2017), among others, pioneered computation pipeline recommendation. Sparks et al. (2017) presented a system called KeystoneML, which enables the development of end-to-end computation pipelines for machine learning. Specifically, by capturing the end-to-end machine learning application, KeystoneML can optimize the whole pipeline such that it automatically adapts to changes in data, hardware, and other environmental factors. In addition, a learning-to-rank method has been proposed to rank the pipelines by Top-N prediction accuracy in order to identify proper pipelines for quality prediction.
The authors demonstrated the effectiveness and efficiency of the proposed method via multiple datasets from thermal spray coating, aerosol jet® printing, and fused deposition modeling. For these datasets, the method could ease the use of data-fusion and machine learning methods and effectively avoid unnecessary computation workloads involving exhaustive exploration of all the available computation pipelines. Similar works relevant to computation pipeline recommendation include "data analytics as a service" (Ardagna et al. 2018). Specifically, this approach aims at identifying a proper computation pipeline conforming to a specific set of requirements and/or platforms, by utilizing a model-based approach founded upon a domain ontology of big data services.

Computation pipeline recommendation has been adopted not only to improve quality prediction, but also to improve other important sectors of Industry 4.0: informative visualization and efficient human-machine collaboration. In follow-up research, the concept of computation pipeline recommendation was extended to the personalization of a visualization system, called the Personalized Recommender System for Information visualization Methods via Extended matrix completion (PRIME). The main improvement of PRIME over the previous method lies in incorporating wearable sensor data into pipeline recommendation, such that pipeline recommendation can be adopted to improve human-computer interaction for acquiring insights from complex datasets. Specifically, PRIME quantitatively models covariates (i.e., wearable sensor data) to predict recommendation scores (e.g., perceived complexity, mental workload, etc.) for users to adapt the visualization to the specific contextualized computation task. In addition, PRIME can make accurate recommendations even for new users or new contextualized computation tasks based on historical wearable sensor signals and recommendation scores. It has been demonstrated that PRIME can achieve satisfactory recommendation accuracy for adapting visualizations, even when there are limited historical datasets. This capability contributes to designing a new generation of visualization systems that adapt to users' real-time status. PRIME can support researchers in reducing the sample size requirements to quantify individual differences, and practitioners in adapting visualizations according to user states and contextualized computation tasks in a real-time manner.

Computation pipeline recommendation is expected to be generalized such that it can stimulate the broader adoption of machine learning in the manufacturing industry, in the context of cybermanufacturing. For example, an improved computation pipeline recommendation framework, named Adaptive computation Pipelines (AdaPipe), has been proposed. The main improvement of AdaPipe over the previous methods lies in considering the similarities of computation pipelines derived from crowdsourcing and word embedding, together with the features of contextualized computation tasks (i.e., process types, manufacturing settings, and quality specifications). In addition, AdaPipe can be adopted to recommend computation pipelines for a wide variety of contextualized computation tasks and objectives, including the lowest prediction error, the lowest time latency, etc. The authors demonstrated the effectiveness and efficiency of AdaPipe via 60 bootstrapped data sets from thermal spray coating, printed electronics, and additive manufacturing processes, and 27 computation pipelines.
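As a simplified, hedged illustration of the computation pipeline idea (not the learning-to-rank, PRIME, or AdaPipe algorithms themselves), the following Python sketch enumerates candidate Scikit-learn pipelines built from a few assumed method options and ranks them by cross-validated error; a recommender system would replace this exhaustive loop with a learned ranking:

```python
# Hedged sketch: enumerate candidate pipelines (preprocessing + model options)
# and rank them by cross-validated RMSE. Dataset and method options are toy
# assumptions, not those used in the cited studies.
from itertools import product
from sklearn.datasets import make_regression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=20, noise=5.0, random_state=1)

scalers = [("standard", StandardScaler()), ("minmax", MinMaxScaler())]
reducers = [("pca5", PCA(n_components=5)), ("pca10", PCA(n_components=10))]
models = [("ridge", Ridge()), ("gbr", GradientBoostingRegressor(random_state=1))]

ranked = []
for (s_name, s), (r_name, r), (m_name, m) in product(scalers, reducers, models):
    pipe = Pipeline([("scale", s), ("reduce", r), ("model", m)])
    score = cross_val_score(
        pipe, X, y, cv=5, scoring="neg_root_mean_squared_error"
    ).mean()
    ranked.append((f"{s_name}->{r_name}->{m_name}", -score))

# Lower RMSE first: the top entry is the "recommended" pipeline for this task.
for name, rmse in sorted(ranked, key=lambda t: t[1]):
    print(f"{name}: RMSE {rmse:.2f}")
```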
The generalization and further advancement of computation pipeline recommendation will provide a systematic and efficient methodology to explore a number of method options without time-consuming trial and error. We envision that this will contribute to addressing the key challenges reviewed in this paper, by providing a proper computation pipeline whenever a contextualized computation task is susceptible to any of the key challenges. To achieve this goal, the future direction for computation pipeline recommendation lies in incorporating the issues related to the challenges (e.g., information quality, computation complexity, available engineering knowledge, etc.) into its recommendation procedure.

In the era of Industry 4.0, the integration of cybermanufacturing and state-of-the-art machine learning methods is expected to greatly improve product quality, product realization efficiency, personalization, and domain knowledge discovery. In this work, we have focused on reviewing several key challenges in modeling and analysis in the transition into cybermanufacturing. These include the required learning times in heterogeneous manufacturing systems, the matching of optimal machine learning methods with equipment degradation levels, and the adaptation of machine learning methods with consideration of computational and utility costs. The overall goal is to optimize system reliability and responsiveness. Aligned with the challenges, we have provided a promising direction of using computation pipeline recommendation as an enabling technology of cybermanufacturing. The generalization of computation pipelines to all algorithms will facilitate the broader adoption of machine learning in the manufacturing industry by efficiently providing an optimal configuration of machine learning methods without extensive trial and error. As the efficient deployment of machine learning methods in cybermanufacturing becomes more important than ever before, the use of computation pipelines can be a driving force toward flexible and resilient manufacturing operations in the post-COVID-19 industry. Areas for further work include (1) investigating the impact of information quality in computation, (2) better understanding the reproducibility and repeatability of computational flows, (3) integrating analytic recommendations with constraints from communication, computation, and control, and (4) further extending computation pipelines to improve the broader area of human-machine interaction (e.g., mutual learning between humans and machines). This paper provides an initial mapping of such challenges.

The authors declare that they have no conflict of interest.

Tensorflow: A system for large-scale machine learning Rethinking human-machine learning in industry 4.0: How does the paradigm shift treat the role of human learning? Procedia Manufacturing Model-based big data analytics-as-a-service: Take big data to the next level A service-oriented real-time communication scheme for AUTOSAR adaptive using OPC UA and time-sensitive networking Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion Additive manufacturing Industry 4.0: A review on industrial automation and robotic Considering Industry 4.0 aspects in the supply chain for an SME Efficient construction of decision trees by the dual information distance method.
Quality Technology and Quantitative Management Digital twin reference model development to prevent operators risk in process plants Assembly system design in the Industry 4.0 era: A general framework Variability in the analysis of a single neuroimaging dataset by many teams Statistics for experimenters: design, innovation, and discovery Machine learning applied in production planning and control: A state-of-the-art in the era of Industry 4.0 The challenges of data quality and data quality assessment in the big data era Machine learning interpretability: A survey on methods and metrics. Electronics The evolution, challenges, and future of knowledge representation in product design systems Statistical modeling for visualization evaluation through data fusion Data fusion pipelines for autonomous smart manufacturing Adapipe: A recommender system for adaptive computation pipelines in cyber-manufacturing computation services Predictive offloading in mobile-fog-cloud enabled cyber-manufacturing systems PRIME: A personalized recommender system for information visualization methods via extended matrix completion Industrial IoT in 5G environment towards smart manufacturing Leading tools manufacturer transforms operations with iot Knowledge-based artificial neural network (KB-ANN) in engineering: Associating functional architecture modeling, dimensional analysis and causal graphs to produce optimized topologies for KB-ANNs Big data analytics for manufacturing internet of things: Opportunities, challenges and enabling technologies Social media big data integration: A new approach based on calibration Smart manufacturing, manufacturing intelligence and demand-dynamic performance A review about the engineering design of optimal heat transfer systems using topology optimization QQ models: Joint modeling for quantitative and qualitative quality responses in manufacturing systems Industry 4.0: Contributions of holonic manufacturing control architectures and future challenges The antecedents of online customer satisfaction and customer loyalty The role of additive manufacturing in the era of Industry 4.0. Procedia Manufacturing Error diagnosis and compensation using kinematic models and position error data Smart manufacturing and energy systems Process flexibility design in heterogeneous and unbalanced networks: A stochastic programming approach XV.-The correlation between relatives on the supposition of mendelian inheritance Increasing flexibility and productivity in Industry 4.0 production networks with autonomous mobile robots and smart intralogistics Humanmachine-interaction in the Industry 4.0 era Digital manufacturing: The revolution will be virtualized Computer model calibration using high-dimensional output Computer model calibration using the ensemble kalman filter Evaluation of a passive exoskeleton for static upper limb activities Manufacturing process data analysis pipelines: A requirements analysis and survey Viability of intertwined supply networks: extending the supply chain resilience angles towards survivability. 
a position paper motivated by COVID-19 outbreak Communication protocols of an industrial internet of things environment: A comparative study Additive manufacturing: Toward holistic design Industrial Internet of Things and Cyber Manufacturing Systems Ensemble modeling for data fusion in manufacturing process scale-up Dynamic quality-process model in consideration of equipment degradation A cost-efficient data-driven approach to design space exploration for personalized geometric design in additive manufacturing A state-ofthe-art survey of digital twin: Techniques, engineering product lifecycle management and business innovation perspectives Reviewing of applied research with an Industry 4.0 perspective Available at SSRN 3591808 Generalizing research findings for enhanced reproducibility: A translational medicine case study Available at SSRN 3035070 Information quality: The potential of data and analytics to generate knowledge A road map for applied data sciences supporting sustainability in advanced manufacturing: The information quality dimensions Systems engineering in the fourth industrial revolution: Big data, novel technologies, and modern systems engineering Bayesian calibration of computer models A data-driven holistic approach to fault prognostics in a cyclic manufacturing process Automatic machine status prediction in the era of Industry 4.0: Case study of machines in a spring factory A cyber-physical systems architecture for Industry 4.0-based manufacturing systems Integration of physically-based and data-driven approaches for thermal field prediction in additive manufacturing Industry 4.0-potentials for predictive maintenance Macroeconomic policy in the time of covid-19: A primer for developing countries Design of robust new products under variability: Marketing meets design Evaluating challenges to Industry 4.0 initiatives for supply chain sustainability in emerging economies Multivariate calibration and experimental validation of a 3D finite element thermal model for laser powder bed fusion metal additive manufacturing Prognosis of defect propagation based on recurrent neural networks An Industry 4.0 research agenda for sustainable business models Threedimensional dynamic finite element analysis of shot-peening induced residual stresses Smart automated guided vehicles for manufacturing in the context of Industry 4.0. Procedia Manufacturing Synchronizing physical and digital factory: Benefits and technical challenges A step towards intelligent manufacturing: Modelling and monitoring of manufacturing processes through artificial neural networks The evolution of manmachine interaction: The role of human in industry 4.0 paradigm. Production and Manufacturing Research Bayesian inference for the uncertainty distribution of computer model outputs Towards Industry 4.0 utilizing data-mining techniques: A case study on quality improvement An industrial big data pipeline for data-driven analytics maintenance applications in large-scale smart manufacturing facilities Augmented reality in the smart factory: Supporting workers in an Industry 4.0. 
environment Automated layerwise detection of geometrical distortions in laser powder bed fusion Machine learning-based imaging system for surface defect inspection Pytorch: An imperative style, high-performance deep learning library Scikit-learn: Machine learning in python Modelling and analysis of nonlinear dynamics in autonomous heterarchical manufacturing systems control Opc ua versus ros, dds, and mqtt: performance evaluation of industry 4.0 protocols Digital twin and big data towards smart manufacturing and Industry 4.0: 360 degree comparison Digital twin: Values, challenges and enablers from a modeling perspective Assessing the value of information of data-centric activities in the chemical processing Industry 4.0 A three-dimensional finite element analysis of the temperature field during laser melting of metal powders in additive layer manufacturing Bayesian model building from small samples of disparate data for capturing in-plane deviation in additive manufacturing JMP start statistics: a guide to statistics and data analysis using JMP. Sas Institute Santner Experimentable digital twins-streamlining simulation-based systems engineering for Industry 4.0 Lean product development-enabling management factors for waste elimination Sensors 4.0-smart sensors and measurement technology enable Industry 4.0 Stream of variation modeling and analysis for multistage manufacturing processes Challenges of digital twin in high value manufacturing KeystoneML: Optimizing pipelines for large-scale advanced analytics Big-data-driven anomaly detection in industry (4.0): An approach and a case study The fit of Industry 4.0 applications in manufacturing logistics: A multiple case study Functional graphical models for manufacturing process modeling Digital twins and cyberphysical systems toward smart manufacturing and Industrie 4.0 and smart manufacturing-a review of research issues and application examples Present situation and future trends in modelling of machining operations progress report of the CIRP working group 'modelling of machining operations Deep learning for smart manufacturing: Methods and applications Big data driven cycle time parallel prediction for production planning in wafer manufacturing A collaborative architecture of the industrial internet platform for manufacturing systems Online computation performance analysis for distributed machine learning pipelines in fog manufacturing A framework for analysis of data quality research Design of deep convolutional neural network architectures for automated feature extraction in industrial inspection Measurement science for prognostics and health management for smart manufacturing systems: key findings from a roadmapping workshop Feasibility analysis of composite fuselage shape control via finite element analysis A frequentist approach to computer model calibration Experiments: planning, analysis, and optimization Cyber-physical-social system in intelligent transportation AKM 2 D: An adaptive framework for online sensing and anomaly quantification Dynamic neural network modeling for nonlinear, nonstationary machine tool thermally induced error Thermal error mode analysis and robust modeling for error compensation on a CNC turning center Investigating grey-box modeling for predictive analytics in smart manufacturing Fog computing for distributed family learning in cyber-manufacturing modeling PDE-constrained gaussian process model on material removal rate of wire saw slicing process Machine health monitoring using local feature-based gated 
recurrent unit networks