key: cord-0853914-12i7nwr5 authors: Chin, Hon Huin; Varbanov, Petar Sabev; Klemeš, Jiři Jaromír; Benjamin, Michael Francis D.; Tan, Raymond R. title: Asset Maintenance Optimisation Approaches in the Chemical and Process Industries - A Review date: 2020-10-08 journal: Chem Eng Res Des DOI: 10.1016/j.cherd.2020.09.034 sha: d9890566ea83c02d2070cd52d4e73f2e89431f38 doc_id: 853914 cord_uid: 12i7nwr5
The operational performance of a chemical process plant depends heavily on the condition of its assets and on maintenance practices. Because chemical processes are highly complex systems, which increases the frequency of risks and their interactions, maintenance planning is crucial for stable operation. This paper provides a critical analysis of recently developed asset maintenance approaches in the chemical industry. The strategies covered include corrective, time-based, risk-based, condition-based and opportunistic maintenance. Various methods for selecting the optimal maintenance strategy are discussed as well. This paper also evaluates reliability issues in chemical plants and integrated sites in the context of maintenance optimisation. Based on this analysis, several directions for future improvement are proposed: (i) exploiting production or other opportunities to postpone or advance maintenance; (ii) jointly optimising spare-part ordering strategies and data-driven maintenance planning; (iii) modelling fault propagation between structurally dependent units to support proper maintenance planning; and (iv) developing a framework or tool that considers quantitative and qualitative time-variant data inputs for business-informed asset maintenance, which is currently lacking.
has a high number of process-related accidents (Okoh and Haugen, 2014), maintenance planning for risk reduction is crucial for safe operation.
This point is underscored by the catastrophic failure of the Union Carbide plant in Bhopal, which can be traced to equipment failure (Chouhan, 2005). Maintenance management has benefited from the emergence of a large area of operational research called maintenance optimisation. This approach is based on mathematical models in which both the costs and the benefits of maintenance are quantified, so that the optimal balance between them can be obtained. This research area was founded in the early 1960s by several researchers, whose models and results are documented by McCall (1965). The main question, however, is: what are the gaps in applying maintenance optimisation models to practical maintenance management? Dekker (1996) conducted a search of literature published before 1990, focusing on publications with applications. The majority of the analysed works used maintenance optimisation models with real case studies; the remainder dealt with hypothetical case studies using real data, or real examples with hypothetical data. However, few publications were reported on the petrochemical and chemical industry. In another review, Yadava et al. (2011) highlighted that maintenance is becoming a profit-driven business function rather than merely a function supporting the assets. Efforts to improve operational up-time and to enhance asset capability and service lifetime could improve the cost-effectiveness of the process. This applies particularly to the oil and chemical industry, which is highly susceptible to hazardous risks that are detrimental to production, operation, people and the environment. A common assumption in maintenance modelling is that maintenance is 'imperfect' (see the latest British Standard, BS-EN-13306, 2017).
Carlo and Arleo (2017) reviewed the application of 'imperfect' maintenance in real industrial practice and provided guidelines for selecting a proper 'imperfect' maintenance model. Jonge and Scarf (2019) presented a comprehensive review of the mathematical approaches used by researchers for maintenance planning in generic systems up to 2018. Tseng et al. (2015) conducted a comprehensive review of maintenance management models in various industries; most of the studied areas are power plants, manufacturing facilities and oil refineries. Various works published from 1995 to 2011 that use preventive and predictive maintenance strategies, e.g. time-based maintenance (TBM) and condition-based maintenance (CBM), are reviewed. Alrabghi and Tiwari (2015) conducted a state-of-the-art review of simulation-based approaches for maintenance systems in various industries. The majority of the reported works used discrete event simulation, and a genetic algorithm (GA) as the optimisation tool. They suggest that a well-developed framework to guide the maintenance planning process is needed for real case studies, especially in the context of CBM. In summary, a significant body of scientific literature on maintenance management models exists. However, the overall lifecycle of the asset is not (i) Long-term Maintenance Planning. This section analyses state-of-the-art practice in long-term maintenance planning, including time-based, risk-based, condition-based and opportunistic maintenance. Short-term maintenance scheduling models for chemical plants are discussed as well. (ii) Long-term Maintenance Strategy Selection. This section mainly assesses various approaches (qualitative and quantitative) for selecting the optimal maintenance planning strategy for a chemical process. (iii) Long- and Short-term Maintenance Planning for Integrated Chemical Sites.
An up-to-date analysis of published works on Total Site maintenance planning and scheduling is provided in this section. (iv) Summary and Perspective on Asset Maintenance and Management. Suggestions for the future development of asset maintenance planning and scheduling models, as well as ways forward for asset optimisation beyond maintenance, are proposed. Generic maintenance management approaches such as Total Productive Maintenance (TPM), Reliability-Centered Maintenance (RCM) and Total Quality Maintenance (TQM) are not discussed in this paper. These approaches are based on guidelines for practitioners and standard operating procedures. Such procedures and guidelines are the output of conceptual modelling and optimisation, which places them beyond the current scope. This paper focuses on the analysis of different strategies used by other researchers to optimise maintenance planning. For descriptions or implementations of these approaches, readers are referred to Ahuja and Khamba (2008) for TPM, Hezoucky and Agence internationale de l'énergie atomique (2008) for RCM and Al-Najjar (1996) for TQM. The articles collected for this review were found via the Google Scholar, Scopus and Web of Science databases, using the following keywords: 'Maintenance Scheduling | Planning' & 'Chemical Process | Production | Refinery', where '&' and '|' are the logical operators 'and' and 'or'. Figure 2 shows the publication statistics for these keywords. Web of Science lists fewer publications (1,033 in total) than Google Scholar (1,200) and Scopus (1,047). Articles from Google Scholar were collected and filtered manually for this review to avoid omitting relevant papers. Publications for 2020 are counted up to 1 July 2020.
distribution functions of the failure rate over time as predictors of the equipment reliability.
Time here may refer to the time elapsed since installation or repair (calendar time), or to operating time (e.g. accumulated hours of service). This maintenance policy is further divided into two sub-classes: the age-based policy and the block-based policy (see Figure 4). The age-based policy is a preventive maintenance/replacement policy that depends on the equipment's age, assuming a specific deterioration rate as the equipment ages. Failure replacement is conducted if a failure occurs before the scheduled maintenance time. Replacement also occurs when the elapsed time reaches a statistically determined period, even if no failure has yet occurred. The time at which a replacement ends then becomes the starting point of the next maintenance period, so the maintenance period is affected by failure events. This policy has the advantage of maximising the lifetime of the equipment but requires complex maintenance activities and planning (Jiang and Ji, 2002). The block-based policy is a periodic maintenance policy applied to the component(s) independently of previous failures. Preventive maintenance is conducted after fixed periods; even if a failure occurs, the next maintenance period is unaffected. This policy is implemented to maximise the production rate and to allow maintenance to follow a fixed schedule. In the case of an 'imperfect' maintenance model, the maintenance policy can become aperiodic due to the increasing failure frequency. However, the main concern is that new equipment may undergo excessive maintenance, which incurs extra cost (Kamaruddin and Ab-Samat, 2014). Failure and reliability data for the equipment are the main input for a time-based maintenance optimisation model. Failure analysis through component reliability study must be performed at the design stage (Vilarinho et al., 2017).
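To make the difference between the two sub-classes concrete, the following minimal Monte Carlo sketch simulates a single unit under each policy. It assumes Weibull-distributed lifetimes, instantaneous replacement and an 'as good as new' state after every replacement; the function names and the parameter values in the usage lines (scale 1,000 h, shape 2.5, PM interval 500 h, horizon 100,000 h) are hypothetical and not taken from the cited works.

```python
import random


def simulate_age_policy(T, horizon, scale, shape, rng):
    """Age-based policy: replace preventively when the unit reaches age T,
    or correctively on failure; either event restarts the age clock.
    Returns (number of preventive replacements, number of failures)."""
    t, n_pm, n_cm = 0.0, 0, 0
    while True:
        life = rng.weibullvariate(scale, shape)  # lifetime of the newly installed unit
        t += min(life, T)                        # failure first, or PM at age T
        if t > horizon:
            break
        if life < T:
            n_cm += 1                            # failure replacement
        else:
            n_pm += 1                            # preventive replacement at age T
    return n_pm, n_cm


def simulate_block_policy(T, horizon, scale, shape, rng):
    """Block-based policy: preventive replacement at fixed times T, 2T, 3T, ...
    regardless of failures; failed units are replaced correctively in between,
    and the PM calendar is never shifted. Returns (n_pm, n_cm)."""
    n_pm, n_cm = 0, 0
    block_start = 0.0
    while block_start < horizon:
        block_end = min(block_start + T, horizon)
        t = block_start
        while True:
            life = rng.weibullvariate(scale, shape)
            if t + life >= block_end:            # unit survives to the next PM (or horizon)
                break
            t += life
            n_cm += 1                            # corrective replacement inside the block
        if block_start + T <= horizon:
            n_pm += 1                            # scheduled PM at the end of the block
        block_start += T
    return n_pm, n_cm


if __name__ == "__main__":
    # Hypothetical inputs: PM interval 500 h, horizon 100,000 h, Weibull scale 1,000 h, shape 2.5
    for name, policy in [("age", simulate_age_policy), ("block", simulate_block_policy)]:
        n_pm, n_cm = policy(500.0, 100_000.0, 1000.0, 2.5, random.Random(42))
        print(f"{name}-based: {n_pm} preventive replacements, {n_cm} failures")
```

Because every preventive cycle of the age-based policy lasts exactly T, that policy can never perform more preventive replacements over a given horizon than the block-based one; the block-based policy pays for its fixed calendar by also renewing units that were replaced shortly before, which is the excess-maintenance concern noted above.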
The goal of this failure analysis is to determine the fundamental reliability functions, in particular the failure density function (Eq. 1), the survival or reliability function (Eq. 2) and the hazard rate function (Eq. 3). Reliability functions are generally estimated from a collected data set using one of two approaches: the first, called the empirical approach (direct fitting to data), derives empirical survival functions directly from real time-to-failure (TTF) data; the second, called theoretical distribution research, is more complicated and fits theoretical distributions such as the Weibull, exponential or normal (Gaussian) distribution (Faccio et al., 2014). The authors concluded that the second approach is usually adopted because it provides more information, in particular a better evaluation of reliability based on the range of collected data. The theoretical distribution can then be used in more complex engineering analyses of reliability and in maintenance policy decisions. The most popular distribution model for the failure function is the Weibull model (Eq. 1) because of its flexibility: it is suitable for studying the lifetime of components with different hazard rate behaviours (Jóźwiak, 1997). Its mathematical formulation is as follows:

$$f(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1}\exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right] \qquad (1)$$

$$R(t) = P(T_f > t) = \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right] \qquad (2)$$

$$h(t) = \frac{f(t)}{R(t)} = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1} \qquad (3)$$

where f(t) is the failure density function, η is the scale parameter, β is the shape parameter, T_f is the failure time, R(t) is the reliability function and h(t) is the hazard rate function. The scale and shape parameters can be estimated from run-to-failure analysis; for instance, the least-squares method allows the Weibull parameters to be calculated from empirical reliability data. Estimated failure rate data for plant equipment can be obtained from the recently published OREDA handbook (OREDA, 2015). In industry, the PM strategy is generally applied either on the basis of experience or following original equipment manufacturer (OEM) recommendations, which are based on a scientific approach.
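As an illustration of Eqs. (1) to (3) and of least-squares parameter estimation, the sketch below implements the three Weibull functions and fits η and β by median-rank regression, one common least-squares variant. It assumes complete (uncensored) run-to-failure data and uses Bernard's approximation for the median ranks; the function names and the failure times in the usage lines are hypothetical.

```python
import math


def weibull_pdf(t, eta, beta):
    """Failure density f(t), Eq. (1)."""
    return (beta / eta) * (t / eta) ** (beta - 1) * math.exp(-((t / eta) ** beta))


def weibull_reliability(t, eta, beta):
    """Reliability (survival) function R(t), Eq. (2)."""
    return math.exp(-((t / eta) ** beta))


def weibull_hazard(t, eta, beta):
    """Hazard rate h(t) = f(t) / R(t), Eq. (3)."""
    return (beta / eta) * (t / eta) ** (beta - 1)


def fit_weibull_lsq(failure_times):
    """Least-squares (median-rank regression) estimate of (eta, beta)
    from complete run-to-failure data."""
    times = sorted(failure_times)
    n = len(times)
    xs, ys = [], []
    for i, t in enumerate(times, start=1):
        F = (i - 0.3) / (n + 0.4)                # Bernard's median-rank approximation
        xs.append(math.log(t))
        ys.append(math.log(-math.log(1.0 - F)))  # linearised Weibull CDF
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx                             # slope = shape parameter
    intercept = y_bar - beta * x_bar             # intercept = -beta * ln(eta)
    eta = math.exp(-intercept / beta)
    return eta, beta


if __name__ == "__main__":
    # Hypothetical run-to-failure times in hours
    times = [410.0, 730.0, 980.0, 1150.0, 1400.0, 1720.0]
    eta, beta = fit_weibull_lsq(times)
    print(f"eta = {eta:.0f} h, beta = {beta:.2f}")
    print(f"R(500 h) = {weibull_reliability(500.0, eta, beta):.3f}")
```

The fit relies on the linearisation ln[-ln(1 - F)] = β ln t - β ln η, so the slope of the regression line estimates β and the intercept yields η. With censored data (units still running at the end of observation), a maximum-likelihood fit would be more appropriate.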
PM based on experience is the conventional practice. In most cases, it is performed at regular time intervals (Shey-Huei Sheu et al., 1995), for example every 1,000 h or every 10 d, based on recommendations. When PM relies on experience, no standard procedures are followed, and the knowledge of technicians and engineers is a valuable company asset for maintenance purposes. Technicians and engineers in this setting learn from previous mistakes and past experience. They are able to detect abnormal machine conditions by observation and then decide on the appropriate PM actions to avoid breakdown. The main drawback of experience-based PM is that the company may face difficulties when experienced personnel leave. Such personnel may also not always be present on production lines to solve maintenance problems. As noted by Ahmad and Kamaruddin (2012), this PM practice is usually not suitable when the aim is to minimise operating costs and maximise machine performance. One of the earliest instances of periodic maintenance planning in the chemical industry is the work of Dedopoulos and Shah (1995). They first formulated a short-term stochastic scheduling problem to determine how the expected profitability of a multipurpose plant varies with hypothetical equipment failure rates. The solutions were then passed to a long-term maintenance optimisation problem to determine the optimal time-based maintenance policy, with the objective of maximising the expected profit of the plant. Determining maintenance policies for a multipurpose or batch production plant is complicated because process operation and equipment flexibility must be considered: unplanned equipment shutdowns could affect the timely production of different products.
Tan and Kramer (1997) proposed a generic framework using a Markov model to identify the optimal preventive maintenance plan for a feed pre-heat exchanger network. As failure is a stochastic phenomenon, it must be treated quantitatively from a probabilistic point of view. In the works mentioned, failure analysis provides a basic understanding of the failure characteristics, which enables the plant reliability engineer to identify the bottlenecks of the operations. However, the quantitative aspects of reliability engineering are often inaccurate and criticised as complicated. These models are also far from practical application because the lifetime data are purely theoretical and risk aspects are not considered. Mauney and Schmidt (1997) presented a quantitative decision-making model that integrates engineering analysis with economic evaluation of chemical process plant components. They used failure analysis to determine the optimal timing of PM actions that maximises NPV, and applied the concept to primary reformer replacement in a single-path ammonia synthesis unit and to a boiler feed water pump as case studies. However, their model focuses on only a single piece of equipment and neglects the interaction between units. As failure effects may propagate between equipment in real plants, this simplification limits its practicality, and further development is required. Charles et al. (2003) implemented periodic PM and CM strategies in batch plant scheduling for a semiconductor wafer fabrication facility. Lifetime and repair time distributions were fitted for all equipment from industrial data. MELISSA-C++, an object-oriented simulator, was used to model the interaction between production and maintenance. The interdependency between PM and CM actions was also identified.
They also performed a graphical search to determine the optimal PM frequency, with the objective of minimising direct and indirect maintenance costs. The proposed work was only tested on a semiconductor manufacturing prototype bench and has yet to be applied in a real plant. As the optimal cost was identified only graphically, an advanced global optimisation tool or algorithm is needed to identify the optimal PM frequencies. In contrast to approaches that mainly focus on identifying an effective PM policy, Pistikopoulos et al. (2000) proposed a general MILP optimisation framework to account properly for equipment maintainability at the process design level for a simple multipurpose chemical process. They also proposed a model for the simultaneous optimisation of design, production and maintenance planning, in which the degree of asset utilisation is determined by the initial reliability characteristics and the implemented time-based maintenance policies. Goel et al. (2003) also proposed an MILP model to integrate reliable process design with production and time-based PM planning for a large-scale industrial process. Their frameworks allow the reliability of equipment to be selected at the design stage by balancing the associated cost against its impact on process availability at the operational stage. However, the frameworks proposed in these works were only applied to hypothetical, simplified block-based case studies. The practicality of the proposed models may be low for large-scale problems because of the large number of variables, and the associated failure events and risks are not incorporated into the models either. A review of TBM applications before 2010 is given by Das and Sarmah (2010), focusing on theoretical optimisation models for the repair, replacement and inspection of components, assemblies and subsystems in heavy process industries. Maintenance grouping models based on economic dependencies are also provided.
According to that study, grouping maintenance activities, which deals with the combinatorial aspects of maintenance planning, is more practical in industry. However, the articles reviewed mostly deal with theoretical optimisation models without real applications, and most models make several assumptions that restrict their practicality in real industry. More recently, Mat Dan and Zulkafli (2016) applied Weibull analysis to the equipment of a gasification process to determine its failure behaviour. The data were either estimated or obtained from the plant owner with permission. They fitted Weibull functions to the data to determine the minimum and maximum constant maintenance intervals and the maintenance frequencies, and also estimated the labour required to perform the maintenance tasks. However, their method might not be applicable to the real process because several factors are ignored (e.g. unit dependencies, risks and the maintenance-production relationship). A framework to optimise the reliability of a heat supply system is presented by Postnikov et al. (2018), who consider the joint optimisation of the component reliability of heat source and heat network schemes. The problem focuses on determining the optimal redundancy (hot standby) allocation for the equipment (to reduce failure rates) and an effective maintenance policy supported by advanced control systems and instrumentation (to reduce restoration time). They also consider post-emergency heat supply in the event of component failure. Their model ensures the most efficient distribution of heat supply with high confidence in supply reliability. Thodi et al. (2016) applied a Markov modelling technique to identify the optimal PM intervals for each item of equipment in a liquefied natural gas processing facility.
By setting an availability threshold for the process, the PM interval for each unit can be determined. Their study determines the state of the equipment (failed or operating) at a given time based on the corresponding failure and repair rates. However, it did not consider the associated maintenance costs and assumed that equipment is 'as good as new' after maintenance, which is rarely the case in practice. Nguyen and Bagajewicz (2010) proposed a Monte Carlo simulation-based maintenance optimisation model with the objective of minimising maintenance costs over a 2 y horizon of plant operation. The model extends a previous framework (Nguyen and Bagajewicz, 2008) with additional features such as spare-parts inventory policies, labour allocation based on specific skills, and an imperfect maintenance model. They applied the model to realistic chemical processes: the Tennessee Eastman Problem (Ricker and Lee, 1995) and a large-scale fluid catalytic cracking (FCC) unit in a refinery. Based on reliability data from previous studies, the PM schedule and resource allocation (spare parts and labour) were determined at optimal maintenance cost. However, to simplify the model, they assumed that the maintenance intervals and starting times are fractions of the equipment's Mean Time to Failure (MTTF). The workforce size is also determined solely by the maintenance intervals, without schedule planning, so further development of a maintenance schedule is required to determine manpower and inventory requirements. Mendes and Lorenzoni (2018) also applied a Monte Carlo simulation approach to determine optimal periodic maintenance intervals (lowest maintenance cost at the required availability) for generic cold-standby systems. This approach works with any probabilistic failure function, not only exponential ones. Hamedi et al.
(2019) applied a similar approach to maximise process availability. Simulation allows simplified analysis of a complex system or maintenance policy, but a real application of such an approach is needed to validate the model. Halim et al. (2014) studied maintenance planning for industrial compressors. As demand changes every season, the operating speed of the compressors must change as well, which in turn results in unexpected breakdowns. The expected number of failures, which depends solely on the collected failure rate data, is minimised by determining the time of planned periodic maintenance interventions. However, the data source for their model is unknown, and the model depends entirely on statistical analysis. Kopanos et al. (2015) provided a general mathematical framework to optimise compressor operation and maintenance simultaneously in an air separation plant. As compressors are among the most energy-intensive equipment in most industrial environments, they are usually targeted for energy and cost savings (Saidur et al., 2010). In their model, they considered PM tasks with fixed and flexible starting times, both constrained by a predefined minimum shutdown time for compressors with an associated cost. The starting times of the flexible maintenance tasks are the decision variables of the framework. They also considered reactive maintenance scheduling in response to fluctuations in the compressed air and oxygen demands of the system. The proposed flexible maintenance policy reduced start-up, shut-down and power consumption costs. However, they assumed that the compressors are not susceptible to fouling. Fouling processes or vibration analysis should be incorporated into the model to allow more effective maintenance planning.
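Several of the works above (e.g. Charles et al., 2003; Halim et al., 2014) choose a periodic PM interval that trades maintenance cost against failures. As a stand-in for those models, which differ in detail, the sketch below uses the textbook periodic-PM model with minimal repair: failures between PMs are repaired without resetting the unit's age, so the expected number of failures in an interval T is the Weibull cumulative hazard (T/η)^β. The cost values c_p (per PM) and c_f (per failure), the parameter values in the usage lines and the function names are hypothetical. The sketch also shows how a simple one-dimensional search can replace a graphical search for the optimum.

```python
import math


def cost_rate(T, c_p, c_f, eta, beta):
    """Expected cost per unit time with PM every T and minimal repair in between.
    Expected failures per interval = Weibull cumulative hazard (T / eta) ** beta."""
    expected_failures = (T / eta) ** beta
    return (c_p + c_f * expected_failures) / T


def optimal_interval_closed_form(c_p, c_f, eta, beta):
    """Setting dC/dT = 0 gives T* = eta * (c_p / (c_f * (beta - 1))) ** (1 / beta)."""
    if beta <= 1:
        raise ValueError("no finite optimum unless the failure rate increases (beta > 1)")
    return eta * (c_p / (c_f * (beta - 1))) ** (1.0 / beta)


def optimal_interval_golden(c_p, c_f, eta, beta, lo, hi, tol=1e-6):
    """Golden-section search on [lo, hi]; cost_rate is unimodal in T when beta > 1."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if cost_rate(c, c_p, c_f, eta, beta) < cost_rate(d, c_p, c_f, eta, beta):
            b, d = d, c                      # minimum lies in [a, d]
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d                      # minimum lies in [c, b]
            d = a + inv_phi * (b - a)
    return (a + b) / 2.0


if __name__ == "__main__":
    # Hypothetical inputs: PM cost 500, failure cost 5,000, eta = 1,000 h, beta = 2.5
    args = (500.0, 5000.0, 1000.0, 2.5)
    print(optimal_interval_closed_form(*args))
    print(optimal_interval_golden(*args, lo=1.0, hi=5000.0))
```

For these hypothetical values the closed form gives T* ≈ 338.5 h, and the golden-section search converges to the same value. For β ≤ 1 the failure rate does not increase with age and the cost rate has no finite minimum, so periodic PM does not pay off under this model.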
From the above analysis of time-based maintenance planning, all of the published works rely on prior failure data of the equipment. Most of them formulate cost-based optimisation models and set a desired availability or reliability threshold to obtain the optimal frequencies of maintenance tasks (reactive or proactive). For example, Charles et al. (2003) fitted lifetime and repair time distributions for a semiconductor facility to identify the optimal maintenance intervals that minimise direct and indirect maintenance costs. Nguyen and Bagajewicz (2010) applied Monte Carlo simulation to plant reliability data, coupled with a cost optimisation model, to identify the maintenance intervals as well as the workforce size and spare-parts inventories. Thodi et al. (2016) applied Markov modelling with failure data from a liquefied natural gas processing plant to determine the optimal preventive maintenance tasks, which give minimal cost. Several sources also propose integrating maintenance planning with process design or production planning. Kopanos et al. (2015) formulated a mathematical framework for jointly planning compressor operation and maintenance in an air separation plant, in which maintenance and operational decisions satisfy demand fluctuations and minimum shutdown times for reliable operation. In summary, researchers often use historical failure data to represent numerically how equipment reliability varies with time. Estimated failure data for individual equipment are mostly obtained from the OREDA handbook (2015). These data allow the lifetime of units to be estimated so that maintenance can be performed to extend their uptime. However, such data are not always available in practice, and they can be inaccurate or misinterpreted.
As the data are often recorded by the person in charge, the failure modes of a unit may be misjudged. A unit replacement due to end of life or planned maintenance may also be mistaken for a failure during data collection (Ahmad and Kamaruddin, 2012). Relying on historical data is unsuitable for forecasting the condition, and planning the maintenance, of novel technologies and processes that have only recently been commercialised. Developing this type of maintenance model is also very time-consuming: most methods require intensive use of mathematical and programming knowledge to set up the optimisation models, which limits their use in practical applications. A simulation-based approach (Monte Carlo simulation) is an alternative for modelling a complex system or maintenance policy. Moreover, failure time data provide no information about the types of failure, so failure modes are often assumed and modelled as random events based on intuition. Dekker (1996) also pointed out that a sufficient set of lifetime data has to be collected over a few years for modelling purposes. Collecting, processing and validating an adequate amount of data can be time-consuming and expensive. It is often difficult to define the most effective maintenance intervals because of the lack of reliable historical data. This may lead to unnecessary maintenance, which may itself cause machine deterioration if performed incorrectly (Srivastava et al., 2018). Significantly improving process reliability would require selecting better equipment at the design stage. Few works have dealt with the interdependency between units. TBM relies on a statistical lifetime model to anticipate failure of a unit, but the failure propagation effect cannot be identified. For example, a failure in a parallel heat exchanger system might cause streams to be redistributed to a standby unit.
The standby unit might not have the capacity to accommodate multiple streams, which could eventually disrupt the downstream units. The effects of cascading failure should not be neglected: a small breakdown of one unit might adversely affect the whole production network and potentially cause a catastrophic disaster. Although time-based maintenance is a theoretical approach, it still provides a logical estimate of the failure behaviour of a unit, especially of failure trends over time. Engineers can understand the failure rates of units by investigating their past failure behaviour, which provides a numerical basis for maintenance decision-making. On its own, however, time-based maintenance planning is of limited use; to be applied in practice, it has to be combined with additional information and models. Other data sources or information available in practice can be integrated into the time-based approach. As suggested by Ahmad and Kamaruddin (2012), data such as equipment monitoring parameters, condition data from manufacturers or ISO standards, subjective information from experienced personnel and risk assessments can be combined with lifetime data to strengthen the maintenance decision-support system. The following section describes the use of risk analysis techniques to aid maintenance decision-making in the process industry. Periodic inspection and maintenance of equipment without any basis is certainly not the optimal maintenance strategy. As chemical process equipment often runs under severe conditions, it is susceptible to erosion, wear and tear, and other risks that can result in large financial losses. Risk-based maintenance (RBM) strategies have therefore received increasing attention from researchers and have been developed to account for unpredictable risks.
The main aim of RBM is to improve plant reliability while significantly reducing failure likelihood and risk severity/consequences (Silva, 2016). Combining risk assessment methods with a risk-based decision tool enables improved and optimised asset maintenance management. Information on how a risk-based methodology can achieve better asset management can be found in Montgomery and Serratella (2002). RBM first requires a risk assessment of the industrial unit. Risk assessment is a systematic approach to understanding, expressing and evaluating risk in the chemical industries using available knowledge. It was formally defined by Kaplan and Garrick (1981) as a triplet of questions: (1) "What can go wrong?", (2) "How likely is it?" and (3) "What are the consequences?". If the computed risk is above the acceptance threshold, appropriate mitigation actions need to be taken to reduce the overall risk. The inspection and maintenance strategies (CM or PM) are prioritised based on the quantified risk values for each item of equipment. The extended work by Hameed and Khan (2014) includes a qualitative risk ranking matrix for the equipment in a process plant. As another example, Risk-Based Inspection (RBI) has been proposed in Malaysia as an alternative to time-based maintenance activities through the implementation of the Special Scheme Inspection (SSI) Regulations 2014. Mohamed et al. (2018) stated that the introduction of the regulation brought tremendous change to Malaysian industries, as periodic inspection approaches were transformed into a more flexible risk-based approach. This was well received by the petroleum and chemical industries: 45.7 % of RBI Scheme users belong to the chemical and processing industry, 8.6 % to oil refineries and the rest to gas processing facilities.
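The quantify-then-prioritise step can be sketched as follows, using the common convention risk = likelihood × consequence and a single acceptance threshold. The equipment tags, probabilities, consequence values and threshold are hypothetical, and consequences are collapsed into a single monetary figure for simplicity, whereas the cited frameworks also weigh human, environmental and production effects.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Equipment:
    tag: str
    failure_probability: float  # likelihood, e.g. per year (hypothetical)
    consequence: float          # consequence of one failure, e.g. in cost units (hypothetical)

    @property
    def risk(self) -> float:
        # Common quantitative convention: risk = likelihood x consequence
        return self.failure_probability * self.consequence


def prioritise(units: List[Equipment], acceptance_threshold: float) -> List[Tuple[str, float, bool]]:
    """Rank equipment by risk (highest first) and flag units whose risk exceeds
    the acceptance threshold, i.e. candidates for mitigation by inspection or PM."""
    ranked = sorted(units, key=lambda u: u.risk, reverse=True)
    return [(u.tag, u.risk, u.risk > acceptance_threshold) for u in ranked]


if __name__ == "__main__":
    units = [
        Equipment("P-101", 0.20, 50_000.0),
        Equipment("E-201", 0.05, 400_000.0),
        Equipment("V-301", 0.01, 1_500_000.0),
    ]
    for tag, risk, act in prioritise(units, acceptance_threshold=12_000.0):
        print(f"{tag}: risk = {risk:,.0f} -> {'mitigate' if act else 'acceptable'}")
```

Ranking by a single scalar is the simplest choice; a qualitative risk matrix, as in Hameed and Khan (2014), replaces the product with category lookups but serves the same prioritisation purpose.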
Industrial accidents such as the Bhopal disaster (Chouhan, 2005), which led to the catastrophic release of methyl isocyanate (MIC) from Union Carbide's facility, could have been avoided through risk analysis (Ishizaka and Labib, 2014). The ex-post event analysis by Ishizaka and Labib (2014) shows that the incident was caused by a combination of design flaws, inappropriate materials, maintenance delays and human error. These errors could have been mitigated through proper risk analysis of the process, which suggests that learning from risk prediction is less costly than learning from accidents. Risk assessment methods are classified into four main groups: qualitative, semi-quantitative, quantitative and hybrid. Khan et al. (2015) conducted a comprehensive review of state-of-the-art risk assessment methods in the process industries up to 2015, covering past, present and future trends. According to their study, the popular risk models before the 1990s were Hazard and Operability Study (HAZOP), Failure Mode and Effects Analysis (FMEA), Fault Tree Analysis (FTA) and Event Tree Analysis (ETA). More recent works integrate these models with network approaches such as the Analytic Hierarchy Process (AHP) and the Analytic Network Process (ANP) to determine the interactions between events. For example, Ishizaka and Labib (2014) combined a Crisis Tree Analysis with AHP to identify the criticality of each risk in the Bhopal incident, and Labib (2015) similarly used Fault Tree Analysis. Following the evolution of technology, researchers tend to focus on quantitative approaches, including probabilistic and dynamic risk analysis (see the review of dynamic risk analysis by Villa et al., 2016). Fault detection is also integrated into such frameworks to aid decision-making. Table 1 summarises recent research articles on risk assessment approaches in the chemical process industry.
Researchers are evidently inclined toward quantitative or hybrid approaches, such as fuzzy modelling, Monte Carlo simulation, Bayesian Network (BN) and P-graph. These analytical approaches are developed to handle uncertainty through probabilistic quantification. BN is particularly well received owing to its ability to handle uncertainty and to update with new evidence. BN for risk quantification in the process industry is reviewed in detail by Kabir and Papadopoulos (2019), and its use for assessing climate change impacts in risk assessment is discussed by Sperotto et al. (2017). Risk analysis models are not discussed in detail in this paper; instead, the focus is on the application of such models to maintenance decision-making (see Section 2.1.3.2 onwards). One of the earliest RBM works is that of Harnly (1998), who developed a risk-ranked inspection recommendation procedure used in one of Exxon's chemical plants to prioritise repairs identified during equipment inspection. The risks in the plant are identified through FTA based on operating experience, and the equipment is prioritised by a severity index that combines the failure potential with the consequences of failure. Inspection or maintenance can thus be prioritised according to the computed risk values. Hannu and Backlund (2002) determined that resources can be used effectively by optimising where and when to perform maintenance on the equipment of a hydropower plant. However, the details of their risk methods are not reported, and the risk analysis requirements are vague and unorganised, with incomplete documentation. Jovanovic (2003) reviewed the use of RBM in European power and process plants and suggested methods, frameworks and standards to be followed. The review shows that detailed RBM requires extensive data, models and software.
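The gate logic behind FTA-based prioritisation can be sketched in a few lines, assuming independent basic events (the same assumption criticised later in this section). The tree, probabilities and consequence cost below are hypothetical, and the severity index is taken here simply as probability times consequence cost, which is one common way of combining failure potential with consequences rather than the exact index used by Harnly (1998).

```python
from math import prod


def and_gate(*probs: float) -> float:
    """AND gate: the output event occurs only if all independent input events occur."""
    return prod(probs)


def or_gate(*probs: float) -> float:
    """OR gate: the output event occurs if at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical tree: leak = corrosion OR (seal failure AND relief valve failure)
p_corrosion, p_seal, p_relief = 0.01, 0.05, 0.02
p_leak = or_gate(p_corrosion, and_gate(p_seal, p_relief))

consequence_cost = 2.0e6  # hypothetical consequence of a leak (currency units)
severity_index = p_leak * consequence_cost  # failure potential combined with consequence

print(f"P(leak) = {p_leak:.5f}, severity index = {severity_index:.0f}")
```

Ranking several equipment items by `severity_index` reproduces the prioritisation idea; the independence assumption in both gates is exactly what limits FTA for real, coupled systems.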
Khan and Haddara (2003) proposed an RBM framework in which failure event probabilities are determined via FTA. By setting risk as a constraint, the maximum allowable probability of failure can be determined using reverse fault tree analysis. The adverse effects of failure on financial, production, human and environmental aspects are incorporated into the risk analysis framework, and the optimal periodic PM is determined for given risk and reliability thresholds. In their follow-up work, they applied a similar strategy to offshore oil and gas processing facilities, constructing a detailed maintenance plan for safe and theoretically fault-free operation. These authors also applied a similar framework to an ethylene oxide production plant, where the ethylene transportation pipeline was identified as the highest-risk unit by risk analysis and evaluation, and a cost-effective maintenance program was then determined with the proposed methodology. Krishnasamy et al. (2005) also developed an RBM methodology for an operating oil-fired steam power plant generating electricity. The maintenance intervals are chosen to reduce financial risk and increase equipment reliability, with FTA again used to identify potential risks and failures. Hu et al. (2009) proposed an imperfect maintenance model by applying a similar RBM strategy to a petrochemical reforming reaction system at Sinopec Luoyang, establishing the periodic PM plan for plant equipment so that the risk criteria are met. These models show that subjective risk analysis using FTA is commonly applied in maintenance planning. FTA represents risks and their causes and effects graphically, which gives better visualisation of risks and fault propagation, and FTA-based methods are used extensively in the risk analysis and fault diagnosis of process systems.
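The idea of deriving a PM interval from risk and reliability thresholds can be illustrated under two simplifying assumptions that are ours, not those of the cited works: the unit's time to failure follows a two-parameter Weibull distribution, and PM restores it to an as-good-as-new state. An acceptable risk divided by the consequence of failure then gives the maximum allowable failure probability per interval, and the interval follows from inverting R(T) = exp(-(T/η)^β).

```python
import math


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """Two-parameter Weibull reliability R(t) = exp(-(t / eta) ** beta)."""
    return math.exp(-((t / eta) ** beta))


def max_pm_interval(r_min: float, beta: float, eta: float) -> float:
    """Longest interval T with R(T) >= r_min, assuming PM restores the unit to as-good-as-new."""
    return eta * (-math.log(r_min)) ** (1.0 / beta)


# Hypothetical risk constraint, in the spirit of reverse fault tree analysis
acceptable_risk = 1.0e4   # acceptable expected loss per interval (currency units)
consequence = 2.0e5       # loss if the unit fails (currency units)
p_max = acceptable_risk / consequence  # maximum allowable failure probability per interval
r_min = 1.0 - p_max                    # corresponding reliability threshold

beta, eta = 2.0, 8760.0   # hypothetical Weibull shape and scale (hours)
T = max_pm_interval(r_min, beta, eta)
print(f"p_max = {p_max:.2f}, PM interval = {T:.0f} h, R(T) = {weibull_reliability(T, beta, eta):.3f}")
```

With a shape parameter above 1 (wear-out), tightening the acceptable risk shortens the interval; for β = 1 the failure rate is constant and periodic PM brings no reliability benefit.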
However, this approach is not suitable for analysing large systems, as it cannot identify redundant failures, common cause failures or mutually exclusive primary events. The failure events in FTA are also assumed to be independent, which does not hold in real systems (Khakzad et al., 2011). FTA models are also unable to handle logical feedback loops directly and require additional steps to remove circular logic that may exist in real systems (Lim and Han, 2012). Apeland and Aven (2000) proposed the Bayesian approach as a potential alternative to the conventional probabilistic framework for RBM optimisation. Risk analysis is a tool for treating uncertainties, yet the classical probabilistic approach cannot represent them fully, and the analysis is often unconvincing, especially when the uncertainties are large. In an overview of BN, Weber et al. (2012) showed that BN is widely accepted and applied in the contexts of risk analysis, failure dependability and maintenance, with research on BN in these topics increasing by 800 % over the preceding 10 y. The full Bayesian approach describes uncertainties as probability distributions based on collected data, subjective estimates and experience. Breakdown events in a chemical process can lead to the release of hazardous emissions to the environment. Vassiliadis and Pistikopoulos (2000) therefore proposed a multi-criteria maintenance optimisation model that integrates environmental risk, plant reliability and process design over a given time horizon, applied to a reaction-separation process as a case study. The environmental risks are modelled as emission ratios relative to the standard limit values. Optimal maintenance actions (corrective or periodic maintenance) are then determined for the process to achieve maximum profitability or minimum environmental risk.
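To make the Bayesian idea concrete, the sketch below updates an uncertain constant failure rate with the standard Gamma-Poisson conjugate model: a Gamma(α, β) prior, for example elicited from expert judgement, combined with n failures observed over T operating hours gives a Gamma(α + n, β + T) posterior. The prior parameters and observations are hypothetical, and this is a textbook example rather than the specific model of Apeland and Aven (2000).

```python
def update_failure_rate(alpha: float, beta: float, failures: int, exposure_hours: float):
    """Gamma-Poisson conjugate update of a constant failure rate (per hour).

    Prior: rate ~ Gamma(shape=alpha, rate=beta). Likelihood: Poisson count over exposure_hours.
    Posterior: Gamma(alpha + failures, beta + exposure_hours).
    """
    return alpha + failures, beta + exposure_hours


def gamma_mean_std(alpha: float, beta: float):
    """Mean and standard deviation of a Gamma(shape=alpha, rate=beta) distribution."""
    return alpha / beta, alpha ** 0.5 / beta


# Hypothetical prior from expert judgement: about 2 failures per 10,000 h, weakly informative
alpha0, beta0 = 2.0, 10_000.0
# Hypothetical field data: 5 failures observed in 15,000 operating hours
alpha1, beta1 = update_failure_rate(alpha0, beta0, failures=5, exposure_hours=15_000.0)

mean, std = gamma_mean_std(alpha1, beta1)
print(f"posterior: Gamma({alpha1:.0f}, {beta1:.0f}), mean = {mean:.2e} /h, std = {std:.2e} /h")
```

The posterior mean lies between the prior mean (2.0e-4 /h) and the observed rate (about 3.3e-4 /h), and repeating the update as new data arrive gives the kind of evidence-driven revision that motivates Bayesian RBM.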
The trade-offs between environmental impacts, process revenues and maintenance costs are also determined. Such analysis provides guidelines for deciding which maintenance policies are needed to maximise profit while complying with environmental legislation limits. However, the maintenance model is simplified: it assumes only a single maintenance action at a time and independent failure events, which is rarely the case in practice. Dey (2001) proposed an Analytic Hierarchy Process (AHP) method to assess risk levels in a cross-country petroleum pipeline. The inspection and maintenance of pipeline segments are prioritised, which ultimately reduces the maintenance cost. Efficient design and operation guidelines, construction methodology and logical insurance plans are also suggested. The weighting factors of the failure modes are based on subjective information and the experience of various experts. In this method, expert opinions cannot fully capture the information about the risks; as a consequence, although the risks are quantified, failure events could still occur unpredictably. More advanced methods need to be combined with the cognitive experience of individuals to allow more accurate and effective risk quantification. For instance, fuzzy modelling of experts' opinions can be useful owing to its ability to account for vagueness and uncertainty. Bertolini et al. (2009) proposed a strategy that combines a panel of academics and refinery operators to develop qualitative and systematic procedures for RBI and RBM in petrochemical plants. Communication between experts is encouraged, and information about the process is shared. Failure categorisation, severity and decision criteria are developed from the outcomes of the panel, and maintenance actions and their schedule are identified to reduce financial and safety risks.
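AHP derives criterion weights from a pairwise comparison matrix. A minimal sketch using Saaty's principal-eigenvector method and consistency ratio is shown below; the three pipeline failure modes and their comparison values are hypothetical, chosen only so that the matrix is perfectly consistent, and are not Dey's (2001) data.

```python
import numpy as np

# Saaty's random consistency index for matrices of order n
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}


def ahp_weights(pairwise):
    """Return normalised priority weights and the consistency ratio (CR) of a pairwise matrix."""
    A = np.asarray(pairwise, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = int(np.argmax(eigvals.real))          # index of the principal eigenvalue lambda_max
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                           # weights sum to one
    ci = (eigvals[k].real - n) / (n - 1)      # consistency index
    return w, ci / RANDOM_INDEX[n]


# Hypothetical comparisons of three pipeline failure modes on Saaty's 1-9 scale:
# corrosion vs third-party damage vs construction defects
A = [
    [1.0, 3.0, 6.0],
    [1 / 3, 1.0, 2.0],
    [1 / 6, 0.5, 1.0],
]
weights, cr = ahp_weights(A)
print(np.round(weights, 3), "CR =", round(cr, 3))  # CR < 0.1 is conventionally acceptable
```

Real expert matrices are rarely perfectly consistent; a CR above about 0.1 is the usual signal to revisit the judgements, and fuzzy variants replace the crisp entries with intervals.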
The procedure led to easily comprehensible technical discussions between the personnel who interact with the system. Qinqin et al. (2014) used bow-tie analysis and fishbone diagrams to evaluate the critical factors of environment-related hazardous events and risks in the petrochemical industry. Various risk indices are developed for determining routine inspections and for better management of risk sources within the facility. The framework is a valuable and flexible tool, although the model algorithms may need to be updated dynamically as new research data are generated or time factors are considered. The output could also be enhanced to allow better and faster data interpretation, and statistical analysis could be incorporated to account for the confidence intervals of the risk levels. Mohamed and Saad (2016) devised a quantitative likelihood of risk (LOR) method to evaluate the risks identified by FTA for a mixer and a valve in a petrochemical plant. The risk criteria are expressed in monetary terms based on the area within the damage radius. The CM and PM costs are counted as financial loss, while production loss is neglected. The authors also assumed that the consequences of the different risk criteria carry equal weights. This is rarely the case, as the weights depend on the preferences of the decision-makers. Aoki et al. (2014) conducted a study of RBM in a Japanese nuclear power plant, where FTA is used to evaluate the potential risks of the process based on experience-based judgement. As nuclear power plants require high-reliability operation, careful judgement is necessary to prevent undesirable dangerous events. In their work, they proposed using probabilistic analysis of plant reliability and the corresponding failure rate to determine the relevant maintenance intervals and actions.
The aim of their work is to develop new inspection and on-site work-team plans using past technical knowledge and experience of maintenance planning. Since probabilistic analysis is theoretical and often represents failure behaviour inaccurately, its practicality in nuclear power plants is highly doubtful given their high-reliability requirements. The method may nevertheless be applicable to process plants with frequent, less critical failures. Melani et al. (2018) presented a framework to determine the criticality of the equipment in the flue gas desulphurisation system of a coal-fired power plant. HAZOP and FTA are used to assess potential process deviations and failure events, and Failure Modes, Effects and Criticality Analysis (FMECA) is then used to identify the criticality of the events. ANP is used to rank the most critical components, considering each piece of equipment's Mean Time to Repair (MTTR), financial impact, environmental impact and impact on energy generation. The rankings were shared with the plant's engineering team, who agreed with the results based on their perception of the plant's historical data. However, the methods rely mainly on subjective opinions, and risk uncertainties are not considered. More advanced methods (e.g. BN or fuzzy inference) could be incorporated to account for risk uncertainties, together with fault diagnosis and prognosis for better maintenance planning. Another RBM study was performed by Vinod et al. (2014) on an H2S-based process plant, proposing a qualitative approach to estimate the consequences of H2S leakage using influence factors from the American Petroleum Institute (API). In their case study, the H2S flowrate falls below the lowest category in API 581, so they developed a new approach to scale the quantity factor by performing CFD simulations of H2S dispersion at ground and stack level.
The proposed methodology classifies the equipment into inspection categories based on failure likelihood and consequences, which substantially reduces unnecessary inspection of the system. The model also provides useful insights and visual representations of H2S dispersion in the plant. However, maintenance is only performed after leakage has occurred; fault prognosis of leakage using condition monitoring sensors could be implemented in the model. Keshavarz et al. (2012) combined a preventive maintenance policy, active redundancy and standby redundancy to minimise risk over the expected life of an LNG plant. In their work, the risk consequences are calculated in monetary terms, focusing on initial asset cost, income, production losses and maintenance costs. The proposed method improves plant reliability while keeping operational risk within the acceptable level, and it outperforms a maintenance strategy with periodic intervals. However, the work did not consider equipment prioritisation criteria when characterising shutdown intervals. The number of maintenance activities and shutdowns could be reduced by excluding selected equipment that poses a relatively low risk to plant reliability. The equipment criticality analysis in the works mentioned is mainly qualitative, which limits its usefulness for quantifying risk uncertainties. Shutdown inspection is necessary to perform maintenance checks on the equipment and reduce risk exposure. Hameed and Khan (2014) estimated shutdown and maintenance intervals for a gas chilling and liquefaction unit in a hydrocarbon processing facility based on reliability-based risk analysis of the equipment, reducing total operating costs through lower production loss and lower risk. Hameed et al.
(2016) extended this work by integrating human error factors to determine the optimal shutdown inspection and maintenance interval for a processing unit. Human errors are modelled using the Success Likelihood Indexing Method (SLIM). Expert judgement is used to quantify the performance factors and the importance of each task, which are then converted and incorporated into the failure probability equations. Huang et al. (2012) presented a qualitative and quantitative framework to assess risk factors in the petrochemical industry. Five criteria are considered in their study: financial risk, logistical support, service level, learning and innovation, and risk control. The performance of each unit on these criteria is assessed by collecting expert opinions, and the Analytic Network Process (ANP) is used to calculate the weighting factors. Four companies handling upstream, midstream and downstream processes were chosen as industrial case studies. The proposed model is shown to be a reliable reference for asset management, as the dependencies between the criteria can be determined with ANP. The downside of this method is that the risk values are deterministic and cannot fully represent the uncertainties. Kumar and Maiti (2012) presented fuzzy ANP (FANP) to select the optimal maintenance policies in a section of a chemical plant, considering equipment risks and maintenance costs as criteria. The use of fuzzy intervals in priority judgements overcomes the shortcomings of AHP and goal programming approaches, which use crisp (deterministic) values for decision-making, and ANP handles all relationships between the goal, criteria and alternatives. The study provides better insight into finding safer and more economical maintenance strategies and is applicable across process industries. Yazdi et al.
(2019) developed a fuzzy dynamic risk-based maintenance investment optimisation for the separation system of a typical offshore facility. FTA is used to assess failure likelihood and failure severity, and the safety investment factors are determined using AHP. As classical AHP is often insufficient to capture human cognition, the Intuitionistic Fuzzy Analytic Hierarchy Process (IFAHP) is proposed to determine the important factors of the risk index quantitatively. A bi-objective optimisation model is formulated to minimise failure probability and risk severity, subject to cost budget constraints, and recommendations are provided for expanding safety investment in risk management. Jaderi et al. (2019b) applied fuzzy RBM methods to the risk analysis of asset failure in the Amirkabir petrochemical plant. Fuzzy membership functions allow vagueness and risk uncertainty to be incorporated into the evaluation of risk factors. Their work combines quantitative and qualitative risk analysis with statistical analysis. Four consequence factors are considered: operational impact, operational flexibility, maintenance cost, and impact on safety and environment. They concluded that the fuzzy calculation is more rigorous and accurate than traditional RBM methods. BahooToroody et al. (2019) presented a dynamic risk-based optimisation framework for maintenance scheduling at a Natural Gas Regulating and Metering Station. Time series regression is used to predict the future perturbation behaviour of the unit's pressure from historical data. The failure behaviour of the plant is modelled using Dynamic BN, damage modelling and sensitivity analysis, and a Markov degradation model is used to model process deterioration in discrete time.
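A discrete-time Markov degradation model of the kind mentioned above can be sketched with a transition matrix over condition states, where the failed state is absorbing. The four states, the monthly transition probabilities and the decision rule (schedule maintenance before the probability of failure exceeds a limit) are hypothetical illustrations, not the model of BahooToroody et al. (2019).

```python
import numpy as np

# Hypothetical monthly transition matrix over condition states:
# 0 = good, 1 = degraded, 2 = severely degraded, 3 = failed (absorbing)
P = np.array([
    [0.90, 0.08, 0.02, 0.00],
    [0.00, 0.85, 0.10, 0.05],
    [0.00, 0.00, 0.80, 0.20],
    [0.00, 0.00, 0.00, 1.00],
])


def failure_probability(P: np.ndarray, start_state: int, steps: int) -> float:
    """Probability of being in the failed (last) state after `steps` transitions."""
    dist = np.zeros(P.shape[0])
    dist[start_state] = 1.0
    return float((dist @ np.linalg.matrix_power(P, steps))[-1])


def maintenance_step(P: np.ndarray, start_state: int, p_limit: float, horizon: int = 120) -> int:
    """Last step before the failure probability exceeds p_limit (hypothetical decision rule)."""
    for n in range(1, horizon + 1):
        if failure_probability(P, start_state, n) > p_limit:
            return n - 1
    return horizon


print(maintenance_step(P, start_state=0, p_limit=0.10))
```

Because the failed state is absorbing, the failure probability never decreases with time, so the first crossing of the limit gives a well-defined maintenance deadline.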
The optimal maintenance schedule is estimated by performing cost analysis on the inspection and maintenance strategies. Bhandari et al. (2016) also applied dynamic risk assessment and determined the optimal maintenance interval for a typical offshore processing facility. Undesirable failure events are first identified using scenario analysis, and the BN approach is used to evaluate the dynamic risk of the facility. The methodology incorporates sub-component dependencies, models risk uncertainty and produces consistent maintenance and inspection frequencies. Consequences can also be represented as nodes in the network. Consequence analysis can be performed by determining the damage radii (the radius of the area within which damage would occur), property damage and toxic effects, as presented by . Leoni et al. (2019) applied the BN approach in a risk-based maintenance model for a Natural Gas Regulating and Metering Station, following BahooToroody et al. (2019). A step-by-step framework is proposed to model the associated risks using BN. In their work, the risks are analysed qualitatively and divided into three main categories: minor, major and catastrophic risk. The optimal maintenance interval for each component is estimated, and the critical component (longest maintenance interval) is identified. BN is clearly beneficial owing to its updating ability: dynamic risk assessments can be performed as the opinions of plant authorities change, and its use of statistical functions to model risk uncertainties can help engineers estimate maintenance outcomes. The works mentioned above clearly demonstrate these advantages of BN. Nevertheless, fault diagnosis and prognosis could be incorporated into the BN framework to allow more effective and accurate maintenance planning.
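The updating ability of BN can be illustrated with a three-node network evaluated by exact enumeration: two root causes (corrosion, overpressure) feed a leak node, and observing a leak updates the belief that corrosion is present. All conditional probabilities are hypothetical; practical studies typically use dedicated BN software and much larger networks.

```python
from itertools import product

# Hypothetical prior probabilities of two root causes
P_CORROSION = {True: 0.10, False: 0.90}
P_OVERPRESSURE = {True: 0.05, False: 0.95}
# Hypothetical conditional probability table: P(leak = True | corrosion, overpressure)
P_LEAK = {(True, True): 0.90, (True, False): 0.40, (False, True): 0.30, (False, False): 0.01}


def joint(corrosion: bool, overpressure: bool, leak: bool) -> float:
    """Joint probability factorised along the network structure."""
    p_leak = P_LEAK[(corrosion, overpressure)]
    return P_CORROSION[corrosion] * P_OVERPRESSURE[overpressure] * (p_leak if leak else 1.0 - p_leak)


def prob_leak() -> float:
    """Marginal probability of a leak, summing over both root causes."""
    return sum(joint(c, o, True) for c, o in product([True, False], repeat=2))


def prob_corrosion_given_leak() -> float:
    """Posterior belief in corrosion once a leak has been observed (Bayes' rule)."""
    return sum(joint(True, o, True) for o in (True, False)) / prob_leak()


print(f"P(leak) = {prob_leak():.4f}")
print(f"P(corrosion | leak) = {prob_corrosion_given_leak():.3f}  (prior 0.10)")
```

Observing the leak raises the corrosion probability from 0.10 to about 0.66, which is the diagnostic reasoning a maintenance planner would use to decide which component to inspect first.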
Using AI or Big Data analytics for risk evaluation is an emerging field in risk-based maintenance management. Sen et al. (2018) recently proposed an AI platform to facilitate asset management in water, wastewater and reuse plants, where equipment criticality scores, coupled with risk reduction metrics and the probability of failure, allow proper maintenance planning. Jaderi et al. (2019a) proposed an unsupervised learning model based on self-organising maps (SOMs) to evaluate the risk level in a petrochemical plant in Iran. SOMs are applicable to both quantitative and qualitative data and can be used to estimate the risk of petrochemical processes or equipment accurately. Risk management is inherently a matter of pattern recognition, a type of problem to which SOMs are well suited, as suggested by Nikoo et al. (2019). Five critical parameters (operational impact and flexibility, maintenance cost, safety impact, environmental impact and frequency) are input to the model, and the training algorithm is taken from Kohonen (1989). Their results were compared with those of the fuzzy model, and they concluded that the SOM approach gives higher accuracy and more flexibility in data input. Using adaptive learning algorithms to predict future risks is a promising framework that allows engineers to plan maintenance proactively. However, such models often require extensive sensor data and computational power, and investment in skills development or training in computing techniques is needed. The framework could also be integrated with health condition data of the units to make better predictions. A considerable number of publications and applications have been reported on using risk assessment to aid maintenance management, and several risk assessment tools coupled with maintenance planning have been reported.
The tools include FTA (Mohamed and Saad, 2016), ANP (Huang et al., 2012), FANP (Kumar and Maiti, 2012), AHP (Dey, 2001), FAHP (Zhen et al., 2018), bow-tie analysis and fishbone diagrams (Qinqin et al., 2014), HAZOP (Melani et al., 2018), FMECA, the standardised API approach (Vianello et al., 2016), BN (BahooToroody et al., 2019) and SOM approaches (Jaderi et al., 2019a). Hameed and Khan (2014) and Hameed et al. (2016) incorporated human errors into risk-based maintenance using the Success Likelihood Indexing Method (SLIM). Dynamic risk analysis has also been applied in maintenance management, for example by Bhandari et al. (2016), to adapt to continuously changing expert opinions. The analysis above shows that human judgement is required to quantify the risk level. This requires multidisciplinary expertise to assess the potential failures of the process, and a detailed assessment of the equipment based on past experience is needed to compute the severity index. For large processes, the procedure becomes tedious, requiring intensive documentation and expert meetings (for FMECA, FTA, HAZOP, ANP and AHP). The problem can grow so large that risk computation with FTA or FMECA becomes impractical. The risk indices calculated with these methods are also deterministic and cannot capture the uncertainties of failure events, human judgement and cognition. Fuzzy modelling is therefore incorporated into these methods to improve their ability to cope with uncertainty. AI approaches such as SOMs are also applied to predict future risk so that maintenance can be planned in advance (Jaderi et al., 2019a), but they require some computational knowledge to interpret the model. Risk-based maintenance can be a more practical and effective approach than a purely time-based one.
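The SLIM calculation used by Hameed and Khan (2014) and Hameed et al. (2016), mentioned above, can be sketched as follows. In the usual formulation, the success likelihood index (SLI) is a weighted mean of performance shaping factor ratings, and the human error probability (HEP) is obtained from log10(HEP) = a·SLI + b, with a and b calibrated on two reference tasks of known HEP. The factor weights, ratings, reference tasks and the way the HEP is combined with an equipment failure probability (an independent OR) are illustrative assumptions, not the values or equations of the cited works.

```python
import math


def success_likelihood_index(weights, ratings) -> float:
    """SLI: weighted mean of performance shaping factor ratings (0 = worst, 1 = best here)."""
    return sum(w * r for w, r in zip(weights, ratings)) / sum(weights)


def calibrate(sli_1: float, hep_1: float, sli_2: float, hep_2: float):
    """Fit log10(HEP) = a * SLI + b through two reference tasks with known HEP."""
    a = (math.log10(hep_1) - math.log10(hep_2)) / (sli_1 - sli_2)
    b = math.log10(hep_1) - a * sli_1
    return a, b


def human_error_probability(sli: float, a: float, b: float) -> float:
    return 10.0 ** (a * sli + b)


# Hypothetical reference tasks: a poorly supported task and a well supported task
a, b = calibrate(0.2, 1e-1, 0.9, 1e-4)

# Hypothetical PSFs for a shutdown maintenance task: training, procedures, stress, time available
weights = [0.4, 0.3, 0.2, 0.1]
ratings = [0.8, 0.6, 0.5, 0.7]
sli = success_likelihood_index(weights, ratings)
hep = human_error_probability(sli, a, b)

# Assumed combination: the task fails if the equipment fails OR a human error occurs (independent)
p_equipment = 0.01
p_total = 1.0 - (1.0 - p_equipment) * (1.0 - hep)
print(f"SLI = {sli:.2f}, HEP = {hep:.2e}, combined failure probability = {p_total:.4f}")
```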
The downside of this approach is that failures and their consequences are stochastic, which requires probabilistic or fuzzy modelling. The uncertainties are not easy to model, as there is often no strong background knowledge or experience of the asset's failure behaviour or other influential factors, and the estimates can be biased by the individuals involved. The accuracy of the proposed models is also questionable, since the information on failure events is based on past historical data. For extremely rare failure modes, or for novel process technologies, historical data may be very sparse or entirely absent; the uncertainty then has to be handled by fuzzy modelling of the risk, informed by experts' experience or opinions. CBM relies on monitoring the operating conditions of the system to detect problems and provide relevant information to maintenance personnel. When necessary, the system can be maintained preventively well before a failure occurs. This, in turn, saves resources (personnel and spare parts) and improves system availability. The main goal of CBM is to eliminate operating time wasted on mitigating failures and the risk of failures through quick detection of abnormalities (Xiang et al., 2012). This approach is required for demanding systems whose reliability and functionality are of the utmost importance, such as aerospace equipment, nuclear power plants and offshore installations (Marseguerra et al., 2002). Maintenance decision-making in a CBM program depends on two elements: fault diagnosis and prognosis. Diagnosis is the process of finding the source of a fault (Jeong et al., 2007), while prognosis is the process of estimating or predicting when a failure may occur (Lewis and Edwards, 1997). The main aim of diagnosis is to give engineers early warning while the monitored equipment is operating but deteriorating.
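The distinction between diagnosis and prognosis can be illustrated with a deliberately simple sketch: diagnosis flags the latest reading once it crosses an alarm limit, and prognosis fits a linear degradation trend and extrapolates it to a failure threshold to estimate the remaining useful life (RUL). The linear trend, the readings and both limits are hypothetical; real prognostic models are usually nonlinear and probabilistic.

```python
import numpy as np


def diagnose(readings, alarm_limit: float) -> bool:
    """Diagnosis: flag an abnormal condition once the latest reading exceeds the alarm limit."""
    return bool(readings[-1] > alarm_limit)


def remaining_useful_life(times, readings, failure_threshold: float) -> float:
    """Prognosis: fit a linear trend and extrapolate it to the failure threshold."""
    slope, intercept = np.polyfit(times, readings, deg=1)
    if slope <= 0:
        return float("inf")  # no upward trend, so no predicted crossing
    t_fail = (failure_threshold - intercept) / slope
    return max(float(t_fail - times[-1]), 0.0)


# Hypothetical vibration readings (mm/s) taken every 10 days
days = np.array([0, 10, 20, 30, 40], dtype=float)
vibration = np.array([2.0, 2.5, 3.0, 3.5, 4.0])

print("alarm:", diagnose(vibration, alarm_limit=3.5))
print("estimated RUL (days):", round(remaining_useful_life(days, vibration, failure_threshold=7.0), 1))
```

The alarm says the equipment is deteriorating now; the RUL estimate (60 days here) says how much time remains to schedule PM before the failure threshold is reached.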
There may still be some time before the actual failure occurs, and prognosis addresses this situation: its main aim is to provide further warning by estimating or predicting when the equipment will fail. The equipment can then be fully utilised, and the appropriate time to carry out PM just before failure can be determined. Condition monitoring plays a significant role in detecting and diagnosing equipment or system faults before any sudden failure or breakdown. The current health of the equipment can be inferred from the monitoring information, which helps maintenance personnel decide on or modify the plant's maintenance strategy. The international standard ISO 13381 documents detailed monitoring, diagnostic and prognostic approaches for industrial machinery. The most popular monitoring approach used in CBM programs since 1997, especially for rotating equipment (e.g. compressors, bearings and gearboxes), is vibration monitoring (Al-Najjar, 1997). Vibration monitoring refers to sensing and analysing the vibration characteristics of equipment. The health of the equipment is assessed in situ (on-line) with special devices, such as vibration sensors, to detect changes that may indicate damage or degradation. In practice, monitoring can be either periodic or continuous. Sound or acoustic monitoring is another technique frequently used in CBM. Whereas vibration sensors are rigidly mounted on the component to register local motion, acoustic sensors 'listen' to the equipment. Like vibration monitoring, acoustic monitoring is executed on-line, either periodically or continuously, and anomalies can be detected from the frequency and amplitude of the collected signal. Acoustic monitoring can be used efficiently for the timely detection of check valve faults (Ayo-Imoru et al., 2015).
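Two basic features used in vibration monitoring are the overall RMS level and the dominant frequency of the spectrum. The sketch below computes both with NumPy's FFT for a synthetic signal; the sampling rate and the 50 Hz and 120 Hz components are hypothetical, and a real program would compare these features against baselines or alarm levels for the specific machine.

```python
import numpy as np


def vibration_features(signal: np.ndarray, fs: float):
    """Return the overall RMS level and the dominant (non-DC) frequency of a vibration signal."""
    rms = float(np.sqrt(np.mean(signal ** 2)))
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    peak = 1 + int(np.argmax(spectrum[1:]))  # skip the DC bin
    return rms, float(freqs[peak])


# Hypothetical 1 s record sampled at 1 kHz: a 50 Hz running-speed component plus a weaker 120 Hz component
fs = 1000.0
t = np.arange(1000) / fs
signal = 2.0 * np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

rms, f_peak = vibration_features(signal, fs)
print(f"RMS = {rms:.3f}, dominant frequency = {f_peak:.0f} Hz")
```

A rising RMS indicates that overall vibration is increasing, while a change in the dominant frequency or the appearance of new peaks helps localise the fault (e.g. imbalance versus bearing defects).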
Fluctuations in process noise data are used as indicative signals to monitor the reactor, e.g. the coolant flow velocity distribution. Ansari et al. (2008) applied the neutron noise approach to detect flow-induced vibration, enabling flow blockages or leakages in pipelines to be detected. Another method is oil analysis, or lubricant monitoring, in which the condition (quality) of the oil is evaluated to determine whether it is suitable for further use. The results of oil analysis can reveal the wear condition of internal oil-wetted components, such as engine shafts. The method therefore has two general purposes: safeguarding oil quality and safeguarding the components involved. A detailed discussion of the physical tests and contamination identification procedures that constitute a regular periodic oil sampling program is given by Newell (1999). Other condition monitoring approaches include electrical, temperature and physical condition monitoring. Electrical monitoring involves measuring changes in equipment properties such as resistance, conductivity, dielectric strength and potential, and is mainly used to monitor the state of the plant's electrical systems. Temperature monitoring is often applied to identify and monitor failures of electric and electronic components. Physical condition monitoring focuses on identifying physical changes in materials, such as cracks and corrosion (e.g. in heat exchanger tubes), and is typically carried out off-line by direct inspection. For more information on CBM methods, readers are referred to Ahmad and Kamaruddin (2012). This section presents an overview of the application of CBM in chemical and process engineering. Ahmad and Kamaruddin (2012) compared the use of TBM and CBM in their comprehensive review, noting that data accuracy and availability are the main issues for CBM.
TBM, on the other hand, relies solely on theoretical statistical rules and analysis, whose accuracy is often criticised. It is also more time-consuming to conduct and requires engineers to have a mathematical background. More detailed applications and published works on TBM and CBM in other fields can also be found in Ahmad and Kamaruddin (2012). Shin and Jun (2015) provided an overview of relevant international CBM standards, including the definitions, advantages and drawbacks of CBM tools. Barbera (1994) presented a heuristic to optimise the PM policy of a chemical system subject to random 'shocks'. The case investigated involves rotating pump units in a Monsanto chemical plant. 'Shock' is modelled as a stochastic, non-decreasing wear and deterioration process. The proposed model can help decision-makers determine whether it is cheaper to repair or maintain a deteriorating production system. In this work, only wear and tear of the equipment is considered as 'shock'. Modelling 'shock' with a probability distribution function is a reasonable approach, but it cannot realistically represent the actual shocks, which could instead be measured by installing physical sensors on the equipment to monitor its condition. Zhao et al. (2010) proposed a predictive maintenance policy based on condition data obtained by simulating a chemical process (the Tennessee Eastman problem). The process variables are transformed into a combined index, and a method to calculate the control limit for this index is proposed and used for fault prediction. The reliability of the system is then estimated and analysed for different maintenance cycles. Sikos and Klemeš (2010) suggested a methodology that combines specific Heat Exchanger Network (HEN) optimisation tools with reliability software packages to improve HEN reliability.
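The combined index of Zhao et al. (2010) is not detailed here, so the sketch below substitutes a widely used alternative for combining several process variables into one monitoring index: Hotelling's T² statistic, with the control limit taken as an empirical quantile of T² over normal-operation data. The data are synthetic, the variable choice is hypothetical and the 99 % quantile is an assumed setting.

```python
import numpy as np


def t2_statistic(X, mean, cov_inv):
    """Hotelling's T^2 for each row of X: (x - mean)^T S^-1 (x - mean)."""
    D = np.atleast_2d(X) - mean
    return np.einsum("ij,jk,ik->i", D, cov_inv, D)


def fit_monitor(X_normal, quantile=0.99):
    """Estimate mean, inverse covariance and an empirical T^2 control limit from normal operation."""
    mean = X_normal.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X_normal, rowvar=False))
    limit = float(np.quantile(t2_statistic(X_normal, mean, cov_inv), quantile))
    return mean, cov_inv, limit


# Synthetic normal-operation data for three hypothetical process variables (temperature, pressure, flow)
rng = np.random.default_rng(0)
X_normal = rng.normal(loc=[350.0, 20.0, 5.0], scale=[2.0, 0.5, 0.2], size=(500, 3))
mean, cov_inv, limit = fit_monitor(X_normal)

new_samples = np.array([
    [350.5, 20.1, 5.05],   # close to normal operation
    [358.0, 22.5, 5.90],   # drifted operating point
])
t2 = t2_statistic(new_samples, mean, cov_inv)
print(np.round(t2, 1), "limit =", round(limit, 1), "alarm:", t2 > limit)
```

Tracking how quickly the index approaches its limit over time turns this detection step into an input for predictive maintenance timing.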
For HENs, fouling mitigation strategies can be modelled using Fuzzy Logic Expert Systems (FLES) and Computational Fluid Dynamics (CFD) software. The authors also review the available reliability software packages comprehensively. The suggested approach is applied to the HEN of a petroleum refinery in Hungary, considering major failures such as breakdowns, fouling and leakage of heat exchangers. They concluded that integrating failure analysis and heat exchanger fouling estimation with HEN reliability optimisation could eliminate the need for total unit shutdowns for cleaning. Fault diagnostic and prognostic approaches could be implemented alongside the framework to allow better HEN maintenance planning. Niu (2010) developed a novel condition-based framework to optimise the maintenance cost of low methane compressors and induction motors, employing a data fusion strategy to improve health condition monitoring, health assessment and fault prognostics. The results show that the approach is efficient and effective for maintenance planning. However, the method requires large volumes of data that are expensive to acquire. Trzcinski and Markowski (2018) proposed using heat exchanger (HE) temperature data for fouling prediction. Data from the clean HE are fed into an ANN model to predict the outlet temperatures, and fouling can then be identified by comparing the predicted and measured outlet temperatures, since fouling reduces the heat transferred. Diaz-Bejarano et al. (2019) proposed an approach for estimating the fouling parameters of refinery HEs using the maximum likelihood method and applied it to a real refinery case study. Their method allows early detection of acute solid deposition and early diagnosis to support immediate remedial decision-making. Diaz-Bejarano et al. (2018) suggested a mathematical simulation to estimate fouling from changes in heat transfer coefficients and pressure drops.
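Estimating fouling from changes in heat transfer performance typically rests on the fouling resistance R_f = 1/U_fouled - 1/U_clean, where the overall coefficient U is obtained from the duty as U = Q/(A·LMTD). The sketch below computes a counter-current log-mean temperature difference and the resulting fouling resistance; the temperatures, duty, area and clean coefficient are hypothetical, and the pressure-drop part of the Diaz-Bejarano et al. (2018) approach is not modelled.

```python
import math


def lmtd_counterflow(th_in: float, th_out: float, tc_in: float, tc_out: float) -> float:
    """Log-mean temperature difference for a counter-current exchanger (temperatures in K or deg C)."""
    dt1 = th_in - tc_out
    dt2 = th_out - tc_in
    if math.isclose(dt1, dt2):
        return dt1  # limit of the LMTD when both end differences are equal
    return (dt1 - dt2) / math.log(dt1 / dt2)


def overall_u(duty_w: float, area_m2: float, lmtd: float) -> float:
    """Overall heat transfer coefficient U = Q / (A * LMTD), in W/(m2 K)."""
    return duty_w / (area_m2 * lmtd)


def fouling_resistance(u_fouled: float, u_clean: float) -> float:
    """Fouling resistance R_f = 1/U_fouled - 1/U_clean, in m2 K/W."""
    return 1.0 / u_fouled - 1.0 / u_clean


# Hypothetical exchanger: clean design U and one set of plant measurements
u_clean = 500.0   # W/(m2 K)
area = 100.0      # m2
dt_lm = lmtd_counterflow(150.0, 100.0, 30.0, 90.0)
u_now = overall_u(2.6e6, area, dt_lm)
print(f"LMTD = {dt_lm:.1f} K, U = {u_now:.0f} W/(m2 K), R_f = {fouling_resistance(u_now, u_clean):.2e} m2 K/W")
```

Trending R_f over time, rather than looking at a single value, is what allows cleaning to be scheduled before the loss of duty becomes costly.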
Their study was demonstrated on HEs in a crude oil refinery and showed excellent results. Miyata et al. (2019) presented a Convolutional Neural Network (CNN) approach to detect and diagnose faults in a heat source system with thermal water storage. As CNNs are typically used for image recognition, the process data have to be converted into image data. The model achieves high-accuracy fault diagnosis when compared against real data. These methods (ANN, maximum likelihood, CNN) help analyse real data and indicate faults with an associated probability. However, the analysis requires large data volumes and considerable computational power. Wang et al. (2013) designed a general mathematical framework for joint spare part ordering and CBM for a general production system. The prognostic information (remaining useful life) on the condition of the critical components is updated online using a Bayesian updating technique. These data are fed into a cost-based optimisation model to determine the optimal replacement time, from which the ordering time for spare parts is identified. This provides more reasonable maintenance planning that accounts for spare-part delays and logistics as well as fault prediction. However, the framework has not yet been tested on any real application. The process degradation model has to be validated with real-time or historical data to confirm its applicability. Kareem and Jewo (2015) developed a failure prediction model based on the monitored condition of the Carbon Black Processing Unit of a petrochemical plant in Nigeria. The traditional inspection model is modified to incorporate progressive deterioration of the equipment, so that preventive or corrective actions can be planned. Physical data on the plant's critical component, a centrifugal compressor, are collected: temperature, pressure and vibration measurements.
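The Bayesian updating step in frameworks such as Wang et al. (2013) can be illustrated with a deliberately simple case. The sketch assumes a linear degradation path y = theta·t plus Gaussian noise and a normal prior on the rate theta, which gives a closed-form (conjugate) posterior. The planning rule is a hypothetical stand-in for their cost-based optimisation: replace when the mean path reaches a failure level, and order the spare part one lead time earlier. The readings, failure level and lead time are all invented for illustration.

```python
def update_degradation_rate(prior_mean, prior_var, times, readings, noise_var):
    """Conjugate normal update of theta in the assumed model
    y_i = theta * t_i + e_i, with e_i ~ N(0, noise_var).
    Returns (posterior mean, posterior variance) of theta."""
    precision = 1.0 / prior_var + sum(t * t for t in times) / noise_var
    post_var = 1.0 / precision
    weighted = prior_mean / prior_var + sum(t * y for t, y in zip(times, readings)) / noise_var
    return post_var * weighted, post_var


def plan_replacement(rate, now, failure_level, lead_time):
    """Hypothetical rule: replace when the mean path rate * t reaches failure_level,
    and place the spare-part order lead_time earlier (but not in the past).
    Returns (replacement time, order time, remaining useful life)."""
    replace_at = failure_level / rate
    order_at = max(now, replace_at - lead_time)
    return replace_at, order_at, replace_at - now


if __name__ == "__main__":
    # Hypothetical monthly readings of a condition index.
    t = [1, 2, 3, 4, 5]
    y = [0.9, 2.1, 2.9, 4.2, 5.0]
    mean, var = update_degradation_rate(0.5, 1.0, t, y, noise_var=0.04)
    print(mean, var, plan_replacement(mean, now=5, failure_level=12.0, lead_time=2.0))
```

Each new reading can be fed back through `update_degradation_rate`, using the previous posterior as the new prior, so that the order date moves as the evidence accumulates.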
Kareem and Jewo then predict the expected failure from the collected data, so that maintenance activities can be planned proactively. Their findings showed that planned maintenance with failure prediction is profitable. According to their study, the component's functionality and condition have to be monitored closely to prevent false positives or negatives. Time and effort must also be invested in process control so that anomalies can be detected proactively. Susto et al. (2015) presented a novel multiple classifier method for CBM of an implanter in a semiconductor manufacturing process. Their machine learning (ML) based framework allows the prediction of unexpected breaks and lifetime. A cost-effective maintenance plan can then be drawn up and updated dynamically from the currently collected data. Ben Ali et al. (2015) presented an ANN model to assess the performance degradation of rolling element bearings in industrial machinery. The model classifies defects accurately to aid maintenance decision-making, although it may require considerable computational power to process high-dimensional data and to train the model. Cho et al. (2016) demonstrated the potential of a condition-monitoring approach to determine future failure times for an offshore plant, in which the vibration of the gas compressor shaft is monitored online. They proposed a continuous-time Markov model-based approach, a regression model and a hybrid of the two to predict whether the vibration level will exceed a threshold value. The next failure time is then predicted from this analysis, which provides a relevant guideline for maintenance planning. The Markov model offers interesting insight into the life cycle of the compressor. However, the problem can become very large if more parameters and longer time frames are considered.
Pan et al. (2019) combined health condition data into a multi-level indicator to determine the failure phenomena of the pump shaft system in nuclear power plants. Fuzzy modelling and AHP are then used to model the deterioration of the key indicators, which provides a good basis for predictive maintenance planning. Benkedjouh et al. (2013) presented a two-step (offline and online) approach to bearing health monitoring. Feature reduction is applied offline to analyse the vibration signals, and the current health state of the bearing is then estimated online using support vector regression. Zhang and Hu (2013) summarised the milestones, challenges and emerging trends of safety prognostic technology in complex petroleum engineering systems. By combining knowledge of the process and the system, the authors proposed that HAZOP analysis, degradation analysis and dynamic Bayesian networks (BN) can be integrated with a database to predict failures, accounting for both the spatial and temporal dimensions. More effective technology systems and knowledge bases are needed to move maintenance practices from a 'fail and fix' to a 'predict and prevent' strategy. Based on the reviews above, CBM requires the collection of data on the physical health condition of the operating assets. These data reflect the real condition of the equipment, which allows accurate interpretation of the potential risks. Vibration monitoring is mainly used for rotating equipment in the process industries. Reported applications come from the nuclear power industry (Ayo-Imoru and Cilliers, 2018) and oil refineries (Kareem and Jewo, 2015). Temperature and noise analysis are also common approaches to asset health monitoring. Fouling analysis, combined with CFD tools, has been applied to HEN fouling mitigation (Sikos and Klemeš, 2010).
Condition monitoring methods can be integrated with fault diagnostic and prognostic approaches, such as the Bayesian method (Wang et al., 2013), to devise efficient and accurate maintenance plans. Shin and Jun (2015) summarised the tools and approaches typically used in each CBM phase, including data processing, diagnostics, prognostics and maintenance strategy selection. They also surveyed international standards for CBM of various equipment. Various sophisticated AI and machine learning tools are useful for providing quick, accurate and online prediction frameworks for fault prognosis and diagnosis, as well as for degradation/deterioration modelling. Examples include ANN (Ben Ali et al., 2015), CNN (Miyata et al., 2019), Support Vector Regression (Benkedjouh et al., 2013), multiple classifier models (Susto et al., 2015), Bayesian approaches (Wang et al., 2013) and continuous-time Markov chains (Cho et al., 2016). In contrast to TBM, the dataset used for CBM reflects the true health condition of the equipment, whereas TBM uses past failure data and needs a statistical function to model failure behaviour. Although CBM can predict equipment failure, data collection and analysis are often expensive, because costly sensors and instrumentation must be installed. Investors and shareholders are often reluctant to adopt this approach because they are unwilling to invest in such equipment (Ahmad and Kamaruddin, 2012). The collected condition data are also difficult to process. For example, vibration or acoustic data contain noise, and visual image data consist of high-dimensional numerical values. Analysing and interpreting these data requires investment in experts and in software training. The CBM framework can be combined with simulation of the maintenance system (e.g. discrete event simulation) to enhance the quality of the decision-making process (Alrabghi and Tiwari, 2015).
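Cho et al. (2016) used a continuous-time Markov model to predict when compressor vibration would exceed a threshold. The sketch below shows one standard calculation for such a model: the expected time until an absorbing "above threshold" state is first reached. It solves -Q_TT m = 1, where Q_TT is the transient block of the generator matrix Q. The division into vibration bands and the transition rates are hypothetical and are not taken from their study.

```python
import numpy as np


def mean_time_to_absorption(generator, absorbing):
    """Expected time to reach any absorbing state from each transient state of a
    continuous-time Markov chain with generator matrix Q (rows sum to zero).
    Uses the standard result -Q_TT m = 1 on the transient block Q_TT."""
    q = np.asarray(generator, dtype=float)
    absorbing = set(absorbing)
    transient = [i for i in range(q.shape[0]) if i not in absorbing]
    q_tt = q[np.ix_(transient, transient)]
    m = np.linalg.solve(-q_tt, np.ones(len(transient)))
    return dict(zip(transient, m.tolist()))


if __name__ == "__main__":
    # Hypothetical vibration bands: 0 = normal, 1 = elevated, 2 = high,
    # 3 = above alarm threshold (absorbing). Rates per 1000 operating hours.
    Q = [[-0.20, 0.20, 0.00, 0.00],
         [0.05, -0.35, 0.30, 0.00],
         [0.00, 0.00, -0.50, 0.50],
         [0.00, 0.00, 0.00, 0.00]]
    print(mean_time_to_absorption(Q, absorbing=[3]))
```

The state space grows with every additional monitored parameter and band. This is the size problem noted above for longer time frames and more parameters.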
Ramesh et al. (2020) highlighted the preparation needed, especially in logistics, to move towards fully remotely condition-monitored process plants. Performing regular PM requires stopping the process and incurs expensive downtime costs. An opportunistic maintenance (OM) policy exploits production stoppages as opportunities to execute maintenance. A well-designed combination of preventive and opportunistic maintenance can be beneficial for cost reduction. This idea is illustrated in Figure 5 for two components/equipment items. If component/unit 1 fails, there is an opportunity to maintain component 2 together with component 1 and so avoid a second stoppage cost. Group maintenance policies are extensively applied to complex multi-unit systems. Koozhaki et al. (2012) pointed out that the aim of OM is to group the maintenance activities of two or more components based on their conditions, in order to reduce maintenance costs, downtime and hazardous risks. Maintenance activities can be grouped (opportunistically) by considering the common dependencies within the system. Group maintenance relies on the fact that the common maintenance cost of a group of components is lower than the total cost of maintaining these components individually. Under this policy, components with similar operating conditions (such as electrical components) are identified and undergo inspection and maintenance tasks together (Shi and Zeng, 2016). In other words, a group maintenance policy provides a basis for combining maintenance activities and sharing the set-up costs among several components in the system. This can lower the cost compared with conducting maintenance tasks separately for each component. This economic dependency, exploited by saving set-up costs, is the one most commonly considered by previous researchers. However, as pointed out by Besnard et al. (2009), higher maintenance and spare parts costs would be incurred for equipment that is still in good shape. The grouping strategy is only suitable for components with high failure rates. It also requires complex planning and might take more time than the scheduled downtime.
Petchrompo and Parlikad (2019) offered a comprehensive review of asset management for multi-asset systems. They classified multi-component dependencies into three types: performance, stochastic (similar failure rates) and resource (limited repairmen or spares). They also examined the modelling approaches adopted by previous researchers and identified potential research gaps in multi-asset systems. Jonge and Scarf (2019) presented a comprehensive review of the mathematical approaches used in maintenance planning up until 2018, including grouping maintenance strategies. Opportunistic maintenance models have been extensively reviewed by Lapa et al. (2006). Generally, these models are formulated within two main frameworks, TBM and CBM. The TBM framework defines an age threshold, beyond which opportunistic maintenance is considered. The CBM framework performs opportunistic maintenance based on measurements of the current system condition (Moghaddam, 2013). For instance, opportunistic maintenance is accepted if a system is found defective or worn out at an inspection, or if its degradation level exceeds a predetermined level. A maintenance policy termed opportunistic replacement and inspection was introduced by Radner and Jorgenson (1963). It is based on research conducted by the RAND Corporation to find the optimal maintenance policy for manned aircraft and ballistic missile systems. In the process industry, opportunities for low-cost maintenance may arise from unavoidable stoppage events (e.g. exhaustion of raw materials, lack of demand or harsh environmental conditions).
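The set-up-cost argument for grouping, and the counter-argument of Besnard et al. (2009), can both be expressed in a few lines of arithmetic. In the sketch below, grouping n tasks saves (n - 1) set-ups. It also adds a penalty for each task moved away from its individually optimal date, for example replacing a part that is still in good shape. The penalty term and all cost figures are hypothetical simplifications, not a model from the cited works.

```python
def separate_cost(setup_cost, task_costs):
    """Each component is maintained on its own: one set-up per task."""
    return sum(setup_cost + c for c in task_costs)


def grouped_cost(setup_cost, task_costs, shift_penalties):
    """All tasks share a single set-up. shift_penalties is a hypothetical extra
    cost for moving each task away from its individually optimal date."""
    return setup_cost + sum(task_costs) + sum(shift_penalties)


def grouping_saving(setup_cost, task_costs, shift_penalties):
    """Positive when grouping pays off: (n - 1) shared set-ups minus the penalties."""
    return (separate_cost(setup_cost, task_costs)
            - grouped_cost(setup_cost, task_costs, shift_penalties))


if __name__ == "__main__":
    # Hypothetical: one shutdown set-up and three pending tasks.
    print(grouping_saving(1000.0, [200.0, 300.0, 150.0], [0.0, 120.0, 80.0]))
```

When set-up costs are small relative to the penalties of maintaining healthy equipment early, the saving turns negative. This is the situation in which grouping is not worthwhile.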
One of the pioneering works on OM in production systems is that of Kulshrestha (1968), who proposed an opportunistic maintenance policy for two classes of units. When a breakdown occurs in one unit, it provides an opportunity to repair an already-failed unit of the other class. Bergman (1978) suggested a preventive maintenance policy for a single machine with two identical components. Upon failure of one component, the other is also repaired if its statistical age exceeds a predefined critical age. Similarly, Zheng (1995) presented an opportunistic maintenance model for non-identical units in a system. Failures and the resulting reactive replacements create opportunities to repair any other unit whose hazard rate has exceeded a hazard rate tolerance. Early research therefore focused mainly on using the component's age to identify opportunities for simultaneous maintenance. However, the statistical representation of equipment age requires historical failure data and is often inaccurate. Performing excessive maintenance on functioning systems may also incur higher costs and waste resources. For industrial applications, Nilsson et al. (2009) studied OM in a nuclear power plant, reconstructing the replacement schedules of shaft seals in a feedwater pump system. The components in the pump system are clustered according to their individual failure data. The proposed approach minimises the cost subject to constraints, discounted by the time value of money. A sensitivity analysis was also performed in which the different parameters were varied in relation to the discount rate. The authors concluded that the proposed OM optimisation model, which is deterministic, is applicable in practice. Laggoune et al. (2009) presented an approach for grouping maintenance activities on a hydrogen compressor in a continuously operating oil refinery.
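The hazard-rate rule attributed to Zheng (1995) above, like Bergman's critical-age rule, can be sketched as follows. When one unit fails, every other unit whose hazard rate exceeds a tolerance is maintained at the same stoppage. The sketch assumes Weibull lifetimes with hazard h(t) = (beta/eta)(t/eta)^(beta - 1). The unit names, Weibull parameters, ages and tolerance are all hypothetical.

```python
def weibull_hazard(age, shape, scale):
    """Weibull hazard rate h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (age / scale) ** (shape - 1)


def opportunistic_set(failed, ages, params, tolerance):
    """Units to maintain when `failed` breaks down: the failed unit itself plus
    every other unit whose current hazard rate is at or above `tolerance`.
    ages: unit -> current age; params: unit -> (shape, scale)."""
    chosen = [failed]
    for unit, age in ages.items():
        if unit == failed:
            continue
        shape, scale = params[unit]
        if weibull_hazard(age, shape, scale) >= tolerance:
            chosen.append(unit)
    return chosen


if __name__ == "__main__":
    # Hypothetical ages in operating hours and Weibull (shape, scale) parameters.
    ages = {"pump": 500, "seal": 3000, "motor": 8000}
    params = {"pump": (2.0, 10000), "seal": (2.5, 4000), "motor": (1.5, 20000)}
    print(opportunistic_set("pump", ages, params, tolerance=2e-4))
```

A purely age-based rule like this inherits the weakness noted above. Its decisions are only as good as the fitted lifetime distribution.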
Laggoune et al. (2009) applied statistical failure data to estimate the ages of the components and identify the maintenance periods. Maintenance on the components is grouped by solving a cost-based optimisation model that selects the grouping minimising the total expected maintenance cost per unit time. The maintenance plan, including opportunistic grouping, is expressed as decision nodes in each time period and incorporated into a Monte Carlo framework. A numerical search is then performed to identify the pathway that yields the minimal maintenance cost. Although a grouping strategy might not be optimal for individual components, the overall cost can be optimised. The framework requires further refinement for real applications because several assumptions are made (e.g. instantaneous maintenance actions and neglected resource availability).
Recent Development in OM
Another line of research that has emerged in OM is the delay-time concept. The concept was first proposed by Christer and Waller (1984). The delay time is the time window between the point at which a defect first becomes detectable and the resulting failure. Wang (2012) provided a comprehensive review of numerous models and case studies using this concept. Zhao et al. (2015) used the delay-time concept to determine the optimal PM interval in a steel mill. Production stoppages caused by waiting for raw materials provide opportunities to inspect and perform maintenance actions. According to their study, production stoppage is not the critical factor in determining the optimal PM frequency in the steel mill. This might not hold for large-scale processes, where production stoppages could cause major production losses. The relationship between production stoppage and PM scheduling should therefore not be ignored. Berrade et al. (2017) also used this concept to evaluate the optimal intervals for postponing maintenance to avoid production disruption.
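A textbook form of the delay-time inspection model introduced by Christer and Waller (1984) makes the trade-off behind these studies concrete. The sketch rests on four simplifying assumptions: defects arrive as a homogeneous Poisson process with rate k, delay times are exponential with rate mu, inspections are perfect, and each inspection takes downtime d. Under these assumptions, the probability that a defect turns into a failure before the next inspection at interval T is b(T) = 1 - (1 - e^(-mu T))/(mu T). The expected cost per unit time is then C(T) = [kT(b c_f + (1 - b) c_r) + c_i]/(T + d). All numbers below are hypothetical, and the grid search stands in for a proper optimiser.

```python
import math


def prob_failure_before_inspection(interval, mu):
    """b(T): chance that a defect arising uniformly in (0, T] fails before the
    inspection at T, given exponential delay times with rate mu."""
    x = mu * interval
    return 1.0 - (1.0 - math.exp(-x)) / x


def cost_rate(interval, arrival_rate, mu, c_fail, c_repair, c_inspect, downtime):
    """Expected cost per unit time C(T) of inspecting every `interval` time units."""
    b = prob_failure_before_inspection(interval, mu)
    defects = arrival_rate * interval  # expected defects per cycle
    cycle_cost = defects * (b * c_fail + (1.0 - b) * c_repair) + c_inspect
    return cycle_cost / (interval + downtime)


def best_interval(candidates, **kw):
    """Grid search over candidate inspection intervals."""
    return min(candidates, key=lambda t: cost_rate(t, **kw))


if __name__ == "__main__":
    kw = dict(arrival_rate=0.2, mu=0.5, c_fail=5000.0, c_repair=500.0,
              c_inspect=300.0, downtime=0.1)  # hypothetical values, weeks
    grid = [0.5 * i for i in range(1, 41)]
    t_star = best_interval(grid, **kw)
    print(t_star, cost_rate(t_star, **kw))
```

Postponing maintenance after a defect is found, as in Berrade et al. (2017), amounts to consuming part of the delay time. The failure-to-repair cost ratio c_fail/c_repair then largely decides whether postponement pays.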
Berrade et al. (2017) explored the conditions under which maintenance can be postponed, in order to devise cost-effective plans for maintenance and production, assuming the unit is still functional after a defect is found. The method was tested on a hypothetical food-processing production line. The authors claimed that, if the failure cost is not too high, postponed maintenance can benefit production that is constrained by a time limit. However, the time to failure is modelled with an assumed probabilistic model, without real data as supporting evidence. Huynh et al. (2013) devised a strategy that uses opportunities from the monitored condition of the equipment (deterioration level) to determine a cost-effective maintenance program. They proposed a general maintenance framework for a k-out-of-n deteriorating system, with the objective of minimising the long-run expected cost of maintenance actions. The decision variables are the inspection time and the conditional reliability. A related study applied a Markov decision process (MDP) to a k-out-of-n system to devise an opportunistic maintenance program for deterioration caused by environmental conditions. The multi-component system is solved by decomposing it into several single-component systems. The authors claimed that deterioration accelerates as environmental conditions worsen. The study is interesting because it considers environmental influence. It might be applicable to real chemical systems with unit dependencies (e.g. the utilities supply system of a chemical site). Laggoune et al. (2010) studied the optimisation of replacement for a multi-component series system subject to random failures, in which the cost rate is minimised under a general lifetime distribution. When the system is down, whether correctively or preventively, the opportunity to preventively replace functional components is considered.
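The k-out-of-n structure used by Huynh et al. (2013) generalises both series and parallel systems. A series system, such as the one studied by Laggoune et al. (2010), is the n-out-of-n case. The sketch below computes k-out-of-n reliability under two assumptions: components are independent and identical, and each has reliability p. The dependencies that make opportunistic maintenance interesting are deliberately left out.

```python
from math import comb


def k_out_of_n_reliability(k, n, p):
    """Probability that at least k of n independent, identical components
    (each working with probability p) are working."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    p = 0.9  # hypothetical component reliability over the planning period
    for k in (1, 2, 3):
        print(f"{k}-out-of-3: {k_out_of_n_reliability(k, 3, p):.4f}")
```

The gap between, say, the 2-out-of-3 and 3-out-of-3 values is the redundancy margin. This margin is what allows maintenance of a degraded unit to be deferred to the next opportunity.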
The optimisation model of Laggoune et al. (2010) for component replacement was validated on the hydrogen compressor of an oil refinery. Li et al. (2018) proposed a maintenance strategy for an automotive production line considering the reliability, operating rate and maintenance cost of the equipment. In their study, the relationship between the failure rate and the operating rate of the production line is determined using regression analysis. The maintenance plan for the most critical machines is based on delay-time theory, and the maintenance period is optimised with the total shutdown time as the objective function. In these works, maintenance of the units is determined solely by failure-data statistics, which gives the optimal maintenance cost for a given reliability threshold. However, these models neglect the cumulative effect of maintenance actions on the condition of the machine. Kamaruddin and Ab-Samat (2014) conducted an overview of OM implementation. They concluded that most researchers use mathematical modelling, fuzzy logic and statistical analysis for OM planning. They also found that the majority of works published up to 2013 use cost as the criterion. Few publications on industrial applications of OM were found. Cavalcante and Lopes (2015) presented an approach to constructing the OM policy for a sugar and ethanol cogeneration system. Maintenance focuses mainly on the water supply pumps of the boilers, exploiting the maintenance opportunities that arise when a plant system fails. The times at which opportunities arise are modelled with a Poisson distribution. The authors formulated a multi-objective optimisation with cost and availability as criteria. The scaling factors were then determined using value function methods, based on the solution ranges obtained from single-objective optimisation of each criterion.
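The scaling step described for Cavalcante and Lopes (2015) can be illustrated with linear value functions. Each criterion is rescaled so that its worst value, taken from the single-objective optima, maps to 0 and its best value maps to 1. The rescaled values are then combined with a decision-maker weight. The linear form, the additive aggregation, the candidate policies and all numbers are assumptions for illustration, not the authors' exact formulation.

```python
def linear_value(x, worst, best):
    """Linear value function: 0 at the worst bound, 1 at the best bound."""
    return (x - worst) / (best - worst)


def rank_policies(policies, cost_bounds, avail_bounds, w_cost):
    """policies: name -> (cost per unit time, availability).
    cost_bounds / avail_bounds: (worst, best), e.g. from the two single-objective optima.
    Returns policy names sorted from highest to lowest additive value."""
    def score(item):
        cost, avail = item[1]
        return (w_cost * linear_value(cost, *cost_bounds)
                + (1.0 - w_cost) * linear_value(avail, *avail_bounds))
    return [name for name, _ in sorted(policies.items(), key=score, reverse=True)]


if __name__ == "__main__":
    # Hypothetical OM policies: (cost per unit time, availability).
    policies = {"A": (100.0, 0.95), "B": (80.0, 0.90), "C": (120.0, 0.99)}
    for w in (0.5, 0.9):
        print(w, rank_policies(policies, (120.0, 80.0), (0.90, 0.99), w))
```

Changing the weight shows how the recommended policy follows the decision-maker's preference between cost and availability.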
The results of Cavalcante and Lopes show how maintenance opportunities can be exploited according to decision-makers' preferences, which encourages the use of similar mathematical models in real applications. Basri et al. (2017) pointed out that most researchers focus only on maintenance planning without implementing it to check its relevance. The reviews discussed here show that the scope of maintenance grouping covers identifying similarities or identical characteristics among maintenance actions, systems or components, and spare parts or tools. The grouping approach in PM planning has received more attention from researchers because it provides various benefits. These include simplifying maintenance actions, aiding mathematical analysis and creating a standard process plan, which can lead to time and cost savings. As indicated by Rustogi and Strusevich (2012), grouping is an easy and straightforward approach, as the number of groups indicates the number of maintenance actions involved. The authors also pointed out that reliable data and a grouping procedure can be incorporated into computer-based integrated maintenance planning and scheduling to assist practical decision-making. Cluster analysis allows an easier representation of the maintenance-production scheduling problem and could improve system performance and productivity. Yang et al. (2018) presented an OM strategy for a crystalliser casting machine. The strategy is based on determining control limits for the deterioration level of the equipment, which provide opportunities for maintenance actions. Different maintenance strategies are compared, namely regular maintenance, condition-based maintenance and a combination of both. However, the authors considered only a single piece of equipment. In a complex chemical process, internal maintenance opportunities may arise (e.g. due to the failure of other equipment), and failure dependencies may create further opportunities for maintenance actions.
The proposed model could be extended by incorporating additional elements or parameters. Human errors and grouping could also be balanced to reduce major accident risk. In chemical engineering, few publications account for human errors in opportunistic maintenance strategies. Sheikhalishahi et al. (2017a) incorporated human factors into opportunistic maintenance strategies for a fuel-supplied electrical power plant. Equipment maintenance actions are grouped with failure-based opportunities. A component fails when its cumulative deterioration exceeds a given threshold, which creates an opportunity to repair or maintain other, non-failed components. Two human factors, time pressure and fatigue level, are incorporated into the model. Grouping maintenance activities while considering human factors could achieve significant cost savings and reduce the risks of delaying maintenance. Another case study, which incorporates human factors into maintenance grouping in a petrochemical plant, is presented by Sheikhalishahi et al. (2017b) using similar assumptions and models. According to their results, the proposed meta-heuristic algorithms can be applied to small, medium and even large-scale problems with reasonable computation time. The downside is that implementing the algorithms requires an extensive programming background. A significant number of articles exist on opportunistic maintenance management in the chemical process industry. According to the above discussion, researchers mainly identify maintenance action clusters through the assets' economic dependencies (e.g. the sharing of the fixed set-up cost of maintenance), as in Laggoune et al. (2010). This sharing strategy may yield lower overall costs than conducting maintenance tasks separately on each piece of equipment.
Works on grouping strategies based on stochastic dependencies (e.g. the deterioration characteristics of the assets) are also found. Examples are Nilsson et al. (2009) for a feedwater pump system and Yang et al. (2018) for a production system. The delay-time concept is also applied in OM (Wang, 2012). Maintenance is purposely delayed after the first fault detection to maximise production uptime, provided the fault is not too critical. Zhao et al. (2015) showed how this delayed opportunity can be used to devise a cost-effective maintenance plan for production. The failure rates of the components or equipment are assumed to be known from historical data (Laggoune et al., 2010). As mentioned in section 2.1.2.3, failure data are usually difficult to obtain and can be inaccurate or misinterpreted. Because the data are often recorded by the person in charge, the failure modes of the units may be judged incorrectly. The choice of probability distribution also affects the results significantly, especially since failures are rare events that lie in the tails of probability distributions. A unit replacement due to end-of-life or planned maintenance may also be mistaken for a failure during data collection (Ahmad and Kamaruddin, 2012). Previous research has also neglected spare parts inventory management. The maintenance grouping strategy may incur higher maintenance and spare part costs for equipment that is still in good shape. The grouping method also requires complex planning prior to implementation. It is sometimes impossible to know in advance when corrective maintenance will be needed, as failures are difficult to predict. Incorrect maintenance grouping may sacrifice the optimal preventive maintenance schedule and delay repairs. Despite the significant advantages of opportunistic maintenance management, there is a research gap between the theory and the practical application of this approach.
Most works are demonstrated on the maintenance of multiple components within a single piece of equipment, while multi-unit systems are considered with at most two dependent components. The scale of the problem could be enlarged to determine the optimal clusters for a turnaround of the process equipment. As suggested by Kamaruddin and Ab-Samat (2014), the process should be simulated to check the validity of the models before implementation. The bottlenecks of the process can be identified through simulation and criticality analysis of the plant assets, so that maintenance can be performed more effectively. Chin et al. (2019) presented a resource-allocation model to identify the optimal investment needed. Failure events are expressed as 'cost demands', while the opportunistic maintenance cost saved is treated as 'cost supply'. However, their model requires further validation. Human errors can be integrated into the simulation, as suggested by Sheikhalishahi et al. (2017b), to further lower the risk of failures and improve process reliability. Jonge and Scarf (2019) noted ample research opportunities in considering different dependencies in multi-asset systems. These include economic (similar set-up costs), structural (system configuration), stochastic (similar failure or deterioration behaviour) and resource (limited repairmen and spares) dependencies. An appropriate maintenance strategy could reduce the probability of equipment failure and improve working conditions. This could result in lower maintenance costs and higher product quality, which in turn lead to larger profits. As reported by Okoh and Haugen (2014), nearly half of the major accidents that occurred in chemical and petrochemical plants from 2000 to 2011 (46 % of 183 major accidents) were maintenance-related. This suggests that proper selection of the maintenance policy is crucial to reduce operational, production, human safety and environmental risks.
Selecting an appropriate maintenance strategy for a process is a complex task, owing to the difficulty of data collection and the large number of conflicting criteria or goals that must be considered. These criteria include investment cost, safety hazards, environmental constraints, failure and maintenance costs, plant reliability and manpower allocation. Some of them are not easily quantified and cannot be expressed in monetary terms. Maintenance strategy selection is therefore often treated as a complex multi-criteria decision-making (MCDM) problem. The MCDM approach developed by Shafiee et al. (2015) considers a set of maintenance alternatives (CBM, PM, CM, RBM, TBM, OM) from which the decision-maker (i.e. top management) must select the optimal strategy. The maintenance strategies are evaluated according to the weighted importance of a finite set of predefined criteria (see Figure 6). A comprehensive review of MCDM frameworks for maintenance strategy selection before 2013 is presented in (Shafiee, 2015). Table 2 shows selected published works on maintenance strategy selection based on MCDM frameworks. Most of these approaches express qualitative data, such as risks, quantitatively, e.g. (Sahoo et al., 2014). Although risk or safety thresholds are quantified, the physical meaning of the numerical values is doubtful. Deterministic risk values also cannot capture the uncertainty within them. Fuzzy logic allows the uncertainties in safety to be modelled in decision-making, and a number of publications apply fuzzy approaches within MCDM frameworks. With this approach, experts' judgements can be properly quantified and incorporated into the model to aid decision-making. AHP is also widely used to determine the weighted rankings of the criteria.
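The AHP weighting step mentioned above can be sketched with Saaty's principal-eigenvector method. Criteria weights are taken from the principal eigenvector of a reciprocal pairwise-comparison matrix. Consistency is checked with CR = CI/RI, where CI = (lambda_max - n)/(n - 1) and RI is Saaty's tabulated random index. The pairwise judgements in the usage example are hypothetical and are not taken from any of the cited studies.

```python
import numpy as np

# Saaty's random consistency index for matrices of order n.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_weights(pairwise):
    """Return (weights, consistency ratio) for a reciprocal pairwise matrix,
    using the principal eigenvector as the priority vector."""
    a = np.asarray(pairwise, dtype=float)
    n = a.shape[0]
    eigvals, eigvecs = np.linalg.eig(a)
    k = int(np.argmax(eigvals.real))          # principal (Perron) eigenvalue
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                           # normalise weights to sum to 1
    lam_max = eigvals[k].real
    ci = (lam_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return w, cr


if __name__ == "__main__":
    # Hypothetical judgements: safety vs. cost vs. reliability.
    m = [[1, 3, 5],
         [1 / 3, 1, 3],
         [1 / 5, 1 / 3, 1]]
    weights, cr = ahp_weights(m)
    print(weights.round(3), round(cr, 3))  # CR below 0.1 is usually deemed acceptable
```

These criteria weights would then score each maintenance alternative (CBM, TBM, OM, etc.). Fuzzy AHP variants replace the crisp judgements with fuzzy numbers.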
Event simulation is also used to evaluate different policies, see (Alrabghi et al., 2017). However, such simulation models for complex maintenance systems are computationally expensive, and powerful workstations and multiple software licences are inevitably required. The maintenance problems discussed in previous sections focus only on long-term maintenance intervals and the optimal maintenance time for the process. Constraints on resource availability (e.g. time, manpower, storage and money) are generally not considered in these planning models. This section discusses works on maintenance schedule optimisation in the chemical industry that emphasise the allocation of maintenance tasks. A joint production and maintenance scheduling model with multiple preventive maintenance services is also presented. Maintenance in a chemical plant first requires long-term turnaround planning, after which the short-term scheduling of individual maintenance tasks can be performed (see Figure 7). The turnaround usually takes at most one year for a chemical plant (Cheung et al., 2004). Pistikopoulos et al. (2001) formulated a mixed-integer linear programming (MILP) model for production scheduling that exploits idle times of multipurpose equipment to perform preventive maintenance. Piecewise-constant failure rates of the equipment are assumed and incorporated into a multi-period planning model. Explicit uptime constraints are included to quantify equipment availability. The solutions balance process design costs against the maintenance costs corresponding to the system effectiveness criterion. However, solving the optimisation problem requires complex mathematical formulations and computational effort, which limits its applicability to large-scale problems.
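The core idea of placing PM in low-loss or idle periods under a resource limit can be shown with a toy problem. The sketch solves it by brute-force enumeration. This is a plainly swapped-in technique: it is not the MILP of Pistikopoulos et al. (2001) and scales exponentially. It assumes that each unit needs exactly one PM in the horizon and that at most `crews` units can be maintained per period. The loss table is hypothetical.

```python
from itertools import product


def schedule_pm(loss, crews):
    """Exhaustive search for a PM schedule.
    loss: unit -> list of hypothetical production losses, one per period,
          if that unit is maintained in that period (0 = idle period).
    crews: maximum number of PM tasks per period.
    Returns (unit -> period, total loss), or (None, inf) if infeasible."""
    units = list(loss)
    n_periods = len(next(iter(loss.values())))
    periods = range(n_periods)
    best, best_cost = None, float("inf")
    for assignment in product(periods, repeat=len(units)):
        # Resource constraint: at most `crews` units maintained in any period.
        if any(assignment.count(p) > crews for p in periods):
            continue
        cost = sum(loss[u][p] for u, p in zip(units, assignment))
        if cost < best_cost:
            best, best_cost = dict(zip(units, assignment)), cost
    return best, best_cost


if __name__ == "__main__":
    # Hypothetical losses over three periods; period 1 is an idle window.
    loss = {"reactor": [50, 0, 40], "column": [30, 0, 10], "boiler": [20, 0, 25]}
    print(schedule_pm(loss, crews=1))
```

With one crew, only one unit can use the idle window. This is the kind of resource interaction that MILP formulations capture at realistic scale.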
Cheung and Hui (2001) also formulated a MILP model for long-term maintenance scheduling of a chemical complex with eight plants and associated utility plants. The planning horizon is two years, with monthly planning periods. A cyclic schedule is built on the assumption that each unit must be shut down for maintenance once a year. Scenarios including the availability of intermediate storage and the import of electricity and intermediate products are considered. Again, the computational burden grows as the problem domain gets larger, so more robust and computationally efficient algorithms are needed for larger-scale problems. Ahmed et al. (2015) formulated a maintenance scheduling problem for a complex gas absorption system in a hydrocarbon processing facility. The appropriate schedule of inspection, maintenance and replacement is determined subject to constraints on equipment risk, maintenance cost, system reliability and availability. The model assumes that different actions can alter the original failure characteristics over time. It successfully optimised the existing maintenance schedule of a gas absorption system, with an estimated 17 % saving in maintenance cost at similar system reliability levels. However, accurate failure analysis of the equipment is needed, which depends heavily on historical data, fault diagnostics and prognostic information. Sahoo et al. (2014) proposed maintenance scheduling for a gas engine power plant over a 3-year time horizon, with only a single maintenance crew available. The task is to schedule the shutdowns of parallel units so as to minimise idle time and shutdowns in high-tariff periods, assuming seasonal variation in electricity prices. The model also allows the duration of shutdowns to be staggered during periods of low power demand.
A continuous-time formulation is proposed, and generalised disjunctive programming is used to solve the problem efficiently. The availability of only a single maintenance team is a critical bottleneck. In future work, the outsourcing of contractors and the allocation of manpower could be integrated into the maintenance scheduling process. Megow et al. (2010) considered turnaround scheduling in the chemical industry, specifically in continuous plants. The task is to minimise the cost of maintenance with respect to the resources used, namely manpower and maintenance equipment. This minimisation is subject to pre-set precedence rules for maintenance tasks and to resource scheduling constraints that involve shift calendars for maintenance workers. The assignment constraint, which assigns maintenance resources to jobs in each time period, yields the detailed maintenance schedule. The key trade-off is between time and cost: more expensive external resources can be used to complete a task in less time. The authors evaluate risk only after the schedules have been identified, and the inherent risk of the whole production process is neglected. Aguirre and Papageorgiou (2017) formulated a continuous-time MILP model to determine the optimal production and maintenance schedule for a multiproduct batch process. Tasks are sequenced using travelling salesman problem (TSP)/precedence-based concepts, in contrast to the first-in-first-out (FIFO) principle applied previously (Dedopoulos and Shah, 1995). Their model also incorporates production resource constraints and unit performance decay, which reflect the reality of a chemical batch process. The performance decay is modelled as a statistical distribution. The model provides numerical guidelines for engineers to determine the optimal maintenance actions given the deterioration model.
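The time-cost trade-off in turnaround scheduling can be sketched by letting each maintenance task be done either by in-house staff (slower, cheaper) or by external resources (faster, more expensive), and choosing the cheapest combination that meets a turnaround deadline. This assumes, for simplicity, that the tasks form a single precedence chain and ignores shift calendars; it is not the Megow et al. (2010) model, and `cheapest_plan` and the task data are hypothetical.

```python
from itertools import product

def cheapest_plan(tasks, deadline):
    """Cheapest in-house/external choice per task that meets a turnaround deadline.

    tasks[i] = ((in_house_days, in_house_cost), (external_days, external_cost)).
    Tasks are assumed to run one after another (a simple precedence chain).
    Returns (cost, days, choice), where choice[i] is 0 for in-house, 1 for external.
    """
    best = None
    for choice in product((0, 1), repeat=len(tasks)):
        days = sum(tasks[i][c][0] for i, c in enumerate(choice))
        cost = sum(tasks[i][c][1] for i, c in enumerate(choice))
        if days <= deadline and (best is None or cost < best[0]):
            best = (cost, days, choice)
    return best  # None if no combination meets the deadline

tasks = [((5, 100), (3, 180)),   # hypothetical (days, cost) options per task
         ((4, 80), (2, 150)),
         ((6, 120), (4, 210))]
print(cheapest_plan(tasks, deadline=12))
```

Tightening the deadline forces more tasks onto external resources and raises cost, which is exactly the trade-off described above; with real precedence networks the duration would be a critical-path length rather than a simple sum.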
However, the performance decay model is not supported by data, which weakens its credibility. Another study integrating process and maintenance scheduling is Idris (2016), conducted at a Mexican oil and chemical company. The study proposed a mixed-integer non-linear programming (MINLP) model to optimise product processing time and overall profit. PM actions are translated into maintenance demands, which are treated as product constraints with specific time windows and goals. Using historical production data from the company, a Monte Carlo approach was applied to generate several production instances, and the proposed model achieved a significant improvement in profit. However, the practicality of the model is questionable, since a longer maintenance planning horizon enlarges the problem significantly. The need to have the right spare parts at the right place and at the right time requires the joint optimisation of maintenance schedules and the logistics of maintenance resources. This joint decision-making problem becomes particularly challenging when multiple options for PM operations and multiple delivery methods for the necessary spare parts are considered. Wang and Djurdjanovic (2018) presented an integrated decision-making policy for simultaneous PM scheduling, spare parts inventory management and transportation planning in a system of geographically dispersed multi-part degrading assets and maintenance. This integrated decision-making policy considers both perfect and imperfect maintenance options, as well as multiple shipping methods for spare part deliveries. The problem investigated is only a theoretical example, but the framework could be extended to large-scale practical problems in chemical process plants with real data. The allocation of human resources to execute maintenance activities also deserves further research.
An integrated site consists of a network of plants or large-scale processes (Feord et al., 2002). A large-scale integrated chemical site comprises a number of individual production units that are connected to each other either directly or through buffer storage. Each production unit supplies raw materials to other units and produces final products ready to be shipped to end-users. Buffer storage tanks are usually installed between interconnected plants so that product demand can still be met during turnaround periods (see Figure 8). Note that real plants may not be connected in the same way as shown; the figure is for illustration only. Integrating this network of plants provides synergistic opportunities for sharing raw materials, products, process and business information, domain knowledge, energy, utilities, manpower, safety infrastructure and transportation (Amaran et al., 2015). An example of a large-scale integrated chemical site is Dow Texas Operations, which evolved gradually from smaller sites and since 2017 has been home to a state-of-the-art ethylene facility with a nominal capacity of 1.5 Mt (Dow, 2019). However, integration failures, such as capacity mismatches and the blocking or starving of other units, can be critical and cause more disruption than in conventional stand-alone production plants. Terrazas-Moreno et al. (2010) noted that uncertain events can also affect the performance of an integrated site: variations in external supply or demand, or fluctuations in plant throughput, can significantly affect the operation of a chemical site. The presence of buffers or intermediate storage creates different dynamics in the production line, as discussed by Marquez et al. (2003). They argued that the buffer inventory level should provide sufficient isolation time for maintenance before the buffer becomes empty.
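The isolation time a buffer provides can be computed directly if flow rates are assumed constant: when the upstream plant stops, the buffer drains at the downstream draw rate until it is empty (material starvation), and when the downstream plant stops, it fills at the upstream feed rate until it is full (blocking of production). The sketch below uses that simplification with hypothetical numbers and ignores simultaneous failures and rate changes; `isolation_times` is an illustrative name.

```python
def isolation_times(level, capacity, upstream_rate, downstream_rate):
    """Hours a buffer can decouple two plants while one of them is down.

    level, capacity: current and maximum buffer inventory (t).
    upstream_rate: feed rate into the buffer (t/h) if only the downstream plant stops.
    downstream_rate: draw rate from the buffer (t/h) if only the upstream plant stops.
    """
    to_starvation = level / downstream_rate            # upstream down: buffer drains
    to_blocking = (capacity - level) / upstream_rate   # downstream down: buffer fills
    return to_starvation, to_blocking

starve_h, block_h = isolation_times(level=600.0, capacity=1000.0,
                                    upstream_rate=25.0, downstream_rate=20.0)
repair_h = 24.0  # hypothetical repair duration
print(f"upstream repair fits: {starve_h >= repair_h}, downstream repair fits: {block_h >= repair_h}")
```

With these numbers a 24 h upstream repair is isolated (30 h before starvation) but a 24 h downstream repair is not (16 h before blocking), which shows why the target buffer level depends on where repairs are expected.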
Maintenance can then be performed on failed upstream components without the effect of the failure propagating downstream (avoiding so-called 'material starvation'). Similarly, the buffer inventory level should ensure that maintenance on failed downstream equipment can be completed without the failure effect propagating upstream (avoiding so-called 'blocking of production'). This strategy is attractive in the process industry, where the capital cost of equipment is typically high and the holding cost of material is low. The isolation times before 'material starvation' or 'blocking of production' must be identified properly to analyse their effect on line throughput accurately. Benjamin et al. (2015) suggested performing a criticality analysis on each unit to determine which units cause the greatest disruption to the entire chemical site. The main design decisions considered are changes in process capacity, the introduction of parallel units and the addition of intermediate storage. A metric known as expected stochastic flexibility is used to quantify the robustness of the integrated site. As expected, higher capital investment allows the site to accommodate more uncertainty while operating reliably. The drawback of the framework is that the number of states grows exponentially for large-scale problems, so further development is required to improve its scalability. Macchi et al. (2012) assessed the effect of buffer inventory levels on production throughput to support maintenance decisions. Their model extends the well-known reliability block diagram (RBD) method with a state-space approach to analyse the effects of process isolation times on the production line. The integrated tool was applied in the mining industry to demonstrate its practicality.
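The exponential growth in states can be seen in a simplified version of a flexibility metric: enumerate every up/down combination of units, weight each by its probability, and add up the probability of states in which the available capacity still meets demand. This is a binary-feasibility simplification assuming independent unit failures and a single pooled demand, not the exact expected stochastic flexibility formulation used by Benjamin et al. (2015); `expected_feasibility` and the data are hypothetical.

```python
from itertools import product

def expected_feasibility(units, demand):
    """Probability that available capacity meets demand, enumerating all unit states.

    units: list of (capacity, availability), with units assumed to fail
    independently and to feed a common demand. The loop visits 2**len(units)
    up/down states, which is why the state count grows exponentially.
    """
    total = 0.0
    for state in product((0, 1), repeat=len(units)):
        prob, capacity = 1.0, 0.0
        for up, (cap, avail) in zip(state, units):
            prob *= avail if up else 1.0 - avail
            capacity += cap * up
        if capacity >= demand:
            total += prob
    return total

two_units = [(60.0, 0.9), (60.0, 0.9)]
three_units = two_units + [(60.0, 0.9)]  # design option: add a parallel unit
print(expected_feasibility(two_units, 100.0), expected_feasibility(three_units, 100.0))
```

Adding the third parallel unit raises the metric from 0.81 to 0.972, mirroring the observation that higher capital investment buys robustness, while each added unit doubles the number of states to evaluate.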
More accurate analytical models can be applied to the analysis, such as fluid Markov chains ) and fluid stochastic Petri nets (Silva and Recalde, 2004). Demir et al. (2014) presented a classification of research on the buffer allocation problem in production systems. According to their classification, most existing works aim to choose buffer sizes for a production line, assuming that buffer sizes are the only parameters to be determined. However, proper maintenance planning coupled with buffer allocation is required for the optimal and reliable design of integrated chemical sites. Nahas (2017) considered a serial production line consisting of n unreliable machines and (n - 1) buffers. The objective is to determine the optimal preventive maintenance policy and the optimal buffer allocation that minimise the total system cost subject to a given system throughput level. A time-based maintenance policy is used. The numerical results showed that preventive maintenance has a major impact on the throughput and total cost of a serial production line. Plant maintenance causes widespread disruption to production. Its effects are amplified when the plant is part of an integrated chemical site, as the production levels of adjacent plants are also significantly affected. A challenge in dealing with turnarounds is the difficulty of predicting their duration, owing to discovery work and delays. Amaran et al. (2016) noted that this uncertainty in duration affects two major planning decisions: production levels and maintenance manpower allocation. The size of the maintenance crew must be decided several months before the turnarounds occur, because contractors are typically hired months in advance for a specific time period. If maintenance exceeds the nominal duration and maintenance personnel are not available, certain maintenance activities may be deferred to future turnarounds.
This could have severe impacts on production, especially in a tightly integrated site. Replacement or on-demand manpower is usually not an option because of scarcity, so no immediate recourse action can be taken. As a result, the availability of manpower for the nominal duration as well as for an extended duration is crucial. It is, however, unlikely that all the turnarounds within the time window of concern are affected. These characteristics indicate that a schedule robust to uncertain turnaround durations with respect to manpower availability is needed. Amaran et al. (2016) investigated the turnaround planning problem under uncertainty over a medium-term time horizon (24 weeks). Two approaches were proposed to handle the uncertainty in turnaround duration. The first is a sequential approach: robust optimisation is used to schedule the turnarounds while minimising turnaround frequencies, and the resulting first-stage scheduling decisions are then fixed in a multistage stochastic linear program that optimises production planning to maximise profit. Demand and manpower constraints and turnaround durations are also evaluated, and plant downtime is managed through inventory capacity. The second approach combines robust optimisation and two-stage stochastic programming to determine the schedule and the production plan simultaneously. An advantage of the sequential approach is that the manpower and subtask model can be made more detailed, which would significantly increase the computational expense of the simultaneous approach. The downside of the sequential approach is that its solution is not guaranteed to be globally optimal. Integrated sites may also benefit from holistic, long-term turnaround planning. Amaran et al. (2015) developed a long-term (30-year) turnaround planning model for hypothetical integrated chemical sites.
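The crew-sizing decision described above has the structure of a two-stage stochastic program: the number of contracted crew-weeks is fixed before the turnaround duration is known, and a recourse cost is paid in each scenario where the crew falls short. The sketch below is a newsvendor-style simplification with a hypothetical contract cost, shortfall penalty and duration scenarios; it is not the robust or multistage formulation of Amaran et al. (2016), and `expected_cost` is an illustrative name.

```python
def expected_cost(hired, scenarios, contract_cost, shortfall_cost):
    """Expected cost of contracting `hired` crew-weeks before the duration is known.

    First stage: every contracted crew-week is paid. Second stage (recourse): each
    crew-week still needed in a scenario incurs an assumed penalty for deferred
    work or lost production.
    """
    recourse = sum(p * shortfall_cost * max(need - hired, 0) for need, p in scenarios)
    return hired * contract_cost + recourse

scenarios = [(40, 0.5), (50, 0.3), (65, 0.2)]  # hypothetical (crew-weeks needed, probability)
best = min(range(0, 81), key=lambda h: expected_cost(h, scenarios, 10.0, 30.0))
print(best, expected_cost(best, scenarios, 10.0, 30.0))
```

Under these assumptions it pays to hire one more crew-week as long as the probability of needing it exceeds the ratio of contract cost to shortfall penalty (here 1/3), so the optimum covers the 50-week scenario but not the 65-week one rather than the worst case.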
The main objective is to minimise profit losses due to unexpected failures, subject to resource availability and financial impact. The framework identifies turnaround timings that (1) maximise the net present value, (2) balance financial and operational performance and (3) reduce outsourced manpower. Real-world issues such as seasonal constraints and the possibility of importing intermediate products are discussed. The authors noted that rolling-horizon planning (decomposition into several planning sub-models) is more relevant than cyclic scheduling because it more closely resembles real cases. However, the planning model is computationally expensive, which limits its practical use. Rajagopalan et al. (2017) proposed a stochastic model to evaluate the risk of loss in rescheduling turnarounds over large networks of interconnected plants. The strategy of using inventories during unplanned outages is also considered, which adds flexibility to the maintenance planning problem. They analysed the sensitivity of the financial risk from lost sales for the base schedules and for proposed alternative turnaround plans. They found that the benefit of frequent turnarounds is determined by the trade-off between financial loss and the probability of a production outage. In the shorter term, the benefit of rescheduling turnarounds increases with time as demand becomes leaner (more intermediate storage is available). In the longer term, proactive planning is more attractive in terms of profit. The proposed method is useful for planning future production and inventory. However, the authors incorporated only financial risk into the maintenance planning of the sites. Further development of the model is needed to account for other risks, such as environmental and human risks. Cheung et al.
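The first criterion above, choosing the turnaround timing that maximises net present value, reduces to discounting each candidate plan's cash flows and picking the largest. The sketch assumes year-end cash flows and a hypothetical 8 % discount rate; the two five-year plans and their figures are invented for illustration and are not taken from Amaran et al. (2015).

```python
def npv(cash_flows, rate):
    """Net present value of year-end cash flows; cash_flows[0] falls at the end of year 1."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))

# Hypothetical annual net cash flows (M$) for two candidate turnaround timings.
plans = {
    "turnaround in year 2": [50, 20, 55, 55, 55],
    "turnaround in year 4": [50, 48, 45, 10, 55],
}
for name, flows in plans.items():
    print(f"{name}: NPV = {npv(flows, 0.08):.1f} M$")
best = max(plans, key=lambda name: npv(plans[name], 0.08))
print("preferred plan:", best)
```

In a full planning model the cash flows themselves depend on the timing decisions and on failure risk; here they are fixed inputs, so the sketch only shows how the objective is evaluated.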
(2004) formulated an MILP model for short-term, site-wide maintenance scheduling of a chemical production site consisting of eight production plants. The maintenance schedule requires the utility and product demand profiles during plant shutdown, overhaul and start-up to be determined in advance. Payment for skilled labour is another primary concern in the scheduling process: as in-house skilled maintenance operators are limited, external labour is hired to make up the shortfall. The objective of the model is to maximise profit, accounting for labour, electricity and production costs and product revenue under varying electricity prices and production demands. They also analysed maintenance plans under different electricity contracts and overhaul periods. The resulting maintenance plan outperformed the original heuristics-based schedule in terms of feasibility and profitability. Terrazas-Moreno et al. (2011) combined simulation and optimisation to design integrated chemical process networks subject to random unit failures. A superstructure optimisation model is formulated to add parallel units or additional buffers to the integrated sites. Discrete-rate simulation is used to represent the failure modes and operation of the integrated site in great detail. The objectives of the model are to minimise overall capital investment and to maximise average production capacity. The MILP formulation of the stochastic programming model incorporates endogenous uncertainty of failures for different design selections and production/storage capacities. The problem is decomposed into multiple stages: the design (flowsheet) is first solved iteratively with only a few failure scenarios for low-risk equipment, and the rest of the problem is then solved in a reduced space where only scenarios relevant to the fixed flowsheet are considered.
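The labour trade-off in site-wide overhaul scheduling can be illustrated by costing each candidate overhaul month as in-house labour up to the available headcount, external labour for the remainder, plus the cost of electricity that must be imported during the shutdown. This is a simplified enumeration, not the Cheung et al. (2004) MILP; the month names, headcounts, rates and electricity costs are all hypothetical.

```python
def overhaul_cost(labour_need, in_house, in_house_rate, external_rate):
    """Labour cost when in-house staff are used first and external labour covers the rest."""
    own = min(labour_need, in_house)
    return own * in_house_rate + (labour_need - own) * external_rate

# Hypothetical candidate months: (in-house staff available, electricity import cost during shutdown).
months = {"Mar": (30, 900.0), "Jun": (20, 1400.0), "Oct": (35, 1000.0)}
need = 40  # hypothetical staff required for the overhaul

totals = {m: overhaul_cost(need, staff, 50.0, 120.0) + elec for m, (staff, elec) in months.items()}
best = min(totals, key=totals.get)
print(totals, "->", best)
```

Here October wins despite a higher electricity cost than March, because more in-house staff are available and less expensive external labour is needed; an MILP captures the same interaction across many plants and periods at once.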
However, the computational performance of the model is still relatively weak, and refinement is required before it can be accepted as a tool in an industrial setting. This section summarises the maintenance planning and scheduling models in the process industry. Separate sections are devoted to the optimisation criteria and to the limitations of current modelling methodologies. In maintenance planning, the challenges faced most often are data availability and accuracy. TBM, RBM, OM and sometimes CBM require the failure characteristics of the asset, which must be derived from historical operational data. Such data take a significant amount of time to collect and, worse still, may be censored or truncated due to misinterpretation. Deterioration modelling also relies on the virtual age of the asset, which raises questions of applicability and accuracy. For RBM, the risk scores can be integrated with uncertainty modelling. Probabilistic modelling is used for uncertainty that can be described by statistical distributions, whereas fuzzy modelling is used for epistemic uncertainty arising from a lack of fundamental knowledge (Kiureghian and Ditlevsen, 2009). However, uncertainties are difficult to characterise when statistical data are lacking or when there is too little background knowledge to support judgements about failure occurrence. Risk computations can also become impractical when the problem domain is too large. CBM was introduced to overcome these issues. In CBM, maintenance decisions depend on the health condition of the asset, evaluated by combining indicator measurements from various sensors with estimates of other properties. Data collection requires the installation of costly instrumentation to monitor the asset closely and to prevent false alarms.
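Censoring matters because discarding units that have not yet failed biases the estimated failure rate upwards. For the simplest case of a constant failure rate with right-censored data, the maximum-likelihood estimate is the number of observed failures divided by the total operating time of all units, censored or not. The sketch assumes this exponential model and uses hypothetical lifetimes; models with ageing (e.g. Weibull) need numerical likelihood maximisation instead.

```python
def exponential_rate_mle(times, failed):
    """MLE of a constant failure rate from right-censored lifetime data.

    times[i]: observed operating time of unit i (h); failed[i]: True if unit i
    failed at times[i], False if it was still running when observation stopped.
    """
    n_failures = sum(1 for f in failed if f)
    exposure = sum(times)  # censored units still contribute their observed operating time
    return n_failures / exposure

times = [120.0, 300.0, 450.0, 500.0, 500.0]    # hypothetical hours
failed = [True, True, True, False, False]      # last two units censored at 500 h
rate = exponential_rate_mle(times, failed)
naive = 3 / (120.0 + 300.0 + 450.0)            # ignores the censored units
print(f"censoring-aware rate: {rate:.5f}/h, naive rate: {naive:.5f}/h")
```

The naive estimate is roughly twice the censoring-aware one here, which would lead to overly frequent time-based maintenance.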
Expensive software licenses are needed to process the information, train the models and estimate the remaining useful life of the assets. Most works on OM also focus on the maintenance of multiple components within a single piece of equipment. A substantial body of literature addresses maintenance strategy selection. Most works formulate it as a multi-criteria optimisation model, with economic, technical, social and environmental criteria. Most reported approaches apply expert-judgement methods (e.g. AHP and ANP) to determine the weight of each criterion. Simulation and fuzzy modelling are widely applied to this problem to account for uncertainty. The optimal maintenance strategy (e.g. TBM, CM, RBM, OM or CBM) is selected according to the chosen criteria. The results often show CBM to be the best policy, although CM is occasionally reported as optimal when the budget is limited. Published works on maintenance scheduling in the chemical industry were also collected and analysed. Mathematical optimisation models (mainly MILP) are often formulated to determine optimal schedules. Works on joint production and maintenance planning, such as Aguirre and Papageorgiou (2017), were also reported. Material balances, inventory requirements, resource availability and demand requirements are often imposed as constraints in the scheduling problem to obtain a cost-effective maintenance schedule. Long-term (Idris, 2016), medium-term (Amaran et al., 2016) and short-term (Castro et al., 2014) maintenance scheduling have all been reported, using both cyclic-schedule and rolling-horizon approaches. Maintenance scheduling for integrated chemical sites was also considered in Section 3.
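The AHP step used in strategy selection can be sketched in two parts: derive criterion weights from a pairwise comparison matrix, then rank strategies by their weighted scores. This sketch uses the row geometric mean to compute the weights, one common AHP prioritisation method (Saaty's principal-eigenvector method gives the same weights for a perfectly consistent matrix like this one). The comparison matrix and the strategy scores are hypothetical.

```python
import math

def ahp_weights(pairwise):
    """Priority weights from a pairwise comparison matrix via the row geometric mean."""
    n = len(pairwise)
    gm = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(gm)
    return [g / total for g in gm]

criteria = ["cost", "safety", "environment"]
# Hypothetical judgements: safety is 3x as important as cost, environment 2x as important as cost.
pairwise = [[1, 1/3, 1/2],
            [3, 1, 3/2],
            [2, 2/3, 1]]
weights = ahp_weights(pairwise)

# Hypothetical normalised scores (higher is better) of each strategy on each criterion.
scores = {"CM": [0.9, 0.2, 0.3], "TBM": [0.6, 0.6, 0.6], "CBM": [0.4, 0.9, 0.8]}

def overall(strategy):
    return sum(w * s for w, s in zip(weights, scores[strategy]))

ranking = sorted(scores, key=overall, reverse=True)
print(dict(zip(criteria, (round(w, 3) for w in weights))), ranking)
```

With safety weighted heavily, CBM ranks first; raising the weight on cost would push CM upwards, consistent with CM being preferred under tight budgets.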
Buffer storage for products is needed in integrated chemical sites because the turnaround of a plant stops its production. The most widely used optimisation criterion in these models is cost, either maintenance cost alone or maintenance cost combined with production loss, labour cost and inventory cost. The net present value (NPV) or plant profit over the equipment's lifetime is also commonly used as the objective function. For systems with high reliability requirements (e.g. nuclear power plants or other power generation plants), the optimisation criteria often prioritise system availability, subject to a predefined maintenance budget and labour or spare-parts storage limitations. Preventive maintenance times or intervals, workforce size and spare-parts inventories are usually the decision variables of the optimisation model. It is also common to express risk in monetary terms, e.g. financial or production risk (see Mohamed and Saad, 2016), while environmental risk can be expressed as the ratio of emissions to standard limit values, see . In most cases, renewal theory is the main methodology for identifying the optimal cost per replacement cycle in long-term maintenance planning. However, the simplifying assumptions that renewal theory makes for cost optimisation are not appropriate for industrial application. For highly interconnected process networks, renewal theory becomes impractical because analytical solutions are difficult to obtain. For more complex models, such as those including deterioration or combining maintenance with production planning, simulation-based approaches are often used. Monte Carlo simulation is the most popular because of its simplicity and ease of use. Simulation is also widely used to compare different maintenance policies. Markov decision processes are also often used to model equipment deterioration over a time horizon, with either continuous or discrete states.
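The renewal-theory calculation referred to above is, in its classic age-replacement form, the long-run cost rate C(T) = [c_p R(T) + c_f (1 - R(T))] / ∫₀ᵀ R(t) dt, where a unit is replaced preventively at age T (cost c_p) or correctively on failure (cost c_f). The sketch assumes a Weibull lifetime, evaluates the integral with the trapezoidal rule and searches a grid of T values; the cost and Weibull parameters are hypothetical.

```python
import math

def weibull_reliability(t, beta, eta):
    return math.exp(-((t / eta) ** beta))

def cost_rate(T, cp, cf, beta, eta, steps=2000):
    """Long-run cost per unit time of age replacement at age T (renewal-reward argument)."""
    h = T / steps
    # Trapezoidal rule for the expected cycle length, the integral of R(t) from 0 to T.
    cycle_length = h * (0.5 * (1.0 + weibull_reliability(T, beta, eta))
                        + sum(weibull_reliability(i * h, beta, eta) for i in range(1, steps)))
    r = weibull_reliability(T, beta, eta)
    return (cp * r + cf * (1.0 - r)) / cycle_length

cp, cf, beta, eta = 1_000.0, 10_000.0, 2.5, 1_000.0  # hypothetical costs and Weibull parameters (h)
candidates = range(50, 3001, 50)
best_T = min(candidates, key=lambda T: cost_rate(T, cp, cf, beta, eta))
best_cost = cost_rate(best_T, cp, cf, beta, eta)
run_to_failure = cf / (eta * math.gamma(1 + 1 / beta))  # cost rate with failure replacement only
print(f"replace at ~{best_T} h: {best_cost:.2f}/h versus run-to-failure {run_to_failure:.2f}/h")
```

The closed form relies on the renewal assumptions criticised in the text (independent, identical cycles and as-good-as-new replacement), which is why interconnected networks usually need simulation instead.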
However, few works have combined quantitative and qualitative data inputs for maintenance decision-making. Quantitative inputs, such as equipment images or health condition, have to be processed together with qualitative data (e.g. human speech or risk scores) for well-informed business decisions. Only a minority of works also consider that the data change constantly. To model the effect of maintenance or repair, the common approach, especially in time-based models, is a 'minimal repair' policy, i.e. a quick fix immediately after failure that returns the equipment to the state just before it failed. Imperfect maintenance models instead restore the equipment to a somewhat better state. The failure rate is also often assumed to increase after a repair, depending on the chosen failure distribution. The 'age threshold' or 'risk threshold' method is also often used as a benchmark to identify the maintenance period. In practice, however, failure does not occur only after a fixed deterioration level is reached; it can occur even when the deterioration level is stable (de Jonge and Scarf, 2019). Failure should instead be described by a statistically derived failure rate that depends on the deterioration level or is identified from data. The concept of postponing maintenance or conducting it earlier to save production downtime and cost is addressed by opportunistic maintenance models. At present, research on opportunistic maintenance policies is still lacking. The interactions of multiple assets in a highly interconnected network are also challenging to model.
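The difference between minimal, imperfect and perfect repair can be expressed with a virtual-age model. The sketch below uses the Kijima type I model, a common imperfect-repair model named here as an illustrative choice rather than one attributed to the works reviewed. A repair adds only a fraction q of the elapsed operating time to the virtual age. Repairs are assumed to occur at fixed intervals, and lifetimes are assumed to be Weibull with hypothetical parameters.

```python
import math

def weibull_reliability(t, beta, eta):
    return math.exp(-((t / eta) ** beta))

def next_interval_failure_prob(q, n_repairs, interval, beta, eta):
    """Probability of failing within the next interval under a Kijima type I model.

    q = 0: perfect repair (as good as new); q = 1: minimal repair (as bad as old);
    0 < q < 1: imperfect repair. Repairs are assumed to occur every `interval` hours,
    so the virtual age after n_repairs is q * n_repairs * interval.
    """
    v = q * n_repairs * interval
    return 1.0 - weibull_reliability(v + interval, beta, eta) / weibull_reliability(v, beta, eta)

for q in (0.0, 0.5, 1.0):
    p = next_interval_failure_prob(q, n_repairs=3, interval=200.0, beta=2.0, eta=1000.0)
    print(f"q = {q}: P(failure in next 200 h) = {p:.3f}")
```

With an increasing hazard (beta > 1), the conditional failure probability grows as repairs become less effective, which is the behaviour the minimal-repair assumption builds in.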
Beyond structural interdependencies, a unit failure can propagate and affect operating parameters, causing major damage to the system, especially in integrated chemical sites. There is also a lack of publications on job or shift scheduling for individual maintenance operators in chemical plants and sites. Most works assume that repair or replacement is performed immediately once a failure is detected, and spare-parts logistics are usually ignored. Although Wang and Djurdjanovic (2018) considered this aspect, their case study is hypothetical. Studies of real cases are needed to reflect the reality of the problem. CBM is a more accurate maintenance planning approach than TBM because the collected data reflect the real-time condition of the equipment. Implementing CBM therefore requires an online data-monitoring framework. The framework proposed by Zhang and Hu (2013) is an example of an integrated maintenance framework that combines data collection with maintenance management software. Some advanced maintenance planning models (e.g. RBM with dynamic Bayesian networks (BN)) can be implemented in existing Computerised Maintenance Management Systems (CMMS) and integrated with condition monitoring systems to enable effective fault diagnosis and prognosis. Maintenance strategy selection frameworks can also be incorporated into such computer tools to select the optimal maintenance decision. Readers are referred to Ahmad and Kamaruddin (2012) for further information on existing CMMS tools. AI approaches such as self-organising maps (SOMs) have also been applied to predict future risk so that maintenance can be planned in advance (Jaderi et al., 2019a). The approach is useful because it allows adaptive risk prediction from both quantitative and qualitative inputs, so that proactive actions can be planned.
However, this approach requires extensive training with a large amount of historical data, and the challenge lies in separating signal from noise. For future work, Principal Component Analysis (PCA) is a promising tool that reduces dimensionality across multiple criteria while retaining most of the information. Quantitative and/or qualitative risk scores can be input to a PCA model to determine the weights of the risks in different aspects (cost, environmental or social) and, in turn, to devise a proper maintenance plan. Fuzzy ANP (FANP) can also be incorporated to identify the interdependencies and uncertainties of factors. These tools can be developed further to support risk-based maintenance decision-making, given the computational resources now available. In a constantly changing business environment, the priority weights obtained via AHP/ANP have to be updated dynamically. A dynamic risk analysis framework, such as that of Bhandari et al. (2016), should be incorporated to adapt to continuously changing expert opinions. Maintenance planning models should also be combined with maintenance scheduling frameworks for systematic asset maintenance management, since maintenance planning on its own can be an idealised case that neglects resource management (e.g. human labour, spare parts and time constraints). An online CBM framework can be coupled with strategic maintenance planning and scheduling optimisation models, formulated in commercial optimisation software, for effective allocation of maintenance tasks. Opportunistic maintenance policies should also be incorporated into the optimisation models. Most of the maintenance planning and scheduling models developed are MILP models, and some are inevitably MINLP models. The integer/binary variables arise from the discrete decisions on when to perform maintenance. MILP is preferable, but as problems become more complex, its computational burden also becomes heavy.
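One way PCA could be used to weight risk criteria, as suggested above, is to standardise a matrix of risk scores (assets by criteria), take the first principal component, and use its squared loadings as criterion weights and its scores as a composite risk index for prioritising maintenance. This is a minimal sketch of that idea under stated assumptions: the scores are hypothetical, and the squared-loading weighting is one heuristic among several, not a method taken from the cited works.

```python
import numpy as np

# Hypothetical risk scores (rows: assets; columns: cost, environmental, social).
scores = np.array([[7.0, 6.0, 3.0],
                   [4.0, 5.0, 2.0],
                   [9.0, 8.0, 6.0],
                   [2.0, 3.0, 1.0],
                   [6.0, 7.0, 5.0]])

# Standardise each criterion so that differing scales do not dominate.
X = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)

# PCA via singular value decomposition of the standardised data.
_, s, vt = np.linalg.svd(X, full_matrices=False)
explained = s**2 / np.sum(s**2)  # share of variance captured by each component

loadings = vt[0]
if loadings.sum() < 0:  # the sign of a principal component is arbitrary
    loadings = -loadings
weights = loadings**2            # squared loadings of a unit vector sum to 1
composite = X @ loadings         # first-component score used as a composite risk index
priority = np.argsort(-composite)  # asset indices, highest composite risk first
print("explained variance:", explained.round(3))
print("criterion weights:", weights.round(3), "maintenance priority:", priority)
```

If the first component explains only a small share of the variance, a single composite index hides important differences between criteria, so the explained-variance ratios should be checked before relying on the weights.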
However, because maintenance planning is a planning problem, computation speed is, to some extent, not a major issue, since a solution can eventually be obtained. The major issues are the global optimality of the solution and the simultaneous generation of near-optimal solutions. The latter is important because engineers and practitioners can then evaluate the alternatives against their preferences in one go. Unfortunately, not all solvers currently have this feature. This calls for a graph-theoretical tool (e.g. P-graph (2019)) to represent and generate all combinatorially feasible network structures. These structures can be used to identify optimal and near-optimal solutions for examination by decision-makers. This approach can also accelerate the optimisation of large problems by reducing the search space. It could also be coupled with evolutionary algorithms (e.g. Particle Swarm Optimisation) to search for globally optimal solutions. Readers may refer to applications of P-graph in process system failure analysis, reliability analysis of a process system (Kovacs et al., 2019) and heat-integrated systems, and to Orosz et al. (2018) for reliability with sustainability analysis. Maintenance scheduling models are often formulated as 'black-box' mathematical models whose solutions are difficult to interpret for users with limited mathematical and programming knowledge. Graphical approaches with a thermodynamic background, such as Pinch Methodology (Linnhoff et al., 1982), opened a new research direction for facilitating the allocation of risk mitigation strategies (see Tan et al., 2016). For more recent developments, see e.g. Klemeš et al. (2018), and for maintenance job scheduling or maintenance planning, Chin et al. (2019). P-graph-based approaches can also be used for the optimal allocation of maintenance services within a plant.
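The value of returning near-optimal alternatives, not just the single optimum, can be shown on a toy problem: choose which maintenance tasks to execute within a limited number of crew-hours to maximise risk reduction, and report the k best feasible plans. The sketch uses brute-force enumeration of all task subsets. It is not P-graph, whose algorithms prune the search space using structural feasibility, but it yields the same kind of ranked list for a small instance. The task data and `k_best_plans` are hypothetical.

```python
from itertools import combinations

def k_best_plans(tasks, capacity, k=3):
    """Rank all feasible task subsets by risk reduction and return the top k.

    tasks: {name: (crew_hours, risk_reduction)}; a plan is feasible if its total
    crew-hours do not exceed `capacity`.
    """
    feasible = []
    names = list(tasks)
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            hours = sum(tasks[t][0] for t in subset)
            if hours <= capacity:
                value = sum(tasks[t][1] for t in subset)
                feasible.append((value, hours, subset))
    # Highest risk reduction first; ties broken by fewer crew-hours.
    feasible.sort(key=lambda plan: (-plan[0], plan[1]))
    return feasible[:k]

tasks = {"A": (8, 30), "B": (6, 20), "C": (5, 18), "D": (4, 10)}  # hypothetical
for value, hours, subset in k_best_plans(tasks, capacity=14):
    print(value, hours, subset)
```

The second-best plan here delivers almost the same risk reduction with one crew-hour to spare, the kind of alternative a decision-maker might prefer and a single-solution solver would not reveal.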
This method has been applied to prioritising critical areas in an organisation (Aviso et al., 2017) and to human resource planning in a university setting (Aviso et al., 2018). A similar approach could be applied to workforce planning in a chemical site. Maintenance is not the only option for improving process reliability; retrofitting production assets can be a better alternative. For example, when the remaining service life of an asset becomes short, it may be better to replace the equipment than to continue repairing it. The relationship between an asset's service life, its failure behaviour and the effect of maintenance is a potential research area for devising proper asset maintenance and investment planning. From the perspective of smart asset management, a collaborative design platform is needed that is built on a knowledge base and enables highly integrated knowledge sharing among professional design systems. An example is the integration of asset management life cycle analysis with remote fault diagnostics and simulation models for predicting equipment failure, as well as a qualitative risk-prediction approach. The remaining life of the asset could be estimated from the collected data to provide 'just-in-time' mitigation action. System retrofit and maintenance are intrinsically related, but their modelling approaches have developed in parallel with little intersection. Linking these task classes is another challenge that requires a multidisciplinary approach. An increasing amount of machinery repair and maintenance, which produces various types of waste, would increase the burden on the environment. Ajukumar and Gandhi (2013) emphasised that product design should not focus solely on functionality but should also take into account the environmental impact during maintenance.
They proposed that the design of engineering equipment should be driven by the green-maintenance concept, with design characteristics that fulfil operational and safety criteria as well as environmental compatibility. Orosz et al. (2018) applied the P-graph tool to identify the optimal process system design in terms of reliability, investment and footprint. Selecting operating units with non-optimal designs may incur sustainability penalties during the operating and maintenance phases. Maintenance strategies can reduce the environmental effects of machinery breakdown. Potential impacts can result from the accidental release of intermediates or products, or from the energy and feedstock used to produce off-specification products that need further reprocessing. In serious accidents, such as the Bhopal incident (release of methyl isocyanate to the environment; Ishizaka and Labib, 2014), the massive release of chemicals creates environmental risks. Smaller accidental releases of chemicals may still affect the environment. Future research could simulate maintenance systems while accounting for investment in new green technologies as well as green maintenance operation technologies. New technologies that decrease pollution and improve product efficiency should be considered, and their installation can be compared with maintaining the existing asset so that investment and environmental impact can be traded off. The fact that maintenance planning can address environmental risks demonstrates the relationship between maintenance and sustainability. For example, the loss of contaminants from broken equipment is critical and causes undesirable environmental hazards. In future work, a life-cycle sustainability assessment of different maintenance approaches based on site-specific data could be performed to evaluate maintenance actions.
Using maintenance as an alternative sustainable action, instead of introducing new technologies or other mitigation approaches, could be a new method and trend for promoting sustainability, in line with the concept of the Circular Economy (CE) (see Figure 9). CE is a conceptual framework that proposes a closed-loop system (i.e. maximising the use of waste and in-house resources) when planning, designing and operating industrial systems within an economy.
Maintenance can prevent major accidents, but it can also be their cause. Okoh and Haugen (2014) reported that, of the 183 major accidents that took place in chemical and petrochemical plants from 2000 to 2011, 46 % were maintenance-related. Modification-related process accidents can occur while an item is being repaired or restored to its original required function (BS-EN-13306, 2017). Typical accidents reported are fires, explosions, emissions of hazardous substances and structural failures of equipment. The most effective improvement strategies for maintenance planning could be identified by investigating the causes of accidents. Abnormal events, such as near-misses, also occur frequently in chemical process operations. Such events are often termed accident precursors, and information about them is crucial for understanding the safety of a process (Rathnayaka et al., 2011). Gnoni and Saleh (2017) concluded that there are important synergies between near-miss management, risk mitigation and maintenance for accident prevention. The abundant near-miss data from multiple sources can be integrated into the existing safety management framework (e.g. risk assessment or maintenance planning) to improve system reliability. Intervening in the process to perform maintenance increases the likelihood of near-misses, accidents or operational deviations.
This suggests that the proper selection of a maintenance policy is crucial for reducing risk levels, such as operational, production, human safety and environmental risks. Gnoni and Saleh (2017) also stated that near-misses, operational deviations and mishaps are crucial indicators of potential process failure, and learning from near-misses is less costly than learning from accidents. As presented by Rathnayaka et al. (2011), a predictive accident modelling approach (e.g. a Bayesian approach) that combines risk assessment, accident precursor information (near-misses) and the cause-effect relationships of failures could be coupled with maintenance optimisation models. The process behaviour can then be captured dynamically, and the accident likelihood updated through the model's predictive capabilities. The accident model can be integrated with the assets' condition data or with other AI-based predictive strategies (e.g. deep learning models) for more accurate predictions.
During critical times, such as the current COVID-19 pandemic, plant managers may forgo maintenance because some production facilities are operated well below capacity. This is an incorrect approach that may put the company's business into difficulty when recovery eventually occurs and production returns to normal. Not only the equipment breakdown risk or accident prediction but also the epistemic uncertainty resulting from adverse external disruptions should be incorporated into the maintenance model, since this uncertainty strongly affects maintenance requirements. This topic has received considerable attention recently. Unlike statistical uncertainty, this type of uncertainty is difficult to predict, as no measurements can be performed and data are sparse. Accounting for its effects is crucial in data-driven risk quantification, which in turn affects the outcome of maintenance optimisation.
The same principles apply to other types of disruptive events that can affect plant operations and maintenance, such as natural disasters or political crises. Maintenance in this era faces the challenge of using these technologies and concepts in a smart system that predicts failures, makes a diagnosis and triggers maintenance (Kumar and Galar, 2018). An architectural design and conceptual framework for a smart maintenance decision-support system based on corporate Big Data analytics can be used for this purpose. More effective technology systems and knowledge bases are needed to transform 'fail and fix' maintenance practices into 'predict and prevent' practices (Zhang and Hu, 2013).
In the predictive maintenance framework (see Figure 10), data from different sources (e.g. healthy and faulty asset data) are acquired and preprocessed with data analytics algorithms. As the data are often noisy, especially periodic data from vibration monitoring, feature extraction is needed to classify different types of faults. A successful approach in the chemical process industry, as noted by García-Muñoz and MacGregor (2016), is to project the data onto a low-dimensional latent space (e.g. by principal component analysis, PCA). This works because only a few dominant variables drive the process under normal operation, and projecting the data from a higher- to a lower-dimensional space aids both interpretation of the problem and data visualisation. The processed data can then be used to estimate the remaining useful life (RUL) of the equipment by comparing healthy and faulty data through suitable condition indicators (see, e.g., Daher et al. (2020) for an application to a laboratory-scale distillation column). Different RUL models have been proposed, and a review of data-driven statistical approaches to RUL estimation can be found in (Cox, 1972). These indicators can support the development of appropriate maintenance plans and schedules.
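The chain described above (projection of healthy data onto a latent space, a condition indicator, then an RUL estimate from the indicator) can be sketched in a few lines of Python with NumPy. This is a minimal illustration under our own assumptions, not the procedure of García-Muñoz and MacGregor (2016) or Daher et al. (2020): PCA is fitted to healthy data only, the squared prediction error (SPE, or Q statistic) serves as the condition indicator, and RUL is obtained by extrapolating a straight-line trend of that indicator to a hypothetical failure threshold. All data are synthetic.

```python
"""PCA condition indicator and linear-trend RUL estimate on synthetic sensor data (sketch)."""
import numpy as np


def fit_pca(healthy: np.ndarray, n_components: int):
    """Fits PCA to healthy-operation data (rows = samples, columns = sensors)."""
    mean = healthy.mean(axis=0)
    std = healthy.std(axis=0, ddof=1)
    z = (healthy - mean) / std
    # Right singular vectors of the standardised data are the principal directions.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    loadings = vt[:n_components].T              # shape (n_sensors, n_components)
    return mean, std, loadings


def spe(x: np.ndarray, mean: np.ndarray, std: np.ndarray, loadings: np.ndarray) -> np.ndarray:
    """Squared prediction error (Q statistic): the part of each sample outside the latent space."""
    z = (x - mean) / std
    residual = z - z @ loadings @ loadings.T
    return np.sum(residual ** 2, axis=1)


def linear_rul(times: np.ndarray, indicator: np.ndarray, threshold: float) -> float:
    """Extrapolates a least-squares straight line through the indicator to the threshold."""
    slope, intercept = np.polyfit(times, indicator, deg=1)
    if slope <= 0:
        return float("inf")                      # no upward degradation trend detected
    t_fail = (threshold - intercept) / slope
    return max(t_fail - times[-1], 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mixing = rng.normal(size=(2, 6))             # 2 latent drivers observed by 6 sensors
    healthy = rng.normal(size=(500, 2)) @ mixing + 0.1 * rng.normal(size=(500, 6))
    mean, std, loadings = fit_pca(healthy, n_components=2)
    limit = np.percentile(spe(healthy, mean, std, loadings), 99)   # empirical alarm limit

    t = np.arange(50.0)                          # e.g. days since the last inspection
    monitored = rng.normal(size=(50, 2)) @ mixing + 0.1 * rng.normal(size=(50, 6))
    monitored[:, 3] += 0.02 * t                  # synthetic slow drift on one sensor
    q = spe(monitored, mean, std, loadings)
    failure_threshold = 10 * limit               # hypothetical failure criterion
    print(f"alarm limit {limit:.2f}, estimated RUL {linear_rul(t, q, failure_threshold):.1f} days")
```

The drift on one sensor breaks the correlation structure learnt from healthy data, so it shows up in the SPE even though each sensor may stay within its normal range. A straight line is the simplest possible degradation model; exponential or stochastic-process models would replace `linear_rul` while the PCA step stays the same.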
Large amounts of data generated by Internet of Things (IoT) sensor networks can potentially be mined to help plan maintenance. Data directly related to the condition of a component or system can be historical or current (real-time). Historical data include logistical data from maintenance records, which can also capture users' experience and knowledge. Sensor data are collected along with external data, such as weather or other environmental variables. Data and knowledge from various other sources (e.g. human operator experience, technical manuals and reports) are also valuable. Extracting useful information from all these data requires advanced data analytics, as well as methodologies for fusing knowledge from empirical data with knowledge from experts. Bayoumi and McCaslin (2017) demonstrated and recommended the use of IoT to perform CBM in petrochemical plants, water treatment plants and general machinery. Bokinsky et al. (2013) pointed out that written text in logistical data often contains mistakes and gaps due to human error. AI tools, such as natural language processing, can transform such text into a usable form. Written or spoken accounts from experienced maintenance operators can also be converted into transferable files that provide details or training on asset maintenance. The collected sensor data are then processed with techniques such as advanced signal processing, which helps extract relationships between sensors (Coats et al., 2011). Elhdad et al. (2013) presented an ontology-based framework for flexible, integrated maintenance and production planning in a hypothetical petroleum plant. This approach combines the available knowledge bases about the operations, structures and conditions of the equipment and process. The physical plant elements are captured within the knowledge bases, which allows decision-makers to modify them according to the desired business rules.
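The idea of capturing plant elements in a knowledge base that decision-makers query with their own business rules can be illustrated with a deliberately small Python sketch. It is not the ontology of Elhdad et al. (2013): here the plant is stored as subject-predicate-object triples (the data model used by RDF, held in plain Python sets rather than in an ontology language such as OWL), and business rules are ordinary functions that can be added, removed or reordered. All asset tags, predicates and actions are hypothetical.

```python
"""Toy triple-based plant knowledge base with editable business rules (illustrative)."""
from typing import Callable, Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), as in RDF


class KnowledgeBase:
    """Stores plant facts (structure, operation, condition) as triples."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add((subject, predicate, obj))

    def objects(self, subject: str, predicate: str) -> Set[str]:
        return {o for s, p, o in self.triples if s == subject and p == predicate}

    def subjects(self, predicate: str, obj: str) -> Set[str]:
        return {s for s, p, o in self.triples if p == predicate and o == obj}


# A business rule inspects the knowledge base for one asset and returns an action or None.
Rule = Callable[[KnowledgeBase, str], Optional[str]]


def degraded_and_critical(kb: KnowledgeBase, asset: str) -> Optional[str]:
    if "degraded" in kb.objects(asset, "has_condition") and "high" in kb.objects(asset, "has_criticality"):
        return "condition-based intervention"
    return None


def opportunistic_at_shutdown(kb: KnowledgeBase, asset: str) -> Optional[str]:
    # Structural knowledge: which unit the asset belongs to, and that unit's status.
    units = kb.objects(asset, "part_of")
    unit_down = any("planned_shutdown" in kb.objects(u, "has_status") for u in units)
    worn = kb.objects(asset, "has_condition") & {"degraded", "fair"}
    return "opportunistic maintenance at planned shutdown" if unit_down and worn else None


def recommend(kb: KnowledgeBase, rules: List[Rule], default: str = "continue monitoring") -> Dict[str, str]:
    """Applies the rules in priority order; the first rule that fires decides."""
    plan = {}
    for asset in sorted(kb.subjects("is_a", "Equipment")):
        actions = (rule(kb, asset) for rule in rules)
        plan[asset] = next((a for a in actions if a is not None), default)
    return plan


def example_plant() -> KnowledgeBase:
    kb = KnowledgeBase()
    facts = [
        ("P-101", "is_a", "Equipment"), ("P-101", "part_of", "Unit-A"),
        ("P-101", "has_condition", "degraded"), ("P-101", "has_criticality", "high"),
        ("E-201", "is_a", "Equipment"), ("E-201", "part_of", "Unit-B"),
        ("E-201", "has_condition", "fair"), ("E-201", "has_criticality", "low"),
        ("C-301", "is_a", "Equipment"), ("C-301", "part_of", "Unit-A"),
        ("C-301", "has_condition", "good"), ("C-301", "has_criticality", "high"),
        ("Unit-B", "has_status", "planned_shutdown"),
    ]
    for fact in facts:
        kb.add(*fact)
    return kb


if __name__ == "__main__":
    for asset, action in recommend(example_plant(), [degraded_and_critical, opportunistic_at_shutdown]).items():
        print(asset, "->", action)
```

Because the rules are kept separate from the stored facts, the maintenance policy can be changed (new rules, different priorities) without touching the plant description. A production system could instead use an ontology language with a reasoner, which this sketch does not attempt.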
Monitoring and maintenance records can also be stored in the data storage system to track the failure behaviour of the equipment. Natarajan et al. (2012) also used this framework to design a process supervision system for a large chemical plant. This approach can be beneficial because the process can be simulated and scenario analysis performed. The resulting logical plant structures mimic the real chemical plant and can interpret abstract knowledge from operators and engineers much more systematically. This framework can be coupled with Big Data analytics and asset life-cycle planning to decide on the action for each asset: repair, retrofit or replace. An emerging trend in predictive maintenance is to create a model-based "digital twin" of the asset, i.e. a detailed simulated replica of the asset (Magargle et al., 2017). The replica simulates the asset and can be used to diagnose anomalies that affect its performance. Future research could focus on creating digital twins of the assets in a chemical plant with 3-D simulation, coupled with Big Data analytics to predict the remaining useful life of the assets. The fidelity of a digital twin can be improved incrementally by gradual recalibration with operational data, which accumulate over the life of the asset. Du (2018) presented an industrial-scale virtual instrument technology for online monitoring systems. This tool is suggested to be integrated with an online monitoring system and with digital simulation of the assets so that the efficiency of CBM can be improved in the future. Maintenance plans can then be decided more effectively and with a higher confidence level, since engineers or operators can observe the asset's physical condition. However, as mentioned by Reis et al. (2016), the collected data may not accurately reflect the actual system, no matter how large the sample size is.
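Gradual recalibration of a digital twin with accumulating operational data can be illustrated with recursive least squares (RLS) with a forgetting factor, a standard online parameter-estimation technique; the reviewed works do not prescribe it. The Python sketch assumes a twin with a single uncertain parameter that enters the model linearly: the product UA of the overall heat-transfer coefficient and area of a heat exchanger, in duty Q = UA·ΔT_lm, which falls as the exchanger fouls. The design UA, fouling trend, noise level and forgetting factor are hypothetical.

```python
"""Recursive least-squares (RLS) recalibration of one digital-twin parameter (sketch)."""
import random
from dataclasses import dataclass


@dataclass
class ScalarRLS:
    """Tracks theta in the model y = theta * x from streaming (x, y) measurements."""

    theta: float        # current estimate; initialised from design data
    p: float = 1e3      # estimate variance; large means little trust in the initial guess
    lam: float = 0.98   # forgetting factor (< 1 discounts old data); assumed value

    def update(self, x: float, y: float) -> float:
        gain = self.p * x / (self.lam + x * self.p * x)
        self.theta += gain * (y - self.theta * x)        # correct by the prediction error
        self.p = (self.p - gain * x * self.p) / self.lam
        return self.theta


if __name__ == "__main__":
    random.seed(1)
    twin = ScalarRLS(theta=55.0)                  # hypothetical design-basis UA, kW/K
    for k in range(200):
        true_ua = 50.0 - 10.0 * k / 199           # synthetic fouling: UA falls from 50 to 40
        dt_lm = random.uniform(20.0, 40.0)        # log-mean temperature difference, K
        duty = true_ua * dt_lm + random.gauss(0.0, 20.0)   # "measured" duty, kW
        twin.update(dt_lm, duty)
    print(f"recalibrated UA: {twin.theta:.1f} kW/K (true final value 40.0)")
```

A forgetting factor below 1 lets the estimate follow slow drift such as fouling, at the price of noisier estimates. When the data carry little information (x near zero for long periods), `p` keeps growing by a factor of 1/`lam` per step, so practical implementations usually bound it. The same forgetting mechanism is one simple way for an algorithm to adapt to non-stationary data.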
Even with massive amounts of data available, knowledge about the system is still required, and the data analytics framework must be selected carefully so that the model captures the system properly. Another important issue is the dynamic evolution of data, which affects judgements on maintenance actions: historical data may not reflect the current situation accurately, and data analytics algorithms have to adapt to non-stationary, evolving data.
This paper presents an overview of the state of the art in asset maintenance management practices for the chemical process industries. A significant number of published articles devoted to maintenance policy planning, scheduling and strategy selection, with a focus on applications in the chemical and process industries, have been assessed. Potential research gaps have been identified, and future research developments are suggested. A critical issue identified in this review is the limited availability of data for conducting proper research. For TBM, RBM, OM and sometimes CBM, historical failure data of the assets are needed. Such data take a long time to collect and, worse, may be censored or truncated owing to misinterpretation or other special circumstances. Deterioration modelling also relies on the virtual age of the asset, which raises issues of applicability and accuracy. RBM can be a better alternative to TBM since it prioritises assets according to their criticality rankings. However, uncertainties due to the lack of statistical data, or too little background knowledge to support judgements on failure occurrence, remain a pressing challenge. RBM may also be impractical for large problem domains owing to the limitations of risk analysis approaches. CBM is preferred because maintenance decisions are based on the physical condition of the equipment.
However, data collection requires the installation of costly instrumentation to monitor the equipment closely and to prevent false alarms. Dedicated software is also needed to process the information, train the models and estimate the remaining useful life of the assets. CBM is a promising predictive maintenance policy with high accuracy and is worth developing further. It can be coupled with a comprehensive RBM model (e.g. dynamic BN and predictive accident modelling) for effective fault prediction and prioritisation. Maintenance scheduling with multi-criteria optimisation models can also be incorporated into the framework to screen several maintenance options. Maintenance strategy selection is also important for preventing maintenance-related accidents and for fulfilling multiple criteria. For more comprehensive and practical maintenance optimisation, advancing tools that couple structure-generation algorithms with global solvers is necessary. In terms of modelling approaches, the following potential advances in maintenance optimisation have been identified in this review:
(i) Opportunistic maintenance of equipment exploiting production stoppages. Current works mostly deal with the maintenance of multiple components within a single piece of equipment, while multi-unit systems are considered with at most two dependent components.
(ii) Data-driven joint optimisation of the spare-parts ordering policy with maintenance planning should be considered. Currently, there is a lack of practical studies on maintenance optimisation with resource limitations.
(iii) Non-stationary, evolving data on assets' condition and hazardous risks should be utilised. Few works have considered both quantitative and qualitative data; both types should be combined for well-informed maintenance planning.
(iv) Fault propagation from one unit to another should be incorporated into maintenance optimisation, especially in Total Site planning. Not only structural interdependencies but also unit failures can affect operating parameters, causing major damage.
More attention should be paid to the sustainability impacts caused by equipment maintenance and breakdown. Greener and more efficient technologies can be installed in the assets (retrofit) as an alternative to maintaining the equipment. A life-cycle sustainability assessment of asset management methods can be performed to evaluate the sustainability index of the assets. Further development of asset maintenance planning and scheduling models that account for the environmental burden is needed so that the process can advance towards a Circular Economy.
Big Data analytics will play an increasingly important role in predictive maintenance. Besides condition data, logistical data and written text or speech from experienced operators can be fed into maintenance planning models using AI approaches. For maintenance management, an ontology-based maintenance system that combines knowledge bases built from the operational and condition data generated by the equipment is also recommended. Cloud computing provides a potential platform for training models on massive amounts of data. Digital twins with a virtual online monitoring system can be an alternative predictive maintenance strategy, as they provide physical details about the assets and the measurement systems. Learning from past near-miss or accident data could also lead to better risk- and condition-informed maintenance decision-making. Although massive data are available, the data analytics framework must still be selected carefully to reflect the real asset condition.
This review will have served its purpose if it inspires practitioners and researchers to develop novel ideas for further research in asset maintenance and management. System retrofit and maintenance are intrinsically related.
Participation of experts and stakeholders from different problem domains is required to arrive at an appropriate and optimal asset management strategy (e.g. the decision to repair, renovate or replace an asset; see Figure 9). Future research on developing such a decision-making framework would help drive business profit through optimised asset planning. Further improvements in the predictive maintenance strategy (Figure 10) are also encouraged. Intellectual integration between academic researchers and industrial practitioners plays an essential role in addressing the current imbalance between theory and practice in asset management.
Year | Approach | Application | Outcome | Reference
… | … | … (Labib, 2015) | The criticality of each failure event is evaluated, and safety mitigation investments are optimised. | (Ishizaka and Labib, 2014)
2014 | Fuzzy bow-tie analysis and 'lean' principles | Fishbone diagram and FMEA applied to a manufacturer of decorative paints, adhesives, industrial paints and printing inks | Risk scores for the units are identified, and the selection of mitigation strategies is optimised. | (Aqlan and Mustafa Ali, 2014)
2015 | Input-output modelling | Criticality ranking of the processing units in a polygeneration plant and a bioenergy park | The risk-based analysis is performed via unit capacity changes and by measuring the downstream effects. | (Benjamin et al., 2015)
2015 | Adaptive risk analysis | The effect of the decision-making process is updated dynamically for the oil and gas industry. | The method provides more degrees of freedom in tackling changing information. | (Bjerga and Aven, 2015)
2015 | Bayesian Network (BN) | Determination of the spatial and temporal evolution of domino effects of accidents (e.g. fire) in a hypothetical fuel storage plant | The equipment is ranked considering the risk and its effect propagation; the most probable sequence of accidents is determined. | (Khakzad, 2015)
… | … | … (identify the temporal trend of the monitored condition parameters) | The boiler section requires CBM as it is highly susceptible to failures. | …
2019 | Qualitative process resiliency | Cooling towers in power generation plants | The maintenance strategy (PM, CM or CBM) is selected qualitatively based on the failure risk, downtime and equipment life, with a predefined process and cost model. | (Jain et al., 2019)
Zhen, X., Vinnem, J.E., Peng, C., Huang, Y., 2018.
Resource-constrained formulation for production scheduling and maintenance.
An overview of time-based and condition-based maintenance in industrial application.
A multi-constrained maintenance scheduling optimization model for a hydrocarbon processing facility.
Total productive maintenance implementation in a manufacturing organisation.
Evaluation of green maintenance initiatives in design and development of mechanical systems using an integrated approach.
Analysis and optimization of periodic inspection intervals in cold standby systems using Monte Carlo simulation.
Total quality maintenance: An approach for continuous reduction in costs of quality products.
Condition-Based Maintenance: Selection and Improvement of a Cost-Effective Vibration-Based Maintenance Policy for Rolling Element Bearings.
Condition-Based Maintenance.
Selecting the most efficient maintenance approach using fuzzy multiple criteria decision making.
State of the art in simulation-based optimisation for maintenance systems.
Simulation-based optimisation of maintenance systems: Industrial case studies.
Long-term turnaround planning for integrated chemical sites.
Medium-term maintenance turnaround planning under uncertainty for integrated chemical sites.
Study of the Optimization of Maintenance Plan for
Risk based maintenance optimization: foundational issues.
Integrating lean principles and fuzzy bow-tie analysis for risk assessment in chemical industry.
Develop a Maintenance and Reliability Plan.