key: cord-0067912-qnhmggbb
authors: Zeng, Guanwen; Sun, Zhiyuan; Liu, Shiyan; Chen, Xiaoqi; Li, Daqing; Wu, Jianjun; Gao, Ziyou
title: Percolation-based health management of complex traffic systems
date: 2021-09-27
journal: Front
DOI: 10.1007/s42524-021-0174-0
sha: 5cc996976739e9158679b76f8e21ede4f531dfb4
doc_id: 67912
cord_uid: qnhmggbb

System health management, which aims to ensure the safe and efficient operation of systems by reducing uncertain risks and cascading failures throughout their lifetime, is proposed here for complex transportation systems and other critical infrastructures, especially in the context of the New Infrastructure Projects launched in China. Previous studies have proposed numerous approaches to evaluate or improve traffic reliability or efficiency. Nevertheless, most existing studies have neglected the core failure mechanism, i.e., the spatio-temporal propagation of traffic congestion. In this article, we review existing studies on traffic reliability management and, based on a percolation-theory study of the core failure modes, propose a health management framework that covers the entire lifetime of traffic congestion, from emergence through evolution to dissipation. Aiming at a traffic system that is “reliable, invulnerable, resilient, potential, and active”, the proposed traffic health management framework comprises modeling, evaluation, diagnosis, and improvement. Our framework may shed light on traffic management for megacities and urban agglomerations around the world, and this new approach may offer innovative insights for systems science and engineering in future intelligent infrastructure management.

China launched the New Infrastructure Projects to boost economic development in the new era, especially under the shock of the COVID-19 epidemic. Compared with traditional infrastructures (e.g., railways, highways, and airports), the New Infrastructure Projects will use more advanced technologies, including 5G, ultra-high voltage, intercity transport, vehicle charging stations, big data centers, artificial intelligence, and the industrial internet. Although the New Infrastructure is designed to be more connected, more digital, and smarter than traditional infrastructures, determining a management concept suited to these new technologies is challenging. For the transportation system, a focus on efficiency may paradoxically attract additional traffic demand, which may drive the system to the edge of its operational critical point. Furthermore, the New Infrastructure generates new operational modes, which may also amplify the uncertainty of system risks. Thus, sustainability (Behrends et al., 2008) and resilience (Holling, 1973) have entered the scope of system management. Meanwhile, because vehicle emissions generate pollution, green transportation has also become a management objective. As in other complex systems, these objectives receive different decision weights under different circumstances and sometimes conflict with one another. In the traffic context, capacity, safety, efficiency, economy, and environmental protection may involve tradeoffs. For instance, increasing the transport capacity of urban road networks can accommodate growing travel demand, but the accompanying external costs (e.g., congestion, accidents, noise, and pollution) may push outcomes away from the private and social welfare optimum. These considerations raise a fundamental question: What is the basic logic of transportation management in this new scenario?

In this study, we propose a traffic health management framework as a possible solution. System health plays an essential role in the reliable and efficient operation of a system during its entire lifetime. The New Infrastructure is a complex system with characteristics such as nonlinearity, emergence, adaptability, and hierarchy, which can dramatically increase management complexity. Moreover, the components of the New Infrastructure interact with one another to an unprecedented degree, which could further increase system uncertainty, especially the risk of catastrophic cascading breakdowns (Zhao et al., 2016). However, classical reliability management approaches typically assume that failures in a system are independent (Zio, 2009). Although these approaches can be effective for simple systems, they may fail for critical infrastructures with the complexity described above (Zio, 2016). The transportation system, as a complex critical infrastructure, incurs huge economic losses every year from traffic jams, which can be regarded as an "urban disease". One objective of traffic health management is to mitigate this congestion "disease" in advance. Existing traffic management methods are mainly reactive, amounting to a "reactive treatment of disease". Although these methods may help control recurrent congestion, they struggle with increasing uncertainty, which may lead to "black swan" events in traffic management with severe consequences. The key to this problem is a deep understanding of the failure mechanism of congestion from a systems engineering viewpoint (Guan et al., 2020). Therefore, in this paper, we review current research on traffic reliability management and propose a traffic health management framework that considers the core failure mechanism (i.e., the spatio-temporal propagation of traffic congestion).
We hope to establish a health perspective based on the principle of "preventive treatment of disease", which may help in present and future critical infrastructure management.

Our system health management (SHM) framework aims to increase the availability and resilience of a system by understanding its core failure mode. Previous studies have defined system health from different perspectives. Johnson et al. (2011) defined system health as a system's capability to preserve its normal functions despite operational interference. Bonissone et al. (2009) divided SHM into two major aspects: (i) health assessment, which mainly perceives the current and future health status of a system, and (ii) health management, which mainly implements control and adjustment and formulates logistics support plans according to the perceived system status. Khan and Yairi (2018) described health management in engineering systems as the process of diagnosing and preventing system faults by predicting the reliability and remaining useful life of system components. In general, executing health management involves the following steps:

(i) collecting system status data from various distributed sensors, (ii) analyzing the health status through key feature extraction and performing fault diagnosis and prognostics, and (iii) providing further decision-making suggestions for health improvement.

As health management has developed, maintenance strategies have become increasingly intelligent. The traditional strategy of reactive maintenance, which restores equipment performance and reliability (by repairing or replacing faulty components) only after a failure has occurred, cannot prevent the huge losses that a failure causes. To prevent equipment failure in advance, the preventive maintenance strategy was developed, in which key components are replaced after a certain operating period according to the intended engineering schedule. However, this strategy requires recording a large amount of operating data (especially failure times), and components may be retired before the end of their actual lifetime, resulting in over-maintenance and higher maintenance costs. The later condition-based maintenance strategy (Marseguerra et al., 2002; Peng et al., 2010) collects various types of data during operation to detect early signs of failure; its high accuracy allows a portion of unnecessary scheduled maintenance tasks to be removed, reducing turnaround time. Although condition-based maintenance can perceive the current operating state, it can hardly assess the degree of equipment damage or predict remaining useful life, which makes future failure risks difficult to control. Therefore, further improving maintenance intelligence over an entire lifespan requires new health theories and technologies. Prognostics and health management (PHM) (Zeng et al., 2005; Mancuso et al., 2021) focuses mainly on health status monitoring, prognostics, and management. PHM collects data in different dimensions and applies various intelligent algorithms to convert measurements into indicators that characterize the health status of components, so that faults can be predicted and managed before they occur. The core of PHM technology depends heavily on understanding failure mechanisms.

In contrast to PHM, SHM focuses on system-level management and places high demands on knowledge of the health of complex systems. As the "lifeline" of a nation's economy and safety, critical infrastructures must provide high-quality services; once they fail, immeasurable losses may follow. SHM is therefore of considerable importance for predicting and managing uncertain risks and for maintaining the health of an entire system rather than only its components. In particular, the health management of an urban transportation system, a typical critical infrastructure, involves its own complexities. Zhao et al. (2012) analyzed the traffic flow data of mountainous cities and proposed a method for measuring the subhealth state of a system based on the fractal characteristics of city traffic. Chen et al. (2015) proposed a set of health indices for evaluating the health status of a bus line network in a typical metropolitan city. Wang (2016) proposed a comprehensive health status evaluation system, including road network density, bus stop coverage, bikeway network density, and sidewalk network density, based on the road network of an old town. Zhou et al. (2020b) developed a traffic health indicator that integrates three factors, namely, traveler, society, and nature factors.

The aforementioned urban traffic health evaluation methods have certain limitations. First, the indicators neglect the relation between transportation health and its core failure mechanism (i.e., the spatio-temporal propagation of traffic congestion). Second, the indicators are mainly statistical and pay little attention to the topological and functional behaviors of the transportation system as a whole. Third, the indicators offer little support for management decisions. Building on the core failure mechanism, we develop an SHM framework that draws on intelligent methods from traditional reliability engineering.

The traffic congestion propagation process can be divided into three stages, namely, occurrence, evolution, and dissipation. Traffic reliability management corresponds to the occurrence stage, traffic vulnerability management to the evolution stage, and traffic resilience management to the dissipation stage. In the following sections, we review existing studies on traffic reliability, traffic vulnerability, and traffic resilience in turn.

Traffic reliability can be defined as the ability of a traffic system to meet certain travel requirements under specified conditions and within a specified time. Commonly used traffic reliability indicators include connectivity reliability, travel time reliability, and capacity reliability. Research on connectivity reliability can be traced back to Mine and Kawai (1982), who mainly examined end-to-end connectivity in traffic systems, measured by the probability that a system maintains connectivity after a disturbance. Chen et al. (2003) extended the concept of connectivity reliability to the probability that road traffic operation meets unblocked demands under peak use conditions of a road network. This extension changes the binary state (i.e., connected or disconnected) of each road segment into a probabilistic connection state, in which connectivity reliability is determined by analyzing the probability that roads connect a given origin-destination (OD) pair. Guo et al. (2007) considered accessibility between OD pairs and proposed a method for evaluating connectivity reliability based on alternative routes: when the initial travel route fails, a traveler chooses an alternative route (which is feasible in high-density urban road networks). The smaller the travel cost difference between the two routes connecting a given OD pair, the higher the connectivity reliability. Connectivity reliability is particularly important in emergencies, such as disasters (e.g., earthquakes and mudslides) or large-scale accidents, because such events can considerably damage the connectivity structure of the road network. For example, some scholars used the connectivity reliability indicator to measure the seismic resistance of road traffic systems through Monte Carlo simulations (Cui et al., 2019).
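To make the Monte Carlo idea concrete, here is a minimal sketch, assuming that each road link fails independently with a known probability and is otherwise fully available (the binary connected/disconnected view). The toy network, the function names, and the failure probabilities are hypothetical, and this is not the procedure of Cui et al. (2019).

```python
import random
from collections import deque


def reachable(adj, s, t):
    """Breadth-first search: is t reachable from s in adjacency dict `adj`?"""
    if s == t:
        return True
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v == t:
                return True
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def connectivity_reliability(edges, p_fail, origin, dest, n_samples=10_000, seed=0):
    """Monte Carlo estimate of P(origin and dest stay connected).

    edges  : list of undirected links (u, v)
    p_fail : dict mapping each link to its (assumed independent) failure probability
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        adj = {}
        for u, v in edges:
            if rng.random() >= p_fail[(u, v)]:  # link survives this draw
                adj.setdefault(u, []).append(v)
                adj.setdefault(v, []).append(u)
        hits += reachable(adj, origin, dest)
    return hits / n_samples


# Hypothetical OD pair A -> D served by two parallel two-link routes.
edges = [("A", "B"), ("B", "D"), ("A", "C"), ("C", "D")]
p_fail = {e: 0.1 for e in edges}
r = connectivity_reliability(edges, p_fail, "A", "D")
# Exact value for comparison: 1 - (1 - 0.9**2)**2 = 0.9639
```

For small networks the exact value can be checked by hand, as in the comment; for city-scale networks, sampling is what keeps the computation tractable.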

To address the shortcomings of connectivity reliability in describing traffic operating status, several scholars proposed the concept of travel time reliability. Research on travel time reliability can be traced back to a conference paper by Asakura and Kashiwadani (1991), who proposed that travel time reliability characterizes the probability of completing a trip between a given OD pair within a specified time. Lam and Xu (1999) assessed travel time reliability through traffic flow simulation (TFS). The TFS model requires only a prior OD matrix and partial road operation information to estimate overall traffic flow and obtain travel time reliability from variations in travel demand. Xiong et al. (2004) pointed out that classical travel time reliability measurements, which ignore the mutual influence of roads, can easily lead to overestimation. They therefore proposed a travel time reliability indicator that considers correlation effects among roads, which fall into three categories, namely, the non-correlated effect, the synchronous effect, and the slight effect. They also found that interaction patterns can be significant: the stronger the correlation effects, the faster travel time reliability decreases. Carrion and Levinson (2013) studied how users value reliability when a travel fee mechanism is introduced. Sun et al. (2014) proposed a two-level model with maximizing the reliability of total travel time as the upper-level objective and reducing drivers' travel time uncertainty as the lower-level objective. Talebpour et al. (2017) studied how the introduction of autonomous vehicles and corresponding reserved lanes can influence travel time reliability. Saedi et al. (2020) proposed a network partitioning method that can accurately estimate travel time reliability.
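The definition of Asakura and Kashiwadani, the probability that trip time does not exceed a budget, can be estimated by simulation. The sketch below is a minimal illustration under stated assumptions: link travel times on a fixed path are normal (truncated at zero), and correlation between links is introduced through a single shared factor with weight `rho`. This common-factor device is our own simplification, not the correlation categories of Xiong et al. (2004); all names and numbers are hypothetical.

```python
import math
import random


def travel_time_reliability(links, budget, rho=0.0, n_samples=20_000, seed=0):
    """Monte Carlo estimate of P(path travel time <= budget).

    links : list of (mean, sd) travel times for the links on one path (hypothetical)
    rho   : pairwise correlation between link times, 0 <= rho <= 1, induced by a
            common factor shared by all links (an illustrative assumption)
    """
    rng = random.Random(seed)
    w_common, w_own = math.sqrt(rho), math.sqrt(1.0 - rho)
    on_time = 0
    for _ in range(n_samples):
        common = rng.gauss(0.0, 1.0)  # network-wide shock shared by all links
        total = 0.0
        for mean, sd in links:
            z = w_common * common + w_own * rng.gauss(0.0, 1.0)  # unit-variance z
            total += max(0.0, mean + sd * z)  # travel times cannot be negative
        on_time += total <= budget
    return on_time / n_samples


path = [(10.0, 3.0)] * 3  # three links, 10 min mean, 3 min sd each
r_indep = travel_time_reliability(path, budget=35.0, rho=0.0)  # about 0.83
r_corr = travel_time_reliability(path, budget=35.0, rho=1.0)   # about 0.71
```

Because the 35 min budget exceeds the 30 min mean trip time here, positive correlation widens the trip-time distribution and lowers reliability, which is consistent in direction with the finding reported above; with a budget below the mean, the effect would reverse.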

The capacity reliability of transportation systems was first proposed by Chen et al. (1999), who defined it as the probability that a traffic network can accommodate a certain level of travel demand. Compared with connectivity reliability, capacity reliability encodes the status of each road as a probability rather than a 0/1 binary value (i.e., disconnected/connected), thereby reflecting the multiple possible states of a transportation system. Meanwhile, the route choice of travelers is also affected by road capacity limitations, which makes capacity reliability indirectly related to travel time reliability. Lo and Tung (2003) introduced probabilistic user equilibrium into traffic reliability, focusing on the impact of road capacity decline on total travel time. Liu and Pu (2004) constructed a two-level programming model that reduces computational complexity when solving capacity reliability problems. Xu and Gao (2006) established a two-level planning model for traffic network design and analyzed its capacity reliability on a small synthetic network. Lam et al. (2008) analyzed the impact of supply and demand on road capacity and proposed a model that considers both aspects comprehensively, so that capacity reliability can be calculated for different scenarios. Additionally, some scholars explored the factors influencing capacity reliability in road networks through simulation. For example, Shi et al. (2020) used cellular automata simulation to analyze the critical capacity at which a road network faces the "gridlock" phenomenon.
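A simulation sketch of capacity reliability, under assumptions of our own: each directed link independently keeps its nominal capacity or, with probability `p_degrade`, drops to a fraction of it, and the network's ability to accommodate demand for a single OD pair is approximated by the maximum flow from origin to destination (Edmonds-Karp). Chen et al. (1999) work with reserve capacity in an equilibrium setting, so max flow here is only a simplified stand-in; the network, parameters, and function names are hypothetical.

```python
import random
from collections import deque


def max_flow(cap, s, t):
    """Edmonds-Karp maximum flow; cap[u][v] is the capacity of directed link u -> v."""
    residual = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u, nbrs in cap.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0.0)  # reverse residual arc
    flow = 0.0
    while True:
        parent = {s: None}  # BFS finds a shortest augmenting path
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


def capacity_reliability(links, origin, dest, demand, p_degrade=0.2,
                         degraded_frac=0.5, n_samples=5_000, seed=0):
    """Monte Carlo estimate of P(max OD flow >= demand) under random capacity loss."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(n_samples):
        cap = {}
        for (u, v), c in links.items():
            if rng.random() < p_degrade:  # link degraded in this scenario
                c *= degraded_frac
            cap.setdefault(u, {})[v] = c
        ok += max_flow(cap, origin, dest) >= demand
    return ok / n_samples


# Hypothetical network: two parallel routes O-A-D and O-B-D, 1000 veh/h per link.
links = {("O", "A"): 1000.0, ("A", "D"): 1000.0, ("O", "B"): 1000.0, ("B", "D"): 1000.0}
nominal_cap = {"O": {"A": 1000.0, "B": 1000.0}, "A": {"D": 1000.0}, "B": {"D": 1000.0}}
r = capacity_reliability(links, "O", "D", demand=1600.0)
# Both routes must be undegraded: exact value 0.8**4 = 0.4096
```

Replacing the max-flow proxy with an equilibrium assignment would bring the sketch closer to the capacity reliability models cited above, at the cost of a much heavier inner loop.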

Vulnerability originates from the concept of material mechanics and characterizes the phenomenon in which materials deteriorate when small deformations occur under external forces. In systems engineering, system vulnerability refers to a system's susceptibility to irreversible global collapse caused by internal or external disturbances. The urban transportation system is a critical infrastructure system whose large-scale paralysis considerably disrupts normal travel and causes significant losses to related industries. However, no unified standard definition of urban traffic vulnerability exists. Classical traffic vulnerability research mainly measured the susceptibility of traffic networks to abnormal events (e.g., continuous degradation or random shocks) (Wang et al., 2020). This susceptibility has two sides. On the one hand, regarding the uncertainty of abnormal events, the higher their probability, the more sensitive the transportation system is. On the other hand, regarding consequences, the greater the performance degradation after an abnormal event, the more sensitive the transportation system is. From this perspective, traffic vulnerability is strongly correlated with traffic risk (Berdica, 2002); thus, risk indicators can be used to measure traffic vulnerability in practice. Taylor et al. (2006) measured traffic vulnerability through changes in road network accessibility under abnormal events and pointed out that failure risks at different geographical locations should be considered. Chen et al. (2007) introduced a model that measures traffic vulnerability through accessibility changes in degradable transportation networks after abnormal events. Using network theory and Monte Carlo simulation, Fang et al. (2020) proposed topological centrality indicators for critical infrastructures (e.g., high-speed railway networks) to assess their vulnerability to disruption. Although the susceptibility of transportation systems to disturbances is generally studied from the supply side (e.g., a decrease in the capacity of certain roads), Connors and Watling (2015) took the demand side instead and proposed a traffic demand vulnerability index to analyze fluctuations in the service level of the transportation system caused by changes in demand.
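The risk view of Berdica (2002) suggests weighting each disruption's consequence by its likelihood. A minimal sketch, assuming that network performance is measured by global efficiency (the average inverse shortest-path length over node pairs, a common complex-network measure standing in here for the accessibility measures used by Taylor et al. and Chen et al.) and that each link has a hypothetical disruption probability:

```python
from collections import deque


def global_efficiency(adj):
    """Mean of 1/d(i, j) over ordered node pairs; unreachable pairs contribute 0."""
    nodes = list(adj)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = 0.0
    for s in nodes:
        dist = {s: 0}
        queue = deque([s])
        while queue:  # unweighted BFS distances from s
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(1.0 / d for v, d in dist.items() if v != s)
    return total / (n * (n - 1))


def without_link(adj, u, v):
    """Copy of undirected adjacency `adj` with link u-v removed."""
    return {x: [y for y in nbrs if {x, y} != {u, v}] for x, nbrs in adj.items()}


def risk_vulnerability(adj, link_prob):
    """Expected efficiency loss = sum over links of P(disruption) * efficiency drop."""
    e0 = global_efficiency(adj)
    losses = {link: e0 - global_efficiency(without_link(adj, *link)) for link in link_prob}
    expected = sum(link_prob[link] * losses[link] for link in link_prob)
    return expected, losses


# Hypothetical network: a four-node ring A-B-C-D plus a spur D-E.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("D", "E")]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)
expected, losses = risk_vulnerability(adj, {("A", "B"): 0.1, ("D", "E"): 0.1})
# The spur D-E is a bridge, so losing it costs more than losing ring link A-B.
```

The per-link `losses` dictionary is the consequence term; multiplying by the disruption probabilities turns it into a risk-style vulnerability score.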

Another perspective considers the most serious failure state (or functional collapse) of the transportation system under abnormal events. From this viewpoint, the system suffers not just a significant performance loss but reaches a state in which it can no longer operate. In contrast to the risk-based perspective above, this measurement does not involve the uncertainty of disturbances but determines traffic vulnerability according to "the worst state" of the transportation system. From this perspective, vulnerability is the opposite of system "invulnerability" (Tan et al., 2011). Yang and Qian (2012) argued that traffic vulnerability assessment must identify the key roads whose failure causes the greatest loss to the entire road network under the same attack intensity, with the maximum loss used as the traffic vulnerability measure. Kim and Yeo (2016) proposed a method for assessing traffic vulnerability based on the macroscopic fundamental diagram (MFD), obtaining the vulnerability index by comparing the MFDs of the road network under normal operation and disaster scenarios. Meanwhile, scholars have proposed traffic vulnerability indicators based on complex network methods. Wang and Wu (2010) analyzed the vulnerability of bus networks in four cities under different attack modes, using the giant component and network efficiency as the performance indicators of the public transportation system. They found that the bus networks are highly vulnerable to malicious attacks but robust to random attacks, a vulnerability pattern similar to that of scale-free networks (Barabási and Albert, 1999). Zhang (2017) proposed a vulnerability index based on dynamic network efficiency, which proved more effective than one based on static network efficiency.
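The contrast between malicious and random attacks can be illustrated in miniature by tracking the fraction of nodes in the giant component as nodes are removed. A sketch, assuming a static degree ranking (computed once on the intact network) for the targeted attack and a hypothetical hub-and-spoke network; it does not use the bus-network data or the exact procedure of Wang and Wu (2010).

```python
import random
from collections import deque


def giant_fraction(adj, removed, n_total):
    """Largest connected component among surviving nodes, as a fraction of n_total."""
    seen = set(removed)  # removed nodes are never visited
    best = 0
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        size, queue = 1, deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    size += 1
                    queue.append(v)
        best = max(best, size)
    return best / n_total


def attack(adj, n_remove, mode="targeted", trials=200, seed=0):
    """Giant-component fraction after removing n_remove nodes.

    targeted: remove the highest-degree nodes (degrees ranked once, on the intact network)
    random  : remove uniformly random nodes, averaged over `trials` draws
    """
    n = len(adj)
    if mode == "targeted":
        order = sorted(adj, key=lambda u: len(adj[u]), reverse=True)
        return giant_fraction(adj, set(order[:n_remove]), n)
    rng = random.Random(seed)
    nodes = list(adj)
    total = sum(giant_fraction(adj, set(rng.sample(nodes, n_remove)), n) for _ in range(trials))
    return total / trials


# Hypothetical hub-and-spoke network: one hub H serving ten stops.
hub = {"H": [f"S{i}" for i in range(10)]}
hub.update({f"S{i}": ["H"] for i in range(10)})
targeted = attack(hub, 1, "targeted")  # 1/11: removing the hub isolates every stop
random_avg = attack(hub, 1, "random")  # much larger: most draws hit a stop, not the hub
```

Recomputing degrees after each removal (an adaptive attack) is a common variant; the static ranking used here is simply the easiest to reason about.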

Resilience derives from the Latin word "resilire", meaning "to return to the original position or state". Holling (1973) first proposed resilience in ecosystems as the ability of a system to recover from external disturbances. Since then, resilience, as an important measure of a system's ability to withstand disturbances, has attracted attention and triggered new research on different systems, such as society (Barrett and Constas, 2014; Headey and Barrett, 2015), the economy (Xu et al., 2011), transportation (Ganin et al., 2017; Calvert and Snelder, 2018; Gu et al., 2020), biology (Masten, 2016; Feder et al., 2019), ecology (Mumby et al., 2007; Ponce-Campos et al., 2013; Gao et al., 2016), engineering (Hollnagel et al., 2006; Boin and McConnell, 2007), and so on. General resilience indicators applicable to different types of systems, including transportation systems, can be divided into two categories: deterministic and probabilistic resilience indicators. Among deterministic indicators, Bruneau et al. (2003) proposed measuring system resilience by integrating the loss of system performance from the time the system is disturbed to the time recovery is complete. Because this integral corresponds geometrically to a roughly triangular area, the indicator is also called the resilience triangle, one of the most classical system resilience measurements. Rose (2007) defined resilience as the ability to maintain a functional state after being affected and proposed a static index to assess system resilience, which mainly measures the degree to which a system deviates from its settled functional state after a disturbance. Building on Rose (2007), Cox et al. (2011) applied this resilience index to the transportation system, using the change in volume after a disturbance as the resilience measure.
Zobel (2011) considered the two most critical resilience factors when a system is subjected to external disturbances: the initial performance degradation and the recovery time. The initial performance degradation reflects a system's ability to resist disturbances (i.e., robustness), and the recovery time reflects its ability to recover quickly from disturbances (i.e., rapidity). Henry and Ramirez-Marquez (2012) argued that a resilience measure should have a strong temporal attribute, capturing in particular the system state at different stages of a disturbance, including the initial state, the onset of the disturbance, the end of the disturbance, the onset of recovery, and the end of recovery, and provided a demonstration case for a simple road network. Additionally, Francis and Bekera (2014) argued that a system's resilience is reflected in its ability to absorb, adapt to, and recover from disturbances and proposed a composite resilience factor. Based on this framework, Ji et al. (2019) evaluated the resilience of an expressway system along these three dimensions, using operational invulnerability to characterize absorptive capability, the proportion of roads with a low average speed during restoration to characterize adaptive capability, and the proportion of roads with a high average speed to characterize recovery capability.
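The resilience triangle of Bruneau et al. (2003) and the two factors of Zobel (2011) can both be read off a sampled performance curve. The sketch below assumes that performance is expressed as a percentage of its pre-disturbance level (100) and uses the trapezoidal rule for the integral; the curve and function names are hypothetical.

```python
def resilience_loss(times, perf, target=100.0):
    """Area between the target level and the performance curve (trapezoidal rule)."""
    loss = 0.0
    for i in range(1, len(times)):
        gap_prev = target - perf[i - 1]
        gap_next = target - perf[i]
        loss += 0.5 * (gap_prev + gap_next) * (times[i] - times[i - 1])
    return loss


def robustness_rapidity(times, perf, target=100.0, tol=1e-9):
    """Return (X, T): the drop at the first below-target sample and the time until the
    curve is back at target (or until the last sample if it never recovers)."""
    below = [i for i, q in enumerate(perf) if q < target - tol]
    if not below:
        return 0.0, 0.0
    start = below[0]
    end = min(below[-1] + 1, len(times) - 1)
    return target - perf[start], times[end] - times[start]


# Hypothetical event: performance drops from 100 to 60 at t = 1 h and recovers
# linearly to 100 by t = 5 h (the repeated time stamp encodes the sudden drop).
times = [0.0, 1.0, 1.0, 5.0, 6.0]
perf = [100.0, 100.0, 60.0, 100.0, 100.0]
loss = resilience_loss(times, perf)      # 80.0
x, t = robustness_rapidity(times, perf)  # (40.0, 4.0)
```

For a sudden drop followed by linear recovery, the integrated loss equals X·T/2 (here 40 × 4 / 2 = 80), which is why the measure is called a triangle; for curved recovery paths the trapezoidal integral departs from that product.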

In contrast to deterministic resilience indicators, probabilistic resilience indicators consider the probability that system resilience meets specified requirements. Chang and Shinozuka (2004) defined system resilience as the probability that a system recovers to a predetermined performance level within a predetermined recovery time after being disturbed. Youn et al. (2011) argued that system resilience is reflected mainly in the reliability and recovery degrees, where the reliability degree is regarded as the passive survival rate and the recovery degree as the proactive survival rate; system resilience is therefore expressed as the sum of these two rates. Ouyang et al. (2012) proposed a resilience assessment framework for systems experiencing frequent disturbances, in which system resilience is the expected performance change after each disturbance, obtained by comparing the actual performance curve with the target performance curve. This measurement is particularly suitable for systems that are frequently subjected to similar types of disturbances over time.
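Both probabilistic views can be estimated from an ensemble of simulated or observed recovery trajectories. A minimal sketch, assuming all trajectories are sampled at common time points starting at the moment of disturbance: `prob_recovery` is in the spirit of Chang and Shinozuka (2004), the fraction of scenarios that regain a required performance level by a deadline, and `expected_area_ratio` follows the idea attributed above to Ouyang et al. (2012) of comparing the actual performance curve with the target curve, here as an area ratio averaged over disturbances. Trajectories, thresholds, and names are hypothetical.

```python
def area(times, values):
    """Trapezoidal integral of a sampled curve."""
    return sum(0.5 * (values[i - 1] + values[i]) * (times[i] - times[i - 1])
               for i in range(1, len(times)))


def prob_recovery(times, trajectories, level, t_star):
    """Fraction of trajectories that reach `level` at some sample time <= t_star."""
    ok = sum(
        any(q >= level for t, q in zip(times, perf) if t <= t_star)
        for perf in trajectories
    )
    return ok / len(trajectories)


def expected_area_ratio(times, trajectories, target):
    """Mean over disturbances of (area under actual curve) / (area under target curve)."""
    target_area = area(times, target)
    return sum(area(times, perf) / target_area for perf in trajectories) / len(trajectories)


times = [0.0, 1.0, 2.0, 3.0, 4.0]       # hours after each disturbance
target = [100.0] * 5
trajectories = [
    [60.0, 80.0, 100.0, 100.0, 100.0],  # recovers within 2 h
    [50.0, 60.0, 70.0, 80.0, 90.0],     # still degraded after 4 h
]
p = prob_recovery(times, trajectories, level=95.0, t_star=3.0)  # 0.5
ratio = expected_area_ratio(times, trajectories, target)        # approximately 0.8
```

With many simulated disturbance scenarios in place of the two hand-written curves, both quantities become Monte Carlo estimates of the corresponding probabilistic indicators.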

Beyond these two general types of resilience measurements, some scholars proposed resilience analysis methods specific to traffic contexts. Ip and Wang (2011) calculated the resilience of each node in a traffic network as the weighted average number of reliable paths to other nodes, and the resilience of the entire network as the weighted sum of the node resilience values. Arcidiacono et al. (2012) used traffic accessibility as a performance parameter and adapted seven resilience dimensions (Renschler et al., 2010) to traffic conditions to analyze the traffic resilience of a system subjected to extreme events. Freckleton et al. (2012) employed fuzzy inference to assess traffic resilience. Using the traffic networks of 20 cities, Ganin et al. (2017) characterized urban traffic performance by the annual delay per person during peak periods and took the change in traffic efficiency after a disturbance as the urban traffic resilience index. They found that urban traffic efficiency and resilience are not strongly correlated, indicating that traffic resilience management requires approaches different from those used to improve traffic efficiency.
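Ip and Wang's node-level measure counts reliable paths between node pairs. As a hedged approximation, the sketch below treats edge-disjoint paths (counted as a unit-capacity maximum flow) as the "reliable paths", weights destinations by a hypothetical importance weight such as population, and reports the network value as the weight-normalized sum of node values (equivalent to a weighted sum when weights sum to one). How the original work defines reliable paths and weights may differ.

```python
from collections import deque


def edge_disjoint_paths(adj, s, t):
    """Number of edge-disjoint s-t paths in an undirected graph (unit-capacity max flow)."""
    cap = {u: {v: 1 for v in nbrs} for u, nbrs in adj.items()}  # one arc each way
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        v = t
        while parent[v] is not None:  # push one unit along the augmenting path
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1


def path_resilience(adj, weight):
    """Node value: weighted mean number of disjoint paths to all other nodes.
    Network value: weighted mean of node values (weights are hypothetical)."""
    nodes = list(adj)
    node_res = {}
    for i in nodes:
        others = [j for j in nodes if j != i]
        w = sum(weight[j] for j in others)
        node_res[i] = sum(weight[j] * edge_disjoint_paths(adj, i, j) for j in others) / w
    network = sum(weight[i] * node_res[i] for i in nodes) / sum(weight.values())
    return node_res, network


ring = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
line = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
_, ring_value = path_resilience(ring, {k: 1.0 for k in "ABCD"})  # 2.0
_, line_value = path_resilience(line, {k: 1.0 for k in "ABC"})   # 1.0
```

The ring scores 2.0 because every pair is joined by two disjoint routes, while the line scores 1.0: a single link failure there always disconnects someone.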

To define the health of a traffic system, a scientific and effective index system is needed that captures the health features "reliable, invulnerable, resilient, potential, and active". The diagram of the traffic health index system is shown in Fig. 1.

Traffic reliability can be decomposed into three aspects, namely, connectivity reliability, travel time reliability, and capacity reliability. Traffic invulnerability can be measured by two indicators: influence range and influence intensity. Influence range quantifies the geographical extent of abnormal roads caused by disruptive events, whereas influence intensity combines the abnormal duration with other abnormal characteristics under interference to quantify traffic system invulnerability. Traffic resilience can be measured by two types of metrics, namely, deterministic resilience and probabilistic resilience. Traffic potential measures the ability of a traffic system to maintain its basic functionality when a given fraction of its roads is congested or destroyed. An effective way to develop traffic potential is to improve bottleneck capacity in the traffic network. Traffic potential can be analyzed in three aspects: critical locations, critical paths, and the critical condition. Critical locations and critical paths play an essential role in enhancing traffic potential, as they may provide redundancy for urban traffic to cope with sudden surges in traffic demand. The critical condition can be identified through percolation analysis (Stauffer and Aharony, 1992; Bunde and Havlin, 1991). According to percolation theory, a system undergoes an "order-disorder" phase transition at the percolation critical point: the "order" phase corresponds to a state in which the system's structure and functionality are relatively complete, whereas the "disorder" phase indicates a state in which the system is significantly paralyzed. Applying percolation analysis to the traffic flow network has shown that, near the percolation critical threshold, the organization of the traffic network disintegrates from a global traffic state into local traffic states, indicating the disappearance of macroscopic traffic functionality.
Therefore, the critical condition can serve as the operational boundary of the macroscopic phase transition between high-performance and low-performance system states.

Traffic vitality aims to characterize the possible operational states of a traffic system under uncertainty. We focus on two aspects of traffic vitality, namely, metastability and mode switching. At the micro level, metastability characterizes the distribution of healthy and subhealthy traffic states. The core of metastability is to identify and analyze the multiple stable states that may emerge at the scale of the entire traffic network. The metastable regime of macroscopic traffic performance can be observed using percolation approaches (Zeng et al., 2020). A hysteresis-like phenomenon can also be revealed, which may be a manifestation of vitality during the evolution of the network state. At the macro level, mode switching describes the transition of global traffic organization between a heterogeneous low-dimensional state and a homogeneous high-dimensional state within a certain time scale. According to the theory of phase transitions, the scaling behavior of a system can be analyzed as it approaches the critical point, and this behavior can be identified by critical percolation exponents. Analysis of the cluster-size exponent suggests that, within the same network topology, traffic switches between two modes with different critical percolation behavior under different traffic dynamics.
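Critical exponents such as the cluster-size exponent τ, defined through a power-law tail P(s) ∝ s^(-τ), are often estimated from observed cluster sizes by maximum likelihood. The sketch below uses the standard continuous estimator τ̂ = 1 + n / Σ ln(s_i / s_min) with standard error (τ̂ - 1)/√n, plus a widely used discrete approximation that replaces s_min by s_min - 1/2. The lower cutoff s_min is the analyst's choice, and this is not necessarily the estimation procedure used in the cited work; synthetic data with a known exponent serve as a check.

```python
import math
import random


def powerlaw_exponent(sizes, s_min, discrete=True):
    """Maximum-likelihood estimate of tau for P(s) ~ s**(-tau), s >= s_min.

    Returns (tau_hat, standard_error). With discrete=True, s_min is shifted by 1/2,
    an approximation for integer-valued cluster sizes.
    """
    tail = [s for s in sizes if s >= s_min]
    shift = 0.5 if discrete else 0.0
    log_sum = sum(math.log(s / (s_min - shift)) for s in tail)
    tau = 1.0 + len(tail) / log_sum
    return tau, (tau - 1.0) / math.sqrt(len(tail))


# Check on synthetic continuous data drawn by inverse-transform sampling (tau = 2.5).
rng = random.Random(0)
true_tau, s_min = 2.5, 1.0
samples = [s_min * (1.0 - rng.random()) ** (-1.0 / (true_tau - 1.0)) for _ in range(50_000)]
tau_hat, se = powerlaw_exponent(samples, s_min, discrete=False)
```

Comparing τ̂ (with its standard error) across time windows or traffic conditions is one way such exponents could be used to distinguish operating modes.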

The proposed traffic health management framework is organized into five aspects. Based on the core failure mode, traffic health management is conducted through modeling, evaluation, diagnosis, and improvement. The logic diagram of this framework is presented in Fig. 2 .

From the perspective of reliability, traffic congestion can be regarded as failure of the traffic system. Therefore, studying the core failure mode of traffic systems involves analyzing the spatio-temporal propagation properties of traffic congestion. Specifically, the propagation process of traffic congestion includes the stages of occurrence, evolution, and dissipation.

The occurrence stage concerns the conditions under which traffic congestion emerges and its features. Although many studies start from the occurrence of road-level congestion (Daganzo, 2002), our methods concentrate on the conditions under which network-level congestion occurs. Network-level congestion arises from inherent correlations that spread across the network and is commonly triggered by fluctuations. This process can be viewed as a phase transition of the traffic operational level from high functionality to low functionality. Using percolation methods, we study this phase transition in the following steps: (i) construct the dynamical traffic network based on the topology of the urban road network and the traffic dynamics; (ii) conduct a percolation analysis of network-level traffic by calculating global traffic performance under different values of a control parameter (e.g., a speed threshold); and (iii) identify the traffic percolation critical threshold, i.e., the value of the control parameter at which global traffic performance breaks down. The traffic percolation critical threshold can therefore be viewed as an indicator of network-level traffic congestion.
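Steps (ii) and (iii) can be sketched as follows, under assumptions we make for illustration: each road segment carries a speed ratio r (current speed divided by its maximum speed); for a threshold q, only roads with r ≥ q are kept as functional; global performance is measured by the relative size G of the largest connected cluster; and the critical threshold q_c is placed where the second-largest cluster SG peaks, a common finite-size criterion in percolation studies. The synthetic lattice with uniformly random speed ratios is hypothetical; because the bond-percolation threshold of the square lattice is 1/2 and a fraction 1 - q of roads survives, q_c should land near 0.5 for it.

```python
import random


def component_sizes(n_nodes, edges):
    """Sizes of connected components (union-find), largest first."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    counts = {}
    for x in range(n_nodes):
        root = find(x)
        counts[root] = counts.get(root, 0) + 1
    return sorted(counts.values(), reverse=True)


def percolation_scan(n_nodes, roads, thresholds):
    """roads: (u, v, speed_ratio) triples. Returns rows (q, G, SG) and q_c = argmax SG."""
    rows = []
    for q in thresholds:
        kept = [(u, v) for u, v, r in roads if r >= q]  # functional roads only
        sizes = component_sizes(n_nodes, kept) + [0]    # pad if only one cluster
        rows.append((q, sizes[0] / n_nodes, sizes[1] / n_nodes))
    q_c = max(rows, key=lambda row: row[2])[0]
    return rows, q_c


def random_lattice_roads(side, seed=0):
    """Hypothetical test network: side x side grid, i.i.d. uniform speed ratios."""
    rng = random.Random(seed)
    roads = []
    for x in range(side):
        for y in range(side):
            node = x * side + y
            if x + 1 < side:
                roads.append((node, node + side, rng.random()))
            if y + 1 < side:
                roads.append((node, node + 1, rng.random()))
    return roads


side = 50
roads = random_lattice_roads(side)
thresholds = [i / 50 for i in range(51)]  # q = 0.00, 0.02, ..., 1.00
rows, q_c = percolation_scan(side * side, roads, thresholds)
```

On real data, the speed ratios would come from observed segment speeds at a given time, and the node set from the road network's intersections; real networks are spatially correlated, so q_c need not be near 0.5.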

For the evolution process, the main concern is whether several congested roads will evolve into a wide-range traffic jam or merely merge into small congestion clusters. To address this concern, we can apply cascading models to analyze the likelihood that congestion evolves to the network scale, since congestion evolution is a typical cascading failure. Specifically, we analyze the cascading failure process of a traffic system based on either static correlation or dynamical propagation. For the static correlation analysis, we can analyze jams in city traffic to explore the spatial correlation between traffic jams at different locations (Li et al., 2014). This correlation is found to decay slowly with the distance between two congested locations, and the correlation length increases dramatically toward peak hours. For the dynamical propagation analysis, we start with the cascading overload failure model on a spatially embedded lattice (Zhao et al., 2016). The cascading overload failures spread from the initial failure with an approximately constant velocity, which decreases as the tolerance increases; similar behavior is found in various model networks.
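A static correlation analysis of the kind described above might be sketched as follows. The sketch assumes each road is represented by a centroid position and a numeric congestion value (e.g., 1 for jammed and 0 otherwise, or a speed fluctuation), and it normalizes the pairwise covariance by the variance of the whole field. Grouping pairs by rounded distance suits the regular-grid toy example; real road data would need coarser distance bins (for example, a negative `decimals` value such as -2 for 100 m bins). All names are hypothetical.

```python
import math
from collections import defaultdict


def spatial_correlation(positions, values, decimals=3):
    """Normalized correlation C(r) of a congestion field versus pair distance.

    positions: {road: (x, y)}; values: {road: numeric congestion value}.
    Returns {distance: C(r)}, where C(r) is the mean product of deviations
    from the field mean, divided by the field variance.
    """
    roads = list(positions)
    mean = sum(values[r] for r in roads) / len(roads)
    var = sum((values[r] - mean) ** 2 for r in roads) / len(roads)
    if var == 0:
        raise ValueError("congestion field has zero variance")
    sums, counts = defaultdict(float), defaultdict(int)
    for i, a in enumerate(roads):
        for b in roads[i + 1:]:
            d = round(math.dist(positions[a], positions[b]), decimals)
            sums[d] += (values[a] - mean) * (values[b] - mean)
            counts[d] += 1
    return {d: sums[d] / counts[d] / var for d in sorted(sums)}


# Toy example: an alternating (checkerboard) field on a 4 x 4 grid.
positions = {(i, j): (i, j) for i in range(4) for j in range(4)}
values = {r: 1 if sum(r) % 2 == 0 else -1 for r in positions}
print(spatial_correlation(positions, values))
```

A slowly decaying C(r) in real jam data would correspond to the long-range correlation described above; estimating a correlation length from C(r) would be a further step not shown here.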

Traffic congestion dissipation relates to traffic resilience, a topic that has recently attracted widespread attention. System resilience is a system's ability to adapt to a disturbance and recover to a functional state. There is currently no unified definition of traffic resilience; however, most resilience metrics can be classified as either deterministic or probabilistic. Among them, the most widely used may be the "resilience triangle".
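In its usual form, the resilience triangle measures lost functionality as the area between full performance (100%) and the performance curve Q(t), from the moment of disruption until full recovery. A minimal sketch, assuming Q(t) is sampled in percent at increasing times and integrated with the trapezoidal rule (the helper name is hypothetical):

```python
def resilience_loss(times, quality):
    """Area between 100% and Q(t) (the 'resilience triangle'), trapezoidal rule.

    times: increasing sample times; quality: Q(t) in percent at those times.
    Larger values mean a deeper or longer loss of functionality.
    """
    loss = 0.0
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        loss += dt * ((100 - quality[i]) + (100 - quality[i + 1])) / 2
    return loss


# Q drops to 50% at t = 0 and recovers linearly to 100% at t = 10.
times = list(range(11))
quality = [50 + 5 * t for t in times]
print(resilience_loss(times, quality))
```

For a linear recovery the trapezoidal rule is exact, so the result equals the triangle area 0.5 x 10 x 50.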

Previous studies focused mainly on traffic control with scalar indicators, such as travel time or speed (Rao and Rao, 2012). The common limitation of such indicators is that they cannot reflect the spatio-temporal correlation of congestion propagation in the traffic network. We propose an intuitive definition of traffic resilience based on the spatio-temporal clusters of congestion in the traffic network, which combines the spatial and temporal dimensions and may realistically describe the adaptability and recovery capability of urban traffic (Zhang et al., 2019). Based on real data, we find that the spatio-temporal cluster size of jams follows a scale-free distribution, with similar exponents on different days and in different cities, suggesting a universal response of transportation systems to different perturbation scenarios. This finding may indicate that urban traffic has inherent adaptations on different temporal and spatial scales.
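The spatio-temporal jam clusters can be sketched as connected components in space-time. This is a hedged illustration under assumptions: a jammed road at time t is linked to jammed neighboring roads at the same t and to itself at t - 1 and t + 1, and cluster size is simply the number of jammed road-time pairs (weighting by road length or interval duration would be a natural variant). The data structures and names are hypothetical.

```python
def spatiotemporal_clusters(adjacency, jammed):
    """Sizes of space-time jam clusters, smallest first.

    adjacency: {road: iterable of neighboring roads}
    jammed: set of (road, t) pairs that are congested.
    """
    seen, sizes = set(), []
    for start in jammed:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first search through space-time
            road, t = stack.pop()
            size += 1
            neighbors = [(n, t) for n in adjacency.get(road, ())]
            neighbors += [(road, t - 1), (road, t + 1)]
            for nb in neighbors:
                if nb in jammed and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        sizes.append(size)
    return sorted(sizes)


# Three roads in a line (0-1-2) observed over three time steps.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
jammed = {(0, 0), (1, 0), (1, 1), (2, 2)}
print(spatiotemporal_clusters(adjacency, jammed))
```

Pooling these sizes over a day gives the distribution whose scale-free shape is discussed above; the exponent could then be estimated with a power-law fitting method.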

Early traffic modeling approaches concentrated mainly on traffic flows on highways, ranging from macroscopic scales (Lighthill and Whitham, 1955; Richards, 1956; Herman et al., 1972) to microscopic scales (Newell, 1961; Nagel and Schreckenberg, 1992). Compared with one-dimensional highway roads, two-dimensional traffic network systems are dramatically more complex, so road-level modeling may be insufficient to describe the complex behaviors of urban traffic networks. To overcome this challenge, Geroliminis and Daganzo (2008) proposed the macroscopic fundamental diagram (MFD) concept, which characterizes traffic flow at the network scale.
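One MFD point is commonly obtained as length-weighted averages of link flow and density over a time interval. A minimal sketch, assuming each link supplies its length together with flow and density estimates (e.g., from loop detectors); the function name and units are illustrative:

```python
def mfd_point(links):
    """Network-wide (density, flow) for one time interval.

    links: list of (length_km, flow_veh_per_h, density_veh_per_km).
    Both averages are weighted by link length.
    """
    total_length = sum(length for length, _, _ in links)
    flow = sum(length * q for length, q, _ in links) / total_length
    density = sum(length * k for length, _, k in links) / total_length
    return density, flow


# Two links: 1 km at 100 veh/h and 10 veh/km, 3 km at 200 veh/h and 30 veh/km.
print(mfd_point([(1, 100, 10), (3, 200, 30)]))
```

Repeating this for every interval of a day and plotting flow against density traces the network's MFD.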

Considering the network properties of traffic systems, we propose four network modeling types (Fig. 3), which are discussed in the following sections.

The interconnected roads of a transportation system naturally form a network, which makes it possible to model the topological features of the road network explicitly. Generally, intersections are regarded as "nodes" and road segments as "links". Because roads are typically designed with a certain direction, a road system can be described as a directed network. Meanwhile, as urban roads are typically classified into different ranks or service levels (indicated by the number of lanes or the design speed), different links in the network model may have different weights (Wu et al., 2006). Another topological modeling approach is the dual graph representation (Añez et al., 1996), in which road segments are mapped into nodes and intersections are mapped into links. A road network is a typical spatial network (Barthélemy, 2011; Crucitti et al., 2006), as it is embedded mainly in a two-dimensional space. If the nodes in a spatial network connect only to their nearest neighbors, then the dimension of the network trivially equals that of the embedding space. By contrast, long-range connections, such as city highways that bridge distant areas of the road network, are likely to alter the network dimension (Li et al., 2011). The underlying structure plays a significant role in urban transportation: network science research shows that the reliability of a network is strongly influenced by its topological characteristics. For example, compared with random networks (Erdős and Rényi, 1959), a scale-free network is robust to random failures but vulnerable to malicious attacks (Albert et al., 2000; Cohen et al., 2000; 2001). Therefore, the topological modeling of urban transport networks may offer a basic understanding of, and suggestions for, urban transportation planning.
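The two topological representations can be sketched as follows. The sketch assumes road segments are given as directed (from_intersection, to_intersection, lanes) records and uses the lane count as a link weight. In the directed dual graph, a segment links to every segment that starts where it ends, so each dual link corresponds to a turning movement at an intersection; excluding U-turns is an assumption. All names are hypothetical.

```python
from collections import defaultdict


def primal_graph(segments):
    """Intersections as nodes, directed segments as links weighted by lanes."""
    graph = defaultdict(dict)
    for u, v, lanes in segments:
        graph[u][v] = lanes  # lane count used as the link weight (assumption)
    return dict(graph)


def dual_graph(segments):
    """Segments as nodes; segment (u, v) links to each segment leaving v."""
    starts = defaultdict(list)
    for u, v, _ in segments:
        starts[u].append((u, v))
    dual = {}
    for u, v, _ in segments:
        # Skip the reverse segment (v, u), i.e. exclude U-turns (assumption).
        dual[(u, v)] = [seg for seg in starts[v] if seg[1] != u]
    return dual


segments = [("A", "B", 2), ("B", "C", 1), ("C", "A", 3)]
print(primal_graph(segments))
print(dual_graph(segments))
```

Either graph can then be passed to standard network analyses (degree, centrality, robustness under node removal).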

As traffic flow is loaded onto and restricted by the underlying topological structure, the "shape" of the traffic flow network is typically the same as that of the road network. However, the traffic flow network should reflect the operational features of the traffic system, which can be represented by traffic operational data, such as the velocity or flow volume of each road. From the network perspective, attention is paid to how traffic flow organizes itself as a network. How local traffic flows are organized collectively into a global city-scale flow can be described by percolation-based modeling methods, known as "traffic percolation". Through percolation analysis, the relationship between the global traffic state (the order parameter) and the congestion rate (or speed threshold, the control parameter) can be uncovered. In particular, the critical percolation threshold, at which traffic health status collapses, can be used as an early warning signal of the health-to-unhealth phase transition. Whereas previous traffic reliability analyses were implemented mainly from the user perspective, percolation-based analyses attempt to evaluate and improve functional connectivity from the network perspective.

The above two modeling approaches, topological network modeling and flow network modeling, are both based on physical connections between neighboring entities (i.e., static roads or dynamical flows) in the traffic system. However, dependency between distant sites in the traffic network may also exist owing to spreading dynamics. We propose a new approach to capture the correlation network in city traffic based on real-time data. Specifically, for each pair of road sections, we define the normalized maximal cross-correlation of their velocity time series over a day as their link strength (link weight). We then treat these correlation relations as a weighted network and analyze its statistical properties. Two major measures can be used to identify the most influential locations: the weighted degree and the impact distance. The larger the weighted degree or impact distance of a road, the stronger the influence it may exert in the correlation network. The correlation network modeling approach provides new insights into interactions between different local traffic flows, which may facilitate strategies for predicting and controlling traffic flows.
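A sketch of the correlation link weight and the weighted degree follows. The source specifies only a "normalized maximal cross-correlation"; here we assume, as one normalization used in correlation-network studies, W = (max C(τ) - mean C(τ)) / std C(τ) over lags |τ| ≤ `max_lag`, where C(τ) is the Pearson correlation between x(t) and y(t + τ). The impact distance is not implemented, and all names are hypothetical.

```python
import math
import statistics


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = math.sqrt(sum((u - ma) ** 2 for u in a))
    sb = math.sqrt(sum((v - mb) ** 2 for v in b))
    return cov / (sa * sb)


def lagged_corr(x, y, lag):
    """Correlation between x[t] and y[t + lag] over the overlapping window."""
    if lag >= 0:
        return pearson(x[:len(x) - lag], y[lag:])
    return pearson(x[-lag:], y[:len(y) + lag])


def link_weight(x, y, max_lag=10):
    """Return (normalized peak cross-correlation, lag of the peak)."""
    lags = list(range(-max_lag, max_lag + 1))
    corrs = [lagged_corr(x, y, lag) for lag in lags]
    peak = max(range(len(lags)), key=corrs.__getitem__)
    weight = (corrs[peak] - statistics.mean(corrs)) / statistics.pstdev(corrs)
    return weight, lags[peak]


def weighted_degree(links):
    """links: {(road_a, road_b): weight}; sum of link weights per road."""
    degree = {}
    for (a, b), w in links.items():
        degree[a] = degree.get(a, 0.0) + w
        degree[b] = degree.get(b, 0.0) + w
    return degree


# Road y repeats road x's speed pattern two intervals later (toy data).
x = [math.sin(0.7 * t) + 0.3 * ((7 * t) % 5) for t in range(60)]
y = [0.0, 0.0] + x[:-2]
print(link_weight(x, y, max_lag=5))
```

Ranking roads by `weighted_degree` over all pairwise weights is then one way to shortlist candidate influential roads.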

Fig. 3 Illustrations of traffic network modeling methods. (a) Topological network modeling (gray lines represent common roads, and bold green lines represent urban highways). (b) Flow network modeling (colors represent the functional levels of road segments, with red, yellow, and green lines representing low, medium, and high levels, respectively). (c) Correlation network modeling (the two road segments indicated by an arrow are highly correlated owing to the high cross-correlation value calculated from their sequential velocity data). (d) Failure network modeling (spatio-temporal propagation of cascading overload failures in spatially embedded networks, with initial attack nodes in red, nodes overloaded in the current step in deep blue, nodes failed in previous steps in black, and functional nodes that did not fail in cyan) (Zhao et al., 2016).

The spreading dynamics of cascading failures is an attractive topic. The epidemic-spreading model is a basic spreading model that can be used to analyze how a disease spreads among individuals during the dynamical contagion process. In contrast to the direct contact through which a disease spreads, overload failures can propagate through hidden functional dependencies. For urban traffic systems, we apply the cascading failure model to construct the failure network and analyze the characteristics of overload failures. Through analytical and numerical analyses, Zhao et al. (2016) found that failures spread radially from the center of the initial failure with an approximately constant velocity. Moreover, we apply different dynamics to the cascading process to test system robustness against different perturbation types, and find that the percolation transition changes from second-order to first-order when different dynamical models are introduced (Duan et al., 2019). This outcome may suggest a universal behavior of cascading failures in networks and may give practical hints on how to design and construct robust networked systems.
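The failure-network idea can be illustrated with a small overload cascade on a square lattice. This sketch assumes a Motter-Lai-style rule, taken here as representative rather than as the exact model of Zhao et al. (2016): a node's load is its shortest-path betweenness (computed with Brandes' algorithm), its capacity is (1 + α) times its initial load with tolerance α, the central node is removed, and at each step every node whose recomputed load exceeds its capacity fails. All names, the lattice size, and the choice of betweenness as load are assumptions.

```python
import math
from collections import deque


def lattice(n):
    """Adjacency of an n x n square lattice (a toy spatially embedded network)."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {(i, j): [(i + di, j + dj) for di, dj in steps
                     if 0 <= i + di < n and 0 <= j + dj < n]
            for i in range(n) for j in range(n)}


def betweenness(adj):
    """Brandes' algorithm, unweighted graph; each unordered pair counted twice."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, pred = [], {v: [] for v in adj}
        sigma, dist = dict.fromkeys(adj, 0), dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # BFS counting shortest paths from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # accumulate pair dependencies
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


def cascade(n, alpha):
    """Remove the central node, then fail overloaded nodes step by step.

    Returns {node: step at which it failed}; step 0 is the initial attack.
    """
    adj = lattice(n)
    load0 = betweenness(adj)
    capacity = {v: (1 + alpha) * load0[v] for v in adj}
    center = (n // 2, n // 2)
    failed, alive, step = {center: 0}, set(adj) - {center}, 0
    while True:
        step += 1
        sub = {v: [w for w in adj[v] if w in alive] for v in alive}
        load = betweenness(sub)
        overloaded = [v for v in alive if load[v] > capacity[v]]
        if not overloaded:
            return failed
        for v in overloaded:
            failed[v] = step
            alive.discard(v)


def front_radius(failed):
    """Largest Euclidean distance from the initial failure reached at each step."""
    origin = next(v for v, s in failed.items() if s == 0)
    radius = {}
    for v, s in failed.items():
        radius[s] = max(radius.get(s, 0.0), math.dist(v, origin))
    return radius


print(front_radius(cascade(9, alpha=0.2)))
```

Plotting the radius returned by `front_radius` against the step number is one way to check whether the failure front advances at a roughly constant speed, and repeating the run for several α values shows how tolerance slows the spread.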

The core of traffic health evaluation is to study how a system deviates from its healthy state. We therefore propose the following steps for traffic health evaluation: i) explore the system's health patterns or modes, which form the basis of the evaluation; ii) assess how far the current traffic state deviates from the health standard; and iii) analyze the reasons behind the deviation.

Existing traffic health measurements are based mainly on the performance statistics of individual roads, such as average velocity, average travel time delay, and average distance of congested roads. These statistics-based measurements rarely consider the coupling of performance or congestion between different elements in the traffic network. We therefore propose a graph-based traffic health measurement that uses network metrics from the four networks defined above.

After determining the health measurement, we analyze how traffic health changes when perturbations are introduced, using either real-time traffic data or simulations. The purpose of this step is to discover whether certain traffic health patterns exist. Once a pattern is observed, it is important to check whether a physical or mathematical model can describe and analyze it. For example, using the size of the largest traffic cluster consisting of high-velocity roads as the health measurement, we observe multiple metastable network states across different days (Zeng et al., 2020), which suggests a possible traffic health pattern. Specifically, by analyzing the relationship between the largest cluster size G and the network congestion rate f, we find tipping points separating three regimes: a global functional regime, a metastable hysteresis-like regime, and a global collapsed regime. These regimes can be expressed by G-f relationship functions, where the metastable relation with high network performance can be regarded as the traffic health standard.

Based on the traffic health standards derived from certain traffic patterns, traffic health assessment aims to estimate whether the current traffic state is healthy and how far it is from the healthy state. The assessment approaches can be classified as either model-based or data-driven.

For the model-based approaches, we define the standard model by certain traffic health patterns, analyze the real-time traffic situation, and compare it with the standard model to assess the current traffic status. For example, percolation analysis yields a numerical G-f curve that indicates the current traffic health state. If this curve is close to the standard curve, it can be concluded that the traffic system is functioning properly; otherwise, the current global traffic is regarded as unhealthy.
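The model-based comparison can be sketched as a root-mean-square gap between the current and standard G-f curves on a shared grid of congestion rates. The distance measure, the tolerance `tol`, and the use of piecewise-linear interpolation (which assumes each curve is single-valued in f, so a hysteresis loop would need to be split into branches first) are all assumptions; the names are hypothetical.

```python
import bisect
import math


def interp(xs, ys, x):
    """Piecewise-linear interpolation; xs ascending, xs[0] <= x <= xs[-1]."""
    i = bisect.bisect_right(xs, x) - 1
    if i >= len(xs) - 1:
        return ys[-1]
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


def curve_distance(current, standard, grid):
    """RMS difference in G between two G-f curves given as (f, G) points."""
    cf, cg = zip(*current)
    sf, sg = zip(*standard)
    diffs = [interp(cf, cg, f) - interp(sf, sg, f) for f in grid]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def is_healthy(current, standard, grid, tol=0.05):
    """Assumed rule: healthy if the current curve stays within tol of the standard."""
    return curve_distance(current, standard, grid) <= tol


standard = [(0.0, 1.0), (1.0, 0.0)]          # toy standard G-f relation
today = [(0.0, 0.9), (0.5, 0.4), (1.0, -0.1)]  # toy current curve
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
print(curve_distance(today, standard, grid), is_healthy(today, standard, grid))
```

The tolerance would in practice be calibrated, for example from day-to-day variation of curves on days judged healthy.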

For the data-driven approaches, we classify traffic network states into categories using big data and machine learning technologies. The simplest approach classifies global traffic as either "healthy" or "unhealthy". Two types of machine learning methods are typically used to learn such a classification: supervised learning and unsupervised learning. They differ in whether labels (i.e., healthy or unhealthy) are available for the network states during the learning process.
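To make the supervised/unsupervised distinction concrete, the sketch below uses two deliberately simple stand-ins: a nearest-centroid classifier trained on labeled snapshots and a two-cluster k-means for unlabeled ones. The feature vector (here assumed to be the pair (G, f) for each network snapshot) and all names are assumptions; in practice a library classifier and richer graph-based features would likely be used.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def fit_nearest_centroid(features, labels):
    """Supervised: one centroid per label (e.g. 'healthy', 'unhealthy')."""
    groups = {}
    for x, y in zip(features, labels):
        groups.setdefault(y, []).append(x)
    return {y: centroid(pts) for y, pts in groups.items()}


def predict(model, x):
    """Assign the label whose centroid is closest to feature vector x."""
    return min(model, key=lambda y: math.dist(model[y], x))


def two_means(features, iters=50):
    """Unsupervised: split unlabeled snapshots into two groups (k-means, k = 2)."""
    first = features[0]
    # Farthest-point initialization keeps the toy example deterministic.
    centers = [first, max(features, key=lambda p: math.dist(p, first))]
    labels = []
    for _ in range(iters):
        labels = [min((0, 1), key=lambda k: math.dist(centers[k], p)) for p in features]
        new = [centroid([p for p, lab in zip(features, labels) if lab == k])
               if k in labels else centers[k] for k in (0, 1)]
        if new == centers:
            break
        centers = new
    return labels


snaps = [(0.9, 0.1), (0.85, 0.15), (0.2, 0.6), (0.1, 0.7)]  # toy (G, f) pairs
model = fit_nearest_centroid(snaps, ["healthy", "healthy", "unhealthy", "unhealthy"])
print(predict(model, (0.8, 0.2)), two_means(snaps))
```

Note that k-means only returns group indices; deciding which group corresponds to "healthy" still requires interpretation, which is the practical cost of the unsupervised route.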

Once the current traffic is assessed as unhealthy, health reasoning is conducted. Traffic health reasoning explains the evidence supporting the assessment results and offers targets for future transport planning or management. As the causes of unhealthy global traffic can be attributed to road-level abnormalities, traffic health reasoning aims to isolate or locate specific roads in the traffic network. To identify influential roads whose frequent congestion may lead to large-scale traffic jams and a significant loss of overall traffic health, we can apply causal inference to analyze the causality between the overall health state and the congestion frequency of specific roads. Numerous existing tools can address this problem, including Granger causality methods (Granger, 1969), Bayesian score-based approaches (Chickering, 2002), and structural causal models (Peters et al., 2017).
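As an example of the first family of tools, a lag-1 Granger causality F statistic can be computed with ordinary least squares. This is a standard-library sketch under assumptions: `cause` might be the congestion indicator series of one road and `effect` a global health series such as G over time; the p-value (which requires the F distribution) is not computed, and the names are hypothetical.

```python
import random


def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rss(rows, target):
    """Residual sum of squares of an OLS fit via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, target))


def granger_f(cause, effect, lag=1):
    """F statistic for H0: lags of `cause` add nothing to predicting `effect`."""
    n = len(effect)
    target = effect[lag:]
    restricted = [[1.0] + [effect[t - i] for i in range(1, lag + 1)]
                  for t in range(lag, n)]
    full = [row + [cause[t - i] for i in range(1, lag + 1)]
            for row, t in zip(restricted, range(lag, n))]
    rss_r, rss_u = rss(restricted, target), rss(full, target)
    dof = len(target) - (2 * lag + 1)
    return ((rss_r - rss_u) / lag) / (rss_u / dof)


# Synthetic check: y follows x with a one-step delay, so x -> y should dominate.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(300)]
y = [0.0] + [0.8 * x[t - 1] + rng.gauss(0, 0.1) for t in range(1, 300)]
print(granger_f(x, y), granger_f(y, x))
```

Granger causality captures only predictive precedence; confounding (for example, a citywide demand surge driving both series) would need the structural approaches mentioned above.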

Whereas traffic health evaluation takes the health perspective, traffic health diagnosis takes the failure perspective. Traffic health evaluation assesses, from a systematic view, whether the traffic system is healthy or unhealthy; traffic health diagnosis instead focuses on understanding failures at the local level. The causes of urban traffic congestion are complex, including an unreasonable road network structure, poorly coordinated traffic organization, and the untimely handling of emergencies caused by accidental factors. Accurate diagnosis of traffic congestion can help us fundamentally understand different management elements and form a systematic health management plan. However, traffic has complex functional couplings at different temporal and spatial levels. Traditional fault diagnosis relies mainly on manual experience or mathematical models and struggles to handle unknown fault associations and causality. Mining massive traffic data, combining it with machine learning methods to study the semantic rules of traffic congestion, and proposing theories of urban traffic health diagnosis are key steps for improving traffic health management.

While health reasoning identifies the causality between the global health state and regional congestion, health diagnosis identifies the causality between regional congestion and different management factors. The causes of congestion include structural reasons and dynamical reasons, among others (Cao et al., 2021). Structural reasons stem from an inappropriate topology of the current underlying road network, which can be checked by analyzing network topological features such as betweenness and closeness; such problems can be solved by traffic planning or traffic design. Dynamical reasons stem from an imbalance between traffic resource supply and demand (Zheng et al., 2021), which is typically caused by a short-term surge in traffic flow; with proper traffic guidance or control strategies, such problems can be alleviated.
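For the structural check, closeness centrality can be computed with one breadth-first search per node. A minimal sketch for an unweighted road graph, assuming the common (reachable nodes - 1) / (sum of distances) form, which ignores unreachable nodes; ranking intersections by this score (or by betweenness) is one hedged way to shortlist candidates for structural diagnosis. Names are hypothetical.

```python
from collections import deque


def closeness(adj):
    """Closeness centrality of every node in an unweighted graph.

    adj: {node: iterable of neighbors}. Uses (reachable - 1) / sum(distances),
    so only nodes reachable from each source are counted.
    """
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[s] = (len(dist) - 1) / total if total else 0.0
    return result


# Toy road graph: a path of three intersections 0 - 1 - 2.
adj = {0: [1], 1: [0, 2], 2: [1]}
scores = closeness(adj)
print(sorted(scores, key=scores.get, reverse=True))
```

On weighted road networks, the breadth-first search would be replaced by Dijkstra's algorithm over travel times or lengths.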

In previous studies, the goal of traffic management was either to improve efficiency or to enhance reliability. Although most existing traffic improvement methods consider efficiency-oriented and reliability-oriented goals separately, traffic health should be a tradeoff between these goals. Nevertheless, balancing different management goals is challenging owing to the complexity of traffic systems.

The key to balancing different management factors lies in understanding the core failure mechanism of the traffic system. Based on our proposed traffic health index system, we consider health-oriented traffic improvement approaches that take the different management elements into account comprehensively. Specifically, health-oriented improvement can be conducted from structural and operational aspects.

We consider specific improvements from the perspective of structural health to support long-term traffic management. On the one hand, to improve traffic potential, we could focus on the structural optimization of critical roads and paths. The network design problem (NDP) (Yang and Bell, 1998) is a challenging research direction in traffic planning; modeling and solving the NDP can optimize link improvements and link additions in an existing road network. Traffic health can be improved continuously over the long term by taking corresponding measures for critical roads and paths, such as road expansion and upgrading. On the other hand, the traffic structure can be optimized for health based on the structural reasons identified by traffic diagnosis. Depending on the congestion causes revealed by health diagnosis, we can adopt flexible improvement strategies, such as bus stop setting (Ibeas et al., 2010) or transportation planning.

We consider specific improvements from the perspective of operational health to support short-term traffic management. On the one hand, to improve traffic vitality, we could identify the key factors affecting traffic metastability and mode switching, such as high-speed urban highways, and maintain high vitality through traffic management measures such as traffic control (Papageorgiou et al., 2003), traffic guidance (Liang and Wakahara, 2014), traffic scheduling (Zhou et al., 2020a), or traffic pricing (Yang and Huang, 2005; Yang and Wang, 2011). On the other hand, we could optimize operational health based on the dynamical reasons identified by traffic diagnosis. In case of emergencies or weather disasters during traffic operation, attention should be paid to accelerating the recovery of failed road sections in the network, so as to drive the traffic state away from the tipping point.

Braess's paradox (Braess et al., 2005) shows that increasing the capacity of certain roads (including constructing new roads) can slow down overall traffic flow (i.e., increase overall travel time) in the network, owing to the self-interested route choices of travelers. A comprehensive management framework may therefore be necessary to balance different management targets. System health, as an index system for measuring a system's ability to adapt to uncertain environments, may offer an alternative strategy. System health management (SHM) is employed to increase system availability and resilience throughout the system's operating lifespan.
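The textbook Braess example makes the paradox concrete. The numbers below are illustrative assumptions, not data from this article: 4000 drivers travel from S to E either via A (link S-A costs x/100 minutes for x vehicles, then A-E costs 45 minutes) or via B (S-B costs 45 minutes, then B-E costs x/100 minutes), and a new link A-B costs nothing. The sketch searches flows in steps of 100 drivers for a user equilibrium, in which no used route is slower than any available route; the function names are hypothetical.

```python
def route_costs(a, b, c, with_bypass):
    """Travel time (minutes) of each route for flows a (S-A-E), b (S-B-E), c (S-A-B-E)."""
    x_sa, x_be = a + c, b + c  # vehicles on the two congestible links
    costs = {"top": x_sa / 100 + 45, "bottom": 45 + x_be / 100}
    if with_bypass:
        costs["zig"] = x_sa / 100 + 0 + x_be / 100
    return costs


def equilibrium(total=4000, step=100, with_bypass=False):
    """Brute-force search for a Wardrop user equilibrium on a coarse flow grid."""
    for a in range(0, total + 1, step):
        for b in range(0, total - a + 1, step):
            c = total - a - b
            if c and not with_bypass:
                continue
            flows = {"top": a, "bottom": b, "zig": c}
            costs = route_costs(a, b, c, with_bypass)
            used = [r for r in costs if flows[r] > 0]
            # Equilibrium: every used route is as fast as the fastest route.
            if all(costs[r] <= min(costs.values()) + 1e-9 for r in used):
                return flows, max(costs[r] for r in used)
    return None


print(equilibrium(with_bypass=False))
print(equilibrium(with_bypass=True))
```

Without the bypass, the equilibrium splits drivers 2000/2000 at 65 minutes each; with the zero-cost bypass, all 4000 drivers take S-A-B-E at 80 minutes, so the added capacity makes every traveler slower.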

Driven by the need to develop a new traffic health concept, we propose a traffic health index system including traffic reliability, traffic invulnerability, traffic resilience, traffic potential, and traffic vitality. Traffic reliability measures a traffic system's ability to meet certain travel requirements under specified conditions and within a specified time, describing how the system functions in daily scenarios (i.e., before extreme congestion occurs). Traffic invulnerability describes the performance of traffic systems in the face of extreme risks, revealing the consequences of extreme congestion. Traffic resilience focuses on the recovery process under disruptions, indicating how extreme congestion dissipates. Traffic potential measures a traffic system's ability to maintain its basic functionality under a given fraction of congested or destroyed roads, including the identification of critical locations, critical paths, and critical conditions; its analysis can suggest ways to increase the maximal adaptive capacity of traffic, which remains undeveloped or unreleased in current traffic systems. Finally, traffic vitality characterizes the possible operational states of a traffic system under uncertainty, including metastability and mode switching. Based on this index system, we propose a novel traffic health management framework involving four steps: traffic network modeling, traffic health evaluation, traffic health diagnosis, and traffic health improvement. These steps are based on the core failure mode of the traffic system, that is, the spatio-temporal propagation of congestion from emergence through evolution to dissipation. For modeling, we consider topological network modeling, flow network modeling, correlation network modeling, and failure network modeling. For evaluation, we follow the logic of health patterns, health assessment, and health reasoning. For diagnosis, we infer failures from structural and dynamical aspects; and for improvement, we consider health-oriented traffic management strategies from two aspects, namely, structural and operational improvement.

Given the complexity of traffic health management, a systematic approach focusing on the global traffic status is of considerable significance. Drawing on statistical physics, mathematics, and network science, percolation theory offers a useful tool for analyzing how local traffic flows merge into global traffic. Percolation-based analysis determines the relationship between the global traffic state (the order parameter) and the congestion rate (or speed threshold, the control parameter). Whereas previous traffic reliability analyses were implemented mainly from the user perspective, percolation-based analyses attempt to evaluate and improve functional connectivity from the network perspective. Moreover, unlike most existing statistics-based reliability indices, the percolation approach offers graph-based results that incorporate inherent interactions in traffic dynamics.

Unlike efficiency-oriented traffic management, the new framework provides a health-oriented traffic management target. Current prognostics and health management (PHM) methods are generally applied to equipment-level or simple system-level entities, whereas the health management of large-scale complex systems, such as traffic systems in megacities, which are typically complex human-vehicle-road-cloud systems, has rarely been studied. A new concept of complex system health management is therefore of great significance, and its key lies in understanding failure emergence in complex systems. However, given the large number of elements in a complex system and their complicated interactions, classical statistics-based or machine learning methods have difficulty locating possible failure factors in the enormous state space of the system, which leads to the "curse of dimensionality". The percolation-based approach offers another path by analyzing system states from a phase-transition perspective. Based on percolation theory, we "collapse" the system states into an "order (health)" or "disorder (unhealth)" phase according to global functional connectivity, regardless of detailed interactions or dependencies at the element scale. We then focus on the critical state at which the system transitions from the healthy phase to the unhealthy phase. The condition of this health-unhealth phase transition can be defined as an early warning signal of failure emergence, supporting management decision making as a form of "preventive treatment of the disease". With this new management concept, our proposed indices and framework may shed light on the future intelligent management paradigm of complex transportation systems and other critical infrastructures.

Error and attack tolerance of complex networks

Dual graph representation of transport networks

Community resilience evaluation including interdependencies

Road network reliability caused by daily fluctuation of traffic flow

Emergence of scaling in random networks

Toward a theory of resilience for international development applications

Spatial networks

The impact of urban freight transport: A definition of sustainability from an actor's perspective

An introduction to road vulnerability: What has been done, is done and should be done

Preparing for critical infrastructure breakdowns: The limits of crisis management and the need for resilience

A systematic PHM approach for anomaly resolution: A hybrid neural fuzzy system for model construction

On a paradox of traffic planning

A framework to quantitatively assess and enhance the seismic resilience of communities

A methodology for road traffic resilience analysis and review of related concepts

Quantification of the impact of traffic incidents on speed reduction: A causal inference based approach

Valuation of travel time reliability from a GPS-based experimental design

Measuring improvements in the disaster resilience of communities

Network-based accessibility measures for vulnerability analysis of degradable transportation networks

A capacity related reliability for transportation networks

"Health-index" evaluation method of metropolitan bus lines

The application of reliability in the road network performance evaluation

Learning equivalence classes of Bayesian-network structures

Resilience of the internet to random breakdowns

Breakdown of the internet under intentional attack

Assessing the demand vulnerability of equilibrium traffic networks via network aggregation

Transportation security and the role of resilience: A foundation for operational metrics

Centrality measures in spatial networks of urban streets

Optimization and reinforcement of regional highway network seismic based on connectivity reliability

A behavioral theory of multi-lane traffic flow, Part II: Merges and the onset of congestion

Universal behavior of cascading failures in interdependent networks

On random graphs I

Vulnerability analysis of critical infrastructure under disruptions: An application to China Railway High-speed

The biology of human resilience: Opportunities for enhancing resilience across the life span

A metric and frameworks for resilience analysis of engineered and infrastructure systems

Evaluation of resiliency of transportation networks after disasters

Resilience and efficiency in transportation networks

Universal resilience patterns in complex networks

Existence of urban-scale macroscopic fundamental diagrams: Some experimental findings

Investigating causal relations by econometric models and cross-spectral methods

Performance of transportation network under perturbations: Reliability, vulnerability, and resilience

Transportation network systems engineering

Assessment methodology of connect reliability based on substituted route

Identifying the most influential roads based on traffic correlation networks

Opinion: Measuring development resilience in the world's poorest countries

Generic metrics and quantitative approaches for system resilience as a function of time

Kinetic theory of vehicular traffic: Comparison with data

Resilience and stability of ecological systems

Resilience Engineering: Concepts and Precepts

Optimizing bus stop spacing in urban areas

Resilience and friability of transportation networks: Evaluation, analysis and optimization

Assessment method of expressway resilience considering different intrusion scenes

System Health Management: With Aerospace Applications

A review on the application of deep learning in system health management. Mechanical Systems and Signal Processing

A flow-based vulnerability measure for the resilience of urban road network

Modeling impacts of adverse weather conditions on a road network with uncertainties in demand and supply

A traffic flow simulator for network reliability assessment

Percolation transition in dynamical traffic network with evolving critical bottlenecks

Spatial correlation analysis of cascading failures: Congestions and blackouts

Dimension of spatially embedded networks

Real-time urban traffic amount prediction models for dynamic route guidance systems

On kinematic waves I: Flood movement in long rivers

Road network capacity reliability based on link travel time reliability

Network with degradable links: Capacity analysis and design

Optimal Prognostics and Health Management-driven inspection and maintenance strategies for industrial systems

Condition-based maintenance optimization by means of genetic algorithms and Monte Carlo simulation

Resilience in developing systems: The promise of integrated approaches

Mathematics for Reliability Analysis. Tokyo: Asakura-shoten

Thresholds and the resilience of Caribbean coral reefs

A cellular automaton model for freeway traffic

Nonlinear effects in the dynamics of car following

A three-stage resilience analysis framework for urban infrastructure systems

Review of road traffic control strategies

Current status of machine prognostics in condition-based maintenance: A review

Ecosystem resilience despite large-scale altered hydroclimatic conditions

Measuring urban traffic congestion: A review

A framework for defining and measuring resilience at the community scale: The PEOPLES resilience framework

Shock waves on the highway

Economic resilience to natural and man-made disasters: Multidisciplinary origins and contextual dimensions

Estimating network travel time reliability with network partitioning

Simulation and analysis of the carrying capacity for road networks using a grid-based approach

Reliability-based traffic network design with advanced traveler information systems

Investigating the effects of reserved lanes for autonomous vehicles on congestion and travel time reliability

Progress in invulnerability of complex networks

Application of accessibility based methods for vulnerability analysis of strategic road networks

Research on health status evaluation system of road traffic in old city area. Urban Roads Bridges & Flood Control

An improved model for dependent competing risks considering continuous degradation and random shocks

Research on invulnerability of urban transit network based on complex network

Congestion in different topologies of traffic networks

Travel time reliability in road network associated with road section

Urban transport network design based on link capacity reliability

Interconnectedness and resilience of the US economy

Models and algorithms for road network design: A review and some new developments

Mathematical and Economic Theory of Road Pricing

Vulnerability analysis of road networks

Resilience-driven system design of complex engineered systems

Multiple metastable network states in urban traffic

Switch between critical percolation modes in city traffic dynamics

Status and perspectives of Prognostics and Health Management technologies

Scale-free resilience of real traffic jams

Day-to-day road network vulnerability identification based on network efficiency

Spatio-temporal propagation of cascading overload failures in spatially embedded networks

Fractal characteristics of mountain cities' traffic flow with sub-health state

Extreme unbalanced mobility network in bike sharing system

A robust train timetable optimization approach for reducing the number of waiting passengers in metro systems

Urban transportation decision-making evaluation index system based on healthy development concept. Urban Transport of China

Reliability engineering: Old problems and new challenges

Challenges in the vulnerability and risk analysis of critical infrastructures

Representing perceived tradeoffs in defining disaster resilience