Reliability, uncertainty and the management of error: New perspectives in the COVID-19 era
Paul R. Schulman
2021. DOI: 10.1111/1468-5973.12356

This essay argues for the importance of error as an organizing concept in the management of hazardous technical systems to high levels of reliability and safety. The concept of "error" has been essential to the development of high reliability organizations (HROs). As practiced in HROs, error management has also been an important strategy for the management of uncertainty. "Uncertainty" has been conceived by some analysts as a condition that can convey little or no reliable information about its own boundary conditions or its specific threat to the operation of complex systems. The argument here is that uncertainty is differentiated and specified in HROs and provides important information in relation to error. Uncertainty does not, in the special context of HROs, end the possibility and practice of reliable management. In fact, error in HROs can be a starting point for the further analysis of ways in which uncertainty itself can be managed reliably. The argument offered here does not mean, however, that uncertainty poses no challenge to reliability in other settings. The COVID-19 pandemic is offered as an example of how uncertainties may invalidate even the application of "reliability" as a performance standard in certain domains of management and policy.

For better or worse, error has been a prominent feature in nearly all realms of human performance. So has the idea of reducing error. Even the structure of the United States government, with its separated and federated powers, was designed and has functioned as a hedge against error by government in the exercise of its power (Landau, 1969). At the same time, error has positive aspects as well. Obviously, "trial and error" has been a major source of learning, under which improvement can actually depend on error. 1 Even the evolutionary adaptation of species depends on mutations that confer selective advantage, and these mutations themselves occur because of random errors in DNA replication. 2 Error is also an important element in the development of science as a process of cumulative knowledge building. As organizational theorist Martin Landau asserted: "The invention of the hypothesis has been one of the great liberating forces of the modern world. With it, sin was transformed into error" (Landau, 1972). The belief or non-belief in an argument then depended on its testing against possible falsification, not the identity or moral worth of its proposer. Importantly, the concept of error brings its own discipline to both organized thinking and practice. Physicist Wolfgang Pauli once dismissed an "argument" that did not live up to this discipline: "it's not even wrong!" He meant by this that it did not offer an argument formal enough to be "decidable" by agreed-upon processes of validation or falsification. More on this will follow. While we may strive to be "nearly error-free" in the management of highly hazardous technical systems, error is still a possibility, if not an ongoing fact of life, in the organizations that operate hazardous systems. In fact, addressing the possibility of error has been a major organizing principle in the management effort to achieve high reliability in the operation of some of these systems.
Much of the analysis to be offered here is based on empirical examples from HRO research and its description of how these organizations operate. 3 The conceptual and analytical foundations underlying human "error" have been well established in psychological and human factors research (Norman, 2002; Reason, 1990; Salvendy, 1997), and in research on the role of error in organizational reliability and accidents (Dekker, 2014; Ialenti, 2021; Perrow, 1983; Rasmussen, 1982, 1990; Reason & Mycielska, 1982; Turner, 1978). A variety of specific forms of human error have been identified, and James Reason has offered a useful typology of error based on a performance-level framework offered by Rasmussen and Jensen (1974) that differentiated between skill-, rule- and knowledge-based performance errors. Reason described specific types of error in each category: unintended slips and lapses are skill-based errors; mistakes are errors of intent based either on misjudgment in applying a rule or on errors in a rule or procedure itself; and knowledge-based errors in planning or decision-making are errors of miscalculation founded on ignorance or misunderstanding (Reason, 1990). These types of error are each of importance to HROs in their efforts to ward off events that simply "must not happen" in their operation of hazardous technical systems. In nuclear power plants, commercial aviation (including air traffic control systems), as well as other critical infrastructures, a distinctive form of error management has been a common framework for the pursuit of high reliability (Roe & Schulman, 2016). It is oriented in particular to knowledge-based performance failures. For high reliability organizations, the inverse of knowledge is not ignorance or uncertainty; it is error. They identify and categorize uncertainty in relation to specific errors in decisions and actions they seek to avoid in order to preclude failures and accidents that can lead to catastrophic consequences and are also surrounded by societal dread. 4 When it comes to those critical, complex, large-scale infrastructure systems societies increasingly depend upon, long-duration loss of service can create major social disruption. But beyond service loss, many infrastructures manage hazardous technical systems which can themselves fail in ways that produce catastrophic effects, such as a loss of radiation containment at a nuclear power plant, sudden large-scale and extended blackouts in a high voltage power grid, collisions of commercial jets under air traffic control, or contamination of large urban water supplies by dangerous bacteria. In general, the consequences of these events are ones that could involve multiple deaths from the same failure or accident, and deaths that include populations outside of the operators and employees of the managing organization (Perrow, 2004). These events are so dreaded by the public that they force, but also enable, infrastructure managers to focus on key omissions, errors and failures (and, increasingly, external attacks) that could lead to them. This focus takes priority over efficiency or even short-term service goals in both specific decisions and general organizational culture (LaPorte, 1996; Schulman, 2020). The social dread surrounding this set of catastrophic hazards is stable and prospective, and qualitatively different from the retrospective condemnation that surrounds many other accidents but then fades from public attention over time in issue attention and accident cycles. 5
This has meant that strong and stable financial, political and regulatory foundations are placed under these high-hazard organizations in support of their high reliability priorities and management. Because the dread of accidents is both an organizational threat and a resource, high reliability organizations have an ongoing concentration on error (Weick & Sutcliffe, 2015). Elaborate analytic modelling of failures and their link to potential errors is undertaken. In American nuclear power plants, for example, not only are "design basis" accidents, which a plant is designed and built to avoid or withstand, carefully modelled and probabilistically analysed, but the possibilities of "beyond-design-basis" accidents must also be considered, including errors in a design basis accident analysis (U.S. NRC, 2019). The HRO approach includes analysing both probable and possible chains of error or failure that can lead to unacceptable accidents, and then analysing precursor conditions that can lead to upstream errors that could propagate along causal chains to the ultimate failures or accidents. These precursor conditions include physical conditions (e.g. excessive operating temperatures and pressures, loss of backup equipment, loss of sensor and monitoring inputs) that exceed a bandwidth of acceptable operating conditions. But they also include organizational conditions, for example cognitive load on operators, excessive noise in control rooms, breakdowns in organizational communication and erosion of inter-departmental cooperation and trust (Schulman, 1993). It is understood that these precursors, if uncorrected, can degrade decision-making and control processes and induce error (Farjoun, 2005; Pettersen & Schulman, 2019; Roe & Schulman, 2008). Within this strategy, high reliability managers strive to keep operations out of precursor zones. In the case of nuclear power plants, control operators may shut down reactors if they believe operations have moved into precursor conditions, and commercial pilots may refuse to fly if they believe equipment or weather conditions are "hazardous". It is important to recognize that HROs are managing technical systems that are well understood on the basis of physical principles and long operational experience. These HROs operate with a set of control variables: actionable features that can be manipulated to alter the overall condition or state of their infrastructure. In electricity grids, for example, generation dispatching can order the increase or decrease of generation inputs; dispatchers can also manipulate transmission routes as well as voltage and frequency in order to stabilize a grid. Understanding their technical systems and the role of control variables allows HRO managers to have a clear idea of what constitutes "rules of rightness" with respect to the operation of their systems. This gives them an agreed-upon framework for the identification of error (Roe & Schulman, 2016). At this point, consider this precursor strategy in relation to processes of risk assessment and risk management undertaken by many organizations. Some risk assessments attempt to consider certain accidents in relation to skill-based errors (e.g. maintenance lapses or slips) or rule-based errors (e.g. procedural gaps or mistakes in following a specific procedure) as risk "drivers" and thus directly connected to the risk of accidents. These are then assessed in relation to risk controls put in place to reduce overall accident risk.
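As a concrete, if highly simplified, illustration of the precursor logic just described, and of how it differs from assessing risk controls, consider the following sketch. It is written in Python purely for exposition; the condition names, bands and numbers are hypothetical and are not drawn from any actual plant or grid, and the sketch is a schematic of the banding idea rather than an implementation of any organization's actual practice.

from dataclasses import dataclass

@dataclass
class Band:
    """Acceptable operating bandwidth for one monitored condition."""
    name: str
    low: float
    high: float

# Hypothetical physical and organizational precursor conditions and bands.
# The names and numbers are invented for illustration only.
BANDS = [
    Band("reactor_coolant_temp_C", low=270.0, high=315.0),
    Band("control_room_noise_dB", low=0.0, high=70.0),
    Band("operator_overtime_hours_week", low=0.0, high=16.0),
    Band("open_corrective_work_orders", low=0.0, high=25.0),
]

def precursor_check(readings: dict[str, float], bands: list[Band]) -> list[str]:
    """Return descriptions of conditions that have drifted outside their bands."""
    out_of_band = []
    for band in bands:
        value = readings.get(band.name)
        if value is None:
            # Losing a reading is itself treated as a precursor condition.
            out_of_band.append(f"{band.name}: no reading available")
        elif not (band.low <= value <= band.high):
            out_of_band.append(f"{band.name}: {value} outside [{band.low}, {band.high}]")
    return out_of_band

if __name__ == "__main__":
    readings = {
        "reactor_coolant_temp_C": 301.0,
        "control_room_noise_dB": 78.0,          # exceeds band: precursor
        "operator_overtime_hours_week": 22.0,   # exceeds band: precursor
        "open_corrective_work_orders": 12.0,
    }
    flags = precursor_check(readings, BANDS)
    if flags:
        print("Precursor zone entered; restore margins or consider shutdown:")
        for flag in flags:
            print(" -", flag)
    else:
        print("All monitored conditions within acceptable bandwidths.")

The point of the sketch is the structure rather than the numbers: precursor management treats drift in organizational conditions, and even the loss of a reading, as a reportable departure from a defined bandwidth calling for restored margins, not as one more input to a probability calculation.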
In one risk-analytic framework, for example, applied to hydrocarbon releases in offshore drilling operations, its authors note that, for a number of the upstream risk-influencing factors, their "main area of application is not the calculation of the release frequency itself, but use of the method to assess the effect of risk reducing measures and risk increasing changes during operations" (Aven et al., 2006). This could indeed be a method for analytic consideration of risk drivers as precursors. In many risk assessments, however, it is risk controls that are closely assessed in relation to the calculation of risk and not risk drivers. The drivers of risk are frequently described in general categories, such as "incorrect operation," and do not have the granularity that connects them to specific errors, which in turn connect to specific risks. In fact, a driver such as "incorrect operation" is itself likely to be a consequence of some upstream precursor consisting of organizational factors. But rarely, given the difficulties of quantifying these factors, do risk analyses consider specific organizational factors as precursors to drivers and thus connected to either likelihoods or consequences in accident risks (Danner & Schulman, 2019). In this respect, risk management as typically practiced is a different process from managing to precursors. Also, for HROs, given their commitment to prevent deterministically (not simply probabilistically) catastrophic accidents as precluded events, it is the possibility of these accidents, more than their calculated probability, that underlies the precursor strategy. A probability estimate is meant to suggest the frequency distribution of an event over a large number of trials or years. It does not speak to individual events that should never happen in any trial or in any year. Probability distributions have validity in relation to the expected incidence of similar events over a large run of cases. While two "hundred-year" events in back-to-back years might just be "bad luck" and therefore do not invalidate an overall probability estimate (as the brief calculation following this passage illustrates), managerial decision-making that embraced such a possibility could certainly not be a foundation for a precluded-event reliability standard and strategy that must apply to every case in any year. It is important to note at the same time that a precursor strategy for high reliability organizations can also include uncertainty surrounding their operations. And here as well the concept of error plays a special role. Many analysts have asserted that a "deep uncertainty" is now attached to modern technologies and their management because of their increased scale and complexity (Dekker et al., 2011; McDaniel & Driebe, 2005; Waldrop, 1993; Walker et al., 2013). Ironically, at the same time, the formal definition of uncertainty itself remains uncertain: the concept has been left seriously under-specified. 6 One insightful effort to clarify the idea of uncertainty has been offered by technology researcher Andrew Stirling, who distinguishes risk (where both likelihoods and consequences can be estimated for choices and actions) from ambiguity (where the likelihood of events can be estimated but the full range of consequences cannot), uncertainty (where consequences can be estimated but not likelihoods) and, finally, ignorance (where neither likelihoods nor consequences can be estimated) (Stirling, 2010).
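Returning to the "hundred-year event" point above, a short calculation makes the difference between a frequency-based estimate and a precluded-event standard concrete. Assume, purely for illustration, an event with an estimated annual probability of p = 0.01 and independence across years; then

\[
P(\text{at least one occurrence in } n \text{ years}) = 1 - (1 - p)^{n}, \qquad 1 - (0.99)^{100} \approx 0.63,
\]
\[
P(\text{occurrences in two specified back-to-back years}) = p^{2} = (0.01)^{2} = 10^{-4}.
\]

Both results are consistent with the original estimate, which is precisely the point: a probability statement describes expected frequencies over many years or trials, whereas a precluded-event standard must hold in every year and every trial, so it is the possibility of the event, not its calculated frequency, that the precursor strategy addresses.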
It is often assumed that all "deep" or epistemic uncertainty is really ignorance, the realm of the famous "unknown unknowns," and that this state can convey little or no reliable information about its own boundary conditions or specific threats to the operation of complex systems. As economist Frank Knight put it, "We can never be certain about uncertainty" (Knight, 2002). For this reason, "epistemic uncertainty" has been assumed by some to constitute an inestimable danger to the reliability and safety of complex, hazardous technical systems, not least of which are many of society's critical infrastructures for water, energy, telecommunications, transportation and financial services (Beck, 1992; Downer, 2011). It has even been asserted that "some failures are inherently unpredictable and therefore unavoidable" (Downer, 2020, p. 80). But as HROs demonstrate, epistemic uncertainty need not be synonymous with unknown unknowns. A great deal of uncertainty can in fact be categorized, described and even bounded by ranges of confidence. Within a high reliability management framework, uncertainty can be identified, categorized and bounded in its relation to possible impacts on error in decisions and operations that could cause those events and failures which have been identified as ones that must not happen. As a result of these categorizations, specific uncertainties could in themselves constitute conditions identifiable as potential precursors to these events and failures. In more than one case, a pilot has refused to fly under uncertain weather conditions or uncertain airworthiness of an aircraft. Control operators have also refused to operate, and have shut down reactors, in the face of uncertainty about their condition or under operating conditions that have not been experienced before. In this respect, losing information about predictability is itself information that can lead to system shutdowns. It is this framework of error management through precursors that can bound epistemic uncertainty. For high reliability managers confronted with uncertainties, the issue is not epistemic uncertainty or even ignorance per se, but preventing or otherwise managing operational error in both real time and at all times. "Error" becomes an exacting context for both learning and managing in the face of uncertainty when operating under mandates of ensuring the safe provision of a critical service. It requires the ability to identify decisions and actions that would be or have been "wrong" within the context of agreed-upon rules of "rightness" for effective safety management. In other words, just as scientific research rests on the testing of falsifiable propositions, high reliability error management rests on actions or conditions that are also "decidable" as correct or incorrect in relation to safety or risk. And conditions that are undecidable with respect to error are likely to be avoided as a form of risk in themselves. A deliberate bias in high reliability error management strategy is to accept possible errors of excess caution (even in shutting down a system) in order to suppress those of hubris and over-confidence. In this sense, the fear of unknown unknowns does not lie at the centre of high reliability in the management of infrastructures.
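The kind of categorization described here can be made explicit in a small schematic. The following sketch, illustrative only and with hypothetical condition names, classifies a condition into Stirling's four categories from the preceding passage according to whether its likelihood and its consequences can currently be estimated, and then maps each category onto the progressively more conservative treatment described above.

def classify_uncertainty(likelihood_estimable: bool, consequences_estimable: bool) -> str:
    """Classify a condition using Stirling's four categories (Stirling, 2010)."""
    if likelihood_estimable and consequences_estimable:
        return "risk"
    if likelihood_estimable:
        return "ambiguity"      # likelihoods estimable, full range of consequences not
    if consequences_estimable:
        return "uncertainty"    # consequences estimable, likelihoods not
    return "ignorance"          # neither can currently be estimated

def treatment(category: str) -> str:
    """Map each category to the conservative bias described in the text."""
    return {
        "risk": "manage within established bands and procedures",
        "ambiguity": "bound the range of consequences; widen safety margins",
        "uncertainty": "bound likelihoods with confidence ranges; widen safety margins",
        "ignorance": "treat as a precursor condition; consider halting operations",
    }[category]

if __name__ == "__main__":
    # Hypothetical examples, invented for illustration only.
    examples = {
        "routine valve wear": (True, True),
        "unfamiliar storm pattern on an approach path": (False, True),
        "novel combination of simultaneous maintenance outages": (False, False),
    }
    for name, (p_known, c_known) in examples.items():
        category = classify_uncertainty(p_known, c_known)
        print(f"{name}: {category} -> {treatment(category)}")

The deliberate bias toward caution is visible in the mapping: the less decidable a condition is with respect to error, the more conservative its treatment, up to and including shutdown; and, as just noted, it is not fear of unknown unknowns that drives this bias.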
More pressing a concern is the possibility of unknown knowns: that there is information relevant to preventing error and subsequent accidents that could be known if thought about, or that is known by someone but has not been communicated to the right people at the right time. Moreover, this perspective of high reliability managers has been well justified by accounts in the accident literature (Bazerman & Watkins, 2008). It is very seldom that "deep" uncertainty lies at the heart of major infrastructure accidents. From Three Mile Island to Chernobyl, and from the 9/11 terror attacks to Deepwater Horizon, it was not aleatory uncertainty or unknown unknowns that caused these catastrophes, but rather errors based on what could have been and should have been known, or on what was in fact known and even predicted by some but was not communicated to, or was disregarded by, those in positions to act. 7 In these cases, it was not uncertainty that caused major accidents but rather specific errors in management, lapses in communication or, in James Reason's formulation, errors in planning, execution or judgment (Reason, 1990). 8 Uncertainty is often differentiated under high reliability management into several types, each with different challenges and implications for reliability and safety. In this differentiation, uncertainty actually conveys information relevant to safety strategy once it is categorized in relation to the distinctive types of errors it can generate. This information is then used to promote reliability and safety in both technical design and managerial strategy. Consider the examples offered below. A dominant general strategy observable in HROs is to avoid errors connected to the systematic underestimation of risk, including both hazards and likelihoods of failure. There is a constant effort to test for the mis-specification, mis-estimation and misunderstanding of things: an effort to protect against what Karl Weick has called "failures of rendition" (Weick, 1993). When confronted with uncertainty regarding probability or consequences, managers and operators in high reliability organizations do not simply assert a single number or an average in a distribution of numbers for either, but will reflect or "characterize" uncertainty by presenting a range of likelihoods or consequences. 9 Often, as we have noted, they will operate with a view to imagining and avoiding worst-case possibilities. HRO studies have also described questioning in work planning sessions and in procedure writing and revision such as: "What assumptions are we making here? How confident are we in them? What if we're wrong about one or more of them?" One report describes a maintenance manager in a nuclear power plant who, after one maintenance procedure revision session, cautioned his maintenance crew supervisors about how they should communicate the change to their personnel: "Don't let them think we've thought of everything and that this technology can't still surprise them." (Schulman, 2004). Incompleteness uncertainty can also lead to errors in understanding and action. "What haven't we thought of?"; "What are we leaving out?" and often, "What are we missing that is right in front of us?" are also questions frequently posed in reports on decision-making and work planning sessions (Roe & Schulman, 2016). These questions help guard against premature closure to additional variables in the assessment of options.
Asking questions such as these is how employees and managers in HROs have learned to widen perspectives and identify more variables that enhance the reliable management of their systems. This probing for epistemic uncertainty also leads to the analytic enlargement of precursor factors and therefore the extension of precursor management. It is not "managing the unexpected" (Weick & Sutcliffe, 2015) that describes operating in HROs; it is the enlargement of expectancies (Pettersen & Schulman, 2019). It is all part of a continuous search for improvement that is characteristic of these organizations (LaPorte, 1996). Psychologist Gary Klein describes "experts" as engaging in "recognition primed decision-making" in which they begin by searching for similar conditions in previous cases but are also aware of differences between those earlier cases and present situations (Klein, 2008). Similarly, key personnel identified throughout much of HRO research are good both at pattern recognition in unfolding real-time circumstances and at referencing those patterns against past experiences and responses. But they are also skilled in recognizing anomalies: differences between what they are seeing in real time and what they expected to see (Roe & Schulman, 2008). The observational view derived from HRO research is that epistemic uncertainty is not inevitably a risk of indescribable proportions to well-managed complex infrastructures. There need be no "deep" epistemic uncertainty about uncertainty itself. HRO management takes known unknowns seriously and additionally strives to identify what might currently be unknown ones as well. Unlike the concept of "deep," unbounded uncertainty, uncertainties in HROs can be quite granular: as granular as the error possibilities to which they will be analytically connected. In short, "uncertainty" should not be a concept that ends the belief in reliable management but instead should be understood as a property that can prompt analysis of the specific organizational errors it can lead to, and in this way, uncertainty can itself be managed reliably. It is important to note here that while HROs focus their uncertainty management heavily on identified precluded system events, other non-system events, such as individuated accidents (slips, trips and falls, for example), may not receive similar treatment. Slips, trips and falls may not be subject to special precursor management in HROs beyond what they would receive in other organizations as a result of occupational health and safety regulatory requirements. Here, conventional risk management and lagging safety indicators may well predominate in the treatment of these individuated accident risks. Some long-term societal risks may also not be part of high reliability management. Hazards related to the ultimate disposition of spent reactor fuel, for instance, have not been the focus of nuclear plant management attention. Likewise, the long-term effects of the reliable outputs of some HROs, such as those in air transportation, may in themselves contribute to long-term environmental hazards such as air pollution or global climate change. Even the component reliability of energy providers to an electrical grid may not be subject to much grid manager attention. As one shift supervisor commented in relation to the reliability of a key out-of-state high voltage intertie providing peak-load power to the grid: "It's either there or it isn't."
The precursor and error focus in HROs is bounded in this way by the time frame and scale associated with their operations and service outputs. Individuals in HROs may think about broader societal consequences in extended time frames associated with their production processes and outputs, but they do not manage to them. The description presented here is only one part of a larger story. An equally, if not ultimately more, important argument is about where the HRO description above does not apply. The argument presented here concerning the concept of error and its primacy in reliability management should not be generalized beyond the highly specialized settings in which HROs operate. For HROs, what constitutes an "error" is a readily decidable question. It is decidable because an action taken or a decision made is directed towards a specific and bounded intent, and its impact in relation to that intent is likely to be clear. The concept of error, in other words, rests on a clear, agreed-upon foundation of operational falsifiability, which is what makes errors "wrong." Because of their dreaded consequences, the events HROs are attempting to preclude are not in dispute as targets for prevention, and they would be recognized clearly and immediately if they did happen. The objectives and priorities surrounding preventing them are widely accepted both within and beyond the organization. As noted earlier, this is a strong social and political foundation for high reliability management, one that supports expensive safety investments and strong regulation on behalf of safety. Many HROs can even go "offline" in service temporarily under unsafe conditions, and do so with regulatory support, rather than being forced to operate in precursor zones with respect to safety. To sum up, HROs are able to operate as organizations within well-defined conceptual and analytical, as well as physical and political, boundaries that stabilize input resources and service demand and thus reduce input variance in their production processes. Because of the low input variance and the maturity and experience base of their technical cores, HROs can also operate under low process variance directed by elaborate procedures and anticipatory modelling. Low input and low process variance can then lead to low output variance: the reliability of service and safety they achieve in their operation (Schulman, 1993). It is important to recognize that the features outlined here are not characteristic of the work of the vast majority of public agencies and private market-centred organizations. Those organizations must operate in turbulent political or competitive market environments, with multiple and often conflicting objectives, unsettled "technologies" and ambiguous performance standards. 11 Here, the concept of "error" itself has a less than clear application. Fewer management and decision propositions are readily falsifiable, and it is less clear that they are even decidable under current understanding of the system variables under management. As one executive described it: "The more important the decision, the less precise the tools we have to analyze it and the longer it will take before we know if we were right" (Bolman & Deal, 2017, ch. 2).
While a number of prescriptive analyses have defined and endorsed high reliability as a desirable management process (Alavosius et al., 2017; Hopkins, 2014a; Weick & Sutcliffe, 2015), it is likely that many organizations, even current HROs, may well be facing ever-deepening uncertainty, unmoored from specified errors, from which they cannot escape. Consider the COVID-19 virus and its recent pandemic as an example. The COVID-19 pandemic has many features, as a policy and managerial crisis, that lie well beyond the foundations of high reliability organizations or of many current public health organizations. This is a very different domain from the careful management of well-understood technical systems to produce repetitively safe and continuous outputs while avoiding well-defined and unacceptable events, which are clear errors and failures (Quarantelli, 2000). The substantial deaths caused by seasonal influenza each year, by contrast, have not generated the prospective societal dread that surrounds infrastructure catastrophes. This low social and political activation over flu deaths has proven a difficult background condition against which to develop worst-case peak-load stockpiles of hospital and healthcare resources such as ventilators, ICU capacity, masks and testing facilities. In a recent report comparing military and healthcare preparedness strategy and practice in the United States, for example, two public health analysts note that while "the federal government invests substantial resources in military preparedness" [where readiness in relation to national security objectives is pursued], "our systems [of healthcare] were set up with an expectation of scarcity and rationing when stressed, and that is what we have experienced…. There was little interest in building surge capacity in the U.S. healthcare system" (Rosenthal & Jones, 2020). It may be that, as a result of the COVID-19 experience, the public will develop a widespread shared dread that could be a foundation for setting standards, defining precluded events and evolving lead organizations (perhaps even with centralized control rooms) that can coordinate actions and enforce controls on behalf of "high reliability" pandemic management. Under such conditions, precluded event standards in the United States might develop and even come to include avoiding strong correlations between death rates and racial or income inequality. But at the same time, what would be the overall output standards in disease incidence and deaths by which to assess "reliable" performance in pandemic management? It is probably just as likely that after successive years without a follow-on pandemic, public attention to the collective risk would fade, support for prospective readiness would flag, and pandemic organizations would again struggle for resources; in other words, a return to the current norm for much of the healthcare infrastructure, at least in the United States. Apart from the difference in psychological and political foundations, another important difference exists between the COVID-19 challenge and that of managing physical infrastructures. This is the difference between managing bounded physical systems with well-understood properties and managing a novel virus whose key properties are still being discovered, with multiple impacts on huge and diverse populations worldwide, addressed by a complex set of separate institutions, both public and private, across different nations. The uncertainties associated with a new virus invite error in the treatment of cases and in the anticipation and modelling of disease transmission and propagation, as well as unexpected consequences connected to specific policy and managerial decisions (Ramachandran et al., 2020).
A recent description by a physician of treatment errors in ICU units describes "staff members working well out of their comfort zones, with unfamiliar systems and equipment, caring for a career's worth of critically ill patients in two months" and notes that "doctors were understandably desperate to help their patients, but the resultant frantic prescribing, especially of hydroxychloroquine, probably caused more harm than good" (Ofri, 2020). Uncertainty also surrounds newly emerging virus variants and how well current vaccines will perform against them, with further study "needed" (Vaziri, 2021). This additional study, of course, will have to be undertaken even as major efforts are already underway worldwide to inoculate hundreds of millions with the current vaccines. To sum up: in its character as a public problem and in its first-order impacts, the COVID-19 pandemic is a condition that challenges the meaning of concepts such as reliability and even accountability. Reliability is challenged not only in information-gathering, decision-making and other management processes, but also in relation to predictable, repetitive outputs and outcomes in the containment of the virus. Not only has high reliability as practiced in HROs been impossible in process and outputs, but it is not clear what "reliability" itself should mean as a performance standard applied to the public health management of the current virus, given the speed and uncertainty in its propagation within and across nations and the limits of our current control variables. Even accountability becomes confused and unreliable as a concept to apply in the face of unclear, if not conflicting, performance standards for current COVID-19 policy and management. We can argue that learning and adaptation are themselves critical performance standards, but how reliable can they be in the face of moving targets? Even before the COVID-19 pandemic, those HROs managing infrastructures had been facing reliability challenges, including more complex and closer interconnections between infrastructures such as electricity, water, telecommunications and transportation, as the output of one is increasingly a needed input for others. Reliability has now become a networked property that eludes the control and management of single organizations. Further, the interconnections themselves can shift during times of crisis in one or more of the infrastructures as latent interdependencies are revealed. Under these conditions, both increased input variance and increased process variance are real prospects. Error-framed uncertainty could well disappear, to be replaced with deeper and perhaps unspecifiable uncertainty. It will be unknown unknowns and not unknown knowns that will then be the controlling elements in management outcomes. Under these conditions, error will be transformed into surprise amid continuing uncertainty, and surprise will produce outcomes that the term "error" can no longer clarify. Unlike errors within high reliability settings, these errors will be increasingly undecidable, or in Wolfgang Pauli's words "not even wrong," within technical design principles and within frameworks of both policymaking and organizational management. It is already argued by complexity theorists that adaptation and resilience must be the new fundamental principles for successful organizational management (Hollnagel et al., 2011; McDaniel & Driebe, 2005; Woods, 2020).
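The claim above that reliability has become a networked property can be illustrated with a deliberately simple sketch. The dependency graph below is hypothetical; the services and links are invented examples rather than a model of any real infrastructure, and the sketch shows only how the loss of one service can propagate through dependencies, including links that may remain latent until a crisis reveals them.

from collections import deque

# Hypothetical service-dependency map: an entry A -> [B, C] means that if A fails,
# B and C lose a needed input. The links are invented for illustration only.
DEPENDENTS = {
    "electricity": ["water_pumping", "telecom", "rail_signalling"],
    "telecom": ["grid_dispatch", "financial_clearing"],
    "water_pumping": ["hospital_cooling"],
    "grid_dispatch": ["electricity"],  # a latent feedback link revealed under stress
}

def cascade(initial_failure: str) -> set[str]:
    """Return every service reachable from the initial failure through dependencies."""
    failed = {initial_failure}
    queue = deque([initial_failure])
    while queue:
        current = queue.popleft()
        for dependent in DEPENDENTS.get(current, []):
            if dependent not in failed:
                failed.add(dependent)
                queue.append(dependent)
    return failed

if __name__ == "__main__":
    # A failure that begins in one infrastructure spreads across the network.
    print(sorted(cascade("electricity")))

Nothing in this toy graph corresponds to a real system; the point is structural. Once outputs are inputs elsewhere, and once some of the links only appear under stress, no single organization controls the reliability of the services it depends on, and error or surprise in one organization becomes a precursor condition for others.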
In a future with shifting background conditions such as climate change, the concept of reliability may well be superseded by conditions so surprising that its usefulness as either an organizing strategy or an evaluation criterion for policy and management may well disappear. Outputs across a wide variety of contexts and managerial scales may become unpredictable and increasingly unstable. If this happens, there will be few if any "high reliability organizations" in the future, and new concepts might very well have to emerge to replace "reliability" as a standard to apply to public policy and organizational performance. The arguments presented here suggest that we will need new performance standards to apply to policies and organizations in the face of the deepening complexity and scale in the technologies and social problems that they address, and the turbulent and uncertain environments within which policy and management will have to be conducted (Ansell et al., 2017). These standards will require more careful research into how complexity and scale, including the uncertainty they introduce, are likely to challenge performance across different problem and organizational domains. Among important questions that might guide research on this issue could be:

1. Can we identify new policy and management control variables that will be both necessary and sufficient for enhancing and stabilizing performance against complex, large-scale and uncertain problems such as global climate change or global pandemics? This will require simultaneous analysis of both technical and social dimensions of objectives with respect to these problems.

2. Can we develop a "science" of public objectives that will inform us about what a specific objective will require in organizational capacity given the technological, social and administrative challenges it poses? This would seem to be a foundational requirement for the evolution of realistic expectations upon which to base performance standards, assessments and accountability.

3. Can we improve the capacity of public organizations to anticipate and forecast the growth trajectory of problems and the likely effects of policy responses to them? This will require the development of models which can incorporate technical as well as organizational and social factors. More complexity will require the building of forecast models that incorporate more variables, ranging from macro-level phenomena to micro-level human behaviour surrounding the performance of specific tasks in specific organizations. Mining new sources of "big data" might improve our understanding of larger-scale social patterns as well as regularities in individual behaviour that could improve the predictive accuracy of policy models (Pentland, 2015).

4. How can we differentiate more clearly, in a world of unstable and uncertain policy and management undertakings, those performance failures that reflect preventable errors (e.g. from incompetence, leadership failures or resource deficiencies) from performance failures that stem from unrealistic objectives imposed on policymakers and managers, given the inherent challenges in scale, uncertainty and instability surrounding a problem itself?

5. Finally, with increasingly rapid changes in the complexity, scale and uncertainty surrounding many public problems, it seems more likely that policies directed to each problem separately will increasingly intersect across the domains of quite different agencies and organizations.
Already, COVID-19 issues intersect not only across nations but also across public health, law enforcement and social service agencies, not to mention economic policy organizations within nations. Climate change likewise entangles policymaking and administrative organizations in environment, energy and emergency management as well as public health domains. These entanglements may well require diverse organizations with quite different specializations to coordinate their separate but increasingly overlapping control variables, planning and decision-making processes. While there have been many exercises in "cooperative governance," these have generally been between agencies of related specialties, and even these have not always been successful. Managers are not generally trained in managing inter-organizational collaborations (Donahue & Zeckhauser, 2012). What skill sets and training, as well as changes in organizational design, might be required in the future to facilitate acceptable performance in relation to entangled policy and management objectives?

Data sharing not applicable: no new data generated.

1 It has been argued, for instance, that the Three Mile Island accident was a milestone in improving reliability in the management of reactors throughout the nuclear power industry (World Nuclear Organization, 2020).

2 As noted by biologist Joanna Masel, "If the mechanisms interpreting genetic information were completely flawless, organisms would stay the same all the time and be unable to adapt to new situations or changes in their environment." (Science Daily, January 11, 2011). https://www.sciencedaily.com/releases/2011/01/110125172418.htm

3 For basic HRO literature, see LaPorte (1996); LaPorte and Consolini (1991); Roberts (1993); Schulman (1993); Roe and Schulman (2008); and later MacRae (2014); Hopkins (2014a) and Roe and Schulman (2016). More recent works have connected HRO features to aspects of organizational culture. See Helmreich and Davies (2004); Cantu et al. (2020); and Schulman (2020).

Leadership and crew resource management in high-reliability organizations: A competency framework for measuring behaviors
Governance in turbulent times
Barrier and operational risk analysis of hydrocarbon releases (BORA-Release): Part I. Method description
Bureaucracy and democracy: Accountability and performance
Predictable surprises
The risk society: Towards a new modernity
High reliability organization (HRO) systematic literature review: Discovery of culture as a foundational hallmark
Inviting disaster
Rethinking risk assessment for public utility safety regulation
Drift into failure
The complexity of failure: Implications of complexity theory for safety investigations
Collaborative governance
'737-Cabriolet': The limits of knowledge and the sociology of inevitable failure
On ignorance and apocalypse: A brief introduction to 'epistemic accidents'
Up and down with ecology: The issue-attention cycle
Organizational learning and action in the midst of safety drift
Why the economic recovery will be more of a 'Swoosh' than V-shaped
Culture, threat and error: Lessons from aviation
Resilience engineering in practice
Learning from high reliability organizations
Issues in safety science
Sources of power
Risk, uncertainty and profit
Drum breach: Operational temporalities, error politics, and WIPP's kitty litter nuclear waste accident
Redundancy, rationality and the problem of duplication and overlap
Objectivity, neutrality and Kuhn's paradigm
High reliability organizations: Unlikely, demanding and at risk
Working in practice but not in theory: Theoretical challenges of high reliability organizations
Close calls: Managing risk and resilience in airline flight safety
Uncertainty and surprise in complex systems: Questions on working the unexpected
The design of everyday things
The public has been forgiving: But hospitals got some things wrong
Social physics: How social networks can make us smarter
The organizational context of human factors engineering
Normal accidents
In search of the issue-attention cycle
Drift, adaptation, resilience and reliability
Emergencies, disasters and catastrophes are different phenomena
New York response worsened pandemic: Hasty expansion of facilities by state, city and hospital leaders led to grave mistakes
Human errors
Human error and the problem of causality in the study of accidents
Mental procedures in real-life tasks
Absent minded? The psychology of lapses and everyday errors
New perspectives on organizational research
High reliability management
Reliability and risk
A reliability & risk framework for the assessment and management of system risks in critical infrastructures with central control rooms
Preparedness and unpreparedness: Healthcare vs the military (COVID-19 White Paper #15)
Handbook of human factors and ergonomics
The flaw of averages: Why we underestimate risk in the face of uncertainty
The negotiated order of organizational reliability
General attributes of safe organizations. Quality and Safety in Health Care
On the future of high reliability organizations in the age of turbulence
Organizational structure and safety culture: Conceptual and practical challenges
Extending reliability analysis across time and scope
Keep it complex
Man made disasters
Causes of disaster: Sloppy management
The disease burden of influenza
Guidance on the treatment of uncertainties associated with PRAs in risk-informed decision making
Bay Area has its own scary new virus variant: Why experts are so concerned
Complexity theory: The emerging science at the edge of order and chaos
Adapt or perish: A review of planning approaches for adaptation under deep uncertainty. Sustainability
Covid-19 forces delay of 7 reliability standards, including cybersecurity supply chain protections
The collapse of sensemaking in organizations
Managing the unexpected: Sustained performance in a complex world
Bureaucracy: What government agencies do and why they do it
The strategic agility gap: How organizations are slow and stale to adapt in turbulent worlds
information-library/safety-and-security/safety-of-plants/three-mile-island-accident.aspx#:~:text=In%201979%20at%20Three%20Mile,background%20levels%20to%20local%20residents