Robust decision-making under risk and ambiguity

Maximilian Blesch and Philipp Eisenhauer

April 23, 2021

Economists often estimate economic models on data and use the point estimates as a stand-in for the truth when studying the model's implications for optimal decision-making. This practice ignores model ambiguity, exposes the decision problem to misspecification, and ultimately leads to post-decision disappointment. Using statistical decision theory, we develop a framework to explore, evaluate, and optimize robust decision rules that explicitly account for estimation uncertainty. We show how to operationalize our analysis by studying robust decisions in a stochastic dynamic investment model in which a decision-maker directly accounts for uncertainty in the model's transition dynamics.

Decision-makers often confront uncertainties when determining their course of action. For example, individuals save to cover uncertain medical expenses in old age (French and Song, 2014). Firms set prices in an uncertain competitive environment (Ilut, 2020), and policy-makers face uncertainties about future costs and benefits when voting on climate change mitigation efforts (Barnett et al., 2020). We consider the situation in which a decision-maker posits a collection of economic models to inform his decision-making process. Each model formalizes the relevant objectives and trade-offs involved and provides an implicit rule for optimal decisions. Uncertainty is limited to risk for a given model, as the model induces a unique probability distribution over possible future outcomes. However, a decision-maker also faces model ambiguity, as the true model within the collection remains uncertain (Arrow, 1951; Knight, 1921).

It is standard practice in economics to estimate models on data and use the point estimates as a stand-in for the truth when studying the model's implications and optimal decision-making.¹ This approach ignores the model ambiguity that results from the parametric uncertainty remaining after estimation and opens the door to misspecification of the decision problem. As-if decisions, decisions that are optimal if the point estimates used to inform them are correct (Manski, 2021), often turn out to be very sensitive to misspecification (Smith and Winkler, 2006). This danger creates the need for robust decisions that perform well over a whole range of different models instead of as-if decisions that perform best for one particular model. However, increasing the robustness of decisions, often measured by a performance guarantee under a worst-case scenario, reduces performance in all other cases. Striking a balance between the two objectives is challenging. We solve this trade-off and determine the optimal level of robustness by combining insights from statistical decision theory (Berger, 2010) with data-driven robust optimization (Bertsimas et al., 2018). A core concept in statistical decision theory is a statistical decision function (SDF), which provides a procedure to map all available data into decisions. At the same time, the literature on data-driven robust optimization provides us with precisely such procedures for decision-making with varying levels of robustness against misspecification of the decision problem.
Our main contribution is interpreting these procedures as SDFs and evaluating their performance with the toolkit of statistical decision theory. This insight allows us to systematically determine the optimal level of robustness. In doing so, we bring together and extend research in economics and operations research by using econometric models in complex decision problems (Bertsimas and Thiele, 2006; Manski, 2021).

In our application, we revisit Rust's (1987) seminal bus replacement problem. Model ambiguity is particularly consequential in dynamic models, where the impact of erroneous decisions accumulates over time (Mannor et al., 2007).² In the model, the manager Harold Zurcher implements a maintenance plan for a fleet of buses that maximizes his expected discounted utility. He faces uncertainty about the future mileage utilization of the buses but has data on past utilization available to inform his decisions. While Rust's (1987) original goal was to describe the investment behavior of Harold Zurcher, our analysis is normative. We are interested in how a generic decision-maker should make decisions in this instance.

The bus replacement problem is typically modeled as a standard Markov decision problem (MDP), and the point estimates for the mileage utilization are treated as if they correspond to the true parameters. The solution of the MDP is an as-if decision rule that is optimal given the estimates. This approach ignores model ambiguity. From the perspective of statistical decision theory, an MDP is just one particular example of an SDF suitable for analyzing the bus replacement problem. We, on the other hand, consider a whole class of SDFs called robust Markov decision problems (RMDPs) (Ben-Tal et al., 2009). RMDPs generalize the standard MDP, as they consider a whole set of distributions for the transition dynamics, collected in an ambiguity set. The solution of an RMDP is a robust decision rule that is optimal under a worst-case scenario over all mileage utilization distributions inside the ambiguity set. We follow the literature and construct the ambiguity set so that it contains all distributions we cannot reject with a certain level of confidence ω ∈ [0, 1] around the point estimates under any possible realization of the data (Ben-Tal et al., 2013). The size of the ambiguity set is a choice by the decision-maker and determines the level of robustness. Given the realization of the data, the robust decision rule based on the solution of an RMDP is always conditional on the specified level of robustness.

Each choice of ω defines a different RMDP, and applying the toolkit of statistical decision theory allows us to determine the optimal level of robustness ω* within the whole class of SDFs. To do so, we compare the performance of RMDPs with varying levels of robustness under different decision-theoretic criteria. We consider the situation before any data on mileage utilization is available and implement an ex-ante decision-theoretic analysis. We explore the performance of robust decision rules over the whole probability simplex and are thus able to determine the optimal level of robustness. Throughout, we compare robust and as-if decision rules, as the standard Markov decision problem remains one SDF within the broader class we consider.

² Rust's (1987) model serves as a computational illustration in a variety of settings. See, for example, Christensen and Connault (2019), Iskhakov et al.
(2016), Reich (2018), and Su and Judd (2012).

Figure 1: RMDPs as statistical decision functions

Our insight to evaluate robust decisions using statistical decision theory applies to the whole literature on data-driven robust optimization. There is a growing number of applications of data-driven robust decision-making in a variety of settings, including portfolio decisions (Jin et al., 2020; Zymler et al., 2013), elective admission to hospitals (He et al., 2019; Meng et al., 2015), the timing of medical interventions (Goh et al., 2018; Kaufman et al., 2017), and managing the production of renewable energy (Alismail et al., 2018; Samuelson and Yang, 2017).³ Despite its broad field of application, the existing literature offers only limited guidance on choosing the optimal level of robustness. At the most basic level, the recommendations range from simply advocating a high level of robustness (Ben-Tal et al., 2013) to choosing a level of robustness that ensures a pre-specified worst-case performance (Brown et al., 2012). These approaches ignore the trade-off between a performance guarantee under a worst-case scenario and reduced performance in all other cases. Most recently, and much closer to our approach, Gotoh et al. (2021) restrict attention to the neighborhood of a realized point estimate. Their analysis is thus ex-post, as it does not aggregate performance over all possible realizations of the data.

At the same time, in econometrics, there is a burgeoning interest in assessing the sensitivity of findings to model or moment misspecification.⁴ Our work is related to Jørgensen (2021), who develops a measure to assess the sensitivity of results by fixing a subset of a model's parameters before estimating the remaining ones. Our approach differs, as we directly incorporate model ambiguity in the design of the decision-making process and assess the performance of a decision rule under misspecification of the decision environment. As such, our focus on the ambiguity decision-makers face about the model draws inspiration from the research program summarized in Hansen and Sargent (2016), which tackles similar concerns with a theoretical focus. We complement recent work by Saghafian (2018), who works in a setting similar to ours but does not use statistical decision theory to determine the optimal robust decision rule. In ongoing work, Eisenhauer et al. (2021) pursue a related construction. In addition, we contribute to the work on optimal treatment allocation started by Manski (2004) and Manski (2009), which characterizes the structure of optimal statistical decision functions and provides (asymptotic) bounds on their performance (Hirano and Porter, 2009; Stoye, 2009; Tetenov, 2012; Stoye, 2012; Kitagawa, 2018).

The structure of the remaining analysis is as follows. In Section 2, we present statistical decision theory as our framework to compare decision rules. We then set up a canonical model of a data-driven robust Markov decision problem in Section 3 and outline the decision-theoretic determination of the optimal level of robustness. Section 4 presents our analysis of the robust bus replacement problem. Section 5 concludes.

We now show how to compare as-if decision-making to its robust alternatives using statistical decision theory. We first review the basic setting and then turn to a classic urn example to illustrate some key points.
We study a decision problem in which the consequence c ∈ C of the alternative actions a ∈ A depends on the parameterization θ ∈ Θ of an economic model. A consequence function ρ : A × Θ → C details the consequence of action a under parameters θ: c = ρ(a, θ). A decision-maker ranks consequences according to a utility function u : C → R, where higher values are more desirable. The structure of the decision problem (A, Θ, C, ρ, u) is known, but the true parameterization θ₀ is uncertain. As a result, the consequences of a particular action are ambiguous. An observed sample of data ψ ∈ Ψ, however, provides a signal about the true parameters, as the sampling distribution of ψ, denoted by P_θ, differs by θ. A statistical decision function (SDF) δ : Ψ → A is a procedure that determines an action for each possible realization of the sample.

In our application, we study the bus replacement problem with unknown future mileage utilization. The decision problem is dynamic, so a decision-maker acts by committing to a plan that specifies whether to maintain or replace a bus in any possible future scenario. The consequences of executing a particular plan are a stream of maintenance costs, aggregated into a discounted sum of utilities. A plan's total utility is determined by the true distribution of the bus mileage utilization. The optimal decision rule based on an RMDP depends on the observed sample of past mileage transitions, as the sample informs the construction of the ambiguity set. So, each RMDP is one example of an SDF for the bus replacement problem. We consider many RMDPs with varying levels of robustness and thus analyze a whole class of SDFs.

Statistical decision theory provides the framework to compare the performance of alternative decision functions δ ∈ Γ. The utility achieved by any δ is a random variable before ψ is realized. Thus, Wald (1950) suggests measuring the performance of δ at each possible parameterization θ by computing the expected utility with respect to its induced sampling distribution P_θ:

W(δ, θ) = E_{P_θ} [ u(ρ(δ(ψ), θ)) ].

In general, no single decision function yields the highest expected utility for all possible parameterizations. In this case, determining the best decision function δ* is not straightforward. Still, decision theory proposes various criteria (Gilboa, 2009; Marinacci, 2015) to aggregate the performance of a decision function over all possible parameterizations. At the most fundamental level, a decision function is admissible if there exists no other function whose expected utility is at least as high everywhere. In most cases, several decision functions are admissible, and thus additional optimality criteria are needed. Our analysis explores three of the most common decision criteria: (1) maximin, (2) minimax regret, and (3) subjective Bayes.

Following the maximin decision criterion (Wald, 1950), we determine the optimal decision function by computing the minimum expected performance of each decision function over all points in the parameter space. We then choose the one with the highest minimum performance. Stated concisely,

δ* ∈ argmax_{δ ∈ Γ} min_{θ ∈ Θ} W(δ, θ).

For the minimax regret criterion (Niehans, 1948), we compute the maximum regret of each decision function over all points in the parameter space. The regret of choosing a decision function at any realization of θ is the difference between the maximum possible performance, where the true parameterization informs the decision, and its actual performance. We then select the decision function with the lowest maximum regret.
Thus, the minimax regret criterion solves:

δ* ∈ argmin_{δ ∈ Γ} max_{θ ∈ Θ} [ max_{δ′ ∈ Γ} W(δ′, θ) − W(δ, θ) ].

Subjective Bayes (Savage, 1954) requires a subjective probability distribution f_θ over the parameter space. We then select the decision function with the highest expected subjective utility:

δ* ∈ argmax_{δ ∈ Γ} ∫_Θ W(δ, θ) f_θ(θ) dθ.

We now illustrate the key ideas that allow us to compare as-if and robust decision-making using statistical decision theory with an urn example. As in our empirical application, we study a whole class of statistical decision functions. We first compare the performance of two distinct alternatives and then determine the optimal function within the class.

We consider an urn with black b and white w balls where the true share of black balls θ₀ is unknown. In this example, the action constitutes a guess θ̂ about θ₀ after drawing a fixed number n of balls at random with replacement. The parameter and action space both correspond to the unit interval, Θ = A = [0, 1]. If the guess matches the true share, we receive a payment of one. In case of a discrepancy, the payment is reduced by the squared error. Thus, the consequence function takes the following form:

ρ(θ̂, θ₀) = 1 − (θ̂ − θ₀)².

Going forward, we assume a linear utility function and directly refer to the monetary consequences of a guess as its utility. The sample space is Ψ = {b, w}ⁿ, and a sequence of n draws is a typical realization of ψ. The observed number of black balls r among the n draws in a given sample ψ provides a signal about θ₀. The sampling distribution for the possible number of black balls R takes the form of a binomial probability mass function (PMF):

P_θ(R = r) = C(n, r) θʳ (1 − θ)ⁿ⁻ʳ,

where C(n, r) denotes the binomial coefficient. Any function δ : {b, w}ⁿ → [0, 1] that maps the number of black draws to the unit interval is a possible statistical decision function. We focus on the following class of decision functions δ ∈ Γ, where each λ ∈ [0, 1] indexes a particular decision function:

δ_λ(ψ) = λ (r / n) + (1 − λ) (1 / 2).

The empirical share of black balls in the sample, r/n, provides the point estimate θ̂. The decision functions in Γ specify the guess as a weighted average between the point estimate and the midpoint of the parameter space. The larger λ, the more weight is put on the point estimate. At the extremes, the guess is either the point estimate itself (λ = 1) or fixed at 0.5 (λ = 0).

We begin by comparing the performance of the two decision functions with λ = 1 and λ = 0.9. We refer to the former as the as-if decision function (ADF), as it announces the point estimate as if it were the true parameter. For reasons that will become clear later, we identify λ = 0.9 as the robust decision function (RDF). Following Wald (1950), we evaluate their relative performance by aggregating the vector of expected payoffs over the unit interval using the different decision-theoretic criteria. We set the number of draws n to 50.

Figure 2 shows the sampling distribution of the number of black balls R and the associated payoff of following the two decision functions for each possible draw. The true, but unknown, share in this example is 40%, i.e., θ₀ = 0.4. The RDF outperforms the ADF for realizations of the point estimate smaller than the true value due to the shift towards 0.5. At the same time, the ADF leads to a higher payoff at the center of the distribution. Overall, the performance of the RDF is more balanced across the whole parameter space, which motivates its name. Both decision functions have their lowest expected payoff at θ = 0.5. As the RDF outperforms its ADF alternative at that point, the RDF is preferred based on the maximin and minimax regret criteria.
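This evaluation is straightforward to reproduce. The following sketch (our own illustration in Python; the grids and variable names are not from the paper) computes the expected payoff W(δ_λ, θ) under the binomial sampling distribution and selects the optimal λ under the maximin and subjective Bayes criteria:

```python
import numpy as np
from scipy.stats import binom

n = 50                                  # number of draws
thetas = np.linspace(0.0, 1.0, 101)     # grid over the parameter space
lambdas = np.linspace(0.0, 1.0, 101)    # grid over the class of decision functions
r = np.arange(n + 1)                    # possible numbers of black balls

def expected_payoff(lam, theta):
    """Wald's measure: expected payoff of delta_lambda at parameterization theta."""
    guess = lam * r / n + (1 - lam) * 0.5       # guess for each realization of R
    payoff = 1 - (guess - theta) ** 2           # consequence of each guess
    return binom.pmf(r, n, theta) @ payoff      # average over the sampling distribution

W = np.array([[expected_payoff(lam, th) for th in thetas] for lam in lambdas])

lam_maximin = lambdas[W.min(axis=1).argmax()]   # highest worst-case performance
lam_bayes = lambdas[W.mean(axis=1).argmax()]    # uniform prior over theta
```

On these grids, the selected values come out close to the λ*_Maximin ≈ 0.87 and λ*_Bayes ≈ 0.96 reported below.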
The maximin and minimax regret criteria are identical in this setting, as the payoff at the true share is constant (equal to one) across the parameter space, so regret is simply one minus the expected payoff. Using the subjective Bayes criterion with a uniform prior, we select the ADF, as its better performance at the boundaries of the parameter space is enough to offset its worse performance in the center.

Returning to the whole class of decision functions, we can construct the optimal statistical decision function in Γ for the alternative criteria by varying λ to maximize the relevant performance measure. For example, Figure 5 shows the minimum and the uniformly weighted performance for varying λ.

Notes: We omit the performance measure for the minimax regret criterion, as λ*_Maximin = λ*_Regret in this setting, as noted earlier.

Neither of the two decision functions analyzed earlier turns out to be optimal, as λ*_Bayes ≈ 0.96 and λ*_Maximin ≈ 0.87. Overall, the performance measure is more sensitive to the choice of λ under the maximin criterion than under subjective Bayes. In summary, the urn example illustrates the performance comparison of alternative decision functions over the whole parameter space. It shows how to construct an optimal decision function within a class for alternative decision-theoretic criteria. Next, we move to the more involved setting of a sequential dynamic decision problem with ambiguous transitions that we analyze in our application.

We now outline the framework of an RMDP for the analysis of sequential decision-making in light of model ambiguity. From the perspective of statistical decision theory, any RMDP with a fixed level of robustness is a statistical decision function. Once a sample of transitions is available, we construct the ambiguity set of a given size and solve the RMDP for a robust decision rule. We first present the general setup of an RMDP and discuss the construction of the ambiguity set. We then turn to the solution approach and describe our decision-theoretic analysis to determine the optimal level of robustness. Throughout, we address the new challenges of analyzing an RMDP as opposed to a standard MDP. In line with our application, we discuss an infinite-horizon model in discrete time, with stationary utility and transition probabilities and discrete states and actions.⁵ We focus our exposition on ambiguity in the transition dynamics of the Markov decision process. Although our central insight, using statistical decision theory to determine the optimal level of robustness, is also relevant for the parameters of the reward functions, we do not address uncertainty pertaining to these parameters, as each such setting introduces its own computational challenges (Mannor and Xu, 2019).

We consider the following decision problem. At time t = 0, 1, 2, …, a decision-maker observes the state of the environment s_t ∈ S and chooses an action a_t from the set of admissible actions A. The decision has two consequences: it creates an immediate utility u(s_t, a_t), and the environment evolves to a new state s_{t+1}. The transition from s_t to s_{t+1} is affected by the action and governed by a transition probability distribution p(s_t, a_t). Decision-makers take the future consequences of the current action into account. While a decision rule d_t specifies the planned action for all possible states within period t, a policy π = {d₀, d₁, d₂, …} is a collection of decision rules and specifies all planned actions for all time periods.
Figure 6 depicts the timing of events in the decision problem. At the beginning of period t, a decision-maker learns about the utility of each alternative, chooses one according to the decision rule d_t, and receives its immediate utility. Then, the state evolves from s_t to s_{t+1}, and the process repeats itself in t + 1.

In a standard Markov decision process (MDP), a single transition probability distribution p(s_t, a_t) is associated with each state-action pair. This distribution is assumed to be known, and thus the MDP incorporates risk only. In an RMDP, a whole set of distributions is associated with each state-action pair, collected in an ambiguity set p(s_t, a_t) ∈ P(s_t, a_t). For a particular RMDP, the ambiguity set is assumed to be known, and thus the RMDP incorporates risk for a given distribution and ambiguity about the true distribution.

In a standard MDP, the objective of a decision-maker in state s_t at time t is to choose the optimal policy π* from the set of all possible policies Π that maximizes the expected total discounted utility ṽ_t^{π*}(s_t), as formalized in Equation (3.1):

ṽ_t^{π*}(s_t) = max_{π ∈ Π} E^{π} [ Σ_{r=0}^{∞} δ^{t+r} u(s_{t+r}, d_{t+r}(s_{t+r})) ].   (3.1)

The exponential discount factor δ parameterizes a taste for immediate over future utilities. The superscript of the expectation emphasizes that each policy induces a different probability distribution over sequences of possible futures. As long as the transition probabilities used to construct the policy are in fact correct, the standard value function ṽ_t^{π*}(s_t) measures the performance of the optimal policy.

In an RMDP, the goal is to implement an optimal policy that maximizes the expected total discounted utility under a worst-case scenario. Given the ambiguity about the transition dynamics, a policy induces a whole set of probability distributions over sequences of possible future utilities, F^π, and the worst-case realization determines its ranking. The formal representation of the decision-maker's objective is Equation (3.2):

v_t^{π*}(s_t) = max_{π ∈ Π} min_{p ∈ F^π} E^{p} [ Σ_{r=0}^{∞} δ^{t+r} u(s_{t+r}, d_{t+r}(s_{t+r})) ].   (3.2)

We consider a setting where historical data provide information about the transition dynamics. In the data-driven standard MDP, the empirical probabilities p̂(s_t, a_t) serve as a plug-in for the truth, and the solution of the MDP provides an as-if decision rule. In a data-driven RMDP, the empirical probabilities are used to construct the ambiguity sets for the transitions, and the solution of the RMDP provides a robust decision rule.

In a standard MDP, the objective is to maximize the expected total discounted utility as formalized in Equation (3.1). This requires evaluating the performance of all policies based on all possible sequences of utilities and the probability that each occurs. Fortunately, the stationary Markovian structure of the problem implies that the future looks the same whether the decision-maker is in state s at time t or at any other point in time. The only variable that determines the value to the decision-maker is the current state s. Thus, the optimal policy is stationary as well (Blackwell, 1965), and the same decision rule is used in every period. The value function ṽ(·) is characterized by the Bellman operator

Λ(v)(s) = max_{a ∈ A} { u(s, a) + δ E_{p(s,a)} [ v(s′) ] }.

Under mild conditions, Λ is a contraction mapping, which allows us to compute the value function ṽ(·) as its unique fixed point (Denardo, 1967).

In an RMDP, for every state s ∈ S, the next state can be determined by any p ∈ P̂(s, d(s); ω). A policy π now induces a set of probability distributions F^π on the set of all possible histories H. Any particular history h = (s₀, a₀, s₁, a₁, …)
can be the result of many possible combinations of transition probabilities. Rectangularity imposes a structure on these combination possibilities.

Assumption 1 (Rectangularity). The set F^π of probability distributions associated with a policy π is given by

F^π = { ∏_{t=0}^{∞} p(s_{t+1} | s_t, a_t) : p(s_t, a_t) ∈ P̂(s_t, d_t(s_t); ω) for t = 0, 1, … },

where the notation simply denotes that each element in F^π is a product of elements p ∈ F^{d_t}, and vice versa (Iyengar, 2005).

Assumption 1 formalizes the idea that ambiguity about the transition probability distribution is uncoupled across states and time. All elements of the ambiguity sets can be freely combined to generate a particular history. The objective when facing ambiguity is to implement a policy π* that maximizes the expected total discounted utility under a worst-case scenario, as presented in Equation (3.2).

Note that even for genuinely uncoupled uncertainties, the maximin criterion does not automatically select the most robust statistical decision function (ω = 1). This particular decision function is based on the worst-case scenario over the full probability simplex at each state-action pair. In fact, the worst-case decision function might not be admissible in particular settings where it is weakly dominated by the as-if (or some other) decision function. Suppose, for example, that the true distribution corresponds to the worst-case distributions. In this case, the distribution of sampled transitions is degenerate, as the worst-case scenario at each state-action pair is the certain transition to the state with the lowest future value (Nilim and El Ghaoui, 2005).

We now study robust decision-making in the seminal bus replacement problem. First, we discuss the general setting and the details of the computational implementation. Second, we conduct an ex-post analysis of robust decision rules constructed for the observed sample of mileage transitions analyzed in Rust (1987). Third, considering the situation before any data are realized, we conduct an ex-ante analysis of robust decision functions with varying levels of robustness over the whole probability simplex, which allows us to determine the optimal level of robustness using statistical decision theory.

The bus replacement model is set up as a regenerative optimal stopping problem (Chow et al., 1971). It is motivated by the sequential decision problem of a maintenance manager, Harold Zurcher, for a fleet of buses. He makes repeated decisions about their maintenance to maximize the expected total discounted utility under a worst-case scenario. Each month t, a bus arrives at the bus depot in state s_t = (x_t, ε_t), described by its mileage since the last engine replacement x_t and other signs of wear and tear ε_t. He faces the decision to either conduct a complete engine replacement (a_t = 1) or perform basic maintenance work (a_t = 0). The cost of maintenance c(x_t) increases with the mileage state, while the cost of replacement RC remains constant. In the case of an engine replacement, the mileage state is reset to zero. Note that we do not attempt to describe Harold Zurcher's decision-making process. Instead, we are interested in how a generic decision-maker should make decisions in this setting. The immediate utility of each action is given by:

u(x_t, ε_t, a_t) = −RC + ε_t(1)   if a_t = 1,
u(x_t, ε_t, a_t) = −c(x_t) + ε_t(0)   if a_t = 0.

Decisions are made in light of uncertainty about next month's state variables, captured by their conditional distribution p(x_t, ε_t, a_t).
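Under Assumption 1, the worst-case objective in Equation (3.2) can be computed by a robust analogue of value iteration: rectangularity allows the inner minimization over the ambiguity set to be solved independently at each state-action pair. The following sketch illustrates this with finite ambiguity sets and toy numbers of our own; it is not the paper's implementation:

```python
import numpy as np

# Toy setup: 3 states, 2 actions. For each (s, a), the ambiguity set is a
# finite collection of candidate transition distributions over next states.
n_states, n_actions, delta = 3, 2, 0.95
utility = np.array([[1.0, 0.5], [0.2, 0.8], [-0.5, 0.0]])  # u(s, a)

rng = np.random.default_rng(0)
ambiguity = {(s, a): [rng.dirichlet(np.ones(n_states)) for _ in range(5)]
             for s in range(n_states) for a in range(n_actions)}

def robust_bellman(v):
    """One application of the robust Bellman operator under rectangularity."""
    q = np.empty((n_states, n_actions))
    for s in range(n_states):
        for a in range(n_actions):
            # The worst case is taken separately at each state-action pair.
            worst = min(p @ v for p in ambiguity[s, a])
            q[s, a] = utility[s, a] + delta * worst
    return q.max(axis=1)

v = np.zeros(n_states)
for _ in range(10_000):          # successive approximation of the fixed point
    v_new = robust_bellman(v)
    if np.max(np.abs(v_new - v)) < 1e-10:
        break
    v = v_new
```

The robust decision rule then picks, in every state, the action attaining the maximum in the last application of the operator.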
Although in this framework the utility, and consequently the value function, is finite in each state, they are not uniformly bounded. Uniform boundedness, however, is a crucial assumption for the results of Blackwell (1965) and Denardo (1967) on the contraction property of the Bellman operator and the stationarity of the optimal policy in the standard MDP setting. For the original as-if analysis, Rust (1988) circumvents this problem by imposing conditional independence between the observable and unobservable state variables, i.e.,

p(x_{t+1}, ε_{t+1} | x_t, ε_t, a_t) = p(x_{t+1} | x_t, a_t) q(ε_{t+1} | x_{t+1}),

and by assuming that the unobservables ε_t(a_t) are independently and identically distributed according to an extreme value distribution with mean zero and scale parameter one. These two assumptions, together with the additive separability between the observed and unobserved state variables in the immediate utilities, ensure that the expectation of the next period's value function is independent of time. The regenerative structure of the process implies that the transition probabilities in case of replacement in any mileage state correspond to the probabilities of maintenance in the zero-mileage state. Therefore, the expected value function is the unique fixed point of a contraction mapping on the reduced space of mileage states only. In addition, the conditional choice probabilities P(a_t | x_t) have a closed-form solution (McFadden, 1973). We build on these results and extend them to our robust setting with ambiguous transition dynamics. The proof is available in Appendix A.

In the analysis of the original bus replacement problem, the distribution of the monthly mileage transitions is estimated in a first step and used as a plug-in component for the subsequent analysis. We extend the original setup and explicitly account for the ambiguity in the estimation. Following the arguments on the regenerative structure of the process above, we incorporate ambiguity in the RMDP with ambiguity sets conditional on the mileage states x only. We construct ambiguity sets P̂(x; ω) based on the Kullback-Leibler divergence D_KL (Kullback and Leibler, 1951) that are statistically meaningful, computationally tractable, and anchored in the empirical estimates p̂(x). Our ambiguity set takes the following form for each mileage state x:

P̂(x; ω) = { p ∈ ∆̊^{|J_x|} : D_KL(p ‖ p̂(x)) ≤ ρ_x(ω) },

where J_x = {j₁, …, j_{|J_x|}} denotes the set of all states that have an estimated non-zero probability of being reached from x, ∆̊^{|J_x|} = { p ∈ R^{|J_x|} : p_i > 0 for all i = 1, …, |J_x| and Σ_{i=1}^{|J_x|} p_i = 1 } is the interior of the (|J_x| − 1)-dimensional probability simplex, and ρ_x(ω) captures the size of the set for state x at a given level of confidence ω. Iyengar (2002) and Ben-Tal et al. (2013) provide the statistical foundation to calibrate ρ_x(ω) such that the true (but unknown) distribution p₀ is contained within the ambiguity set at a given level of confidence ω. Let χ²_df denote a chi-squared random variable with df degrees of freedom, and let F_df(·) denote its cumulative distribution function with inverse F⁻¹_df(·). Then, the following approximate relationship holds as the number of observations N_x for state x tends to infinity (Pardo, 2005):

2 N_x D_KL(p̂(x) ‖ p₀) →_d χ²_{|J_x|−1}.

We can therefore calibrate the size of the ambiguity set based on the following relationship:

ρ_x(ω) = F⁻¹_{|J_x|−1}(ω) / (2 N_x).   (4.1)

We use Rust's (1987) original data to inform our computational experiments. His data consist of monthly odometer readings x_t and engine replacement decisions a_t for 162 buses.
The fleet consists of eight groups that differ in their manufacturer and model. We focus on the fourth group of 37 buses with a total of 4,292 monthly observations. We discretize mileage into 78 equally spaced bins of length 5,000 and set the discount factor to δ = 0.9999. Figure 7 highlights the limited information about the true distribution of mileage utilization. It shows the number of observations available to estimate next month's utilization for different levels of accumulated mileage. While there are more than 1,150 observations on buses with less than 50,000 miles, there are only about 220 with more than 300,000 miles.

We analyze a specific example of Rust's (1987) bus replacement problem. We do not use his reported estimates of the maintenance and replacement costs. Given these estimates, decisions are mainly driven by the unobserved state variable ε_t, and so ambiguity about the evolution of mileage would have little effect on decisions. Thus, we specify the cost function c(x_t) = 0.4 x_t and set the replacement costs RC to 50. We solve the model using a modified version of the original nested fixed point algorithm (NFXP) (Rust, 1988), and we determine the worst-case transition probabilities in each successive approximation of the fixed point. Given the size of the ambiguity set, we can determine the worst-case probabilities as the solution to a one-dimensional convex optimization problem (Iyengar, 2005; Nilim and El Ghaoui, 2005).⁷

We first study as-if and robust decision rules for Rust's (1987) observed sample of mileage transitions. We present the estimated transition probabilities and the corresponding worst-case distributions. We then explore alternative decision rules based on several RMDPs, outline the resulting differences in maintenance decisions, and evaluate their relative performance under different scenarios.

Figure 8 shows the point estimates p̂ for the transition probabilities of monthly mileage usage. We pool all 4,292 observations to estimate this distribution by maximum likelihood, and thus the probability distribution of next period's mileage utilization is the same for each state x_t. We only observe increases of at most J = 3 grid points per month. For about 60% of the sample, monthly bus utilization is between 5,000 and 10,000 miles. Very high usage of more than 10,000 miles amounts to only 1.2%.

The confidence level ω and the available number of observations N_x determine the size of the ambiguity set, as outlined in Equation (4.1). From now on, we mimic state-specific ambiguity sets by constructing them based on the average number of 55 observations per state. Note that while the estimated distribution is the same for all mileage levels, its worst-case realization is not. However, there are only minor differences across mileage levels, so we focus the following discussion on a bus with an odometer reading of 75,000 miles. Figure 9 shows the worst-case transition probabilities for different sizes of the ambiguity set. On the left, we vary the confidence level while using the whole number of observations (N_x = 55); on the right, the level of confidence remains fixed (ω = 0.95) and we cut the number of observations roughly in half. The larger the ambiguity set, the more probability is attached to higher mileage utilization, resulting in higher costs overall. For example, while a mileage increase of 10,000 miles or more is an infrequent occurrence in the data (1.2%), its worst-case probability increases first to 1.7% and then, roughly double the point estimate, to 2.5% as we raise the confidence level. When only about half the data is available, this probability increases even further, to 3.2%.
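Both steps, calibrating ρ_x(ω) via Equation (4.1) and solving the one-dimensional inner problem, are easy to implement. The following sketch is our own illustration (not the paper's modified NFXP) and uses the standard dual representation of the KL-constrained worst case (Nilim and El Ghaoui, 2005); the numbers are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2

def rho(omega, n_obs, support_size):
    """Size of the ambiguity set, Equation (4.1)."""
    return chi2.ppf(omega, df=support_size - 1) / (2 * n_obs)

def worst_case(v, p_hat, rho_x):
    """Solve min_p p @ v s.t. D_KL(p || p_hat) <= rho_x via its 1-d dual."""
    v = v - v.min()                   # shifting v stabilizes the exponentials
    def neg_dual(mu):                 # negative of the concave dual objective
        return mu * rho_x + mu * np.log(p_hat @ np.exp(-v / mu))
    mu = minimize_scalar(neg_dual, bounds=(1e-10, 1e4), method="bounded").x
    p = p_hat * np.exp(-v / mu)       # exponentially tilted distribution
    return p / p.sum()

# Hypothetical example: |J_x| = 4 reachable states, 55 observations, omega = 0.95.
p_hat = np.array([0.385, 0.600, 0.010, 0.005])   # point estimate
v = np.array([-10.0, -12.0, -15.0, -20.0])       # continuation values
p_worst = worst_case(v, p_hat, rho(0.95, n_obs=55, support_size=4))
```

The tilted solution shifts probability mass toward transitions with low continuation values, which is exactly the pattern in Figure 9: larger ambiguity sets attach more weight to high mileage utilization.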
The decision-maker chooses whether to perform regular maintenance work on a bus or replace its complete engine each month. Under a robust decision rule, the assumed transition probabilities correspond to the worst-case transitions within the ambiguity set. As a result, any differences between the as-if and robust decision rules are driven by this worst-case adjustment.

To gain further insights into the differences between the as-if and robust decisions, we simulate a fleet of 1,000 buses for 100,000 months under the alternative decision rules. Figure 11 shows the level of accumulated mileage over time for a single bus under different decision rules. It clarifies our simulation setup, in which we apply different decision rules to the same bus: the realizations of observed transitions and unobserved signs of wear and tear remain the same. The bus accumulates more and more mileage until Harold Zurcher replaces the complete engine and the odometer is reset to zero. The first replacement happens after 20 months at 60,000 miles following the as-if decision rule, while it is delayed for another four months under the robust alternative (ω = 0.95). As its timing differs, the odometer readings start to diverge after 20 months, even though monthly utilization remains the same.

Notes: We apply a Savitzky-Golay filter (Savitzky and Golay, 1964) to smooth the simulation results.

We now turn to the situation before any data are realized. We evaluate the ex-ante performance of as-if and robust decision functions over the whole probability simplex and determine the optimal level of robustness. We operationalize our analysis as follows. In line with Rust's (1987) assumption on the distribution of the mileage utilization, we specify a uniform grid with 0.1 increments over the interior of the two-dimensional probability simplex ∆̊³. At each grid point, we draw 100 samples of 55 random mileage utilizations. For each sample, we solve several robust decision functions on a grid of ω ∈ {0.0, 0.1, …, 1.0} using the estimated transition probabilities. Note that the uncertainties are coupled across states, as the same underlying probability generates the whole sample of bus utilizations. Thus, the rectangularity assumption does not reflect the economic environment. However, we still impose it when constructing the robust decision functions to ensure tractability. We then simulate the implied decision rules' actual performance and compute their expected performance by averaging across the 100 runs at each grid point. Using this information, we measure the performance of the different decisions based on the maximin criterion, the minimax regret rule, and the subjective Bayes approach with a uniform prior.

In Figure 14, we illustrate the differences in expected performance between a robust decision function (ω = 0.1) and the as-if alternative over the probability simplex. Based on the maximin criterion, decision functions rank higher when the confidence level ω used to construct them is greater. The decision function with ω = 0.3 comes in first, while as-if decisions rank last. Thus, decision-makers can improve their worst-case outcomes by adopting a robust decision function. However, this comes at a cost, as indicated by the improved rankings of the as-if decision function as we move to the other criteria. As-if decisions move to second place under minimax regret, and the as-if decision rule comes in first when we aggregate performance across all grid points using a subjective Bayes approach with a uniform prior.
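Given the simulated performance table, the final aggregation step is only a few lines. A sketch with our own (hypothetical) array layout, where performance[i, j] holds the expected discounted utility of the decision function with robustness omegas[j] at the i-th grid point of the simplex:

```python
import numpy as np

omegas = np.round(np.arange(0.0, 1.1, 0.1), 1)  # levels of robustness
n_grid = 36                                     # interior grid points of the simplex

# Placeholder: in the application, each entry is an average over 100 samples.
rng = np.random.default_rng(0)
performance = rng.normal(size=(n_grid, omegas.size))

# Regret at each grid point: best achievable within the class minus actual.
regret = performance.max(axis=1, keepdims=True) - performance

omega_maximin = omegas[performance.min(axis=0).argmax()]  # highest worst case
omega_regret = omegas[regret.max(axis=0).argmin()]        # lowest maximum regret
omega_bayes = omegas[performance.mean(axis=0).argmax()]   # uniform prior
```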
Thus, our approach clarifies the trade-offs involved when choosing a particular decision function for decision-making. We now determine the optimal size of the ambiguity set ω* for each decision-theoretic criterion. Under the maximin criterion, the optimal level of robustness is ω* = 0.3, the level that also comes in first in the ranking above. The minimax regret criterion leads to a slightly reduced level of ω* = 0.1. As-if decisions are optimal based on the subjective Bayes criterion with a uniform prior.

Economists often estimate economic models on data and use the point estimates as a stand-in for the truth when studying the model's implications for optimal decision-making. This practice ignores model ambiguity, exposes the decision problem to misspecification, and ultimately leads to post-decision disappointment. We develop a framework to explore, evaluate, and optimize robust decision rules that explicitly account for the uncertainty in the estimation using statistical decision theory. We show how to operationalize our analysis by studying robust decisions in a stochastic dynamic investment model in which a decision-maker directly accounts for uncertainty in the model's transition dynamics.

As our core contribution, we combine ideas from data-driven robust optimization (Bertsimas et al., 2018), robust Markov decision processes (Ben-Tal et al., 2009), and statistical decision theory (Berger, 2010) to optimize robustness in decision-making. This insight transfers directly to many other settings. For example, the COVID-19 pandemic provides a timely example of economists informing policy-making by using highly parameterized models in light of ubiquitous uncertainties (Avery et al., 2020). When analyzing these models, economists treat many of their parameters as if they were known. However, their actual values are uncertain, as they are often estimated based on external data sources. Using statistical decision theory, our research illustrates how to conduct robust policy-making and how to evaluate its relative performance against policies that ignore uncertainty. Such an approach promotes a sound decision-making process, as it provides decision-makers with the tools to systematically navigate the uncertainties they face (Berger et al., 2021).

A. The robust contraction mapping

Rust (1987) shows that the expectation of the next period's value function is the fixed point of a contraction mapping on the mileage states x only. He uses the regenerative property of the mileage process together with the conditional independence and extreme value assumptions discussed in Section 4. We extend this result to the robust operator Λ(·), which evaluates the expectation under the worst-case distribution in the ambiguity set.

Theorem. Λ(·) is a contraction mapping on (V, ‖·‖∞) with unique fixed point EV.

Proof. Let v, w ∈ V be arbitrary. Fix x ∈ X and assume without loss of generality that Λ(v)(x) ≥ Λ(w)(x). Fix ν > 0 and, for each action a, choose p_a from the ambiguity set such that it attains the infimum defining Λ(w)(x), which involves terms of the form u(x′, a) + ε(a) + δ w(x′), up to ν. The infimum defining Λ(v)(x) is taken over the same ambiguity set. This holds in particular for p_a, which yields:

0 ≤ Λ(v)(x) − Λ(w)(x) ≤ δ max_a E_{p_a} [ v − w ] + ν ≤ δ ‖v − w‖∞ + ν.

Arguing vice versa for Λ(w)(x) ≥ Λ(v)(x), this implies that

|Λ(v)(x) − Λ(w)(x)| ≤ δ ‖v − w‖∞ + ν for all x ∈ X.

With ν arbitrary and δ ∈ [0, 1), this shows that Λ is a contraction mapping on V with respect to ‖·‖∞. As (V, ‖·‖∞) is a Banach space, the result is established.
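The contraction property is also easy to verify numerically. The following check uses a toy robust analogue of the expected value function operator, with a logsum over actions and a worst case over a finite ambiguity set; it is our own construction, not the paper's exact Λ:

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, delta = 10, 0.9999
u = rng.normal(size=(n_x, 2))                    # utilities of the two actions
p_set = [rng.dirichlet(np.ones(n_x), size=n_x)   # candidate transition matrices,
         for _ in range(5)]                      # one row per mileage state

def robust_ev(v):
    """Worst-case expectation of the logsum over a finite ambiguity set."""
    logsum = np.log(np.exp(u + delta * v[:, None]).sum(axis=1))
    return np.min([p @ logsum for p in p_set], axis=0)

v, w = rng.normal(size=n_x), rng.normal(size=n_x)
lhs = np.max(np.abs(robust_ev(v) - robust_ev(w)))
assert lhs <= delta * np.max(np.abs(v - w)) + 1e-12  # modulus-delta contraction
```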
References

The career costs of children
Optimal wind farm allocation in multi-area power systems using distributionally robust optimization approach
Measuring the sensitivity of parameter estimates to estimation moments
On the informativeness of descriptive statistics for structural estimates
Sensitivity analysis using approximate moment condition models
Alternative approaches to the theory of choice in risk-taking situations
Policy implications of models of the spread of coronavirus: Perspectives and opportunities for economists
Quantitative analysis of multiparty tariff negotiations
Pricing uncertainty induced by climate change
Robust solutions of optimization problems affected by uncertain probabilities
Robust optimization
Statistical decision theory and Bayesian analysis
Rational policymaking during a pandemic
Data-driven robust optimization
Robust and data-driven optimization: Modern decision making under uncertainty
Discounted dynamic programming. The Annals of Mathematical Statistics
Employment, hours of work and the optimal taxation of low-income families
Minimizing sensitivity to model misspecification
Aspirational preferences and their representation by risk measures
Locally robust semiparametric estimation
Great expectations: The theory of optimal stopping
Counterfactual sensitivity and robustness
Percentile optimization for Markov decision processes with parameter uncertainty
Contraction mappings in the theory underlying dynamic programming
An anatomy of international trade: Evidence from French firms
Structural models for policy-making: Coping with parametric uncertainty
The effect of disability insurance receipt on labor supply
Theory of decision under uncertainty
Data uncertainty in Markov chains: Application to cost-effectiveness analyses of medical innovations
Calibration of distributionally robust empirical optimization models
Robust Markov decision process
Data-driven patient scheduling in emergency departments: A hybrid robust-stochastic approach
Asymptotics for statistical treatment rules
The informativeness of estimation moments
Does strategic ability affect efficiency? Evidence from electricity markets
Estimating the innovator's dilemma: Structural analysis of creative destruction in the hard disk drive industry, 1981-1998
Paralyzed by fear: Rigid and discrete pricing under demand uncertainty
Comment on "Constrained optimization approaches to estimation of structural models"
Robust dynamic programming
Robust dynamic programming. Mathematics of Operations Research
Tail risk and robust portfolio decisions
Sensitivity to calibrated parameters. Review of Economics and Statistics, forthcoming
Living-donor liver transplantation timing under ambiguous health state transition probabilities
A survey of decision making and optimization under uncertainty
Who should be treated? Empirical welfare maximization methods for treatment choice
Risk, uncertainty and profit
On information and sufficiency
Robust MDPs with k-rectangular uncertainty
Bias and variance approximation in value function estimates
Data-driven methods for Markov decision problems with parameter uncertainty
Statistical treatment rules for heterogeneous populations
The 2009 Lawrence R. Klein Lecture: Diversified treatment choice under ambiguity
Econometrics for decision making: Building foundations sketched by Haavelmo and Wald. Econometrica, forthcoming
Model uncertainty
Conditional logit analysis of qualitative choice behavior
A robust optimization model for managing elective admission in a public hospital
Zur Preisbildung bei ungewissen Erwartungen [On price formation under uncertain expectations]
Robust control of Markov decision processes with uncertain transition matrices
Statistical inference based on divergence measures
Markov decision processes: Discrete stochastic dynamic programming
Distributionally robust optimization: A review
Divide and conquer: Recursive likelihood function integration for hidden Markov models with continuous latent variables
A Python package for robust optimization
An open-source package for the simulation and estimation of a prototypical infinite-horizon dynamic discrete choice model based on Rust
Optimal replacement of GMC bus engines: An empirical model of Harold Zurcher
Maximum likelihood estimation of discrete control processes
Structural estimation of Markov decision processes
Ambiguous partially observable Markov decision processes: Structural results and applications
Data-driven distributionally robust control of energy storage to manage wind power fluctuations
The foundations of statistics
Smoothing and differentiation of data by simplified least squares procedures
The optimizer's curse: Skepticism and postdecision surprise in decision analysis
Minimax regret treatment choice with finite samples
Minimax regret treatment choice with covariates or with limited validity of experiments
Constrained optimization approaches to estimation of structural models
Statistical treatment choice based on asymmetric minimax regret choice
Statistical decision functions
Robust Markov decision processes
Worst-case value at risk of nonlinear portfolios