Multistage Utility Preference Robust Optimization
Jia Liu, Zhiping Chen, Huifu Xu
2021-09-10

In this paper, we consider a multistage expected utility maximization problem where the decision maker's utility function at each stage depends on historical data and the information on the true utility function is incomplete. To mitigate the risk arising from ambiguity of the true utility, we propose a maximin robust model where the optimal policy is based on the worst sequence of utility functions from an ambiguity set constructed with partially available information about the decision maker's preferences. We then show that the multistage maximin problem is time consistent when the utility functions are state-dependent and demonstrate with a counterexample that time consistency may not be retained when the utility functions are state-independent. With the time consistency, we show that the maximin problem can be solved by a recursive formula whereby a one-stage maximin problem is solved at each stage beginning from the last stage. Moreover, we propose two approaches to construct the ambiguity set: a pairwise comparison approach and a $\zeta$-ball approach where a ball of utility functions centered at a nominal utility function under the $\zeta$-metric is considered. To overcome the difficulty arising from solving the infinite dimensional optimization problem in computation of the worst-case expected utility value, we propose a piecewise linear approximation of the utility functions and derive an error bound for the approximation under moderate conditions. Finally, we develop a scenario tree-based computational scheme for solving the multistage preference robust optimization model and report some preliminary numerical results.

Decision making under uncertainty has two important elements: belief and taste. Belief is the decision maker's (DM for brevity) view on the state of nature of the underlying uncertainty, whereas taste is the DM's preference. When there is ambiguity about belief, one may base the optimal decision on the worst-case scenario of the uncertainty or the worst-case probability distribution [18], and this is the case of robust optimization or distributionally robust optimization. Over the past few decades, a large body of research has been devoted to robust optimization and distributionally robust optimization models, see the monograph by Ben-Tal et al. [3] and a comprehensive overview by Rahimian and Mehrotra [38]. Ambiguity may also occur with regard to taste. A DM might be able to tell his preferences, but it is often difficult to identify a function which precisely characterizes them. Such ambiguity may arise from a lack of accurate description of human behaviour [44], cognitive difficulty or incomplete information [29, 49]. It could also arise when the decision making problem involves several stakeholders who fail to reach a consensus. In the well-established theory of expected utility, von Neumann and Morgenstern [45] show that if a DM satisfies certain axiomatic properties including completeness, continuity and independence, there is a utility function which can be used to describe the DM's preference. The utility function is unique up to a positive linear transformation and can be identified when the space of random prospects facing the DM is relatively simple.
In practice, however, identifying a DM's utility function may be a difficult task, either because the decision making problem is too complex or because it involves several stakeholders who fail to reach a consensus. In the literature of decision analysis and behavioural economics, a popular method is to elicit the DM's preferences with paired gambling approaches for preference comparisons or certainty equivalence [17] and use the elicited information to construct an approximate utility function via some interpolation method, see for instance [7]. Armbruster and Delage [1] take a different approach to handle the issue. Instead of trying to find an approximate von Neumann-Morgenstern utility function, they use the available information on the DM's preferences, such as preferring certain lotteries over other lotteries, being risk averse over gains and risk taking over losses, and being prudent, to construct an ambiguity set of plausible utility functions, and then base the optimal decision on the worst-case utility function from the ambiguity set. The approach is called preference robust optimization (PRO) as it follows the general philosophy of robust optimization. Hu and Mehrotra [24] also take a PRO approach to tackle the ambiguity of the true utility function, but in a slightly different manner. They consider a probabilistic representation of the class of increasing concave utility functions by confining them to a compact interval and scaling them to be bounded by 1. In doing so, they propose a moment-type framework for constructing the ambiguity set of the DM's utility functions which covers a number of important approaches such as certainty equivalence and pairwise comparison.

Over the past few years, PRO has attracted increasing attention. For instance, Haskell et al. [22] propose a robust model which handles the ambiguity of the DM's belief and taste. Hu and Stepanyan [25] propose a so-called reference-based almost stochastic dominance method for constructing a set of utility functions near a reference utility which satisfy a certain stochastic dominance relationship, and use the set to characterize the DM's preference. Hu et al. [26] consider a PRO model with an ambiguity set of general utility functions and propose a Lagrangian function approach for solving the resulting maximin problem. Guo and Xu [19, 21] propose a piecewise linear approximation approach for solving a PRO model with the ambiguity set specified by moment-type conditions and derive a bound for the approximation error in terms of the ambiguity set, the optimal value and the optimal solutions. The PRO approach has also been effectively applied to risk management problems where the DM's risk preferences are ambiguous, see [12, 20, 32, 47, 51, 53].

In this paper, we extend this stream of research to the multistage decision making process. There are several main modelling approaches in multistage decision making, such as multistage stochastic optimization, Markov decision processes, approximate dynamic programming and reinforcement learning [37]. Among them, multistage stochastic optimization (MSO) has been widely studied and applied in long term financial planning, pension fund management, energy production and trading, supply chain management and inventory control [35], as it can flexibly characterize the dynamic dependence structure of the random data process. A key component in multistage decision making modelling is the dynamic decision criterion, i.e., the objective function of the multistage stochastic optimization model.
One of the most widely adopted objective functions is the multistage expected utility model, which can also be understood as a kind of multistage risk function where the utility function at each stage characterizes the dynamic preference of the DM [13]. There are basically three types of multistage expected utility models: the terminal utility model, the additive utility model and the recursive utility model [8]. Like terminal risk measures, the terminal utility model may lead to time-inconsistent optimal policies as it only measures the utility of the reward at the terminal stage. The additive utility model, which is the most extensively studied, considers the sum of utilities of rewards at different stages; thus the DM's intertemporal preferences are risk neutral [13, 41]. The recursive utility model, also known as stochastic differential utility in the continuous time setting, characterizes the DM's nonlinear intertemporal preferences. The model has a natural connection with time consistency of the optimal dynamic policy. Important contributions include the recursive expected utility [30] and the well-known Kreps-Porteus utility, which is recursive but not necessarily expected utility [31].

However, traditional expected utility theory has received many criticisms for its failure to explain some experimental observations and theoretical puzzles such as the Allais paradox. Rank-dependent expected utility theory and cumulative prospect theory were subsequently proposed to address these drawbacks, see the monograph by Puppe [34] for an overview of the development of the theories. In a dynamic setting, Hu et al. [27] study a continuous-time portfolio selection model where a sequence of time-dependent probability weighting functions and rank-dependent utility functions are used to capture a DM's overweighting and underweighting behaviours on tail losses/rewards at different stages, see also [10] for empirical studies. In all these works, the DM's utility functions are assumed to be known exactly and fixed in the decision making process. However, as we discussed earlier, the DM's utility function may be ambiguous, and this motivates us to propose a PRO model for the multistage decision making process.

Moreover, many studies argue that utility functions may be state-dependent. The most widely adopted approach is to consider a parametric form of utility functions where the parameters are state-dependent. For instance, Strub and Li [42] consider a sequence of S-shaped utility functions parameterized by a sequence of state-dependent reference points and show that failing to update the reference point as the state changes may lead to time-inconsistent investments. Likewise, He et al. [23] consider a series of state-dependent distortion functions when they apply rank-dependent expected utility theory to continuous time investment problems. Björk et al. [5] adopt a state-dependent risk-aversion parameter in the multistage mean-risk model. There is also a specific stream of research on the so-called habit formation utility, where the DM's consumption habit level, and hence his utility at a particular stage and state, is determined by the historical consumption process [13]. In this paper, we will also use the habit formation utility model and concentrate on a situation where the DM's utility at each stage is ambiguous but it is possible to use partially available information to construct a set (called the ambiguity set later on) of plausible utility functions which capture the DM's preferences. Two ways are proposed to construct the ambiguity set.
One is the pairwise comparison approach, which is widely used in the literature of PRO models and behavioural economics. The other is to construct a ball of utility functions centered at a nominal utility function under some pseudo-metric. The main challenge to be tackled is to develop efficient computational schemes for solving the resulting multistage PRO models.

The main contributions of the paper can be summarized as follows. First, we propose a multistage PRO model where the DM's utility preferences at different stages depend not only on the current stage and state but also on the history of the underlying random data process leading to the state. We introduce a definition of the ambiguity set comprising certain sequences of state-wise utility functions and a maximin optimization model where the optimal policy is based on the worst-case overall expected utility value of the random reward functions at different stages computed with the ambiguity set. Second, we introduce the concept of rectangularity of the ambiguity set of utility functions. Under some moderate conditions, we show that the multistage maximin problem with a state-dependent ambiguity set is time consistent and demonstrate through a simple example that the problem is time inconsistent when the utility functions are state-independent. Third, by utilizing the time consistency, we derive a recursive formula for solving the multistage PRO problem when the utility functions are restricted to piecewise linear ones. For ambiguity sets of general utility functions, error bounds for both the ambiguity set and the optimal value are derived when the utility functions in the ambiguity set are approximated by piecewise linear utility functions at each stage. To tackle the nonlinearity in solving the maximin problem at each stage, we propose a scenario tree approach which reformulates the holistic maximin problem as a single mixed integer linear program. Fourth, we apply the proposed PRO model and the computational scheme to a multistage investment-consumption problem. We examine the convergence of the piecewise linear approximation as the number of breakpoints increases, and the impact of variations of the radius of the Kantorovich ball and of the number of questionnaires. We also investigate the price of robustness as the number of time periods varies. We find that the model works well when the number of stages is small.

The rest of the paper is organized as follows. Section 2 defines the multistage expected utility maximization models to be discussed in this paper. Section 3 introduces the robust counterparts and discusses rectangularity and time consistency of the models when the utility functions are state-dependent. Section 4 demonstrates via an example that the multistage PRO model is time inconsistent when the utility functions are state-independent. Section 5 details the construction of the ambiguity set with two approaches: pairwise comparisons and the $\zeta$-ball. In the latter approach, a piecewise linear approximation approach is proposed to approximate the general utility functions and error bounds are derived. Section 6 details tractable reformulations of the robust models and Section 7 reports the numerical results. Finally, Section 8 gives some concluding remarks.

2 Multistage expected utility maximization models

We begin by introducing notions and notation that are commonly used in multistage stochastic programming. Let $\xi = \{\xi_t\}_{t=1}^T$ be a stochastic process defined on some probability space $(\Omega, \mathcal{F}, P)$, where $\xi_t : \Omega \to \mathbb{R}^{d_t}$ is a random vector supported on $\Xi_t$ for $t = 1, \ldots, T$.
For simplicity of notation, we write $\xi_{[t]}$ for the historical information $(\xi_1, \ldots, \xi_t)$. Let $\mathcal{F}_t$ denote the sigma algebra in the sample space $\Omega$ generated (induced) by $\xi_{[t]}$, that is, $\mathcal{F}_t := \sigma(\xi_{[t]}) = \{\xi_{[t]}^{-1}(B) : B \in \mathcal{B}(\Xi_{[t]})\}$, where $\mathcal{B}(\Xi_{[t]})$ denotes the Borel sigma algebra of the set $\Xi_{[t]} := \Xi_1 \times \cdots \times \Xi_t$. By convention, we assume that there is an initial state $\xi_0$ which is deterministic and corresponds to the trivial sigma algebra $\mathcal{F}_0 = \{\emptyset, \Omega\}$. Consequently, we have $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_t \subset \mathcal{F}_{t+1} \subset \cdots \subset \mathcal{F}_T \subset \mathcal{F}$. As $\mathcal{F}_t$ is generated by $\xi_{[t]}$, we denote $\mathbb{E}_{|\mathcal{F}_t}[\cdot] := \mathbb{E}[\cdot \mid \xi_{[t]}]$ for simplicity and $\mathbb{E}_{|\mathcal{F}_0}[\cdot] := \mathbb{E}[\cdot]$.

Let $L_p(\Omega, \mathcal{F}, P; \mathbb{R})$ denote the set of random variables $\psi : (\Omega, \mathcal{F}, P) \to \mathbb{R}$ with finite $p$-th moments, i.e., $\int_\Omega |\psi(\omega)|^p \, dP(\omega) < \infty$ for $p > 0$. $L_p(\mathbb{R})$ denotes the set of real functions $u : \mathbb{R} \to \mathbb{R}$ integrable to the $p$-th order, i.e., $\int_{\mathbb{R}} |u(x)|^p \, dx < \infty$ for $p > 0$. $L_p(\Omega, \mathcal{F}, P; L_p(\mathbb{R}))$ denotes the set of random integrable functions $\hat{u} : (\Omega, \mathcal{F}, P) \to L_p(\mathbb{R})$ with finite $p$-th moments. The state-dependent utility function is a particular case of the random integrable function $\hat{u}_t(x, \omega) : (\Omega, \mathcal{F}_t, P) \to L_p(\mathbb{R})$, which has the structure $\hat{u}_t(x, \omega) = u_t(x, \xi_{[t-1]}(\omega))$. By writing $u_t$ as a function of $\xi_{[t-1]}(\omega)$, we mean that the DM's utility preference at stage $t$ is dependent on the historical data $\xi_{[t-1]}$. Throughout the paper, we will use $u_t(\cdot, \xi_{[t-1]})$ to denote the single attribute, multi-parametric utility function which depends on historical data, and use $u_t(\cdot)$ to denote the one which is independent of $\xi_{[t-1]}$.

We consider the following multistage expected utility maximization problem

$$\max_{x_1 \in X_1} \mathbb{E}_{|\mathcal{F}_0}\bigg[ u_1(h_1(x_1, \xi_1)) + \max_{x_2 \in X_2(x_{[1]}, \xi_{[1]})} \mathbb{E}_{|\mathcal{F}_1}\Big[ u_2(h_2(x_2, \xi_2), \xi_{[1]}) + \cdots + \max_{x_T \in X_T(x_{[T-1]}, \xi_{[T-1]})} \mathbb{E}_{|\mathcal{F}_{T-1}}\big[ u_T(h_T(x_T, \xi_T), \xi_{[T-1]}) \big] \Big] \bigg], \quad (2.1)$$

where $h_t : \mathbb{R}^{n_t} \times \mathbb{R}^{d_t} \to \mathbb{R}$ is a continuous reward function at stage $t$, $u_t : \mathbb{R} \times \Xi_{[t-1]} \to \mathbb{R}$ is the utility function characterizing the DM's utility value of the reward at stage $t$, $x_t$ is the decision vector and $X_t(x_{[t-1]}, \xi_{[t-1]})$ is the set of feasible decisions at stage $t$ for $t = 2, \cdots, T$; the expectation at stage 1 is taken w.r.t. the distribution of $\xi_1$, and the expectation at stage $t$ is taken w.r.t. the distribution of $\xi_t$ conditional on the historical data $\xi_{[t-1]}$, for $t = 2, \ldots, T$. In this setup, the DM chooses an optimal decision $x_t$ from $X_t(x_{[t-1]}, \xi_{[t-1]})$ such that the overall expected utility of the rewards is maximized. The utility function at stage $t$ depends not only on the current stage (indicated by the subscript) but also on the historical state (realization of) $\xi_{[t-1]}$. The choice of the optimal decision $x_t$ is independent of the realization of $\xi_t$, which means the decision is made before the realization of the uncertainty $\xi_t$; it is not a recourse action. Of course, we can interpret $u_t(h_t(x_t, \xi_t), \xi_{[t-1]})$ as the optimal value arising from a recourse action. In particular, if the random reward function at stage $t$ depends on the current state $\xi_{t-1}$ rather than the state $\xi_t$ at the next stage (mathematically, replacing $h_t(x_t, \xi_t)$ with $h_t(x_t, \xi_{t-1})$), then problem (2.1) can be written with the rewards $h_t(x_t, \xi_{t-1})$ in place of $h_t(x_t, \xi_t)$; in that formulation, a recourse action $x_t$ is taken after the realization of $\xi_{t-1}$ is observed. We skip the details on recourse actions so that we may focus on the key issues of this paper. Note that if we interpret the utility function as the DM's taste and the distribution of future uncertainty of the reward as belief, in the language of decision analytics, then we can see that the randomness in $u_t$ (the taste) arises from the historical data $\xi_{[t-1]}$, whereas the belief is concerned with future uncertainty of the rewards. Unless specified otherwise, we assume that $X_t(x_{[t-1]}, \xi_{[t-1]})$ is a convex and compact subset of $\mathbb{R}^{n_t}$ for $t = 1, \cdots, T$.
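To make the order of operations in (2.1) concrete, the following minimal sketch evaluates the nested objective for a fixed policy on a small two-stage scenario tree; the utilities, rewards and branching probabilities are hypothetical and serve only to show that the stage-2 conditional expectation $\mathbb{E}_{|\mathcal{F}_1}$ is computed first at each stage-1 node and then averaged at the root.

```python
import numpy as np

# Hypothetical stage-wise utilities (state-independent here for simplicity).
u1 = lambda z: np.sqrt(z)                 # stage-1 utility (assumed)
u2 = lambda z: np.log(1.0 + z)            # stage-2 utility (assumed)

# Rewards h_t(x_t, xi_t) realized on each node of a two-stage tree,
# with equally likely branches at every node (assumed data).
h1 = np.array([0.2, 0.8])                 # two stage-1 nodes
h2 = {0: np.array([0.5, 0.7]),            # children of stage-1 node 0
      1: np.array([0.1, 1.1])}            # children of stage-1 node 1

# Inner conditional expectation E_{|F_1}[u_2(h_2)] at each stage-1 node...
inner = np.array([u2(h2[i]).mean() for i in (0, 1)])
# ...then the outer expectation E[u_1(h_1) + inner] at the root.
value = (u1(h1) + inner).mean()
print(value)
```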
A simplified version of (2.1) arises when the utility function at each stage is state-independent, that is,

$$\max_{x_1 \in X_1} \mathbb{E}_{|\mathcal{F}_0}\bigg[ u_1(h_1(x_1, \xi_1)) + \max_{x_2 \in X_2(x_{[1]}, \xi_{[1]})} \mathbb{E}_{|\mathcal{F}_1}\Big[ u_2(h_2(x_2, \xi_2)) + \cdots + \max_{x_T \in X_T(x_{[T-1]}, \xi_{[T-1]})} \mathbb{E}_{|\mathcal{F}_{T-1}}\big[ u_T(h_T(x_T, \xi_T)) \big] \Big] \bigg]. \quad (2.2)$$

In this model, the DM has the same utility preference in all states at stage $t$, regardless of the overall wealth accumulated over the past $t-1$ stages. In the case when the DM takes an identical view on utilities over all stages, the model may be further simplified by replacing each $u_t$ with a single utility function $u$. In the classical MSP models, utility functions at different stages may be different but they are pre-determined at stage 1, which means the DM cannot adjust his utility at later stages as the dynamic stochastic environment changes, and this is inconsistent with practical decision making processes. Indeed, many theoretical and empirical studies show that the utility function should depend on the current and/or historical state [10, 27, 33]. For example, the DM's utility of wearing a mask in the year of 2020 (at the peak of the COVID-19 epidemic) must be totally different from the utility in normal circumstances. Even at different stages of the COVID-19 epidemic, the utility of wearing a mask varies. Here we assume that at stage $t$, the DM can adjust his utility function according to the current and historical state. Of course, regardless of the stage that the DM is in, his utility function must be specified prior to the decision making at that stage, i.e., $u_t$ does not depend on $\xi_t$. In this paper, we will focus on the case when the utility functions are ambiguous, and we will see that the utility preference robust formulations of the above three models have completely different properties.

In model (2.1), the optimal decision at stage $t$ is a vector in $\mathbb{R}^{n_t}$. However, if we view the decision making from stage 1, then we may regard it as a random function of $\xi_{[t-1]}$. Consequently, we may reformulate the multistage expected utility maximization problem (2.1) as

$$\max_{x_{[1,T]}} \; \mathbb{E}\left[ \sum_{t=1}^{T} u_t\big(h_t(x_t(\xi_{[t-1]}), \xi_t), \xi_{[t-1]}\big) \right], \quad (2.5)$$

where the expectation is taken w.r.t. the distribution of $\xi_{[T]}$ and we write $x_{[1,T]}$ for a sequence of decisions $(x_1, x_2(\cdot), \ldots, x_T(\cdot))$, which is also known as an implementable policy. The reformulation is fundamentally related to Bellman's principle in dynamic programming that an optimal policy at the initial planning stage is consistent with the optimal decisions at each of the remaining stages; we will come back to this in Section 3. A practical application of the multistage utility maximization model is the multistage portfolio selection problem.

Example 2.1 Consider a financial market with $n$ risky assets. Suppose that an investor joins the market at time 0 with a positive initial wealth $w_0$ and plans to invest his wealth in the market for $T$ periods. At each period, the investor gains a reward which could be his end-of-period wealth or the increase of his wealth over this period, i.e., $h_t(r_t, x_t) = (e + r_t)^\top x_t$ or $h_t(r_t, x_t) = r_t^\top x_t$, where $r_t$ is the excess return rate vector of the $n$ risky assets over period $t$ and $e = [1, \cdots, 1]^\top$ is the vector with all components equal to one. The investor presets a utility function $u_t(\cdot)$ which measures his preferences on the reward over that period. Then the investor would like to maximize the overall expected utility over all of the $T$ periods by adjusting his portfolio at the beginning of each period. If the investor's objective is to maximize the overall expected utility of the wealth, the decision making problem can be reformulated as a multistage expected utility maximization problem:

$$\max \; \mathbb{E}\left[ \sum_{t=1}^{T} u_t\big((e + r_t)^\top x_t\big) \right] \quad \text{s.t.} \quad e^\top x_1 = w_0, \;\; e^\top x_t = (e + r_{t-1})^\top x_{t-1}, \;\; x_t \in \mathbb{R}^n_+, \;\; t = 2, \ldots, T,$$
where $x_t$ is the amount of wealth invested in the $n$ risky assets at the beginning of period $t$ and $(e + r_t)^\top x_t$ is the wealth at the end of period $t$. The equality $e^\top x_t = (e + r_{t-1})^\top x_{t-1}$ is the wealth balance equation, which together with the no-shorting constraint characterizes the feasible set $X_t$ of the portfolio $x_t$ at period $t$. If the investor's utility is valued over the state-wise return rates, the objective could be set as $\mathbb{E}\big[\sum_{t=1}^T u_t\big(r_t^\top x_t / (e^\top x_t)\big)\big]$. In this case, by normalizing $\tilde{x}_t = x_t / (e^\top x_t)$, we have an equivalent utility maximization problem over the portfolio weights $\tilde{x}_t$ with $e^\top \tilde{x}_t = 1$.

3 Multistage preference robust optimization models

In the multistage expected utility optimization models that we presented in the previous section, the true utility functions which capture the DM's preferences at each stage are assumed to be known. This assumption may not be satisfied in practice, as we discussed in the introduction. This motivates us to consider a robust model where the optimal decision at each stage is based on the worst-case utility function from a set of plausible utility functions. Since the robust model is essentially built upon von Neumann-Morgenstern expected utility theory, we make a blanket assumption that the DM's preferences at each stage admit an expected utility representation. In practice, a DM's utility preferences may be inconsistent due to cognitive biases [43] and/or elicitation errors [1]. Bertsimas and O'Hair [4] and Armbruster and Delage [1] proposed some approaches to handle this issue, and it is possible to use the same approaches for multistage decision making problems; here we restrict our discussion to the consistent preference case so that we may focus on the key challenges arising from multistage maximin problems.

In the expected utility maximization model (2.1) or its equivalent formulation (2.5), the sequence of dynamic decisions is made with respect to a sequence of utility functions $\{u_t\}$. However, a DM may not have complete information to identify a sequence of true utility preferences, but it is possible to use partial information to build an ambiguity set of plausible utility functions. We begin with a formal definition of the ambiguity set which captures the DM's utility preferences at each stage.

Definition 3.1 (Ambiguity set of state-dependent utility functions) For $t = 2, \ldots, T$, let $\mathcal{U}_t(\xi_{[t-1]})$ denote a set of plausible utility functions $u_t(\cdot, \xi_{[t-1]})$ depending on the historical information $\xi_{[t-1]}$, and let $u_1$ be a real-valued function in the deterministic ambiguity set $\mathcal{U}_1$. The ambiguity set of sequences of utility functions is

$$\mathcal{U} := \big\{ u = (u_1, \ldots, u_T) : u_1 \in \mathcal{U}_1, \; u_t(\cdot, \xi_{[t-1]}) \in \mathcal{U}_t(\xi_{[t-1]}), \; t = 2, \ldots, T \big\}. \quad (3.1)$$

In the case that the utility functions are state-independent, denoted by $\{u_t(\cdot)\}$, the ambiguity set is denoted by $\mathcal{U}$.

In this definition, each utility function in the set $\mathcal{U}_t(\xi_{[t-1]})$ depends on the historical information $\xi_{[t-1]}$, which means the DM's utility preference at stage $t$ is affected not only by the current state $\xi_{t-1}$ prior to the decision making but also by earlier experiences. A classical example of such a state-dependent utility function is the habit-formation utility model, where a DM's utility $u_t(c_t, h_t)$ at stage $t$ depends on both the current consumption $c_t$ and the historical habit level of consumption $h_t = \sum_{j=1}^{t} \alpha_j c_{t-j}$, where $\alpha_j$, $j = 1, \cdots, t$, are positive numbers. The latter can be understood as historical path ($\xi_{[t-1]}$) dependent, see [13, 14]. Likewise, an investor who has experienced tough economic circumstances in the past may be more risk averse at the current stage. The structure of the ambiguity set depends on the available information in concrete decision making problems; we will come back to the details in Section 5.
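As an illustration of the habit-formation utility just described, the sketch below computes the habit level $h_t = \sum_{j=1}^{t} \alpha_j c_{t-j}$ from a consumption history; the concrete functional form of `u_t` (a CRRA-type utility of consumption in excess of habit) is an illustrative assumption of ours, not one prescribed by the paper.

```python
import numpy as np

def habit_level(consumptions, alpha):
    """Habit h_t built from past consumptions c_1, ..., c_{t-1} (oldest first)."""
    past = consumptions[::-1]                  # c_{t-1}, c_{t-2}, ...
    return sum(a * c for a, c in zip(alpha, past))

def u_t(c, h, gamma=2.0):
    # Illustrative (assumed) CRRA-type utility of consumption above the habit:
    # higher habit h makes the same consumption c less attractive.
    return (max(c - h, 1e-8)) ** (1 - gamma) / (1 - gamma)

alpha = [0.5, 0.25, 0.125]                     # assumed decaying habit weights
h = habit_level([1.0, 1.2, 0.9], alpha)        # habit after three periods
print(h, u_t(1.5, h))
```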
To mitigate the risk arising from the ambiguity of the true utility functions in the decision-making process under model (2.5), we propose a robust counterpart where the optimal policy is based on the worst-case sequence of utility functions:

$$\max_{x_{[1,T]}} \; \inf_{u \in \mathcal{U}} \; \mathbb{E}\left[ \sum_{t=1}^{T} u_t\big(h_t(x_t(\xi_{[t-1]}), \xi_t), \xi_{[t-1]}\big) \right]. \quad (3.2)$$

We refer to problem (3.2) with the state-dependent ambiguity set (3.1) as (MS-PRO-SD). Here the maximin robust formulation is based on a holistic view, taken at the very beginning of the decision making process, of both the optimal policy and the expected utility. Specifically, instead of considering the maximin robust formulation at each stage, we compute, for every sequence of feasible decisions $x_{[1,T]}$, the worst-case expected utility with a sequence of utility functions $u$ from the ambiguity set. The optimal policy is subsequently identified via the largest worst-case expected utility value. This kind of maximin robust approach is consistent with the philosophy of robust optimization, particularly the recent multistage distributionally robust optimization models [40]. We call it the multistage utility preference robust optimization model. In the case that the utility functions are independent of states, we may obtain a PRO counterpart for model (2.6) (the state-independent version of (2.5)), denoted (3.3), where the ambiguity set is a set of vectors of utility functions in product form.

An important and widely accepted practice in multistage stochastic programming is that the optimal policy determined at stage 1 should be consistent with the optimal sub-policy to be set at stage $t$ for $t \geq 1$, which is known as time consistency or Bellman's principle [2, 9, 46]. The principle is not automatically fulfilled in the multistage PRO models unless the ambiguity set of utility functions is structured properly. This motivates us to introduce the definitions below.

In multistage risk minimization problems, the time consistency of the optimal dynamic policy can be achieved if the corresponding multistage risk measure is time consistent. The concept of time consistency of a multistage risk measure characterizes an order keeping relationship between different stages: given two investment positions A and B, if A is riskier than B under a specific risk measure at some future time, then A is riskier than B under the same measure from today's perspective [6, 39]. All time-consistent risk measures can be written in a nested form [39]. In multistage distributionally robust optimization, time consistency of the optimal dynamic policy is related to the structure of the dynamic ambiguity set of probability distributions. If the distributionally robust counterpart can be written in a nested form of stage-wise conditional distributionally robust counterparts, known as the rectangular set or recursive multiple-priors set [15, 40], then the optimal policy is time consistent and the dynamic programming solution approach may follow [40]. Likewise, the time consistency of the optimal dynamic policy of the multistage preference robust optimization problem relies on the structure of the preference ambiguity set. We shall define a property on the decomposability of the preference set. To this end, we introduce the concept of rectangularity of the ambiguity set of utility functions. To ease the exposition, we denote the reward function $h_t(x_t, \xi_t)$ by an $\mathcal{F}_t$-adapted random variable $Z_t$.
Definition 3.3 (Rectangularity of the ambiguity set) Let $\mathcal{U}$ be a nonempty set of utility sequences $u$. $\mathcal{U}$ is said to be rectangular if

$$\inf_{u \in \mathcal{U}} \mathbb{E}_{|\mathcal{F}_0}\left[ \sum_{t=1}^{T} u_t(Z_t, \xi_{[t-1]}) \right] = \inf_{u_1 \in \mathcal{U}_1} \mathbb{E}_{|\mathcal{F}_0}\left[ u_1(Z_1) + \inf_{u_2 \in \mathcal{U}_2(\xi_{[1]})} \mathbb{E}_{|\mathcal{F}_1}\left[ u_2(Z_2, \xi_{[1]}) + \cdots + \inf_{u_T \in \mathcal{U}_T(\xi_{[T-1]})} \mathbb{E}_{|\mathcal{F}_{T-1}}\left[ u_T(Z_T, \xi_{[T-1]}) \right] \right] \right] \quad (3.4)$$

for all $\mathcal{F}_t$-adapted random variables $Z_t$, $t = 1, \ldots, T$.

Analogous to the rectangularity of the distributional ambiguity set for multistage DRO problems [40] and the conditional (state-dependent) decomposition of the uncertainty set for multistage robust optimization problems [11], the proposed rectangularity is built on a broad decomposable structure of the inner minimization problem without relying on a specific form of the ambiguity set. The concept differs from the rectangularity in some MSP literature [36] or MDP literature [28, 50], where a product form of sub-ambiguity sets over different stages or states is considered. As noted by Pichler and Shapiro [36], a product form of sub-ambiguity sets is not enough to guarantee the decomposability of a multistage DRO problem. In the multistage PRO problems, this means a product form with deterministic sub-ambiguity sets may lead to state-independent PRO problems which are not rectangular and are time-inconsistent (see Section 4). By adding the state-dependent property to the sub-ambiguity sets, we can show in Proposition 3.1 that the state-dependent preference ambiguity set defined in Definition 3.1 is rectangular.

To study the time consistency of the optimal policy of a PRO model, we shall investigate whether the global optimal solution is consistent with the local optimal solution of the sub-PRO problem over a sub-horizon. In what follows, we will show that the ambiguity set $\mathcal{U}$ defined as in (3.1) is rectangular and the PRO model (MS-PRO-SD) is time consistent. To this end, we introduce an interchangeability principle for the preference robust counterpart. Recall that a linear space $\mathcal{M}$ of $\mathcal{F}$-measurable functions (mappings) $\psi : \Omega \to \mathbb{R}^m$ is said to be decomposable if for every $\psi \in \mathcal{M}$ and $A \in \mathcal{F}$, and every bounded and $\mathcal{F}$-measurable function $\gamma : \Omega \to \mathbb{R}^m$, the space $\mathcal{M}$ also contains the function $\eta(\cdot) := \mathbb{1}_{\Omega \setminus A}(\cdot)\psi(\cdot) + \mathbb{1}_A(\cdot)\gamma(\cdot)$. For example, the space $L_p(\Omega, \mathcal{F}, P; \mathbb{R}^m)$ is decomposable.

Lemma 3.1 Let $\mathcal{Z}$ be a Banach space, $\mathcal{M}$ be a decomposable space of mappings $z : \Omega \to \mathcal{Z}$ and $f : \mathcal{Z} \times \Omega \to \mathbb{R}$ be a random lower semicontinuous function. Then

$$\inf_{z \in \mathcal{M}} \mathbb{E}[F_z] = \mathbb{E}\left[ \inf_{z \in \mathcal{Z}} f(z, \omega) \right], \quad (3.7)$$

where $F_z(\omega) := f(z(\omega), \omega)$, provided that the right hand side of (3.7) is less than $+\infty$.

The result is well-known in the case when $\mathcal{Z}$ is a Euclidean space, see for instance Theorem 7.92 in [41]. We omit the proof as it is similar to the proof of that theorem.

Lemma 3.2 Let $\mathfrak{U}$ denote the class of all state-dependent integrable utility functions $u(\cdot, \cdot)$ such that

$$\mathfrak{U} := \{ u : u(\cdot, \xi) \text{ is a bounded and monotonically increasing real-valued function} \}. \quad (3.8)$$

Then the infimum over $\mathfrak{U}$ can be exchanged with the conditional expectation.

Proof. By the tower property of the expectation operator, the overall expectation can be written as the expectation of the conditional expectation $\mathbb{E}_{P_\eta}[u(Z(\eta), \xi) \mid \xi]$. By viewing $\mathbb{E}_{P_\eta}[u(Z(\eta), \xi) \mid \xi]$ as $f(z, \omega)$ in Lemma 3.1 and applying the lemma, we can exchange the infimum with the outer expectation. The conclusion follows.

With Lemma 3.2, we are ready to deliver the rectangularity of the ambiguity set in the next proposition.

Proposition 3.1 The state-dependent preference ambiguity set in Definition 3.1 is rectangular.

Proof. By the tower property of the expectation operator and Lemma 3.2, the infimum over each $\mathcal{U}_t(\xi_{[t-1]})$ can be moved inside the conditional expectations stage by stage, which gives rise to (3.4).

In problem (3.2), at each stage, the utility function is taken in the worst-case sense from a random set depending on the historical information. By Proposition 3.1, problem (3.2) can be rewritten as

$$\max_{x_{[1,T]}} \inf_{u_1 \in \mathcal{U}_1} \mathbb{E}_{|\mathcal{F}_0}\left[ u_1(h_1(x_1, \xi_1)) + \inf_{u_2 \in \mathcal{U}_2(\xi_{[1]})} \mathbb{E}_{|\mathcal{F}_1}\left[ u_2(h_2(x_2, \xi_2), \xi_{[1]}) + \cdots + \inf_{u_T \in \mathcal{U}_T(\xi_{[T-1]})} \mathbb{E}_{|\mathcal{F}_{T-1}}\left[ u_T(h_T(x_T, \xi_T), \xi_{[T-1]}) \right] \right] \right]. \quad (3.10)$$

Here $\mathcal{U}_1$ does not depend on the random data and thus is deterministic. The reformulation from (3.2) to (3.10) relies on the interchangeability between the operation $\inf_{u_t \in \mathcal{U}_t}$ and the expectation $\mathbb{E}_{|\mathcal{F}_0}$, $t = 2, \ldots, T$.
However, the infimum cannot be further exchanged with $\mathbb{E}_{|\mathcal{F}_{t-1}}$, as the worst-case utility function $u_t$ and the preference ambiguity set $\mathcal{U}_t(\xi_{[t-1]})$ depend on $\xi_{[t-1]}$. At each stage, we would meet a single period preference robust optimization problem which can be viewed as one of the well-studied static PRO models. From (3.10), we can see that the multistage preference robust utility function can be described in a nested form. Analogous to the multistage risk aversion models [8, 39] and the multistage distributionally robust optimization models [40], the nested form guarantees the time consistency of the optimal dynamic policy of problem (3.10), i.e., it can be solved by a recursive dynamic programming procedure.

Theorem 3.1 Problem (3.10) can be solved by the dynamic programming equation

$$V_t(x_{[t-1]}, \xi_{[t-1]}) = \max_{x_t \in X_t(x_{[t-1]}, \xi_{[t-1]})} \inf_{u_t \in \mathcal{U}_t(\xi_{[t-1]})} \mathbb{E}_{|\mathcal{F}_{t-1}}\left[ u_t(h_t(x_t, \xi_t), \xi_{[t-1]}) + V_{t+1}(x_{[t]}, \xi_{[t]}) \right]$$

for $t = 1, \ldots, T$, with $V_{T+1}(\cdot, \cdot) := 0$, and $V_1$ equals the optimal value of problem (MS-PRO-SD). The optimal policy of (MS-PRO-SD) is time consistent.

Proof. At the last stage, we consider a conditional sub-problem of (3.10) at stage $T$, denoted by (3.12). The first equality follows from the additivity of the conditional expectation operator and the fact that the first term does not depend on $x_T$. The second equality follows from Lemma 3.1, which enables us to exchange the order of the operations $\max$ and $\mathbb{E}_{|\mathcal{F}_{T-2}}$. Before applying Lemma 3.1, we verify the upper semicontinuity of the functional inside the expectation. Denote by $V_T(x_{[T-1]}, \xi_{[T-1]})$ the optimal value of the sub-problem (3.12). By applying the procedure up to the first stage, we can show that $V_1$ equals the optimal value of problem (MS-PRO-SD). The only difference at the intermediate stages from the terminal stage is that we have to verify the random upper semicontinuity of the optimal cost-to-go value function $V_{t+1}(x_{[t]}, \xi_{[t]})$. By Theorem 7.38 in [41], we know that the infimum (resp. supremum) operator does not change the random lower semicontinuity of a random function with a bounded domain. Thus, with the boundedness and compactness of $\mathcal{U}_t(\xi_{[t-1]})$ for any $\xi_{[t-1]}$ at any stage $t$, we can verify the random upper semicontinuity.

In some applications, the estimation of the scenario (historical path) dependent preference ambiguity set $\mathcal{U}_t(\xi_{[t-1]})$ may be too complicated for practical use. A natural, simpler approach is to consider the Markovian preference ambiguity set $\mathcal{U}_t(\xi_{t-1})$, which relies only on the randomness at the current stage, $\xi_{t-1}$. Another approach is to consider discrete approximation.

Remark 3.1 It is worth noting that we only need to exchange the order of the conditional expectation with the infimum operator because the considered utility function is additive in both the probability and the temporal dimension. For utilities with nonlinear elasticity of intertemporal substitution, such as recursive or temporal utility functions, Lemma 3.1 is not enough to guarantee the time consistency, as we would need a stronger version which studies the interchangeability between the infimum operator and both the conditional expectation operator and the utility functions at previous stages.

4 Time inconsistency of state-independent multistage PRO models

We assume that the DM is ambiguous about the true utility function, which lies in the ambiguity set

$$\mathcal{U} = \{u_1, u_2\}, \quad \text{with} \quad u_1(z) = \min\{3z, \; 0.5z + 0.5\} \quad \text{and} \quad u_2(z) = 2z - z^2.$$

It is easy to see that $u_i$, $i = 1, 2$, is strictly increasing and concave over $[0, 1]$, and $\mathcal{U}$ is independent of state and stage. We consider a simple two-stage portfolio selection problem (4.1) under the state-independent preference robust expected utility model (3.3), where the second-stage decision $x_2(\cdot)$ lies in $L_0(\mathbb{R}^2_+)$, the space of measurable functions taking finite values in $\mathbb{R}^2_+$, and discuss how the worst-case utility function is identified at each investment stage.
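Before walking through the calculations, the following sketch previews them numerically with the two candidate utilities above and the stage-2 rewards of the scenario tree used below (0.6 and 0.6 under the first stage-1 node, 0.4 and 1.0 under the second, each branch with probability 1/2): choosing one worst-case utility for all stage-2 scenarios ("global") gives a larger value than choosing it node by node ("local").

```python
import numpy as np

u1 = lambda z: np.minimum(3 * z, 0.5 * z + 0.5)   # piecewise linear utility
u2 = lambda z: 2 * z - z ** 2                     # quadratic utility

# Stage-2 rewards below the two stage-1 nodes (from the tree used below).
r2 = {0: np.array([0.6, 0.6]), 1: np.array([0.4, 1.0])}

# Global: one worst-case utility shared by all stage-2 scenarios.
f2_global = min(0.5 * (u(r2[0]).mean() + u(r2[1]).mean()) for u in (u1, u2))
# Local: the worst case is picked after observing the stage-1 node.
f2_local = 0.5 * sum(min(u(r2[i]).mean() for u in (u1, u2)) for i in (0, 1))

print(f2_global)   # 0.825, attained by u1
print(f2_local)    # 0.810, u1 at the first node, u2 at the second
```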
We begin by investigating the rectangularity of the ambiguity set in problem (4.1), which is essentially about the consistency between the global preference robust counterpart with global worst-case utility functions $u_1^*, u_2^*$ and the nested local preference robust counterpart with local worst-case utility functions $\hat{u}_1^*, \hat{u}_2^*(\cdot)$. Note that in both problems (4.2) and (4.3), the decision variables are fixed. Here we set $x_1 = [1, 0]$, $x_{2,1} = [1, 0]$ and $x_{2,2} = [1, 0]$ and demonstrate that the worst-case utility functions of the two problems are different at some state in the second stage. Since both problems are decomposable, we may solve them stage by stage. For the global problem (4.2), the worst-case utility value at each stage is attained by $u_1(\cdot)$. Summing them up, we obtain $f^* = f_1^* + f_2^* = 1.275$. The analysis is depicted at the lhs of Figure 3, where "PLU" denotes the piecewise linear utility function. We now move on to calculate $\hat{f}^*$.

[Figure 3: Worst-case utilities on a two-stage scenario tree (global vs. local). The stage-1 rewards are $x_1^\top r_{1,1} = 0$ and $x_1^\top r_{1,2} = 0.8$; the stage-2 rewards are $x_{2,1}^\top r_{2,1} = 0.6$, $x_{2,1}^\top r_{2,2} = 0.6$, $x_{2,2}^\top r_{2,3} = 0.4$ and $x_{2,2}^\top r_{2,4} = 1.0$.]

From the analysis above, we can see that the DM would adopt the quadratic utility (QU) function $u_2(\cdot)$, which is more risk averse, after he has earned some money (second node at stage 1), but would take the piecewise linear utility function $u_1(\cdot)$ after he has failed to earn anything (first node at stage 1). The analysis is depicted at the rhs of Figure 3. The overall worst-case expected utility value over the two stages is $\hat{f}^* = f_1^* + \hat{f}_2^* = f_1^* + \frac{1}{2}(\hat{f}_{2,1}^* + \hat{f}_{2,2}^*) = 1.26$. Summarizing the calculations of both problems (4.2) and (4.3), we conclude that $f^* = 1.275 > 1.26 = \hat{f}^*$. This is because model (4.2) chooses the worst-case utility function independent of the scenarios in the second stage, whereas model (4.3) chooses the worst-case utility function after observing the scenarios and hence is more conservative. The underlying reason is that the utility functions in the ambiguity set are state-independent.

We now turn to discuss the time consistency of the preference robust optimization problem (4.1). Let $\{x_1^*, x_2^*(\cdot)\}$ denote the optimal policy of the global problem (4.4) and $\hat{x}_2^*(\cdot)$ the optimal solution of the local second-stage problem (4.5). We want to show that $x_2^*(\cdot) \neq \hat{x}_2^*(\cdot)$. Observe that, due to the decomposable structure of problem (4.4), the first-stage and second-stage optimization problems in (4.6) can be analyzed separately. For the first-stage optimization problem in (4.6), the optimal portfolio is always $x_1^* = [1, 0]$, as the return of the first asset dominates that of the second in both scenarios and the utility function does not affect the optimal choice, given that the utility function is increasing. To identify the worst-case utility, let us compare the optimal values $u(x_1^\top r_1)$ under the two utility functions. It is easy to obtain that the optimal value is 0.48 under the quadratic utility function and 0.45 under the piecewise linear utility function. Thus the worst-case utility function $u_1^*$ at the first stage is $u_1(\cdot)$.

Let us now look at the second-stage optimization problem in (4.6). By the finiteness of the preference robust set $\mathcal{U}$ and the scenario tree structure of the random return $r$, the second-stage optimization problem in (4.6) can be formulated as a finite-dimensional problem (4.7) by adding some auxiliary variables. Problem (4.7) is a convex quadratic programming problem which can be solved efficiently by CVX in Matlab. Alternatively, we can solve (4.7) in closed form. As the return rates in all scenarios are larger than 0.2, we have $3 x_2^\top r_2 \geq 0.5 x_2^\top r_2 + 0.5$ in all scenarios. This is a maximization problem with a piecewise quadratic objective function.
The optimum is potentially attained at two sets of points: the global maximizers of each piece, and the points where the two pieces intersect; the optimal value is the larger of the corresponding values $v^*_{\mathrm{linear}}$ and $v^*_{\mathrm{int}}$. We now turn to discuss the solution of the PRO problem (4.5). Suppose that at the beginning of the second stage, the DM can predict the different scenarios that would occur at the end of the second stage. Then the DM may consider the sub-PRO problem (4.5) at the second stage, which may have different optimal solutions and corresponding worst-case utility functions at the two different nodes. As there are two nodes at the end of the first stage, we have to solve two sub-optimization problems conditional on the historical information at the two nodes at stage 1, i.e., problems (4.11) and (4.12). Problem (4.11) has an optimal solution $\hat{x}_{2,1}^* = [1, 0]$ with $\hat{v}_2^*(r_{1,1}) = 0.8$; the corresponding worst-case utility $u_{2,1}^*$ is $u_1(\cdot)$. Problem (4.12) has an optimal solution $\hat{x}_{2,2}^* = [0.8, 0.2]$ with $\hat{v}_2^*(r_{1,2}) = 0.84$; at the optimum, the expected utility values of $u_1(\cdot)$ and $u_2(\cdot)$ are the same. The analysis is depicted at the rhs of Figure 4.

[Figure 4: Global versus local optimal solutions on the two-stage scenario tree.]

By comparing the solutions shown in Figure 4, we can see that the global optimal solution at the lhs and the local optimal solution at the rhs are not the same. Thus, the optimal solution of the state-independent PRO model (3.3) is not time consistent. This is because in model (4.4) the worst-case utility $u_{2,1}^*$ must be equal to $u_{2,2}^*$, regardless of the reward at the end of stage one. In contrast, model (4.5) allows one to choose the worst-case utility $u_{2,1}^*$ or $u_{2,2}^*$ after viewing the outcome of the reward at the end of stage one. The fundamental reason is that the worst-case utilities in the sub-horizon model (3.6) are scenario ($\mathcal{F}_{t-1}$-adapted, $\xi_{[t-1]}$) dependent, whereas the worst-case utilities in (3.3) are all deterministic (independent of the stochastic process $\{\xi_t\}$).

5 Construction of the ambiguity set

The structure of the ambiguity set of utility functions is determined by the available information on the DM's utility preferences at each stage. Here we follow two approaches which are widely used in the literature of PRO models: pairwise comparison [1, 19, 24] and the nominal approach [25, 47, 48]. The former elicits the DM's preferences via pairwise comparison questionnaires, such as a lottery versus a deterministic gain/loss, and translates the preferences (answers) into a characterization/specification of the true utility function; the latter constructs an ambiguity set of utility functions in a neighborhood of a plausible nominal utility function. To simplify the discussion, here we restrict the domain of the utility functions to $[a, b]$, which means the range of the reward function $h_t(x_t, \xi_t)$ falls within this interval, and normalize the utility functions with $u(a) = 0$ and $u(b) = 1$ for $t = 1, \cdots, T$. The normalization does not affect the utility preferences. Let $\mathcal{U}$ be the set of continuous normalized non-decreasing utility functions in $L_p([a, b])$ with $u(a) = 0$, $u(b) = 1$, and $\mathcal{U}_c$ a subset where the utility functions are concave.

We begin with the pairwise comparison approach, which is based on von Neumann-Morgenstern expected utility theory, that is, any preference between two random prospects by the DM can be represented by the expected utility of the random prospects, albeit such a utility is unknown. To narrow down the scope of the true utility function, one may design more pairwise comparison questionnaires and ask the DM to make a choice on each pair of them, see Armbruster and Delage [1] for details.
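A static building block of this approach can be sketched as a linear program in the utility values $v_j = u(y_j)$ at grid points, in the spirit of [1]: minimize the expected utility of a reward distribution subject to normalization, monotonicity, concavity and the elicited comparisons $\mathbb{E}[u(W_k)] \geq \mathbb{E}[u(Y_k)]$. The grid, the lotteries and the reward distribution below are hypothetical, and all lotteries are assumed to be supported on the common grid.

```python
import numpy as np
from scipy.optimize import linprog

y = np.array([0.0, 0.25, 0.5, 0.75, 1.0])   # grid y_1 = a, ..., y_N = b (assumed)
N = len(y)
pW = np.array([[0.0, 0.0, 1.0, 0.0, 0.0]])  # W_1: 0.5 for sure (assumed)
pY = np.array([[0.5, 0.0, 0.0, 0.0, 0.5]])  # Y_1: 50/50 lottery on {0, 1} (assumed)
q  = np.full(N, 1.0 / N)                    # distribution of the reward (assumed)

A_ub, b_ub = [], []
for j in range(N - 1):                      # monotonicity: v_j <= v_{j+1}
    row = np.zeros(N); row[j], row[j + 1] = 1.0, -1.0
    A_ub.append(row); b_ub.append(0.0)
for j in range(1, N - 1):                   # concavity: slope_{j+1} <= slope_j
    d1, d2 = y[j] - y[j - 1], y[j + 1] - y[j]
    row = np.zeros(N)
    row[j - 1], row[j], row[j + 1] = 1.0 / d1, -(1.0 / d1 + 1.0 / d2), 1.0 / d2
    A_ub.append(row); b_ub.append(0.0)
for k in range(len(pW)):                    # elicited: E[u(Y_k)] - E[u(W_k)] <= 0
    A_ub.append(pY[k] - pW[k]); b_ub.append(0.0)

A_eq = np.zeros((2, N)); A_eq[0, 0] = 1.0; A_eq[1, -1] = 1.0
b_eq = np.array([0.0, 1.0])                 # u(a) = 0, u(b) = 1

res = linprog(q, A_ub=np.vstack(A_ub), b_ub=np.array(b_ub),
              A_eq=A_eq, b_eq=b_eq, bounds=[(0.0, 1.0)] * N)
print(res.fun)                              # worst-case expected utility
```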
In a dynamic decision making process, the DM's preference depends not only on the overall wealth/gains that he has accumulated at the decision making stage but also on the stage he is standing at. The latter is particularly important because the DM's preference may be affected by the current environment. For instance, an investor in a bull market may prefer high growth stocks with a higher tolerance to volatility, while in a bear market he may prefer less volatile stocks even with a lower return rate. This means his answers to the same questionnaires may be affected by his risk attitude under different macro-market conditions. This motivates us to introduce the ambiguity set of state-dependent utility functions in Definition 3.1 by setting

$$\mathcal{U}^P_t(\xi_{[t-1]}) := \left\{ u \in \mathcal{U}_c : \; \mathbb{E}[u(W_k)] \geq \mathbb{E}[u(Y_k)], \; k = 1, \cdots, K \right\}, \quad (5.1)$$

where $\{(W_k, Y_k), k = 1, \cdots, K\}$ is a set of prospects for pairwise comparison and the DM's answers, and hence the effective constraints, may vary with the state $\xi_{[t-1]}$. Note that this set of prospects may be fixed or enlarged over the process, which means the questionnaires used at stage $t-1$ will be used at stage $t$, but the DM might have different answers due to the change of stage/state. Here $\mathcal{S} := \bigcup_{k=1}^{K}(\mathrm{supp}(Y_k) \cup \mathrm{supp}(W_k)) \cup \{a, b\}$; let $N := |\mathcal{S}|$ denote the cardinality of the set $\mathcal{S}$, and let $\{y_j\}_{j=1,\ldots,N}$ be the ordered sequence of points in $\mathcal{S}$ with $y_1 = a$, $y_N = b$. In the forthcoming discussions, we will use the utility values at $\mathcal{S}$ to characterize the properties of the true unknown utility function.

In some decision making problems, a DM may be able to "roughly" identify a nominal utility function which captures most of the DM's preferences, either elicited from empirical data, or based on subjective judgement, or from partially elicited preference information, but there is incomplete information to tell whether the nominal utility is the true utility. Under such a circumstance, it might be sensible to consider a set of utility functions near the nominal utility and base the optimal decision on the worst-case utility function from the set. We call this the nominal approach. We begin by defining a kind of semi-distance between any two utility functions. Let $\mathcal{G}$ be a set of measurable functions defined over $[a, b]$. For $u, v \in \mathcal{U}$, define the semi-distance between $u$ and $v$ by

$$dl_{\mathcal{G}}(u, v) := \sup_{g \in \mathcal{G}} \left| \langle g, u \rangle - \langle g, v \rangle \right|, \quad \text{where } \langle g, u \rangle = \int_a^b g(z) \, du(z).$$

In the case that the utility functions in $\mathcal{U}$ are normalized with $u(a) = 0$, $u(b) = 1$, $dl_{\mathcal{G}}(u, v)$ resembles the pseudo-metric of $\zeta$-structure in probability theory. In this paper, we are interested in two cases: (i) $\mathcal{G}$ consists of all Lipschitz continuous functions defined on $[a, b]$ with modulus bounded by 1, and (ii) $\mathcal{G}$ consists of the indicator-type functions $g(\cdot) = \mathbb{1}_{[a, z]}(\cdot)$ for $z \in [a, b]$. The former corresponds to the Kantorovich metric, denoted by $dl_K(u, v)$, and the latter corresponds to the uniform Kolmogorov metric.

With the definition of the $\zeta$-metric, we are ready to introduce the definition of the $\zeta$-ball in the space of utility functions $\mathcal{U}$. We begin with the static case: for a nominal utility function $\tilde{u} \in \mathcal{U}$ and radius $r > 0$, define

$$B(\tilde{u}, r) := \{ u \in \mathcal{U} : dl_{\mathcal{G}}(u, \tilde{u}) \leq r \}. \quad (5.4)$$

In this paper, our focus is on the construction of an ambiguity set of a sequence of state-dependent utility functions as specified in Definition 3.1, by setting

$$\mathcal{U}^B_t(\xi_{[t-1]}) := B\big(\tilde{u}_t(\cdot, \xi_{[t-1]}), r_t(\xi_{[t-1]})\big). \quad (5.5)$$

In this formulation, the ball depends on the nominal utility function $\tilde{u}_t(\cdot, \xi_{[t-1]})$, the radius $r_t(\xi_{[t-1]})$ and the pseudo-metric $dl_{\mathcal{G}}$. The choice of functions in the set $\mathcal{G}$ may depend on the historical data $\xi_{[t-1]}$. The nominal utility function $\tilde{u}_t(\cdot, \xi_{[t-1]})$ may be identified from empirical data, that is, the utility function is inferred from the DM's past utility preferences and the feedback (represented by the historical states $\xi_{[t-1]}$). As time goes on, we can collect more data/information about the DM's preferences and subsequently obtain a more accurate nominal utility as well as a smaller radius. The next proposition quantifies the difference between two $\zeta$-balls of utility functions with different nominals and radii using the Hausdorff distance.
For any two sets $U, V \subset \mathcal{U}$, define $\mathbb{D}(U, V; dl_{\mathcal{G}}) := \sup_{u \in U} \inf_{v \in V} dl_{\mathcal{G}}(u, v)$, which quantifies the deviation of $U$ from $V$, and $\mathbb{H}(U, V; dl_{\mathcal{G}}) := \max\{\mathbb{D}(U, V; dl_{\mathcal{G}}), \mathbb{D}(V, U; dl_{\mathcal{G}})\}$, the Hausdorff distance between the two sets under the pseudo-metric.

Proposition 5.1 Let $u, v \in \mathcal{U}$ and $r_1, r_2 \in \mathbb{R}_+$. Then

$$\mathbb{H}(B(u, r_1), B(v, r_2); dl_{\mathcal{G}}) \leq dl_{\mathcal{G}}(u, v) + |r_1 - r_2|. \quad (5.6)$$

In particular, if $u^*$ is the true utility function and $u_{\mathrm{ref}}$ is a nominal utility function, then $\mathbb{H}(B(u^*, r), B(u_{\mathrm{ref}}, r); dl_{\mathcal{G}}) \leq dl_{\mathcal{G}}(u^*, u_{\mathrm{ref}})$.

Proof. It suffices to show that

$$\mathbb{D}(B(u, r_1), B(v, r_2); dl_{\mathcal{G}}) \leq dl_{\mathcal{G}}(u, v) + |r_1 - r_2| \quad (5.7)$$

and the symmetric inequality (5.8). We prove (5.7). The conclusion is trivial if $B(u, r_1) \subset B(v, r_2)$, so we consider the case that $B(u, r_1) \not\subset B(v, r_2)$. For any $u' \in B(u, r_1) \setminus B(v, r_2)$, one can construct a convex combination $v_\lambda := \lambda u' + (1-\lambda)v$, $\lambda \in (0, 1)$, such that $dl_{\mathcal{G}}(v_\lambda, v) = \lambda \, dl_{\mathcal{G}}(u', v) \leq r_2$. This shows $v_\lambda \in B(v, r_2)$, and the distance from $u'$ to $v_\lambda$ is bounded by $dl_{\mathcal{G}}(u, v) + |r_1 - r_2|$. Thus (5.7) holds. Swapping the positions of the two balls in the above discussion, we obtain (5.8). Inequality (5.6) means that the Hausdorff distance of two balls is bounded by the distance of their centers plus the difference of their radii.

With the proposition, we are ready to present a multistage PRO model with the ambiguity set defined via (3.10) and (5.5) as follows:

$$\max_{x_{[1,T]}} \inf_{u_1 \in \mathcal{U}^B_1} \mathbb{E}_{|\mathcal{F}_0}\left[ u_1(h_1(x_1, \xi_1)) + \cdots + \inf_{u_T \in \mathcal{U}^B_T(\xi_{[T-1]})} \mathbb{E}_{|\mathcal{F}_{T-1}}\left[ u_T(h_T(x_T, \xi_T), \xi_{[T-1]}) \right] \right]. \quad (5.9)$$

Here, $B(\tilde{u}_1, r_1)$ in $\mathcal{U}^B_1$ relies only on the deterministic nominal utility $\tilde{u}_1$ and radius $r_1$. By Theorem 3.1, (5.9) can be computed by the following dynamic programming equation:

$$V_t(x_{[t-1]}, \xi_{[t-1]}) = \max_{x_t \in X_t(x_{[t-1]}, \xi_{[t-1]})} \inf_{u_t \in B(\tilde{u}_t(\cdot, \xi_{[t-1]}), r_t(\xi_{[t-1]}))} \mathbb{E}_{|\mathcal{F}_{t-1}}\left[ u_t(h_t(x_t, \xi_t), \xi_{[t-1]}) + V_{t+1}(x_{[t]}, \xi_{[t]}) \right], \quad (5.10)$$

with $V_{T+1}(\cdot, \cdot) := 0$. From a computational point of view, problem (5.10) is still not easy to solve because the inner minimization problem is infinite dimensional. This motivates us to develop an approximation scheme where the ball of utility functions $B(\tilde{u}_t(\cdot, \xi_{[t-1]}), r_t(\xi_{[t-1]}))$ is approximated by a ball of piecewise linear utility functions. Let $y_1 < \cdots < y_N$ be an ordered sequence of points in $[a, b]$ and $Y := \{y_1, \cdots, y_N\}$ with $y_1 = a$ and $y_N = b$. Let $\mathcal{U}_N$ be the class of continuous, non-decreasing, piecewise linear functions defined over the interval $[y_1, y_N]$ with breakpoints on $Y$. For a given $v \in \mathcal{U}_N$ and radius $r$, let

$$B_N(v, r) := \{ u \in \mathcal{U}_N : dl_{\mathcal{G}}(u, v) \leq r \}$$

denote the corresponding ball of piecewise linear utility functions, and similarly for a given nominal utility function $\tilde{u}_t(\cdot, \xi_{[t-1]}) \in \mathcal{U}_N$. We propose to solve (5.10) by solving its piecewise linear counterpart (5.12), which is obtained from (5.10) by replacing $B(\tilde{u}_t(\cdot, \xi_{[t-1]}), r_t(\xi_{[t-1]}))$ with $B_N(\tilde{u}_t(\cdot, \xi_{[t-1]}), r_t(\xi_{[t-1]}))$. To justify this, we derive the error between the two balls and between the corresponding optimal values.

Remark 5.1 By specifying the nominal utility function to be piecewise linear, it is easier to estimate the function from the customer/investor in practice. We preset the two endpoints $a, b$ and some values in $[a, b]$, and then let the customer/investor score these values under different scenarios. By collecting and normalizing the scores in the different scenarios and linking the utility scores by a piecewise linear function, we obtain a normalized nominal utility function in each scenario. The radius describes the error in the scoring process, which depends on the credibility of the scores. Differing from $B(u, r)$ defined in (5.4), this $\zeta$-ball consists of piecewise linear utility functions only.

In what follows, we quantify the difference between $B(u, r)$ and $B_N(v, r)$ under the $\zeta$-metric so that we will be able to assess the impact when we replace the former with the latter in the utility preference robust optimization model. We need the following technical result, which is drawn from [21, Proposition 4.1] and the proof of [21, Theorem 4.1].

Proposition 5.2 Let $u \in \mathcal{U}$. Assume that $u(\cdot)$ is Lipschitz continuous over the interval $[a, b]$ with modulus $L$ and $u_N$ is its piecewise linear approximation, that is,

$$u_N(y) := u(y_{j-1}) + \frac{u(y_j) - u(y_{j-1})}{y_j - y_{j-1}}(y - y_{j-1}), \quad y \in [y_{j-1}, y_j], \; j = 2, \ldots, N, \quad (5.13)$$

where $y_1 = a$, $y_N = b$. Let $\beta_N := \max_{i=2,\cdots,N}(y_i - y_{i-1})$. Then $u_N \in \mathcal{U}_N$ and the approximation error $dl_{\mathcal{G}}(u, u_N)$ can be bounded in terms of $L$ and $\beta_N$. We call $u_N$ defined in (5.13) a projection of $u$ on $\mathcal{U}_N$.

Using the proposition, we are able to derive an upper bound for the Hausdorff distance between $B_N(u, r)$ and $B(v, r)$.

Lemma 5.1 Let $u \in \mathcal{U}_N$ and let $v \in \mathcal{U}$ be Lipschitz continuous with modulus $L$. Then $\mathbb{H}(B_N(u, r), B(v, r); dl_{\mathcal{G}})$ is bounded by $dl_{\mathcal{G}}(u, v)$ plus a term depending on $L$ and $\beta_N$ (5.14). In the case when $u = v_N$ is a projection of $v$ on $\mathcal{U}_N$, the bound (5.15) depends only on $L$ and $\beta_N$, where $L$ and $\beta_N$ are defined as in Proposition 5.2.

Proof.
Inequality (5.15) follows straightforwardly from (5.14) and (5.17), so we only prove (5.14). By the triangle inequality,

$$\mathbb{H}(B_N(u, r), B(v, r); dl_{\mathcal{G}}) \leq \mathbb{H}(B_N(u, r), B(u, r); dl_{\mathcal{G}}) + dl_{\mathcal{G}}(u, v).$$

Let $\epsilon$ be a small positive number and let $u_\epsilon \in B(u, r) \setminus B_N(u, r)$ be such that $dl_{\mathcal{G}}(u_\epsilon, B_N(u, r)) \geq \mathbb{D}(B(u, r), B_N(u, r); dl_{\mathcal{G}}) - \epsilon$. For the given $u_\epsilon$, we may find $u_{\epsilon,N} \in \mathcal{U}_N$ as in Proposition 5.2 such that $dl_{\mathcal{G}}(u_\epsilon, u_{\epsilon,N})$ is small; if $u_{\epsilon,N} \in B_N(u, r)$, this gives (5.16) because $\epsilon$ can be driven to zero. So we are left with the case that $u_{\epsilon,N} \notin B_N(u, r)$. In that case, consider the convex combination $u_\lambda := \lambda u_{\epsilon,N} + (1 - \lambda)u$ with $\lambda$ chosen such that $dl_{\mathcal{G}}(u_\lambda, u) = \lambda \, dl_{\mathcal{G}}(u_{\epsilon,N}, u) = r$. Then $u_\lambda \in \mathcal{U}_N$, and this shows $u_\lambda \in B_N(u, r)$. Thus $dl_{\mathcal{G}}(u_\epsilon, B_N(u, r)) \leq dl_{\mathcal{G}}(u_\epsilon, u_\lambda)$, and hence $\mathbb{D}(B(u, r), B_N(u, r); dl_{\mathcal{G}}) \leq dl_{\mathcal{G}}(u_\epsilon, B_N(u, r)) + \epsilon$, which gives (5.16) by driving $\epsilon$ to zero.

We are now ready to present the main result of this section.

Theorem 5.1 The optimal value functions $V_t$ of (5.10) and $V^N_t$ of (5.12) satisfy an error bound (5.17) in terms of the state-wise Lipschitz moduli $L(\xi_{[s-1]})$ and mesh sizes $\beta_N(\xi_{[s-1]})$, $s = t, \ldots, T$, for $t = 1, \ldots, T$. In the case when $\beta_N(\xi_{[s-1]})$ and $L(\xi_{[s-1]})$ are independent of states, the bound simplifies accordingly.

Proof. We prove it by induction. Observe that for any $x_{t-1}$ and $\xi_{[t-1]}$, $t = 2, \ldots, T$, the one-stage difference between the two value functions is bounded via Lemma 5.1, where $V_{T+1} = 0$ and $V^N_{T+1} = 0$. Assume that the claimed bound holds for stage $t+1$, for any fixed $x_t$ and $\xi_{[t]}$. Then combining the one-stage bound with the induction hypothesis gives rise to (5.17).

Let $\tilde{u} \in \mathcal{U}_N$. We consider a ball $B_N(\tilde{u}, r)$ in the space $\mathcal{U}_N$ equipped with the Kantorovich metric. In what follows, we derive a tractable formulation for computing $dl_K(u, \tilde{u})$. Let $g \in \mathcal{G}$, where $\mathcal{G}$ consists of all Lipschitz continuous functions defined on $[a, b]$ with modulus bounded by 1. Then

$$\langle g, u \rangle = \int_a^b g(z)\, du(z) = \sum_{j=2}^{N} \beta_j \int_{y_{j-1}}^{y_j} g(z)\, dz,$$

where $\beta_j$ denotes the slope of $u$ over the interval $[y_{j-1}, y_j]$. Since for each $g \in \mathcal{G}$, $-g \in \mathcal{G}$, we can drop the absolute value and write

$$dl_K(u, \tilde{u}) = \sup_{g \in \mathcal{G}} \sum_{j=2}^{N} (\beta_j - \tilde{\beta}_j) \int_{y_{j-1}}^{y_j} g(z)\, dz,$$

where $\tilde{\beta}_j$ denotes the slope of $\tilde{u}$ over the interval $[y_{j-1}, y_j]$. Note that in this formulation, $dl_K(u, \tilde{u})$ depends on the slopes of $u, \tilde{u}$ rather than their function values; moreover, $\sum_{j=2}^{N} \beta_j(y_j - y_{j-1}) = u(b) - u(a) = 1$ and $\sum_{j=2}^{N} \tilde{\beta}_j(y_j - y_{j-1}) = \tilde{u}(b) - \tilde{u}(a) = 1$. Let $w_j := \int_{y_{j-1}}^{y_j} g(t)\, dt$ and $z_j := g(y_j)$, $j = 2, \ldots, N$. Since $|g(y) - g(y_{j-1})| \leq y - y_{j-1}$ for all $y \in [y_{j-1}, y_j]$, we have

$$|w_j - z_{j-1}(y_j - y_{j-1})| \leq \tfrac{1}{2}(y_j - y_{j-1})^2, \quad j = 2, \cdots, N.$$

Likewise, since $|g(y_j) - g(y)| \leq y_j - y$ for all $y \in [y_{j-1}, y_j]$, we have

$$|z_j(y_j - y_{j-1}) - w_j| \leq \tfrac{1}{2}(y_j - y_{j-1})^2, \quad j = 2, \cdots, N.$$

Consequently, the supremum over $g \in \mathcal{G}$ can be taken over the variables $(w, z)$ subject to these linear constraints. The discussion above shows that we can obtain the Kantorovich distance $dl_K(u, \tilde{u})$ by solving a linear program (5.20). This facilitates the derivation of tractable formulations for solving PRO($\zeta$, $N$) by embedding (5.20) into the inner minimization problem.

We can easily incorporate the tractable formulation of the Kantorovich ball into the dynamic program (5.12) and develop tractable formulations for the latter. To comply with the setting in Theorem 5.1, we need to impose Lipschitz continuity on the nominal utility function $\tilde{u}_t(\cdot, \xi_{[t-1]})$ and its derivative $\tilde{u}'_t(\cdot, \xi_{[t-1]})$, as well as concavity of the utility function, for all $\xi_{[t-1]}$.

Theorem 5.2 Suppose that the optimal value function at period $t+1$ is $V_{t+1}(x_{[t]}, \xi_{[t]})$, and that, given the historical data $\xi_{[t-1]}$ and the historical decisions $x_{[t-1]}$, $\xi_t$ is discretely distributed with $S$ scenarios $\xi^1_t, \ldots, \xi^S_t$ with probabilities $P(\xi_t = \xi^i_t \mid \xi_{[t-1]})$, $i = 1, \ldots, S$. Then the optimal decision $x_t$ at stage $t$ can be derived by solving a finite-dimensional optimization problem (5.24), whose optimal value equals $V_t(x_{[t-1]}, \xi_{[t-1]})$.

Proof. The resulting robust dynamic program can be written as (5.22). We can separate the maximin operations by writing the inner worst-case expected utility as a minimization problem (5.23). By utilizing the piecewise linear structure of $u$ and setting $\alpha_j = u_t(y_j)$ and $\beta_j = u'_t(y_j)$ at the breakpoints $y_j$, $j = 1, \ldots, N$, we can effectively write (5.23) as (5.24), where constraints (5.24c)-(5.24h) characterize the Kantorovich ball (5.23b) as we described in (5.20).
Constraint (5.24j) characterizes the piecewise linear structure of $u_t$ in (5.23c), and constraints (5.24j)-(5.24k) imply that $\beta_j \geq \beta_{j+1}$ and hence the concavity of the piecewise linear utility function $u_t$. Constraint (5.24l) is concerned with the non-decreasing property and the Lipschitz continuity of $u_t$ with modulus bounded by $L(\xi_{[t-1]})$ as in (5.23d). Constraint (5.24m) is the reformulation of (5.23e), characterizing the Lipschitz continuity of the marginal utility with modulus bounded by $\tilde{L}(\xi_{[t-1]})$. As in the literature of PRO models for one-stage decision making, the evaluation of the utility function at the point $h_t(x_t, \xi^i_t)$ in the objective is carried out by a linear function passing through the point $(h_t(x_t, \xi^i_t), u_t(h_t(x_t, \xi^i_t)))$ with slope $\varepsilon_i$ and intercept $\varphi_i$. Constraint (5.24i) requires that all those linear pieces upper bound $u_t$ at the breakpoints (here $\tilde{\beta}_j$ denotes the slope of the nominal utility at those breakpoints). By taking the dual of the linear program, we obtain an equivalent maximization problem; substituting this dual form back into (5.22) gives the result.

Theorem 5.2 provides a connection between the optimal value functions at adjacent stages by solving an optimization problem. When the optimal value function at period $t+1$ is concave, the reward function $h_t(\cdot, \xi_t)$ is concave and the feasible set $X_t(x_{[t-1]}, \xi_{[t-1]})$ is compact and convex, this optimization problem is a convex programming problem which can be solved efficiently by interior point methods. However, the dynamic programming approach (5.12) still suffers from some computational difficulties. First, we need to have an explicit form of $V_{t+1}(x_t, \xi_{[t]})$, the optimal value function of the remaining stages from stage $t+1$. Second, the optimal value function at stage $t$ can only be computed state by state using the optimization model in Theorem 5.2.

There are two ways to address these challenges. One is to approximate the value function by a piecewise linear function or via a deep network (a multi-level composition of continuous functions), and then use the optimal values obtained from solving the state-dependent problem in Theorem 5.2 to update the approximating function. The former is known as the stochastic dual dynamic programming (SDDP) approach, whereas the latter is known as the deep reinforcement learning (DRL) approach. The other is to generate a large scenario tree of the multistage decision making process in the sample space and then find historical path-dependent optimal solutions by solving a large scale optimization problem; in that case, the optimization problem in Theorem 5.2 is embedded into each node of the large scenario tree. Both SDDP and DRL require learning a function approximating the optimal value function, which could be algorithmically inefficient for problems with a large dimension; they can be efficiently applied only to stage-independent problems or Markovian problems where the dimension of the value function is limited. The scenario tree approach naturally has the historical path-dependent structure, but it may suffer from the curse of dimensionality. In this paper, we will adopt the scenario tree approach, and we present a detailed description in the next section.
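As a preview of the bookkeeping used in the next section (node $s$, father $s^-$, sons $s^+$, stage $t(s)$, probability $p_s$), the sketch below sets up a generic scenario tree and a backward pass that calls a node-wise maximin oracle; the oracle itself, e.g., the program in Theorem 5.2, is deliberately left abstract.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class Node:
    stage: int                      # t(s)
    prob: float                     # unconditional probability p_s
    father: Optional[int]           # index of s- (None at the root)
    sons: List[int] = field(default_factory=list)   # indices of s+

def backward_pass(nodes: Dict[int, Node],
                  solve_node: Callable[[int, Dict[int, float]], float]) -> float:
    """Solve the node-wise maximin problems from the leaves to the root.

    solve_node(s, cost_to_go) should return the optimal value at node s of
    max_x inf_u E[u(h(x, xi)) + V_{t+1} | node s], given the cost-to-go
    values of the son nodes of s.
    """
    V: Dict[int, float] = {}
    for s in sorted(nodes, key=lambda s: -nodes[s].stage):   # leaves first
        if not nodes[s].sons:
            V[s] = 0.0              # leaf: V_{T+1} := 0; its reward is
                                    # accounted for at the father node
        else:
            V[s] = solve_node(s, {c: V[c] for c in nodes[s].sons})
    root = next(s for s in nodes if nodes[s].father is None)
    return V[root]
```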
6 Scenario-tree based computational schemes

As we discussed earlier, given the potential high nonlinearity of the dynamic programming equation (5.12) and the historical path-dependent structure of the ambiguity set of the dynamic utility functions, we propose to use the scenario tree approach for solving the multistage preference robust optimization problem (3.2). We are given a discrete support $\Xi$ and a scenario tree structure of $\{\xi_t\}_{t=1}^T$ as illustrated in Figure 5.

[Figure 5: A $(T+1)$-stage scenario tree.]

Suppose that there are $S$ nodes in the scenario tree, with probabilities $p_s$, for $s = 1, \ldots, S$. Denote by $s^-$ the father node of $s$, by $s^+$ the set of son nodes of $s$, and by $\xi[s]$ the historical scenario from the root node to node $s$. Denote by $t(s)$ the stage of node $s$ and by $S^-$ the set of non-leaf nodes. Denote the decision at node $s$ by $x(s)$ and the historical decisions from the root node to node $s$ by $x[s]$. Notice that the decision at node $s$ (at stage $t(s)$) is made before the realizations at its son nodes are observed; thus, the realization of the reward function at node $s$ is $h_{t(s)}(x(s^-), \xi(s))$. The ambiguity set of the utility functions upon the historical samples at node $s$ is denoted by $\mathcal{U}(s) := \mathcal{U}_{t(s)}(\xi[s])$.

Given the scenario tree structure of $\{\xi_t\}$, the (MS-PRO) problem (3.2) can be reformulated as a min-max programming problem (6.1), where the inner minimization calculates the worst-case conditional expected utility value over the son nodes of $s$ across all scenarios, whereas the outer maximization is w.r.t. the optimal decision at node $s$. The further reformulation depends on the selection of the scenario- (node-) dependent ambiguity set $\mathcal{U}(s)$. If we consider the Kantorovich ball ambiguity set studied in Section 5.2.3, we can directly apply the tractable reformulation of the dynamic programming equation obtained in Theorem 5.2 to the scenario tree recursively.

Proposition 6.1 Given the scenario tree structure of $\{\xi_t\}$ and a series of Kantorovich ball based ambiguity sets $\mathcal{U}^K(s) = \mathcal{U}^K_{t(s)}(\xi[s])$ on each node $s \in S^-$ of the scenario tree, program (6.1) can be reformulated as a single deterministic program (6.2) whose constraints include, for each node, the linearized Kantorovich ball conditions

$$w_j(s) \leq z_{j-1}(s)(y_j - y_{j-1}) + \tfrac{1}{2}(y_j - y_{j-1})^2 \varsigma(s), \quad j = 2, \cdots, N, \; s \in S^-,$$
$$-w_j(s) \leq -z_{j-1}(s)(y_j - y_{j-1}) + \tfrac{1}{2}(y_j - y_{j-1})^2 \varsigma(s), \quad j = 2, \cdots, N, \; s \in S^-,$$
$$w_j(s) \leq z_j(s)(y_j - y_{j-1}) + \tfrac{1}{2}(y_j - y_{j-1})^2 \varsigma(s), \quad j = 2, \cdots, N, \; s \in S^-,$$
$$-w_j(s) \leq -z_j(s)(y_j - y_{j-1}) + \tfrac{1}{2}(y_j - y_{j-1})^2 \varsigma(s), \quad j = 2, \cdots, N, \; s \in S^-.$$

Proof. There are two ways to obtain the reformulation. One is to use the duality technique in the proof of Theorem 5.2, on the basis of the finite expansion reformulation (6.1), to reformulate the inner minimization over $\mathcal{U}(s)$ as a sum of maximization problems over the nodes in $S^-$. The other is to use Theorem 5.2 recursively from the last stage to the first stage, which also delivers the desired conclusion.

Under the concavity of $h_t(\cdot, \cdot)$, $t = 1, \ldots, T$, and the convexity of $X_1(\cdot)$ and $X_t(\cdot, \cdot)$, $t = 2, \ldots, T$, problem (6.2) is a convex programming problem. Given a discrete support $\Xi$ and a scenario tree structure of $\{\xi_t\}_{t=1}^T$, the (MS-PRO) model (6.1) with a series of pairwise comparison ambiguity sets $\mathcal{U}^P_t(\xi_{[t-1]})$ defined in (5.1) can be reformulated as a linear program.

Proposition 6.2 Given the scenario tree structure of $\{\xi_t\}$ and a series of pairwise comparison ambiguity sets $\mathcal{U}^P(s) = \mathcal{U}^P_{t(s)}(\xi[s])$, problem (6.1) is equivalent to a deterministic linear program (6.3) involving dual variables $\tau_j(s), \sigma_j(s)$ and terms of the form $(\tau_j(s) + \sigma_j(s))(y_{j+2} - y_j)$.

Proof.
Given a discrete support $\Xi$ and a scenario tree structure of $\{\xi_t\}_{t=1}^T$, the (MS-PRO) model (6.1) with a series of pairwise comparison ambiguity sets $U^P_t(\xi_{[t-1]})$ defined in (5.1) can be reformulated as a linear program.
Proposition 6.2 Given the scenario tree structure of $\{\xi_t\}$ and a series of pairwise comparison ambiguity sets $U^P(s) = U^P_{t(s)}(\xi[s])$, problem (6.1) is equivalent to the linear program (6.3), whose constraints involve terms of the form $(\tau_j(s) + \sigma_j(s))(y_{j+2} - y_j)$.
Proof. Analogous to the proof of Theorem 1 in [1], we can show that the worst-case utility function takes a piecewise linear form with at most $N$ breakpoints. Let $S(s) = |s^+|$. Substituting the piecewise linear form into the functional infimum problem reduces the minimization over $u_s \in U^P(s)$ of the conditional expected utility over the son nodes $i \in s^+$ to a finite-dimensional linear program. The only difference between the studied model and the model in Theorem 1 of [1] is that we replace the normalization constraint in [1] by bounded support constraints. Taking the dual of the minimization LP gives an equivalent maximization LP reformulation. Applying the maximization LP reformulation to each inner infimum problem at node $s$ in (6.1), we obtain the deterministic reformulation of (6.1). Given the concavity of $h_t(\cdot,\cdot)$, $t = 1, \ldots, T$, and the convexity of $X_1(\cdot)$ and $X_t(\cdot,\cdot)$, $t = 2, \ldots, T$, problem (6.3) is a convex programming problem.
7 Numerical tests
To examine the performance of the proposed multistage PRO model (3.2) and the numerical schemes, we carry out numerical tests on a multistage investment-consumption problem based on [13, 16] with state-dependent utility functions. Consider an investor who plans to use his wealth to purchase crude oil and make oil products over $T$ periods. At the beginning of each period, the investor has two options: (a) consume all of the wealth for the purchase, or (b) consume part of it and invest the remaining wealth in $n$ risky assets in a security market. The objective of the investor is to maximize the overall expected utility of the oil product consumption. Let $w_0 = 1$ denote the normalized initial wealth and $q_t$ denote the quantity of crude oil that the investor plans to buy at the beginning of period $t$ at price $p_{t-1}$, which is the oil price at the end of period $t-1$ (equivalently, at the beginning of period $t$). The total cost of the purchase is $q_t p_{t-1}$ and the remaining wealth is $w_{t-1} - q_t p_{t-1}$, where $w_{t-1}$ is the wealth at the end of period $t-1$. The remaining wealth is invested in the $n$ risky assets with a portfolio $x_t$, where $x^i_t$ is the wealth invested in the $i$-th asset, $i = 1, \ldots, n$. The random return rate, denoted by $r^i_t$, is calculated period-wise, i.e., a $1 investment at the beginning of period $t$ generates $(1 + r^i_t)$ at the end of the period. Thus, the wealth of the investor at the end of period $t-1$ is $w_{t-1} = (e + r_{t-1})^\top x_{t-1}$. This wealth is divided into the consumption $q_t p_{t-1}$ and the further investment $e^\top x_t$, i.e., $w_{t-1} = q_t p_{t-1} + e^\top x_t$. A combination of the two equations gives rise to the wealth balance equation
$$(e + r_{t-1})^\top x_{t-1} = q_t p_{t-1} + e^\top x_t, \quad t = 2, \ldots, T.$$
At the initial period $t = 1$, we have $e^\top x_1 = w_0 - q_1 p_0$, and at the final period $T$, the investor must spend all of the wealth on the purchase of oil, thus $q_T p_{T-1} = w_{T-1} = (e + r_{T-1})^\top x_{T-1}$.
The utility of the oil products is calculated at the end of each period as follows. We assume that all of the $q_t$ barrels of oil purchased at the beginning of period $t$ are used to produce $g_t(q_t)$ units of oil products by the end of period $t$, with unit value $d_t$. Thus the total value of the production is $g_t(q_t) d_t$ and the period-wise utility value is $u_t(g_t(q_t) d_t, h_{[t-1]})$. Here the investor's utility function depends on all the historical information $h_{[t-1]} = \{p_0, \ldots, p_{t-1}, d_1, \ldots, d_{t-1}, r_1, \ldots, r_{t-1}\}$. Based on the discussions above, we formulate the multistage investment-consumption problem as the expected utility maximization problem (7.1). In the setup, we assume that short sales of the security assets and crude oil are forbidden, i.e., $x_t \in \mathbb{R}^n_+$ and $q_t \in \mathbb{R}_+$.
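As a sanity check of the wealth dynamics, the following Python sketch simulates one sample path under a toy policy of our own (spend 30% of wealth on oil each period and equal-weight the remainder) and verifies the balance equation at each step; all numerical values are illustrative and are not the data used in the experiments.

```python
# One sample path of the wealth balance (e + r_{t-1})^T x_{t-1} = q_t p_{t-1} + e^T x_t.
import numpy as np

n, T = 3, 4
rng = np.random.default_rng(0)
p = np.concatenate(([1.0], 1.0 + 0.05 * rng.standard_normal(T)))  # oil prices p_0, ..., p_T (toy)
r = 0.02 + 0.10 * rng.standard_normal((T, n))                     # asset returns r_1, ..., r_T (toy)
e = np.ones(n)

w = 1.0                                     # normalized initial wealth w_0 = 1
for t in range(1, T):
    q = 0.3 * w / p[t - 1]                  # toy policy: spend 30% of wealth on oil
    x = np.full(n, (w - q * p[t - 1]) / n)  # equal-weight portfolio, e^T x_t = w_{t-1} - q_t p_{t-1}
    assert abs(w - (q * p[t - 1] + e @ x)) < 1e-12   # balance holds at the start of period t
    w = (e + r[t - 1]) @ x                  # w_t = (e + r_t)^T x_t, wealth at the end of period t
q_T = w / p[T - 1]                          # final period: consume all wealth, q_T p_{T-1} = w_{T-1}
print("final oil purchase q_T =", q_T)
```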
Assuming that the investor is ambiguous about the true utility function at each stage, we propose a preference robust counterpart (7.2) of the multistage optimal investment-consumption problem to mitigate the risk arising from the ambiguity, where the ambiguity set $U$ is state-dependent as defined in Definition 3.1. We consider two kinds of preference ambiguity sets: the pairwise comparison set $U^P_t(\xi_{[t-1]})$ defined in (5.1) and the Kantorovich ball set $U^K_t(\xi_{[t-1]})$ defined in (5.21). To ease the exposition, we consider a simple case with $g_t(x) = x$ and $d_t = p_t$ for $t = 1, \ldots, T$. This is based on the understanding that the production of oil products is proportional to the purchased amount of crude oil and that the value of the oil products is proportional to the crude oil price. We assume that the utility of the oil products depends on the crude oil price in two regimes. In the usual regime, when the crude oil price is less than or equal to $60 per barrel, the investor has a linear utility function $u_{\mathrm{lin}}(x) = x$ defined over $[0, 1]$. In the other regime, when the crude oil price is greater than $60 per barrel, the investor has a concave utility $u_{\mathrm{exp}}(x) = (1 - \exp(-3x))/(1 - \exp(-3))$.
We set the risky asset pool to nine exchange-traded funds (ETFs) in the US equity market corresponding to different industry sectors: Utilities Select Sector SPDR ETF (XLU), Energy Select Sector SPDR ETF (XLE), Financial Select Sector SPDR ETF (XLF), Technology Select Sector SPDR ETF (XLK), Health Care Select Sector SPDR ETF (XLV), Consumer Staples Select Sector SPDR ETF (XLP), Consumer Discretionary Select Sector SPDR ETF (XLY), Industrial Select Sector SPDR ETF (XLI), and Materials Select Sector SPDR ETF (XLB). We collect weekly data of the crude oil price (OK Crude Oil Future Contract) and the ETF prices over the period 2007/1/1 - 2021/3/29. The ETF data are downloaded from Yahoo Finance and the oil prices are downloaded from the Energy Information Administration. Before generating the scenario tree, the price data are transformed into log-returns so that the historical time series pass the stationarity test. We adopt an ARMA-GARCH model with Gaussian residuals to forecast the future returns of the oil and ETF prices and build a scenario tree with a symmetric branching structure; we refer the reader to [52] for detailed scenario tree generation algorithms. We examine and compare the MSP and MS-PRO models under the following settings:
• MSP-True: multistage utility maximization problem (7.1) with the true utility functions.
• MSP-PLN: multistage utility maximization problem (7.1) with piecewise linear nominal utility functions.
• MS-PRO-Kan: multistage preference robust problem (7.2) with the ambiguity set $U^K_t(\xi_{[t-1]})$ constructed via the Kantorovich ball centered at a piecewise linear nominal utility function at each node.
• MS-PRO-PC: multistage preference robust problem (7.2) based on the pairwise comparison ambiguity set $U^P_t(\xi_{[t-1]})$ with $K$ randomly generated questionnaires and answers at each node.
To generate the ambiguity set of utility functions, we need to elicit some information about the investor's preferences via the two most widely used approaches: scoring and choosing. For the Kantorovich ball, we pre-select $N$ breakpoints evenly distributed over $[0, 1]$ and use the true utility function to determine the nominal utility values at these breakpoints under each scenario (a simulation of the scoring process from the investor); the sketch below illustrates this scoring step together with the interpolation described next.
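For concreteness, here is a small Python sketch of the two-regime true utility and of the simulated scoring step (the function names are ours, purely illustrative): the true utility is evaluated at $N$ evenly spaced breakpoints and linearly interpolated to form the piecewise linear nominal utility used by MSP-PLN and as the center of the Kantorovich ball.

```python
import numpy as np

def true_utility(x, oil_price):
    """Two-regime state-dependent utility on [0, 1]: linear when the oil
    price is at most $60 per barrel, exponential-concave otherwise."""
    x = np.asarray(x, dtype=float)
    if oil_price <= 60.0:
        return x                                             # u_lin(x) = x
    return (1.0 - np.exp(-3.0 * x)) / (1.0 - np.exp(-3.0))   # u_exp(x)

def nominal_utility(x, oil_price, N=40):
    """Piecewise linear nominal utility: score the true utility at N evenly
    spaced breakpoints (the endpoints give (0,0) and (1,1) automatically)
    and interpolate linearly between consecutive points."""
    y = np.linspace(0.0, 1.0, N)
    return np.interp(x, y, true_utility(y, oil_price))

# Example: nominal utility in the high-price regime at a few consumption levels.
print(nominal_utility([0.25, 0.5, 0.75], oil_price=70.0))
```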
Next, we use the line segments between consecutive points, together with the endpoints $(0, 0)$ and $(1, 1)$, to generate a piecewise linear nominal utility function in each scenario. As for the pairwise comparison set, we also pre-select $N$ breakpoints and randomly generate lotteries with two outcomes supported on these points. We compare the utility values of the two lotteries under the true utility in each scenario, which gives the simulated choice from the investor. For both ambiguity sets, we set $L = 3/(1-\exp(-3))$ and $\bar{L} = 9/(1-\exp(-3))$ under all scenarios. All optimization problems in the deterministic reformulations are solved by the SeDuMi solver through the CVX package in Matlab R2016a on a PC with a 3.4GHz CPU and 16GB RAM. Numerical tests are carried out from the following four perspectives.
• We compare the optimal values of the MSP and MS-PRO models with different numbers of stages. Figure 6 shows the results.
• We compare the optimal values of MSP-PLN and MS-PRO-Kan with different numbers of breakpoints. Figure 7 shows the results.
• We compare the optimal values and the optimal consumption at the initial period of MS-PRO-Kan with different radii of the Kantorovich ball. Figure 8 plots the results for $T = 4$ and $N = 40$ under all scenarios.
• We compare the optimal values of MS-PRO-PC with different numbers of questionnaires. The corresponding results for $T = 4$ are shown in Figure 9.
The main observations are as follows.
• Figure 6: as the number of stages increases, the investor has more flexible choices on future consumption and thus a higher optimal total expected utility value. This phenomenon holds for both the MSP and MS-PRO models. Moreover, the optimal value under the MS-PRO model is smaller than that under MSP, which can be viewed as the price of robustness; the larger the number of time periods, the bigger the gap.
• Figure 7: as the number of breakpoints increases, the optimal value of MSP-PLN converges to that of MSP-True, and the optimal value of MS-PRO-Kan gets closer to that of MSP-True.
• Figure 8: when the radius reduces to zero, MS-PRO-Kan converges to MSP-PLN with the same piecewise linear nominal utility function; when the radius increases moderately, the optimal initial consumption increases, driven by the increased ambiguity about future preferences; when the radius exceeds a threshold value (around 0.2), the initial consumption drops rapidly and becomes close to zero. The reason is that the constraints corresponding to the Kantorovich ball become inactive beyond the threshold, and only the bounds on the Lipschitz moduli and the concavity constraints take effect.
• Figure 9: as the number of questionnaires increases, the optimal value of MS-PRO-PC converges to that of MSP-True, because the effect of the randomness in the questionnaires recedes.
8 Conclusions
In this paper, we explore preference robust models for multistage stochastic expected utility maximization problems where the DM's utility preferences are ambiguous. We introduce a concept of rectangularity of the ambiguity set which plays an important role in the consistency of inter-temporal preferences and policies. The time consistency enables us to develop a dynamic programming approach for solving the problem. We propose a Kantorovich ball approach to define the ambiguity set of utility functions in the case where the DM has a nominal utility function, and a dynamic pairwise comparison approach to capture the DM's state-dependent utility preferences. The two approaches may be used separately or combined. With the recursive dynamic structure of the MS-PRO model, we use the well-known scenario tree method to solve it. To examine the performance of the proposed robust model and numerical schemes, we apply them to an investment-consumption problem.
The preliminary numerical results show that the scenario tree approach works well when the number of stages is small but suffers from the curse of dimensionality as the number of stages increases. Designing SDDP or DRL algorithms based on the dynamic optimal value function might be a promising direction for future research. Moreover, PRO in an MDP setting also remains to be explored. Finally, we note that since the MS-PRO model is built on von Neumann-Morgenstern's expected utility theory, it inherits the limitations of the latter. It might be interesting to explore similar robust models with rank-dependent expected utility theory or cumulative prospect theory.
References
[1] Decision making under uncertainty when preference information is incomplete.
[2] Coherent multi-period risk adjusted values and Bellman's principle.
[3] Robust Optimization.
[4] Learning preferences under noise and loss aversion: An optimization approach.
[5] Mean-variance portfolio optimization with state-dependent risk aversion.
[6] Time consistent dynamic risk measures.
[7] Making Hard Decisions with Decision Tools Suite.
[8] Multi-period risk measures and optimal investment policies.
[9] Better than dynamic mean-variance: Time inconsistency and free cash flow stream.
[10] Beta and coskewness pricing: Perspective from probability weighting.
[11] Robust multistage decision making. INFORMS Tutorials in Operations Research.
[12] Minimizing risk exposure when the choice of a risk measure is ambiguous.
[13] Dynamic Asset Pricing Theory.
[14] Modelling the term structure of interest rates under nonseparable utility and durability of goods.
[15] Recursive multiple-priors.
[16] Multiperiod consumption-investment decisions.
[17] Utility assessment methods.
[18] Maxmin expected utility with non-unique prior.
[19] Utility preference robust optimization: piecewise linear approximation, error bounds and stability. Manuscript.
[20] Robust spectral risk optimization when the subjective risk aversion is ambiguous: a moment-type approach.
[21] Utility preference robust optimization with moment-type information structure.
[22] Ambiguity in risk preferences in robust stochastic optimization.
[23] Forward rank-dependent performance criteria: Time-consistent investment under probability distortion.
[24] Robust decision making over a set of random targets or risk-averse utilities with an application to portfolio optimization.
[25] Optimization with reference-based robust preference constraints.
[26] Robust decision making using a general utility set.
[27] Consistent investment of sophisticated rank-dependent utility agents in continuous time.
[28] Robust dynamic programming.
[29] Subjectively weighted utility: A descriptive extension of the expected utility model.
[30] Stationary ordinal utility and impatience.
[31] Temporal resolution of uncertainty and dynamic choice theory.
[32] Inverse optimization of convex risk functions.
[33] State-dependent utility and ambiguity. Working paper.
[34] Distorted Probabilities and Choice under Risk.
[35] Multistage Stochastic Optimization.
[36] Mathematical foundations of distributionally robust multistage optimization.
[37] A unified framework for stochastic optimization.
[38] Distributionally robust optimization: A review.
[39] Risk-averse dynamic programming for Markov decision processes.
[40] Rectangular sets of probability measures.
[41] Lectures on Stochastic Programming: Modeling and Theory.
[42] Failing to foresee the updating of the reference point leads to time-inconsistent investment.
[43] Judgement under uncertainty: heuristics and biases.
[44] A law of comparative judgement.
[45] Theory of Games and Economic Behavior.
[46] Continuous time mean variance asset allocation: a time consistent strategy.
[47] Robust spectral risk optimization when information on risk spectrum is incomplete.
[48] Preference robust distortion risk measure and its application.
[49] Decision making with incomplete information.
[50] Robust Markov decision processes.
[51] Preference robust optimization for quasiconcave choice functions.
[52] A copula-based scenario tree generation algorithm for multiperiod portfolio selection problems.
[53] Preference robust models in multivariate utility-based shortfall risk minimization.