key: cord-0677571-4b5hjddv
authors: Winkelmann, David; Ulrich, Matthias; Romer, Michael; Langrock, Roland; Jahnke, Hermann
title: Dynamic Stochastic Inventory Management in E-Grocery Retailing: The Value of Probabilistic Information
date: 2022-05-13
journal: nan
DOI: nan
sha: 8b39364a73ee0704530610ce4f7c81c97134052b
doc_id: 677571
cord_uid: 4b5hjddv

Inventory management optimisation in a multi-period setting with dependent demand periods requires the determination of replenishment order quantities in a dynamic stochastic environment. Retailers are faced with uncertainty in demand and supply for each demand period. In grocery retailing, perishable goods without best-before-dates further amplify the degree of uncertainty due to stochastic spoilage. Assuming a lead time of multiple days, the inventory at the beginning of each demand period is determined jointly by the realisations of these stochastic variables. While existing contributions in the literature focus on the role of single components only, we propose to integrate all of them into a joint framework, explicitly modelling demand, supply shortages, and spoilage using suitable probability distributions learned from historic data. As the resulting optimisation problem is analytically intractable in general, we use a stochastic lookahead policy incorporating Monte Carlo techniques to fully propagate the associated uncertainties in order to derive replenishment order quantities. We develop a general inventory management framework and analyse the benefit of modelling each source of uncertainty with an appropriate probability distribution. Additionally, we conduct a sensitivity analysis with respect to location and dispersion of these distributions. We illustrate the practical feasibility of our framework using a case study on data from a European e-grocery retailer. Our findings illustrate the importance of properly modelling stochastic variables using suitable probability distributions for a cost-effective inventory management process.

Inventory management in grocery retailing requires decision support in a stochastic environment to determine replenishment order quantities, with the aim of optimising economical targets based on an intended level of demand satisfaction. This level can be derived from the trade-off between shortage costs (short-term lost sales and long-term customer churn) and costs incurred by excess inventory (holding and spoilage).

In practice, most retailers offer SKUs with a shelf life of multiple demand periods. As a result, excess inventory can be sold in the following demand period(s) and thus affects the replenishment order decisions in these periods (Kim et al., 2014). In addition to these dynamic inter-period dependencies, retailers are faced with a convolution of distributions for multiple stochastic variables, such as demand, shelf lives, and the quantity delivered from the supplier. These uncertainties are typically amplified by a lead time of multiple days. Therefore, costs resulting from a given order decision are uncertain, rendering the inventory management process a stochastic dynamic optimisation problem. In particular, retailers are required to deal with two aspects that make it difficult to determine optimal replenishment orders (Fildes et al., 2019). First, parameters for the underlying probability distributions need to be estimated from historical data or explanatory variables (features). Second, retailers need to adequately incorporate the various sources of uncertainty into the decision-making process (Raafat, 1991; Silver et al., 1998).

In the past, the literature mostly focused on simple decision policies for determining replenishment order quantities (Heyman and Sobel, 2004). More recently, retailers are able to collect comprehensive data at low costs while at the same time, the available computational power has increased. These developments made it possible to design new data-driven approaches for the derivation of replenishment order quantities (see e.g. Siegel and Wagner, 2021; Elmachtoub and Grigas, 2021; Lee et al., 2021; Xu et al., 2021). In particular, e-grocery retailing offers opportunities for an accurate analysis of decision policies, as external effects are reduced and the data is typically more informative than in brick-and-mortar retailing, e.g. due to the availability of uncensored demand data (Ulrich et al., 2021). These approaches incorporating extensive data are mainly based on the setting of the newsvendor problem, the classic inventory management model to determine the cost-optimal inventory level in case of stochastic demand (Silver et al., 1998; Zipkin, 2000). However, this model relies on limiting assumptions, such as independent demand periods, a fixed sales period, and a match between the quantity ordered by the retailer and the quantity provided by the supplier.

In this paper, we aim at overcoming these limitations by explicitly modelling all relevant stochastic variables, namely demand, shelf lives of SKUs, and delivery shortfalls using suitable probability distributions. Fully accommodating the uncertainty, resulting in a convolution of probability distributions, renders it challenging to analytically derive optimal replenishment order quantities in a dependent multi-period setting. Instead, we propose a Monte Carlo-based approximate dynamic programming approach that determines the replenishment order decisions minimising the expected costs for a set of sample trajectories spanning a given lookahead horizon. An advantage of this approach, which, following the terminology proposed by Powell (2019a), can be characterised as a stochastic lookahead policy, is that it allows integrating the full distributional information of all stochastic variables available to the decision-makers.

In addition, this approach does not require stationarity assumptions but naturally adapts to time-dependent probabilistic forecasts such as those suggested by Ulrich et al. (2021).

The stochastic lookahead policy provides us with the flexibility to perform a detailed computational study in which we assess the benefit of using probability distributions instead of relying on point estimates for the stochastic variables affecting the replenishment order decision process. In particular, we investigate the effect of representing only subsets of the stochastic variables by probability distributions; this analysis can help decision-makers determine which stochastic variables are worth modelling probabilistically. Additionally, we discuss the sensitivity of the results with respect to the specification of different model parameters. We further provide a case study using data from a European e-grocery retailer. This data is used to generate probabilistic forecasts which are fed into the stochastic lookahead policy. Comparing our approach to a parametric baseline policy used in practice demonstrates the practical applicability of our approach.

In this section, we provide a problem description for the business case and introduce our modelling framework. We start with a simple single-period setting, proceed to a multi-period setting and then describe our probabilistic models for supply shortages and spoilage.

In e-grocery, customers order groceries online and the retailer directly delivers the purchase from local distribution warehouses to the customer. This leads to different challenges and opportunities for inventory management optimisation compared to traditional store retailing.

E-grocery retailing requires the additional fulfilment processes of picking and delivery, which increase the time between the replenishment order for an SKU and its availability to the customer. This longer lead time decreases the expected forecasting accuracy of relevant variables, such as the inventory at the beginning of the demand period considered. Depending on the features used for demand forecasting, the longer lead time may also reduce the accuracy of the demand forecast, as less information is available at an earlier decision period. A further challenge results from the high service-level targets of 97% to 99% in e-grocery (Ulrich et al., 2021).

Opportunities for inventory management optimisation result from new types of data in e-grocery that are not available in traditional store retailing. During the ordering process no in-stock information is available to the customer, which makes it possible to monitor customer preferences and, therefore, yields uncensored demand data. Furthermore, customers are able to select a delivery slot up to fourteen days in advance. This provides information on 'known demand', which equals the customer order quantity for a future delivery period at the time of determining the replenishment order quantity of the retailer. This information can be incorporated into the forecast of demand. Figure 1 displays the mean average percentage forecast error as a function of the lead time of the e-grocery retailer when applying a linear regression for all SKUs within the categories fruits and vegetables in the demand period January 2019 to December 2019. We observe that the forecasting accuracy measured by the mean average percentage error strongly decreases with an increase in the lead time, as less demand information is available for high lead times. An additional opportunity results from the fulfilment processes within the warehouse. As the stowing and picking processes are in the retailer's control, we can assume that units of a SKU are picked and delivered to the customer according to the First In, First Out (FIFO) principle, that is, the oldest units are sold first.

Recent contributions in the e-grocery literature rely on several restrictive assumptions, such as independent demand periods (Ulrich et al., 2021; Ulrich et al., 2022). Considering perishable SKUs with a shelf life of only one demand period, minimising expected costs $\mathbb{E}[C(q_t)]$ for each period $t = 1, \ldots, T$ individually leads to minimised expected total costs over the whole time horizon. Assuming that excess inventory generates spoilage costs of $h$ (per unit) while excess demand leads to costs of $b$ (per unit), the cost-optimal delivery quantity $q_t^*$ given stochastic demand $D_t$ can be derived by minimising

$$\mathbb{E}[C(q_t)] = h\,\mathbb{E}\big[(q_t - D_t)^+\big] + b\,\mathbb{E}\big[(D_t - q_t)^+\big], \qquad (x)^+ = \max(x, 0).$$

According to the newsvendor model, the cost-optimal service level target equals $\alpha = b/(b + h)$ and yields the optimal order quantity at the $\alpha$-quantile of the (estimated) cumulative distribution function (CDF) $F$ of demand (Silver et al., 1998; Zipkin, 2000). Then, the optimal replenishment order quantity $q_t^*$ can be calculated as

$$q_t^* = F^{-1}\!\left(\frac{b}{b + h}\right).$$
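As a minimal sketch of the quantile rule described above, the following Python snippet computes the critical ratio and the corresponding empirical quantile from a set of demand samples. The function names and the toy demand sample are illustrative assumptions and not part of the original study; the cost values b = 5 and h = 1 are those used later in the simulation study.

```python
import math


def critical_ratio(b, h):
    """Cost-optimal service level alpha = b / (b + h)."""
    return b / (b + h)


def empirical_quantile_order(demand_samples, b, h):
    """Smallest sample value q whose empirical CDF satisfies F(q) >= alpha."""
    alpha = critical_ratio(b, h)
    ordered = sorted(demand_samples)
    # Small tolerance guards against floating-point noise when alpha * n is an integer.
    rank = math.ceil(alpha * len(ordered) - 1e-12)
    return ordered[max(rank, 1) - 1]


def expected_newsvendor_cost(q, demand_samples, b, h):
    """Sample average of h * (q - d)^+ + b * (d - q)^+."""
    total = sum(h * max(q - d, 0) + b * max(d - q, 0) for d in demand_samples)
    return total / len(demand_samples)


# Hypothetical demand sample (illustrative only).
samples = list(range(1, 11))
q_star = empirical_quantile_order(samples, b=5, h=1)
```

Because the sample-average cost is convex in q, the empirical quantile coincides with the minimiser of `expected_newsvendor_cost` over the sample values, which can be checked by brute force on small examples.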

A suitable specification of the demand distribution $F$, which is unknown and thus needs to be estimated, is of crucial importance (e.g. Bensoussan et al., 2007; Levi et al., 2007; Levina et al., 2010).

In the single-period setting, introducing a lead time $\tau$, such that the order amount $q_t$ to arrive in period $t$ has to be ordered in period $t - \tau$, may affect the forecast accuracy of the demand distribution, but does not change the model. We denote the order amount ordered in period $t - \tau$ to be delivered in period $t$ by $r_{t-\tau,t}$. Assuming that the full amount is delivered, we have $q_t = r_{t-\tau,t}$. We further assume that the replenishment order quantity cannot be adjusted by the retailer after its determination in $t - \tau$.

Most SKUs have a shelf life of more than one demand period, thus allowing the transfer of excess inventory at the end of any demand period into the following period (Kim et al., 2014).

In such a multi-period setting, the replenishment order quantity for any period $t$ affects the inventory of the following periods $t + 1, \ldots, T$. Assuming that demand shortages cannot be compensated within a future demand period, we obtain

$$I_{t+1} = (I_t + q_t - D_t)^+ = \max(I_t + q_t - D_t,\, 0),$$

where $I_t$ is the inventory at the beginning of period $t$. As excess inventory can be transferred to the following period, it does not lead to spoilage costs $h$ anymore but to inventory costs $v$ per unit and period. By assuming periodical replenishments, we can ignore potential fixed order costs in our model and obtain expected costs for demand period $t$ as

$$\mathbb{E}[C(q_t)] = v\,\mathbb{E}\big[(I_t + q_t - D_t)^+\big] + b\,\mathbb{E}\big[(D_t - I_t - q_t)^+\big].$$

In general, this dependence structure violates the assumption of independent demand periods in the newsvendor setting. Instead of minimising costs for each demand period separately, in this dynamic optimisation problem we aim to simultaneously consider consecutive periods affected by the replenishment order decision (Alden and Smith, 1992). If $\tau > 0$, the inventory $I_{t+\tau}$ is unknown in period $t$ when the order decision $r_{t,t+\tau}$ has to be placed and needs to be considered as an additional stochastic variable in the order decision problem. Therefore, in addition to the prediction of demand, determining optimal replenishment order quantities in this multi-period setting requires an accurate forecast of the inventory level distribution at the beginning of period $t + \tau$ when the order placed in period $t$ is delivered.

In general, retailers face the risk of supply shortages, e.g. due to supply constraints in the distribution channels. This problem is referred to as random yield in the literature. The quantity delivered by the supplier, $Q_t$, becomes stochastic and depends on the quantity ordered, which it cannot exceed. If the relative supply shortage were known and constant, a retailer could simply add the percentage of known shortage to the specified replenishment order quantity to derive the target order quantity. However, supply shortages are neither constant nor known in retail practice, but rather follow an unknown probability distribution.

For a positive lead time τ > 0, supply shortages further affect the forecast on the distribution of inventory at the beginning of the demand period and increase uncertainty. In case of random yield, the optimal replenishment order quantity for given estimated inventory and demand increases.

Existing supply-uncertainty literature assumes that retailers know their suppliers' true supply distributions, see e.g. Yano and Lee (1995), Grasman et al. (2007), and Tomlin (2009). Noori and Keller (1986) were among the first to address problems where supply and demand are both random, deriving the optimal order quantity for the unconstrained newsvendor problem with random yield. Parlar et al. (1995) allow for non-stationary supply by assuming that supply follows a Bernoulli process, i.e. the realisation of no or complete supply. To the best of our knowledge, there is no literature considering partial and complete supply shortages in the same model. In our model, we consider three different supply states $G_t$, namely complete delivery (state 1), a cancellation of the total delivery (state 2), and partial delivery (state 3), determining the relative proportion of supply $\delta_t$ in each demand period $t$. Since supply shortages often result from persistent problems in the supply chain, we model the sequence of supply states using a homogeneous Markov chain, specified by transition probabilities and its stationary distribution. In case of partial delivery, the proportion of units supplied is assumed to follow a Beta distribution with additional point masses on zero and one, respectively (Ospina and Ferrari, 2012).
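The three-state supply model can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the transition probabilities in `TPM` are hypothetical placeholders, the function names are invented for this example, and the Beta shape parameters (2 and 3) are the values used later in the simulation study.

```python
import random

# Hypothetical transition probabilities (rows: current state 1, 2, 3); illustrative only.
TPM = [
    [0.99, 0.005, 0.005],
    [0.50, 0.50, 0.00],
    [0.50, 0.00, 0.50],
]


def next_supply_state(state, rng, tpm=TPM):
    """Draw the next supply state (1 = complete, 2 = cancelled, 3 = partial)."""
    return rng.choices([1, 2, 3], weights=tpm[state - 1])[0]


def supply_proportion(state, rng, shape_a=2.0, shape_b=3.0):
    """Relative proportion of the ordered quantity that is delivered."""
    if state == 1:
        return 1.0
    if state == 2:
        return 0.0
    return rng.betavariate(shape_a, shape_b)  # partial delivery: a value between 0 and 1


def simulate_supply(n_periods, seed=0, start_state=1):
    """Simulate (state, proportion) pairs for consecutive demand periods."""
    rng = random.Random(seed)
    state, path = start_state, []
    for _ in range(n_periods):
        state = next_supply_state(state, rng)
        path.append((state, supply_proportion(state, rng)))
    return path
```

Treating complete and cancelled deliveries as separate Markov states yields the point masses on one and zero, while the Beta draw covers the partial-delivery case.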

Classical multi-period inventory management models, such as the economic order quantity (EOQ) model (Silver et al., 1998; Zipkin, 2000), implicitly assume infinite shelf lives. However, many SKUs in grocery retailing have a finite shelf life, evoking spoilage costs and stock reductions if they are not sold within their shelf life. The resulting reductions in the inventory level need to be considered in the replenishment order decision. Therefore, forecasts on the distribution of the inventory in a multi-period setting require a detailed analysis of this stochastic variable. Existing contributions in the literature discuss the cases of fixed and stochastic shelf lives; see the surveys of Nahmias (1982) and Raafat (1991). Most literature considering finite shelf lives assumes that the number of sales periods is known (e.g. Myers, 1997; Chowdhury and Sarker, 2001; Viswanathan and Goyal, 2002). However, for SKUs such as fruits and vegetables, the number of sales periods is more realistically represented by a random variable. The associated probability distribution can be estimated by modelling the decay of the SKUs in the course of time. The rate of decay can be described by a constant fraction of the given inventory or by a rate that changes according to an underlying function (Raafat, 1991), for example an exponential distribution (Nahmias, 1982).

We model the shelf life of a SKU in days using an empirical discrete distribution that is estimated from historical data. As discussed in detail in Section A.2 in the Appendix, this empirical distribution can be used to derive the conditional probability of a unit of the SKU being spoiled after a certain number of days, given it was still saleable in the period before.
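The conditional spoilage probability mentioned above is the discrete hazard of the shelf-life distribution. The following sketch shows one way to compute it; the probability mass function used here is hypothetical (chosen to have a mean of three periods) and is not the distribution estimated in the paper, and the function name is an assumption for illustration.

```python
def conditional_spoilage_probabilities(pmf):
    """Map a shelf-life pmf {j: P(shelf life = j)} to the hazards
    P(spoiled after j periods | still saleable at the start of period j)."""
    ages = sorted(pmf)
    hazards = {}
    for idx, j in enumerate(ages):
        survivors = sum(pmf[a] for a in ages[idx:])  # P(shelf life >= j)
        hazards[j] = pmf[j] / survivors if survivors > 0 else 1.0
    return hazards


# Hypothetical pmf with a mean shelf life of three periods (illustrative only).
example_pmf = {1: 0.1, 2: 0.2, 3: 0.4, 4: 0.2, 5: 0.1}
hazards = conditional_spoilage_probabilities(example_pmf)
mean_shelf_life = sum(j * p for j, p in example_pmf.items())
```

The hazard for the largest possible shelf life equals one, so every unit that reaches the final age class is spoiled at the end of that period.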

Denoting the resulting total number of spoiled units at the end of period $t$ by $Z(i_t)$ leads to the following function of expected costs for a certain period:

3 Stochastic dynamic optimisation

The model elements described in Section 2 form the basis for representing the dynamics of the inventory system in a discrete-time model operating on the level of demand periods.

In the following, we describe these dynamics in detail and introduce a stochastic lookahead policy that is capable of exploiting the representation of uncertain parameters as probability distributions for determining replenishment order quantities.

In our model, the state at the beginning of period $t$ is represented by a state variable $S_t$. $S_t$ comprises the inventory vector $i_t$ (where each element $i_{t,j}$ corresponds to the number of available units delivered $j$ periods before), the supply state in the previous period $G_{t-1}$ (see Section 2.4), and the ordered (but not yet delivered) replenishment quantities $r_{t-\tau,t}, \ldots, r_{t-1,t+\tau-1}$.

The transition to the next state $S_{t+1}$ is not only affected by $S_t$, but also depends on the realisation of the stochastic variables demand, supply, and spoilage during period $t$ as well as on the replenishment order decision $r_{t,t+\tau}$ resulting from the order policy.

In our model, we make the (simplifying) assumption that this transition follows from a sequence of events occurring within the period $t$ (see the visualisation in Figure 2). At the beginning of a demand period, there is a starting inventory $i_t$. At this moment, the first element $i_{t,0}$ in the vector $i_t$ equals 0; it will only be changed when the supply is realised in the course of the period. The first event in $t$ is the replenishment order decision $r_{t,t+\tau}$ affecting supply in the future period $t + \tau$; it has to be taken without information regarding the realisations of supply and demand in period $t$, which only become known later. After the order decision, the supply $q_t(r_{t-\tau,t})$ becomes known. This results in the aforementioned change in the inventory, yielding the vector $i'_t$ with $i'_{t,0} = q_t$.

Then, demand $d_t$ becomes known. Given the e-grocery business case introduced in Section 2.1, we assume that SKUs are picked from the inventory according to a FIFO principle.

After taking the (satisfiable) demand out, the new inventory is written as $i''_t$. This inventory state is then used to parametrise the spoilage distribution from which a spoilage realisation vector $z_t = Z(i''_t)$ is sampled. The spoilage yields a new inventory vector $i'''_t$ representing the inventory at the end of period $t$.

To obtain the inventory state $i_{t+1}$ at the beginning of $t + 1$, the elements of the vector $i'''_t$ are shifted to the right by one, and $i_{t+1,0}$ is set to 0. The other elements of the state variable $S_{t+1}$ are given by the supply state $G_t$ and the vector with the replenishment order quantities that is augmented with $r_{t,t+\tau}$.
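The within-period sequence of events (supply, FIFO picking, spoilage, age shift) can be sketched as follows. This is an illustrative simplification rather than the authors' implementation: the function names are assumptions, and `sample_spoilage` is a hypothetical callback that is assumed to spoil every unit in the oldest age class.

```python
def fifo_pick(inventory, demand):
    """Serve demand from the oldest units first.

    inventory[j] holds the units delivered j periods ago, so the last
    element is the oldest. Returns (remaining inventory, lost sales).
    """
    inv = list(inventory)
    unmet = demand
    for j in reversed(range(len(inv))):
        picked = min(inv[j], unmet)
        inv[j] -= picked
        unmet -= picked
    return inv, unmet


def transition(inventory, supply, demand, sample_spoilage):
    """One period: supply arrives, demand is picked (FIFO), units spoil, ages shift.

    sample_spoilage(inv) is a hypothetical callback returning the number of
    spoiled units per age class; it is assumed to spoil every unit in the
    oldest class, so nothing saleable is dropped by the shift.
    """
    inv = list(inventory)
    inv[0] = supply                            # delivery fills age class 0
    inv, lost_sales = fifo_pick(inv, demand)   # satisfiable demand is removed
    spoiled = sample_spoilage(inv)
    inv = [units - z for units, z in zip(inv, spoiled)]
    end_inventory = sum(inv)
    next_inventory = [0] + inv[:-1]            # shift right; age class 0 reset to 0
    return next_inventory, lost_sales, sum(spoiled), end_inventory
```

The returned lost sales, spoiled units, and end-of-period inventory are the quantities that enter the per-period cost with weights b, h, and v, respectively.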

[Figure 2: Sequence of events within period $t$. The figure labels $i_t$ as the inventory level at the beginning of period $t$ (resulting from $t-1$).]

Given a state $S_t$, the cost $C(S_t)$ incurred during a period $t$ is assumed to consist of a cost $b$ caused by each unit of lost sales, an inventory cost of $v$ per unit in stock at the end of the period, and a spoilage cost $h$ per unit that is spoiled; we use the function $Z(i)$ to represent the sum of the spoiled units over all elements of the inventory vector. For a given realisation of all random variables, the cost function can be written as:

The expected cost incurred in period $t$ can then be written by using the probability distributions of the uncertain quantities; note that with the exception of the demand, these distributions are state-dependent, that is, they depend on $S_t$.

Due to the lead time $\tau$, the replenishment order decision $r_{t,t+\tau}$ taken in period $t$ does not affect the cost in period $t$ but the cost in period $t + \tau$. This becomes relevant when considering a given planning horizon $T$ for which we can compute the total expected cost as the sum of the per-period costs:

$$\sum_{t=1}^{T} \mathbb{E}\big[C(S_t)\big].$$

Based on the state-transition model and the cost function explained above, we can formulate the inventory management problem as a stochastic dynamic optimisation problem. In the following presentation of this problem and the lookahead policy that we propose for solving it, we use terminology and notation conventions proposed by Powell (2019a).

Assuming that we would like to minimise the total expected cost over a planning horizon of length $T$, we can state this problem as the problem of finding a policy $\pi$ that, given the state information $S_t$ at the beginning of a period $t$, determines the replenishment order decision $r^{\pi}_{t,t+\tau}(S_t)$ that minimises the following objective function:

To address this stochastic dynamic optimisation problem, we propose a stochastic lookahead policy in which we use a Monte Carlo simulation to capture the effect of the replenishment order decision $r_{t,t+\tau}$ taken in period $t$ on future periods for a lookahead horizon of length $H$.

As discussed e.g. in Powell (2019b), this type of policy exhibits several favourable properties:

Instead of relying on simplifying assumptions and point estimates, it is able to explicitly incorporate the full distributional information regarding uncertain parameters. Furthermore, it can be used in non-stationary settings in which the probability distributions of the stochastic variables (e.g. demand) vary over time.

Concerning the lookahead horizon $H$, observe that the first period that the order decision taken at $t$ has an effect on is the period $t + \tau$ in which the order is supposed to be delivered; we thus choose $H \geq \tau$. We denote the number of lookahead periods exceeding $\tau$ with $\nu$, that is, $H = \tau + \nu$. Let us first assume that $H = \tau$ (that is, $\nu = 0$). In that case, for a given state $S_t$ and for a given replenishment order decision $r_{t,t+\tau}$, we can approximate the expected cost $\mathbb{E}(C(S_{t+\tau}) \mid S_t, r_{t,t+\tau})$ in period $t + \tau$ by simulating $N$ sample paths starting at period $t$ and ending at period $t + \tau$. For a sample path $n$, $C(S^n_{t+\tau}) \mid S_t, r_{t,t+\tau}$ is obtained by simulating the state-transition logic described in Section 3.1 from start state $S_t$ using the given decision $r_{t,t+\tau}$ and random samples from the distributions representing supply, demand, and spoilage in each of the simulated periods from $t$ to $t + \tau$. In this setting, the optimisation problem to be solved in each period $t$ reads as follows:
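For the case H = τ, the Monte Carlo evaluation of candidate order quantities can be sketched as below. This is a deliberately simplified illustration, not the authors' implementation: it tracks a scalar inventory without age classes or spoilage, uses only the cost parameters b and v, searches over a finite candidate grid, and reuses the same random draws for every candidate (a variance-reduction choice made for this sketch). All function names and the sampler callbacks are assumptions.

```python
import random


def cost_at_delivery(inventory, pipeline, order, sample_demand, sample_supply_share,
                     b, v, rng):
    """Simulate one sample path from period t to t + tau and return the cost
    incurred in period t + tau. `pipeline` holds the tau orders already placed
    (due in t, ..., t + tau - 1); `order` is the candidate decision due in t + tau.
    """
    lost_sales = 0.0
    for quantity in list(pipeline) + [order]:
        available = inventory + sample_supply_share(rng) * quantity
        demand = sample_demand(rng)
        lost_sales = max(demand - available, 0.0)
        inventory = max(available - demand, 0.0)
    return b * lost_sales + v * inventory


def lookahead_order(inventory, pipeline, candidates, n_paths, sample_demand,
                    sample_supply_share, b=5.0, v=0.1, seed=0):
    """Return (order, estimated cost) minimising the sample-average cost in t + tau."""
    best = None
    for order in candidates:
        rng = random.Random(seed)  # same random draws for every candidate
        total = sum(
            cost_at_delivery(inventory, pipeline, order, sample_demand,
                             sample_supply_share, b, v, rng)
            for _ in range(n_paths)
        )
        average = total / n_paths
        if best is None or average < best[1]:
            best = (order, average)
    return best


# Illustrative call with hypothetical samplers (tau = 2, hence two pipeline orders).
demand = lambda rng: rng.choice([60, 80, 100, 120, 140])
share = lambda rng: 1.0 if rng.random() < 0.98 else rng.betavariate(2, 3)
best_order, best_cost = lookahead_order(0, [100, 100], range(0, 301, 10), 500, demand, share)
```

Replacing the scalar inventory with the age-structured inventory vector and adding a spoilage sampler would bring the sketch closer to the state-transition logic of Section 3.1.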

If our lookahead horizon $H > \tau$, that is, if $\nu > 0$, we extend the sample paths described above until the final period $t + H$ of the horizon. The costs in the lookahead periods after $t + \tau$ are not only affected by the decision $r_{t,t+\tau}$ to be taken in $t$, but also by the "simulated" decisions $r_{t',t'+\tau}$ taken in periods $t'$ with $t < t' \leq t + \nu$ that are part of the lookahead. To reflect the decreasing importance of later periods for the decision $r_{t,t+\tau}$, we use a discount factor $\rho$. Note that while the lookahead decisions $r_{t',t'+\tau}$ for $t' > t$ are not implemented, they are nonetheless part of the optimisation problem reading as follows:

Observe that the optimisation does not involve costs occurring in the periods before t + τ , since they are not affected by the decisions involved in the lookahead.

In this section, we examine the value of explicitly incorporating distributional information for the stochastic variables demand, spoilage, and supply shortage instead of point forecasts (expected values) when determining replenishment order decisions using the policy described in the previous section. In the field of decision analysis, the improvement in expected performance resulting from using full distributional information is called expected value of including uncertainty (EVIU), see e.g. Morgan et al. (1990) for a detailed description of EVIU and its relation to the value of information in economics. In the context of stochastic programming, the same concept is typically referred to as value of the stochastic solution (VSS), see e.g. Birge and Louveaux (2011) . While most analyses regarding EVIU and VSS compare the consideration of distributions for all stochastic variables to using no distributions at all, in the following investigation, we examine the value of considering distributions for each subset of the stochastic variables.

In a practical setting, the resulting values can be compared to the costs incurred by the collection and processing of the data needed for obtaining the distributional information regarding the respective stochastic variable(s). In particular, this allows the retailer to decide for each source of uncertainty whether a probabilistic representation is cost-efficient. In the following EVIU analysis, we consider the simplified situation in which the retailer knows the probability distribution for each source of uncertainty (demand, spoilage, shortages).

In practice, estimation uncertainty also needs to be taken into account, but this is highly dependent on the quality of the data available to the retailer. The parameter values for the simulation study are chosen in accordance with data from our business case.

The simulated data set covers T consecutive demand and supply periods for one example SKU. Our data set provides information on demand, spoilage, and supply shortages. Considering perishable SKUs as introduced in the business case in Section 2.1, we use a specific parameter vector for the following analyses. In the online supplementary material, we provide a discussion on the sensitivity of our results with respect to the underlying parameter values.

As suggested by previous literature (see Ulrich et al., 2021), we assume that demand in period $t$ follows a negative binomial distribution with mean $\mu_t$ and size parameter $k_t$, i.e.

$$D_t \sim \text{NegBin}(\mu_t, k_t).$$

In the simulations considered below, the retailer will either use this distribution or only an expected value (the mean of the distribution) when determining the replenishment order quantity. We reparameterise the distribution in terms of its mean $\mu_t$ and variance $\sigma_t^2 = \mu_t + \omega_t$, with the relation $k_t = \mu_t^2/(\sigma_t^2 - \mu_t)$, which follows from the negative binomial variance $\sigma_t^2 = \mu_t + \mu_t^2/k_t$. To allow for non-stationary demand, we draw the parameters of the demand distribution in each period as $\mu_t \sim \text{Pois}(\lambda_\mu)$ and $\omega_t \sim \text{Pois}(\lambda_\omega)$.
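The mapping between the (mean, variance) and (mean, size) parameterisations can be made concrete with a short sketch. It relies on the standard negative binomial identity Var = μ + μ²/k for the mean/size parameterisation; the function names are illustrative assumptions, and the example values are the Poisson means λ_μ = 100 and λ_ω = 300 rather than actual draws.

```python
def nb_size_from_moments(mean, variance):
    """Size parameter k of a negative binomial with the given mean and variance.

    Uses Var = mean + mean**2 / k, which requires variance > mean.
    """
    if variance <= mean:
        raise ValueError("negative binomial requires variance > mean")
    return mean ** 2 / (variance - mean)


def nb_success_probability(mean, size):
    """Success probability p in the (size, p) parameterisation: p = k / (k + mean)."""
    return size / (size + mean)


# Values at the Poisson means lambda_mu = 100 and lambda_omega = 300 (illustrative).
mu, omega = 100.0, 300.0
k = nb_size_from_moments(mu, mu + omega)
p = nb_success_probability(mu, k)
```

A draw of ω_t = 0 would make the variance equal to the mean, in which case the negative binomial degenerates to a Poisson distribution and the conversion above raises an error; how such draws are handled is not specified in the text.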

For the subsequent analysis we assume $\lambda_\mu = 100$ and $\lambda_\omega = 300$ and assume demand to be independent over the different periods, avoiding for simplicity more complex structures such as seasonality. An example realisation of simulated demand over 100 periods is shown in Figure 3 (a).

The shelf life of the SKU is generated from the distribution with probability mass function $f_{sl}(j)$, as shown in Table 1, with $j = 1$ corresponding to the situation in which the unit is spoiled at the end of the delivery period (i.e. day 0). The mean shelf life implied by this distribution is three periods. The conditional probabilities that a unit is spoiled after exactly $j$ periods, given it was still saleable at the beginning of that period, are provided in Table 9 in the Appendix. Again, below we consider two scenarios in which the retailer either makes use of the full shelf life distribution or only its mean.

We assume there to be three "states" of delivery: complete delivery (state 1), complete shortage (state 2), and partial delivery (state 3), with the sequence of delivery states across demand periods governed by the Markov chain with transition probability matrix (TPM)

The associated stationary state distribution, also taken as the distribution for period $t = 1$, is $\pi^* = (0.9804, 0.0098, 0.0098)^\top$. If the retailer is faced with partial supply shortage, i.e. if state 3 is active, then the realised relative amount of supply follows a beta distribution with shape parameters $\alpha = 2$ and $\beta = 3$, leading to an average relative shortage of 60% in case of partial delivery and an overall average shortage of 1.57%. A corresponding example realisation of relative shortage for demand periods $t = 1, \ldots, 100$ is given in Figure 3 (b).
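The shortage figures quoted above can be checked with a short calculation. The transition matrix in this sketch is a hypothetical example chosen for illustration; it reproduces the stated stationary distribution but is not taken from the paper. The mean of a Beta(2, 3) distribution is 2/(2 + 3) = 0.4, so the average relative shortage under partial delivery is 0.6.

```python
def stationary_distribution(tpm, n_iter=10_000):
    """Approximate the stationary distribution by repeated multiplication pi <- pi * P."""
    n = len(tpm)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * tpm[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical TPM chosen for illustration; not taken from the paper.
tpm = [
    [0.99, 0.005, 0.005],
    [0.50, 0.50, 0.00],
    [0.50, 0.00, 0.50],
]
pi = stationary_distribution(tpm)

mean_share_partial = 2 / (2 + 3)              # mean of Beta(2, 3) = 0.4
shortage_partial = 1 - mean_share_partial     # average shortage under partial delivery
# State 2 loses the full order, state 3 loses 60% on average.
overall_shortage = pi[1] * 1.0 + pi[2] * shortage_partial
```

With the stated stationary probabilities, the overall average shortage is 0.0098 · 1 + 0.0098 · 0.6 ≈ 0.0157, in line with the 1.57% reported in the text.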

While most retailers are able to specify inventory, spoilage and short-term shortage costs, they are not able to track the long-term costs of stock-outs (Walter and Grabner, 1975; Fisher et al., 1994). However, long-run objectives that impact expected future sales strongly affect the strategic service-level selection (Anderson et al., 2006). In e-grocery retailing, very high service levels of 97% to 99% are considered, reflecting that overall shortage costs are valued significantly higher than inventory and spoilage costs (cf. Ulrich et al., 2021). In accordance with the strategic environment given for the e-grocery retailer in our case study, we assume the costs for one unit of excess demand to be b = 5, inventory costs to be v = 0.1 per unit, and spoilage to generate costs of h = 1 per unit for the SKU considered. This relation between costs for excess inventory and shortages takes into account the high service-level target in e-grocery retailing. For the lookahead policy, the absolute values of the cost parameters are not relevant; only the relation between these parameter values affects the solution determined by the model. A lead time of τ = 3 days is assumed to be required between the replenishment order decision and the delivery to the warehouse of the retailer.

We now investigate the relevance of distributional information for determining replenishment order decisions. To this end, for each source of uncertainty, we consider two different information settings: (i) the retailer only has an expected value of the uncertain quantity and (ii) the retailer knows the probability distribution. Given that we are interested in all possible combinations of these two levels for the three sources of uncertainty, we will investigate the eight scenarios depicted in Table 2.
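The eight information settings form a full 2³ factorial design, which can be enumerated as sketched below. The numbering produced here agrees with the scenarios that are described explicitly in the results discussion (Scenarios 1 to 5 and 8); the assignment of Scenarios 6 and 7 is an assumption that simply continues the same binary pattern. The identifiers are illustrative.

```python
from itertools import product

SOURCES = ("demand", "shelf_life", "supply")


def enumerate_scenarios():
    """Return {scenario number: {source: True if modelled by a distribution}}."""
    scenarios = {}
    flag_combinations = product([False, True], repeat=len(SOURCES))
    for number, flags in enumerate(flag_combinations, start=1):
        scenarios[number] = dict(zip(SOURCES, flags))
    return scenarios


scenarios = enumerate_scenarios()
```

A value of `False` means that the retailer only uses the expected value of that quantity, while `True` means that the full probability distribution enters the lookahead policy.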

For each scenario, the retailer optimises the replenishment order quantity in each demand period according to the information available (i.e. expected values or distributions) and using the lookahead policy described in Section 3.2. At the end of the period, realised demand, spoilage, and supply shortages generate costs according to the given cost parameters (h, v, b). For each scenario, we then calculate total costs over the time horizon considered, t = 1, . . . , T.

Our analysis is based on T = 5000 evaluation periods, which corresponds to more than 15 years of data in a business case. This allows us to estimate the EVIU, i.e. cost reductions gained from precise distributional information, for each source of uncertainty as well as for the whole model. We parameterise the stochastic lookahead policy introduced in Section 3.2 based on a set of initial experiments, addressing the trade-off between computation time and stability of the simulation results. In particular, the retailer needs to determine order quantities for all SKUs in the assortment every day within a few hours, which limits the available computation power and time for single SKUs. Therefore, we use N = 1000 sample paths (simulation runs), while considering ν = 3 additional periods with discount factor ρ = 0.9. The results of our analyses based on these simulated data are provided in Table 3. Incorporating the complete distributional information for each source of uncertainty (Scenario 8) reduces the total costs by 52% compared to the setting where merely the expected value for each source of uncertainty is applied (Scenario 1). The value of including uncertainty varies between the different model components. Furthermore, the sequence of including distributional information matters. In this simulation, including the distributional information for demand while using only the expected value for shelf life and supply (Scenario 5) already leads to a substantial reduction in total costs (51.6%). Compared to this, considering distributional information on shelf life only (Scenario 3) leads to much higher costs (with cost reductions of only 6.8% compared to Scenario 1). Given distributional information on supply only (Scenario 2), the total costs even increase.
When including probability distributions for supply and shelf life, while still considering only the expected value for demand (Scenario 4), the effects obtained in Scenarios 2 and 3 cancel each other out, leading to nearly the same total costs as in Scenario 1.

Table 3: Results for analysis on the expected value of including uncertainty.

[...] and the service level realised in our simulations. Scenarios 5-8, i.e. those that incorporate probabilistic demand information, lead to the highest average service levels of more than 98%, as required in e-grocery retailing. To account for the variation in demand, the retailer here increases replenishment order quantities and holds a significantly higher safety stock. Therefore, the average inventory level and amount of spoilage increase by more than a factor of 3 between Scenarios 1 and 8. However, because of the asymmetric cost structure, savings due to the increased service level exceed additional expenditures for spoilage and inventory holding. The same qualitative pattern is obtained when including only the probability distribution for shelf life (Scenario 3). However, the magnitude of the effect is much smaller due to the low probability of spoilage within the first two periods. In contrast, considering the probability distribution for supply only (Scenario 2) decreases all summary statistics compared to Scenario 1. While an expected supply shortage of 1.57% leads to a mark-up of the same amount on the order quantities in Scenario 1, shortages occur only rarely. To account for these shortages in single periods, the retailer would have to hold a safety stock equal to average demand in all periods. As this leads to expenditures due to inventory holding and spoilage exceeding potential savings for lost sales in single periods, the resulting order quantities do not account for the risk of supply shortages in most periods.
While the mark-up of expected supply shortages compensates for variation in demand at least to a small extent, including the probability distribution for supply is only beneficial when the retailer also accounts for uncertainty in demand.

The above simulation-based analysis indicates how retailers can reduce total costs when using full probability distributions instead of expected values for each source of uncertainty.

However, it is to be expected that the results strongly depend on the exact specification of the distributions of the random variables associated with demand, supply, and shelf life (and of course also on the cost structure). We hence provide a sensitivity analysis in the online supplementary material, where we gradually change the parameters used for each source of uncertainty and derive the resulting costs. We observe that the benefit of incorporating the demand distribution increases when its variance increases, which can be explained by the asymmetric cost structure. Incorporating information on the shelf life distribution is most beneficial for distributions with a high variance or a small mean (corresponding to a high risk of spoilage in early periods). The relevance of incorporating probabilistic information on potential supply shortages depends not only on the associated risk, but also on the persistence of the corresponding process (i.e. whether shortages tend to occur in several consecutive periods). Regarding the cost structure, we find that when assuming a constant relationship between inventory costs and spoilage costs, then potential savings increase with a higher cost asymmetry (due to lost sales).
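The mechanism behind these findings, namely that an asymmetric cost structure makes the expected value a poor order quantity, can be illustrated with a deliberately stylised single-period sketch. It is not the multi-period lookahead policy of Section 3.2: the normal demand model, its parameters, and the assumption that every leftover unit is held and then spoils are illustrative choices, and the policy is evaluated on the same sample paths it was tuned on. Only the cost parameters (v = 0.1, h = 1, b = 5) and the number of sample paths (N = 1000) are taken from the paper.

```python
import random

# Cost parameters as stated in the paper: inventory holding v, spoilage h, lost sales b.
V, H, B = 0.1, 1.0, 5.0


def simulate_demand(rng, mean=100.0, sd=20.0):
    """Hypothetical demand model (assumption): rounded, non-negative normal draw."""
    return max(0, round(rng.gauss(mean, sd)))


def period_cost(order, demand):
    """Single-period cost; leftover units are assumed to be held and then spoil."""
    leftover = max(order - demand, 0)
    lost = max(demand - order, 0)
    return (V + H) * leftover + B * lost


def average_cost(order, demands):
    """Monte Carlo estimate of the expected cost of a given order quantity."""
    return sum(period_cost(order, d) for d in demands) / len(demands)


rng = random.Random(1)
demands = [simulate_demand(rng) for _ in range(1000)]  # N = 1000 sample paths

# Policy A: order the expected value (point forecast only).
order_mean = round(sum(demands) / len(demands))

# Policy B: use the full sample and pick the cost-minimising order quantity.
candidates = range(min(demands), max(demands) + 1)
order_dist = min(candidates, key=lambda q: average_cost(q, demands))

cost_mean = average_cost(order_mean, demands)
cost_dist = average_cost(order_dist, demands)
print(order_mean, order_dist, round(cost_mean, 2), round(cost_dist, 2))
```

Because lost sales are far more expensive than overstocking in this sketch, the distribution-based quantity lies above the mean, which mirrors the higher safety stock reported for Scenarios 5-8.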

In this section, we use data from a European e-grocery retailer to demonstrate the practical applicability of our proposed method for a business case as introduced in Section 2.1. Compared to the stationary setting considered in the simulation study, here the parameter values of the probability distributions need to be estimated from historical data and additionally vary over time.

The data set provided by the e-grocery retailer covers demand periods of six different local distribution warehouses from January 2019 to December 2019, i.e. before the beginning of the Covid-19 pandemic. One observation here equals one demand period t, i.e. one day of delivery.

We consider four SKUs within the category fruits and vegetables, namely mushrooms, grapes, organic bananas, and lettuce. For illustration, Figure 4 displays the demand for the SKU mushrooms in 2019 for one selected local warehouse. We find recurring peaks on Mondays, but do not observe any significant trend or seasonality. The data set includes features to be used for the demand forecast as well as the (uncensored) realised demand in this period (for a detailed description, we refer to Ulrich et al., 2021). For the perishable SKUs analysed, the number of sales periods before spoilage is not defined by best-before-dates, but may depend on non-constant prior supply chain attributes, such as the weather or the country of origin. Due to missing best-before-dates, our data set includes a parameter for the expected number of sales periods for each SKU, predefined by the retailer. The expected number of sales periods for lettuce, as an example, equals one demand period, i.e. it is assumed that excess inventory cannot be sold in the following demand period and thus generates spoilage. In addition, the data set includes information on the quantity ordered, the quantity delivered by the regional distribution center, and the number of units spoiled in a certain demand period.

At the beginning of each demand period, the current inventory level, the supply state of the last period, and previous replenishment order quantities for future demand periods within the lead time are known. From historical data, we estimate the distribution of demand in future periods, the transition probability matrix (TPM) to make predictions with respect to possible supply states, and the empirical distribution of shelf lives; details are provided below.

The data related to spoilage, however, depend on previous replenishment order decisions, as these quantities affect the level of inventory and, therefore, the amount of spoilage observed in the data set. In addition, we observe the realised supply shortages only for the quantity that was requested, and hence not for any other possible order quantity. In contrast, the data set provided by the e-grocery retailer allows us to make use of uncensored demand data.

The demand forecast is obtained via regression modelling, considering the features ID of the warehouse, weekday, price, marketing activities, known demand, and median demand of the previous month. The marketing activities were included only for the SKU grapes, as for the others there was no marketing campaign in the demand periods analysed. Based on the good performance of distributional regression methods in situations with very high service-level targets in Ulrich et al. (2021), we apply Generalized Additive Models for Location, Scale and Shape (GAMLSS) for demand forecasting, assuming a negative binomial distribution for the response.

As introduced in Sections 2.4 and 2.5, we model spoilage using an estimated probability distribution over the shelf life for each specific SKU, while supply shortages are assumed to be governed by a 3-state Markov chain with state transition probabilities estimated from historical data. For the state associated with partial supply, the parameters of the beta distribution are estimated based on supply shortage. For each determinant and each SKU, we use the previous six months of data to estimate the associated probability distributions and incorporate them into the lookahead policy for an evaluation period of one month. That is to say, as an example, we train on data from January to June 2019 to forecast demand, spoilage, and supply shortages in July 2019. Due to the limited number of demand periods during six months, we aggregate historical data on spoilage and supply shortage over the warehouses to ensure stable estimations.
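As a minimal sketch of how state transition probabilities of a 3-state supply chain could be estimated from a historical state sequence, the following counts observed transitions and normalises each row; the stationary distribution is then approximated by power iteration. The state coding 0, 1, 2, the example sequence, and the uniform fallback for unobserved states are assumptions made for illustration only, not the retailer's data or the exact estimator used in the paper.

```python
from collections import Counter


def estimate_tpm(states, n_states=3):
    """Row-normalised transition counts (relative-frequency estimate of the TPM)."""
    counts = Counter(zip(states[:-1], states[1:]))
    tpm = []
    for i in range(n_states):
        row_total = sum(counts[(i, j)] for j in range(n_states))
        if row_total == 0:
            # Assumption: a state never left in the data gets a uniform row.
            tpm.append([1.0 / n_states] * n_states)
        else:
            tpm.append([counts[(i, j)] / row_total for j in range(n_states)])
    return tpm


def stationary_distribution(tpm, iterations=10_000):
    """Power iteration: repeatedly apply the TPM to a uniform start distribution."""
    n = len(tpm)
    dist = [1.0 / n] * n
    for _ in range(iterations):
        dist = [sum(dist[i] * tpm[i][j] for i in range(n)) for j in range(n)]
    return dist


# Hypothetical sequence of daily supply states (labels are illustrative only).
observed = [0, 0, 0, 2, 0, 0, 1, 1, 0, 0, 0, 0, 2, 2, 0, 0]
tpm = estimate_tpm(observed)
pi_star = stationary_distribution(tpm)
```

Power iteration is used here only to keep the sketch dependency-free; it converges for the example because every state has a positive probability of remaining unchanged.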

To derive the stochastic distribution of the shelf life for a given SKU, we consider the number of units spoiled at the end of a certain period, for which we calculate the supply date under the assumption of the FIFO principle based on historical data. This allows us to derive the relative frequencies of shelf lives within the data set. Figure 5 illustrates the estimated CDF of the shelf life for the SKU mushrooms. We find only slight differences between months, implying a low level of seasonality in the shelf life of this SKU. While about 30% of the units have a shelf life longer than two days, every other unit is already spoiled after the first day. The stationary distribution of supply states for the SKU mushrooms can be obtained from [...]

As a benchmark for our lookahead policy, we replicate the current replenishment order decision model according to the guidelines of the e-grocery retailer analysed. For each SKU, the retailer operates with an inventory-level target that equals the sum of the expected mean demand and a percentage share of the mean demand as safety stock. The safety stock depends on historic realised mean demand and the expected number of sales periods. For each SKU, the expected number of sales periods is specified by a fixed parameter, e.g. one sales period for the SKU lettuce. Therefore, the retailer does not consider any variation in the shelf life of the SKU. SKUs with a low mean demand and a low number of expected sales periods obtain a low safety stock, e.g. 30% of the mean demand for lettuce, whereas SKUs with a high mean demand and a high number of expected sales periods obtain higher safety stocks, e.g. 70% of the mean demand for grapes. Thus, the inventory-level target for an SKU with a safety stock of 30% equals 1.3 × mean demand. The safety stock for the four SKUs analysed in this case study varies between 30% and 70% of mean demand. The comparison between the assumed fixed shelf life of two days for the SKU mushrooms and the CDF in Figure 5 provides evidence for potential cost reductions by incorporating stochastic spoilage instead of a fixed sales period into the inventory management process.
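The benchmark rule described above can be sketched in a few lines. The inventory-level target follows the text directly; turning the target into an order quantity by ordering up to it (and never ordering a negative amount) is an assumption about how the retailer applies the target, and the mean demand of 100 units is a hypothetical value.

```python
def benchmark_target(mean_demand, safety_share):
    """Inventory-level target: mean demand plus a percentage share as safety stock."""
    return (1.0 + safety_share) * mean_demand


def benchmark_order(mean_demand, safety_share, inventory_position):
    """Assumed order-up-to rule: fill the gap to the target, never below zero."""
    return max(0.0, benchmark_target(mean_demand, safety_share) - inventory_position)


# Safety shares quoted in the text: 30% for lettuce, 70% for grapes.
lettuce_target = benchmark_target(mean_demand=100.0, safety_share=0.30)
grapes_order = benchmark_order(mean_demand=100.0, safety_share=0.70,
                               inventory_position=50.0)
```

Note that neither function has any input for variability in demand, shelf life, or supply, which is the limitation the lookahead policy is meant to address.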

The e-grocery retailer currently assumes that the quantity delivered equals the quantity ordered, i.e. the yield rate equals 100%. As a consequence, the risk of random yield is neglected by the retailer and does not impact the replenishment order decision. 

As at any point in time the demand forecast, the CDF for shelf life, and the TPM for supply states are estimated from the previous six months of data, we evaluate the lookahead policy according to an out-of-sample rolling window procedure, e.g. we train on data for January to June and evaluate July. This enables a comparison between the suggested lookahead policy and the benchmark for six consecutive months from July to December 2019. For both approaches we assume that the inventory is empty on the 1st of July, i.e. at the beginning of our evaluation period. Due to the lead time of τ = 3, we consider the replenishment order quantities before the 4th of July as given and identical for both policies. For each demand period, we conduct two steps. First, the next replenishment order quantity is determined according to the underlying policy, and second, the period is evaluated using the business case data set to calculate resulting costs. For Sundays and bank holidays when there was no service, we set the replenishment order quantity to zero.
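The rolling window procedure can be sketched as a generator of (training months, evaluation month) pairs. The function name and the month labels are illustrative; only the six-month training length and the July to December evaluation span are taken from the text.

```python
def rolling_windows(months, train_length=6):
    """Yield (training months, evaluation month) pairs for out-of-sample evaluation."""
    for end in range(train_length, len(months)):
        yield months[end - train_length:end], months[end]


months_2019 = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
windows = list(rolling_windows(months_2019))

for train, evaluate in windows:
    print(f"train on {train[0]}-{train[-1]}, evaluate {evaluate}")
```

With twelve months of data and a six-month training window, this yields exactly the six evaluation months considered in the case study.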

As our demand data is uncensored, it does not depend on the inventory level. Therefore, we refer to realised demand according to the data set given by the retailer for the evaluation of a period. However, as our replenishment order quantity for a given period may deviate from the quantity ordered by the retailer for the corresponding day, we make use of the information on the relative amount of incompletely supplied replenishment order quantities in the data of the retailer. We transfer this information to our order quantities, i.e. if there was full supply (or full shortage), we also assume full supply (or full shortage) for a different quantity. The number of units spoiled depends on the composition of the inventory with corresponding date of supply. Since the inventory in our model again deviates from the inventory given by the retailer's data, we use simulations to determine which number of units would have been spoiled if the retailer had followed the policy. Specifically, as introduced above, we assume spoilage to follow a binomial distribution with the probability parameter estimated from historical data and the number of units with a given supply date. To determine the amount of spoilage in the evaluation, we make use of the underlying probability distribution which is used as the forecast in the lookahead policy for the following month. For each demand period and supply date we generate a random number from a (0,1) uniform distribution. Applying this value to the inverse CDF of the shelf life gives the number of units spoiled. Using the same random number in the evaluation of both approaches ensures that a larger inventory on a given day with identical supply date leads to a larger number of units spoiled and vice versa.
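One way to read the common-random-number step is as inverse-transform sampling from the binomial spoilage distribution: the same uniform draw is mapped through the quantile function for each policy's inventory. The sketch below follows that reading; the spoilage probability and the inventory sizes are hypothetical values, and the helper `binomial_quantile` is not part of the paper.

```python
import math
import random


def binomial_quantile(u, n, p):
    """Smallest k with P(X <= k) >= u for X ~ Binomial(n, p)."""
    cumulative = 0.0
    for k in range(n + 1):
        cumulative += math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        if cumulative >= u:
            return k
    return n  # guards against floating-point round-off when u is close to 1


rng = random.Random(42)
u = rng.random()   # one uniform draw per demand period and supply date
p_spoil = 0.3      # hypothetical spoilage probability for this supply date

# The same draw u is reused for both policies (common random numbers).
spoiled_benchmark = binomial_quantile(u, n=40, p=p_spoil)
spoiled_lookahead = binomial_quantile(u, n=60, p=p_spoil)
```

Because a binomial distribution with more trials stochastically dominates one with fewer trials at the same probability, reusing u guarantees that the larger inventory never yields fewer spoiled units, so cost differences between the policies are not driven by sampling noise.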

We calculate total realised costs by considering costs for inventory holding, spoilage, and demand shortages using the replenishment order quantity determined by our lookahead policy and the benchmark. Cost parameters are given as introduced in Chapter 4 by v = 0.1 for one unit in the inventory, h = 1 for each unit spoiled, and b = 5 for one unit of lost sales.
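The cost accounting can be written down directly from the three cost parameters. In this sketch the function names and the three example periods are hypothetical, and it is assumed that end-of-period inventory, spoiled units, and lost sales are recorded as separate quantities for each period.

```python
def period_cost(inventory_end, units_spoiled, lost_sales, v=0.1, h=1.0, b=5.0):
    """Cost of one demand period: inventory holding + spoilage + lost sales."""
    return v * inventory_end + h * units_spoiled + b * lost_sales


def total_cost(periods):
    """Sum period costs over (inventory_end, units_spoiled, lost_sales) tuples."""
    return sum(period_cost(*period) for period in periods)


# Hypothetical three-day evaluation: (inventory at end of period, spoiled, lost sales).
example_periods = [(20, 3, 0), (5, 0, 0), (0, 0, 4)]
example_total = total_cost(example_periods)
```

In this example the single day with four lost sales costs more than the two days with inventory and spoilage combined, which illustrates the cost asymmetry.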

Evaluating both policies for each SKU and warehouse enables us to monitor the resulting inventory at the end of a period, the number of units spoiled, lost sales, and resulting total costs for each demand period within the evaluation period.

We evaluate four SKUs within six local distribution warehouses. Due to missing data for the SKU lettuce in two warehouses, we are able to evaluate 22 SKU/warehouse combinations in total. Table 7 illustrates relative changes in the resulting average costs, i.e. relative savings, when using our lookahead policy instead of the benchmark approach. Overall, we find substantial cost reductions of 6.2% to 23.7% for all four SKUs. As our data set allows us to evaluate only six months of data, results vary considerably across the different SKU/warehouse combinations, and for 4 out of the 22 combinations we in fact found an increase in costs. It should also be noted that the cost parameters used in the lookahead policy may differ from the cost structure implicitly embedded in the benchmark policy. However, our results in the sensitivity analysis (see supplementary material) show that using probabilistic information is superior across different values of the cost parameter for lost sales b.

[...] -26.6% -7.5% -6.1% -6.6%
Average -23.7% -8.8% -20.7% -6.2%

Table 7: Average change in the relative costs when using our lookahead approach compared to the benchmark approach, for each warehouse and SKU.

Our simulation study in Chapter 4 suggests that retailers are already able to reduce costs substantially even when accounting only for demand uncertainty. Therefore, we further compare average costs when using the lookahead policy incorporating only information on the demand distribution with the benchmark policy for the SKU mushrooms and every warehouse (Table 8). We find that using the demand distribution alone reduces average costs over all warehouses by 22.9%, whereas additionally including distributional information on the shelf life and supply shortages leads to a further cost reduction of only 1.1%. These findings corroborate the results from the simulation study, indicating that the demand distribution is the main source of uncertainty and the most relevant information to incorporate in the replenishment order decision.

[...] customers switching to another company. In our setting with the specific assumptions made for the cost parameters, the higher safety stock under the lookahead policy induces lower average costs over the full evaluation period. The asymmetric cost structure leads to the interesting result that we find higher costs under the lookahead policy in about 65% of the demand periods, yet the average overall costs are lower by about 11.3% (see Table 7). To illustrate this phenomenon, Figure 7 [...]

In summary, we find that when fully accounting for the uncertainties in inventory management, the asymmetric cost structure in e-grocery retailing leads to higher average replenishment order quantities. While resulting costs under the lookahead policy are slightly increased for the majority of periods due to higher inventory levels and spoilage, the minimisation of lost sales yields an overall reduction in costs for the retailer compared to the benchmark policy.

In this paper, we developed an inventory management framework for grocery retailing that allows us to model various sources of uncertainty, namely demand, spoilage, and supply shortages, with suitable probability distributions estimated from data. Using a stochastic lookahead policy incorporating Monte Carlo techniques to address our dynamic stochastic optimisation problem, we analyse the value of explicitly exploiting probabilistic information instead of relying on point forecasts (expected values) when determining replenishment order decisions. Our results demonstrate that incorporating the full distributional information for all sources of uncertainty can lead to substantial cost reductions in inventory management (with the amount of savings of course depending on the specific situation). We additionally show that the importance of including distributional information tends to increase with higher asymmetry in costs (i.e. very low or very high service-level targets), as commonly found in e-grocery retailing.

In practice, data collection, data preparation, and data analysis require operational effort and costs for retailers, which need to be taken into account when considering a possible implementation of the optimisation framework considered here. Regarding the different sources of uncertainty, the results of our simulation-based analysis as well as of the case study indicate that the benefit of integrating the probability distributions instead of expected values when determining replenishment order quantities is highest for demand. In contrast, the additional contribution of modelling shelf lives and supply shortages by probability distributions in our analyses was found to be marginal, such that the analysis costs related to these two sources of uncertainty may exceed the potential savings.

From a managerial perspective, the case study suggests that using modern computational methods exploiting the considerable amount of data available in e-grocery retailing has the potential to outperform simple parametric inventory management policies designed by experienced human experts. In addition to explicitly accounting for all sources of uncertainty, a key advantage of our lookahead policy over simple parametric policies is that it naturally adapts to a changing environment (e.g. induced by dynamic market developments), structural shocks (e.g. the Covid-19 pandemic), and regime shifts due to strategic changes (e.g. an increased focus on sustainability).

Ulrich, M., Jahnke, H., Langrock, R., Pesch, R., and Senge, R. (2022). Classification-

A.1 State distribution of supply shortage

Let δ_t denote the proportion of the ordered quantity r_t that is actually supplied, such that 1 − δ_t is the relative supply shortage. The homogeneous Markov chain determining the sequence of supply states G_1, ..., G_T is specified by the transition probabilities π_{i,j} = Pr(G_t = j | G_{t−1} = i), i, j ∈ {1, 2, 3}, t ≥ 2. The state distribution in the first period t = 1 is assumed to be given by the Markov chain's stationary distribution, π* = (Pr(G_t = 1), Pr(G_t = 2), Pr(G_t = 3)).

The proportion of units supplied, δ_t, is then determined as follows:

In case of partial delivery, the beta distribution assumed for the proportion of units delivered implies a mean supply rate of α/(α + β); the sum α + β constitutes a precision parameter.

Across all three states, the proportion of units supplied follows a beta distribution with additional point masses on zero and one and a stationary mean of π*_1 + π*_3 · α/(α + β).
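As a sketch, the case distinction for the proportion supplied can be written out in a form consistent with the stationary mean stated above. The state labelling used here (state 1 for full supply, state 2 for no supply, state 3 for partial supply) is an assumption inferred from that mean rather than a definition given in this appendix.

```latex
\delta_t =
\begin{cases}
1, & \text{if } G_t = 1 \text{ (full supply)},\\
0, & \text{if } G_t = 2 \text{ (no supply)},\\
\tilde{\delta}_t \sim \mathrm{Beta}(\alpha, \beta), & \text{if } G_t = 3 \text{ (partial supply)},
\end{cases}
\qquad
\mathbb{E}[\delta_t]
  = \pi^*_1 \cdot 1 + \pi^*_2 \cdot 0 + \pi^*_3 \cdot \frac{\alpha}{\alpha + \beta}
  = \pi^*_1 + \pi^*_3 \cdot \frac{\alpha}{\alpha + \beta}.
```

Under this labelling the point mass on one has weight π*_1 and the point mass on zero has weight π*_2, which is what makes the marginal distribution a zero-and-one-inflated beta distribution.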

The conditional probability p_j that a given unit is spoiled after j periods is given in Table 9.

Table 9: Conditional probability of spoilage p_j at the end of a given demand period j in the simulated data set.

For each of the three sources of uncertainty, we compare the results obtained under Scenarios 1 (expected values only) and 8 (full probabilistic information), respectively, in the same simulation setting.

We start by adjusting the variance of demand. As our costs are asymmetric, with lost sales more expensive than inventory and spoilage, we expect the benefit of incorporating the demand distribution to increase when its variance increases. We still allow for nonstationary demand but vary the parameter in the variance-generating Poisson distribution, λ_ω ∈ {100, 200, 400, 500}, holding λ_µ = 100 constant. Figure 8 shows that the average costs substantially increase when using expected values only (Scenario 1), while the per-period costs only slightly increase when incorporating full distributional information for all sources of uncertainty (Scenario 8). As expected, the importance of incorporating information on the demand distribution thus increases with its variance.

The sensitivity of the results with respect to the shelf-life distribution is analysed in two different ways. First, we consider two settings (f^sl_1 and f^sl_2) with the same mean shelf life (three periods) but different variability. Second, we analyse two shelf-life distributions (f^sl_3 and f^sl_4) with the same relatively small variance but different mean shelf lives. The distributions are provided in Table 10. Here f^sl_1 corresponds to an SKU with a small variation in the shelf life, with 70% of the units spoiling one day after the expected shelf life at the latest, and each unit being saleable for 2-5 periods. In contrast, f^sl_2 represents a heavy-tailed distribution where both short shelf lives (one period) and longer ones ( [...]

[...] and Scenario 8 (using full distributional information). In the baseline setting as already presented in Section 4.2, the distribution is nearly symmetric around the mean shelf life of four periods, with a small risk of spoilage within the first two periods. In this setting, cost reductions of around 52% could be achieved when incorporating full distributional information, while the reduction in Scenario 3 is limited to 6.8%.
If the risk of very early spoilage is low, as caused by a small variance (f^sl_1) or a high mean (f^sl_4), similar cost reductions are achieved. In contrast, incorporating distributional information for shelf life only (Scenario 3) is more beneficial for distributions with a high variance (f^sl_2) or a small mean (f^sl_3), corresponding to a high risk of spoilage in early periods. At the same time, due to increased total costs, reductions achieved when incorporating probability distributions for all sources of uncertainty (Scenario 8) are smaller than under the baseline distribution.

Next we analyse the sensitivity of the results with respect to supply shortages, considering four different transition probability matrices regarding the change of supply states, while holding the parameters of the beta distribution in case of partial supply shortage constant.

The first matrix corresponds to a situation where the retailer is faced with complete shortage slightly more often than under the baseline scenario, with rare switches to partial or full [...], π* = (1/3, 1/3, 1/3)^T.

The difference between these two settings lies in the state persistence, with Π_3 corresponding to higher and Π_4 to lower persistence. The results presented in Table 12 show a large variation in relative cost savings when comparing the resulting average per-period costs for Scenarios 1 and 8 for different transition probability matrices on supply states. Due to the increased risk of (partial) supply shortages, in all cases considered here average total costs are higher than under the baseline matrix.

This also leads to a decreased potential for reducing costs when incorporating probabilistic information for all sources of uncertainty (Scenario 8). However, while the low risk of supply shortages in the baseline scenario even increased total costs in Scenario 2, we find cost reductions for all cases in this analysis. Since lost sales are more expensive than inventory holding and spoilage, a model incorporating knowledge of the TPM determines replenishment order quantities such that there is a larger safety stock. Therefore, substantial cost savings can be reached in Setting Π_1. A similar result can be obtained when considering Π_4 with a probability of 1/3 for all three supply states independent of the previous state. At the same time, savings in Settings Π_2 and Π_3 are much smaller. Due to the persistence of the same supply state, the retailer is rarely able to react to supply shortages by increasing the replenishment order quantity for the following period, as there is still a large probability of shortages.

Finally, we consider a change in the cost structure for lost sales, inventory holding, and spoilage. As introduced above, costs in e-grocery retailing are in general asymmetric, owing to substantial long-term consequences when customers are dissatisfied with the shopping experience because items are unavailable. It can be expected that the extent of cost savings when including probability distributions in the inventory management optimisation decreases when costs are more symmetric. We test this hypothesis by changing the relative relationship between cost parameters. While assuming a constant relationship between inventory costs v = 0.1 and spoilage costs h = 1, we change the costs for one unit of lost sales. In the first analysis, we assume that the costs for lost sales equal inventory costs, leading to b_1 = 0.1. Furthermore, we consider b_2 = 0.5, b_3 = 1 (i.e. costs for lost sales and spoilage are identical), b_4 = 2, and b_5 = 10. Figure 9 shows average per-period costs for Scenario 1 (red line), using expected values for the three sources of uncertainty, and Scenario 8 (blue line), using distributional information, for given unit costs for lost sales. If these costs lie between the unit costs for inventory holding and spoilage, the difference is negligible, whereas incorporating probability distributions becomes more important when the cost structure is asymmetric.

This result is confirmed by Figure 10 [...] savings of only 2.6% are achieved when including distributional information, whereas for the business case of e-grocery retailing with asymmetric cost structure (and corresponding high service-level targets) savings are much larger. As introduced above, for costs per unit lost of b = 5, including information on the distribution of demand, spoilage, and supply shortages reduces costs by more than 50%, while potential savings are even larger for b = 10.

Rolling horizon procedures in nonhomogeneous Markov decision processes

Measuring and mitigating the costs of stockouts

A multiperiod newsvendor problem with partially observed demand

Introduction to Stochastic Programming

Manufacturing batch size and ordering policy for products with shelf lives

Smart 'predict, then optimize'

Retail forecasting: Research and practice

Making supply meet demand in an uncertain world

Newsvendor solutions with general random yield distributions

Stochastic Models in Operations Research: Stochastic Processes and Operating Characteristics

Optimal inventory control in a multi-period newsvendor problem with non-stationary demand

A data-driven distributionally robust newsvendor model with a Wasserstein ambiguity set

Provably near-optimal sampling-based policies for stochastic inventory control models

Weak aggregating algorithm for the distribution-free perishable inventory problem

Uncertainty: A Guide to Dealing with Uncertainty in Quantitative Risk and Policy Analysis

Meeting seasonal demand for products with limited shelf lives

Perishable inventory theory: A review

One-period order quantity strategy with uncertain match between the amount received and the quantity requisitioned

A general class of zero-or-one inflated beta regression models

A periodic review inventory model with Markovian supply availability

A unified framework for stochastic optimization

Reinforcement Learning and Stochastic Optimization

Survey of literature on continuously deteriorating inventory models

Profit estimation error in the newsvendor model under a parametric demand distribution

Inventory Management and Production Planning and Scheduling

Impact of supply learning when suppliers are unreliable. Manufacturing and Service Operations Management

Distributional regression for demand forecasting in e-grocery