title: Learn Global and Optimize Local: A Data-Driven Methodology for Last-Mile Routing
authors: Ghosh, Mayukh; Kuiper, Alex; Mahes, Roshan; Maragno, Donato
date: 2021-12-03

In last-mile routing, the task of finding a route is often framed as a Traveling Salesman Problem to minimize travel time and associated cost. However, solutions stemming from this approach do not match the realized paths, as drivers deviate due to navigational considerations and preferences. To prescribe routes that incorporate this tacit knowledge, a data-driven model is proposed that aligns well with the hierarchical structure of delivery data, wherein each stop belongs to a zone - a geographical area. First, on the global level, a zone sequence is established as the result of a minimization over a cost matrix, which is a weighted combination of historical information and distances (travel times) between zones. Subsequently, within zones, sequences of stops are determined, such that, integrated with the predetermined zone sequence, a full solution is obtained. The methodology is particularly promising as it propels itself into the top tier of submissions to the Last-Mile Routing Research Challenge, while it maintains an elegant decomposition that ensures a feasible implementation in practice. The concurrence between prescribed and realized routes underpins the adequateness of a hierarchical breakdown of the problem and the fact that drivers make a series of locally optimal decisions when navigating. Furthermore, experimenting with the balance between historical information and distance exposes that historical information is pivotal in deciding the starting zone of a route. The experiments also reveal that at the end of a route, historical information can best be discarded, making the time it takes to return to the station the primary concern.

Last-mile routing is often the shortest, yet most complex and expensive leg in distribution systems. It accounts for about 28% of the total costs of transportation from the distribution center to the final customer (Goodman 2005, Gevaers, Van de Voorde, and Vanelslander 2011). Last-mile operations also have a large impact on livability, because of the traffic congestion and pollution they create (Taniguchi and Thompson 2002, Chopra 2003). This pressure is set to increase because the trend of urbanization continues: around 68% of the global population will reside in urban areas by 2050, according to a report by the United Nations (2018). Currently, e-commerce sales are estimated at about $709.78 billion, a figure projected to increase further as the coronavirus has accelerated a channel shift to e-commerce, see eMarketer (2020). The same trends are echoed by a recent report of the World Economic Forum (Deloison et al. 2020), which further points out that the number of same-day deliveries will also grow due to customers' ever-increasing expectations, amplifying the importance of last-mile operations and their impact on the ecosystem.

(Figure 1: Examples of two realized routes, drawn using OpenStreetMap (2021). The numbers represent the order in which the stops are visited and the colors represent the different zones within the routes.)

This mismatch between computed and realized routes has motivated the Amazon Last-Mile Routing Research Challenge (Amazon and MIT Center for Transportation & Logistics 2021a), wherein, by means of offering a vast routing dataset, researchers around the world have been invited to bridge this gap.
In this work we answer that call by providing a complete methodology that proposes routes that adequately reflect drivers' tacit knowledge and preferences by learning from historical route data. In the US, Amazon steadily holds the number one position as the largest e-commerce retailer, with a market share of 38.0% (eMarketer 2020), and is inherently responsible for the majority of traffic associated with last-mile delivery. Therefore, applying the methodology to the data associated with Amazon will set an example for many other organizations facing the bewildering setting of last-mile delivery. Although we consider it a predictive problem, which helps uncover, on a conceptual level, the driver's decision process when determining a route, we emphasize that the approach can also be used to prescribe routes that are more feasible from a driver's perspective.

To consider the problem at a more detailed level, two delivery drivers' routes are given in Figure 1 as an increasing sequence of numbers; each number represents a stop. Note that, for readability purposes, not all stops are shown. Along the route the driver traverses through different zones, given by their zone ids, which are (preset) geographical areas. Comparing the routes in the left (more rectangular grid) and the right map, it is observed that the size of a zone ranges from a part of a street to a set of multiple streets. Leveraging historical routes to learn recurrent patterns is feasible on a zone level rather than at a stop level. This is due to the fact that a zone appears in multiple routes while a stop does not (or only rarely). The intrinsic importance of a zone sequence depends on how the zone is established. Ideally, a zone should be constructed such that every stop in it is visited before moving to the next zone (as, for example, in Figure 1). However, even with this presumption, classical heuristics and optimization models fall short in exploiting this information. Two classical examples are illustrated in Figure 2. On the left, we find the route obtained by applying the nearest neighbor heuristic, which is sometimes argued to mimic the driver's behavior, while on the right we find the route that minimizes distance, i.e., the solution of the corresponding Traveling Salesman Problem (TSP). On top of the condition that drivers visit all stops in a zone before moving to another, the challenge is to determine the sequence of zones a driver will follow. In Figure 2, there are six possible permutations of zone sequences, but in real-life settings (Figure 1) there are easily 30 zones, leading to 30! unique zone sequences. The sequence might be based on information at ground level, navigational considerations, or merely a preference, but most importantly an approach should be able to learn this tacit information from realized routes.

The contributed methodology divides the problem into two parts: on the zone level, it aims to infer the sequence of zones, after which, within each zone, the sequence of stops is determined. It stems from the idea that a routing problem is first abstracted into zones, for which a sequence is established by solving a TSP formulation that balances historical zone-transition information, resembling the tacit knowledge, against distances. So the methodology learns on a global level, which aligns with the intuition that humans aggregate route sequences on a higher level rather than remembering exact stop sequences.
Secondly, the problem in each zone is of relatively low complexity, and thus a driver is capable of determining an optimal sequence over the set of stops that minimizes distance or travel time. In the end, the performance of the methodology confirms the competency of the approach.

In the next section, we provide the literature review, which outlines elements important to our methodology. Section 3 describes the methodology, and is split into learning and prediction. In Section 4, we introduce the dataset on which the methodology will be applied and provide the route prediction metrics aligned with the challenge. Using the case study data we carry out various experiments and demonstrate the methodology. Furthermore, we compare the performance to the aforementioned benchmarks as well as to the submissions of the competition. In Section 5, we present our concluding remarks and implications for practice, followed by a discussion and some suggestions for future research.

Solving a last-mile routing problem can be translated into a TSP, i.e., a set of stops should be visited, starting from and ending at the station. However, in this research the goal is to predict a driver's route using historical data. Therefore, in this pursuit, elements of different disciplines are combined. More specifically, we consider classical TSP optimization frameworks, hierarchical structures as an approximation (such as cluster first, route second), human approaches to solving TSPs, and lastly, learning from route data. In relation to our work, we elaborate further on these themes below.

The TSP with Euclidean distance is notorious for being hard and is proven to be NP-complete (Papadimitriou 1977), but for finding the optimal or near-optimal tour there is a rich literature; a well-known formulation is the one by Miller, Tucker, and Zemlin (1960). The MTZ formulation has the advantage of being intuitive and straightforward to implement and works well when the number of stops (variables) to consider is relatively small, fitting our instances. As a downside, it is known to have a weak relaxation, which leads to higher computation times; see, e.g., Campuzano, Obreque, and Aguayo (2020) for a discussion. In our case, as we break down the full problem into smaller ones, this is not an issue, but, if desired, any other TSP formulation can be adopted. To reduce the computational complexity, the problem can be broken down into smaller instances, which are computationally much less involved, and these can then be connected together. For example, in the seminal work by Karp (1977), the proposed partitioning algorithms are asymptotically optimal heuristics in the case of uniformly distributed stop locations, i.e., the error tends to zero with probability one as the length of a route (i.e., the number of stops) increases. Both Liao and Liu (2018) and Jiang et al. (2014) decompose the problem into subproblems with a small number of nodes by relying on clustering algorithms. Each subproblem is subsequently optimized, and the center nodes of the subproblems constitute a TSP in themselves. Connecting all local tours in the order of the upper-layer problem generates approximate solutions with significantly reduced computation times. Such an approach is also observed empirically; see Vickers et al. (2003). Moreover, evidence to support the use of a hierarchical structure in practice is found in Graham, Joshi, and Pizlo (2000). They show that solution methods originating from artificial intelligence or operations research algorithms are insufficiently capable of mimicking the human approach to solving TSPs.
They introduce a hierarchical model by means of a pyramid algorithm on the visual representation. This algorithm is capable of rendering human-like solutions to the various TSP instances where classical algorithms fail at this task; see also Pizlo et al. (2006) for a more refined algorithm. More importantly, the aforementioned works hint at the use of a hierarchical breakdown, because of its lower computational complexity and its concurrence with practice.

The TSP lends itself to a broad range of experimental studies, as its goal of minimizing the route is easy to understand and visualize. Many experimental studies have been devoted to visual versions of the TSP (MacGregor and Ormerod 1996, Van Rooij, Stege, and Schactman 2003, Vickers et al. 2003). MacGregor and Ormerod (1996) show that humans outperform well-known TSP heuristics and, for small problem sizes, are capable of staying within 1% of optimality. Humans rely on various tactics to generate near-optimal solutions for the TSP. The potential of these tactics is further demonstrated by the fact that the time needed increases only linearly with the problem size (Pizlo et al. 2006, Dry et al. 2006). One of these tactics is to first consider the problem globally, starting with the tour that visits all 'exterior' points, into which interior points are inserted; this resembles the so-called convex hull approach (MacGregor and Ormerod 1996, Vickers et al. 2003). The underlying motivation argued for the convex hull approach is the avoidance of crossings in a tour (Van Rooij, Stege, and Schactman 2003). This rests on the intuition that a crossing in a tour is suboptimal, which indeed holds for metric TSPs. In an open TSP (OTSP), where the solution is not a closed tour, the starting point can have a profound impact on the performance; see Sengupta, Mariescu-Istodor, and Fränti (2018). However, they also report that humans are quite capable of selecting a 'right' starting point. Wiener, Ehbauer, and Mallot (2009) find in a series of experiments that, when navigating, a coarse route is stipulated first. This route visits a set of 'regions', after which it is 'optimized' on a detailed level along the way. Furthermore, as the problem size increases, the problem is divided into more regions to keep it manageable and approachable. Such a global-to-local approach echoes the hierarchical decomposition of first determining the zone sequence on a global level, and subsequently solving a series of local problems to determine the sequence in which stops are to be visited in each zone. Additionally, another experiment by Wiener and Mallot (2003) reveals that a segmentation into zones affects route planning and navigational behavior, as it primes a driver to approach the problem from such a structure. A similar effect can be expected when drivers are presented with a route, for example a solution from routing software, as in the Amazon challenge from which we take the data to demonstrate our methodology (Amazon and MIT Center for Transportation & Logistics 2021a).

Typically, the problem of finding a sequence of stops is positioned in the field of combinatorial optimization, but in this research the goal of finding an optimal sequence that minimizes an objective function does not take center stage. Here, the question is how route data can be used to predict routes that drivers (will) follow; for an example where learning takes place in order to find the optimal route, see Kool, van Hoof, and Welling (2019).
The capability to learn and predict (part of) the routes is demonstrated in Krumm (2008). The model is trained on drivers' long-term trip history using GPS data. It uses a Markovian approach: on the basis of the last road segment (which can also be adjusted to incorporate more history), it predicts the next segment the driver will take. In the same vein, Ye et al. (2015) propose a route prediction method based on a hidden Markov model that can accurately predict an entire route early in the trip. Wang et al. (2015) also employ a Markov model; their algorithm relies on a probability transition matrix that is developed to represent the knowledge of the driver's preferred links and routes. For the Vehicle Routing Problem (VRP), Canoy and Guns (2019) show the potential of using a Markov model in an optimization framework. By constructing a transition probability matrix based on historical data, and by exploiting the VRP structure, they render solutions that resemble actual route plans much better than those obtained by relying on a distance metric alone. The previous works demonstrate that a Markov model is a powerful tool to learn from historical data and to use it in predictions. We adopt such a model by using historical information, which is weighted against a distance metric. The full methodology is detailed in the next section.

The methodology that we propose relies on the four elements reviewed in Section 2. We impose a hierarchical structure on two levels: finding the zone sequence on a global level and finding the stop sequence within each zone locally. Such an approach is in line with evidence from human navigation and, more importantly, matches the structure of typical routing data: each stop belongs to a zone. On the global level, we learn the preference to traverse from one zone to another by adopting a Markov model, which we combine with the distances between zones to form the cost matrix for the global problem. Relying on standard methods, we solve the TSP on the global level to determine the zone sequence, and a series of TSPs on the local level, which mimics drivers' capability to find near-optimal solutions in small instances. Finally, by patching the local solutions together, we deliver a full prediction of the route.

To learn the drivers' preference over zones we use the fact that each route starts at a station s (also commonly known as a depot), where s belongs to the set of stations S. Then, for each stop in a route, the data consists of the GPS coordinates, which can be used to compute distances between stops, but the methodology can also be used when the costs of traversing from one stop to another are given, e.g., (expected) travel times. Furthermore, each stop within the route should be given an identifier, which assigns the stop to a geographical area, i.e., a zone. The zones should not overlap and therefore they form a segmentation. As a side note, the zone ids should be consistent within the set of routes in the dataset. Specifically, in relation to zones there are two phases, which form the basis of our approach:

• Learning. Extracting zone sequences from historical data enables the counting and collection of zone transitions in a count matrix.

• Prediction. For new routes, a combination of a distance and count matrix is used in a TSP formulation whose solution provides the zone sequence; after which, within zones, the stop sequences are iteratively determined by means of a series of OTSPs.
For learning, we exploit the information contained in route data to extract information about drivers' preferences at the level of zones. Each route consists of an ordered series of n + 1 stops, namely n delivery nodes (this number differs per route) and the station. Therefore, a route is represented as a sequence of tuples, route = ((x_0, y_0, z_0), ..., (x_n, y_n, z_n)), with (x_i, y_i) ∈ R^2 indicating the stop's geographical location, and z_i the corresponding zone id, with z_i ∈ {Z_1, ..., Z_M} for i ∈ {0, ..., n}. As the data provides the stop sequence per route, we have to convert it to the higher-level sequence of zones. Given the observations in the introduction and the navigational perspective that all stops within a zone are completed before a driver moves to another zone (Section 2.3), we determine for each route the sequence of unique zones. To illustrate the procedure, let us consider a route with n delivery nodes. First, we consider the sequence of zones over the stops, (z_1, ..., z_n). Then, the route's zone sequence is reduced to the sequence of pairs ((ζ_1, τ_1), ..., (ζ_ℓ, τ_ℓ)), where each pair (ζ_i, τ_i) represents how many nodes (τ_i) belonging to the same zone (ζ_i) are visited in a row. When the ζ_i are pairwise distinct, we consider (ζ_1, ..., ζ_ℓ) to be the desired sequence of unique zones.

Example 1. A route consisting of six stops over three zones in which the resulting ζ_i are pairwise distinct is reduced directly into its sequence of three unique zones.

In case of a zone's reoccurrence (ζ_i = ζ_j for some i < j), we only keep the occurrence with the highest number of stops by comparing τ_i to τ_j, or keep the earliest appearance in case of a tie, here that is ζ_i. This procedure is repeated until we are left with a sequence of singly occurring zones. Two extreme, yet illustrative, examples of this procedure are given below.

Example 2. The reduction of two routes consisting of six stops over three zones under the procedure outlined above is illustrated below. In the first case, zone Z_3 reoccurs and is put last, as τ_1 = 1 while τ_4 = 2. In the next case, zone Z_1 occurs thrice, but breaking the tie twice puts zone Z_1 at its first occurrence in the sequence:

(Z_1, Z_3, Z_1, Z_2, Z_2, Z_1) → ((ζ_1, 1), (ζ_2, 1), (ζ_3, 1), (ζ_4, 2), (ζ_5, 1)) → (ζ_1, ζ_2, ζ_4) = (Z_1, Z_3, Z_2).

Evidently, not all zones are visited within each route; therefore, we keep track of all M zones that have been visited at least once in any route. Having reduced each route to its corresponding zone sequence, we compute an asymmetric count matrix N, where each entry N_ij represents the number of times a driver went from the i-th to the j-th zone, so that i and j range from 1 to M + S, where S (with slight abuse of notation) denotes the number of stations in the dataset. The pseudocode related to the procedures outlined is provided in the Appendix.

In the next phase, we predict the route that a driver will take. For that purpose, we translate the count matrix, created in the previous phase, to a transition matrix P via P_ij = N_ij / Σ_{k=1}^{M+S} N_ik, so that P_ij reflects the Markovian probability of transitioning from zone i to zone j. Besides the zone sequences, the locations of the station and zones are necessary ingredients in the suspected trade-off a driver makes. For this purpose, we define the center of a zone, if not given, by computing the mean latitude and longitude of all stops that were visited in that zone. Having the zone centers' locations, we can compute the distances δ_ij between zone i and zone j.
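As an illustration, the zone centers and the distances δ_ij can be computed as in the following sketch; planar (x, y) coordinates and the names stops_by_zone, zone_centers, and distance_matrix are our own assumptions rather than the authors' code.

import numpy as np

def zone_centers(stops_by_zone):
    """Estimate each zone's center as the mean coordinates of the stops visited
    in that zone (Section 3.2)."""
    return {zone: np.mean(np.asarray(coords, dtype=float), axis=0)
            for zone, coords in stops_by_zone.items()}

def distance_matrix(centers, order):
    """Pairwise Euclidean distances delta_ij between zone centers, listed in the
    given zone order."""
    pts = np.array([centers[z] for z in order])
    diff = pts[:, None, :] - pts[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Toy example with three zones:
centers = zone_centers({"A": [(0, 0), (0, 2)], "B": [(3, 1)], "C": [(6, 0), (6, 2)]})
delta = distance_matrix(centers, order=["A", "B", "C"])   # 3 x 3 matrix, zero diagonal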
We normalize the distances by dividing each element by the largest, i.e., max_{i,j} δ_ij, and store these normalized distances in a matrix D ∈ R^{(M+S)×(M+S)}. As an alternative to the Euclidean distances in D, a sensible choice is to use the travel times (or another measure of the cost between stops) to measure the 'distance' between zones. However, in the case where we are only given route data, we can proceed with a two-step procedure to obtain an approximation of the travel times between zones. Since stops are typically assigned to zones, we first identify for each zone the stop within the route that is closest to the zone center, using the Euclidean distance metric. Second, once each zone has a stop assigned, we extract the travel times between these stops to compose a matrix T, which consists of elements T_ij corresponding to normalized travel times from station/zone i to j, i.e., T_ij = t_ij / max_{i,j} t_ij, where t_ij are the provided travel times. We will compare the performance of both options.

Since we are to minimize over a cost matrix, we reverse the transition matrix to 1 − P_ij in the cost matrix to arrive at the following linear combination, with ω ∈ [0, 1] being a weight parameter:

C_ij = ω D_ij + (1 − ω)(1 − P_ij),    (1)

where D_ij can be interchanged with T_ij, and additionally we set the diagonal entries C_ii = 0. In fact, this problem can be considered a TSP with unknown costs, as we do not know a priori how distance is evaluated against the likelihood of zone-to-zone transitions, which contains the implicit tacit knowledge and drivers' preferences. So a good value of ω is still to be determined. Next, we are able to use this cost matrix in a TSP formulation, whose optimal solution constitutes a tour through all zones in which one or more stops are located. For details about the implementation we refer the reader to the Appendix.

At this point, we have a sequence of zones that still has to be transformed into a sequence of stops; e.g., in Figure 3a the zone sequence is (Z_A, Z_B, Z_C). As the problem size in each zone is considerably smaller, drivers are able to make (near-)optimal navigational decisions that minimize travel time or distance, see Section 2.3. To mimic that behavior, we again employ the framework for solving TSPs, but with the additional flexibility of an open TSP (OTSP), as we do not need to close the loop within a zone but rather traverse from one zone to another. Although there are several options available, the procedure detailed in Figure 3 is in accordance with practice. The approach builds on two elements, which are applied iteratively in each zone:

• The last stop of the previous zone (or the station, Figure 3b) is used as the starting point of the OTSP for the current zone.

• An additional stop of the next zone is added to each OTSP to add a sense of direction and to find a good stop at which to finish the current zone. As seen in Figure 3a, the extra stop (not filled) is selected as the one closest (in terms of Euclidean distance) to the computed zone centers (×).

Naturally, after the solution for an OTSP is found with the additional stop added, that last stop is removed from the solution, such that for the subsequent zone the starting point is again a stop in its previous zone. The procedure continues until the last zone is reached, for which the station is used as the additional 'stop'.
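A self-contained sketch of this per-zone patching is given below. It uses brute-force enumeration for the open TSP, which is feasible because zones contain only a few stops, and Euclidean distances for simplicity; the choice of the guiding stop is one possible reading of the selection rule above. All names are illustrative assumptions and this is not the authors' implementation.

from itertools import permutations
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def open_tsp(start, stops, end):
    """Cheapest open path that starts at `start`, visits all `stops`, and finishes
    at `end` (brute force; adequate for the few stops within a zone).  Only the
    order of `stops` is returned, i.e., the guiding endpoint is dropped again."""
    best, best_cost = None, float("inf")
    for perm in permutations(stops):
        path = (start,) + perm + (end,)
        cost = sum(dist(path[k], path[k + 1]) for k in range(len(path) - 1))
        if cost < best_cost:
            best, best_cost = list(perm), cost
    return best

def patch_route(zone_sequence, stops_by_zone, zone_centers, station):
    """Patch per-zone stop sequences along a given zone sequence (cf. Figure 3):
    the last stop of the previous zone seeds the next zone's open TSP, and one
    extra stop of the next zone adds a sense of direction; for the last zone the
    station itself is used as the extra 'stop'."""
    route, prev = [station], station
    for k, zone in enumerate(zone_sequence):
        if k + 1 < len(zone_sequence):
            nxt = zone_sequence[k + 1]
            # One reading of the selection rule: the next zone's stop closest to
            # that zone's center serves as the guiding endpoint.
            guide = min(stops_by_zone[nxt], key=lambda s: dist(s, zone_centers[nxt]))
        else:
            guide = station
        seq = open_tsp(prev, list(stops_by_zone[zone]), end=guide)
        route.extend(seq)
        prev = seq[-1]
    return route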
For clarity of exposition, Euclidean distances were used in Figure 3.

To assess the performance of the proposed methodology, we employ it on a routing dataset which was part of the Amazon Last-Mile Routing Research Challenge (Amazon and MIT Center for Transportation & Logistics 2021a). The metrics that we use are also adopted from this challenge and are provided in Section 4.1. The dataset contains 6,112 routes from 17 different stations. We split the data such that 5,112 routes are used for learning and 1,000 routes are reserved as an out-of-sample test set. Furthermore, we impute missing zone ids by adopting the id of the closest stop in terms of travel time, a sensible choice from the driver's perspective. The experiments related to our methodology are run using an Intel Core i7-10850H 2.70 GHz CPU and 32 GB RAM. The learning takes approximately 1 minute and, with a test set of 1,000 routes, it takes on average 30 minutes to render all routes. However, for choosing a right value of ω for Eq. (1), and when reconsidering it in Section 4.3, we rely on 5-fold cross-validation over the training set of 5,112 routes.

The route prediction performance can be evaluated in two dimensions, that is, how often and by how much a prediction B deviates from the realized stop sequence A := (0, 1, ..., n); thus B is in essence a permutation of A. Below, we describe these different components and how they together form an overall route performance metric.

• The sequence deviation SD_stop(A, B) measures the difference between the stop sequences A and B. Given that in both cases all n + 1 stops have been visited, we take the stop sequence of B and, for each position, trace back when it was visited in A, to create a vector (a_0, a_1, ..., a_n) = π(A), from which the sequence deviation is computed. Note that indeed if A ≡ π(A) = B, the deviation becomes 0. Lastly, because we deduce the zone sequence from a route by running the procedure outlined in Section 3.1, we can also compute the zone sequence deviation SD_zone(A, B), which on the higher level compares the sequence deviation between two zone sequences A and B.

• Besides considering the position in the sequence, one can also count the number of edit operations needed to go from the predicted route to the realized route, familiarly known as the Levenshtein distance, as introduced by Levenshtein et al. (1966). This distance metric, renamed to ERP_edit(A, B), counts the number of deletions and insertions needed to obtain the same sequence.

• Apart from considering the concurrence and concordance of stop sequences, the size of the deviation between two sequences A and B should be evaluated as well. To this end, a normalized travel time is introduced to evaluate the difference between two routes on a stop basis, defined for the i-th stop of A and the j-th stop of B, with i, j = 0, 1, ..., n; this underlies ERP_norm(A, B). As an aside, dividing ERP_norm(A, B) by ERP_edit(A, B) translates to the average additional travel time incurred by deviating, and is called ERP_ratio(A, B).

Besides these separate metrics, they are combined into a comprehensive route score, Eq. (2), and a performance score is obtained by averaging this score over all I routes to be predicted. Evidently, a lower route score means that the predicted sequence coincides more with the realized sequence; 0 means they are exactly the same.

As a first experiment, we consider varying the weight parameter ω in Eq. (1). Furthermore, we can replace distance by the travel times T, as these are provided in Amazon's dataset.
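The official challenge scorer is not reproduced here; as an aside, the edit component of the metrics above can be illustrated with the following insertion/deletion edit distance between two stop sequences (a sketch in the spirit of ERP_edit, not the official metric):

def edit_distance(a, b):
    """Number of insertions and deletions needed to turn sequence `a` into `b`
    (no substitutions), computed with the classic dynamic program."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                      # delete all remaining elements of a
    for j in range(n + 1):
        dp[0][j] = j                      # insert all remaining elements of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j],    # delete a[i-1]
                                   dp[i][j - 1])    # insert b[j-1]
    return dp[m][n]

# Example: predicted (0, 2, 1, 3) versus realized (0, 1, 2, 3) requires two edits.
print(edit_distance((0, 2, 1, 3), (0, 1, 2, 3)))  # 2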
The performances when varying ω in both versions are displayed in Figure 4. As expected, we find that only relying on historical information (ω = 0), or only using distance or travel times (ω = 1), does not perform as well as having a trade-off between the two. Overall, we observe that an ω value around 0.9 achieves the best trade-off, but actually any value between 0.1 and 0.9 in Eq. (1) (dashed and solid lines) will result in a performance of around 0.035. This observation demonstrates the robustness of the model, as it is insensitive to misspecification of ω. In addition, we conclude that the difference between travel times and Euclidean distance is small, and thus distance can be used as a good proxy for travel time.

Reconsidering Eq. (1), wherein we now define an Ω matrix to weigh historical information against travel times, we further study the role of the station as the starting and ending point of a route. For the sake of readability, we consider a matrix Ω^(r), which contains the weights in Ω corresponding to the station and the zones that are visited in a specific route r, where the station is related to the first row and column. So as to refine the role of the station, we define the Ω^(r) matrix such that ω_F ∈ [0, 1] is the weight in the cost function from the station to the first zone (the first row), ω_Z ∈ [0, 1] sets the balance for zone-to-zone transitions (the remaining off-diagonal entries), and lastly ω_L ∈ [0, 1] is the weight corresponding to the return to the station (the first column). The elements Ω_ij from this matrix are put into the following refinement of Eq. (1) with travel times:

C_ij = Ω_ij T_ij + (1 − Ω_ij)(1 − P_ij),

where, similarly to Eq. (1), the diagonal values are set to zero. With this generalized cost matrix, we first set ω_L = 1, as we anticipate that at the end of a route heading back to the station should be the only concern, and thus the number of times that a specific zone acted as the finishing point of a route can be disregarded. This resonates with the thought that, at the end, a driver is only focused on the travel time it takes to return to the station. Having this parameter restrained, we study combinations of ω_F and ω_Z to obtain the contour plot in Figure 5. In the plot, we observe that the optimal point should lie somewhere in the region where ω_F ∈ [0.1, 0.2] and ω_Z ∈ [0.7, 0.8]. So when starting a route, in the first transition from the station, historical information should be weighted as more important than in regular zone-to-zone transitions. This implies that, when determining the route globally, preference and navigational considerations have a greater impact than the standard trade-off with travel times. The choice of ω_L = 1 is justified by the degradation in performance for values less than 1. This is shown in Figure 6 by taking four combinations of (ω_F, ω_Z) that lie around the inner contour, close to the optimum. In all, when comparing the points in detail, we conclude that the best performance in our cross-validation is obtained when (ω_F, ω_Z, ω_L) = (0.2, 0.8, 1), which results in the reported performance of the final model in Figure 7.

In Table 1, we provide the performance of our approach, abbreviated to LG-OL, and compare it to two benchmarks to shed some light on the range of performance scores that can be attained. Besides the metrics outlined in Section 4, we also add the mean travel time in seconds in the Time column. First, we apply the Nearest Neighbor algorithm (Figure 2a) on the out-of-sample test set of 1,000 routes.
This algorithm sequentially adds a connection from the last stop to the nearest unvisited stop; a strategy that is sometimes considered to model human route planning behavior (Graham, Joshi, and Pizlo 2000, Wiener, Ehbauer, and Mallot 2009). As another idea, consider the route which minimizes the travel times as the route a driver follows (Figure 2b). Such a route is generally hard to compute, but within the range of state-of-the-art solvers (e.g., Gurobi). It is surprising that the drivers' average travel time (9,496 seconds) is about 18% more than when travel time minimization was the only objective (Full TSP); this indeed shows that drivers' navigational decisions cannot be approached as a standard optimization problem. Our methodology obtains a prediction performance (Performance) of 0.0299, a more than 64% reduction compared to the aforementioned benchmarks. Interestingly, the Nearest Neighbor and Full TSP models have similar ERP_ratio scores, but the Full TSP renders better sequences (lower Performance).

(Figure 6: A sensitivity study on ω_L (zone-to-station) for selected combinations of ω_F and ω_Z that lie around the inner contour of Figure 5. The performances, as defined in Eq. (2), are obtained by averaging over the 5-fold cross-validation scores.)

Set against the benchmarks, we find that our model succeeds in having 40% of the test routes near-perfectly predicted, as they score below 0.01, and even gets 80% of them below the 0.05 threshold. In retrospect, reconsidering Figure 4, we find that for any value of ω the unadjusted approach already outperforms these benchmarks, supporting the fundamental idea of our learn global, optimize local approach. Evaluating our performance against the submissions for the challenge (Amazon and MIT Center for Transportation & Logistics 2021b), we find that the approach propels itself into the top tier of submissions, whose performances range from 0.025 to 0.050. Finally, to verify the correctness and potential of the approach, we conducted an additional hypothetical experiment on the test set, the results of which are provided in Table 1 as LG-OL (hypothetical). We extracted the zone sequences from the test set using the procedure of Section 3.1, so as to consider the situation in which we are fully capable of predicting the zone sequence correctly. What then remains are the inconsistencies between the design choices made in Section 3 and drivers' behavior that is not reflected in the methodology. In this experiment, we find that the performance becomes 0.0094, which is well below all submissions for this challenge.

(Figure 7: Boxplot representing the route score distribution on the out-of-sample test set of 1,000 routes for our model and the two benchmarks of Table 1.)

The costs associated with last-mile delivery surmount any other shipping costs. Hence, good predictions about the route taken are crucial from an operations management point of view. The proposed approach leads to several broadly applicable insights about delivery driving. The approach first determines a zone sequence by solving a traveling salesman problem (TSP) over a cost matrix which is a weighted combination of historical information, which has to be learnt from realized routes, and distance (or travel time) between zones. Next, on the local level, adhering to the zone sequence established on the global level, the order of stops to be visited in a zone is found by solving an open TSP (OTSP), which generates locally optimal solutions.
Repeating this for all subsequent zones and patching them together generates a feasible route. Therefore, this approach can be considered to learn global (over zones) and optimize local (within zones). Besides the fact that the approach can be argued to follow the logic of human navigation, it is convenient, because it only requires learning on the zone level by tracking zone transitions. The breakdown makes the approach relatively fast, as the zone-sequence TSP and within-zone OTSPs are computationally far less involved than solving a single TSP for the entire route. This also renders the approach amenable to real-time training using newly realized routes of the same day.

The key element in the approach is the computation of a cost matrix that combines distance and historical information when establishing the zone sequence as a TSP. The experiments show that a weighted combination with travel times provides a slight performance improvement over using Euclidean distance. This fact underpins that, when determining the zone sequence, an implicit trade-off is made between the driver's zone-to-zone preferences and a measure of distance. Studying the weight value by changing it from 0 to 1, where the setting of 1 corresponds to only considering travel time, uncovers that the exact value is not critical. Overall, the best performance is obtained when the value is close to 0.9. However, the weighting of station-to-zone and zone-to-station transitions can be further adapted in the cost matrix to better mimic the driver's behavior, resulting in another performance improvement. In fact, these adaptations are motivated by the logic that at the beginning of a route the driver is more concerned with where to start, whereas at the end, when all stops are visited, the only consideration is the return to the station. Indeed, the experiments confirm these hypotheses: we find that the travel time matrix should be weighted as less important in the station-to-zone transitions than in the zone-to-zone transitions, while from zone to station it is the only factor. These insights come with the caveat that drivers were suggested a route beforehand (see Amazon and MIT Center for Transportation & Logistics 2021a), which might have tainted the drivers' decisions.

The approach as-is only utilizes historical stop data (with zone ids) in making predictions. Using additional data which are part of the case study, such as package size, start time, and quality of the route, did not provide any clear starting points for improvement. Since the zone ids were given, it is worthwhile to further study the underlying segmentation, because it determines the breakdown of the problem into a global and a series of local problems and as such is pivotal to the performance of the approach. We have seen that zone revisitations are rare. However, in 2.6% of the routes there are more than five revisitations, which hints that in these routes it might be valuable to reconsider the zone segmentation. The test set also provides us an opportunity to quantify the remaining learning gap. By first extracting the zone sequences in the test set and then applying the approach, we arrive at a score just below 0.01. This shows that the learn global and optimize local approach can yield a score as low as 0.009 (as a reference, our actual approach attains 0.0299, as shown in Section 4).
The learning took place on only 5,112 routes, which results in a sparse transition matrix, as the zone transition probabilities, when available, are learnt from the relatively few routes available for each region. By increasing the number of routes for learning, for example through daily tracking of realized routes and updating the count matrix accordingly, the approach can likely improve its accuracy in zone sequence prediction, which has a profound impact on the overall performance. Another endeavor is to further study the structure of the cost matrix. The methodology readily incorporates other functional forms, and also different weightings between historical information and distance-based measures. These weightings might depend on the size of the dataset available and even on the length of the route itself. Nevertheless, we find that a linear structure between distance or travel time and historical information is robust to misspecification of this weight parameter and is practically sound. Finally, the methodology can be employed in other (last-mile) delivery concepts, see for example Boysen, Fedtke, and Schwerdfeger (2021), or generalized to other operational optimization problems which are susceptible to human interference in the decision-making process; a logical starting point is the allied Vehicle Routing Problem, e.g., Bräysy and Gendreau (2005a). In all, this research demonstrates that integrating learning in an optimization pipeline leverages its connection and value to practice.

In this Appendix, we provide the pseudocode for the methodology presented in Section 3. Although the pseudocode has been designed on the basis of the Amazon Last-Mile Routing Research Challenge dataset, it can be easily adapted to different data structures. As a side note, the zone ids are not consistent across different stations; therefore, we run our code for each s ∈ S, with S the set of stations. The pseudocode used for the learning phase is reported in Algorithm 1, where the function ToZoneSeq converts the realized sequence of stops into a sequence of zones (see Section 3.1), while the function ComputeCountMatrix computes an asymmetric matrix in which the (i, j)-th entry represents the number of times a driver went from the i-th to the j-th zone. Algorithm 2 illustrates the pseudocode used in the prediction phase. Here, in order to compute the cost matrix, as thoroughly described in Section 3.2, we compute the distance matrix in which each entry refers to the travel time between two different zone centers (computed with EstimateZoneCenter). The code used to run the experiments presented in Section 4 is open source and will be made available.

Algorithm 1
Input: D_s: route data set corresponding to station s
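As an illustration of the learning phase described above (ToZoneSeq followed by ComputeCountMatrix), a Python sketch could look as follows; the function names, the one-pass implementation of the tie-breaking rule, and the assumption that each route starts and ends at the station are our own choices, not the authors' pseudocode.

import numpy as np

def to_zone_sequence(stop_zones):
    """Reduce the per-stop zone ids of a realized route to its sequence of unique
    zones (Section 3.1): collapse consecutive repeats into (zone, run length)
    pairs, then keep, for each zone, only its longest run (earliest run on ties)."""
    pairs = []
    for z in stop_zones:
        if pairs and pairs[-1][0] == z:
            pairs[-1][1] += 1
        else:
            pairs.append([z, 1])
    best = {}                                  # zone -> index of the run to keep
    for idx, (zone, tau) in enumerate(pairs):
        if zone not in best or tau > pairs[best[zone]][1]:
            best[zone] = idx
    return [pairs[i][0] for i in sorted(best.values())]

def compute_count_matrix(routes_zone_ids, station_id):
    """Build the asymmetric count matrix N, where N[i, j] is the number of times a
    driver went from zone i to zone j; the station occupies index 0."""
    sequences = [to_zone_sequence(z) for z in routes_zone_ids]
    labels = [station_id] + sorted({z for seq in sequences for z in seq})
    index = {label: k for k, label in enumerate(labels)}
    N = np.zeros((len(labels), len(labels)), dtype=int)
    for seq in sequences:
        path = [station_id] + seq + [station_id]   # routes start and end at the station
        for a, b in zip(path, path[1:]):
            N[index[a], index[b]] += 1
    return N, index

# Example 2 from Section 3.1: (Z1, Z3, Z1, Z2, Z2, Z1) reduces to (Z1, Z3, Z2).
print(to_zone_sequence(["Z1", "Z3", "Z1", "Z2", "Z2", "Z1"]))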
Algorithm 2 Predicting Routes
Input: CountMatrix: output of Algorithm 1
Input: Zones, Stops: zones and stops characterizing the route
Input: station, TravelTimes
Output: PredictedStopSeq: predicted sequence of stops
1: ZoneCenters ← EstimateZoneCenter(Zones, Stops)
2: DistanceMatrix ← ComputeDistMatrix(Zones, ZoneCenters)
3: CostMatrix ← ComputeCostMatrix(DistanceMatrix, CountMatrix)
4: PredictedZoneSeq ← TSP(station, Zones, CostMatrix)
5: PredictedStopSeq ← [station]
6: for zone in PredictedZoneSeq do
7:     PrevStop ← last stop of PredictedStopSeq
8:     ZoneStopSeq ← OTSP(PrevStop, stops in zone, extra stop of the next zone or station)
9:     append ZoneStopSeq (without the extra stop) to PredictedStopSeq
10: end for
11: return PredictedStopSeq
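Using the linear form of Eq. (1), the ComputeCostMatrix step of Algorithm 2 can be sketched in Python as follows. The weight argument may be the scalar ω of Eq. (1) or a matrix of weights with ω_F on the first row, ω_L in the first column, and ω_Z elsewhere, as in the refinement of Section 4.3 (in practice restricted to the station and the zones visited in the route). This is illustrative code under those assumptions, not the authors' implementation.

import numpy as np

def compute_cost_matrix(counts, travel, weight=0.9):
    """Combine the count matrix N and the travel-time (or distance) matrix into the
    zone-level cost matrix C_ij = w_ij * T_ij + (1 - w_ij) * (1 - P_ij) with zero
    diagonal; `weight` is a scalar omega or a full matrix of weights."""
    counts = np.asarray(counts, dtype=float)
    row_sums = counts.sum(axis=1, keepdims=True)
    P = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    T = np.asarray(travel, dtype=float)
    T = T / T.max()                        # normalize by the largest entry
    C = weight * T + (1.0 - weight) * (1.0 - P)
    np.fill_diagonal(C, 0.0)               # C_ii = 0 as in the paper
    return C

def weight_matrix(n_zones, omega_f=0.2, omega_z=0.8, omega_l=1.0):
    """Generalized weights with the station at index 0: omega_f for station-to-zone,
    omega_z for zone-to-zone, omega_l for the return to the station."""
    W = np.full((n_zones + 1, n_zones + 1), omega_z)
    W[0, :] = omega_f                      # station-to-zone transitions
    W[:, 0] = omega_l                      # zone-to-station transitions
    return W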