1 Introduction

Hyper-heuristics (HHs) are automated methodologies for selecting or generating heuristics to solve optimization problems [4]. HHs can be further classified according to the learning-phase feedback: no learning, offline learning, or online learning. The latter, also referred to in this text as dynamic learning, selects or generates heuristics during the search process.

As occurs at a simpler level with meta-learning approaches, hyper-heuristics, in a broader view, make algorithm design more adaptable to different problems [14]. Another perspective on the same problem is given by algorithm selection, particularly dynamic schedules of algorithms, which generalize static and per-instance algorithm selection approaches [10]. Hyper-heuristics can therefore contribute to the development of algorithms for solving a wide range of optimization problems.

Flowshop problems (FSPs) involve deciding how \(J\) jobs will be processed on \(M\) machines in series [2]. This paper investigates three FSP formulations: permutation (with no schedule constraints), no-wait, and no-idle, the last two including constraints (no waiting jobs and no idle machines, respectively).

Different proposals for parameter adaptation and HHs exist in the context of FSPs and scheduling problems in general [3, 16, 25, 26]. One of the first works in the literature uses an adaptive genetic algorithm [26], with online selection among four types of crossover and three mutations for the permutation FSP with the makespan objective. The algorithm produces new offspring using different operators in proportion to their contributions in previous generations. Results show that the adaptive genetic algorithm performs well when compared with an algorithm with static parameters and uniform selection of operators.

A HH based on Variable Neighborhood Search (VNS) is proposed in [16]. The VNS strategy adapts the shaking mechanism and the local search, providing different low-level heuristics. The shaking is adapted by maintaining a tabu list of non-improving heuristics, while the local searches are chosen greedily according to a rank metric based on improving moves. The proposal performs well on four different combinatorial optimization problems, including permutation FSPs.

Another related work is presented in [25], using Iterated Local Search (ILS) with different neighborhood types. A greedy strategy selects the best neighborhood based on the fitness improvement, the number of times each operator has been used, and the time needed to perform the local search. Results show advantages on problems considering the makespan as well as the flowtime objective.

A recent work [3] proposes an Iterated Greedy (IG) algorithm enhanced by hyper-heuristics to solve hybrid flexible FSPs with setup times. IG is a metaheuristic with excellent results for some FSP variants. It is based on initialization-destruction-construction phases, followed by a local search, which at the end provides a solution that can be accepted or discarded depending on an acceptance criterion. In [3], the neighborhood types used by the local search (swap, insert and inverse) are the low-level heuristics selected during the search. The enhanced IG is competitive in solving real-world instances.

Inspired by the fact that IG is easily adapted, performs well on several combinatorial optimization problems, and performs particularly well on the permutation FSP [18], in the present paper we propose and analyze different dynamic strategies used by a hyper-heuristic for selecting IG components. By adapting components such as the destruction size and position, neighborhood size, perturbation, and the local search and its focus, the proposed HH is tested with distinct dynamic adaptation strategies: random, \(\epsilon \)-greedy, probability matching, multi-armed bandit, LinUCB, and Thompson sampling. The most suitable hyper-parameters of each strategy are set in a tuning phase performed by irace. The proposal's performance is evaluated on three formulations (constraint sets) of FSPs, with four different sizes, two objectives, and four processing-time distributions. In this way, we intend to contribute to the understanding of FSPs and to find general strategies for solving different formulations of the problem.

The main contributions of the paper can be summarized as: (i) adapting six different IG components; (ii) testing six different dynamic learning strategies in the proposed HH; (iii) tuning the main HH hyper-parameters; (iv) addressing several FSP variants; (v) providing a high-performance adaptive IG capable of outperforming the standard IG [18] on many FSP variants. As far as we know, no previous work considers different HHs with dynamic learning for the FSP. Moreover, no previous work considers the simultaneous adaptation of multiple IG components. Finally, there are no reported results with a HH outperforming the standard IG on different FSP types.

The paper is organized as follows. Section 2 discusses the basic concepts necessary to understand the proposal. Section 3 details the adaptive IG that is being proposed here. The methodology adopted in the experiments is described in Sect. 4. Results are presented and analyzed in Sect. 5. Finally, Sect. 6 concludes the paper and discusses future perspectives.

2 Background

This section presents the basic concepts regarding the application context (Sect. 2.1 details the FSP) and the proposal (Sect. 2.2 describes the dynamic adaptive strategies used by the proposed hyper-heuristic).

2.1 Flowshop Problems (FSPs)

Flowshop is a combinatorial optimization scheduling problem. It involves deciding how \(J\) jobs will be processed on \(M\) machines in series. Given the processing times on each machine, a permutation \(x= (x_1, \dots , x_J)\) gives the order in which the jobs will be executed on all machines. The most common formulation considers that jobs and machines are available at any time, with processing times known in advance, and machine operations are sequence-independent and occur without interruptions [2].

In permutation FSPs, the completion time of a job \(x_j\) on the \(m\)-th machine can be determined by:

$$\begin{aligned} C_{x_j,m} = \max (C_{x_j,m-1}, C_{x_{j-1},m}) + p_{x_j,m} \end{aligned}$$
(1)

where \(p_{x_j,m}\) is the processing time of job \(x_j\) on machine \(m\).

Two common objectives in FSPs are makespan and flowtime. Makespan is the time required to complete all jobs, i.e., \(\max _jC_{x_j,M}\), and flowtime is the sum of all completion times, \(\sum _jC_{x_j,M}\).
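For concreteness, the recursion of Eq. 1 and the two objectives can be sketched in a few lines of Python (an illustrative helper, not part of the original proposal; names and indexing are assumptions):

```python
# Hedged sketch of the permutation-FSP completion-time recursion
# (Eq. 1) and the two objectives; indexing and names are illustrative.

def completion_times(x, p):
    """x: job permutation; p[j][m]: processing time of job j on machine m."""
    J, M = len(x), len(p[0])
    C = [[0] * M for _ in range(J)]
    for i, job in enumerate(x):
        for m in range(M):
            prev_machine = C[i][m - 1] if m > 0 else 0   # same job, machine m-1
            prev_job = C[i - 1][m] if i > 0 else 0       # previous job, machine m
            C[i][m] = max(prev_machine, prev_job) + p[job][m]
    return C

def makespan(x, p):
    return completion_times(x, p)[-1][-1]                # completion of last job on machine M

def flowtime(x, p):
    return sum(row[-1] for row in completion_times(x, p))
```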

Besides the permutation FSPs formulations, other variants like the no-wait and no-idle include constraints on the schedules. The no-wait FSP variant only considers schedules where there are no waiting times between job operations. The no-wait completion times are given by:

$$\begin{aligned} C_{x_j,M} = d_{x_{j-1},x_j} + \sum _{k= 1}^{M} p_{x_j,k} \end{aligned}$$
(2)

where \(d\) are the precomputed delay times [20].

The completion times of no-idle schedules are computed using [23]:

$$\begin{aligned} F(x_{1},m,m+1)&= p_{x_1,m+1} \end{aligned}$$
(3)
$$\begin{aligned} F(x_{j},m,m+1)&= \max \left\{ F(x_{j-1},m,m+1) - p_{x_j, m}, 0 \right\} + p_{x_j,m+1} \end{aligned}$$
(4)
$$\begin{aligned} C_{x_j,M}&= \sum _{m= 1}^{M-1} F(x_{j},m,m+1) + \sum _{k=1}^{j} p_{x_k,1}. \end{aligned}$$
(5)

where \(F(x_{j},m,m+1)\) is the minimum difference between the completion of processing up to job \(x_{j}\) on machines \(m+1\) and \(m\), restricted by the no-idle constraint.

In addition to objectives and constraints, a FSP formulation includes the definition of the processing times, which can be correlated or non-correlated and whose distributions can be uniform or exponential. A large number of jobs and uniform processing times usually make the problem harder to solve; conversely, simple heuristics perform well when the processing times are correlated [24]. In this paper, we investigate processing times with uniform and exponential distributions, as well as job-correlated and machine-correlated processing times.

2.2 Hyper-Heuristics and Their Adaptation Strategies

According to [15], a hyper-heuristic (HH) works with a two-level structure: at the high level, it searches for heuristic configurations \(h \in H\), where H is the heuristic space; at the low level, each solution \(x\in X\) of the target optimization problem p is generated by a heuristic \(h \in H\). There are two evaluation functions: at the first level, the HH's success is measured by a function \(F:H \rightarrow \mathfrak {R}\), and at the second level, each solution \(x\in X \) is evaluated by an objective function \(f: X\rightarrow \mathfrak {R}\).

From a mapping function \(M: f(x) \rightarrow F(h)\), it is possible to define the purpose of a selection HH: the HH must optimize F(h) by searching for the optimal heuristic configuration \(h^*\) in H, such that \(h^*\) generates the optimal solution(s) \(x^*\) [15]. The formal HH definition, for a minimization problem, is summarized in Eq. 6.

$$\begin{aligned} \small F(h^* \mid h^* \rightarrow x^*, h^* \in H) \leftarrow f(x^*, x^* \in X) = \min \{ f(x), x \in X\} \end{aligned}$$
(6)

In this paper, h relies on different choices for each IG component, f is associated with makespan or flowtime objectives, and F is measured by a reward function detailed in Sect. 3.

Hyper-heuristics aim therefore at providing more generalized solutions for optimization problems by producing good results when dealing with a set of instances or a set of problems. For this, HHs work in the heuristic space rather than the solution space. Based on specific strategies, they adapt low-level heuristics, which are used to solve the target problem(s).

We investigate six adaptation strategies commonly used in the HH literature: Random, \(\epsilon \)-greedy, Probability Matching, Multi-armed Bandit, LinUCB and Thompson sampling.

Random parameter selection is the simplest strategy and, in most cases, serves as a baseline for comparison between static and dynamic parameter selection. It might be beneficial depending on the chance of selecting the best parameter combination [5].

\(\epsilon \) -greedy is a simple strategy often referenced in the exploration-exploitation dilemma [22]. With probability \(1 - \epsilon \), the parameter with the best average reward is chosen; otherwise, with probability \(\epsilon \), a random one is selected.
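As an illustration, \(\epsilon \)-greedy can be sketched as follows, using the common convention of exploring with probability \(\epsilon \) (the reward bookkeeping and the default \(\epsilon \) value are assumptions):

```python
import random

# Minimal sketch of epsilon-greedy selection over K operator choices;
# avg_rewards would hold the running average reward of each choice.

def epsilon_greedy(avg_rewards, epsilon=0.1, rng=random):
    if rng.random() < epsilon:                        # explore: random choice
        return rng.randrange(len(avg_rewards))
    # exploit: the choice with the best average reward so far
    return max(range(len(avg_rewards)), key=avg_rewards.__getitem__)
```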

Probability Matching (PM) [7] works as a roulette-wheel selection biased towards the operators with the best quality. The probability of selecting the k-th operator from a set of K operators at iteration t is given by:

$$\begin{aligned} P_{k,t} = P_{min} + (1 - K \times P_{min}) \frac{q_{k,t}}{\sum _{j=1}^K q_{j,t}} \end{aligned}$$
(7)

where \(q_{k,t}\) is the quality of the k-th operator and \(0< P_{min} < 1/K\) is used to guarantee that every operator has a minimum chance of being chosen. The quality values are updated according to the rewards:

$$\begin{aligned} q_{k,t} = q_{k,t-1} + \alpha \times (r^{acc}_{k,t} - q_{k,t-1}) \end{aligned}$$
(8)

where \(\alpha \) is the learning-rate parameter and \(r^{acc}_{k,t}\) is the accumulated reward of operator k over an update window of size W. The accumulation in \(r^{acc}_{k,t}\) considers either the average or the extreme reward values, optionally normalized.
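A possible sketch of PM in Python follows (all names and the default hyper-parameter values are illustrative assumptions):

```python
import random

# Hedged sketch of Probability Matching (Eqs. 7-8).

def pm_probabilities(q, p_min):
    K, total = len(q), sum(q)
    return [p_min + (1 - K * p_min) * qk / total for qk in q]  # Eq. 7

def pm_select(q, p_min=0.05, rng=random):
    probs = pm_probabilities(q, p_min)
    r, acc = rng.random(), 0.0
    for k, pk in enumerate(probs):        # roulette-wheel draw
        acc += pk
        if r <= acc:
            return k
    return len(q) - 1

def pm_update(q, k, r_acc, alpha=0.3):
    q[k] += alpha * (r_acc - q[k])        # Eq. 8
    return q
```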

Multi-armed bandit (MAB) [1] algorithms are based on the Upper Confidence Bound to balance the exploitation-exploration trade-off. In particular, the Fitness-Rate-Rank-Based Multi-Armed Bandit [11] accounts for the dynamic search behavior with a sliding window of size W that stores the rewards of each operator. The selected operator maximizes the expression:

$$\begin{aligned} FRR_{k,t} + C_s \sqrt{ \frac{2 \ln \sum _{l=1}^K n_l^t}{n_k^t} } \end{aligned}$$
(9)

where \(C_s\) is a scaling parameter, \(n_k^t\) is the number of times operator k was applied during the window of size W, and \(FRR_{k,t}\) is the credit value of the k-th operator, given by:

$$\begin{aligned} FRR_{k,t} = \frac{D^{rank_k} \times r^{acc}_{k}}{\sum _{l=1}^K D^{rank_l} \times r^{acc}_{l}} \end{aligned}$$
(10)

in which \(D\) is the decay parameter controlling the influence of the best operator, \(r^{acc}_k\) is the accumulated reward of the k-th operator, and \(rank_k\) is the rank of the k-th operator's reward sum.
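A sketch of this selection rule follows; it assumes the caller maintains, per operator, the accumulated reward and the application count over the current sliding window (all names and default values are illustrative):

```python
import math

# Hedged sketch of FRRMAB selection (Eqs. 9-10). `rewards` holds each
# operator's accumulated reward over the current window, `counts` how
# often each was applied in it (all positive, e.g. after a warm-up);
# D and C_s are hyper-parameters from Table 1.

def frr_values(rewards, D):
    order = sorted(range(len(rewards)), key=lambda k: -rewards[k])
    rank = {k: i for i, k in enumerate(order)}       # rank 0 = best
    decayed = [D ** rank[k] * rewards[k] for k in range(len(rewards))]
    total = sum(decayed)
    return [d / total for d in decayed]              # Eq. 10

def frrmab_select(rewards, counts, D=0.5, C_s=0.5):
    frr = frr_values(rewards, D)
    n_total = sum(counts)
    ucb = [frr[k] + C_s * math.sqrt(2 * math.log(n_total) / counts[k])
           for k in range(len(rewards))]             # Eq. 9
    return max(range(len(rewards)), key=ucb.__getitem__)
```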

Linear Upper Confidence Bound (LinUCB) [12] assumes that the reward for a given operator is linearly proportional to the values of contextual features, i.e., \(E[r_{k,t}|\boldsymbol{\phi }_{k,t}] = \boldsymbol{\phi }_{k,t}^T \boldsymbol{\theta }^*_{k}\), where \(\boldsymbol{\phi }_{k,t}\) is the feature vector and \(\boldsymbol{\theta }^*_{k}\) are the unknown coefficients. We consider four fitness-landscape metrics as the context for a local search procedure, calculated online during the local search step [17]:

  • Adaptive walk length: the total number of steps of the local search;

  • Autocorrelation: correlation between the fitness values observed with the fitness values of the previous solutions;

  • Fitness-distance correlation: correlation between fitness and insertion distance considering the initial and final solutions;

  • Neutrality: proportion of neighbors with equal fitness values.

Using a ridge regression formulation, the coefficients can be found efficiently with the following steps:

$$\begin{aligned} \begin{aligned} \boldsymbol{\theta }^*_{k,t}&= A_{k,t}^{-1} b_{k,t} \\ P_{k,t}&= \boldsymbol{\theta }^{*T}_{k,t} \boldsymbol{\phi }_{k,t} + \alpha \sqrt{\boldsymbol{\phi }_{k,t}^T A_{k,t}^{-1} \boldsymbol{\phi }_{k,t}} \end{aligned} \end{aligned}$$
(11)

where \(\alpha \) is a learning rate parameter. The operator with maximum \(P_{k,t}\) is chosen, yielding the reward value \(r_{k,t}\), and the model update follows:

$$\begin{aligned} \begin{aligned} A_{k,t}&= A_{k,t-1} + \boldsymbol{\phi }_{k,t} \boldsymbol{\phi }_{k,t}^T \\ b_{k,t}&= b_{k,t-1} + r_{k,t} \boldsymbol{\phi }_{k,t}. \end{aligned} \end{aligned}$$
(12)
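A pure-Python sketch of this select-then-update cycle follows (the feature vector would hold the four landscape metrics; the naive Gaussian-elimination inverse is adequate for such small dimensions, and all names are illustrative assumptions):

```python
import math

# Hedged sketch of LinUCB (Eqs. 11-12) with identity-matrix priors.

def mat_inv(A):
    """Invert a small matrix by Gauss-Jordan elimination."""
    n = len(A)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        d = aug[col][col]
        aug[col] = [v / d for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

class LinUCB:
    def __init__(self, n_ops, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [[[1.0 if i == j else 0.0 for j in range(dim)]
                   for i in range(dim)] for _ in range(n_ops)]  # identity prior
        self.b = [[0.0] * dim for _ in range(n_ops)]

    def select(self, phi):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = mat_inv(A)
            theta = matvec(A_inv, b)                 # Eq. 11, first line
            scores.append(dot(theta, phi)
                          + self.alpha * math.sqrt(dot(phi, matvec(A_inv, phi))))
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, k, phi, reward):                # Eq. 12
        for i in range(len(phi)):
            self.b[k][i] += reward * phi[i]
            for j in range(len(phi)):
                self.A[k][i][j] += phi[i] * phi[j]
```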

The Thompson Sampling (TS) [19] strategy starts with a prior distribution, chooses the best operator by sampling, observes the outcome, and updates the distribution. The Beta distribution \(Beta(S_{k,t}, F_{k,t})\) models Bernoulli trials where operator k has \(S_{k,t}\) successes (rewards \(r_{k} > 0\)) and \(F_{k,t}\) failures (rewards \(r_{k} \le 0\)). Therefore, we choose the operator with:

$$\begin{aligned} op = \mathop {\text {arg max}}\limits _{k} Sample[Beta(S_{k,t}, F_{k,t})] \end{aligned}$$
(13)

and update the distribution after the reward:

$$\begin{aligned} \begin{aligned} S_{k,t}&= S_{k,t-1} + 1_{r_{k,t} > 0} \\ F_{k,t}&= F_{k,t-1} + 1_{r_{k,t} \le 0}. \end{aligned} \end{aligned}$$
(14)

Alternatively, the Dynamic TS [8] introduces a window-size parameter W and a modified update rule (applied after iteration W), as follows:

$$\begin{aligned} \begin{aligned} S_{k,t}&= (S_{k,t-1} + 1_{r_{k,t} > 0}) \tfrac{W}{W+1} \\ F_{k,t}&= (F_{k,t-1} + 1_{r_{k,t} \le 0}) \tfrac{W}{W+1}. \end{aligned} \end{aligned}$$
(15)
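Both variants can be sketched as follows (a hedged illustration: counts start at one because the Beta distribution requires strictly positive parameters, and the decay is applied at every update rather than only after iteration W, a simplification; all names are assumptions):

```python
import random

# Hedged sketch of (dynamic) Thompson Sampling (Eqs. 13-15).

class DynamicTS:
    def __init__(self, n_ops, window=None):
        self.S = [1.0] * n_ops
        self.F = [1.0] * n_ops
        self.window = window            # None = plain Thompson Sampling

    def select(self, rng=random):
        samples = [rng.betavariate(s, f) for s, f in zip(self.S, self.F)]
        return max(range(len(samples)), key=samples.__getitem__)  # Eq. 13

    def update(self, k, reward):
        if reward > 0:                  # success / failure counts, Eq. 14
            self.S[k] += 1
        else:
            self.F[k] += 1
        if self.window is not None:     # Eq. 15 decay
            scale = self.window / (self.window + 1)
            self.S[k] *= scale
            self.F[k] *= scale
```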
Table 1. Hyper-parameters of the addressed adaptation strategies.

Table 1 summarizes the hyper-parameters used by the adaptation strategies, including two hyper-parameters common to all strategies: the reward type and the warm-up period. During the warm-up period, at the beginning of the iterations, choices are made randomly. The reward type is detailed in the next section.

3 Adaptive IG Proposal

Considered the state of the art for some FSP variants, the Iterated Greedy (IG) algorithm [18] is a successful iterative metaheuristic that encompasses five main steps: (1) the incumbent solution \(x\) is initialized; (2) a destruction phase randomly removes d jobs; (3) a construction procedure reinserts each removed job at its best position; (4) a local search generates a new solution by exploiting the solution resulting from the construction; and (5) the new solution replaces the incumbent \(x\) according to an acceptance criterion. The last step accepts the new solution \(x''\) with the following probability:

$$\begin{aligned} P_{accep}(x'') = \left\{ \begin{array}{ll} 1.0 &{} \text{ if } f(x'') < f(x) \\ \exp \left( \tfrac{-(f(x'') - f(x)) }{Temp} \right) &{} \text{ otherwise } \\ \end{array}\right. \end{aligned}$$
(16)

where f(.) is the cost function and Temp is the temperature defined by:

$$\begin{aligned} Temp = T \times \frac{\sum _{j=1}^{J}\sum _{m=1}^{M} p_{j, m}}{J\times M\times 10}. \end{aligned}$$
(17)

IG has two main parameters: the destruction size d and the temperature factor T, which the IG authors [18] recommend setting to \(d = 4\) and \(T = 0.5\). Other recommended configurations are the Nawaz-Enscore-Ham (NEH) construction heuristic for the initialization and iterative improvement as the local search. This local search iteratively inserts a job, chosen randomly without replacement, at its best position until there is no improvement [21]. This version is referred to from now on as the standard IG.
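The five steps above can be sketched as a high-level loop (a hedged sketch: the problem-specific pieces `f`, `destruct`, `construct` and `local_search` are stand-ins supplied by the caller, and a best-so-far solution is tracked so the simulated-annealing-like acceptance cannot lose it; all names are illustrative):

```python
import math
import random
import time

# High-level sketch of the standard IG loop (steps 1-5 and Eq. 16).

def iterated_greedy(x0, f, destruct, construct, local_search,
                    d=4, temp=1.0, budget_s=0.1, rng=random):
    x = best = local_search(x0)
    end = time.monotonic() + budget_s
    while time.monotonic() < end:
        partial, removed = destruct(x, d)           # destruction phase
        x_new = local_search(construct(partial, removed))
        if f(x_new) < f(best):
            best = x_new
        delta = f(x_new) - f(x)
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            x = x_new                               # Eq. 16 acceptance
    return best
```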

The hyper-heuristic proposed in the present paper considers an adaptive strategy (one fixed among the six possible strategies described in Sect. 2.2) to update some components of an IG algorithm. An adaptive strategy requires a set H of possible choices and a reward function indicating how well a particular choice performs.

An important issue when defining an adaptive strategy is the reward function. It returns a real number representing the quality of a given choice: good choices receive positive rewards and bad choices receive negative ones. When using IG to solve FSPs, the reward of the k-th choice at iteration t can be calculated from the relative decrease in the cost-function value (e.g., makespan or any other objective being considered):

$$\begin{aligned} r_{k,t} = (f(x_{before}) - f(x_{after}))/f(x_{before}). \end{aligned}$$
(18)

Here, \(x_{before}\) and \(x_{after}\) represent the solutions before and after the reward evaluation period (whose reference can be either the local search or the whole iteration). There are four possible reference points at which a solution can be evaluated: before the iteration (bI), before the local search (bL), after the iteration (aI), and after the local search (aL), giving rise to four reward types:

  • bLaL: \((f(x_{\text {before local search}}) - f(x_{\text {after local search}}))/f(x_\text {before local search})\);

  • bIaL: \((f(x_{\text {before iteration}}) - f(x_{\text {after local search}}))/f(x_\text {before iteration})\);

  • bLaI: \((f(x_{\text {before local search}}) - f(x_{\text {after iteration}}))/f(x_\text {before local search})\);

  • bIaI: \((f(x_{\text {before iteration}}) - f(x_{\text {after iteration}}))/f(x_\text {before iteration})\).

As Fig. 1 shows, each reward type considers a different period in which to measure the quality increase or decrease used as feedback for each solution.
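The four reward types can be sketched as simple lookups over objective-value snapshots (the snapshot keys are illustrative names for the values captured at the corresponding reference points):

```python
# Hedged sketch of the relative-improvement reward (Eq. 18) for the
# four reward types.

def reward(f_before, f_after):
    return (f_before - f_after) / f_before          # Eq. 18

REWARD_TYPES = {
    "bLaL": ("before_local_search", "after_local_search"),
    "bIaL": ("before_iteration", "after_local_search"),
    "bLaI": ("before_local_search", "after_iteration"),
    "bIaI": ("before_iteration", "after_iteration"),
}

def reward_from_snapshots(snapshots, kind):
    before_key, after_key = REWARD_TYPES[kind]
    return reward(snapshots[before_key], snapshots[after_key])
```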

Fig. 1. IG algorithm and reward types: bLaL, bIaL, bLaI and bIaI.

Using the standard IG [18] as a basis, with a computation-time budget as the stopping criterion, we identify components that can be adapted dynamically: the type of local search, the type of perturbation, the destruction size, the neighborhood size, the destruction position, and the local search focus. A summary of these adaptive components and their possible values is presented in Table 2.

In our case, different choices are possible for each IG component, for example, the destruction size. We refer to the parameter values as discrete choices (also referred to as arms in the multi-armed bandit literature) indexed by \(\{1, \dots , K\}\), where \(K = |H|\) is the total number of choices. Therefore, according to Eq. 6, each choice represents a heuristic \(h \in H\), and the dynamic adaptation strategy of the HH chooses h as the k-th choice, \(k=1, \dots , |H|\).

The pools for the first two choice sets encompass the most usual options for the local search and the perturbation in the context of combinatorial optimization. The options for the destruction size are the ones often chosen in IG implementations for FSPs. Finally, the pools for the last three choice sets are defined to produce different granularities, ranging from a coarse to a fine search.

Table 2. IG adaptive parameters.

The HH based on dynamic learning proposed in this paper is capable of adapting different components (e.g., local search and perturbation) at each iteration of the adaptive IG. For the local search, for example, it can choose among the original IG's iterative improvement, first improvement, first-best improvement, and random best improvement; the last three options consider all possible insertions. The perturbation adaptation considers three possibilities: (1) the IG destruction-construction steps; (2) two random swaps and a transposition, as used in [21] for ILS on the FSP; and (3) destruction-construction with an iterative improvement local search between destructions, recently proposed in [6].

The neighborhood size adaptation considers the percentage of the neighborhood explored during the local search step. For example, exploring only half of the neighborhood at the beginning of the search could save time for exploitation at the end of the search. Similarly, choosing the destruction size parameter dynamically might improve the search during exploration-exploitation phases.

As mentioned, two mechanisms in IG randomly choose the jobs to be re-inserted: the destruction and the best-insertion local search. An adaptive mechanism can be applied to the last two IG components shown in Table 2 to focus these operators on the parts of the solution with a better chance of improvement. For that, we propose partitioning the solution into chunks and adaptively selecting the chunk from which a job will be sampled.

As shown in Table 2, some parameters have multiple possible choice pools \(\{H_1,H_2,\dots \}\). For example, the destruction size can be chosen from the set \(H_1=\{2,4\}\), \(H_2=\{4,6\}\), \(H_3=\{2,4,6\}\) or \(H_4=\{4,8\}\). For the destruction position and local search focus, the solution can be partitioned into 3, 10 or \(J\) chunks (the last considering one arm per job). Some HH hyper-parameters, such as the pool \(H_i\) for a component, the reward type, and the update window, are determined in a parameter tuning phase described in Sect. 4.

4 Tuning and Testing Phases

Based on the hyper-heuristic with dynamic learning and its six possible adaptive strategies described in Sect. 2.2, Sect. 3 has detailed the proposed adaptive IG. This section presents the hyper-parameter configuration performed by irace (tuning), which uses part of the available data, and the test performed on the tuned HH using the remaining data. The configurations are evaluated with a budget of \(J\times (M/ 2) \times 30\) ms in both the tuning and test phases. The algorithms are implemented in C++ using the Paradiseo library [9]. The experiments were executed on a server with 8-core AMD EPYC 7542 processors and 16 GB of RAM. For the analysis of the results, we used the R language and relevant packages.

Tuning phase. Before running the experiments, we perform a tuning phase to determine which strategy configuration (shown in Table 1) works best for each IG component (shown in Table 2). The random strategy is not considered for the destruction position and local search focus, because a random choice in these cases is equivalent to the standard IG behaviour.

The irace algorithm [13], with default parameters and 5000 configuration evaluations, is used to tune each combination of the six adaptive components and the six dynamic strategies. There are multiple choice pools \(\{H_1,H_2,\dots \}\) for the destruction size, neighborhood size, destruction position and local search focus components. For example, the destruction size can be chosen from \(H_1=\{2,4\}\) or \(H_4=\{4,8\}\). In these cases, the pool choice is treated as an additional categorical parameter by irace during the tuning phase.

The instance set for the tuning phase is composed of 48 (2 \(\times \) 2 \(\times \) 3 \(\times \) 4) instances resulting from the combinations of: 2 sizes (20 or 50 jobs, with 10 machines), 2 objectives (makespan or flowtime), 3 types (permutation, no-wait or no-idle FSPs), and 4 processing-time distributions (exponential, uniform, job-correlated or machine-correlated).

Testing phase. After tuning, the best configurations are tested, with 10 restarts, on a set of unseen instances with the same features but sampled with different random seeds. To better evaluate the algorithms' generalization capabilities, we also include larger instances with \(J= 100\) and \(J= 200\) jobs and \(M= 20\) machines in the testing phase.

The evaluation for each algorithm is done using the Average Relative Percentage Deviation (ARPD) given by:

$$\begin{aligned} ARPD_{alg} = \frac{1}{R} \sum _{r=1}^{R} 100 \times \frac{f(x^{alg}_r) - f(x^{best})}{f(x^{best})} \end{aligned}$$
(19)

where R is the number of runs and \(x^{alg}_r\) is the best solution found by algorithm alg in run r. The reference solution \(x^{best}\) is the best solution found by the standard IG (the same used by the authors in [18]) with a higher budget of \(J\times (M/ 2) \times 120\) ms and 30 restarts.
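Eq. 19 amounts to a one-line computation (a hedged sketch; `run_costs` would hold \(f(x^{alg}_r)\) over the R runs and `best_cost` the reference \(f(x^{best})\)):

```python
# Hedged sketch of the ARPD metric (Eq. 19); names are illustrative.

def arpd(run_costs, best_cost):
    return sum(100.0 * (c - best_cost) / best_cost
               for c in run_costs) / len(run_costs)
```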

Finally, the configurations are also compared with the standard IG configuration to evaluate the effectiveness of the adaptive components in the presented scenario. In all comparisons, we highlight the lowest ARPDs and perform a Friedman rank-sum test with the Nemenyi post hoc test to verify whether the differences are statistically significant at a p-value threshold of 0.05: results with no statistically significant difference from the best one are highlighted with a gray background.

5 Results

Table 3 shows some hyper-parameter values (reward type, choice sets and update window) tuned by irace for each IG component addressed in the paper.

Table 3. Tuned hyper-parameters.

Different reward types were tuned for the different adaptive components and HHs, with no option dominating the others. The choice sets for the destruction size enable more exploration, with values higher than the default \(d = 4\), which is present in all options. The choice sets for the neighborhood size are small (2 or 3 options), with a preference for a coarse search, while the local search focus uses partitions with 10 or more choices in three out of five cases. Finally, the update window is short for MAB and TS on the perturbation component, indicating that its value (4 or 8 destructions) is often switched during the search. In most of the other cases, however, the strategies prefer less frequent changes, since large window sizes were selected in the tuning.

The ARPD values for each adaptive component and strategy are shown in Table 4. They are calculated using Eq. 19, with \(R=10\), considering all the different testing instances. The values for the standard IG configuration [18] are computed with the same budget (\(J\times (M/ 2) \times 30\) ms) as the proposed HHs on the testing instances. Notice that this budget is lower than the one used to compute the reference \(f(x^{best})\) values for the ARPD.

The adaptive components are able to improve on the static standard IG configuration for all IG components (perturbation, destruction size, destruction position and local search focus) but one. Local search adaptation is not effective regardless of the strategy used by the HH, meaning that the iterative improvement performed by the standard IG is quite effective compared with the other choices. TS achieves the lowest ARPD for the perturbation and destruction position components, while \(\epsilon \)-greedy performs well on the local search focus. As (biased) adaptation might not be the best option in some cases [5], the simple random strategy performs well for selecting the destruction size and neighborhood size. In all cases, the adaptive strategies TS and MAB are among those with the lowest, or statistically equivalent to the lowest, ARPD values.

We see from Table 4 that TS is robust across the different components and provides the lowest overall result when adapting the IG perturbation, although it is statistically equivalent to most of the other HH proposals.

We compare the TS configuration that adapts only the perturbation with two others that adapt multiple components simultaneously: Adapt all components and Adapt all components except local search (the component for which no adaptation strategy was capable of improving the performance). The Adapt all approaches use the strategies with the best ARPD values for each component in Table 4, that is, TS for the perturbation and destruction position, Random for the destruction and neighborhood sizes, and \(\epsilon \)-greedy for the local search focus. The results in Table 5 are separated by objective, FSP type, processing-time distribution and size. Overall, the IG with adaptive perturbation obtains the lowest ARPDs. Adapting all components at the same time does not provide benefits, but when the local search is excluded from the adaptation, the approach performs like the best one.

Table 4. Adaptation strategies ARPDs (and standard deviation) for each adaptive component and strategy. Lowest mean values are highlighted in bold, statistically equivalent values are highlighted with gray background.
Table 5. Adaptation strategies ARPDs (and standard deviation) for all adaptive components, perturbation and destruction size adaptation and standard (static) IG. Best mean and statistically equivalent values are in bold and gray background, respectively.

6 Conclusions

This paper proposed and analyzed the use of a hyper-heuristic with dynamic strategies to adapt different components of the Iterated Greedy algorithm. Six adaptation strategies (random, \(\epsilon \)-greedy, probability matching, multi-armed bandit, LinUCB, and Thompson sampling) were tested to adapt six IG components (local search, perturbation, destruction size, neighborhood size, destruction position and local search focus). After a tuning phase performed by irace to set the best strategy hyper-parameters for each adapted component, the proposal was tested on different variants of flowshop problems.

Results show that, in most cases, the adaptation is able to improve the performance over the static standard IG configuration, especially when the perturbation operator is adapted using dynamic Thompson Sampling. Also, using multiple adaptive components did not prove beneficial unless the local search is fixed as iterative improvement, a fact that deserves further investigation.

The work can be extended by including different flowshop problems (objectives and constraints), adaptation strategies, and alternative operators. In addition, we intend to propose modifications to improve the performance of the Adapt all approach.