1 Introduction

Hyper-heuristics (HHs) are automated methodologies for selecting or generating heuristics to solve optimization problems [4]. HHs can be further classified according to the learning-phase feedback: no learning, offline learning, or online learning. The latter, also referred to in this text as dynamic learning, selects or generates heuristics during the search process.

As occurs at a simpler level with meta-learning approaches, hyper-heuristics, in a broader view, make algorithm design more adaptable to different problems [14]. Another perspective on the same problem is given by algorithm selection, particularly dynamic schedules of algorithms, which generalize static and per-instance algorithm selection approaches [10]. Hyper-heuristics can therefore contribute to the development of algorithms for solving a wide range of optimization problems.

Flowshop problems (FSPs) involve deciding how \(J\) jobs will be processed on \(M\) machines in series [2]. This paper investigates three FSP formulations: permutation (with no schedule constraints), no-wait, and no-idle, the last two including constraints (no waiting jobs and no idle machines, respectively).

Different proposals for parameter adaptation and HHs exist in the context of FSPs and scheduling problems in general [3, 16, 25, 26]. One of the first works in the literature uses an adaptive genetic algorithm [26], with online selection among four types of crossover and three mutations for the permutation FSP with the makespan objective. The algorithm produces new offspring using different operators in proportion to their contributions in previous generations. Results show that the adaptive genetic algorithm performs well when compared with an algorithm with static parameters and uniform selection of operators.

A HH based on Variable Neighborhood Search (VNS) is proposed in [16]. The VNS strategy adapts the shaking mechanism and the local search, providing different low-level heuristics. The shaking is adapted by maintaining a tabu list of non-improving heuristics, while the local searches are chosen greedily according to a rank metric based on improving moves. The proposal performs well on four different combinatorial optimization problems, including permutation FSPs.

Another related work is presented in [25], using Iterated Local Search (ILS) with different neighborhood types. A greedy strategy selects the best neighborhood based on the fitness improvement, the number of times each operator has been used, and the time needed to perform the local search. Results show advantages on problems considering the makespan as well as the flowtime objective.

A recent work [3] proposes an Iterated Greedy (IG) algorithm enhanced by hyper-heuristics to solve hybrid flexible FSPs with setup times. IG is a metaheuristic with excellent results for some FSP variants. It is based on initialization-destruction-construction phases, followed by a local search, which at the end provides a solution that can be accepted or discarded depending on an acceptance criterion. In [3], the neighborhood types used by the local search (swap, insert and inverse) are the low-level heuristics selected during the search. The enhanced IG is competitive in solving real-world instances.

Inspired by the fact that IG is easily adapted, performs well on several combinatorial optimization problems, and performs particularly well on the permutation FSP [18], in the present paper we propose and analyze different dynamic strategies used by a hyper-heuristic for selecting IG components. By adapting components such as the destruction size and position, neighborhood size, perturbation, and the local search and its focus, the proposed HH is tested with distinct dynamic adaptation strategies: random, \(\epsilon \)-greedy, probability matching, multi-armed bandit, LinUCB, and Thompson sampling. The most suitable hyper-parameters of each strategy are set in a tuning phase performed by irace. The proposal's performance is evaluated on three formulations (constraint sets) of FSPs, with four different sizes, two objectives, and four processing-time distributions. In this way, we intend to contribute to the understanding of FSPs and to find general strategies for solving different formulations of the problem.

The main contributions of the paper can be summarized as: (i) adapting six different IG components; (ii) testing six different dynamic learning strategies in the proposed HH; (iii) tuning the main HH hyper-parameters; (iv) addressing several FSP variants; (v) providing a high-performance adaptive IG capable of outperforming the standard IG [18] on many FSP variants. As far as we know, no previous work considers different HHs with dynamic learning for the FSP. Moreover, no previous work considers the simultaneous adaptation of multiple IG components. Finally, there are no reported results with a HH outperforming the standard IG on different FSP types.

The paper is organized as follows. Section 2 discusses the basic concepts necessary to understand the proposal. Section 3 details the adaptive IG that is being proposed here. The methodology adopted in the experiments is described in Sect. 4. Results are presented and analyzed in Sect. 5. Finally, Sect. 6 concludes the paper and discusses future perspectives.

2 Background

This section presents the basic concepts regarding the application context (Sect. 2.1 details the FSP) and the proposal (Sect. 2.2 describes the dynamic adaptive strategies used by the proposed hyper-heuristic).

2.1 Flowshop Problems (FSPs)

Flowshop is a combinatorial optimization scheduling problem. It involves deciding how \(J\) jobs will be processed on \(M\) machines in series. Given the processing times on each machine, a permutation \(x= (x_1, \dots , x_J)\) gives the order in which the jobs will be executed on all machines. The most common formulation considers that jobs and machines are available at any time, with processing times known in advance, and machine operations are sequence-independent and occur without interruptions [2].

In permutation FSPs, the completion time of a job \(x_j\) on the \(m\)-th machine can be determined by:

$$\begin{aligned} C_{x_j,m} = \max (C_{x_j,m-1}, C_{x_{j-1},m}) + p_{x_j,m} \end{aligned}$$
(1)

where \(p_{x_j,m}\) is the processing time of job \(x_j\) on machine \(m\).

Two common objectives in FSPs are makespan and flowtime. Makespan is the time required to complete all jobs, i.e., \(\max _jC_{x_j,M}\), and flowtime is the sum of all completion times, \(\sum _jC_{x_j,M}\).
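For concreteness, the recursion of Eq. 1 and the two objectives can be sketched in a few lines of Python (an illustrative helper, not part of the original proposal; names and indexing are assumptions):

```python
# Hedged sketch of the permutation-FSP completion-time recursion
# (Eq. 1) and the two objectives; indexing and names are illustrative.

def completion_times(x, p):
    """x: job permutation; p[j][m]: processing time of job j on machine m."""
    J, M = len(x), len(p[0])
    C = [[0] * M for _ in range(J)]
    for i, job in enumerate(x):
        for m in range(M):
            prev_machine = C[i][m - 1] if m > 0 else 0   # same job, machine m-1
            prev_job = C[i - 1][m] if i > 0 else 0       # previous job, machine m
            C[i][m] = max(prev_machine, prev_job) + p[job][m]
    return C

def makespan(x, p):
    return completion_times(x, p)[-1][-1]                # completion of last job on machine M

def flowtime(x, p):
    return sum(row[-1] for row in completion_times(x, p))
```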

Besides the permutation FSPs formulations, other variants like the no-wait and no-idle include constraints on the schedules. The no-wait FSP variant only considers schedules where there are no waiting times between job operations. The no-wait completion times are given by:

$$\begin{aligned} C_{x_j,M} = d_{x_{j-1},x_j} + \sum _{k= 1}^{M} p_{x_j,k} \end{aligned}$$
(2)

where \(d\) are the precomputed delay times [20].

The completion times of no-idle schedules are computed using [23]:

$$\begin{aligned} F(x_{1},m,m+1)&= p_{x_1,m+1} \end{aligned}$$
(3)
$$\begin{aligned} F(x_{j},m,m+1)&= \max \left\{ F(x_{j-1},m,m+1) - p_{x_j, m}, 0 \right\} + p_{x_j,m+1} \end{aligned}$$
(4)
$$\begin{aligned} C_{x_j,M}&= \sum _{m= 1}^{M-1} F(x_{j},m,m+1) + \sum _{k=1}^{j} p_{x_k,1}. \end{aligned}$$
(5)

where \(F(x_{j},m,m+1)\) is the minimum difference between the completion of processing up to job \(x_{j}\) on machines \(m+1\) and \(m\), restricted by the no-idle constraint.

In addition to objectives and constraints, a FSP formulation includes the definition of the processing times, which can be correlated or non-correlated and whose distributions can be uniform or exponential. A large number of jobs and uniform processing times usually make the problem harder to solve; conversely, simple heuristics perform well when the processing times are correlated [24]. In this paper, we investigate processing times with uniform and exponential distributions, as well as job-correlated and machine-correlated processing times.

2.2 Hyper-Heuristics and Their Adaptation Strategies

According to [15], a hyper-heuristic (HH) works with a two-level structure: at the high level, it searches for heuristic configurations \(h \in H\), where H is the heuristic space; at the low level, each solution \(x\in X\) of the target optimization problem p is generated by a heuristic \(h \in H\). There are two evaluation functions: at the first level, the HH's success is measured by a function \(F:H \rightarrow \mathfrak {R}\), and at the second level, each solution \(x\in X \) is evaluated by an objective function \(f: X\rightarrow \mathfrak {R}\).

From a mapping function \(M: f(x) \rightarrow F(h)\), it is possible to define the purpose of a selection HH: the HH must optimize F(h) by searching for the optimal heuristic configuration \(h^*\) in H, such that \(h^*\) generates the optimal solution(s) \(x^*\) [15]. The formal HH definition, for a minimization problem, is summarized in Eq. 6.

$$\begin{aligned} \small F(h^* \mid h^* \rightarrow x^*, h^* \in H) \leftarrow f(x^*, x^* \in X) = \min \{ f(x), x \in X\} \end{aligned}$$
(6)

In this paper, h relies on different choices for each IG component, f is associated with makespan or flowtime objectives, and F is measured by a reward function detailed in Sect. 3.

Hyper-heuristics aim therefore at providing more generalized solutions for optimization problems by producing good results when dealing with a set of instances or a set of problems. For this, HHs work in the heuristic space rather than the solution space. Based on specific strategies, they adapt low-level heuristics, which are used to solve the target problem(s).

We investigate six adaptation strategies commonly used in the HH literature: Random, \(\epsilon \)-greedy, Probability Matching, Multi-armed Bandit, LinUCB and Thompson sampling.

Random parameter selection is the simplest strategy and, in most cases, serves as a baseline for comparison between static and dynamic parameter selection. It might be beneficial depending on the chance of selecting the best parameter combination [5].

\(\epsilon \) -greedy is a simple strategy often referenced in the exploration-exploitation dilemma [22]. With probability \(1 - \epsilon \), the parameter with the best average reward is chosen; otherwise, with probability \(\epsilon \), a random one is selected.
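As an illustration, \(\epsilon \)-greedy can be sketched as follows, using the common convention of exploring with probability \(\epsilon \) (the reward bookkeeping and the default \(\epsilon \) value are assumptions):

```python
import random

# Minimal sketch of epsilon-greedy selection over K operator choices;
# avg_rewards would hold the running average reward of each choice.

def epsilon_greedy(avg_rewards, epsilon=0.1, rng=random):
    if rng.random() < epsilon:                        # explore: random choice
        return rng.randrange(len(avg_rewards))
    # exploit: the choice with the best average reward so far
    return max(range(len(avg_rewards)), key=avg_rewards.__getitem__)
```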

Probability Matching (PM) [7] works as a roulette-wheel selection biased towards the operators with the best quality. The probability of selecting the k-th operator from a set of K operators at iteration t is given by:

$$\begin{aligned} P_{k,t} = P_{min} + (1 - K \times P_{min}) \frac{q_{k,t}}{\sum _{j=1}^K q_{j,t}} \end{aligned}$$
(7)

where \(q_{k,t}\) is the quality of the k-th operator and \(0< P_{min} < 1/K\) is used to guarantee that every operator has a minimum chance of being chosen. The quality values are updated according to the rewards:

$$\begin{aligned} q_{k,t} = q_{k,t-1} + \alpha \times (r^{acc}_{k,t} - q_{k,t-1}) \end{aligned}$$
(8)

where \(\alpha \) is the learning-rate parameter and \(r^{acc}_{k,t}\) is the accumulated reward of operator k over an update window of size W. The accumulation in \(r^{acc}_{k,t}\) considers either the average or the extreme reward values, optionally normalized.
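A possible sketch of PM in Python follows (all names and the default hyper-parameter values are illustrative assumptions):

```python
import random

# Hedged sketch of Probability Matching (Eqs. 7-8).

def pm_probabilities(q, p_min):
    K, total = len(q), sum(q)
    return [p_min + (1 - K * p_min) * qk / total for qk in q]  # Eq. 7

def pm_select(q, p_min=0.05, rng=random):
    probs = pm_probabilities(q, p_min)
    r, acc = rng.random(), 0.0
    for k, pk in enumerate(probs):        # roulette-wheel draw
        acc += pk
        if r <= acc:
            return k
    return len(q) - 1

def pm_update(q, k, r_acc, alpha=0.3):
    q[k] += alpha * (r_acc - q[k])        # Eq. 8
    return q
```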

Multi-armed bandit (MAB) [1] algorithms are based on the Upper Confidence Bound to balance the exploitation-exploration trade-off. In particular, the Fitness-Rate-Rank-Based Multi-Armed Bandit [11] accounts for the dynamic search behavior with a sliding window of size W that stores the rewards of each operator. The selected operator maximizes the expression:

$$\begin{aligned} FRR_{k,t} + C_s \sqrt{ \frac{2 \ln \sum _{l=1}^K n_l^t}{n_k^t} } \end{aligned}$$
(9)

where \(C_s\) is a scaling parameter, \(n_k^t\) is the number of times operator k was applied during the window of size W, and \(FRR_{k,t}\) is the credit value of the k-th operator, given by:

$$\begin{aligned} FRR_{k,t} = \frac{D^{rank_k} \times r^{acc}_{k}}{\sum _{l=1}^K D^{rank_l} \times r^{acc}_{l}} \end{aligned}$$
(10)

in which \(D\) is the decay parameter controlling the influence of the best operator, \(r^{acc}_k\) is the accumulated reward of the k-th operator, and \(rank_k\) is the rank of the k-th operator's reward sum.
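A sketch of this selection rule follows; it assumes the caller maintains, per operator, the accumulated reward and the application count over the current sliding window (all names and default values are illustrative):

```python
import math

# Hedged sketch of FRRMAB selection (Eqs. 9-10). `rewards` holds each
# operator's accumulated reward over the current window, `counts` how
# often each was applied in it (all positive, e.g. after a warm-up);
# D and C_s are hyper-parameters from Table 1.

def frr_values(rewards, D):
    order = sorted(range(len(rewards)), key=lambda k: -rewards[k])
    rank = {k: i for i, k in enumerate(order)}       # rank 0 = best
    decayed = [D ** rank[k] * rewards[k] for k in range(len(rewards))]
    total = sum(decayed)
    return [d / total for d in decayed]              # Eq. 10

def frrmab_select(rewards, counts, D=0.5, C_s=0.5):
    frr = frr_values(rewards, D)
    n_total = sum(counts)
    ucb = [frr[k] + C_s * math.sqrt(2 * math.log(n_total) / counts[k])
           for k in range(len(rewards))]             # Eq. 9
    return max(range(len(rewards)), key=ucb.__getitem__)
```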

Linear Upper Confidence Bound (LinUCB) [12] assumes that the reward for a given operator is linearly proportional to the values of contextual features, i.e., \(E[r_{k,t}|\boldsymbol{\phi }_{k,t}] = \boldsymbol{\phi }_{k,t}^T \boldsymbol{\theta }^*_{k}\), where \(\boldsymbol{\phi }_{k,t}\) is the feature vector and \(\boldsymbol{\theta }^*_{k}\) are the unknown coefficients. We consider four fitness-landscape metrics as the context for a local search procedure, calculated online during the local search step [17]:

  • Adaptive walk length: the total number of steps of the local search;

  • Autocorrelation: correlation between the fitness values observed with the fitness values of the previous solutions;

  • Fitness-distance correlation: correlation between fitness and insertion distance considering the initial and final solutions;

  • Neutrality: proportion of neighbors with equal fitness values.

Using a ridge regression formulation, the coefficients can be found efficiently with the following steps:

$$\begin{aligned} \begin{aligned} \boldsymbol{\theta }^*_{k,t}&= A_{k,t}^{-1} b_{k,t} \\ P_{k,t}&= \boldsymbol{\theta }^{*T}_{k,t} \boldsymbol{\phi }_{k,t} + \alpha \sqrt{\boldsymbol{\phi }_{k,t}^T A_{k,t}^{-1} \boldsymbol{\phi }_{k,t}} \end{aligned} \end{aligned}$$
(11)

where \(\alpha \) is a learning rate parameter. The operator with maximum \(P_{k,t}\) is chosen, yielding the reward value \(r_{k,t}\), and the model update follows:

$$\begin{aligned} \begin{aligned} A_{k,t}&= A_{k,t-1} + \boldsymbol{\phi }_{k,t} \boldsymbol{\phi }_{k,t}^T \\ b_{k,t}&= b_{k,t-1} + r_{k,t} \boldsymbol{\phi }_{k,t}. \end{aligned} \end{aligned}$$
(12)
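A pure-Python sketch of this select-then-update cycle follows (the feature vector would hold the four landscape metrics; the naive Gaussian-elimination inverse is adequate for such small dimensions, and all names are illustrative assumptions):

```python
import math

# Hedged sketch of LinUCB (Eqs. 11-12) with identity-matrix priors.

def mat_inv(A):
    """Invert a small matrix by Gauss-Jordan elimination."""
    n = len(A)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        d = aug[col][col]
        aug[col] = [v / d for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

class LinUCB:
    def __init__(self, n_ops, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [[[1.0 if i == j else 0.0 for j in range(dim)]
                   for i in range(dim)] for _ in range(n_ops)]  # identity prior
        self.b = [[0.0] * dim for _ in range(n_ops)]

    def select(self, phi):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = mat_inv(A)
            theta = matvec(A_inv, b)                 # Eq. 11, first line
            scores.append(dot(theta, phi)
                          + self.alpha * math.sqrt(dot(phi, matvec(A_inv, phi))))
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, k, phi, reward):                # Eq. 12
        for i in range(len(phi)):
            self.b[k][i] += reward * phi[i]
            for j in range(len(phi)):
                self.A[k][i][j] += phi[i] * phi[j]
```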

The Thompson Sampling (TS) [19] strategy starts with a prior distribution, chooses the best operator by sampling, observes the outcome, and updates the distribution. The Beta distribution \(Beta(S_{k,t}, F_{k,t})\) models Bernoulli trials where operator k has \(S_{k,t}\) successes (rewards \(r_{k} > 0\)) and \(F_{k,t}\) failures (rewards \(r_{k} \le 0\)). Therefore, we choose the operator with:

$$\begin{aligned} op = \mathop {\text {arg max}}\limits _{k} Sample[Beta(S_{k,t}, F_{k,t})] \end{aligned}$$
(13)

and update the distribution after the reward:

$$\begin{aligned} \begin{aligned} S_{k,t}&= S_{k,t-1} + 1_{r_{k,t} > 0} \\ F_{k,t}&= F_{k,t-1} + 1_{r_{k,t} \le 0}. \end{aligned} \end{aligned}$$
(14)

Alternatively, the Dynamic TS [8] introduces a window-size parameter W and a modified update rule (applied after iteration W), as follows:

$$\begin{aligned} \begin{aligned} S_{k,t}&= (S_{k,t-1} + 1_{r_{k,t} > 0}) \tfrac{W}{W+1} \\ F_{k,t}&= (F_{k,t-1} + 1_{r_{k,t} \le 0}) \tfrac{W}{W+1}. \end{aligned} \end{aligned}$$
(15)
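Both variants can be sketched as follows (a hedged illustration: counts start at one because the Beta distribution requires strictly positive parameters, and the decay is applied at every update rather than only after iteration W, a simplification; all names are assumptions):

```python
import random

# Hedged sketch of (dynamic) Thompson Sampling (Eqs. 13-15).

class DynamicTS:
    def __init__(self, n_ops, window=None):
        self.S = [1.0] * n_ops
        self.F = [1.0] * n_ops
        self.window = window            # None = plain Thompson Sampling

    def select(self, rng=random):
        samples = [rng.betavariate(s, f) for s, f in zip(self.S, self.F)]
        return max(range(len(samples)), key=samples.__getitem__)  # Eq. 13

    def update(self, k, reward):
        if reward > 0:                  # success / failure counts, Eq. 14
            self.S[k] += 1
        else:
            self.F[k] += 1
        if self.window is not None:     # Eq. 15 decay
            scale = self.window / (self.window + 1)
            self.S[k] *= scale
            self.F[k] *= scale
```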
Table 1. Hyper-parameters of the addressed adaptation strategies.

Table 1 summarizes the hyper-parameters used by the adaptation strategies, including two hyper-parameters common to all strategies: the reward type and the warm-up period. During the warm-up period, at the beginning of the iterations, choices are made randomly. The reward type is detailed in the next section.

3 Adaptive IG Proposal

Considered the state of the art for some FSP variants, the Iterated Greedy (IG) algorithm [18] is a successful iterative metaheuristic that encompasses five main steps: (1) the incumbent solution \(x\) is initialized; (2) a destruction phase randomly removes d jobs; (3) a construction procedure reinserts each removed job at its best position; (4) a local search generates a new solution by exploiting the solution resulting from the construction; and (5) the new solution replaces the incumbent \(x\) according to an acceptance criterion. The last step accepts the new solution \(x''\) with the following probability:

$$\begin{aligned} P_{accep}(x'') = \left\{ \begin{array}{ll} 1.0 &{} \text{ if } f(x'') < f(x) \\ \exp \left( \tfrac{-(f(x'') - f(x)) }{Temp} \right) &{} \text{ otherwise } \\ \end{array}\right. \end{aligned}$$
(16)

where f(.) is the cost function and Temp is the temperature defined by:

$$\begin{aligned} Temp = T \times \frac{\sum _{j=1}^{J}\sum _{m=1}^{M} p_{j, m}}{J\times M\times 10}. \end{aligned}$$
(17)

IG has two main parameters: the destruction size d and the temperature factor T, which the IG authors [18] recommend setting to \(d = 4\) and \(T = 0.5\). Other recommended configurations are the Nawaz-Enscore-Ham (NEH) construction heuristic for the initialization and iterative improvement as the local search. This local search iteratively inserts a job, chosen randomly without replacement, at its best position until there is no improvement [21]. This version is referred to from now on as the standard IG.
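The five steps above can be sketched as a high-level loop (a hedged sketch: the problem-specific pieces `f`, `destruct`, `construct` and `local_search` are stand-ins supplied by the caller, and a best-so-far solution is tracked so the simulated-annealing-like acceptance cannot lose it; all names are illustrative):

```python
import math
import random
import time

# High-level sketch of the standard IG loop (steps 1-5 and Eq. 16).

def iterated_greedy(x0, f, destruct, construct, local_search,
                    d=4, temp=1.0, budget_s=0.1, rng=random):
    x = best = local_search(x0)
    end = time.monotonic() + budget_s
    while time.monotonic() < end:
        partial, removed = destruct(x, d)           # destruction phase
        x_new = local_search(construct(partial, removed))
        if f(x_new) < f(best):
            best = x_new
        delta = f(x_new) - f(x)
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            x = x_new                               # Eq. 16 acceptance
    return best
```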

The hyper-heuristic proposed in the present paper considers an adaptive strategy (one fixed among the six possible strategies described in Sect. 2.2) to update some components of an IG algorithm. An adaptive strategy requires a set H of possible choices and a reward function indicating how well a particular choice performs.

An important issue when defining an adaptive strategy is the reward function. It returns a real number representing the quality of a given choice: good choices receive positive rewards and bad choices receive negative ones. When using IG to solve FSPs, the reward of the k-th choice at iteration t can be calculated from the relative decrease in the cost-function value (e.g., makespan or any other objective being considered):

$$\begin{aligned} r_{k,t} = (f(x_{before}) - f(x_{after}))/f(x_{before}). \end{aligned}$$
(18)

Here, \(x_{before}\) and \(x_{after}\) represent the solutions before and after the reward evaluation period (whose reference can be either the local search or the whole iteration). There are four possible reference points at which a solution can be evaluated: before the iteration (bI), before the local search (bL), after the iteration (aI), and after the local search (aL), giving rise to four reward types:

  • bLaL: \((f(x_{\text {before local search}}) - f(x_{\text {after local search}}))/f(x_\text {before local search})\);

  • bIaL: \((f(x_{\text {before iteration}}) - f(x_{\text {after local search}}))/f(x_\text {before iteration})\);

  • bLaI: \((f(x_{\text {before local search}}) - f(x_{\text {after iteration}}))/f(x_\text {before local search})\);

  • bIaI: \((f(x_{\text {before iteration}}) - f(x_{\text {after iteration}}))/f(x_\text {before iteration})\).

As Fig. 1 shows, each reward type considers a different period in which to measure the quality increase or decrease used as feedback for each solution.
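The four reward types can be sketched as simple lookups over objective-value snapshots (the snapshot keys are illustrative names for the values captured at the corresponding reference points):

```python
# Hedged sketch of the relative-improvement reward (Eq. 18) for the
# four reward types.

def reward(f_before, f_after):
    return (f_before - f_after) / f_before          # Eq. 18

REWARD_TYPES = {
    "bLaL": ("before_local_search", "after_local_search"),
    "bIaL": ("before_iteration", "after_local_search"),
    "bLaI": ("before_local_search", "after_iteration"),
    "bIaI": ("before_iteration", "after_iteration"),
}

def reward_from_snapshots(snapshots, kind):
    before_key, after_key = REWARD_TYPES[kind]
    return reward(snapshots[before_key], snapshots[after_key])
```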

Fig. 1. IG algorithm and reward types: bLaL, bIaL, bLaI and bIaI.

Using the standard IG [18] as a basis, with a computation-time budget as the stopping criterion, we identify components that can be adapted dynamically: the type of local search, the type of perturbation, the destruction size, the neighborhood size, the destruction position, and the local search focus. A summary of these adaptive components and their possible values is presented in Table 2.

In our case, different choices are possible for each IG component, for example, the destruction size. We refer to the parameter values as discrete choices (also referred to as arms in the multi-armed bandit literature) indexed by \(\{1, \dots , K\}\), where \(K = |H|\) is the total number of choices. Therefore, according to Eq. 6, each choice represents a heuristic \(h \in H\), and the dynamic adaptation strategy of the HH chooses h as the k-th choice, \(k=1, \dots , |H|\).

The pools for the first two choice sets encompass the most usual options for the local search and the perturbation in the context of combinatorial optimization. The options for the destruction size are the ones often chosen in IG implementations for FSPs. Finally, the pools for the last three choice sets are defined to produce different granularities, ranging from a coarse to a fine search.

Table 2. IG adaptive parameters.

The HH based on dynamic learning proposed in this paper is capable of adapting different components (e.g., local search and perturbation) at each iteration of the adaptive IG. For the local search, for example, it can choose among the original IG's iterative improvement, first improvement, first-best improvement, and random best improvement; the last three options consider all possible insertions. The perturbation adaptation considers three possibilities: (1) the IG destruction-construction steps; (2) two random swaps and a transposition, as used in [21] for ILS on the FSP; and (3) destruction-construction with an iterative improvement local search between destructions, recently proposed in [6].

The neighborhood size adaptation considers the percentage of the neighborhood explored during the local search step. For example, exploring only half of the neighborhood at the beginning of the search could save time for exploitation at the end of the search. Similarly, choosing the destruction size parameter dynamically might improve the search during exploration-exploitation phases.

As mentioned, two mechanisms in IG randomly choose the jobs to be re-inserted: the destruction and the best-insertion local search. An adaptive mechanism can be applied to the last two IG components shown in Table 2 to focus these operators on the parts of the solution with a better chance of improvement. For that, we propose partitioning the solution into chunks and adaptively selecting the chunk from which a job will be sampled.

As shown in Table 2, some parameters have multiple possible choice pools \(\{H_1,H_2,\dots \}\). For example, the destruction size can be chosen from the set \(H_1=\{2,4\}\), \(H_2=\{4,6\}\), \(H_3=\{2,4,6\}\) or \(H_4=\{4,8\}\). For the destruction position and local search focus, the solution can be partitioned into 3, 10 or \(J\) chunks (the last considering one arm per job). Some HH hyper-parameters, such as the pool \(H_i\) for a component, the reward type, and the update window, are determined in a parameter tuning phase described in Sect. 4.

4 Tuning and Testing Phases

Based on the hyper-heuristic with dynamic learning and its six possible adaptive strategies described in Sect. 2.2, Sect. 3 has detailed the proposed adaptive IG. This section presents the hyper-parameter configuration performed by irace (tuning), which uses part of the available data, and the test performed on the tuned HH using the remaining data. The configurations are evaluated with a budget of \(J\times (M/ 2) \times 30\) ms in both the tuning and test phases. The algorithms are implemented in C++ using the Paradiseo library [9]. The experiments were executed on a server with 8-core AMD EPYC 7542 processors and 16 GB of RAM. For the analysis of the results, we used the R language and relevant packages.

Tuning phase. Before running the experiments, we perform a tuning phase to determine which strategy configuration (shown in Table 1) works best for each IG component (shown in Table 2). The random strategy is not considered for the destruction position and local search focus, because a random choice in these cases is equivalent to the standard IG behaviour.

The irace algorithm [13], with default parameters and 5000 configuration evaluations, is used to tune each combination of the six adaptive components and the six dynamic strategies. There are multiple choice pools \(\{H_1,H_2,\dots \}\) for the destruction size, neighborhood size, destruction position and local search focus components. For example, the destruction size can be chosen from \(H_1=\{2,4\}\) or \(H_4=\{4,8\}\). In these cases, the pool choice is treated as an additional categorical parameter by irace during the tuning phase.

The instance set for the tuning phase is composed of 48 (2 \(\times \) 2 \(\times \) 3 \(\times \) 4) instances resulting from the combinations of: 2 sizes (20 or 50 jobs, with 10 machines), 2 objectives (makespan or flowtime), 3 types (permutation, no-wait or no-idle FSPs), and 4 processing-time distributions (exponential, uniform, job-correlated or machine-correlated).

Testing phase. After tuning, the best configurations are tested, with 10 restarts, on a set of unseen instances with the same features but sampled with different random seeds. To better evaluate the algorithms' generalization capabilities, we also include larger instances with \(J= 100\) and \(J= 200\) jobs and \(M= 20\) machines in the testing phase.

The evaluation for each algorithm is done using the Average Relative Percentage Deviation (ARPD) given by:

$$\begin{aligned} ARPD_{alg} = \frac{1}{R} \sum _{r=1}^{R} 100 \times \frac{f(x^{alg}_r) - f(x^{best})}{f(x^{best})} \end{aligned}$$
(19)

where R is the number of runs and \(x^{alg}_r\) is the best solution found by algorithm alg in run r. The reference solution \(x^{best}\) is the best solution found by the standard IG (the same used by the authors in [18]) with a higher budget of \(J\times (M/ 2) \times 120\) ms and 30 restarts.
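Eq. 19 amounts to a one-line computation (a hedged sketch; `run_costs` would hold \(f(x^{alg}_r)\) over the R runs and `best_cost` the reference \(f(x^{best})\)):

```python
# Hedged sketch of the ARPD metric (Eq. 19); names are illustrative.

def arpd(run_costs, best_cost):
    return sum(100.0 * (c - best_cost) / best_cost
               for c in run_costs) / len(run_costs)
```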

Finally, the configurations are also compared with the standard IG configuration to evaluate the effectiveness of the adaptive components in the presented scenario. In all comparisons, we highlight the lowest ARPDs and perform a Friedman rank-sum test with the Nemenyi post hoc test to verify whether the differences are statistically significant at a p-value threshold of 0.05: results with no statistically significant difference from the best one are highlighted with a gray background.

5 Results

Table 3 shows some hyper-parameter values (reward type, choice sets and update window) tuned by irace for each IG component addressed in the paper.

Table 3. Tuned hyper-parameters.

Different reward types were tuned for the different adaptive components and HHs, with no option dominating the others. The choice sets for the destruction size enable more exploration, with values higher than the default \(d = 4\), which is present in all options. The choice sets for the neighborhood size are small (2 or 3 options), with a preference for a coarse search, while the local search focus uses partitions with 10 or more choices in three out of five cases. Finally, the update window is short for MAB and TS on the perturbation component, indicating that its value (4 or 8 destructions) is often switched during the search. In most of the other cases, however, the strategies prefer less frequent changes, since large window sizes were selected in the tuning.

The ARPD values for each adaptive component and strategy are shown in Table 4. They are calculated using Eq. 19, with \(R=10\), considering all the different testing instances. The values for the standard IG configuration [18] are computed with the same budget (\(J\times (M/ 2) \times 30\) ms) as the proposed HHs on the testing instances. Notice that this budget is lower than the one used to compute the reference \(f(x^{best})\) values for the ARPD.

The adaptive components are able to improve on the static standard IG configuration for all IG components (perturbation, destruction size, destruction position and local search focus) but one. Local search adaptation is not effective regardless of the strategy used by the HH, meaning that the iterative improvement performed by the standard IG is quite effective compared with the other choices. TS achieves the lowest ARPD for the perturbation and destruction position components, while \(\epsilon \)-greedy performs well on the local search focus. As (biased) adaptation might not be the best option in some cases [5], the simple random strategy performs well for selecting the destruction size and neighborhood size. In all cases, the adaptive strategies TS and MAB are among those with the lowest, or statistically equivalent to the lowest, ARPD values.

We see from Table 4 that TS is robust across the different components and provides the lowest overall result when adapting the IG perturbation, although it is statistically equivalent to most of the other HH proposals.

We compare the TS configuration that adapts only the perturbation with two others that adapt multiple components simultaneously: Adapt all components and Adapt all components except local search (the component for which no adaptation strategy was capable of improving the performance). The Adapt all approaches use the strategies with the best ARPD values for each component in Table 4, that is, TS for the perturbation and destruction position, Random for the destruction and neighborhood sizes, and \(\epsilon \)-greedy for the local search focus. The results in Table 5 are separated by objective, FSP type, processing-time distribution and size. Overall, the IG with adaptive perturbation obtains the lowest ARPDs. Adapting all components at the same time does not provide benefits, but when the local search is excluded from the adaptation, the approach performs like the best one.

Table 4. Adaptation strategies ARPDs (and standard deviation) for each adaptive component and strategy. Lowest mean values are highlighted in bold, statistically equivalent values are highlighted with gray background.
Table 5. Adaptation strategies ARPDs (and standard deviation) for all adaptive components, perturbation and destruction size adaptation and standard (static) IG. Best mean and statistically equivalent values are in bold and gray background, respectively.

6 Conclusions

This paper proposed and analyzed the use of a hyper-heuristic with dynamic strategies to adapt different components of the Iterated Greedy algorithm. Six adaptation strategies (random, \(\epsilon \)-greedy, probability matching, multi-armed bandit, LinUCB, and Thompson sampling) were tested to adapt six IG components (local search, perturbation, destruction size, neighborhood size, destruction position and local search focus). After a tuning phase performed by irace to set the best strategy hyper-parameters for each adapted component, the proposal was tested on different variants of flowshop problems.

Results show that, in most cases, the adaptation is able to improve the performance over the static standard IG configuration, especially when the perturbation operator is adapted using dynamic Thompson Sampling. Also, using multiple adaptive components did not prove beneficial unless the local search is fixed as iterative improvement, a fact that deserves further investigation.

The work can be extended by including different flowshop problems (objectives and constraints), adaptation strategies, and alternative operators. In addition, we intend to propose modifications to improve the performance of the Adapt all approach.