1 Introduction

In the last decade, Automated Machine Learning (AutoML) has experienced significant advancements in the development of algorithms and strategies for generating tailored machine learning pipelines for specific problems [8]. A machine learning pipeline consists of a sequence or parallel execution of operations, including data preprocessing, classification, and post-processing. While classification is commonly used, other machine learning tasks like regression or clustering can also be accommodated. These methods aim to improve accuracy while considering computational constraints, as evaluating these pipelines requires substantial computational resources. Major cloud platforms, such as Google, Microsoft, and Amazon, have incorporated AutoML methods into their offerings [4].

The generation of these pipelines primarily relies on Bayesian optimization, evolutionary computation, or hybrid techniques [8]. In recent years, the focus has shifted towards optimizing the hyperparameters of artificial neural networks, a process known as Neural Architecture Search (NAS) [3]. However, little is known about the structure of the search space in both AutoML and NAS domains, which could provide insights into more effective and efficient exploration methods.

Given the success of evolutionary computation in exploring these pipeline spaces, which started with basic methods like TPOT [12] and Recipe [16] and progressed to more advanced approaches like CMA-ES [7], it is natural to investigate the fitness landscape that these methods traverse. The fitness landscape of a problem is defined by three components: the set of valid solutions (\(\mathcal {X}\)), the neighborhood of a solution (\(\mathcal {N}\)), and a fitness function (\(f: \mathcal {X}\rightarrow \mathbb {R}\)) that evaluates the solutions in \(\mathcal {X}\). By assessing solutions using f and considering the notion of neighborhood, the fitness landscape can be characterized.
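The three components can be made concrete on a toy landscape. In the sketch below, the solution set, neighborhood, and fitness values are illustrative stand-ins, not the paper's actual pipeline space:

```python
# A minimal, hypothetical encoding of the three fitness-landscape
# components: the solution set X, a neighborhood operator N, and a
# fitness function f. All values are illustrative.

# X: a toy set of valid solutions (integers standing in for pipelines)
X = {0, 1, 2, 3, 4}

def N(x):
    """Neighborhood: solutions reachable by one elementary move."""
    return {x - 1, x + 1} & X

# f: maps each solution to a real-valued fitness
fitness = {0: 0.2, 1: 0.5, 2: 0.4, 3: 0.9, 4: 0.7}

def is_local_optimum(x):
    """x is a local optimum if no neighbor has strictly higher fitness."""
    return all(fitness[x] >= fitness[v] for v in N(x))

local_optima = sorted(x for x in X if is_local_optimum(x))
```

On this toy landscape, solutions 1 and 3 are local optima; the same neighbor comparison underlies local-optimum detection on any landscape.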

Previous research has explored the fitness landscape of AutoML problems using Local Optima Networks (LONs) [10], which offer a global perspective compared to traditional metrics such as fitness distance correlation (FDC): LONs provide a comprehensive view of the fitness landscape, while FDC focuses on local features of the search space. However, previous work employed an artificial search space generated by brute force, with a fixed number of neighbors for each solution, which is unrealistic since most algorithms employ more efficient strategies to explore the search space. To address this limitation, we construct the search space using TPOT, an AutoML algorithm.

Essentially, our grammar and search space represent 69,960 valid machine learning pipelines, each comprising a classification algorithm and, potentially, a preprocessing algorithm (see Footnote 1). Although our grammar is small compared to the default TPOT configuration space, which can represent an enormous number of different ML pipelines, this simplification allows us to evaluate every pipeline represented by the grammar and determine the global optimum of the search space. The results reveal that, on many datasets, TPOT fails to optimize the problem, frequently evaluating suboptimal solutions instead of the global optimum, indicating that TPOT struggles in certain search spaces and tends to get trapped in local optima.

This work contributes to the field by constructing a new search space for AutoML problems using a real optimization algorithm (TPOT), proposing a novel approach for assigning weights to edges in constructing the Local Optima Network (LON), and conducting an analysis of the constructed LONs.

The remainder of this paper is structured as follows: Sect. 2 reviews related work, Sect. 3 presents our methodology, including details about the grammar, TPOT configuration, search space construction, and LON generation. Section 4 describes our experimental setup, including the datasets used, statistical information, and hardware specifications. Section 5 presents the results and ensuing discussion. Finally, Sect. 6 concludes the paper.

2 Related Work

Few works have so far examined the fitness landscape of AutoML problems. Garciarena et al. [6] were the first to analyze AutoML landscapes, considering a subspace of TPOT. Their objective was to identify the local characteristics of the space close to optimal solutions using metrics such as slope, roughness, and neutrality. Their results suggest that many regions of high fitness exist in the space, but these are prone to overfitting. In the same direction, Pimenta et al. [13] used fitness landscape metrics to better understand a huge space of machine learning pipelines. They examined fitness distance correlation (FDC) and neutrality metrics, and concluded that FDC was a poor metric for performing the analyses.

Turning from complete AutoML pipelines to analyses of the loss function in neural architecture search (NAS), Rodrigues et al. [15] characterized fitness landscapes of meta-heuristics for the neuroevolution of Convolutional Neural Networks using autocorrelation and the entropic measure of ruggedness. Nunes et al. [9] also analyzed the fitness landscape of NAS in the context of graph neural network architectures. They used FDC together with the dispersion metric (which measures the dispersion between the funnels in the landscape), and also examined the neutrality of the space.

Closer to algorithm configuration, Pushak et al. [14] assessed the landscape of algorithm configuration, focusing on the modality and convexity of parameter responses. Their approach defined parameter response slices, in which a specific parameter p was varied within a window centered on an optimal solution identified by SMAC. By keeping all other parameters constant and measuring the algorithm’s performance as a function of p, they evaluated various algorithms for typical optimization problems. The findings revealed that many of the parameter slices exhibited unimodal and convex characteristics, both within instance sets and on individual instances.

One of the few works analyzing fitness landscapes based on LONs and related to our research is presented in [17]. The authors adapted LONs to analyze the overall structure of parameter configuration spaces. They examined the metrics derived from LONs and fitness distance correlation (FDC), observing significant discrepancies when tuning the same algorithm for different problem instances. Notably, in complex scenarios, a substantial number of sub-optimal funnels were identified, while simpler problems displayed a single global funnel. Similarly, Cleghorn et al. [2] investigated parameter spaces for Particle Swarm Optimization (PSO) with a similar objective. Their analysis revealed that the macro-level view of PSO’s parameter landscapes appears relatively simple, yet the micro-level view exhibits a much higher level of complexity, which poses challenges for parameter tuning beyond what was initially assumed.

3 Methodology

The problem of AutoML can be formally defined as a generalization of the Combined Algorithm Selection and Hyperparameter optimization (CASH) problem [5]. In its original definition, given a set \(\mathcal {A} = \{ A^{(1)}, A^{(2)}, \ldots , A^{(k)} \}\) of learning algorithms, where each algorithm \(A^{(j)}\) has a hyperparameter space \(\varLambda ^{(j)} = \{ \lambda ^{(1)}, ..., \lambda ^{(S)}\}\) drawn from the full set of hyperparameters \(\mathbf { \Omega }\), the CASH problem is defined as in Eq. 1 (see Footnote 2).

$$\begin{aligned} A^*_{\lambda ^*} = \underset{A^{(j)}\in \mathcal {A},\, \lambda \in \varLambda ^{(j)}}{\textrm{argmax}}\ \frac{1}{k} \sum _{i=1}^{k}\mathcal {F}\left( A_{\lambda }^{(j)},\ \mathcal {D}_{train}^{(i)},\ \mathcal {D}_{valid}^{(i)}\right) \end{aligned}$$
(1)

where \(\mathcal {F}(A_\lambda ^{(j)}, \mathcal {D}_{train}^{(i)}, \mathcal {D}_{valid}^{(i)})\) is the gain achieved when learning algorithm \(A^{(j)}\), with hyperparameters \(\lambda \), is trained and validated on the disjoint training and validation sets \(\mathcal {D}_{train}^{(i)}\) and \(\mathcal {D}_{valid}^{(i)}\), respectively, for each partition \(1 \le i \le k\) of a k-fold cross-validation procedure.

A generalization can be made by replacing \(\mathcal {A}\) with a set of pipelines \(\mathcal {P} = \{ P^{(1)}, ..., P^{(V)}\}\), where each pipeline includes a subset of the algorithms in \(\mathcal {A}\) and their respective sets of hyperparameters \(\varGamma ^{(i)} = \{ \varLambda ^{(1)}, ..., \varLambda ^{(S)}\}\), drawn from the full set \(\mathbf {\Psi }\), as defined in Eq. 2.

$$\begin{aligned} P^{*}_{\varGamma ^{*}} = \underset{P^{(i)} \in \mathcal {P},\, \varGamma ^{(i)} \subseteq \mathbf {\Psi }}{\textrm{argmax}}\ \frac{1}{k} \sum _{j=1}^{k} \mathcal {F}\left( P^{(i)}_{\varGamma ^{(i)}},\ \mathcal {D}_{train}^{(j)},\ \mathcal {D}_{valid}^{(j)}\right) \end{aligned}$$
(2)

Figure 1 illustrates a fitness landscape, where the horizontal axis represents the configuration space and the vertical axis represents the fitness of a given configuration. As the fitness landscape concept is multidisciplinary (it can be applied to several problems), it is necessary to define what the configuration space represents and how the fitness is calculated. In this article, the configuration space is formed by ML pipelines and the fitness is defined as the F1-score of the pipeline.

Fig. 1.
figure 1

Illustration of a simple fitness landscape

Fig. 2.
figure 2

Examples of pipelines

ML pipelines can be represented as trees, as illustrated in Fig. 2. Each configuration (a point on the horizontal axis) in Fig. 1 corresponds to a pipeline like the examples illustrated in Fig. 2. Pipelines are built according to a context-free grammar which defines that every grammatically correct pipeline has zero or one preprocessing algorithms and exactly one classification algorithm. In the figure, the leaf nodes correspond to the algorithms and their respective hyperparameters. Although some algorithms have continuous hyperparameters, a set of discrete values has been selected; these can be consulted in the grammar (see Footnote 3).
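The shape of such a grammar, an optional preprocessing step plus a mandatory classifier over discretized hyperparameters, can be sketched as a Cartesian-product enumeration. The algorithm names and value sets below are hypothetical stand-ins, not the paper's actual grammar:

```python
from itertools import product

# Hypothetical discretized configuration space (the actual grammar is larger).
preprocessors = {
    None: {},                                  # the "no preprocessing" option
    "MinMaxScaler": {},
    "PCA": {"n_components": [2, 5, 10]},
}
classifiers = {
    "DecisionTree": {"max_depth": [2, 5, 10], "criterion": ["gini", "entropy"]},
    "KNN": {"n_neighbors": [1, 3, 5]},
}

def expand(name, grid):
    """Yield every hyperparameter assignment for one algorithm."""
    keys = sorted(grid)
    for values in product(*(grid[k] for k in keys)):
        yield (name, dict(zip(keys, values)))

# Every pipeline = one preprocessing configuration + one classifier configuration
pipelines = [
    (pre, clf)
    for pre_name, pre_grid in preprocessors.items()
    for pre in expand(pre_name, pre_grid)
    for clf_name, clf_grid in classifiers.items()
    for clf in expand(clf_name, clf_grid)
]
```

With these toy grids the enumeration yields 45 pipelines; the paper's grammar produces 69,960 in the same fashion.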

3.1 TPOT: Tree-Based Pipeline Optimization Tool

TPOT [12] is an automatic optimization tool for ML pipelines (i.e., AutoML) that uses Genetic Programming (GP) as its optimization heuristic. GP, like other algorithms based on the theory of evolution, uses crossover, mutation, and selection operators to evolve individuals. In the case of TPOT, individuals are ML pipelines whose algorithms and hyperparameters are defined through a configuration file.

3.2 Construction of the Fitness Landscape

To analyze how TPOT explores the space, we first select a relatively small configuration space composed of four preprocessing options (one of which is to use no preprocessing algorithm) and five classification algorithms. Each algorithm has a set of hyperparameters and, considering algorithms and hyperparameters together, it is possible to form about 70,000 different pipelines. Each pipeline is made up of a classification algorithm and, optionally, a preprocessing algorithm. We evaluate the fitness (F1-score) of each of these solutions.

Then, we run TPOT on that same space to analyze how the algorithm explores it, that is, whether TPOT (i.e., GP) is capable of effectively optimizing the AutoML problem. We run TPOT 30 times on each dataset with a different seed for statistical purposes. To build the fitness landscape, it is necessary to define the concept of neighborhood. We define the neighborhood through three types of operators: mutation, crossover, and reproduction. When a solution u is mutated and generates another solution v, we say that v is a neighbor of u, that is, \(v \in \mathcal {N}(u)\). The same applies to reproduction. For crossover, the process is different: two solutions \(u_1\) and \(u_2\) are selected and a solution v is generated as the product of the operation, so v is a neighbor of both parents. Figure 3 illustrates the process.
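A minimal sketch of how this operator-induced neighborhood could be recorded during a run; the bit-string encoding and the operators below are simplified stand-ins for TPOT's tree-based pipelines and GP operators:

```python
import random

# Solutions are hypothetical pipelines encoded as bit tuples; the operators
# below are illustrative stand-ins for TPOT's actual GP operators.

def mutate(u, rng):
    """Flip one position of the encoding."""
    i = rng.randrange(len(u))
    v = list(u)
    v[i] = 1 - v[i]
    return tuple(v)

def crossover(u1, u2, rng):
    """One-point crossover between two parents."""
    cut = rng.randrange(1, len(u1))
    return u1[:cut] + u2[cut:]

neighbors = {}   # N: solution -> set of solutions reached from it

def record(u, v):
    neighbors.setdefault(u, set()).add(v)

rng = random.Random(0)
u1, u2 = (0, 0, 1, 1), (1, 1, 0, 0)

v = mutate(u1, rng)
record(u1, v)                  # v is a neighbor of u1

child = crossover(u1, u2, rng)
record(u1, child)              # the child is a neighbor of *both* parents
record(u2, child)
```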

Fig. 3.
figure 3

Edges

3.3 Compressing Neutral Nodes

The LON, described below (Sect. 3.4), is affected by the number of local optima in the space. By definition, a locally optimal solution is one whose fitness is not lower than that of its neighbors; therefore, a local optimum can be verified either by comparing its fitness with that of its neighbors or through a local search algorithm such as hill climbing, where a local optimum is a solution from which the algorithm finds no improving move. Both methods should, in principle, achieve the same result; however, in our experiments, a divergence was observed in the nodes identified as local optima by the two methods. Further analysis showed that the order in which the local search visits neutral regions of the fitness landscape can generate a larger number of optima, because the first solution visited in a neutral region is reported as a local optimum (there being no solution with fitness greater than the current one). Although adjusting the hill climbing acceptance criterion, accepting moves to solutions with fitness greater than or equal to the current one, seems to solve the problem, the issue persists if there is no solution in the neutral region with fitness greater than the one already found.

Therefore, the solution adopted in this work was to compress neutral solutions into a single node and add an attribute that counts the number of merged nodes for later analysis. The strategy used to find neutral solutions was breadth-first search (BFS), which traverses neutral regions through the neighborhood of each solution. After this adjustment, the graph with the clustered neutral regions is used to construct the LON.
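The compression step can be sketched as a BFS restricted to equal-fitness neighbors; the toy graph and fitness values below are illustrative:

```python
from collections import deque

def compress_neutral(nodes, neighbors, fitness):
    """Merge each neutral region into one node.

    Returns a list of (representative, merged_count, fitness) tuples,
    one per compressed region.
    """
    visited, compressed = set(), []
    for start in nodes:
        if start in visited:
            continue
        # BFS that only crosses edges between equal-fitness solutions
        component, queue = [], deque([start])
        visited.add(start)
        while queue:
            u = queue.popleft()
            component.append(u)
            for v in neighbors.get(u, ()):
                if v not in visited and fitness[v] == fitness[u]:
                    visited.add(v)
                    queue.append(v)
        compressed.append((component[0], len(component), fitness[start]))
    return compressed

# Toy landscape: a-b-c form a neutral plateau; d sits alone at higher fitness
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
fit = {"a": 0.5, "b": 0.5, "c": 0.5, "d": 0.8}
merged = compress_neutral(["a", "b", "c", "d"], graph, fit)
```

Here the plateau {a, b, c} collapses into one node with a merged count of 3, while d remains a singleton.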

3.4 Local Optima Networks (LON)

A LON is represented by a directed graph LON = (V, E), where V is the set of vertices, representing the local optima of the fitness landscape, and E is the set of edges connecting the vertices, which can represent basin-transition, escape, or perturbation edges, as detailed below. By extracting a set of features from this network, we obtain a view of the global structure of the fitness landscape of a problem. Here, we use the original LON model [11] to perform our analysis, along with other models better adapted to neutral spaces, such as the Compressed LON (CLON), Monotonic LON (MLON), and Compressed Monotonic LON (CMLON) [1, 10].

3.5 Nodes

A local optimum is defined as a solution x such that \(\forall x' \in \mathcal {N}(x),\, f(x) \ge f(x')\), where \(\mathcal {N}(\cdot )\) is the neighborhood operator and \(f(\cdot )\) is the fitness function. The local optima are found by a local search algorithm; in this work, the local search accepts solutions with fitness greater than or equal to the current one.

Note that the F1-score varies from 0 to 1, and some machine learning pipelines may differ only at the third or fourth decimal place. These small differences have little to no effect on the results, and hence solutions with such differences in fitness can be considered neutral. Because of that, a pipeline \(p_i\) is defined as better than a pipeline \(p_j\) if \(f(p_i) > f(p_j) + \delta \), where \(\delta \) is a tolerance value given by the standard deviation of the mean fitness of 30 independent random samples.
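The δ-tolerant comparison can be sketched as follows; the δ value below is illustrative, whereas in the paper it is derived from 30 independent random samples:

```python
# Illustrative tolerance; in the paper, delta is the standard deviation of
# the mean fitness over 30 independent random samples.
DELTA = 0.003

def better(f_i, f_j, delta=DELTA):
    """p_i is better than p_j only if it exceeds f(p_j) by more than delta."""
    return f_i > f_j + delta

def neutral(f_i, f_j, delta=DELTA):
    """Neither solution is better: the pair is considered neutral."""
    return not better(f_i, f_j, delta) and not better(f_j, f_i, delta)
```

For example, fitnesses 0.9104 and 0.9100 are neutral under this δ, while 0.95 versus 0.91 is a genuine improvement.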

The literature defines a variety of local search algorithms to find the local optima of a fitness landscape, including Iterated Local Search (ILS) and Tabu Search. We used a classical hill climbing, as the search space is combinatorial and enumerable.
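A minimal hill-climbing sketch, shown with strict improvement for clarity (as discussed in Sect. 3.3, equal-fitness moves are handled separately via neutral compression). The graph and fitness values are illustrative:

```python
def hill_climb(start, neighbors, fitness):
    """Follow the best strictly-improving neighbor until none exists."""
    current = start
    while True:
        candidates = neighbors.get(current, [current])
        best = max(candidates, key=lambda v: fitness[v])
        if fitness[best] > fitness[current]:
            current = best
        else:
            return current   # local optimum reached

# Toy path landscape: fitness increases a -> b -> c
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
fit = {"a": 0.2, "b": 0.6, "c": 0.9}
lo = hill_climb("a", graph, fit)   # climbs a -> b -> c
```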

3.6 Edges

Once the LON nodes are determined, we decide whether there is an edge \(e_{ij}\) between two local optima \(LO_i\) and \(LO_j\) by defining an edge weight \(w_{ij}\); an edge exists when \(w_{ij} > 0\). The literature defines at least three ways of assigning weights to LON edges [10, 11, 18]: basin-transition edges, escape edges, and perturbation edges. Each of these methods brings different information about the relationship between local optima, but basin-transition is the most useful for combinatorial problems because of its probabilistic nature. A drawback of this method is that evaluating the weight of every edge takes a long time (on the order of \(O(n^3)\)), so we propose a new methodology, common ancestor, described below, and use basin-transition as a baseline.

Basin-Transition (BT): As the local search defines a mapping from the search space to a set of locally optimal solutions, it also generates basins of attraction. The basin of attraction \(b_i\) of a local optimum \(LO_i\) is composed by all solutions s in the search space that satisfy \(LS(s)=LO_i\), that is, \( b_i = \{v \in V(G)\, |\, LS(v) = LO_i\}\). Therefore, in the basin transition method [11], the weight of the edge that connects two local optima \(LO_i\) and \(LO_j\) is given by:

$$\begin{aligned} w(e_{ij})=\frac{1}{|b_i|}\sum _{s \in b_i}{\sum _{s' \in b_j}{p(s \rightarrow s')}} \end{aligned}$$
(3)

where \(p(s \rightarrow s')\) is the probability that a mutation of s generates \(s'\).
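Equation 3 can be computed directly once basins and mutation probabilities are known. In the sketch below, the basins and the probability table are illustrative:

```python
def bt_weight(b_i, b_j, p):
    """w(e_ij) = (1/|b_i|) * sum over s in b_i, s' in b_j of p(s -> s')."""
    return sum(p.get((s, s2), 0.0) for s in b_i for s2 in b_j) / len(b_i)

# Toy basins of two local optima, plus toy mutation probabilities p(s -> s')
b1, b2 = ["a", "b"], ["c", "d"]
p = {("a", "c"): 0.1, ("b", "c"): 0.2, ("b", "d"): 0.1}

w12 = bt_weight(b1, b2, p)   # (0.1 + 0.2 + 0.1) / 2 = 0.2
```

The pairwise sum over both basins is what makes the method expensive on real landscapes, motivating the cheaper CA alternative below.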

Common Ancestor (CA): This method assigns weights to edges proportionally to the number of common ancestors of the two endpoints. A node v is an ancestor of u if there exists a path connecting v to u, i.e., \(v \leadsto u\).
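The CA idea can be sketched on a toy directed graph, using simple DFS reachability to test ancestry (the graph and node names below are illustrative):

```python
def reachable_from(v, adj):
    """All nodes reachable from v (nodes of which v is an ancestor)."""
    seen, stack = set(), [v]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        stack.extend(adj.get(u, ()))
    return seen

def common_ancestors(i, j, nodes, adj):
    """Nodes v such that paths v ~> i and v ~> j both exist."""
    return {v for v in nodes
            if i in reachable_from(v, adj) and j in reachable_from(v, adj)}

# Toy exploration graph: r reaches both x and y; s reaches only y
adj = {"r": ["x", "y"], "s": ["y"]}
ca = common_ancestors("x", "y", ["r", "s", "x", "y"], adj)
```

The edge weight between x and y would then be proportional to |ca|. Ancestry tests like this can be read off the (transitive closure of the) adjacency matrix, which is the source of the method's efficiency advantage discussed in Sect. 6.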

3.7 Network Statistics

Number of Nodes: Indicates the total number of nodes (or vertices).

Number of Edges: Indicates the total number of edges present in the graph.

Number of Self-loops: Indicates the number of loops in the graph, where an edge’s destination equals its origin (e.g., (u, u)).

Number of Isolated nodes: Indicates the number of nodes without any incoming or outgoing edges.

Degree Centrality: Measures the importance of a node in the graph based on the number of edges incident to that node. It quantifies the node’s connectivity within the network.

In-Degree: The number of edges that terminate at a given vertex. It indicates the importance of a solution in the search space and identifies frequently visited solutions during exploration.

Out-Degree: The number of edges that originate from a given vertex. It indicates the importance of a solution in the search space and identifies solutions that are challenging for the search algorithm to reach.

Density: Measures the proportion of edges present in the graph relative to the total number of possible edges.

Mean Clustering: The average clustering coefficient of each node in the graph, which captures the degree of interconnectedness among a node’s neighbors.

Total Weight: The sum of the weights of all edges present in the graph.

Increasing Weights: The sum of the weights of edges that connect nodes where the fitness of the destination node is greater than the fitness of the origin node.

Decreasing Weights: The sum of the weights of edges that connect nodes where the fitness of the destination node is smaller than the fitness of the origin node.

Neutral Weights: The sum of the weights of edges that connect nodes where the fitness of the destination node is equal to the fitness of the origin node.
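Most of these statistics can be sketched in plain Python over an edge list with a fitness value per node; the toy graph below is illustrative:

```python
# Toy weighted directed graph: edges are (origin, destination, weight);
# each node carries a fitness value. All values are illustrative.
edges = [("a", "b", 0.4), ("b", "a", 0.1), ("b", "b", 0.2), ("b", "c", 0.3)]
fitness = {"a": 0.5, "b": 0.7, "c": 0.9, "d": 0.1}   # "d" is isolated
nodes = list(fitness)

n_nodes = len(nodes)
n_edges = len(edges)
n_self_loops = sum(1 for u, v, _ in edges if u == v)        # edge (b, b)
touched = {u for u, _, _ in edges} | {v for _, v, _ in edges}
n_isolated = sum(1 for v in nodes if v not in touched)       # node "d"
in_degree = {v: sum(1 for _, d, _ in edges if d == v) for v in nodes}
out_degree = {v: sum(1 for o, _, _ in edges if o == v) for v in nodes}
density = n_edges / (n_nodes * (n_nodes - 1))                # directed density

total_weight = sum(w for _, _, w in edges)
increasing = sum(w for u, v, w in edges if fitness[v] > fitness[u])
decreasing = sum(w for u, v, w in edges if fitness[v] < fitness[u])
neutral = sum(w for u, v, w in edges if fitness[v] == fitness[u])
```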

4 Experimental Setup

4.1 Characterization of the Fitness Landscapes

The fitness landscape of a problem depends directly on the data being analyzed. In this work, the pipelines were evaluated on 20 datasets selected from the UCI Machine Learning Repository (see Footnote 4) and from Kaggle (see Footnote 5). The selection criteria were: (i) popularity, (ii) numerical or categorical features, and (iii) classification as the intended task.

Considering the search space defined in Sect. 3, we generated all the solutions and evaluated them on each of the 20 datasets, producing 20 different fitness landscapes.

Table 1 presents features of the datasets used to generate the fitness landscapes. The “Code” column indicates the code used to reference each dataset; the “Instances”, “Features”, and “Classes” columns indicate the number of instances, features, and classes in the target feature, respectively. The “#Optimum” column indicates the number of solutions that achieve the optimal fitness value. The “Var.”, “Mean”, “Max.”, and “Min.” columns indicate the variance, mean, highest, and lowest fitness values of the evaluated pipelines.

Table 1. Summary of the fitness value of the pipelines evaluated in each dataset

Further, Fig. 4 shows box-plots of the fitness distribution of the pipelines generated for each dataset. Note that, for some datasets, the fitness of the solutions is predominantly high or low, while for others it is more evenly distributed. This distribution does not affect the fitness landscape analysis, but gives insight into the difficulty of the problem.

Fig. 4.
figure 4

Box-plot of the fitness of the pipelines in different datasets.

Fig. 5.
figure 5

Evaluation time for the complete search space defined by the grammar.

Table 1 shows more detailed statistics of the fitness of the pipelines evaluated for each dataset, summarized in the box-plots of Fig. 4. The variance is relatively low since the fitness (F1-score) can vary from 0.00 to 1.00.

Figure 5 shows the total time required for evaluating the entire configuration space, equal to the sum of the time needed to train and evaluate each solution individually. Factors that explain the variation in total time include the number of instances and features of each dataset; for example, DS09 is the largest dataset (51 features). The experiments were run on an Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10 GHz with approximately 65 GB of RAM.

By contrast, the default configuration space of TPOT is composed of pipelines that include a feature selection algorithm, a preprocessing algorithm, and a classification algorithm. Considering the different combinations of hyperparameters, there are approximately \(3.94\times 10^{10}\), \(2.66\times 10^{10}\), and \(1.50\times 10^{38}\) configurations of selection, preprocessing, and classification algorithms, respectively.

The TPOT settings are: population_size of 10, generations of 30, a mutation rate of 0.9, a crossover rate of 0.1, and f1-weighted as the scoring function. The experiments were run on the 20 classification datasets described in Sect. 4.1.
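For reference, a restricted search space is supplied to TPOT as a configuration dictionary mapping estimator import paths to discretized hyperparameter grids. The entries below are illustrative, not the paper's actual grammar:

```python
# Hypothetical reduced configuration dictionary in TPOT's config_dict format:
# keys are sklearn import paths, values are discretized hyperparameter grids.
tpot_config = {
    "sklearn.tree.DecisionTreeClassifier": {
        "max_depth": [2, 5, 10],
        "criterion": ["gini", "entropy"],
    },
    "sklearn.preprocessing.MinMaxScaler": {},   # no tunable hyperparameters
}

# The settings from the text would then be passed along the lines of
# (not executed here):
#   TPOTClassifier(population_size=10, generations=30, mutation_rate=0.9,
#                  crossover_rate=0.1, scoring="f1_weighted",
#                  config_dict=tpot_config, random_state=seed)
```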

5 Results

Our experiments aim to verify how TPOT explores the configuration space. The first experiments analyze the performance of TPOT until it finds the global optimum (GO) and the number of times the algorithm is able to find the GO, which helps to gauge the difficulty the algorithm faces in the search spaces of the experiment. As the entire space was evaluated by brute force and restricted to relatively few solutions (about 70 thousand), it is possible to detect whether the algorithm found the GO.

The results of the fitness landscape of TPOT show that some solutions are evaluated frequently; that is, even when changing the seed of the algorithm, a subset of solutions (in some cases only one) is evaluated in most executions (more than 70% of the time). The analysis of these solutions indicates that in all cases they are not the global optimum and that the average error (the average difference between their fitness and the global optimum) is 0.06, with a standard deviation of 0.07.

Also, TPOT is not able to optimize the datasets in several cases, even in this small search space. The “Hits (%)” column of Table 2 shows that TPOT does not find the global optimum in all executions for most datasets. For example, the global optimum was found in only 16.67% of the runs on the raisin dataset, while on the bank and nursery datasets it was found in all TPOT runs.

However, regardless of the number of hits, in all cases the average error (between the highest fitness found and the global optimum) appears only from the third decimal place. This result indicates that TPOT got close to the global optimum in all runs, since a difference of 0.004 in the F1-score between different ML pipelines is not a significant gain or loss in most applications.

Table 2. Summarization of 30 runs of TPOT using the same
Table 3. Performance of TPOT regarding the number of generations until finding the GO

Table 3 presents the generation in which TPOT finds the global optimum (GO). The “Gen.” column indicates the generation in which the first GO was found; the other two columns indicate the number of GOs found and the number of GOs the dataset has. Finding the GO only in a late generation indicates that the algorithm had some difficulty exploring the space, which may be an indication of the difficulty of the problem. Another aspect is the number of GOs found (especially in multimodal problems), since exploration of the search space promotes diversity of solutions, an important feature for AutoML, as it allows choosing among different algorithms. For example, in the DS08 dataset the number of GOs is 96, while the average number of optima found by TPOT is only 18.87; that is, a large part of the space's diversity remains unexplored.

Figure 6 presents the centrality, density, and clustering statistics of the graphs generated by the TPOT exploration. Reported results are averages over the 30 runs of the experiment. When we create the LON using the CA method, the results are closer to those of the original graph, while the BT method presents more divergent results.

In the three metrics shown in the figure, the BT method has values greater than those of the other graphs, which allows us to conclude that BT results in a LON with greater centrality and density than the others.

Fig. 6.
figure 6

Centrality, density, and clustering statistics of the graphs generated by the TPOT exploration

Figure 7 shows several runs of the algorithm on the same dataset with different seeds. In the first case, the algorithm was not able to find the GO and the exploration was concentrated in a single region, represented by the densest region in the figure. In the second case, the algorithm found 75% of the global optima in the space, and the graph shows many explored regions, with concentrations scattered around several solutions. In the third case, the algorithm was able to find all GOs, and the figure indicates a concentration of the exploration around a few solutions (visually fewer than in the second case). The explanation is that some hyperparameters do not affect the fitness of the solutions (in certain spaces); therefore, all solutions that differ only in the values of such hyperparameters obtain the same fitness, which draws these solutions close to each other.

Fig. 7.
figure 7

Performance of TPOT in different runs on the same dataset

Figure 8 presents various graph statistics computed on the LON CA and LON BT graphs. For comparison purposes, the values of the statistics were normalized by the maximum value obtained in each method; that is, we divided the value of each statistic by the maximum value found across datasets. For both the in-degree and the out-degree, the statistic evaluated on LON BT is smoother (and higher) than the one evaluated on LON CA. The other statistics vary significantly across datasets. Therefore, we next analyze the correlation between the statistics of the two methods.

Fig. 8.
figure 8

Several metrics calculated using the LON constructed using the BT and CA method. The results serve as a basis for comparison between methods.

Considering the relationship between LON BT and LON CA, the correlation between the average in-degree and out-degree is 0.051; that is, there is no correlation between the two. Likewise, the correlation between the weights of the edges that form loops in the LONs is -0.335, which is equally weak. The correlations for the number of edges and the total weight of the graph are 0.5653 and 0.5414, respectively. Although the correlation is greater in these last two cases, it is still not possible to claim a strong relationship between them. Thus, the structures of the two LONs are relatively different from each other.

To compare the two methods, the following statistical test was performed: the metrics of the two LONs were correlated with the features of the TPOT exploration graph (from Table 2: Hits (%), Error, Fitness, Global; from Table 3: Generation, # Found, # Global), and a t-test (see Footnote 6) was used to compare whether there is a difference between the two populations of correlations. The resulting p-value was 0.57, which does not allow rejecting the null hypothesis that both means are equal. Here, we take the equality of the means as a measure of the similarity between the two populations.

6 Conclusion

The results show that TPOT suffers somewhat from local optima, as indicated by the fact that suboptimal solutions remain for many generations. However, the solutions have a fitness relatively close to the global optimum (difference smaller than \(10^{-3}\)).

Another observation is the importance of repeating the experiments, as the algorithm is affected by its initialization point. TPOT can quickly converge to suboptimal solutions and fail to explore the space. Through several runs of the algorithm, it is possible to verify and ‘force’ the exploration of the space through different initialization points.

The results also show that LON CA has structural differences with respect to LON BT. The correlations between several of their metrics are low, which suggests that the two methods create “different” LONs. However, both methods produce graphs whose metrics correlate similarly with those of the original exploration graph, with the advantage that the CA method is more efficient: it can be computed from the adjacency matrix of the graph (cost O(\(N^2\))), while the BT method needs to compute the combinations between pairs of adjacent nodes (cost O(\(N^3\))).