key: cord-027286-mckqp89v
authors: Ksieniewicz, Paweł; Goścień, Róża; Klinkowski, Mirosław; Walkowiak, Krzysztof
title: Pattern Recognition Model to Aid the Optimization of Dynamic Spectrally-Spatially Flexible Optical Networks
date: 2020-05-23
journal: Computational Science - ICCS 2020
DOI: 10.1007/978-3-030-50423-6_16
sha:
doc_id: 27286
cord_uid: mckqp89v

The following paper considers pattern recognition-aided optimization of a complex and relevant problem related to optical networks. For that problem, we propose a dedicated four-step optimization approach that makes use, among others, of a regression method. The main focus of the study is on the construction of an efficient regression model and its application to the initial optimization problem. We therefore perform extensive experiments using realistic network assumptions and then draw conclusions regarding an efficient configuration of the approach. According to the results, the approach performs best with a multi-layer perceptron regressor, whose prediction ability was the highest among all tested methods.

According to Cisco forecasts, global consumer Internet traffic will grow at a compound annual growth rate (CAGR) of 26% in the years 2017-2022 [3]. The increase in network traffic is a result of two main trends. Firstly, the number of devices connected to the Internet is growing due to the increasing popularity of new services, including the Internet of Things (IoT). The second important trend influencing Internet traffic is the popularity of bandwidth-demanding services such as video streaming (e.g., Netflix) and cloud computing. The Internet consists of many individual networks connected together; however, the backbone connecting these networks consists of optical networks based on fiber connections.
Currently, the most popular technology in optical networks is WDM (Wavelength Division Multiplexing), which is expected to be insufficiently efficient to support the increasing traffic in the near future. In the last few years, a new concept for optical networks has been deployed, i.e., the architecture of Elastic Optical Networks (EONs). However, over the next decade some new approaches must be developed to overcome the predicted "capacity crunch" of the Internet. One of the most promising proposals is the Spectrally-Spatially Flexible Optical Network (SS-FON), which combines Space Division Multiplexing (SDM) technology [14], enabling parallel transmission of co-propagating spatial modes in suitably designed optical fibers such as multi-core fibers (MCFs) [1], with flexible-grid EONs [4], which enable better utilization of the optical spectrum and distance-adaptive transmissions [15]. In MCF-based SS-FONs, a challenging issue is the inter-core crosstalk (XT) effect, which impairs the quality of transmission (QoT) of optical signals and has a negative impact on overall network performance. In more detail, MCFs are susceptible to signal degradation as a result of the XT that occurs between adjacent cores whenever optical signals are transmitted in an overlapping spectrum segment. Addressing the XT constraints significantly complicates the optimization of SS-FONs [8].

Besides numerous advantages, new network technologies also bring challenging optimization problems, which require efficient solution methods. Since the technologies and related problems are new, there are no benchmark solution methods that can be applied directly, and hence many studies propose dedicated optimization approaches. However, due to the high complexity of these problems, improving the performance of such approaches still requires considerable effort [6, 8].
We therefore observe a trend to use artificial intelligence techniques (with a strong emphasis on pattern recognition tools) in the field of optimization of communication networks. According to the literature surveys in this field [2, 10, 11, 13], researchers mostly focus on discrete labelled supervised and unsupervised learning problems, such as traffic classification. Regression methods, which are in the scope of this paper, are mostly applied to traffic prediction and to the estimation of quality of transmission (QoT) parameters such as delay or bit error rate.

This paper extends our study initiated in [7]. We make use of pattern recognition models to aid the optimization of dynamic MCF-based SS-FONs in order to improve network performance in terms of minimizing bandwidth blocking probability (BBP), or in other words, to maximize the amount of traffic that can be allocated in the network. In particular, an important topic in the considered optimization problem is the selection of a modulation format (MF) for a particular demand, because each MF provides a different tradeoff between the required spectrum width and the transmission distance. To solve that problem, we define applicable distances for each MF (i.e., the minimum and maximum length of a routing path that is supported by each MF). To find the values of these distances that provide the best allocation results, we construct a regression model and then combine it with a Monte Carlo search.

It is worth noting that this work does not address dynamic problems in which the concept changes over time, as is often the case when processing large data sets, and instead assumes a static concept distribution [9]. The main novelty and contribution of this work is an in-depth analysis of basic regression methods stabilized by the structure of an estimator ensemble [16] and an assessment of their usefulness in the task of predicting the objective function for optimization purposes.
In one of the previous works [7], we confirmed the effectiveness of this type of solution using a distance-weighted nearest-neighbors regression algorithm, focusing, however, much more on the network aspect of the analyzed problem. In the present work, the main emphasis is on the construction of the prediction model. Its main purpose is:

- A proposal to interpret the optimization problem in the context of pattern recognition tasks.

The rest of the paper is organized as follows. In Sect. 2, we introduce the studied network optimization problem. In Sect. 3, we discuss our optimization approach for that problem. Next, in Sect. 4 we evaluate the efficiency of the proposed approach. Finally, Sect. 5 concludes the work.

The optimization problem is known in the literature as dynamic Routing, Space and Spectrum Allocation (RSSA) in SS-FONs [5]. We are given an SS-FON topology realized using MCFs. The topology consists of nodes and physical links. Each physical link comprises a number of spatial cores. The spectrum width available on each core is divided into narrow, same-sized segments called slices. The network is in its operational state: we observe it in a particular time perspective given by a number of iterations. In each iteration (i.e., a time point), a set of demands arrives. Each demand is given by a source node, a destination node, a duration (measured in the number of iterations) and a bitrate (in Gbps). To realize a demand, it is required to assign it a light-path and reserve its resources for the duration of the demand. When a demand expires, its resources are released. A light-path consists of a routing path (a set of links connecting the demand source and destination nodes) and a channel (a set of adjacent slices selected on one core) allocated on the path links. The channel width (number of slices) required for a particular demand on a particular routing path depends on the demand bitrate, the path length (in kilometres) and the selected modulation format.
Each incoming demand has to be realized unless there are not enough free resources when it arrives. In such a case, the demand is rejected. Please note that the light-paths selected in the i-th iteration affect the network state and the allocation possibilities in the next iterations. The objective function is defined here as the bandwidth blocking probability (BBP), calculated as the summed bitrate of all rejected demands divided by the summed bitrate of all offered demands. Since we aim to support as much traffic as possible, the objective criterion should be minimized [5, 8].

The light-path allocation process has to satisfy three basic RSSA constraints. First, each channel has to consist of adjacent slices. Second, the same channel (i.e., the same slices and the same core) has to be allocated on each link included in a light-path. Third, at each time point each slice on a particular physical link and a particular core can be used by at most one demand [8].

There are four modulation formats available for transmission: 8-QAM, 16-QAM, QPSK and BPSK. Each format is described by its spectral efficiency, which determines the number of slices required to realize a particular bitrate using that modulation. However, each modulation format is also characterized by a maximum transmission distance (MTD), which provides an acceptable value of the optical signal-to-noise ratio (OSNR) at the receiver side. More spectrally-efficient formats consume less spectrum, however, at the cost of shorter MTDs. Moreover, more spectrally-efficient formats are also more vulnerable to XT effects, which can additionally degrade QoT and lead to the rejection of demands [7, 8]. Therefore, the selection of the modulation format for each demand is a compromise between spectrum efficiency and QoT. To address that problem, we use the procedure introduced in [7] to select a modulation format for a particular demand and routing path.
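As a small illustration of the objective defined above, the bandwidth blocking probability can be computed from a log of offered demands. This is only a minimal sketch: the log format (pairs of bitrate and an accepted flag) and the function name are assumptions made for illustration and do not come from the paper or from [7].

```python
from typing import Iterable, Tuple


def bandwidth_blocking_probability(demands: Iterable[Tuple[float, bool]]) -> float:
    """Summed bitrate of rejected demands divided by summed offered bitrate.

    `demands` is a hypothetical log of (bitrate_gbps, accepted) pairs.
    """
    offered = 0.0
    rejected = 0.0
    for bitrate, accepted in demands:
        offered += bitrate
        if not accepted:
            rejected += bitrate
    if offered == 0.0:
        # Assumption: with no offered traffic nothing can be blocked.
        return 0.0
    return rejected / offered


# Example log: three demands, one of which (300 Gbps) was rejected.
example_log = [(100.0, True), (300.0, False), (600.0, True)]
example_bbp = bandwidth_blocking_probability(example_log)
```

Because the measure is weighted by bitrate, rejecting one large demand counts more than rejecting several small ones, which is why it differs from a plain fraction of rejected demands.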
Let m = 1, 2, 3, 4 denote the modulation formats ordered by increasing MTD (and, at the same time, by decreasing spectral efficiency). This means that m = 1 denotes 8-QAM and m = 4 denotes BPSK. Let MTD = [mtd_1, mtd_2, mtd_3, mtd_4] be the vector of MTDs for the modulations 8-QAM, 16-QAM, QPSK and BPSK, respectively. Moreover, let ATD = [atd_1, atd_2, atd_3, atd_4] (where atd_i ≤ mtd_i, i = 1, 2, 3, 4) be the vector of applicable transmission distances. For a particular demand and routing path, we select the most spectrally-efficient modulation format i for which atd_i is greater than or equal to the length of the selected path and for which the XT effect is at an acceptable level. For each candidate modulation format, we assess the XT level based on the availability of adjacent resources (i.e., slices and cores) using the procedure proposed in [7]. It is important to note that we do not specify atd_4 (for BPSK), since we assume that this modulation is able to support transmission on all candidate routing paths regardless of their length. Please also note that when the XT level is too high for all modulation formats, the demand is rejected regardless of the availability of light-paths.

In Sect. 2 we studied the RSSA problem and emphasised the importance of an efficient modulation selection task. For that task we proposed a solution method whose efficiency strongly depends on the applied ATD vector. Therefore, we aim to find the vector ATD* that provides the best results. The vector elements have to be positive and have upper bounds given by the vector MTD. Moreover, the following condition has to be satisfied: atd_i < atd_{i+1}, i = 1, 2. Since solving RSSA instances is a time-consuming process, it is impossible to evaluate all possible ATD vectors in a reasonable time. We therefore make use of regression methods and propose a scheme to find ATD*, depicted in Fig. 1.

A representative set of 1000 different ATD vectors is generated.
Then, for each of them we simulate the allocation of demands in the SS-FON (i.e., we solve dynamic RSSA). For the purpose of demand allocation (i.e., the selection of light-paths), we use a dedicated algorithm proposed in [7]. For each considered ATD vector we save the obtained BBP. Based on that data, we construct a regression model, which predicts BBP based on an ATD vector. Having that model, we use the Monte Carlo method to find the ATD* vector, which is recommended for further experiments.

To solve an RSSA instance for a particular ATD vector, we use the heuristic algorithm proposed in [7]. We work under the assumption that there are 30 candidate routing paths for each traffic demand (generated using Dijkstra's algorithm). Since the paths are generated in advance and their lengths are known, we can use an ATD vector to preselect modulation formats for these paths based on the procedure discussed in Sect. 2. Therefore, RSSA is reduced to the selection of one of the candidate routing paths and a communication channel with respect to the resource availability and the assessed XT levels.

From the perspective of pattern recognition methods, the abstraction of the problem is not the key element of processing. The main focus here is the representation available to construct a proper decision model. For the purposes of these considerations, we assume that both the input parameters and the objective function take only quantitative and not qualitative values, so we may use probabilistic pattern recognition models to process them. If we interpret the optimization task as searching for an extremum of a function of many input parameters, the result of each simulation performed for a given combination of parameters may be treated as a label in the training set of a supervised learning model. In this case, the set of parameters considered in a single simulation becomes a vector of object features (x_n), and the value of the objective function obtained for it may be interpreted as a continuous object label (y_n).
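The preselection of modulation formats for candidate paths, together with the constraints placed on an ATD vector in Sect. 3, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the crosstalk assessment of [7] is replaced by a user-supplied callback `xt_ok` (by default every format passes), and the distances used in the example are placeholder numbers rather than values from the paper.

```python
from typing import Callable, Optional, Sequence

# Formats indexed m = 1..4: increasing reach, decreasing spectral efficiency.
FORMATS = ("8-QAM", "16-QAM", "QPSK", "BPSK")


def is_valid_atd(atd: Sequence[float], mtd: Sequence[float]) -> bool:
    """Check atd = (atd_1, atd_2, atd_3) against mtd = (mtd_1, mtd_2, mtd_3).

    Elements must be positive, bounded by the MTDs and strictly increasing.
    """
    if len(atd) != 3 or len(mtd) != 3:
        return False
    bounded = all(0 < a <= m for a, m in zip(atd, mtd))
    return bounded and atd[0] < atd[1] < atd[2]


def select_format(
    path_km: float,
    atd: Sequence[float],
    xt_ok: Callable[[int], bool] = lambda m: True,
) -> Optional[str]:
    """Return the most spectrally efficient feasible format, or None.

    `xt_ok(m)` is a stand-in for the crosstalk check of [7] (assumption).
    BPSK (m = 4) has no distance limit, as stated in the text.
    """
    limits = list(atd) + [float("inf")]
    for m, limit in enumerate(limits, start=1):
        if limit >= path_km and xt_ok(m):
            return FORMATS[m - 1]
    return None  # crosstalk too high for every format: demand is rejected


# Placeholder distances in km, chosen only for illustration.
example_atd = (400.0, 800.0, 1600.0)
example_choice = select_format(500.0, example_atd)  # "16-QAM"
```

Since candidate path lengths are known in advance, such a rule can be evaluated once per path and ATD vector before the simulation starts; only the crosstalk check depends on the current network state.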
Repeated simulation for randomly generated parameters makes it possible to generate a data set (X) supplemented with a label vector (y). Based on such a set, a supervised machine learning algorithm can therefore gain a generalization ability that allows for precise estimation of the simulation result based on its earlier runs on random input values.

A typical pattern recognition experiment is based on an appropriate division of the dataset into training and testing sets, in a way that guarantees their separation (most often using cross-validation), avoids the problem of data peeking, and provides a sufficient number of repetitions of the validation process to allow proper statistical testing of hypotheses about mutual model dependencies. For the needs of the proposal contained in this paper, the usual 5-fold cross-validation was adopted, which calculates the value of the R² metric for each loop of the experiment.

Having constructed the regression model, we are able to predict the BBP value for a sample ATD vector. Please note that the time required for a single prediction is significantly shorter than the time required to simulate dynamic RSSA. The last step of our optimization procedure is to find ATD*, the vector providing the lowest estimated BBP value. To this end, we use the Monte Carlo method with a number of guesses provided by the user.

The RSSA problem was solved for two network topologies: DT12 (12 nodes, 36 links) and Euro28 (28 nodes, 82 links). They model the Deutsche Telekom (German national) network and a European network, respectively. Each physical link comprises 7 cores, and each core offers 320 frequency slices of 12.5 GHz width. We use the same physical network assumptions, XT levels and XT assessments as in [7]. Traffic demands have randomly generated end nodes and bitrates uniformly distributed between 50 Gbps and 1 Tbps, with a granularity of 50 Gbps. Their arrivals follow a Poisson process with an average arrival rate of λ demands per time unit.
The demand duration is generated according to a negative exponential distribution with an average of 1/μ. The offered traffic load is λ/μ normalized traffic units (NTUs). For each testing scenario, we simulate the arrival of 10^6 demands. Four modulations are available (8-QAM, 16-QAM, QPSK, BPSK), and we use the same modulation parameters as in [7]. For each topology we have generated 9 different datasets, each consisting of 1000 samples of the ATD vector and the corresponding BBP. The datasets differ in the XT coefficient (μ = 1 · 10^-9 indicated as "xt1", μ = 2 · 10^-9 indicated as "xt2"; for more details we refer to [7]) and in the network link scaling factor (the multiplier used to scale the lengths of links in order to evaluate whether different lengths of routing paths influence the performance of the proposed approach). For DT12 we use the following scaling factors: 0.4, 0.6, 0.8, ..., 2.0. For Euro28 the values are as follows: 0.104, 0.156, 0.208, 0.260, 0.312, 0.364, 0.416, 0.468, 0.520. We indicate them as "Sx.xxx", where x.xxx refers to the scaling factor value. Using these datasets we can evaluate whether the XT coefficient (i.e., the level of vulnerability to XT effects) and/or the average link length influence the performance of the optimization approach.

The experimental environment for the construction of predictive models, including the implementation of the proposed processing method, was implemented in Python, following the guidelines of the state-of-the-art programming interface of the scikit-learn library [12]. Statistical dependency assessment metrics for paired tests were calculated using the Wilcoxon test, as implemented in the scipy module. Each of the individual experiments was evaluated by the R² score, a typical quality assessment metric for regression problems. The full source code, supplemented with the employed datasets, is publicly available in a git repository.
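The final step of the procedure, the Monte Carlo search over ATD vectors using an already trained predictor, can be sketched as follows. This is an illustrative sketch and not the authors' implementation: `predict_bbp` stands for any callable wrapping a fitted regression model, uniform sampling with rejection of invalid vectors is an assumption about how guesses are drawn, and all names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


def monte_carlo_search(
    predict_bbp: Callable[[Sequence[float]], float],
    mtd: Sequence[float],
    n_guesses: int,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Return the sampled ATD vector with the lowest predicted BBP.

    `mtd` holds the upper bounds (mtd_1, mtd_2, mtd_3). Candidates are
    drawn uniformly below these bounds (assumption); draws violating
    0 < atd_1 < atd_2 < atd_3 are discarded and do not count as guesses.
    """
    if n_guesses < 1:
        raise ValueError("n_guesses must be at least 1")
    rng = random.Random(seed)
    best_atd: List[float] = []
    best_bbp = float("inf")
    accepted = 0
    while accepted < n_guesses:
        atd = [rng.uniform(0.0, limit) for limit in mtd]
        if not (0.0 < atd[0] < atd[1] < atd[2]):
            continue  # invalid ordering, draw again
        accepted += 1
        bbp = predict_bbp(atd)
        if bbp < best_bbp:
            best_atd, best_bbp = atd, bbp
    return best_atd, best_bbp


def toy_predictor(atd: Sequence[float]) -> float:
    """Made-up smooth surrogate used only to exercise the search."""
    target = (300.0, 900.0, 2500.0)
    return sum(((a - t) / 1000.0) ** 2 for a, t in zip(atd, target))
```

With a scikit-learn style regressor `model`, the predictor could be passed as `lambda atd: float(model.predict([atd])[0])`. Because each guess costs only one model prediction rather than a full simulation, the number of guesses can be made large at little expense.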
Five simple recognition models were selected as the base experimental estimators:

- KNR: k-Nearest Neighbors regressor with five neighbors, a leaf size of 30 and the Euclidean metric, expressed as the Minkowski distance with p = 2,
- DKNR: KNR regressor weighted by the distance from the closest patterns,
- MLP: a Multilayer Perceptron with one hidden layer of one hundred neurons, with the ReLU activation function and the Adam optimizer,
- DTR: CART tree with the MSE split criterion,
- LIN: Linear Regression algorithm.

In this section we evaluate the performance of the proposed optimization approach. To this end, we conduct three experiments. Experiment 1 focuses on the number of patterns required to construct a reliable prediction model. Experiment 2 assesses the statistical dependence of the built models. Finally, Experiment 3 verifies the efficiency of the proposed approach as a function of the number of guesses in the Monte Carlo search.

The first experiment carried out as part of the approach evaluation is designed to verify how many patterns, and thus how many repetitions of simulations, must be passed to the individual regression algorithms to allow the construction of a reliable prediction model. The tests were carried out on all five considered regressors in two stages. First, the range from 10 to 100 patterns was analyzed, and second, the range from 100 to 1000 patterns. It is important to note that, due to the chosen approach to cross-validation, in each case the model is built on 80% of the available objects. The analysis was carried out independently on all available data sets, and due to the non-deterministic nature of sampling the available patterns, its results were additionally stabilized by repeating the choice of the object subset five times. In order to allow proper observations, the results were averaged separately for each of the two topologies.
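The five base estimators and the protocol of Experiment 1 can be sketched with scikit-learn as follows. This is a hedged sketch rather than the authors' code: the hyperparameters follow the description above where it is explicit, while the random seeds, the helper names and the way subsets are drawn are assumptions. The decision tree relies on the library default split criterion, which is the mean squared error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor


def make_estimators(seed=0):
    """Fresh instances of the five base regressors named in the text."""
    return {
        "KNR": KNeighborsRegressor(n_neighbors=5, leaf_size=30,
                                   metric="minkowski", p=2),
        "DKNR": KNeighborsRegressor(n_neighbors=5, leaf_size=30,
                                    metric="minkowski", p=2,
                                    weights="distance"),
        "MLP": MLPRegressor(hidden_layer_sizes=(100,), activation="relu",
                            solver="adam", random_state=seed),
        # Default criterion of the tree is the mean squared error.
        "DTR": DecisionTreeRegressor(random_state=seed),
        "LIN": LinearRegression(),
    }


def r2_vs_patterns(X, y, sizes, repeats=5, seed=0):
    """Mean 5-fold R2 per estimator for each training-pool size.

    For every size n (5 <= n <= len(X)), a random subset of n patterns is
    drawn `repeats` times and scored with 5-fold cross-validation, so
    each model is fitted on 80% of the drawn patterns.
    """
    rng = np.random.default_rng(seed)
    names = list(make_estimators())
    curves = {name: [] for name in names}
    for n in sizes:
        scores = {name: [] for name in names}
        for _ in range(repeats):
            idx = rng.choice(len(X), size=n, replace=False)
            cv = KFold(n_splits=5, shuffle=True,
                       random_state=int(rng.integers(2**31)))
            for name, est in make_estimators(seed).items():
                fold_r2 = cross_val_score(est, X[idx], y[idx],
                                          cv=cv, scoring="r2")
                scores[name].append(fold_r2.mean())
        for name in names:
            curves[name].append(float(np.mean(scores[name])))
    return curves
```

Here `X` would hold one ATD vector per row and `y` the BBP obtained from the corresponding simulation; calling `r2_vs_patterns(X, y, sizes=range(100, 1001, 100))` would give one quality curve per estimator for the second stage of the experiment.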
Plots for the range from 100 to 1000 patterns additionally mark the standard deviation of the R² metric obtained within each topology and show only R² values from 0.8 upward.

The results averaged within the individual topologies are presented in Figs. 2 and 3. For the DT12 topology, the MLP and DTR algorithms are competitively the best models, both in terms of the dynamics of the relationship between the number of patterns and the regression quality and in terms of the overall regression quality. Linear Regression clearly falls behind the rest. Another clear observation is the saturation of the models, understood as approaching their maximum predictive ability. With around 100 patterns in the data set, the best algorithms already achieve a quality of about 0.8, and with 600 patterns they stabilize around 0.95. The relationship between the quality of each recognition algorithm and the number of patterns takes the form of a logarithmic curve in which, after fast initial growth, each subsequent object gives less and less potential for improving the quality of prediction. This suggests that it is not necessary to carry out further simulations to extend the training set, because doing so will not significantly affect the predictive quality of the developed model.

Very similar observations may be made for the Euro28 topology, noting, however, that it seems to be a simpler problem, allowing the maximum predictive capacity of the model to be reached faster. It is also worth noting that the standard deviation of the results obtained by MLP is smaller, which may indicate potentially greater stability of the model achieved by such a solution.

The second experiment extends the research of Experiment 1 by assessing the statistical dependence of models built on the full datasets, consisting of a thousand samples in each case. The results achieved are summarized in Tables 1a and b.
As may be seen, for the DT12 topology, the LIN algorithm clearly deviates negatively from the other methods, being a worse solution than any of the others in absolutely every case, which leads to the conclusion that we should completely reject it from consideration as a base for a stable recognition model. Algorithms based on neighborhood (KNR and DKNR) are in the middle of the ranking, in most cases statistically giving way to MLP and DTR, which would also suggest departing from them in the construction of the final model. The statistically best solutions in this case, almost equally, are MLP and DTR.

For the Euro28 topology, the results are similar when it comes to the LIN, KNR and DKNR approaches. A significant difference, however, may be seen in the results of DTR, which in one case turns out to be the worst in the ranking, and in many cases is significantly worse than MLP. These observations suggest that the final model for the purposes of optimization should lean towards the application of neural networks.

Importantly, the highest-quality prediction does not necessarily mean the best optimization. It is one of the very important factors, but not the only one. It is also necessary to be aware of the shape of the decision function. For this purpose, the research was supplemented with the visualizations contained in Fig. 4.

Algorithms based on neighborhood (KNR, DKNR) and decision trees (DTR) are characterized by a discrete decision boundary, which in a visualization resembles a coarsely quantized image. In the case of an ensemble model, stabilized by cross-validation, actions are taken to reduce this property in order to develop as continuous a boundary as possible.
As may be seen in the illustrations, compensation occurs, although in the case of KNR and DKNR it leads to some disturbances in the decision boundary (interpreted as thresholding the predicted label value), and in the case of DTR, despite the general correctness of the decisions made, it generates image artifacts. Such a model may still retain high predictive ability, but it has too strong a tendency to overfit and leads to insufficient continuity of the optimized function for effective optimization.

Clear decision boundaries are produced by both the LIN and MLP approaches. However, it is necessary to reject LIN from processing due to the linear nature of its prediction, which (i) in each optimization will lead to the selection of an extreme value of the analyzed range and (ii) is not compatible with the distribution of the explained variable and must have the largest error in each of the optima.

Summing up the observations of Experiments 1 and 2, the MLP algorithm was chosen as the base model for the optimization task. It is characterized by (i) the statistically best predictive ability among the analyzed methods and (ii) the clearest decision function from the perspective of the optimization task.

The last experiment focuses on finding the best ATD vector based on the constructed regression model. To this end, we use the Monte Carlo method with different numbers of guesses. Tables 2 and 3 present the obtained results as a function of the number of guesses, which changes from 10^1 up to 10^9. The quality of the results increases with the number of guesses up to some threshold value. Beyond it, the results do not change at all or change only slightly. According to the presented values, the Monte Carlo method applied with 10^3 guesses provides satisfactory results. We therefore recommend that value for further experiments.

This work has considered the topic of employing pattern recognition methods to support the SS-FON optimization process.
For a wide pool of generated cases, covering two real network topologies, the effectiveness of solutions implemented by five different, typical regression methods was analyzed, starting from Linear Regression and ending with neural networks. The conducted experimental analysis shows, with high probability obtained through proper statistical validation, that MLP has the greatest potential in this type of solution. Even with a relatively small pool of input simulations, which constitute a data set for learning purposes that is interpretable in the space of both optimization and machine learning problems, simple networks of this type achieve both high-quality prediction, measured by the R² metric, and a continuous decision space, creating the potential for conducting optimization.

Basing the model on the stabilization realized by an ensemble of estimators additionally makes it possible to reduce the influence of noise on optimization, which, in state-of-the-art optimization methods, could show a tendency to select invalid optima, burdened by the non-deterministic character of the simulator.

Further research, developing the ideas presented in this article, will focus on generalizing the presented model to a wider pool of network optimization problems.
[1] High-capacity transmission over multi-core fibers
[2] A comprehensive survey on machine learning for networking: evolution, applications and research opportunities
[3] Visual Networking Index: Forecast and Trends
[4] Elastic optical networking: a new dawn for the optical layer
[5] On the efficient dynamic routing in spectrally-spatially flexible optical networks
[6] On the complexity of RSSA of anycast demands in spectrally-spatially flexible optical networks
[7] Machine learning assisted optimization of dynamic crosstalk-aware spectrally-spatially flexible optical networks
[8] Survey of resource allocation schemes and algorithms in spectrally-spatially flexible optical networking
[9] Data stream classification using active learned neural networks
[10] Artificial intelligence (AI) methods in optical networks: a comprehensive survey
[11] An overview on application of machine learning techniques in optical networks
[12] Scikit-learn: machine learning in Python
[13] Machine learning for network automation: overview, architecture, and applications
[14] Survey and evaluation of space division multiplexing: from technologies to optical networks
[15] Modeling and Optimization of Cloud-Ready and Content-Oriented Networks. SSDC
[16] Classifier selection for highly imbalanced data streams with Minority Driven Ensemble