Combination of multiple diagnosis systems in Self-Healing networks

Expert Systems With Applications 64 (2016) 56–68

David Palacios, Emil J. Khatib, Raquel Barco ∗

Communications Engineering Dept., University of Málaga, 29071, Málaga, Spain

Article history: Received 8 January 2016. Revised 20 July 2016. Accepted 21 July 2016. Available online 22 July 2016.

Keywords: LTE; Self-healing; Root cause analysis; Self-organizing networks (SON); Hybrid ensemble classifier; Automatic fault identification

Abstract

The Self-Organizing Networks (SON) paradigm proposes a set of functions to automate network management in mobile communication networks. Within SON, the purpose of Self-Healing is to detect cells with service degradation, diagnose the fault cause that affects them, rapidly compensate the problem with the support of neighboring cells and repair the network by performing some recovery actions. The diagnosis phase can be designed as a classifier. In this context, hybrid ensembles of classifiers enhance the diagnosis performance of expert systems of different kinds by combining their outputs. In this paper, a novel scheme of hybrid ensemble of classifiers is proposed as a two-step procedure: a modeling stage of the baseline classifiers and an application stage, in which the combination of partial diagnoses is actually performed. The use of statistical models of the baseline classifiers allows an immediate ensemble diagnosis without running and querying them individually, thus resulting in a very low computational cost in the execution stage. Results show that the proposed method performs significantly better than its standalone components in terms of diagnosis error rate, using both simulated data and cases from a live LTE network.
Furthermore, this method relies on concepts that are not tied to a particular mobile communication technology, so it can be applied both to well-established cellular networks, like UMTS, and to recent and forthcoming technologies, like LTE-A and 5G.

0957-4174/© 2016 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

∗ Corresponding author. Fax: +34952132027. E-mail addresses: dpc@ic.uma.es (D. Palacios), emil@uma.es (E.J. Khatib), rbarco@uma.es (R. Barco).

http://dx.doi.org/10.1016/j.eswa.2016.07.030

1. Introduction

The growing demand for mobile services with ever-increasing bandwidth and the expanding number of users make it necessary to deploy new and more efficient mobile communication networks, such as Long-Term Evolution (LTE), over the existing ones (GSM, UMTS). However, the complexity of this heterogeneous scenario, which comprises several Radio Access Technologies (RAT), requires challenging maintenance and complex operational tasks. Mobile operators need to offer new demanding services without increasing either operational expenditures (OPEX) or capital expenditures (CAPEX). In order to deal with that problem, the 3rd Generation Partnership Project (3GPP) has proposed Self-Organizing Networks (SON) (3GPP (d)) as networks that include mechanisms to automate network procedures in order to help mobile operators with their management work, providing significant cost reduction. This automation of network management will also be essential in upcoming technologies, like LTE-Advanced and 5G (3GPP (b)).

SON comprises three groups of functions: Self-Configuration, Self-Optimization and Self-Healing.
The aim of the latter is to autonomously solve the problems that a cell with service degradation or outage could present (3GPP (e); Barco, Lázaro, and Muñoz (2012)). This is done by means of four stages:

• Fault Detection: Responsible for finding cells with problems, i.e., cells experiencing service outage or suffering an unacceptable service degradation.
• Diagnosis of the fault cause: In this step, the actions to be performed in order to recover the system from the degradation it is suffering are decided. This step can be divided into two substages: Fault Identification, that is, identifying the fault cause based on observable symptoms such as Key Performance Indicators (KPI) and alarms; and Action Identification, which corresponds to deciding what tasks to perform to restore the normal performance of the system.
• Fault recovery: In this step, the proposed solutions are carried out.
• Fault compensation: Since diagnosing the fault and repairing it normally takes some time, compensation aims to diminish the impact of the fault by changing parameters in neighboring cells.

Fig. 1. Scheme of an automatic diagnosis system.
This paper focuses on the diagnosis task, in particular on fault identification, also called root cause analysis. Once a problem has been detected in a cell, root cause analysis identifies the fault cause given the values of performance indicators, alarms, counters, mobile traces, etc. In the context of cellular networks, some diagnosis systems have been recently proposed. Barco, Díez, Wille, and Lázaro (2009) and Barco, Lázaro, Wille, Díez, and Patel (2009) proposed diagnosis systems based on Bayesian Networks. Szilágyi and Nováczki (2012) used a scoring system in order to determine how well a specific case fits a diagnosis. Nováczki (2013) enhanced the previous system by adding profiling techniques. The method in Khatib, Barco, Gómez-Andrades, and Serrano (2015) was based on fuzzy logic and genetic algorithms. Gómez-Andrades, Muñoz, Serrano, and Barco (2016) proposed a diagnosis system based on Self-Organized Maps (SOM).

Each of the previous methods has its pros and cons. In practice, this makes the selection of the diagnosis technique cumbersome when the aim is to deploy an automatic diagnosis system in a real network. Furthermore, once the technique has been decided, e.g., fuzzy logic, operators normally design several standalone diagnosis models. This is because, firstly, different troubleshooting experts will build different models and, secondly, when models are learnt from historical cases, different training datasets will result in different models.

To cope with the limitations of standard classifying systems in terms of accuracy and dataset-dependent performance, ensembles of classifiers arose. Within these, homogeneous and heterogeneous (commonly known as hybrid) ensembles of classifiers may be found, where the former combine classifiers of the same kind and the latter combine different kinds of systems and datasets.
Homogeneous ensembles have been widely studied and, as of today, are still extensively used in different fields (Begum, Chakraborty, & Sarkar, 2015; Liu, Chen, Song, & Han, 2009; Shen & Chou, 2006; Wiezbicki & Ribeiro, 2016). In this paper, by contrast, a method for the generalized combination of multiple diagnosis systems based on a hybrid ensemble approach is proposed and tested in the context of cellular networks, which to the authors' knowledge is a research area still to be explored. The proposed work describes a method to gather, combine and use the knowledge held by any kind of expert system in any field that makes use of a classifying or diagnosis system. In this work, the proposed method is applied to fault cause diagnosis in cellular networks, where the expertise may be provided either by a human troubleshooting expert or by a database of cases assessed by automatic diagnosis systems. The proposed method allows combining diagnosis systems in a wide sense, being able to merge both several diagnosis models (expertise) and the tools used for their application (automatic diagnosis techniques) in the form of supervised or unsupervised classifying systems.

Up to now, hybrid ensembles of classifiers have mainly been based on a set of baseline systems which must first assess the cases under test and, consequently, provide partial diagnoses which are finally combined into a final decision using a majority vote scheme (Ciocarlie, Lindqvist, Nováczki, & Sanneck, 2013; Gandhi & Pandey, 2015; Wei et al., 2014). This procedure requires a relatively high number of diagnosis techniques to be run in the test stage and, therefore, a noticeable expenditure of computational and time resources. The proposed work, however, presents a method which allows combining the diagnoses that the standalone diagnosis systems would output for a case under test without actually needing them to be run, thus lightening the computational weight of the test stage.
The main contributions of this paper are:

• A method to combine any number and kind of different standalone classifiers as well as different sources of expert knowledge in order to get an enhanced performance compared to that of the base classifiers. In the context of troubleshooting in cellular networks this comprises the combination of several diagnosis models and techniques for the automatic diagnosis.
• A method to lighten the computational cost of the evaluation stage in hybrid ensembles of classifiers. This work proposes a scheme to model and emulate the behavior of every standalone classifier so that they need not be continuously queried before combining their partial diagnoses.

This paper is organized as follows. Section 2 presents the problem formulation. Section 3 introduces the proposed method for combining multiple baseline diagnosis systems. In Section 4 results are analyzed by means of both a network simulator and data from a live LTE network. In Section 5 the future lines of work are outlined. Finally, Section 6 summarizes the main conclusions.

2. Problem formulation

2.1. Root cause analysis in mobile communications networks

In the same way that a patient is diagnosed by a doctor based on the symptoms he shows, the status of a communications network may be diagnosed based on a set of performance indicators. This diagnosis task, also called root cause analysis or troubleshooting, is often carried out by human experts using their knowledge of the underlying relations between the observed indicators and the status of the network. However, the number of symptoms (counters, alarms, KPIs, call traces, etc.) and possible fault causes the expert has to deal with increases as networks grow in size and complexity, which makes this task very difficult and time consuming.

Furthermore, the current manual troubleshooting is a layered task, guided by a Trouble Ticket (TT) system.
In this problem solving system, a group of specialists first tries to diagnose and solve the problem by performing some simple checks. If they cannot find the root of the problem, it is escalated to a more specialized team (and so on), which performs a deeper study of the symptoms the case exhibits and resorts to field engineers in case on-site checks are needed.

As a response to this increasingly inefficient procedure, automatic diagnosis systems arose in an attempt to imitate the way troubleshooters act. Fig. 1 shows the basic scheme of a system for automatic diagnosis. It is composed of an automatic diagnosis technique and a diagnosis model. The first is an artificial intelligence system that outputs a diagnosis taking a set of symptoms (e.g., KPIs) from a test case as its input. The second represents the knowledge a human expert would have on the underlying relations between the symptoms and the fault causes and may take different forms depending on the diagnosis technique it is destined to work with. For example, a diagnosis model may consist of the parameters (e.g., prior probabilities and probability density functions) required by a given diagnosis technique (e.g., a Bayesian classifier) or a set of rules for other techniques (e.g., Case-Based Reasoning, CBR). As can be seen in this figure, the diagnosis model may be built from a set of training cases by means of a machine learning algorithm or by troubleshooting experts by gathering their knowledge. The proposed method aims to combine the knowledge acquired by any number and kind of diagnosis models and automatic diagnosis techniques in an attempt to reduce the errors in fault detection and diagnosis.

2.2. Automated diagnosis from the classification theory

A diagnosis system is a method that, given a set of indicators or symptoms (called a case hereafter), intends to infer the cause that provoked them. In this sense, a diagnosis system acts as a classifying system in which the attributes of the cases to be classified correspond to the symptoms of the case to be diagnosed, and the classes to be assigned correspond to the causes to be inferred. This issue has long been investigated in data mining theory (Wu et al., 2008), and many types of classifiers have been developed over the years in an attempt to extract the maximum information the cases under diagnosis could provide. However, no algorithm has so far proven to be clearly better than the rest for all kinds of input data. One reason for the increasing efforts in the related research is that the performance of a classifier normally depends on the nature and distribution of the data it has to work with. For this reason, the present paper focuses not only on combining different diagnosis models but also on offering the possibility to combine multiple classifiers in the form of automatic diagnosis techniques.

Let us assume we have a set of $M$ fault causes to diagnose and $R$ diagnosis systems (either diagnosis models or techniques) to combine, and that each of these systems can have a subset of these causes as its output, namely, $W^r$ for the system $r$. In this scenario, the set of causes a diagnosis system can identify may be different from one system to another. This can be seen in (1), where each row stands for a $W^r$ and the element $w_m^r$ stands for the $m$th fault cause, diagnosed by the $r$th system. According to this, each row may be different from another.

$$
\begin{bmatrix}
w_1^1 & \cdots & w_m^1 & \cdots & w_M^1 \\
\vdots & & \vdots & & \vdots \\
w_1^r & \cdots & w_m^r & \cdots & w_M^r \\
\vdots & & \vdots & & \vdots \\
w_1^R & \cdots & w_m^R & \cdots & w_M^R
\end{bmatrix}
\tag{1}
$$

In a diagnosis system, a case, $\mathbf{x}$, is characterized by its symptoms, $x_n$, where $\mathbf{x} = \{x_1, x_2, \ldots, x_N\}$, having a total of $N$ possible symptoms. However, each diagnosis system may consider only a subset of these symptoms, namely, $N^r$ for the diagnosis system $r$.

In the context of diagnosis systems for mobile communication networks, a case corresponds to an observation or measurement from the network; a symptom may be an event counter, a Key Performance Indicator (KPI), a call trace or an alarm; and the causes are seen as the network states, among which the normal state and several fault states may be distinguished. In this paper, some results from the theory of classifiers are used, extended and applied in this context in an attempt to combine the knowledge acquired by these $R$ diagnosis systems, developing a more reliable and accurate root cause analysis system for communication networks.

2.3. State-of-the-art in ensemble-based classification algorithms

This section aims to provide a brief survey of the most recently proposed ensemble-based systems, most of which have been used in classifying tasks in areas not related to mobile communications.

Ensembles of classifiers may nowadays be classified into homogeneous and heterogeneous (or hybrid). The first are those ensembles which put together instances of classifiers of the same kind, e.g., several k-Nearest Neighbor (kNN) classifiers. Conversely, in heterogeneous ensembles a set of classifiers of different kinds is put together, e.g., a kNN and a NN (Neural Network). This is the scope of the present work, as the latter also allow the combination of different sources of expert knowledge within a single enhanced diagnosis system.
One of the earliest works on ensemble methods proposed to partition the feature space (i.e., the vector space in which the features of the cases to be diagnosed are defined) and to assign each part to a different classifier which is supposed to be the best for this subset of cases (Dasarathy & Sheela, 1979). This idea has been widely explored and has given birth to the so-called mixture of experts algorithm (Jacobs, Jordan, Nowlan, & Hinton, 1991; Yuksel, Wilson, & Gader, 2012), which is the paradigm for the classifier-selection type of ensemble methods. Under this approach, only one classifier works at a time and its selection is determined by the partition the case under test belongs to.

Conversely, in classifier-fusion methods all classifiers are usually trained over the entire feature space. The classifier combination process involves merging the individual classifiers to obtain a system that outperforms the standalone classifiers. This is the basis for the widely used bagging and boosting predictors (Breiman, 1996; Freund & Schapire, 1997), AdaBoost being an example of the latter and one of the best known and most used classification algorithms nowadays. Classifier fusion methods can also be divided into those which work with classification labels only and those which make use of a continuous valued output for each classifier for every class. In this case, the outputs can be seen as the support an expert gives to a class in terms of the class-conditional posterior probabilities (Kuncheva, 2002).

Some examples of ensemble methods as enhanced systems for fault disclosure can be found in the literature with many different purposes. In Liu et al. (2009), a homogeneous ensemble of neural networks with cross-validation for fault diagnosis of analog circuits with tolerance is proposed. In Shen and Chou (2006), several kNN classifiers are put together in a majority-vote ensemble to classify the patterns that several proteins may exhibit when folded. In Begum et al.
(2015), a homogeneous ensemble of SVMs (Support Vector Machines) is proposed to identify different types of cancer from a genetic analysis. Wiezbicki and Ribeiro (2016) propose a homogeneous ensemble of neural networks, combined by means of a weighted majority vote, in a sensor network for the classification of gases.

Fig. 2. Proposed method for combining diagnosis systems. Stage 1: Construction of the behavior models.

Regarding the most recent works on hybrid ensembles of classifiers, in Wei et al. (2014) $n$ ensembles are made up by combining $3n$ baseline classifiers. Each ensemble comprises three supervised methods: a decision tree, a support vector machine and a kNN algorithm. In each ensemble, the diagnoses from these baseline classifiers are fused applying a weighted majority vote, where each vote is weighted by the performance each individual classifier shows during a prior training stage. Then, the $n$ resulting diagnoses are combined into a final diagnosis applying a non-weighted majority vote. In this case, all the baseline classifiers must be supervised diagnosis systems, as their performance must be previously known in order to weigh their votes in the first stage. Unlike this, the proposed method allows the user to combine any kind of diagnosis system, either supervised or unsupervised. Even more importantly, regarding the operation stage, in Wei et al. (2014), whenever a new case is to be diagnosed it must pass through two steps, one of them made up of $3n$ systems which must first each output a diagnosis, resulting in a high computational cost. The method in the proposed work, however, needs the test cases to be assessed by only one step, which, furthermore, only consists of some algebraic calculations.
Once the training stage has been performed, new cases will be diagnosed at a minimum computational cost.

As for Gandhi and Pandey (2015), a two-step method is again proposed. The first step consists of a learning stage for the base classifiers and the second step consists of a majority vote-based combining stage. Again, and similarly to Wei et al. (2014), every baseline classifier is required to first diagnose every new case in the application step, which results in a high computational cost compared to that of the application (test) step in the proposed method.

In the context of cellular networks, Ciocarlie et al. (2013) propose a hybrid ensemble of classifiers to detect anomalies in the performance indicators of a cell. This work is focused on fault detection. Unlike this, the proposed work does not just find a performance degradation, but identifies the fault cause behind it. Regarding its implementation, this method relies on the use of a pool of models. New models are added to this pool whenever a change in the configuration parameters of the network takes place. A number of $N_{CM} \times (N_{\mathrm{univariate}} \times N_{UKPI} + N_{\mathrm{multivariate}} \times N_{GKPI})$ models, and thus, instances of automatic techniques, must be assessed for every single new case under test. In this expression, $N_{CM}$ stands for the number of sets of network configuration parameters considered; $N_{\mathrm{univariate}}$ and $N_{UKPI}$ stand for the number of univariate techniques considered and the number of KPIs acting as their input in each model; and $N_{\mathrm{multivariate}}$ and $N_{GKPI}$ stand for the number of multivariate techniques used and the number of groups of KPIs considered in each model. As in Ciocarlie et al. (2013), in Wei et al. (2014) and Gandhi and Pandey (2015) a high number of baseline classifiers must first be queried before an ensemble decision can be made. And again, similarly to Ciocarlie et al. (2013), in Wei et al.
(2014) and Gandhi and Pandey (2015) all the partial decisions meet at a combining stage based on a weighted majority vote.

To the authors' knowledge, no ensemble method for fault cause diagnosis in cellular networks has been proposed as of today.

3. Method for combining multiple automatic diagnosis systems

In this section, a method for combining the knowledge acquired by any number and kind of standalone automatic diagnosis systems by means of a classifier-fusion scheme is proposed. The proposed method consists of two stages: the construction of the behavior models of the automatic diagnosis systems (Section 3.1) and the combination of these models in order to make a more accurate diagnosis of the cases from a testing set (Section 3.2). This can be seen in Figs. 2 and 5. Before this method can be applied, two sets of $N$-dimensional cases must be distinguished: the modeling set and the testing set, where each of these $N$ dimensions stands for a working KPI. The modeling set will be used in the first stage and the testing set in the second.

3.1. Construction of the behavior models

The baseline diagnosis systems are to be combined by mixing their models of behavior, which need to be extracted first.

Once the diagnosis model of each diagnosis system has been built, either from training cases via a machine learning method (Khatib, Barco, Gómez-Andrades, Muñoz, & Serrano, 2015) or from the experts' knowledge (Gómez-Andrades et al., 2016), each diagnosis system can start the classification (Fig. 1). In this stage, every case from the modeling set is diagnosed by the $R$ systems. That is, each system assigns to each case one of the $M$ possible fault causes; in particular, one of the causes that system can discern. This can be seen in Fig. 2, where the case $\mathbf{x}$ acts as the input for the $R$ systems and, in turn, they assign it $R$ diagnosis labels. If the system $r$ diagnoses the case $\mathbf{x}$ with the cause $m$, this case receives the label $w_m^{r*}$.
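The labeling step just described can be sketched in a few lines. This is only an illustration under stated assumptions: the two rule-based "systems", their thresholds, the cause names and the toy modeling set are all hypothetical stand-ins for real baseline classifiers and real KPI data. Grouping the cases by the label each system assigns yields, for every system $r$, the subsets associated with the causes it actually diagnosed.

```python
from collections import defaultdict


def partition_modeling_set(cases, systems):
    """Label every modeling case with each baseline system and group by label.

    cases   : list of KPI vectors (toy data, see below)
    systems : list of callables, one per baseline diagnosis system; each one
              returns a cause label for a case (stand-ins for real classifiers)
    Returns one dict per system mapping cause -> list of cases. The keys of
    the r-th dict are the causes that system r actually diagnosed.
    """
    partitions = []
    for diagnose in systems:
        subsets = defaultdict(list)
        for x in cases:
            subsets[diagnose(x)].append(x)
        partitions.append(dict(subsets))
    return partitions


# Hypothetical rule-based systems over cases (retainability, rsrp_dbm).
# The thresholds are invented for illustration only.
def system_a(x):
    return "normal" if x[0] > 0.98 else "coverage_hole"


def system_b(x):
    if x[1] < -110:
        return "coverage_hole"
    return "normal" if x[0] > 0.97 else "interference"


modeling_set = [(0.99, -90.0), (0.95, -115.0), (0.96, -95.0), (0.995, -85.0)]
partitions = partition_modeling_set(modeling_set, [system_a, system_b])

# Causes identified by at least one system (their count plays the role of M*).
identified = set().union(*(p.keys() for p in partitions))
```

Note that the two toy systems disagree on the third case, so the same modeling set is split differently by each of them, which is the situation the behavior models are meant to capture.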
In this way, each diagnosis system makes a different partition of the modeling set into $|W^{r*}|$ disjoint subsets, whose maximum is $|W^r|$, that is, the number of causes that system considers (Fig. 3), where $|A|$ is the number of elements in the set $A$. This finally leads to the identification of $M^*$ different causes, $M^*$ being the number of causes in the union of $W^{r*}$ over $r$, with $M^* \leq M$. According to this, a new matrix may be written from (1), substituting every row (i.e., every $W^r$) by its corresponding $W^{r*}$. Each row would represent one of the partitions of the modeling set and each column would represent how a cause "is seen" by each diagnosis system regarding the KPIs that the cases belonging to that $w_m^r$ exhibit.

Fig. 3. Modeling set divided into different subsets by means of two different partitions: on the left, the partition the first diagnosis system makes, having $W^1 = \{w_1^1, w_2^1, w_3^1\}$ with $|W^1| = |W^{1*}| = 3$; on the right, the partition the diagnosis system $R$ makes, having $W^R = \{w_1^R, w_2^R, w_3^R\}$ and $W^{R*} = \{w_1^{R*}, w_2^{R*}\}$. In this last case, the diagnosis system $R$ only diagnosed the causes 1 and 2, although it is also able to identify the fault cause 3.

It should be noticed that each of these $M^*$ subsets contains a number of $|N^r|$-dimensional cases. At this point, the behavior of the diagnosis system $r$ is modeled through the estimation of the statistical distributions of the $N^r$ KPIs for the cases belonging to $W^{r*}$. That is, the behavior of each diagnosis system is modeled by means of $N^r \times M^*$ PDFs. The estimated statistical distribution of the $n$th KPI for the subset of cases diagnosed as $m$ by the diagnosis system $r$ is $p(x_n \mid w_m^{r*})$. The choice of the PDF that estimates each one of these distributions is made according to the maximum likelihood (ML) criterion. To do so, some families of PDFs are considered in the fitting procedure (Table 1). In a first step, the distribution of the KPI $x_n$ from the cases labeled as $w_m^{r*}$ is fitted according to the ML criterion with each one of the considered families of PDFs. This results in a set of candidates for estimating its distribution. These PDFs are then sorted by their likelihood and the one with the maximum value is chosen as the estimation for the KPI.

Table 1. Families of PDFs considered for the estimation of $p(x_n \mid w_m^{r*})$.

| Distribution | PDF | Parameters |
| --- | --- | --- |
| Beta | $\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, x^{a-1}(1-x)^{b-1}$ | $a, b$ |
| Normal | $\frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right)$ | $\mu, \sigma$ |
| Log-normal | $\frac{1}{x\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{\ln x-\mu}{\sigma}\right)^2\right)$ | $\mu, \sigma$ |
| Exponential | $\lambda \exp(-\lambda x)$ | $\lambda$ |
| Gen. extreme value | $\frac{1}{\sigma}\, t(x)^{\xi+1} \exp(-t(x))$, with $t(x) = \left(1 + \xi\,\frac{x-\mu}{\sigma}\right)^{-1/\xi}$ if $\xi \neq 0$ and $t(x) = \exp(-(x-\mu)/\sigma)$ if $\xi = 0$ | $\mu, \sigma, \xi$ |
| T-location | $\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\pi\nu}\,\sigma} \left(1 + \frac{1}{\nu}\left(\frac{x-\mu}{\sigma}\right)^2\right)^{-\frac{\nu+1}{2}}$ | $\nu, \mu, \sigma$ |
| Nakagami | $\frac{2m^m}{\Gamma(m)\,\Omega^m}\, x^{2m-1} \exp\left(-\frac{m}{\Omega}x^2\right)$ | $m, \Omega$ |
| Gamma | $\frac{1}{\Gamma(k)\,\theta^k}\, x^{k-1} \exp\left(-\frac{x}{\theta}\right)$ | $k, \theta$ |
| Logistic | $\frac{\exp\left(-\frac{x-\mu}{s}\right)}{s\left(1+\exp\left(-\frac{x-\mu}{s}\right)\right)^2}$ | $\mu, s$ |
| Log-logistic | $\frac{(\beta/\alpha)(x/\alpha)^{\beta-1}}{\left(1+(x/\alpha)^{\beta}\right)^2}$ | $\alpha, \beta$ |
| Weibull | $\frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} \exp\left(-\left(\frac{x}{\lambda}\right)^k\right)$ | $\lambda, k$ |
| Rayleigh | $\frac{x}{\sigma^2} \exp\left(-\frac{1}{2}\left(\frac{x}{\sigma}\right)^2\right)$ | $\sigma$ |
| Rice | $\frac{x}{\sigma^2} \exp\left(-\frac{x^2+\nu^2}{2\sigma^2}\right) I_0\left(\frac{x\nu}{\sigma^2}\right)$ | $\nu, \sigma$ |

The reason for considering these families of PDFs is to get the best estimation of the distribution of the KPI $x_n$ given that it belongs to $w_m^{r*}$. Fig. 4a shows a normalized histogram of the KPI "95th percentile RSRP" from the cases labeled as $w_m^{r*}$. In this figure, two families of PDFs have been used in an attempt to fit the underlying histogram: the normal and the generalized extreme value.
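The family-selection step can be illustrated with a minimal sketch. It is restricted, as an assumption made here for brevity, to four of the families in Table 1 whose maximum likelihood estimates have a closed form (normal, log-normal, exponential and Rayleigh); the remaining families need numerical optimization and are left out. The function names and the synthetic KPI samples are hypothetical.

```python
import math


def fit_normal(data):
    """Closed-form ML fit of a normal PDF; returns (params, log-likelihood)."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    loglik = -n * (math.log(sigma) + 0.5 * math.log(2 * math.pi) + 0.5)
    return {"mu": mu, "sigma": sigma}, loglik


def fit_lognormal(data):
    """ML fit of a log-normal PDF: a normal fit on ln(x) plus the Jacobian term."""
    logs = [math.log(x) for x in data]
    params, loglik = fit_normal(logs)
    return params, loglik - sum(logs)


def fit_exponential(data):
    lam = len(data) / sum(data)
    return {"lambda": lam}, len(data) * math.log(lam) - lam * sum(data)


def fit_rayleigh(data):
    n = len(data)
    sigma2 = sum(x * x for x in data) / (2 * n)
    loglik = sum(math.log(x) for x in data) - n * math.log(sigma2) - n
    return {"sigma": math.sqrt(sigma2)}, loglik


# family name -> (fit function, whether strictly positive samples are required)
FAMILIES = {
    "normal": (fit_normal, False),
    "log-normal": (fit_lognormal, True),
    "exponential": (fit_exponential, True),
    "rayleigh": (fit_rayleigh, True),
}


def select_pdf(data):
    """Fit every admissible family and keep the one with maximum likelihood."""
    positive = all(x > 0 for x in data)
    fits = {name: fit(data) for name, (fit, needs_pos) in FAMILIES.items()
            if positive or not needs_pos}
    ranking = sorted(((name, ll) for name, (_, ll) in fits.items()),
                     key=lambda item: item[1], reverse=True)
    best = ranking[0][0]
    return best, fits[best][0], ranking


# Synthetic KPI samples: evenly spaced quantiles of a unit-rate exponential.
n = 200
kpi_samples = [-math.log(1 - (i - 0.5) / n) for i in range(1, n + 1)]
best_name, best_params, ranking = select_pdf(kpi_samples)
```

In the method, one such selection would be run per KPI, per cause and per diagnosis system. Comparing raw likelihoods follows the criterion stated in the text; it does not penalize families with more parameters.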
As Fig. 4a shows, the generalized extreme value PDF fits the histogram better than the normal PDF, resulting in a higher value in a likelihood-ratio test.

While some KPIs are counters and do not have an upper limit, others are inherently bounded, as they are defined as a ratio. Normally, the beta PDF is used to fit these KPIs, usually limited between zero and one (Barco, Lazaro, Diez, & Wille, 2008). KPIs like the retainability or the accessibility often reach these extreme values, making the resulting fitted beta present asymptotes at these values. To avoid this issue, the beta function used, $\beta'$, is slightly different from that in Table 1, $\beta$. In this case,

$$
\beta'(x) = (1 - P_0 - P_1)\,\beta(x) + \frac{P_0}{h_\beta}\,\delta(x) + \frac{P_1}{h_\beta}\,\delta(x - 1),
\tag{2}
$$

where $\beta(x)$ stands for the distribution fitted to a set with no extreme values; $P_0$ and $P_1$ stand for the relative frequency of cases with value 0 and 1, respectively; $\delta$ stands for the Dirac delta and $h_\beta$ stands for the step (the resolution) used when computing $\beta'$. This can be seen in Fig. 4b, where a normalized histogram for the KPI retainability is shown.

3.2. Combination of behavior models

This stage uses the cases from the testing set. In the previous stage, the estimated functions have been seen as conditional probability density functions, that is, functions that express how the KPIs are distributed over the cases diagnosed with a given cause by a given system. However, this set of functions may be seen as likelihood functions by just changing the approach. From this point of view, the function depends on $w_m^{r*}$ given that an observation of the random variable $x_n$ (that is, the $n$th KPI) has taken place.

Now, assuming the KPIs are independent of each other, a joint probability function of $w_m^{r*}$, that is, $p(\mathbf{x} \mid w_m^{r*})$, may be written as

$$
p(\mathbf{x} \mid w_m^{r*}) = \prod_{n \in N^r} p(x_n \mid w_m^{r*}).
\tag{3}
$$

Given (3), and assuming that the prior probability of each cause, $P(w_m^{r*})$, is given by $\frac{|w_m^{r*}|}{|W^{r*}|}$, the a posteriori probability for a diagnosis system $r$ to diagnose a case with the cause $m$ given that its KPIs are $\mathbf{x}$ (i.e., $P(w_m^r \mid \mathbf{x})$) can be calculated by just applying Bayes' theorem. That is,

$$
P(w_m^r \mid \mathbf{x}) =
\begin{cases}
\dfrac{p(\mathbf{x} \mid w_m^{r*})\, P(w_m^{r*})}{\sum_{w_i^{r*} \in W^{r*}} p(\mathbf{x} \mid w_i^{r*})\, P(w_i^{r*})} & \text{if } P(w_m^{r*}) > 0 \\[2ex]
0 & \text{if } P(w_m^{r*}) = 0
\end{cases}
\tag{4}
$$

At this point, some diagnosis system may not have diagnosed a given cause, as seen in Fig. 3. In such a case, $P(w_m^{r*})$, and thus (4), would be equal to zero. In any case, $M \times R$ a posteriori probabilities may be distinguished. Fig. 5 shows this when a case $\mathbf{y}$ from the testing subset is to be diagnosed. As can be seen in this figure, the KPIs from the case $\mathbf{y}$ act as input values in the behavior models of the $R$ diagnosis systems, i.e., the probability functions $p(\mathbf{y} \mid w_m^{r*})$ for $w_m^{r*} \in W^{r*}$ and $r = 1, \ldots, R$. Then, the a posteriori probabilities $P(w_m^r \mid \mathbf{y})$ are computed using these together with $P(w_m^{r*})$ by means of Bayes' theorem.

Now, these $M \times R$ a posteriori probabilities together with the prior probabilities can be combined over $R$ using some algebraic functions, producing $M$ probabilities of the kind $P(w_m \mid \mathbf{y})_{\mathrm{Rule}_t}$ per function used, where $m$ again stands for the cause and $t$ is an index for the rule used in the combination, that is,

$$
P(w_m \mid \mathbf{y})_{\mathrm{Rule}_t} = f_{\mathrm{Rule}_t}\left(P(w_m^1 \mid \mathbf{y}), \ldots, P(w_m^R \mid \mathbf{y});\, P(w_m)\right),
\tag{5}
$$

where $P(w_m)$ is defined as the average of $P(w_m^{r*})$ over $r$.

Some rules for the combination of a posteriori probabilities given by several classifying systems are proposed in Kittler, Hatef, Duin, and Matas (1998) and studied further in Kuncheva (2002). In the first, those rules are derived from a maximum a posteriori (MAP) estimation in a multiple random variable scenario in an attempt to lighten the effort of computing several joint probability density functions.
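Equations (3) to (5) can be sketched as follows for two of the rules from Kittler et al. (1998), the product rule and the sum rule. This is a minimal illustration, not the authors' implementation: the behavior models are hypothetical normal PDFs, the KPI names, cause names, priors and test case are invented, and returning all zeros when the evidence vanishes is an assumption made here to avoid a division by zero.

```python
import math


def normal_pdf(mu, sigma):
    """Return a normal PDF as a callable (toy stand-in for a fitted PDF)."""
    def pdf(x):
        z = (x - mu) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return pdf


def posterior(case, model, priors):
    """Eqs. (3)-(4): a posteriori probability of each cause for one system.

    model  : dict cause -> dict KPI name -> PDF (only the KPIs this system uses)
    priors : dict cause -> prior probability of that cause for this system
    """
    joint = {}
    for cause, kpi_pdfs in model.items():
        # Eq. (3): product of per-KPI likelihoods (KPIs assumed independent).
        likelihood = math.prod(pdf(case[kpi]) for kpi, pdf in kpi_pdfs.items())
        joint[cause] = likelihood * priors[cause]
    evidence = sum(joint.values())
    if evidence == 0.0:  # assumption: no support for any cause
        return {cause: 0.0 for cause in joint}
    return {cause: value / evidence for cause, value in joint.items()}


def combine(posteriors, system_priors, rule):
    """Eq. (5) for the product and sum rules; the prior is averaged over systems."""
    R = len(posteriors)
    causes = set().union(*(p.keys() for p in posteriors))
    scores = {}
    for m in causes:
        prior = sum(pr.get(m, 0.0) for pr in system_priors) / R
        votes = [p.get(m, 0.0) for p in posteriors]  # 0 if never diagnosed
        if rule == "product":
            scores[m] = prior ** (-(R - 1)) * math.prod(votes)
        elif rule == "sum":
            scores[m] = (1 - R) * prior + sum(votes)
        else:
            raise ValueError("unsupported rule: " + rule)
    return scores


# Two hypothetical behavior models; system 2 only looks at one KPI.
model_1 = {"normal": {"rsrp": normal_pdf(-90, 5), "sinr": normal_pdf(15, 3)},
           "coverage_hole": {"rsrp": normal_pdf(-115, 5), "sinr": normal_pdf(3, 3)}}
model_2 = {"normal": {"rsrp": normal_pdf(-92, 6)},
           "coverage_hole": {"rsrp": normal_pdf(-112, 6)}}
priors_1 = {"normal": 0.7, "coverage_hole": 0.3}
priors_2 = {"normal": 0.6, "coverage_hole": 0.4}

case = {"rsrp": -113.0, "sinr": 4.0}
post_1 = posterior(case, model_1, priors_1)
post_2 = posterior(case, model_2, priors_2)
sum_scores = combine([post_1, post_2], [priors_1, priors_2], "sum")
prod_scores = combine([post_1, post_2], [priors_1, priors_2], "product")
diagnosis_sum = max(sum_scores, key=sum_scores.get)
diagnosis_product = max(prod_scores, key=prod_scores.get)
```

Only the stored PDFs and priors are evaluated at test time; the baseline systems themselves are not queried, which is the source of the low computational cost claimed for the application stage.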
These rules are summarized in Table 2.

Table 2. Algebraic rules for the combination of a posteriori probabilities.

Rule           $P(w_m \mid \mathbf{y})$
Product rule   $P(w_m)^{-(R-1)} \prod_{r=1}^{R} P(w_m^r \mid \mathbf{y})$
Sum rule       $(1 - R)\, P(w_m) + \sum_{r=1}^{R} P(w_m^r \mid \mathbf{y})$
Max rule       $(1 - R)\, P(w_m) + R \max_{r=1}^{R} \{ P(w_m^r \mid \mathbf{y}) \}$
Min rule       $P(w_m)^{-(R-1)} \min_{r=1}^{R} \{ P(w_m^r \mid \mathbf{y}) \}$
Median rule    $\operatorname{med}_{r=1}^{R} \{ P(w_m^r \mid \mathbf{y}) \}$

Fig. 4. (a) Normalized histogram for the KPI 95th percentile RSRP and two fitted PDFs: a generalized extreme value PDF in blue (round markers) and a normal PDF in red (square markers). (b) Normalized histogram for the KPI Retainability and a $\beta'$ PDF estimation. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 5. Proposed method for combining diagnosis systems. Stage 2: Combining the behavior models.

At this point, the fault cause with the maximum a posteriori probability is taken as the final diagnosis per each rule of combination, $d_t$. That is,

\[ d_{\mathrm{Rule}\,t} = \arg\max_{m} \{ P(w_m \mid \mathbf{y})_{\mathrm{Rule}\,t} \}. \tag{6} \]

Note that a situation with $M^* < M$ means that there is at least one fault cause that has not been identified by any system. In this case, that cause can never be the final diagnosis.

4. Proof of concept

In this section, the proposed method is assessed by combining two different diagnosis models. In the first test, each model is provided by a different expert; in the second test, each model comes from using a different machine learning algorithm for building the diagnosis model, given the same set of training cases.

The proposed method has been evaluated and compared to the baseline systems by means of the following figures of merit:

• Diagnosis Error Rate (DER): the ratio of problematic cases diagnosed as a fault cause different from the real one (misclassified cases), $N_{MPC}$, to the total number of problematic cases, $N_{PC}$.
• False Positive Rate (FPR): the ratio of normal cases diagnosed as problematic cases, $N_{FP}$, to the total number of normal cases, $N_{NC}$.
• False Negative Rate (FNR): the ratio of problematic cases diagnosed as normal cases, $N_{FN}$, to the total number of problematic cases, $N_{PC}$. This is the most critical metric, as it gives an idea of how often the diagnosis system interprets that there is no problem when some cells are actually malfunctioning.

Given these definitions, an Overall Error Rate (OER) may be defined as

\[ OER = P_N \cdot FPR + P_{PC} \cdot (FNR + DER), \tag{7} \]

where $P_N$ stands for the relative frequency of the normal cases and $P_{PC}$ stands for the relative frequency of the faulty cases. This metric is useful to assess every method at a single glance. Since these figures of merit require the true cause to be known, the testing set used includes the real diagnosis.

4.1. Combination of diagnosis models devised by multiple experts

4.1.1. Scenario

In this test, cases are provided by an LTE RAN simulator (Muñoz et al., 2011). This simulator considers an LTE network composed of 57 macro-cells evenly distributed in space and grouped into 19 three-sector sites. To perform this test, network configuration parameters similar to those used in Gómez-Andrades et al. (2015) and Gómez-Andrades et al. (2016) have been used. They can be seen in Table 3.

Table 3. Simulation parameters for cells normal functioning.

Parameter                Configuration
Cellular layout          Hexagonal grid, 57 cells, cell radius 0.5 km
Transmission direction   Downlink
Carrier frequency        2.0 GHz
System bandwidth         1.4 MHz, 6 PRB (Physical Resource Block)
Frequency reuse          1
Propagation model        Okumura-Hata with wrap-around, Log-normal slow fading, σ_sf = 8 dB and correlation distance = 50 m
Channel model            Multipath fading, ETU model
Mobility model           Random direction, 3 kph
Service model            Full Buffer, Poisson traffic arrival
Base station model       Tri-sectorized antenna, SISO, P_TXmax = 43 dBm, Downtilt = 9°, Azimuth beamwidth = 70°, Elevation beamwidth = 10°
Scheduler                Time domain: Round-Robin, Frequency domain: Best Channel
Power control            Equal transmit power per PRB
Link Adaptation          Fast, CQI (Channel Quality Indicator) based, perfect estimation
Handover                 Triggering event = A3, HOM (Handover Margin) = 3 dB, Measurement type = RSRP
Radio Link Failure       SINR < −6.9 dB for 500 ms (Mehlführer, Wrulich, Colom Ikuno, Bosanska, & Rupp, 2009)
Traffic distribution     Evenly distributed in space
Time resolution          100 TTI (Transmission Time Interval) (100 ms)
Epoch & KPI time         100 s

With this simulator, 1196 cases have been obtained. In this case, training cases are not needed since the diagnosis models have been defined by experts. It is assumed that a detection system is placed before the input of the diagnosis system, so that only the faulty cases are put under test, putting aside the cases belonging to normal functioning. Therefore, in this test only the DER is taken into account.

In this scenario, six typical RAN fault causes have been considered ($M = 6$):

• Excessive downtilt: This situation takes place when the coverage area of a cell is too small, making the signal level at the cell edge too weak and causing a high number of handover failures. The quality of the signal in the surroundings of the cell is also decreased.
• Coverage hole: A cell has a coverage hole at some point inside its area when the power received by the user at this point from any cell is not enough to hold the service. This excessive attenuation can be caused by either obstacles or bad RF planning, and it mainly produces a high number of call drops.
• Inter-system interference: This fault cause may occur due to other cellular networks, like WCDMA. It is not always an easy issue to solve, since the fault usually comes from an outer system. This fault normally causes both the SINR and the average throughput to decrease.
• Too late handover: A too late handover takes place if a radio link failure occurs while the UE (User Equipment) is moving from one cell to another and the corresponding handover between these cells has not taken place yet. In that case, the UE will request a connection re-establishment from the second cell using the physical cell ID of the first cell and its Common Radio Network Temporary Identifier (C-RNTI) in that first cell, which will alert the second cell that a too late handover has occurred.
• Excessive uptilt: A cell suffers from excessive uptilt when its coverage area is larger than necessary, normally because of a bad configuration of the radiation parameters of the antennas. This situation can result in the overlapping of coverage areas from possibly non-adjacent cells, producing a high number of handovers and call drops in this cell and its neighbors.
• Lack of coverage: A user suffers from weak coverage when the Signal-to-Interference-plus-Noise Ratio (SINR) measured in the cell is below the minimum level needed to maintain a planned performance requirement because the received power is low.

The simulation parameters used to model these degradations are shown in Table 4, as well as the a priori probability of these causes to take place, given by the experts. In this case, $P(w_m^{1*}) = P(w_m^{2*}) \neq 0 \;\forall m$, so $P(w_m) = P(w_m^{1*}) = P(w_m^{2*})$. As can be seen, several values have been used for modeling a single fault cause, ranging from lighter to more severe degradation.

Table 4. Parameters used for modeling fault causes in Section 4.1 and a priori probabilities for each cause.

Fault cause            Configuration                                                                                   P(ω_m)
Excessive downtilt     Downtilt = [16, 15, 14]°                                                                        0.18
Coverage hole          �hole = [49, 50, 52, 53] dBm                                                                    0.09
Inter-system interf.   P_TXmax = 33 dBm, Downtilt = 15°, Azimuth beamwidth = [30, 60]°, Elevation beamwidth = 10°      0.1
Too late HO            HOM = [6, 7, 8] dBm                                                                             0.23
Excessive uptilt       Downtilt = [0, 1]°                                                                              0.21
Lack of coverage       P_TXmax = [7, 8, 9, 10] dBm                                                                     0.19

Table 5. Diagnosis models for the diagnosis systems used in test 1: used thresholds.

KPI                  Thresholds
Retainability        [0.973, 0.996]
HOSR                 [0.899, 0.989]
RSRP [dBm]           [−76.9, −72.4]
RSRQ [dB]            [−18.8, −18.2]
SINR [dB]            [13, 14.5]
Throughput [kbps]    [96.2, 111.67]
Distance [km]        [0.838, 0.88]

In this test, seven observable features or KPIs ($N = 7$) have been used to discern among this set of causes:

• Retainability, given as a percentage. This performance indicator quantifies the ability of the cell to hold the service once accepted by the admission control. It gives an idea of how often a user experiences a call drop.
• Handover success rate (HOSR), given as a percentage. This KPI measures the ability of the network to provide mobility to a user without losing its connection. It can be calculated as the ratio between the number of successful handovers and the total number of handovers.
• 95th percentile RSRP, given in dBm.
The Reference Signal Received Power (RSRP) is defined as the linear average over the power contributions (in [W]) of the resource elements that carry cell-specific reference signals within the considered measurement frequency bandwidth.
• 5th percentile RSRQ, given in dB. The Reference Signal Received Quality (RSRQ) is a signal quality indicator and is defined as the ratio

\[ RSRQ = N_{PRB} \cdot \frac{RSRP}{RSSI}, \tag{8} \]

where $N_{PRB}$ is the number of resource blocks of the E-UTRA carrier RSSI measurement bandwidth and RSSI stands for the total received power within the measurement bandwidth, that is, considering the power from the serving cell, the power of the co-channel serving and non-serving cells, the adjacent channel interference and any possible source of noise. In this paper, RSRQ is expressed in dB.
• 95th percentile SINR, given in dB. The Signal-to-Interference-plus-Noise Ratio (SINR) is defined as the ratio between the power of the desired data signal and the sum of the powers of all inter-cell interferences and the noise.
• 95th percentile distance, given in km. This KPI measures the distance between users and their serving cell. It can be estimated from the transmission delay between them and gives an idea of the cell coverage area.
• Average throughput, given in kbps. In LTE systems, the user throughput depends on the SINR experienced by the user through the following equation, 3GPP (c),

\[ T_k = \big(1 - BLER(SINR_k)\big) \cdot \frac{D_k}{TTI}, \tag{9} \]

where BLER is the Block Error Rate obtained from the user's SINR, $D_k$ is the data block payload in bits of user $k$ and TTI is the transmission time interval.

In order to show the impact a proper modeling may have on the diagnosis performance of the proposed method, the proportion of cases used for the modeling to the testing set has been varied from 25% to 75%.
To obtain more reliable results when the number of cases is scarce either in the testing or in the modeling set, 50 repetitions have been made per modeling-to-testing ratio, randomizing the cases assigned to each set. Then, the resulting diagnosis error rates have been averaged over the 50 repetitions.

4.1.2. The standalone classifiers

In this test, for a given technique of automatic diagnosis, two diagnosis models are combined, $R = 2$, where each of them is provided by a different expert. This test represents the usual case in cellular networks where each troubleshooting expert defines his own set of rules and KPI thresholds to identify problems. When deploying the diagnosis system in a network, according to the proposed method, instead of choosing one single model, the knowledge from both experts is fused by combining two diagnosis models. Furthermore, both diagnosis models comprise the six fault causes and the seven different KPIs described above. That is, $W^1 = W^2$ with $|W^1| = M$ and $N^1 = N^2$.

The artificial intelligence technique used for these tests is based on a Fuzzy Logic Controller (FLC) (Khatib, Barco, Gómez-Andrades, & Serrano, 2015). This system contains rules, which are composed of the antecedent (the "if ..." part) and the consequent (the "then ..." part), the latter being the cause the fuzzy logic controller assigns to a case if the antecedent is fulfilled by the fuzzified observable features of the case. On the one hand, Table 5 shows the thresholds used in both diagnosis models. The lower limit stands for the value below which a KPI is considered to be low; the upper limit stands for the value above which a KPI is considered to be high. On the other hand, Table 6 shows the if ..., then ... rules that make up each diagnosis model, given by each expert. From left to right, each column below "KPI" in Table 6 corresponds to the KPIs shown in Table 5. H stands for a high value in that KPI and L for a low value.
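The threshold and rule format just described can be sketched in a few lines of Python. This is a deliberately crisp simplification, not the fuzzy logic controller of Khatib et al.: a real FLC works with membership degrees rather than hard H/L labels. The names `THRESHOLDS`, `label` and `matches` and the example case are hypothetical, and reading "–" (written "-" below) as "this KPI is not checked" is an assumption. The thresholds are those of Table 5 and the rule is the first one listed in Table 6.

```python
# Table 5 thresholds in the KPI order of Table 5: (low limit, high limit).
THRESHOLDS = [
    (0.973, 0.996),   # Retainability
    (0.899, 0.989),   # HOSR
    (-76.9, -72.4),   # RSRP [dBm]
    (-18.8, -18.2),   # RSRQ [dB]
    (13.0, 14.5),     # SINR [dB]
    (96.2, 111.67),   # Throughput [kbps]
    (0.838, 0.88),    # Distance [km]
]

def label(value, low, high):
    """Crisp label: 'L' below the low limit, 'H' above the high limit."""
    if value < low:
        return "L"
    if value > high:
        return "H"
    return "-"  # neither low nor high

def matches(rule, kpis):
    """True if every KPI checked by the rule has the required label.

    A '-' in the rule is treated as "not checked" (assumption).
    """
    labels = [label(v, lo, hi) for v, (lo, hi) in zip(kpis, THRESHOLDS)]
    return all(r == "-" or r == l for r, l in zip(rule, labels))

# First rule listed in Table 6, whose consequent is diagnosis 1.
RULE = ["L", "L", "H", "L", "-", "H", "L"]
DIAGNOSIS = 1

# Hypothetical case: KPI values invented for illustration only.
case = [0.95, 0.85, -70.0, -19.0, 14.0, 120.0, 0.80]
fired = matches(RULE, case)
```

With these made-up values the antecedent is fulfilled, so the rule would assign diagnosis 1 to the case; raising the retainability above its upper threshold would prevent the rule from firing.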
Regarding the numbering of the diagnoses, 1 means excessive downtilt; 2: coverage hole; 3: inter-system interference; 4: too late handover; 5: excessive uptilt and 6: lack of coverage.

Table 6. Diagnosis models for the diagnosis systems used in test 1: used rules.

Diagnosis model 1          Diagnosis model 2
KPI             Diag.      KPI             Diag.
L L H L – H L   1          – – H L – H L   1
H H – L L H L   1          L – H H – L H   2
L – – H H – H   2          L – H H H – H   2
L – – H L L H   3          L – – H L L H   3
L L H – L L H   3          L – H – L L H   3
– – H H H H –   4          L – H H L L –   3
– H H – H H –   4          L H – H L L –   3
H – H – H H –   4          – H H H L L H   3
– – H – H H H   4          L L – – – H H   4
H H – – H – H   4          L L – L – – H   4
H H – L – – H   4          L L H – H L –   4
H H – – – H H   4          L L H – H – H   4
H H H – – – H   4          L L L L L L –   4
– – H L H – H   4          – – L H – L H   5
– – H L – L H   4          H H – H – L H   5
L L H L – – H   4          H H L – L L H   5
– – L – L L H   5          – – – L L H L   6
– – L – L H L   6          L – – L L H L   6
– H – L L H L   6          H H – – L H L   6
L L L L L – L   6

4.1.3. Results

Table 7 shows the diagnosis error rates computed when the max rule is used for combining (Table 2). In Table 7, the average diagnosis error rate and the rate of improvement are shown. This last rate represents the proportion of repetitions (among the 50 that have been performed) in which the diagnosis error rate from the ensemble method is lower than the best one provided by the baseline diagnosis systems. With a modeling-to-testing ratio of 25%, only 60% of the iterations show a better ensemble diagnosis error rate than the ones from its base diagnosis systems, showing, therefore, little improvement in the average diagnosis error rate. This result highlights how the scarcity of cases for modeling impacts the classifying performance of the ensemble. However, if the number of cases used for modeling is doubled, 98% of the iterations show a better diagnosis error rate, which also results in a lower average diagnosis error rate. When the modeling-to-testing ratio is set to 75%, every diagnosis error rate provided by the ensemble method is lower than the lowest provided by its components, reaching 5.34% on average. This means a DER of approximately 1/3 of the lowest DER achieved by the standalone classifiers.

Table 7. Results of test 1: Combining two versions of the same classifying algorithm.

Modeling-to-testing ratio             25%      50%      75%
Diagnosis syst. 1, average DER        13.81%   13.7%    13.65%
Diagnosis syst. 2, average DER        16.34%   16.13%   16.3%
Ens. method: Max rule, average DER    8.29%    5.92%    5.34%
Rate of improvement                   60%      98%      100%

Regarding the DER of the standalone diagnosis systems, it can be seen that it stays nearly constant over the modeling-to-testing ratio. This is because of the randomizing process executed over the labeled cases to divide them into the modeling and testing subsets. When this random permutation is performed a number of times and some subsets (two, in this case) are chosen blindly from this set, the average number of cases labeled with a given cause in each of these subsets tends to the ratios of the labels in the original set. This is a consequence of the law of large numbers. For this reason, the resulting averaged DER of these baseline systems is independent of the size of the subsets made from the original set of cases.

4.2. Combination of different diagnosis systems on a live network

Once the proposed method has been tested with cases provided by a simulator, a second test with cases from a real live LTE network has been performed. In this test, the diagnosis models built from two different machine learning algorithms have been combined.

4.2.1. Scenario

An LTE network composed of more than 8000 different cells providing coverage to almost 4 million people has been analyzed. Its vastness makes many different cells coexist and also a wide variety of problematic causes come up. Table 8 summarizes the main parameters of the network.

Table 8. Main parameters of the real LTE network used in test two.

Parameter                                 Configuration
Network layout                            Urban area
Number of cells                           8679
System bandwidth                          10 MHz
Number of PRBs                            50
Frequency reuse factor                    1
Max. transmitted power                    46 dBm
Max. transmitted power of UE              23 dBm
Horizontal HPBW (Half-Power Beam Width)   65°
HOM                                       3 dB
KPI time period                           Hourly
Number of observed cells                  45
Number of days under observation          6 days per cell (on average)
Size of the dataset                       14,692 labeled cases

Among all the available candidates, 45 random cells have been chosen to represent the network behavior. These cells have been monitored for almost 6 days on average and their KPIs have been stored on an hourly basis. Taking into account that the state of a single cell varies substantially throughout the day due to the traffic fluctuation, several cases have been stored from each cell at different hours, resulting in a total of 14,692 cases. Once these cases were gathered, they were all labeled by the experts, distinguishing four groups of cases ($M = 4$): three kinds of problematic patterns and the normal cell functioning. The causes of malfunctioning that were found are:

• Overload: This fault cause is mainly distinguished by a high number of RRC connections in the cell, which makes the CPU processing load and the number of HO attempts rise consequently. The accessibility and retainability KPIs also hold values well below the ones for a cell with normal functioning.
• Lack of coverage: This issue can be identified based on the number of bad coverage evaluation reports, which should be noticeably high.
• Non-operating cell: In this case, and only if the cell is reporting any KPI measurement, most of the reported measurements should be near zero: the retainability, the accessibility, the number of performed HO, the number of RRC connections or the number of coverage reports.

The a priori probability of occurrence of each class has been computed as the average of $P(w_m^{r*})$ over $r$ within this selection (Table 9). From this table it should be noted that there are more faulty cases than healthy ones. This is because a previous, non-perfect stage for detecting faulty cases has been applied, which let through some normal cases that now are to be diagnosed as such.

Table 9. Prior probability of occurrence for the causes considered in test two, $P(w_m)$.

P(Overload)   P(Lack of cov.)   P(Non-operating)   P(Normal)
0.01          0.22              0.47               0.3

At this point, 20% of the total number of cases (holding the proportion shown in Table 9 between them) were used as a training set for the machine learning algorithms and the rest were used to form the modeling and testing sets in a ratio that, as in Section 4.1.3, was varied along the test.

Table 10. Diagnosis models for the diagnosis systems used in test 2: used thresholds.

KPI                           Thresholds
Retainability                 [0.99, 0.997]
Accessibility                 [0.992, 0.998]
Number of RRC connections     [5846, 20703]
Number of ping-pong HO        [18, 83]
Number of bad cov. reports    [217, 1070]
CPU average load [%]          [22.5, 34.45]

In this test, six of the most representative KPIs in an LTE network have been chosen to discern between the possible diagnoses, $N = 6$:

• Retainability: described in Section 4.1.1.
• Accessibility: It is used to show the percentage of connections that have got access to that cell over the KPI time period. A low value in this KPI means that many connections have been blocked during the access procedure.
• Number of RRC connections: It is the number of successfully established RRC connections. Related to the Accessibility KPI, it gives an idea of the number of users served by the cell.
• Number of Ping-Pong Handovers: This KPI counts the number of ping-pong HO that take place in the cell over the measurement time period. A high value in this KPI may indicate a bad configuration of the handover policy, as the number of connections that go back and forth between a cell and its neighbors is high for a single call.
• Number of bad coverage reports: It counts the number of times a cell is notified that the UE measured a signal level that meets the requirements for Event A2, 3GPP (a), that is, the measured signal level is under a certain threshold.
• CPU average load: It is the average CPU load due to the processes carried out by the cell over the KPI time period.

4.2.2. The standalone classifiers

In this test, the two standalone classifiers used share a similar diagnosis system, a fuzzy-logic controller, which diagnoses the cases according to if ..., then ... rules. The difference resides in the algorithms they use for learning the rules they apply during the diagnosis process. The first is a genetic algorithm and the second is a data driven algorithm (Khatib, Barco, Gómez-Andrades, Muñoz, & Serrano, 2015; Khatib, Barco, Gómez-Andrades, & Serrano, 2015), respectively. In genetic algorithms, three main processes may be distinguished: reproduction, by means of which new individuals are created by either mutation or combination of the previously existing ones; evaluation, or the calculation of the probability of each individual to survive and reproduce; and selection, a process in which some individuals are chosen to survive and reproduce based on the results from the evaluation stage. The data driven algorithm, in turn, first takes a case from the training set and derives the fuzzy rule that covers it. Then, it looks for the cases covered by this rule and scores the rule according to the number of covered cases. New incoming cases are taken until the training set is completely explored. Given this set of scored rules, the algorithm then fuses them into a lower number of rules in an attempt to maximize the number of cases (and therefore, the score) covered by the resulting fused rules.

In these tests, it is assumed that not only faulty cases, but also some normal cases are inputs for the diagnosis stage. This can happen when there is no detection system before the diagnosis system or in the realistic situation in which the detection system has a given probability of error. As in Section 4.1.2, both systems take as possible output all the presented diagnoses, making use of the six KPIs shown above. Table 10 shows the thresholds used for these KPIs to consider them high or low and Table 11 shows the rules each machine learning algorithm has derived from the training set. As in Table 6, H stands for a high value of the KPI and L for a low value. The KPIs are sorted in the same way as in Table 10 and the numbering of the diagnoses is 1: CPU overload; 2: lack of coverage; 3: non-operating cell and 4: normal functioning.

4.2.3. Results

Once the standalone diagnosis systems have been trained, the performance metrics DER, FPR, FNR and OER have been computed for both the standalone diagnosis systems and the rules described in Table 2. In this test, the modeling-to-testing ratio has been varied from 10% to 90% in steps of 10, making 10 iterations per step. As in Section 4.1.3, a random permutation of the cases used for modeling and testing has been done in each of these 10 iterations. The resulting metrics have then been averaged. Table 12 shows the metrics that result from using a proportion of 60% in the modeling-to-testing ratio. This ratio has proved to minimize the values of all the metrics in this test.
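As a quick consistency check of the OER definition in Eq. (7), the short Python sketch below recomputes the OER of three rows of Table 12 from their reported DER, FPR and FNR. It rests on two assumptions that the text does not state explicitly: that $P_N$ equals the prior of the normal class in Table 9 (0.3) and that $P_{PC} = 1 - P_N$. The function name `overall_error_rate` is hypothetical.

```python
def overall_error_rate(der, fpr, fnr, p_normal):
    """Eq. (7): OER = P_N * FPR + P_PC * (FNR + DER), all rates in percent.

    Assumes P_PC = 1 - P_N, i.e. every case is either normal or faulty.
    """
    p_faulty = 1.0 - p_normal
    return p_normal * fpr + p_faulty * (fnr + der)

P_NORMAL = 0.3  # P(Normal) from Table 9 (assumed to be the P_N of Eq. (7))

# (DER, FPR, FNR, reported OER) in percent, copied from Table 12.
TABLE_12 = {
    "Data driven algorithm": (2.62, 16.91, 6.47, 11.43),
    "Genetic algorithm": (1.87, 16.61, 2.68, 8.16),
    "Median rule": (1.78, 10.67, 1.34, 5.39),
}

recomputed = {
    name: overall_error_rate(der, fpr, fnr, P_NORMAL)
    for name, (der, fpr, fnr, _) in TABLE_12.items()
}
for name, (_, _, _, reported) in TABLE_12.items():
    print(f"{name}: recomputed {recomputed[name]:.3f}%, reported {reported}%")
```

Under these assumptions, the recomputed values for these three rows differ from the reported ones by less than 0.01 percentage points, which is compatible with rounding of the published figures.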
Unlike in Section 4.1, this scenario is made from real cases and contains outliers, that is, atypical cases. As the modeling-to-testing ratio rises, the probability for these outliers to belong to the modeling set also rises, thus inducing the behavior models to deviate from modeling the trend of the typical cases given a fault cause. On the other hand, if no outliers are taken into account during the model-fitting procedure, their fault cause will not be predictable in the second stage and the error rates will also rise.

Table 11. Diagnosis models for the diagnosis systems used in test 2: used rules.

Diagnosis model 1:             Diagnosis model 2:
from genetic algorithm         from data driven algorithm
KPI           Diagnosis        KPI           Diagnosis
H H – – L L   1                H H L – L L   1
H – H H L L   1                H H – H L L   1
– H H L – L   2                H – H H L L   1
H – – L – H   2                – L H L – H   2
H – – L H –   2                – H H – H H   2
H – H L – –   2                – – H L H H   2
– L H L – H   2                L – H – H H   2
H H – – H –   2                H H H – H –   2
– H H – H –   2                H L – L L H   2
– – H L H –   2                H – H L – H   2
H – H – H H   2                H – H L H –   2
H – H – H H   2                – L L L L L   3
– L L L L L   3                L L L L H –   4
L L – – H H   4                – L L L H H   4
L L L – H –   4                L L L – H H   4
L L H L L L   4                L – L L H H   4
L L L – – H   4                L – L – H H   4
– – L L H H   4                L L L H – –   4

Table 12. Results of test 2: Combining two different algorithms.

                                   DER     FPR      FNR     OER
Training: Data driven algorithm    2.62%   16.91%   6.47%   11.43%
Training: Genetic algorithm        1.87%   16.61%   2.68%   8.16%
Ensemble method
  Product rule                     2.6%    12.21%   1.32%   6.2%
  Sum rule                         1.78%   11.55%   1.25%   5.59%
  Max rule                         1.78%   11.51%   1.25%   5.57%
  Min rule                         2.05%   11.42%   1.4%    5.84%
  Median rule                      1.78%   10.67%   1.34%   5.39%
  Majority vote rule               1.78%   11.23%   1.25%   5.49%

As can be seen in Table 12, in most cases the combined diagnosis system outperforms the standalone diagnosis systems. Specifically, the median rule achieves the lowest overall error rate with 5.39%, approximately 2/3 of that of the best standalone diagnosis system. However, the most relevant improvement takes place in the FNR, which has been reduced by 46%. The FNR gives an idea of the number of problematic cases wrongly deemed as normal. It is crucial to make this metric as low as possible, since considering a problematic case as normal may result, in the worst case, in unnoticed service outages and degradation of the network performance. In this regard, the proposed method has proved to successfully reduce the FNR. Other indicators are not as critical. For example, mistaking one fault cause for another may be tolerable to some extent (DER); although the actual problem is not the one the operator thinks it is, he is still aware of a problem in the network. Even considering normal cases as faulty may be tolerable, as the network performance is not really degraded (FPR).

These results can also be seen in the normalized confusion matrices of the diagnosis methods. Fig. 6a shows the normalized confusion matrix for the FLC using the genetic algorithm for rule learning; Fig. 6b shows the confusion matrix when the data driven algorithm was used for learning the rules and Fig. 6c shows the matrix from applying the median rule with a modeling-to-testing ratio of 60% in the ensemble method. In these matrices, the elements from the fourth column (excluding the main diagonal) account for the false negatives and the elements from the fourth row account for the false positives. It can be seen how the elements from the main diagonal are reinforced in the ensemble method and how only those diagnoses which are mistaken by both baseline systems are slightly inherited by the latter. Fig. 6c also shows graphically how the FPR and FNR dropped with respect to those from the standalone systems.

5. Future lines of work

• Decision templates. The proposed method does not punish or reward the classifiers according to their performance during the training stage. Going a step further from the idea of the weighted majority vote used in Wei et al. (2014), a score system based on class and classifier aware decision templates applied over the a posteriori probabilities $P(w_m^r \mid \mathbf{x})$ from Eq. (4) could be used to improve the overall accuracy.
• Non-parametric PDFs. The proposal of analytically and parameter-defined PDFs results in a really light way of representing a statistical behavior, as only its parameters (Table 1) must be stored to model a diagnosis system. However, these distributions may limit to some extent the statistical representation of the features from the training cases and they may eventually introduce a source of error in the later computation of $P(w_m^r \mid \mathbf{x})$ in case these cases follow a distribution that has not been considered. To solve this, future research could focus on using non-parametric PDFs, like the kernel-based ones.
Probability density functions may be classified into parametric and non-parametric functions. The former have analytic expressions and their shape depends on the parameters those functions hold. The latter, however, are defined by means of a kernel function. If all the cases from the dataset are placed along the axis given by a feature of interest (a certain KPI, for example) and a kernel function is centered wherever a point is, an empirical non-parametric PDF would result from averaging the sum of these functions over the number of cases. The main advantage of this method is its accuracy when modeling an empirical distribution. Its main drawback is that, since it is not defined by any parameters, it should be computed and stored point by point, possibly increasing the storage and computing requirements. This method, however, may be used together with the synthetic KPIs proposed in the last item of this list.
First, a reduced set of synthetic KPIs is computed and then, their PDFs are accurately estimated with this method.

Fig. 6. Normalized confusion matrices for the second test.

• Use of synthetic KPIs via feature extraction. As described in Section 3.1, $N^r \times M^* \times R$ PDFs should be estimated in order to model all the feature-class-classifier relations. If any of these factors is relatively high, the computing cost of estimating all these PDFs could be prohibitive. Due to this, working with a reduced group of synthetic/extracted features is proposed, in an attempt to map the $N$ original features into $\hat{N}$ synthetic features with $\hat{N} < N$.
In recent years, and mainly motivated by the impulse of data mining, many methods for dimensionality reduction have arisen. Among these, it is worth highlighting Principal Component Analysis (PCA) (Jolliffe, 2002). In an $N$-dimensional vector space, the simplest version of PCA (linear PCA) is a technique that finds the mutually uncorrelated vectors onto which the projection of the samples generates the highest variances. The result is a set of orthogonal vectors sorted in descending order of achieved variance. The first of these vectors is the one onto which the variance of the projection of the samples is maximum. In this sense, the original KPIs constitute the basis of the $N$-dimensional vector space, whereas the $\hat{N}$ synthetic KPIs represent the orthogonal vectors with the highest variance. To be rigorous, up to $N$ synthetic orthogonal KPIs may be computed. However, only a small set of them, the first $\hat{N}$, is enough to account for most of the variance of the data. By applying this technique, based on the eigenvalue decomposition of the covariance matrix of the original KPIs, these can be mapped into $\hat{N}$ synthetic KPIs, preserving most of the information contained in the former.
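The PCA mapping outlined above can be illustrated with a small NumPy sketch based on the eigenvalue decomposition of the covariance matrix. It is only a sketch under stated assumptions: the function name `pca_project` and the synthetic data (200 cases, 6 KPIs driven by 2 latent factors) are hypothetical and unrelated to the datasets of this paper.

```python
import numpy as np

def pca_project(X, n_components):
    """Linear PCA through the eigen-decomposition of the covariance matrix.

    X has one case per row and one KPI per column. Returns the cases
    projected onto the first n_components principal directions and the
    fraction of the total variance those directions account for.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # center every KPI
    cov = np.cov(Xc, rowvar=False)             # N x N covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # sort by descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals[:n_components].sum() / eigvals.sum()
    return Xc @ eigvecs[:, :n_components], explained

# Made-up data: 6 correlated "KPIs" generated from 2 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 6))

Z, explained = pca_project(X, n_components=2)  # 2 synthetic KPIs per case
print(Z.shape, round(float(explained), 3))
```

Since real KPIs are expressed in very different units and ranges, standardizing each KPI before computing the covariance matrix would probably be needed in practice; that step is omitted here for brevity.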
6. Conclusions
A hybrid ensemble of classifiers, devised to merge expert knowledge from different sources, has been presented and assessed in the context of fault cause diagnosis in cellular networks. It allows the expertise of several troubleshooting experts and the knowledge contained in databases of previously diagnosed cases to be combined in order to develop a more accurate diagnosis system.
Unlike the common approach of hybrid ensembles, based on the majority vote of their baseline components, this work proposes a hybrid ensemble of classifiers obtained from the combination of the statistical behavior models of the baseline diagnosis systems. This approach obtains the partial diagnoses of the standalone classifiers and then combines them by applying a few algebraic rules, without needing the classifiers themselves to assess every case under test, thus reducing the computational cost of usual hybrid ensembles of classifiers.
The method has been tested with two different sources of cases under test: cases provided by an LTE RAN simulator and cases gathered from a live LTE network. Likewise, two use cases have been assessed: the combination of diagnosis models designed by two different network troubleshooting experts and the combination of two diagnosis systems using different learning algorithms. The proposed method has outperformed its base components in both tests in terms of the diagnosis error rate, proving to be an effective tool for fault cause diagnosis in current and future self-healing networks.
Acknowledgment
This work has been partially funded by Optimi-Ericsson, Junta de Andalucía (Consejería de Ciencia, Innovación y Empresa, Ref. 59288 and Proyecto de Investigación de Excelencia P12-TIC-2905) and ERDF.
References
3GPP (a).
Evolved Universal Terrestrial Radio Access (E-UTRA) Radio Resource Control (RRC); Protocol Specification, Rel-13, Version 13.2.0 (2015-12). TS 36.331. 3rd Generation Partnership Project.
3GPP (b). Feasibility study for Further Advancements for E-UTRA (LTE-Advanced), Rel-13, Version 13.0.0 (2015-12). TR 36.912. 3rd Generation Partnership Project.
3GPP (c) (May 2004). OFDM-HSDPA System level simulator calibration (R1-040500). 3GPP TSG-RAN WG1 37. 3rd Generation Partnership Project (3GPP).
3GPP (d). Self-Organizing Networks (SON); Concepts and requirements, Rel-13, Version 13.0.0 (2015-12). TS 32.500. 3rd Generation Partnership Project.
3GPP (e). Self-Organizing Networks (SON); Self-Healing concepts and requirements, Rel-13, Version 13.0.0 (2015-12). TS 32.541. 3rd Generation Partnership Project.
Barco, R., Díez, L., Wille, V., & Lázaro, P. (2009). Automatic diagnosis of mobile communication networks under imprecise parameters. Expert Systems with Applications, 36(1), 489–500. doi: 10.1016/j.eswa.2007.09.030.
Barco, R., Lazaro, P., Diez, L., & Wille, V. (2008). Continuous versus discrete model in autodiagnosis systems for wireless networks. IEEE Transactions on Mobile Computing, 7(6), 673–681. doi: 10.1109/TMC.2008.23.
Barco, R., Lázaro, P., Wille, V., Díez, L., & Patel, S. (2009). Knowledge acquisition for diagnosis model in wireless networks. Expert Systems with Applications, 36(3), 4745–4752. doi: 10.1016/j.eswa.2008.06.042.
Barco, R., Lázaro, P., & Muñoz, P. (2012). A unified framework for Self-Healing in wireless networks. IEEE Communications Magazine, 50(12), 134–142. doi: 10.1109/MCOM.2012.6384463.
Begum, S., Chakraborty, D., & Sarkar, R. (2015). Cancer classification from gene expression based microarray data using SVM ensemble. In 2015 international conference on condition assessment techniques in electrical systems (CATCON) (pp. 13–16). doi: 10.1109/CATCON.2015.7449500.
Breiman, L. (1996). Bagging predictors.
In Machine learning (pp. 123–140).
Ciocarlie, G., Lindqvist, U., Nováczki, S., & Sanneck, H. (2013). Detecting anomalies in cellular networks using an ensemble method. In Proceedings of the 9th international conference on network and service management (CNSM 2013) (pp. 171–174). doi: 10.1109/CNSM.2013.6727831.
Dasarathy, B., & Sheela, B. V. (1979). A composite classifier system design: concepts and methodology. Proceedings of the IEEE, 67(5), 708–713. doi: 10.1109/PROC.1979.11321.
Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139. doi: 10.1006/jcss.1997.1504.
Gandhi, I., & Pandey, M. (2015). Hybrid ensemble of classifiers using voting. In 2015 international conference on green computing and internet of things (ICGCIoT) (pp. 399–404). doi: 10.1109/ICGCIoT.2015.7380496.
Gómez-Andrades, A., Muñoz Luengo, P., Khatib, E., de la Bandera Cascales, I., Serrano, I., & Barco, R. (2015). Methodology for the design and evaluation of Self-Healing LTE networks. IEEE Transactions on Vehicular Technology, PP(99), 1–1. doi: 10.1109/TVT.2015.2477945.
Gómez-Andrades, A., Muñoz, P., Serrano, I., & Barco, R. (2016). Automatic root cause analysis for LTE networks based on unsupervised techniques. IEEE Transactions on Vehicular Technology, 65(4), 2369–2386. doi: 10.1109/TVT.2015.2431742.
Jacobs, R. A., Jordan, M. I., Nowlan, S. J., & Hinton, G. E. (1991). Adaptive mixtures of local experts. Neural Computation, 3(1), 79–87. doi: 10.1162/neco.1991.3.1.79.
Jolliffe, I. (2002). Principal component analysis. Springer Series in Statistics (2nd ed.). Springer-Verlag New York.
Khatib, E. J., Barco, R., Gómez-Andrades, A., Muñoz, P., & Serrano, I. (2015). Data mining for fuzzy diagnosis systems in LTE networks. Expert Systems with Applications, 42(21), 7549–7559. doi: 10.1016/j.eswa.2015.05.031.
Khatib, E. J., Barco, R., Gómez-Andrades, A., & Serrano, I.
(2015). Diagnosis based on genetic fuzzy algorithms for LTE Self-Healing. IEEE Transactions on Vehicular Technology. doi: 10.1109/TVT.2015.2414296.
Kittler, J., Hatef, M., Duin, R. P. W., & Matas, J. (1998). On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3), 226–239. doi: 10.1109/34.667881.
Kuncheva, L. (2002). A theoretical study on six classifier fusion strategies. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(2), 281–286. doi: 10.1109/34.982906.
Liu, H., Chen, G., Song, G., & Han, T. (2009). Analog circuit fault diagnosis using bagging ensemble method with cross-validation. In International conference on mechatronics and automation, 2009. ICMA 2009 (pp. 4430–4434). doi: 10.1109/ICMA.2009.5246675.
Mehlführer, C., Wrulich, M., Colom Ikuno, J., Bosanska, D., & Rupp, M. (2009). Simulating the long term evolution physical layer. In Proc. of 17th European signal processing conference (EUSIPCO).
Muñoz, P., de la Bandera, I., Ruíz, F., Luna-Ramírez, S., Barco, R., Toril, M., et al. (2011). Computationally-efficient design of a dynamic system-level LTE simulator. International Journal of Electronics and Telecommunications, 57(3), 347–358. doi: 10.1155/2012/802606.
Nováczki, S. (2013). An improved anomaly detection and diagnosis framework for mobile network operators. In 2013 9th international conference on the design of reliable communication networks (DRCN) (pp. 234–241).
Shen, H.-B., & Chou, K.-C. (2006). Ensemble classifier for protein fold pattern recognition. Bioinformatics, 22(14), 1717–1722. doi: 10.1093/bioinformatics/btl170.
Szilágyi, P., & Nováczki, S. (2012). An automatic detection and diagnosis framework for mobile communication systems. IEEE Transactions on Network and Service Management, 9(2), 184–197. doi: 10.1109/TNSM.2012.031912.110155.
Wei, H., Lin, X., Xu, X., Li, L., Zhang, W., & Wang, X. (2014).
A novel ensemble classifier based on multiple diverse classification methods. In 2014 11th international conference on fuzzy systems and knowledge discovery (FSKD) (pp. 301–305). doi: 10.1109/FSKD.2014.6980850.
Wiezbicki, T., & Ribeiro, E. P. (2016). Sensor drift compensation using weighted neural networks. In 2016 IEEE conference on evolving and adaptive intelligent systems (EAIS) (pp. 92–97). doi: 10.1109/EAIS.2016.7502497.
Wu, X., Kumar, V., Ross Quinlan, J., Ghosh, J., Yang, Q., Motoda, H., et al. (2008). Top 10 algorithms in data mining. Knowledge and Information Systems, 14(1), 1–37. doi: 10.1007/s10115-007-0114-2.
Yuksel, S., Wilson, J., & Gader, P. (2012). Twenty years of mixture of experts. IEEE Transactions on Neural Networks and Learning Systems, 23(8), 1177–1193. doi: 10.1109/TNNLS.2012.2200299.