Combination of multiple diagnosis systems in Self-Healing networks


Expert Systems With Applications 64 (2016) 56–68 

David Palacios, Emil J. Khatib, Raquel Barco ∗

Communications Engineering Dept., University of Málaga, 29071, Málaga, Spain 

a r t i c l e i n f o 

Article history: 

Received 8 January 2016 

Revised 20 July 2016 

Accepted 21 July 2016 

Available online 22 July 2016 

Keywords: 

LTE 

Self-healing 

Root cause analysis 

Self-organizing networks (SON) 

Hybrid ensemble classifier 

Automatic fault identification 

a b s t r a c t 

The Self-Organizing Networks (SON) paradigm proposes a set of functions to automate network manage- 

ment in mobile communication networks. Within SON, the purpose of Self-Healing is to detect cells with 

service degradation, diagnose the fault cause that affects them, rapidly compensate the problem with the 

support of neighboring cells and repair the network by performing some recovery actions. 

The diagnosis phase can be designed as a classifier. In this context, hybrid ensembles of classifiers en- 

hance the diagnosis performance of expert systems of different kinds by combining their outputs. In this 

paper, a novel scheme of hybrid ensemble of classifiers is proposed as a two-step procedure: a modeling 

stage of the baseline classifiers and an application stage, when the combination of partial diagnoses is ac- 

tually performed. The use of statistical models of the baseline classifiers allows an immediate ensemble 

diagnosis without running and querying them individually, thus resulting in a very low computational 

cost in the execution stage. 

Results show that the performance of the proposed method compared to its standalone components 

is significantly better in terms of diagnosis error rate, using both simulated data and cases from a live 

LTE network. Furthermore, this method relies on concepts which are not linked to a particular mobile 

communication technology, allowing it to be applied either on well established cellular networks, like 

UMTS, or on recent and forthcoming technologies, like LTE-A and 5G. 

© 2016 The Authors. Published by Elsevier Ltd. 

This is an open access article under the CC BY-NC-ND license 

( http://creativecommons.org/licenses/by-nc-nd/4.0/ ). 

 
1. Introduction 

The growing demand for mobile services with ever-increasing

bandwidth and the expanding number of users make necessary

the deployment of new and more efficient mobile communication

networks over the existing ones (GSM, UMTS), such as Long-Term

Evolution (LTE). However, the complexity of this heterogeneous

scenario, which comprises several Radio Access Technologies

(RAT), requires challenging maintenance and complex operational

tasks. Mobile operators need to offer new demanding services

without increasing either operational expenditures (OPEX) or

capital expenditures (CAPEX). In order to deal with that problem,

the 3rd Generation Partnership Project (3GPP) has proposed Self-

Organizing Networks (SON) ( 3GPP (d) ) as networks that include

mechanisms to automate network procedures in order to help mo-

bile operators with their management work, providing significant

cost reduction. This automation of network management will also

be essential in near and future technologies, like LTE-Advanced

and 5G ( 3GPP (b) ). 
∗ Corresponding author. Fax: +34952132027. 
E-mail addresses: dpc@ic.uma.es (D. Palacios), emil@uma.es (E.J. Khatib), 

rbarco@uma.es (R. Barco). 

 
http://dx.doi.org/10.1016/j.eswa.2016.07.030 

0957-4174/© 2016 The Authors. Published by Elsevier Ltd.
SON comprises three groups of functions: Self-Configuration, Self-Optimization and Self-Healing. The aim of the latter is to autonomously solve the problems that a cell, with service degradation or outage, could present (3GPP (e); Barco, Lázaro, and Muñoz (2012)). This is done by means of four stages:

• Fault Detection: Responsible for finding cells with problems, i.e.,

cells experiencing service outage or just suffering an unaccept-

able service degradation. 
• Diagnosis of the fault cause: In this step, the actions to be performed in order to recover the system from the degradation it is suffering are decided. This step can be divided into two substages: Fault Identification, that is, identifying the fault cause based on observable symptoms such as Key Performance Indicators (KPI) and alarms; and Action Identification, which corresponds to the decision of what tasks to perform to recover the normal performance of the system.
• Fault recovery: In this step, the proposed solutions are carried

out. 
• Fault compensation: Since diagnosing the fault and repairing it

normally takes some time, compensation aims to diminish the

impact of the fault by changing parameters in neighboring cells.


 
Fig. 1. Scheme of an automatic diagnosis system. 

 
This paper is focused on the diagnosis task, in particular in the fault identification, also called root cause analysis. Once a problem has been detected in a cell, root cause analysis identifies the fault cause given the value of performance indicators, alarms, counters, mobile traces, etc. In the context of cellular networks, some diagnosis systems have been recently proposed. Barco, Díez, Wille, and Lázaro (2009) and Barco, Lázaro, Wille, Díez, and Patel (2009) proposed diagnosis systems based on Bayesian Networks. Szilágyi and Nováczki (2012) used a scoring system in order to determine how well a specific case fits a diagnosis. Nováczki (2013) enhanced the previous system by adding profiling techniques. The method in Khatib, Barco, Gómez-Andrades, and Serrano (2015) was based on fuzzy logic and genetic algorithms. Gómez-Andrades, Muñoz, Serrano, and Barco (2016) proposed a diagnosis system based on Self-Organized Maps (SOM).

Each of the previous methods has its pros and its cons. In practice, this makes the selection of the diagnosis technique cumbersome when the aim is to deploy an automatic diagnosis system in a real network. Furthermore, once the technique has been decided, e.g., fuzzy logic, operators normally design several standalone diagnosis models. This is due to the fact that, firstly, different troubleshooting experts will build different models and secondly, when models are learnt from historical cases, different training datasets will result in different models.

To cope with the limitations of standard classifying systems in terms of accuracy and dataset-dependent performance, ensembles of classifiers arose. Within these, homogeneous and heterogeneous (commonly known as hybrid) ensembles of classifiers may be found, where the former stand for the ensemble of classifiers of the same kind and the latter stand for the combination of different kinds of systems and datasets. Homogeneous ensembles have been widely studied and are still extensively used in different fields (Begum, Chakraborty, & Sarkar, 2015; Liu, Chen, Song, & Han, 2009; Shen & Chou, 2006; Wiezbicki & Ribeiro, 2016). In this paper, a method for the generalized combination of multiple diagnosis systems based on a hybrid ensemble approach is proposed and tested in the context of cellular networks, which to the authors' knowledge is a research area still to be explored. The proposed work describes a method to gather, combine and use the knowledge held by any kind of expert system in any field that makes use of a classifying or diagnosis system. In this work, the proposed method is applied in the fault cause diagnosis in cellular networks, where the expertise may be provided either by a human troubleshooting expert or by a database of cases assessed by automatic diagnosis systems. The proposed method allows combining diagnosis systems in a wide sense, being able to merge both several diagnosis models (expertise) and the tools used for their application (automatic diagnosis techniques) in the form of supervised or unsupervised classifying systems.

Up to now, hybrid ensembles of classifiers are mainly based on a set of baseline systems which must first assess the cases under test and, consequently, provide partial diagnoses which are finally combined into a final decision using a majority vote scheme (Ciocarlie, Lindqvist, Nováczki, & Sanneck, 2013; Gandhi & Pandey, 2015; Wei et al., 2014). This procedure requires a relatively high number of diagnosis techniques to be run in the test stage and, therefore, a noticeable expenditure of computational and time resources. The proposed work, however, presents a method which allows combining the diagnoses that the standalone diagnosis systems would output for a case under test without actually needing them to be run, thus lightening the computational weight of the test stage.

The main contributions of this paper are: 

• A method to combine any number and kind of different stan-

dalone classifiers as well as different sources of expert knowl-
edge in order to get an enhanced performance compared to

that of the base classifiers. In the context of troubleshooting in

cellular networks this comprises the combination of several di-

agnosis models and techniques for the automatic diagnosis. 
• A method to lighten the computational cost of the evaluation

stage in hybrid ensembles of classifiers. This work proposes a

scheme to model and emulate the behavior of every standalone

classifier so that these need not be continuously queried before

combining their partial diagnoses. 

This paper is organized as follows. Section 2 presents the problem formulation. Section 3 introduces the proposed method for combining multiple baseline diagnosis systems. In Section 4 results are analyzed by means of both a network simulator and data from a live LTE network. In Section 5 the future lines of work are outlined. Finally, Section 6 summarizes the main conclusions.

2. Problem formulation

2.1. Root cause analysis in mobile communications networks

In the same way that a patient is diagnosed by a doctor based on the symptoms he shows, the status of a communications network may be diagnosed based on a set of performance indicators. This diagnosis task, also called root cause analysis or troubleshooting, is often carried out by human experts using their knowledge on the underlying relations between the observed indicators and the status of the network. However, the number of symptoms (counters, alarms, KPIs, call traces, etc.) and possible fault causes the expert has to deal with increases as networks grow in size and complexity, which makes this task very difficult and time consuming.

Furthermore, the current manual troubleshooting is a layered task, guided by a Trouble Ticket (TT) system. In this problem solving system, a group of specialists tries first to diagnose and solve the problem by performing some simple checks. If they cannot find the root of the problem, this is raised to a more specialized team (and so on), which performs a deeper study on the symptoms the case exhibits and resorts to field engineers in case they need to make some on site checks.

As a response to this more and more inefficient procedure, automatic diagnosis systems arose in an attempt to imitate the way of acting of troubleshooters. Fig. 1 shows the basic scheme of a system for automatic diagnosis. It is composed of an automatic diagnosis technique and a diagnosis model. The first is

an artificial intelligence system that outputs a diagnosis taking

a set of symptoms (e.g., KPIs) from a test case as its input. The

second represents the knowledge a human expert would have

on the underlying relations between the symptoms and the fault

causes and may take different forms depending on the diagnosis

technique it is destined to work with. For example, a diagnosis

model may consist of the parameters (e.g., prior probabilities

and probability density functions) required by a given diagnosis

technique (e.g., Bayesian classifier) or a set of rules for other

techniques (e.g., Case-Based Reasoning, CBR). As can be seen in

this figure, the diagnosis model may be built from a set of training

cases by means of a machine learning algorithm or by trou-

bleshooting experts by gathering their knowledge. The proposed

method aims to combine the knowledge acquired by any number

and kind of diagnosis models and automatic diagnosis tech-

niques in an attempt to reduce the errors in fault detection and

diagnosis. 

2.2. Automated diagnosis from the classification theory 

A diagnosis system is a method that given a set of indicators

or symptoms (called case hereafter) intends to infer the cause that

provoked them. In this sense, a diagnosis system acts as a classi-

fying system in which the attributes from the cases to be classi-

fied correspond to the symptoms from the case to be diagnosed,

and the classes to be assigned correspond to the causes to be in-

ferred. This is an issue long time investigated in data mining the-

ory ( Wu et al., 2008 ), and many types of classifiers have been de-

veloped over the years in an attempt to get the maximum infor-

mation the cases under diagnosis could provide. However, no algo-

rithm has proven to be clearly better than the rest for all kinds of

input data by now. One reason for the increasing efforts in the related research is that the performance of a classifier normally depends on the nature and distribution of the data it has to work

with. For this reason, the present paper focuses not only on com-

bining different diagnosis models but on offering the possibility

to combine multiple classifiers in the form of automatic diagnosis

techniques. 

Let us assume we have a set of $M$ fault causes to diagnose and $R$ diagnosis systems (either diagnosis model or technique) to combine, and that each of these systems can have a subset of these causes as their output, namely, $W^r$ for the system $r$. In this scenario, the set of causes a diagnosis system can identify may be different from one system to another. This can be seen in (1), where each row stands for a $W^r$ and the element $w^r_m$ stands for the $m$th fault cause, diagnosed by the $r$th system. According to this, each row may be different from another.

\[
\begin{bmatrix}
w^1_1 & \cdots & w^1_m & \cdots & w^1_M \\
\vdots & \ddots & \vdots & \ddots & \vdots \\
w^r_1 & \cdots & w^r_m & \cdots & w^r_M \\
\vdots & \ddots & \vdots & \ddots & \vdots \\
w^R_1 & \cdots & w^R_m & \cdots & w^R_M
\end{bmatrix}
\tag{1}
\]

In a diagnosis system, a case, $\mathbf{x}$, is characterized by its symptoms, $x_n$, where $\mathbf{x} = \{x_1, x_2, \ldots, x_N\}$, having a total of $N$ possible symptoms. However, each diagnosis system may consider only a subset of these symptoms, namely, $N^r$ for the diagnosis system $r$.

In the context of diagnosis systems for mobile communication networks a case corresponds to an observation or measurement from the network; a symptom may be an event counter, a Key Performance Indicator (KPI), a call trace or an alarm and the causes are seen as the network states, among which the normal and several fault states may be distinguished. In this paper, some results from the theory of classifiers are used, extended and applied in this context in an attempt to combine the knowledge acquired by these $R$ diagnosis systems, developing a more reliable and accurate root cause analysis system for communication networks.

2.3. State-of-the-art in ensemble-based classification algorithms

This section aims to provide a brief survey on the most recently proposed ensemble-based systems, most of which have been used in classifying tasks in areas not related to mobile communications.

Ensembles of classifiers may nowadays be classified into homogeneous and heterogeneous or hybrid. The first stand for those ensembles which put together instances of classifiers of the same kind, e.g., several k-Nearest Neighbor (kNN) classifiers. Conversely, in heterogeneous ensembles a set of classifiers of different kind are put together, e.g., a kNN and a NN (Neural Network). This is the scope of the present work, as the latter also allow the combination of different sources of expert knowledge within a single enhanced diagnosis system.

One of the earliest works on ensemble methods proposed to partition the feature space (i.e., the vector space in which the features of the cases to be diagnosed are defined) and to assign each part to a different classifier which is supposed to be the best for this subset of cases (Dasarathy & Sheela, 1979). This idea has been widely explored and has given birth to the so-called mixture of experts algorithm (Jacobs, Jordan, Nowlan, & Hinton, 1991; Yuksel, Wilson, & Gader, 2012), being the paradigm for the classifier-selection type of ensemble methods. Under this approach, only one classifier is working at the same time and its selection is determined by the partition to which the case under test belongs.

Conversely, in classifier-fusion methods all classifiers are usually trained over the entire feature space. The classifier combination process involves merging the individual classifiers to obtain a system that outperforms the standalone classifiers. This is the basis for the widely used bagging and boosting predictors (Breiman, 1996; Freund & Schapire, 1997), AdaBoost being an example of the latter and one of the best known and most used classification algorithms nowadays. Classifier fusion methods can also be divided into those which work with classification labels only and those which make use of a continuous valued output for each classifier for every class. In this case, the outputs can be seen as the support an expert gives to a class in terms of the class-conditional posterior probabilities (Kuncheva, 2002).

Some examples of ensemble methods as enhanced systems for fault disclosure can be found in the literature with many different purposes. In Liu et al. (2009), a homogeneous ensemble of neural networks with cross-validation for fault diagnosis of analog circuits with tolerance is proposed. In Shen and Chou (2006), several kNN classifiers are put together on a majority-vote ensemble to classify the patterns that several proteins may exhibit when folded. In Begum et al. (2015), a homogeneous ensemble of SVM (Support Vector Machine) is proposed to identify different types of cancer from a genetic analysis. Wiezbicki and Ribeiro (2016) proposes a homogeneous ensemble of neural networks, combined by means of a weighted majority vote in a sensor network for the classification of gases.

Regarding the most recent works on hybrid ensembles of classifiers, in Wei et al. (2014) $n$ ensembles are made up by combining $3n$ baseline classifiers. Each ensemble comprises three supervised methods: a decision tree, a support vector machine and a kNN algorithm. In each ensemble, the diagnoses from these baseline classifiers are fused applying a weighted majority vote, where each vote is weighted by the performance each individual classifier shows during a prior training stage. Then, the $n$ resulting diagnoses are combined into a final diagnosis applying a non-weighted majority vote. In this case, all the baseline classifiers must be supervised diagnosis systems, as their performance must be previously known in order to weigh their votes in the first stage. Unlike this, the proposed method allows the user to combine any kind of diagnosis system, either supervised or unsupervised. Even more important, regarding the operation stage, in Wei et al. (2014), whenever a new case is to be diagnosed it must pass through two steps, one of them made up of $3n$ systems which must first each output a diagnosis, resulting in a high computational cost. The method in the proposed work, however, needs the test cases to be assessed only by one step, which, furthermore, only consists of some algebraic calculations. Once the training stage has been performed, new cases will be diagnosed at a minimum computational cost.

Fig. 2. Proposed method for combining diagnosis systems. Stage 1: Construction of the behavior models.

As for Gandhi and Pandey (2015), a two-step method is again proposed. The first step consists of a learning stage for the base classifiers and the second step consists of a majority vote-based combining stage. Again and similar to Wei et al. (2014), every baseline classifier is required to first diagnose every new case in the application step, which results in a high computational cost compared to that from the application (test) step in the proposed method.
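The majority vote schemes discussed above can be illustrated with a minimal sketch. It is not taken from any of the cited works: the function name, the label strings and the weights are hypothetical, and the weights are simply assumed to reflect the performance each baseline classifier showed during training.

```python
from collections import defaultdict


def weighted_majority_vote(labels, weights=None):
    """Return the label that gathers the largest total weight.

    labels  -- one diagnosis label per baseline classifier
    weights -- optional per-classifier weights (hypothetical, e.g. a
               training accuracy); if omitted, every vote counts as 1
    """
    if weights is None:
        weights = [1.0] * len(labels)  # plain (non-weighted) majority vote
    if len(weights) != len(labels):
        raise ValueError("labels and weights must have the same length")
    totals = defaultdict(float)
    for label, weight in zip(labels, weights):
        totals[label] += weight
    # On a tie, the label that was seen first among the tied ones wins.
    return max(totals, key=totals.get)
```

Note that every baseline classifier has to produce its label before this function can be called, which is the cost in the test stage that the proposed method aims to avoid.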

In the context of cellular networks, Ciocarlie et al. (2013) proposes a hybrid ensemble of classifiers to detect anomalies in the performance indicators of a cell. This work is focused on the fault detection. Unlike this, the proposed work does not just find a performance degradation, but identifies the fault cause behind it. Regarding its implementation, this method relies on the use of a pool of models. New models are added to this pool whenever a change in the configuration parameters of the network takes place. A number of $N_{CM} \times \left( N_{\text{univariate}} \times N^{U}_{KPI} + N_{\text{multivariate}} \times N^{G}_{KPI} \right)$ models, and thus, instances of automatic techniques must be assessed for every single new case under test. In this expression, $N_{CM}$ stands for the number of sets of network configuration parameters considered; $N_{\text{univariate}}$ and $N^{U}_{KPI}$ stand for the number of univariate techniques considered and the number of KPIs acting as their input in each model; and $N_{\text{multivariate}}$ and $N^{G}_{KPI}$ stand for the number of multivariate techniques used and the number of groups of KPIs considered in each model. In Ciocarlie et al. (2013), as in Wei et al. (2014) and Gandhi and Pandey (2015), a high number of baseline classifiers must first be queried before an ensemble decision can be made. Likewise, in all three works the partial decisions meet at a combining stage based on a weighted majority vote.

To the authors' knowledge, no ensemble method for fault cause diagnosis in cellular networks has been proposed as of today.

3. Method for combining multiple automatic diagnosis systems

In this section, a method for combining the knowledge acquired by any number and kind of standalone automatic diagnosis systems by means of a classifier-fusion scheme is proposed. The proposed method consists of two stages: the construction of the behavior models of the automatic diagnosis systems, Section 3.1, and the combination of these models in order to make a more accurate diagnosis on the cases from a testing set, Section 3.2. This can be seen in Figs. 2 and 5. Before this method can be applied, two sets of $N$-dimensional cases must be distinguished: the modeling set and the testing set, where each of these $N$ dimensions stands for a working KPI. The modeling set will be used in the first stage and the testing set in the second.

3.1. Construction of the behavior models

The baseline diagnosis systems are to be combined by means of mixing their models of behavior, which need to be extracted first.

Once the diagnosis model from each diagnosis system has been built (either from training cases via a machine learning method (Khatib, Barco, Gómez-Andrades, Muñoz, & Serrano, 2015), or from the experts' knowledge (Gómez-Andrades et al., 2016)), each diagnosis system can start the classification (Fig. 1). In this stage, every case from the modeling set is diagnosed by the $R$ systems. That is, each system assigns to each case one of the $M$ possible fault causes; in particular, one of the causes that system can discern. This can be seen in Fig. 2, where the case $\mathbf{x}$ acts as the input for the $R$ systems and, in turn, they assign it $R$ diagnosis labels. If the system $r$ diagnoses the case $\mathbf{x}$ with the cause $m$, this case receives the label $w^{r*}_m$. In this way, each diagnosis system makes a different partition of the modeling set into $|W^{r*}|$ disjoint subsets, whose maximum is $|W^r|$, that is, the number of causes that system considers (Fig. 3), where $|A|$ is the number of elements in the set $A$. This leads to finally identify $M^*$ different causes, $M^*$ being the number of elements in the union of $W^{r*}$ over $r$, with $M^* \le M$. According to this, a new matrix may be written from (1), substituting every row (i.e., every $W^r$) by its corresponding $W^{r*}$. Each row would represent one of the partitions of the modeling set and each column would represent how a cause "is seen" by each diagnosis system regarding the KPIs the cases belonging to that $w^r_m$ exhibit.
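The partition of the modeling set described above can be sketched in a few lines. This is only an illustration under stated assumptions: the two toy diagnosis systems, the KPI names (`load`, `sinr_db`), the thresholds and the cause labels are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def partition_modeling_set(modeling_set, diagnose):
    """Split the modeling set into disjoint subsets, one per assigned label."""
    subsets = defaultdict(list)
    for case in modeling_set:
        subsets[diagnose(case)].append(case)
    return dict(subsets)


def observed_causes(partitions):
    """Union of the labels actually assigned by all systems.

    Its size plays the role of M* in the text.
    """
    causes = set()
    for subsets in partitions:
        causes.update(subsets.keys())
    return causes


# Hypothetical toy systems: each maps a case (a dict of KPIs) to a cause.
def system_1(case):
    return "overload" if case["load"] > 0.8 else "normal"


def system_2(case):
    return "interference" if case["sinr_db"] < 0 else "normal"


modeling_set = [
    {"load": 0.9, "sinr_db": 5.0},
    {"load": 0.3, "sinr_db": -2.0},
    {"load": 0.4, "sinr_db": 8.0},
]
# One partition per diagnosis system, i.e. one row of the modified matrix.
partitions = [partition_modeling_set(modeling_set, s)
              for s in (system_1, system_2)]
```

Each entry of `partitions` corresponds to one $W^{r*}$: a cause that a system could identify but never assigned simply does not appear as a key.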

Fig. 3. Modeling set divided into different subsets by means of two different partitions: on the left, the partition the first diagnosis system makes, having $W^1 = \{w^1_1, w^1_2, w^1_3\}$ with $|W^1| = |W^{1*}| = 3$; on the right, the partition the diagnosis system $R$ makes, having $W^R = \{w^R_1, w^R_2, w^R_3\}$ and $W^{R*} = \{w^{R*}_1, w^{R*}_2\}$. In this last case, the diagnosis system $R$ only diagnosed the causes 1 and 2 although being able to also identify the fault cause 3.

Table 1
Families of PDFs considered for the estimation of $p(x_n \mid w^{r*}_m)$.

| Distribution | PDF | Parameters |
|---|---|---|
| Beta | $\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, x^{a-1} (1-x)^{b-1}$ | $a, b$ |
| Normal | $\frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right)$ | $\mu, \sigma$ |
| Log-normal | $\frac{1}{x\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{\ln x-\mu}{\sigma}\right)^2\right)$ | $\mu, \sigma$ |
| Exponential | $\lambda \exp(-\lambda x)$ | $\lambda$ |
| Gen. extreme value | $\frac{1}{\sigma}\, t(x)^{\xi+1} \exp(-t(x))$, with $t(x) = \left(1 + \xi\,\frac{x-\mu}{\sigma}\right)^{-1/\xi}$ if $\xi \neq 0$ and $t(x) = \exp\left(-\frac{x-\mu}{\sigma}\right)$ if $\xi = 0$ | $\mu, \sigma, \xi$ |
| T-location | $\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sigma\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{1}{\nu}\left(\frac{x-\mu}{\sigma}\right)^2\right)^{-\frac{\nu+1}{2}}$ | $\nu, \mu, \sigma$ |
| Nakagami | $\frac{2 m^m}{\Gamma(m)\,\Omega^m}\, x^{2m-1} \exp\left(-\frac{m}{\Omega} x^2\right)$ | $m, \Omega$ |
| Gamma | $\frac{1}{\Gamma(k)\,\theta^k}\, x^{k-1} \exp\left(-\frac{x}{\theta}\right)$ | $k, \theta$ |
| Logistic | $\frac{\exp\left(-\frac{x-\mu}{s}\right)}{s\left(1+\exp\left(-\frac{x-\mu}{s}\right)\right)^2}$ | $\mu, s$ |
| Log-logistic | $\frac{(\beta/\alpha)(x/\alpha)^{\beta-1}}{\left(1+(x/\alpha)^{\beta}\right)^2}$ | $\alpha, \beta$ |
| Weibull | $\frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} \exp\left(-\left(\frac{x}{\lambda}\right)^k\right)$ | $\lambda, k$ |
| Rayleigh | $\frac{x}{\sigma^2} \exp\left(-\frac{1}{2}\left(\frac{x}{\sigma}\right)^2\right)$ | $\sigma$ |
| Rice | $\frac{x}{\sigma^2} \exp\left(-\frac{x^2+\nu^2}{2\sigma^2}\right) I_0\left(\frac{x\nu}{\sigma^2}\right)$ | $\nu, \sigma$ |

It should be noticed that each of these $M^*$ subsets contains a number of $|N^r|$-dimensional cases. At this point, the behavior of the diagnosis system $r$ is modeled through the estimation of the statistical distributions of the $|N^r|$ KPIs for the cases belonging to $W^{r*}$. That is, the behavior of each diagnosis system is modeled by means of $|N^r| \times M^*$ PDFs. The estimated statistical distribution of the $n$th KPI for the subset of cases diagnosed as $m$ by the diagnosis system $r$ is $p(x_n \mid w^{r*}_m)$. The choice of the PDF that estimates each one of these distributions is done according to the maximum likelihood (ML) criterion. To do so, some families of PDFs are considered in the fitting procedure (Table 1). In a first step, the distribution of the KPI $x_n$ from the cases labeled as $w^{r*}_m$ is fitted according to the ML criterion with each one of the considered families of PDFs. This results in a set of candidates for estimating its distribution. These PDFs are then sorted by their likelihood and the one with the maximum value is chosen to be the estimation for the KPI.
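The selection step just described can be sketched as follows. For brevity, and as an assumption of this sketch, only two of the families are considered (normal and exponential), because their ML estimates have a closed form; the remaining families would need a numerical optimizer. All function names are hypothetical.

```python
import math


def fit_normal(samples):
    """ML fit of a normal PDF; returns parameters and log-likelihood."""
    n = len(samples)
    mu = sum(samples) / n
    # The ML estimate of the variance divides by n (not n - 1).
    sigma = math.sqrt(sum((x - mu) ** 2 for x in samples) / n)
    loglik = sum(
        -math.log(sigma * math.sqrt(2 * math.pi))
        - 0.5 * ((x - mu) / sigma) ** 2
        for x in samples
    )
    return {"family": "normal", "params": (mu, sigma), "loglik": loglik}


def fit_exponential(samples):
    """ML fit of an exponential PDF (assumes non-negative samples)."""
    lam = len(samples) / sum(samples)
    loglik = sum(math.log(lam) - lam * x for x in samples)
    return {"family": "exponential", "params": (lam,), "loglik": loglik}


def select_by_max_likelihood(samples, fitters=(fit_normal, fit_exponential)):
    """Fit every candidate family and keep the one with highest likelihood."""
    candidates = [fit(samples) for fit in fitters]
    return max(candidates, key=lambda c: c["loglik"])
```

Comparing log-likelihoods instead of likelihoods gives the same ranking and avoids numerical underflow when the subset contains many cases.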

The reason for considering these families of PDFs is to get the best estimation of the distribution of the KPI $x_n$ given its belonging to $w^{r*}_m$. Fig. 4a shows a normalized histogram of the KPI "95th percentile RSRP" from the cases labeled as $w^{r*}_m$. In this figure, two families of PDFs have been used in an attempt to fit the underlying histogram, the normal and the generalized extreme value. As can be seen, the latter fits it better, resulting in a higher value in a likelihood-ratio test.

While some KPIs are counters and they do not have an upper limit, there are others that are inherently bounded, as they are defined as a ratio. Normally, the beta PDF is used to fit these KPIs, usually limited between zero and one (Barco, Lazaro, Diez, & Wille, 2008). KPIs like the retainability or the accessibility often reach these extreme values, making the resulting fitted beta present asymptotes in these values. To avoid this issue the used beta function $\beta'$ is slightly different from that from Table 1, $\beta$. In this case,

\[
\beta'(x) = (1 - P_0 - P_1)\,\beta(x) + \frac{P_0}{h_\beta}\,\delta(x) + \frac{P_1}{h_\beta}\,\delta(x - 1),
\tag{2}
\]

where $\beta(x)$ stands for the distribution fitted to a set with no extreme values; $P_0$ and $P_1$ stand for the relative frequency of cases with value 0 and 1 respectively; $\delta$ stands for the Dirac delta and $h_\beta$ stands for the step (the resolution) when computing $\beta'$. This can be seen in Fig. 4b, where a normalized histogram for the KPI retainability is shown.
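A possible discretized reading of Eq. (2) is sketched below. It is an interpretation rather than the authors' implementation: the deltas are assumed to become spikes of height $1/h_\beta$ at the two ends of a regular grid, so that each carries a probability mass of $P_0$ or $P_1$. The function names and the restriction to $a, b \geq 1$ are assumptions of this sketch.

```python
import math


def beta_pdf(x, a, b):
    """Beta PDF as listed in Table 1 (this sketch assumes a >= 1, b >= 1)."""
    coef = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return coef * x ** (a - 1) * (1 - x) ** (b - 1)


def modified_beta_on_grid(a, b, p0, p1, h):
    """Evaluate beta'(x) of Eq. (2) on the grid 0, h, 2h, ..., 1.

    The Dirac deltas are approximated by spikes of height 1/h at the
    first and last grid points, so each spike has mass p0 or p1.
    """
    steps = round(1 / h)
    values = []
    for k in range(steps + 1):
        x = k / steps
        value = (1 - p0 - p1) * beta_pdf(x, a, b)
        if k == 0:
            value += p0 / h  # mass of the cases with KPI exactly 0
        if k == steps:
            value += p1 / h  # mass of the cases with KPI exactly 1
        values.append(value)
    return values
```

Under this reading, multiplying the grid values by $h_\beta$ and adding them up gives approximately one, so $\beta'$ remains a valid density at the chosen resolution.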

3.2. Combination of behavior models

This stage uses the cases from the testing set. In the previous stage, the estimated functions have been seen as conditional probability density functions, that is, functions that express how the KPIs are distributed over the cases diagnosed with a given cause by a given system. However, this set of functions may be seen as likelihood functions by just changing the approach. From this point of view, the function depends on $w^{r*}_m$ given that an observation of the random variable $x_n$ (that is, the $n$th KPI) has taken place.

Now, assuming the KPIs are independent among each other, a joint probability function of $w^{r*}_m$, that is, $p(\mathbf{x} \mid w^{r*}_m)$, may be written as

\[
p(\mathbf{x} \mid w^{r*}_m) = \prod_{n \in N^r} p(x_n \mid w^{r*}_m).
\tag{3}
\]

Given (3), and assuming that the prior probability of each cause, $P(w_m^{r*})$, is given by $|w_m^{r*}| / |W^{r*}|$, the a posteriori probability for a diagnosis system $r$ to diagnose a case with the cause $m$ given its KPIs are $\mathbf{x}$ (i.e., $P(w_m^r \mid \mathbf{x})$) can be calculated by just applying Bayes' theorem. That is,

$$P(w_m^r \mid \mathbf{x}) = \begin{cases} \dfrac{p(\mathbf{x} \mid w_m^{r*})\, P(w_m^{r*})}{\sum_{w_i^{r*} \in W^{r*}} p(\mathbf{x} \mid w_i^{r*})\, P(w_i^{r*})} & \text{if } P(w_m^{r*}) > 0 \\[2ex] 0 & \text{if } P(w_m^{r*}) = 0 \end{cases} \qquad (4)$$
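Eqs. (3) and (4) can be sketched for a single diagnosis system as follows. For brevity, every KPI is modeled here with a normal PDF, which is an assumption of the example (the method selects the PDF family per KPI); the cause names, KPI values, model parameters and priors are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal PDF used here as a stand-in behavior model for every KPI."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def posteriors(x, models, priors):
    """Eq. (4) for one system: models[cause] is a list of (mu, sigma) per KPI."""
    joint = {}
    for cause, prior in priors.items():
        if prior == 0.0:
            # Cause never diagnosed by this system: second branch of Eq. (4).
            joint[cause] = 0.0
            continue
        likelihood = 1.0
        for x_n, (mu, sigma) in zip(x, models[cause]):
            likelihood *= normal_pdf(x_n, mu, sigma)  # Eq. (3), independent KPIs
        joint[cause] = likelihood * prior
    evidence = sum(joint.values())
    if evidence == 0.0:
        return {cause: 0.0 for cause in priors}
    return {cause: value / evidence for cause, value in joint.items()}


# Hypothetical system with two KPIs (RSRP in dBm, retainability) and three causes.
models = {
    "lack_of_coverage": [(-110.0, 5.0), (0.95, 0.02)],
    "overload": [(-90.0, 5.0), (0.90, 0.03)],
}
priors = {"lack_of_coverage": 0.4, "overload": 0.6, "non_operating": 0.0}
post = posteriors((-108.0, 0.96), models, priors)
print(post)
```

Note that a cause with zero prior needs no behavior model at all, which mirrors the situation in which a system has never diagnosed that cause.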

At this point, some diagnosis system may not have diagnosed a given cause, as seen in Fig. 3. In such a case, $P(w_m^{r*})$ and thus (4) would be equal to zero. In any case, $M \times R$ a posteriori probabilities may be distinguished. Fig. 5 shows this when a case $\mathbf{y}$ from the testing subset is to be diagnosed. As can be seen in this figure, the KPIs from the case $\mathbf{y}$ act as input values in the behavior models of the $R$ diagnosis systems, i.e., the probability functions $p(\mathbf{y} \mid w_m^{r*})$ for $w_m^{r*} \in W^{r*}$ and $r = 1, \ldots, R$. Then, the a posteriori probabilities $P(w_m^r \mid \mathbf{y})$ are computed using these together with $P(w_m^{r*})$ by means of Bayes' theorem.

Now, these $M \times R$ a posteriori probabilities together with the prior probabilities can be combined over $R$ using some algebraic functions, producing $M$ probabilities of the kind $P(w_m \mid \mathbf{y})_{\text{Rule } t}$ per function used, where $m$ again stands for the cause and $t$ is an index for the rule used in the combination, that is,

$$P(w_m \mid \mathbf{y})_{\text{Rule } t} = f_{\text{Rule } t}\left( P(w_m^1 \mid \mathbf{y}), \ldots, P(w_m^R \mid \mathbf{y});\; P(w_m) \right), \qquad (5)$$

where $P(w_m)$ is defined as the average of $P(w_m^{r*})$ over $r$.

Some rules for the combination of a posteriori probabilities given by several classifying systems are proposed in Kittler, Hatef, Duin, and Matas (1998) and studied further in Kuncheva (2002). In the first, those rules are derived from a maximum a posteriori (MAP) estimation in a multiple random variable scenario in an attempt to lighten the effort of computing several joint probability density functions. These rules are summarized in Table 2.



Fig. 4. (a) Normalized histogram for the KPI 95th percentile RSRP and two fitted PDFs: a generalized extreme value PDF in blue (round markers) and a normal PDF in red (square markers). (b) Normalized histogram for the KPI Retainability and a β′ PDF estimation. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 5. Proposed method for combining diagnosis systems. Stage 2: Combining the behavior models. 

 

At this point, the fault cause with the maximum a posteriori probability is taken as the final diagnosis for each rule of combination, $d_t$. That is,

$$d_{\text{Rule } t} = \arg\max_{m} \left\{ P(w_m \mid \mathbf{y})_{\text{Rule } t} \right\}. \qquad (6)$$

Note that a situation with $M^* < M$ means that there is at least one fault cause that has not been identified by any system. In this case, it would consequently be impossible for that cause to be finally diagnosed.
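The combination step of Eqs. (5) and (6) with the algebraic rules of Table 2 can be sketched as below. The function names, cause names and probabilities are hypothetical; assigning minus infinity to a cause with zero prior is a choice made for this example so that such a cause is never selected, in line with the note on $M^* < M$.

```python
import math
from statistics import median

RULES = ("product", "sum", "max", "min", "median")


def combine(per_system, prior, rule):
    """Combine P(w_m^r | y), r = 1..R, for one cause m following Table 2."""
    R = len(per_system)
    if prior == 0.0:
        # No system ever diagnosed this cause, so it can never be selected.
        return float("-inf")
    if rule == "product":
        return prior ** (-(R - 1)) * math.prod(per_system)
    if rule == "sum":
        return (1 - R) * prior + sum(per_system)
    if rule == "max":
        return (1 - R) * prior + R * max(per_system)
    if rule == "min":
        return prior ** (-(R - 1)) * min(per_system)
    if rule == "median":
        return median(per_system)
    raise ValueError(f"unknown rule: {rule}")


def diagnose(posteriors, priors, rule):
    """posteriors[r][cause] -> cause with the highest combined score, Eq. (6)."""
    scores = {
        cause: combine([p[cause] for p in posteriors], priors[cause], rule)
        for cause in priors
    }
    return max(scores, key=scores.get)


# Hypothetical a posteriori probabilities from R = 2 diagnosis systems.
posteriors = [
    {"overload": 0.7, "lack_of_coverage": 0.3},
    {"overload": 0.6, "lack_of_coverage": 0.4},
]
priors = {"overload": 0.5, "lack_of_coverage": 0.5}
results = {rule: diagnose(posteriors, priors, rule) for rule in RULES}
print(results)
```

The combined scores are not normalized, so only their ranking over the causes is meaningful, which is all that Eq. (6) requires.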
4. Proof of concept

In this section, the proposed method is assessed by combining two different diagnosis models. In the first test, each model is provided by a different expert; in the second test, each model comes from using different machine learning algorithms for building the diagnosis models, provided the same set of training cases.

The proposed method has been evaluated and compared to the baseline systems by means of the following figures of merit:



Table 2. Algebraic rules for the combination of a posteriori probabilities.

| Rule | $P(w_m \mid \mathbf{y})$ |
|---|---|
| Product rule | $P(w_m)^{-(R-1)} \prod_{r=1}^{R} P(w_m^r \mid \mathbf{y})$ |
| Sum rule | $(1 - R)\, P(w_m) + \sum_{r=1}^{R} P(w_m^r \mid \mathbf{y})$ |
| Max rule | $(1 - R)\, P(w_m) + R \max_{r=1}^{R} \{ P(w_m^r \mid \mathbf{y}) \}$ |
| Min rule | $P(w_m)^{-(R-1)} \min_{r=1}^{R} \{ P(w_m^r \mid \mathbf{y}) \}$ |
| Median rule | $\operatorname{med}_{r=1}^{R} \{ P(w_m^r \mid \mathbf{y}) \}$ |

 

 
• Diagnosis Error Rate (DER): it is the ratio of problematic cases diagnosed as a fault cause different from the real one (misclassified cases), $N_{MPC}$, to the total number of problematic cases, $N_{PC}$.
• False Positive Rate (FPR): it is the ratio of normal cases diagnosed as problematic cases, $N_{FP}$, to the total number of normal cases, $N_{NC}$.
• False Negative Rate (FNR): it is the ratio of problematic cases diagnosed as normal cases, $N_{FN}$, to the total number of problematic cases, $N_{PC}$. This is the most critical metric, as it gives an idea of how often the diagnosis system interprets that there is no problem when some cells are actually malfunctioning.

Given these definitions, an Overall Error Rate (OER) may be defined as

$$OER = P_N \cdot FPR + P_{PC} \cdot (FNR + DER), \qquad (7)$$

where $P_N$ stands for the relative frequency of the normal cases and $P_{PC}$ stands for the relative frequency of the faulty cases. This metric is useful to assess every method at a single glance. Since these figures of merit require the true cause to be known, the testing set used will include the real diagnosis.
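The figures of merit above and Eq. (7) can be computed from labeled cases as in the following sketch. It assumes that DER counts only problematic cases diagnosed with a wrong fault cause, while problematic cases diagnosed as normal are counted in FNR; the label names and the toy data are hypothetical.

```python
def error_rates(true_labels, diagnosed_labels, normal="normal"):
    """Return DER, FPR, FNR and OER (Eq. 7) for a set of labeled cases."""
    n_nc = n_pc = n_fp = n_fn = n_mpc = 0
    for truth, diagnosed in zip(true_labels, diagnosed_labels):
        if truth == normal:
            n_nc += 1
            if diagnosed != normal:
                n_fp += 1   # normal case diagnosed as problematic
        else:
            n_pc += 1
            if diagnosed == normal:
                n_fn += 1   # problematic case diagnosed as normal
            elif diagnosed != truth:
                n_mpc += 1  # problematic case with the wrong fault cause
    total = n_nc + n_pc
    der = n_mpc / n_pc if n_pc else 0.0
    fnr = n_fn / n_pc if n_pc else 0.0
    fpr = n_fp / n_nc if n_nc else 0.0
    p_n = n_nc / total if total else 0.0
    p_pc = n_pc / total if total else 0.0
    oer = p_n * fpr + p_pc * (fnr + der)
    return {"DER": der, "FPR": fpr, "FNR": fnr, "OER": oer}


# Toy example: 4 normal cases and 6 problematic cases.
truth = ["normal"] * 4 + ["overload"] * 3 + ["lack_of_coverage"] * 3
diagnosed = ["normal", "normal", "normal", "overload",
             "overload", "lack_of_coverage", "normal",
             "lack_of_coverage", "lack_of_coverage", "lack_of_coverage"]
rates = error_rates(truth, diagnosed)
print(rates)
```

Under this reading, the OER equals the fraction of all cases that receive a wrong diagnosis of any kind (3 out of 10 in the toy example).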

4.1. Combination of diagnosis models devised by multiple experts 

4.1.1. Scenario 

In this test, cases are provided by an LTE RAN simulator (Muñoz et al., 2011). This simulator considers an LTE network composed of 57 macro-cells evenly distributed in space and grouped into 19 three-sector sites. To perform this test, network configuration parameters similar to those used in Gómez-Andrades et al. (2015) and Gómez-Andrades et al. (2016) have been used. They can be seen in Table 3.

Table 3. Simulation parameters for cells normal functioning.

| Parameter | Configuration |
|---|---|
| Cellular layout | Hexagonal grid, 57 cells, cell radius 0.5 km |
| Transmission direction | Downlink |
| Carrier frequency | 2.0 GHz |
| System bandwidth | 1.4 MHz, 6 PRB (Physical Resource Block) |
| Frequency reuse | 1 |
| Propagation model | Okumura-Hata with wrap-around, Log-normal slow fading, $\sigma_{sf}$ = 8 dB and correlation distance = 50 m |
| Channel model | Multipath fading, ETU model |
| Mobility model | Random direction, 3 kph |
| Service model | Full Buffer, Poisson traffic arrival |
| Base station model | Tri-sectorized antenna, SISO, $P_{TXmax}$ = 43 dBm, Downtilt = 9°, Azimuth beamwidth = 70°, Elevation beamwidth = 10° |
| Scheduler | Time domain: Round-Robin, Frequency domain: Best Channel |
| Power control | Equal transmit power per PRB |
| Link Adaptation | Fast, CQI (Channel Quality Indicator) based, perfect estimation |
| Handover | Triggering event = A3, HOM (Handover Margin) = 3 dB, Measurement type = RSRP |
| Radio Link Failure | SINR < −6.9 dB for 500 ms, Mehlführer, Wrulich, Colom Ikuno, Bosanska, and Rupp (2009) |
| Traffic distribution | Evenly distributed in space |
| Time resolution | 100 TTI (Transmission Time Interval) (100 ms) |
| Epoch & KPI time | 100 s |

With this simulator, 1196 cases have been obtained. In this case, training cases are not needed since the diagnosis models have been defined by experts. It is assumed that a detection system is placed before the input of the diagnosis system, so that only the faulty cases are put under test, putting aside the cases belonging to normal functioning. Therefore, in this test only the DER is taken into account.

In this scenario, six typical RAN fault causes have been considered ($M = 6$):

• Excessive downtilt: This situation takes place when the coverage area of a cell is too small, making the signal level at the edge of the cell too weak and causing a high number of handover failures. The quality of the signal in the surroundings of the cell is also decreased.
• Coverage hole: A cell has a coverage hole at some point inside its area when the power received by the user at this point from any cell is not enough to hold the service. This excessive attenuation can be caused by either obstacles or bad RF planning and it mainly produces a high number of call drops.
• Inter-system interference: This fault cause may occur due to other cellular networks, like WCDMA. It is not always an easy issue to solve, since the fault usually comes from an outer system. This fault normally causes both the SINR and the average throughput to decrease.
• Too late handover: A too late handover takes place if a radio link failure occurs while the UE (User Equipment) is moving from one cell to another and the corresponding handover between these cells has not taken place yet. In that case, the UE will request a connection re-establishment from the second cell using the physical cell ID of the first cell and its Common Radio Network Temporary Identifier (C-RNTI) in that first cell, which will alert the second cell that a too late handover has occurred.
• Excessive uptilt: A cell suffers from excessive uptilt when its coverage area is larger than necessary, normally because of a bad configuration of the radiation parameters of the antennas. This situation can result in the overlapping of coverage areas from possibly non-adjacent cells, producing a high number of handovers and call drops in this cell and its neighbors.
• Lack of coverage: A user suffers from weak coverage when the Signal-to-Interference-Plus-Noise Ratio (SINR) measured in the cell is below the minimum level needed to maintain a planned performance requirement because the received power is low.

Table 4. Parameters used for modeling fault causes in Section 4.1 and a priori probabilities for each cause.

| Fault cause | Configuration | $P(\omega_m)$ |
|---|---|---|
| Excessive downtilt | Downtilt = [16, 15, 14]° | 0.18 |
| Coverage hole | �hole = [49, 50, 52, 53] dBm | 0.09 |
| Inter-system interf. | $P_{TXmax}$ = 33 dBm, Downtilt = 15°, Azimuth beamwidth = [30, 60]°, Elevation beamwidth = 10° | 0.1 |
| Too late HO | HOM = [6, 7, 8] dBm | 0.23 |
| Excessive uptilt | Downtilt = [0, 1]° | 0.21 |
| Lack of coverage | $P_{TXmax}$ = [7, 8, 9, 10] dBm | 0.19 |

Table 5. Diagnosis models for the diagnosis systems used in test 1: used thresholds.

| KPI | Thresholds |
|---|---|
| Retainability | [0.973, 0.996] |
| HOSR | [0.899, 0.989] |
| RSRP [dBm] | [−76.9, −72.4] |
| RSRQ [dB] | [−18.8, −18.2] |
| SINR [dB] | [13, 14.5] |
| Throughput [kbps] | [96.2, 111.67] |
| Distance [km] | [0.838, 0.88] |

The simulation parameters used to model these degradations are shown in Table 4, as well as the a priori probability of these causes to take place, given by the experts. In this case, $P(w_m^{1*}) = P(w_m^{2*}) \neq 0 \;\forall m$, so $P(w_m) = P(w_m^{1*}) = P(w_m^{2*})$. As can be seen, several values have been used for modeling a single fault cause, corresponding to lighter and more severe degradations.

In this test, seven observable features or KPIs ($N = 7$) have been used to discern among this set of causes:

• Retainability, given as a percentage. This performance indicator quantifies the ability of the cell to hold the service once accepted by the admission control. It gives an idea of how often a user experiences a call drop.
• Handover success rate (HOSR), given as a percentage. This KPI measures the ability of the network to provide mobility to a user without losing its connection. It can be calculated as the ratio between the number of successful handovers and the total number of HO.
• 95th percentile RSRP, given in dBm. The Reference Signal Received Power (RSRP) is defined as the linear average over the power contributions (in [W]) of the resource elements that carry cell-specific reference signals within the considered measurement frequency bandwidth.
• 5th percentile RSRQ, given in dB. The Reference Signal Received Quality (RSRQ) is a signal quality indicator and is defined as the ratio

$$RSRQ = \frac{N_{PRB} \cdot RSRP}{RSSI}, \qquad (8)$$

where $N_{PRB}$ is the number of resource blocks of the E-UTRA carrier RSSI measurement bandwidth and RSSI stands for the total received power within the measurement bandwidth. That is, it considers the power from the serving cell, the power of the co-channel serving and non-serving cells, the adjacent channel interference and any possible source of noise. In this paper, RSRQ is expressed in dB.
• 95th percentile SINR, given in dB. The Signal-to-Interference-plus-Noise Ratio (SINR) is defined as the ratio between the power of the desired data signal and the sum of the powers of all inter-cell interferences and the noise.
• 95th percentile distance, given in km. This KPI measures the distance between users and their serving cell. It can be estimated from the transmission delay between them and gives an idea of the cell coverage area.
• Average throughput, given in kbps. In LTE systems, the user throughput depends on the SINR experienced by the user through the following equation, 3GPP (c),

$$T_k = \left(1 - BLER(SINR_k)\right) \cdot \frac{D_k}{TTI}, \qquad (9)$$

where BLER is the Block Error Rate obtained from the user's SINR, $D_k$ is the data block payload in bits of user $k$ and TTI is the transmission time interval.
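As a short worked example of Eqs. (8) and (9), the sketch below evaluates RSRQ directly in dB (the linear ratio of Eq. (8) becomes a sum of logarithmic terms) and the throughput of a single user. The input values are hypothetical, and the BLER is passed as a plain number instead of being derived from the SINR, which is a simplification made for the example.

```python
import math


def rsrq_db(n_prb, rsrp_dbm, rssi_dbm):
    """Eq. (8) in logarithmic units: 10*log10(N_PRB) + RSRP[dBm] - RSSI[dBm]."""
    return 10.0 * math.log10(n_prb) + rsrp_dbm - rssi_dbm


def throughput_bps(bler, payload_bits, tti_seconds):
    """Eq. (9) with the BLER given directly instead of BLER(SINR_k)."""
    return (1.0 - bler) * payload_bits / tti_seconds


# Hypothetical measurement on a 6-PRB carrier, as in Table 3.
quality = rsrq_db(n_prb=6, rsrp_dbm=-95.0, rssi_dbm=-70.0)
rate = throughput_bps(bler=0.1, payload_bits=1000, tti_seconds=1e-3)
print(f"RSRQ = {quality:.2f} dB, throughput = {rate / 1e3:.0f} kbps")
```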

In order to show the impact that proper modeling may have on the diagnosis performance of the proposed method, the proportion of cases used for modeling relative to the testing set has been varied from 25% to 75%. To obtain more reliable results when the number of cases is scarce either in the testing or in the modeling set, 50 repetitions have been made per modeling-to-testing ratio, randomizing the cases assigned to each set. Then, the resulting diagnosis error rates have been averaged over the 50 repetitions.
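The repeated random partition described above can be sketched as follows. The modeling-to-testing ratio is interpreted here as the fraction of cases assigned to the modeling set, which is an assumption of the example; the case identifiers and the `evaluate` callback are placeholders for the real cases and for the DER computation.

```python
import random
from statistics import mean


def repeated_splits(cases, modeling_fraction, repetitions, seed=0):
    """Yield (modeling_set, testing_set) pairs from independent random shuffles."""
    rng = random.Random(seed)
    n_modeling = round(modeling_fraction * len(cases))
    for _ in range(repetitions):
        shuffled = list(cases)
        rng.shuffle(shuffled)
        yield shuffled[:n_modeling], shuffled[n_modeling:]


def averaged_metric(cases, modeling_fraction, repetitions, evaluate):
    """Average evaluate(modeling_set, testing_set) over all repetitions."""
    return mean(evaluate(modeling, testing)
                for modeling, testing
                in repeated_splits(cases, modeling_fraction, repetitions))


# Placeholder data and metric: 1196 dummy case ids and the testing-set share.
cases = list(range(1196))
share = averaged_metric(cases, 0.25, 50,
                        lambda modeling, testing: len(testing) / len(cases))
print(f"average share of cases used for testing: {share:.2f}")
```

In an actual experiment, `evaluate` would fit the behavior models on the modeling set and return the DER obtained on the testing set.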

4.1.2. The standalone classifiers

In this test, for a given technique of automatic diagnosis, two diagnosis models are combined, $R = 2$, where each of them is provided by a different expert. This test represents the usual case in cellular networks where each troubleshooting expert defines his own set of rules and KPI thresholds to identify problems. When deploying the diagnosis system in a network, according to the proposed method, instead of choosing one single model, the knowledge from both experts is fused by combining two diagnosis models. Furthermore, both diagnosis models comprise the six fault causes and the seven different KPIs described above. That is, $W^1 = W^2$ with $|W^1| = M$ and $N^1 = N^2$.

The artificial intelligence technique used for these tests is based on a Fuzzy Logic Controller (FLC) (Khatib, Barco, Gómez-Andrades, & Serrano, 2015). This system contains rules, which are composed of the antecedent (the "if ..." part) and the consequent (the "then ..." part), the latter being the cause the fuzzy logic controller assigns to a case if the antecedent is fulfilled according to the fuzzified observable features of the case. On the one hand, Table 5 shows the thresholds used in both diagnosis models. The lower limit stands for the value below which a KPI is considered to be low; the upper limit stands for the value above which a KPI is considered to be high. On the other hand, Table 6 shows the if ..., then ... rules that make up each diagnosis model, given by each expert. From left to right, each column below "KPI" in Table 6 corresponds to the KPIs shown in Table 5. H stands for a high value in that KPI and L for a low value. Regarding the numbering of the diagnoses, 1 means excessive downtilt; 2: coverage hole; 3: inter-system interference; 4: too late handover; 5: excessive uptilt and 6: lack of coverage.
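To make the role of the thresholds and rules concrete, the following sketch applies a crisp (non-fuzzy) simplification of such a rule base: each KPI is mapped to L, H or neither using the limits of Table 5, and the first rule whose antecedent is fully met gives the diagnosis. This is not the fuzzy logic controller itself, which works with membership degrees; the KPI key names, the ASCII "-" used for an unused KPI and the example cases are assumptions made for the illustration.

```python
# Thresholds from Table 5 as (name, low limit, high limit), in Table 6 column order.
THRESHOLDS = [
    ("retainability", 0.973, 0.996),
    ("hosr", 0.899, 0.989),
    ("rsrp_dbm", -76.9, -72.4),
    ("rsrq_db", -18.8, -18.2),
    ("sinr_db", 13.0, 14.5),
    ("throughput_kbps", 96.2, 111.67),
    ("distance_km", 0.838, 0.88),
]

# First three rules of diagnosis model 1 in Table 6; "-" marks an unused KPI.
RULES = [
    ("L L H L - H L", 1),
    ("H H - L L H L", 1),
    ("L - - H H - H", 2),
]


def levels(case):
    """Map each KPI value to "L", "H" or None (between both limits)."""
    observed = []
    for name, low, high in THRESHOLDS:
        value = case[name]
        observed.append("L" if value < low else "H" if value > high else None)
    return observed


def diagnose(case, rules=RULES):
    """Return the diagnosis of the first rule whose antecedent is fully met."""
    observed = levels(case)
    for pattern, diagnosis in rules:
        wanted = pattern.split()
        if all(w == "-" or w == o for w, o in zip(wanted, observed)):
            return diagnosis
    return None  # no rule fired


# Hypothetical case with low retainability, HOSR, RSRQ and distance.
case = {"retainability": 0.95, "hosr": 0.85, "rsrp_dbm": -70.0, "rsrq_db": -19.0,
        "sinr_db": 13.5, "throughput_kbps": 120.0, "distance_km": 0.5}
print(diagnose(case))
```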

4.1.3. Results

Table 6. Diagnosis models for the diagnosis systems used in test 1: used rules.

| Diagnosis model 1: KPI | Diag. | Diagnosis model 2: KPI | Diag. |
|---|---|---|---|
| L L H L – H L | 1 | – – H L – H L | 1 |
| H H – L L H L | 1 | L – H H – L H | 2 |
| L – – H H – H | 2 | L – H H H – H | 2 |
| L – – H L L H | 3 | L – – H L L H | 3 |
| L L H – L L H | 3 | L – H – L L H | 3 |
| – – H H H H – | 4 | L – H H L L – | 3 |
| – H H – H H – | 4 | L H – H L L – | 3 |
| H – H – H H – | 4 | – H H H L L H | 3 |
| – – H – H H H | 4 | L L – – – H H | 4 |
| H H – – H – H | 4 | L L – L – – H | 4 |
| H H – L – – H | 4 | L L H – H L – | 4 |
| H H – – – H H | 4 | L L H – H – H | 4 |
| H H H – – – H | 4 | L L L L L L – | 4 |
| – – H L H – H | 4 | – – L H – L H | 5 |
| – – H L – L H | 4 | H H – H – L H | 5 |
| L L H L – – H | 4 | H H L – L L H | 5 |
| – – L – L L H | 5 | – – – L L H L | 6 |
| – – L – L H L | 6 | | |
| L – – L L H L | 6 | | |
| – H – L L H L | 6 | | |
| H H – – L H L | 6 | | |
| L L L L L – L | 6 | | |

Table 7. Results of test 1: Combining two versions of the same classifying algorithm.

| Modeling-to-testing ratio | 25% | 50% | 75% |
|---|---|---|---|
| Diagnosis syst. 1, average DER | 13.81% | 13.7% | 13.65% |
| Diagnosis syst. 2, average DER | 16.34% | 16.13% | 16.3% |
| Ens. Method: Max rule average DER | 8.29% | 5.92% | 5.34% |
| Rate of improvement | 60% | 98% | 100% |

Table 8. Main parameters of the real LTE network used in test two.

| Parameter | Configuration |
|---|---|
| Network Layout | Urban area |
| Number of cells | 8679 |
| System bandwidth | 10 MHz |
| Number of PRBs | 50 |
| Frequency reuse factor | 1 |
| Max. Transmitted Power | 46 dBm |
| Max. Transmitted Power of UE | 23 dBm |
| Horizontal HPBW (Half-Power Beam Width) | 65° |
| HOM | 3 dB |
| KPI Time Period | Hourly |
| Number of observed cells | 45 |
| Number of days under observation | 6 days per cell (on average) |
| Size of the dataset | 14,692 labeled cases |

Table 7 shows the diagnosis error rates computed when the max rule is used for combining (Table 2). In Table 7, the average diagnosis error rate and the rate of improvement are shown. This last rate represents the proportion of repetitions (among the 50 that have been performed) in which the diagnosis error rate from the ensemble method is lower than the best one provided by the baseline diagnosis systems. With a 25% modeling-to-testing ratio, only 60% of the iterations show a better ensemble diagnosis error rate than those of its base diagnosis systems, showing, therefore, little improvement in the average diagnosis error rate. This result highlights how the scarcity of cases for modeling impacts the classifying performance of the ensemble. However, if the number of cases used for modeling is doubled, 98% of the iterations show a better diagnosis error rate, which also results in a lower average diagnosis error rate. When the modeling-to-testing ratio is set to 75%, every diagnosis error rate provided by the ensemble method is lower than the lowest provided by its components, reaching 5.34% on average. This means a DER of approximately 1/3 of the lowest DER achieved by the standalone classifiers.

Regarding the DER of the standalone diagnosis systems, it can be seen that it remains nearly constant across the modeling-to-testing ratios. This is because of the randomizing process executed over the labeled cases to divide them into the modeling and testing subsets. When this random permutation is performed a number of times and some subsets (two, in this case) are chosen blindly from this set, the average proportion of cases labeled with a given cause in each of these subsets tends to the proportion of that label in the original set. This is a consequence of the law of large numbers. For this reason, the resulting averaged DER of these baseline systems is independent of the size of the subsets made from the original set of cases.

4.2. Combination of different diagnosis systems on a live network 

Once the proposed method has been tested with cases provided by a simulator, a second test with cases from a real live LTE network has been performed. In this test, the diagnosis models built from two different machine learning algorithms have been combined.

4.2.1. Scenario

An LTE network composed of more than 8000 different cells providing coverage to almost 4 million people has been analyzed. Its vastness makes many different kinds of cells coexist and a wide variety of problematic causes come up. Table 8 summarizes the main parameters of the network. Among all the available candidates, 45 random cells have been chosen to represent the network behavior. These cells have been monitored for almost 6 days on average and their KPIs have been stored on an hourly basis. Taking into account that the state of a single cell varies substantially throughout the day due to traffic fluctuation, several cases have been stored from each cell at different hours, resulting in a total of 14,692 cases. Once these cases were gathered, they were all labeled by the experts, distinguishing four groups of cases ($M = 4$): three kinds of problematic patterns and the normal cell functioning. The causes of malfunctioning that were found are:

• Overload: This fault cause is mainly distinguished by a high number of RRC connections in the cell, which consequently raises the CPU processing load and the number of HO attempts. The accessibility and retainability KPIs also hold values well below those of a cell with normal functioning.
• Lack of coverage: This issue can be identified based on the number of bad coverage evaluation reports, which should be noticeably high.
• Non-operating cell: In this case, and only if the cell is reporting any KPI measurement, most of the reported measurements should be near zero: the retainability, the accessibility, the number of performed HO, the number of RRC connections or the number of coverage reports.

Table 9. Prior probability of occurrence for the causes considered in test two, $P(w_m)$.

| P(Overload) | P(Lack of cov.) | P(Non-operating) | P(Normal) |
|---|---|---|---|
| 0.01 | 0.22 | 0.47 | 0.3 |

Table 10. Diagnosis models for the diagnosis systems used in test 2: used thresholds.

| KPI | Thresholds |
|---|---|
| Retainability | [0.99, 0.997] |
| Accessibility | [0.992, 0.998] |
| Number of RRC Connections | [5846, 20703] |
| Number of ping-pong HO | [18, 83] |
| Number of bad cov. reports | [217, 1070] |
| CPU average load [%] | [22.5, 34.45] |

The a priori probability of occurrence of each class has been computed as the average of $P(w_m^{r*})$ over $r$ within this selection (Table 9). From this table it should be noted that there are more faulty cases than healthy ones. This is because a previous, imperfect stage for detecting faulty cases has been applied, which let through some normal cases that now are to be diagnosed as such.

At this point, 20% of the total number of cases (keeping the proportions shown in Table 9) were used as a training set for the machine learning algorithms and the rest were used to form the modeling and testing sets in a ratio that, as in Section 4.1.3, was varied along the test.

In this test, six of the most representative KPIs in an LTE network have been chosen to discern between the possible diagnoses, $N = 6$:
• Retainability: described in Section 4.1.1.
• Accessibility: It shows the percentage of connections that have got access to the cell over the KPI time period. A low value in this KPI means that many connections have been blocked during the access procedure.
• Number of RRC connections: It is the number of successfully established RRC connections. Related to the Accessibility KPI, it gives an idea of the number of users served by the cell.
• Number of Ping-Pong Handovers: This KPI counts the number of ping-pong HO that take place in the cell over the measurement time period. A high value in this KPI may indicate a bad configuration of the handover policy, as the number of connections that go back and forth between a cell and its neighbors is high for a single call.
• Number of bad coverage reports: It counts the number of times a cell is notified that the UE measured a signal level that fulfills the requirements for Event A2, 3GPP (a). That is, the measured signal level is under a certain threshold.
• CPU average load: It is the average CPU load due to the processes carried out by the cell over the KPI time period.

4.2.2. The standalone classifiers

In this test, the two standalone classifiers used share a similar diagnosis system, a fuzzy-logic controller, which diagnoses the cases according to if ..., then ... rules. The difference resides in the algorithms they use for learning the rules they apply during the diagnosis process. The first is a genetic algorithm and the second is a data driven algorithm (Khatib, Barco, Gómez-Andrades, Muñoz, & Serrano, 2015; Khatib, Barco, Gómez-Andrades, & Serrano, 2015), respectively. In genetic algorithms, three main processes may be distinguished: reproduction, by means of which new individuals are created by either mutation or combination of the previously existing ones; evaluation, or the calculation of the probability of each individual to survive and reproduce; and selection, a process in which some individuals are chosen to survive and reproduce based on the results from the evaluation stage. In contrast, the data driven algorithm first takes a case from the training set and derives the fuzzy rule that covers it. Then, it looks for the cases covered by this rule and scores the rule according to the number of covered cases. New incoming cases are taken until the training set is completely explored. Provided this set of scored rules, the algorithm then fuses them into a lower number of rules in an attempt to maximize the number of cases (and therefore, the score) covered by the resulting fused rules. In these tests, it is assumed that not only faulty cases, but also some normal cases are inputs for the diagnosis stage. This can happen when there is no detection system before the diagnosis system or in the realistic situation in which the detection system has a given probability of error. As in Section 4.1.2, both systems take as possible outputs all the presented diagnoses, making use of the six KPIs shown above. Table 10 shows the thresholds used for these KPIs to consider them high or low and Table 11 shows the rules each machine learning algorithm has derived from the training set. As in Table 6, H stands for a high value of the KPI and L for a low value. The KPIs are sorted in the same way as in Table 10 and the numbering of the diagnoses is 1: CPU overload; 2: lack of coverage; 3: non-operating cell and 4: normal functioning.

4.2.3. Results

Once the standalone diagnosis systems have been trained, the performance metrics DER, FPR, FNR and OER have been computed for both the standalone diagnosis systems and the rules described in Table 2. In this test, the modeling-to-testing ratio has been varied from 10% to 90% in steps of 10%, making 10 iterations per step. As in Section 4.1.3, a random permutation of the cases used for modeling and testing has been done in each of these 10 iterations. The resulting metrics have then been averaged. Table 12 shows the metrics that result from using a proportion of 60% in the modeling-to-testing ratio. This ratio has proved to minimize the values of all the metrics in this test. Unlike in Section 4.1, this scenario is made from real cases and contains outliers, that is, atypical cases. As the modeling-to-testing ratio rises, the probability for these outliers to belong to the modeling set also rises, thus inducing the behavior models to deviate from modeling the trend of the typical cases given a fault cause. On the other hand, if no outliers are taken into account during the model-fitting procedure, their fault cause will not be predictable in the second stage and the error rates will also rise.

Table 11. Diagnosis models for the diagnosis systems used in test 2: used rules.

| Diagnosis model 1 (from genetic algorithm): KPI | Diagnosis | Diagnosis model 2 (from data driven algorithm): KPI | Diagnosis |
|---|---|---|---|
| H H – – L L | 1 | H H L – L L | 1 |
| H – H H L L | 1 | H H – H L L | 1 |
| – H H L – L | 2 | H – H H L L | 1 |
| H – – L – H | 2 | – L H L – H | 2 |
| H – – L H – | 2 | – H H – H H | 2 |
| H – H L – – | 2 | – – H L H H | 2 |
| – L H L – H | 2 | L – H – H H | 2 |
| H H – – H – | 2 | H H H – H – | 2 |
| – H H – H – | 2 | H L – L L H | 2 |
| – – H L H – | 2 | H – H L – H | 2 |
| H – H – H H | 2 | H – H L H – | 2 |
| H – H – H H | 2 | – L L L L L | 3 |
| – L L L L L | 3 | L L L L H – | 4 |
| L L – – H H | 4 | – L L L H H | 4 |
| L L L – H – | 4 | L L L – H H | 4 |
| L L H L L L | 4 | L – L L H H | 4 |
| L L L – – H | 4 | | |
| L – L – H H | 4 | | |
| – – L L H H | 4 | | |
| L L L H – – | 4 | | |

Table 12. Results of test 2: Combining two different algorithms.

| Method | DER | FPR | FNR | OER |
|---|---|---|---|---|
| Training: Data driven algorithm | 2.62% | 16.91% | 6.47% | 11.43% |
| Training: Genetic algorithm | 1.87% | 16.61% | 2.68% | 8.16% |
| Ensemble method: Product rule | 2.6% | 12.21% | 1.32% | 6.2% |
| Ensemble method: Sum rule | 1.78% | 11.55% | 1.25% | 5.59% |
| Ensemble method: Max rule | 1.78% | 11.51% | 1.25% | 5.57% |
| Ensemble method: Min rule | 2.05% | 11.42% | 1.4% | 5.84% |
| Ensemble method: Median rule | 1.78% | 10.67% | 1.34% | 5.39% |
| Ensemble method: Majority vote rule | 1.78% | 11.23% | 1.25% | 5.49% |

As can be seen in Table 12, in most cases the combined diagnosis system outperforms the standalone diagnosis systems. Concretely, the median rule achieves the lowest overall error rate, 5.39%, approximately 2/3 of that of the best standalone diagnosis system. However, the most relevant improvement takes place in the reduction of the FNR, which has been reduced by 46%. The FNR gives an idea of the amount of problematic cases wrongly deemed as normal. It is crucial to make this metric as low as possible, since considering a problematic case as normal may result, in the worst case, in unnoticed service outages and degradation of the network performance. In this regard, the proposed method has proved to successfully reduce the FNR. Other indicators are not as critical. For example, mistaking one fault cause for another may be tolerable to some extent (DER); although the actual problem is not the one the operator thinks it is, he is still aware of a problem in the network. Even considering normal cases as faulty may be tolerable, as the network performance is not really degraded (FPR).

These results can also be seen in the normalized confusion matrices from the diagnosis methods. Fig. 6a shows the normalized confusion matrix for the FLC using the genetic algorithm for rule learning; Fig. 6b shows the confusion matrix when the data driven algorithm was used for learning the rules and Fig. 6c shows the matrix from applying the median rule with a 60% modeling-to-testing ratio in the ensemble method. In these matrices, the elements from the fourth column (excluding the main diagonal) account for the false negatives and the elements from the fourth row (again excluding the main diagonal) account for the false positives. It can be seen how the elements from the main diagonal are reinforced in the ensemble method and how only those diagnoses which are mistaken by both baseline systems are slightly inherited by the ensemble. Fig. 6c also shows graphically how the FPR and FNR dropped with respect to those from the standalone systems.
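The reading of the confusion matrices described above can be sketched as follows, assuming that rows hold the real class, columns hold the diagnosed class and the normal class is placed last (so that it plays the role of the fourth row and column when there are four classes). The class names and the toy data are hypothetical.

```python
def confusion_counts(true_labels, diagnosed_labels, classes):
    """counts[i][j] = number of cases of real class i diagnosed as class j."""
    index = {name: i for i, name in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for truth, diagnosed in zip(true_labels, diagnosed_labels):
        counts[index[truth]][index[diagnosed]] += 1
    return counts


def normalize_rows(counts):
    """Divide each row by its total so that every non-empty row sums to one."""
    normalized = []
    for row in counts:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


def fpr_fnr(counts):
    """FPR from the last row and FNR from the last column (normal class last)."""
    last = len(counts) - 1
    normal_total = sum(counts[last])
    faulty_total = sum(sum(row) for row in counts[:last])
    false_positives = normal_total - counts[last][last]
    false_negatives = sum(row[last] for row in counts[:last])
    fpr = false_positives / normal_total if normal_total else 0.0
    fnr = false_negatives / faulty_total if faulty_total else 0.0
    return fpr, fnr


classes = ["overload", "lack_of_coverage", "normal"]
truth = ["normal"] * 4 + ["overload"] * 3 + ["lack_of_coverage"] * 3
diagnosed = ["normal", "normal", "normal", "overload",
             "overload", "lack_of_coverage", "normal",
             "lack_of_coverage", "lack_of_coverage", "lack_of_coverage"]
counts = confusion_counts(truth, diagnosed, classes)
fpr, fnr = fpr_fnr(counts)
print(normalize_rows(counts), fpr, fnr)
```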

5. Future lines of work 

• Decision templates. The proposed method does not punish or reward the classifiers according to their performance during the training stage. Going a step further from the idea of the weighted majority vote used in Wei et al. (2014), a score system based on class and classifier aware decision templates applied over the a posteriori probabilities $P(w_m^r \mid \mathbf{x})$ from Eq. (4) could be used to improve the overall accuracy.

• Non-parametric PDFs. Using analytically defined, parametric PDFs results in a very light way of representing a statistical behavior, as only their parameters (Table 1) must be stored to model a diagnosis system. However, these distributions may limit to some extent the statistical representation of the features from the training cases, and they may eventually introduce a source of error in the posterior computation of P (w r m ) if these cases follow a distribution that has not been considered. To solve this, future research could focus on using non-parametric PDFs, such as kernel-based ones.

Probability density functions may be classified into parametric and non-parametric functions. The former have analytic expressions and their shape depends on the parameters those functions hold. The latter, however, are defined by means of a kernel function. If all the cases from the dataset are placed along the axis given by a feature of interest (a certain KPI, for example) and a kernel function is centered on every point, an empirical non-parametric PDF results from summing these functions and dividing by the number of cases. The main advantage of this method is its accuracy when modeling an empirical distribution. Its main drawback is that, since it is not defined by any parameters, it must be computed and stored point by point, possibly increasing the storage and computing requirements. This method, however, may be used together with the feature extraction approach described in the next item: first, a reduced set of synthetic KPIs is computed and then their PDFs are accurately estimated with this method.
• Use of synthetic KPIs via feature extraction. As described in Section 3.1, N r × M ∗ × R PDFs must be estimated in order to model all the feature-class-classifier relations. If any of these factors is relatively high, the cost of computing all these PDFs could be prohibitive. For this reason, working with a reduced group of synthetic (extracted) features is proposed, mapping the N original features into $\hat{N}$ synthetic features with $\hat{N} < N$.



Fig. 6. Normalized confusion matrices for the second test. 

 

In recent years, mainly driven by the rise of data mining, many methods for dimensionality reduction have arisen. Among these, it is worth highlighting Principal Component Analysis (PCA) (Jolliffe, 2002). In an N-dimensional vector space, the simplest version of PCA (linear PCA) is a technique that finds the mutually uncorrelated vectors onto which the projection of the samples generates the highest variances. The result is a set of orthogonal vectors sorted in descending order of achieved variance. The first of these vectors is the one onto which the variance of the projection of the samples is maximum. In this sense, the original KPIs constitute the basis of the N-dimensional vector space, whereas the $\hat{N}$ synthetic KPIs correspond to the orthogonal vectors with the highest variance. Strictly speaking, up to N synthetic orthogonal KPIs may be computed. However, only a small set of them, the first $\hat{N}$, is usually enough to account for most of the variance of the data. By applying this technique, which is based on the eigenvalue decomposition of the covariance matrix of the original KPIs, these can be mapped into $\hat{N}$ synthetic KPIs, preserving most of the information contained in the former.
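The last two lines of work can be sketched together: linear PCA first maps the original KPIs into a few synthetic KPIs, and a kernel-based estimate then models the PDF of one of them. The following is only an illustrative sketch on randomly generated data, not the implementation evaluated in this paper. The helper names `pca_fit` and `gaussian_kde`, the Gaussian kernel, the bandwidth of 0.3 and the synthetic dataset are all assumptions made for the example.

```python
import numpy as np


def pca_fit(kpis, n_synthetic):
    """Linear PCA via eigendecomposition of the KPI covariance matrix.

    kpis: array of shape (cases, N). Returns the KPI means, the first
    n_synthetic principal directions (columns) and the fraction of the
    total variance that they account for.
    """
    mean = kpis.mean(axis=0)
    cov = np.cov(kpis - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # sort by descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals[:n_synthetic].sum() / eigvals.sum()
    return mean, eigvecs[:, :n_synthetic], explained


def gaussian_kde(samples, grid, bandwidth):
    """Non-parametric PDF: one Gaussian kernel centered on every sample,
    summed and divided by the number of samples."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=1)


# Hypothetical dataset: 500 cases, N = 6 correlated KPIs driven by 2 factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 6))
kpis = latent @ mixing + 0.05 * rng.normal(size=(500, 6))

# Map the N = 6 original KPIs into 2 synthetic KPIs.
mean, components, explained = pca_fit(kpis, n_synthetic=2)
synthetic = (kpis - mean) @ components

# Estimate the PDF of the first synthetic KPI point by point on a grid.
first = synthetic[:, 0]
grid = np.linspace(first.min() - 3.0, first.max() + 3.0, 400)
pdf = gaussian_kde(first, grid, bandwidth=0.3)

print(f"variance retained by 2 synthetic KPIs: {explained:.3f}")
```

The estimated PDF is stored as one value per grid point, which illustrates the storage drawback noted above; reducing the number of features beforehand limits how many such curves are needed. In the diagnosis setting, a separate estimate would be built for each feature, class and classifier.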

6. Conclusions

A hybrid ensemble of classifiers, devised to merge expert knowledge from different sources, has been presented and assessed in the context of fault cause diagnosis in cellular networks, allowing the expertise from several troubleshooting experts and the knowledge contained in databases of previously diagnosed cases to be combined in order to develop a more accurate diagnosis system.

Unlike the common approach of hybrid ensembles, based on the majority vote of their baseline components, this work proposes a hybrid ensemble of classifiers obtained from the combination of the statistical behavior models of the baseline diagnosis systems. This approach allows the partial diagnoses from the standalone classifiers to be obtained and then combined by just applying some algebraic rules, without actually needing them to assess every case under test, thus reducing the computational cost of usual hybrid ensembles of classifiers.

The method has been tested with two different sources of cases under test: cases provided by an LTE RAN simulator and cases gathered from a live LTE network. Likewise, two use cases have been assessed: the combination of diagnosis models designed by two different network troubleshooting experts and the combination of two diagnosis systems using different learning algorithms. The proposed method has proved to outperform its base components in both tests in terms of the diagnosis error rate, proving to be an effective tool for fault cause diagnosis in current and future self-healing networks.



Acknowledgment 

This work has been partially funded by Optimi-Ericsson, Junta

de Andalucía (Consejería de Ciencia, Innovación y Empresa, Ref.

59288 and Proyecto de Investigación de Excelencia P12-TIC-2905)

and ERDF. 

References 

3GPP (a). Evolved Universal Terrestrial Radio Access (E-UTRA) Radio Resource Control (RRC); Protocol Specification, Rel-13, Version 13.2.0 (2015–12). TS 36.331. 3rd Generation Partnership Project.

3GPP (b). Feasibility study for Further Advancements for E-UTRA (LTE-Advanced), Rel-13, Version 13.0.0 (2015–12). TR 36.912. 3rd Generation Partnership Project.

3GPP (c) (May 2004). OFDM-HSDPA System level simulator calibration (R1-040500). 3GPP TSG-RAN WG1 37. 3rd Generation Partnership Project (3GPP).

3GPP (d). Self-Organizing Networks (SON); Concepts and requirements, Rel-13, Version 13.0.0 (2015–12). TS 32.500. 3rd Generation Partnership Project.

3GPP (e). Self-Organizing Networks (SON); Self-Healing concepts and requirements, Rel-13, Version 13.0.0 (2015–12). TS 32.541. 3rd Generation Partnership Project.

Barco, R., Díez, L., Wille, V., & Lázaro, P. (2009). Automatic diagnosis of mobile communication networks under imprecise parameters. Expert Systems with Applications, 36(1), 489–500. doi: 10.1016/j.eswa.2007.09.030.

Barco, R., Lazaro, P., Diez, L., & Wille, V. (2008). Continuous versus discrete model in autodiagnosis systems for wireless networks. IEEE Transactions on Mobile Computing, 7(6), 673–681. doi: 10.1109/TMC.2008.23.

Barco, R., Lázaro, P., Wille, V., Díez, L., & Patel, S. (2009). Knowledge acquisition for diagnosis model in wireless networks. Expert Systems with Applications, 36(3), 4745–4752. doi: 10.1016/j.eswa.2008.06.042.

Barco, R., Lázaro, P., & Muñoz, P. (2012). A unified framework for Self-Healing in wireless networks. IEEE Communications Magazine, 50(12), 134–142. doi: 10.1109/MCOM.2012.6384463.

Begum, S., Chakraborty, D., & Sarkar, R. (2015). Cancer classification from gene expression based microarray data using SVM ensemble. In 2015 international conference on condition assessment techniques in electrical systems (CATCON) (pp. 13–16). doi: 10.1109/CATCON.2015.7449500.

Breiman, L. (1996). Bagging predictors. In Machine learning (pp. 123–140).

Ciocarlie, G., Lindqvist, U., Nováczki, S., & Sanneck, H. (2013). Detecting anomalies in cellular networks using an ensemble method. In Proceedings of the 9th international conference on network and service management (CNSM 2013) (pp. 171–174). doi: 10.1109/CNSM.2013.6727831.

Dasarathy, B., & Sheela, B. V. (1979). A composite classifier system design: concepts and methodology. Proceedings of the IEEE, 67(5), 708–713. doi: 10.1109/PROC.1979.11321.

Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139. doi: 10.1006/jcss.1997.1504.

Gandhi, I., & Pandey, M. (2015). Hybrid ensemble of classifiers using voting. In 2015 international conference on green computing and internet of things (ICGCIoT) (pp. 399–404). doi: 10.1109/ICGCIoT.2015.7380496.

Gómez-Andrades, A., Muñoz Luengo, P., Khatib, E., de la Bandera Cascales, I., Serrano, I., & Barco, R. (2015). Methodology for the design and evaluation of Self-Healing LTE networks. IEEE Transactions on Vehicular Technology, PP(99), 1–1. doi: 10.1109/TVT.2015.2477945.
Gómez-Andrades, A., Muñoz, P., Serrano, I., & Barco, R. (2016). Automatic root cause analysis for LTE networks based on unsupervised techniques. IEEE Transactions on Vehicular Technology, 65(4), 2369–2386. doi: 10.1109/TVT.2015.2431742.

Jacobs, R. A., Jordan, M. I., Nowlan, S. J., & Hinton, G. E. (1991). Adaptive mixtures of local experts. Neural Computation, 3(1), 79–87. doi: 10.1162/neco.1991.3.1.79.

Jolliffe, I. (2002). Principal component analysis. Springer Series in Statistics (2nd ed.). Springer-Verlag New York.

Khatib, E. J., Barco, R., Gómez-Andrades, A., Muñoz, P., & Serrano, I. (2015). Data mining for fuzzy diagnosis systems in LTE networks. Expert Systems with Applications, 42(21), 7549–7559. doi: 10.1016/j.eswa.2015.05.031.

Khatib, E. J., Barco, R., Gómez-Andrades, A., & Serrano, I. (2015). Diagnosis based on genetic fuzzy algorithms for LTE Self-Healing. IEEE Transactions on Vehicular Technology. doi: 10.1109/TVT.2015.2414296.

Kittler, J., Hatef, M., Duin, R. P. W., & Matas, J. (1998). On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3), 226–239. doi: 10.1109/34.667881.

Kuncheva, L. (2002). A theoretical study on six classifier fusion strategies. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(2), 281–286. doi: 10.1109/34.982906.

Liu, H., Chen, G., Song, G., & Han, T. (2009). Analog circuit fault diagnosis using bagging ensemble method with cross-validation. In International conference on mechatronics and automation, 2009. ICMA 2009 (pp. 4430–4434). doi: 10.1109/ICMA.2009.5246675.

Mehlführer, C., Wrulich, M., Colom Ikuno, J., Bosanska, D., & Rupp, M. (2009). Simulating the long term evolution physical layer. In Proc. of 17th European signal processing conference (EUSIPCO).

Muñoz, P., de la Bandera, I., Ruíz, F., Luna-Ramírez, S., Barco, R., Toril, M., et al. (2011). Computationally-efficient design of a dynamic system-level LTE simulator. International Journal of Electronics and Telecommunications, 57(3), 347–358. doi: 10.1155/2012/802606.

Nováczki, S. (2013). An improved anomaly detection and diagnosis framework for mobile network operators. In 2013 9th international conference on the design of reliable communication networks (DRCN) (pp. 234–241).

Shen, H.-B., & Chou, K.-C. (2006). Ensemble classifier for protein fold pattern recognition. Bioinformatics, 22(14), 1717–1722. doi: 10.1093/bioinformatics/btl170.

Szilágyi, P., & Nováczki, S. (2012). An automatic detection and diagnosis framework for mobile communication systems. IEEE Transactions on Network and Service Management, 9(2), 184–197. doi: 10.1109/TNSM.2012.031912.110155.

Wei, H., Lin, X., Xu, X., Li, L., Zhang, W., & Wang, X. (2014). A novel ensemble classifier based on multiple diverse classification methods. In 2014 11th international conference on fuzzy systems and knowledge discovery (FSKD) (pp. 301–305). doi: 10.1109/FSKD.2014.6980850.

Wiezbicki, T., & Ribeiro, E. P. (2016). Sensor drift compensation using weighted neural networks. In 2016 IEEE conference on evolving and adaptive intelligent systems (EAIS) (pp. 92–97). doi: 10.1109/EAIS.2016.7502497.

Wu, X., Kumar, V., Ross Quinlan, J., Ghosh, J., Yang, Q., Motoda, H., et al. (2008). Top 10 algorithms in data mining. Knowledge and Information Systems, 14(1), 1–37. doi: 10.1007/s10115-007-0114-2.

Yuksel, S., Wilson, J., & Gader, P. (2012). Twenty years of mixture of experts. IEEE Transactions on Neural Networks and Learning Systems, 23(8), 1177–1193. doi: 10.1109/TNNLS.2012.2200299.

