Clustering-based location in wireless networks


Expert Systems with Applications 37 (2010) 6165–6175

Expert Systems with Applications

journal homepage: www.elsevier.com/locate/eswa

Luis Mengual, Oscar Marbán *, Santiago Eibe
Facultad de Informática, Universidad Politécnica de Madrid (UPM), Spain

Article info
Keywords:
Indoor location
Wireless LAN
802.11 Signal strength
Fingerprint techniques
Clustering
0957-4174/$ - see front matter © 2010 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2010.02.111

* Corresponding author. Address: Campus de Montegancedo s/n. DLSIIS, Facultad de Informática, Universidad Politécnica de Madrid (UPM), 28660 Boadilla del Monte, Madrid, Spain.

E-mail addresses: lmengual@fi.upm.es (L. Mengual), omarban@fi.upm.es (O. Marbán), seibe@fi.upm.es (S. Eibe).
Abstract

In this paper, we propose a three-phase methodology (measurement, calibration and estimation) for locating mobile stations (MS) in an indoor environment using wireless technology. Our solution is a fingerprint-based positioning system that overcomes the problem of the relative effect of doors and walls on signal strength and is independent of network device manufacturers. In the measurement phase, our system collects received signal strength indicator (RSSI) measurements from multiple access points. In the calibration phase, our system uses these measurements in a normalization process to create a radio map, a database of RSS patterns. Unlike traditional radio map-based methods, our methodology normalizes RSS measurements collected at different locations (on a floor) and uses artificial neural network (ANN) models to group them into clusters. In the third phase, we use data mining techniques (clustering) to optimize location results. Experimental results demonstrate the accuracy of the proposed method: the system is highly likely to locate a MS in the correct room or in a nearby room.

© 2010 Elsevier Ltd. All rights reserved.
1. Introduction

The use of wireless technology has expanded over recent years. Universities, private companies, as well as stations and airports all provide wireless communication facilities. This type of network can be implemented using what are known as access point devices, which provide total connectivity over an area and a connection to the Internet. Several such devices are sometimes necessary to cover an office, a university or an airport.

One of the services to be implemented on these networks is user location on the premises. Through location and the visualization of a plan of the coverage area, users can navigate the premises as they would using GPS navigation in an outdoor environment. Additionally, a network manager would be able to pinpoint users in such environments and interact with them through their positioning. Indoor positioning technology can also complement mobile-telephone-based GPS user positioning to determine the user's exact location indoors.

In another context, location tracking is an essential feature for
enterprises building business-critical wireless networks. If infor-
mation technology (IT) staff can identify and track the location of
wireless clients and highly mobile assets, they can improve the
accuracy of WLAN planning and deployment, optimize ongoing
network performance, enhance wireless security, and improve
the usefulness and value of important business applications. Loca-
tion tracking provides enhanced visibility and control of air space,
helping IT staff to deploy wireless networks that are as easy to
manage and as effective to deploy as traditional wired networks.

To address lack-of-visibility problems, organizations need a
cost-effective, easy-to-deploy solution for tracking and managing
thousands of Wi-Fi devices and tags across a variety of business
environments.

In heterogeneous environments, e.g. inside a building, though,
the received power is a very complex function of distance, wall
geometry, building infrastructures and obstacles. Even if you have
a detailed model of the building, it takes a lengthy simulation to
solve the direct problem of deriving signal strength given the loca-
tion. This is what has motivated us to consider flexible models
based on functions networks (neural networks) to implement a
system to locate MSs.

The received signal strength (RSS), a measure of the power re-
ceived by the client from an access point (AP), is the key parameter
for establishing the position of a MS in an indoor environment. Tra-
ditionally, WLANs have used different measurement techniques to
derive the position of MSs (Kaemarungsi, 2005; Raniwala & Chiueh,
2002). There are three major categories: closest access point-based
techniques, location pattern or fingerprint and distance or angle
measurement (see Fig. 1).

The closest AP method is the simplest but least accurate way to locate a device or user. The location tracking system identifies only which AP a terminal is connected to and takes that AP as the user location, so a device can be placed only within the total coverage area of a single AP. This area can be quite large and include multiple rooms.

Fig. 1. WLAN positioning techniques.

The main distance measurement-based approach is known as TDOA (Time Difference of Arrival). This technique relies on precise timing between the signal transmitter and receiver: it uses the propagation delay to calculate the distance between them (Golden & Bateman, 2007). Precise synchronization, and therefore a high-accuracy clock in the communication system, is essential. By combining at least three distances from three reference positions, trilateration can be used to estimate the MS's location.
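
As a rough illustration of how three distances can be combined into a position, the sketch below solves the 2-D case by subtracting the circle equations from one another. It is only a sketch under stated assumptions: the function name `locate_from_distances`, the anchor coordinates and the noise-free distances are hypothetical and are not taken from the cited systems.

```python
import math

def locate_from_distances(anchors, distances):
    """Estimate (x, y) from three anchor points and the distance to each.

    Subtracting the first circle equation from the other two removes the
    quadratic terms and leaves a 2x2 linear system in x and y.
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = distances
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is ambiguous")
    # Cramer's rule for the 2x2 system.
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y

# Hypothetical anchors (metres) and exact distances to the point (3, 4).
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
target = (3.0, 4.0)
distances = [math.dist(a, target) for a in anchors]
estimate = locate_from_distances(anchors, distances)
```

With noisy indoor distance estimates the three circles rarely meet in one point, which is one reason the timing-based approaches discussed here are hard to apply indoors.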

Angle of arrival (AOA) is a technique based on angle measure-
ment. This methodology locates the MS by determining the signal
angle of incidence. The location can be estimated using simple geo-
metric relationships, like triangulation (Sanchez, Afonso, Macias, &
Suarez, 2006).

In indoor environments, however, the distance between transmitter and receiver is usually so short that the propagation delay is smaller than the time resolution that the system can measure. Additionally, the MS is surrounded by scattering objects, which results in multiple angles of signal reception. Therefore, the AOA and TDOA approaches are difficult to implement in indoor environments.

Fingerprinting techniques generally only require measurement of
received signal strength or other non-geometric features at several
locations to form a database of location fingerprints (Kaemarungsi,
2006; Widyawan, Klepal, & Pesch, 2007). To estimate the mobile
location, the system needs to first measure the received signal
strength at particular locations and then search for the pattern or
fingerprints with the closest match in the database.

Generally, the deployment of fingerprinting-based positioning
systems can be divided into two phases. First, the location finger-
prints are collected in the off-line or calibration phase by perform-
ing a site survey of the received signal strength (RSS) from multiple
APs. Enough RSS measurements are taken to set up a database or a
table of RSS patterns for predetermined points of the coverage
area. The database of RSS patterns with their respective locations
is called a radio map.

Second, a MS will report a sample measured vector of RSSs from
different APs to a central server (or a group of APs will collect the
RSS measurements from a MS and send it to the server) in the on-
line or estimation phase. The server uses a positioning algorithm to
estimate the location of the MS based on the radio map and reports
the estimate back to the MS (or the application requesting the po-
sition information).
At this point we have several possibilities. The most common
algorithm used to estimate the location computes the Euclidean
distance between the sample measured RSS vector and each finger-
print in the database. For indoor positioning systems, other ad-
vanced algorithms and techniques, such as neural networks
(Battiti, Le Nhat, & Villani, 2002; Debono & Buhagiar, 2004; Yiming,
Biaz, Pandey, & Agrawal, 2006; Youssef & Agrawala, 2005),
probabilistic methods (Chai & Yang, 2007; Kontkanen et al.,
2004; Pan, Kwok, Yang, & Chen, 2006; Robinson & Psaromiligkos,
2005; Seigo & Kawaguchi, 2005) or fuzzy logic (Astrain, Villadan-
gos, Garitagoitia, González-Mendivil, & Cholvim, 2006), have been
introduced to determine the relationship between RSS samples and
the location fingerprint in the radio map.
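
The Euclidean-distance matching mentioned above can be sketched as follows. This is a minimal illustration, not the method evaluated in this paper: the function name `nearest_fingerprint`, the room labels and the RSS values are all hypothetical.

```python
import math

def nearest_fingerprint(radio_map, sample):
    """Return the location whose stored RSS vector is closest to `sample`."""
    return min(radio_map, key=lambda loc: math.dist(radio_map[loc], sample))

# Hypothetical radio map: location label -> mean RSS (dBm) from four APs.
radio_map = {
    "room_A": (-45.0, -70.0, -80.0, -85.0),
    "room_B": (-72.0, -48.0, -75.0, -83.0),
    "room_C": (-84.0, -77.0, -50.0, -66.0),
}

# Hypothetical on-line sample reported by a mobile station.
sample = (-70.0, -50.0, -78.0, -80.0)
best = nearest_fingerprint(radio_map, sample)
```

Because the comparison is made on absolute RSS values, a card that reports on a different scale would be matched against the wrong fingerprints, which is the problem the normalization in Section 3 is meant to address.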

Other approaches have been presented. For example, some
authors have proposed simulating the calibration phase with a
ray-tracing model (Nezafat, Kaveh, Tsuji, & Fukagawa, 2004;
Nuño & Páez-Borrallo, 2006; Zhong, Bin-Hong, Hao-Xing, Hsing-
Yi, & Sarkar, 2001). However, this model requires a highly detailed
model of the building. It could be tricky, if not unfeasible, to get
such a detailed model. One of the main drawbacks of both of these
models is that signal strength prediction is dependent not just on
building layout, but also on the position of many other hard-to-
model components, such as furniture, equipment and human
beings.

So far, fingerprinting techniques with a real calibration phase have attracted more attention because they are the simplest and most effective solution for indoor environments.

Of the solutions that use RSSI to locate a mobile user, some scan
the radioelectric spectrum of every square metre of the coverage
area. Additionally, they only use one card brand. It is unfeasible
to position another mobile user with a different Wi-Fi receiver
using this solution.

The solution proposed here is a fingerprint-based positioning
system that gets significantly better results than other similar sys-
tems because it normalizes measurements and optimizes clusters
formed according to the physical features of the ground plan. In
this way, the proposed solution is valid for any Wi-Fi receiver, as
we account for the relative effect of walls and obstacles as a rela-
tive power increase or decrease.

Additionally, our technique does not require an exhaustive scan
of the coverage area. Actually we present the results of sampling
just one point per location. Our system can adapt to the geometry
of each scenario and operate with any Wi-Fi device (irrespective of
the manufacturer).

Finally, we present the results of implementing our system in practice and the evaluation tests run in a School of Computing building (Department of Computer Languages and Systems and Software Engineering). The experimental results confirm the effectiveness of our methodology.
2. Location estimation system issues

Unfortunately, commercial 802.11 hardware does not provide positioning functionality. Even so, positioning can be implemented using RSS measurements provided by an 802.11 wireless card. RSS is a measure of the power received by the client from an AP and provides information as to the client's location.

In heterogeneous environments, e.g. inside a building, though,
the received power is a very complex function of distance, geome-
try, and materials. This is further complicated by the fact that the
signal propagation is influenced by environmental factors like,
for example, the number of people in the working area, the posi-
tion of walls and other building infrastructures and the materials
they are made of, as well as the multi-path effect (Ali & Nobles,
2007).

Another problem is the RSSI gathered from wireless stations using their device driver. RSSI is an optional parameter that has a value of 0 through to RSSI_Max. This parameter is a measure, made by the physical sublayer, of the energy observed at the antenna used to receive the current frame. RSSI shall be measured from the beginning of the start frame delimiter (SFD) to the end of the header error check (HEC). RSSI is intended to be used in a relative manner. Absolute RSSI reading accuracy is not specified.

There is nothing in the 802.11 standard that stipulates a rela-
tionship between the RSSI value and any particular energy level
as could be measured in mW or dBm. Individual vendors have cho-
sen to provide their own levels of accuracy, granularity, and actual
power range (measured as mW or dBm) and their RSSI value range
(from 0 to RSSI_Max). Some vendors do not provide an RSSI value
and instead convert directly from dBm to percent.

In this study we used our own RSS collecting software to test different wireless cards. Each card collected 300 samples over a period of 5 min (1 sample/s) on the 3rd floor of Building 4 at the School of Computing (Universidad Politécnica de Madrid). Fig. 2 shows the measurable ranges for the different wireless cards in a room. We found a difference of 20 dB between cards (3Com/Ovislink). One card (US Robotics) even reports RSSI as a percentage rather than in dBm.

From these results we conclude that the mapping between the actual RF energy and the RSSI range in a wireless device varies from one vendor to another. Since the range and the measurement of RSS depend on the WLAN card, most fingerprint-based positioning systems have used the same wireless card vendor to collect the location fingerprints and determine the location.

Fig. 2. Measurable RSS range of WLAN cards.

The results of our experimental tests match the study by Kaemarungsi (2006), which shows the spread of values for five card manufacturers and even recommends using a single card manufacturer for positioning systems based on radio map construction. However, our methodology is capable of working with different wireless cards because we normalize all RSS measurements in the calibration phase.
3. Location estimation methodology

The methodology that we propose can locate any mobile user in
any indoor environment equipped with several symmetrically dis-
tributed APs. In the following we describe the proposed system.

3.1. Outline of the system

Our system is a three-phase fingerprint-based positioning solu-
tion, as shown in Fig. 3.

1. In phase 1, RSS measurements are collected from multiple APs
positioned in different rooms on one floor. These will be the
location fingerprints used to create the RSS database, called
the radio map, of that floor.

2. This information is processed in a calibration phase. The cali-
bration phase is divided into several stages. First, we normalize
the collected measurements to identify the relative effect of
walls and obstacles. Second, we use neural networks and the
computed normalized values to group the measurements in
clusters. Finally, we use the physical topology to optimize the
clusters formed.

3. In phase 3, called the estimation phase, our system can locate any MS from its current RSS measurements in one or two rooms on the floor. This is possible because of the cluster optimization logic and the applied normalization methodology.

In our system we chose to use Kohonen networks as a clustering tool. Kohonen networks are a type of self-organizing map (SOM), which is itself a special class of neural network. SOMs are based on competitive learning, where the output nodes compete with each other to be the winning node (or neuron), the only node to be activated by a particular input observation. This clustering technique has the advantage of converting a complex high-dimensional input signal into a two-dimensional discrete map.

Fig. 3. Location estimation methodology structure. [Block diagram with the components RadioMap Builder, Wireless Device Driver, Signal Strength Measurements, Normalization Measures, Clustering, Optimization Clustering Logic, Physical Distribution Plant, RadioMap Optimized Clusters, Estimator and Estimated Location, arranged across the Measurement, Calibration and Estimation phases.]
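
The competitive-learning idea can be illustrated with a very small self-organizing map. The sketch below is not the tool or configuration used in this study: for brevity it assumes a one-dimensional chain of nodes rather than a two-dimensional map, initializes the nodes from evenly spaced training samples, and uses linear decay schedules; the names `train_som` and `best_matching_unit` and all parameter values are hypothetical.

```python
import math
import random

def best_matching_unit(weights, x):
    """Index of the winning node: the one whose weights are closest to x."""
    return min(range(len(weights)), key=lambda j: math.dist(weights[j], x))

def train_som(data, n_nodes=4, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D chain of nodes by competitive learning (assumed setup)."""
    if n_nodes < 2:
        raise ValueError("need at least two nodes")
    rng = random.Random(seed)
    # Assumption: start each node at an evenly spaced training sample.
    step = (len(data) - 1) / (n_nodes - 1)
    weights = [list(data[round(j * step)]) for j in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)               # learning rate shrinks over time
        sigma = sigma0 * (1.0 - frac) + 0.1   # neighbourhood width shrinks too
        order = list(data)
        rng.shuffle(order)
        for x in order:
            win = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Nodes near the winner on the chain are pulled more strongly.
                h = math.exp(-((j - win) ** 2) / (2.0 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return weights

# Two hypothetical, well-separated groups of 2-D observations.
data = [(0.0, 0.0)] * 5 + [(10.0, 10.0)] * 5
weights = train_som(data, n_nodes=2, epochs=20)
```

After training, observations from the two groups activate different winning nodes, which is the property that lets the map be read as a set of clusters.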

In the next section we detail the location estimation system.
3.2. Location estimation algorithm

Our location estimation system consists of the following phases
(see Fig. 3).
3.2.1. Measurement phase
As already mentioned, the RSSI measurements database is built in this phase. To do this, we systematically measure the signal values relative to the available APs. The calculations, which we will describe later, call for at least three APs arranged to form a triangle. If there are more than three APs, they should be positioned symmetrically to optimize coverage. As many measurements are taken as necessary to ensure that the results are significant.

At the end of the process, a vector $S$ is output for each measurement. This vector represents the signal power values received from the $k$ surrounding APs (in this case $k = 4$):

$$S = (s_1, s_2, \ldots, s_k) \tag{1}$$

In this paper we will demonstrate that, according to our meth-
odology, the make of the card used to build the radio map is irrel-
evant. Generally, as explained in Section 2, the values output
would be different. However, later data processing makes them va-
lid for estimating the position of any card.
3.2.2. Calibration phase
This phase can be divided into the stages illustrated in Fig. 4.

Normalization. The normalization of measurements is a key aspect of our methodology. Thanks to normalization, we can take into account the relative effect of walls and obstacles on the signal loss from the surrounding APs. Additionally, the normalization of measurements allows us to use any wireless device to estimate the position of a mobile terminal.

To understand this key point, let's look at the measurements in Fig. 2. The data are evidence of the spread of the received RSSI values across cards. This rules out applying a clustering technique directly to the raw values. However, this problem can be solved by referencing all the power values to sources with fixed coordinates on the map.

The normalization of measurements involves first setting the normalization reference points. We set as many reference points as there are APs in the region under analysis: the coordinates of the measurement point nearest to each AP are taken as the normalization reference. Consequently, we considered four reference point coordinates, $x_1$, $x_4$, $x_6$ and $x_7$, in this study. They correspond to the symmetric positions in which the APs were located.

Second, we calculate the mean value of the measurements taken using the different cards in each of the rooms containing the reference points. This way, we get the following data.

Let $k$ be the number of APs (equal to the number of reference points), and let's take $n$ measurements as follows:

$$m_i = (S_1, S_2, \ldots, S_k), \quad i = 1, \ldots, n \tag{2}$$

measured at reference point $R$:

$$m^R_j = (S^R_1, S^R_2, \ldots, S^R_k) \tag{3}$$

e.g. at $R = P1$:

$$m^{P1}_j = (S^{P1}_1, S^{P1}_2, \ldots, S^{P1}_k) \tag{4}$$

Mean of the measurements at the reference points:

$$M^R = \bar{m}^R = \left( \frac{\sum_{j=1}^{n} S^R_{1j}}{n}, \frac{\sum_{j=1}^{n} S^R_{2j}}{n}, \ldots, \frac{\sum_{j=1}^{n} S^R_{kj}}{n} \right) \tag{5}$$

e.g. at $R = P1$:

$$M^{P1} = \bar{m}^{P1} = \left( \frac{\sum_{j=1}^{n} S^{P1}_{1j}}{n}, \frac{\sum_{j=1}^{n} S^{P1}_{2j}}{n}, \ldots, \frac{\sum_{j=1}^{n} S^{P1}_{kj}}{n} \right) \tag{6}$$


Fig. 4. Detailed calibration phase.

Then we proceed to expand the measurements by making:

$$\forall m_i = (S_1, S_2, \ldots, S_k), \ \forall R = 1, \ldots, k: \quad e^R_i = (\bar{m}^R - m_i) = (M^R - m_i) \tag{7}$$

This would give, for one reference point, the vector $e^R_i$ of size $1 \times k$ and, for $k$ reference points, we would have the vector $E_i$ of size $1 \times (k \times k)$:

$$E_i = (e^{R_1}_i, e^{R_2}_i, \ldots, e^{R_k}_i) \tag{8}$$

Example. If $R = P1, P4, P6, P7$:

$$e^{P1}_i = (M^{P1} - m_i), \quad e^{P4}_i = (M^{P4} - m_i), \quad e^{P6}_i = (M^{P6} - m_i), \quad e^{P7}_i = (M^{P7} - m_i) \tag{9}$$

the expansion vector of the measurements would be

$$E_i = (e^{P1}_i, e^{P4}_i, e^{P6}_i, e^{P7}_i) \tag{10}$$

This expansion process is motivated by the analysis of the
experimental results, as we will see in the evaluation section.
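
The mean and expansion steps described above can be sketched in a few lines. This is an illustrative sketch only: the function names `mean_vector` and `expansion_vector` are hypothetical, and the readings use k = 2 APs with made-up values so that the arithmetic is easy to follow.

```python
def mean_vector(measurements):
    """Component-wise mean of the RSS vectors taken at one reference point."""
    n = len(measurements)
    return tuple(sum(column) / n for column in zip(*measurements))

def expansion_vector(m_i, reference_means):
    """Concatenate e_i^R = M^R - m_i over all reference points R."""
    expanded = []
    for ref_mean in reference_means:
        expanded.extend(ref - s for ref, s in zip(ref_mean, m_i))
    return tuple(expanded)

# Hypothetical readings (dBm) from k = 2 APs at two reference points.
ref_a = [(-40.0, -70.0), (-42.0, -72.0)]
ref_b = [(-71.0, -39.0), (-69.0, -41.0)]
means = [mean_vector(ref_a), mean_vector(ref_b)]

# One hypothetical measurement taken somewhere else on the floor.
m_i = (-55.0, -60.0)
E_i = expansion_vector(m_i, means)
```

Here `E_i` has k x k = 4 components: the first two are the differences with respect to the first reference point and the last two with respect to the second.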

Clustering and optimized clustering. Having built the radio map of normalized measures, our system applies artificial intelligence techniques to estimate the location. We use clustering techniques to form different SOMs from the values of the normalized measurements with respect to each reference point. This way, we get as many SOMs as reference points were set in the system.

Fig. 5. Cluster visualization.

A classification method validation process can be included in
SOM construction, although, as we will see in the examples, this
is not always necessary. The process involves dividing the input
data into two sets: a training set to form the SOMs and a test data
set to test the SOMs and check they are capable of locating a device
whose measurements have not been used to form the SOM. With
this test set, we can evaluate the performance of the SOMs in terms
of percentage of correct locations. The original data from the
experiments run are partitioned at random as follows: 90% for
the training set and 10% for the test set.
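
A random 90%/10% partition of this kind can be sketched as follows. The function name `split_train_test`, the fixed seed and the integer stand-ins for measurement records are assumptions made for illustration; the paper does not state how its partition was implemented.

```python
import random

def split_train_test(samples, test_fraction=0.1, seed=0):
    """Shuffle a copy of `samples` and split off `test_fraction` for testing."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical stand-ins for 100 normalized measurement records.
samples = list(range(100))
train_set, test_set = split_train_test(samples)
```

Fixing the seed makes the partition reproducible, so that the SOMs and their evaluation can be rebuilt from the same split.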

The next step is to optimize the above clusters. To get a preli-
minary evaluation of the cluster obtained from the positions with
respect to their arrangement in the physical space, the cluster is
plotted on a chart. Fig. 5 shows this chart.

As the chart shows, all the clusters contain measures taken at different positions. As our ultimate aim is to define regions in which to locate measurements, we decompose the above chart into planes. Each plane contains the measurements of just one position, as shown in Fig. 6, where the regions containing points from each position are easier to identify. The result of the classification will now be associated with a region instead of a position, as had been the case up to this point. The obvious drawback of this solution is that it outputs an area rather than pinpointing an exact location. Therefore, the location estimation problem requires further processing.


Fig. 6. Decomposition by cluster positions.

3.2.3. Estimation phase
The MS location estimation phase gathers information about the power actually received from the nearby APs. With this information and the information about the optimized regions, we position the MS in one of these regions. There are three possible ways of doing this:

1. Exact location estimation: if the k SOMs all output the same position, this is the MS location. If the chart plots a point located within a region, we do not know the exact location of the MS.

2. Region estimation: the k SOMs output a region, and this region is possibly different for each SOM. This conflict is settled by a voting algorithm, where the position/region that is the most common across all the SOMs is the final result.

3. Optimized region estimation according to the physical distribution of the space: as above, but only the regions that have points that are close together in the physical space are taken into account before the result is shown. For this purpose, rules are built into the system to reduce the number of possible regions for each result.

The experimental results (see Section 4) show that the likelihood of locating a station is high. Therefore, it is possible to build optimized regions with a single wireless card brand and use this infrastructure to locate any other brand of wireless device.
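
The voting step used for region estimation can be sketched as a simple majority count over the SOM outputs. The function name `vote` and the example outputs are hypothetical, and returning every tied region when there is no single winner is an assumption: the text does not say how ties are resolved.

```python
from collections import Counter

def vote(som_outputs):
    """Return the region(s) named by the largest number of SOMs."""
    counts = Counter(som_outputs)
    top = max(counts.values())
    return [region for region, count in counts.items() if count == top]

# Hypothetical outputs of four SOMs; each region is a tuple of positions.
outputs = [("P3", "P6"), ("P3", "P6"), ("P1",), ("P3", "P6")]
winner = vote(outputs)

# A tie: two regions receive two votes each, so both are returned.
tied = vote([("P1",), ("P4", "P5"), ("P1",), ("P4", "P5")])
```

In the optimized variant, the candidate list would additionally be filtered by rules on physical proximity before the result is reported.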
4. Results of evaluation experiments

We applied the location estimation methodology described in Section 3 to implement a location estimation system and conducted evaluation experiments on the 3rd floor of Building 4 at the School of Computing. Fig. 7 gives an overview of the experimental environment. The figure shows the floor of part of our university building. The total surface area is between 300 and 500 m². Faculty members work in separate rooms. The walls are very thin (plaster). This is an obstacle, especially for locating MSs in adjoining offices, because similar powers are received from the APs.

Red circles show the location of each AP that we used in our tests. Green squares show the 12 locations at which we took measurements of the four AP signal strengths. The normalization points are P1, P4, P6 and P7. In the survey step, we observed signal strength for 4 min and took 240 measurements at each location.

The hardware that we used in our experiments included seven different IEEE 802.11 wireless adapters: three wireless PC cards (3Com, Enterasys and US Robotics), three wireless USB 2.0 adapters (SMC, Zoom and Ovislink) and one integrated internal Intel Wi-Fi adapter. The APs are manufactured by US Robotics and 3Com.

We have divided the results into two parts for presentation,
depending on the measurements used to create the clusters. The
first part presents the results of using the measurements collected
from all the wireless cards. The second part uses measurements
from just one SMC wireless card. In this case, the data do not have
to be divided into a training and a test set, as the SMC card data
form the training set and the data of the other cards (not used to
define the SOMs) form the test set. In both cases, we have created
four different SOMs using the normalized measurements with re-
spect to the normalization points P1; P4; P6 and P7 in Fig. 7.

As the first step in the explanation of our method, we are going
to discuss an example using the experimental values (absolute or
pre-normalized values) obtained by different cards in the same
place (any place) on the floor in question:

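$$\begin{aligned}
m_{i,\mathrm{3Com}} &= (-92, -74, -67, -87) \\
m_{i,\mathrm{Enterasys}} &= (-86, -70, -52, -83) \\
m_{i,\mathrm{Ovislink}} &= (-75, -54, -49, -64) \\
m_{i,\mathrm{SMC}} &= (-82, -71, -53, -79) \\
m_{i,\mathrm{USRobotics}} &= (+56, +70, +86, +63)
\end{aligned} \tag{11}$$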

Note that, as mentioned in Section 2, there is a wide spread of
values. This makes it impossible to directly apply the clustering
technique.

Fig. 7. Floor plan.

However, we can transform these absolute values into relative values as explained in Section 3.2.2. This way, by normalizing the values with respect to the reference value P1 in each case, we get the expansion vectors:

$$\begin{aligned}
e^{P1}_{i,\mathrm{3Com}} &= (-0.75, -0.02, -6.21, -2.69) \\
e^{P1}_{i,\mathrm{Enterasys}} &= (-2.19, -0.94, -2.40, +1.83) \\
e^{P1}_{i,\mathrm{Ovislink}} &= (+1.83, -0.11, -7.13, +4.72) \\
e^{P1}_{i,\mathrm{SMC}} &= (-0.29, +0.67, +2.27, +1.99) \\
e^{P1}_{i,\mathrm{USRobotics}} &= (+0.63, -1.48, +0.94, -0.78)
\end{aligned} \tag{12}$$

By normalizing the values with respect to the reference point P4, we get:

$$\begin{aligned}
e^{P4}_{i,\mathrm{3Com}} &= (-5.52, -21.87, +19.63, +0.12) \\
e^{P4}_{i,\mathrm{Enterasys}} &= (-6.61, -39.14, +21.03, +1.18) \\
e^{P4}_{i,\mathrm{Ovislink}} &= (-6.46, -24.61, +19.49, +3.04) \\
e^{P4}_{i,\mathrm{SMC}} &= (-7.07, -31.10, +22.51, -0.01) \\
e^{P4}_{i,\mathrm{USRobotics}} &= (-8.18, -34.09, +22.76, +1.58)
\end{aligned} \tag{13}$$

By normalizing the values with respect to the reference point P6, we get:

$$\begin{aligned}
e^{P6}_{i,\mathrm{3Com}} &= (-7.34, +0.83, +22.18, -28.67) \\
e^{P6}_{i,\mathrm{Enterasys}} &= (-6.9, -7.59, +25.34, -22.46) \\
e^{P6}_{i,\mathrm{Ovislink}} &= (-13.61, +1.34, +23.74, -19.89) \\
e^{P6}_{i,\mathrm{SMC}} &= (-12.67, -5.43, +26.58, -25.12) \\
e^{P6}_{i,\mathrm{USRobotics}} &= (-11.83, -7.51, +29.24, +22.24)
\end{aligned} \tag{14}$$

and, finally, by normalizing the values with respect to the reference value P7, we get:

$$\begin{aligned}
e^{P7}_{i,\mathrm{3Com}} &= (-23.92, +6.12, +35.59, +2.01) \\
e^{P7}_{i,\mathrm{Enterasys}} &= (-30.28, +3.21, +40.38, +1.02) \\
e^{P7}_{i,\mathrm{Ovislink}} &= (-35.82, +9.74, +34.58, +3.03) \\
e^{P7}_{i,\mathrm{SMC}} &= (-31.08, +4.74, +37.37, +1.78) \\
e^{P7}_{i,\mathrm{USRobotics}} &= (-33.73, +2.09, +34.87, +2.24)
\end{aligned} \tag{15}$$

After analysing the experimental results, we find that we have
managed to get very similar relative, card-independent values
using this method. This is because we measure the relative differ-
ence between two points across all the cards. This way, we can en-
ter these new relative data into our clustering system. The
deviations in the measurements, as shown in Fig. 8, are insignifi-
cant compared with the absolute values of the analysed cards.
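
The reduction in spread can be checked directly on the example values. The sketch below copies the absolute readings from Eq. (11) and the P1-normalized vectors from Eq. (12) and computes, for each AP, the standard deviation across the five cards. The helper name `spread_per_ap` and the use of the population standard deviation are assumptions; this is not necessarily the exact calculation behind Fig. 8.

```python
from statistics import pstdev

# Absolute readings of five cards at the same place, from Eq. (11).
absolute = {
    "3Com": (-92, -74, -67, -87),
    "Enterasys": (-86, -70, -52, -83),
    "Ovislink": (-75, -54, -49, -64),
    "SMC": (-82, -71, -53, -79),
    "USRobotics": (56, 70, 86, 63),
}

# The same readings normalized with respect to P1, from Eq. (12).
normalized = {
    "3Com": (-0.75, -0.02, -6.21, -2.69),
    "Enterasys": (-2.19, -0.94, -2.40, 1.83),
    "Ovislink": (1.83, -0.11, -7.13, 4.72),
    "SMC": (-0.29, 0.67, 2.27, 1.99),
    "USRobotics": (0.63, -1.48, 0.94, -0.78),
}

def spread_per_ap(vectors):
    """Standard deviation across cards, one value per AP (assumed metric)."""
    return [pstdev(column) for column in zip(*vectors.values())]

abs_spread = spread_per_ap(absolute)
norm_spread = spread_per_ap(normalized)
```

For every AP the spread of the normalized values is far smaller than that of the absolute values, largely because the card that reports a percentage no longer dominates the comparison.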

Additionally, we could undertake the clustering process from k
perspectives (see Section 3.2.2). This will make the mobile user
location more precise.

4.1. Experiment 1: clustering data for all wireless cards

In the first experiment, we calibrated the system with seven different wireless devices, collecting received power data from the specified APs at the locations of interest.

After building the four SOMs, we named the clearly separate re-
gions in each one. For naming purposes, the SOM is plotted on a
chart. The measurement position indicated in the data is super-
posed, and, as you can see, the positions are clearly grouped (see
Fig. 9).

Having named the regions, each SOM will classify the measure-
ments in a particular location region or area where the wireless de-
vice could be located. Therefore, as we have created four SOMs, we
will have four regions or locations for the device depending on the
AP on which the measurement is normalized. This way, we have
located the wireless device. If the result of the four SOMs is the
same position, this is the device’s location. Otherwise we have to
analyse the different SOM responses to be able to locate the device.

Fig. 10 shows the confusion matrix for the SOMs using the test
set data. The table rows list the positions predicted by the system,
whereas the columns list the real positions. Clearly, the system
correctly predicts position P1, for example, in 98.15% of the cases,
and these points are incorrectly classified in 1.85% of the cases.
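
A confusion matrix with this layout (predicted position in the rows, real position in the columns, each cell a percentage of the cases at that real position) can be sketched as follows. The function name `confusion_percentages` and the six example cases are hypothetical and unrelated to the measured data.

```python
from collections import Counter

def confusion_percentages(actual, predicted):
    """Map (predicted, actual) to the percentage of that actual position's cases."""
    pair_counts = Counter(zip(predicted, actual))
    actual_counts = Counter(actual)
    return {
        (pred, act): 100.0 * count / actual_counts[act]
        for (pred, act), count in pair_counts.items()
    }

# Hypothetical test cases: real positions and the positions the SOMs output.
actual = ["P1", "P1", "P1", "P1", "P3", "P3"]
predicted = ["P1", "P1", "P1", "P4", "P3", "multiple"]
matrix = confusion_percentages(actual, predicted)
```

The diagonal entries, such as `matrix[("P1", "P1")]`, correspond to the percentage of correct predictions for each position.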

These results are quite acceptable for some positions, like
P1; P12 or P4, for example, but not for others, like P3 or P7, where
most of the test cases are classified incorrectly. Many of these
incorrect classifications correspond to the region defined as multi-
ple. This means that the four SOMs indicate that the wireless device
is equally likely to be located in several positions.


Fig. 8. Comparison of absolute/normalized measurement STD.

Fig. 9. Region naming.

Fig. 10. Confusion matrix for SOM predictions.

A voting algorithm has been used to find out the position or
positions (also termed regions) of the wireless device that the
SOMs together locate in multiple positions. In this algorithm, each
SOM indicates the region that it assigns to the device. The final re-
gion is the most voted by all the SOMs. The percentage of correctly
assigned regions is calculated using the following formula:


$$\sum_{Reg \ni Pos} \%Corrected_{Reg} \tag{16}$$

Fig. 11 shows the result of using the location regions instead of
the exact measurement position. Comparing the total results (cor-
rected line) in Figs. 10 and 11, we find that the predictions have
improved in all cases as a result of using the regions. Taking posi-
tion P3, for example, we find that the system gives a correct re-
sponse in 1.85% of the cases using positions, whereas the success
rate is 90.74% if we use regions.
Fig. 11. Regions confusion matrix.

Fig. 12. Position vs. region prediction comparison.

Fig. 12 is a graph comparing the success rate for positions and
regions. It shows that the system's success rate increases
significantly using regions.

The system's success rate increases using the regions, but the
approach has the drawback that some regions are very extensive and
cover a large part of the physical space in which the wireless device
can be located, e.g. [P4, P5, P8, P9] or [P5, P9, P12]. To reduce
the positioning area, at this point we enter information about the
layout of the physical space of the premises where the system is
going to be used, e.g. information related to the proximity of the
regions. The results with the optimized regions are shown in
Fig. 13. Comparing the results with the non-optimized regions
(see Fig. 14), we find that the success rate drops in some cases
and increases in others. The advantage is, though, that the regions
are considerably smaller, and the positioning is much more accurate
in most cases.

Fig. 13. Optimized region results.
Regions: [P1], [P4, P5], [P7, P9, P12], [P3, P6, P12], [P6, P9, P12], [P8, P9].

Position   P1      P12     P3      P4      P5      P6      P7      P8      P9
Corrected  98.15%  94.44%  90.74%  98.15%  61.11%  92.59%  97.78%  75.56%  90.74%

Fig. 14. Comparison between estimated position, region and optimized region.

Fig. 14 is a comparison of the three wireless device location
estimation methods. In view of these results, we conclude that the
most accurate approach is to use regions optimized according to
the physical arrangement of the premises.
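The text does not specify how the layout information is applied, so the following Python sketch is only one possible reading: an extensive region is split into groups of positions that are physically next to each other. The function name and the adjacency map are assumptions made for illustration, not the procedure used in the experiments.

```python
def split_region(region, adjacency):
    """Split a region into groups of physically connected positions.

    region: iterable of position labels.
    adjacency: dict mapping a position to the set of positions next to it
    (hypothetical layout information entered by hand).
    """
    remaining = set(region)
    groups = []
    while remaining:
        start = min(remaining)  # deterministic starting point
        remaining.discard(start)
        stack, group = [start], {start}
        while stack:
            current = stack.pop()
            for neighbour in adjacency.get(current, ()):
                if neighbour in remaining:
                    remaining.discard(neighbour)
                    group.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(group))
    return groups


# Assumed layout: P5 and P9 are adjacent, P12 is far from both.
adjacency = {"P5": {"P9"}, "P9": {"P5"}, "P12": set()}
print(split_region(["P5", "P9", "P12"], adjacency))  # [['P12'], ['P5', 'P9']]
```

Under this assumption the extensive region [P5, P9, P12] becomes two smaller candidate regions, which matches the stated goal of reducing the positioning area.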

4.2. Experiment 2: clustering data for one wireless card

This experiment addresses a real operating environment where
there is at first just one type of wireless device on which to take the
measurements. As wireless devices of other brands are acquired,
they are located without modifying the SOMs or the regions
defined for the first device.

Therefore, unlike the last experiment, where all the measurements
on the available wireless devices were used, we have not split the
pooled data into a training set and a test set. Instead, the SMC
card data form the training set and the data on the other cards (not
used to define the SOMs) make up the test set. Fig. 15 shows the
results of locating devices other than the one whose data were used
to create the SOMs. The success rate is over 90% in most cases.

Fig. 15. Optimized region results.
Regions: [P1], [P7, P8, P9], [P3, P6], [P4, P5], [P7, P9], [P8, P9].

Position   P1      P12     P3      P4      P5      P6      P7      P8      P9
Corrected  97.78%  81.11%  94.89%  96.89%  74.00%  95.33%  88.06%  91.39%  91.11%

Fig. 16 shows the comparison between using the position, the
region and the optimized region if a single card is used to create
the SOMs.

Fig. 16. Comparison between estimated position, region and optimized region.

Finally, we can compare the results of using data from a single
wireless device and using the data from all the devices to create
the SOMs. Fig. 17 shows that the results are quite similar. The
outcome is sometimes better using all the cards, and the use of just
one card gets better results in other cases.

Fig. 17. One wireless device vs. all wireless devices used to build SOMs.
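The data split used in this experiment, with one card for training and all other cards for testing, can be sketched in Python as follows. The record layout, the function name, the label "other-card" and the RSSI values are hypothetical; only the use of the SMC card as the training device comes from the text.

```python
def split_by_card(samples, training_card):
    """Use one card's measurements for training and the rest for testing."""
    train = [s for s in samples if s["card"] == training_card]
    test = [s for s in samples if s["card"] != training_card]
    return train, test


# Made-up measurements: one RSSI value (dBm) per access point.
samples = [
    {"card": "SMC", "position": "P1", "rssi": [-48, -60, -71]},
    {"card": "SMC", "position": "P3", "rssi": [-65, -52, -70]},
    {"card": "other-card", "position": "P1", "rssi": [-50, -63, -75]},
]
train, test = split_by_card(samples, "SMC")
print(len(train), len(test))  # 2 1
```

Splitting by device rather than at random is what makes the test measure whether SOMs built from one card still locate cards of other brands.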

From these results we can conclude that the system can be used
in any environment with a wireless device and at least three APs.

5. Conclusions

We conceived and developed a system that is highly likely to
correctly estimate the location of a mobile terminal indoors in a
room or adjacent rooms.

Unlike many other implemented systems, the resulting system
is independent of the card brand, manufacturer and hardware
technology (USB, PCMCIA). Our system evaluates the position by
relative differences in the measurements taken at places close to the
APs. The system is also adaptable to any environment. All it takes
is system training with any card at locations on the floor, the
normalization of the measurements and the creation of clusters. From
this point on, the system is ready to locate any user with any wireless
card.

Role of the funding source

This work was conducted as part of the CYCIT-funded Project
No. TIN2008-05924.

References

Ali, S., & Nobles, P. (2007). A novel indoor location sensing mechanism for IEEE
802.11 b/g wireless lan. In 4th Workshop on positioning, navigation and
communication (WPNC ’07) (pp. 9–15).

Astrain, J. J., Villadangos, J., Garitagoitia, J. R., González-Mendivil, J. R., & Cholvim, V.
(2006). Fuzzy location and tracking on wireless networks. In International
workshop on mobility management and wireless access (pp. 84–91).

Battiti, R., Le Nhat, T., & Villani, A. (2002). Location-aware computing: A neural
network model for determining location in wireless lans. Technical Report DIT-
5. Universita di Trento, Dipartimento di Informatica e Telecomunicazioni.

Chai, X., & Yang, Q. (2007). Reducing the calibration effort for probabilistic indoor
location estimation. IEEE Transactions on Mobile Computing, 6(6), 649–662. June.
Debono, C. J., & Buhagiar, J. K. (2004). Neural location detection in wireless network.
In 7th European conference on wireless technology.

Golden, S. A., & Bateman, S. S. (2007). Sensor measurements for Wi-Fi location with
emphasis on time-of-arrival ranging. IEEE Transactions on Mobile Computing,
6(10), 1185–1198.

Kaemarungsi, K. (2005). Design of indoor positioning systems based on location
fingerprinting technique. PhD thesis. University Pittsburg.

Kaemarungsi, K. (2006). Distribution of WLAN received signal strength indication
for indoor location determination. In 1st International symposium on wireless
pervasive computing.

Kontkanen, P., Myllymäki, P., Roos, T., Tirri, H., Valtonen, K., & Wettig, H. (2004).
Topics in probabilistic location estimation in wireless networks. In 15th IEEE
international symposium on personal, indoor and mobile radio communications,
Barcelona, Spain. IEEE Press.

Nezafat, M., Kaveh, M., Tsuji, H., & Fukagawa, T. (2004). Localization of wireless
terminals using subspace matching with ray-tracing-based simulations. In
Sensor array and multichannel signal processing workshop proceedings (pp. 623–
627).

Nuño, G., & Páez-Borrallo, J. M. (2006). New location estimation system for wireless
networks based on linear discriminant functions and hidden Markov models.
EURASIP Journal on Applied Signal Processing.

Pan, J. J., Kwok, J. T., Yang, Q., & Chen, Y. (2006). Multidimensional vector regression
for accurate and low-cost location estimation in pervasive computing. IEEE
Transactions on Knowledge and Data Engineering, 18(9), 1181–1193.

Raniwala, Ashish, & Chiueh, Tzi-Cker (2002). Deployment issues in enterprise
wireless lans. Wireless Communications.

Robinson, M., & Psaromiligkos, I. (2005). Received signal strength based location
estimation of a wireless lan client. IEEE Wireless Communications and Networking
Conference, 4, 2350–2354.

Sanchez, D., Afonso, S., Macias, E. M., & Suarez, A. (2006). Devices location in 802.11
infrastructure networks using triangulation. In The 2006 IAENG international
workshop on wireless networks, Hong Kong.

Seigo, I., & Kawaguchi, N. (2005). Bayesian based location estimation system using
wireless lan. In Proceedings of the third IEEE international conference on pervasive
computing and communications workshops (pp. 273–278).

Widyawan, K., Klepal, M., & Pesch, D. (2007). Influence of predicted and measured
fingerprint on the accuracy of RSSI-based indoor location systems. In 4th
Workshop on Positioning, Navigation and Communication (WPNC ’07) (pp. 145–
151).

Yiming, J., Biaz, S., Pandey, S., & Agrawal, P. (2006). Ariadne: A dynamic indoor
signal map construction and localization system. In International conference on
mobile systems, applications and services (pp. 151–164).

Youssef, M., & Agrawala, A. K. (2005). The Horus WLAN location determination
system. In Third international conference on mobile systems, applications, and
services (MobiSys 2005), Seattle, WA, USA.

Zhong, J., Bin-Hong, L., Hao-Xing, W., Hsing-Yi, C., & Sarkar, T. K. (2001). Efficient
ray-tracing methods for propagation prediction for indoor wireless
communications. IEEE Antennas and Propagation Magazine, 43(2).

