Clustering-based location in wireless networks

Luis Mengual, Oscar Marbán *, Santiago Eibe

Facultad de Informática, Universidad Politécnica de Madrid (UPM), Spain

Expert Systems with Applications 37 (2010) 6165–6175. Journal homepage: www.elsevier.com/locate/eswa

Keywords: Indoor location; Wireless LAN 802.11; Signal strength; Fingerprint techniques; Clustering

0957-4174/$ - see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2010.02.111

* Corresponding author. Address: Campus de Montegancedo s/n. DLSIIS, Facultad de Informática, Universidad Politécnica de Madrid (UPM), 28660 Boadilla del Monte, Madrid, Spain. E-mail addresses: lmengual@fi.upm.es (L. Mengual), omarban@fi.upm.es (O. Marbán), seibe@fi.upm.es (S. Eibe).

Abstract

In this paper, we propose a three-phase methodology (measurement, calibration and estimation) for locating mobile stations (MS) in an indoor environment using wireless technology. Our solution is a fingerprint-based positioning system that overcomes the problem of the relative effect of doors and walls on signal strength and is independent of network device manufacturers. In the measurement phase, our system collects received signal strength indicator (RSSI) measurements from multiple access points. In the calibration phase, our system uses these measurements in a normalization process to create a radio map, a database of RSS patterns. Unlike traditional radio map-based methods, our methodology normalizes RSS measurements collected at different locations (on a floor) and uses artificial neural network models (ANNs) to group them into clusters. In the third phase, we use data mining techniques (clustering) to optimize location results. Experimental results demonstrate the accuracy of the proposed method. The results show that the system is highly likely to locate an MS in the correct room or a nearby room.

1. Introduction

The use of wireless technology has expanded over recent years. Universities, private companies, stations and airports all provide wireless communication facilities. This type of network can be implemented using access point devices that provide full connectivity over an area and a connection to the Internet. Several such devices are sometimes necessary to cover an office, a university or an airport.

One of the services to be implemented on these networks is user location on the premises. Through location and the visualization of a plan of the coverage area, users can navigate the premises as they would using GPS navigation in an outdoor environment. Additionally, a network manager would be able to pinpoint users in such environments and interact with them through their positioning. Indoor positioning technology can also complement mobile-telephone-based GPS user positioning to determine the user's exact location indoors.

In another context, location tracking is an essential feature for enterprises building business-critical wireless networks. If information technology (IT) staff can identify and track the location of wireless clients and highly mobile assets, they can improve the accuracy of WLAN planning and deployment, optimize ongoing network performance, enhance wireless security, and improve the usefulness and value of important business applications. Location tracking provides enhanced visibility and control of air space, helping IT staff to deploy wireless networks that are as easy to manage and as effective to deploy as traditional wired networks.

To address lack-of-visibility problems, organizations need a cost-effective, easy-to-deploy solution for tracking and managing thousands of Wi-Fi devices and tags across a variety of business environments.
In heterogeneous environments, such as the inside of a building, the received power is a very complex function of distance, wall geometry, building infrastructures and obstacles. Even with a detailed model of the building, it takes a lengthy simulation to solve the direct problem of deriving signal strength given the location. This is what has motivated us to consider flexible models based on networks of functions (neural networks) to implement a system to locate MSs.

The received signal strength (RSS), a measure of the power received by the client from an access point (AP), is the key parameter for establishing the position of an MS in an indoor environment. Traditionally, WLANs have used different measurement techniques to derive the position of MSs (Kaemarungsi, 2005; Raniwala & Chiueh, 2002). There are three major categories: closest access point-based techniques, location pattern or fingerprint techniques, and distance or angle measurement (see Fig. 1).

Fig. 1. WLAN positioning techniques.

The closest AP method is the simplest but least accurate way to locate a device or user. The location tracking system identifies only that a device is within the total coverage area of a single AP, and it takes the AP to which the terminal connects as the user location. This area can be quite large and include multiple rooms.

The main distance measurement-based approach is known as TDOA (Time Difference of Arrival). This technique relies on the timing precision between the signal transmitter and receiver.
It uses the propagation delay to calculate the distance between transmitter and receiver (Golden & Bateman, 2007). Precise synchronization is therefore very important in such systems. By combining at least three distances from three reference positions, trilateration can be used to estimate the MS’s location. Such techniques require a high-accuracy clock in the communication system.

Angle of arrival (AOA) is a technique based on angle measurement. This methodology locates the MS by determining the signal angle of incidence. The location can be estimated using simple geometric relationships, like triangulation (Sanchez, Afonso, Macias, & Suarez, 2006).

In indoor environments, however, transmitter and receiver are so close together that the propagation delay is usually shorter than the time resolution that the system can measure. Additionally, the MS is surrounded by scattering objects, which results in multiple angles of signal reception. Therefore, the AOA and TDOA approaches are difficult to implement in indoor environments.

Fingerprinting techniques generally only require measurement of received signal strength or other non-geometric features at several locations to form a database of location fingerprints (Kaemarungsi, 2006; Widyawan, Klepal, & Pesch, 2007). To estimate the mobile location, the system first needs to measure the received signal strength at particular locations and then search the database for the pattern or fingerprint with the closest match.

Generally, the deployment of fingerprinting-based positioning systems can be divided into two phases. First, the location fingerprints are collected in the off-line or calibration phase by performing a site survey of the received signal strength (RSS) from multiple APs. Enough RSS measurements are taken to set up a database or a table of RSS patterns for predetermined points of the coverage area. The database of RSS patterns with their respective locations is called a radio map.
Second, in the on-line or estimation phase, an MS reports a sample measured vector of RSSs from different APs to a central server (or a group of APs collects the RSS measurements from an MS and sends them to the server). The server uses a positioning algorithm to estimate the location of the MS based on the radio map and reports the estimate back to the MS (or to the application requesting the position information).

At this point we have several possibilities. The most common algorithm used to estimate the location computes the Euclidean distance between the sample measured RSS vector and each fingerprint in the database. For indoor positioning systems, other advanced algorithms and techniques, such as neural networks (Battiti, Le Nhat, & Villani, 2002; Debono & Buhagiar, 2004; Yiming, Biaz, Pandey, & Agrawal, 2006; Youssef & Agrawala, 2005), probabilistic methods (Chai & Yang, 2007; Kontkanen et al., 2004; Pan, Kwok, Yang, & Chen, 2006; Robinson & Psaromiligkos, 2005; Seigo & Kawaguchi, 2005) or fuzzy logic (Astrain, Villadangos, Garitagoitia, González-Mendivil, & Cholvim, 2006), have been introduced to determine the relationship between RSS samples and the location fingerprint in the radio map.

Other approaches have been presented. For example, some authors have proposed simulating the calibration phase with a ray-tracing model (Nezafat, Kaveh, Tsuji, & Fukagawa, 2004; Nuño & Páez-Borrallo, 2006; Zhong, Bin-Hong, Hao-Xing, Hsing-Yi, & Sarkar, 2001). However, this approach requires a highly detailed model of the building, which can be tricky, if not unfeasible, to obtain. One of the main drawbacks of such models is that signal strength prediction depends not just on building layout, but also on the position of many other hard-to-model components, such as furniture, equipment and human beings.
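The Euclidean nearest-fingerprint baseline mentioned above can be sketched as follows. This is a minimal illustration, not the method proposed in this paper: the radio map contents, the room labels and the function name `nearest_fingerprint` are all hypothetical, and the radio map is assumed to store a single RSS vector per location.

```python
import math


def nearest_fingerprint(sample, radio_map):
    """Return the location whose stored fingerprint is closest to `sample`.

    `radio_map` maps a location label to one RSS vector (one value per AP).
    Storing one vector per location is an assumption made for brevity.
    """
    def distance(a, b):
        # Plain Euclidean distance between two RSS vectors of equal length.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    return min(radio_map, key=lambda loc: distance(sample, radio_map[loc]))


# Hypothetical radio map: three locations, RSS (dBm) from three APs.
radio_map = {
    "room_A": (-60.0, -75.0, -82.0),
    "room_B": (-78.0, -58.0, -70.0),
    "room_C": (-85.0, -72.0, -55.0),
}

# Hypothetical on-line sample reported by a mobile station.
sample = (-76.0, -60.0, -73.0)
print(nearest_fingerprint(sample, radio_map))
```

Because the comparison is made on absolute RSS values, a matcher of this kind inherits any offset between the card that built the radio map and the card being located.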
So far, fingerprinting techniques with a real calibration phase have attracted more attention because they are simple and the most effective solution for indoor environments.

Of the solutions that use RSSI to locate a mobile user, some scan the radioelectric spectrum of every square metre of the coverage area. Additionally, they use only one card brand, so it is unfeasible to position another mobile user with a different Wi-Fi receiver using such a solution.

The solution proposed here is a fingerprint-based positioning system that gets significantly better results than other similar systems because it normalizes measurements and optimizes the clusters formed according to the physical features of the ground plan. In this way, the proposed solution is valid for any Wi-Fi receiver, as we account for the effect of walls and obstacles as a relative power increase or decrease. Additionally, our technique does not require an exhaustive scan of the coverage area: we present the results of sampling just one point per location. Our system can adapt to the geometry of each scenario and operate with any Wi-Fi device (irrespective of the manufacturer).

Finally, we present the results of implementing our system in practice and the evaluation tests run in a School of Computing building (Department of Computer Languages and Systems and Software Engineering). The experimental results confirm the effectiveness of our methodology.

2. Location estimation system issues

Unfortunately, commercial 802.11 hardware does not provide positioning functionality. Even so, positioning can be implemented using RSS measurements provided by an 802.11 wireless card. RSS is a measure of the power received by the client from an AP and provides information as to the client's location. In heterogeneous environments, e.g.
inside a building, though, the received power is a very complex function of distance, geometry, and materials. This is further complicated by the fact that signal propagation is influenced by environmental factors such as the number of people in the working area, the position of walls and other building infrastructures and the materials they are made of, as well as the multi-path effect (Ali & Nobles, 2007).

Another problem is the RSSI gathered from wireless stations using their device driver. RSSI is an optional parameter that has a value of 0 through to RSSI_Max. This parameter is a measure by the physical sublayer of the energy observed at the antenna used to receive the current frame. RSSI shall be measured from the beginning of the start frame delimiter (SFD) to the end of the frame header error check (HEC). RSSI is intended to be used in a relative manner. Absolute RSSI reading accuracy is not specified.

There is nothing in the 802.11 standard that stipulates a relationship between the RSSI value and any particular energy level as could be measured in mW or dBm. Individual vendors have chosen to provide their own levels of accuracy, granularity, and actual power range (measured as mW or dBm) and their own RSSI value range (from 0 to RSSI_Max). Some vendors do not provide an RSSI value and instead convert directly from dBm to percent.

In this study we used our own RSS collecting software to test different wireless cards. Each card collected 300 samples over a period of 5 minutes (1 sample/s) on the 3rd floor of Building 4 at the School of Computing (Universidad Politécnica de Madrid).

Fig. 2 shows the measurable ranges for the different wireless cards in a room. We found a difference of 20 dB between different cards (3Com/Ovislink). There is even a card (US Robotics) that implements RSSI as a percentage and not in dBm units.
Fig. 2. Measurable RSS range of WLAN cards.

From these results we conclude that the mapping between the actual RF energy and the RSSI range in a wireless device varies from one vendor to another. Since the range and the measurement of RSS depend on the WLAN card, most fingerprint-based positioning systems have used the same wireless card vendor to collect the location fingerprints and determine the location.

The results of our experimental tests match the study run in Kaemarungsi (2006), which shows the spread of values for five card manufacturers. Kaemarungsi (2006) even goes as far as to recommend the use of the same card manufacturer for radio map construction-based positioning systems. However, our methodology is capable of working with different wireless cards because we normalize all RSS measurements in the calibration phase.

3. Location estimation methodology

The methodology that we propose can locate any mobile user in any indoor environment equipped with several symmetrically distributed APs. In the following we describe the proposed system.

3.1. Outline of the system

Our system is a three-phase fingerprint-based positioning solution, as shown in Fig. 3.

1. In phase 1, RSS measurements are collected from multiple APs positioned in different rooms on one floor. These will be the location fingerprints used to create the RSS database, called the radio map, of that floor.
2. This information is processed in a calibration phase, which is divided into several stages. First, we normalize the collected measurements to identify the relative effect of walls and obstacles. Second, we use neural networks and the computed normalized values to group the measurements in clusters. Finally, we use the physical topology to optimize the clusters formed.
3. In phase 3, called the estimation phase, our system can locate any MS from its current RSS measurements in one or two rooms on the floor.
This is possible because of the cluster optimization logic and the applied normalization methodology.

In our system we chose to use Kohonen networks as a clustering tool. Kohonen networks are a type of self-organizing map (SOM), which is a special class of neural network. SOMs are based on competitive learning, where the output nodes compete with each other to be the winning node (or neuron), the only node to be activated by a particular input observation. This clustering technique has the advantage of converting a complex high-dimensional input signal into a two-dimensional discrete map.

Fig. 3. Location estimation methodology structure (measurement phase, calibration phase and estimation phase).

In the next section we detail the location estimation system.

3.2. Location estimation algorithm

Our location estimation system consists of the following phases (see Fig. 3).

3.2.1. Measurement phase

As already mentioned, the RSSI measurements database is built in this phase. To do this, we systematically measure the signal values relative to the available APs. The calculations, which we will describe later, call for at least three APs arranged to form a triangle.
If there are more than three APs, they should be positioned symmetrically to optimize coverage. As many measurements are taken as necessary to ensure that the results are significant.

At the end of the process, a vector $S$ is output for each measurement. This vector represents the signal power values received from the $k$ surrounding APs (in this case $k = 4$):

$$S = (s_1, s_2, \ldots, s_k) \tag{1}$$

In this paper we will demonstrate that, according to our methodology, the make of the card used to build the radio map is irrelevant. Generally, as explained in Section 2, the values output by different cards would differ. However, later data processing makes them valid for estimating the position of any card.

3.2.2. Calibration phase

This phase can be divided into the following stages, illustrated in Fig. 4.

Normalization. The normalization of measurements is a key aspect of our methodology. Thanks to normalization, we can take into account the relative effect of walls and obstacles on the signal loss from the surrounding APs. Additionally, the normalization of measurements allows us to use any wireless device to estimate the position of a mobile terminal.

To understand this key point, let's look at the measurements in Fig. 2. The data are evidence of the spread of the received RSSI values across cards, which rules out applying a clustering technique directly to the raw values. However, this problem can be solved by referencing all the power values to fixed coordinate sources on the map.

The normalization of measurements involves first setting the normalization reference points. We set both the reference points and the APs existing in the region under analysis. Specifically, the coordinates of the measurement point nearest to each AP are taken as the normalization reference. Consequently, we considered four reference point coordinates, $x_1$, $x_4$, $x_6$ and $x_7$, in this study. They correspond to the symmetric positions in which the APs were located.
Second, we calculate the mean value of the measures taken using the different cards in each of the rooms with the set reference points. This way, we get the following data. Let $k$ be the number of APs (equal to the number of reference points), and let us take $n$ measurements as follows:

$$m_i = (S_1, S_2, \ldots, S_k), \quad i = 1, \ldots, n \tag{2}$$

Measurements taken at reference point $R$:

$$m_j^{R} = (S_1^{R}, S_2^{R}, \ldots, S_k^{R}) \tag{3}$$

e.g. at $R = P1$:

$$m_j^{P1} = (S_1^{P1}, S_2^{P1}, \ldots, S_k^{P1}) \tag{4}$$

Mean of the measurements at the reference points:

$$M^{R} = \bar{m}^{R} = \left( \frac{\sum_{j=1}^{n} S_{1j}^{R}}{n}, \frac{\sum_{j=1}^{n} S_{2j}^{R}}{n}, \ldots, \frac{\sum_{j=1}^{n} S_{kj}^{R}}{n} \right) \tag{5}$$

e.g. at $R = P1$:

$$M^{P1} = \bar{m}^{P1} = \left( \frac{\sum_{j=1}^{n} S_{1j}^{P1}}{n}, \frac{\sum_{j=1}^{n} S_{2j}^{P1}}{n}, \ldots, \frac{\sum_{j=1}^{n} S_{kj}^{P1}}{n} \right) \tag{6}$$

Fig. 4. Detailed calibration phase.

Then we proceed to expand the measures by making:

$$\forall m_i = (S_1, S_2, \ldots, S_k),\ \forall R = 1, \ldots, k: \quad e_i^{R} = (\bar{m}^{R} - m_i) = (M^{R} - m_i) \tag{7}$$

This gives, for one reference point, the vector $e_i^{R}$ of size $1 \times k$ and, for $k$ reference points, the vector $E_i$ of size $1 \times (k \times k)$:

$$E_i = (e_i^{R_1}, e_i^{R_2}, \ldots, e_i^{R_k}) \tag{8}$$

Example. If $R = P1, P4, P6, P7$:

$$e_i^{P1} = (M^{P1} - m_i), \quad e_i^{P4} = (M^{P4} - m_i), \quad e_i^{P6} = (M^{P6} - m_i), \quad e_i^{P7} = (M^{P7} - m_i) \tag{9}$$

the expansion vector of the measurements would be

$$E_i = (e_i^{P1}, e_i^{P4}, e_i^{P6}, e_i^{P7}) \tag{10}$$

This expansion process is motivated by the analysis of the experimental results, as we will see in the evaluation section.

Clustering and optimized clustering. Having built the radio map of normalized measures, our system applies artificial intelligence techniques to estimate the location. Specifically, we use clustering techniques to form different SOMs from the values of the normalized measurements with respect to each reference point. This way, we get as many SOMs as reference points were set in the system.
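The normalization in Eqs. (5), (7) and (8) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names `mean_vector` and `expansion_vector` and all numeric values are hypothetical, and the example uses $k = 2$ APs and two reference points instead of the four used in the study, purely to keep the arithmetic easy to follow.

```python
def mean_vector(measurements):
    """Eq. (5): component-wise mean M^R of the n vectors taken at one reference point."""
    n = len(measurements)
    k = len(measurements[0])
    return tuple(sum(m[ap] for m in measurements) / n for ap in range(k))


def expansion_vector(m_i, reference_means):
    """Eqs. (7)-(8): concatenate e_i^R = M^R - m_i over all reference points.

    `reference_means` is a list of mean vectors M^R, one per reference point,
    in a fixed order. The result has k values per reference point.
    """
    expanded = []
    for mean_r in reference_means:
        expanded.extend(m_ref - s for m_ref, s in zip(mean_r, m_i))
    return tuple(expanded)


# Hypothetical readings (dBm) from k = 2 APs at two reference points.
at_r1 = [(-40.0, -70.0), (-42.0, -72.0)]
at_r2 = [(-68.0, -44.0), (-72.0, -46.0)]
reference_means = [mean_vector(at_r1), mean_vector(at_r2)]

# Hypothetical measurement m_i taken somewhere else on the floor.
m_i = (-55.0, -60.0)
E_i = expansion_vector(m_i, reference_means)
print(reference_means)  # [(-41.0, -71.0), (-70.0, -45.0)]
print(E_i)              # (14.0, -11.0, -15.0, 15.0)
```

Each block of $k$ values in `E_i` is the input seen by the SOM associated with one reference point, which is why the sketch keeps the reference points in a fixed order.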
A classification method validation process can be included in SOM construction, although, as we will see in the examples, this is not always necessary. The process involves dividing the input data into two sets: a training set to form the SOMs and a test set to check that the SOMs are capable of locating a device whose measurements have not been used to form them. With this test set, we can evaluate the performance of the SOMs in terms of the percentage of correct locations. The original data from the experiments run are partitioned at random as follows: 90% for the training set and 10% for the test set.

The next step is to optimize the above clusters. To get a preliminary evaluation of the clusters obtained from the positions with respect to their arrangement in the physical space, the clusters are plotted on a chart. Fig. 5 shows this chart.

Fig. 5. Cluster visualization.

As the chart shows, all the clusters contain measures taken at different positions. As our ultimate aim is to define regions in which to locate measurements, we decompose the above chart into planes. Each plane contains the measurements of just one position, as shown in Fig. 6. The regions containing points from each position are easier to identify in Fig. 6. The result of the classification will now be associated with a region instead of a position, as had been the case up to this point. The drawback of this solution is that it outputs an area rather than pinpointing an exact location. Therefore, the location estimation problem needs further processing.

Fig. 6. Decomposition by cluster positions.

3.2.3. Estimation step

The MS location estimation phase gathers information about the power actually received from the nearby APs. With this information and the information about the optimized regions, we position the MS in one of these regions. There are three possible ways of doing this:

1. Exact location estimation: if the n SOMs output one position, this is the MS location. If the chart plots a point located within a region, we do not know the exact location of the MS.
2. Region estimation: the n SOMs output a region, and this region is possibly different for each SOM. This conflict is settled by a voting algorithm, where the position/region that is the most common across all the SOMs is the final result.
3. Optimized region estimation according to the physical distribution of the space: as above, but only the regions that have points that are close together in the physical space are taken into account before the result is shown. For this purpose, rules are built into the system to reduce the number of possible regions for each result.

The experimental results (see Section 4) show that the likelihood of locating a station is high. Therefore, it is possible to build optimized regions with a single wireless card brand and use this infrastructure to locate any other brand of wireless device.

4. Results of evaluation experiments

We applied the location estimation methodology described in Section 3 to implement a location estimation system and conducted evaluation experiments on the 3rd floor of Building 4 at the School of Computing. Fig. 7 gives an overview of the experimental environment. The figure shows the floor of part of our university building. The total surface area ranges from 300 to 500 m². Faculty members work in separate rooms. The walls are very thin (plaster). This is an obstacle especially for locating MSs in adjoining offices because similar powers are received from the APs. Red circles show the location of each AP that we used in our tests. Green squares show the locations (12) at which we took measurements of the four AP signal strengths. The normalization points are P1, P4, P6 and P7.

In the survey step, we observed signal strength for 4 min and took 240 measurements at each location.
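The voting rule described for region estimation in Section 3.2.3 can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the function name `vote`, the tuple encoding of regions and the example outputs are hypothetical. The text does not specify how ties are handled, so the sketch simply returns every region that shares the top vote count and leaves the choice to the caller.

```python
from collections import Counter


def vote(som_outputs):
    """Return the region(s) named by the largest number of SOMs.

    `som_outputs` holds one region per SOM; a region is encoded here as a
    tuple of position labels (an assumption made for this sketch).
    Returning all tied regions is also an assumption, since the tie rule
    is not stated in the text.
    """
    counts = Counter(som_outputs)
    top = max(counts.values())
    return [region for region, votes in counts.items() if votes == top]


# Hypothetical outputs of four SOMs for one measurement.
outputs = [("P4", "P5"), ("P4", "P5"), ("P8", "P9"), ("P4", "P5")]
print(vote(outputs))  # [('P4', 'P5')]

# Hypothetical tie: two SOMs vote for each region.
tied = [("P1",), ("P1",), ("P3",), ("P3",)]
print(vote(tied))     # [('P1',), ('P3',)]
```

When all SOMs agree on a single position, the same function reduces to the exact location case, since the returned list then contains that one position.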
Fig. 7. Floor plan.

The hardware that we used in our experiments included seven different IEEE 802.11 wireless adapters: three wireless PC cards (3Com, Enterasys and US Robotics), three wireless USB 2.0 adapters (SMC, Zoom and Ovislink) and one integrated internal Intel Wi-Fi adapter. The APs are manufactured by US Robotics and 3Com.

We have divided the results into two parts for presentation, depending on the measurements used to create the clusters. The first part presents the results of using the measurements collected from all the wireless cards. The second part uses measurements from just one SMC wireless card. In this case, the data do not have to be divided into a training and a test set, as the SMC card data form the training set and the data of the other cards (not used to define the SOMs) form the test set. In both cases, we have created four different SOMs using the normalized measurements with respect to the normalization points P1, P4, P6 and P7 in Fig. 7.

As the first step in the explanation of our method, we discuss an example using the experimental values (absolute or pre-normalized values) obtained by different cards in the same place (any place) on the floor in question:

$$\begin{aligned}
m_{i,\mathrm{3Com}} &= (-92, -74, -67, -87) \\
m_{i,\mathrm{Enterasys}} &= (-86, -70, -52, -83) \\
m_{i,\mathrm{Ovislink}} &= (-75, -54, -49, -64) \\
m_{i,\mathrm{SMC}} &= (-82, -71, -53, -79) \\
m_{i,\mathrm{USRobotics}} &= (+56, +70, +86, +63)
\end{aligned} \tag{11}$$

Note that, as mentioned in Section 2, there is a wide spread of values. This makes it impossible to apply the clustering technique directly.

However, we can transform these absolute values into relative values as explained in Section 3.2.2. This way, by normalizing the
values with respect to the reference value P1 in each case, we get the expansion vectors:

$$\begin{aligned}
e_{i,\mathrm{3Com}}^{P1} &= (-0.75, -0.02, -6.21, -2.69) \\
e_{i,\mathrm{Enterasys}}^{P1} &= (-2.19, -0.94, -2.40, +1.83) \\
e_{i,\mathrm{Ovislink}}^{P1} &= (+1.83, -0.11, -7.13, +4.72) \\
e_{i,\mathrm{SMC}}^{P1} &= (-0.29, +0.67, +2.27, +1.99) \\
e_{i,\mathrm{USRobotics}}^{P1} &= (+0.63, -1.48, +0.94, -0.78)
\end{aligned} \tag{12}$$

By normalizing the values with respect to the reference point P4, we get:

$$\begin{aligned}
e_{i,\mathrm{3Com}}^{P4} &= (-5.52, -21.87, +19.63, +0.12) \\
e_{i,\mathrm{Enterasys}}^{P4} &= (-6.61, -39.14, +21.03, +1.18) \\
e_{i,\mathrm{Ovislink}}^{P4} &= (-6.46, -24.61, +19.49, +3.04) \\
e_{i,\mathrm{SMC}}^{P4} &= (-7.07, -31.10, +22.51, -0.01) \\
e_{i,\mathrm{USRobotics}}^{P4} &= (-8.18, -34.09, +22.76, +1.58)
\end{aligned} \tag{13}$$

By normalizing the values with respect to the reference point P6, we get:

$$\begin{aligned}
e_{i,\mathrm{3Com}}^{P6} &= (-7.34, +0.83, +22.18, -28.67) \\
e_{i,\mathrm{Enterasys}}^{P6} &= (-6.9, -7.59, +25.34, -22.46) \\
e_{i,\mathrm{Ovislink}}^{P6} &= (-13.61, +1.34, +23.74, -19.89) \\
e_{i,\mathrm{SMC}}^{P6} &= (-12.67, -5.43, +26.58, -25.12) \\
e_{i,\mathrm{USRobotics}}^{P6} &= (-11.83, -7.51, +29.24, +22.24)
\end{aligned} \tag{14}$$

and, finally, by normalizing the values with respect to the reference value P7, we get:

$$\begin{aligned}
e_{i,\mathrm{3Com}}^{P7} &= (-23.92, +6.12, +35.59, +2.01) \\
e_{i,\mathrm{Enterasys}}^{P7} &= (-30.28, +3.21, +40.38, +1.02) \\
e_{i,\mathrm{Ovislink}}^{P7} &= (-35.82, +9.74, +34.58, +3.03) \\
e_{i,\mathrm{SMC}}^{P7} &= (-31.08, +4.74, +37.37, +1.78) \\
e_{i,\mathrm{USRobotics}}^{P7} &= (-33.73, +2.09, +34.87, +2.24)
\end{aligned} \tag{15}$$

After analysing the experimental results, we find that this method yields very similar relative, card-independent values. This is because we measure the relative difference between two points across all the cards. This way, we can enter these new relative data into our clustering system. The deviations in the normalized measurements, as shown in Fig. 8, are insignificant compared with those of the absolute values of the analysed cards.

Fig. 8. Comparison of absolute/normalized measurement STD.

Additionally, we can undertake the clustering process from $k$ perspectives (see Section 3.2.2). This makes the mobile user location more precise.

4.1. Experiment 1: clustering data for all wireless cards

In the first experiment, we calibrated seven different wireless devices and collected received power data from the specified APs at the locations of interest.

After building the four SOMs, we named the clearly separate regions in each one. For naming purposes, the SOM is plotted on a chart. The measurement position indicated in the data is superposed and, as Fig. 9 shows, the positions are clearly grouped.

Fig. 9. Region naming.

Having named the regions, each SOM classifies the measurements in a particular location region or area where the wireless device could be located. Therefore, as we have created four SOMs, we have four regions or locations for the device, depending on the AP on which the measurement is normalized. This way, we locate the wireless device. If the result of the four SOMs is the same position, this is the device’s location. Otherwise we have to analyse the different SOM responses to be able to locate the device.

Fig. 10 shows the confusion matrix for the SOMs using the test set data. The table rows list the positions predicted by the system, whereas the columns list the real positions. The system correctly predicts position P1, for example, in 98.15% of the cases, and these points are incorrectly classified in 1.85% of the cases.

Fig. 10. Confusion matrix for SOM predictions.

These results are quite acceptable for some positions, like P1, P12 or P4, for example, but not for others, like P3 or P7, where most of the test cases are classified incorrectly. Many of these incorrect classifications correspond to the region defined as multiple. This means that the four SOMs indicate that the wireless device is equally likely to be located in several positions.
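How the per-position success rates in a confusion matrix such as Fig. 10 are obtained can be sketched as follows. This is a minimal illustration with hypothetical test cases, not the experimental data: the function names and the `"multiple"` label for conflicting SOM outputs are assumptions. Rows are predicted positions and columns are real positions, following the convention described above.

```python
from collections import Counter


def confusion_counts(pairs):
    """Count (predicted, real) cells: rows are predictions, columns real positions."""
    return Counter((predicted, real) for real, predicted in pairs)


def success_rate_by_position(pairs):
    """Percentage of test cases of each real position that were predicted correctly."""
    totals = Counter()
    hits = Counter()
    for real, predicted in pairs:
        totals[real] += 1
        if predicted == real:
            hits[real] += 1
    return {pos: 100.0 * hits[pos] / totals[pos] for pos in totals}


# Hypothetical test cases as (real position, predicted position).
test_results = [
    ("P1", "P1"), ("P1", "P1"), ("P1", "P1"), ("P1", "P4"),
    ("P3", "multiple"), ("P3", "P6"),
]

cells = confusion_counts(test_results)
rates = success_rate_by_position(test_results)
print(cells[("P1", "P1")])  # 3
print(rates)                # {'P1': 75.0, 'P3': 0.0}
```

In this toy data every P3 case ends up either in the multiple region or in a neighbouring position, which is the pattern that motivates scoring regions instead of exact positions.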
A voting algorithm is used to determine the position or positions (also termed regions) of the wireless device when the SOMs together locate it in multiple positions. In this algorithm, each SOM indicates the region that it assigns to the device, and the final region is the one that receives the most votes from the SOMs. The percentage of correctly assigned regions is calculated using the following formula:

$$\%\mathrm{CorrectedReg} = \sum_{\mathrm{Reg} \ni \mathrm{Pos}} \mathrm{Reg} \qquad (16)$$

that is, the sum is taken over every region Reg that contains the real position Pos.

Fig. 11 shows the result of using the location regions instead of the exact measurement position. Comparing the total results (Corrected row) in Figs. 10 and 11, we find that the predictions improve in all cases as a result of using the regions. Taking position P3, for example, the system gives a correct response in 1.85% of the cases using positions, whereas the success rate is 90.74% using regions.

Fig. 11. Regions confusion matrix.

Fig. 12 is a graph comparing the success rate for positions and regions. It shows that the system's success rate increases significantly using regions.

Fig. 12. Position vs. region prediction comparison.

Although the success rate increases using the regions, the approach has the drawback that some regions are very extensive and cover a large part of the physical space in which the wireless device can be located, e.g. [P4, P5, P8, P9] or [P5, P9, P12]. To reduce the positioning area, at this point we enter information about the layout of the physical space of the premises where the system is going to be used, e.g. information related to the proximity of the regions. The results with the optimized regions are shown in Fig. 13.

Fig. 13. Optimized region results.

Comparing these results with those for the non-optimized regions (see Fig. 14), we find that the success rate drops in some cases and increases in others. The advantage, though, is that the regions are considerably smaller, and the positioning is much more accurate in most cases.

Fig. 14 compares the three wireless device location estimation methods. In view of these results, we conclude that the most accurate approach is to use regions optimized according to the physical arrangement of the premises.

Fig. 14. Comparison between estimated position, region and optimized region.

4.2. Experiment 2: clustering data for one wireless card

This experiment addresses a real operating environment where there is at first just one type of wireless device on which to take the measurements. As wireless devices of other brands are acquired, they are located without modifying the SOMs or the regions defined for the first device.

Therefore, unlike the previous experiment, where the measurements from all the available wireless devices were used, we have not split the pooled data into a training set and a test set. Instead, the SMC card data form the training set and the data from the other cards (not used to define the SOMs) make up the test set. Fig. 15 shows the results of locating devices other than the one whose data were used to create the SOMs. The success rate is over 90% in most cases. Fig.
16 shows the comparison between using the position, the region and the optimized region when a single card is used to create the SOMs.

Fig. 15. Optimized region results.
Fig. 16. Comparison between estimated position, region and optimized region.

Finally, we can compare the results of using data from a single wireless device with those of using the data from all the devices to create the SOMs. Fig. 17 shows that the results are quite similar: the outcome is sometimes better using all the cards, whereas using just one card gives better results in other cases. From these results we can conclude that the system can be used in any environment with a wireless device and at least three APs.

Fig. 17. One wireless device vs. all wireless devices used to build SOMs.

5. Conclusions

We conceived and developed a system that is highly likely to correctly estimate the location of a mobile terminal indoors in a room or adjacent rooms.

Unlike many other implemented systems, the resulting system is independent of the card brand and of the manufacturer of the hardware technology (USB, PCMCIA). Our system evaluates the position by relative differences in the measurements taken at places close to the APs.

The system is also adaptable to any environment. All it takes is to train the system with any card at locations on the floor, normalize the measurements and create the clusters. From this point on, the system is ready to locate any user with any wireless card.

Role of the funding source

This work was conducted as part of the CYCIT-funded Project No. TIN2008-05924.