key: cord-1000038-9jyz6ho3
authors: Gu, Jinjing; Jiang, Zhibin; Sun, Yanshuo; Zhou, Min; Liao, Shenmeihui; Chen, Jingjing
title: Spatio-temporal trajectory estimation based on incomplete Wi-Fi probe data in urban rail transit network
date: 2020-10-16
journal: Knowl Based Syst
DOI: 10.1016/j.knosys.2020.106528
sha: 8b9715fe34b82ebaf1559e3d18024d763196f713
doc_id: 1000038
cord_uid: 9jyz6ho3

This study presents a methodology for estimating passengers' spatio-temporal trajectories with personalization and timeliness by using incomplete Wi-Fi probe data in an urban rail transit network. Unlike automatic fare collection data, which only records passengers' entries and exits, Wi-Fi probe data can capture more detailed passenger movements, such as riding a train or waiting on a platform. However, the estimation of spatio-temporal trajectories remains a challenging task because several unfavorable situations can result in deficient data. To address this problem, we first describe the Wi-Fi probe data and summarize their common defects. Then, the n-gram method is developed to infer missing spatio-temporal location information. Next, an estimation algorithm is designed to generate feasible spatio-temporal trajectories for each individual passenger by integrating multiple data sources, i.e., urban rail transit network topology, Wi-Fi probe data, train schedules, etc. The proposed method is tested on both simulated data in blind experiments and real-world data from a complex urban rail transit network. The results of the case study show that 93% of passengers' unique physical routes can be estimated. Moreover, for 80% of passengers, the number of feasible spatio-temporal trajectories can be reduced to one or two. Potential applications of the trajectory estimation approach are also identified.

A spatio-temporal trajectory (STT) of a passenger in an urban rail transit (URT) network is the recording of one passenger's motion, i.e., the route and train choice of each passenger at specific timestamps. Estimating passengers' route and train choices within a complex, seamless-transfer URT network is a challenging task due to the rapid expansion of network operation and high departure frequencies. In Shanghai, China, there are 415 stations on 16 URT lines, 53 of which are transfer stations, as of October 2019 [27]. In general, there are multiple routes and trains between each origin-destination (OD) pair. For example, from Disney Resort to Tongji University, there are at least 8 possible routes and the minimum headway of trains is 2 minutes and 30 seconds.

This paper is organized as follows: Section 2 reviews route choice studies using AFC data and analyses of Wi-Fi probe data in URT networks. Section 3 describes the Wi-Fi probe data and the algorithm outline. Section 4 explains in detail our proposed STT estimation methodology. Section 5 shows the results of the blind experiment and the real-world case study. Finally, Section 6 presents the conclusions and future work.

The estimation of STT in URT networks is difficult, especially in the case of complex URT networks. Previous studies on STT estimation can be divided into two categories, i.e., route choice modeling using AFC transaction data and Wi-Fi probe data used in route-choice estimation.

The AFC system is used for collecting fares when passengers tap in to or tap out of the URT. This system also records station and time information, which can be used to infer passengers' route choice behavior [14, 40]. Using AFC data, some researchers tackled this problem based on a stochastic assignment principle [23, 26] and passenger flow assignment models [5, 13, 15]. Xu et al. [23] proposed a deletion algorithm for available routes based on the depth-first principle and calculated the corresponding proportions, but the model proposed in that paper is static. Moreover, train schedule connection networks (TSCN) were mainly used to estimate STT, that is, to assign passengers to each train. Zhou and Xu [7] connected the rail network with train schedules. Sun et al. [33] named this network TSCN and established a space-time trajectory estimation model, formulated as a set generation and weighted assignment problem for feasible TSCN routes. The optimal trajectory problem using AFC data has also been researched in the last few years. Chen et al. [31], Chen et al. [32] and Zhu et al. [36] concentrated on optimizing the route estimation method and established models to estimate the most likely space-time trajectory. A comparison of previous studies on estimating metro passengers' trajectories is listed in Table 1. In addition, denied boarding at the station is a critical element that is neglected by [35]. Likewise, Hänseler et al. [6] assumed that passengers are able to board any desired train. This is also a critical limitation for estimating route choice using solely AFC data in congested metro systems. Without further information, there is very limited scope to determine whether the same 20-minute journey observation is caused by denied boarding on a short route or by no denied boarding on a long route. Therefore, the routes and STT estimated by such models could be inaccurate, and the characteristics of individuals cannot be reflected. Tang et al. [8] proposed a time geographic method to estimate the most likely space-time routes of passengers at the vehicle level. Luo et al. [3] obtained load profiles of transit vehicles in a transit network consisting of 12 tram lines and 8 bus lines. The above-mentioned methods estimated metro passenger dynamics at the station and vehicle levels separately.

Thus, there is an urgent need to develop a new algorithm to describe detailed STT information both on-board and at stations.

Wi-Fi probe technology has a more concentrated detection range than GPS [11] and a fast detection speed, so short-time passenger route restoration and monitoring can be realized. At present, most passengers are willing to turn on the Wi-Fi function of their cell phones when taking the metro, so the sampling rate can be guaranteed and the analysis results will be representative. With the comprehensive coverage of the Wi-Fi signal in the URT network, the quantity and position accuracy of the data are sufficient to support the STT estimation of the individual passenger.

The Wi-Fi probe data contains the interaction information between Access Points (APs) and the access device (e.g., mobile phones, computers). The route of the object carrying the Wi-Fi access device can be tracked dynamically. Existing studies of Wi-Fi technology have mainly been applied to indoor positioning [1, 17, 22].

Zhuang et al. [38] studied the AP location problem of smart phones by using automatic AP positioning and propagation parameter estimation of inertial navigation. Ma et al. [14] proposed a pedestrian dead reckoning trajectory matching method for crowdsourced radio map building in a Wi-Fi indoor positioning system. Shang et al. [21] designed a space-time-state hyper network-based assignment approach to estimate the passenger flow state by integrating multiple data sources (i.e., AFC data, Wi-Fi data and flow data). Some researchers used media access control (MAC) addresses, which are unique identifiers of electronic devices, to estimate passengers' travel times or waiting times in urban rail or bus stations [2, 25]. Abedi provide better internet service for passengers. These AP infrastructures can also be harnessed to capture and archive bulk positioning data while the mobile devices of passengers are connected to the corresponding AP device.

After removing unrelated information, the Wi-Fi probe data attributes used in this study are listed in Table 2 . It is worth noting that the Wi-Fi probe data of this URT system is regulated by the public security department.

The dataset used in this paper is only for research purposes. It has been authorized by the metro transit police department. Moreover, the personal information has been desensitized through the privacy-preserving authentication protocol [28]. In an ideal situation, once an anonymous passenger with an access device enters the URT system, this passenger's location information will be continuously collected over time. The location can be classified into station and link. Fig. 1 illustrates the Wi-Fi probe data collection process when one passenger travels from station A to station B via transfer station C. Therefore, this study aims to identify the most likely routes and trains that are taken by each individual passenger, identified by AD_MAC. In particular, the station information can be complemented according to the historical trip information of an individual passenger. Then, this problem is transformed into the third category issue proposed above. Considering that the information about the trains or transfer stations of a passenger is partially lost, additional data including walking time and actual train schedules are used in this paper to eliminate infeasible STT between the identified origins and destinations.

The STT estimation process of an individual passenger is mainly composed of three steps, which can be summarized as Wi-Fi probe data collection, path estimation and STT estimation. The points in Fig. 2(a) are Wi-Fi data points for one journey. First, we generate an initial Wi-Fi probe data network by sequentially connecting the adjacent Wi-Fi probe data. Then, we coordinate this Wi-Fi data network with the URT network topology. The purpose of path estimation is to identify the crucial stations that the passenger passed by coordinating the Wi-Fi data network with the URT network topology. These crucial stations include the origin station, the destination station and the transfer station. As shown in Fig. 2(b), there are two feasible paths from station A to station B, namely A→C→B and A→D→E→B. According to the formed Wi-Fi data network, no data is recorded at station D, station E or URT line 3, so it is certain that this passenger travelled from station A to station B by path A→C→B. To estimate the passenger's STT along this estimated path, the train timetable and transfer time parameters are used to generate the times at which the passenger arrived at four important positions: the station hall where the passenger enters the URT system, the platform where the passenger boards a train, the platform where the passenger alights from a train, and the station hall where the passenger exits the URT system or transfers to another URT line. The estimated STT is connected with arrowheads in Fig. 2(c), and these four kinds of important positions are marked in red, yellow, green and blue, respectively. Finally, the STT of each individual passenger can be represented as a combination of the route and train choice at specific timestamps.
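The path estimation step in the A→B example can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `feasible_paths` and the rule it applies (keep a candidate path only if every observed station lies on it) are assumptions chosen to reproduce the example, in which a record at station C rules out A→D→E→B.

```python
from typing import Iterable, List, Sequence


def feasible_paths(
    candidate_paths: Sequence[Sequence[str]],
    observed_stations: Iterable[str],
) -> List[List[str]]:
    """Keep candidate paths that contain every station seen in the Wi-Fi data.

    Assumption (hypothetical rule): a station recorded by a Wi-Fi probe must
    lie on the passenger's physical path, so any path missing an observed
    station is discarded. Unobserved stations do not exclude a path, because
    the probe data may be incomplete.
    """
    observed = set(observed_stations)
    return [list(path) for path in candidate_paths if observed <= set(path)]


# Candidate paths from the Fig. 2(b) example.
paths = [["A", "C", "B"], ["A", "D", "E", "B"]]

# Records at A, C and B leave only A -> C -> B.
print(feasible_paths(paths, ["A", "C", "B"]))

# With records at A and B only, both paths remain feasible.
print(feasible_paths(paths, ["A", "B"]))
```

When only the origin and destination are observed, the sketch returns both paths, which is the situation where timetable and transfer time information is needed to narrow the result further.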

The passenger STT estimation algorithm developed for this study includes three subtasks: (1) extracting a route skeleton from the collected Wi-Fi probe data, (2) inferring the spatio-temporal locations for each 

(1) Space-time network topology

This space-time network topology integrates the URT network's physical topology and the train diagram. An abstract URT network's physical topology G that consists of stations and trains is shown in Fig. 3(a). The route from station A to station B by route A→C→B can be extended into such a STT, as shown in Fig. 3 are defined. Concretely, combining with the space-time network topology in Fig. 3(b), the route A→C→B can be described

(2) Wi-Fi probe data network

One passenger's Wi-Fi probe data network i G is generated by connecting the key Wi-Fi probe data points 


Given the URT network's physical topology, the Wi-Fi data network and train schedules, this method aims to estimate all the feasible STT for each passenger. First, constraints at boarding nodes and alighting nodes are proposed to generate the feasible train set. Then, node-link relationships are used to narrow the range of optional trains on each line. Finally, link-link relationships are considered to form a complete feasible trajectory. The main purpose of this method is to estimate the actual route and the precise trains that passengers take. Thus, the entry process and exit process are simplified. It is assumed that the last time recorded at origin station ( ) 

Furthermore, Eq. (8) and Eq. (9) should hold to avoid loops. 

Assuming that ( , , ) (12) should be established and 1 ( ) ( )

Assuming that ( , , ) Assuming that ( , )

in Eq. (12) .

Assuming that both ( , ) 

Time constraint could also be divided into four circumstances.

Assuming that both ( , ) 

Assuming that ( , ) 

When generating a complete STT, Eq. (17) must hold to ensure that the train attributes of the boarding node and the alighting node of a train link are the same. 
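The role of the boarding and alighting constraints can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the paper's formulation: the `TrainRun` record, the field names, and the walking-time parameters are hypothetical, times are seconds since midnight, and a train is treated as feasible when it departs no earlier than the last Wi-Fi record at the boarding station (plus a walking allowance) and arrives no later than the first record at the alighting station (minus a walking allowance). Representing each train link as a single `TrainRun` keeps the boarding and alighting nodes on the same train, which is the consistency that Eq. (17) is described as enforcing.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class TrainRun:
    """One train serving a boarding/alighting platform pair (hypothetical record)."""
    train_id: str
    depart_boarding: int   # departure from the boarding platform, seconds since midnight
    arrive_alighting: int  # arrival at the alighting platform, seconds since midnight


def feasible_trains(
    runs: Sequence[TrainRun],
    last_seen_boarding_station: int,
    first_seen_alighting_station: int,
    min_access_walk: int = 0,
    min_egress_walk: int = 0,
) -> List[TrainRun]:
    """Filter trains by assumed time-window constraints at both ends of the ride."""
    earliest_departure = last_seen_boarding_station + min_access_walk
    latest_arrival = first_seen_alighting_station - min_egress_walk
    return [
        run
        for run in runs
        if run.depart_boarding >= earliest_departure
        and run.arrive_alighting <= latest_arrival
    ]


# Illustrative timetable with made-up times.
timetable = [
    TrainRun("T1", depart_boarding=100, arrive_alighting=400),
    TrainRun("T2", depart_boarding=250, arrive_alighting=550),
    TrainRun("T3", depart_boarding=400, arrive_alighting=700),
]

# Last record at the boarding station at t=200, first record at the
# alighting station at t=600: only T2 fits inside the window.
print(feasible_trains(timetable, 200, 600))
```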

The main idea of passenger route generation is to estimate, in real time, the location (i.e., hall, platform, train, etc.) where passengers may appear during the whole route. It should consider the relationship between the time 

Add $v_{en}$ and $v_{ex}$ to $V$

for each feasible route

K v a while:

Narrow feasible train set into 

Add transfer link set $A_{tr}$ to $A$

A so-called blind experiment [34] is used to test the proposed method. In such an experiment, "certain information which could introduce bias or otherwise skew the result is withheld from the participants, but the experimenter will be in full possession of the facts". Two types of routes will be obtained using the proposed method: (1) the physical network route without any temporal attributes, and (2) the STT.

The blind experiment is conducted as follows: an experimenter generates a route with complete Wi-Fi probe data and thus has full knowledge of the experiment. Then the experimenter reveals partial Wi-Fi data to the observer according to different data missing rate. The main purpose of this experiment is to find the impacts of data missing rate on both route estimation and STT estimation.
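The way the experimenter reveals only part of a trajectory can be sketched as below. This is a minimal illustration, not the procedure used in the paper: the function name `withhold_records`, the uniform random choice of dropped records, and the decision to always keep the first and last record (so that the origin and destination stay known) are all assumptions.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def withhold_records(records: Sequence[T], missing_rate: float, seed: int = 0) -> List[T]:
    """Return the records revealed to the observer for a given data missing rate.

    Assumptions (hypothetical): the first and last record are always revealed,
    and the withheld interior records are drawn uniformly at random.
    """
    if not 0.0 <= missing_rate <= 1.0:
        raise ValueError("missing_rate must lie in [0, 1]")
    if len(records) <= 2:
        return list(records)

    rng = random.Random(seed)  # seeded so that an experiment can be repeated
    interior = records[1:-1]
    n_drop = round(missing_rate * len(interior))
    dropped = set(rng.sample(range(len(interior)), n_drop))
    kept = [rec for i, rec in enumerate(interior) if i not in dropped]
    return [records[0], *kept, records[-1]]


# A complete trajectory of ten time-ordered records (placeholder values).
complete = list(range(10))
revealed = withhold_records(complete, missing_rate=0.5, seed=1)
print(len(revealed), revealed[0], revealed[-1])
```

Repeating the call for a range of `missing_rate` values and feeding each revealed subset to the estimation method would give the kind of sensitivity analysis the experiment describes.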

The network topology used in this blind experiment is shown in Fig. 3(a). Within the experiment, each trajectory corresponds to one passenger. In order to derive feasible routes and trajectories, the experimenter presents all the train schedules, the minimum transfer times and the different sample rates to the observer. Using the method proposed above, at least one feasible route/trajectory can be found for every passenger, regardless of the data missing rate. Detailed estimation results under different data missing rates are given in Table 3.

Table 3 Route and trajectory estimation results under different data missing rates. Column headings: Wi-Fi data missing rate; the percent of passengers with a unique route.

Journal Pre-proof

Table 3 shows that in this experiment, if the data missing rate is below 0.6, a unique route can be determined for 99% of passengers and a unique STT can be determined for 57% of passengers. Therefore, the method proposed in this paper is suitable for estimating feasible routes and trajectories unless the data missing rate is very high, e.g., 0.7. The method will be further demonstrated in a much more complex real-world network in the next section.

(1) Results of spatio-temporal location inference

This section presents a case study using the proposed spatio-temporal location inference method based on real-world URT network data. The Wi-Fi probe data is obtained from the Technology Center of Metro Co. Ltd. In addition, outliers have been removed from the Wi-Fi probe data following the approach in [9]. The dataset contains records from a period of 64 days (from November 29, 2017 to January 31, 2018). Each record contains the attributes listed in Table 2. As can be observed from Fig. 4, the accuracy improves markedly when the value of n increases from 1 to 2 and starts to decline when n equals 3, so the 2-gram model achieves the best performance of the three models. The best accuracy values are marked in bold. This suggests that the next location is affected mainly by the two preceding locations. The reason is that individual travel behavior is complex and does not always follow historical trips. In addition, the accuracy values improve as the value of k increases. Given a set or sequence of locations, the proposed algorithm retrieves from existing Wi-Fi trajectories the top-k trajectories that best connect the given locations. This helps to narrow the feasible scope of an individual's trajectory estimation in the next step. In theory, all the feasible routes between these two station groups should be set as the initial feasible routes.
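The interplay of n and k discussed here can be illustrated with a small count-based sketch in Python. It is a simplified stand-in, not the model trained in the paper: the function names, the parameter `n_prev` (taken here to mean the number of preceding locations used as context, which is an assumption about how n is counted), and the toy trips are hypothetical. Accuracy is computed as the share of cases in which the true next location appears in the top-k inferred list.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Counts = Dict[Tuple[str, ...], Counter]


def train_ngram(trips: Sequence[Sequence[str]], n_prev: int) -> Counts:
    """Count which location follows each context of n_prev preceding locations."""
    counts: Counts = defaultdict(Counter)
    for trip in trips:
        for i in range(n_prev, len(trip)):
            context = tuple(trip[i - n_prev:i])
            counts[context][trip[i]] += 1
    return counts


def top_k_next(counts: Counts, context: Sequence[str], k: int) -> List[str]:
    """Return up to k most frequent next locations for the given context."""
    ranked = counts.get(tuple(context), Counter()).most_common(k)
    return [location for location, _ in ranked]


def top_k_accuracy(counts: Counts, trips: Sequence[Sequence[str]], n_prev: int, k: int) -> float:
    """Share of positions whose true next location is in the top-k list."""
    hits = total = 0
    for trip in trips:
        for i in range(n_prev, len(trip)):
            total += 1
            if trip[i] in top_k_next(counts, trip[i - n_prev:i], k):
                hits += 1
    return hits / total if total else 0.0


# Toy historical trips of one passenger (made-up location labels).
history = [["A", "C", "B"], ["A", "C", "B"], ["A", "C", "E"]]
model = train_ngram(history, n_prev=2)

print(top_k_next(model, ["A", "C"], k=1))
print(top_k_accuracy(model, history, n_prev=2, k=1))
print(top_k_accuracy(model, history, n_prev=2, k=2))
```

In this toy case a larger k can only keep or raise the accuracy, since the top-k list grows, which matches the trend reported for increasing k.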

However, this sub-network is quite complex, and many feasible routes exist between these two station groups.

Actually, passengers are highly sensitive to travel time and transfer times, which means most passengers' route choices will be concentrated within a limited route set. As shown in Fig. 6, three typical circumstances occur within the estimation results: (1) a unique STT can be extracted; (2) a unique route can be found while two or more trajectories are possible; (3) more than one feasible route is found.
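The three circumstances can be expressed as a small classification rule. The Python sketch below is illustrative only: the function name, the input layout (a mapping from each candidate route to its list of feasible trajectories) and the return code 0 for "no feasible route" are assumptions.

```python
from typing import Dict, List


def classify_result(trajectories_by_route: Dict[str, List[str]]) -> int:
    """Map an estimation result to circumstance 1, 2 or 3 (0 if nothing is feasible).

    A route counts as feasible only if it has at least one feasible trajectory.
    """
    feasible = {route: trajs for route, trajs in trajectories_by_route.items() if trajs}
    if not feasible:
        return 0
    if len(feasible) > 1:
        return 3  # more than one feasible route
    (trajs,) = feasible.values()
    return 1 if len(trajs) == 1 else 2  # unique STT vs. unique route only


# Hypothetical results for three passengers.
print(classify_result({"route 1": ["trajectory a"]}))
print(classify_result({"route 1": ["trajectory a", "trajectory b"]}))
print(classify_result({"route 1": ["trajectory a"], "route 2": ["trajectory b"]}))
```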

Here is one typical result for circumstance (2). The passenger whose AD_MAC is 90-3C-**-**-13-AD enters the URT network at Xiuyan Road and leaves at Guoquan Road. URT lines 11, 2 and 10 are recorded in the Wi-Fi probe data, while none of the transfer stations is captured. Fig. 7(b) illustrates the detailed information collected during the journey.

detailed STT estimation results in each route.

After data reprocessing, a total of 345 valid data groups of passenger trajectories are recorded by Wi-Fi sensors. Their origin and destination information is complete and falls within the scope of the OD groups that we study. A summary of the route and STT estimation results is shown in Fig. 9. Of the total, 97 original Wi-Fi data groups directly reflect the real route, constituting 28.12% of the total. After applying our method, 343 data groups yield feasible route estimation results, accounting for 99.4% of the input data. As shown in Fig. 9, 316 data groups receive a unique feasible route, constituting 92.9% of the output data. We can therefore conclude that the route and STT estimation method using Wi-Fi probe data is well suited to determining the unique route choice in complex network structures. Within the 316 data groups, 152 acquire only one feasible STT, which means that the travel trajectories of 44.6% of passengers can be accurately restored using Wi-Fi probe data and the proposed method. A further 117 passengers obtain two feasible trajectories, and the remaining passengers obtain more than two, so further study is still needed to estimate the most likely trajectory.

The main purpose of this study is to design a methodology to reconstruct an individual passenger's actual route and train choice from incomplete Wi-Fi probe data. The proposed approach can be considered one of the first steps toward the use of Wi-Fi probe data for STT estimation at the network level. The solution to this problem facilitates a number of applications. For instance, route choice is essential to allocating passenger flows to different rail transit lines, which further determines how fare revenues are distributed. The estimation results can also help operators to estimate train loads over time and space. Furthermore, real-time demand management can be conducted in a more refined manner, e.g., detailed estimation of platform usage and the number of passengers left behind for safety analysis, personalized dynamic route guidance, crowding information systems, real-time passenger flow control, and disruption management. Real-time application evaluation, such as analysis of car-specific on-board accumulation for comfort assessment, flexible pricing strategies and real-time train rescheduling strategies, can be optimized according to these estimated STT results. The results can also provide comprehensive and effective input data for trajectory tracking [30], trajectory pattern mining [10, 12, 16] and trajectory prediction [20]. In particular, the proposed approach can significantly benefit real-time monitoring of the STT of suspected COVID-19 patients in the URT network.


This paper contributes to route estimation research by presenting a general modeling framework for estimating passengers' STT with personalization and timeliness using Wi-Fi probe data at the URT network level. After reviewing relevant studies on route-choice models and Wi-Fi technology, several literature gaps are identified. First, at the present stage, most route-choice models take AFC transaction data as model input. Since no intermediate information between origin and destination is recorded, the accuracy of the estimation is difficult to determine. Second, most studies use Wi-Fi technology for positioning and for calculating time parameters in a very limited area, such as a station or an airport. Third, the details of Wi-Fi probe data have not been described specifically. Therefore, in this study, the structure and problems of Wi-Fi probe data in the URT system are introduced and useful Wi-Fi data are integrated into the estimation method. Blind experiments show that our method can successfully find feasible routes and STT in a simple network under all tested data missing rates.

In the case study, we reconstruct the real-world URT network as a topology network and choose a more complex OD pair with 8 feasible routes. Results show that (1) for 93% of passengers, a unique physical route can be estimated and (2) for 80% of passengers, the number of feasible STT is reduced to one or two. The STT estimation is capable of supporting the day-to-day operations of large URT systems. The route estimation results can be useful for metro corporation revenue allocation, train load estimation over time and space, real-time demand management and application evaluation.

The proposed approach can be considered a first step towards the utilization of Wi-Fi probe data for STT estimation at the network level. This paper could thus stimulate further research in three aspects. First, combining data from more sources, such as metro video surveillance, would help to improve the real-time STT estimation accuracy for individual passengers. Second, semantic STT pattern mining [4] is useful for finding the most likely path and trajectory among the multiple feasible paths estimated, which is also the main focus of our future study. Third, the approach is of great theoretical and practical significance for backtracking COVID-19 spreading trajectories and for tracking close contacts of infected passengers in real time.

Human activity recognition in indoor environments by means of fusing information extracted from intensity of WiFi signal and accelerations

Development and testing of a real-time WiFi-Bluetooth system for pedestrian network monitoring and data extrapolation

Constructing spatiotemporal load profiles of transit vehicles with multiple data sources

Semantic periodic pattern mining from spatio-temporal trajectories

Deduction of passengers' route choices from smart card data

A passenger-pedestrian model to assess platform and train usage from automated data

Model of passenger flow assignment for urban rail transit based on entry and exit time constraints

Estimating the most likely space-time routes, dwell times and route uncertainties from vehicle trajectory data: A time geographic method

Real-time passenger flow anomaly detection considering typical time series clustered characteristics at metro stations

Context-based prediction for road traffic state using trajectory pattern mining and recurrent convolutional neural networks

Incremental route inference from low-sampling GPS data: an opportunistic approach to online map matching

Mining distinct and contiguous sequential patterns from large vehicle trajectories. Knowl.-Based Syst

An integrated Bayesian approach for passenger flow assignment in metro networks

Pedestrian dead reckoning trajectory matching method for radio map crowdsourcing building in WiFi indoor positioning system

A dynamic traffic assignment model for highly congested urban networks

Discovering individual movement patterns from cell-id trajectory data by exploiting handoff features

Spotfi: decimeter level localization using wifi

Assessment of antenna characteristic effects on pedestrian and cyclists travel-time estimation based on Bluetooth and WiFi MAC addresses

Exploring pedestrian Bluetooth and WiFi detection at public transportation terminals

Predicting attributes and friends of mobile users from ap-trajectories

Integrating Lagrangian and Eulerian observations for passenger flow state estimation in an urban rail transit network: a space-time-state hyper network-based assignment approach

A hybrid indoor positioning algorithm based on WiFi fingerprinting and pedestrian dead reckoning

Passenger flow distribution model and algorithm for urban rail transit network based on multi-route choice

Poster: Monitoring transit systems using low cost WiFi technology, Vehicular Networking Conference

Modeling passenger flow distribution based on travel time of urban rail transit

A practical and communication-efficient deniable authentication with source-hiding and its application on Wi-Fi privacy

Characterizing travel space-time trajectory on urban rail transit network using Wi-Fi data

Trajectory tracking for uncertainty time delayed-state self-balancing train vehicles using observer-based adaptive fuzzy control

Estimating the most likely space-time route by mining automatic fare collection data

Data-driven method to estimate the maximum likelihood space-time trajectory in an urban rail transit system

Schedule-based rail transit route-choice estimation using automatic fare collection data

Rail transit travel time reliability and estimation of passenger route choice behavior: Analysis using automatic fare collection data

Estimating metro passengers' route choices by combining self-reported revealed preference and smart card data

A probabilistic passenger-to train assignment model based on automated data

Inferring left behind passengers in congested metro systems from automated data

Autonomous smartphone-based WiFi positioning system by using access points localization and crowdsourcing

Estimation of denied boarding in urban rail systems: alternative formulations and comparative analysis

Individual mobility prediction using transit smart card data

We greatly appreciate the very constructive comments made by the anonymous reviewers, who have helped significantly improve the quality and presentation of this paper. This work is supported by the National Natural 

can be calculated in accordance with the training model. Furthermore, these training model parameters would be updated with the generation of individual trajectory data. Notice that accuracy refers to the frequency of the true next spatio-temporal location occurring in the list of inferred locations. Let ( ) Center, then transfer to line 10 at Laoximen. Fig. 8(a) shows the route estimation results in the real-world URT network. In route 1, two trajectories are found, while in route 2, six trajectories can be found. Fig. 8(b) shows the
