key: cord-0445547-kc17jcdf
authors: Thao, Tran Phuong
title: Location-based Behavioral Authentication Using GPS Distance Coherence
date: 2020-09-17
journal: nan
DOI: nan
sha: dddd4efdc8ed7cb9823250a8be8d575de5b360d9
doc_id: 445547
cord_uid: kc17jcdf

Most current user authentication systems are based on PIN codes, passwords, or biometric traits, which can have limitations in usage and security. Lifestyle authentication has become a new research approach. A promising idea for it is to use location history, since it is relatively unique: even for people living in the same area, and aside from occasional travel, it does not vary much from day to day. For Global Positioning System (GPS) data, previous work used the longitude, the latitude, and the timestamp as the features for classification. In this paper, we investigate a new approach utilizing the distance coherence, which can be extracted from the GPS data itself without requiring any other information. We applied three ensemble classification algorithms, RandomForest, ExtraTrees, and Bagging; the experimental results showed that the approach can achieve 99.42%, 99.12%, and 99.25% accuracy, respectively.

The term Society 5.0 [4] has become a well-known buzzword, introduced by the Japanese government in 2011. Society 5.0 refers to a super-smart society that balances economic advancement with the resolution of social problems. It focuses on two important keywords, human-centered and smart life, with the support of Artificial Intelligence (AI), the Internet of Things (IoT), big data, and other cutting-edge technologies. Let us consider the example of electronic payment systems. In 1871, Western Union debuted the electronic fund transfer (EFT), which allowed people to send money to pay for goods and services without necessarily having to be physically present at the point of sale. In 1946, John Biggins invented the first bank-issued credit card, which could be used in place of paper money (although the concept of using a card for purchases and the term credit card were described in 1887 by Edward Bellamy). In 2011, Google was the first company to launch a mobile wallet project, which can be used to replace physical cash and even credit cards. Nowadays, cashless payment has become a major trend, and many digital wallet services have appeared, such as Apple Pay (from 2014), Google Pay (from 2015 as Android Pay and from 2018 as Google Pay), Rakuten Pay (from 2016), etc.

The biggest challenge for such payment systems is how to authenticate (verify) the users. The current approach is to rely on the authentication of the mobile phone using a PIN code, a password, biometric information (i.e., fingerprint, iris, face, etc.), or a multi-factor method which combines more than one method of authentication from independent categories of credentials. Many sophisticated attacks on smartphone authentication have gradually appeared. First, the PIN code/password guessing attack [15, 16] tries to recover the plaintext password from its hashed form using a brute-force attack, which systematically checks every combination of letters, symbols, and numbers, or a dictionary attack, which uses a dictionary of common words.
Second, biometric spoofing tries to generate synthetic or fake biometric traits of legitimate users to fool the capture sensors. This includes facial spoofing, which utilizes printed facial photographs and digital video [21] or a 3D mask [22]; fingerprint spoofing [23], which utilizes artificial replicas made from different materials such as gelatin, latex, play-doh, or silicone; iris spoofing [17], which utilizes an image forging natural iridal texture characteristics [18] or even cosmetic contact lenses [19, 20]; and the combination of all three spoofing types [24]. Third, the smudge attack tries to guess the graphical password pattern on touch-screen phones by analyzing the epidermal oils and smears left on the device's screen by the user's fingers [25]. Fourth, the shoulder-surfing attack [26] uses social engineering techniques to steal the victim's personal information, such as PIN codes and passwords, by looking over the victim's shoulder or by eavesdropping on sensitive information as it is spoken or typed on a device.

Furthermore, not related to any attack, several studies found that a large number of users do not even lock their smartphones. An analysis of over 150 smartphone users was conducted in [11] and showed that 33% of the users do not use any screen lock, whether PIN, password, or pattern. Face-to-face qualitative interviews with 28 participants were conducted in [12]; 29% of the users responded that they did not lock their devices. The three most common reasons were: emergency personnel not being able to identify them, not having the devices returned if lost, and not believing that the data on their devices is worth protecting. An online survey with 260 participants and a field study with 52 participants were performed to analyze smartphone users' risk perception and behaviors [13]. They showed that 40.9% of users use slide-to-unlock and 16.2% of users do not use any screen lock.

Location-based Behavioral Authentication

Toward the construction of a smarter and more secure mobile-based authentication system, several questions arise. First, to mitigate the aforementioned attacks, is there an additional mobile-based authentication method that can support the conventional methods such as PIN codes, passwords, and human biometric traits (i.e., fingerprints, face, iris)? Second, imagine the scenario in which a person is on the way to a coffee shop. Before the person arrives, the coffee shop can predict with high probability that he/she will arrive 15 minutes later, prepare his/her usual order in advance, and automatically subtract the charge from his/her account. The person then does not need to spend time on the ordering and payment process. So the question is: is it possible to authenticate the user and predict the location (for example, the coffee shop) that he/she is likely heading to? Last but not least, in the recent situation of the COVID-19 pandemic, current smartphone-based cashless payment can reduce the use of cash or cards, but the user still needs to touch the smartphone screen to show the bar code to the cashier. The final question is: is it possible for the user to pay for goods while only carrying the smartphone, without the need to touch the screen?

An idea that can answer these questions is using behavioral (or habit-based) information. It is a new research topic in which the main challenge is how to decide which behavioral information is suitable for authentication. As argued by L. Fridman (MIT) et al.
[5] in 2016, GPS location history is one of the most promising approaches because "It is relatively unique to each individual even for people living in the same area of a city. Also, outside of occasional travel, it does not vary significantly from day to day. Human beings are creatures of habit, and in as much as location is a measure of habit". At this time, we can only say that a single behavioral factor is an additional method to support the conventional methods (i.e., password, PIN code, biometrics) or a method to be combined with other behavioral factors. In the future, if a payment system can be constructed such that users do not need to carry anything, not even small wearable devices such as smartwatches or RFID chips (e.g., the data could be collected via satellite sensors), and that can completely replace conventional biometric authentication, it will be a step closer to Society 5.0.

Motivation

A system can achieve high authentication accuracy when it can collect as many factors as possible. However, from the users' viewpoint, the most convenient system, and the one that raises the fewest privacy concerns, is one that does not require the users to provide too much information. From the GPS records, most of the previous work utilized the longitude, the latitude, and the information extracted from the timestamp (i.e., year, month, day, hour, minute, second, day of the week, etc.) as the features in the classification model for user authentication. Given the limited information in the GPS (longitude, latitude, and timestamp), if metadata that carries extra independent information can be obtained from the GPS itself, it can help to improve the accuracy. An example of such GPS-based self-enhancement comes from [7], in which the address is extracted from the pair of longitude and latitude using reverse geocoding.

Contribution

In this paper, we propose the idea of extracting distance coherence features from the GPS records themselves, without requesting any other information besides the GPS. For each user, the locations recorded at nearby clock times tend to be more closely correlated in physical distance than the locations recorded at distant clock times, since a human needs a period of time to move gradually from one location to another. Since this idea reflects a movement "lifestyle" of the users, we hypothesized that it may improve the accuracy. Although it may not hold 100% of the time, for example when the user moves forward and then backward within the considered period, the proposed distance coherence features are used as additional features to support the previous ones. To evaluate how feasible the approach is, we collected 107,637 GPS records from 348 users. We applied three ensemble machine learning classifiers (RandomForest, ExtraTrees, and Bagging) to a total of 13 features including the distance coherence features. The experimental result showed that our approach outperforms the approach without the distance coherence features, with an accuracy of 99.42% (for RandomForest), 99.12% (for ExtraTrees), and 99.25% (for Bagging), a nearly 0% false positive rate, and a 0.01% false negative rate (for all three algorithms).
Regarding its validity, one may argue that since the distance coherence score can be inferred from the GPS and the timestamp, the entropy of the distance coherence might be the same as that of the GPS and the timestamp; in other words, the distance coherence might give no additional information beyond the GPS and the timestamp. However, for each sample, the corresponding distance coherence is computed not just from that sample but also from other samples whose timestamps are close to that of the considered sample. Therefore, the GPS, the timestamp, and the distance coherence score are independent variables. Furthermore, the model using GPS and timestamp can of course be improved if they are combined with other factors such as WiFi information, web browser logs, etc. However, the goal of this paper is to make clear whether the distance coherence score extracted from the GPS and the timestamp can help build a better classification model. We thus excluded other factors to make the comparison clean.

Roadmap

The rest of this paper is organized as follows. The related work is introduced in Section 2. The proposed method is described in Section 3. The experiment is presented in Section 4. The threat model is presented in Section 5. The discussion about future work is given in Section 6. Finally, the conclusion is drawn in Section 7.

In this section, we present related work focusing on multimodal authentication using human-smartphone interactions and using other factors. The term multimodal (not multimodel) indicates biometric authentication using multiple types of biometric data. It is the opposite of unimodal, which uses only a single type of biometric data.

L. Fridman et al. [5] analyzed behavioral data of four modalities from active mobile devices, including text stylometry typed on a soft keyboard, application usage patterns, web browsing behavior, and the physical location of the device from GPS (outdoor) and WiFi (indoor). The data was collected from 200 users over more than 30 days. The authors proposed a parallel binary decision-level fusion architecture for classifiers based on the four biometric modalities. A. Alejandro et al. [8] analyzed multimodal data consisting of four biometric data channels (touch gestures, keystroking, accelerometer, and gyroscope) and three behavior profiling sources (WiFi, GPS location, and app usage). The data was obtained during the natural human-smartphone interaction of 48 users, on average over 10 days per user. They proposed two authentication models: a one-time approach that uses all the channel information available during one session, and an active approach that uses behavioral data from multiple sessions by updating a confidence score. W. Shi et al. [6] proposed an authentication framework that enables a continuous and implicit user identification service for smartphones. The data was collected from four sensor modalities: voice, GPS location, multitouch, and locomotion. They conducted a preliminary empirical study with a small set of users (seven). The result showed that the four modalities are sufficient for mobile user identification. R. Valentin et al. [10] analyzed multimodal sensing with mobile devices in which GPS, accelerometer, and audio signals are utilized for human recognition. The data was collected from four existing datasets consisting of 491 users.
They applied four variants of deep learning to interpret user activity and context as captured by multi-sensor systems. M. Upal et al. [14] investigated user authentication methods using the first non-commercial multimodal dataset focusing on three smartphone sensors (front camera, touch sensor, and location service). The data was collected from 48 users over 2 months. Their benchmark results for face detection, face verification, touch-based user identification, and location-based next-place prediction showed that more robust methods fine-tuned to the mobile platform are needed to achieve satisfactory verification accuracy. T. Thao et al. [7] extracted addresses from the longitudes and latitudes of the GPS records and then applied text mining to the addresses. The data was collected from 50 users over about four months. Their experimental result showed that the combination of the text features and the GPS data can improve the classification accuracy. B. Aaron et al. [9] proposed a wallet repository that can store biometric data using multiple layers: a biometric layer, a genomic layer, a health layer, a privacy layer, and a processing layer. The processing layer can be used to determine and track the user's location and moving speed using GPS data.

Besides human-smartphone interactions, multimodal authentication also uses other factors. T. Kaczmarek et al. [27] investigated a new hybrid biometric based on a human user's seated posture pattern in an average office chair over the course of a typical workday. Their experimental results on a population of 30 users showed that the posture pattern biometric can capture a unique combination of physiological and behavioral traits and can authenticate the users with 91% accuracy. M. Ivan et al. [28] proposed an approach which combines the PIN code with pulse-response biometrics. For the experiment, biometric information from 10 users was collected. The result showed that each human body exhibits a unique response to a signal pulse applied at the palm of one hand and measured at the palm of the other. The experiment for user authentication achieved 88% accuracy when the records were taken weeks apart. W. Louis et al. [31] and R. Alejandro et al. [33] constructed continuous authentication systems based on the electrocardiogram (ECG) and the electroencephalogram (EEG). Their approaches achieved false negative rates of 1.57% and 0.82%, respectively. E. Simon et al. [30] extracted distinct patterns from eye movement (which is different from the iris) with 21 features that can be used for user authentication. The data was collected from 30 users over 2 weeks in 3 scenarios (no prior knowledge, knowledge gained through a description, and knowledge gained through observation). The experimental result achieved an equal error rate of 3.98%.

In this section, we describe our proposed method, including data collection, feature extraction and selection, and our learning method. We created a navigation application named MITHRA (Multi-factor Identification/auTHentication ReseArch) within a project of the University of Tokyo to collect the GPS information of the users. The application can be installed on both iOS and Android smartphones. It was developed to run in the background so that the users are not burdened with a user interface (UI) and so that memory consumption is reduced.
The data was collected from 348 users, comprising 107,637 GPS records including pairs of longitude and latitude, over four months from January 11th to April 26th, 2017. Compared to the existing works (see Section 2), the number of users in our dataset is higher than in most of the papers and is only lower than in [10], which collected information from 491 users. We recruited the participants randomly, so the users live and work in random areas. The GPS data was measured every minute. The longitude and latitude values were collected with a precision of up to 6 decimal places (e.g., 36.xxxxxx), corresponding to about 0.1 meters.

The privacy consent was shown to the users during the installation process. The installation can only be completed if the users accept the terms and conditions agreement. Even after the application is successfully installed, the users can choose to start or stop using it at any time. Personal information such as name, age, gender, race, ethnicity, income, education, etc. is not collected. Only the email address is collected as the user identity in the dataset, which is used to distinguish the users from each other. Even though the application collects GPS information, the users do not need to disclose which location is their home, which location is their office, etc. Our project was reviewed by the Ethics Review Committee of the Graduate School of Information Science and Technology, the University of Tokyo. Finally, all the users who installed the application agreed to participate in our project.

The features are categorized into two groups: (i) the features extracted from the GPS and the timestamp, and (ii) the features using the distance coherence score.

GPS and Timestamp

There are seven features in this group. Two features were extracted from the GPS: the latitude and the longitude, which are represented as float numbers. The valid ranges for the latitude and the longitude are the continuous intervals [−90, +90] and [−180, +180], respectively. Five features were extracted from the timestamp: month, day, hour, minute, and day of the week (i.e., seven days from Monday to Sunday), which are represented as integer numbers. The valid ranges for these features are the intervals [1, 12], [1, 31], [0, 23], [0, 59], and [1, 7], respectively. The year was not extracted as a feature because all the samples in the dataset were collected in the same year (2017).
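As a minimal illustration of this first feature group (a sketch only; the pandas column names below are hypothetical and not taken from the MITHRA implementation), the seven features could be derived as follows:

```python
import pandas as pd

# Hypothetical GPS log: one row per record (column names are illustrative only).
df = pd.DataFrame({
    "user":      ["user001", "user001", "user002"],
    "timestamp": pd.to_datetime(["2017-01-11 08:15:30",
                                 "2017-01-11 09:02:10",
                                 "2017-01-12 18:40:05"]),
    "lat":       [35.712345, 35.713001, 35.689876],    # valid range [-90, +90]
    "lon":       [139.761234, 139.762890, 139.700123], # valid range [-180, +180]
})

# Five timestamp features; the year is dropped because all records are from 2017.
df["mon"]     = df["timestamp"].dt.month            # [1, 12]
df["day"]     = df["timestamp"].dt.day              # [1, 31]
df["hour"]    = df["timestamp"].dt.hour             # [0, 23]
df["min"]     = df["timestamp"].dt.minute           # [0, 59]
df["weekday"] = df["timestamp"].dt.dayofweek + 1    # [1, 7], Monday = 1

gps_time_features = df[["lat", "lon", "mon", "day", "hour", "min", "weekday"]]
```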
Distance Coherence

There are α features in this group (we explain how to choose α shortly). The z-th feature (where z ∈ [1, α]) represents the distance coherence (also known as a similarity score) between each sample in the dataset and the average of all the other samples in the dataset that belong to the same user and that occur p hours before or after the considered sample, for every p ∈ [0, z]. p = 0 is the case when the other samples occur in the same hour as the considered sample. More concretely, the features are computed as follows (see Figure 1). Let {dc_z} denote the set of α features, where z ∈ [1, α]. Let s_i denote each sample in the dataset, where i ∈ [1, n] and n denotes the number of samples (in our dataset, n = 107,637). For each feature dc_z, let K_z = {s_j} (where j ∈ [1, n] and j ≠ i) denote the set of all the other samples such that s_i and s_j belong to the same user U_t (where t ∈ [1, 348]). Stated differently, s_i and s_j have the same label U_t. Let lat(s_i) and lat(s_j), lon(s_i) and lon(s_j), and hour(s_i) and hour(s_j) denote the latitude, the longitude, and the hour features of s_i and s_j, respectively. For each dc_z, K_z is chosen such that:

    K_z = { s_j | j ≠ i, s_j belongs to U_t, |hour(s_i) − hour(s_j)| ≤ z }    (1)

The average coordinate s̄_j is determined from all the samples s_j in K_z as:

    lat(s̄_j) = (1/|K_z|) Σ_{s_j ∈ K_z} lat(s_j),    lon(s̄_j) = (1/|K_z|) Σ_{s_j ∈ K_z} lon(s_j)    (2)

The feature is finally calculated as the distance between s_i and s̄_j:

    dc_z = √( (lat(s_i) − lat(s̄_j))² + (lon(s_i) − lon(s̄_j))² )    (3)

From Equation (1), we can observe that the K_z chosen for dc_z is a subset of the K_z′ chosen for dc_z′ for all z, z′ ∈ [1, α] such that z′ > z. This may raise the question of whether all the α features are correlated. However, the averages of even overlapping sets are completely different (for example, average(1, 2, 3) = 2, which is different from average(1, 2, 3, 4) = 2.5). All the features dc_z are thus treated as independent variables. A numeric example of how to calculate the distance coherence features is given in Appendix A.

We now explain how the concrete value of α is chosen. In our approach, we use three advanced classification machine learning algorithms: RandomForest, ExtraTrees, and Bagging (explained in more detail in Section 3.3). We conducted an experiment starting from α = 1 and increasing it gradually. We found that the best α for RandomForest, ExtraTrees, and Bagging is 3, 4, and 5, respectively, at which the algorithms reach their peak performance (Section 4.3). Since α reflects the movement lifestyle of the users, it is reasonable for α not to be large. For instance, the GPS record (a pair of latitude and longitude) of a user U_t at 15:00 may have a stronger correlation (in terms of physical distance) with those at 14:00 and 16:00 than with those at 13:00 and 17:00, or those at 12:00 and 18:00. In the rest of this paper, we use α-DC to denote the approach in which α distance coherence features are used, and {lat, lon, mon, day, hour, min, weekday, dc_1, dc_2, …, dc_6} to denote the set of the thirteen features covering both the GPS/timestamp group and the distance coherence group.

The distribution statistics for the features are described in Table 1, which includes the mean, standard error, median, standard deviation, kurtosis, skewness, minimum, and maximum. A normality check for the features is not necessary [32]. The negative and positive values of the latitude and the longitude in the "Min" and "Max" columns indicate that users who normally commute in Japan might have traveled abroad during the data collection period. This kind of data can create noise during the training and testing processes. However, we do not think it should be removed, because it reflects the natural behavior of the users. Although the noise may lower the accuracy, we want to measure how practical the approach is on real data that has not been manipulated.
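To make the construction concrete, the following is a minimal sketch of the distance coherence extraction (hypothetical helper code, not the authors' implementation; it assumes the DataFrame from the previous sketch with user, lat, lon, and hour columns, treats the latitude/longitude plane as Euclidean as in Equation (3), and ignores hour wrap-around across midnight):

```python
import numpy as np
import pandas as pd

def add_distance_coherence(df: pd.DataFrame, alpha: int) -> pd.DataFrame:
    """Append dc_1 .. dc_alpha columns (illustrative implementation of Equations 1-3)."""
    df = df.copy()
    for z in range(1, alpha + 1):
        values = []
        for i, s in df.iterrows():
            # K_z: all *other* samples of the same user within z clock hours (Equation 1).
            others = df[(df["user"] == s["user"]) & (df.index != i)]
            k_z = others[(others["hour"] - s["hour"]).abs() <= z]
            if len(k_z) == 0:
                values.append(np.nan)          # no neighbour within z hours
                continue
            # Average coordinate of K_z (Equation 2).
            avg_lat, avg_lon = k_z["lat"].mean(), k_z["lon"].mean()
            # Euclidean distance between s_i and the average coordinate (Equation 3).
            values.append(np.sqrt((s["lat"] - avg_lat) ** 2 +
                                  (s["lon"] - avg_lon) ** 2))
        # The raw distances are small, so they are scaled up (x10^4 in the experiments).
        df[f"dc_{z}"] = np.array(values) * 1e4
    return df
```

For example, add_distance_coherence(df, 3) would produce a 3-DC style feature set under these assumptions.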
This section explains the machine learning algorithms chosen for our model and the evaluation method. In the dataset, each user has a different label, and each label has a different set of records.

Average Ensemble Classifications

The dataset contains 107,637 samples with a large number of labels (348 users). Instead of using traditional algorithms, we use advanced algorithms called average ensemble classifications to obtain better performance in terms of training speed and accuracy. The average ensemble technique builds several base estimators independently and produces one optimal predictive estimator by averaging the predictions of all the base estimators. The combined estimator is better than any single base estimator because averaging reduces the variance and controls over-fitting. The most common average ensemble algorithms are the following:

- RandomForest [1]: This algorithm implements a meta-estimator that fits a number of decision tree classifiers on various randomized sub-samples of the dataset and uses averaging to create the best predictive estimator. When each estimator is built, a bootstrap sample is created by randomly sampling the dataset with replacement. The size of the sub-samples is set to be the same as the size of the original input sample. A decision tree is usually trained by recursively splitting the data (the process of converting a non-homogeneous parent node into the two most homogeneous child nodes). The algorithm selects an optimal split on the features selected at every node.

- ExtraTrees [2]: This algorithm also produces the best predictive estimator in a way similar to RandomForest. However, there are some differences. First, while RandomForest uses the optimal split, ExtraTrees uses a random split. Second, while RandomForest sets bootstrap = True by default, ExtraTrees sets bootstrap = False by default. This also means that while RandomForest draws samples with replacement, ExtraTrees draws samples without replacement.

- Bagging (Bootstrap Aggregating) [3]: This algorithm differs from RandomForest and ExtraTrees in one point: while RandomForest and ExtraTrees select only a randomized subset of features for splitting a node, Bagging uses all the features for splitting a node.

Stratified K-Fold

The data is shuffled first and is then used in a k-fold cross validation. Since the numbers of samples per user are imbalanced, using the normal k-fold cross validation can lead to the following problem: there may exist a class c_k, where k ∈ {1, 2, …, 348}, such that all the samples in its sample set S_k belong to the test set and the training set contains none of them. The classifier therefore cannot learn about the class c_k. To solve this problem, we used the stratified k-fold cross-validation object, which is a variation of k-fold that can deal with imbalanced data in each class. As presented in Figure 2, it splits the data into train and test sets and returns stratified folds that are made by preserving the percentage of samples of each class.

Evaluation Metrics

To evaluate our approach, we measure the following metrics:

    Accuracy = (tp + tn) / (tp + tn + fp + fn),    Precision = tp / (tp + fp),    Recall = tp / (tp + fn)    (4)

    F1 = 2 · Precision · Recall / (Precision + Recall)    (5)

    FPR = fp / (fp + tn),    FNR = fn / (fn + tp)    (6)

where tp, tn, fp, fn denote the true positive, true negative, false positive, and false negative counts, respectively, and FPR and FNR denote the false positive rate and false negative rate. The accuracy is a good metric when the distribution of each label is roughly similar. However, for an imbalanced dataset, the F1 score is the better metric.
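The training and evaluation procedure just described can be sketched with scikit-learn roughly as follows (an outline only, not the authors' code; it assumes a NumPy feature matrix X with the thirteen columns and an array y_text of user identifiers, and uses the parameter values given in the experimental setup of the next section):

```python
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              BaggingClassifier)
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, f1_score

def evaluate(clf, X, y_text, n_splits=10):
    """Stratified k-fold evaluation returning mean accuracy and weighted F1."""
    y = LabelEncoder().fit_transform(y_text)        # text labels -> 0 .. q-1
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    accs, f1s = [], []
    for train_idx, test_idx in skf.split(X, y):     # folds preserve class ratios
        clf.fit(X[train_idx], y[train_idx])
        pred = clf.predict(X[test_idx])
        accs.append(accuracy_score(y[test_idx], pred))
        f1s.append(f1_score(y[test_idx], pred, average="weighted"))
    return sum(accs) / len(accs), sum(f1s) / len(f1s)

# X and y_text are assumed to be prepared as in Section 3.2 (hypothetical names).
for clf in (RandomForestClassifier(n_estimators=100),
            ExtraTreesClassifier(n_estimators=100),
            BaggingClassifier(n_estimators=100)):
    print(type(clf).__name__, evaluate(clf, X, y_text))
```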
This section presents the experimental setup, the results obtained after applying the classification, and how the best α was found for each algorithm.

The program was implemented using Python 3.7.4 on a MacBook Pro with a 2.8 GHz Intel Core i7 and 16 GB of RAM. The machine learning algorithms were executed using the scikit-learn library, version 0.22. For each ensemble algorithm, the number of base estimators n_estimators is set to 100. The k value in the stratified k-fold cross validation is set to k = 10. Since the categorical labels are represented as text strings (such as user001, user002, etc.), the labels are transformed to numerical values using label encoding. In contrast to ordinal encoding, which encodes the labels to an integer array, and one-hot encoding, which encodes the labels to a one-hot numeric array, label encoding encodes the labels to values between 0 and q − 1, where q is the number of distinct labels over all the classes. Label encoding is the most lightweight method and uses the least disk space. Since the data is imbalanced, to avoid the situation in which F1 does not lie between precision and recall, we calculate the three metrics (precision, recall, and F1 score) for each label and average them weighted by the number of true instances of each class. This can be done by setting the parameter average = 'weighted' in sklearn.metrics. For the accuracy, this parameter is not necessary. Since the values of the distance coherence features are small, we scaled them up by a factor of 10^4.

For each of the three algorithms (RandomForest, ExtraTrees, and Bagging), an experiment was conducted with different values of α. The classification was applied to 107,637 samples with 348 labels corresponding to the 348 users. The main result is presented in Table 2. In the table, NoDC represents the approach without distance coherence features, while α-DC represents the approach using α distance coherence features. As shown later in Section 4.3, RandomForest, ExtraTrees, and Bagging reach their best performance at α = 3, α = 4, and α = 5, respectively. Thus, 3-DC, 4-DC, and 5-DC are chosen for comparison with NoDC in this table (although even 1-DC already beats NoDC; see Section 4.3). The result shows that our approach α-DC outperforms NoDC in all cases. Comparing the algorithms using NoDC with each other, Bagging gives the best result with an F1 score of 98.69% and a false negative rate of 0.02%. Comparing the algorithms using our approach with each other, RandomForest gives the best result with an F1 score of 99.42% and a false negative rate of merely 0.01%, even though RandomForest reaches its peak at α = 3 (which is smaller than α = 4 for ExtraTrees and α = 5 for Bagging). Comparing the improvement of α-DC over NoDC, ExtraTrees benefits the most: its F1 score increases by 2.34% (Δ = +2.34) and its false negative rate decreases by 0.04% (Δ = −0.04).

This section explains the experiment to find the best α for each algorithm. First, α is set to 1 and is then gradually increased until the performance converges or drops after reaching the peak. The result and its graphs are presented in Table 3 and Figure 3. The proposed approach using RandomForest, ExtraTrees, and Bagging achieves its best performance at α = 3, α = 4, and α = 5, respectively. Figure 3 shows that for all the algorithms the graph has roughly a cone shape (the result gradually increases, reaches a peak, and then decreases or converges), not a zigzag shape (in which we could not predict where the peak is). The result also shows that even with just 1-DC (α = 1), our approach already beats NoDC. Scalability is not a big deal for the server: when the number of users increases greatly (e.g., to thousands), it is not complicated to transform the current model from a single classification over all users into a per-user classification where each user has a separate classifier with binary labels representing whether or not a sample belongs to that user.
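The α search described above can be sketched as follows (hypothetical code reusing the illustrative add_distance_coherence and evaluate helpers introduced earlier, with RandomForest as the example classifier):

```python
# Sweep alpha upward and keep the value with the best weighted F1 (cf. Section 4.3).
feature_cols = ["lat", "lon", "mon", "day", "hour", "min", "weekday"]
best_alpha, best_f1 = 0, 0.0
for alpha in range(1, 9):                                      # 1-DC, 2-DC, ...
    df_alpha = add_distance_coherence(df, alpha)
    cols = feature_cols + [f"dc_{z}" for z in range(1, alpha + 1)]
    X = df_alpha[cols].fillna(0.0).to_numpy()                  # fill samples with no neighbour
    _, f1 = evaluate(RandomForestClassifier(n_estimators=100),
                     X, df_alpha["user"].to_numpy())
    if f1 > best_f1:
        best_alpha, best_f1 = alpha, f1
print(best_alpha, best_f1)    # the peak was alpha = 3 for RandomForest in Table 3
```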
In this section, we present the threat model, including which attack is focused on, the adversary's success probability, and the assumptions.

Most such authentication systems, not just our approach but also previous biometrics-based authentication, focus on protecting against insider threats in which the adversary tries to impersonate an authorized user of the system. As mentioned in Section 1, at this time behavioral-based authentication should be used as an additional approach to support the conventional PIN code, password, or biometric authentication. Let us therefore run through an example in which our approach is combined with PIN code-based authentication. Let Pr_A denote the probability that the adversary A can break the system. Pr_A is defined as:

    Pr_A = Pr_guess · Pr_forge    (7)

where Pr_guess and Pr_forge denote the probability that A can correctly guess the PIN code and the average probability that A can fool the classifier, respectively. Pr_forge is the false negative rate, i.e., the percentage of identification instances in which unauthorized users are incorrectly accepted. Table 2 shows that the 3-DC, 4-DC, and 5-DC approaches corresponding to the three algorithms all have the same false negative rate of 0.01%. Thus, Pr_forge = 10^−4. Let τ and σ denote the number of digits in the PIN code and the number of candidate values for each PIN digit. If A has n_t tries before the device is locked after too many wrong PIN codes, we have Pr_guess = n_t / σ^τ. Finally, Pr_A is thus:

    Pr_A = Pr_forge · n_t / σ^τ = 10^−4 · n_t / σ^τ    (8)

Most new smartphone operating systems nowadays require a 6-digit PIN code. Typically, there are 10 candidate digits, from 0 to 9, for each position. Users often have 4 to 6 PIN code tries on both Android and iOS before the device is locked. Therefore, Pr_A ≈ 4 · 10^−10 to 6 · 10^−10.

Suppose the attacker can guess the PIN code after shoulder surfing and then steals the user's smartphone. Since the application is designed such that every GPS record is sent to the server in real time and the GPS history is not stored on the user's smartphone, the attacker cannot read the log from the stolen phone to imitate the user's behavior. There is also no function for downloading the GPS log from the server to the smartphone, because such an action from a user would be suspicious. The only actions the attacker can perform on the GPS tracking application are to turn it on/off or to uninstall it. If the attacker continues to use the smartphone (without being able to retrieve the history log from the application), the attacker's probability Pr_A is now 0.01%. Even though this is not 0% in the best case, it is still much better than the 100% chance A would have of breaking the system without our approach. Similarly, if a collusion attack occurs in which an authorized user shares his/her PIN code with others, Pr_A is also 0.01%. Even if the colluding user tells others his/her personal location history, it is unlikely that every single continuous GPS record can be imitated. This is why the idea of using behaviors (especially long-term and continuous ones) is investigated.
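As a quick numeric check of Equation (8) with the parameters above (plain arithmetic, not part of the proposed system):

```python
pr_forge = 1e-4           # classifier false negative rate (0.01%)
sigma, tau = 10, 6        # 10 candidate digits, 6-digit PIN
for n_t in (4, 5, 6):     # allowed PIN attempts before the device locks
    pr_guess = n_t / sigma ** tau
    print(n_t, pr_guess * pr_forge)   # 4e-10, 5e-10, 6e-10
```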
The model assumes that the server storing the GPS data cannot be accessed or corrupted by the adversary. The data is encrypted, and only the trusted server can decrypt it. The data is transmitted over a secure network. Each smartphone is used by only a single user. The smartphone and the server are protected against side-channel attacks, which could collect user data via timing information, power consumption, electromagnetic leaks, or sound. Last but not least, the users are assumed to send their own data honestly to the server where the classifier runs, because the data could otherwise be actively manipulated by an adversary seeking to make the classifier produce false negatives [29].

In this section, we discuss other security scenarios arising from smartphone use.

What if two users live and work in the same area? As mentioned in Section 3.1, since our project recruited the participants randomly, the users live and work in random areas. Even in the very rare case when two users live and work in the same area, they cannot have the same GPS track for every single hour, because each user has many different activities at different times, not just at home and at the office (such as shopping, outdoor exercise, picking up children from school, etc.). Furthermore, inside the home and the office building, indoor positioning information such as WiFi or Bluetooth beacons can be collected in addition to the GPS. Since the goal of this paper is to investigate the benefit of the extra information (i.e., the distance coherence) from the GPS itself, we do not consider collecting indoor location information; however, it is entirely possible, since the GPS and the indoor location information can be collected independently. Let us also consider the case when legitimate users have the same trajectory within a period of time (e.g., elderly people in a senior home whose daily activities are confined to the surroundings). Since the longitude and latitude values have 6 decimal places (see Section 3.1), the precision is about 0.1 meters. With this precision, two users cannot have the same movement log over a long period.

How does the system work when individuals are outside their routine, or when the attacker follows (imitates) the user's behavior? Since these questions apply not just to GPS-based location authentication but to behavioral-based authentication in general, we discuss them from the general to the specific perspective. We want to emphasize that single-factor behavioral-based authentication is used to support (not to replace) the conventional approaches such as passwords or biometrics, or is combined with other behavioral factors to build multi-factor behavioral-based authentication. If a user is outside his/her routine or the attacker tries to imitate the user's behavior, the password/biometrics or other routines are used to lower the false rejection and false acceptance rates. Although behavioral-based authentication is not yet commonly used, this new but promising research direction has been shown to be feasible for real applications. For instance, Google launched Project Abacus [34] in 2016 to collect smartphone sensor signals (e.g., front-facing camera, touchscreen and keyboard, gyroscope, accelerometer, magnetometer, ambient light sensor, etc.) and demonstrated that human kinematics can convey important information about user identity and can serve as a valuable component of multimodal authentication systems. Among many behaviors, location is a typical factor for identifying users. Human beings are creatures of habit, and in as much as location is a measure of habit [5]. Also, location is easy to collect since it is available on most modern smartphones.

Is it a problem when a user gets a new phone? It is not, since the smartphone is just the device/tool, not the method. The user can register with a location-based authentication system using an account whose application is installed on his smartphones.
As long as the user does not share his account with others, and as long as the application is designed such that at any given time an account can only be logged in on one smartphone, his unique GPS data can be collected regardless of how many smartphones are used and regardless of whether the user shares his smartphones with others.

In this section, we describe an idea for future work based on the separation of daily and weekly distance coherence. In our current approach, for each sample s_i, the distance coherence features are calculated by grouping the other samples whose clock hours are close to the clock hour of s_i, regardless of the date. We thus call it daily distance coherence. An example is given in the first chart of Figure 4: the features for a sample s_i with timestamp 7:00 on April 10, 2020 (Friday) are calculated using the samples at 7:00 ± α hours on any date, as long as they belong to the same user. However, another promising method may improve the accuracy or the F1 score: for each sample s_i, the distance coherence features are calculated by grouping the other samples whose clock hours are close to the clock hour of s_i, but only on the days with the same day of the week. We thus call it weekly distance coherence. Looking at the example in the second chart of Figure 4, suppose s_i occurred at 7:00 on April 10, 2020 (Friday); the features for s_i are then calculated from the samples at 7:00 ± α hours on every Friday, such as April 03, 2020 or April 17, 2020, etc. These features may better reflect the lifestyle of the users that we are aiming at in this paper. For example, a worker goes to work every weekday but goes to the usual supermarket every Saturday around 10:00; a student has a training course at the usual stadium every Thursday around 15:00. Such habits can be measured by the weekly distance coherence. Remark that the weekly distance coherence features are not covered by the daily ones: each feature is computed from the average of all the samples chosen for the main sample, and even though the set of samples chosen in the weekly case is a subset of the set chosen in the daily case, their averages are different.
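As a sketch of this weekly variant (hypothetical code mirroring the illustrative add_distance_coherence helper shown earlier, with the additional same-weekday constraint):

```python
import numpy as np
import pandas as pd

def add_weekly_distance_coherence(df: pd.DataFrame, alpha: int) -> pd.DataFrame:
    """Illustrative weekly variant: neighbours must also share the day of the week."""
    df = df.copy()
    for z in range(1, alpha + 1):
        values = []
        for i, s in df.iterrows():
            others = df[(df["user"] == s["user"]) & (df.index != i)]
            k_z = others[((others["hour"] - s["hour"]).abs() <= z) &
                         (others["weekday"] == s["weekday"])]   # same day of the week
            if len(k_z) == 0:
                values.append(np.nan)
                continue
            avg_lat, avg_lon = k_z["lat"].mean(), k_z["lon"].mean()
            values.append(np.sqrt((s["lat"] - avg_lat) ** 2 +
                                  (s["lon"] - avg_lon) ** 2))
        df[f"wdc_{z}"] = np.array(values) * 1e4
    return df
```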
In this paper, we have shown that using the distance coherence score as additional features can improve user authentication. We collected 107,637 GPS records, including longitude, latitude, and timestamp, from 348 users in Japan. Three average ensemble algorithms, RandomForest, ExtraTrees, and Bagging, were applied to the classification and evaluated using stratified k-fold cross validation. The experimental result showed that our approach outperforms the approach without the distance coherence in all cases. The accuracy reaches up to 99.42%, 99.12%, and 99.25% using RandomForest, ExtraTrees, and Bagging, respectively. In particular, the F1 score can be improved by as much as 2.34% and the false negative rate can be reduced by 0.04% using ExtraTrees.

References

[1] Random Forests
[2] Extremely randomized trees
[3] Ensembles on Random Patches
[4] Cabinet Office, the Government of Japan
[5] Active Authentication on Mobile Devices via Stylometry, Application Usage, Web Browsing, and GPS Location
[6] SenGuard: passive user identification on smartphones using multiple sensors
[7] Self-enhancing GPS-Based Authentication Using Corresponding Address
[8] Multi-Lock: Mobile Active Authentication based on Multiple Biometric and Behavioral Patterns
[9] System and method for real world biometric analytics through the use of a multimodal biometric analytic wallet
[10] Multimodal Deep Learning for Activity and Context Recognition
[11] Modifying smartphone user locking behavior
[12] Are you ready to lock?
[13] It's a hard lock life: A field study of smartphone (un)locking behavior and risk perception
[14] Active user authentication for smartphones: A challenge data set and benchmark results
[15] Hashcat - Advanced Password Recovery
[16] John the Ripper password cracker
[17] Mobile Iris Challenge Evaluation (MICHE)-I, biometric iris dataset and protocols
[18] How to Generate Spoofed Irises From an Iris Code Template
[19] Cosmetic Contact Lenses and Iris Recognition Spoofing
[20] Unraveling the Effect of Textured Contact Lenses on Iris Recognition
[21] On the effectiveness of local binary patterns in face anti-spoofing
[22] Spoofing in 2D face recognition with 3D masks and anti-spoofing with Kinect
[23] System for and method of securing fingerprint biometric systems against fake-finger spoofing
[24] Deep Representations for Iris, Face, and Fingerprint Spoofing Detection
[25] Smudge attacks on smartphone touch screens
[26] Understanding Shoulder Surfing in the Wild: Stories from Users and Observers
[27] Assentication: User Deauthentication and Lunchtime Attack Mitigation with Seated Posture Biometric
[28] Authentication Using Pulse-Response Biometrics
[29] Adversarial classification
[30] Preventing Lunchtime Attacks: Fighting Insider Threats With Eye Movement Biometrics
[31] Continuous authentication using one-dimensional multi-resolution local binary patterns (1DMRLBP) in ECG biometrics
[32] Human Factors in Exhaustion and Stress of Japanese Nursery Teachers: Evidence from Regression Model on A Novel Dataset
[33] STARFAST: a Wireless Wearable EEG/ECG Biometric System based on the ENOBIO Sensor
[34] Learning Human Identity From Motion Patterns

Appendix A

In this appendix, we give a numeric example for the distance coherence extraction described in Section 3.2. Suppose the data consists of 7 samples {s_1, s_2, …, s_7} from 2 users {user1, user2}, as shown in Table 4. We explain how to calculate the distance coherence for each sample {dc_11, dc_12, dc_13, dc_21, dc_22, dc_23, dc_24}. Suppose α (the number of distance coherence features) is set to α = 1.

- For s_1, the hour extracted from the timestamp is hour(s_1) = 10. We find all the samples s_i that belong to the same class (user1) and have hour(s_i) such that (hour(s_1) − α) ≤ hour(s_i) ≤ (hour(s_1) + α), regardless of the date and the second. Only s_2 satisfies the conditions (i.e., hour(s_2) = 11). Thus:

    dc_11 = √( (lon_11 − lon_12)² + (lat_11 − lat_12)² )    (9)

- For s_2, hour(s_2) = 11. The samples s_i from user1 that satisfy (hour(s_2) − α) ≤ hour(s_i) ≤ (hour(s_2) + α) are s_1 and s_3 (hour(s_1) = 10, hour(s_3) = 12). Thus:

    dc_12 = √( (lon_12 − (lon_11 + lon_13)/2)² + (lat_12 − (lat_11 + lat_13)/2)² )    (10)
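As a small usage check of Equation (9) (the coordinates below are made-up stand-ins, since Table 4 is not reproduced here):

```python
# Hypothetical values standing in for Table 4: s_1 and s_2 of user1.
lat_11, lon_11 = 35.7000, 139.7600   # s_1, hour = 10
lat_12, lon_12 = 35.7010, 139.7615   # s_2, hour = 11

# Equation (9): with alpha = 1, only s_2 falls within one hour of s_1.
dc_11 = ((lon_11 - lon_12) ** 2 + (lat_11 - lat_12) ** 2) ** 0.5
print(dc_11)    # ~0.0018; scaled by 10^4 this becomes ~18 in the feature table
```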