Novel trajectory clustering method based on distance dependent Chinese restaurant process

Reza Arfa1,2, Rubiyah Yusof1,2 and Parvaneh Shabanzadeh1,2

1 Centre for Artificial Intelligence and Robotics, Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia
2 Centre for Artificial Intelligence and Robotics, Malaysia-Japan International Institute of Technology (MJIIT), Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia

Submitted 30 May 2018
Accepted 24 June 2019
Published 12 August 2019
Corresponding authors: Reza Arfa, rezaarfa@gmail.com; Rubiyah Yusof, rubiyah.kl@utm.my
Academic editor: Pablo Arbelaez
Additional Information and Declarations can be found on page 12
DOI 10.7717/peerj-cs.206
Copyright 2019 Arfa et al. Distributed under Creative Commons CC-BY 4.0
OPEN ACCESS

ABSTRACT
Trajectory clustering and path modelling are two core tasks in intelligent transport systems with a wide range of applications, from modeling drivers’ behavior to traffic monitoring of road intersections. Traditional trajectory analysis considers them as separate tasks, where the system first clusters the trajectories into a known number of clusters and then the path taken in each cluster is modelled. However, such a hierarchy does not allow the knowledge of the path model to be used to improve the performance of trajectory clustering. Based on the distance dependent Chinese restaurant process (DDCRP), a trajectory analysis system that simultaneously performs trajectory clustering and path modelling was proposed. Unlike most traditional approaches where the number of clusters should be known, the proposed method decides the number of clusters automatically. The proposed algorithm was tested on two publicly available trajectory datasets, and the experimental results recorded better performance and considerable improvement in both datasets for the task of trajectory clustering compared to traditional approaches.
The results indicate that the proposed method is a suitable candidate for trajectory clustering and path modelling.

Subjects Artificial Intelligence, Computer Vision, Visual Analytics
Keywords Path modelling, Trajectory clustering, Anomaly detection, Chinese restaurant process, Distance dependent CRP

How to cite this article Arfa R, Yusof R, Shabanzadeh P. 2019. Novel trajectory clustering method based on distance dependent Chinese restaurant process. PeerJ Comput. Sci. 5:e206 http://doi.org/10.7717/peerj-cs.206

INTRODUCTION
The trajectory of a moving object, obtained by tracking the object’s position from one frame to the next, is a simple yet efficient descriptor of the object’s motion. Trajectory analysis has long been a research focus in different fields of study (Jonsen, Myers & Flemming, 2003; Pao et al., 2012; Reed et al., 1999; Fox, Sudderth & Willsky, 2007). In the context of intelligent surveillance systems (ITS) (Tian et al., 2017), trajectory clustering is a critical core technology in many surveillance applications including activity analysis (Morris & Trivedi, 2011), path modelling (Zhang, Lu & Li, 2009), anomaly detection (Dee & Velastin, 2008), and road intersection traffic monitoring (Aköz & Karsligil, 2014).

Many trajectory analysis systems consist of two main steps. In the first step, trajectories are grouped into clusters based on their similarities. Most proposed methods assume the number of clusters to be known. After the trajectories are clustered, the path taken by agents in each cluster is modelled. There are at least two limitations with these approaches.
First, in real-world problems, the number of clusters is usually unknown or expensive to acquire. Second, trajectory clusters and path models are closely related, so knowledge of one helps improve the performance of the other.

Most existing trajectory analysis methods can be categorized into similarity-based models and Probabilistic Topic Models (PTM). The main stages of similarity-based approaches are calculating a similarity matrix and clustering the trajectories based on that matrix. In the first stage, pairwise similarities between trajectories are obtained via a similarity function and stored in an N × N matrix, where N is the total number of available trajectories. Defining a suitable similarity measure is a challenging task that directly affects the overall accuracy of the system (Zhang, Kaiqi & Tieniu, 2006). Well-known similarity measures used for trajectory analysis include Euclidean distance, dynamic time warping (DTW) (Keogh & Pazzani, 2000), Hausdorff distance (Atev, Miller & Papanikolopoulos, 2010), and Longest Common Sub-Sequences (LCSS) (Vlachos, Kollios & Gunopulos, 2002). After the similarity matrix is obtained, the second stage uses any standard clustering algorithm to group the trajectories into K clusters based on their similarities. Typical clustering algorithms include spectral clustering (Ng, Jordan & Weiss, 2002), agglomerative clustering (Xi, Weiming & Wei, 2006), and fuzzy c-means (Weiming et al., 2006). The main disadvantage of similarity-based approaches is that they require the number of clusters, K, to be known in advance.

Once trajectories are clustered, some studies perform path modelling in a further stage. Path models are useful in intelligent surveillance systems, where they are used for compact representation of clusters, real-time anomaly detection (Morris & Trivedi, 2011), high-level scene understanding (Lei et al., 2014), and route planning (Joseph et al., 2011).
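As an illustration of the first stage of a similarity-based pipeline described above, the following minimal sketch builds the N × N pairwise distance matrix for trajectories of varying length. It assumes 2-D positions and uses the plain symmetric Hausdorff distance, which is only one of the measures listed and is not claimed to be the variant used in this study; the function names `hausdorff` and `distance_matrix` and the toy trajectories are hypothetical.

```python
import numpy as np

def hausdorff(a, b):
    """Plain symmetric Hausdorff distance between two trajectories.

    a has shape (n_a, 2) and b has shape (n_b, 2); lengths may differ.
    Assumption: this is the unmodified Hausdorff distance, which ignores
    the order of observations.
    """
    # Euclidean distance between every point of a and every point of b
    diff = a[:, None, :] - b[None, :, :]
    pairwise = np.sqrt((diff ** 2).sum(axis=-1))
    # Directed distances: the farthest nearest-neighbour in each direction
    return max(pairwise.min(axis=1).max(), pairwise.min(axis=0).max())

def distance_matrix(trajectories):
    """Fill the symmetric N x N matrix of pairwise trajectory distances."""
    n = len(trajectories)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = hausdorff(trajectories[i], trajectories[j])
    return D

# Toy data (hypothetical): two nearby horizontal tracks and one distant track
trajs = [
    np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]),
    np.array([[0.0, 0.1], [1.0, 0.1], [2.0, 0.1], [3.0, 0.1]]),
    np.array([[0.0, 5.0], [1.0, 5.0]]),
]
D = distance_matrix(trajs)
```

In this toy example the first two trajectories end up much closer to each other than either is to the third, which is the structure a clustering algorithm in the second stage would exploit.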
Makris & Ellis (2005) modelled the path as an envelope, which denotes the extent of a path by finding the two farthest samples in a cluster. Morris & Trivedi (2011) used the weighted average of the trajectories in each cluster to form the path model for that cluster. Based on the dominant set clustering approach, Yiwen et al. (2014) proposed a system that obtains the scene structure from clustered trajectories. All these approaches, however, model the path after the trajectories are clustered. Therefore, the quality of the modelled path is limited by how well the trajectories are clustered. Moreover, the modelled path is not used to improve the trajectory clustering.

Another well-known class of approaches in trajectory analysis is based on the probabilistic topic model (PTM) (Papadopoulos, 2008). In PTM approaches, trajectories are first converted into a set of symbols via a pre-defined codebook. This new representation of trajectories is then treated as documents, while the symbols are treated as words. Compared to similarity-based approaches, trajectory analysis methods based on PTM do not usually require the number of clusters in advance. Jeong, Chang & Choi (2011) used latent Dirichlet allocation (LDA) and the hidden Markov model (HMM) to discover the semantic regions and the temporal relationship between them. A two-level LDA topic model was proposed by Song et al. (Lei et al., 2014). The first-level LDA models the motion of a single agent as distributions over patch-based features. The second-level LDA uses the output of the first level to learn interactions among multiple agents. This model, however, does not perform trajectory clustering. Wang et al. (2011) proposed a dual hierarchical Dirichlet process (Dual-HDP). Unlike previous PTM models, Dual-HDP is capable of clustering the trajectories and modelling the semantic scene at the same time.
Each semantic region is modelled as a distribution over grids, and the scene is modelled as a distribution over the semantic regions. The number of clusters and the semantic scene are decided automatically. Since the model relies only on a bag-of-grids representation, it cannot capture the long-term dependency between observations. This results in a partial path model for each cluster. Having a full path model is an important step for interpreting agents’ movement in scenarios such as highways and junctions. Furthermore, since only quantised trajectories are used, the overall performance of Dual-HDP is highly sensitive to grid size. Choosing a large grid size rapidly decreases the performance due to quantisation error. On the other hand, choosing a small grid size requires considerably more data to learn the trajectory patterns.

This study proposes a trajectory clustering and path modelling system that clusters the trajectories and models the path taken by each cluster at the same time. Our approach is based on the distance dependent Chinese restaurant process (DDCRP) (Blei & Frazier, 2011), which is a generalisation of the normal Chinese restaurant process (CRP) (Pitman, 2002).

METHODS
Distance dependent Chinese restaurant process
The Chinese restaurant process (CRP) is a distribution on partitions of integers proposed by Pitman (2002). CRP can be explained by the following analogy: imagine a Chinese restaurant with an infinite number of tables. The first customer enters the restaurant and sits at the first table with probability 1. Each subsequent customer enters the restaurant and either sits at an occupied table with probability proportional to the number of customers already sitting at that table, or sits at an empty table with probability proportional to a parameter α. After this process, which is known as a customer-table assignment, customers sitting at the same table will share a similar dish.
This process can be described as follows:

\[
P(z_i = k \mid z_{-i}, \alpha) \propto
\begin{cases}
n_k, & k \le K \\
\alpha, & k = K + 1
\end{cases}
\tag{1}
\]

where $z_i$ denotes the table assignment of the ith customer, $K$ is the total number of occupied tables, $z_{-i}$ is the table assignment of all customers except the ith customer, and $n_k$ is the total number of customers sitting at the kth table. More details of CRP and its connection to the Dirichlet process can be found in Gershman & Blei (2012).

The distance dependent Chinese restaurant process (DDCRP) generalises the CRP and allows for a non-exchangeable distribution over partitions (Blei & Frazier, 2011). Unlike CRP, where each customer is assigned to a table, in DDCRP each customer is assigned to another customer with a probability that depends on their distance/similarity. Therefore, the more similar two customers are, the more probable it is that they are directly linked. It is important to note that two customers with small similarity can still be indirectly linked to each other via intermediate customers. After this procedure, which is also known as a customer-to-customer assignment, customers who are directly or indirectly linked sit at the same table and share a similar dish. More formally, let $d_{ij}$ represent the distance between the ith and jth customers. The probability that customer i has a direct link to customer j is calculated as:

\[
P(c_i = j \mid D, f, \tau) \propto
\begin{cases}
\tau, & \text{if } i = j \\
f(d_{ij}), & \text{otherwise}
\end{cases}
\tag{2}
\]

where $f(d)$ denotes a monotonically decreasing decay function that satisfies $f(\infty) = 0$, $D$ is the matrix of pairwise distances between customers, and $\tau$ is a constant that controls the probability of a self-link. The DDCRP was originally proposed for modelling non-exchangeable text documents, where the distance between the dates of documents determines their similarity.
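The customer-to-customer assignment of Eq. (2) can be sketched as follows. This is a minimal illustration of drawing links from the DDCRP prior only, not the inference procedure of this study; the function names `sample_links` and `tables_from_links`, the exponential decay function, and the toy positions are assumptions made for the example.

```python
import numpy as np

def sample_links(D, tau, decay, rng):
    """Draw one link c_i per customer from the prior in Eq. (2).

    D is the N x N distance matrix, tau the self-link weight and decay a
    monotonically decreasing function f with f(inf) = 0 (an assumption
    supplied by the caller).
    """
    n = D.shape[0]
    links = np.empty(n, dtype=int)
    for i in range(n):
        # f(d_ij) for every j, copied so that D is never modified
        w = np.array(decay(D[i]), dtype=float)
        w[i] = tau  # weight of a self-link
        links[i] = rng.choice(n, p=w / w.sum())
    return links

def tables_from_links(links):
    """Customers linked directly or indirectly share a table.

    Tables are the connected components of the link graph, found here
    with a small union-find structure.
    """
    n = len(links)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(links):
        parent[find(i)] = find(int(j))
    roots = [find(i) for i in range(n)]
    # Relabel component roots as 0, 1, 2, ... in order of first appearance
    relabel = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return np.array([relabel[r] for r in roots])

# Toy example (hypothetical): customers placed on a line in two distant groups
positions = np.array([0.0, 0.1, 0.2, 10.0, 10.1])
D = np.abs(positions[:, None] - positions[None, :])
rng = np.random.default_rng(0)
links = sample_links(D, tau=1.0, decay=lambda d: np.exp(-d / 0.5), rng=rng)
tables = tables_from_links(links)
```

With a fast-decaying f, links between the two distant groups are extremely unlikely, so the sampled partition rarely merges them, while the number of tables within each group is left to the draw rather than fixed in advance.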
The documents are converted into their bag-of-words (BoW) representation before the posterior probability of DDCRP is calculated. This conversion to a BoW representation is a crucial step that makes the inference of DDCRP computationally tractable. Recently, researchers have adopted DDCRP for problems beyond language processing. Ghosh et al. (2011) proposed a hierarchical extension of DDCRP to produce coarser, human-like image segmentations. In a more recent study, Baldassano, Beck & Li (2015) used DDCRP to model a complex web of connections with a small number of interacting units. Their method was used to model the connectivity between sub-regions of the human brain and to analyse human migration behaviour. Also, Ren et al. (2016) used DDCRP for key frame selection from unordered image sets, where the selected frames are used for dense 3D reconstruction.

Trajectory analysis with distance dependent CRP
Unlike text data, where observations in documents are words sampled from a corpus with a limited number of words, observations in trajectories are not discrete. Trajectories are vectors of varying length in which each observation takes a real value bounded by the scene’s size. One can divide the scene into blocks of equal size and convert a trajectory into its discrete form. After such a conversion, the resulting quantised trajectories are equal-length vectors and each observation takes a discrete value. The size of the grid cells in this case, however, has a direct impact on the system performance. While smaller cells can theoretically improve the performance, they require substantially more data for training. Another disadvantage of treating trajectories as documents is the bag-of-words representation, which discards the order of observations. Discarding the order of samples in trajectory data is problematic, since agents travelling in opposite directions can share the same observations over the grid.
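The loss of ordering described above can be seen in a small sketch. It assumes a square grid with a fixed cell size and a row-major cell index; the helper names `quantise` and `bag_of_grids` and the two toy trajectories are hypothetical.

```python
from collections import Counter

import numpy as np

def quantise(traj, cell, n_cols):
    """Map each (x, y) observation to the index of the grid cell containing it.

    Assumption: square cells of side `cell`, indexed row by row with
    n_cols cells per row.
    """
    cols = (traj[:, 0] // cell).astype(int)
    rows = (traj[:, 1] // cell).astype(int)
    return rows * n_cols + cols

def bag_of_grids(traj, cell, n_cols):
    """Count how often each cell is visited, discarding the visiting order."""
    return Counter(quantise(traj, cell, n_cols).tolist())

# Two agents crossing the same lane in opposite directions (toy data)
east = np.array([[0.5, 0.5], [1.5, 0.5], [2.5, 0.5], [3.5, 0.5]])
west = east[::-1]

same_bag = bag_of_grids(east, 1.0, 4) == bag_of_grids(west, 1.0, 4)
same_sequence = np.array_equal(quantise(east, 1.0, 4), quantise(west, 1.0, 4))
```

Here `same_bag` is true while `same_sequence` is false: the two agents are indistinguishable under the bag representation even though their quantised sequences differ.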
One solution to avoid this problem is to quantise the direction of observations (Wang et al., 2011). Estimating the direction of an observation requires further processing and is sometimes inaccurate. Such a quantisation also increases the size of the corpus and, therefore, requires more data for training. In addition, with a bag-of-words representation alone, long-term dependencies between observations cannot be captured, which results in partial path models in existing PTM approaches.

We addressed these problems by using the similarity between trajectories as the prior probability in DDCRP. Such a prior constrains the assignment of trajectories and encourages links between trajectories according to how similar they are. In addition to the similarity measure, whether two trajectories are linked also depends on their discrete observations over the grid. Since most similarity measures can be applied before the trajectory is converted into discrete form, this formulation is less sensitive to the choice of grid size. In addition, since some similarity measures, including Modified Hausdorff and LCSS, also take the order of the observations into account, quantising the direction is no longer required.

A raw trajectory $T_i$ is usually represented by the sequence of its $n_i$ observations, $T_i = [o_{i,1}, \ldots, o_{i,l}, \ldots, o_{i,n_i}]$. In this representation, $o_{i,l}$ indicates the lth observed position of the ith object. Let $d_{ij}$ denote the pairwise distance between the ith and jth trajectories. This can be any general distance used to measure similarity between trajectories. The pairwise distances between N trajectories can be stored in a distance matrix and denoted as D∈