key: cord-0675752-peyqnpkk authors: Ren, Jing; Xia, Feng; Liu, Yemeng; Lee, Ivan title: Deep Video Anomaly Detection: Opportunities and Challenges date: 2021-10-11 sha: 75468f55b7df661558336354509bad417b5960fa doc_id: 675752 cord_uid: peyqnpkk
Anomaly detection is a popular and vital task in various research contexts, and it has been studied for several decades. To ensure the safety of people's lives and assets, video surveillance has been widely deployed in various public spaces, such as crossroads, elevators, hospitals, banks, and even private homes. Deep learning has shown its capacity in a number of domains, ranging from acoustics and images to natural language processing. However, it is non-trivial to devise intelligent video anomaly detection systems, because anomalies differ significantly from each other across application scenarios. Realising such intelligent systems in daily life would bring numerous advantages, such as saving human resources to a large degree, reducing the financial burden on governments, and identifying anomalous behaviours timely and accurately. Recently, many studies extending deep learning models to anomaly detection problems have emerged, resulting in beneficial advances in deep video anomaly detection techniques. In this paper, we present a comprehensive review of deep learning-based methods for detecting video anomalies from a new perspective. Specifically, we summarise the opportunities and challenges of deep learning models for video anomaly detection tasks. We put forth several potential future research directions for intelligent video anomaly detection systems in various application domains. Moreover, we summarise the characteristics and technical problems of current deep learning methods for video anomaly detection.
With the decreased cost of deploying surveillance cameras, video surveillance has expanded into a wide range of scenarios.
These surveillance cameras constantly produce large volumes of video, which brings a huge workload and a series of technical challenges for anomaly detection in different contexts [1]. Over the past decades, deep learning has achieved great success and shown superior performance in many tasks that were previously considered computationally unattainable, such as face matching [2], recommendation systems [3], and anomaly detection [4]. Correspondingly, more and more effort has been devoted to video anomaly detection with deep learning-based models, which have also achieved notable results in recent years [5]. An intelligent video anomaly detection system is capable of detecting abnormal behaviours or entities that diverge significantly from normality, such as identifying multiple moving objects with limited prior knowledge for video surveillance [6], or detecting specific incidents such as fighting, stampedes, traffic accidents, and vagrancy [7], [8]. Video anomalies are usually contextual and defined according to the real scenario. For example, a crowd gathering in a supermarket or at a concert is normal, but the same gathering is abnormal when social distancing is required to stall the spread of a virus. Most video anomaly detection algorithms can localise anomalies both temporally and spatially [9]. Specifically, detection concentrates on identifying the video fragments that contain anomalies, while localization determines which frames are anomalous and which part of each such frame is considered anomalous. Recent deep learning-based models can deal with both problems in an end-to-end manner. Owing to its great importance, recent years have witnessed an upsurge of research interest in, and applications of, intelligent video anomaly detection systems. However, anomaly detection in video surveillance still faces a series of challenges:
• Ambiguousness. Anomaly detection is broadly regarded as the process of detecting events that are not expected to appear in a specific context. However, in real-world situations, the boundary between normal and abnormal items is not clearly defined. For example, some normal samples also exhibit the strange characteristics of abnormal events, which impairs the detection accuracy of models.
• Dependency. There is still no unified definition of anomaly, although many definitions have been proposed in the literature. Moreover, none of these definitions can be directly applied to a specific anomaly detection task. The same event is likely to have different characteristics and to vary widely across backgrounds. This contextual dependency makes detection models hard to adapt from one scenario to another.
• Sparsity and Diversity. Unlike general classification tasks, real-world anomaly detection datasets contain far fewer positive samples (i.e., anomalies) than negative samples. This data imbalance makes supervised models difficult to train. Besides, real-world anomalous behaviours are diverse and cannot be enumerated exhaustively; some may not even have occurred yet. Therefore, it is impractical to cover all possible types of anomalies in one model.
• Privacy. When detecting anomalies in non-video datasets, private user information (e.g., names) can be replaced by random codes without affecting the final experimental results. In contrast, video surveillance data often contain facial and behavioural information, so individual privacy would be violated if the data were made open source. As a result, open-source datasets are scarce.
• Noise. With the wide coverage of video surveillance, cameras deployed to improve safety are frequently found in places such as elevators, crossroads, shopping malls, restaurants, and even some personal homes. While collecting surveillance data is readily supported by existing imaging facilities, manually annotating these data is time-consuming and error-prone. Such label noise inevitably reduces the accuracy of models.
To tackle the challenges mentioned above, various algorithms have been devised and have achieved remarkable experimental results. Several surveys have reviewed video anomaly detection models. Kiran et al. [10] reviewed unsupervised and semi-supervised video anomaly detection models. Mabrouk and Zagrouba [11] detailed the procedure inside an intelligent video anomaly detection system, including feature extraction and description. Pawar and Attar [12] analysed deep learning techniques for video-based anomalous activity detection. Yao and Hu [13] introduced both traditional and deep learning-based approaches for video violence detection. Both [14] and [15] provided comprehensive surveys of deep learning-based models for video anomaly detection with minor differences in classification, while [14] additionally evaluates the performance of the models. Su et al. [16] summarised the latest methods of violence detection in existing video sequences. Roshan et al. [17] reviewed recent trends in violence detection and performed a comparative study of different state-of-the-art shallow and deep models. Ramzan et al. [18] reviewed various state-of-the-art violence detection techniques, not limited to deep learning models. In [19], the authors conducted an in-depth analysis of deep learning-based anomaly detection methods for image and video data, and also discussed current challenges and future research directions. Our work differs from these previous studies in two ways. On the one hand, this survey investigates the various applications to which video anomaly detection systems could be applied, rather than a single fixed domain.
On the other hand, we systematically summarise potential opportunities in different applications and the challenges that still exist in present algorithms, rather than comparing the mechanisms behind algorithms as other surveys do. Our contributions are outlined as follows.
• A prospective summary of the opportunities and challenges of video anomaly detection with deep learning methods is presented.
• A series of potential research and development directions for intelligent video anomaly detection systems in various application domains are put forth.
• A thorough analysis of the major technical challenges of deep learning methods for video anomaly detection is provided, offering insights into further improvements of models.
The rest of this survey is organised as follows. In Section II, we point out the potential opportunities for deep video anomaly detection models in different real application scenarios. Section III then reviews existing technical challenges. Finally, concluding remarks are presented in Section IV.
Most existing studies are devoted to detecting anomalies in traffic video surveillance, while video anomaly detection tasks exist broadly in various real-world scenarios. In this section, we not only introduce deep video anomaly detection in intelligent transportation, but also outline several potential opportunities in other fields, i.e., digital education, smart home, public health, and digital twins.
Transportation is an essential part of production, daily life, and the economic development of human society. The current transportation system provides people with fast, comfortable, and safe transportation services [20]. However, the growing transportation demand of a rapidly increasing population has directly led to an explosive increase in the number of motor vehicles, followed by problems such as traffic congestion and frequent traffic accidents.
In response, intelligent transportation systems (ITS) came into being, and practice has shown ITS to be an effective solution to the traffic problems caused by current economic development [21], [22]. ITS is widely regarded as the most popular research direction among video anomaly detection applications, and it has also seen remarkable improvements in detection results. Anomaly detection tasks in road traffic scenarios are usually broad, focusing on entities such as vehicles, pedestrians, the environment, and their interactions [23], [24]. Since the detection accuracy of traffic monitoring systems is influenced by several factors, such as weather and traffic conditions, much effort has been devoted to improving the robustness of detection results in intelligent traffic systems [25], [26]. With recent developments in deep learning and wireless communication technologies, numerous innovative traffic monitoring systems have been developed [27]-[29]. Li et al. [30] aimed to detect vehicle anomalies (e.g., traffic accidents) in an unsupervised way. Their detection framework is built on Faster R-CNN [31], which adopts SENet [32] as the backbone feature extractor. Aboah [33] proposed a vision-based system for traffic anomaly detection composed of three main parts: a background estimator to extract background features, a road mask extractor to filter out false anomaly candidates, and a decision tree to confirm and finalise detection results. Despite the continuing development of new deep learning-based models that improve video anomaly detection accuracy in different kinds of environments, many opportunities remain open for future work. For example, there is still a huge gap between learning algorithms and the real-world deployment of systems.
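The three-stage design attributed to Aboah [33] (background estimation, road-mask filtering, and a final confirmation step) can be illustrated with a deliberately simplified sketch. Every concrete choice below is an assumption for illustration, not the method of [33]: frames are grayscale images flattened to lists of intensities, the background estimator is an exponential running average, the road mask is given in advance, and the decision tree is replaced by a hypothetical persistence rule (`persist` consecutive frames). The intuition is that a vehicle stopping on the road is gradually absorbed into the background estimate, whereas a passing vehicle is not.

```python
# Simplified, hypothetical sketch; not the method of [33].
def detect_stalled_pixels(frames, road_mask, alpha=0.1, diff_thresh=0.5, persist=5):
    """Flag road pixels whose background estimate drifts away from the empty road.

    frames: list of frames, each a list of pixel intensities; frames[0] is
    assumed to show the empty road. road_mask: list of bools, True on the road.
    """
    reference = list(frames[0])      # assumed empty-road appearance
    background = list(frames[0])     # running background estimate
    streak = [0] * len(reference)    # consecutive frames each pixel stayed a candidate
    flagged = set()
    for frame in frames[1:]:
        for i, value in enumerate(frame):
            # Background estimator: exponential running average.
            background[i] = (1 - alpha) * background[i] + alpha * value
            candidate = abs(background[i] - reference[i]) > diff_thresh
            # Road mask: discard candidates outside the road.
            if candidate and road_mask[i]:
                streak[i] += 1
                # Hypothetical persistence rule standing in for the decision tree.
                if streak[i] >= persist:
                    flagged.add(i)
            else:
                streak[i] = 0
    return flagged


# Toy scene with five pixels: a vehicle stops on pixel 2 from frame 10, an
# object appears off-road at pixel 4, and a vehicle passes pixel 1 at frame 20.
frames = []
for t in range(60):
    frame = [0.0] * 5
    if t >= 10:
        frame[2] = 1.0
        frame[4] = 1.0
    if t == 20:
        frame[1] = 1.0
    frames.append(frame)
road_mask = [False, True, True, True, False]
print(detect_stalled_pixels(frames, road_mask))  # {2}
```

The persistence rule trades detection delay for robustness: the stalled vehicle is confirmed only some frames after it stops, because the background must first absorb it and the candidate must then persist, while the single-frame passing vehicle never crosses the threshold.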
Another open problem is that simulation environments for autonomous driving need to become more realistic to ensure model robustness in unstable traffic situations.
The traditional offline teaching and learning process has been gradually shifting to online platforms owing to the development of ICT technologies during the last decade, and the outbreak of COVID-19 accelerated this process. Because of the epidemic, online education is likely to remain a main way of knowledge delivery for some time, and online examinations have become popular accordingly. Detecting cheating behaviours and proctoring remote online examinations efficiently and effectively is an essential precondition for ensuring fairness among examinees [34]. However, traditional cheating detection methods may no longer be fully successful in preventing cheating during examinations, so it is imperative to devise artificial intelligence-enabled systems that automatically detect cheating behaviours during an examination. In fact, a series of techniques have been developed and applied to intelligent proctoring systems, such as gaze tracking, voice detection, and identification of any entities that are not allowed during an examination. These technologies enable fair and objective examination supervision while saving manpower. Atoum et al. [35] proposed an online exam proctoring (OEP) system that automatically and continuously detects cheating behaviours during online exams using a wearcam and a webcam. Although a wearcam provides a broader view, equipping every student with a wearcam at home is still not realistic. Bawarith et al. [36] proposed an online proctor in an e-exam management system that realises fingerprint authentication and eye movement tracking; students who move away from the screen can also be detected. Tiong and Lee [37] presented a deep learning system, DenseLSTM, as the behaviour detection agent.
This method can extract better feature representations and strengthen the feature activation of the network, which is effective for predicting potential e-cheating behaviours. A flow diagram of an intelligent invigilation system is shown in Figure 1.
In essence, an educational video surveillance system provides a complete record of student learning behaviour, retaining more detail than traditional forms of educational data storage. For example, most education stakeholders, including researchers, use the score of a course or a student's Grade Point Average (GPA) to evaluate that student's knowledge mastery. This approach is convenient but discards a great deal of information. With increased computing power, large amounts of data can now be processed quickly, so recording the learning process on video can greatly help the analysis of teaching and learning. Such recordings preserve both the entire learning process and the examination process of students. Beyond cheating detection, this provides data support for all education-related anomaly analysis, including course failure analysis, psychological issues, etc.
To ensure home safety, many people install video surveillance systems at home. Video monitoring is a small part of home automation systems and is considered a comprehensive security guarantee [38]. People can use mobile phones and computers to watch the video and keep track of the real-time home situation anytime and anywhere [39]. Because staring at the screen all the time wastes time and energy, automatically identifying anomalous behaviours and immediately sending alarm signals is clearly necessary. Yhaya et al. [40] proposed an adaptive system for abnormality detection in human activities.
This data-driven system adapts to changes in human behavioural routines and can discard old behavioural patterns through an embedded forgetting mechanism. Withanage et al. [41] applied computer vision to recognise fallen postures from RGB-D imaging, to facilitate robot-based in-situ assistance after falls among elderly people living independently. Markovitz et al. [42] worked directly on human pose graphs constructed from a video sequence, which are not influenced by nuisance parameters such as viewpoint or illumination. This unsupervised deep learning model identifies anomalous human behaviours by learning normal behaviours. Similarly, Morais et al. [43] learned regularity in skeleton trajectories by modelling the dynamics and interaction of coupled features. An advantage of their model is that it can explain its internal reasoning and visualise the corresponding factors, which is an important property for deep learning-based anomaly detection models. Existing research mostly focuses on video monitoring techniques that record a video fragment when someone appears in front of the webcam, while automated anomaly detection is seldom studied. The increasing elderly dependency ratio is a common problem all over the world, placing an additional burden on governments to fund pensions and healthcare [44]. For people who cannot afford a caregiver or prefer to live alone, an intelligent video anomaly detection system installed at home would allow the elderly to live independently while emergencies (e.g., an elderly person falling down) are detected and handled in time [45]. Therefore, developing video anomaly detection systems for smart homes is significant for enhancing the quality and convenience of human life. Such systems can also be installed in hospitals and nursing homes to reduce the risk of unforeseen accidents [46].
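The forgetting mechanism described for the adaptive system of Yhaya et al. [40] can be illustrated with a small frequency model. This is a hypothetical sketch, not the design of [40]: it assumes activities have already been recognised from video as discrete (hour, activity) events, multiplies all past counts by a decay factor at every update so that old routines fade, and scores an event by its negative log smoothed frequency (a higher score means a more unusual event). The decay and smoothing values are illustrative.

```python
import math
from collections import defaultdict


# Hypothetical sketch; not the design of [40].
class ForgettingActivityModel:
    """Frequency model of (hour, activity) events with exponential forgetting."""

    def __init__(self, decay=0.99, smoothing=1e-3):
        self.decay = decay          # weight kept by past evidence at each update
        self.smoothing = smoothing  # avoids log(0) for never-seen events
        self.counts = defaultdict(float)
        self.total = 0.0

    def update(self, hour, activity):
        # Forgetting: fade all past evidence before adding the new event.
        for key in self.counts:
            self.counts[key] *= self.decay
        self.total = self.total * self.decay + 1.0
        self.counts[(hour, activity)] += 1.0

    def surprise(self, hour, activity):
        # Negative log of a smoothed relative frequency; higher = more unusual.
        count = self.counts.get((hour, activity), 0.0)
        return -math.log((count + self.smoothing) / (self.total + self.smoothing))


model = ForgettingActivityModel()
for _ in range(200):
    model.update(7, "breakfast")           # established morning routine
print(model.surprise(7, "breakfast") < model.surprise(3, "cooking"))  # True

for _ in range(500):
    model.update(9, "breakfast")           # routine shifts to a later breakfast
print(model.surprise(9, "breakfast") < model.surprise(7, "breakfast"))  # True
```

Without the decay (`decay=1.0`), the old 7 a.m. routine would keep its weight indefinitely; with it, the new routine becomes "normal" and the old one gradually becomes unusual, which is the adaptive behaviour the text describes.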
Public health is an interdisciplinary field related to a variety of areas, including epidemiology, biostatistics, and the social sciences. Environmental health, community health, behavioural health, mental health, and other important sub-areas also fall within its scope. The main purpose of public health is to improve the quality of human life through the prevention and treatment of diseases. By monitoring cases and health indicators, video anomaly detection can benefit public health from many perspectives. Taking coronavirus disease 2019 (COVID-19) as an example, intelligent video surveillance systems can be applied to detect anomalous behaviours and thus help prevent further transmission of the disease [47], [48]. Bhambani et al. [49] proposed a real-time face mask and social distancing violation detection system using YOLO object detection on video footage and images. Zuo et al. [50] developed a deep learning-based pedestrian social distancing detection system, which could be used to analyse the new norm of urban mobility amid the pandemic. Saponara et al. [51] implemented a real-time, AI-based system for COVID-19, composed of a deep learning object detection model combined with a social distancing calculation algorithm. Intelligent monitoring systems utilise real-time video information to detect anomalous patterns and perform predictive analytics; once the anomaly type is identified, predefined signals are initiated to perform remedial actions. Together with wearable sensors, user-specific behaviour patterns, and indoor environment parameters, residents' health conditions can be monitored and further analysed [52]. Vision-based ambient assisted living (AAL) has been developed to improve the daily lives of older and vulnerable people.
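The social distancing component of systems such as [49] and [51] pairs a person detector with a distance calculation. The sketch below covers only the distance step and rests on stated assumptions: person bounding boxes are given as (x_min, y_min, x_max, y_max) pixel coordinates from some detector (hypothetical input here), each person's ground position is taken as the bottom-centre of the box, and a single `pixels_per_metre` factor converts pixel distances to metres, which ignores perspective. The 1.5 m default is illustrative; `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import combinations


# Hypothetical: boxes come from some person detector; single scale, no perspective correction.
def ground_point(box):
    """Bottom-centre of a (x_min, y_min, x_max, y_max) box, used as the foot position."""
    x_min, _, x_max, y_max = box
    return ((x_min + x_max) / 2.0, y_max)


def distancing_violations(boxes, pixels_per_metre, min_distance_m=1.5):
    """Return (i, j, distance_m) for every pair of people closer than min_distance_m."""
    points = [ground_point(box) for box in boxes]
    violations = []
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        distance_m = math.dist(p, q) / pixels_per_metre
        if distance_m < min_distance_m:
            violations.append((i, j, round(distance_m, 2)))
    return violations


boxes = [(0, 0, 20, 100), (30, 0, 50, 100), (400, 0, 420, 100)]
print(distancing_violations(boxes, pixels_per_metre=100))  # [(0, 1, 0.3)]
```

A deployed system would usually replace the single scale factor with a calibrated mapping from the image to the ground plane, since people farther from the camera appear smaller.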
Compared with environmental or body-worn sensors, video-based anomaly detection technologies for AAL are much cheaper, more effective, and easier to implement. For example, fall detection methods have been developed based on RGB cameras, multiple cameras, and depth cameras [53]. Patient monitoring is another important application of video anomaly detection in public health. In hospitals, such systems are employed to observe patients regularly and can detect abnormal activities in wards, including irregular poses, unbalanced walking, bed climbing, etc. [54]. Cattani et al. [55] presented a method to evaluate the possible periodicity of pathological movements by extracting and processing motion signals from videos. Given the low cost of cameras and the maturity of computer vision technologies, anomaly detection in public health is expected to develop further, and vision-based methods can be combined with other sensor data to improve their robustness and accuracy.
In an industrial environment, accurate anomaly detection can help the early detection of potential failures and proactive maintenance scheduling [56]. With the aim of achieving high-performance anomaly detection, recent years have witnessed growing research interest in implementing Digital Twin technologies in dynamic industrial edge/cloud networks [57]. In general, Digital Twin (DT) technology is used to build a virtual environment that serves as the real-time digital counterpart of a physical object or process. Moreover, advances in DT technology enable realistic simulations of complex machinery, thereby speeding up the realisation of smart manufacturing and Industry 4.0. Nowadays, the importance of combining digital twins with deep learning for anomaly detection tasks is increasingly recognised by both academia and industry [58].
In [59], the authors used a DT to generate a large dataset of normal operation data covering a complete year of operation, and then applied a Siamese Autoencoder (SAE) architecture to anomaly detection in a weakly supervised way. Because of the critical nature of the power grid, the ability to detect anomalies in it is of vital importance [60]. In [60], the authors used a convolutional neural network (CNN) within the Automatic Network Guardian for ELectrical systems (ANGEL) Digital Twin environment to detect physical faults in a power system. The proposed method can not only detect faults in the power system, but also identify which bus contains the anomaly. Gao et al. [61] used a DT to collect real-time data and realised real-time defect recognition. As new types of anomalies emerge, traditional models have to be rebuilt, which is time-consuming and costly; to solve this problem, the authors proposed a deep lifelong learning method for novel class recognition. It should be noted that none of the DT-driven anomaly detection systems mentioned above can be directly applied to video surveillance data. In modern industries, cameras are deployed with high density to seamlessly monitor the status of machines and the activities of workers [58]. DT technology can adopt modern data visualisation schemes, such as virtual reality (VR) and augmented reality (AR), to provide more illustrative and user-friendly views. Therefore, the integration of deep learning models and Digital Twin techniques can be further exploited to solve video anomaly detection tasks. Moreover, DT technology can generate synthetic datasets including anomalies in different contexts, which addresses the lack of datasets with enough positive samples and without noise. An architecture diagram of an anomaly detection/prediction system combining digital twin technology and deep learning models is shown in Figure 2.
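The point that a DT can generate synthetic, labelled data including anomalies can be illustrated with a toy example. The sketch is hypothetical and not drawn from [59]-[61]: the "twin" is simply a known nominal model of a periodic machine signal, the simulator adds bounded noise and injects spikes at chosen steps (so every anomaly comes with a ground-truth label), and the detector flags steps whose deviation from the twin's nominal prediction exceeds a threshold. Because the labels are known, the detector can be evaluated directly against them.

```python
import math
import random


# Toy "twin": a known nominal signal model (hypothetical, not from [59]-[61]).
def twin_nominal(t, period=50):
    """Hypothetical digital-twin prediction of the healthy signal at step t."""
    return math.sin(2 * math.pi * t / period)


def simulate(n_steps, anomaly_steps, noise=0.1, spike=2.0, seed=0):
    """Generate a synthetic signal with injected anomalies and ground-truth labels."""
    rng = random.Random(seed)
    signal, labels = [], []
    for t in range(n_steps):
        value = twin_nominal(t) + rng.uniform(-noise, noise)  # bounded sensor noise
        is_anomaly = t in anomaly_steps
        if is_anomaly:
            value += spike  # injected fault
        signal.append(value)
        labels.append(is_anomaly)
    return signal, labels


def detect(signal, threshold=0.5):
    """Flag steps whose residual against the twin's prediction exceeds the threshold."""
    return [abs(value - twin_nominal(t)) > threshold for t, value in enumerate(signal)]


signal, labels = simulate(500, anomaly_steps={50, 120, 333})
predicted = detect(signal)
true_positives = sum(p and l for p, l in zip(predicted, labels))
print(true_positives, sum(predicted), sum(labels))  # 3 3 3
```

Real video-based twins would of course simulate scenes rather than a scalar signal, but the workflow is the same: the simulator controls where anomalies occur, so labelled positive samples come for free.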
Numerous deep learning-based models and intelligent systems have been proposed for the different types of anomalies and technical difficulties encountered in various applications. These models and systems can help reduce human resource consumption to a large extent and make people's lives more convenient. However, many problems and challenges remain in video anomaly detection. In this section, we discuss the technical problems and challenges of existing models according to their structure (i.e., reconstruction-based models, predictive models, generative models, one-class classification models, and hybrid models). There are connections between models of different categories. For example, a predictive model can use a generator to predict the next frame of a video and a discriminator to judge whether the prediction is real or fake [62], [63]. A comparison among these models is summarised in Table III.
Anomalous instances are often scarce compared with normal instances. To address this problem, reconstruction-based anomaly detection methods usually learn the features of normal behaviours in an unsupervised way. The basic idea of a reconstruction model is to reconstruct normal data with low reconstruction error in the testing phase, since their distribution is close to that of the training data; correspondingly, the reconstruction error of anomalous data is expected to be higher. The deep autoencoder [74] is the most commonly used reconstruction model; it is composed of an Encoder, which compresses the input vector into a low-dimensional embedding, and a Decoder, which reconstructs the input vector from this dense embedding. The objective of DeepAD [75] is to minimise the reconstruction error $\mathcal{L}$ between each input vector $x_i$ and its reconstruction, which can be expressed as:
$$\mathcal{L} = \sum_{x_i \in N} \left\| x_i - D(E(x_i)) \right\|_2^2$$
where $N$ is the normal training data and $D(E(\cdot))$ is the DeepAD framework.
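To make the objective concrete, the sketch below scores inputs by the reconstruction error $\|x - D(E(x))\|_2^2$ after fitting only on normal data. To stay dependency-free it swaps the deep autoencoder for a linear autoencoder with a one-dimensional code: for a linear autoencoder trained with squared error, the optimal code spans the leading principal subspace of the (centred) training data, so here the weights are obtained by power iteration on the covariance matrix instead of by gradient descent. This is an illustrative stand-in, not DeepAD [75] itself, and the toy data are hypothetical.

```python
import math


# Linear stand-in for a deep autoencoder (not DeepAD itself).
def fit_linear_autoencoder(normal, iters=200):
    """Fit a linear autoencoder with a 1-D code on normal data only.

    Returns the data mean and a unit code direction w (the leading principal
    direction, found by power iteration on the covariance matrix).
    """
    n, d = len(normal), len(normal[0])
    mu = [sum(x[j] for x in normal) / n for j in range(d)]
    centred = [[xj - mj for xj, mj in zip(x, mu)] for x in normal]
    cov = [[sum(c[i] * c[j] for c in centred) / n for j in range(d)] for i in range(d)]
    w = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * w[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        w = [wi / norm for wi in w]
    return mu, w


def encode(x, mu, w):
    """E(x): project the centred input onto the code direction."""
    return sum((xi - mi) * wi for xi, mi, wi in zip(x, mu, w))


def decode(z, mu, w):
    """D(z): map the 1-D code back to input space."""
    return [mi + z * wi for mi, wi in zip(mu, w)]


def reconstruction_error(x, mu, w):
    x_hat = decode(encode(x, mu, w), mu, w)
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


# Hypothetical normal training data lying close to the line y = 2x.
normal = [[float(t), 2.0 * t + 0.05 * (-1) ** t] for t in range(10)]
mu, w = fit_linear_autoencoder(normal)
print(reconstruction_error([3.0, 6.0], mu, w) < 0.01)    # True: fits the normal pattern
print(reconstruction_error([3.0, -6.0], mu, w) > 10.0)   # True: off-pattern input
```

The same sketch also illustrates the caveat raised by Gong et al. [76]: any anomaly that happens to lie along the learned direction would be reconstructed well and missed.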
In DeepAD and its variants, the Encoder can be any kind of neural network, such as a Convolutional Neural Network (CNN) or a Long Short-Term Memory (LSTM) network. Despite the popularity of DeepAD and its variants, Gong et al. [76] pointed out that the assumption that anomalies yield higher reconstruction errors does not always hold: an autoencoder may generalise so well that it also reconstructs abnormal inputs accurately. In other words, a low reconstruction error produced by such a generalised model does not guarantee that the encoder's representation reflects normality, and the model cannot explain why a detected frame is anomalous.
Table III: Comparison among the model categories
Model | Core idea | Limitations | References
Reconstruction-based models | – | – | [64], [65]
Predictive models | Normal data can be well predicted, i.e., the difference between the predicted frame and the real frame is smaller for normal data than for abnormal data | Higher computational complexity | [66], [67]
Generative models | The Generator generates irregularities for the Discriminator network, and the Discriminator is trained as a binary classifier | Expensive training; instability; difficulties in reproduction; mode collapse | [68], [69]
One-class classification models | Normal data are compacted into a hyperplane or hypersphere, and anything deviating significantly from normal behaviour is considered anomalous | Longer training time | [70], [71]
Hybrid models | Deep learning models are used as feature extractors to generate representations, which are input to a classification algorithm | Suboptimal detection performance due to the separation between representation learning and the classification model | [72], [73]
A video is composed of a series of frames, which can be viewed as a sequence of spatial and temporal signals. The task of a predictive model is to predict frame $x_t$ given the past $p$ frames, which can be expressed as:
$$\hat{x}_t = f(x_{t-p}, \ldots, x_{t-1})$$
where $f$ denotes the predictive model. The loss function of a predictive model is constructed from the real target frame and its predicted frame:
$$\mathcal{L} = \left\| \hat{x}_t - x_t \right\|_2^2$$
where $x_t$ is the real target frame at timestamp $t$ and $\hat{x}_t$ is the predicted frame. The predictive model assumes that normal events can be well predicted.
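The prediction-error idea can be sketched in a few lines. The sketch assumes frames are flattened lists of pixel intensities and, as a hypothetical stand-in for a learned network $f$ with $p = 2$, uses linear extrapolation $\hat{x}_t = 2x_{t-1} - x_{t-2}$; the per-frame anomaly score is the mean squared prediction error, i.e., the loss above averaged over pixels.

```python
P = 2  # number of past frames used by the toy predictor


def predict_frame(past):
    """Toy, hypothetical stand-in for a learned f: linear extrapolation from two frames."""
    older, newer = past
    return [2 * b - a for a, b in zip(older, newer)]


def prediction_scores(frames):
    """Map each timestamp t >= P to the mean squared error between x_hat_t and x_t."""
    scores = {}
    for t in range(P, len(frames)):
        x_hat = predict_frame(frames[t - P:t])
        x_t = frames[t]
        scores[t] = sum((a - b) ** 2 for a, b in zip(x_hat, x_t)) / len(x_t)
    return scores


# Brightness ramps up by 1 per frame; frame 6 contains a sudden abnormal jump.
frames = [[float(t)] * 4 for t in range(10)]
frames[6] = [11.0] * 4
print(prediction_scores(frames))
# {2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 25.0, 7: 100.0, 8: 25.0, 9: 0.0}
```

Note that the abnormal frame also enters the inputs of the next $p$ predictions, so frames 7 and 8 receive high scores as well (frame 7 even the highest); practical systems therefore often smooth or normalise scores over time before thresholding.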
Because normal events are assumed to be predictable, the difference between the predicted frame $\hat{x}_t$ and its ground truth $x_t$ can be used to detect abnormal events. Although predictive models perform well in video anomaly detection tasks, they have higher computational complexity, which makes them more suitable for offline applications.
Generative models, such as generative adversarial networks (GANs) [77], usually contain an architecture that generates frames from noise sampled from a prior distribution such as a Gaussian. A GAN is composed of a generator and a discriminator. The generator learns to fit a new data distribution to the actual distribution of the real data, and the discriminator judges whether a vector is drawn from the real data or from the generated data. The loss function of a GAN is expressed as follows:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$
where $p_{\mathrm{data}}$ is the distribution of the real data and $p_z$ is the prior from which the noise $z$ is drawn. From the discriminator's perspective, the first term rewards identifying real data and the second term rewards recognising generated data, while the generator is trained to minimise the second term. Here, the generator and the discriminator can be any kind of neural network architecture, such as a CNN. Unlike other models, a GAN can serve as an end-to-end model by training the generator and the discriminator simultaneously. Moreover, the generator can synthesise abnormal samples at the same time. Therefore, GANs are among the most widely used models in video anomaly detection. Despite these advantages, GANs suffer from several inherent defects, including expensive training, instability, difficulties in reproduction, and mode collapse.
Considering the ambiguity and diversity of anomalies, the development of multi-class classification for video anomaly detection is urgently needed. In video anomaly detection, researchers often treat anything deviating significantly from normal behaviour as anomalous. Thus, an anomaly detection task without anomalous labels can be viewed as a one-class classification (OCC) problem.
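The one-class view can be made concrete with a simple hypersphere boundary fitted to normal data only, in the spirit of the approach described next. In this sketch the feature vectors are assumed to come from some already-trained network (hypothetical here, so no representation learning takes place), the centre is the mean of the normal features, and the radius is an assumed quantile of their distances to the centre; a test feature outside the sphere is flagged as anomalous.

```python
import math


# Features assumed precomputed by some trained network (hypothetical).
def fit_hypersphere(normal_features, quantile=0.95):
    """Centre = mean of normal features; radius = a quantile of their distances."""
    n, d = len(normal_features), len(normal_features[0])
    centre = [sum(f[j] for f in normal_features) / n for j in range(d)]
    distances = sorted(math.dist(f, centre) for f in normal_features)
    radius = distances[min(n - 1, int(quantile * n))]
    return centre, radius


def is_anomalous(feature, centre, radius):
    """Anything outside the hypersphere is treated as anomalous."""
    return math.dist(feature, centre) > radius


# Hypothetical 2-D features of normal clips, clustered around the origin.
normal = [[0.1 * x, 0.1 * y] for x in range(-2, 3) for y in range(-2, 3)]
centre, radius = fit_hypersphere(normal)
print(is_anomalous([0.05, 0.0], centre, radius))  # False
print(is_anomalous([3.0, 3.0], centre, radius))   # True
```

The quantile controls how tightly the boundary wraps the training data. Deep OCC models go further by training the network jointly so that normal features contract around the centre, which this fixed-feature sketch does not attempt.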
The core idea of this kind of model in video anomaly detection is to find a hypersphere that encloses the network representations of the normal data [78]; any data points that fall outside this hypersphere are considered anomalous. Combining deep learning with OCC allows dense feature representations to be learned jointly with the one-class classification objective. However, this kind of model requires extended training time [5].
Every kind of model has its own objective function and specific advantages in solving anomaly detection tasks. Therefore, researchers can consider combining multiple models as different blocks of one model, which could exploit the strengths of each and improve detection accuracy. In hybrid models, the representative features learned by deep learning methods are transferred to traditional algorithms such as Support Vector Machine (SVM) classifiers [79]. The low-dimensional feature vectors make hybrid models more scalable and computationally efficient, which suits video anomaly detection tasks. Unlike other models with customised loss functions, the feature extractor of a hybrid model is trained with a generic loss, which means the anomaly detection objective has no influence on the learned feature representation. As a result, the performance of hybrid models is suboptimal. Moreover, even though hybrid models perform well on individual tasks, they are mostly task-dependent and cannot switch between different tasks.
In this paper, we presented the potential opportunities for deep video anomaly detection models in several emerging real-world application scenarios and discussed the technical problems in the literature. The novel perspective of this survey on deep video anomaly detection offers clear guidance to researchers who are interested in this field.
[1] A survey of single-scene video anomaly detection
[2] Matching algorithms: Fundamentals, applications and challenges
[3] Deep matrix factorization for trust-aware recommendation in social networks
[4] Industrial pollution areas detection and location via satellite-based IIoT
[5] Deep learning for anomaly detection: A review
[6] Track everything: Limiting prior knowledge in online multi-object recognition
[7] Mobile crowdsourcing in smart cities: Technologies, applications, and future challenges
[8] Human interactive behavior: A bibliographic review
[9] Video anomaly detection and localization based on an adaptive intra-frame classification network
[10] An overview of deep learning based methods for unsupervised and semi-supervised anomaly detection in videos
[11] Abnormal behavior recognition for intelligent video surveillance systems: A review
[12] Deep learning approaches for video-based anomalous activity detection
[13] A survey of video violence detection
[14] A comprehensive review on deep learning-based methods for video anomaly detection
[15] A survey on deep learning techniques for video anomaly detection
[16] Deep learning in video violence detection
[17] Violence detection in automated video surveillance: Recent trends and comparative studies
[18] A review on state-of-the-art violence detection techniques
[19] Image/video deep anomaly detection: A survey
[20] Joint computation offloading, power allocation, and channel assignment for 5G-enabled traffic management systems
[21] Big data analytics in intelligent transportation systems: A survey
[22] Exploring human mobility patterns in urban scenarios: A trajectory data perspective
[23] A survey on an emerging area: Deep learning for smart city data
[24] LoTAD: Long-term traffic anomaly detection based on crowdsourced bus trajectory data
[25] Robust positioning for road information services in challenging environments
[26] Vehicle trajectory clustering based on dynamic representation learning of internet of vehicles
[27] An edge traffic flow detection scheme based on deep learning in an intelligent transportation system
[28] Traffic monitoring system based on deep learning and seismometer data
[29] Ranking station importance with human mobility patterns using subway network datasets
[30] Multi-granularity tracking with modularlized components for unsupervised vehicles anomaly detection
[31] Faster R-CNN: Towards real-time object detection with region proposal networks
[32] Squeeze-and-excitation networks
[33] A vision-based system for traffic anomaly detection using deep learning and decision trees
[34] Remote proctoring: Expanding reliability and trust
[35] Automated online exam proctoring
[36] E-exam cheating detection system
[37] E-cheating prevention measures: Detection of cheating at online examinations using deep learning approach-a case study
[38] Smart home anti-theft system: a novel approach for near real-time monitoring and smart home security for wellness protocol
[39] Security and privacy of smart home systems based on the internet of things and stereo matching algorithms
[40] Towards a data-driven adaptive anomaly detection system for human activity
[41] Fall recovery subactivity recognition with RGB-D cameras
[42] Graph embedded pose clustering for anomaly detection
[43] Learning regularity in skeleton trajectories for anomaly detection in videos
[44] Does population aging contribute to increased fiscal spending?
[45] The smart home for the elderly: Perceptions, technologies and psychological accessibilities: The requirements analysis for the elderly in Thailand
[46] Overview of artificial intelligence in medicine
[47] Video analytics on social distancing and detecting mask
[48] Social distancing detection with deep learning model
[49] Real-time face mask and social distancing violation detection system using YOLO
[50] Reference-free video-to-real distance approximation-based urban social distancing analytics amid COVID-19 pandemic
[51] Implementing a real-time, AI-based, people detection and social distancing measuring system for COVID-19
[52] Smart Home Environment: Artificial Intelligence-Enabled IoT Framework for Smart Living and Smart Health. AI-Based Services for Smart Cities and Urban Infrastructure
[53] A survey on vision-based fall detection
[54] Sensors, vision and networks: From video surveillance to activity recognition and health monitoring
[55] Monitoring infants by automatic video processing: A unified approach to motion analysis
[56] Detecting outlier patterns with query-based artificially generated searching conditions
[57] Digital twin-driven online anomaly detection for an automation system based on edge intelligence
[58] From surveillance to digital twin: Challenges and recent advances of signal processing for industrial internet of things
[59] Real-world anomaly detection by using digital twin systems and weakly supervised learning
[60] Smart grid anomaly detection using a deep learning digital twin
[61] A deep lifelong learning method for digital twin-driven defect recognition with novel classes
[62] Anomaly detection in traffic surveillance videos with GAN-based future frame prediction
[63] Dual discriminator generative adversarial network for video anomaly detection
[64] Anomaly detection with robust deep autoencoders
[65] A study of deep convolutional auto-encoders for anomaly detection in videos
[66] Anomaly detection in surveillance video based on bidirectional prediction
[67] Residual spatiotemporal autoencoder for unsupervised video anomaly detection
[68] Variational autoencoder based anomaly detection using reconstruction probability
[69] Generative neural networks for anomaly detection in crowded scenes
[70] Adversarially learned one-class classifier for novelty detection
[71] A deep one-class neural network for anomalous event detection in complex scenes
[72] High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning
[73] AnomalyNet: An anomaly detection network for video surveillance
[74] DAEN: Deep autoencoder networks for hyperspectral unmixing
[75] DeepAD: A generic framework based on deep learning for time series anomaly detection
[76] Memorizing normality to detect anomaly: Memory-augmented deep autoencoder for unsupervised anomaly detection
Generative adversarial networks Deep one-class classification Deep learning for anomaly detection: A survey The authors would like to thank Teng Guo, Shuo Yu, and Ke Sun for their help with the first draft of this paper.