Title: Face Mask Recogniser Using Image Processing and Computer Vision Approach
Authors: Sharadhi, A. K; Gururaj, Vybhavi; Shankar, Sahana P.; Supriya, M. S; Chogule, Neha Sanjay
Date: 2022-04-03
Journal: Global Transitions Proceedings
DOI: 10.1016/j.gltp.2022.04.016

The onset of the COVID-19 outbreak caused a global health crisis. The face mask has been identified as the most efficient way to prevent the spread of the virus [1]. This has driven the need for a face mask recogniser that not only detects the presence of a mask but also reports how accurately a person is wearing it. The mask should also be recognised from all angles. The goal of this study is to create a new and improved real-time face mask recogniser using an image processing and computer vision approach. A Kaggle dataset consisting of images with and without masks was used, together with MobileNetV2, a pre-trained convolutional neural network. The performance of the model was assessed: it detects the face mask with 98% precision. The recogniser can also detect the mask on faces viewed from the side, which makes it more useful. A comparison of the performance metrics of existing algorithms is also presented. With the spread of the infectious Omicron variant, a robust face mask recogniser of this kind is needed to help control the spread.

The COVID-19 virus created a grave health emergency in the history of mankind [1]. The virus can spread through droplets from an infected individual [2, 3]. The most important defence against the virus is the face mask, as also advised by the World Health Organisation (WHO) [4, 5]. Wearing a mask can safeguard an individual against the virus.
It is important not only to wear a mask but also to wear it so that it covers the nose and mouth completely. A mask worn inappropriately does not provide significant protection and can still allow the virus to spread [6]. A face mask recogniser that detects not only the mask but also how accurately a person is wearing it can help prevent outbreaks of the virus and save many lives [7]. Such a recogniser can be used in public places to monitor crowds and identify individuals who are not wearing a mask or are wearing it incorrectly. This can help spread awareness and educate people about the correct way to wear a mask. It can also help frontline workers focus on eradicating the virus [8]. Because the face mask is our shield against the virus, developing this model was essential, and with the scare of new variants it has high application value, which motivated this study. The face mask recogniser in this paper uses deep learning (TensorFlow and Keras), image processing and computer vision (OpenCV).

The coronavirus has created a global health crisis. The most efficient way to counter the virus is to wear a face mask. Many people wear the mask only for the sake of it and are often complacent about how they wear it, which can further spread the virus. The major spread of the virus is due to people's lack of awareness and carelessness [9, 10]. The existing methods are not efficient and are insufficiently robust. There are also limited datasets available for training such models [1].

In [9], a Multi-Task Cascaded Neural Network (MTCNN) is used to obtain the object of interest, and the dataset is trained using the LeNet algorithm for efficient training.
Prediction was good for faces with masks and relatively lower for unmasked faces, owing to the variety of features present on the face. In [10, 11, 12], YOLO object detection is used. These works focus on face mask recognition as well as on maintaining a certain distance in crowded places. The resulting model is employed in public places because it has good precision, has few heavy components and is time efficient. The model suggested in [13] uses a Support Vector Machine (SVM) with a soft margin. The dataset used is the Face Mask detection dataset, and a confusion matrix is used to evaluate performance. The model has 91.7% accuracy. It detects the presence of a face mask and gives only a 'mask' or 'no mask' classification; it cannot specify the extent to which the person is wearing the mask correctly. It is, however, much faster than deep learning models. In the approach given in [14], a bounding box is generated around the facial features of the individual and used for classification into the "mask" and "no mask" categories. This paper suggested sending an email to the unidentified person cautioning them to wear the face mask at all times. It mainly uses deep learning and computer vision to achieve the required output. In the method proposed in [15, 16, 17], the pre-trained MobileNetV2 classifier is used for classification, with a Kaggle dataset of real-world persons with masks. The steps were training, validation and finally testing of the model. This model provided a high recognition percentage and could be scaled to public places for generating awareness.

This part gives a detailed description of the algorithm and the techniques used to construct this study's Face Mask Recognition Model. As shown in Fig. 1, the process starts with data collection.
In this case study, the data is collected from an available Kaggle dataset. It is then loaded and pre-processed in order to clean it. The data is then split into a training set, used to train the model, and a testing set, used to test it. A fully trained and tested model detects the presence or absence of a face mask effectively and accurately.

Deep learning is a type of machine learning influenced by the structure of the human brain. On the basis of a given logical structure, it analyses data to draw conclusions similar to those a human would draw. It operates on multilayered neural network algorithms to understand the given data. Neural network algorithms are also influenced by the structure of the human brain and detect patterns in order to identify various kinds of data. A network has individual functioning layers that filter different kinds of data, working in a way similar to the human brain [18].

A convolutional neural network (CNN) is one of the important types of neural network, used for object detection, image classification, image recognition, face recognition and similar tasks. It has high feature extraction capability at a low processing cost, and therefore plays a vital role in pattern recognition tasks involving computer vision. It uses convolution kernels to retrieve high-level features from images by convolving those kernels with the input images; the features are later analysed and categorised by these algorithms. An important factor for an image is its resolution, and the computer interprets the image as a set of pixels [19].

MobileNetV2 is an important CNN architecture for image classification. It comprises different layers, each containing a set of learnable filters. It needs very few computing resources and is suitable for handheld systems, embedded devices and computers with low processing power or poor GPUs.
It is also suitable for web browsers, which have limited computation, graphics processing and storage. The main layer of the MobileNetV2 model is the depthwise separable convolution filter. It improves efficiency by tailoring network structure, width and resolution, regulating the trade-off between accuracy and latency [20, 21].

Image processing is one of the rapidly growing technologies today. It is a method in which several operations are performed on an input image in order to obtain information from the enhanced image. Input images are taken as pixels, where each pixel corresponds to the three colours red, green and blue (RGB), or sometimes to black and white. Image processing is also a kind of signal processing and has two methods: analog image processing and digital image processing. Analog image processing is used for hard copies such as print-outs and images. In digital image processing, the image must undergo three important stages: processing, enhancement and display, and knowledge extraction [22].

FaceNet was proposed by Google researchers in 2015. It is a facial recognition system that uses deep convolutional neural network architectures such as ZF-Net and Inception to generate very high-quality facial landmark mappings from images. It also reduces the number of parameters by adding 1×1 convolutions. It achieved the best results on face recognition datasets such as the YouTube Face Database and Labelled Faces in the Wild (LFW).

The first step in developing a Face Mask Recogniser classification model is acquiring the necessary data. The dataset is used to train the model on individuals who are wearing a mask and who are not, so that the Mask Recognition model can distinguish between them. In this study, the dataset is obtained from Kaggle and consists of 7553 RGB images in two classes: with-mask and without-mask.
There are 3725 images of faces with masks and 3828 images of faces without masks. The next step is labelling the collected data into two groups, with-mask and without-mask, as shown in Fig. 2 and Fig. 3.

Pre-processing is the phase carried out before the dataset is split into training and testing sets. Its major steps are resizing the images, converting them into arrays and, lastly, performing one-hot encoding on the labels. This procedure is important because the performance of any classification model depends on how well the data has been cleaned. Therefore, only pre-processed data should be fed into the model. In this study, every image in the dataset is resized to 224 x 224 pixels. The effectiveness of training depends on the resized images: the smaller the image, the faster the model runs. The following step converts all the images in the dataset into arrays using a loop. In the last step, one-hot encoding is performed on the labels of the dataset categorised in the previous step. Most machine learning algorithms, including this study's classification algorithm, cannot deal with categorical data directly, as they need numerical values in all input and output variables. In this step, the labels are converted into numerical form so that the algorithm can process the data properly.

Next, the dataset is separated into two groups: the training set, which is 80% of the data, and the testing set, which is the remaining 20%. Each batch contains both masked and unmasked images.
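The label encoding and the 80/20 split described above can be illustrated with a minimal sketch. It uses only the Python standard library and works on file names rather than image arrays; the helper names `one_hot` and `train_test_split`, the class strings, the file names and the seed value are assumptions made for illustration and are not taken from the study's code.

```python
import random

CLASSES = ["with_mask", "without_mask"]  # the two labels (names assumed)

def one_hot(label, classes=CLASSES):
    """Return a one-hot vector for `label`, e.g. 'with_mask' -> [1, 0]."""
    vec = [0] * len(classes)
    vec[classes.index(label)] = 1
    return vec

def train_test_split(samples, test_fraction=0.2, seed=42):
    """Shuffle `samples` and split them into (train, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical stand-ins for the image arrays: file names and labels only.
samples = [(f"img_{i}.jpg", CLASSES[i % 2]) for i in range(10)]
encoded = [(name, one_hot(label)) for name, label in samples]
train, test = train_test_split(encoded)
print(len(train), len(test))  # 8 2
```

Shuffling before the split is what lets both the training and testing portions contain a mix of masked and unmasked samples.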
This step structures the Face Mask Recogniser model and consists of several stages: constructing the training image generator for data augmentation; building the base model using the MobileNetV2 feature extractor; adding model parameters such as ReLU, Softmax and Average Pooling 2D; compiling; training; and lastly saving the model for future prediction.

For face detection, blob analysis is applied to the acquired images to analyse the shape and features of each face, including the area, length, location and position of features. These features are extracted and passed to the FaceNet model, which uses deep convolutional neural networks (CNNs) trained on these features to obtain face detections. The detected face is the object of interest, or region of interest, in the image. The face is thus marked by a rectangular box known as a bounding box. Finally, the function returns the feature positions of the detected faces, such as locations and predictions, obtained through blob analysis.

The last part is real-time implementation: the previously trained classification model is loaded from disk and deployed for real-time face mask detection. The model is applied to live video streams from a webcam. Because a video stream is made up of frames, the video is read frame by frame and face detection is applied to each frame. Processing continues only if a face is detected; the detected faces are then pre-processed again and passed to the MobileNetV2 classifier. The output is a bounding box drawn in red for a subject who is not wearing a mask and in green when a mask is present.
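The per-frame decision can be sketched as follows. This is only an illustration under stated assumptions: `annotate_frame`, the `faces` input and the `classify` callable are hypothetical stand-ins for the face detector and the trained classifier, and the exact label format ("Mask: 97.00%") is assumed rather than taken from the study.

```python
def annotate_frame(faces, classify):
    """Return one (box, label, colour) tuple per detected face.

    `faces` is a list of (box, face_image) pairs from a face detector and
    `classify` returns (p_mask, p_no_mask) for a face image. Both are
    hypothetical stand-ins for the detector and the trained classifier.
    """
    GREEN, RED = (0, 255, 0), (0, 0, 255)  # BGR order, as OpenCV uses
    annotations = []
    for box, face in faces:  # a frame without faces yields no annotations
        p_mask, p_no_mask = classify(face)
        if p_mask > p_no_mask:
            label, colour, prob = "Mask", GREEN, p_mask
        else:
            label, colour, prob = "No-Mask", RED, p_no_mask
        annotations.append((box, f"{label}: {prob * 100:.2f}%", colour))
    return annotations

# Usage with fake data standing in for one video frame.
fake_faces = [((10, 20, 110, 140), "face_a"), ((200, 30, 290, 150), "face_b")]
fake_scores = {"face_a": (0.97, 0.03), "face_b": (0.10, 0.90)}
result = annotate_frame(fake_faces, fake_scores.get)
for item in result:
    print(item)
```

In a real deployment the returned tuples would be drawn onto the frame, for example with OpenCV's `cv2.rectangle` and `cv2.putText`, before the frame is displayed.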
The bounding box carries a text label holding the prediction, "Mask" or "No-Mask", along with a probability string giving the predicted percentage for the detection. Fig. 4 shows the results of the real-time implementation of the model.

To confirm that the model predicts effectively, testing follows specific procedures. The first is to make predictions on the testing dataset. Loss and accuracy are recorded over a defined number of iterations while training the model. The training results show that accuracy keeps increasing while loss keeps decreasing. Once accuracy stabilises, no further iterations are needed. The next step is to use the following performance metrics to analyse the overall performance of the MobileNetV2 model:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{1}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$

Here TP is the number of true positives, FP the number of false positives and FN the number of false negatives. This study deals with classifying the facial features of people who are or are not wearing masks. In the formulas above, the parameters are defined as follows:
• Precision, as in equation (1), estimates the fraction of the data points the model assigns to a class, such as faces, that are correctly identified.
• Recall, as in equation (2), estimates the fraction of the data points actually belonging to a class that the model correctly identifies.
• False positives are those sets of facial features of individuals that do not belong to a given class but were falsely identified as belonging to it.
• False negatives are those sets of facial features of individuals that do belong to a given class but were inaccurately classified as not belonging to it.

These performance metrics were specifically chosen for their capability to give reliable results each time. The training results are plotted using gg-plot and shown in Fig. 5. Table 1 gives the values of the evaluation metrics, Precision, Recall and F1-Score, for each label, obtained after testing this study's classification model.
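As a minimal sketch of how per-class precision, recall and F1-score can be computed from predictions, the following uses the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 as the harmonic mean of the two. The function name `classification_metrics` and the label lists are assumptions for illustration; the labels are made up and are not the study's results.

```python
def classification_metrics(y_true, y_pred, positive):
    """Compute precision, recall and F1 for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Made-up labels for illustration only; these are not the paper's results.
y_true = ["mask", "mask", "mask", "mask",
          "no_mask", "no_mask", "no_mask", "no_mask"]
y_pred = ["mask", "mask", "mask", "no_mask",
          "mask", "no_mask", "no_mask", "no_mask"]
precision, recall, f1 = classification_metrics(y_true, y_pred, positive="mask")
print(precision, recall, f1)  # 0.75 0.75 0.75
```

Calling the function once per label, with that label as `positive`, gives the kind of per-class breakdown a table of precision, recall and F1-score reports.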
It is observed that the MobileNetV2 model achieves 98% accuracy. In contrast to conventional models, which need large amounts of computation, the model in this study can be easily embedded in real-time recognition systems.

Table 2 compares the existing face mask recognition methods used in various works. Each of the models studied was able to perform classification and detection. The methods used include YOLOv2 with ResNet, MobileNetV2 with SSD, SVM, GAN, Retina Face Mask and VGG16, whereas the proposed model uses MobileNetV2 and CNN. The table also shows that these methods varied in the performance metric used and in the values recorded for those metrics. The proposed model, which uses the MobileNetV2 classifier with a convolutional neural network (CNN), performed better than every other model surveyed.

The coronavirus pandemic has taught the world the importance of wearing a mask, especially in public or crowded places. A mask worn correctly can provide protection from the virus. A mask worn incorrectly is useless and could lead to further spread of the virus. Implementing a face mask recogniser is essential to keep a check on the public and control the virus to a certain extent. The recogniser can help identify defaulters so that they can be corrected. The model can be deployed in various crowded places such as airports, bus stations, markets, offices and hospitals [29]. This study presents a Face Mask Recognition classifier model that uses deep learning and image processing techniques for face mask recognition. The model is trained on a comprehensive dataset consisting of several images labelled "mask" and "no mask". After the training, performance evaluation and testing phases, the model provides, with high accuracy, the probability percentage that a person is wearing a mask.
Organisations should quickly adopt such machine learning techniques and new digital data assets in order to use more unstructured data resources for planning and prevention against COVID-19.

Deep Learning Framework to Detect Face Masks from Video Footage
Detection of Disease in Bombyx Mori Silkworm by Using Image Analysis Approach
Face Mask Detection Using Deep Learning
A convolutional transformation network for malware classification
An Application of Deep-Learning Techniques to Face Mask Detection During the COVID-19. 2021 IEEE 3rd Global Conference on Life Sciences and Technologies (LifeTech)
Deep-learning-empowered breast cancer auxiliary diagnosis for 5GB remote E-health
Masked Face Detection using Multi-Graph Convolutional Networks
A Novel Approach to Detect Face Mask using CNN
Offline signature recognition using image processing techniques and back propagation neuron network system
Evaluating the Masked and Unmasked Face with LeNet Algorithm
HIT4Mal: Hybrid image transformation for malware classification
Real-time Face Mask and Social Distancing Violation Detection System using YOLO
3D reconstruction for motion blurred images using deep learning-based intelligent systems
Face mask detection based on Transfer learning and PP-YOLO
PCB Fault detection using Image processing
Single Camera Masked Face Identification
Image deconvolution for optical small satellite with deep learning and real-time GPU acceleration
Face Mask Wearing Detection Using Support Vector Machine (SVM)
Perceptual enhancement for autonomous vehicles: restoring visually degraded images for context prediction via adversarial training
Face Mask Detection Using OpenCV
Face Mask Detection Using MobileNetV2 in The Era of COVID-19 Pandemic
Face mask detection using MobileNet and Global Pooling Block
Study of the Performance of Machine Learning Algorithms for Face Mask Detection
Deep Learning Model for Face Mask Based Attendance. 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS)
Performance evaluation of intelligent face mask detection system with various deep learning classifiers
Retina Mask: A face mask detector
Fighting against COVID-19: A novel deep learning model based on YOLO-v2 with ResNet-50 for medical face mask detection
Deep Learning based Safe Social Distancing and Face Mask Detection in Public Areas for COVID-19 Safety Guidelines Adherence
A Novel GAN-Based Network for Unmasking of Masked Face