key: cord-0989986-hrwq09ci
authors: Kapoor, Rajiv; Goel, Rohini; Sharma, Avinash
title: An intelligent railway surveillance framework based on recognition of object and railway track using deep learning
date: 2022-03-14
journal: Multimed Tools Appl
DOI: 10.1007/s11042-022-12059-z
sha: 2b32d41444cdd36ee6e397ec875b5281f97e088e
doc_id: 989986
cord_uid: hrwq09ci

In high-speed railways, an intelligent railway safety system is necessary to avoid accidents caused by collisions between trains and obstacles on the track. Research continues on reinforcing railway safety and reducing accident rates, and the rapid development of deep learning has opened new research opportunities in this area. In this paper, a novel and efficient approach is proposed to recognize objects (obstacles) on the railway track ahead of the train using a deep classifier network. 2-D Singular Spectrum Analysis (2D-SSA) is used as a decomposition tool that decomposes the image into useful components; the relevant component is then applied to the deep classifier network. Combining 2D-SSA with a deep network enhances the obstacle recognition performance. The method also presents a novel measure to identify the railway tracks. In addition, the performance of the approach is analyzed under different illumination conditions using the OSU thermal pedestrian benchmark database. Such a system can help reduce the railway accident rate and the associated monetary burden. The results show that the proposed approach effectively recognizes objects (obstacles) on the railway track, which supports railway safety, and that it achieves 85.2% accuracy, 84.5% precision and 88.6% recall.

With the development of high-speed railways, the safety requirements [37] of railways are increasing day by day, because higher train speeds lead to a rise in train mishaps. Obstacles on the railway track are one of the main causes of such mishaps. These obstructions include pedestrians, vehicles crossing the tracks, animals wandering on the tracks [42] and heavy non-living objects that have fallen onto the track from an overpass. In both developed and developing countries, a large number of railway crossings lack proper safety measures such as gates and lights. These issues cause difficulties for travelers and also lead to monetary loss for the railways through train cancellations and accident compensation paid to aggrieved individuals. They can be addressed with a warning system that significantly reduces train mishaps by detecting obstacles on the railway track in advance. Various studies have addressed the detection and recognition of obstacles on railway tracks. Zhou et al. [49] combined object detection and background learning in a unified framework named DEtecting Contiguous Outliers in the LOw-rank Representation (DECOLOR) for moving object detection.

Chen et al. [8] proposed a technique based on image matching and frame coupling to handle the object detection problem caused by camera movement and object movement. Berg et al. [5] proposed an approach for obstacle recognition on the railway track using a monocular thermal camera and collected a novel dataset. Sinha et al. [39] introduced an object detection method for railway tracks that uses vibration sensors and separates the signal from high-level acoustic noise using a novel Monte Carlo based Bayesian analysis. Amaral et al. [2] discussed an obstacle identification framework for railway level crossings using 3D point clouds acquired with a 2D laser scanner. Wu et al. [46] introduced a coarse-to-fine thresholding scheme on particle trajectories in video sequences of moving objects captured by a moving camera. Karaduman [22] proposed an obstacle identification framework using a laser distance meter and a rail-mounted camera. Manikandan et al. [29] proposed an obstacle recognition approach using a thermal camera and the AdaBoost algorithm. The rapid growth of machine learning and deep neural networks is also contributing to railway applications.

Mittal et al. [31] introduced a vision-based railway track monitoring methodology using a deep learning classifier for uncontrolled real-world data. Krummenacher et al. [24] introduced two machine learning strategies based on SVM and neural networks to recognize rail wheels. Garcia et al. [13] proposed a methodology for autonomous train stop operation using monocular vision and deep learning models. Kapoor et al. [21] demonstrated a framework using the Hough transform and HSV segmentation with deep learning for recognition of the railway track and obstacles on it. The railway track scene has something in common with the road scene; research on road information detection therefore also helps to broaden the viewpoint. Wang et al. [43] proposed a siamesed fully convolutional network (s-FCN) to segment road regions in RGB images and improve detection performance. Wang et al. [44] also proposed weakly supervised adversarial domain adaptation to improve segmentation performance from synthetic data to real scenes using three deep neural networks. Sudha et al. [41] used the YOLO deep network and an improved visual background extraction algorithm to detect multiple vehicles in input videos. Zeng et al. [38] built a fusion network for lane line segmentation and road element detection for autonomous driving. Nayak et al. [33] also demonstrated the use of various deep learning models for detection, specifically COVID-19 detection in chest X-ray images, whereas in the proposed work deep detection frameworks are applied to thermal images of the railway track.

Besides obstacles on the railway track, adverse weather conditions such as rain, haze, darkness and cloudy weather are also a major challenge for railway safety, because these conditions have different levels of illumination that directly affect visibility from a train moving at high speed. Low-visibility conditions can often cause major losses to the railways. The system should therefore also be invariant to illumination so that it can provide proper visibility in any type of adverse weather. Mangale et al. [28] suggested a methodology to identify objects in thermal images under low illumination and different weather conditions using a directed acyclic graph (DAG). Kristo et al. [23] discussed an approach for thermal object detection in adverse weather conditions using the YOLO framework.

Decomposition is a fundamental stage that is essential for providing proper information to the classifier for image recognition. Sharma et al. [34] presented a method for automated emotion recognition using higher-order statistics (i.e. third-order cumulants). The sub-bands of the signals are passed to particle swarm optimization to remove redundant information, and a deep learning algorithm is then used to predict human emotions. Srirangam et al. [26] used the time-frequency matrix of EEG signals, evaluated by the Fourier Synchro Squeezing Transform (FSST) and the Wavelet Synchro Squeezing Transform (WSST), to classify focal and non-focal EEG signals using a deep CNN. Chaudhary et al. [6] presented a decomposition approach using the 2D Fourier-Bessel Series Expansion based Empirical Wavelet Transform (2D-FBSE-EWT). The sub-images from 2D-FBSE-EWT are used with machine learning methods and a ResNet-50 based method to detect glaucoma. Chaudhary et al. [7] also used Fourier-Bessel Series Expansion-based Decomposition (FBSED) with various pre-trained deep network classifiers to diagnose COVID-19 using X-ray and CT images. The 2D-EMD [12] approach outperforms numerous related methods such as wavelet-based image decomposition [11]. However, 2D-EMD is very time-consuming to execute because it is an iterative process [20]. In contrast, the 2D-SSA approach requires no iterations and therefore significantly reduces computational cost with improved efficiency [47]. The 2D-SSA approach outperforms 2D-EMD and a number of other state-of-the-art approaches in terms of accuracy and also requires much less execution time than 2D-EMD [9]. The 2D-SSA algorithm is particularly useful for extracting more precise discriminating patterns from the original data scenes [17]. There is always a need for an object recognition framework capable of achieving high safety on railway tracks.
The achievements of deep learning (R-CNN [14], Fast/Faster R-CNN [15, 36], YOLO [35], SSD [25], etc.) in the field of object recognition motivated an analysis of their performance in obstacle recognition on the railway track. The performance of 2D-SSA [18] in image enhancement and image restoration [3] further motivates combining it with a deep network to enhance object recognition performance for railway safety.

In this paper, the proposed approach automatically performs railway track identification and object (obstacle) recognition on the railway track. A system is designed using 2D-SSA that decomposes the input frame into various discriminative components (patterns). The approach identifies the railway tracks by using the particular component (pattern) that contains the information about the tracks. The component (pattern) containing the object (obstacle) is then applied to the deep networks to efficiently recognize the obstacles on the railway track. The major contributions of the paper are:

• The 2D-SSA approach is used for the first time to extract railway track information and detect the railway track in railway safety applications.
• To the best of our knowledge, this is the earliest effort to use 2D-SSA with a deep network classifier to enhance object recognition performance.
• The performance of different deep networks with 2D-SSA is analyzed to choose a suitable one for railway safety applications.
• The proposed approach also provides efficient recognition performance under different illumination conditions when tested with the OSU thermal pedestrian database from the OTCBVS benchmark dataset collection [10].

The paper is organized as follows: Section 2 describes the proposed strategy in detail, including detection of railway tracks and recognition of obstacles on the railway track. Section 3 presents the results and discussion. Finally, the conclusion and future scope are provided in Section 4.

The proposed technique is used to identify the railway track along with the obstacle on it from a railway track thermal video (https://www.youtube.com/watch?v=xzGc71JFiBI). The basic block diagram of the proposed strategy is shown in Fig. 1. The frames are extracted from the thermal video sequence and motion artifacts in each frame are compensated. The frames are then decomposed into 'g' distinct components, each carrying different important information about the scene, using 2D singular spectrum analysis (2D-SSA). The features of the railway track are extracted from the 'l'-th component to detect the railway track. Similarly, the 'k'-th component, which carries information about the obstacle, is applied to the deep network (Faster R-CNN/SSD/YOLOv2/YOLOv5 [27]) to recognize the obstacle. The final predictions (railway track and obstacle) are combined to produce the final output. The distinct steps of the proposed approach are illustrated in Algorithm 1 and explained in the following subsections.

The thermal railway track video may contain motion artifacts. Motion blur is the most important artifact in the captured images; it arises when the camera cannot stay in focus because of the train's motion. In this work, the effect of motion blur in the frame is compensated using Wiener filtering [48], which has shown good performance in removing motion blur from images. The motion artifacts in the form of blur are compensated by the Wiener filter as:

Where $F(u,v)$ is the input frame with motion artifacts, $H_w(u,v)$ is the Wiener filter function, $S(u,v)$ is the point spread function, $k$ is the reciprocal of the signal-to-noise ratio and $I(u,v)$ is the output of the Wiener filter used for further processing.
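As an illustration of this step, the following is a minimal sketch of frequency-domain Wiener deconvolution. It assumes the standard form $H_w = S^{*} / (|S|^2 + k)$ and $I = H_w F$, which is consistent with the symbol definitions above but is not confirmed to be the exact expression used in this work. The horizontal motion PSF, the value of `k` and the random test image are hypothetical choices made only for the example.

```python
import numpy as np

def motion_psf(shape, length):
    """Horizontal motion-blur point spread function, zero-padded to `shape` (illustrative)."""
    psf = np.zeros(shape)
    psf[0, :length] = 1.0 / length
    return psf

def wiener_deblur(frame, psf, k):
    """Apply H_w = conj(S) / (|S|^2 + k) to the frame spectrum F (assumed standard form)."""
    F = np.fft.fft2(frame)            # spectrum of the blurred input frame
    S = np.fft.fft2(psf)              # transfer function of the point spread function
    H_w = np.conj(S) / (np.abs(S) ** 2 + k)
    return np.real(np.fft.ifft2(H_w * F))

# Synthetic check: blur a random image circularly, then restore it.
rng = np.random.default_rng(0)
sharp = rng.random((32, 32))
psf = motion_psf(sharp.shape, length=5)
blurred = np.real(np.fft.ifft2(np.fft.fft2(sharp) * np.fft.fft2(psf)))
restored = wiener_deblur(blurred, psf, k=1e-3)

err_blurred = np.mean((blurred - sharp) ** 2)
err_restored = np.mean((restored - sharp) ** 2)
```

In practice the PSF and `k` would have to be estimated from the train speed and the noise level of the thermal camera; larger `k` suppresses noise amplification at the cost of a softer result.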

Informative features extracted from the thermal video frames can enhance the performance of the recognition system. In this work, 2D-SSA [47] is used as an efficient decomposition tool for thermal images. When Singular Spectrum Analysis (SSA) is applied to a 2-D signal, i.e. an image, the implementation steps are embedding, Singular Value Decomposition (SVD) [40], diagonal averaging and grouping, as in 1-D SSA. The embedding and diagonal averaging steps have to be modified from 1-D to 2-D, and the remaining stages are identical to 1-D SSA. The steps of the 2D-SSA algorithm, with their mathematical explanation, are discussed below:

In the embedding of the 2-D signal (image), the first stage is to transform the image (2-D data) into another matrix form known as the trajectory matrix [16]. The trajectory matrix not only contains all the information of the image but also retains the neighborhood information. Consider an image 'I' of size $s \times t$. To construct the trajectory matrix 'T', a window 'B' of size $m \times n$ is considered, where $1 \le m \le s$ and $1 \le n \le t$. The top-left point of the window is taken as its reference point. The reference point $(x, y)$ ranges over $1 \le x \le s - m + 1$ and $1 \le y \le t - n + 1$. The path of movement of the window $B_{x,y}$ over the image is given by the reference points $(x, y)$ as follows:

Then the elements of each window $B_{x,y}$ are rearranged into a column by taking its rows one by one:

$$\vec{B}_{x,y} = \left( I_{x,y},\; I_{x,y+1},\; \ldots,\; I_{x,y+n-1},\; \ldots,\; I_{x+m-1,y+n-1} \right) \quad (3)$$

where $I_{x,y}$ is the pixel value at location $(x,y)$ and $\vec{B}_{x,y}$ is the column vector for window $B_{x,y}$ with reference point $(x,y)$. Finally, all the column vectors $\vec{B}_{x,y}$ are arranged into the trajectory matrix 'T' as:
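A minimal sketch of the embedding step is given here for illustration. It assumes that the windows are visited in row-major order of their reference points and that each window becomes one column of 'T'; the function name `trajectory_matrix` and the toy image are hypothetical.

```python
import numpy as np

def trajectory_matrix(image, m, n):
    """Stack every m x n window of `image`, flattened row by row, as a column of T."""
    s, t = image.shape
    columns = []
    for x in range(s - m + 1):          # reference row of the window (0-based)
        for y in range(t - n + 1):      # reference column of the window (0-based)
            window = image[x:x + m, y:y + n]
            columns.append(window.reshape(-1))
    return np.stack(columns, axis=1)    # shape: (m*n, (s-m+1)*(t-n+1))

image = np.arange(12, dtype=float).reshape(3, 4)
T = trajectory_matrix(image, m=2, n=2)
print(T.shape)  # (4, 6)
```

Each column holds one window, so neighbouring pixels of the image stay together in 'T', which is the neighbourhood property mentioned above.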

The next step is to perform Singular Value Decomposition (SVD) of the trajectory matrix 'T'. The SVD expansion of 'T' can be obtained through the eigendecomposition of the lag covariance matrix $C = TT^{T}$. The SVD of 'C' can be represented as follows [30]:

Where $\lambda_k$ ($k = 1, 2, \ldots, K$) are the eigenvalues of $TT^{T}$ and $E_k$ are the normalized eigenvectors corresponding to $\lambda_k$.

Diagonal averaging is performed with a two-step hankelization [19] process: hankelization is first applied within each block and then between blocks. The values that belong to the same element of the image are averaged. This makes it possible to decompose the image into several components extracted on the basis of the SVD.

Where $t^{(k)}$ is the 2-D signal projected from $T_k$ after diagonal averaging for $k = 1, 2, \ldots, K$, and $T_k$ is the 'k'-th weighted orthogonal matrix, which is given as:
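A minimal sketch of the SVD and diagonal-averaging steps follows. It assumes the standard SSA form of the elementary matrices, $T_k = E_k E_k^{T} T$, and it realizes the averaging by accumulating every window entry back onto the pixel it came from, rather than by an explicit two-step hankelization. The helper names and the random test image are hypothetical.

```python
import numpy as np

def embed(image, m, n):
    """Trajectory matrix: one flattened m x n window per column (row-major order)."""
    s, t = image.shape
    cols = [image[x:x + m, y:y + n].reshape(-1)
            for x in range(s - m + 1) for y in range(t - n + 1)]
    return np.stack(cols, axis=1)

def average_back(T_k, shape, m, n):
    """Average all entries of T_k that correspond to the same image pixel."""
    s, t = shape
    total = np.zeros(shape)
    count = np.zeros(shape)
    col = 0
    for x in range(s - m + 1):
        for y in range(t - n + 1):
            total[x:x + m, y:y + n] += T_k[:, col].reshape(m, n)
            count[x:x + m, y:y + n] += 1
            col += 1
    return total / count

def ssa2d_components(image, m, n):
    """Return the K = m*n projected 2-D signals t^(k), ordered by decreasing eigenvalue."""
    T = embed(image, m, n)
    eigvals, eigvecs = np.linalg.eigh(T @ T.T)   # eigendecomposition of C = T T^T
    order = np.argsort(eigvals)[::-1]
    components = []
    for k in order:
        E_k = eigvecs[:, [k]]                    # normalized eigenvector as a column
        T_k = E_k @ (E_k.T @ T)                  # assumed elementary matrix
        components.append(average_back(T_k, image.shape, m, n))
    return np.array(components)

rng = np.random.default_rng(1)
image = rng.random((8, 8))
components = ssa2d_components(image, m=3, n=3)
```

Because the eigenvectors form an orthonormal basis and the averaging is linear, the projected signals sum back to the input image, which gives a quick sanity check for any implementation of this stage.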

The next step is to group the 'K' 2-D projected signals $t^{(k)}$ into 'g' disjoint groups using Hierarchical Cluster Analysis (HCA) [45] to obtain 'g' components of the input image, as below:

The 'g' groups of 2-D projected signals are formed as $s^{(1)}, s^{(2)}, \ldots, s^{(g)}$ for $1 \le g \le K$. Each group $s^{(i)}$ represents a decomposed component of the input image, i.e. the sum of all the 2-D projected signals in group 'i', which is represented as:

The final 'g' decomposed components of the input image are represented as 'S' which is given as:

Once the input frame is decomposed into 'g' components, each component is analyzed to identify which one contains the desired information, i.e. information about the railway track and the obstacles on it.
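The grouping stage can be sketched as follows. This is only an illustration: the similarity measure used for HCA is not stated here, so the sketch assumes average-linkage clustering on the correlation distance between flattened projected signals, using SciPy's `linkage` and `fcluster`. The function name `group_components` and the toy signals are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def group_components(projected, g):
    """Cluster K projected 2-D signals into at most g groups and sum each group."""
    flat = projected.reshape(len(projected), -1)
    # Assumed choice: average linkage on correlation distance between signals.
    Z = linkage(flat, method="average", metric="correlation")
    labels = fcluster(Z, t=g, criterion="maxclust")   # labels start at 1
    groups = [projected[labels == i].sum(axis=0) for i in np.unique(labels)]
    return labels, np.array(groups)

# Toy signals: two horizontal-gradient patterns and two vertical-gradient patterns.
yy, xx = np.mgrid[0:6, 0:6].astype(float)
projected = np.array([xx, 2 * xx, yy, 3 * yy])
labels, S = group_components(projected, g=2)
```

Each row of `S` is one decomposed component $s^{(i)}$; since the groups are disjoint, the components still sum to the same image as the ungrouped signals.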

The first and most significant part of the proposed work is the recognition of the railway track. The frame is decomposed into various discriminative components (patterns) using 2D-SSA. The discriminative components contain vertical, horizontal and diagonal information about the scene. The pattern of the tracks most closely resembles the component with diagonal features. The 'l'-th component, which contains the strong pattern for the railway track, is selected from the 'g' discriminative components of the particular frame as given below:

Where $S_{Track} = s^{(i)}\big|_{i=l}$ is the 'l'-th discriminative component (pattern) having strong information about the railway track.

Then the coordinates of the railway tracks are extracted from the 'l'-th component using threshold-based segmentation. The track features are given as:

Where '1' and '0' denote white and dark pixels respectively, $S_{Track}(i,j)$ is the discriminative component (pattern) having strong information about the railway track, $S_{Seg}(i,j)$ is the output image containing the railway track features, and $T_{low}$ and $T_{high}$ are the lower and upper threshold limits. These coordinate locations are highlighted on the particular frame to represent the railway track. In this way the railway tracks are recognized in each frame of the video sequence.
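A minimal sketch of the segmentation step, assuming a pixel is set to '1' when its value lies between $T_{low}$ and $T_{high}$ inclusive and to '0' otherwise (whether the bounds are inclusive is an assumption). The array values and thresholds are illustrative only.

```python
import numpy as np

def segment_track(s_track, t_low, t_high):
    """Binary mask: 1 where t_low <= S_Track(i, j) <= t_high, 0 elsewhere."""
    return ((s_track >= t_low) & (s_track <= t_high)).astype(np.uint8)

s_track = np.array([[0.1, 0.6, 0.9],
                    [0.5, 0.7, 0.2]])
s_seg = segment_track(s_track, t_low=0.5, t_high=0.8)

# (row, column) coordinates of the pixels kept as track features
track_coords = np.argwhere(s_seg == 1)
```

The coordinates in `track_coords` are what would be highlighted on the original frame to draw the recognized track.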

Once the railway tracks are identified, the subsequent step is to recognize objects (obstacles) on the railway tracks. First, the 'k'-th component (carrying horizontal information), which contains the strong pattern for the obstacles on the railway track, is selected from the 'g' discriminative components of the particular frame, which is given as:

Where $s^{(i)}\big|_{i=k}$ is the 'k'-th discriminative component (pattern) having strong information about obstacles on the railway track. After the component carrying the obstacle information has been extracted, it is passed to the deep network to recognize the obstacles.

The Faster Region-based Convolutional Neural Network (Faster R-CNN) [36] is used to recognize obstacles on the railway tracks. Faster R-CNN comprises two modules: the first is a deep fully convolutional network that proposes regions, and the second is the Fast R-CNN detector [15], which identifies the objects using the region proposals. In the default setup, anchors with 3 scales and 3 aspect ratios are placed at each position of the image. Finally, the Fast R-CNN network, which has two fully connected layers, is used for classification. One layer classifies the proposals into N + 1 distinct classes (N object classes and one background class for eliminating bad proposals). The other fully connected layer refines the bounding box for the 'N' classes using regression prediction. Anchors that overlap the ground truth with an Intersection over Union (IoU) greater than 0.7 are classified as foreground, and anchors that do not overlap any ground truth object (IoU under 0.3) are classified as background. The loss function to be minimized is given as:

Where $m_v$ is the predicted probability of anchor $v$ being an object. The vector $n_v$ indicates the parameterized coordinates of the predicted bounding box. $L_c$ and $L_r$ are the classification and regression losses, respectively.

The object detection strategy YOLOv2 [35] divides the image into an S × S grid and predicts B bounding boxes and C class probabilities for every grid cell. Each bounding box comprises five predictions: four box coordinates and an object confidence. The object confidence reflects how reliably an object exists in the box. As a general-purpose object detection model, however, YOLOv2 is best suited to cases where there are many classes to be distinguished and the differences among the classes are large. Finally, the non-maximum suppression (NMS) strategy is applied to discard redundant bounding boxes.
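Two IoU-based steps mentioned above, the foreground/background labelling of anchors and non-maximum suppression, can be sketched in a few lines. This is an illustrative sketch, not the detectors' actual implementation: boxes are assumed to be `(x1, y1, x2, y2)` tuples, anchors between the two thresholds are marked "ignored", and the NMS threshold of 0.5 is a hypothetical default.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def label_anchor(anchor, gt_boxes, fg_thresh=0.7, bg_thresh=0.3):
    """Label an anchor by its best IoU with any ground-truth box."""
    best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
    if best > fg_thresh:
        return "foreground"
    if best < bg_thresh:
        return "background"
    return "ignored"  # in-between anchors do not contribute to training

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep

gt = [(0, 0, 10, 10)]
print(label_anchor((0, 0, 10, 9), gt))      # foreground (IoU = 0.9)
print(label_anchor((20, 20, 30, 30), gt))   # background (IoU = 0.0)

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
kept = nms(boxes, scores=[0.9, 0.8, 0.7])
print(kept)                                 # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the distant third box is kept.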

The most recent version of the YOLO object recognition model at the time of writing is YOLOv5 [27], released by Glenn Jocher in 2020. YOLOv5 is built on the PyTorch framework. YOLOv5s is the smallest model, followed by YOLOv5m, YOLOv5l and YOLOv5x in order of increasing size. As the network size increases, its performance may also increase, at the expense of additional processing time. Accordingly, the larger models may be useful only for complex problems where large datasets are available.

During training, SSD [25] needs to determine which default boxes correspond to a ground truth detection and train the network accordingly. For each ground truth box, the selection is made from default boxes that vary over location, aspect ratio and scale. The SSD training objective is derived from the MultiBox objective but is extended to handle multiple object classes. The overall objective loss function is a weighted sum of the localization loss and the confidence loss.

Trains have a braking distance, and an alert must be raised before the train comes within this distance of an obstacle in order to avoid a collision. The braking distance of a train is the distance from the point at which its brakes are applied to the point at which it comes to a stop. The braking distance 'D' of the train [4] is given as:

Where 'u' is the speed of the train at the point when the brakes are applied, 'a' is the train's deceleration rate, 'g' is the acceleration due to gravity and $(h_2 - h_1)$ is the gradient of the track, i.e. the difference between the height at which deceleration began ($h_1$) and the height at the stopping point ($h_2$). Indian trains normally decelerate at a rate of 0.5 m/s² to 1.2 m/s², which may increase up to 1.5 m/s² under emergency braking [1]. In plain regions, the gradient of the track $(h_2 - h_1)$ is considered zero. The average speed of a train is approximately between 80 and 100 km/h [1]. The speed/time analysis along with the calculation of the braking distance at a speed of 100 km/h is illustrated in Table 1.

Timely recognition of obstacles on the track (i.e. before the braking distance) is the main performance criterion of the recognition system, i.e. $R_D > D$. The recognition distance $R_D$ is the distance between the obstacle and the train at the time of recognition. The alert must be given with a lead time greater than the braking time $t_B$ of the train. The performance of the network is improved for timely recognition of the obstacle by optimizing the deep network parameters.
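As a worked example of the braking calculation, the sketch below assumes a level track (zero gradient) and constant deceleration, in which case the braking distance reduces to $D = u^2 / (2a)$ and the braking time to $t_B = u / a$. The function names are hypothetical; the speed of 100 km/h, the deceleration of 0.5 m/s² and the recognition distance of 1 km are the values quoted in this paper.

```python
def braking_distance(speed_kmh, decel):
    """Braking distance in metres on a level track: D = u^2 / (2a)."""
    u = speed_kmh / 3.6              # km/h to m/s
    return u ** 2 / (2.0 * decel)

def braking_time(speed_kmh, decel):
    """Braking time in seconds on a level track: t_B = u / a."""
    return (speed_kmh / 3.6) / decel

def is_timely(recognition_distance, speed_kmh, decel):
    """Check the criterion R_D > D for a given recognition distance in metres."""
    return recognition_distance > braking_distance(speed_kmh, decel)

D = braking_distance(100, 0.5)       # about 771.6 m
t_B = braking_time(100, 0.5)         # about 55.6 s
timely = is_timely(1000, 100, 0.5)   # recognition at 1 km satisfies R_D > D
```

These values agree with the approximate figures of 770 m and 56 s used later in the results. At the emergency deceleration of 1.5 m/s² the same functions give a braking distance of roughly 257 m.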

In this section, the experimental setup for the proposed approach is discussed, followed by the experimental results.

In this work, the frames are extracted from a thermal video sequence of railway tracks. A total of 749 frames of thermal video are used to evaluate the performance of the proposed work. Faster R-CNN is implemented on a system with an Intel(R) Core(TM) i5-1035G1 CPU @ 1.00 GHz, 1.19 GHz, 16 GB RAM and an NVIDIA MX230 GPU.

First, the frames are extracted from the thermal video sequence of the railway track, as shown in Fig. 2(a), and motion artifacts are compensated. The results of removing the motion artifacts are illustrated in Fig. 2(c). Each frame is then decomposed into several components using 2D-SSA. Two important parameters affect the performance of SSA: the window size $m \times n$ and the number of components 'g' into which the input frame is decomposed. The performance of the proposed approach was analyzed for different values of $m \times n$ and 'g'; the most informative components were obtained with a window size of $m \times n = 8 \times 8$ and g = 15 discriminative components (patterns). Hence, every frame is decomposed into 15 distinct components, as shown in Fig. 2(d). The different components decomposed from the input frame highlight different information about the scene captured by the frame. In the proposed work, the desired information about the railway track and obstacles is used for further processing.

For the detection of the railway track, every component decomposed from each frame is analyzed, and it is observed that the component $S_{Track} = s^{(4)}$ (the 4th component) of every frame contains strong information about the railway tracks. Therefore, in this work, the 4th component of each frame is used for the recognition of the railway track. The features of the railway tracks in the 4th component are separated and used for the recognition of the railway track. Finally, these features are superimposed on the frame to represent the tracks. The 4th component of the frame, the features extracted from the component and the final recognized tracks are shown in Fig. 3(b), (c) and (d) respectively. The railway track is identified in the same way in all frames, which enables railway track recognition. Fig. 4(b).

In this work, the transfer learning technique is applied by using pre-trained networks. Additionally, the parameters are fine-tuned and the training set is extended with the collected samples to improve the performance as much as possible. With the transfer learning approach, training starts from the pre-trained parameters so as to include the useful information gathered from a previously trained network with thermal

Finally, the different deep networks (Faster R-CNN, SSD, YOLOv2 and YOLOv5) are used to recognize obstacles on the railway tracks. Once the deep networks have been trained on the training data, the 2nd component of the 2D-SSA output of each validation frame is applied to the network, which outputs the recognized obstacles. The obstacle on the railway track recognized in the 2nd component using Faster R-CNN, along with the railway track, is shown in Fig. 4(c). Figure 4(d) shows the final recognized railway track and obstacle on the track using Faster R-CNN. The obstacle on the railway track recognized in the 2nd component using SSD, along with the railway track, is shown in Fig. 4(e).

Figure 4(f) shows the final recognized railway track and obstacle on the frame using the SSD deep network. Figure 4(g) shows the obstacle on the railway track recognized in the 2nd component, along with the railway track, using YOLOv2. The final recognized railway track and obstacle on the frame using the YOLOv2 deep network is shown in Fig. 4(h). The obstacle on the railway track recognized in the 2nd component using YOLOv5 is shown in Fig. 4(i), and the final recognized railway track and obstacle on the track using YOLOv5 is shown in Fig. 4(j). Obstacle recognition using YOLOv5 is also performed without 2D-SSA, as shown in Fig. 4(l), to analyze the effect of 2D-SSA on the recognition performance of YOLOv5.

The performance of the proposed technique can be assessed with the assistance of various parameters. In this work, the important parameters used to analyze the performance are recognition accuracy [32] , precision [32] and recall [32] which are given as: 

$$\text{Recall} = \frac{\text{True Positive}}{\text{Number of images having object class}} \quad (18)$$
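The three metrics can be computed from detection counts as in the sketch below. Recall follows the definition in Eq. (18); accuracy and precision are written in their usual textbook forms, (TP + TN) / total and TP / (TP + FP), which is an assumption about the definitions used in this work. The counts are made-up illustrative numbers, not results from this paper.

```python
def recall(true_positive, images_with_object):
    """Eq. (18): true positives over the number of images that contain the object class."""
    return true_positive / images_with_object

def precision(true_positive, false_positive):
    """Assumed standard definition: TP / (TP + FP)."""
    return true_positive / (true_positive + false_positive)

def accuracy(true_positive, true_negative, total):
    """Assumed standard definition: (TP + TN) / total number of images."""
    return (true_positive + true_negative) / total

# Illustrative counts only (not taken from the paper's experiments).
tp, fp, tn, fn = 90, 10, 80, 20
rec = recall(tp, tp + fn)                      # 90 / 110
prec = precision(tp, fp)                       # 90 / 100
acc = accuracy(tp, tn, tp + fp + tn + fn)      # 170 / 200
```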

On the railway track, an obstacle must be recognized at an adequate distance (i.e. beyond the braking distance) so that the speed of the train can be controlled to avoid a collision. The performance of Faster R-CNN is improved for timely recognition of the obstacle by optimizing the Intersection over Union (IoU) parameter. In general, an IoU value between 0.7 and 0.9 is considered an adequate range for precise identification using the Faster R-CNN network. The initial results of the proposed algorithm were obtained with a threshold value of 0.7, as shown in Fig. 5(a), but the obstacle is then recognized at a shorter distance from the train. The performance of the system is therefore analyzed for different values of IoU to find a tradeoff between recognition accuracy and the distance from the train at the time of recognition. It is observed that at an IoU threshold of 0.5 the Faster R-CNN not only recognizes the obstacle on the track in time but also attains good accuracy. The final result of railway track identification and obstacle recognition at IoU = 0.5 is shown in Fig. 5(b). The performance of the proposed system is analyzed for different IoU values using the above-mentioned performance metrics. The parameters evaluated for different values of IoU are shown in Table 2. The accuracy, precision and recall parameters have better values at IoU = 0.5. Figure 6 gives a graphical representation of the performance of the proposed method. It is concluded from these observations that the proposed method performs better at IoU = 0.5, with an accuracy of approximately 85%.

The performance of the proposed method is compared with other object recognition approaches on thermal railway track data using these parameters, as shown in Table 3. First, the performance of the YOLOv2, SSD, YOLOv5 and Faster R-CNN frameworks is analyzed for recognition of obstacles; these approaches are then used along with 2D-SSA to enhance the recognition performance for thermal railway track data. The SSD approach performs better than the YOLOv2 framework but not better than the others. The YOLOv5 and Faster R-CNN frameworks show better performance than both YOLOv2 and SSD. Combining 2D-SSA with these frameworks boosts the recognition performance.

Although the performance of YOLOv2 and SSD is enhanced by using 2D-SSA, they are unable to outperform Faster R-CNN and YOLOv5. When Faster R-CNN and YOLOv5 are used with 2D-SSA, the recognition system outperforms all the other methods compared on thermal railway track data. A comparative plot is also shown in Fig. 7 for performance analysis of this work against other state-of-the-art methods. Table 4 presents the accuracy vs. inference time analysis of the proposed work and the other approaches. The performance of Faster R-CNN and YOLOv5 is on par in terms of accuracy, precision and recall for thermal railway track data, but YOLOv5 is faster. As discussed in Section 2, if the train is running at a speed of 100 km/h, it takes approximately 56 s to stop from the point when the brakes are applied with a deceleration of 0.5 m/s², and it covers a braking distance of 770 m. The proposed system detects the obstacle from a distance of 1 km, i.e. it gives an alert approximately 73 s in advance. As shown in Table 4, the detection time required is 231 ms. The minimum processing time (i.e. the sum of detection time and braking time) is 231 ms + 56 s = 56.231 s. Hence the proposed system can comfortably avoid a collision between the train and the obstacle, enhancing railway safety.

Low illumination due to adverse environmental conditions such as rain, haze, cloudy weather and darkness can cause problems in the recognition of obstacles. The proposed work is therefore also tested under different illumination conditions to analyze its illumination invariance. The OSU Thermal Pedestrian Database from the OTCBVS Benchmark Dataset Collection [10] is used to test the performance of the proposed work in different illumination conditions. The dataset contains thermal images captured in different weather conditions such as light rain, haze, cloudy and fair sunlight. The recognition performance is analyzed with the Faster R-CNN network, which was additionally trained on thermal image data for the object class. The Faster R-CNN detector architecture is used for both training and testing.

Figure 8(a) shows the thermal image under light rain conditions in the afternoon with temperature: 68°F, minimal dew point: 58°F, humidity: 70% and visibility: 9.0 miles. The 2nd component is decomposed from the image using 2D-SSA, as shown in Fig. 8.

Figure 9(b) shows the 2nd component decomposed from the image using 2D-SSA, chosen because it strongly contains the information about the pedestrians. Figure 9(c) represents the output of the Faster R-CNN with the recognized pedestrians. The final recognized pedestrians in the input image captured under haze conditions are shown in Fig. 9(d). This shows that the proposed system can recognize objects under haze conditions.

The proposed work is also tested on a thermal image of a scene captured under dark cloudy conditions. Figure 10(a) shows the thermal image under dense cloudy conditions in the morning with temperature: 53°F, minimal dew point: 46°F, humidity: 77% and visibility: 8.0 miles. The 2nd component is decomposed from the image using 2D-SSA, as shown in Fig. 10(b), because it strongly contains the information about the pedestrians. Figure 10(c) represents the output of the Faster R-CNN with the recognized pedestrians. The final recognized pedestrians in the input image captured under dense cloudy conditions are shown in Fig. 10(d). The method also shows good recognition results for low illumination under dark clouds.

The next illumination condition considered is clear and sunny weather. The thermal image captured under fair sunny conditions in the morning with temperature: 71°F, minimal dew point: 32°F, humidity: 45% and visibility: 10 miles is shown in Fig. 11(a). Figure 11(b) shows the 2nd component decomposed from the image using 2D-SSA, chosen because it strongly contains the information about the pedestrians. Figure 11(c) represents the output of the Faster R-CNN with the recognized pedestrians. The final recognized pedestrians in the input image captured under fair sunny conditions are shown in Fig. 11(d).

Finally, the illumination condition in partly cloudy weather is considered. The thermal image captured in partly cloudy weather in the morning with temperature: 57°F, minimal dew point: 37°F, humidity: 47% and visibility: 10 miles is shown in Fig. 12(a). The 2nd component is decomposed from the image using 2D-SSA because it contains the information about the pedestrians, as shown in Fig. 12(b).

The output of the Faster R-CNN with recognized pedestrian is represented in Fig. 12 (c). Figure 12 (d) shows the final recognized pedestrians in the input image of light rain conditions. Hence the performance of the proposed work is analyzed under different illuminations in different weather conditions; it gives good illumination invariance and shows better performance in all illumination conditions.
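The decomposition step used throughout these experiments can be illustrated with a small NumPy sketch of a basic 2D SSA: embed the image into a trajectory matrix of sliding windows, take an SVD, and map each rank-one term back to an image by averaging overlapping entries. This is a generic textbook form, not the authors' implementation. The window size, the function names and the reading of "the 2nd component" as the second singular triple (`comps[1]`) are assumptions made only for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def hankelize_2d(traj_k, shape, window):
    """Average overlapping window entries of one rank-one term back into an image."""
    h, w = shape
    lx, ly = window
    kx, ky = h - lx + 1, w - ly + 1
    # Row (i, j) of traj_k.T holds the window whose top-left pixel is (i, j).
    patches = traj_k.T.reshape(kx, ky, lx, ly)
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    for a in range(lx):
        for b in range(ly):
            # Entry (a, b) of window (i, j) corresponds to pixel (i + a, j + b).
            acc[a:a + kx, b:b + ky] += patches[:, :, a, b]
            cnt[a:a + kx, b:b + ky] += 1
    return acc / cnt


def ssa2d_components(image, window, n_components):
    """Return the first n_components elementary 2D SSA components of an image."""
    image = np.asarray(image, dtype=float)
    lx, ly = window
    windows = sliding_window_view(image, (lx, ly))   # shape (kx, ky, lx, ly)
    kx, ky = windows.shape[:2]
    # Trajectory matrix: one vectorized window per column.
    traj = windows.reshape(kx * ky, lx * ly).T
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    comps = []
    for k in range(min(n_components, len(s))):
        traj_k = s[k] * np.outer(u[:, k], vt[k])     # k-th rank-one term
        comps.append(hankelize_2d(traj_k, image.shape, window))
    return comps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.random((32, 40))        # stand-in for a thermal frame
    comps = ssa2d_components(frame, window=(5, 5), n_components=4)
    second = comps[1]                   # candidate input for the detector
    print(second.shape)
```

Because the averaging step is linear, summing all elementary components returns the original image, which gives a convenient sanity check. In practice the window size and the component (or group of components) passed to the detector would need to be chosen for the data at hand.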

This paper introduced a novel and efficient method for railway track identification and obstacle recognition on the railway track. The proposed approach uses 2D SSA to decompose the thermal image into distinct information-carrying components, and the deep learning network then uses a particular component to identify the obstacle on the railway tracks. Identification of the railway tracks using SSA is another contribution of this work. Combining the deep network with 2D SSA yields a more efficient and robust approach for building an early warning framework that forestalls railway accidents and enhances railway safety. As no significant modification of the train infrastructure is needed, the system will be cost-effective. It will also reduce the monetary burden of railway compensation. Additionally, this work shows good performance when tested under different illumination conditions using the OSU thermal pedestrian database from the OTCBVS benchmark dataset collection. As future work, the system will be extended for more efficient object recognition under further recognition challenges such as occlusion and background clutter, to improve the adequacy of the framework.

Studies on the effects of braking loads on a Railway Wheel

Fig. 12 (a) Input thermal image under partly cloudy condition (b) The 2nd component of the input image containing more dominant information of pedestrians (c)

Laser-based obstacle detection at railway level crossings

Singular spectrum analysis for image processing

Calculating train braking distance

Detecting rails and obstacles using a train-mounted thermal camera

Automatic diagnosis of glaucoma using two-dimensional Fourier-Bessel series expansion based empirical wavelet transform

FBSED based automatic diagnosis of COVID-19 using X-ray and CT images

A novel method of object detection from a moving camera based on image matching and frame coupling

Synchrosqueezed wavelet transforms: An empirical mode decomposition-like tool

A two-stage template approach to person detection in thermal imagery

Improved hyperspectral image classification with noise reduction pre-process

Empirical mode decomposition of hyperspectral images for support vector machine classification

Application of computer vision and deep learning in the railway domain for autonomous train stop operation

Rich feature hierarchies for accurate object detection and semantic segmentation

Fast R-CNN

On the choice of parameters in singular spectrum analysis and related subspace-based methods

An algebraic view on finite rank in 2D-SSA

2D-extension of singular spectrum analysis: algorithm and elements of theory

Analysis of time series structure: SSA and related techniques

The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis

Deep learning based object and railway track recognition using train mounted thermal imaging system

Image processing based obstacle detection with laser measurement in railways

Thermal object detection in difficult weather conditions using YOLO

Wheel defect detection with machine learning

SSD: single shot multibox detector

Time-frequency domain deep convolutional neural network for the classification of focal and non-focal EEG signals

Augmented reality maintenance assistant using YOLOv5

Object detection and tracking in thermal video using Directed Acyclic Graph (DAG)

Vision based obstacle detection on railway track

Statistical and adaptive signal processing: spectral estimation, signal modeling, adaptive filtering and array processing

Vision based railway track monitoring using deep learning

Investigations of object detection in images/videos using various deep learning techniques and embedded platforms-A

Application of deep learning techniques for detection of COVID-19 cases using chest X-ray images: A comprehensive study

Automated emotion recognition based on higher order statistics and deep learning algorithm

YOLO9000: Better, faster, stronger

Faster R-CNN: Towards real-time object detection with region proposal networks

A novel objects detection system for improving safety at unmanned railway crossings

Road information detection method based on deep learning

Obstacle detection on railway tracks using vibration sensors and signal filtering using bayesian analysis

Introduction to Matrix Computations

An intelligent multiple vehicle detection and tracking using modified vibe algorithm and deep learning algorithm

Detection of moving objects in railway using vision

Embedding structured contour and location prior in Siamesed fully convolutional networks for road detection

Weakly supervised adversarial domain adaptation for semantic segmentation in urban scenes

Hierarchical grouping to optimize an objective function

Moving object detection with a freely moving camera via background motion subtraction

Novel two-dimensional singular spectrum analysis for effective feature extraction and data classification in hyperspectral imaging

An adaptive restoration method for motion-blurred image based on Wiener filtering

Moving object detection by detecting contiguous outliers in the low-rank representation


Acknowledgements The authors are grateful to Delhi Technological University and especially to Sh. Anand Vardhan, CEO of MM Logic soft P Ltd, for sponsoring the project.