key: cord-1015279-pg1r3qqr authors: Zang, Xiaohan; Li, Baimin; Zhao, Lulu; Yan, Dandan; Yang, Licai title: End-to-End Depression Recognition Based on a One-Dimensional Convolution Neural Network Model Using Two-Lead ECG Signal date: 2022-02-07 journal: J Med Biol Eng DOI: 10.1007/s40846-022-00687-7 sha: 64634c731a23e764f79dae78130c20d4fe57e284 doc_id: 1015279 cord_uid: pg1r3qqr PURPOSE: Depression is a common mental illness worldwide and has become an important public health problem. The current clinical diagnosis of depression mainly relies on the doctor’s experience and subjective diagnosis, which results in the low diagnostic efficiency and insufficient objectivity of diagnostic results. Therefore, establishing a physiological and psychological model for computer-aided diagnosis is an urgent task. In order to solve the above problems, this article uses a convolutional neural network (CNN) to identify depression based on electrocardiogram (ECG). METHODS: Our method uses the raw ECG signal as the input of one-dimensional CNN, and uses the automatic feature processing layer of CNN to learn and distinguish signal features without additional feature extraction and feature selection steps. In order to obtain the optimal model, ECG segments of different durations (3 s, 4 s, 5 s and 6 s) and CNNs with different layers were used for comparison. In order to obtain modeling data, the resting ECG of 37 depression patients and 37 healthy controls were collected. In the proposed network, larger convolution kernels are used to better focus on overall changes. In addition, this article focuses on the inter-patient data classification standard, where the training and test sets come from different patient data. RESULTS: Through comprehensive comparison, the 5 s ECG segment and 5-layer CNN are recommended in related applications. 
The proposed approach achieves high classification performance with accuracy of 93.96%, sensitivity of 89.43%, specificity of 98.49%, positive productivity of 98.34%. CONCLUSION: The experimental results indicate that the end-to-end deep learning approach can identify depression from ECG signals, and possess high diagnostic performance. It also shows that ECG is a potential biomarker in the diagnosis of depression. With the development of society and the rapid economic growth, people's various pressures are gradually increasing. Mental disorders occur frequently and people are gradually eroded by psychological sub-health. As a typical mental illness, depression has become an important public health problem. According to the report from the World Health Organization in 2021, about 280 million people worldwide are suffering depression, and depression is a leading cause of disability worldwide [1] . However, its complex pathogenesis remains poorly understood. The diagnosis of depression mainly relies on the comprehensive evaluation of psychiatrists and lacks specific physiological indicators. In addition, there is a serious shortage of psychiatrists. On average, there is less than one mental health worker for every 100,000 people in the world, and the ratio is far below 1 per 100,000 people in low-and middle-income countries [2] . The outbreak of the COVID-19 has further aggravated the psychological pressure of the public. One-third of the participants exhibited anxiety, and nearly one-fifth of the participants had depression symptoms and sleep problems [3] . The current clinical diagnosis efficiency and medical resource problems of depression have become increasingly prominent, which makes many depression patients miss the opportunity for timely treatment. It not only increases the difficulty of diagnosis and treatment, but also causes serious impact on patients' physical and mental health. 
Therefore, establishing a physiological and psychological model for computer-aided diagnosis is an urgent and meaningful work, which will help to explore the physiological and pathological variations of patients with depression, and can also provide objective data support for depression screening. Previous studies have shown that depression is associated with dysfunction of the autonomic nervous system (ANS), and that patients with depression show reduced parasympathetic modulation [4] [5] [6] . ANS modulation can be evaluated by analyzing heart rate variability (HRV) and is widely used in the identification of psychiatric disorders [7] [8] [9] . Many recent studies have used HRV to identify depression. For example, Kuang et al. [10] developed a method to distinguish healthy people from depression patients based on HRV sequence. Byun et al. [11] employed HRV indicators to classify patients with severe depression from healthy people, with an accuracy rate of 74.4%. However, most previous studies [10] [11] [12] [13] have used conventional machine learning methods combined with HRV to identify depression. Although these methods achieve favourable classification performance, they have many drawbacks. For example, the HRV sequence is usually represented by the variation in RR intervals in the ECG signal [14] . However, R points obtained by different QRS wave detection algorithms are also different, which will affect the classification results of subsequent classifiers. Moreover, machine learning relies heavily on feature extraction and feature selection. This process is not only time-consuming and labor-intensive, but also computationally expensive. Different from conventional machine learning, the method based on deep learning can self-learn features from input signals without manual feature extraction and feature selection. Many papers have shown that the deep learning architecture performs excellently in classifying ECG signals for heart diseases. Hannun et al. 
[15] developed a deep neural network (DNN) that uses a single-lead ECG to classify 12 rhythm categories with high diagnostic performance similar to that of cardiologists. Attia et al. [16] found that applying a CNN to the standard 12-lead ECG enabled the detection of left ventricular dysfunction with an area under the curve (AUC) of 0.93. Moreover, deep learning based solutions have been increasingly applied to classify physiological signals [17, 18].

For these reasons, this article attempts to apply deep learning methods directly to ECG signals to recognise depression. First, the ECG signals of all subjects are preprocessed and divided into fixed-length ECG segments (for example, 5 s) to expand the number of samples and to support real-time detection. Each ECG segment is then input to the CNN, which outputs the recognition result. In the proposed network, larger convolution kernels are used to better focus on overall changes, and the max-pooling operation is applied in all the pooling layers. In addition, this article focuses on the inter-patient data classification standard, which is more in line with the actual clinical environment: the training and test sets come from different patients. In order to compare the recognition effects of ECG segments with different lengths, ECG segments of 3 s, 4 s, 5 s and 6 s were adopted. Accuracy, positive productivity, sensitivity, and specificity are used to evaluate the performance of the model. The experimental results indicate that the end-to-end deep learning approach can identify depression from ECG signals and possesses high diagnostic performance, with an accuracy of 93.96%. If confirmed in the clinical environment, the identification method could be embedded in ECG monitors and smartphones, thereby simplifying and facilitating the early diagnosis of depression. As far as we know, few studies directly use ECG signals to classify patients with depression.
This will be a worthwhile and interesting attempt.

Thirty-seven depression patients and 37 healthy subjects took part in this research. The demographic and clinical characteristics of the subjects are shown in Table 1. None of the subjects had heart disease (including coronary heart disease, tachycardia, myocardial ischemia, etc.), hypertension, or cerebral infarction. All depression patients were inpatients recruited from the Second Affiliated Hospital of Jining Medical College, Shandong, China. The experiment was conducted according to the Declaration of Helsinki and approved by the Ethics Committee of the Second Affiliated Hospital of Jining Medical College. Informed consent was obtained from all individual participants included in the study. The healthy subjects in this research were recruited from Shandong University and did not have any history of mental disorders or drug abuse. None of the subjects drank caffeinated or alcoholic beverages or engaged in any strenuous activity in the 24 h before the data collection.

The ECG signal was collected in a separate room with no noise and no electromagnetic interference, kept at a suitable temperature and humidity. Appointments were made with the selected subjects in advance, and they were informed of the contraindications before the collection. After entering the collection room, subjects filled in basic information such as name and age and were then informed of the collection process and the matters needing attention. The multifunctional physiology experiment instrument RM-6280C (Chengdu Instrument Factory, Sichuan, China) was used to collect the ECG signals of the subjects, and its sampling rate was set to 1 kHz. After the subjects lay flat on the testbed, their ECG signals were recorded with three electrodes placed on the right wrist and both ankles in accordance with the standard limb two-lead ECG acquisition method.
Subjects were reminded to close their eyes and keep their whole body relaxed during the collection. ECG signals were recorded after stabilization, and the whole effective signal lasted for 5.5 min. Examples of the raw ECG signals are shown in Fig. 2a. After the signal was recorded, heart rate and pulse pressure were measured with an OMRON HEM-7051. In order to maintain consistency, the ECG signals of depression patients and healthy controls were obtained under the same experimental conditions. While data were being collected from depression patients, two psychiatrists were present to handle any emergency.

The framework of the proposed approach is shown in Fig. 1. The approach includes three major stages: data preprocessing, data segmentation, and classification.

The ECG signals collected by the ECG equipment often contain a lot of noise. Common noise includes electromyogram (EMG) interference [19], power frequency interference [20] and baseline drift [21]. Therefore, the ECG signals need to be filtered before classification. After several filtering methods [22-26] were compared in the experiment, a low-pass filter was chosen to remove EMG interference, a notch filter to remove power frequency interference, and a median filter to remove baseline drift. First, a Butterworth low-pass filter was used to filter out high-frequency noise above 80 Hz, then a notch filter was used to filter out 50 Hz power frequency interference, and finally a median filter with a window size of 109 was used to eliminate baseline drift. In order to eliminate offset and standardize the amplitude of the ECG signals, all ECG signals were resampled to a common sampling rate of 360 Hz and then standardized by z-score. Finally, 50 sampling points at the beginning and at the end of each ECG signal were deleted to eliminate the influence of the time delay introduced by filtering. A preprocessed ECG signal segment is shown in Fig. 2b.
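The preprocessing chain described above can be sketched in a few lines of SciPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `preprocess_ecg`, the filter order (4), the notch quality factor (30), the use of zero-phase filtering (`filtfilt`), and the interpretation of the median filter output as a baseline estimate that is subtracted from the signal are all assumptions, since the text only gives the cut-off frequencies, the window size of 109, the 360 Hz target rate and the 50-sample trim.

```python
from math import gcd

import numpy as np
from scipy import signal

FS_RAW = 1000     # acquisition rate stated in the text (1 kHz)
FS_TARGET = 360   # common sampling rate after resampling
TRIM = 50         # samples dropped at each end of the recording


def preprocess_ecg(raw, fs=FS_RAW, order=4, notch_q=30.0, median_window=109):
    """Filter, resample and z-score one single-lead ECG recording."""
    x = np.asarray(raw, dtype=float)

    # 1) Butterworth low-pass at 80 Hz against EMG / high-frequency noise.
    #    Order and zero-phase filtering are assumptions.
    b, a = signal.butter(order, 80.0, btype="low", fs=fs)
    x = signal.filtfilt(b, a, x)

    # 2) Notch filter at 50 Hz against power-line interference.
    b, a = signal.iirnotch(50.0, notch_q, fs=fs)
    x = signal.filtfilt(b, a, x)

    # 3) Median filter (odd window, in samples) estimates the baseline,
    #    which is subtracted. Subtraction is an assumed interpretation.
    x = x - signal.medfilt(x, kernel_size=median_window)

    # 4) Resample to 360 Hz (1000 Hz -> 360 Hz is a 9/25 ratio), then z-score.
    g = gcd(FS_TARGET, fs)
    x = signal.resample_poly(x, up=FS_TARGET // g, down=fs // g)
    x = (x - x.mean()) / x.std()

    # 5) Drop 50 samples at both ends to discard filter edge effects.
    return x[TRIM:-TRIM]
```

For a 10 s recording at 1 kHz this returns 3600 - 100 = 3500 samples. If causal filtering is preferred, `signal.lfilter` could replace `signal.filtfilt`; the text does not say which was used.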
Fig. 1 The framework of the proposed approach

The number of samples in a dataset is an important consideration in deep learning problems. Data segmentation, the process of dividing ECG data into meaningful segments, is a potential approach to address this problem, and it has been used in existing ECG studies [27-29]. Considering that depression is a continuous state, unlike cardiogenic diseases that cause specific changes in a single heartbeat, this research uses ECG segments as the research object. In addition, the information captured by ECG segments is more comprehensive and does not depend on a specific QRS detection algorithm. In this research, each 5.5 min recording was divided into epochs of 5 s (or other lengths), and every epoch was assigned the label of the recording it came from.

CNN is one of the most commonly used types of artificial neural networks [30]. Because convolution kernels share parameters and connect sparsely to the preceding layer, a CNN needs relatively few weights to process grid-like data and extracts features stably. Since the dimensionality of the ECG signal is different from that of an image, this study uses a one-dimensional CNN. A CNN consists of three basic layer types: convolutional layers, pooling layers and fully connected layers with activation functions. Table 2 shows the architecture of the CNN model in this paper. This network contains 2 convolution layers, 2 max-pooling layers and 1 fully-connected layer. The activation function of the convolutional layers is the rectified linear unit (ReLU). After each convolution layer, max-pooling is applied to the obtained feature maps. The Softmax function of the fully-connected layer outputs normal or depression. In the training stage, the cross-entropy function is applied as the loss function, and Adam is used as the optimizer. When training the model, the learning rate was set to 0.01 and 100 epochs were used.
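To make the five-layer structure concrete, the following framework-free NumPy sketch runs the forward pass conv, ReLU, max-pool, conv, ReLU, max-pool, fully-connected, Softmax on one 5 s segment (1800 samples at 360 Hz). Only the layer sequence, the two output classes and the kernel size of 16 (mentioned in the paper's discussion) come from the text. The channel counts (8 and 16), the pooling size (4), the single input channel, the random untrained weights and all function names are assumptions for illustration, because the contents of Table 2 are not available here.

```python
import numpy as np


def conv1d(x, w, b):
    """Valid 1-D convolution. x: (C_in, L), w: (C_out, C_in, K) -> (C_out, L-K+1)."""
    k = w.shape[-1]
    windows = np.lib.stride_tricks.sliding_window_view(x, k, axis=-1)  # (C_in, L-K+1, K)
    return np.einsum("clk,ock->ol", windows, w) + b[:, None]


def relu(x):
    return np.maximum(x, 0.0)


def max_pool1d(x, size):
    """Non-overlapping max pooling; an incomplete trailing window is dropped."""
    n = x.shape[-1] // size
    return x[:, : n * size].reshape(x.shape[0], n, size).max(axis=-1)


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def init_params(rng, in_ch=1, ch1=8, ch2=16, kernel=16, pool=4, seg_len=1800):
    """Random (untrained) weights; widths and pool size are assumed values."""
    l1 = (seg_len - kernel + 1) // pool   # length after conv1 + pool1
    l2 = (l1 - kernel + 1) // pool        # length after conv2 + pool2
    return {
        "w1": rng.normal(0, 0.1, (ch1, in_ch, kernel)), "b1": np.zeros(ch1),
        "w2": rng.normal(0, 0.1, (ch2, ch1, kernel)), "b2": np.zeros(ch2),
        "wf": rng.normal(0, 0.1, (2, ch2 * l2)), "bf": np.zeros(2),
        "pool": pool,
    }


def forward(x, p):
    """Return [P(normal), P(depression)] for one segment x of shape (C_in, L)."""
    h = max_pool1d(relu(conv1d(x, p["w1"], p["b1"])), p["pool"])
    h = max_pool1d(relu(conv1d(h, p["w2"], p["b2"])), p["pool"])
    return softmax(p["wf"] @ h.ravel() + p["bf"])


def cross_entropy(probs, label):
    """Loss for one segment given its integer class label (0 or 1)."""
    return -np.log(probs[label])
```

With these assumed sizes the feature-map length goes 1800, 1785, 446, 431, 107, so the fully-connected layer sees 16 x 107 = 1712 inputs. Training with cross-entropy and Adam (learning rate 0.01, 100 epochs), as described above, would normally be done in a deep learning framework rather than by hand.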
We used a one-dimensional CNN to extract features from ECG segments and classify them. The network can automatically learn the features in ECG signals, avoiding complicated manual extraction. The preprocessed ECG signals are used directly as input, so the network has access to more information than manually extracted features provide.

We follow the guidelines provided by the Association for Advancement of Medical Instrumentation (AAMI) to calculate four performance metrics, accuracy (Acc) [31], positive productivity (PP) [32], sensitivity (Sen) [33] and specificity (Spe) [34], as follows:

$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{PP} = \frac{TP}{TP + FP}$$

$$\mathrm{Sen} = \frac{TP}{TP + FN}$$

$$\mathrm{Spe} = \frac{TN}{TN + FP}$$

where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively.

In this section, we describe the experiments used to evaluate our method. There were 37 depression patients and 37 normal controls. Using the inter-patient data classification standard, 29 people from each group were randomly selected as the data source for the training set and the remaining 8 as the data source for the test set. As a result, the ECG segments of the training and test sets come from different people. The assignment of subjects to the training and test sets was randomly changed, and 10 independent experiments were performed. The average of the 10 test results was taken as the evaluation result.

In order to achieve the purpose of clinical real-time detection, only durations of less than 10 s were considered. In this research, 3 s, 4 s, 5 s and 6 s ECG segments were used for experimental comparison. Taking the duration of 5 s as an example, 4000 ECG segments in the training set and 1060 ECG segments in the test set were obtained by non-overlapping cutting of the ECG signals. The number of ECG segments from depression patients and from normal controls was equal. Figure 3 and Table 3 respectively show the confusion matrix and classification evaluation results of ECG segments of different durations, where D stands for depression and N for normal. It can be seen from Fig.
3 that the ECG segments of different durations show the same pattern, that is, the recognition rate for normal subjects is slightly higher than that for depression. As the duration increases, the recognition rate for depression patients rises. It can be seen from Table 3 that the classification performance obtained with ECG segments of different durations is relatively ideal. The specificity and positive productivity reach 98.49% and 98.34% when the ECG segment is 5 s. The highest classification accuracy, 93.96%, is also obtained with the 5 s ECG segment. As the segment length increases from 3 s to 5 s, the classification accuracy, sensitivity, specificity and positive productivity gradually improve, whereas the classification performance for the 6 s ECG segment decreases slightly. We speculate that there are two reasons: (1) a long record contains too much information, which is not conducive to fine recognition; (2) increasing the segment length reduces the number of segments, which is not conducive to the application of deep learning. Excessive data will not only lead to longer running time but also to higher hardware requirements. This is not conducive to complex models that need a large number of iterations, nor can it meet real-time requirements. Under comprehensive consideration, the 5 s ECG segment is recommended in related applications.

CNN models with different numbers of layers were used to classify 5 s ECG segments in order to analyze the influence of network architecture on the classification results. Table 4 shows the results. It can be seen from Table 4 that the best classification result is obtained by the 11-layer CNN model, with an accuracy rate of 94.94%. Compared with the five-layer CNN, the 11-layer CNN improves the accuracy by 0.95%, but increases the training time by 22.8 times. As the number of layers increases, it takes more time to complete the classification, but the classification performance does not significantly improve.
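The evaluation protocol used in these experiments (a subject-level split with 8 test subjects per group, followed by non-overlapping segmentation into windows of a chosen duration) can be sketched as follows. This is an illustrative sketch only: the function names `segment` and `build_sets`, the dictionary-based data layout and the seed handling are assumptions, and the sketch makes no attempt to reproduce the exact segment counts reported in the paper.

```python
import numpy as np


def segment(signal_1d, fs=360, seconds=5):
    """Cut one recording into non-overlapping windows; drop the incomplete tail."""
    win = fs * seconds
    x = np.asarray(signal_1d)
    n = len(x) // win
    return x[: n * win].reshape(n, win)


def build_sets(recordings, labels, n_test_per_class=8, seconds=5, seed=0):
    """Inter-patient split: every subject contributes to train OR test, never both.

    recordings: dict subject_id -> 1-D preprocessed ECG array
    labels:     dict subject_id -> 0 (normal) or 1 (depression)
    """
    rng = np.random.default_rng(seed)
    train_ids, test_ids = [], []
    for cls in (0, 1):
        ids = [s for s in recordings if labels[s] == cls]
        order = rng.permutation(len(ids))
        test_ids += [ids[i] for i in order[:n_test_per_class]]
        train_ids += [ids[i] for i in order[n_test_per_class:]]

    def stack(ids):
        xs, ys = [], []
        for s in ids:
            segs = segment(recordings[s], seconds=seconds)
            xs.append(segs)
            ys.append(np.full(len(segs), labels[s]))  # segments inherit the subject label
        return np.concatenate(xs), np.concatenate(ys)

    return stack(train_ids), stack(test_ids), set(train_ids), set(test_ids)
```

Calling `build_sets` with ten different `seed` values and averaging the test metrics would mirror the ten independent experiments described in the text. Splitting by subject before segmenting is the key design choice: splitting after segmentation would let segments from the same person appear in both sets and inflate the scores.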
Considering the computational cost and running time, the 5-layer CNN model is recommended in related applications.

Another interesting observation is that most existing deep learning methods for classifying ECG signals use small convolution kernels of size 5 or 3 [16, 35-37], while our network uses a one-dimensional convolution kernel of size 16. An experimental comparison showed that the classification accuracy with a kernel of size 16 is about 5% higher than with a kernel of size 3. We speculate that this difference in size is mainly caused by the different research purposes. The existing related literature mainly uses differences within a single heartbeat to distinguish heart diseases. For that purpose, even the small receptive field of a size-3 kernel may cover meaningful changes. Depression, in contrast, is a persistent state, and a convolution kernel of size 16 can better capture the overall changes.

The training results of inputting 5 s ECG segments into the 5-layer CNN model are shown in Fig. 4. Figure 4a and b show the loss and accuracy curves during model training, respectively. It can be seen from the figure that the model converges quickly: the accuracy rate reaches 95% at the 8th epoch, and after the 81st epoch the loss value drops below 0.01 and remains stable.

The proposed method for identifying depression is also compared with other recent studies, and the results are shown in Table 5. In the research field of ECG-based depression recognition, the proposed method achieves the highest accuracy, 93.96%, among the compared approaches. Table 5 also reveals that the method proposed in this paper performs particularly well in reducing the number of false positives, thereby obtaining a specificity of 98.49%.
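The link between false positives and the reported metrics can be checked with a small worked example. The sketch below computes the four metrics from confusion-matrix counts. The counts themselves are hypothetical: they were chosen for a balanced test set of 1060 segments (530 per class, as described for the 5 s case) so that they are consistent with the percentages reported in the abstract, and they are not taken from the paper's confusion matrix.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, positive productivity (TP / (TP + FP)), sensitivity, specificity."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "positive_productivity": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts (depression = positive class), 530 segments per class.
metrics = classification_metrics(tp=474, tn=522, fp=8, fn=56)
percent = {name: round(100 * value, 2) for name, value in metrics.items()}
print(percent)
# {'accuracy': 93.96, 'positive_productivity': 98.34,
#  'sensitivity': 89.43, 'specificity': 98.49}
```

With only 8 false positives out of 530 normal segments, both specificity and positive productivity are high, while the 56 missed depression segments explain the lower sensitivity.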
The proposed technique has comparable sensitivity to Kuang et al.'s [10] research, while achieving higher accuracy and specificity. In addition, the positive productivity of the proposed method is higher than that of Xing et al. [13]. Some previous studies [11, 12, 38] focused on classifying patients with major depressive disorder and healthy people, thus ignoring the identification of mild and moderate patients. In other studies, Kuang et al. [10] achieved an [39] used relatively small sample sizes.

Previous studies have used HRV features to discriminate patients with depression, and HRV is usually derived from the RR intervals of the ECG data. This inevitably requires a QRS detection algorithm to locate the R-peaks, and the R-peaks located by different QRS detection algorithms differ. The proposed method takes the ECG signal directly as input without extracting the HRV sequence. In addition, previous studies trained simple classifiers on features manually extracted from HRV sequences [10-13, 38, 39]. The performance of this type of classifier mainly depends on the feature selection process, which is laborious and time-consuming. The proposed method avoids manual feature extraction and selection.

It is worth noting that this research uses the inter-patient division method, which means that the test set and training set come from different patients. This method is more in line with actual application scenarios, where known patient data are used to train the model and new patient data are used for testing.

ECG signals are easy to obtain and inexpensive to record. More importantly, compared with equipment for other commonly used physiological signals, ECG acquisition equipment is widely available. Our approach could be integrated into ECG acquisition equipment for initial screening of depression in community hospitals and economically disadvantaged areas, helping to address the shortage of psychiatrists there.
In order to automatically identify depression, a method of using a CNN to classify ECG signals is proposed in this paper. A CNN does not need features to be manually extracted and selected and, as an effective classifier, can take ECG segments directly as input. This research compares the classification results of ECG segments with different durations and of CNNs with different numbers of layers. Under comprehensive consideration, a 5 s ECG segment and a 5-layer CNN are suggested for related applications. Experimental results reveal that the proposed approach achieves high classification performance, with accuracy of 93.96%, sensitivity of 89.43%, specificity of 98.49% and positive productivity of 98.34%. In short, the method developed in this research can quickly and accurately distinguish patients with depression. Overall, the proposed method has great potential as a computer-aided diagnostic tool for screening patients with depression.

This research used data from 37 depressed patients and 37 normal controls. Small samples limit the use of other deep neural network models. Increased sample size usually leads to improved performance. At present, we are recruiting more research subjects. In addition, this research has only addressed the classification of depression patients versus normal people. In future research, we will try to classify different levels of depression.

Funding: This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Generalized anxiety disorder, depressive symptoms and sleep quality during COVID-19 outbreak in China: A web-based cross-sectional survey
Alteration of heart rate variability in patients of depression
Impact of depression and antidepressant treatment on heart rate variability: A review and meta-analysis
Autonomic neurocardiac function in patients with major depression and effects of antidepressive treatment with nefazodone
Acute mental stress assessment via short term HRV analysis in healthy adults: A systematic review with meta-analysis
Heart rate variability in depressive and anxiety disorders
Analysis of heart rate variability in posttraumatic stress disorder patients in response to a trauma-related reminder
Depression recognition according to heart rate variability using Bayesian Networks
Detection of major depressive disorder from linear and nonlinear heart rate variability features during mental task protocol
Entropy analysis of heart rate variability and its application to recognize major depressive disorder: A pilot study
Task-state heart rate variability parameter-based depression detection model and effect of therapy on the parameters
Heart rate variability. Standards of measurement, physiological interpretation, and clinical use
Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network
Screening for cardiac contractile dysfunction using an artificial intelligence-enabled electrocardiogram
Deep learning framework for subject-independent emotion detection using wireless signals
DeprNet: A deep convolution neural network framework for detecting depression using EEG
A review on feature extraction and denoising of ECG signal using wavelet transform
Design of digital filter on ECG signal processing
ECG signal de-noising and baseline wander correction based on CEEMDAN and wavelet threshold
Deep learning approach for active classification of electrocardiogram signals
Optimal selection of wavelet basis function applied to ECG signal denoising
ECG baseline wander correction based on mean-median filter and empirical mode decomposition
Edge preserving filtering by combining nonlinear mean and median filters
Median based method for baseline wander removal in photoplethysmogram signals
Comparing features from ECG pattern and HRV analysis for emotion recognition system
ECG pattern analysis for emotion detection
Multiple time scales analysis for identifying congestive heart failure based on heart rate variability
A deep convolutional neural network model to classify heartbeats
Classification of cardiac patient states using artificial neural networks
Model-based parameter estimation applied on electrocardiogram signal
Detection of ventricular Arrhythmias using roots location in AR-modelling
Approximation by rational functions
Region aggregation network: improving convolutional neural network for ECG characteristic detection. Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
Multi-class arrhythmia detection from 12-lead varied-length ECG using attention-based time-incremental convolutional neural network. Information Fusion
Precision medicine and artificial intelligence: A pilot study on deep learning for hypoglycemic events detection based on ECG
An objective screening method for major depressive disorder using logistic regression analysis of heart rate variability data obtained in a mental task paradigm
Finding and evaluating suitable contents to recognize depression based on neuro-fuzzy algorithm

The author reports no conflicts of interest in this work.