title: Secondary Pulmonary Tuberculosis Identification Via Pseudo-Zernike Moment and Deep Stacked Sparse Autoencoder
authors: Wang, Shui-Hua; Satapathy, Suresh Chandra; Zhou, Qinghua; Zhang, Xin; Zhang, Yu-Dong
date: 2021-12-16
journal: J Grid Comput
DOI: 10.1007/s10723-021-09596-6

Abstract: Secondary pulmonary tuberculosis (SPT) is one of the top ten causes of death from a single infectious agent. To recognize SPT more accurately, this paper proposes a novel artificial intelligence model that uses the pseudo-Zernike moment (PZM) as the feature extractor and a deep stacked sparse autoencoder (DSSAE) as the classifier. In addition, 18-way data augmentation is employed to avoid overfitting. This model is abbreviated as PZM-DSSAE. Ten runs of 10-fold cross-validation show that this model achieves a sensitivity of 93.33% ± 1.47%, a specificity of 93.13% ± 0.95%, a precision of 93.15% ± 0.89%, an accuracy of 93.23% ± 0.81%, and an F1 score of 93.23% ± 0.83%. The area under the curve reaches 0.9739. This PZM-DSSAE is superior to 5 state-of-the-art approaches.

… because most PT cases belong to SPT. The most widespread locales of SPT lesions include the apical and posterior segments of the upper lobe and the dorsal segment of the lower lobe [7]. The chest CT (CCT) manifestations include exudative patchy shadows of uneven density, miliary shadows, tree-bud signs, proliferative nodular shadows, caseous pneumonia, cavities, tuberculosis bulbs, bronchial dissemination, satellite foci, calcifications, cord strips, and many other forms [8, 9].

In the past, manual analysis of SPT was tedious and onerous. In addition, the interpretation results are inevitably influenced by intra-/inter-expert variability. Recently, scholars have favored advanced artificial intelligence tools for designing automatic SPT classification methods. For instance, Bagci, et al.
[10] proposed a computer-aided quantification and detection model for cavitary TB. Based on the support vector machine (SVM), their team proposed a new shape-based SVM (SSVM) to detect airways and cavities. Nevertheless, this method has to extract shape-based features manually. Li, et al. [11] combined features from both a convolutional neural network (CNN) and an autoencoder (AE); they dubbed their method "AECNN". However, its performance (accuracy = 0.8126, F1 = 0.8149, recall = 0.8172) is insufficient for clinical usage. Based on deep learning (DL), Park, et al. [12] proposed a DL-based automatic detection (DLAD) algorithm to detect active pulmonary tuberculosis. Using the high-sensitivity cutoff, the sensitivities and specificities were 94.3%-100% and 91.1%-100%; using the high-specificity cutoff, the figures were 84.1%-99.0% and 99.1%-100%. James-Reynolds, et al. [13] used a three-dimensional block-based residual deep learning (RDL) model with depth information. Nevertheless, 3D models consume more computation and memory, and they easily overfit during training. Xie, et al. [14] presented a computer-aided system (CAS) for multiple-category PT detection in radiographs. A learning scalable pyramid structure was utilized with the faster RCNN. The weaknesses are that the pyramid structure in faster RCNN is still inefficient, and the data augmentation merely enlarged the training set to 5× its original size, so the improvement is restricted.

To improve the diagnosis performance of SPT, this paper proposes a novel artificial intelligence model based on the pseudo-Zernike moment (PZM) and the deep stacked sparse autoencoder (DSSAE). The novelties of this study are four points: (i) We are the first to apply PZM to automatic SPT diagnosis. (ii) A novel PZM-DSSAE model is proposed to identify SPT. (iii) 18-way data augmentation is used to prevent overfitting.
(iv) The proposed "PZM-DSSAE" model is better than 5 state-of-the-art approaches.

The dataset was described in Refs. [15, 16]; the retrospective study was exempted by the Institutional Review Board of local hospitals. The data is available upon reasonable request to the corresponding authors. All images are stored in picture archiving and communication systems (PACS) format. The detailed demographics are listed in Table 1.

Two junior radiologists V_a and V_b, each with ten years of chest-related diagnostic experience, read the radiographs collectively. V_a and V_b record the sizes, distributions, and morphological shapes of the CT manifestations of the lesions, and then make slice-wise annotations. Several slices (more than 0 and fewer than 5) are selected by the slice-level selection method [15]. V_c is the senior radiologist. All images are stored at a size of 1024 × 1024. Let x_0 stand for a lung-window CCT image. The labeling R is yielded via Eq. (1). When radiologist V_a disagrees with V_b, we invite the senior one, V_c, to join and reach a consensus by majority voting (MV). Table 2 shows the abbreviation list.

Let each raw image be x_0; the original dataset is written as X_0 = {x_0(i)}. The four preprocessing steps can be written as

x_g(i) = grayscale[x_0(i)],
x_HS(i) = HS[x_g(i)],
x_crop(i) = crop[x_HS(i); z_t, z_b, z_l, z_r],
x(i) = resize[x_crop(i); W, W],

where x_g, x_HS, x_crop, and x are the results of the grayscale operation, histogram stretch (HS), cropping, and resizing, respectively. Four points should be noted: (i) all CCT images are saved in red, green, and blue (RGB) layout, though those images are essentially grayscale; (ii) the cropping values (z_t, z_b, z_l, z_r) are measured in pixels from the four directions: top, bottom, left, and right, respectively; (iii) X = {x(i)} denotes the ultimate output of preprocessing; and (iv) the resized [W, W] image set X is used as input to the proposed AI model.
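For concreteness, the four preprocessing steps can be sketched in NumPy as below. This is a minimal sketch under stated assumptions: the crop margins (Z_T, Z_B, Z_L, Z_R) are illustrative values, since the paper does not list them here, and a simple nearest-neighbour resize stands in for whatever interpolation the authors used.

```python
import numpy as np

# Hypothetical crop margins (z_t, z_b, z_l, z_r) in pixels; illustrative only.
Z_T, Z_B, Z_L, Z_R = 100, 100, 100, 100
W = 256  # target side length after resizing

def grayscale(x_rgb):
    """Collapse an RGB image whose three layers are identical to one channel."""
    return x_rgb.mean(axis=2)

def histogram_stretch(x_g):
    """Linearly stretch intensities to the full [0, 1] range."""
    lo, hi = x_g.min(), x_g.max()
    return (x_g - lo) / (hi - lo + 1e-12)

def crop(x_hs, z_t, z_b, z_l, z_r):
    """Remove z_t/z_b/z_l/z_r pixels from the top/bottom/left/right borders."""
    h, w = x_hs.shape
    return x_hs[z_t:h - z_b, z_l:w - z_r]

def resize(x_crop, w_out):
    """Nearest-neighbour resize to w_out x w_out (stand-in for a real resize)."""
    h, w = x_crop.shape
    rows = np.arange(w_out) * h // w_out
    cols = np.arange(w_out) * w // w_out
    return x_crop[np.ix_(rows, cols)]

# A fake 1024 x 1024 x 3 image: RGB layout, but grayscale in essence.
x0 = np.repeat(np.random.rand(1024, 1024, 1), 3, axis=2)
x = resize(crop(histogram_stretch(grayscale(x0)), Z_T, Z_B, Z_L, Z_R), W)
```

The chain mirrors the four equations above: each step consumes the previous step's output, and the final array x is the W × W input to the AI model.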
Supposing W = 256 and considering only the individual image size, the data compression ratio (DCR) of the whole preprocessing is deduced as

DCR = (1024 × 1024 × 3) / (W × W) = 48,

and the space-saving ratio (SSR) is defined as

SSR = 1 − 1/DCR = 97.92%.

This high DCR of 48 and SSR of 97.92% could help the AI model prediction and future cloud computing and online web services. Figure 1 shows the diagram of the preprocessing procedures. Figure 2 displays two samples of the preprocessed dataset.

Image moments were first introduced by Hu [17], who employed geometric moments to generate a set of invariants. Nevertheless, geometric moments are sensitive to noise. Thus, Teague [18] introduced Zernike moments (ZMs) based on orthogonal Zernike polynomials. The orthogonal ZMs have been verified to be more robust under noisy conditions, and they can yield an almost-zero value of the redundancy measure [19] (Fig. 3). Later, the pseudo-Zernike moment (PZM) was derived from the Zernike moment. PZMs have been proven to give better performance than other moment functions such as Hu moments, Zernike moments, etc. Hence, PZM is more expressive and offers more feature vectors than ZM. PZM has been successfully applied in breast cancer diagnosis [20], pathological brain detection [21], etc.

The kernel of PZMs is a set of orthogonal pseudo-Zernike polynomials defined inside a unit circle (UC) using polar coordinates. The transformation between Cartesian and polar coordinates is

r = sqrt(w_1² + w_2²), θ = arctan(w_2 / w_1).

The PZM y_pq of order p with repetition q of an image x(r, θ) is defined as [22]

y_pq = [(p + 1) / π] ∫∫_{UC} [V_pq(r, θ)]* x(r, θ) r dr dθ,

where the pseudo-Zernike polynomials (PZPs) V_pq(r, θ) of order p are defined as

V_pq(r, θ) = R_pq(r) exp(jqθ),

where the radial polynomials R_pq are defined as

R_pq(r) = Σ_{s=0}^{p−|q|} (−1)^s [(2p + 1 − s)!] / [s! (p + |q| + 1 − s)! (p − |q| − s)!] r^(p−s),

where 0 ≤ |q| ≤ p. The PZM triangular matrix Y = {y_pq}, 0 ≤ p ≤ p_max, is defined in terms of polar coordinates (r, θ) within |r| ≤ 1. Therefore, the computation of PZM needs a linear transformation of the image-plane (IP) coordinates (w_1, w_2) into the unit circle. An example decomposition is shown in Fig.
6, in which panels (a)-(c) show the raw peak image, the image reconstructed using PZPs up to p_max = 6, and the difference between the raw and reconstructed images, respectively. Figure 6(d) gives the detailed decomposition and the corresponding PZMs. There are many other excellent feature extractors, which we will investigate in future studies.

The PZM triangular matrix is vectorized into a (p_max + 1)²-element feature vector Y and then sent to a custom deep stacked sparse autoencoder (DSSAE), which is one of the deep learning models [24, 25]. The fundamental element of the DSSAE is the autoencoder (AE) [26], a typical neural network that learns to map its input Y to an output Z, with an internal code output C that represents the input Y. The whole AE can be split into two parts: an encoder part (W_Y, B_Y) that maps the input Y to the code C, and a decoder part (W_Z, B_Z) that maps the code C to reconstructed data Z, namely Y → C → Z. We expect the output Z to equal the input Y. The structure of the AE is displayed in Fig. 7, where the encoder part has weight W_Y and bias B_Y, and the decoder part has weight W_Z and bias B_Z. We have

C = g_LS(W_Y Y + B_Y),
Z = g_LS(W_Z C + B_Z),

where the output Z is an estimate of the input Y, and g_LS is the log-sigmoid function [27]:

g_LS(a) = 1 / (1 + exp(−a)).

To minimize the error between the input vector Y and the output Z, the raw loss function of the AE is deduced as

L_AE = (1/N_S) Σ_{i=1}^{N_S} ‖Y(i) − Z(i)‖²,

where N_S is the number of training samples. From the two equations above, the output Z can be expressed as Z = g_AE(Y), where g_AE abstracts the whole AE model; therefore, the loss can be revised as

L_AE = (1/N_S) Σ_{i=1}^{N_S} ‖Y(i) − g_AE(Y(i))‖².

The sparse autoencoder (SAE) [28] is a variant of the AE that encourages sparsity: it allows only a small fraction of the hidden neurons to be active at the same time [29]. This sparsity forces the SAE to respond to unique statistical features of the training data [30].
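Returning to the feature extractor for a moment, the pseudo-Zernike radial polynomial and a discrete approximation of the moment integral can be sketched as follows. This is a sketch, not the authors' implementation; in particular, the mapping of pixel centres into the unit circle is one common convention among several.

```python
from math import factorial

import numpy as np

def radial_poly(p, q, r):
    """Pseudo-Zernike radial polynomial R_pq(r), with 0 <= |q| <= p."""
    q = abs(q)
    value = 0.0
    for s in range(p - q + 1):
        num = (-1) ** s * factorial(2 * p + 1 - s)
        den = factorial(s) * factorial(p + q + 1 - s) * factorial(p - q - s)
        value = value + (num / den) * r ** (p - s)
    return value

def pzm(img, p, q):
    """PZM y_pq of a square image, pixels mapped into the unit circle."""
    n = img.shape[0]
    coords = (2 * np.arange(n) + 1 - n) / n        # pixel centres in [-1, 1]
    xx, yy = np.meshgrid(coords, coords)
    r = np.hypot(xx, yy)
    theta = np.arctan2(yy, xx)
    mask = r <= 1.0                                # keep only pixels inside UC
    vpq_conj = radial_poly(p, q, r) * np.exp(-1j * q * theta)
    area = (2.0 / n) ** 2                          # pixel area in UC coordinates
    return (p + 1) / np.pi * np.sum(img[mask] * vpq_conj[mask]) * area
```

As a sanity check of the (p + 1)/π normalization, y_00 of a constant all-ones image is approximately 1, since R_00(r) = 1 and the discretized unit circle has area approximately π.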
In practice, one L2 regularization term Γ_w on the weights (W_Y, W_Z) and one regularization term Γ_s on the sparsity constraint are defined to avoid over-complete or trivial mappings. The loss function of the SAE is written as

L_SAE = L_AE + c_w Γ_w + c_s Γ_s,

where c_s stands for the sparsity regulation factor and c_w for the weight regulation factor. The sparsity regularization term is defined as

Γ_s = Σ_{j=1}^{|C|} g_KL(ρ ‖ ρ̂_j) = Σ_{j=1}^{|C|} [ρ log(ρ/ρ̂_j) + (1 − ρ) log((1 − ρ)/(1 − ρ̂_j))],

where g_KL stands for the Kullback-Leibler divergence [31], |C| is the number of elements of the internal code output C, ρ̂_j is the j-th neuron's average activation value over all N_S training samples, and ρ is its desired value, named the sparsity proportion factor. The weight regularization term Γ_w is defined as

Γ_w = (1/2) (‖W_Y‖² + ‖W_Z‖²).

The training procedure is set to the scaled conjugate gradient descent (SCGD) [32] method.

We use the SAE as a building block and create the final deep stacked sparse autoencoder (DSSAE) classifier by the following three operations: (i) we include an input layer, a preprocessing layer, a PZM layer, and a vectorization layer; (ii) we stack four SAEs with various numbers of hidden neurons; (iii) we append a softmax layer at the end of our AI model. The structure of this proposed PZM-DSSAE model is itemized in Table 3 and illustrated in Fig. 8.

The sizes of the input CCT images are 1024 × 1024 × 3. After preprocessing, all the CCT images are normalized to grayscale images of size W × W. Afterwards, the PZM layer generates the triangular matrix Y = {y_pq}, 0 ≤ p ≤ p_max, −p ≤ q ≤ p (see Fig. 5). The vectorization arranges all the PZMs into one vector with (p_max + 1)² elements (see Fig. 6d). In the classification stage, four SAE blocks with (S_1, S_2, S_3, S_4) neurons are employed. Only the encoder parts of the four SAEs are stacked (see Fig. 8). Finally, a softmax layer with S_c neurons is appended, where S_c equals the number of classes in our classification task.

The PZM feature set Y is evaluated via V-fold cross-validation.
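The SAE loss above (reconstruction error plus the weight and sparsity regularizers) can be evaluated directly. The sketch below uses random, untrained weights purely to show the bookkeeping; the factors c_w = 0.001, c_s = 1.1, and ρ = 0.05 are taken from the experiment settings, while the hidden size and sample count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def logsig(a):
    """Log-sigmoid transfer function g_LS."""
    return 1.0 / (1.0 + np.exp(-a))

def sae_loss(Y, W_y, B_y, W_z, B_z, c_w=0.001, c_s=1.1, rho=0.05):
    """L_SAE = reconstruction MSE + c_w * Gamma_w + c_s * Gamma_s."""
    C = logsig(W_y @ Y + B_y)            # encoder: code, one column per sample
    Z = logsig(W_z @ C + B_z)            # decoder: reconstruction of Y
    n_s = Y.shape[1]                     # number of training samples N_S
    mse = np.sum((Y - Z) ** 2) / n_s     # raw AE loss L_AE
    gamma_w = 0.5 * (np.sum(W_y ** 2) + np.sum(W_z ** 2))
    rho_hat = C.mean(axis=1)             # average activation of each code neuron
    gamma_s = np.sum(rho * np.log(rho / rho_hat)
                     + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return mse + c_w * gamma_w + c_s * gamma_s

d, h, n = 625, 300, 8                    # feature size, hidden size, sample count
Y = rng.random((d, n))
W_y, B_y = 0.01 * rng.standard_normal((h, d)), np.zeros((h, 1))
W_z, B_z = 0.01 * rng.standard_normal((d, h)), np.zeros((d, 1))
loss = sae_loss(Y, W_y, B_y, W_z, B_z)
```

With untrained weights the average activations ρ̂_j sit near 0.5, far from ρ = 0.05, so the KL sparsity term dominates; training drives ρ̂_j toward ρ.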
The whole dataset is split into V separate folds. At the v-th trial, the v-th fold is used as the test set, and the remaining folds {1, …, v − 1, v + 1, …, V} are merged and used for training. This procedure repeats until every fold has been used exactly once as the test set (see Fig. 9). To reduce randomness, we run the above V-fold cross-validation U times with different initial random seeds.

Suppose the sample number of each class is T_k (k = 1, 2). The ideal confusion matrix (CM) is

F_ideal = [ T_1, 0; 0, T_2 ].

Note that the off-diagonal entries of the ideal F are all zero, viz., f_ideal(m, n) = 0, ∀ m ≠ n. Here, "P" and "N" mean positive and negative, corresponding to SPT and HC, respectively. The meanings of TP, TN, FP, and FN are shown in Table 4.

Nine measures are used in this study: sensitivity, specificity, precision, accuracy, F1 score, Matthews correlation coefficient (MCC) [33], Fowlkes-Mallows index (FMI) [34], receiver operating characteristic (ROC) curve, and area under the curve (AUC). The first four measures are defined as

Sensitivity = TP / (TP + FN),
Specificity = TN / (TN + FP),
Precision = TP / (TP + FP),
Accuracy = (TP + TN) / (TP + TN + FP + FN),

and the middle three measures are defined as

F1 = 2 TP / (2 TP + FP + FN),
MCC = (TP × TN − FP × FN) / sqrt[(TP + FP)(TP + FN)(TN + FP)(TN + FN)],
FMI = sqrt[ TP/(TP + FP) × TP/(TP + FN) ].

The above measures are reported in mean and standard deviation (MSD) format. Besides, the ROC is a curve that characterizes a binary classifier under varying discrimination thresholds; it is created by plotting the sensitivity against 1 − specificity. The AUC is calculated based on the ROC curve [35].

Data augmentation creates synthetic training images to increase the size of the training set. Recently, multiple-way data augmentation (MDA) has attracted scholars' research interest. Wang [36] proposed a 14-way data augmentation (DA), in which seven different DA techniques are employed on the raw training image x(m) and its mirrored image x′(m), creating 14 new images for each raw image. Cheng [37] proposed a 16-way DA that adds salt-and-pepper noise (SAPN) to the 14-way DA. This study adds a new DA technique, speckle noise (SN), to the 16-way DA; thus, we have an 18-way DA for the training set.
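The seven scalar measures above can all be computed from a single confusion matrix. The sketch below shows the bookkeeping; the counts are illustrative and are not the paper's results.

```python
import math

def measures(tp, tn, fp, fn):
    """Sensitivity, specificity, precision, accuracy, F1, MCC, FMI from a CM."""
    sen = tp / (tp + fn)                      # sensitivity (recall)
    spc = tn / (tn + fp)                      # specificity
    prc = tp / (tp + fp)                      # precision
    acc = (tp + tn) / (tp + tn + fp + fn)     # accuracy
    f1 = 2 * tp / (2 * tp + fp + fn)          # F1 score
    mcc = (tp * tn - fp * fn) / math.sqrt(    # Matthews correlation coefficient
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    fmi = math.sqrt(prc * sen)                # Fowlkes-Mallows index
    return sen, spc, prc, acc, f1, mcc, fmi

# Illustrative counts only (a balanced two-class split of 200 samples):
sen, spc, prc, acc, f1, mcc, fmi = measures(tp=93, tn=93, fp=7, fn=7)
```

Note that FMI is the geometric mean of precision and sensitivity, so with symmetric counts all three coincide, while MCC is lower because it also penalizes the false counts on both rows of the matrix.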
Suppose the raw training image is x(m); the SN-altered image is defined as

x_SN(m) = x(m) + N_SN ⊙ x(m),

where N_SN is uniformly distributed random noise whose mean and variance are set to γ_m and γ_v, respectively. The SAPN-altered image x_SAPN(m) is defined as

ℚ[x_SAPN(m) = x_min] = γ_d / 2,
ℚ[x_SAPN(m) = x_max] = γ_d / 2,
ℚ[x_SAPN(m) = x(m)] = 1 − γ_d,

where γ_d stands for the noise density, ℚ is the probability function, and x_min and x_max correspond to the black and white colors, respectively.

First, the G_1 different DA methods displayed in Fig. 10 are applied to the raw training image x(m). Letting H_g, g = 1, …, G_1 denote each DA operation, we obtain the augmented datasets on the raw image x(m) as

x(m) → H_g[x(m)], g = 1, …, G_1.

Supposing G_2 stands for the number of new images generated by each DA method, we have

|H_g[x(m)]| = G_2.

Second, the horizontally mirrored image is generated by

x′(m) = f_HM[x(m)],

where f_HM stands for the horizontal-mirror function. Third, all G_1 different DA methods are performed on the mirrored image x′(m), generating G_1 further datasets. Fourth, the raw image x(m), the mirrored image x′(m), the G_1-way datasets of the raw image H_g[x(m)], and the G_1-way datasets of the mirrored image H_g[x′(m)] are combined. The final dataset generated from x(m) is defined as I(m):

I(m) = f_fuse( x(m), x′(m), H_1[x(m)], …, H_G1[x(m)], H_1[x′(m)], …, H_G1[x′(m)] ),

where f_fuse is the concatenation function. Supposing the augmentation factor G_3 stands for the number of images in I(m), we have

G_3 = 2 (G_1 G_2 + 1).

Algorithm 1 summarizes the pseudocode of the proposed 18-way DA method.

The maximum order is set to p_max = 24, indicating we will have (24 + 1)² = 625 PZMs. The weight regularization factor is c_w = 0.001, the sparsity regulation factor is c_s = 1.1, and the sparsity proportion factor is ρ = 0.05. The numbers of neurons in the four SAEs are 300, 200, 100, and 60, respectively. The number of classes is set to S_c = 2, viz., SPT and HC. V is set to 10 in the V-fold cross-validation. The model runs U = 10 times. The mean and variance of the uniformly distributed random noise in SN are 0 and 0.05, respectively. The noise density in SAPN is set to 0.05.
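The two noise-based DA operations can be sketched as below. One assumption is worth flagging: the sketch uses the common multiplicative speckle form x + N_SN · x with uniform noise of mean 0 and variance 0.05, matching the stated settings, but the exact speckle formulation is not spelled out in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def speckle(x, gamma_m=0.0, gamma_v=0.05):
    """SN: x + N_SN * x (assumed multiplicative form), N_SN uniform noise
    with mean gamma_m and variance gamma_v."""
    half = np.sqrt(3.0 * gamma_v)            # Var of U(-a, a) is a^2 / 3
    n_sn = rng.uniform(gamma_m - half, gamma_m + half, size=x.shape)
    return np.clip(x + n_sn * x, 0.0, 1.0)

def salt_and_pepper(x, gamma_d=0.05, x_min=0.0, x_max=1.0):
    """SAPN: each pixel becomes x_min or x_max with probability gamma_d / 2."""
    u = rng.random(x.shape)
    out = x.copy()
    out[u < gamma_d / 2] = x_min                      # pepper (black)
    out[(u >= gamma_d / 2) & (u < gamma_d)] = x_max   # salt (white)
    return out

x = rng.random((256, 256))                   # stand-in preprocessed image
x_sn = speckle(x)
x_sapn = salt_and_pepper(x)
```

In the 18-way DA, these two operations are simply two of the G_1 = 9 members of H_g, each applied to both x(m) and its horizontal mirror x′(m).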
The factors in our data augmentation are set to G_1 = 9 and G_2 = 30; thus, the augmentation factor is G_3 = 2 × (9 × 30 + 1) = 542. Suppose Fig. 2a is the raw preprocessed training image x(m); Fig. 11 displays the datasets H_g[x(m)], g = 1, …, G_1, generated from the image x(m). Due to the page limit, the horizontally mirrored image x′(m) and its corresponding G_1 datasets are not displayed. As shown in Fig. 11, our 18-way DA helps increase the diversity of the training set.

The statistical analysis of the proposed model is displayed in Table 6, which reports the performance over ten runs of 10-fold cross-validation. The sensitivity is 93.33% ± 1.47%, the specificity is 93.13% ± 0.95%, the precision is 93.15% ± 0.89%, the accuracy is 93.23% ± 0.81%, the F1 score is 93.23% ± 0.83%, and the MCC and FMI are 86.47% ± 1.62% and 93.24% ± 0.83%, respectively.

An ablation study is carried out to check the effectiveness of multiple-way data augmentation. If we remove the multiple-way data augmentation from our model, the performance decreases: the sensitivity and specificity fall to 90.07% and 89.24%, respectively. Table 7 clearly shows that all seven indicators decrease if we remove MDA from our model. Figure 12 illustrates the ROC plots, from which we can see the AUC decreases from 0.9739 to 0.9253 if MDA is not used. In future studies, MDA can be integrated into other classification models [38, 39].

The proposed method is compared with 5 state-of-the-art methods: SSVM [10], AECNN [11], DLAD [12], RDL [13], and CAS [14]. All the methods are run on the same dataset via ten runs of 10-fold cross-validation. The results are shown in Table 8. Figure 13 shows the error-bar comparison of all SPT identification methods; our PZM-DSSAE model achieves the highest performance among them. The reasons for our method achieving the best performance are threefold: (a) We choose PZM as the feature extractor.
(b) We introduce DSSAE as the classifier. (c) 18-way data augmentation is used to prevent overfitting.

This study proposes a novel AI model, "PZM-DSSAE", for SPT diagnosis. The PZM is used to extract image features, and the DSSAE is used as the classifier. In addition, 18-way data augmentation is used to prevent overfitting. Finally, the whole "PZM-DSSAE" model performs better than 5 state-of-the-art approaches. The shortcomings of our method are fourfold: (i) we seek the optimal order p_max manually; (ii) the structure of the DSSAE is obtained by trial and error; (iii) our model is tested only by simulation experiments; (iv) the dataset is relatively small. In the future, we shall propose automatic algorithms to determine this AI model's optimal order and structure. Besides, we shall attempt to collect more data and verify our model in a stricter medical environment.

References
1. High prevalence of pulmonary tuberculosis among female sex workers, men who have sex with men, and transgender women in Papua New Guinea
2. Does drug-resistant extrapulmonary tuberculosis hinder TB elimination plans? A case from Delhi, India
3. Airway endoscopy and tracheal cytology of two-year-old thoroughbred horses during the first year of race training
4. Early experience of laparoscopy in emergency operation theatre at Lahore General Hospital, Lahore
5. Positive effects of lumbar puncture simulation training for medical students in clinical practice
6. A young patient with stroke and primary tuberculosis
7. Traditional Chinese medicine combined with western medicine for the treatment of secondary pulmonary tuberculosis: a PRISMA-compliant meta-analysis
8. Gram-negative bacilli are a major cause of secondary pneumonia in patients with pulmonary tuberculosis: evidence from a cross-sectional study in a tertiary hospital in Nigeria
9. Clinico-radiological difference between primary and secondary MDR pulmonary tuberculosis
10. Computer-aided detection and quantification of cavitary tuberculosis from CT scans
11. AE-CNN classification of pulmonary tuberculosis based on CT images
12. Development and validation of a deep learning-based automatic detection algorithm for active pulmonary tuberculosis on chest radiographs
13. Analysis of tuberculosis severity levels from CT pulmonary images based on enhanced residual deep learning architecture
14. Computer-aided system for the detection of multicategory pulmonary tuberculosis in radiographs
15. Diagnosis of secondary pulmonary tuberculosis by an eight-layer improved convolutional neural network with stochastic pooling and hyperparameter optimization
16. Explainable diagnosis of secondary pulmonary tuberculosis by graph rank-based average pooling neural network
17. Visual pattern recognition by moment invariants
18. Image analysis via the general theory of moments
19. LMZMPM: local modified Zernike moment per-unit mass for robust human face recognition
20. An improved CAD system for breast cancer diagnosis based on generalized pseudo-Zernike moment and Ada-DEWNN classifier
21. Exploring a smart pathological brain detection method on pseudo Zernike moment
22. The scale invariants of pseudo-Zernike moments
23. Pseudo-Zernike functions
24. Serverless workflows for containerised applications in the cloud continuum
25. Detecting cryptomining malware: a deep learning approach for static and dynamic analysis
26. Auto-HMM-LMF: feature selection based method for prediction of drug response via autoencoder and hidden Markov model
27. Sparsely-connected autoencoder (SCA) for single cell RNAseq data mining
28. Clutter removal in through-the-wall radar imaging using sparse autoencoder with low-rank projection
29. Construction of a sensitive and speed invariant gearbox fault diagnosis model using an incorporated utilizing adaptive noise control and a stacked sparse autoencoder-based deep neural network
30. Autoencoder based blind source separation for photoacoustic resolution enhancement
31. Some properties of Tsallis and Tsallis-Lin quantum relative entropies
32. Prediction of draft force of a chisel cultivator using artificial neural networks and its comparison with regression model
33. Adrenal glands enhancement in computed tomography as predictor of short- and intermediate-term mortality in critically ill patients
34. Susceptibility to seismic amplification and earthquake probability estimation using recurrent neural network (RNN) model in Odisha
35. Efficiently calculating ROC curves, AUC, and uncertainty from 2AFC studies with finite samples
36. COVID-19 classification by FGCNet with deep feature fusion from graph convolutional network and convolutional neural network. Information Fusion
37. PSSPNN: PatchShuffle stochastic pooling neural network for an explainable diagnosis of COVID-19 with multiple-way data augmentation
38. A review of supervised classification based on contrast patterns: applications, trends, and challenges
39. Enhancing text using emotion detected from EEG signals

The data is available upon reasonable request to the corresponding authors. We have no conflicts of interest to disclose concerning this paper.