key: cord-0280429-fjrdeyk4
authors: McIntosh, Declan; Marques, Tunai Porto; Albu, Alexandra Branzan
title: Preservation of High Frequency Content for Deep Learning-Based Medical Image Classification
date: 2022-05-08
journal: nan
DOI: 10.1109/crv52889.2021.00010
sha: 5cfbfd26354866771dba23cf8c9fe0d88133acbc
doc_id: 280429
cord_uid: fjrdeyk4

Chest radiographs are used for the diagnosis of multiple critical illnesses (e.g., Pneumonia, heart failure, lung cancer); for this reason, systems for the automatic or semi-automatic analysis of these data are of particular interest. An efficient analysis of large amounts of chest radiographs can aid physicians and radiologists, ultimately allowing for better medical care of lung-, heart- and chest-related conditions. We propose a novel Discrete Wavelet Transform (DWT)-based method for the efficient identification and encoding of visual information that is typically lost in the down-sampling of high-resolution radiographs, a common step in computer-aided diagnostic pipelines. Our proposed approach requires only slight modifications to the input of existing state-of-the-art Convolutional Neural Networks (CNNs), making it easily applicable to existing image classification frameworks. We show that the extra high-frequency components offered by our method increased the classification performance of several CNNs in benchmarks employing the NIH Chest-8 and ImageNet-2017 datasets. Based on our results, we hypothesize that providing frequency-specific coefficients allows the CNNs to specialize in the identification of structures that are particular to a frequency band, ultimately increasing classification performance without an increase in computational load. The implementation of our work is available at github.com/DeclanMcIntosh/LeGallCuda.

Chest radiograph testing for medical diagnosis is an important and widely used component of care for a multitude of afflictions. This is particularly important during the COVID-19 pandemic, as the Canadian Association of Radiologists [1] has shown that chest radiographs constitute a valuable supplemental diagnostic method for infections related to the virus. Chest radiographs are also commonly used for the diagnosis and treatment of conditions associated with, but not exclusively related to, the SARS-CoV-2 virus, such as Pneumonia [2]. The need for efficient, timely and cost-effective automated detection of abnormalities in radiograph images extends beyond any single outbreak, as Pneumonia was the worldwide top cause of death for infants in 2017 [3]. More generally, cardiothoracic and pulmonary conditions are among the leading causes of morbidity, mortality and health service use in the world [4]. The diagnosis of these and other diseases targeted by chest radiographs (e.g., Cardiomegaly, Pneumothorax) poses a significant burden for physicians and radiologists, as this analysis can be time-consuming and prone to errors and inter-expert disagreements. The computer vision-enabled interpretation of large numbers of chest X-rays has the potential to reduce the operational costs involved in reading these potentially life-saving tests. Moreover, semi-automated methodologies for pre-screening radiographs (i.e., flagging a sample as potentially unhealthy pending further human deliberation) expand the possible use cases of these automated analysis systems. A number of recent works indicate the growing interest in performing the automatic diagnosis of radiographs using computer vision [5]-[10].
These methods primarily focus on optimizing existing general computer vision architectures or on performing transfer learning with pre-trained parameters (i.e., modifying only a subset of them to conform to a specific application) obtained from large image datasets. These methods often ignore optical properties of the data being used, such as its high resolution, textural features and other high-frequency components. This is a consequence of the down-sampling step commonly applied at the input of popular CNN-based image classifiers [11]-[15]. While typically overlooked, this spatial dimension reduction may eliminate important components of the image, in particular those of higher frequencies. Given that most down-sampling techniques use smoothing filters (e.g., Gaussian) to avoid aliasing, high-frequency components are lost to increasing levels of blur. Considering that some methods reduce the dimensions of radiograph images by a factor of more than 10 [5]-[7] relative to the raw images of the CheXpert dataset [16], this issue can become particularly pressing in medical imaging.

We propose a method for preserving these finer details (high-frequency components), which are typically lost during the down-sampling performed before image classification. We employ a Discrete Wavelet Transform (DWT) to preserve all structural information of the input images while performing the down-sampling required by CNN-based classification pipelines. This is accomplished by taking advantage of the energy-preserving properties of DWTs, as illustrated by their use in image compression schemes such as JPEG2000 [17]. The output of this method is three additional independent channels, included in both the training/testing and inference phases. This expansion of input channels causes only a negligible increase in inference time and in the amount of memory (trainable parameters and gradients) required for training: increasing the number of input channels only linearly scales the FLOPs and gradients of the network's first layer. The additional high-frequency information contained in the extra channels allows for better generalization over the dataset and, ultimately, increased performance. Our results show that the frequency-specific coefficients offered by the proposed method allow state-of-the-art CNNs to detect visual structures associated with frequency bands only accessible in higher-resolution images, which would otherwise be eliminated. This novel capability increases image classification performance without a relevant growth in computational load. The finer details preserved by the proposed system are particularly beneficial for medical imaging, where such high-frequency components carry important information.

The remainder of this article is structured as follows. Section II discusses works related to the proposed system. Section III details our proposed approach and the necessary background. Section IV describes the experimental setup created to evaluate the proposed approach, as well as the results of our experiments. Finally, in Section V we draw conclusions and propose future work.

Works related to the proposed system include frameworks for medical image processing and classification, as well as energy-compacting methods such as the Discrete Cosine Transform (DCT) [18] and the Discrete Wavelet Transform (DWT) [19], in the context of preprocessing for deep learning-based image classification.

1) Deep learning-based medical image classification: Dunnmon et al.
[5] considered 216,431 expertly annotated frontal chest radiographs and explored the ability of CNNs to classify them correctly as normal or abnormal. Using a test set of 533 samples, the authors found that a relatively small number of images was required to train efficient general abnormality detectors: 2,000 training images yielded an average area under the receiver operating characteristic curve (AUC) of 0.84, while a 20,000-sample training set produced an AUC of 0.95. This study offered a general idea of the scale expected of benchmark datasets used for medical image classification (in particular for radiographs). More recently, Hashmi et al. [6] proposed a system that employs transfer learning to train specialized image classifiers for the detection of Pneumonia in radiographs. Each classifier is obtained by fine-tuning pre-computed parameters from diverse CNNs (i.e., ResNet18 [11], Xception [20], InceptionV3 [13], DenseNet121 [21], and MobileNetV3 [15]) on a dataset of chest X-ray images. The individual predictions from this ensemble of classifiers are weighted to produce the final output; this approach outperformed the predictions of any individual classifier. Tang et al. [22] trained multiple CNNs that scored high AUCs in the distinction between normal and abnormal frontal chest radiographs (0.98 for test samples detached from the training dataset, 0.94 for samples from an external dataset). Moreover, the authors report that fine-tuning the aforementioned image classifiers using radiographs from pediatric patients resulted in an AUC of 0.944, attesting to the generalization power of the networks used. The results from Tang et al. strongly suggest that modern CNNs ([11]-[15]) are able to match, and often surpass, expert-level analysis of radiographs.

2) DCT-based image energy compression preprocessing: Previous methods in the medical imaging field have used the frequency decomposition and signal energy compaction properties of the DCT to reduce problem dimensionality [23], [24]. These methods used the DCT to refactor their input into a series of frequency bins representing channels with greatly reduced dimensions [23], [24]. Sridhar and Murali Krishna [23] proposed a probabilistic neural network to detect brain tumors in magnetic resonance brain images. The images used in this work were often subject to severe noise stemming from inconsistencies in data collection, which ultimately hindered the neural network-based classification performance. To attenuate these issues, the authors used the Discrete Cosine Transform (DCT) to reduce image dimensionality and perform feature extraction in the frequency domain, using the DCT-based frequency features as inputs to their neural networks instead of the original MRI images. The reduced high-frequency input noise significantly increased the model accuracy. Boukhechba, Wu, and Bazine [24] proposed a preprocessing approach for hyperspectral imagery data in independent component analysis. Their method utilizes the DCT's ability to compress the energy of an input signal into a small number of frequency components. This property allowed the authors to reduce dimensionality and decrease sensitivity to high-frequency noise by considering only the lower-frequency components while maintaining the majority of the signal energy.

3) DWT image energy compression preprocessing: Li et al.
[25] proposed WaveCNets, which use DWT-based replacements for the intermediate down-sampling layers within CNNs. The authors replace pooling and strided convolutions with different wavelet transforms during the down-sampling phases that typical CNNs contain, continuing to consider only the low-frequency components. The WaveCNets models were evaluated by replacing the applicable layers in several state-of-the-art CNNs. The authors report a noticeable increase in model accuracy and in tolerance to Gaussian noise on ImageNet [27] classification tasks. A wavelet-based preprocessing method for feature extraction was proposed by Reema and Babu [26] for brain tumor detection in MRI images. After applying a DWT to input MRI images, the authors extract hand-crafted visual features. The method proposed in [26] then utilizes a Support Vector Machine (SVM) to detect and segment tumors based on the aforementioned features. This initial DWT-based preprocessing step boosted the system performance and decreased the complexity of the input MRI images used in the detection and segmentation tasks.

Our novel method utilizes a similar frequency-based image decomposition to provide increased image signal energy (i.e., apparent resolution) at the input of CNNs commonly used for image classification. This additional information provides important visual features to the CNNs, allowing the networks to classify radiographs more efficiently and to generalize to data arriving from diverse datasets. The data contained in the high-frequency components we provide, which would otherwise be lost in regular down-sampling schemes, is presented in a format that causes only a negligible increase in model size and inference time.

Our proposed approach employs Discrete Wavelet Transforms as a means to preserve important structural information, in particular high-frequency components, of high-resolution input images before down-sampling them to conform to the dimensions required by popular CNNs (e.g., 224 × 224 and 299 × 299 pixels). These additional components, which would otherwise be lost, help the networks better generalize the automatic classification of radiographs (as discussed in the following Sections). Non-medical imaging applications can also take advantage of our proposed method, as illustrated by a consistent increase in performance observed over different CNNs on a dataset of generic images (i.e., ImageNet [27]). Figure 1 summarizes the proposed approach. The only minor modification required by our framework upon existing CNNs is to increase the number of expected input channels by a factor of 4 (to accommodate the additional DWT-generated inputs). Therefore, it can be easily integrated into any system that uses CNNs as automatic feature extractors, such as image classifiers, generative adversarial networks, object detectors and instance segmentation pipelines, among others.

Discrete Wavelet Transforms have become ubiquitous in image compression pipelines, in particular because of their higher compression ratios and lack of the "blocking" artifacts generated by their predecessor, the Discrete Cosine Transform (DCT) [28]. The use of the 2-dimensional DWT on images creates a set number of frequency-based decomposition levels of an input, which can later be used to partially or fully reconstruct the original image (given a specific choice of filter bank). Each decomposition level created by a 2D DWT can be obtained following a two-step process (dyadic decomposition).
First, two 1D filters are convolved with only the rows of the input using a step size of two (i.e., centered at every other pixel index), generating two coefficients as outputs: a low- and a high-frequency one, both possessing only half the number of columns of the input. The 1D filters proposed in LeGall 5/3 [19] are detailed in Equations (1) (high-frequency filter F_h) and (2) (low-frequency filter F_l):

F_h = (1/2) [-1, 2, -1]    (1)

F_l = (1/8) [-1, 2, 6, 2, -1]    (2)

Second, these two coefficients are convolved with the transposes of F_h and F_l along their columns, further doubling the number of output coefficients to a total of four. Similarly to the first step, this convolution reduces the number of rows by half. Note that one could choose to start the convolution operations with the columns of the input rather than the rows. The input image is typically referred to as LL_0, and the outputs of the first application of F_h and F_l as H_0 and L_0, respectively. While L_0 represents an approximation of the input signal at a coarser resolution, H_0 carries its high-frequency details [28]. The second application of F_h and F_l, this time on the columns of L_0 and H_0, creates four outputs, namely LL_1, HL_1, LH_1 and HH_1 (the subscript reflects the decomposition level). These four outputs possess half the spatial resolution of the input LL_0.

We seek to preserve the high-frequency components of the input images during down-sampling, and a one-level dyadic, DWT-based decomposition generates four coefficients with half the spatial resolution of the input. Therefore, in order to match the size of such coefficients with the required input dimensions of CNNs, we first resize the input images using bi-linear interpolation to double the resolution required by a network (e.g., 598 × 598 for a requirement of 299 × 299), and then apply the DWT-based decomposition. As a result, we match the input size requirements of CNNs while preserving all information from the input at twice the spatial resolution. Figure 1 illustrates this process. The frequency-based coefficients LL_1, HL_1, LH_1 and HH_1 are used as the input channels for the image classification CNNs that we train. Figure 2 illustrates the four decomposition components discussed (note that we suppress the subscript because only one level is used).

The LeGall 5/3 filters create a lossless representation of their inputs [19], meaning that an input image can be recovered using exclusively the calculated decomposition coefficients. This lossless reconstruction capability is especially useful in our work because all information is carried in the inputs, forming a 4-channel representation, each channel with half the spatial resolution of the original images. This group of channels creates a frequency-based separation between the components of the input, encoding structurally independent but semantically related features. Based on the results presented in the remainder of this article, we hypothesize that providing frequency-specific coefficients (e.g., low-frequency or high-frequency only; see Figures 2 (a) and (b)) allows the CNNs to specialize in the identification of structures specific to a frequency band; this ultimately increases classification performance without an increase in computational load. Figure 2 shows that the typical down-sampled input (the "LL" images of this Figure) is enhanced with three additional channels that better characterize high-frequency regions, such as the clavicle edges highlighted in yellow.
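To make the two-step decomposition concrete, the sketch below implements one level of the LeGall 5/3 analysis in NumPy, together with the bi-linear doubling step, producing the 4-channel input described above. This is a minimal illustration rather than the authors' implementation (the linked repository provides a CUDA version); the symmetric boundary padding, the sampling phase and all names are our assumptions, so it does not reproduce the exact conventions of JPEG2000.

```python
import numpy as np
import cv2  # used only for the bi-linear resize

# LeGall 5/3 analysis filters, Equations (1) and (2).
F_H = np.array([-1.0, 2.0, -1.0]) / 2.0            # high-frequency filter F_h
F_L = np.array([-1.0, 2.0, 6.0, 2.0, -1.0]) / 8.0  # low-frequency filter F_l

def _filter_rows(img, taps):
    """Convolve every row with `taps`, then keep every other column."""
    pad = len(taps) // 2
    padded = np.pad(img, ((0, 0), (pad, pad)), mode="reflect")
    out = np.zeros(img.shape, dtype=np.float64)
    for i, t in enumerate(taps):
        out += t * padded[:, i:i + img.shape[1]]
    return out[:, ::2]  # dyadic down-sampling: half the columns remain

def legall_decompose(img):
    """One decomposition level: (LL, HL, LH, HH), each at half resolution."""
    low = _filter_rows(img, F_L)    # L_0
    high = _filter_rows(img, F_H)   # H_0
    ll = _filter_rows(low.T, F_L).T
    hl = _filter_rows(low.T, F_H).T
    lh = _filter_rows(high.T, F_L).T
    hh = _filter_rows(high.T, F_H).T
    return ll, hl, lh, hh

def wavelet_input(image, side=299):
    """Resize to twice the CNN input size, then decompose into 4 channels."""
    doubled = cv2.resize(image.astype(np.float32), (2 * side, 2 * side),
                         interpolation=cv2.INTER_LINEAR)
    return np.stack(legall_decompose(doubled), axis=-1)  # side x side x 4
```

Because both filters are symmetric, convolution and correlation coincide, which keeps the row-filtering helper simple; the second pass reuses it through transposition.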
These complementary inputs encode all information that would otherwise be lost in down-sampling, allowing for a better generalization of the CNNs that use these coefficients.

The frequent scaling operations of modern CNNs represent a critical factor in determining the allocation of resources: any increase in spatial resolution by a factor of X increases the memory required for training and the FLOPs of the convolutional layers by the same factor X. Due to this scaling and to limited memory and computation resources, the input image resolution is often severely limited. Our proposed approach uses LeGall 5/3-based complementary inputs (i.e., coefficients of the dyadic decomposition) in the form of three extra input channels to encode all the information from the original image (despite the reduction in spatial resolution), as illustrated in Figure 1. This strategy, different from simply providing higher-resolution images and dealing with their resource implications, only negligibly increases the number of FLOPs and the memory required. In fact, only the number of parameters of the first layer in a CNN is increased. For instance, the three extra DWT-based channels we propose increased the number of parameters in a MobileNet [15] by only 0.02% (see other examples in Tables I and II). This percentage is even smaller for bigger networks (e.g., ResNet50 [11], InceptionV3 [13]). As a result, our method provides all the information from an image with twice the spatial resolution required by CNNs, but instead of doubling the number of parameters involved in their training, it increases it only very slightly.

A detailed analysis of the 4-channel input we provide (see Figure 2) follows. 1) The first channel, LL, which represents the result of applying two low-pass filters (F_l and its transpose) to the rows and columns of the input, is structurally similar to a regularly down-sampled image (often obtained using nearest-neighbor, bi-linear or bi-cubic interpolation). 2) The following three channels, HL, LH and HH, are a combination of the outputs of F_l and F_h, or of F_h and its transpose alone. As detailed in [29], the coefficients obtained with the application of filters F_l and F_h can lead to a perfect reconstruction of the input while representing only specific frequency bands at a time (e.g., "HL: only the lowest frequencies from the high-frequency components of the input"). Although we consider only one-channel images (i.e., grayscale) in this work, our method can be extended to multi-channel images; for example, a 3-channel RGB image could be represented by 12 DWT-generated coefficients.

We utilize several state-of-the-art CNNs to evaluate our proposed method with respect to compatibility and to the scaling of different models' parameters. We use the same hyperparameters (specified below) in the training phases of all models for consistency. For each CNN we train the image classifier with wavelet-supplemented inputs (i.e., the proposed approach) and with regular ones. In order to evaluate the potential of the proposed approach on medical and natural images alike, we perform our experiments on two large datasets: the NIH Chest-8 dataset [7], composed of 108,000 de-identified and annotated chest X-ray images, and ImageNet-2017 [27], which possesses 1,200,000 images spanning 1,000 categories. ImageNet-2017 was chosen as it allows for an analysis of the generalization capabilities of the proposed method on a large dataset of diverse natural images.
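Concretely, adapting an off-the-shelf classifier to the 4-channel input only requires rebuilding its first convolutional layer, as discussed above. The PyTorch sketch below does this for torchvision's ResNet-50 and measures the parameter overhead; the specific network, head size and initialization call are illustrative assumptions, not the authors' exact training code.

```python
import torch.nn as nn
from torchvision.models import resnet50

model = resnet50(num_classes=8)  # illustrative head size
before = sum(p.numel() for p in model.parameters())

# Rebuild the 3-channel stem as a 4-channel one (LL, HL, LH, HH).
model.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False)
# He initialization [32] for the new layer.
nn.init.kaiming_normal_(model.conv1.weight, mode="fan_out", nonlinearity="relu")

after = sum(p.numel() for p in model.parameters())
print(f"extra parameters: {after - before} "
      f"({100 * (after - before) / before:.4f}% increase)")
```

For ResNet-50 this change adds 7 × 7 × 64 = 3,136 weights, on the order of 0.01% of the model, in line with the negligible overhead reported above.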
The models involved in our analysis are MobileNetV2 [12], ResNet-50 [11], AlexNet [14] and InceptionNet [13] on the NIH Chest-8 dataset [7], and SqueezeNet [30], MobileNetV2 [12], ResNet-50 [11] and InceptionNet [13] on our ImageNet-2017 benchmark [27] of grayscale-only images. We do not train AlexNet [14] on ImageNet because of its large number of parameters and because more recent CNNs (e.g., [11], [13]) have been shown to outperform it. During training/testing we performed classification on all 1,000 classes of the dataset. Images are converted to grayscale for training and evaluation (as discussed in the previous Section). The hyperparameters used for all models on ImageNet were: categorical cross-entropy loss, spatial input dimensions of 224 × 224 pixels, learning rate of 1e-4, batch size of 32, ADAM optimizer [31] and an indefinite number of epochs (training stopped when the validation loss of each model stabilized).

We modify the multi-class classification task proposed by the NIH Chest-8 dataset into eight separate binary classification tasks, one per class. This allows for more points of comparison between wavelet- and non-wavelet-based inputs. The eight classes of the NIH Chest-8 dataset are: Pneumothorax, Effusion, Mass, Pneumonia, Cardiomegaly, Nodule, Atelectasis and Infiltration. To handle the data imbalance introduced by this new problem formulation, we randomly sample a number of negative samples based on the number of positive samples, creating a 1:1 ratio for each class-specific classification task (a minimal sketch of this sampling is given below). Testing and validation on the NIH Chest-8 dataset were performed on the entire testing and validation subsets, respectively. Weights are initialized using the randomized method proposed by He et al. [32]. The hyperparameters used for all models trained on this dataset were: binary cross-entropy (BCE) loss, spatial input dimensions of 512 × 512 pixels, learning rate of 1e-5, batch size of 8, ADAM optimizer [31] and an indefinite number of epochs (again, training stopped when the validation loss of each model stabilized). Due to the reformulation of the NIH Chest-8 task as a series of 8 binary classification problems, the ratio of positive to negative examples in the test set exceeds 1:50 for some classes. For this reason, metrics based on True and False Positives would not provide useful performance insights; instead, we report BCE and mean squared error (MSE) values for the analysis of each model's predictions.
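The per-class balancing can be as simple as the following sketch. The (N, 8) binary label layout, the helper name and the fixed seed are our assumptions, made only so the illustration is reproducible.

```python
import numpy as np

def balanced_indices(labels, class_idx, rng=None):
    """Indices yielding a 1:1 positive/negative ratio for one class.

    `labels` is assumed to be an (N, 8) binary array, one column per
    NIH Chest-8 finding; all positives are kept and an equal number of
    negatives is drawn at random.
    """
    rng = rng or np.random.default_rng(0)
    pos = np.flatnonzero(labels[:, class_idx] == 1)
    neg = np.flatnonzero(labels[:, class_idx] == 0)
    drawn = rng.choice(neg, size=len(pos), replace=False)
    return np.concatenate([pos, drawn])
```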
Table I provides the aggregate relative performance gains or losses between the wavelet-preprocessed inputs (proposed method) and regular ones, considering the five tested CNNs. These results reflect an aggregate measure over the test set for all 8 binary classification problems (i.e., one binary classification per class of the NIH Chest-8 dataset). Table I shows that the proposed method resulted in improvements (i.e., reductions) in the test set BCE loss for all models but MobileNetV2. We hypothesize that the MobileNetV2 results stem from two main causes: 1) MobileNet's early spatial down-sampling (particular to this CNN architecture) nullifies the additional information provided by the proposed method before features can be properly extracted/learned; 2) a non-ideal choice of hyperparameters for this model: while the other models could generalize from the training data more quickly, this smaller model was not able to do so with the same set of hyperparameters (e.g., the learning rate may have been too large for it to converge).

The tested models excluding MobileNetV2 [12] (i.e., ResNet-50 [11], AlexNet [14], InceptionNet [13] and SqueezeNetV1.1 [30]) performed better by an average of 3.1% in test BCE on the NIH Chest-8 dataset when using the proposed method. This indicates that these models, with the same set of hyperparameters, were able to use the higher-frequency information encoded in the additional channels created by our method to improve their generalized predictions. This improvement is most likely due to the additional information encoded in the three extra channels our method provides, which would otherwise be lost in regular down-sampling pipelines (e.g., high-frequency portions of a specific texture that would be eliminated by a simple resizing of the image).

Ablation studies summarized in Table III show that the performance of the different models on NIH Chest-8 increased only when all three DWT-based additional inputs were used. This indicates that the additional high-frequency components of our method do not merely present new individual interest points (e.g., edges and corners) to the network, as the output of a Sobel or Laplacian of Gaussian (LoG) operator would, but rather the entirety of the information present in the input. This signal-energy-preserving characteristic appears to be necessary for an efficient representation of coherent statistical features that can be learned by the CNNs.

The heterogeneity of ImageNet-2017 distinguishes it from the more semantically coherent images found in the previously discussed NIH Chest-8 dataset; it thus represents an efficient indicator of the usefulness of the proposed method for heterogeneous visual data. Table II shows that the results of our ImageNet-2017 experiments are similar to those on the NIH Chest-8 dataset, with an average relative performance increase of 2.42% across all models in Categorical Cross-Entropy (CCE) and a 2.71% relative increase in Top-1 Accuracy. Notably, in this evaluation all models show an increase in relative performance due to the additional high-frequency features encoded by the proposed method. This likely happens because the classes of ImageNet are more easily differentiated with the use of high-frequency information; consider, for instance, the highly divergent textures of the dog and boat classes. Note that the ImageNet-2017 results underreport the contributions of the proposed method, given that the majority of images provided by the dataset fall below the minimum ideal source resolution of 448 × 448 pixels (double the input dimensions of a given CNN). Regardless, the results indicate a systematic increase in classification performance, across all evaluated state-of-the-art CNNs, stemming from the use of the proposed DWT method. We expect further improvements in classification performance on datasets that contain higher-resolution images. The results on this large dataset show that the additional high-frequency coefficients offered by the proposed method contribute to the classification of images of diverse natures (as illustrated by the 1,000 visual classes of ImageNet-2017 [27]).
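For completeness, an ablation in the spirit of Table III can be run with a channel-masking helper such as the one below. The dataset layout and the choice to zero (rather than remove) the withheld channels are our assumptions; the exact mechanism used for Table III is not specified in the text.

```python
import numpy as np

# Channel order follows Figure 2: LL, HL, LH, HH.
CHANNELS = {"LL": 0, "HL": 1, "LH": 2, "HH": 3}

def ablate(batch, keep=("LL",)):
    """Zero every DWT channel not named in `keep` (batch: N x H x W x 4)."""
    out = np.zeros_like(batch)
    for name in keep:
        idx = CHANNELS[name]
        out[..., idx] = batch[..., idx]
    return out

# Example: LL only (a proxy for a regularly down-sampled input) vs. all four.
# baseline = ablate(batch, keep=("LL",))
# full     = ablate(batch, keep=("LL", "HL", "LH", "HH"))
```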
In the previous sub-sections, the influence of the additional high-frequency DWT-based channels was presented through aggregate metrics considering all classes. Here we focus on the relative class-specific improvements. The worst-performing class under our method on the NIH Chest-8 dataset (when compared to down-sampled inputs) was Cardiomegaly. Cardiomegaly is characterized in radiographs as an enlarged heart (see Figure 4). The relevant visual features used in the classification of Cardiomegaly are concentrated in the lower frequencies of the image (e.g., the total size of homogeneous regions of the heart is an important visual cue for this class). As discussed, our DWT-based method preserves additional information in higher frequencies. Therefore, not only are these additional high-frequency components unnecessary for an efficient classification of Cardiomegaly, they might represent a detrimental addition (i.e., high-frequency noise). In previous NIH Chest-8 benchmarks, Mass was noted to be a particularly challenging class [7] due to its intra-class variability. Our experiments reported an increase in classification performance for this class when using the proposed DWT-based inputs. This increase can be associated with the fact that edges and changes in texture (high-frequency components) represent the main visual cues of this class (see Figure 3); its classification therefore benefits particularly from the additional inputs we provide. This phenomenon is illustrated by the high-frequency structural elements that are easily visualized in the DWT-based channels of Figure 3, in contrast with those of Figure 4. Based on our thorough experiments and ablation studies, we conclude that the additional inputs provided by the proposed method are effective in offering useful supplementary high-frequency components for image classification, in particular for classes where fine features and textures are distinctive.

We propose a novel, simple and CNN-agnostic method that encodes all information from higher-resolution images when down-sampling them for use with image classification networks. This extra information is represented as three additional channels (per original channel of the input), and our experiments show that they systematically increase the performance of state-of-the-art CNNs for image classification. Our proposed method performs a Discrete Wavelet Transform (DWT) using the LeGall 5/3 filter bank [29] to encode additional high-frequency components of the input, effectively presenting all of its information after down-sampling. These complementary channels add only a negligible number of parameters to the networks and require minor modifications to the first layer of a CNN architecture, thus representing an approach that can be easily incorporated into existing systems. These supplemental channels allow diverse CNNs to better detect visual structures associated with specific frequency bands that would otherwise be lost. This improvement is highlighted in the results we present: our method improves the image classification performance of various CNNs without a noticeable increase in computational load. In the tested models excluding MobileNetV2 [12], our proposed approach reduced the testing loss (BCE) by 3.1% in binary classification tasks from the NIH Chest-8 dataset [7].
Detailed analysis of the interaction between the proposed method and the Cardiomegaly and Mass classes of [7] revealed that these additional channels are particularly beneficial to classes where texture or finer details are predominant. Beyond medical imaging, we explored the potential of the proposed approach on a large generic dataset, ImageNet-2017 [27]. Considering the performance on all 1,000 classes of this dataset, our experiments show an average relative decrease of 2.42% in Categorical Cross-Entropy loss across all evaluated CNN models, and a relative increase of 2.71% in Top-1 Accuracy. These improvements highlight how the high-frequency components offered by our method can help CNNs better generalize on medical imaging and generic datasets alike. Future work will address the application of the DWT with other wavelets for data encoding, and will investigate the effectiveness of these additional input channels in systems that use CNNs as feature extractors for other tasks, such as object detection, instance segmentation and Generative Adversarial Network-based applications.

References

[1] The Canadian Society of Thoracic Radiology (CSTR) and Canadian Association of Radiologists (CAR) consensus statement regarding chest imaging in suspected and confirmed COVID-19.
[2] Chest radiography and pneumonia in primary care: diagnostic yield and consequences for patient management.
[3] Pneumonia.
[4] Global, regional, and national life expectancy, all-cause mortality, and cause-specific mortality for 249 causes of death, 1980-2015: a systematic analysis for the Global Burden of Disease Study.
[5] Assessment of convolutional neural networks for automated classification of chest radiographs.
[6] Efficient pneumonia detection in chest X-ray images using deep transfer learning.
[7] ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases.
[8] CheXNet: Radiologist-level pneumonia detection on chest X-rays with deep learning.
[9] An efficient deep learning approach to pneumonia classification in healthcare.
[10] Identifying pneumonia in chest X-rays: A deep learning approach.
[11] Deep residual learning for image recognition.
[12] MobileNetV2: Inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[13] Rethinking the Inception architecture for computer vision.
[14] ImageNet classification with deep convolutional neural networks.
[15] Searching for MobileNetV3.
[16] CheXpert: A large chest radiograph dataset with uncertainty labels and expert comparison.
[17] JPEG2000 Image Compression Fundamentals, Standards and Practice.
[18] Discrete cosine transform.
[19] Sub-band coding of digital images using symmetric short kernel filters and arithmetic coding techniques.
[20] Xception: Deep learning with depthwise separable convolutions.
[21] Densely connected convolutional networks.
[22] Automated abnormality classification of chest radiographs using deep convolutional neural networks.
[23] Brain tumor classification using discrete cosine transform and probabilistic neural network.
[24] DCT-based preprocessing approach for ICA in hyperspectral data analysis.
[25] Wavelet integrated CNNs for noise-robust image classification.
[26] Tumor detection and classification of MRI brain image using wavelet transform and SVM.
[27] ImageNet: A large-scale hierarchical image database.
[28] Implementation and comparison of the 5/3 lifting 2D discrete wavelet transform computation schedules on FPGAs.
[29] Sub-band coding of digital images using symmetric short kernel filters and arithmetic coding techniques.
[30] SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size.
[31] Adam: A method for stochastic optimization.
[32] Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification.

The authors would like to acknowledge the valuable contribution received through the Jamie Cassels Undergraduate Research Award, and the University of Victoria for its logistical and academic support.