title: Skin Cancer Classification Using Inception Network and Transfer Learning
authors: Benedetti, Priscilla; Perri, Damiano; Simonetti, Marco; Gervasi, Osvaldo; Reali, Gianluca; Femminella, Mauro
date: 2020-08-24
journal: Computational Science and Its Applications - ICCSA 2020
DOI: 10.1007/978-3-030-58799-4_39

Abstract: Medical data classification is typically a challenging task due to the imbalance between classes. In this paper, we propose an approach to classify dermatoscopic images from the HAM10000 ("Human Against Machine with 10000 training images") dataset, consisting of seven imbalanced types of skin lesions, with good precision and low resource requirements. Classification is performed by a pretrained convolutional neural network. We evaluate the accuracy and performance of the proposal and illustrate possible extensions.

Training neural networks for the automated diagnosis of pigmented skin lesions can be a difficult process due to the small size and lack of diversity of the available datasets of dermatoscopic images. The HAM10000 ("Human Against Machine with 10000 training images") dataset is a collection of dermatoscopic images from different populations, acquired and stored by different modalities. We used this benchmark dataset, with its small number of images and strong imbalance among the 7 different types of lesions, to prove the validity of our approach, which is characterized by good results and light resource usage. Exploiting a highly engineered convolutional neural network with transfer learning, customized data augmentation and a non-adaptive optimization algorithm, we show that it is possible to obtain a final model able to precisely recognize multiple categories, even though they are scarcely represented in the dataset. The whole training process has a limited impact on computational resources, requiring no more than 20 GB of RAM.
The rest of the paper is structured as follows: Sect. 2 describes related work in the field of medical image processing. Section 3 illustrates the dataset of interest. Section 4 gives an overview of the model architecture. Section 5 describes the training process and shows experimental results. Finally, some concluding remarks and future research directions are reported in Sect. 6.

Processing of biomedical images has long been a field intensively explored by CNN pioneers. The first related papers date back to 1991 [1], with a strong impulse in the following years towards methods for automating the classification of pathologies and the related diagnoses [2, 3]. Nowadays, almost thirty years later, the reliability of such networks has reached a rather high level, as has their intrinsic complexity. This reliability has allowed a wide diffusion of automatic classification systems for diagnostic images, from evolutionary algorithms [4-7] to deep networks [8-11], whether convolutional or not. In dermatology as well, automatic image recognition and classification have been used for decades to detect tumoral skin lesions [12, 13]. Recent and promising research has highlighted the possibility that properly trained machines can exceed human capability in recognizing and classifying skin cancers. The scores obtained are very encouraging [14], and we are confident that in the near future the ability to recognize these forms of pathology will become nearly complete. Today, CNNs are used to extract image features, which are in turn used for image classification [15, 16]. Dermatoscopy is often used to improve the diagnosis of pigmented skin lesions, either benign or malignant. With dermatoscopic images it is also possible to train artificial neural networks to recognize pigmented skin lesions automatically.
Nevertheless, training requires a large number of samples, while the number of high-quality images with reliable labels is either limited or restricted to only a few classes of diseases, often unbalanced. Due to these limitations, some previous research activities focused on melanocytic lesions (in order to differentiate between benign and malignant samples) and disregarded non-melanocytic pigmented lesions, even though they are very common. In order to boost research on the automated diagnosis of dermatoscopic images, the HAM10000 dataset was provided to the participants of the ISIC 2018 classification challenge, hosted by the annual MICCAI conference in Granada, Spain [17]. The set of 10015 8-bit RGB color images was collected over 20 years from populations at two different sites, specifically the Department of Dermatology of the Medical University of Vienna and the skin cancer practice of Cliff Rosendahl in Queensland, Australia. More than 50% of the lesions are confirmed through histopathology (histo); the ground truth for the remaining cases is either follow-up examination (followup), expert consensus (consensus), or confirmation by in-vivo confocal microscopy (confocal). Other features in the dataset include age, gender and body site of the lesion (localization) [17].

The structure of the network used in this work is shown in Fig. 1. The original Inception-ResNet-v2 architecture [20] has a stem block consisting of the concatenation of multiple convolutional and pooling layers, while Inception-ResNet blocks (A, B and C) contain a set of convolutional filters with an average pooling layer. Reduction blocks (A and B) replace the average pooling operation with a max pooling one. This structure has been extended with a final module consisting of a flattening step, two fully-connected layers of 64 units each, and a softmax classifier.
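The head extension described above can be sketched as follows. This is a minimal sketch assuming TensorFlow 2.x/Keras; the helper name `build_classifier` and the ReLU activations of the dense layers are assumptions, since the text only specifies the layer sizes and the softmax output.

```python
# Sketch: attach the final module (flatten, two 64-unit fully-connected
# layers, softmax over the 7 lesion classes) to a convolutional backbone,
# keeping only its last layers trainable. Assumes TensorFlow 2.x.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_classifier(backbone, n_classes=7, n_trainable=40):
    # Freeze everything except the last `n_trainable` backbone layers.
    for layer in backbone.layers[:-n_trainable]:
        layer.trainable = False
    return models.Sequential([
        backbone,
        layers.Flatten(),
        layers.Dense(64, activation='relu'),  # activation is an assumption
        layers.Dense(64, activation='relu'),
        layers.Dense(n_classes, activation='softmax'),
    ])

# In the paper's setting the backbone would be the pretrained network:
# backbone = tf.keras.applications.InceptionResNetV2(
#     include_top=False, weights='imagenet', input_shape=(299, 299, 3))
# model = build_classifier(backbone)
```

Any convolutional backbone with a `.layers` list can be plugged in, which also makes the head easy to test in isolation before downloading the ImageNet weights.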
The overall module is trainable on a single GPU with reduced memory consumption. This work consists of two training rounds, after a data processing step designed to deal with the strong imbalance of the dataset:
- A first classification training process using class weights.
- A rollback to the previously obtained best model, to improve classification performance with a second training phase.

In the first stage of data processing, after the creation of a new column with a more readable definition of the labels, each class was translated into a numerical code using pandas.Categorical.codes. Afterwards, missing values in the "age" column were filled with the column mean. Figure 2 and Fig. 3 show the HAM10000 data distribution. Finally, the images are loaded and resized from 450 × 600 to 299 × 299 pixels in order to be correctly processed by the network. After a normalization step on the RGB arrays, we split the dataset into training and validation sets with an 80:20 ratio (Fig. 4). In order to re-balance the dataset, we chose to cap the number of images for each class at a maximum of 450 samples. This significant decrease in available images is then mitigated by a data augmentation step: the training set is expanded by altering images with small transformations, such as horizontal flips, vertical flips, translations, rotations and shearing. Due to the limited number of samples available for training, we decided to take advantage of transfer learning, utilizing Inception-ResNet-v2 pre-trained on ImageNet [21] and TensorFlow, a deep learning framework developed by Google, for fine-tuning the last 40 layers. The Keras library offers a wide range of optimizers: adaptive optimization methods such as AdaGrad, RMSProp, and Adam are widely used for training deep neural networks due to their fast convergence.
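The metadata preparation and per-class capping steps above can be sketched as follows. This is a sketch, not the authors' code: the column names `dx` and `age` follow the HAM10000 metadata file, while the `lesion_names` mapping and the helper name `prepare_metadata` are illustrative.

```python
# Sketch of the first data processing stage: readable labels, numeric
# codes, mean-filled ages, and a per-class cap to reduce imbalance.
import numpy as np
import pandas as pd

# Illustrative mapping from HAM10000 short codes to readable names.
lesion_names = {
    'akiec': 'Actinic keratoses', 'bcc': 'Basal cell carcinoma',
    'bkl': 'Benign keratosis-like lesions', 'df': 'Dermatofibroma',
    'mel': 'Melanoma', 'nv': 'Melanocytic nevi', 'vasc': 'Vascular lesions',
}

def prepare_metadata(df, max_per_class=450, seed=42):
    df = df.copy()
    # New column with a more readable label, then a numeric code per class.
    df['cell_type'] = df['dx'].map(lesion_names)
    df['cell_type_idx'] = pd.Categorical(df['cell_type']).codes
    # Fill missing ages with the column mean.
    df['age'] = df['age'].fillna(df['age'].mean())
    # Cap every class at `max_per_class` samples to re-balance the dataset.
    df = pd.concat([
        group.sample(min(len(group), max_per_class), random_state=seed)
        for _, group in df.groupby('cell_type_idx')
    ])
    return df
```

The capped frame can then be split 80:20 into training and validation sets (e.g. with `sklearn.model_selection.train_test_split`) before images are resized and normalized.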
However, as described in [22], when the number of parameters exceeds the number of data points these optimizers often yield worse generalization than non-adaptive methods. In this work we used a stochastic gradient descent (SGD) optimizer, with the learning rate set to 0.0006, using momentum and Nesterov Accelerated Gradient in order to adapt the updates to the slope of the loss function (categorical cross-entropy) and speed up the training process. The total number of epochs was set to 100, with a small batch size of 10. A set of class weights was introduced in the training process to place more emphasis on minority class recognition. A maximum patience of 15 epochs was set for the early stopping callback in order to mitigate the overfitting visible in Fig. 5, which shows the history of the training and validation process. Finally, the model achieves an accuracy of 73.4% on the validation set, using the weights from the best epoch. Figure 6 shows the confusion matrix for the model on the validation set: two of the minority classes, Actinic Keratoses (akiec) and Dermatofibroma (df), are not properly recognized, and Melanoma (mel) is often mistaken for generic keratoses (bkl), as already noted in Fig. 3.

In order to improve classification performance, especially on minority classes, we loaded the best model obtained in the first round to extend the training phase and explore other potential local minima of the loss function, using an additional 20 epochs. This second step led to an enhancement in the overall predictions, reaching a maximum accuracy of 78.9%. Figure 7 shows the normalized confusion matrix on the validation set for the final fine-tuned model. In this case, 6 out of 7 categories are classified with a ratio of true positives higher than 75%, even for classes with extremely limited sample sets, such as vascular lesions (vasc, 30 samples) and dermatofibroma (df, 16 samples).
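The two-round training schedule above can be sketched as follows. This is a sketch, not the authors' exact code: TensorFlow 2.x/Keras is assumed, the momentum value and augmentation ranges are assumptions (the text gives only the transformation types), and the rollback to the best round-1 model is approximated here with `restore_best_weights` instead of an explicit checkpoint file.

```python
# Sketch of the two-round training process: SGD with Nesterov momentum,
# augmented batches, class weights and early stopping (patience 15).
# Assumes TensorFlow 2.x; `x_train`/`y_train` are normalized RGB arrays
# and one-hot labels.
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def two_round_fit(model, x_train, y_train, validation_data, class_weights,
                  epochs_round1=100, epochs_round2=20, batch_size=10):
    # Augmentation: flips, translations, rotations and shearing
    # (ranges are illustrative, not from the paper).
    augmenter = ImageDataGenerator(
        horizontal_flip=True, vertical_flip=True,
        width_shift_range=0.1, height_shift_range=0.1,
        rotation_range=20, shear_range=0.2)
    # Non-adaptive optimizer: SGD with momentum and Nesterov acceleration.
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.0006,
                                          momentum=0.9, nesterov=True),
        loss='categorical_crossentropy', metrics=['accuracy'])
    # Round 1: class weights emphasize minority classes; early stopping
    # rolls back to the best weights, approximating the paper's
    # "reload the best model" step.
    stopper = tf.keras.callbacks.EarlyStopping(
        monitor='val_loss', patience=15, restore_best_weights=True)
    model.fit(augmenter.flow(x_train, y_train, batch_size=batch_size),
              validation_data=validation_data, epochs=epochs_round1,
              class_weight=class_weights, callbacks=[stopper])
    # Round 2: continue from the restored weights for extra epochs.
    model.fit(augmenter.flow(x_train, y_train, batch_size=batch_size),
              validation_data=validation_data, epochs=epochs_round2,
              callbacks=[stopper])
    return model
```

Keeping both rounds behind one function makes it easy to shrink the epoch counts for smoke tests while preserving the paper's 100 + 20 schedule as the default.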
The whole training process required less than four hours on a Google Colab cloud GPU, with an overall RAM utilization below 20 GB (Fig. 8).

In conclusion, in this paper we investigated the possibility of obtaining improved performance in the classification of 7 significantly unbalanced types of skin diseases, with a small number of available images. Using a fine-tuned deep Inception network, data augmentation and class weights, the model achieves a good final diagnostic accuracy. The described training process has light resource usage, requiring less than 20 GB of RAM, and can be executed in a Google Colab notebook. For future improvements, larger datasets of dermatoscopic images are needed. The model shown in this paper can be regarded as a starting point for implementing a lightweight diagnostic support system for dermatologists, for example on the Web as well as through a mobile application.

References
[1] Image processing of human corneal endothelium based on a learning network
[2] Computerized detection of clustered microcalcifications in digital mammograms using a shift-invariant artificial neural network
[3] A simulation framework for scheduling performance evaluation on CPU-GPU heterogeneous system
[4] Pattern differentiation of glandular cancerous cells and normal cells with cellular automata and evolutionary learning
[5] An evolutionary approach to feature function generation in application to biomedical image patterns
[6] The AES implantation based on OpenCL for multi/many core architecture
[7] Strategies and systems towards grids and clouds integration: a DBMS-based solution
[8] A novel end-to-end classifier using domain transferred deep convolutional neural networks for biomedical images
[9] Fine-tuning convolutional neural networks for biomedical image analysis: actively and incrementally
[10] ICCSA 2016
[11] ICCSA 2017
[12] Automatically early detection of skin cancer: study based on neural network classification
[13] The skin cancer classification using deep convolutional neural network
[14] Systematic outperformance of 112 dermatologists in multiclass skin cancer image classification by convolutional neural networks
[15] An approach for improving automatic mouth emotion recognition
[16] Towards a learning-based performance modeling for accelerating deep neural networks
[17] The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions
[18] Gradient-based learning applied to document recognition
[19] Deep Learning
[20] Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
[21] Imagenet classification with deep convolutional neural networks
[22] The marginal value of adaptive gradient methods in machine learning