1 Introduction

Deep Learning (DL) models have found widespread use in various applications, ranging from autonomous driving [19] and pest detection to speech recognition [26]. Despite their outstanding results in computer vision and natural language processing tasks, accuracy is not the only criterion to consider in a DL deployment [10]. Depending on the problem, other aspects may also become important, such as explainability and the capability to handle samples from unknown classes [34].

DL models are generally trained under a closed-set assumption, and this restriction is reflected in their inability to explicitly express ignorance about input samples from unseen classes. As a result, a DL model trained in such a setup is often unable to identify data from an unknown class as unknown, which leads to model overconfidence [32]. This overconfidence has several sources, such as undetected overfitting, bias, or even the choice of the softmax function for the model's output layer, all of which make directly identifying unknown samples more difficult [34]. Therefore, the model needs to be robust and able to handle Out-of-Distribution (OOD) samples, which can come in various forms depending on the problem.

For medical applications, OOD detection is an important auxiliary task to improve the ability to detect unseen classes in an open-set problem. For example, when a DL skin lesion classifier encounters an unseen rare lesion, it would be preferable to identify it as unknown instead of erroneously assigning it to one of the known classes [25, 36]. Therefore, the OOD detection task has drawn attention in a wide range of applications, such as histopathology [22], X-ray [3], and magnetic resonance image [14] classification problems.

OOD detection can be considered a recent field of research in DL, and one of its main objectives is to improve the ability of models to recognize unknown samples. In other words, an OOD detection algorithm should be able to identify whether an input can be considered known or unknown. The most straightforward option for OOD detection is to use activations from the model's output layer, as this is closest to the final inference result [9]. These strategies typically rely on logits or softmax outputs to compute confidence scores, which are then used to differentiate between known and unknown classes.

More recently, researchers have explored using the model's feature space to identify unknown samples, based on the assumption that the feature space can be useful for OOD detection, as intermediate layers capture different levels of semantic features [20]. One such method is the Open Principal Component Score (OpenPCS), which uses a low-dimensional feature space representation obtained from Principal Component Analysis (PCA) to fit class-wise Gaussian distributions and identify whether data is known or unknown. This approach was first implemented for semantic segmentation problems, but it can be extended to multi-class classification, in a variant named OpenPCS-Class [5]. However, the feature-space approach for OOD detection, especially OpenPCS-Class, is still underexplored in many applications.

In this article, we evaluate the OpenPCS-Class for OOD detection in skin lesion classification problems. The objective is to evaluate the capability of a Gaussian-based approach using the feature space to identify unseen classes in this medical application, which is usually a complex task with numerous OOD classes related to unknown skin lesions. The contribution of this work is three-fold:

  1. We evaluate the OpenPCS-Class method for OOD detection in skin lesion problems. We use different OOD data to evaluate the approach, ranging from samples of unseen classes of skin lesions to different medical problems.

  2. We compare the results with traditional and state-of-the-art methods for OOD detection. We assess how these methods behave in the presence of different OOD classes and additional ID data.

  3. We evaluate these methods across different model architectures to investigate the model's contribution to OOD detection using different feature space representations.

2 Related Works

Detecting OOD samples is crucial for building reliable Deep Learning models that need to operate effectively in an open-set scenario. In medical applications, such strategies enhance the robustness of DL results in critical tasks. These works generally concentrate on semantic segmentation and image classification tasks [4]. Karimi et al. [13] proposed a spectral analysis of the intermediate features of DL models to enhance the robustness of multi-organ segmentation by quantifying the uncertainty of the segmentation result. Wollek et al. [30] evaluated several state-of-the-art OOD detection methods on medical image classification tasks, discussing the advantages and drawbacks of such methods in identifying unknown samples close to the training classes.

Due to the relevance of this topic for guaranteeing safety and robustness in DL applications, a plethora of new strategies for OOD detection has emerged. One of the most common methods involves using the softmax output as an OOD score, known as Maximum Softmax Probability (MSP) [11]. MSP is based on the idea that samples from unknown classes generate lower confidence scores for each known class, which are then used to distinguish ID and OOD data. This method has been evaluated in a wide range of problems, including medical applications. Zhang et al. [37], for example, evaluated the effectiveness of MSP for OOD detection in diabetic retinopathy and chest radiography problems. However, the softmax output can sometimes yield overconfident scores on unknown data, which is inappropriate for OOD detection [33].

To avoid the issues associated with the softmax, the feature space can also be used to distinguish between known and unknown samples. Lee et al. [15] proposed a method that uses information from the feature space to detect OOD samples, assuming that the feature representations can be fitted by Gaussian distributions. In this case, class-conditional Gaussian distributions are estimated and the score is computed as the Mahalanobis distance from a test sample to the closest class-conditional distribution [24]. This OOD detection method has been applied in different medical image analysis applications, such as malaria parasitized cell classification [28], lung cancer classification [2], and skin lesion classification [25].

Despite its effectiveness for the OOD detection task, the feature space is generally a high-dimensional representation, which can often be inefficient and highly redundant, making the OOD detection method harder to fit [31]. To alleviate the problem of high dimensionality in intermediate representations, Oliveira et al. [21] proposed an OOD detection method, called OpenPCS, that uses PCA to reduce the dimensionality of the feature space. The low-dimensional representation is then used to fit class-conditional Gaussian distributions, and the score is calculated as the maximum likelihood between a sample's intermediate representation and the class-conditional distributions. More recently, Carvalho et al. [5] proposed an extension of this method for multi-class classification problems, named OpenPCS-Class. This method was successfully evaluated on benchmark problems, but OpenPCS-Class remains unexplored in many applications, including medical image analysis.

3 Detecting Unseen Samples Using Feature Space

In this section, we describe the OpenPCS-Class method in detail. We also briefly introduce the OOD detection problem in skin lesion classification, motivating the applicability of this work. The code of this work is publicly available (Footnote 1).

3.1 Open Principal Component Score for Image Classification

The Open Principal Component Score (OpenPCS) is a method that uses intermediate features for OOD detection in semantic segmentation tasks. Originally, this method could be applied only to Fully Convolutional Networks (FCNs), which can be prohibitive for the direct use of OpenPCS in other DL tasks.

OpenPCS-Class can be seen as the extension of the OpenPCS method to classification tasks. This method discards the need for an FCN but retains the main characteristic of combining intermediate features into a low-dimensional representation. For a better comprehension of the method, Fig. 1 displays an overview of the method for an image classification problem.

Fig. 1. OpenPCS-Class overview for image classification

OpenPCS-Class is an OOD detection method that can combine features from different layers to distinguish whether a sample belongs to a known or unknown class. For each model layer l, we transform the activation map \(a^{(l)}\) into the corresponding activation vector \(h^{(l)}\) using a reduction method (e.g., average pooling). Therefore, we always obtain a feature vector regardless of the layer specification.
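
As an illustration of this reduction step, a minimal NumPy sketch (assuming channel-first activation maps of shape (C, H, W), as in most convolutional frameworks; the variable names are illustrative):

```python
import numpy as np

def activation_to_vector(activation_map):
    """Reduce an activation map a^(l) of shape (C, H, W) to a
    C-dimensional vector h^(l) via global average pooling."""
    return activation_map.mean(axis=(1, 2))

# A hypothetical layer output: 64 channels on an 8x8 spatial grid.
a_l = np.random.rand(64, 8, 8)
h_l = activation_to_vector(a_l)  # shape (64,), regardless of H and W
```

Because the pooled vector's length depends only on the number of channels, layers with different spatial resolutions all yield fixed-length vectors.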

One of the main abilities of the OpenPCS method is the capability to combine feature representations from different layers, which is a user-defined parameter. For classification tasks, the features are combined by concatenating their vectors, resulting in a feature vector h. The drawback of such an approach is the high dimensionality of h. To alleviate this issue, we apply PCA to obtain a better representation in a lower dimension.

To fit the eigenvectors and eigenvalues for the PCA, we follow a class-wise approach. Therefore, we use the collection of feature vectors related to each of the known classes to fit the parameters of the PCA, creating a specific dimensionality reduction for each of the known classes, according to Eq. 1.

$$\begin{aligned} h^{*}_c = h \cdot v_{c} \end{aligned}$$
(1)

where h is the feature representation, \(v_{c}\) contains the eigenvectors with the highest eigenvalues for the dimensionality reduction of class c, and \(h^{*}_c\) is the transformed feature vector for class c. Therefore, depending on the class c, the resulting low-dimensional feature vector \(h^{*}_{c}\) can be different: we obtain a different low-dimensional representation of the same feature vector h for each class c that we evaluate.
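
A minimal sketch of this class-wise reduction, using scikit-learn's PCA as a stand-in (note that scikit-learn also centers the data before projecting, a small departure from the plain product in Eq. 1; the function names and the number of components are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

def fit_classwise_pca(features, labels, n_components=16):
    """Fit one PCA per known class c, keeping the eigenvectors v_c
    associated with the largest eigenvalues (cf. Eq. 1)."""
    return {c: PCA(n_components=n_components).fit(features[labels == c])
            for c in np.unique(labels)}

def project_classwise(h, pcas):
    """Return the low-dimensional representation h*_c of a single
    feature vector h for every known class c."""
    return {c: pca.transform(h.reshape(1, -1))[0] for c, pca in pcas.items()}
```

Each known class thus owns its own projection, so a single test feature vector produces one low-dimensional representation per class.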

The OOD score is computed by estimating how likely the feature vector is under each of the known classes. In the literature, the Gaussian density estimator has been successfully used to quantify OOD-ness [18, 21]. Therefore, we adopt the Gaussian density estimator and compute the corresponding likelihood. Mathematically, the OOD score for each class c is computed according to Eq. 2

$$\begin{aligned} G_c(h^{*}_c) = \frac{1}{\sqrt{2\pi \sigma _{c}^2}}\exp \left( -\frac{(h^{*}_c -\mu _{c})^2}{2\sigma _{c}^2}\right) \end{aligned}$$
(2)

where \(\mu _c\) and \(\sigma _c\) represent the mean and standard deviation for a known class c, and \(G_{c}(h^{*}_c)\) represents the probability density of \(h^{*}_c\) under \(G_{c}\). The final OOD score is the maximum log-likelihood over all known classes, as defined in Eq. 3.

$$\begin{aligned} s = \max _{c=1}^{n} {\log \left[ G_{c} \left( h^{*}_{c} \right) \right] } \end{aligned}$$
(3)

where n is the number of classes. In summary, to detect whether a sample can be considered OOD, we obtain its feature vector representation, apply the class-wise dimensionality reduction, and, for each low-dimensional representation, calculate the log-likelihood under its corresponding class distribution. The OOD score is the maximum log-likelihood over all classes. For an ID sample, the likelihood is low for all classes except its true class, which yields a high score s. For an OOD sample, the likelihood tends to be low for all class-wise distributions, so the score s is also low. Thereby, ID and OOD samples can be distinguished by setting a threshold on s.
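
Putting the pieces together, the scoring step can be sketched as follows (a diagonal-covariance Gaussian is assumed for simplicity, with one univariate factor per component of \(h^{*}_c\); the helper names are illustrative):

```python
import numpy as np
from scipy.stats import norm

def fit_gaussians(projected_by_class):
    """Estimate (mu_c, sigma_c) of Eq. 2 from each class's projected
    training features, given as {class: array of shape (N_c, d)}."""
    return {c: (z.mean(axis=0), z.std(axis=0) + 1e-8)
            for c, z in projected_by_class.items()}

def ood_score(h_star_by_class, gaussians):
    """Eq. 3: the maximum log-likelihood of the class-wise projections
    h*_c over all known classes; low values suggest an OOD sample."""
    return max(norm.logpdf(h_star_by_class[c], mu, sigma).sum()
               for c, (mu, sigma) in gaussians.items())
```

An ID sample matches at least one class-conditional Gaussian and receives a high score s, while an OOD sample fits none of them and receives a low score, so a single threshold on s separates the two cases.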

3.2 OOD Detection in Skin Lesion Classification

With the world constantly evolving, new medical pathologies are frequently discovered through diagnosis. However, the identification of novel or rare diseases can be troublesome for DL-based automated diagnosis, potentially leading to incorrect classification and inappropriate treatment [36]. In such cases, OOD detection methods can play an essential role in identifying whether a new sample belongs to any of the known classes of the problem, thus providing an auxiliary task for DL-based approaches.

Specifically for dermatological tasks, OOD detection strategies can be handy for identifying samples from classes unseen during the training phase. For instance, consider a deep learning problem aimed at automatically identifying the three most common skin lesions, as illustrated in Fig. 2. In an open-set scenario, the trained model may encounter unseen skin lesions, so it would be preferable to detect them as unknown instead of erroneously classifying them as the closest known class. Ideally, the OOD detection method should be capable of correctly classifying these samples as OOD, but this may depend on the chosen strategy [6]. Especially when ID and OOD samples are visually similar, it can be challenging to distinguish between known and unknown classes [30].

Fig. 2. Examples of In-Distribution and Out-of-Distribution samples of skin lesions

This article evaluates the feature space-based OOD detection method in different scenarios. In some experiments, we verify the capability of OpenPCS-Class to detect near-OOD samples, typically skin lesion images taken in the same settings as the ID samples. We also evaluate the OOD detection approaches on the same problem (skin lesions) but under different acquisition conditions, to assess how such strategies behave. Finally, we also evaluate these models on far-OOD samples, still related to medical applications.

4 Experiments

This section presents the experimental protocol for our case studies in the OOD detection task. In this work, we focused on the skin lesion classification problem, selecting different medical-related samples as OOD.

4.1 OOD Methods

We evaluated three robust methods commonly employed in this area to assess the OOD detection results. One of them is the Maximum Softmax Probability (MSP) method [11], a traditional approach that utilizes the softmax probability vector to identify unknown samples. By computing the maximum probability value over all classes, MSP assumes that a lower score suggests that the model is less confident about the predicted class, which could indicate an OOD sample.

Another method we selected is Energy-Based Out-of-Distribution detection (EBO) [16], a more sophisticated technique that uses the output space to calculate the OOD score. EBO computes an energy score from the logits (the negative log-sum-exp) and employs it as an OOD score to distinguish between OOD and ID samples.

We also opted for a feature space-based method for OOD detection to provide a more insightful discussion of the OpenPCS-Class approach. The Mahalanobis OOD detection method [12] measures the OOD score as the Mahalanobis distance to the closest class-conditional Gaussian distribution in the feature space. In this case, OOD samples are expected to lie farther from the class-conditional distributions than ID samples.
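
For reference, the three baseline scores can be sketched as follows, with higher values suggesting ID in every case (the exact normalizations in the cited works may differ, so this is only an illustrative implementation):

```python
import numpy as np

def msp_score(logits):
    """Maximum Softmax Probability [11]: peak of the softmax vector."""
    e = np.exp(logits - logits.max())
    return (e / e.sum()).max()

def energy_score(logits, T=1.0):
    """EBO [16]: negative energy, i.e., T * logsumexp(logits / T),
    computed in a numerically stable way."""
    z = logits / T
    return T * (np.log(np.exp(z - z.max()).sum()) + z.max())

def mahalanobis_score(h, means, inv_cov):
    """Mahalanobis detector [12]: negative squared distance to the
    closest class-conditional Gaussian (shared covariance)."""
    return -min(float((h - mu) @ inv_cov @ (h - mu)) for mu in means)
```

MSP and EBO read only the output space (logits), while the Mahalanobis detector reads the feature space, mirroring the distinction drawn in this section.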

4.2 Datasets

For the OOD detection in medical multi-class classification, we utilized the HAM10000 dataset as our In-Distribution dataset (\(D_{in}\)) for skin lesion classification [27]. This dataset comprises 10,015 images of seven distinct skin lesions: Melanocytic nevi, Melanoma, Benign keratosis-like lesions, Basal cell carcinoma, Actinic keratoses, Vascular lesions, and Dermatofibroma. For this study, we selected the first four classes as our ID classes, which contain 6705, 1113, 1099, and 514 samples, respectively. In addition, we designated the remaining three classes as OOD samples to form the basis of our first case study. For the other experiments, we maintain the same \(D_{in}\) and ID classes, changing only the OOD samples.

The dataset for the second case study consists of a wide range of skin lesion images taken on different parts of the body. However, a significant issue with this dataset is that some classes overlap with those found in \(D_{in}\). Therefore, to ensure a fair evaluation of the OOD detection task, we remove the overlapping classes from \(D_{out}\).

The third selected \(D_{out}\) is related to the monkeypox classification problem [1]. The dataset contains images of monkeypox lesions and of other skin lesions (e.g., chickenpox), given that the problem was originally framed as a binary classification task to identify whether a lesion is monkeypox. Therefore, we used all images from this dataset as OOD samples in the third experiment.

For the fourth case study, we have manually selected images of rare skin lesions that do not belong to any of the classes in \(D_{in}\). In this experiment, we have included additional ID images obtained from different circumstances than those found in the HAM10000 dataset. This collection of images will enable us to gain practical insights into the identification of unknown and uncommon classes in skin lesion classification and evaluate how the OOD detection methods perform when presented with different ID samples.

4.3 Metrics

To compare the methods, we selected three metrics to evaluate the OOD detection task in multi-classification problems [35].

AUROC (Area Under the Receiver Operating Characteristic curve) summarizes the Receiver Operating Characteristic (ROC) curve by calculating the area under it. As the ROC curve is usually used in binary classification problems, to evaluate the OOD detection task with this metric we consider only the ID and OOD classes, independently from the fine-grained classes. Mathematically, the AUROC can be approximated by evaluating the True Positive Rate (TPR) and False Positive Rate (FPR) at discrete threshold values, as presented in Eq. 4

$$\begin{aligned} \text {AUROC} = \sum _{i=1}^{n-1} \frac{1}{2} (x_{i+1} - x_i) (y_i + y_{i+1}) \end{aligned}$$
(4)

where n is the number of thresholds, \(x_i\) and \(y_i\) are the false positive and true positive rates, respectively, at the i-th threshold.

AUPR (Area Under the Precision-Recall Curve) is a metric that summarizes the Precision-Recall trade-off over different threshold values for a specific class. This metric is highly important for imbalanced problems, which may be the case in our experiments. Therefore, we calculate the AUPR for the OOD class.

FPR95 indicates the False Positive Rate (FPR) when the True Positive Rate (TPR) is 95%. Typically, FPR95 describes how likely the method is to erroneously classify an ID sample as unknown at a reasonably high TPR. Therefore, the lower the FPR95, the better the OOD detection method. Unlike the other metrics, this one is threshold-dependent, since we define a cutoff value to classify a sample as known or unknown.
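
These three metrics can be computed directly from the OOD scores of the test samples; a sketch using scikit-learn, treating OOD as the positive class and negating the score s so that higher values indicate OOD (the function name is illustrative):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score, roc_curve

def ood_metrics(scores_id, scores_ood):
    """AUROC, AUPR (OOD as the positive class), and FPR95 from the OOD
    scores s of the test samples, where higher s means more likely ID."""
    y = np.concatenate([np.zeros(len(scores_id)), np.ones(len(scores_ood))])
    s = -np.concatenate([scores_id, scores_ood])  # flip: higher -> OOD
    auroc = roc_auc_score(y, s)
    aupr = average_precision_score(y, s)
    fpr, tpr, _ = roc_curve(y, s)
    fpr95 = fpr[np.searchsorted(tpr, 0.95)]       # FPR at TPR >= 95%
    return auroc, aupr, fpr95
```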

4.4 Experimental Details

To evaluate the proposed approach, we used the same experimental procedure in all experiments. For \(D_{in}\), we split the dataset proportionally into training (60%), validation (20%), and test (20%) sets. We fit the OOD detection methods using the training set and, for all experiments, we use the test samples from \(D_{in}\) and the whole \(D_{out}\) to evaluate the separability between ID and OOD samples. During the testing phase, we randomly selected 500 samples from each set of \(D_{in}\) and \(D_{out}\) (when applicable) and computed the average metrics over ten runs. We also used the Wilcoxon signed-rank test to verify the statistical significance between the best metric result and all others. In Sect. 5, we denote an average result with a statistical difference using an underscore in the tables.
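
The splitting and resampling protocol can be sketched as follows (the labels array is hypothetical and stands in for the ID class annotations; a proportional split corresponds to stratifying by class):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
labels = rng.integers(0, 4, size=9431)        # hypothetical ID labels
idx = np.arange(len(labels))

# 60/20/20 stratified split: carve out 40%, then halve it into val/test.
train, rest = train_test_split(idx, test_size=0.4, stratify=labels,
                               random_state=0)
val, test = train_test_split(rest, test_size=0.5, stratify=labels[rest],
                             random_state=0)

# Ten evaluation runs, each drawing 500 test samples without replacement.
runs = [rng.choice(test, size=500, replace=False) for _ in range(10)]
```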

We also assess the impact of the OOD detection methods on different model architectures. As our problem is related to image classification, we selected three models: the Vision Transformer (ViT) [7], ConvNeXT [17], and ResNet [8]. For the first two architectures, we used weights pre-trained on ImageNet1k and fine-tuned the classification layer on the \(D_{in}\) problem. The ResNet model was trained from scratch, following a training procedure similar to that presented in the literature [29].

5 Discussion and Results

This section contains the results of the four case studies in medical applications. It is important to note that the selected OOD detection methods are similar in their experimental setup (i.e., they do not require any model retraining and need just one forward pass to identify OOD samples) but use different approaches to detect unseen classes.

For the first experiment, Table 1 summarizes the results for different OOD detection methods and architectures.

Table 1. OOD Detection Results for Experiment 1

The first experiment is a more challenging scenario for discriminating whether a sample belongs to a known or unknown class. This is reflected in the OOD detection metrics, which show a lower AUROC score for all of the methods compared to the other experiments. Even so, we noticed that OpenPCS-Class outperformed the other three methods in terms of AUROC and AUPR, independently of the model architecture. Moreover, the FPR95 shows that this approach can improve the OOD detection task in a realistic scenario, considering the threshold that yields a TPR of 95%. In that case, OpenPCS-Class lowers the FPR under this condition by up to 7.1% (compared to the MSP method with the ResNet model).

The model architecture plays an important role in OOD detection. In this experiment, the ViT model increased the capability to detect OOD samples, at least for the OpenPCS-Class method. Especially for feature-based approaches, the chosen model can directly impact the results, given that different architectures yield different feature activations.

The second experiment can be considered an easier OOD detection task compared to the first one. Although the classes from \(D_{out}\) are similar in the first two experiments, the images were obtained from different body parts, which can facilitate the OOD detection task. The results for the second experiment can be observed in Table 2.

Table 2. OOD Detection Results for Experiment 2

In this experiment, the approaches based on the feature space had a better OOD detection capability than those that use the output space. In fact, the feature space can contain both low-level and high-level feature information, which can help to detect unknown classes in different contexts. On the other hand, the output space does not contain this kind of information, which may explain the difference between these approaches. These strategies therefore directly impact the scores generated, as illustrated in Fig. 3.

Fig. 3. ID and OOD score distributions: (a) maximum softmax score from MSP; (b) maximum likelihood from OpenPCS-Class

The main objective of OOD detection is to yield scores that make it easy to distinguish between ID and OOD samples. To evaluate the distributions shown in Fig. 3, we conducted a Welch t-test [38], which rejected the hypothesis that the ID and OOD distributions have equal means (\(p < 0.05\)) only for OpenPCS-Class.

In this experiment, OpenPCS-Class outperformed the other three OOD detection methods (a 48.7% decrease in FPR95 compared to Mahalanobis with ConvNeXT). However, there is only a slight difference between the OpenPCS-Class and Mahalanobis methods, depending on the model architecture. For transformer-based models, both methods obtained AUROC and AUPR metrics close to one. This result corroborates recent findings that Transformer-based architectures can enhance the robustness of OOD detection [23].

The third experiment uses skin lesion pathologies that are more dissimilar to those present in \(D_{in}\). The results are presented in Table 3.

Table 3. OOD Detection Results for Experiment 3

Although the \(D_{out}\) in the third case study contains images of skin lesions, we noticed that all methods, independently of the model architecture, improved their OOD detection metrics compared to the previous experiments. As the OOD samples are related to diseases like monkeypox and chickenpox, the images are more dissimilar to those presented to the model in the training phase (using \(D_{in}\)). Consequently, the confidence scores in the output space are low and the feature representations of OOD samples are more dissimilar to those of ID samples, resulting in an easier OOD detection task.

In this experiment, the feature-based approaches also showed a considerably high capability to detect samples visually dissimilar to those present in \(D_{in}\). For Transformer-based models, Mahalanobis and OpenPCS-Class obtained comparable results, given that both could almost completely separate ID and OOD samples. However, for the ResNet architecture, the difference between these approaches is more significant, showing better performance for the OpenPCS-Class method (a 5.1% increase compared to the Mahalanobis detector).

For the last case study, Table 4 displays the results for OOD detection using the same experimental protocol as the previous experiments.

Table 4. OOD Detection Results for Experiment 4

In this experiment, we observed that feature-based approaches performed comparatively better in detecting a wide range of pathologies as OOD. Even in the presence of new ID images slightly different from those found in \(D_{in}\), OpenPCS-Class outperformed the other methods in all three evaluation metrics. Therefore, even with visually different ID samples, the feature space-based approaches obtained better results in the OOD detection task.

Although we only present the distributions for the second experiment, we applied the Welch t-test to all experiments in this section. For all experiments using the transformer-based architectures, we noted that the ID and OOD score distributions can be easily distinguished (i.e., they have different means) for OpenPCS-Class.

6 Conclusions

In this work, we evaluated OpenPCS-Class in a new application domain for OOD detection, more specifically skin lesion problems. The feature space-based approaches, in general, obtained superior OOD detection when the OOD samples were visually more dissimilar to the ID ones, corresponding to the latter three experiments of this work.

Compared to all the methods evaluated in the experiments, OpenPCS-Class performed best in all scenarios regarding AUROC, and in 9 (out of 12) in terms of average FPR95. More interestingly, the transformer-based models were more suitable for the OpenPCS-Class method, which always obtained superior OOD detection results with them.

Going forward, we aim to evaluate OpenPCS-Class in different medical classification problems to gain a better perspective on feature space-based models in OOD detection applications.