1 Introduction

Non-alcoholic fatty liver disease (NAFLD) is increasingly diagnosed worldwide. It is the most common cause of abnormal liver function tests and chronic liver disease in both developed and developing countries [1]. NAFLD refers to the accumulation of fat, mainly triglycerides, in hepatocytes to the point that it exceeds 5% of the liver’s weight. Primary NAFLD results from insulin resistance and often occurs as part of the metabolic changes accompanying obesity, type 2 diabetes and dyslipidemia. The histological damage in NAFLD is very similar to that seen in patients with alcoholic liver disease, but NAFLD is, by definition, not alcohol-induced [2]. It is therefore important to exclude secondary causes of steatosis. Furthermore, NAFLD affects approximately 1.5 billion people worldwide, making it the most common liver disease [3].

Given the seriousness of this context, the diagnosis of NAFLD must be rapid, accurate and efficient. Currently, liver biopsy is the most specific method for diagnosing NAFLD, and it also makes it possible to assess the severity of fatty infiltration in the liver. However, biopsy is an invasive and expensive procedure that poses a risk to the patient. Therefore, alternative detection methods are desirable [4]. Most patients are asymptomatic, and NAFLD is usually suspected after elevated transaminases are found in routine tests. The disease is also a frequent incidental finding on ultrasound (US) performed for other reasons, such as suspected gallstones [5]. Hepatic US is currently the most widely available, simple and inexpensive non-invasive method for detecting hepatic steatosis in clinical practice. However, the accuracy of US is highly operator-dependent, and its sensitivity is reduced when steatotic infiltration is below 30% or in morbidly obese patients. In addition, the quantification of hepatic steatosis is subjective and can be influenced by the heterogeneity observed in some patients with NAFLD [6].

On the other hand, thermographic studies have been widely included in medical practice to obtain additional data for the diagnosis of various diseases and determine the best methods and the effectiveness of treatment. The main advantages of thermographic research are the relatively low cost, absence of ionizing radiation or electromagnetic fields, absence of contraindications, safety, and the ability to diagnose the disease at an early stage [7]. Thus, thermography in medicine can be used to identify the dynamics of pathological processes. Specifically in patients with NAFLD, it is observed that the development of the disease causes a decrease in the surface temperature of the liver [8], making it a promising method for diagnosis. Even so, the capture quality of the infrared emission that thermal cameras use to generate surface thermograms is influenced by the environment, which attenuates the thermographic effect, making its detection difficult. Consequently, a controlled environment is required for image acquisition and pre-processing to reduce noise and refine image characteristics [9].

In this context, few studies have combined medical thermal images with neural networks for detection or classification. Pinto et al. (2021) and Farias et al. (2023) each classified thermal images of the liver into two groups: healthy and NAFLD. The first used the AlexNet convolutional neural network with satisfactory results, achieving an accuracy of 96%. The second used temporal analysis with a temporal convolutional network, achieving an accuracy of 88%. Both indicate the potential of medical thermal image analysis when combined with machine learning, allowing professionals to be supported by neural networks during the diagnosis of this disease.

In machine learning, convolutional neural networks (CNNs) are important architectures used to solve problems such as facial recognition, autonomous driving, and intelligent medical treatment [12]. In deep learning, CNNs are widely known for their ability to achieve high accuracy in classifying medical images. However, models that are not pre-trained, i.e., whose initial weights are set to random values, have some limitations.

Training a deep learning model on a large number of images requires substantial computational resources, and proper training needs a large amount of standardized training data. This is the biggest problem for medical images, since such data can be expensive and difficult to obtain. There are also ethical and privacy issues [13]. From this perspective, Siamese Neural Networks (SNNs) emerge as an architecture composed of two identical neural networks (they share the same weights), joined by perceptrons whose function is to calculate the similarity between the responses of the two networks [14]. They were proposed to mitigate the data scarcity problem, since this type of architecture requires only small samples for training.

Therefore, considering the importance of a non-invasive diagnosis of NAFLD together with the contribution that machine learning is capable of bringing to the classification of images within the scope of computer vision, the present work aims to classify thermal images of the liver region as affected by NAFLD or healthy, using a Siamese neural network architecture. This work makes the following contributions: a study for the processing of medical thermal images, a study to detail the structure of the Siamese network and the application of the Siamese neural network in the classification of medical thermal images. This study is structured as follows: Sect. 2 presents the proposed method and the image dataset used. Section 3 presents the result and discusses what was obtained. Finally, Sect. 4 presents the conclusions of this work.

2 Materials and Method

This section presents the image dataset used and the proposed method for classifying thermal images of the liver (Fig. 1). The method consists of three steps: The first is the extraction of the region of interest (ROI), followed by the pre-processing of the ROI and, finally, the classification of the images by the Siamese network. Below, each step is detailed.

Fig. 1. Proposed Method

2.1 Dataset

The dataset comprises images from 40 patients, 18 diagnosed with NAFLD and 22 healthy. In total, there are 103 images of patients with this disease and 132 images of healthy patients. The diagnosis was confirmed by the specialist via ultrasound of the abdominal region, which served to label each patient as affected or not by NAFLD. The images were acquired with a FLIR infrared thermograph, model S650C, which has a resolution of 640\(\,\times \,\)480 pixels. As mentioned, a controlled environment is required, so the specialist developed a thorough acquisition protocol: initially, the patient is kept for 15 min in an acclimatized room at 23\(^\circ \)C and a relative humidity of 65%.

At this location, the patients were positioned in dorsal decubitus (lying with their chests up) and the images were then taken with the camera above the patient, forming a \(90^\circ \) angle (Fig.  2). Five images were captured per patient. In addition, throughout the process of acclimatization and taking the images, the patients had their abdominal region exposed.

Fig. 2. Examples of thermal images obtained from the dataset.

2.2 Extraction of the Liver Region (ROI)

In this study, the region of interest (ROI) comprises the location of the liver in the acquired images. Ozougwu (2017) defines the liver region as the upper right quadrant of the abdomen. The ROI extraction process was carried out using the proprietary software Flir Tools, provided by the manufacturer of the thermograph used to acquire the images. Then, with the help of the specialist, ROIs of different dimensions were obtained from the dataset to capture the liver region better since each patient has a different physiology. The ROI was exported to a temperature table in degrees Celsius, where each cell shows a value for the temperature at each specific point.

2.3 Pre-processing

Once the ROI had been delineated, its temperature table was transformed into a thermal image represented in gray scale (Fig. 3(a)). The process described in Eq. 1 was used to map each temperature reading t to the desired target range, giving the value f(t).

$$\begin{aligned} f(t) = \frac{(t*1000 - oldMin)(newMax - newMin)}{(oldMax - oldMin)} + newMin. \end{aligned}$$
(1)

In Eq. 1, oldMin represents the lowest temperature among all the images in the dataset, multiplied by 1000, while oldMax is the highest temperature among all the images multiplied by 1000. newMin and newMax are constant values that represent the minimum and maximum values of the scale adopted (0 and 255 for gray). In this way, all the base images were converted into intermediate thermal images.
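The mapping in Eq. 1 can be illustrated with a short NumPy sketch. It is a minimal example under stated assumptions, not the implementation used in this work: the function name `rescale_temperatures`, the rounding step, the final cast to 8-bit integers, and the temperatures in the usage lines are all illustrative choices.

```python
import numpy as np

def rescale_temperatures(table, old_min, old_max, new_min=0.0, new_max=255.0):
    """Apply Eq. 1 to a temperature table given in degrees Celsius.

    old_min and old_max are the dataset-wide extremes already multiplied
    by 1000, as in the text. Rounding and the uint8 cast are assumptions
    made here so the result can be stored as a gray-scale image; they only
    make sense for the default 0-255 target range.
    """
    t = np.asarray(table, dtype=np.float64) * 1000.0
    scaled = (t - old_min) * (new_max - new_min) / (old_max - old_min) + new_min
    # Guard against readings outside [old_min, old_max] before casting.
    return np.clip(np.rint(scaled), new_min, new_max).astype(np.uint8)

# Hypothetical 2x2 ROI and hypothetical dataset extremes (25 and 37 degrees Celsius).
roi = np.array([[25.0, 28.0], [37.0, 31.2]])
gray = rescale_temperatures(roi, old_min=25.0 * 1000, old_max=37.0 * 1000)
```

Because oldMin and oldMax are taken over the whole dataset rather than per image, the same temperature maps to the same gray level in every image, which keeps the images comparable with one another.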

Fig. 3. ROI pre-processing steps

Filters were applied to the images to enhance them. The median filter and contrast-limited adaptive histogram equalization (CLAHE) [16] (Fig. 3(b)) were used, having demonstrated a significant improvement in network performance in preliminary studies [10, 11]. Finally, to solve the problem of the different ROI dimensions obtained for each patient, all the ROI images were centrally cropped to 200 by 200 pixels.
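The central crop can be sketched as follows. This is an illustrative NumPy example only: the function name `center_crop` is hypothetical, and raising an error for ROIs smaller than the target size is an assumption, since the text does not say how such cases were handled. The median and CLAHE filters are not reproduced here.

```python
import numpy as np

def center_crop(image, size=200):
    """Return the central size x size window of a 2-D (or H x W x C) image."""
    height, width = image.shape[:2]
    if height < size or width < size:
        # Assumption: the source does not describe ROIs smaller than the crop.
        raise ValueError("ROI is smaller than the requested crop size")
    top = (height - size) // 2
    left = (width - size) // 2
    return image[top:top + size, left:left + size]

# Hypothetical ROI of 240 x 320 pixels.
roi_image = np.zeros((240, 320), dtype=np.uint8)
cropped = center_crop(roi_image)  # shape (200, 200)
```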

2.4 Classification

Siamese Networks have a high potential to learn, even from a small set of data, and require relatively little computing power compared to what they can deliver. With this in mind, a Siamese Neural Network architecture was developed for this context. This architecture works with two input images and a single output: a score, produced by a similarity function, that indicates how similar the two images are.

The architecture is composed of a pair of convolutional networks that share the same weights and are responsible for extracting features from the input image pair. Each convolutional network comprises four convolutional layers with a kernel size of 2 by 2 pixels, each followed by a MaxPooling layer and a Dropout layer. The two outputs of the convolutional networks are processed by a Euclidean distance layer, whose result feeds a single neuron activated by a sigmoid function. This last neuron is also trained and generates the final similarity score for the two images (Fig. 4). Thus, the closer the value is to 1.0, the more similar the network considers the two input images, i.e. they should belong to the same class; conversely, for values close to 0.0, the network considers the input images dissimilar, i.e. they should belong to different classes.
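The final stage of the architecture, a Euclidean distance followed by a single sigmoid neuron, can be illustrated in isolation. The sketch below uses NumPy rather than Keras and takes the two feature vectors as given; the function names, the small `eps` constant, and the weight and bias values are assumptions chosen for illustration, whereas in the trained network the weight and bias of the last neuron are learned.

```python
import numpy as np

def euclidean_distance(emb_a, emb_b, eps=1e-7):
    """Row-wise Euclidean distance between two batches of feature vectors."""
    squared = np.sum((emb_a - emb_b) ** 2, axis=-1, keepdims=True)
    # eps keeps the square root away from exactly zero (an assumption,
    # commonly used to stabilise gradients during training).
    return np.sqrt(np.maximum(squared, eps))

def similarity(emb_a, emb_b, weight, bias):
    """Single sigmoid neuron applied to the distance; output lies in (0, 1)."""
    distance = euclidean_distance(emb_a, emb_b)
    return 1.0 / (1.0 + np.exp(-(weight * distance + bias)))

# Illustrative parameters: a negative weight makes the score fall as the
# distance grows, so near-identical features score close to 1.0.
features_a = np.array([[0.0, 0.0, 0.0]])
features_b = np.array([[3.0, 4.0, 0.0]])
score_same = similarity(features_a, features_a, weight=-2.0, bias=4.0)
score_diff = similarity(features_a, features_b, weight=-2.0, bias=4.0)
```

With these illustrative parameters, identical feature vectors yield a score near 1.0 and distant ones a score near 0.0, matching the interpretation given above.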

Fig. 4. Siamese Neural Network Architecture

2.5 Evaluation Metrics

As a result of the classification, there are four possibilities: true positives (TP), which are patients with NAFLD who have been correctly classified as sick, and true negatives (TN), which are healthy patients who have been correctly classified as healthy. There are also false positives (FP), which are healthy patients who were wrongly classified as having NAFLD, and false negatives (FN), which are patients with NAFLD who were wrongly classified as healthy. Based on these values, it was possible to calculate the metrics chosen to evaluate the network: accuracy, precision and recall. Each metric is calculated according to the Eqs. 2, 3, and 4 below.

$$\begin{aligned} precision = \frac{TP}{TP+FP}, \end{aligned}$$
(2)
$$\begin{aligned} recall = \frac{TP}{TP+FN}, \end{aligned}$$
(3)
$$\begin{aligned} accuracy = \frac{TP+TN}{TP+TN+FP+FN}. \end{aligned}$$
(4)

In medical applications, one of the most important metrics is recall, since the most significant risk to the patient is having the disease while the model classifies them as healthy. Recall, therefore, measures the model’s effectiveness in detecting the disease where it exists. Precision measures how many of the patients classified as having NAFLD actually have the disease, so a low precision indicates many healthy patients wrongly flagged as sick. Finally, accuracy reveals how effective the model is at getting its classifications right, whether positive or negative.
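Equations 2 to 4 translate directly into code. The sketch below is illustrative: the function name is hypothetical, returning 0.0 when a denominator is zero is an assumption, and the counts in the usage lines are made-up values, not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute precision, recall and accuracy from confusion-matrix counts."""
    predicted_positive = tp + fp
    actual_positive = tp + fn
    total = tp + tn + fp + fn
    # Assumption: undefined ratios (zero denominator) are reported as 0.0.
    precision = tp / predicted_positive if predicted_positive else 0.0
    recall = tp / actual_positive if actual_positive else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy}

# Made-up counts for illustration only.
metrics = classification_metrics(tp=8, tn=6, fp=4, fn=2)
# precision = 8/12, recall = 8/10, accuracy = 14/20
```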

3 Results and Discussion

This section presents the experimental setup and the analysis of the results achieved at each stage of the proposed method.

3.1 Experimental Setup

The dataset was divided into three different sets: train, test, and validation. The images were separated per patient so that each set was as balanced as possible since the number of images per patient was different. Therefore, the final training set had 126 (66 healthy and 60 NAFLD) images, the test set had 89 (56 healthy and 33 NAFLD) images, and the validation set had 20 (10 healthy and 10 NAFLD) images. There was a careful separation so that the images of the same patient belong to a single set.

The training was designed as follows: as the Siamese network receives pairs of images as input, two image pairs were formed for each image in the training set, the first with a positive example (same class) and the other with a negative example (different classes), so that the image completing the pair was chosen at random. As a result, the network could have one positive and one negative example for each image, and a total of 252 image pairs could be provided.
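The pairing scheme described above can be sketched in plain Python. This is one possible reading of the procedure, not the code used in the study: the function name `make_pairs`, the fixed random seed, and the exclusion of an image as its own partner are assumptions. The target value 1 marks a same-class pair and 0 a different-class pair.

```python
import random

def make_pairs(labels, seed=0):
    """Build one positive and one negative pair per image.

    labels: list of class labels, one per training image.
    Returns a list of (anchor_index, partner_index, target) tuples.
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    pairs = []
    for index, label in enumerate(labels):
        # Assumption: an image is never paired with itself.
        same = [j for j in by_class[label] if j != index]
        other = [j for lab, members in by_class.items() if lab != label
                 for j in members]
        if not same or not other:
            raise ValueError("each class needs at least two images, "
                             "and at least two classes are required")
        pairs.append((index, rng.choice(same), 1))   # positive example
        pairs.append((index, rng.choice(other), 0))  # negative example
    return pairs

# Class sizes of the training set reported above: 66 healthy, 60 NAFLD.
training_labels = ["healthy"] * 66 + ["NAFLD"] * 60
training_pairs = make_pairs(training_labels)  # 2 pairs per image
```

Since each of the 126 training images contributes exactly two pairs, the list has 252 entries, matching the total stated above.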

The network’s hyperparameters were: a batch size of 10, the Adam optimizer with a learning rate of 0.001, a dropout rate of 0.3, and binary cross-entropy as the loss function, since the problem is one of binary classification (similar or not similar). Data augmentation was not used, as the aim was to work with the original dataset, even with few examples, and observe the behaviour of the Siamese network, which was proposed precisely for this scenario. Training was carried out over 200 epochs on the Google Colab platform using the Keras library.

3.2 Image Classification Results

The test stage was designed specifically for the context of this work. The Siamese network receives two images as input, but in a real scenario, the specialist treats each image individually, using the Neural Network to classify the patient’s image as healthy or NAFLD. Given this, it was necessary to devise a test strategy to meet this context.

The strategy adopted was to find the two best training images, one of a healthy patient and the other of a patient with NAFLD. This choice was made by analyzing the output of the network that had already been trained. For all the training pairs, among the pairs of the same class, the healthy pair and the NAFLD pair with the best classification values were extracted, and at the end of the process, the first image from each pair was chosen. In this way, each image in the test set could be tested at the same time with the most representative images of the healthy and NAFLD classes.

The reference class that yields the highest network response (similarity) becomes the final classification of the tested image. The results of the evaluation metrics obtained are shown in Table 1, along with the results of the related works [10, 11].
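The decision rule can be written as a small function. The sketch is a hedged illustration: `similarity_fn` stands in for the trained Siamese network, the function name is hypothetical, and resolving ties in favour of NAFLD is an arbitrary assumption, since the text does not specify tie handling.

```python
def classify_with_references(similarity_fn, image, healthy_ref, nafld_ref):
    """Label an image by comparing it with one reference image per class.

    similarity_fn(a, b) must return a score in [0, 1], higher meaning
    more similar (a placeholder for the trained network).
    """
    score_healthy = similarity_fn(image, healthy_ref)
    score_nafld = similarity_fn(image, nafld_ref)
    # Assumption: ties go to NAFLD; the source does not define this case.
    label = "NAFLD" if score_nafld >= score_healthy else "healthy"
    return label, score_healthy, score_nafld

# Toy stand-in for the network, operating on plain numbers for illustration.
def toy_similarity(a, b):
    return 1.0 / (1.0 + abs(a - b))

label, s_healthy, s_nafld = classify_with_references(
    toy_similarity, image=0.1, healthy_ref=1.0, nafld_ref=0.0)
```

One consequence of this design is that every classification depends on just two reference images, so the choice of references directly affects the reported metrics.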

Table 1. Results obtained by the proposed method and comparison with related works

Compared to the few studies already carried out in the area [10, 11], the Siamese network, despite being an early-stage study, shows promising results. Recall, as the most relevant metric, reached 96%, demonstrating that the model is effective at detecting sick people among all the sick people. This represents a considerable improvement over the work of Pinto et al. (2021) and is very close to the result achieved by Farias et al. (2024).

The Siamese Network has the advantage of not requiring Data Augmentation and of having a much simpler architecture than those used in the related works. This means less computing power for the training and testing phases, and less time spent pre-processing the images. Accuracy (71%) was modest, which is to be expected for a work still in progress. The lowest metric recorded was precision (57%), meaning that the model still classifies many healthy people as sick, increasing the number of revisions to be carried out by the specialist assisted by the network.

Analyzing the architectures of the networks compared, it is clear that the convolutional part of the proposed network, used for feature extraction, is much simpler than the others, which may explain the low accuracy and low precision. The network learned the characteristics of the images of sick patients better than those of healthy patients, revealing a point of attention regarding the images of the latter group in the dataset used. Figure 5 shows an example of each classification outcome, where (a) is a true positive, (b) is a true negative, (c) is a false positive and (d) is a false negative.

Fig. 5. Images classified by the proposed method

The results obtained show that the Siamese network has notable potential. It represents a promising step in the classification of thermal images for non-alcoholic hepatic steatosis precisely because of the small number of images available for training the network, a situation in which Siamese networks are at their most valuable.

4 Conclusion

This study presented an approach using Siamese Neural Networks (SNN) to classify non-alcoholic fatty liver disease (NAFLD). The use of thermal images has shown promise, highlighting the importance of adequate pre-processing to improve image quality and reduce noise. The proposed architecture achieved an accuracy of 71%, precision of 57% and recall of 96%, indicating that SNNs are a viable alternative for NAFLD classification, especially in scenarios similar to the one presented, where there is a limited amount of training data. For future research, it is suggested to explore additional image enhancement techniques and to modify the feature extraction stage of the architecture presented, replacing the convolutional layers with a network specialized in extracting features, such as one of the versions of EfficientNet.