Abstract
According to the World Health Organization (WHO), 2.3 million women were diagnosed with breast cancer in 2020, and the disease caused almost 700,000 deaths worldwide. Early-stage (in situ) tumors usually respond well to treatment when detected early, because the earliest form of the tumor does not yet have the potential to kill. Later stages, however, have a low survival rate: the most threatening characteristic of the tumor is that dividing cells spread throughout the body, damaging the lungs, liver, bones, and brain. One way to address this problem is to speed up the diagnosis and detection of breast cancer, a task in which machine learning algorithms have proven effective. In this context, this work investigates how double transfer learning improves two Deep Learning architectures, VGG-16 and VGG-19, on histopathological images, using BreakHis for the first transfer learning step and PatchCamelyon for the second. Results indicate that with double transfer learning, precision, specificity, recall, and F-score improve to 99%, 83%, 89%, and 90%, respectively.
1 Introduction
Cancer is the uncontrollable growth of cells that can spread to other parts of the body [3]. Breast cancer, which originates in the breast, affects both males and females, but the majority of patients are women over 40 years old. Obesity, drug use, family history, and late menopause are some of the risk factors that favor its occurrence. In Brazil, the National Cancer Institute estimates a total of 704,000 new cases of cancer in the country, the most frequent of which is breast cancer, representing 30.1% of all occurrences [13].
There are other issues associated with the illness. The primary problem is that there are more than 100 subtypes of breast cancer [12] that pathologists need to differentiate during exam analysis, which makes it a challenging task even for very experienced professionals. Moreover, people are susceptible to physical, physiological, mental, and external factors that can affect their work efficiency, such as illness, fatigue, excessive workload, and family issues, among many others.
Thus, to help pathologists in this endeavor and to reduce the high mortality of the disease, the main focus of this investigation is on improving diagnosis methods and techniques for early detection, i.e., before metastasis, to increase patients’ life expectancy. Pathologists have traditionally relied on their expertise to analyze thousands of histopathology images. However, according to Litjens et al. [8], in one performance test with 11 pathologists and no time constraints, only 72.4% of diagnoses were correct (27.6% of cases were not identified during the test).
In this context, this work aims to investigate VGG-16 and VGG-19 using single and double transfer learning (DTL) to help pathologists detect breast cancer metastasis efficiently. The idea is to train the VGGs using the BreakHis dataset [18] and then use Patch Camelyon for detecting metastasis. This work is organized as follows: Sect. 2 presents related works; Sect. 3 provides a literature review and our proposal; Sect. 4 shows the computational experiment, its setup, and results; finally, Sect. 5 presents the conclusions of this investigation and future work.
2 Related Works
Well-known convolutional neural network (CNN) architectures, such as VGG, ResNet, DenseNet, etc., have played a notable role in image recognition with the advent of Deep Learning [1]. For instance, the Visual Geometry Group (VGG) architecture [17] won the ImageNet Challenge in 2014 and was presented at the International Conference on Learning Representations. This architecture brought a different concept of deep networks, with a compact design (3x3 convolutional filters and 16–19 weight layers) for pattern recognition. Nowadays, VGG is commonly used in the medical context for anomaly detection, characterizing neoplasias as benign or malignant based on image recognition [15], as in the works of Filho & Cortes [5], Santos-Bustos et al. [16], and Liu et al. [9].
Ismail & Sovuthy [6] compared VGG-16 (94% accuracy) and ResNet50 (91.7% accuracy) in the classification of two classes of neoplasias (normal and abnormal tumor). In other investigations, VGG-19 has shown the best results, such as in [21] and [15]. VigneshKumar & Sumathi [21] presented a hybrid architecture combining VGG-19 and SVM, which reaches 98% sensitivity and 97.8% accuracy. Furthermore, in Saaber et al. [15], the AUC, accuracy, specificity, precision, and F1-score evaluated for classifying breast cancer using histopathological images all exceeded 97.35% (the lowest value, for precision), with the best being 99.5% for AUC.
Regarding breast cancer detection specifically using DTL, Matos et al. [10] proposed a classification approach for histopathological images from the BreakHis dataset, combining Double Transfer Learning (DTL) with Inception-v3 pre-trained on the ImageNet dataset and a Support Vector Machine (SVM) to classify breast cancer patches at four magnification levels: 40x, 100x, 200x, and 400x. Results showed that the proposed DTL approach achieved an accuracy of 91%.
Vo-Le et al. [22] investigated DTL in VGG-16, GoogLeNet, and ResNet-50 using Patch Camelyon as the base transfer learning. Afterward, they trained different machine learning classifiers on the VBCan dataset, a set of images of hematoxylin and eosin (H&E) stained lymph node sections collected from two specialized hospitals in Vietnam, to improve the deep learning models. The results showed that VGG obtained the best recall (97.76%), ResNet the best accuracy (96.98%), and GoogLeNet the best precision (98.58%).
In our study, we investigate the use of the double transfer learning technique in VGG-16 and VGG-19 to detect breast cancer metastasis in the Patch Camelyon dataset. To achieve this, we first trained the models using the BreakHis dataset. Next, we used the Patch Camelyon dataset to identify the presence of metastasis. Finally, we compare our results with those of [10] and [4].
3 Material and Method
3.1 Convolutional Neural Networks
The mathematical representation of the human neuron was presented for the first time by [11] in the Bulletin of Mathematical Biophysics. McCulloch and Pitts proposed that all the behaviors of neurons in the human brain can be interpreted logically, considering impulses (neuron activities), inhibitory or excitatory synapses, and a logical theory called “Nets without circles”.
The concepts developed by McCulloch and Pitts have served as the foundation for all neural network and deep learning architectures created to date. Since 2010, deep learning networks have been widely recognized as among the most effective modern techniques for solving complex pattern recognition problems [1], especially the so-called Convolutional Neural Networks (CNNs), which have shown high accuracy in recognizing abstract patterns and features that a human could not easily obtain [19].
A CNN has an input layer that receives the data used for training and validation; in the present work, the inputs [2] are histopathological features from whole-slide medical images. The data then goes through convolutional layers, the building blocks of a deep network, in which filters and parameters are combined during the training stage. Usually, a convolutional layer is followed by a pooling layer, an essential component that reduces the spatial dimensions (width × height) of its input. Finally, a CNN can use a dense layer, a fully connected neural network that combines the outputs from previous layers to perform a prediction (classification or regression). Complete illustrations of the CNN architectures used in this work can be found in Fig. 1 and Fig. 2.
The difference between VGG16 and VGG19 is the number of layers. While VGG16 consists of sixteen weight layers (thirteen convolutional layers and three fully connected layers), VGG19 has nineteen weight layers (sixteen convolutional layers and three fully connected layers). Furthermore, VGG16 has five blocks of convolutional layers, with the number of convolutional layers in each block being [2, 2, 3, 3, 3]. In comparison, VGG19 also has five blocks of convolutional layers, but the number of convolutional layers in each block is [2, 2, 4, 4, 4].
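The block structure above can be summarized programmatically. The following sketch (plain Python, not tied to any framework) derives the weight-layer totals from the per-block convolution counts described in the text:

```python
# Convolutional layers per block for each VGG variant, as described above.
VGG_BLOCKS = {
    "VGG16": [2, 2, 3, 3, 3],  # 13 convolutional layers in total
    "VGG19": [2, 2, 4, 4, 4],  # 16 convolutional layers in total
}
FULLY_CONNECTED = 3  # both variants end with three fully connected layers

def weight_layers(name):
    """Total weight layers = convolutional layers + fully connected layers."""
    return sum(VGG_BLOCKS[name]) + FULLY_CONNECTED

print(weight_layers("VGG16"))  # → 16
print(weight_layers("VGG19"))  # → 19
```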
3.2 Proposal: Double Transfer Learning and VGG
In our proposal, we developed a structure with the combination of the Double Transfer Learning (DTL) technique, VGG16, and VGG19 (trained separately) using three datasets: ImageNet (pre-trained), BreakHis, and Patch Camelyon (target domain). These architectures were chosen because of their superior performance in classification tasks in different domains. To clarify how DTL, VGG networks, and connections work, Fig. 3 shows the architecture proposal.
The VGG16 and VGG19 models are initially pre-trained using data from ImageNet before extracting histopathological features from the input data. Next, the convolutional neural networks (CNNs) are trained using the Patch Camelyon dataset for Simple Transfer Learning (STL), the first classification step. Then, the VGG models are further trained using the BreakHis dataset before being tested using the Patch Camelyon dataset to determine if a second transfer learning process affects the detection of metastasis.
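The paper does not reproduce its training code inline; the following Keras sketch illustrates one way the two-step transfer described above could be wired. The dataset objects (`breakhis_ds`, `pcam_train_ds`, `pcam_test_ds`) are hypothetical names for illustration, and the binary sigmoid head is an assumption consistent with the two-class task:

```python
import tensorflow as tf
from tensorflow.keras.applications import VGG16  # VGG19 is analogous

def build_model(input_shape, weights="imagenet"):
    # Step 0: convolutional base pre-trained on ImageNet.
    base = VGG16(weights=weights, include_top=False, input_shape=input_shape)
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # positive/negative patch
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Step 1 (intermediate domain): fine-tune on BreakHis.
#   model = build_model((224, 224, 3))
#   model.fit(breakhis_ds, epochs=5)
# Step 2 (target domain): fine-tune again and evaluate on Patch Camelyon.
#   model.fit(pcam_train_ds, epochs=5)
#   model.evaluate(pcam_test_ds)
```

The fit/evaluate calls are left commented because they depend on how the datasets are loaded; the sketch only fixes the order of the two transfer steps.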
4 Computational Experiments
4.1 Setup
The Google coding environment was used: Colaboratory and Drive, which store all the Python notebooks and datasets (BreakHis and Patch Camelyon). The code was executed with the following configuration: GPU Tesla T4, 12 GB RAM, and 78.2 GB HDD, with the TensorFlow and Keras frameworks.
The model parameters set for training were: kernel size 3 × 3, pooling size 2 × 2, input shape 640 × 640 × 3 for the first classification step with BreakHis, and input shape 96 × 96 × 3 for the classification with the target domain (Patch Camelyon). Each VGG was trained for five epochs, with batch size 32, seed 1337, image size 224 (tensor length), and four filters.
Early Stopping was applied when there was no further progress in the learning metrics, and Reduce Learning Rate on plateau was used to control the learning rate at each epoch, reducing it by a factor of 0.5 down to a minimum learning rate of 0.00001. The sampling method was hold-out with an 80/20 split. The code of the present work is available at https://github.com/danyllosilva/MBCD-BK-PC-MSc.
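Assuming the standard Keras callbacks (the paper names the techniques but not the exact API calls, and the `patience` values here are assumptions), the schedule described above could be configured as:

```python
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

# Stop training when the monitored metric stops improving
# (patience value is an assumption, not stated in the paper).
early_stop = EarlyStopping(monitor="val_loss", patience=2,
                           restore_best_weights=True)

# Halve the learning rate on a plateau, down to the 1e-5 floor
# reported in the paper.
reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                              min_lr=0.00001)

# These would then be passed to model.fit(..., callbacks=[early_stop, reduce_lr]).
```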
4.2 Database
Two datasets were used for training and testing the VGG models: Patch Camelyon, a reference for metastatic tissue detection [20], and the BreakHis dataset, a reference for breast cancer detection [7].
The Breast Cancer Histopathological Database (also known as the BreakHis dataset), version 1.0, is composed of 7,909 microscopic images in TIF format, 3-channel RGB, at four magnification factors: 1,995 images at 40x, 2,081 at 100x, 2,013 at 200x, and 1,820 at 400x (Fig. 4).
The Patch Camelyon dataset contains 327,800 histopathological images from Radboud University Medical Center, divided into patches labeled by professional pathologists as positive or negative for metastatic breast cancer in the patch region. The class ratio is balanced 50/50, and the images are 96 × 96, 3-channel RGB, in TIF format. Each label helps the algorithm learn the region’s characteristics during the training phase in order to give a conclusive diagnosis of the presence or absence of metastatic breast cancer. Figure 5 shows an example of how each patch appears in the dataset.
4.3 Pre-processing
In pre-processing, data augmentation transformations were applied to the images from the BreakHis dataset: horizontal flip with a probability of 0.5, random brightness/contrast with the same probability, a random resized crop with a 0.8 probability of occurrence, and blur with a limit of 1px. During the data integrity check of Patch Camelyon, corrupted images were automatically detected and removed to guarantee that only valid data was used for training and testing.
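As a minimal illustration of the probabilistic transforms above (the paper does not name the augmentation library used, so this is a NumPy sketch rather than the authors' pipeline), a horizontal flip with probability 0.5 can be written as:

```python
import numpy as np

def random_horizontal_flip(image, p=0.5, rng=None):
    """Flip an HxWxC image left-right with probability p."""
    rng = rng or np.random.default_rng()
    if rng.random() < p:
        return image[:, ::-1, :]  # reverse the width axis
    return image

img = np.arange(12).reshape(2, 2, 3)
flipped = random_horizontal_flip(img, p=1.0)  # p=1.0 forces the flip
```

The other transforms (brightness/contrast jitter, resized crop, blur) follow the same pattern: draw a random number, and apply the transform only when it falls below the configured probability.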
4.4 Evaluation Metrics
The following metrics allow us to evaluate the performance of each model. They are based on correct classifications (True Positives - TP and True Negatives - TN) and incorrect ones (False Positives - FP and False Negatives - FN). Additionally, we can build the confusion matrix using these values, which can help us visualize how the model performs in terms of correct and incorrect classification. Then, the following metrics complete the evaluation, showing the performance numbers.
In this context, Eq. 1 computes the accuracy, the overall model performance; however, accuracy alone is not enough to evaluate performance, due to the dataset distribution. Suppose the dataset presents 80% positives and 20% negatives: a trivial model that classifies every input as positive would show an accuracy of 80%. This apparently satisfactory result is not a consequence of a good model but an effect of the class distribution in the dataset.
Equation 2 calculates the model precision, which represents how false positives impact the model performance. In other words, the higher the false positive rate, the lower the precision.
The recall, presented in Eq. 3, demonstrates how effectively the model avoids false negatives, i.e., how many of the actual positives it identifies. This metric is crucial in health applications, particularly in cancer detection, as a false negative could cost the patient’s life.
The specificity, shown in Eq. 4, is the proportion of true negatives (TN) detected correctly. False positives lower this metric, and a low specificity can lead to unnecessary treatment; thus, specificity is analyzed alongside recall.
Finally, the F1-score provides an overall understanding of how well the model performs on its task, particularly when dealing with false positives and false negatives: the more incorrect the classifications, the lower the F1-score.
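The equations referenced above did not survive extraction here; the metrics follow the standard confusion-matrix definitions, which can be written directly as a plain-Python sketch:

```python
def accuracy(tp, tn, fp, fn):    # Eq. 1
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):           # Eq. 2
    return tp / (tp + fp)

def recall(tp, fn):              # Eq. 3 (sensitivity)
    return tp / (tp + fn)

def specificity(tn, fp):         # Eq. 4
    return tn / (tn + fp)

def f1_score(tp, fp, fn):        # harmonic mean of precision and recall
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

# Imbalance example from the text: a trivial "always positive" classifier
# on a dataset with 80% positives still scores 80% accuracy.
print(accuracy(tp=80, tn=0, fp=20, fn=0))  # → 0.8
```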
4.5 Results
The results are divided into VGG with STL, using only Patch Camelyon, and with DTL, using BreakHis plus Patch Camelyon. The following metrics are used: precision, specificity, recall, and F1-score, calculated from the confusion matrices of VGG16 and VGG19 as previously presented.
Figure 6 depicts the confusion matrices for VGG16 and VGG19 using STL: 4879 correct classifications and 1130 incorrect for VGG16, against 4685 correct and 1315 incorrect for VGG19. These numbers are reflected in the performance metrics presented in Table 1: VGG16 with STL obtained 83% precision and specificity, 84% recall, and an F1-score of 82%, while VGG19 reached 79% precision and specificity, 77% recall, and an F1-score of 78%. In other words, VGG16 obtained the best results with STL.
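From the raw counts above, the overall STL accuracy can be recovered directly, a quick check using only the figures reported in this section:

```python
# Confusion-matrix totals reported for STL (Fig. 6).
vgg16_correct, vgg16_wrong = 4879, 1130
vgg19_correct, vgg19_wrong = 4685, 1315

acc16 = vgg16_correct / (vgg16_correct + vgg16_wrong)
acc19 = vgg19_correct / (vgg19_correct + vgg19_wrong)

print(f"VGG16 STL accuracy: {acc16:.3f}")  # → 0.812
print(f"VGG19 STL accuracy: {acc19:.3f}")  # → 0.781
```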
When DTL is applied, the results are considerably better than with STL, as shown in Fig. 7: 5113 classifications are correct and 616 incorrect, an improvement of 234 correct classifications for VGG16. VGG19 with DTL gains 373 correct classifications, a larger improvement than VGG16’s, although not enough to overcome it.
Regarding the metrics in Table 1, VGG16 reached a precision of 99%, a specificity of 86%, a recall of 84%, and an F1-score of 90%, a considerable improvement over STL except for recall, which showed no improvement. For VGG19, the gains from DTL are smaller: precision increases to 85% and specificity remains unchanged. On the other hand, DTL considerably improves VGG19’s recall, to 89%, and F1-score, to 87%. Although these improvements are notable, they are not enough to overcome VGG16, except on the recall metric.
In Table 2, we compare our proposal with [10] and [4]. The first used a different approach, taking the four magnification levels of the BreakHis dataset as the target in order to surpass the state-of-the-art accuracy. Comparing our best result, VGG16 with DTL, against [10], we achieved better accuracy than all of their reported results, notably the “Proposed (Inception-v3 + Filter)” approach, whose best result was 91% at 100x magnification. Our work goes further in that it deals with two datasets, BreakHis as the first domain and Patch Camelyon as the target (Fig. 3), and two settings (STL and DTL), regardless of the magnification level.
Concerning [4], the authors obtained a maximum accuracy of 85%, mainly using well-established data augmentation techniques (e.g., a randomly sized cutout mask) on the Patch Camelyon dataset. Our best result reached 92% accuracy.
5 Conclusions and Future Work
This work proposed a novel Deep Double Transfer Learning approach with two established CNN architectures, VGG16 and VGG19. It shows that DTL is a promising technique with a positive effect on both CNNs, but with better results for VGG16, reaching a precision of 99%, a specificity of 86%, a recall of 84%, and an F1-score of 90%.
According to the evaluated metrics, compared against the two published works [10] and [4], this work surpassed the accuracy of both, showing that using VGG in the double transfer learning context is a promising approach.
Future work includes applying DTL to other established CNN models, adding more parameters, and exploring different data augmentation approaches to improve results over the epochs, which are still few (5) and serve only as a reference, since state-of-the-art works use few epochs to obtain good initial results and assess whether a model is promising.
References
Aggarwal, C.C.: Neural Networks and Deep Learning. In: Neural Networks and Deep Learning, pp. 9–53. Springer (2018)
Beysolow II, T.: Introduction to Deep Learning Using R: a step-by-step guide to learning and implementing Deep Learning Models Using R. Apress (2017)
Cancer, N.I.O.: The definition of cancer. In: What is Cancer, p. 1. National Institute of Cancer (USA) (2024)
Ericsson, A., Kana, F.: Convolutional Neural Networks for Classification of Metastatic Tissue in Lymph Nodes (2021)
Filho, M.L.R., Cortes, O.A.C.: Efficient breast cancer classification using histopathological images and a simple VGG. Revista de Informática Teórica e Aplicada 29(1), 102–114 (2022). https://doi.org/10.22456/2175-2745.119207
Ismail, N.S., Sovuthy, C.: Breast cancer detection based on deep learning technique. In: 2019 International UNIMAS STEM 12th Engineering Conference (EnCon), pp. 89–92 (2019)
Jiang, Y., Chen, L., Zhang, H., Xiao, X.: Structure of BreakHis dataset. PLoS ONE 23(23), 1–6 (2023). https://doi.org/10.1371/journal.pone.0214587.t004
Litjens, G., et al.: 1399 H&E-stained sentinel lymph node sections of breast cancer patients: the CAMELYON dataset. In: The CAMELYON Dataset, pp. 2–3. GigaScience, Oxford (2022)
Liu, Z., Peng, J., Guo, X., Chen, S., Liu, L.: Breast cancer classification method based on improved VGG16 using mammography images. J. Radiat. Res. Appl. Sci. 17(2), 100885 (2024). https://doi.org/10.1016/j.jrras.2024.100885
Matos, J.D., Britto, A.D.S., Oliveira, L.E.S., Koerich, A.L.: Double transfer learning for breast cancer histopathologic image classification. In: 2019 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2019). https://doi.org/10.1109/IJCNN.2019.8852092
McCulloch, W.S., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. In: Bulletin of Mathematical Biophysics, pp. 115–133. Springer (1943)
Prevention, B.N.I.O.C.: Estimate. In: Cancer Incidence in Brazil, pp. 39–40. National Institute of Cancer Prevention (Brazil) (2024)
Primo, W.Q.: National Cancer Institute and the 2023–2025 estimate – cancer incidence in Brazil. In: Estimate 2023–2025, pp. 1–2. National Institute of Cancer (USA) (2023)
Rodrigues Filho, M.L., Cortes, O.A.: Classificação de Imagens Histopatológicas Usando uma VGG e Mecanismo de Atenção para Detecção de Câncer de Mama. SBC 23(23), 1–6 (2023). https://doi.org/10.5753/sbcas_estendido.2023.229388
Saaber, A., Sakr, M., Abo-Seida, O.M., Keshk, A., Chen, H.: A novel deep-learning model for automatic detection and classification of breast cancer using the transfer-learning technique. IEEE 57, 71194–71209 (2021)
Santos-Bustos, D.F., Nguyen, B.M., Espitia, H.E.: Towards automated eye cancer classification via VGG and ResNet networks using transfer learning. Eng. Sci. Technol. Int. J. 35, 101214 (2022). https://doi.org/10.1016/j.jestch.2022.101214
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations (2014)
Spanhol, F., Oliveira, L.S., Petitjean, C., Heutte, L.: A dataset for breast cancer histopathological image classification. IEEE Trans. Biomed. Eng. 63(7), 1455–1462 (2016). https://doi.org/10.1109/TBME.2015.2496264
Vecchiotti, P., Vesperini, F., Principi, E., Squartini, S., Piazza, F.: Convolutional neural networks with 3-D kernels for voice activity detection in a multiroom environment. In: Esposito, A., Faudez-Zanuy, M., Morabito, F.C., Pasero, E. (eds.) Multidisciplinary Approaches to Neural Computing. SIST, vol. 69, pp. 161–170. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-56904-8_16
Veeling, B.S., Linmans, J., Winkens, J., Cohen, T., Welling, M.: Rotation equivariant CNNs for digital pathology. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11071, pp. 210–218. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00934-2_24
VigneshKumar, K., Sumathi, N.: A hybrid deep learning approach for breast cancer detection using VGG-19 and support vector machine. Int. J. Mech. Eng. 83–93 (2021)
Vo-Le, C., Son, N.H., Van Muoi, P., Phuong, N.H.: Breast cancer detection from histopathological biopsy images using transfer learning. In: IEEE Eighth International Conference on Communications and Electronics (ICCE), pp. 408–412 (2021). https://doi.org/10.1109/ICCE48956.2021.9352069
Acknowledgment
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001.
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Silva e Silva, D.C., Cortes, O.A.C., Diniz, J.O.B. (2025). The Impact of Double Transfer Learning in VGG Architectures for Metastasis Breast Cancer Detection. In: Paes, A., Verri, F.A.N. (eds) Intelligent Systems. BRACIS 2024. Lecture Notes in Computer Science(), vol 15414. Springer, Cham. https://doi.org/10.1007/978-3-031-79035-5_32