1 Introduction

Dengue is an acute febrile disease caused by a virus transmitted mainly by Aedes aegypti, a mosquito that also transmits chikungunya fever and the Zika virus [20]. It is endemic in the Americas, Africa, Asia, and Oceania. The World Health Organization estimates that between 100 and 400 million cases occur each year, with 500,000 to 1 million developing the severe form, which can be fatal [33]. In Brazil, dengue is a public health problem. According to [24], approximately 1.5 million probable cases were registered in Brazil in 2022, an increase of 206% compared to 2021. This shows the importance of disease prevention for the well-being of the population. One of the most effective ways to prevent dengue is to eliminate breeding sites of the Aedes aegypti mosquito, the primary vector of the disease. Larval foci are commonly found in artificial deposits and containers with stagnant water (e.g., flower pots or tires), which allow the development of the mosquito larvae. Therefore, an important actor in controlling this disease is the population, by preventing the formation of breeding sites in their homes. Unfortunately, identifying these breeding sites remains a challenge, as the general population lacks sufficient knowledge to distinguish Ae. aegypti larvae from those of other species, such as Aedes albopictus or Culex sp., which are also found in urban regions of Santa Catarina [12].

Mosquitoes are holometabolous insects that undergo complete metamorphosis from egg through larva and pupa to the adult stage. The most favorable stage for collecting mosquito samples may be the larval stage, since larvae are confined to places with water. Morphological characteristics of the larva, such as color, bristles, the length of the breathing tube, and its positioning in the water, are taken into consideration when characterizing the insect, for example when differentiating Culex and Aedes mosquitoes. Usually, mosquito larvae are classified by biologists based on microscopic analysis [14].

An alternative solution to enable citizens to identify the presence of Ae. aegypti larvae in their homes could be a Deep Learning (DL) model for the automatic classification of mosquito larvae, deployed in a mobile application that classifies a photograph of a larva taken by the user with the cellphone camera. This would make the diagnosis quick and accessible, not requiring specialist knowledge of mosquito morphology.

There is already research on the development of DL models aiming at the classification of mosquito larvae images. Yet, most of the existing models do not distinguish between Aedes species, thus omitting important information when used for dengue prevention. The only model developed to distinguish between Aedes species [5] reported an accuracy of only 64.58%, which is inadequate considering the risk of misclassification. Another limitation of existing research is the use of either microscopic images or pictures taken with a special zoom lens [17, 25], neither of which is available to a broader audience.

Therefore, aiming at the development of a DL model to be deployed in a mobile application to be used by citizens in the urban region of Florianópolis/SC, we analyze the following research question: Is it possible to develop a DL model that classifies larvae of Ae. aegypti, Ae. albopictus, and Culex sp. mosquitoes based on photos taken with a cellphone camera with an accuracy of at least 90%, considering the risk related to erroneous classification in the context of an intelligent mobile application to be used by a wide target audience?

2 Background: Morphology of Mosquito Larvae

There are fundamental differences in the morphology of mosquito larvae between different genera and species as shown in Fig. 1. Mosquito larvae have three body regions: head, thorax, and abdomen. The head of mosquito larvae is large and sclerotized (composed of a hardened exoskeleton). The shape of the head of Aedes and Culex larvae is broad. The head has two eyes, two antennae, and a brush-like or comb-like mouth apparatus. The eyes are usually small, simple (not compound) and are found on both sides of the head [8]. The head of the Aedes is generally shorter and wider than that of the Culex. Aedes antennae have small scales on the surface, while Culex antennae are smooth. Aedes jaws are wider and shorter than Culex jaws. In general, the thorax of larvae is wider than the head and has three rows of bristles. The thorax of the Aedes has white spots. Culex's thorax has no distinctive spots. In addition, the bristles on the surface of the thorax of Aedes are longer and more sparse than those of Culex. Commonly, the abdomen is elongated, composed of ten segments, and its eighth segment has the respiratory siphon that is used to distinguish between species [7, 8]. The abdomen of the Aedes is shorter and wider than that of the Culex, while the scales on the abdomen of the Aedes are rounder and darker than those of the Culex.

Fig. 1.
figure 1

Fundamental differences in the morphology of mosquito larvae: Aedes and Culex sp. [9, 14].

The siphon in the abdomen allows the larvae to breathe oxygen from the air while remaining submerged in water. The siphon is one of the main characteristics used to identify mosquito species. Its shape, size, and length can vary greatly between species, as well as the presence or absence of a specialized structure called a “pecten”, a row of spines located at the base of the siphon. There are some notable differences in the siphon between Aedes and Culex: the Culex siphon is longer and narrower than that of Aedes [7, 8, 14, 32].

The genus Aedes has two main disease vector species, Ae. aegypti and Ae. albopictus. Some morphological characteristics distinguish the larvae of these species [7]. The head of Ae. aegypti is more rounded, while the head of Ae. albopictus is more elongated. There are also subtle differences in the bristles of the antennae. The bristles on the thorax of Ae. aegypti are longer, while in Ae. albopictus they are shorter. On the abdomen, the bristles of Ae. albopictus are simpler, with few branches (Fig. 2) [7, 8, 14, 32].

Fig. 2.
figure 2

Main differences in the head, thorax and abdomen of Aedes species (Ae. aegypti and Ae. albopictus) and Culex [8, 14].

All these characteristics are commonly distinguishable only by biologists and specialists, typically with the aid of microscopes.

3 Related Work

To summarize research adopting Deep Learning for the automatic classification of images of mosquito larvae during the last ten years, a systematic mapping following the procedure proposed by [26] was conducted. As a result, a total of 12 research articles were found, mostly published during the last five years (Table 1).

Table 1. Overview on research adopting DL for the classification of mosquito larvae images.

Most of the research focuses on the classification of Aedes vs. non-Aedes mosquito larvae, omitting detailed information when using the model for dengue prevention, while few aim at distinguishing between Aedes species. Few also consider the Culex genus [5, 30], another vector of diseases (such as encephalitis, lymphatic filariasis, and West Nile fever). Focusing on the development of such a model to be used by citizens for dengue prevention, especially in the urban region of Florianópolis/SC, it is imperative to consider all relevant species, including a differentiation between Ae. aegypti and Ae. albopictus, in order to help users take adequate actions to prevent proliferation. Yet, so far no research in Brazil with such a focus has been encountered.

Only [5] presents research that addresses the primary mosquito species of concern, including Ae. aegypti, Ae. albopictus, Anopheles, and Culex sp. However, the results achieved were not satisfactory, with a reported accuracy of only 64%. This low level of accuracy poses a significant risk for users and could put human lives in danger in the context of the usage scenario.

Another limitation of most existing research is that the DL models have been trained with microscopic images, different from photos that would be taken by a citizen with a cellphone camera. Only [5, 17] and [25] use images taken with a cellphone, yet using a special zoom lens, which again is not available to a broader audience.

Existing research mostly used Convolutional Neural Networks (CNNs), including ResNets as well as older models such as AlexNet. No research with more recent CNNs, such as EfficientNets, has been encountered, although high performance compared with other models has been reported in other application domains [31].

The performance of the trained models varies widely, with an accuracy ranging from 59.67% [30] to about 97% [13, 29], while most performance results are rather low, especially considering the risk related to a misclassification in this context. The highest performances reported also relate to classifications distinguishing only Aedes vs. non-Aedes, but not among Aedes species. Although DenseNet achieved a high accuracy (97%), as reported by [13], it was not designed specifically for mobile applications and generally has a higher number of parameters and computational complexity. The only research reporting performance with respect to the classification between Aedes species [5] indicates a very low accuracy of only 64.58%.

These results show that there is currently a lack of DL models for the automatic classification of mosquito larvae that are able to distinguish between Aedes species based on pictures taken with a cellphone camera with a minimally acceptable accuracy.

4 Research Methodology

This research is characterized as applied research [16] by identifying and characterizing a need and aiming to contribute a practical solution adopting DL. Based on the results of a previous systematic mapping as presented in Sect. 3, we follow a systematic process for the human-centric interactive development of Deep Learning models [1, 22, 28]. As presented in Table 2 this includes the analysis of requirements, data preparation and an experiment on training and evaluating different DL models, as well as testing the trained models with new unseen images. The evaluation results are compared between the different DL models as well as in comparison with models encountered in literature.

Table 2. Overview on phases of the research methodology.

5 Development of the Image Classification Model

5.1 Requirements Analysis

Adopting the requirements notation of [23], the goal is to develop a DL model that learns from experience E with respect to some class of tasks T and performance measure P, if its performance on tasks in T, measured by P, improves with experience E. Here, the task (T) is to classify mosquito larvae (single label) from a photo taken with a cellphone camera. The experience (E) is a corpus of labeled images of mosquito larvae of the genus Aedes, including the species Ae. aegypti and Ae. albopictus, the genus Culex, and a non-mosquito object class. In terms of performance (P), considering the risk of misclassification in this context, an accuracy greater than 90% is expected in order to ensure that the model can effectively identify mosquito larvae and contribute to dengue prevention efforts. This level of accuracy would provide a sufficient degree of confidence in the model’s ability to identify mosquito larvae correctly and minimize the risk of false positives or false negatives.

5.2 Data Preparation

Due to the unavailability of public datasets of cellphone photos of larvae of the relevant mosquito species, we collected a set of 1,999 images, including 748 images of Ae. aegypti larvae, 464 images of Ae. albopictus larvae, 447 images of Culex sp. larvae, and 340 images of non-mosquito objects (Footnote 1).
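The class distribution of the collected dataset can be summarized with a short script; the counts below are taken from the text, while the computed percentage shares are merely a convenience for inspection:

```python
# Class counts as reported for the collected dataset.
counts = {
    "Ae. aegypti": 748,
    "Ae. albopictus": 464,
    "Culex sp.": 447,
    "Non-Mosquito": 340,
}

total = sum(counts.values())  # 1,999 images in total
# Percentage share of each class, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}
print(total, shares)
```

The shares show a moderate class imbalance, with Ae. aegypti accounting for roughly 37% of the images and non-mosquito objects for about 17%.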

The images were collected and labeled by researchers from the initiative Computação na Escola/INCoD/INE/UFSC in cooperation with the Laboratory of Hematozoan Transmitters/UFSC and the Coordination of the Laboratory Reference Network/Central Public Health Laboratory of SC. The images were taken with cell phone cameras (Samsung Galaxy S10, Xiaomi Redmi Note 9 and Xiaomi Mi 11 Pro) without any additional lenses (Table 3).

Table 3. Camera specifications.

The images were collected with varying backgrounds, angles, and resolutions and saved in .jpg format (Fig. 3).

Fig. 3.
figure 3

Examples of images from the data set.

In order to assure data quality, the following data quality characteristics were considered in accordance with IEEE Std 2801 [18], as presented in Table 4.

Table 4. Data quality characteristics.

The data set was divided into a training (79.2%), validation (19.8%), and test (2%) set.
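A rough sketch of how the split sizes can be derived from the reported fractions follows; the rounding of fractions to whole images is an assumption, chosen so that the 2% test fraction yields the 40 test images used in the prediction test (Sect. 6):

```python
TOTAL = 1999  # images in the full dataset

# Reported fractions: ~19.8% validation and 2% test; the remaining
# images go to training. Rounding to whole images is an assumption.
n_test = round(TOTAL * 0.02)    # 2% of 1,999 rounds to 40 images
n_val = round(TOTAL * 0.198)    # validation set
n_train = TOTAL - n_val - n_test  # remainder used for training
print(n_train, n_val, n_test)
```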

5.3 Model Training, Evaluation and Comparison

In accordance with the literature, five different DL architectures for image classification that are also indicated for deployment in mobile applications were trained: MobileNetv2, ResNet18, ResNet34, EfficientNet_B0, and EfficientNet_Lite0. MobileNetv2 is an architecture designed specifically for mobile device applications, with a focus on efficiency and low resource consumption [15]. ResNet18 and ResNet34 were chosen because of their efficiency, being the least complex members of the ResNet architecture family, considering deployment in a mobile app [6]. EfficientNet_B0 is the most basic version of the EfficientNet family and is designed to be efficient in terms of both computational resources and performance [31], while EfficientNet_Lite0, its lightest version, is designed specifically for mobile device applications [21].

The models were developed in Python using Jupyter Notebook/Google Colab. For the development we used the Fast.ai library [10], an open-source deep learning library that provides a high-level API to simplify the process of training and validating DL models, as well as support for transfer learning. In order to run the EfficientNet and MobileNet architectures, we also used the TIMM library, which provides a collection of pre-trained state-of-the-art (SOTA) computer vision models [10].

The images were resized to 224 × 224 pixels and subjected to random data augmentation. An overview of the training parameters and results is shown in Table 5.

Table 5. Overview on training parameters and results.

In an exploratory analysis of the results of the models’ validation, EfficientNet_Lite0 stands out, showing the best overall performance in terms of accuracy, precision, recall, and F1-score. This result is particularly notable considering that this architecture was designed to be efficient in terms of computational resources [21]. The ResNet18, ResNet34, and EfficientNet_B0 models performed almost identically, with all metrics above 96%, although there are subtle differences: ResNet34 had slightly higher accuracy, while EfficientNet_B0 had slightly higher precision, recall, and F1-score. MobileNetv2 demonstrated the worst performance among the models, with all metrics significantly below the others. Despite being designed to be fast and lightweight, the model compromises performance in exchange for efficiency. The results of the evaluation of each model are presented in Table 6.

Table 6. Results of evaluation metrics.

6 Prediction Test

Following ISO/IEC 4213 [19] we performed a test with the trained models predicting the classification of previously unseen images.

6.1 Test Preparation

Test Dataset. The test set is composed of a total of 40 images with the data quality characteristics as described in Table 7.

Table 7. Test data set characteristics.

Metrics. To evaluate the performance of the models, the following metrics were employed: accuracy, to evaluate overall performance; precision and recall, to support an understanding of how the model handles false positives and false negatives; F1 score, the harmonic mean of precision and recall; and specificity, which measures how well the model identifies negative examples and minimizes false positives.
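In a binary (one-vs-rest) view, all of these metrics follow from the four confusion counts. A minimal sketch; the counts used in the example are illustrative only, not taken from the experiment:

```python
def metrics(tp, fp, fn, tn):
    """Standard classification metrics from binary confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # sensitivity / true-positive rate
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)   # true-negative rate
    return accuracy, precision, recall, f1, specificity

# Illustrative counts only (not from the experiment).
acc, prec, rec, f1, spec = metrics(tp=9, fp=1, fn=1, tn=29)
print(acc, prec, rec, f1, spec)
```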

Execution Environment. The test was run using Jupyter Notebooks in Google Colab.

6.2 Test Results

The test results are presented in Table 8.

Table 8. Test results.

The results show that the ResNet18 model obtained the highest accuracy, precision, recall, and F1 score during the test, while the MobileNetv2 model obtained the lowest overall performance. EfficientNet_B0 and EfficientNet_Lite0 perform similarly to ResNet34 in terms of accuracy, precision, and recall. Due to the higher accuracy, ResNet18 performs better in correctly classifying both positive and negative examples, minimizing classification errors, including false positives and false negatives. Due to higher precision, ResNet18 and EfficientNet_Lite0 perform better in correctly identifying positive examples and minimizing false positives.

The specificity of 97% for the ResNet and EfficientNet model families indicates that they are very effective at identifying true negatives while avoiding false positives. This high specificity suggests that these models can correctly discern the negative classes, reducing the misclassification rate and improving the reliability of the predictions.

The lowest performance results were observed for MobileNetv2, which indicates that the model has a lower performance in correctly classifying examples in general. Despite these results, its specificity of 87%, although the lowest among the models, still indicates that MobileNetv2 can handle the identification of true negatives and avoid false positives with some efficiency.

Comparing ResNet18, ResNet34, and EfficientNet_B0, only small differences between precision and recall (1%) were observed, suggesting that the models have a balanced performance between avoiding false positives (high precision) and identifying true positives (high recall). This is also confirmed by the F1 scores of > 90% of ResNet18, ResNet34, EfficientNet_B0, and EfficientNet_Lite0, which indicate a robust performance of the models in correctly identifying examples (Table 9).

Table 9. Confusion matrices.

The confusion matrices indicate that all models accurately classified the images of Ae. aegypti. In our usage scenario, any such misclassification would have severe consequences, as it could mistakenly lead the user to believe that the larva found in their home is not a primary vector for dengue. ResNet18, ResNet34, EfficientNet_B0, and EfficientNet_Lite0 demonstrate a similar misclassification of Ae. albopictus as Ae. aegypti, which is still a classification error, yet with less risk to the user. ResNet34 and EfficientNet_B0 also misclassified in one case an image of Culex sp. as Ae. albopictus. Again, MobileNetv2 demonstrated the worst performance, misclassifying Ae. albopictus, Culex sp., and even non-mosquito objects. However, even this model did not misclassify any of the images of Ae. aegypti.
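This kind of per-class analysis can be reproduced from any multi-class confusion matrix by a one-vs-rest reduction. A sketch with an illustrative matrix (the numbers are hypothetical, not the actual test results):

```python
# Rows = true class, columns = predicted class; the class order follows
# the paper. The matrix itself is illustrative, not the actual results.
classes = ["Ae. aegypti", "Ae. albopictus", "Culex sp.", "Non-Mosquito"]
cm = [
    [10, 0, 0, 0],   # all Ae. aegypti images correctly classified
    [1, 9, 0, 0],    # one Ae. albopictus mistaken for Ae. aegypti
    [0, 1, 9, 0],    # one Culex sp. mistaken for Ae. albopictus
    [0, 0, 0, 10],   # non-mosquito objects correctly classified
]

def per_class_recall(cm, i):
    """One-vs-rest recall for class i: correct / all true examples of i."""
    return cm[i][i] / sum(cm[i])

recalls = {c: per_class_recall(cm, i) for i, c in enumerate(classes)}
print(recalls)
```

With such a matrix, the critical safety property of the usage scenario (no Ae. aegypti image misclassified) corresponds to a recall of 1.0 for the Ae. aegypti row.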

7 Discussion

In this research we studied the viability of classifying images of mosquito larvae (Ae. aegypti, Ae. albopictus, Culex sp., and non-mosquito) taken with a cellphone camera, comparing the performance of different DL models (MobileNetv2, ResNet18, ResNet34, EfficientNet_B0, and EfficientNet_Lite0). With the exception of MobileNetv2, the models reached convergence quickly and showed satisfactory validation results, suggesting that within a few training epochs they were able to learn the main features relevant for classification. The EfficientNet_Lite0 model achieved convergence the fastest, while EfficientNet_B0 obtained the best fit to the data. ResNet18, ResNet34, EfficientNet_B0, and EfficientNet_Lite0 achieved similar accuracy during validation, ranging from 96.17% to 97.58%. Similar results were also achieved during testing on previously unseen data, during which ResNet34, EfficientNet_B0, and EfficientNet_Lite0 demonstrated an accuracy of 90% and ResNet18 of 92.5%. These results provide a first indication that these models were able to generalize the classification of this kind of image, achieving performance results above the required minimum of 90% in the context of our usage scenario. Analyzing the confusion matrices, it can also be observed that none of the models misclassified the images of Ae. aegypti, which would be the worst case in our usage scenario, as it could harm users by leading them to take no action, not recognizing the larva found as a potential dengue vector.

During training and validation, the EfficientNet models converged and fitted the dataset better due to the optimization techniques for automatic model size adjustment of EfficientNet_Lite0 and the resolution scaling technique of EfficientNet_B0. Yet, during testing, the ResNet18 model performed best with regard to all metrics due to a combination of factors. Although ResNet34 has a deeper architecture, which may allow it to learn more complex representations of the data, the increased complexity may also increase the risk of overfitting and make generalization to new data more difficult. In addition, ResNet34 may be more susceptible to performance degradation issues, such as decreased performance when adding additional layers, which may have affected its classification performance.

MobileNetv2 underperformed the other models due to its simpler architecture, with fewer layers and fewer parameters. This limits its ability to learn complex data representations, affecting classification performance. Although these characteristics make it light and fast, they also result in lower performance.

Threats to Validity. As with any empirical study, there are several threats to validity. Concerning selection bias, we aimed at preparing a diverse and representative sample of images that covers the range of variability in the population being studied. We considered various cameras, backgrounds, angles, and lighting conditions for external validity, but our dataset may not cover all contexts, potentially limiting generalizability. Considering the size of our dataset, containing 1,999 images, we consider the risk of a sample size bias minimal, especially when compared to existing approaches reported in the literature. In order to prevent data preprocessing bias, we only used standardized preprocessing techniques that are applicable to all models being compared. We also maintained consistent training parameters for comparability between models. With regard to evaluation metric bias, we used multiple standard evaluation metrics that cover different aspects of model performance and reported the results of all metrics used.

8 Conclusion

The results of our research demonstrate that it is possible to classify images of mosquito larvae, even distinguishing between Aedes species, with a sufficient testing accuracy of > 90% and without misclassifying images of Ae. aegypti. Our results are also much better than those reported by the only other study aiming at the differentiation between Aedes species, which reported an accuracy of only 64.58% [5].

Both models that demonstrated the best results during training/validation or testing (EfficientNet_Lite0 and ResNet18, respectively) are small architectures specifically indicated for deployment in mobile applications. This will allow the implementation of such an application to operationalize the intended usage scenario of citizens using the automated classification for dengue prevention in their homes.