1 Introduction

Since Fisher’s original publication in the context of linear discriminant analysis [4], the iris flower dataset has become one of the most well-known and widely explored datasets in statistical classification and machine learning (ML), with over 18,000 citations at the time of writing.

The dataset consists of 150 observations, equally divided into three classes (Iris setosa, Iris virginica and Iris versicolor), which are described by four features: the length and the width of the sepals and petals, in centimeters. The measurements were taken by Edgar Anderson [1], who was interested in measuring the morphological variation of these species, while Fisher was the first to use the dataset in a statistical learning context.

Iris is generally considered an easy classification problem and is frequently used as ML’s “hello world” (i.e., as the first example one comes into contact with as a beginner in this area). Most classification techniques have no trouble achieving accuracies well above 90% for iris with various hyperparameter configurations, as shown by multiple benchmark results available at OpenML [17].
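As an illustration (not from the original paper), cross-validating a default decision tree on scikit-learn’s bundled copy of the original iris dataset typically lands well above 90% accuracy:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Fisher's original 150-sample, 4-feature iris dataset.
X, y = load_iris(return_X_y=True)

# 5-fold cross-validated accuracy of an untuned decision tree.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
```

Even this untuned baseline illustrates why the original dataset is considered easy.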

In this paper, we introduce a new iris flower toy dataset, which breaks away from the original dataset’s simplicity by turning the problem of iris classification into a computer vision (CV) task. Since the rise of deep learning (DL), the CV literature has seen many of its well-known problems, including MNIST [9], ImageNet [3], and the newer Fashion-MNIST [18], become mostly solved, i.e., models have achieved near-human or even better-than-human accuracy on these tasks.

Hence, this work introduces a dataset that features CV challenges such as fine-grained categorization, background noise, real-world environment conditions (e.g., lighting variation), as well as different scales and non-centered objects. The main idea is to make this dataset available so that it can be used to validate machine/deep learning approaches under difficult CV scenarios. The paper also reports classification results of traditional machine learning algorithms, as well as some state-of-the-art deep learning architectures, on the Iris-CV dataset.

The remainder of this paper is organized as follows: Sect. 2 reviews related work using the iNaturalist dataset, Sect. 3 describes our new dataset, Sect. 4 presents benchmark results, including experiments with traditional ML algorithms and state-of-the-art deep neural networks, and finally, Sect. 5 contains our final remarks.

2 Related Works

iNaturalist is a dataset proposed by [16] containing 675,170 images of more than 5,000 different species of animals and plants; the plant category alone has almost 200,000 images. The images were captured all over the world, with different cameras and image qualities, and the dataset has a large class imbalance [16]. It is also constantly being updated.

Although iNaturalist originated as a competition dataset used in Kaggle challenges, it has also been used in published research. Some works focused on detection [16], while others, like this one, targeted classification.

Plant classification was the focus of a study using the iNaturalist dataset [12], where the authors used a convolutional neural network to classify plant subclasses. Data augmentation was used to reduce overfitting and balance the classes, and a transfer learning approach based on ResNet50 was then applied. Unlike our work, that study classifies many different plant species rather than irises alone.

Another study used the entire iNaturalist dataset for both classification and detection [16]. For classification, the following deep network architectures were evaluated: ResNets, Inception V3, Inception ResNetV2, and MobileNet. Among those models, Inception ResNetV2 SE had the best performance. That work also classifies plant species in general rather than focusing on iris flowers, and, unlike ours, its experiments did not include classic algorithms.

While this work uses the iNaturalist dataset as its single source, other researchers have combined it with different datasets. An example is the work proposed by [7]: after selecting images of different plant species from three datasets, the authors applied deep learning techniques for plant classification. Their goal was to achieve at least 50% accuracy as a classification baseline, and ResNet50 was able to classify almost half of the iNaturalist observations correctly. The iNaturalist dataset performed better than the Portuguese Flora dataset, but the Google Image Search observations outperformed both.

As the purpose of this paper is to classify different Iris species, the iNaturalist dataset provided the necessary images. This paper differs from the works described in this section in that it focuses on a benchmark analysis, covering a variety of algorithms and the computational cost of each. In addition, we focus specifically on iris flowers, whereas the other works considered plants in general.

3 A New Iris Dataset

Our new iris dataset, called Iris-CV, consists of 5,139 examples extracted from iNaturalist (September 10th, 2020). Each example is an RGB image associated with a label corresponding to one of five species: Bearded Iris (Be) (Iris x germanica, 928 images), Douglas Iris (Do) (Iris douglasiana, 944 images), Dwarf Crested Iris (Dw) (Iris cristata, 1290 images), Western Blue Iris (We) (Iris missouriensis, 1036 images), and Yellow Iris (Ye) (Iris pseudacorus, 941 images), as shown in Table 1. All images were gathered from iNaturalist [10], a website that provides Creative Commons-licensed pictures of fauna and flora taken by users worldwide.

Table 1. Dataset size per class

After downloading the images, we manually removed photos that had too much noise, i.e., pictures of many different flowers, human hands covering the majority of the frame, and blurry images, which could confuse learning models, resulting in a harder problem.

The original images have different sizes, therefore we resized them to a resolution of 256x256, which maintained the images’ main features while avoiding high memory requirements. We also kept the color information, which is coded in three RGB channels, as it can be important to differentiate the classes. Table 2 shows resized examples of each class. The classes show different color patterns, particularly the Yellow Iris, which is appropriately named, and the Dwarf Crested Iris, with its white and orange crests.

Table 2. Class names and examples from the Iris-CV dataset.

The pictures also show heavy background information, thus models must learn how to separate the flowers from the background to perform well. Images may contain multiple and/or non-centered flowers. Additionally, due to the different image and flower sizes, some individuals may become small compared to others after we resize the pictures, as seen in some examples in Table 2. There can also be very different lighting conditions across pictures.

Finally, successful models will have to learn that each species can show different colors, for instance, Dwarf Crested Irises can be lavender, lilac, pale blue, purple, white, or pink. As a result of all these features, Iris-CV can be a challenging computer vision problem.

4 Benchmark Results

We begin our experimental analysis by validating our dataset with eight classic algorithms, implemented using the scikit-learn and XGBoost libraries, as listed below:

  • Decision tree (DT);

  • Extra tree (ET);

  • Gradient boosting (GB);

  • Extreme gradient boosting (XGB);

  • Multilayer perceptron (MLP);

  • Perceptron;

  • Random forest (RF);

  • K-nearest neighbors (KNN).

Since these algorithms are not originally equipped to receive pixel matrices as input, we flattened the 256x256x3 images, obtaining vectors with 196,608 dimensions which can then be used as input. After obtaining the input vectors, we rescaled the images by dividing each pixel by 255 before training and testing the algorithms.
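The flattening and rescaling step can be sketched with NumPy; the random batch here is a stand-in for the real images:

```python
import numpy as np

# Hypothetical batch of 32 RGB images at the paper's 256x256 resolution,
# with 8-bit pixel values in [0, 255] (a stand-in for the real dataset).
images = np.random.randint(0, 256, size=(32, 256, 256, 3), dtype=np.uint8)

# Flatten each image into a 256*256*3 = 196,608-dimensional vector,
# then rescale pixel intensities to [0, 1] by dividing by 255.
X = images.reshape(len(images), -1).astype(np.float32) / 255.0
```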

Experiments were carried out using 5-fold stratified cross-validation, which maintains class proportions across folds. The code was implemented using Python’s scikit-learn library [11].
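A minimal sketch of this evaluation protocol, using scikit-learn’s StratifiedKFold on synthetic stand-in data (the classifier and features are placeholders, not the paper’s exact setup):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 20))       # placeholder feature vectors
y = np.repeat([0, 1, 2, 3, 4], 30)   # five classes, as in Iris-CV

# StratifiedKFold preserves each class's proportion in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)
```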

The algorithms and their hyperparameter values were chosen based on their performance on the Fashion-MNIST benchmark [18], and we used several machines to streamline the execution of tasks. Although each machine has a different configuration (i.e., processor, graphics card, and memory), it is important to note that this only affects training time, not the algorithms’ overall performance. The best results achieved per classifier can be seen in Table 3.

We tested different values (Table 3) for the following hyperparameters: criterion; objective; splitter; max_depth; loss; n_estimators; activation; hidden_layer_sizes; penalty; n_neighbors; weights; and p. See scikit-learn’s documentation [11] for more details about these parameters. In addition, Table 4 describes the hardware resources for each classifier.

Table 3. Results with standard deviation (Std) of classic algorithms for the Iris-CV dataset using 5-fold cross validation. The time column refers only to training time. Hyperparameter names are shown as they appear in scikit-learn’s documentation.
Table 4. Hardware settings for training and evaluating each classifier

Even though most results surpassed the expected accuracy of a baseline that always predicts the majority class (0.251), the overall performance shows how difficult this problem is for classic methods without any extra preprocessing specifically designed to improve their results. The best results were obtained by XGBoost with 500 estimators, multi:softmax as the objective, and a max depth of 3, reaching a mean accuracy of 0.614. Results were also poor compared to Fashion-MNIST [18] and MNIST [9], where the same algorithms can reach accuracy values over 0.85 and 0.95, respectively.
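For reference, the best configuration reported above can be written as an XGBoost parameter set; only the three stated hyperparameters come from the text, and the rest (e.g., `num_class`) are assumptions for a runnable sketch:

```python
# Hyperparameters of the best classic result reported above.
# Only n_estimators, objective, and max_depth come from the text;
# num_class is an assumption (five iris species in Iris-CV).
best_params = {
    "n_estimators": 500,
    "objective": "multi:softmax",
    "max_depth": 3,
    "num_class": 5,
}

# With the xgboost package installed, the model would be built as:
# from xgboost import XGBClassifier
# clf = XGBClassifier(**best_params)
```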

As mentioned in Sect. 3, our dataset’s difficulty can be explained by some factors, such as somewhat high-dimensional images, RGB channels, non-centered images, noisy backgrounds, and petals with different colors within each species. Thus, this problem cannot be tackled by simple approaches.

4.1 Deep Neural Net Results

We now turn to state-of-the-art Convolutional Neural Network (CNN) architectures, implemented using TensorFlow 2 [5], to see how these techniques perform in the classification of Iris-CV images. We chose the hyperparameters used in the DenseNet [8] paper, since this is a state-of-the-art network. We trained each network for 40 epochs with the stochastic gradient descent (SGD) optimizer, using a Nesterov momentum of 0.9. Regarding the learning rate, we chose a starting value of 0.1, decayed by a factor of \(10^{-\frac{epoch}{20}}\), a schedule we found empirically to improve network performance. In addition, we evaluated other state-of-the-art architectures, such as EfficientNet, MobileNet, and ResNet50, to check for performance differences across architectures.
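The learning-rate schedule can be expressed as a small function; the negative exponent is our reading of “decaying” (a positive exponent would grow the rate instead):

```python
def learning_rate(epoch: int, base_lr: float = 0.1) -> float:
    """Start at base_lr and decay by a factor of 10^(-epoch/20)."""
    return base_lr * 10.0 ** (-epoch / 20.0)

# In Keras this could be attached to training via, e.g.,
# tf.keras.callbacks.LearningRateScheduler(learning_rate).
```

Under this schedule the rate falls from 0.1 at epoch 0 to 0.01 at epoch 20 and 0.001 at the final (40th) epoch.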

We also performed data augmentation to balance the class proportions. We tested several combinations of the parameters of the ImageDataGenerator provided by TensorFlow/Keras, inspected the resulting images, and chose the combination that kept most of the original information. The augmentation parameters are listed below:

  • Rescale: 1/255

  • Rotation Range: 20\(^{\circ }\)

  • Width Shift Range: 0.1

  • Height Shift Range: 0.1

  • Horizontal Flip: True

  • Shear Range: 0.1

  • Zoom Range: 0.4 - 0.5

  • Fill Mode: nearest
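The settings above map directly onto keyword arguments of Keras’s ImageDataGenerator; a sketch follows (TensorFlow itself is not imported here, so the final call is left commented):

```python
# Augmentation settings from the list above, as keyword arguments for
# Keras's ImageDataGenerator.
aug_params = {
    "rescale": 1.0 / 255,
    "rotation_range": 20,        # degrees
    "width_shift_range": 0.1,
    "height_shift_range": 0.1,
    "horizontal_flip": True,
    "shear_range": 0.1,
    "zoom_range": [0.4, 0.5],
    "fill_mode": "nearest",
}

# With TensorFlow installed:
# from tensorflow.keras.preprocessing.image import ImageDataGenerator
# datagen = ImageDataGenerator(**aug_params)
```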

More information about these settings and the process used by ImageDataGenerator is available in the official TensorFlow documentation. The trained architectures are listed below:

DenseNet121. Proposed by Huang et al. [8], this deep network aims to optimize the flow of information between layers by densely connecting them, reducing convergence time through shorter paths.

EfficientNetB0. This architecture, introduced by Tan and Le [15], is obtained through a compound scaling method applied to a ConvNet’s depth, width, and resolution.

InceptionV3. This architecture is a refinement of its predecessors, first through the introduction of batch normalization, and later through additional factorization ideas in the third iteration [14].

MobileNetV2. This network uses lightweight convolution layers to filter features in the intermediate layers and is based on an inverted residual structure [13].

ResNet50. ResNet [6] is an abbreviation of Residual Network. This type of deep convolutional neural network uses residual blocks and can be trained at great depths while avoiding the vanishing gradient problem. Specifically, ResNet50 is a 50-layer residual network.

Xception. Xception [2] is a novel deep convolutional neural network architecture inspired by Inception [14], where Inception modules have been replaced with depthwise separable convolutions.

The models were evaluated using 5-fold stratified cross-validation. Table 5 shows that all CNNs outperformed almost all classic algorithms, as expected. Most architectures achieved accuracies over 0.74, except for EfficientNetB0, which achieved the worst performance among the deep learning architectures. Results also show that MobileNetV2 holds the highest accuracy and can be considered the state of the art for the Iris-CV dataset.

Table 5. Results with standard deviation (Std) of Deep Learning architectures for the Iris-CV dataset using 5-fold cross-validation. Columns B and E refer to batch size and number of epochs, respectively. We used different batch sizes considering hardware resources of each machine. The DenseNet121, EfficientNetB0, and MobileNetV2 architectures were trained using RTX 2060 Super, ResNet 50 was trained with RTX 2070, while the remaining ones were trained with a GTX 1660Ti.

In addition to assessing accuracy, we analyzed the confusion matrix corresponding to the best MobileNetV2 result, so that we can determine which classes the model has the most difficulty predicting. The confusion matrix is shown in Table 6.

Table 6. Confusion matrix corresponding to MobileNetV2’s best test accuracy – each class is represented by the first two letters in its common name.

The MobileNetV2 confusion matrix shows that the network was good at differentiating Western Iris (We), Yellow Iris (Ye), and Dwarf Crested Iris (Dw) from the other classes, as their precision scores corresponded to approximately 88%, 89%, and 89%, respectively, as shown in Table 7. The other two classes – Bearded Iris (Be) and Douglas Iris (Do) – were not as well discriminated, and the algorithm had some trouble distinguishing them correctly since the flowers belonging to these two classes often have the same color, although they have different petal shapes. The similarity between these two classes can be observed in Table 8. Douglas Iris can also sometimes be confused with Dwarf Crested Irises, due to their white petal markings. As a result, Douglas Irises were the hardest flowers to classify correctly, with only 69% precision.
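Per-class precision values like those above can be computed from predictions with scikit-learn; the labels below are hypothetical stand-ins for illustration, not the actual MobileNetV2 outputs:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score

classes = ["Be", "Do", "Dw", "We", "Ye"]  # two-letter codes as in Table 6

# Hypothetical true labels and predictions (stand-ins for the real test fold).
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])
y_pred = np.array([0, 1, 1, 0, 2, 2, 3, 3, 4, 4])

cm = confusion_matrix(y_true, y_pred)  # rows: true class, cols: predicted
per_class_precision = precision_score(y_true, y_pred, average=None)
```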

Table 7. Precision, recall and F1-score corresponding to MobileNetV2’s best results on the test dataset – each class is represented by the first two letters in its common name
Table 8. Examples of Bearded Iris and Douglas Iris flowers showing their similar colors, but different petal structures.

5 Conclusion

This paper introduced a new computer vision dataset, called Iris-CV, consisting of five classes of iris flowers. The images show many features that make this dataset a challenging task, such as non-centered flowers, different lighting conditions, multiple flowers per image, and classes that naturally appear with different petal colors. Due to all of these reasons, Iris-CV proved to be too hard for traditional machine learning algorithms, with poorer results than those observed for established benchmark datasets, such as MNIST and Fashion-MNIST.

State-of-the-art deep neural nets also performed worse than current results for MNIST, Fashion-MNIST, and ImageNet, with MobileNetV2 achieving 82% accuracy, which is the best cross-validated result so far. Additionally, an analysis of the best confusion matrix produced by MobileNetV2 showed that three of the classes are more easily classified, namely Dwarf Crested Iris, Western Iris, and Yellow Iris. The remaining two classes – Bearded Iris and Douglas Iris – offered harder challenges, with the latter being the toughest to discriminate.

Since this paper’s main objective is to propose a new iris dataset featuring common computer vision problems (e.g., occlusion, background noise, and fine-grained features), we established a baseline with widely used and state-of-the-art algorithms. Future work includes further exploration of deep neural network architectures and regularization techniques, since the architectures used in this work were designed for learning large datasets and some of them overfitted. In addition, hyperparameter tuning is possible through optimization techniques such as Bayesian optimization, grid search, or random search, especially for the classic machine learning approaches, which performed worse on this dataset. Therefore, our next goals are to improve the current benchmark results, collect more data from different sources, and extract an object detection dataset based on the same images.