Kuznetsova, Anna; Maleva, Tatiana; Soloviev, Vladimir
Detecting Apples in Orchards Using YOLOv3
Computational Science and Its Applications – ICCSA 2020, 2020-08-24. DOI: 10.1007/978-3-030-58799-4_66

Abstract. A machine vision system for detecting apples in orchards was developed. The system, designed for use in harvesting robots, is based on a modification of the YOLOv3 algorithm with pre- and postprocessing. As a result, apples that are blocked by leaves and branches, green apples on a green background, and darkened apples are detected. Apple detection time averaged 19 ms, with 90.8% Recall (fruit detection rate) and 7.8% FPR.

As a result of intensification, mechanization, and automation, agricultural productivity has increased significantly: in developed countries, the number of people employed in agriculture fell by a factor of 80 over the 20th century. Nevertheless, manual labor remains the main component of costs in agriculture, reaching 40% of the total value of vegetables, fruits, and cereals grown [1, 2]. Horticulture is one of the most labor-intensive sectors of agriculture: the level of automation is about 15%, fruits are harvested manually, and crop losses reach 50%. At the same time, as a result of urbanization, it is becoming increasingly difficult every year to recruit seasonal workers for the harvest [3]. It is obvious that the widespread use of robots could bring significant benefits in horticulture: increased labor productivity, a reduced share of heavy manual routine harvesting operations, and reduced crop losses. Fruit-picking robots have been under development since the late 1960s; however, all of them remain at the prototype stage today. The low speed of fruit picking and the unsatisfactory rates of picked and missed fruits are hindering the industrial implementation of fruit-harvesting robots.
This is largely due to the insufficient quality of the machine vision systems used in apple-picking robots [4, 5]. Recently, many neural network models have been trained to recognize apples. However, the computer vision systems based on these models in existing harvesting robot prototypes work too slowly. They also fail to detect darkened apples, apples heavily overlapped by leaves and branches, and green apples on a green background, and they mistake yellow leaves for apples. To solve these problems of apple detection in orchards, we propose to use the YOLOv3 algorithm with special pre- and postprocessing of the images taken by the camera placed on the manipulator of the harvesting robot. The remainder of the paper is structured as follows. Section 2 reviews related work on apple detection in orchards using intelligent algorithms, Sect. 3 presents our technique, and finally, the results are discussed in Sect. 4.

The efficiency and productivity of harvesting robots are largely determined by the algorithms used to detect fruits in images. Various prototypes of such robots have used recognition techniques based on one or more factors. One of the main factors by which a fruit can be detected in an image is color: a color threshold can be applied to each pixel to determine whether it belongs to a fruit. In [6, 7], this approach showed a 90% share of correctly recognized apples, and in [8] a 95% share, although on very limited datasets (several dozen images). Color-based apple detection works well for red apples but, as a rule, does not provide satisfactory quality for green ones [9]. To solve the problem of green fruit recognition, many authors combine image analysis in the visible and infrared spectra [7, 10-12].
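The per-pixel color-threshold approach described above can be sketched as follows. This is a minimal illustration: the channel thresholds and the "red dominance" rule are our own hypothetical choices, not values taken from the cited works.

```python
import numpy as np

def red_apple_mask(rgb, r_min=120, dominance=1.3):
    # Mark a pixel as "apple" when its red channel is bright enough
    # and clearly dominates the green channel (hypothetical thresholds).
    rgb = rgb.astype(np.float32)
    r, g = rgb[..., 0], rgb[..., 1]
    return (r >= r_min) & (r >= dominance * g)

# Toy 2x2 image: one red "apple" pixel, three background pixels.
img = np.array([[[200, 40, 30], [60, 120, 50]],
                [[70, 80, 60], [30, 30, 30]]], dtype=np.uint8)
mask = red_apple_mask(img)  # only the top-left pixel is flagged
```

As the review notes, such thresholding is attractive for red apples but breaks down for green ones, whose chromaticity overlaps with that of the foliage.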
For example, in [11], a 74% share of correctly detected apples (accuracy), obtained by combining analysis of the visible and infrared spectra, is compared with 66% based on analysis of the visible spectrum only and 52% based on the infrared spectrum only. The obvious virtue of detecting fruit by color is ease of implementation, but this method detects green and yellow-green apples very poorly. In addition, bunches of red apples merge into one large apple, which leads to incorrect bounding box coordinates. Thermal cameras are quite expensive and inconvenient in practice, since the thermal difference between apples and leaves is only detectable if shooting takes place within about two hours after dawn.

To detect spherical fruits such as tomatoes, apples, and citrus, recognition algorithms based on the analysis of geometric shapes can be used. The main advantage of shape analysis is that recognition quality depends little on the level of illumination. In [13-15], modifications of the circular Hough transform were used to improve the detection of fruits partially hidden by leaves or other fruits. In [16, 17], fruit detection algorithms based on convex object identification were proposed. To improve recognition quality in uncontrolled environments, where it may deteriorate due to uneven lighting, partial overlap of fruits by other fruits and leaves, and other factors, many researchers combine color and shape analysis algorithms. Systems based on such algorithms work very quickly, but they usually fail on complex scenes, especially those with fruits overlapped by leaves or other fruits. Detecting fruits by shape alone gives a large error, since apples are not the only round objects in a scene: gaps between leaves, leaf silhouettes, and spots and shadows on apples are round as well.
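The circular Hough transform mentioned above can be illustrated with a minimal single-radius version (a sketch, not the modified variants of [13-15]): every edge point votes for all candidate centers lying at the known radius from it, so a fruit remains detectable even when part of its outline is occluded and only an arc of edge points survives.

```python
import numpy as np

def hough_circle_centers(edge_points, radius, shape, n_angles=64):
    # Accumulator over candidate centers: each edge point votes for all
    # centers at distance `radius` from it; occluded arcs simply vote less.
    acc = np.zeros(shape, dtype=np.int32)
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    for y, x in edge_points:
        cy = np.round(y - radius * np.sin(thetas)).astype(int)
        cx = np.round(x - radius * np.cos(thetas)).astype(int)
        ok = (cy >= 0) & (cy < shape[0]) & (cx >= 0) & (cx < shape[1])
        np.add.at(acc, (cy[ok], cx[ok]), 1)
    return acc

# Edge points sampled from only half a circle (radius 10, center (25, 25)),
# imitating a fruit partially hidden behind a leaf.
angles = np.linspace(0.0, np.pi, 20, endpoint=False)
pts = [(25 + 10 * np.sin(a), 25 + 10 * np.cos(a)) for a in angles]
acc = hough_circle_centers(pts, radius=10, shape=(50, 50))
center = np.unravel_index(acc.argmax(), acc.shape)  # near (25, 25)
```

The accumulator peak lands at the true center even with half of the circle missing, which is exactly the robustness to partial occlusion that motivated the Hough-based approaches.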
Combining circle detection algorithms with subsequent pixel analysis is inefficient in terms of computation speed.

Fruits photographed in orchards under natural conditions differ from leaves and branches in texture, and this can be used to facilitate separation of fruits from the background. Differences in texture play a particularly important role when the fruits are grouped in bunches or overlapped by other fruits or leaves. For example, in [18], apples were detected by image texture analysis combined with color analysis, and the proportion of correctly recognized fruits was 90% (on a limited dataset). However, texture-based detection only works on close-up images with good resolution and performs very poorly in backlight. The low speed of texture-based fruit detection algorithms and the too-high proportion of undetected fruits make this technique impractical.

Machine learning methods have long been used to detect fruits in images. The earliest prototype of a fruit-picking robot that detected red apples against a background of green leaves using machine learning techniques was presented back in 1977 by E.A. Parrish and J.A.K. Goksel [19]. In [11], to detect green apples against a background of green leaves, K-means clustering was applied to the a* and b* coordinates of the CIE L*a*b* color space in the visible spectrum, as well as to image coordinates in the infrared spectrum, with subsequent noise removal; this allowed the authors to correctly detect 74% of the apples in the images of the test dataset. The use of a linear classifier and a KNN classifier to detect apples and peaches in a machine vision system was compared in [21], with both classification algorithms yielding a similar accuracy of 89%. In [22], a linear classifier showed 80% accuracy of apple detection.
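The K-means clustering step used in [11] can be sketched as follows. This is a toy implementation run on two synthetic chromaticity blobs standing in for "fruit" and "foliage" pixels; in the cited work the inputs were the a* and b* coordinates of real image pixels, and the initialization shown here (evenly spaced points) is our own simplification.

```python
import numpy as np

def kmeans(points, k=2, iters=20):
    # Deterministic init: k points spread evenly along the input order.
    centers = points[np.linspace(0, len(points) - 1, k).astype(int)].astype(float)
    for _ in range(iters):
        # Assign each point to its nearest center, then recompute centers;
        # an empty cluster keeps its previous center.
        labels = np.argmin(((points[:, None, :] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([points[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

# Two synthetic chromaticity blobs: "foliage" near (0, 0), "fruit" near (12, 10).
rng = np.random.default_rng(0)
leaves = rng.normal((0.0, 0.0), 0.5, size=(10, 2))
fruit = rng.normal((12.0, 10.0), 0.5, size=(10, 2))
points = np.vstack([leaves, fruit])
labels, _ = kmeans(points, k=2)  # the two blobs end up in separate clusters
```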
The authors of [23] recognized apples, bananas, lemons, and strawberries in images using a KNN classifier and reported 90% accuracy. Applying a KNN classifier to color and texture data made it possible to find 85% of green apples in raw images and 95% in hand-processed images [24]. In [25], an SVM-based apple detection algorithm was introduced; this classifier balanced accuracy against recognition time, showing 89% correctly detected fruits with an average apple detection time of 3.5 s. Using SVM for apple detection in [26] showed 92% accuracy (on a test dataset of 59 apples). Notably, boosted decision trees have hardly been used in fruit detection systems. In [27], the AdaBoost algorithm was used to recognize kiwi fruit in orchards, achieving a 92% share of correctly detected fruits against branches, leaves, and soil. In [28, 29], AdaBoost applied to color analysis was used to automatically detect ripe tomatoes in a greenhouse, showing 96% accuracy. Our search for examples of modern boosting algorithms such as XGBoost, LightGBM, or CatBoost being used to detect fruits in images yielded no results. All these early machine learning techniques for fruit detection were tested on very limited datasets of several dozen images, so the results cannot be generalized for practical use.

Since 2012, with the advent of deep convolutional neural networks, in particular AlexNet, proposed by A. Krizhevsky, I. Sutskever, and G.E. Hinton in [30], machine vision and its use for detecting objects in images, including fruits, received a strong impetus. AlexNet took first place in the ImageNet Large-Scale Visual Recognition Challenge 2012 (the share of correctly recognized images was 84.7%, against 73.8% for the second place). Convolutional neural networks were also applied to robotic kiwifruit harvesting in [32]. In 2016, a new algorithm was proposed: YOLO (You Only Look Once) [33].
Before YOLO, in order to detect objects in images, neural-network classification models were applied to a single image several times: in several different regions and/or at several scales. The YOLO approach applies one neural network to the whole image in a single pass: the model divides the image into regions and simultaneously predicts bounding boxes and class probabilities for each object. The third version of the algorithm, YOLOv3, was published in 2018 [34]. YOLO is one of the fastest detection algorithms, and it has already been used in fruit-picking robots. Y. Tian, G. Yang, Zh. Wang, H. Wang, E. Li, and Z. Liang (2019) proposed a modification of the YOLO model and applied it to detect apples in images [35, 36]. The authors made the network densely connected, with each layer connected to all subsequent layers, as the DenseNet approach suggests [37]; the IoU (Intersection over Union) score reached 89.6%, with an average recognition time of 0.3 s per apple. H. Kang and C. Chen (2020) developed the DaSNet-v2 neural network for apple detection. This network detects objects in a single pass and takes their overlap into account, just like YOLO; its IoU was 86% [38]. Sh. Wan and S. Goudos (2020) compared three algorithms for detecting oranges, apples, and mangoes: the standard Faster R-CNN, their own modification of Faster R-CNN, and YOLOv3. Their modification detects about 90% of the fruits, which is 3-4 percentage points better than the standard Faster R-CNN on the same dataset and about the same level as YOLOv3 [39].

The Department of Data Analysis and Machine Learning of the Financial University under the Government of the Russian Federation, together with the Laboratory of Machine Technologies for Cultivating Perennial Crops of the VIM Federal Scientific Agroengineering Center, is developing a robot for harvesting apples.
The VIM Center develops the mechanical component of the robot, while the Financial University is responsible for the intelligent algorithms that detect fruits and operate the picking manipulator. To detect apples, we used the standard YOLOv3 algorithm [34] trained on the COCO dataset [40], which contains 1.5 million annotated objects of 80 categories. Since we considered only apple orchards, we were guided by the round shape of the objects and combined the "apple" and "orange" categories.

Several image preprocessing techniques are used to improve apple detection quality:
• adaptive histogram equalization (to increase contrast);
• slight blurring;
• thickening of borders.

As a result, it was possible to mitigate the negative effects of shadows, glare, minor damage to apples, and thin branches overlapping apples. As the test dataset, we used 6083 images of apples of various kinds, including red and green apples, taken in orchards by VIM Center employees. Applying the standard YOLOv3 algorithm to this set of images showed that not all apples are recognized successfully (Fig. 1). The following primary factors preventing recognition of apples in images were identified:
• backlight;
• dark spots on apples and/or a noticeable perianth;
• the proximity of the shade of green apples to the shade of the leaves;
• empty gaps between the leaves, which the network mistook for small apples;
• overlapping of apples by other apples, branches, and leaves.

To attenuate the negative influence of backlight, images on which this problem was detected (by comparing the number of dark pixels with the average) were strongly lightened. Since spots on apples, perianth, and thin branches are represented in images by pixels of brown shades, such pixels were replaced by yellow ones, which made it possible to recognize apples in such images successfully, as shown in Fig. 2.
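Two of the preprocessing steps described above, lightening of backlit images and replacement of brown pixels, can be sketched as follows. This is a minimal illustration: the darkness criterion, the gain, and the channel bounds defining "brown" are our own assumptions, not the exact values used in our pipeline.

```python
import numpy as np

def lighten_if_backlit(rgb, dark_value=60, dark_share=0.5, gain=1.8):
    # If more than `dark_share` of pixels are dark, brighten the whole image
    # (hypothetical thresholds standing in for the dark-pixel comparison).
    dark = rgb.mean(axis=-1) < dark_value
    if dark.mean() > dark_share:
        return np.clip(rgb.astype(np.float32) * gain, 0, 255).astype(np.uint8)
    return rgb

def replace_brownish_pixels(rgb, yellow=(220, 200, 40)):
    # Brown pixels (spots, perianth, thin branches) are painted yellow so
    # that they no longer break up the apple silhouette.
    out = rgb.copy()
    r = rgb[..., 0].astype(int); g = rgb[..., 1].astype(int); b = rgb[..., 2].astype(int)
    brown = (r > 60) & (r < 160) & (g > 30) & (g < 110) & (b < 70) & (r > g) & (g > b)
    out[brown] = yellow
    return out

# A brown spot amid green and red pixels gets painted yellow.
img = np.array([[[120, 70, 40], [60, 150, 60]],
                [[200, 40, 30], [200, 40, 30]]], dtype=np.uint8)
fixed = replace_brownish_pixels(img)  # only the top-left pixel changes
```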
Figure 3 shows examples of images in which yellow leaves, as well as small gaps between leaves, are mistakenly recognized as apples. To prevent the system from taking yellow leaves for apples, during postprocessing we discarded recognized objects for which the ratio of the longer side of the bounding rectangle to the shorter one exceeded 3. To avoid taking the gaps between leaves for apples, objects whose bounding rectangle area fell below a threshold were also discarded.

The failure to detect every apple in a bunch (Fig. 4) is not significant for the robot, since the manipulator takes out only one apple at each step, so the number of apples in the bunch decreases. Note that this problem arises only when analyzing canopy-view images; it does not occur with close-up images taken by the camera on the robot arm. In general, the proposed system recognizes both red and green apples quite accurately (Figs. 5 and 6). It detects apples that are blocked by leaves and branches, green apples on a green background, darkened apples, etc. Green apples are detected better if the shade of the apple differs at least slightly from the shade of the leaves (Fig. 6). The detection time for one apple ranged from 7 to 46 ms, including pre- and postprocessing (on average, one apple was detected in 19 ms).

We evaluated the apple detection results by comparing the ground-truth apples, labeled manually by the authors, with the apples detected by the algorithm, and calculating the following metrics:
• TP (True Positive): the number of fruits correctly detected by the algorithm;
• FP (False Positive): the number of errors of the first kind, i.e. background objects in the image mistakenly accepted by the algorithm as fruits;
• FN (False Negative): the number of errors of the second kind, i.e. fruits not detected by the algorithm.
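The two postprocessing filters described earlier (the aspect-ratio rule and the minimum-area rule) can be sketched as follows, with bounding boxes given as (x1, y1, x2, y2). The aspect-ratio limit of 3 follows the text; the area threshold value here is a hypothetical placeholder.

```python
def filter_detections(boxes, max_aspect=3.0, min_area=400):
    # Discard elongated boxes (yellow leaves mistaken for apples) and tiny
    # boxes (gaps between leaves); the area value is our own placeholder.
    kept = []
    for x1, y1, x2, y2 in boxes:
        w, h = x2 - x1, y2 - y1
        aspect = max(w, h) / min(w, h)
        if aspect <= max_aspect and w * h >= min_area:
            kept.append((x1, y1, x2, y2))
    return kept

boxes = [
    (10, 10, 60, 55),  # apple-like box: kept
    (0, 0, 100, 20),   # elongated box (leaf): aspect 5 > 3, dropped
    (5, 5, 15, 15),    # tiny box (gap between leaves): area 100 < 400, dropped
]
kept = filter_detections(boxes)  # only the first box survives
```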
Precision = TP / (TP + FP) and Recall = TP / (TP + FN) represent, respectively, the share of actual fruits among all the objects identified by the algorithm as fruits and the fruit detection rate (the share of detected fruits among all the ground-truth fruits); FNR = FN / (TP + FN) and FPR = FP / (TP + FP) are the shares of fruits not detected and of background objects detected as fruits by mistake. The algorithm proposed in this paper demonstrates a smaller fraction of undetected apples (FNR) than modern models based on convolutional neural networks [34, 35, 37, 38], together with a higher fruit detection rate.

References
[1] Agricultural robots for field operations: concepts and components
[2] Robotics and intelligent machines in agriculture
[3] Design and implementation of an aided fruit-harvesting robot (Agribot)
[4] Automation in agriculture
[5] A review of automation and robotics for the bioindustry
[6] Apple location method for the apple harvesting robot
[7] A fruit detection system and an end effector for robotic harvesting of Fuji apples
[8] Automatic method of fruit object extraction under complex agricultural background for vision system of fruit picking robot
[9] A review of key techniques of vision-based control for harvesting robot
[10] Image fusion of visible and thermal images for fruit detection
[11] Low and high-level visual feature-based apple detection from multi-modal images. Precision Agric
[12] Apple detection in natural tree canopies from multimodal images
[13] Fruit location in a partially occluded image
[14] Ripened strawberry recognition based on Hough transform
[15] An object detection method for quasi-circular fruits based on improved Hough transform
[16] Vision-based localization of mature apples in tree images using convexity
[17] Detection and location algorithm for overlapped fruits based on concave spots searching
[18] On-tree fruit recognition using texture properties and color data
[19] Pictorial pattern recognition applied to fruit harvesting
[20] Low and high-level visual feature-based apple detection from multi-modal images. Precision Agric
[21] Computer vision to locate fruit on a tree
[22] Development of a real-time machine vision system for apple harvesting robot
[23] A new method for fruits recognition system
[24] Determination of the number of green apples in RGB images recorded in orchards
[25] Automatic recognition vision system guided for apple harvesting robot
[26] Automatic apple recognition based on the fusion of color and 3D feature for robotic fruit picking
[27] Recognition of kiwifruit in field based on Adaboost algorithm
[28] Robust tomato recognition for robotic harvesting using feature images fusion
[29] Detecting tomatoes in greenhouse scenes by combining AdaBoost classifier and colour analysis
[30] ImageNet classification with deep convolutional neural networks
[31] Very deep convolutional networks for large-scale image recognition
[32] Robotic kiwifruit harvesting using machine vision, convolutional neural networks, and robotic arms
[33] You only look once: unified, real-time object detection
[34] YOLOv3: an incremental improvement
[35] Apple detection during different growth stages in orchards using the improved YOLO-V3 model
[36] Detection of apple lesions in orchards based on deep learning methods of CycleGAN and YOLO-V3-Dense
[37] Densely connected convolutional networks
[38] Fruit detection, segmentation and 3D visualization of environments in apple orchards
[39] Faster R-CNN for multi-class fruit detection using a robotic vision system
[40] COCO: Common Objects in Context Dataset