key: cord-102158-5xg40s4o authors: Coulibali, Zonlehoua; Cambouris, Athyna Nancy; Parent, Serge-Étienne title: Site-specific machine learning predictive fertilization models for potato crops in Eastern Canada date: 2020-03-12 journal: bioRxiv DOI: 10.1101/2020.03.12.988626 sha: doc_id: 102158 cord_uid: 5xg40s4o
Statistical modeling is commonly used to relate the performance of potato (Solanum tuberosum L.) to fertilizer requirements. Prescribing optimal nutrient doses is challenging because many variables are involved, including weather, soils, land management, genotypes, and the severity of pests and diseases. Where sufficient data are available, machine learning algorithms can be used to predict crop performance. The objective of this study was to predict tuber yield and quality (size and specific gravity) as affected by nitrogen, phosphorus and potassium fertilization as well as by weather, soil and land management variables. We exploited a data set of 273 field experiments conducted from 1979 to 2017 in Quebec (Canada). We developed, evaluated and compared predictions from a hierarchical Mitscherlich model, k-nearest neighbors, random forest, neural networks and Gaussian processes. Machine learning models returned R² values of 0.49–0.59 for tuber marketable yield prediction, which were higher than the Mitscherlich model R² (0.37). The models predicted medium-size tubers (R² = 0.60–0.69) and tuber specific gravity (R² = 0.58–0.67) more accurately than large-size tubers (R² = 0.55–0.64) and marketable yield. Response surfaces from the Mitscherlich model, neural networks and Gaussian processes were smooth and agreed more closely with actual evidence than the discontinuous curves derived from k-nearest neighbors and random forest models. When the dose-response surfaces were marginalized to obtain optimal dosages under constant weather, soil and land management conditions, some disagreements occurred between models.
Due to their built-in ability to develop recommendations within a probabilistic risk-assessment framework, Gaussian processes stood out as the most promising algorithm to support decisions that minimize economic or agronomic risks.

* NPK factorial design or other designs where N (nitrogen), P (phosphorus) and K (potassium) were kept constant.

[…] We matched the duration from planting to harvest, but the class names differed. The preceding crops were categorized as in Parent et al. [7] as grasslands, legumes, cereals, low-residue crops and high-residue crops. Toponymic names, geographical coordinates and years were recorded at each site. Fertilizers other than N, P or K, fertilizer source, dosage and application method, seeding density and date, harvest date, tuber marketable yield (excluding tubers < 2.5 cm in diameter), tuber size distribution (small, medium, large) and specific gravity (SG) were recorded. The N fertilizers were either all applied at seeding or split-applied between seeding and hilling. The P fertilizers were banded at planting. The K fertilizers were band-applied, or split-applied before planting and at planting. We added 17 trials conducted in 2016 and 2017 in the Outaouais, Centre-du-Québec, and Lac-Saint-Jean regions. We reported the growing season lengths provided by scouting teams, which cover the period from seeding to harvest and do not strictly correspond to the theoretical CFIA [53] growth duration, as shown for cultivars Superior, Goldrush, Krantz and FL 1533 from the trials used for model analysis (Table 2).

[…] side, $c_j^-$ is the compositional vector at the left-hand side, $c_j^+$ is the compositional vector at the right-hand side, and $g(\cdot)$ is the geometric mean function. The proportions of the textural components and the carbon content formed the soil texture simplex. The balances are presented in Table 5. We followed the [denominator parts | numerator parts] notation [71].
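The definitions of $c_j^-$, $c_j^+$ and $g(\cdot)$ above match the standard isometric log-ratio balance, $b_j = \sqrt{\frac{r_j s_j}{r_j + s_j}}\,\ln\frac{g(c_j^+)}{g(c_j^-)}$, where $r_j$ and $s_j$ are the numbers of parts in $c_j^+$ and $c_j^-$. The minimal sketch below assumes that form and computes balances in the [denominator parts | numerator parts] notation. The function names and the tuber size proportions are hypothetical illustrations, not the values in Table 5.

```python
import math
from typing import Sequence


def geometric_mean(parts: Sequence[float]) -> float:
    """Geometric mean g() of strictly positive proportions."""
    if len(parts) == 0 or any(p <= 0 for p in parts):
        raise ValueError("parts must be non-empty and strictly positive")
    return math.exp(sum(math.log(p) for p in parts) / len(parts))


def balance(denominator: Sequence[float], numerator: Sequence[float]) -> float:
    """Balance [denominator | numerator]; left-hand side is c_j^-, right-hand side c_j^+.

    Assumes the standard ilr balance sqrt(r*s/(r+s)) * ln(g(c+)/g(c-)),
    with r = number of numerator parts and s = number of denominator parts.
    """
    r, s = len(numerator), len(denominator)
    scale = math.sqrt(r * s / (r + s))
    return scale * math.log(geometric_mean(numerator) / geometric_mean(denominator))


# Hypothetical tuber size proportions (small, medium, large); not study data.
small, medium, large = 0.15, 0.60, 0.25
s_m = balance([small], [medium])           # [S | M]
ms_l = balance([medium, small], [large])   # [M, S | L]
print(f"[S | M] = {s_m:.3f}, [M, S | L] = {ms_l:.3f}")
```

The argument order mirrors the bracket notation left to right, so `balance([small], [medium])` reads as [S | M]. A positive balance means the right-hand (numerator) group dominates.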
Index for rainfall

[…] where $R_d$ is daily rainfall, $n$ is the number of days and $T_m$ is daily mean temperature.

[…] Typically, values greater than 0.5 are considered acceptable [92]. The MAE is the average of the absolute differences between predictions and observations as in equation […]

The Mitscherlich, NN and GP models generated smooth response curves, while the KNN and RF models generated stepped curves. The marketable yield was non-responsive to P application in the RF model. Neither the Mitscherlich nor the RF model showed an effect of K fertilization on yield. All models for the P trial somewhat underestimated marketable yield, while response curves followed the data for N. […] (Fig 4) showed an increasing response to N fertilization across models, while the response was globally poor for P and K. For the [S | M] balance, responses increased with increasing fertilizer doses, except for P and K trial data fitted with the GP model (Fig 5). There was also a poor response for the K trial with SG (Fig 6). The SG response decreased as K increased from zero, and increased then decreased as P dosage increased. For N trials, SG slightly increased then decreased as N dose increased in the RF model, but was non-responsive with the other models.

The prediction of optimum fertilizer doses and optimum or maximum outputs showed some disagreements for the case presented (Fig 7). There should be a single economic optimal dose or agronomic optimal dose at each site each year. Depending on the target variable, some models were more consistent than others in deriving optimal doses. At extremely low predicted N, P or K doses, it could be challenging to manage the fertilization program at low economic risk for producers, who generally consider that the cost of over-fertilization is low compared to the cost of under-fertilization [37, 38].
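For the goodness-of-fit statistics used in this section (R², with values above 0.5 typically considered acceptable, and the MAE), a minimal sketch follows. It assumes R² is the coefficient of determination, 1 − SS_res/SS_tot (the definition used by scikit-learn's `r2_score`), rather than the squared Pearson correlation; the yield values are hypothetical.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two or more paired observations")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observations have zero variance")
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of the absolute differences between predictions and observations."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("need paired, non-empty sequences")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical marketable yields (t/ha), not study data.
obs = [30.0, 35.0, 40.0, 45.0]
pred = [32.0, 34.0, 41.0, 43.0]
r2 = r_squared(obs, pred)
print(f"R2 = {r2:.2f} (above 0.5: {r2 > 0.5}), MAE = {mean_absolute_error(obs, pred):.2f} t/ha")
```

If scikit-learn is available, `sklearn.metrics.r2_score` and `sklearn.metrics.mean_absolute_error` should return the same values for these inputs.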
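For a single nutrient with all other inputs held constant, a Mitscherlich curve $y = A[1 - e^{-R(x + E)}]$ gives a closed-form economic optimum: setting the marginal revenue $p\,dy/dx = pARe^{-R(x+E)}$ equal to the fertilizer cost $c$ yields $x^* = \ln(ARp/c)/R - E$. The sketch below assumes this univariate form (the study fitted a hierarchical trivariate Mitscherlich model whose parameters are not reproduced here), with hypothetical parameters and prices, and checks the closed form against a grid search over the 0–250 kg ha⁻¹ dosage range.

```python
import math


def mitscherlich(dose: float, a: float, r: float, e: float) -> float:
    """Yield (t/ha) at fertilizer dose (kg/ha): a * (1 - exp(-r * (dose + e)))."""
    return a * (1.0 - math.exp(-r * (dose + e)))


def economic_optimum(a, r, e, tuber_price, fert_cost, dose_max):
    """Dose where tuber_price * dy/dx equals fert_cost, clipped to [0, dose_max].

    Net return is concave in dose for a, r > 0, so clipping the stationary point is exact.
    """
    # dy/dx = a * r * exp(-r * (dose + e)); solve tuber_price * dy/dx = fert_cost for dose.
    dose = math.log(a * r * tuber_price / fert_cost) / r - e
    return min(max(dose, 0.0), dose_max)


def grid_optimum(a, r, e, tuber_price, fert_cost, dose_max, step=0.1):
    """Brute-force check: grid dose maximizing net return ($/ha)."""
    n_steps = int(round(dose_max / step))
    doses = [i * step for i in range(n_steps + 1)]
    return max(doses, key=lambda d: tuber_price * mitscherlich(d, a, r, e) - fert_cost * d)


# Hypothetical: A = 40 t/ha, R = 0.02 ha/kg, E = 30 kg/ha, 200 $/t tubers, 2 $/kg N.
params = dict(a=40.0, r=0.02, e=30.0, tuber_price=200.0, fert_cost=2.0, dose_max=250.0)
print(f"closed form: {economic_optimum(**params):.1f} kg N/ha")
print(f"grid search: {grid_optimum(**params):.1f} kg N/ha")
```

When fertilizer is cheap relative to tubers, the stationary point can exceed the simulated range and the optimum is pinned at the upper bound; when the marginal return at zero dose is already below cost, the clip returns zero.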
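The probabilistic procedure behind the GP dose distributions (a mean GP curve, sampled GP curves, and the distribution of 1000 optimal doses) can be sketched as follows. This minimal version assumes scikit-learn's `GaussianProcessRegressor` (scikit-learn is among the cited software, but the study's kernel, features and settings are not given here), a synthetic single-nutrient N trial, and hypothetical prices. Each posterior curve is maximized for net return, and the 1000 optima are summarized by their mean and by the 50th and 80th percentiles assessed by Khiari et al. [43].

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF

rng = np.random.default_rng(42)

# Synthetic N trial (hypothetical): 6 doses x 4 replicates, Mitscherlich-shaped yields + noise.
doses = np.repeat(np.arange(0.0, 251.0, 50.0), 4)
yields = 40.0 * (1.0 - np.exp(-0.02 * (doses + 30.0))) + rng.normal(0.0, 1.5, doses.size)

# Assumed kernel; alpha adds observation noise during fitting only, so posterior
# samples are smooth latent curves rather than noisy ones.
gp = GaussianProcessRegressor(
    kernel=ConstantKernel(1.0) * RBF(length_scale=100.0),
    alpha=0.05,
    normalize_y=True,
    random_state=0,
)
gp.fit(doses.reshape(-1, 1), yields)

dose_grid = np.arange(0.0, 251.0, 5.0)    # simulated range, 0-250 kg N/ha
X_grid = dose_grid.reshape(-1, 1)
tuber_price, fert_cost = 200.0, 2.0       # hypothetical $/t and $/kg N

# Economic optimum of the mean GP curve.
mean_curve = gp.predict(X_grid)           # shape (n_grid,)
mean_opt = dose_grid[np.argmax(tuber_price * mean_curve - fert_cost * dose_grid)]

# 1000 posterior curves -> 1000 economic optima.
curves = gp.sample_y(X_grid, n_samples=1000, random_state=1)      # shape (n_grid, 1000)
net_return = tuber_price * curves - fert_cost * dose_grid[:, None]
sample_opts = dose_grid[np.argmax(net_return, axis=0)]            # shape (1000,)

p50, p80 = np.percentile(sample_opts, [50, 80])
print(f"mean-curve optimum: {mean_opt:.0f} kg N/ha")
print(f"sampled optima: mean {sample_opts.mean():.0f}, 50th pct {p50:.0f}, 80th pct {p80:.0f}")
print(f"share of samples at the 250 kg N/ha bound: {np.mean(sample_opts == 250.0):.2f}")
```

Samples that keep increasing up to the end of the grid place their optimum at the upper bound, which is how a mean curve and a sampled distribution can both return the largest simulated dose while a lower percentile suggests a smaller one.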
The probabilistic prediction capability of Gaussian processes may help to determine a credible dosage. […] The average GP curve is shown as a black line, with its optimal dosage as a black dot. Five sampled GP curves are plotted as grey lines, with their optimal doses as grey dots. The probability distributions of the 1000 optimal doses are shown under the respective response curves. The figures show that predicted means […] prediction only for the N trial. The probabilistic prediction was equal to the mean GP prediction for the P trial, i.e., 87 kg P ha⁻¹, while the N and K trials returned equal predictions with the [S | M] balance prediction models, at 0.0 kg ha⁻¹ and 0.70 kg ha⁻¹, respectively (Fig 10). For tuber SG prediction models […]

Fig 8: Examples of economic optimal N, P, K dose distributions with Gaussian processes using marketable yield for selected trials. N: nitrogen, P: phosphorus, and K: potassium.

Fig 9: Examples of agronomic optimal N, P, K dose distributions with Gaussian processes using the tuber size [M, S | L] balance for selected trials. N: nitrogen, P: phosphorus, and K: potassium.

Fig 10: Examples of agronomic optimal N, P, K dose distributions with Gaussian processes using the tuber size [S | M] balance for selected trials. N: nitrogen, P: phosphorus, and K: potassium.

Fig 11: Examples of agronomic optimal N, P, K dose distributions with Gaussian processes using tuber SG for selected trials.

[…] Features with low or no importance could be removed without affecting model performance [74]. The preceding crop categories, i.e., grassland, small grains, legumes, low-residue crops and high-residue crops, as categorized by Parent et al. [7], returned zero (for tuber SG) or the lowest (for other target variables) importance scores and were thus removed, despite a substantial body of literature on the advantages of crop rotation for the following crop. Nonetheless, Zebarth et al.
[96] stated that the amount of nitrogen mineralized from organic matter during the growing season cannot be predicted accurately. Torma et al. [97] found that the N supplied by soil and crop residues (maize, potato, silage maize, soybean, sunflower, winter rape, winter wheat) ranged from 20 to 132 kg ha⁻¹, while the phosphorus ranged from 2 to 24 kg ha⁻¹ […]

[…] or decreasing GP samples, which are more frequent when the sample is close to patterns in the data where the response to fertilizer is flat. A zero-fertilizer recommendation could be interpreted as a soil sufficiently fertile to supply the crop […] Khiari et al. [43] assessed the 50th and 80th percentiles. The mean, the median (50th percentile) or any other percentile dose could be computed to support decision-making. For example, the mean GP curve and the probability distribution both returned the upper bound of the simulated dosage range (i.e., 250 kg N ha⁻¹) as the economic optimal dose for the N trial with the marketable yield prediction model (Fig 8). The conditional expectation percentiles showed that a lower dose (i.e., 223 kg N ha⁻¹) could be recommended […]

This study assessed machine learning techniques as an alternative for potato fertilizer recommendations at local scale, which are usually handled by statistical models or meta-analysis at regional scale. A large collection of field trial data provided information to fit machine learning models with specific traits of cultivars, soil properties, weather indexes, and N, P and K fertilizer dosages used as predictive features […] P and K doses derived from yield, or against optimal agronomic N, P and K doses derived from tuber size and SG. The models trained using machine learning algorithms outperformed the Mitscherlich trivariate response predictive model.
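The feature-selection step described for the preceding-crop categories (dropping features with zero or near-zero importance) can be sketched as below. The importance measure used in the study is not specified here; this minimal version assumes scikit-learn's impurity-based `feature_importances_` from a random forest, synthetic data, and a hypothetical 0.01 cut-off. A feature that is constant in the training data is never used for a split, so its impurity importance is exactly zero.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 300

# Hypothetical features: N dose, growing degree days, a pure-noise column,
# and a preceding-crop indicator that is constant in this synthetic subset.
n_dose = rng.uniform(0.0, 250.0, n)
gdd = rng.uniform(1200.0, 1800.0, n)
noise = rng.normal(size=n)
crop_flag = np.zeros(n)
X = np.column_stack([n_dose, gdd, noise, crop_flag])
names = ["N_dose", "GDD", "noise", "preceding_crop_flag"]

# Synthetic yield (t/ha): Mitscherlich-shaped N response plus a small GDD effect.
y = 40.0 * (1.0 - np.exp(-0.02 * (n_dose + 30.0))) + 0.01 * (gdd - 1500.0) + rng.normal(0.0, 1.5, n)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
importances = dict(zip(names, rf.feature_importances_))

threshold = 0.01  # hypothetical cut-off
kept = [name for name, imp in importances.items() if imp > threshold]
for name, imp in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name:>20}: {imp:.3f}")
print("kept:", kept)
```

Impurity-based scores can give non-zero importance to pure noise, so a permutation-based check (`sklearn.inspection.permutation_importance`) on held-out data is a common complement before removing features.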
The marketable yield prediction coefficient of determination (R²) varied between 0.49 and 0.59, while the Mitscherlich […] from uniform distributions under constant weather conditions, soil properties and land management factors […] As large amounts of data are being assembled into observational data sets, […] in the context of precision agriculture. To assess model performance under real-world situations […] data, since accurate future weather data covering the growing season are unavailable […] Any biotic factor other than fertilizer, e.g., length of growing season or planting density, could be optimized with management scenarios. With more experimental data, the training and testing division could be performed at trial level to improve the model predictive ability.

References

Criteria for publishing papers on crop modeling. Field Crops Research
An overview of available crop growth and yield models for studies and assessments in agriculture
Decision support systems in potato production
Potato, sweet potato, and yam models for climate change: a review
Mathematical models of plant growth and development
A neural network experiment on the site-specific simulation of potato tuber growth in Eastern Canada. Computers and Electronics in Agriculture
Site-specific multilevel modeling of potato response to nitrogen fertilization
Effects of soil compaction on potato growth and its removal by cultivation
Differentiation of potato ecosystems on the basis of relationships among physical, chemical and biological soil parameters
Agronomic practices
An analysis of the response of sugar beet and potatoes to fertilizer nitrogen and mineral soil mineral nitrogen
Potato response to crop sequence and nitrogen fertilization following sod breakup in a Gleyed Humo-Ferric Podzol
Responses of potato (Solanum tuberosum L.) to green manure cover crops and nitrogen fertilization rates
Soil mineralizable nitrogen and soil nitrogen supply under two-year potato rotations
Italian ryegrass management effects on nitrogen supply to a subsequent potato crop
Effect of straw and fertilizer nitrogen management for spring barley on soil nitrogen supply to a subsequent potato crop
A model of the development and bulking of potatoes (Solanum Tuberosum L.) I. Derivation from well-managed field crops. Field Crops Research
Potato response to nitrogen sources and rates in an irrigated sandy soil
Rate and timing of nitrogen fertilization of Russet Burbank potato: Yield and processing quality
The potato crop: the scientific basis for improvement
Water relations and growth of potatoes. The potato crop: Springer
Effects of climate on different potato genotypes. 2. Dry matter allocation and duration of the growth cycle
Comparison of empirical daily surface incoming solar radiation models. Agricultural and Forest Meteorology
Yield levels of potato crops: Recent achievements and future prospects
Prediction of soil nitrogen supply in potato fields using soil temperature and water content information
Soil nutrient bioavailability: a mechanistic approach
Minerals, soils and roots
Net primary productivity and below-ground crop residue inputs for root crops: Potato (Solanum tuberosum L.) and sugar beet (Beta vulgaris L.)
Water-nutrients interaction: exploring the effects of water as a central role for availability & use efficiency of nutrients by shallow rooted vegetable crops - a review
Potash requirements of potatoes
Düngung sichert ertrag und qualität. Land & Fort
The significance of trends in concentrations of total nitrogen and nitrogenous compounds
Commercial potato production in North America. The Potato Association of America Handbook
Global markets for processed potato products
Meeting global food needs: realizing the potential via genetics x environment x management interactions
Do farmers waste fertilizer? A comparison of ex post optimal nitrogen rates and ex ante recommendations by model, site and year. Agricultural Systems
Nouveaux outils de gestion de l'azote dans la production de la pomme de terre. CRAAQ, Colloque sur la pomme de terre
Dynamics of nitrate leaching under irrigated potato rotation in Washington State: a long-term simulation study. Agriculture, ecosystems & environment
Long-term simulations of nitrate leaching from potato production systems in Prince Edward Island
Controls on nitrate loading and implications for BMPs under intensive potato production systems in Prince Edward Island
Groundwater monitoring to support development of BMPs for groundwater protection: the Abbotsford-Sumas aquifer case study
An agri-environmental phosphorus saturation index for acid coarse-textured soils
Agri-environmental models using Mehlich-III soil phosphorus saturation index for Canadian journal of soil science
Mehlich-III soil phosphorus saturation indices for Quebec acid to near neutral mineral soils varying in texture and genesis
Nitrogen balances and yields of spring cereals as affected by nitrogen fertilization in northern conditions: A meta-analysis
Management of nitrogen and water in potato production. Pers, Wageningen; 2000
Disaggregating model bias and variability when calculating economic optimum rates of nitrogen fertilization for corn
Alternative benchmarks for economically optimal rates of nitrogen fertilization for corn
Advances in machine learning applications in software engineering: IGI Global
Application of machine learning methodologies for predicting corn economic optimal nitrogen rate
Soil test correlation, calibration, and recommendation. Soil Testing and Plant Analysis
Potato plants characteristics, maturity. Canadian Food Inspection Agency: Canadian Food Inspection Agency
A specific gravity calculator for potatoes
Soil Classification Working Group. Canadian system of soil classification
Canadian system of soil classification
Numerical clustering of soil series using morphological profile attributes for potato
Methods of soil analysis: Part 1 - Physical and mineralogical methods (Agronomy M): Soil Science Society of America
Determination of soil texture by laser diffraction method
Inventaire des problèmes de dégradation des sols agricoles du Québec: rapport synthèse. Entente auxiliaire Canada-Québec sur le développement agro-alimentaire. Québec Service de recherche en sols
Soil reaction and exchangeable acidity
Soil sampling and methods of analysis. 2nd ed. 1993
Methods of soil analysis Part 2 Chemical and microbiological properties. 1982
A comparison of three methods of organic carbon determination in some New Zealand soils
Table interprétative de la mesure du pH des sols du Québec par quatre méthodes différentes
Mehlich-III extractable elements
Correlation of Mehlich 3, Bray 1, and ammonium acetate extractable P, K, Ca, and Mg for Alaska agricultural soils
A modified single solution method for the determination of phosphate in natural waters
Guide de référence en fertilisation
Guide de référence en fertilisation. 2è ed. 2010
Groups of parts and their balances in compositional data analysis
Balance trees reveal microbial niche differentiation. mSystems
Plant ionome diagnosis using sound balances: case study with mango (Mangifera Indica). Frontiers in plant science
Development and testing of Canada-wide interpolated spatial models of daily minimum-maximum temperature and precipitation for 1961-2003
Corn response to nitrogen is influenced by soil texture and weather
Scikit-learn: machine learning in Python
Dealing with zeros and missing values in compositional data sets using nonparametric imputation
zCompositions - R Package for multivariate imputation of left-censored data under a compositional approach
R: A language and environment for statistical computing. R Foundation for Statistical Computing
tidyverse: easily install and load the 'Tidyverse'. R package version 1.2.1
compositions: compositional data analysis. R package version 1
robCompositions: an R-package for robust statistical analysis of compositional data. John Wiley & Sons
Python tutorial, technical report CS R9526: Centrum voor Wiskunde en Informatica (CWI) Amsterdam
SciPy 1.0--Fundamental Algorithms for Scientific Computing in Python. arXiv preprint
The NumPy array: a structure for efficient numerical computation
Hunter JD. Matplotlib: A 2D graphics environment
Réseaux de neurones
A review of supervised machine learning algorithms and their applications to ecological data
Modeling Avena fatua seedling emergence dynamics: An artificial neural network approach. Computers and Electronics in Agriculture
Classification of arrhythmia using machine learning techniques
Gaussian processes based bivariate control parameters optimization of variable-rate granular fertilizer applicator
A bivariate response surface for growth data. Fertilizer research
Model evaluation guidelines for systematic quantification of accuracy in watershed simulations
Hydrological modeling of the Iroquois river watershed using HSPF and SWAT
Nitrogen uptake across site specific management zones in irrigated corn production systems
A simple software tool to simulate nitrate and potassium co-leaching under potato crop
Nitrogen management for potato: general fertilizer recommendations
Residual plant nutrients in crop residues - an important resource. Acta Agriculturae Scandinavica Section B-Soil and
Crop rotation effects on soil fertility and plant nutrition. University of Maryland: NRAES
Do plants need nitrate? The mechanisms by which nitrogen form affects plants
Chapter 6 - Functions of Macronutrients. Marschner's mineral nutrition of higher plants
Feddes RA. Water, heat and crop growth
A simulation model for potato growth and development: Substor-potato Version 2.0: Michigan State University, Department of Crop and Soil Sciences
Adaptation of potato to high temperatures and salinity-a review
Compaction of coarse-textured soils: balance models across mineral and organic compositions. Frontiers in Ecology and Evolution
Above-ground and below-ground plant development
Potatoes and human health
The effect of in-row seed piece spacing and harvest date of the tuber yield and processing quality of Conestoga potatoes in southern Manitoba
Aspects physiologiques de la croissance et du développement. La pomme de terre: production ennemis et maladies, utilisations. PARIS: INRA; 1996
Evaluation of the effect of density on potato yield and tuber size distribution
Factors affecting specific gravity loss in crisping potato crops in
Yield response of potatoes to variable nitrogen management by landform element and in relation to petiole nitrogen - A case study
Nitrogen fertilization and irrigation affects tuber characteristics of two potato cultivars
Influence of fertilizer management and soil fertility on tuber specific gravity: a review
Effect of nitrogen, phosphorus, and potassium fertilizers on yield components and specific gravity of potatoes
Effects of nitrogen, phosphorus, and potassium on yield, specific gravity, crisp colour, and tuber chemical composition of potato
Comparison of models for describing corn yield response to nitrogen-fertilizer
Modeling nutrient responses in the field. Plant and Soil
Comparison of three statistical models describing potato yield response to nitrogen fertilizer
Modified-quadratic/plateau model for describing plant-responses to fertilizer
Quadratic and quadratic-plus-plateau models for predicting optimal nitrogen rate of corn: A comparison
Relationships between nitrogen rate, plant nitrogen concentration, yield, and residual soil nitrate-nitrogen in silage corn
Agronomic use efficiency of N fertilizer in maize-based systems in sub-Saharan Africa within the context of integrated soil fertility management
Rich AE. Potato diseases
Influence of weed competition on potato growth, production and radiation use efficiency

'Potato_df.csv' file available in 'data' repository at