key: cord-306124-sn780ike authors: Jakariya, Md.; Alam, Md. Sajadul; Rahman, Md. Abir; Ahmed, Silvia; Elahi, M. M. Lutfe; Khan, Abu Mohammad Shabbir; Saad, Saman; Tamim, H. M.; Ishtiak, Taoseef; Sayem, Sheikh Mohammad; Ali, Mirza Shawkat; Akter, Dilruba title: Assessing climate-induced agricultural vulnerable coastal communities of Bangladesh using machine learning techniques date: 2020-06-16 journal: Sci Total Environ DOI: 10.1016/j.scitotenv.2020.140255 sha: doc_id: 306124 cord_uid: sn780ike
Abstract
The agricultural sector in the coastal regions of South-East Asian countries is under mounting pressure from the adverse effects of climate change. Controlling and predicting climatic factors is difficult and requires expensive solutions. This study therefore focuses on identifying factors other than climatic ones, using the Livelihood Vulnerability Index (LVI) to measure agricultural vulnerability. Factors such as farmers' monthly savings, income opportunities, damage to cultivable land, and water availability had significant impacts on increasing community vulnerability with regard to agricultural practice. The study also identified the need to reassess vulnerability periodically, owing to the dynamic nature of the coastal region, where the factors were found to vary among the study areas. A climate-resilient livelihood vulnerability assessment tool that detects the most significant factors for assessing agricultural vulnerability was developed using machine learning (ML) techniques. The ML techniques identified nine significant factors out of 21, based on a standard deviation threshold of 0.03. A practical application of the study's outcome was the development of a mobile application. Custom REST APIs (application programming interfaces) were developed on the backend to sync the app seamlessly with a server, ensuring that future data can be acquired without much effort or resources.
The paper provides a methodology for a unique vulnerability assessment technique using a mobile application, which different stakeholders can use to plan and manage resources sustainably.
The Intergovernmental Panel on Climate Change (IPCC) highlights that the vast low-lying coastal region of Bangladesh is particularly vulnerable to risks from climate change (IPCC, 2014). Rising temperatures, changing rainfall patterns, sea-level rise, and the increasing frequency and intensity of extreme climatic events are negatively affecting agriculture, water resources, human health, and ecosystems (Wheeler and Von Braun, 2013). Climate change has already had major impacts on the lives and livelihoods of people in the coastal areas of Bangladesh (GoB, 2011, 2018). The population's heavy reliance on agriculture has made it increasingly difficult to sustain livelihoods from this sector. Agriculture will become even more susceptible in the future owing to changes in climate variables (IPCC, 2007; Islam et al., 2011). One predominant impact of climate change will be fluctuations in crop yield caused by frequent changes in climatic variables (Mendelsohn & Dinar, 2003). Moreover, extreme climatic events, soil salinity in coastal areas, and the incidence of pests and diseases due to increased temperature and humidity may have additional adverse effects on the agriculture sector (Rosenzweig et al., 2001). Despite technological development, climatic factors remain fundamental determinants of agricultural productivity. Reducing agricultural vulnerability would therefore require an integrated and comprehensive management plan that gives particular consideration to hazard vulnerability and to the resilience of the coastal population to climate change (Sajjad & Chan, 2019).
The authors recognize that long-term measures will be required to address the climatic factors of vulnerability; the paper therefore suggests methods for finding secondary significant factors that can be addressed more easily and can reduce agricultural vulnerability in the short run. To assess the crop yield vulnerability of farmers in three coastal districts of Bangladesh, machine learning (ML) models were used to identify the significant factors that most increase the vulnerability score. This was done by first calculating the Livelihood Vulnerability Index of agriculture for each study area. Vulnerability indicators help monitor and track changes in vulnerability over time and space (Shah et al., 2013). The three components that characterize vulnerability include [...]
Journal Pre-proof
[...] to cope with vulnerabilities related to rice production that would be specific to each region and could be managed locally with the help of mobile applications. Developing such technology-based solutions has become extremely important, especially given the scarcity of global resources and, more importantly, the recent COVID-19 pandemic, which has emphasized the need to address the issue further. Bangladesh, with its unique geographical location, is prone to natural disasters and climatic effects. During such disasters, whether geographical or health-related, when mobility must be limited, maintaining proper resource management becomes next to impossible. This paper therefore presents a complete system for the coastal areas of Bangladesh that covers data acquisition through mobile applications, data processing with machine learning techniques, and interpretation through web-based interfaces. In addition, the architecture designed for this system is general and can be adopted in other domains with minimal modifications.
Three coastal districts of Bangladesh, namely Patuakhali, Kutubdia (Cox's Bazar), and Khulna, were selected for this research study. Maps of the study area are shown in Figure 1. The vulnerability assessment method used in the study was based on that of the GIZ Vulnerability Sourcebook (Fritzsche et al., 2014), which is built upon the IPCC framework. It should be noted that the GIZ study addressed only specific methods of vulnerability score calculation, whereas this study goes a step further and puts forward a practical application of the score, giving policymakers a chance to use the vulnerability information for functional purposes. Moreover, whereas GIZ identified only the main components of vulnerability (e.g., exposure, sensitivity, and potential impacts), this research project identified the significant factors for all components of vulnerability, as shown in Figure 2, through extensive discussions with local communities. [...] Equations (1), (2), (3), and (4) [...], where W = weight and N = the total number of factors. The categories assigned to the vulnerability scores are shown in Table 1. The regions were assigned to low, medium, and high vulnerability categories based on the vulnerability score and on consultations with experts and local villagers (Schiffman & Kanuk, 2004). Random sampling methods were used to select the study areas and study population (Bernard, 2002). A total of 930 households were selected to collect preliminary data and obtain a general overview of the study area. Broadly, the questionnaire covered socio-economic, climatic, water and sanitation, and disaster management issues. A separate set of 297 samples was chosen from the preliminary selection of households to assess agricultural production and related vulnerability issues.
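Because the equations themselves are not recoverable here, the following is only a minimal sketch of how a GIZ-Sourcebook-style composite score is commonly built, under stated assumptions: raw indicators are min-max normalised to [0, 1], each component is a weighted arithmetic mean of its indicators, adaptive capacity is inverted (1 - AC) so that higher always means more vulnerable, and the three components are combined with equal weights. The function names and the `classify` cut-offs are hypothetical; the paper's actual categories are those in Table 1.

```python
def min_max(value, lo, hi):
    """Rescale a raw indicator value to [0, 1]; a constant indicator maps to 0.0."""
    if hi == lo:
        return 0.0
    return (value - lo) / (hi - lo)


def weighted_mean(scores, weights):
    """Weighted arithmetic mean of normalised indicator scores."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(s * w for s, w in zip(scores, weights)) / total


def component_score(pairs):
    """pairs: list of (normalised_score, weight) tuples for one component."""
    scores, weights = zip(*pairs)
    return weighted_mean(scores, weights)


def vulnerability(exposure, sensitivity, adaptive_capacity):
    """Combine components; adaptive capacity is inverted because more capacity
    means less vulnerability. Equal component weights are an assumption."""
    e = component_score(exposure)
    s = component_score(sensitivity)
    ac = component_score(adaptive_capacity)
    return (e + s + (1.0 - ac)) / 3.0


def classify(score, low_cut=1 / 3, high_cut=2 / 3):
    """Hypothetical cut-offs standing in for the categories of Table 1."""
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "medium"
    return "high"
```

For example, a household with mid-range values on every component, `vulnerability([(0.5, 1)], [(0.5, 1)], [(0.5, 1)])`, scores 0.5 and falls into the hypothetical "medium" band.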
The average vulnerability index and the spatial distribution of the houses according to vulnerability were prepared using [...] of each respondent, stores the result in a database, and sends the data to a web dashboard. The web dashboard summarizes results for all respondents and also provides individual-level vulnerability scores. Administrators can use information from this dashboard to determine how resources can be optimally allocated to provide personalized help to each vulnerable individual. The factors related to the three components of vulnerability, i.e., exposure, sensitivity, and adaptive capacity, were identified through focus group discussions (FGDs) with the local farmers in each village. Climate change-related exposures are global issues, whereas sensitivity and adaptive capacity are location-specific and can be addressed with local interventions (Wilbanks, 2003; Hess et al., 2008). The results of the FGDs are presented in [...] The vulnerability weights reflect farmers' perceptions of the factors related to vulnerability in the study areas. Weights derived from the ranking exercise conducted with farmers are displayed in Table 2. Across the coastal region of Bangladesh, the climatic conditions were among the factors with the highest weights, which illustrates their importance in assessing vulnerability levels. The climatic conditions include average rainfall, average humidity, and average temperature (weight = 0.3), which carry particular importance because they can disrupt farming activities. During FGD sessions, farmers often mentioned price and market conditions as vital factors for sustaining livelihoods. The factors shown in Table 2 are location-specific and were collected through FGDs with the local farmers.
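The paper does not state how the farmers' rankings were converted into the weights of Table 2. As one possible illustration, the sketch below uses rank-reciprocal weighting, a common way to turn rankings into weights: each rank r contributes 1/r, contributions are summed over farmers (or FGD groups), and the totals are rescaled to sum to 1. The function name and the input shape are hypothetical, and this is not presented as the authors' procedure.

```python
from collections import defaultdict


def rank_reciprocal_weights(rankings):
    """Turn farmers' rankings into normalised factor weights.

    rankings: list of dicts, one per farmer or FGD group, mapping
    factor name -> rank, where rank 1 is the most important factor.
    Each rank r contributes 1/r; totals are rescaled to sum to 1.
    """
    totals = defaultdict(float)
    for ranking in rankings:
        for factor, rank in ranking.items():
            if rank < 1:
                raise ValueError(f"rank for {factor!r} must be >= 1")
            totals[factor] += 1.0 / rank
    grand_total = sum(totals.values())
    return {factor: value / grand_total for factor, value in totals.items()}
```

With a single respondent who ranks rainfall first and market price second, rainfall receives two thirds of the weight and price one third; pooling more respondents smooths out individual preferences.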
The parameters with the highest vulnerability weights belong to exposure: average rainfall [...] Variations in community responses to the different vulnerability factors in the three districts are shown in Figure 5. Among the sensitivity parameters, the one with the highest weight affecting agricultural productivity in Cox's Bazar and Patuakhali is rain availability (weight = 0.2287 and 0.2181, respectively), whereas in Khulna and Patuakhali the parameter with the highest weight in the vulnerability calculation is the percentage of damaged crops (weight = 0.2181). Adaptive capacity factors help offset exposure and sensitivity when the vulnerability of a community or household is measured. In Cox's Bazar, the factors with the highest weight in the vulnerability calculation were education level and seasonal crop diseases (weight = 0.134). In Khulna, the adaptive capacity factor with the highest weight was also seasonal crop diseases (weight = 0.1081). In Patuakhali, the adaptive capacity factor with the highest weight was education level (weight = 0.1171). Table 3 shows the state of crop yield vulnerability in the three coastal regions of Bangladesh, as reflected in the vulnerability scores of the villages in the study area. Each village's vulnerability score was derived from the individual scores of its households: every household's score was examined, and the scores were combined to obtain the village's vulnerability score. The maps of the three coastal regions in Figure 6 show the geographical distribution of vulnerability, which is the subject matter of the study. The spatial map shows the vulnerability level of the villagers according to the household survey.
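The step from household scores to village scores can be sketched as a simple group-by. The paper only says the household scores were combined; taking the mean per village is an assumption made here, and the function and village names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def village_scores(households):
    """households: iterable of (village, household_score) pairs.

    Returns {village: mean household score}. Averaging is an assumption;
    the source only states that village scores were built from household scores.
    """
    by_village = defaultdict(list)
    for village, score in households:
        by_village[village].append(score)
    return {village: mean(scores) for village, scores in by_village.items()}
```

For instance, `village_scores([("Village A", 0.4), ("Village A", 0.6), ("Village B", 0.5)])` gives both hypothetical villages a score of 0.5. A sum or a median could be substituted if that matches the intended aggregation better.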
Similar to hazard maps, the vulnerability maps in Figure 6 highlight the zones where farmers and farmland are most vulnerable to a variety of factors, including the social, physical, and economic aspects of rice production discussed above (Bathrellos et al., 2011, 2013, 2017). The overall average vulnerability level was relatively moderate in all three study areas, indicating a broadly similar vulnerability situation across the coastal region of the country. However, slight variations in vulnerability were seen in Maheswaripur [...] It is apparent from the study that crop yield vulnerability varies with the regional and temporal variations of natural disasters in the coastal areas of Bangladesh. Among the 297 farmers, about 46%, 34%, and 55% were vulnerable to the risks of humidity, temperature, and precipitation, respectively. The significant factors for vulnerability assessment were filtered using two different approaches: statistical analysis and machine learning. A comparative analysis then identified the better method for developing the mobile application, with the aim of keeping the application simple and convenient by requiring fewer vulnerability factors as input. The following sections discuss both methods. A multivariate logistic regression model was used to screen out the non-significant factors of sensitivity and adaptive capacity (Tolles & Meurer, 2016; Brunner & Giannini, 2011). The model's goodness of fit was R² = 0.70 (Draper & Smith, 1998). Based on the Wald test, five sensitivity variables and five adaptive capacity variables were significant (p < 0.01) out of the 19 variables initially considered for the vulnerability score calculation (Fahrmeir et al., 2013; Ward & Ahlquist, 2018). The vulnerability score was then recalculated by integrating only the significant sensitivity and adaptive capacity factors, without altering the original score (Table 4). Although the statistical analysis gave a preliminary list of important variables, the vulnerability scores generated from these variables alone did not correlate strongly with those generated from the full list of variables under the GIZ framework (R² = 0.7). Machine learning was therefore used to find an approach that would reproduce the original vulnerability scores more closely with fewer variables.
The distribution of individual vulnerabilities, calculated with the GIZ method, is plotted in Figure 7. Vulnerability approximately follows a normal distribution, with no extreme scores; this was assumed to reflect the fact that agriculture-dependent coastal people are generally vulnerable to a broadly similar degree. Of the 297 data points collected, three had null values for some factors, possibly due to erroneous data entry. Because so few data points were affected, they were dropped and the remaining 294 were used for the ML models. Moreover, the dataset contained only three distinct values each for temperature, humidity, and precipitation, because each district was assigned a single value for each of these factors. These factors therefore had minimal variance and were excluded from the ML models. Finally, to check whether any factor had little influence in predicting vulnerability, a column named "Random", containing random floating-point values drawn from the half-open interval [0, 1), was added. The intention was to rank the factors by importance so that any factor ranked below "Random" could be disregarded.
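The Wald-test screening described above (p < 0.01) can be illustrated with a small NumPy-only logistic regression fitted by Newton-Raphson, where each coefficient's Wald statistic is z = beta / SE and its two-sided p-value is erfc(|z| / sqrt(2)). This is a sketch, not the authors' code: the binary outcome (for example, high versus not-high vulnerability) is an assumption, since the source does not define it, and the function names are hypothetical. Perfectly separated data would make the Hessian singular and is not handled.

```python
import math

import numpy as np


def fit_logit_wald(X, y, max_iter=50, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson and return Wald statistics.

    X: (n, p) factor matrix without an intercept column; y: (n,) 0/1 outcome.
    Returns (coefficients, standard errors, two-sided Wald p-values),
    each with the intercept first.
    """
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information
        step = np.linalg.solve(hessian, X.T @ (y - p))   # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    se = np.sqrt(np.diag(cov))
    z = beta / se
    # Two-sided p-value for a standard normal z: erfc(|z| / sqrt(2)).
    pvals = np.array([math.erfc(abs(zi) / math.sqrt(2.0)) for zi in z])
    return beta, se, pvals


def significant_factors(names, X, y, alpha=0.01):
    """Names of factors whose Wald p-value is below alpha (intercept excluded)."""
    _, _, pvals = fit_logit_wald(X, y)
    return [name for name, pv in zip(names, pvals[1:]) if pv < alpha]
```

Applied separately to the sensitivity and adaptive capacity factor matrices, `significant_factors` returns the subset that passes the p < 0.01 screen.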
Thus, 294 data points with 20 actual factors and one random factor were ultimately considered. A randomly chosen 80% of these data points was used to train the models and the remaining 20% to test their performance. Before training, Spearman's rank correlation coefficient was calculated for each pair of factors in the training set to check for highly correlated factors; no highly correlated pair was found. The vulnerability scores obtained with the GIZ formula were then taken as ground truth, and five different regressors were tested to generate vulnerability scores as close as possible to those obtained with the GIZ method. The models and their respective performances are shown in Table 6. Linear regression and Bayesian Ridge Regression performed well in predicting vulnerability scores, whereas random forest regression, XGB regression, and extremely randomized trees regression overfitted the training data. Attempts were made to tune the hyperparameters of the random forest, XGB, and extremely randomized trees regressors (Breiman, 2001; Chen and Guestrin, 2016; Geurts et al., 2006) through Bayesian optimization, but tuning could not reduce the variance of these models without also reducing their predictive capacity on the test set. This may be because these models are too complex for the small dataset used here. Linear regression and Bayesian Ridge Regression, on the other hand, did not require any hyperparameter tuning. As these models performed well, they were selected to generate the importance ranking of the factors.
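The data preparation and pre-training checks described above (dropping rows with null values, removing factors with only a few distinct values, appending a uniform "Random" column on [0, 1), checking pairwise Spearman correlations, and an 80/20 split) can be sketched with NumPy alone. The cut-off `min_distinct=4`, the correlation threshold of 0.8, the seeds, and all function names are assumptions for illustration; the source reports no threshold. `scipy.stats.spearmanr` would be a ready-made alternative to the hand-written rank correlation.

```python
import numpy as np


def prepare_features(X, names, min_distinct=4, seed=0):
    """Drop incomplete rows and near-constant factors, then append 'Random'.

    min_distinct=4 is a hypothetical cut-off: factors recorded once per
    district take only three distinct values and are removed.
    """
    X = np.asarray(X, dtype=float)
    X = X[~np.isnan(X).any(axis=1)]  # drop rows with missing values
    keep = [j for j in range(X.shape[1]) if np.unique(X[:, j]).size >= min_distinct]
    rng = np.random.default_rng(seed)
    X = np.column_stack([X[:, keep], rng.random(len(X))])  # uniform values in [0, 1)
    return X, [names[j] for j in keep] + ["Random"]


def average_ranks(v):
    """Ranks starting at 1, with tied values sharing their average rank."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(v, kind="mergesort")
    ranks = np.empty(v.size)
    i = 0
    while i < v.size:
        j = i
        while j + 1 < v.size and v[order[j + 1]] == v[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def highly_correlated_pairs(X, names, threshold=0.8):
    """All factor pairs with |rho| >= threshold (the threshold is hypothetical)."""
    X = np.asarray(X, dtype=float)
    pairs = []
    for i in range(X.shape[1]):
        for j in range(i + 1, X.shape[1]):
            rho = spearman(X[:, i], X[:, j])
            if abs(rho) >= threshold:
                pairs.append((names[i], names[j], rho))
    return pairs


def split_indices(n, test_frac=0.2, seed=0):
    """Random 80/20 split of row indices into (train, test)."""
    idx = np.random.default_rng(seed).permutation(n)
    n_test = int(round(test_frac * n))
    return idx[n_test:], idx[:n_test]
```

With 294 rows, `split_indices(294)` yields 235 training and 59 test indices. An empty list from `highly_correlated_pairs` on the training rows corresponds to the "no highly correlated pair" outcome reported above.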
To obtain importance scores for the vulnerability factors, permutation importance was used: the model's R² score is first measured on the original set of factors, and the decrease in R² is then calculated after randomly permuting the values of each factor, one factor at a time (Altmann et al., 2010). Factors whose permutation causes a larger decrease in R² are considered more important. Figure 8 [...] Although linear regression and Bayesian Ridge Regression did not produce identical rankings, the two ranking schemes tended to place the same factors in higher or lower positions. To obtain a unified ranking, the ranks produced by the two regression models were summed and the factors sorted in ascending order of their rank sums; factors with smaller rank sums were considered more important than those with larger ones. This unified ranking is shown in Table 6. The factors were then dropped one by one, starting from the lowest-ranked, and new linear and Bayesian ridge regression models were trained on the progressively reduced sets of factors. Table 7 lists how the new models performed with the reduced sets of factors. Note that the "Random" factor was dropped from the dataset before training the new models because it was no longer needed. Table 7 shows that up to 11 factors can be dropped while still predicting vulnerability scores within a standard deviation of 0.03 of the original scores and retaining a Pearson correlation coefficient of 0.93. Since temperature, humidity, and precipitation, which were part of the original vulnerability calculation, were excluded from the ML models, up to 14 factors can in fact be removed while maintaining reasonable predictive capacity, by asking only 9 questions.
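The ranking and reduction procedure above can be sketched end to end: permutation importance as the mean drop in R² after shuffling one column, a per-model rank, the rank-sum unification, and a refit on the top-ranked factors. To stay dependency-light the sketch fits ordinary least squares with `numpy.linalg.lstsq`; scikit-learn's `sklearn.inspection.permutation_importance` and `sklearn.linear_model.BayesianRidge` are the ready-made equivalents one would likely use instead. Reading "standard deviation from the original scores" as the standard deviation of test-set prediction errors is an assumption, and the function names are hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: coef[0] + np.asarray(Z) @ coef[1:]


def r2(y, y_hat):
    y = np.asarray(y, dtype=float)
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean decrease in R^2 when one factor's column is shuffled at a time."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    baseline = r2(y, predict(X))
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops[j] += baseline - r2(y, predict(X_perm))
    return drops / n_repeats


def rank_positions(importances, names):
    """Map factor -> rank, where rank 1 has the largest importance."""
    order = np.argsort(-np.asarray(importances), kind="stable")
    return {names[j]: pos + 1 for pos, j in enumerate(order)}


def unified_ranking(rank_dicts):
    """Sort factors by the sum of their ranks across models (smaller = more important)."""
    names = list(rank_dicts[0])
    sums = {n: sum(r[n] for r in rank_dicts) for n in names}
    return sorted(names, key=lambda n: sums[n])


def evaluate_reduced(X_tr, y_tr, X_te, y_te, names, ranking, n_keep):
    """Refit on the top n_keep factors; return (std of test errors, Pearson r)."""
    cols = [names.index(n) for n in ranking[:n_keep]]
    predict = fit_linear(X_tr[:, cols], y_tr)
    pred = predict(X_te[:, cols])
    return float(np.std(y_te - pred)), float(np.corrcoef(y_te, pred)[0, 1])
```

Calling `evaluate_reduced` in a loop with `n_keep` decreasing from the full count (after removing "Random" from the ranking) reproduces the shape of a table like Table 7: one row of error spread and correlation per reduced factor set.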
The ML method was more successful than the statistical approach in identifying significant factors for vulnerability score calculation. The ML method also showed that it could strategically identify the highest-ranked vulnerability factors for designing program interventions to reduce a specific community's vulnerability in resource-constrained situations, without having to consider all significant factors. After the non-significant factors were removed, a mobile application was developed and later used to assess vulnerability scores. The user interface (UI) and user experience (UX) design received particular attention so that people of any age with little educational background can use the application. Farmers can log into the application and answer the questions corresponding to the top 9 vulnerability factors discussed in the previous section. To prevent false input, unrealistically large integer values in any input field can be filtered out. The responses of each individual are then sent to the central virtual server, where the vulnerability score of each household is calculated using the Bayesian Ridge Regression model. To reduce agricultural vulnerability, it is important to consider the factors identified as significant, such as soil, the existence of groundwater, and crop diseases, together with the physical processes of the area that give rise to agricultural vulnerability; this would ultimately help planners and policymakers develop sustainable agricultural plans for the coastal communities of Bangladesh (Bathrellos et al., 2013). Vulnerability assessment and planning are highly interdependent.
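The input filtering and server-side scoring described above could look roughly like the following sketch. The field names and bounds in `FIELD_BOUNDS` are hypothetical, since the paper does not publish its limits or its question identifiers. The scoring step relies on the fact that a fitted Bayesian ridge model's mean prediction is linear, intercept plus coefficients times inputs (scikit-learn's `BayesianRidge` exposes these as `intercept_` and `coef_`); here the coefficients are simply passed in as a dict.

```python
# Hypothetical per-field bounds; the paper does not publish its actual limits.
FIELD_BOUNDS = {
    "monthly_savings": (0, 1_000_000),
    "damaged_land_pct": (0, 100),
}


def validate_response(response, bounds=FIELD_BOUNDS):
    """Return a list of error messages; an empty list means the response is accepted."""
    errors = []
    for field, (lo, hi) in bounds.items():
        if field not in response:
            errors.append(f"{field}: missing")
            continue
        value = response[field]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{field}: not a number")
        elif not lo <= value <= hi:
            errors.append(f"{field}: {value} outside [{lo}, {hi}]")
    return errors


def score_response(response, intercept, coefs, bounds=FIELD_BOUNDS):
    """Linear score from fitted coefficients, e.g. a Bayesian ridge model's
    intercept_ and coef_; rejects the response if validation fails."""
    errors = validate_response(response, bounds)
    if errors:
        raise ValueError("; ".join(errors))
    return intercept + sum(coefs[f] * response[f] for f in coefs)
```

Rejecting a submission with explicit messages, rather than silently clipping values, lets the app ask the farmer to correct the entry before anything is stored.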
To assess farmers' vulnerability more accurately, it is important to consider environmental, social, economic, and other relevant factors, such as culture, ethical issues, and a proper understanding of the static relationships between people and nature, when designing such an intervention (Ford et al., 2018; Bathrellos et al., 2017). A proper method of data collection, such as an anthropological approach where possible, could be applied to collect such important information from the villagers. If the significant factors for vulnerability are identified properly, it will be easier for policymakers and planners to allocate scarce resources sustainably. The mobile application, for example, provides a dynamic vulnerability score, because vulnerability in reality is not static but changes continuously. The application also allows scores to be updated as frequently as relevant stakeholders require to address a particular situation. As the application will be available to farmers for data input, the collected information is expected to be more accurate, while expenses stay low because no physical involvement is required for data collection and storage. The mobile application-based rapid vulnerability assessment technique can also be applied to other livelihood aspects simply by identifying the significant factors responsible for vulnerability, since these factors are highly location- and subject-specific.
Agricultural production in the coastal regions of Bangladesh is highly vulnerable to changes in weather conditions. The prevailing situation demands a dynamic agricultural plan that considers the future consequences of climate change and vulnerability to natural hazards. The study identified the most vulnerable agriculture-dependent households using a rapid and cost-effective method.
The study focused on developing a practical and community-friendly application to assess vulnerability scores and to aid local government institutions and similar organizations with planning and management. Spatial maps were also prepared to show the locations of vulnerable households and the extent of their vulnerability (Tambe et al., 2011). The study also described how the vulnerability scores can be applied, which can later help in understanding the vulnerability issues of other livelihood options, such as agriculture, fishing, and health. Climate change-related exposures are global issues, whereas sensitivity and adaptive capacity are local issues that can be addressed with local interventions (Wilbanks, 2003; Hess et al., 2008). Community responses to the identified sensitivity and adaptive capacity factors varied significantly because of the diverse developmental profiles and geographical characteristics of the study areas. Statistical and machine learning methods were initially used to filter the most significant factors. The ML method was more successful in this respect and was used to develop a mobile application that supports vulnerability assessment by identifying the factors that require immediate government intervention. The Climate Resilient Vulnerability Assessment Tool, which provides a reliable and faster process for identifying vulnerability, could bring about a substantial change in resource distribution and, more importantly, in the sustainable allocation of scarce resources. The vulnerability assessment process is usually highly technical, whereas the mobile application, with its built-in scoring system, offers a more user-friendly approach and can be used independently. The research findings provide an important starting point for future research into crop yield vulnerability to climate variability and change.
It is expected that policymakers and other stakeholders can use the output of the study to better design and target climate change adaptation policies and programs that ensure sustainability.
References
Permutation importance: A corrected feature importance measure
Potential suitability for urban planning and industry development by using natural hazard maps and geological-geomorphological parameters
Assessment of rural community and agricultural development using geomorphological-geological factors and GIS in the Trikala prefecture
Suitability estimation for urban development using multi-hazard assessment map
Random forest
Trial design, measurement, and analysis of clinical investigations
Study on Livelihood Systems Assessment, Vulnerable Groups Profiling, and Livelihood Adaptation to Climate Hazard and Long-term Climate Change in Drought Prone Areas of North-West Bangladesh. Centre for Environmental Geographic Information Services (CEGIS) and Food and Agriculture Organization of the United Nations (FAO)
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '16
Applied Regression Analysis
The dynamics of vulnerability: Locating coping strategies in Kenya and Tanzania
Regression: Models, Methods, and Applications
Vulnerability and its discontents: The past, present, and future of climate change vulnerability research
The Vulnerability Sourcebook: Concept and guidelines for standardised vulnerability assessments. Bonn and Eschborn: Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ) GmbH
Extremely randomized trees
Climate Change
Climate Change
Climate variations: Farming systems and livelihoods in the high barind tract and coastal areas of Bangladesh
Total vulnerability of the littoral zone to climate change-driven natural hazards in north Brittany
Climate, water, and agriculture
Climate change and extreme weather events: Implications for food production, plant diseases, and pests. Global Change and Human Health
Risk assessment for the sustainability of coastal communities: A preliminary study
Consumer Behavior
This research was supported and funded by the Climate Change Trust Fund (CCTF), the Government of the People's Republic of Bangladesh, and the Department of Environment (DoE). The authors declare that there is no conflict of interest.