key: cord-0058654-vrq1ux3o
authors: Capolupo, Alessandra; Monterisi, Cristina; Caporusso, Giacomo; Tarantino, Eufemia
title: Extracting Land Cover Data Using GEE: A Review of the Classification Indices
date: 2020-08-19
journal: Computational Science and Its Applications - ICCSA 2020
DOI: 10.1007/978-3-030-58811-3_56
sha: bcf750517a09efd5ce6a7280526623306d31c137
doc_id: 58654
cord_uid: vrq1ux3o

Land Use/Land Cover (LU/LC) data include most of the information needed to tackle many environmental issues. Remote sensing is widely recognized as the most significant means of extracting them, and many techniques are available for this purpose. Among the several classification approaches, the index-based method has been recognized as the best one to gather LU/LC information from different image sources. The present work is intended to assess its performance by exploiting the potential of Google Earth Engine (GEE), a cloud-processing environment introduced by Google to store and handle large amounts of information. Twelve atmospherically corrected Landsat satellite images were collected over the experimental site of Siponto, in Southern Italy. Once the cloud masking procedure was completed, a large number of indices were implemented and compared in the GEE platform to detect sparse and dense vegetation, water, bare soil and built-up areas. Among the tested algorithms, only the NDBaI2, CVI, WI2015, SwiRed and STRed indices showed satisfactory performance. NDBaI2 was able to extract all the main LU/LC categories with a high Overall Accuracy (OA) (82.59%), whereas the other mentioned indices reached a higher accuracy but can identify just a few classes. The STRed index performs interestingly, since it has a very high OA and can extract mining areas, water and green zones. GEE appeared to be the best solution for managing geospatial big data.

New space-borne sensors are continuously introduced, massively increasing the Earth observation (EO) datasets, which nowadays play a key role in landscape planning and environmental monitoring. This results in a continuous stream of geospatial data that must be stored and handled and that produces, in turn, novel geospatial data to be managed. As reported by the Open Geospatial Consortium (OGC), the global EO archive exceeded one exabyte during 2015. Therefore, geospatial data are generally considered big data and, over the last few years, new cloud computing platforms have been introduced to overcome the limitations of common desktop software. In fact, desktop environments need considerable computational power to integrate data acquired by different sensors and providing complementary information [1]. Thus, such software requires a great amount of time to accomplish a given operational task. Conversely, cloud platforms save acquisition and processing time by exploiting the potential of the cloud. Among them, Google Earth Engine (GEE) (https://earthengine.google.org), designed and realized by Google, is currently the most promising cloud computing environment [2]. This is essentially due to its main properties, highlighted by [1] and [2]:

1. the presence of an Application Programming Interface (API) that helps users interact with the integrated data catalogue, consisting of publicly available geospatial datasets. This catalogue is continuously updated: about 6000 scenes, acquired by several sensors, are uploaded daily;
2. the presence of a High-Performance Computing (HPC) infrastructure intended to speed up the processing phase by running the algorithms on many processors in parallel. This addresses the issues linked to storing and handling geospatial big data;
3. the presence of an interactive programming environment in which users can develop specific code to meet their needs.

Therefore, GEE appears extremely useful for managing geospatial big data and, mainly, for creating Land Use/Land Cover (LU/LC) maps at the global scale. Such thematic maps can be obtained through several methods, such as classification indices [9] [10] [11] [12], maximum likelihood supervised classification (ML) [3], machine learning algorithms (MLAs) [4] and the object-based image analysis (OBIA) approach [5] and [6]. Although none of the above-mentioned approaches is problem-free or always able to produce the best result, [7] demonstrated that the index-based classification technique, built on the combination of diverse spectral bands, is the best one for automatically extracting LU/LC information from satellite data in multi-temporal and multi-sensor analyses. Therefore, over the years, several indices have been developed to quickly extract specific LU/LC categories according to particular needs. Although some review papers have described their potential and limitations, they are no longer exhaustive, since the indices are continuously updated.

Therefore, this paper explores the potential of 85 indices to automatically extract LU/LC information on the pilot site of Siponto, a historical municipality in the Puglia Region (Southern Italy). Both traditional and new indices were investigated in order to assess and compare their performance and, thus, identify the optimal index to distinguish each LU/LC category. Specifically, 59 indices developed to detect sparse and dense vegetation, 5 introduced to classify water, 7 presented to distinguish bare soil and the remaining 14 indices related to the identification of built-up areas were tested on twelve Landsat images belonging to three different missions. Each mission is equipped with a different sensor, which makes it possible to examine the indices from both a multi-sensor and a multi-temporal perspective.

The paper is composed of three main sections: the first one, titled "Materials and methods", describes the selected classification indices and the applied processing environment; the second one, "Results and discussion", reports the obtained outcomes in order to identify the optimal index for extracting the needed LU/LC data; finally, the third section summarizes the conclusions of the work, describing the performance of the detected optimal solution and of the platform.

The coastline of Siponto in the Puglia Region (Southern Italy) was chosen as the study area (Fig. 1). This area has played a key role since its foundation (194 BC), inasmuch as it was considered the most important hub from both the commercial and the maritime perspectives. Nevertheless, because of two devastating earthquakes that occurred in 1223 and 1255, the site underwent a gradual depopulation process and, consequently, the swamping of its seaport. From then on, it was mainly exploited for agricultural purposes thanks to the presence of a dense network of irrigation ditches. However, over the last few years, agriculture has lost its significance in favor of tourism, driven by the beauty of the landscape and the climate conditions. Therefore, it is a meaningful site for evaluating the performance of classification indices, owing to the changes experienced by its territory over the years and the heterogeneity of its landscape.

Selected traditional indices, described in depth in the following section, were tested on twelve Landsat images, listed in Table 1. They were collected from three different Landsat missions (5, 7 and 8), covering a period of 17 years (2002-2019). Moreover, one image for each season was acquired in order to carry out a multi-sensor, multi-temporal and multi-season evaluation. All the collected images composing the dataset were supplied in the Universal Transverse Mercator (UTM) projection and the World Geodetic System (WGS84) datum. An additional criterion was introduced to select the data: a cloud cover threshold equal to 20% was set.

Traditional desktop software, commonly used in geospatial analysis, shows numerous limitations in storing and managing geospatial big data [1]. Therefore, in the last few years, Google realized a new cloud computing platform to go beyond these issues and to optimize the processing phase: Google Earth Engine (GEE) [1]. It makes it possible both to download the selected images and to process them by exploiting the potential of the cloud, saving acquisition and processing time [7]. Moreover, its most valuable property is that satellite images can be downloaded in raw as well as preprocessed format. This reduces the time needed to acquire and process geospatial big data [2] and [1]. GEE also provides a JavaScript Application Programming Interface (API), allowing users to carry out any operation, such as combining spectral bands to compute classification indices.

Taking advantage of GEE capabilities, atmospherically corrected images were selected and all the subsequent processing steps (Fig. 1) were implemented in the GEE environment and performed in the cloud. Once the selected images were collected in a preprocessed format, the cloud masking procedure was conducted, where needed, by adopting a proper filter already implemented in GEE, as proposed by [8]. This filter, based on the information provided by the quality assessment (QA) band, renders cloudy pixels transparent, so that they are not considered in the computation of the classification indices. Conversely, the orthorectification process was not required since the USGS provided satisfactory images. Classification indices were computed on the outcome of the pre-processing phase (https://developers.google.com/earth-engine) (Fig. 2).
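The QA-based cloud filter described above can be illustrated with a minimal Python sketch. It is not the filter used in the study, which relied on a routine already implemented in GEE. The bit positions (3 for cloud shadow, 5 for cloud) are an assumption based on the Landsat Collection 1 surface reflectance `pixel_qa` convention, and the function names `is_clear` and `mask_pixels` are hypothetical.

```python
# Assumed bit positions in the QA band (not stated in the paper).
CLOUD_SHADOW_BIT = 3
CLOUD_BIT = 5


def is_clear(qa_value, bits=(CLOUD_SHADOW_BIT, CLOUD_BIT)):
    """Return True when none of the flagged bits is set in the QA value."""
    return all(((qa_value >> bit) & 1) == 0 for bit in bits)


def mask_pixels(values, qa_values, bits=(CLOUD_SHADOW_BIT, CLOUD_BIT)):
    """Replace flagged pixels with None so they are skipped in later steps."""
    return [
        value if is_clear(qa, bits) else None
        for value, qa in zip(values, qa_values)
    ]


if __name__ == "__main__":
    reflectance = [0.12, 0.45, 0.30]
    qa = [0, 1 << CLOUD_BIT, 1 << CLOUD_SHADOW_BIT]
    # Only the first pixel is kept; the other two are flagged as cloud or shadow.
    print(mask_pixels(reflectance, qa))
```

Passing the bit positions as a parameter keeps the sketch usable with other QA layouts, since the exact encoding depends on the product being processed.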

The index-based classification approach is widely and efficiently applied to quickly generate LU/LC thematic maps from satellite images [7]. This property makes it more attractive than the other classification methods both when a large volume of geospatial data must be investigated and when global maps are required [7]. Therefore, an enormous number of indices have been introduced over the years. Most of them are devoted to detecting a specific LU/LC class and just a few can simultaneously identify several categories. Indices integrate the information provided by one or more spectral bands and extract Earth's features according to their spectral signature, commonly considered a fingerprint, since each element has a different trend even if objects belonging to the same LU/LC class show a similar signature. This is due to the ability of features to absorb, reflect and transmit energy. Thus, each band is essential to bring out specific properties, as highlighted by several research activities. For instance, [9] highlighted the relevance of the Red band for discriminating vegetated areas because of its dependence on the energy absorbed by chlorophyll; conversely, [10] demonstrated the significance of the thermal infrared (TIR) and shortwave infrared (SWIR) bands for classifying bare soil and built-up areas [11]; the SWIR band is also essential for extracting information on sparse and dense vegetation because of its link with the amount of water in leaves [12].
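The band-combination idea can be sketched in Python as follows. The example uses the generic normalized-difference form, (a - b) / (a + b), which many (not all) spectral indices share; it does not reproduce any of the specific equations of this paper. The function names and the cutoff value are hypothetical, and `None` stands for a masked pixel.

```python
def normalized_difference(band_a, band_b):
    """Compute (a - b) / (a + b) pixel by pixel; masked or undefined pixels give None."""
    result = []
    for a, b in zip(band_a, band_b):
        if a is None or b is None or a + b == 0:
            result.append(None)
        else:
            result.append((a - b) / (a + b))
    return result


def threshold(index_values, cutoff):
    """Label each pixel True when its index value exceeds the cutoff (hypothetical rule)."""
    return [None if v is None else v > cutoff for v in index_values]


if __name__ == "__main__":
    nir = [0.50, 0.20, None]
    red = [0.10, 0.25, 0.05]
    # NDVI-like index: high where near-infrared reflectance dominates red.
    index = normalized_difference(nir, red)
    print(index)
    print(threshold(index, 0.3))
```

In practice the cutoff separating classes has to be chosen per index and per scene; the value 0.3 is used here for illustration only.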

Therefore, 85 conventional indices, listed in Table 2, were computed and their performance was compared in order to bring out their potential in classifying Landsat satellite images. Although all of them were quickly implemented in the GEE environment, just a few made an appreciable contribution: Normalized Difference Bareness Index (version 2) (NDBaI2) (Eq. 1) [13], SwirTirRed (STRed) index (Eq. 2) [14], SwiRed

The "stratified random sampling point" method was adopted to generate a multitemporal reference dataset composed of a total of 11,245 pixels, used as testing samples, as suggested by [17]. The points were distributed among the LU/LC categories in proportion to their area. Consequently, they were allocated as follows: 1328 pixels to the water class, 492 pixels to built-up areas, 151 pixels to mining areas, 3165 pixels to bare soil, and 755 and 924 pixels to sparse and dense vegetation, respectively. Overall Accuracy (OA), Producer's Accuracy (PA), and User's Accuracy (UA) were then estimated for each obtained thematic map [18] and [19]. All these metrics take a value between 0 and 1: the closer the value is to 1, the better the accuracy. Once the performance of the classification indices was estimated, they were compared in order to detect the optimal indices for identifying the different LU/LC classes.
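The three accuracy metrics can be computed from a confusion matrix, as in the following Python sketch. It follows the standard definitions (OA as the share of correctly classified samples, PA per reference class, UA per mapped class) and is not taken from the authors' code. The function name, the matrix orientation (rows are reference classes, columns are mapped classes) and the sample counts are assumptions made for illustration.

```python
def accuracy_metrics(matrix):
    """Return (OA, PA list, UA list) for a square confusion matrix.

    matrix[i][j] is the number of samples whose reference class is i
    and whose mapped class is j (assumed orientation).
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    row_totals = [sum(matrix[i]) for i in range(n)]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]

    overall = correct / total
    # Producer's Accuracy: correct samples over reference samples of the class.
    producers = [
        matrix[i][i] / row_totals[i] if row_totals[i] else None for i in range(n)
    ]
    # User's Accuracy: correct samples over samples mapped to the class.
    users = [
        matrix[j][j] / col_totals[j] if col_totals[j] else None for j in range(n)
    ]
    return overall, producers, users


if __name__ == "__main__":
    # Illustrative two-class counts, not data from the study.
    confusion = [[40, 10], [5, 45]]
    oa, pa, ua = accuracy_metrics(confusion)
    print(oa, pa, ua)
```

Classes with no reference or no mapped samples return `None` rather than raising a division error, which keeps the function usable for indices that detect only a subset of the classes.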

This paper evaluates the potential of 85 conventional indices (Table 2) in distinguishing LU/LC classes from Landsat satellite images using the GEE platform. To this end, twelve atmospherically corrected images, belonging to the Landsat 5, 7 and 8 missions, were selected according to the criteria reported in Sect. 2.1. Selecting these missions makes it possible to assess the performance of the indices from a multi-sensor and multi-temporal perspective, since each of them is equipped with a different sensor (Table 1) and covers a different historical period. Moreover, the dependence on season was explored as well by acquiring, for all the considered Landsat missions, an image for winter, spring, summer and autumn, respectively (Table 1). The performance of the indices was assessed on the study area of Siponto, a historical city of the Puglia Region (Southern Italy).

The best Overall Accuracy (OA) of each index and the detected LU/LC classes are reported in Table 2. Only the Automated Water Extraction Index (AWEI) [20], the Normalized Difference Bareness Index (NDBaI) [13] and the Normalized Difference Bareness Index (version 2) (NDBaI2) [13] were able to extract the maximum number of categories: bare soil, built-up areas, dense and sparse vegetation, water and mining areas. Among them, NDBaI2 showed the highest OA (82.59%), while AWEI had the worst OA value (68.04%). This means that, although the three mentioned indices can automatically classify the whole study area, NDBaI2 showed the best performance. Its OA was slightly reduced by its difficulties in distinguishing sparse from dense vegetation.

The opposite performance was shown by the Misra Yellow Vegetation Index (MYVI) [21] and the Triangular Greenness Index (TGI) [22]: neither could detect any class in the experimental site. These observations are supported by the literature, since [23] demonstrated that MYVI encounters some difficulties in detecting LU/LC information because it does not consider atmosphere-soil-vegetation interactions. Similarly, TGI is strongly influenced by the scale and by the chlorophyll content in leaves and, consequently, it is appropriate in just a few cases. Therefore, although it is recognized as the optimal index to classify "green areas" from high-resolution images, it is completely inadequate for extracting LU/LC information from medium- and low-resolution input data [24]. On the contrary, the other computed indices can extract just a few LU/LC categories, in line with the purpose of their creation. For instance, among the indices introduced to detect water, the Water Index 2015 (WI2015) [25] showed the best performance (99.81%), while the Composite Vegetation Index (CVI) [15] presented the best performance (98.0%) in discriminating vegetated areas and the SwiRed index [14] had the best OA (97.76%) in detecting built-up areas. A very high accuracy (94.71%) is also obtained by the SwirTirRed (STRed) index [14], which is able to assess different LU/LC categories, such as mining areas, water, and sparse and dense vegetation.

The GEE cloud computing platform played a key role in this research because it made it possible to download atmospherically corrected satellite images and to automate the processing step, with a consequent reduction of acquisition and processing time. Once specific code had been programmed, the environment automatically extracted the LU/LC information from the selected satellite images, which were analyzed separately. Thus, this study confirms the ability of the GEE platform to process geospatial big data, overcoming the limitations of commonly applied desktop software. Besides minimizing the acquisition and processing time thanks to the possibility of implementing customized code, it exploits the cloud capacity to store and manage a large amount of data without requiring high local computational power. Over the years, several indices have been introduced to bring out LU/LC data. Although each of them shows different performance, an optimal index for all LU/LC classes has not been detected yet. Thus, this research explored the performance of 85 indices. All investigated indices could be computed automatically in a short time, independently of the size of the study area. Nevertheless, just three of them (AWEI, NDBaI and NDBaI2) were able to detect the main LU/LC categories (bare soil, built-up areas, water, mining areas, sparse and dense vegetation). Among these, the best performance, in terms of number of detected classes and Overall Accuracy, was shown by the NDBaI2 index. The optimal index for revealing each LU/LC category was assessed as well: CVI, WI2015 and SwiRed were the best indices to detect "green areas", water and built-up areas, respectively. An interesting performance was also presented by the STRed index, since it can quickly distinguish water, vegetated and mining areas with a highly accurate outcome.
The GEE environment appeared to be the best solution to automate the processing step, speeding up the whole procedure. Therefore, this study also confirms the potential of GEE in handling geospatial big data, reducing acquisition and processing time as well as operational cost.

Google Earth Engine applications since inception: usage, trends, and potential

Google Earth Engine: planetary-scale geospatial analysis for everyone

Maximum likelihood method modified in estimating a prior probability and in improving misclassification errors

Land cover and land use classification performance of machine learning algorithms in a boreal landscape using Sentinel-2 data

A novel approach for detecting agricultural terraced landscapes from historical and contemporaneous photogrammetric aerial photos

A class-oriented strategy for features extraction from multidate ASTER imagery

Multitemporal settlement and population mapping from Landsat using Google Earth Engine

Multitemporal cloud masking in the Google Earth Engine

Estimating plant traits of grasslands from UAV-acquired hyperspectral images: a comparison of statistical approaches

Assessment of flood hazard areas at a regional scale using an index-based approach and analytical hierarchy process: application in Rhodope-Evros region

An assessment of Landsat TM band 6 thermal data for analysing land cover in tropical dry forest regions

Application of hyperspectral imaging sensor to differentiate between the moisture and reflectance of healthy and infected tobacco leaves

A new bare-soil index for rapid mapping developing areas using Landsat 8 data

Landsat images classification algorithm (LICA) to automatically extract land cover information in Google Earth Engine environment

Detection of spatio-temporal changes of vegetation in coastal areas subjected to soil erosion issue

Estimating daily gross primary production of maize based only on MODIS WDRVI and shortwave radiation data

A global reference database from very high resolution commercial satellite data and methodology for application to Landsat derived 30 m continuous field tree cover data

Accuracy assessment of per-field classification integrating very fine spatial resolution satellite imagery with topographic data

Rules and standards for spatial data quality in GIS environments

Automated water extraction index: a new technique for surface water mapping using Landsat imagery

Kauth-Thomas brightness and greenness axes

Comparing prediction power and stability of broadband and hyperspectral vegetation indices for estimation of green leaf area index and canopy chlorophyll density

Use of normalized difference bareness index in quickly mapping bare areas from TM/ETM+

Performance evaluation of vegetation indices using remotely sensed data

Comparing Landsat water index methods for automated water classification in eastern Australia

AFRI-aerosol free vegetation index

Estimation of forest leaf area index using vegetation indices derived from hyperion hyperspectral data

A modified soil adjusted vegetation index

Atmospherically resistant vegetation index (ARVI) for EOS-MODIS

Adjusting the tasselled-cap brightness and greenness factors for atmospheric path radiance and absorption on a pixel by pixel basis

Evaluation of vegetation indices and a modified simple ratio for boreal applications

The vegetative index number and crop identification

Remote sensing image-based analysis of the relationship between urban heat island and land use/cover changes

Mapping urban bare land automatically from Landsat imagery with a simple index

A new spectral index for extraction of built-up area using Landsat-8 data

Urban built-up area extraction and change detection of Adama municipal area using time-series Landsat images

BCI: A biophysical composition index for remote sensing of urban environments

Using the MIR bands in vegetation indices for the estimation of grassland biophysical parameters from satellite remote sensing in the Alps region of Trentino (Italy)

A new spectral index for the extraction of built-up land features from Landsat 8 satellite imagery

Fusing high-spatial-resolution remotely sensed imagery and OpenStreetMap data for land cover classification over urban areas

Identification and area measurement of the built-up area with the built-up index (BUI)

Use of normalized difference built-up index in automatically mapping urban areas from TM imagery

Combinational biophysical composition index (CBCI) for effective mapping biophysical composition in urban areas

Analysis of impervious surface and its impact on urban heat environment using the normalized difference impervious surface index (NDISI)

Relationships between leaf chlorophyll content and spectral reflectance and algorithms for non-destructive chlorophyll assessment in higher plant leaves

Comparison of time series tasseled cap wetness and the normalized difference moisture index in detecting forest disturbances

Using thematic mapper data to identify contrasting soil plains and tillage practices

A clustering separation measure

Monitoring vegetation systems in the Great Plains with ERTS

Applying built-up and bare-soil indices from Landsat 8 to cities in dry climates

The use of the Normalized Difference Water Index (NDWI) in the delineation of open water features

A spectral method for determining the percentage of green herbage material in clipped samples

Influences of canopy architecture on relationships between various vegetation indices and LAI and FPAR: a computer simulation

Enhanced built-up and bareness index (EBBI) for mapping built-up and bare land in an urban area

Optimization of soil-adjusted vegetation indices

Enhanced normalized difference index for impervious surface area estimation at the plateau basin scale

Estimating PAR absorbed by vegetation from bidirectional reflectance measurements

Sensitivity of the enhanced vegetation index (EVI) and normalized difference vegetation index (NDVI) to topographic effects: a case study in high-density cypress forest

Remote mapping of standing crop biomass for estimation of the productivity of the shortgrass prairie

Use of a green channel in remote sensing of global vegetation from EOS-MODIS

A soil-adjusted vegetation index (SAVI)

"Ghost cities" identification using multi-source remote sensing datasets: a case study in Yangtze River Delta

Using Landsat digital data to detect moisture stress in corn-soybean growing regions

The generalized difference vegetation index (GDVI) for dryland characterization

Estimation of canopy-average surface-specific leaf area using Landsat TM data

GEMI: a non-linear index to monitor global vegetation from satellites

Derivation of leaf area index from quality of light on the forest floor

Spatially located platform and aerial photography for documentation of grazing impacts on wheat

Aerial color infrared photography for determining early in-season nitrogen requirements in corn

Transformed difference vegetation index (TDVI) for vegetation cover mapping

Applicability of green-red vegetation index for remote sensing of vegetation phenology

A visible band index for remote sensing leaf chlorophyll content at the canopy scale

Spectral indices in n-space

Relation between social and environmental conditions in Colombo Sri Lanka and the urban index estimated by satellite remote sensing data

A new index-based built-up index (IBI) and its eco-environmental significance

Vegetation and soil lines in visible spectral space: a concept and technique for remote estimation of vegetation fraction

Calculating the vegetation index faster

Study on remote sensing monitoring of vegetation coverage in the field

Efficient segmentation of urban areas by the VIBI

Hyperspectral vegetation indices and novel algorithms for predicting green LAI of crop canopies: modeling and validation in the context of precision agriculture

The MERIS global vegetation index (MGVI): description and preliminary application

Using WorldView-2 Vis-NIR multispectral imagery to support land mapping and feature extraction using normalized difference index ratios

Snow monitoring using remote sensing data: modification of normalized difference snow index

The tasselled cap-a graphic description of the spectral temporal development of agricultural crops as seen by Landsat

Modification of normalised difference water index (NDWI) to enhance open water features in remotely sensed imagery