title: A Comparative Evaluation of Machine Learning Algorithms for the Prediction of R/C Buildings' Seismic Damage
authors: Demertzis, Konstantinos; Kostinakis, Konstantinos; Morfidis, Konstantinos; Iliadis, Lazaros
date: 2022-03-25

Seismic assessment of buildings and determination of their structural damage is at the forefront of modern scientific research. To date, several researchers have proposed a number of procedures in an attempt to estimate the damage response of buildings subjected to strong ground motions without conducting time-consuming analyses. These procedures, e.g. the construction of fragility curves, usually utilize methods based on the application of statistical theory. In the last decades, the increase of the computers' power has led to the development of modern soft computing methods based on the adoption of Machine Learning algorithms. The present paper attempts an extensive comparative evaluation of the capability of various Machine Learning methods to adequately predict the seismic response of R/C buildings. The training dataset is created by means of Nonlinear Time History Analyses of 90 3D R/C buildings with three different masonry infills' distributions, which are subjected to 65 earthquakes. The seismic damage is expressed in terms of the Maximum Interstory Drift Ratio. A large-scale comparison study of the most efficient Machine Learning algorithms is carried out. The experimentation shows that the LightGBM approach produces training stability, high overall performance and a remarkable coefficient of determination in the prediction of the buildings' damage response. Given the urgency of the issue, civil protection mechanisms need to incorporate into their technological systems scientific methodologies and appropriate technical or modeling tools, such as the proposed one, which can offer valuable assistance in making optimal decisions.

One of the most important, but also challenging, scientific issues in the field of earthquake engineering is the estimation of the structural response of buildings subjected to earthquake ground motions. To date, numerous research studies have dealt with this issue and proposed a vast variety of methods aiming at the seismic assessment of structures. Many of these methods focus on the rapid determination of the earthquake damage response and on the seismic vulnerability assessment of a large number of buildings without performing computationally hard analyses, in an attempt to overcome the difficulties resulting from the time-consuming conduction of demanding nonlinear analysis methods (e.g. [1-7]). These procedures usually utilize methods based on the application of statistical theory. In the last decades, the increase of the computers' power has led to the development of modern statistical methods based on the adoption of Machine Learning (ML) algorithms. The up-to-date research on these methods revealed that they can provide a fast, reliable, and computationally easy way for the screening of vulnerable structures and that they can be used as an efficient alternative to the conduction of demanding numerical simulations (e.g. [8-17]). The achievement of this goal is made through the creation of a relationship mapping that emulates the structure's behavior.
ML is one of the most important scientific fields of the new era and includes algorithmic methods that can learn from data. It combines ideas from statistics and probability theory to make accurate future predictions, while mathematical optimization techniques are used to improve the performance of a system. There are four distinct categories of ML with independent learning characteristics: a) information-based learning methodologies, which employ concepts from information theory to build models, b) similarity-based learning methods, which build models based on comparing features of known and unknown objects or measuring the similarity between past and forthcoming occurrences, c) probability-based learning techniques, which build models based on measuring how likely it is that some event will occur and, finally, d) error-based learning, which builds models based on minimizing the total error over a set of training instances. On the other hand, based on how the data are used, there are three main categories of ML algorithms: a) Supervised Learning, in which the training process of the algorithm is based on samples of labeled data, b) Unsupervised Learning, which is the ability of the algorithm to detect patterns in unknown data and, finally, c) Reinforcement Learning, which employs algorithms for discovering the environment based on rewarded actions.

Several research studies have proved that ML methods, mainly Artificial Neural Networks (ANNs), can effectively assess the seismic response of complex structures. A comprehensive literature review of the most commonly used and newly developed ML techniques for the assessment of buildings' damage has been made by Harirchian et al. [18], by Xie et al. [19] and by Sun et al. [20]. A brief review of some of the most important research works is given below. Molas and Yamazaki [21] were among the first researchers who studied the ability of ANNs to adequately predict the seismic damage of wooden structures. At almost the same time, Stephens and VanLuchene [22] used trained ANNs in order to estimate the damage level of R/C structures expressed by means of the Park and Ang damage index. Rafiq et al. [23] investigated various types of ANNs (Multi-layer Perceptron, Radial Basis Networks and normalized Radial Basis Networks), aiming at utilizing them to solve engineering problems. In another significant research study, conducted by Latour and Omenzetter [24], the ability of ANNs to reliably estimate the earthquake-induced damage of planar R/C frames was investigated by using nonlinear time history analyses' results. A similar investigation was carried out by Arslan [25], who studied the impact of certain structural parameters on the damage level of regular R/C buildings under seismic ground motions. To this end, a dataset created through the application of nonlinear pushover analyses was used to train the ANNs. Rofooei et al. [26] utilized data from nonlinear dynamic analyses of 2D moment resisting R/C frames in order to investigate the influence of structural and seismic features on the ANNs' performance. The correlation between the interstory drift ratios and the plastic hinge rotation of R/C shear walls was studied by Vafei et al. [27], who used ANNs trained by results taken from nonlinear modal pushover analyses.
Kia and Sensoy [28] investigated the impact of certain seismic parameters on the ability of ANNs to assess the seismic damage level of R/C frames, based on nonlinear time history analyses of a 2D moment resisting R/C frame. Kostinakis and Morfidis conducted a series of research studies [29-32] in an attempt to estimate the reliability of ANNs as regards the estimation of the seismic response of R/C buildings. In their studies, they also examined the number and the combination of input parameters through which an optimum prediction of the damage state of R/C buildings can be achieved, the influence of the parameters used for the configuration of the networks' training on the efficiency of their predictions, as well as the impact of the presence of masonry infills on the results. Burton et al. [33] adopted ML methods in an attempt to estimate the aftershock collapse vulnerability of buildings utilizing mainshock intensity, seismic response and certain damage indicators. More recently, Zhang et al. [34] proposed a ML framework for the assessment of the post-earthquake structural safety of a 4-story R/C special moment frame building. In another research study conducted by the same research team [35], several ML methods were utilized in order to adequately estimate the residual structural capacity of damaged tall buildings. In another paper [36], a novel framework for the earthquake vulnerability assessment of buildings via Rapid Visual Screening is proposed using type-2 fuzzy logic. Nguyen et al. [37] adopted ANNs and Extreme Gradient Boosting methods for the prediction of planar steel moment frames' seismic response. In particular, the researchers used a comprehensive dataset for the training and testing of the ML models, created by nonlinear dynamic analyses of 36 steel moment frames with different structural characteristics subjected to a large number of ground motions. In a more recent study, Li et al. [38] proposed a method that combines the interstory drift spectrum and a deep learning method to estimate the maximum interstory drift ratio of buildings.

The results of most research studies established the ability of ML techniques to successfully predict the seismic damage. However, all of the abovementioned researchers adopted only one or a few ML methods in their studies; that is, no study has attempted to utilize a large number of ML methods in order to comparatively evaluate their efficiency in assessing the damage response with adequate reliability. The present paper aims at an extensive comparative evaluation of a large number of Machine Learning algorithms for the reliable prediction of 3D R/C buildings' seismic response. In order to accomplish this aim, a large training dataset consisting of 30 R/C buildings with different structural parameters (the number of stories, the structural eccentricity and the ratio of base shear received by R/C walls (if they exist) along the two orthogonal horizontal axes) was selected. The buildings were designed on the basis of the provisions of EC2 [39] and EC8 [40]. For each one of these buildings three different configurations of their masonry infill walls were assumed (without masonry infills, with masonry infills in all stories and with masonry infills in all stories except for the ground story), leading to three different data subsets consisting of 30 buildings each. The selected buildings were analyzed for 65 appropriately chosen real earthquake records using Nonlinear Time History Analyses (NTHA).
As inputs to the Machine Learning methods, both seismic and structural parameters widely used in the literature were chosen. The well-documented Maximum Interstory Drift Ratio (MIDR) was selected as the damage index for the R/C buildings.

In this section the procedure adopted in order to formulate the problem in terms compatible with ML methods is presented. The procedure consists of the following steps:
• Generation of the training data set, which includes the selection of a large number of representative R/C buildings, the design of the buildings and the modeling of their inelastic properties, and the selection of an adequate number of seismic motions.
• Selection of the problem's input (structural and seismic) parameters.
• Conduction of NTHA, according to which the buildings are analysed for the selected earthquake records and their seismic response is determined. Subsequently, processing of the analyses' results in order to compute the values of an appropriate seismic damage index (in the present study the MIDR index), which is selected as the output parameter (target) of the ML procedures.

In order to fulfill the purposes of the present research study, a large training data set consisting of buildings with a variety of structural characteristics was considered. An attempt was made to select structures that are representative of the buildings designed and built with the aid of modern seismic codes and according to the common construction practice in European countries with regions of high seismicity. More specifically, a set of 30 R/C buildings was selected (see [30]). The buildings' structural system consists of members in two perpendicular directions (denoted as axes x and y). Moreover, they are rectangular in plan and regular in elevation and in plan according to the criteria set by EC8 [40]. The buildings possess different characteristics concerning the number of stories nst (story height: 3.2 m), the value of the structural eccentricity e0 (i.e. the distance between the mass center and the stiffness center of the stories) and the ratio of the base shear received by the walls along the two horizontal orthogonal directions (axes x and y): nvx and nvy. The values of these structural parameters for the selected buildings are given in Table 1. More details about the selected buildings can be found in [30].

Table 1. The values of the structural parameters of the selected R/C buildings

In the above table, Lx and Ly are the dimensions of the rectangular plans of the selected buildings and $e_0 = (e_{0x}^2 + e_{0y}^2)^{1/2}$, where $e_{0x}$, $e_{0y}$ are the structural eccentricities along axes x and y respectively. In order to investigate the impact of the masonry infills on the seismic response and damage of the buildings, for each one of the 30 structures three different assumptions about the distribution of the masonry infills were considered, leading to three different training subsets: (a) the subset denoted as ROW_FORM_BARE, consisting of the 30 buildings without masonry infills (bare structures), (b) the subset denoted as ROW_FORM_FULL-MASONRY, consisting of the 30 buildings with masonry infills uniformly distributed along the height (infilled structures) and (c) the subset denoted as ROW_FORM_PILOTIS, consisting of the 30 buildings with the first story bare and the upper stories infilled (structures with pilotis). Consequently, the total number of structures investigated herein is 30 different structural systems x 3 different distributions of masonry infills = 90.
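As an illustration only, the sketch below outlines how such a training table could be assembled; the column names (Htot, nvx, nvy, e0, IM_1 to IM_14), the `midr` helper and the random placeholder arrays are hypothetical stand-ins for the actual NTHA and ground-motion processing outputs, and the MIDR computation simply follows the usual maximum interstory drift ratio definition assuming a 3.2 m story height.

```python
import numpy as np
import pandas as pd

def midr(story_disp: np.ndarray, story_height: float = 3.2) -> float:
    """Maximum Interstory Drift Ratio of one frame from a (time steps x stories)
    displacement history: the largest |relative story drift| divided by the story
    height (the study takes the maximum among the perimeter frames)."""
    padded = np.column_stack([np.zeros(len(story_disp)), story_disp])
    drifts = np.diff(padded, axis=1)  # displacement of each story relative to the one below
    return float(np.max(np.abs(drifts)) / story_height)

structural_cols = ["Htot", "nvx", "nvy", "e0"]        # 4 structural inputs (Table 1)
seismic_cols = [f"IM_{k}" for k in range(1, 15)]      # 14 ground-motion parameters (Table 2)

rows = []
for building in range(90):        # 30 buildings x 3 masonry-infill configurations
    for record in range(65):      # 65 bidirectional earthquake records
        x_struct = np.random.rand(4)     # placeholder for the real structural values
        x_seismic = np.random.rand(14)   # placeholder for the computed seismic parameters
        disp = np.cumsum(np.random.randn(1000, 5) * 1e-3, axis=0)  # placeholder NTHA output
        rows.append(dict(zip(structural_cols + seismic_cols + ["MIDR"],
                             [*x_struct, *x_seismic, midr(disp)])))

dataset = pd.DataFrame(rows)      # 5850 rows: 18 input features and one MIDR target
```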
The three abovementioned subsets of the buildings, resulting from their different masonry infills' configurations, were trained separately with the same ML methods, in order to draw conclusions about possible differences in the predictive ability of the ML techniques caused by the influence of the infill walls on the seismic response. The 30 selected bare buildings were modeled, analyzed and designed according to the provisions of EC2 [39] and EC8 [40]. For the buildings' elastic modelling all recommendations of EC8 were followed (diaphragmatic behavior of the slabs, rigid zones in the joint regions of beams/columns and beams/walls, values of flexural and shear stiffness corresponding to cracked R/C elements). The buildings were classified as Medium Ductility Class (MDC) structures. The analysis and design were carried out with the aid of the modal response spectrum method, as defined in EC8. All buildings were designed for the combination of vertical loads 1.35G+1.50Q, as well as the seismic combination G+0.3Q±E (where G, Q are the dead and live loads, and E is the seismic action expressed by the simultaneous application of the design spectrum of EC8 along the directions of axes x and y). The design of the structural members was made following the provisions of EC2 and EC8, utilizing the professional program for R/C building analysis and design RAF [41].

After the elastic modeling and design of the bare buildings, the three subsets mentioned above (bare buildings, infilled buildings, buildings with pilotis) were created and their nonlinear behavior was simulated, in order to analyze them by means of NTHA. The modeling of the structures' nonlinear behavior was made using lumped plasticity models (plastic hinges at the column and beam ends, as well as at the base of the walls). The Modified Takeda hysteresis rule [42] was adopted in order to model the material inelasticity of the structural members. Moreover, the effects of axial load-biaxial bending moment (P-M1-M2) interaction at the column and wall hinges were taken into consideration. The yield moments of the R/C elements and the parameters necessary for the determination of the P-M1-M2 interaction diagram of the vertical R/C elements' cross sections were computed using the XTRACT software [43]. Regarding the infill walls' modeling, the equivalent diagonal strut model was adopted in the present study. This model is one of the most well-known and well-documented macro-models in the relevant literature [44-46]. It does not account for local failure of the infills, but it captures their participation in the building's global collapse mechanism, which is the main objective of the present study. In particular, each infill panel was modeled as a single equivalent diagonal strut with a stress-strain diagram according to the model proposed by Crisafulli [47] (Fig. 1). Figure 1 illustrates the simulation of the masonry infills based on the Crisafulli model, along with all the basic parameters used to define the properties of the diagonal struts. Note that the values of these parameters were computed with the aid of the code provisions given in EC6 [48].

Machine Learning methods are computational structures which are capable of approaching the solution of multi-parametric problems. This feature gives the flexibility to select the number of parameters (input parameters) through which a problem can be formulated. For the present investigation's purposes, both structural and seismic parameters were chosen in order to adequately describe the problem.
Considering the structural parameters, four macroscopic characteristics, which are considered crucial for the vulnerability assessment of existing 3D R/C buildings, were selected: the total height of the buildings Htot, the ratios of the base shear that is received by R/C walls (if they exist) along the two horizontal orthogonal directions x and y (ratios nvx and nvy) and the structural eccentricity e0 (Table 1). As regards the seismic parameters, it must be noted that many such parameters, obtained from the accelerogram records, have been defined in the literature. For the present study, the 14 seismic parameters presented in Table 2 were chosen (e.g. [49], [50]), in an attempt to select the ones widely used in the relevant literature to better describe the seismic excitations and their impact on structures.

The output of the problem examined in the present paper is the estimation of the seismic damage state of R/C buildings, so a reliable measure that can adequately quantify their damage response must be adopted as the target (output parameter) of the Machine Learning algorithms. More specifically, the 90 buildings presented above were analyzed by means of NTHA for a suite of 65 earthquake ground motions, accounting for the design vertical loads. As a consequence, a total of 5850 NTHA (90 buildings x 65 earthquake records) were conducted in the present research. The analyses were performed using the computer program Ruaumoko [51]. Regarding the selection of the input earthquake motions, each of them consists of a pair of horizontal bidirectional seismic components, obtained from the PEER [52] and the European Strong-Motion [53] databases. The selection of the records was made bearing in mind the coverage of a large variety of realistic values for the 14 ground motion parameters considered as inputs. In Table 2 the range of the ground motion parameters' values that correspond to the 65 chosen strong motions is depicted. For the calculation of the above seismic parameters, the computer program SeismoSignal [50] was utilized.

For each one of the nonlinear analyses, the seismic damage was assessed. In particular, the estimation of the seismic damage that is expected to occur in the structural members of R/C buildings is accomplished through the calculation of certain measures which attempt to quantify the severity of the damage. The choice of a reliable damage measure that can adequately capture the damage level of the building is a very difficult task, since it depends on numerous parameters. The present research study, in order to express the buildings' seismic damage, adopts the Maximum Interstory Drift Ratio (MIDR). More specifically, MIDR corresponds to the maximum story drift among the perimeter frames and it is calculated according to Fig. 2. The MIDR, which is extensively used as an effective indicator of structural and nonstructural damage of R/C buildings (e.g. [54], [55]), has been adopted by many researchers for the assessment of structures' inelastic response.

In order to identify the most effective algorithm capable of predicting the R/C buildings' seismic damage with high accuracy, an extensive comparison of the most widely used supervised ML models was made. A comprehensive review of the comparison models is summarized as follows:
1. Light Gradient Boosting Machine: a gradient boosting framework based on decision trees that increases the efficiency of the model and reduces memory usage [56].
2. Gradient Boosting Regressor: this method produces an ensemble prediction model from a set of weak decision-tree prediction models. It builds the model in a stage-wise fashion, allowing at the same time the optimization of an arbitrarily differentiable loss function [57].
3. Random Forest Regressor: a Random Forest is a meta-learner that builds a number of decision trees on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and to control over-fitting [58].
4. Extra Trees Regressor: Extra Trees is an information-based learning methodology. Specifically, it is an ensemble machine learning algorithm that combines the predictions from many decision trees [59].
5. k-Nearest Neighbors Regressor: a similarity-based learning algorithm, according to which the target is predicted by local interpolation of the targets associated with the nearest neighbors in the training set [60].
6. Linear Regression: a model that assumes a linear relationship between the input variables (x) and the output variable (y), so that (y) can be calculated from a linear combination of the input variables (x). In linear regression, relationships are modeled using linear prediction functions whose unknown model parameters are estimated from the data [61].
7. Bayesian Ridge: a type of linear regression algorithm that uses probability distributions rather than point estimates in order to solve a regression problem [62].
8. Ridge Regression: a regression method that does not provide confidence limits. It uses L2-norm regularization in order to handle highly correlated predictors, even if the errors come from a non-normal distribution [63].
9. Decision Tree Regressor: a decision tree is a tree-based model that includes chance event outcomes and resource costs and displays conditional control statements. Each node represents an attribute, each branch represents the outcome of an attribute test, and each leaf represents the decision taken after evaluating all attributes. The paths from the root to a leaf represent the regression process [64].
10. AdaBoost Regressor: a meta-learner that begins by fitting a regressor on the original dataset and then fits additional copies of the regressor on the same dataset, where the weights of the instances are adjusted according to the error of the current prediction [65].
11. Elastic Net: a normalized regression method for fitting data that linearly combines the L1 and L2 norms of the lasso and ridge regression methods [66].
12. Lasso Regression: Least Absolute Shrinkage and Selection Operator (Lasso) Regression is a type of linear regression methodology that uses a shrinkage technique, in which the data values are shrunk towards a central point, such as the average value [67].
13. Orthogonal Matching Pursuit: a sparse approximation algorithm which finds the optimal multidimensional data projection fitting the data with high accuracy [68].
14. Huber Regressor: a regression method which defines a threshold based on the distance between target and prediction that makes the loss function switch from a squared error to an absolute one [69].
15. Least Angle Regression: a linear regression algorithm for fitting high-dimensional data.
Its solution consists of a curve denoting, for each value of the L1 norm of the parameter vector, the solution in which the estimated parameters are increased in a direction equiangular to their correlations with the residual [70].

The abovementioned ML techniques were utilized for the statistical analysis of the training datasets in order to estimate their predictive capability with respect to the buildings' seismic damage. The following regression metrics were used to compare the results and to detect which ML algorithm is the most efficient:

Coefficient of Determination (R²). In order to express the correlation between two random variables, R² is used, expressed as a percentage. This metric gives the rate of variability of the Y values explained by X and vice versa. R² is defined as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \quad (1)$$

where $y_i$ are the observed values of the dependent variable, $\hat{y}_i$ are the estimated values of the dependent variable, $\bar{y}$ is the arithmetic mean of the observed values and $n$ is the number of observations. $R^2$ attains values in the interval [0, 1], with optimal performance when its value approaches unity, indicating that the regression model adapts optimally to the data.

Mean Absolute Error (MAE). MAE is the measure that quantifies the error between the estimated and the observed values. It is calculated by the formula:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

where $\hat{y}_i$ are the estimated values and $y_i$ the observed ones. The absolute value of the difference between these values, $|e_i| = |y_i - \hat{y}_i|$, is defined as the absolute error, and MAE is its average.

Mean Square Error (MSE). MSE is the basic comparison measure that calculates how well a model approaches the set of control examples in a regression process. It is given by the following formula:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

where $y_i$ is an observed value and $\hat{y}_i$ an estimated value for the $n$ predictions.

Root Mean Squared Error (RMSE). RMSE calculates the average error of the predicted values in relation to the actual values:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

The success of a regression model requires very small values of the RMSE, while the best case (absolute correlation between actual and predicted values and therefore absolute success of the model) is achieved when $y_i - \hat{y}_i = 0$ for all predictions.

Mean Absolute Percentage Error (MAPE). MAPE provides an objective measure of the estimation error as a percentage of demand (e.g. the estimation error is on average 10% of the actual demand) without depending on the order of magnitude of demand. It is given by the following formula, where $y_i$ is the actual value and $\hat{y}_i$ the forecast value:

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

Generally speaking, RMSE gives more importance to the largest errors, hence it is more sensitive to outliers, whereas MAE is more robust to outliers. RMSE and MSE work on the principle of averaging the squared errors, while MAE averages the absolute errors. Finally, MAPE has a very intuitive interpretation in terms of relative error.
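The paper does not specify the software used for this benchmark; purely as a hedged illustration, the sketch below shows how such a comparison could be set up with the scikit-learn and LightGBM libraries. The data arrays are random placeholders and the default hyperparameters are an assumption, not the settings of the study.

```python
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.ensemble import (AdaBoostRegressor, ExtraTreesRegressor,
                              GradientBoostingRegressor, RandomForestRegressor)
from sklearn.linear_model import (BayesianRidge, ElasticNet, HuberRegressor, Lars,
                                  Lasso, LinearRegression, OrthogonalMatchingPursuit,
                                  Ridge)
from sklearn.metrics import (mean_absolute_error, mean_absolute_percentage_error,
                             mean_squared_error, r2_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

models = {
    "LightGBM": LGBMRegressor(), "Gradient Boosting": GradientBoostingRegressor(),
    "Random Forest": RandomForestRegressor(), "Extra Trees": ExtraTreesRegressor(),
    "k-NN": KNeighborsRegressor(), "Linear Regression": LinearRegression(),
    "Bayesian Ridge": BayesianRidge(), "Ridge": Ridge(),
    "Decision Tree": DecisionTreeRegressor(), "AdaBoost": AdaBoostRegressor(),
    "Elastic Net": ElasticNet(), "Lasso": Lasso(),
    "OMP": OrthogonalMatchingPursuit(), "Huber": HuberRegressor(max_iter=500),
    "Least Angle": Lars(),
}

# Placeholder data: 18 inputs (4 structural + 14 seismic) and the MIDR target.
X, y = np.random.rand(5850, 18), np.random.rand(5850)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in models.items():
    y_hat = model.fit(X_tr, y_tr).predict(X_te)
    rmse = np.sqrt(mean_squared_error(y_te, y_hat))
    print(f"{name:18s} R2={r2_score(y_te, y_hat):6.3f}  "
          f"MAE={mean_absolute_error(y_te, y_hat):.4f}  RMSE={rmse:.4f}  "
          f"MAPE={mean_absolute_percentage_error(y_te, y_hat):.3f}")
```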
In order to confirm the effectiveness of the ML algorithms, extensive ML tests were performed and the comparative results (ranked from the most efficient to the least efficient method), obtained for each one of the three datasets in terms of the abovementioned metrics, are presented in Tables 4, 5 and 6. Tables 4, 5, and 6 clearly show the superiority of the Light Gradient Boosting Machine (LightGBM) algorithm, which excels in all metrics, while the performance error remains very low compared to the other approaches. Specifically, the accuracy of LightGBM exceeds on average that of the second-best method by almost 3.5%, while the recorded error is significantly smaller. These features are clearly demonstrated by the very high performance results that it has achieved, as well as by its ability to generalize to new unknown situations and to effectively model real-world data. Specifically, the results revealed that using LightGBM it is possible to correlate sophisticated parameters in a simple way and to solve dynamic problems, like the prediction of the R/C buildings' seismic response, with high accuracy and with an affordable computational cost.

In the following, a thorough description, along with analytical details, of the most efficient ML algorithm (LightGBM) is given. LightGBM [71] is an information-based learning methodology which belongs to the class of gradient boosting algorithms and uses a learning algorithm based on regression trees. Regression trees are a simple, easy-to-interpret technique that works best on single-dimensional (tabular) data rather than on multidimensional data such as photos, videos, etc. Considering a set of pairs $(x_i, y_i)$ for $i = 1, 2, \ldots, n$, with $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$, the construction of a regression tree is defined as follows:
1. The predictor space is divided into regions $R_1, R_2, \ldots, R_M$.
2. The target variable is modeled as a constant $c_m$ in each region, so that:

$$f(x) = \sum_{m=1}^{M} c_m I\left(x \in R_m\right)$$

Having as minimization criterion the sum of squares $\sum_i \left(y_i - f(x_i)\right)^2$, it is easy to calculate the optimal $\hat{c}_m$, which is the average of $y_i$ in the region $R_m$:

$$\hat{c}_m = \mathrm{ave}\left(y_i \mid x_i \in R_m\right)$$

The problem which arises is that, using the sum of squares in order to find the best partition, the algorithm becomes extremely time-consuming. For this reason, another approach is usually used, according to which in each step the data are divided into two regions through two branches: a splitting variable $j$ and a split point $s$ are selected which result in the largest reduction of the sum of squares. Essentially, a variable and a point are sought which minimize the following function:

$$\min_{j,s}\left[\min_{c_1}\sum_{x_i \in R_1(j,s)}\left(y_i - c_1\right)^2 + \min_{c_2}\sum_{x_i \in R_2(j,s)}\left(y_i - c_2\right)^2\right]$$

where $R_1(j,s) = \{x \mid x_j \le s\}$ and $R_2(j,s) = \{x \mid x_j > s\}$. Then, the process is repeated for each region created.

The question that arises is how big the trees should be. Note that a large tree will be very specialized to the training data, resulting in a low predictive ability for new data it has never seen before, while a small tree may not have been properly trained, yielding unsatisfactory results. One solution to the problem is to set a minimum threshold and to perform a split only if the reduction in the sum of squares achieved by the split is larger than the threshold. This strategy is not always optimal, as a bad initial split can then lead to a very good next one. The strategy that works best is pruning the tree. The idea is to grow a tree with a predetermined number of nodes and then to prune it using a criterion based on the complexity of the tree, as follows:
1. Firstly, a subtree $T \subset T_0$ is considered, which can be any tree that results from pruning the full tree $T_0$.
2. Denoting by $|T|$ the number of terminal nodes of $T$, with node $m$ representing the region $R_m$, the pruning criterion is:

$$C_\alpha(T) = \sum_{m=1}^{|T|}\sum_{x_i \in R_m}\left(y_i - \hat{c}_m\right)^2 + \alpha\left|T\right|$$

Essentially, the first term of the function measures how well the tree adapts to the training data (small values indicate good adaptation) and the second term measures the complexity of the tree. The parameter $\alpha \ge 0$ indicates the trade-off between complexity and goodness of fit of the tree. For $\alpha = 0$ the resulting tree is $T_0$, as no cost is added for each node included in the tree. As the parameter grows, the cost of the tree complexity increases, so the criterion results in smaller trees which do not adapt as well to the training data. The smaller the parameter, the larger the tree that is constructed, often resulting in overfitting of the training data and, consequently, in poor performance on other data sets.
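A minimal sketch of the greedy split search described above is given below, for illustration only: the exhaustive scan over candidate thresholds is the simplest possible variant, not the histogram-based search that LightGBM actually uses.

```python
import numpy as np

def best_split(X: np.ndarray, y: np.ndarray):
    """Greedy search for the split (feature j, threshold s) that gives the largest
    reduction of the sum of squared errors; in each candidate region the optimal
    constant prediction is the mean of the targets falling into it."""
    def sse(values: np.ndarray) -> float:
        return float(np.sum((values - values.mean()) ** 2)) if len(values) else 0.0

    best_j, best_s, best_cost = None, None, np.inf
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j]):
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            if len(left) == 0 or len(right) == 0:
                continue  # a split must leave observations on both sides
            cost = sse(left) + sse(right)
            if cost < best_cost:
                best_j, best_s, best_cost = j, s, cost
    return best_j, best_s, best_cost  # feature index, threshold, resulting SSE
```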
As mentioned above, LightGBM is a gradient boosting algorithm. The Boosting technique is based on the creation of successive trees: each tree is trained using information from the previous trees. The algorithm works as follows:
1. For each observation in the training data set, $\hat{f}(x_i) = 0$ and $r_i = y_i$ are set.
2. In each round a tree $\hat{f}^b$ with $d$ nodes is trained, having as response variable the residuals of the operation (what is left over from the previous regression round), which are denoted by $r$. A shrunken version of the new tree is added, so that:

$$\hat{f}(x) \leftarrow \hat{f}(x) + \lambda \hat{f}^b(x)$$

Respectively, the residuals are updated:

$$r_i \leftarrow r_i - \lambda \hat{f}^b(x_i)$$

3. Repeating the process from step 2 for $B$ times ($B$ is defined by the user), the final form of the model is obtained:

$$\hat{f}(x) = \sum_{b=1}^{B} \lambda \hat{f}^b(x)$$

For the Boosting technique to be effective, the user must specify the number of trees $B$ to be created, the parameter $\lambda$ and the number of nodes $d$ in each tree. A large number of trees can easily be over-adapted to the training data, resulting in a poor generalization ability. The parameter $\lambda$ determines how fast the model will learn; typical values of $\lambda$ are from 0.001 to 0.1. The number of nodes controls the complexity of each tree. Often, trees of a single split, also known as stumps, are satisfactory, because the learning in the model is then done slowly and in a controlled way.

The Gradient Boosting technique is an extension of the Boosting technique, combining two methods: the Gradient Descent algorithm and the Boosting technique. Gradient Descent is a first-order optimization method. In order to find the minimum of a function using this technique, its derivative is first calculated and then the search moves in the direction opposite to the derivative. The derivative measures how much the value of a function $f(x)$ will change if the variable $x$ changes slightly; it is essentially the slope of the function. High values of the derivative indicate a large slope and therefore a large change in the value of $f(x)$ for small changes of $x$. This algorithm is iterative, namely it initializes $x$ with a random value, calculates the derivative of the function at the given point and modifies $x$ so that:

$$x \leftarrow x - \eta f'(x)$$

where the parameter $\eta$ determines how fast the search moves in the negative direction of the derivative. The process is repeated until the algorithm converges. In the case of Gradient Boosting, the algorithm trains the trees on the negative derivative of the loss function. For example, taking as loss function the sum of the squares of the residuals divided by 2, so that:

$$L = \sum_{i=1}^{n}\frac{\left(y_i - f(x_i)\right)^2}{2}$$

and calculating the derivative:

$$\frac{\partial L}{\partial f(x_i)} = -\left(y_i - f(x_i)\right) = -r_i$$

it follows that the negative derivative of the loss function equals the residuals $r_i$. So, essentially, the process involves training a tree on the residuals, a shrunken (by $\lambda$) version of which is added to the model. In this way, the Gradient Boosting technique at each step adds successive trees fitted to the negative derivative of the loss function.
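For illustration, a minimal sketch of this residual-fitting boosting loop is given below, assuming scikit-learn decision trees as the weak learners and a squared-error loss; the function names and parameter values are hypothetical, not those of the study.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, n_trees=100, lam=0.05, max_depth=2):
    """Gradient boosting with squared-error loss, following the steps above:
    each tree is fitted to the current residuals (the negative gradient of the
    loss) and a shrunken copy (learning rate lam) is added to the model."""
    prediction = np.zeros(len(y))
    residuals = np.asarray(y, dtype=float).copy()
    trees = []
    for _ in range(n_trees):
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        update = tree.predict(X)
        prediction += lam * update     # f(x) <- f(x) + lam * f_b(x)
        residuals -= lam * update      # r_i <- r_i - lam * f_b(x_i)
        trees.append(tree)
    return trees, prediction

def predict(trees, X, lam=0.05):
    """Final model: the shrunken sum of all fitted trees."""
    return lam * sum(tree.predict(X) for tree in trees)
```

A small learning rate combined with many shallow trees reproduces the slow, controlled learning behaviour described above.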
Each such regression tree can be written as $f_t(x) = w_{q(x)}$, where $q: \mathbb{R}^m \rightarrow \{1, \ldots, T\}$ represents the structure of the tree (the mapping of an observation to a leaf), $T$ represents the number of leaves and $w \in \mathbb{R}^T$ are the leaf weights, with each $f_t$ corresponding to an independent tree structure. In the LightGBM technique, trees of different structure are combined, with the structure of each tree being defined by the number of nodes that are created. The loss function that is minimized at step $t$ is given by the formula:

$$L^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega\left(f_t\right)$$

The first term measures how well the model adapts to the training data (small values indicate good adaptation) and the second term measures the complexity of each tree, where, in addition to the number of leaves $T$, a new term is introduced which results in a reduction of the leaf weights:

$$\Omega(f) = \gamma T + \frac{1}{2}\lambda\sum_{j=1}^{T} w_j^2$$

The parameter $\gamma$ indicates the penalty value for the growth of the tree, so that large values of $\gamma$ will lead to small trees and small values of $\gamma$ will lead to large trees. The parameter $\lambda$ regulates how much the tree weights will shrink, namely an increase of its value leads to the shrinkage of the tree weights. So, the problem is deciding which $f_t(x)$ minimizes the loss function at step $t$. From the second-order Taylor series expansion it follows:

$$L^{(t)} \simeq \sum_{i=1}^{n}\left[l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)\right] + \Omega\left(f_t\right)$$

where $g_i = \partial_{\hat{y}^{(t-1)}}\, l\left(y_i, \hat{y}^{(t-1)}\right)$ and $h_i = \partial^2_{\hat{y}^{(t-1)}}\, l\left(y_i, \hat{y}^{(t-1)}\right)$. Subtracting the constant terms, the loss function becomes:

$$\tilde{L}^{(t)} = \sum_{i=1}^{n}\left[g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)\right] + \Omega\left(f_t\right)$$

Putting $I_j = \{i \mid q(x_i) = j\}$ the set of observations on leaf $j$, the above relation is reformulated as follows:

$$\tilde{L}^{(t)} = \sum_{j=1}^{T}\left[\Big(\sum_{i \in I_j} g_i\Big) w_j + \frac{1}{2}\Big(\sum_{i \in I_j} h_i + \lambda\Big) w_j^2\right] + \gamma T$$

Setting $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$, the following relation emerges:

$$\tilde{L}^{(t)} = \sum_{j=1}^{T}\left[G_j w_j + \frac{1}{2}\left(H_j + \lambda\right) w_j^2\right] + \gamma T$$

Assuming that the structure of the tree $q(x)$ is known, the optimal weight of each leaf is obtained by minimizing the above relation with respect to $w_j$, so that:

$$w_j^{*} = -\frac{G_j}{H_j + \lambda}$$

Subsequently, by replacing $w_j^{*}$, the following equation results, which also measures the quality of the structure of the new tree:

$$\tilde{L}^{(t)}(q) = -\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^2}{H_j + \lambda} + \gamma T$$

Finally, the algorithm creates splits using the following function:

$$\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{\left(G_L + G_R\right)^2}{H_L + H_R + \lambda}\right] - \gamma$$

where the first fraction is the score of the left part of the split, the second fraction is the score of the right part of the split, the third fraction is the score in case the split does not take place, and $\gamma$ measures the cost of the complexity of the split.

The process of solving a problem begins with creating a tree and growing it up to a specific, user-defined depth. The tree is pruned at the splits with a negative Gain and, then, a shrunken version of the new tree is added to the model. The procedure is repeated $B$ times ($B$: parameter defined by the user). It is important to note that the LightGBM algorithm, which is characterized by its efficiency, accuracy and speed, creates histograms and uses the generated bins instead of the entire range of each variable's values, achieving a significant reduction in training time. It also grows trees leaf-wise (leaf-wise method, Fig. 3), whereas other algorithms grow them level by level (depth-wise method, Fig. 4), choosing to grow the leaf with the maximum reduction of the cost function. With leaf-wise tree growth the algorithm becomes very efficient, as it can significantly reduce the loss, thus gaining accuracy, while at the same time the regression processes are completed quickly. Another important feature that makes LightGBM one of the most complete and widespread algorithms in Machine Learning is that it does not use all the training data, but a sample of them, which results from the Gradient-based One-Side Sampling method (GOSS). The basic idea of the GOSS methodology focuses on the fact that not all observations contribute equally to the training of the algorithm, since those with a small first derivative of the cost function are better trained than those with a large one.
Ignoring the observations with a small derivative results in the creation of biased samples and in a definite change in the distribution of the data, something which leads to splits that deviate from the optimal ones and to an obvious over-adaptation of the model to the sample. To address the problem, random observations with a small derivative of the cost function are also selected. The observations are sorted according to the absolute value of their derivative; then, the $a \times 100\%$ of them with the largest derivatives and a random $b \times 100\%$ of the rest are selected. For the calculation of the loss function, the observations with a small derivative are multiplied by $\frac{1-a}{b}$, thus giving more importance to the poorly trained ones without significantly altering the distribution of the data. By training on only a sample in each iteration, a significant acceleration of the algorithm's learning process is achieved, resulting in its fast convergence to the optimal solution.

Specifically, consider a training set $O$ with $n$ instances $\{x_1, x_2, \ldots, x_n\}$, where each $x_i$ is a vector of dimension $s$ in the space $X^s$. In each iteration of the gradient boosting algorithm, the negative gradients of the cost function with respect to the output of the model are denoted as $\{g_1, g_2, \ldots, g_n\}$. Implementing the GOSS method, the instances are sorted according to the absolute values of their gradients in descending order. Thus, a set $A$ with the $a \times 100\%$ largest gradients, a set $A^c$ consisting of the $(1-a) \times 100\%$ instances with the smallest gradients and a random subset $B \subset A^c$ of size $b \times |A^c|$ are created. Then, the instances are split according to the estimated variance gain $\tilde{V}_j(d)$ over the subset $A \cup B$, so that:

$$\tilde{V}_j(d) = \frac{1}{n}\left[\frac{\Big(\sum_{x_i \in A_l} g_i + \frac{1-a}{b}\sum_{x_i \in B_l} g_i\Big)^2}{n_l^j(d)} + \frac{\Big(\sum_{x_i \in A_r} g_i + \frac{1-a}{b}\sum_{x_i \in B_r} g_i\Big)^2}{n_r^j(d)}\right]$$

where $A_l$, $B_l$ ($A_r$, $B_r$) are the instances of $A$ and $B$ that fall to the left (right) of the candidate split point $d$ of feature $j$, $n_l^j(d)$ and $n_r^j(d)$ are the corresponding numbers of instances, and the coefficient $\frac{1-a}{b}$ is used to normalize the sum of the gradients over $B$ with respect to the size of $A^c$.

The performance metrics of the LightGBM algorithm for the three datasets considered herein were given in Tables 4, 5, and 6. Generally, the LightGBM algorithm achieves the highest coefficient of determination, while the error fluctuation remains very low in comparison to the other methods. This shows that a large percentage of the data points (91% in the first dataset, 78% in the second dataset, and 89% in the third dataset) is captured by the regression equation, therefore the method adapts optimally to the data. Note that in the above Tables some of the most valid error metrics are compared, since, in the forecasting procedure by ML methods, the measurement of the error between the estimated value and the actual value is useful both to assess the performance of the model and to define the objective function of the model. In any case, the LightGBM approach produces the lowest error, which translates into high overall performance, training stability and generalization ability. Finally, the algorithm has satisfactory training times, which can be further improved if the training data are pre-sorted.
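As a rough illustration of the sampling scheme just described (not the library's internal implementation), the following sketch selects the large-gradient instances, randomly subsamples the rest and applies the $(1-a)/b$ weighting; the values a = 0.2 and b = 0.1 are arbitrary examples.

```python
import numpy as np

def goss_sample(gradients: np.ndarray, a: float = 0.2, b: float = 0.1):
    """Gradient-based One-Side Sampling, as described above: keep the a*100%
    instances with the largest |gradient|, randomly sample b*100% of the
    remaining ones, and up-weight the sampled small-gradient instances by
    (1 - a) / b so that the data distribution is not changed too much."""
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))     # indices sorted by |gradient|, descending
    top_k = int(a * n)
    rand_k = int(b * (n - top_k))
    top_idx = order[:top_k]
    rest_idx = np.random.choice(order[top_k:], size=rand_k, replace=False)
    idx = np.concatenate([top_idx, rest_idx])
    weights = np.ones(len(idx))
    weights[top_k:] = (1.0 - a) / b            # re-weight the sampled small-gradient part
    return idx, weights

# The weighted gradient sums over the sampled instances then enter the usual
# split-gain expression, e.g. G_L^2/(H_L + lambda) + G_R^2/(H_R + lambda) - ...
```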
Diagrams of the methodology, which show its superiority and the way the LightGBM algorithm works, as well as the way of modeling the problem, are presented in the following. The plots of the LightGBM algorithm for the dataset of the bare buildings are presented in Figs 5-8. More specifically, the prediction error plot shows the actual targets from each dataset against the predicted values generated by the model. This allows identifying how much variance exists in the model by comparing the points against the 45° line, where the prediction matches the actual value exactly. Also, the residual plot is a graph that shows the residuals on the vertical axis and the independent variable on the horizontal axis. If the points in a residual plot are randomly dispersed around the horizontal axis, a linear regression model is appropriate for the data; otherwise, a nonlinear model is more appropriate. Moreover, a learning curve is a plot that shows time or experience on the x-axis and learning or improvement on the y-axis; the model is evaluated on the training dataset after each update during training and the measured performance is depicted. Finally, the validation curve is a graphical technique that can be used to measure the influence of a single hyperparameter; by looking at this curve, it can be determined whether the model is underfitting, overfitting or just right for some range of hyperparameter values.

In the present paper an extensive comparative evaluation of a large number of Machine Learning algorithms for the reliable prediction of 3D R/C buildings' seismic response was carried out. In order to accomplish this aim, a large training dataset consisting of 30 R/C buildings with different structural parameters (the number of stories, the structural eccentricity and the ratio of base shear received by R/C walls (if they exist) along the two orthogonal horizontal axes) was selected. The buildings were designed on the basis of the provisions of EC8 and EC2. For each one of these buildings three different configurations of their masonry infill walls were assumed (without masonry infills, with masonry infills in all stories and with masonry infills in all stories except for the ground story), leading to three different data subsets consisting of 30 buildings each. The selected buildings were analyzed for 65 appropriately chosen real earthquake records using Nonlinear Time History Analyses. As inputs to the Machine Learning methods, both seismic and structural parameters widely used in the literature were chosen. The well-documented Maximum Interstory Drift Ratio was selected as the damage index for the R/C buildings. Based on the research study's results, the following conclusions can be drawn:
• Historical data can be utilized in order to develop a realistic model, capable of effectively simulating the earthquake response and predicting with great accuracy the seismic damage of structures belonging to different types.
• The general methodology of the proposed procedure uses the most technologically advanced methods in the field of civil engineering and expands them significantly, as it extracts the hidden knowledge found in structural and seismic data in order to add intelligence to the methods of seismic response prediction, as well as to the mechanisms for optimal decision-making related to seismic risk.
• The high generalizability of the LightGBM algorithm, as well as the convergence stability of the proposed methodology, proves that it is capable of performing well even when the problem is multiparametric.
• The GOSS technique used by the LightGBM algorithm handles with great precision the noisy, scattered, incorrectly classified points, something that other methodologies cannot handle.
• The tree segmentation method utilized by the algorithm leads to results characterized by remarkable predictive accuracy, while offering generalization, which is one of the key requirements in the field of machine learning. Moreover, it reduces bias and variance, as well as eliminates overfitting, implementing a robust forecast model.
• The proposed method, treating the problem as one of multiple spatial-temporal variables, shows that machine learning methods can be utilized in order to solve dynamic problems of high complexity with affordable computational costs.
• The proposed procedure constitutes a very promising methodology, which can significantly improve the safety of structures and infrastructure in general under earthquake excitations.

The most important task for the evolution of the proposed methodology is, initially, the search for optimization solutions that achieve higher accuracy. Also of great importance is the detection of the optimal hyperparameters of the algorithm, in order to enhance the predictive process. Moreover, the training dataset can be expanded to buildings with different structural characteristics and to earthquake records with seismic features of a greater range. Finally, the expansion of the methodology with data transformation techniques should be considered, so that the algorithm can locate the optimal representations of the input variables, in order to make it easier to extract the useful information.

References
Rapid visual screening of buildings for potential seismic hazards: a handbook (FEMA-154)
Rapid assessment of seismic demand in existing building structures
A hybrid method for the vulnerability assessment of R/C and URM buildings
NCEER-ATC joint study on fragility of buildings
The effect of material and ground motion uncertainty on the seismic vulnerability curves of RC structure
Artificial neural network surrogate modelling for real-time predictions and control of building damage during mechanised tunnelling
Prediction of dynamic properties of ultra-high performance concrete by an artificial intelligence approach
Static load estimation using artificial neural network: Application on a wing rib
Adaptive Elitist Differential Evolution Extreme Learning Machines on Big Data: Intelligent Recognition of Invasive Species
Semisupervised Hybrid Modeling of Atmospheric Pollution in Urban Centers
Modeling and Forecasting the COVID-19 Temporal Spread in Greece: An Exploratory Approach Based on Complex Network Defined Splines
Vibration-based support vector machine for structural health monitoring
A machine-learning approach for structural damage detection using least square support vector machine based on a new combinational kernel function
A multi-convolutional autoencoder approach to multivariate geochemical anomaly recognition
Structural damage identification based on unsupervised feature-extraction via variational auto-encoder
A review on application of soft computing techniques for the rapid visual safety evaluation and damage classification of existing buildings
The promise of implementing machine learning in earthquake engineering: A state-of-the-art review
Machine learning applications for building structural design and performance assessment: State-of-the-art review
Neural Networks for quick Earthquake Damage Estimation
Integrated Assessment of Seismic Damage in Structures
Neural network design for engineering applications
Prediction of seismic-induced structural damage using artificial neural networks
An evaluation of effective design parameters on earthquake performance of RC buildings using neural networks
Estimating the vulnerability of the concrete moment resisting frame structures using Artificial Neural Networks
Real-time Seismic Damage Detection of Concrete Shear Walls Using Artificial Neural Networks
Assessment of the Effective Ground Motion Parameters on Seismic Performance of R/C Buildings using Artificial Neural Network
Seismic parameters' combinations for the optimum prediction of the damage state of R/C buildings using neural networks
Approaches to the rapid seismic damage prediction of R/C buildings using artificial neural networks
Optimization of the seismic performance of masonry infilled R/C buildings at the stage of design using artificial neural networks
Comparative evaluation of MFP and RBF neural networks' ability for instant estimation of R/C buildings' seismic damage level
Estimating aftershock collapse vulnerability using mainshock intensity, structural response and physical damage indicators
A machine learning framework for assessing post-earthquake structural safety
Pattern recognition approach to assess the residual structural capacity of damaged tall buildings
Developing a hierarchical type-2 fuzzy logic model to improve rapid evaluation of earthquake hazard safety of existing buildings
Prediction of seismic drift responses of planar steel moment frames using artificial neural network and extreme gradient boosting
A data-driven building's seismic response estimation method using a deep convolutional neural network
Design of concrete structures, Part 1-1: General rules and rules for buildings. European Committee for Standardization
Design of structures for earthquake resistance - Part 1: General rules, seismic actions and rules for buildings. European Committee for Standardization
Structural Analysis and Design Software
Inelastic analysis of RC frame structures
Imbsen Software Systems. XTRACT Version 3.0.5: Cross-sectional structural analysis of components
Analytical modelling of infilled frame structures - a general review
Mathematical macromodeling of infilled frames: state of the art
Masonry infilled frame structures: state-of-the-art review of numerical modelling
Seismic behaviour of reinforced concrete structures with masonry infills
Design of masonry structures - Part 1-1: General rules for reinforced and unreinforced masonry structures. European Committee for Standardization
Geotechnical earthquake engineering
Ruaumoko - A program for inelastic time-history analysis: program manual
Strong motion database
The seismic design handbook
Building specific damage estimation
LightGBM: A Highly Efficient Gradient Boosting Decision Tree
A working guide to boosted regression trees
Random Forests
Extremely randomized trees
Output-sensitive algorithms for computing nearest-neighbor decision boundaries
Applied Regression Analysis
Bayesian Methods for Data Analysis, Third Edition
A Critical View of Ridge Regression
A framework for sensitivity analysis of decision trees
The return of AdaBoost.MH: multi-class Hamming trees
Regularization and Variable Selection via the Elastic Net
Variable selection with prior information for generalized linear models via the prior lasso method
Matching pursuit-based shape representation and recognition using scale-space
Least Angle Regression
LightGBM: A Highly Efficient Gradient Boosting Decision Tree