Abstract
The development of complex machine learning (ML) models has increased in recent years, and the need to understand the decisions made by these models has become essential. In this context, eXplainable Artificial Intelligence (XAI) has emerged as a field of study that aims to provide explanations for the decisions made by ML models. This work presents a comparison between two state-of-the-art XAI techniques, LIME and SHAP, in the context of Human Activity Recognition (HAR). As LIME provides only local explanations, we present a way to compute global feature importance from LIME explanations based on a global aggregation approach and use correlation metrics to compare the feature importance provided by LIME and SHAP across different HAR datasets and models. The results show that correlation metrics alone are not enough to conclude whether the techniques agree, so we employ a feature removal and retraining approach and show that, despite some divergences in the correlation metrics, both XAI techniques successfully identify the most and least important features used by the model for the task.
1 Introduction
Smartphone-based Human Activity Recognition (HAR) consists of identifying individuals’ activities, such as walking, running, sitting, standing, and jumping, given samples from tri-axial sensors such as accelerometers and gyroscopes, typically embedded in smartphones. These sensors provide signals capable of discriminating the activities. The significance of solving HAR extends to various applications, particularly in remote patient monitoring and elderly care within the healthcare sector, identifying potential falls [20], and monitoring the degree of Parkinson’s disease [17]. Additionally, smartphone sensor data can be useful in the security field for authentication, employing patterns as a distinctive identifier [5].
To improve the capability of the classifiers, it is often necessary to extract features from the raw data by computing handcrafted features, employing deep learning models, or applying dimensionality reduction techniques [8, 24, 27]. The first approach favors the interpretation of the so-called latent representation, as the involved features are specifically tailored and selected for the task. On the other hand, dimensionality reduction and deep learning approaches lead to representations that are difficult to understand and to assign relevance to [22, 24, 28]. These representations are commonly very abstract, and the use of complex (black-box) models to classify the data makes it difficult to understand the factors that have the greatest impact on the outcome. To address this issue, Explainable Artificial Intelligence (XAI) techniques are useful for understanding which factors of the input data most affect the model outcome, or for understanding the model behavior by calculating the contribution of each feature, i.e., the feature importance.
XAI techniques play an important role in making systems more interpretable, helping experts in deploying and judging the model’s reliability [19]. SHAP and LIME are state-of-the-art XAI techniques that provide explanations for models based on images, tabular, and text data [13, 19]. These techniques are model-agnostic, that is, they can be applied to any machine-learning model and can pinpoint which features are more relevant for the model to make a decision.
Explanations can be classified as local or global: local explanations provide insights into individual outcomes based on specific input data, while global explanations aim to offer a broader understanding of the model by aggregating explanations across a set of samples.
One of the challenges in the literature is how to measure the (dis)agreement among the explanations provided by different XAI techniques and, in the case of disagreement, to decide “which technique should we trust?”.
In this paper, we aim to provide a robust comparison between LIME and SHAP explanations, two of the most popular XAI techniques, for six HAR datasets over four classifiers. In this context, we compute correlation metrics to evaluate the linearity and rank similarity of the explanations, measuring their consistency, and validate the quality of these metrics. This research has been motivated by the work performed on the H.IAAC project (Footnote 1), which explores various feature learning and dimensionality reduction techniques to obtain compact latent representations for smartphone sensors’ data.
In this work, we evaluate the importance of features produced by a feature extractor that combines the original temporal data collected by the smartphone accelerometer and gyroscope into a latent space with 24 features. We use a feature extractor that applies the Fast Fourier Transform (FFT) to project the original data into the frequency domain and then employs KPCA (a dimensionality reduction technique) to combine the features and reduce them to a set of 24 features. By identifying the importance of features within this latent space, developers can gain valuable insights into the newly constructed feature space, enabling them to assess the efficiency of the feature extractor and interpret the decisions made by the classifier.
The contributions of this paper are as follows:
-
(i)
We employed a new method to provide global explanations from LIME based on aggregations similar to those provided by SHAP;
-
(ii)
We provide a robust comparison measuring the agreement between LIME and SHAP explanations for six HAR datasets over four classifiers. Furthermore, we show that despite some disagreements among the explanations, these XAI techniques have successfully identified the least relevant features for the classification task.
-
(iii)
We analyze the effect of removing the most and least relevant features on the model performance, showing that the least relevant features have almost no impact on the model performance, while the most relevant features do not necessarily have a high impact on the model performance.
Our results indicate that we can leverage LIME to generate global explanations, providing insights on how the model behaves over the entire dataset. To the best of our knowledge, this is one of the first works that compare the global explanations of LIME and SHAP for the HAR task. This comparison can assist researchers in understanding any disagreements among the explanations. Moreover, we demonstrate that, despite potential discrepancies, the explanations are complementary and can be used with confidence to understand the model’s behavior and deploy the model in real-world applications.
The rest of this paper is divided into the following parts: Sect. 2 presents the related work for HAR, XAI, and how to compare the explanations. Section 3 shows the theoretical background of XAI and describes the XAI techniques employed in this work. Section 4 discusses the datasets, the preprocessing steps, and the metrics to compare the explanations. Section 5 exhibits the research questions and experiment results and addresses the proposed questions. Finally, Sect. 6 presents the conclusions and future work.
2 Related Work
In the HAR literature, researchers have employed different approaches to explain the models, such as LIME, SHAP, LRP, Grad-CAM, and others to provide the relevance of each slice of the time series to the prediction [6, 10, 21]. On the other hand, some works have extracted handcrafted features from the time and frequency domains or dimensionality reduction to represent the data and then employed LIME and SHAP to explain the models [3, 8, 9, 24].
These are some of the XAI approaches employed to explain HAR models, but these works do not provide a concrete analysis or discussion of the explanations. Furthermore, they do not address the disagreement problem among XAI techniques. In the literature, the most common approach to measuring the disagreement between explanations is to compute the correlation among explanations over individual samples [2, 7, 11, 16]. However, this approach is not a reliable way to assess disagreement: in a classification problem, samples from different classes can differ substantially, so it is plausible that the reasons behind the model’s decisions differ across samples as well.
To evaluate the trustworthiness of the explanations, some researchers employed XAI techniques to explain the model’s decisions and compared the explanations with prior knowledge about the problem [3, 9, 25]. Another form of evaluation compares the explanations provided by different XAI techniques to verify whether they are coherent with each other. This approach is common in the literature, but it usually provides only subjective analyses, such as checking whether the set of most important features is the same across explanations or discussing the differences in the time consumed to compute the feature importance [4, 11].
In this study, we compute correlation metrics to assess the agreement between LIME and SHAP global explanations for HAR models. Additionally, we evaluate the quality of the feature importance results by removing the least and most relevant features according to LIME and SHAP, then retraining and re-evaluating the models using these subsets.
3 Theoretical Background
This section provides an overview of LIME and the methodology we employ to compute global importance values from local ones.
Local Interpretable Model-agnostic Explanations (LIME) [19] is a model-agnostic method that explains the predictions of any classifier by learning an interpretable model locally around the prediction: it generates a dataset of perturbed instances around the instance to be explained and then fits a linear model to this dataset. The coefficients of the linear model are used to explain the prediction. LIME has been used to explain the predictions of a variety of models, including deep neural networks, random forests, and support vector machines. It is also capable of explaining the predictions of models in a variety of domains, including text, image, and tabular data.
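The core of this mechanism can be sketched in a few lines, assuming tabular data, Gaussian perturbations, and an exponential proximity kernel (the actual LIME library additionally performs feature selection and discretization, omitted here; all function and parameter names are illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_explain(predict_proba, x, n_samples=1000, kernel_width=0.75, rng=None):
    """Minimal LIME-style sketch for tabular data: perturb around x, fit a
    proximity-weighted linear surrogate, return its coefficients."""
    rng = np.random.default_rng(rng)
    # 1. Generate perturbed instances around the instance to be explained.
    Z = x + rng.normal(scale=1.0, size=(n_samples, x.shape[0]))
    # 2. Query the black-box model on the perturbations.
    y = predict_proba(Z)[:, 1]  # probability of the class being explained
    # 3. Weight perturbations by proximity to x (exponential kernel).
    d = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(d ** 2) / (kernel_width ** 2))
    # 4. Fit a weighted linear model; its coefficients explain the prediction.
    surrogate = Ridge(alpha=1.0).fit(Z, y, sample_weight=w)
    return surrogate.coef_
```

For a black box whose output depends only on one feature, the surrogate's coefficient for that feature dominates, which is the behavior the explanation relies on.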
To measure the agreement among global explanations, it is necessary to calculate feature importance globally for LIME so that the XAI techniques can be compared globally. van der Linden et al. [12] proposed a methodology to calculate global explanations for LIME based on global aggregations of local explanations. Our approach is similar to the one proposed by van der Linden et al. [12] – the key difference is that we first compute explanations per class and then combine the per-class explanations into the global explanation.
Our method computes the global average importance for each class and then sums the aggregated importance for each feature to obtain the global importance. The steps to calculate the global importance are as follows:
Step 1: Group the samples by class and calculate the feature importance for each sample;
Step 2: Calculate the absolute mean of the feature importance \(W_{ij}\) for each class k:

\[ I_{j}^{k} = \frac{1}{N_k} \sum_{i=1}^{N_k} \left| W_{ij} \right| \]

where M is the number of classes, \(N_k\) is the number of samples that belong to class k, \(W_{ij}\) is the feature importance of feature j for the i-th sample, and \(I_{j}^{k}\) is the importance of feature j for class k;
Step 3: Finally, the global explanation is calculated by summing the per-class feature importance:

\[ I_{j}^{global} = \sum_{k=1}^{M} I_{j}^{k} \]

where \(I_{j}^{global}\) is the global importance of feature j over the entire set of samples.
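The three steps above can be sketched directly with NumPy (the function name and array layout are illustrative assumptions):

```python
import numpy as np

def lime_global_importance(W, labels):
    """Aggregate per-sample LIME weights into a global importance vector.

    W      : (n_samples, n_features) matrix of local importances, where
             W[i, j] is the LIME weight of feature j for sample i.
    labels : (n_samples,) class label of each sample.
    """
    W, labels = np.asarray(W, dtype=float), np.asarray(labels)
    classes = np.unique(labels)
    # Step 2: absolute mean of the local weights within each class k.
    per_class = np.array([np.abs(W[labels == k]).mean(axis=0) for k in classes])
    # Step 3: sum the per-class importances to obtain the global importance.
    return per_class.sum(axis=0)
```

For example, with two classes and two features, `W = [[1, 0], [3, 2], [-2, 4], [0, 2]]` and `labels = [0, 0, 1, 1]` give per-class means `[2, 1]` and `[1, 3]`, hence a global importance of `[3, 4]`.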
4 Materials and Methods
This section presents the datasets used in our experiments, the preprocessing steps applied to standardize the data, and the methodology employed to evaluate the explanations. The experimental setup includes the dimensionality reduction techniques employed to reduce the data to a common feature space, the classifiers used, and the metrics to evaluate the disagreement of XAI techniques.
4.1 Datasets
As mentioned before, the selection of the datasets and the choice to employ a reducer to provide a general and compact representation are based on the work of the H.IAAC team (Footnote 2), which selected a subset of available datasets to work with.
After analyzing various datasets, the group decided to work with: KuHAR (KH) [15], MotionSense (MS) [14], RealWorld (RW) [23], WISDM [26], and the updated version of the UCI-HAR (UCI) dataset [18]. These datasets have some differences regarding the smartphone position, the metric used to record the accelerometer samples, the sampling rate, and the number of users and activities registered, but they share common activities, such as sitting, standing, walking, walking upstairs, downstairs, and running. Table 1 provides a comprehensive overview of the datasets’ key characteristics.
Due to many divergences among the datasets’ characteristics, we executed preprocessing operations to minimize attribute variations, ensure a meaningful comparison of evaluation results, and address disparities among the datasets. Table 2 presents the set of preprocessing steps applied to generate the Standardized view for each dataset.
The gravity component was removed using a third-order high-pass Butterworth filter with a cutoff frequency of 0.3 Hz, an approach presented by Anguita et al. [1], and the re-labeling step was applied to standardize the labels for the same activities across the datasets.
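This gravity-removal step can be sketched with SciPy's Butterworth filter design; the zero-phase forward-backward filtering (`filtfilt`) is an assumption, as the paper does not state the filtering direction:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def remove_gravity(acc, fs):
    """Remove the gravity component from a raw accelerometer signal using a
    third-order high-pass Butterworth filter with a 0.3 Hz cutoff.

    acc : (n_samples, 3) tri-axial accelerometer signal.
    fs  : sampling rate in Hz.
    """
    b, a = butter(N=3, Wn=0.3, btype="highpass", fs=fs)
    # filtfilt applies the filter forward and backward (zero phase distortion).
    return filtfilt(b, a, acc, axis=0)
```

Applied to a signal containing a constant gravity offset plus body motion, the filter suppresses the near-DC gravity component while preserving the higher-frequency motion.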
After these preprocessing steps, we ensured that all datasets had the same data format and the same number of features, and that their features had the same meaning, facilitating the comparison among them. Furthermore, we balanced each dataset by activity (i.e., class), ensuring that each activity had the same number of samples, and split the data by user into training, validation, and test sets (70%, 10%, and 20%, respectively), ensuring that no user’s samples appear in more than one set.
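A user-based split like this can be reproduced, for example, with scikit-learn's `GroupShuffleSplit`; whether the 70/10/20 proportions apply to users or to samples is not stated in the paper, so applying them to users is an assumption here:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

def split_by_user(X, y, users, seed=0):
    """Split into train/val/test (~70/10/20 of users) so that no user
    appears in more than one split."""
    users = np.asarray(users)
    # Hold out ~20% of the users for testing.
    gss = GroupShuffleSplit(n_splits=1, test_size=0.20, random_state=seed)
    trainval_idx, test_idx = next(gss.split(X, y, groups=users))
    # Split the remaining ~80% of users into 70/10 of the original
    # (i.e., 1/8 of the remainder goes to validation).
    gss2 = GroupShuffleSplit(n_splits=1, test_size=0.125, random_state=seed)
    tr, va = next(gss2.split(X[trainval_idx], y[trainval_idx],
                             groups=users[trainval_idx]))
    return trainval_idx[tr], trainval_idx[va], test_idx
```

Because the split is performed over user groups rather than individual samples, the disjointness of users across sets is guaranteed by construction.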
4.2 Experimental Methodology
Figure 1 illustrates the feature extraction pipeline used in our experiments. First, we applied the Fast Fourier Transform (FFT) to transform the data to the frequency domain. Then, we explored several dimensionality reduction techniques (reducers), including Principal Component Analysis (PCA), Kernel PCA (KPCA), Independent Component Analysis (ICA), Locally Linear Embedding (LLE), Isomap, Uniform Manifold Approximation and Projection (UMAP).
We also explored different target dimensionalities (12, 18, and 24 features) and all combinations of one to three datasets to train the reducer, evaluating each configuration with RF, SVM, and KNN classifiers and ranking configurations by the minimum accuracy achieved among the three classifiers. Our results indicated that applying KPCA trained with the MotionSense and RealWorld-Waist datasets allowed us to reduce the data to 24 features while maintaining the classifiers’ performance with minimal loss of accuracy, as shown in Table 3.
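A minimal sketch of the selected pipeline (FFT to the frequency domain, then KPCA down to 24 features) is shown below; the RBF kernel and the use of the magnitude spectrum are assumptions, as the paper does not specify them:

```python
import numpy as np
from sklearn.decomposition import KernelPCA

def extract_features(windows, reducer=None):
    """Project time-domain windows to the frequency domain with the FFT and
    reduce them to 24 features with KPCA.

    windows : (n_windows, window_len) array of signal windows.
    reducer : a fitted KernelPCA, or None to fit a new one on these windows.
    """
    # Magnitude spectrum of the real-valued signal (frequency-domain view).
    spectra = np.abs(np.fft.rfft(windows, axis=1))
    if reducer is None:
        reducer = KernelPCA(n_components=24, kernel="rbf").fit(spectra)
    return reducer.transform(spectra), reducer
```

In practice the reducer would be fitted only on the training datasets (MotionSense and RealWorld-Waist in the paper) and then applied unchanged to the remaining datasets.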
Notice that, in general, all classifiers managed to learn the task and perform much better than a random classifier (Footnote 3).
After reducing the data and training the classifiers, we applied the LIME and SHAP XAI techniques to generate global explanations using the test sets. To facilitate the comparison between the techniques, we normalized the relevance vector for each technique, dataset, and classifier using the \(\ell _1\)-norm.
To compare the agreement among the explanations, we computed correlation coefficients among the importance scores assigned to each feature by LIME and SHAP. We utilized three commonly employed correlation metrics: Pearson, Spearman rank correlation, and Kendall’s Tau.
Pearson correlation is suited to continuous data with a linear relationship, measuring the strength and direction of this connection. Spearman correlation, a non-parametric method, assesses monotonic relationships without assuming linearity or a specific data distribution; it is less affected by outliers and works well with ordinal or non-linear data. Kendall’s Tau, another non-parametric method, works with both continuous and ordinal data; it evaluates the association strength based on data ranks and is robust against outliers.
In summary, Pearson is best suited for evaluating linear relationships, whereas Spearman and Kendall’s Tau are more appropriate for assessing monotonic relationships or when the data does not conform to normality assumptions.
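Putting the \(\ell _1\) normalization and the three metrics together, a comparison between two global importance vectors might look like this (the function name is illustrative; SciPy provides all three coefficients):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau

def compare_explanations(lime_imp, shap_imp):
    """Normalize two global importance vectors with the l1-norm and compute
    the three agreement metrics used in this comparison."""
    a = np.abs(lime_imp) / np.abs(lime_imp).sum()
    b = np.abs(shap_imp) / np.abs(shap_imp).sum()
    return {
        "pearson": pearsonr(a, b)[0],    # linear agreement
        "spearman": spearmanr(a, b)[0],  # monotonic (rank) agreement
        "kendall": kendalltau(a, b)[0],  # rank agreement, robust to outliers
    }
```

Two identical importance vectors yield a coefficient of 1.0 for all three metrics, which serves as a quick sanity check of the implementation.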
To interpret the coefficient values calculated by each metric, we can refer to Table 4a and Table 4b.
5 Experimental Results
This section presents the results of our experiments, the analysis, and the answers to the research questions proposed to evaluate the disagreement between LIME and SHAP when explaining feature importance, and whether divergence in the explanations is a problem for the interpretability of the models. As described in Sect. 4.2, we preprocessed the datasets by computing the FFT and reducing the data to 24 dimensions with KPCA before training the ML models.
The first research question is addressed in Sect. 5.1 (Are LIME and SHAP coherent when explaining feature importance?), where we analyze the coherence between the feature importance values produced by LIME and SHAP for each ML model and dataset, computing different correlation metrics to measure how similar the explanations are. Subsequently, in Sect. 5.2 (Is there a direct relationship between classifier discriminative capability and feature importance?), we evaluate the impact of removing the most and least important features on model performance across different ML models and datasets. This analysis helps determine how the exclusion of these features affects the overall performance.
5.1 Explainability Coherence
The first research question aims to evaluate the coherence between the feature importance values produced by LIME and SHAP for each ML model and dataset. Answering this question is essential to understand if the explanations provided by these techniques are consistent and can be trusted.
To answer RQ 1, i.e., “Are LIME and SHAP coherent when explaining feature importance?”, we first train each ML model (RF, SVM, kNN, and DT) with each dataset, then we calculate the importance of each feature according to each explainability technique for each ML model trained.
Figure 2 shows the global importance of each feature according to SHAP and LIME when classifying the samples from the WISDM dataset using the RF, SVM, kNN, and DT models. Notice that the explainability techniques do not assign the same importance to each feature. However, they tend to agree on the most and least relevant features. Both techniques agree that the most important features for the RF model (Fig. 2a) are features 0, 1, 2, and 8, and that the most important features for the DT model (Fig. 2b) are features 1, 2, and 4. For the SVM (Fig. 2c) and kNN (Fig. 2d) models, the most important features are 0, 1, and 2, in this order.
The same trend can be observed when training the four ML models with other datasets, and, despite a few exceptions, the importance assigned by LIME and SHAP to each feature has a strong correlation.
Figure 3 shows the Pearson, Spearman, and Kendall’s Tau correlations for the LIME and SHAP feature importance values. The results indicate a very strong correlation between LIME’s and SHAP’s feature importance values overall, with a few exceptions, mostly on DT models. However, even in these cases, the correlation was close to the boundary and still indicated a strong to very strong correlation (based on Tables 4a and 4b), suggesting that the explanations provided by both techniques agree with each other. Despite slight differences in ranking when analyzing the most important features, these differences are negligible because the techniques can still indicate the set of features that are most important.
The results in Figs. 3a and 3b indicate that LIME and SHAP had the lowest Spearman and Kendall’s Tau correlations when computing the feature importance values for the kNN and DT models on the KuHAR and UCI datasets. However, Fig. 3c indicates that the Pearson correlation for this experiment is high. This happens because some features have much higher relevance than the others, and the Pearson correlation is very sensitive to such outliers. Figure 2b shows the feature importance values for the DT model trained with WISDM: feature 0 has a much higher importance value than the others, which can lead to a high Pearson correlation.
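This sensitivity of Pearson correlation to a single dominant feature can be reproduced with synthetic importance vectors: two vectors that disagree on the ranks of most features still yield a high Pearson coefficient when both assign one shared feature an outlying importance (the values below are synthetic, not taken from the paper):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(42)
# Two synthetic importance vectors over 24 features that share one dominant
# feature (index 0) but are otherwise independent, mimicking the DT case.
a = np.concatenate([[10.0], rng.uniform(0, 1, 23)])
b = np.concatenate([[10.0], rng.uniform(0, 1, 23)])

print(pearsonr(a, b)[0])   # high: dominated by the shared outlying feature
print(spearmanr(a, b)[0])  # much lower: ranks of the small features disagree
```

This is precisely why rank-based metrics (Spearman, Kendall's Tau) give a more faithful picture of agreement when a few features dominate the importance distribution.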
5.2 Feature Importance Vs Classifier Performance
This section presents the experimental results to address RQ 2, i.e., “Is there a direct relationship between classifier discriminative capability and feature importance?”. Answering this question helps to understand if the least relevant features can be removed without affecting the model’s performance and if the most relevant features are essential to the model’s performance. To answer this question, we propose two new questions:
-
RQ 2-a “Does the removal of features with low relevance affect the discriminative power of the models?”;
-
RQ 2-b “Does the removal of highly relevant features significantly affect the discriminative power of the models?”.
Answering these questions will help to understand the impact of each feature on the model’s performance, aiding researchers in determining which features are essential to the model without affecting its performance.
In addressing RQ 2-a, we computed the feature importance for each ML model using both LIME and SHAP. Subsequently, we organized the 24 features in ascending order of relevance and retrained the ML model by eliminating the least important features. Similarly, to address RQ 2-b, we conducted the same analysis, focusing on removing the most important features.
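The removal-and-retrain procedure described above can be sketched as follows (assuming a scikit-learn-style model and a precomputed global importance vector; names are illustrative):

```python
import numpy as np
from sklearn.base import clone

def removal_curve(model, X_tr, y_tr, X_te, y_te, importance, best_first=True):
    """Retrain the model while progressively removing features, ordered by a
    global importance vector, and record test accuracy at each step."""
    order = np.argsort(importance)          # ascending: least important first
    if best_first:
        order = order[::-1]                 # remove the most important first
    accs = []
    for n_removed in range(X_tr.shape[1]):  # always keep at least one feature
        keep = order[n_removed:]
        clf = clone(model).fit(X_tr[:, keep], y_tr)
        accs.append(clf.score(X_te[:, keep], y_te))
    return accs
```

Running this once with `best_first=True` and once with `best_first=False` produces the two families of curves (-Best and -Worst) analyzed in the figures below: removing an informative feature should degrade accuracy quickly, while removing noise features should leave it nearly unchanged.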
The performance of the models when removing the most and least important features per XAI technique is depicted in Figs. 4 and 5. The lines in red and gray (-Best) represent the accuracy of the models when removing the most important features according to LIME and SHAP, respectively, while the lines in blue and black (-Worst) represent the accuracy when removing the least important ones according to LIME and SHAP, respectively.
The results indicate that, despite some differences in their respective rankings, the XAI techniques agree on the sets of features that are most and least important. It can be observed that the lines corresponding to the removal of the most important features (in red and gray) decrease together, while the lines corresponding to the removal of the least important features (in blue and black) also decrease together. Furthermore, the performance of the models is generally unaffected when the least important features are removed, except for the DT model, which exhibits oscillatory behavior. Conversely, removing the most important features significantly affects model performance, resulting in a noticeable decrease in accuracy.
Moreover, the expected behavior is that the lines corresponding to the removal of the most important features (in red and gray) decrease faster than the lines corresponding to the removal of the least important features (in blue and black). This pattern is observed in most cases. However, there are exceptions, such as the SVM model on the KH, MS, and RW-Thigh datasets, where the lines decrease at the same rate, and the DT model, which exhibits oscillatory behavior across almost all datasets.
Regarding these exceptions, several hypotheses can be proposed: (i) There might still be redundancy across features, allowing the model to adapt to the new feature space created by the removal of certain features. This adaptation could result in assigning more importance to the remaining features, potentially explaining the unexpected behavior of the SVM model on the KH, MS, and RW-Thigh datasets. (ii) The DT model might have identified correlations between some features, focusing primarily on one of them. When this feature is removed, the model shifts its attention to another feature, which could account for the observed oscillatory behavior.
6 Conclusions and Future Work
In this paper, we evaluated the explanations generated by two XAI techniques, LIME and SHAP, on a variety of HAR datasets and ML models. To do so, we employed correlation metrics to compare the explanations generated by the techniques based on feature importance. We also showed that, for Random Forest and SVM, the techniques agree almost perfectly in the ranking comparison, indicating that the explanations are very similar. Despite some differences when comparing the rankings of the features, these differences are negligible because, in most cases, the techniques successfully identify the most and least important features. This finding is important because it demonstrates that the XAI techniques are capable of identifying the most and least relevant features for the model, which is their primary goal.
The results of this study show that correlation metrics alone are not sufficient to evaluate how similar the explanations generated by the techniques are. Furthermore, the removal of the least important features does not affect the model’s performance, which indicates that the techniques are capable of identifying the least important features. On the other hand, the removal of the most important features does not necessarily affect the model’s performance, but this does not mean that the techniques are incapable of identifying the most important features. Instead, it suggests that the model may learn the same patterns from other features.
As future work, we intend to investigate the similarities and differences between the explanations generated by the XAI techniques in other domains, such as time and frequency domains, and to understand how the techniques behave with more interpretable forms of data.
Code Availability
All code is open-source, licensed under the MIT License, and available at https://github.com/H-IAAC/KR-Papers-Artifacts.
Notes
- 1.
The Hub for Artificial Intelligence and Cognitive Architectures (H.IAAC) aims to develop and disseminate knowledge about technologies capable of integrating different intelligence resources into mobile devices, making them skilled in decision-making. To read more about the H.IAAC project, visit the website: https://hiaac.unicamp.br/en/about-the-hub/.
- 2.
Link to the page: https://hiaac.unicamp.br/en/.
- 3.
Since the dataset is balanced, the Random classifier performance is estimated by dividing 100% by the number of classes.
References
Anguita, D., et al.: A public domain dataset for human activity recognition using smartphones. In: ESANN, vol. 3 (2013)
Belaid, M.K., Bornemann, R., Rabus, M., Krestel, R., Hüllermeier, E.: Compare-XAI: toward unifying functional testing methods for post-hoc XAI algorithms into a multi-dimensional benchmark. In: World Conference on XAI, pp. 88–109. Springer (2023)
Bragança, H., Colonna, J.G., Oliveira, H.A., Souto, E.: How validation methodology influences human activity recognition mobile systems. Sensors 22(6), 2360 (2022)
Duell, J., Fan, X., Burnett, B., Aarts, G., Zhou, S.M.: A comparison of explanations given by explainable artificial intelligence methods on analysing electronic health records. In: 2021 IEEE EMBS International Conference on BHI, pp. 1–4. IEEE (2021)
Ferreira, A., Santos, G., Rocha, A., Goldenstein, S.: User-centric coordinates for applications leveraging 3-axis accelerometer data. IEEE Sens. J. 17(16), 5231–5243 (2017)
Giurgiu, I., Schumann, A.: Explainable failure predictions with RNN classifiers based on time series data. arXiv preprint arXiv:1901.08554 (2019)
Gwinner, F., Tomitza, C., Winkelmann, A.: Comparing expert systems and their explainability through similarity. In: Decision Support Systems, p. 114248 (2024)
Hassan, M.M., Uddin, M.Z., Mohamed, A., Almogren, A.: A robust human activity recognition system using smartphone sensors and deep learning. Futur. Gener. Comput. Syst. 81, 307–313 (2018)
Jeyashree, G., Padmavathi, S.: IHAR: a fog-driven interpretable human activity recognition system. Trans. Emerg. Telecommun. Technol. 33(9) (2022)
Kaushik, A., Gurucharan, K., Padmavathi, S.: Enhancing human activity recognition: an exploration of machine learning models and explainable AI approaches for feature contribution analysis. In: 2023 ICEMCE, pp. 1–6. IEEE (2023)
Krishna, S., et al.: The disagreement problem in explainable machine learning: a practitioner’s perspective. arXiv preprint arXiv:2202.01602 (2022)
van der Linden, I., Haned, H., Kanoulas, E.: Global aggregations of local explanations for black box models. arXiv preprint arXiv:1907.03039 (2019)
Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 30 (2017)
Malekzadeh, M., Clegg, R.G., Cavallaro, A., Haddadi, H.: Mobile sensor data anonymization. In: Proceedings of the International Conference on IOT Design and Implementation, pp. 49–58 (2019)
Nahid, A.A., Sikder, N., Rafi, I.: KU-HAR: an open dataset for human activity recognition. Mendeley Data (2021)
Neely, M., Schouten, S., Bleeker, M., Lucic, A.: Order in the court: explainable ai methods prone to disagreement. arXiv preprint arXiv:2105.03287 (2021)
Papadopoulos, A., Kyritsis, K., Klingelhoefer, L., Bostanjopoulou, S., Chaudhuri, K.R., Delopoulos, A.: Detecting Parkinsonian tremor from IMU data collected in-the-wild using deep multiple-instance learning. IEEE J. Biomed. Health Inform. 24(9) (2019)
Reyes-Ortiz, J.L., Oneto, L., Samà, A., Parra, X., Anguita, D.: Transition-aware human activity recognition using smartphones. Neurocomputing 171 (2016)
Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should I trust you?” Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144 (2016)
Santoyo-Ramón, J.A., Casilari, E., Cano-García, J.M.: A study of one-class classification algorithms for wearable fall sensors. Biosensors 11(8), 284 (2021)
Schlegel, U., Arnout, H., El-Assady, M., Oelke, D., Keim, D.A.: Towards a rigorous evaluation of XAI methods on time series. In: 2019 IEEE/CVF (ICCVW), pp. 4197–4201. IEEE (2019)
Soni, V., Yadav, H., Semwal, V.B., Roy, B., Choubey, D.K., Mallick, D.K.: A novel smartphone-based human activity recognition using deep learning in health care. In: 3rd International Conference on MIND 2021, pp. 493–503. Springer (2023)
Sztyler, T., Stuckenschmidt, H.: On-body localization of wearable devices: an investigation of position-aware activity recognition. In: 2016 IEEE International Conference on PerCom, pp. 1–9. IEEE (2016)
Uddin, M., Soylu, A.: Human activity recognition using wearable sensors, discriminant analysis, and long short-term memory-based neural structured learning. Sci. Rep. (2021)
Vijayvargiya, A., Singh, P., Kumar, R., Dey, N.: Hardware implementation for lower limb surface emg measurement and analysis using explainable ai for activity recognition. IEEE TIM 71, 1–9 (2022)
Weiss, G.M., Yoneda, K., Hayajneh, T.: Smartphone and smartwatch-based biometrics using activities of daily living. IEEE Access 7, 133190–133202 (2019)
Yang, J., Nguyen, M.N., San, P.P., Li, X., Krishnaswamy, S.: Deep convolutional neural networks on multichannel time series for human activity recognition. In: IJCAI, vol. 15, pp. 3995–4001. Buenos Aires, Argentina (2015)
Zhang, M., Sawchuk, A.A.: Manifold learning and recognition of human activity using body-area sensors. In: 2011 10th International Conference on Machine Learning and Applications and Workshops (ICMLA), vol. 2, pp. 7–13. IEEE (2011)
Acknowledgements
This project was supported by the Ministry of Science, Technology, and Innovation of Brazil, with resources granted by the Federal Law 8.248 of October 23, 1991, under the PPI-Softex [01245.003479/2024-10]. The authors also thank CNPq (315399/2023-6 and 404087/2021-3) and Fapesp (2013/08293-7) for their financial support.
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Alves, P., Delgado, J., Gonzalez, L., Rocha, A.R., Boccato, L., Borin, E. (2025). Comparing LIME and SHAP Global Explanations for Human Activity Recognition. In: Paes, A., Verri, F.A.N. (eds) Intelligent Systems. BRACIS 2024. Lecture Notes in Computer Science(), vol 15414. Springer, Cham. https://doi.org/10.1007/978-3-031-79035-5_12
DOI: https://doi.org/10.1007/978-3-031-79035-5_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-79034-8
Online ISBN: 978-3-031-79035-5