1 Introduction

Launched in 2017, Canada’s national Artificial Intelligence (AI) Strategy was the first in the world aimed at guiding AI policy priorities at the country level. It helped Canada define AI policy and prioritize investments, and it stimulated research, innovation, and the development of AI solutions, as well as attention to their responsible and ethical aspects [1]. Finland also developed its national AI strategy in 2017, closely followed by Japan, France, Germany, and the United Kingdom in 2018. As of 2021, more than 30 other countries and regions, including Brazil, had launched national AI strategies.

Alongside the development of national AI strategies, AI indices began to be built to compare nations on their level of AI development. The Global AI Index proposed by Tortoise Media ranks 62 countries to benchmark nations on their level of investment, innovation, and implementation of AI [2]. The Global AI Vibrancy Tool proposed by the Stanford Institute for Human-Centered Artificial Intelligence (Stanford HAI) provides a weighted ranking of 29 countries, aiming to identify the countries leading the AI race in two main dimensions: Research and Development, and Economy.

Each AI dimension used in the construction of an index is associated with a relative importance weight, reflecting the fact that different dimensions contribute to the level of AI development to different degrees. Tortoise Media adopted a weighting approach based on subjective assumptions, which may affect the composite score of each country and, consequently, its position in the ranking. Stanford HAI, on the other hand, concerned about the subjectivity of dimension weights and its impact on the final ranking, adopts a user preference-based weighting methodology: it provides an interactive tool that lets users change the dimension weights and obtain different rankings depending on their declared preferences.

Despite their different weighting approaches, these indices are similar in that both are built on a linear aggregation of multiple dimensions, although each adopts a different set of dimensions. A linear aggregation generally does not consider interactions among criteria. Indeed, non-redundancy is a desirable property when defining the set of criteria [3]. However, correlated criteria are frequently observed in real applications, and an approach that models interaction between criteria could therefore help avoid biased results [4].

As pointed out in the composite indicator literature [5, 6], indicators that combine multiple dimensions should be aggregated and weighted appropriately, taking into account correlation and compensability among indicators and avoiding subjective weighting. Robustness analysis should be undertaken to assess the impact of the assumptions made when building the composite indicator, such as the choice of weights and the aggregation method. The outcome of such an analysis may show that some of these assumptions need to be revised, which may ultimately lead to different final decisions.

Motivated by the requirements usually associated with the development of robust indicators, this paper presents a critical analysis of the weighting and aggregation approaches used by AI indicators that compare countries. We attempt to answer the following three research questions:

  1. Do the criteria weights influence the resulting AI ranking of countries?
  2. Is the hypothesis of interactions between AI dimensions true?
  3. Does the use of a non-linear aggregator that considers the interaction between criteria influence the resulting AI ranking of countries?

In our analysis we consider the Global AI Index proposed by Tortoise Media, because it ranks a greater number of countries and considers a greater number of dimensions, and because its data are easy to obtain. To answer these questions, we apply two MCDA (Multiple Criteria Decision Aid) methods: SMAA and the Choquet integral.

The first question is answered through robustness analyses based on Stochastic Multicriteria Acceptability Analysis (SMAA) [7, 8]. SMAA relies on an inverse weight space analysis that describes the criteria weights making each country the most “preferred” one; therefore, it does not require weights to be predefined. SMAA can also be used with decision models other than the weighted sum. Among other descriptive measures, SMAA yields the probability of each country occupying each position in the ranking. Since SMAA considers the uncertainty in all parameters simultaneously, it is particularly useful for robustness analysis. We apply SMAA with the weighted sum, letting the weights vary, conduct a comprehensive analysis of the resulting ranking variation, and compare the results with the Tortoise ranking. We consider two scenarios: weight information is totally missing, and weights follow the preference order adopted in the Tortoise Index.

To answer Question 2, we show, by means of the correlation coefficients between pairs of dimensions, that the AI dimensions are statistically redundant. Finally, to answer Question 3, we evaluate a non-linear aggregator, the Choquet integral [9], which takes into account interaction among criteria, and we compare the ranking it produces with the Tortoise ranking.

This paper is organized as follows. In Sect. 2, we introduce the Multiple Criteria Decision Aid problem. Section 3 presents an overview of the methodology used in Tortoise to derive the Global AI Index. In Sect. 4, we provide the theoretical background on the SMAA methodology and the Choquet integral. Section 5 presents the methodology adopted in this paper. Results are presented and discussed in Sect. 6. We conclude this paper in Sect. 7.

2 Multiple Criteria Decision Aid

Multiple Criteria Decision Aid (MCDA) is an area of research concerned with mathematical and computational design tools that can be used either by an individual decision-maker (DM) or a group of DMs, to evaluate a finite number of decision alternatives regarding a set of performance criteria, which are determined according to the decision context [10]. These approaches make it possible to reduce the subjectivity inherent in decision-making processes while considering the preferences of the DM(s).

The DM is the person who holds authority over the decision and is responsible for setting the model parameters. The set of alternatives \(\mathcal {A} = \{a_{1}, a_2, \ldots , a_m\}\) is a finite set of m elements, with \(m \ge 2\), all of which are considered possible solutions to the problem under study. Decision criteria \(G = \{g_1, g_2, \ldots , g_n\}\) are qualitative or quantitative attributes used to evaluate the alternatives. In MCDA we assume that there are at least two criteria (\(n \ge 2\)). Each criterion \(g_j\) is assigned a relative importance \(w_j\), called the criterion weight, \(j=1, \ldots , n\). We denote by \(\textbf{w} = (w_1, w_2, \ldots , w_n)\) the criteria weight vector.

Each alternative \(a_i\) is evaluated on each criterion \(g_j\), and \(g_j(a_i)\) denotes the performance of alternative \(a_i\) on criterion \(g_j\), for \(i=1, \ldots , m\) and \(j=1, \ldots , n\).

The different MCDA methods aim to solve a decision-making problem with m alternatives \(\mathcal {A} = \{a_1, \ldots , a_m\}\), evaluated according to n criteria \(\{g_1, \ldots , g_n\}\). Without loss of generality, it is assumed that all criteria must be maximized. Thus, the decision problem is defined as

$$\begin{aligned} \max \{g_1(a), \ldots , g_n(a) / a \in \mathcal {A}\}. \end{aligned}$$
(1)

A utility function \(u(a_i, \textbf{w})\) is then applied as the aggregation procedure to obtain a utility value (or score) for each alternative, which, in the ranking problematic, is used to order the set of alternatives into a final ranking.

3 Tortoise’s Approach to Deriving the Global AI Index: An Overview of the Methodology

The Global AI Index (GAII) proposed by Tortoise Media [2] aims at ranking 62 countries, represented by \(\left[ a_1, \ldots , a_{62} \right] \), based on their level of development in artificial intelligence, which is measured by combining three categories of indicators:

Implementation:

This category evaluates how artificial intelligence is being implemented by businesses, governments, and communities. It comprises three dimensions: Talent, Infrastructure, and Operating Environment;

Innovation:

This category measures technological advancements and methodological breakthroughs that indicate a greater potential for artificial intelligence in the future. This pillar is divided into two separate dimensions, Research and Development;

Investment:

This category assesses the financial and procedural commitments made towards artificial intelligence, and is composed of two dimensions: Commercial Ventures and Government Strategy.

Scores for each of the seven dimensions (also called criteria) are obtained by aggregating several sub-criteria, and are displayed on the Tortoise website. A Total Score (TS) for each country is then obtained through a weighted sum (WS), calculated as

$$\begin{aligned} \text {TS}_i^{WS} = \frac{1}{\sum _{j=1}^{n} w_j}\sum _{j=1}^{n} w_j g_j(a_i), \forall i, \end{aligned}$$
(2)

where \(w_j\) represents the weight assigned to criterion \(g_j\), and \(g_j(a_i)\) is the score of country \(a_i\) over criterion \(g_j\), for \(i = 1, \cdots , 62\) and \(j = 1, \cdots , 7\). The weights used by Tortoise to obtain the total score are: Talent (\(w_1 = 5\)), Infrastructure (\(w_2 = 3\)), Operating Environment (\(w_3 = 2\)), Research (\(w_4 = 5\)), Development (\(w_5 = 3\)), Government strategy (\(w_6 = 1\)), and Commercial ventures (\(w_7 = 5\)). The scores of the top six countries according to the Tortoise GAII are illustrated in Table 1.
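The following Python sketch implements Eq. (2) with the Tortoise weights listed above, in the criterion order \(g_1, \ldots , g_7\) of this section. The function names and the two fictitious countries in the usage line are hypothetical illustrations, not Tortoise code or GAII data.

```python
# Tortoise GAII weights of Eq. (2), in the order g1..g7 used in the text:
# Talent, Infrastructure, Operating Environment, Research, Development,
# Government Strategy, Commercial Ventures.
TORTOISE_WEIGHTS = (5, 3, 2, 5, 3, 1, 5)


def weighted_sum(scores, weights=TORTOISE_WEIGHTS):
    """Total score TS_i^WS of Eq. (2) for one country."""
    if len(scores) != len(weights):
        raise ValueError("expected one score per criterion")
    return sum(w * g for w, g in zip(weights, scores)) / sum(weights)


def rank_countries(table, weights=TORTOISE_WEIGHTS):
    """Country names ordered by decreasing total score (best first)."""
    totals = {name: weighted_sum(g, weights) for name, g in table.items()}
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical scores for two fictitious countries (not GAII data).
print(rank_countries({"X": [100, 80, 70, 90, 60, 100, 85],
                      "Y": [90, 95, 90, 80, 85, 90, 80]}))  # -> ['Y', 'X']
```

Because Eq. (2) divides by \(\sum _j w_j\), multiplying all weights by the same constant leaves the total scores unchanged; only the ratios between weights matter.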

Table 1. Scores for each AI dimension and the total score (TS) of the top six countries determined by Tortoise GAII.

One may note in Table 1 that the USA has emerged as the leader in the GAII, scoring the maximum possible points in four out of the seven criteria: Talent, Infrastructure, Operating Environment, and Government Strategy. As a result, the USA received the highest overall score among the 62 countries ranked by the GAII.

This approach provides an understanding of a country’s AI capacity by considering multiple factors, and the weights assigned to each criterion reflect their relative importance in determining a country’s overall score. However, two main drawbacks related to Tortoise GAII can be highlighted. Firstly, the assignment of weights can be considered subjective, as noted by Tortoise itself [2]. Secondly, the adoption of the weighted sum as an aggregator is based on the unverified hypothesis of independence between the criteria.

4 Preliminaries

In this section, we present the building blocks of our proposal: the SMAA algorithm and the Choquet integral, including a description of how the Choquet integral parameters are adjusted.

4.1 SMAA

SMAA (Stochastic Multicriteria Acceptability Analysis) is a simulation-based method for discrete multicriteria decision-making problems in which model parameters are uncertain, imprecise, or, specifically in the case of criteria weights, partially or totally missing [7, 11]. Uncertain information is represented by probability distributions. Through a Monte Carlo simulation, values for the uncertain variables are sampled from their distributions, and the alternatives are evaluated by applying a decision model, which can be a weighted average or any other aggregation procedure. In a ranking problem, SMAA explores the rankings of the alternatives obtained across the sampled parameter values and quantifies the results in terms of probabilities. Usually, the recommended solution is the ranking with the highest probability.

Different SMAA variants have been proposed in the literature [11]; most of them are based on the SMAA-2 version proposed in [8]. SMAA-2 computes three descriptive measures. The rank acceptability index \(b_{i}^s\) describes the probability of alternative \(a_{i}\) being ranked in position s. It ranges between 0 and 1, and the closer \(b_{i}^s\) is to 1, the more likely \(a_{i}\) is to occupy position s. The central weight vector \(w_{i}^c\) describes the DM preferences that support alternative \(a_{i}\) being ranked first. Central weight vectors allow an inverse decision-making approach: instead of defining the weights beforehand and then building a solution, the DM can learn which weights lead an alternative to be ranked first. The confidence factor \(p^{c}_i\) is the probability of an alternative being the preferred one when the criteria weights are given by its central weight vector.

Since SMAA provides probabilities for all possible solutions, it describes how robust the model is to different uncertainties in the input data, which makes it a useful tool for robustness analysis. The rank acceptability index can support this analysis. For instance, alternatives with high acceptability for the best ranks are candidates for the top positions, while alternatives with large acceptability for the worst ranks should be kept out of the top positions, even if they also have fairly high acceptability for the best ranks. If none of the alternatives receives high acceptability indices for the best ranks, the criteria, the preferences, or both need to be measured more accurately.
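The Monte Carlo procedure described above can be sketched in Python for the rank acceptability index alone, assuming exact criteria values, the weighted sum as the decision model, and uniformly distributed weights (the case of totally missing weight information). Central weight vectors and confidence factors are omitted, and all names are hypothetical; this is not the implementation used in the paper.

```python
import random


def sample_uniform_weights(n, rng):
    """Draw weights uniformly from the simplex {w >= 0, sum(w) = 1}
    (the "totally missing" weight information case)."""
    draws = [rng.expovariate(1.0) for _ in range(n)]  # normalized Exp(1) = Dirichlet(1,...,1)
    total = sum(draws)
    return [d / total for d in draws]


def rank_acceptability(perf, n_iter=10_000, seed=0, weight_sampler=None):
    """Monte Carlo estimate of the SMAA-2 rank acceptability indices b_i^s.

    perf: m performance vectors (one per alternative, n criteria each),
          treated as exact values; only the weights are uncertain.
    Returns an m x m matrix whose entry [i][s] estimates the probability
    that alternative i occupies position s + 1 under the weighted sum.
    """
    rng = random.Random(seed)
    m, n = len(perf), len(perf[0])
    sampler = weight_sampler or (lambda: sample_uniform_weights(n, rng))
    counts = [[0] * m for _ in range(m)]
    for _ in range(n_iter):
        w = sampler()
        scores = [sum(wj * gj for wj, gj in zip(w, g)) for g in perf]
        # Position 0 = best; ties are broken by alternative index (a simplification).
        order = sorted(range(m), key=lambda i: -scores[i])
        for s, i in enumerate(order):
            counts[i][s] += 1
    return [[c / n_iter for c in row] for row in counts]
```

Column s of the returned matrix gives the estimated \(b_i^s\) for every alternative; the `weight_sampler` hook lets a constrained weight distribution replace the uniform one without changing the simulation loop.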

4.2 Choquet Integral

The (discrete) Choquet integral (CI) [9] is a non-linear aggregation function that takes into account interaction among criteria. It is defined as follows:

$$\begin{aligned} TS_i^{CI} = \sum _{j=1}^{n}\left[ g_{(j)}(a_i) - g_{(j-1)}(a_i)\right] \mu \left( \left\{ (j), \ldots , (n) \right\} \right) , \end{aligned}$$
(3)

where \((1), \ldots , (n)\) indicates a permutation of the indices j such that \(0 = g_{(0)}(a_i) \le g_{(1)}(a_i) \le g_{(2)}(a_i) \le \ldots \le g_{(n)}(a_i)\), and \(\mu (\cdot )\) represents the set of parameters known as capacity coefficients. A capacity \(\mu :2^{N} \rightarrow \mathbb {R}_{+}\), where \(N=\left\{ 1, 2, \ldots , n\right\} \) is the set of criteria, is a set function that satisfies the following axioms:

  (a) Normalization: \(\mu (\emptyset ) = 0\) and \(\mu (N) = 1\);
  (b) Monotonicity: \(\forall A \subseteq B \subseteq N, \mu (A) \le \mu (B)\).
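A direct Python transcription of Eq. (3) is sketched below, assuming non-negative scores (so that \(g_{(0)} = 0\) is a valid baseline) and a capacity stored as a dictionary from frozensets of 0-based criterion indices to \(\mu (A)\). The helper `additive_capacity` (hypothetical, like all names here) builds an additive capacity, for which the Choquet integral coincides with the normalized weighted sum of Eq. (2).

```python
from itertools import combinations


def choquet(scores, capacity):
    """Discrete Choquet integral of Eq. (3) for one alternative.

    scores:   criterion values g_1(a), ..., g_n(a) (criteria indexed 0..n-1).
    capacity: dict mapping each frozenset of criterion indices to mu(A).
    """
    order = sorted(range(len(scores)), key=lambda j: scores[j])  # (1), ..., (n)
    total, previous = 0.0, 0.0  # previous holds g_(j-1); g_(0) = 0
    for k, j in enumerate(order):
        upper = frozenset(order[k:])  # coalition {(k+1), ..., (n)} in 1-based terms
        total += (scores[j] - previous) * capacity[upper]
        previous = scores[j]
    return total


def additive_capacity(weights):
    """mu(A) = sum of the weights in A; with it, Eq. (3) reduces to a weighted sum."""
    n = len(weights)
    return {frozenset(c): sum(weights[j] for j in c)
            for r in range(n + 1) for c in combinations(range(n), r)}
```

With \(\mu (A) = 1\) for every non-empty A, the integral returns the largest score; with \(\mu (A) = 1\) only for \(A = N\), it returns the smallest. Any other capacity yields a value between these two extremes.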

An interesting aspect of the Choquet integral is that the capacity coefficients are associated with the Shapley values [13], a well-known solution concept from game theory. In multicriteria decision making, the Shapley value of a criterion j, denoted \(\phi _j\), indicates its average marginal contribution to the aggregation procedure. The linear relation between \(\mu \) and \(\phi _j\) is given as follows:

$$\begin{aligned} \phi _{j} = \sum _{A \subseteq N\backslash \left\{ j\right\} } \frac{\left( n-\left| A\right| -1\right) !\left| A\right| !}{n!} \left[ \mu (A \cup \left\{ j\right\} ) - \mu (A) \right] , \end{aligned}$$
(4)

where \(\left| A \right| \) represents the cardinality of subset A. A property of the Shapley values that is useful when learning the capacity coefficients and interpreting the obtained parameters is that \(\phi _j \ge 0\), \(\forall j=1, \ldots , n\). Therefore, the importance assigned to each criterion is non-negative, and the higher \(\phi _j\), the more criterion j contributes to the aggregation.

Besides the marginal contributions, one may also interpret the interaction between criteria. In this case, the Shapley interaction index between criteria \(j,j'\) is given by [14, 15]

$$\begin{aligned} I_{j,j'} = \sum _{A \subseteq N\backslash \left\{ j,j'\right\} } \frac{\left( n-\left| A\right| -2\right) !\left| A\right| !}{\left( n-1\right) !} \left[ \mu (A \cup \left\{ j,j'\right\} ) - \mu (A \cup \left\{ j\right\} ) - \mu (A \cup \left\{ j'\right\} ) + \mu (A)\right] . \end{aligned}$$
(5)

Each \(I_{j,j'}\) can be interpreted as the degree of interaction between criteria \(j,j'\). If \(I_{j,j'} < 0\), there is a negative interaction (or redundant effect) between criteria \(j,j'\). If \(I_{j,j'} > 0\), there is a positive interaction (or complementary effect) between them. If \(I_{j,j'} = 0\), there is no interaction between criteria \(j,j'\), and they act independently.
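Eqs. (4) and (5) can be evaluated by brute force over all subsets, which is affordable for the seven GAII criteria (\(2^7 = 128\) subsets). The Python sketch below assumes a capacity stored as a dictionary from frozensets of 0-based criterion indices to \(\mu (A)\); the function names are hypothetical.

```python
from itertools import combinations
from math import factorial


def _subsets(items):
    """All subsets of `items`, as frozensets."""
    for r in range(len(items) + 1):
        yield from (frozenset(c) for c in combinations(items, r))


def shapley_values(capacity, n):
    """Shapley value phi_j of Eq. (4) for every criterion j = 0..n-1."""
    phi = []
    for j in range(n):
        others = [k for k in range(n) if k != j]
        phi.append(sum(
            factorial(n - len(A) - 1) * factorial(len(A)) / factorial(n)
            * (capacity[A | {j}] - capacity[A])
            for A in _subsets(others)))
    return phi


def interaction_index(capacity, n, j, k):
    """Shapley interaction index I_{j,k} of Eq. (5)."""
    others = [h for h in range(n) if h not in (j, k)]
    return sum(
        factorial(n - len(A) - 2) * factorial(len(A)) / factorial(n - 1)
        * (capacity[A | {j, k}] - capacity[A | {j}] - capacity[A | {k}] + capacity[A])
        for A in _subsets(others))
```

For instance, the two-criterion capacity \(\mu (\{1\}) = \mu (\{2\}) = 0.6\), \(\mu (N) = 1\) gives \(\phi _1 = \phi _2 = 0.5\) and \(I_{1,2} = -0.2\): each criterion alone is worth more than half of \(\mu (N)\), which signals redundancy.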

Although the Shapley index can be generalized to any coalition of criteria (see [16] for further details), in this paper we restrict these parameters to singletons and pairs of criteria. Indeed, we consider a particular case of the Choquet integral based on the notion of a 2-additive capacity [16]. A 2-additive capacity implies that interactions exist only between pairs of criteria; in other words, we assume that the interaction among three or more criteria is zero. Based on a 2-additive capacity, the Choquet integral can be expressed by means of the Shapley values and Shapley interaction indices as follows [16]:

$$\begin{aligned} TS_i^{2adCI} = & {} \sum _{I_{j,j'} > 0} \min \{g_j(a_i), g_{j'}(a_i)\}I_{j,j'} + \sum _{I_{j,j'} < 0} \max \{g_j(a_i), g_{j'}(a_i)\}|I_{j,j'}| \nonumber \\ {} + & {} \sum _{j=1}^{n} g_{j}(a_i) (\phi _{j} - \frac{1}{2}\sum _{j'\ne j}|I_{j,j'}|). \end{aligned}$$
(6)

An interesting aspect of Eq. (6) in comparison with Eq. (3) is that the number of parameters is reduced from \(2^n\) to \(n(n+1)/2\) (n Shapley values plus \(n(n-1)/2\) pairwise interaction indices). Admittedly, this sacrifices some flexibility in modeling all kinds of interaction among criteria. However, the 2-additive Choquet integral offers a good trade-off between flexibility and model complexity [17, 18].

When one assumes a 2-additive capacity, one may redefine the axioms of a capacity in terms of the Shapley values and interaction indices as

$$\begin{aligned} \sum _{j=1}^{n} \phi _j = 1 \end{aligned}$$
(7)

and

$$\begin{aligned} \phi _{j} - \frac{1}{2}\sum _{j' \ne j}|I_{j,j'}| \ge 0, \forall j \in N. \end{aligned}$$
(8)
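Eq. (6) and the conditions (7) and (8) translate into the following Python sketch. Interaction indices are given as a dictionary keyed by pairs (j, k) with j < k (0-based), pairs absent from it are taken as zero, and the function names are hypothetical. For the seven GAII criteria, the model is described by 7 Shapley values and 21 interaction indices, i.e. \(n(n+1)/2 = 28\) parameters.

```python
def is_feasible_2additive(phi, inter, tol=1e-9):
    """Check Eq. (7) (Shapley values sum to 1) and Eq. (8) for every criterion."""
    if abs(sum(phi) - 1.0) > tol:
        return False
    for j in range(len(phi)):
        spread = sum(abs(v) for (a, b), v in inter.items() if j in (a, b))
        if phi[j] - 0.5 * spread < -tol:
            return False
    return True


def choquet_2additive(scores, phi, inter):
    """Total score TS^{2adCI} of one alternative, following Eq. (6)."""
    total = 0.0
    for (j, k), v in inter.items():
        if v > 0:    # complementarity: the smaller of the two scores is rewarded
            total += min(scores[j], scores[k]) * v
        elif v < 0:  # redundancy: the larger of the two scores is rewarded
            total += max(scores[j], scores[k]) * -v
    for j, g in enumerate(scores):
        spread = sum(abs(v) for (a, b), v in inter.items() if j in (a, b))
        total += g * (phi[j] - 0.5 * spread)
    return total
```

As a consistency check, \(\phi _1 = \phi _2 = 0.5\) and \(I_{1,2} = -0.2\) give 66 for scores (90, 30), the same value as Eq. (3) with the capacity \(\mu (\{1\}) = \mu (\{2\}) = 0.6\), \(\mu (N) = 1\), since \(30 \cdot 1 + 60 \cdot 0.6 = 66\). With no interactions, the function reduces to a weighted sum with weights \(\phi _j\).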

4.3 An Unsupervised Approach to Learn the Choquet Integral Parameters

Once the Choquet integral is adopted as the aggregation function to calculate the scores and rank the alternatives, its parameters need to be defined. This task is quite complicated, as there are many parameters to define. However, one may adopt a strategy that adjusts some parameters automatically instead of defining them subjectively.

Inspired by [19], we consider in this paper an unsupervised approach to automatically adjust the Shapley interaction indices. The goal is to make each interaction index \(I_{j,j'}\) as close as possible to the negative of a similarity measure \(\rho _{j,j'}\) between the pair of criteria, such as their correlation coefficient [20]. The idea behind this approach is to mitigate biased results caused, for instance, by correlated criteria. Suppose that two criteria are positively correlated. If this characteristic of the data is not taken into account, aggregating the evaluations provided by these criteria may count the same information twice. Since the Choquet integral can model interaction between criteria, one may assign a negative interaction index (which models a redundant effect) to positively correlated criteria, thereby reducing the impact of their correlation.

The optimization problem used to automatically adjust the Shapley interaction indices is given as follows:

$$\begin{aligned} \begin{array}{ll} \displaystyle \min _{I_{j,j'}, \forall j,j' \in N} &{} \sum _{j,j'}\left( I_{j,j'} + \rho _{j,j'}\right) ^2 \\ \text {s.t.} &{} \phi _{j} - \frac{1}{2} \sum _{j' \ne j} \pm I_{j,j'} \ge 0, \, \, \forall j \in N \\ &{} \sum _{j} \phi _{j} = 1 \end{array}, \end{aligned}$$
(9)

where the ± in the first constraint avoids the use of absolute values: the constraint must hold for every combination of signs, which is equivalent to Eq. (8). Note that the Shapley values are not obtained from this optimization problem; they must be (subjectively) predefined. Moreover, since it is a quadratic problem, it can be easily tackled by most of the available solvers.
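Because the Shapley values are fixed, problem (9) amounts to the Euclidean projection of \(-\rho \) onto the set defined by constraints (8), that is, onto the intersection of n convex sets, each bounding the \(\ell _1\) norm of the interaction indices that involve one criterion. The Python sketch below solves it with Dykstra's alternating projection algorithm and a standard sort-based projection onto an \(\ell _1\) ball. This is a swapped-in technique chosen to keep the example dependency-free; the paper does not state which solver was used. Function names are hypothetical, and the result is approximate after a finite number of iterations.

```python
def project_l1_ball(v, radius):
    """Euclidean projection of v onto {x : sum(|x_i|) <= radius} (sort-based)."""
    if sum(abs(x) for x in v) <= radius:
        return list(v)
    if radius <= 0:
        return [0.0] * len(v)
    u = sorted((abs(x) for x in v), reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumulative += uk
        t = (cumulative - radius) / k
        if uk > t:  # the largest k with u_k > t gives the shrinkage threshold
            theta = t
    return [(1.0 if x > 0 else -1.0) * max(abs(x) - theta, 0.0) for x in v]


def fit_interactions(rho, phi, n_iter=500):
    """Approximate solution of problem (9) with phi fixed.

    rho: {(j, k): similarity} for pairs j < k; phi: predefined Shapley values.
    Returns {(j, k): I_jk} closest (least squares) to -rho subject to
    sum_k |I_jk| <= 2 * phi[j] for every j, i.e. constraint (8).
    """
    pairs = sorted(rho)
    x = {p: -rho[p] for p in pairs}  # unconstrained minimizer
    increments = [{p: 0.0 for p in pairs} for _ in phi]  # Dykstra corrections
    for _ in range(n_iter):
        for j in range(len(phi)):
            row = [p for p in pairs if j in p]  # indices appearing in constraint j
            shifted = {p: x[p] + increments[j][p] for p in pairs}
            projected = dict(shifted)
            values = project_l1_ball([shifted[p] for p in row], 2 * phi[j])
            projected.update(zip(row, values))
            increments[j] = {p: shifted[p] - projected[p] for p in pairs}
            x = projected
    return x
```

When the Shapley values are large enough, no constraint is active and the solution is simply \(I_{j,j'} = -\rho _{j,j'}\); the projections only shrink the interaction indices of criteria whose Shapley value cannot accommodate them.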

5 Methodology

To answer the three research questions of this study, we conducted four analyses of the GAII proposed by Tortoise [2], considering the same 62 countries as alternatives and the same seven AI dimensions as criteria: Talent (\(g_1\)), Infrastructure (\(g_2\)), Operating Environment (\(g_3\)), Research (\(g_4\)), Development (\(g_5\)), Government Strategy (\(g_6\)), and Commercial Ventures (\(g_7\)). We take the scores displayed on Tortoise’s website as criteria performances.

In the first analysis, we apply the weighted sum of Eq. (2) to obtain a total score for each country, as in the GAII, assuming that weight information is totally missing. In each SMAA iteration, the weights are randomly generated, and the rank acceptability indices are obtained as the result. The second analysis also applies the weighted sum with SMAA, but the weights follow an ordinal preference. We assume the same order of preference as the one adopted in the construction of the Tortoise index [2], i.e., \(w_1\) (Talent) = \(w_4\) (Research) = \(w_7\) (Commercial ventures) > \(w_2\) (Infrastructure) = \(w_5\) (Development) > \(w_3\) (Operating Environment) > \(w_6\) (Government strategy). In this case, the weights are randomly generated in each SMAA iteration subject to the constraints imposed by these preferences. A comprehensive analysis based on the rank acceptability indices is then conducted, comparing these two results with the Tortoise ranking.
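The paper does not describe how weights satisfying \(w_1 = w_4 = w_7 > w_2 = w_5 > w_3 > w_6\) are sampled, so the following Python sketch shows one possibility (an assumption). With four weight levels \(a \ge b \ge c \ge d \ge 0\) and \(3a + 2b + c + d = 1\), the feasible set is a simplex with vertices \((1/3,0,0,0)\), \((1/5,1/5,0,0)\), \((1/6,1/6,1/6,0)\) and \((1/7,1/7,1/7,1/7)\); Dirichlet(1, 1, 1, 1) barycentric coordinates then give a uniform draw from it. All names are hypothetical.

```python
import random

# Criterion order g1..g7: Talent, Infrastructure, Operating Environment, Research,
# Development, Government Strategy, Commercial Ventures.
# Level of each criterion: a (0) > b (1) > c (2) > d (3).
LEVEL_OF_CRITERION = [0, 1, 2, 0, 1, 3, 0]

# Extreme points of {a >= b >= c >= d >= 0, 3a + 2b + c + d = 1}.
VERTICES = [
    (1 / 3, 0.0, 0.0, 0.0),
    (1 / 5, 1 / 5, 0.0, 0.0),
    (1 / 6, 1 / 6, 1 / 6, 0.0),
    (1 / 7, 1 / 7, 1 / 7, 1 / 7),
]


def sample_ordinal_weights(rng):
    """Uniformly sample weights with w1 = w4 = w7 >= w2 = w5 >= w3 >= w6 >= 0."""
    # Normalized Exp(1) draws = Dirichlet(1, 1, 1, 1) barycentric coordinates.
    draws = [rng.expovariate(1.0) for _ in VERTICES]
    total = sum(draws)
    lam = [d / total for d in draws]
    levels = [sum(l * v[k] for l, v in zip(lam, VERTICES)) for k in range(4)]
    return [levels[k] for k in LEVEL_OF_CRITERION]


print(sample_ordinal_weights(random.Random(42)))
```

In an SMAA Monte Carlo loop, such a draw simply replaces the unconstrained one. Draws with equal levels occur with probability zero, so the strict inequalities hold almost surely.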

The third analysis verifies the redundancies in the dataset, measured by the correlation coefficient between pairs of criteria. In the fourth analysis, we evaluate the use of the 2-additive Choquet integral to aggregate the criteria information. For this purpose, we apply the unsupervised approach presented in Sect. 4.3 to obtain interaction indices \(I_{j,j'}\) that are as close as possible to the negative of the correlation coefficients. For the Shapley values \(\phi _j\), \(j=1, \ldots , n\), we assume the same weights as in the Tortoise analysis, normalized as follows:

$$\begin{aligned} \phi _j = \frac{w_j}{\sum _{j=1}^n w_j}. \end{aligned}$$
(10)
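As a worked instance of Eq. (10), the Tortoise weights of Sect. 3, \((w_1, \ldots , w_7) = (5, 3, 2, 5, 3, 1, 5)\), give

$$\begin{aligned} \sum _{j=1}^{7} w_j = 5+3+2+5+3+1+5 = 24, \qquad \boldsymbol{\phi } = \left( \tfrac{5}{24}, \tfrac{3}{24}, \tfrac{2}{24}, \tfrac{5}{24}, \tfrac{3}{24}, \tfrac{1}{24}, \tfrac{5}{24}\right) \approx (0.208, 0.125, 0.083, 0.208, 0.125, 0.042, 0.208), \end{aligned}$$

which satisfies Eq. (7) by construction.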

Based on the obtained interaction indices, we apply the Choquet integral expressed in Eq. (6) and use the resulting scores to construct a new ranking. We then compare this ranking with the Tortoise one to verify whether an approach that models criteria interactions leads to a different ranking of countries.

6 Results and Discussion

In this section, we present and discuss the results of the conducted analysis.

6.1 Probabilistic Country Rankings: A Weighted Sum and SMAA Perspective

The first analysis applies the weighted sum and SMAA with randomly generated weights. Table 2 shows the obtained rank acceptability indices for the top six countries according to the Tortoise GAII. The highest percentage in each column is highlighted to indicate the country with the highest probability of occupying that position. The results indicate that the USA and China are the two most acceptable alternatives, with the highest acceptability for the first two ranks. In particular, the USA has an acceptability of 93.94% for the first rank and 6% for the second rank, which together add up to almost 100%. The USA therefore appears to be a robust choice for the first position, regardless of the criteria weights. Similarly, China appears to be a robust choice for the second position, indicating that the first and second positions in the Tortoise ranking are robust.

From the third position onwards, the rank acceptability indices indicate a ranking significantly different from the Tortoise GAII. This is evident in the case of the UK, which holds the third position in the Tortoise GAII but has only a 6.59% probability of being ranked third when the weights are randomly generated, and a 46.19% probability of being ranked fourth. Moreover, no robust choice can be made for the third position, as the highest acceptability index for that rank is only 24.34%, obtained by Israel.

Figure 1 presents the same information as Table 2 in the form of a heat map covering all combinations of country and position, sorted according to the Tortoise GAII. As before, the USA and China clearly have a high probability of ranking first and second, respectively. In the middle of the heat map, the probabilities of the other countries occupying a specific position are low; towards the last positions, the probabilities increase, although not enough to yield robust choices. This first analysis shows that changes in the criteria weights have a significant impact on the ranking.

Table 2. Rank acceptability indices with randomly generated weights of the top six countries of the Tortoise GAII.
Fig. 1.

Rank acceptability indices (%) with randomly generated weights.

The second analysis applies the weighted sum and SMAA with ordinal weights instead of completely random weights. Table 3 presents the rank acceptability indices of this analysis. The first remark is that there is a 100% probability that the USA and China are in first and second place, respectively. Although the UK has a high probability (63.04%) of ranking third, there remains a significant probability (31.49%) that it is ranked fifth instead. Notably, Singapore ranks sixth in the Tortoise Index despite having a high probability (76.49%) of ranking fourth. Conversely, Canada, which ranks fourth in the Tortoise GAII, has only a 6.91% probability of being in that position.

Table 3. Rank acceptability indices with ordinal weights of the top six countries of the Tortoise GAII.
Fig. 2.

Rank acceptability indices (%) with ordinal weights.

Figure 2 presents the values of Table 3 as a heat map covering all combinations of countries and positions. As expected, preserving the preference order of the weights results in a more robust solution. However, Fig. 2 also reveals that this robust solution may differ from the Tortoise GAII at certain positions.

These results show that the criteria weights do influence the resulting AI ranking of countries. Therefore, it is essential to conduct sensitivity analysis to evaluate the impact of weight changes and to avoid treating countries unfairly by assuming subjective weights. For example, although the UK was ranked third in the Tortoise GAII, it had only a 6.59% probability of being ranked in that position when the weights were randomly generated, and even under the ordinal preferences it had a 31.49% probability of being ranked fifth. Similarly, Singapore ranked sixth in the Tortoise GAII but had a 76.49% probability of being ranked fourth when the preference order of the weights was applied.

6.2 Country Rankings: A Choquet Perspective

In this subsection, we present the results obtained with the Choquet integral. As a first step, we calculated the Pearson correlation coefficients between pairs of criteria, which are presented in Fig. 3. The dataset clearly contains many redundancies. For instance, there are strong correlations between Talent and Environment (\(\rho = 0.8103\)), Talent and Commercial Ventures (\(\rho = 0.7951\)), Environment and Research (\(\rho = 0.8459\)), Environment and Commercial Ventures (\(\rho = 0.8474\)), and Research and Commercial Ventures (\(\rho = 0.7759\)).
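The correlation step can be sketched in Python as follows, with a plain-Python Pearson coefficient so that the example has no dependencies. The returned dictionary has the pair-indexed form that problem (9) takes as \(\rho _{j,j'}\). The function names and the 0.7 threshold used to flag strong pairs are illustrative assumptions, not values from the paper; a constant column would make the coefficient undefined (division by zero).

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """rho[(j, k)] for every pair of criteria j < k (one score list per criterion)."""
    return {(j, k): pearson(columns[j], columns[k])
            for j, k in combinations(range(len(columns)), 2)}


def strong_pairs(rho, names, threshold=0.7):
    """Pairs whose correlation is at least `threshold`, strongest first."""
    hits = [(names[j], names[k], r) for (j, k), r in rho.items() if r >= threshold]
    return sorted(hits, key=lambda t: -t[2])
```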

Fig. 3.

Correlation coefficients among AI dimensions.

Aiming to mitigate the effect of positively correlated criteria in the aggregation procedure, we searched for interaction indices, to be used with the Choquet integral, that are as close as possible to the negative of the correlation coefficients. For the Shapley values, we used the same weights as in the Tortoise analysis, normalized by Eq. (10). By solving the optimization problem (9), we obtained \(I_{1,2} = -0.006\), \(I_{1,4} = -0.153\), \(I_{1,5} = -0.060\), \(I_{1,7} = -0.198\), \(I_{2,3} = -0.145\), \(I_{2,4} = -0.037\), \(I_{2,6} = -0.062\), \(I_{3,6} = -0.022\), \(I_{4,5} = -0.100\), \(I_{4,7} = -0.128\), \(I_{5,7} = -0.091\) (the remaining interaction indices are practically zero). Based on these parameters, the 2-additive Choquet integral leads to the total scores presented in Table 4 (for the top six countries according to the Tortoise GAII).

Table 4. Scores for each AI dimension and the total score (TS) of the top six countries determined by the 2-additive Choquet integral.

An interesting remark is that Singapore achieved the third position in the ranking. Note the redundant effect between Infrastructure and Operating Environment modeled by the Choquet integral (\(I_{2,3} = -0.145\)) to mitigate the positive correlation between them (\(\rho _{2,3} = 0.4130\)). This helps to explain how Singapore achieved a better score than the UK and moved from the 6th to the 3rd position (in comparison with the Tortoise GAII). Indeed, in Eq. (6), the term associated with a negative interaction index takes the maximum of the two scores. Therefore, Singapore's poor performance in Operating Environment was offset by its very good score in Infrastructure (which is higher than that of the UK).
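A hypothetical two-criterion toy case (illustrative numbers, not GAII data) shows the mechanism. With \(\phi _1 = \phi _2 = 1/2\) and \(I_{1,2} = -0.4\), which satisfies Eq. (8) because \(1/2 - 0.2 \ge 0\), Eq. (6) reduces to

$$\begin{aligned} TS^{2adCI} = 0.4\max \{g_1, g_2\} + 0.3\,g_1 + 0.3\,g_2 . \end{aligned}$$

An alternative with scores (90, 30) then obtains \(0.4 \cdot 90 + 0.3 \cdot 120 = 72\), whereas a balanced alternative with scores (60, 60) obtains \(0.4 \cdot 60 + 0.3 \cdot 120 = 60\); the weighted sum with the same \(\phi \) gives both alternatives 60. Under a redundant pair, the larger of the two scores is thus rewarded.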

7 Conclusion

This paper presented a critical analysis of AI indicators for comparing countries. We applied the SMAA methodology and the Choquet integral to analyze the Tortoise GAII in terms of criteria weights and aggregation procedure. The SMAA analysis yields rank acceptability indices for all countries, which can be used to derive robust conclusions. More specifically, the rank acceptability index quantifies the instability in the results induced by uncertain criteria weights. By applying the Choquet integral, we explored how the solutions change when a non-linear aggregation function is assumed.

Regarding the first research question, on whether the criteria weights influence the resulting AI ranking, we observed that even when the weights are randomly varied while adhering to the ordinal preference used in the Tortoise GAII, the positions with the highest probabilities do not always match those in the Tortoise ranking. Furthermore, for certain ranking positions, it may not even be possible to find a robust choice. We conclude that the way the weights are determined strongly influences the final ranking.

The hypothesis of interaction between criteria was verified, indicating that an aggregation procedure other than the weighted sum should be applied when building such indicators with the same AI dimensions as criteria. This was confirmed by comparing the ranking determined by the 2-additive Choquet integral with the Tortoise ranking: since compensations between criteria are modeled through the interaction indices, the rankings change.

It is important to note that the rank acceptability index given by SMAA provides only a rough ranking of the alternatives, because there is no objective way to combine acceptability indices for different ranks into a complete ranking [12]. This naturally opens up possibilities for future studies, such as using the SMAA pairwise winning index to propose a new AI ranking of countries. A broader outlook would also include proposing new measures to compare the probability matrix with a given ranking. In another future study, we plan to apply the SMAA methodology with the Choquet integral as well, in order to conduct a robustness analysis with respect to the model structure.