1 Introduction

Theory of Mind (ToM) refers to the ability of individuals to attribute mental states to themselves and others, including beliefs, desires, intentions, and emotions. It is an important cognitive skill that enables us to understand and predict the behaviour of others. Additionally, it plays a fundamental role in communication and social interaction [24].

A Multi-Agent System (MAS) is a system composed of multiple autonomous agents that interact with each other to achieve a common goal. These agents can be humans or software, and the system’s goals are accomplished through collaboration, negotiation, and coordination among them [68]. The conceptualisation of MAS takes inspiration from human society and considers approaches for modelling not only autonomous intelligent agents as individuals, but also the societies made up of these agents. The field of MAS also explores hybrid systems, in which humans are considered part of the MAS, aligning MAS with emerging concepts such as hybrid intelligence [1].

Although there have been fascinating achievements in the field of MAS, including models for describing and implementing sophisticated reasoning mechanisms and communication methods, researchers have continuously explored how to incorporate human cognitive capabilities, such as ToM, into MAS. Integrating ToM into autonomous intelligent agents and, consequently, into MAS composed of these agents promises sophisticated capabilities that were previously seen only in humans, human interactions, and human societies. This integration also opens up new possibilities for systems in which software agents and humans work together synergistically, in line with the concept of hybrid intelligence [1].

In this paper, we present a systematic review which aims to give a brief, but comprehensive, account of the state-of-the-art in applying ToM in the area of MAS. Our goal is to identify what is lacking to achieve the ambitious vision of an AI-human society. It is crucial to evaluate current approaches and techniques in artificial and cognitive intelligence, as well as to address the limitations and challenges involved in reaching these goals.

The paper is organised into four sections. Section 2 presents an overview of the systematic review methodology, including the research questions used to identify research gaps, the search strategy, and the inclusion and exclusion criteria for articles. Section 3 discusses the findings in relation to each research question. Finally, Sect. 4 concludes the paper with some final remarks.

2 Methodology

The systematic review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [41]. To identify research gaps regarding ToM and MAS, we employed the SPIDER strategy [15], which defines the following aspects: Sample (S), Phenomenon of Interest (P), Design (D), Evaluation (E), and Research type (R). This strategy is useful for systematic reviews involving studies of different designs and types of interventions. Each aspect is detailed below:

  • Sample: Studies focused on the ToM and MAS context.

  • Phenomenon of Interest: Theoretical and empirical research on applying ToM to MAS.

  • Interest/Information Sources: Peer-reviewed articles, conference proceedings, and book chapters published in English, available online.

  • Design: Qualitative and quantitative studies, theories, and case studies.

  • Evaluation: Studies that examine the relationship between ToM and MAS, or applied ToM to MAS.

  • Research type: Primary research studies, including experimental and non-experimental designs, across application areas such as health, education, games, and personal assistants.

The following research questions were used to identify such gaps:

  • (RQ1) What theory underlies the use of the theory of mind? What is the reference work? (here, we may have multiple works grounded in the same theory, which could be from psychology, philosophy, etc.).

  • (RQ2) Which mental attitudes are modelled by the work? (for example beliefs, memories, intentions, emotions, etc.).

  • (RQ3) How does the system model knowledge related to the theory of mind? What is the form of representation for information about the theory of mind?

  • (RQ4) Does the work model Theory of Mind of human users or only computational entities (software agents)?

  • (RQ5) What is the application area of the work? (for example, health, games, personal assistants, theoretical works, etc.)

  • (RQ6) What technologies were used to develop the work? (for example, modelling and simulation platforms, algorithms, AI techniques, etc.)

  • (RQ7) What are the main results achieved?

  • (RQ8) What are the main challenges encountered?

  • (RQ9) What are the limitations pointed out by the work?

2.1 Databases

Four databases were used in this study, each selected for its relevance to our research topic. Specifically, we focused on databases that were closely aligned with the scope of our study and had a proven track record of returning a significant number of relevant papers, so as to access the most comprehensive and up-to-date literature on the subject and support a rigorous and thorough analysis. We examined the PubMed, Web of Science, Scopus, and IEEE Xplore databases. Grey literature is not considered in this study.

Table 1. Inclusion and Exclusion Criteria.

2.2 Adopted Criteria and Selection Procedures

This study employed a systematic search approach, using four electronic databases (see Sect. 2.1) to identify relevant scientific articles on ToM and MAS. After the search, we applied a two-stage selection process. In the first stage, the researchers screened the title, abstract, and keywords of each article to identify potentially relevant studies, and studies that did not meet the inclusion criteria were excluded from the analysis. Where the initial evaluation was inconclusive, a second evaluation of the full text was conducted to determine whether the study fell within the scope of this literature review. The inclusion and exclusion criteria for all studies are outlined in Table 1.

The same search string, built from the following keywords, was used across all databases.


Note that this search string targets articles that use synonyms for ‘agent’ in the context of software agents, while specifically including the concept of ‘Theory of Mind’, which is the focus of this study.

2.3 Selection Process

The selection process began with 698 articles: 144 from PubMed, 184 from Web of Science (WoS), 311 from Scopus, and 58 from IEEE Xplore. Of these, 126 were removed as duplicates. Of the remaining 572 articles, 456 were excluded in the first stage because they did not meet the selection criteria. Five reviewers participated in this stage, each reading the titles and abstracts of all 572 articles that remained after duplicate removal.

The review was conducted in a blinded manner using the Rayyan tool to minimise the influence of individual researchers’ decisions. The blinding was subsequently lifted so that the researchers’ decisions could be compared. At this stage, a set of rules was established to facilitate the selection process: articles receiving three or more votes for inclusion were selected for full reading, while those receiving four or more votes for exclusion were excluded directly. Where votes conflicted, the reviewers convened to reach a final decision. Applying these criteria, a total of 116 articles were selected for full reading and grouped separately.
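The stage-one voting rule above can be expressed as a small decision function. This is a minimal sketch, assuming five reviewers and a hypothetical `screening_decision` helper that receives one label per reviewer; any label other than "include" or "exclude" (for instance, an undecided vote) counts towards neither threshold. It does not reproduce Rayyan's own data format.

```python
from collections import Counter
from typing import Iterable

# Hypothetical vote labels; not Rayyan's export format.
INCLUDE = "include"
EXCLUDE = "exclude"


def screening_decision(votes: Iterable[str]) -> str:
    """Apply the stage-one rule to the votes cast for a single article.

    Assumes five reviewers:
      - three or more "include" votes -> selected for full reading
      - four or more "exclude" votes  -> excluded directly
      - any other combination         -> discussed by the reviewers
    """
    counts = Counter(votes)
    if counts[INCLUDE] >= 3:
        return "full_reading"
    if counts[EXCLUDE] >= 4:
        return "excluded"
    return "discuss"


# Two inclusions against three exclusions meets neither threshold.
print(screening_decision(["include", "include", "exclude", "exclude", "exclude"]))
```

With five voters the two thresholds cannot both be met (3 + 4 > 5), so the order of the checks does not matter; a split such as two inclusions against three exclusions falls through to a discussion, which corresponds to the conflicting-votes case described above.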

For the full reading of these 116 articles, each researcher answered the research questions (RQ1–RQ9) defined before the reading; three researchers participated in this stage. After reading, 22 articles were excluded as out of scope (mostly studies of ToM in humans), 16 because they were published only in workshops, 11 because they were short papers (extended abstracts), 4 because they were books, 2 because they were not freely available online, and 1 because it was a survey.
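The counts reported in this section can be checked with a short tally of the PRISMA flow. This sketch uses only the figures stated above, and the variable names are illustrative. Note that the per-database counts as reported (144 + 184 + 311 + 58) add up to 697, one fewer than the stated total of 698, so the tally starts from the stated total.

```python
# Figures as reported in Sect. 2.3.
per_database = {"PubMed": 144, "Web of Science": 184, "Scopus": 311, "IEEE Xplore": 58}
identified = 698  # stated total; sum(per_database.values()) gives 697
duplicates = 126
excluded_at_screening = 456  # title/abstract stage

# Exclusions after full reading, by reason.
full_text_exclusions = {
    "out of scope": 22,
    "workshop only": 16,
    "short paper": 11,
    "book": 4,
    "not freely available": 2,
    "survey": 1,
}

screened = identified - duplicates
full_text = screened - excluded_at_screening
included = full_text - sum(full_text_exclusions.values())

print(sum(per_database.values()), screened, full_text, included)  # 697 572 116 60
```

The remaining figures are internally consistent: the 56 full-text exclusions take the 116 articles down to the 60 included, which also matches the 26 journal and 34 conference papers reported in Sect. 2.4.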

Fig. 1. PRISMA Diagram.

Fig. 2. Papers by year of publication.

Table 2. Main references used by the papers.

Figure 1 presents the PRISMA diagram of the systematic review, summarising how articles were identified, screened, and assessed for eligibility.

2.4 Selected Articles

In the end, 60 articles were included in this study, published between 2002 and 2022. Interest in the topic has grown in recent years, particularly from 2019 onwards, although the number of publications decreased slightly from 2020 to 2022. This decrease could be attributed to external factors such as the COVID-19 pandemic, which may have limited researchers’ participation in events and their access to universities and research laboratories. Figure 2 displays the distribution of the selected articles by year. Of the 60 selected articles, 26 are from academic journals and 34 from conference proceedings.

3 Results

3.1 Answer to RQ1 (Main References)

Table 2 shows the works referenced as the theory underlying the use of ToM, or as the primary reference for the work. Only works cited in this way by multiple articles are included. In 25% of the included articles, the main reference could not be identified because the authors did not state it explicitly. A further 19% (approximately) of the articles did state a main reference, but none of those references was shared by more than one article.

Table 3. Mental attitudes explicitly modelled.
Table 4. Knowledge representation.

3.2 Answer to RQ2 (Modelled Mental Attitudes)

Table 3 summarises the mental attitudes (or states) modelled by the selected articles. Most of the research focuses on mental attitudes commonly present in agent architectures, including the well-known BDI (Beliefs-Desires-Intentions) architecture [10]. Additionally, some works model memories (which are slightly stronger than beliefs), emotions, and goals.
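To make the notion of modelled mental attitudes concrete, the following sketch stores BDI-style attitudes for an agent together with its first-order model of another agent, which is the minimal structure needed to represent ToM. It is a hypothetical illustration, not the representation used by any reviewed work: attitudes are plain string atoms, and the `MentalState` class and `divergent_beliefs` helper are names introduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MentalState:
    """Hypothetical BDI-style record with a first-order model of others."""

    beliefs: set[str] = field(default_factory=set)
    desires: set[str] = field(default_factory=set)
    intentions: set[str] = field(default_factory=set)
    # First-order ToM: this agent's model of each other agent's mental state.
    models_of_others: dict[str, MentalState] = field(default_factory=dict)

    def model_of(self, other: str) -> MentalState:
        """Return (creating it if needed) the model held about `other`."""
        return self.models_of_others.setdefault(other, MentalState())


def divergent_beliefs(agent: MentalState, other: str) -> set[str]:
    """Beliefs the agent attributes to `other` but does not hold itself."""
    return agent.model_of(other).beliefs - agent.beliefs


# An assistant believes the key is in the drawer, but models its user as
# still believing it is in the bag (from the assistant's view, a false belief).
assistant = MentalState(beliefs={"key_in_drawer"}, desires={"help_user"})
assistant.model_of("user").beliefs.add("key_in_bag")
assistant.model_of("user").desires.add("find_key")
print(divergent_beliefs(assistant, "user"))  # {'key_in_bag'}
```

Higher-order ToM would follow the same pattern recursively, since each model is itself a `MentalState` with its own `models_of_others`.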

3.3 Answer to RQ3 (Knowledge Representation)

Table 4 summarises how the selected articles represent knowledge related to ToM. Approximately 65% of the works use a formal representation, such as epistemic logic, Bayesian modelling, state transition systems, or Markov decision processes (MDPs). In addition, many works use symbolic representations similar to those found in agent-oriented programming languages. Some works use representations based on feature vectors, as is typical in machine learning techniques. Finally, some works use natural language representations, and one work differs from the others by using semantic databases (ontologies) to represent ToM.
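As an illustration of the Bayesian family of representations, the sketch below infers another agent's goal from its observed moves by applying Bayes' rule, assuming the observed agent is noisily rational. It is a generic textbook-style example rather than the model of any reviewed article; the one-dimensional world, the candidate goals, and the rationality parameter `beta` are all hypothetical choices.

```python
import math


def action_likelihood(pos: int, action: int, goal: int, beta: float = 2.0) -> float:
    """P(action | pos, goal) for a noisily rational agent on a 1-D line.

    Actions are -1 (left) or +1 (right). The utility of an action is minus the
    distance to the goal after taking it; choices follow a softmax with the
    hypothetical rationality parameter beta.
    """
    utility = {a: -abs(pos + a - goal) for a in (-1, 1)}
    norm = sum(math.exp(beta * u) for u in utility.values())
    return math.exp(beta * utility[action]) / norm


def infer_goal(start: int, actions: list[int], goals: list[int]) -> dict[int, float]:
    """Posterior over candidate goals after observing `actions` (uniform prior)."""
    posterior = {g: 1.0 / len(goals) for g in goals}
    pos = start
    for action in actions:
        # Bayes' rule: weight each goal by the likelihood of the observed action.
        posterior = {g: p * action_likelihood(pos, action, g) for g, p in posterior.items()}
        total = sum(posterior.values())
        posterior = {g: p / total for g, p in posterior.items()}
        pos += action
    return posterior


posterior = infer_goal(start=0, actions=[1, 1], goals=[-5, 5])
print({g: round(p, 3) for g, p in posterior.items()})
```

Observing two steps to the right makes the goal at +5 overwhelmingly more probable; lowering `beta` models a less predictable agent and makes the inference more cautious.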

Table 5. Entities modelled.
Table 6. Application areas.

3.4 Answer to RQ4 (Modelled Entities)

Table 5 summarises the distribution of the selected articles according to whether they model ToM about other software agents, about humans (e.g., human users), or about both. Most of the research focuses on modelling ToM about other software agents and studying the diverse phenomena that can emerge from agents with this capability. Some works model ToM about both software agents and humans, often introducing abstract approaches and logics that support modelling and reasoning about ToM. Finally, some works focus on modelling ToM about humans, particularly in the context of personal assistants, where a software agent uses ToM to improve its decision-making and interaction.

3.5 Answer to RQ5 (Application Areas)

Table 6 summarises the distribution of the selected articles according to their application domains. Most of the research provides approaches without any particular application domain, offering theoretical proofs and definitions for the proposed logic or framework.

Among the works that apply the proposed approach and explore applications, there is notable interest in games, human-agent interaction, and agent-based simulation. In games, most works explore how ToM can give an advantage to agents with this capability, studying whether they outperform agents without it. Additionally, some works aim to provide a more realistic experience for players and/or improve the naturalness of Non-Player Characters (NPCs), with some also contrasting human-agent interaction in games. In human-agent interaction, most works explore how ToM can improve software agents’ communication by anticipating humans’ thoughts, intentions, etc., and by remembering important information such as their beliefs, goals, and preferences. In agent-based simulation, most works study social phenomena in which individuals model ToM, including emergent behaviours, agent profiles, and other related factors.

Furthermore, some works focus on how ToM can be applied to: (i) identify false beliefs, (ii) deceive others (and similar dishonest behaviours), (iii) train and/or implement better negotiators, (iv) achieve cooperation, (v) achieve better results in educational platforms, (vi) infer mental attitudes in robotics, and (vii) create better personal assistants.

Table 7. Main technologies.

3.6 Answer to RQ6 (Technologies)

Table 7 presents the main technologies employed in the selected articles. The majority of the works rely either on a human-computer interface that enables interaction between humans and agents, or on agent-based simulations that showcase emergent behaviours. A significant number of articles employ probabilistic approaches and machine learning as their primary technologies, while a few employ robotics, planning, or agent-oriented programming languages. Lastly, some works remain focused on theoretical aspects without using any specific technology.

3.7 Answer to RQ7 (Main Results)

Most of the selected works provide formal properties, semantics, and definitions for their approaches to modelling and reasoning with ToM [11, 13, 25, 35, 36, 42, 51, 54, 55, 69] or provide proofs of concept [3, 9, 14, 16–19, 22, 28–30, 39, 44, 45, 47–49, 62–64]. For example, some studies demonstrate that ToM improves the performance of agents, especially when they have access to more information, and that agents with ToM outperform those that are not able to model and reason about ToM [16, 63].
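The claim that agents with ToM can outperform agents without it is often illustrated with recursive reasoning in simple competitive games. The sketch below is a hypothetical, deliberately simplified scheme for rock-paper-scissors, not the model of [16] or [63]: a level-0 agent assumes its opponent will repeat its most frequent move, and a level-k agent assumes its opponent reasons at level k-1 and best-responds to that prediction. The names `tom_move`, `play`, and `BEATEN_BY` are introduced here for illustration.

```python
from collections import Counter

# BEATEN_BY[m] is the move that beats m.
BEATEN_BY = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


def tom_move(level: int, own_history: list[str], opp_history: list[str]) -> str:
    """Choose a move by reasoning at the given ToM level."""
    if level == 0:
        # Level 0: no model of the opponent's mind, only of its past behaviour.
        # Ties go to the move seen first; "rock" is assumed when nothing is known.
        predicted = Counter(opp_history).most_common(1)[0][0] if opp_history else "rock"
    else:
        # Level k: simulate the opponent as a level k-1 reasoner (roles swapped).
        predicted = tom_move(level - 1, opp_history, own_history)
    return BEATEN_BY[predicted]


def play(level_a: int, level_b: int, rounds: int = 10) -> tuple[int, int]:
    """Play repeated rounds and return the number of wins for each agent."""
    hist_a: list[str] = []
    hist_b: list[str] = []
    wins_a = wins_b = 0
    for _ in range(rounds):
        a = tom_move(level_a, hist_a, hist_b)
        b = tom_move(level_b, hist_b, hist_a)
        if a == BEATEN_BY[b]:
            wins_a += 1
        elif b == BEATEN_BY[a]:
            wins_b += 1
        hist_a.append(a)
        hist_b.append(b)
    return wins_a, wins_b


print(play(1, 0))  # (10, 0)
print(play(2, 1))  # (10, 0)
```

Here the higher-level agent wins every round, but only because its assumption about the opponent's level is exactly right. This mirrors the assumption of perfect knowledge about other agents' policies that [33, 44] acknowledge as a limitation; with a mis-specified opponent level, the advantage is no longer guaranteed.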

Furthermore, many works provide a computational model of ToM, for example one that (i) deals with uncertainty [21, 32, 54]; (ii) handles preferences [13]; (iii) is based on machine learning models [34, 40, 50, 65]; (iv) uses planning [58, 66]; (v) incorporates emotions [23, 52, 53, 59]; or (vi) uses ontologies [56]. Other computational models of ToM are proposed in [2, 8, 12, 20, 26, 27, 31, 38, 57, 61].

Finally, one work provides a dataset as one of its main results [4]. Some works provide empirical results, such as comparing the ability of humans to model ToM with that of agents [5] and comparing the performance of agents with and without ToM [33, 60], among others [50].

3.8 Answer to RQ8 (Main Challenges)

More than half of the selected papers do not point out any challenges in developing their approaches. Most of the works that do mention challenges highlight the difficulty of dealing with the dynamism of real-world environments, which also leads to highly dynamic mental states in agents. For instance, in [4], the authors describe how changes in the highly dynamic physical world affect agents’ beliefs and, consequently, the ToM they have about each other, making this a challenging issue. In [18], the authors also describe the unpredictability of the environment as one of the main challenges. In [20], the authors explain the challenge for robots of representing knowledge about their interactions with the environment and other agents. In [23], the authors discuss the difficulty of capturing real-time gestures from human users and of creating a more general model applicable to other domains. In [31], the authors emphasise the complexity of applying the results achieved to real-world scenarios. In [12], the authors describe challenges associated with selecting the correct plan that agents should use to achieve user-delegated goals, considering the dynamics of the environment and their individual state. Also, in [39], the authors state that scalability is one of the main challenges, making it hard to consider more factors when making inferences related to ToM. Further, in [47], the authors describe the great challenge of including humans in the loop when considering ToM.

Furthermore, in [2], the authors describe how challenging it can be for agents to decide on initial interactions when the behaviours of others are unknown. In [25], the authors point out the difficulty of developing cognitive models capable of making inferences about “common sense”. In [36], the authors describe the challenge of finding a balance between representing the explicit and implicit beliefs of others. In [22], the authors find it challenging to collect and use knowledge from crowdsourcing and to evaluate different narratives based on the diversity of phrases, actions, and sentiments. In [57], the authors describe the challenge of implementing complex social capabilities in humanoid robots. In [64], the authors highlight the difficulty of extending the proposed approach to real and more complex case studies, such as legal cases. In [52], the authors describe the challenge of integrating different emotional and cognitive theories into a single agent architecture. In [21], the authors describe the challenge of learning parameters from collected data and of exploring different learning techniques. In [60], the authors describe the difficulty of training and measuring skills related to modelling and reasoning about ToM.

Finally, in [11], the authors point out the inadequacy of mathematical frameworks from computer science for modelling ToM. In [63], the authors report that higher-order ToM only gains an advantage over lower-order ToM or other simple strategies when enough information is available, which is hard to achieve in many scenarios. In [33], the authors describe the difficulty of dealing with wrong and uncertain ToM. Lastly, in [66], the authors describe the challenges of debugging the system, in their case, debugging narrative planning problems.

3.9 Answer to RQ9 (Main Limitations)

The works that applied subjective or quantitative evaluations involving human users, such as [13, 28], indicated that the number of human participants in the experiments was limited. Some works also describe the need to extend their approach with other mental attitudes and knowledge representations. For instance, in [19], the authors state that they intend to extend their work to model desires and intentions in addition to beliefs, while in [66], the authors explain that their approach does not allow the representation of uncertainty, which may be necessary for certain applications.

Furthermore, in [58], the authors note that their approach only explored a limited number of agents. In [23, 45], the authors describe the specificity of their approaches as limitations. In [23], the authors highlight the difficulty of learning the user’s model, while in [36], the authors discuss inconsistencies that can arise from using propositional logic in their approach. Additionally, in [31], the authors acknowledge that their simplified model of agents and environment could not capture all the complexities of social interaction as it occurs. Moreover, in [39], the authors state that their approach is limited to inferring the intents of a single user and considers a limited number of intents. In [22, 30], the authors describe limitations regarding the technologies used, such as robotics and crowdsourcing. In [21], the authors describe limitations regarding the data used and, consequently, the generated model. In [63], the authors mention that they only explore two levels of ToM, which could be a limitation of their work. In [33, 44], the authors assume perfect knowledge about other agents’ policies, which could also be a limitation of their work. In [25], the authors state that they intend to explore other formal properties based on possible worlds.

Many articles, such as [12, 17, 28, 48, 52, 53, 60], also point out common limitations related to validation and evaluation methods. They often highlight limitations resulting from (i) using controlled experimental environments that may not reflect the real world, and (ii) using a limited number of parameters and/or fixed instances of the problem, among others. Finally, note that the majority of the articles do not mention limitations, so their limitations, if any, could not be assessed in this literature review.

4 Conclusion

In this paper, we have presented a short systematic literature review of the approaches used in MAS to apply ToM, that is, the concept from Cognitive Science understood as the ability of humans to model the minds of other agents. As future work, we plan to situate these approaches with respect to the ‘flavours’ of ToM in Cognitive Science [24].

There seems to be a representational equivalence between the flavours of ToM from Cognitive Science and types of AI modelling approaches. For instance, Theory-Theory of Mind (TT) would be the equivalent of symbolic AI, Simulation Theory of Mind (ST) of subsymbolic AI, and Hybrid Theory of Mind (HT) would be similar to neurosymbolic AI. While TT represents the type of ToM in which agents already have some understanding of other agents’ minds, e.g., a set or knowledge base of beliefs about others’ mental attitudes, ST represents a process of using this already known set of others’ mental attitudes to simulate their minds in order to predict their behaviour. HT is a ‘high-level’ ToM that combines TT and ST in a practical reasoning process, reasoning in the way its target (the other agent whose mind is being mentalised) would.
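The distinction drawn in this paragraph can be made concrete with a toy example based on the classic false-belief scenario. In the sketch below, which is a hypothetical illustration of the characterisation given above rather than an implementation from the literature, the TT component is a knowledge base of attitudes attributed to others, the ST component runs the agent's own decision rule on those attributed attitudes instead of its own, and the hybrid combines the two. All names and contents (`attributed_beliefs`, `decide`, `simulate`, `predict_hybrid`) are assumptions.

```python
# TT: a knowledge base of mental attitudes attributed to other agents
# (hypothetical contents: Sally did not see the marble being moved).
attributed_beliefs: dict[str, dict[str, str]] = {
    "sally": {"marble": "basket"},
}

# The reasoning agent's own beliefs about the world.
own_beliefs = {"marble": "box"}


def decide(beliefs: dict[str, str], desire: str) -> str:
    """The agent's own practical-reasoning rule (hypothetical):
    look for the desired object where it is believed to be."""
    return f"look_in:{beliefs.get(desire, 'anywhere')}"


def simulate(beliefs: dict[str, str], desire: str) -> str:
    """ST: reuse one's own decision rule, fed with someone else's beliefs."""
    return decide(beliefs, desire)


def predict_hybrid(target: str, desire: str) -> str:
    """Hybrid: retrieve attributed attitudes (TT), then simulate the target (ST)."""
    return simulate(attributed_beliefs.get(target, {}), desire)


print(decide(own_beliefs, "marble"))      # look_in:box (what the agent itself would do)
print(predict_hybrid("sally", "marble"))  # look_in:basket (what it predicts Sally will do)
```

The prediction for Sally differs from what the agent would do itself, which is the false-belief case; the sketch illustrates only the TT/ST/HT distinction, not the suggested correspondence with symbolic and subsymbolic AI.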

Perhaps the most obvious application of HT in AI is to enable language, communication, and explanation [37], components that are foundational for MAS, where agents must communicate in order to interact successfully with other artificial agents or with humans. This could open future research directions in MAS in which simulations of intelligent agents are shown to humans, who are asked to estimate the agents’ mental attitudes, that is, humans building a ToM of intelligent agents [43].

However, as our answers to the research questions show, there are quite a few limitations regarding the application of ToM in MAS, and many challenges remain to be addressed before ToM in MAS can deliver on its promise of realistic, human-like mentalisation in agent-agent interactions. Furthermore, there may be even more limitations, given that more than half of the papers presenting applications of ToM to MAS did not report any.