Emergence of Communication in Embodied Agents Evolved for the Ability to Solve a Collective Navigation Problem

Davide Marocco, Stefano Nolfi
Institute of Cognitive Science and Technologies, CNR, Viale Marx 15, Rome, 00137, Italy
[davide.marocco; stefano.nolfi]@istc.cnr.it

In this paper we present the results of an experiment in which a collection of simulated robots, evolved for the ability to solve a collective navigation problem, develop a communication system that allows them to cooperate better. The analysis of the obtained results indicates how evolving robots develop a non-trivial communication system and exploit different communication modalities. Results also indicate how the possibility to co-adapt the robots’ individual and social/communicative behaviour plays a key role in the development of progressively more complex and effective individuals.

1. Introduction

The development of embodied agents able to interact autonomously with the physical world and to communicate on the basis of a self-organizing communication system is a new and exciting field of research (Steels and Vogt, 1997; Cangelosi and Parisi, 1998; Steels, 1999; Steels and Kaplan, 2001; Marocco, Cangelosi and Nolfi, 2003; Quinn et al., 2003; for a review see Kirby, 2002; Steels, 2003; Wagner et al., 2003; Nolfi, 2005). The objective is to identify how a population of agents equipped with a sensory-motor system and a cognitive apparatus can develop a grounded communication system and use their communication abilities to solve a given problem. Answering this question is important for both scientific and technological reasons. From a scientific point of view, understanding how communication abilities and a communication system might emerge in a population of interacting embodied agents might shed light on the evolution of animal communication and on the origin of language.
From a technological point of view, understanding the fundamental principles involved might lead to the development of innovative communication methods for multi-agent software systems, autonomous robots, and ubiquitous computing devices.

In this paper we present the results of an experiment in which a collection of simulated robots that are evolved for the ability to solve a collective navigation problem develop a communication system that allows them to cooperate better. Robots are provided with simple sensory-motor systems that allow them to move, produce signals with varying frequencies, and gather information from their physical and social environment (including signals produced by other agents). Since the chosen problem admits a variety of qualitatively different solutions and since robots are selected on the basis of their ability to solve the collective navigation problem (and not on the basis of their communication abilities), evolving robots are left free to determine the circumstances in which communication is used, the structure of the communication system (i.e. the number, the type and the “meaning” of signals), the communication modalities (i.e. the role played by communicating individuals), and the relation between individual and social/communication abilities.

The analysis of the obtained results indicates how evolving robots develop a non-trivial communication system and exploit different communication modalities. Results also indicate how the possibility to co-adapt the robots’ individual and social/communicative behaviours plays a key role in the development of progressively more complex and effective individuals.

In the following section we will review the related literature. In section 3, we will describe our experimental setup and we will show how a communication ability emerges as a result of an indirect selective pressure.
In section 4, we will describe the type of signals produced by evolved robots and their effects on other robots’ behaviours. In section 5, we will describe the modalities with which evolved individuals communicate. In section 6, we will describe the relation between the individual and social/communicative behaviour. Finally, in section 7, we will discuss the implications of the obtained results.

2. Related literature

In their pioneering work on the evolution of communication, Werner and Dyer (1992) evolved a simulated population of male and female agents living in a toroidal grid environment for the ability to ‘mate’ (i.e. for the ability of male agents to reach the physical location of female agents). The sensory-motor structure of the agents was designed so as to force them to rely on signalling behaviour and to prevent any possible alternative strategy. Indeed, females are immobile (i.e. they cannot reach males) and males are blind (i.e. they cannot detect the position of females). Females are allowed to detect the position and orientation of the closest male located in the 5x5 cell area surrounding them and to send signals (i.e. vectors of 3 binary values) to all males located in the same surrounding area. Males, on the other hand, are allowed to detect the signals produced by females and to move. By analysing the obtained results, the authors showed how evolved individuals were able to solve the mating problem by exploiting their communication ability despite the fact that communication was not explicitly rewarded in the fitness function. By analysing how evolved males modified their motor behaviour on the basis of the signals detected, the authors showed how evolved females produced a few different signals whose effect could be described with sentences like “go forward”, “turn left”, and “turn right”.

More recently, Cangelosi and Parisi (1998) and Marocco et al.
(2003) demonstrated how communication can also emerge in experimental settings in which communication is not strictly required to solve adaptive problems but allows individuals to achieve better performance. In Marocco et al. (2003), for example, a population of robotic arm controllers was evolved for the ability to discriminate objects with different shapes (i.e. spherical or cubic objects) on the basis of tactile sensory information by continuing to touch spherical objects while avoiding cubic objects. Evolving robots are asked to alternately play the roles of speaker and hearer. When they assume the role of speaker they only receive as input the state of the tactile sensors and are allowed to produce a vector of floating point values to ‘name’ the object. When they assume the role of hearer, they receive as input both tactile and communication information (i.e. the vector of floating point numbers produced by a speaker agent that previously interacted with the same object). Evolved individuals display an ability to discriminate the two types of objects and to produce the appropriate motor behaviour (i.e. continuing to touch or avoiding the object) on the basis of tactile sensory information only, but they also develop an ability to name the two objects with different patterns and to use these patterns to discriminate the objects immediately, thus saving the time otherwise needed to discriminate objects through physical interactions.

Quinn et al. (Quinn, 2001; Quinn et al., 2003) evolved a team of mobile robots for the ability to move while remaining close to one another. Robots are only provided with proximity sensors and therefore do not have dedicated communication channels. Despite that, they evolve a primitive form of implicit communication based on bodily movement that allows them to coordinate and to assume different roles (see also Baldassarre et al., 2003).
This is achieved through two sub-behaviours: (1) an approaching behaviour in which one of the two robots tends to produce a sustained level of activation on the infrared sensors of the other robot, and (2) a front-inversion behaviour in which the robot that experiences a sustained level of activation in its infrared sensors inverts its direction of movement. The combination of these two sub-behaviours, in fact, allows the two robots to align and to assume the role of follower and leader, respectively.

Finally, Di Paolo (2000) reported the results of a set of experiments in which two simulated robots moving in an arena were evolved for the ability to approach each other and to remain close together as long as possible. Robots are provided with two motors controlling the two wheels, a sound organ able to produce sounds with different intensities, and two sound sensors symmetrically placed at ±45 degrees with respect to the frontal side of the robot that detect the intensity of the sound produced by the two robots. Evolved robots exploit the possibility to modulate the intensity of the produced sounds by producing rhythmical sounds with varying intensities phase-locked at some value near perfect anti-phase.

Like the authors of the models reviewed above, we are interested in building a model in which communication can emerge without being explicitly rewarded. However, we are also interested in experimental set-ups in which individual and social/communicative behaviour can co-evolve by mutually shaping one another. Moreover, we are interested in studying how individuals might discover categories that are useful from a communication point of view and that are not already explicitly or implicitly identified in the experimental setting. Finally, we are interested in studying how not only communication abilities but also communication modalities, which regulate how individuals interact/communicate, can emerge as a result of an indirect selective pressure.
For this reason, we will not impose a restricted and predefined interaction schema and we will leave robots free to determine the modality with which they will interact. By restricted and predefined interaction schema we mean, for example, the interaction modality adopted in Werner and Dyer (1992), in which female and male individuals can only play the role of the speaker and hearer, respectively, or the interaction modality adopted in Cangelosi and Parisi (1998) and Marocco et al. (2003), in which agents alternately assume the role of speaker or hearer and in which speakers are allowed to send to hearer robots a signal consisting of a single pattern after having interacted for a certain amount of time with the same object that will be experienced by the hearer.

3. Experimental set-up and emergence of communication

A group of four simulated robots live in a square arena of 270x270 cm surrounded by walls that contains two circular target areas (see Figure 1, left). The robots have a circular body with a radius of 11 cm and are provided with: two motors controlling the two corresponding wheels, one communication actuator capable of sending signals with varying frequencies (signals are encoded as floating point values ranging from 0.0 to 1.0), eight infrared sensors (that detect obstacles up to a distance of about 15 cm), one ground sensor (which, by detecting the colour of the floor, can ascertain whether the robot is located on a target area or not), and four communication sensors that detect signals produced by other robots up to a distance of 100 cm from four corresponding directions (i.e. frontal [315°-45°], rear [135°-225°], left [225°-315°], right [45°-135°]) [1]. The communication sensors are implemented in a way that allows robots to detect signals emitted by only a single robot for each direction.

[1] We are currently trying to implement these robots in hardware.
Robots’ signal production and detection system will be based on Designer Systems DS-IRCM infrared wireless communication modules sold by Totalrobots, which allow the formation of a local bi-directional wireless network, and appropriate software routines. Communication between modules only occurs within a given angular range and within a distance of up to 10 meters. Interference between modules is prevented by hardware and software routines that check the intensity of incoming signals.

Figure 1. Left: The environment and the robots. The square represents the arena surrounded by walls. The two grey circles represent two target areas. The four black circles represent four robots. Right: The neural controller of evolving robots. Internal neurons and recurrent connections are only included in one of the two experimental settings (see text).

The robots’ neural controllers consist of neural networks with 14 sensory neurons (that encode the activation states of the corresponding 13 sensors and the activation state of the communication actuator at time t-1, i.e. each robot can hear its own emitted signal at the previous time step) directly connected to the three motor neurons that control the desired speed of the two wheels and the communication signal produced by the robot. In the first experimental setting, neural controllers did not include internal neurons and recurrent connections. In the second experimental setting, neural controllers also included two internal neurons with recurrent connections. In both cases the output of motor neurons was computed according to the logistic function (2). In the case of the second experimental setting, the output of sensory and internal neurons was computed according to functions (3) and (4), respectively (for a detailed description of these activation functions and the relation with other related models, see Nolfi [2002]).
We will call the robots of the former experimental setting reactive robots (since in their case motor actions can only be determined on the basis of the current sensory state, plus the copy of the communication neuron at time t-1) and the robots of the latter experimental setting non-reactive robots (since in their case the motor actions are also influenced by previous sensory and internal states).

$$A_j = t_j + \sum_i w_{ij} O_i \qquad (1)$$

$$O_j = \frac{1}{1 + e^{-A_j}} \qquad (2)$$

$$O_j = \tau_j O_j^{(t-1)} + (1 - \tau_j) I_j \qquad (3)$$

$$O_j = \tau_j O_j^{(t-1)} + (1 - \tau_j) \left(1 + e^{-A_j}\right)^{-1} \qquad (4)$$

Here $A_j$ is the activity of the jth neuron, $t_j$ the bias of the jth neuron, $w_{ij}$ the weight of the incoming connection from the ith to the jth neuron, $O_i$ the output of the ith neuron, $O_j^{(t-1)}$ the output of the jth neuron at the previous time step, $\tau_j$ the time constant of the jth neuron, and $I_j$ the intensity of the jth sensor.

Robots were evolved for the ability to find and remain in the two target areas by subdividing themselves equally between the two areas. Each team of four robots was allowed to "live" for 20 trials, each lasting 100 seconds (i.e. 1000 time steps of 100 ms each). At the beginning of each trial the positions and the orientations of the robots were randomly assigned outside the target areas. The fitness of the team of robots consists of the sum of a score of 0.25 for each robot located in a target area and a score of -1.00 for each extra robot (i.e. each robot exceeding the maximum number of two) located in a target area. The total fitness of a team is computed by summing the fitness gathered by the four robots in each time step.

The initial population consisted of 100 randomly generated genotypes that encoded the connection weights, the biases, and the time constants (in the case of non-reactive robots) of 100 corresponding neural controllers (each parameter is encoded with 8 bits and normalized in the range [–5.0, +5.0], in the case of connection weights and biases, and in the range [0.0, 1.0], in the case of time constants).
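As a rough illustration of update rules (1)-(4), the per-step team score, and the 8-bit parameter encoding described above, the following Python sketch may help. All function names are hypothetical. Two details are assumptions not stated in the text: the 8-bit genes are mapped linearly onto their range, and a robot in an overcrowded area still earns its 0.25 while each extra robot additionally costs 1.00.

```python
import math


def net_input(bias, weights, inputs):
    """Eq. (1): bias plus the weighted sum of the incoming outputs."""
    return bias + sum(w * o for w, o in zip(weights, inputs))


def logistic(activity):
    """Eq. (2): logistic output used for the motor neurons."""
    return 1.0 / (1.0 + math.exp(-activity))


def sensory_output(prev_out, tau, intensity):
    """Eq. (3): leaky integration of the sensor intensity."""
    return tau * prev_out + (1.0 - tau) * intensity


def internal_output(prev_out, tau, activity):
    """Eq. (4): leaky integration of the logistic of the activity."""
    return tau * prev_out + (1.0 - tau) * logistic(activity)


def decode_gene(bits, low, high):
    """Map an 8-bit gene (a string of '0'/'1') onto [low, high].

    Assumption: the mapping is linear; the text only gives the ranges.
    """
    return low + (high - low) * int(bits, 2) / 255.0


def step_fitness(robots_per_area, max_per_area=2):
    """Team score for a single time step.

    Assumption: every robot inside a target area earns 0.25, and each
    robot beyond the second in the same area additionally costs 1.00.
    """
    score = 0.0
    for count in robots_per_area:
        score += 0.25 * count
        score -= 1.00 * max(0, count - max_per_area)
    return score
```

Under these assumptions, `step_fitness([2, 2])` gives 1.0 for a team split evenly between the two areas, which matches a per-step maximum of 1.0, while `step_fitness([3, 1])` gives 0.0 because the penalty for the extra robot cancels the four positive scores.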
Each genotype is translated into 4 identical neural controllers that are embodied in the four corresponding robots (i.e. teams are homogeneous and consist of four identical robots; for a discussion of this point and of alternative selection schemas see Quinn, 2000, 2001; Quinn et al., 2003; Baldassarre et al., 2003). The 20 best genotypes of each generation were allowed to reproduce by generating five copies each, with 2% of their bits replaced with a new randomly selected value. The evolutionary process lasted 100 generations (i.e. the process of testing, selecting and reproducing robots was iterated 100 times). The experiment was replicated 10 times for each of the two experimental conditions (reactive and non-reactive neural controllers).

Figure 2. The behaviour displayed by the team of evolved robots of one of the best replications for reactive (left) and non-reactive robots (right). The square and the grey circles indicate the arena and the target areas, respectively. Lines inside the arena indicate the trajectories of the four robots during a trial. The numbers indicate the starting and ending position of the corresponding robot (the ending position is marked with a white circle).

By analysing the behaviour of one of the best teams of evolved robots for the two experimental conditions we can see how both reactive and non-reactive robots are able to find and remain in the two target areas by equally distributing themselves between the two. Figure 2 (left) shows a typical behaviour exhibited by reactive robots. In this example robots #2 and #3 quickly reach two different empty target areas. Then, robot #1 joins robot #2 in the top-left target area. Robot #0 approaches and avoids the top-left target area (that already contains two robots) and later joins the bottom-left target area. In the example shown in the right part of Figure 2, which displays a typical behaviour exhibited by non-reactive robots, robots #2 and #3 quickly reach two different empty target areas.
Later on, robot #1 and then robot #0 approach and enter the bottom-right target area. As soon as the third robot (i.e. robot #0) enters the area, robot #1 leaves the bottom-right target area and, after exploring the environment for a while, enters and remains in the top-left target area.

To determine whether the possibility to signal and to use other robots’ signals is exploited by evolving robots and whether the type of neural architecture influences the obtained performance, we tested the evolved teams of the reactive and non-reactive experiments in three conditions: a normal condition, a deprived condition in which robots were not allowed to detect other robots’ signals (i.e. in which communication sensors were always set to a null value), and a no-signal condition in which the two sets of evolutionary experiments were replicated without allowing robots to detect other robots’ signals and in which the evolved robots were tested in the same deprived condition (see Figure 3).

Figure 3. Average fitness of all teams of the last generations of 10 different replications of the experiments. Grey and black histograms represent the average fitness of reactive and non-reactive robots, respectively. Normal histograms represent the fitness obtained by testing the robots in a normal condition (the same condition in which they were evolved). Deprived histograms represent the fitness obtained by testing robots (evolved in a normal condition) in a control condition in which they are not allowed to detect other robots’ signals. No-signals histograms represent the fitness obtained by evolving and testing robots in a control condition in which they are not allowed to detect other robots’ signals. Fitness values are normalized in the range [0.0-1.0], where 1.0 corresponds to the case in which individuals spend the entire lifetime in target areas equally divided into two groups (i.e.
a fitness value that cannot be reached in practice since robots first have to locate and reach the two target areas). In all cases, individuals have been tested for 1000 trials.

Performance in the “Normal” condition is better than in the other two conditions. The difference is statistically significant (p < 0.001) both in the case of reactive and non-reactive robots. The fact that performance in the “Normal” condition is better demonstrates that robots use their ability to produce and detect signals. The fact that non-reactive robots outperform reactive robots in the normal condition (differences in performance are statistically significant) indicates that the possibility to integrate sensory-motor information through time is exploited by non-reactive individuals. Moreover, the fact that the differences in performance between reactive and non-reactive conditions are not statistically significant in the “Deprived” and “No-signals” conditions indicates that the possibility to integrate sensory-motor information through time is exploited by communicating individuals only. We will analyse the differences between robots’ individual and social behaviour and the relation between these two forms of behaviour in more detail in section 6.

4. The evolved communication system: signals produced and their effects on other robots’ behaviour

By analysing the teams of the best replication of the experiment we observed that evolved individuals developed a non-trivial communication system, both in the case of the reactive and non-reactive experiments. More specifically, evolving robots display an ability to develop a sort of lexicon (including 4-5 different signals), a perceptually grounded categorization of the physical and social world reflected by the different signals, an ability to appropriately modulate their motor behaviour on the basis of the signals detected, and an ability to appropriately modulate their signalling behaviour on the basis of the signals detected.
In the next two sections we will describe in detail the signals produced by reactive and non-reactive robots in different conditions and the effect of the detected signals on robots’ motor and signalling behaviours. As we will see, in the case of the best replication, reactive and non-reactive robots developed a similar signalling system. However, non-reactive robots outperform reactive robots in their ability to “use” the signals detected. In section 5 we will describe the evolved communication modalities. Finally, in section 6, we will describe robots’ individual behaviour and the relation between individual and social/communicative behaviours.

4.1 Experiment I – Reactive robots

Reactive robots of the best replication (the same described in Figure 2, left) produce at least four different types of signals:
(a) a signal A with a value of about 0.07 produced by robots located outside the target areas not interacting with other robots located inside target areas;
(b) a signal B with a value of about 0.45 produced by robots located alone inside a target area;
(c) a signal C, a highly varying signal with an average value of 0.25, produced by robots located inside a target area that also contains another robot;
(d) a signal D with a value of about 0.01 produced by robots that are approaching a target area and are interacting with another robot located inside the target area.

Robots receiving these four types of signals modify their motor and/or signalling behaviour on the basis of the signal received and of other available sensory information. More specifically:
(1) robots located outside the target areas receiving signal A tend to modify their motor behaviour in a way that allows them to explore the environment more effectively, i.e.
to find the target areas more quickly (see below);
(2) robots located outside target areas receiving signal B tend to modify their motor behaviour (by approaching the robot emitting the signal and the corresponding target area) and their signalling behaviour (i.e. by producing signal D instead of signal A);
(3) robots located outside the target areas receiving signal C (i.e. the signal produced by two robots located inside a target area) modify their motor behaviour so as to move away from the signal source;
(4) robots located inside the target areas that receive signal C (i.e. the signal produced by two other robots located inside the target area) tend to modify their motor behaviour so as to exit from the target area.

To verify the functionality of signal A, we measured the time elapsed until at least one robot of the team reaches one of the two target areas in a normal condition and in a control condition in which robots were not allowed to detect signals (i.e. in which the state of the four communication sensors of all robots was always set to a null value). By testing the best evolved team of robots in the two conditions, we observed that the time needed to reach the first target area is, on average, 6.727s and 7.765s in the normal and the control condition, respectively (grey bars in Figure 4). This implies that signal A, produced by robots located outside target areas, allows the team to better explore the environment and, consequently, to reach the target areas more quickly.

Figure 4. The average time elapsed (seconds) until at least one robot of the team reaches one of the two target areas in a normal condition (“Normal”) and in a control condition (“Deprived”) in which robots were not allowed to detect signals during the test. Black and grey bars represent the average time required by non-reactive and reactive robots, respectively.
Average performance obtained by testing the robots for 1000 trials lasting 100 seconds.

To verify the functionality of signal B, we tested a team consisting of two robots placed in an environment including only a single target area (one of the robots was manually placed into the target area, while the other one was placed in a random position outside the area) in a normal condition and in a control condition in which robots were not allowed to detect signals (i.e. in which the state of the four communication sensors of all robots was always set to a null value). Testing the best evolved team of robots in the two conditions, we observed that the percentage of trials in which the robot placed outside was able to reach the target area within 100 seconds is 80.9% and 53.2% in the normal and control condition, respectively (grey bars in Figure 5). This demonstrates that robots detecting signal B modify their motor behaviour so as to approach the source of the signal. We will discuss the effect of signal B on robots’ signalling behaviour in the next section.

Figure 5. Percentage of trials in which both robots were able to reach the target area. Tests of a team consisting of two robots placed in an environment including only a single target area in a normal condition (“Normal”) and in a control condition (“Deprived”) in which robots were not allowed to detect signals. Grey and black bars represent the performance of reactive and non-reactive robots, respectively. Average performance obtained by testing the robots for 1000 trials lasting 100 seconds.

Robots located inside a target area produce signal B. However, two interacting robots located in the same target area reciprocally modulate their signalling behaviour so as to produce signal C (i.e. a highly varying signal with an average value of 0.25).
Indeed, by placing two robots in a target area and by preventing one of them from detecting the signal produced by the other, we observed that the former produces signal B.

To verify whether signal C reduces the chances that more than two robots enter the same target area, we tested a team consisting of three robots in an environment including only a single target area in a normal condition and in a “Deprived” control condition in which communication was disabled (i.e. in which the state of the four communication sensors of all robots was always set to a null value). At the beginning of each trial two robots are placed inside the target area and one robot is placed outside the target area with randomly selected positions and orientations. Testing the best evolved team of robots in the two conditions we observed that the percentage of trials in which the robot initially placed outside the target area erroneously enters the area is 4.8% and 43.7% in the normal and control condition, respectively (see grey bars in Figure 6).

To verify whether signal C increases the chances that a robot exits from a target area that contains more than two robots, we tested a team of three robots in an environment including only a single target area in a normal condition and in a “Deprived” control condition in which communication was disabled. At the beginning of each trial, all three robots were placed inside the target area with randomly selected positions and orientations. The percentage of times in which one of the three robots exits from the target area is 52.8% and 26.9% in normal and deprived conditions, respectively (see Figure 7, grey bars).

Figure 6. Percentage of trials in which a third robot erroneously enters a target area that already contains two robots in a normal condition (“Normal”) and in a control condition (“Deprived”) in which robots were not allowed to detect signals.
Grey and black bars represent the performance of reactive and non-reactive robots, respectively. Average performance obtained by testing the robots for 1000 trials lasting 100 seconds.

Figure 7. Percentage of times in which one robot exits from a target area that contains three robots in a normal condition (“Normal”) and in a control condition (“Deprived”) in which robots were not allowed to detect signals. Grey and black bars represent the performance of reactive and non-reactive robots, respectively. Average performance obtained by testing the robots for 1000 trials lasting 100 seconds.

4.2 Experiment II – Non-reactive robots

Non-reactive robots of the best replication (the same described in Figure 2, right) produce at least five different types of signals. The signals A, B, C and D are analogous to the corresponding signals produced by reactive robots (i.e. although the value of the signals varies, they are produced in the same circumstances and have functionally similar effects). More precisely, non-reactive robots produce the following signals:
(a) a signal A with a value of about 0.42 produced by robots located outside the target areas that do not detect other robots’ signals;
(b) a signal B with a value of about 0.85 produced by robots located alone inside a target area;
(c) a signal C, a highly varying signal with an average value of 0.57, produced by robots located inside a target area that also contains another robot;
(d) a signal D with a value of about 0.07 produced by robots outside target areas that are approaching a target area and are interacting with another robot located inside the target area;
(e) a signal E, a highly varying signal with an average value of 0.33, produced by robots located outside the target areas interacting with other robots also located outside target areas.
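For quick comparison, the signal values reported in sections 4.1 and 4.2 can be collected side by side. The Python snippet below is only a convenience summary of the numbers given in the text for the best replication of each experiment. The names `LEXICON` and `signals_for` are hypothetical, the context strings are paraphrases, and the values for signals C and E are averages of highly varying signals.

```python
# label: (reactive value, non-reactive value, paraphrased producing context)
# Values are the approximate ones reported in the text; None means the
# signal was not reported for that type of robot.
LEXICON = {
    "A": (0.07, 0.42, "outside target areas, not engaged with other robots"),
    "B": (0.45, 0.85, "alone inside a target area"),
    "C": (0.25, 0.57, "inside a target area together with another robot"),
    "D": (0.01, 0.07, "approaching an area while interacting with a robot inside it"),
    "E": (None, 0.33, "outside target areas, interacting with robots also outside"),
}


def signals_for(kind):
    """Return {label: value} for 'reactive' or 'non-reactive' robots."""
    index = {"reactive": 0, "non-reactive": 1}[kind]
    return {
        label: row[index]
        for label, row in LEXICON.items()
        if row[index] is not None
    }
```

Calling `signals_for("reactive")` yields the four signals A-D, while `signals_for("non-reactive")` also includes signal E.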
Robots receiving these five types of signals modify their motor and/or signalling behaviour on the basis of the signal received and of other available sensory information. More specifically:
(1) robots located outside the target areas receiving signal A modify their signalling behaviour by producing signal E;
(2) robots located outside target areas receiving signal B tend to modify their motor behaviour (by approaching the robot emitting the signal and the corresponding target area) and their signalling behaviour by producing signal D;
(3) robots located outside the target areas receiving signal C (i.e. the signal produced by two robots located inside the target area) modify their motor behaviour so as to move away from the signal source;
(4) robots located inside target areas receiving signal C (i.e. the signal produced by two robots located inside the target area) modify their motor behaviour so as to exit from the target area;
(5) robots located outside the target areas receiving signal E tend to modify their motor behaviour to better explore the environment.

The fact that signals A and E produced by robots located outside target areas allow them to explore the environment more effectively (i.e. to find target areas more quickly) is demonstrated by the fact that the average time in which the first robot enters one of the two target areas is 5.922s and 6.478s in normal and deprived conditions, respectively (see the black bars in Figure 4). By testing the best teams of the other replications of the experiment, similar results were observed in most cases (results not shown). Overall, these results indicate that robots exploit their signalling behaviour to produce a form of coordinated exploration that increases their ability to quickly find the target areas.
To identify the relative roles of the two signals we ran an additional test in which robots were allowed to produce and detect signal A but were not allowed to switch from signal A to E (they were forced to produce signal A even when they started to detect the signal A produced by other robots). The obtained result (i.e. an average time of 6.952s) indicates that the functionality is provided by signal E, while the role of signal A is to trigger the production of signal E.

That signal B increases the chances that other robots enter the target area from which the signal is produced is demonstrated by the fact that the percentage of trials in which a robot placed outside the target area enters a target area that already contains a single robot is 97.2% and 75.4% for robots tested in normal and deprived conditions, respectively (see the black bars in Figure 5). That signal C reduces the chances that other robots enter a target area that already contains two robots is demonstrated by the fact that the percentage of times in which a third robot joins a target area that already contains two other robots is 2.3% and 82.6% in normal and deprived conditions, respectively (see Figure 6, black bars). That signal C increases the chances that a robot exits from a target area that contains more than two robots is demonstrated by the fact that the percentage of times in which one of three robots located in the same target area exits the area is 84.6% and 2.7% in normal and deprived conditions, respectively (see Figure 7, black bars).

The functionality of signal D (both in the case of non-reactive and reactive robots), and more generally the functionality of the effects that detected signals have on the type of signals produced, will be discussed in the next section.

5. The evolved communication system: communication modalities

Evolving robots might rely on mono-directional or bi-directional communication forms.
In mono-directional communication forms, the motor behaviour or the signal produced by one individual affects the behaviour of a second individual, but the behaviour of the latter does not alter the behaviour of the former. In these forms of communication, the two robots play the role of the ‘speaker’ and of the ‘hearer’, respectively, and communication can be described as a form of information exchange (in which communication allows the ‘hearer’ to access information that is available to the ‘speaker’) or as a form of manipulation (in which the ‘speaker’ alters the behaviour of the ‘hearer’ in a way that is useful to the ‘speaker’ or to both the ‘speaker’ and the ‘hearer’). In bi-directional communication forms, on the other hand, the motor or signalling behaviour of one individual affects the second individual and vice versa. In these forms of communication each robot plays both the role of the ‘speaker’ and of the ‘hearer’ (i.e. different roles cannot be identified).

Another important aspect that characterizes communication forms is whether they consist of static or dynamical processes. In static communication forms, the signal produced by an individual is only a function of the current state of the individual. In dynamic communication forms, instead, the signal produced at a given time step is also a function of the signals produced and detected previously. As an example of a static communication form we might consider the case of a robot emitting an alarm signal continuously (until the danger disappears). As an example of a dynamic communication form we might consider the case of two individuals that alternately play the role of the speaker and of the hearer by taking turns (Iizuka and Ikegami, 2003a, 2003b). Bi-directional and dynamical communication forms might lead to emergent properties (e.g.
synchronization or shared attention) that result from the mutual interaction between two or more individuals and that cannot be explained by the sum of the individual contributions only (Di Paolo, 2000).

In the experiments reported in this paper the modalities that regulate communicative behaviours are not predefined but vary within evolving individuals. Indeed, as we will see, evolved agents use different communication modalities in different circumstances.

To describe the communication modalities used, let us consider a simplified situation in which a team consisting of two robots is placed in an arena that includes only a single target area. Figure 8 (left) and Figure 9 show the typical motor and signalling behaviour exhibited by two reactive robots. Initially the two robots are both outside the target area and produce a signal with a value of about 0.07 (signal A). When the two robots get close enough to detect each other’s signal, they slightly change their motor trajectory so as to increase their chances of ending up in a target area. Robot #0 reaches the target area and starts to produce a signal with a value of about 0.45 (signal B). Once robot #1 gets close enough to robot #0 (i.e. as soon as it starts to detect the signal B produced by robot #0) it modifies its trajectory so as to approach the direction of the signal and it starts to produce signal D (i.e. a signal with an almost null value). Later on, when robot #1 enters the area, the two robots start to produce a highly varying signal with a value of about 0.25 (signal C). Signal C affects the motor behaviour of robots located outside the target area (which tend to avoid the target area) and inside the target area (which tend to exit from areas that contain more than two robots). Signal C also affects the signalling behaviour of other robots located inside a target area.
Indeed, robots located inside a target area switch their signalling behaviour from B to C when they detect the signal produced by another robot located in the same target area.

Figure 8. The behaviour of two robots tested in an arena including a single target area. The dashed and full lines represent the trajectory of robot #0 and #1, respectively. The numbers indicate both the starting and ending positions of the corresponding robots. Left: typical behaviour exhibited by a reactive robot. Right: typical behaviour exhibited by a non-reactive robot.

[Figure 9: line plot. Horizontal axis: lifecycles, 0 to 1500. Vertical axis: signal intensity, 0 to 1. Phase labels: 0 & 1 out; 0 in, 1 out; 0 & 1 in. Signal labels: A, B, C, D.]
Figure 9. Values of the signals produced by the two reactive robots during the behaviour shown in the left side of Figure 8. Dashed and full lines indicate the values of the signals produced by robot #0 and #1, respectively. Letters (A, B, C and D) indicate the 4 classes of signals produced by the robots. The black lines at the bottom of the figure indicate the three phases in which: (1) both robots are outside the target area, (2) robot #0 is in and robot #1 is out, and (3) both robots are inside the target area. The grey lines at the bottom of the figure indicate the phases in which the two robots are located within the signal range. The short signals emitted by robots outside target areas are produced when they detect an obstacle through their infrared sensors. These signals do not seem to play any functional role.

As shown in Figure 8 (right) and Figure 10, non-reactive robots rely on similar communication modalities. Initially the two robots are both outside the target area and produce a signal with a value of about 0.42 (signal A).
As soon as the two robots get close enough to detect each other’s signals, they produce a varying signal with an average value of 0.33 (signal E) and they vary their motor trajectory by increasing their turning angle so as to increase their chance of entering a target area. After some time robot #0 reaches the target area and starts to produce a signal with a value of about 0.85 (signal B). Later on, once robot #1 returns close enough to robot #0 and detects the signal B produced by robot #0, it modifies its motor trajectory (by approaching robot #0) and its signalling behaviour (by producing signal D, i.e. a signal with an almost null value, instead of signal A). When robot #1 also enters the area, the two robots start to produce a varying signal with an average value of about 0.57 (signal C). This signal reduces the probability that other robots will enter the area and, if an additional robot erroneously joins the area, increases the probability that one of the robots exits from the area.

[Figure 10: line plot. Horizontal axis: lifecycles, 0 to 1000. Vertical axis: signal intensity, 0 to 1. Phase labels: 0 & 1 out; 0 in, 1 out; 0 & 1 in. Signal labels: A, B, C, D, E.]
Figure 10. Values of the signals produced by the two robots provided with hidden units during the behaviour shown in the right side of Figure 8. Dashed and full lines indicate the values of the signals produced by robot #0 and #1, respectively. Letters (A, B, C, D and E) indicate the 5 classes of signals produced by the robots. The black lines in the bottom part of the figure indicate the three phases in which: (1) both robots are outside the target area, (2) robot #0 is in and robot #1 is out, and (3) both robots are inside the target area. The grey line in the bottom part of the figure indicates the phases in which the two robots are located within the signal range. The short signals emitted by robots outside target areas are produced when they detect an obstacle through their infrared sensors.
These signals do not seem to play any functional role.

By analysing the functionality of the different signals and the context in which they are used, we can see how evolved robots use different communication modalities and select on the fly the modality that is appropriate for the current situation.

The situation in which one robot is located inside a target area and another robot is located outside, within the communication range, is a case in which the former robot has access to information (related to the location of the target area) to which the latter robot does not have access. In this particular case, communication should be mono-directional, since the latter robot should change its behaviour on the basis of the signal produced by the former robot, while the former robot need not change its motor or signalling behaviour as a result of the signal produced by the latter robot. Indeed, in this situation the evolved robots rely on a mono-directional communication form in which the former robot produces signal B and the latter robot switches its signalling behaviour off by producing signal D (i.e. a signal with an almost null value). This communication interaction can thus be described as an information exchange behaviour in which the former robot (the speaker) produces a signal that encodes information related to the location of the target area and the latter robot (the hearer) exploits this information to navigate toward the area. Alternatively, this communication interaction can be described as a form of manipulation in which the former robot (the speaker) manipulates the motor behaviour of the latter robot (the hearer) so as to drive the robot toward the target area.

The ability of robots located outside target areas to switch their signalling behaviour off (i.e. to produce signal D) as soon as they detect signal B plays an important function both in the case of reactive and non-reactive robots.
Indeed, by testing a team of two robots in an environment including a single target area, in a normal condition and in a control condition in which robots were prevented from switching between signal A and D, we observed that performance in the control condition is much worse. More precisely, the percentage of trials in which both robots were able to reach the target area within 100 seconds drops from 80.9% to 22.5% (in the case of reactive robots) and from 97.2% to 23.8% (in the case of non-reactive robots) between the normal and control conditions (Figure 11).

[Figure 11: bar chart. Horizontal axis: Normal, No Modulation. Vertical axis: 0 to 100.]
Figure 11. Percentage of trials in which a team of two robots randomly placed in an environment with only one target area are both able to enter the target area. Tests performed in a normal condition and in a control condition in which robots outside the target area were not allowed to switch their signalling behaviour from A to D. Grey and black bars represent the performance of reactive and non-reactive robots, respectively. Average performance obtained by testing the robots for 1000 trials lasting 100 seconds.

On the contrary, when two robots are located in the same target area, neither of the two robots has access to the relevant information (i.e. the fact that the target area contains two robots). This information, however, can be generated by the interaction between the two robots through a bi-directional communication modality. This is indeed the communication modality that is selected by evolved robots in this circumstance. The signal produced by one of the two robots affects the signal produced by the second robot, and vice versa. This bi-directional interaction allows the two robots to switch from signal B (i.e. a signal that increases the chances that other robots will join the area) to signal C (i.e. a signal that decreases the chances that other robots will join the area).
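The way two robots jointly generate the "this area holds two robots" information can be illustrated with a deliberately simplified sketch. It assumes, as a hypothetical abstraction, that a robot inside the area emits B when it detected no signal from an area mate at the previous step and C otherwise; the function names are invented for this illustration, and the real signals are continuous and time-varying rather than discrete labels.

```python
from typing import List, Optional, Tuple


def next_signal(detects_area_mate: bool) -> str:
    """Signal of a robot inside a target area (toy rule, see assumptions)."""
    return "C" if detects_area_mate else "B"


def simulate(steps: int, arrival_step: int) -> List[Tuple[str, Optional[str]]]:
    """Robot 0 is in the area from step 0; robot 1 arrives at `arrival_step`.

    Returns, for each step, the pair of signals emitted inside the area
    (None for robot 1 while it is still outside).
    """
    history = []
    prev = (None, None)  # signals emitted inside the area at the previous step
    for t in range(steps):
        r1_inside = t >= arrival_step
        # Each robot reacts to what its area mate emitted one step earlier.
        s0 = next_signal(prev[1] is not None)
        s1 = next_signal(prev[0] is not None) if r1_inside else None
        history.append((s0, s1))
        prev = (s0, s1)
    return history


for step, pair in enumerate(simulate(steps=4, arrival_step=2)):
    print(step, pair)
```

In this toy run neither robot can emit C on its own: robot 0 keeps emitting B until robot 1 has entered and signalled from inside the area, after which both settle on C. That is the sense in which the information is produced by the interaction rather than held by either individual.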
Interestingly, in this circumstance evolved robots also rely on a dynamical communication modality, i.e. they produce signals that vary in time as a result of signals previously produced and detected by the two robots. More precisely, in the case of non-reactive robots, signal C tends to vary in time as a result of the following factors: (1) the value of the signal detected inhibits the signal produced, (2) the intensity of the inhibition also depends on the direction of the detected signal, (3) the signal tends to be detected from continually varying relative directions, since robots located inside a target area turn on the spot.

The production of an oscillatory signal with an average value of 0.57 (in the case of non-reactive robots) in this situation, rather than a stable non-dynamical signal, plays an important functional role. Indeed, we observed that evolved robots rely on oscillatory signals in all the replications of the experiment (both in the case of reactive and non-reactive robots). Moreover, we observed that stable signals do not allow the robots to reach the same level of performance. To ascertain whether the production of a stable signal could provide the same functionality as this oscillatory signal, we performed a test in which non-reactive robots were forced to emit a stable signal when located in a target area that contained two robots. Robots were allowed to behave normally in all other cases. The test was repeated 10 times by using stable signals with 10 different values ranging from 0.1 to 1.0. The fact that, as shown in Figure 12, the performance obtained is always lower than the performance obtained by allowing the robots to produce the oscillatory signal confirms that the dynamical nature of the signal is functional.

[Figure 12: line plot. Horizontal axis: signal value, 0 to 1. Vertical axis: fitness, 0 to 0.8.]
Figure 12.
The continuous line represents the fitness values obtained by a team of non-reactive robots tested in a control condition in which robots are forced to emit a stable signal when they are located in a target area that contains two or more robots. The vertical and horizontal axes represent the fitness and the value of the signal, respectively. The dashed line represents a benchmark showing the value of the fitness reached by a team tested in the normal condition. For each condition robots are tested for 1000 trials.

One reason that might explain the necessity to rely on an oscillatory signal in this circumstance is the fact that signal C has at least three different functions: it informs other robots located in the target area of the presence of the signalling robot, it reduces the probability that other robots enter the target area, and it increases the probability that, when the target area contains more than two robots, one of the robots will exit the area. Indeed, by analysing the behaviour of the robots in the test in which robots were forced to produce signals with a fixed value we observed that: (a) when the value of the signal is below 0.7, robots tend to erroneously exit from the target area even when the area includes only two robots, and (b) when the value of the signal is 0.7 or above, both the ability to reduce the chances that other robots join the target area and the ability to increase the chances that robots exit from the target area (when the area contains more than two robots) are severely impaired.

Another possible reason that might explain the necessity to produce an oscillatory signal is the fact that signal C must produce the same effect (i.e. reduce the chances that other robots enter the target area) whether the signal is produced by two or by three interacting robots located in the same target area, and two different effects (i.e.
increase the chances that one robot exits from the target area or not) when the signal is produced by three or two robots located in the same target area, respectively. The frequency of oscillation of signal C differs depending on whether the signal is produced by two or three robots located in the same target area. Indeed, by analysing the signals produced in these two circumstances with a Fourier Transform, we observed that the frequency and the spectrograms are different in the two cases (see Figure 13). These two signals, C1 and C2, emitted by a robot located in an area with one other or two other robots respectively, have different functions, since signal C2 increases the chances that one of the robots exits from the target area while signal C1 does not.

Figure 13. The two spectrograms (obtained by the Fourier Transform) of the signal C emitted by a robot. The left and right pictures correspond to the signal produced by a robot located in a target area that contains one other or two other robots, respectively. The sampling frequency is 10 Hz, since each communication output is emitted by a robot every 0.1 seconds. Therefore, the range of the frequency components is [0, 5] Hz.

6. Relation between individual and social/communicative behaviour

Since the robots’ individual and social behaviour co-evolve, we might wonder what the relation between these two forms of behaviour is and how the possibility to co-adapt them is exploited in evolved individuals.

The fact that the performance of robots that are tested in the “Deprived” control condition is similar to that of robots evolved and tested in a “No-signal” control condition (see Figure 3) indicates that evolved robots develop an effective individual behaviour (i.e. a behaviour that maximizes the performance that can be achieved without signalling) even if they have always been evaluated in a normal condition (in which signals are available).
The adaptive pressure toward the development of an effective individual behaviour can be explained by considering that the social enhancement that can be achieved by exploiting the signals produced by the other robots is not always guaranteed. Indeed, the availability of the required signals depends on the presence of other robots in the right environmental locations which, in turn, is influenced by unpredictable variables such as the initial positions and orientations of the robots.

By analysing the behaviour displayed by evolved robots tested in the “Deprived” control condition (Figure 15), we can see that both reactive and non-reactive robots are able to spend about 60% of their lifetime in the three most favourable conditions (in which the team gathers a fitness of 0.5, 0.75, or 1.0) and less than 10% of their lifetime in the two least favourable conditions (in which the team gathers a fitness of -0.25 or -1.0). This performance is achieved through a simple behaviour (see Figure 14) that includes the following elementary behaviours: (a) when robots approach walls or other robots they avoid the obstacles by turning approximately 90°; (b) when the robots are far from walls and are not located in target areas they move along a curvilinear trajectory; (c) when the robots are located in a target area, they remain in the area by turning on the spot.

Reactive and non-reactive robots mainly differ with respect to the curvilinear trajectories produced far from walls, which lead to smaller and larger semi-circles in the case of reactive and non-reactive robots, respectively. These larger semi-circular trajectories allow non-reactive robots to find the target areas much more quickly than reactive robots. Indeed, the percentage of lifecycles in which all four robots are located in the target areas is about 17% and 44%, on average, in the case of reactive and non-reactive robots, respectively (see Figure 15).
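The fitness values attached to the team states of Figure 15 are consistent with a simple additive reading in which each target area contributes according to the number of robots it contains. The Python sketch below encodes that reading; the per-area scores are inferred here from the values quoted for the individual states (they are an assumption, not a restatement of the fitness function used during evolution), and the names `AREA_SCORE`, `team_fitness` and `STATES` are hypothetical.

```python
# Score contributed by one target area, indexed by the number of robots in it.
# Inferred from the state fitness values quoted for Figure 15 (an assumption).
AREA_SCORE = {0: 0.0, 1: 0.25, 2: 0.5, 3: -0.25, 4: -1.0}


def team_fitness(in_area_1: int, in_area_2: int) -> float:
    """Fitness of a four-robot team given the occupancy of the two areas."""
    if in_area_1 < 0 or in_area_2 < 0 or in_area_1 + in_area_2 > 4:
        raise ValueError("occupancies must be non-negative and sum to at most 4")
    return AREA_SCORE[in_area_1] + AREA_SCORE[in_area_2]


# The eight states of Figure 15 as (robots in one area, robots in the other).
# State "2" is shown as (2, 0); under this reading (1, 1) scores the same.
STATES = {
    "void": (0, 0), "1": (1, 0), "2": (2, 0), "1+2": (1, 2),
    "2+2": (2, 2), "1+3": (1, 3), "3": (3, 0), "4": (4, 0),
}

for name, occupancy in STATES.items():
    print(name, team_fitness(*occupancy))
```

Under this reading the three most favourable states are "2", "1+2" and "2+2", and the two least favourable are "3" and "4", which matches the grouping used in the text.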
The fact that non-reactive robots are better at finding the target areas, however, does not translate into better performance (in the “Deprived” condition), since non-reactive robots are more likely to spend their lifetime both in positive conditions (in which each target area contains one or two robots) and in negative conditions (in which a target area contains three or four robots). As a consequence, although non-reactive robots display a better exploration strategy than reactive robots, the overall performance of reactive and non-reactive robots is similar in “Deprived” conditions.

Figure 14. Typical behaviour displayed by the team of evolved robots in a “Deprived” condition in which they are not allowed to detect other robots’ signals, for reactive (left) and non-reactive robots (right). The numbers indicate the starting and ending position of the corresponding robot (the ending position is also marked with a white circle).

[Figure 15: bar chart. Horizontal axis: states void, 1, 2, 1+2, 2+2, 1+3, 3, 4. Vertical axis: 0 to 0.3.]
Figure 15. Percentage of lifecycles spent by a team of four robots in the 8 possible different states, tested in a “Deprived” condition in which robots are not allowed to detect other robots’ signals. “Void” indicates the case in which all four robots are located outside target areas (fitness = 0.0). “1” indicates the case in which only a single robot is located in a target area (fitness = 0.25). “2” indicates the case in which two robots are located in target areas. “1+2” indicates the case in which one robot is located in a target area and two other robots are located in the other target area (fitness = 0.75). “2+2” indicates the case in which each of the two target areas contains two robots (fitness = 1.0). “1+3” indicates the case in which one target area contains one robot and the other contains three robots (fitness = 0.0). “3” indicates the case in which three robots are located in the same target area (fitness = -0.25).
“4” indicates the case in which four robots are located in the same target area (fitness = -1.0). Grey and black bars represent the performance of reactive and non-reactive robots, respectively. Average performance obtained by testing the robots for 1000 trials lasting 1000 cycles.

When we look at the relation between individual and social behaviour, however, we can see that the characteristics of the individual behaviour exhibited by reactive and non-reactive robots closely match the characteristics of their social/communicative behaviour. In other words, individual and social behaviour are tightly co-adapted.

The fact that reactive robots display a ‘sub-optimal’ exploration behaviour (i.e. a behaviour that does not maximize the probability of quickly finding and entering target areas) can be explained by considering their limited ability to avoid target areas that already contain two robots and to exit from target areas that contain more than two robots (see Figures 6 and 7). A reliable ability to avoid situations in which more than two robots are located in the same target area thus constitutes a pre-requisite for the emergence of a better exploration ability. The lack of this pre-requisite in reactive individuals explains why their exploration ability is not further optimised. On the other hand, the better capability of non-reactive robots to avoid situations in which more than two robots are located in the same target area (see Figures 6 and 7) explains why evolved non-reactive robots developed a more effective exploration behaviour. Indeed, if we look at the time spent by robots in target areas that contain more than two robots, we can see that in a “Deprived” condition non-reactive robots are much worse than reactive robots (Figure 15). In normal conditions, instead, non-reactive robots are much better than reactive robots (Figure 16).

[Figure 16: bar chart. Horizontal axis: states void, 1, 2, 1+2, 2+2, 1+3, 3, 4. Vertical axis: 0 to 0.4.]
Figure 16.
Percentage of lifecycles spent by a team of four robots in the 8 possible different states (see the legend of Figure 15), tested in a normal condition. Grey and black bars represent the performance of reactive and non-reactive robots, respectively. Average performance obtained by testing the robots for 1000 trials lasting 1000 cycles.

7. Discussion

In this paper we have described the results of an experiment in which an effective communication system arises among a collection of initially non-communicating agents evolved for the ability to solve a collective navigation problem. By analysing the obtained results we observed how evolving individuals developed: (a) an effective communication system, (b) an effective individual behaviour, (c) an ability to rely on different communication modalities and to autonomously select the modality that is appropriate to the current circumstances.

The communication system that emerges in the experiments is based on 4-5 different signals that characterize crucial features of the environment, of the agent/agent relations, and of the agent/environment relations (e.g. the relative location of a target area, the number of agents contained in a target area, etc.). These features, which have been discovered autonomously by the agents themselves, are grounded in the agents’ sensory-motor experiences. The signals used, therefore, refer not only to the characteristics of the physical environment but also to those of the social environment constituted by the other agents and by their current state. Evolved individuals also display an ability to appropriately tune their individual and communicative behaviour on the basis of the signals detected (e.g. by approaching, avoiding, or exiting a target area, by modifying their exploratory behaviour, etc.).
Indeed, the type of signals produced, the context in which they are produced, and the effect of signals detected constitute three interdependent aspects of the communication system that co-adapt during the evolutionary process and co-determine the ‘meaning’ and the efficacy of each signal and of the communication system as a whole.

The individual behaviour of evolved robots includes simple elementary behaviours that allow robots to avoid obstacles, explore the environment, and remain in target areas. Interestingly, the robots’ individual behaviour tends to be optimised (with respect to the possibility of obtaining the best possible performance when signals produced by other robots are not available) despite the fact that robots are always evaluated in social conditions during the evolutionary process. This unexpected result might be explained by considering that required signals might not always be available (even in normal conditions in which robots are allowed to communicate), since their availability depends on the physical location of the robots in the environment, which in turn depends on unpredictable factors such as the robots’ initial positions and orientations, or noise. In other words, optimised individual behaviours guarantee good performance even when required signals are not available. This tendency to optimise both individual and social behaviour leads to the development of control systems structured hierarchically according to a layered organization in which the individual abilities represent the most basic layer and the communication/social ability represents a higher-level layer that modulates the lower level.

The fact that communication abilities represent a high-level structure that modulates the basic individual behaviours of the robots does not prevent evolving robots from co-adapting their individual and communicative behaviour.
Indeed, by comparing the results of different replications of the experiments, we observed that individual behaviours tend to be selected in order to maximize individual performance (when signals from other robots are not available) but also in order to maximize the performance that can be achieved by combining the robots’ individual and social capabilities.

Evolved robots also exploit different communication modalities (e.g. mono-directional communication forms in which one robot acts as a ‘speaker’ and a second robot acts as a ‘hearer’, or bi-directional communication forms in which two robots concurrently influence each other through their signalling and/or motor behaviour) by selecting the modality that is appropriate to each specific communicative interaction. Evolving individuals also engage in complex communication behaviours that involve three different robots that concurrently affect each other so as to produce appropriate collective behaviours (e.g. so as to push one of the three robots located inside the same target area out of the area). Evolved robots also exploit time-varying signals that allow them to generate information that is not available to any single robot (e.g. information related to how many robots are located in a target area) and that serve different functions.

The analysis of the evolutionary dynamics suggests that new individual capabilities might represent a crucial pre-requisite for the development of new communication capabilities and vice versa. For example, the individual ability to explore the environment by entering and remaining in target areas represents a crucial pre-requisite for the development of an ability to produce signal B, which attracts other robots toward the target area.
On the other hand, the emergence of a social/communicative ability to avoid target areas that contain two robots and to exit from areas that contain more than two robots represents a crucial pre-requisite for the development of better individual exploration strategies. In fact, as we showed in section 6, very effective exploration strategies provide an adaptive advantage only in combination with effective communication systems that allow robots to avoid situations in which more than two robots are located in the same target area. This process, in which progress in individual abilities might lay the basis for progress in communication abilities and vice versa, might lead to an open-ended evolutionary phase in which individuals tend to develop progressively more complex and effective strategies.

Acknowledgments

The research has been supported by the ECAGENTS project funded by the Future and Emerging Technologies programme (IST-FET) of the European Community under EU R&D contract IST-1940.

References

Baldassarre G., Nolfi S. & Parisi D. (2003). Evolving mobile robots able to display collective behaviour. Artificial Life, 9: 255-267.
Cangelosi A. & Parisi D. (1998). The emergence of a ‘language’ in an evolving population of neural networks. Connection Science, 10: 83-97.
Di Paolo E.A. (2000). Behavioural coordination, structural congruence and entrainment in a simulation of acoustically coupled agents. Adaptive Behaviour, 8(1): 25-46.
Kirby S. (2002). Natural Language from Artificial Life. Artificial Life, 8(2): 185-215.
Iizuka H. & Ikegami T. (2003a). Adaptive Coupling and Intersubjectivity in Simulated Turn-Taking Behaviours. In Banzhaf et al. (Eds.), Proceedings of ECAL 03, Dortmund: Springer Verlag.
Iizuka H. & Ikegami T. (2003b). Simulating Turn-taking Behaviors with Coupled Dynamical Recognizers. In R.K. Standish, M.A. Bedau and H.A. Abbass (Eds.), Proceedings of Artificial Life VIII, Cambridge, MA: MIT Press.
Iizuka H.
& Ikegami T. (2004). Simulating autonomous coupling in discrimination of light frequencies. Connection Science. 16(4): 283-299. Marocco D., Cangelosi A. & Nolfi S. (2003), The emergence of communication in evolutionary robots. Philosophical Transactions of the Royal Society London - A, 361: 2397-2421. Nolfi S. (2002). Evolving robots able to self-localize in the environment: The importance of viewing cognition as the result of processes occurring at different time scales. Connection Science (14) 3:231-244. Nolfi S. (2005). Emergence of Communication in Embodied Agents: Co-Adapting Communicative and Non-Communicative Behaviours. Connection Science. (17) 3-4:231- 248. Nolfi S. & Marocco D. (2001). Evolving robots able to integrate sensory-motor information over time, Theory in Biosciences, 120:287-310. Quinn M. (2000). Evolving cooperative homogeneous multi-robot teams. In Proceedings of the IEEE / RSJ International Conference on Intelligent Robots and Systems (IROS 2000). IEEE Press. Quinn M. (2001). Evolving communication without dedicated communication channels. In Kelemen, J. and Sosik, P. (Eds.) Advances in Artificial Life: Sixth European Conference on Artificial Life (ECAL 2001). Springer Verlag. Quinn M., Smith L., Mayley G. & Husbands P. (2003). Evolving controllers for a homogeneous system of physical robots: Structured cooperation with minimal sensors. Philosophical Transactions of the Royal Society of London, Series A: Mathematical, Physical and Engineering Sciences 361, pp. 2321-2344. Steels L. (1999). The Talking Heads Experiment, Antwerpen, Laboratorium. Limited Pre- edition. Steels L. (2003) Evolving grounded communication for robots. Trends in Cognitive Science. 7(7): 308-312. Steels L. and Kaplan F. (2001). AIBO's first words: The social learning of language and meaning. Evolution of Communication, 4:3-32. 23 Steels L. & Vogt P. (1997) Grounding adaptive language games in robotic agents. In: P. Husband & I. 
Harvey (Eds.), Proceedings of the 4th European Conference on Artificial Life. Cambridge MA: MIT Press. Werner, G.M. & Dyer M.G. (1991). Evolution of communication in artificial organisms. In Langton, C. G., Taylor, C., Farmer, J. D., and Rasmussen, S. (Eds.) Proceedings of the Workshop on Artificial Life. pages: 659-687. Reading, MA, Addison-Wesley. Wagner K., Reggia J.A., Uriagereka J., Wilkinson G.S. (2003). Progress in the simulation of emergent communication and language. Adaptive Behavior, 11(1):37-69. 24 Introduction Related literature Experimental set-up and emergence of communication The evolved communication system: signals produced and their Experiment I – Reactive robots Experiment II – Non-Reactive robots The evolved communication system: communication modalities Relation between individual and social/communicative behavio Discussion