title: Dynamic Difficulty Adjustment in Virtual Reality Exergames through Experience-driven Procedural Content Generation
authors: Huber, Tobias; Mertes, Silvan; Rangelova, Stanislava; Flutura, Simon; André, Elisabeth
date: 2021-08-19
DOI: 10.1109/ssci50451.2021.9660086

Abstract—Virtual Reality (VR) games that feature physical activities have been shown to increase players' motivation to do physical exercise. However, for such exercises to have a positive healthcare effect, they have to be repeated several times a week. To maintain player motivation over longer periods of time, games often employ Dynamic Difficulty Adjustment (DDA) to adapt the game's challenge to the player's capabilities. For exercise games, this is mostly done by tuning specific in-game parameters such as the speed of objects. In this work, we propose to use experience-driven Procedural Content Generation for DDA in VR exercise games by procedurally generating levels that match the player's current capabilities. Creating completely new levels, rather than only fine-tuning specific parameters, has the potential to reduce repetition over longer time periods and allows the cognitive and physical challenge of the exergame to be adapted simultaneously. As a proof of concept, we implement an initial prototype in which the player must traverse a maze that includes several exercise rooms, where the maze is generated by a neural network. Passing these exercise rooms requires the player to perform physical activities. To match the player's capabilities, we use Deep Reinforcement Learning to adjust the structure of the maze and to decide which exercise rooms to include in the maze. We evaluate our prototype in an exploratory user study utilizing both biodata and subjective questionnaires.

I. INTRODUCTION

Working and leisure conditions in today's society are shifting further from physical exertion to purely digital activities. Especially with the ongoing COVID-19 pandemic and the associated restrictions on physical leisure activities, people do not achieve the recommended levels of physical activity. The back in particular is at risk of postural damage due to monotonous and sedentary work. One way to tackle this problem is to use Virtual Reality (VR) games that encourage physical exercise. The use of VR exercise games (often called exergames) has been shown to increase the motivation to do physical activity for workers in sedentary occupations [1], [2]. In order for exercises to show a positive effect on the users' health, they should be done multiple times per week [3]. However, periodically repeating the same tasks often gets boring. To keep players motivated over a longer period of time, we propose to combine two methods commonly used to tackle problems of repetition and boredom: Procedural Content Generation (PCG) and Dynamic Difficulty Adjustment (DDA). DDA keeps players motivated by matching the game's challenge to the players' skill level. When done correctly, this allows players to enter a state of flow between anxiety and boredom [4], [5]. For exergames, DDA has the additional benefit of adjusting the difficulty of the exercises such that they provide efficient physical training without overburdening the player [6]. Incorporating PCG into exergames to create visually different levels has been shown to reduce repetition between different play sessions [7].
In this work, we present a prototype of a VR exergame that utilizes PCG for DDA by creating game levels whose difficulty matches the player's capabilities. In our prototype, the player has to traverse a maze that includes exercise rooms, which have to be completed before the player can pass through them. As an exemplary use case, we select several exercises to prevent lower back pain. The mazes are procedurally generated by a neural network that is trained with a Deep Reinforcement Learning (DRL) algorithm in order to adapt the difficulty of the generated mazes to the player. An example of such a maze is shown in Fig. 1. The difficulty of the maze is mainly influenced by two factors: 1) the physical exertion in the exercise rooms and 2) the complexity of the maze. More complex mazes are more difficult to traverse, and players might have to repeat an exercise room several times. Thus, adjusting the structure of the maze and the difficulty of the exercise rooms contained within it allows for implicit control of the physical and cognitive effort that the user has to put in. Such a combination of physical and cognitive effort is often used in commercial exergames, such as Beat Saber (https://www.beatsaber.com/), one of the best-selling exergames. In Beat Saber, the player must physically hit an incoming sequence of virtual blocks. The difficulty is determined by the complexity of the sequence (cognitive effort) and the time in which the user must hit the blocks (physical effort). This combination prevents Beat Saber from becoming boring, as players can choose sequences that suit their cognitive and physical abilities. However, Beat Saber only uses predefined sequences and does not dynamically adjust the difficulty. The fact that the required physical effort and the complexity of the generated levels both have to be adapted to the player's capabilities makes exergames more challenging for DDA than traditional games. If the current level is too complex but requires a fitting amount of physical effort, then the DDA algorithm should only adjust the complexity but not the physical challenge, and vice versa. As far as we are aware, our prototype is the first approach to explore the use of PCG for DDA in exergames, allowing for simultaneous adaptation of the required cognitive and physical effort. Furthermore, it is the first prototype to show the feasibility of RL for DDA in a VR-based exergame.

II. RELATED WORK

a) VR Exergames increase physical activity: Using immersive VR games to motivate players to exercise has been explored since 2011, when Finkelstein et al. [8] introduced the game Astrojumper. Here, players had to avoid incoming asteroids by moving their bodies. While Astrojumper utilized a three-wall stereoscopic projected display, Charoensook et al. [9] showed that VR exergames using a Head-Mounted Display (HMD) can also increase players' heart rates during play sessions. More recently, Yoo et al. [1] allowed workers in a sedentary workplace to play a range of commercial VR games during work breaks over the duration of eight weeks and measured their exertion through questionnaires and a wearable heart rate monitor. Their results show that the VR games motivated the workers to be physically active over a longer period of time.

b) Dynamic Difficulty Adjustment (DDA): To keep players engaged over several months, it is important that the games do not become boring when the players eventually get used to them.
On the flip side, new players should not be overwhelmed by the challenges of the game. When the capabilities of the player and the challenges of the game are properly balanced, players enter a flow state, feeling a deep sense of enjoyment [4], [5]. For this reason, most single-player games continuously increase the difficulty during the course of the game. Such a predetermined difficulty increase, however, can never perfectly fit the different learning speeds of all players. This is especially true when the difficulty is linked to physical exercises. DDA tries to tackle this problem by adjusting the difficulty during gameplay based on the player's capabilities [10], [11]. A common method for DDA is to apply Reinforcement Learning (RL). The basic idea of RL is that an agent interacts with an environment in order to maximize the accumulated reward given by a reward function. By incorporating the player's performance into the action selection [12] or the reward function [13], RL can be used to train AI opponents that play on the same level as the player. Instead of training non-player characters, other approaches use RL to fine-tune specific in-game parameters, such as the speed and size of objects, that directly influence the difficulty of the player's task [14].

c) Procedural Content Generation (PCG): In games, PCG refers to the autonomous generation of game content through algorithmic means [15]. This is often used to increase replay value by creating vast amounts of different content without the need for more and more human designers and artists. Such PCG systems in commercial games mostly do not take player behavior into account [15]. For exergames, Pezzera et al. [7] showed that using PCG to create visually different levels without adapting to the player can already reduce repetition and therefore increase player motivation. This work focuses on experience-driven PCG systems that are used for DDA by procedurally generating game levels that match the player's capabilities [16]. Similar to our approach, Shaker et al. [17] train a neural network to generate levels for a platform game based on the player's performance. Others utilize Bayesian optimization [18] and evolutionary algorithms [19] to create fitting levels procedurally. For mazes in particular, van der Linden et al. [20] propose graph grammars to allow game designers to generate mazes of a specific difficulty level. In contrast to our approach, the aforementioned methods did not create content for exergames. The difficulty of their levels was mainly based on the complexity of the levels and did not have to adapt to the physical exertion of the player. Balancing the cognitive and physical difficulty presents an additional challenge for a combination of PCG and DDA in exergames.

d) DDA in Exergames: Despite the promising results of DDA for motivating players, there are only a few VR exergames that use DDA, and so far those games only tune a small number of in-game parameters based on heuristics [6], [8]. In this work, we propose to use a combination of RL and procedural generation for DDA in VR exergames, based on the promising results of those approaches in the DDA literature. For non-VR exergames, DDA is mostly used for rehabilitation games, where patients with varying degrees of impairment have very different requirements for the exergame and where the patients' capabilities might change drastically during recovery. Besides increasing physical activity, rehabilitation is one of the main applications for exergames [2].
Many rehabilitation exergames focus on post-stroke rehabilitation [14], [21]-[24]. Here, regular exercise can help to improve mobility in affected body parts. However, other medical conditions, such as Parkinson's disease, have also been explored [2], [25], [26]. The DDA approaches in many of those rehabilitation exergames adjust a small number of specific parameters in the game (e.g., the speed of in-game objects) according to heuristics based on the player's performance [21]-[24], [26]. Other rehabilitation exergames adapted similar in-game parameters through fuzzy systems [26], evolutionary algorithms [27], and RL agents [14]. Instead of adjusting specific in-game parameters, this work presents the first prototype of an exergame that uses DDA to procedurally generate in-game levels that match the player's capabilities. By doing so, we open up the possibility of adjusting the required physical and cognitive effort simultaneously.

To show the feasibility of a DDA system that procedurally generates game levels adapting to the capabilities of the player in a VR exergame, we created a first prototype. In this prototype, the player has to traverse a procedurally generated maze that includes several exercise rooms (see Fig. 1). The cognitive challenge of the maze comes from the fact that players cannot see over the walls of the maze. Therefore, they must explore different paths in the maze and remember which paths they have already chosen. The physical challenge is given by the exercise rooms. To pass through these rooms, the players must perform physical activities that are designed to prevent back pain (see section III-A). At the end of the maze, the player rates how difficult and exhausting the maze was on a combined 5-point Likert scale (1-not at all difficult to 5-extremely difficult). Based on this rating, the difficulty of the next maze is adjusted (see section III-B). In this way, the difficulty of the generated mazes adjusts according to the player's training progress. If the current maze was too easy, the next maze will be harder, and if the current maze was too challenging, the next one will be easier.

As an exemplary use case, we chose exercises to prevent back pain. Since we want our prototype to be usable with standard commercial VR setups (we used the HTC VIVE Pro), there are two main restrictions on the physical exercises we can use in the game. First, since it is very hard to lie down or get up while wearing an HMD and holding controllers in each hand, exercises that involve lying down are not feasible. Second, the exercises should not depend on specific foot movements, since those cannot be tracked with a basic VR setup. Based on feedback from colleagues at the Institute of Sport Science at the University of Augsburg, we implemented the following three exercises as described in [28]:
• Upper body rotation (Fig. 2a). Participants must hold on to two in-game bars at shoulder height and move them left and right by rotating the upper body in a smooth motion.
• Forward torso bend (Fig. 2b). Players have to bend their torso forward, with legs extended and upper body as straight as possible, until they are able to grab an in-game bar lying on the ground. Subsequently, they have to slowly straighten their body and stretch upwards, moving the in-game bar slightly behind their back.
• Bending and stretching with torso rotation (Fig. 2c).
From a hip-width stance, the players have to bend their legs and upper body as far as possible and turn the upper body to the left in order to pick up an in-game item. Then they have to stretch the upper body upwards and turn to the right in order to place the item on a platform. The exercise is repeated with alternating starting sides.

To verify that the exercises are done correctly, the game tracks the position of the controllers in the player's hands and checks whether they follow a predefined path describing the correct motion. We created variants of the rooms with different levels of exertion by varying the number of required repetitions of the physical exercise.

In order to enhance the level generation with the ability to adapt the difficulty of consecutively generated mazes to the individual player, we designed an adaptive level generation system. The goal of that system is to create different maze structures that fit the player's needs with respect to cognitive and physical effort. Thus, two major factors are responsible for a suitable maze structure:
• The structure of the corridors has to be such that the player neither feels mentally overwhelmed by its complexity nor bored by structures that are too simple.
• The player should have to exert a reasonable degree of physical effort. Thus, exercise rooms have to appear with an appropriate frequency and difficulty while the player traverses the maze.

Fig. 3. Exemplary sequence of our maze generation process. By iteratively placing connections to new rooms, a maze structure is created.

RL has proven its ability to adapt the difficulty of games. However, traditional RL algorithms suffer from exploding state spaces when dealing with complex relations like the structure of a maze. Since deep learning is better suited to modeling complex relations, we decided to build the maze generation system based on DRL. Since the training effort of most established DRL algorithms grows substantially with high-dimensional input state spaces, we do not train the algorithm to create mazes from scratch. Instead, we use a procedural approach similar to the PCGRL approach recently proposed in [29]. We designed a fixed grid of different exercise rooms, where each room consists of an exercise with a predefined difficulty level (see section III-A). Fig. 4 illustrates the final room grid that we used in our experiments. As can be seen, the positions of the rooms were chosen such that rooms with the same exercise, albeit with different difficulties, are not placed in close vicinity to each other. Further, to prevent the algorithm from adding two rooms in the same step, the exercise rooms were placed such that a single interconnection between any two rooms never results in a third room being crossed. In order to create a maze structure, we use DRL to connect the exercise rooms with corridors. To this end, the DRL algorithm successively connects two rooms, whereby different sequences of connected rooms result in different interconnection structures. Further, many room connection sequences result in crossings between the generated corridors, leading to a variety of different maze instances. By learning the sequence of the room connections, the DRL algorithm is equipped with the ability to implicitly generate mazes of different difficulty. See Fig. 3 for a simplified visualization of our generation process. To allow the DRL algorithm to solve the learning problem stated above, we modeled it as a Markov decision process.
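To make the iterative construction loop more concrete, the following minimal Python sketch shows how connection actions could successively extend a maze on a fixed room grid, with overlapping corridors turning into crossings. It is purely illustrative and not the authors' implementation; the class and function names (MazeBuilder, step), the L-shaped corridors, and the grid encoding are all assumptions.

```python
import numpy as np

EMPTY, CORRIDOR, CROSSING, ROOM = 0, 1, 2, 3  # hypothetical cell encoding

class MazeBuilder:
    """Iteratively connects exercise rooms on a fixed grid with L-shaped corridors."""

    def __init__(self, size, room_positions, start, end):
        self.grid = np.full((size, size), EMPTY, dtype=np.int8)
        self.rooms = room_positions          # room id -> (row, col) on the fixed grid
        self.connected = {start}             # rooms already reachable from the start
        self.end = end
        for (r, c) in room_positions.values():
            self.grid[r, c] = ROOM

    def _draw(self, a, b):
        """Draw an L-shaped corridor between two room cells, marking crossings."""
        (r1, c1), (r2, c2) = a, b
        cells = [(r1, c) for c in range(min(c1, c2), max(c1, c2) + 1)]
        cells += [(r, c2) for r in range(min(r1, r2), max(r1, r2) + 1)]
        for r, c in cells:
            if self.grid[r, c] == CORRIDOR:
                self.grid[r, c] = CROSSING   # overlapping corridors form a crossing
            elif self.grid[r, c] == EMPTY:
                self.grid[r, c] = CORRIDOR

    def step(self, room_id):
        """One generation step: connect `room_id` to an already connected room.
        Connecting the end room terminates the episode."""
        anchor = next(iter(self.connected))  # simplistic choice of attachment point
        self._draw(self.rooms[anchor], self.rooms[room_id])
        self.connected.add(room_id)
        done = (room_id == self.end)
        return self.grid.copy(), done

# usage: build a tiny maze with two exercise rooms and an end room
rooms = {"start": (0, 0), "rotation_easy": (2, 4), "bend_hard": (5, 1), "end": (6, 6)}
builder = MazeBuilder(size=8, room_positions=rooms, start="start", end="end")
for action in ["rotation_easy", "bend_hard", "end"]:
    grid, done = builder.step(action)
print(grid)
```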
Since the goal of the DRL algorithm is to build a maze that fits the desired difficulty level, we use the player's difficulty rating at the end of each maze for the reward. For every generated maze, the difference between the desired difficulty level (in our case 3) and the actual player rating is given as a negative reward to the DRL algorithm. As mentioned above, the DRL algorithm works iteratively, i.e., in every step, either a new interconnection between exercise rooms is made, or the final interconnection to the end room is made. Thus, each action in the action space A of the DRL algorithm corresponds to one of these possible connections: adding an interconnection to a not-yet-connected exercise room or placing the final interconnection to the end room. Note that this implies that not every room is necessarily incorporated in the final maze. This is a key factor in being able to adapt to the player's physical needs, as a reduced number of exercise rooms results in less physical effort. Since every action generates a new interconnection, intermediate mazes are created. Thus, in every step, the maze that has evolved up to that point is used as part of the input state for the DRL algorithm. All in all, the state space was composed of the following components:
• Intermediate maze of the preceding step. The maze generated by the previous step is encoded into a 2-dimensional grid map. Exercise rooms, as well as corridors and crossings, are mapped to predefined numerical values and given to the network as a 2-dimensional array, allowing the model architecture to make use of spatial information.
• Maze difficulty of the preceding step. As the goal of the algorithm is to build mazes with a certain difficulty, we found that explicitly feeding in the difficulty of the maze generated in the preceding step enhanced the performance of the model. The difficulty is assessed by running a user simulation on the intermediate maze, which will be explained in more detail later.
• Number of crossings. As the number of crossings that occur in a maze is one of the key factors determining the difficulty, we decided to directly feed the number of crossings of the previously generated maze into the network.
• Occupied exercise rooms. The 2-dimensional grid of the intermediate maze only implicitly contains the information about which exercise rooms are already occupied. Thus, we include the occupied rooms, encoded as one-hot vectors, in the state space to ease training.

The 2D representation of the intermediate maze of the preceding step is fed into a block of convolutional layers. The resulting output is concatenated with all other components of the input and fed into a succeeding block of fully connected layers. The network is trained using the Deep Q-Learning algorithm as proposed by Mnih et al. [30]. One crucial factor for the success of a DRL approach is the amount of training data. As human players need a certain amount of time to traverse each maze, it is not feasible to train the network solely with training data produced by real human players. Thus, we decided to pretrain our network on a user simulation that estimates the difficulty of a given maze. This simulation is also used to estimate the difficulty of the intermediate mazes contained in the input states. The user simulation is implemented to replicate the user's behavior as realistically as possible. Thus, the simulation consists of an agent that has to find its way through a maze, whereby exercise rooms that are traversed add to the physical effort demanded by the maze. When the agent reaches a crossing for the first time, the simulation randomly decides which path to take.
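The described Q-network (a convolutional block over the 2D maze grid, concatenated with the auxiliary features, followed by fully connected layers) and the rating-based reward could be sketched as follows in PyTorch. This is a hedged illustration, not the authors' code: the layer counts and sizes, the grid resolution, the feature dimensions, and the names MazeQNetwork and reward are assumptions.

```python
import torch
import torch.nn as nn

class MazeQNetwork(nn.Module):
    """DQN for maze generation: conv block over the maze grid, concatenated with
    auxiliary features (estimated difficulty, number of crossings, one-hot vector
    of occupied exercise rooms), followed by fully connected layers.
    All hyperparameters below are illustrative, not the paper's actual values."""

    def __init__(self, grid_size=8, n_aux_features=18, n_actions=17):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        conv_out = 32 * grid_size * grid_size
        self.head = nn.Sequential(
            nn.Linear(conv_out + n_aux_features, 128), nn.ReLU(),
            nn.Linear(128, n_actions),  # one Q-value per possible connection action
        )

    def forward(self, grid, aux):
        # grid: (batch, 1, H, W) numeric encoding of the intermediate maze
        # aux:  (batch, n_aux_features) difficulty, crossings, occupied-room one-hots
        x = self.conv(grid)
        return self.head(torch.cat([x, aux], dim=1))


def reward(player_rating, desired_rating=3):
    """Negative distance between the desired and the actual difficulty rating."""
    return -abs(desired_rating - player_rating)


# usage
net = MazeQNetwork()
q_values = net(torch.zeros(1, 1, 8, 8), torch.zeros(1, 18))
print(q_values.shape, reward(player_rating=5))  # torch.Size([1, 17]) -2
```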
This approximation was chosen because players never see the maze from above and therefore have to choose paths randomly at the beginning of the game until they have explored more of the maze. However, as real users will probably be less likely to take the same path twice, the simulation agent remembers which way it has chosen at a certain crossing, and the probability of taking that path again when repeatedly passing the respective crossing is decreased. To approximate the users' physical effort, we assigned an effort level to each exercise room, modeling the physical effort that has to be invested when passing the room. For the final effort estimation of the whole maze, the effort levels of all passed exercise rooms are summed up until the end room is reached.

As an initial proof of concept, we conducted an exploratory human user study to verify whether our prototype is able to adjust the difficulty of the second of two consecutive mazes based on user feedback for the first maze.

A. Experiment Design

a) Research Questions: For this study we had two main research questions: 1) Is our approach able to adjust the difficulty for users who did not like the difficulty of the first maze (e.g., lower the difficulty for someone who found the first maze too hard and increase it for users who found it too easy)? 2) Can our approach sustain the difficulty for users who were satisfied with the difficulty of the first maze (e.g., rated the difficulty with 3 out of 5)? For both questions, a key challenge is that the approach should not adjust the maze's complexity or the required physical exertion individually but keep the balance between those two aspects.

b) Methodology: For a subjective evaluation of the difficulty, we recorded the participants' in-game rating of the difficulty at the end of each maze, which is described at the beginning of section III and which is also used as input for our DDA algorithm. In addition, after they finished playing, we asked the participants how complex and how exerting each maze was. For complexity, we used a 5-point Likert scale (1-"not at all" to 5-"extremely"). For exertion, we used the Borg RPE (Rating of Perceived Exertion) scale [31], which measures perceived exertion on a range from 6 ("No exertion at all") to 20 ("Maximal exertion"). Furthermore, we recorded an electrocardiographic (ECG) signal to objectively measure the participants' exertion level during the VR session. The ECG sensor was a 1-lead sensor attached to the right side of the participants' upper body. The sensor was connected to an 8-channel wireless hub with eight generic inputs and one ground. The operational sample rate was 1 kHz (i.e., 1,000 samples were recorded every second). The heart rate (HR) of the participants was calculated from the raw ECG signal using the Python library BioSPPy. Before calculating the HR, the ECG signal was filtered with a Finite Impulse Response (FIR) band-pass filter between 3 and 45 Hz. In order to measure the participants' flow and their general satisfaction with our prototype, we used the Game Experience Questionnaire (GEQ) [32]. The core module of this questionnaire was recently empirically evaluated by Law et al. [33] and Johnson et al. [34]. Following their suggestions, we only used the categories Competence, Immersion, Flow, and Positive Affect and excluded the "It was aesthetically pleasing" question from Immersion.
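As a rough illustration of the user simulation described earlier in this section, the following sketch walks a graph representation of a maze with an agent that lowers the probability of repeating an already chosen path and sums up the effort of the exercise rooms it passes. The graph encoding, the decay factor, and the effort values are assumptions for illustration, not the authors' parameters.

```python
import random

def simulate_player(maze, start, end, room_effort, decay=0.5, seed=None):
    """Estimate the physical effort of a maze with a forgetful random-walk agent.

    maze:        dict mapping a node to the list of neighbouring nodes
    room_effort: dict mapping exercise-room nodes to an effort value
    decay:       factor by which the probability of re-taking an already chosen
                 path at a crossing is reduced (assumed value)
    Returns the summed effort of all exercise rooms passed until the end node.
    """
    rng = random.Random(seed)
    weights = {}                              # (node, neighbour) -> sampling weight
    node, effort, steps = start, 0, 0
    while node != end and steps < 1000:       # step cap to guarantee termination
        neighbours = maze[node]
        w = [weights.get((node, n), 1.0) for n in neighbours]
        nxt = rng.choices(neighbours, weights=w, k=1)[0]
        weights[(node, nxt)] = weights.get((node, nxt), 1.0) * decay
        effort += room_effort.get(nxt, 0)     # add effort when an exercise room is entered
        node = nxt
        steps += 1
    return effort

# usage: a toy maze with one easy and one hard exercise room
maze = {"start": ["A"], "A": ["start", "easy", "hard"],
        "easy": ["A", "end"], "hard": ["A", "end"], "end": []}
print(simulate_player(maze, "start", "end", {"easy": 1, "hard": 3}, seed=0))
```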
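The heart rate computation can be reproduced with BioSPPy's standard ECG pipeline, which by default applies an FIR band-pass filter between 3 and 45 Hz before detecting R-peaks, matching the processing described above. The sketch below assumes a hypothetical file name and that the recording is stored as a single column of raw samples.

```python
import numpy as np
from biosppy.signals import ecg

# Load the raw 1-lead ECG recorded at 1 kHz (file name and format are assumptions).
raw = np.loadtxt("participant_01_ecg.csv", delimiter=",")

# BioSPPy's standard ECG pipeline: FIR band-pass filtering (3-45 Hz by default),
# R-peak detection, and instantaneous heart rate computation.
out = ecg.ecg(signal=raw, sampling_rate=1000.0, show=False)

mean_hr = float(np.mean(out["heart_rate"]))  # mean heart rate in beats per minute
print(f"mean HR: {mean_hr:.1f} bpm")
```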
c) Procedure: Before starting the experiment, the participants had to sign a consent form, followed by a short introduction to the setup and the procedure. After that, the participants filled out a pre-questionnaire containing sociodemographic questions. The pre-questionnaire additionally included two items about their previous experience with gaming ("I play games daily") and VR ("I have experience with VR"), measured on a 5-point Likert scale (1-"strongly disagree" to 5-"strongly agree"). Then, they put on the HMD and the ECG sensor was attached to their body. The VR session started with a tutorial level, which explained the controls of the game. During this tutorial, a supervisor answered any questions the participants had about the controls. After the tutorial, the participants were left to play the game without additional help from the supervisor, apart from warnings about hitting objects in the real world and clarifications of the in-game ratings (some people thought a 5 would mean that they liked the maze, which we wanted to avoid). Between the two mazes, there was a short break of approximately two minutes to let the participants' heart rates return to a resting rate. Immediately after completing the second maze, the participants filled out a post-questionnaire that consisted of the items regarding the complexity and exertion of each maze and the GEQ.

a) Participants: In order to test our proof of concept, we recruited 19 students (5 female, 14 male) with a mean age of 26.84 (SD=4.55). Most participants had either a bachelor's or a master's degree. Four only had a high school degree, and one already possessed a doctoral degree. On average, the participants reported a neutral gaming frequency (M = 2.79) and an above-average VR experience (M = 3.68). Only four of the participants had never used HMDs before.

b) Research Question 1: For our first research question, we looked at the participants who were unsatisfied with the first maze. In total, the first maze was too easy (rated 1 or 2) for 9 players and too hard (rated 4 or 5) for 4 players. To unify those 13 players, we inverted the complexity and in-game ratings for the players who found it too hard (i.e., mapped 5 to 1 and 4 to 2). The results for this unified adaptation group are shown on the left side of Fig. 5. For the first maze, this adaptation group had an average in-game rating of 1.69 (SD=0.46), and for the second maze 2.46 (SD=1.15). The subjective complexity rating went from a mean of 2.69 (SD=0.61) to 3.0 (SD=0.96). Since the Borg RPE scale is more nuanced, we did not unify it. The exertion rating for participants who rated the first maze as too easy went from a mean of 10.0 (SD=1.75) to 10.44 (SD=2.15). For participants who found the first maze too hard, the exertion rating went from 13.0 (SD=1.41) to 11.0 (SD=2.0). The ECG signal results, namely the HR measured in beats per minute (bpm), are shown in Fig. 6.

Fig. 5. The subjective ratings of the participants who were not satisfied with the first maze (left) and participants who were satisfied (right). To unify the values, we linearly mapped the Borg scale (6-20) to a 5-point Likert scale (1-5). Then we inverted the values for participants who rated the first maze as too difficult, such that an upwards trend indicates correct adaptation. The error bars show the 95% CI.

Fig. 6. The mean HR measured in beats per minute for participants who found the first maze too easy, were satisfied with it, or found it too difficult. Error bars show the 95% Confidence Interval (CI).
Since HR is a continuous variable and has no specific desired middle value, we did not combine the participants for this value. The mean HR of participants who rated the first maze as too easy increased from 89.96 bpm (SD=15.05) to 93.90 bpm (SD=14.50). For the participants who rated the first maze as too hard, the mean HR decreased from 99.53 bpm (SD=10.07) to 97.81 bpm (SD=9.68). The trends above indicate that our prototype was able to adjust the difficulty for participants who were unsatisfied with the first maze.

c) Research Question 2: For our second research question, we only looked at participants who were satisfied with the first maze and gave it an in-game rating of 3. The subjective results of this sustain group are shown on the right side of Fig. 5. Here, the mean in-game rating for the second maze was 2.0 (SD=0.58). The subjectively reported exertion level decreased from a mean of 10.0 (SD=3.25) to 8.67 (SD=1.86), and the reported complexity decreased from 3.0 (SD=0.0) to 1.67 (SD=0.75). The mean HR rose from 92.66 bpm (SD=12.39) to 94.40 bpm (SD=13.71) between the two mazes (Fig. 6, middle). Here, the results are conflicting. The subjective ratings indicate a decrease in difficulty, while the HR suggests a slight increase in exertion.

d) Game Experience: The results of the GEQ are shown in Fig. 7. Noticeably, the flow value is above average with a mean of 3.42 (SD=0.71).

The results of our exploratory user study indicate that there is potential in procedurally generating levels with a fitting difficulty for DDA in VR exergames. Our prototype was able to adapt both the cognitive and the physical difficulty of the second maze according to the needs of participants who were unsatisfied with the first maze. For people who were already satisfied with the first maze, the results are conflicting. The subjective responses indicate that the second maze was easier, while the HR shows slightly increased exertion. Based on the positive results for participants who were unsatisfied with the first maze, we think that a third maze would have adapted positively to these participants again. Across all groups, the heart rate results for the second maze are promising, since they are comparable to the approximately 95 bpm that Charoensook et al. [9] measured for the two most exerting VR games in their study on exertion during VR gameplay. The results of the GEQ (see Fig. 7) show that participants experienced an above-average level of flow even though our prototype received below-average ratings for competence and immersion. This suggests that the DDA algorithm, which was the focus of our interest, helped to keep participants in a state of flow, even though other aspects of the game were not developed to their full potential. While the results of our evaluation are promising, they should only be taken as a proof of concept. The goal of our exploratory study was not to test a finished exergame but to test the feasibility of combining DDA and PCG to create VR exergame levels that match the participants' capabilities. Because of the exploratory nature of our study and the high effort and time investment required of participants, we only obtained a relatively small number of participants. While this is common among exploratory evaluations of DDA in exergames [21], [24], a larger number of participants would be needed to statistically verify the usability of a finished VR exergame that uses our proposed approach.
Furthermore, in order to see the full adaptation capabilities of such a finished exergame, it would need to be evaluated over a longer period of time in which participants use the exergame regularly. Based on the proof of concept in this work, a full study with a more developed exergame could investigate the potential of the approach in more detail.

To inform the future design of adaptive level generation systems for exergames, we want to conclude the discussion with some lessons we learned during the user study with our prototype.

a) Adapt the simulation: The user simulation in our prototype is static, since it is only used to pretrain the model and to get an estimate of the difficulty of the intermediate mazes. The final adaptation to the player is left to the DRL algorithm that generates the maze. In retrospect, we think that adapting the simulation to better reflect the player, similar to [19], might speed up the adaptation. In particular, we think that this might have prevented the perceived drop in difficulty for participants who were satisfied with the first maze, since it could account for players getting used to the game.

b) Keep levels short: We tried to create short mazes, with fewer than eight rooms in most mazes. However, physically doing the exercises took longer than we anticipated. The participants in our study spent up to 10 minutes in each maze. This drastically reduces the number of learning steps that can be done in a given amount of time and therefore slows down adaptation. In the future, we will aim to create even shorter levels. For instance, we would like to explore other methods for creating variants of our exercise rooms with different difficulties, instead of only increasing the number of repetitions.

In this paper, we presented a first prototype of how procedural generation can be incorporated into the DDA system of a VR exergame using deep reinforcement learning. The results of our exploratory user study are promising and indicate that the generated levels can indeed adapt both the required physical and cognitive effort according to the player's capabilities. However, our particular prototype showed some problems that could guide the design of future systems. Two main problems were that the generated levels took too long to allow for fast adaptation and that we used a static user simulation. Based on our results, we are confident that procedural generation of levels with appropriate difficulty, in addition to traditional in-game parameter adjustment, can improve future DDA systems for exergames.
REFERENCES

Embedding a VR game studio in a sedentary workplace: Use, experience and exercise benefits
Virtual reality-based exercise with exergames as medicine in different contexts: A short review
Physical activity and public health: updated recommendation for adults from the American College of Sports Medicine and the American Heart Association
Flow: The psychology of optimal experience
Toward a psychology of optimal experience
Designing a personalized VR exergame
Approaches for increasing patient's engagement and motivation in exergames-based autonomous telerehabilitation
Astrojumper: Motivating exercise with an immersive virtual reality exergame
Heart rate and breathing variability for virtual reality game play
The case for dynamic difficulty adjustment in games
Dynamic difficulty adjustment (DDA) in computer games: A review
Challenge-sensitive action selection: an application to game balancing
Go with the flow: Reinforcement learning in turn-based battle video games
MPRL: multiple-periodic reinforcement learning for difficulty adjustment in rehabilitation games
Polymorph: dynamic difficulty adjustment through level generation
Towards automatic personalized content generation for platform games
Finding game levels with the right difficulty in a few trials through intelligent trial-and-error
Dynamic difficulty adjustment in 2D platformers through agent-based procedural level generation
Designing procedurally generated levels
Optimising engagement for stroke rehabilitation using serious games
Designing engaging, playable games for rehabilitation
Towards customizable games for stroke rehabilitation
Therapeutic games' difficulty adaptation: An approach based on player's ability and motivation
Adaptive difficulty in exergames for Parkinson's disease patients
Dynamic difficulty adjustment in exergames for rehabilitation: a mixed approach
Dynamic difficulty adjustment with evolutionary algorithm in games for rehabilitation robotics
Rückenfitness: Grundlagen, Übungen, Spiele. Limpert
PCGRL: Procedural content generation via reinforcement learning
Human-level control through deep reinforcement learning
A comparison between three rating scales for perceived exertion and two different work tests
The Game Experience Questionnaire
Systematic review and validation of the game experience questionnaire (GEQ) - implications for citation and reporting practice
Validation of two game experience scales: The player experience of need satisfaction (PENS) and game experience questionnaire (GEQ)

ACKNOWLEDGMENT

This work presents and discusses results in the context of the research project ForDigitHealth. The project is part of the Bavarian Research Association on Healthy Use of Digital Technologies and Media (ForDigitHealth), funded by the Bavarian Ministry of Science and Arts. We thank our students Peter Fefelow, Nikolai Glaab, David Makowski, Luitpold Reiser, Alexander Renk, Rusmin Spahic, Sebastian Spolwind, Leon Wöhrl and Dennis Zürn for helping us to implement the prototype. We thank Patrick Dohle and Stefan Künzell for helping us to design the physical exercises.