A human-like learning control for digital human models in a physics-based virtual environment

Giovanni De Magistris · Alain Micaelli · Paul Evrard · Jonathan Savin

Journal of Visualization and Computer Animation, John Wiley & Sons, 2015, pp. 423-440. doi:10.1007/s00371-014-0939-0. HAL Id: hal-01171441, https://hal.archives-ouvertes.fr/hal-01171441 (submitted 3 Jul 2015).

G. De Magistris, A. Micaelli, P. Evrard: CEA, LIST, LSI, rue de Noetzlin, Gif-sur-Yvette, F-91190 France. E-mail: giovanni.de-magistris@cea.fr
J. Savin: Institut national de recherche et de sécurité (INRS), rue du Morvan, CS 60027, Vandœuvre-lès-Nancy, F-54519 France

Abstract This paper presents a new learning control framework for digital human models in a physics-based virtual environment. The novelty of our controller is that it combines multi-objective control based on human properties (a combined feedforward and feedback controller) with a learning technique based on human learning properties (the human ability to learn novel task dynamics by minimizing instability, error and effort). The controller performs multiple tasks simultaneously (balance, non-sliding contacts, manipulation) in real time, and adapts feedforward force as well as impedance to counter environmental disturbances. It is well suited to unstable manipulations, such as tool-use tasks, and to compensating for perturbations. An interesting property of our controller is that it operates in cartesian space while learning joint stiffness, damping and torque within a multi-objective control framework. The relevance of the proposed control method to modelling human motor adaptation has been demonstrated in various simulations.

Keywords Digital Human Model · Motion Control · Bio-Inspired Motor Control · Virtual Reality

1 Introduction

The digital human model (DHM) technique is rapidly emerging as an enabling technology and a distinct line of research for verifying human factors issues in industry, which is the general purpose of our work.
In order to evaluate the physical (biomechanical) aspects of working conditions, several software packages have been developed to facilitate ergonomic assessment, such as SAMMIE [55], JACK [4], Ergoman [59] and SANTOSHuman [69,68]. Simulations computed with these software packages usually rely on kinematic animation frameworks. Such frameworks use either pre-recorded motions obtained by a tracking system and motion capture, or interactive manual positioning of the DHM body through a mouse, menus and keyboard. In the first case, simulations are realistic, but they require extensive instrumentation of a full-scale mock-up of the future workstation or of a similar existing one. They are extremely time-consuming because of motion capture data processing [6]. Furthermore, their ability to predict complex human postures and movements for various sizes and dimensions in a timely and realistic manner depends strictly on the accuracy of the motion database. In the second case, simulations are clearly subjective (the designer, possibly with no specific skill in ergonomics, chooses a posture or trajectory arbitrarily). Again, they are time-consuming (built up like a cartoon) and usually appear unnatural [13], even though these digital manikins possess semi-automatic controls provided by a set of behaviours such as gazing, reaching, walking and grasping. These issues do not encourage designers to consider alternative scenarios, which would be beneficial for a comprehensive assessment of the future work situation. Moreover, such software packages are subject to numerous limitations: since they are restricted to static models and calculations, they neglect dynamic aspects. Neither do they consider contact forces between the DHM and objects (at best, the designer has to set both contact force magnitude and direction manually and arbitrarily). For these reasons, assessment of biomechanical risk factors based on simulations of industrial or experimental situations may underestimate real stress by up to 40-50% [43].

A challenging aim therefore consists in developing a DHM capable of performing tasks as an artificial human being, with dynamically consistent motions, behaviours and internal characteristics (positions, velocities, accelerations and torques), based on a simple description of the future work task, in order to achieve realistic ergonomic assessments of various work task scenarios at an early stage of the design process.

Fig. 1: Adaptive and learning controller

2 Human behaviours

To achieve this goal, this article presents a multi-objective DHM controller based on human behaviours using simulated physics. In our simulation framework, the entire motion of the human model in the virtual environment is governed by real-world Newtonian physical and mechanical simulation, along with automatic control of applied forces and torques. To develop this controller, we chose to take into account the following important behaviours of human motor control:

1. Spring-like behaviour: Won and Hogan [71] noted that muscle elastic properties and reflexes produce a restoring force towards an undisturbed trajectory when the hand is slightly perturbed, as if a spring connected the hand to the planned trajectory. The mechanical impedance (the strength of these spring-like properties) increases with endpoint force [25] or muscle activation [39], and it is adapted to counter environmental disturbances [48].
This behaviour is implemented in the feedback part of our controller.

2. Anticipatory capabilities: When a multibody system comes into contact with an object, it is important to make the limb more compliant to avoid "contact instability" [31]. An important conclusion, which consistently emerges from the theoretical analysis, is that mechanics requires feedforward control. A number of studies have shown that the nervous system uses internal representations to anticipate the consequences of dynamic interaction forces. In particular, Lackner and Dizio [40] demonstrated that the central nervous system (CNS) is able to predict centripetal and Coriolis forces; Gribble and Ostry [26] demonstrated the compensation of interaction torques during multijoint limb movement. These studies suggest that the nervous system has sophisticated anticipatory capabilities. We therefore need to design accurate internal models of body dynamics and contacts.
Generally, a feedforward control model is based on the anticipatory computation of the forces that will be needed to carry out a desired motion plan, without sensory information. The CNS therefore needs an internal representation, or inverse model, of the human body and environment. This control technique is fast and carries no instability risk, but it has an obvious drawback: sensitivity to unexpected disturbances. Feedforward control cannot compensate for perturbations. If these disturbances can be measured, the movement can be corrected on the fly; this corresponds to the feedback part of our controller.

3. Motion error minimization: Shadmehr and Mussa-Ivaldi [61] demonstrated that, trial after trial, the CNS reduces motion error by compensating for environmental forces and adapting the feedforward control. An illustrative example is Kawato's feedback error learning model [37], based on the cooperation of two control mechanisms: a feedback loop, which operates in an initial training phase, and a feedforward model, which subsequently emerges. In this model, the feedback error is used as the learning signal for the feedforward model, which gradually compensates for any dynamic disturbances and thereby learns an internal model of the body dynamics. This learning control model does not converge in unstable situations [52], whereas the controller described in this paper is better adapted to unstable interaction (see [72] and Sect. 9).

4. Metabolic cost minimization: the CNS optimizes arm impedance to achieve a desired margin of stability while minimizing metabolic cost [10].

Following these human motor control behaviours, we developed a new whole-body controller based on feedforward and feedback mechanisms (Fig. 1), inspired by the human ability to adapt force and impedance to deal with stable or unstable situations and to compensate for perturbations [72,24].

3 Overview of adaptive and learning control

Adaptive and learning controls found in the literature can be grouped into four categories:

1. Classical adaptive
   – Gain scheduling [2]
   – Model Reference Adaptive Control (MRAC) [35]
   – Self-tuning regulator [3]
   – Self-Oscillating Adaptive Systems (SOAS) [33]
2. Periodic adaptive/learning
   – Iterative Learning Control (ILC) [5]
   – Repetitive Control (RC) [41]
   – Run-to-Run control (R2R) [12]
3. Machine learning
   – Reinforcement Learning [9,8]
4. Non-symbolic learning tools
   – Artificial neural networks [32]
   – Fuzzy logic [16]
   – Genetic algorithms [62]

In our study, we wanted to develop an algorithm adapted to unstable interactions, which are inevitable in our context (namely, the verification of human factors in industrial work task design). In particular, our case study dealt with the task of clipping small metal parts onto a plastic instrument panel of a vehicle [19]. In this work task, we observed subjects performing the same task repeatedly. When simulating this task with a DHM, one way to compensate for the repetitive part of the error is to use periodic adaptive/learning control. With this type of controller, a robot performs the same task for numerous iterations, reducing the periodic error at each subsequent trial.

If a task has reproducible dynamics or a fixed environment, impedance control is used to impose a desired dynamic behaviour on the interaction between the robot end-effector and the environment [30,14]. Common impedance control techniques require reproducible dynamics (the target impedance model is fixed). For this reason, they are not suitable when the environment changes (the interaction may become unstable). One possible solution for performing unstable tasks is to increase impedance in order to cope with incorrect forces arising from unknown dynamics. Yet, while higher impedance may increase stability in movement tasks, it may also lead to instability during interactions with a stiff environment.

Common periodic adaptive control learns only force from the feedback error. It is therefore inefficient in unstable situations, because the force differs from trial to trial due to noise or external perturbations [70]. In addition, common ILC algorithms do not achieve the low mechanical impedance needed for safety and energy minimization [28]. The algorithm developed below is better adapted to unstable interactions than common ILC algorithms, because it allows the force to change at each trial [11] and achieves low impedance. Learning the optimal force and impedance appropriate for different tasks can help the robot achieve them with minimum error and the least amount of energy (as humans do [23]).

Fig. 2: DHM with skinning and collision geometry (left). Right-hand model with skinning and collision geometry (right)

4 Digital human model using simulated physics

4.1 Model of human body and dynamics

In our study, the human body is kinematically modelled as a set of articulated rigid bodies (Fig. 2) organized into a redundant tree structure, which is characterized by its degrees of freedom (dof). Each articulation can be modelled as a number of revolute joints, depending on the function of the corresponding human segment. Our DHM therefore comprises 39 joint dof and 6 root dof, with 8 dof for each leg and 7 for each arm. The root is not controlled. For validation purposes, several DHMs have been dimensioned based on each subject's anthropometry [29].
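To make this kinematic structure concrete, here is a minimal Python sketch of the resulting dof budget. Only the totals and the per-leg/per-arm counts are fixed by the text above; the split of the remaining joint dof between spine, neck and head is an illustrative assumption.

    # Degree-of-freedom budget of the DHM described above (illustrative sketch).
    DOF_LAYOUT = {
        "root": 6,               # free-flyer root, not controlled
        "left_leg": 8, "right_leg": 8,
        "left_arm": 7, "right_arm": 7,
        "spine_neck_head": 9,    # assumption: the 39 - 2*8 - 2*7 remaining joint dof
    }
    joint_dof = sum(n for name, n in DOF_LAYOUT.items() if name != "root")
    assert joint_dof == 39 and joint_dof + DOF_LAYOUT["root"] == 45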
The dynamics of the DHM are described by the second-order system:

\[ M\dot{T} + NT + G = L\tau + \sum_j J_{c_j}^T W_{c_j} + \sum_k J_{end_k}^T W^i_{end_k} \tag{1} \]

where M is the generalized inertia matrix; \dot{T} is the acceleration in generalized coordinates; NT gathers the centrifugal and Coriolis forces; G is the gravity force in generalized coordinates; L = [0 \; I]^T is the matrix selecting the actuated degrees of freedom (0 the zero matrix, I the identity matrix); \tau is the set of joint torques; J is the Jacobian matrix; and W = [\Gamma^T \; F^T]^T is the wrench applied by the DHM on the environment, with \Gamma the moment and F the force in cartesian space.

In the notation of this paper, frames are denoted by the following subscripts:

– com: center-of-mass frame
– c: non-sliding contacts at known fixed locations, such as the contact points between the feet and the ground
– end: end-effector frame
– q: joint space
– \rho: \rho-space
– K, B, \tau: the learning rates of stiffness, damping and torque

Moreover, the following superscripts are used:

– min: joint stiffness, damping and feedforward torque required to maintain posture stability and to reduce the systematic deviation caused by the interaction with the environment
– d: desired values
– l: learned torque, stiffness or damping
– ini: initial torque, stiffness or damping
– i: wrench derived from unknown contacts with the environment (interaction wrench)
– ff: feedforward
– fb: feedback
– ob: object

4.2 Contact model

Simulations were based on the XDE physics simulation module developed at CEA-LIST (http://www.kalisteo.fr/lsi/en/aucune/a-propos-de-xde). This module manages the whole physics simulation in real time, including accurate and robust constraint-based methods for contact and collision resolution [45,46]. Friction effects were modelled in compliance with Coulomb's friction law:

\[ \|f_{xy}\| \le \mu\,\|f_z\| \tag{2} \]

with \|f_{xy}\| the tangential contact force, \mu the dry friction factor and \|f_z\| the normal contact force.

4.3 Hand model

The hand model, illustrated in Fig. 2, has 20 dof. To control the joint positions \theta, we use a simple proportional-derivative (PD) controller; the desired joint positions \theta^d are sets of positions corresponding to different preset grasps. A minimal sketch of such a PD law is given below.
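As an illustration of the hand controller of Sect. 4.3, here is a minimal numpy sketch of such a per-joint PD law; the gains kp and kd are illustrative assumptions, not values from the paper.

    import numpy as np

    def hand_pd_torque(theta, theta_dot, theta_d, kp=50.0, kd=2.0):
        # Per-joint PD law: tau = kp * (theta_d - theta) - kd * theta_dot.
        # kp, kd are assumed gains; theta_d is one of the preset grasp postures.
        return kp * (np.asarray(theta_d) - np.asarray(theta)) - kd * np.asarray(theta_dot)

    # Usage: drive the 20 hand joints towards a preset grasp posture.
    tau_hand = hand_pd_torque(np.zeros(20), np.zeros(20), np.full(20, 0.3))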
5 Adaptive controller based on human behaviours

In line with the above analysis of human motion control, we propose a human-like learning controller (Fig. 1) composed of feedforward and feedback controls, both of which are adapted during movements. This controller is inspired by the works of Yang et al. [72] and Ganesh et al. [24]. The proposed controller can deal with both stable and unstable conditions: the learned stiffness, damping and feedforward torque compensate for external perturbations. This behaviour is similar to human adaptation [65].

5.1 Cartesian controller with joint stiffness, damping and torque learning

We describe a cartesian controller that, given a target in cartesian space, learns joint-space parameters. An interesting property of this controller is that, although it is a cartesian controller, its impedance is learned and distributed according to the limbs' dynamics. As demonstrated in [44], to control limb stiffness and stability, the CNS must increase joint stiffness when an external force is applied to the hand. This result is obtained with our controller in Sect. 9.

The desired cartesian-space impedance is:

\[ K_{end} = J^{\dagger T}_{end,\rho}\left(K_\rho - \frac{\partial J^T_{end,\rho}}{\partial\rho}\,W^i_{end}\right) J^{\dagger}_{end,\rho}, \qquad B_{end} = J^{\dagger T}_{end,\rho}\,B_\rho\,J^{\dagger}_{end,\rho} \tag{3} \]

where K is the stiffness matrix, B is the damping matrix, J^{\dagger} = M^{-1}J^T(JM^{-1}J^T)^{-1} is the dynamic pseudoinverse (with J a full-rank matrix), and \rho = Sq, where S = [I \; 0] selects the subset of actuated degrees of freedom that yields a dynamic model independent of the non-sliding contact forces at known fixed locations in Eq. 1, such as the contacts between the feet and the ground (see Appendix A).

5.2 Overall cost function

As explained in Sect. 2, the CNS minimizes the motion error cost ME(t) (Eq. 5) and the metabolic cost MC(t) [10]: to learn impedance and feedforward torque, a human does not spend extra effort (Eq. 6). We therefore set our overall cost function C(t) as:

\[ C(t) = ME(t) + MC(t) \tag{4} \]

with:

\[ ME(t) = \frac{1}{2}\,\epsilon^T(t)\,[J^{\dagger T}_{end,\rho} M_\rho J^{\dagger}_{end,\rho}]\,\epsilon(t) \tag{5} \]

and:

\[ MC(t) = \frac{1}{2}\int_{t-D}^{t} \tilde\Phi^T(\sigma)\,Q^{-1}\,\tilde\Phi(\sigma)\,d\sigma \tag{6} \]

where M_\rho is the inertia matrix (see Appendix A) and Q = \mathrm{diag}(I \otimes Q_K, I \otimes Q_B, Q_\tau). \epsilon is the tracking error commonly used in robotics [63], defined as:

\[ \epsilon = \delta(V^d, V^r) + b\,\delta(H^d, H^r) \in se(3) \tag{7} \]

with H^r, H^d \in SE(3) and V^r, V^d \in se(3), where SE(3) is the special Euclidean group and se(3) its Lie algebra. \delta(H^d, H^r) denotes the displacement (position and orientation) error between the desired and current states; \delta(V^d, V^r) denotes the velocity (linear and angular) error between the desired and current states.

\[ \tilde\Phi(t) = \Phi(t) - \Phi^d(t) = [\mathrm{vec}(K^l_\rho(t))^T, \mathrm{vec}(B^l_\rho(t))^T, (\tau^l_\rho(t))^T]^T - [\mathrm{vec}(K^{min}_\rho(t))^T, \mathrm{vec}(B^{min}_\rho(t))^T, (\tau^{min}_\rho(t))^T]^T = [\mathrm{vec}(\tilde K(t))^T, \mathrm{vec}(\tilde B(t))^T, \tilde\tau(t)^T]^T \tag{8} \]

where \mathrm{vec}(\cdot) is the column vectorization operator, \tilde K = K^l_\rho(t) - K^{min}_\rho(t), \tilde B = B^l_\rho(t) - B^{min}_\rho(t) and \tilde\tau = \tau^l_\rho(t) - \tau^{min}_\rho(t). K^{min}_\rho, B^{min}_\rho and \tau^{min}_\rho are the joint stiffness, damping and feedforward torque required to maintain posture stability and to reduce the systematic deviation caused by the interaction with the environment (see Appendix B). In Eq. 8, the function \Phi(t) that adapts stiffness, damping and feedforward torque tends to the minimal value \Phi^d(t) through metabolic cost minimization [10].

To measure stability, we use the motion error cost ME of Eq. 5: if there exists \delta > 0 such that

\[ \int_t^{t_1} \dot{ME}(\sigma)\,d\sigma < \delta, \tag{9} \]

then the human interaction with the environment is stable on [t, t_1] [36].

5.3 DHM torques

Following the important behaviours of human motor control listed in Sect. 2, we propose a DHM controller composed of feedforward and feedback parts that are adapted during trials:

\[ \tau_\rho = S\tau^{ff} + S\tau^{fb} - \tau^l_\rho \tag{10} \]

where \tau^{ff} is the torque compensating for the DHM dynamics (feedforward part of our controller, Sect. 8.1); \tau^l_\rho (Eq. 16) is the learned feedforward torque, which depends on the feedback error; and \tau^{fb} = -L^T(J^T_{com}F_{com} + J^T_{end}W^d_{end} + J^T_c\,\Delta f_c) is the torque compensating for trajectory errors (feedback part of our controller, Sect. 8.2). \Delta f_c is the contact force correction. W^d_{end} is the desired task wrench of Eq. 11, whose stiffness and damping are adapted through Eq. 8.

The desired task wrench W^d_{end} is computed using an adaptive proportional-derivative (PD) feedback control law:

\[ W^d_{end} = K^l_{end}\,\delta(H^d,H^r) + B^l_{end}\,\delta(V^d,V^r) + B^{ini}_{end}\,\epsilon = (K^l_{end} + b\,B^{ini}_{end})\,\delta(H^d,H^r) + (B^l_{end} + B^{ini}_{end})\,\delta(V^d,V^r) \tag{11} \]

K_{end} and B_{end} denote the cartesian stiffness and damping matrices, respectively. As explained in Sect. 5.1, our controller learns joint-space parameters using Eqs. 13 and 14; to pass from joint to cartesian stiffness and damping, we use Eq. 3, as sketched below.
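As an illustration of Eq. 3 and of the dynamic pseudoinverse, here is a minimal numpy sketch of the joint-to-cartesian mapping. It assumes a zero interaction wrench (so the \partial J^T/\partial\rho \cdot W^i term of Eq. 3 vanishes), and all matrices are placeholders for quantities computed from the DHM dynamics.

    import numpy as np

    def dyn_pseudoinverse(J, M):
        # Dynamic pseudoinverse J+ = M^-1 J^T (J M^-1 J^T)^-1, J full rank.
        Minv_JT = np.linalg.solve(M, J.T)
        return Minv_JT.dot(np.linalg.inv(J.dot(Minv_JT)))

    def joint_to_cartesian_impedance(K_rho, B_rho, J, M):
        # Eq. 3 with W^i = 0: K_end = J+^T K_rho J+, B_end = J+^T B_rho J+.
        Jd = dyn_pseudoinverse(J, M)
        return Jd.T.dot(K_rho).dot(Jd), Jd.T.dot(B_rho).dot(Jd)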
It is important to remember that joint space and \rho-space are related by \rho = Sq. B^{ini}_{end} is chosen according to:

\[ B^{ini}_{end} = J^{\dagger T}_{end,\rho}\,B^{ini}_\rho\,J^{\dagger}_{end,\rho} \tag{12} \]

with B^{ini}_\rho a symmetric positive definite matrix whose minimal eigenvalue satisfies \lambda_{min}(B^{ini}_\rho) \ge \lambda_B > 0. This minimal feedback matrix ensures stable and compliant motion control; it corresponds to the mechanical properties of the passive muscles of a relaxed human arm [54].

5.4 Learning laws

The cerebellum plays an important role in varying the mechanical control of a limb over time during human motor learning, forming and storing associated muscle activation patterns. According to Smith [64], stiffness varies throughout the movement. Based on the human properties detailed in Sect. 2, the stiffness K^l_\rho(t) and damping B^l_\rho(t) are adapted as follows:

\[ K^l_\rho(t,k+1) = K^l_\rho(t,k) + Q_K\{J^{\dagger}_{end,\rho}[\epsilon(t,k)\,\delta(H^d,H^r)^T]J^{\dagger T}_{end,\rho} - \gamma(t)K^l_\rho(t,k)\} \tag{13} \]

\[ B^l_\rho(t,k+1) = B^l_\rho(t,k) + Q_B\{J^{\dagger}_{end,\rho}[\epsilon(t,k)\,\delta(V^d,V^r)^T]J^{\dagger T}_{end,\rho} - \gamma(t)B^l_\rho(t,k)\} \tag{14} \]

with K^l_\rho(t,k=0) = 0_{[n_\rho,n_\rho]}, B^l_\rho(t,k=0) = 0_{[n_\rho,n_\rho]} and t \in [0,D), where Q_K and Q_B are symmetric positive definite constant gain matrices. The forgetting factor \gamma is defined by:

\[ \gamma(t) = \frac{p}{1 + u\,\|\epsilon(t)\|^2} \tag{15} \]

with positive values p and u. To obtain convergence, we need \gamma(t) > 0 (see Appendix B). The learning response speed can be tuned through the choice of p and u: if \gamma(t) is large, torque and impedance learning is slow; if \gamma(t) is small, torque and impedance unlearning is slow.

Unlike the constant \gamma of [24], the time-varying definition of \gamma in Eq. 15 has the following advantage: when \epsilon(t) is large, \gamma(t) is small, and vice versa. We therefore obtain a controller that quickly increases torque and impedance during poor tracking performance and quickly decreases them during good tracking performance.

The learned feedforward torque is adapted through:

\[ \tau^l_\rho(t,k+1) = \tau^l_\rho(t,k) + Q_\tau[J^{\dagger}_{end,\rho}\,\epsilon(t,k) - \gamma(t,k)\,\tau^l_\rho(t,k)] \tag{16} \]

with \tau^l_\rho(t,k=0) = 0_{[n_\rho,1]}, t \in [0,D], and Q_\tau a symmetric positive definite constant matrix.

The diagonal learning-rate matrices Q_K, Q_B and Q_\tau are chosen empirically. In particular, we choose Q_K > Q_\tau because human stiffness increases faster than feedforward torque [10], and Q_B = Q_K/b according to Eq. 11. One learning update is sketched below.
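A minimal numpy sketch of one learning update (Eqs. 13-16): Jd stands for the dynamic pseudoinverse J^{\dagger}_{end,\rho}, and eps, dH and dV are the tracking, displacement and velocity errors sampled at time t of trial k; the default p is an assumed value.

    import numpy as np

    def forgetting(eps, p=1.0, u=5.0):
        # Eq. 15: gamma(t) = p / (1 + u * ||eps(t)||^2).
        return p / (1.0 + u * float(np.dot(eps, eps)))

    def learning_update(K, B, tau, Jd, eps, dH, dV, QK, QB, Qtau):
        g = forgetting(eps)
        K = K + QK.dot(Jd.dot(np.outer(eps, dH)).dot(Jd.T) - g * K)      # Eq. 13
        B = B + QB.dot(Jd.dot(np.outer(eps, dV)).dot(Jd.T) - g * B)      # Eq. 14
        tau = tau + Qtau.dot(Jd.dot(eps) - g * tau)                      # Eq. 16
        return K, B, tau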
6 Trajectory planner based on human psychophysical principles

A movement can be characterized, independently of the end-effector, by:

– the initial and final points of the trajectory (position and orientation)
– obstacle positions (via-points of the trajectory)
– duration

Experimental studies of human movement have shown that voluntary movements obey the following three major psychophysical principles:

– Hick-Hyman's law: the average reaction time TR_{ave} required to choose among n probable choices depends on their logarithm [34]:

\[ TR_{ave} = d\,\log_2(n+1) \tag{17} \]

– Fitts' law: the movement time depends on the logarithm of the relative accuracy (the ratio between movement amplitude and target dimension) [21]:

\[ D = g + z\,\log_2\!\left(\frac{2\Upsilon}{P}\right) \tag{18} \]

where D is the movement time, \Upsilon is the amplitude, P is the accuracy, and g and z are empirically determined constants.

– Kinematic invariance: hand movements have a bell-shaped speed profile in straight reaching movements [50], and this profile is independent of movement direction and amplitude. For more complex trajectories (e.g. handwriting), the same principle predicts a correlation between speed and curvature [51], described by the 2/3 power law:

\[ \dot{s}(t) = Z_s\,R^{1-\frac{2}{3}} \tag{19} \]

where \dot{s}(t) is the tangential velocity, R is the radius of curvature and Z_s is a proportionality constant, also termed the "velocity gain factor". More complex trajectories can therefore be divided into overlapping basic trajectories similar to reaching movements. Such spatio-temporal invariant features of normal movements can be explained by a variety of maximum-smoothness criteria, such as the minimum jerk criterion [22] or the minimum torque-change criterion [67].

We implemented a modified minimum jerk criterion with via-points to calculate trajectories and avoid obstacles. The original minimum-jerk model [22] may fail to predict the hand path and can only be applied to averaged data, because it predicts a single optimum movement for given via-points. Unlike the original minimum jerk model, the 2/3 power law can be applied to all movements; its main limitation is that the formula predicts speed from paths. In this study, we therefore chose Todorov's model [66], which combines the original minimum-jerk model with the 2/3 power law and uses the path observed in a specific trial to predict the speed profile. Todorov's model substitutes a smoothness constraint for the 2/3 power law (see Appendix C.2); it is validated and compared with the 2/3 power law in [66] on four tasks with a specified path.

For a given hand path in space, Todorov's model [66] assumes that the speed profile is the one that minimizes the third derivative of position (the "jerk"):

\[ \mathrm{Jerk} = \int_0^D \left\| \frac{d^3}{dt^3}\,r[s(t)] \right\|^2 \tag{20} \]

with r(s) = [x(s), y(s), z(s)] a 3D hand path and s the curvilinear coordinate. In this approach, minimization is performed only over the speed profiles, because the path is specified. The formal definition of the integrand of Eq. 20 is given in Appendix C.1.

In the original minimum jerk model [22], the minimum jerk trajectory is a 5th-order polynomial in t, whose coefficients are computed from the end-point constraints. The trajectory and speed are determined by a given set of via-points, so the hand is constrained to pass through the via-points at definite times. To calculate the minimum jerk trajectory, it is necessary to provide the passage times T_P, the positions x, the velocities v and the accelerations a. In Todorov's model, the passage times T_P are not defined a priori but are determined by the algorithm explained below.

To find the optimal jerk for given passage times T_P and intermediate points x, Todorov's model minimizes the jerk with respect to v and a by setting the gradient to zero and solving the resulting system of linear equations. To find the intermediate times T_P, the method uses a nonlinear simplex method to minimize the optimal jerk over all possible passage times.

In the same way as for translations, the speed profile of a rotation is the one that minimizes the third derivative of orientation, with a 3D rotation path r(s) = [\alpha(s), \beta(s), \gamma(s)].

In brief, to calculate the minimum jerk trajectory for rotations and translations, we need to provide the positions X, the initial and final velocities V, and the initial and final accelerations A. An illustrative example of a minimum jerk trajectory simulation is given in [19], and a comparison between real human data and simulations is given in [20].
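For reference, a minimal numpy sketch of the original point-to-point minimum-jerk profile [22] that underlies the via-point model above: a 5th-order polynomial with zero boundary velocities and accelerations, which produces the bell-shaped speed profile discussed in this section.

    import numpy as np

    def min_jerk(x0, xf, D, t):
        # Minimum-jerk position between x0 and xf over duration D.
        s = np.clip(np.asarray(t, dtype=float) / D, 0.0, 1.0)
        return x0 + (np.asarray(xf) - np.asarray(x0)) * (10*s**3 - 15*s**4 + 6*s**5)

    # Usage: a 1.3 s reach (a duration of the kind chosen in Sect. 7) at 100 Hz.
    ts = np.linspace(0.0, 1.3, 131)
    xs = min_jerk(np.zeros(3), np.array([0.3, 0.1, 0.0]), 1.3, ts[:, None])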
7 Duration time based on human laws

Duration times are chosen a priori following the 3D Fitts' law proposed in [27] for a pointing task. The reach and position states are similar to pointing at trivariate targets, and we therefore use the equation in [27] to calculate the movement time D:

\[ D \approx 56 + 5208\,\log_2\!\left(\sqrt{f_w(\theta)\left(\frac{\Upsilon}{w}\right)^2 + \frac{1}{9.2}\left(\frac{\Upsilon}{h}\right)^2 + f_d(\theta)\left(\frac{\Upsilon}{d}\right)^2} + 1\right) \tag{21} \]

with f_w(0°) = 0.211, f_w(45°) = 0.242, f_w(90°) = 0.717, f_d(0°) = 0.194, f_d(45°) = 0.147 and f_d(90°) = 0.312. \Upsilon is the distance (or amplitude), \theta is the movement angle (the human user's axis of movement), w is the width measured along the movement axis, h is the height measured along the Z-axis, and d is perpendicular to both (see Fig. 3).

Fig. 3: w, h and d measurements for the 3D Fitts' law

8 Feedforward and feedback control

The optimization framework (see Fig. 4) is based on a combined anticipatory feedforward and feedback control system built on the acceleration-based control method [1,15] and the Jacobian-transpose (JT) control method [57,42,18]. These controllers are formulated as two successive Quadratic Programming (QP) problems (Fig. 4), each of them handling a large number of dof and solving all constraint equations simultaneously. The controller computes joint torques that achieve different objectives and satisfy different constraints. In our multi-objective control, a task means that a certain frame on the DHM body should be transferred from an initial state to a desired state.

8.1 Feedforward

During the feedforward phase, the objectives are:

1. Objective based on acceleration control. This feedforward action compensates for the low-frequency, rigid-body behaviour of the DHM dynamics. The goal is to minimize the difference between the actual acceleration A and the desired acceleration A^d given by the minimum jerk trajectory planner. A is expressed in terms of the system unknowns \dot{T} as:

\[ V = JT, \qquad A = J\dot{T} + \dot{J}T \tag{22} \]

with J the Jacobian matrix expressed in its own frame.

2. Regularization of the QP problem. To regularize the QP problem, we set the desired torque \tau^d, the desired contact forces f^d_c and the desired acceleration \dot{T}^d to zero.

During the feedforward phase, the constraints are:

1. Dynamic equation. As explained in Sect. 2, the CNS is able to predict dynamics. We therefore impose the DHM dynamics of Eq. 1 as a feedforward constraint.

2. Contact point accelerations. To help maintain contacts, the contact acceleration must be null:

\[ A_c = J_c\dot{T} + \dot{J}_cT = 0 \tag{23} \]

3. Non-sliding contacts. The non-sliding contacts are expressed as a set of inequality constraints. Contact constraints are imposed at the contact points between the feet and the ground: the contact force f_c should remain within the friction cone. The linearized Coulomb friction model [1] is applied, in which the friction cone of each contact is approximated by a four-faced polyhedral convex cone. The contact constraints are formulated as:

\[ E_{c_i} f_{c_i} + d_{c_i} < 0 \tag{24} \]

where E_{c_i} is the approximated friction cone and d_{c_i} is a user-defined margin vector, chosen so that the projection of f_{c_i} on the normal vector of each facet of the friction cone is kept larger than d_{c_i}. A sketch of such a linearized cone is given below.
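A minimal numpy sketch of such a four-faced cone, assuming a zero margin vector d and the convention E f ≤ 0 (the text states the constraint as < 0 in Eq. 24 and as ≥ 0 in Eq. 26, depending on the sign convention adopted for E and d); the friction factor mu = 0.7 is an assumed value.

    import numpy as np

    def friction_cone_faces(mu):
        # Each row e encodes one facet constraint e . [fx, fy, fz] <= 0; together
        # the four facets approximate the Coulomb cone ||f_xy|| <= mu ||f_z|| (Eq. 2).
        return np.array([[ 1.0,  0.0, -mu],
                         [-1.0,  0.0, -mu],
                         [ 0.0,  1.0, -mu],
                         [ 0.0, -1.0, -mu]])

    E = friction_cone_faces(mu=0.7)
    f = np.array([1.0, 0.5, 3.0])             # candidate contact force
    inside = bool(np.all(E.dot(f) <= 0.0))    # True: f lies within the pyramid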
Fig. 4: Block diagram of the cartesian control framework

We summarize the feedforward phase as:

\[ \hat{O} = \arg\min_{\tau^{ff},\,\dot{T},\,f_c} \frac{1}{2}\left\| \begin{bmatrix} \tau^{ff} \\ \dot{T} \\ f_c \end{bmatrix} - \begin{bmatrix} \tau^{ff,d} \\ \dot{T}^d \\ f^d_c \end{bmatrix} \right\|^2_Q \tag{25} \]

subject to:

\[ M\dot{T} + NT + G = L\tau + J^T_c f_c, \qquad E_c f_c + d_c \ge 0, \qquad J_c\dot{T} + \dot{J}_cT = 0 \tag{26} \]

The optimization objective is the same for each task: to minimize the error between each variable and its desired value. The objectives are combined through the diagonal weight matrix Q, whose values are chosen according to the priorities of the different objectives. This optimization yields \tau^{ff}, f_c and \dot{T}.

8.2 Feedback

In the feedback part, for each task we imagine that a virtual wrench is applied at a certain frame on the DHM body to guide its motion towards a given target. These virtual wrenches are computed by solving an optimization problem. In the feedback phase, the objectives are:

1. COM position. The dynamic controller maintains the DHM balance by imposing that the horizontal-plane projection of the COM lie within a convex support region [7]. For this COM-tracking objective, we consider only the force component; F^d_{com} is obtained using a PD control law in \Re^3 on the error between the actual and desired COM positions:

\[ F^d_{com} = K_{com}(x^d_{com} - x^r_{com}) + B_{com}(v^d_{com} - v^r_{com}) \tag{27} \]

where K_{com} and B_{com} are the proportional and derivative gain matrices, respectively.

2. End-effector. The end-effector task is used to perform specific motions. In this paper, the objective is to realize point-to-point movements with the human-based learning controller of Eq. 11.

3. Contact force deviation. Minimize the difference between the actual contact force and the feedforward contact force: \Delta f^d_c = 0_{(3n_{f_c},1)}, with Q_{\Delta f_c} = w_{\Delta f_c} I_{3n_{f_c}}.

In the feedback phase, the constraints are:

1. Static equilibrium. The wrenches are constrained by the static equilibrium of the DHM:

\[ L\tau^{fb} = -J^T_{com}F_{com} - J^T_{end}W^d_{end} - \sum_i J^T_{c_i}\Delta f_{c_i} \tag{28} \]

2. Non-sliding contacts:

\[ E_c(f_c + \Delta f_c) + d_c \ge 0 \tag{29} \]

We summarize the feedback phase as:

\[ \hat{O} = \arg\min_{F_{com},\,\tau^{fb},\,\Delta f_c} \frac{1}{2}\left\| \begin{bmatrix} F_{com} \\ W_{end} \\ \Delta f_c \end{bmatrix} - \begin{bmatrix} F^d_{com} \\ W^d_{end} \\ \Delta f^d_c \end{bmatrix} \right\|^2_Q \tag{30} \]

subject to:

\[ L\tau^{fb} = -J^T_{com}F_{com} - J^T_{end}W^d_{end} - J^T_c\Delta f_c, \qquad E_c(f_c + \Delta f_c) + d_c \ge 0 \tag{31} \]

As in the feedforward phase, the objectives are combined through the diagonal weight matrix Q, chosen according to the priorities of the different objectives. This optimization yields F_{com}, W_{end} and \Delta f_c. The feedback joint torque is then:

\[ \tau^{fb} = -L^T(J^T_{com}F_{com} + J^T_{end}W^d_{end} + J^T_c\Delta f_c) \tag{32} \]

A generic numerical sketch of one such weighted QP is given below.
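For illustration, here is a generic numerical sketch of one such weighted QP, minimizing ½(x − x^d)^T Q (x − x^d) under linear equality and inequality constraints, using scipy's general-purpose SLSQP solver. This is an assumption made for illustration only: the paper does not name its QP solver, and a dedicated real-time solver would be used in practice. The variable x stacks the unknowns, e.g. (\tau^{ff}, \dot{T}, f_c) in the feedforward phase.

    import numpy as np
    from scipy.optimize import minimize

    def solve_weighted_qp(x_d, Q, A_eq, b_eq, A_in, b_in):
        # minimize 0.5 * (x - x_d)^T Q (x - x_d)
        # subject to  A_eq x = b_eq  and  A_in x >= b_in.
        cost = lambda x: 0.5 * np.dot(x - x_d, Q.dot(x - x_d))
        grad = lambda x: Q.dot(x - x_d)
        cons = [{"type": "eq",   "fun": lambda x: A_eq.dot(x) - b_eq},
                {"type": "ineq", "fun": lambda x: A_in.dot(x) - b_in}]
        res = minimize(cost, np.zeros_like(x_d), jac=grad,
                       constraints=cons, method="SLSQP")
        return res.x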
9 Results

Our simulation framework requires a PC running a Python 2.7 environment with XDE modules. With a simulation step of 0.01 s, the joint torques are calculated in quasi-real time (the computation takes 1.5 times the simulated duration) on a PC equipped with an Intel Xeon E5630 (12 MB cache, 2.53 GHz, 24 GB of RAM).

Several simulations have been run using our new joint stiffness, damping and torque-learning cartesian controller. A first case study dealt with a fictional hand task; a second case study dealt with an experimental assembly task. All simulations consisted of controlling a 45-dof DHM, with 6 dof for the root position and orientation, using actuators/muscles producing joint torques \tau in a 6-dimensional cartesian task space characterized by an external interaction wrench W^i_{end}, while tracking a minimum-jerk task reference trajectory as detailed in Sect. 6. The wrench is derived from the contacts or given by an imposed wrench field. There are four contact points on each foot.

During the experimental task, we observed that torso orientation varied very little. We therefore add an objective maintaining the desired torso orientation equal to its initial orientation.

The optimization weights for the different objectives are: 10^4 for the COM, 5·10^3 for the right-hand task, 10^1 for the posture, 10^2 for the head, 10^2 for the torso, 10^0 for the contact task and 10^2 for gravity compensation. These weights are chosen empirically, based on the estimated importance and priorities of the different objectives. The learning-rate matrices Q_K, Q_B and Q_\tau of [72] have been changed for the different applications; they too are chosen empirically based on the importance and priorities of the different objectives. We choose Q_K > Q_\tau because human stiffness increases faster than feedforward torque [10]. The controller parameters are selected as Q_K = diag[8]_{(n_\rho,n_\rho)}, Q_B = diag[0.8]_{(n_\rho,n_\rho)}, Q_\tau = diag[1]_{(n_\rho,n_\rho)}, a = 0.2, u = 5 and b = 10 for all simulations.

In [20], we used this controller to simulate an experimental insert-clipping activity in quasi-real time and applied the simulated postures, times and exertions to an OCRA index-based ergonomic assessment [53]. Given only scant information on the scenario (typically the initial and final operator positions and the clipping force), the simulated ergonomic evaluations fell in the same risk area as those based on experimental human data. In addition, the DHM trajectories are similar to the real trajectories.

9.1 Free hand movement

The first case studied is a point-to-point movement: the right hand goes from its initial position to the insert position. At the start of the simulation, the insert is placed on the table (Fig. 8a), and the DHM stands upright with its arms along the body. This reproduces the grasping action of the experimental task in [19]. The movement duration D = 1.3 s is chosen according to the 3D Fitts' law of Sect. 7 for a pointing task.

A constant external interaction wrench W^{ext}_{end} = [0 N·m, 0 N·m, 0 N·m, 3 N, 3 N, 3 N]^T is applied to the right hand during the motion. Adaptation is simulated for 225 iterations; at the end of each iteration, the joint positions are reset to the start point and the joint velocities and accelerations are reset to zero. In the first phase (iterations 1-75), the interaction wrench is absent. In the second phase (iterations 76-149), a constant perturbation wrench W^i_{end} = W^{ext}_{end} is applied to the right hand. In the third phase (iterations 150-225), the interaction wrench is again absent (the protocol is sketched at the end of this subsection).

As shown in Fig. 5, the DHM increases its joint stiffness in order to maintain limb stability in the presence of external forces applied at the hand [10,44]. We note that the error decreases (Fig. 6): initially divergent trajectories become convergent and successful after learning. We also observe patterns of stiffness and feedforward torque (Fig. 7) similar to the experiments in [10]: stiffness and damping adapt to compensate for the unstable interaction, without a large modification of the feedforward torque. Finally, the limb stiffness converges to small values when no force is applied at the hand; this lower-magnitude joint stiffness is typical of a human subject acting in a zero force field [44].

Fig. 5: Learned joint stiffness (a) and joint damping (b) (mean over one period)

Fig. 6: Cartesian position [m] and orientation [rad] error (a) and cartesian linear [m/s] and angular [rad/s] velocity error (b) (mean over one period)

Fig. 7: Learned force (mean over one period)
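The three-phase protocol of this subsection can be summarized by the following sketch, where run_trial is a hypothetical placeholder for one simulated reach with the adaptive controller.

    import numpy as np

    W_EXT = np.array([0.0, 0.0, 0.0, 3.0, 3.0, 3.0])   # [moments N.m; forces N]

    def external_wrench(k):
        # Phase 1 (trials 1-75): none; phase 2 (76-149): constant wrench; phase 3: none.
        return W_EXT if 76 <= k <= 149 else np.zeros(6)

    for k in range(1, 226):                  # 225 adaptation iterations
        W_i = external_wrench(k)             # wrench applied during this trial
        # run_trial(W_i) would execute one reach, adapt K, B and tau, and reset
        # the joint positions, velocities and accelerations for the next trial.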
9.2 Simulation of an insertion with a virtual object

In the second case study, we simulate the insertion task of [19]. The external interaction wrench is derived from the contacts between the insert and the virtual object represented in Fig. 8a.

The digital mock-up (DMU) scenario is represented in Fig. 8a. It reproduces the experimental environment of [19] by ensuring geometric similarity. The inputs used to build the DMU scenario are the workplace spatial organization (x, y and z dimensions), the insert and tool descriptions (x, y, z positions and weight) and the DHM position.

Figs. 9 and 8b show the results of the simulation when the right hand goes from the virtual object center x_{ob} to the position x = x_{ob} + [0 m, 0.03 m, -0.02 m] (the reference frame is represented in Fig. 8a). Adaptation is simulated for 50 iterations. We note in Fig. 8b that the asymptotic force slightly decreases; we have demonstrated this human behaviour in human-subject experiments [17].

Fig. 8: DMU scenario and virtual object modelling the insertion, with the object stiffness K_{obj} equal to 1000 N/m in the four directions (a); average interaction force during insertion (b)

Fig. 9: Learned joint stiffness during insertion (mean over one period) (a); cartesian position [m] and orientation [rad] error during insertion (mean over one period) (b)

10 Conclusion and future works

In this paper, we have described a multi-objective control of digital human models based on the human ability to learn novel task dynamics through the minimization of instability, error and effort. Our controller has been validated on a 45-dof DHM. In this paper, we applied our algorithm to a rather simple case study, of limited impact relative to the complexity of actual work gestures. In order to confirm these encouraging results and to give the desired genericity to our controller and DHM, we plan additional theoretical and practical work.

One improvement will be to enrich prehension simulation. For our case study, we explicitly specified the type of grasp (palmar, pinch, full-handed) and the orientation of the object in the operator's hand according to the final orientation (the object is attached to the hand). In the near future, we plan to introduce prehension functions in our kinematic model. In order not to make the kinematic model heavier (20 segments and 28 additional dof per hand [49]), we propose to replace the wrist and fingers by a dedicated end-effector whose characteristics (number of joints, types, rotational and translational ranges) would mimic the dof observed for each type of grasp [47]. For example, this effector would have more dof in pinch mode than in full-handed grasp mode.

Another improvement concerns the parametrization of the controller. Currently, our controller needs several parameters to be set, for instance the task weights. In our case study, we set those parameters empirically. To improve the genericity of our algorithm, the task weights could be computed automatically, without any manual tuning [58].

References

1. Abe, Y., Silva, M.D., Popović, J.: Multiobjective control with frictional contacts. In: Proc. ACM SIGGRAPH/EG Symposium on Computer Animation, pp. 249-258. Aire-la-Ville, Switzerland (2007)
2. Andreiev, N.: A process controller that adapts to signal and process conditions. Control Engineering 38 (1977)
3. Astrom, K., Borrison, U., Wittenmark, B.: Theory and application of self-tuning regulators. Automatica 13, 457-476 (1977)
4. Badler, N.: Virtual humans for animation, ergonomics, and simulation. In: Proceedings of the IEEE Workshop on Non-Rigid and Articulated Motion, pp. 28-36 (1997)
5. Bien, Z., Xu, J.: Iterative Learning Control: Analysis, Design, Integration and Applications. Kluwer Academic Publishers, Norwell, MA, USA (1998)
6. Bradwell, B., Li, B.: A tutorial on motion capture driven character animation. In: Eighth IASTED International Conference on Visualization, Imaging, and Image Processing. Palma de Mallorca (2008)
7. Bretl, T., Lall, S.: Testing static equilibrium for legged robots. IEEE Transactions on Robotics 24, 794-807 (2008)
8. Buchli, J., Stulp, F., Theodorou, E., Schaal, S.: Learning variable impedance control. The International Journal of Robotics Research 30, 820-833 (2011)
9. Buchli, J., Theodorou, E., Schaal, S.: Reinforcement learning of full-body humanoid motor skills. In: 10th IEEE-RAS International Conference on Humanoid Robots (Humanoids), pp. 405-410 (2010)
10. Burdet, E., Osu, R., Franklin, D., Milner, T., Kawato, M.: The central nervous system stabilizes unstable dynamics by learning optimal impedance. Nature 414, 446-449 (2001)
11. Burdet, E., Tee, K.P., Mareels, I., Milner, T.E., Chew, C., Franklin, D.W., Osu, R., Kawato, M.: Stability and motor adaptation in human arm movements. Biological Cybernetics 94, 20-32 (1998)
12. Castillo, E.: Run-to-run process control: literature review and extensions. Journal of Quality Technology 29, 184-196 (1997)
13. Chaffin, D.: Human motion simulation for vehicle and workplace design. Human Factors and Ergonomics in Manufacturing 17, 475-484 (2007)
14. Cheah, C., Wang, D.: Learning impedance control for robotic manipulators. IEEE Transactions on Robotics and Automation 14, 452-465 (1998)
15. Colette, C., Micaelli, A., Andriot, C., Lemerle, P.: Robust balance optimization control of humanoid robots with multiple non coplanar grasps and frictional contacts. In: Proceedings of the IEEE International Conference on Robotics and Automation, pp. 3187-3193. Pasadena, USA (2008)
16. Commuri, S., Lewis, F.: Adaptive-fuzzy logic control of robot manipulators. In: IEEE International Conference on Robotics and Automation, vol. 3, pp. 2604-2609. Minneapolis, MN (1996)
17. De Magistris, G.: Dynamic digital human model control design for the assessment of the workstation ergonomics. PhD thesis, Pierre and Marie Curie University (2013)
18. De Magistris, G., Micaelli, A., Andriot, C., Savin, J., Marsot, J.: Dynamic virtual manikin control design for the assessment of the workstation ergonomy. In: First International Symposium on Digital Human Modeling. Lyon (2011)
19. De Magistris, G., Micaelli, A., Evrard, P., Andriot, C., Savin, J., Gaudez, C., Marsot, J.: Dynamic control of DHM for ergonomic assessments. International Journal of Industrial Ergonomics 43, 170-180 (2013)
20. De Magistris, G., Micaelli, A., Savin, J., Gaudez, C., Marsot, J.: Dynamic digital human model for ergonomic assessment based on human-like behaviour and requiring a reduced set of data for a simulation. In: Second International Digital Human Model Symposium. Ann Arbor, USA (2013)
21. Fitts, P.: The information capacity of the human motor system in controlling the amplitude of movement. Journal of Experimental Psychology 47, 381-391 (1954)
22. Flash, T., Hogan, N.: The coordination of arm movements: an experimentally confirmed mathematical model. Journal of Neuroscience 7, 1688-1703 (1985)
23. Franklin, D., Burdet, E., Osu, R., Tee, K., Chew, C., Milner, T., Kawato, M.: CNS learns stable, accurate, and efficient movements using a simple algorithm. Journal of Neuroscience 28, 11165-11173 (2008)
24. Ganesh, G., Albu-Schaeffer, A., Haruno, M., Kawato, M., Burdet, E.: Biomimetic motor behavior for simultaneous adaptation of force, impedance and trajectory in interaction tasks. In: IEEE International Conference on Robotics and Automation. Anchorage, Alaska, USA (2010)
25. Gomi, H., Osu, R.: Task-dependent viscoelasticity of human multijoint arm and its spatial characteristics for interaction with environments. Journal of Neuroscience 18, 8965-8978 (1998)
26. Gribble, P., Ostry, D.: Compensation for interaction torques during single- and multijoint limb movement. Journal of Neurophysiology 82, 2310-2326 (1999)
27. Grossman, T., Balakrishnan, R.: Pointing at trivariate targets in 3D environments. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 447-454. New York (2004)
28. Haddadin, S., Albu-Schäffer, A., Hirzinger, G.: Requirements for safe robots: measurements, analysis and new insights. International Journal of Robotics Research 28, 1507-1527 (2009)
29. Hanavan, E.: A mathematical model of the human body. Wright-Patterson Air Force Base, Report No. AMRL-TR-64-102 (1964)
30. Hogan, N.: Impedance control: an approach to manipulation. Part I: Theory; Part II: Implementation; Part III: Applications. ASME Journal of Dynamic Systems, Measurement and Control 107, 11-24 (1985)
31. Hogan, N.: Mechanical impedance of single- and multi-articular systems. In: Winters, J.M., Woo, S.L. (eds.) Multiple Muscle Systems: Biomechanics and Movement Organization. Springer-Verlag (1990)
32. Hovland, G., Sikka, P., McCarragher, B.: Skill acquisition from human demonstration using a hidden Markov model. In: IEEE International Conference on Robotics and Automation, vol. 3, pp. 2706-2711. Minneapolis, MN (1996)
33. Hsu, L.: Self-oscillating adaptive systems (SOAS) without limit-cycles. In: Proceedings of the American Control Conference, vol. 13. Albuquerque, New Mexico (1997)
34. Hyman, R.: Stimulus information as a determinant of reaction time. Journal of Experimental Psychology 45, 188-196 (1953)
35. Ioannou, P., Kokotović, P.: Adaptive systems with reduced models. Lecture Notes in Control and Information Sciences 47 (1983)
36. Jagannathan, S.: Neural Network Control of Nonlinear Discrete-Time Systems. CRC Press, Taylor & Francis Group, FL (2006)
37. Kawato, M., Gomi, H.: A computational model of four regions of the cerebellum based on feedback error learning. Biological Cybernetics 69, 95-103 (1992)
38. Khatib, O., Sentis, L., Park, J., Warren, J.: Whole-body dynamic behavior and control of human-like robots. International Journal of Humanoid Robotics 1, 29-43 (2004)
39. Kirsch, R., Boskov, D., Rymer, W.: Muscle stiffness during transient and continuous movements of cat muscle: perturbation characteristics and physiological relevance. IEEE Transactions on Biomedical Engineering 41, 758-770 (1994)
40. Lackner, J., Dizio, P.: Rapid adaptation to Coriolis force perturbations of arm trajectory. Journal of Neurophysiology 72, 299-313 (1994)
41. Li, C., Zhang, D., Zhuang, X.: A survey of repetitive control. In: Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1160-1166. Sendai, Japan (2004)
42. Liu, M., Micaelli, A., Evrard, P., Escande, A., Andriot, C.: Interactive dynamics and balance of a virtual character during manipulation tasks. In: IEEE International Conference on Robotics and Automation, pp. 1676-1682. Shanghai, China (2011)
43. Lämkull, D., Hanson, L., Örtengren, R.: A comparative study of digital human modelling simulation results and their outcomes in reality: a case study within manual assembly of automobiles. International Journal of Industrial Ergonomics 39, 428-441 (2009)
44. McIntyre, J., Mussa-Ivaldi, F., Bizzi, E.: The control of stable arm postures in the multi-joint arm. Experimental Brain Research, pp. 248-264 (1996)
45. Merlhiot, X.: A robust, efficient and time-stepping compatible collision detection method for non-smooth contact between rigid bodies of arbitrary shape. In: Proceedings of the Multibody Dynamics ECCOMAS Thematic Conference (2007)
46. Merlhiot, X.: Extension of a time-stepping compatible contact determination method between rigid bodies to deformable models. In: Proceedings of the Multibody Dynamics ECCOMAS Thematic Conference (2009)
47. Miller, A., Knoop, S., Christensen, H., Allen, P.: Automatic grasp planning using shape primitives. In: IEEE International Conference on Robotics and Automation, vol. 2, pp. 1824-1829 (2003)
48. Milner, T., Cloutier, C.: Compensation for mechanically unstable loading in voluntary wrist movement. Experimental Brain Research 94, 522-532 (1993)
49. Miyata, N., Kouki, M., Mochimaru, M., Kawachi, K., Kurihara, T.: Hand link modelling and motion generation from motion capture data based on 3D joint kinematics. In: Proceedings SAE International, Iowa (2005)
50. Morasso, P.: Spatial control of arm movements. Experimental Brain Research 42, 223-227 (1981)
51. Morasso, P., Mussa-Ivaldi, F.: Trajectory formation and handwriting: a computational model. Biological Cybernetics 45, 131-142 (1982)
52. Morasso, P., Sanguineti, V.: Ankle stiffness alone cannot stabilize upright standing. Journal of Neurophysiology 88, 2157-2162 (2002)
53. Occhipinti, E.: OCRA: a concise index for the assessment of exposure to repetitive movements of the upper limbs. Ergonomics 41, 1290-1311 (1998)
54. Perreault, E., Kirsch, R., Crago, P.: Multijoint dynamics and postural stability of the human arm. Experimental Brain Research 157, 507-517 (2004)
55. Porter, J.M., Case, K., Marshall, R., Gyi, D., Sims, R.: Beyond Jack and Jill: designing for individuals using HADRIAN. International Journal of Industrial Ergonomics 33, 249-264 (2004)
56. Prakash, N.: Differential Geometry: An Integrated Approach. Tata McGraw-Hill Publishing Company Limited (1981)
57. Pratt, J., Torres, A., Dilworth, P., Pratt, G.: Virtual actuator control. In: IEEE International Conference on Intelligent Robots and Systems, pp. 1219-1226 (1996)
58. Salini, J.: Dynamic control for the task/posture coordination of humanoids: toward synthesis of complex activities. PhD thesis, Pierre and Marie Curie University (2012)
59. Schaub, K., Landau, K., Menges, R., Grossmann, K.: A computer-aided tool for ergonomic workplace design and preventive health care. Human Factors and Ergonomics in Manufacturing 7, 269-304 (1997)
60. Sciavicco, L., Siciliano, B.: Modelling and Control of Robot Manipulators. Springer, London (2000)
61. Shadmehr, R., Mussa-Ivaldi, F.: Adaptive representation of dynamics during learning of a motor task. Journal of Neuroscience 14, 3208-3224 (1997)
62. Si, J., Zhang, N., Tang, R.: Modified fuzzy associative memory scheme using genetic algorithm. In: Proceedings of the 1999 Congress on Evolutionary Computation (CEC), vol. 3, pp. 2002-2006 (1999)
63. Slotine, J., Li, W.: Applied Nonlinear Control. Prentice-Hall, Englewood Cliffs, NJ (1991)
64. Smith, A.: Does the cerebellum learn strategies for the optimal time-varying control of joint stiffness? Behavioral and Brain Sciences 19, 399-410 (1996)
65. Tee, K., Franklin, D., Kawato, M., Milner, T., Burdet, E.: Concurrent adaptation of force and impedance in the redundant muscle system. Biological Cybernetics 102, 31-44 (2010)
66. Todorov, E., Jordan, M.: Smoothness maximization along a predefined path accurately predicts the speed profiles of complex arm movements. Journal of Neurophysiology 80, 697-714 (1998)
67. Uno, Y., Kawato, M., Suzuki, R.: Formation and control of optimal trajectory in human multijoint arm movement: minimum torque-change model. Biological Cybernetics 61, 89-101 (1989)
68. Vignes, R.: Modeling muscle fatigue in digital humans. Center for Computer-Aided Design, The University of Iowa. Tech. rep. (2004)
69. VSR Research Group: Technical report for project virtual soldier research. Center for Computer-Aided Design, The University of Iowa. Tech. rep. (2004)
70. Wolpert, D., Miall, C., Kawato, M.: Internal models in the cerebellum. Trends in Cognitive Sciences 2, 338-347 (1998)
71. Won, J., Hogan, N.: Stability properties of human reaching movements. Experimental Brain Research 107, 125-136 (1995)
72. Yang, C., Ganesh, G., Haddadin, S., Parusel, S., Albu-Schaeffer, A., Burdet, E.: Human-like adaptation of force and impedance in stable and unstable interactions. IEEE Transactions on Robotics 27, 918-930 (2011)

A Relation between cartesian space and joint space

Using Eq. 1, the interaction dynamics for a given interaction wrench W^i_{end} is:

\[ M\dot{T} + NT + G = L\tau + J^T_c W_c + J^T_{end}W^i_{end} \tag{33} \]

In this paper, we treat the DHM control where the floating base is the foot, and we consider cases with the foot fixed to the ground. In this way, we obtain a completely actuated DHM with fixed-base robot characteristics. The dynamic model of the DHM is:

\[ M_q\ddot{q} + N_q\dot{q} + G_q = \tau + J^T_{c,q}W_c + J^T_{end,q}W^i_{end} \tag{34} \]

with M_q = L^TML, N_q = L^TNL, G_q = L^TG, J^T_{c,q} = L^TJ^T_c and J^T_{end,q} = L^TJ^T_{end}. When the only contact with the ground is the foot and it is the root, we obtain J_cL = 0. Since \rho = Sq, where S = [I \; 0] selects the subset of actuated degrees of freedom yielding a dynamic model independent of the non-sliding contact forces at known fixed locations (such as the contacts between the feet and the ground), we can write the system as:

\[ M_\rho\ddot\rho + N_\rho\dot\rho + G_\rho = \tau_\rho + J^T_{end,\rho}W^i_{end} \tag{35} \]

with M_\rho = SM_qS^T, N_\rho = SN_qS^T, G_\rho = SG_q and J^T_{end,\rho} = SJ^T_{end,q}.
From Eq. 35 and since \delta W^i_{end} = K_{end}\,\mathrm{vec}(H^{-1}_{end}\delta H_{end}) = K_{end}J_{end,q}\delta q = K_{end}J_{end,q}\delta(S^T\rho) = K_{end}J_{end,\rho}\delta\rho, we obtain:

\[ \delta\tau_\rho + \delta(J^T_{end,\rho}W^i_{end}) = \delta\tau_\rho + (\delta J^T_{end,\rho})W^i_{end} + J^T_{end,\rho}\delta W^i_{end} = \delta\tau_\rho + (\delta J^T_{end,\rho})W^i_{end} + J^T_{end,\rho}K_{end}J_{end,\rho}\,\delta\rho = 0 \tag{36} \]

Since \delta\tau_\rho = -K_\rho\delta\rho, Eq. 36 gives:

\[ K_\rho = -\frac{\delta\tau_\rho}{\delta\rho} = J^T_{end,\rho}K_{end}J_{end,\rho} + \frac{\partial J^T_{end,\rho}}{\partial\rho}W^i_{end} \tag{37} \]

Finally, the cartesian impedance is:

\[ K_{end} = J^{\dagger T}_{end,\rho}\left(K_\rho - \frac{\partial J^T_{end,\rho}}{\partial\rho}W^i_{end}\right)J^{\dagger}_{end,\rho} \tag{38} \]

with J^{\dagger}_{end,\rho} the dynamic pseudoinverse [38], defined as:

\[ J^{\dagger}_{end,\rho} = M^{-1}_\rho J^T_{end,\rho}(J_{end,\rho}M^{-1}_\rho J^T_{end,\rho})^{-1} \tag{39} \]

B_{end} = J^{\dagger T}_{end,\rho}B_\rho J^{\dagger}_{end,\rho} is obtained similarly.

B Convergence analysis

B.1 Motion error cost function

In the following, we write J^{\dagger} for J^{\dagger}_{end,\rho} and J for J_{end,\rho}. The first derivative of ME (Eq. 5) is:

\[ \dot{ME} = \frac{1}{2}\frac{d}{dt}[\epsilon^T(J^{\dagger T}M_\rho J^{\dagger})\epsilon] = \frac{1}{2}[\dot\epsilon^T(J^{\dagger T}M_\rho J^{\dagger})\epsilon + \epsilon^T(\dot J^{\dagger T}M_\rho J^{\dagger} + J^{\dagger T}\dot M_\rho J^{\dagger} + J^{\dagger T}M_\rho\dot J^{\dagger})\epsilon + \epsilon^T(J^{\dagger T}M_\rho J^{\dagger})\dot\epsilon] \tag{40} \]

with

\[ \epsilon = V - V^*, \qquad \dot\epsilon = A - A^* \tag{41} \]

where V^* = V^d - b\,\delta(H^d,H^r), V^d is the velocity given by the minimum jerk planner, and A^* is the derivative of V^*. Since the matrix M_\rho is symmetric, we obtain:

\[ \dot{ME} = \epsilon^T(J^{\dagger T}M_\rho J^{\dagger})\dot\epsilon + \frac{1}{2}\epsilon^T(J^{\dagger T}\dot M_\rho J^{\dagger})\epsilon + \epsilon^T(J^{\dagger T}M_\rho\dot J^{\dagger})\epsilon \tag{42} \]

The relationship between the \rho-velocity and the cartesian velocity is:

\[ V = J\dot\rho \;\Rightarrow\; \dot\rho = J^{\dagger}V \tag{43} \]

Differentiating Eq. 43, the cartesian acceleration is:

\[ A = J\ddot\rho + \dot J\dot\rho \tag{44} \]

so the joint-space motion can be represented in cartesian coordinates through:

\[ \ddot\rho = J^{\dagger}(A - \dot J\dot\rho) = J^{\dagger}(A - \dot JJ^{\dagger}V) \tag{45} \]

Substituting Eqs. 45 and 43 into Eq. 35 yields:

\[ M_\rho J^{\dagger}[A - \dot JJ^{\dagger}V] + N_\rho J^{\dagger}V + G_\rho = \tau_\rho + J^TW^i_{end} \tag{46} \]

Multiplying both sides by J^{\dagger T}, we obtain:

\[ (J^{\dagger T}M_\rho J^{\dagger})A = [-J^{\dagger T}N_\rho J^{\dagger} + J^{\dagger T}M_\rho J^{\dagger}\dot JJ^{\dagger}]V - J^{\dagger T}G_\rho + J^{\dagger T}\tau_\rho + W^i_{end} \tag{47} \]

Using Eq. 10, we obtain:

\[ (J^{\dagger T}M_\rho J^{\dagger})A = [-J^{\dagger T}N_\rho J^{\dagger} + J^{\dagger T}M_\rho J^{\dagger}\dot JJ^{\dagger}]V - J^{\dagger T}G_\rho + J^{\dagger T}\tau^{ff}_\rho - W^d_{end} - J^{\dagger T}\tau^l_\rho + W^i_{end} \tag{48} \]

where \tau^{ff}_\rho is the torque compensating for the DHM dynamics. By definition, it can be written as:

\[ \tau^{ff}_\rho \equiv M_\rho\ddot\rho^* + N_\rho\dot\rho^* + G_\rho \equiv M_\rho J^{\dagger}A^* + [N_\rho J^{\dagger} - M_\rho J^{\dagger}\dot JJ^{\dagger}]V^* + G_\rho \tag{49} \]

Using Eq. 41 and substituting Eq. 49 into Eq. 48 yields:

\[ (J^{\dagger T}M_\rho J^{\dagger})\dot\epsilon = [-J^{\dagger T}N_\rho J^{\dagger} + J^{\dagger T}M_\rho J^{\dagger}\dot JJ^{\dagger}]\epsilon + W^i_{end} - J^{\dagger T}\tau^l_\rho - W^d_{end} \tag{50} \]

Substituting Eq. 50 into Eq. 42 yields:

\[ \dot{ME} = \epsilon^T[(-J^{\dagger T}N_\rho J^{\dagger} + J^{\dagger T}M_\rho J^{\dagger}\dot JJ^{\dagger})\epsilon + W^i_{end} - J^{\dagger T}\tau^l_\rho - W^d_{end}] + \frac{1}{2}\epsilon^T(J^{\dagger T}\dot M_\rho J^{\dagger})\epsilon + \epsilon^T(J^{\dagger T}M_\rho\dot J^{\dagger})\epsilon = \frac{1}{2}\epsilon^T[J^{\dagger T}(\dot M_\rho - 2N_\rho)J^{\dagger}]\epsilon + \epsilon^T[W^i_{end} - J^{\dagger T}\tau^l_\rho - W^d_{end}] + \epsilon^T[J^{\dagger T}M_\rho\dot J^{\dagger} + J^{\dagger T}M_\rho J^{\dagger}\dot JJ^{\dagger}]\epsilon \tag{51} \]

The matrix \dot M_\rho - 2N_\rho is skew-symmetric [60], and for this reason we have:

\[ \epsilon^T(J^{\dagger T}(\dot M_\rho - 2N_\rho)J^{\dagger})\epsilon = 0 \tag{52} \]

Let us now analyze the third term of Eq. 51.
B Convergence Analysis

B.1 Motion error cost function

The first derivative of $ME$ (Eq. 5) can be calculated as follows:

\[ \dot{ME} = \frac{1}{2}\frac{d}{dt}\left[\epsilon^T\big(J_{end,\rho}^{\dagger T} M_\rho J_{end,\rho}^\dagger\big)\epsilon\right] = \frac{1}{2}\left[\dot{\epsilon}^T\big(J_{end,\rho}^{\dagger T} M_\rho J_{end,\rho}^\dagger\big)\epsilon + \epsilon^T\big(\dot{J}_{end,\rho}^{\dagger T} M_\rho J_{end,\rho}^\dagger + J_{end,\rho}^{\dagger T}\dot{M}_\rho J_{end,\rho}^\dagger + J_{end,\rho}^{\dagger T} M_\rho\dot{J}_{end,\rho}^\dagger\big)\epsilon + \epsilon^T\big(J_{end,\rho}^{\dagger T} M_\rho J_{end,\rho}^\dagger\big)\dot{\epsilon}\right] \tag{40} \]

with

\[ \epsilon = V - V^*, \qquad \dot{\epsilon} = A - A^* \tag{41} \]

and $V^* = V^d - b\,\delta(H^d, H^r)$, where $V^d$ is the velocity produced by the minimum-jerk planner and $A^*$ is the derivative of $V^*$. Since the matrix $M_\rho$ is symmetric, we obtain:

\[ \dot{ME} = \epsilon^T\big(J_{end,\rho}^{\dagger T} M_\rho J_{end,\rho}^\dagger\big)\dot{\epsilon} + \frac{1}{2}\epsilon^T\big(J_{end,\rho}^{\dagger T}\dot{M}_\rho J_{end,\rho}^\dagger\big)\epsilon + \epsilon^T\big(J_{end,\rho}^{\dagger T} M_\rho\dot{J}_{end,\rho}^\dagger\big)\epsilon \tag{42} \]

The relationship between the $\rho$ velocity and the Cartesian-space velocity is:

\[ V = J_{end,\rho}\,\dot{\rho} \;\Rightarrow\; \dot{\rho} = J_{end,\rho}^\dagger V \tag{43} \]

Differentiating Eq. 43, the Cartesian acceleration is:

\[ A = J_{end,\rho}\,\ddot{\rho} + \dot{J}_{end,\rho}\,\dot{\rho} \tag{44} \]

so the joint-space motion can also be represented in Cartesian coordinates through:

\[ \ddot{\rho} = J_{end,\rho}^\dagger\big(A - \dot{J}_{end,\rho}\dot{\rho}\big) = J_{end,\rho}^\dagger\big(A - \dot{J}_{end,\rho} J_{end,\rho}^\dagger V\big) \tag{45} \]

Substituting Eqs. 45 and 43 into Eq. 35 yields:

\[ M_\rho J_{end,\rho}^\dagger\big[A - \dot{J}_{end,\rho} J_{end,\rho}^\dagger V\big] + N_\rho J_{end,\rho}^\dagger V + G_\rho = \tau_\rho + J_{end,\rho}^T W_{end}^i \tag{46} \]

Multiplying both sides by $J_{end,\rho}^{\dagger T}$, we obtain:

\[ \big(J_{end,\rho}^{\dagger T} M_\rho J_{end,\rho}^\dagger\big) A = \big[-J_{end,\rho}^{\dagger T} N_\rho J_{end,\rho}^\dagger + J_{end,\rho}^{\dagger T} M_\rho J_{end,\rho}^\dagger\dot{J}_{end,\rho} J_{end,\rho}^\dagger\big] V - J_{end,\rho}^{\dagger T} G_\rho + J_{end,\rho}^{\dagger T}\tau_\rho + W_{end}^i \tag{47} \]

Using Eq. 10, we obtain:

\[ \big(J_{end,\rho}^{\dagger T} M_\rho J_{end,\rho}^\dagger\big) A = \big[-J_{end,\rho}^{\dagger T} N_\rho J_{end,\rho}^\dagger + J_{end,\rho}^{\dagger T} M_\rho J_{end,\rho}^\dagger\dot{J}_{end,\rho} J_{end,\rho}^\dagger\big] V - J_{end,\rho}^{\dagger T} G_\rho + J_{end,\rho}^{\dagger T}\tau_\rho^{ff} - W_{end}^d - J_{end,\rho}^{\dagger T}\tau_\rho^l + W_{end}^i \tag{48} \]

where $\tau_\rho^{ff}$ is the torque that compensates for the DHM dynamics. By definition, it can be written as:

\[ \tau_\rho^{ff} \equiv M_\rho\ddot{\rho}^* + N_\rho\dot{\rho}^* + G_\rho \equiv M_\rho J_{end,\rho}^\dagger A^* + \big[N_\rho J_{end,\rho}^\dagger - M_\rho J_{end,\rho}^\dagger\dot{J}_{end,\rho} J_{end,\rho}^\dagger\big] V^* + G_\rho \tag{49} \]

Using Eq. 41 and substituting Eq. 49 into Eq. 48 yields:

\[ \big(J_{end,\rho}^{\dagger T} M_\rho J_{end,\rho}^\dagger\big)\dot{\epsilon} = \big[-J_{end,\rho}^{\dagger T} N_\rho J_{end,\rho}^\dagger + J_{end,\rho}^{\dagger T} M_\rho J_{end,\rho}^\dagger\dot{J}_{end,\rho} J_{end,\rho}^\dagger\big]\epsilon + W_{end}^i - J_{end,\rho}^{\dagger T}\tau_\rho^l - W_{end}^d \tag{50} \]

Substituting Eq. 50 into Eq. 42 yields:

\[ \dot{ME} = \frac{1}{2}\epsilon^T\big[J_{end,\rho}^{\dagger T}(\dot{M}_\rho - 2N_\rho) J_{end,\rho}^\dagger\big]\epsilon + \epsilon^T\big[W_{end}^i - J_{end,\rho}^{\dagger T}\tau_\rho^l - W_{end}^d\big] + \epsilon^T\big[J_{end,\rho}^{\dagger T} M_\rho\dot{J}_{end,\rho}^\dagger + J_{end,\rho}^{\dagger T} M_\rho J_{end,\rho}^\dagger\dot{J}_{end,\rho} J_{end,\rho}^\dagger\big]\epsilon \tag{51} \]

The matrix $\dot{M}_\rho - 2N_\rho$ is skew-symmetric [60], hence:

\[ \epsilon^T\big(J_{end,\rho}^{\dagger T}(\dot{M}_\rho - 2N_\rho) J_{end,\rho}^\dagger\big)\epsilon = 0 \tag{52} \]

Let us now analyze the third term of Eq. 51. Using Eq. 39 and the facts that $J_{end,\rho} J_{end,\rho}^\dagger = I$ and $\dot{J}_{end,\rho} J_{end,\rho}^\dagger + J_{end,\rho}\dot{J}_{end,\rho}^\dagger = 0$, we obtain:

\[ J_{end,\rho}^{\dagger T} M_\rho\dot{J}_{end,\rho}^\dagger = \big(J_{end,\rho} M_\rho^{-1} J_{end,\rho}^T\big)^{-1} J_{end,\rho} M_\rho^{-1} M_\rho\dot{J}_{end,\rho}^\dagger = \big(J_{end,\rho} M_\rho^{-1} J_{end,\rho}^T\big)^{-1} J_{end,\rho}\dot{J}_{end,\rho}^\dagger \]
\[ J_{end,\rho}^{\dagger T} M_\rho J_{end,\rho}^\dagger\dot{J}_{end,\rho} J_{end,\rho}^\dagger = \big(J_{end,\rho} M_\rho^{-1} J_{end,\rho}^T\big)^{-1} J_{end,\rho} M_\rho^{-1} M_\rho J_{end,\rho}^\dagger\dot{J}_{end,\rho} J_{end,\rho}^\dagger = -\big(J_{end,\rho} M_\rho^{-1} J_{end,\rho}^T\big)^{-1} J_{end,\rho}\dot{J}_{end,\rho}^\dagger \tag{53} \]

so the two contributions cancel. Substituting Eqs. 52 and 53 into Eq. 51, we obtain:

\[ \dot{ME} = \epsilon^T\big[W_{end}^i - J_{end,\rho}^{\dagger T}\tau_\rho^l - W_{end}^d\big] \tag{54} \]

Using Eqs. 54, 11 and 38, we have:

\[ \dot{ME} = -\epsilon^T B_{end}^{ini}\epsilon - \epsilon^T K_{end}^l\,\delta(H^d, H^r) - \epsilon^T B_{end}^l\,\delta(V^d, V^r) - \epsilon^T J_{end,\rho}^{\dagger T}\tau_\rho^l + \epsilon^T W_{end}^i = -\epsilon^T B_{end}^{ini}\epsilon - \epsilon^T\left[J_{end,\rho}^{\dagger T}\left(K_\rho^l - \frac{\partial J_{end,\rho}^T}{\partial\rho} W_{end}^i\right) J_{end,\rho}^\dagger\right]\delta(H^d, H^r) - \epsilon^T\big(J_{end,\rho}^{\dagger T} B_\rho^l J_{end,\rho}^\dagger\big)\,\delta(V^d, V^r) - \epsilon^T J_{end,\rho}^{\dagger T}\tau_\rho^l + \epsilon^T W_{end}^i \tag{55} \]

We can derive $\delta ME(t) = ME(t) - ME(t-D)$ from Eqs. 55 and 8 as:

\[ \delta ME(t) = \int_{t-D}^{t}\Big\{-\epsilon^T(\sigma) B_{end}^{ini}(\sigma)\epsilon(\sigma) - \epsilon^T(\sigma)\big[J_{end,\rho}^{\dagger T}\tilde{K} J_{end,\rho}^\dagger\big](\sigma)\,\delta(H^d, H^r)(\sigma) - \epsilon^T(\sigma)\big[J_{end,\rho}^{\dagger T}\tilde{B} J_{end,\rho}^\dagger\big](\sigma)\,\delta(V^d, V^r)(\sigma) - \epsilon^T(\sigma)\big[J_{end,\rho}^{\dagger T}\tilde{\tau}\big](\sigma) - \epsilon^T(\sigma)\big[J_{end,\rho}^{\dagger T} K_\rho^{min} J_{end,\rho}^\dagger\big](\sigma)\,\delta(H^d, H^r)(\sigma) - \epsilon^T(\sigma)\big[J_{end,\rho}^{\dagger T} B_\rho^{min} J_{end,\rho}^\dagger\big](\sigma)\,\delta(V^d, V^r)(\sigma) - \epsilon^T(\sigma)\big[J_{end,\rho}^{\dagger T}\tau_\rho^{min}\big](\sigma) + \epsilon^T(\sigma) W_{end}^i(\sigma)\Big\}\,d\sigma \tag{56} \]

Any smooth interaction force can be approximated by the linear terms of its Taylor expansion along the reference trajectory:

\[ W_{end}^i(t) = W_{end}^{i,0}(t) + \big[J_{end,\rho}^{\dagger T} K_\rho^i J_{end,\rho}^\dagger\big](t)\,\delta(H^d, H^r) + \big[J_{end,\rho}^{\dagger T} B_\rho^i J_{end,\rho}^\dagger\big](t)\,\delta(V^d, V^r) \tag{57} \]

where $W_{end}^{i,0}$ is the zero-order term compensated by $J_{end,\rho}^{\dagger T}\tau^{min}$, and $[J_{end,\rho}^{\dagger T} K_\rho^i J_{end,\rho}^\dagger]$ and $[J_{end,\rho}^{\dagger T} B_\rho^i J_{end,\rho}^\dagger]$ are the first-order coefficients. From Eqs. 57 and 41, we can obtain the values of $K_\rho^{min}(t)$, $B_\rho^{min}(t)$ and $\tau_\rho^{min}(t)$ that guarantee stability (Eq. 58). Different $W_{end}^i$ yield different values of $K_\rho^{min}(t)$, $B_\rho^{min}(t)$ and $\tau_\rho^{min}(t)$; when $W_{end}^i$ is zero or assists the tracking task ($\|\epsilon(t)\|\to 0$), $K_\rho^{min}(t)$, $B_\rho^{min}(t)$ and $\tau_\rho^{min}(t)$ are 0. $K_\rho^{min}(t)$, $B_\rho^{min}(t)$ and $\tau_\rho^{min}(t)$ represent the minimal stiffness, damping and feedforward effort required to guarantee

\[ \int_{t-D}^{t}\Big\{-\epsilon^T(\sigma)\big(J_{end,\rho}^{\dagger T} K_\rho^{min} J_{end,\rho}^\dagger\big)(\sigma)\,\delta(H^d, H^r)(\sigma) - \epsilon^T(\sigma)\big(J_{end,\rho}^{\dagger T} B_\rho^{min} J_{end,\rho}^\dagger\big)(\sigma)\,\delta(V^d, V^r)(\sigma) - \epsilon^T(\sigma) J_{end,\rho}^{\dagger T}\tau_\rho^{min}(\sigma) + \epsilon^T(\sigma) W_{end}^i(\sigma)\Big\}\,d\sigma \le 0 \tag{58} \]

so that from Eq. 55 we have $\int_{t-D}^{t}\dot{ME}(\sigma)\,d\sigma \le 0$. From Eqs. 56 and 58, we can write:

\[ \delta ME(t) \le \int_{t-D}^{t}\Big\{-\epsilon^T(\sigma) B_{end}^{ini}(\sigma)\epsilon(\sigma) - \epsilon^T(\sigma)\big(J_{end,\rho}^{\dagger T}\tilde{K} J_{end,\rho}^\dagger\big)(\sigma)\,\delta(H^d, H^r)(\sigma) - \epsilon^T(\sigma)\big(J_{end,\rho}^{\dagger T}\tilde{B} J_{end,\rho}^\dagger\big)(\sigma)\,\delta(V^d, V^r)(\sigma) - \epsilon^T(\sigma)\big(J_{end,\rho}^{\dagger T}\tilde{\tau}\big)(\sigma)\Big\}\,d\sigma \tag{59} \]

B.2 Metabolic cost function

The metabolic cost function is:

\[ MC(t) = \frac{1}{2}\int_{t-D}^{t}\tilde{\Phi}^T(\sigma)\, Q^{-1}\tilde{\Phi}(\sigma)\,d\sigma \tag{60} \]

Given the definitions of $\Phi(t)$ and $Q$, and the following properties of the $\mathrm{vec}(\cdot)$, $\otimes$ and $\mathrm{tr}(\cdot)$ operators:

\[ \mathrm{vec}(\Omega Y U) = (U^T\otimes\Omega)\,\mathrm{vec}(Y), \qquad \mathrm{tr}(\Omega Y) = \mathrm{vec}(\Omega^T)^T\mathrm{vec}(Y), \qquad \mathrm{tr}(\Omega Y) = \mathrm{tr}(Y\Omega) \tag{61} \]

and using the symmetry of $Q_K^{-1}$, we obtain:

\[ \mathrm{vec}(\tilde{K}^T)^T (I\otimes Q_K)^{-1}\,\mathrm{vec}(\tilde{K}^T) = \mathrm{vec}(\tilde{K}^T)^T\big((Q_K^{-1})^T\otimes I\big)\,\mathrm{vec}(\tilde{K}^T) = \mathrm{vec}(\tilde{K}^T)^T\,\mathrm{vec}(\tilde{K}^T Q_K^{-1}) = \mathrm{tr}\{\tilde{K}\tilde{K}^T Q_K^{-1}\} = \mathrm{tr}\{\tilde{K}^T Q_K^{-1}\tilde{K}\} \tag{62} \]

The terms corresponding to $\tilde{B}$ and $\tilde{\tau}$ are found in the same way. We can therefore define $\delta MC(t) = MC(t) - MC(t-D)$ as:

\[ \delta MC(t) = \frac{1}{2}\int_{t-D}^{t}\Big\{\mathrm{tr}\{\tilde{K}^T(\sigma) Q_K^{-1}\tilde{K}(\sigma) - \tilde{K}^T(\sigma-D) Q_K^{-1}\tilde{K}(\sigma-D)\} + \mathrm{tr}\{\tilde{B}^T(\sigma) Q_B^{-1}\tilde{B}(\sigma) - \tilde{B}^T(\sigma-D) Q_B^{-1}\tilde{B}(\sigma-D)\} + \tilde{\tau}^T(\sigma) Q_\tau^{-1}\tilde{\tau}(\sigma) - \tilde{\tau}^T(\sigma-D) Q_\tau^{-1}\tilde{\tau}(\sigma-D)\Big\}\,d\sigma \tag{63} \]
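The operator identities of Eq. 61, on which the trace manipulations of Eq. 62 rest, can be checked numerically in a few lines. The Python sketch below uses random matrices and the standard column-stacking convention for $\mathrm{vec}(\cdot)$; all dimensions are arbitrary choices for the test.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 3, 4, 5
Omega = rng.standard_normal((m, n))
Y = rng.standard_normal((n, p))
U = rng.standard_normal((p, 2))
A = rng.standard_normal((n, m))

def vec(X):
    return X.reshape(-1, order="F")   # column-stacking vec(.)

# vec(Omega Y U) = (U^T kron Omega) vec(Y)
assert np.allclose(vec(Omega @ Y @ U), np.kron(U.T, Omega) @ vec(Y))
# tr(Omega A) = vec(Omega^T)^T vec(A)
assert np.isclose(np.trace(Omega @ A), vec(Omega.T) @ vec(A))
# tr(Omega A) = tr(A Omega)  (cyclic property)
assert np.isclose(np.trace(Omega @ A), np.trace(A @ Omega))
print("Eq. 61 identities verified")
```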
From Eqs. 13, 14 and 16, we obtain:

\[ \delta K = Q_K\big\{J_{end,\rho}^\dagger\big[\epsilon(t)\,\delta(H^d, H^r)^T\big] J_{end,\rho}^{\dagger T} - \gamma(t)\, K_\rho^l(t)\big\} \]
\[ \delta B = Q_B\big\{J_{end,\rho}^\dagger\big[\epsilon(t)\,\delta(V^d, V^r)^T\big] J_{end,\rho}^{\dagger T} - \gamma(t)\, B_\rho^l(t)\big\} \]
\[ \delta\tau = Q_\tau\big\{J_{end,\rho}^\dagger\,\epsilon(t) - \gamma(t)\,\tau_\rho^l(t)\big\} \tag{64} \]

Using the symmetry of $Q_K^{-1}$, the relation $\tilde{K}(\sigma) - \tilde{K}(\sigma-D) = \delta K(\sigma)$ and Eq. 64, the first term in the integrand of Eq. 63 can be written as:

\[ \mathrm{tr}\{\tilde{K}^T(\sigma) Q_K^{-1}\tilde{K}(\sigma) - \tilde{K}^T(\sigma-D) Q_K^{-1}\tilde{K}(\sigma-D)\} = \mathrm{tr}\{[\tilde{K}(\sigma) - \tilde{K}(\sigma-D)]^T Q_K^{-1}[2\tilde{K}(\sigma) - \tilde{K}(\sigma) + \tilde{K}(\sigma-D)]\} = \mathrm{tr}\{\delta K^T(\sigma) Q_K^{-1}[2\tilde{K}(\sigma) - \delta K(\sigma)]\} = -\mathrm{tr}\{\delta K^T(\sigma) Q_K^{-1}\delta K(\sigma)\} + 2\,\mathrm{tr}\{\delta K(\sigma) Q_K^{-1}\tilde{K}(\sigma)\} = -\mathrm{tr}\{\delta K^T(\sigma) Q_K^{-1}\delta K(\sigma)\} + 2\epsilon^T(\sigma)\big(J_{end,\rho}^{\dagger T}\tilde{K} J_{end,\rho}^\dagger\big)(\sigma)\,\delta(H^d, H^r)(\sigma) - 2\gamma(\sigma)\,\mathrm{tr}\{(K_\rho^l)^T(\sigma)\tilde{K}(\sigma)\} \tag{65} \]

In the same way, the second term in the integrand of Eq. 63 is:

\[ \mathrm{tr}\{\tilde{B}^T(\sigma) Q_B^{-1}\tilde{B}(\sigma) - \tilde{B}^T(\sigma-D) Q_B^{-1}\tilde{B}(\sigma-D)\} = -\mathrm{tr}\{\delta B^T(\sigma) Q_B^{-1}\delta B(\sigma)\} + 2\epsilon^T(\sigma)\big(J_{end,\rho}^{\dagger T}\tilde{B} J_{end,\rho}^\dagger\big)(\sigma)\,\delta(V^d, V^r)(\sigma) - 2\gamma(\sigma)\,\mathrm{tr}\{(B_\rho^l)^T(\sigma)\tilde{B}(\sigma)\} \tag{66} \]

and the third term is:

\[ \tilde{\tau}^T(\sigma) Q_\tau^{-1}\tilde{\tau}(\sigma) - \tilde{\tau}^T(\sigma-D) Q_\tau^{-1}\tilde{\tau}(\sigma-D) = -\delta\tau^T(\sigma) Q_\tau^{-1}\delta\tau(\sigma) + 2\epsilon^T(\sigma)\big(J_{end,\rho}^{\dagger T}\tilde{\tau}\big)(\sigma) - 2\gamma(\sigma)(\tau_\rho^l)^T(\sigma)\tilde{\tau}(\sigma) \tag{67} \]

Replacing Eqs. 65, 66 and 67 into Eq. 63, we obtain:

\[ \delta MC(t) = -\frac{1}{2}\int_{t-D}^{t}\big[\delta\tilde{\Phi}^T(\sigma) Q^{-1}\delta\tilde{\Phi}(\sigma)\big]d\sigma - \int_{t-D}^{t}\big[\gamma(\sigma)\tilde{\Phi}^T(\sigma)\Phi(\sigma)\big]d\sigma + \int_{t-D}^{t}\big[\epsilon^T(\sigma)\big(J_{end,\rho}^{\dagger T}\tilde{K} J_{end,\rho}^\dagger\big)(\sigma)\,\delta(H^d, H^r)(\sigma) + \epsilon^T(\sigma)\big(J_{end,\rho}^{\dagger T}\tilde{B} J_{end,\rho}^\dagger\big)(\sigma)\,\delta(V^d, V^r)(\sigma) + \epsilon^T(\sigma)\big(J_{end,\rho}^{\dagger T}\tilde{\tau}\big)(\sigma)\big]d\sigma \tag{68} \]

Combining Eqs. 59 and 68, we obtain the first difference of the cost function:

\[ \delta C(t) = C(t) - C(t-D) = \delta ME(t) + \delta MC(t) \le -\frac{1}{2}\int_{t-D}^{t}\big[\delta\tilde{\Phi}^T(\sigma) Q^{-1}\delta\tilde{\Phi}(\sigma)\big]d\sigma - \int_{t-D}^{t}\big[\gamma(\sigma)\tilde{\Phi}^T(\sigma)\tilde{\Phi}(\sigma) + \gamma(\sigma)\tilde{\Phi}^T(\sigma)\Phi^d(\sigma) + \epsilon^T(\sigma) B_{end}^{ini}(\sigma)\epsilon(\sigma)\big]d\sigma \tag{69} \]

To obtain $\delta C(t)\le 0$, a sufficient condition is:

\[ \epsilon^T B_{end}^{ini}\epsilon + \gamma\tilde{\Phi}^T\tilde{\Phi} + \gamma\tilde{\Phi}^T\Phi^d \ge \lambda_B\|\epsilon\|^2 + \gamma\|\tilde{\Phi}\|^2 - \gamma\|\tilde{\Phi}\|\,\|\Phi^d\| \ge 0 \tag{70} \]

where $\lambda_B$ is the infimum of the smallest eigenvalue of $B_{end}^{ini}$. Replacing $\gamma(t) = \frac{p}{1+u\|\epsilon(t)\|^2}$ into Eq. 70, we obtain:

\[ \lambda_B u\|\epsilon\|^4 + \lambda_B\|\epsilon\|^2 + p\|\tilde{\Phi}\|^2 - p\|\tilde{\Phi}\|\,\|\Phi^d\| \ge 0 \tag{71} \]

To find the region of points $(\|\epsilon\|^2, \|\tilde{\Phi}\|)$ for which Eq. 71 holds, we first determine the set of points satisfying:

\[ \lambda_B u\|\epsilon\|^4 + \lambda_B\|\epsilon\|^2 + p\|\tilde{\Phi}\|^2 - p\|\tilde{\Phi}\|\,\|\Phi^d\| = 0 \tag{72} \]

Eq. 72 is an ellipse passing through the points $(\|\epsilon\|^2 = 0, \|\tilde{\Phi}\| = 0)$ and $(\|\epsilon\|^2 = 0, \|\tilde{\Phi}\| = \|\Phi^d\|)$. Completing the squares yields its canonical equation:

\[ \frac{\lambda_B u\big(\|\epsilon\|^2 + \frac{1}{2u}\big)^2 + p\big(\|\tilde{\Phi}\| - \frac{\|\Phi^d\|}{2}\big)^2}{\frac{\lambda_B}{4u} + \frac{p\|\Phi^d\|^2}{4}} = 1 \tag{73} \]

By the Krasovskii-LaSalle principle, $\|\epsilon\|^2$ and $\|\tilde{\Phi}\|$ converge to an invariant set $\Omega_s\subseteq\Omega$ on which $\delta C(t) = 0$, where $\Omega$ is the bounding set:

\[ \Omega \equiv \left\{(\|\epsilon\|^2, \|\tilde{\Phi}\|) : \frac{\lambda_B u\big(\|\epsilon\|^2 + \frac{1}{2u}\big)^2 + p\big(\|\tilde{\Phi}\| - \|\Phi^d\|/2\big)^2}{\frac{\lambda_B}{4u} + \frac{p\|\Phi^d\|^2}{4}} \le 1\right\} \tag{74} \]

If the parameter $\gamma$ is constant [24], the bounding set is:

\[ \left\{(\|\epsilon\|^2, \|\tilde{\Phi}\|) : \frac{4\lambda_B\|\epsilon\|^2 + 4\gamma\big(\|\tilde{\Phi}\| - \|\Phi^d\|/2\big)^2}{\gamma\|\Phi^d\|^2} \le 1\right\} \tag{75} \]

$\gamma$ does not affect whether convergence occurs, but it does affect the convergence speed and the size of the convergence set.
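To make the preceding analysis concrete, here is a minimal Python sketch of one iteration of the update laws in Eq. 64, together with the adaptive forgetting factor $\gamma(t) = p/(1 + u\|\epsilon\|^2)$ used in Eq. 71. All variable names and the surrounding scaffolding are ours; the gain matrices $Q_K$, $Q_B$, $Q_\tau$, the pseudo-inverse and the tracking errors are assumed to be available as NumPy arrays.

```python
import numpy as np

def gamma_factor(eps, p=0.1, u=1.0):
    """Adaptive forgetting factor gamma(t) = p / (1 + u ||eps||^2)."""
    return p / (1.0 + u * float(eps @ eps))

def learning_step(J_dag, eps, dH, dV, K_l, B_l, tau_l,
                  Q_K, Q_B, Q_tau, p=0.1, u=1.0):
    """One application of the update laws in Eq. 64: error-driven
    growth of the learned stiffness, damping and feedforward torque,
    regularized by the gamma-weighted forgetting term."""
    g = gamma_factor(eps, p, u)
    dK = Q_K @ (J_dag @ np.outer(eps, dH) @ J_dag.T - g * K_l)
    dB = Q_B @ (J_dag @ np.outer(eps, dV) @ J_dag.T - g * B_l)
    dtau = Q_tau @ (J_dag @ eps - g * tau_l)
    return K_l + dK, B_l + dB, tau_l + dtau
```

Consistent with the remark after Eq. 75, the choice of p and u does not decide whether the scheme converges, only how fast it does so and how large the residual set is.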
C Minimum jerk

C.1 Formal definition

Using Eq. 20, the integrand can be written as:

\[ P = \left\|\frac{d^3}{dt^3} r(s)\right\|^2 = \left\|\frac{d^2}{dt^2}\big(r'(s)\dot{s}\big)\right\|^2 = \left\|\frac{d}{dt}\big(r''(s)\dot{s}^2 + r'(s)\ddot{s}\big)\right\|^2 = \big\|r'''(s)\dot{s}^3 + 3r''(s)\dot{s}\ddot{s} + r'(s)\dddot{s}\big\|^2 \tag{76} \]

To make explicit the invariance of the minimization problem in Eq. 76 under rotations and translations, a 3D curve can be uniquely defined [56] by its curvature $R(s)$ and its torsion $\eta(s)$. The path $r$ satisfies Frenet's formulas:

\[ t' = Rn, \qquad n' = \eta b - Rt, \qquad b' = -\eta n \tag{77} \]

From differential geometry, we know that:

\[ r' = t, \qquad r'' = Rn, \qquad r''' = R'n + R(\eta b - Rt) \tag{78} \]

Replacing Eq. 78 in Eq. 76, we obtain:

\[ P = \big\|n(R'\dot{s}^3 + 3R\dot{s}\ddot{s}) + t(\dddot{s} - R^2\dot{s}^3) + b(\dot{s}^3 R\eta)\big\|^2 \tag{79} \]

Since $n$, $t$ and $b$ are orthogonal, this reduces to:

\[ P = (R'\dot{s}^3 + 3R\dot{s}\ddot{s})^2 + (\dddot{s} - R^2\dot{s}^3)^2 + (\dot{s}^3 R\eta)^2 \tag{80} \]

C.2 Relation to the 2/3 power law

We want to relate Eq. 80 to the 2/3 power law. To this end, we define the function:

\[ Z_s = \dot{s}^3 R(s) \tag{81} \]

$Z_s$ corresponds to the term multiplying the torsion $\eta$ in Eq. 80. Differentiating Eq. 81 with respect to time, we obtain:

\[ R'(s)\dot{s}^4 + 3\dot{s}^2\ddot{s}R(s) = Z'_s\dot{s} \;\Rightarrow\; R'(s)\dot{s}^3 + 3\dot{s}\ddot{s}R(s) = Z'_s \tag{82} \]

The term $R'(s)\dot{s}^3 + 3\dot{s}\ddot{s}R(s)$ is equal to the term multiplying $n$ in Eq. 79. Substituting Eq. 82 into Eq. 79:

\[ P = \big\|n Z'_s + t(\dddot{s} - Z_s R) + b(Z_s\eta)\big\|^2 = Z_s'^2 + (\dddot{s} - Z_s R)^2 + Z_s^2\eta^2 \tag{83} \]

From Eq. 81, we have:

\[ \dot{s}(t) = Z_s^{1/3} R^{-1/3} \tag{84} \]

In the 2/3 power law, $Z_s^{1/3} = \text{const}$ and $Z'_s = 0$; this is equivalent to setting the coefficient of $n$ in the instantaneous jerk (Eq. 83) to zero, and the coefficient of $b$ proportional to the coefficient of $t$. To demonstrate this, we analyze the 2D power law:

\[ (x'^2 + y'^2)^{1/2} = \text{const}\left(\frac{\sqrt{(x'y'' - y'x'')^2}}{(x'^2 + y'^2)^{3/2}}\right)^{-1/3} \;\Rightarrow\; x'y'' - y'x'' = \text{const} \tag{85} \]

Taking derivatives of Eq. 85, we obtain:

\[ \frac{x'}{y'} = \frac{x'''}{y'''}, \qquad r' \parallel r''' \tag{86} \]

The jerk vector is therefore orthogonal to $n$ and aligned with $t$; the jerk component along $n$ is zero.
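As a numerical sanity check of Eq. 85, the Python sketch below traverses an ellipse with the harmonic timing law $x = a\cos\omega t$, $y = b\sin\omega t$, for which $x'y'' - y'x'' = ab\omega^3$ is constant, and verifies that the speed then scales as curvature to the power $-1/3$. The trajectory parameters are arbitrary choices for the test.

```python
import numpy as np

# Elliptic trajectory x = a cos(w t), y = b sin(w t): for this timing
# law x'y'' - y'x'' = a*b*w^3 is constant, so Eq. 85 predicts the
# speed profile v = const * kappa^(-1/3).
a, b, w = 2.0, 1.0, 1.5
t = np.linspace(0.0, 2 * np.pi / w, 400)
dx, dy = -a * w * np.sin(w * t), b * w * np.cos(w * t)        # velocity
ddx, ddy = -a * w**2 * np.cos(w * t), -b * w**2 * np.sin(w * t)  # acceleration

v = np.hypot(dx, dy)                          # speed
kappa = np.abs(dx * ddy - dy * ddx) / v**3    # planar curvature
print(np.ptp(v * kappa ** (1.0 / 3.0)))       # ~0: v scales as kappa^(-1/3)
```

The printed peak-to-peak variation of $v\kappa^{1/3}$ is near zero (floating-point error only), consistent with the 2/3 power law for this trajectory.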