key: cord-0618632-wl1uzqmq authors: Liu, Liyang; Fryc, Simon; Wu, Lan; Vu, Thanh; Paul, Gavin; Vidal-Calleja, Teresa title: Active and Interactive Mapping with Dynamic Gaussian ProcessImplicit Surfaces for Mobile Manipulators date: 2020-10-25 journal: nan DOI: nan sha: 622c27d3262f590e5f685a974fac97500a745788 doc_id: 618632 cord_uid: wl1uzqmq In this paper, we present an interactive probabilistic framework for a mobile manipulator which moves in the environment, makes changes and maps the changing scene alongside. The framework is motivated by interactive robotic applications found in warehouses, construction sites and additive manufacturing, where a mobile robot manipulates objects in the scene. The proposed framework uses a novel dynamic Gaussian Process (GP) Implicit Surface method to incrementally build and update the scene map that reflects environment changes. Actively the framework provides the next-best-view (NBV), balancing the need of pick object reach-ability and map's information gain (IG). To enforce a priority of visiting boundary segments over unknown regions, the IG formulation includes an uncertainty gradient based frontier score by exploiting the GP kernel derivative. This leads to an efficient strategy that addresses the often conflicting requirement of unknown environment exploration and object picking exploitation given a limited execution horizon. We demonstrate the effectiveness of our framework with software simulation and real-life experiments. Recent years have seen great progress in autonomous mobile manipulator applications. Typical examples include construction field exploration [1] , automated public space sanitisation [2] , third party guided 3D printing [3] and logistic and warehousing. These applications call for an interactive robotic system that actively maps a changing environment, then planning its motion and manipulation with the help of the online generated map for obstacle avoidance. The manipulation task could be in the form of object picking or 3D printing. With limited on-board resources or stringent time constraints, the robot must plan its motion efficiently to accomplish the prescribed mission. Regardless of the task at hand, an autonomous robotic manipulator needs to first explore the environment. This can occur by directing a robot towards frontiers that indicate unknown regions of the environment [4] . Exploration can also be guided for targeted Information Gain (IG), for example in terrain monitoring [5] , active object detection [6] - [8] and fruit detection [9] . For efficiency, it is desirable to combine mapping and picking into a single phase. The task becomes an active and interactive mapping problem, i.e. to select and pick desired objects during mapping, with mapping guided towards successful selection, and where the next movement is chosen to expand its knowledge of the workspace. Generally, an autonomous mobile manipulator needs to choose a next-best-view (NBV) that satisfies two goals: the spot needs to enable the robot to detect and pick objects, and with the on-board sensor's broad field-of-view (FOV), the robot can increase map coverage and improve map accuracy. Conflicts often arise when pursuing both goals. The new spot must be in a previously mapped and connected section of the map to allow the planning of a collision-free path, as well as contain as many manipulable objects as possible. The robot also needs to identify and inspect frontiers in order to expand its knowledge of the workspace. An NBV selection scheme must be carefully designed to consider all factors involved to guarantee mission success. Further, the decision making mechanism needs to be flexible for an optimum outcome, a naive approach based on rigid logic, be it fuzzy or not, will not be adequate to cater for and interact with the continuously changing world. Although the concept of dynamic environment mapping has been explored in the past. Much of the focus is in detecting and tracking of moving objects in static scene background [10] [11] [12] . This is not relevant to the area of path planning in a structurally changing environment. In KinectFusion [13] , a volumetric map is built using the Truncated Signed Distance Function (TSDF) data structure. TSDF requires high storage space which may not be nec-essary for large-scale applications. Further, to smooth out noise and cater for dynamic scenes, it relies on computation intensive hardware architecture-GPU to time-average scanned data for each voxel, dramatic topological changes take a significant delay to appear. Octomap [14] is another discrete way of showing existence of objects in 3D space. In [1] , we presented a method of dynamic scene update in the Octomap representation. However, Octomap does not address structural correlations between nearby cells, map resolution and accuracy are often compromised. Moreover, due to discretisation, Octomap and other mapping representations do not offer a flexible way to reason about action IG hence are not first choice in active exploration applications. The Gaussian Probabilistic Implicit Surface (GPIS) [15] , on the other hand, offers a probabilistic yet accurate map representation of the scene. The GPIS encodes spatial correlation embedded in input data and it is suitable for Bayesian fusion on scans from varying viewpoints [16] , [17] . One can query the GPIS for scene geometry and uncertainty at arbitrary resolution, visited or unexplored alike. Being a dense implicit surface, GPIS facilitate obstacle avoidance for safe pick trajectory planning, it also offers a natural mechanism to check for object manipulability in space. The surface normals, available as a natural component of GPIS with derivatives [16] , [17] can be exploited for interactive applications [18] , including object shape detection, classification and validation. On the other hand, the probabilistic formulation in GPIS makes it particularly amenable to active perception problems as the robot can use any IG based analytical methods to make flexible decision about the next optimal move. The are extensive literature in GP based IG exploration, including entropy reduction [5] , conditional entropy reduction [19] and mutual information [20] . Moreover, GPIS can be viewed as an immediate probabilistic conditioning tool on current structural data. If a mechanism is devised to detect and filter out moved scene points, the map can instantly be inferred by conditioning on the changed area. Thus it is possible to achieve zero-delay between scene change and mapping outputs. We will show in this paper how to utilise the immediate conditioning and flexible IG attributes of GP to handle dynamic scene changes and make fair action selection in an exploration task. This paper presents an interactive active mapping framework for mobile manipulator systems. The contribution of this work comes in three folds. First we propose a dynamic GPIS algorithm (Section III) in which each incoming sensor scan is trained into an instantaneous GP that defines a probabilistic tolerance layer, which is used to detect and discard previous GPIS training points taken from the scene, thus producing a resultant implicit surface accurately capturing environment changes (Fig. 1a) and d)). Secondly, given the probabilistic map representation, we develop an analytical frontier score for each possible NBV candidate (Section VI) exploiting the GP kernel derivative, this greatly improves fairness and flexibility in NBV selection. Last but foremost the proposed framework provides a viable means for a mobile manipulator to interact with the environment (Section V): plans base path and generates arm trajectory with obstacle avoidance capability, choosing NBV that maximises scene object manipulability (Fig. 1b) ). Results of the full online implementation of our framework are presented in Section VII demonstrating its performance in simulation and real experiments. Our framework aims to provide an efficient object collection pipeline for a mobile manipulator operating in an un-mapped 3D scene. As illustrated in Fig. 2 , it consists of a GP-based dynamic mapping component that maps the changing world, and an NBV selection module that utilizes the probabilistic map information to recommend the best action. The mobile manipulator relies on this framework to plan its arm trajectory and base paths to travel and make changes in the surrounding environment. Initially, the robot is positioned next the objects to be picked. First it performs scene inspection and builds the map. Next it detects interested objects and picks up one. Then it updates the map to synchronize with the changed world. Based on the map, it computes the NBV and moves towards it. The inspection cycle then repeats. The proposed dynamic mapping representation of this work is based on the online GPIS fusion in [17] . In a similar way, our mapping module consists of two GP phases. A GPIS inferred from multiple depth images to recover the scene reconstruction, and a frame level GPR to detect the changed GPIS training points. The main difference with [17] is the introduction of the dynamic scene handling and a virtual wall that is described in this section. We now briefly review how GPIS describes the scene map. Let us define a point x ∈ R 3 and a function f : where d is the distance of the point from the surface. Then the GPIS is defined by the posterior distribution of the value of f at an arbitrary testing point x * given by , where the predictive mean and variance are given by We follow the notations k * , K and k(x * , x * ) to represent covariances between x * and n training points and n × n covariance matrix of the training points and the covariance function at x * respectively [21] . We opted to use Matérn 3 class covariance function (ν = 3/2), where l is empirically obtained. B. Instantaneous 2.5D Map from Scan GPR For each incoming depth image, a stand-alone observation GPR is trained for arbitrary bearing depth inference. This GPR takes the form of bearing to inverse depth function: . Similar to [17] , we use the Ornstein-Uhlenbeck (OU) covariance function to model the IDP observation, as the OU kernel is best suited for modeling random walk curves [21] without excessive smoothing. With this instantaneous scan GP, for any bearing angle θ s , its inverse depth f IDP (θ s ) and uncertainty σ IDP (θ s ) can be inferred. Its associated scan point coordinates x s can be derived as The GPR inferred 2.5D depth map can be viewed as a tolerance blanket of allowed range values from the sensor perspective, any points in the field of view (FOV) yet falling outside the blanket is considered an anomaly. This property will be utilized to detect removed objects in III-C. When a new depth image is received, the new 3D points are fused and added to the GPIS training dataset to expand the 3D map. On the other hand, training points that belong to a recently removed object must be deleted to ensure consistency between the map and the physical world. For this to work, we identify all GPIS training samples that fall into the sensor's FOV at current pose, infer its IDP value then apply three possible treatments: delete, fuse or ignore. The procedure can be described in Algorithm 1. Specifically, for any point x is in the GPIS training set, we apply a simple transformation to obtain its position in the current camera frame x is = Tx is . The camera's transformation T = (R, t) ∈ SE(3) is assumed given. We compute its bearing angle θ is and decide if it falls in the (c) Map before and after object remove (d) Old points xis marked (green) for deletion if distance to virtual wall greater than tolerance d σs. FOV. For each valid x is we infer its IDP and uncertainty pair (f IDP , σ IDP ) following the method in III-B. We then analyse the difference between the stored inverse depth and apply three possible treatments: delete, fuse or ignore. Delete: For objects removed from the scene, the new scan would reflect off whatever object is behind the removed one, with an increased measured range. We delete this point with the following anomaly detection criterion: where g is an empirically determined scale factor. This simple check works for removed objects that have other obstacles lying behind, but fails for objects with an empty background, which have an invalid range measurement. We use a method similar to [1] to solve this problem, where we artificially replace every invalid sample (NaN value) in the scanned depth with a very large number, effectively creating a virtual wall at "infinity". The ideas are illustrated in Fig. 3 . The virtual wall is trained together with the rest of data to form the scan GPR. Fuse: For is within a small range, the old GPIS point x is and new scan measurement x s describe the same scene obstacle. We thus apply Bayesian fusion to update its position estimate: Here, σ s is the matching point uncertainty and can be computed as described in [17] . Ignore: Points outside the above categories are occluded scene points and should be left unchanged. With the newly updated data storage, we perform the second stage GPIS training. The readers are encouraged to refer to [17] for a full treatment of online GPIS formulation. Algorithm 1: Dynamic GPIS Update, σ occ = 0.02 IV. GAUSSIAN PROCESS FRONTIER As mentioned above the GPIS inferred lends itself nicely to an IG based exploration strategy. We now introduce the proposed frontier metric for GP, inspired by the gradient occupancy map method in [20] . This gradient value, when used in NBV selection, encourages the robot to explore boundary regions over other unexplored regions. Different to [20] , our frontier metric is in the form of uncertainty gradient, which comes with an analytical expression as uncertainty is readily available from GP. The uncertainty gradient for an arbitrary point x can be defined as where σ 2 is provided by GPIS together with the distance to the surface mean value. For mobile manipulators whose motion is confined to the 2D ground plane, and with an unnegligible base size, it is incorrect to approximate the frontier as a point gradient in 3D space. Rather, we use a line segment to define the space occupied by the robot, then retrieve the ground plane GPIS frontier. Let l i = [l x , l y , 0] denote the direction of a line segment s i on the ground transversal to robot heading, which is discretised to contain a set P i of 3D points. Denote δα as the length of an infinitesimal segment along l i . Then for a point x i j in the segment s i , j ∈ supp(P i ), a small perturbation results in a neighbour point x i j + δαl i . Analogous to the way image corner features are detected [22] , we define a frontier metric as the Sum of Directional Squared Difference (SDSD) in uncertainty for all points in P i along l i : The GP inference in (2) leads to σ 2 (x i j ) = P[f (x i j )], using the Matérn kernel equation (3), we get an expression for the gradient of GPIS uncertainty at x i j : From a mathematical perspective, this GP based formulation offers a rigorous measure of exploration boundary, hence has higher fidelity in comparison to discretised frontier detection methods [20] . V. MANIPULABILITY This section presents a metric to determine whether the manipulator is capable of reaching and picking the objects in the environment. This metric filters out object poses that minimise the manipulability index [23] of the arm during picking. The filter uses the geometric volume of an annulus sector as a heuristic to determine which poses are viable. The frame of this region is fixed to the manipulator base frame and all poses outside this region are rejected by the filter. For this process we use the determinant of the Jacobian matrix J to calculate the robot's measure of manipulability m as follows, m = det(JJ T (θ)) ∈ R where the robot's Jacobian J relates the end-effector's Cartesian velocity with the joints velocitiesq. The annulus sector used in the pose manipulability filter is derived from the manipulator's kinematics by sampling valid end-effector discrete poses. The configuration space C F ull is defined as all the possible configurations of the manipulator's joints q. Performing an analysis of the entire configuration space would be too computationally expensive. Thus, a discrete sampling C Sample ⊂ C F ull is obtained for all possible manipulator configurations by incrementing each joint by a fixed distance ∆θ between the joint limit intervals [θ min , θ max ]. For every configuration ∀q ∈ C Sample we calculate the end-effector pose in task space T using forward kinematics ξ E = K(q), as well as the manipulability index m where: with the position p ∈ R 3 and the rotation R ∈ SO(3). This process gives us an n-by-4 matrix containing the endeffector position and manipulability index for each sampled configuration, We discard any rows with a manipulability index m below a threshold m thres , The remaining points are used to determine the minimum and maximum bounds of the manipulable Cartesian workspace T m ⊂ T . This allows us to differentiate between the reachable region of the manipulator workspace and the manipulable region of the workspace as shown in Fig. 4 , where the manipulable region in front of the robot is defined with a radius interval This section presents our the strategy for choosing the NBV. We first describe how to obtain a group of candidate poses, then present our utility function for NBV calculation. The first question in mind is how to obtain the candidate poses. Since our goal is to map object piles, the next position to place the robot should be around the circumference of the partially explored pile map, as illustrated in Fig. 5 . We query the GPIS to obtain a probabilistic implicit surface representing the scene. The map can be assumed to have a roughly conical shape which is generically the case when stacking object to maintain stability. We project the surface points onto the ground to form a 2D occupancy map and an uncertainty map. Using image processing methods, we obtain the contour of the occupancy, (see Fig. 5(a) ), which is composed of a set of piece-wise linear segments {s i } of the map base. Half of the segments are from incomplete surfaces without full exploration, hence have high uncertainty, referred to as un-real. The remaining half are genuine base outline segments and have low uncertainty. Using this uncertainty we classify each segment as real or unreal. This linear segments are the candidates for positioning the robot in next step. i . Let us denote the utility function at the i'th segment as U [t] ( s i ), the goal of NBV is to identify the segment i [t] * with maximum utility to position the robot, We now present factors that affect the interactive mapping performance: gives preference to the boundary segments over other un-real segments. See IV for details, the contour segment becomes the transversal line segment described therein. • Interact order h[s i ]: gives a preference for picking order, it can be in the form of segment height, which is available from the GPIS. This is useful for object picking in warehousing or construction site applications -picking from higher locations is preferred over lower ones as it causes minimum disruption to object piles. Further, this encourages the object pile operated on to maintain a convex shaped outline, leaving enough space for the robot to backtrack to uncluttered regions. • Travel distance d[s i ]: prefer to visit close-by scanned segments, can be computed as the Dijkstra minimum travel distance from current location to candidate. • Avoid repeating failure p[s i ]( t ): avoid spots where previous pick/detect attempts failed within a time duration, and is a function of time. Due to cluttering or plan failure, a robot at a seemingly optimal position may fail to detect or pick up objects. Yet once the neighbouring regions are cleared, the location may be useful again. The penalty factor helps to avoid choosing the same location immediately after pick failure. To do this we store a location's last failure time t f [s i ] in a quad-tree data structure that represents ground positions. Initially, all cells are assigned with negative infinity time corresponding to no penalty. As a failure occurs, the t f [s i ] at the robot's position is set to the current time, (16) to ensure the cell is excluded from NBV selection for a small time window. Considering the above-mentioned factors, we present the following action utility formulation to maximise throughput: positions. The frontier segment with valid m, h and σ 2 scores, participate in NBV selection and competes with other real segments. Once the frontier is explored, with an on-board sensor's broad FOV, its surrounding regions will become valid and the frontier gets shifted. The newly uncovered segments and shifted frontier are added to the next round of NBV selection. We evaluated the performance of our framework with extensive simulated and real-life experiments. We build our mobile manipulator using the Neobotix MP700 [24] as the robot base, and mounted a Universal Robot ur5e [25] as the arm. Our active mapping framework is developed in C++/ROS system. Our test environment consists of piles of bricks on flat terrain. The bricks are labelled with AR-tags for easy pose detection, and have metal plates attached for magnetic grasping. This scenario resembles the necessary exploration and brick detection component in automatic construction tasks, where our framework is used to facilitate efficient brick picking and wall building. We designed our simulation environment in the Gazebo simulator consisting of brick piles as shown in Fig. 6 . We also developed the robot base and arm simulation models with accurate mechanical properties to mimic the real-life system. An ablation study and benchmark tests are performed for the simulated environment. For the real-life experiments, we set up two scenes shown in Fig. 7 to evaluate the effectiveness of map construction and NBV selection. A video of our results can be found in https://tinyurl.com/y5qwn864. A. Simulation 1) Ablation Study: We analysed the importance of each factor in our NBV selection scheme, by removing it from the framework and made observations of its performance. We run each variant in a fixed time interval and collect the average number of picked bricks, percentage of covered map, number of object falls and collisions between robot and environment. The results are given in Table I . In particular, the column for object falls indicates a pile has collapsed due to unbalanced picking. Obviously, this should be avoided for any real application such as in warehousing and construction sites. The "Order" factor (test case 3) is important in reducing pile collapses. Further, the robot should never enter an unknown region before mapping it, as this could result in collisions with unforeseen obstacles. The test results for "Travel Distance" and "Frontier" show the occurrences of collisions, which could cause severe damage to the robot and objects it operates on. Without the frontier factor (case 6) map coverage is affected, resulting in long execution time. Without the failure penalty, the robot can get stuck indefinitely on a seemingly good spot that does not host pickable objects, as shown in the "frontier" row. Considering all results, the complete utility function (item 1) case performs the best in each test category. We compare the performance of our framework against Octomap based active mapping for three test scenarios (Fig. 6) , three times each. Our framework outperforms Octmap in mission completion rate and map coverage. We observed in the Octomap case, the robot frequently went to spots that do not host pickable objects, this can be explained by the fact that our framework performs manipulability analysis utilising the GPIS features, therefore can smartly choose better locations for picking. We compare performance of our framework with random exploration strategy, in aspect of task throughput and map coverage where throughput is quantified as number of objects picked in our setting. We also compare the map coverage. From the result plot Fig. 8 our strategy excels in task accomplish rate. In real-life experiment, we tested accuracy of the dyanmic GPIS mapping component and effectiveness of NBV selection in our framework. In real-life test 1 ( Fig. 9(a) ), we compared accuracy of Dynamic GPIS with a singleshot mesh, as shown in Fig. 9 . We first generated a map by feeding through two depth images: the original scan and one after removing a brick, (c) and (d). The second map is a mesh directly produced from the second scan. We used CloudCompare to compare the signed distance difference between the two maps (e), and the error are indeed very small (f ). In real-life test 2 ( Fig. 9(b) ), we tested effectiveness of NBV selection, showing results in Fig. 10 . We set the scene with two rows of bricks with top two lying side by side, the robot was facing to the right brick initially ((a)). After the robot detected and picked the first brick, the map was updated and NBV shifted towards the high brick on the left ((b)). The robot then moved to the NBV position, ((c)). Finally it picks up the left brick. This shows our NBV strategy behaved in the desired order. In this letter, we presented an interactive and active mapping framework for a mobile manipulator platform based on dynamic GPIS. Since the framework is probabilistic, it is able to perform immediate mapping update for a dynamically changing environment. Using the probabilistic map, our NBV selection scheme has been shown to be able to balance the needs of information gain on known regions, frontier driven map expansion as well as object manipulability. Most importantly, the dense map generated enables a robot to safely move around in and apply changes to the environment. Both simulation and real-life experiments show our system can efficiently explore and interact with a large pile of objects in an environment. Efficient pipeline for mobile brick picking NTU Singapore researchers build disinfection robot to aid cleaners in COVID-19 outbreak Additive Manufacturing A frontier-based approach for autonomous exploration Multiresolution mapping and informative path planning for uav-based terrain monitoring Active object recognition via monte carlo tree search A novel approach to steel rivet detection in poorly illuminated steel structural environments Mapping Repetitive Structural Tunnel Environment For A Biologically-Inspired Climbing Robot Multi-robot region-ofinterest reconstruction with dec-mcts Scene flow propagation for semantic mapping and object discovery in dynamic street scenes Static-map and dynamic object reconstruction in outdoor scenes using 3-d motion segmentation Dynaslam: Tracking, mapping, and inpainting in dynamic scenes Kinectfusion: Real-time dense surface mapping and tracking OctoMap: An efficient probabilistic 3D mapping framework based on octrees Gaussian process implicit surfaces Skeleton-based conditionally independent gaussian process implicit surfaces for fusion in sparse to dense 3d reconstruction Online continuous mapping using gaussian process implicit surfaces Gaussian process implicit surfaces for shape estimation and grasping Building 3d object models during manipulation by reconstruction-aware trajectory optimization Gaussian processes autonomous mapping and exploration for range-sensing mobile robots A combined corner and edge detector Manipulability of robotic mechanisms Mobile robot mp-700 UNIVERSAL ROBOT UR5e, a flexible and lightweight robotic arm