The Composer Policy enables smooth, robust transitions between motion skills. It can be deployed zero-shot in the real world and supports a growing library of skills.
Problem
- Equipping a robot with a large, ever-growing library of skills is not straightforward.
- Simple instantaneous switching between skill experts is unstable and often ends in collapse, especially for highly dynamic motions.
- Popular mixture-of-experts (MoE) and hierarchical methods (1) require model retraining when new policies are added, and (2) can degrade the motion quality of the lower-level experts.
- Our Composer Policy learns to perform transitions between highly dynamic motions and enables a growing library of motion skills for quadruped robots. Our main advantages are:
- 1. New experts can be added without any retraining.
- 2. Preservation of the motion quality learned by the original expert.
Our Approach
- Each skill (e.g., trot, hop, spin) is trained as a standalone expert via motion imitation in simulation.
- Experts robustly execute their own learned motion, but transitions between experts are challenging.
- Our key insight is that a target expert only needs the current state to be within its learned distribution to successfully assume control.
- Our Composer Policy is independently trained to drive the agent towards an arbitrary target physical state.
- Uses a shrinking-boundaries reward formulation (sketched after this list).
- Initial and target states are randomly sampled from animation data, with added randomization.
- NOT directly conditioned on information or latent activations from any experts.
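A minimal sketch of what a shrinking-boundaries reward can look like. The exact formulation, state representation, and radii below are assumptions for illustration, not the paper's values:

import numpy as np

def shrinking_boundary_reward(state, target_state, t, t_max,
                              d0=1.0, d_min=0.05):
    """Illustrative shrinking-boundaries reward (hypothetical constants).

    state, target_state: physical-state vectors (e.g., joint positions,
    base pose and velocities). t, t_max: current and final episode step.
    d0, d_min: initial and final boundary radii around the target.
    """
    # The admissible boundary around the target shrinks linearly over the
    # episode, forcing the policy to converge ever closer to the target.
    boundary = d0 + (d_min - d0) * (t / t_max)
    dist = np.linalg.norm(np.asarray(state) - np.asarray(target_state))
    # Binary reward while inside the current boundary; a shaped variant
    # could return exp(-dist) outside the boundary to ease exploration.
    return 1.0 if dist <= boundary else 0.0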
The Composer Policy takes over during transition periods and hands control back to an expert once the target state is reached. By sampling a target state known to lie within the distribution of the target expert, transitions are possible between any pair of experts in the library.
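A sketch of the resulting switching logic, assuming hypothetical env, composer, and expert interfaces (the names are placeholders, not the authors' API):

import numpy as np

def transition(env, composer, target_expert, tol=0.05, max_steps=300):
    # Sample a goal known to lie inside the target expert's learned state
    # distribution, e.g., a frame of its reference animation.
    goal = target_expert.sample_reference_state()
    obs = env.observation()
    for _ in range(max_steps):
        # The Composer is conditioned on the goal state, not on the experts.
        obs = env.step(composer.act(obs, goal))
        if np.linalg.norm(env.physical_state() - goal) < tol:
            return True   # inside the target distribution: hand over control
    return False          # timeout: the source expert could resume instead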
Results
Library of 9 distinct skills: Trot (F), Trot (B), Pace (F), Pace (B), Hop, Hop-Turn, Spin, Sidestep Fast, Sidestep Slow.
- A total of 72 unique transition pairs (9 × 8 ordered pairs).
Generalization to new experts: during training, animation data from only 4 experts is sampled; evaluations are run with 5 newly added experts.
Simulation Success Rate: Average success rate of >99.99%.
Real-World Success Rate: 360 real-world trials, 97.2% success rate (only 10 total failures).
Success rates for the 72 possible transition pairs. Experts highlighted in blue were part of the animation training set (N=4), while purple indicates completely new experts. The Composer Policy is near perfect in simulation and highly successful in the real world.
- Transitions are smooth and short: typically <1 second to completion, even for complex or opposing-motion transitions.
- PCA analysis confirms that the state distribution of the Composer Policy encompasses the distributions of the experts in the library, making transitions between any pair feasible.
(Top): Shows the state distribution of each expert as a scatter plot. The purple contours depict the density of the Composer Policy's states, where the outermost contour contains 95% of the data.
(Bottom): Displays the trajectories for two transition pairs, showing that the Composer Policy can drive the agent through the in-between states of two experts all the way into the target expert's distribution.
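The coverage check behind this figure can be reproduced with a standard PCA pipeline; a sketch, where the state features and sample counts are assumptions:

import numpy as np
from sklearn.decomposition import PCA

def project_states(expert_states, composer_states):
    """Project both policies' visited states into a shared 2-D PCA plane.

    expert_states, composer_states: arrays of shape (num_samples, state_dim)
    collected from simulation rollouts.
    """
    # Fit one projection on the pooled data so the two distributions are
    # directly comparable in the same plane.
    pca = PCA(n_components=2)
    pca.fit(np.vstack([expert_states, composer_states]))
    # Scatter the expert projections; estimate density contours (e.g., with
    # a Gaussian KDE) on the composer projections to check coverage.
    return pca.transform(expert_states), pca.transform(composer_states)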
Cite
@inproceedings{christmann2024expert,
title={Expert Composer Policy: Scalable Skill Repertoire for Quadruped Robots},
author={Christmann, Guilherme and Luo, Ying-Sheng and Chen, Wei-Chao},
booktitle={2024 IEEE International Conference on Robotics and Automation (ICRA)},
pages={9727--9734},
year={2024},
organization={IEEE}
}