Given a set of independently trained RL skills, enabling quick transitions between them becomes a challenge, especially in the real world. We introduce a method that lets quadruped robots transition between multiple learned locomotion skills.
We implement a meta-controller that leverages a transition scoring model conditioned on the latent state representations of underlying policy experts.

Method

Each gait is independently learned using motion imitation from animation data.
- Domain randomization ensures robust zero-shot sim2real deployment.
Transition-Net:
- An MLP trained to predict the success of a transition between two policies, conditioned on specific policy pairs, a target phase, and the latent state representations of the active policy.
  - Latent representations are the activations from the last hidden layer of each policy. They encode the current state of the robot under that policy.
- Trained as a binary classifier: given a transition configuration (source policy, destination policy, source latent, target phase), it predicts success or failure.
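The Transition-Net inputs and output described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the layer sizes, the one-hot policy encoding, the sin/cos phase encoding, and every function name here are assumptions made for the example.

```python
import numpy as np

# All sizes below are hypothetical and chosen only for illustration.
NUM_POLICIES = 3   # number of experts in the library
OBS_DIM = 12       # policy observation size
LATENT_DIM = 8     # width of each policy's last hidden layer
HIDDEN_DIM = 16    # Transition-Net hidden width


def policy_latent(hidden_layers, obs):
    """Run a policy's hidden layers and return the last hidden activations."""
    h = obs
    for W, b in hidden_layers:
        h = np.tanh(h @ W + b)
    return h


def one_hot(index, size):
    """Return a one-hot vector identifying a policy in the library."""
    vec = np.zeros(size)
    vec[index] = 1.0
    return vec


def encode_transition(src, dst, latent, target_phase):
    """Concatenate (source policy, destination policy, source latent, target phase)."""
    # The phase is cyclic, so it is encoded as (sin, cos); this is an assumption.
    angle = 2.0 * np.pi * target_phase
    phase = np.array([np.sin(angle), np.cos(angle)])
    return np.concatenate(
        [one_hot(src, NUM_POLICIES), one_hot(dst, NUM_POLICIES), latent, phase]
    )


def init_params(rng, in_dim):
    """Randomly initialise a one-hidden-layer MLP classifier."""
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, HIDDEN_DIM)),
        "b1": np.zeros(HIDDEN_DIM),
        "W2": rng.normal(0.0, 0.1, (HIDDEN_DIM, 1)),
        "b2": np.zeros(1),
    }


def predict_success(params, x):
    """Forward pass: predicted probability that the transition succeeds."""
    h = np.tanh(x @ params["W1"] + params["b1"])
    logit = h @ params["W2"] + params["b2"]
    return 1.0 / (1.0 + np.exp(-logit))


def bce_loss(prob, label):
    """Binary cross-entropy against a success (1) or failure (0) label."""
    eps = 1e-7
    prob = np.clip(prob, eps, 1.0 - eps)
    return float(-(label * np.log(prob) + (1 - label) * np.log(1 - prob)).mean())


rng = np.random.default_rng(0)

# Stand-in for a trained expert: two random hidden layers (output layer omitted).
hidden_layers = [
    (rng.normal(0.0, 0.1, (OBS_DIM, 32)), np.zeros(32)),
    (rng.normal(0.0, 0.1, (32, LATENT_DIM)), np.zeros(LATENT_DIM)),
]
latent = policy_latent(hidden_layers, rng.normal(size=OBS_DIM))

x = encode_transition(src=0, dst=2, latent=latent, target_phase=0.25)
params = init_params(rng, x.shape[0])
p = predict_success(params, x)
loss = bce_loss(p, 1.0)  # loss if this transition was recorded as a success
```

In practice the weights would be fitted by minimising this loss over a dataset of recorded transition attempts. The untrained network here only demonstrates the data flow.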

Diagram detailing the whole process: training the library of experts, collecting the dataset and training the Transition-Net classifier, and running the meta-controller during deployment.

At runtime, a meta-controller queries the Transition-Net in real time to determine when and how to switch gaits without destabilizing the robot.

The meta-controller is queried at every time step. Once the predicted success score for the requested transition is high enough, the queued policy takes control of the robot.
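The per-step switching rule can be sketched as below. This is a hedged illustration rather than the published controller: the `MetaController` class, the `score_fn` signature, the fixed threshold of 0.9, and the search over a small grid of candidate target phases are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class MetaController:
    # score_fn(src, dst, latent, target_phase) -> success probability in [0, 1].
    # It stands in for the trained Transition-Net (hypothetical signature).
    score_fn: Callable[[int, int, Sequence[float], float], float]
    active: int                      # index of the policy currently in control
    queued: Optional[int] = None     # requested destination policy, if any
    threshold: float = 0.9           # assumed switching threshold
    candidate_phases: Sequence[float] = (0.0, 0.25, 0.5, 0.75)  # assumed grid

    def request(self, policy: int) -> None:
        """Queue a transition to another policy."""
        if policy != self.active:
            self.queued = policy

    def step(self, latent: Sequence[float]) -> Optional[float]:
        """Call once per control step.

        Returns the chosen target phase when a switch happens, else None.
        """
        if self.queued is None:
            return None
        # Score every candidate phase for the queued transition.
        scores = {
            phase: self.score_fn(self.active, self.queued, latent, phase)
            for phase in self.candidate_phases
        }
        best_phase = max(scores, key=scores.get)
        if scores[best_phase] < self.threshold:
            return None  # not safe enough yet; the active policy keeps control
        self.active, self.queued = self.queued, None
        return best_phase


def toy_score(src, dst, latent, target_phase):
    """Toy scorer: rises with the first latent feature, peaks at phase 0.5."""
    return min(1.0, max(0.0, latent[0] - abs(target_phase - 0.5)))


controller = MetaController(score_fn=toy_score, active=0)
controller.request(1)
first = controller.step([0.5])   # best score is 0.5, below threshold: no switch
second = controller.step([1.0])  # phase 0.5 scores 1.0: policy 1 takes over
```

Keeping the request queued until the score clears the threshold means the switch waits for a favourable robot state instead of happening immediately.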

Cite

@inproceedings{christmann2023expanding,
  title={Expanding versatility of agile locomotion through policy transitions using latent state representation},
  author={Christmann, Guilherme and Luo, Ying-Sheng and Soeseno, Jonathan Hans and Chen, Wei-Chao},
  booktitle={2023 IEEE International Conference on Robotics and Automation (ICRA)},
  pages={5134--5140},
  year={2023},
  organization={IEEE}
}