Overview
This paper presents HAML, a two-stage learning framework that turns large-scale, weakly labeled motion datasets into deployable multi-skill humanoid controllers. To address mode collapse in conditional generation and the engineering gap between simulation and reality, the system relies only on coarse clip-level skill labels, keeping annotation effort low while preserving controllability. The core methodological contribution is a condition-aware adversarial training mechanism: mismatched transition-label pairs are injected and explicitly penalized, so the discriminator must enforce strict instruction compliance rather than reward realistic motion alone. For physical deployment, a teacher-student distillation stage converts a privileged teacher into a student policy that operates solely on history-stacked proprioceptive observations, removing the dependence on unreliable external state estimation. Evaluations in simulation and on a physical Unitree G1 robot show that HAML achieves higher skill coverage and more stable transitions than state-of-the-art baselines while sustaining real-time, high-frequency control on onboard hardware.
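The following sketch illustrates how such a condition-aware discriminator objective could be set up, assuming a PyTorch-style training loop; `disc`, the tensor shapes, and the binary cross-entropy objective are illustrative assumptions, not the paper's exact formulation. The key ingredient is the third term: real transitions paired with shuffled (wrong) skill labels are scored as negatives, so realism alone cannot satisfy the discriminator.

```python
import torch
import torch.nn.functional as F

def condition_aware_disc_loss(disc, real_trans, real_labels, fake_trans, fake_labels):
    # Real transitions with their true skill labels are positives.
    real_scores = disc(real_trans, real_labels)
    loss_real = F.binary_cross_entropy_with_logits(
        real_scores, torch.ones_like(real_scores))

    # Policy-generated transitions (under their conditioning labels) are negatives.
    fake_scores = disc(fake_trans, fake_labels)
    loss_fake = F.binary_cross_entropy_with_logits(
        fake_scores, torch.zeros_like(fake_scores))

    # Mismatched pairs: real transitions combined with shuffled labels are
    # also negatives, forcing the discriminator to check the condition
    # instead of judging motion realism alone.
    perm = torch.randperm(real_labels.size(0))
    mismatch_scores = disc(real_trans, real_labels[perm])
    loss_mismatch = F.binary_cross_entropy_with_logits(
        mismatch_scores, torch.zeros_like(mismatch_scores))

    return loss_real + loss_fake + loss_mismatch
```

The distillation stage can be sketched in a similarly generic way; `teacher`, `student`, and the regression step below are again assumptions standing in for unspecified details. The point is the asymmetry of inputs: the teacher consumes privileged simulator state, while the student regresses its actions from a stacked proprioceptive history only.

```python
import torch
import torch.nn.functional as F

def distill_step(teacher, student, priv_state, proprio_hist, optimizer):
    # The privileged teacher acts on full simulator state (not available on
    # the real robot); its actions serve as the regression targets.
    with torch.no_grad():
        target_actions = teacher(priv_state)

    # The student only sees a history-stacked proprioceptive observation,
    # e.g. shape (B, H, D) flattened to (B, H * D).
    pred_actions = student(proprio_hist.flatten(1))

    loss = F.mse_loss(pred_actions, target_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the student never sees privileged or externally estimated state, the same observation pipeline runs unchanged on the physical robot.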
Visualization of Heading Control
HAML allows finer-grained control over motion skills by incorporating additional task commands, such as a heading direction, during policy training.
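As a rough illustration of this conditioning, assuming the policy consumes a flat observation vector, the extra task command can simply be concatenated with the skill label and the proprioceptive input; all names and shapes below are hypothetical.

```python
import torch

def build_policy_input(proprio_hist, skill_label, heading_cmd):
    # proprio_hist: flattened proprioceptive history, shape (B, H * D)
    # skill_label:  clip-level skill condition (e.g. one-hot), shape (B, K)
    # heading_cmd:  additional task command such as a desired heading, shape (B, 2)
    # All conditions are concatenated into one flat vector, so the command
    # can be varied at test time to steer the selected skill.
    return torch.cat([proprio_hist, skill_label, heading_cmd], dim=-1)
```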
Visualization of Different Tasks
H-Locomotion
H-Multiwalk
Q-Locomotion
H-Interaction
Examples of Motion Transitions
Deployment
Wave Hand
Shake Hand
Punch
Stable Stand to Punch