PPO#

RoboVerse provides three PPO implementations with different features and use cases:

2. CleanRL PPO#

Based on CleanRL, this implementation takes a more minimal, educational approach, with the PPO algorithm implemented directly in the training script rather than hidden behind a framework.
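
The core of such a direct implementation is the clipped surrogate objective. The sketch below is illustrative only; the function name and default clip coefficient are assumptions, not the exact RoboVerse code:

import torch

def ppo_clip_loss(new_logprob, old_logprob, advantages, clip_coef=0.2):
    # Probability ratio between the updated policy and the policy that collected the data.
    ratio = (new_logprob - old_logprob).exp()
    # Per-minibatch advantage normalization, a common CleanRL-style default.
    advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)
    # Clipped surrogate objective, negated so it can be minimized with gradient descent.
    unclipped = -advantages * ratio
    clipped = -advantages * torch.clamp(ratio, 1.0 - clip_coef, 1.0 + clip_coef)
    return torch.max(unclipped, clipped).mean()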

Usage#

# CleanRL PPO with RoboVerse environment
python roboverse_learn/rl/clean_rl/ppo.py --task reach_origin --robot franka --sim mjx --num_envs 2048

Configuration#

Configuration defaults live in roboverse_learn/rl/configs/clean_rl/ppo.py (parsed with tyro); a sketch of what such a config looks like follows the list below. Run with --help to list all options, including:

  • Task selection (--task)

  • Robot type (--robot)

  • Simulator backend (--sim)

  • Training hyperparameters (--num_envs, --learning_rate, etc.)
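
For illustration, a tyro-parsed configuration is typically a dataclass whose fields become CLI flags. The field names below mirror the flags above, but they are only a sketch; the actual fields and defaults live in roboverse_learn/rl/configs/clean_rl/ppo.py and may differ:

from dataclasses import dataclass
import tyro

@dataclass
class Args:
    # Illustrative fields only; see the real config module for the full set.
    task: str = "reach_origin"
    robot: str = "franka"
    sim: str = "mjx"
    num_envs: int = 2048
    learning_rate: float = 3e-4

if __name__ == "__main__":
    args = tyro.cli(Args)  # `--help` prints every field with its default
    print(args)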

3. RSL-RL PPO (OnPolicyRunner)#

Based on rsl_rl, this implementation targets high-throughput on-policy training and supports asymmetric actor/critic observations.
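
"Asymmetric observations" means the critic may consume privileged simulator state that the deployed policy never sees. A minimal sketch of the idea follows; the state keys are hypothetical and do not reflect the actual RoboVerse observation layout:

import torch

def split_observations(state: dict[str, torch.Tensor]):
    # The actor only receives proprioception that would be available on the real robot.
    actor_obs = torch.cat([state["joint_pos"], state["joint_vel"]], dim=-1)
    # The critic additionally sees privileged quantities from the simulator.
    critic_obs = torch.cat(
        [actor_obs, state["root_lin_vel"], state["contact_forces"]], dim=-1
    )
    return actor_obs, critic_obs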

Usage#

# RSL-RL PPO for Unitree G1 walking
python -m roboverse_learn.rl.rsl_rl.ppo --task walk_g1_dof29 --robot g1 --sim isaacgym --num-envs 4096

Configuration#

  • Install dependency: pip install rsl_rl

  • CLI defaults: roboverse_learn/rl/configs/rsl_rl/ppo.py (tyro). Run with --help to see environment, training, and PPO hyperparameters (--num-steps-per-env, --max-iterations, --clip-param, etc.).

  • Outputs: checkpoints, TensorBoard logs, and the final scripted policy are saved under models/{exp_name}/{task}/ by default (override with --model-dir); see the loading sketch after this list.
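
To evaluate or deploy the exported policy, loading it as TorchScript is typically sufficient. The file name, experiment name, and observation size below are placeholders, not values guaranteed by RoboVerse:

import torch

# Placeholder path following the models/{exp_name}/{task}/ layout described above.
policy = torch.jit.load("models/my_exp/walk_g1_dof29/policy.pt", map_location="cpu")
policy.eval()

obs = torch.zeros(1, 96)  # dummy observation; the real dimension depends on the task
with torch.no_grad():
    actions = policy(obs)
print(actions.shape)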

Quick Start Examples#

For detailed tutorials and infrastructure setup: