# PPO

RoboVerse provides three PPO implementations with different features and use cases:

## 1. Stable-Baselines3 PPO (Recommended for Beginners)

Based on [Stable-Baselines3](https://github.com/DLR-RM/stable-baselines3), this implementation provides a user-friendly interface with comprehensive configuration options.

### Usage

```bash
# Basic PPO training with Franka robot
python get_started/rl/0_ppo.py --task reach_origin --robot franka --sim isaacgym

# PPO with Gym interface
python get_started/rl/0_ppo_gym_style.py --sim mjx --num-envs 256
```

### Configuration

Check the file header in `get_started/rl/0_ppo.py` for available configuration options, including:

- Task selection (`--task`)
- Robot type (`--robot`)
- Simulator backend (`--sim`)
- Environment settings

A minimal sketch of the underlying Stable-Baselines3 call pattern is shown under Illustrative Sketches at the end of this page.

## 2. CleanRL PPO

Based on [CleanRL](https://github.com/vwxyzjn/cleanrl), this implementation takes a minimal, educational approach, with the algorithm implemented directly in a single script.

### Usage

```bash
# CleanRL PPO with RoboVerse environment
python roboverse_learn/rl/clean_rl/ppo.py --task reach_origin --robot franka --sim mjx --num_envs 2048
```

### Configuration

Configuration defaults live in `roboverse_learn/rl/configs/clean_rl/ppo.py` (parsed with `tyro`). Use `--help` for all options, including:

- Task selection (`--task`)
- Robot type (`--robot`)
- Simulator backend (`--sim`)
- Training hyperparameters (`--num_envs`, `--learning_rate`, etc.)

See the `tyro` sketch under Illustrative Sketches below for how such a config file typically generates these flags.

## 3. RSL-RL PPO (OnPolicyRunner)

Based on [rsl_rl](https://github.com/leggedrobotics/rsl_rl), this implementation targets high-throughput on-policy training with asymmetric observations.

### Usage

```bash
# RSL-RL PPO for Unitree G1 walking
python -m roboverse_learn.rl.rsl_rl.ppo --task walk_g1_dof29 --robot g1 --sim isaacgym --num-envs 4096
```

### Configuration

- Install dependency: `pip install rsl_rl`
- CLI defaults: `roboverse_learn/rl/configs/rsl_rl/ppo.py` (tyro). Run with `--help` to see environment, training, and PPO hyperparameters (`--num-steps-per-env`, `--max-iterations`, `--clip-param`, etc.).
- Outputs: checkpoints, TensorBoard logs, and the final scripted policy are saved under `models/{exp_name}/{task}/` by default (override with `--model-dir`).

## Quick Start Examples

For detailed tutorials and infrastructure setup:

- **Infrastructure Overview**: See [RL Infrastructure](../../metasim/get_started/advanced/rl_example/infrastructure.md) for complete setup
- **Quick Examples**: See [Quick Start Examples](../../metasim/get_started/advanced/rl_example/quick_examples.md) for ready-to-run commands
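## Illustrative Sketches

The snippets below are minimal, hedged sketches, not RoboVerse code; they only illustrate the call patterns the scripts above build on.

First, the Stable-Baselines3 PPO call pattern that `get_started/rl/0_ppo.py` is based on. The environment (`Pendulum-v1`), the hyperparameter values, and the save path are illustrative assumptions, not RoboVerse defaults.

```python
# Minimal sketch of the Stable-Baselines3 PPO call pattern (not RoboVerse code).
import gymnasium as gym
from stable_baselines3 import PPO


def main() -> None:
    # In RoboVerse the environment would come from the Gym-style wrapper
    # (see get_started/rl/0_ppo_gym_style.py); a standard Gymnasium task
    # stands in here purely to keep the sketch self-contained.
    env = gym.make("Pendulum-v1")

    model = PPO(
        policy="MlpPolicy",   # MLP actor-critic
        env=env,
        learning_rate=3e-4,   # illustrative values, not RoboVerse defaults
        n_steps=2048,
        batch_size=64,
        verbose=1,
    )
    model.learn(total_timesteps=100_000)
    model.save("ppo_pendulum_sketch")  # illustrative output path


if __name__ == "__main__":
    main()
```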
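Second, the `tyro` configuration pattern used by the CleanRL and RSL-RL entry points. This is a hypothetical sketch of how a dataclass-based config becomes CLI flags; the class name, fields, and defaults are assumptions for illustration, not the actual contents of `roboverse_learn/rl/configs/clean_rl/ppo.py`.

```python
# Hypothetical sketch of a tyro-parsed config (field names and defaults are illustrative).
from dataclasses import dataclass

import tyro


@dataclass
class Args:
    task: str = "reach_origin"          # --task
    robot: str = "franka"               # --robot
    sim: str = "mjx"                    # --sim
    num_envs: int = 2048                # --num_envs
    learning_rate: float = 3e-4         # --learning_rate
    total_timesteps: int = 10_000_000   # --total_timesteps (illustrative)


if __name__ == "__main__":
    # tyro.cli turns the dataclass fields into CLI flags and provides --help.
    args = tyro.cli(Args)
    print(args)
```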