[Hands-on] Generating π₀ Actions with LeRobot

Introduction

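π₀, covered in Part 6, is officially integrated into LeRobot, so it is much lighter to get started with than OpenVLA. In this post we check firsthand how the PaliGemma 3B + Action Expert architecture produces actions with flow matching.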

This post covers two things:

Running a task in ALOHA sim with π₀
Visualizing the flow matching denoising steps: how an action takes shape from noise

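π₀ has about 3.3B parameters (PaliGemma 3B + Action Expert), so inference fits in roughly 8 GB of VRAM when the weights are held in 16-bit precision. Running on CPU also works but is very slow.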
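The memory figure can be sanity-checked with simple arithmetic. The sketch below counts only the weights (activations and framework overhead come on top), and the bytes-per-parameter table is just the standard size of each dtype; the 3.3B count is the approximate figure quoted above.

```python
# Back-of-the-envelope weight memory for a model of a given size.
# Only the weights are counted; activations and caches are extra.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}

def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return the memory taken by the weights alone, in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

pi0_params = 3.3e9  # approximate parameter count quoted in the text
for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: {weight_memory_gb(pi0_params, dtype):.1f} GB")
# float32: 13.2 GB
# bfloat16: 6.6 GB
```

So an 8 GB card is plausible only with 16-bit weights; in float32 the weights alone exceed it.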

0. Environment setup

π₀ has been integrated into LeRobot since version 0.10.0.

git clone https://github.com/huggingface/lerobot.git
cd lerobot
pip install -e ".[pi0,aloha]"

Verify the installation:

from lerobot.common.policies.pi0.modeling_pi0 import PI0Policy
print("pi0 available")

1. Running a task in ALOHA sim with π₀

import torch
import gym_aloha
import gymnasium as gym
import imageio

from lerobot.common.policies.pi0.modeling_pi0 import PI0Policy

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pretrained π₀ (~7GB download)
policy = PI0Policy.from_pretrained("lerobot/pi0")
policy.to(device).eval()

env = gym.make(
    "gym_aloha/AlohaTransferCube-v0",
    obs_type="pixels_agent_pos",
    max_episode_steps=400,
)

obs, info = env.reset(seed=42)
policy.reset()  # clear any cached action chunk before starting a new episode
frames = [env.render()]
done = False
total_reward = 0

while not done:
    state = torch.from_numpy(obs["agent_pos"]).float().to(device).unsqueeze(0)
    images = {
        f"observation.images.{cam}":
            torch.from_numpy(obs["pixels"][cam])
                .float().permute(2, 0, 1).unsqueeze(0).to(device) / 255.0
        for cam in obs["pixels"]
    }
    observation = {
        "observation.state": state,
        **images,
        "task": ["transfer the cube from one arm to the other"],  # one instruction per batch element
    }
    with torch.no_grad():
        action = policy.select_action(observation)

    obs, reward, terminated, truncated, info = env.step(action.squeeze(0).cpu().numpy())
    done = terminated or truncated
    frames.append(env.render())
    total_reward += reward

print(f"reward: {total_reward:.2f}, success: {info.get('is_success', False)}")
imageio.mimsave("pi0_aloha.mp4", frames, fps=50)
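The image preprocessing inside the loop is the step most likely to go wrong silently, so here is a minimal NumPy-only sketch of the same transformation: the environment returns height x width x channel uint8 frames, while the policy expects batch x channel x height x width floats in [0, 1]. The helper name to_model_image and the 480x640 frame size are assumptions made for illustration; the loop above does the equivalent with torch tensors.

```python
import numpy as np

def to_model_image(frame: np.ndarray) -> np.ndarray:
    """Convert one HxWx3 uint8 frame to a 1x3xHxW float32 array in [0, 1]."""
    chw = np.transpose(frame, (2, 0, 1))       # HWC -> CHW
    scaled = chw.astype(np.float32) / 255.0    # uint8 [0, 255] -> float [0, 1]
    return scaled[np.newaxis, ...]             # add the batch dimension

# Stand-in for obs["pixels"]["top"]; the 480x640 size is an assumption.
rng = np.random.default_rng(0)
dummy = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)

batch_image = to_model_image(dummy)
print(batch_image.shape, batch_image.dtype)
```

If the shape printed here does not match what the policy configuration expects, the mismatch is easier to spot in isolation than inside the rollout loop.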

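Interpreting the results

In the video you can see the two arms hand the cube from one arm to the other. Compared with specialists such as ACT or Diffusion Policy, π₀ differs in the following ways.

A natural-language task description is part of the input ("transfer the cube...")
Smoother continuous actions via flow matching
Some generalization to similar tasks not seen during training

Compare the video with the ACT results from the Part 3 hands-on in the same environment. ACT is strong on the task it was trained on but tends to falter when the task is varied even slightly, whereas π₀ tends to be more stable.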

2. Visualizing the denoising steps of flow matching

Flow matching is the core of π₀. Starting from noise, it builds the action step by step by following a learned velocity field, usually in 5 to 10 steps.
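To make "following a velocity field" concrete, here is a minimal sketch of the integration loop in NumPy. It is not LeRobot's implementation: the learned network is replaced by a hypothetical toy_velocity function that points straight at one fixed 14-dimensional target, and time runs from t=0 (noise) to t=1 (action), whereas some implementations run it in the opposite direction.

```python
import numpy as np

def euler_sample(velocity_fn, noise, num_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 (noise) to t=1 with Euler steps.

    Returns an array of shape (num_steps + 1, *noise.shape) holding every
    intermediate state, including the initial noise.
    """
    dt = 1.0 / num_steps
    x = noise.copy()
    trajectory = [x.copy()]
    for k in range(num_steps):
        t = k * dt                       # velocity is evaluated at t < 1 only
        x = x + dt * velocity_fn(x, t)   # one Euler step along the field
        trajectory.append(x.copy())
    return np.stack(trajectory)

# Toy stand-in for the learned network: the exact velocity that carries any
# point to a single fixed target along a straight line.
target = np.linspace(-1.0, 1.0, 14)      # hypothetical 14-dim action

def toy_velocity(x, t):
    return (target - x) / (1.0 - t)

rng = np.random.default_rng(0)
noise = rng.standard_normal(14)
traj = euler_sample(toy_velocity, noise, num_steps=10)
print(traj.shape)                        # (11, 14)
print(np.abs(traj[-1] - target).max())   # numerically close to 0
```

In this toy case the path is a straight line, so Euler integration lands on the target for any step count. A learned field is only approximately straight, which is why a handful of steps is needed in practice.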

LeRobot's PI0Policy runs a denoising loop internally. You can call it directly and collect the actions at the intermediate steps for visualization.

import matplotlib.pyplot as plt
import numpy as np

# Fix the observation from a single environment step
obs, _ = env.reset(seed=42)
state = torch.from_numpy(obs["agent_pos"]).float().to(device).unsqueeze(0)
images = {
    f"observation.images.{cam}":
        torch.from_numpy(obs["pixels"][cam])
            .float().permute(2, 0, 1).unsqueeze(0).to(device) / 255.0
    for cam in obs["pixels"]
}
batch = {
    "observation.state": state,
    **images,
    "task": ["transfer the cube from one arm to the other"],  # one instruction per batch element
}

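# Call sample_actions directly to collect the action at each step
# (assumes the return_intermediate option of LeRobot ≥ 0.10; check the sample_actions signature in your installed version)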
with torch.no_grad():
    intermediate = policy.sample_actions(
        batch,
        num_steps=10,
        return_intermediate=True,
    )
# intermediate: (num_steps + 1, batch, chunk_len, action_dim)

# How the 14-dim action at the first chunk step of the first batch element evolves across denoising steps
arr = intermediate[:, 0, 0, :].cpu().numpy()  # (steps+1, 14)
for joint_idx in range(4):  # first 4 joints only
    plt.plot(arr[:, joint_idx], label=f"joint {joint_idx}")
plt.xlabel("denoising step")
plt.ylabel("action value")
plt.legend()
plt.savefig("flow_matching_progression.png")

Interpreting the results

In flow_matching_progression.png, the values that start out as noise gradually converge to a stable action. You can see that convergence completes within 5 to 10 steps.

This is a large part of why π₀ samples faster than Diffusion Policy, which typically needs 50 to 100 denoising steps.
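Reading convergence off a line plot is subjective, so it can help to put a number on it. The sketch below measures how far each intermediate action is from the final one. The helper distance_to_final and the arr_demo array are hypothetical stand-ins; in the real experiment you would pass in the (steps+1, 14) array collected from the policy.

```python
import numpy as np

def distance_to_final(trajectory: np.ndarray) -> np.ndarray:
    """L2 distance between each intermediate action and the final one.

    trajectory has shape (num_steps + 1, action_dim).
    """
    return np.linalg.norm(trajectory - trajectory[-1], axis=1)

# Hypothetical stand-in for the collected array: a straight-line path
# from a noise vector to an all-zero action over 10 steps.
rng = np.random.default_rng(0)
start, end = rng.standard_normal(14), np.zeros(14)
weights = np.linspace(0.0, 1.0, 11)[:, None]
arr_demo = (1.0 - weights) * start + weights * end

dist = distance_to_final(arr_demo)
print(np.round(dist / dist[0], 2))   # fraction of the initial distance remaining
```

A curve that flattens near zero well before the last step suggests the step count could be reduced; one that is still dropping at the end suggests the opposite.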

Try num_steps = 1, 2, 5, and 10 and compare: with too few steps the action comes out rough, and with enough steps it becomes smooth.
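The effect of the step count can also be reproduced without a GPU on a toy problem where the exact answer is known. The sketch below is an assumption-laden stand-in, not π₀: it uses a 1-D flow from N(0, 1) to N(M, S²) along the path x_t = (1 - t) x0 + t x1, for which the marginal velocity field and the exact endpoint have closed forms, and compares Euler integration against that endpoint. M and S are arbitrary illustrative values.

```python
import numpy as np

M, S = 0.5, 0.1   # hypothetical target mean and std; the source is N(0, 1)

def gaussian_velocity(x, t):
    """Closed-form marginal velocity for the path x_t = (1 - t) * x0 + t * x1,
    with x0 ~ N(0, 1) and x1 ~ N(M, S^2) drawn independently."""
    var_t = (1.0 - t) ** 2 + (t * S) ** 2
    return M + (t * S**2 - (1.0 - t)) / var_t * (x - t * M)

def euler_endpoint(x0, num_steps):
    """Integrate from t=0 to t=1 with num_steps equal Euler steps."""
    dt = 1.0 / num_steps
    x = x0.copy()
    for k in range(num_steps):
        x = x + dt * gaussian_velocity(x, k * dt)
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal(10_000)
exact = M + S * x0                 # exact endpoint of this particular flow
errors = {n: float(np.mean(np.abs(euler_endpoint(x0, n) - exact)))
          for n in (1, 2, 5, 10, 100)}
for n, err in errors.items():
    print(f"num_steps={n:>3}: mean abs error = {err:.4f}")
```

The error shrinks as the step count grows, because this field bends along the way and each Euler step treats it as locally constant. A learned velocity field behaves similarly, which is the trade-off behind choosing num_steps.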

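Summary

Exercise	Concept made concrete
π₀ + ALOHA sim	How the VLM + Action Expert architecture behaves in practice
Denoising step visualization	How flow matching turns noise into an action

π₀ is the starting point of the industrialization trend that continues with SmolVLA and GR00T N1 in Part 7. In the next hands-on, we run the smaller SmolVLA on a laptop.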

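Where to go next

Previous hands-on → [Hands-on] OpenVLA
Next hands-on → [Hands-on] Getting Started with SmolVLA
Main article → The π Series and the Start of Industrialization
Full series map → VLA Learning Roadmap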