Improve rgb ppo, update code for latest sapien release (#291)
* make pick cube a bit easier from vision

* updates

* fix typos

* Update sapien_env.py

* simplify code

* permit picking experiment name
StoneT2000 authored Apr 26, 2024
1 parent e1a47be commit 7abac39
Showing 18 changed files with 100 additions and 67 deletions.
6 changes: 3 additions & 3 deletions docs/source/user_guide/tutorials/custom_tasks.md
@@ -159,7 +159,7 @@ In general one use case of setting a positive `reconfiguration_freq` value is fo

## Episode Initialization / Randomization

Task initialization and randomization is handled in the `_initalize_actors` function and is called whenever `env.reset` is called. The objective here is to set the initial states of objects, including the robot. As the task ideally should be simulatable on the GPU, batched code is unavoidable. Note that furthermore, by default everything in ManiSkill tries to stay batched, even if there is only one element. Finally, like `_load_scene` the options argument is also passed down here if needed.
Task initialization and randomization is handled in the `_initialize_episode` function, which is called whenever `env.reset` is called. The objective here is to set the initial states of objects, including the robot. As the task ideally should be simulatable on the GPU, batched code is unavoidable. Note furthermore that by default everything in ManiSkill tries to stay batched, even if there is only one element. Finally, like `_load_scene`, the options argument is also passed down here if needed.

An example from part of the PushCube task

@@ -168,7 +168,7 @@ from mani_skill.utils.structs.pose import Pose
import torch
class PushCubeEnv(BaseEnv):
# ...
def _initialize_actors(self, env_idx: torch.Tensor, options: dict):
def _initialize_episode(self, env_idx: torch.Tensor, options: dict):
# use the torch.device context manager to automatically create tensors on CPU or CUDA depending on self.device, the device the environment runs on
with torch.device(self.device):
b = len(env_idx)
@@ -184,7 +184,7 @@ class PushCubeEnv(BaseEnv):
self.obj.set_pose(obj_pose)
```

An `env_idx` is one of the arguments to this function, and is a list of environment IDs that need initialization. This is given as ManiSkill supports **partial resets**, where at each timestep potentially only a subset of parallel environments will undergo a reset, which calls `_initialize_actors` here.
An `env_idx` is one of the arguments to this function, and is a list of environment IDs that need initialization. This is provided because ManiSkill supports **partial resets**, where at each timestep potentially only a subset of the parallel environments undergoes a reset, which calls `_initialize_episode` here.

Since a scene builder is used, to initialize objects to their original states, we simply call `self.table_scene.initialize(env_idx)`, a function all scene builders implement.
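Putting the pieces above together, a minimal sketch of a partial-reset-aware `_initialize_episode` might look like the following. The class name is hypothetical, `self.table_scene` and `self.obj` follow the PushCube example further up, and the sampling ranges and cube height are illustrative:

```python
import torch
from mani_skill.envs.sapien_env import BaseEnv
from mani_skill.utils.structs.pose import Pose

class MyPushTaskEnv(BaseEnv):
    # ...
    def _initialize_episode(self, env_idx: torch.Tensor, options: dict):
        with torch.device(self.device):
            b = len(env_idx)  # only the environments listed in env_idx are being reset
            # re-initialize the table scene (and robot) for just those environments
            self.table_scene.initialize(env_idx)
            # sample fresh cube positions for the resetting environments only;
            # ManiSkill applies these b poses to the envs listed in env_idx
            xyz = torch.zeros((b, 3))
            xyz[:, :2] = torch.rand((b, 2)) * 0.2 - 0.1  # illustrative xy range
            xyz[:, 2] = 0.02                             # illustrative cube half-height
            self.obj.set_pose(Pose.create_from_pq(p=xyz))
```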

10 changes: 5 additions & 5 deletions examples/baselines/ppo/README.md
@@ -79,19 +79,19 @@ python ppo.py --env_id="OpenCabinetDrawer-v1" \

## Visual Based RL

Below is a sample of various commands for training a image-based policy with PPO that are lightly tuned. The fastest again is also PushCube-v1 which can take about 1-5 minutes and PickCube-v1 which takes 30-60 minutes. You will need to tune the `--num_envs` argument according to how much GPU memory you have as rendering visual observations uses a lot of memory. The settings below should all take less than 15GB of GPU memory. Note that while if you have enough memory you can easily increase the number of environments, this does not necessarily mean wall-time or sample efficiency improve.
Below is a sample of various commands for training an image-based policy with PPO that are lightly tuned. The fastest again is PushCube-v1, which can take about 1-5 minutes, and PickCube-v1, which takes 15-45 minutes. You will need to tune the `--num_envs` argument according to how much GPU memory you have, as rendering visual observations uses a lot of memory. The settings below should all take less than 15GB of GPU memory. Note that while you can easily increase the number of environments if you have enough memory, this does not necessarily mean wall-time or sample efficiency improve.

The visual PPO baseline is not guaranteed to work for tasks not tested below, as some tasks do not yet have dense rewards (or well-tuned ones), are simply too hard for standard PPO, or have not yet had their results verified by our team.



```bash
python ppo_rgb.py --env_id="PushCube-v1" \
--num_envs=512 --update_epochs=8 --num_minibatches=16 \
--num_envs=256 --update_epochs=8 --num_minibatches=8 \
--total_timesteps=1_000_000 --eval_freq=10 --num-steps=20
python ppo_rgb.py --env_id="OpenCabinetDrawer-v1" \
--num_envs=256 --update_epochs=8 --num_minibatches=16 \
--total_timesteps=100_000_000 --num-steps=100 --num-eval-steps=100
python ppo_rgb.py --env_id="PickCube-v1" \
--num_envs=256 --update_epochs=8 --num_minibatches=8 \
--total_timesteps=10_000_000
```

To evaluate a trained policy you can run
10 changes: 7 additions & 3 deletions examples/baselines/ppo/ppo.py
@@ -22,7 +22,7 @@

@dataclass
class Args:
exp_name: str = os.path.basename(__file__)[: -len(".py")]
exp_name: Optional[str] = None
"""the name of this experiment"""
seed: int = 1
"""seed of the experiment"""
@@ -158,7 +158,7 @@ def get_action_and_value(self, x, action=None):
args.batch_size = int(args.num_envs * args.num_steps)
args.minibatch_size = int(args.batch_size // args.num_minibatches)
args.num_iterations = args.total_timesteps // args.batch_size
run_name = f"{args.env_id}__{args.exp_name}__{args.seed}__{int(time.time())}"
if args.exp_name is None:
args.exp_name = os.path.basename(__file__)[: -len(".py")]
run_name = f"{args.env_id}__{args.exp_name}__{args.seed}__{int(time.time())}"
else:
run_name = args.exp_name
writer = None
if not args.evaluate:
print("Running training")
@@ -205,7 +209,7 @@ def get_action_and_value(self, x, action=None):
if args.save_train_video_freq is not None:
save_video_trigger = lambda x : (x // args.num_steps) % args.save_train_video_freq == 0
envs = RecordEpisode(envs, output_dir=f"runs/{run_name}/train_videos", save_trajectory=False, save_video_trigger=save_video_trigger, max_steps_per_video=args.num_steps, video_fps=30)
eval_envs = RecordEpisode(eval_envs, output_dir=eval_output_dir, save_trajectory=args.evaluate, max_steps_per_video=args.num_eval_steps, video_fps=30)
eval_envs = RecordEpisode(eval_envs, output_dir=eval_output_dir, save_trajectory=args.evaluate, trajectory_name="trajectory", max_steps_per_video=args.num_eval_steps, video_fps=30)
envs = ManiSkillVectorEnv(envs, args.num_envs, ignore_terminations=not args.partial_reset, **env_kwargs)
eval_envs = ManiSkillVectorEnv(eval_envs, args.num_eval_envs, ignore_terminations=not args.partial_reset, **env_kwargs)
assert isinstance(envs.single_action_space, gym.spaces.Box), "only continuous action space is supported"
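With `exp_name` now optional, the run directory name is only auto-generated when no experiment name is supplied; a user-supplied name is used verbatim. Below is a small, standalone sketch of the resulting behavior, mirroring the logic in the diff above with illustrative values:

```python
import os
import time
from typing import Optional

def resolve_run_name(env_id: str, seed: int, exp_name: Optional[str], script: str = "ppo.py") -> str:
    """Mirrors the run_name resolution added in this commit."""
    if exp_name is None:
        exp_name = os.path.basename(script)[: -len(".py")]
        return f"{env_id}__{exp_name}__{seed}__{int(time.time())}"
    return exp_name  # a user-chosen experiment name becomes runs/<exp_name> directly

# resolve_run_name("PickCube-v1", 1, None)     -> "PickCube-v1__ppo__1__<timestamp>"
# resolve_run_name("PickCube-v1", 1, "my_run") -> "my_run"
```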
20 changes: 12 additions & 8 deletions examples/baselines/ppo/ppo_rgb.py
@@ -22,7 +22,7 @@

@dataclass
class Args:
exp_name: str = os.path.basename(__file__)[: -len(".py")]
exp_name: Optional[str] = None
"""the name of this experiment"""
seed: int = 1
"""seed of the experiment"""
@@ -193,8 +193,8 @@ def __init__(self, sample_obs):
self.out_features += feature_size

# for state data we simply pass it through a single linear layer
extractors["state"] = nn.Linear(state_size, 64)
self.out_features += 64
extractors["state"] = nn.Linear(state_size, 256)
self.out_features += 256

self.extractors = nn.ModuleDict(extractors)

@@ -217,12 +217,12 @@ def __init__(self, envs, sample_obs):
latent_size = self.feature_net.out_features
self.critic = nn.Sequential(
layer_init(nn.Linear(latent_size, 512)),
nn.Tanh(),
nn.ReLU(inplace=True),
layer_init(nn.Linear(512, 1)),
)
self.actor_mean = nn.Sequential(
layer_init(nn.Linear(latent_size, 512)),
nn.Tanh(),
nn.ReLU(inplace=True),
layer_init(nn.Linear(512, np.prod(envs.unwrapped.single_action_space.shape)), std=0.01*np.sqrt(2)),
)
self.actor_logstd = nn.Parameter(torch.ones(1, np.prod(envs.unwrapped.single_action_space.shape)) * -0.5)
@@ -256,8 +256,11 @@ def get_action_and_value(self, x, action=None):
args.batch_size = int(args.num_envs * args.num_steps)
args.minibatch_size = int(args.batch_size // args.num_minibatches)
args.num_iterations = args.total_timesteps // args.batch_size
run_name = f"{args.env_id}__{args.exp_name}__{args.seed}__{int(time.time())}"

if args.exp_name is None:
args.exp_name = os.path.basename(__file__)[: -len(".py")]
run_name = f"{args.env_id}__{args.exp_name}__{args.seed}__{int(time.time())}"
else:
run_name = args.exp_name
writer = None
if not args.evaluate:
print("Running training")
@@ -297,6 +300,7 @@ def get_action_and_value(self, x, action=None):
# rgbd obs mode returns a dict of data, we flatten it so there is just a rgbd key and state key
envs = FlattenRGBDObservationWrapper(envs, rgb_only=True)
eval_envs = FlattenRGBDObservationWrapper(eval_envs, rgb_only=True)

if isinstance(envs.action_space, gym.spaces.Dict):
envs = FlattenActionSpaceWrapper(envs)
eval_envs = FlattenActionSpaceWrapper(eval_envs)
@@ -308,7 +312,7 @@ def get_action_and_value(self, x, action=None):
if args.save_train_video_freq is not None:
save_video_trigger = lambda x : (x // args.num_steps) % args.save_train_video_freq == 0
envs = RecordEpisode(envs, output_dir=f"runs/{run_name}/train_videos", save_trajectory=False, save_video_trigger=save_video_trigger, max_steps_per_video=args.num_steps, video_fps=30)
eval_envs = RecordEpisode(eval_envs, output_dir=eval_output_dir, save_trajectory=args.evaluate, max_steps_per_video=args.num_eval_steps, video_fps=30)
eval_envs = RecordEpisode(eval_envs, output_dir=eval_output_dir, save_trajectory=args.evaluate, trajectory_name="trajectory", max_steps_per_video=args.num_eval_steps, video_fps=30)
envs = ManiSkillVectorEnv(envs, args.num_envs, ignore_terminations=False, **env_kwargs)
eval_envs = ManiSkillVectorEnv(eval_envs, args.num_eval_envs, ignore_terminations=False, **env_kwargs)
assert isinstance(envs.single_action_space, gym.spaces.Box), "only continuous action space is supported"
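For context, here is a condensed sketch of the RGB+state feature extractor as it stands after this commit: the state branch is widened from 64 to 256 units, and the actor/critic heads switch from Tanh to ReLU. The CNN layer sizes and the `"rgb"`/`"state"` key names follow the surrounding code, but this is an illustrative rewrite under those assumptions, not the exact class in `ppo_rgb.py`:

```python
import torch
import torch.nn as nn

class RGBStateFeatureExtractor(nn.Module):
    """Sketch: a CNN over the flattened "rgb" obs plus one Linear over "state" (now 256 units)."""
    def __init__(self, sample_obs: dict):
        super().__init__()
        extractors = {}
        self.out_features = 0
        in_channels = sample_obs["rgb"].shape[-1]  # channels-last images
        cnn = nn.Sequential(
            nn.Conv2d(in_channels, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():
            n_flatten = cnn(sample_obs["rgb"].float().permute(0, 3, 1, 2) / 255.0).shape[1]
        extractors["rgb"] = nn.Sequential(cnn, nn.Linear(n_flatten, 256), nn.ReLU())
        self.out_features += 256
        # for state data we simply pass it through a single linear layer, widened to 256
        extractors["state"] = nn.Linear(sample_obs["state"].shape[-1], 256)
        self.out_features += 256
        self.extractors = nn.ModuleDict(extractors)

    def forward(self, observations: dict) -> torch.Tensor:
        encoded = []
        for key, extractor in self.extractors.items():
            x = observations[key]
            if key == "rgb":
                x = x.float().permute(0, 3, 1, 2) / 255.0  # NHWC uint8 -> NCHW float
            encoded.append(extractor(x))
        return torch.cat(encoded, dim=1)

# The actor/critic heads then map these features through Linear(latent, 512) layers,
# with the Tanh activations replaced by ReLU as shown in the diff above.
```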
44 changes: 41 additions & 3 deletions mani_skill/envs/sapien_env.py
@@ -275,8 +275,8 @@ def __init__(
self.single_observation_space
self.observation_space

def _update_obs_space(self, obs: Any):
"""call this function if you modify the observations returned by env.step and env.reset via an observation wrapper. The given observation must be a numpy array"""
def update_obs_space(self, obs: Any):
"""call this function if you modify the observations returned by env.step and env.reset via an observation wrapper."""
self._init_raw_obs = common.to_numpy(obs)
del self.single_observation_space
del self.observation_space
@@ -346,6 +346,10 @@ def _default_human_render_camera_configs(
"""Add default cameras for rendering when using render_mode='rgb_array'. These can be overriden by the user at env creation time """
return []

@property
def scene(self):
return self._scene

@property
def sim_freq(self):
return self._sim_freq
@@ -902,7 +906,10 @@ def _after_simulation_step(self):
# Simulation and other gym interfaces
# -------------------------------------------------------------------------- #
def _set_scene_config(self):
physx.set_scene_config(**self.sim_cfg.scene_cfg.dict())
# note: the previous single physx.set_scene_config(**self.sim_cfg.scene_cfg.dict()) call is split across shape/body/scene configs in the latest SAPIEN release
physx.set_shape_config(contact_offset=self.sim_cfg.scene_cfg.contact_offset, rest_offset=self.sim_cfg.scene_cfg.rest_offset)
physx.set_body_config(solver_position_iterations=self.sim_cfg.scene_cfg.solver_position_iterations, solver_velocity_iterations=self.sim_cfg.scene_cfg.solver_velocity_iterations, sleep_threshold=self.sim_cfg.scene_cfg.sleep_threshold)
physx.set_scene_config(gravity=self.sim_cfg.scene_cfg.gravity, bounce_threshold=self.sim_cfg.scene_cfg.bounce_threshold, enable_pcm=self.sim_cfg.scene_cfg.enable_pcm, enable_tgs=self.sim_cfg.scene_cfg.enable_tgs, enable_ccd=self.sim_cfg.scene_cfg.enable_ccd, enable_enhanced_determinism=self.sim_cfg.scene_cfg.enable_enhanced_determinism, enable_friction_every_iteration=self.sim_cfg.scene_cfg.enable_friction_every_iteration, cpu_workers=self.sim_cfg.scene_cfg.cpu_workers )
physx.set_default_material(**self.sim_cfg.default_materials_cfg.dict())

def _setup_scene(self):
@@ -1169,3 +1176,34 @@ def render(self):
# scene_mesh = merge_meshes(meshes)
# scene_pcd = scene_mesh.sample(num_points)
# return scene_pcd


# Printing metrics/info
def print_sim_details(self):
sensor_settings_str = []
for uid, cam in self._sensors.items():
if isinstance(cam, Camera):
cfg = cam.cfg
sensor_settings_str.append(f"RGBD ({cfg.width}x{cfg.height})")
sensor_settings_str = "_".join(sensor_settings_str)
sim_backend = "gpu" if physx.is_gpu_enabled() else "cpu"
print(
"# -------------------------------------------------------------------------- #"
)
print(
f"Task ID: {self.spec.id}, {self.num_envs} parallel environments, sim_backend={sim_backend}"
)
print(
f"obs_mode={self.obs_mode}, control_mode={self.control_mode}"
)
print(
f"render_mode={self.render_mode}, sensor_details={sensor_settings_str}"
)
print(
f"sim_freq={self.sim_freq}, control_freq={self.control_freq}"
)
print(f"observation space: {self.observation_space}")
print(f"(single) action space: {self.single_action_space}")
print(
"# -------------------------------------------------------------------------- #"
)
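The rename of `_update_obs_space` to `update_obs_space` (and the relaxed docstring) suggests it is now intended as a public hook for observation wrappers that change the observation structure. Below is a hedged sketch of how a hypothetical wrapper might use it; the wrapper name and the dropped-key example are illustrative, and obtaining a sample observation through `_init_raw_obs` mirrors the cached value shown in the diff above:

```python
import gymnasium as gym

class DropObsKeyWrapper(gym.ObservationWrapper):
    """Hypothetical wrapper that removes one key from a dict observation and asks the
    base ManiSkill env to rebuild its cached observation spaces to match."""
    def __init__(self, env, key_to_drop: str):
        super().__init__(env)
        self.key_to_drop = key_to_drop
        base_env = env.unwrapped
        # transform the cached sample observation and refresh observation_space /
        # single_observation_space so downstream code sees the modified structure
        base_env.update_obs_space(self.observation(base_env._init_raw_obs))

    def observation(self, observation):
        observation = dict(observation)
        observation.pop(self.key_to_drop, None)
        return observation
```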
4 changes: 2 additions & 2 deletions mani_skill/envs/scene.py
@@ -521,9 +521,9 @@ def _setup_gpu(self):
# As physx_system.gpu_init() was called a single physx step was also taken. So we need to reset
# all the actors and articulations to their original poses as they likely have collided
for actor in self.non_static_actors:
actor.set_pose(actor.inital_pose)
actor.set_pose(actor.initial_pose)
for articulation in self.articulations.values():
articulation.set_pose(articulation.inital_pose)
articulation.set_pose(articulation.initial_pose)
self.px.gpu_apply_rigid_dynamic_data()
self.px.gpu_apply_articulation_root_pose()

4 changes: 3 additions & 1 deletion mani_skill/envs/tasks/control/cartpole.py
@@ -73,7 +73,9 @@ def __init__(self, *args, robot_uids=CartPoleRobot, **kwargs):
@property
def _default_sim_config(self):
return SimConfig(
sim_freq=100, control_freq=100, scene_cfg=SceneConfig(solver_iterations=2)
sim_freq=100,
control_freq=100,
scene_cfg=SceneConfig(solver_position_iterations=2),
)

@property
2 changes: 1 addition & 1 deletion mani_skill/envs/tasks/quadruped_run.py
@@ -52,7 +52,7 @@ def _default_sim_config(self):
max_rigid_contact_count=2**20,
),
scene_cfg=SceneConfig(
solver_iterations=4,
solver_position_iterations=4,
bounce_threshold=0.2,
),
)
4 changes: 2 additions & 2 deletions mani_skill/envs/tasks/tabletop/pick_clutter_ycb.py
@@ -164,11 +164,11 @@ def _initialize_episode(self, env_idx: torch.Tensor, options: dict):
# reset objects to original poses
if b == self.num_envs:
# if all envs reset
self.all_objects.pose = self.all_objects.inital_pose
self.all_objects.pose = self.all_objects.initial_pose
else:
# if only some envs reset, we unfortunately still have to do some mask wrangling
mask = torch.isin(self.all_objects._scene_idxs, env_idx)
self.all_objects.pose = self.all_objects.inital_pose[mask]
self.all_objects.pose = self.all_objects.initial_pose[mask]

def evaluate(self):
return {
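The mask handling above is the general pattern for partially resetting actors that were merged across parallel sub-scenes: when only some environments reset, select the rows of the cached `initial_pose` whose `_scene_idxs` fall inside `env_idx`. A generic restatement of that logic (a sketch; it assumes the merged actor exposes the same `initial_pose` and `_scene_idxs` attributes used above):

```python
import torch

def reset_merged_actor_poses(actor, env_idx: torch.Tensor, num_envs: int) -> None:
    """Restore a merged actor's objects to their initial poses for the envs in env_idx."""
    if len(env_idx) == num_envs:
        # every environment is resetting, so restore all poses at once
        actor.pose = actor.initial_pose
    else:
        # only a subset is resetting: keep the rows belonging to those environments
        mask = torch.isin(actor._scene_idxs, env_idx)
        actor.pose = actor.initial_pose[mask]
```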
9 changes: 7 additions & 2 deletions mani_skill/envs/tasks/tabletop/pick_cube.py
@@ -70,8 +70,11 @@ def _initialize_episode(self, env_idx: torch.Tensor, options: dict):
self.goal_site.set_pose(Pose.create_from_pq(goal_xyz))

def _get_obs_extra(self, info: Dict):
# in reality some people hack is_grasped into observations by checking if the gripper can close fully or not
obs = dict(
tcp_pose=self.agent.tcp.pose.raw_pose, goal_pos=self.goal_site.pose.p
is_grasped=info["is_grasped"],
tcp_pose=self.agent.tcp.pose.raw_pose,
goal_pos=self.goal_site.pose.p,
)
if "state" in self.obs_mode:
obs.update(
@@ -86,11 +89,13 @@ def evaluate(self):
torch.linalg.norm(self.goal_site.pose.p - self.cube.pose.p, axis=1)
<= self.goal_thresh
)
is_grasped = self.agent.is_grasping(self.cube)
is_robot_static = self.agent.is_static(0.2)
return {
"success": is_obj_placed & is_robot_static,
"is_obj_placed": is_obj_placed,
"is_robot_static": is_robot_static,
"is_grasped": is_grasped,
}

def compute_dense_reward(self, obs: Any, action: torch.Tensor, info: Dict):
@@ -100,7 +105,7 @@ def compute_dense_reward(self, obs: Any, action: torch.Tensor, info: Dict):
reaching_reward = 1 - torch.tanh(5 * tcp_to_obj_dist)
reward = reaching_reward

is_grasped = self.agent.is_grasping(self.cube)
is_grasped = info["is_grasped"]
reward += is_grasped

obj_to_goal_dist = torch.linalg.norm(
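The PickCube changes follow one pattern: compute `is_grasped` once per step in `evaluate()`, then reuse it via the `info` dict both as an extra observation (likely part of what the commit message means by making PickCube easier from vision) and inside the dense reward, rather than recomputing the grasp check in each place. A condensed sketch of that data flow, using the names from the diff above; it is illustrative and not the complete task code:

```python
import torch

# Condensed sketch; self.agent, self.cube, self.goal_site and self.goal_thresh are the
# attributes used by pick_cube.py in the diff above.
def evaluate(self):
    # evaluate() runs each step and its result is passed as `info` to the observation
    # and reward functions, so the grasp check is computed exactly once per step
    is_obj_placed = (
        torch.linalg.norm(self.goal_site.pose.p - self.cube.pose.p, axis=1) <= self.goal_thresh
    )
    is_grasped = self.agent.is_grasping(self.cube)
    is_robot_static = self.agent.is_static(0.2)
    return {
        "success": is_obj_placed & is_robot_static,
        "is_obj_placed": is_obj_placed,
        "is_robot_static": is_robot_static,
        "is_grasped": is_grasped,
    }

def _get_obs_extra(self, info):
    # the grasp flag is exposed to the policy, which helps vision-based training
    return dict(
        is_grasped=info["is_grasped"],
        tcp_pose=self.agent.tcp.pose.raw_pose,
        goal_pos=self.goal_site.pose.p,
    )

def compute_dense_reward(self, obs, action, info):
    tcp_to_obj_dist = torch.linalg.norm(self.cube.pose.p - self.agent.tcp.pose.p, axis=1)
    reward = 1 - torch.tanh(5 * tcp_to_obj_dist)
    reward += info["is_grasped"]  # reused here instead of calling is_grasping again
    return reward
```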
27 changes: 2 additions & 25 deletions mani_skill/examples/benchmarking/gpu_sim.py
@@ -38,31 +38,8 @@ def main(args):
else:
env = gym.make_vec(args.env_id, num_envs=args.num_envs, vectorization_mode="async", vector_kwargs=dict(context="spawn"), obs_mode=args.obs_mode,)
base_env = gym.make(args.env_id, obs_mode=args.obs_mode).unwrapped
sensor_settings_str = []
for uid, cam in base_env._sensors.items():
cfg = cam.cfg
sensor_settings_str.append(f"{cfg.width}x{cfg.height}")
sensor_settings_str = "_".join(sensor_settings_str)
print(
"# -------------------------------------------------------------------------- #"
)
print(
f"Benchmarking ManiSkill GPU Simulation with {num_envs} parallel environments"
)
print(
f"env_id={args.env_id}, obs_mode={args.obs_mode}, control_mode={args.control_mode}"
)
print(
f"render_mode={args.render_mode}, sensor_details={sensor_settings_str}, save_video={args.save_video}"
)
print(
f"sim_freq={base_env.sim_freq}, control_freq={base_env.control_freq}"
)
print(f"observation space: {env.observation_space}")
print(f"action space: {base_env.single_action_space}")
print(
"# -------------------------------------------------------------------------- #"
)

base_env.print_sim_details()
images = []
video_nrows = int(np.sqrt(num_envs))
with torch.inference_mode():
6 changes: 4 additions & 2 deletions mani_skill/utils/building/actor_builder.py
@@ -212,8 +212,10 @@ def build(self, name):
and initial_pose_b == 1
and physx.is_gpu_enabled()
):
actor.inital_pose = Pose.create(initial_pose.raw_pose.repeat(num_actors, 1))
actor.initial_pose = Pose.create(
initial_pose.raw_pose.repeat(num_actors, 1)
)
else:
actor.inital_pose = initial_pose
actor.initial_pose = initial_pose
self.scene.actors[self.name] = actor
return actor
2 changes: 1 addition & 1 deletion mani_skill/utils/building/articulation_builder.py
@@ -188,6 +188,6 @@ def build(
articulation: Articulation = Articulation.create_from_physx_articulations(
articulations, self.scene, self.scene_idxs
)
articulation.inital_pose = initial_pose
articulation.initial_pose = initial_pose
self.scene.articulations[self.name] = articulation
return articulation