Multiprocessing and better documentation
bpiwowar committed Sep 6, 2024
1 parent 0e4a6ef commit f028543
Showing 6 changed files with 267 additions and 106 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,8 @@
# Version 0.6.0

- *Multiprocess*: no more limitation on the number of races
- Improved documentation

# Version 0.5.0

- Changed default for steer discretization steps (7)
114 changes: 74 additions & 40 deletions README.md
@@ -22,19 +22,18 @@
Each controlled kart is parametrized by `pystk2_gymnasium.AgentSpec`:
- `name` defines the name of the player (displayed on top of the kart)
- `rank_start` defines the starting position (None for random, which is the
default)
- `use_ai` flag (False by default) to ignore actions: when calling `step`, a
  SuperTuxKart bot is used instead of the provided action
- `camera_mode` can be set to `AUTO` (camera on for non-STK bots), `ON` (camera
  on) or `OFF` (no camera).
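
For instance, a player-controlled kart with the camera forced on could be
specified as below (a minimal sketch; the `CameraMode` import path is an
assumption, since only `AgentSpec` is shown as a package-level export):

```py
from pystk2_gymnasium import AgentSpec
# Assumption: CameraMode is defined alongside AgentSpec in the definitions module
from pystk2_gymnasium.definitions import CameraMode

# Random starting position (rank_start=None), actions supplied by us
# (use_ai=False), camera on
spec = AgentSpec(name="Rider", rank_start=None, use_ai=False,
                 camera_mode=CameraMode.ON)
```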


## Current limitations

- no graphics information is available (i.e. pixmap)

## Environments

After importing `pystk2_gymnasium`, the following environments are available:

@@ -55,19 +54,22 @@
- `difficulty` sets the difficulty of the AI bots (from lowest 0 to highest 2,
  defaults to 2)
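
For instance, a race against the weakest bots might be created as follows (a
sketch; the base environment id `supertuxkart/full-v0` is an assumption, as
the full environment list above is collapsed):

```py
import gymnasium as gym
from pystk2_gymnasium import AgentSpec

if __name__ == "__main__":
    # difficulty=0 selects the easiest AI opponents
    env = gym.make("supertuxkart/full-v0", difficulty=0,
                   agent=AgentSpec(use_ai=False))
    env.close()
```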

Some environments are created using wrappers (see below for wrapper
documentation):

- `supertuxkart/simple-v0` (wrappers: `ConstantSizedObservations`) is a
  simplified environment with a fixed number of observations for paths
  (controlled by `state_paths`, default 5), items (`state_items`, default 5)
  and karts (`state_karts`, default 5)
- `supertuxkart/flattened-v0` (wrappers: `ConstantSizedObservations`,
  `PolarObservations`, `FlattenerWrapper`) has maximally simplified
  observation and action spaces (only `discrete` and `continuous` keys)
- `supertuxkart/flattened_continuous_actions-v0` (wrappers:
  `ConstantSizedObservations`, `PolarObservations`,
  `OnlyContinuousActionsWrapper`, `FlattenerWrapper`) removes discrete actions
  (setting them to their default of 0), so this is steer/acceleration only in
  the continuous domain
- `supertuxkart/flattened_multidiscrete-v0` (wrappers:
  `ConstantSizedObservations`, `PolarObservations`, `DiscreteActionsWrapper`,
  `FlattenerWrapper`) is like the previous one, but with fully multi-discrete
  actions. `acceleration_steps` and `steer_steps` (default to 5) control the
  number of discrete values for acceleration and steering respectively.
- `supertuxkart/flattened_discrete-v0` (wrappers: `ConstantSizedObservations`,
  `PolarObservations`, `DiscreteActionsWrapper`, `FlattenerWrapper`,
  `FlattenMultiDiscreteActions`) is like the previous one, but with fully
  discretized actions
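
A quick way to see the effect of the flattening wrappers is to inspect the
spaces (a sketch based on the descriptions above):

```py
import gymnasium as gym
from pystk2_gymnasium import AgentSpec

if __name__ == "__main__":
    env = gym.make("supertuxkart/flattened-v0", agent=AgentSpec(use_ai=False))
    # Both spaces are dictionaries with `discrete` and `continuous` keys
    print(env.observation_space["discrete"])    # MultiDiscrete(...)
    print(env.observation_space["continuous"])  # Box(...)
    print(env.action_space)
    env.close()
```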

The reward $r_t$ at time $t$ is given by
@@ -83,20 +85,41 @@ finishes the race.

Wrappers can be used to modify the environment.

### Constant-size observation

`pystk2_gymnasium.ConstantSizedObservations(env, state_items=5,
state_karts=5, state_paths=5)` ensures that the number of observed items,
karts and paths is constant. By default, the number of observations per
category is 5.

### Polar observations

`pystk2_gymnasium.PolarObservations(env)` changes Cartesian coordinates to
polar ones (angle in the horizontal plane, angle in the vertical plane, and
distance) for all 3D vectors.

### Discrete actions

`pystk2_gymnasium.DiscreteActionsWrapper(env, acceleration_steps=5,
steer_steps=7)` discretizes the acceleration and steer actions (into 5 and 7
values respectively).

### Flattener (actions and observations)

This wrapper groups all continuous and discrete spaces together.

`pystk2_gymnasium.FlattenerWrapper(env)` flattens **actions and
observations**. The base environment should be a dictionary of observation
spaces. The transformed environment is a dictionary with two entries,
`discrete` and `continuous` (if both continuous and discrete
observations/actions are present in the initial environment; otherwise, only
the relevant one of the two is kept). `discrete` is a `MultiDiscrete` space
that combines all the discrete (and multi-discrete) observations, while
`continuous` is a `Box` space.

### Flatten multi-discrete actions

`pystk2_gymnasium.FlattenMultiDiscreteActions(env)` flattens a multi-discrete
action space into a discrete one, with one action per possible unique choice
of actions. For instance, if the initial space is $\{0, 1\} \times \{0, 1,
2\}$, the action space becomes $\{0, 1, \ldots, 5\}$ (one index for each of
the $2 \times 3 = 6$ combinations).
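
As an illustration, these wrappers can also be composed manually; the sketch
below mirrors the chain used by `supertuxkart/flattened_multidiscrete-v0`
(assuming the base environment id `supertuxkart/full-v0`):

```py
import gymnasium as gym
from pystk2_gymnasium import (
    AgentSpec,
    ConstantSizedObservations,
    DiscreteActionsWrapper,
    FlattenerWrapper,
    PolarObservations,
)

if __name__ == "__main__":
    env = gym.make("supertuxkart/full-v0", agent=AgentSpec(use_ai=False))
    # Same wrapper chain as supertuxkart/flattened_multidiscrete-v0
    env = ConstantSizedObservations(env, state_paths=5, state_items=5,
                                    state_karts=5)
    env = PolarObservations(env)
    env = DiscreteActionsWrapper(env, acceleration_steps=5, steer_steps=7)
    env = FlattenerWrapper(env)
    env.close()
```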


## Multi-agent environment
@@ -107,9 +130,13 @@
dictionary of single-kart ones, where **string** keys range from `0` to
`n-1`, with `n` the number of karts.
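
Sampling and stepping then work with dictionaries keyed by kart index; as a
sketch (assuming `env` is such a multi-kart environment):

```py
# action_space is a dictionary space: sampling yields one action per kart,
# under the string keys "0", "1", ..., "n-1"
actions = env.action_space.sample()
state, reward, terminated, truncated, _ = env.step(actions)
```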

To use different gymnasium wrappers, one can use a `MonoAgentWrapperAdapter`.
Let's look at an example to illustrate this:

```py
from pystk2_gymnasium import AgentSpec
# Assumption: CameraMode is defined alongside AgentSpec in the definitions module
from pystk2_gymnasium.definitions import CameraMode

agents = [
    AgentSpec(use_ai=True, name="Yin Team", camera_mode=CameraMode.ON),
    AgentSpec(use_ai=True, name="Yang Team", camera_mode=CameraMode.ON),
]
```
@@ -155,7 +182,8 @@ up):
- `paths_start`, `paths_end`, `paths_width`: 3D vectors to the paths start and
end, and vector of their widths (scalar). The paths are sorted so that the
first element of the array is the current one.
- `paths_distance`: the distance of the paths starts and ends (vector of
  dimension 2)
- `powerup`: collected power-up
- `shield_time`
- `skeed_factor`
@@ -167,19 +195,25 @@

```py
import gymnasium as gym
from pystk2_gymnasium import AgentSpec

# STK gymnasium uses one process per environment,
# so the entry point must be guarded
if __name__ == "__main__":
    # Use a flattened version of the observation and action spaces.
    # In both cases, this corresponds to a dictionary with two keys:
    # - `continuous` is a vector corresponding to the continuous observations
    # - `discrete` is a vector (of integers) corresponding to discrete observations
    env = gym.make(
        "supertuxkart/flattened-v0",
        render_mode="human",
        agent=AgentSpec(use_ai=False),
    )

    ix = 0
    done = False
    state, *_ = env.reset()

    while not done:
        ix += 1
        action = env.action_space.sample()
        state, reward, terminated, truncated, _ = env.step(action)
        done = truncated or terminated

    # Important to stop the STK process
    env.close()
```
11 changes: 10 additions & 1 deletion src/pystk2_gymnasium/__init__.py
@@ -1,6 +1,15 @@
from gymnasium.envs.registration import register, WrapperSpec
from .definitions import ActionObservationWrapper, AgentSpec # noqa: F401
from .wrappers import ( # noqa: F401
MonoAgentWrapperAdapter,
FlattenMultiDiscreteActions,
FlattenerWrapper,
)
from .stk_wrappers import ( # noqa: F401
ConstantSizedObservations,
DiscreteActionsWrapper,
PolarObservations,
)

# Version is setup automatically
__version__ = "0.0.0"
85 changes: 32 additions & 53 deletions src/pystk2_gymnasium/envs.py
@@ -7,6 +7,8 @@
import pystk2
from gymnasium import spaces

from pystk2_gymnasium.pystk_process import PySTKProcess

from .utils import max_enum_value, rotate
from .definitions import AgentSpec

@@ -117,28 +119,18 @@ def get_action(action: STKAction):
class BaseSTKRaceEnv(gym.Env[Any, STKAction]):
    metadata = {"render_modes": ["human"]}

    #: List of available tracks
    TRACKS: ClassVar[List[str]] = []

    #: Handle on the SuperTuxKart process (spawned lazily, one per environment)
    _process: PySTKProcess = None

    def initialize(self, with_graphics: bool):
        # Spawn the SuperTuxKart process on first use
        if self._process is None:
            self._process = PySTKProcess(with_graphics)

        if not BaseSTKRaceEnv.TRACKS:
            BaseSTKRaceEnv.TRACKS = self._process.list_tracks()

    def __init__(
        self,
@@ -164,7 +156,7 @@ def __init__(

        assert render_mode is None or render_mode in self.metadata["render_modes"]
        self.render_mode = render_mode
        self.initialize(render_mode == "human")

        # Setup the variables
        self.default_track = track
@@ -207,36 +199,14 @@ def reset_race(
                ix
            ].controller = pystk2.PlayerConfig.Controller.AI_CONTROL

    def world_update(self, keep=True):
        """Update world state, but keep some information to compute reward"""
        if keep:
            self.last_overall_distances = [
                max(kart.overall_distance, 0) for kart in self.world.karts
            ]
        self.world = self._process.get_world()
        return self.world

    def get_state(self, kart_ix: int, use_ai: bool):
        kart = self.world.karts[kart_ix]
@@ -337,7 +307,7 @@ def sort_closest(positions, *lists):
        obs = {}
        if use_ai:
            # Adds actions
            action = self._process.get_kart_action(kart_ix)
            obs = {
                "action": {
                    "acceleration": np.array([action.acceleration], dtype=np.float32),
@@ -392,6 +362,15 @@ def render(self):
        # Just do nothing... rendering is done directly
        pass

    def race_step(self, *action):
        # Delegate stepping to the SuperTuxKart process
        return self._process.race_step(*action)

    def warmup_race(self):
        # Creates a new race (in the subprocess) and steps until the first move
        self.track = self._process.warmup_race(self.config)

    def close(self):
        self._process.close()

class STKRaceEnv(BaseSTKRaceEnv):
    """Single player race environment"""
@@ -445,17 +424,17 @@ def reset(
        ].controller = pystk2.PlayerConfig.Controller.PLAYER_CONTROL

        self.warmup_race()
        self.world_update(False)

        return self.get_observation(self.kart_ix, self.agent.use_ai), {}

    def step(
        self, action: STKAction
    ) -> Tuple[pystk2.WorldState, float, bool, bool, Dict[str, Any]]:
        if self.agent.use_ai:
            self.race_step()
        else:
            self.race_step(get_action(action))

        self.world_update()

@@ -537,7 +516,7 @@ def reset(
        logging.debug("Observed kart indices %s", self.kart_indices)

        self.warmup_race()
        self.world_update(False)

        return (
            {
@@ -554,7 +533,7 @@
    ) -> Tuple[pystk2.WorldState, float, bool, bool, Dict[str, Any]]:
        # Performs the action
        assert len(actions) == len(self.agents)
        self.race_step(
            [
                get_action(actions[str(agent_ix)])
                for agent_ix, agent in enumerate(self.agents)