Utilities

Utilities for training and evaluating RL models on OpenAI gym environments.

class numpy_ml.rl_models.rl_utils.EnvModel

A simple tabular environment model that maintains the counts of each (reward, outcome) pair given the state and action that preceded it. The model can be queried as follows:

>>> M = EnvModel()
>>> M[(state, action, reward, next_state)] += 1
>>> M[(state, action, reward, next_state)]
1
>>> M.state_action_pairs()
[(state, action)]
>>> M.outcome_probs(state, action)
[(next_state, 1)]
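The counting behaviour described above can be sketched with nested dictionaries. The class below is a hypothetical stand-in named TabularCountModel, not the library's EnvModel; it assumes hashable states, actions, and rewards, and mirrors only the indexing, state_action_pairs, and reward_outcome_pairs behaviour documented here.

```python
from collections import defaultdict


class TabularCountModel:
    """Hypothetical stand-in for EnvModel: counts of (reward, next_state)
    outcomes keyed by the (state, action) pair that preceded them."""

    def __init__(self):
        # (s, a) -> {(r, s_): count}
        self._counts = defaultdict(lambda: defaultdict(int))

    def __getitem__(self, key):
        s, a, r, s_ = key
        # read without inserting, so querying an unseen key returns 0
        return self._counts.get((s, a), {}).get((r, s_), 0)

    def __setitem__(self, key, value):
        s, a, r, s_ = key
        self._counts[(s, a)][(r, s_)] = value

    def state_action_pairs(self):
        return list(self._counts.keys())

    def reward_outcome_pairs(self, s, a):
        return list(self._counts.get((s, a), {}).keys())


M = TabularCountModel()
M[(0, 1, -1.0, 2)] += 1   # __getitem__ returns 0, then __setitem__ stores 1
M[(0, 1, -1.0, 2)] += 1
M[(0, 1, 0.0, 3)] += 1
print(M[(0, 1, -1.0, 2)])          # 2
print(M.state_action_pairs())      # [(0, 1)]
print(M.reward_outcome_pairs(0, 1))  # [(-1.0, 2), (0.0, 3)]
```

Making reads non-mutating keeps state_action_pairs limited to pairs that were actually recorded.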

state_action_pairs()

Return all (state, action) pairs in the environment model.

reward_outcome_pairs(s, a)

Return all (reward, next_state) pairs associated with taking action a in state s.

outcome_probs(s, a)

Return the probability under the environment model of each outcome state after taking action a in state s.

Parameters:
  • s (int as returned by self._obs2num) – The id for the state/observation.
  • a (int as returned by self._action2num) – The id for the action taken from state s.
Returns:

outcome_probs (list of (state, prob) tuples) – A list of each possible outcome and its associated probability under the model.
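Under a count-based model, the probability of each outcome is its count divided by the total count recorded for that (state, action) pair, i.e. its empirical relative frequency. A minimal sketch, assuming the counts for a single (state, action) pair are held in a plain dict keyed by next state; outcome_probs_from_counts is a hypothetical helper, not part of numpy_ml.

```python
def outcome_probs_from_counts(counts):
    """Hypothetical helper: normalize {outcome: count} into (outcome, prob) pairs."""
    total = sum(counts.values())
    if total == 0:
        # no observations for this (state, action) pair yet
        return []
    return [(outcome, c / total) for outcome, c in counts.items()]


# counts of next states observed after taking some action a in some state s
counts = {2: 3, 3: 1}
probs = outcome_probs_from_counts(counts)
print(probs)  # [(2, 0.75), (3, 0.25)]
```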

state_action_pairs_leading_to_outcome(outcome)

Return all (state, action) pairs that have a nonzero probability of producing outcome under the current model.

Parameters:
  • outcome (int) – The outcome state.
Returns:

pairs (list of (state, action) tuples) – A list of all (state, action) pairs with a nonzero probability of producing outcome under the model.
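Finding the predecessors of an outcome amounts to scanning every (state, action) pair for a nonzero count that ends in that state; this is the kind of lookup a planning method such as prioritized sweeping relies on. The sketch below assumes counts stored as {(s, a): {(r, s_): count}}; the name pairs_leading_to is hypothetical.

```python
def pairs_leading_to(counts, outcome):
    """Hypothetical helper: counts maps (s, a) -> {(r, s_): count}.
    Return every (s, a) with a nonzero count of reaching `outcome`."""
    return [
        sa
        for sa, outcomes in counts.items()
        if any(s_ == outcome and c > 0 for (_, s_), c in outcomes.items())
    ]


counts = {
    (0, 0): {(0.0, 1): 2},
    (0, 1): {(-1.0, 2): 1},
    (1, 0): {(1.0, 2): 4, (0.0, 0): 1},
}
print(pairs_leading_to(counts, 2))  # [(0, 1), (1, 0)]
```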

numpy_ml.rl_models.rl_utils.tile_state_space(env, env_stats, n_tilings, obs_max=None, obs_min=None, state_action=False, grid_size=(4, 4))

Return a function that encodes the continuous observations generated by env in terms of n_tilings overlapping tilings of the state space, each a grid of grid_size tiles.

Parameters:
  • env (gym.wrappers.time_limit.TimeLimit instance) – An OpenAI gym environment.
  • env_stats (dict) – Information about env's action and observation spaces (e.g., the output of env_stats(env)).
  • n_tilings (int) – The number of overlapping tilings to use. Should be a power of 2. This determines the dimension of the discretized tile-encoded state vector.
  • obs_max (float or np.ndarray) – The value to treat as the max value of the observation space when calculating the grid widths. If None, use env.observation_space.high. Default is None.
  • obs_min (float or np.ndarray) – The value to treat as the min value of the observation space when calculating the grid widths. If None, use env.observation_space.low. Default is None.
  • state_action (bool) – Whether to use tile coding to encode state-action values (True) or just state values (False). Default is False.
  • grid_size (tuple or list of length 2) – The number of tiles along each dimension of a single tiling. E.g., a grid_size of (4, 4) means each tiling consists of a 4x4 grid of tiles. Default is (4, 4).
Returns:

  • encode_obs_as_tile (function) – A function that takes a continuous observation vector as input and returns the set of indices of the active tiles in the tile-coded observation space.
  • n_states (int) – An integer reflecting the total number of unique states possible under this tile coding regimen.
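To illustrate what the returned encoder does, here is a simplified tile coder for a bounded observation space. It is a sketch under stated assumptions rather than the library's implementation: each of the n_tilings grids is shifted by a uniform fraction of one tile width along every dimension, out-of-range coordinates are clipped to the grid, and the state_action option is not handled. make_tile_encoder is a hypothetical name.

```python
import numpy as np


def make_tile_encoder(obs_min, obs_max, n_tilings=8, grid_size=(4, 4)):
    """Hypothetical, simplified tile coder: n_tilings uniformly offset grids."""
    obs_min = np.asarray(obs_min, dtype=float)
    obs_max = np.asarray(obs_max, dtype=float)
    grid = np.asarray(grid_size, dtype=int)
    tile_width = (obs_max - obs_min) / grid      # tile extent per dimension
    tiles_per_tiling = int(np.prod(grid))
    n_states = n_tilings * tiles_per_tiling      # total number of distinct tiles

    def encode(obs):
        obs = np.asarray(obs, dtype=float)
        active = set()
        for t in range(n_tilings):
            # shift tiling t by the fraction t / n_tilings of one tile width
            offset = tile_width * t / n_tilings
            coords = np.floor((obs - obs_min + offset) / tile_width).astype(int)
            # clip so points near the upper edge stay inside the grid
            coords = np.clip(coords, 0, grid - 1)
            flat = int(np.ravel_multi_index(tuple(coords), tuple(grid)))
            # give every tiling its own block of indices
            active.add(t * tiles_per_tiling + flat)
        return active

    return encode, n_states


encode, n_states = make_tile_encoder([-1.2, -0.07], [0.6, 0.07], n_tilings=8)
active = encode([-0.5, 0.0])
print(n_states, len(active))  # 128 8
```

Each observation activates exactly one tile per tiling, so the encoded set always has n_tilings elements, while n_states equals n_tilings times the number of tiles in one tiling.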

numpy_ml.rl_models.rl_utils.get_gym_environs()

List all valid OpenAI gym environment IDs.

numpy_ml.rl_models.rl_utils.get_gym_stats()

Return a pandas DataFrame of the environment IDs.

numpy_ml.rl_models.rl_utils.is_tuple(env)

Check if the action and observation spaces for env are instances of gym.spaces.Tuple or gym.spaces.Dict.

Notes

A tuple space is a tuple of several (possibly multidimensional) action/observation spaces. For our purposes, a tuple space is necessarily multidimensional.

Parameters:
  • env (gym.wrappers or gym.envs instance) – The environment to evaluate.
Returns:
  • tuple_action (bool) – Whether the env’s action space is an instance of gym.spaces.Tuple or gym.spaces.Dict.
  • tuple_obs (bool) – Whether the env’s observation space is an instance of gym.spaces.Tuple or gym.spaces.Dict.

numpy_ml.rl_models.rl_utils.is_multidimensional(env)

Check if the action and observation spaces for env are multidimensional or Tuple spaces.

Notes

A multidimensional space is any space whose actions / observations have more than one element in them. This includes Tuple spaces, but also includes single action/observation spaces with several dimensions.

Parameters:
  • env (gym.wrappers or gym.envs instance) – The environment to evaluate.
Returns:
  • md_action (bool) – Whether the env’s action space is multidimensional.
  • md_obs (bool) – Whether the env’s observation space is multidimensional.
  • tuple_action (bool) – Whether the env’s action space is a Tuple instance.
  • tuple_obs (bool) – Whether the env’s observation space is a Tuple instance.
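The rule in the Notes above can be expressed in terms of a space's shape: count the elements in a single sample, and treat Tuple/Dict spaces as multidimensional outright. This is a hedged sketch operating on plain shape tuples rather than gym space objects; is_multidimensional_shape is hypothetical, and it assumes a Discrete space reports the empty shape ().

```python
import numpy as np


def is_multidimensional_shape(shape, is_tuple_space=False):
    """Hypothetical check: a space is multidimensional if it is a Tuple/Dict
    space, or if a single sample has more than one element."""
    if is_tuple_space:
        # Tuple/Dict spaces are treated as multidimensional regardless of shape
        return True
    # np.prod(()) == 1.0, so a scalar shape such as () counts as one element
    return int(np.prod(shape)) > 1


print(is_multidimensional_shape(()))          # False, e.g. a Discrete space
print(is_multidimensional_shape((4,)))        # True, e.g. a 4-dim Box
print(is_multidimensional_shape((1,)))        # False
print(is_multidimensional_shape(None, True))  # True, a Tuple space
```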

numpy_ml.rl_models.rl_utils.is_continuous(env, tuple_action, tuple_obs)

Check if an env’s observation and action spaces are continuous.

Parameters:
  • env (gym.wrappers or gym.envs instance) – The environment to evaluate.
  • tuple_action (bool) – Whether the env’s action space is an instance of gym.spaces.Tuple or gym.spaces.Dict.
  • tuple_obs (bool) – Whether the env’s observation space is an instance of gym.spaces.Tuple or gym.spaces.Dict.
Returns:

  • cont_action (bool) – Whether the env’s action space is continuous.
  • cont_obs (bool) – Whether the env’s observation space is continuous.

numpy_ml.rl_models.rl_utils.action_stats(env, md_action, cont_action)

Get information on env’s action space.

Parameters:
  • env (gym.wrappers or gym.envs instance) – The environment to evaluate.
  • md_action (bool) – Whether the env’s action space is multidimensional.
  • cont_action (bool) – Whether the env’s action space is continuous.
Returns:

  • n_actions_per_dim (list of length action_dim) – The number of possible actions for each dimension of the action space.
  • action_ids (list or None) – A list of all valid actions within the space. If cont_action is True, this value will be None.
  • action_dim (int or None) – The number of dimensions in a single action.
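For a discrete space, the list of valid actions is the Cartesian product of the per-dimension action ranges, so its length is the product of n_actions_per_dim. A minimal sketch, assuming each dimension's actions are the integers 0 to n - 1; enumerate_action_ids is a hypothetical helper, and the library may represent single-dimension actions differently.

```python
from itertools import product


def enumerate_action_ids(n_actions_per_dim):
    """Hypothetical helper: list every joint action of a discrete,
    possibly multidimensional action space."""
    # one range per dimension; the Cartesian product yields all combinations
    return list(product(*[range(n) for n in n_actions_per_dim]))


action_ids = enumerate_action_ids([2, 3])
print(len(action_ids))  # 6
print(action_ids[:3])   # [(0, 0), (0, 1), (0, 2)]
```

The product grows quickly with the number of dimensions, and a continuous space has no finite enumeration at all, which is consistent with action_ids being None when cont_action is True.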

numpy_ml.rl_models.rl_utils.obs_stats(env, md_obs, cont_obs)

Get information on the observation space for env.

Parameters:
  • env (gym.wrappers or gym.envs instance) – The environment to evaluate.
  • md_obs (bool) – Whether the env’s observation space is multidimensional.
  • cont_obs (bool) – Whether the env’s observation space is continuous.
Returns:

  • n_obs_per_dim (list of length obs_dim) – The number of possible observation classes for each dimension of the observation space.
  • obs_ids (list or None) – A list of all valid observations within the space. If cont_obs is True, this value will be None.
  • obs_dim (int or None) – The number of dimensions in a single observation.

numpy_ml.rl_models.rl_utils.env_stats(env)

Compute statistics for the current environment.

Parameters:
  • env (gym.wrappers or gym.envs instance) – The environment to evaluate.
Returns:

env_info (dict) – A dictionary containing information about the action and observation spaces of env.