# Utilities

Utilities for training and evaluating RL models on OpenAI gym environments.

class numpy_ml.rl_models.rl_utils.EnvModel

A simple tabular environment model that maintains the counts of each reward-outcome pair given the state and action that preceded them. The model can be updated and queried as follows:

>>> M = EnvModel()
>>> M[(state, action, reward, next_state)] += 1
>>> M[(state, action, reward, next_state)]
1
>>> M.state_action_pairs()
[(state, action)]
>>> M.outcome_probs(state, action)
[(next_state, 1)]

state_action_pairs()

Return all (state, action) pairs in the environment model.

reward_outcome_pairs(s, a)

Return all (reward, next_state) pairs associated with taking action a in state s.

outcome_probs(s, a)

Return the probability under the environment model of each outcome state after taking action a in state s.

Parameters:

- s (int as returned by self._obs2num) – The id for the state/observation.
- a (int as returned by self._action2num) – The id for the action taken from state s.

Returns:

- outcome_probs (list of (state, prob) tuples) – A list of each possible outcome and its associated probability under the model.

state_action_pairs_leading_to_outcome(outcome)

Return all (state, action) pairs that have a nonzero probability of producing outcome under the current model.

Parameters:

- outcome (int) – The outcome state.

Returns:

- pairs (list of (state, action) tuples) – A list of all (state, action) pairs with a nonzero probability of producing outcome under the model.
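The counting scheme described for EnvModel can be illustrated with a small stand-alone sketch. This is not numpy_ml's implementation: the class name `TabularCountModel`, its `update` method, and its internal dictionary layout are assumptions made for illustration. Only the query method names mirror the ones documented above.

```python
from collections import defaultdict


class TabularCountModel:
    """Hypothetical stand-in for a count-based tabular environment model."""

    def __init__(self):
        # counts[(state, action)][(reward, next_state)] -> times observed
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, s, a, r, s_next):
        """Record one observed transition (assumed helper, not in the docs)."""
        self.counts[(s, a)][(r, s_next)] += 1

    def state_action_pairs(self):
        return list(self.counts.keys())

    def reward_outcome_pairs(self, s, a):
        return list(self.counts.get((s, a), {}).keys())

    def outcome_probs(self, s, a):
        """Return [(next_state, prob)], summing counts over rewards."""
        totals = defaultdict(int)
        for (_, s_next), n in self.counts.get((s, a), {}).items():
            totals[s_next] += n
        norm = sum(totals.values())
        return [(s_next, n / norm) for s_next, n in totals.items()]

    def state_action_pairs_leading_to_outcome(self, outcome):
        return [
            sa
            for sa, outcomes in self.counts.items()
            if any(s_next == outcome for (_, s_next) in outcomes)
        ]


model = TabularCountModel()
model.update(0, 1, 0.0, 2)
model.update(0, 1, 1.0, 2)
model.update(0, 1, 0.0, 3)
model.update(4, 0, 0.0, 3)
probs = dict(model.outcome_probs(0, 1))
pairs = model.state_action_pairs_leading_to_outcome(3)
```

In this sketch, state 2 follows (0, 1) in two of three recorded transitions, so its estimated probability is 2/3, and both (0, 1) and (4, 0) are reported as leading to state 3.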

numpy_ml.rl_models.rl_utils.tile_state_space(env, env_stats, n_tilings, obs_max=None, obs_min=None, state_action=False, grid_size=(4, 4))

Return a function to encode the continuous observations generated by env in terms of a collection of n_tilings overlapping tilings (each with dimension grid_size) of the state space.

Parameters:

- env (gym.wrappers.time_limit.TimeLimit instance) – An OpenAI gym environment.
- n_tilings (int) – The number of overlapping tilings to use. Should be a power of 2. This determines the dimension of the discretized tile-encoded state vector.
- obs_max (float or np.ndarray) – The value to treat as the max value of the observation space when calculating the grid widths. If None, use env.observation_space.high. Default is None.
- obs_min (float or np.ndarray) – The value to treat as the min value of the observation space when calculating the grid widths. If None, use env.observation_space.low. Default is None.
- state_action (bool) – Whether to use tile coding to encode state-action values (True) or just state values (False). Default is False.
- grid_size (list of length 2) – A list of ints representing the coarseness of the tilings. E.g., a grid_size of (4, 4) means each tiling consists of a 4x4 tile grid. Default is (4, 4).

Returns:

- encode_obs_as_tile (function) – A function that takes a continuous observation vector as input and returns a set of the indices of the active tiles in the tile-coded observation space.
- n_states (int) – An integer reflecting the total number of unique states possible under this tile coding regimen.
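To make the idea of overlapping tilings concrete, here is a minimal NumPy sketch of tile coding for a bounded continuous observation. It is an illustration under stated assumptions, not the code behind tile_state_space: the function name `make_tile_encoder`, the evenly spaced offset scheme, and the flat index layout are all hypothetical, and the second return value here counts tiles, which may differ from how the library computes n_states.

```python
import numpy as np


def make_tile_encoder(obs_min, obs_max, n_tilings=8, grid_size=(4, 4)):
    """Hypothetical tile coder for a bounded observation space (illustration only)."""
    obs_min = np.asarray(obs_min, dtype=float)
    obs_max = np.asarray(obs_max, dtype=float)
    grid = np.asarray(grid_size, dtype=int)
    tile_width = (obs_max - obs_min) / grid
    # Each tiling gets one extra row/column so shifted points still land on a tile.
    dims = tuple(int(g) + 1 for g in grid)
    tiles_per_tiling = int(np.prod(dims))

    def encode(obs):
        obs = np.clip(np.asarray(obs, dtype=float), obs_min, obs_max)
        active = set()
        for t in range(n_tilings):
            # Shift tiling t by a fraction t / n_tilings of one tile width.
            offset = (t / n_tilings) * tile_width
            coords = np.floor((obs - obs_min + offset) / tile_width).astype(int)
            coords = np.minimum(coords, grid)  # guard against float overshoot
            flat = int(np.ravel_multi_index(tuple(int(c) for c in coords), dims))
            # Give every tiling its own block of indices.
            active.add(t * tiles_per_tiling + flat)
        return active

    return encode, n_tilings * tiles_per_tiling


encode, n_tiles = make_tile_encoder([-1.0, -1.0], [1.0, 1.0])
active = encode([0.1, -0.3])
nearby = encode([0.12, -0.3])
```

Exactly one tile per tiling is active for any observation, and nearby observations share most of their active tiles. That overlap is what lets a linear function approximator generalize between similar states.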

numpy_ml.rl_models.rl_utils.get_gym_environs()

List all valid OpenAI gym environment IDs.

numpy_ml.rl_models.rl_utils.get_gym_stats()

Return a pandas DataFrame of the environment IDs.

numpy_ml.rl_models.rl_utils.is_tuple(env)

Check if the action and observation spaces for env are instances of gym.spaces.Tuple or gym.spaces.Dict.

Notes

A tuple space is a tuple of several (possibly multidimensional) action/observation spaces. For our purposes, a tuple space is necessarily multidimensional.

Returns:

- tuple_action (bool) – Whether the env’s action space is an instance of gym.spaces.Tuple or gym.spaces.Dict.
- tuple_obs (bool) – Whether the env’s observation space is an instance of gym.spaces.Tuple or gym.spaces.Dict.

numpy_ml.rl_models.rl_utils.is_multidimensional(env)

Check if the action and observation spaces for env are multidimensional or Tuple spaces.

Notes

A multidimensional space is any space whose actions / observations have more than one element in them. This includes Tuple spaces, but also includes single action/observation spaces with several dimensions.

Parameters:

- env (gym.wrappers or gym.envs instance) – The environment to evaluate.

Returns:

- md_action (bool) – Whether the env’s action space is multidimensional.
- md_obs (bool) – Whether the env’s observation space is multidimensional.
- tuple_action (bool) – Whether the env’s action space is a Tuple instance.
- tuple_obs (bool) – Whether the env’s observation space is a Tuple instance.

numpy_ml.rl_models.rl_utils.is_continuous(env, tuple_action, tuple_obs)

Check if an env’s observation and action spaces are continuous.

Parameters:

- env (gym.wrappers or gym.envs instance) – The environment to evaluate.
- tuple_action (bool) – Whether the env’s action space is an instance of gym.spaces.Tuple or gym.spaces.Dict.
- tuple_obs (bool) – Whether the env’s observation space is an instance of gym.spaces.Tuple or gym.spaces.Dict.

Returns:

- cont_action (bool) – Whether the env’s action space is continuous.
- cont_obs (bool) – Whether the env’s observation space is continuous.
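The three checks above (tuple, multidimensional, continuous) can be sketched with small stand-in space classes so that the example runs without gym installed. The classes `Discrete`, `Box`, and `TupleSpace` below are simplified, hypothetical stand-ins that only borrow their names from gym.spaces. The rule that a space is continuous only when every component is a Box is an assumption, not something the documentation above states.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Discrete:
    """Stand-in: a single integer in {0, ..., n - 1}."""
    n: int


@dataclass
class Box:
    """Stand-in: a real-valued array with the given shape."""
    shape: Tuple[int, ...]


@dataclass
class TupleSpace:
    """Stand-in: a product of several sub-spaces."""
    spaces: Tuple[object, ...]


def is_tuple_space(space):
    return isinstance(space, TupleSpace)


def is_multidimensional_space(space):
    # A tuple space is treated as multidimensional by definition.
    if is_tuple_space(space):
        return True
    # A Box is multidimensional when it holds more than one element.
    if isinstance(space, Box):
        n_elements = 1
        for d in space.shape:
            n_elements *= d
        return n_elements > 1
    return False


def is_continuous_space(space):
    # Assumption: continuous only if every component is a Box.
    if is_tuple_space(space):
        return all(isinstance(s, Box) for s in space.spaces)
    return isinstance(space, Box)


examples = {
    "discrete": Discrete(4),
    "scalar_box": Box((1,)),
    "vector_box": Box((2,)),
    "tuple_of_discrete": TupleSpace((Discrete(2), Discrete(3))),
}
summary = {
    name: (is_multidimensional_space(sp), is_continuous_space(sp))
    for name, sp in examples.items()
}
```

Each entry of `summary` is a (multidimensional, continuous) pair. The tuple of two discrete spaces is multidimensional but not continuous, while the one-element Box is continuous but not multidimensional.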

numpy_ml.rl_models.rl_utils.action_stats(env, md_action, cont_action)

Get information on env’s action space.

Parameters:

- env (gym.wrappers or gym.envs instance) – The environment to evaluate.
- md_action (bool) – Whether the env’s action space is multidimensional.
- cont_action (bool) – Whether the env’s action space is continuous.

Returns:

- n_actions_per_dim (list of length (action_dim,)) – The number of possible actions for each dimension of the action space.
- action_ids (list or None) – A list of all valid actions within the space. If cont_action is True, this value will be None.
- action_dim (int or None) – The number of dimensions in a single action.
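For a discrete action space with several dimensions, the list of all valid actions is the Cartesian product of the choices along each dimension. The sketch below shows one way to build such a list from per-dimension counts. The helper name `enumerate_action_ids` and its return format are hypothetical, and the format of action_ids actually produced by action_stats may differ.

```python
from itertools import product


def enumerate_action_ids(n_actions_per_dim):
    """Hypothetical helper: list every action in a discrete action space."""
    if len(n_actions_per_dim) == 1:
        # One-dimensional space: actions are plain integers.
        return list(range(n_actions_per_dim[0]))
    # Multidimensional space: actions are tuples, one entry per dimension.
    return list(product(*(range(n) for n in n_actions_per_dim)))


action_ids = enumerate_action_ids([2, 3])
# action_ids == [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```

The number of actions grows as the product of the per-dimension counts, so an exhaustive list like this is only practical for small discrete spaces. For a continuous space no such list exists, which matches action_ids being None in that case.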

numpy_ml.rl_models.rl_utils.obs_stats(env, md_obs, cont_obs)

Get information on the observation space for env.

Parameters:

- env (gym.wrappers or gym.envs instance) – The environment to evaluate.
- md_obs (bool) – Whether the env’s observation space is multidimensional.
- cont_obs (bool) – Whether the env’s observation space is continuous.

Returns:

- n_obs_per_dim (list of length (obs_dim,)) – The number of possible observation classes for each dimension of the observation space.
- obs_ids (list or None) – A list of all valid observations within the space. If cont_obs is True, this value will be None.
- obs_dim (int or None) – The number of dimensions in a single observation.

numpy_ml.rl_models.rl_utils.env_stats(env)

Compute statistics for the current environment.

Parameters:

- env (gym.wrappers or gym.envs instance) – The environment to evaluate.

Returns:

- env_info (dict) – A dictionary containing information about the action and observation spaces of env.