Utilities

Utilities for training and evaluating RL models on OpenAI gym environments

class numpy_ml.rl_models.rl_utils.EnvModel

A simple tabular environment model that maintains a count for each (reward, next_state) outcome observed after a given state and action. The model can be updated and queried as follows:

>>> M = EnvModel()
>>> M[(state, action, reward, next_state)] += 1
>>> M[(state, action, reward, next_state)]
1
>>> M.state_action_pairs()
[(state, action)]
>>> M.outcome_probs(state, action)
[(next_state, 1)]

state_action_pairs(): Return all (state, action) pairs in the environment model.

reward_outcome_pairs(s, a): Return all (reward, next_state) pairs associated with taking action a in state s.

outcome_probs(s, a)

Return the probability under the environment model of each outcome state after taking action a in state s.

Parameters:

s (int as returned by `self._obs2num`) – The id for the state/observation.
a (int as returned by `self._action2num`) – The id for the action taken from state s.

Returns:	outcome_probs (list of (state, prob) tuples) – A list of each possible outcome and its associated probability under the model.
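
The computation described above amounts to normalizing the stored counts for one (state, action) pair. The following is a minimal sketch, not the library's implementation: it assumes the counts live in a plain dict keyed by (state, action, reward, next_state), and the helper name `outcome_probs_from_counts` is hypothetical.

```python
from collections import defaultdict


def outcome_probs_from_counts(counts, s, a):
    """Normalize transition counts into next-state probabilities for (s, a).

    `counts` is an assumed stand-in for the model's table: a dict mapping
    (state, action, reward, next_state) tuples to integer counts.
    """
    totals = defaultdict(int)
    for (state, action, _reward, next_state), n in counts.items():
        if state == s and action == a:
            # Sum over rewards so each next state appears only once.
            totals[next_state] += n

    z = sum(totals.values())
    if z == 0:
        # The pair (s, a) has never been observed.
        return []
    return [(ns, n / z) for ns, n in totals.items()]


counts = {
    (0, 1, 0.0, 2): 3,
    (0, 1, 1.0, 2): 1,
    (0, 1, 0.0, 3): 4,
    (1, 0, 0.0, 0): 5,
}
probs = outcome_probs_from_counts(counts, s=0, a=1)
```

Here state 2 and state 3 were each reached 4 times out of 8 visits to (0, 1), so each receives probability 0.5.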

state_action_pairs_leading_to_outcome(outcome)

Return all (state, action) pairs that have a nonzero probability of producing outcome under the current model.

Parameters:	outcome (int) – The outcome state.
Returns:	pairs (list of (state, action) tuples) – A list of all (state, action) pairs with a nonzero probability of producing outcome under the model.
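
This kind of reverse lookup is what planning methods such as prioritized sweeping need in order to find the predecessors of a state. The sketch below illustrates the idea under the same assumption as before (counts stored in a plain dict keyed by (state, action, reward, next_state)); the function name `pairs_leading_to` is hypothetical and the code is not taken from the library.

```python
def pairs_leading_to(counts, outcome):
    """Return sorted (state, action) pairs with a nonzero count of reaching `outcome`."""
    pairs = {
        (state, action)
        for (state, action, _reward, next_state), n in counts.items()
        if next_state == outcome and n > 0
    }
    # Sort only to make the result deterministic for the example.
    return sorted(pairs)


counts = {
    (0, 1, 0.0, 2): 3,
    (1, 0, 1.0, 2): 1,
    (1, 1, 0.0, 3): 2,
    (2, 0, 0.0, 2): 0,  # never actually observed, so it is excluded
}
pairs = pairs_leading_to(counts, outcome=2)
```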

numpy_ml.rl_models.rl_utils.tile_state_space(env, env_stats, n_tilings, obs_max=None, obs_min=None, state_action=False, grid_size=(4, 4))

Return a function to encode the continuous observations generated by env in terms of a collection of n_tilings overlapping tilings (each with dimension grid_size) of the state space.

Parameters:

env (gym.wrappers.time_limit.TimeLimit instance) – An OpenAI gym environment.
env_stats (dict) – A dictionary of statistics about env, such as the one returned by env_stats.
n_tilings (int) – The number of overlapping tilings to use. Should be a power of 2. This determines the dimension of the discretized tile-encoded state vector.
obs_max (float or np.ndarray) – The value to treat as the max value of the observation space when calculating the grid widths. If None, use env.observation_space.high. Default is None.
obs_min (float or np.ndarray) – The value to treat as the min value of the observation space when calculating the grid widths. If None, use env.observation_space.low. Default is None.
state_action (bool) – Whether to use tile coding to encode state-action values (True) or just state values (False). Default is False.
grid_size (list or tuple of length 2) – A sequence of ints representing the coarseness of the tilings. E.g., a grid_size of (4, 4) would mean each tiling consists of a 4x4 tile grid. Default is (4, 4).

Returns:

encode_obs_as_tile (function) – A function which takes a continuous observation vector as input and returns a set of the indices of the active tiles in the tile-coded observation space.
n_states (int) – An integer reflecting the total number of unique states possible under this tile coding regimen.
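
To make the idea of overlapping tilings concrete, here is a simplified pure-Python sketch of tile coding. It is not the scheme used by tile_state_space: the uniform offsets, the clipping at the grid edges, and the names `make_tile_encoder`, `encode`, and `n_tiles` are all assumptions made for illustration.

```python
import math


def make_tile_encoder(obs_min, obs_max, n_tilings=4, grid_size=(4, 4)):
    """Return (encode, n_tiles) for a uniformly offset tile coding."""
    # Width of one tile along each observation dimension.
    widths = [(hi - lo) / g for lo, hi, g in zip(obs_min, obs_max, grid_size)]
    tiles_per_tiling = math.prod(grid_size)

    def encode(obs):
        active = []
        for t in range(n_tilings):
            flat = 0
            for x, lo, w, g in zip(obs, obs_min, widths, grid_size):
                # Shift each tiling by a different fraction of one tile width.
                offset = (t / n_tilings) * w
                idx = int((x - lo + offset) // w)
                idx = min(max(idx, 0), g - 1)  # clip to the grid
                flat = flat * g + idx  # row-major flattening
            # Give every tiling its own block of indices.
            active.append(t * tiles_per_tiling + flat)
        return active

    return encode, n_tilings * tiles_per_tiling


encode, n_tiles = make_tile_encoder(obs_min=[0.0, 0.0], obs_max=[1.0, 1.0])
tiles = encode([0.3, 0.6])
```

Each observation activates exactly one tile per tiling, so `tiles` has n_tilings entries. Nearby observations share most of their active tiles, which is what lets a linear learner generalize across them.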

numpy_ml.rl_models.rl_utils.get_gym_environs(): List all valid OpenAI gym environment ids.

numpy_ml.rl_models.rl_utils.get_gym_stats(): Return a pandas DataFrame of the environment IDs.

numpy_ml.rl_models.rl_utils.is_tuple(env)

Check if the action and observation spaces for env are instances of gym.spaces.Tuple or gym.spaces.Dict.

Notes

A tuple space is a tuple of several (possibly multidimensional) action/observation spaces. For our purposes, a tuple space is necessarily multidimensional.

Returns:

tuple_action (bool) – Whether the env’s action space is an instance of `gym.spaces.Tuple` or `gym.spaces.Dict`.
tuple_obs (bool) – Whether the env’s observation space is an instance of `gym.spaces.Tuple` or `gym.spaces.Dict`.

numpy_ml.rl_models.rl_utils.is_multidimensional(env)

Check if the action and observation spaces for env are multidimensional or Tuple spaces.

Notes

A multidimensional space is any space whose actions / observations have more than one element in them. This includes Tuple spaces, but also includes single action/observation spaces with several dimensions.

Parameters:	env (`gym.wrappers` or `gym.envs` instance) – The environment to evaluate.
Returns:

md_action (bool) – Whether the env’s action space is multidimensional.
md_obs (bool) – Whether the env’s observation space is multidimensional.
tuple_action (bool) – Whether the env’s action space is a `Tuple` instance.
tuple_obs (bool) – Whether the env’s observation space is a `Tuple` instance.
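
The definition in the notes above can be illustrated with a small stand-alone check. This is only a sketch of the rule, not the library's code: it assumes a space is summarized by a shape tuple (like the `shape` attribute that gym spaces expose) plus a flag for tuple spaces, and the helper name `is_multidim` is hypothetical.

```python
from math import prod


def is_multidim(shape, is_tuple_space=False):
    """Return True if a sample from the space holds more than one element."""
    # Tuple spaces count as multidimensional by definition here; otherwise
    # count the elements implied by the shape (an empty shape is a scalar).
    return is_tuple_space or prod(shape) > 1


scalar_space = ()    # e.g. a single discrete choice
vector_space = (4,)  # e.g. a 4-element observation vector

scalar_is_md = is_multidim(scalar_space)
vector_is_md = is_multidim(vector_space)
tuple_is_md = is_multidim(scalar_space, is_tuple_space=True)
```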

numpy_ml.rl_models.rl_utils.is_continuous(env, tuple_action, tuple_obs)

Check if an env’s observation and action spaces are continuous.

Parameters:

env (gym.wrappers or gym.envs instance) – The environment to evaluate.
tuple_action (bool) – Whether the env’s action space is an instance of gym.spaces.Tuple or gym.spaces.Dict.
tuple_obs (bool) – Whether the env’s observation space is an instance of gym.spaces.Tuple or gym.spaces.Dict.

Returns:

cont_action (bool) – Whether the env’s action space is continuous.
cont_obs (bool) – Whether the env’s observation space is continuous.

numpy_ml.rl_models.rl_utils.action_stats(env, md_action, cont_action)

Get information on env’s action space.

Parameters:

env (gym.wrappers or gym.envs instance) – The environment to evaluate.
md_action (bool) – Whether the env’s action space is multidimensional.
cont_action (bool) – Whether the env’s action space is continuous.

Returns:

n_actions_per_dim (list of length (action_dim,)) – The number of possible actions for each dimension of the action space.
action_ids (list or None) – A list of all valid actions within the space. If cont_action is True, this value will be None.
action_dim (int or None) – The number of dimensions in a single action.
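
For a discrete, multidimensional action space, the relationship between the three return values can be sketched as follows. This is an assumption about one reasonable way to enumerate joint actions, not a copy of the library's logic, and the helper name `enumerate_discrete_actions` is hypothetical.

```python
from itertools import product


def enumerate_discrete_actions(n_actions_per_dim):
    """List every joint action for a discrete space with the given sizes."""
    # One range per dimension; the Cartesian product yields all joint actions.
    action_ids = list(product(*(range(n) for n in n_actions_per_dim)))
    action_dim = len(n_actions_per_dim)
    return action_ids, action_dim


# Two action dimensions: the first has 2 choices, the second has 3.
action_ids, action_dim = enumerate_discrete_actions([2, 3])
```

The number of joint actions is the product of the per-dimension counts (here 2 * 3 = 6), which is why enumeration is only possible when the space is not continuous.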

numpy_ml.rl_models.rl_utils.obs_stats(env, md_obs, cont_obs)

Get information on the observation space for env.

Parameters:

env (gym.wrappers or gym.envs instance) – The environment to evaluate.
md_obs (bool) – Whether the env’s observation space is multidimensional.
cont_obs (bool) – Whether the env’s observation space is continuous.

Returns:

n_obs_per_dim (list of length (obs_dim,)) – The number of possible observation classes for each dimension of the observation space.
obs_ids (list or None) – A list of all valid observations within the space. If cont_obs is True, this value will be None.
obs_dim (int or None) – The number of dimensions in a single observation.

numpy_ml.rl_models.rl_utils.env_stats(env)

Compute statistics for the current environment.

Parameters:	env (`gym.wrappers` or `gym.envs` instance) – The environment to evaluate.
Returns:	env_info (dict) – A dictionary containing information about the action and observation spaces of env.
