Utilities
Utilities for training and evaluating RL models on OpenAI gym environments.
class numpy_ml.rl_models.rl_utils.EnvModel
A simple tabular environment model that maintains the counts of each reward-outcome pair given the state and action that preceded them. The model can be queried with

>>> M = EnvModel()
>>> M[(state, action, reward, next_state)] += 1
>>> M[(state, action, reward, next_state)]
1
>>> M.state_action_pairs()
[(state, action)]
>>> M.outcome_probs(state, action)
[(next_state, 1)]
reward_outcome_pairs(s, a)
Return all (reward, next_state) pairs associated with taking action a in state s.
outcome_probs(s, a)
Return the probability under the environment model of each outcome state after taking action a in state s.
Parameters:
- s (int as returned by self._obs2num) – The id for the state/observation.
- a (int as returned by self._action2num) – The id for the action taken from state s.
Returns: outcome_probs (list of (state, prob) tuples) – A list of each possible outcome and its associated probability under the model.
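To make the relationship between the stored counts and the returned probabilities concrete, here is a minimal sketch. It assumes a nested dictionary of counts and hypothetical helper names (`record`, `outcome_probs`, `counts`); it is an illustration of the idea, not the library's implementation.

```python
from collections import defaultdict

# Hypothetical stand-in for the model's internal table (an assumption, not
# the library's data structure): counts[(s, a)][(reward, next_state)] -> visits.
counts = defaultdict(lambda: defaultdict(int))


def record(s, a, reward, next_state):
    """Increment the count for one observed transition."""
    counts[(s, a)][(reward, next_state)] += 1


def outcome_probs(s, a):
    """Return (next_state, prob) tuples, summing counts over rewards."""
    totals = defaultdict(int)
    for (_reward, next_state), n in counts.get((s, a), {}).items():
        totals[next_state] += n
    n_total = sum(totals.values())
    if n_total == 0:
        return []  # nothing has been observed for this (s, a) pair
    return [(ns, n / n_total) for ns, n in totals.items()]


record(0, 1, reward=-1.0, next_state=2)
record(0, 1, reward=-1.0, next_state=2)
record(0, 1, reward=0.0, next_state=2)
record(0, 1, reward=5.0, next_state=3)
probs = dict(outcome_probs(0, 1))
```

In this sketch each probability is the fraction of recorded visits to (s, a) that ended in the given next state, regardless of the reward received.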
state_action_pairs_leading_to_outcome(outcome)
Return all (state, action) pairs that have a nonzero probability of producing outcome under the current model.
Parameters: outcome (int) – The outcome state.
Returns: pairs (list of (state, action) tuples) – A list of all (state, action) pairs with a nonzero probability of producing outcome under the model.
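The reverse lookup described above can be sketched as a filter over the count table. The flat `counts` dictionary and the `pairs_leading_to` helper below are hypothetical names chosen for illustration; the library may store and traverse its counts differently.

```python
# Hypothetical count table (an assumption for illustration):
# (state, action, reward, next_state) -> number of times observed.
counts = {
    (0, 0, 0.0, 1): 3,
    (0, 1, 0.0, 2): 1,
    (1, 0, 1.0, 2): 4,
    (2, 1, 0.0, 0): 0,  # never observed, so its probability is zero
}


def pairs_leading_to(outcome):
    """Return the sorted (state, action) pairs with a nonzero count for `outcome`."""
    pairs = {
        (s, a)
        for (s, a, _reward, next_state), n in counts.items()
        if next_state == outcome and n > 0
    }
    return sorted(pairs)


predecessors = pairs_leading_to(2)
```

A lookup of this kind is what lets planning methods such as prioritized sweeping propagate value updates backward from a state to its predecessors.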
numpy_ml.rl_models.rl_utils.tile_state_space(env, env_stats, n_tilings, obs_max=None, obs_min=None, state_action=False, grid_size=(4, 4))
Return a function to encode the continuous observations generated by env in terms of a collection of n_tilings overlapping tilings (each with dimension grid_size) of the state space.
Parameters:
- env (gym.wrappers.time_limit.TimeLimit instance) – An OpenAI gym environment.
- n_tilings (int) – The number of overlapping tilings to use. Should be a power of 2. This determines the dimension of the discretized tile-encoded state vector.
- obs_max (float or np.ndarray) – The value to treat as the max value of the observation space when calculating the grid widths. If None, use env.observation_space.high. Default is None.
- obs_min (float or np.ndarray) – The value to treat as the min value of the observation space when calculating the grid widths. If None, use env.observation_space.low. Default is None.
- state_action (bool) – Whether to use tile coding to encode state-action values (True) or just state values (False). Default is False.
- grid_size (list of length 2) – A list of ints representing the coarseness of the tilings. E.g., a grid_size of [4, 4] would mean each tiling consisted of a 4x4 tile grid. Default is [4, 4].
Returns:
- encode_obs_as_tile (function) – A function which takes as input a continuous observation vector and returns a set of the indices of the active tiles in the tile-coded observation space.
- n_states (int) – An integer reflecting the total number of unique states possible under this tile coding regimen.
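For readers unfamiliar with tile coding, the following is a minimal pure-Python sketch of the general technique for a two-dimensional observation. The function name `make_tile_encoder`, the uniform offset scheme, and the extra tile per dimension are assumptions made for illustration; they are not taken from the library, which may offset and index its tilings differently.

```python
import math


def make_tile_encoder(obs_min, obs_max, n_tilings=4, grid_size=(4, 4)):
    """Build a tile encoder for a 2-D box (a sketch, not numpy_ml's code)."""
    widths = [(hi - lo) / g for lo, hi, g in zip(obs_min, obs_max, grid_size)]
    # Assumption: one extra tile per dimension covers the shifted upper edge.
    dims = [g + 1 for g in grid_size]
    tiles_per_tiling = dims[0] * dims[1]

    def encode(obs):
        """Return the set of active tile indices, one per tiling."""
        active = set()
        for t in range(n_tilings):
            coords = []
            for x, lo, hi, w, d in zip(obs, obs_min, obs_max, widths, dims):
                x = min(max(x, lo), hi)  # clip to the assumed bounds
                offset = w * t / n_tilings  # shift tiling t by a fraction of a tile
                coords.append(min(int(math.floor((x - lo + offset) / w)), d - 1))
            # Flatten (tiling, row, col) into a single integer index.
            active.add(t * tiles_per_tiling + coords[0] * dims[1] + coords[1])
        return active

    return encode, n_tilings * tiles_per_tiling


encode_obs, n_states = make_tile_encoder((0.0, 0.0), (1.0, 1.0))
active = encode_obs((0.1, 0.9))
```

Each observation activates exactly one tile per tiling, and nearby observations share most of their active tiles, which is what gives tile coding its local generalization.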
numpy_ml.rl_models.rl_utils.get_gym_stats()
Return a pandas DataFrame of the environment IDs.
numpy_ml.rl_models.rl_utils.is_tuple(env)
Check if the action and observation spaces for env are instances of gym.spaces.Tuple or gym.spaces.Dict.
Notes
A tuple space is a tuple of several (possibly multidimensional) action/observation spaces. For our purposes, a tuple space is necessarily multidimensional.
Returns:
- tuple_action (bool) – Whether the env’s action space is an instance of gym.spaces.Tuple or gym.spaces.Dict.
- tuple_obs (bool) – Whether the env’s observation space is an instance of gym.spaces.Tuple or gym.spaces.Dict.
numpy_ml.rl_models.rl_utils.is_multidimensional(env)
Check if the action and observation spaces for env are multidimensional or Tuple spaces.
Notes
A multidimensional space is any space whose actions / observations have more than one element in them. This includes Tuple spaces, but also includes single action/observation spaces with several dimensions.
Parameters: env (gym.wrappers or gym.envs instance) – The environment to evaluate.
Returns:
- md_action (bool) – Whether the env’s action space is multidimensional.
- md_obs (bool) – Whether the env’s observation space is multidimensional.
- tuple_action (bool) – Whether the env’s action space is a Tuple instance.
- tuple_obs (bool) – Whether the env’s observation space is a Tuple instance.
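The "more than one element" criterion in the notes above can be illustrated with a small check on a sampled action or observation. This sketch assumes samples arrive as plain Python scalars, lists, or tuples, and `has_multiple_elements` is a hypothetical helper name; real gym samples are often NumPy arrays, which this simplified version does not handle.

```python
def has_multiple_elements(sample):
    """Return True if `sample` is a list or tuple with more than one element."""
    return isinstance(sample, (list, tuple)) and len(sample) > 1


scalar_action = 2  # e.g. a single discrete action id
single_element_obs = [0.5]  # a sequence, but with only one element
vector_obs = [0.1, -0.3, 0.7, 0.0]  # several elements per observation

md_flags = [
    has_multiple_elements(scalar_action),
    has_multiple_elements(single_element_obs),
    has_multiple_elements(vector_obs),
]
```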
numpy_ml.rl_models.rl_utils.is_continuous(env, tuple_action, tuple_obs)
Check if an env’s observation and action spaces are continuous.
Parameters:
Returns:
- cont_action (bool) – Whether the env’s action space is continuous.
- cont_obs (bool) – Whether the env’s observation space is continuous.
numpy_ml.rl_models.rl_utils.action_stats(env, md_action, cont_action)
Get information on env’s action space.
Parameters:
Returns:
- n_actions_per_dim (list of length (action_dim,)) – The number of possible actions for each dimension of the action space.
- action_ids (list or None) – A list of all valid actions within the space. If cont_action is True, this value will be None.
- action_dim (int or None) – The number of dimensions in a single action.
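For a discrete, multidimensional space, the three return values are closely related: the list of valid ids is the Cartesian product of the per-dimension choices. The sketch below shows that relationship with a hypothetical `enumerate_ids` helper. The tuple format and ordering of the ids are assumptions, since the text above does not specify how the library represents them.

```python
from itertools import product


def enumerate_ids(n_per_dim):
    """List every combination of per-dimension choices as a tuple of ints."""
    return [tuple(ids) for ids in product(*(range(n) for n in n_per_dim))]


# Assumed example: a 2-D discrete action space with 2 and 3 choices.
n_actions_per_dim = [2, 3]
action_ids = enumerate_ids(n_actions_per_dim)
action_dim = len(n_actions_per_dim)
```

The same enumeration applies to discrete observation spaces, with n_obs_per_dim in place of n_actions_per_dim.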
numpy_ml.rl_models.rl_utils.obs_stats(env, md_obs, cont_obs)
Get information on the observation space for env.
Parameters:
Returns:
- n_obs_per_dim (list of length (obs_dim,)) – The number of possible observation classes for each dimension of the observation space.
- obs_ids (list or None) – A list of all valid observations within the space. If cont_obs is True, this value will be None.
- obs_dim (int or None) – The number of dimensions in a single observation.