Training

Trainer

class numpy_ml.rl_models.trainer.Trainer(agent, env)[source]

An object to facilitate agent training and evaluation.

Parameters:
  • agent (AgentBase instance) – The agent to train.
  • env (gym.wrappers or gym.envs instance) – The environment to run the agent on.
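
A minimal construction sketch. The MonteCarloAgent import and its constructor call below are illustrative assumptions; any AgentBase subclass with a compatible interface can be passed as agent.

    import gym

    from numpy_ml.rl_models.trainer import Trainer
    # Assumption: MonteCarloAgent (an AgentBase subclass) can be built
    # from the environment alone; adjust to the agent you actually use.
    from numpy_ml.rl_models.agents import MonteCarloAgent

    env = gym.make("CartPole-v0")  # any OpenAI gym environment
    agent = MonteCarloAgent(env)   # assumed constructor signature
    trainer = Trainer(agent, env)
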
train(n_episodes, max_steps, seed=None, plot=True, verbose=True, render_every=None, smooth_factor=0.05)[source]

Train an agent on an OpenAI gym environment, logging training statistics along the way.

Parameters:
  • n_episodes (int) – The number of episodes to train the agent across.
  • max_steps (int) – The maximum number of steps the agent can take in each episode.
  • seed (int or None) – A seed for the random number generator. Default is None.
  • plot (bool) – Whether to generate a plot of the cumulative reward as a function of training episode. Default is True.
  • verbose (bool) – Whether to print intermediate run statistics to stdout during training. Default is True.
  • render_every (int or None) – Render the environment every render_every episodes during training. Default is None.
  • smooth_factor (float in [0, 1]) – The amount to smooth the cumulative reward across episodes. Larger values correspond to less smoothing. Default is 0.05.
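
A usage sketch for train, continuing the construction example above; the episode and step budgets are arbitrary choices, not recommendations:

    # Train for 500 episodes of at most 200 steps each; progress is
    # printed to stdout and a reward plot is generated at the end.
    trainer.train(
        n_episodes=500,
        max_steps=200,
        seed=12345,
        plot=True,
        verbose=True,
        render_every=None,   # set to an int to render periodically
        smooth_factor=0.05,
    )
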
plot_rewards(rwd_greedy)[source]

Plot the cumulative reward per episode as a function of episode number.

Notes

Saves plot to the file ./img/<agent>-<env>.png

Parameters:
  • rwd_greedy (float) – The cumulative reward earned during a final rollout of the greedy target policy.
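
A sketch of driving plot_rewards by hand. The greedy_rollout helper is hypothetical, and it assumes the agent exposes a greedy-action method (called greedy_policy here, which is an assumption about the agent's interface):

    # Hypothetical helper: roll the agent's greedy policy out for one
    # episode to obtain the cumulative reward plot_rewards expects.
    def greedy_rollout(agent, env, max_steps=200):
        obs, total = env.reset(), 0.0
        for _ in range(max_steps):
            action = agent.greedy_policy(obs)  # assumed agent method
            obs, reward, done, _ = env.step(action)
            total += reward
            if done:
                break
        return total

    trainer.plot_rewards(greedy_rollout(agent, env))
    # The figure is saved to ./img/<agent>-<env>.png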