# Training

## Trainer

`class numpy_ml.rl_models.trainer.Trainer(agent, env)`

An object to facilitate agent training and evaluation.

Parameters:

- **agent** (`AgentBase` instance) – The agent to train.
- **env** (`gym.wrappers` or `gym.envs` instance) – The environment to run the agent on.
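The agent/environment interaction loop that a trainer like this wraps can be sketched as follows. This is a minimal, self-contained sketch of the standard gym-style episode loop, not the `Trainer` implementation itself; `DummyEnv`, `RandomAgent`, and `run_episode` are illustrative stand-ins and are not part of `numpy_ml`.

```python
import random

class DummyEnv:
    """A stub gym-style environment: the state counts down from 3 to 0."""
    def reset(self):
        self.state = 3
        return self.state

    def step(self, action):
        # Any action decrements the state; reward of 1 per step, done at 0.
        self.state -= 1
        reward = 1.0
        done = self.state == 0
        return self.state, reward, done, {}

class RandomAgent:
    """A stub agent exposing the act/update hooks a trainer would call."""
    def act(self, obs):
        return random.choice([0, 1])

    def update(self, obs, action, reward, next_obs, done):
        pass  # a real agent would update its value estimates here

def run_episode(agent, env, max_steps):
    """Run one episode of the agent/environment loop (the core of `train`)."""
    obs, total_reward = env.reset(), 0.0
    for _ in range(max_steps):
        action = agent.act(obs)
        next_obs, reward, done, _ = env.step(action)
        agent.update(obs, action, reward, next_obs, done)
        total_reward += reward
        if done:
            break
        obs = next_obs
    return total_reward

print(run_episode(RandomAgent(), DummyEnv(), max_steps=10))  # 3.0
```

A trainer then repeats this loop for `n_episodes`, logging the per-episode cumulative reward.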
`train(n_episodes, max_steps, seed=None, plot=True, verbose=True, render_every=None, smooth_factor=0.05)`

Train an agent on an OpenAI gym environment, logging training statistics along the way.

Parameters:

- **n_episodes** (int) – The number of episodes to train the agent across.
- **max_steps** (int) – The maximum number of steps the agent can take in each episode.
- **seed** (int or None) – A seed for the random number generator. Default is None.
- **plot** (bool) – Whether to generate a plot of the cumulative reward as a function of training episode. Default is True.
- **verbose** (bool) – Whether to print intermediate run statistics to stdout during training. Default is True.
- **render_every** (int or None) – Render the environment every `render_every` episodes; if None, the environment is not rendered. Default is None.
- **smooth_factor** (float in [0, 1]) – The amount to smooth the cumulative reward across episodes. Larger values correspond to *less* smoothing.
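One smoothing scheme consistent with the `smooth_factor` description is an exponential moving average where `smooth_factor` weights the newest value, so larger values track the raw rewards more closely (less smoothing). The helper below is an illustrative sketch, not the `Trainer`'s actual implementation.

```python
def smooth(rewards, smooth_factor=0.05):
    """Exponentially smooth a reward sequence.

    `smooth_factor` is the weight on the newest reward, so larger values
    mean the smoothed curve follows the raw rewards more closely.
    """
    smoothed = [rewards[0]]
    for r in rewards[1:]:
        smoothed.append(smooth_factor * r + (1 - smooth_factor) * smoothed[-1])
    return smoothed

print(smooth([0.0, 10.0, 10.0], smooth_factor=0.5))  # [0.0, 5.0, 7.5]
```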
`plot_rewards(rwd_greedy)`

Plot the cumulative reward per episode as a function of episode number.

Notes

Saves the plot to the file `./img/<agent>-<env>.png`.

Parameters:

- **rwd_greedy** (float) – The cumulative reward earned with a final execution of a greedy target policy.
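The save path described in the notes can be built as in the sketch below; `plot_save_path` is a hypothetical helper for illustration only, not part of the `Trainer` API.

```python
import os

def plot_save_path(agent_name, env_name, img_dir="img"):
    """Build the path from the notes above: ./img/<agent>-<env>.png."""
    return os.path.join(img_dir, f"{agent_name}-{env_name}.png")

print(plot_save_path("TemporalDifferenceAgent", "Taxi-v3"))
# img/TemporalDifferenceAgent-Taxi-v3.png
```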