class numpy_ml.rl_models.trainer.Trainer(agent, env)

An object to facilitate agent training and evaluation.

  • agent (AgentBase instance) – The agent to train.
  • env (gym.wrappers or gym.envs instance) – The environment to run the agent on.
train(n_episodes, max_steps, seed=None, plot=True, verbose=True, render_every=None, smooth_factor=0.05)

Train an agent on an OpenAI gym environment, logging training statistics along the way.

  • n_episodes (int) – The number of episodes to train the agent across.
  • max_steps (int) – The maximum number of steps the agent can take on each episode.
  • seed (int or None) – A seed for the random number generator. Default is None.
  • plot (bool) – Whether to generate a plot of the cumulative reward as a function of training episode. Default is True.
  • verbose (bool) – Whether to print intermediate run statistics to stdout during training. Default is True.
  • smooth_factor (float in [0, 1]) – The amount to smooth the cumulative reward across episodes. Larger values correspond to less smoothing. Default is 0.05.
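
The interaction between n_episodes, max_steps, seed, and smooth_factor can be illustrated with a small self-contained sketch. It does not use numpy_ml or gym: run_episode and train_sketch are hypothetical stand-ins, the toy episode simply earns +1 reward per step, and the smoothing rule is assumed to be an exponential moving average, which is consistent with "larger values correspond to less smoothing" but is not confirmed by the text above.

```python
import random


def run_episode(rng, max_steps):
    """Toy episode: +1 reward per step, ending early with probability 0.1."""
    total_reward, n_steps = 0.0, 0
    for _ in range(max_steps):      # max_steps caps the episode length
        total_reward += 1.0
        n_steps += 1
        if rng.random() < 0.1:      # stand-in for an environment's "done" flag
            break
    return total_reward, n_steps


def train_sketch(n_episodes, max_steps, seed=None, smooth_factor=0.05):
    """Return per-episode total rewards and their smoothed values."""
    rng = random.Random(seed)       # a fixed seed makes the run reproducible
    totals, smoothed = [], []
    smooth = 0.0
    for ep in range(n_episodes):
        total, _ = run_episode(rng, max_steps)
        if ep == 0:
            smooth = total          # start the average at the first episode
        else:
            # Assumed update rule: exponential moving average.
            smooth = (1 - smooth_factor) * smooth + smooth_factor * total
        totals.append(total)
        smoothed.append(smooth)
    return totals, smoothed


totals, smoothed = train_sketch(n_episodes=50, max_steps=20, seed=0)
```

Under this assumed rule, smooth_factor=1.0 reproduces the raw per-episode rewards, while smooth_factor=0.0 holds the curve at the first episode's reward. Small values such as the default 0.05 therefore give a heavily smoothed curve.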

plot_rewards(rwd_greedy)

Plot the cumulative reward per episode as a function of episode number.


Saves plot to the file ./img/<agent>-<env>.png

  • rwd_greedy (float) – The cumulative reward earned with a final execution of a greedy target policy.