Learning rate schedulers

ConstantScheduler

class numpy_ml.neural_nets.schedulers.ConstantScheduler(lr=0.01, **kwargs)[source]

Returns a fixed learning rate, regardless of the current step.

Parameters: lr (float) – The learning rate. Default is 0.01.
learning_rate(**kwargs)[source]

Return the current learning rate.

Returns: lr (float) – The learning rate.
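
For illustration, a minimal usage sketch based on the class signature documented above:

from numpy_ml.neural_nets.schedulers import ConstantScheduler

sched = ConstantScheduler(lr=0.01)
sched.learning_rate()  # always 0.01, no matter how many steps have elapsed
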
copy()[source]

Return a copy of the current object.

set_params(hparam_dict)[source]

Set the scheduler hyperparameters from a dictionary.

ExponentialScheduler

class numpy_ml.neural_nets.schedulers.ExponentialScheduler(initial_lr=0.01, stage_length=500, staircase=False, decay=0.1, **kwargs)[source]

An exponential learning rate scheduler.

Notes

The exponential scheduler decays the learning rate by a factor of decay every stage_length steps, starting from initial_lr:

learning_rate = initial_lr * decay ** curr_stage

where:

curr_stage = step / stage_length          if staircase = False
curr_stage = floor(step / stage_length)   if staircase = True
Parameters:
  • initial_lr (float) – The learning rate at the first step. Default is 0.01.
  • stage_length (int) – The length of each stage, in steps. Default is 500.
  • staircase (bool) – If True, only adjusts the learning rate at the stage transitions, producing a step-like decay schedule. If False, adjusts the learning rate after each step, creating a smooth decay schedule. Default is False.
  • decay (float) – The multiplicative factor by which to decay the learning rate at each new stage. Default is 0.1.
learning_rate(step, **kwargs)[source]

Return the current learning rate as a function of step.

Parameters: step (int) – The current step number.
Returns: lr (float) – The learning rate for the current step.
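
For example, a short sketch of the staircase schedule implied by the formula in the Notes (the expected values follow from the decay rule above, not from a verified run):

from numpy_ml.neural_nets.schedulers import ExponentialScheduler

sched = ExponentialScheduler(initial_lr=0.01, stage_length=500, staircase=True, decay=0.1)
sched.learning_rate(step=499)   # 0.01    (curr_stage = floor(499 / 500) = 0)
sched.learning_rate(step=500)   # 0.001   (curr_stage = floor(500 / 500) = 1)
sched.learning_rate(step=1000)  # 0.0001  (curr_stage = 2)
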
copy()[source]

Return a copy of the current object.

set_params(hparam_dict)[source]

Set the scheduler hyperparameters from a dictionary.

KingScheduler

class numpy_ml.neural_nets.schedulers.KingScheduler(initial_lr=0.01, patience=1000, decay=0.99, **kwargs)[source]

The Davis King / DLib learning rate scheduler.

Notes

The KingScheduler computes the probability that the slope of the OLS fit to the loss history is negative. If the probability that it is negative is less than 51% over the last patience steps, the scheduler exponentially decreases the current learning rate by decay.

References

[1]King, D. (2018). “Automatic learning rate scheduling that really works”. http://blog.dlib.net/2018/02/automatic-learning-rate-scheduling-that.html
Parameters:
  • initial_lr (float) – The learning rate to begin at. Default is 0.01.
  • patience (int) – The number of steps to wait without a decrease in the loss before adjusting the learning rate. Default is 1000.
  • decay (float) – The amount to decay the learning rate when the loss stops improving. Default is 0.99.
learning_rate(step, cur_loss)[source]

Compute the updated learning rate for the current step and loss.

Parameters:
  • step (int) – The current step number. Unused.
  • cur_loss (float) – The loss at the current step.
Returns:

lr (float) – The learning rate for the current step.
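
A rough usage sketch: the scheduler must be fed the loss at every step so it can fit a line to the recent loss history (the losses below are hypothetical placeholders):

from numpy_ml.neural_nets.schedulers import KingScheduler

sched = KingScheduler(initial_lr=0.01, patience=1000, decay=0.99)

# Hypothetical, steadily decreasing losses; in practice these come from training.
losses = [1.0 / (1.0 + i) for i in range(2000)]

for step, cur_loss in enumerate(losses):
    lr = sched.learning_rate(step, cur_loss)  # decays only when the loss stops improving
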

copy()[source]

Return a copy of the current object.

set_params(hparam_dict)[source]

Set the scheduler hyperparameters from a dictionary.

NoamScheduler

class numpy_ml.neural_nets.schedulers.NoamScheduler(model_dim=512, scale_factor=1, warmup_steps=4000, **kwargs)[source]

The Noam learning rate scheduler, originally used in conjunction with the Adam optimizer in [1].

Notes

The Noam scheduler increases the learning rate linearly for the first warmup_steps steps, and decreases it thereafter proportionally to the inverse square root of the step number:

lr = scale_factor * ( model_dim ** (-0.5) * adj_step )

where:

adj_step = min(step_num ** (-0.5), step_num * warmup_steps ** (-1.5))

References

[1]Vaswani et al. (2017) “Attention is all you need”. 31st Conference on Neural Information Processing Systems, https://arxiv.org/pdf/1706.03762.pdf
Parameters:
  • model_dim (int) – The number of units in the layer output. Default is 512.
  • scale_factor (float) – A fixed coefficient for rescaling the final learning rate. Default is 1.
  • warmup_steps (int) – The number of steps in the warmup stage of training. Default is 4000.
learning_rate(step, **kwargs)[source]

Return the current learning rate as a function of step.
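
For example, a small sketch of the warmup-then-decay behavior described in the Notes (the values are not from a verified run; they follow from the formula above with the default arguments):

from numpy_ml.neural_nets.schedulers import NoamScheduler

sched = NoamScheduler(model_dim=512, scale_factor=1, warmup_steps=4000)

# The rate grows roughly linearly up to step 4000, then shrinks as step ** (-0.5).
warmup_lrs = [sched.learning_rate(step) for step in (1, 1000, 2000, 4000)]   # increasing
decay_lrs = [sched.learning_rate(step) for step in (4000, 16000, 64000)]     # decreasing
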
copy()[source]

Return a copy of the current object.

set_params(hparam_dict)[source]

Set the scheduler hyperparameters from a dictionary.