Learning rate schedulers
ConstantScheduler

class numpy_ml.neural_nets.schedulers.ConstantScheduler(lr=0.01, **kwargs)

Returns a fixed learning rate, regardless of the current step.

Parameters:
    lr (float) – The learning rate. Default is 0.01.
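The behavior is trivial but worth stating in code: the returned rate never changes. A minimal, library-independent sketch (the function name constant_lr is hypothetical):

    def constant_lr(step, lr=0.01):
        """The learning rate ignores the step entirely."""
        return lr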
ExponentialScheduler

class numpy_ml.neural_nets.schedulers.ExponentialScheduler(initial_lr=0.01, stage_length=500, staircase=False, decay=0.1, **kwargs)

An exponential learning rate scheduler.

Notes

The exponential scheduler decays the learning rate by decay every stage_length steps, starting from initial_lr:

    learning_rate = initial_lr * decay ** curr_stage

where

    curr_stage = step / stage_length           if staircase = False
    curr_stage = floor(step / stage_length)    if staircase = True

Parameters:
    initial_lr (float) – The learning rate at the first step. Default is 0.01.
    stage_length (int) – The length of each stage, in steps. Default is 500.
    staircase (bool) – If True, only adjusts the learning rate at stage transitions, producing a step-like decay schedule. If False, adjusts the learning rate after each step, producing a smooth decay schedule. Default is False.
    decay (float) – The amount to decay the learning rate at each new stage. Default is 0.1.
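The decay rule above is simple enough to sketch directly. The following minimal, library-independent illustration (the function name exponential_lr is hypothetical) implements both the smooth and staircase variants:

    import math

    def exponential_lr(step, initial_lr=0.01, stage_length=500, decay=0.1, staircase=False):
        """Exponential decay: initial_lr * decay ** curr_stage."""
        curr_stage = step / stage_length
        if staircase:
            curr_stage = math.floor(curr_stage)
        return initial_lr * decay ** curr_stage

    # With the defaults, the rate falls by 10x every 500 steps:
    # exponential_lr(0) -> 0.01, exponential_lr(500) -> 0.001, exponential_lr(1000) -> 0.0001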
KingScheduler

class numpy_ml.neural_nets.schedulers.KingScheduler(initial_lr=0.01, patience=1000, decay=0.99, **kwargs)

The Davis King / DLib learning rate scheduler.

Notes

The KingScheduler computes the probability that the slope of the OLS fit to the recent loss history is negative (i.e., that the loss is still decreasing). If this probability is less than 51% over the last patience steps, the scheduler exponentially decreases the current learning rate by decay.

References

[1] King, D. (2018). "Automatic learning rate scheduling that really works". http://blog.dlib.net/2018/02/automatic-learning-rate-scheduling-that.html

Parameters:
    initial_lr (float) – The learning rate to begin at. Default is 0.01.
    patience (int) – The number of recent steps over which the slope of the loss history is evaluated before adjusting the learning rate. Default is 1000.
    decay (float) – The amount to decay the learning rate at each adjustment. Default is 0.99.
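As an illustration of the slope test described in the Notes, the sketch below estimates the probability that the OLS slope of a loss history is negative using a normal approximation. This is a simplified reconstruction, not the library's exact implementation, and the helper name prob_loss_decreasing is hypothetical:

    import numpy as np
    from scipy.stats import norm

    def prob_loss_decreasing(losses):
        """Approximate P(slope < 0) for an OLS line fit to `losses`."""
        losses = np.asarray(losses, dtype=float)
        n = len(losses)
        steps = np.arange(n)
        # OLS fit of loss vs. step; polyfit returns [slope, intercept]
        slope, intercept = np.polyfit(steps, losses, deg=1)
        # standard error of the slope estimate
        resid = losses - (slope * steps + intercept)
        sxx = ((steps - steps.mean()) ** 2).sum()
        se = np.sqrt(resid @ resid / (n - 2) / sxx)
        # normal approximation to P(slope < 0)
        return norm.cdf(-slope / se)

    # Decay the learning rate once the loss has likely plateaued, e.g.:
    # if prob_loss_decreasing(loss_history[-patience:]) < 0.51:
    #     lr *= decay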
NoamScheduler

class numpy_ml.neural_nets.schedulers.NoamScheduler(model_dim=512, scale_factor=1, warmup_steps=4000, **kwargs)

The Noam learning rate scheduler, originally used in conjunction with the Adam optimizer in [1].

Notes

The Noam scheduler increases the learning rate linearly for the first warmup_steps steps, and decreases it thereafter proportionally to the inverse square root of the step number:

    lr = scale_factor * model_dim ** (-0.5) * adj_step
    adj_step = min(step_num ** (-0.5), step_num * warmup_steps ** (-1.5))
References

[1] Vaswani et al. (2017). "Attention is all you need". 31st Conference on Neural Information Processing Systems. https://arxiv.org/pdf/1706.03762.pdf

Parameters:
    model_dim (int) – The dimensionality of the model (d_model in [1]). Default is 512.
    scale_factor (float) – A fixed coefficient by which to rescale the computed learning rate. Default is 1.
    warmup_steps (int) – The number of steps in the warmup phase of training. Default is 4000.
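The schedule is easy to compute directly. A library-independent sketch of the formula above (the function name noam_lr is hypothetical):

    def noam_lr(step_num, model_dim=512, scale_factor=1.0, warmup_steps=4000):
        """Linear warmup followed by inverse square-root decay."""
        step_num = max(step_num, 1)  # the formula is undefined at step 0
        adj_step = min(step_num ** -0.5, step_num * warmup_steps ** -1.5)
        return scale_factor * model_dim ** -0.5 * adj_step

    # The rate peaks at step_num == warmup_steps, then decays as 1/sqrt(step):
    # noam_lr(1) ~ 1.7e-7, noam_lr(4000) ~ 7.0e-4, noam_lr(16000) ~ 3.5e-4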