MomentumBase
MomentumBase is the base class from which all algorithms for computing momentum
coefficients are derived.
The momentum coefficient can be tuned to smooth the iterate trajectory and mitigate the zigzagging typical of gradient descent methods. The update for the auxiliary state sequence depends on the sequence of momentum coefficients \(t^{(j+1)}\). The classes described in Momentum Classes provide alternative rules for updating this coefficient. By default, the momentum coefficient follows the Nesterov rule implemented in MomentumNesterov.
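To make the role of \(t^{(j)}\) concrete, the minimal Python sketch below assumes the standard accelerated PGM (FISTA-style) extrapolation \(z^{(j+1)} = x^{(j+1)} + \frac{t^{(j)} - 1}{t^{(j+1)}} \left( x^{(j+1)} - x^{(j)} \right)\). Both this notation and the function name `extrapolate` are assumptions for illustration; they are not taken from this library's code.

```python
import numpy as np


def extrapolate(x_new, x_old, t_old, t_new):
    """Hypothetical auxiliary-state update (assumed FISTA-style form):

    z^{(j+1)} = x^{(j+1)} + ((t^{(j)} - 1) / t^{(j+1)}) * (x^{(j+1)} - x^{(j)})
    """
    # The extrapolation weight is fully determined by the momentum sequence.
    beta = (t_old - 1.0) / t_new
    return x_new + beta * (x_new - x_old)


x_old = np.array([0.0, 0.0])
x_new = np.array([1.0, 2.0])

# With t^{(1)} = 1 the weight is zero, so no extrapolation is applied.
print(extrapolate(x_new, x_old, t_old=1.0, t_new=1.618))

# Larger t^{(j)} relative to t^{(j+1)} pushes z further along x_new - x_old.
print(extrapolate(x_new, x_old, t_old=2.0, t_new=4.0))
```

Because the weight \((t^{(j)} - 1)/t^{(j+1)}\) is the only place the coefficient enters, swapping the coefficient rule changes how aggressively the iterates are extrapolated without touching the rest of the iteration.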
Classes derived from MomentumBase should override the
MomentumBase.update method.
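The subclassing contract can be sketched with a self-contained stand-in. The names `MomentumBaseSketch` and `NesterovSketch`, and the `update(j, t)` signature, are hypothetical; the actual MomentumBase.update signature may differ. The subclass reproduces the Nesterov rule listed under Momentum Classes.

```python
import math
from abc import ABC, abstractmethod


class MomentumBaseSketch(ABC):
    """Hypothetical stand-in for MomentumBase (the real interface may differ).

    Subclasses must implement ``update``, mapping the iteration index j and
    the current coefficient t^{(j)} to the next coefficient t^{(j+1)}.
    """

    @abstractmethod
    def update(self, j: int, t: float) -> float:
        ...


class NesterovSketch(MomentumBaseSketch):
    """Hypothetical subclass overriding update with the Nesterov rule."""

    def update(self, j: int, t: float) -> float:
        # j is unused by this rule; it is kept so all rules share one interface.
        return 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t * t))


momentum = NesterovSketch()
t = 1.0  # t^{(1)} = 1
for j in range(1, 4):
    t = momentum.update(j, t)
    print(f"t^({j + 1}) = {t:.4f}")
```

Declaring `update` with `abstractmethod` makes Python refuse to instantiate a subclass that forgets to override it, which enforces the "must define update" requirement at construction time rather than mid-iteration.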
Momentum Classes
The momentum functionality is defined by the following classes:
- This implements the standard PGM variant from [6]. The momentum coefficient is updated as
\[t^{(j+1)} = \frac{1}{2} \left( 1 + \sqrt{1 + 4 \; (t^{(j)})^2} \right) \;,\]
starting with \(t^{(1)} = 1\).
- This implements the linear momentum coefficient variant from [14]. The momentum coefficient is updated as
\[t^{(j+1)} = \frac{j + b}{b} \;,\]
where \(b\) is a positive constant, usually chosen with \(b \geq 2\).
- This implements the generalized linear momentum coefficient variant from [40]. The momentum coefficient is updated as
\[t^{(j+1)} = \frac{j + a}{b} \;,\]
where \(a\) and \(b\) are positive constants, usually chosen with \(a \in [50, 80]\) and \(b \geq 2\).