MomentumBase
MomentumBase is the base class from which all algorithms for computing momentum coefficients are derived.
The momentum coefficient can be adjusted to yield a smoother trajectory and to mitigate the zigzagging of gradient descent methods. The update for the auxiliary state sequence depends on the sequence of momentum coefficients \(t^{(j+1)}\). The classes described in Momentum Classes provide alternative rules for updating this coefficient. The default rule corresponds to the Nesterov method implemented in MomentumNesterov.
Classes derived from MomentumBase should override the method MomentumBase.update.
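The subclassing pattern described above can be sketched as follows. This is a minimal, self-contained illustration and not the library's implementation: the stand-in base class `MomentumBaseSketch`, the subclass `MomentumConstant`, and the `update(t, j)` signature are all assumptions made for the example, since the actual signature of MomentumBase.update is not given here.

```python
from abc import ABC, abstractmethod


class MomentumBaseSketch(ABC):
    """Hypothetical stand-in for the base class (not the library's code)."""

    @abstractmethod
    def update(self, t, j):
        """Return t^(j+1) given the current coefficient t^(j) and iteration j.

        The (t, j) argument list is an assumption made for this sketch.
        """


class MomentumConstant(MomentumBaseSketch):
    """Hypothetical rule that keeps the coefficient fixed at 1."""

    def update(self, t, j):
        # Ignore both the previous coefficient and the iteration index.
        return 1.0


# A solver loop would call update once per iteration.
momentum = MomentumConstant()
t = 1.0
for j in range(1, 4):
    t = momentum.update(t, j)
```

Declaring `update` as abstract means that a derived class which forgets to define it cannot be instantiated, which is one way of enforcing the override requirement.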
Momentum Classes
The momentum functionality is defined by the following classes:
- MomentumNesterov: This implements the standard PGM variant from [6]. The momentum coefficient is updated as
  \[t^{(j+1)} = \frac{1}{2} \left( 1 + \sqrt{1 + 4 \; (t^{(j)})^2} \right) \;,\]
  starting with \(t^{(1)} = 1\).
- This implements the linear momentum coefficient variant from [14]. The momentum coefficient is updated as
  \[t^{(j+1)} = \frac{j + b}{b} \;,\]
  with \(b\) a positive constant, usually selected as \(b \geq 2\).
- This implements the generalized linear momentum coefficient variant from [40]. The momentum coefficient is updated as
  \[t^{(j+1)} = \frac{j + a}{b} \;,\]
  with \(a\) and \(b\) positive constants, usually selected as \(a \in [50, 80]\) and \(b \geq 2\).
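To compare the three update rules numerically, the formulas above can be written as plain functions. This is an illustrative sketch only: the function names are hypothetical, and the default values \(b = 2\) and \(a = 50\) are assumptions taken from the lower ends of the ranges quoted above, not the library's defaults.

```python
import math


def nesterov(t):
    """t^(j+1) = (1 + sqrt(1 + 4 (t^(j))^2)) / 2."""
    return 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t**2))


def linear(j, b=2.0):
    """t^(j+1) = (j + b) / b; b = 2 is an assumed default."""
    return (j + b) / b


def gen_linear(j, a=50.0, b=2.0):
    """t^(j+1) = (j + a) / b; a = 50 and b = 2 are assumed defaults."""
    return (j + a) / b


# The Nesterov rule is a recursion on t, starting from t^(1) = 1.
t = 1.0
nesterov_seq = [t]
for _ in range(3):
    t = nesterov(t)
    nesterov_seq.append(t)

# The linear rules depend only on the iteration index j.
linear_seq = [linear(j) for j in range(1, 4)]
gen_linear_seq = [gen_linear(j) for j in range(1, 4)]
```

The Nesterov rule needs the previous coefficient, whereas the two linear rules are closed-form functions of the iteration index \(j\), so they need no stored state beyond the iteration counter.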