MomentumBase

MomentumBase is the base class from which all classes for computing momentum coefficients are derived.

The momentum coefficient can be adjusted to yield a smoother trajectory and mitigate the zigzagging of gradient descent methods. The update for the auxiliary state sequence

\[\mathbf{y}^{(j+1)} = \mathbf{x}^{(j+1)} + \frac{t^{(j)} - 1}{t^{(j+1)}} \left( \mathbf{x}^{(j+1)} - \mathbf{x}^{(j)} \right) \;,\]

depends on the momentum coefficients \(t^{(j)}\) and \(t^{(j+1)}\). The classes described in Momentum Classes provide alternative rules for updating this coefficient. The default rule is the Nesterov method implemented in MomentumNesterov.
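As a rough illustration of the auxiliary state update above, the following sketch applies the formula to plain Python lists. The function name `extrapolate` and its argument names are hypothetical and are not part of the library; the numeric inputs are arbitrary.

```python
def extrapolate(x_new, x_old, t_old, t_new):
    """Return y^(j+1) given x^(j+1), x^(j), t^(j) and t^(j+1)."""
    # Weight applied to the most recent step x^(j+1) - x^(j).
    weight = (t_old - 1.0) / t_new
    return [xn + weight * (xn - xo) for xn, xo in zip(x_new, x_old)]


# With t^(j) = 1 the weight is zero, so y^(j+1) equals x^(j+1).
y_first = extrapolate([2.0, 4.0], [1.0, 2.0], t_old=1.0, t_new=1.5)

# With t^(j) = 2 and t^(j+1) = 2 the weight is 0.5.
y_later = extrapolate([2.0, 4.0], [1.0, 2.0], t_old=2.0, t_new=2.0)
```

The larger the ratio \((t^{(j)} - 1) / t^{(j+1)}\), the further the auxiliary state is pushed along the direction of the latest step.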

Classes derived from MomentumBase should override the method MomentumBase.update.
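The subclassing pattern might look like the sketch below. Both classes are hypothetical stand-ins written for illustration, not the library's code, and the `update(self, t)` signature (current coefficient in, next coefficient out) is an assumption. The derived class uses the Nesterov rule given under Momentum Classes.

```python
import math


class MomentumBaseSketch:
    """Hypothetical stand-in for the base class (not the library's code)."""

    def update(self, t):
        # Derived classes are expected to supply the actual rule.
        raise NotImplementedError


class NesterovSketch(MomentumBaseSketch):
    """Hypothetical derived class applying the Nesterov rule."""

    def update(self, t):
        # t^(j+1) = (1 + sqrt(1 + 4 (t^(j))^2)) / 2
        return 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t * t))


momentum = NesterovSketch()
t_next = momentum.update(1.0)  # next coefficient after t^(1) = 1
```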

Momentum Classes

The momentum functionality is defined by the following classes:

  • MomentumNesterov

    This implements the standard proximal gradient method (PGM) variant from [6]. The momentum coefficient is updated as

    \[t^{(j+1)} = \frac{1}{2} \left( 1 + \sqrt{1 + 4 \; (t^{(j)})^2} \right) \;,\]

    starting with \(t^{(1)} = 1\).

  • MomentumLinear

    This implements the linear momentum coefficient variant from [14]. The momentum coefficient is updated as

    \[t^{(j+1)} = \frac{j + b}{b} \;,\]

    where \(b\) is a positive constant, usually chosen such that \(b \geq 2\).

  • MomentumGenLinear

    This implements the generalized linear momentum coefficient variant from [40]. The momentum coefficient is updated as

    \[t^{(j+1)} = \frac{j + a}{b} \;,\]

    where \(a\) and \(b\) are positive constants, usually chosen such that \(a \in [50, 80]\) and \(b \geq 2\).
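The three update rules listed above can be compared with a short sketch. The function names are hypothetical, the defaults \(a = 50\) and \(b = 2\) are only illustrative picks from the stated ranges, and evaluating the linear rules at \(j = 0\) to obtain \(t^{(1)}\) is an assumption made here rather than something stated in the text.

```python
import math


def nesterov_next(t):
    """Nesterov rule: t^(j+1) = (1 + sqrt(1 + 4 (t^(j))^2)) / 2."""
    return 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t * t))


def linear_next(j, b=2.0):
    """Linear rule: t^(j+1) = (j + b) / b."""
    return (j + b) / b


def gen_linear_next(j, a=50.0, b=2.0):
    """Generalized linear rule: t^(j+1) = (j + a) / b."""
    return (j + a) / b


def first_coefficients(n):
    """Return t^(1), ..., t^(n) for each rule (hypothetical helper)."""
    nesterov = [1.0]  # t^(1) = 1, as stated for the Nesterov rule
    for _ in range(n - 1):
        nesterov.append(nesterov_next(nesterov[-1]))
    # Assumption: the linear rules are evaluated at j = 0 to obtain t^(1).
    linear = [linear_next(j) for j in range(n)]
    gen_linear = [gen_linear_next(j) for j in range(n)]
    return nesterov, linear, gen_linear


nesterov, linear, gen_linear = first_coefficients(4)
```

Note that the linear rule is the special case \(a = b\) of the generalized linear rule. The Nesterov rule depends only on the previous coefficient, whereas the two linear rules depend only on the iteration index \(j\).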