Solvers¶
Proximal-Gradient¶

Proximal gradient descent. 
The proximal-gradient method [BT2009], [N2013] is a method to solve problems of the form
\[\argmin_{\bs{x}} f(\bs{x}) + g(\bs{x})~,\]
where \(f\) is a differentiable function for which we have access to its gradient and \(g\) is a potentially nonsmooth function for which we have access to its proximal operator.
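As a concrete illustration, below is a minimal self-contained sketch of this iteration applied to the lasso problem, where \(f(\bs{x}) = \frac{1}{2}\|A\bs{x} - b\|^2\) and \(g = \alpha\|\cdot\|_1\), whose proximal operator is soft-thresholding. It illustrates the update itself, not the copt API; all names and the problem instance are illustrative.

import numpy as np

# Lasso instance: f(x) = 0.5 * ||Ax - b||^2 (smooth), g(x) = alpha * ||x||_1.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
alpha = 0.1

L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient of f
x = np.zeros(20)
for _ in range(500):
    z = x - A.T @ (A @ x - b) / L  # gradient step on the smooth term
    # proximal step: soft-thresholding, the prox of (alpha/L) * ||.||_1
    x = np.sign(z) * np.maximum(np.abs(z) - alpha / L, 0.0)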
Examples
References
 BT2009
Beck, Amir, and Marc Teboulle. “Gradient-based algorithms with applications to signal recovery.” Convex Optimization in Signal Processing and Communications (2009).
 N2013
Nesterov, Yu. “Gradient methods for minimizing composite functions.” Mathematical Programming 140.1 (2013): 125–161.
Primal-dual hybrid gradient¶

Primal-dual hybrid gradient splitting method. 
The primal-dual hybrid gradient method [C2013] [V2013] [CP2016] is a method to solve problems of the form
\[\argmin_{\bs{x}} f(\bs{x}) + g(\bs{x}) + h(A\bs{x})~,\]
where \(f\) is a differentiable function for which we have access to its gradient, \(g\) and \(h\) are potentially nonsmooth functions for which we have access to their proximal operator, and \(A\) is a linear operator.
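As a concrete illustration, here is a minimal self-contained sketch of the Condat–Vũ iteration described in these references, applied to one-dimensional total-variation denoising: \(f(\bs{x}) = \frac{1}{2}\|\bs{x} - b\|^2\), \(g = 0\), \(h = \alpha\|\cdot\|_1\) and \(A\) a finite-difference operator. The names and the problem instance are illustrative, not part of the copt API.

import numpy as np

# 1D total-variation denoising: min_x 0.5*||x - b||^2 + alpha*||Ax||_1,
# with A the forward-difference operator.
rng = np.random.default_rng(0)
n = 100
b = np.cumsum(rng.standard_normal(n))        # noisy piecewise-smooth signal
A = np.eye(n, k=1)[:n - 1] - np.eye(n)[:n - 1]
alpha = 2.0

L = 1.0                                      # Lipschitz constant of grad f
sigma = 1.0
tau = 1.0 / (L / 2 + sigma * np.linalg.norm(A, 2) ** 2)  # step-size condition

x, y = np.zeros(n), np.zeros(n - 1)
for _ in range(1000):
    # primal step: gradient of f plus the dual variable (prox of g = identity)
    x_new = x - tau * ((x - b) + A.T @ y)
    # dual step: prox of (alpha*||.||_1)^* is projection onto [-alpha, alpha]
    y = np.clip(y + sigma * A @ (2 * x_new - x), -alpha, alpha)
    x = x_new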
Examples
References
 C2013
Condat, Laurent. “A primal–dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms.” Journal of Optimization Theory and Applications 158.2 (2013): 460–479.
 V2013
Vũ, Bằng Công. “A splitting algorithm for dual monotone inclusions involving cocoercive operators.” Advances in Computational Mathematics 38.3 (2013).
 CP2016
Chambolle, Antonin, and Thomas Pock. “An introduction to continuous optimization for imaging.” Acta Numerica 25 (2016).
Three-operator splitting¶

Davis–Yin three-operator splitting method. 
The three-operator splitting method [DY2017] [PG2018] is a method to solve problems of the form
\[\argmin_{\bs{x}} f(\bs{x}) + g(\bs{x}) + h(\bs{x})~,\]
where \(f\) is a differentiable function for which we have access to its gradient and \(g\) and \(h\) are potentially nonsmooth functions for which we have access to their proximal operator.
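As a concrete illustration, below is a minimal self-contained sketch of the Davis–Yin iteration for a problem with \(f(\bs{x}) = \frac{1}{2}\|A\bs{x} - b\|^2\), \(g = \alpha\|\cdot\|_1\) and \(h\) the indicator of the nonnegative orthant, so both proximal operators are available in closed form. Names are illustrative, not the copt API.

import numpy as np

# min_x 0.5*||Ax - b||^2 + alpha*||x||_1 subject to x >= 0.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
alpha = 0.1

gamma = 1.0 / np.linalg.norm(A, 2) ** 2  # step-size, below 2/L
z = np.zeros(20)
for _ in range(1000):
    y = np.maximum(z, 0.0)               # prox of gamma*h: projection on x >= 0
    w = 2 * y - z - gamma * A.T @ (A @ y - b)
    x = np.sign(w) * np.maximum(np.abs(w) - gamma * alpha, 0.0)  # prox of gamma*g
    z += x - y                           # correction step on the shadow variable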
Examples
References
 DY2017
Davis, Damek, and Wotao Yin. “A three-operator splitting scheme and its optimization applications.” Set-Valued and Variational Analysis, 2017.
 PG2018
Pedregosa, Fabian, and Gauthier Gidel. “Adaptive Three Operator Splitting.” Proceedings of the 35th International Conference on Machine Learning, 2018.
Frank-Wolfe¶

Frank-Wolfe algorithm. 
The Frank-Wolfe (FW) or conditional gradient algorithm [J2013], [P2018], [PANJ2018] is a method for constrained optimization. It can solve problems of the form
\[\argmin_{\bs{x} \in \mathcal{D}} f(\bs{x})~,\]
where \(f\) is a differentiable function for which we have access to its gradient and \(\mathcal{D}\) is a compact set for which we have access to its linear minimization oracle (LMO). This is a routine that, given a vector \(\bs{u}\), returns a solution to
\[\argmin_{\bs{z} \in \mathcal{D}}\, \langle \bs{u}, \bs{z}\rangle~.\]
Contrary to other constrained optimization algorithms like projected gradient descent, the Frank-Wolfe algorithm does not require access to a projection, which is why it is sometimes referred to as a projection-free algorithm. It instead relies exclusively on the linear minimization oracle described above.
The Frank-Wolfe algorithm is implemented in this library as copt.minimize_frank_wolfe(). Like most other methods, it takes as argument an objective function to minimize, but unlike most other methods, it also requires access to a linear minimization oracle: a routine that, for a given \(d\)-dimensional vector \(\bs{u}\), solves the linear problem \(\argmin_{\bs{z} \in \mathcal{D}}\, \langle \bs{u}, \bs{z}\rangle\).
At each iteration, the Frank-Wolfe algorithm uses the linear minimization oracle to identify the vertex \(\bs{s}_t\) that correlates most with the negative gradient. The next iterate \(\bs{x}^+\) is then constructed as a convex combination of the current iterate \(\bs{x}\) and the newly acquired vertex \(\bs{s}\):
\[\bs{x}^+ = (1 - \gamma)\bs{x} + \gamma\bs{s}~.\]
The step-size \(\gamma\) can be chosen by different strategies:

Backtracking line-search. This is the default option and corresponds to the keyword argument step_size="backtracking". It is typically the fastest and simplest strategy; if unsure, use this option.

Demyanov–Rubinov step-size. This is a step-size of the form
\[\gamma = \langle -\nabla f(\bs{x}), \bs{s} - \bs{x}\rangle / (L\|\bs{s} - \bs{x}\|^2)~.\]
This step-size typically performs well but has the drawback that it requires knowledge of the Lipschitz constant of \(\nabla f\). It can be selected with the keyword argument step_size="DR", in which case the Lipschitz constant \(L\) needs to be specified through the keyword argument lipschitz. For example, if the Lipschitz constant is 0.1, then the signature should include step_size="DR", lipschitz=0.1.

Oblivious step-size. This is the very simple step-size of the form
\[\gamma = \frac{2}{t+2}~,\]
where \(t\) is the number of iterations. This step-size is oblivious in the sense that it does not use any previous information about the objective. It typically performs worse than the alternatives, but it is simple to implement and can be competitive in the case of noisy objectives; see the sketch below.
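To make the iteration concrete, here is a minimal self-contained sketch of the Frank-Wolfe loop with the oblivious step-size over an \(\ell_1\)-ball, whose linear minimization oracle returns a signed, scaled coordinate vector. It illustrates the algorithm itself; for practical use, prefer copt.minimize_frank_wolfe() with the options above. All names and the problem instance are illustrative.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
radius = 1.0                       # radius of the l1-ball constraint

def lmo(u):
    # argmin of <u, z> over the l1 ball: a signed, scaled coordinate vector
    i = np.argmax(np.abs(u))
    z = np.zeros_like(u)
    z[i] = -radius * np.sign(u[i])
    return z

x = np.zeros(20)
for t in range(200):
    grad = A.T @ (A @ x - b)       # gradient of f(x) = 0.5*||Ax - b||^2
    s = lmo(grad)                  # vertex most correlated with -grad
    gamma = 2.0 / (t + 2.0)        # oblivious step-size
    x = (1 - gamma) * x + gamma * s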
Examples
References
 J2013
Jaggi, Martin. “Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization.” ICML 2013.
 P2018
Pedregosa, Fabian. “Notes on the Frank-Wolfe Algorithm.” 2018.
 PANJ2018
Pedregosa, Fabian, Armin Askari, Geoffrey Negiar, and Martin Jaggi. “Step-Size Adaptivity in Projection-Free Optimization.” arXiv:1806.05123 (2018).
 LJ2015
Lacoste-Julien, Simon, and Martin Jaggi. “On the global linear convergence of Frank-Wolfe optimization variants.” Advances in Neural Information Processing Systems, 2015.
Stochastic methods¶

Stochastic average gradient augmented (SAGA) algorithm. 


Variance-reduced three-operator splitting (VRTOS) algorithm. 

Stochastic Frank-Wolfe (SFW) algorithm. 
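To illustrate the variance-reduction idea behind SAGA, below is a minimal self-contained sketch for a least-squares objective without a proximal term: a table of past per-sample gradients and its running mean turn the plain stochastic gradient into an unbiased, variance-reduced estimate. Names, step-size and the problem instance are illustrative, not the copt API.

import numpy as np

# f(x) = (1/n) * sum_i 0.5 * (a_i @ x - b_i)^2
rng = np.random.default_rng(0)
n, d = 100, 10
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

step = 1.0 / (3 * np.max(np.sum(A ** 2, axis=1)))  # ~ 1/(3*L_max)
x = np.zeros(d)
memory = np.zeros(n)          # stored residual a_i @ x_old - b_i per sample
mean_grad = np.zeros(d)       # running mean of the stored gradients

for _ in range(50 * n):
    i = rng.integers(n)
    res = A[i] @ x - b[i]
    # unbiased estimate: new grad - stored grad + mean of stored grads
    g = (res - memory[i]) * A[i] + mean_grad
    mean_grad += (res - memory[i]) * A[i] / n      # keep the running mean exact
    memory[i] = res
    x -= step * g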