1.5. First-Order Methods¶
1.5.1. Steepest Descent (Gradient Descent)¶
In the Steepest Descent method (also know as Gradient Descent), the search direction at iteration (k) is defined by:
where \(\nabla f(\mathbf{x}_k)\) is the gradient of the function at point \(\mathbf{x}_k\). The next point is calculated as:
where \(\alpha_k\) is a scalar step size.
The implementation with optymus:
from optymus import Optimizer
# Define the objective function
f = lambda x: x[0]**2 + x[1]**2
# Define the initial point
x0 = [0.0, 2.0]
# Define the optimization problem
results = Optimizer(f_obj=f, x0=x0, method='steepest_descent')
# Print the solution
results.report()
1.5.2. Conjugate Gradient Methods¶
1.5.2.1. Fletcher–Reeves Method¶
The Fletcher–Reeves method is defined as:
where \(\beta_k\) is given by:
The next point is calculated similarly as in the Gradient Descent method.
The implementation with optymus:
from optymus import Optimizer
# Define the objective function
f = lambda x: x[0]**2 + x[1]**2
# Define the initial point
x0 = [0.0, 2.0]
# Define the optimization problem
results = Optimizer(f_obj=f, x0=x0, method='conjugate_gradient', gradient_type='fletcher_reeves')
# Print the solution
results.report()
1.5.2.2. Hestenes–Stiefel Method¶
In the Hestenes–Stiefel method, the parameter \(\beta_k\) is calculated differently:
The search direction and next point calculation follow the same pattern as above .
The implementation with optymus:
from optymus import Optimizer
# Define the objective function
f = lambda x: x[0]**2 + x[1]**2
# Define the initial point
x0 = [0.0, 2.0]
# Define the optimization problem
results = Optimizer(f_obj=f, x0=x0, method='conjugate_gradient', gradient_type='hestenes_stiefel')
# Print the solution
results.report()
1.5.2.3. Polak–Ribiére Method¶
The Polak–Ribiére method calculates \(\beta_k\) as follows:
The search direction and next point are calculated in the same way as the other conjugate gradient methods.
The implementation with optymus:
from optymus import Optimizer
# Define the objective function
f = lambda x: x[0]**2 + x[1]**2
# Define the initial point
x0 = [0.0, 2.0]
# Define the optimization problem
results = Optimizer(f_obj=f, x0=x0, method='conjugate_gradient', gradient_type='polak_ribiere')
# Print the solution
results.report()
1.5.2.4. Dai-Yuan Method¶
The Dai-Yuan method calculates \(\beta_k\) as follows:
The search direction and next point are calculated in the same way as the other conjugate gradient methods.
The implementation with optymus:
from optymus import Optimizer
# Define the objective function
f = lambda x: x[0]**2 + x[1]**2
# Define the initial point
x0 = [0.0, 2.0]
# Define the optimization problem
results = Optimizer(f_obj=f, x0=x0, method='conjugate_gradient', gradient_type='dai_yuan')
# Print the solution
results.report()
1.5.3. Quasi-Newton Methods¶
Quasi-Newton methods seek to approximate the Hessian matrix to avoid the direct computation of second-order derivatives. These methods update the Hessian approximation \(B_k\) at each iteration. In literature we can find this method classified as a second-order methods, but here we classify it as a first-order method because it does not require the computation of second-order derivatives, only and approximation of the Hessian matrix.
1.5.3.1. Broyden-Fletcher-Goldfarb-Shanno (BFGS) Method¶
The BFGS method updates the Hessian approximation as follows:
where \(\mathbf{s}_k = \mathbf{x}_{k+1} - \mathbf{x}_k\) and \(\mathbf{y}_k = \nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}_k)\).
The implementation with optymus:
from optymus import Optimizer
# Define the objective function
f = lambda x: x[0]**2 + x[1]**2
# Define the initial point
x0 = [0.0, 2.0]
# Define the optimization problem
results = Optimizer(f_obj=f, x0=x0, method='bfgs')
# Print the solution
results.report()
1.5.3.2. Limited-Memory BFGS (L-BFGS) Method¶
The L-BFGS method is a memory-efficient version of the BFGS method. It stores only a few vectors of the most recent iterations to approximate the Hessian matrix.
The L-BFGS method approximates the Hessian matrix using the following formula:
where \(\rho_k = 1 / y_k^T s_k\).
Note
The LBFGS is not implemented in optymus yet.