# Sleipnir

A linearity-exploiting reverse mode autodiff library and nonlinear program solver DSL.
This page documents the algorithms Sleipnir implements.
In reverse accumulation AD, the dependent variable to be differentiated is fixed and the derivative is computed with respect to each subexpression recursively. In a pen-and-paper calculation, the derivative of the outer functions is repeatedly substituted in the chain rule:
(∂y/∂x) = (∂y/∂w₁) ⋅ (∂w₁/∂x) = ((∂y/∂w₂) ⋅ (∂w₂/∂w₁)) ⋅ (∂w₁/∂x) = ...
In reverse accumulation, the quantity of interest is the adjoint, denoted with a bar (w̄); it is a derivative of a chosen dependent variable with respect to a subexpression w: ∂y/∂w.
Given the expression f(x₁, x₂) = sin(x₁) + x₁x₂, the computational graph decomposes it into the intermediate variables

$$
\begin{aligned}
w_1 &= x_1 \\
w_2 &= x_2 \\
w_3 &= w_1 w_2 \\
w_4 &= \sin(w_1) \\
w_5 &= w_4 + w_3 = f(x_1, x_2)
\end{aligned}
$$

The operations to compute the derivative are:
w̄₅ = 1 (seed)
w̄₄ = w̄₅(∂w₅/∂w₄) = w̄₅
w̄₃ = w̄₅(∂w₅/∂w₃) = w̄₅
w̄₂ = w̄₃(∂w₃/∂w₂) = w̄₃w₁
w̄₁ = w̄₄(∂w₄/∂w₁) + w̄₃(∂w₃/∂w₁) = w̄₄cos(w₁) + w̄₃w₂
https://en.wikipedia.org/wiki/Automatic_differentiation#Beyond_forward_and_reverse_accumulation
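The adjoint propagation above can be written out directly. Here is a minimal Python sketch (not Sleipnir's API; the function name is illustrative) that checks the result against the analytic gradient ∂f/∂x₁ = cos(x₁) + x₂ and ∂f/∂x₂ = x₁:

```python
import math

def f_and_gradient(x1, x2):
    """Reverse accumulation for f(x1, x2) = sin(x1) + x1*x2."""
    # Forward pass: evaluate each subexpression.
    w1 = x1
    w2 = x2
    w3 = w1 * w2       # x1*x2
    w4 = math.sin(w1)  # sin(x1)
    w5 = w3 + w4       # f(x1, x2)

    # Reverse pass: propagate adjoints from the output back to the inputs.
    w5_bar = 1.0                                  # seed
    w4_bar = w5_bar                               # ∂w5/∂w4 = 1
    w3_bar = w5_bar                               # ∂w5/∂w3 = 1
    w2_bar = w3_bar * w1                          # ∂w3/∂w2 = w1
    w1_bar = w4_bar * math.cos(w1) + w3_bar * w2  # w1 is used by both w4 and w3

    return w5, (w1_bar, w2_bar)

value, grad = f_and_gradient(1.2, 3.4)
```

Note that one reverse pass yields the derivative with respect to every input at once, which is why reverse accumulation suits cost functions with many variables and one output.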
We want to solve the following optimization problem:

$$
\begin{aligned}
\min_x \quad & f(x)
\end{aligned}
$$

where f(x) is the cost function.
The Lagrangian of the problem is

$$
L(x) = f(x)
$$

The gradient is

$$
\nabla_x L = \nabla f(x)
$$

The first-order necessary condition for optimality (the KKT condition, which has no constraint terms here) is

$$
\nabla f(x) = 0
$$
Next, we'll apply Newton's method to the optimality conditions. Let H be ∂²L/∂x² and pˣ be the step for x.

$$
H p^x = -\nabla f(x)
$$

In summary, the following system gives the iterate pₖˣ:

$$
H_k p_k^x = -\nabla f(x_k)
$$

The iterate is applied like so:

$$
x_{k+1} = x_k + \alpha_k p_k^x
$$

where αₖ ∈ (0, 1] is the step size.
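As a concrete sketch of this iteration (the quadratic cost and the fixed step size αₖ = 1 are assumptions for illustration, not Sleipnir's implementation), consider minimizing f(x) = (x₁ − 3)² + 2(x₂ + 1)²:

```python
import numpy as np

# Minimize f(x) = (x1 - 3)^2 + 2*(x2 + 1)^2 with Newton's method.
def gradient(x):
    return np.array([2.0 * (x[0] - 3.0), 4.0 * (x[1] + 1.0)])

H = np.array([[2.0, 0.0], [0.0, 4.0]])  # Hessian (constant for a quadratic)

x = np.zeros(2)
for _ in range(10):
    # Solve H pˣ = -∇f(x) for the Newton step, then apply it with α = 1.
    p_x = np.linalg.solve(H, -gradient(x))
    x += p_x

# For a quadratic cost, a single full Newton step lands on the minimizer (3, -1).
```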
We want to solve the following optimization problem:

$$
\begin{aligned}
\min_x \quad & f(x) \\
\text{subject to} \quad & c_e(x) = 0
\end{aligned}
$$

where f(x) is the cost function and cₑ(x) are the equality constraints.
The Lagrangian of the problem is

$$
L(x, y) = f(x) - y^T c_e(x)
$$

The gradients are

$$
\begin{aligned}
\nabla_x L &= \nabla f(x) - A_e^T y \\
\nabla_y L &= -c_e(x)
\end{aligned}
$$

The first-order necessary conditions for optimality (KKT conditions) are

$$
\begin{aligned}
\nabla f(x) - A_e^T y &= 0 \\
c_e(x) &= 0
\end{aligned}
$$

where Aₑ = ∂cₑ/∂x. We'll rearrange them for the primal-dual system.
Next, we'll apply Newton's method to the optimality conditions. Let H be ∂²L/∂x², pˣ be the step for x, and pʸ be the step for y.

$$
\begin{aligned}
H p^x - A_e^T p^y &= -(\nabla f(x) - A_e^T y) \\
A_e p^x &= -c_e(x)
\end{aligned}
$$

Group them into a matrix equation:

$$
\begin{bmatrix}
H & -A_e^T \\
A_e & 0
\end{bmatrix}
\begin{bmatrix} p^x \\ p^y \end{bmatrix} =
-\begin{bmatrix}
\nabla f(x) - A_e^T y \\
c_e(x)
\end{bmatrix}
$$

Invert pʸ so the system matrix becomes symmetric:

$$
\begin{bmatrix}
H & A_e^T \\
A_e & 0
\end{bmatrix}
\begin{bmatrix} p^x \\ -p^y \end{bmatrix} =
-\begin{bmatrix}
\nabla f(x) - A_e^T y \\
c_e(x)
\end{bmatrix}
$$

In summary, the following 2×2 block system gives the iterates pₖˣ and pₖʸ:

$$
\begin{bmatrix}
H_k & A_e^T \\
A_e & 0
\end{bmatrix}
\begin{bmatrix} p_k^x \\ -p_k^y \end{bmatrix} =
-\begin{bmatrix}
\nabla f(x_k) - A_e^T y_k \\
c_e(x_k)
\end{bmatrix}
$$

The iterates are applied like so:

$$
\begin{aligned}
x_{k+1} &= x_k + \alpha_k p_k^x \\
y_{k+1} &= y_k + \alpha_k p_k^y
\end{aligned}
$$
Section 6 of [^3] describes how to check for local infeasibility.
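The block system can be exercised on a toy problem. This sketch assumes a convex quadratic cost, a linear constraint, and a full Newton step (α = 1), none of which reflects Sleipnir's actual line-search logic:

```python
import numpy as np

# min x1^2 + x2^2  subject to  x1 + x2 - 1 = 0.
# The solution is x = (0.5, 0.5) with multiplier y = 1.
def gradient(x):
    return 2.0 * x

H = 2.0 * np.eye(2)           # Lagrangian Hessian (constraint is linear)
A_e = np.array([[1.0, 1.0]])  # ∂cₑ/∂x

x = np.zeros(2)
y = np.zeros(1)
for _ in range(10):
    c_e = np.array([x[0] + x[1] - 1.0])
    # Assemble [H Aₑᵀ; Aₑ 0][pˣ; -pʸ] = -[∇f - Aₑᵀy; cₑ].
    K = np.block([[H, A_e.T], [A_e, np.zeros((1, 1))]])
    rhs = -np.concatenate([gradient(x) - A_e.T @ y, c_e])
    p = np.linalg.solve(K, rhs)
    p_x, p_y = p[:2], -p[2:]  # un-invert pʸ
    x += p_x
    y += p_y
```

Because the cost is quadratic and the constraint linear, the KKT system is linear and one Newton step reaches the solution; later iterations solve for a zero step.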
We want to solve the following optimization problem:

$$
\begin{aligned}
\min_x \quad & f(x) \\
\text{subject to} \quad & c_e(x) = 0 \\
& c_i(x) \geq 0
\end{aligned}
$$

where f(x) is the cost function, cₑ(x) are the equality constraints, and cᵢ(x) are the inequality constraints. First, we'll reformulate the inequality constraints as equality constraints with slack variables:

$$
\begin{aligned}
\min_{x,s} \quad & f(x) \\
\text{subject to} \quad & c_e(x) = 0 \\
& c_i(x) - s = 0 \\
& s \geq 0
\end{aligned}
$$
To make this easier to solve, we'll reformulate it as the following barrier problem:

$$
\begin{aligned}
\min_{x,s} \quad & f(x) - \mu \sum_{i=1}^m \ln s_i \\
\text{subject to} \quad & c_e(x) = 0 \\
& c_i(x) - s = 0
\end{aligned}
$$

where μ is the barrier parameter. As μ → 0, the solution of the barrier problem approaches the solution of the original problem.
The Lagrangian of the barrier problem is

$$
L(x, s, y, z) = f(x) - \mu \sum_{i=1}^m \ln s_i - y^T c_e(x) - z^T (c_i(x) - s)
$$

The gradients are

$$
\begin{aligned}
\nabla_x L &= \nabla f(x) - A_e^T y - A_i^T z \\
\nabla_s L &= -\mu S^{-1} e + z \\
\nabla_y L &= -c_e(x) \\
\nabla_z L &= -(c_i(x) - s)
\end{aligned}
$$

The first-order necessary conditions for optimality (KKT conditions) are

$$
\begin{aligned}
\nabla f(x) - A_e^T y - A_i^T z &= 0 \\
z - \mu S^{-1} e &= 0 \\
c_e(x) &= 0 \\
c_i(x) - s &= 0
\end{aligned}
$$

where Aₑ = ∂cₑ/∂x, Aᵢ = ∂cᵢ/∂x, S = diag(s), Z = diag(z), and e is a column vector of ones. We'll rearrange them for the primal-dual system by multiplying the complementarity condition through by S:

$$
S z - \mu e = 0
$$
Next, we'll apply Newton's method to the optimality conditions. Let H be ∂²L/∂x², Σ be the primal-dual barrier term Hessian S⁻¹Z, pˣ be the step for x, pˢ be the step for s, pʸ be the step for y, and pᶻ be the step for z.

$$
\begin{aligned}
H p^x - A_e^T p^y - A_i^T p^z &= -(\nabla f(x) - A_e^T y - A_i^T z) \\
Z p^s + S p^z &= -(S z - \mu e) \\
A_e p^x &= -c_e(x) \\
A_i p^x - p^s &= -(c_i(x) - s)
\end{aligned}
$$

Group them into a matrix equation:

$$
\begin{bmatrix}
H & 0 & -A_e^T & -A_i^T \\
0 & Z & 0 & S \\
A_e & 0 & 0 & 0 \\
A_i & -I & 0 & 0
\end{bmatrix}
\begin{bmatrix} p^x \\ p^s \\ p^y \\ p^z \end{bmatrix} =
-\begin{bmatrix}
\nabla f(x) - A_e^T y - A_i^T z \\
S z - \mu e \\
c_e(x) \\
c_i(x) - s
\end{bmatrix}
$$
Invert pʸ and pᶻ:

$$
\begin{bmatrix}
H & 0 & A_e^T & A_i^T \\
0 & Z & 0 & -S \\
A_e & 0 & 0 & 0 \\
A_i & -I & 0 & 0
\end{bmatrix}
\begin{bmatrix} p^x \\ p^s \\ -p^y \\ -p^z \end{bmatrix} =
-\begin{bmatrix}
\nabla f(x) - A_e^T y - A_i^T z \\
S z - \mu e \\
c_e(x) \\
c_i(x) - s
\end{bmatrix}
$$
Multiply the second row by S⁻¹ and replace S⁻¹Z with Σ:

$$
\Sigma p^s + p^z = \mu S^{-1} e - z
$$

Solve the second row for pˢ:

$$
p^s = \Sigma^{-1}(\mu S^{-1} e - z - p^z)
$$

Substitute Σ = S⁻¹Z into the first two terms:

$$
p^s = \mu Z^{-1} e - s - \Sigma^{-1} p^z
$$

Substitute the explicit formula for pˢ into the fourth row and simplify:

$$
A_i p^x + \Sigma^{-1} p^z = \mu Z^{-1} e - c_i(x)
$$
Substitute the new second and fourth rows into the system:

$$
\begin{bmatrix}
H & 0 & A_e^T & A_i^T \\
0 & \Sigma & 0 & -I \\
A_e & 0 & 0 & 0 \\
A_i & 0 & 0 & -\Sigma^{-1}
\end{bmatrix}
\begin{bmatrix} p^x \\ p^s \\ -p^y \\ -p^z \end{bmatrix} =
-\begin{bmatrix}
\nabla f(x) - A_e^T y - A_i^T z \\
z - \mu S^{-1} e \\
c_e(x) \\
c_i(x) - \mu Z^{-1} e
\end{bmatrix}
$$

Since pˢ now appears only in the second row, eliminate the second row and column:

$$
\begin{bmatrix}
H & A_e^T & A_i^T \\
A_e & 0 & 0 \\
A_i & 0 & -\Sigma^{-1}
\end{bmatrix}
\begin{bmatrix} p^x \\ -p^y \\ -p^z \end{bmatrix} =
-\begin{bmatrix}
\nabla f(x) - A_e^T y - A_i^T z \\
c_e(x) \\
c_i(x) - \mu Z^{-1} e
\end{bmatrix}
$$
Solve the third row for pᶻ:

$$
p^z = \mu S^{-1} e - \Sigma c_i(x) - \Sigma A_i p^x
$$

Substitute the explicit formula for pᶻ into the first row, then expand and simplify:

$$
(H + A_i^T \Sigma A_i) p^x + A_e^T(-p^y) = -(\nabla f(x) - A_e^T y + A_i^T(\Sigma c_i(x) - \mu S^{-1} e - z))
$$
Substitute the new first and third rows into the system, then eliminate the third row and column (pᶻ now appears only in its own row):

$$
\begin{bmatrix}
H + A_i^T \Sigma A_i & A_e^T \\
A_e & 0
\end{bmatrix}
\begin{bmatrix} p^x \\ -p^y \end{bmatrix} =
-\begin{bmatrix}
\nabla f(x) - A_e^T y + A_i^T(\Sigma c_i(x) - \mu S^{-1} e - z) \\
c_e(x)
\end{bmatrix}
$$
Expand and simplify pˢ using Σ⁻¹S⁻¹ = Z⁻¹, which cancels the barrier terms:

$$
p^s = \mu Z^{-1} e - s - \Sigma^{-1} p^z = c_i(x) - s + A_i p^x
$$
In summary, the reduced 2×2 block system gives the iterates pₖˣ and pₖʸ:

$$
\begin{bmatrix}
H_k + A_i^T \Sigma_k A_i & A_e^T \\
A_e & 0
\end{bmatrix}
\begin{bmatrix} p_k^x \\ -p_k^y \end{bmatrix} =
-\begin{bmatrix}
\nabla f(x_k) - A_e^T y_k + A_i^T(\Sigma_k c_i(x_k) - \mu S_k^{-1} e - z_k) \\
c_e(x_k)
\end{bmatrix}
$$

The iterates pˢ and pᶻ are given by

$$
\begin{aligned}
p_k^s &= c_i(x_k) - s_k + A_i p_k^x \\
p_k^z &= \mu S_k^{-1} e - \Sigma_k c_i(x_k) - \Sigma_k A_i p_k^x
\end{aligned}
$$

The iterates are applied like so:

$$
\begin{aligned}
x_{k+1} &= x_k + \alpha_k^{max} p_k^x \\
s_{k+1} &= s_k + \alpha_k^{max} p_k^s \\
y_{k+1} &= y_k + \alpha_k^z p_k^y \\
z_{k+1} &= z_k + \alpha_k^z p_k^z
\end{aligned}
$$
where αₖᵐᵃˣ and αₖᶻ are computed via the fraction-to-the-boundary rule shown in equations (15a) and (15b) of [^2].
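The fraction-to-the-boundary rule caps the step size so that s and z stay strictly positive. Here is a minimal sketch assuming the common parameter τ = 0.995 (the helper name is illustrative; see equations (15a) and (15b) of [^2] for the exact rule):

```python
def fraction_to_boundary(v, p, tau=0.995):
    """Largest α in (0, 1] such that v + α p ≥ (1 − τ) v, for v > 0."""
    alpha = 1.0
    for v_i, p_i in zip(v, p):
        if p_i < 0.0:
            # v_i + α p_i ≥ (1 − τ) v_i  ⇒  α ≤ −τ v_i / p_i
            alpha = min(alpha, -tau * v_i / p_i)
    return alpha

alpha_max = fraction_to_boundary([1.0, 2.0], [-2.0, 1.0])  # 0.4975
```

Only components with a negative step can drive v toward its bound, so only those shrink α; a step with no negative components is taken in full.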
Section 6 of [^3] describes how to check for local infeasibility.
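Putting the pieces together, here is a bare-bones interior-point sketch on a toy problem (the problem, the simple μ-halving schedule, and the absence of a filter line search are all simplifications for illustration, not Sleipnir's implementation). It minimizes (x₁ − 2)² + (x₂ − 1)² subject to x₁ + x₂ = 2 and x₁ ≥ 0.5, whose solution is (1.5, 0.5):

```python
import numpy as np

def gradient(x):
    return np.array([2.0 * (x[0] - 2.0), 2.0 * (x[1] - 1.0)])

H = 2.0 * np.eye(2)           # Lagrangian Hessian (constraints are linear)
A_e = np.array([[1.0, 1.0]])  # cₑ(x) = x1 + x2 - 2
A_i = np.array([[1.0, 0.0]])  # cᵢ(x) = x1 - 0.5

def fraction_to_boundary(v, p, tau=0.995):
    alpha = 1.0
    for v_i, p_i in zip(v, p):
        if p_i < 0.0:
            alpha = min(alpha, -tau * v_i / p_i)
    return alpha

x = np.array([1.0, 1.0])
s = np.array([0.5])  # slack initialized to cᵢ(x), must stay > 0
y = np.zeros(1)
z = np.ones(1)       # inequality dual, must stay > 0
mu = 1.0
for _ in range(100):
    c_e = np.array([x[0] + x[1] - 2.0])
    c_i = np.array([x[0] - 0.5])
    Sigma = z / s  # diagonal of S⁻¹Z

    # Reduced 2×2 block system for pˣ and -pʸ.
    K = np.block([[H + A_i.T * Sigma @ A_i, A_e.T],
                  [A_e, np.zeros((1, 1))]])
    top = gradient(x) - A_e.T @ y + A_i.T @ (Sigma * c_i - mu / s - z)
    p = np.linalg.solve(K, -np.concatenate([top, c_e]))
    p_x, p_y = p[:2], -p[2:]

    # Recover the slack and inequality-dual steps.
    p_s = c_i - s + A_i @ p_x
    p_z = mu / s - Sigma * c_i - Sigma * (A_i @ p_x)

    # Fraction-to-the-boundary step sizes keep s > 0 and z > 0.
    alpha = fraction_to_boundary(s, p_s)
    alpha_z = fraction_to_boundary(z, p_z)
    x += alpha * p_x
    s += alpha * p_s
    y += alpha_z * p_y
    z += alpha_z * p_z

    mu = max(1e-10, 0.5 * mu)  # crude barrier schedule for illustration
```

A production solver would instead drive μ down adaptively and globalize the Newton step with a filter line search, as described in [^2].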
[^1]: Nocedal, J. and Wright, S. "Numerical Optimization", 2nd. ed., Ch. 19. Springer, 2006.
[^2]: Wächter, A. and Biegler, L. "On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming", 2005. http://cepac.cheme.cmu.edu/pasilectures/biegler/ipopt.pdf
[^3]: Byrd, R. and Nocedal, J. and Waltz, R. "KNITRO: An Integrated Package for Nonlinear Optimization", 2005. https://users.iems.northwestern.edu/~nocedal/PDFfiles/integrated.pdf
[^4]: Gu, C. and Zhu, D. "A Dwindling Filter Algorithm with a Modified Subproblem for Nonlinear Inequality Constrained Optimization", 2014. https://sci-hub.st/10.1007/s11401-014-0826-z
[^5]: Hinder, O. and Ye, Y. "A one-phase interior point method for nonconvex optimization", 2018. https://arxiv.org/pdf/1801.03072.pdf