Danskin’s Theorem
Value functions, envelope formulas, directional derivatives, and nonsmooth gradients.
Let $x \in X$ be the decision variable and $\theta \in \Theta$ be a parameter.
Define the value function
$$v(\theta) = \min_{x \in X} f(x, \theta).$$
When the minimizer is not unique, $v$ may fail to be differentiable. The right object is a
directional derivative (and, if desired, a generalized subdifferential).
Reference.
The material is standard; see Theorem 4.13 in the book Perturbation Analysis of Optimization Problems
(Bonnans–Shapiro) for closely related statements and proofs.
Danskin’s theorem is a workhorse because it turns a seemingly “implicit” differentiation problem into a “direct” one. When a value function $v$ is defined by an inner optimization,
it is tempting to think that differentiating $v$ requires differentiating the optimizer map
$\theta \mapsto x^*(\theta)$. In practice that map can be discontinuous, non-unique, or simply expensive to
differentiate (it may require sensitivity equations or differentiating through an iterative solver).
Danskin’s theorem says you typically don’t need any of that: whenever $v$ is differentiable at $\theta$,
its gradient is obtained by holding the optimizer fixed, $\nabla v(\theta) = \nabla_\theta f(x^*, \theta)$,
so the dependence of $x^*$ on $\theta$ “drops out.” This is exactly what makes bilevel objectives and value
functions tractable in modern optimization pipelines: compute an optimal (or near-optimal)
$x^*$ by solving the inner problem, then compute $\nabla_\theta f(x^*, \theta)$ by standard
autodiff, without backpropagating through the solver or ever forming $\partial x^* / \partial \theta$.
In large-scale settings this is the difference between a cheap, stable gradient oracle and an
impractical one, and it underpins widely used methods such as envelope-based gradient descent,
alternating minimization with outer updates, and many dual/regularized formulations where the
“min over $x$” is used to define a smooth (or directionally smooth) outer objective.
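To make the recipe concrete, here is a minimal numerical sketch (the quadratic inner problem and all names are illustrative, not from the post): solve the inner minimization with a black-box solver, then differentiate $f$ in $\theta$ with the optimizer held fixed, and check the result against finite differences of the value function.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative inner objective: f(x, theta) = 0.5 x^T A x - theta^T x over x in R^2,
# so v(theta) = min_x f(x, theta) has the closed form -0.5 theta^T A^{-1} theta.
A = np.array([[3.0, 1.0], [1.0, 2.0]])  # symmetric positive definite

def f(x, theta):
    return 0.5 * x @ A @ x - theta @ x

def solve_inner(theta):
    # Black-box inner solver; we never differentiate through it.
    return minimize(f, x0=np.zeros(2), args=(theta,)).x

def v(theta):
    return f(solve_inner(theta), theta)

theta = np.array([1.0, -2.0])
x_star = solve_inner(theta)

# Danskin gradient: differentiate f in theta only, holding x_star fixed.
# Here grad_theta f(x, theta) = -x.
g_danskin = -x_star

# Finite-difference check on the value function itself.
eps = 1e-5
g_fd = np.array([(v(theta + eps * e) - v(theta - eps * e)) / (2 * eps)
                 for e in np.eye(2)])

print(np.max(np.abs(g_danskin - g_fd)))  # small: the two gradients agree
```

The point of the sketch is that `g_danskin` requires only one inner solve and one partial gradient, while differentiating through `solve_inner` would require unrolling or implicit differentiation of the solver.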
Assumptions
Let $X \subseteq \mathbb{R}^n$ be nonempty and closed, and let $\Theta \subseteq \mathbb{R}^m$ be open. Assume:
- $f : X \times \Theta \to \mathbb{R}$ is continuous in $(x, \theta)$.
- For every $\theta \in \Theta$, the minimum is attained, and the minimizer set
$$S(\theta) = \operatorname*{arg\,min}_{x \in X} f(x, \theta)$$
is nonempty and compact.
- For every $x \in X$, the map $\theta \mapsto f(x, \theta)$ is differentiable on $\Theta$, and $\nabla_\theta f(x, \theta)$ is continuous in $(x, \theta)$.
Theorem (directional derivative and generalized gradient)
Directional derivative.
Under the assumptions above, $v$ is directionally differentiable on $\Theta$, and for every $\theta \in \Theta$ and $d \in \mathbb{R}^m$,
$$v'(\theta; d) = \min_{x \in S(\theta)} \nabla_\theta f(x, \theta)^\top d.$$
Differentiability criterion.
If either $S(\theta)$ is a singleton, or $\nabla_\theta f(x, \theta)$ is the same for all $x \in S(\theta)$,
then $v$ is differentiable at $\theta$ and
$$\nabla v(\theta) = \nabla_\theta f(x^*, \theta) \quad \text{for any } x^* \in S(\theta).$$
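When $S(\theta)$ is not a singleton this criterion fails and $v$ can be genuinely nonsmooth, but the directional-derivative formula still applies. A tiny sketch on a toy problem (illustrative, not from the post): minimizing $f(x, \theta) = \theta x$ over $X = \{-1, +1\}$ gives $v(\theta) = -|\theta|$, which is kinked at $\theta = 0$.

```python
import numpy as np

# Toy nonsmooth case: f(x, theta) = theta * x over the finite set X = {-1, +1},
# so v(theta) = -|theta|, and at theta = 0 the minimizer set is all of X.
X = np.array([-1.0, 1.0])

def v(theta):
    return np.min(theta * X)

def dir_deriv_numeric(theta, d, t=1e-6):
    # One-sided difference quotient (v(theta + t d) - v(theta)) / t.
    return (v(theta + t * d) - v(theta)) / t

def dir_deriv_danskin(theta, d):
    # v'(theta; d) = min over minimizers x of  grad_theta f(x, theta)^T d = x * d.
    vals = theta * X
    S = X[vals <= vals.min() + 1e-12]  # minimizer set S(theta)
    return np.min(S * d)

for d in (1.0, -1.0):
    print(d, dir_deriv_numeric(0.0, d), dir_deriv_danskin(0.0, d))
# Each line shows d, the numeric quotient, and the Danskin formula:
# both one-sided derivatives at the kink equal -|d|.
```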
Clarke subdifferential (optional).
If $v$ is locally Lipschitz (e.g., $\nabla_\theta f(x, \theta')$ is bounded uniformly over $x \in X$ and $\theta'$ in a neighborhood of $\theta$),
then the Clarke subdifferential satisfies
$$\partial v(\theta) = \operatorname{conv}\{\nabla_\theta f(x, \theta) : x \in S(\theta)\}.$$
Proof (directional derivative)
Fix $\theta \in \Theta$ and a direction $d \in \mathbb{R}^m$.
For $t > 0$ small, set $\theta_t = \theta + t d$.
Upper bound.
For any minimizer $x^* \in S(\theta)$, we have $v(\theta_t) \le f(x^*, \theta_t)$ and $v(\theta) = f(x^*, \theta)$. Hence
$$\frac{v(\theta_t) - v(\theta)}{t} \le \frac{f(x^*, \theta_t) - f(x^*, \theta)}{t}.$$
Letting $t \downarrow 0$ and using differentiability of $\theta \mapsto f(x^*, \theta)$,
$$\limsup_{t \downarrow 0} \frac{v(\theta_t) - v(\theta)}{t} \le \nabla_\theta f(x^*, \theta)^\top d.$$
Since this holds for every $x^* \in S(\theta)$,
$$\limsup_{t \downarrow 0} \frac{v(\theta_t) - v(\theta)}{t} \le \min_{x \in S(\theta)} \nabla_\theta f(x, \theta)^\top d.$$
Lower bound.
Let $x_t \in S(\theta_t)$ be any minimizer at $\theta_t$.
Since $v(\theta) \le f(x_t, \theta)$, we have
$$v(\theta_t) - v(\theta) \ge f(x_t, \theta_t) - f(x_t, \theta).$$
Divide by $t$. By the mean value theorem applied to $s \mapsto f(x_t, \theta + s d)$,
there exists $s_t \in (0, t)$ such that
$$\frac{f(x_t, \theta_t) - f(x_t, \theta)}{t} = \nabla_\theta f(x_t, \theta + s_t d)^\top d.$$
By compactness of the minimizer sets and continuity, the family $\{x_t\}$ has limit points as $t \downarrow 0$.
Take a sequence $t_k \downarrow 0$ such that $x_{t_k} \to \bar{x}$.
By standard stability of argmin under continuity + compactness, $\bar{x} \in S(\theta)$.
Using continuity of $\nabla_\theta f$,
$$\nabla_\theta f(x_{t_k}, \theta + s_{t_k} d)^\top d \;\longrightarrow\; \nabla_\theta f(\bar{x}, \theta)^\top d \;\ge\; \min_{x \in S(\theta)} \nabla_\theta f(x, \theta)^\top d.$$
Since the right-hand side does not depend on the subsequence, we obtain
$$\liminf_{t \downarrow 0} \frac{v(\theta_t) - v(\theta)}{t} \ge \min_{x \in S(\theta)} \nabla_\theta f(x, \theta)^\top d.$$
Conclusion.
The limsup and liminf bounds match, so the directional derivative exists and equals
$$v'(\theta; d) = \min_{x \in S(\theta)} \nabla_\theta f(x, \theta)^\top d.$$
Remark (Clarke subdifferential)
The Clarke formula can be derived quickly by rewriting $v(\theta) = -\max_{x \in X} \big({-f(x, \theta)}\big)$
and applying the standard Clarke/Danskin rule for maxima,
which yields $\partial(-v)(\theta) = \operatorname{conv}\{-\nabla_\theta f(x, \theta) : x \in S(\theta)\}$, hence
$$\partial v(\theta) = \operatorname{conv}\{\nabla_\theta f(x, \theta) : x \in S(\theta)\}.$$
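As a quick sanity check of the formula (a standard toy example, not from the post): take $X = \{-1, +1\}$ and $f(x, \theta) = \theta x$, so that $v(\theta) = -|\theta|$. At the kink $\theta = 0$:

```latex
% v(\theta) = \min_{x \in \{-1,+1\}} \theta x = -|\theta|, kinked at \theta = 0.
\[
S(0) = \{-1, +1\}, \qquad \nabla_\theta f(x, 0) = x,
\]
\[
\partial v(0) = \operatorname{conv}\{-1, +1\} = [-1, 1],
\]
% which matches the Clarke subdifferential of \theta \mapsto -|\theta| at 0.
```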