Convex Optimization


Previous Lectures

  • Optimization Model and Software Tool (Instructor: Qingxing Dong)
         
Lecture1 Lecture2 Lecture3 Lecture4 Lecture5
Lecture6 Lecture7 Lecture8 Lecture9 Lecture10

Similar Courses

1 Basic Concepts

2 Unconstrained Optimization Methods

2.1 Gradient Descent Algorithm

(1) Given a function $f(\mathbf{x})$
(2) Initialize the starting point $\mathbf{x}^{(0)}$ and set $k\leftarrow{0}$
(3) Check whether $\mathbf{x}^{(k)}$ is the minimum or an approximate minimum. If so, stop the algorithm; if not, go to the next step
(4) Use the negative gradient $-\nabla_{\mathbf{x}}{f}(\mathbf{x}^{(k)})^{\text{T}}$ as the search direction $\mathbf{p}^{(k)}$ (the steepest descent direction)
(5) Use an exact line search (steepest descent) to find $\lambda_k=\arg\min_{\lambda}{f(\mathbf{x}^{(k)}+\lambda{\mathbf{p}^{(k)}})}=\arg\min_{\lambda}{\phi(\lambda)}$, where $\phi(\lambda)=f(\mathbf{x}^{(k)}+\lambda\mathbf{p}^{(k)})$
OR (5’) Use an approximate optimal stepsize to find $\lambda_k$
(6) Update $\mathbf{x}^{(k+1)}=\mathbf{x}^{(k)}+\lambda_k{\mathbf{p}^{(k)}}$, set $k\leftarrow{k+1}$, and return to step (3)

  • If $f(\mathbf{x})$ has continuous first-order partial derivatives, take the first-order Taylor expansion at the point $\mathbf{x}^{(k)}$ with the negative gradient $-\nabla_{\mathbf{x}}{f}(\mathbf{x}^{(k)})^{\text{T}}$ as the search direction
\[\begin{aligned} f(\mathbf{x}^{(k)}+\lambda_k{\mathbf{p}^{(k)}}) &=f(\mathbf{x}^{(k)}-\lambda_k{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}})\\ &\approx f(\mathbf{x}^{(k)})-{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}}\big[\lambda_k{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})}\big] \end{aligned}\]
  • Sometimes we normalize the search direction as $\mathbf{p}^{(k)}=\frac{-\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}}{\|\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}\|}$ and choose the stepsize proportional to the gradient norm,
\[\lambda_{k}=\lambda_{0}||\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})||\]
  • Finally, we get
\[\begin{aligned} \mathbf{x}^{(k+1)}&=\mathbf{x}^{(k)}+\lambda_k{\mathbf{p}^{(k)}}\\ &=\mathbf{x}^{(k)}+(\lambda_{0}||\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})||)(-\frac{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})}{||\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})||})\\ &= \mathbf{x}^{(k)}-\lambda_0{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})} \end{aligned}\]
  • If $f(\mathbf{x})$ has continuous second-order partial derivatives, take the second-order Taylor expansion at the point $\mathbf{x}^{(k)}$ with the negative gradient $-\nabla_{\mathbf{x}}{f}(\mathbf{x}^{(k)})^{\text{T}}$ as the search direction
\[\begin{aligned} f(\mathbf{x}^{(k)}+\lambda_k{\mathbf{p}^{(k)}}) &=f(\mathbf{x}^{(k)}-\lambda_k{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}})\\ &\approx f(\mathbf{x}^{(k)})-{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}}\big[\lambda_k{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})}\big]\\&\quad+\frac{1}{2}\big[\lambda_k{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}}\big]\mathbf{H}(\mathbf{x}^{(k)})\big[\lambda_k{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})}\big] \end{aligned}\]
  • Hence we get the best stepsize by taking the derivative with respect to $\lambda_k$ and setting it to zero (a code sketch of both stepsize choices follows this list)
\[\lambda_k=\frac{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})}{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}{\mathbf{H}(\mathbf{x}^{(k)})}\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})}\]
  • And if we again normalize the search direction $\mathbf{p}^{(k)}=\frac{-\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}}{\|\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}\|}$, we get
\[\lambda_k=\frac{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})\,\|\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})\|}{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}{\mathbf{H}(\mathbf{x}^{(k)})}\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})}\]
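To make the procedure concrete, here is a minimal Python sketch of gradient descent on an illustrative quadratic, implementing both the fixed stepsize $\lambda_0$ and the second-order approximate optimal stepsize above. The test function, $A$, $b$, and all names are assumptions made for this example, not part of the lecture notes.

```python
import numpy as np

# Illustrative quadratic test problem f(x) = 1/2 x^T A x - b^T x
# (A, b, and the function names below are made-up examples).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

def f(x):
    return 0.5 * x @ A @ x - b @ x

def grad_f(x):
    return A @ x - b          # gradient of the quadratic

def hess_f(x):
    return A                  # constant Hessian for a quadratic

def gradient_descent(x0, lam0=0.1, tol=1e-8, max_iter=10000, second_order_step=False):
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:              # step (3): stopping test
            break
        if second_order_step:
            # approximate optimal stepsize: g^T g / (g^T H g)
            lam = (g @ g) / (g @ hess_f(x) @ g)
        else:
            lam = lam0                           # fixed stepsize lambda_0
        x = x - lam * g                          # step (6): x <- x + lam * p with p = -g
    return x

print(gradient_descent(np.array([5.0, -3.0])))
print(gradient_descent(np.array([5.0, -3.0]), second_order_step=True))
```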

2.2 Newton’s Method

If $f(\mathbf{x})$ has continuous second-order partial derivatives and $\mathbf{x}^{(k)}$ is an approximation of its minimum, then the second-order Taylor expansion at $\mathbf{x}^{(k)}$ gives:

\[f(\mathbf{x})\approx{ f(\mathbf{x}^{(k)})+\nabla{f(\mathbf{x}^{(k)})}^{\text{T}}(\mathbf{x}-\mathbf{x}^{(k)})+\frac{1}{2}{(\mathbf{x}-\mathbf{x}^{(k)})^{\text{T}}}\mathbf{H}(\mathbf{x}^{(k)})(\mathbf{x}-\mathbf{x}^{(k)})}\]

Taking the derivative of the right-hand side with respect to $(\mathbf{x}-\mathbf{x}^{(k)})$ and setting it to zero, we get

\[\mathbf{p}^{(k)}=(\mathbf{x}-\mathbf{x}^{(k)})=-\mathbf{H}(\mathbf{x}^{(k)})^{-1}\nabla{f(\mathbf{x}^{(k)})}\]

Hence $\mathbf{p}^{(k)}=-\mathbf{H}(\mathbf{x}^{(k)})^{-1}\nabla{f(\mathbf{x}^{(k)})}$ is usually called the Newton direction.

We can then continue to find an approximate optimal stepsize with $\lambda_k=\arg\min_{\lambda}{f(\mathbf{x}^{(k)}+\lambda{\mathbf{p}^{(k)}})}=\arg\min_{\lambda}{f(\mathbf{x}^{(k)}-\lambda{\mathbf{H}(\mathbf{x}^{(k)})^{-1}\nabla{f(\mathbf{x}^{(k)})}})}$
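As a concrete illustration, the sketch below implements Newton's method with a full step ($\lambda_k=1$), computing the Newton direction by solving $\mathbf{H}(\mathbf{x}^{(k)})\mathbf{p}^{(k)}=-\nabla f(\mathbf{x}^{(k)})$ rather than forming the inverse explicitly; the test problem and all names are assumptions for the example.

```python
import numpy as np

def newton_method(grad_f, hess_f, x0, tol=1e-10, max_iter=50):
    # Sketch of Newton's method with a full step (lambda_k = 1).
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        p = np.linalg.solve(hess_f(x), -g)   # Newton direction p^(k) = -H^{-1} grad f
        x = x + p                            # optionally: line-search lambda_k instead of 1
    return x

# On a quadratic, Newton's method reaches the minimizer in a single step.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
print(newton_method(lambda x: A @ x - b, lambda x: A, np.array([5.0, -3.0])))
```

Solving the linear system is cheaper and numerically safer than computing $\mathbf{H}(\mathbf{x}^{(k)})^{-1}$ explicitly, while giving the same Newton direction.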

2.3 Quasi-Newton Method

Since it is hard to compute the Hessian matrix when the variable $\mathbf{x}$ is high-dimensional, quasi-Newton methods replace it with an approximation that is updated at each iteration:

  • DFP Method: approximates the inverse of the Hessian matrix with $\bar{\mathbf{H}}^{(k)}$
  • BFGS Method: approximates the Hessian matrix with ${\mathbf{B}}^{(k)}$
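Below is a minimal BFGS sketch that maintains an approximation of the inverse Hessian directly, so no linear system with the true Hessian is ever solved. The inverse-update formula and the Armijo backtracking line search are standard choices from the optimization literature (they are not derived in these notes), and all names are illustrative.

```python
import numpy as np

def bfgs(f, grad_f, x0, tol=1e-8, max_iter=200):
    # Sketch of BFGS keeping an approximation H_inv of the inverse Hessian.
    n = x0.size
    x = x0.astype(float)
    H_inv = np.eye(n)                      # initial inverse-Hessian approximation
    g = grad_f(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -H_inv @ g                     # quasi-Newton search direction
        lam = 1.0
        for _ in range(30):                # backtracking (Armijo) line search
            if f(x + lam * p) <= f(x) + 1e-4 * lam * (g @ p):
                break
            lam *= 0.5
        x_new = x + lam * p
        g_new = grad_f(x_new)
        s, y = x_new - x, g_new - g
        rho = 1.0 / (y @ s)
        I = np.eye(n)
        # standard BFGS update of the inverse-Hessian approximation
        H_inv = (I - rho * np.outer(s, y)) @ H_inv @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

# Example on the same illustrative quadratic as before.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
print(bfgs(lambda x: 0.5 * x @ A @ x - b @ x, lambda x: A @ x - b, np.array([5.0, -3.0])))
```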