Convex Optimization


Previous Lectures

  • Optimization Model and Software Tool (Instructor: Qingxing Dong)
         
Lecture1 Lecture2 Lecture3 Lecture4 Lecture5
Lecture6 Lecture7 Lecture8 Lecture9 Lecture10

Similar Courses

1 Basic Concepts

2 Unconstrained Optimization Methods

2.1 Gradient Descent Algorithm

(1) Given a function $f(\mathbf{x})$
(2) Initialize the starting point $\mathbf{x}^{(0)}$ and set $k\leftarrow{0}$
(3) Check whether $\mathbf{x}^{(k)}$ is the minimum or an approximate minimum. If so, stop the algorithm; if not, go to the next step
(4) Use the negative gradient $-\nabla_{\mathbf{x}}{f}(\mathbf{x}^{(k)})^{\text{T}}$ as the search direction $\mathbf{p}^{(k)}$ (the steepest descent direction)
(5) Use an exact line search (steepest descent) to find $\lambda_k=\arg\min_{\lambda}{f(\mathbf{x}^{(k)}+\lambda{\mathbf{p}^{(k)}})}=\arg\min_{\lambda}{\phi(\lambda)}$, where $\phi(\lambda)=f(\mathbf{x}^{(k)}+\lambda\mathbf{p}^{(k)})$
OR (5’) Use an approximate optimal stepsize to find $\lambda_k$
(6) Update $\mathbf{x}^{(k+1)}=\mathbf{x}^{(k)}+\lambda_k{\mathbf{p}^{(k)}}$, set $k\leftarrow{k+1}$, and return to step (3)

  • If $f(\mathbf{x})$ has continuous first-order partial derivatives, take the first-order Taylor expansion at the point $\mathbf{x}^{(k)}$ with the negative gradient $-\nabla_{\mathbf{x}}{f}(\mathbf{x}^{(k)})^{\text{T}}$ as the search direction
\[\begin{aligned} f(\mathbf{x}^{(k)}+\lambda_k{\mathbf{p}^{(k)}}) &=f(\mathbf{x}^{(k)}-\lambda_k{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}})\\ &\approx f(\mathbf{x}^{(k)})-{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}}\big[\lambda_k{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})}\big] \end{aligned}\]
  • Sometimes we normalize the search direction as $\mathbf{p}^{(k)}=\frac{-\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}}{\|\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}\|}$ and choose the stepsize proportional to the gradient norm,
\[\lambda_{k}=\lambda_{0}||\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})||\]
  • Finally, we get
\[\begin{aligned} \mathbf{x}^{(k+1)}&=\mathbf{x}^{(k)}+\lambda_k{\mathbf{p}^{(k)}}\\ &=\mathbf{x}^{(k)}+(\lambda_{0}||\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})||)(-\frac{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})}{||\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})||})\\ &= \mathbf{x}^{(k)}-\lambda_0{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})} \end{aligned}\]
  • If $f(\mathbf{x})$ has continuous second-order partial derivatives, take the second-order Taylor expansion at the point $\mathbf{x}^{(k)}$ with the negative gradient $-\nabla_{\mathbf{x}}{f}(\mathbf{x}^{(k)})^{\text{T}}$ as the search direction
\[\begin{aligned} f(\mathbf{x}^{(k)}+\lambda_k{\mathbf{p}^{(k)}}) &=f(\mathbf{x}^{(k)}-\lambda_k{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}})\\ &\approx f(\mathbf{x}^{(k)})-{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}}\big[\lambda_k{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})}\big]\\&\quad+\frac{1}{2}\big[\lambda_k{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}}\big]\mathbf{H}(\mathbf{x}^{(k)})\big[\lambda_k{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})}\big] \end{aligned}\]
  • Hence we get the best stepsize by taking the derivative with respect to $\lambda_k$ and setting it to zero (a code sketch of both stepsize choices follows this list)
\[\lambda_k=\frac{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})}{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}{\mathbf{H}(\mathbf{x}^{(k)})}\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})}\]
  • And if we again normalize the search direction $\mathbf{p}^{(k)}=\frac{-\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}}{\|\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}\|}$, we get
\[\lambda_k=\frac{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})\,\|\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})\|}{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}{\mathbf{H}(\mathbf{x}^{(k)})}\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})}\]
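To make the procedure concrete, here is a minimal Python sketch of gradient descent on an illustrative quadratic, implementing both the fixed stepsize $\lambda_0$ and the second-order approximate optimal stepsize above. The test function, $A$, $b$, and all names are assumptions made for this example, not part of the lecture notes.

```python
import numpy as np

# Illustrative quadratic test problem f(x) = 1/2 x^T A x - b^T x
# (A, b, and the function names below are made-up examples).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

def f(x):
    return 0.5 * x @ A @ x - b @ x

def grad_f(x):
    return A @ x - b          # gradient of the quadratic

def hess_f(x):
    return A                  # constant Hessian for a quadratic

def gradient_descent(x0, lam0=0.1, tol=1e-8, max_iter=10000, second_order_step=False):
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:              # step (3): stopping test
            break
        if second_order_step:
            # approximate optimal stepsize: g^T g / (g^T H g)
            lam = (g @ g) / (g @ hess_f(x) @ g)
        else:
            lam = lam0                           # fixed stepsize lambda_0
        x = x - lam * g                          # step (6): x <- x + lam * p with p = -g
    return x

print(gradient_descent(np.array([5.0, -3.0])))
print(gradient_descent(np.array([5.0, -3.0]), second_order_step=True))
```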

2.2 Newton’s Method

If $f(\mathbf{x})$ has continuous second-order partial derivatives and $\mathbf{x}^{(k)}$ is an approximation of its minimum, then the second-order Taylor expansion at $\mathbf{x}^{(k)}$ gives:

\[f(\mathbf{x})\approx{ f(\mathbf{x}^{(k)})+\nabla{f(\mathbf{x}^{(k)})}^{\text{T}}(\mathbf{x}-\mathbf{x}^{(k)})+\frac{1}{2}{(\mathbf{x}-\mathbf{x}^{(k)})^{\text{T}}}\mathbf{H}(\mathbf{x}^{(k)})(\mathbf{x}-\mathbf{x}^{(k)})}\]

Taking the derivative of the right-hand side with respect to $(\mathbf{x}-\mathbf{x}^{(k)})$ and setting it to zero, we get

\[\mathbf{p}^{(k)}=(\mathbf{x}-\mathbf{x}^{(k)})=-\mathbf{H}(\mathbf{x}^{(k)})^{-1}\nabla{f(\mathbf{x}^{(k)})}\]

Hence $\mathbf{p}^{(k)}=-\mathbf{H}(\mathbf{x}^{(k)})^{-1}\nabla{f(\mathbf{x}^{(k)})}$ is usually called the Newton direction.

We can then continue to find an approximate optimal stepsize with $\lambda_k=\arg\min_{\lambda}{f(\mathbf{x}^{(k)}+\lambda{\mathbf{p}^{(k)}})}=\arg\min_{\lambda}{f(\mathbf{x}^{(k)}-\lambda{\mathbf{H}(\mathbf{x}^{(k)})^{-1}\nabla{f(\mathbf{x}^{(k)})}})}$
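As a concrete illustration, the sketch below implements Newton's method with a full step ($\lambda_k=1$), computing the Newton direction by solving $\mathbf{H}(\mathbf{x}^{(k)})\mathbf{p}^{(k)}=-\nabla f(\mathbf{x}^{(k)})$ rather than forming the inverse explicitly; the test problem and all names are assumptions for the example.

```python
import numpy as np

def newton_method(grad_f, hess_f, x0, tol=1e-10, max_iter=50):
    # Sketch of Newton's method with a full step (lambda_k = 1).
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        p = np.linalg.solve(hess_f(x), -g)   # Newton direction p^(k) = -H^{-1} grad f
        x = x + p                            # optionally: line-search lambda_k instead of 1
    return x

# On a quadratic, Newton's method reaches the minimizer in a single step.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
print(newton_method(lambda x: A @ x - b, lambda x: A, np.array([5.0, -3.0])))
```

Solving the linear system is cheaper and numerically safer than computing $\mathbf{H}(\mathbf{x}^{(k)})^{-1}$ explicitly, while giving the same Newton direction.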

2.3 Quasi-Newton Method

Since it is hard to compute the Hessian matrix when the variable $\mathbf{x}$ is high-dimensional, quasi-Newton methods replace it with an approximation that is updated at each iteration:

  • DFP Method: approximates the inverse of the Hessian matrix with $\bar{\mathbf{H}}^{(k)}$
  • BFGS Method: approximates the Hessian matrix with ${\mathbf{B}}^{(k)}$
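Below is a minimal BFGS sketch that maintains an approximation of the inverse Hessian directly, so no linear system with the true Hessian is ever solved. The inverse-update formula and the Armijo backtracking line search are standard choices from the optimization literature (they are not derived in these notes), and all names are illustrative.

```python
import numpy as np

def bfgs(f, grad_f, x0, tol=1e-8, max_iter=200):
    # Sketch of BFGS keeping an approximation H_inv of the inverse Hessian.
    n = x0.size
    x = x0.astype(float)
    H_inv = np.eye(n)                      # initial inverse-Hessian approximation
    g = grad_f(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -H_inv @ g                     # quasi-Newton search direction
        lam = 1.0
        for _ in range(30):                # backtracking (Armijo) line search
            if f(x + lam * p) <= f(x) + 1e-4 * lam * (g @ p):
                break
            lam *= 0.5
        x_new = x + lam * p
        g_new = grad_f(x_new)
        s, y = x_new - x, g_new - g
        rho = 1.0 / (y @ s)
        I = np.eye(n)
        # standard BFGS update of the inverse-Hessian approximation
        H_inv = (I - rho * np.outer(s, y)) @ H_inv @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

# Example on the same illustrative quadratic as before.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
print(bfgs(lambda x: 0.5 * x @ A @ x - b @ x, lambda x: A @ x - b, np.array([5.0, -3.0])))
```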