Convex Optimization
Previous Lectures
- Optimization Model and Software Tool (Instructor: Qingxing Dong)
Lecture1 | Lecture2 | Lecture3 | Lecture4 | Lecture5 |
Lecture6 | Lecture7 | Lecture8 | Lecture9 | Lecture10 |
- Self-Recommended Books
Similar Courses
1 Basic Concepts
2 Unconstrained Optimization Methods
2.1 Gradient Descent Algorithm
(1) Given a function $f(\mathbf{x})$
(2) Initialize the starting point $\mathbf{x}^{(0)}$, $k\leftarrow{0}$
(3) Check whether $\mathbf{x}^{(k)}$ is the minimum or an approximate minimum. If so, stop the algorithm; if not, go to the next step
(4) Use the negative gradient $-\nabla_{\mathbf{x}}{f}(\mathbf{x}^{(k)})$ as the search direction $\mathbf{p}^{(k)}$ (the steepest descent direction)
(5) Use a line search (exact minimization along $\mathbf{p}^{(k)}$) to find $\lambda_k=\arg\ \min_{\lambda}{f(\mathbf{x}^{(k)}+\lambda{\mathbf{p}^{(k)}})}=\arg\ \min_{\lambda}{\phi(\lambda)}$, where $\phi(\lambda)=f(\mathbf{x}^{(k)}+\lambda{\mathbf{p}^{(k)}})$
OR (5’) Use an approximate optimal stepsize to find $\lambda_k$\[\begin{aligned} f(\mathbf{x}^{(k)}+\lambda_k{\mathbf{p}^{(k)}}) &=f(\mathbf{x}^{(k)}-\lambda_k{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})})\\ &\approx f(\mathbf{x}^{(k)})-\lambda_k{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}}{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})} \end{aligned}\]
- If $f(\mathbf{x})$ has continuous first-order partial derivatives, this is the first-order Taylor expansion at point $\mathbf{x}^{(k)}$ with the negative gradient $-\nabla_{\mathbf{x}}{f}(\mathbf{x}^{(k)})$ as the search direction. In this case we may simply take the stepsize proportional to the gradient norm, with a fixed parameter $\lambda_0>0$:
\[\lambda_{k}=\lambda_{0}||\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})||\]
- And sometimes, we normalize the search direction, $\mathbf{p}^{(k)}=-\frac{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})}{\|\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})\|}$, and we get
\[\begin{aligned} \mathbf{x}^{(k+1)}&=\mathbf{x}^{(k)}+\lambda_k{\mathbf{p}^{(k)}}\\ &=\mathbf{x}^{(k)}+(\lambda_{0}||\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})||)(-\frac{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})}{||\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})||})\\ &= \mathbf{x}^{(k)}-\lambda_0{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})} \end{aligned}\]
- If $f(\mathbf{x})$ has continuous second-order partial derivatives, we can instead use the second-order Taylor expansion at point $\mathbf{x}^{(k)}$ with the negative gradient $-\nabla_{\mathbf{x}}{f}(\mathbf{x}^{(k)})$ as the search direction:
\[\begin{aligned} f(\mathbf{x}^{(k)}+\lambda_k{\mathbf{p}^{(k)}}) &=f(\mathbf{x}^{(k)}-\lambda_k{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})})\\ &\approx f(\mathbf{x}^{(k)})-\lambda_k{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}}{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})}+\frac{\lambda_k^2}{2}{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}}\mathbf{H}(\mathbf{x}^{(k)}){\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})} \end{aligned}\]
- Hence, taking the derivative with respect to $\lambda_k$ and setting it to zero, we get the best stepsize
\[\lambda_k=\frac{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})}{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}{\mathbf{H}(\mathbf{x}^{(k)})}\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})}\]
- And sometimes, we normalize the search direction, $\mathbf{p}^{(k)}=-\frac{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})}{\|\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})\|}$, in which case the best stepsize becomes
\[\lambda_k=\frac{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})\,\|\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})\|}{\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})^{\text{T}}{\mathbf{H}(\mathbf{x}^{(k)})}\nabla_{\mathbf{x}}f(\mathbf{x}^{(k)})}\]
A small numerical sketch of the whole procedure is given below.
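The following is a minimal sketch of gradient descent with the two stepsize rules above, assuming a simple quadratic test function $f(\mathbf{x})=\frac{1}{2}\mathbf{x}^{\text{T}}\mathbf{A}\mathbf{x}$; the function names, tolerance, and $\lambda_0$ value are illustrative choices, not taken from the lecture.

```python
import numpy as np

def gradient_descent(grad, hess, x0, lam0=0.1, tol=1e-6, max_iter=1000,
                     stepsize="second_order"):
    """Gradient descent with the two approximate optimal stepsizes above.

    stepsize="first_order":  lambda_k = lam0 * ||grad f||, normalized direction,
                             i.e. the update x <- x - lam0 * grad f(x)
    stepsize="second_order": lambda_k = (g^T g) / (g^T H g)
    """
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:            # step (3): approximate minimum reached
            break
        if stepsize == "first_order":
            p = -g / np.linalg.norm(g)         # normalized negative gradient
            lam = lam0 * np.linalg.norm(g)     # lambda_k = lambda_0 * ||grad f||
        else:
            p = -g                             # step (4): negative gradient direction
            lam = (g @ g) / (g @ hess(x) @ g)  # lambda_k = g^T g / (g^T H g)
        x = x + lam * p                        # update and repeat
    return x, k

# Example: f(x) = 0.5 * x^T A x with a positive definite A
A = np.array([[4.0, 1.0], [1.0, 3.0]])
grad = lambda x: A @ x
hess = lambda x: A
x_star, iters = gradient_descent(grad, hess, x0=[2.0, -1.5])
print(x_star, iters)                           # converges to (approximately) [0, 0]
```

For a quadratic function, the stepsize $\lambda_k=\nabla f^{\text{T}}\nabla f/(\nabla f^{\text{T}}\mathbf{H}\nabla f)$ is the exact line-search minimizer, so the second variant coincides with exact steepest descent in this example.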
2.2 Newton’s Method
If $f(\mathbf{x})$ has continuous second-order partial derivatives and $\mathbf{x}^{(k)}$ is an approximation of its minimum, we have the second-order Taylor expansion:
\[f(\mathbf{x})\approx{ f(\mathbf{x}^{(k)})+\nabla{f(\mathbf{x}^{(k)})}^{\text{T}}(\mathbf{x}-\mathbf{x}^{(k)})+\frac{1}{2}{(\mathbf{x}-\mathbf{x}^{(k)})^{\text{T}}}\mathbf{H}(\mathbf{x}^{(k)})(\mathbf{x}-\mathbf{x}^{(k)})}\]Taking the derivative of the right-hand side with respect to $(\mathbf{x}-\mathbf{x}^{(k)})$ and setting it to zero, we get
\[\mathbf{p}^{(k)}=(\mathbf{x}-\mathbf{x}^{(k)})=-\mathbf{H}(\mathbf{x}^{(k)})^{-1}\nabla{f(\mathbf{x}^{(k)})}\]Hence $\mathbf{p}^{(k)}=-\mathbf{H}(\mathbf{x}^{(k)})^{-1}\nabla{f(\mathbf{x}^{(k)})}$ is usually called the Newton direction.
We can then perform a line search to find the stepsize $\lambda_k=\arg\ \min_{\lambda}{f(\mathbf{x}^{(k)}+\lambda{\mathbf{p}^{(k)}})}=\arg\ \min_{\lambda}{f(\mathbf{x}^{(k)}-\lambda{\mathbf{H}(\mathbf{x}^{(k)})^{-1}\nabla{f(\mathbf{x}^{(k)})}})}$, which gives the damped Newton's method. A minimal sketch of this iteration is given below.
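Below is a minimal sketch of the damped Newton iteration, assuming a smooth, strictly convex test function $f(x_1,x_2)=e^{x_1+x_2}+x_1^2+x_2^2$; the backtracking rule and its constants are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def newton_method(f, grad, hess, x0, tol=1e-8, max_iter=50):
    """Damped Newton's method: p_k = -H(x_k)^{-1} grad f(x_k),
    with a simple backtracking rule for the stepsize lambda_k."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        p = np.linalg.solve(hess(x), -g)              # Newton direction: solve H p = -g
        lam = 1.0
        while f(x + lam * p) > f(x) and lam > 1e-8:   # backtrack until f decreases
            lam *= 0.5
        x = x + lam * p
    return x

# Example: f(x) = exp(x1 + x2) + x1^2 + x2^2 (smooth and strictly convex)
f = lambda x: np.exp(x[0] + x[1]) + x[0]**2 + x[1]**2
grad = lambda x: np.array([np.exp(x[0] + x[1]) + 2 * x[0],
                           np.exp(x[0] + x[1]) + 2 * x[1]])
hess = lambda x: np.exp(x[0] + x[1]) * np.ones((2, 2)) + 2 * np.eye(2)
print(newton_method(f, grad, hess, x0=[1.0, 1.0]))    # ≈ [-0.284, -0.284]
```

Unlike plain gradient descent, the Newton direction uses curvature information from the Hessian, so on well-conditioned convex problems it typically converges in a handful of iterations.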
2.3 Quasi-Newton Methods
It is expensive to compute (and invert) the Hessian matrix when the variable $\mathbf{x}$ has a high dimension, so quasi-Newton methods replace it with an approximation built from gradient information (a small sketch is given after the list):
- DFP method: approximates the inverse of the Hessian matrix with $\bar{\mathbf{H}}^{(k)}$
- BFGS method: approximates the Hessian matrix with ${\mathbf{B}}^{(k)}$ (in practice its inverse is often updated directly)
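As an illustration, here is a minimal BFGS sketch that maintains an approximation of the inverse Hessian and updates it from gradient differences, so no Hessian is ever computed or inverted. The rank-two correction shown is the standard BFGS inverse-Hessian update; the test function, backtracking rule, and tolerances are illustrative assumptions.

```python
import numpy as np

def bfgs(f, grad, x0, tol=1e-6, max_iter=500):
    """Quasi-Newton (BFGS): keep an approximation H_inv of the inverse Hessian,
    updated from s_k = x_{k+1} - x_k and y_k = grad_{k+1} - grad_k."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    H_inv = np.eye(n)                         # initial inverse-Hessian approximation
    I = np.eye(n)
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -H_inv @ g                        # quasi-Newton search direction
        lam = 1.0
        while f(x + lam * p) > f(x) and lam > 1e-10:   # simple backtracking
            lam *= 0.5
        x_new = x + lam * p
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        if y @ s > 1e-12:                     # curvature check keeps H_inv positive definite
            rho = 1.0 / (y @ s)
            H_inv = (I - rho * np.outer(s, y)) @ H_inv @ (I - rho * np.outer(y, s)) \
                    + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

# Example: Rosenbrock function; only function values and gradients are needed
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                           200 * (x[1] - x[0]**2)])
print(bfgs(f, grad, x0=[-1.2, 1.0]))          # should approach [1, 1]
```

The DFP update has the same algebraic form with the roles of $\mathbf{s}_k$ and $\mathbf{y}_k$ (and of the matrix being approximated) interchanged, which is why the two methods are called dual.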