
120_Inverse Hessian Updating


The Newton method converges very quickly, but it can be inefficient because it requires calculating $n(n+1)/2$ second-order derivatives to generate the Hessian matrix. Additionally, the Newton step cannot be computed if the Hessian is singular at any iteration. Moreover, in many engineering problems, calculating second-order derivatives is very complex or even impossible.

 

This post explains how to overcome these disadvantages of the Newton method by generating an approximation of the Hessian matrix or its inverse at each iteration. These approximations use only first-order derivatives, and such methods are known as quasi-Newton methods.

 

The fundamental idea behind quasi-Newton methods is to update the current approximation of the Hessian (or its inverse) using the change in the design and the change in the gradient vector between iterations. During the update, properties such as symmetry and positive definiteness are preserved. This is crucial because, if positive definiteness is not guaranteed, the search direction may not be a descent direction for the cost function.

The derivation of the update procedure is based on the quasi-Newton condition, also called the secant equation, which assumes that the curvature of the cost function along the search direction is the same at two consecutive points.
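Concretely, writing ${{\text{s}}^{(k)}} = {x^{(k+1)}} - {x^{(k)}}$ for the change in design and ${{\text{y}}^{(k)}} = {{\text{c}}^{(k+1)}} - {{\text{c}}^{(k)}}$ for the change in gradient (the same ${\text{s}}$ and ${\text{y}}$ that appear in the DFP update below), the quasi-Newton condition for an inverse-Hessian approximation ${{\text{A}}^{(k+1)}}$ can be written as

$ {{\text{A}}^{(k+1)}}\,{{\text{y}}^{(k)}} = {{\text{s}}^{(k)}} $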

 

DFP Method

The Davidon-Fletcher-Powell (DFP) method constructs an approximation of the inverse Hessian using only first-order derivatives.

 

Choose the initial design ${x^{(0)}}$. Then, set a symmetric, positive definite $n \times n$ matrix ${{\text{A}}^{(0)}}$ as an estimate for the inverse of the Hessian of the cost function. In the absence of better information, ${{\text{A}}^{(0)}}$ can simply be set to the identity matrix. Next, specify the convergence parameter $\varepsilon$, set $k=0$, and calculate the gradient vector of the cost function as ${{\text{c}}^{(0)}} = \nabla f({x^{(0)}})$.

 

Calculate the norm of the gradient vector $\left\| {{{\text{c}}^{(k)}}} \right\|$. If $\left\| {{{\text{c}}^{(k)}}} \right\| < \varepsilon$, stop; otherwise, continue.

If continuing, calculate the search direction and the optimum step size as follows.
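In the standard DFP scheme, writing ${{\text{d}}^{(k)}}$ for the search direction and ${\alpha _k}$ for the step size (the same ${\text{d}}$ and $\alpha$ used later in this post), these are

$ {{\text{d}}^{(k)}} = - {{\text{A}}^{(k)}}{{\text{c}}^{(k)}} $

where ${\alpha _k}$ is the value of $\alpha$ that minimizes $f\left( {{x^{(k)}} + \alpha {{\text{d}}^{(k)}}} \right)$.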


Update the design variables using the calculated step size and search direction.
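With the step size and search direction from the previous step, the update is

$ {x^{(k+1)}} = {x^{(k)}} + {\alpha _k}{{\text{d}}^{(k)}} $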


Update ${{\text{A}}^{(k)}}$, the approximation to the inverse Hessian of the cost function.
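In the DFP method, the new estimate is obtained by adding two correction matrices to the current one:

$ {{\text{A}}^{(k+1)}} = {{\text{A}}^{(k)}} + {{\text{B}}^{(k)}} + {{\text{C}}^{(k)}} $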


Here, the correction matrices ${{\text{B}}^{(k)}}$ and ${{\text{C}}^{(k)}}$ are calculated using the quasi-Newton condition.
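In their standard DFP form, with the auxiliary vector ${{\text{z}}^{(k)}} = {{\text{A}}^{(k)}}{{\text{y}}^{(k)}}$, the correction matrices are

$ {{\text{B}}^{(k)}} = \frac{ {{\text{s}}^{(k)}}{{\text{s}}^{(k)T}} }{ {{\text{s}}^{(k)}} \cdot {{\text{y}}^{(k)}} }, \qquad {{\text{C}}^{(k)}} = - \frac{ {{\text{z}}^{(k)}}{{\text{z}}^{(k)T}} }{ {{\text{y}}^{(k)}} \cdot {{\text{z}}^{(k)}} } $

where ${{\text{s}}^{(k)}} = {\alpha _k}{{\text{d}}^{(k)}}$, ${{\text{y}}^{(k)}} = {{\text{c}}^{(k+1)}} - {{\text{c}}^{(k)}}$, and ${{\text{c}}^{(k+1)}} = \nabla f({x^{(k+1)}})$.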


Here, ${\text{s}}$ represents the change in design and ${\text{y}}$ is the change in gradient. After the update, set $k = k+1$ and repeat from the convergence check. Because ${{\text{A}}^{(0)}}$ is typically the identity matrix, the first iteration of this process is simply a steepest-descent step.
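As a concrete sketch of the whole loop, the NumPy code below follows the steps above. It is only an illustration: the function name dfp_minimize and its arguments are my own choices, ${{\text{A}}^{(0)}}$ is taken as the identity matrix, and the exact step-size calculation is replaced by a simple backtracking (Armijo) line search.

```python
import numpy as np


def dfp_minimize(f, grad, x0, eps=1e-6, max_iter=200):
    """Minimize f with the DFP inverse-Hessian update (illustrative sketch)."""
    x = np.asarray(x0, dtype=float)
    A = np.eye(x.size)              # A^(0): identity as the initial inverse-Hessian estimate
    c = grad(x)                     # c^(0): gradient at the initial design

    for _ in range(max_iter):
        if np.linalg.norm(c) < eps:         # stop when ||c^(k)|| < eps
            break

        d = -A @ c                          # search direction d^(k) = -A^(k) c^(k)

        # Backtracking (Armijo) line search in place of the exact step-size calculation
        alpha, fx = 1.0, f(x)
        while f(x + alpha * d) > fx + 1e-4 * alpha * (c @ d):
            alpha *= 0.5

        s = alpha * d                       # s^(k): change in design
        x = x + s                           # x^(k+1) = x^(k) + alpha_k d^(k)
        c_new = grad(x)
        y = c_new - c                       # y^(k): change in gradient
        z = A @ y                           # z^(k) = A^(k) y^(k)

        # A^(k+1) = A^(k) + B^(k) + C^(k) with the DFP correction matrices
        A = A + np.outer(s, s) / (s @ y) - np.outer(z, z) / (y @ z)
        c = c_new

    return x
```

In practice, the denominators ${\text{s}} \cdot {\text{y}}$ and ${\text{y}} \cdot {\text{z}}$ should also be checked before dividing; the sketch omits such safeguards for brevity.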

 

Here, ${{\text{A}}^{(k)}}$ remains positive definite for all $k$, which ensures that the algorithm always converges to a local minimum. The importance of keeping ${{\text{A}}^{(k)}}$ positive definite can be seen from the following expression.
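For a positive definite ${{\text{A}}^{(k)}}$ and a nonzero gradient,

$ {{\text{c}}^{(k)}} \cdot {{\text{d}}^{(k)}} = - {{\text{c}}^{(k)T}}{{\text{A}}^{(k)}}{{\text{c}}^{(k)}} < 0 $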


The above expression demonstrates that ${\text{d}}$ is a descent direction whenever the gradient is not zero. This means that by selecting a suitable $\alpha > 0$, the value of the cost function can be decreased below $f\left( {{x^{(k)}}} \right)$. In the next post, I'll solve an example problem using this method.
