# Difference between revisions of "Line search methods"

(27 intermediate revisions by one user not shown) | |||

Line 3: | Line 3: | ||

=Introduction= | =Introduction= | ||

− | An algorithm is a line search method if it seeks the minimum of a defined nonlinear function by selecting a reasonable direction vector that, when computed iteratively with a reasonable step size, will provide a function value closer to the absolute minimum of the function. Varying these will change the "tightness" of the optimization. For example, given the function <math>f(x)</math>, an initial <math>x_k</math> is chosen | + | An algorithm is a line search method if it seeks the minimum of a defined nonlinear function by selecting a reasonable direction vector that, when computed iteratively with a reasonable step size, will provide a function value closer to the absolute minimum of the function. Varying these will change the "tightness" of the optimization. For example, given the function <math>f(x)</math>, an initial <math>x_k</math> is chosen. To find a lower value of <math>f(x)</math>, the value of <math>x_{k+1}</math> is increased by the following iteration scheme |

+ | <math>x_{k+1} = x_k + \alpha_k p_k</math>, | ||

+ | |||

+ | in which <math>\alpha_k</math> is a positive scalar known as the step length and <math>p_k</math> defines the step direction. Figure 1 gives a clear flow chart to indicate the iteration scheme. | ||

+ | |||

+ | [[File:Line search alogorithm chart.png|frame|Figure 1: Algorithm flow chart of line search methods (Conger, adapted from Line Search wikipedia page)]] | ||

=Step Length= | =Step Length= | ||

+ | Choosing an appropriate step length has a large impact on the robustness of a line search method. To select the ideal step length, the following function could be minimized: | ||

+ | |||

+ | <math>\phi(\alpha) = f(x_k + \alpha p_k), \alpha >0</math>, | ||

+ | |||

+ | but this is not used in practical settings generally. This may give the most accurate minimum, but it would be very computationally expensive if the function <math>\phi</math> has multiple local minima or stationary points, as shown in Figure 2. | ||

+ | |||

+ | [[File:Step length image.png|frame|50px|Figure 2: Complexity of finding ideal step length (Nocedal & Wright)]] | ||

+ | |||

+ | A common and practical method for finding a suitable step length that is not too near to the global minimum of the <math>\phi</math> function is to require that the step length of <math>\alpha_k</math> reduces the value of the target function, or that | ||

+ | |||

+ | <math>f(x_k+\alpha p_k < f(x_k)</math> | ||

+ | |||

+ | This method does not ensure a convergence to the function's minimum, and so two conditions are employed to require a ''significant decrease condition'' during every iteration. | ||

+ | |||

+ | ==Wolfe Conditions== | ||

+ | These conditions, developed in 1969 by Philip Wolfe, are an inexact line search stipulation that requires <math>\alpha_k</math> to decreased the objective function by significant amount. This amount is defined by | ||

+ | |||

+ | <math>f(x_k+\alpha p_k) < f(x_k) + c_1 \alpha \nabla f^T_k p_k</math> | ||

+ | |||

+ | where <math>c_1</math> is between 0 and 1. Another way of describing this condition is to say that the decrease in the objective function should be proportional to both the step length and the directional derivative of the function and step direction. This inequality is also known as the ''Armijo condition''. In general, <math>c_1</math> is a very small value, ~<math>10^{-4}</math>. | ||

+ | |||

+ | The ''Armijo condition'' must be paired with the ''curvature condition'' | ||

+ | |||

+ | <math> \nabla f(x_k + \alpha p_k)^T p_k \ge f(x_k) + c_2 \nabla f^T_k p_k</math> | ||

+ | |||

+ | to keep the <math>\alpha_k</math> value from being too short. In this condition, <math>c_2</math> is greater than <math>c_1</math> but less than 1. These two conditions together are the Wolfe Conditions. | ||

+ | |||

+ | Another, more stringent form of these conditions is known as the ''strong Wolfe conditions''. The ''Armijo condition'' remains the same, but the curvature condition is restrained by taking the absolute value of the left side of the inequality. | ||

+ | |||

+ | <math> |\nabla f(x_k + \alpha p_k)^T p_k | \ge f(x_k) + c_2 \nabla f^T_k p_k</math> | ||

+ | |||

+ | This left hand side of the curvature condition is simply the derivative of the <math>\phi</math> function, and so this constraint prevents this derivative from becoming too positive, removing points that are too far from stationary points of <math>\phi</math> from consideration as viable <math>\alpha_k</math> values. | ||

+ | |||

+ | These conditions are valuable for use in Newton methods. | ||

+ | |||

+ | ==Goldstein Conditions== | ||

+ | Another approach to finding an appropriate step length is to use the following inequalities known as the Goldstein conditions. This condition, instead of having two constants, only employs one: | ||

+ | |||

+ | <math>f(x_k) + (1-c)\alpha_k \nabla f^T_k p_k \le f(x_k + \alpha p_k) \le f(x_k) + c \alpha_k \nabla f^T_k p_k </math> | ||

+ | |||

+ | The second equality is very similar to the Wolfe conditions in that it is simply the sufficient decrease condition. The first inequality is another way to control the step length from below. This is best seen in the Figure 3. | ||

+ | |||

+ | [[File:Goldstein.png|frame|Figure 3: Application of the Goldstein Conditions (Nocedal & Wright)]] | ||

+ | |||

+ | In comparison to the Wolfe conditions, the Goldstein conditions are better suited for [https://optimization.mccormick.northwestern.edu/index.php/Quasi-Newton_methods quasi-Newton methods] than for Newton methods. | ||

+ | |||

=Step Direction = | =Step Direction = | ||

− | + | When using line search methods, it is important to select a search or step direction with the steepest decrease in the function. This will increase the efficiency of line search methods. To identify this steepest descent at varying points along the function, the angle <math>\theta_k</math> between the chosen step direction and the negative gradient of the function <math> -\nabla f_k</math>, which is the steepest slope at point k. The angle is defined by | |

− | + | ||

− | + | ||

− | + | ||

− | = | + | <math>cos{\theta_k} = \frac{-\nabla f^T_k p_k}{||\nabla f_k|| ||p_k||}</math>, |

− | + | ||

− | + | ||

− | + | ||

− | + | ||

− | + | and, as with the step length, it is not efficient to completely minimize <math> \theta_k </math>. The amount that <math>p_k</math> can deviate from the steepest slope and still produce reasonable results depends on the step length conditions that are adhered to in the method. For example, if <math>\alpha_k</math> satisfies the Wolfe conditions, the ''Zoutendijk condition'' applies: | |

− | <math>\ | + | |

+ | <math>cos{\theta_k}^2 ||\nabla f_k||^2 < \infty</math>, which implies that <math>cos{\theta_k}^2 ||\nabla f_k||^2 \to 0</math> | ||

+ | |||

+ | There are various algorithms to use this angle property to converge on the function's minimum, and they each have their benefits and disadvantages depending on the application and complexity of the target function. The major algorithms available are the ''steepest descent method'', the ''Newton method'', and the ''quasi-Newton methods''. These algorithms are explained in more depth elsewhere within this Wiki. | ||

+ | |||

+ | When using these algorithms for line searching, it is important to know their weaknessess. The steepest descent method is the "quintessential globally convergent algorithm", but because it is so robust, it has a large computation time. The Newton methods rely on choosing an initial input value that is sufficiently near to the minimum. This is because the Hessian matrix of the function may not be positive definite, and therefore using the Newton method may not converge in a descent direction. The Newton method can be modified to atone for this. | ||

Line 30: | Line 79: | ||

3. Nocedal, J. & Wright, S. (2006) Numerical Optimization (Springer-Verlag New York, New York) 2 Ed p 664. | 3. Nocedal, J. & Wright, S. (2006) Numerical Optimization (Springer-Verlag New York, New York) 2 Ed p 664. | ||

+ | |||

+ | 4. Wolfe P (1969) Convergence Conditions for Ascent Methods. SIAM Review 11(2):226-235. |

## Latest revision as of 11:28, 7 June 2015

Author names: Elizabeth Conger

Steward: Dajun Yue and Fengqi You

## Contents |

# Introduction

An algorithm is a line search method if it seeks the minimum of a defined nonlinear function by selecting a reasonable direction vector that, when computed iteratively with a reasonable step size, will provide a function value closer to the absolute minimum of the function. Varying these will change the "tightness" of the optimization. For example, given the function , an initial is chosen. To find a lower value of , the value of is increased by the following iteration scheme

,

in which is a positive scalar known as the step length and defines the step direction. Figure 1 gives a clear flow chart to indicate the iteration scheme.

# Step Length

Choosing an appropriate step length has a large impact on the robustness of a line search method. To select the ideal step length, the following function could be minimized:

,

but this is not used in practical settings generally. This may give the most accurate minimum, but it would be very computationally expensive if the function has multiple local minima or stationary points, as shown in Figure 2.

A common and practical method for finding a suitable step length that is not too near to the global minimum of the function is to require that the step length of reduces the value of the target function, or that

This method does not ensure a convergence to the function's minimum, and so two conditions are employed to require a *significant decrease condition* during every iteration.

## Wolfe Conditions

These conditions, developed in 1969 by Philip Wolfe, are an inexact line search stipulation that requires to decreased the objective function by significant amount. This amount is defined by

where is between 0 and 1. Another way of describing this condition is to say that the decrease in the objective function should be proportional to both the step length and the directional derivative of the function and step direction. This inequality is also known as the *Armijo condition*. In general, is a very small value, ~.

The *Armijo condition* must be paired with the *curvature condition*

to keep the value from being too short. In this condition, is greater than but less than 1. These two conditions together are the Wolfe Conditions.

Another, more stringent form of these conditions is known as the *strong Wolfe conditions*. The *Armijo condition* remains the same, but the curvature condition is restrained by taking the absolute value of the left side of the inequality.

This left hand side of the curvature condition is simply the derivative of the function, and so this constraint prevents this derivative from becoming too positive, removing points that are too far from stationary points of from consideration as viable values.

These conditions are valuable for use in Newton methods.

## Goldstein Conditions

Another approach to finding an appropriate step length is to use the following inequalities known as the Goldstein conditions. This condition, instead of having two constants, only employs one:

The second equality is very similar to the Wolfe conditions in that it is simply the sufficient decrease condition. The first inequality is another way to control the step length from below. This is best seen in the Figure 3.

In comparison to the Wolfe conditions, the Goldstein conditions are better suited for quasi-Newton methods than for Newton methods.

# Step Direction

When using line search methods, it is important to select a search or step direction with the steepest decrease in the function. This will increase the efficiency of line search methods. To identify this steepest descent at varying points along the function, the angle between the chosen step direction and the negative gradient of the function , which is the steepest slope at point k. The angle is defined by

,

and, as with the step length, it is not efficient to completely minimize . The amount that can deviate from the steepest slope and still produce reasonable results depends on the step length conditions that are adhered to in the method. For example, if satisfies the Wolfe conditions, the *Zoutendijk condition* applies:

, which implies that

There are various algorithms to use this angle property to converge on the function's minimum, and they each have their benefits and disadvantages depending on the application and complexity of the target function. The major algorithms available are the *steepest descent method*, the *Newton method*, and the *quasi-Newton methods*. These algorithms are explained in more depth elsewhere within this Wiki.

When using these algorithms for line searching, it is important to know their weaknessess. The steepest descent method is the "quintessential globally convergent algorithm", but because it is so robust, it has a large computation time. The Newton methods rely on choosing an initial input value that is sufficiently near to the minimum. This is because the Hessian matrix of the function may not be positive definite, and therefore using the Newton method may not converge in a descent direction. The Newton method can be modified to atone for this.

# References

1. Sun, W. & Yuan, Y-X. (2006) Optimization Theory and Methods: Nonlinear Programming (Springer US) p 688.

2. Anonymous (2014) Line Search. (Wikipedia). http://en.wikipedia.org/wiki/Line_search.

3. Nocedal, J. & Wright, S. (2006) Numerical Optimization (Springer-Verlag New York, New York) 2 Ed p 664.

4. Wolfe P (1969) Convergence Conditions for Ascent Methods. SIAM Review 11(2):226-235.