Stationary values of many-variable functions
PARTIAL DIFFERENTIATION

theorem then becomes

$$f(\mathbf{x}) = f(\mathbf{x}_0) + \sum_i \frac{\partial f}{\partial x_i}\,\Delta x_i + \frac{1}{2!}\sum_i \sum_j \frac{\partial^2 f}{\partial x_i\,\partial x_j}\,\Delta x_i\,\Delta x_j + \cdots, \tag{5.20}$$

where $\Delta x_i = x_i - x_{i0}$ and the partial derivatives are evaluated at $(x_{10}, x_{20}, \ldots, x_{n0})$. For completeness, we note that in this case the full Taylor series can be written in the form

$$f(\mathbf{x}) = \sum_{n=0}^{\infty} \frac{1}{n!}\left[(\Delta\mathbf{x}\cdot\nabla)^n f(\mathbf{x})\right]_{\mathbf{x}=\mathbf{x}_0},$$

where $\nabla$ is the vector differential operator del, to be discussed in chapter 10.

5.8 Stationary values of many-variable functions

The idea of the stationary points of a function of just one variable has already been discussed in subsection 2.1.8. We recall that the function $f(x)$ has a stationary point at $x = x_0$ if its gradient $df/dx$ is zero at that point. A function may have any number of stationary points, and their nature, i.e. whether they are maxima, minima or stationary points of inflection, is determined by the value of the second derivative at the point. A stationary point is

(i) a minimum if $d^2f/dx^2 > 0$;
(ii) a maximum if $d^2f/dx^2 < 0$;
(iii) a stationary point of inflection if $d^2f/dx^2 = 0$ and changes sign through the point.

We now consider the stationary points of functions of more than one variable; we will see that partial differential analysis is ideally suited to the determination of the position and nature of such points. It is helpful to consider first the case of a function of just two variables but, even in this case, the general situation is more complex than that for a function of one variable, as can be seen from figure 5.2.

This figure shows part of a three-dimensional model of a function $f(x, y)$. At positions P and B there are a peak and a bowl respectively or, more mathematically, a local maximum and a local minimum. At position S the gradient in any direction is zero but the situation is complicated, since a section parallel to the plane $x = 0$ would show a maximum, but one parallel to the plane $y = 0$ would show a minimum. A point such as S is known as a saddle point.
The orientation of the 'saddle' in the xy-plane is irrelevant; it is as shown in the figure solely for ease of discussion. For any saddle point the function increases in some directions away from the point but decreases in other directions.

Figure 5.2 Stationary points of a function of two variables. A minimum occurs at B, a maximum at P and a saddle point at S.

For functions of two variables, such as the one shown, it should be clear that a necessary condition for a stationary point (maximum, minimum or saddle point) to occur is that

$$\frac{\partial f}{\partial x} = 0 \quad\text{and}\quad \frac{\partial f}{\partial y} = 0. \tag{5.21}$$

The vanishing of the partial derivatives in directions parallel to the axes is enough to ensure that the partial derivative in any arbitrary direction is also zero. The latter can be considered as the superposition of two contributions, one along each axis; since both contributions are zero, so is the partial derivative in the arbitrary direction. This may be made more precise by considering the total differential

$$df = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy.$$

Using (5.21) we see that, although the infinitesimal changes $dx$ and $dy$ can be chosen independently, the infinitesimal change $df$ in the value of the function is always zero at a stationary point.

We now turn our attention to determining the nature of a stationary point of a function of two variables, i.e. whether it is a maximum, a minimum or a saddle point. By analogy with the one-variable case we see that $\partial^2 f/\partial x^2$ and $\partial^2 f/\partial y^2$ must both be positive for a minimum and both be negative for a maximum. However, these are not sufficient conditions, since they could also be obeyed at complicated saddle points. What is important for a minimum (or maximum) is that the second partial derivative must be positive (or negative) in all directions, not just in the x- and y-directions.
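The point that the signs of the two axis-aligned second derivatives are not sufficient can be illustrated with a small sketch. It assumes the standard expression for the second derivative along the unit direction $(\cos\theta, \sin\theta)$, namely $f_{xx}\cos^2\theta + 2f_{xy}\cos\theta\sin\theta + f_{yy}\sin^2\theta$, and uses an illustrative function $f(x, y) = x^2 + 3xy + y^2$ that is not taken from the text; the function name `second_directional_derivative` is likewise hypothetical.

```python
import math

def second_directional_derivative(fxx, fxy, fyy, theta):
    """Second derivative of f along the unit direction (cos(theta), sin(theta))."""
    c, s = math.cos(theta), math.sin(theta)
    return fxx * c * c + 2.0 * fxy * c * s + fyy * s * s

# Illustrative function (an assumption, not from the text):
#   f(x, y) = x**2 + 3*x*y + y**2, stationary at the origin,
# with constant second partial derivatives fxx = 2, fxy = 3, fyy = 2.
FXX, FXY, FYY = 2.0, 3.0, 2.0

along_x = second_directional_derivative(FXX, FXY, FYY, 0.0)              # x-direction
along_y = second_directional_derivative(FXX, FXY, FYY, math.pi / 2)      # y-direction
along_diag = second_directional_derivative(FXX, FXY, FYY, -math.pi / 4)  # along y = -x

print(along_x, along_y, along_diag)
```

Both axis-aligned values are positive, yet the value along the line $y = -x$ is negative, so the origin is a saddle point of this function even though $\partial^2 f/\partial x^2$ and $\partial^2 f/\partial y^2$ are both positive there.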
To establish just what constitutes sufficient conditions we first note that, since $f$ is a function of two variables and $\partial f/\partial x = \partial f/\partial y = 0$, a Taylor expansion of the type (5.18) about the stationary point yields

$$f(x, y) - f(x_0, y_0) \approx \frac{1}{2!}\left[(\Delta x)^2 f_{xx} + 2\,\Delta x\,\Delta y\, f_{xy} + (\Delta y)^2 f_{yy}\right],$$

where $\Delta x = x - x_0$ and $\Delta y = y - y_0$ and where the partial derivatives have been written in more compact notation. Rearranging the contents of the bracket as the weighted sum of two squares, we find

$$f(x, y) - f(x_0, y_0) \approx \frac{1}{2}\left[f_{xx}\left(\Delta x + \frac{f_{xy}\,\Delta y}{f_{xx}}\right)^2 + (\Delta y)^2\left(f_{yy} - \frac{f_{xy}^2}{f_{xx}}\right)\right]. \tag{5.22}$$

For a minimum, we require (5.22) to be positive for all $\Delta x$ and $\Delta y$, and hence $f_{xx} > 0$ and $f_{yy} - (f_{xy}^2/f_{xx}) > 0$. Given the first constraint, the second can be written $f_{xx} f_{yy} > f_{xy}^2$. Similarly for a maximum we require (5.22) to be negative, and hence $f_{xx} < 0$ and $f_{xx} f_{yy} > f_{xy}^2$. For minima and maxima, symmetry requires that $f_{yy}$ obeys the same criteria as $f_{xx}$. When (5.22) is negative (or zero) for some values of $\Delta x$ and $\Delta y$ but positive (or zero) for others, we have a saddle point. In this case $f_{xx} f_{yy} < f_{xy}^2$.

In summary, all stationary points have $f_x = f_y = 0$ and they may be classified further as

(i) minima if both $f_{xx}$ and $f_{yy}$ are positive and $f_{xy}^2 < f_{xx} f_{yy}$,
(ii) maxima if both $f_{xx}$ and $f_{yy}$ are negative and $f_{xy}^2 < f_{xx} f_{yy}$,
(iii) saddle points if $f_{xx}$ and $f_{yy}$ have opposite signs or $f_{xy}^2 > f_{xx} f_{yy}$.

Note, however, that if $f_{xy}^2 = f_{xx} f_{yy}$ then $f(x, y) - f(x_0, y_0)$ can be written in one of the four forms

$$\pm\frac{1}{2}\left(\Delta x\,|f_{xx}|^{1/2} \pm \Delta y\,|f_{yy}|^{1/2}\right)^2.$$

For some choice of the ratio $\Delta y/\Delta x$ this expression has zero value, showing that, for a displacement from the stationary point in this particular direction, $f(x_0 + \Delta x, y_0 + \Delta y)$ does not differ from $f(x_0, y_0)$ to second order in $\Delta x$ and $\Delta y$; in such situations further investigation is required. In particular, if $f_{xx}$, $f_{yy}$ and $f_{xy}$ are all zero then the Taylor expansion has to be taken to a higher order.
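Criteria (i)–(iii), together with the undetermined case $f_{xy}^2 = f_{xx} f_{yy}$, can be sketched as a small function. This is a minimal sketch under stated assumptions: the name `classify_stationary_point` and the tolerance `tol` used to decide when the discriminant counts as zero are hypothetical, and the three quadratic test functions are illustrative rather than taken from the text.

```python
def classify_stationary_point(fxx, fyy, fxy, tol=1e-12):
    """Classify a stationary point of f(x, y) from its second partial derivatives.

    Returns 'minimum', 'maximum', 'saddle point' or 'undetermined'.
    `tol` (an assumed parameter) sets how close to zero the quantity
    fxx*fyy - fxy**2 must be to be treated as the undetermined case.
    """
    discriminant = fxx * fyy - fxy ** 2
    if discriminant > tol:
        # fxx*fyy > fxy**2 >= 0, so fxx and fyy are non-zero and share a sign.
        return "minimum" if fxx > 0 else "maximum"
    if discriminant < -tol:
        # Covers both fxy**2 > fxx*fyy and fxx, fyy of opposite sign.
        return "saddle point"
    return "undetermined"

# Illustrative quadratics (assumptions), each stationary at the origin:
bowl = classify_stationary_point(2.0, 2.0, 0.0)     # f = x**2 + y**2
peak = classify_stationary_point(-2.0, -2.0, 0.0)   # f = -x**2 - y**2
saddle = classify_stationary_point(2.0, -2.0, 0.0)  # f = x**2 - y**2
flat = classify_stationary_point(0.0, 0.0, 0.0)     # all second derivatives zero
```

Testing the sign of $f_{xx} f_{yy} - f_{xy}^2$ first keeps the branches short: a positive value already guarantees that $f_{xx}$ and $f_{yy}$ have the same sign, and opposite signs always make the value negative.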
As examples, such extended investigations would show that the function $f(x, y) = x^4 + y^4$ has a minimum at the origin but that $g(x, y) = x^4 + y^3$ has a saddle point there.

Show that the function $f(x, y) = x^3 \exp(-x^2 - y^2)$ has a maximum at the point $(\sqrt{3/2}, 0)$, a minimum at $(-\sqrt{3/2}, 0)$ and a stationary point at the origin whose nature cannot be determined by the above procedures.

Setting the first two partial derivatives to zero to locate the stationary points, we find

$$\frac{\partial f}{\partial x} = (3x^2 - 2x^4)\exp(-x^2 - y^2) = 0, \tag{5.23}$$

$$\frac{\partial f}{\partial y} = -2yx^3 \exp(-x^2 - y^2) = 0. \tag{5.24}$$

For (5.24) to be satisfied we require $x = 0$ or $y = 0$ and for (5.23) to be satisfied we require $x = 0$ or $x = \pm\sqrt{3/2}$. Hence the stationary points are at $(0, 0)$, $(\sqrt{3/2}, 0)$ and $(-\sqrt{3/2}, 0)$. (Strictly, both equations are satisfied at every point on the line $x = 0$; the origin is the point on that line considered here.) We now find the second partial derivatives:

$$f_{xx} = (4x^5 - 14x^3 + 6x)\exp(-x^2 - y^2),$$
$$f_{yy} = x^3(4y^2 - 2)\exp(-x^2 - y^2),$$
$$f_{xy} = 2x^2 y(2x^2 - 3)\exp(-x^2 - y^2).$$

We then substitute the pairs of values of $x$ and $y$ for each stationary point and find that at $(0, 0)$

$$f_{xx} = 0, \quad f_{yy} = 0, \quad f_{xy} = 0$$

and at $(\pm\sqrt{3/2}, 0)$

$$f_{xx} = \mp 6\sqrt{3/2}\,\exp(-3/2), \quad f_{yy} = \mp 3\sqrt{3/2}\,\exp(-3/2), \quad f_{xy} = 0.$$

Hence, applying criteria (i)–(iii) above, we find that $(0, 0)$ is an undetermined stationary point, $(\sqrt{3/2}, 0)$ is a maximum and $(-\sqrt{3/2}, 0)$ is a minimum. The function is shown in figure 5.3.

Determining the nature of stationary points for functions of a general number of variables is considerably more difficult and requires a knowledge of the eigenvectors and eigenvalues of matrices. Although these are not discussed until chapter 8, we present the analysis here for completeness. The remainder of this section can therefore be omitted on a first reading.

For a function of $n$ real variables, $f(x_1, x_2, \ldots, x_n)$, we require that, at all stationary points,

$$\frac{\partial f}{\partial x_i} = 0 \quad \text{for all } x_i.$$

In order to determine the nature of a stationary point, we must expand the function as a Taylor series about the point.
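The condition $\partial f/\partial x_i = 0$ for all $x_i$ can be checked numerically for any number of variables. The sketch below uses central finite differences (a numerical technique swapped in for illustration, not the analytic method of the text) and applies it to the two-variable worked example $f(x, y) = x^3\exp(-x^2 - y^2)$ at $(\sqrt{3/2}, 0)$. The helper names `gradient` and `second_partial` and the step sizes `h` are assumptions.

```python
import math

def f(x, y):
    """The worked example from the text: f(x, y) = x^3 exp(-x^2 - y^2)."""
    return x**3 * math.exp(-x**2 - y**2)

def gradient(func, point, h=1e-6):
    """Central-difference estimate of every first partial derivative at `point`."""
    grad = []
    for i in range(len(point)):
        up, down = list(point), list(point)
        up[i] += h
        down[i] -= h
        grad.append((func(*up) - func(*down)) / (2.0 * h))
    return grad

def second_partial(func, point, i, h=1e-4):
    """Central-difference estimate of the unmixed second partial w.r.t. variable i."""
    up, down = list(point), list(point)
    up[i] += h
    down[i] -= h
    return (func(*up) - 2.0 * func(*point) + func(*down)) / h**2

x0 = math.sqrt(1.5)
point = (x0, 0.0)

grad_at_point = gradient(f, point)     # both components should be close to zero
fxx_num = second_partial(f, point, 0)  # numerical estimate of f_xx
fyy_num = second_partial(f, point, 1)  # numerical estimate of f_yy

# Analytic values quoted in the worked example, for comparison.
fxx_exact = -6.0 * x0 * math.exp(-1.5)
fyy_exact = -3.0 * x0 * math.exp(-1.5)

print(grad_at_point, fxx_num, fxx_exact, fyy_num, fyy_exact)
```

Because `gradient` loops over the components of `point`, the same check works unchanged for a function of $n$ variables; only the finite-difference step sizes may need tuning for a given function.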
Recalling the Taylor expansion (5.20) for a function of $n$ variables, we see that

$$\Delta f = f(\mathbf{x}) - f(\mathbf{x}_0) \approx \frac{1}{2}\sum_i \sum_j \frac{\partial^2 f}{\partial x_i\,\partial x_j}\,\Delta x_i\,\Delta x_j. \tag{5.25}$$

Figure 5.3 The function $f(x, y) = x^3 \exp(-x^2 - y^2)$.

If we define the matrix $\mathsf{M}$ to have elements given by

$$M_{ij} = \frac{\partial^2 f}{\partial x_i\,\partial x_j},$$

then we can rewrite (5.25) as

$$\Delta f = \tfrac{1}{2}\,\Delta\mathbf{x}^{\mathrm{T}}\,\mathsf{M}\,\Delta\mathbf{x}, \tag{5.26}$$

where $\Delta\mathbf{x}$ is the column vector with the $\Delta x_i$ as its components and $\Delta\mathbf{x}^{\mathrm{T}}$ is its transpose. Since $\mathsf{M}$ is real and symmetric it has $n$ real eigenvalues $\lambda_r$ and $n$ orthogonal eigenvectors $\mathbf{e}_r$, which after suitable normalisation satisfy

$$\mathsf{M}\mathbf{e}_r = \lambda_r \mathbf{e}_r, \qquad \mathbf{e}_r^{\mathrm{T}}\mathbf{e}_s = \delta_{rs},$$

where the Kronecker delta, written $\delta_{rs}$, equals unity for $r = s$ and equals zero otherwise. These eigenvectors form a basis set for the $n$-dimensional space and we can therefore expand $\Delta\mathbf{x}$ in terms of them, obtaining

$$\Delta\mathbf{x} = \sum_r a_r \mathbf{e}_r,$$