In many circumstances, however, random variables do not depend on one another, i.e. they are independent. As an example, for a person drawn at random from a population, we might expect height and IQ to be independent random variables. Let us suppose that X and Y are two random variables with probability density functions g(x) and h(y) respectively. In mathematical terms, X and Y are independent RVs if their joint probability density function is given by f(x, y) = g(x)h(y). Thus, for independent RVs, if X and Y are both discrete then

\[ \Pr(X = x_i,\, Y = y_j) = g(x_i)h(y_j) \]

or, if X and Y are both continuous, then

\[ \Pr(x < X \le x + dx,\, y < Y \le y + dy) = g(x)h(y)\,dx\,dy. \]

The important point in each case is that the RHS is simply the product of the individual probability density functions (compare with the expression for Pr(A ∩ B) in (30.22) for statistically independent events A and B). By a simple extension, one may also consider the case where one of the random variables is discrete and the other continuous. The above discussion may also be trivially extended to any number of independent RVs X_i, i = 1, 2, ..., N.

Example. The independent random variables X and Y have the PDFs g(x) = e^{-x} and h(y) = 2e^{-2y} respectively. Calculate the probability that X lies in the interval 1 < X ≤ 2 and Y lies in the interval 0 < Y ≤ 1.

Since X and Y are independent RVs, the required probability is given by

\[ \Pr(1 < X \le 2,\, 0 < Y \le 1) = \int_1^2 g(x)\,dx \int_0^1 h(y)\,dy = \int_1^2 e^{-x}\,dx \int_0^1 2e^{-2y}\,dy = \left[-e^{-x}\right]_1^2 \times \left[-e^{-2y}\right]_0^1 = 0.23 \times 0.86 = 0.20. \]

30.5 Properties of distributions

For a single random variable X, the probability density function f(x) contains all possible information about how the variable is distributed. However, for the purposes of comparison, it is conventional and useful to characterise f(x) by certain of its properties. Most of these standard properties are defined in terms of averages or expectation values. In the most general case, the expectation value E[g(X)] of any function g(X) of the random variable X is defined as

\[ E[g(X)] = \begin{cases} \sum_i g(x_i) f(x_i) & \text{for a discrete distribution,} \\[4pt] \int g(x) f(x)\,dx & \text{for a continuous distribution,} \end{cases} \tag{30.45} \]

where the sum or integral is over all allowed values of X. It is assumed that the series is absolutely convergent or that the integral exists, as the case may be. From its definition it is straightforward to show that the expectation value has the following properties:

(i) if a is a constant then E[a] = a;
(ii) if a is a constant then E[ag(X)] = aE[g(X)];
(iii) if g(X) = s(X) + t(X) then E[g(X)] = E[s(X)] + E[t(X)].

It should be noted that the expectation value is not a function of X but is instead a number that depends on the form of the probability density function f(x) and the function g(x). Most of the standard quantities used to characterise f(x) are simply the expectation values of various functions of the random variable X. We now consider these standard quantities.

30.5.1 Mean

The property most commonly used to characterise a probability distribution is its mean, which is defined simply as the expectation value E[X] of the variable X itself. Thus, the mean is given by

\[ E[X] = \begin{cases} \sum_i x_i f(x_i) & \text{for a discrete distribution,} \\[4pt] \int x f(x)\,dx & \text{for a continuous distribution.} \end{cases} \tag{30.46} \]

The alternative notations µ and \bar{x} are also commonly used to denote the mean. If in (30.46) the series is not absolutely convergent, or the integral does not exist, we say that the distribution does not have a mean, but this is very rare in physical applications.
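The factorisation of probabilities for independent RVs, and the definition of the mean (30.46), are easy to check numerically. A minimal sketch, assuming SciPy is available; the PDFs g(x) = e^{-x} and h(y) = 2e^{-2y} are those of the worked example above:

```python
# Minimal numerical check of the worked example above; SciPy assumed.
import numpy as np
from scipy.integrate import quad

g = lambda x: np.exp(-x)              # PDF of X
h = lambda y: 2.0 * np.exp(-2.0 * y)  # PDF of Y

# For independent RVs the joint probability factorises into a product of integrals.
px, _ = quad(g, 1.0, 2.0)             # Pr(1 < X <= 2) ~ 0.23
py, _ = quad(h, 0.0, 1.0)             # Pr(0 < Y <= 1) ~ 0.86
print(px * py)                        # ~ 0.20, as in the example

# The mean (30.46) as an expectation value: E[X] = int_0^oo x g(x) dx = 1.
mean_x, _ = quad(lambda x: x * g(x), 0.0, np.inf)
print(mean_x)                         # ~ 1.0
```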
Example. The probability of finding a 1s electron in a hydrogen atom in a given infinitesimal volume dV is ψ*ψ dV, where the quantum mechanical wavefunction ψ is given by ψ = Ae^{-r/a_0}. Find the value of the real constant A and thereby deduce the mean distance of the electron from the origin.

Let us consider the random variable R = 'distance of the electron from the origin'. Since the 1s orbital has no θ- or φ-dependence (it is spherically symmetric), we may consider the infinitesimal volume element dV as the spherical shell with inner radius r and outer radius r + dr. Thus, dV = 4πr² dr and the PDF of R is simply

\[ \Pr(r < R \le r + dr) \equiv f(r)\,dr = 4\pi r^2 A^2 e^{-2r/a_0}\,dr. \]

The value of A is found by requiring the total probability (i.e. the probability that the electron is somewhere) to be unity. Since R must lie between zero and infinity, we require that

\[ \int_0^\infty 4\pi r^2 A^2 e^{-2r/a_0}\,dr = 1. \]

Integrating by parts we find A = 1/(\pi a_0^3)^{1/2}. Now, using the definition of the mean (30.46), we find

\[ E[R] = \int_0^\infty r f(r)\,dr = \frac{4}{a_0^3} \int_0^\infty r^3 e^{-2r/a_0}\,dr. \]

The integral on the RHS may be integrated by parts and takes the value 3a_0^4/8; consequently we find that E[R] = 3a_0/2.

30.5.2 Mode and median

Although the mean discussed in the last section is the most common measure of the 'average' of a distribution, two other measures, which do not rely on the concept of expectation values, are frequently encountered.

The mode of a distribution is the value of the random variable X at which the probability (density) function f(x) has its greatest value. If there is more than one value of X for which this is true then each value may equally be called the mode of the distribution.

The median M of a distribution is the value of the random variable X at which the cumulative probability function F(x) takes the value 1/2, i.e. F(M) = 1/2. Related to the median are the lower and upper quartiles Q_l and Q_u of the PDF, which are defined such that

\[ F(Q_l) = \tfrac14, \qquad F(Q_u) = \tfrac34. \]

Thus the median and lower and upper quartiles divide the PDF into four regions each containing one quarter of the probability. Smaller subdivisions are also possible, e.g. the nth percentile, P_n, of a PDF is defined by F(P_n) = n/100.

Example. Find the mode of the PDF for the distance from the origin of the electron whose wavefunction was given in the previous example.

We found in the previous example that the PDF for the electron's distance from the origin was given by

\[ f(r) = \frac{4r^2}{a_0^3}\, e^{-2r/a_0}. \tag{30.47} \]

Differentiating f(r) with respect to r, we obtain

\[ \frac{df}{dr} = \frac{8r}{a_0^3}\left(1 - \frac{r}{a_0}\right) e^{-2r/a_0}. \]

Thus f(r) has turning points at r = 0 and r = a_0, where df/dr = 0. It is straightforward to show that r = 0 is a minimum and r = a_0 is a maximum. Moreover, it is also clear that r = a_0 is a global maximum (as opposed to just a local one). Thus the mode of f(r) occurs at r = a_0.

30.5.3 Variance and standard deviation

The variance of a distribution, V[X], also written σ², is defined by

\[ V[X] = E\left[(X - \mu)^2\right] = \begin{cases} \sum_j (x_j - \mu)^2 f(x_j) & \text{for a discrete distribution,} \\[4pt] \int (x - \mu)^2 f(x)\,dx & \text{for a continuous distribution.} \end{cases} \tag{30.48} \]

Here µ has been written for the expectation value E[X] of X. As in the case of the mean, unless the series and the integral in (30.48) converge the distribution does not have a variance. From the definition (30.48) we may easily derive the following useful properties of V[X]. If a and b are constants then

(i) V[a] = 0,
(ii) V[aX + b] = a²V[X].
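The normalisation, mean and mode found in the two hydrogen-atom examples above can also be reproduced symbolically. A minimal sketch, assuming SymPy is available:

```python
# Symbolic check of the hydrogen 1s examples above; SymPy assumed.
import sympy as sp

r = sp.symbols('r')
a0, A = sp.symbols('a0 A', positive=True)
f = 4 * sp.pi * r**2 * A**2 * sp.exp(-2 * r / a0)   # unnormalised f(r)

# Normalisation: int_0^oo f(r) dr = 1 gives A = 1/sqrt(pi a0^3).
A_val = sp.solve(sp.Eq(sp.integrate(f, (r, 0, sp.oo)), 1), A)[0]
print(A_val)

f = f.subs(A, A_val)
print(sp.integrate(r * f, (r, 0, sp.oo)))    # mean distance: 3*a0/2
print(sp.solve(sp.Eq(sp.diff(f, r), 0), r))  # turning points [0, a0]; the mode is r = a0
```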
The variance of a distribution is always positive; its positive square root is known as the standard deviation of the distribution and is often denoted by σ. Roughly speaking, σ measures the spread (about x = µ) of the values that X can assume.

Example. Find the standard deviation of the PDF for the distance from the origin of the electron whose wavefunction was discussed in the previous two examples.

Inserting the expression (30.47) for the PDF f(r) into (30.48), the variance of the random variable R is given by

\[ V[R] = \int_0^\infty (r - \mu)^2 \frac{4r^2}{a_0^3} e^{-2r/a_0}\,dr = \frac{4}{a_0^3} \int_0^\infty (r^4 - 2r^3\mu + r^2\mu^2) e^{-2r/a_0}\,dr, \]

where the mean µ = E[R] = 3a_0/2. Integrating each term in the integrand by parts we obtain

\[ V[R] = 3a_0^2 - 3\mu a_0 + \mu^2 = \frac{3a_0^2}{4}. \]

Thus the standard deviation of the distribution is σ = \sqrt{3}\,a_0/2.

We may also use the definition (30.48) to derive the Bienaymé–Chebyshev inequality, which provides a useful upper limit on the probability that random variable X takes values outside a given range centred on the mean. Let us consider the case of a continuous random variable, for which

\[ \Pr(|X - \mu| \ge c) = \int_{|x-\mu| \ge c} f(x)\,dx, \]

where the integral on the RHS extends over all values of x satisfying the inequality |x − µ| ≥ c. From (30.48), we find that

\[ \sigma^2 \ge \int_{|x-\mu| \ge c} (x - \mu)^2 f(x)\,dx \ge c^2 \int_{|x-\mu| \ge c} f(x)\,dx. \tag{30.49} \]

The first inequality holds because both (x − µ)² and f(x) are non-negative for all x, and the second inequality holds because (x − µ)² ≥ c² over the range of integration. However, the RHS of (30.49) is simply equal to c² Pr(|X − µ| ≥ c), and thus we obtain the required inequality

\[ \Pr(|X - \mu| \ge c) \le \frac{\sigma^2}{c^2}. \]

A similar derivation may be carried through for the case of a discrete random variable. Thus, for any distribution f(x) that possesses a variance we have, for example,

\[ \Pr(|X - \mu| \ge 2\sigma) \le \frac14 \quad\text{and}\quad \Pr(|X - \mu| \ge 3\sigma) \le \frac19. \]

30.5.4 Moments

The mean (or expectation) of X is sometimes called the first moment of X, since it is defined as the sum or integral of the probability density function multiplied by the first power of x. By a simple extension the kth moment of a distribution is defined by

\[ \mu_k \equiv E[X^k] = \begin{cases} \sum_j x_j^k f(x_j) & \text{for a discrete distribution,} \\[4pt] \int x^k f(x)\,dx & \text{for a continuous distribution.} \end{cases} \tag{30.50} \]

For notational convenience, we have introduced the symbol µ_k to denote E[X^k], the kth moment of the distribution. Clearly, the mean of the distribution is then denoted by µ_1, often abbreviated simply to µ, as in the previous subsection, as this rarely causes confusion.

A useful result that relates the second moment, the mean and the variance of a distribution is proved using the properties of the expectation operator:

\[ V[X] = E\left[(X - \mu)^2\right] = E\left[X^2 - 2\mu X + \mu^2\right] = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - 2\mu^2 + \mu^2 = E[X^2] - \mu^2. \tag{30.51} \]

In alternative notations, this result can be written \overline{(x - \mu)^2} = \overline{x^2} - \bar{x}^{\,2} or σ² = µ_2 − µ_1².
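Both the relation (30.51) and the Chebyshev bound can be illustrated numerically with the hydrogen-atom PDF of the earlier examples (taking a_0 = 1). A minimal sketch, assuming SciPy is available:

```python
# Numerical illustration of V[X] = E[X^2] - mu^2 and of the Chebyshev bound,
# using the hydrogen 1s PDF f(r) = 4 r^2 exp(-2r) (i.e. a0 = 1); SciPy assumed.
import numpy as np
from scipy.integrate import quad

f = lambda r: 4.0 * r**2 * np.exp(-2.0 * r)

mu1 = quad(lambda r: r * f(r), 0, np.inf)[0]     # first moment  = 3/2
mu2 = quad(lambda r: r**2 * f(r), 0, np.inf)[0]  # second moment = 3
var = mu2 - mu1**2                               # = 3/4, in agreement with (30.51)
sigma = np.sqrt(var)

# Exact tail probability Pr(|R - mu| >= 2 sigma) versus the bound sigma^2 / c^2.
c = 2.0 * sigma
tail = quad(f, mu1 + c, np.inf)[0] + quad(f, 0, max(mu1 - c, 0.0))[0]
print(var, tail, var / c**2)                     # tail ~ 0.04, below the bound 0.25
```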
Example. A biased die has probabilities p/2, p, p, p, p, 2p of showing 1, 2, 3, 4, 5, 6 respectively. Find (i) the mean, (ii) the second moment and (iii) the variance of this probability distribution.

By demanding that the sum of the probabilities equals unity we require p = 2/13. Now, using the definition of the mean (30.46) for a discrete distribution,

\[ E[X] = \sum_j x_j f(x_j) = 1 \times \tfrac12 p + 2 \times p + 3 \times p + 4 \times p + 5 \times p + 6 \times 2p = \frac{53}{2}\, p = \frac{53}{2} \times \frac{2}{13} = \frac{53}{13}. \]

Similarly, using the definition of the second moment (30.50),

\[ E[X^2] = \sum_j x_j^2 f(x_j) = 1^2 \times \tfrac12 p + 2^2 p + 3^2 p + 4^2 p + 5^2 p + 6^2 \times 2p = \frac{253}{2}\, p = \frac{253}{13}. \]

Finally, using the definition of the variance (30.48), with µ = 53/13, we obtain

\[ V[X] = \sum_j (x_j - \mu)^2 f(x_j) = (1 - \mu)^2 \tfrac12 p + (2 - \mu)^2 p + (3 - \mu)^2 p + (4 - \mu)^2 p + (5 - \mu)^2 p + (6 - \mu)^2\, 2p = \frac{3120}{169}\, p = \frac{480}{169}. \]

It is easy to verify that V[X] = E[X²] − (E[X])².

In practice, to calculate the moments of a distribution it is often simpler to use the moment generating function discussed in subsection 30.7.2. This is particularly true for higher-order moments, where direct evaluation of the sum or integral in (30.50) can be somewhat laborious.

30.5.5 Central moments

The variance V[X] is sometimes called the second central moment of the distribution, since it is defined as the sum or integral of the probability density function multiplied by the second power of x − µ. The origin of the term 'central' is that by subtracting µ from x before squaring we are considering the moment about the mean of the distribution, rather than about x = 0. Thus the kth central moment of a distribution is defined as

\[ \nu_k \equiv E\left[(X - \mu)^k\right] = \begin{cases} \sum_j (x_j - \mu)^k f(x_j) & \text{for a discrete distribution,} \\[4pt] \int (x - \mu)^k f(x)\,dx & \text{for a continuous distribution.} \end{cases} \tag{30.52} \]

It is convenient to introduce the notation ν_k for the kth central moment. Thus V[X] ≡ ν_2 and we may write (30.51) as ν_2 = µ_2 − µ_1². Clearly, the first central moment of a distribution is always zero since, for example in the continuous case,

\[ \nu_1 = \int (x - \mu) f(x)\,dx = \int x f(x)\,dx - \mu \int f(x)\,dx = \mu - (\mu \times 1) = 0. \]

We note that the notation µ_k and ν_k for the moments and central moments respectively is not universal. Indeed, in some books their meanings are reversed.

We can write the kth central moment of a distribution in terms of its kth and lower-order moments by expanding (X − µ)^k in powers of X. We have already noted that ν_2 = µ_2 − µ_1², and similar expressions may be obtained for higher-order central moments. For example,

\[ \nu_3 = E\left[(X - \mu_1)^3\right] = E\left[X^3 - 3\mu_1 X^2 + 3\mu_1^2 X - \mu_1^3\right] = \mu_3 - 3\mu_1\mu_2 + 3\mu_1^2\mu_1 - \mu_1^3 = \mu_3 - 3\mu_1\mu_2 + 2\mu_1^3. \tag{30.53} \]

In general, it is straightforward to show that

\[ \nu_k = \mu_k - {}^kC_1\, \mu_{k-1}\mu_1 + \cdots + (-1)^r\, {}^kC_r\, \mu_{k-r}\mu_1^r + \cdots + (-1)^{k-1}\left({}^kC_{k-1} - 1\right)\mu_1^k. \tag{30.54} \]

Once again, direct evaluation of the sum or integral in (30.52) can be rather tedious for higher moments, and it is usually quicker to use the moment generating function (see subsection 30.7.2), from which the central moments can be easily evaluated as well.

Example. The PDF for a Gaussian distribution (see subsection 30.9.1) with mean µ and variance σ² is given by

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]. \]

Obtain an expression for the kth central moment of this distribution.

As an illustration, we will perform this calculation by evaluating the integral in (30.52) directly. Thus, the kth central moment of f(x) is given by

\[ \nu_k = \int_{-\infty}^{\infty} (x - \mu)^k f(x)\,dx = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} (x - \mu)^k \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right] dx = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} y^k \exp\left(-\frac{y^2}{2\sigma^2}\right) dy, \tag{30.55} \]

where in the last line we have made the substitution y = x − µ. It is clear that if k is odd then the integrand is an odd function of y and hence the integral equals zero. Thus, ν_k = 0 if k is odd. When k is even, we could calculate ν_k by integrating by parts to obtain a reduction formula, but it is more elegant to consider instead the standard integral (see subsection 6.4.2)

\[ I = \int_{-\infty}^{\infty} \exp(-\alpha y^2)\,dy = \pi^{1/2}\alpha^{-1/2}, \]
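The first few Gaussian central moments can also be checked by evaluating the final integral in (30.55) symbolically. A minimal sketch, assuming SymPy is available:

```python
# Symbolic evaluation of the Gaussian central moments via the last integral in
# (30.55), with y = x - mu; SymPy assumed.
import sympy as sp

y = sp.symbols('y', real=True)
sigma = sp.symbols('sigma', positive=True)
integrand = sp.exp(-y**2 / (2 * sigma**2)) / (sigma * sp.sqrt(2 * sp.pi))

for k in range(1, 5):
    nu_k = sp.integrate(y**k * integrand, (y, -sp.oo, sp.oo))
    print(k, sp.simplify(nu_k))   # 1 -> 0, 2 -> sigma**2, 3 -> 0, 4 -> 3*sigma**4
```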