where
$$
J \equiv \frac{\partial(x_1, x_2, \ldots, x_n)}{\partial(y_1, y_2, \ldots, y_n)} =
\begin{vmatrix}
\dfrac{\partial x_1}{\partial y_1} & \cdots & \dfrac{\partial x_n}{\partial y_1}\\
\vdots & \ddots & \vdots\\
\dfrac{\partial x_1}{\partial y_n} & \cdots & \dfrac{\partial x_n}{\partial y_n}
\end{vmatrix},
$$
is the Jacobian of the $x_i$ with respect to the $y_j$.

Suppose that the random variables $X_i$, $i = 1, 2, \ldots, n$, are independent and Gaussian distributed with means $\mu_i$ and variances $\sigma_i^2$ respectively. Find the PDF for the new variables $Z_i = (X_i - \mu_i)/\sigma_i$, $i = 1, 2, \ldots, n$. By considering an elemental spherical shell in $Z$-space, find the PDF of the chi-squared random variable $\chi^2_n = \sum_{i=1}^{n} Z_i^2$.

Since the $X_i$ are independent random variables,
$$
f(x_1, x_2, \ldots, x_n) = f(x_1)f(x_2)\cdots f(x_n)
= \frac{1}{(2\pi)^{n/2}\sigma_1\sigma_2\cdots\sigma_n}
\prod_{i=1}^{n}\exp\left[-\frac{(x_i - \mu_i)^2}{2\sigma_i^2}\right].
$$
To derive the PDF for the variables $Z_i$, we require
$$
|f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2\cdots dx_n| = |g(z_1, z_2, \ldots, z_n)\,dz_1\,dz_2\cdots dz_n|,
$$
and, noting that $dz_i = dx_i/\sigma_i$, we obtain
$$
g(z_1, z_2, \ldots, z_n) = \frac{1}{(2\pi)^{n/2}}\exp\left(-\frac{1}{2}\sum_{i=1}^{n} z_i^2\right).
$$

Let us now consider the random variable $\chi^2_n = \sum_{i=1}^{n} Z_i^2$, which we may regard as the square of the distance from the origin in the $n$-dimensional $Z$-space. We now require that
$$
g(z_1, z_2, \ldots, z_n)\,dz_1\,dz_2\cdots dz_n = h(\chi^2_n)\,d\chi^2_n.
$$
If we consider the infinitesimal volume $dV = dz_1\,dz_2\cdots dz_n$ to be that enclosed by the $n$-dimensional spherical shell of radius $\chi_n$ and thickness $d\chi_n$, then we may write $dV = A\chi_n^{n-1}\,d\chi_n$ for some constant $A$. We thus obtain
$$
h(\chi^2_n)\,d\chi^2_n \propto \exp\left(-\tfrac{1}{2}\chi^2_n\right)\chi_n^{n-1}\,d\chi_n
\propto \exp\left(-\tfrac{1}{2}\chi^2_n\right)\chi_n^{n-2}\,d\chi^2_n,
$$
where we have used the fact that $d\chi^2_n = 2\chi_n\,d\chi_n$. Thus we see that the PDF for $\chi^2_n$ is given by
$$
h(\chi^2_n) = B\exp\left(-\tfrac{1}{2}\chi^2_n\right)\chi_n^{n-2},
$$
for some constant $B$. This constant may be determined from the normalisation condition
$$
\int_0^{\infty} h(\chi^2_n)\,d\chi^2_n = 1,
$$
and is found to be $B = [2^{n/2}\,\Gamma(\tfrac{1}{2}n)]^{-1}$. This is the $n$th-order chi-squared distribution discussed in subsection 30.9.4.

30.15 Important joint distributions

In this section we will examine two important multivariate distributions, the multinomial distribution, which is an extension of the binomial distribution, and the multivariate Gaussian distribution.

30.15.1 The multinomial distribution

The binomial distribution describes the probability of obtaining $x$ 'successes' from $n$ independent trials, where each trial has only two possible outcomes. This may be generalised to the case where each trial has $k$ possible outcomes with respective probabilities $p_1, p_2, \ldots, p_k$. If we consider the random variables $X_i$, $i = 1, 2, \ldots, k$, to be the number of outcomes of type $i$ in $n$ trials, then we may calculate their joint probability function
$$
f(x_1, x_2, \ldots, x_k) = \Pr(X_1 = x_1,\, X_2 = x_2,\, \ldots,\, X_k = x_k),
$$
where we must have $\sum_{i=1}^{k} x_i = n$. In $n$ trials the probability of obtaining $x_1$ outcomes of type 1, followed by $x_2$ outcomes of type 2, etc. is given by $p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k}$. However, the number of distinguishable permutations of this result is
$$
\frac{n!}{x_1!\,x_2!\cdots x_k!},
$$
and thus
$$
f(x_1, x_2, \ldots, x_k) = \frac{n!}{x_1!\,x_2!\cdots x_k!}\,p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k}. \qquad (30.146)
$$
This is the multinomial probability distribution. If $k = 2$ then the multinomial distribution reduces to the familiar binomial distribution. Although in this form the binomial distribution appears to be a function of two random variables, it must be remembered that, in fact, since $p_2 = 1 - p_1$ and $x_2 = n - x_1$, the distribution of $X_1$ is entirely determined by the parameters $p_1$ and $n$.
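As a quick numerical check of (30.146), the sketch below evaluates the multinomial probability function directly and compares it with `scipy.stats.multinomial`; the trial counts and probabilities used are illustrative assumptions, not values from the text. It also confirms the $k = 2$ reduction to the binomial distribution.

```python
# Direct evaluation of the multinomial pmf (30.146) versus scipy,
# and the k = 2 reduction to the binomial distribution.
from math import factorial

from scipy.stats import binom, multinomial


def multinomial_pmf(xs, ps):
    """Evaluate (30.146): n!/(x1!...xk!) * p1^x1 * ... * pk^xk."""
    n = sum(xs)
    coeff = factorial(n)
    for x in xs:
        coeff //= factorial(x)          # multinomial coefficient
    prob = float(coeff)
    for x, p in zip(xs, ps):
        prob *= p ** x
    return prob


xs, ps = [1, 3, 2], [0.2, 0.5, 0.3]     # illustrative values
print(multinomial_pmf(xs, ps))                 # 0.135
print(multinomial.pmf(xs, n=sum(xs), p=ps))    # same value from scipy

# For k = 2 the multinomial pmf of (x1, n - x1) equals Pr(X1 = x1)
# under the binomial distribution with parameters n and p1.
print(multinomial.pmf([2, 4], n=6, p=[0.3, 0.7]))
print(binom.pmf(2, n=6, p=0.3))                # agrees
```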
That $X_1$ has a binomial distribution is shown by remembering that it represents the number of objects of a particular type obtained from sampling with replacement, which led to the original definition of the binomial distribution. In fact, any of the random variables $X_i$ has a binomial distribution, i.e. the marginal distribution of each $X_i$ is binomial with parameters $n$ and $p_i$. It immediately follows that
$$
E[X_i] = np_i \quad\text{and}\quad V[X_i] = np_i(1 - p_i). \qquad (30.147)
$$

At a village fête patrons were invited, for a 10 p entry fee, to pick without looking six tickets from a drum containing equal large numbers of red, blue and green tickets. If five or more of the tickets were of the same colour a prize of 100 p was awarded. A consolation award of 40 p was made if two tickets of each colour were picked. Was a good time had by all?

In this case, all types of outcome (red, blue and green) have the same probability. The probability of obtaining any given combination of tickets is given by the multinomial distribution with $n = 6$, $k = 3$ and $p_i = \tfrac{1}{3}$, $i = 1, 2, 3$.

(i) The probability of picking six tickets of the same colour is given by
$$
\Pr(\text{six of the same colour}) = 3 \times \frac{6!}{6!\,0!\,0!}\left(\frac{1}{3}\right)^6\left(\frac{1}{3}\right)^0\left(\frac{1}{3}\right)^0 = \frac{1}{243}.
$$
The factor of 3 is present because there are three different colours.

(ii) The probability of picking five tickets of one colour and one ticket of another colour is
$$
\Pr(\text{five of one colour; one of another}) = 3 \times 2 \times \frac{6!}{5!\,1!\,0!}\left(\frac{1}{3}\right)^5\left(\frac{1}{3}\right)^1\left(\frac{1}{3}\right)^0 = \frac{4}{81}.
$$
The factors of 3 and 2 are included because there are three ways to choose the colour of the five matching tickets, and then two ways to choose the colour of the remaining ticket.

(iii) Finally, the probability of picking two tickets of each colour is
$$
\Pr(\text{two of each colour}) = \frac{6!}{2!\,2!\,2!}\left(\frac{1}{3}\right)^2\left(\frac{1}{3}\right)^2\left(\frac{1}{3}\right)^2 = \frac{10}{81}.
$$

Thus the expected return to any patron was, in pence,
$$
100\left(\frac{1}{243} + \frac{4}{81}\right) + 40 \times \frac{10}{81} = 10.29.
$$
A good time was had by all but the stallholder!

30.15.2 The multivariate Gaussian distribution

A particularly interesting multivariate distribution is provided by the generalisation of the Gaussian distribution to multiple random variables $X_i$, $i = 1, 2, \ldots, n$. If the expectation value of $X_i$ is $E(X_i) = \mu_i$ then the general form of the PDF is given by
$$
f(x_1, x_2, \ldots, x_n) = N\exp\left[-\frac{1}{2}\sum_i\sum_j a_{ij}(x_i - \mu_i)(x_j - \mu_j)\right],
$$
where $a_{ij} = a_{ji}$ and $N$ is a normalisation constant that we give below. If we write the column vectors $\mathbf{x} = (x_1\ x_2\ \cdots\ x_n)^{\mathrm T}$ and $\boldsymbol{\mu} = (\mu_1\ \mu_2\ \cdots\ \mu_n)^{\mathrm T}$, and denote the matrix with elements $a_{ij}$ by $A$, then
$$
f(\mathbf{x}) = f(x_1, x_2, \ldots, x_n) = N\exp\left[-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{\mathrm T}A(\mathbf{x} - \boldsymbol{\mu})\right],
$$
where $A$ is symmetric. Using the same method as that used to derive (30.145), it is straightforward to show that the MGF of $f(\mathbf{x})$ is given by
$$
M(t_1, t_2, \ldots, t_n) = \exp\left(\boldsymbol{\mu}^{\mathrm T}\mathbf{t} + \tfrac{1}{2}\mathbf{t}^{\mathrm T}A^{-1}\mathbf{t}\right),
$$
where the column matrix $\mathbf{t} = (t_1\ t_2\ \cdots\ t_n)^{\mathrm T}$. From the MGF, we find that
$$
E[X_i X_j] = \frac{\partial^2 M(0, 0, \ldots, 0)}{\partial t_i\,\partial t_j} = \mu_i\mu_j + (A^{-1})_{ij},
$$
and thus, using (30.135), we obtain
$$
\operatorname{Cov}[X_i, X_j] = E[(X_i - \mu_i)(X_j - \mu_j)] = (A^{-1})_{ij}.
$$
Hence $A$ is equal to the inverse of the covariance matrix $V$ of the $X_i$; see (30.139). Thus, with the correct normalisation, $f(\mathbf{x})$ is given by
$$
f(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}(\det V)^{1/2}}\exp\left[-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{\mathrm T}V^{-1}(\mathbf{x} - \boldsymbol{\mu})\right]. \qquad (30.148)
$$
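As a numerical illustration of (30.148), the following sketch evaluates the PDF directly from the covariance matrix and compares the result with `scipy.stats.multivariate_normal`; the mean vector and covariance matrix here are illustrative assumptions.

```python
# Evaluate the multivariate Gaussian PDF (30.148) directly and
# compare with scipy; mu and V below are illustrative.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -2.0])
V = np.array([[2.0, 0.6],
              [0.6, 1.0]])           # symmetric, positive definite

x = np.array([0.5, -1.5])
d = x - mu
n = len(mu)

# Direct evaluation of (30.148); A = V^{-1} appears in the exponent.
norm = 1.0 / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(V)))
pdf = norm * np.exp(-0.5 * d @ np.linalg.inv(V) @ d)

print(pdf)
print(multivariate_normal(mean=mu, cov=V).pdf(x))  # same value
```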
Evaluate the integral
$$
I = \int \exp\left[-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{\mathrm T}V^{-1}(\mathbf{x} - \boldsymbol{\mu})\right]d^n x,
$$
where $V$ is a symmetric matrix, and hence verify the normalisation in (30.148).

We begin by making the substitution $\mathbf{y} = \mathbf{x} - \boldsymbol{\mu}$ to obtain
$$
I = \int \exp\left(-\tfrac{1}{2}\mathbf{y}^{\mathrm T}V^{-1}\mathbf{y}\right)d^n y.
$$
Since $V$ is a symmetric matrix, it may be diagonalised by an orthogonal transformation to the new set of variables $\mathbf{y}' = S^{\mathrm T}\mathbf{y}$, where $S$ is the orthogonal matrix with the normalised eigenvectors of $V$ as its columns (see section 8.16). In this new basis, the matrix $V$ becomes
$$
V' = S^{\mathrm T}VS = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n),
$$
where the $\lambda_i$ are the eigenvalues of $V$. Also, since $S$ is orthogonal, $\det S = \pm 1$, and so
$$
d^n y = |\det S|\,d^n y' = d^n y'.
$$
Thus we can write $I$ as
$$
I = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}
\exp\left(-\sum_{i=1}^{n}\frac{y_i'^2}{2\lambda_i}\right)dy_1'\,dy_2'\cdots dy_n'
= \prod_{i=1}^{n}\int_{-\infty}^{\infty}\exp\left(-\frac{y_i'^2}{2\lambda_i}\right)dy_i'
= (2\pi)^{n/2}(\lambda_1\lambda_2\cdots\lambda_n)^{1/2}, \qquad (30.149)
$$
where we have used the standard integral $\int_{-\infty}^{\infty}\exp(-\alpha y^2)\,dy = (\pi/\alpha)^{1/2}$ (see subsection 6.4.2). From section 8.16, however, we note that the product of eigenvalues in (30.149) is equal to $\det V$. Thus we finally obtain $I = (2\pi)^{n/2}(\det V)^{1/2}$, and hence the normalisation in (30.148) ensures that $f(\mathbf{x})$ integrates to unity.

The above example illustrates some important points concerning the multivariate Gaussian distribution. In particular, we note that the $Y_i'$ are independent Gaussian variables with mean zero and variance $\lambda_i$. Thus, given a general set of $n$ Gaussian variables $\mathbf{x}$ with means $\boldsymbol{\mu}$ and covariance matrix $V$, one can always perform the above transformation to obtain a new set of variables $\mathbf{y}'$, which are linear combinations of the old ones and are distributed as independent Gaussians with zero mean and variances $\lambda_i$. This result is extremely useful in proving many of the properties of the multivariate Gaussian distribution.
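The decorrelating transformation $\mathbf{y}' = S^{\mathrm T}\mathbf{y}$ used above is easy to check numerically. A minimal sketch, assuming an illustrative $2\times 2$ covariance matrix: draw correlated Gaussian samples, rotate them into the eigenbasis of $V$, and confirm that the sample covariance of the transformed variables is approximately $\operatorname{diag}(\lambda_1, \lambda_2)$.

```python
# Check that y' = S^T y decorrelates Gaussian variables: the sample
# covariance of y' should be approximately diag(lambda_i).
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0])
V = np.array([[2.0, 0.6],
              [0.6, 1.0]])            # illustrative covariance matrix

# Columns of S are the normalised eigenvectors of V.
lam, S = np.linalg.eigh(V)

# Draw correlated samples, centre them, rotate into the eigenbasis.
x = rng.multivariate_normal(mu, V, size=200_000)
y = x - mu
y_prime = y @ S                        # row-wise application of S^T

print(lam)                             # eigenvalues lambda_i
print(np.cov(y_prime, rowvar=False))   # ~ diag(lambda_1, lambda_2)
```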