Properties of joint distributions
30.11.3 Marginal and conditional distributions

Given a bivariate distribution f(x, y), we may be interested only in the probability function for X irrespective of the value of Y (or vice versa). This marginal distribution of X is obtained by summing or integrating, as appropriate, the joint probability distribution over all allowed values of Y. Thus, the marginal distribution of X (for example) is given by

f_X(x) = \begin{cases} \sum_j f(x, y_j) & \text{for a discrete distribution,} \\ \int f(x, y)\,dy & \text{for a continuous distribution.} \end{cases} \qquad (30.130)

It is clear that an analogous definition exists for the marginal distribution of Y.

Alternatively, one might be interested in the probability function of X given that Y takes the specific value y_0, i.e. \Pr(X = x \mid Y = y_0). This conditional distribution of X is given by

g(x) = \frac{f(x, y_0)}{f_Y(y_0)},

where f_Y(y) is the marginal distribution of Y. The division by f_Y(y_0) is necessary in order that g(x) is properly normalised.

30.12 Properties of joint distributions

The probability density function f(x, y) contains all the information on the joint probability distribution of two random variables X and Y. In a similar manner to that presented for univariate distributions, however, it is conventional to characterise f(x, y) by certain of its properties, which we now discuss. Once again, most of these properties are based on the concept of expectation values, which are defined for joint distributions in an analogous way to those for single-variable distributions (30.46). Thus, the expectation value of any function g(X, Y) of the random variables X and Y is given by

E[g(X, Y)] = \begin{cases} \sum_i \sum_j g(x_i, y_j) f(x_i, y_j) & \text{for the discrete case,} \\ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y) f(x, y)\,dx\,dy & \text{for the continuous case.} \end{cases}

30.12.1 Means

The means of X and Y are defined respectively as the expectation values of the variables X and Y. Thus, the mean of X is given by

E[X] = \mu_X = \begin{cases} \sum_i \sum_j x_i f(x_i, y_j) & \text{for the discrete case,} \\ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x f(x, y)\,dx\,dy & \text{for the continuous case.} \end{cases} \qquad (30.131)

E[Y] is obtained in a similar manner.

Show that if X and Y are independent random variables then E[XY] = E[X]E[Y].

Let us consider the case where X and Y are continuous random variables. Since X and Y are independent, f(x, y) = f_X(x) f_Y(y), so that

E[XY] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x y\, f_X(x) f_Y(y)\,dx\,dy = \int_{-\infty}^{\infty} x f_X(x)\,dx \int_{-\infty}^{\infty} y f_Y(y)\,dy = E[X]E[Y].

An analogous proof exists for the discrete case.

30.12.2 Variances

The definitions of the variances of X and Y are analogous to those for the single-variable case (30.48), i.e. the variance of X is given by

V[X] = \sigma_X^2 = \begin{cases} \sum_i \sum_j (x_i - \mu_X)^2 f(x_i, y_j) & \text{for the discrete case,} \\ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)^2 f(x, y)\,dx\,dy & \text{for the continuous case.} \end{cases} \qquad (30.132)

Equivalent definitions exist for the variance of Y.

30.12.3 Covariance and correlation

Means and variances of joint distributions provide useful information about their marginal distributions, but we have not yet given any indication of how to measure the relationship between the two random variables. Of course, it may be that the two random variables are independent, but often this is not so. For example, if we measure the heights and weights of a sample of people we would not be surprised to find a tendency for tall people to be heavier than short people and vice versa. We will show in this section that two functions, the covariance and the correlation, can be defined for a bivariate distribution and that these are useful in characterising the relationship between the two random variables.
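For a small discrete distribution these definitions translate directly into array operations. The following Python sketch (NumPy assumed; the grid of values and the probability table are invented purely for illustration) computes the marginals of (30.130), a conditional distribution of X given Y = y_0, and the means and variances of (30.131) and (30.132).

```python
import numpy as np

# Hypothetical discrete joint distribution f(x_i, y_j) on a small grid;
# the values are invented for illustration and sum to 1.
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 1.0])
f = np.array([[0.10, 0.20],
              [0.25, 0.15],
              [0.20, 0.10]])        # f[i, j] = Pr(X = x_i, Y = y_j)

# Marginal distributions, as in (30.130): sum out the other variable.
f_X = f.sum(axis=1)                 # Pr(X = x_i)
f_Y = f.sum(axis=0)                 # Pr(Y = y_j)

# Conditional distribution of X given Y = y_0, normalised by f_Y(y_0).
j0 = 1                              # take y_0 = y[j0]
g = f[:, j0] / f_Y[j0]

# Means and variances, following (30.131) and (30.132).
mu_X = np.sum(x[:, None] * f)
mu_Y = np.sum(y[None, :] * f)
var_X = np.sum((x[:, None] - mu_X) ** 2 * f)
var_Y = np.sum((y[None, :] - mu_Y) ** 2 * f)

print(f_X, f_Y, g)
print(mu_X, mu_Y, var_X, var_Y)
```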
The covariance of two random variables X and Y is defined by

\text{Cov}[X, Y] = E[(X - \mu_X)(Y - \mu_Y)], \qquad (30.133)

where \mu_X and \mu_Y are the expectation values of X and Y respectively. Closely related to the covariance is the correlation of the two random variables, defined by

\text{Corr}[X, Y] = \frac{\text{Cov}[X, Y]}{\sigma_X \sigma_Y}, \qquad (30.134)

where \sigma_X and \sigma_Y are the standard deviations of X and Y respectively. It can be shown that the correlation function lies between -1 and +1. If the value assumed is negative, X and Y are said to be negatively correlated, if it is positive they are said to be positively correlated and if it is zero they are said to be uncorrelated. We will now justify the use of these terms.

One particularly useful consequence of its definition is that the covariance of two independent variables, X and Y, is zero. It immediately follows from (30.134) that their correlation is also zero, and this justifies the use of the term 'uncorrelated' for two such variables. To show this extremely important property we first note that

\text{Cov}[X, Y] = E[(X - \mu_X)(Y - \mu_Y)]
               = E[XY - \mu_X Y - \mu_Y X + \mu_X \mu_Y]
               = E[XY] - \mu_X E[Y] - \mu_Y E[X] + \mu_X \mu_Y
               = E[XY] - \mu_X \mu_Y. \qquad (30.135)

Now, if X and Y are independent then E[XY] = E[X]E[Y] = \mu_X \mu_Y and so \text{Cov}[X, Y] = 0.

It is important to note that the converse of this result is not necessarily true; two variables dependent on each other can still be uncorrelated. In other words, it is possible (and not uncommon) for two variables X and Y to be described by a joint distribution f(x, y) that cannot be factorised into a product of the form g(x)h(y), but for which \text{Corr}[X, Y] = 0. Indeed, from the definition (30.133), we see that for any joint distribution f(x, y) that is symmetric in x about \mu_X (or similarly in y) we have \text{Corr}[X, Y] = 0.

We have already asserted that if the correlation of two random variables is positive (negative) they are said to be positively (negatively) correlated. We have also stated that the correlation lies between -1 and +1. The terminology suggests that if the two RVs are identical (i.e. X = Y) then they are completely correlated and that their correlation should be +1. Likewise, if X = -Y then the functions are completely anticorrelated and their correlation should be -1. Values of the correlation function between these extremes show the existence of some degree of correlation. In fact it is not necessary that X = Y for \text{Corr}[X, Y] = 1; it is sufficient that Y is a linear function of X, i.e. Y = aX + b (with a positive). If a is negative then \text{Corr}[X, Y] = -1. To show this we first note that \mu_Y = a\mu_X + b. Now

Y = aX + b = aX + \mu_Y - a\mu_X \quad\Rightarrow\quad Y - \mu_Y = a(X - \mu_X),

and so, using the definition of the covariance (30.133),

\text{Cov}[X, Y] = aE[(X - \mu_X)^2] = a\sigma_X^2.

It follows from the properties of the variance (subsection 30.5.3) that \sigma_Y = |a|\sigma_X and so, using the definition (30.134) of the correlation,

\text{Corr}[X, Y] = \frac{a\sigma_X^2}{|a|\sigma_X^2} = \frac{a}{|a|},

which is the stated result. It should be noted that, even if the possibilities of X and Y being non-zero are mutually exclusive, \text{Corr}[X, Y] need not have value \pm 1.
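The two points above, that dependent variables can nonetheless be uncorrelated and that a linear relation Y = aX + b gives correlation a/|a|, are easy to illustrate numerically. The sketch below uses sample estimates in place of the exact expectations; the sample size and the constants a, b are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200_000)            # X symmetric about its mean

# Dependent but uncorrelated: Y = X**2 is completely determined by X,
# yet the symmetry of X about mu_X drives the correlation to zero.
y_dep = x ** 2
print(np.corrcoef(x, y_dep)[0, 1])      # close to 0

# Linear relation Y = aX + b: correlation is a/|a|, here -1 since a < 0.
a, b = -3.0, 2.0                        # arbitrary illustrative constants
y_lin = a * x + b
print(np.corrcoef(x, y_lin)[0, 1])      # -1 up to rounding
```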
A biased die gives probabilities \tfrac{1}{2}p, p, p, p, p, 2p of throwing 1, 2, 3, 4, 5, 6 respectively. If the random variable X is the number shown on the die and the random variable Y is defined as X^2, calculate the covariance and correlation of X and Y.

We have already calculated in subsections 30.2.1 and 30.5.4 that

p = \frac{2}{13}, \quad E[X] = \frac{53}{13}, \quad E[X^2] = \frac{253}{13}, \quad V[X] = \frac{480}{169}.

Using (30.135), we obtain

\text{Cov}[X, Y] = \text{Cov}[X, X^2] = E[X^3] - E[X]E[X^2].

Now E[X^3] is given by

E[X^3] = 1^3 \times \tfrac{1}{2}p + (2^3 + 3^3 + 4^3 + 5^3)p + 6^3 \times 2p = \tfrac{1313}{2}p = 101,

and the covariance of X and Y is given by

\text{Cov}[X, Y] = 101 - \frac{53}{13} \times \frac{253}{13} = \frac{3660}{169}.

The correlation is defined by \text{Corr}[X, Y] = \text{Cov}[X, Y]/(\sigma_X \sigma_Y). The standard deviation of Y may be calculated from the definition of the variance. Letting \mu_Y = E[X^2] = \frac{253}{13} gives

\sigma_Y^2 = \tfrac{1}{2}p(1^2 - \mu_Y)^2 + p(2^2 - \mu_Y)^2 + p(3^2 - \mu_Y)^2 + p(4^2 - \mu_Y)^2 + p(5^2 - \mu_Y)^2 + 2p(6^2 - \mu_Y)^2 = \frac{187\,356}{169}\,p = \frac{28\,824}{169}.

We deduce that

\text{Corr}[X, Y] = \frac{3660}{169}\left(\frac{480}{169} \times \frac{28\,824}{169}\right)^{-1/2} \approx 0.984.

Thus the random variables X and Y display a strong degree of positive correlation, as we would expect.

We note that the covariance of X and Y occurs in various expressions. For example, if X and Y are not independent then

V[X + Y] = E[(X + Y)^2] - (E[X + Y])^2
         = E[X^2] + 2E[XY] + E[Y^2] - \{(E[X])^2 + 2E[X]E[Y] + (E[Y])^2\}
         = V[X] + V[Y] + 2(E[XY] - E[X]E[Y])
         = V[X] + V[Y] + 2\,\text{Cov}[X, Y].

More generally, we find (for a, b and c constant)

V[aX + bY + c] = a^2 V[X] + b^2 V[Y] + 2ab\,\text{Cov}[X, Y]. \qquad (30.136)

Note that if X and Y are in fact independent then \text{Cov}[X, Y] = 0 and we recover the expression (30.68) in subsection 30.6.4.

We may use (30.136) to obtain an approximate expression for V[f(X, Y)] for any arbitrary function f, even when the random variables X and Y are correlated. Approximating f(X, Y) by the linear terms of its Taylor expansion about the point (\mu_X, \mu_Y), we have

f(X, Y) \approx f(\mu_X, \mu_Y) + \left(\frac{\partial f}{\partial X}\right)(X - \mu_X) + \left(\frac{\partial f}{\partial Y}\right)(Y - \mu_Y), \qquad (30.137)

where the partial derivatives are evaluated at X = \mu_X and Y = \mu_Y. Taking the variance of both sides, and using (30.136), we find

V[f(X, Y)] \approx \left(\frac{\partial f}{\partial X}\right)^2 V[X] + \left(\frac{\partial f}{\partial Y}\right)^2 V[Y] + 2\left(\frac{\partial f}{\partial X}\right)\left(\frac{\partial f}{\partial Y}\right)\text{Cov}[X, Y]. \qquad (30.138)

Clearly, if \text{Cov}[X, Y] = 0, we recover the result (30.69) derived in subsection 30.6.4. We note that (30.138) is exact if f(X, Y) is linear in X and Y.

For several variables X_i, i = 1, 2, \ldots, n, we can define the symmetric (positive definite) covariance matrix whose elements are

V_{ij} = \text{Cov}[X_i, X_j], \qquad (30.139)

and the symmetric (positive definite) correlation matrix

\rho_{ij} = \text{Corr}[X_i, X_j].

The diagonal elements of the covariance matrix are the variances of the variables, whilst those of the correlation matrix are unity. For several variables, (30.138) generalises to

V[f(X_1, X_2, \ldots, X_n)] \approx \sum_i \left(\frac{\partial f}{\partial X_i}\right)^2 V[X_i] + \sum_i \sum_{j \neq i} \left(\frac{\partial f}{\partial X_i}\right)\left(\frac{\partial f}{\partial X_j}\right)\text{Cov}[X_i, X_j],

where the partial derivatives are evaluated at X_i = \mu_{X_i}.
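As an illustration of (30.138) and (30.139), the sketch below propagates errors through a hypothetical function f(X, Y) = X/Y and checks the linearised variance against a Monte Carlo estimate. The means, variances and covariance are invented for the example, and a bivariate Gaussian is assumed only for the purpose of the cross-check.

```python
import numpy as np

# Hypothetical inputs: means, variances and covariance of X and Y.
mu_X, mu_Y = 10.0, 4.0
var_X, var_Y, cov_XY = 0.25, 0.09, 0.06

# f(X, Y) = X / Y; partial derivatives evaluated at (mu_X, mu_Y).
dfdX = 1.0 / mu_Y
dfdY = -mu_X / mu_Y**2

# Linearised variance, equation (30.138).
var_f_lin = dfdX**2 * var_X + dfdY**2 * var_Y + 2 * dfdX * dfdY * cov_XY

# Monte Carlo cross-check: draw (X, Y) from a bivariate Gaussian with the
# covariance matrix V_ij of (30.139) and compare the sample variance of f.
V = np.array([[var_X, cov_XY],
              [cov_XY, var_Y]])
rng = np.random.default_rng(1)
xy = rng.multivariate_normal([mu_X, mu_Y], V, size=500_000)
var_f_mc = np.var(xy[:, 0] / xy[:, 1])

print(var_f_lin, var_f_mc)   # agree to the accuracy of the linearisation
```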
A card is drawn at random from a normal 52-card pack and its identity noted. The card is replaced, the pack shuffled and the process repeated. Random variables W, X, Y, Z are defined as follows:

W = 2 if the drawn card is a heart; W = 0 otherwise.
X = 4 if the drawn card is an ace, king or queen; X = 2 if the card is a jack or ten; X = 0 otherwise.
Y = 1 if the drawn card is red; Y = 0 otherwise.
Z = 2 if the drawn card is black and an ace, king or queen; Z = 0 otherwise.

Establish the correlation matrix for W, X, Y, Z.

The means of the variables are given by

\mu_W = 2 \times \tfrac{1}{4} = \tfrac{1}{2}, \quad \mu_X = \left(4 \times \tfrac{3}{13}\right) + \left(2 \times \tfrac{2}{13}\right) = \tfrac{16}{13}, \quad \mu_Y = 1 \times \tfrac{1}{2} = \tfrac{1}{2}, \quad \mu_Z = 2 \times \tfrac{6}{52} = \tfrac{3}{13}.

The variances, calculated from \sigma_U^2 = V[U] = E[U^2] - (E[U])^2, where U = W, X, Y or Z, are

\sigma_W^2 = \left(4 \times \tfrac{1}{4}\right) - \left(\tfrac{1}{2}\right)^2 = \tfrac{3}{4}, \quad \sigma_X^2 = \left(16 \times \tfrac{3}{13}\right) + \left(4 \times \tfrac{2}{13}\right) - \left(\tfrac{16}{13}\right)^2 = \tfrac{472}{169},
\sigma_Y^2 = \left(1 \times \tfrac{1}{2}\right) - \left(\tfrac{1}{2}\right)^2 = \tfrac{1}{4}, \quad \sigma_Z^2 = \left(4 \times \tfrac{6}{52}\right) - \left(\tfrac{3}{13}\right)^2 = \tfrac{69}{169}.

The covariances are found by first calculating E[WX] etc. and then forming E[WX] - \mu_W \mu_X etc.

E[WX] = 2(4)\tfrac{3}{52} + 2(2)\tfrac{2}{52} = \tfrac{8}{13}, \quad \text{Cov}[W, X] = \tfrac{8}{13} - \tfrac{1}{2} \times \tfrac{16}{13} = 0,
E[WY] = 2(1)\tfrac{1}{4} = \tfrac{1}{2}, \quad \text{Cov}[W, Y] = \tfrac{1}{2} - \tfrac{1}{2} \times \tfrac{1}{2} = \tfrac{1}{4},
E[WZ] = 0, \quad \text{Cov}[W, Z] = 0 - \tfrac{1}{2} \times \tfrac{3}{13} = -\tfrac{3}{26},
E[XY] = 4(1)\tfrac{6}{52} + 2(1)\tfrac{4}{52} = \tfrac{8}{13}, \quad \text{Cov}[X, Y] = \tfrac{8}{13} - \tfrac{16}{13} \times \tfrac{1}{2} = 0,
E[XZ] = 4(2)\tfrac{6}{52} = \tfrac{12}{13}, \quad \text{Cov}[X, Z] = \tfrac{12}{13} - \tfrac{16}{13} \times \tfrac{3}{13} = \tfrac{108}{169},
E[YZ] = 0, \quad \text{Cov}[Y, Z] = 0 - \tfrac{1}{2} \times \tfrac{3}{13} = -\tfrac{3}{26}.

The correlations Corr[W, X] and Corr[X, Y] are clearly zero; the remainder are given by

\text{Corr}[W, Y] = \tfrac{1}{4}\left(\tfrac{3}{4} \times \tfrac{1}{4}\right)^{-1/2} = 0.577,
\text{Corr}[W, Z] = -\tfrac{3}{26}\left(\tfrac{3}{4} \times \tfrac{69}{169}\right)^{-1/2} = -0.209,
\text{Corr}[X, Z] = \tfrac{108}{169}\left(\tfrac{472}{169} \times \tfrac{69}{169}\right)^{-1/2} = 0.598,
\text{Corr}[Y, Z] = -\tfrac{3}{26}\left(\tfrac{1}{4} \times \tfrac{69}{169}\right)^{-1/2} = -0.361.

Finally, then, we can write down the correlation matrix:

\rho = \begin{pmatrix} 1 & 0 & 0.58 & -0.21 \\ 0 & 1 & 0 & 0.60 \\ 0.58 & 0 & 1 & -0.36 \\ -0.21 & 0.60 & -0.36 & 1 \end{pmatrix}.
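The correlation matrix above can also be estimated by simulating a large number of independent draws with replacement. The following sketch assumes NumPy; the numeric encoding of suits and ranks is an arbitrary choice made only for the simulation.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Draws with replacement; encoding: suits 0..3 with 0 = hearts and 0-1 red,
# ranks 1..13 with 1 = ace, 10 = ten, 11 = jack, 12 = queen, 13 = king.
suit = rng.integers(0, 4, size=n)
rank = rng.integers(1, 14, size=n)

heart = suit == 0
red = suit <= 1
akq = (rank == 1) | (rank == 12) | (rank == 13)   # ace, queen or king
jack_or_ten = (rank == 10) | (rank == 11)

W = np.where(heart, 2, 0)
X = np.where(akq, 4, np.where(jack_or_ten, 2, 0))
Y = np.where(red, 1, 0)
Z = np.where(~red & akq, 2, 0)

rho = np.corrcoef(np.vstack([W, X, Y, Z]))
print(np.round(rho, 2))   # approaches the matrix quoted above
```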